Before looking at Moses, we need a brief introduction to Natural Language Processing and machine translation. With that background, Moses is much easier to understand.
Natural Language Processing
Natural Language Processing (NLP) is a branch of Artificial Intelligence that lets us communicate with intelligent systems such as computers using natural languages such as English, Tamil and Sinhala. By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation. NLP considers the hierarchical structure of language: several words make a phrase, and several phrases make a sentence. NLP is commonly used for text mining, machine translation, and automated question answering.
We can use NLP to translate from one language to another. Instead of hand-coding large sets of rules, NLP can rely on machine learning to learn these rules automatically by analyzing a set of examples (i.e. a large corpus) and making statistical inferences.
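As a small illustration of learning translation "rules" from examples rather than writing them by hand, the following Python sketch estimates word translation probabilities t(e | f) from a tiny, made-up German-English parallel corpus using the expectation-maximization (EM) procedure of IBM Model 1. This is a simplified textbook version (no NULL word, toy data). The function name `train_ibm_model1` and the corpus are hypothetical, and this is not the code Moses itself runs.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """Simplified IBM Model 1 (no NULL word).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns t[(e, f)], an estimate of P(e | f).
    """
    e_vocab = {e for _, es in corpus for e in es}
    # Start from a uniform distribution over English words
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) being aligned
        total = defaultdict(float)  # expected count of f being aligned to anything
        for fs, es in corpus:
            for e in es:
                # E-step: share e's alignment over the foreign words in its sentence
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: re-estimate t(e | f) from the expected counts
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

# Hypothetical toy parallel corpus
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
print(round(t[("house", "Haus")], 3), round(t[("the", "Haus")], 3))
```

No sentence pair says explicitly that "Haus" means "house". The probability mass shifts there because "das" is already well explained by "the".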
Moses
Moses is a statistical machine translation system that translates from one language to another using translation models trained on data. To train a model, you need a collection of texts together with their translations in the other language (a parallel corpus). Once you have a trained model, an efficient search algorithm quickly finds the highest-probability translation among an exponential number of possible choices. In other words, Moses is a data-driven approach to machine translation. Its underlying statistical model is based on Bayes' theorem.
From a translation point of view, this means that the probability of translating a sentence f in the source language into a sentence e in the target language is proportional to the probability of translating e into f multiplied by the probability of e in the target language.
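Written as a formula (the standard noisy-channel formulation, using the same f and e as above), the system looks for the target sentence ê that maximizes P(e | f):

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \frac{P(f \mid e)\,P(e)}{P(f)} = \arg\max_{e} P(f \mid e)\,P(e)$$

The denominator P(f) can be dropped because the input sentence f is fixed while we search over candidate translations e. P(f | e) is the translation model, learned from the parallel corpus. P(e) is the language model, learned from target-language text.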
Moses further generalizes this into a log-linear model, in which each component is a feature with its own weight:
- Weight-t: translation model
- Weight-l: language model
- Weight-d: distortion (reordering) model
- Weight-w: word penalty
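In a log-linear model, each component contributes a feature value h_i(e, f). The decoder prefers the translation with the highest weighted sum score(e, f) = Σ_i λ_i h_i(e, f), and P(e | f) is proportional to exp(score). If the only features are log P(f | e) and log P(e), each with weight 1, this reduces to the Bayes formulation above.

The Python sketch below applies this to three candidate translations. The feature values, the weights and the helper names `log_linear_score`, `best_candidate` and `posterior` are all hypothetical. In Moses the weights are tuned on data rather than set by hand.

```python
import math

# Hypothetical weights for the four feature families (tuned, not hand-set, in practice).
WEIGHTS = {"t": 0.2, "l": 0.5, "d": 0.3, "w": -0.2}

# Hypothetical candidates for one source sentence. Feature values:
#   t = log translation-model probability, l = log language-model probability,
#   d = distortion cost (0 for monotone order, negative when words are reordered),
#   w = number of output words (penalised here through a negative weight).
CANDIDATES = [
    ("the house is small", {"t": -2.0, "l": -4.0, "d": 0.0, "w": 4}),
    ("small is the house", {"t": -2.0, "l": -7.0, "d": -3.0, "w": 4}),
    ("the house small", {"t": -3.0, "l": -5.0, "d": 0.0, "w": 3}),
]

def log_linear_score(features, weights=WEIGHTS):
    """Weighted sum of feature values: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates, weights=WEIGHTS):
    """Return the (translation, features) pair with the highest score."""
    return max(candidates, key=lambda cand: log_linear_score(cand[1], weights))

def posterior(candidates, weights=WEIGHTS):
    """Normalise exp(score) over the candidate list into probabilities."""
    scores = [log_linear_score(feats, weights) for _, feats in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {text: x / z for (text, _), x in zip(candidates, exps)}

print(best_candidate(CANDIDATES)[0])  # prints "the house is small"
print(posterior(CANDIDATES))
```

A real decoder cannot list every candidate like this. It scores partial translations with the same weighted sum while it searches.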
Moses was developed in C++ for efficiency and follows a modular, object-oriented design. The toolkit is a complete out-of-the-box translation system for academic research. It contains all the components needed to preprocess data and to train the language models and the translation models. It also contains tools for tuning these models using minimum error rate training (MERT) and for evaluating the resulting translations using the BLEU score.
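BLEU compares the n-grams of a system's output with those of a reference translation and penalises output that is too short. The following Python sketch computes a simplified corpus-level BLEU with one reference per sentence, n-grams up to length 4, uniform weights and no smoothing. The function names are hypothetical, and this is not the scoring script distributed with Moses. For real evaluations, use an established implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified BLEU: one reference per sentence, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped matches: an n-gram counts at most as often as it occurs in the reference
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches, so the geometric mean is 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the reference
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)

reference = "the house is very small".split()
print(corpus_bleu([reference], [reference]))
print(corpus_bleu(["the house is very".split()], [reference]))
```

In the second call, every n-gram of the output matches, yet the score drops to exp(1 - 5/4) ≈ 0.78 because of the brevity penalty.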
Moses requires two main kinds of data:
- Parallel text: a sentence-aligned collection of sentences in two different languages. Each sentence in one language is matched with its corresponding translated sentence in the other language.
- Monolingual target-language data: used to build a language model. This is a statistical model of the target language that the decoder uses to help ensure the fluency of the output.
The two main components of Moses are:
- Training pipeline: takes the raw data and turns it into a machine translation model.
- Decoder: translates a source sentence into the target language. Its main parts are:
  - Input: a plain sentence, a sentence annotated with XML-like elements to guide the translation process, or a more complex structure such as a lattice or confusion network.
  - Translation model: this can use phrase-to-phrase rules or hierarchical (possibly syntactic) rules.
  - Decoding algorithm: decoding is a huge search problem. The number of possible translations grows exponentially with sentence length, so the decoder uses heuristic beam search instead of trying every possibility.
  - Language model: Moses supports several language model toolkits (SRILM, KenLM, IRSTLM, RandLM).
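The language model is what rewards fluent word order in the output. Moses delegates this to dedicated toolkits such as those listed above. Purely to illustrate the idea, the following Python sketch trains a toy bigram model with add-one (Laplace) smoothing on three made-up target-language sentences. The class name `BigramLM`, the smoothing method and the data are assumptions for illustration and are much simpler than what real language model toolkits do.

```python
import math
from collections import Counter

class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()  # how often each word appears as a history
        self.bigrams = Counter()   # how often each (previous, next) pair appears
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.unigrams.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Possible next words: the training vocabulary plus the end-of-sentence marker
        self.vocab_size = len({w for s in sentences for w in s.split()} | {"</s>"})

    def logprob(self, sentence):
        """Natural-log probability of a whole sentence, including <s> and </s>."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total

# Hypothetical monolingual target-language data
TRAINING = ["the house is small", "the book is small", "the house is old"]
lm = BigramLM(TRAINING)
print(lm.logprob("the house is small"))
print(lm.logprob("small is the house"))
```

Both test sentences use the same words, but the fluent order gets a much higher (less negative) log probability. This is the signal the decoder uses to prefer fluent output.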
We need to install Moses before we can train the system and produce translations, so in the next blog post we will see how to install Moses on the Ubuntu operating system.