Before we start, make sure that the packages listed above are installed.
If any of them is missing, you can install it using the command below.
sudo apt-get install <package name>
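To see what is still missing before installing anything, a small check like the one below can help. It is only a sketch: the helper name missing_tools is made up for this post, and it looks for commands on the PATH (git and g++ are used as examples) rather than for apt package names, so adjust the list to your own needs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): print every command from
# the argument list that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example: git and g++ are both needed later in this post.
missing_tools git g++
```

Anything it prints can then be installed with the apt-get command above.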
g++ and Boost are needed to compile Moses. We could already have installed Boost with the command above; below we will see how to install Boost from its source package.
For that, we need to download Boost. You can use the wget command to download it. If you have any trouble downloading it, you can get the latest version directly from https://sourceforge.net/projects/boost/files/boost/. After you have downloaded boost<version>.tar.gz, you can extract it using the following command.
tar zxvf boost<version>.tar.gz
Then go into the Boost folder and run the build script.
./b2 -j5 --prefix=$PWD --libdir=$PWD/lib64 --layout=tagged link=static threading=multi,single install || echo FAILURE
This creates the library files in the lib64 directory, NOT in the system directory.
Note: In the last command, "-j5" tells b2 to run 5 build jobs in parallel. It does not have to match your processor's model name (a Core i5 does not necessarily have 5 cores). Change the value to suit the number of cores on your machine.
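If you are not sure how many cores you have, you can let the system tell you. The sketch below assumes the nproc tool (part of GNU coreutils) is available and falls back to 1 if it is not. It only prints the b2 command with the job count filled in, so you can check it before running it yourself. On a freshly extracted Boost archive, ./bootstrap.sh normally has to be run once first to create b2.

```shell
#!/bin/sh
# Number of CPU cores; falls back to 1 if nproc is not available.
JOBS=$(nproc 2>/dev/null || echo 1)

# Print (do not run) the same b2 command as above with -j filled in.
# \$PWD is escaped so it is shown literally instead of being expanded here.
echo "./b2 -j$JOBS --prefix=\$PWD --libdir=\$PWD/lib64 --layout=tagged link=static threading=multi,single install"
```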
To install Moses, you need to clone it from GitHub. That is why we installed git on our system.
You can clone Moses from https://github.com/moses-smt/mosesdecoder with the command below.
git clone https://github.com/moses-smt/mosesdecoder.git
Then go into the mosesdecoder folder and compile Moses using
make -f contrib/Makefiles/install-dependencies.gmake
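If you built Boost by hand as described earlier, Moses can be pointed at that copy when it is compiled. The Moses documentation does this with its bjam script and the --with-boost option. The sketch below only prints such a command; the helper name moses_build_cmd and the example path are made up for illustration, so replace the path with the folder where you extracted Boost.

```shell
#!/bin/sh
# Hypothetical helper: print a bjam command that builds Moses against
# the Boost folder given as the first argument.
moses_build_cmd() {
    boost_dir=$1
    # Use all available cores; fall back to 1 if nproc is missing.
    jobs=$(nproc 2>/dev/null || echo 1)
    echo "./bjam --with-boost=$boost_dir -j$jobs"
}

# Example with a placeholder path (an assumption, adjust to your setup).
moses_build_cmd "$HOME/boost"
```

Run the printed command from inside the mosesdecoder folder.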
Installing a Word Alignment Tool
Moses requires a word alignment tool, such as GIZA++, MGIZA, or Fast Align. Here I will cover installing GIZA++ and MGIZA. You only need one of them, so pick whichever you want to use for word alignment.
- Installing GIZA++
Untar the package in the folder where you wish to install GIZA++.
tar zxvf giza-pp
Once GIZA++ has been built (by running make inside the giza-pp folder), copy its binaries into a tools folder inside mosesdecoder. This makes things easier when you train the system later.
cd ~/mosesdecoder
mkdir tools
cp ~/giza-pp/GIZA++-v2/GIZA++ ~/giza-pp/GIZA++-v2/snt2cooc.out \
   ~/giza-pp/mkcls-v2/mkcls tools
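The copy step fails with a fairly unhelpful message if GIZA++ has not been compiled yet. As a rough sketch, the same step can be wrapped in a small function that reports exactly which binary is missing. The function name copy_giza_tools is made up here; the file names are the ones used in the cp command above.

```shell
#!/bin/sh
# Hypothetical helper: copy the three GIZA++ binaries from a giza-pp
# folder ($1) into a tools folder ($2), reporting any that are missing.
copy_giza_tools() {
    giza_dir=$1
    tools_dir=$2
    mkdir -p "$tools_dir" || return 1
    status=0
    for bin in GIZA++-v2/GIZA++ GIZA++-v2/snt2cooc.out mkcls-v2/mkcls; do
        if [ -f "$giza_dir/$bin" ]; then
            cp "$giza_dir/$bin" "$tools_dir/"
        else
            echo "missing: $giza_dir/$bin (has make been run?)" >&2
            status=1
        fi
    done
    return $status
}

# Usage, with the same paths as in the commands above:
# copy_giza_tools ~/giza-pp ~/mosesdecoder/tools
```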
- Installing MGIZA
Untar the package in the folder where you wish to install MGIZA.
cmake .
make
make install
It will take some time to install, so you can take a break in the meantime.
You can create a language model using IRSTLM. Language model toolkits perform two main tasks: training and querying. You can train a language model with any of them, produce an ARPA file, and query it with a different one. To train a model, just call the relevant script.
If you want to use SRILM or IRSTLM to query the language model, then they need to be linked with Moses.
Download IRSTLM from http://sourceforge.net/projects/irstlm/ and extract it.
tar zxvf irstlm-<version>.tgz
Fine, now we have installed Moses and the related tools, and we are ready to build a baseline system. In the next post, we will see how to build a baseline system for Tamil to Sinhala translation.