Tuesday, April 18, 2017

Install Moses in Ubuntu

This post focuses on installing Moses and its data processing tools on the Ubuntu operating system. Moses depends on a few other packages, so we will install those first and cover them here as well.

Before we start, make sure that the prerequisite packages are installed.


If any of the packages above is not installed, you can install it with the command below.

sudo apt-get install <package name>
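
Before installing anything, it can help to see which tools are already present. The following is a minimal sketch, not a definitive check: missing_tools is a hypothetical helper name, and the tool list is only the commands this post itself uses, given as command names (which may differ from the Ubuntu package names).

```shell
# Report which of the given commands are not yet on the PATH.
# missing_tools is a hypothetical helper, not part of Moses or Ubuntu.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the command cannot be found
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumption: these are the commands used later in this post.
missing_tools g++ git make cmake wget tar
```

Each name it prints is a command that still needs to be installed; no output means everything in the list was found.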

Compiling Moses requires g++ and Boost. Boost can already be installed with the command above, but below we will see how to build it from source.

Installing Boost

First, download Boost. You can use the wget command to download it. If you have any trouble with that, you can download the latest version directly from https://sourceforge.net/projects/boost/files/boost/. After you have downloaded boost<version>.tar.gz, you can extract it using the following command.

tar zxvf boost<version>.tar.gz
Then go inside the Boost folder, run the bootstrap script to create the b2 build tool, and start the build.

cd boost<version>/
./bootstrap.sh
./b2 -j5 --prefix=$PWD --libdir=$PWD/lib64 --layout=tagged link=static threading=multi,single install || echo FAILURE

This creates the library files in the lib64 directory inside the Boost folder, NOT in the system directory.

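Note: In the last command, " -j5 " sets the number of build jobs that run in parallel. It should be chosen from the number of CPU cores in your machine, not from the processor's model name (a Core i5 does not necessarily have 5 cores). If your machine has a different number of cores, change the value accordingly.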
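
Rather than hard-coding the job count, it can be read from the machine. This is a small sketch assuming the nproc command from GNU coreutils is available (it is on a standard Ubuntu install); JOBS is just an illustrative variable name. The script only prints the resulting b2 command so you can inspect it before running it.

```shell
# Pick the parallel job count from the number of available CPU cores.
# Fall back to a single job if nproc is not available.
JOBS=$(nproc 2>/dev/null || echo 1)

# Print the b2 command with the detected job count (nothing is built here).
echo "./b2 -j${JOBS} --prefix=\$PWD --libdir=\$PWD/lib64 --layout=tagged link=static threading=multi,single install"
```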

Installing Moses

To install Moses, you need to clone it from GitHub. That is why git has to be installed on the system.

You can clone Moses from https://github.com/moses-smt/mosesdecoder with the commands below.
git clone https://github.com/moses-smt/mosesdecoder.git
cd mosesdecoder/
Then run the makefile that ships with the repository, which installs the dependencies Moses needs. After it finishes, Moses itself is compiled with the ./compile.sh script in the same directory.

make -f contrib/Makefiles/install-dependencies.gmake 
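
If you would rather compile Moses against the Boost you built by hand in the previous section, the Moses build tool bjam accepts a --with-boost option pointing at that folder. The sketch below only assembles and prints that command; moses_build_cmd is a hypothetical helper, and the Boost path and job count are placeholders to replace with your own values.

```shell
# Hypothetical helper: print (do not run) the bjam command that compiles
# Moses against a Boost installation in the given directory.
moses_build_cmd() {
    boost_dir=$1      # folder where Boost was built (contains lib64/)
    jobs=${2:-1}      # number of parallel jobs, defaults to 1
    printf './bjam --with-boost=%s -j%s\n' "$boost_dir" "$jobs"
}

# Placeholder path: substitute the real Boost folder name and version.
moses_build_cmd "$HOME/boost<version>" 4
```

Run the printed command from inside the mosesdecoder folder.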

Installing Word Alignment tool

Moses requires a word alignment tool, such as GIZA++, MGIZA, or Fast Align. Here I will cover installing GIZA++ and MGIZA. You only need to install one of them, whichever you want to use for word alignment.

  • Installing GIZA++
You can clone GIZA++ from https://github.com/moses-smt/giza-pp.
Clone the repository into the folder where you wish to install GIZA++, then build it with make.
git clone https://github.com/moses-smt/giza-pp.git
cd giza-pp
make

If you copy the GIZA++ binaries into a tools folder inside mosesdecoder, it is easier to train the system afterward.

cd ~/mosesdecoder
mkdir tools
cp ~/giza-pp/GIZA++-v2/GIZA++ ~/giza-pp/GIZA++-v2/snt2cooc.out \
   ~/giza-pp/mkcls-v2/mkcls tools
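
After copying, it is worth confirming that all three binaries really landed in the tools folder. The check below is a sketch under the assumption that the folder layout matches the commands above; check_giza_tools is a hypothetical helper name.

```shell
# Hypothetical helper: verify that a directory holds the three
# word-alignment binaries (GIZA++, snt2cooc.out, mkcls) as executables.
check_giza_tools() {
    tools_dir=$1
    status=0
    for bin in GIZA++ snt2cooc.out mkcls; do
        if [ ! -x "$tools_dir/$bin" ]; then
            echo "missing or not executable: $tools_dir/$bin"
            status=1
        fi
    done
    return "$status"
}

# Assumption: Moses was cloned into ~/mosesdecoder as in this post.
check_giza_tools "$HOME/mosesdecoder/tools" || echo "copy step incomplete"
```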

  •  Installing MGIZA
You can clone MGIZA from https://github.com/moses-smt/mgiza.

Clone the repository into the folder where you wish to install MGIZA, then build and install it.

cd mgiza/mgizapp
cmake .
make
make install

The build takes some time, so you can take a short break while it runs.

Installing IRSTLM

You can create a language model using IRSTLM. Language model toolkits perform two main tasks: training and querying. You can train a language model with any of these toolkits, produce an ARPA file, and query it with a different one. To train a model, just call the relevant script.
If you want to use SRILM or IRSTLM to query the language model, then they need to be linked with Moses.

Download IRSTLM from http://sourceforge.net/projects/irstlm/, then extract and install it.

tar zxvf irstlm-<version>.tgz
cd irstlm-<version>
./configure --prefix=$HOME/irstlm-<version>
make install
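
To call the IRSTLM programs without typing their full path, you can add the install location to your PATH. This is a sketch that assumes the --prefix used above, so that the executables end up in the bin folder under it; prepend_path is a hypothetical helper, and <version> is the same placeholder as in the commands above.

```shell
# Hypothetical helper: put a directory at the front of PATH,
# unless PATH already contains it.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already listed, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumption: IRSTLM was configured with --prefix=$HOME/irstlm-<version>.
prepend_path "$HOME/irstlm-<version>/bin"
```

Adding these lines to ~/.bashrc keeps the setting for new terminal sessions.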

Now we have installed Moses and its related tools, and we are ready to build a baseline system. In the next post, we will see how to build a baseline system for Tamil to Sinhala translation.
