Today I installed the latest Moses decoder (revision 4274) again.
Many volunteers have contributed to Moses and added new features, so Moses is becoming more and more complicated. As a result, it now has more bugs and incompatibilities, which also makes it harder to install.
I am writing this post to record an installation process that worked, which may be useful for anyone new to Moses:
The first step is to run the command ./regenerate-makefiles.sh:
Detected aclocal: aclocal (GNU automake) 1.11.1
Detected autoconf: autoconf (GNU Autoconf) 2.64
Detected automake: automake (GNU automake) 1.11.1
Detected libtoolize: libtoolize (GNU libtool) 2.2.6
Calling /home/w/wangpd/local/bin/aclocal...
Calling /home/w/wangpd/local/bin/autoconf...
Calling /home/w/wangpd/local/bin/automake...
Calling /home/w/wangpd/local/bin/libtoolize
Detected 16 cores
You should now be able to configure and build:
./configure [--with-srilm=/path/to/srilm] [--with-irstlm=/path/to/irstlm] [--with-randlm=/path/to/randlm] [--without-kenlm] [--with-synlm] [--with-xmlrpc-c=/path/to/xmlrpc-c-config]
make -j 16
The second step is to run the command:
./configure [--with-srilm=/path/to/srilm] [--with-irstlm=/path/to/irstlm] [--with-randlm=/path/to/randlm] [--without-kenlm] [--with-synlm] [--with-xmlrpc-c=/path/to/xmlrpc-c-config]
Here you really need absolute paths for all the options.
The latest Moses requires IRSTLM version 1.70.01 or newer; with an older version the build fails (I tried 1.50, and it failed). Another important point is that you must complete the IRSTLM installation, which means you need to run:
bash regenerate-makefiles.sh
# set parameter force to the value "--force" if you want to recreate all links to the autotools
./configure --prefix=$PWD
# run "configure --help" to get more details on the compilation options
make
make install
in the root directory of IRSTLM. Note that the last command, make install, is required: I tried skipping it and the installation failed.
The last step is to run make -j 4.
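The requirement for absolute paths in the second step is easy to get wrong when the language model toolkits live next to the Moses checkout. The following is a minimal sketch, not part of Moses, that expands relative or home-relative locations into absolute paths and prints a ./configure command line. The function name and the example directories are made up for illustration; only the option names come from the output above.

```python
import os
import shlex

def configure_command(options):
    """Build a ./configure command line whose option values are absolute paths."""
    parts = ["./configure"]
    for flag, path in options.items():
        # Expand "~" and resolve relative paths against the current directory.
        abs_path = os.path.abspath(os.path.expanduser(path))
        parts.append(f"{flag}={abs_path}")
    # Quote each argument so paths containing spaces survive the shell.
    return " ".join(shlex.quote(p) for p in parts)

# Hypothetical install locations; replace them with your own.
cmd = configure_command({
    "--with-irstlm": "../irstlm",
    "--with-srilm": "~/tools/srilm",
})
print(cmd)
```

The printed line can be pasted into the shell from the Moses root directory.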
Monday, September 26, 2011
Friday, September 16, 2011
An error in Moses when decoding large lattices
Once, when I was using Moses to decode a large lattice, I got the following error:
ERROR: Jump length 32 in word lattice exceeds maximum phrase length 20.
ERROR: Increase max-phrase-length to process this lattice.
After looking at the input lattice, I found a node with an edge that jumped to the 32nd node after it.
Following the error message, I fixed the problem by passing the -max-phrase-length 35 option to the moses decoder.
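To choose a value before decoding, the largest jump in a lattice can be measured ahead of time. This is a minimal sketch, assuming the lattice is written in PLF (the Python lattice format that Moses reads), where each node is a tuple of edges and each edge ends with its jump distance. The function name and the sample lattice are made up for illustration.

```python
import ast

def max_jump(plf_line):
    """Return the largest jump distance on any edge of one PLF lattice."""
    lattice = ast.literal_eval(plf_line.strip())
    # Each node is a tuple of edges; the jump distance is the last field of an edge.
    return max(edge[-1] for node in lattice for edge in node)

# Hypothetical three-node lattice; the edge 'b' skips one node (jump of 2).
sample = "((('a', 1.0, 1), ('b', 0.5, 2),), (('c', 1.0, 1),), (('d', 1.0, 1),),)"
print(max_jump(sample))
```

Judging by the error message, a -max-phrase-length at least as large as the reported maximum should be enough for that lattice.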
Wednesday, September 7, 2011
A bug in the Moses tokenizer
When you set the -l option to an unknown language, the Moses tokenizer says it will fall back to English. However, it does not fall back completely. It falls back to English only for handling periods (.); single quotation marks (') are tokenized differently from the English case.
For example, given the input "I'm a boy.", if you set -l en or omit the -l option, the output is "I 'm a boy ."; if you set -l abc, which is not a known language code, the output is "I ' m a boy .".
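The difference can be reproduced with a toy model. The sketch below is not the Moses tokenizer; it is a simplified illustration, under the assumption that the English branch keeps the apostrophe attached to the following letters while every other language code gets a generic rule that isolates the apostrophe. The function name is hypothetical.

```python
import re

def toy_tokenize(text, lang="en"):
    """Toy illustration of the two apostrophe behaviours; not the real tokenizer."""
    if lang == "en":
        # English rule: split before the apostrophe, keep it with what follows.
        text = re.sub(r"([A-Za-z])'([A-Za-z])", r"\1 '\2", text)
    else:
        # Generic rule: isolate every apostrophe as its own token.
        text = re.sub(r"'", " ' ", text)
    # The sentence-final period is split off in both cases.
    text = re.sub(r"\.$", " .", text)
    return " ".join(text.split())

print(toy_tokenize("I'm a boy.", "en"))   # I 'm a boy .
print(toy_tokenize("I'm a boy.", "abc"))  # I ' m a boy .
```

A workaround suggested by this behaviour is to pass -l en explicitly whenever English-style apostrophe handling is wanted.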
Sunday, September 4, 2011
Linux: compare text files at the word level
wdiff is a good choice. It is available online:
http://www.gnu.org/s/wdiff/
All you need to do is download it and compile it.
Monday, July 25, 2011
SRILM prunes n-grams when n>=3 by default
Recently, I used the ngram-count tool of SRILM to extract the n-grams of a corpus.
However, I found that when n>=3, the tool discards low-frequency n-grams by default.
In fact, the full set of n-grams can be obtained with the tool's -write option, which is the better choice if you only care about the n-grams and their counts, not the probabilities.
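For a quick cross-check of what a complete count list should contain, the n-grams can also be counted directly. This is a minimal sketch in plain Python, not SRILM: it applies no frequency cutoff and, unlike SRILM, adds no sentence-boundary markers. The function name and the two-line corpus are made up for illustration.

```python
from collections import Counter

def count_ngrams(lines, n):
    """Count every n-gram in whitespace-tokenized lines, with no frequency cutoff."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Hypothetical corpus: "a b c" occurs twice, the other trigrams once.
corpus = ["a b c d", "a b c e"]
trigrams = count_ngrams(corpus, 3)
for ngram, freq in sorted(trigrams.items()):
    print(" ".join(ngram), freq)
```

Trigrams that occur only once, such as "b c d" here, are exactly the kind of entry that a frequency cutoff would drop.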
Tuesday, July 5, 2011
Does sed support lookahead or lookbehind on Linux?
After investigating for a while, I found that sed does not support lookahead or lookbehind assertions.
According to http://sed.sourceforge.net/sedfaq6.html, a modified sed named ssed supports them in its Perl mode.
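When ssed is not available, any tool with Perl-style regular expressions can do the same substitutions. A minimal sketch using Python's re module, which supports both assertions; the input strings are made up for illustration.

```python
import re

# Lookbehind: replace "bar" only when it directly follows "foo".
behind = re.sub(r"(?<=foo)bar", "X", "foobar bazbar")
print(behind)  # fooX bazbar

# Lookahead: replace "foo" only when it is directly followed by "bar".
ahead = re.sub(r"foo(?=bar)", "Y", "foobar foobaz")
print(ahead)  # Ybar foobaz
```

In both cases the asserted context ("foo" or "bar") is checked but not consumed, so it is left untouched in the output.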
Tuesday, June 28, 2011
Bug in the Moses MERT script when tuning on lattices
I am using a recent version of Moses to decode lattices, but when I used the MERT script (mert-moses.pl) to tune on lattice input, I got the following error:
Can't use an undefined value as an ARRAY reference at mert-moses.pl line 684.
After investigating for a while, I realized that mert-moses.pl does not recognize the input lattice weight ([weight-i]) as a weight to be tuned. I therefore read through the script and changed the subroutine scan_config: after the line "die "$inishortname: File was empty!" if !$nr;" I added the following code:
################################################
# wangpd
foreach my $k (keys %$config_weights)
{
    if (!defined $used_triples{$k})
    {
        my @triplets = @{$additional_triples->{$k}};
        my $needlambdas = scalar(@{$config_weights->{$k}});
        for(my $lambda=0; $lambda<$needlambdas; $lambda++)
        {
            my $triplet = $lambda;
            $triplet %= scalar(@triplets)
                if $additional_tripes_loop->{$k};
            my ($start, $min, $max)
                = @{$triplets[$triplet]};
            push @{$used_triples{$k}}, [$start, $min, $max];
        }
    }
}
#################################################
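The logic of the patch is easier to follow outside the script. The sketch below restates it in Python under the assumption that the four structures behave like plain dictionaries and lists; it is an illustration of the loop, not a replacement for the Perl code. The function name and the sample values are hypothetical.

```python
def fill_missing_triples(config_weights, used_triples, additional_triples, loop_keys):
    """For each weight name without tuning ranges, copy (start, min, max) defaults."""
    for key, weights in config_weights.items():
        if key in used_triples:
            continue  # ranges for this weight were already set
        triples = additional_triples[key]
        for i in range(len(weights)):
            # Reuse the default ranges cyclically only for names marked as looping.
            index = i % len(triples) if key in loop_keys else i
            used_triples.setdefault(key, []).append(list(triples[index]))
    return used_triples

# Hypothetical values: two lattice weights named "I", one default range to reuse.
result = fill_missing_triples(
    config_weights={"I": [0.1, 0.2], "d": [0.3]},
    used_triples={"d": [[1.0, 0.0, 2.0]]},
    additional_triples={"I": [[1.0, 0.0, 2.0]]},
    loop_keys={"I"},
)
print(result["I"])
```

Each weight found in the configuration but missing from the tuning ranges ends up with one (start, min, max) entry per value, which is what the failing array dereference expected to find.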