Monday, July 25, 2011

SRILM prunes n-grams when n>=3 by default

Recently, I used the ngram-count tool of SRILM to extract the n-grams of a corpus.

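However, I found that when n>=3, the tool discards low-frequency n-grams by default when it estimates a language model. The minimum counts can be changed with the -gtNmin options (for example, -gt3min 1).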
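To make the effect concrete, here is a toy Python sketch of what a per-order minimum count does to a set of n-gram counts. It is not SRILM's implementation: the function names and the tiny corpus are made up for illustration, the cutoff values (1 for unigrams and bigrams, 2 for trigrams) are an assumption meant to mirror the defaults described in the ngram-count documentation, and sentence-boundary tags are left out for simplicity.

```python
from collections import Counter


def count_ngrams(sentences, order):
    """Count every n-gram of length 1..order, with no cutoff applied."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def apply_cutoffs(counts, min_count_by_order):
    """Keep only n-grams whose count reaches the minimum for their order."""
    return {
        ngram: c
        for ngram, c in counts.items()
        if c >= min_count_by_order.get(len(ngram), 1)
    }


# Hypothetical two-sentence corpus: every trigram occurs exactly once.
corpus = ["a b c", "a b d"]
raw = count_ngrams(corpus, order=3)

# Assumed cutoffs for illustration: trigrams need at least two occurrences.
kept = apply_cutoffs(raw, {1: 1, 2: 1, 3: 2})
```

With these cutoffs, all unigrams and bigrams survive, but both trigrams are dropped because each one occurs only once.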

In fact, we can get the n-grams with the -write option of the tool, which writes the n-gram counts to a file. This is a better choice if you only care about the n-grams and not the probabilities.
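As a rough sketch, the call could be wrapped in Python like this. The -text, -order and -write options are the ones from ngram-count; the helper names and file paths are hypothetical, and the parser simply assumes that each line of the counts file holds the n-gram's words followed by its count as the last field.

```python
import subprocess


def build_count_command(corpus_path, counts_path, order=3):
    """Build the argument list for ngram-count so it only writes counts."""
    return [
        "ngram-count",
        "-text", corpus_path,    # input corpus, one sentence per line
        "-order", str(order),    # maximum n-gram length to count
        "-write", counts_path,   # output file for the n-gram counts
    ]


def parse_count_line(line):
    """Split one counts-file line into (n-gram tuple, count).

    Assumes the count is the last whitespace-separated field.
    """
    *words, count = line.split()
    return tuple(words), int(count)


def write_counts(corpus_path, counts_path, order=3):
    """Run ngram-count; requires SRILM to be installed and on PATH."""
    subprocess.run(
        build_count_command(corpus_path, counts_path, order), check=True
    )
```

For example, write_counts("corpus.txt", "counts.txt", order=3) would run the tool on a hypothetical corpus.txt, after which each line of counts.txt can be passed to parse_count_line.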
