Recently, I used SRILM's ngram-count tool to find the n-grams of a corpus.
However, I found that when n >= 3, the language model the tool builds leaves out low-frequency n-grams by default.
In fact, we can get the n-grams with the tool's -write option, which writes out the n-gram counts themselves. This is the better choice if you only care about the n-grams and not their probabilities.
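To make clear what "all n-grams, no cutoff" means, here is a minimal Python sketch of plain n-gram counting. It is not a reimplementation of ngram-count. The function names, the toy corpus, the <s> and </s> sentence markers, and the "words, tab, count" line layout are all assumptions chosen for illustration.

```python
from collections import Counter


def count_ngrams(sentences, max_order=3):
    """Count every n-gram of order 1..max_order, with no frequency cutoff."""
    counts = Counter()
    for sentence in sentences:
        # Sentence boundary markers are an assumption of this sketch.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def format_counts(counts):
    """Render counts as 'w1 w2 ... wn<TAB>count' lines (assumed layout)."""
    return [f"{' '.join(ngram)}\t{count}" for ngram, count in sorted(counts.items())]


# Hypothetical two-sentence corpus, whitespace-tokenized.
corpus = ["the cat sat", "the cat ran"]
counts = count_ngrams(corpus, max_order=3)

for line in format_counts(counts):
    print(line)
```

Trigrams that occur only once, such as "the cat sat", stay in the output here. Those are exactly the entries that a minimum-count threshold would drop.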
Monday, July 25, 2011