Recently, I used the ngram-count tool from SRILM to extract the n-grams of a corpus.
However, I found that when n >= 3, the tool discards low-frequency n-grams by default.
In fact, the -write option of the tool writes the n-gram counts to a file, which is the better choice if you only care about the n-grams and not the probabilities.
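To make "only the n-grams, not the probabilities" concrete, here is a minimal Python sketch of plain n-gram counting with no minimum-frequency cutoff. It is not SRILM and does not reproduce its output format or sentence-boundary handling; the function name `count_ngrams` and the two-line sample corpus are made up for illustration.

```python
from collections import Counter


def count_ngrams(lines, n):
    """Count every n-gram of order n, keeping even those seen only once."""
    counts = Counter()
    for line in lines:
        tokens = line.split()  # whitespace tokenization (an assumption)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy corpus, one sentence per line.
corpus = ["the cat sat on the mat", "the cat sat down"]
trigram_counts = count_ngrams(corpus, 3)

for ngram, count in sorted(trigram_counts.items()):
    print(" ".join(ngram), count)
```

Every trigram is kept here, including those that occur only once, which is exactly the information that gets lost when low-frequency n-grams are discarded.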
Monday, July 25, 2011
Tuesday, July 5, 2011
Does sed support lookahead or lookbehind on Linux?
After investigating for a while, I found that sed does not support lookahead or lookbehind assertions.
According to http://sed.sourceforge.net/sedfaq6.html, a modified sed named ssed supports them in its Perl mode.
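For readers who have not used these assertions, the sketch below shows what lookahead and lookbehind do, using Python's `re` module as a stand-in, since plain sed cannot express them. The sample string and variable names are invented for illustration; this is not a sed or ssed command.

```python
import re

# Hypothetical sample text.
text = "price: 100USD, weight: 100kg"

# Lookahead: match digits only when they are followed by "USD".
# The "USD" itself is not part of the match, so it is left in place.
lookahead = re.sub(r"\d+(?=USD)", "N", text)

# Lookbehind: match digits only when they are preceded by "weight: ".
lookbehind = re.sub(r"(?<=weight: )\d+", "N", text)

print(lookahead)   # price: NUSD, weight: 100kg
print(lookbehind)  # price: 100USD, weight: Nkg
```

In both cases the surrounding context decides whether the digits match, but only the digits are replaced.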