Wang Pidong's Homepage
Saturday, February 4, 2012
Perl bug: splitting a UTF-8-encoded Chinese string
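I found a bug in Perl when I used the regular expression /\s+/ to split the Chinese string "我想去你家,可以吗?我还想去月球,你想去吗?" ("I want to go to your home, is that OK? I also want to go to the moon, do you want to go?"), which was encoded in UTF-8. The string contains no whitespace at all, so the split should have returned it as a single field.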
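The post does not include the script or its output, so the following is only a sketch of one well-known mechanism that produces this kind of failure. It is written in Python because that makes the bytes easy to inspect, and it assumes the string reached `split` as undecoded UTF-8 bytes that were later treated one-to-one as Latin-1 characters, which is what Perl does when it upgrades a byte string (the behavior perlunicode calls "The Unicode Bug"). The character 你 is encoded as the bytes E4 BD A0, and the byte 0xA0 read as Latin-1 is U+00A0 NO-BREAK SPACE, which a Unicode-aware `\s` matches. The variable names are illustrative only.

```python
import re

# Test string from the post.
text = "我想去你家,可以吗?我还想去月球,你想去吗?"

# Assumption: the script never decoded its input, so it held raw UTF-8 bytes...
raw = text.encode("utf-8")

# ...and those bytes were later reinterpreted one-to-one as Latin-1 characters
# (roughly what Perl does when it upgrades a byte string internally).
upgraded = raw.decode("latin-1")

# 你 (U+4F60) is encoded as E4 BD A0; the byte A0 becomes U+00A0 NO-BREAK SPACE.
print("你".encode("utf-8").hex())  # e4bda0

# A Unicode-aware \s matches U+00A0, so the split lands inside both 你 characters.
broken = re.split(r"\s+", upgraded)
print(len(broken))  # 3

# The first field ends with the dangling bytes E4 BD: no longer valid UTF-8.
print(broken[0].encode("latin-1").endswith(b"\xe4\xbd"))  # True

# Splitting the decoded text behaves as expected: no whitespace, one field.
fixed = re.split(r"\s+", text)
print(len(fixed))  # 1
```

In Perl, the usual remedy is to split characters rather than bytes: decode the input first, for example with `Encode::decode('UTF-8', $bytes)`, or add `use utf8;` when the string is a literal in a UTF-8 source file.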
About Me
Pidong WANG
Palo Alto, CA, United States
Currently a Senior NLP Engineer