Kenneth Heafield's scripts make it easy to score machine translation output using NIST's BLEU and NIST metrics, TER, and METEOR.
Prerequisites: bash, ruby, java
CAUTION: you have to run "export LC_ALL=" (i.e. leave LC_ALL empty) for the scripts to work; "export LC_ALL=C" makes them crash.
Setup: run the ./setup.sh script, which automatically downloads the necessary parts from the Internet.
Running:
./score.rb --print --print-header --ref tokenized-reference.txt --hyp-detok tokenized-system-outputs.txt
Thursday, May 23, 2013
Friday, January 18, 2013
tools for Java memory usage analysis
1. jconsole
When you run a Java program, you can use jconsole to monitor the program's overall memory usage and the status of each of its threads. Note that this tool cannot tell you how much memory each object in your Java program uses. jconsole needs a graphical interface, so if you want to run it on Linux, make sure you have X Window support. For instance, the following command starts monitoring a running Java program with process ID 15120:
jconsole 15120
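jconsole gets its figures from the JVM's standard management beans, so a program can also log the same general numbers itself. The following is only a minimal sketch of that idea, not part of jconsole; the class name MemorySnapshot and its method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: prints the overall heap usage and thread count of the current JVM.
public class MemorySnapshot {
    /** Bytes of heap currently in use, as reported by the JVM. */
    public static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Number of live threads (daemon and non-daemon). */
    public static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + usedHeapBytes());
        System.out.println("live threads:      " + liveThreadCount());
    }
}
```

Like jconsole, this only gives totals; it says nothing about which objects occupy the heap.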
2. jmap
This tool dumps the heap of a Java virtual machine while it is running a Java program. The dumped heap file can then be analyzed by other tools, e.g. the Eclipse memory analyzer. For instance, the following command dumps the heap of the running Java program with process ID 909 into a file named heap-file.bin:
jmap -dump:format=b,file=heap-file.bin 909
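If you would rather trigger the dump from inside the program, HotSpot-based JVMs expose a diagnostic bean for it. The sketch below assumes such a JVM (on other JVMs the bean may be missing); the class name HeapDumper and the temporary file location are choices made for this example only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a heap dump of the current JVM, similar in spirit to jmap -dump.
public class HeapDumper {
    /** Dumps the heap to a fresh temporary .hprof file and returns its path. */
    public static Path dumpToTempFile(boolean liveObjectsOnly) {
        try {
            // A new directory guarantees the target file does not exist yet,
            // which dumpHeap requires.
            Path dir = Files.createTempDirectory("heapdump");
            Path file = dir.resolve("heap-file.hprof");
            // Assumption: a HotSpot JVM; elsewhere this lookup may fail.
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), liveObjectsOnly);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = dumpToTempFile(true);
        System.out.println("heap dump written to " + file);
    }
}
```

Passing true keeps only objects that are still reachable, which usually makes the file smaller.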
3. Eclipse memory analyzer
This tool is quite useful for hunting memory leaks or finding out why a program takes so much memory, because it can identify the most memory-consuming objects in your Java program. It needs a heap dump to start its analysis. The heap dump can be obtained using jmap. Alternatively, when you run your Java program, you can add -XX:+HeapDumpOnOutOfMemoryError as a parameter to the Java virtual machine. As a result, when the Java program runs out of memory, the Java virtual machine automatically dumps its heap into the current working directory (the dump file has a name like java_pid15040.hprof).
Unfortunately, the Eclipse memory analyzer needs a lot of memory to analyze a heap dump file (e.g. it usually needs about 7GB of memory to analyze a 2GB heap dump file). Thus, you usually need to increase the heap size of Eclipse itself (not the heap size of the Java program running in Eclipse) via the -Xmx setting in your eclipse.ini file, which is usually located in the same folder as the eclipse executable.
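To try the flag and the analyzer on something small, a deliberately leaky program is handy. This is a toy sketch; the class name LeakDemo and the 64 MB heap limit in the comment are arbitrary choices for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: keeps every allocation reachable until the heap is exhausted.
public class LeakDemo {
    // Everything added here stays reachable, so the garbage collector can never free it.
    private static final List<byte[]> RETAINED = new ArrayList<>();
    private static long retainedBytes = 0;

    /** Allocates and keeps `chunks` arrays of `chunkBytes` each; returns the total bytes retained so far. */
    public static long retain(int chunks, int chunkBytes) {
        for (int i = 0; i < chunks; i++) {
            RETAINED.add(new byte[chunkBytes]);
            retainedBytes += chunkBytes;
        }
        return retainedBytes;
    }

    public static void main(String[] args) {
        // Run with: java -Xmx64m -XX:+HeapDumpOnOutOfMemoryError LeakDemo
        while (true) {
            retain(1, 1024 * 1024); // 1 MB per iteration until OutOfMemoryError
        }
    }
}
```

When the resulting .hprof file is opened in the analyzer, the static list in LeakDemo should show up as the dominant memory consumer.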
Wednesday, December 19, 2012
How to display Chinese in Graphviz
Graphviz is a very handy tool for drawing plots and figures.
However, it is not straightforward to display Chinese characters in the generated plots.
One example is as follows using DOT:
digraph G {
    node [shape=box,style=dashed,height=0.3,fontname="C:\Windows\Fonts\NSimSun Regular.ttf",fontsize=12];
    "你好";
    "whr you are &#8704;";
}
Here the Chinese characters are written directly in the UTF-8 encoded source file; alternatively, you can write a character as an XML-style numeric reference such as "&#8704;" (i.e. ∀). More importantly, you need to specify a Chinese font file so that Graphviz can actually render the Chinese characters, since by default Graphviz often fails to find a suitable font for them.
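The numeric form is simply the character's Unicode code point written in decimal between "&#" and ";". If the DOT source is generated by a program, the conversion can be sketched as follows; DotLabel and toNumericEntities are hypothetical names used only for this illustration.

```java
// Hypothetical helper: rewrites non-ASCII characters as decimal numeric references.
public class DotLabel {
    /** Replaces every non-ASCII code point with a reference such as &#8704; and leaves ASCII untouched. */
    public static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The escapes spell the Chinese greeting and the "for all" sign from the example,
        // so this source file itself stays pure ASCII.
        System.out.println(toNumericEntities("\u4f60\u597d"));          // prints &#20320;&#22909;
        System.out.println(toNumericEntities("whr you are \u2200"));    // prints whr you are &#8704;
    }
}
```

This form is useful when the DOT file cannot be saved as UTF-8; the font still has to be specified either way.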
Tuesday, December 18, 2012
How to use JDB on Linux
JDB is quite a powerful debugging tool for Java programs, especially for multi-threaded Java programs.
JDB can be found in the JDK package.
You can also learn how to use JDB by reading the manual of JDB (using command "man jdb").
Here I only show the useful parts that I found:
(1) Run your Java program as usual with the additional option "-agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n".
(2) Start JDB using the command "jdb -attach 8000".
(3) In JDB, first use "suspend" to suspend your Java program, and then use "threads" to see its thread list. If you want to see what code a thread is running, use "where 0x22" (0x22 is the thread ID taken from the thread list). After you finish debugging, use "resume" to resume your Java program.
(4) If you want to exit JDB and let your Java program keep running, simply press Control-C.
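To practice the steps above, you need a program with a few threads to look at. The following is a small made-up target (the class name JdbTarget and the thread names are arbitrary); it does nothing except keep some named threads alive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical practice target: a few named threads that sleep in a loop.
public class JdbTarget {
    /** Starts n named threads that loop forever, so there is something to inspect. */
    public static List<Thread> startWorkers(int n) {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(JdbTarget::work, "worker-" + i);
            t.setDaemon(true); // daemon threads do not block JVM shutdown
            t.start();
            workers.add(t);
        }
        return workers;
    }

    private static void work() {
        try {
            while (true) {
                Thread.sleep(1000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly when interrupted
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Run with:
        //   java -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n JdbTarget
        List<Thread> workers = startWorkers(3);
        System.out.println("started " + workers.size() + " workers; attach with: jdb -attach 8000");
        workers.get(0).join(); // never returns, which keeps the JVM alive
    }
}
```

After attaching, "threads" should list worker-0 to worker-2 alongside the JVM's own threads, and "where" on one of them should show it sleeping inside JdbTarget.work.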
Friday, November 2, 2012
How to set the priority order of jar files in Eclipse
If you have multiple jar files in your Eclipse project, you can run into problems when some of them contain classes with the same name. In that case the priority order of the jar files matters, because the class from the first matching jar is the one that gets used.
How to set the priority order of your jar files in Eclipse?
Right-click your project, and then click the Properties menu.
Select "Java Build Path" on the left of the popup window.
Then open the "Order and Export" tab, which shows the order of the jar files and source folders. Use the Up and Down buttons to reorder them; entries nearer the top take priority.
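To check which jar actually wins for a given class, you can ask the JVM where it loaded the class from. This is a minimal sketch; WhichJar is a made-up class name, and the class to look up is passed as a command-line argument.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper: reports the jar or folder a class was loaded from.
public class WhichJar {
    /** Returns the load location of the class, or a placeholder if the JVM does not report one. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null) {
            // Typical for core classes such as java.lang.String.
            return "(bootstrap or unknown)";
        }
        URL location = source.getLocation();
        return location == null ? "(bootstrap or unknown)" : location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to this class itself.
        String className = args.length > 0 ? args[0] : "WhichJar";
        System.out.println(className + " <- " + locationOf(Class.forName(className)));
    }
}
```

Running it from the Eclipse project before and after changing the order is a quick way to confirm that the change had the intended effect.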
Thursday, September 20, 2012
How To Open a Command Prompt in Windows 8
Windows 8 was released recently, and some people complain about it because it changes the way users are used to working with Windows.
However, I really like it, for the simple reason that it integrates nearly all the Microsoft products, e.g. Windows on PCs, Windows on phones, and also Xbox.
One of the new features that I find useful is that in the Windows 8 file explorer you can open a command prompt in the current folder by simply clicking the menu File -> Open command prompt, which is something I have been waiting for for a long time.
Tuesday, September 18, 2012
Useful add-ons of Firefox
Firebug 1.9
A useful tool for inspecting the HTML structure of web pages.
HttpFox
A tool for inspecting the HTTP requests and responses sent from/to Firefox.