Recently, I came across the following sentence-level alignment tools for statistical machine translation (SMT). Given parallel documents, these tools pair up sentences that have the same meaning in different languages. Sentence alignment is also the first step in building an SMT system.
(1) CTK: Champollion Tool Kit
http://champollion.sourceforge.net/
Note: this tool (from LDC) uses translation lexicons to align sentences. One disadvantage is that it does not work well when the two documents differ greatly in the number of sentences.
CTK v1.2 supports three language pairs (English-Chinese comes in two encodings):
English Chinese(GB)
English Chinese(UTF8)
English Arabic (UTF8)
English Hindi (UTF8)
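To illustrate the lexicon-based idea, here is a minimal Python sketch. It is not Champollion's actual algorithm: it only scores a candidate sentence pair by the fraction of source words that have a lexicon translation in the target sentence. The function name, the whitespace tokenization, and the toy English-Spanish lexicon are all assumptions made for illustration.

```python
def lexicon_overlap(src_sentence, tgt_sentence, lexicon):
    """Fraction of source tokens with at least one lexicon translation in the target."""
    src_tokens = src_sentence.lower().split()
    tgt_tokens = set(tgt_sentence.lower().split())
    if not src_tokens:
        return 0.0
    matched = sum(
        1 for tok in src_tokens
        if lexicon.get(tok, set()) & tgt_tokens
    )
    return matched / len(src_tokens)


# Toy lexicon (hypothetical; a real translation lexicon would be far larger).
toy_lexicon = {
    "the": {"el", "la"},
    "cat": {"gato"},
    "sleeps": {"duerme"},
    "dog": {"perro"},
}

good = lexicon_overlap("The cat sleeps", "El gato duerme", toy_lexicon)  # 1.0
bad = lexicon_overlap("The cat sleeps", "El perro ladra", toy_lexicon)   # 1/3
```

A real aligner would combine scores like these with a search over possible sentence groupings, so that one sentence can also be aligned to two or to none.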
(2) Gale-Church Aligner
This is a long-established sentence-level alignment algorithm based on sentence lengths, and fortunately Chris Crowner has implemented it for NLTK.
http://code.google.com/p/nltk/source/browse/trunk/nltk_contrib/nltk_contrib/align/align.py?r=8552&spec=svn8552
Note that the Python code is in nltk_contrib, not in the main NLTK release.
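For readers who want to see the idea without installing anything, below is a rough, self-contained sketch of length-based alignment in the spirit of Gale-Church. It is not the nltk_contrib code. The prior probabilities and the variance are the values commonly quoted for the original algorithm, and using the mean of the two lengths in the denominator is a simplification; treat all of them, and the function names, as assumptions.

```python
import math

# Assumed prior probability of each bead type: (source sentences, target sentences).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
MEAN_RATIO = 1.0  # assumed expected target characters per source character
VARIANCE = 6.8    # assumed variance per character


def match_cost(src_len, tgt_len, prior):
    """Negative log probability of pairing blocks with these character lengths."""
    mean = (src_len + tgt_len / MEAN_RATIO) / 2.0
    if mean == 0:
        delta = 0.0
    else:
        delta = (tgt_len - src_len * MEAN_RATIO) / math.sqrt(mean * VARIANCE)
    # Two-sided tail probability of a standard normal, floored to avoid log(0).
    tail = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)
    return -math.log(tail) - math.log(prior)


def align_by_length(src_sents, tgt_sents):
    """Return beads as (source indices, target indices), using lengths only."""
    src_lens = [len(s) for s in src_sents]
    tgt_lens = [len(t) for t in tgt_sents]
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward dynamic programming over all (source, target) prefixes.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + match_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), prior)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end of both documents.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


src = ["Hello world.", "How are you today?", "Fine."]
tgt = ["Bonjour le monde.", "Comment allez-vous aujourd'hui ?", "Bien."]
beads = align_by_length(src, tgt)
```

Each bead lists the source and target sentence indices that belong together, so a 2-1 bead looks like ([3, 4], [3]). Because only character lengths are used, the method needs no lexicon, which is also why it can drift on noisy documents.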
(3) MTTK: Machine Translation Toolkit
http://mi.eng.cam.ac.uk/~wjb31/distrib/mttkv1/
Note: this tool is supposed to support sentence-level alignment, but I have not yet figured out how to do it with this tool.
(4) Align
http://www.cse.unt.edu/~rada/wa/tools/aberger/align.html
Note: this tool was developed by Adam Berger and can be downloaded from:
http://www.cse.unt.edu/~rada/wa/tools/aberger/align.tar
It supports sentence-level alignment using anchor labels.
(5) Bleualign
https://github.com/rsennrich/Bleualign
This tool requires an automatic translation of one side of the unaligned corpus and then uses a modified BLEU score to find the sentence-level alignments. Of course, you need a seed SMT system to generate the automatic translations. The tool is written in Python.
One problem I found with this aligner is that it can use the same target-side sentence multiple times in the output alignments.
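One way to spot this problem is to post-process the alignments. The sketch below assumes the alignments have already been loaded as (source index, target index) pairs. That in-memory format and the function name are hypothetical and are not Bleualign's own output format.

```python
from collections import defaultdict


def find_reused_targets(pairs):
    """Map each target index used more than once to the source indices using it."""
    by_target = defaultdict(list)
    for src_idx, tgt_idx in pairs:
        by_target[tgt_idx].append(src_idx)
    return {t: srcs for t, srcs in by_target.items() if len(srcs) > 1}


# Hypothetical alignment output: target sentence 1 is used by sources 1 and 2.
pairs = [(0, 0), (1, 1), (2, 1), (3, 2)]
reused = find_reused_targets(pairs)  # {1: [1, 2]}
```

Whether a reported reuse is an error or an intended many-to-one alignment still has to be judged case by case.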
(6) Microsoft Bilingual Sentence Aligner
https://www.microsoft.com/en-us/download/details.aspx?id=52608
This is a sentence aligner written in Perl. It aligns sentences using sentence length together with word correspondences learned from the corpus itself.
Tuesday, May 8, 2012
Comments:
Can anyone help me with Python code for translating from Arabic to English, please?
To the best of my knowledge, there is no statistical machine translation decoder written in Python so far, so you are better off using Moses to build your translation system. Before building the system, you need to prepare parallel Arabic-English training data. One free way to get training data is from open parallel corpora, e.g. OPUS.
How do I get an access_token for iOS and Android devices? Any translation code snippet for iOS would help me a lot.
Hello Noha,
Please check the Kriya decoder, which is an implementation of a hierarchical phrase-based (Hiero) SMT system. It is implemented entirely in Python and includes both grammar extractor and decoder modules.
Please see the PBML paper for technical details specific to this implementation:
Baskaran Sankaran, Majid Razmara and Anoop Sarkar. 2012. Kriya – An end-to-end Hierarchical Phrase-based MT System. The Prague Bulletin of Mathematical Linguistics (PBML), (97), 83--98.
And there's also Bob Moore's excellent "Bilingual Sentence Aligner".
Currently residing at
http://research.microsoft.com/en-us/downloads/aafd5dcf-4dcc-49b2-8a22-f7055113e656/
though Microsoft seems to change download links.