Thursday, January 19, 2012

Moses phrase-based decoder analysis

(1). Start from moses-cmd/src/Main.cpp, at int main(int argc, char* argv[])

(2). Main.cpp first calls parameter->LoadParam(argc, argv) to load and check the parameters given in the moses.ini configuration file and on the command line; no model files are loaded at this stage
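
To make the "parse and check first, load models later" idea in step (2) concrete, here is a minimal sketch. It is not the real Parameter class: ToyParameter, its methods, and the choice of required settings are all hypothetical, and it reads only command-line style tokens (the moses.ini part is left out). The point is only that this phase collects and validates names and values and never opens a model file.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parameter holder; NOT the Moses Parameter class.
class ToyParameter {
public:
  // Parses tokens of the form "-name value value ..." and validates them.
  // Only strings are stored here; no model file is opened.
  bool LoadParam(const std::vector<std::string> &args) {
    m_settings.clear();
    std::string current;
    for (const std::string &arg : args) {
      if (IsSwitch(arg)) {
        current = arg.substr(1);
        assert(!current.empty());  // guaranteed by IsSwitch (size > 1)
        m_settings[current];       // register the switch even without values
      } else if (current.empty()) {
        return false;              // a value appeared before any switch
      } else {
        m_settings[current].push_back(arg);
      }
    }
    return Validate();
  }

  // Returns the values stored under a name, or an empty list if absent.
  const std::vector<std::string> &GetParam(const std::string &name) const {
    static const std::vector<std::string> empty;
    std::map<std::string, std::vector<std::string> >::const_iterator it =
        m_settings.find(name);
    return it == m_settings.end() ? empty : it->second;
  }

private:
  // "-0.5" and "-.5" are treated as values (negative numbers), not switches.
  static bool IsSwitch(const std::string &arg) {
    return arg.size() > 1 && arg[0] == '-' && arg[1] != '.' &&
           !std::isdigit(static_cast<unsigned char>(arg[1]));
  }

  // Assumption for this sketch: a phrase table and a language model are required.
  bool Validate() const {
    return !GetParam("ttable-file").empty() && !GetParam("lmodel-file").empty();
  }

  std::map<std::string, std::vector<std::string> > m_settings;
};
```

In this sketch, LoadParam returns false as soon as a required setting is missing, so a later loading step (the counterpart of step (3)) can assume the settings are complete.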

(3). Main.cpp then calls StaticData::LoadDataStatic(parameter) to load the weights and models according to the parameters from (2)
(3.1) StaticData::LoadDataStatic(parameter) calls StaticData::LoadData(Parameter *parameter)
(3.1.1) in StaticData::LoadData(Parameter *parameter), we load the weights and models by calling, e.g., StaticData::LoadLanguageModels(), LoadPhraseTables()
(3.1.2) StaticData::LoadLanguageModels() calls LanguageModel* CreateLanguageModel(LMImplementation lmImplementation, const std::vector<FactorType> &factorTypes, size_t nGramOrder, const std::string &languageModelFile, float weight, ScoreIndexManager &scoreIndexManager, int dub) to create the LM instances. The top-level LM class is class LanguageModel : public StatefulFeatureFunction; LanguageModel is the parent class of LanguageModelSingleFactor and LanguageModelMultiFactor, and LanguageModelInternal is a subclass of LanguageModelSingleFactor.
(3.1.3) In Moses, the main interfaces specific to LM classes such as LanguageModelInternal are bool Load(...) and float GetValue(const std::vector<const Word*> &contextFactor, State* finalState = 0, unsigned int* len = 0) const. The former loads an LM file, while the latter computes the probability of the n-gram stored in contextFactor. The class LanguageModel implements the general feature-function interface, e.g., Evaluate(..)
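
The class layering described in (3) can be sketched with toy classes. Everything below is hypothetical (ToyFeatureFunction, ToyLanguageModel, ToyUnigramLM, and their signatures are simplified stand-ins, not the Moses declarations), and the "model" is an in-memory unigram table rather than a file. It only shows the division of labor: the concrete class provides Load and GetValue, and the base class builds the generic Evaluate on top of GetValue.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical generic feature-function interface (not the Moses class).
class ToyFeatureFunction {
public:
  virtual ~ToyFeatureFunction() {}
  virtual float Evaluate(const std::vector<std::string> &words) const = 0;
};

// Toy counterpart of the LanguageModel layer: implements Evaluate once,
// in terms of the model-specific GetValue.
class ToyLanguageModel : public ToyFeatureFunction {
public:
  virtual std::size_t Order() const = 0;
  // Log-probability of the last word of contextFactor given the words before it.
  virtual float GetValue(const std::vector<std::string> &contextFactor) const = 0;

  // Sums the n-gram scores over all positions of the sentence.
  float Evaluate(const std::vector<std::string> &words) const override {
    float total = 0.0f;
    for (std::size_t i = 0; i < words.size(); ++i) {
      // Use at most Order() words ending at position i.
      std::size_t begin = (i + 1 >= Order()) ? i + 1 - Order() : 0;
      std::vector<std::string> ngram(words.begin() + begin, words.begin() + i + 1);
      total += GetValue(ngram);
    }
    return total;
  }
};

// Toy counterpart of a concrete single-factor LM.
class ToyUnigramLM : public ToyLanguageModel {
public:
  // Stand-in for Load(): takes the table directly instead of reading a file.
  bool Load(const std::map<std::string, float> &logProbs, float unkLogProb) {
    if (logProbs.empty()) return false;
    m_logProbs = logProbs;
    m_unkLogProb = unkLogProb;
    return true;
  }

  std::size_t Order() const override { return 1; }

  float GetValue(const std::vector<std::string> &contextFactor) const override {
    assert(!contextFactor.empty());
    std::map<std::string, float>::const_iterator it =
        m_logProbs.find(contextFactor.back());
    return it == m_logProbs.end() ? m_unkLogProb : it->second;
  }

private:
  std::map<std::string, float> m_logProbs;
  float m_unkLogProb = -10.0f;  // score for unseen words (arbitrary choice)
};
```

A caller holding only a ToyFeatureFunction pointer can score a sentence through Evaluate without knowing which LM implementation sits underneath, which is the reason for putting the general interface in the base class.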

(4). Main.cpp uses IOWrapper *ioWrapper = GetIODevice(staticData) to set up the input device (an input file or standard input)

(5). Main.cpp uses vector<float> weights = staticData.GetAllWeights() to retrieve and check the weights

(6). Main.cpp starts the main loop of translating input instances (text, confusion network, or lattice):
(6.1). use ReadInput(*ioWrapper,staticData.GetInputType(),source) to read one input, which is saved in source
(6.2). set up the translation manager by calling Manager manager(*source, staticData.GetSearchAlgorithm()); the call staticData.InitializeBeforeSentenceProcessing(source) initializes the translation/language models for this sentence; the language model list is StaticData::m_languageModel; the default search algorithm is SearchNormal;
(6.3). expand translation hypotheses stack by stack until the end of the input sentence using manager.ProcessSentence()
(6.3.1). ProcessSentence() first resets the statistics using staticData.ResetSentenceStats(m_source)
(6.3.2). ProcessSentence() then collects translation options for the input sentence
(6.3.3). ProcessSentence() calls the search algorithm to process the input using m_search->ProcessSentence()
(6.4). pick the best translation (maximum a posteriori decoding)
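
The per-sentence flow in (6) can be sketched as a toy. This is a heavily simplified stand-in, not the Moses search: ToyManager, ToyHypothesis, ToyPhraseTable, and Translate are hypothetical names, translation is word by word and monotone (no reordering, no LM score), and the unknown-word penalty of -100 is an arbitrary assumption. It keeps only the shape of the steps: build a manager for one input, expand hypotheses stack by stack with pruning, then pick the best hypothesis from the last stack.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct ToyHypothesis {
  std::string output;  // target words produced so far
  float score;         // accumulated log-score
};

// source word -> list of (target word, log-score); hypothetical toy table
typedef std::map<std::string, std::vector<std::pair<std::string, float> > > ToyPhraseTable;

class ToyManager {
public:
  ToyManager(const std::vector<std::string> &source, const ToyPhraseTable &table,
             std::size_t beamSize)
      : m_source(source), m_table(table), m_beamSize(std::max<std::size_t>(1, beamSize)) {}

  // Stack i holds hypotheses covering the first i source words.
  void ProcessSentence() {
    m_stacks.assign(m_source.size() + 1, std::vector<ToyHypothesis>());
    m_stacks[0].push_back(ToyHypothesis{"", 0.0f});
    for (std::size_t i = 0; i < m_source.size(); ++i) {
      assert(!m_stacks[i].empty());
      // Collect translation options; unknown words are copied with a penalty.
      std::vector<std::pair<std::string, float> > options;
      ToyPhraseTable::const_iterator it = m_table.find(m_source[i]);
      if (it != m_table.end() && !it->second.empty()) {
        options = it->second;
      } else {
        options.push_back(std::make_pair(m_source[i], -100.0f));
      }
      // Expand every hypothesis in stack i into stack i + 1.
      std::vector<ToyHypothesis> &next = m_stacks[i + 1];
      for (const ToyHypothesis &hyp : m_stacks[i]) {
        for (const std::pair<std::string, float> &opt : options) {
          std::string out = hyp.output.empty() ? opt.first : hyp.output + " " + opt.first;
          next.push_back(ToyHypothesis{out, hyp.score + opt.second});
        }
      }
      // Prune: keep only the beamSize best hypotheses.
      std::stable_sort(next.begin(), next.end(),
                       [](const ToyHypothesis &a, const ToyHypothesis &b) {
                         return a.score > b.score;
                       });
      if (next.size() > m_beamSize) next.resize(m_beamSize);
    }
  }

  // Best hypothesis of the last stack; empty if ProcessSentence was not called.
  ToyHypothesis GetBestHypothesis() const {
    if (m_stacks.empty() || m_stacks.back().empty()) return ToyHypothesis{"", 0.0f};
    return m_stacks.back().front();
  }

private:
  std::vector<std::string> m_source;
  const ToyPhraseTable &m_table;
  std::size_t m_beamSize;
  std::vector<std::vector<ToyHypothesis> > m_stacks;
};

// One iteration of the main loop: read one input, decode it, return the best output.
ToyHypothesis Translate(const std::string &line, const ToyPhraseTable &table,
                        std::size_t beamSize) {
  std::istringstream in(line);
  std::vector<std::string> source;
  std::string word;
  while (in >> word) source.push_back(word);
  ToyManager manager(source, table, beamSize);
  manager.ProcessSentence();
  return manager.GetBestHypothesis();
}
```

Because each stack is sorted before pruning, the first element of the final stack is the highest-scoring complete hypothesis, which is what step (6.4) selects.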
