In recent years, several machine translation systems have been built for the Baltic languages. Besides the Google and Microsoft machine translation engines and research experiments with statistical MT for Latvian and Lithuanian, there are both English-Latvian and English-Lithuanian rule-based MT systems available. Both Latvian and Lithuanian are morphologically rich languages with fairly free word order. Combined with the limited availability of parallel corpora for these languages, this poses a data sparseness problem for phrase-based SMT. This research is part of a project to build the best general-purpose phrase-based SMT system using publicly available and proprietary corpora and tools. During the project we added language-specific knowledge to assess the possible improvement in translation quality. This paper reports on the implementation, as well as the automatic and human evaluation, of English-Latvian and Lithuanian-English statistical machine translation systems. The results of human evaluation show that integrating morphological knowledge into SMT significantly improves translation quality compared to the baseline SMT.
Authors: Raivis Skadiņš, Kārlis Goba, Valters Sičs.