Abstract:

In recent years, several machine translation systems have been built for the Baltic languages. Besides the Google and Microsoft machine translation engines and research experiments with statistical MT for Latvian [1] and Lithuanian, there are both English-Latvian [2] and English-Lithuanian [3] rule-based MT systems available. Both Latvian and Lithuanian are morphologically rich languages with fairly free word order. Combined with the limited availability of parallel corpora for these languages, this poses a sparseness problem for phrase-based SMT. This research is part of a project to build the best general-purpose phrase-based SMT system using publicly available and proprietary corpora and tools. During the project we added language-specific knowledge to assess the possible improvement in translation quality. This paper reports on the implementation, as well as the automatic and human evaluation, of English-Latvian and Lithuanian-English statistical machine translation systems. The results of the human evaluation show that integrating morphological knowledge into SMT gives a significant improvement in translation quality compared to baseline SMT.

 

Authors: Raivis Skadiņš, Kārlis Goba, Valters Sičs.

 
