Machine Translation (MT) is one of the oldest subfields of Natural Language Processing (NLP). Since the 1950s, many efforts have been made to automatically translate texts written in one language into another, but the computational problem of translation remains largely unsolved. Well-known difficulties include the sheer complexity of natural languages and the need for large amounts of heterogeneous linguistic knowledge (e.g., syntactic, semantic, and pragmatic) in the translation task. Recently, however, important advances have been made in this and many other NLP subfields using purely statistical models, which use little or no linguistic knowledge and instead rely on aligned parallel corpora as a basis for learning the translation process. In this document we propose the study and possible development of textual alignment techniques for parallel corpora as a first step towards the development of a statistical MT system.