
Learning from the web how to translate and paraphrase texts


Automatic paraphrase recognition and machine translation are two subareas of Natural Language Processing (NLP) that share important similarities: both deal with parallel texts (texts expressing the same content), monolingual in the case of paraphrases and bilingual in the case of translations. However, only recently have a few studies explored combining the methods and techniques of these two subareas (BANNARD; CALLISON-BURCH, 2005; CALLISON-BURCH et al., 2006; BARREIRO, 2008; PANG et al., 2003). This project aims to investigate the automatic extraction of paraphrases and of knowledge useful for machine translation, using the never-ending language learning (NELL) strategy and the web as the source of knowledge. Online knowledge repositories such as Wikipedia define, explain, and exemplify knowledge in different ways. Online repositories of subtitles, such as OpenSubtitles and SubDB, and of lyrics, such as Lyrics, present versions of the same text in several languages. These repositories are valuable sources of information for methods, to be designed following the NELL strategy, that automatically extract paraphrases and knowledge useful for translation. NELL is a machine learning strategy based on the constant, incremental way in which humans learn. The idea is to learn simple concepts and the relationships between them, and then apply this knowledge to learn, in the future, something new and more complex (MITCHELL et al., 2008). This proposal is innovative in applying NELL to the two subareas of NLP cited above and may give rise to integrated approaches, thus contributing to advances in these and other areas of research. (AU)


Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
DE JESUS MARTINS, DEBORA BEATRIZ; CASELI, HELENA DE MEDEIROS. Automatic machine translation error identification. MACHINE TRANSLATION, v. 29, n. 1, p. 1-24. (13/50757-0, 13/11811-0, 11/03799-4, 10/07517-0)
