Scholarship 14/11632-0 - Linguística computacional - BV FAPESP

Exploration of semi-supervised and never ending learning discourse parsing approach

Grant number: 14/11632-0
Support Opportunities: Scholarships abroad - Research Internship - Doctorate
Start date: September 01, 2014
End date: August 31, 2015
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Thiago Alexandre Salgueiro Pardo
Grantee: Erick Galani Maziero
Supervisor: Graeme Hirst
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Institution abroad: University of Toronto (U of T), Canada
Associated to the scholarship: 11/23323-4 - Automatic rhetorical parsing based on large amount of data, BP.DR

Abstract

(Context) Natural Language Processing (NLP) is a stimulating area whose goal is the automatic understanding and generation of text. This processing occurs at many levels, one of which is discourse, which deals with the aims and intentions of a text's author. A text has an elaborate structure that relates all of its content and gives it coherence. Several methodologies have been employed in automatic discourse analysis, among them approaches based on lexical patterns, probabilistic models, and classifiers built with machine learning techniques.

(Gaps) The cited approaches rely on annotated data, which is costly to obtain. This motivates exploiting semi-supervised learning, which suits settings where annotated data is scarce by effectively generalizing the knowledge gained from the few annotated examples available.

(Objectives) The main objective of this work is to explore the HILDA parser in a semi-supervised, never-ending learning setting with large amounts of unannotated data. Other goals include improving learning by incorporating new features and by grouping rhetorical relations. The research will be evaluated both intrinsically and extrinsically, the latter by applying the resulting discourse parser to tasks of interest to the research groups involved.

(Hypotheses) One hypothesis is that semi-supervised learning approaches can achieve good results despite the scarcity of annotated data for discourse parsing. Other hypotheses are that the new features proposed in this research are plausible for automatic discourse analysis and that groupings of rhetorical relations can benefit machine learning algorithms. (AU)
