
Automatic rhetorical parsing based on large amount of data.

Grant number: 11/23323-4
Support Opportunities: Scholarships in Brazil - Doctorate
Effective date (Start): May 01, 2012
Effective date (End): April 01, 2016
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Thiago Alexandre Salgueiro Pardo
Grantee: Erick Galani Maziero
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated scholarship(s): 14/11632-0 - Exploration of semi-supervised and never ending learning discourse parsing approach, BE.EP.DR


Discourse parsing is one of the essential tasks in Natural Language Processing, useful both for linguistic tasks (such as the study of particular phenomena) and for applied computational tasks (such as text summarization and question answering). While good parsers exist for some languages, for Portuguese there is only a symbolic method based on discourse patterns for identifying discourse structures. This project aims to explore discourse parsing methods, especially for Portuguese, in a hybrid approach in order to advance the state of the art. In particular, to produce good-quality analyses, we aim to combine the vast amount of data available on the web with existing annotated corpora, using semi-supervised and unsupervised machine learning techniques, in an attempt to overcome the traditional sparse-data limitation. Among the several existing discourse models, Rhetorical Structure Theory (RST) will be followed, since it is traditional in the area and used in several computational linguistics applications. Besides the practical contribution of creating and making available a new text analysis tool, we believe that this project has the potential to produce good theoretical contributions, such as the modeling of the discourse parsing task and the proposed discourse learning strategies.


Academic Publications
(References retrieved automatically from State of São Paulo Research Institutions)
MAZIERO, Erick Galani. Rhetorical analysis based on large amount of data. 2016. Doctoral Thesis - Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB) São Carlos.
