
Term extraction from non-structured data applied to Multiview semi-supervised learning

Grant number: 08/02091-5
Support type: Scholarships in Brazil - Master
Effective date (Start): March 01, 2009
Effective date (End): February 28, 2010
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal researcher: Maria Carolina Monard
Grantee: Ígor Assis Braga
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil


Text Mining (TM) is of great practical importance due to the massive volume of documents available online. Nevertheless, the pattern recognition stage of TM still depends heavily on the availability of labeled texts. Semi-supervised (SS) learning is the research area that addresses this problem, and it has the potential to reduce the need for expensive labeled data acquisition. Some SS learning approaches require that more than one view (or description) of the data be available. Previous work has not dealt in depth with the extraction of two descriptions from textual data; in this work, we intend to fill this gap. To construct the second view of textual data, we propose a hybrid linguistic/statistical terminology extraction approach. The underlying assumption of this approach is that specialized documents are characterized by the repeated use of certain lexical units or morphosyntactic constructions. (AU)
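As an illustration only, the following Python sketch shows one way a hybrid linguistic/statistical term extractor could build a second, term-based view of a corpus. It assumes the texts are already part-of-speech tagged with universal tags (NOUN, ADJ, ADP); the pattern list, the `min_freq` threshold, and the functions `candidates`, `extract_terms`, and `term_view` are hypothetical and are not taken from the project. The linguistic step keeps token spans whose tag sequence matches a morphosyntactic pattern; the statistical step keeps only candidates that are repeated across the corpus, mirroring the assumption that specialized documents reuse certain lexical units.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical morphosyntactic patterns over universal POS tags (an assumption,
# not the project's actual pattern set).
PATTERNS: List[Tuple[str, ...]] = [
    ("NOUN",),
    ("ADJ", "NOUN"),
    ("NOUN", "ADJ"),
    ("NOUN", "NOUN"),
    ("NOUN", "ADP", "NOUN"),
]

# A document is assumed to be pre-tagged: a list of (token, tag) pairs.
TaggedDoc = List[Tuple[str, str]]


def candidates(doc: TaggedDoc) -> List[str]:
    """Linguistic filter: return every token span whose tag sequence matches a pattern."""
    found = []
    for pattern in PATTERNS:
        n = len(pattern)
        for i in range(len(doc) - n + 1):
            window = doc[i:i + n]
            if tuple(tag for _, tag in window) == pattern:
                found.append(" ".join(tok.lower() for tok, _ in window))
    return found


def extract_terms(docs: List[TaggedDoc], min_freq: int = 2) -> Dict[str, int]:
    """Statistical filter: keep candidates occurring at least min_freq times in the corpus."""
    counts = Counter(c for doc in docs for c in candidates(doc))
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


def term_view(docs: List[TaggedDoc], terms: Dict[str, int]) -> List[Counter]:
    """Second view: describe each document only by counts of the retained terms."""
    return [Counter(c for c in candidates(doc) if c in terms) for doc in docs]
```

For example, on two tagged sentences that both contain "semi-supervised learning", `extract_terms(docs, min_freq=2)` keeps "semi-supervised learning" and "learning" and discards candidates seen only once. Raw frequency is used here only because it is the simplest statistical filter; a real extractor would likely use a stronger association or termhood measure.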


Academic Publications
BRAGA, Ígor Assis. Multi-view semi-supervised learning in text classification. 2010. Master's Dissertation - Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB) São Carlos.
