Evaluation of human multi-document summarization strategies

Grant number: 12/22843-7
Support type: Scholarships abroad - Research Internship - Master's degree
Effective date (Start): March 01, 2013
Effective date (End): April 30, 2013
Field of knowledge: Linguistics, Literature and Arts - Linguistics - Linguistic Theory and Analysis
Principal Investigator: Ariani Di Felippo
Grantee: Renata Tironi de Camargo
Supervisor abroad: Diana Maria de Sousa Marques Pinto dos Santos
Home Institution: Centro de Educação e Ciências Humanas (CECH). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil
Research place: University of Oslo (UiO), Norway
Associated with the scholarship: 11/05003-2 - Human strategies for Multidocument summarization, BP.MS

Abstract

Computational applications capable of handling the vast amount of information currently available, especially on the web, have become increasingly necessary. Automatic multi-document summarization is one such application: from a set of documents that deal with the same subject, a single summary is produced. This task emerged as a natural extension of traditional single-document summarization, which aims to produce a summary from only one document. Automatic single-document summarization has been widely explored by several authors, whereas multi-document summarization is a more recent task. Although interest in multi-document summarization is recent, some systems have already been developed, including for Brazilian Portuguese, based on superficial and deep summarization methods. However, there is no study on human multi-document summarization (HMS), unlike single-document summarization, for which studies of human summarization have yielded a series of recurrent content selection strategies. Although there is some evidence of how HMS is carried out, we are not aware of attempts to characterize multi-document summaries linguistically. We therefore proposed, based on corpus analysis, to characterize the process of HMS in order to provide input for automatic multi-document summarization and to contribute to Descriptive Linguistics. First, this characterization involved selecting a suitable corpus and aligning the sentences of the summaries with the sentences of their source texts based on content overlap. We then characterized the summaries with respect to content selection, using a set of linguistic features, in order to observe the content selection strategies that humans commonly use. The phase of identifying and formalizing the strategies is in progress and is scheduled to finish in December 2012. In this project, we aim to evaluate the human content selection strategies that are currently being identified.
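The alignment of summary sentences with source sentences by content overlap can be illustrated with a minimal sketch. It assumes a simple word-level Jaccard score and an arbitrary threshold of 0.3; the function names, the tokenization and the threshold are hypothetical choices made for illustration, not the procedure used in the project.

```python
import re


def tokenize(sentence):
    """Return the set of lowercase word tokens in a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def align(summary_sentences, source_sentences, threshold=0.3):
    """Map each summary sentence index to the source sentences it overlaps with.

    Overlap is measured with the Jaccard coefficient over word sets
    (an assumption; the project text only says "content overlap").
    """
    alignments = {}
    for i, summary_sentence in enumerate(summary_sentences):
        summary_tokens = tokenize(summary_sentence)
        matches = []
        for j, source_sentence in enumerate(source_sentences):
            source_tokens = tokenize(source_sentence)
            union = summary_tokens | source_tokens
            score = len(summary_tokens & source_tokens) / len(union) if union else 0.0
            if score >= threshold:
                matches.append((j, score))
        # Best-matching source sentences first.
        alignments[i] = sorted(matches, key=lambda m: -m[1])
    return alignments


summary = ["The president visited Oslo on Monday."]
sources = ["On Monday the president visited Oslo.", "Rain is expected tomorrow."]
result = align(summary, sources)
```

In this toy example the single summary sentence aligns only with the first source sentence, since the two share all their words, while the second source sentence shares none.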
There are several methods for evaluating automatic summarization systems, and these can also be used to evaluate human content selection strategies. Specifically, these methods make it possible to evaluate the quality and informativeness of automatic multi-document summaries generated from the human content selection strategies. For this purpose, we intend to survey the evaluation methods described in the literature and select one or more that are appropriate. (AU)
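As an illustration of what an informativeness measure can look like, the sketch below computes a ROUGE-1-style unigram recall of a candidate summary against a human reference summary. This metric is chosen here only as an example; the project text does not say which evaluation method will be selected, and the function name is hypothetical.

```python
import re
from collections import Counter


def unigram_recall(candidate, reference):
    """Fraction of reference word occurrences that also appear in the candidate.

    Counts are clipped, so a word repeated in the reference is only
    credited as many times as it occurs in the candidate.
    """
    candidate_counts = Counter(re.findall(r"\w+", candidate.lower()))
    reference_counts = Counter(re.findall(r"\w+", reference.lower()))
    if not reference_counts:
        return 0.0
    overlap = sum(
        min(count, candidate_counts[word])
        for word, count in reference_counts.items()
    )
    return overlap / sum(reference_counts.values())


score = unigram_recall("the cat sat", "the cat sat on the mat")
```

Here the candidate covers three of the six word occurrences in the reference, so the score is 0.5.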