Use of data complexity measures in the support of supervised machine learning

Abstract

Machine Learning (ML) techniques have been successfully employed to solve a variety of data classification problems. Recent studies have sought to understand how measures that quantify the complexity of classification data sets, such as the geometric overlap between classes, affect the performance of ML techniques. Among the contributions of this approach is a better understanding of the domains of competence and the limitations of these techniques. This project will initially study different measures for characterizing the complexity of classification problems. Although the literature offers a variety of such measures, few studies examine their areas of expertise, namely the types of application and analysis for which each is most appropriate. We then intend to use these measures to help reduce the complexity involved in solving a given classification problem. A first attempt in this direction is to pre-process the data so that the resulting datasets are less complex. Two pre-processing tasks will be investigated: noise identification and feature subset selection. On another front, the complexity of solving a classification problem will be reduced through a divide-and-conquer strategy. In this case, the goal is to find subproblems of lower complexity whose solutions can be combined to solve the original classification problem. (AU)
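
As an illustration of what a measure of class overlap can look like, the sketch below computes the maximum Fisher's discriminant ratio over the features of a two-class dataset. The abstract does not name a specific measure, so choosing this one is an assumption made here for illustration, and the function name and toy data are hypothetical. Higher values suggest that at least one feature separates the classes well; values near zero suggest heavy overlap.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(X, y):
    """Maximum per-feature Fisher discriminant ratio for a two-class dataset.

    X is a list of feature rows and y the matching list of class labels.
    Illustrative sketch only; not the project's own implementation.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    class_a = [row for row, label in zip(X, y) if label == labels[0]]
    class_b = [row for row, label in zip(X, y) if label == labels[1]]

    best = 0.0
    for j in range(len(X[0])):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        # Squared distance between class means on feature j.
        gap = (mean(col_a) - mean(col_b)) ** 2
        # Sum of the within-class (population) variances on feature j.
        spread = pvariance(col_a) + pvariance(col_b)
        if spread == 0:
            ratio = float("inf") if gap > 0 else 0.0
        else:
            ratio = gap / spread
        best = max(best, ratio)
    return best


# Hypothetical toy data: same points, two different labelings.
labels = [0, 0, 1, 1]
separated = [[0.0, 5.0], [1.0, 5.5], [10.0, 5.2], [11.0, 4.9]]
overlapping = [[0.0, 5.0], [10.0, 5.5], [1.0, 5.2], [11.0, 4.9]]

print(fisher_discriminant_ratio(separated, labels))
print(fisher_discriminant_ratio(overlapping, labels))
```

In the first toy dataset the classes are far apart on the first feature, so the ratio is much larger than in the second, where the two classes interleave on every feature.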

Scientific publications (14)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
PISANI, PAULO HENRIQUE; LORENA, ANA CAROLINA; DE CARVALHO, ANDRE C. P. L. F. Adaptive algorithms applied to accelerometer biometrics in a data stream context. Intelligent Data Analysis, v. 21, n. 2, p. 353-370. (12/25032-0, 13/07375-0, 12/22608-8)
TRAMBAIOLLI, L. R.; SPOLAOR, N.; LORENA, A. C.; ANGHINAH, R.; SATO, J. R. Feature selection before EEG classification supports the diagnosis of Alzheimer's disease. Clinical Neurophysiology, v. 128, n. 10, p. 2058-2067. (12/22608-8, 13/00506-1, 13/10952-9, 13/10498-6)
MORALES, PABLO; LUENGO, JULIAN; GARCIA, LUIS P. F.; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F.; HERRERA, FRANCISCO. The NoiseFiltersR Package: Label Noise Preprocessing in R. R Journal, v. 9, n. 1, p. 219-228. (12/22608-8, 13/07375-0, 11/14602-7)
PISANI, PAULO HENRIQUE; POH, NORMAN; DE CARVALHO, ANDRE C. P. L. F.; LORENA, ANA CAROLINA. Score normalization applied to adaptive biometric systems. Computers & Security, v. 70, p. 565-580. (12/22608-8, 13/07375-0, 12/25032-0)
GARCIA, LUIS P. F.; LEHMANN, JENS; DE CARVALHO, ANDRE C. P. L. F.; LORENA, ANA C. New label noise injection methods for the evaluation of noise filters. Knowledge-Based Systems, v. 163, p. 693-704. (16/18615-0, 13/07375-0, 12/22608-8)
LORENA, ANA C.; MACIEL, ARON I.; DE MIRANDA, PERICLES B. C.; COSTA, IVAN G.; PRUDENCIO, RICARDO B. C. Data complexity meta-features for regression problems. Machine Learning, v. 107, n. 1, SI, p. 209-246. (12/22608-8)
BARELLA, VICTOR H.; GARCIA, LUIS P. F.; DE SOUTO, MARCILIO C. P.; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F. Assessing the data complexity of imbalanced datasets. Information Sciences, v. 553, p. 83-109. (13/07375-0, 15/01382-0, 12/22608-8)
MUNOZ, MARIO ANDRES; YAN, TAO; LEAL, MATHEUS R.; SMITH-MILES, KATE; LORENA, ANA CAROLINA; PAPPA, GISELE L.; RODRIGUES, ROMULO MADUREIRA. An Instance Space Analysis of Regression Problems. ACM Transactions on Knowledge Discovery from Data, v. 15, n. 2. (12/22608-8, 19/20328-7)
QUITERIO, THAISE M.; LORENA, ANA C. Using complexity measures to determine the structure of directed acyclic graphs in multiclass classification. Applied Soft Computing, v. 65, p. 428-442. (12/22608-8, 15/17291-3)
SPOLAOR, NEWTON; LORENA, ANA CAROLINA; LEE, HUEI DIANA. Feature Selection via Pareto Multi-objective Genetic Algorithms. Applied Artificial Intelligence, v. 31, n. 9-10, p. 764-791. (09/12963-2, 12/22608-8)
PIMENTEL, BRUNO ALMEIDA; DE CARVALHO, ANDRE C. P. L. F. A Meta-learning approach for recommending the number of clusters for clustering algorithms. Knowledge-Based Systems, v. 195. (16/18615-0, 17/20265-0, 12/22608-8)
PISANI, PAULO HENRIQUE; LORENA, ANA CAROLINA; DE CARVALHO, ANDRE C. P. L. F. Adaptive Biometric Systems Using Ensembles. IEEE Intelligent Systems, v. 33, n. 2, p. 19-28. (12/22608-8, 13/07375-0, 12/25032-0)
RIVOLLI, ADRIANO; READ, JESSE; SOARES, CARLOS; PFAHRINGER, BERNHARD; DE CARVALHO, ANDRE C. P. L. F. An empirical analysis of binary transformation strategies and base algorithms for multi-label learning. Machine Learning, v. 109, n. 8. (16/18615-0, 13/07375-0, 12/22608-8)
GARCIA, LUIS P. F.; RIVOLLI, ADRIANO; ALCOBACA, EDESIO; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F. Boosting meta-learning with simulated data complexity measures. Intelligent Data Analysis, v. 24, n. 5, p. 1011-1028. (12/22608-8, 13/07375-0, 18/14819-5, 16/18615-0)

Please report errors in the scientific publications list by writing to: cdi@fapesp.br.