(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

An Instance Space Analysis of Regression Problems

Munoz, Mario Andres [1] ; Yan, Tao [1] ; Leal, Matheus R. [2] ; Smith-Miles, Kate [1] ; Lorena, Ana Carolina [3] ; Pappa, Gisele L. [2] ; Rodrigues, Romulo Madureira [3]
Total Authors: 7
[1] Univ Melbourne, Sch Math & Stat, Parkville, Vic 3010 - Australia
[2] Univ Fed Minas Gerais, Comp Sci Dept, Av Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG - Brazil
[3] Inst Tecnol Aeronaut, Div Ciencia Comp, Praca Marechal Eduardo Gomes 50, BR-12228900 Sao Jose Dos Campos, SP - Brazil
Total Affiliations: 3
Document type: Journal article
Web of Science Citations: 0

The quest for greater insights into algorithm strengths and weaknesses, as revealed when studying algorithm performance on large collections of test problems, is supported by interactive visual analytics tools. A recent advance is Instance Space Analysis, which presents a visualization of the space occupied by the test datasets, and the performance of algorithms across the instance space. The strengths and weaknesses of algorithms can be visually assessed, and the adequacy of the test datasets can be scrutinized through visual analytics. This article presents the first Instance Space Analysis of regression problems in Machine Learning, considering the performance of 14 popular algorithms on 4,855 test datasets from a variety of sources. The two-dimensional instance space is defined by measurable characteristics of regression problems, selected from over 26 candidate features. It enables the similarities and differences between test instances to be visualized, along with the predictive performance of regression algorithms across the entire instance space. The purpose of creating this framework for visual analysis of an instance space is twofold: one may assess the capability and suitability of various regression techniques; meanwhile the bias, diversity, and level of difficulty of the regression problems popularly used by the community can be visually revealed. This article shows the applicability of the created regression instance space to provide insights into the strengths and weaknesses of regression algorithms, and the opportunities to diversify the benchmark test instances to support greater insights. (AU)
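
The general idea in the abstract, describing each regression dataset by measurable characteristics and placing it in a two-dimensional space, can be illustrated with a small sketch. This is not the authors' method. The three meta-features, the synthetic datasets, and every function name below are hypothetical, and plain principal component analysis (computed by power iteration) stands in for the projection used in Instance Space Analysis.

```python
import math
import random

def meta_features(X, y):
    """Three simple, hypothetical meta-features of one regression dataset."""
    n, d = len(X), len(X[0])
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    # Largest absolute Pearson correlation between a single input and the target.
    best_corr = 0.0
    for j in range(d):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        var_x = sum((a - mean_x) ** 2 for a in col) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y)) / n
        if var_x > 0 and var_y > 0:
            best_corr = max(best_corr, abs(cov) / math.sqrt(var_x * var_y))
    return [math.log(n), math.log(d), best_corr]

def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    scaled_cols = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        scaled_cols.append([(v - m) / s for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def top_components(Z, k=2, iters=500):
    """Leading k principal directions via power iteration with orthogonalization."""
    n, d = len(Z), len(Z[0])
    C = [[sum(r[i] * r[j] for r in Z) / n for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # deterministic starting vector
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            for u in comps:  # remove directions that were already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:
                break
            v = [a / norm for a in w]
        comps.append(v)
    return comps

# Synthetic stand-ins for benchmark datasets (illustrative only, not real data).
rng = random.Random(0)
features = []
for _ in range(30):
    n, d = rng.randint(20, 200), rng.randint(1, 8)
    noise = rng.uniform(0.1, 3.0)
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [sum(row) + rng.gauss(0, noise) for row in X]
    features.append(meta_features(X, y))

Z = standardize(features)
components = top_components(Z)
# Each dataset becomes one point in the two-dimensional space.
coords = [[sum(a * b for a, b in zip(z, u)) for u in components] for z in Z]
for point in coords[:3]:
    print([round(c, 3) for c in point])
```

In a fuller analysis each point would also carry the measured performance of every algorithm on that dataset, so that regions of strength and weakness can be inspected visually.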

FAPESP's process: 12/22608-8 - Use of data complexity measures in the support of supervised machine learning
Grantee: Ana Carolina Lorena
Support type: Research Grants - Young Investigators Grants
FAPESP's process: 19/20328-7 - Analyzing the diversity of public machine learning data repositories for meta-learning
Grantee: Ana Carolina Lorena
Support type: Scholarships abroad - Research