The areas of Meta-learning (MtL) and Automatic Machine Learning (AutoML) have emerged in recent years with successful solutions that ease the use of Machine Learning (ML) techniques by interested end-users with little ML expertise. MtL and AutoML solutions usually leverage knowledge from problems whose solutions are known, gathered in public repositories. One popular repository is OpenML, which also reports the predictive results achieved by several ML algorithms in benchmark experiments, a rich source of information for MtL and AutoML studies. Nonetheless, most of these studies select the datasets used to develop their solutions in an ad hoc manner. This may prevent an appropriate selection of diverse and challenging datasets and may bias the dataset selection process. Building on the researcher's previous experience in studying the complexity of classification and regression problems from a data-driven perspective, we intend to perform a three-fold analysis of the existing benchmark ML repositories: (i) to understand and characterize the diversity of such repositories, specifically for MtL purposes; (ii) to enrich the repositories by generating synthetic datasets whose properties are distinct from those already present; and (iii) to build a tool able to recommend a test bed of diverse datasets that meets the objectives of the MtL researcher. To this end, we expect to combine concepts from the recent related literature on complexity measures of classification and regression problems, from the proponent's side, and on instance space analysis of supervised ML problems, from the supervisor's side. (AU)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
MUNOZ, MARIO ANDRES; LEAL, MATHEUS R.; LORENA, ANA CAROLINA; PAPPA, GISELE L.; RODRIGUES, ROMULO MADUREIRA. An Instance Space Analysis of Regression Problems. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA.
Web of Science Citations: 0.