Improving public dataset characterization using OpenML

Grant number: 23/11801-6
Support Opportunities: Scholarships abroad - Research Internship - Scientific Initiation
Effective date (Start): December 01, 2023
Effective date (End): February 29, 2024
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Ana Carolina Lorena
Grantee: Nathan Falcão Carvalho
Supervisor: Joaquin Vanschoren
Host Institution: Divisão de Ciência da Computação (IEC). Instituto Tecnológico de Aeronáutica (ITA). Ministério da Defesa (Brasil). São José dos Campos, SP, Brazil
Research place: Eindhoven University of Technology (TU/e), Netherlands
Associated to the scholarship: 23/03958-2 - Gathering meta-data from public repositories, BP.IC

Abstract

In the past few years, Machine Learning (ML) has become a popular solution for problems in an extremely diverse range of fields, such as Biomedical Informatics, Natural Language Processing, and website recommendation systems. An essential step in ML is determining which algorithm best suits each specific domain, along with which hyperparameter values maximize the desired result. The field of Meta-Learning (MtL) addresses this algorithm selection problem by analyzing the performance of several algorithms on a diverse range of datasets. Learning from this metadata produces helpful insights into the process of selecting an algorithm to solve a specific problem. To conduct an MtL study, it is necessary to assemble metadata comprising algorithm performance values and dataset characteristics. Since locally running a wide range of ML algorithms on several datasets can be computationally demanding, a common way to acquire such data is to use public dataset repositories, of which OpenML is a popular option. Nevertheless, this approach still has limitations, as up-to-date dataset characteristics, called meta-features, are often hard to obtain. Since several tools for extracting more useful and representative meta-features from datasets are already documented in the literature, we plan to enrich the OpenML repository by diversifying the range of available metadata. (AU)
