The data collected and generated today are growing not only in volume but also in complexity, creating the need for new query operators. Health centers that collect imaging exams, and remote sensing from satellites and earth-based stations, are examples of application domains that require more powerful and flexible operators. Storing, retrieving and analyzing data that are large in volume, structure, complexity and distribution is now referred to as big data. Representing and querying big data using only the traditional scalar data types is no longer enough. Similarity queries are the most sought-after means of retrieving complex data, but until recently they were not available in Database Management Systems. They are now starting to become available, but their first uses in developing real systems make it clear that the basic similarity query operators are not enough to meet the requirements of the target applications. The main reason is that similarity is a concept formulated with only small numbers of data elements in mind. When the volume of data increases, both the efficacy of the query and the efficiency of obtaining its answer (that is, the quality and the speed of query processing) are compromised. Currently, researchers handle big data mainly through parallel architectures, and only a few studies address the efficacy of the query answers. This project aims to study and develop variations of the basic similarity operators in order to propose operators better suited to big data. The results will be validated in two application domains: large collections of images from medical exams, and images and time series from remote sensing data on climate and agricultural enterprises.