Relational Database Management Systems (RDBMS) were conceived to store and retrieve large amounts of data represented only by numbers and short character strings. All such data can be compared by identity or by total ordering relationships (TOR). The theory underlying current RDBMS is the Relational Model, whose operations come from set theory: data operators such as union, Cartesian product and join all derive from it. The strong formal basis of the relational model allows powerful yet flexible tools to be built by exploiting the fundamental properties of the basic operators. Of special interest are the operators, and their corresponding properties, that combine data coming from two or more sources, such as two base relations or intermediate relations produced during query processing. However, RDBMS must now also handle more complex data types, such as multimedia (image, audio, video) and time series, which are usually compared neither by identity nor by TOR, but by similarity. This led to the similarity join and similarity selection operations. Those operators have been thoroughly studied, but their integration into RDBMS is only beginning to be studied, and our research group is among the first to take this step. The results we have obtained so far suggest that similarity join operators based solely on set theory do not suit applications, because they usually generate too much data, well beyond the expected or required amounts. Our experiments have shown that applications would be better served if the database were queried like a graph, in a more "navigational" approach.
Thus, this project aims to explore binary operators that perform a "similarity combination" of the underlying tables. These operators rely both on set theory, to take advantage of its extensive theoretical underpinning and of the efficient implementations available in RDBMS, and on the similarity relationships occurring between input records. They pursue a navigational model that mimics graph navigation, with the goal of reducing answer cardinality and thus improving applicability to real scenarios.
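The cardinality concern can be illustrated with a minimal sketch. It assumes two tiny in-memory "tables" of 2-D points compared by Euclidean distance; the function names, the sample data and the radius are hypothetical. The nearest-neighbor join is used here only as a familiar stand-in for a more selective combination and is not the operator this project proposes.

```python
from math import dist  # Euclidean distance, Python 3.8+


def range_join(left, right, radius):
    """Set-theoretic style similarity join: keep every pair within `radius`."""
    return [(a, b) for a in left for b in right if dist(a, b) <= radius]


def nearest_join(left, right):
    """More selective alternative: pair each left item with its closest right item."""
    return [(a, min(right, key=lambda b: dist(a, b))) for a in left]


# Hypothetical sample data: feature vectors of two small relations.
left = [(0.0, 0.0), (1.0, 1.0)]
right = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.0), (1.0, 1.2), (5.0, 5.0)]

range_pairs = range_join(left, right, radius=2.0)
nearest_pairs = nearest_join(left, right)

# The range join returns many more pairs than there are left-side records.
print(len(range_pairs), len(nearest_pairs))
```

With a generous radius, the range join pairs each record with most of the other table, while the more selective join returns one pair per input record. On real multimedia collections the gap between the two result sizes can be far larger.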