IMFD research to be presented at SIGMOD, the premier international database conference - TEST Millennium Institute Foundational Research on Data

December, 2023. One of the multiple lines of work developed at the Millennium Institute Foundational Research on Data is that of multimodal graphs (one of the developments in this area is Millennium DB). Multimodal graphs are ways of structuring data, representing relationships between different types of nodes and connections, which allows modeling and representing information of different types. An example where they can be used is in the analysis of transportation networks in an area: the nodes could be the bus stops and the connections, the routes between different means of transportation: bicycles, cars, buses, etc. Analyzing this type of information in this format allows to have a much more complete and complex knowledge than analyzing this same information in database formats.

"Multimodal graphs contain more semantics than mere connectivity between nodes. This type of information is very frequent in real life and to be queried effectively requires enriching the language of pattern graphs with data type specific expressions. For example, some nodes may be geographical places and have associated location information, and I would like to be able to look up the above query, but restricted to the pharmacy being within a certain distance radius. Or that the current time is between its opening and closing hours. Or that the price is less than a certain limit," explains Gonzalo Navarro, an academic at the DCC U. Chile and one of the researchers at the Millennium Institute Foundational Research on Data after "Worst-Case-Optimal Similarity Joins on Graph Databases".

This is the paper recently accepted at the conference SIGMOD'24conference, in which worked DCC U. Chile academics Benjamín Bustos, Aidan Hogan and Gonzalo Navarro, together with DCC UC academics Diego Arroyuelo and Juan Reutter; and Adrián Gómez-Brandón from Universidade da Coruña), all researchers from the Millennium Institute Foundational Research on Data

Gonzalo Navarro at the IMFD Winter Workshop 2023

"The IMFD allows us to talk and bring together the efforts we make between different centers and universities," emphasizes Juan Reutter, director of the IMFD and one of the authors of the work. "This achievement is a clear example of what we can accomplish with the joint work that we do at the Millennium Institute Foundational Research on DataThis is where we can strengthen the research that we carry out independently in the universities, creating a space for collaboration that allows the country to position itself at the forefront of research in Latin America on this and other topics.

ACM SIGMOD is one of the leading database conferences -along with ACM PODS-, focused on the principles, techniques and applications of database management systems and technologies, and will be held in Santiago, Chile in 2024 .

Adding more information

Regarding this research, Gonzalo Navarro explains: "There is a lot of recent work on so-called graph databases, which represent information in the form of nodes (the objects) and edges connecting nodes (and representing relationships between objects), because they are more flexible than relational databases. These networks are queried mainly by giving a small graph pattern, which has nodes and edges but where the identities of the nodes and edges can be variable. Each way to give a value to the variables so that the pattern appears in the graph is an answer to the query. As a very simplified example, in a network with information about pharmacies, a query like (F, sells, X), (X, component, paracetamol), (X, price, P), (F, has-agreement, Fonasa), gives me the pharmacies F with agreement with Fonasa that sell some medicine X with paracetamol, and the price P at which they sell it. Responding to these queries efficiently is not easy, and it is not uncommon for some of them to take seconds or even minutes. A quality standard for these algorithms is to be worst-case optimal. These types of algorithms are relatively recent and there is still a lot of research on them."

This scientific paper addresses the problem of being able to add metric information to objects and proximity conditions to pattern graphs. It explains that metrics are generalizations of Euclidean distances (such as those of the plane or space), covering not only the example of geographical distance but others such as similarity between images, audio, video, text, etc. (which are modeled as distances in very high-dimensional spaces). The objective, Navarro points out, was also to maintain the worst-case-optimality condition of the search algorithms.

In this way, the researchers show that it is possible to include this type of information and support a particular type of queries, "which is k nearest neighbors. It forms, among the objects that have the metric, a graph where each node knows which are the k nearest/closest nodes to it. The query can then ask for an object to be among the k nearest/closest to another. This allows us to ask, for example, for churches in my own city whose facade resembles (as an image) one I am interested in. We show that the k-neighbor graph can be joined to the database graph so as to integrate it elegantly into the same worst-case-optimal algorithm, reusing in a way all the known development on this algorithm. We show that, in several cases, we can retain worst-case-optimality, although not in all. And we implement the new algorithm and show that it is considerably better than a basic solution. This opens the door to support other types of queries (by distance, as in the example of pharmacies), and other types of data (topological, geographical regions, temporal, etc.)," comments the academic.

-
Source: DCC Communications