Millennium DB: the powerful multi-model engine created at the Millennium Institute Foundational Research on Data

In a world where an enormous amount of data is produced every second, how can we extract useful information? This is the question that researchers at the Millennium Institute Foundational Research on Data have been asking for more than a decade.

"Currently, Millennium DB is a data manager that can handle knowledge graphs containing very large volumes of data efficiently, and that makes effective use of the most modern techniques known from research," explains Domagoj Vrgoč, an academic at the Institute of Mathematical and Computational Engineering of the Catholic University of Chile, a researcher at the Millennium Institute Foundational Research on Data and one of the main authors of the research that led to this work.


Millennium DB is a modular, open-source data management engine that can handle a variety of knowledge graphs storing very large volumes of data efficiently. Millennium DB employs techniques at the forefront of current scientific research in this area: it combines proven data management techniques with state-of-the-art algorithms for worst-case-optimal joins, as well as specialized algorithms for evaluating path queries. It allows combining different information formats and building graphs from which useful information can be obtained in different forms.
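To give a flavor of the worst-case-optimal join algorithms mentioned above, the following Python sketch (an illustrative toy, not MillenniumDB's actual implementation) evaluates the classic triangle query Q(a, b, c) :- R(a, b), S(b, c), T(a, c) with a "generic join": rather than joining two relations at a time, it binds one variable at a time, intersecting the candidate values allowed by every relation that mentions that variable.

```python
def generic_join(R, S, T):
    """Toy worst-case-optimal evaluation of the triangle query
    Q(a, b, c) :- R(a, b), S(b, c), T(a, c), where each relation
    is a set of pairs. Variables are bound in the order a, b, c."""
    results = []
    # Candidates for a: values appearing in the first column of both R and T.
    for a in {x for (x, _) in R} & {x for (x, _) in T}:
        # Candidates for b: R-neighbors of a that also start an S-edge.
        Rb = {y for (x, y) in R if x == a}
        for b in Rb & {x for (x, _) in S}:
            # Candidates for c: must close the triangle via both S and T.
            Sc = {y for (x, y) in S if x == b}
            Tc = {y for (x, y) in T if x == a}
            for c in Sc & Tc:
                results.append((a, b, c))
    return sorted(results)

# Toy edge relations: a single triangle 1 -> 2 -> 3 -> 1, plus stray edges.
R = {(1, 2), (2, 3), (3, 4)}
S = {(2, 3), (3, 1)}
T = {(1, 3), (3, 2)}
print(generic_join(R, S, T))  # -> [(1, 2, 3)]
```

Real engines implement this idea with sorted indexes and efficient intersections (e.g. Leapfrog Triejoin) rather than Python sets, which keeps the total work bounded by the worst-case output size of the query.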

"This is software that was developed from scratch in Chile, and it can compete with, and even outperform, other systems that have taken years to develop. Tools of this kind are generally created in the Global North: with Millennium DB we can say that in Chile we produce quality research that can compete with developments in other countries," explains Carlos Rojas, researcher at the Millennium Institute Foundational Research on Data and director of the project.

Millennium DB has already been tested with data from different areas; the first tests were carried out with Wikidata. "One example in which we used this model was the analysis of the social and political situation of the country during the period in which the Constituent Convention was in operation," explains Juan Reutter, alternate director of the IMFD. "Beyond the results of the Convention, during that period we had a large amount of information: comments on social networks, the complete broadcasts of the debates that took place in the former congress, surveys that we conducted with different audiences and with temporal components, and the text of the constitution itself as it was being drafted. We were able to systematize all of this in this model and obtain very rich analyses that combine variables of different types."


For Jazmine Maldonado, Director of Innovation and Technology Transfer at the Millennium Institute Foundational Research on Data, the model has great potential to be developed into a product that can enhance different market sectors. "For example, think about retail: you have a wealth of information from customers, such as shopping preferences, the searches they perform, the periods when they are looking for something specific, and you also have product prices, features, perhaps images, and codes. All this information is available to companies, yet it is difficult to find a way to systematize it so that it yields data useful to the core of their business."