IMFD and CENIA student team wins first place in NLP competition DIPROMATS 2024 - TEST Millennium Institute Foundational Research on Data

July 2024. A group of four students led by Marcelo Mendoza, DCC UC academic and IMFD and Cenia researcher, won first place in DIPROMATS 2024, the NLP challenge on propaganda detection and narratives held as part of the Iberian Languages Evaluation Forum (IberLEF) 2024.

The team, composed of Miguel Fernandez (IMFD and DCC UC PhD student), Maximiliano Ojeda (IMFD and DCC UC PhD student), Lily Guevara (CENIA RL5 and USM Engineering) and Diego Varela (CENIA RL5 and USM Engineering), obtained the best results in one of the two categories of the competition, which consisted of developing systems capable of detecting and characterizing propagandistic content in tweets written in English or Spanish by authorities from the US, Europe, Russia and China.

Propaganda

The deceptive intent of propaganda may be less obvious and more damaging than that of disinformation. Its content need not be false, and its effects may only become perceptible over time. The abuse of propaganda in the information ecosystem manipulates public opinion, which can be seriously detrimental to the democratic system.

"Propagandistic content is understood as a message designed with premeditation to influence a specific audience: it is deliberately constructed and can effectively jeopardize democratic discussion on certain issues. For example, it can distort what foreign people really do in Chile, it can affect minorities, and that slowly erodes democracy," says Miguel Fernandez, IMFD and DCC UC doctoral student. That is why this phenomenon is of particular research interest and was selected by DIPROMATS for this competition.

"In this challenge we used techniques based on artificial intelligence and natural language processing to detect the use of persuasive language and propaganda techniques in text," explains Marcelo Mendoza, IMFD researcher.

Marcelo Mendoza, academic of the Department of Computer Science UC, researcher IMFD and Cenia.

"We trained a model based on Transformer technology, an architecture that allows machine learning models to understand text, and we also used techniques such as data augmentation," explains Diego Varela, from CENIA RL5 and USM Engineering.

Diego Varela (CENIA RL5 and USM Engineering)

These techniques make it possible to perform analyses that would otherwise be impossible, given the large amount of information that must be handled. "We as humans perhaps have very few resources, or not all of us have the resources, to be able to identify this type of phenomenon, but thanks to these technologies, we can achieve it," highlights Lily Guevara, from CENIA RL5 and USM Engineering.

Lily Guevara (CENIA RL5 and USM Engineering)

Transformers are useful for identifying patterns in text, and the way to identify propaganda is to examine how language behaves within the text: what comes before, what comes after, which words go together, the use of language as a whole. That is why it is one of the most widely used techniques today, explains Maximiliano Ojeda, IMFD and DCC UC.

DIPROMATS 2024 is a challenge organized by the Natural Language Processing and Information Retrieval Research Group of the National University of Distance Education of Spain (UNED), in which the research community defined new research challenges and proposed tasks to advance the state of the art in natural language processing (NLP).

The team will travel to present its solution at the event, to be held September 24-26 in Valladolid, Spain.
