Discourse-Aware Computational Models for Fake News Detection

There are reliable cues for detecting deception online, and the belief that liars give off cues that may indicate their deception is near-universal. Furthermore, a fairly straightforward way to mitigate the risks of deceptive activities is to identify deceptive intentions. In the last few years, the number of web and social media users has grown. As a result, the potential for misinformation has also increased, resulting in the production of fake news, fake reviews (also known as opinion spam), deceptive discussions, simple lies, etc. Fake news detection is defined as predicting the likelihood that a particular news article is intentionally deceptive. Different models have been proposed to classify intentionally misstated (or manipulated) statements. The vast majority of these models rely on linguistic features, such as language complexity, part-of-speech tags, n-grams, and syntactic, semantic, and psycholinguistic features. Nevertheless, these models compare and contrast misleading versus credible content and do not remove the confounding factor of topic or narrative when training. Consequently, the models learn a clear topical separation between misleading and credible content.

To advance the investigation of fake news detection technologies, in this project we propose to analyze the distinctive discourse structures of fake and true news, with the aim of reaching a suitable definition of a "fake news" genre, and to put forward discourse-aware computational models for fake news detection on the web and social media.

Team

Francielle Vargas. Institute of Mathematics and Computer Sciences, University of São Paulo, Brazil
Zohar Rabinovich. Viterbi School of Engineering, University of Southern California, United States
Wolfgang Schmeisser. Department of Linguistics, University of Barcelona, Spain
Jonas D'Alessandro. Department of Linguistics, Federal University of Minas Gerais, Brazil
Fabrício Benevenuto. Computer Science Department, Federal University of Minas Gerais, Brazil
Thiago Pardo. Institute of Mathematics and Computer Sciences, University of São Paulo, Brazil

Publications

Discourse Annotation Guideline for Low-Resource Languages
Vargas, F., Schmeisser-Nieto, W., Rabinovich, Z., Pardo, T.A.S., Benevenuto, F.
Cambridge Natural Language Processing Journal. 2024. pp.1-33. (accepted)

Rhetorical Structure Approach for Online Deception Detection: A Survey
Vargas, F., D'Alessandro, J., Rabinovich, Z., Benevenuto, F., Pardo, T.A.S.
13th Conference on Language Resources and Evaluation (LREC 2022). pp.5906-5915. Marseille, France.

Towards Discourse-Aware Models for Multilingual Fake News Detection
Vargas, F., Benevenuto, F., Pardo, T.A.S.
Recent Advances in Natural Language Processing (RANLP 2021). pp.210-218. Held Online.

Resources

Dataset

Deceiver: Discourse-annotated dataset for multilingual fake news detection.

Survey

RST-Guideline: RST discourse annotation guideline.

Sponsorship