Francielle Vargas

I am a computer and language scientist with an M.Sc. and a Ph.D. (final year) in Natural Language Processing. During my Ph.D., I was a visiting researcher at the University of Southern California (USC) in the USA and an invited speaker at the Leibniz Institute for the Social Sciences (GESIS) in Germany. I received my M.Sc. in Computer Science and Computational Mathematics from the University of São Paulo (2017). Previously, I obtained a B.S. in Computer Information Systems and a B.A. in Linguistics. I am interested in Natural Language Processing, Machine Learning, and Computational Social Science. My research focuses on improving the explainability, robustness, and fairness of large-scale language models, mainly for misinformation and hate speech applications. To this end, I use machine learning techniques, including neural networks, to design and guide the development of safer, trustworthy, and responsible human language technologies.

2020: Detecting Antisemitism on Social Media. Indiana University Bloomington, United States

Awards & Honors

Google Latin America Research Award (LARA 2024)
NAACL Diversity and Inclusion Award (NAACL 2024)
Outstanding Academic Project and Honorable Mention (UFMG 2013)
Outstanding Academic Project and Honorable Mention (UFMG 2012)

Invited Talks

04 / 2024 [Invited Researcher]: Language and Morality Lab. University of Southern California (USC). Los Angeles, California.
Talk: Fighting Misinformation and Polarization: Socially Responsible and Explainable Language Technologies for Fact-Checking and Hate Speech Detection.

02 / 2024 [Keynote Speaker]: Conference cum Conclave on Emerging trends in Journalistic and Media Practices. DG Vaishnav College (DDGDV). Chennai, India.
Talk: Predicting Sentence-Level News Source Reliability for Fact-Checking.

11 / 2023 [Keynote Speaker]: Conference on Harmful Online Communication. Leibniz Institute for the Social Sciences (GESIS). Cologne, Germany.
Talk: Countering Harmful Online Communication in Brazil: Predicting Fine-Grained Factuality of News and Offensive Context of Social Media Comments.

Publications

2024

Context-Aware and Expert Data Resources for Brazilian Portuguese Hate Speech Detection
Francielle Vargas, Isabelle Carvalho, Thiago A.S. Pardo, Fabrício Benevenuto
Natural Language Engineering Journal. Cambridge Core. pp. 1–23. To appear.

Discourse Annotation Guideline for Low-Resource Languages
Francielle Vargas, Wolfgang Schmeisser-Nieto, Zohar Rabinovich, Thiago A.S. Pardo, Fabrício Benevenuto
Natural Language Engineering Journal. Cambridge Core. pp. 1–44. To appear.

HausaHate: An Expert Annotated Corpus for Hausa Hate Speech Detection
Francielle Vargas, Samuel Guimarães, Shamsuddeen H. Muhammad, Diego Alves, Ibrahim Said Ahmad, Idris Abdulmumin, Diallo Mohamed, Thiago Pardo, Fabrício Benevenuto
8th Workshop on Online Abuse and Harms (WOAH @ NAACL 2024). pp. 52–58. Mexico City, Mexico.

Extended Multimodal Hate Speech Event Detection During Russia-Ukraine Crisis
Surendrabikram Thapa, Kritesh Rauniyar, Farhan Jafri, Hariram Veeramani, Raghav Jain, Sandesh Jain, Francielle Vargas, Ali Hürriyetoğlu, Usman Naseem
7th International Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text (CASE @ EACL 2024). pp. 221–228. St. Julian's, Malta.

2023

Predicting Sentence-Level Factuality of News and Bias of Media Outlets
Francielle Vargas, Kokil Jaidka, Thiago A.S. Pardo, Fabrício Benevenuto
Recent Advances in Natural Language Processing (RANLP 2023). pp. 1197–1206. Varna, Bulgaria.

Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?
Francielle Vargas, Isabelle Carvalho, Ali Hürriyetoğlu, Thiago A.S. Pardo, Fabrício Benevenuto
Recent Advances in Natural Language Processing (RANLP 2023). pp. 1187–1196. Varna, Bulgaria.

NoHateBrazil: A Brazilian Portuguese Text Offensiveness Analysis System
Francielle Vargas, Isabelle Carvalho, Wolfgang Schmeisser-Nieto, Fabrício Benevenuto, Thiago A.S. Pardo
Recent Advances in Natural Language Processing (RANLP 2023). pp. 1180–1186. Varna, Bulgaria.

Multimodal Hate Speech Detection
Surendrabikram Thapa, Farhan Jafri, Ali Hürriyetoğlu, Francielle Vargas, Roy Ka-Wei Lee, Usman Naseem
6th International Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text (CASE @ RANLP 2023). pp. 151–159. Varna, Bulgaria.

2022

HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection
Francielle Vargas, Isabelle Carvalho, Fabiana R. Góes, Thiago A.S. Pardo, Fabrício Benevenuto
13th Conference on Language Resources and Evaluation (LREC 2022). pp. 7174–7183. Marseille, France.

Rhetorical Structure Approach for Online Deception Detection: A Survey
Francielle Vargas, Jonas D'Alessandro, Zohar Rabinovich, Fabrício Benevenuto, Thiago A.S. Pardo
13th Conference on Language Resources and Evaluation (LREC 2022). pp. 5906–5915. Marseille, France.

Studying Dishonest Intentions in Brazilian Portuguese Texts
Francielle Vargas, Thiago A.S. Pardo
Deceptive AI. Springer International Publishing: Communications in Computer and Information Science, vol. 1296. pp. 166–178.

Extended Multilingual Protest News Detection
Ali Hürriyetoğlu, Osman Mutlu, Fırat Duruşan, Onur Uca, Alaeddin Gürel, Benjamin J. Radford, Yaoyao Dai, Hansi Hettiarachchi, Niklas Stoehr, Tadashi Nomoto, Milena Slavcheva, Francielle Vargas, Aaqib Javid, Erdem Yörük
5th International Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text (CASE @ EMNLP 2022). pp. 223–228. Abu Dhabi, United Arab Emirates.

2021

Contextual-Lexicon Approach for Abusive Language Detection
Francielle Vargas, Fabiana R. Góes, Isabelle Carvalho, Fabrício Benevenuto, Thiago A.S. Pardo
Recent Advances in Natural Language Processing (RANLP 2021). pp. 1442–1451. Held Online.

Toward Discourse-Aware Models for Multilingual Fake News Detection
Francielle Vargas, Fabrício Benevenuto, Thiago A.S. Pardo
Recent Advances in Natural Language Processing (RANLP 2021). pp. 210–218. Held Online.

Implicit Opinion Aspect Clues in Portuguese Texts: Analysis and Categorization
Mateus Tarcinalli Machado, Thiago A.S. Pardo, Evandro Eduardo Seron Ruiz, Ariani Di Felippo, Francielle Vargas
15th International Conference on the Computational Processing of Portuguese (PROPOR 2021). pp. 68–78. Fortaleza, Brazil.

2020 and before

Linguistic Rules for Fine-Grained Opinion Extraction
Francielle Vargas, Thiago A.S. Pardo
5th International Workshop on Social Sensing: Special Edition on Narrative Analysis on Social Media (SocialSens @ ICWSM 2020). pp. 1–6. Held Online.

Identifying Fine-Grained Opinion and Classifying Polarity on Coronavirus Pandemic
Francielle Vargas, Rodolfo Sanches Saraiva Dos Santos, Pedro Regattieri Rocha
9th Brazilian Conference on Intelligent Systems (BRACIS 2020). pp. 511–520. Rio Grande, Brazil.

Aspect Clustering Methods for Sentiment Analysis
Francielle Vargas, Thiago A.S. Pardo
13th International Conference on the Computational Processing of Portuguese (PROPOR 2018). pp. 365–374. Canela, Brazil.

Committees

Organizing Committee

ICWSM (2021) (2022) (2023)
CASE (EMNLP 2022) (RANLP 2023)

Program Committee
Natural Language Processing

ACL (2023)
NAACL (2022) (2024)
EMNLP (2022) (2023) (2024)
LREC (2024)
COLING (2025)
WOAH (NAACL 2022) (ACL 2023) (NAACL 2024)
FEVER (EACL 2023) (EMNLP 2024)
CODI (COLING 2022) (ACL 2023) (EACL 2024)
ARGMINING-CMNA (COMMA 2022) (COLING 2022) (ICLP 2023)
SoCal NLP (2022)

Computational Social Science and Data Science

ICWSM (2023) (2024) (2025)
CIKM (2023) (2024)
HICSS (2024)

Resources

Automated Methods

A Post-hoc Explanation Method by Stereotypes and Counter-Stereotypes to Assess Social Bias in Hate Speech Classifiers.
Optimized Bag-of-Words Model by Contextual Lexicon for Explainable Hate Speech Detection.
Sentence-Level News Source Reliability Prediction.

Datasets

HateBR: Large-scale expert annotated dataset of Brazilian Instagram comments for abusive language detection
HausaHate: An expert hate speech dataset of Facebook comments for the Hausa African Indigenous language
FactNews: Sentence-level annotated dataset to predict factuality of news articles and bias of media outlets
SentiAspect-pt: Aspect-based sentiment analysis annotated dataset of web consumer reviews
OPCovidBR: Aspect-based sentiment analysis annotated dataset of Covid-19 tweets
Deceiver: Multilingual RST-annotated dataset for fake news detection

Software