Francielle Vargas
Ph.D. Candidate in Computer Science - Natural Language Processing
I am a computer and language scientist with an M.Sc. and Ph.D. (final year) in Natural Language Processing. I obtained my M.Sc. in Computer Science and Computational Mathematics from the University of São Paulo (awarded in 2017). Previously, I obtained a B.S. in Computer Information Systems and a B.A. in Linguistics.
I am interested in Natural Language Processing, Machine Learning, and Computational Social Science. My research investigates safe, trustworthy, and socially responsible human language technologies. I rely on machine learning techniques, including neural networks, to design and guide the development of natural language systems. The topics that I am currently researching are:
Research Projects
- 2020-2024: Socially Responsible and Explainable Methods and Resources for Hate Speech Detection. Federal University of Minas Gerais & University of São Paulo, Brazil
- 2020-2023: Discourse-Aware Computational Models for Fake News Detection. Sinch & University of São Paulo, Brazil
- 2022-2022: Expanding Evaluation Data for the Multilingual Protest News Detection. Koç University, Turkey
- 2020-2020: Detecting Antisemitism on Social Media. Indiana University Bloomington, United States
Honors & Awards
- 2024: Latin America Research Awards (LARA). Google
- 2013: Outstanding Academic Achievement: Academic Relevance & Honorable Mention. UFMG
- 2012: Outstanding Academic Achievement: Academic Relevance. UFMG
Invited Speaker
- Conference cum Conclave on Emerging trends in Journalistic and Media Practices (Techclave 2024). DDGD Vaishnav College, Andhra Pradesh, India. [conference] [invitation] [talk]
- Conference on Harmful Online Communication (CHOC2023). Leibniz Institute for the Social Sciences (GESIS), Cologne, Germany. [conference] [programme] [talk]
Publications
2024
-
Context-Aware and Expert Data Resources for Brazilian Portuguese Hate Speech Detection
Vargas, F., Carvalho, I., Pardo, T.A.S., Benevenuto, F.
Natural Language Engineering Journal. Cambridge Core. pp.1-21. accepted
-
Discourse Annotation Guideline for Low-Resource Languages
Vargas, F., Schmeisser-Nieto, W., Rabinovich, Z., Pardo, T.A.S., Benevenuto, F.
Natural Language Engineering Journal. Cambridge Core. pp.1-33. accepted
-
Extended Multimodal Hate Speech Event Detection During Russia-Ukraine Crisis
Thapa, S., Rauniyar, K., Jafri, F. A., Veeramani, H., Jain, R., Jain, S., Vargas, F., Hürriyetoğlu, A., Naseem, U.
7th International Workshop Challenges and Applications of Automated Extraction of Socio-Political Events from Text (EACL). pp.221–228. St. Julians, Malta. see
2023
-
Predicting Sentence-Level Factuality of News and Bias of Media Outlets
Vargas, F., Jaidka, K., Pardo, T.A.S., Benevenuto, F.
Recent Advances in Natural Language Processing (RANLP). pp. 1197–1206. Varna, Bulgaria. see
-
Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?
Vargas, F., Carvalho, I., Hürriyetoğlu, A., Pardo, T.A.S., Benevenuto, F.
Recent Advances in Natural Language Processing (RANLP). pp. 1187–1196. Varna, Bulgaria. see
-
NoHateBrazil: A Brazilian Portuguese Text Offensiveness Analysis System
Vargas, F., Carvalho, I., Schmeisser-Nieto, W., Benevenuto, F., Pardo, T.A.S.
Recent Advances in Natural Language Processing (RANLP). pp.1180–1186. Varna, Bulgaria. see
-
Multimodal Hate Speech Detection
Thapa, S., Jafri, F. A., Hürriyetoğlu, A., Vargas, F., Lee, R. K., Naseem, U.
6th International Workshop Challenges and Applications of Automated Extraction of Socio-Political Events from Text (RANLP). pp.151-159. Varna, Bulgaria. see
2022
-
Rhetorical Structure Approach for Online Deception Detection: A Survey
Vargas, F., D'Alessandro, J., Rabinovich, Z., Benevenuto, F., Pardo, T.A.S.
13th Conference on Language Resources and Evaluation (LREC). pp.5906‑5915. Marseille, France. see
-
HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection
Vargas, F., Carvalho, I., Góes, F.R., Pardo, T.A.S., Benevenuto, F.
13th Conference on Language Resources and Evaluation (LREC). pp.7174–7183. Marseille, France. see
-
Studying Dishonest Intentions in Texts
Vargas, F., Pardo, T.A.S.
Deceptive AI. Springer, vol 1296. pp.166–178. see
-
Extended Multilingual Protest News Detection
Hurriyetoglu, A., Mutlu, O., San, F. D., Uca, O., Gurel, A. S., Radford, B., Dai, Y., Hettiarachchi, H., Stoehr, N., Nomoto, T., Slavcheva, M., Vargas, F., Javid, A., Beyhan, F., Yoruk, E.
5th International Workshop Challenges and Applications of Automated Extraction of Socio-Political Events from Text (EMNLP). pp.223–228. Abu Dhabi, United Arab Emirates. see
2021
-
Contextual-Lexicon Approach for Abusive Language Detection
Vargas, F., Góes, F.R., Carvalho, I., Benevenuto, F., Pardo, T.A.S.
Recent Advances in Natural Language Processing (RANLP). pp.1442-1451. Held Online. see
-
Toward Discourse-Aware Models for Multilingual Fake News Detection
Vargas, F., Benevenuto, F., Pardo, T.A.S.
Recent Advances in Natural Language Processing (RANLP). pp.210-218. Held Online. see
2020
-
Linguistic Rules for Fine-Grained Opinion Extraction
Vargas, F., Pardo, T.A.S.
5th International Workshop on Social Sensing: Special Edition on Narrative Analysis on Social Media (ICWSM). pp.1-6. Held Online. see
-
Identifying Fine-Grained Opinion and Classifying Polarity on Coronavirus Pandemic
Vargas, F., Santos, R.S.S., Rocha, P.R.
9th Brazilian Conference on Intelligent Systems (BRACIS). pp.511-520. Rio Grande, Brazil. see
2019 and before
-
Aspect Clustering Methods for Sentiment Analysis
Vargas, F., Pardo, T.A.S.
13th International Conference on the Computational Processing of Portuguese (PROPOR). pp.365-374. Canela, Brazil. see
-
The Coreference Annotation of the CSTNews Corpus
Pardo, T.A.S., Baptista, J., Duran, M.S., Nunes, M.G.V., Nóbrega, F.A.A., Aluísio, S.M., Di Felippo, A., Seno, E.R.M., Silva, R.R., Anchieta, R.T., Brum, H.B., Dias, M.S., Martins, R.S.O., Maziero, E.G., Souza, J.W.C., Vargas, F.
2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages (SEPLN). pp.102-112. Murcia, Spain. see
Committees
Organizing Committee
- International AAAI Conference on Web and Social Media (ICWSM 2023) (ICWSM 2022) (ICWSM 2021)
- Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (RANLP 2023) (EMNLP 2022)
Program Committee
- ACL Student Research Workshop (ACL 2023)
- Discourse and Pragmatics Track (ACL 2023)
- Computational Social Science and Cultural Analytics Track (ACL 2023)
- NAACL Student Research Workshop (NAACL 2024) (NAACL 2022)
- Discourse and Pragmatics Track (NAACL 2024)
- Computational Social Science and Cultural Analytics Track (NAACL 2024)
- NLP Applications Track (EMNLP 2022)
- Resources and Evaluation Track (EMNLP 2022)
- Computational Social Science and Cultural Analytics Track (EMNLP 2023) (EMNLP 2022)
- Offensive and Harmful Language Detection and Analysis Track (LREC-COLING 2024)
- Opinion & Argument Mining, Sentiment Analysis, Emotion Recognition/Generation (LREC-COLING 2024)
- Workshop on Online Abuse and Harms (NAACL 2024) (ACL 2023) (NAACL 2022)
- Fact Extraction and VERification (EACL 2023)
- Workshop on Computational Approaches to Discourse (EACL 2024) (ACL 2023) (COLING 2022)
- Workshop on Argument Mining and Workshop on Computational Models of Natural Argument (ICLP 2023) (COLING 2022) (COMMA 2022)
- Southern California Natural Language Processing Symposium (SoCal NLP 2022)
- Dataset and Demo Track (ICWSM 2023)
- Dataset and Demo Track (CIKM 2023)
- Digital and Social Media Track (HICSS 2024)
Journal Reviewer
- PLoS ONE (Since 2022 - Current)
Resources
- A Post-hoc Explanation Method by Stereotypes and Counter-Stereotypes to Assess Social Bias in Hate Speech Classifiers. to appear
- Optimized Bag-of-Words Model by Contextual Lexicon for Explainable Hate Speech Detection. to appear
- Sentence-Level News Credibility Prediction. to appear
- HateBR: Large-scale expert annotated corpus of Brazilian Instagram comments for abusive language detection
- FactNews: Sentence-level annotated corpus to predict factuality of news articles and bias of media outlets
- SentiAspect-pt: Aspect-based sentiment analysis annotated corpus of web consumer reviews
- OPCovidBR: Aspect-based sentiment analysis annotated corpus of Covid-19 tweets
- Deceiver: Multilingual RST-annotated corpus for fake news detection
- NoHateBrazil: A Brazilian Portuguese text offensiveness analysis system
- OPCluster: Automatic extraction and clustering of fine-grained opinions
- MOL: Multilingual offensive lexicon annotated with contextual information
- PRO: Taxonomies for aspect-based sentiment analysis
Teaching
- SCC5809-2021: Neural Networks and Deep Learning. Graduate Teaching Assistant. University of São Paulo. see
- SCC0605-2020: Computing Theory and Compilers. Graduate Teaching Assistant. University of São Paulo. see
- SCC0227-2016: Computer Seminars I. Graduate Teaching Assistant. University of São Paulo. see
Industry
- 2021-2023: Research Fellow. Sinch
- 2021-2021: Data Scientist. Cisco-Webex
- 2014-2015: System Analyst. Unisys
Links
- A Survey on Automated Fact-Checking. University of Cambridge
- A Survey on Hate Speech Detection using NLP. Saarland University
- Discourse and Pragmatics in NLP. Carnegie Mellon University
- Morality and Language in NLP. University of Southern California
- Computational Ethics for NLP. Carnegie Mellon University