Francielle Vargas

Ph.D. Candidate in Computer Science - Natural Language Processing

I am a computer and language scientist specializing in Natural Language Processing, with an M.Sc. and an ongoing Ph.D. in the field. I obtained my M.Sc. in Computer Science and Computational Mathematics from the University of São Paulo in 2017. In my master's thesis, I developed commonsense reasoning methods for aspect-based opinion mining and summarization. Previously, I obtained a B.S. in Information Systems and a B.A. in Linguistics.

I am interested in Natural Language Processing, Machine Learning, and Computational Social Science. My research investigates safe, trustworthy, and socially responsible AI systems, as well as computational methods for discourse- and pragmatic-level language understanding and generation. I use machine learning techniques, including neural networks, to model language and guide the development of natural language systems. The topics I am currently researching include:

  • fact-checking, fake news and media bias detection, and misinformation
  • hate speech and abusive language detection, bias mitigation, and fairness
  • opinion mining and sentiment analysis, stylistic analysis, and argument mining
  • discourse- and pragmatics-aware computational methods and applications

    Publications

    Preprints
      • TEAR: A Hate Speech Dataset of Hausa Facebook Comments
        Vargas, F., Alves, D., Guimarães, S., Hassan, S., Lamine, D. M., Benevenuto, F.
        pp.1-9.

      • Discourse Annotation Guideline for Low-Resource Languages
        Vargas, F., Schmeisser-Nieto, W., Rabinovich, Z., Pardo, T.A.S., Benevenuto, F.
        Natural Language Engineering Journal. pp.1-60.

      • Your Stereotypical Mileage May Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts
        Karën Fort, Laura Alonso Alemany, Jonathan Baum, Luciana Benotti, Julien Bezançon, Claudia Borg, Marthese Borg, Yongjian Chen, Fanny Ducel, Yoann Dupont, Guido Ivetta, Zhijian Li, Margot Mieskes, Marco Naguib, Aurélie Névéol, Yuyan Qian, Matteo Radaelli, Wolfgang Sebastian Schmeisser-Nieto, Emma Raimundo Schulz, Thiziri Saci, Sarah Saidi, Javier Torroba Marchante, Francielle Vargas, Shilin Xie, Sergio E. Zanotto
        pp.1-9.

    2023
      • Predicting Sentence-Level Factuality of News and Bias of Media Outlets
        Vargas, F., Jaidka, K., Pardo, T.A.S., Benevenuto, F.
        Recent Advances in Natural Language Processing (RANLP). pp.1-10. Varna, Bulgaria. accepted

      • Context-Aware and Expert Data Resources for Brazilian Portuguese Hate Speech Detection
        Vargas, F., Carvalho, I., Pardo, T.A.S., Benevenuto, F.
        Natural Language Engineering Journal. pp.1-21. accepted

      • Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?
        Vargas, F., Carvalho, I., Hürriyetoğlu, A., Pardo, T.A.S., Benevenuto, F.
        Recent Advances in Natural Language Processing (RANLP). pp.1-10. Varna, Bulgaria. accepted

      • NoHateBrazil: A Brazilian Portuguese Text Offensiveness Analysis System
        Vargas, F., Carvalho, I., Schmeisser-Nieto, W., Benevenuto, F., Pardo, T.A.S.
        Recent Advances in Natural Language Processing (RANLP). pp.1-7. Varna, Bulgaria. accepted

      • Multimodal Hate Speech Detection
        Thapa, S., Jafri, F. A., Hürriyetoğlu, A., Vargas, F., Lee, R. K., Naseem, U.
        6th International Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text (RANLP). pp.151-159. Varna, Bulgaria. accepted

    2022
      • Rhetorical Structure Approach for Online Deception Detection: A Survey
        Vargas, F., D'Alessandro, J., Rabinovich, Z., Benevenuto, F., Pardo, T.A.S.
        13th Conference on Language Resources and Evaluation (LREC). pp.5906-5915. Marseille, France.

      • HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Abusive Language Detection
        Vargas, F., Carvalho, I., Góes, F.R., Pardo, T.A.S., Benevenuto, F.
        13th Conference on Language Resources and Evaluation (LREC). pp.7174-7183. Marseille, France.

      • Studying Dishonest Intentions in Brazilian Portuguese Texts
        Vargas, F., Pardo, T.A.S.
        Deceptive AI. Springer, vol. 1296. pp.166-178.

      • Extended Multilingual Protest News Detection
        Hurriyetoglu, A., Mutlu, O., San, F. D., Uca, O., Gurel, A. S., Radford, B., Dai, Y., Hettiarachchi, H., Stoehr, N., Nomoto, T., Slavcheva, M., Vargas, F., Javid, A., Beyhan, F., Yoruk, E.
        5th International Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text (EMNLP). pp.223-228. Abu Dhabi, United Arab Emirates.

    2021
      • Contextual-Lexicon Approach for Abusive Language Detection
        Vargas, F., Góes, F.R., Carvalho, I., Benevenuto, F., Pardo, T.A.S.
        Recent Advances in Natural Language Processing (RANLP). pp.1442-1451. Held online.

      • Towards Discourse-Aware Models for Multilingual Fake News Detection
        Vargas, F., Benevenuto, F., Pardo, T.A.S.
        Recent Advances in Natural Language Processing (RANLP). pp.210-218. Held online.

      • Implicit Opinion Aspect Clues in Portuguese Texts: Analysis and Categorization
        Machado, M. T., Pardo, T.A.S., Ruiz, E. E. S., Di Felippo, A., Vargas, F.
        15th International Conference on the Computational Processing of Portuguese (PROPOR). pp.68-78. Fortaleza, Brazil.

    2020
      • Linguistic Rules for Fine-Grained Opinion Extraction
        Vargas, F., Pardo, T.A.S.
        5th International Workshop on Social Sensing: Special Edition on Narrative Analysis on Social Media (ICWSM). pp.1-6. Held online.

      • Identifying Fine-Grained Opinion and Classifying Polarity on Coronavirus Pandemic
        Vargas, F., Santos, R.S.S., Rocha, P.R.
        9th Brazilian Conference on Intelligent Systems (BRACIS). pp.511-520. Rio Grande, Brazil.

    2019 and before
      • Aspect Clustering Methods for Sentiment Analysis
        Vargas, F., Pardo, T.A.S.
        13th International Conference on the Computational Processing of Portuguese (PROPOR). pp.365-374. Canela, Brazil.

      • The Coreference Annotation of the CSTNews Corpus
        Pardo, T.A.S., Baptista, J., Duran, M.S., Nunes, M.G.V., Nóbrega, F.A.A., Aluísio, S.M., Di Felippo, A., Seno, E.R.M., Silva, R.R., Anchieta, R.T., Brum, H.B., Dias, M.S., Martins, R.S.O., Maziero, E.G., Souza, J.W.C., Vargas, F.
        2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages (SEPLN). pp.102-112. Murcia, Spain.


    Committees

    Organizing Committee
    Program Committee
    Journal Reviewer

    Resources

    Datasets
    • HateBR: Large-scale expert annotated dataset of Brazilian Instagram comments for abusive language detection
    • TEAR: A hate speech dataset of Facebook comments in Hausa, an African Indigenous language
    • AspectBR: Aspect-based sentiment analysis annotated dataset of web consumer reviews
    • OPCovidBR: Aspect-based sentiment analysis annotated dataset of Covid-19 tweets
    • FactNews: Sentence-level annotated dataset for predicting factuality and media bias
    • Deceiver: Multilingual discourse-annotated dataset for fake news detection
    Software
    Lexicons
    • MOL: Multilingual offensive lexicon annotated with contextual information
    • PRO: Taxonomies for aspect-based sentiment analysis

    Teaching


    Projects


    Honors