Francielle Vargas

Ph.D. Candidate in Computer Science - Natural Language Processing

I am a computer and language scientist with an M.Sc. and a Ph.D. (candidate) in Natural Language Processing. My current mentors are Prof. Fabrício Benevenuto and Prof. Thiago A. S. Pardo. I also contribute as a researcher at the Social Computing Lab and the Interinstitutional Center for Computational Linguistics.

I am interested in Natural Language Processing and Computational Social Science. My research investigates safe, trustworthy, and socially responsible AI systems, as well as computational methods for discourse- and pragmatic-level language understanding and generation. I rely on machine learning techniques, including neural networks, to model and guide natural language system development. The topics that I am currently researching include:

  • fact-checking, fake news and media bias detection, misinformation
  • hate speech and abusive language detection, bias mitigation, fairness
  • opinion mining and sentiment analysis, stylistic analysis, argument mining
  • discourse- and pragmatics-aware computational methods and applications

I obtained my M.Sc. in Computer Science and Computational Mathematics from the University of São Paulo (awarded in 2017). As part of my Master's thesis, I developed a new method based on semantic relations and linguistic rules to automatically detect fine-grained opinions from user-generated content (UGC) for aspect-based sentiment analysis and summarization. Previously, I obtained a B.S. in Information Systems and a B.A. in Linguistics.




    Publications

    Journals
      • Vargas, F., Carvalho, I., Góes, F., Hürriyetoğlu, A., Pardo, T.A.S., Benevenuto, F. (2023). Socially Responsible Abusive Language Detection: Can Context Information Mitigate Stereotypes? Social Network Analysis and Mining Journal. pp.1-15. under review

      • Vargas, F., Carvalho, I., Góes, F., Pardo, T.A.S., Benevenuto, F. (2023). Context-Aware and Expert Data Resources for Brazilian Portuguese Hate Speech Detection. Natural Language Engineering Journal. Cambridge Core. pp.1-9. under review

      • Vargas, F., Schmeisser-Nieto, W., Rabinovich, Z. (2023). Discourse Annotation Guideline for Low-Resource Languages. Natural Language Engineering Journal. Cambridge Core. pp.1-56. under review

    Conferences
      • Vargas, F., Góes, F.R., Pardo, T.A.S., Benevenuto, F. (2023). Predicting Sentence-Level Factuality of News and Bias of Media Outlets. pp.1-5. under review

      • Vargas, F., Pardo, T.A.S., Benevenuto, F. (2023). FACTual: An Automated Fact-Checking and News Credibility System. pp.1-7. under review

      • Vargas, F., Carvalho, I., Góes, F.R., Schmeisser-Nieto, W., Pardo, T.A.S., Benevenuto, F. (2023). Towards A Fairness-Aware Text Offensiveness Analysis System. pp.1-7. under review

      • Vargas, F., D'Alessandro, J., Rabinovich, Z., Benevenuto, F., Pardo, T.A.S. (2022). Rhetorical Structure Approach for Online Deception Detection: A Survey. 13th Conference on Language Resources and Evaluation (LREC). pp.5906-5915. Marseille, France.

      • Vargas, F., Carvalho, I., Góes, F.R., Pardo, T.A.S., Benevenuto, F. (2022). HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Abusive Language Detection. 13th Conference on Language Resources and Evaluation (LREC). pp.7174–7183. Marseille, France.

      • Vargas, F., Góes, F.R., Carvalho, I., Pardo, T.A.S., Benevenuto, F. (2021). Contextual-Lexicon Approach for Abusive Language Detection. 13th Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications (RANLP). pp.1442-1451. Held Online.

      • Vargas, F., Benevenuto, F., Pardo, T.A.S. (2021). Towards Discourse-Aware Models for Multilingual Fake News Detection. 13th Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications - Student Research Workshop (RANLP). pp.210-218. Held Online.

      • Vargas, F., Santos, R.S.S., F., Rocha, P.R. (2020). Identifying Fine-Grained Opinion and Classifying Polarity on Coronavirus Pandemic. 9th Brazilian Conference on Intelligent Systems (BRACIS). pp.511-520. Rio Grande, Brazil.

      • Vargas, F., Pardo, T.A.S. (2018). Aspect Clustering Methods for Sentiment Analysis. 13th International Conference on the Computational Processing of Portuguese (PROPOR). pp.365-374. Canela, Brazil.

    Workshops
      • Vargas, F., Guimarães, S., Lamine, M. D., Benevenuto, F. (2023). Abusive Language Detection in African Indigenous Languages: A Multilingual Dataset and Translation-Emotion Approach. The 7th Workshop on Online Abuse and Harms. pp.1-10. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL). pp.1-9. Toronto, Canada. under review

      • Hurriyetoglu, A., Mutlu, O., San, F. D., Uca, O., Gurel, A. S., Radford, B., Dai, Y., Hettiarachchi, H., Stoehr, N., Nomoto, T., Slavcheva, M., Vargas, F. A., Javid, A., Beyhan, F., Yoruk, E. (2022). Extended Multilingual Protest News Detection. The 5th International Workshop Challenges and Applications of Automated Extraction of Socio-Political Events from Text. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp.223–228. Abu Dhabi, United Arab Emirates.

      • Vargas, F., Pardo, T.A.S. (2020). Linguistic Rules for Fine-Grained Opinion Extraction. The 5th International Workshop on Social Sensing: Special Edition on Narrative Analysis on Social Media. Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM). pp.1-6. Held Online.

      • Pardo, T.A.S., Baptista, J., Duran, M.S., Nunes, M.G.V., Nóbrega, F.A.A., Aluísio, S.M., Di Felippo, A., Seno, E.R.M., Silva, R.R., Anchiêta, R.T., Brum, H.B., Dias, M.S., Martins, R.S.O., Maziero, E.G., Souza, J.W.C., Vargas, F. (2017). The Coreference Annotation of the CSTNews Corpus. The 2nd Workshop on Evaluation of Human Language Technologies for Iberian Language. Proceedings of the 33rd Conference of the Spanish Society for Natural Language Processing (SEPLN). pp.102-112. Murcia, Spain.


    Committees

    Organizing Committee
    Program Committee
        ACL
      • Discourse and Pragmatics Track (2023)
      • Computational Social Science and Cultural Analytics Track (2023)
        EMNLP
      • Discourse and Pragmatics Track (2022)
      • Computational Social Science and Cultural Analytics Track (2022)
        NAACL
      • NAACL Student Research Workshop (2022)
        ICWSM
      • 17th International AAAI Conference on Web and Social Media (2023)
        SoCal NLP
      • 5th Southern California Natural Language Processing Symposium (2022)
        WOAH
      • 7th Workshop on Online Abuse and Harms (ACL 2023)
      • 6th Workshop on Online Abuse and Harms (NAACL 2022)
        FEVER
      • 6th Fact Extraction and VERification (EACL 2023)
        CODI
      • 4th Workshop on Computational Approaches to Discourse (ACL 2023)
      • 3rd Workshop on Computational Approaches to Discourse (COLING 2022)
        ARGMINING
      • 10th Workshop on Argument Mining (EMNLP 2023)
      • 9th Workshop on Argument Mining (COLING 2022)
      • 1st Workshop on Argumentation & Machine Learning (COMMA 2022)
      • 22nd Workshop on Computational Models of Natural Argument (COMMA 2022)
        CASE
      • 6th International Workshop Challenges and Applications of Automated Extraction of Socio-Political Events from Text (RANLP 2023)
      • 5th International Workshop Challenges and Applications of Automated Extraction of Socio-Political Events from Text (EMNLP 2022)
      • 4th International Workshop Challenges and Applications of Automated Extraction of Socio-Political Events from Text (ACL 2021)
    Journal Reviewer

    Resources

    Datasets
    • HateBR: Large-scale expert annotated dataset of Brazilian Instagram comments for abusive language detection.
    • TEAR: Multilingual dataset of hateful comments for West African Indigenous languages.
    • AspectBR: Aspect-based sentiment analysis annotated dataset of web consumer reviews.
    • OPCovidBR: Aspect-based sentiment analysis annotated dataset of Covid-19 tweets.
    • FactNews: Sentence-level annotated dataset to predict factuality and media bias.
    • Deceiver: Multilingual discourse-annotated dataset for fake news detection.
    Software
    Lexicons
    • MOL: Multilingual offensive lexicon annotated with contextual information.
    • PRO: Taxonomies for aspect-based sentiment analysis.

    Teaching


    Projects


    Industry

    • 2021-2023: Research Fellow. Sinch
    • 2021: Data Scientist. Cisco-Webex
    • 2014-2015: System Analyst. Unisys

    Awards