Building Evaluation Benchmarks with Moral Rationales

The MFTCXplain project involves a diverse team of social psychologists, computer scientists, and linguists from Brazil, Australia, Germany, the Netherlands, and the United States (Los Angeles, Washington, and Portland). We have selected annotators from various cultural backgrounds. This cultural diversity enables the team to measure biases and subjectivity across regions, offering insights into both intra-national and cross-national differences. Additionally, the dataset will facilitate comparisons of models trained in different cultural contexts, revealing how moral, political, and cultural values influence the perception, spread, and detection of hate speech, misinformation, and propaganda.

Previous research has established a link between morality and shared content in digital environments, as well as between morality and hate speech (Kennedy et al., 2023), underscoring the need for a detailed analysis of the moral framing in election propaganda and hate speech. Proposed and initiated by Francielle Vargas, a computer scientist at the University of São Paulo, and Jackson Trager, a social psychologist at the University of Southern California, this project sits at the intersection of sociotechnical systems, AI and society, election studies, and culturally sensitive AI. The project aims to make significant contributions to these fields through representative data collection and explainability in AI models, with a particular focus on AI safety and transparency in cross-cultural contexts.

Key Terms: Natural Language Processing; Sociotechnical Systems; AI and Society; Election Studies; Human-Centered AI; Representative Data Collection; Culturally Sensitive AI; Misinformation, Hate Speech, and Propaganda; Explainability in AI Models; Fairness and Bias Mitigation.

References

Brendan Kennedy, Preni Golazizian, Jackson Trager, Mohammad Atari, Joe Hoover, Aida Mostafazadeh Davani, and Morteza Dehghani (2023). The (moral) language of hate. PNAS Nexus, 2(7).
Joe Hoover, Gwenyth Portillo-Wightman, Leigh Yeh, Shreya Havaldar, Aida Mostafazadeh Davani, Ying Lin, Brendan Kennedy, Mohammad Atari, Zahra Kamel, Madelyn Mendlen, Gabriela Moreno, Christina Park, Tingyee E. Chang, Jenna Chin, Christian Leong, Jun Yen Leung, Arineh Mirinjian, and Morteza Dehghani (2020). Moral Foundations Twitter Corpus: A Collection of 35k Tweets Annotated for Moral Sentiment. Social Psychological and Personality Science, 11(8), 1057-1071.

Principal Investigators

Francielle Vargas. Institute of Mathematical and Computer Sciences, University of São Paulo, Brazil
Jackson Trager. Department of Psychology, University of Southern California, USA

Researchers

Diego Alves. Department of Language Science and Technology, Saarland University, Germany

Flor Plaza. Institute of Advanced Computer Science, Leiden University, Netherlands

Matteo Guida. School of Computing and Information Systems, University of Melbourne, Australia

Mikel Ngueajio. Department of Computer Science, Howard University, USA

Surendrabikram Thapa. Department of Computer Science, Virginia Tech, USA

Yalda Daryani. Department of Psychology, University of Southern California, USA

Farzan Karimi-Malekabadi. Department of Psychology, University of Southern California, USA

Ameeta Agrawal. College of Engineering and Computer Science, Portland State University, USA

Morteza Dehghani. Department of Psychology and Computer Science, University of Southern California, USA

Publications

MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Hate Speech Multi-hop Explanation
Jackson Trager, Diego Alves, Matteo Guida, Mikel Ngueajio, Ameeta Agrawal, Flor Plaza-del-Arco, Yalda Daryani, Farzan Karimi-Malekabadi, Francielle Vargas
Under Review. pp. 1-24. 2025.

Resources

Dataset