Human-Centered Evaluation of Language Technologies

EMNLP 2024 Tutorial
Saturday, Nov 16, 14:00-17:30

Miami, Florida, USA

Overview

Evaluation is a cornerstone topic in NLP. However, many criticisms have been raised about the community's evaluation practices, including a lack of human-centered consideration of people's needs for language technologies and of the technologies' actual impact on people. This “evaluation crisis” is exacerbated by the recent development of large generative models with diverse and uncertain capabilities. This tutorial aims to inspire more human-centered evaluation in NLP by introducing perspectives and methodologies from the social sciences and from human-computer interaction (HCI), a field concerned primarily with the design and evaluation of technologies. The tutorial will begin with an overview of current NLP evaluation practices and their limitations, then introduce complementary perspectives from the social sciences and a “toolbox of evaluation methods” from HCI, accompanied by discussions of considerations such as what to evaluate for, how well results generalize to real-world contexts, and the pragmatic costs of conducting evaluations. The tutorial will also encourage reflection on how these HCI perspectives and methodologies can complement NLP evaluation, through Q&A discussions and a hands-on exercise.

Slides

Agenda

  • Motivation and Overview

  • Current Evaluation Practices in NLP

    Overview of Different Types of NLP Evaluation

    Concerns and Limitations

  • Evaluating Evaluations: Perspectives from the Social Sciences

  • Human-Centered Evaluation Methods in HCI

  • Example Evaluation of Language Technologies in HCI Research

    Evaluating Writing Assistance

    Evaluating Chatbots

  • Reflection, Conclusion, and Future Directions

  • Hands-on Group Exercise

Instructors