Human-Centered Evaluation of Language Technologies

EMNLP 2024 Tutorial
Saturday, Nov 16, 14:00-17:30

Miami, Florida, USA

Overview

Evaluation is a cornerstone topic in NLP. However, many criticisms have been raised about the community's evaluation practices, including a lack of human-centered consideration of people's needs for language technologies and of the technologies' actual impact on people. This “evaluation crisis” is exacerbated by the recent development of large generative models with diverse and uncertain capabilities. This tutorial aims to inspire more human-centered evaluation in NLP by introducing perspectives and methodologies from the social sciences and from human-computer interaction (HCI), a field concerned primarily with the design and evaluation of technologies. The tutorial will begin with an overview of current NLP evaluation practices and their limitations, then introduce complementary perspectives from the social sciences and a “toolbox of evaluation methods” from HCI, accompanied by discussions of considerations such as what to evaluate for, how well results generalize to real-world contexts, and the pragmatic costs of conducting evaluations. The tutorial will also encourage reflection on how these HCI perspectives and methodologies can complement NLP evaluation, through Q&A discussions and a hands-on exercise.

Slides

Agenda

  • Motivation and Overview

  • Current Evaluation Practices in NLP

    Overview of Different Types of NLP Evaluation

    Concerns and Limitations

  • Evaluating Evaluations: Perspectives from the Social Sciences

  • Human-Centered Evaluation Methods in HCI

  • Example Evaluation of Language Technologies in HCI Research

    Evaluating Writing Assistance

    Evaluating Chatbots

  • Reflection, Conclusion, and Future Directions

  • Hands-on Group Exercise

Instructors