Yongkai Wu

Anti-discrimination learning - SBP 2017 Tutorial

07/08/2017

TL;DR

This tutorial points out the limitations of current association-based approaches, introduces a causal modeling-based framework for anti-discrimination learning, and suggests directions for future research.

Abstract

Anti-discrimination learning is an increasingly important task in data mining and machine learning. Discrimination discovery is the problem of unveiling discriminatory practices by analyzing a dataset of historical decision records, while discrimination prevention aims to remove discrimination by modifying the biased data, the predictive algorithms, or both. Discrimination is causal: to prove discrimination, one needs to establish a causal relationship rather than a mere association. Although it is well known that association does not imply causation, many researchers pay too little attention to the gap between the two. This tutorial points out the limitations of current association-based approaches, introduces a causal modeling-based framework for anti-discrimination learning, and suggests directions for future research.
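The gap between association and causation can be illustrated with a small sketch. The counts below are entirely hypothetical (they come neither from the tutorial nor from any real dataset), and the names `records`, `admission_rate`, and `risk_difference` are chosen here for illustration only. The sketch computes a simple association measure, the difference in admission rates between two groups, first over all records and then within each department.

```python
# Hypothetical admission counts, invented for illustration only:
# (department, gender) -> (number of applicants, number admitted)
records = {
    ("A", "male"): (800, 480),
    ("A", "female"): (200, 120),
    ("B", "male"): (200, 40),
    ("B", "female"): (800, 160),
}


def admission_rate(gender, department=None):
    """Admission rate for a gender, optionally restricted to one department."""
    applicants = 0
    admitted = 0
    for (dept, g), (n, k) in records.items():
        if g == gender and (department is None or dept == department):
            applicants += n
            admitted += k
    return admitted / applicants


def risk_difference(department=None):
    """Male admission rate minus female admission rate (an association measure)."""
    return admission_rate("male", department) - admission_rate("female", department)


# Association over the whole dataset.
overall_gap = risk_difference()

# The same measure computed separately inside each department.
gap_by_department = {dept: risk_difference(dept) for dept in ("A", "B")}

print("overall gap:", round(overall_gap, 2))
for dept, gap in gap_by_department.items():
    print("gap in department", dept, ":", round(gap, 2))
```

In this made-up data the overall gap is about 0.24, yet the gap inside each department is zero, because the two groups apply to the departments in different proportions. The association measure alone cannot say which reading is right: whether the path through department choice counts as discrimination depends on the assumed causal structure, which is the kind of question a causal modeling-based framework is meant to address.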

Importance and Target Audience

Discrimination discovery and prevention have become an active research area in data science because of growing concern that data analytic technologies could be used to treat certain groups unfairly, such as customers, employees, tenants, or recipients of credit. In 2014, U.S. President Obama called for a 90-day review of data collection and analysis practices. An important conclusion of the resulting report [1] is that "Big data technologies can cause societal harms beyond damages to privacy, such as discrimination against individuals and groups". In May 2016, the Executive Office of the President recommended to "support research into mitigating algorithmic discrimination, building systems that support fairness and accountability, and developing strong data ethics frameworks" [2]. Ensuring non-discrimination in social computing and in behavioral modeling and prediction is therefore an important and challenging topic.

The tutorial targets researchers interested in studying, from a causal modeling perspective, how to discover and prevent discrimination caused by data mining and machine learning algorithms. The audience is assumed to be familiar with the fundamental concepts of data mining and machine learning, especially predictive learning. No special software or hardware is required.

Outline

The tutorial is organized into six parts: an introduction, a literature review, three main technical parts, and a conclusion. It is based in part on our position paper [3].

1. Introduction, motivation and challenges
   - Legal definitions and principles of discrimination
   - Motivation of anti-discrimination learning
   - Challenges in discrimination discovery and prevention
2. Association-based anti-discrimination literature review
   - Approaches for discrimination discovery
   - Approaches for discrimination prevention
   - Gap between association and causation
3. Causal modeling-based anti-discrimination framework
   - Causal modeling background
   - Discrimination categorization and anti-discrimination framework overview
4. System-level discrimination discovery and removal
   - Modeling of direct and indirect discrimination
   - Quantitative discrimination criterion
   - Discrimination removal algorithms
   - Ensuring non-discrimination in prediction
5. Group and individual-level discrimination
   - Approach for group-level direct discrimination
   - Approach for individual-level direct discrimination
6. Challenges and directions for future research

Tutors' Short Bio

Dr. Lu Zhang is a postdoctoral researcher in the Department of Computer Science and Computer Engineering at the University of Arkansas. He received a BEng degree in computer science and engineering from the University of Science and Technology of China in 2008 and a PhD degree in computer science from Nanyang Technological University, Singapore, in 2013. His research interests include data mining algorithms, discrimination-aware data mining, and causal inference.
Yongkai Wu is a PhD student in the Department of Computer Science and Computer Engineering at the University of Arkansas.
Dr. Xintao Wu is a Professor in the Department of Computer Science and Computer Engineering at the University of Arkansas. His major research interests include data privacy, bioinformatics, and discrimination-aware data mining.

References

[1] Big data: Seizing opportunities, preserving values. White House (2014).
[2] Munoz, C., Smith, M., Patil, D.: Big data: A report on algorithmic systems, opportunity, and civil rights. Executive Office of the President (2016).
[3] Zhang, L., Wu, X.: Anti-discrimination learning: a causal modeling-based framework. JDSA, to appear.