On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare

Published in Nature Machine Intelligence, 2024

Recommended citation: Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner. On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare. Nature Machine Intelligence. Published August 2024.

Abstract

The performance of machine learning models depends heavily on the quality and characteristics of their training datasets. However, existing datasets often face issues related to fairness, privacy, and regulatory compliance. This paper presents a comprehensive framework for creating responsible machine learning datasets, emphasizing fairness, privacy protection, and adherence to regulatory norms. We discuss key considerations in dataset creation, including diversity, consent, anonymization, and legal compliance. The framework is illustrated through case studies in biometrics and healthcare, demonstrating its practical application in sensitive domains. Our work provides guidelines for researchers and practitioners to develop datasets that not only enhance model performance but also address ethical concerns and regulatory requirements in machine learning applications.

BibTeX:

@article{Mittal2024Responsible,
  title={On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare},
  author={Mittal, Surbhi and Thakral, Kartik and Singh, Richa and Vatsa, Mayank and Glaser, Tamar and Canton Ferrer, Cristian and Hassner, Tal},
  journal={Nature Machine Intelligence},
  year={2024},
  month={August},
  doi={10.1038/s42256-024-00874-y},
  url={https://www.nature.com/articles/s42256-024-00874-y}
}

The full paper is publicly available from Nature Machine Intelligence: https://www.nature.com/articles/s42256-024-00874-y