On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare
Published in Nature Machine Intelligence, 2024
Recommended citation: Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner. On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare. Nature Machine Intelligence. Published August 2024.
Abstract
The performance of machine learning models depends heavily on the quality and characteristics of their training datasets. However, existing datasets often face issues related to fairness, privacy, and regulatory compliance. This paper presents a comprehensive framework for creating responsible machine learning datasets, emphasizing fairness, privacy protection, and adherence to regulatory norms. We discuss key considerations in dataset creation, including diversity, consent, anonymization, and legal compliance. The framework is illustrated through case studies in biometrics and healthcare, demonstrating its practical application in sensitive domains. Our work provides guidelines for researchers and practitioners to develop datasets that not only enhance model performance but also address ethical concerns and regulatory requirements in machine learning applications.
Links:
Full paper, available publicly from Nature Machine Intelligence