Many decisions in our lives, from the setting of insurance rates to the provision of health care, are now made by information systems that rely on statistical inference and learning. As automated decision-making spreads, debate among scholars and policy-makers has intensified, because most systems trained to make decisions inherit the biases embedded in their training data. Some of these issues might be alleviated by making the automated decision-maker blind to certain attributes, yet this remains a challenge when other attributes are correlated with the protected ones.
The main goal is to make fair decisions that are not biased against particular groups in the population. Two concepts are central to this goal: group fairness and individual fairness. Group fairness, also referred to as statistical parity, requires that, in aggregate, the proportion of members of a protected group receiving a given classification (positive or negative) matches the proportion in the population as a whole.
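As a concrete illustration, statistical parity can be checked by comparing positive-decision rates; the following minimal sketch assumes binary decisions and a binary protected-group indicator (the function and variable names are illustrative, not from any particular library):

```python
def statistical_parity_gap(decisions, protected):
    """Absolute difference between the positive-decision rate in the protected
    group and the positive-decision rate in the population as a whole.
    0.0 means perfect statistical parity."""
    overall_rate = sum(decisions) / len(decisions)
    group = [d for d, p in zip(decisions, protected) if p]
    group_rate = sum(group) / len(group)
    return abs(group_rate - overall_rate)

# Example: 4 of 8 positives overall, but only 1 of 4 in the protected group.
decisions = [1, 1, 1, 1, 0, 0, 0, 0]
protected = [1, 0, 0, 0, 1, 1, 1, 0]
print(statistical_parity_gap(decisions, protected))  # → 0.25
```

A gap of 0.25 signals that the protected group receives positive decisions at a rate 25 percentage points away from the overall rate.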
To date, three main strategies have been deployed for fair classification:
- Changing the labels of the training examples so that protected and unprotected groups have the same proportion of positive labels. A classifier trained on the relabeled data is then expected to generalize this equal opportunity for positive labeling to the test set.
- Adding a regularizer to the classification training objective that quantifies the degree of bias or discrimination, so that the system is trained to achieve maximum accuracy while minimizing discrimination. Equalizing the proportions of positive labels across groups in this way allows classification labels to be predicted more accurately, and the modified data can also be used to train a classifier for future decisions.
- Mapping the data to an intermediate representation learned while optimizing the classification criteria. This acts as an information bottleneck: it compresses the information in a source variable while preserving the information it carries about another, target variable.
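The first strategy above (sometimes called "massaging" the labels) can be sketched as follows. A real implementation ranks flip candidates by a classifier's confidence score; this simplified version, with illustrative names, flips negatives in index order purely for demonstration:

```python
def positive_rate(labels):
    """Fraction of positive (1) labels."""
    return sum(labels) / len(labels)

def massage(labels, protected):
    """Flip protected-group negatives to positive until the protected group's
    positive rate is no longer below the unprotected group's (equal up to the
    granularity of one flip). A fuller implementation would also demote some
    unprotected positives and choose flips by classifier confidence."""
    labels = list(labels)
    prot = [i for i, p in enumerate(protected) if p]
    unprot = [i for i, p in enumerate(protected) if not p]
    while positive_rate([labels[i] for i in prot]) < positive_rate([labels[i] for i in unprot]):
        i = next(j for j in prot if labels[j] == 0)  # first flippable negative
        labels[i] = 1
    return labels

# Example: the protected group starts with 1/4 positives vs. 3/4 for the rest.
print(massage([1, 0, 0, 0, 1, 1, 1, 0], [1, 1, 1, 1, 0, 0, 0, 0]))
# → [1, 1, 1, 0, 1, 1, 1, 0]  (both groups now at 3/4 positives)
```

A classifier trained on the massaged labels, rather than the originals, is then expected to extend the equalized positive rates to unseen data.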
Another concept related to individual fairness, the treatment of similar individuals in a similar manner, is differential privacy, which aims to preserve individuals' privacy during data analysis. It guarantees that the result of any analysis is (almost) equally likely to occur on any pair of databases that differ only in the data of a single individual.
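For intuition, the classic Laplace mechanism illustrates this guarantee for counting queries. The sketch below assumes a sensitivity-1 count, meaning that adding or removing one individual changes the true count by at most 1; all names are illustrative:

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    """Epsilon-differentially private count. Since one individual's record can
    change the true count by at most 1 (sensitivity 1), adding Laplace(1/epsilon)
    noise makes the output distribution nearly identical on any two databases
    that differ in a single individual's data."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller values of epsilon add more noise and thus give a stronger privacy guarantee, at the cost of less accurate answers.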
Given the importance of fairness to society, stakeholders from various fields, ranging from economics to law, need to bring their perspectives to bear on machine learning and algorithmic decision-making in this new era. Although such a collaborative effort may not resolve every issue at once, the following two should be given priority:
- Although all fairness formulations attempt to eliminate bias, an outcome is often directly correlated with membership in a protected group. For example, when predicting who is eligible for a home loan, those living in a certain neighborhood may, statistically, be far more likely to receive the loan than the population at large. Developing a framework for fairness that does not force strict equality is therefore a crucial open issue.
- To achieve fairness, it may be useful to go beyond prototypes and use multi-dimensional distributed representations, which may increase the chances of capturing individuals' qualifications more faithfully.
Unless we are willing to deconstruct the definitions and categories that machine learning takes as given, it will be difficult to specify to what extent fairness can be achieved. Rather than focusing on rapid progress in the field, perhaps we should step back and reflect on what we could have done differently in machine learning to obtain and maintain fairness.