Data science provides huge opportunities to improve private and public life, as well as our environment (consider the development of smart cities or the problems caused by carbon emissions). Unfortunately, such opportunities are also coupled with significant ethical challenges. The extensive use of ever larger quantities of data, often personal if not sensitive (big data), the growing reliance on algorithms to analyze them in order to shape choices and make decisions (including machine learning, artificial intelligence and robotics), and the gradual reduction of human involvement, or even oversight, over many automatic processes pose pressing issues of fairness, responsibility and respect for human rights, among others.
These ethical challenges can be addressed successfully. Fostering the development and applications of data science while ensuring the respect of human rights and of the values shaping open, pluralistic and tolerant information societies is a great opportunity of which we can and must take advantage. Striking such a robust balance will not be an easy or simple task. But the alternative, failing to advance both the ethics and the science of data, would have regrettable consequences.
In order to reach solutions that maximize the ethical value of data science to benefit our societies, all of us should take responsibility for the demanding task of data ethics. Data ethics is built on the foundation provided by computer and information ethics, which has focused for the past 30 years on the main challenges posed by digital technologies. This rich legacy is most valuable. At the same time, data ethics refines the approach endorsed so far in computer and information ethics, as it changes the level of abstraction (LoA) of ethical enquiries from an information-centric (LoAI) to a data-centric one (LoAD).
The shift from LoAI to LoAD is the latest in a series of changes that have characterized the evolution of computer and information ethics. Research in this field first endorsed a human-centric LoA, which addressed the ethical problems posed by the dissemination of computers in terms of professional responsibilities of both their designers and users. The LoA then shifted to a computer-centric one (LoAC) in the mid-1980s, and it changed again at the beginning of the second millennium to LoAI.
By highlighting the nature of computers as universal and malleable tools, LoAC made it easier to understand the impact that computers could have on shaping social dynamics as well as on the design of the environment surrounding us. LoAI then shifted the focus from the technological means to the content (information) that can be created, recorded, processed and shared through such means. In doing so, LoAI emphasized the different moral dimensions of information, i.e. information as the source, the result or the target of moral actions, and led to the design of a macro-ethical approach.
In a few decades, we have come to understand that it is not a specific technology (computers, tablets, mobile phones, online platforms, cloud computing and so forth), but what any digital technology manipulates, that represents the correct focus of our ethical strategies. The shift from information ethics to data ethics is probably more semantic than conceptual, but it does highlight the need to concentrate on what is being handled as the true invariant of our concerns. This is why labels such as ‘robo-ethics’ or ‘machine ethics’ miss the point, anachronistically stepping back to a time when ‘computer ethics’ seemed to provide the right perspective. It is not the hardware that causes ethical problems; it is what the hardware does with the software and the data that represents the source of our new difficulties. LoAD brings into focus the different moral dimensions of data. In doing so, it highlights the fact that, before concerning information, ethical problems such as privacy, anonymity, transparency, trust and responsibility concern data collection, curation, analysis and use, and hence are better understood at that level.
In the light of this change of LoA, the ethical challenges posed by data science can be mapped within the conceptual space delineated by three axes of research: the ethics of data, the ethics of algorithms and the ethics of practices:
- The ethics of data focuses on ethical problems posed by the use of big data in biomedical research and the social sciences, including profiling, data philanthropy and open data.
- The ethics of algorithms addresses issues posed by the increasing complexity and autonomy of algorithms broadly understood (e.g. including artificial intelligence and artificial agents, such as Internet bots), especially in the case of machine learning applications.
- Finally, the ethics of practices (including professional ethics and deontology) addresses the pressing questions concerning the responsibilities of data scientists and organizations, with the goal of defining an ethical framework that fosters both responsible innovation and progress in data science and the protection of the rights of individuals and groups. Three issues are central in this line of analysis: consent, user privacy and secondary use.
While they are distinct lines of research, the ethics of data, algorithms and practices are obviously intertwined, and this is why it may be preferable to think in terms of three axes defining a conceptual space within which ethical problems are located, like points identified by three coordinates. Most of these problems do not lie on a single axis. For example, analyses focusing on data privacy will also address issues concerning consent and professional responsibilities. Likewise, ethical auditing of algorithms often implies analyses of the responsibilities of their designers, developers, users and adopters.
Data ethics needs to be developed from the start as a macro-ethics, that is, as an overall ‘geometry’ of the ethical space that avoids narrow, ad hoc approaches and instead addresses the diverse set of ethical implications of data science within a consistent, holistic and inclusive framework.