Although many people seem obsessed with data in our digital age, the word means different things to different people. For some it refers to the results of an analysis; for others it refers to reliable evidence, as in the phrase “Let the data speak for itself.”
Still others project human-like qualities onto data. For example, in his book Digital Destiny: How the New Age of Data Will Transform the Way We Work, Live, and Communicate, DuBravac claims that data wants not only to be understood but also to be moved and replicated:
“Data constantly moves toward efficiency. […] it destroys the moments between recognition and understanding. Because data wants to be understood, it abhors friction.”
The Latin root of the word data means “given.” Kitchin, the author of The Data Revolution (2014), argued that data should be understood as taken rather than given: except in the case of divine revelation, the collection of data depends on sound human judgment. Consequently, what gets measured eventually shapes the conclusions drawn from it.
Furthermore, unless data is operationalized for some purpose, it provides no meaning. Although data is often considered the raw material of evidence, it undergoes a series of transformations before it can be used. It is therefore crucial to distinguish between “raw” data and processed data. “Raw data” is often treated as the bedrock of truth, as if it referred only to the facts. Yet this can be misleading, since intentions and assumptions may be hidden in the way data is gathered and processed. In addition, digital tools, ranging from error detection algorithms to averaging methods meant to counter measurement error, may introduce further biases and errors when data is processed.
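To make the point about error detection and averaging concrete, here is a minimal sketch in Python. The readings, the function name `clean_and_average`, and the `max_deviation` threshold are all hypothetical and chosen only for illustration. The sketch shows how the same measurements yield different "facts" depending on a cleaning rule that someone had to choose.

```python
from statistics import mean, median


def clean_and_average(readings, max_deviation):
    """Drop readings farther than max_deviation from the median, then average the rest.

    The threshold is a human choice, not something the data supplies.
    """
    center = median(readings)
    kept = [r for r in readings if abs(r - center) <= max_deviation]
    return mean(kept), kept


# Hypothetical sensor readings; the last value is either an error or a real event.
readings = [9.8, 10.1, 10.0, 9.9, 14.2]

# A strict rule treats 14.2 as measurement error and discards it.
strict_avg, strict_kept = clean_and_average(readings, max_deviation=1.0)

# A lenient rule treats 14.2 as a legitimate observation and keeps it.
lenient_avg, lenient_kept = clean_and_average(readings, max_deviation=5.0)

print(f"strict rule:  kept {len(strict_kept)} readings, average {strict_avg:.2f}")
print(f"lenient rule: kept {len(lenient_kept)} readings, average {lenient_avg:.2f}")
```

Under the strict rule the average is about 9.95; under the lenient rule it is about 10.8. Neither number is more "raw" than the other: each reflects an assumption about what counts as noise.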
Raw data often provides a starting point for drawing conclusions. According to Professor Bowker, however, raw data is both an oxymoron and a bad idea. Given the potential for background noise, data must be “cooked,” in his own terms, with extreme care. While the term “raw” suggests something natural or untouched, the term “cooked” implies cognitive processing. Yet because cognitive and cultural processes always shape the way data is collected, the phrase “raw data” is itself contradictory. Although the term “raw” implies that no processing took place after the data was gathered, hidden forms of processing necessarily occur before the data is gathered at all.
Needless to say, the way data is collected is also influenced by the scientific instruments used to take measurements, as these instruments embody ingrained theories that are needed to make sense of the measurements. As the French physicist Duhem illustrated in his book The Aim and Structure of Physical Theory (1914):
Go into the laboratory; […] An observer plunges the metallic stem of a rod, mounted with rubber, into small holes; the iron oscillates and, by means of the mirror tied to it, sends a beam of light over to a celluloid ruler, and the observer follows the movement of the light beam on it. […] Ask him now what he is doing. Is he going to answer: “I am studying the oscillations of the piece of iron carrying this mirror?” No, he will tell you that he is measuring the electrical resistance of a coil. If you are astonished and ask him what meaning these words have, and what relation they have to the phenomena he has perceived and which you have at the same time perceived, he will reply that your question would require some very long explanations, and he will recommend that you take a course in electricity.
In a similar vein, as Sadegh-Zadeh mentions in his Handbook of Analytic Philosophy of Medicine (2012), data production can be conceptualized as a form of data engineering, given the massive amounts of data collected in fields such as physics or advanced mathematics with devices such as the Large Hadron Collider.
The gist of these anecdotes is that data production itself is always a specific form of interpretation. As Gitelman & Jackson stated in their work “Raw Data” Is an Oxymoron (2013), given the reality of pre-processing, raw data is not only practically impossible but also conceptually impossible, as gathering data already involves a kind of processing.
As these authors mention, the context of data plays a crucial role, as it indicates how the data was gathered or transformed. It is therefore impossible to consider data context-free or perfectly objective. Given our fascination with data, we tend not to examine its features closely and often assume that data is context-independent. We may need to scrutinize our assumptions about data, though, rather than believing that raw data is the key to the truth, that it carries no bias from any particular theory or ideology, and that it will therefore lead us to the complete truth without any need for experts.
Most individuals are familiar with the distinction between data, information, knowledge, and wisdom. As a reminder, in very broad terms:
- Data, which is seen as raw or unprocessed, and hence useless.
- Information, which refers to data about who, what, when, where, and how many.
- Knowledge, which is concerned with accuracy and efficiency.
- Wisdom, which is concerned with values.
This framework no longer seems to dominate today’s understanding of data, as the power of data is now seen in its degree of rawness rather than in the degree of human reasoning applied to it. Yet each of us makes sense of the world not merely through experience but also through mental frameworks shaped by our beliefs, norms, and cultural values, so we can make no progress unless we make our assumptions explicit rather than behaving as if they did not exist. These assumptions eventually shape the direction of our inquiry, including what we measure. Data does not speak for itself; it reflects the voice of its collectors.
When it comes to the rules of the current data landscape, we can choose either to change the existing policies or to continue them. If individuals are left merely with the choice of consenting to data practices decided by technocratic elites, they will remain disengaged from understanding what data means.
When there is a clash of values, politics is not only inevitable but essential. No algorithm can determine which decision is best; it merely raises the question: By whose values is it the “best”?