
Interpreting Machine Learning Models: A Myth or Reality?


Despite the predictive capabilities of supervised machine learning, can we trust the machines? As much as we want the models to be good, we also want them to be interpretable. Yet the task of interpretation often remains vaguely defined.

Despite the proliferation of machine learning into domains of daily life ranging from finance to justice, most users find its models difficult to understand. The lack of a commonly agreed-upon definition of interpretability means that, rather than being a monolithic concept, it bundles together several related ideas.

Interpretability is mostly discussed in the field of supervised learning, rather than in other areas of machine learning such as reinforcement or interactive learning. Existing research approaches interpretability as a means of establishing trust. Yet it remains unclear whether trust refers to the robustness of a model’s performance or to some other property.

Viewing interpretability simply as a low-level mechanistic understanding of models can be problematic. Although machines are capable of discovering causal structure in data, they are still far from reliably matching the tasks they are supposed to solve in real life. One reason for this failure may be that optimization goals are oversimplified and so fail to capture more complicated real-life objectives. Another may be that the training data are unrepresentative of the deployment environment. Moreover, given a model’s complexity, its parameters, its algorithms, and the factors of human agency all need to be taken into account.

Whenever there is a gap between the goals of supervised learning and the costs of a real-world deployment setting, a demand for interpretability emerges. Not every real-life goal can be encoded as a simple function. To give a specific example, an algorithm designed to make hiring decisions cannot simultaneously optimize for both productivity and ethics, so building a formal model that works within the context of a real-life environment is a struggle. To address this struggle, here are some aspects of interpretability to be taken into account:

  • Trust: For some individuals, trust may refer to the divergence of training and deployment objectives; for others, it may refer to being at ease with their understanding of a model. Alternatively, it may refer to the extent to which machine learning models make accurate predictions in the cases where we would be willing to hand control over to them. Where the model makes errors but human agents would predict accurately, there is benefit in keeping the human agent’s control intact.
  • Causality: Causality relates to inferring hypotheses about the real world from the strong associations surfaced by models such as regression models or Bayesian networks. Although there may always be hidden causes underpinning a given phenomenon, such models can still suggest new hypotheses and tests. The inference of strong causal relationships, however, relies on strong assumptions drawn from prior knowledge.
  • Transferability: A model’s power of generalization is typically judged by the gap between its performance on training and test data, which are chosen by random partition from the same distribution. Yet human beings are capable of more sophisticated generalizations, transferring their skills to unknown situations. To give a specific example, supervised models used to develop credit ratings draw on variables such as account age, debt ratio, and number of late payments, all of which can be easily manipulated. Because the rating system is ultimately gamed by the very individuals who can change these variables, the predictive power of such systems remains low.
  • Informativeness: Although decision theory can sometimes be applied to the outputs of supervised models in order to take further actions, most of the time the supervised model is used by human decision-makers to gain information. So while the aim in machine learning is often the reduction of error, the common objective in the real world is access to useful information. Being informative does not necessarily require reflecting the inner dynamics of a model. To give a specific example, a diagnostic model may equip a clinician with useful information on similar cases, yet still be unable to suggest an effective treatment. Although we frame the task as supervised learning, our real objective is often closer to unsupervised exploration of the data.
  • Fairness: As algorithms increasingly shape our social interactions, it is the right time to raise concerns about their alignment with ethical standards, as various researchers in the field have already done. Given that algorithmic decision-making is used in fields ranging from law to finance, further innovations in artificial intelligence will only enhance the capabilities of such software. In light of these developments, one question that needs answering is how algorithms can avoid discriminating against a particular gender or race. Traditional evaluation metrics such as accuracy, and decision theory more broadly, offer little assurance here. The lack of a model that can prove its fairness therefore often results in a demand for interpretability.
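
The transferability point above can be made concrete with a small sketch: generalization is usually measured as the gap between training and test error on random partitions of the same distribution, which says nothing about shifted or manipulated inputs. The data below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic tabular data: a linear signal plus noise.
X = rng.normal(size=(200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

# Random partition into train and test sets drawn from the SAME distribution.
idx = rng.permutation(200)
train, test = idx[:150], idx[150:]

# Fit ordinary least squares on the training partition only.
w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

mse_train = np.mean((X[train] @ w - y[train]) ** 2)
mse_test = np.mean((X[test] @ w - y[test]) ** 2)

# A small train/test gap is the usual proxy for generalization -- but both
# partitions share one distribution, so the gap is silent about deployment
# settings where inputs drift or are deliberately gamed (e.g. credit features).
print(f"train MSE: {mse_train:.3f}, test MSE: {mse_test:.3f}")
```

The point of the sketch is what the gap does not measure: a gamed credit feature changes the input distribution itself, and no random split of historical data will detect that.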

In light of this, which techniques should be taken into account for a model to be interpretable? There are two broad categories:

  • Transparency: How does the model function?
  • Post hoc explanations: What else does the model explain?

Even though neither category offers absolute guarantees, each can still provide useful information.


Beginning with transparency: in its first sense, this refers to simulatability, the ability of a model to be simulated by a person, in contrast to a black-box model. To be simulatable, a model needs to be simple enough to be comprehended at all; in other words, the input data along with the parameters should be traceable through each calculation that generates a prediction. Some researchers assume that such transparent models can be presented by means of textual or visual artifacts. Yet in some cases the size of the model may grow faster than the time needed to conduct the inference itself. Given the constraints of human cognition, neither linear models nor rule-based systems are completely interpretable.
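
A minimal sketch of simulatability: a small linear model whose prediction a person can trace by hand, one multiplication and one addition per feature. The feature names and weights below are hypothetical, chosen only for illustration.

```python
# A model is "simulatable" when its entire inference can be followed on paper.
# Hypothetical weights for an illustrative credit-scoring linear model.
weights = {"age": 0.03, "debt_ratio": -1.2, "late_payments": -0.4}
bias = 0.8

def predict(features: dict) -> float:
    # Each input feature passes through exactly one multiplication and one
    # addition -- nothing is hidden from a human reader.
    return bias + sum(weights[name] * value for name, value in features.items())

score = predict({"age": 30, "debt_ratio": 0.5, "late_payments": 2})
print(score)  # 0.8 + (0.9 - 0.6 - 0.8), i.e. approximately 0.3
```

With thousands of weights the same arithmetic would still be transparent in principle, yet no longer simulatable in practice, which is exactly the cognitive constraint noted above.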

The second sense of transparency is decomposability. This refers to each parameter or node in the system carrying a simple text description or representing some relationship between the features and the labels. It should be kept in mind, however, that such a relationship is sensitive to feature selection and pre-processing. To give a specific example, the relationship between the risk of flu and vaccination may be contingent on the presence of features such as age or immunodeficiency.

In the context of learning algorithms, we should speak of algorithmic transparency. This refers to the existence of some form of proof that training converges towards a specific solution, even on new and unseen datasets.

Despite the power of the heuristics used by deep learning models, such models are far from transparent, as they cannot be fully grasped by human beings. There is therefore also no guarantee that such models will work well on new problems.

Post-hoc Interpretability

This refers to extracting knowledge from learned models, in the form of natural language explanations or visualizations, for the end users of machine learning.

One way to implement interpretability is to train one model for prediction (choosing a particular action to optimize a goal) and another, such as a neural network language model, for explanation (mapping the model’s states to verbal explanations).
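
A hedged sketch of this two-model idea follows. In place of the language-model explainer mentioned above, it uses a much simpler stand-in: a global linear surrogate fitted to the opaque predictor's own outputs. The black-box function and data are invented for illustration; the pairing of predictor and explainer is the point.

```python
import numpy as np

rng = np.random.default_rng(1)

def black_box(X):
    # Stand-in for an opaque predictor (e.g. a deep network).
    return np.tanh(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2

X = rng.normal(size=(500, 2))
y_hat = black_box(X)  # the predictions we want to explain

# Explainer: a least-squares linear fit to the black box's outputs.
# Its weights give a crude global summary of which inputs drive predictions.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y_hat, rcond=None)

print("surrogate weights:", coef[:2], "intercept:", coef[2])
```

Here the surrogate assigns a clearly positive weight to the first input (the tanh term) and a near-zero weight to the second, since a purely quadratic effect has no linear component; the surrogate is informative, but also a reminder that an explainer can miss structure the predictor actually uses.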

Another way to implement interpretability is to provide visualizations that help specify, in a qualitative manner, what the model has learned.

To give a specific example, to gain hints about what an image classification network has learned, the input can be revised through gradient ascent so as to increase the activations of various nodes selected from the hidden layers. There are various methods for discovering what kind of information is kept at different layers of a neural network.
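
A toy version of this activation-maximization idea can be sketched with a single random hidden layer: starting from a random input, gradient ascent nudges the input until one chosen hidden unit fires strongly. The network below is random and purely illustrative, not a trained classifier.

```python
import numpy as np

rng = np.random.default_rng(2)

W1 = rng.normal(size=(8, 4))   # weights of one random hidden layer
unit = 3                       # the hidden node whose activation we maximize

def hidden_activation(x):
    # Activation of the chosen tanh unit for input x.
    return float(np.tanh(W1 @ x)[unit])

x = rng.normal(size=4)         # start from a random input
lr = 0.1
for _ in range(100):
    # Analytic gradient of tanh(w . x) with respect to x: (1 - tanh^2) * w.
    pre = W1[unit] @ x
    grad = (1 - np.tanh(pre) ** 2) * W1[unit]
    x = x + lr * grad          # gradient ASCENT on the input, not the weights

print("final activation:", hidden_activation(x))
```

In a real image network the same loop runs over pixels, and the resulting input offers a qualitative hint about the pattern that unit responds to.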

Still another way to implement interpretability is to provide local explanations, such as computing a saliency map, which usually takes the gradient of the output for the correct class with respect to an input vector. Since a saliency map is only a local explanation, it can also be misleading: moving a single pixel can result in a different saliency map.
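
For a linear classifier the saliency idea reduces to something easy to inspect: the gradient of the correct class's score with respect to the input is exactly that class's weight row. The tiny random classifier below is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

n_classes, n_pixels = 3, 6
W = rng.normal(size=(n_classes, n_pixels))   # random linear "classifier"

x = rng.normal(size=n_pixels)                # one input vector
scores = W @ x
correct = int(np.argmax(scores))             # class the model picks

# Saliency: |d(score_correct)/dx|. For a linear model this gradient is
# simply W[correct], so the map ranks inputs by that row's magnitudes.
saliency = np.abs(W[correct])

print("most salient input dimension:", int(np.argmax(saliency)))
```

The fragility noted above is visible even here: a perturbation of x that flips the argmax class swaps in a different weight row, and hence an entirely different saliency map.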


So What Does All that Mean? 

When the question is whether to choose a linear or a deep model, a trade-off must be made between decomposability and algorithmic transparency, because deep models such as neural networks usually operate on lightly processed features, while linear models require heavily engineered ones. Linear models sacrifice decomposability once enough feature engineering is applied to make their performance approach that of RNNs.

As long as the goal is to achieve interpretability rather than mere transparency, a definition should be specified. To give a specific example, while a non-black-box medical algorithm may be good at establishing trust thanks to its transparency, its predictive power may still need development, as the longer-term goal of improving health care may not yet have been achieved.

Caution should always be exercised against blindly embracing post-hoc notions of interpretability, as an algorithm can often offer misleading yet plausible explanations. Since confusing plausibility with truth is a well-known human weakness, practitioners in the field of machine learning should raise awareness of the areas where apparent virtues may disguise racial or gender discrimination, especially when it comes to decision-making in leadership or HR admissions. In other words, machine learning is of no use when it merely replicates the same pathological human behavior at scale.

Where To Go From Here?

Given the high-level impact of machine learning on our society, our first goal should be to ensure that we intend to solve the correct problems. Nevertheless, problems such as fairness will remain areas of struggle as long as exact definitions of success cannot be articulated.

As long as we continue to raise our voices and engage in critical writing, we can help hold all stakeholders, from big tech companies (mainly those located in Silicon Valley) to lawmakers and policy-makers, responsible for the impact of machine learning and its alignment with societal objectives.

Ayse Kok
Ayse completed her masters and doctorate degrees at both University of Oxford (UK) and University of Cambridge (UK). She participated in various projects in partnership with international organizations such as UN, NATO, and the EU. She also served as an adjunct faculty member at Bosphorus University in her home town Turkey. Furthermore, she is the editor of several international journals, including those for Springer, Wiley and Elsevier Science. She attended various international conferences as a speaker and published over 100 articles in both peer-reviewed journals and academic books. Having published 3 books in the field of technology & policy, Ayse is a member of the IEEE Communications Society, member of the IEEE Technical Committee on Security & Privacy, member of the IEEE IoT Community and member of the IEEE Cybersecurity Community. She also acts as a policy analyst for Global Foundation for Cyber Studies and Research. Currently, she lives with her family in Silicon Valley where she worked as a researcher for companies like Facebook and Google.



