Machine learning (ML) has become an indispensable part of modern life; yet when it comes to establishing user trust, there has been little in-depth discussion of models and strategies for opening up the ‘black-box’ mechanisms of ML.
One major concept for gaining trust within such an ecosystem is algorithmic accountability, which refers to the idea that organizations should be held responsible for the decisions of the algorithms they use. Despite related regulations such as the European Union’s (EU) legislation granting “data subjects” the right to an explanation of why an algorithm made a particular decision (effective since May 2018), and the Association for Computing Machinery’s (ACM) principles of algorithmic transparency and accountability (effective since January 2017), the machine learning community still faces various challenges that prevent the systematic investigation of algorithmic behavior. Some of the major challenges include:
1. Reproducibility: This relates to the difficulty of replicating AI systems of interest. According to a 2018 study, only 6% of 400 authors at two top AI conferences shared their code, evidence that most state-of-the-art algorithms cannot be replicated for further investigation.
2. Accessibility: This relates to the difficulty of accessing the underlying models of a system, which can render learning systems “black boxes.” Given the proprietary nature of training data, AI systems are becoming increasingly opaque.
3. Efficiency: This relates to the difficulty of investigating only a small number of systems (for example, one computer vision API) rather than a class of systems (such as all computer vision APIs). This issue arises because of the difficulty of gaining access to the crucial AI systems that researchers would like to explore in depth.
One direction toward fulfilling these principles of algorithmic accountability is the increasing use of ML algorithms through predicted variables (PVs): variables in a program whose values are supplied by a learning model rather than computed directly. The following definitions are useful for understanding the black-box mechanism and PVs in more depth [3, 4, 5]:
- Creating a PV: When creating a PV, the type of the variable and the types of potential observations from the environment need to be specified.
- Reading from a PV: The PV will return a predicted value of its specified type. This prediction may come with a confidence score or confidence interval. It is possible to think about different “output formats” as follows:
- The PV returns a single value – this can be deterministic or sampled from a distribution.
- The value returned might be associated with a confidence score or a confidence interval.
- The PV might return a distribution from which the user can sample. This might be a “power user” feature that should initially be hidden.
- Providing feedback to a PV: This is about telling the PV about whether or not its previous predictions were good. In the most generic form, feedback is not explicitly associated with a particular prediction. In cases where it is possible to associate feedback to a specific prediction, this may be a useful signal to improve the learning. While robust solutions should end up performing as well regardless of whether the feedback is provided continuously or not, identifying the tradeoffs is important.
- Telling a PV about its environment: The PV learns about its environment through observations. The observations that a variable can accept are specified at creation. The variable does not have a general expectation that all observation types are available at all times. Instead, observations may come asynchronously from predictions and different types of observations can come at different intervals.
PVs will have an interface of the following form:
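A minimal sketch of such an interface in Python; the method names (`predict`, `observe`, `feedback`) and the type-checking details are illustrative assumptions, since the text does not fix a concrete API:

```python
class PredictedVariable:
    """A predicted variable with a fixed output type and typed observations."""

    def __init__(self, output_type, observation_types, initial_value_fn=None):
        self.output_type = output_type              # e.g. float, int, bool
        self.observation_types = observation_types  # dict: name -> type
        self.initial_value_fn = initial_value_fn    # fallback before learning
        self.history = []                           # interaction log

    def predict(self):
        """Return a value of the specified output type (fallback only here)."""
        value = (self.initial_value_fn()
                 if self.initial_value_fn else self.output_type())
        self.history.append(("predict", value))
        return value

    def observe(self, name, value):
        """Accept an observation of a type declared at creation."""
        expected = self.observation_types[name]
        if not isinstance(value, expected):
            raise TypeError(f"observation {name!r} must be {expected.__name__}")
        self.history.append(("observe", name, value))

    def feedback(self, reward):
        """Tell the PV how good its recent predictions were (not tied to one)."""
        self.history.append(("feedback", reward))


pv = PredictedVariable(float, {"load": float}, initial_value_fn=lambda: 0.5)
pv.observe("load", 0.8)
value = pv.predict()   # 0.5, from the initial-value function
pv.feedback(1.0)
```

The three entry points mirror the interactions described above: creation fixes the types, `observe` accepts typed observations asynchronously, and `feedback` is not tied to any single prediction.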
An interesting dimension is to think about how to specify the valid output spaces for predictions as well as the valid input spaces for observations. The following list provides some different basic data types that would be desirable as input/output spaces [6, 7]:
- Continuous values (floats): These can potentially be bounded to a certain range.
- Discrete numbers (integers): These can potentially be bounded to a certain range.
- Discrete choices (boolean, enum, integers): It is important to realize that a small distance between two consecutive numbers does not necessarily indicate a small distance in the relevant space.
- Strings: Strings will almost immediately be relevant as an input type. Predicting string values will likely be a bigger challenge and could be done through embedding spaces; predicting strings might not be part of the first version of PVs.
- Vectors, sequences, sets, images: Similar to strings, these will immediately be relevant as input types. As output types, they will likely become relevant only later.
- Structs, protos: Structured data is taken apart into its components and fed into the PV as observations using different modalities. As outputs, structured data will only become relevant once vectors, strings, and images are solved.
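One way the input/output space specifications above could be declared is shown below; the class names (`Continuous`, `Discrete`, `Choice`) are illustrative, not taken from the text:

```python
from dataclasses import dataclass

@dataclass
class Continuous:
    """Floats, optionally bounded to a range."""
    low: float = float("-inf")
    high: float = float("inf")
    def contains(self, x):
        return isinstance(x, float) and self.low <= x <= self.high

@dataclass
class Discrete:
    """Integers bounded to a range."""
    low: int
    high: int
    def contains(self, x):
        return isinstance(x, int) and self.low <= x <= self.high

@dataclass
class Choice:
    """Discrete choices where numeric distance carries no meaning."""
    options: tuple
    def contains(self, x):
        return x in self.options

output_space = Continuous(0.0, 1.0)   # e.g. a bounded float prediction
mode_space = Choice(("on", "off"))    # e.g. an enum-like input
```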
Moreover, certain environments might be noisier than others. To give a specific example, in some cases the PV might get very clean observations where each observation is relevant to the next value to be read. In other cases, the PV might get very noisy samples with lots of observations being entirely irrelevant.
PVs can easily be used in any code: the caller provides an initial value function, which the PV uses to explore the space of possible values. As soon as the PV starts returning predicted values, it can improve over the initial value function.
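The fallback behaviour just described can be sketched as follows; the confidence threshold and the function signatures are assumptions for illustration:

```python
def read(pv_predict, initial_value_fn, confidence, threshold=0.7):
    """Use the safe initial value until the PV's confidence passes a threshold,
    then let the PV's own prediction take over."""
    if confidence < threshold:
        return initial_value_fn()   # explore / safe default
    return pv_predict()             # learned prediction

# Early on, confidence is low, so the initial value function is used;
# later the PV's prediction takes over.
early = read(lambda: 42, lambda: 7, confidence=0.2)
late = read(lambda: 42, lambda: 7, confidence=0.9)
```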
Interactions between predicted variables might lead to challenges, depending on the setup [8, 9]:
- when PVs are used jointly for predicting subparts required for improving a system (groups)
- when observations going into one PV are coming from other PVs (hierarchical)
- when predictions from one PV (negatively) impact the reward of other PVs (adversarial)
Reading a value from a PV, feeding an observation to a PV, and providing feedback about the performance of the PV can happen entirely asynchronously. For instance:
- A PV can be read many times and then receive a single piece of feedback only at the very end of the program. It might also happen that the feedback provided is only loosely connected with the read values.
- A PV might be receiving a lot of observations and only after receiving many observations be asked to predict a value. The PV will need to be able to find out which observations are relevant.
- The schedule between receiving feedback and observations and reading a value from a PV might not be fixed. In some cases, the PV might be receiving observations but never be asked to provide a value – or it might be asked for many values but will be provided a different number (smaller or larger) of rewards.
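The first scenario above (many reads, a single reward at the very end) requires some form of credit assignment. A naive sketch that spreads one terminal reward uniformly over all pending reads, purely as an illustration and not the source's method:

```python
class FeedbackBuffer:
    """Buffers predictions until feedback arrives, then assigns credit."""

    def __init__(self):
        self.pending = []    # predictions still awaiting a reward
        self.credited = []   # (prediction, reward share) pairs

    def record_read(self, value):
        self.pending.append(value)

    def feedback(self, reward):
        if not self.pending:
            return
        # Spread the single reward uniformly over all pending reads.
        share = reward / len(self.pending)
        self.credited.extend((v, share) for v in self.pending)
        self.pending.clear()


buf = FeedbackBuffer()
for v in [0.1, 0.4, 0.9]:
    buf.record_read(v)
buf.feedback(1.5)   # one reward at the very end of the program
```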
This poses a challenge when compared to the standard way of thinking in reinforcement learning (RL) formulations, where the entire flow is controlled by an environment with a rather rigid structure: the “agent” is expected to perform an action, and the environment tells the agent about a potential reward, a new state, and observations. In PVs, while each of these exists in one form or another, the setup is less rigid, and that rigidity could otherwise be a restriction to success.
A PV corresponds to an agent in RL terminology in the following ways:
- The variable takes actions in an environment (the program where it is used).
- The actions affect the state of the environment (e.g. the program can change its control flow in response to the value of the PV).
- The variable receives observations from its environment.
- The variable receives a reward that is not necessarily tightly coupled with its individual actions.
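The correspondence above can be made concrete with a toy program acting as the environment; all names here are illustrative:

```python
def program(pv_action, steps=3):
    """The program is the environment: the PV's value changes control flow,
    and the reward is only loosely coupled to individual actions."""
    state, reward = 0, 0.0
    for _ in range(steps):
        action = pv_action(state)       # the PV "acts" by returning a value
        state += 1 if action else -1    # the program reacts to that value
        reward += 1.0 if state > 0 else 0.0
    return reward                       # delayed, aggregate reward

total = program(lambda s: True)   # an always-True policy keeps state positive
```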
In comparison to other ML disciplines such as supervised learning, there is currently much less understanding of which RL methods are robust, and many different tricks are applied to make agents work better on specific problems. Without in-depth analysis, the current impression is that making RL methods work requires a great deal of problem-specific engineering and tuning within the published systems. Also, “standard” RL methods are in fact, contrary to what the name implies, not as standardized as ML methods in other setups such as supervised classification, image analysis, or even unsupervised methods such as clustering.
The current way of thinking about RL is in many ways centered around environments, which makes it easy to evaluate different agents on a large set of problems.
In order to progress quickly, it seems like a good idea to start by using environments to train models that solve the kinds of problems PVs can address. Once these models are trained, they can be wrapped in a PV interface and used within their original context.
In the longer run, it might make more sense to train the PV directly through its own interface, by moving the training and agent logic into the PV. This seems more useful once there is a better understanding of which training methods and which agents perform well.
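Wrapping an environment-trained agent behind a PV-style interface, as suggested above, might look like the sketch below; the `act` method signature is an assumed convention, not part of the text:

```python
class AgentBackedPV:
    """Exposes a pre-trained agent through a PV-style interface."""

    def __init__(self, agent):
        self.agent = agent              # any object with act(observation)
        self.last_observation = None

    def observe(self, observation):
        self.last_observation = observation

    def predict(self):
        return self.agent.act(self.last_observation)

    def feedback(self, reward):
        pass  # a frozen, pre-trained agent can ignore online feedback


class ThresholdAgent:
    """A stand-in for a trained agent: a fixed decision rule."""
    def act(self, observation):
        return observation > 0.5


pv = AgentBackedPV(ThresholdAgent())
pv.observe(0.8)
decision = pv.predict()
```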
Some algorithmic models that make use of PVs are:
Contextual Multi-armed Bandits (cMABs)
(Contextual) multi-armed bandits (cMABs) are used in black-box optimization and share many ideas with RL, so they might fit into PVs: for example, MABs also have to handle exploitation versus exploration and learning under uncertainty. MABs are also a standard method for determining advertisement placement. cMABs extend MABs in that the agent gets to know something about the environment.
The biggest difference between cMABs and RL is that in RL the agent typically has to perform multiple steps in an environment that is changing (because of its actions). In the cMAB setup, the agent receives one input and performs one action. Otherwise, the setup is very similar in that the agent typically obtains an observation x (often also called the context), performs an action a, and receives a reward r.
RL can be seen as a generalization of the MAB setting in the sense that the agent takes multiple, consecutive actions and might receive a reward at a delayed point in time. Therefore, in many cases, such as when a PV is applied where consecutive actions are not connected, a cMAB approach might be well suited.
Another way to think about MABs is as episodic RL with a fixed episode length, which can also be seen as an advantage, as it eliminates the need for long-term planning.
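A minimal epsilon-greedy contextual bandit of the kind described here (one context in, one action out, one reward back); the tabular reward averaging and the epsilon value are illustrative choices, not prescribed by the text:

```python
import random
from collections import defaultdict

class EpsilonGreedyCMAB:
    """Tabular contextual bandit: explore with probability epsilon,
    otherwise pick the action with the best average reward for the context."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = defaultdict(float)   # (context, action) -> reward sum
        self.counts = defaultdict(int)     # (context, action) -> pulls

    def act(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)   # explore
        return max(self.actions,                   # exploit
                   key=lambda a: self.totals[(context, a)]
                                 / max(self.counts[(context, a)], 1))

    def update(self, context, action, reward):
        self.totals[(context, action)] += reward
        self.counts[(context, action)] += 1


bandit = EpsilonGreedyCMAB(actions=["a", "b"], epsilon=0.0)
bandit.update("ctx", "b", 1.0)
chosen = bandit.act("ctx")
```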
For handling the default function, Bayesian approaches that use the default function as a prior and then learn how to deviate from it could be a great fit, given the interpretability these methods offer.
Bayesian methods on their own do not provide a solution; they can be considered orthogonal to whether a model is trained using RL or evolution strategies (ES). One important benefit of incorporating Bayesian formulations into different parts of the learning process is the ability to explicitly quantify uncertainty and to offer interpretation of the outputs of PVs.
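As a toy illustration of using the default as a prior, a conjugate normal-style shrinkage estimator pulls predictions toward the default value until enough evidence accumulates; the `prior_strength` parameter and the whole setup are assumptions for illustration:

```python
def posterior_mean(default, observations, prior_strength=5.0):
    """Shrink the sample mean toward the default value; prior_strength acts
    like that many pseudo-observations of the default."""
    n = len(observations)
    if n == 0:
        return default                      # no evidence: stay at the default
    sample_mean = sum(observations) / n
    return (prior_strength * default + n * sample_mean) / (prior_strength + n)


no_data = posterior_mean(0.5, [])           # stays at the default
with_data = posterior_mean(0.5, [1.0] * 5)  # evidence pulls it away
```

This mirrors the idea above: the PV behaves like the default function initially and deviates from it only as observed rewards justify it, while the posterior itself quantifies how certain the deviation is.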
In many cases, a PV will be used for an extended period of time and will be used to predict a sequence of values, observe a sequence of observations, and receive a sequence of rewards.
Methods from sequence recognition, classification, and analysis, such as those used in speech/handwriting recognition and natural language processing (NLP), will be immediately relevant for modeling the state of the environment as a state that evolved from an initial state to the current state.
Meta-learning for Predicted Variables
Making full use of predicted variables will require developing algorithms that let PVs adapt certain dimensions of their inner learning algorithm to the setup in which they are used and to the magnitude of the problem being solved.
For example, in the long-run a predicted variable should:
- automatically identify whether it is being used in a train/development/production setup and adjust its parameters accordingly (scale down exploration, take safer bets when deviating from a high-reward policy, but continuously adapt to the new data it is presented with)
- identify the scale of the problem being solved and tune the complexity of the model accordingly (costs might not be justified for having complex model architectures when solving simple problems)
- naturally mix in models suited to the multi-modal observations that are present (architectures or building blocks commonly used for vision problems might be applied to image inputs, a stream of inputs might be identified as better processed with RNNs, etc.)
Some of this might at first be specified in a learning configuration, but from a research perspective these hyperparameters should eventually be configured automatically within the learning setup.
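Before such knobs are configured automatically, a learning configuration might be spelled out explicitly; every key and value below is illustrative, not taken from the text:

```python
# A hypothetical explicit learning configuration for a PV.
config = {
    "mode": "production",         # train / development / production
    "exploration_rate": 0.01,     # scaled down outside of training
    "model_complexity": "small",  # tuned to the scale of the problem
    "observation_encoders": {     # modality-specific building blocks
        "image": "cnn",
        "sequence": "rnn",
    },
}

def exploration_rate(cfg):
    """Take safer bets outside of training, explore more during training."""
    return 0.1 if cfg["mode"] == "train" else cfg["exploration_rate"]

rate = exploration_rate(config)
```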
Supervised Learning (SL) Methods
In some cases, PVs might be used to solve supervised learning (SL) problems. While the PV interface might not be the most natural fit for this purpose, it can be supported by creating a PV with a parameter indicating that it is targeted at solving an SL problem.
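A sketch of driving a supervised problem through a PV-style interface, where feedback carries the true label directly; the `supervised` flag and the running-mean "model" are hypothetical stand-ins:

```python
class SupervisedPV:
    """PV created with a parameter indicating a supervised learning target.
    The model here is just a running mean of the observed labels."""

    def __init__(self, supervised=True):
        self.supervised = supervised
        self.mean = 0.0
        self.n = 0

    def predict(self):
        return self.mean   # predict the running mean of labels seen so far

    def feedback(self, label):
        # In supervised mode, feedback is the ground-truth label, not a reward.
        self.n += 1
        self.mean += (label - self.mean) / self.n


pv = SupervisedPV()
for label in [2.0, 4.0]:
    pv.predict()
    pv.feedback(label)
```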
Given these current models and the increasing use of PVs, the importance of algorithmic accountability should no longer be underrated, either by the machine learning community or by government agencies.
References

[1] Blodgett, Su Lin, Green, Lisa, and O’Connor, Brendan. Demographic dialectal variation in social media: A case study of African-American English. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1119–1130, Austin, Texas, November 2016.
[2] Buolamwini, Joy and Gebru, Timnit. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability, and Transparency, pp. 77–91, 2018.
[3] Chen, Le, Mislove, Alan, and Wilson, Christo. An empirical analysis of algorithmic pricing on Amazon Marketplace. In Proceedings of the International World Wide Web Conference (WWW’16), Montreal, Canada, April 2016.
[4] Friedman, Batya and Nissenbaum, Helen. Bias in computer systems. ACM Transactions on Information Systems, 14(3):330–347, July 1996.
[5] Goodman, Bryce and Flaxman, Seth. EU regulations on algorithmic decision-making and a “right to explanation”. In 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY, 2016.
[6] Hutson, Matthew. Artificial intelligence faces reproducibility crisis. Science, 2018.
[7] Larson, J., Mattu, S., Kirchner, L., and Angwin, J. How we analyzed the COMPAS recidivism algorithm. ProPublica, 2016.
[8] Larson, J., Angwin, J., Kirchner, L., and Mattu, S. How we examined racial discrimination in auto insurance prices. ProPublica, 2017.
[9] O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Broadway Books, 2017.
[10] Zhao, Jieyu, Wang, Tianlu, Yatskar, Mark, Ordonez, Vicente, and Chang, Kai-Wei. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2979–2989, Copenhagen, Denmark, September 2017.