We talk a lot about using analytics to improve the quality of something else. But when was the last time you thought about the quality of the analytics themselves? For many people, unfortunately, this is “never.” And when we do, invariably it’s either too vague or too narrow.
Often, there is an implicit assumption: analytics professionals produce high-quality stuff simply because they are technically capable. It doesn’t matter whether it is predictive modeling, designed experiments, classifications, segmentation, or any other product of statistics, advanced analytics, machine learning, data science, or even AI.
The challenge is that so many things can go wrong that have little to do with technical expertise. What’s worse, even the most advanced and experienced analytics professionals are often unaware of the errors they make. I’ve helped organizations implement quality programs in analytics over the years and cannot overstate how common this is.
If you’re an analytics professional, you need to be intentional and methodical about how you build quality into your work. If you’re the business or the principal investigator, know that having analytics does not necessarily mean having quality analytics. And if you’re a leader in an organization, your analytics function needs a defensible and well-documented quality process.
How is this done? At a recent talk I gave, the attendee poll indicated only about 1/3 had any quality program or quality methodology for their analytics practice at all. So the better question is: why isn’t this done all the time?
Precedents for systematizing analytics quality
There are some precedents for formalizing the idea of the quality of analytics. The traditional approach has been peer review, which is indeed a method for one piece of it. Since it assesses the analytics product against some good technical practices, it implicitly and qualitatively evaluates quality.
Others are a little more formalized. The Quality Assurance Framework of the European Statistical System is an example in the official statistics realm. In some sectors, there are regulatory mandates with quality implications, like the model risk management requirements in financial services. I suspect most of the 1/3 at that talk had some sort of regulatory requirements. That has been my observation from experience elsewhere.
There are two major shortcomings of the existing approaches. First, they focus mostly on the analytic output, like precision, rather than on the practices that yield quality analytics. They attempt to measure the symptoms of quality rather than to address how to generate quality by design.
Second, the scope is often too narrow for application to analytics practices more generally. Regulatory requirements for the quality of analytics naturally focus on specific aspects. So, that reduces the scope, not to mention it forces the approach to be more compliance-oriented.
Defining “quality” for analytics
Most of us want what we do to be of good quality—for the pride of workmanship, for reputational reasons, because it’s the right thing to do, and so on. Quality is also something we all intuitively understand. You just kind of know when it is there.
But quality is also incredibly hard to articulate. Many attempts at analytics quality programs do not go anywhere, because they start with what quality notionally looks like, not with what it is.
What is “quality” then? We can start with what the American Society for Quality says:
“In technical usage, ‘quality’ can have two meanings:
- the characteristics of a product or service that bear on its ability to satisfy stated or implied needs;
- a product or service free of deficiencies.”
Either way, it is about meeting some set of criteria or standards. The definitions by other established quality experts and organizations are similar.
This leads to the idea of defects. Whenever something does not meet those criteria or standards, we have a defect. This is measurable, as in “the number of defects per million opportunities” often used in a manufacturing setting. Then, the fewer the defects, the higher the quality.
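The defect-rate arithmetic borrowed from manufacturing can be sketched in a few lines of Python. This is a hypothetical illustration: the counts of analyses and review criteria below are made up, and in practice defining an “opportunity” for an analytics deliverable is itself a design decision.

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities (DPMO).

    defects: total defects found across all units
    units: number of items inspected (e.g., analyses delivered)
    opportunities_per_unit: checkable criteria per item
    """
    return defects / (units * opportunities_per_unit) * 1_000_000

# Hypothetical example: 12 defects found across 40 analyses,
# each reviewed against 25 criteria.
print(dpmo(12, 40, 25))  # 12000.0
```

The same ratio works at any scale; what matters for an analytics quality program is that the criteria (the “opportunities”) are written down in advance, so a defect is a miss against an explicit standard rather than a matter of opinion.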
What about the qualitative aspects of quality? We often associate quality with something that we feel we cannot fully measure. But it still implies there are some expectations, albeit subjective, for something to be “of good quality.” Otherwise, there wouldn’t be anything to compare our perception against. And whenever that expectation is not met, we can at least conceptually define that as a defect.
The business case for the quality of analytics
Why should anyone outside of analytics professionals care about the quality of analytics?
Perhaps the most obvious reason is risk management. Poor quality of anything usually has negative consequences, so eliminating defects reduces the associated risks. The most notable risk is that of making a wrong decision from defective analytics. In addition, things like biases and ethical problems can lead to risks that are sometimes less directly quantifiable.
Second, better quality means more value from analytics. Since each defect is an erosion of value or effectiveness, eliminating defects allows you to get more out of analytics.
Then, there is the human aspect. Over time, having fewer defects helps develop trust and confidence in analytics by the consumers of analytics. This leads to a greater comfort level with analytics, which is a critical component of adoption and therefore ROI.
How analytical defects happen in practice
Who, me? I don’t create defects! I know what I’m doing. How dare you.
We all tend to assume we do not make errors. Every time I implement an analytics quality program, the number of defects identified surprises even the most senior analysts. There are two primary dimensions to how defects happen in analytics: competency and intent.
First, competency: Do you know your stuff?
Much poor-quality analytics results from analysts not knowing what they are doing. It is even scarier when no one is aware of the poor quality. Unfortunately, too many people find out only when it causes something to go wrong. The popularity of everything data has produced many analysts with technical knowledge of analytical mechanics but without full comprehension of the underlying fundamentals. That one can apply a technique does not mean one understands everything involved in using it. This gap can result in pretty horrendous defects.
The second is the intent of the analyst. Did you mean to do it?
To be clear, an analyst can be fully competent and intentionally do something that does not fit the norm. The worst case is sabotage. But people mean well most of the time, and the perceived “defect” is simply the result of a deliberate, specific consideration. Such a choice just needs to be justified and documented. That does not always happen, and when it doesn’t, it becomes a defect.
The vast majority of errors, though, happen when analysts who know their stuff do something they did not mean to do, or forget why they did it: carelessness, oversight, inattention to detail, lack of diligence, and so on. They are innocent, but defects nonetheless.
Other things can influence how defects happen. They include whether you mean well, whose best interest you have, and whether you even care.
What about data quality?
Poor data quality is one of the biggest sources of complaints from analytics professionals. It is important to note that, for analytics professionals, data quality is largely a dependency and a constraint. There are some important exceptions, but data quality is not their objective.
Don’t get me wrong. This is not to say that analytics professionals are not responsible for data quality. Obviously, the quality of the ingredients is a key factor in producing a quality product. Analytics professionals are absolutely responsible for knowing and understanding the quality of the data used. They need to know and be able to explain the impact the quality of the data has on their output.
But their goal is to make something out of data, not to make data error-free. They have to deal with data quality because they depend on it, and they only do so reactively. They know enough to diagnose data quality issues, apply fixes as appropriate, and explain the limitations of their analysis caused by data quality.
Analytics professionals are not data management professionals. They are not fully versed in all of the data management best (or even standard) practices. That’s an entire profession in itself! The accountability for the quality of the ingredients belongs—or should belong—elsewhere most of the time, specifically to the chief data officer (CDO) or the equivalent.
If you are on the business side, know this: as long as you expect and depend on analytics professionals to take care of data quality, data quality will remain reactive and, as a result, unpredictable. Addressing data quality at the organizational level will allow analytics professionals to use their quality mindshare where they should—on true analytics activities.
Ensuring the quality of the analytics process
The point is, analytics professionals need to look at what they do to produce a quality product and not get fixated on what circumstances lead them to produce a quality product. We’re all in denial to an extent—it rarely occurs to us to look at ourselves to define quality as a consequence of what we do. We like to look elsewhere for the source of our own improvement. A recent tweet I saw said something to the effect of “the mirror in the bathroom still works.”
Of course, this is not unique to analytics professionals. That said, of all people, shouldn’t analytics professionals understand how to “build quality in,” as Deming said?
What is there to analytics quality? We need to examine what analytics professionals do with data. Is the analysis appropriate by design? Is the analysis clear and consistent? Can you trace everything step by step? Is it error-free, transparent, and complete? And is everything justifiable? A failure in replicability and reproducibility is a result of some defect in the analytics process.
And since the analytics profession does not exist without clients, colleagues, and collaborators, the quality of analytical work necessarily includes aspects of the project or task delivery. This includes things like expectations, justifications, and documentation.
One thing is for sure: you can’t speak analytics quality into existence.
There is so much more to all of this, and a single blog post is not going to do the topic anywhere near justice. Details aside (and I have so many more!), the fact remains that how analytics is done to ensure quality is something rarely thought about. We need to start there.
From everything I have seen, analytics professionals are some of the people most resistant to having analytical principles applied to their own work. They assume their products are of quality but never check their assumptions, as a good statistician should. I have come across so many experienced, senior-level analytics professionals who were utterly shocked by all the defects identified when an analytics quality program was implemented. Some become quite defensive and blame other things, like a lack of skills; fortunately, for the vast majority, it is simply a humbling learning experience.
A lot of this is just human nature; as they say, doctors make the worst patients. That said, the analytics professionals who truly “live” analytics are invariably the most trusted by their clients and colleagues.
This does require a shift in how analytics professionals view themselves. A well-respected statistician with decades of experience commented that it had never occurred to him to apply statistical and quality principles to statistics itself. If the analytics professionals are not going to think about systematically achieving quality in their analytics, then who will?