In my previous blog post, “Who is looking after the quality of your analytics?”, I discussed what it means to ensure the quality of analytics work. We readily speak of using analytics to improve the quality of other things but rarely of the quality of analytics itself. I also argued that quality can be defined, one way or another, in terms of defects, which means perfect quality can be expressed as the absence of defects.
Well, it is one thing to talk about quality. How do we implement it in an analytics practice?
How data quality relates to the quality of analytics practice
Quality headaches in analytics typically center on the input data. As I explained in my previous post, data quality is largely a dependency and a constraint for analytics. So, this is not what we mean by the quality of analytics.
However, we can leverage the learnings from industry data quality standards to frame analytics quality. Specifically, it is useful to consider how data management defines quality dimensions. Although there are variations, they are fairly well-defined (see, for example, DAMA Data Management Body of Knowledge, 2nd edition). Data quality issues, that is, data defects, can then be identified against one or more of these dimensions:
- Completeness: How populated or complete is it? Is all required data present?
- Validity: Do the values technically conform to the defined domain?
- Accuracy: Are there errors?
- Consistency: Are the values consistent within and across?
- Integrity: Is the data coherent within and across data objects?
- Timeliness: Does it reflect the timing of interest?
- Uniqueness: Is there any duplication or fragmentation?
- Reasonability: Does it reasonably reflect reality?
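To make the idea of identifying defects against dimensions concrete, here is a minimal sketch that checks a handful of records against four of the dimensions above. The records, field names, valid country codes, and plausible age range are all hypothetical and chosen only for illustration; real rules would come from your own data standards.

```python
from collections import Counter

# Hypothetical records and rules, for illustration only.
records = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "CA"},
    {"id": 2, "age": 41, "country": "XX"},
    {"id": 3, "age": 230, "country": "US"},
]
REQUIRED_FIELDS = ("id", "age", "country")
VALID_COUNTRIES = {"US", "CA"}   # assumed domain of valid values
PLAUSIBLE_AGE = range(0, 121)    # assumed plausible range


def find_defects(rows):
    """Return (dimension, record id, message) tuples, one per defect found."""
    defects = []
    for row in rows:
        record_id = row.get("id")
        # Completeness: every required field is populated.
        for field in REQUIRED_FIELDS:
            if row.get(field) is None:
                defects.append(("completeness", record_id, f"{field} is missing"))
        # Validity: the value conforms to the defined domain.
        country = row.get("country")
        if country is not None and country not in VALID_COUNTRIES:
            defects.append(("validity", record_id, "country outside domain"))
        # Reasonability: the value plausibly reflects reality.
        age = row.get("age")
        if age is not None and age not in PLAUSIBLE_AGE:
            defects.append(("reasonability", record_id, "implausible age"))
    # Uniqueness: no identifier appears more than once.
    id_counts = Counter(row.get("id") for row in rows)
    for record_id, count in id_counts.items():
        if count > 1:
            defects.append(("uniqueness", record_id, f"id appears {count} times"))
    return defects


defects = find_defects(records)
for defect in defects:
    print(defect)
```

Each defect is tagged with the dimension it violates, which is the same framing the rest of this post applies to analytics work.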
Dimensions of quality for analytics practice
We can extend the idea to define a similar set of dimensions for analytics:
- Appropriateness: Are the analysis design and execution appropriate for the question and the need?
- Clarity: Is everything clear and free of ambiguities?
- Consistency: Is the analysis tool- and system-agnostic? Does it produce the same result every time you run the same thing?
- Traceability: Can the lineage of the analysis be traced completely from start to finish?
- Accuracy: Is the execution free of errors?
- Transparency: Can the analysis result and the analytic itself be explained? Is everything about the analysis clearly documented so that someone who was not involved can understand?
- Completeness: Is the logic free of gaps? Has everything that matters been considered and accounted for?
- Justifiability: Is there a defensible reason for everything? What are the risks, dependencies, and limitations of the choices?
A failure in replicability or reproducibility should trace back to a defect in at least one of these dimensions.
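One narrow part of the consistency dimension, whether the same run gives the same result, can be checked mechanically. The sketch below is only an illustration under stated assumptions: `run_analysis` is a hypothetical stand-in for a real analysis (here a small bootstrap of the mean), and `fingerprint` is a hypothetical helper that hashes the result so two runs can be compared.

```python
import hashlib
import json
import random


def run_analysis(data, seed):
    """Hypothetical analysis: a bootstrap estimate of the mean."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    means = []
    for _ in range(200):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return {"bootstrap_mean": round(sum(means) / len(means), 6)}


def fingerprint(result):
    """Hash a result dictionary independently of key order."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [2.1, 3.4, 1.9, 5.0, 4.2]
first = fingerprint(run_analysis(data, seed=42))
second = fingerprint(run_analysis(data, seed=42))
reproducible = first == second
print("reproducible:", reproducible)
```

A check like this only covers repeat runs in one environment. Being tool- and system-agnostic, as the dimension asks, would also require comparing results across environments, which this sketch does not attempt.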
Furthermore, it is important to realize the analytics profession does not exist without clients, colleagues, and collaborators. Then, the quality of the analytics work necessarily includes project delivery considerations, since they closely intertwine with the technical aspects of the project:
- Have clear expectations been set?
- Have these expectations been met?
- Has everything, analytical and non-analytical, been justified?
- Has everything been documented?
- Is the project free of outstanding issues?
- Is there validation that all expectations and requirements have been met?
Managing the quality of analytics projects
Now that we have the dimensions, we consider what it means to manage quality in analytics. Quality management is a lifecycle, and while there are variations, it generally consists of the following components.
- Quality planning: defining the requirements, tasks, and activities to design quality into the product.
- Quality assurance: generating the product by employing methodologies to reduce defects and maximize the likelihood of achieving quality expectations.
- Quality control: verifying the products meet quality requirements.
- Quality maintenance and improvement: performing ongoing activities to maintain and improve the quality of the product.
Managing the quality of analytics projects, then, involves applying these principles. It should be noted “quality assurance” and “quality control” are often used interchangeably or even thought to represent the same idea. For our discussion, however, we maintain the distinction above.
Mapping quality management to analytics
Although not always explicit, the typical analytics project lifecycle consists of the following stages: design, development, deployment, and use. In the design stage, the aspects of the analytic and the project are planned and designed. The analytic is developed in the development stage and then made available in the deployment stage. Finally, in the use stage, the users leverage the analytic to make business or research decisions. The quality management components map onto these stages as follows:
- Quality planning happens primarily in the design stage, in which we define how we plan to design quality into the analysis and the project.
- Quality assurance is carrying out the project and the analysis in a way that minimizes defects and maximizes the likelihood of achieving the quality expectations. Since this applies to practically everything related to an analytics project, it spans from design and development to deployment. There are methodologies, standards, and practices designed to accomplish quality goals. It is also important to standardize approaches, not just routines and macros. Analytics practitioners often shy away from pre-defining their approach, but much more can be standardized than commonly perceived.
- Quality control is verifying that the quality requirements have been met in an analytics project. This often consists of checklists, reviews, and audits. Even design has a doing aspect that can be verified. Quality and productivity best practices do frown upon relying on “inspection.” However, verification, especially independent verification, that the project meets quality expectations is valuable and sometimes even mandated by regulation in analytics.
- Quality maintenance and improvement include post-launch or post-publication activities. There are two parts. The first is the ongoing maintenance of the analytic and its use, which primarily concerns ensuring the analytic maintains its external validity over time and identifying operational defects such as system problems and errors, changes in data structures, and changes in contexts and/or behaviors. The second part is improving the system of analytics quality practices for the next project.
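As one small example of a maintenance activity, changes in data structures can be caught before they silently break a deployed analytic. The following is a minimal sketch, assuming a hypothetical expected schema (`EXPECTED_SCHEMA`) and a hypothetical incoming row; the column names and types are invented for illustration.

```python
# Hypothetical schema the deployed analytic was built against.
EXPECTED_SCHEMA = {"customer_id": int, "spend": float, "segment": str}


def schema_drift(expected, row):
    """List the differences between an incoming row and the expected schema."""
    issues = []
    for column, expected_type in expected.items():
        if column not in row:
            issues.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            issues.append(
                f"type change in {column}: expected {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    for column in row:
        if column not in expected:
            issues.append(f"unexpected column: {column}")
    return issues


# Hypothetical row from a changed upstream source.
incoming = {"customer_id": "A-17", "spend": 120.5, "region": "west"}
issues = schema_drift(EXPECTED_SCHEMA, incoming)
for issue in issues:
    print(issue)
```

A structural check like this says nothing about changes in contexts or behaviors, which usually need monitoring of the analytic's outputs and performance over time.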
Quality control in analytics practice
Curiously, traditional notions about quality in analytics have focused on quality control, especially in the form of peer review. Sometimes, a review is even mandated by regulation. This means some quality control practices and standards do exist in analytics, though they are not always implemented well.
At times, they exist as systems of independent reviews. However, having a separate team is not always appropriate. While creating an independent team for this purpose is often encouraged or even required, it does have drawbacks. Furthermore, it is obviously more challenging in small organizations.
There are other ways to implement quality control practices while maintaining some level of independence. The “how” depends on the needs and the circumstances.
It is, however, useful to separate the delivery audit from the expert review within the quality control realm. The expert review is perhaps the more familiar of the two, but both question what has been done in the project.
The purpose of the expert review is to evaluate the analysis against good technical practices. The peer review for journal publications is a classic example.
However, many defects have little to do with technical expertise. To address this, the delivery audit becomes quite literally an audit function. It checks against requirements, ensures analysts have done everything they have said they would, verifies every decision has been justified and documented, and confirms there are no obvious tactical errors.
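Because the delivery audit checks recorded facts rather than technical merit, much of it can be expressed as a simple checklist. The sketch below is one hypothetical way to structure it; the `Requirement` and `Decision` classes, their fields, and the example entries are assumptions made for illustration, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    description: str
    met: bool = False


@dataclass
class Decision:
    description: str
    justification: str = ""
    documented: bool = False


def delivery_audit(requirements, decisions):
    """Return a finding for each unmet requirement or unsupported decision."""
    findings = []
    for req in requirements:
        if not req.met:
            findings.append(f"requirement not met: {req.description}")
    for dec in decisions:
        if not dec.justification.strip():
            findings.append(f"decision not justified: {dec.description}")
        if not dec.documented:
            findings.append(f"decision not documented: {dec.description}")
    return findings


# Hypothetical project records.
requirements = [
    Requirement("results delivered by segment", met=True),
    Requirement("holdout sample used for validation", met=False),
]
decisions = [
    Decision("excluded accounts under 90 days old",
             justification="insufficient history", documented=True),
    Decision("capped spend at the 99th percentile"),
]
findings = delivery_audit(requirements, decisions)
for finding in findings:
    print(finding)
```

Nothing in this check requires knowing whether capping at the 99th percentile was a good idea. That judgment belongs to the expert review.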
There are important practical implications of this distinction. While the expert reviewer must be an expert, lacking deep expertise is an asset for the delivery auditor, since innocent questions often lead to error discovery. The expert reviewer should become familiar with the analysis he or she is reviewing, but becoming familiar with the project can compromise the idea of an arm’s-length delivery audit.
If we do quality planning and assurance well, then quality control is straightforward. As Deming said, we “cannot inspect quality into a product,” although some verification is always needed, independent or otherwise.
Implementing a quality program of analytics practice
It is one thing to understand what quality is and what its theoretical implications are in analytics. In practice, many analytics managers and practitioners struggle to make it happen.
So how do we get there? One thing is certain: we cannot speak analytics quality into existence, and in many organizations it is still a pipe dream. Some key practical considerations for implementing a quality program include the following:
- First things first. Create an inventory of the analytics and the analytics projects. Do you know what you are managing the quality of?
- Define roles and responsibilities. Who does what? Should you have an independent team? Who signs off on what? And most importantly, who will own the quality program?
- Define processes and procedures. What does the workflow look like? How do you define a common flow of activities to make things predictable?
- Define standards, policies, and requirements. What will defects be identified against? Do you need checklists, forms, and auditability standards? Are there regulatory requirements? The list goes on.
- Set up basic infrastructure. At a minimum, this includes a document store, some logging capabilities, a shared computing environment, and common tools. If you productionalize the resulting analytics, you need environments, tools, and data sets for testing before moving to production. I know from experience that unit testing is inadequate for identifying defects in productionalized analytics.
- Train all analysts on the basic quality principles and practices. They must understand the quality expectations as well as the methodologies and techniques for quality. Since good quality practices often run counter to what analytics practitioners consider elegant and advanced, this requires a shift in mindset.
Obviously, there is a lot more to implementing a successful analytics quality program. It takes the ability to balance the diverse needs of the business without compromising the quality principles. With a solid framework, a quality program is not as specific to particular techniques or types of analytics as commonly perceived. It also requires a lot of discipline and not as much art as you might think!