I joke quite a bit about how I am annoyed by a lot of things. I do have a lot of pet peeves, especially as I get older: Oxford commas, sweet tea that is not sweet enough, dog owners who bring leashed dogs inside an off-leash dog park, chain grocery stores selling vegetables soaking wet, and so on. There is much more.
On the professional front, among many other things, I get annoyed every time I see “AI/ML”. Too many people say “AI” when they mean machine learning or some other term relevant to analytics. Even seasoned professionals now misuse these terms, in part because everyone else is misusing them. I guess if you can’t beat them, join them?
I’ve seen those entertaining Facebook posts going around about your 2021 destiny. By my astrological sign, I think I’m supposed to end up in jail. “AI/ML” just might be what puts me there.
Why is it all so confusing?
One of the key challenges with data is that there is a lot of confusion in terminology, and it has taken on a life of its own. With all the attention on artificial intelligence today, people throw around AI, machine learning, and pretty much everything else in analytics like they throw around candy. There is a lot of “I don’t know what it is, but I don’t really care, and I need it now,” and for all the wrong reasons.
So, let’s back up for a minute. Why are these terms so confusing? The following might clarify some things:
- “Artificial intelligence” is a concept. It is simply intelligence (ability or competence) that happens to be artificial, standing in for natural intelligence.
- “Machine learning” is a class of computational approaches for analyzing data. It doesn’t even really refer to techniques.
- “Data science” is a discipline. Much like “social science” is a discipline.
They are apples and tomatoes, yet we keep pretending that they are all just different kinds of apples, even though some of them are, culinarily speaking, vegetables. So, most questions along the lines of “what’s the difference between” are largely invalid, because the comparison doesn’t make sense.
Statistical models can be, and often are, implemented computationally as machine learning models. It is also worth noting that machine learning is not always capable of producing the specific type of insights you are looking for, and there are types of insights for which machine learning is entirely inappropriate or ineffective.
Unfortunately, a lot of the confusion is being perpetuated by the supposed professionals in the discipline. Some understand the problem but have resigned themselves to conformity. Others really don’t think it’s a problem, and plenty more do not understand the difference themselves, which makes me question what passes for a professional. The sheer number of machine learning engineers I’ve run into who insist that regression models are not statistics makes me want to weep, which I suppose is better than ending up in jail.
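To make the regression point concrete, here is a minimal sketch in Python. It fits the same straight line two ways: once with the textbook least-squares formulas (the "statistics" way) and once by gradient descent on the squared error (the "machine learning" way). The data, function names, learning rate, and step count are all made up for illustration; this is a toy under those assumptions, not how any particular library does it.

```python
# Toy illustration: one model (a straight line), two ways of fitting it.
# All names and numbers below are hypothetical, chosen only for the example.


def fit_closed_form(xs, ys):
    """Ordinary least squares using the textbook slope/intercept formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_gradient_descent(xs, ys, learning_rate=0.01, steps=20000):
    """Same model, fitted by iteratively minimizing mean squared error."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        # Prediction error for every data point at the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of mean squared error with respect to each parameter.
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2 * sum(errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

stat_slope, stat_intercept = fit_closed_form(xs, ys)
ml_slope, ml_intercept = fit_gradient_descent(xs, ys)

print("closed form:     ", stat_slope, stat_intercept)
print("gradient descent:", ml_slope, ml_intercept)
```

On this toy data the two approaches should land on essentially the same slope and intercept. The model is the same; only the computation used to fit it differs.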
Why does it matter?
Confusion breeds questions for which no satisfactory answers can be provided, and this, in turn, makes people even more uncomfortable and demanding at the same time. From the business perspective, however, the distinctions are irrelevant: it doesn’t matter if it’s big data, data science, statistics, analytics, machine learning, business intelligence, etc. They all have the same objective: evidence-based decision making. The only real difference is in the specific approaches and techniques. The business does not and should not care about that.
What about Big Data? The size of data really just makes it a technology problem, which just happens to broaden the range of analytical approaches in play. Even the smallest businesses and individuals can benefit from data-centric thinking. I have plenty of examples.
To take the idea a little further: There is the raw data, which becomes the input to analytical procedures, and the outputs or results of those procedures lead to insights. All this is information, just in different forms. From the business perspective, the specific form is just a technicality; it’s the utility we’re after, or should be anyway. Think of it like an egg: you could eat it raw (although personally I find that rather unappealing), you could boil, fry, or scramble it, or you could use it to make other things. Of course, information is not tangible like eggs, so that makes it harder.
What about automation?
Now, I said all analytical terms are basically the same from the business perspective. However, there is a difference between analytics and automation, and they are confused quite often. Data science or machine learning is not automation and vice versa. Automation is an objective, and it just happens that it may or may not involve data and analytics to accomplish—today it often does.
Analytics is a tool. Automation could leverage analytics, and analytics itself could be automated. Automating an analytical task, however, solves a different problem from the one the analytics itself is intended to address.
There is a tendency to think that AI solves both the automation problem and the need for insights. This is generally not true. So, before you go headfirst into the world of (what you think is) AI:
- Articulate what you are solving from the perspective of the end user or the customer. Exactly what will the user or customer see or do? How will that be different from how that is done today? It is important that this be articulated concretely and with as much detail as possible.
- For the specific business problem, determine if your primary objective is automation or obtaining insights. While there can be elements of both, one is almost always the primary goal, and the other simply supports it, if at all. Not having a clear idea of the primary goal leads to wrong solutions, or to confusing people further because you keep using the terms incorrectly!
If done well, not only will this help give clarity, but it will also help you communicate in terms that your employees can understand, which is critical in obtaining buy-in, trust, and ultimately, adoption.
And you might just keep me out of jail….