I recently read Dr Justin Chan’s piece on Data Driven Investing.
I found it quite interesting and eye-opening:
A Big Data intelligence firm for businesses and investors, used satellite imagery of JCPenney parking lots during the quarter to confirm that traffic into its stores across the country was in fact increasing.
The firm’s clients (mostly hedge funds) who paid to obtain this satellite imagery could thus deduce, virtually in real-time, that JC Penney’s performance was on the up. And many of them ultimately capitalized on this information by buying JCPenney stock well before the release of the company’s Q2 report in August — and well before the 10% price jump.
Chan J, 2019, The Rise of Data-Driven Investing, Data Driven Investing
Frankly, before I read this, I wouldn’t have thought to use satellite imagery to measure the busyness of stores.
Even though I’m trained in data analysis and machine learning, I still find using alternative data sources to make decisions too hard to do.
By hard, I mean you need to do the following:
- Brainstorm what data sources need to be found.
- Find out if the data source even exists. If it doesn’t, can you make it?
- Pipeline the data into a repository.
- Clean the data.
- Pipeline the data from the repository into a machine learning model or graphical interface.
- Turn the results into something meaningful.
- When the pipeline breaks, fix it.
- Repeat the whole process again when you feel the information isn’t sufficient yet.
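To make the steps above concrete, here is a minimal sketch of such a pipeline in Python, using only the standard library. Everything in it is an assumption for illustration: the raw rows are made up and stand in for whatever source you manage to find, and the function names (`collect`, `clean`, `summarise`) are hypothetical.

```python
import csv
import io
from statistics import mean

# Made-up raw data standing in for a real source (e.g. a downloaded CSV).
RAW_CSV = """date,store,cars_counted
2019-05-01,Store A,120
2019-05-01,Store B,
2019-05-02,Store A, 135
2019-05-02,Store B,n/a
2019-05-03,Store A,150
"""

def collect(raw_text):
    """Pipeline the source into a 'repository' (here, just a list of dicts)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def clean(rows):
    """Drop rows whose count is missing or not a whole number."""
    cleaned = []
    for row in rows:
        value = row["cars_counted"].strip()
        if value.isdigit():
            cleaned.append({**row, "cars_counted": int(value)})
    return cleaned

def summarise(rows):
    """Turn the cleaned rows into something meaningful: average cars per store."""
    by_store = {}
    for row in rows:
        by_store.setdefault(row["store"], []).append(row["cars_counted"])
    return {store: mean(counts) for store, counts in by_store.items()}

repository = collect(RAW_CSV)
usable = clean(repository)
summary = summarise(usable)
print(f"{len(repository)} rows collected, {len(usable)} usable")
print(summary)
```

Even in this toy version, two of the five rows are thrown away during cleaning and one store disappears from the summary entirely, which is the kind of thing that sends you back to the start of the list.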
If you’re a one-man show like myself, this is just way too time-consuming, especially if you have to juggle a career and family.
This guide is broken up into 3 sections.
Section 1: Why you shouldn’t be too much into data driven decision making.
Section 2: How to think like a data driven decision maker.
Section 3: 5 Tips on how to be lazy.
Why you shouldn’t be too much into data driven decision making
As mentioned previously, you shouldn’t get too invested in data driven decision making because it is hard.
Reason 1 — Most of the time spent on data goes into cleaning it.
This difficulty would be compounded especially if you don’t have any programming skills, particularly in R, SQL or Python.
In my humble opinion, I believe how the data came about is what is most important in data driven investing.
For instance, I wouldn’t really pay anyone to do data analysis for me as what you’re really paying for is for someone to clean your data.
A quote from Trifacta neatly summarises what happens in data analysis.
80% of the time spent on data analytics is allocated to data munging, where IT manually cleans the data to pass over to business users who perform analytics. Data munging is time-consuming and disjointed process gets in the way of extracting true value and potential from data.
This is something I can vouch for, as I have spent endless hours cleaning data rather than actually finding value, and when I do find something of value, I realise that my results suck and I have to clean the data again.
To be frank, if your data is in a table, you can simply clean the data using Excel and use some machine learning software, e.g. Weka, to do the machine learning for you. This can also be done in one shot in Power BI if you’re familiar with the software.
If you are using imagery data, you can use GCP Vision AI to find results for you. All you need to do is feed images into the API.
Reason 2 — The point of data isn’t to find some sort of hidden trend.
Most people don’t understand data in the first place. By this I mean, the purpose of data isn’t to find a hidden trend. The purpose of data is to prove or disprove your hypothesis — ‘The Scientific Method’.
For example, the use of satellite imagery to analyse JCPenney parking lots to monitor store traffic isn’t some secret. Rather, someone was smart enough to be creative in their search for data to either support or disprove their idea.
What I think really happened was this: someone recently visited JCPenney and found that it was quite busy. The financial market said that profits were falling and had probably attributed this to online shopping. This person thought otherwise, so they used satellite imagery to either prove or disprove their hypothesis.
As a side note, if you didn’t have the big bucks to pay for real time satellite imagery, you could also drive to a few of your closest JCPenney stores to check. Or better yet, you could use Google Maps to check the surrounding traffic of a few JCPenney stores as a means of using ‘data’, and do this every day at various times for a few weeks.
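If you go the do-it-yourself route, the record-keeping can be very simple. The sketch below assumes you note down a busyness score from 1 (empty) to 5 (packed) each time you check. The store name, dates, and scores are all made up for illustration, and `average_by_week` is a hypothetical helper, not part of any mapping tool.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# (date checked, store, busyness score 1-5). All values are made up.
observations = [
    (date(2019, 5, 6), "Store A", 2),
    (date(2019, 5, 8), "Store A", 3),
    (date(2019, 5, 13), "Store A", 3),
    (date(2019, 5, 15), "Store A", 4),
    (date(2019, 5, 20), "Store A", 4),
    (date(2019, 5, 22), "Store A", 5),
]

def average_by_week(rows):
    """Average the busyness score per ISO week so week-to-week change is visible."""
    weeks = defaultdict(list)
    for day, _store, score in rows:
        weeks[day.isocalendar()[1]].append(score)
    return {week: mean(scores) for week, scores in sorted(weeks.items())}

weekly = average_by_week(observations)
for week, score in weekly.items():
    print(f"ISO week {week}: average busyness {score}")

# A rising weekly average supports the 'stores are getting busier' hypothesis;
# a flat or falling one counts against it.
averages = list(weekly.values())
rising = all(a < b for a, b in zip(averages, averages[1:]))
print("Busier every week:", rising)
```

The point is not the code itself but the habit: write the observation down every time, so that after a few weeks you are comparing numbers rather than impressions.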
How to think like a data driven decision maker
You don’t need to be overly well versed in data foundations to be a data driven investor. The reason is that people in general make data seem more complex than it is.
The simplest way to think of data is this: I’ve got $2. The shop sells a small cheeseburger for $2 and a large cheeseburger for $3. My data, the $2, tells me that I can only afford the small cheeseburger. — This is data driven investing in a nutshell.
Thinking Skill 1 — Data can be aggregated and disaggregated. Data can be correlated and causal.
Firstly, you need to realise that data points can be combined and separated. In other words, data can be added up, subtracted, multiplied, and divided. This too can happen with image, video and sound data via pre-processing.
*As a side note, even with simple satellite images, they still need to be processed so that they can become viewable.
Perhaps what is lost on many people is that they take the highest level of aggregation as the truth and do not drill down further.
Here’s an example of what I mean:
You can find more of Aristocrat’s Annual Reports here.
The revenue from 2015 to 2016 jumped by 546.3 million (fig 1). If you didn’t know any better, you’d think the company had a good run that year. If you looked at the footnotes (fig 2), you’d see that revenue jumped across the board. However, if you dug a little deeper, you’d see from the cash flow statement (fig 3) that the company actually made a substantial acquisition in 2015. In most cases, acquisitions add to revenue in the following year.
What we could guess from our data points above:
1. We could correlate the figures with 2016 being a better year for the company than 2015, since revenue from all segments went up.
2. We could say that the cause of 2016’s revenue improving is the acquisition made in 2015.
This is more annual report reading over alternative data decision making, but the concept is the same: data is data. As you can see data can be broken down and built up, but also data points can be influenced by other factors.
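The same idea can be sketched in a few lines of Python. The figures below are made up for illustration and are NOT taken from Aristocrat’s or any other company’s annual report. The segment names and the `growth` helper are hypothetical too.

```python
# Revenue by segment in $ millions. All figures are made up for illustration
# and are NOT taken from any company's annual report.
revenue = {
    # segment: (last year, this year, acquired during last year?)
    "Existing business A": (800.0, 860.0, False),
    "Existing business B": (400.0, 420.0, False),
    "Acquired business": (50.0, 400.0, True),
}

def growth(before, after):
    """Percentage growth from one year to the next."""
    return (after - before) / before * 100

# Aggregated view: one headline number.
total_before = sum(prev for prev, _curr, _acq in revenue.values())
total_after = sum(curr for _prev, curr, _acq in revenue.values())
print(f"Headline revenue growth: {growth(total_before, total_after):.1f}%")

# Disaggregated view: strip out the acquired segment.
organic_before = sum(prev for prev, _curr, acq in revenue.values() if not acq)
organic_after = sum(curr for _prev, curr, acq in revenue.values() if not acq)
print(f"Growth excluding the acquisition: {growth(organic_before, organic_after):.1f}%")
```

With these made-up numbers, the headline figure says revenue grew by roughly a third, while the business you already owned grew by under 7%. Both numbers come from the same data; only the level of aggregation changed.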
Even if you had the whole data set in a big data format (i.e. all transactional data), finding the relevant trends isn’t any easier. You can find so many correlated trends in a big data set but very few actual causal trends. All you can really do is aggregate the data, look at trends for a specific variable, and not think too hard about trying to find causal trends.
*As a side note, what supervised machine learning does is find many correlated trends to your target variable and makes a prediction of the future based on these correlations. Sometimes it is successful but many times it fails miserably.
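To see how easy it is to find correlated trends that mean nothing, here is a small experiment you can run yourself. It is only a sketch under stated assumptions: the ‘share price’ and the 500 ‘data sets’ are random walks generated independently of each other, so any correlation between them is pure chance. The helper names (`pearson`, `random_walk`) are my own.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def random_walk(rng, steps):
    """A series that drifts up and down purely by chance."""
    level, series = 0.0, []
    for _ in range(steps):
        level += rng.gauss(0, 1)
        series.append(level)
    return series

rng = random.Random(42)  # fixed seed so the run is repeatable

# Pretend this is 24 months of a share price (it is just noise).
target = random_walk(rng, 24)

# 500 'alternative data sets' that have nothing to do with the target.
candidates = [random_walk(rng, 24) for _ in range(500)]

correlations = [pearson(target, series) for series in candidates]
strongest = max(correlations, key=abs)
print(f"Strongest correlation among 500 unrelated series: {strongest:.2f}")
```

Search enough unrelated series and one of them will track your target surprisingly well. A model trained on that series would look clever on past data and have no reason to work in the future.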
Thinking Skill 2 — Understand that you’re a biased thinker and know that you’re always better off proving yourself wrong than proving yourself right.
Chapter 15 in ‘HBR Guide to Data Analytics Basics for Managers’ provides a good list of biases that we tend to suffer from when looking at data. The list includes confirmation, overconfidence, and overfitting.
Confirmation bias occurs when you read data in such a way that it supports your view or you twist the logic so that it does support your view.
For instance, company A has seen a 15% year-on-year increase in profit for the last 3 years. You believe this will continue. Two reports come out. One report suggests the profits will continue to rise and the other suggests a sluggish year will come. You ignore the negative report and invest based only on the positive report’s suggestions.
Overconfidence occurs when you have such faith in the data that you believe it can’t be wrong.
For example, company A’s revenue has grown 15% year on year for the last 5 years. You are confident that it will keep growing at 15% in the future. Someone tells you that the company’s leverage has also increased 20% year on year. You confidently tell that person that you believe the company will surely pay off its growing debt in one fell swoop one day. In the following year, the company goes bankrupt.
Overfitting occurs when you find a trend and believe it holds true for all situations.
As an example, you have found that, in data going back 20 years, the Christmas period always increased the profit of large department stores. So, you decide to invest heavily in a particular department store just before Christmas. However, the following quarter’s financial statements show that revenue has fallen. Reports explain that online shopping boomed over that Christmas period.
You could come up with strategies to tackle all three biases individually, or instead you can use one general strategy that tackles as many of your biases as possible.
That strategy is to prove yourself wrong.
For example, you think the investment by Domino’s Pizza Enterprises (ASX:DMP) in the German market will be successful, so you actively look for data that could disprove your hypothesis. You can go onto the Eurostat webpage to conduct your research. There you can see that the consumption of pizza in Germany fell from July to December 2020 but rose again slightly from January to March 2021. An educated guess might lead us to believe that pizza in Germany is consumed less in warmer months and more in cooler months, or perhaps that the German market didn’t like the options available and only recently have better options attracted more consumers.
Clearly, this data neither supports nor negates your hypothesis but it gives you the idea that hunting for data that might prove you wrong is important for data driven investing.
5 Tips on how to be lazy
The purpose of these tips is mostly to steer you away from analysis paralysis and towards understanding how data flows around us. Analysis of data is hard work and you don’t really need to waste your time on too much analysis. Furthermore, you don’t need an academic background in data to understand how data flows. All you need to do is engage with data and it’ll come intuitively.
- Paying for data shouldn’t come first. The way to get creative in finding data sources is to spend as little money as you can on having someone else find the data for you. Working hard so that you can pay someone to do data analysis for you takes more effort than finding creative ways to make data driven decisions. As with the earlier JCPenney example, rather than pay someone for satellite images, you could look at Google Maps traffic trends, record the results over a few weeks at different times of day, and visit a few stores yourself to see how busy they actually are.
- Actively find useful data to help you with decision making. By this I mean you want to look for productionised data that is free and useful. For example, the reason I believe I could use Google Maps traffic data as a proxy for how busy a store is comes from my commute: Google Maps shows me the route home from work every day, and inadvertently shows me how busy each place along the way is. This is better than trying to hunt down news reports just before leaving work to see what traffic accidents have happened on the way home.
- Find ways and systems to automate decision making. Rather than use systems aimlessly, you should begin adopting technologies that use data and you should understand how and why the data is being used. For example, begin using credit cards to track your purchases. Credit card companies have constantly running algorithms that notify you if an odd purchase has been made. The purpose of this is to get you used to understanding how data is created and productionised.
- Read what you have before you begin searching. The very first step in investing is to understand the business before anything else. This can be as simple as downloading the year’s annual report and reading the principal activities section, which is usually a sentence. At least by understanding how the business works, even at a superficial level, you’ll have an idea of what other data to look for.
- Understand business data and financial data and integrate the two in your decision making. Data for decision making comes in two flavours: business data and financial data. You need to understand both. Business data can be: What is the foot traffic? How often are complaints made? How does the product compare to its competitors? Financial data can be: What is net profit after tax? What is the asset to profit ratio? What is the leverage? All you want to know is if the financial data agrees with the business data. The two are linked. For example, if a business increases in the number of customers, you’d expect to see an increase in revenue. If you don’t, then you can suspect decreasing profit margins. Then, you calculate the profit margins to prove yourself right or wrong.
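The check described in the last tip is simple arithmetic, and can be sketched as follows. All figures are made up for illustration, and the helper names (`pct_change`, `net_profit_margin`) are hypothetical. Net profit margin here is assumed to mean net profit after tax divided by revenue.

```python
def pct_change(before, after):
    """Percentage change from one period to the next."""
    return (after - before) / before * 100

def net_profit_margin(net_profit, revenue):
    """Net profit after tax as a percentage of revenue."""
    return net_profit / revenue * 100

# All figures are made up for illustration.
last_year = {"customers": 10_000, "revenue": 5_000_000, "net_profit": 500_000}
this_year = {"customers": 12_000, "revenue": 5_100_000, "net_profit": 357_000}

customer_growth = pct_change(last_year["customers"], this_year["customers"])
revenue_growth = pct_change(last_year["revenue"], this_year["revenue"])
margin_before = net_profit_margin(last_year["net_profit"], last_year["revenue"])
margin_after = net_profit_margin(this_year["net_profit"], this_year["revenue"])

print(f"Customers: {customer_growth:+.1f}%  Revenue: {revenue_growth:+.1f}%")
print(f"Net profit margin: {margin_before:.1f}% -> {margin_after:.1f}%")

# The business data and financial data disagree if customers grew
# noticeably faster than revenue; the margin shows whether profit suffered too.
if customer_growth > revenue_growth and margin_after < margin_before:
    print("More customers but thinner margins: worth digging into why.")
```

In this made-up case, the business data (many more customers) looks great on its own, and it is only when you set it beside the financial data that the suspicion about margins is confirmed.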
As you can see, the whole point of this guide is to dissuade you from believing that data will give you some sort of large competitive advantage in investing. If you think it will, then you’ll be at a terrible disadvantage. There are many big data firms out there already mining data sources to gain a competitive advantage. For example, Two Sigma uses machine learning to find how the news affects the share market in real time.
However, the advantage you have is that really smart people tend to focus on things so intensely that they suffer from diminishing returns and often don’t know when to give up (the sunk cost fallacy).
So, while data scientists focus intently on using machine learning algorithms to find hidden patterns and trends in data sets, all you really need to do is understand how a business works, hypothesise whether a revenue stream is successful, and go out into the world to experience it yourself or find a data set as a proxy for your ideal data set.
If you want to see how I evaluate companies using financial and business data, you can see here: https://sites.google.com/view/focusanalysis/insights-and-analysis