Traditional metrics used in investment decisions, while broad, tend to come from the same set of data points: financial metrics, business operations, competitor performance, and so on. Alternative data extends the definition of data we can use to make these decisions. It allows us to generate insights and make investment decisions from new and unusual datasets, giving us an edge that can help develop a contrarian view.
The tricky part of working with alternative data is that it can be difficult to define a process that goes from idea to investment thesis to decision.
We will work through the following steps:
- Idea Generation
- Searching For Data
- Cleaning and Analyzing Data
- Creating an Investment Thesis from Our Analysis
Idea Generation
Sourcing ideas is one of the most important steps when working with alternative datasets. There are so many datasets available that we could search for hours and produce no meaningful insights. We need to develop an idea and a working hypothesis first, use them to define the data we need to test that hypothesis, and only then make the investment decision.
For the purpose of this guide, we will be using alternative datasets to make predictions about precious metals prices and executing the trade with metals futures.
Knowing that we would like to detect a precious metals mispricing, we can look into some fundamental driving factors behind different metals and their pricing.
While gold is commonly used to hedge against both cash inflation and equity markets, Ben Bernanke is an outspoken critic of its viability as an investment. We will use this as a starting point to examine the spreads between gold and related precious metals, a common statistical arbitrage technique.
Searching for Data
Now that we know we’d like to search for data related to gold and other precious metals, we can begin looking for datasets. There are several commercial alternative data providers who use high-powered telescopes, satellite imagery, and proprietary sources to collect and sell some of the alternative datasets we might be interested in, and they can be fantastic sources for ideas. However, these solutions are incredibly expensive, and for the purposes of our gold spreads thesis, we can leverage free datasets by building our own tools and scouring through free sources.
Quandl is one of these alternative data providers, but it offers a free subscription for retail users and individuals, with access to a portion of the datasets hosted on its website. Among the free datasets, one contains gold prices quoted in GBP at a daily resolution, and another contains platinum prices quoted in GBP at a daily resolution. This is a great place to start looking for trends and anomalies.
To extract this data into an Excel-friendly format that we can incorporate into our research process, we can take advantage of what the free version of Quandl allows us to do with the datasets. Once we sign up for a free account, we are given an “API Key” that we can use in a script that pulls the precious metals data and exports it to an Excel-readable file. This will require some programming, but only a few lines, and I will provide the full script and a link to an interactive version of the tool we will build to make it easier for those with non-tech backgrounds.
Acquiring and Cleaning Data
To start, we will need an IDE (Integrated Development Environment) in which to write our script. On macOS, you can open the built-in terminal by pressing Cmd+Space and typing “terminal”. On a Windows or Linux machine, you can choose from a wide array of Python IDEs. I personally use Spyder; PyCharm is also a popular choice. Once you’ve set up your IDE of choice, we need to get it familiar with the Quandl API.
To install the Quandl API library, the simplest way is to type the following into your terminal or your IDE’s console:
pip install quandl
This will install the Quandl library from the internet and allow us to write the commands we need to get our dataset working.
Now to develop our actual script, we need only four lines of code.
The first line makes sure our program loads the quandl library each time we run it. So our first line will be:
import quandl
This is followed by a line that gives us access to Quandl’s datasets. You can think of this as a “login” of sorts, similar to entering a username and password.
quandl.ApiConfig.api_key = 'YOURAPIKEYHERE'
Next, we define a new variable “df” (short for data frame) and assign the dataset we are interested in to it.
df = quandl.get('DATASETLINKHERE')
The dataset codes for the datasets we want are ‘LPPM/PLAT’ for Platinum and ‘LBMA/GOLD’ for Gold.
Finally, we want to be able to work with this data in a more user-friendly environment and collaborate with members of our investment team who may not want to program while conducting their research. We will export the data to an Excel-readable file format, ready to be shared with our team. We can do this with a built-in method of the pandas data frame that Quandl returns, using just one more line of code.
The export creates a CSV file named “df” on our Desktop that can be opened in Excel, where we can conduct our analysis and generate the investment thesis.
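As a minimal sketch of that export step, assuming “df” is the pandas data frame returned by quandl.get: pandas data frames provide a to_csv method that writes the table to a file. The helper name export_frame and the Desktop location under the home directory are illustrative assumptions, not part of the original script.

```python
from pathlib import Path


def export_frame(df, folder, name="df"):
    """Write a data frame to <folder>/<name>.csv and return the path.

    `df` is expected to be a pandas DataFrame (for example, the result of
    quandl.get). `export_frame` is a hypothetical helper for illustration.
    """
    out_path = Path(folder) / f"{name}.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    df.to_csv(out_path)  # DataFrame.to_csv writes the index (dates) as the first column
    return out_path


# Example usage (assumption: a "Desktop" folder exists under the home directory):
# export_frame(df, Path.home() / "Desktop")
```

Keeping the destination as a parameter makes it easy to write one file per metal, for example name="gold" and name="platinum".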
The full script can be copy-pasted from below (I’ve added a “print” command to show our dataset in the Python console):
import quandl
quandl.ApiConfig.api_key = 'YOURAPIKEYHERE'
df = quandl.get('DATASETLINKHERE')
print(df)
df.to_csv('df.csv')  # placeholder path; point it at your Desktop
Here’s our dataset opened in Excel, after I added a column to calculate the spread, ready to be cleaned and visualized.
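The spread column can also be computed in Python rather than in Excel. The sketch below is one hedged way to do it with the standard library only. It assumes each exported CSV has a “Date” column in YYYY-MM-DD form and a GBP price column; the column names “GBP (PM)” and “GBP PM” are assumptions about the export and should be checked against your own files. The prices in the sample are made up purely for illustration.

```python
import csv
import io


def read_prices(csv_file, price_column, date_column="Date"):
    """Return {date_string: price} for rows that have a value in price_column."""
    prices = {}
    for row in csv.DictReader(csv_file):
        value = (row.get(price_column) or "").strip()
        if value:  # skip days where the price cell is blank
            prices[row[date_column]] = float(value)
    return prices


def gold_platinum_spread(gold, platinum):
    """Spread = gold price minus platinum price, on dates present in both series."""
    common_dates = sorted(set(gold) & set(platinum))  # ISO dates sort chronologically
    return [(d, gold[d] - platinum[d]) for d in common_dates]


# Made-up sample rows standing in for the two exported files
gold_csv = io.StringIO(
    "Date,GBP (PM)\n2019-07-10,1120.0\n2019-07-11,1125.5\n2019-07-12,1126.0\n"
)
plat_csv = io.StringIO(
    "Date,GBP PM\n2019-07-10,650.0\n2019-07-11,\n2019-07-12,662.0\n"
)

spread = gold_platinum_spread(
    read_prices(gold_csv, "GBP (PM)"),
    read_prices(plat_csv, "GBP PM"),
)
for date, value in spread:
    print(date, value)
```

With real files, replace the io.StringIO objects with open('gold.csv', newline='') and open('platinum.csv', newline=''). Only dates that appear in both series are kept, so a missing quote in one metal does not produce a misleading spread.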
After removing some unnecessary columns and creating simple charts, we can see that the observations collected from Quandl show the Gold-Platinum spread at an ALL-TIME HIGH, which represents a possible arbitrage opportunity if we hypothesize that the spread will converge.
A few interesting observations from the above visualization:
- Pre-2016, Gold and Platinum seemed to have a strong positive correlation, barring an anomaly in early 2009, when Platinum took a nosedive from which it has since recovered.
- In 2016, Gold and Platinum begin to diverge, with the spread reaching an all-time high just this week (July 14, 2019, at the time of writing).
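Both observations can be checked numerically rather than by eye. The following is a rough sketch, not the method used to produce the charts: it computes the Pearson correlation between the two price series before and after a chosen split date, and finds the date on which the spread peaks. The function names and the sample rows are hypothetical, and the prices are made up to keep the example self-contained.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (values must not be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_before_after(rows, split_date):
    """Return (correlation before split_date, correlation on or after it)."""
    before = [r for r in rows if r[0] < split_date]
    after = [r for r in rows if r[0] >= split_date]

    def corr(part):
        return pearson([r[1] for r in part], [r[2] for r in part])

    return corr(before), corr(after)


def spread_peak(rows):
    """Return (date, spread) for the row with the largest gold-minus-platinum spread."""
    date, gold, platinum = max(rows, key=lambda r: r[1] - r[2])
    return date, gold - platinum


# Made-up rows: (ISO date, gold price, platinum price)
rows = [
    ("2014-01-01", 800.0, 850.0),
    ("2014-07-01", 780.0, 830.0),
    ("2015-01-01", 820.0, 870.0),
    ("2015-07-01", 790.0, 840.0),
    ("2016-01-01", 900.0, 800.0),
    ("2017-01-01", 950.0, 760.0),
    ("2018-01-01", 1000.0, 700.0),
    ("2019-07-01", 1100.0, 650.0),
]

before, after = correlation_before_after(rows, "2016-01-01")
print("correlation before 2016:", round(before, 3))
print("correlation from 2016:", round(after, 3))
print("spread peak:", spread_peak(rows))
```

If the latest date in the real dataset is also the date returned by spread_peak, the spread is at its all-time high for the period covered by the data.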
From Idea to Investment
To recap everything we’ve done so far:
- We’ve developed a hypothesis on gold spreads with other precious metals
- Found alternative datasets with relevant information to build an investment thesis based on our hypothesis
- Developed a script to turn this alternative data into something we can work with in Excel
- Cleaned and visualized this data for ease-of-understanding
- Came up with observations to pair with our hypothesis for the investment decision
Now, what can we do with our new insights? Our data has shown that we are at a peak spread between gold and platinum. Both Gold and Platinum can be bought physically, but we would not be able to hedge our exposure to falling precious metals prices if we simply bought physical platinum. Retail investors can use ETFs instead. Assuming we want to express a view that the spread will converge, we can: BUY PPLT and SHORT GLD.
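To see why the short leg acts as the hedge, here is a small sketch of how the combined position behaves. It assumes a dollar-neutral sizing (roughly equal money in each leg), which is one common convention and not something prescribed above. The entry and exit prices are hypothetical, and costs such as commissions and short borrow fees are ignored.

```python
def dollar_neutral_shares(notional_per_leg, long_price, short_price):
    """Whole shares to buy and to short so each leg is worth about notional_per_leg."""
    long_shares = int(notional_per_leg // long_price)
    short_shares = int(notional_per_leg // short_price)
    return long_shares, short_shares


def pair_pnl(long_shares, short_shares, long_entry, long_exit, short_entry, short_exit):
    """Profit of a long/short pair: the long leg gains when its price rises,
    the short leg gains when its price falls."""
    long_pnl = long_shares * (long_exit - long_entry)
    short_pnl = short_shares * (short_entry - short_exit)
    return long_pnl + short_pnl


# Hypothetical entry prices: PPLT at 80.0 (long leg), GLD at 130.0 (short leg)
pplt_shares, gld_shares = dollar_neutral_shares(10_000, 80.0, 130.0)

# Scenario 1: platinum rallies 10% and gold is flat, so the spread narrows
print(pair_pnl(pplt_shares, gld_shares, 80.0, 88.0, 130.0, 130.0))  # 1000.0

# Scenario 2: both metals fall, but gold falls further, so the spread still narrows
print(pair_pnl(pplt_shares, gld_shares, 80.0, 76.0, 130.0, 117.0))  # 488.0
```

The second scenario is the point of the pair: even with both metals falling, the position gains because gold falls further than platinum, which is the convergence the thesis is betting on.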