The last five years have been revolutionary for the field of natural language processing (NLP). We went from glorified ctrl-f to a machine that can write programs for us based on natural language descriptions. While the web is full of amazing technical demos, applications of NLP to the world of finance have been less publicized.
I’m in the lucky position of running a text annotation tools company whose customers use our tools to build the next generation of NLP applications. From that vantage point, we’ve seen quite a few applications of NLP to finance, and in this post I’ll present a few that we can share.
Whenever I think “Finance” I first think about having an edge and beating the market. One place people are looking for an edge is social media, where NLP is used to detect tweets that mention a particular stock, and whether the tweet has positive or negative sentiment.
In 2011 the Huffington Post reported that Berkshire Hathaway’s stock goes up whenever Anne Hathaway releases a new movie. They posited that “all those automated, robotic trading programming are picking up the same chatter on the internet about ‘Hathaway’ as the IMDb’s StarMeter, and they’re applying it to the stock market”.
While this particular case may be dubious, sentiment analysis of financial news and financial social media is a hot topic, with significant academic research and a presence in the alternative data space from providers such as Singapore’s Infotrie.
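To make the idea concrete, here is a minimal sketch of the two steps described above: finding tweets that mention a stock and scoring their sentiment. It assumes tweets reference stocks with cashtags such as $XYZ, and it uses a tiny hand-written word list as a stand-in for a trained sentiment model. The ticker, the word lists, and the function names are all hypothetical.

```python
import re

# Hypothetical, tiny sentiment lexicon; real systems learn this from labeled data.
POSITIVE = {"beat", "beats", "surge", "surges", "upgrade", "bullish", "strong"}
NEGATIVE = {"miss", "misses", "plunge", "plunges", "downgrade", "bearish", "weak"}

# Assumption: stocks are referenced with uppercase cashtags like $XYZ.
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")


def mentioned_tickers(text):
    """Return the set of cashtag tickers found in the text."""
    return set(CASHTAG.findall(text))


def sentiment_score(text):
    """Count positive minus negative lexicon words; above zero reads as positive."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def score_tweet(text, ticker):
    """Return a sentiment score if the tweet mentions the ticker, else None."""
    if ticker not in mentioned_tickers(text):
        return None
    return sentiment_score(text)
```

Note that a tweet about a Hathaway movie with no cashtag returns None here. Matching on the bare word “Hathaway” instead is exactly the kind of shortcut that would produce the effect described above.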
Name Matching and KYC
Financial services companies are required to validate their customers’ identity and match them against certain blacklists, such as those put out by the Office of Foreign Assets Control (OFAC). This gets tricky when dealing with name variants across borders. For example, my grandfather immigrated from Iraq to Israel and transliterated his name from the Arabic Abdallah to the Hebrew Ovadia.
Additionally, some people or businesses may try to get around those blacklists by modifying their names, such as “Eagle Pharmaceuticals, Inc” creating a subsidiary called “Falcon Drugs, Ltd.”
Both of these cases illustrate the problem financial institutions face when they try to implement name matching components in their KYC process. Natural language processing helps here by supplying the linguistic knowledge that can match “Abdallah” to “Ovadia” (both mean servant of god) or by identifying the semantic similarity of “Eagle Pharmaceuticals” and “Falcon Drugs”.
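As a rough illustration of the first half of that problem, the sketch below normalizes names, maps known variants through an alias table, and then compares them with the standard library’s difflib.SequenceMatcher. The alias table, the suffix list, and the 0.85 threshold are assumptions made for the example, not a real screening configuration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table: in practice this knowledge comes from linguists
# or a trained model, not a hand-written dictionary.
ALIASES = {"ovadia": "abdallah", "abdullah": "abdallah"}

# Legal suffixes that carry no identifying information (illustrative list).
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "co"}


def normalize(name):
    """Lowercase, drop punctuation and legal suffixes, and map known aliases."""
    tokens = re.findall(r"[a-z]+", name.lower())
    tokens = [t for t in tokens if t not in LEGAL_SUFFIXES]
    return " ".join(ALIASES.get(t, t) for t in tokens)


def name_similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def possible_matches(candidate, watchlist, threshold=0.85):
    """Return watchlist entries whose similarity to the candidate meets the threshold."""
    return [w for w in watchlist if name_similarity(candidate, w) >= threshold]
```

String similarity of this kind catches spelling variants and suffix changes, but it will not connect “Falcon Drugs” to “Eagle Pharmaceuticals”. That second case needs a model of meaning, such as word embeddings, which is beyond this sketch.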
A lot of high-flying sell-side trading happens in Bloomberg chats, and those chat rooms sometimes lead to bad behavior. In 2016, Paul White, a trader at RBS, was caught manipulating Libor in the chats, and RBS was fined £390m.
While the bank might not care about White, it certainly does care about paying £390m in fines, so compliance monitoring of Bloomberg chats has become a major topic of interest at financial institutions globally.
But a bank can’t simply assign someone to read all of the messages, because the volume is far too high. Instead, banks today use natural language processing algorithms that analyze conversations and generate alerts, helping to prevent problems before they happen.
Of course, traders have stayed ahead of this, and today the biggest flag for these systems is “Let’s move this to WhatsApp/Telegram”.
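That last flag is simple enough to sketch as a rule. The snippet below is a toy version that scans messages for a request to move the conversation to another app. The pattern and function name are hypothetical, and a production surveillance system would combine many such rules with trained models.

```python
import re

# Hypothetical pattern: a verb suggesting a move, followed later in the same
# message by the name of an off-channel messaging app.
CHANNEL_SWITCH = re.compile(
    r"\b(move|take|switch|continue)\b.*\b(whatsapp|telegram)\b",
    re.IGNORECASE,
)


def flag_messages(messages):
    """Return (index, text) pairs for messages that suggest moving off-channel."""
    return [(i, m) for i, m in enumerate(messages) if CHANNEL_SWITCH.search(m)]
```

Calling flag_messages on a list of chat lines returns only the lines that match, along with their positions, so a compliance analyst can review the surrounding conversation.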
MiFID II is a set of regulations for financial institutions that came into effect in 2018. One of its clauses mandated the unbundling of research from the trading services banks provided to their customers. That means that instead of customers paying commissions on trades and getting research for “free”, they’d have to opt in to buy individual research items that sell-side firms produced.
This change created a problem for the research divisions of major banks, since overnight they had to justify themselves by actually getting money for the research they produced. It was no longer enough for research to be good; it had to be relevant to a particular customer at the right time. That sounds a lot like content relevance, an active area of research in NLP, and indeed with the implementation of MiFID II, a wave of startups appeared applying NLP to find (and sell) the most relevant content to the most relevant users.
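One classic baseline for content relevance is to represent a client’s interests and each research report as TF-IDF vectors and rank reports by cosine similarity. The sketch below does that in plain Python. It is an illustration of the general technique, not a description of what any particular startup built, and the function names are my own.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build a TF-IDF vector (dict of term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_reports(client_profile, reports):
    """Return report indices ordered from most to least relevant to the profile."""
    vectors = tfidf_vectors([client_profile] + reports)
    profile, report_vectors = vectors[0], vectors[1:]
    scores = [cosine(profile, r) for r in report_vectors]
    return sorted(range(len(reports)), key=lambda i: scores[i], reverse=True)
```

Given a profile such as “european bank credit risk” and a handful of report titles, rank_reports puts the reports that share the most distinctive terms with the profile first. Real systems would add timing, reading history, and learned representations on top of this.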
Loan Origination Document Management
Have you ever wondered who pays for the construction of a 30-story apartment building? Well, it’s the bank, which gives a loan to the developer. While the average American mortgage is about $200K, a loan for commercial real estate development could easily be $100M or more.
With so much more money comes an unimaginable amount of additional paperwork. Freddie Mac defines checklists that include documents like a “Moisture Management Plan”, “Wood Damaging insect inspection report” or a “HAP Contract”. These documents add up to hundreds, and variations of them need to be submitted again at various stages of the development lifecycle.
For lenders, this creates an operational challenge. Lenders are responsible for receiving, categorizing, and analyzing each document that a borrower submits. When done by humans, this becomes an expensive and time-consuming task.
Luckily for lenders, NLP is very applicable in these situations. We’ve seen lenders train NLP models that automatically categorize documents and extract the relevant information from them.
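To give a feel for what “training a model to categorize documents” means, here is a minimal multinomial naive Bayes classifier in plain Python. It is a deliberately simple stand-in for the models lenders actually train, and the class name and labels used with it are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        """Count documents per label and words per label from labeled examples."""
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A usage example, with made-up training snippets for three of the document types mentioned above:

```python
model = NaiveBayesClassifier().fit(
    [
        "moisture management plan for the building envelope",
        "wood damaging insect inspection report termite findings",
        "HAP contract housing assistance payments agreement",
    ],
    ["moisture_management_plan", "insect_inspection_report", "hap_contract"],
)
label = model.predict("Inspection report: no termite activity found")
```

With only one example per class this is a toy, but the workflow is the same at scale: collect labeled documents, fit a model, and route new submissions by predicted category.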
NLP is applicable in finance for both front-office and back-office operations, and its impact on the business can be profound. Further, NLP is becoming democratized through the open-source distribution of state-of-the-art models and through vendors that make those models accessible to non-technical organizations.
That’s not to say that NLP is a free lunch. NLP algorithms are learning algorithms, and they need data to learn from. As a user of NLP technology, you’ll need to define what it is you want the model to learn, and collect examples with which you’ll adapt the model to your problem and later evaluate its performance.
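Evaluating performance usually comes down to comparing the model’s predictions with human-labeled examples. As a small sketch of that step, the function below computes precision and recall for one label. The function name and the example labels are illustrative.

```python
def precision_recall(gold, predicted, positive_label):
    """Compute precision and recall for one label from paired gold/predicted lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive_label and p == positive_label for g, p in pairs)
    fp = sum(g != positive_label and p == positive_label for g, p in pairs)
    fn = sum(g == positive_label and p != positive_label for g, p in pairs)
    # Guard against division by zero when the label never appears.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision tells you how many of the model’s alerts were real, and recall tells you how many of the real cases the model caught. Which one matters more depends on the application: a compliance team may tolerate false alarms to avoid missing a case, while a document router may prefer the opposite.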