Daily news drives financial markets and directly influences the prices of financial instruments in the equity and debt markets, along with derivatives, which are even more susceptible to news. Both the number of online information sources and the volume of articles they generate grow by the day. An analyst (or trader) typically has to find and screen asset-related articles manually to stay up to date on the state of affairs in a particular market (e.g. publicly traded companies, commodities, or cryptocurrencies).
Portfolio analysts are forced to process a great deal of irrelevant or duplicated news for each asset they are interested in. Together with our client, we decided to start with the extraction of person names, company names, and countries from the body of each text. The resulting keywords for each type of named entity are used to filter the texts, keeping only those relevant to a given analyst (or trader).
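A minimal sketch of that filtering stage is shown below. The data shapes and the case-insensitive matching rule are illustrative assumptions, not the client's actual schema: each article carries the entities extracted from its body, and an analyst subscribes to a watchlist of keywords.

```python
# Hypothetical keyword-based relevance filter (illustrative names, not the
# production schema): keep an article if any extracted entity matches the
# analyst's watchlist, comparing case-insensitively.

def is_relevant(article_entities, watchlist):
    """True if any extracted entity appears in the analyst's watchlist."""
    normalized = {e.lower() for e in article_entities}
    return any(w.lower() in normalized for w in watchlist)

def filter_articles(articles, watchlist):
    """articles: list of (title, extracted_entities) pairs."""
    return [title for title, ents in articles if is_relevant(ents, watchlist)]

articles = [
    ("Oil prices rally", ["OPEC", "Saudi Arabia"]),
    ("Tech earnings preview", ["Apple", "Tim Cook"]),
    ("Local weather report", []),
]
print(filter_articles(articles, watchlist={"apple"}))
# -> ['Tech earnings preview']
```

In production the watchlist would come from each analyst's portfolio configuration; the point is only that entity extraction reduces filtering to a cheap set-membership check per article.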
We opted for context-dependent BERT text embeddings and deep learning models that take those embeddings as inputs. A trained model finds all the entities (companies, countries, and people), which are then deduplicated and returned as keywords. Although the model was trained only on a labelled English dataset, it achieved limited success on entity extraction in other languages, especially those semantically close to English, thanks to the multilingual capabilities of Google's Multilingual BERT embeddings. A microservice built on Sanic delivered high performance and rapid responses by processing requests asynchronously, and it scaled well in the AWS cloud.
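The post-processing step can be sketched as follows: a token-classification head over the embeddings typically emits per-token BIO tags, which are grouped into entity spans and then deduplicated into the final keyword list. The tag names and the simple case-folding deduplication rule here are assumptions for illustration, not the exact production logic.

```python
# Sketch: decode token-level BIO predictions (as a token-classification head
# over BERT embeddings would produce) into entity spans, then deduplicate
# them into keywords. Tag scheme and normalisation rule are illustrative.

def decode_bio(tokens, tags):
    """Group B-/I- tagged tokens into (entity_text, entity_type) spans."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)           # continue the open entity
        else:                             # "O" tag closes any open entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

def dedupe(entities):
    """Case-insensitive dedup, keeping the first surface form seen."""
    seen, out = set(), []
    for text, etype in entities:
        key = (text.lower(), etype)
        if key not in seen:
            seen.add(key)
            out.append((text, etype))
    return out

tokens = ["Apple", "hired", "Tim", "Cook", ";", "apple", "gained", "."]
tags   = ["B-ORG", "O", "B-PER", "I-PER", "O", "B-ORG", "O", "O"]
print(dedupe(decode_bio(tokens, tags)))
# -> [('Apple', 'ORG'), ('Tim Cook', 'PER')]
```

A real deployment would also need to merge BERT's WordPiece sub-tokens back into whole words before decoding; that step is omitted here for brevity.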
After A/B testing, the customer was satisfied with the solution and reported that the analysts praised it for significantly shortening news-related decision-making while reducing their workload through automation. The customer extended the contract for further cooperation on document concept extraction.