The fintech industry leads the adoption of cutting-edge technologies. As the volume, number of sources, and diversity of financial news grow, it becomes much harder to separate the signal from the noise hidden in the documents. Our customer requested a service that processes large amounts of text from many news sources to deliver insights faster than before.

The NLP component of the final solution extracts companies, people, and countries from each text in the customer's dataset, enabling any incoming article to be filtered based on the retrieved named entities.

Python / Transformers / NLP / Deep Learning / TensorFlow / PyTorch / spaCy / gensim / BERT / Sanic / REST / AWS / SageMaker
Daily news drives financial markets and directly influences the prices of financial instruments (equity and debt), along with derivatives, which are even more susceptible to news. The number of online information sources, and the number of articles they generate, grows by the day. An analyst (or trader) typically has to find and screen asset-related articles manually to stay up to date on the state of affairs in a particular market (e.g. publicly traded companies, commodities, or cryptocurrencies).

Portfolio analysts are forced to process a lot of irrelevant or repeated news for each asset they hold an interest in. Together with our client, we decided to start with the extraction of person names, company names, and countries from the body of each text. The resulting keywords for each type of named entity are used to filter all the texts and keep only the ones relevant to a given analyst (trader).
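The filtering step described above can be illustrated with a minimal sketch. The data shapes, field names, and watchlist structure below are hypothetical, not the production code: each article carries the entity keywords already extracted for it, and an analyst's watchlist selects only the articles that mention at least one entity of interest.

```python
# Hypothetical sketch of entity-based filtering (not the production code).

def relevant_articles(articles, watchlist):
    """Keep only articles whose extracted entities intersect the watchlist.

    articles  -- list of dicts: {"title": str, "entities": {type: set of names}}
    watchlist -- dict mapping entity types ("company", "person", "country")
                 to sets of keywords the analyst cares about
    """
    selected = []
    for article in articles:
        for entity_type, keywords in watchlist.items():
            if article["entities"].get(entity_type, set()) & keywords:
                selected.append(article)
                break  # one matching entity is enough to keep the article
    return selected

articles = [
    {"title": "Chip maker beats estimates",
     "entities": {"company": {"NVIDIA"}, "country": {"Taiwan"}}},
    {"title": "Local election results",
     "entities": {"person": {"J. Smith"}, "country": {"France"}}},
]
watchlist = {"company": {"NVIDIA", "AMD"}, "country": {"Japan"}}
print([a["title"] for a in relevant_articles(articles, watchlist)])
# → ['Chip maker beats estimates']
```

Set intersection keeps the per-article check cheap, so the same watchlist can be applied to a large incoming stream of annotated articles.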

We settled on BERT (context-dependent) text embeddings and deep learning models that take those embeddings as inputs. The trained model finds all the entities (companies/countries/people), which are deduplicated and returned as keywords. Although the model was trained only on a labelled English dataset, it achieved limited success on entity extraction in other languages, especially languages semantically close to English, thanks to the multilingual capabilities of Google's Multilingual BERT embeddings. A microservice built on Sanic ensured high performance and rapid responses through asynchronous request processing, and it scaled well on the AWS cloud.
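The step from per-token model predictions to deduplicated keywords can be sketched as follows. This is a simplified illustration, assuming the common BIO tagging scheme for token classification; the tokens and tags below are made up, not real model output, and the production pipeline may differ.

```python
# Hedged sketch: turning per-token BIO predictions (as a token-classification
# model over BERT embeddings might produce) into deduplicated entity keywords.
# Example tokens/tags are illustrative, not real model output.

def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):  # a new entity begins
            if current:
                spans.append((current_type, " ".join(current)))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current.append(token)  # continuation of the current entity
        else:  # "O" tag or inconsistent continuation: close any open span
            if current:
                spans.append((current_type, " ".join(current)))
            current, current_type = [], None
    if current:
        spans.append((current_type, " ".join(current)))
    return spans

def dedupe(spans):
    """Case-insensitive deduplication, keeping the first surface form per type."""
    seen, keywords = set(), {}
    for entity_type, text in spans:
        key = (entity_type, text.lower())
        if key not in seen:
            seen.add(key)
            keywords.setdefault(entity_type, []).append(text)
    return keywords

tokens = ["Apple", "hired", "Tim", "Cook", ";", "apple", "grew", "in", "Ireland"]
tags   = ["B-ORG", "O", "B-PER", "I-PER", "O", "B-ORG", "O", "O", "B-LOC"]
print(dedupe(decode_bio(tokens, tags)))
# → {'ORG': ['Apple'], 'PER': ['Tim Cook'], 'LOC': ['Ireland']}
```

The case-insensitive key in `dedupe` is one simple way to collapse repeated mentions; a production system would likely add richer normalisation (aliases, tickers, transliterations).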
After A/B testing, the customer was satisfied with the developed solution and reported that the analysts praised it for significantly reducing the time spent on news-related decision-making while cutting their workload through automation. The customer extended the contract for further cooperation on document concept extraction.
© 2020 Neurons Lab LTD. Registered in England and Wales
Company Number 12265479