How New AWS AI/ML and Big Data Offerings Help You Build a Better Business (and Kingdom) – Part 2

Where We Left Off In Part One
COVID-19 changed what customers demand from AI/ML and Big Data services – it’s no longer just about solving business and scientific problems but about doing so in a robust and reliable way.
As discussed in part one of this blog, Amazon Web Services (AWS) greatly enhanced Amazon SageMaker offerings and expanded them with new ones. These enhancements allow products of any size to benefit from the cloud throughout their entire lifecycle, from initial vision to post-release evolution.
If you’re only at the beginning of this adventure but want to take advantage of AI/ML technologies without spending years mastering Data Science (only to get confused by all those confusion matrices), AWS has a special gift for you – Managed AI/ML services, our topic in part two.
AWS-Managed AI/ML And All Its Friends
If you don’t have a Machine Learning guru on your team, terabytes of historical data in-house, or the budget to acquire these things, it’s not game over. With Managed AI/ML from AWS, you can start getting value from Data Science right now (and not in 5 years, while trying to obtain the resources mentioned above), which is especially crucial for small and medium businesses (SMBs).
If you’re a large enterprise, don’t ignore these services either. They will benefit you by enabling you to learn, design, and release new products and features faster. Simple but powerful, they surely deserve a place in your cloud journey, and now AWS gives you even more reasons to use them.
Amazon Forecast
Amazon Forecast is an advanced time-series forecasting service built upon more than 20 years of amazon.com retail experience. A few lines of code are enough to create your own model, tuned to your business’s specifics.
Starting in June 2022, Forecast lets you generate predictions for only a selected subset of your time series, significantly increasing inference performance and considerably decreasing costs.
This release also comes with an embedded model monitor, so you always have up-to-date knowledge of your predictor’s performance; if it falls below an acceptable level, you can quickly retrain it (which can be automated by connecting monitor-emitted events to Amazon EventBridge).
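As a rough sketch of how the subset-forecasting feature looks in code, the helper below builds a CreateForecast request whose `TimeSeriesSelector` points at an S3 file listing the items to predict. The ARNs, bucket path, and schema here are hypothetical placeholders, and the parameter names reflect the boto3 Forecast API as we understand it – verify them against the current documentation.

```python
# Minimal sketch: restrict inference to a chosen subset of time series.
# All ARNs and paths below are hypothetical placeholders.

def build_create_forecast_request(predictor_arn: str,
                                  subset_s3_path: str,
                                  role_arn: str) -> dict:
    """Build a CreateForecast request that limits prediction to the
    time series listed (one item_id per row) in a CSV file on S3."""
    return {
        "ForecastName": "demand-forecast-subset",
        "PredictorArn": predictor_arn,
        "TimeSeriesSelector": {
            "TimeSeriesIdentifiers": {
                "DataSource": {
                    "S3Config": {"Path": subset_s3_path, "RoleArn": role_arn}
                },
                "Schema": {
                    "Attributes": [
                        {"AttributeName": "item_id", "AttributeType": "string"}
                    ]
                },
                "Format": "CSV",
            }
        },
    }

request = build_create_forecast_request(
    "arn:aws:forecast:us-east-1:123456789012:predictor/demo",
    "s3://my-bucket/selected-items.csv",
    "arn:aws:iam::123456789012:role/ForecastS3Access",
)
# With AWS credentials configured, the call itself would be:
# import boto3
# boto3.client("forecast").create_forecast(**request)
```

Because only the listed items are scored, you pay for (and wait on) exactly the forecasts you need.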
Amazon Personalize
The sister of Forecast is Amazon Personalize, which can recommend virtual goods (like TV shows and series) and physical ones (like flowers) in a truly personalized fashion. When making such smart recommendations, Personalize accounts for similarities between items, their popularity, and user behavioral patterns, offering goods that meet your users’ needs and make them happy.
Last year, Personalize learned how to acquire extra knowledge about goods from their unstructured metadata (e.g., flower descriptions or TV series synopses), which people often examine before deciding whether to proceed with a recommended item.
This year, Personalize added support for six additional languages – Spanish, German, French, Portuguese, Chinese, and Japanese – on top of the initially available English. A single model can serve content in different languages (e.g., according to user preferences). One of the best examples is online marketplaces, where the text accompanying goods can even mix languages – one part, say, in English and another in Chinese.
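Once a campaign is deployed, fetching recommendations is a single runtime call. The sketch below builds the request for Personalize’s GetRecommendations operation; the campaign ARN and user ID are placeholders.

```python
# Hypothetical sketch: fetching personalized recommendations for one user.
def build_recommendation_request(campaign_arn: str,
                                 user_id: str,
                                 count: int = 10) -> dict:
    """Parameters for the personalize-runtime GetRecommendations call."""
    return {"campaignArn": campaign_arn, "userId": user_id, "numResults": count}

params = build_recommendation_request(
    "arn:aws:personalize:us-east-1:123456789012:campaign/movies",
    "user-42",
)
# With credentials configured:
# import boto3
# items = boto3.client("personalize-runtime").get_recommendations(**params)["itemList"]
```

The same call works regardless of which language a catalog item’s metadata happens to use.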
Amazon Comprehend
Another excellent AWS service for Natural Language Processing (NLP) is Amazon Comprehend. The service can extract valuable and structured insights like places, sentiments, and topics from unstructured texts in many languages (like English, Spanish, or Chinese) and for many verticals (like healthcare, retail, or finance).
For this to work, you don’t need any historical data – just provide the text you want to analyze and get highly accurate results in seconds. If you have your own dataset, you can create a custom model, still within Comprehend (and still without any extraneous code). However, for 99% of cases, the built-in models – designed by AWS experts and trained on billions of texts – are more than enough.
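“Just provide the text” really is the whole workflow. Below is a minimal sketch of the request shape shared by Comprehend’s detection operations; the sample text is made up.

```python
# Sketch: zero-training text analysis with Comprehend.
# The sample sentence is a placeholder.
TEXT = "Neurons Lab opened a new office in London last spring."

def analysis_request(text: str, lang: str = "en") -> dict:
    """Request body shared by DetectSentiment / DetectEntities and friends."""
    return {"Text": text, "LanguageCode": lang}

# With credentials configured:
# import boto3
# comprehend = boto3.client("comprehend")
# comprehend.detect_sentiment(**analysis_request(TEXT))["Sentiment"]
# comprehend.detect_entities(**analysis_request(TEXT))["Entities"]  # places, dates, orgs
```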
Neurons Lab used Comprehend and Comprehend Medical to help pareIT build its system for automated medical record analysis. Without any human intervention, this system accurately organizes legal and medical case files, eliminates redundancies within them, and unveils hidden connections between them, so doctors and practices can swim (not sink) in the cold waters of healthcare regulations and compliance.
COVID-19 has clearly shown how critical security and privacy are when dealing with user data, especially remotely. To make your life easier, Comprehend has learned to detect 14 new types of Personally Identifiable Information (PII), including (but not limited to) UK Unique Taxpayer Reference Numbers, Canada Health Numbers, and Vehicle Identification Numbers. Along with the entities already supported, Comprehend now recognizes 36 PII types and can mask them all in the data you collect, so that no user or regulator will ever be able to say they can’t trust you.
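Comprehend’s DetectPiiEntities returns each finding with a `Type` plus `BeginOffset`/`EndOffset` character positions, which makes local masking straightforward. The sketch below applies such spans to a text; the sample entity list mimics the response shape rather than coming from a real call.

```python
# Sketch: masking PII spans the way Comprehend reports them.
def redact(text: str, entities: list, mask: str = "*") -> str:
    """Replace each detected PII span with mask characters of equal length."""
    chars = list(text)
    for e in entities:
        for i in range(e["BeginOffset"], e["EndOffset"]):
            chars[i] = mask
    return "".join(chars)

text = "VIN 1HGCM82633A004352 registered to John."
# Hand-written stand-in for a DetectPiiEntities response:
fake_entities = [
    {"Type": "VEHICLE_IDENTIFICATION_NUMBER", "BeginOffset": 4, "EndOffset": 21}
]
print(redact(text, fake_entities))
# In production the entities would come from:
# boto3.client("comprehend").detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
```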
Amazon Textract
Despite the great innovations we see daily, the world is still paper-centric. In the US alone, paper consumption over the last 20 years has increased from 92 million tons to 208 million (+126%). To use data in AI/ML (either in training or inference), you often need to first extract it from various scans, copies, and even photos of paper documents. Instead of doing this manually (and burning through your last two neurons remaining after the pandemic), it’s better to automate this process. This is why Amazon Textract exists.
In contrast to traditional OCR tools, Textract extracts from images not just raw text but also structured data like tables (as JSON arrays) and forms (as JSON objects), reducing the need for custom post-processing code. Even so, not all forms are easy to process – some fields may be marked with titles, some with numbers expanded on a separate page, and others with free-form remarks.
To avoid tons of boilerplate code to structure the extracted text, you can literally ask Textract (in natural language) for what you need using the new Queries feature. Imagine you ask, “What is the patient’s name?” – even if the field is titled not “patient name” but “#11, for more details, see page 61”, Textract can get the data you’re looking for by applying various NLP and NLU (Natural Language Understanding) techniques. It does what humans do – finds the requested field (by name or meaning), follows the reference to the right page, and reconciles the answer with the field’s number.
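In code, a query is just a natural-language string attached to an AnalyzeDocument request. The sketch below builds such a request; the bucket and object names are placeholders, and the parameter names follow the boto3 Textract API as we understand it.

```python
# Sketch of an AnalyzeDocument request using the Queries feature.
def build_query_request(bucket: str, key: str, questions: list) -> dict:
    """Ask Textract natural-language questions about a document in S3."""
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {"Queries": [{"Text": q} for q in questions]},
    }

req = build_query_request(
    "claims-docs", "intake-form.pdf", ["What is the patient's name?"]
)
# With credentials configured:
# import boto3
# resp = boto3.client("textract").analyze_document(**req)
# Answers arrive as QUERY_RESULT blocks linked to each QUERY block.
```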
In addition, for many nontrivial but widely used forms (like CMS healthcare, IRS tax, and ACORD insurance forms), Textract can now extract all such fields automatically, without requiring any historical data for training or asking too many questions.
Amazon Transcribe
Speech has become another source of business-critical data as we use voice assistants more and more in our daily lives (“Alexa, when does my vacation really start?”). To transform speech into text (STT) for further analysis, it’s best to use Amazon Transcribe (or its HCLS variant, Transcribe Medical).
Transcribe’s most remarkable enhancement in recent months is the ability to handle dialogues and monologues in which two or more languages are used at once. There are many scenarios where this wins the game. For example, during United Nations debates, a state representative may speak a language other than the six official ones (as long as a translation into a working language – English or French – is provided), and different participants may each use a different language (with simultaneous interpretation when they don’t share one). To preserve what’s said as accurately as possible, the primary transcript must express representatives’ speech in the languages they actually used (only after many thorough reviews and refinements is it translated into the six official languages for the public’s convenience).
Another example that readily comes to mind is an appointment between an English-speaking doctor and a non-English-speaking patient communicating via, this time, a consecutive interpreter (either human or computer-based). As the resulting medical notes should be as precise as possible (to avoid the rejection of insurance claims, which leads to huge fines and penalties), it’s better to analyze both the original and the translated verbiage. This is where Transcribe will help you (without demanding more than a few clicks), as it helped Neurons Lab and Creative Practice Solutions build a medical billing coding app that applies the power of AI/ML to automatically transform unstructured appointment recordings into structured, ready-to-claim notes.
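Enabling multi-language transcription is a single flag on the job request. The sketch below builds such a request; the job name, media URI, and candidate languages are placeholders, and the parameter names follow the boto3 Transcribe API as we understand it.

```python
# Sketch: starting a transcription job that identifies multiple spoken
# languages within the same recording.
def build_multilang_job(job_name: str, media_uri: str, candidates: list) -> dict:
    """Parameters for StartTranscriptionJob with multi-language identification."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "IdentifyMultipleLanguages": True,
        "LanguageOptions": candidates,  # candidate languages to choose among
    }

job = build_multilang_job(
    "appointment-2022-06-01",
    "s3://clinic-audio/visit.wav",
    ["en-US", "es-US"],
)
# With credentials configured:
# import boto3
# boto3.client("transcribe").start_transcription_job(**job)
```

The resulting transcript labels each segment with the language Transcribe detected, so the original and interpreted speech can be analyzed separately.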
Amazon Lex
Where the water meets the land, there is shifting sand; where NLP meets STT, there is a chatbot in the AWS land. Amazon Lex can enrich your apps with conversational interfaces, so your users spend less time on routine and more time on what matters most.
Lex has been enhanced as well. Specifically, it now lets you customize speech recognition and synthesis by importing a custom vocabulary (e.g., terms specific to your field, like jurisprudence), which greatly raises recognition accuracy and, therefore, user experience.
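A custom vocabulary is essentially a small tab-separated file of phrases and boost weights. The sketch below composes such a file in Python; the column names (phrase, weight, displayAs) are our understanding of the Lex V2 custom-vocabulary format and should be verified against the current documentation, and the legal terms are placeholders.

```python
# Sketch: composing a Lex custom-vocabulary TSV (assumed column layout).
rows = [
    # (phrase, weight, displayAs) – placeholders from a legal domain
    ("habeas corpus", 3, "habeas corpus"),
    ("estoppel", 2, "estoppel"),
]
tsv = "phrase\tweight\tdisplayAs\n" + "\n".join(
    f"{phrase}\t{weight}\t{display}" for phrase, weight, display in rows
)
print(tsv)
# The file is then zipped and imported into the bot's locale
# (via the Lex console or the StartImport API).
```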
Still don’t have the time to design a chatbot yourself? That’s no longer a problem thanks to another new feature, the Automated Chatbot Designer. Feed it existing conversation transcripts (e.g., from your call center), and it will automatically propose the conversation design – intents, slot types, and flows – which you only have to connect to your business logic before going live.
[New] Amazon CodeWhisperer
Last but not least – have you ever imagined using AI/ML to build better… AI/ML?
This dream comes true with the new Amazon CodeWhisperer, an ML-powered coding assistant. CodeWhisperer aims to significantly improve developers’ productivity and the quality of their deliverables by analyzing in real time the code they’re typing (and even natural-language comments) and suggesting the AWS services, open-source libraries, and modern language features (as well as how to bake it all together) best suited to the task at hand.
CodeWhisperer fully supports Python, JavaScript, and Java (more is coming) in various IDEs such as JetBrains, Visual Studio Code, and AWS Cloud9 (including AWS Lambda console editor). Combine it with Amazon CodeGuru, which applies AI/ML for static and dynamic code analysis, and what you write will be perfect. Like it’s done by someone else, not just a leather bag full of meat.
Both CodeWhisperer and CodeGuru are trained on myriad open-source and AWS-proprietary repositories, so they don’t require you to download the entire GitHub to work efficiently. However, the more they observe your codebase’s evolution, the more relevant their recommendations become.
Stay Tuned
Up to this point, we’ve discussed many novelties and upgrades in AWS services, making your Data Science path a cakewalk free of common AI/ML challenges.
Still, you may be wondering – why does this series title mention both AI/ML and Big Data while we haven’t talked much about the latter? Because that’s what the final, third episode will be about – how, in 2022, you can build Big Data architecture more quickly, easily, and safely in order to get more value from your Data Science practice.
Want more?
While waiting for part three of this series, contact Neurons Lab to discuss the value that AWS and AI/ML can provide for your business. No one can help you better than Neurons Lab – an AWS Advanced Tier Services Partner with 3 AWS Service Validations and 20+ successful AWS AI/ML Customer Launches in many verticals such as EnergyTech, ShipTech, and HealthTech.