
Transforming Telco: AI in Telecommunications
Based on our previous work with telcos and our research, we have identified many impactful AI-led use cases.
In this article, we will explore advanced attack techniques against large language models (LLMs) in detail, then explain how to mitigate these risks using external guardrails and safety procedures.
This is Part 2 in our guide on understanding and preventing LLM attacks. If you haven’t seen it yet, take a look at Part 1 of this article – there we outline the main attack types on LLMs and share some key overarching principles for defending AI models against such threats.
Safety measures have a vital role in protecting AI models against threats – just as any type of computer software, system, or network requires cybersecurity.
Following on from a brief description of prompt injection in Part 1, here is a deep dive into what it involves. Prompt injection is a type of vulnerability that targets LLMs.
This technique involves creating specially designed inputs to manipulate the AI model’s behavior, aiming to bypass its security controls and cause it to produce unintended outputs.
Prompt injections try to exploit the way LLMs process and generate text based on inputs. By inserting carefully worded text, without enough safeguards in place, an attacker could trick the model into:
The attack takes advantage of the inherent trust these models place in their inputs. The diagram below is a visualization from an AWS re:Inforce presentation covering Cloud security, demonstrating the principles behind a prompt injection:
Building on this, there are several other types of prompt-based attacks including:
Jailbreaking methods often involve complex prompts or sneaking in hidden instructions to try confusing an LLM, exploiting weaknesses in how it understands and processes language.
By using clever phrasing or unconventional requests, users attempt to lead the model to provide outputs that developers programmed it to avoid.
Let’s look at a few examples, again with the help of visualizations from AWS re:Inforce. This is a simple example showing how attackers could try using affirmative instructions:
The next example involves an adversarial suffix:
One of the most well-known prompts is ‘do anything now’ or DAN, as visualized in a recent research paper by Xinyue Shen et al. The full DAN prompt is omitted in the diagram below:
Research from Anthropic Claude has also explored many-shot jailbreaking, involving faux dialogue. In this article, Anthropic also explains steps the company has taken to prevent such attacks:
It’s important to note that jailbreaking attempts are constantly evolving, and LLM developers are working continuously to keep their models protected against such manipulations.
Guardrails introduce safeguards to block such attacks and implement responsible AI principles. They provide safety controls and filter out content that is harmful or against policies, customized to bespoke applications.
Here are some illustrative examples from an AWS Machine Learning blog article by Harel Gal et al, showing how such guardrails work after the initial user input.
Example 1:
Example 2:
Among the best safeguards in the industry feature in Amazon Bedrock Guardrails. Building on native protections already in foundation models, these guardrails are capable of blocking the vast majority of more harmful content.
Amazon Bedrock Guardrails can also filter at least 75% of AI hallucinated responses for retrieval-augmented generation (RAG) workloads. AI hallucinations are instances where LLMs generate and present false, misleading, or inaccurate information as if it were factual, producing outputs that are not based on their training data.
Other benefits of using Amazon Bedrock Guardrails include abilities to:
There are other guardrail options available to use in addition to those included in Amazon Bedrock, further increasing the level of protection against attacks:
They include, but are not limited to:
Neurons Lab ensures privacy and security for clients in highly regulated industries such as banking which require the utmost diligence.
Clients retain full ownership of proprietary cloud environments or can choose on-premises deployment, providing complete control within your secure infrastructure.
For more details, find out how we enhanced compliance and data management in financial marketing with an LLM constructor for Visa:
Our locally deployed LLMs operate entirely within clients’ ecosystems, eliminating the risk of data breaches or unauthorized access. All data stays within the client’s controlled environment, guaranteeing compliance with data protection regulations and internal security policies.
With models running locally, there is no need for external data transfers, ensuring that proprietary information is never exposed to third-party systems. Bespoke security protocols tailored to specific requirements increase the protection of sensitive information.
Some of the main attack types against LLMs include model extraction and evasion, prompt injection, data poisoning, inference, and exploiting supply chain vulnerabilities.
Protecting your AI model against such attacks requires implementing responsible principles and safety mechanisms, such as external guardrails.
Here are several recommendations for preventing and mitigating attacks against LLMs:
For more recommendations, read our tips for effective risk management in AI delivery.
Neurons Lab delivers AI transformation services to guide enterprises into the new era of AI. Our approach covers the complete AI spectrum, combining leadership alignment with technology integration to deliver measurable outcomes.
As an AWS Advanced Partner and GenAI competency holder, we have successfully delivered tailored AI solutions to over 100 clients, including Fortune 500 companies and governmental organizations.
Based on our previous work with telcos and our research, we have identified many impactful AI-led use cases.
We cover some of the most common potential types of attacks on LLMs, explaining how to mitigate the risks with security measures and safety-first principles.
The recently released SWARM framework offers a simple yet powerful solution for creating an agent orchestration layer. Here is a telco industry example.
Traditional chatbots don't work due to their factual inconsistency and basic conversational skills. Our co-founder and CTO Alex Honchar explains how we use AI agent architecture instead.
The AWS Public Sector Program (PSP) recognizes partners serving government, healthcare, education, space, and non-profit customers.