Key Principles for Secure AI: Understanding and Mitigating LLM Attacks – Part 1 Generative AI

Nov 19, 2024|8 minutes
worm's-eye view of skyscrapers, secure ai
SHARE THE ARTICLE

Written by Artem Kobrin, AWS Ambassador and Head of Cloud at Neurons Lab

 

Security in large language models (LLMs) is crucial and early adoption of the right safety measures is important to protect AI models, as it is with any other advanced form of technology.

The signs are there that while many employees are already embracing the benefits that generative AI can bring, some of the companies they work for are more hesitant and cautious, waiting a little longer before making their move.

A recent McKinsey Global Survey found that employees are far ahead of their companies in using GenAI overall, with only 13% of businesses falling into the early adopter category. Of these early adopters, nearly half of their staff – 43% – are heavy users of GenAI.

Moreover, the IBM Institute for Business Value found that 64% of CEOs are receiving significant pressure from investors to accelerate their adoption of GenAI. But most – 60% – are not yet implementing a consistent, enterprise-wide approach to achieve this. 

Why? One key reason appears to be due to considerations around ensuring security, with more IBM data showing that 84% are worried about the risks of any GenAI-related cybersecurity attacks.

In this article, Part 1, we will cover some of the most common potential types of attacks on LLMs, then explain how to mitigate the risks with security measures and safety-first principles. 

In the next article, Part 2 – coming soon – we will explore advanced attack techniques against LLMs in detail, then run through how to mitigate these risks using external guardrails and safety procedures.

 

Main attack types on LLMs

Research from Security Intelligence has evaluated the potential impact of six main attack types on GenAI, alongside how challenging it is for attackers to execute them:

graph showing different impact and difficulty of attacks on LLMs, secure AI

  • Prompt injection: The user manipulates an LLM with malicious inputs capable of ‘tricking’ it and overriding guardrails that the developer put in place. This is similar to ‘jailbreaking’ and most LLMs have measures in place to anticipate and prevent this relatively basic attack.
  • Model extraction: Through extensive queries and monitoring the outputs, the user tries to steal the model’s behavior and IP. This is a very difficult attack to execute, requiring vast knowledge and resources.
  • Data poisoning: This is the tampering of data involved in training an AI model to change its behavior or insert vulnerabilities. It is easier to achieve with open-source models but with closed data sets, only a malicious insider or person with the right level of access could execute this attack.
  • Inference: The user aims to infer sensitive information based on partial information from training data. A relatively difficult attack to execute, prevention measures include using differential privacy and adversarial training.
  • Supply chain: These attacks target other software and technologies connected to the AI model that potentially do not have the same level of protection as the LLM. A supply chain attack requires significant, sophisticated knowledge of the connected architecture. 
  • Model evasion: The user aims to deceive the LLM by modifying inputs in a way that the AI model could misinterpret them and then behave illogically.

There are more types of attack to prepare for though.

 

OWASP Top 10 examples

For a more comprehensive list of threats facing LLMs, the Open Web Application Security Project (OWASP) is a valuable resource. OWASP provides guidelines to help risk management and data security experts prepare for a wide range of different potential technological vulnerabilities.

OWASP’s Top 10 is particularly useful as it provides a hierarchy of the biggest security risks, updated every few years to capture any growing threats.

In this table, cybersecurity experts Lasso detail the OWASP Top 10 potential vulnerabilities for LLMs and different ways to prevent them:

Risk Description Example How to Prevent
Prompt Injection Malicious inputs causing unintended outputs. A disguised command in a chatbot revealing sensitive info. Input validation, strict input guidelines, context-aware filtering.
Insecure Output Handling Failing to sanitize or validate outputs, leading to attacks. An LLM-generated script causing an XSS attack. Sanitize outputs, encode and escape outputs.
Training Data Poisoning Manipulation of training data to skew model behavior. Biased data causing a financial LLM to recommend poor investments. Validate and clean data, detect anomalies.
Model Denial of Service Overwhelming the LLM with requests causing slowdowns or unavailability. Resource-intensive queries overloading the system. Rate limiting, monitor queues, optimize performance, scaling strategies.
Supply Chain Vulnerabilities Compromised third-party services or resources affecting the model. Tampered data from a third-party provider manipulating outputs. Security audits, monitor behavior, trusted supply chain.
Sensitive Information Disclosure LLMs revealing confidential data from training materials. Responses containing PII due to overfitting on the training data. Anonymize data, enforce access control.
Insecure Plugin Design Vulnerabilities in plugins or extensions. A third-party plugin causing SQL injections. Security reviews, follow coding standards.
Excessive Agency LLMs making uncontrolled decisions. An LLM customer service tool making unauthorized refunds. Limit LLM decisions, human oversight.
Overreliance Overreliance on LLMs for critical decisions without human oversight. An LLM making errors in customer service decisions without review. Human review of LLM outputs, supplemented with data inputs.
Model Theft Unauthorized access to proprietary LLM models. A competitor downloading and using an LLM illicitly. Authentication, encrypt data, access control.

Proactive measures that can help to secure LLMs include regularly auditing user activity logs, with strong access controls and authentication procedures also in place.

Developers need to fine-tune bespoke LLMs for improved accuracy and security before launch. They also need to put measures in place to clean and validate input data before it enters the LLM’s system.

 

Key principles – defending against LLM attacks

In an AWS Machine Learning blog post, Harel Gal et al outline how AI model producers and the companies that leverage them must work together to ensure that appropriate safety mechanisms are in place.

Responsibility lies with AI model producers to:

  • Pre-process data: Clean any open-source data before using it to train an LLM base model
  • Align the model to security values: In the next section, we’ll run through how we ensure our AI solutions adhere to our standards around data privacy, safety, IP, and veracity.

Companies deploying LLMs also have a responsibility to ensure that they or their provider have put key security measures in place.

In addition to creating blueprint system prompt template inputs and outputs, these measures include:

  • Specifying the tone
  • Fine-tuning the model
  • Adding external guardrails

We will share more details on these approaches shortly.

Applying these safety mechanisms creates layers of security for LLMs:

diagram, secure AI

AWS Security Reference Architecture and guardrails: Overview

Amazon Web Services (AWS) has a wide range of security-related applications well suited to business infrastructure. For context, the below diagram shows the AWS Security Reference Architecture (SRA).

There are three tiers in the SRA workload:

  1. Web tier: Users connect and interact through this interface part of the architecture
  2. Application tier: This handles user inputs, analyzes them, and creates the outputs
  3. Data tier: The application tier stores and retrieves information from this part of the architecture

From the below diagram, selected highlights include:

AWS security reference architecture, secure AI

We’ll explore several external guardrails for GenAI applications comprehensively in Part 2 of this article, summarizing key insights from the aforementioned AWS Machine Learning blog post from Harel Gal et al.

These include Amazon Bedrock and Comprehend:

Implementation Option Ease of Use Guardrail Coverage Latency
Amazon Bedrock Guardrails No code Denied topics, harmful and toxic content, PII detection, prompt attacks, regex and word filters Less than a second
Keywords and Patterns Approach Python based Custom patterns Less than 100 milliseconds
Amazon Comprehend No code Toxicity, intent, PII Less than a second
NVIDIA NeMo Python based Jailbreak, topic, moderation More than a second

 

How we ensure security in AI projects

Here at Neurons Lab we follow a comprehensive security framework that covers all bases for successful and secure AI projects. Informed by clients’ priorities around safety, these are some of the highlights:

LLM safety priorities, secure AI

Veracity

  • RAG and Agentic Architectures empowered by Knowledge Graphs
  • Customizing AI models via prompting, fine-tuning and training from scratch

Toxicity & safety

  • Built-in Guardrails in Amazon Bedrock & Amazon Q
  • Model evaluation in Amazon Bedrock & SageMaker Clarify

Intellectual property

  • Uncapped IP indemnity coverage for Amazon Titan
  • Copyright indemnity for Anthropic models in Amazon Bedrock

Data privacy & security

  • Your data is not used to train the base models

  • VPC access to the GenAI services
and custom fine-tuned models
  • Customer-managed keys to encrypt your data

In the next article – Part 2, coming soon – we will explore advanced attack techniques against LLMs in detail, then explain how to mitigate these risks using external guardrails and safety procedures.

 

About Neurons Lab

Neurons Lab is an AI consultancy that provides end-to-end services – from identifying high-impact AI applications to integrating and scaling the technology. We empower companies to capitalize on AI’s capabilities. 

As an AWS Advanced Partner, our global team comprises data scientists, subject matter experts, and cloud specialists supported by an extensive talent pool of 500 experts. This expertise allows us to solve the most complex AI challenges, mobilizing and delivering with outstanding speed while supporting both urgent priorities and strategic long-term needs.

Ready to leverage AI for your business? Get in touch with the team here.

 

References

OWASP Top Ten; OWASP, September 2024

Gen AI’s next inflection point; McKinsey, August 2024

Build safe and responsible generative AI applications with guardrails; AWS, June 2024

Claude 3 Model Family Card; Anthropic, March 2024

Mapping attacks on generative AI to business impact; Security Intelligence, January 2024

The CEO’s guide to generative AI; IBM, 2023

When it comes to cybersecurity, fight fire with fire, IBM, 2023

Get in touch Leverage the power of AI for your business
SHARE THE ARTICLE
Enhancing Smart Contract Development for Financial Services on Polkadot with an Agentic AI Copilot
dAppForge: Enhancing Smart Contract Development for Financial Services on Polkadot with an Agentic AI Copilot
Automating Business Operations with Generative AI in Cybersecurity
Peak Defence: Automating Business Operations with Generative AI in Cybersecurity
Building an innovative voice and text-based cybersecurity chatbot for Xauen
Xauen: Building an innovative voice and text-based cybersecurity chatbot for Xauen