The National Cyber Security Centre provides details on prompt injection and data poisoning attacks so organizations using machine-learning models can mitigate the risks.
Large language models used in artificial intelligence, such as ChatGPT or Google Bard, are prone to different cybersecurity attacks, in particular prompt injection and data poisoning. The U.K.’s National Cyber Security Centre published information and advice on how businesses can protect against these two threats to AI models when developing or implementing machine-learning models.
What are prompt injection attacks?
AIs are trained not to provide offensive or harmful content, unethical answers or confidential information; prompt injection attacks are crafted inputs that make the model produce those unintended behaviors anyway.
Prompt injection attacks work in a similar way to SQL injection attacks, in which an attacker manipulates text input to execute unintended queries on a database.
Several examples of prompt injection attacks have been published on the internet. A less dangerous prompt injection attack gets the AI to produce unethical content such as bad or rude language, but the same technique can also be used to bypass filters and create harmful content such as malware code.
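The comparison with SQL injection can be made concrete with a short sketch. The table, the function names and the prompt text below are hypothetical and only illustrate the shared root cause: trusted instructions and untrusted input end up in the same string. Note one important difference: SQL has parameterized queries that keep data separate from code, whereas no equivalent failsafe separation exists today for LLM prompts.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the user's text is spliced into the query string itself,
    # so quotes in the input can change the structure of the query.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver treats the input strictly as data.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


def build_prompt(user_text: str) -> str:
    # The LLM analogue of lookup_unsafe: instructions and user text share
    # one string, and the model has no reliable way to tell them apart.
    return "You are a helpful assistant. Never reveal secrets.\nUser: " + user_text


conn = build_db()
payload = "nobody' OR '1'='1"

leaked = lookup_unsafe(conn, payload)   # returns every user's secret
blocked = lookup_safe(conn, payload)    # returns an empty list
```

The unsafe lookup returns every row because the payload rewrites the WHERE clause, while the parameterized lookup finds no user with that literal name.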
But prompt injection attacks may also target the inner workings of the AI and trigger vulnerabilities in its infrastructure itself. One example of such an attack has been reported by Rich Harang, principal security architect at NVIDIA. Harang discovered that plug-ins included in the LangChain library used by many AIs were prone to prompt injection attacks that could execute code inside the system. As a proof of concept, he produced a prompt that made the system reveal the content of its /etc/shadow file, which stores user names and password hashes on Linux systems and might allow an attacker to learn all user names on the system and possibly gain access to more parts of it. Harang also showed how to introduce SQL queries via the prompt. The vulnerabilities have been fixed.
Another example is a vulnerability that targeted MathGPT, which works by converting the user’s natural language into Python code that is then executed. A malicious user produced code that gained access to the application host system’s environment variables and the application’s GPT-3 API key, and carried out a denial-of-service attack.
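One way to reduce the risk of running model-generated code is to avoid executing it as general Python at all. The sketch below is not how MathGPT works or was fixed; it is a hypothetical illustration that accepts only plain arithmetic by walking the parsed syntax tree and rejecting everything else, including function calls, attribute access and names. The `safe_eval` helper and its limits are assumptions made for this example.

```python
import ast
import operator

# Only these operators are permitted. Exponentiation is left out on purpose,
# because expressions such as 9**9**9 can be used to exhaust CPU and memory.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

MAX_EXPRESSION_LENGTH = 200  # arbitrary cap chosen for this sketch


def _eval_node(node):
    """Evaluate one syntax-tree node, refusing anything not on the allowlist."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"disallowed expression element: {type(node).__name__}")


def safe_eval(expression: str):
    """Evaluate a plain arithmetic expression without exec() or eval()."""
    if len(expression) > MAX_EXPRESSION_LENGTH:
        raise ValueError("expression too long")
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


result = safe_eval("2 * (3 + 4)")  # 14
```

An input such as `__import__('os').environ` parses as a function call and attribute access, so it raises `ValueError` instead of reaching the host's environment variables.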
NCSC concluded about prompt injection: “As LLMs are increasingly used to pass data to third-party applications and services, the risks from malicious prompt injection will grow. At present, there are no failsafe security measures that will remove this risk. Consider your system architecture carefully and take care before introducing an LLM into a high-risk system.”
What are data poisoning attacks?
Data poisoning attacks consist of altering data from any source that is used to train a machine-learning model. These attacks are possible because large machine-learning models need so much training data that the usual process consists of scraping a huge part of the internet, which will almost certainly contain offensive, inaccurate or controversial content.
Researchers from Google, NVIDIA, Robust Intelligence and ETH Zurich published research showing two data poisoning attacks. The first one, split-view data poisoning, takes advantage of the fact that data changes constantly on the internet: There is no guarantee that a website’s content collected six months ago is still the same. When a dataset is distributed as an index of URLs rather than as the content itself, an attacker who buys an expired domain that the index still references controls what later downloads receive. The researchers state that domain name expiration is exceptionally common in large datasets and that “the adversary does not need to know the exact time at which clients will download the resource in the future: by owning the domain, the adversary guarantees that any future download will collect poisoned data.”
The second attack revealed by the researchers is called a front-running attack. The researchers take the example of Wikipedia, which can be easily edited with malicious content, although such edits stay online for only a few minutes on average before being reverted. Yet in some cases, an adversary may know exactly when such a website will be accessed for inclusion in a dataset, and can time a malicious edit to land just before that snapshot is taken.
Risk mitigation for these cybersecurity attacks
If your company decides to implement an AI model, the whole system should be designed with security in mind.
Input validation and sanitization should always be implemented, and rules should be created to prevent the ML model from taking damaging actions, even when prompted to do so.
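As a minimal sketch of such rules, the example below assumes a hypothetical application in which the model can only request named actions, and every request passes through an allowlist check before anything runs. The action names, the length limit and the `validate_action` helper are all invented for illustration; a real deployment would need rules specific to its own actions and data.

```python
from dataclasses import dataclass

# Hypothetical set of low-risk actions the model is allowed to request.
ALLOWED_ACTIONS = {"search_docs", "summarize"}
MAX_ARGUMENT_LENGTH = 500  # arbitrary limit chosen for this sketch


@dataclass(frozen=True)
class ActionRequest:
    """An action the model asks the application to perform on its behalf."""
    name: str
    argument: str


class ActionRejected(Exception):
    """Raised when a model-requested action fails validation."""


def validate_action(request: ActionRequest) -> ActionRequest:
    # Rule 1: anything not explicitly allowed is refused, however the
    # model was prompted into asking for it.
    if request.name not in ALLOWED_ACTIONS:
        raise ActionRejected(f"action not allowed: {request.name!r}")
    # Rule 2: bound the size of the argument.
    if len(request.argument) > MAX_ARGUMENT_LENGTH:
        raise ActionRejected("argument too long")
    # Rule 3: refuse non-printable characters such as newlines or
    # terminal escape codes.
    if not request.argument.isprintable():
        raise ActionRejected("argument contains non-printable characters")
    return request


approved = validate_action(ActionRequest("summarize", "quarterly report"))
```

The point of the design is that the check lives outside the model: a request for `delete_files` is rejected by the application even if a prompt injection convinced the model to ask for it.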
Systems that download pretrained models for their machine-learning workflow might be at risk. The U.K.’s NCSC highlighted the use of Python’s pickle library, which is used to save and load model architectures. As stated by the organization, that library was designed for efficiency and ease of use, but is inherently insecure, as deserializing files allows the running of arbitrary code. To mitigate this risk, NCSC advised using a different serialization format such as safetensors and using a Python pickle malware scanner.
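To illustrate why unpickling is risky and what a scanner looks for, the sketch below builds a harmless pickle whose payload would call `print` when loaded, then inspects it with the standard library’s `pickletools` module without ever loading it. This is a deliberately crude illustration, not one of the scanners NCSC refers to: the opcode list and the `suspicious_opcodes` helper are assumptions, and because legitimate model files also reference importable objects, real tools compare those references against lists of known-dangerous or known-safe names.

```python
import pickle
import pickletools

# Opcodes that import objects or call them during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "REDUCE", "BUILD",
}


def suspicious_opcodes(data: bytes) -> set:
    """Return the suspicious opcode names found, without unpickling data."""
    return {
        opcode.name
        for opcode, _arg, _pos in pickletools.genops(data)
        if opcode.name in SUSPICIOUS_OPCODES
    }


class HarmlessDemo:
    """Stand-in for a malicious object: loading it would call print()."""

    def __reduce__(self):
        # An attacker would return something like os.system here instead.
        return (print, ("code ran during unpickling",))


# Serializing is safe; the callable only runs when the bytes are loaded.
payload = pickle.dumps(HarmlessDemo())
plain_data = pickle.dumps({"weights": [0.1, 0.2, 0.3]})

payload_findings = suspicious_opcodes(payload)
plain_findings = suspicious_opcodes(plain_data)
```

The payload is flagged because it contains a `REDUCE` opcode, which calls an imported object, while the dictionary of floats produces no findings.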
Most importantly, applying standard supply chain security practices is mandatory. Only known valid hashes and signatures should be trusted, and no content should come from untrusted sources. Many machine-learning workflows download packages from public repositories, yet attackers might publish packages with malicious code that runs when the package is installed or imported. Some datasets, such as CC3M, CC12M and LAION-2B-en, to name a few, now provide a SHA-256 hash of their images’ content.
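Checking a download against a published hash takes only a few lines. The sketch below uses Python’s standard `hashlib` module; the byte strings stand in for image content, and the `verify_download` helper is a hypothetical name. In practice the expected hash would come from the dataset’s own index, obtained through a trusted channel.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Accept the content only if its hash matches the published value."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


# Stand-ins for the content seen when the dataset was built and the
# content served later from the same URL.
original = b"image bytes recorded when the dataset was built"
tampered = b"different bytes served later from the same URL"

published_hash = sha256_hex(original)

original_ok = verify_download(original, published_hash)   # True
tampered_ok = verify_download(tampered, published_hash)   # False
```

A check like this detects content that changed after the hash was recorded, which is the situation split-view poisoning relies on. It does not help if the content was already poisoned when the hash was computed.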
Software should be upgraded and patched to avoid being compromised by common vulnerabilities.
Disclosure: I work for Trend Micro, but the views expressed in this article are mine.