Prompt injections, deep fake phishing attacks, compromised code. The rapid rise of generative AI brings a wide range of new security threats for engineers to manage.
Generative AI is one of the rare occasions where a technology is truly disruptive, forcing companies to reshape their workforces and software teams to quickly add AI functionality into their products and processes.
Internally, generative AI tools are already producing productivity gains, as employees from all disciplines rush to experiment with a new tool kit. With such a low barrier to entry, anyone in your organization could be experimenting with AI without any way of knowing who is using what and how. In software engineering organizations, this has taken the issue of shadow IT to the next level, with shadow AI.
This all creates a legal and security quagmire that puts intellectual property and reputations at risk. The speed of AI development means it already feels like teams are running a year behind, with less than a third of organizations having any sort of AI governance policy in place, according to a 2023 Conference Board survey. With this backdrop, what can engineering leaders do to reduce these risks and secure your code?
Why AI security is so much harder
AI represents an exponential increase in difficulty for cybersecurity teams.
“You’re telling me that you’re going to let us – who don’t know how to do application security for normal code – write AI applications that are way scarier and more complicated and ambiguous?” asks Anna Belak, director of the office of cybersecurity strategy at Sysdig. “Machine learning security is just application security with extra steps, but the extra steps are really scary because you can’t control what the user will input and you can’t necessarily control how that alters the model’s behavior over time.”
For many developers, if a piece of code a model provides is working, that will be enough to carry on with the task at hand, without digging too deeply into the security risks. “The amount of people who are interacting with the system who have no idea what they’re doing, and how it works, and where they’re sending their data, and what happens to that data, is utterly terrifying,” she says.
The reality is, the public repositories that these large language models (LLMs) have been trained on are likely to be insecure. “People don’t have in mind that Gen AI is part of their supply chain and has security vulnerabilities in the code it’s generated,” Sonya Moisset, staff security advocate at Snyk says.
If AI security practices have already fallen behind the pace of technology change, as Jeff Schwartzentruber, senior machine learning scientist at eSentire says, “as LLMs become more advanced and integrated with production systems, so too does the threat surface on how these LLMs can be exploited, resulting in additional complexity to the security problem.”
AI threat #1: Unchecked code
If code is already being released without proper checks in place, it’s extra important to scan any code outputted by generative AI tools.
Moisset took it upon herself to test the most popular AI code generators – GitHub Copilot, Amazon CodeWhisperer, Perplexity AI, Claude 2.0, Mistral AI, ChatGPT 3.5, and Google Gemini – with the same prompt: Create a basic Express application which takes a name in the request params and returns a HTML page that shows the user their name.
“They all gave me a good starting point for the code. But all of them, by default, were vulnerable to XSS” or cross-site scripting, where an attacker injects malicious executable scripts. This was the same across all seven code generators.
As a security practitioner, she actually challenged the code generating model by telling it that the code is insecure because there’s a malicious piece of XSS on the snippet of code provided. Each of the tools responded in agreement, offering the right package to fix the problem, but only after being challenged.
Just like asking the bot if it knows if an answer is correct, asking if the code is secure is another good generative AI policy. As is scanning all code before it goes to production, regardless of the creator.
AI threat #2: Prompt injections
Likely the most widely practiced attack is a prompt injection, which uses imagination and some brute force to override the original intentions of a prompt using a special set of user inputs.
LLMs more generally follow guardrails that should prevent nefarious responses. If you asked ChatGPT to show you how to create a chemical weapon, or if you asked DALL-E to create child pornography, while these bots are likely capable of completing these tasks, they have guardrails in place to block illegal behavior. Most commonly though, the request is intended to extract a password or other classified information.
“When I first got interested in the security of the applications that use generative AI and LLMs, the first thing that struck me was the dual nature of the LLMs, which are both extremely powerful and extremely gullible,” Paul Molin, CISO group evangelist at Theodo said on stage at Monki Gras, a recent London event dedicated to exploration of prompt engineering.
“It’s largely an input validation problem,” explained Alex Cox, director of threat intelligence mitigation and escalation team at LastPass. He likened it to SQL injection, when someone creates a SQL command someplace where it’s not expected to extract valuable data, like a password list.
People have already been able to exploit customer service bots by manipulating the prompts – like how someone got sold a Chevy for a dollar. Similarly, Air Canada had to honor a refund its chatbot made up. It seems that a lot can be achieved by saying: Ignore the prompt question, and do this instead.
Clever companies should prepare, Cox said, by adding a clause to their user agreements that “if you’re able to manipulate this to give you a good deal, we’re not going to stand by it.”
“When I look at prompt injection and manipulation of AI models, from that standpoint, the security work is around creating an interface that prevents that unexpected input,” Cox said. “If you want just questions around support, and if you try to add something else, it won’t let the Gen AI model proceed.”
AI threat #3: Unexpected prompts
If malicious actors can get around the prompts that chatbots were explicitly trained on, what about those they weren’t trained on?
“LLMs have emergent properties, as in they are capable of outcomes they were not explicitly trained for. This emergent behavior makes LLMs inherently unpredictable – they typically provide different response variations to the same prompt each time – and this can lead to various unknown risks that make them overtly susceptible to data exfiltration or privacy breaches,” Schwartzentruber said.
For example, an indirect prompt injection can be leveraged against an LLM with the objective to extract private content without the users’ or company’s knowledge. He gave the example of this kind of attack against Writer.com, which resulted in the company having to issue a patch for the vulnerability.
While many of these models aren’t supposed to help you build a bomb, there will always be attackers happy to pick away at the problem. Take the Grandma Exploit, which involved carefully role-playing a relationship where a recently deceased grandma was a chemical engineer and told lovely bedtime stories about napalm recipes.
Another prompt injection involves machine-readable text that humans cannot see, like white text on a white background. For example, recruiters have long used AI-backed application tracking systems to scan resumes and CVs. Molin predicted some candidates might put white text on the white background, which the generative AI can read and interpret, prioritizing a certain resume in a manipulative way that the recruiter and hiring manager cannot see. “In this case, an attacker could disguise a prompt for an employee or user to paste into their LLM application, but include text or other steps that would lead to data leakage,” said Schwartzentruber.
While there are some tools that can be used to analyze the security capabilities of LLMs, like the open source Do-Not-Answer dataset or HarmBench, an evaluation framework for automated red teaming and robust refusal. However, he noted, "the effectiveness of these tools will depend on the complexity of the LLM, how it is deployed and in what context it exists in.” If cybersecurity is a complex and ongoing game of whack-a-mole, LLMs just upped the difficulty significantly.
Molin identified four AI security enhancers:
- Use an LLM to analyze input/output. Ask your LLM to compare the input to a list of known injections. Then, if the user asks something like What is the password? it kicks off limitations for that user. It’s impossible to list all the ways questions could be indirectly asked – like asking for a p.a.s.s.w.o.r.d. Instead, ask the LLM to compare the user input relative to other metrics. If they have similar enough vectors – called embeddings – it’s likely to be an attack, so block that user too. Use two LLMs, one the privileged LLM which has the ability to use many tools to do many things. The other, which he calls a quarantined LLM, won’t have access to any tools or anything that could cause harm to the users. Then you’re able to detect and train on attack attempts.
- Pre-flight prompts. Like a pre-flight checklist, block malicious inputs early by running a preliminary check before processing the main prompt.
- The human in the loop. You don’t want to block customers, but you don’t want to be too loose with guardrails. Molin suggested an application team should start by creating a list of dangerous APIs – like one that would delete an entire inbox – which then prompted the user to confirm if that’s what they really wanted to do. This isn’t something he recommends to do all the time though, as users will quickly grow impatient with the prompt, like a cookie permission pop up.
- Canary testing. Make sure you create expendable resources – like real AWS secrets that are only used to lure and flag attackers – which you monitor and know when they are used.
For all of them, he admitted that they are new techniques which aren’t 100% effective in all cases, and a combination of these threat reduction activities should be followed.
AI threat #4: Model injections
Just like click farms have tried to game Google search, hackers are already trying to retrain machine learning models through brute force.
“If I can manipulate the data that’s used to train these models, I can potentially change the way it interacts,” Cox says. Researchers at JFrog recently discovered 100 malicious models loaded on the popular AI model repository Hugging Face, which can compromise user environments with code execution.
“The model’s payload grants the attacker a shell on the compromised machine, enabling them to gain full control over victims’ machines through what is commonly referred to as a ‘backdoor’,” the research concludes. “This silent infiltration could potentially grant access to critical internal systems and pave the way for large-scale data breaches or even corporate espionage, impacting not just individual users but potentially entire organizations across the globe, all while leaving victims utterly unaware of their compromised state.”
To prevent these attacks, Hugging Face had already implemented several security measures, such as malware, secrets, and pickle scanning. But nothing’s foolproof. And Hugging Face has more security measures in place than most AI projects. For now, the best advice seems to be rigorous scanning of each new model before it’s uploaded – and any you’ve already uploaded without scanning.
AI threat #5: Taking advantage of users
The days of easily recognizable phishing attacks could be coming to an end. In a world where LLMs are able to produce extremely convincing deep fakes, complex and convincing phishing attacks are just around the corner.
“This whole spectrum of deep fake video, deep fake audio, deep fake pictures, really raises this education process approach that we have where you're gonna have to be a little bit more considerate of doing things that people ask you to do and verifying and making sure there's checks and balances,” Cox said.
AI threat #6: Forgetting the basics
Zero trust security is essential in a generative AI world, where no one is trusted by default and verification is required to gain access to anything from resources on the network to a hefty bank transfer.
If you've got a large data source that you use to train your AI models, and it's critical to your business, those data sources need to be highly protected, through OS hardening, firewalls, intrusion protection, and a robust patching programme.
On top of these cybersecurity best practices, if you’re going to use live interaction as training, like ChatGPT does, AI adoption has to be paired with guardrails. These include policies around not feeding customer data or company IP into these training models, as well as updated user agreements when abuse attempts inevitably happen.
Endpoint hygiene is another recommendation so “the developers do not have admin rights, they can’t install or use tools that aren’t authorized or tested,” Cox advised. “So when they want to use ChatGPT, they have to submit a request,” to then have the GRC team create some guidance, user agreements and guardrails. He says you have to apply at least the typical level of information security scrutiny you would to working with a third-party vendor.
This doesn’t completely overcome the shadow AI challenge of having developers use tools like GitHub Copilot without permission, but it would help stay on top of this hugely disruptive change.
Embedding a culture of secure AI
“Developers might think that security is a blocker for them. Security always is the person who will say no to everything, instead of actually sitting down and having a healthy chat between developers and security,” Moisset said.
Everyone should be aligned, she continued, to prevent something going wrong at the end of the chain right when you’re ready to ship to production. It’s also an avoidable organization cost to fix these vulnerabilities earlier. And, if you have a bug bounty program, the cost of waiting could really hit your pocket.
“Have security policies in place to make sure that employees understand if they can use it, how they can use it, and if the company doesn’t want them to use it, they shouldn’t use it,” Moisset said.
For engineering leaders, Schwartzentruber advises reflection. “Can you encourage your team to think about how they track guardrails and attempts to break the system? Your developers will be at the forefront of any innovation work, so getting them to consider security as part of their approach could save your whole team time and effort later.”
For Moisset, this happens by integrating security visibility throughout the software development lifecycle and CI/CD pipeline, from the IDE to pull requests.
“We have a chance to embed security models into LLM applications now,” Schwartzentruber said. “If we don’t take it, then we will have to bolt security on after and that will increase the overhead for developers in the longer term.”
Your AI security checklist
- Run security scans before anything is released.
- Understand the risks of generative AI within the context of your business and your intellectual property.
- Limit external-facing chatbots to your domain.
- De-risk with the OWASP Top 10 for Large Language Model Applications
- Play with Gandalf to educate around prompt injection.
- Like all successful security, cultivate a blameless culture.
- Add a clause to your user agreements stating you do not stand by generative AI manipulation as you recognize it as a form of hacking.
- Decide the big decisions for your business and create extra steps to help identify if deep fake video and audio reaches out impersonating executives, like for large money transfers.
- Include protections against machine-readable text that humans can’t see.
- Develop a habit of asking: Is this piece of code safe?
- Always scan before releasing to production.
- Within your LLM guardrails, include instructions to redact any sensitive information.
- Encourage asking: Is AI really useful in this situation? The best way to limit the risk of machine learning is by only applying it when necessary.