Learning to trust generative AI

Love it or loathe it, engineering leaders have to learn to live with generative AI, but can you ever really trust the model?
October 28, 2024

Generative AI is absolutely everywhere.

It seems every enterprise software vendor is rushing to add AI-powered features. While some of these will no doubt prove beneficial, given AI’s patchy history, it is often up to engineering leaders to work out how the technology can be integrated safely.

If your organization is going to rely on AI for key internal functions, or even ship it as part of a product, you need to be able to trust that it will work as expected.

While some companies have responded to these fears with an outright ban, this approach is incredibly naive. Generative AI tools are now built into apps like Gmail, Microsoft Word, and Adobe Photoshop, social networks like Facebook and Instagram, Google Search, and countless other places. 

It is virtually impossible to actually ban employees from using AI tools, and even if you try, some people will simply hide their usage. If you can’t beat them, join them.

The utility and risks of AI

The promise of many generative AI tools is pretty clear: They can speed up repetitive tasks, simplify time-consuming ones, and generally make producing text and code more efficient. For organizations that are consistently looking to increase the speed at which they ship, it’s a compelling case.

But these benefits are offset by some pretty significant risks. The National Institute of Standards and Technology (NIST) has an AI risk management framework that covers them exhaustively, but broadly there are two major kinds of risk with AI, and large language models (LLMs) in particular: input risks and output risks.

Input risks arise when someone feeds sensitive consumer or corporate data into a tool that doesn’t have the correct safeguards: think giving the free version of ChatGPT your payroll data. While it is unlikely that this will directly lead to a data breach, it almost certainly violates all kinds of contracts, regulations, and laws.

Output risks are more insidious. For engineering teams, there are three main output risks when working with code:

  1. The model hallucinates and outputs something that doesn’t work as expected.
  2. The model outputs something that works but introduces some kind of security vulnerability or legal liability.
  3. The model has some kind of inherent bias that affects its ability to perform as intended.

In all cases, people are responsible for what happens when AI use goes wrong. When an Air Canada chatbot hallucinated a policy, the company was legally obliged to honor it. Multiple lawyers have been sanctioned for citing fake precedents generated by ChatGPT. If your organization gets AI wrong, it could be on the hook for serious fines – especially as regulations like the EU AI Act and the California AI Transparency Act come into force.

Explainability is an option – but not for LLMs

Martin Fowler, chief scientist at Thoughtworks, argues for explainable AI in a blog post. “The general principle should be that decisions made by software must be explainable,” Fowler writes, and that extends to machine learning and generative AI applications. 

In some situations, this is relatively trivial to implement, especially with narrower forms of AI. For example, object recognition algorithms typically return a certainty score. The threshold of certainty you require varies by application: a cancer-diagnosing algorithm demands a much higher degree of certainty than one that identifies plant species.
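
To make that concrete, here is a minimal sketch of threshold-based acceptance, assuming a hypothetical classifier that returns a label and a confidence score. The use cases, threshold values, and the idea of escalating to a human are illustrative assumptions, not recommendations from this article.

```python
# Minimal sketch: acting on a classifier's confidence score.
# Thresholds and use cases are illustrative assumptions.

CONFIDENCE_THRESHOLDS = {
    "plant_species": 0.70,     # low stakes: a wrong guess is a minor inconvenience
    "cancer_screening": 0.99,  # high stakes: defer to a human below this
}

def decide(label: str, confidence: float, use_case: str) -> str:
    """Accept the model's output only if it clears the threshold for this use case."""
    threshold = CONFIDENCE_THRESHOLDS[use_case]
    if confidence >= threshold:
        return f"accept: {label} (confidence {confidence:.2f})"
    return f"escalate to human review (confidence {confidence:.2f} < {threshold})"

print(decide("oak", 0.82, "plant_species"))           # accepted
print(decide("malignant", 0.82, "cancer_screening"))  # escalated
```

The point is that the explanation is cheap here: the decision reduces to one number compared against one threshold you chose for the stakes involved.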

Unfortunately, the inner workings of LLMs aren’t yet fully understood by the people building them, let alone easily explainable. They are non-linear, one-way algorithms: even knowing the input and the output, there’s no way to trace what went on inside a model’s neural network. Researchers are working on the problem, but they’re not there yet.

Fowler argues that restricting the use of unexplainable AI would incentivize the development of more explainable models. However, it seems unlikely that such sweeping restrictions will be broadly applied. Plenty of legitimate AI experts disagree with the fundamental premise that all software decisions should be readily explainable. While simple models can be relatively trivial to explain, complex models that weigh many factors may not lend themselves to an easy explanation. That doesn’t mean we shouldn’t use them.

For example, a social media feed pulls in dozens of different variables to decide which posts to show and in what order. Accurately explaining why you see any individual post could take hundreds of words, and the ordering will likely always involve some level of randomness. Even if that explanation would interest some users, it’s hard to argue that it would radically improve most social media sites.
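
As a toy illustration of why such explanations balloon, here is a hedged sketch of a hypothetical feed that ranks posts by a weighted sum of signals plus a random tie-breaker; the signal names and weights are invented for the example.

```python
import random

# Toy feed-ranking sketch: each post's position depends on several weighted
# signals plus a small random nudge, so a full "why" has to unpack all of them.

WEIGHTS = {"recency": 0.35, "author_affinity": 0.40, "past_engagement": 0.25}

def score_post(signals):
    """Weighted sum of the post's signals, plus a small random tie-breaker."""
    base = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return base + random.uniform(0.0, 0.05)

posts = {
    "post_a": {"recency": 0.9, "author_affinity": 0.2, "past_engagement": 0.5},
    "post_b": {"recency": 0.4, "author_affinity": 0.8, "past_engagement": 0.6},
}

ranking = sorted(posts, key=lambda p: score_post(posts[p]), reverse=True)
print(ranking)  # explaining this order means unpacking every weight and the noise
```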

And while building more explainable LLMs and generative AI models is a laudable goal, engineering leaders have to deal with the tools that are available now. Maybe one day ChatGPT will be able to explain what it does, but it can’t yet.

AI governance, codes of ethics, and transparency

Some of this may be decided for you. Publicly traded companies and larger organizations are increasingly governed by laws around the world that require an official AI policy and a minimum level of transparency around AI usage. That takes the issue out of the hands of engineering leaders and kicks it up to the board, though you may still be responsible for turning a broad set of guidelines into real-world rules.

Andrew Pery, AI Ethics Evangelist at ABBYY, thinks that organizations should embrace AI transparency whether they are legally obliged to or not. Publishing some kind of AI code of ethics on your website can help offset client and customer concerns. “We make it very clear what we do, how we protect client data, how we make sure training data is accurate,” he says. “Transparency, accountability, and visibility create trust.”

How to implement a team AI policy

Given how prevalent AI tools are, your team is almost certainly going to use them, so it’s worth putting an AI policy in place. That way, you can at least have some control over how they are integrated into your work.

  • Decide what tools are and aren’t acceptable. In larger organizations, this almost certainly means only using officially provided tools, but in smaller companies you might have quite a bit of leeway. Pery suggests checking the terms of service carefully; you don’t want to use anything that could capture the data you input.
  • Even with approved tools, set out clear guidelines for what kinds of uses are appropriate: in particular, what kinds of data can be used with which tools (see the sketch after this list). Enterprise AI tools may be suitable for client data when consumer tools like ChatGPT just aren’t.
  • Default to transparency. Make your AI policy public (or at least available to everyone in your organization).
  • Check that the tools you’re using are as efficacious as claimed. If they aren’t performing as expected or are creating problems, reconsider their use.
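
One way to make the data-handling guideline enforceable is to express it as code that tooling or reviewers can check. The sketch below is a minimal, hypothetical example; the tool names and data classifications are assumptions, not recommendations.

```python
# Minimal sketch of an AI usage policy expressed as code.
# Tool names and data classifications are illustrative assumptions.

APPROVED_TOOLS = {
    # tool -> data classifications it is approved to handle
    "enterprise_assistant": {"public", "internal", "client"},
    "public_chatbot": {"public"},
}

def is_use_allowed(tool: str, data_class: str) -> bool:
    """Return True only if the tool is approved for this class of data."""
    return data_class in APPROVED_TOOLS.get(tool, set())

assert is_use_allowed("enterprise_assistant", "client")
assert not is_use_allowed("public_chatbot", "client")  # client data stays out of consumer tools
```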

When it comes to generative AI, putting your head in the sand is not an option. Without clear policies in place, your team is likely to adopt and use whatever tools take their fancy. While some of these may be safe and secure for commercial use, others may not be.

Since AI is moving so fast and new models and tools are being released all the time, make sure your guidelines are flexible enough to cover new use cases. And most of all, keep your policy updated.