The release of the world’s “first AI software engineer” by Cognition has caused consternation, but should devs be worried?
Cognition, an applied artificial intelligence (AI) laboratory based in New York and San Francisco, caused a stir on March 12 when it posted a video demonstrating Devin, a generative AI tool claimed as the world’s first “fully autonomous AI software engineer.”
Devin is a large language model-based chatbot that takes simple text prompts and then talks through its plans on how to tackle the problem posed. It then builds a project using the tools typically used by human software engineers, including its command line, code editor, and a browser through which it executes code requests.
Unlike existing AI-powered coding assistants like GitHub Copilot, Devin doesn’t just provide the output to a prompt but appears to reason through its decision-making and respond to guidance, as a colleague might.
Devin has been built to be “a tireless, skilled teammate, equally ready to build alongside you or independently complete tasks for you to review,” the company claims.
Cognition asserts that Devin is already pretty good at the job, too. The company claims best-in-class performance on SWE-Bench, a commonly used benchmark that tries to assess the coding performance of generative AI tools. And the results aren’t even close. Unassisted, the best tool before Devin was getting 1.96% of tasks right on SWE-Bench tests. At the time of its launch, Devin was getting 13.86% correct without assistance.
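To put those two figures side by side, here is a minimal Python sketch of the arithmetic. The two percentages come from Cognition's claims as reported above; the function and variable names are purely illustrative and not part of any SWE-Bench tooling.

```python
def relative_improvement(previous_pct: float, current_pct: float) -> float:
    """Return how many times larger current_pct is than previous_pct."""
    if previous_pct <= 0:
        raise ValueError("previous_pct must be positive")
    return current_pct / previous_pct


# Unassisted SWE-Bench scores as reported in the article (percent of tasks).
previous_best = 1.96
devin = 13.86

ratio = relative_improvement(previous_best, devin)
gain_points = devin - previous_best  # absolute gain in percentage points

print(f"{ratio:.1f}x the previous best")
print(f"Gain of {gain_points:.2f} percentage points")
print(f"Roughly 1 task in {100 / devin:.1f} solved")
```

On these numbers the jump is about sevenfold, yet it still leaves roughly six in every seven tasks unsolved.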
The company also claims that Devin has been able to “pass practical engineering interviews from leading AI companies,” and complete “real jobs” posted on the freelancing platform Upwork.
Believe the hype?
Alongside Cognition’s announcement came a slew of supportive posts on social media from friendly developers with early access to the tool. Patrick Collison, the Stripe cofounder, said that the examples included in the company’s video weren’t just cherry-picked. “Devin is, in my experience, very impressive in practice,” he wrote. Former OpenSea chief technical officer Alex Atallah said, “This is the first AI agent I’ve used that feels like a real, useful person is on the other end.”
Social media users have also joked about quitting their jobs and taking up hobbies full-time now that Devin has been launched.
“This is awesome and frightening at the same time,” says Humberto Cuadra, a senior software engineer at Flying Man Productions. “It shook the software community and many developers are wondering if they will be out of jobs soon. I think there is some truth to that. Tools like Devin will definitely diminish demand for simpler development jobs from the market – like website building or simple applications.”
But not everyone is convinced. “My position would be through reasoning rather than through experience – while the video is slick, I would still expect generative AI to get things wrong systemically, and it’s not clear to me whether they’d leave things in a state where an actual software engineer could get things back on track again,” says Eerke Boiten, a professor at De Montfort University in the UK.
While Devin’s state-of-the-art results on SWE-Bench are impressive in comparison to what’s gone before, they’re still in the low double digits. “This now solves 1 out of 10 Github issues, oh wow congrats,” Nisten T. Tahiraj, a developer at Skunkworks AI, posted on X.
“From the videos, it seems it’s more of an advanced agent that can switch between certain tools,” Gergely Orosz wrote in his newsletter, The Pragmatic Engineer. “Still, at this point, Devin doesn’t feel more than a heavily work-in-progress prototype.”
What’s next?
Cutting through the hype, is this a significant leap from existing coding assistants, or a clever attempt to differentiate a product from its competitors? “Devin represents a true first step into agentive behavior for LLMs,” says Pietro Schirano at EverArt. “The way they package it is really good.”
Schirano is cautiously optimistic about the performance of Devin. “The demo is really good, but at the same time, a lot of the things they show in the demo can be done just by asking [OpenAI’s] GPT-4,” he says.
The real question is whether Devin is intended to help developers or to replace them. “If it’s for a developer, that’s great because a developer knows how to use it and understand if the code is correct, but if it’s for the general person who has never coded in their life, it could be useful, but we’re still far from someone with no coding experience doing something with it,” Schirano says.
It’s unclear when exactly developers will be able to get their hands on Devin, with access currently limited to a waitlist.