Not that long ago, we were resigned to the idea that humans would need to inspect every line of AI-generated code. We’d do it personally, code reviews would always be part of a serious software practice, and the ability to read and review code would become an even more important part of a developer’s skillset. At the same time, I suspect we all knew that was untenable, that AI would quickly generate much more code than humans could reasonably review. Understanding someone else’s code is harder than understanding your own, and understanding machine-generated code is harder still. At some point—and that point comes fairly early on—all the time you saved by letting AI write your code is spent reviewing it. It’s a lesson we’ve learned before; it’s been decades since anyone except a few specialists needed to inspect the assembly code generated by a compiler. And, as Kellan Elliott-McCrea has written, it’s not clear that code review has ever justified the cost. While sitting around a table inspecting lines of code might catch problems of style or poorly implemented algorithms, code review remains an expensive solution to relatively minor problems.
With that in mind, specification-driven development (SDD) shifts the emphasis from review to verification, from prompting to specification, and from testing to still more testing. The goal of software development isn’t code that passes human review; it’s systems whose behavior lives up to a well-defined specification that describes what the customer wants. Finding out what the customer needs and designing an architecture to meet those needs requires human intelligence. As Ankit Jain points out in Latent Space, we need to make the transition from asking whether the code is written correctly to asking whether we’re solving the right problem. Understanding the problem we need to solve is part of the specification process—and it’s something that, historically, our industry hasn’t done well.
Verifying that the system actually performs as intended is another critical part of the software development process. Does it solve the problem as described in the specification? Does it meet the requirements for what Neal Ford calls “architectural characteristics” or “-ilities”: scalability, auditability, performance, and many other characteristics that are embodied in software systems but that can rarely be inferred from looking at the code, and that AI systems can’t yet reason about? These characteristics should be captured in the specification. The focus of the software development process moves from writing code to determining what the code should do and verifying that it indeed does what it’s supposed to do. It moves from the middle of the process to the beginning and the end. AI can play a role along the way, but specification and verification are where human judgment is most important.
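Some of these architectural characteristics can be checked mechanically once the specification states them as numbers. As a minimal sketch—not Ford’s method, and with a purely hypothetical latency budget—a fitness check might time an operation against a limit the spec declares:

```python
import time

# Hypothetical fitness test: the (imaginary) specification requires that
# looking up a record completes within 50 ms. The store, the lookup
# function, and the budget are all illustrative placeholders.
MAX_LATENCY = 0.05  # seconds, taken from the hypothetical spec


def lookup(store: dict, key: str):
    """Stand-in for the operation whose performance the spec constrains."""
    return store.get(key)


def meets_latency_budget(fn, *args, limit: float = MAX_LATENCY) -> bool:
    """Run fn once and report whether it finished inside the budget."""
    start = time.monotonic()
    fn(*args)
    return time.monotonic() - start < limit


store = {"alice": 1, "bob": 2}
within_budget = meets_latency_budget(lookup, store, "alice")
```

A check like this lives alongside the unit tests but answers a different question: not “is the code correct?” but “does the system exhibit the -ility the spec demands?”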
Drew Breunig and others point out that this is inherently a circular process, not a linear one. A specification isn’t something you write at the start of the process and never touch again. It needs to be updated whenever the system’s desired behavior changes: whenever a bug fix results in a new test, whenever users clarify what they want, whenever the developers understand the system’s goals more deeply. I’m impressed with how agile this process is. It is not the agile of sprints and standups but the agile of incremental development. Specification leads to planning, which leads to implementation, which leads to verification. If verification fails, we update the spec and iterate. Drew has built Plumb, a command line tool that can be plugged into Git, to support an automated loop through specification and testing. What distinguishes Plumb is its ability to help software developers look at the decisions that resulted in the current version of the software: diffs, of course, but also conversations with AI, the specifications, the plans, and the tests. As Drew says, Plumb is intended as an inspiration or a starting point, and it’s clearly missing important features—but it’s already useful.
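The loop described above can be sketched schematically. Nothing here is Plumb’s real interface; the “spec” is just a table of input/output examples, and the two candidate implementations stand in for successive iterations of the cycle:

```python
# Toy sketch of the specification-verification loop: specification leads
# to implementation, verification against the spec either passes or
# sends us around the loop again. All names are illustrative.

def verify(impl, spec):
    """Check an implementation against example pairs from the spec.

    Returns (passed, failures), where failures lists
    (input, expected, actual) triples for every mismatch.
    """
    failures = [(x, want, impl(x)) for x, want in spec if impl(x) != want]
    return not failures, failures


# Executable spec: each pair is (input, expected output) for squaring.
spec = [(2, 4), (3, 9)]

draft = lambda x: x + x   # first attempt: happens to pass (2, 4), fails (3, 9)
ok, failures = verify(draft, spec)   # ok is False -> loop back and iterate

fixed = lambda x: x * x   # revised attempt after the failed verification
ok2, _ = verify(fixed, spec)         # ok2 is True -> verification passes
```

The point of the toy is the shape of the cycle: a failed verification doesn’t just mean fixing the code; it’s also the moment when the spec itself gets updated with whatever the failure taught us.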
Can SDD replace code review? Probably; again, code review is an expensive way to do something that may not be all that useful in the long run. But maybe that’s the wrong question. If you don’t listen carefully, SDD sounds like a reinvention of the waterfall process: a linear drive from writing a detailed spec to burning thousands of CDs that are stored in a warehouse. We need to listen to SDD itself to ask the right questions: How do we know that a software system solves the right problem? What kinds of tests can verify that the system solves the right problem? When is automated testing inappropriate, and when do we need human engineers to judge a system’s fitness? And how can we express all of that knowledge in a specification that leads a language model to produce working software?
We don’t place as much value on specifications as we did in the last century; we tend to see spec writing as an obsolete ceremony at the start of a project. That’s unfortunate, because we’ve lost a lot of institutional knowledge about how to write good, detailed specifications. The key to making specifications relevant again is realizing that they’re the start of a circular process that continues through verification. The specification is the repository for the project’s real goals: what it’s supposed to do and why—and those goals necessarily change during the course of a project. A specification-driven development loop that runs through testing—not just unit testing but fitness testing, acceptance testing, and human judgment about the results—lays the groundwork for a new kind of process in which humans won’t be swamped by reviewing AI-generated code.
