Test-Driven Code Generation

LLMs have made considerable progress since ChatGPT was first made available to the public, and the quality of generated code has improved greatly. Yet the generated code still falls far short of production standards and in many cases is not fit for purpose.
The core challenge is that code, in order to function properly, must be the expression of an algorithmic idea, not a random summary of something that someone (or something) has read before. The latter may be a starting point, but it is not sufficient for a final version.
Whether tests are written as code or executed manually, a (human) developer in most cases works in cycles: building a hypothesis, writing code, testing the resulting program, analyzing the test results, and then starting over.
Test-Driven Development is the practice of automating parts of that cycle by writing test code early in the process, shortening the feedback loop between hypothesis and analysis and allowing for greater productivity and more robust results.
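To make that cycle concrete, here is a deliberately tiny example in plain Python with pytest (the `slugify` function and its behavior are invented for this illustration): the test is written first and fails, then just enough implementation is added to make it pass.

```python
# Step 1: write the test first. Before slugify exists, this test fails,
# which is exactly the signal that drives the next step.
def test_slugify_replaces_spaces_and_lowercases():
    assert slugify("Test Driven Generation") == "test-driven-generation"

# Step 2: write just enough implementation to make the test pass.
def slugify(title):
    return title.strip().lower().replace(" ", "-")
```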
When the quality of generated code is assessed, we often compare apples and oranges. Generated code usually results from a single prompt containing a description of the requirement and possibly the current version of the code. Do we really expect AI to deliver a result comparable to what an experienced human developer would produce after running through the cycle several times? Is that fair?
The more relevant questions are: are we asking AI the right way? Do we make the best use of the technology?
Test-Driven Code Generation (also referred to as "Test-Driven Generation" or "TDG") integrates the generative power of AI into the same procedure that a human developer would go through. Where Test-Driven Development automates the testing part of the cycle, Test-Driven Code Generation aims to automate the part where code is actually written:
1. Human writes boilerplate code and describes the expected outcome.
2. Human writes a test.
3. The test is run.
4. If the test passes, stop here or return to step 2 to add more test cases.
5. If the test fails, ask AI to implement the code based on the description, the interface, the current version of the code, the test code, and the latest test results (see the sketch after this list).
6. Return to step 3.
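The loop can be sketched in a few lines of Python. Everything here is an assumption for illustration: `run_tests` runs pytest on a temporary file as a stand-in for whatever test harness you actually use (ATF, in the Artificial Developer app's case), and `generate_code` is a placeholder for the call to your LLM of choice.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

MAX_ATTEMPTS = 5  # give up eventually if the model never converges

def run_tests(code: str, tests: str) -> tuple[bool, str]:
    # Stand-in test runner: write implementation and tests into one
    # temporary file and run pytest on it (requires pytest installed).
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "test_generated.py"
        target.write_text(code + "\n\n" + tests)
        proc = subprocess.run(
            [sys.executable, "-m", "pytest", str(target), "-q"],
            capture_output=True, text=True,
        )
        return proc.returncode == 0, proc.stdout + proc.stderr

def generate_code(description: str, code: str, tests: str, test_output: str) -> str:
    # Placeholder: send the description, the current code, the test code,
    # and the latest test results to your LLM and return the new version.
    raise NotImplementedError("wire this up to your LLM of choice")

def test_driven_generation(description: str, tests: str, code: str = "") -> str:
    for _ in range(MAX_ATTEMPTS):
        passed, output = run_tests(code, tests)      # step 3: run the test
        if passed:                                   # step 4: done (or add more tests)
            return code
        # step 5: ask AI for a new version, feeding it everything we know
        code = generate_code(description, code, tests, output)
        # step 6: loop back to step 3
    raise RuntimeError(f"tests still failing after {MAX_ATTEMPTS} attempts")
```

Note that the human only appears in steps 1 and 2; once the tests exist, the generate-and-test loop can run unattended until the tests pass or the attempt budget is exhausted.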
As of now, the Artificial Developer app supports Test-Driven Code Generation for Script Includes only. Since there are no built-in references between Script Includes and ATF Tests, the app relies on a naming convention: the ATF test must be named after the application that contains the Script Include, followed by a dash, followed by the name of the Script Include. E.g.: "Artificial Developer - TestDrivenCodeGeneration".
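To make the convention explicit, this trivial Python sketch (the function name is invented for illustration) derives the expected ATF test name from the application and Script Include names:

```python
def expected_atf_test_name(app_name: str, script_include_name: str) -> str:
    # Convention: "<application> - <Script Include>"
    return f"{app_name} - {script_include_name}"

assert expected_atf_test_name(
    "Artificial Developer", "TestDrivenCodeGeneration"
) == "Artificial Developer - TestDrivenCodeGeneration"
```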
For disambiguation: the term "Test-Driven Generation" has sometimes been used to describe the process of creating test cases (not the code to satisfy those tests) using AI. More recent publications, however, use the term consistently for code (not test) generation.
Get started now:
https://www.wildgrube.com/servicenow-artificialdeveloper