In code, one way I’ve found to ground the model and make its output trustworthy is test-driven development.
Make it write the tests first. Make it watch the tests fail. Make it assert to itself that they fail for the RIGHT reason. Make it write the code. Make it watch the tests pass. Learn how to provide it these instructions and then take yourself out of the loop.
When you’re done, you’ve created an artefact of documentation, at a microscopic level, of how the code should behave, which forms a reference for yourself and for future agents for the life of the codebase.
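The red-green cycle described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual workflow; the `gcd` function and test names are hypothetical, chosen only to show the steps.

```python
# Step 1: write the test first, against a stub, so the first run fails
# for the RIGHT reason (missing behaviour), not a typo or import error.
def gcd(a, b):
    raise NotImplementedError  # stub: no behaviour yet

def test_gcd():
    assert gcd(12, 18) == 6
    assert gcd(7, 5) == 1

# Step 2: watch it fail, and check WHY it failed.
try:
    test_gcd()
except NotImplementedError:
    print("red: test fails for the right reason")

# Step 3: write the code.
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

# Step 4: watch the test pass.
test_gcd()
print("green: tests pass")
```

An agent driving this loop would run the test command itself at steps 2 and 4 and inspect the failure message, rather than assuming the outcome.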
Agree. I'm having lots of fun with gen-AI, and I still have insights it doesn't. Writing tests at the same time as the production code is also doable with gen-AI. In my experience, every model I've tried is terrible at testability by default, and every now and again they forget that tests matter at all.