I don't think that's a reasonable assumption. If we allow ourselves to assume no errors, we could just assume GPT-3 makes no errors and declare it equivalent to a code interpreter.
Interpreter? Sure. That interpretation is not "equivalent to executing the code", though.
Imagine a C compiler that does aggressive optimizations - sacrificing huge amounts of memory for speed. On one hand, it even reduces computational complexity; on the other, it produces incorrect results in many cases.
GPT-3 as presented here would be comparable to that. Neither is equivalent to executing the original code.
Meanwhile, the result of something like gcc is equivalent, even if it runs on a computer with faulty RAM.
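To make the compiler analogy concrete, here is a minimal Python sketch of what I mean. Everything in it is hypothetical and made up for illustration (`reference_is_prime`, `optimized_is_prime`, the table size); it is not how any real compiler or GPT-3 works. The "optimized" version trades memory for speed with a precomputed table and falls back to a cheap guess outside it, so it agrees with the original on small inputs and silently diverges later.

```python
def reference_is_prime(n: int) -> bool:
    """Exact trial division; stands in for 'executing the original code'."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Hypothetical "aggressive optimization": spend memory on a lookup table so
# that answers inside the table cost O(1) instead of O(sqrt(n)).
TABLE_LIMIT = 100
_TABLE = [reference_is_prime(n) for n in range(TABLE_LIMIT)]


def optimized_is_prime(n: int) -> bool:
    """Fast, but only correct inside the table; outside it just guesses."""
    if 0 <= n < TABLE_LIMIT:
        return _TABLE[n]
    # Cheap heuristic (assumption for illustration): call n prime if it has
    # no factor of 2, 3 or 5. This is wrong for many composite numbers.
    return n % 2 == 1 and n % 3 != 0 and n % 5 != 0


def first_disagreement(limit: int):
    """Return the smallest n in [0, limit) where the two versions differ."""
    for n in range(limit):
        if reference_is_prime(n) != optimized_is_prime(n):
            return n
    return None


if __name__ == "__main__":
    print(first_disagreement(100))  # no difference inside the table
    print(first_disagreement(200))  # first input where the outputs diverge
```

The two functions are indistinguishable if you only test inputs below 100, which is exactly why "it gave the right answer on my examples" doesn't establish equivalence to executing the code.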
Speed and memory are orthogonal to my point, which is about the output of two methods of arriving at an answer. I'm obviously not saying GPT-3 is anything like as efficient as running a small function.
What distinction are you drawing between the output of an interpreted program and a compiled program?