I agree that you need training data to build AI from scratch, much like you need lots of really smart developers, a mailing list, servers and so on to build the Linux kernel from scratch. But having the training data and training code isn't really about getting the same result again, in the way that something like open data in science is about replicating results.
Reproducible builds of software binaries are a thing, but they aren't routinely done. Likewise, training an AI is deterministic if you do it exactly the same way each time (same data order, same random seeds, same software and hardware setup), and slight variations lead to models of similar capability.
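To make the determinism point concrete, here's a toy sketch in plain Python. It is only an illustration under simplifying assumptions: the names `DATA`, `train` and `loss` are made up for this example, the "model" is a two-parameter line fit, and real large-scale training has extra sources of nondeterminism (GPU kernels, distributed scheduling) that this ignores.

```python
import random

# Hypothetical toy dataset: noisy samples of y = 2x + 1.
# A fixed seed here stands in for "the same training data".
_data_rng = random.Random(12345)
DATA = [(i / 50.0, 2.0 * (i / 50.0) + 1.0 + _data_rng.gauss(0.0, 0.1))
        for i in range(50)]

def train(seed, steps=5000, lr=0.05):
    """Fit y = w*x + b with plain SGD; all randomness comes from one seeded RNG."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # random init
    for _ in range(steps):
        x, y = rng.choice(DATA)        # random sample order
        err = (w * x + b) - y          # prediction error on this sample
        w -= lr * err * x              # gradient step for w
        b -= lr * err                  # gradient step for b
    return w, b

def loss(params):
    """Mean squared error of the fitted line over the whole dataset."""
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in DATA) / len(DATA)

run_a = train(seed=0)
run_b = train(seed=0)   # same seed: same init, same sample order
run_c = train(seed=1)   # different seed: a "slight variance"

print("same seed, identical weights:", run_a == run_b)
print("different seed, identical weights:", run_a == run_c)
print("losses:", loss(run_a), loss(run_c))
```

Running with the same seed twice should give bit-identical weights, while a different seed gives different weights that fit the data about equally well. That is the toy version of "slight variances lead to similar capability models".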