Hacker News

Personally, I think "AI" should be required to disclose the origins of the data it bases its responses on, and whether any of that data is copyrighted.


For current LLMs that's not technically feasible: every single token they output is influenced by every token they trained on, so any answer you got from them would have to disclose the millions of documents that went into the training set.

(I'd very much like them to disclose the full scope of their training set, but it's not possible for them to do that on a prompt-by-prompt basis in the way you describe.)
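The entanglement described above can be illustrated with a toy model (a hypothetical sketch, nothing like a real LLM): a single set of shared parameters is updated by every training example, so removing any one "document" changes the parameters, and with them every subsequent output. There is no per-document slice of the weights to point at.

```python
# Minimal sketch (hypothetical toy model, not a real LLM): one shared
# parameter is updated by every training example, so every output the
# model produces depends on the entire training set at once.

def train(examples, lr=0.1, epochs=100):
    w = 0.0  # one shared weight, standing in for billions of LLM parameters
    for _ in range(epochs):
        for x, y in examples:
            pred = w * x
            w -= lr * (pred - y) * x  # gradient step for squared error
    return w

docs_a = [(1.0, 2.0), (2.0, 4.0)]  # "document A"
docs_b = [(1.0, 3.0)]              # "document B"

w_all = train(docs_a + docs_b)
w_without_b = train(docs_a)

# Removing one "document" changes the shared weight, and therefore
# every prediction the model makes afterwards.
print(w_all != w_without_b)
```

Attribution methods that retrain with individual examples held out exist in research, but running them per prompt across millions of documents is exactly the cost problem the comment describes.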



