Hacker News

Personally, I think "AI" should be required to disclose the origins of the data it bases its responses on, and whether any of that data is copyrighted.


For current LLMs that's not technically feasible: every single token they output is influenced by every token they trained on, so any answer you got from them would have to disclose the millions of documents that went into the training set.

(I'd very much like them to disclose the full scope of their training set, but it's not possible for them to do that on a prompt-by-prompt basis in the way you describe.)
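The entanglement described above can be illustrated with a toy model (a hypothetical sketch, nothing like a real LLM): a single set of shared parameters is updated by every training example, so removing any one "document" changes the parameters, and with them every subsequent output. There is no per-document slice of the weights to point at.

```python
# Minimal sketch (hypothetical toy model, not a real LLM): one shared
# parameter is updated by every training example, so every output the
# model produces depends on the entire training set at once.

def train(examples, lr=0.1, epochs=100):
    w = 0.0  # one shared weight, standing in for billions of LLM parameters
    for _ in range(epochs):
        for x, y in examples:
            pred = w * x
            w -= lr * (pred - y) * x  # gradient step for squared error
    return w

docs_a = [(1.0, 2.0), (2.0, 4.0)]  # "document A"
docs_b = [(1.0, 3.0)]              # "document B"

w_all = train(docs_a + docs_b)
w_without_b = train(docs_a)

# Removing one "document" changes the shared weight, and therefore
# every prediction the model makes afterwards.
print(w_all != w_without_b)
```

Attribution methods that retrain with individual examples held out exist in research, but running them per prompt across millions of documents is exactly the cost problem the comment describes.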



