
That word, "temporarily", is doing a lot of heavy lifting in a digital world where things can be duplicated for free.


Seems like an S3 bucket would have been a better alternative. We have no idea what OpenAI does with Dropbox customer data beyond storing it for up to 30 days. They're doing something: basically all Dropbox customer files will get propagated to OpenAI by default, and that should be scary, not feel good.

    Dropbox’s practices aren’t unprecedented, but customer documents do pass through OpenAI’s servers and are stored there for up to 30 days, and the “third-party AI” toggle is turned on by default in account settings.


What makes you think this is about "basically all Dropbox customer files"?


Turning on AI by default seems to indicate they're sending your data somewhere automatically before seeking approval or opt-in. I could be very wrong, but the wording alone would at the very least make me cautious.


If it’s turned on by default and one of its capabilities is to use AI to search your files, then why wouldn’t we assume it applies to basically all files? How could it not?


So that depends entirely on how they implemented the feature. There are a few ways this could be working:

- They gave their chat interface the ability to run regular full-text searches against Dropbox - when you ask a question that can be answered by file content, it searches for relevant files and then copies just a few paragraphs of text into the prompt to the AI.

- They might be doing this using embeddings-based semantic search. This would require them to create an embeddings vector index of all of your content and then run vector searches against that.

- If they're doing embeddings search, they might have calculated their own embeddings on their own servers... or they might have sent all of your content to OpenAI's embeddings API to calculate those vectors.

Without further transparency we can't tell which of these they've done.
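
To make the first option concrete, here is a minimal sketch of keyword retrieval that copies only the best-matching paragraphs into a prompt. Everything in it is an assumption for illustration: the function names (`tokenize`, `top_paragraphs`, `build_prompt`), the naive term-count scoring, and the sample files are all hypothetical, and nothing here reflects how Dropbox actually built the feature.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_paragraphs(question, files, k=2):
    """Score each paragraph by question-term occurrences and return the k best."""
    q_terms = set(tokenize(question))
    scored = []
    for name, content in files.items():
        for para in content.split("\n\n"):
            words = Counter(tokenize(para))
            score = sum(words[t] for t in q_terms)
            if score > 0:
                scored.append((score, name, para.strip()))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(question, files, k=2):
    """Only the selected excerpts, not whole files, end up in the prompt."""
    excerpts = top_paragraphs(question, files, k)
    context = "\n\n".join(f"[{name}] {para}" for _, name, para in excerpts)
    return f"Answer using only these excerpts:\n\n{context}\n\nQuestion: {question}"


# Hypothetical sample data, standing in for a user's files.
files = {
    "lease.txt": "The lease starts on 1 March.\n\nRent is due on the first of each month.",
    "recipes.txt": "Mix flour and water.\n\nBake for forty minutes.",
}
prompt = build_prompt("When is the rent due?", files)
print(prompt)
```

In a design like this, the only text that would leave the search provider's servers is the handful of paragraphs in `prompt`. The embeddings options differ in that every file has to be processed up front to build the vector index.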

My strong hunch is that they're using the first option, for cost reasons. Running embeddings is an expensive operation, but storing embeddings is even more expensive: to get fast results from an embeddings vector store you need dedicated RAM. Running that at Dropbox scale would be, I think, prohibitively expensive when you could get not-quite-as-good results from a traditional search index, which they have already built.
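
A rough back-of-envelope for the RAM point, as a sketch only. The chunk count below is a made-up placeholder, not a Dropbox figure, and `index_ram_bytes` is a hypothetical helper. The 1536 dimensions match OpenAI's text-embedding-ada-002 model, but whether Dropbox uses that model is an assumption. The estimate counts raw float32 vectors only and ignores index overhead, replication, and any compression.

```python
def index_ram_bytes(num_chunks, dims, bytes_per_float=4):
    """Raw size of an uncompressed float vector index held in memory."""
    return num_chunks * dims * bytes_per_float


# Hypothetical: one billion text chunks, 1536-dimensional float32 vectors.
size = index_ram_bytes(1_000_000_000, 1536)
print(f"{size / 1e12:.3f} TB")  # 6.144 TB
```

The cost scales linearly with the number of chunks, so the real number depends entirely on how much content is indexed and how it is split.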

If they ARE sending every file through OpenAI's embedding endpoint that's a really big deal. It would be good if they would clarify!


It's such an obvious obfuscation of what everyone can assume is permanent ownership of user data, along with the assumption that its use will be limited. There are no protections around user data retention in the ToS. Unless a whistleblower reveals specific uses of the data and users litigate the issue, they do what they want with zero opposition.


Also, "temporarily" doesn't necessarily mean a time period of less than one hundred years.



