
Author doesn’t explain what happened or why the proposed flags will solve the problem.


> Author doesn’t explain what happened or why the proposed flags will solve the problem.

Probably because she/he doesn't know. Could be lots of things, because FYI mtime can be modified by the user. Go `touch` a file.

In all likelihood, it happens because of a package installation, where the install sets the same mtime on a file that has the same size but different contents. That's where I usually see it.

`httm` allows one to dedup snapshot versions by size, then hash the contents of identical sized versions for this very reason.

    --dedup-by[=<DEDUP_BY>] comparing file versions solely on the basis of size and modify time (the default "metadata" behavior) may return what appear to be "false positives".  This is because metadata, specifically modify time and size, is not a precise measure of whether a file has actually changed. A program might overwrite a file with the same contents, and/or a user can simply update the modify time via 'touch'. When specified with the "contents" option, httm compares the actual file contents of same-sized file versions, overriding the default "metadata" only behavior...
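To make the difference between the two modes concrete, here is a rough Python sketch of the idea. It is not httm's actual implementation: `unique_versions`, `file_digest`, and the choice of SHA-256 are all my own assumptions for illustration. It keys each version on (size, mtime) in "metadata" mode, and on (size, content hash) in "contents" mode, hashing only when more than one version shares a size.

```python
import hashlib
from collections import Counter
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents (hash choice is an assumption)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def unique_versions(paths, dedup_by="metadata"):
    """Hypothetical helper: keep the first path seen for each distinct key.

    dedup_by="metadata": key is (size, mtime in nanoseconds).
    dedup_by="contents": key is (size, content hash); files whose size is
    unique are never hashed, since nothing else could match them.
    """
    paths = [Path(p) for p in paths]
    stats = [p.stat() for p in paths]
    size_counts = Counter(st.st_size for st in stats)

    seen = set()
    unique = []
    for p, st in zip(paths, stats):
        if dedup_by == "metadata":
            key = (st.st_size, st.st_mtime_ns)
        elif dedup_by == "contents":
            digest = file_digest(p) if size_counts[st.st_size] > 1 else None
            key = (st.st_size, digest)
        else:
            raise ValueError(f"unknown dedup_by: {dedup_by!r}")
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique
```

A file that was merely `touch`ed shows up as a separate version in "metadata" mode but collapses into its original in "contents" mode.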


It is worth noting that rsync doesn't compare just by size and mtime but also by (relative) path, i.e. it normally compares an old copy of a file with the current version of the same file. So the likelihood of "collisions" is much smaller than with a file de-duplicating tool that compares random files.


I think you may misunderstand what httm does. httm prints the size, date and corresponding locations of available unique versions of files residing on snapshots.

And -- this makes it quite effective at proving how often this happens:

    > httm -n --dedup-by=contents /usr/bin/ounce | wc -l
    3
    > httm -n --dedup-by=metadata /usr/bin/ounce | wc -l
    30


there is also some weirdness, or used to be some weirdness, with linux and modifying shared libraries. for example, if a process is using a shared library and the contents of the file are modified (same inode), then what behaviour is expected? i think there are two main problems:

1) pages from the shared library are lazily loaded into memory so if you try and access a new page you are going to get it from the new binary which is likely to cause problems

2) pages from the shared library might be 'swapped' back to disk due to memory pressure. not sure whether the pager will just throw the page away and try to swap back in from disk from the new file contents or if it will notice the disk page is dirty and use the swap for write back to preserve the original page.

also, i remember it used to be possible to trigger some error if you tried to open a shared library for writing while it was in use but I can't seem to trigger that error anymore.


Author explained that by default, if two files are the same size and have the same modification date/time, rsync will assume they're identical, WITHOUT CHECKING THAT.

Author clarifies there are flags to change that behaviour, to make it actually compare file contents, and then shares those names.

It seems like you didn't read the article.
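For anyone who wants to see whether this actually bites them, here is a minimal Python sketch of the check described above. It is not rsync's code: `missed_by_quick_check` and `same_bytes` are hypothetical names, and exact mtime equality at nanosecond resolution is a simplifying assumption. It walks a source tree, pairs each file with the file at the same relative path in the destination, and reports pairs whose size and mtime match but whose bytes differ.

```python
import os


def same_bytes(a, b, chunk_size=1 << 16):
    """Compare two files byte for byte, reading in chunks."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk_size), fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:  # both reads hit end of file together
                return True


def missed_by_quick_check(src_root, dst_root):
    """Hypothetical audit: list relative paths that a size+mtime comparison
    would treat as unchanged even though the contents differ."""
    missed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                continue  # no counterpart at the same relative path
            s, d = os.stat(src), os.stat(dst)
            metadata_equal = (
                s.st_size == d.st_size and s.st_mtime_ns == d.st_mtime_ns
            )
            if metadata_equal and not same_bytes(src, dst):
                missed.append(rel)
    return missed
```

An empty list means the metadata shortcut and a full content comparison agree for every file present on both sides.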


Short of checking every single byte of one file against the other, you need some sort of shorthand.

“Assume these two files are the same, check whether either system says the file has been modified or whether the size has changed, and call it a day” is a pretty fair assumption, and something even I knew rsync does, and I’ve only used it once, in a project 10 years ago. I am sure Rachel also knows this.

So, what is the problem? Is data not being synced? Is data being synced too often? And why do these assumptions lead to either happening? What horrors is the author expecting the reader to see when running the suggested command?

That is what is not explained in the article.


Genuinely that's a feature not a bug. If you didn't rt"friendly"m the problem explicitly exists between keyboard/vr-headset and chair/standing-desk.

This should never be a surprise to people unless this is their first time using Unix.


They also have to have the same name though. The actual chances of this situation happening and persisting long enough to matter are pretty damn small.



