We do two-pass scanning to figure out our local contents:
- Walk the folder. For each file, check whether it's already in the database and is a match; otherwise hash it and add it to the database. The database is indexed by file name.
- Iterate over the database, issuing an os.Lstat() for each item in it. If we get back an error, note the file as deleted.
This breaks when a file is renamed case-only on a case-insensitive file system. In step one we find it as a new file, since the database is case sensitive. In step two we don't delete the old variant, because os.Lstat() on a case variant of an existing file name still works.
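To make the failure concrete, here is a minimal in-memory simulation of the two passes. It is only a sketch: `fakeFS`, `entry` and `scan` are hypothetical names invented for illustration, and `fakeFS.lstat` merely mimics how os.Lstat() behaves on a case-insensitive file system. It is not the actual scanner code.

```go
package main

import "strings"

// fakeFS is a hypothetical stand-in for a case-insensitive file system.
// It maps a lower-cased key to the name as currently spelled on disk.
type fakeFS map[string]string

// lstat succeeds for any case variant of an existing name, mimicking
// os.Lstat() on a case-insensitive file system.
func (fs fakeFS) lstat(name string) bool {
	_, ok := fs[strings.ToLower(name)]
	return ok
}

// entry is a simplified database record.
type entry struct{ deleted bool }

// scan runs both passes against a case-sensitive database.
func scan(fs fakeFS, db map[string]*entry) {
	// Pass one: walk the folder and add names the database doesn't know.
	for _, onDisk := range fs {
		if e, ok := db[onDisk]; !ok || e.deleted {
			db[onDisk] = &entry{} // the real scanner would hash here
		}
	}
	// Pass two: mark database items that no longer stat as deleted.
	for name, e := range db {
		if !fs.lstat(name) {
			e.deleted = true
		}
	}
}
```

After scanning `foo.txt`, renaming it to `Foo.txt` and scanning again, the database in this simulation holds both names and neither is marked deleted.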
We might be able to get away with just fixing step two in the scanning process. We'll still hash the file unnecessarily (and lose version vector history), but this is not a huge deal. We could change the algorithm from the current Lstat()-based approach to something like:
- When iterating over the database:
  - If the current item is a directory, do a listdir on it.
  - If it's a file, look for it in the listdir results from the previous step.
- Thus we would not find the file under the incorrect case variant, and would conclude it's been deleted. (We'll also save a bunch of Lstat calls, so it may actually be more efficient.)
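The listing-based check could look roughly like the sketch below. The names `findDeleted` and `listDir` are hypothetical; `listDir` is an injected hook that a real scanner might back with os.ReadDir. As a simplification, the sketch lists each parent directory lazily and caches the result, rather than listing when the directory item itself is visited. The key point is that the comparison is an exact string match, so a stale case variant is no longer found.

```go
package main

import "path"

// findDeleted returns the database names that have no exactly matching
// entry in their parent directory's listing. Names are slash-separated
// and relative to the folder root, which is listed as ".".
func findDeleted(dbNames []string, listDir func(dir string) []string) []string {
	listings := map[string]map[string]bool{} // dir -> exact names in it

	// load lists a directory once and caches the result.
	load := func(dir string) map[string]bool {
		if l, ok := listings[dir]; ok {
			return l
		}
		l := map[string]bool{}
		for _, n := range listDir(dir) {
			l[n] = true
		}
		listings[dir] = l
		return l
	}

	var deleted []string
	for _, name := range dbNames {
		// Exact, case-sensitive lookup in the parent's listing.
		if !load(path.Dir(name))[path.Base(name)] {
			deleted = append(deleted, name)
		}
	}
	return deleted
}
```

With a listing of the root that contains `Foo.txt` but not `foo.txt`, a database holding both names yields `foo.txt` as deleted, and each directory is listed only once regardless of how many items it holds.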
We could add a FlagCaseInsensitive at the protocol level and include it in FileInfos. This bit would be set by the scanner when it knows it's operating on a case-insensitive filesystem (by configuration, or we can auto-detect it). When set, the file would be stored in the database under a canonicalized name (i.e. lower case), and all the set.FileSet methods would need to know about this and handle it correctly.
Thus lookups would find the file under any case variant, and we wouldn't see a case-only rename as a new file + delete. We'd probably need special handling to actually pick up the case change, though.
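A rough sketch of the canonicalized-name idea follows. Everything here is hypothetical (`caseFoldDB`, `record`, `canonical`, `get`, `update`) and unrelated to the real set.FileSet API. It also assumes strings.ToLower is a good enough canonical form, which is not true for full Unicode case folding. Keeping the on-disk spelling inside the record is one possible way to do the "special handling" needed to notice the case change.

```go
package main

import "strings"

// record keeps the on-disk spelling next to the canonical key.
type record struct {
	name string // name as last seen on disk
}

// caseFoldDB is a hypothetical database keyed by canonicalized names.
type caseFoldDB map[string]record

// canonical lower-cases the name; an assumption, not full case folding.
func canonical(name string) string { return strings.ToLower(name) }

// get finds a record under any case variant of name.
func (db caseFoldDB) get(name string) (record, bool) {
	r, ok := db[canonical(name)]
	return r, ok
}

// update stores name and reports whether it is a case-only rename of
// an already known record.
func (db caseFoldDB) update(name string) (caseChanged bool) {
	old, ok := db.get(name)
	db[canonical(name)] = record{name: name}
	return ok && old.name != name
}
```

In this sketch, updating with `foo.txt` and then `Foo.txt` leaves a single record, and the second call reports a case change instead of producing a new file plus a delete.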
When syncing files from case-sensitive devices to case-insensitive ones, we must "taint" them with the bit in question.