rename() + fsync() implementation

Renato Botelho garga at FreeBSD.org
Mon Jul 6 13:20:26 UTC 2015


> On Jul 4, 2015, at 10:30, Mateusz Guzik <mjguzik at gmail.com> wrote:
> 
> On Fri, Jul 03, 2015 at 01:59:43PM -0300, Renato Botelho wrote:
>> Some time ago we found a bug on pfSense and after investigating we figured out the root cause was passwd / group related tools was not checking if files were safe in disk, and system ended up with a 0 length passwd/group db after a power cycle. There are more context at revision [1].
>> 
>> After that, bapt@ suggest to do similar fix for cap_mkdb and services_mkdb, that also can be found at another review [2].
>> 
> 
> This definitely needs an explanation what is going on here.
> 
> When the problem from [1] is encountered, which file appears to be
> zeroed?
> 
> If the new one, should not O_SYNC you added in several places take care
> of that? (btw, it would be nicer if that was fsynced before close
> instead)

According comment #11 of pfSense bug #4523 [3]:

----------
With SU, with or without J, you end up with 0 byte master.passwd, passwd, group, pwd.db, spwd.db. Or some subset of those. Without SU, you end up with master.passwd and/or group corrupted, containing parts of other files in /etc/.

It's replicable on stock FreeBSD 9.0 through 11-CURRENT by running the following: 

#!/bin/sh

/usr/sbin/pw userdel -n 'admin'
/usr/sbin/pw groupadd all -g 1998
/usr/sbin/pw groupadd admins -g 1999
/usr/sbin/pw groupmod all -g 1998 -M ''
echo \$6\$O/T6GYkcgYvOBTGm\$KvOh3zhFKiA6HMEPktuImAI8/cetwEFsgj7obXdeTcQvn6mhs50HgkWt6nfnxNhTIb2w4Je6dqdKtARavxThP1 | /usr/sbin/pw usermod -q -n root -s /bin/tcsh -H 0
echo \$6\$O/T6GYkcgYvOBTGm\$KvOh3zhFKiA6HMEPktuImAI8/cetwEFsgj7obXdeTcQvn6mhs50HgkWt6nfnxNhTIb2w4Je6dqdKtARavxThP1 | /usr/sbin/pw useradd -m -k /etc/skel -o -q -u 0 -n admin -g wheel -s /bin/sh -d /root -c 'System Administrator' -H 0
/usr/sbin/pw unlock admin -q
/usr/sbin/pw groupmod all -g 1998 -M '0'
/usr/sbin/pw groupmod admins -g 1999 -M '0'

then power cycling the system. If using SU, you'll end up with 0 byte files. Without SU, you'll have corrupted files containing parts of some other file(s) in /etc.
----------

> If the old one, this still has the window (although miniscule compared
> to 5 mins) since whatever crash/failure you experienced can happen
> before you fsync. It may be ok enough in practice, but then the question
> is whether O_SYNC on the new file was of any significance. Or to state
> differently, do callers have to fsync/O_SYNC the file they are passing
> as an argument?
> 
> Of course it may be either one can appear truncated.

The idea of using O_SYNC on temporary file, and call fsync() on directory after rename came from Kirk McKusick:

----------
Applications that are properly written take these steps:

	1. write new file to a temporary name.
	2. fsync newly written file.
	3. rename temporary name to file to be updated.
	4. For faster results, fsync directory that has updated file to commit the new file.
----------

I decided to change it on libutil because the functions are specific for passwd and group operations, otherwise it would need to be changed on all applications that calls it (usr.bin/chpass, usr.sbin/pw, usr.sbin/rpc.yppasswdd).

> Assuming the whole approach makes sense here are some comments about the
> code itself:
> 
>>        /*
>>         * Make sure file is safe on disk. To improve performance we will call
>>         * fsync() to the directory where file lies
>>         */
>>        if (rename(tname, dbname) == -1 ||
>>            (dbname_dir = dirname(dbname)) == NULL ||
> 
> dirname returns a pointer to an internal buffer, so it is not suitable
> for use in a library function (think: foo = dirname(...); this(...);)

What do you think about having dirname_r implemented? Using the same approach we have today for basename/basename_r.

> Since it can fail and does not depend on rename, it should have been
> done earlier.
> 
> dbname_dir looks like a weird name. Something like dbdirname would be
> better.

OK

>>            (dbname_dir_fd = open(dbname_dir, O_RDONLY|O_DIRECTORY)) == -1 ||
> 
> dbname_dir_fd is definitely bad. This is not a name, this is a fd. So
> dbdirfd.

OK

>>            fsync(dbname_dir_fd) != 0) {
> 
> Why does this do '!= 0' instead of '== -1’?

No special reason, I’m going to change it.

>>                if (dbname_dir_fd != -1)
>>                        close(dbname_dir_fd);
>>                err(1, "Cannot rename `%s' to `%s'", tname, dbname);
> 
> It could be renamed succeeded, so this msg should be modified to state
> what really failed. But as a library func it likely should return an
> error instead, indicating which part failed.

Agreed.

>>        }
>> 
>>        if (dbname_dir_fd != -1)
>>                close(dbname_dir_fd);
>> 
> 
> At this point dbname_dir_fd is guaranteed to be != -1.

True, thanks for pointing this out.

>> The idea is to implement a “sync rename” function to do all these steps. I thought about it and IMO lib/libutil would be a good place to implement it. But since I’m starting to touch src now, I would like to hear more opinions about this.
>> 
>> 
>> [1] https://reviews.freebsd.org/D2978
>> [2] https://reviews.freebsd.org/D2982
> 
> -- 
> Mateusz Guzik <mjguzik gmail.com>

[3] https://redmine.pfsense.org/issues/4523#note-11
--
Renato Botelho



More information about the freebsd-hackers mailing list