How to get the deterministic result for FreeBSD tar(1)?
Yuri
yuri at rawbw.com
Tue Dec 8 10:59:53 UTC 2015
I have two identical directories (no diffs, and all mtime attributes are
identical), each archived by this command:
    find dir -print0 | LC_ALL=C sort -z | tar cf archive.tgz --format=bsdtar --no-recursion --null -T -
The resulting archives differ: 3 files out of 10,000 carry pax attributes
whose values differ:
-27 ctime=1449566560.642715
+27 ctime=1449566903.167521
src/contrib/libarchive/archive_write_set_format_by_name.c suggests that
format=bsdtar should force ARCHIVE_FORMAT_TAR_PAX_RESTRICTED format (no
attributes), unless need_extension=1 is set on a per-file basis in
archive_write_set_format_pax.c.
need_extension=1 is triggered by these conditions:
* too long or non-ASCII path
* too long or non-ASCII link
* too large file
* too long GID or UID
* too long or non-ASCII group name or user name
* ACL entries and extended attributes
* sparse info
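The path condition in the list above can be approximated ahead of time, to predict which files are likely to get extended headers. The sketch below assumes the classic POSIX ustar limits (a 100-byte name field plus a 155-byte prefix field, split at a "/") and treats any non-ASCII path as needing an extension. It is not a copy of the logic in archive_write_set_format_pax.c, which may differ in details, and the function names are hypothetical.

```python
USTAR_NAME_MAX = 100    # bytes available in the ustar "name" field
USTAR_PREFIX_MAX = 155  # bytes available in the ustar "prefix" field


def fits_ustar(path):
    """Return True if `path` is pure ASCII and fits the ustar name/prefix fields."""
    try:
        raw = path.encode("ascii")
    except UnicodeEncodeError:
        return False  # non-ASCII path: assumed to need a pax "path" record
    if len(raw) <= USTAR_NAME_MAX:
        return True
    # Otherwise look for a "/" that splits the path into prefix + name.
    for i, byte in enumerate(raw):
        if byte != ord("/"):
            continue
        prefix_len = i
        name_len = len(raw) - i - 1
        if 0 < prefix_len <= USTAR_PREFIX_MAX and 0 < name_len <= USTAR_NAME_MAX:
            return True
    return False


def paths_needing_extension(paths):
    """Return the paths that would not fit a plain ustar header (approximation)."""
    return [p for p in paths if not fits_ustar(p)]


if __name__ == "__main__":
    samples = ["src/main.c", "a" * 101, "a" * 150 + "/" + "b" * 100]
    for path in paths_needing_extension(samples):
        print(len(path), path[:40])
```

Feeding it the output of the find command should give a rough list of candidates; the other conditions (large files, long IDs, ACLs, sparse files) are not covered here.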
In my case the file hierarchy is indeed very deep, and these three files
also have the "path" attribute set.
I think it is a bug that archive_write_set_format_pax.c writes the ctime
attribute whenever one of the above conditions is satisfied, because
ctime can't be controlled by the user and will therefore always cause a
difference.
So I have two questions:
1. How do I actually achieve deterministic output from tar(1)?
2. Is there agreement that it is a bug that a too long or non-ASCII
path name triggers the leakage of ctime into a tar file?
Yuri
More information about the freebsd-hackers mailing list