Re: Repeatable builds using pkgbase

From: Doug Rabson <dfr_at_rabson.org>
Date: Wed, 30 Aug 2023 15:09:30 UTC
On Wed, 30 Aug 2023 at 15:59, Doug Rabson <dfr@rabson.org> wrote:

>
>
> On Mon, 21 Aug 2023 at 17:26, Doug Rabson <dfr@rabson.org> wrote:
>
>>
>>
>> On Mon, 21 Aug 2023 at 17:23, Baptiste Daroussin <bapt@freebsd.org>
>> wrote:
>>
>>> On Mon, Aug 21, 2023 at 02:33:24PM +0100, Doug Rabson wrote:
>>> > While working on build scripts for FreeBSD container images, I wanted to
>>> > get to the point where my builds are repeatable, i.e. if I create two
>>> > images with the same set of packages installed in the same order, they
>>> > should be identical.
>>> >
>>> > The main stumbling block is timestamps. I can force all the file timestamps
>>> > to a fixed value with buildah using the '--timestamp' argument to either
>>> > 'buildah commit' or 'buildah build' but even then, the two images have
>>> > different hashes. Looking deeper, the difference is in
>>> > /var/db/pkg/local.sqlite. If I compare SQL dumps of the databases from each
>>> > image, I can see a timestamp embedded in the sqlite file:
>>> >
>>> > diff dump1 dump2
>>> >
>>> > 4c4
>>> > < INSERT INTO packages VALUES(1,'base','FreeBSD-zoneinfo','13.2p2','zoneinfo package','zoneinfo package',NULL,NULL,'FreeBSD:13:amd64','re@FreeBSD.org','https://www.FreeBSD.org','/',731014,0,0,1,1692446701,'2$2$c9w95oqai9bwhny1k4pcg8mji77xgk43zjxxb69j1duzq5jao18wak4deer85epmfpc8ngyysyt9wu74pg7sczkqc3ekyawkfgwzi8d',NULL,NULL,0);
>>> > ---
>>> > > INSERT INTO packages VALUES(1,'base','FreeBSD-zoneinfo','13.2p2','zoneinfo package','zoneinfo package',NULL,NULL,'FreeBSD:13:amd64','re@FreeBSD.org','https://www.FreeBSD.org','/',731014,0,0,1,1692622924,'2$2$c9w95oqai9bwhny1k4pcg8mji77xgk43zjxxb69j1duzq5jao18wak4deer85epmfpc8ngyysyt9wu74pg7sczkqc3ekyawkfgwzi8d',NULL,NULL,0);
>>> >
>>> > Looking at the pkg source, I can see that the prepared statement for
>>> > inserting into the packages table explicitly uses NOW() for this column.
>>> > Would it be reasonable to allow changing this, e.g. by adding a command
>>> > line argument to pkg to override the default? I haven't tried this to see
>>> > if that makes the two databases identical - if not, I guess I'll just
>>> > remove pkg metadata altogether.
>>>
>>> Yes, this would be reasonable. If you use an env var, please respect
>>> SOURCE_DATE_EPOCH.
>>>
>> I'll try this out, probably using an env var as you suggest. Hopefully
>> there is nothing non-deterministic in sqlite which would stop this from
>> being reproducible.
>>
>
> Sadly, even if I override the timestamp written to the packages table, the
> resulting local.sqlite files on two consecutive runs are still different.
> If I compare the two using 'sqlite3 local.sqlite .dump', the SQL dumps are
> identical, so something else in sqlite is making the files
> non-reproducible. I guess I'll have to fall back to plan B and remove the
> package metadata from my images.
>

Weirdly, if I regenerate the local.sqlite file using sqlite3's .dump and
.read commands, the resulting DB file does have a consistent hash, so that
might be a plan C.
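
For illustration only, plan C could be sketched roughly as below. This is not
what pkg or buildah do. It uses Python's sqlite3 module (iterdump and
executescript) as a stand-in for the sqlite3 shell's .dump and .read, and the
helper names (rebuild, sha256_of, make_demo_db) and the tiny demo table are
made up for the example. The assumption is that two databases with the same
logical content but different write histories produce the same dump, and that
replaying that dump into a fresh file with the same SQLite build gives the
same bytes.

```python
import hashlib
import os
import sqlite3
import tempfile


def rebuild(src_path, dst_path):
    """Recreate dst_path from a SQL dump of src_path (roughly .dump then .read)."""
    src = sqlite3.connect(src_path)
    try:
        script = "\n".join(src.iterdump())
    finally:
        src.close()
    # Always start from an empty file so no old pages or counters survive.
    if os.path.exists(dst_path):
        os.remove(dst_path)
    dst = sqlite3.connect(dst_path)
    try:
        dst.executescript(script)
    finally:
        dst.close()


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def make_demo_db(path, churn):
    """Hypothetical stand-in for local.sqlite; churn adds extra write history."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE packages (id INTEGER PRIMARY KEY, name TEXT, time INTEGER)"
    )
    if churn:
        # Extra committed transactions that leave no logical trace.
        db.execute("INSERT INTO packages VALUES (99, 'scratch', 0)")
        db.commit()
        db.execute("DELETE FROM packages WHERE id = 99")
        db.commit()
    db.execute("INSERT INTO packages VALUES (1, 'FreeBSD-zoneinfo', 0)")
    db.commit()
    db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.sqlite")
        b = os.path.join(d, "b.sqlite")
        make_demo_db(a, churn=False)
        make_demo_db(b, churn=True)
        print("original hashes equal:", sha256_of(a) == sha256_of(b))
        ra = os.path.join(d, "a.rebuilt.sqlite")
        rb = os.path.join(d, "b.rebuilt.sqlite")
        rebuild(a, ra)
        rebuild(b, rb)
        print("rebuilt hashes equal: ", sha256_of(ra) == sha256_of(rb))
```

In an image build, the rebuilt file would replace /var/db/pkg/local.sqlite
before the commit step. Whether the result is stable across different SQLite
versions is untested here.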