OpenZFS branch tracking policy
Martin Matuska
mm at FreeBSD.org
Sat Apr 10 23:03:34 UTC 2021
Thank you for your comments, Warner.
What I would like to know is the timing: how much time do we need to
resolve the issues? I can pull in the OpenZFS code up to commit
3522f57b6 the "old" way. This is the last commit common to master and
zfs-2.1-release, and it can be cherry-picked to stable/13 the "old" way
as well. This will keep our code on par with openzfs-2.1-rc1 (rc2 is out
now). I can add a 2-week MFC for stable/13 as usual, although there are
no significant changes at all. After that we need to split main and
stable/13 and ideally move to direct tracking of OpenZFS.
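
For reference, a minimal sketch of how the last commit common to two
branches can be identified with git merge-base. The throwaway
repository, its empty commits and the commit messages are assumptions
made purely for illustration; only the branch names follow the ones
used upstream.

```shell
#!/bin/sh
# Sketch: find the last commit shared by two branches in a throwaway repo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Name the initial branch "master" regardless of the local git default.
git symbolic-ref HEAD refs/heads/master

# Two commits that both branches will share.
git commit -q --allow-empty -m "common 1"
git commit -q --allow-empty -m "common 2"
common=$(git rev-parse HEAD)

# The release branch forks here; each branch then gets its own commit.
git branch zfs-2.1-release
git commit -q --allow-empty -m "master only"
git checkout -q zfs-2.1-release
git commit -q --allow-empty -m "release only"

# The merge base is the last commit reachable from both branches.
base=$(git merge-base master zfs-2.1-release)
echo "last common commit: $base"

# Number of commits the two branches have in common.
shared=$(git rev-list --count "$base")
echo "shared commits: $shared"
```

In a real OpenZFS checkout only the last two commands would be needed.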
I have added some comments below.
On 10. 4. 2021 21:22, Warner Losh wrote:
> Thanks for the update Martin.
>
> The tl;dr is I think this will be fine. However, I'd like to document
> the reasoning here for future cases that we may need to judge. There's
> also a couple of logistical issues at the end we need to address, one
> critical.
>
> On Sat, Apr 10, 2021 at 11:15 AM Martin Matuska <mm at freebsd.org> wrote:
>
> Here are some of the facts:
>
> - In my merge, there are 15 conflicting files due to changes in
> FreeBSD (add/add)
> - Some of the changes have already been upstreamed in later
> revisions of openzfs than 891568c99
> - A significant majority of the diffs is subject to upstreaming.
> The ideal state would be to have all changes upstreamed. Sometimes
> changes get upstreamed with modifications.
> - In general our developers open pull requests and commit to
> OpenZFS, then we merge the changes
>
> What our developers would like is to use a "git blame" on
> sys/contrib/openzfs/something to see the history path from OpenZFS.
>
> I agree that the merge commits should be more verbose, ideally
> containing a "git log --oneline" of the commits since last merge.
>
> If I do a "squashed" merge like you described with bzip2, then I
> do not import the history from OpenZFS. That way we don't need
> that at all and can continue working the way we did until now.
>
> As for what you say about adding "unnecessary" history: since
> development became common at OpenZFS, the majority of commits
> directly affect FreeBSD. Only "Linux-Only" and "CI-related" commits
> are not relevant for FreeBSD.
>
> I have updated my example branch to show how it may look with more
> detailed commit messages, nicely clickable from GitHub:
> https://github.com/mmatuska/freebsd-src/tree/openzfs_master_merged
>
> So the current question is quite simple, we can do one of the
> following:
> a) do the unsquashed merge I suggest that imports the openzfs
> history - this will make the commits very transparent, future
> merges and upstream tracking very easy, and the
> --allow-unrelated-histories flag is not required anymore. The
> "common" part of the histories in main and stable/13 will be
> identical.
> b) if that is not desired or we are undecided I will continue the
> way we go now until a better solution is found. In that case I
> will fork a second vendor branch (vendor/openzfs-2.1) that starts
> with the latest common commit of openzfs/master and
> openzfs/zfs-2.1-release and will merge (or cherry-pick?) from this
> branch directly to stable/13. As an alternative to merging, git
> cherry-pick supports -Xsubtree= as well.
>
> I'm leaning towards 'a', but that's a new way for the project to track
> vendor changes. Many of my comments were on how to mirror pulling in
> upstreams that we would want to do infrequently, and where we didn't
> care about the details so much. llvm is a good example, as would be
> bzip, though for different reasons. The former more due to the sheer
> size of the llvm repo and the extremely infrequent need for users and
> developers of FreeBSD to peer into the details. They simply aren't
> relevant for those cases. For these cases, a squashed commit makes
> sense: people don't care about the details and it keeps our repo size
> manageable and 'b' is appropriate. I had initially thought OpenZFS
> would fall into this category, but your additional details suggest
> that my initial thinking might be a poor fit to our needs.
I agree with you here. The other project I maintain, libarchive, is
another example for the 'b' approach. Imports are infrequent and
FreeBSD is primarily a "downstream consumer" of libarchive, even if
there is some code specific to FreeBSD. As for OpenZFS, there is much
more dedicated code, we are interested in more frequent pulls, and
several of our developers are directly involved in the project,
developing both "common" and "FreeBSD-related" code. What I especially
like about the OpenZFS project are the high development standards.
>
> I think that you've made a compelling case to merge in the tree. The
> potential downsides need to be looked at for doing something new.
> First is size. From the numbers you provided, OpenZFS is on the larger
> side of things we'd want to do this with. The expansion of the repo is
> concerning, so there would need to be some benefit from that. Here,
> you've clearly articulated the benefit: our OpenZFS developers drift
> back and forth between OpenZFS and FreeBSD and do development in both
> places. If these merges are frequent, this allows a more efficient
> workflow for OpenZFS maintenance. This also allows better bisecting in
> the case of trouble. One reason we don't generally want to open things
> up to merge commits is the crazy merges we did with svn that created
> weird loops. While the git transition work endeavored to eliminate
> them, a number slipped through. We do not want any more of them
> created. By that test, these commits pose no risk given the OpenZFS
> practices (and little risk outside the contrib/openzfs tree).
Are such (messy) situations even possible in git?
>
> So, the practical aspects of this: how do we do this. We'll need to
> have the OpenZFS mainline and branches in the tree, so the question of
> what namespace to put them into comes to mind. The obvious answer
> would be 'openzfs' or 'vendor/openzfs', but you want two
> branches, so maybe vendor/openzfs/main (or master, whatever it is
> called upstream) and vendor/openzfs/<branch-name> would be better
> since we could then recommend a 'refs' line for people working on
> openzfs that would let git do all the heavy lifting here. There's no
> issue with having both vendor/openzfs and vendor/openzfs/<foo> in the
> tree at the same time, I don't think. The current rule sets would
> allow this, and you could carefully push both the branches first. I
> don't think we need to do anything special except document how to do
> the first commit (for others who need to do this) and document how to
> update which I'm more than happy to help out with.
I would be happy with vendor/openzfs/master and
vendor/openzfs/zfs-2.1-release to use the same naming as OpenZFS does.
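
One possible reading of the 'refs' line, sketched under assumptions: a
remote whose fetch refspec is limited to the proposed vendor/openzfs
namespace. The remote name "freebsd" and the URL are illustrative
choices, and nothing is fetched here since the branches do not exist
yet.

```shell
#!/bin/sh
# Sketch: restrict a remote to the proposed vendor/openzfs/* branches.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Remote name and URL are assumptions for illustration only.
git remote add freebsd https://git.freebsd.org/src.git

# Replace the default catch-all refspec with one that covers only the
# proposed OpenZFS vendor namespace.
git config remote.freebsd.fetch \
    '+refs/heads/vendor/openzfs/*:refs/remotes/freebsd/vendor/openzfs/*'

# Show the resulting fetch configuration.
refspecs=$(git config --get-all remote.freebsd.fetch)
echo "$refspecs"
```

With such a refspec, a later "git fetch freebsd" would only bring in
branches below vendor/openzfs/, for example vendor/openzfs/master and
vendor/openzfs/zfs-2.1-release.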
>
> One critical thing we need to assess before you proceed, however:
> mail. We need to make sure we're not about to send 7k emails as all
> these revisions suddenly appear in the repo... Having an extra
> 7k revs in the repo will be no problem, but 7k extra emails might
> raise a comment or two...
Is there a way to simulate this?
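
A rough local estimate might look like the sketch below: count the
commits reachable from the new vendor branch that are not reachable
from main. The throwaway repository and its commits are illustrative.
That the mailer sends one mail per new commit is an assumption, so the
number is only an upper bound under that assumption.

```shell
#!/bin/sh
# Sketch: count commits a new, unrelated vendor branch would add to a repo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Existing project history on "main".
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "freebsd 1"
git commit -q --allow-empty -m "freebsd 2"

# Unrelated upstream history on an orphan vendor branch.
git checkout -q --orphan vendor/openzfs/master
for i in 1 2 3; do
    git commit -q --allow-empty -m "openzfs $i"
done

# Commits reachable from the vendor branch but not from main.
new=$(git rev-list --count vendor/openzfs/master --not main)
echo "commits new to the repo: $new"
```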
>
> Comments?
>
> Warner
>
> Best regards,
> mm
>
> On 10. 4. 2021 0:15, Warner Losh wrote:
>>
>>
>> On Fri, Apr 2, 2021 at 6:44 PM Martin Matuska <mm at freebsd.org> wrote:
>>
>> I have prepared an example merged branch here:
>> https://github.com/mmatuska/freebsd-src/tree/openzfs_master_merged
>>
>> The magical command was:
>> git merge -s subtree -Xsubtree="sys/contrib/openzfs" 891568c99
>> --allow-unrelated-histories
>>
>> Luckily, our current diff is manageable.
>>
>>
>> So I did this for bzip2 using approximately:
>>
>> git remote add bzip2 <url>
>> git fetch bzip2
>> git merge -s subtree -Xsubtree=contrib/bzip2 bzip2/master
>> --allow-unrelated-histories --squash
>>
>> [1] At this point I resolved conflicts, which were the entire
>> files since I guess I didn't bootstrap right to the last merge.
>> There were 4 files in conflict.
>>
>> Then I did a git add of all the files in conflict and a git commit.
>>
>> This produced a good commit. Since it was a squash commit, there
>> were no issues.
>>
>> However, it turns out I botched the commit at point [1] above. So
>> I ran this again and got a conflict for the whole file that I'd
>> removed a blank line from.
>>
>> So, this looks like it could be workable, but does lead me to a
>> few questions:
>>
>> (1) How do we do this so that the conflicts aren't add/add
>> conflicts? Is there some way to bootstrap this?
>> (2) Do we need to keep track of the last merge point and use that
>> in merging the next one in?
>> (3) I assume we keep track of FreeBSD diffs in a branch off <url>
>> and we merge that instead of master.
>> (4) What do we do about adjustments to the build that are needed?
>> (5) Do we need to host a FreeBSD-specific repo with this stuff,
>> maybe with tags we don't want widely pushed to ease the next
>> merge? Eg, make this the first case of a 'vendor repo' that we
>> then pull squash commits from so that the vendor repo can track
>> upstream, but not otherwise be pushed to all our users....
>>
>> Finally, how did you deal with [1] producing so many full-file
>> add/add conflicts? Oh, and what kind of commit message when
>> things merge do you suggest? I rather like your 'bring in hash
>> XXXX branch blah, here's the important highlights' emails and
>> think that would be a good first cut at advice on what to put in
>> these.
>>
>> This suggests the current answer is 'seems doable, but we need to
>> document it and come up with recommendations for how to do it'.
>>
>> Warner
>>
>> On 3. 4. 2021 1:37, Martin Matuska wrote:
>> > Hi Warner and Ed,
>> >
>> > 2.1-release has already been branched. The stable branch policy in
>> > OpenZFS is somewhat strange, they make a staging branch for each
>> > patchlevel release, but the commits are continuous.
>> >
>> > To have some idea how big the repo history is:
>> >
>> > $ git rev-list master --count
>> > 6662
>> >
>> > $ git rev-list zfs-2.1-release --count
>> > 6650
>> >
>> > master and zfs-2.1-release have 6650 common commits at the moment
>> >
>> > $ git log master | wc -l
>> > 129868
>> >
>> > (linecount - 4 * revcount) / revcount = linecount / revcount - 4 =
>> > 15.4938 comment lines per commit on average
>> >
>> > Initial commit was made in Feb 26, 2008.
>> >
>> > Yearly commit counts:
>> >
>> > $ git log master | grep -c -E '^Date:.* 2020 -[0-9]+$'
>> > 666
>> >
>> > $ git log master | grep -c -E '^Date:.* 2019 -[0-9]+$'
>> > 535
>> >
>> > $ git log master | grep -c -E '^Date:.* 2018 -[0-9]+$'
>> > 428
>> >
>> > Martin
>> >
>> > On 2. 4. 2021 20:15, Warner Losh wrote:
>> >>
>> >>
>> >> On Fri, Apr 2, 2021 at 11:56 AM Ed Maste <emaste at freebsd.org> wrote:
>> >>
>> >> On Fri, 2 Apr 2021 at 11:50, Warner Losh <imp at bsdimp.com> wrote:
>> >> >
>> >> > We'd always hoped that we'd be able to do subtree merges from
>> >> > upstreams that use git into FreeBSD. The big worry, though, was
>> >> > that this would needlessly bloat the repo with a lot of history.
>> >> > We don't want, for example, all of LLVM's history in the tree.
>> >> > We'd always anticipated that there'd be some things we'd just
>> >> > accept the history for, since it is similar in character to the
>> >> > vendor branches (though of course a bit more).
>> >>
>> >> Note that if we do want to avoid bringing in the full history `git
>> >> subtree merge` supports a `--squash` option. This brings in the set
>> >> of upstream changes as a single commit, without bringing along the
>> >> associated history. We will need to do more experimentation to
>> >> confirm that the full process, including bootstrapping, will work as
>> >> we want. Assuming this all works it should allow us to forgo the use
>> >> of a FreeBSD-specific vendor branch in src.
>> >>
>> >> We've discussed mirroring any such 3rd-party source in some
>> >> FreeBSD-controlled repository. This would allow the project to
>> >> retain a full copy of the history, but avoid bloating src with it.
>> >>
>> >> I agree with Warner that we may want a different policy (full
>> >> history or snapshots) for different contrib sources.
>> >>
>> >>
>> >> Good points Ed. I'd forgotten about --squash.
>> >>
>> >> Martin, what's your timeline for wanting to implement these things?
>> >> I'm unfamiliar with the OpenZFS schedules.
>> >>
>> >> Warner