Git new feature when cloning
Date: Fri, 29 Jul 2022 11:41:53 UTC
Hi,

A while back, Git grew a way to filter the objects it asks the server for when cloning. This can speed up the download because less data is transferred, and as a bonus it also stores less information locally. The only drawback is that when you ask for information it does not have locally, Git has to download the missing data, which it then stores locally so you never download the same thing twice. (This is done under the hood and you don't see it happening; the only thing you'll notice is the command taking a bit longer to return.)

It all happens with the --filter argument to git clone; see git-rev-list(1) for the whole explanation and the range of things you can do. It can filter a few things, but in order of information downloaded, the most common values for our usage are:

--filter=blob:none
    This downloads all the commits and all the trees (which are the file
    lists of the directories), and only the blobs needed to check out the
    branch you asked for.

--filter=tree:0
    This downloads all the commits, and only the trees and blobs needed to
    check out the branch you asked for.

Both of those can be combined with --sparse, which enables sparse checkout: at first only the files in the root directory are checked out, and you use git sparse-checkout to add/remove paths from the checkout. That can be useful if you don't have a lot of disk space and need multiple checkouts to work on.

Note that you can't really use --sparse on the ports tree if you want to build things out of it, because you would need to add all the dependencies, and the framework, to build a port. A kernel developer, though, can probably live with only having the kernel sources and not the whole world.
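To see the mechanics end to end without touching the FreeBSD servers, here is a minimal sketch against a throwaway local repository (the names "server.git", "work" and "blobless" are made up for the demo; the flags and the uploadpack.allowFilter knob are the real git ones):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Build a tiny "server" repository with a single commit.
git init -q work
cd work
echo hello > file.txt
git add file.txt
git -c user.email=you@example.org -c user.name=you commit -qm "first"
cd ..
git init -q --bare server.git
git -C server.git config uploadpack.allowFilter true   # the server-side knob
git -C work push -q "$tmp/server.git" HEAD:main
git -C server.git symbolic-ref HEAD refs/heads/main

# Blobless clone: commits and trees come down now, blobs only on demand.
git clone -q --filter=blob:none "file://$tmp/server.git" blobless

# The clone records that its origin is a "promisor" remote, i.e. one it
# may go back to later for objects that were filtered out.
git -C blobless config remote.origin.promisor
```

The remote.origin.promisor setting is what makes the later on-demand fetches work: any git command that needs a missing blob will transparently ask that remote for it.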
And for numbers, because we all love numbers:

| filter           | SRC   | PORTS | DOC  |
|------------------|-------|-------|------|
| blob:none        | 605M  | 576M  | 119M |
| blob:none sparse | 314M  | 498M  | 37M  |
| tree:0           | 407M  | 238M  | 97M  |
| tree:0 sparse    | 115M  | 115M  | 15M  |
| no filtering     | 1461M | 1010M | 321M |

This is the size of .git/objects for a checkout done this morning, so it is basically the amount of data downloaded from the server.

Note that, contrary to --depth=X, which limits the number of commits you get from the server and yields a repository that is OK for testing but not great for development because of some limitations, the repository you get with --filter is fully usable. The only drawback is that if you need bits of history you filtered out, they will be downloaded on the fly, so internet access may be required.

PS: as filtering is done on the server, a knob needed to be enabled on our servers; GitLab and GitHub already supported the feature. gitrepo.f.o and gitrepo-dev.f.o have it enabled. I am unsure about the mirror status, but they should be ok too.

-- 
Mathieu Arnold
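The sparse rows above come from combining --sparse with a filter. A small self-contained sketch of that workflow, again against a throwaway local repository (the sys/ and bin/ directory names are purely illustrative):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A tiny repository with two subdirectories.
git init -q work
cd work
mkdir sys bin
echo kernel > sys/kernel.c
echo ls > bin/ls.c
git add .
git -c user.email=you@example.org -c user.name=you commit -qm "first"
cd ..
git init -q --bare server.git
git -C server.git config uploadpack.allowFilter true
git -C work push -q "$tmp/server.git" HEAD:main
git -C server.git symbolic-ref HEAD refs/heads/main

# Sparse blobless clone: only the root directory is checked out at first.
git clone -q --filter=blob:none --sparse "file://$tmp/server.git" sparse
ls sparse                        # neither sys/ nor bin/ yet

# Opt into just the subtree you need, e.g. the kernel sources.
git -C sparse sparse-checkout add sys
ls sparse                        # sys/ is now present, bin/ still is not
```

This is the "kernel developer" scenario from above: only the sys/ blobs get downloaded and checked out, everything else stays on the server until asked for.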