Discussion:
git fails on large repo clone on intermittent, or intermittently-high-latency, connections
Zenaan Harkness
2011-01-05 14:28:40 UTC
Hi, I am trying to clone opentaps.git. The following is my third try,
and I am giving up now. As you can see I pressed <Return> every now
and then, and for the last long while, absolutely nothing downloading
- the connection has clearly died once again. My last run was
yesterday and I left it running overnight.

With a git clone (the initial repo download) I cannot get more than
roughly 100MiB. I am on a satellite connection. I also experienced this
with a wireless connection about 18 months ago.

In between (a few months ago) I spent a couple of months at
a friend's place and never had the same problem: a nice ADSL2+
connection, ~1.5MiB/s. As you can see, with my satellite
(rural) and also my older wireless (also rural) connections I do
not get more than about 64KiB/s, and it's usually slower. I've
always had satellite latency on the order of 450ms, and
sometimes the odd dropout.

Unlike git, wget not only retries and continues from where it left
off when it retries over HTTP (20 tries by default, I think), but I
can also completely INTerrupt wget and start it again from an entirely
different computer if I want (using wget's --continue option), and it
will happily continue right where it left off, and eventually my
download completes.

NOT so with git! :

$ git clone git://gitorious.org/opentaps/opentaps.git opentaps.git
Cloning into opentaps.git...
remote: Counting objects: 105724, done.
remote: Compressing objects: 100% (30417/30417), done.
Receiving objects: 5% (5888/105724), 10.44 MiB | 21 KiB/s
Receiving objects: 5% (5898/105724), 12.18 MiB | 51 KiB/s
Receiving objects: 5% (5920/105724), 17.47 MiB | 38 KiB/s
Receiving objects: 5% (5923/105724), 19.64 MiB | 23 KiB/s
Receiving objects: 5% (5939/105724), 30.01 MiB | 27 KiB/s
Receiving objects: 5% (6184/105724), 41.00 MiB | 47 KiB/s
Receiving objects: 7% (7818/105724), 52.77 MiB | 58 KiB/s
Receiving objects: 8% (9170/105724), 67.66 MiB | 56 KiB/s
Receiving objects: 10% (11309/105724), 70.57 MiB | 24 KiB/s
Receiving objects: 12% (13413/105724), 82.43 MiB | 29 KiB/s
Receiving objects: 12% (13495/105724), 96.81 MiB | 39 KiB/s
Receiving objects: 12% (13495/105724), 101.57 MiB | 47 KiB/s
Receiving objects: 12% (13523/105724), 142.64 MiB | 27 KiB/s
<here it died, after over an hour dead, I killed it completely>

Git cannot operate robustly with larger repos, it appears to me, on
internet connections with even slightly flaky links.

I've googled for a tar-ball of the git repo for opentaps, but found nothing.

What can I do to work around my flaky link?

How hard would it be to add a wget-like mode to git, for the initial
repo download?

TIA
Zen
Jakub Narebski
2011-01-05 15:26:31 UTC
Post by Zenaan Harkness
What can I do to work around my flaky link?
Ask the project in question to provide a bundle of the repository for
seeding the initial clone (see the git-bundle manpage); a bundle is an
ordinary file, and can be downloaded via HTTP or even P2P.
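
A minimal sketch of the bundle route, under stated assumptions: the
script builds a throwaway repository as a stand-in for the project's
server-side copy, packs it into a bundle, clones from that bundle, and
then points origin at the live repository to pick up newer commits. The
paths, the file name seed.bundle, the add_commit helper and the example
identity are all hypothetical; in real use the bundle would be fetched
with a resumable downloader and origin would be the project's real URL.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# add_commit TEXT: hypothetical helper that records one commit upstream.
add_commit() {
    echo "$1" >> "$work/upstream/notes.txt"
    git -C "$work/upstream" add notes.txt
    git -C "$work/upstream" -c user.name=Example \
        -c user.email=example@example.org commit -q -m "$1"
}

# Stand-in for the project's repository (assumption, not opentaps itself).
git init -q "$work/upstream"
add_commit "first"

# Project side: pack every ref into one ordinary file.
git -C "$work/upstream" bundle create "$work/seed.bundle" --all

# Upstream moves on after the bundle was published.
add_commit "second"

# Client side: clone from the downloaded file instead of the network.
git clone -q "$work/seed.bundle" "$work/clone"

# Point origin at the live repository and fetch only what is missing.
git -C "$work/clone" remote set-url origin "$work/upstream"
git -C "$work/clone" pull -q --ff-only
```

Only the final pull has to go through the git protocol, and it carries
just the commits made after the bundle was created.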
Post by Zenaan Harkness
How hard would it be to add a wget-like mode to git, for the initial
repo download?
Very hard. Though "resumable clone" has often been requested (25% of
responders in "Git User's Survey 2010", see [1]), and there has even
been some discussion about a possible implementation, it has not been
implemented yet, even as a proof of concept.

The trouble is that the packfile is *generated for a client*, and the
bit-for-bit representation of said pack can vary from one request to
the next (e.g. if multithreaded packing is enabled; usually a good
idea), so a client cannot simply resume at a byte offset.

[1]: https://git.wiki.kernel.org/index.php/GitSurvey2010#17._Which_of_the_following_features_would_you_like_to_see_implemented_in_git.3F
--
Jakub Narebski
Poland
ShadeHawk on #git
Jonathan Nieder
2011-01-05 17:54:12 UTC
Post by Jakub Narebski
Post by Zenaan Harkness
How hard would it be to add a wget-like mode to git, for the initial
repo download?
Very hard. Though "resumable clone" has often been requested (25% of
responders in "Git User's Survey 2010", see [1]), and there has even
been some discussion about a possible implementation, it has not been
implemented yet, even as a proof of concept.
The trouble is that packfile is *generated for a client*, and
bit-for-bit representation of said pack can vary (e.g. if
multithreaded packing is enabled; usually a good idea).
That said, one possible partial solution would be to automate
generation of a seed bundle for huge repositories (with a script or
a special parameter to "git gc", maybe) and to document serving such a
seed bundle over HTTP as part of the standard setup. If this could be
made simple enough that e.g. all large repos on repo.or.cz had such a
seed bundle then I would call it a success. :)
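
One way such automation might look, as a sketch only: a small script,
run from cron or a hook, that regenerates the seed bundle next to the
files the web server already serves. make_seed_bundle, the www
directory and the project.git name are hypothetical, and nothing like
this is built into "git gc"; the demo part creates a throwaway
repository so the script can be run as is.

```shell
#!/bin/sh
set -eu

# make_seed_bundle REPO OUTDIR (hypothetical helper): write
# OUTDIR/<name>.bundle containing every ref of the repository REPO.
# Both arguments are expected to be absolute paths.
make_seed_bundle() {
    repo=$1
    outdir=$2
    name=$(basename "$repo" .git)
    tmp="$outdir/.$name.bundle.tmp"
    git -C "$repo" bundle create "$tmp" --all
    # Rename into place so a download never sees a half-written file.
    mv "$tmp" "$outdir/$name.bundle"
}

# Demo with a throwaway repository standing in for a hosted project.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/src"
echo "hello" > "$work/src/README"
git -C "$work/src" add README
git -C "$work/src" -c user.name=Example \
    -c user.email=example@example.org commit -q -m "initial commit"
git clone -q --bare "$work/src" "$work/project.git"

mkdir "$work/www"   # stand-in for a directory served over HTTP
make_seed_bundle "$work/project.git" "$work/www"
```

A client would then download project.bundle with any resumable HTTP
tool and clone from the file.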
Jakub Narebski
2011-01-05 20:00:48 UTC
Post by Jonathan Nieder
Post by Jakub Narebski
Post by Zenaan Harkness
How hard would it be to add a wget-like mode to git, for the initial
repo download?
Very hard. Though "resumable clone" has often been requested (25% of
responders in "Git User's Survey 2010", see [1]), and there has even
been some discussion about a possible implementation, it has not been
implemented yet, even as a proof of concept.
The trouble is that packfile is *generated for a client*, and
bit-for-bit representation of said pack can vary (e.g. if
multithreaded packing is enabled; usually a good idea).
That said, one possible partial solution would be to automate
generation of a seed bundle for huge repositories (with a script or
a special parameter to "git gc", maybe) and to document serving such a
seed bundle over HTTP as part of the standard setup. If this could be
made simple enough that e.g. all large repos on repo.or.cz had such a
seed bundle then I would call it a success. :)
I wonder if adding a per-project _bundle_ link and a 'bundle' action
to gitweb (perhaps only if caching is turned on) would help there...
though I am not sure whether downloading from gitweb is resumable.
--
Jakub Narebski
Poland
James Cloos
2011-01-09 20:04:55 UTC
In addition to the other replies, if you have a shell login elsewhere you
can clone there, bundle the repository, and use rsync, http, ftp or the
like to copy the bundle down.

If the remote site's git is too old to have git bundle, use a bare clone
and tar it. You will not need to compress the tar.

You can also use split(1) to break up the bundle or tar into smaller
chunks if that helps. cat(1) will happily recombine those chunks.
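
A sketch of the tar-and-split fallback, assuming a throwaway repository
in place of the one on the remote shell account. The directory names,
the 4096-byte chunk size and the example identity are illustrative
choices only; over a real link each chunk would be copied down
separately and checked with a checksum rather than with cmp.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for the repository on the remote machine.
git init -q "$work/project"
echo "hello" > "$work/project/README"
git -C "$work/project" add README
git -C "$work/project" -c user.name=Example \
    -c user.email=example@example.org commit -q -m "initial commit"

# Remote side: bare clone, uncompressed tar, fixed-size chunks.
git clone -q --bare "$work/project" "$work/project.git"
tar -cf "$work/project.tar" -C "$work" project.git
mkdir "$work/chunks"
split -b 4096 "$work/project.tar" "$work/chunks/project.tar."

# Local side, once every chunk has arrived: the shell expands the
# glob in sorted order, so cat restores the original byte sequence.
mkdir "$work/local"
cat "$work/chunks/project.tar."* > "$work/local/project.tar"
cmp "$work/project.tar" "$work/local/project.tar"

# Unpack the bare repository and clone a working copy from it.
tar -xf "$work/local/project.tar" -C "$work/local"
git clone -q "$work/local/project.git" "$work/local/project"
```

A failed transfer then costs at most one chunk, since chunks that
already arrived intact do not need to be fetched again.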

If git bundle was not available, you can use the copied bare repo as
a --reference for a new clone, then copy the bare repo's pack files into
that new clone and remove the new clone's objects/info/alternates file.
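
The --reference steps can be sketched as follows, with local
directories standing in for the real upstream and for the bare
repository that was copied down. All paths, the add_commit helper and
the example identity are hypothetical, and the file:// URL is used only
so that the clone goes through the normal pack transfer instead of a
local copy.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# add_commit TEXT: hypothetical helper that records one commit upstream.
add_commit() {
    echo "$1" >> "$work/upstream/notes.txt"
    git -C "$work/upstream" add notes.txt
    git -C "$work/upstream" -c user.name=Example \
        -c user.email=example@example.org commit -q -m "$1"
}

git init -q "$work/upstream"
add_commit "first"

# The bare repository as it was copied down earlier, fully packed.
git clone -q --bare "$work/upstream" "$work/copied.git"
git -C "$work/copied.git" repack -a -d -q

# Upstream has gained a commit since that copy was taken.
add_commit "second"

# New clone borrows existing objects from copied.git via alternates,
# so only the missing objects are transferred.
git clone -q --reference "$work/copied.git" \
    "file://$work/upstream" "$work/new"

# Make the new clone self-contained: copy the packs, drop the link.
cp "$work/copied.git/objects/pack/"* "$work/new/.git/objects/pack/"
rm "$work/new/.git/objects/info/alternates"

# Check that every object is now reachable inside the new clone.
git -C "$work/new" fsck
```

Later git releases (2.3 and newer, as far as I know) also accept
--dissociate together with --reference, which performs the copy and
detach step as part of the clone.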

I've had to use that method to get a clean clone across a small straw
(dialup or wireless) for several large repositories over the years.

-JimC
--
James Cloos <***@jhcloos.com> OpenPGP: 1024D/ED7DAEA6