Discussion:
rsync busy non-bare git repo 'source' to quiet
Neal Kreitzinger
2011-03-08 21:25:00 UTC
Does anyone have an example of an rsync bash script that will make a good
copy of a non-bare git repo (including the working tree) while the "source"
git repo is busy and the "destination" git repo is quiet?

We do not use symlinks in our working-tree or in our .git directory (unless
git is using symlinks on its own behind-the-scenes that I am not aware of).

This would make it very easy for us to refresh our "beta" livebox to emulate
the current "gold" livebox using a single rsync instead of a combination of
rsync and git-clone/pull due to the pieces that git does not replicate (ie,
hooks) and the non-git components of our git based change control menu
system (which is written in bash scripts on linux).

Thanks!

v/r,
Neal
Jeff King
2011-03-08 21:39:59 UTC
Post by Neal Kreitzinger
> Does anyone have an example of an rsync bash script that will make a good
> copy of a non-bare git repo (including the working tree) while the "source"
> git repo is busy and the "destination" git repo is quiet?
Don't copy the working tree. It's redundant with the repo data (assuming
your working tree is clean), so you are probably faster to sync .git and
then "reset --hard". And then you don't have to worry about fetching an
inconsistent working tree state.

For syncing the repo, I think you would need to do:

1. Copy the refs to a local temp space.

2. Copy the object db.

3. Install your temp refs into place.

That way, for any updates in progress you will either not copy them
(because the refs weren't in place in step 1, though you may have some
of their objects), or if you do copy them, you are guaranteed to have
all of the necessary objects (because git will not update the ref until
all objects are in place).
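
The three steps above could be sketched roughly as follows. This is only a minimal illustration, not a tested production script: the function name `sync_git_dir` is made up, copying HEAD along with the refs is an assumption, the extra pass for hooks and config is there because the original question asked for them, and the `tmp_*` exclude assumes git's in-progress temporary files use that prefix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sync_git_dir SRC_GIT_DIR DEST_GIT_DIR   (hypothetical helper name)
# Copies a busy source .git directory into a quiet destination .git directory.
sync_git_dir() {
    local src=$1 dest=$2 tmp f
    tmp=$(mktemp -d)
    mkdir -p "$dest/objects" "$tmp/refs"

    # 1. Snapshot the refs (plus HEAD, an assumption) into local temp space.
    rsync -a --include='/HEAD' --include='/packed-refs' \
          --include='/refs/***' --exclude='*' "$src/" "$tmp/"

    # 2. Copy the object db. No --delete: leftover old objects are harmless.
    #    Skip files assumed to be git's in-progress temporaries.
    rsync -a --exclude='tmp_*' "$src/objects/" "$dest/objects/"

    # Extra pass (assumption): hooks, config and other non-ref, non-object files.
    rsync -a --exclude='/objects/' --exclude='/refs/' --exclude='/packed-refs' \
          --exclude='/HEAD' --exclude='*.lock' "$src/" "$dest/"

    # 3. Install the ref snapshot taken in step 1.
    rsync -a --delete "$tmp/refs/" "$dest/refs/"
    for f in HEAD packed-refs; do
        if [ -e "$tmp/$f" ]; then
            cp -p "$tmp/$f" "$dest/$f"
        else
            rm -f "$dest/$f"
        fi
    done
    rm -rf "$tmp"
}
```

It would be called with placeholder paths such as `sync_git_dir /gold/repo/.git /beta/repo/.git`. Because the refs are snapshotted before the objects are copied, any ref that ends up at the destination should already have its objects there.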

But I really have to wonder why you don't simply use git to do the
fetch? It already does the right thing with respect to updates, and it
will be way more efficient than rsync in the face of repacking (or if
you really do want to use rsync, there is even an rsync transport for
Post by Neal Kreitzinger
> This would make it very easy for us to refresh our "beta" livebox to emulate
> the current "gold" livebox using a single rsync instead of a combination of
> rsync and git-clone/pull due to the pieces that git does not replicate (ie,
> hooks) and the non-git components of our git based change control menu
> system (which is written in bash scripts on linux).
You just don't want to do a pull in addition to an rsync. But I don't
think this solution will be any less complex.

-Peff
Neal Kreitzinger
2011-03-08 22:20:33 UTC
One reason is that we only have one development box. We would have to
open up the canonical repo and development repos to git:// protocol
access. If I use git I have to do the initial clones for creation and
then the pulls for refreshes. rsync will do the creation and refreshes
via the same script. If I use git I would have to rsync the hooks,
config, and anything else git doesn't bring over. On the goldbox I have
a bare repo that mirrors the canonical repo and then has additional
branches and is in turn mirrored by another bare mirror which has all
the additional branches. I would have to recreate the original remote
branch setup and then still maintain the remotes to the goldbox. Then I
wouldn't really have a simulation of the goldbox anymore because I have
extra remotes (maybe that wouldn't really hurt anything). Rsync seems
like a simpler solution and more accurate solution for creating a copy
of an ecosystem of interrelated git repos colocated on the same box.

A previous post in the newsgroup states:
> If you want your rsync backup to be fine, you need to follow some
> ordering. You need to copy the refs first (.git/packed-refs and
> .git/refs/), then the loose objects (.git/objects/??/*), and then all
> the rest. If files are copied in a different order while some write
> operations are performed on the source repository then you may end up
> with an incoherent repository.

Would that work?

v/r,
neal
Jeff King
2011-03-08 22:38:41 UTC
Post by Neal Kreitzinger
> Rsync seems like a simpler solution and more accurate solution for
> creating a copy of an ecosystem of interrelated git repos colocated on
> the same box.
Sure. It is simpler, but not atomic unless you do a multi-stage rsync.
Post by Neal Kreitzinger
> If you want your rsync backup to be fine, you need to follow some
> ordering. You need to copy the refs first (.git/packed-refs and
> .git/refs/), then the loose objects (.git/objects/??/*), and then all
> the rest. If files are copied in a different order while some write
> operations are performed on the source repository then you may end up
> with an incoherent repository.
> Would that work?
If you do it in that order, the end result will be a consistent repo.
But during the copy, the refs at the destination will point to objects
you don't have. I don't know if that matters for your case.
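
For comparison, the in-place ordering from the quoted post (refs, then loose objects, then the rest) might look something like the sketch below. The function name `ordered_git_rsync` is hypothetical, and copying HEAD in the first pass and skipping `tmp_*` and `*.lock` files are assumptions on top of what the quoted post says. The last pass deliberately leaves the refs alone so that refs updated at the source after the first pass are not picked up without their objects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ordered_git_rsync SRC_GIT_DIR DEST_GIT_DIR   (hypothetical helper name)
ordered_git_rsync() {
    local src=$1 dest=$2
    mkdir -p "$dest"

    # 1. Refs first (.git/packed-refs and .git/refs/; HEAD is an assumption).
    rsync -a --include='/HEAD' --include='/packed-refs' \
          --include='/refs/***' --exclude='*' "$src/" "$dest/"

    # 2. Loose objects, i.e. the two-hex-digit directories (.git/objects/??/*).
    rsync -a --exclude='tmp_*' --include='/objects/' \
          --include='/objects/[0-9a-f][0-9a-f]/***' \
          --exclude='*' "$src/" "$dest/"

    # 3. All the rest (packs, hooks, config, ...), without re-copying refs.
    rsync -a --exclude='tmp_*' --exclude='*.lock' --exclude='/HEAD' \
          --exclude='/packed-refs' --exclude='/refs/' "$src/" "$dest/"
}
```

Between the first and last pass the destination refs can point at objects that have not arrived yet, so this variant only makes sense when nothing reads the destination during the copy.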

-Peff
Neal Kreitzinger
2011-03-08 23:00:59 UTC
We won't be trying to use the "destination" repos during the rsync.
The workflow will be:
(1) I need to test a change to the goldbox change control menu system.
(2) I determine that my testbox "copy" of the change control menu system
has repos that are too out-of-date for a good test, so I run rsync to
make a fresh copy of the goldbox.
(3) After the rsync is finished, I pull over just the branch that
contains my untested change menu scripts. (This is the only git pull I
do from goldbox to testbox.)
(4) I test the changes and see they are good.
(5) I merge the changes into the master branch of the change control
menu non-bare repo on the goldbox. The menu runs from the working tree
so now they are live.

Since I won't be trying to access refs in the "destination" repos via
git commandline or gui while the rsync is running, it sounds like it
will be ok. I can keep people from banging on the testbox. I can't
keep them from banging on the goldbox.

In regards to the working tree of a busy "source" repo, it sounds like
it could end up not matching the index. At the end of the script I
could execute a "git reset --hard && git clean -f" on each non-bare repo
as you suggested. That would be pretty straightforward.
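
A rough sketch of that final step, assuming the script is handed the list of copied repo directories (the function name `reset_worktrees` and the paths in the usage line are placeholders):

```shell
#!/usr/bin/env bash
set -euo pipefail

# reset_worktrees REPO_DIR...   (hypothetical helper name)
# Makes each non-bare repo's working tree match its HEAD after the rsync.
reset_worktrees() {
    local repo
    for repo in "$@"; do
        # Only directories with a .git subdirectory are treated as non-bare
        # repos; anything else (e.g. a bare mirror) is skipped.
        if [ -d "$repo/.git" ]; then
            (
                cd "$repo"
                git reset --hard   # restore tracked files and the index to HEAD
                git clean -f       # remove untracked files
            )
        fi
    done
}
```

Usage would be along the lines of `reset_worktrees /beta/menu-repo /beta/other-repo`. Note that `git clean -f` on its own leaves untracked directories in place; adding `-d` would remove those as well.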

Thanks!

v/r,
Neal
