Discussion:
How to extract files out of a "git bundle", no matter what?
j***@jidanni.org
2008-12-19 19:29:00 UTC
Someone has handed you a "git bundle".
How do you get the files out of it?
If it were cpio, you would use -i, if it were tar, you would use -x...
You read the git-bundle man page.
You only get as far as
# git-bundle verify bundle.bdl
The bundle contains 1 ref
d01... /heads/master
The bundle requires these 0 ref
bundle.bdl is okay

The rest is mish-mosh. There should be an emergency example for non
git club members, even starting from apt-get install git-core, of
all the real steps needed _to get the files out of the bundle_.

Assume the user _just wants to get the files out of the bundle_ and
not learn about or participate in some project.
Shawn O. Pearce
2008-12-19 19:32:56 UTC
Post by j***@jidanni.org
Someone has handed you a "git bundle".
How do you get the files out of it?
If it were cpio, you would use -i, if it were tar, you would use -x...
You read the git-bundle man page.
You only get as far as
# git-bundle verify bundle.bdl
The bundle contains 1 ref
d01... /heads/master
The bundle requires these 0 ref
bundle.bdl is okay
The rest is mish-mosh. There should be an emergency example for non
git club members, even starting from apt-get install git-core, of
all the real steps needed _to get the files out of the bundle_.
Assume the user _just wants to get the files out of the bundle_ and
not learn about or participate in some project.
You can't just "get the files out". A bundle contains deltas,
where you need the base in order to recreate the file content.
It can't be unpacked in a vacuum.

To unpack a bundle you need to clone the project and then fetch
from it:

git clone src...
git pull bundle.bdl master

If the bundle requires 0 refs (like above) then you can init a
new repository and should be able to fetch from it:

git init
git pull bundle.bdl master
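
As a minimal sketch of the second recipe (not part of the original message), the two commands can be wrapped in a small shell function. The name `extract_bundle` and its three arguments are made up for illustration, and `git -C` assumes a Git much newer than the 1.5/1.6 releases discussed in this thread. It only applies to a bundle that requires no prerequisite commits.

```shell
# extract_bundle BUNDLE BRANCH DEST  (hypothetical helper)
# Create an empty repository in DEST and pull BRANCH out of BUNDLE into it.
extract_bundle() {
    # git -C changes directory before running, so make the bundle path absolute.
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    bundle=$dir/$(basename "$1")

    git init -q "$3" &&
    git -C "$3" pull -q "$bundle" "$2"
}
```

For the bundle in the question this would be called as `extract_bundle bundle.bdl master files`, after which the files should be checked out under `files/`.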
--
Shawn.
Mark Levedahl
2008-12-19 19:57:35 UTC
Post by Shawn O. Pearce
If the bundle requires 0 refs (like above) then you can init a
git init
git pull bundle.bdl master
With relatively recent git (not sure the version), you can just do

git clone bundle.bdl

Mark
j***@jidanni.org
2008-12-19 20:13:26 UTC
SOP> If the bundle requires 0 refs (like above) then you can init a
SOP> new repository and should be able to fetch from it:

SOP> git init
SOP> git pull bundle.bdl master

Phew, that worked. Thank you!

ML> With relatively recent git (not sure the version), you can just do
ML> git clone bundle.bdl
Not with git version 1.5.6.5, Debian sid.

Anyway, for man page completeness, I still see the day when:

SOP> You can't just "get the files out". A bundle contains deltas,
SOP> where you need the base in order to recreate the file content.
SOP> It can't be unpacked in a vacuum.

That is nice but we here at the forensics department of XYZ police
force just need to get the files out. We tried "PK UNZIP" but that
didn't extract them. We contacted the Computer Science Dept. but
that's who they're holding hostage.

SOP> To unpack a bundle you need to clone the project and then fetch
SOP> from it:

SOP> git clone src...
SOP> git pull bundle.bdl master

That is nice but the perpetrators have destroyed everything except for
that one bundle.bdl file, which contains the password to defuse the
time bomb.

There must be a way to make a "phony tree" or whatever to "attach to"
so extraction can proceed. Be sure to spell it all out on the
git-bundle man page as a reference in case some non-computer people
need to do aforementioned emergency extraction one day.
Jeff King
2008-12-19 20:21:19 UTC
Post by j***@jidanni.org
There must be a way to make a "phony tree" or whatever to "attach to"
so extraction can proceed. Be sure to spell it all out on the
git-bundle man page as a reference in case some non-computer people
need to do aforementioned emergency extraction one day.
No, that information may not even be in the bundle at all (unless it is
a bundle that has a 0-ref basis). In particular, if a bundle contains
changes between some commit A and some commit B, then:

- files that were not changed between A and B will not be included at
all

- the object pack in the bundle is "thin", meaning it may contain
deltas against objects that are reachable from A, but not B. So even
_within_ a changed file, you may see only the changes from A to B.

If the bundle has a 0-ref basis, then you can clone straight from the
bundle, which must have everything.
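
Whether a bundle has a 0-ref basis can be read from its text header without any repository at hand. The sketch below assumes the version 2 bundle layout: a `# v2 git bundle` line, one `-<sha1> <comment>` line per prerequisite commit, one `<sha1> <refname>` line per ref, then an empty line followed by the pack. The function names are hypothetical, not Git commands.

```shell
# bundle_header FILE: print the text header, up to and including the
# first empty line. sed quits before it reaches the binary pack data.
bundle_header() {
    LC_ALL=C sed '/^$/q' "$1"
}

# bundle_prereqs FILE: print the IDs of the prerequisite commits, one per line.
bundle_prereqs() {
    bundle_header "$1" | LC_ALL=C sed -n 's/^-\([0-9a-f]*\).*/\1/p'
}

# bundle_is_self_contained FILE: succeed if the bundle lists no prerequisites.
bundle_is_self_contained() {
    [ -z "$(bundle_prereqs "$1")" ]
}
```

If `bundle_is_self_contained` succeeds, cloning or pulling from the bundle should recover everything; if it fails, the listed commits have to exist in the receiving repository first.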

-Peff
j***@jidanni.org
2008-12-19 20:35:50 UTC
JK> In particular, if a bundle contains changes between some commit A
JK> and some commit B, then:

JK> - files that were not changed between A and B will not be included at
JK> all

JK> - the object pack in the bundle is "thin", meaning it may contain
JK> deltas against objects that are reachable from A, but not B. So even
JK> _within_ a changed file, you may see only the changes from A to B.

OK, we here at the police forensics department would be very happy if
we could at least get some ASCII out of that .BDL file, even if it is
just a diff shred,
- The password to the time bomb was BLORFZ
+ The password to the time bomb is NORFLZ
that would be fine. All we know is after the word PACK it is all
binary, and git-unpack-objects and git-unpack-file don't work on it.
Jeff King
2008-12-19 20:51:00 UTC
Post by j***@jidanni.org
JK> - the object pack in the bundle is "thin", meaning it may contain
JK> deltas against objects that are reachable from A, but not B. So even
JK> _within_ a changed file, you may see only the changes from A to B.
OK, we here at the police forensics department would be very happy if
we could at least get some ASCII out of that .BDL file, even if it is
just a diff shred,
- The password to the time bomb was BLORFZ
+ The password to the time bomb is NORFLZ
that would be fine. All we know is after the word PACK it is all
binary, and git-unpack-objects and git-unpack-file don't work on it.
AFAIK, there is no tool to try salvaging strings from an incomplete pack
(and you can't just run "strings" because the deltas are zlib
compressed). So if I were in the police forensics department, I think I
would read Documentation/technical/pack-format.txt and start hacking a
solution as quickly as possible.

-Peff
j***@jidanni.org
2009-01-01 04:24:59 UTC
JK> AFAIK, there is no tool to try salvaging strings from an incomplete pack
JK> (and you can't just run "strings" because the deltas are zlib
JK> compressed). So if I were in the police forensics department, I think I
JK> would read Documentation/technical/pack-format.txt and start hacking a
JK> solution as quickly as possible.

Hogwash. Patch follows. Maybe even better methods are available.

Signed-off-by: jidanni <***@jidanni.org>
---
Documentation/git-bundle.txt | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 1b66ab7..80248f5 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -164,6 +164,28 @@ $ git pull bundle
would treat it as if it is talking with a remote side over the
network.

+DUMPING CONTENTS OF ANY BUNDLE
+-----------------------
+
+Even if we cannot unbundle a bundle,
+
+------------
+$ git init
+$ git bundle unbundle mybundle.bun
+error: Repository lacks these prerequisite commits...
+------------
+
+We can still examine all the data contained within,
+
+------------
+$ sed '/^PACK/,$!d' mybundle.bun > mybundle.pack
+$ git unpack-objects < mybundle.pack
+$ cd .git/objects
+$ ls ??/*|tr -d /|git cat-file --batch-check
+$ ls ??/*|tr -d /|git cat-file --batch
+------------
+
+
Author
------
Written by Mark Levedahl <***@verizon.net>
--
1.6.0.6
Johannes Schindelin
2009-01-01 17:03:25 UTC
Hi,
Post by j***@jidanni.org
JK> AFAIK, there is no tool to try salvaging strings from an incomplete pack
JK> (and you can't just run "strings" because the deltas are zlib
JK> compressed). So if I were in the police forensics department, I think I
JK> would read Documentation/technical/pack-format.txt and start hacking a
JK> solution as quickly as possible.
Hogwash. Patch follows. Maybe even better methods are available.
---
Just for the record: this is in so many ways not a commit message I want
to have in git.git. I hope it is not applied.

Ciao,
Dscho
Jeff King
2009-01-01 19:21:54 UTC
Post by j***@jidanni.org
JK> AFAIK, there is no tool to try salvaging strings from an incomplete pack
JK> (and you can't just run "strings" because the deltas are zlib
JK> compressed). So if I were in the police forensics department, I think I
JK> would read Documentation/technical/pack-format.txt and start hacking a
JK> solution as quickly as possible.
Hogwash. Patch follows. Maybe even better methods are available.
[...]
+$ sed '/^PACK/,$!d' mybundle.bun > mybundle.pack
+$ git unpack-objects < mybundle.pack
+$ cd .git/objects
+$ ls ??/*|tr -d /|git cat-file --batch-check
+$ ls ??/*|tr -d /|git cat-file --batch
Sorry, no, but your method does not work in the case I described: a thin
pack with deltas. In that case, git unpack-objects cannot unpack the
object since it lacks the delta base, and will skip it. For example:

# create a bundle with a thin delta blob
mkdir one && cd one && git init
cp /usr/share/dict/words . && git add words && git commit -m one
echo SECRET MESSAGE >>words && git add words && git commit -m two
git bundle create ../mybundle.bun HEAD^..

# now try to fetch from it
mkdir ../two && cd ../two && git init
git bundle unbundle ../mybundle.bun
# produces:
# error: Repository lacks these prerequisite commits:
# error: b7d1a0ca98ca0e997d4222459d6fc1c9edae6a3f one

# so try to recover
sed '/^PACK/,$!d' ../mybundle.bun > mybundle.pack
git unpack-objects < mybundle.pack
# Unpacking objects: 100% (3/3), done.
# fatal: unresolved deltas left after unpacking
cd .git/objects
# this will show just two objects: the commit and the tree
ls ??/* | tr -d /
# confirm that we don't have the blob or the string of interest
ls ??/* | tr -d / | git cat-file --batch | grep SECRET
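
The `sed '/^PACK/,$!d'` split used above sends the binary pack through sed, whose behaviour on non-text input can differ between implementations. A hedged alternative is to measure the text header and copy the remaining bytes untouched with `tail`. The function name `bundle_pack` is hypothetical, and the sketch assumes the header ends at the first empty line of the bundle.

```shell
# bundle_pack BUNDLE: write the raw pack stream of BUNDLE to stdout.
bundle_pack() {
    # Size in bytes of the header, including the empty line that ends it.
    header_bytes=$(LC_ALL=C sed '/^$/q' "$1" | wc -c)
    # tail -c +N starts output at byte N (1-based), i.e. right after the header.
    tail -c +"$((header_bytes + 1))" "$1"
}
```

Usage would be `bundle_pack ../mybundle.bun | git unpack-objects`, which should behave like the two-step version above and is still subject to the same unresolved-delta limitation.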

It is nice that unpack-objects continues at all thanks to the recent
improvements by Nicolas, so you may be able to get some of the data out.
But it just skips over any unresolvable deltas, since we can't make a
useful object from them. Maybe it would be worth adding an option to
dump the uncompressed deltas to a file or directory so you could run
"strings" on them to recover some of the data.

-Peff
j***@jidanni.org
2009-01-01 22:12:56 UTC
JK> Maybe it would be worth adding an option to dump the uncompressed
JK> deltas to a file or directory so you could run "strings" on them
JK> to recover some of the data.

I got as far as these wheezy little bytes,
$ ls ??/*|tr -d /|sed q|xargs git cat-file tree|perl -pwe 's/[^\0]+[\0]//'|hd
00000000 ae 83 2f 22 45 89 2d dd e5 22 13 57 46 64 48 b4 |../"E.-..".WFdH.|
00000010 09 77 51 42 |.wQB|
before I ran out of tools to crack it. It must be in some standard git
gzip format. There should be a command line tool to crack it with
provided in the git suite.

Anyways, one day some forensics department will need to crack one of
these things, and I want the instructions available.

JS> Just for the record: this is in so many ways not a commit message I want
JS> to have in git.git. I hope it is not applied.
Is that where they end up? Oops, please reword it for me, anybody.
Jeff King
2009-01-01 23:48:15 UTC
Post by j***@jidanni.org
I got as far as these wheezy little bytes,
$ ls ??/*|tr -d /|sed q|xargs git cat-file tree|perl -pwe 's/[^\0]+[\0]//'|hd
00000000 ae 83 2f 22 45 89 2d dd e5 22 13 57 46 64 48 b4 |../"E.-..".WFdH.|
00000010 09 77 51 42 |.wQB|
Those are just the bytes of the sha1 of the blob object, which is
pointed to by the tree object. You have the tree object correctly
unpacked, but not the blob, as I said before. So no amount of looking
in .git/objects is going to help you: git-unpack-objects didn't unpack
it, and the data isn't there in any form.

The data is in the pack, but as a delta, and that delta has further been
zlib-compressed. So you can either write a custom parser based on the pack
format (which, as I mentioned, is described in
Documentation/technical/pack-format.txt), or you can add a switch to
unpack-objects, which is already parsing that format, to dump the
unresolved deltas. Which is what I was suggesting before.

Here's a very rough patch to do the latter. Try:

git unpack-objects --dump-deltas <mybundle.pack
strings .git/lost-found/delta/*

Probably one could also write some tool to decode the delta format into
something more human readable.

---
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index 47ed610..ab33ab1 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -13,6 +13,7 @@
#include "fsck.h"

static int dry_run, quiet, recover, has_errors, strict;
+static int dump_deltas;
static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict] < pack-file";

/* We always read in 4kB chunks. */
@@ -462,6 +463,36 @@ static void unpack_one(unsigned nr)
}
}

+static void dump_delta_list(void)
+{
+ struct delta_info *d;
+
+ for (d = delta_list; d; d = d->next) {
+ git_SHA_CTX c;
+ unsigned char sha1[20];
+ char *path;
+ int fd;
+
+ git_SHA1_Init(&c);
+ git_SHA1_Update(&c, d->delta, d->size);
+ git_SHA1_Final(sha1, &c);
+ path = git_path("lost-found/delta/%s", sha1_to_hex(sha1));
+
+ if (safe_create_leading_directories(path) < 0)
+ die("could not create lost-found directory");
+
+ fd = open(path, O_CREAT|O_WRONLY, 0666);
+ if (fd < 0)
+ die("unable to open %s: %s", path, strerror(errno));
+ if (write_in_full(fd, d->delta, d->size) < 0)
+ die("error writing to %s: %s", path, strerror(errno));
+ if (close(fd) < 0)
+ die("error writing to %s: %s", path, strerror(errno));
+
+ fprintf(stderr, "dumped delta %s\n", sha1_to_hex(sha1));
+ }
+}
+
static void unpack_all(void)
{
int i;
@@ -486,8 +517,11 @@ static void unpack_all(void)
}
stop_progress(&progress);

- if (delta_list)
+ if (delta_list) {
+ if (dump_deltas)
+ dump_delta_list();
die("unresolved deltas left after unpacking");
+ }
}

int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
@@ -534,6 +568,10 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
len = sizeof(*hdr);
continue;
}
+ if (!strcmp(arg, "--dump-deltas")) {
+ dump_deltas = 1;
+ continue;
+ }
usage(unpack_usage);
}
j***@jidanni.org
2009-01-02 00:10:30 UTC
JK> diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
OK, I wish you luck in the fruition of the new --dump-delta option, and
can proofread the man pages involved, otherwise this is no area for
junior programmer me.
Shawn O. Pearce
2009-01-02 07:15:19 UTC
Post by j***@jidanni.org
JK> diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
OK, I wish you luck in the fruition of the new --dump-delta option, and
can proofread the man pages involved, otherwise this is no area for
junior programmer me.
This is rather insane. There's very little data inside of a delta.
That's sort of the point of that level of compression, it takes
up very little disk space and yet describes the change made.
Almost nobody is going to want the delta without the base object
it applies onto. No user of git is going to need that. I'd rather
not carry dead code around in the tree for something nobody will
ever use.

FWIW, most Git deltas are "copy" instructions, they list a position
and count in the base to copy data *from*. These take up less
space than "insert" instructions, where new text is placed into
the file. As the delta generator favors a smaller delta, it tends
to create deltas that use the "copy" instruction more often than the
"insert" instruction. So there is *very* little data in the delta,
just ranges to copy from somewhere else. Without that other place
(the delta base) all you can do is guess about those bits. Which you
can do just as well with a few flips of a fair coin. :-)
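
To make the copy/insert distinction concrete, here is a rough sketch that lists the instructions in one raw delta, for example a file that has already been inflated and dumped to disk. It assumes the delta layout described in pack-format.txt: two variable-length sizes (base, result), then instructions where a byte with the high bit set is a copy (bits 0-3 select offset bytes, bits 4-6 select size bytes) and any other byte N means N literal bytes follow. The name `delta_summary` is made up, and the function does no error checking on truncated input.

```shell
# delta_summary FILE: describe each instruction of one raw (inflated) delta.
delta_summary() {
    # Load every byte as an unsigned decimal number into $1, $2, ...
    set -- $(od -An -v -tu1 "$1")

    # Skip the two size varints (base size, result size); each one ends
    # at the first byte whose high bit is clear.
    for field in base result; do
        while [ "$1" -ge 128 ]; do shift; done
        shift
    done

    while [ "$#" -gt 0 ]; do
        op=$1; shift
        if [ "$op" -ge 128 ]; then
            # Copy: assemble the little-endian offset and size from the
            # bytes that the flag bits say are present.
            offset=0
            size=0
            for i in 0 1 2 3; do
                if [ $(( $op & (1 << $i) )) -ne 0 ]; then
                    offset=$(( $offset | ($1 << (8 * $i)) )); shift
                fi
            done
            for i in 0 1 2; do
                if [ $(( $op & (16 << $i) )) -ne 0 ]; then
                    size=$(( $size | ($1 << (8 * $i)) )); shift
                fi
            done
            # A size of zero stands for 0x10000 bytes.
            if [ "$size" -eq 0 ]; then size=65536; fi
            echo "copy offset=$offset size=$size"
        else
            # Insert: the next $op bytes are literal new data.
            echo "insert $op bytes"
            shift "$op"
        fi
    done
}
```

Only the "insert" lines correspond to text that is physically present in the delta; every "copy" line points into a base object that an incomplete thin pack does not contain.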
--
Shawn.
Jeff King
2009-01-02 08:27:09 UTC
Post by Shawn O. Pearce
Post by j***@jidanni.org
OK, I wish you luck in the fruition of the new --dump-delta option, and
can proofread the man pages involved, otherwise this is no area for
junior programmer me.
This is rather insane. There's very little data inside of a delta.
That's sort of the point of that level of compression, it takes
up very little disk space and yet describes the change made.
Almost nobody is going to want the delta without the base object
it applies onto. No user of git is going to need that. I'd rather
not carry dead code around in the tree for something nobody will
ever use.
I somewhat agree. Obviously we can come up with contrived cases where
the delta is a pure "add" and this option magically lets you recover
some text via "strings" on the resulting delta dump. But in practice,
it's hard to say exactly how useful it would be, especially since the
"motivation" here seems to be more academic than any actual real-world
problem. We can approximate with something like:

git clone git://git.kernel.org/pub/scm/git/git.git
cd git
git bundle create ../bundle.git v1.6.0..v1.6.1
mkdir ../broken && cd ../broken
sed '/^PACK/,$!d' ../bundle.git >pack
git init
git unpack-objects --dump-deltas <pack
strings .git/lost-found/delta/* | less

where maybe you lost your actual repository, but you still have a backup
of a bundle you sneaker-netted between major versions. In this instance
we have 6000 objects in the bundle, 2681 of which are blobs (and
therefore presumably the most interesting things to recover). Of those,
1070 were non-delta and can be recovered completely. For the remainder,
our strings command shows us snippets of what was there. There are
definitely recognizable pieces of code. But likewise there are pieces of
code that are missing subtle parts. E.g.:

if (textconv_one) {
size_t size;
mf1.ptr = run_textconv(textconv_one, one, &size);
if (!mf1.
ptr)
mf1.size = size;
if (textconv_two) {
size_t size;
mf2.ptr = run_textconv(textconv_two, two, &size);
if (!mf2.
ptr)
mf2.size = size;

So while there is _something_ to be recovered there, it is basically as
easy to rewrite the code as it is to piece together whatever fragments
are available into something comprehensible.

So in practice, the delta dump would only be useful if:

1. You have an incomplete thin pack, which generally means you are
using bundles (or you interrupted a fetch and kept the tmp_pack).

2. There is _no_ other copy of the basis. The results you get from
this method are so awful that it should really only be last-ditch.
I think you would be insane to say "Oh, I don't have net access
right now. Let me just spend hours picking through these deltas to
find a scrap of something useful instead of just waiting until I
get access again."

3. The changes in the pack tend to produce deltas rather than full
blobs, but the deltas tend to be very add-heavy.

I don't know how popular bundles are, but I would expect (1) puts us
very much in the minority. On top of that, given the nature of git, I
find (2) to be pretty unlikely. If you're sneaker-netting data with a
bundle, then it seems rare that both ends of the net will be lost at
once. As for (3), it seems source code is not a good candidate here.
Perhaps if you were writing a novel in a single file, you might salvage
whole paragraphs or even chapters.

So I am inclined to leave it as-is: a patch in the list archive. If and
when the day comes when somebody loses some super-important data and
somehow matches all of these criteria, then they can consult whatever
aged and senile git gurus still exist to pull the patch out and see if
anything can be recovered.

-Peff
j***@jidanni.org
2009-01-02 22:03:43 UTC
Some options are:

1) just add a line or two to my man page patch showing
what recovery can and can't presently be done. (No need for my
temporary file, use a pipe too.)

2) Also implement that step where everything is uncompressed and put
into lost+found, and document that they should expect to just see a
lot of connector markings, and if there are useful strings in there
then they are just lucky. We did the job asked: recovered to the best
extent of what they gave us.

JK> So I am inclined to leave it as-is: a patch in the list archive. If and
JK> when the day comes when somebody loses some super-important data and
JK> somehow matches all of these criteria, then they can consult whatever
JK> aged and senile git gurus still exist to pull the patch out and see if
JK> anything can be recovered.

I've read too many cases in RISKS Digest, news:comp.risks, about years
later organizations trying to recover some weird format or media.
Therefore I urge you to strike while the iron is hot and hook up the
function into the code.

Maybe some have never tried to recover data, but for those that one
day might, they will be thanking you over and over for taking this
opportunity to give them a chance. In many cases the few shreds they
can recover might be all they need.

Also one can see the innards of git -- no more black box.

If I were creating a new binary format, I would be sure to also
provide decoder tools. Otherwise it is just like it requires its own
proprietary environment to reveal any of its innards. Sure, you can
say well that data is mainly useless... but it is better than nothing
-- we did the best with what they gave us.
j***@jidanni.org
2009-01-01 23:18:42 UTC
git ls-tree prints wacko file sizes if it can't find the blob:
$ git ls-tree --abbrev=4 -l 76e4
error: unable to find ae832f2245892ddde5221357466448b409775142
100644 blob ae83 3220821896 words

It is even affected by --abbrev:
$ for i in 4 5 40 999; do git ls-tree --abbrev=$i -l 76e4; done 2>&-|
perl -nwale 'print $F[3]'
3214344536
3219092952
3216251688
3217198088
$ git version
git version 1.6.0.6
j***@jidanni.org
2009-01-01 23:47:28 UTC
It's not. It is just randomly grabbing digits even without --abbrev.
Alex Riesen
2009-01-01 23:52:00 UTC
Printing 0 as the size of the blob seems to be the safest. The error
message is already printed by sha1_object_info itself.

Signed-off-by: Alex Riesen <***@gmail.com>
---
Post by j***@jidanni.org
$ git ls-tree --abbrev=4 -l 76e4
error: unable to find ae832f2245892ddde5221357466448b409775142
100644 blob ae83 3220821896 words
Not tested, but should print size of 0 if this happens.
I actually would prefer ls-tree finish listing and exit(1) in this case,
but ... am a little lazy (or scared of a "static int exit_code;").

builtin-ls-tree.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/builtin-ls-tree.c b/builtin-ls-tree.c
index cb61717..234df50 100644
--- a/builtin-ls-tree.c
+++ b/builtin-ls-tree.c
@@ -96,7 +96,8 @@ static int show_tree(const unsigned char *sha1, const char *base, int baselen,
if (!(ls_options & LS_NAME_ONLY)) {
if (ls_options & LS_SHOW_SIZE) {
if (!strcmp(type, blob_type)) {
- sha1_object_info(sha1, &size);
+ if (sha1_object_info(sha1, &size))
+ size = 0;
printf("%06o %s %s %7lu\t", mode, type,
abbrev ? find_unique_abbrev(sha1, abbrev)
: sha1_to_hex(sha1),
--
1.6.1.73.g7450
j***@jidanni.org
2009-01-26 19:02:08 UTC
Signed-off-by: jidanni <***@jidanni.org>
---
See http://article.gmane.org/gmane.comp.version-control.git/103576
Documentation/git-bundle.txt | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 1b66ab7..7c1e990 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -164,6 +164,13 @@ $ git pull bundle
would treat it as if it is talking with a remote side over the
network.

+If a bundle requires no references, one may simply use:
+
+------------
+$ git init
+$ git pull bundle.bdl master
+------------
+
Author
------
Written by Mark Levedahl <***@verizon.net>
--
1.6.0.6
Junio C Hamano
2009-01-26 19:53:21 UTC
Post by j***@jidanni.org
---
See http://article.gmane.org/gmane.comp.version-control.git/103576
Documentation/git-bundle.txt | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 1b66ab7..7c1e990 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -164,6 +164,13 @@ $ git pull bundle
would treat it as if it is talking with a remote side over the
network.
Two nits.

1. A bundle does not require references; it requires commits.

2. "One may simply use:" with a recipe without saying what the recipe is
useful for is not very helpful.

The second point needs to be stressed. For example, you could say
something like this:

With any bundle, you may simply say:

$ git ls-remote bundle.bdl

and it is a correct description if seeing the refs in the bundle is
what you want to do, but it does not help when cloning from it is what you
want.

It would be a good practice to make the new part go with the flow of the
existing examples. Adding the following at the end might be a better way
to do this than your "init then pull" example:

A complete bundle is one that does not require you to have any
prerequisite object for you to extract its contents. Not only you
can fetch/pull from a bundle, you can clone from a complete bundle
as if it is a remote repository, like this:

----------------
$ git clone /home/me/tmp/file.bdl mine.git
----------------

This will define a remote called "origin" in the resulting
repository that lets you fetch and pull from the bundle, just
like the previous example lets you do with the remote called
"bundle", and from then on you can fetch/pull to update the
resulting mine.git repository after replacing the bundle you store
at /home/me/tmp/file.bdl with incremental updates.
j***@jidanni.org
2009-01-29 15:32:15 UTC
Signed-off-by: jidanni <***@jidanni.org>
---
Words totally by Junio C Hamano.
Documentation/git-bundle.txt | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 1b66ab7..42c2abc 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -164,6 +164,22 @@ $ git pull bundle
would treat it as if it is talking with a remote side over the
network.

+A complete bundle is one that does not require you to have any
+prerequisite object for you to extract its contents. Not only you
+can fetch/pull from a bundle, you can clone from a complete bundle
+as if it was a remote repository, like this:
+
+----------------
+$ git clone /home/me/tmp/file.bdl mine.git
+----------------
+
+This will define a remote called "origin" in the resulting
+repository that lets you fetch and pull from the bundle, just
+like the previous example lets you do with the remote called
+"bundle", and from then on you can fetch/pull to update the
+resulting mine.git repository after replacing the bundle you store
+at /home/me/tmp/file.bdl with incremental updates.
+
Author
------
Written by Mark Levedahl <***@verizon.net>
--
1.6.0.6
j***@jidanni.org
2009-02-01 23:42:51 UTC
Words totally by Junio C Hamano.
Signed-off-by: jidanni <***@jidanni.org>
---

Junio: I used your words.
You might have missed this patch. Resending.


Documentation/git-bundle.txt | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 1b66ab7..42c2abc 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -164,6 +164,22 @@ $ git pull bundle
would treat it as if it is talking with a remote side over the
network.

+A complete bundle is one that does not require you to have any
+prerequisite object for you to extract its contents. Not only you
+can fetch/pull from a bundle, you can clone from a complete bundle
+as if it was a remote repository, like this:
+
+----------------
+$ git clone /home/me/tmp/file.bdl mine.git
+----------------
+
+This will define a remote called "origin" in the resulting
+repository that lets you fetch and pull from the bundle, just
+like the previous example lets you do with the remote called
+"bundle", and from then on you can fetch/pull to update the
+resulting mine.git repository after replacing the bundle you store
+at /home/me/tmp/file.bdl with incremental updates.
+
Author
------
Written by Mark Levedahl <***@verizon.net>
--
1.6.0.6
Johannes Schindelin
2009-02-02 00:04:13 UTC
Hi,
Post by j***@jidanni.org
Words totally by Junio C Hamano.
---
Junio: I used your words.
You might have missed this patch. Resending.
You are not serious, are you? People have explained time and time again
what is required by a commit message.

Now, I am not a native speaker, but the commit subject seems to contain
grammatical errors. Even if it weren't, it is not understandable.

So the only thing that is in your complete commit message remotely
purporting to explain what the patch is about and why it is good, fails to
do so.

Also, we always have an empty line before SOB lines.
Post by j***@jidanni.org
+A complete bundle is one that does not require you to have any
I have not heard of any "complete" bundle before, and I do not understand
the need for such a definition, either.
Post by j***@jidanni.org
+prerequisite object for you to extract its contents. Not only you
+can fetch/pull from a bundle, you can clone from a complete bundle
"Not only you can" violates grammar in my book.
Post by j***@jidanni.org
+
+----------------
+$ git clone /home/me/tmp/file.bdl mine.git
+----------------
+
+This will define a remote called "origin" in the resulting
+repository that lets you fetch and pull from the bundle, just
+like the previous example lets you do with the remote called
+"bundle", and from then on you can fetch/pull to update the
+resulting mine.git repository after replacing the bundle you store
+at /home/me/tmp/file.bdl with incremental updates.
IMO this paragraph just adds words, not anything the user does not know
already by that stage.

Ciao,
Dscho
Junio C Hamano
2009-02-02 00:45:19 UTC
Post by Johannes Schindelin
Post by j***@jidanni.org
+A complete bundle is one that does not require you to have any
I have not heard of any "complete" bundle before, and I do not understand
the need for such a definition, either.
Sorry, that's mine, not Jidanni's fault. I agree that we do not
necessarily have to introduce a new term.
Post by Johannes Schindelin
Post by j***@jidanni.org
+
+----------------
+$ git clone /home/me/tmp/file.bdl mine.git
+----------------
+
+This will define a remote called "origin" in the resulting
+repository that lets you fetch and pull from the bundle, just
+like the previous example lets you do with the remote called
+"bundle", and from then on you can fetch/pull to update the
+resulting mine.git repository after replacing the bundle you store
+at /home/me/tmp/file.bdl with incremental updates.
IMO this paragraph just adds words, not anything the user does not know
already by that stage.
True again.

The only justification I can think of for having an example of cloning
from a complete (or "baseless" or "full" or whatever new term we have
already agreed is not needed ;-)) bundle is that, by having such an
example way earlier in the example sequence, we could show a full
cycle of sneakernetting into a repository. You bootstrap it by cloning
from a complete bundle, so that the clone has remotes set up to facilitate
further updates via fetch/pull pointing at a known location. Then you
drop a new bundle to the same location that is relative to an earlier one,
and pull from it to incrementally keep the repository up-to-date.

In other words, we currently have a very cursory description that says you
can ls-remote and fetch from a bundle at the end, and mention that the
remote configuration can be defined to facilitate repeated sneakernet
operation. But we could reorganize the example this way (the ones with
asterisk are already in our example section, the ones with plus are
additions):

* you first create a full bundle without basis

$ git bundle create mybundle master

* you make note of the current tip to optimize later bundles

$ git tag -f lastR2bundle master

+ sneakernet it and clone it to prime the recipient

... sneakernet mybundle to /home/me/tmp/mybundle
$ git clone /home/me/tmp/mybundle mine.git

+ after working more in the original, create an incremental bundle

$ git bundle create mybundle lastR2bundle..master
$ git tag -f lastR2bundle master

+ sneakernet it again, and use it to update the recipient

... sneakernet the new mybundle to /home/me/tmp/mybundle
$ cd mine.git && git pull

to show the simplest "full cycle" of sneakernet workflow. And then show
various variations we already have in the existing examples.
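
The outline above can be tried end to end in a scratch directory. The following is a minimal sketch, not text proposed for the man page: the directory names R1 and mine.git, the file contents, and the identity settings are made up for illustration, and both "machines" are just subdirectories of one temporary directory. One assumption worth flagging: the clone passes -b master, because a bundle created from master alone may not record HEAD, in which case a plain clone can finish without checking anything out.

```shell
set -e
work=$(mktemp -d)                         # scratch area standing in for both machines

# "Machine A": original repository with one commit on master.
git init -q "$work/R1"
cd "$work/R1"
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the outline
git config user.name "Example User"       # illustrative identity
git config user.email "user@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first"

# Full bundle without basis, and a tag remembering what was shipped.
git bundle create "$work/mybundle" master
git tag -f lastR2bundle master

# "Machine B": prime the recipient by cloning from the bundle.
git clone -q -b master "$work/mybundle" "$work/mine.git"

# Back on "machine A": more work, then an incremental bundle.
echo two >> file.txt
git commit -q -a -m "second"
git bundle create "$work/mybundle" lastR2bundle..master
git tag -f lastR2bundle master

# "Machine B": the bundle file was replaced in place, so a plain pull suffices.
cd "$work/mine.git"
git pull -q
```

After the last step the clone's master should point at the same commit as the original's, because the clone recorded the bundle path as its "origin" remote.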

Something like:

In addition, if you know up to what commit the intended recipient
repository should have the necessary objects for, you can use that
knowledge to specify the basis, giving a cut-off point to limit the
revisions and objects that go in to the resulting bundle. Here are the
examples:

* using a tag present in both to optimize the bundle

$ git bundle create mybundle master ^v1.0.0

* using a basis based on time to optimize the bundle

$ git bundle create mybundle master --since=10.days

* using the number of commits to optimize the bundle

$ git bundle create mybundle master -n 10

A bundle from a recipient repository's point of view is just like a
regular repository it fetches/pulls from. You can for example map
refs, like this example, when fetching.

$ git fetch mybundle master:localRef

Or see what refs it offers

$ git ls-remote mybundle
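
To see the basis variations and the recipient-side commands working together, here is a small sketch. The repository names src, empty and recipient, the three throwaway commits, and the bundle file names are assumptions made up for the demonstration; v1.0.0 is simply placed two commits behind master. The --since form is left out because commits created on the spot would all fall inside any recent time window.

```shell
set -e
work=$(mktemp -d)

# Source repository with three commits; v1.0.0 marks the first one.
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # illustrative identity
git config user.email "user@example.com"
for n in 1 2 3; do
    echo "$n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "commit $n"
done
git tag v1.0.0 master~2

# Basis given by a tag: only commits after v1.0.0 go into the bundle.
git bundle create "$work/by-tag.bdl" master ^v1.0.0
# Basis given by a commit count: only the newest commit goes in.
git bundle create "$work/by-count.bdl" -1 master

# See what refs a bundle offers.
refs=$(git ls-remote "$work/by-tag.bdl")
echo "$refs"

# verify fails in a repository that lacks the basis ...
git init -q "$work/empty"
if git -C "$work/empty" bundle verify "$work/by-tag.bdl" >/dev/null 2>&1; then
    empty_ok=yes
else
    empty_ok=no
fi

# ... and succeeds once the recipient holds v1.0.0.
git init -q "$work/recipient"
cd "$work/recipient"
git fetch -q "$work/src" refs/tags/v1.0.0:refs/tags/v1.0.0
git bundle verify "$work/by-tag.bdl"
git fetch -q "$work/by-tag.bdl" master:localRef
```

The point of the empty repository is to show why the basis must already be held by the destination: the same bundle that verifies in recipient is rejected there.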
j***@jidanni.org
2009-02-04 00:09:02 UTC
Junio, could you combine your two recent versions,
http://news.gmane.org/group/gmane.comp.version-control.git/thread=103575/force_load=t/focus=108030
into a final one and commit it. No need to credit me. This is already
way over my head. Note however that the git clone example disappeared
from your final version. Also perhaps give a simplest example of git
pull. Indeed, much of your discussion is valuable and should be
included on the man page. Whatever you commit is fine. I would just
like to "close this bug" without having all the valuable documentation
you wrote for it just go down the drain, which will certainly happen
if I didn't send this message...
Junio C Hamano
2009-02-04 02:07:47 UTC
Post by j***@jidanni.org
Junio, could you combine your two recent versions,
http://news.gmane.org/group/gmane.comp.version-control.git/thread=103575/force_load=t/focus=108030
into a final one and commit it. No need to credit me. This is already
way over my head. Note however that the git clone example disappeared
from your final version. Also perhaps give a simplest example of git
pull. Indeed, much of your discussion is valuable and should be
included on the man page. Whatever you commit is fine. I would just
like to "close this bug" without having all the valuable documentation
you wrote for it just go down the drain, which will certainly happen
if I didn't send this message...
The former was shot down by Johannes and I agree with his reasoning, and
the latter is merely "something like" outline that is not good enough for
inclusion. I personally do not consider there is a *bug* in the current
documentation so it is not much of my itch to scratch either.

Could you convince me that I should spend more time on that filling the
blanks in "something like" outline myself, instead of spending my git time
on some other areas, please?
j***@jidanni.org
2009-02-04 02:18:24 UTC
JCH> Could you convince me that I should spend more time on that

I can't. I will however at least for myself bookmark this thread as a
valuable git bundle documentation supplement. OK, thanks.
Nanako Shiraishi
2009-02-04 09:15:29 UTC
This rewrites the example part of the bundle documentation to follow
the suggestion made by Junio during a recent discussion (gmane 108030).

Instead of just showing different ways to create and use bundles in a
disconnected fashion, the rewritten example first shows the simplest
"full cycle" of sneakernet workflow, and then introduces various
variations.

The words are mostly taken from Junio's outline. I only reformatted
them and proofread to make sure the end result flows naturally.

Signed-off-by: Nanako Shiraishi <***@lavabit.com>
---

I didn't want your improvement suggestion to go to waste either, so
here is a proposed conclusion of this topic in a patch form, hopefully
in a good enough quality.

After the maintainer spent a lot of time to suggest how to improve a
proposed patch for inclusion, it is rude for a contributor to walk
away without following through the review process. Such a proposed
patch is not contributing to the development process but only stealing
maintainer's and reviewers' time from the community. But others like I
can at least try to help (^_^;).

Documentation/git-bundle.txt | 132 ++++++++++++++++++++++++++---------------
1 files changed, 84 insertions(+), 48 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 1b66ab7..ea0f6a0 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -84,7 +84,7 @@ defining the basis. More than one reference may be packaged, and more
than one basis can be specified. The objects packaged are those not
contained in the union of the given bases. Each basis can be
specified explicitly (e.g., ^master~10), or implicitly (e.g.,
-master~10..master, master --since=10.days.ago).
+master~10..master, --since=10.days.ago master).

It is very important that the basis used be held by the destination.
It is okay to err on the side of conservatism, causing the bundle file
@@ -94,75 +94,111 @@ when unpacking at the destination.
EXAMPLE
-------

-Assume two repositories exist as R1 on machine A, and R2 on machine B.
+Assume you want to transfer the history from a repository R1 on machine A
+to another repository R2 on machine B.
For whatever reason, direct connection between A and B is not allowed,
but we can move data from A to B via some mechanism (CD, email, etc).
We want to update R2 with developments made on branch master in R1.

-To create the bundle you have to specify the basis. You have some options:
+To bootstrap the process, you can first create a bundle that doesn't have
+any basis. You can use a tag to remember up to what commit you sent out
+in order to make it easy to later update the other repository with an
+incremental bundle:

-- Without basis.
-+
-This is useful when sending the whole history.
+----------------
+machineA$ cd R1
+machineA$ git bundle create file.bdl master
+machineA$ git tag -f lastR2bundle master
+----------------

-------------
-$ git bundle create mybundle master
-------------
+Then you sneakernet file.bdl to the target machine B. Because such a
+bundle does not require any existing object to be extracted, you can not
+only fetch/pull from it but also clone from it, as if it were a remote
+repository.

-- Using temporally tags.
-+
-We set a tag in R1 (lastR2bundle) after the previous such transport,
-and move it afterwards to help build the bundle.
+----------------
+machineB$ git clone /home/me/tmp/file.bdl R2
+----------------

-------------
-$ git bundle create mybundle master ^lastR2bundle
-$ git tag -f lastR2bundle master
-------------
+This will define a remote called "origin" in the resulting repository that
+lets you fetch and pull from the bundle. $GIT_DIR/config file in R2 may
+have an entry like this:

-- Using a tag present in both repositories
+------------------------
+[remote "origin"]
+ url = /home/me/tmp/file.bdl
+ fetch = refs/heads/*:refs/remotes/origin/*
+------------------------
+
+You can fetch/pull to update the resulting R2 repository after
+replacing the bundle you store at /home/me/tmp/file.bdl with incremental
+updates from here on.
+
+After working more in the original repository, you can create an
+incremental bundle to update the other:
+
+----------------
+machineA$ cd R1
+machineA$ git bundle create file.bdl lastR2bundle..master
+machineA$ git tag -f lastR2bundle master
+----------------
+
+and sneakernet it to the other machine to replace /home/me/tmp/file.bdl,
+and pull from it.
+
+----------------
+machineB$ cd R2
+machineB$ git pull
+----------------

-------------
-$ git bundle create mybundle master ^v1.0.0
-------------
+If you know up to what commit the intended recipient repository should
+have the necessary objects for, you can use that knowledge to specify the
+basis, giving a cut-off point to limit the revisions and objects that go
+in the resulting bundle. The previous example used lastR2bundle tag
+for this purpose, but you can use other options you would give to
+the linkgit:git-log[1] command. Here are more examples:

-- A basis based on time.
+You can use a tag that is present in both.

-------------
-$ git bundle create mybundle master --since=10.days.ago
-------------
+----------------
+$ git bundle create mybundle v1.0.0..master
+----------------

-- With a limit on the number of commits
+You can use a basis based on time.

-------------
-$ git bundle create mybundle master -n 10
-------------
+----------------
+$ git bundle create mybundle --since=10.days master
+----------------

-Then you move mybundle from A to B, and in R2 on B:
+Or you can use the number of commits.

-------------
+----------------
+$ git bundle create mybundle -10 master
+----------------
+
+You can run `git-bundle verify` to see if you can extract from a bundle
+that was created with a basis.
+
+----------------
$ git bundle verify mybundle
-$ git fetch mybundle master:localRef
-------------
+----------------

-With something like this in the config in R2:
+This will list what commits you must have in order to extract from the
+bundle and will error out if you don't have them.

-------------------------
-[remote "bundle"]
- url = /home/me/tmp/file.bdl
- fetch = refs/heads/*:refs/remotes/origin/*
-------------------------
+A bundle from a recipient repository's point of view is just like a
+regular repository it fetches/pulls from. You can for example map
+refs, like this example, when fetching:

-You can first sneakernet the bundle file to ~/tmp/file.bdl and
-then these commands on machine B:
+----------------
+$ git fetch mybundle master:localRef
+----------------

-------------
-$ git ls-remote bundle
-$ git fetch bundle
-$ git pull bundle
-------------
+Or see what refs it offers.

-would treat it as if it is talking with a remote side over the
-network.
+----------------
+$ git ls-remote mybundle
+----------------

Author
------
--
1.6.1.2
--
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/
Jeff King
2009-02-04 15:26:05 UTC
Post by Nanako Shiraishi
I didn't want your improvement suggestion to go to waste either, so
here is a proposed conclusion of this topic in a patch form, hopefully
in a good enough quality.
After the maintainer spent a lot of time to suggest how to improve a
proposed patch for inclusion, it is rude for a contributor to walk
away without following through the review process. Such a proposed
patch is not contributing to the development process but only stealing
maintainer's and reviewers' time from the community. But others like I
can at least try to help (^_^;).
Nanako,

I often see you doing small patch cleanups, reposts, gentle reminders,
and other work like this that really helps the community process run
smoothly. I just wanted to say "thank you" so that you know that your
efforts are not going unnoticed.

-Peff
Junio C Hamano
2009-02-04 22:44:42 UTC
Post by Nanako Shiraishi
This rewrites the example part of the bundle documentation to follow
the suggestion made by Junio during a recent discussion (gmane 108030).
Instead of just showing different ways to create and use bundles in a
disconnected fashion, the rewritten example first shows the simplest
"full cycle" of sneakernet workflow, and then introduces various
variations.
The words are mostly taken from Junio's outline. I only reformatted
them and proofread to make sure the end result flows naturally.
---
I didn't want your improvement suggestion to go to waste either, so
here is a proposed conclusion of this topic in a patch form, hopefully
in a good enough quality.
I appreciate your help like this patch, and your other contributions of
"project secretary" kind, pointing out old threads, prodding about
unapplied patches, etc., because I do not have infinite amount of time.

The text seems to follow my "this might flow more naturally and easier to
read" outline exactly, and I do not have a problem with the patch itself.
Among the people who were involved in the review, Jidanni seemed to be of
the same opinion, but I haven't heard from Dscho one way or another. So
I'd keep this on hold for now but I think the examples are organized much
better with this version and we should take it.

HOWEVER.
Post by Nanako Shiraishi
After the maintainer spent a lot of time to suggest how to improve a
proposed patch for inclusion, it is rude for a contributor to walk
away without following through the review process. Such a proposed
patch is not contributing to the development process but only stealing
maintainer's and reviewers' time from the community. But others like I
can at least try to help (^_^;).
I see a smiley, but what's with the animosity? One thing I've always
liked about your messages to this list is that they have an exceptional
signal to noise ratio, certainly much better than mine [*1*].

I saw you were annoyed by his recent "bug tracker" remark in another
thread, and I do appreciate that you are showing a better way to help by
setting an example, but I think this comment is counterproductive.

[Footnote]

*1* You certainly never said anything like giving furniture to somebody
else ;-)

Junio C Hamano
2008-12-19 20:07:09 UTC
Post by j***@jidanni.org
Someone has handed you a "git bundle".
How do you get the files out of it?
If it were cpio, you would use -i, if it were tar, you would use -x...
You read the git-bundle man page.
You only get as far as
# git-bundle verify bundle.bdl
The bundle contains 1 ref
d01... /heads/master
The bundle requires these 0 ref
bundle.bdl is okay
The rest is mish-mosh.
The last example in the git-bundle man page might be a bit cryptic but
that is how bundles are expected to be used: to give repository access
to people who have no real network connection other than Sneakernet.

For one shot extraction, defining a remote in the config is overkill and
you could just say:

git ls-remote bundle.bdl

to see what branches it contains and if you are interested in its
master branch and want to merge it to your history, then

git pull bundle.bdl master

should do that.
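
For someone who only wants the files, the one-shot route above can be sketched as follows, assuming the bundle requires 0 refs as in the original report. The first half merely fabricates a stand-in bundle.bdl so the example is self-contained; the directory names theirs and out and the README file are invented for illustration. Only the last few commands are what a recipient would actually type.

```shell
set -e
work=$(mktemp -d)

# Stand-in for the bundle somebody handed you (illustrative content only).
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo hello > README
git add README
git commit -q -m "initial"
git bundle create "$work/bundle.bdl" master

# Recipient side: see what branches the bundle contains ...
git ls-remote "$work/bundle.bdl"

# ... then pull its master branch into a new, empty repository.
git init -q "$work/out"
cd "$work/out"
git pull -q "$work/bundle.bdl" master
ls                                      # the files are now in the working tree
```

This only works in an empty repository because the bundle has no prerequisites; a bundle that requires other commits has to be pulled into a repository that already holds them.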