Discussion:
[RFC PATCH 00/15] Sparse clones
Elijah Newren
2010-09-05 00:13:52 UTC
Permalink
This patch series implements some basics for sparse clones, which I
define as a clone where not all blob, tree, or commit objects are
downloaded. The idea is to include sparseness both relative to span
of files/directories and depth of history, though currently I've only
put effort into span of paths.

This patch is built on pu, because it requires
en/object-list-with-pathspec.

What works:
* all operations on non-sparse clones (full testsuite passes)
* clone
* read-tree
* ls-files
* cat-file
* ls-tree
* checkout
* diff
* status
* log
* add (except for not giving errors for paths outside the sparse limi=
ts)
* commit
What doesn't work, yet:
* Probably everything not tested in the new t572*.sh tests :-)
Notable examples of things missing from t572*.sh tests:
* fetch
* push
* merge
* rebase
* thin packs (need to modify pack-objects to only delta against
objects within the sparse limits)
* densify command (to make a sparse repository non-sparse)
* "missing" commits (see README file in PATCH1)

Cursory comparison with Nguy=E1=BB=85n Th=C3=A1i Ng=E1=BB=8Dc Duy's sub=
tree clone (he's probably
made progress since his last submission, so this may be outdated):
* His series supports fetch, mine doesn't (yet).
* His series supports push, mine doesn't (yet).
* His series supports merge, mine doesn't (yet).
* His handling of subtree request over clone/fetch via capabilities
is probably the right way; I'm pretty sure my adding of sparse
limits are extra arguments to upload-pack would break backward
compatibility and be bad.
* He supports just one selected subtree (though he mentioned he's
working on extending that); I support arbitrary number of subtrees
or subfiles.
* He modifies index format (bumping to header version 4); I don't.
Perhaps it's necessary for merge handling as I haven't implemented
that, but at an early glance I don't think it's necessary.
* While there are some similarities in the low-level details of how
we've modified the git to avoid missing objects, there are many
differences as well. I'm hoping to provoke some good discussion.

Elijah Newren (15):

P1- README-sparse-clone: Add a basic writeup of my ideas for sparse c=
lones

Just a big old write-up. Not everything in it is implemented yet, but =
it
gives you the high-level picture.

P2- Add tests for client handling in a sparse repository

Tests! Yaay!

P3- Read sparse limiting args from $GIT_DIR/sparse-limit

When a sparse clone is created, limiting paths will be stored.

P4- When unpacking in a sparse repository, avoid traversing missing
trees/blobs
P5- read_tree_recursive: Avoid missing blobs and trees in a sparse
repository
P6- Automatically reuse sparse limiting arguments in revision walking
P7- cache_tree_update(): Capability to handle tree entries missing fr=
om
index
P8- cache_tree_update(): Require relevant tree to be passed

Avoiding missing trees/blobs. =20

P9- Add tests for communication dealing with sparse repositories

Tests for clone/fetch/push/etc. Just clone so far.

P10- sparse-repo: Provide a function to record sparse limiting argume=
nts

Can't just read from $GIT_DIR/sparse-limit; gotta write to it too.

P11- builtin-clone: Accept paths for sparse clone
P12- Pass extra (rev-list) args on, at least in some cases
P13- upload-pack: Handle extra rev-list arguments being passed
P14- EVIL COMMIT: Include all commits
P15- clone: Ensure sparse limiting arguments are used in subsequent
operations

I like the changes to how clone accepts additional rev-list arguments
to limit what is downloaded, but I'm not too happy with how these
patches pass those rev-list arguments on to upload-pack. So don't
bother looking too closely at these.


Makefile | 2 +
README-sparse-clone | 284 ++++++++++++++++++++=
++++++++
builtin/archive.c | 2 +-
builtin/checkout.c | 2 +-
builtin/clone.c | 39 +++-
builtin/commit.c | 15 +-
builtin/fetch-pack.c | 3 +-
builtin/merge.c | 19 +-
builtin/revert.c | 7 +-
builtin/send-pack.c | 3 +-
builtin/write-tree.c | 6 +-
cache-tree.c | 92 +++++++++-
cache-tree.h | 4 +-
cache.h | 5 +-
connect.c | 9 +-
diff.h | 1 -
environment.c | 2 +
merge-recursive.c | 6 +-
merge-recursive.h | 2 +-
revision.c | 21 ++-
revision.h | 3 +-
setup.c | 2 +
sparse-repo.c | 84 ++++++++
sparse-repo.h | 4 +
t/sparse-lib.sh | 38 ++++
t/t5601-clone.sh | 14 --
t/t5720-sparse-repository-basics.sh | 130 +++++++++++++
t/t5721-sparse-repository-communication.sh | 106 +++++++++++
test-dump-cache-tree.c | 3 +-
transport-helper.c | 5 +-
transport.c | 13 +-
transport.h | 9 +-
tree-diff.c | 4 +-
tree-walk.c | 48 ++++-
tree-walk.h | 3 +
tree.c | 5 +
upload-pack.c | 45 +++--
37 files changed, 952 insertions(+), 88 deletions(-)
create mode 100644 README-sparse-clone
create mode 100644 sparse-repo.c
create mode 100644 sparse-repo.h
create mode 100644 t/sparse-lib.sh
create mode 100755 t/t5720-sparse-repository-basics.sh
create mode 100755 t/t5721-sparse-repository-communication.sh

--=20
1.7.2.3.541.g94cc33
Elijah Newren
2010-09-05 00:13:56 UTC
Permalink
Signed-off-by: Elijah Newren <***@gmail.com>
---
diff.h | 1 -
t/t5720-sparse-repository-basics.sh | 10 +++---
tree-diff.c | 2 +-
tree-walk.c | 48 ++++++++++++++++++++++++++++++----
tree-walk.h | 3 ++
5 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/diff.h b/diff.h
index 986a015..e17383c 100644
--- a/diff.h
+++ b/diff.h
@@ -171,7 +171,6 @@ extern int diff_tree_sha1(const unsigned char *old, const unsigned char *new,
const char *base, struct diff_options *opt);
extern int diff_root_tree_sha1(const unsigned char *new, const char *base,
struct diff_options *opt);
-extern int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt);

struct combine_diff_path {
struct combine_diff_path *next;
diff --git a/t/t5720-sparse-repository-basics.sh b/t/t5720-sparse-repository-basics.sh
index b8e9a3a..b946c23 100755
--- a/t/t5720-sparse-repository-basics.sh
+++ b/t/t5720-sparse-repository-basics.sh
@@ -41,7 +41,7 @@ test_expect_success 'setup' '
)
'

-test_expect_failure 'make sparse repository' '
+test_expect_success 'make sparse repository' '
git clone -q "file://$(pwd)/src" dst &&
(
cd dst &&
@@ -55,7 +55,7 @@ test_expect_failure 'make sparse repository' '
cd dst 2>/dev/null || test_done
srcgit="--git-dir=../src/.git"

-test_expect_failure 'plumbing: ls-files works' '
+test_expect_success 'plumbing: ls-files works' '
git ls-files > output &&
test "sub/b/file" = "$(cat output)"
'
@@ -98,19 +98,19 @@ test_expect_failure 'basic: diff works' '
git diff master~3
'

-test_expect_failure 'basic: checkout works' '
+test_expect_success 'basic: checkout works' '
git checkout master~2 &&
git checkout master
'

-test_expect_failure 'basic: status works with modified stuff' '
+test_expect_success 'basic: status works with modified stuff' '
git status &&
echo more content >> sub/b/file &&
echo newfile content >> sub/b/whatever &&
git status
'

-test_expect_failure 'basic: add works' '
+test_expect_success 'basic: add works' '
git add sub/b/file &&
git add sub/b/whatever
'
diff --git a/tree-diff.c b/tree-diff.c
index 951b53b..95e956e 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -261,7 +261,7 @@ static void show_entry(struct diff_options *opt, const char *prefix, struct tree
}
}

-static void skip_uninteresting(struct tree_desc *t, const char *base, int baselen, struct diff_options *opt, int *all_interesting)
+void skip_uninteresting(struct tree_desc *t, const char *base, int baselen, struct diff_options *opt, int *all_interesting)
{
while (t->size) {
int show = tree_entry_interesting(t, base, baselen, opt);
diff --git a/tree-walk.c b/tree-walk.c
index a9bbf4e..a584dd8 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -66,8 +66,19 @@ static void entry_clear(struct name_entry *a)
memset(a, 0, sizeof(*a));
}

-static void entry_extract(struct tree_desc *t, struct name_entry *a)
+struct tree_sparse_info {
+ int all_interesting;
+ const char *base;
+ int baselen;
+ struct diff_options *diffopt; /* really is const too */
+};
+
+static void entry_extract(struct tree_desc *t, struct name_entry *a,
+ struct tree_sparse_info *i)
{
+ if (!i->all_interesting)
+ skip_uninteresting(t, i->base, i->baselen,
+ i->diffopt, &i->all_interesting);
*a = t->entry;
}

@@ -219,7 +230,8 @@ static int check_entry_match(const char *a, int a_len, const char *b, int b_len)
static void extended_entry_extract(struct tree_desc_x *t,
struct name_entry *a,
const char *first,
- int first_len)
+ int first_len,
+ struct tree_sparse_info *i)
{
const char *path;
int len;
@@ -235,7 +247,7 @@ static void extended_entry_extract(struct tree_desc_x *t,
entry_clear(a);
break; /* not found */
}
- entry_extract(&t->d, a);
+ entry_extract(&t->d, a, i);
for (skip = t->skip; skip; skip = skip->prev)
if (a->path == skip->ptr)
break; /* found */
@@ -268,7 +280,7 @@ static void extended_entry_extract(struct tree_desc_x *t,
*/
probe = t->d;
while (probe.size) {
- entry_extract(&probe, a);
+ entry_extract(&probe, a, i);
path = a->path;
len = tree_entry_len(a->path, a->sha1);
switch (check_entry_match(first, first_len, path, len)) {
@@ -315,6 +327,28 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
struct name_entry *entry = xmalloc(n*sizeof(*entry));
int i;
struct tree_desc_x *tx = xcalloc(n, sizeof(*tx));
+ char prefix[PATH_MAX];
+ struct tree_sparse_info sparse_info;
+ if (!git_sparse_pathspecs)
+ sparse_info.all_interesting = 1;
+ else {
+ if (info->prev)
+ make_traverse_path(prefix, info->prev, &info->name);
+ else
+ strcpy(prefix, info->name.path);
+ sparse_info.baselen = strlen(prefix);
+
+ /* prefix must be slash terminated if non-empty */
+ if (sparse_info.baselen) {
+ sparse_info.baselen += 1;
+ prefix[sparse_info.baselen-1] = '/';
+ prefix[sparse_info.baselen] = '\0';
+ }
+
+ sparse_info.all_interesting = 0;
+ sparse_info.base = prefix;
+ sparse_info.diffopt = &git_sparse_diffopts;
+ }

for (i = 0; i < n; i++)
tx[i].d = t[i];
@@ -328,7 +362,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)

for (i = 0; i < n; i++) {
e = entry + i;
- extended_entry_extract(tx + i, e, NULL, 0);
+ extended_entry_extract(tx + i, e, NULL, 0, &sparse_info);
}

/*
@@ -356,7 +390,9 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
if (first) {
for (i = 0; i < n; i++) {
e = entry + i;
- extended_entry_extract(tx + i, e, first, first_len);
+ extended_entry_extract(tx + i, e,
+ first, first_len,
+ &sparse_info);
/* Cull the ones that are not the earliest */
if (!e->path)
continue;
diff --git a/tree-walk.h b/tree-walk.h
index 7e3e0b5..eae1997 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -60,4 +60,7 @@ static inline int traverse_path_len(const struct traverse_info *info, const stru
return info->pathlen + tree_entry_len(n->path, n->sha1);
}

+void skip_uninteresting(struct tree_desc *t, const char *base, int baselen, struct diff_options *opt, int *all_interesting);
+int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt);
+
#endif
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:13:57 UTC
Permalink
Signed-off-by: Elijah Newren <***@gmail.com>
---
t/t5720-sparse-repository-basics.sh | 2 +-
tree.c | 5 +++++
2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/t/t5720-sparse-repository-basics.sh b/t/t5720-sparse-repository-basics.sh
index b946c23..b11c5ab 100755
--- a/t/t5720-sparse-repository-basics.sh
+++ b/t/t5720-sparse-repository-basics.sh
@@ -81,7 +81,7 @@ known_objects=$(git $srcgit rev-list --objects master \
| cut -b -40)
for i in $(git $srcgit rev-list HEAD | xargs git name-rev | cut -b 42-); do
git $srcgit ls-tree -rt $i | grep -F "$known_objects" >expect &&
- test_expect_failure "plumbing: ls-tree -rt $i works" "
+ test_expect_success "plumbing: ls-tree -rt $i works" "
git ls-tree -rt $i 2>error >output &&
test_cmp output expect
"
diff --git a/tree.c b/tree.c
index 5ab90af..5f9e37a 100644
--- a/tree.c
+++ b/tree.c
@@ -119,6 +119,11 @@ int read_tree_recursive(struct tree *tree,
default:
return -1;
}
+
+ if (git_sparse_pathspecs &&
+ sha1_object_info(entry.sha1, NULL) <= 0)
+ continue;
+
if (S_ISDIR(entry.mode)) {
int retval;
char *newbase;
--
1.7.2.2.140.gd06af
Nguyen Thai Ngoc Duy
2010-09-05 02:00:24 UTC
Permalink
Post by Elijah Newren
@@ -119,6 +119,11 @@ int read_tree_recursive(struct tree *tree,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 =C2=A0return -1;
Post by Elijah Newren
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
+
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (git_sparse_pat=
hspecs &&
Post by Elijah Newren
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 sha1=
_object_info(entry.sha1, NULL) <=3D 0)
Post by Elijah Newren
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 continue;
Post by Elijah Newren
+
I suppose this is temporary and the final solution would have stricter
checks (i.e. only ignore if those entries are outside sparse zone)?
This opens a door for broken repo.
--=20
Duy
Elijah Newren
2010-09-05 03:16:38 UTC
Permalink
Post by Elijah Newren
@@ -119,6 +119,11 @@ int read_tree_recursive(struct tree *tree,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 =C2=A0return -1;
Post by Elijah Newren
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
+
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (git_sparse_pa=
thspecs &&
Post by Elijah Newren
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 sha=
1_object_info(entry.sha1, NULL) <=3D 0)
Post by Elijah Newren
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 continue;
Post by Elijah Newren
+
I suppose this is temporary and the final solution would have stricte=
r
checks (i.e. only ignore if those entries are outside sparse zone)?
This opens a door for broken repo.
Yes, good catch. Looks like I somehow missed that one, but I agree,
there should be an "&& !tree_entry_interesting(...)" in there.
Elijah Newren
2010-09-05 04:31:33 UTC
Permalink
Post by Elijah Newren
@@ -119,6 +119,11 @@ int read_tree_recursive(struct tree *tree,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 =C2=A0return -1;
Post by Elijah Newren
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
+
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (git_sparse_p=
athspecs &&
Post by Elijah Newren
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 sh=
a1_object_info(entry.sha1, NULL) <=3D 0)
Post by Elijah Newren
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 continue;
Post by Elijah Newren
+
I suppose this is temporary and the final solution would have strict=
er
checks (i.e. only ignore if those entries are outside sparse zone)?
This opens a door for broken repo.
Yes, good catch. =C2=A0Looks like I somehow missed that one, but I ag=
ree,
there should be an "&& !tree_entry_interesting(...)" in there.
Sorry, now I remember why that isn't in there and can't be. I did
have it there at one point. However, base & baselen in
read_tree_recursive do not necessarily correspond to the relative path
from the toplevel of the repository, though the sparse limit pathspecs
always will. For example, running
git ls-tree master:Documentation/technical
or, equivalently (for current git.git)
git ls-tree da0ae7c59bb0df4c13554fd840e1a563cde659ea
then base will be "" for paths under Documentation/technical rather
than "Documenation/technical/". And there's really no way of
determining the "real base" either in order to apply matching to the
sparse limit pathspecs (well...I guess you could do an exhaustive walk
of all history each time, so long as the given sha1sum only appears as
one particular directory, but that's unrealistic and slow and leaves
open what to do when multiple directories at different paths happen to
have the same sha1sum at some point(s) in their history). Note that
this affects cat-file -p as well, since it calls ls-tree and thus this
code.

I really don't see how one can change this. However, if it's any
consolation, sha1_object_info() will print out a warning message
whenever it's asked for a sha1sum that cannot be found in the object
store. For example, in a sparse clone:

$ git ls-tree -rt master
040000 tree f98bf35e9a746ebbd5a706591fe1ea4942942bad sub
040000 tree 436913a91c5648a6ed8fa23719fbd6e3052e2597 sub/a
error: unable to find 436913a91c5648a6ed8fa23719fbd6e3052e2597
040000 tree 07753f428765ac1afe2020b24e40785869bd4a85 sub/b
100644 blob d95f3ad14dee633a758d2e331151e950dd13e4ed sub/b/file
040000 tree 07753f428765ac1afe2020b24e40785869bd4a85 sub/bcopy
100644 blob d95f3ad14dee633a758d2e331151e950dd13e4ed sub/bcopy/file
100644 blob 4b7b65e07a8641bcd14375ebddf5c8a7fc002a30 sub/file

It can't traverse into sub/a because it doesn't have the necessary
information. I figured the warning message was useful to the end user
as a reminder that they are in a sparse clone. (Such warning messages
don't affect cat-file -p, because its callbacks don't return
READ_TREE_RECURSIVE, making sure it doesn't trigger this part of the
code.)
Elijah Newren
2010-09-05 00:14:00 UTC
Permalink
This updates the callers of cache_tree_update() to provide it a tree, so
that cache_tree_update() can use it for paths outside the sparse limits
together with index entries to write out a new set of trees. In cases
where there is no obvious tree to provide: if in a sparse repository, the
code dies; if in a non-sparse repository, a NULL tree is provided (in the
non-sparse case, the tree provided is irrelevant since the index is
complete).

FIXME: The changes to buildin/merge.c and merge-recursive.[ch] are probably
wrong; a new tree needs to be generated via some trivial merge algorithm
and then used, rather than just using the HEAD commit.

Signed-off-by: Elijah Newren <***@gmail.com>
---
builtin/checkout.c | 2 +-
builtin/commit.c | 15 +++++++++++----
builtin/merge.c | 19 ++++++++++---------
builtin/revert.c | 7 ++++++-
builtin/write-tree.c | 6 +++++-
cache-tree.c | 6 ++++--
cache-tree.h | 4 ++--
merge-recursive.c | 6 +++---
merge-recursive.h | 2 +-
t/t5720-sparse-repository-basics.sh | 2 +-
test-dump-cache-tree.c | 3 ++-
11 files changed, 46 insertions(+), 26 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index b10e8a2..3299aeb 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -454,7 +454,7 @@ static int merge_working_tree(struct checkout_opts *opts,
*/
init_merge_options(&o);
o.verbosity = 0;
- work = write_tree_from_memory(&o);
+ work = write_tree_from_memory(&o, old->commit->tree);

ret = reset_tree(new->commit->tree, opts, 1);
if (ret)
diff --git a/builtin/commit.c b/builtin/commit.c
index 66fdd22..ad6ecb2 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -550,8 +550,8 @@ static int ends_rfc2822_footer(struct strbuf *sb)
return 1;
}

-static int prepare_to_commit(const char *index_file, const char *prefix,
- struct wt_status *s)
+static int prepare_to_commit(const char *index_file, struct tree *head,
+ const char *prefix, struct wt_status *s)
{
struct stat statbuf;
int commitable, saved_color_setting;
@@ -733,7 +733,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
if (!active_cache_tree)
active_cache_tree = cache_tree();
if (cache_tree_update(active_cache_tree,
- active_cache, active_nr, 0, 0) < 0) {
+ active_cache, active_nr, head, 0, 0) < 0) {
error("Error building trees");
return 0;
}
@@ -1246,6 +1246,7 @@ static int run_rewrite_hook(const unsigned char *oldsha1,
int cmd_commit(int argc, const char **argv, const char *prefix)
{
struct strbuf sb = STRBUF_INIT;
+ struct tree *head_tree = NULL;
const char *index_file, *reflog_msg;
char *nl, *p;
unsigned char commit_sha1[20];
@@ -1273,7 +1274,13 @@ int cmd_commit(int argc, const char **argv, const char *prefix)

/* Set up everything for writing the commit object. This includes
running hooks, writing the trees, and interacting with the user. */
- if (!prepare_to_commit(index_file, prefix, &s)) {
+ if (git_sparse_pathspecs && !initial_commit) {
+ head_tree = parse_tree_indirect(head_sha1);
+ if (!head_tree)
+ die("failed to unpack HEAD tree object");
+ parse_tree(head_tree);
+ }
+ if (!prepare_to_commit(index_file, head_tree, prefix, &s)) {
rollback_index_files();
return 1;
}
diff --git a/builtin/merge.c b/builtin/merge.c
index cbb6c0d..f6fe51e 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -557,9 +557,9 @@ static int read_tree_trivial(unsigned char *common, unsigned char *head,
return 0;
}

-static void write_tree_trivial(unsigned char *sha1)
+static void write_tree_trivial(unsigned char *sha1, struct tree *head_tree)
{
- if (write_cache_as_tree(sha1, 0, NULL))
+ if (write_cache_as_tree(sha1, 0, head_tree, NULL))
die("git write-tree failed to write a tree");
}

@@ -778,12 +778,12 @@ static void add_strategies(const char *string, unsigned attr)

}

-static int merge_trivial(void)
+static int merge_trivial(struct tree *head_tree)
{
unsigned char result_tree[20], result_commit[20];
struct commit_list *parent = xmalloc(sizeof(*parent));

- write_tree_trivial(result_tree);
+ write_tree_trivial(result_tree, head_tree);
printf("Wonderful.\n");
parent->item = lookup_commit(head);
parent->next = xmalloc(sizeof(*parent->next));
@@ -901,6 +901,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
const char *head_arg;
int flag, head_invalid = 0, i;
int best_cnt = -1, merge_was_ok = 0, automerge_was_ok = 0;
+ struct commit *head_commit;
struct commit_list *common = NULL;
const char *best_strategy = NULL, *wt_strategy = NULL;
struct commit_list **remotes = &remoteheads;
@@ -1056,12 +1057,12 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
allow_trivial = 0;
}

+ head_commit = lookup_commit(head);
if (!remoteheads->next)
- common = get_merge_bases(lookup_commit(head),
- remoteheads->item, 1);
+ common = get_merge_bases(head_commit, remoteheads->item, 1);
else {
struct commit_list *list = remoteheads;
- commit_list_insert(lookup_commit(head), &list);
+ commit_list_insert(head_commit, &list);
common = get_octopus_merge_bases(list);
free(list);
}
@@ -1127,7 +1128,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
printf("Trying really trivial in-index merge...\n");
if (!read_tree_trivial(common->item->object.sha1,
head, remoteheads->item->object.sha1))
- return merge_trivial();
+ return merge_trivial(head_commit->tree);
printf("Nope.\n");
}
} else {
@@ -1232,7 +1233,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
}

/* Automerge succeeded. */
- write_tree_trivial(result_tree);
+ write_tree_trivial(result_tree, head_commit->tree);
automerge_was_ok = 1;
break;
}
diff --git a/builtin/revert.c b/builtin/revert.c
index 4b47ace..4e95589 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -404,7 +404,12 @@ static int do_pick_commit(void)
* that represents the "current" state for merge-recursive
* to work on.
*/
- if (write_cache_as_tree(head, 0, NULL))
+ struct tree *non_sparse_tree;
+ if (get_sha1("HEAD", head))
+ non_sparse_tree = NULL;
+ else
+ non_sparse_tree = lookup_commit(head)->tree;
+ if (write_cache_as_tree(head, 0, non_sparse_tree, NULL))
die ("Your index file is unmerged.");
} else {
if (get_sha1("HEAD", head))
diff --git a/builtin/write-tree.c b/builtin/write-tree.c
index b223af4..1e459b6 100644
--- a/builtin/write-tree.c
+++ b/builtin/write-tree.c
@@ -19,6 +19,7 @@ int cmd_write_tree(int argc, const char **argv, const char *unused_prefix)
int flags = 0, ret;
const char *prefix = NULL;
unsigned char sha1[20];
+ struct tree *non_sparse_tree = NULL;
const char *me = "git-write-tree";
struct option write_tree_options[] = {
OPT_BIT(0, "missing-ok", &flags, "allow missing objects",
@@ -37,7 +38,10 @@ int cmd_write_tree(int argc, const char **argv, const char *unused_prefix)
argc = parse_options(argc, argv, unused_prefix, write_tree_options,
write_tree_usage, 0);

- ret = write_cache_as_tree(sha1, flags, prefix);
+ if (git_sparse_pathspecs)
+ /* FIXME: Let user specify non_sparse_tree? Just use HEAD? */
+ die("Error: write-tree in a sparse repository would clip paths outside sparse region.");
+ ret = write_cache_as_tree(sha1, flags, non_sparse_tree, prefix);
switch (ret) {
case 0:
printf("%s\n", sha1_to_hex(sha1));
diff --git a/cache-tree.c b/cache-tree.c
index 2ba6a76..4309fa8 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -452,6 +452,7 @@ static int update_one(struct cache_tree *it,
int cache_tree_update(struct cache_tree *it,
struct cache_entry **cache,
int entries,
+ struct tree *head,
int missing_ok,
int dryrun)
{
@@ -460,7 +461,7 @@ int cache_tree_update(struct cache_tree *it,
if (i)
return i;

- i = update_one(it, cache, entries, NULL, "", 0, missing_ok, dryrun);
+ i = update_one(it, cache, entries, head, "", 0, missing_ok, dryrun);
if (i < 0)
return i;
return 0;
@@ -629,7 +630,7 @@ static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *pat
return it;
}

-int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix)
+int write_cache_as_tree(unsigned char *sha1, int flags, struct tree *head, const char *prefix)
{
int entries, was_valid, newfd;
struct lock_file *lock_file;
@@ -657,6 +658,7 @@ int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix)

if (cache_tree_update(active_cache_tree,
active_cache, active_nr,
+ head,
missing_ok, 0) < 0)
return WRITE_TREE_UNMERGED_INDEX;
if (0 <= newfd) {
diff --git a/cache-tree.h b/cache-tree.h
index 3df641f..85cc016 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -29,7 +29,7 @@ void cache_tree_write(struct strbuf *, struct cache_tree *root);
struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);

int cache_tree_fully_valid(struct cache_tree *);
-int cache_tree_update(struct cache_tree *, struct cache_entry **, int, int, int);
+int cache_tree_update(struct cache_tree *, struct cache_entry **, int, struct tree*, int, int);

/* bitmasks to write_cache_as_tree flags */
#define WRITE_TREE_MISSING_OK 1
@@ -40,7 +40,7 @@ int cache_tree_update(struct cache_tree *, struct cache_entry **, int, int, int)
#define WRITE_TREE_UNMERGED_INDEX (-2)
#define WRITE_TREE_PREFIX_ERROR (-3)

-int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix);
+int write_cache_as_tree(unsigned char *sha1, int flags, struct tree *head, const char *prefix);
void prime_cache_tree(struct cache_tree **, struct tree *);

extern int cache_tree_matches_traversal(struct cache_tree *, struct name_entry *ent, struct traverse_info *info);
diff --git a/merge-recursive.c b/merge-recursive.c
index 325a97b..b74c29f 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -191,7 +191,7 @@ static int git_merge_trees(int index_only,
return rc;
}

-struct tree *write_tree_from_memory(struct merge_options *o)
+struct tree *write_tree_from_memory(struct merge_options *o, struct tree *head)
{
struct tree *result = NULL;

@@ -212,7 +212,7 @@ struct tree *write_tree_from_memory(struct merge_options *o)

if (!cache_tree_fully_valid(active_cache_tree) &&
cache_tree_update(active_cache_tree,
- active_cache, active_nr, 0, 0) < 0)
+ active_cache, active_nr, head, 0, 0) < 0)
die("error building trees");

result = lookup_tree(active_cache_tree->sha1);
@@ -1361,7 +1361,7 @@ int merge_trees(struct merge_options *o,
clean = 1;

if (o->call_depth)
- *result = write_tree_from_memory(o);
+ *result = write_tree_from_memory(o, head);

return clean;
}
diff --git a/merge-recursive.h b/merge-recursive.h
index 2eb5d1a..26110e2 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -51,7 +51,7 @@ int merge_recursive_generic(struct merge_options *o,
struct commit **result);

void init_merge_options(struct merge_options *o);
-struct tree *write_tree_from_memory(struct merge_options *o);
+struct tree *write_tree_from_memory(struct merge_options *o, struct tree *head);

int parse_merge_opt(struct merge_options *out, const char *s);

diff --git a/t/t5720-sparse-repository-basics.sh b/t/t5720-sparse-repository-basics.sh
index d04e171..88f40ab 100755
--- a/t/t5720-sparse-repository-basics.sh
+++ b/t/t5720-sparse-repository-basics.sh
@@ -115,7 +115,7 @@ test_expect_success 'basic: add works' '
git add sub/b/whatever
'

-test_expect_failure 'basic: commit works' '
+test_expect_success 'basic: commit works' '
git commit -m "Commit in a sparse clone" &&
git rev-parse master^{tree} &&
git rev-parse master:sub &&
diff --git a/test-dump-cache-tree.c b/test-dump-cache-tree.c
index 1f73f1e..ef0644f 100644
--- a/test-dump-cache-tree.c
+++ b/test-dump-cache-tree.c
@@ -57,8 +57,9 @@ static int dump_cache_tree(struct cache_tree *it,
int main(int ac, char **av)
{
struct cache_tree *another = cache_tree();
+ struct tree *non_sparse_tree = NULL; /* We don't test sparse clones */
if (read_cache() < 0)
die("unable to read index file");
- cache_tree_update(another, active_cache, active_nr, 0, 1);
+ cache_tree_update(another, active_cache, active_nr, non_sparse_tree, 0, 1);
return dump_cache_tree(active_cache_tree, another, "");
}
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:13:59 UTC
Permalink
cache_tree_update() will write trees using the index. With sparse clones,
the index will only contain entries matching the sparse limits, meaning
that the index does not provide enough information to write complete tree
objects. Having cache_tree_update() take a tree (typically HEAD), will
allow new complete trees to be constructed by using entries from the
specified tree outside the sparse limits together with the index.

Of course, in the non-sparse clone case, cache_tree_update() needs no tree
to be provided since the index contains all relevant information. So
providing NULL as the tree is supported as another way of getting the
traditional behavior.

This patch only provides the new capability without invoking it; subsequent
patches will update callers of cache_tree_update() to provide a relevant
tree.

Signed-off-by: Elijah Newren <***@gmail.com>
---
cache-tree.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 87 insertions(+), 1 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index c60cf91..2ba6a76 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -2,6 +2,7 @@
#include "tree.h"
#include "tree-walk.h"
#include "cache-tree.h"
+#include "diff.h" /* FIXME: for tree_entry_interesting; maybe it should be in tree-walk.h? */

#ifndef DEBUG
#define DEBUG 0
@@ -232,15 +233,91 @@ int cache_tree_fully_valid(struct cache_tree *it)
return 1;
}

+static void setup_tree_desc(struct tree_desc *desc, struct tree* tree)
+{
+ if (git_sparse_pathspecs && tree)
+ init_tree_desc(desc, tree->buffer, tree->size);
+ else
+ init_tree_desc(desc, NULL, 0);
+}
+
+static struct tree* find_non_sparse_subtree(struct tree_desc *desc,
+ const char *path,
+ int pathlen)
+{
+ struct name_entry entry;
+
+ if (!git_sparse_pathspecs)
+ return NULL; /* No tree needed or useful */
+
+ while (tree_entry(desc, &entry))
+ if (!memcmp(entry.path, path, pathlen) &&
+ strlen(entry.path) == pathlen) {
+ struct tree *subtree = lookup_tree(entry.sha1);
+ if (parse_tree(subtree) < 0)
+ die("bad tree %s", sha1_to_hex(entry.sha1));
+ return subtree;
+ }
+ return NULL;
+}
+
+static void add_missing_paths_before(const char *path,
+ int pathlen,
+ int baselen,
+ struct tree_desc *desc,
+ struct strbuf *buffer)
+{
+ if (!git_sparse_pathspecs)
+ return; /* No paths are missing */
+
+ for (; desc->size; update_tree_entry(desc)) {
+ struct name_entry entry = desc->entry;
+ int entlen = strlen(entry.path);
+
+ /* We only want paths before path */
+ if (path)
+ /*
+ * FIXME: if entlen < pathlen, do I need to
+ * use strncmp instead? Does entry.path being
+ * NUL-terminated ensure memcmp exits
+ * early?
+ */
+ if (memcmp(entry.path,
+ path+baselen,
+ pathlen-baselen) >= 0)
+ break;
+
+ /* We only want paths "missing" from index due to sparsity */
+ if (path) {
+ int show = tree_entry_interesting(desc,
+ path,
+ baselen,
+ &git_sparse_diffopts);
+ if (show == 2) {
+ desc->size = 0;
+ break;
+ }
+ else if (show > 0)
+ continue;
+ }
+
+ strbuf_grow(buffer, entlen + 100);
+ strbuf_addf(buffer, "%o %.*s%c", entry.mode, entlen, entry.path, '\0');
+ strbuf_add(buffer, entry.sha1, 20);
+ }
+}
+
static int update_one(struct cache_tree *it,
struct cache_entry **cache,
int entries,
+ struct tree *non_sparse_tree,
const char *base,
int baselen,
int missing_ok,
int dryrun)
{
struct strbuf buffer;
+ struct tree_desc desc;
int i;

if (0 <= it->entry_count && has_sha1_file(it->sha1))
@@ -257,9 +334,11 @@ static int update_one(struct cache_tree *it,
/*
* Find the subtrees and update them.
*/
+ setup_tree_desc(&desc, non_sparse_tree);
for (i = 0; i < entries; i++) {
struct cache_entry *ce = cache[i];
struct cache_tree_sub *sub;
+ struct tree *subtree;
const char *path, *slash;
int pathlen, sublen, subcnt;

@@ -280,8 +359,10 @@ static int update_one(struct cache_tree *it,
sub = find_subtree(it, path + baselen, sublen, 1);
if (!sub->cache_tree)
sub->cache_tree = cache_tree();
+ subtree = find_non_sparse_subtree(&desc, path+baselen, sublen);
subcnt = update_one(sub->cache_tree,
cache + i, entries - i,
+ subtree,
path,
baselen + sublen + 1,
missing_ok,
@@ -298,6 +379,7 @@ static int update_one(struct cache_tree *it,
* Then write out the tree object for this level.
*/
strbuf_init(&buffer, 8192);
+ setup_tree_desc(&desc, non_sparse_tree);

for (i = 0; i < entries; i++) {
struct cache_entry *ce = cache[i];
@@ -309,6 +391,8 @@ static int update_one(struct cache_tree *it,

path = ce->name;
pathlen = ce_namelen(ce);
+ add_missing_paths_before(path, pathlen, baselen,
+ &desc, &buffer);
if (pathlen <= baselen || memcmp(base, path, baselen))
break; /* at the end of this level */

@@ -346,6 +430,7 @@ static int update_one(struct cache_tree *it,
mode, entlen, path + baselen);
#endif
}
+ add_missing_paths_before(NULL, 0, 0, &desc, &buffer);

if (dryrun)
hash_sha1_file(buffer.buf, buffer.len, tree_type, it->sha1);
@@ -374,7 +459,8 @@ int cache_tree_update(struct cache_tree *it,
i = verify_cache(cache, entries);
if (i)
return i;
- i = update_one(it, cache, entries, "", 0, missing_ok, dryrun);
+
+ i = update_one(it, cache, entries, NULL, "", 0, missing_ok, dryrun);
if (i < 0)
return i;
return 0;
--
1.7.2.2.140.gd06af
Nguyen Thai Ngoc Duy
2010-09-05 07:54:54 UTC
Permalink
cache_tree_update() will write trees using the index. =C2=A0With spar=
se clones,
the index will only contain entries matching the sparse limits, meani=
ng
that the index does not provide enough information to write complete =
tree
objects. =C2=A0Having cache_tree_update() take a tree (typically HEAD=
), will
allow new complete trees to be constructed by using entries from the
specified tree outside the sparse limits together with the index.
You are moving it closer to the index (from my view because I changed
in commit_tree()). This makes me think, why don't you move the base
tree into the index itself?

The index is supposed to save the image of full worktree. While you
don't have all path names, you have the clue to all of them, the base
tree. To me, that means it belongs to the index. That would reduce
code change to
- cache-tree.c (generate new tree from the base tree and index)
- read-cache.c (new sparse-clone index extension)
- index writing operations (save the base tree in index): read-tree an=
d merge
--=20
Duy
Elijah Newren
2010-09-05 21:09:24 UTC
Permalink
Post by Nguyen Thai Ngoc Duy
cache_tree_update() will write trees using the index. =C2=A0With spa=
rse clones,
Post by Nguyen Thai Ngoc Duy
the index will only contain entries matching the sparse limits, mean=
ing
Post by Nguyen Thai Ngoc Duy
that the index does not provide enough information to write complete=
tree
Post by Nguyen Thai Ngoc Duy
objects. =C2=A0Having cache_tree_update() take a tree (typically HEA=
D), will
Post by Nguyen Thai Ngoc Duy
allow new complete trees to be constructed by using entries from the
specified tree outside the sparse limits together with the index.
You are moving it closer to the index (from my view because I changed
in commit_tree()). This makes me think, why don't you move the base
tree into the index itself?
The index is supposed to save the image of full worktree. While you
don't have all path names, you have the clue to all of them, the base
tree. To me, that means it belongs to the index. That would reduce
code change to
=C2=A0- cache-tree.c (generate new tree from the base tree and index)
=C2=A0- read-cache.c (new sparse-clone index extension)
=C2=A0- index writing operations (save the base tree in index): read-=
tree and merge

That's a really good idea. I'll look into that.
Elijah Newren
2010-09-06 04:42:53 UTC
Permalink
cache_tree_update() will write trees using the index. =C2=A0With sp=
arse clones,
the index will only contain entries matching the sparse limits, mea=
ning
that the index does not provide enough information to write complet=
e tree
objects. =C2=A0Having cache_tree_update() take a tree (typically HE=
AD), will
allow new complete trees to be constructed by using entries from th=
e
specified tree outside the sparse limits together with the index.
You are moving it closer to the index (from my view because I change=
d
in commit_tree()). This makes me think, why don't you move the base
tree into the index itself?
The index is supposed to save the image of full worktree. While you
don't have all path names, you have the clue to all of them, the bas=
e
tree. To me, that means it belongs to the index. That would reduce
code change to
=C2=A0- cache-tree.c (generate new tree from the base tree and index=
)
=C2=A0- read-cache.c (new sparse-clone index extension)
=C2=A0- index writing operations (save the base tree in index): read=
-tree and merge
That's a really good idea. =C2=A0I'll look into that.
Hmm..maybe before I get ahead of myself, I should back up for a
second. Which way do we think is better -- handling this in
cache_tree_update() or doing a fixup in commit_tree()? If the latter,
then I should just drop my 7/15 and 8/15 for your 13/17 and 14/17. If
the former, then it makes sense to look into this change you suggest.
In that case, we'd probably keep my 7/15 but drop 8/15 for patch(es)
that implement your idea above. But you're more familiar with the
index format than I am and it's your idea, so you'd probably be the
better one to implement it.

Thoughts?

Elijah
Nguyen Thai Ngoc Duy
2010-09-06 05:02:50 UTC
Permalink
Post by Elijah Newren
cache_tree_update() will write trees using the index. =C2=A0With s=
parse clones,
Post by Elijah Newren
the index will only contain entries matching the sparse limits, me=
aning
Post by Elijah Newren
that the index does not provide enough information to write comple=
te tree
Post by Elijah Newren
objects. =C2=A0Having cache_tree_update() take a tree (typically H=
EAD), will
Post by Elijah Newren
allow new complete trees to be constructed by using entries from t=
he
Post by Elijah Newren
specified tree outside the sparse limits together with the index.
You are moving it closer to the index (from my view because I chang=
ed
Post by Elijah Newren
in commit_tree()). This makes me think, why don't you move the base
tree into the index itself?
The index is supposed to save the image of full worktree. While you
don't have all path names, you have the clue to all of them, the ba=
se
Post by Elijah Newren
tree. To me, that means it belongs to the index. That would reduce
code change to
=C2=A0- cache-tree.c (generate new tree from the base tree and inde=
x)
Post by Elijah Newren
=C2=A0- read-cache.c (new sparse-clone index extension)
=C2=A0- index writing operations (save the base tree in index): rea=
d-tree and merge
Post by Elijah Newren
That's a really good idea. =C2=A0I'll look into that.
Hmm..maybe before I get ahead of myself, I should back up for a
second. =C2=A0Which way do we think is better -- handling this in
cache_tree_update() or doing a fixup in commit_tree()? =C2=A0If the l=
atter,
Post by Elijah Newren
then I should just drop my 7/15 and 8/15 for your 13/17 and 14/17. =C2=
=A0If
Post by Elijah Newren
the former, then it makes sense to look into this change you suggest.
In that case, we'd probably keep my 7/15 but drop 8/15 for patch(es)
that implement your idea above. =C2=A0But you're more familiar with t=
he
Post by Elijah Newren
index format than I am and it's your idea, so you'd probably be the
better one to implement it.
Thoughts?
The former makes more sense. I still keep an eye on remote merge
support. Think of this case: you request a remote merge and have a
completely new base tree, different from HEAD. Then you get some
conflicts inside narrow/sparse area. By the time you resolve all
conflicts and are about to commit, where do you get the base tree?

It could follow the same way merge does, store it in a file in
$GIT_DIR, similar to $GIT_DIR/MERGE_HEAD, and make "git commit" pick
it up. Or just store it in index. You would need to (or should) change
the index anyway, to prevent naive git implementation from using it.
It makes more sense to put some more in index rather than a separate
file. Some commands (especially "git commit") make use of temporary
index. I think putting the base tree in index is a good point here,
but I'm not quite sure.

Anyway if you want to play with it, apply this on top of my 04/17
(then change index_state.narrow_base to something suitable for sparse
clone)

diff --git a/cache.h b/cache.h
index 9c014ef..c7d626d 100644
--- a/cache.h
+++ b/cache.h
@@ -293,6 +293,7 @@ struct index_state {
struct cache_tree *cache_tree;
struct cache_time timestamp;
char *narrow_prefix;
+ unsigned char narrow_base[20];
void *alloc;
unsigned name_hash_initialized : 1,
initialized : 1;
diff --git a/read-cache.c b/read-cache.c
index 250013c..15c1c9e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1191,7 +1191,8 @@ static int read_index_extension(struct
index_state *istate,
istate->resolve_undo =3D resolve_undo_read(data, sz);
break;
case CACHE_EXT_NARROW:
- istate->narrow_prefix =3D xstrdup(data);
+ hashcpy(istate->narrow_base, data);
+ istate->narrow_prefix =3D xstrdup(data+sizeof(istate->narrow_base));
break;
default:
if (*ext < 'A' || 'Z' < *ext)
@@ -1396,6 +1397,7 @@ int discard_index(struct index_state *istate)
istate->alloc =3D NULL;
istate->initialized =3D 0;
free(istate->narrow_prefix);
+ hashcpy(istate->narrow_base, (const unsigned char *)EMPTY_TREE_SHA1_B=
IN);
istate->narrow_prefix =3D NULL;

/* no need to throw away allocated active_cache */
@@ -1627,11 +1629,15 @@ int write_index(struct index_state *istate, int=
newfd)
return -1;
}
if (get_narrow_prefix()) {
+ struct strbuf sb =3D STRBUF_INIT;
char *buf =3D get_narrow_string();
- int len =3D strlen(buf)+1;
- err =3D write_index_ext_header(&c, newfd, CACHE_EXT_NARROW, len) < 0=
||
- ce_write(&c, newfd, buf, len) < 0;
+ int len =3D 20+strlen(buf)+1;
+ strbuf_add(&sb, istate->narrow_base, 20);
+ strbuf_addstr(&sb, buf);
free(buf);
+ err =3D write_index_ext_header(&c, newfd, CACHE_EXT_NARROW, len) < 0=
||
+ ce_write(&c, newfd, sb.buf, len) < 0;
+ strbuf_release(&sb);
if (err)
return -1;
}
--=20
Duy
Nguyễn Thái Ngọc Duy
2010-09-06 04:47:05 UTC
Permalink
Post by Elijah Newren
diff --git a/cache-tree.c b/cache-tree.c
index c60cf91..2ba6a76 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -2,6 +2,7 @@
#include "tree.h"
#include "tree-walk.h"
#include "cache-tree.h"
+#include "diff.h" /* FIXME: for tree_entry_interesting; maybe it sh=
ould be in tree-walk.h? */

Yes, please. And I'd rather see it done sooner than later, before
tree_entry_interesting() usage is spread over the place.

Elijah Newren (2):
Add testcases showing how pathspecs are ignored with rev-list
--objects
Make rev-list --objects work together with pathspecs

Nguy=E1=BB=85n Th=C3=A1i Ng=E1=BB=8Dc Duy (2):
tree-walk: copy tree_entry_interesting() as-is from tree-diff.c
tree-walk: actually move tree_entry_interesting() to tree-walk.c

list-objects.c | 25 ++++++++++
revision.c | 8 ++-
revision.h | 3 +-
t/t6000-rev-list-misc.sh | 51 +++++++++++++++++++
tree-diff.c | 121 ++------------------------------------=
--------
tree-walk.c | 110 ++++++++++++++++++++++++++++++++++++++=
+++
tree-walk.h | 3 +
7 files changed, 201 insertions(+), 120 deletions(-)
create mode 100755 t/t6000-rev-list-misc.sh
Nguyễn Thái Ngọc Duy
2010-09-06 04:47:07 UTC
Permalink
Just a straight copy. The function is not used anywhere. It is to
separate changes that will be made to this function in the next patch.

Signed-off-by: Nguy=E1=BB=85n Th=C3=A1i Ng=E1=BB=8Dc Duy <***@gmail=
=2Ecom>
---
tree-walk.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++=
++++++++
1 files changed, 115 insertions(+), 0 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index a9bbf4e..bc83fa3 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -2,6 +2,7 @@
#include "tree-walk.h"
#include "unpack-trees.h"
#include "tree.h"
+#include "diff.h"
=20
static const char *get_mode(const char *str, unsigned int *modep)
{
@@ -455,3 +456,117 @@ int get_tree_entry(const unsigned char *tree_sha1=
, const char *name, unsigned ch
free(tree);
return retval;
}
+
+/*
+ * Is a tree entry interesting given the pathspec we have?
+ *
+ * Return:
+ * - 2 for "yes, and all subsequent entries will be"
+ * - 1 for yes
+ * - zero for no
+ * - negative for "no, and no subsequent entries will be either"
+ */
+static int tree_entry_interesting(struct tree_desc *desc, const char *=
base, int baselen, struct diff_options *opt)
+{
+ const char *path;
+ const unsigned char *sha1;
+ unsigned mode;
+ int i;
+ int pathlen;
+ int never_interesting =3D -1;
+
+ if (!opt->nr_paths)
+ return 1;
+
+ sha1 =3D tree_entry_extract(desc, &path, &mode);
+
+ pathlen =3D tree_entry_len(path, sha1);
+
+ for (i =3D 0; i < opt->nr_paths; i++) {
+ const char *match =3D opt->paths[i];
+ int matchlen =3D opt->pathlens[i];
+ int m =3D -1; /* signals that we haven't called strncmp() */
+
+ if (baselen >=3D matchlen) {
+ /* If it doesn't match, move along... */
+ if (strncmp(base, match, matchlen))
+ continue;
+
+ /*
+ * If the base is a subdirectory of a path which
+ * was specified, all of them are interesting.
+ */
+ if (!matchlen ||
+ base[matchlen] =3D=3D '/' ||
+ match[matchlen - 1] =3D=3D '/')
+ return 2;
+
+ /* Just a random prefix match */
+ continue;
+ }
+
+ /* Does the base match? */
+ if (strncmp(base, match, baselen))
+ continue;
+
+ match +=3D baselen;
+ matchlen -=3D baselen;
+
+ if (never_interesting) {
+ /*
+ * We have not seen any match that sorts later
+ * than the current path.
+ */
+
+ /*
+ * Does match sort strictly earlier than path
+ * with their common parts?
+ */
+ m =3D strncmp(match, path,
+ (matchlen < pathlen) ? matchlen : pathlen);
+ if (m < 0)
+ continue;
+
+ /*
+ * If we come here even once, that means there is at
+ * least one pathspec that would sort equal to or
+ * later than the path we are currently looking at.
+ * In other words, if we have never reached this point
+ * after iterating all pathspecs, it means all
+ * pathspecs are either outside of base, or inside the
+ * base but sorts strictly earlier than the current
+ * one. In either case, they will never match the
+ * subsequent entries. In such a case, we initialized
+ * the variable to -1 and that is what will be
+ * returned, allowing the caller to terminate early.
+ */
+ never_interesting =3D 0;
+ }
+
+ if (pathlen > matchlen)
+ continue;
+
+ if (matchlen > pathlen) {
+ if (match[pathlen] !=3D '/')
+ continue;
+ if (!S_ISDIR(mode))
+ continue;
+ }
+
+ if (m =3D=3D -1)
+ /*
+ * we cheated and did not do strncmp(), so we do
+ * that here.
+ */
+ m =3D strncmp(match, path, pathlen);
+
+ /*
+ * If common part matched earlier then it is a hit,
+ * because we rejected the case where path is not a
+ * leading directory and is shorter than match.
+ */
+ if (!m)
+ return 1;
+ }
+ return never_interesting; /* No matches */
+}
--=20
1.7.1.rc1.69.g24c2f7
Elijah Newren
2010-09-06 15:22:38 UTC
Permalink
Post by Nguyễn Thái Ngọc Duy
Just a straight copy. The function is not used anywhere. It is to
separate changes that will be made to this function in the next patch=
=2E

Why copy instead of move?
Nguyen Thai Ngoc Duy
2010-09-06 22:09:56 UTC
Permalink
Post by Elijah Newren
Post by Nguyễn Thái Ngọc Duy
Just a straight copy. The function is not used anywhere. It is to
separate changes that will be made to this function in the next patc=
h.
Post by Elijah Newren
Why copy instead of move?
Right. I wanted to keep it in buildable state at anytime so I made
zero changes to tree-diff.c. But you're right, that should be a move
and make sure the code builds.
--=20
Duy
Nguyễn Thái Ngọc Duy
2010-09-06 04:47:06 UTC
Permalink
=46rom: Elijah Newren <***@gmail.com>

Signed-off-by: Elijah Newren <***@gmail.com>
Signed-off-by: Junio C Hamano <***@pobox.com>
Signed-off-by: Nguy=E1=BB=85n Th=C3=A1i Ng=E1=BB=8Dc Duy <***@gmail=
=2Ecom>
---
No changes.

t/t6000-rev-list-misc.sh | 51 ++++++++++++++++++++++++++++++++++++++=
++++++++
1 files changed, 51 insertions(+), 0 deletions(-)
create mode 100755 t/t6000-rev-list-misc.sh

diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
new file mode 100755
index 0000000..b3c1dd8
--- /dev/null
+++ b/t/t6000-rev-list-misc.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+
+test_description=3D'miscellaneous rev-list tests'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+ echo content1 >wanted_file &&
+ echo content2 >unwanted_file &&
+ git add wanted_file unwanted_file &&
+ git commit -m one
+'
+
+test_expect_failure 'rev-list --objects heeds pathspecs' '
+ git rev-list --objects HEAD -- wanted_file >output &&
+ grep wanted_file output &&
+ ! grep unwanted_file output
+'
+
+test_expect_failure 'rev-list --objects with pathspecs and deeper path=
s' '
+ mkdir foo &&
+ >foo/file &&
+ git add foo/file &&
+ git commit -m two &&
+
+ git rev-list --objects HEAD -- foo >output &&
+ grep foo/file output &&
+
+ git rev-list --objects HEAD -- foo/file >output &&
+ grep foo/file output &&
+ ! grep unwanted_file output
+'
+
+test_expect_failure 'rev-list --objects with pathspecs and copied file=
s' '
+ git checkout --orphan junio-testcase &&
+ git rm -rf . &&
+
+ mkdir two &&
+ echo frotz >one &&
+ cp one two/three &&
+ git add one two/three &&
+ test_tick &&
+ git commit -m that &&
+
+ ONE=3D$(git rev-parse HEAD:one)
+ git rev-list --objects HEAD two >output &&
+ grep "$ONE two/three" output &&
+ ! grep one output
+'
+
+test_done
--=20
1.7.1.rc1.69.g24c2f7
Nguyễn Thái Ngọc Duy
2010-09-06 04:47:08 UTC
Permalink
This function can be potentially used in more places than just
tree-diff.c. This patches removes struct diff_options dependency from
the function, and moves it to tree-walk.c.

No functionality change intended.

Signed-off-by: Nguy=E1=BB=85n Th=C3=A1i Ng=E1=BB=8Dc Duy <***@gmail=
=2Ecom>
---
tree-diff.c | 121 ++-------------------------------------------------=
--------
tree-walk.c | 25 +++++-------
tree-walk.h | 3 +
3 files changed, 17 insertions(+), 132 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index cd659c6..332f0c8 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -82,120 +82,6 @@ static int compare_tree_entry(struct tree_desc *t1,=
struct tree_desc *t2, const
return 0;
}
=20
-/*
- * Is a tree entry interesting given the pathspec we have?
- *
- * Return:
- * - 2 for "yes, and all subsequent entries will be"
- * - 1 for yes
- * - zero for no
- * - negative for "no, and no subsequent entries will be either"
- */
-static int tree_entry_interesting(struct tree_desc *desc, const char *=
base, int baselen, struct diff_options *opt)
-{
- const char *path;
- const unsigned char *sha1;
- unsigned mode;
- int i;
- int pathlen;
- int never_interesting =3D -1;
-
- if (!opt->nr_paths)
- return 1;
-
- sha1 =3D tree_entry_extract(desc, &path, &mode);
-
- pathlen =3D tree_entry_len(path, sha1);
-
- for (i =3D 0; i < opt->nr_paths; i++) {
- const char *match =3D opt->paths[i];
- int matchlen =3D opt->pathlens[i];
- int m =3D -1; /* signals that we haven't called strncmp() */
-
- if (baselen >=3D matchlen) {
- /* If it doesn't match, move along... */
- if (strncmp(base, match, matchlen))
- continue;
-
- /*
- * If the base is a subdirectory of a path which
- * was specified, all of them are interesting.
- */
- if (!matchlen ||
- base[matchlen] =3D=3D '/' ||
- match[matchlen - 1] =3D=3D '/')
- return 2;
-
- /* Just a random prefix match */
- continue;
- }
-
- /* Does the base match? */
- if (strncmp(base, match, baselen))
- continue;
-
- match +=3D baselen;
- matchlen -=3D baselen;
-
- if (never_interesting) {
- /*
- * We have not seen any match that sorts later
- * than the current path.
- */
-
- /*
- * Does match sort strictly earlier than path
- * with their common parts?
- */
- m =3D strncmp(match, path,
- (matchlen < pathlen) ? matchlen : pathlen);
- if (m < 0)
- continue;
-
- /*
- * If we come here even once, that means there is at
- * least one pathspec that would sort equal to or
- * later than the path we are currently looking at.
- * In other words, if we have never reached this point
- * after iterating all pathspecs, it means all
- * pathspecs are either outside of base, or inside the
- * base but sorts strictly earlier than the current
- * one. In either case, they will never match the
- * subsequent entries. In such a case, we initialized
- * the variable to -1 and that is what will be
- * returned, allowing the caller to terminate early.
- */
- never_interesting =3D 0;
- }
-
- if (pathlen > matchlen)
- continue;
-
- if (matchlen > pathlen) {
- if (match[pathlen] !=3D '/')
- continue;
- if (!S_ISDIR(mode))
- continue;
- }
-
- if (m =3D=3D -1)
- /*
- * we cheated and did not do strncmp(), so we do
- * that here.
- */
- m =3D strncmp(match, path, pathlen);
-
- /*
- * If common part matched earlier then it is a hit,
- * because we rejected the case where path is not a
- * leading directory and is shorter than match.
- */
- if (!m)
- return 1;
- }
- return never_interesting; /* No matches */
-}
-
/* A whole sub-tree went away or appeared */
static void show_tree(struct diff_options *opt, const char *prefix, st=
ruct tree_desc *desc, const char *base, int baselen)
{
@@ -206,8 +92,8 @@ static void show_tree(struct diff_options *opt, cons=
t char *prefix, struct tree_
if (all_interesting)
show =3D 1;
else {
- show =3D tree_entry_interesting(desc, base, baselen,
- opt);
+ show =3D tree_entry_interesting(&desc->entry, base, baselen,
+ opt->paths, opt->pathlens, opt->nr_paths);
if (show =3D=3D 2)
all_interesting =3D 1;
}
@@ -266,7 +152,8 @@ static void skip_uninteresting(struct tree_desc *t,=
const char *base, int basele
if (all_interesting)
show =3D 1;
else {
- show =3D tree_entry_interesting(t, base, baselen, opt);
+ show =3D tree_entry_interesting(&t->entry, base, baselen,
+ opt->paths, opt->pathlens, opt->nr_paths);
if (show =3D=3D 2)
all_interesting =3D 1;
}
diff --git a/tree-walk.c b/tree-walk.c
index bc83fa3..c665e35 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -2,7 +2,6 @@
#include "tree-walk.h"
#include "unpack-trees.h"
#include "tree.h"
-#include "diff.h"
=20
static const char *get_mode(const char *str, unsigned int *modep)
{
@@ -466,25 +465,21 @@ int get_tree_entry(const unsigned char *tree_sha1=
, const char *name, unsigned ch
* - zero for no
* - negative for "no, and no subsequent entries will be either"
*/
-static int tree_entry_interesting(struct tree_desc *desc, const char *=
base, int baselen, struct diff_options *opt)
+int tree_entry_interesting(const struct name_entry *entry, const char =
*base, int baselen,
+ const char **paths, int *pathlens, int nr_paths)
{
- const char *path;
- const unsigned char *sha1;
- unsigned mode;
int i;
int pathlen;
int never_interesting =3D -1;
=20
- if (!opt->nr_paths)
+ if (!nr_paths)
return 1;
=20
- sha1 =3D tree_entry_extract(desc, &path, &mode);
-
- pathlen =3D tree_entry_len(path, sha1);
+ pathlen =3D tree_entry_len(entry->path, entry->sha1);
=20
- for (i =3D 0; i < opt->nr_paths; i++) {
- const char *match =3D opt->paths[i];
- int matchlen =3D opt->pathlens[i];
+ for (i =3D 0; i < nr_paths; i++) {
+ const char *match =3D paths[i];
+ int matchlen =3D pathlens[i];
int m =3D -1; /* signals that we haven't called strncmp() */
=20
if (baselen >=3D matchlen) {
@@ -522,7 +517,7 @@ static int tree_entry_interesting(struct tree_desc =
*desc, const char *base, int
* Does match sort strictly earlier than path
* with their common parts?
*/
- m =3D strncmp(match, path,
+ m =3D strncmp(match, entry->path,
(matchlen < pathlen) ? matchlen : pathlen);
if (m < 0)
continue;
@@ -549,7 +544,7 @@ static int tree_entry_interesting(struct tree_desc =
*desc, const char *base, int
if (matchlen > pathlen) {
if (match[pathlen] !=3D '/')
continue;
- if (!S_ISDIR(mode))
+ if (!S_ISDIR(entry->mode))
continue;
}
=20
@@ -558,7 +553,7 @@ static int tree_entry_interesting(struct tree_desc =
*desc, const char *base, int
* we cheated and did not do strncmp(), so we do
* that here.
*/
- m =3D strncmp(match, path, pathlen);
+ m =3D strncmp(match, entry->path, pathlen);
=20
/*
* If common part matched earlier then it is a hit,
diff --git a/tree-walk.h b/tree-walk.h
index 88ea7e9..b20f3c4 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -57,4 +57,7 @@ static inline int traverse_path_len(const struct trav=
erse_info *info, const stru
return info->pathlen + tree_entry_len(n->path, n->sha1);
}
=20
+extern int tree_entry_interesting(const struct name_entry *entry, cons=
t char *base, int baselen,
+ const char **paths, int *pathlens, int nr_paths);
+
#endif
--=20
1.7.1.rc1.69.g24c2f7
Elijah Newren
2010-09-06 15:31:57 UTC
Permalink
Post by Nguyễn Thái Ngọc Duy
This function can be potentially used in more places than just
tree-diff.c. This patches removes struct diff_options dependency from
the function, and moves it to tree-walk.c.
No functionality change intended.
Thanks for working on this. I like having the declaration of
tree_entry_interesting() moved to tree_walk.h, at the very least. The
change to make tree_entry_interesting() take an entry instead of a
tree_desc makes sense too.

I'm unsure about replacing the diff_options with paths + pathlens +
nr_paths -- that might be exposing too much implementation detail in
the API. In particular, I'm worried that if we try to add support for
negated pathspecs or globs or regexes to tree_entry_interesting(),
then we'll need to pass different data to this function and update an
awful lot of callers.

Perhaps we should make a new struct containing paths + pathlens +
nr_paths, make tree_entry_interesting() take such a struct, modify
diff_options have such a struct instead of the current three paths,
pathlens, and nr_paths fields, and modify diff_tree_setup_path()s to
take such a struct instead of a diff_options* (and perhaps move
diff_tree_setup_paths() out of diff.h and tree-diff.c into some other
file(s)?).

Thoughts?

Elijah
Nguyen Thai Ngoc Duy
2010-09-06 22:20:06 UTC
Permalink
Post by Nguyễn Thái Ngọc Duy
This function can be potentially used in more places than just
tree-diff.c. This patches removes struct diff_options dependency fro=
m
Post by Nguyễn Thái Ngọc Duy
the function, and moves it to tree-walk.c.
No functionality change intended.
Thanks for working on this. =C2=A0I like having the declaration of
tree_entry_interesting() moved to tree_walk.h, at the very least. =C2=
=A0The
change to make tree_entry_interesting() take an entry instead of a
tree_desc makes sense too.
I'm unsure about replacing the diff_options with paths + pathlens +
nr_paths -- that might be exposing too much implementation detail in
the API. =C2=A0In particular, I'm worried that if we try to add suppo=
rt for
negated pathspecs or globs or regexes to tree_entry_interesting(),
then we'll need to pass different data to this function and update an
awful lot of callers.
Perhaps we should make a new struct containing paths + pathlens +
nr_paths, make tree_entry_interesting() take such a struct, modify
diff_options have such a struct instead of the current three paths,
pathlens, and nr_paths fields, and modify diff_tree_setup_path()s to
take such a struct instead of a diff_options* (and perhaps move
diff_tree_setup_paths() out of diff.h and tree-diff.c into some other
file(s)?).
Thoughts?
You're right again. Perhaps something like struct exclude_list from dir=
=2Eh?

struct pathspec_list {
int nr;
int alloc;
struct pathspec {
const char *path;
int pathlen;
int flags;
} **paths;
};

Hmm.. or just use struct exclude_list, with diff_options.path{s,len}
becoming exclude_list.base{,len}. exclude_list.pattern can be used for
glob/regex matching (if we are ever going to support that).
--=20
Duy
Junio C Hamano
2010-09-06 23:53:59 UTC
Permalink
Post by Nguyen Thai Ngoc Duy
Hmm.. or just use struct exclude_list, with diff_options.path{s,len}
becoming exclude_list.base{,len}. exclude_list.pattern can be used for
glob/regex matching (if we are ever going to support that).
If you are changing the API and structures around this area, you really
should aim to help us to support the globbing in diff-tree family just
like we do in ls-files family, so yes.
Nguyễn Thái Ngọc Duy
2010-09-06 04:47:09 UTC
Permalink
=46rom: Elijah Newren <***@gmail.com>

When traversing commits, the selection of commits would heed the list o=
f
pathspecs passed, but subsequent walking of the trees of those commits
would not. This resulted in 'rev-list --objects HEAD -- <paths>'
displaying objects at unwanted paths.

Have process_tree() call tree_entry_interesting() to determine which pa=
ths
are interesting and should be walked.

Naturally, this change can provide a large speedup when paths are speci=
fied
together with --objects, since many tree entries are now correctly igno=
red.
Interestingly, though, this change also gives me a small (~1%) but
repeatable speedup even when no paths are specified with --objects.

Signed-off-by: Elijah Newren <***@gmail.com>
Signed-off-by: Junio C Hamano <***@pobox.com>
Signed-off-by: Nguy=E1=BB=85n Th=C3=A1i Ng=E1=BB=8Dc Duy <***@gmail=
=2Ecom>
---
The "while (tree_entry(..))" loop makes a come back.
tree_entry_interesting() call is updated.
Nothing else is changed

list-objects.c | 25 +++++++++++++++++++++++++
revision.c | 8 ++++++--
revision.h | 3 ++-
t/t6000-rev-list-misc.sh | 6 +++---
4 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 8953548..8ea1822 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -67,6 +67,9 @@ static void process_tree(struct rev_info *revs,
struct tree_desc desc;
struct name_entry entry;
struct name_path me;
+ int all_interesting =3D (revs->diffopt.nr_paths =3D=3D 0);
+ char *full_prefix =3D NULL;
+ int full_prefix_len =3D 0;
=20
if (!revs->tree_objects)
return;
@@ -82,9 +85,30 @@ static void process_tree(struct rev_info *revs,
me.elem =3D name;
me.elem_len =3D strlen(name);
=20
+ if (!all_interesting) {
+ full_prefix =3D path_name_impl(path, name, 1);
+ full_prefix_len =3D strlen(full_prefix);
+ }
+
init_tree_desc(&desc, tree->buffer, tree->size);
=20
while (tree_entry(&desc, &entry)) {
+ if (!all_interesting) {
+ int showit =3D tree_entry_interesting(&entry,
+ full_prefix,
+ full_prefix_len,
+ revs->diffopt.paths,
+ revs->diffopt.pathlens,
+ revs->diffopt.nr_paths);
+
+ if (showit < 0)
+ break;
+ else if (!showit)
+ continue;
+ else if (showit =3D=3D 2)
+ all_interesting =3D 1;
+ }
+
if (S_ISDIR(entry.mode))
process_tree(revs,
lookup_tree(entry.sha1),
@@ -97,6 +121,7 @@ static void process_tree(struct rev_info *revs,
lookup_blob(entry.sha1),
show, &me, entry.path);
}
+ free(full_prefix);
free(tree->buffer);
tree->buffer =3D NULL;
}
diff --git a/revision.c b/revision.c
index b1c1890..55c4586 100644
--- a/revision.c
+++ b/revision.c
@@ -16,7 +16,7 @@
=20
volatile show_early_output_fn_t show_early_output;
=20
-char *path_name(const struct name_path *path, const char *name)
+char *path_name_impl(const struct name_path *path, const char *name, i=
nt isdir)
{
const struct name_path *p;
char *n, *m;
@@ -27,7 +27,7 @@ char *path_name(const struct name_path *path, const c=
har *name)
if (p->elem_len)
len +=3D p->elem_len + 1;
}
- n =3D xmalloc(len);
+ n =3D xmalloc(len + !!isdir);
m =3D n + len - (nlen + 1);
strcpy(m, name);
for (p =3D path; p; p =3D p->up) {
@@ -37,6 +37,10 @@ char *path_name(const struct name_path *path, const =
char *name)
m[p->elem_len] =3D '/';
}
}
+ if (isdir && len > 1) {
+ n[len-1] =3D '/';
+ n[len] =3D '\0';
+ }
return n;
}
=20
diff --git a/revision.h b/revision.h
index 05659c6..92f4feb 100644
--- a/revision.h
+++ b/revision.h
@@ -173,7 +173,8 @@ struct name_path {
const char *elem;
};
=20
-char *path_name(const struct name_path *path, const char *name);
+char *path_name_impl(const struct name_path *path, const char *name, i=
nt isdir);
+#define path_name(path, name) path_name_impl(path, name, 0)
=20
extern void add_object(struct object *obj,
struct object_array *p,
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index b3c1dd8..b10685a 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -11,13 +11,13 @@ test_expect_success setup '
git commit -m one
'
=20
-test_expect_failure 'rev-list --objects heeds pathspecs' '
+test_expect_success 'rev-list --objects heeds pathspecs' '
git rev-list --objects HEAD -- wanted_file >output &&
grep wanted_file output &&
! grep unwanted_file output
'
=20
-test_expect_failure 'rev-list --objects with pathspecs and deeper path=
s' '
+test_expect_success 'rev-list --objects with pathspecs and deeper path=
s' '
mkdir foo &&
Post by Nguyễn Thái Ngọc Duy
foo/file &&
git add foo/file &&
@@ -31,7 +31,7 @@ test_expect_failure 'rev-list --objects with pathspec=
s and deeper paths' '
! grep unwanted_file output
'
=20
-test_expect_failure 'rev-list --objects with pathspecs and copied file=
s' '
+test_expect_success 'rev-list --objects with pathspecs and copied file=
s' '
git checkout --orphan junio-testcase &&
git rm -rf . &&
=20
--=20
1.7.1.rc1.69.g24c2f7
Nguyen Thai Ngoc Duy
2010-09-07 01:28:10 UTC
Permalink
Post by Elijah Newren
+static void add_missing_paths_before(const char *path,
+                                    int pathlen,
+                                    int baselen,
+                                    struct tree_desc *desc,
+                                    struct strbuf *buffer)
+{
+       if (!git_sparse_pathspecs)
+               return; /* No paths are missing */
+
+       for (; desc->size; update_tree_entry(desc)) {
+               struct name_entry entry = desc->entry;
+               int entlen = strlen(entry.path);
+
+               /* We only want paths before path */
+               if (path)
+                       /*
+                        * FIXME: if entlen < pathlen, do I need to
+                        * use strncmp instead? Does entry.path being
+                        * NUL-terminated ensure memcmp exits
+                        * early?
+                        */
+                       if (memcmp(entry.path,
+                                  path+baselen,
+                                  pathlen-baselen) >= 0)
+                               break;
If you do "break;" here, the current entry remains in "desc" and will
be used to regenerate a duplicate tree in
add_missing_paths_before(NULL, 0, 0, ..);. That entry is added the
first time by update_one itself before the second add_missing() call.
This fixes it for me

diff --git a/cache-tree.c b/cache-tree.c
index 79c28cc..1655738 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -278,17 +278,19 @@ static void add_missing_paths_before(const char *path,
int entlen = strlen(entry.path);

/* We only want paths before path */
- if (path)
+ if (path) {
/*
* FIXME: if entlen < pathlen, do I need to
* use strncmp instead? Does entry.path being
* NUL-terminated ensure memcmp exits
* early?
*/
- if (memcmp(entry.path,
- path+baselen,
- pathlen-baselen) >= 0)
+ int result = memcmp(entry.path, path+baselen, pathlen-baselen);
+ if (result > 0)
break;
+ if (!result)
+ continue;
+ }

/* We only want paths "missing" from index
Elijah Newren
2010-09-07 03:06:13 UTC
Permalink
Post by Nguyen Thai Ngoc Duy
Post by Elijah Newren
+static void add_missing_paths_before(const char *path,
+                                    int pathlen,
+                                    int baselen,
+                                    struct tree_desc *desc,
+                                    struct strbuf *buffer)
+{
+       if (!git_sparse_pathspecs)
+               return; /* No paths are missing */
+
+       for (; desc->size; update_tree_entry(desc)) {
+               struct name_entry entry = desc->entry;
+               int entlen = strlen(entry.path);
+
+               /* We only want paths before path */
+               if (path)
+                       /*
+                        * FIXME: if entlen < pathlen, do I need to
+                        * use strncmp instead? Does entry.path being
+                        * NUL-terminated ensure memcmp exits
+                        * early?
+                        */
+                       if (memcmp(entry.path,
+                                  path+baselen,
+                                  pathlen-baselen) >= 0)
+                               break;
If you do "break;" here, the current entry remains in "desc" and will
be used to regenerate a duplicate tree in
add_missing_paths_before(NULL, 0, 0, ..);. That entry is added the
first time by update_one itself before the second add_missing() call.
This fixes it for me
diff --git a/cache-tree.c b/cache-tree.c
index 79c28cc..1655738 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -278,17 +278,19 @@ static void add_missing_paths_before(const char *path,
               int entlen = strlen(entry.path);
               /* We only want paths before path */
-               if (path)
+               if (path) {
                       /*
                        * FIXME: if entlen < pathlen, do I need to
                        * use strncmp instead? Does entry.path being
                        * NUL-terminated ensure memcmp exits
                        * early?
                        */
-                       if (memcmp(entry.path,
-                                  path+baselen,
-                                  pathlen-baselen) >= 0)
+                       int result = memcmp(entry.path, path+baselen, pathlen-baselen);
+                       if (result > 0)
                               break;
+                       if (!result)
+                               continue;
+               }
               /* We only want paths "missing" from index due to sparsity */
               if (path) {
Yes, good catch and than

Elijah Newren
2010-09-05 00:14:03 UTC
Permalink
Signed-off-by: Elijah Newren <***@gmail.com>
---
builtin/clone.c | 28 ++++++++++++++++++++--------
t/t5601-clone.sh | 14 --------------
transport.h | 3 +++
3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 19ed640..de0fb66 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,7 @@
#include "branch.h"
#include "remote.h"
#include "run-command.h"
+#include "quote.h"

/*
* Overall FIXMEs:
@@ -33,7 +34,7 @@
*
*/
static const char * const builtin_clone_usage[] = {
- "git clone [options] [--] <repo> [<dir>]",
+ "git clone [options] [--] <repo> [<dir>] [-- PATH1 PATH2...]",
NULL
};

@@ -375,6 +376,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
struct transport *transport = NULL;
char *src_ref_prefix = "refs/heads/";
int err = 0;
+ int rest_argc = 0;

struct refspec *refspec;
const char *fetch_pattern;
@@ -382,11 +384,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
junk_pid = getpid();

argc = parse_options(argc, argv, prefix, builtin_clone_options,
- builtin_clone_usage, 0);
-
- if (argc > 2)
- usage_msg_opt("Too many arguments.",
- builtin_clone_usage, builtin_clone_options);
+ builtin_clone_usage, PARSE_OPT_STOP_AT_NON_OPTION);

if (argc == 0)
usage_msg_opt("You must specify a repository to clone.",
@@ -418,10 +416,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
if (is_local && option_depth)
warning("--depth is ignored in local clones; use file:// instead.");

- if (argc == 2)
+ if (argc >= 2 && strcmp(argv[1], "--") != 0) {
dir = xstrdup(argv[1]);
- else
+ rest_argc = 2;
+ } else {
dir = guess_dir_name(repo_name, is_bundle, option_bare);
+ rest_argc = 1;
+ }
strip_trailing_slashes(dir);

dest_exists = !stat(dir, &buf);
@@ -539,6 +540,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
transport_set_option(transport, TRANS_OPT_UPLOADPACK,
option_upload_pack);

+ if (argc > rest_argc) {
+ struct strbuf buf = STRBUF_INIT;
+ int ret;
+
+ sq_quote_argv(&buf, &argv[rest_argc], 0);
+ ret = transport_set_option(transport, TRANS_OPT_SPARSE,
+ strbuf_detach(&buf, NULL));
+ if (ret)
+ warning ("Sparse clone not supported!\n");
+ }
+
refs = transport_get_remote_refs(transport);
if (refs) {
mapped_refs = wanted_peer_refs(refs, refspec);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 987e0c8..17c6a30 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -17,20 +17,6 @@ test_expect_success setup '

'

-test_expect_success 'clone with excess parameters (1)' '
-
- rm -fr dst &&
- test_must_fail git clone -n src dst junk
-
-'
-
-test_expect_success 'clone with excess parameters (2)' '
-
- rm -fr dst &&
- test_must_fail git clone -n "file://$(pwd)/src" dst junk
-
-'
-
test_expect_success 'output from clone' '
rm -fr dst &&
git clone -n "file://$(pwd)/src" dst >output &&
diff --git a/transport.h b/transport.h
index e803c0e..41e347a 100644
--- a/transport.h
+++ b/transport.h
@@ -127,6 +127,9 @@ struct transport *transport_get(struct remote *, const char *);
/* Aggressively fetch annotated tags if possible */
#define TRANS_OPT_FOLLOWTAGS "followtags"

+/* Fetch only certain paths */
+#define TRANS_OPT_SPARSE "sparse"
+
/**
* Returns 0 if the option was used, non-zero otherwise. Prints a
* message to stderr if the option is not used.
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:14:01 UTC
Permalink
Signed-off-by: Elijah Newren <***@gmail.com>
---
t/t5721-sparse-repository-communication.sh | 106 ++++++++++++++++++++++++++++
1 files changed, 106 insertions(+), 0 deletions(-)
create mode 100755 t/t5721-sparse-repository-communication.sh

diff --git a/t/t5721-sparse-repository-communication.sh b/t/t5721-sparse-repository-communication.sh
new file mode 100755
index 0000000..35cb6a2
--- /dev/null
+++ b/t/t5721-sparse-repository-communication.sh
@@ -0,0 +1,106 @@
+#!/bin/sh
+
+test_description='sparse repository communication'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/sparse-lib.sh
+
+test_expect_success 'setup' '
+ rm -fr .git &&
+ test_create_repo src &&
+ (
+ cd src &&
+
+ mkdir -p sub/{a,b} &&
+ > sub/a/file &&
+ git add sub/a/file &&
+ test_tick &&
+ git commit -q -m initial &&
+
+ > sub/b/file &&
+ git add sub/b/file &&
+ test_tick &&
+ git commit -q -m two &&
+
+ echo unique > sub/file &&
+ git add sub/file &&
+ test_tick &&
+ git commit -q -m three &&
+
+ echo content > sub/b/file &&
+ test_tick &&
+ git commit -q -m subbfile sub/b/file &&
+
+ cp -a sub/b sub/bcopy &&
+ git add sub/bcopy &&
+ test_tick &&
+ git commit -q -m subbcopy &&
+
+ echo stuff > sub/a/file &&
+ test_tick &&
+ git commit -q -m subafile sub/a/file
+ )
+'
+
+test_expect_success 'make comparison sparse repository' '
+ git clone -q "file://$(pwd)/src" compare_dst &&
+ (
+ cd compare_dst &&
+ test 25 = "$(git rev-list --objects master | wc -l)" &&
+ make_sparse sub/b/ &&
+ test 0 = $(find .git/objects/pack -type f | wc -l) &&
+ test 22 = $(find .git/objects -type f | wc -l)
+ )
+'
+
+test_expect_failure 'basic sparse clone succeeds' '
+ rm -fr dst &&
+ git clone "file://$(pwd)/src" dst -- sub/b/ &&
+ (
+ cd dst &&
+ git rev-parse master^{tree} &&
+ git rev-parse master:sub &&
+ git rev-parse master:sub/b &&
+ git rev-parse master:sub/b/file &&
+ git rev-parse master:sub/bcopy &&
+ git rev-parse master:sub/bcopy/file &&
+ git rev-parse master:sub/a &&
+ git rev-parse master:sub/file &&
+
+ git cat-file -p master:sub &&
+ git cat-file -p master:sub/b &&
+ git cat-file -p master:sub/b/file &&
+ git cat-file -p master:sub/bcopy &&
+ git cat-file -p master:sub/bcopy/file &&
+ test_must_fail git cat-file -p master:sub/file &&
+ test_must_fail git cat-file -p master:sub/a &&
+ test_must_fail git rev-parse master:sub/a/file &&
+
+ test -f sub/b/file &&
+ ! test -d sub/a &&
+ ! test -d sub/bcopy &&
+ ! test -f sub/file
+ )
+'
+
+test_expect_failure 'basic sparse clone guts match expectations' '
+ (
+ # Loose objects only, to facilitate comparison
+ cd dst &&
+ mv .git/objects/pack/* . &&
+ for i in $(ls *.pack); do
+ git unpack-objects -q < $i
+ done &&
+ rm -f *.pack *.idx
+ ) &&
+ (
+ # Get the names of all loose objects
+ cd dst &&
+ find .git/objects -type f > ../dst-objects &&
+ cd ../compare_dst &&
+ find .git/objects -type f > ../compare_dst-objects
+ ) &&
+ test_cmp dst-objects compare_dst-objects &&
+ test_cmp dst/.git/sparse-limits compare_dst/.git/sparse-limits
+'
+
+test_done
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:14:02 UTC
Permalink
Signed-off-by: Elijah Newren <***@gmail.com>
---
sparse-repo.c | 20 ++++++++++++++++++++
sparse-repo.h | 1 +
2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/sparse-repo.c b/sparse-repo.c
index 18ea976..c983096 100644
--- a/sparse-repo.c
+++ b/sparse-repo.c
@@ -5,6 +5,26 @@

#define SPARSE_FILE "sparse-limits"

+void record_sparse_limits(const char *quoted_argv)
+{
+ int fd, len;
+ char *sparse_file = git_path(SPARSE_FILE);
+
+ /* FIXME: sq_quote_argv puts a leading space in front of quoted
+ * arguments, but sq_dequote_to_argv chokes on it. Maybe sq_quote_argv
+ * should be fixed?
+ */
+ if (*quoted_argv == ' ')
+ quoted_argv++;
+ len = strlen(quoted_argv);
+
+ fd = open(sparse_file, O_WRONLY | O_CREAT, 0666);
+ if (fd < 0 || write_in_full(fd, quoted_argv, len) != len)
+ die_errno("could not write %s", sparse_file);
+ xwrite(fd, "\n", 1);
+ close(fd);
+}
+
void register_sparse_limits(const char **revlist_argv)
{
if (revlist_argv != git_sparse_limits) {
diff --git a/sparse-repo.h b/sparse-repo.h
index ed9018a..2df175f 100644
--- a/sparse-repo.h
+++ b/sparse-repo.h
@@ -1,3 +1,4 @@
+void record_sparse_limits(const char *quoted_argv);
void register_sparse_limits(const char **revlist_argv);
void register_quoted_sparse_limits(char *quoted_argv);
void check_sparse_repo(void);
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:14:04 UTC
Permalink
Signed-off-by: Elijah Newren <***@gmail.com>
---
builtin/archive.c | 2 +-
builtin/clone.c | 2 +-
builtin/fetch-pack.c | 3 ++-
builtin/send-pack.c | 3 ++-
cache.h | 2 +-
connect.c | 9 ++++++++-
transport-helper.c | 5 ++++-
transport.c | 13 +++++++++----
transport.h | 8 +++++---
9 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index 6a887f5..018d2b6 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -35,7 +35,7 @@ static int run_remote_archiver(int argc, const char **argv,
if (!_remote->url[0])
die("git archive: Remote with no URL");
transport = transport_get(_remote, _remote->url[0]);
- transport_connect(transport, "git-upload-archive", exec, fd);
+ transport_connect(transport, "git-upload-archive", exec, NULL, fd);

for (i = 1; i < argc; i++)
packet_write(fd[1], "argument %s\n", argv[i]);
diff --git a/builtin/clone.c b/builtin/clone.c
index de0fb66..5c0f594 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -545,7 +545,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
int ret;

sq_quote_argv(&buf, &argv[rest_argc], 0);
- ret = transport_set_option(transport, TRANS_OPT_SPARSE,
+ ret = transport_set_option(transport, TRANS_OPT_REVLIST_ARGS,
strbuf_detach(&buf, NULL));
if (ret)
warning ("Sparse clone not supported!\n");
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index dbd8b7b..e550f3d 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -875,7 +875,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
fd[0] = 0;
fd[1] = 1;
} else {
- conn = git_connect(fd, (char *)dest, args.uploadpack,
+ error("Should pass extra_args to git_connect!");
+ conn = git_connect(fd, (char *)dest, args.uploadpack, NULL,
args.verbose ? CONNECT_VERBOSE : 0);
}

diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 481602d..c3ae328 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -478,7 +478,8 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
fd[0] = 0;
fd[1] = 1;
} else {
- conn = git_connect(fd, dest, receivepack,
+ error("Should pass extra_args to git_connect!");
+ conn = git_connect(fd, dest, receivepack, NULL,
args.verbose ? CONNECT_VERBOSE : 0);
}

diff --git a/cache.h b/cache.h
index 6f88dbb..84c0969 100644
--- a/cache.h
+++ b/cache.h
@@ -937,7 +937,7 @@ extern struct ref *find_ref_by_name(const struct ref *list, const char *name);

#define CONNECT_VERBOSE (1u << 0)
extern char *git_getpass(const char *prompt);
-extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
+extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, const char *extra_args, int flags);
extern int finish_connect(struct child_process *conn);
extern int path_match(const char *path, int nr, char **match);
struct extra_have_objects {
diff --git a/connect.c b/connect.c
index 3450cab..42135c8 100644
--- a/connect.c
+++ b/connect.c
@@ -449,7 +449,8 @@ static struct child_process no_fork;
* the connection failed).
*/
struct child_process *git_connect(int fd[2], const char *url_orig,
- const char *prog, int flags)
+ const char *prog, const char *extra_args,
+ int flags)
{
char *url;
char *host, *path;
@@ -550,6 +551,8 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
* Note: Do not add any other headers here! Doing so
* will cause older git-daemon servers to crash.
*/
+ if (extra_args)
+ error("What to do with extra_args?!?\n");
packet_write(fd[1],
"%s %s%chost=%s%c",
prog, path, 0,
@@ -567,6 +570,10 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
strbuf_addstr(&cmd, prog);
strbuf_addch(&cmd, ' ');
sq_quote_buf(&cmd, path);
+ if (extra_args) {
+ strbuf_addch(&cmd, ' ');
+ strbuf_addstr(&cmd, extra_args);
+ }
if (cmd.len >= MAX_CMD_LEN)
die("command line too long");

diff --git a/transport-helper.c b/transport-helper.c
index af81fe1..c2b6db3 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -509,7 +509,7 @@ static int process_connect(struct transport *transport,
}

static int connect_helper(struct transport *transport, const char *name,
- const char *exec, int fd[2])
+ const char *exec, const char *extra_args, int fd[2])
{
struct helper_data *data = transport->data;

@@ -518,6 +518,9 @@ static int connect_helper(struct transport *transport, const char *name,
if (!data->connect)
die("Operation not supported by protocol.");

+ if (extra_args)
+ die("extra_args not handled!!!! OH NOES!!!");
+
if (!process_connect_service(transport, name, exec))
die("Can't connect to subservice %s.", name);

diff --git a/transport.c b/transport.c
index 4dba6f8..1d3cab3 100644
--- a/transport.c
+++ b/transport.c
@@ -475,6 +475,9 @@ static int set_git_option(struct git_transport_options *opts,
else
opts->depth = atoi(value);
return 0;
+ } else if (!strcmp(name, TRANS_OPT_REVLIST_ARGS)) {
+ opts->revlist_args = value;
+ return 0;
}
return 1;
}
@@ -489,6 +492,7 @@ static int connect_setup(struct transport *transport, int for_push, int verbose)
data->conn = git_connect(data->fd, transport->url,
for_push ? data->options.receivepack :
data->options.uploadpack,
+ data->options.revlist_args,
verbose ? CONNECT_VERBOSE : 0);

return 0;
@@ -805,11 +809,12 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
}

static int connect_git(struct transport *transport, const char *name,
- const char *executable, int fd[2])
+ const char *executable, const char *extra_args,
+ int fd[2])
{
struct git_transport_data *data = transport->data;
data->conn = git_connect(data->fd, transport->url,
- executable, 0);
+ executable, extra_args, 0);
fd[0] = data->fd[0];
fd[1] = data->fd[1];
return 0;
@@ -1124,10 +1129,10 @@ void transport_unlock_pack(struct transport *transport)
}

int transport_connect(struct transport *transport, const char *name,
- const char *exec, int fd[2])
+ const char *exec, const char *extra_args, int fd[2])
{
if (transport->connect)
- return transport->connect(transport, name, exec, fd);
+ return transport->connect(transport, name, exec, extra_args, fd);
else
die("Operation not supported by protocol");
}
diff --git a/transport.h b/transport.h
index 41e347a..2669e84 100644
--- a/transport.h
+++ b/transport.h
@@ -11,6 +11,7 @@ struct git_transport_options {
int depth;
const char *uploadpack;
const char *receivepack;
+ const char *revlist_args;
};

struct transport {
@@ -71,7 +72,8 @@ struct transport {
int (*push_refs)(struct transport *transport, struct ref *refs, int flags);
int (*push)(struct transport *connection, int refspec_nr, const char **refspec, int flags);
int (*connect)(struct transport *connection, const char *name,
- const char *executable, int fd[2]);
+ const char *executable, const char *extra_args,
+ int fd[2]);

/** get_refs_list(), fetch(), and push_refs() can keep
* resources (such as a connection) reserved for futher
@@ -128,7 +130,7 @@ struct transport *transport_get(struct remote *, const char *);
#define TRANS_OPT_FOLLOWTAGS "followtags"

/* Fetch only certain paths */
-#define TRANS_OPT_SPARSE "sparse"
+#define TRANS_OPT_REVLIST_ARGS "revlist_args"

/**
* Returns 0 if the option was used, non-zero otherwise. Prints a
@@ -153,7 +155,7 @@ void transport_take_over(struct transport *transport,
struct child_process *child);

int transport_connect(struct transport *transport, const char *name,
- const char *exec, int fd[2]);
+ const char *exec, const char *extra_args, int fd[2]);

/* Transport methods defined outside transport.c */
int transport_helper_init(struct transport *transport, const char *name);
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:14:05 UTC
Permalink
Signed-off-by: Elijah Newren <***@gmail.com>
---
upload-pack.c | 44 ++++++++++++++++++++++++++++----------------
1 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index ce27a1a..721197b 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -110,6 +110,8 @@ static int do_rev_list(int in, int out, void *user_data)
{
int i;
struct rev_info revs;
+ int revlist_argc = 0;
+ const char **revlist_argv = user_data;

pack_pipe = xfdopen(out, "w");
init_revisions(&revs, NULL);
@@ -130,7 +132,10 @@ static int do_rev_list(int in, int out, void *user_data)
o->flags |= UNINTERESTING;
add_pending_object(&revs, o, NULL);
}
- setup_revisions(0, NULL, &revs, NULL);
+ if (revlist_argv)
+ for (; revlist_argv[revlist_argc]; revlist_argc++)
+ ;
+ setup_revisions(revlist_argc, revlist_argv, &revs, NULL);
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
mark_edges_uninteresting(revs.commits, &revs, show_edge);
@@ -144,7 +149,7 @@ static int do_rev_list(int in, int out, void *user_data)
return 0;
}

-static void create_pack_file(void)
+static void create_pack_file(const char **revlist_argv)
{
struct async rev_list;
struct child_process pack_objects;
@@ -156,11 +161,14 @@ static void create_pack_file(void)
ssize_t sz;
const char *argv[10];
int arg = 0;
+ int run_revlist = shallow_nr || revlist_argv;

- if (shallow_nr) {
+ if (run_revlist) {
memset(&rev_list, 0, sizeof(rev_list));
rev_list.proc = do_rev_list;
rev_list.out = -1;
+ if (revlist_argv)
+ rev_list.data = revlist_argv;
if (start_async(&rev_list))
die("git upload-pack: unable to fork git-rev-list");
argv[arg++] = "pack-objects";
@@ -183,7 +191,7 @@ static void create_pack_file(void)
argv[arg++] = NULL;

memset(&pack_objects, 0, sizeof(pack_objects));
- pack_objects.in = shallow_nr ? rev_list.out : -1;
+ pack_objects.in = run_revlist ? rev_list.out : -1;
pack_objects.out = -1;
pack_objects.err = -1;
pack_objects.git_cmd = 1;
@@ -193,7 +201,7 @@ static void create_pack_file(void)
die("git upload-pack: unable to fork git-pack-objects");

/* pass on revisions we (don't) want */
- if (!shallow_nr) {
+ if (!run_revlist) {
FILE *pipe_fd = xfdopen(pack_objects.in, "w");
if (!create_full_pack) {
int i;
@@ -307,7 +315,7 @@ static void create_pack_file(void)
error("git upload-pack: git-pack-objects died with error.");
goto fail;
}
- if (shallow_nr && finish_async(&rev_list))
+ if (run_revlist && finish_async(&rev_list))
goto fail; /* error was already reported */

/* flush the data */
@@ -656,7 +664,7 @@ static int mark_our_ref(const char *refname, const unsigned char *sha1, int flag
return 0;
}

-static void upload_pack(void)
+static void upload_pack(const char **revlist_argv)
{
if (advertise_refs || !stateless_rpc) {
reset_timeout();
@@ -673,13 +681,13 @@ static void upload_pack(void)
receive_needs();
if (want_obj.nr) {
get_common_commits();
- create_pack_file();
+ create_pack_file(revlist_argv);
}
}

-int main(int argc, char **argv)
+int main(int argc, const char **argv)
{
- char *dir;
+ char dir[4096];
int i;
int strict = 0;

@@ -689,7 +697,7 @@ int main(int argc, char **argv)
read_replace_refs = 0;

for (i = 1; i < argc; i++) {
- char *arg = argv[i];
+ const char *arg = argv[i];

if (arg[0] != '-')
break;
@@ -705,6 +713,9 @@ int main(int argc, char **argv)
strict = 1;
continue;
}
+ if (!strcmp(arg, "--help")) {
+ usage(upload_pack_usage);
+ }
if (!prefixcmp(arg, "--timeout=")) {
timeout = atoi(arg+10);
daemon_mode = 1;
@@ -716,12 +727,9 @@ int main(int argc, char **argv)
}
}

- if (i != argc-1)
- usage(upload_pack_usage);
-
setup_path();

- dir = argv[i];
+ strcpy(dir, argv[i]); /* enter-repo smudges its argument */

if (!enter_repo(dir, strict))
die("'%s' does not appear to be a git repository", dir);
@@ -729,6 +737,10 @@ int main(int argc, char **argv)
die("attempt to fetch/clone from a shallow repository");
if (getenv("GIT_DEBUG_SEND_PACK"))
debug_fd = atoi(getenv("GIT_DEBUG_SEND_PACK"));
- upload_pack();
+ /*
+ * revlist arguments for revision walking start at argv[i+1], but
+ * setup_revisions() ignores the first argument.
+ */
+ upload_pack(argc > i+1 ? &argv[i] : NULL);
return 0;
}
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:13:55 UTC
Permalink
Sparse clones will be created by specifying rev-list args to clone in order
to limit what is downloaded. In order to avoid accessing unavailable
objects, these limiting rev-list args are recorded in $GIT_DIR/sparse-limit
so that other operations can make use of them.

Signed-off-by: Elijah Newren <***@gmail.com>
---
Makefile | 2 +
cache.h | 3 ++
environment.c | 2 +
setup.c | 2 +
sparse-repo.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
sparse-repo.h | 3 ++
tree-diff.c | 2 +
7 files changed, 78 insertions(+), 0 deletions(-)
create mode 100644 sparse-repo.c
create mode 100644 sparse-repo.h

diff --git a/Makefile b/Makefile
index d3db459..ba4b9bf 100644
--- a/Makefile
+++ b/Makefile
@@ -548,6 +548,7 @@ LIB_H += run-command.h
LIB_H += sha1-lookup.h
LIB_H += sideband.h
LIB_H += sigchain.h
+LIB_H += sparse-repo.h
LIB_H += strbuf.h
LIB_H += string-list.h
LIB_H += submodule.h
@@ -657,6 +658,7 @@ LIB_OBJS += sha1_name.o
LIB_OBJS += shallow.o
LIB_OBJS += sideband.o
LIB_OBJS += sigchain.o
+LIB_OBJS += sparse-repo.o
LIB_OBJS += strbuf.o
LIB_OBJS += string-list.o
LIB_OBJS += submodule.o
diff --git a/cache.h b/cache.h
index ce7bdd1..6f88dbb 100644
--- a/cache.h
+++ b/cache.h
@@ -403,6 +403,9 @@ extern const char *const local_repo_env[LOCAL_REPO_ENV_SIZE + 1];
extern int is_bare_repository_cfg;
extern int is_bare_repository(void);
extern int is_inside_git_dir(void);
+extern const char **git_sparse_limits;
+extern const char **git_sparse_pathspecs;
+extern struct diff_options git_sparse_diffopts;
extern char *git_work_tree_cfg;
extern int is_inside_work_tree(void);
extern int have_git_dir(void);
diff --git a/environment.c b/environment.c
index cc097b7..fc9d21d 100644
--- a/environment.c
+++ b/environment.c
@@ -61,6 +61,8 @@ struct startup_info *startup_info;
int core_preload_index = 0;

/* This is set by setup_git_dir_gently() and/or git_default_config() */
+const char **git_sparse_limits = NULL;
+const char **git_sparse_pathspecs = NULL;
char *git_work_tree_cfg;
static char *work_tree;

diff --git a/setup.c b/setup.c
index a3b76de..92f4385 100644
--- a/setup.c
+++ b/setup.c
@@ -1,5 +1,6 @@
#include "cache.h"
#include "dir.h"
+#include "sparse-repo.h"

static int inside_git_dir = -1;
static int inside_work_tree = -1;
@@ -256,6 +257,7 @@ static int check_repository_format_gently(int *nongit_ok)
*nongit_ok = -1;
return -1;
}
+ check_sparse_repo();
return 0;
}

diff --git a/sparse-repo.c b/sparse-repo.c
new file mode 100644
index 0000000..18ea976
--- /dev/null
+++ b/sparse-repo.c
@@ -0,0 +1,64 @@
+#include "sparse-repo.h"
+#include "cache.h"
+#include "diff.h"
+#include "quote.h"
+
+#define SPARSE_FILE "sparse-limits"
+
+void register_sparse_limits(const char **revlist_argv)
+{
+ if (revlist_argv != git_sparse_limits) {
+ free(git_sparse_limits);
+ git_sparse_limits = revlist_argv;
+ }
+
+ git_sparse_pathspecs = git_sparse_limits;
+ for (; *git_sparse_pathspecs; git_sparse_pathspecs++)
+ /*
+ * Sparse limiting args must always have a double dash before
+ * any paths, since users will likely fetch/pull from bare
+ * repositories. That makes it easy to find the pathspec
+ * portion of the sparse limits.
+ */
+ if (strcmp(*git_sparse_pathspecs, "--") == 0) {
+ /*
+ * Make git_sparse_limits provide quick access
+ * to non-pathspec rev-list arguments, and
+ * git_sparse_pathspec provide access to the
+ * rest.
+ */
+ *git_sparse_pathspecs = NULL;
+ git_sparse_pathspecs++;
+ break;
+ }
+ diff_tree_setup_paths(git_sparse_pathspecs, &git_sparse_diffopts);
+}
+
+void register_quoted_sparse_limits(char *quoted_argv)
+{
+ int nr = 0, alloc = 0;
+
+ sq_dequote_to_argv(strdup(quoted_argv), &git_sparse_limits, &nr, &alloc);
+ ALLOC_GROW(git_sparse_limits, nr+1, alloc);
+ git_sparse_limits[nr] = NULL;
+
+ register_sparse_limits(git_sparse_limits);
+}
+
+void check_sparse_repo(void)
+{
+ struct stat statbuf;
+ struct strbuf buffer = STRBUF_INIT;
+
+ char *sparse_file = git_path(SPARSE_FILE);
+ if (stat(sparse_file, &statbuf))
+ return;
+
+ if (git_sparse_limits)
+ die("tried to initialize sparse limits twice!\n");
+
+ if (strbuf_read_file(&buffer, sparse_file, statbuf.st_size) < 0)
+ die_errno("could not read %s", sparse_file);
+ register_quoted_sparse_limits(buffer.buf);
+ strbuf_release(&buffer);
+}
diff --git a/sparse-repo.h b/sparse-repo.h
new file mode 100644
index 0000000..ed9018a
--- /dev/null
+++ b/sparse-repo.h
@@ -0,0 +1,3 @@
+void register_sparse_limits(const char **revlist_argv);
+void register_quoted_sparse_limits(char *quoted_argv);
+void check_sparse_repo(void);
diff --git a/tree-diff.c b/tree-diff.c
index d976bdf..951b53b 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -6,6 +6,8 @@
#include "diffcore.h"
#include "tree.h"

+struct diff_options git_sparse_diffopts;
+
static char *malloc_base(const char *base, int baselen, const char *path, int pathlen)
{
char *newbase = xmalloc(baselen + pathlen + 2);
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:14:06 UTC
Permalink
The correct fix is to include the *relevant pieces* of all commits. But
this is a quick stopgap that lets me test some stuff.

Signed-off-by: Elijah Newren <***@gmail.com>
---
revision.c | 5 +++--
revision.h | 3 ++-
upload-pack.c | 1 +
3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/revision.c b/revision.c
index 67b1a1d..c411bda 100644
--- a/revision.c
+++ b/revision.c
@@ -335,7 +335,7 @@ static int rev_compare_tree(struct rev_info *revs, struct commit *parent, struct
tree_difference = REV_TREE_SAME;
DIFF_OPT_CLR(&revs->pruning, HAS_CHANGES);
if (diff_tree_sha1(t1->object.sha1, t2->object.sha1, "",
- &revs->pruning) < 0)
+ &revs->pruning) < 0 || revs->sparse_traversal)
return REV_TREE_DIFFERENT;
return tree_difference;
}
@@ -380,7 +380,8 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
return;

if (!commit->parents) {
- if (rev_same_tree_as_empty(revs, commit))
+ if (rev_same_tree_as_empty(revs, commit) &&
+ !revs->sparse_traversal)
commit->object.flags |= TREESAME;
return;
}
diff --git a/revision.h b/revision.h
index ba879c9..17865e8 100644
--- a/revision.h
+++ b/revision.h
@@ -85,7 +85,8 @@ struct rev_info {
ignore_merges:1,
combine_merges:1,
dense_combined_merges:1,
- always_show_header:1;
+ always_show_header:1,
+ sparse_traversal:1;

/* Format info */
unsigned int shown_one:1,
diff --git a/upload-pack.c b/upload-pack.c
index 721197b..a58b350 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -120,6 +120,7 @@ static int do_rev_list(int in, int out, void *user_data)
revs.blob_objects = 1;
if (use_thin_pack)
revs.edge_hint = 1;
+ revs.sparse_traversal = 1;

for (i = 0; i < want_obj.nr; i++) {
struct object *o = want_obj.objects[i].item;
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:14:07 UTC
Permalink
After fetching, register sparse args for subsequent checkout portion of
clone, and record sparse args for any later commands run in the clone.

Signed-off-by: Elijah Newren <***@gmail.com>
---
builtin/clone.c | 17 +++++++++++++----
t/t5721-sparse-repository-communication.sh | 4 ++--
2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c0f594..4f55ab0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -24,6 +24,7 @@
#include "remote.h"
#include "run-command.h"
#include "quote.h"
+#include "sparse-repo.h"

/*
* Overall FIXMEs:
@@ -370,7 +371,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
const struct ref *refs, *remote_head;
const struct ref *remote_head_points_at;
const struct ref *our_head_points_at;
- struct ref *mapped_refs;
+ struct ref *mapped_refs = NULL;
struct strbuf key = STRBUF_INIT, value = STRBUF_INIT;
struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
struct transport *transport = NULL;
@@ -519,9 +520,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
strbuf_reset(&value);

if (is_local) {
+ if (argc > rest_argc)
+ die("Sparse clones of local repositories must be done using the file:// protocol.");
refs = clone_local(path, git_dir);
mapped_refs = wanted_peer_refs(refs, refspec);
} else {
+ struct strbuf quoted_sparse_args = STRBUF_INIT;
struct remote *remote = remote_get(option_origin);
transport = transport_get(remote, remote->url[0]);

@@ -541,12 +545,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
option_upload_pack);

if (argc > rest_argc) {
- struct strbuf buf = STRBUF_INIT;
int ret;

- sq_quote_argv(&buf, &argv[rest_argc], 0);
+ sq_quote_argv(&quoted_sparse_args, &argv[rest_argc], 0);
ret = transport_set_option(transport, TRANS_OPT_REVLIST_ARGS,
- strbuf_detach(&buf, NULL));
+ quoted_sparse_args.buf);
if (ret)
warning ("Sparse clone not supported!\n");
}
@@ -556,6 +559,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
mapped_refs = wanted_peer_refs(refs, refspec);
transport_fetch_refs(transport, mapped_refs);
}
+
+ if (argc > rest_argc) {
+ register_sparse_limits(&argv[rest_argc]);
+ record_sparse_limits(quoted_sparse_args.buf);
+ }
+ strbuf_release(&quoted_sparse_args);
}

if (refs) {
diff --git a/t/t5721-sparse-repository-communication.sh b/t/t5721-sparse-repository-communication.sh
index 35cb6a2..61ce8c6 100755
--- a/t/t5721-sparse-repository-communication.sh
+++ b/t/t5721-sparse-repository-communication.sh
@@ -52,7 +52,7 @@ test_expect_success 'make comparison sparse repository' '
)
'

-test_expect_failure 'basic sparse clone succeeds' '
+test_expect_success 'basic sparse clone succeeds' '
rm -fr dst &&
git clone "file://$(pwd)/src" dst -- sub/b/ &&
(
@@ -82,7 +82,7 @@ test_expect_failure 'basic sparse clone succeeds' '
)
'

-test_expect_failure 'basic sparse clone guts match expectations' '
+test_expect_success 'basic sparse clone guts match expectations' '
(
# Loose objects only, to facilitate comparison
cd dst &&
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:13:54 UTC
Permalink
Signed-off-by: Elijah Newren <***@gmail.com>
---
t/sparse-lib.sh | 38 ++++++++++
t/t5720-sparse-repository-basics.sh | 130 +++++++++++++++++++++++++++++++++++
2 files changed, 168 insertions(+), 0 deletions(-)
create mode 100644 t/sparse-lib.sh
create mode 100755 t/t5720-sparse-repository-basics.sh

diff --git a/t/sparse-lib.sh b/t/sparse-lib.sh
new file mode 100644
index 0000000..0b779c6
--- /dev/null
+++ b/t/sparse-lib.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+make_sparse() {
+ # We only want loose objects
+ mv .git/objects/pack/* . &&
+ for i in $(ls *.pack); do
+ git unpack-objects -q < $i
+ done &&
+ rm -f *.pack *.idx &&
+
+ cd .git/objects &&
+
+ # Find the objects need for the specified paths
+ for i in $(git rev-list master); do
+ echo $i;
+ git rev-parse $i^{tree};
+ git ls-tree -rt $i -- "$@" | awk '{print $3}';
+ done | sort | uniq > ../wanted &&
+
+ # Find all other objects and delete them
+ find . -type f | sed -e s#[\./]##g \
+ | grep -v -F "$(cat ../wanted)" > ../bad &&
+ for i in $(cat ../bad); do
+ rm -f ./${i:0:2}/${i:2};
+ done &&
+
+ # Record the sparse limits
+ cd .. &&
+ echo -n "'--'" > sparse-limits &&
+ for i in "$@"; do
+ echo -n " '$i'" >> sparse-limits
+ done &&
+ echo >> sparse-limits &&
+
+ # Trim the index while we're at it
+ cd .. &&
+ git reset --hard HEAD
+}
diff --git a/t/t5720-sparse-repository-basics.sh b/t/t5720-sparse-repository-basics.sh
new file mode 100755
index 0000000..b8e9a3a
--- /dev/null
+++ b/t/t5720-sparse-repository-basics.sh
@@ -0,0 +1,130 @@
+#!/bin/sh
+
+test_description='sparse repository basics'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/sparse-lib.sh
+
+test_expect_success 'setup' '
+ rm -fr .git &&
+ test_create_repo src &&
+ (
+ cd src &&
+
+ mkdir -p sub/{a,b} &&
+ > sub/a/file &&
+ git add sub/a/file &&
+ test_tick &&
+ git commit -q -m initial &&
+
+ > sub/b/file &&
+ git add sub/b/file &&
+ test_tick &&
+ git commit -q -m two &&
+
+ echo unique > sub/file &&
+ git add sub/file &&
+ test_tick &&
+ git commit -q -m three &&
+
+ echo content > sub/b/file &&
+ test_tick &&
+ git commit -q -m subbfile sub/b/file &&
+
+ cp -a sub/b sub/bcopy &&
+ git add sub/bcopy &&
+ test_tick &&
+ git commit -q -m subbcopy &&
+
+ echo stuff > sub/a/file &&
+ test_tick &&
+ git commit -q -m subafile sub/a/file
+ )
+'
+
+test_expect_failure 'make sparse repository' '
+ git clone -q "file://$(pwd)/src" dst &&
+ (
+ cd dst &&
+ test 25 = "$(git rev-list --objects master | wc -l)" &&
+ make_sparse sub/b/file &&
+ test 0 = $(find .git/objects/pack -type f | wc -l) &&
+ test 22 = $(find .git/objects -type f | wc -l)
+ )
+'
+
+cd dst 2>/dev/null || test_done
+srcgit="--git-dir=../src/.git"
+
+test_expect_failure 'plumbing: ls-files works' '
+ git ls-files > output &&
+ test "sub/b/file" = "$(cat output)"
+'
+
+test_expect_failure 'plumbing: rev-list works' '
+ test "$(git rev-list HEAD)" = \
+ "$(git $srcgit rev-list HEAD -- sub/b/)" &&
+ test "$(git rev-list --objects HEAD)" = \
+ "$(git $srcgit rev-list --objects HEAD -- sub/b/)"
+'
+
+for i in $(git $srcgit rev-list HEAD | xargs git name-rev | cut -b 42-); do
+ git $srcgit rev-parse $i:sub/b/file >/dev/null 2>&1 &&
+ test_expect_success "plumbing: We can access $i:sub/b/file" "
+ git cat-file -t $i:sub/b/file
+ "
+done
+
+final_afile_sha=$(git $srcgit rev-parse master:sub/a/file)
+known_objects=$(git $srcgit rev-list --objects master \
+ | grep sub \
+ | grep -v $final_afile_sha \
+ | cut -b -40)
+for i in $(git $srcgit rev-list HEAD | xargs git name-rev | cut -b 42-); do
+ git $srcgit ls-tree -rt $i | grep -F "$known_objects" >expect &&
+ test_expect_failure "plumbing: ls-tree -rt $i works" "
+ git ls-tree -rt $i 2>error >output &&
+ test_cmp output expect
+ "
+done
+
+test_expect_failure 'basic: log works' '
+ git log > /dev/null &&
+ git log -p > /dev/null &&
+ git log -Scontent > /dev/null
+'
+
+test_expect_failure 'basic: diff works' '
+ git diff master~3 master &&
+ git diff master~3
+'
+
+test_expect_failure 'basic: checkout works' '
+ git checkout master~2 &&
+ git checkout master
+'
+
+test_expect_failure 'basic: status works with modified stuff' '
+ git status &&
+ echo more content >> sub/b/file &&
+ echo newfile content >> sub/b/whatever &&
+ git status
+'
+
+test_expect_failure 'basic: add works' '
+ git add sub/b/file &&
+ git add sub/b/whatever
+'
+
+test_expect_failure 'basic: commit works' '
+ git commit -m "Commit in a sparse clone" &&
+ git rev-parse master^{tree} &&
+ git rev-parse master:sub &&
+ git rev-parse master:sub/b &&
+ git rev-parse master:sub/b/file &&
+ git rev-parse master:sub/bcopy &&
+ git rev-parse master:sub/bcopy/file &&
+ git rev-parse master:sub/a &&
+ git rev-parse master:sub/file
+'
+
+test_done
--
1.7.2.2.140.gd06af
Elijah Newren
2010-09-05 00:13:58 UTC
Permalink
In a sparse repository, by automatically making use of sparse limits
specified at clone time, we can avoid walking uninteresting commits and
prevent access to missing trees and blobs. Note that this means that if
you created a sparse clone with
git clone <repo> <dest_dir> -- PATH1 PATH2 PATH3 PATH4 PATH5
then
git log
implicitly runs as though you had manually specified
git log -- PATH1 PATH2 PATH3 PATH4 PATH5
Similarly, running
git diff
implicitly runs as though you had manually specified
git diff -- PATH1 PATH2 PATH3 PATH4 PATH5

This is necessary for proper operation of git diff in a sparse clone to
avoid accessing missing objects. In the case of a plain git log, this
merely serves as an additional convenience, but for more complicated log
operations (e.g. when passing -p or -S options) it becomes necessary.

Signed-off-by: Elijah Newren <***@gmail.com>
---
revision.c | 16 +++++++++++++++-
t/t5720-sparse-repository-basics.sh | 6 +++---
2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index a962b77..67b1a1d 100644
--- a/revision.c
+++ b/revision.c
@@ -1492,6 +1492,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
const char *submodule = NULL;
const char *optarg;
int argcount;
+ const char *arg;

if (opt)
submodule = opt->submodule;
@@ -1514,7 +1515,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
flags = 0;
read_from_stdin = 0;
for (left = i = 1; i < argc; i++) {
- const char *arg = argv[i];
+ arg = argv[i];
if (*arg == '-') {
int opts;

@@ -1624,6 +1625,19 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
got_rev_arg = 1;
}

+ /*
+ * FIXME: This disallows things like --all, --stdin, --not, although
+ * some of those like --stdin may not make sense anyway.
+ */
+ for (arg = git_sparse_limits ? *git_sparse_limits : NULL; arg; arg++) {
+ const int flags = 0;
+ const int cant_be_filename = 1;
+ if (handle_revision_arg(arg, revs, flags, cant_be_filename))
+ die("bad revision '%s'", arg);
+ }
+ if (git_sparse_pathspecs)
+ append_prune_data(&prune_data, git_sparse_pathspecs);
+
if (prune_data)
revs->prune_data = get_pathspec(revs->prefix, prune_data);

diff --git a/t/t5720-sparse-repository-basics.sh b/t/t5720-sparse-repository-basics.sh
index b11c5ab..d04e171 100755
--- a/t/t5720-sparse-repository-basics.sh
+++ b/t/t5720-sparse-repository-basics.sh
@@ -60,7 +60,7 @@ test_expect_success 'plumbing: ls-files works' '
test "sub/b/file" = "$(cat output)"
'

-test_expect_failure 'plumbing: rev-list works' '
+test_expect_success 'plumbing: rev-list works' '
test "$(git rev-list HEAD)" = \
"$(git $srcgit rev-list HEAD -- sub/b/)" &&
test "$(git rev-list --objects HEAD)" = \
@@ -87,13 +87,13 @@ for i in $(git $srcgit rev-list HEAD | xargs git name-rev | cut -b 42-); do
"
done

-test_expect_failure 'basic: log works' '
+test_expect_success 'basic: log works' '
git log > /dev/null &&
git log -p > /dev/null &&
git log -Scontent > /dev/null
'

-test_expect_failure 'basic: diff works' '
+test_expect_success 'basic: diff works' '
git diff master~3 master &&
git diff master~3
'
--
1.7.2.2.140.gd06af
Nguyen Thai Ngoc Duy
2010-09-05 01:58:08 UTC
Permalink
Post by Elijah Newren
In a sparse repository, by automatically making use of sparse limits
specified at clone time, we can avoid walking uninteresting commits a=
nd
Post by Elijah Newren
prevent access to missing trees and blobs. =C2=A0Note that this means=
that if
Post by Elijah Newren
you created a sparse clone with
=C2=A0git clone <repo> <dest_dir> -- PATH1 PATH2 PATH3 PATH4 PATH5
then
=C2=A0git log
implicitly runs as though you had manually specified
=C2=A0git log -- PATH1 PATH2 PATH3 PATH4 PATH5
Similarly, running
=C2=A0git diff
implicitly runs as though you had manually specified
=C2=A0git diff -- PATH1 PATH2 PATH3 PATH4 PATH5
This is necessary for proper operation of git diff in a sparse clone =
to
Post by Elijah Newren
avoid accessing missing objects. =C2=A0In the case of a plain git log=
, this
Post by Elijah Newren
merely serves as an additional convenience, but for more complicated =
log
Post by Elijah Newren
operations (e.g. when passing -p or -S options) it becomes necessary.
I have a nicer approach here. Instead of modifying setup_revisions()
and similar functions, I update get_pathspec() to rewrite the
pathspecs from command line in narrow/shallow repos. get_pathspec()
currently does some form of rewriting already (prepending prefix).

So if you do "git log", get_pathspec() would return "git log -- PATH1
PATH2...". Will repost my series soon, or I can send that particular
patch to you.
--=20
Duy
Elijah Newren
2010-09-05 04:50:34 UTC
Permalink
Post by Nguyen Thai Ngoc Duy
Post by Elijah Newren
In a sparse repository, by automatically making use of sparse limits
specified at clone time, we can avoid walking uninteresting commits and
I have a nicer approach here. Instead of modifying setup_revisions()
and similar functions, I update get_pathspec() to rewrite the
pathspecs from command line in narrow/shallow repos. get_pathspec()
currently does some form of rewriting already (prepending prefix).
So if you do "git log", get_pathspec() would return "git log -- PATH1
PATH2...". Will repost my series soon, or I can send that particular
patch to you.
Ooh, that's clever. I like it. Do you also do verification that any
paths specified by the user are a subset of the paths the sparse clone
is limited to? That'd be really nice. Much better than my simple
dumb "just append" logic.

How do you reconcile conflicting needs, though? For most cases,
whenever the user specifies paths, get_pathspec should make sure those
paths are a subset of the sparse paths (throwing an error if they're
not) and then just use the user-specified ones. However, I think
doing this in all cases would break ls-files/cat-file -p, as the
pathspecs passed to those are not necessarily rooted at the toplevel
of the repository (and the prefix doesn't correct for that either).

Also, when no paths are specified by the user, the correct answer can
still vary by command. ls-tree should be given no pathspecs in such a
case, even in a sparse clone. Nor should commit (it wouldn't hurt too
bad to send it the sparse pathspecs, though it'd send commit through
the "partial commit" section of the code base each time rather than
the "as-is commit" section, which seems suboptimal). And on the other
end of the spectrum, log/rev-list/diff should all take the sparse
pathspecs in such a case.
Nguyen Thai Ngoc Duy
2010-09-05 07:12:07 UTC
Permalink
Post by Nguyen Thai Ngoc Duy
In a sparse repository, by automatically making use of sparse limit=
s
Post by Nguyen Thai Ngoc Duy
specified at clone time, we can avoid walking uninteresting commits=
and
Post by Nguyen Thai Ngoc Duy
I have a nicer approach here. Instead of modifying setup_revisions()
and similar functions, I update get_pathspec() to rewrite the
pathspecs from command line in narrow/shallow repos. get_pathspec()
currently does some form of rewriting already (prepending prefix).
So if you do "git log", get_pathspec() would return "git log -- PATH=
1
Post by Nguyen Thai Ngoc Duy
PATH2...". Will repost my series soon, or I can send that particular
patch to you.
Ooh, that's clever. =C2=A0I like it. =C2=A0Do you also do verificatio=
n that any
paths specified by the user are a subset of the paths the sparse clon=
e
is limited to? =C2=A0That'd be really nice. =C2=A0Much better than my=
simple
dumb "just append" logic.
I do.
How do you reconcile conflicting needs, though? =C2=A0For most cases,
whenever the user specifies paths, get_pathspec should make sure thos=
e
paths are a subset of the sparse paths (throwing an error if they're
not) and then just use the user-specified ones. =C2=A0However, I thin=
k
doing this in all cases would break ls-files/cat-file -p, as the
pathspecs passed to those are not necessarily rooted at the toplevel
of the repository (and the prefix doesn't correct for that either).
Yes, some commands will just skip the pathspec rewrite. It's up to
those commands to rewrite pathspec themselves.

So get_pathspec() will have pathspec rewrite by default. Other
commands will use another variant of it (i.e. get_pathspec_narrow()),
which allows to skip the rewrite completely.
--=20
Duy
Elijah Newren
2010-09-05 00:13:53 UTC
Permalink
This write-up just has basic ideas, strategies, notes of what needs to be
done, etc. It needs to be pruned, cleaned up, corrected as I learn more,
moved elsewhere, etc.

Signed-off-by: Elijah Newren <***@gmail.com>
---
README-sparse-clone | 283 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 283 insertions(+), 0 deletions(-)
create mode 100644 README-sparse-clone

diff --git a/README-sparse-clone b/README-sparse-clone
new file mode 100644
index 0000000..cfeeef3
--- /dev/null
+++ b/README-sparse-clone
@@ -0,0 +1,283 @@
+This is my set of notes on implementing sparse clones, which I define
+as a clone where not all blob, tree, or commit objects are downloaded.
+This includes sparseness both relative to span of directories and
+depth of history.
+
+(Note: This project has work-in-progress patches -- no promises about
+quality, speed of implementation, promises not to rebase, etc. etc.)
+
+*** Summary ***
+
+ Basic Idea:
+ 0) Only relevant blobs, trees, and commits (+ ancestry) are downloaded.
+ User View:
+ U1) A user controls sparseness by passing rev-list arguments to clone.
+ U2) "Densifying" a sparse clone can be done (with new rev-list arguments)
+ U3) Cloning-from/fetching-from/pushing-to sparse clones is supported.
+ U4) Operations that need unavailable data simply error out
+ U5) Old style shallow clones (--depth argument to clone) are obsolete
+ U6) Miscellaneous notes
+ Internals:
+ I1) The limiting rev-list arguments passed to clone are stored.
+ I2) All revision-walking operations automatically use the limiting args.
+ I3) The index only contains paths matching the sparse limits
+ I4) Loading a missing commit results in a fake commit being created
+ I5) In sparse clones, a special merge strategy must be used
+ I6) Miscellaneous notes
+
+*** Basic Idea ***
+
+0) Only relevant blobs, trees, and commits (+ ancestry) are downloaded.
+
+Only the relevant blobs, trees, and commits are downloaded.
+Irrelevant blobs and trees are left out entirely (see items I2 & I3
+for how we avoid accessing these).
+
+To ensure minimum necessary connectivity, we also download basic
+information from otherwise excluded commits
+ * parents of these commits
+ * trees matching the specified sparse path(s)
+but, for security and space reasons, do not download
+ * author
+ * author date
+ * committer
+ * committer date
+ * log message
+Such commits are still considered "missing" (see item I4 for more
+details about how we handle "missing" commits).
+
+Tags/branches are downloaded if specified (or, if no branch/tag is
+specified, all tags/branches are downloaded).
+
+Security note: No modifications are done to existing trees, meaning
+that sparse clones will download the name of "irrelevant" blobs/trees
+with their type, mode, and sha1sum if (and only if) such blobs/trees
+are siblings of a relevant blob/tree. It is assumed that such
+information is okay to be transmitted and need not remain private; if
+such information does need to remain private, an alternate mechanism
+involving rewriting commits will be necessary (such as git-subtree).
+
+*** User View ***
+
+U1) A user controls sparseness by passing rev-list arguments to clone.
+
+This allows a user to control sparseness both in terms of span of
+content (files/directories) and depth of history. It can also be used
+to limit to a subset of refs (cloning just one or two branches instead
+of all branches and tags). For example,
+ $ git clone ssh://repo.git dst -- Documentation/
+ $ git clone ssh://repo.git dst master~6..master
+ $ git clone ssh://repo.git dst -3
+(Note that the destination argument becomes mandatory for those doing
+a sparse clone in order to disambiguate it from rev-list options.)
+
+This method also means users don't need much training to learn how to
+use sparse clones -- they just use syntax they've already learned with
+log, and clone will pass this info on to upload-pack.
+
+There is a difference due to inclusive revision specifications
+(master, master~6, v4.15.6) vs. exclusive ones (-3, ^master,
+^master~6). Inclusive revisions must be branch or tag names
+(e.g. stable or v1.8, but not master~6 or v4.18.2~1 or sha1sum or
+:/<search string>)[1]. "HEAD --all"
+are assumed if no inclusive revisions are specified. (Note: Avery
+seems to suggest always assuming "HEAD --all", at least at first.)
+
+[1] This limitation on inclusive revisions could be relaxed in the
+future for specifications derived from branch names, as long as each
+branch has no more than one associated derived revision specification.
+For example, master~6 would mean to clone a copy of the master branch
+on the remote side, excluding the last 6 commits, so that you start
+out "6 commits behind" the remote. Obviously, it wouldn't make sense
+to have both "master^1" and "master^2" specified, since we then
+wouldn't know where master should point in the clone.
+
+U2) "Densifying" a sparse clone can be done (with new rev-list arguments)
+
+One can fetch a new pack, replace the original limiting rev-list args
+with the new choice (see item I1), and update the working copy to
+reflect the changes. As users wouldn't expect a "fetch" or a "merge"
+to un-sparsify a checkout, there's a special operation for performing
+all three operations.
+
+[First cut will be to just redownload everything, instead of just the
+necessary data. I'm thinking it won't be a common operation, and it
+could always be improved later.]
+
+U3) Cloning-from/fetching-from/pushing-to sparse clones is supported.
+
+This allows people who need to operate on a subset of the repository
+(e.g. translators, technical writers, etc.) to collaborate on that
+subset. I think one simple rule should enable this:
+
+ * The receiving repository specifies the limiting rev-list arguments
+ to use (if the sending repository does not have the relevant data,
+ it will naturally error out)
+
+By having the receving side specify the limiting rev-list arguments,
+it ensures that any data it receives fulfills its needs. The sending
+side then uses this information when creating a pack to determine the
+necessary objects to send, ignoring anything outside the paths/ranges
+specified in those limits. If the sending side is a sparse clone that
+does not have the necessary data specified by the receiver, then
+pack-objects will hit a nasty low-level missing object error, aborting
+the operation. In the future, we could maybe add a nicer error
+message.
+
+One special case:
+ * When cloning a repository, if the user did not specify any
+ limiting rev-list arguments, use those from the repository being
+ cloned. (Don't require the user to type out all the paths every
+ time; e.g. 'git clone URL DEST -- PATH1 PATH2 PATH3 PATH4...')
+
+U4) Operations that need unavailable data simply error out
+
+Although no normal git command should be disabled entirely, there will
+be cases when some git commands cannot function without more data.
+
+Examples:
+ * merge, cherry-pick, rebase (if unavailable files needed)
+ * upload-pack (if more data requested than available in a sparse clone)
+
+Merge, cherry-pick, and rebase deserve special consideration to
+operate in sparse clones (see item I5), since merge strategies
+normally require full trees.
+
+U5) Old style shallow clones (--depth argument to clone) are obsolete
+
+Since one can pass "-3" to get a "shallow" clone, old-style shallow
+clones are obsolete. New style shallow/sparse clones will also be
+more capable, since one can
+ * exclude based on commit (e.g. ^master~10) in addition to depth
+ * clone/push/pull from/to shallow clones
+
+What to do with old style shallow clones? Probably deprecate them,
+make the --depth argument to clone print an error message suggesting
+the new syntax, and then gut the related code at some point in the
+future.
+
+U6) Miscellaneous notes
+ * fsck & status should print a notice when working on a sparse clone
+ * paths in limiting rev-list args *must* follow '--' (current or
+ future remote repo may be bare, meaning setup_revisions will
+ complain about nonexistent paths specified without a preceding
+ '--'). Having all paths folow a '--' will also make it easier to
+ find them and pass them on to diff machinery (see item I2).
+ * notes hierarchy may also need to be made sparse in a way that only
+ notes pointing downloaded objects should be downloaded. This
+ implies missing blobs/trees, and maybe even "missing" commits.
+ But how do I avoid traversing the wrong notes on the client side?
+ Ouch. Maybe just include all notes? Or exclude all notes?
+
+*** Internals ***
+
+I1) The limiting rev-list arguments passed to clone are stored.
+
+However, relative arguments such as "-3" or "^master~6" first need to
+be translated into one or more exclude ranges written as "^<sha1>".
+
+I2) All revision-walking operations automatically use the limiting args.
+
+This should be a simple code change, and would enable rev-list, log,
+diff (which also uses the revision walking machinery), etc. to avoid
+missing blobs/trees/commits and thus enable them to work with sparse
+clones. fsck would take a bit more work, since it doesn't use the
+setup_revisions() and revision.h walking machinery, but shouldn't be
+too bad (I hope).
+
+Also, the pathspecs (or the diff options they generate) are available
+easily for operations that need them (see I3).
+
+I3) The index only contains paths matching the sparse limits
+
+Since not all trees are downloaded, not all files can even be
+referenced in the index. Further, in some cases, the only thing that
+can be referenced is a tree rather than a file. We only want paths
+matching the relevant sparse limits to be included in the index. This
+means two things:
+ * When extracting entries from trees into the index, the sparse limits
+ need to be taken into consideration
+ * Whenever writing trees, using the index is no longer sufficient.
+ Instead, the files in the index are used to record
+ sha1sums/modes/filenames for paths within the sparse limits, and
+ another tree (typically from HEAD) is used to record
+ sha1sums/modes/filenames/types for paths outside the sparse
+ limits.
+
+Note that writing trees from the index can occur with commit, merge,
+checkout (-m), revert/cherry-pick --no-commit, and write-tree. All
+need to be updated to either provide a relevant tree or error out when
+run from a sparse clone.
+
+I4) Loading a missing commit results in a fake commit being created
+
+Fake commits have correct parentage and an appropriate (sparse) tree
+(since those pieces of information are available), but blank author &
+committer, 0 for times & timezones, and a commit log message such as
+the following:
+ This commit is missing from this sparse clone. You can use the
+ densify command to download missing commits and files.
+
+This allows the following to work:
+ * git commit (which needs tree/file sha1sums that were not modified,
+ though if a given tree is unmodified, no subtree/subfile sha1s are
+ needed)
+ * tags & branches (which can correctly point at missing commits)
+ * git show (with a branch/tag/commit)
+ * git prune (missing objects correctly reference their parent(s))
+ * git fsck (missing commits still referenced)
+
+Extra notes:
+ * Stored in a file using multiple lines of: <commit> <tree> <parent1> ...
+ * Only referenced when git would otherwise die
+
+I5) In sparse clones, a special merge strategy must be used
+
+Most merge strategies work at the file/content level. Since many
+files and even whole trees will be unavailable, a special strategy
+that works with tree-level items is necessary. It should only perform
+trivial merges when forced to operate at the tree-level (modified on
+at most one side of history, and probably no rename handling at least
+at first). When such trivial merges are not possible, it should fail
+with a helpful error message noting the needed tree contents.
+
+For non-missing blobs, standard merge strategies may be used.
+
+I6) Miscellaneous notes
+ * thin-packs: git pack-objects needs to be told to only delta
+ against objects that match the sparse limits, otherwise the
+ receiving side will not be able to use the resulting pack.
+
+----------------------------------------------------------------------
+
+Testcases needed:
+ * basics: checkout, status, diff, log (w/ options!), add, commit
+ * extras: blame, apply, bisect, branch, tag, grep, reset
+ * maintainence: fsck, prune, gc/repack, verify-pack
+ * plumbing: {read,write,ls,commit,merge,tar,diff}-tree, mktree
+ * direct: cat-file, show (esp. missing obj. or tag/branch of such)
+ * merge strat.: merge, cherry-pick/revert, rebase
+ * communication: pull, push, fetch, clone, bundle, archive
+ * protocols: http, ssh, git, rsync
+ * rewrite: filter-branch, fast-{export, import}
+ * notes: ?
+
+ General:
+ 'clone NON-BARE-REPO dst PATHS' should fail (needs double dash)!
+ git rev-list master should show subset of available commits
+ Keep Index sparse:
+ git add <path> for <path> not in git_sparse_pathspec should error out
+ update-index on <path> not in git_sparse_pathspec should error out
+ Sparse Index Handling:
+ merge into branch yet to be born, revert
+ checkout -m (to real branch, from valid or yet-to-be born branch)
+
+ Major TODOs:
+ * fetch
+ * push
+ * don't pass revlist arguments on command line to upload pack; use protocol
+ * densify command
+ * missing commits
+ * fix thin packs to only delta against objects within sparse limits
+ * lots more testcases
+ * cleanup FIXMEs
--
1.7.2.2.140.gd06af
Nguyen Thai Ngoc Duy
2010-09-05 03:01:06 UTC
Permalink
Post by Elijah Newren
+To ensure minimum necessary connectivity, we also download basic
+information from otherwise excluded commits
+ =C2=A0* parents of these commits
+ =C2=A0* trees matching the specified sparse path(s)
+but, for security and space reasons, do not download
+ =C2=A0* author
+ =C2=A0* author date
+ =C2=A0* committer
+ =C2=A0* committer date
+ =C2=A0* log message
+Such commits are still considered "missing" (see item I4 for more
+details about how we handle "missing" commits).
Just an observation. When I ran pack-objects with irrelevant commits
removed (i.e. try_to_simplify_commit) on Documentation/, I got a 6MB
pack. When I ran it without commit simplification, I got 16MB pack.
That's 10MB larger.

Now I don't how much of that 10MB share is commit messages, authors,
committers and trees but I suspect trees take a large part in it.
Maybe you can just fake the trees in those fake commits as well, to
avoid downloading more trees.
--=20
Duy
Elijah Newren
2010-09-05 03:13:13 UTC
Permalink
Post by Nguyen Thai Ngoc Duy
Post by Elijah Newren
+To ensure minimum necessary connectivity, we also download basic
+information from otherwise excluded commits
+ =C2=A0* parents of these commits
+ =C2=A0* trees matching the specified sparse path(s)
+but, for security and space reasons, do not download
+ =C2=A0* author
+ =C2=A0* author date
+ =C2=A0* committer
+ =C2=A0* committer date
+ =C2=A0* log message
+Such commits are still considered "missing" (see item I4 for more
+details about how we handle "missing" commits).
Just an observation. When I ran pack-objects with irrelevant commits
removed (i.e. try_to_simplify_commit) on Documentation/, I got a 6MB
pack. When I ran it without commit simplification, I got 16MB pack.
That's 10MB larger.
Hmm. I get 22MB pack for a full clone of git.git and 13 MB for a
sparse clone with Documentation/; that's including all commits too. I
wonder why our numbers differ by 3 MB. Weird.
Post by Nguyen Thai Ngoc Duy
Now I don't how much of that 10MB share is commit messages, authors,
committers and trees but I suspect trees take a large part in it.
Maybe you can just fake the trees in those fake commits as well, to
avoid downloading more trees.
I originally planned to do that, but that makes working with tags and
branches difficult. For example, documenters could clone a repository
but be unable to make new commits on top of maint or master since we
probably wouldn't have trees for the tips of those branches.

So I really think trees are needed. Of course, for someone making a
sparse clone of "just" the Documentation directory will need the
toplevel trees, but they won't need trees under t/, for example. So
they do get some savings.
Nguyen Thai Ngoc Duy
2010-09-06 03:14:16 UTC
Permalink
Post by Nguyen Thai Ngoc Duy
Just an observation. When I ran pack-objects with irrelevant commits
removed (i.e. try_to_simplify_commit) on Documentation/, I got a 6MB
pack. When I ran it without commit simplification, I got 16MB pack.
That's 10MB larger.
Hmm. =C2=A0I get 22MB pack for a full clone of git.git and 13 MB for =
a
sparse clone with Documentation/; that's including all commits too. =C2=
=A0I
wonder why our numbers differ by 3 MB. =C2=A0Weird.
Hmm.. just got a fresh alt-git.git from repo.or.cz and did
narrow/sparse clone from that. "git clone" reported 16.27MB for both
narrow/sparse version.
--=20
Duy
Loading...