Discussion:
Can I fetch an arbitrary commit by sha1?
Christian Halstrick
2014-10-02 13:57:45 UTC
I always thought that during a fetch I have to specify a refspec and that a
sha1 would not be accepted as a ref. Firing something like 'git fetch
origin <sha1>' should be forbidden. But in fact I see that such a
fetch command succeeds if you already have that object in your local
repo.

My question: is it allowed to fetch sha1's? Shouldn't fetch fail if you try it?
git clone -q https://github.com/chalstrick/dondalfi.git
cd dondalfi
git ls-remote
From https://github.com/chalstrick/dondalfi.git
ce08dcc41104383f3cca2b95bd41e9054a957f5b HEAD
af00f4c39bcc8dc29ed8f59a47066d5993c279e4 refs/foo/b1
...
git show af00f4c39bcc8dc29ed8f59a47066d5993c279e4
fatal: bad object af00f4c39bcc8dc29ed8f59a47066d5993c279e4
git fetch origin af00f4c39bcc8dc29ed8f59a47066d5993c279e4
error: no such remote ref af00f4c39bcc8dc29ed8f59a47066d5993c279e4
git fetch origin refs/foo/b1
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://github.com/chalstrick/dondalfi
* branch refs/foo/b1 -> FETCH_HEAD
git fetch origin af00f4c39bcc8dc29ed8f59a47066d5993c279e4
From https://github.com/chalstrick/dondalfi
* branch af00f4c39bcc8dc29ed8f59a47066d5993c279e4 -> FETCH_HEAD

Ciao
Chris
Dan Johnson
2014-10-02 14:22:50 UTC
On Thu, Oct 2, 2014 at 9:57 AM, Christian Halstrick
Post by Christian Halstrick
I always thought that during a fetch I have to specify a refspec and that a
sha1 would not be accepted as a ref. Firing something like 'git fetch
origin <sha1>' should be forbidden. But in fact I see that such a
fetch command succeeds if you already have that object in your local
repo.
My question: is it allowed to fetch sha1's? Shouldn't fetch fail if you try it?
git clone -q https://github.com/chalstrick/dondalfi.git
cd dondalfi
git ls-remote
From https://github.com/chalstrick/dondalfi.git
ce08dcc41104383f3cca2b95bd41e9054a957f5b HEAD
af00f4c39bcc8dc29ed8f59a47066d5993c279e4 refs/foo/b1
...
git show af00f4c39bcc8dc29ed8f59a47066d5993c279e4
fatal: bad object af00f4c39bcc8dc29ed8f59a47066d5993c279e4
git fetch origin af00f4c39bcc8dc29ed8f59a47066d5993c279e4
error: no such remote ref af00f4c39bcc8dc29ed8f59a47066d5993c279e4
git fetch origin refs/foo/b1
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://github.com/chalstrick/dondalfi
* branch refs/foo/b1 -> FETCH_HEAD
git fetch origin af00f4c39bcc8dc29ed8f59a47066d5993c279e4
From https://github.com/chalstrick/dondalfi
* branch af00f4c39bcc8dc29ed8f59a47066d5993c279e4 -> FETCH_HEAD
My understanding is that you are allowed to ask for a SHA1, but most
git servers refuse the request. But if you already have the SHA
locally, then git doesn't need to bother asking the server for it, so
there's no request to be refused.
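A minimal C model of the behaviour described above, under stated assumptions: the local object store is reduced to a string array, and the remote's decision is reduced to a boolean (server_accepts_want is a hypothetical stand-in, not a Git setting). It only illustrates why a locally present object makes the fetch "succeed" without any request reaching the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes of "git fetch <remote> <object-id>" in this model. */
enum fetch_outcome {
	FETCH_ALREADY_LOCAL, /* object found locally: no request is sent at all */
	FETCH_RECEIVED,      /* remote accepted the want and sent objects */
	FETCH_REFUSED        /* remote rejected the want */
};

/* Hypothetical stand-in for a lookup in the local object database. */
static bool have_object_locally(const char *id, const char *const *local, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(local[i], id) == 0)
			return true;
	return false;
}

/*
 * The remote's policy is only consulted when the object is missing
 * locally, so an object we already have never reaches the check that
 * would refuse it.
 */
static enum fetch_outcome fetch_by_id(const char *id,
				      const char *const *local, size_t n_local,
				      bool server_accepts_want)
{
	assert(id != NULL);
	if (have_object_locally(id, local, n_local))
		return FETCH_ALREADY_LOCAL;
	return server_accepts_want ? FETCH_RECEIVED : FETCH_REFUSED;
}
```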

But it's been a while for me since I did any git development, so it's
possible I missed something.

-Dan
Jeff King
2014-10-02 16:10:06 UTC
Post by Dan Johnson
Post by Christian Halstrick
git show af00f4c39bcc8dc29ed8f59a47066d5993c279e4
fatal: bad object af00f4c39bcc8dc29ed8f59a47066d5993c279e4
git fetch origin af00f4c39bcc8dc29ed8f59a47066d5993c279e4
error: no such remote ref af00f4c39bcc8dc29ed8f59a47066d5993c279e4
git fetch origin refs/foo/b1
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://github.com/chalstrick/dondalfi
* branch refs/foo/b1 -> FETCH_HEAD
git fetch origin af00f4c39bcc8dc29ed8f59a47066d5993c279e4
From https://github.com/chalstrick/dondalfi
* branch af00f4c39bcc8dc29ed8f59a47066d5993c279e4 -> FETCH_HEAD
My understanding is that you are allowed to ask for a SHA1, but most
git servers refuse the request. But if you already have the SHA
locally, then git doesn't need to bother asking the server for it, so
there's no request to be refused.
That's right. It is the server which enforces the "you cannot fetch an
arbitrary sha1" rule.

But I think Christian is arguing that the client side should complain
that $sha1 is not a remote ref, and therefore not something we can
fetch. This used to be the behavior until 6e7b66e (fetch: fetch objects
by their exact SHA-1 object names, 2013-01-29). The idea there is that
some refs may be kept "hidden" from the ref advertisement, but clients
who learn about the sha1 out-of-band may fetch the tips of hidden refs.
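A rough sketch of the client-side distinction described above, assuming (as a simplification) that a fetch source consisting of exactly 40 hex digits is treated as an object name, while anything else is looked up among the advertised refs. The accept_object_ids flag and the function names are hypothetical; the flag only contrasts the behaviour before and after 6e7b66e.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum fetch_source { SOURCE_REF_NAME, SOURCE_OBJECT_ID };

/* True if s is exactly 40 hexadecimal digits, i.e. a full SHA-1 in hex. */
static bool is_full_hex_sha1(const char *s)
{
	if (strlen(s) != 40)
		return false;
	for (size_t i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)s[i]))
			return false;
	return true;
}

/*
 * accept_object_ids == false models the older client, which looked every
 * source up in the ref advertisement (and so complained "no such remote
 * ref" for a bare SHA-1); true models the later behaviour.
 */
static enum fetch_source classify_fetch_source(const char *src, bool accept_object_ids)
{
	assert(src != NULL);
	if (accept_object_ids && is_full_hex_sha1(src))
		return SOURCE_OBJECT_ID;
	return SOURCE_REF_NAME;
}
```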

I'm not sure it is a feature that has been particularly well-used to
date, though.

-Peff
Jonathan Nieder
2014-10-02 17:35:51 UTC
Post by Jeff King
But I think Christian is arguing that the client side should complain
that $sha1 is not a remote ref, and therefore not something we can
fetch. This used to be the behavior until 6e7b66e (fetch: fetch objects
by their exact SHA-1 object names, 2013-01-29). The idea there is that
some refs may be kept "hidden" from the ref advertisement, but clients
who learn about the sha1 out-of-band may fetch the tips of hidden refs.
I'm not sure it is a feature that has been particularly well-used to
date, though.
I use it pretty often. The commits I'm fetching are pointed to
directly by refs, but I don't care about what the ref is called and I
want exactly that commit.

The context is that the commit is mentioned in the gerrit web UI.
Fetching by commit name feels simpler than getting the
refs/changes/something ref, since I think in terms of commits instead
of in terms of change numbers.
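For comparison, Gerrit conventionally names the refs mentioned above refs/changes/<NN>/<change>/<patchset>, where NN is the last two digits of the change number, zero-padded. The helper below is a hypothetical illustration of that naming scheme, not part of Git or Gerrit; fetching by commit name avoids having to know the change and patch set numbers at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats refs/changes/<NN>/<change>/<patchset>,
 * with NN = change % 100, printed as two digits.
 * Returns the snprintf() result (length of the full name).
 */
static int gerrit_change_ref(char *buf, size_t len, unsigned change, unsigned patchset)
{
	assert(change > 0 && patchset > 0);
	return snprintf(buf, len, "refs/changes/%02u/%u/%u",
			change % 100, change, patchset);
}
```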

Thanks and hope that helps,
Jonathan
Christian Halstrick
2014-10-05 20:49:09 UTC
I also like the feature of being able to fetch commits by SHA-1. My
problem is that it is not clear to end users whether they can fetch
SHA-1 from a specific server or not. For exactly the same server a
"git fetch origin <id-of-commit-x>" first doesn't work and then all of a
sudden that command works and updates e.g. FETCH_HEAD. That's because
between the first and the second fetch you fetched that commit already
by fetching a branch.

And even if the commit is known only to the local repo, the fetch
works. I tried to fetch a commit which I had just created locally, and the
fetch succeeded:
git fetch eclipse 382dfeab0e11bd88388d7195114c046c3ec27d8f
From https://git.eclipse.org/r/jgit/jgit
* branch 382dfeab0e11bd88388d7195114c046c3ec27d8f -> FETCH_HEAD

This gives me the impression that that update was triggered by data
coming from the server https://git.eclipse.org/r/jgit/jgit. But the
server doesn't know the commit. In my eyes the fetch should fail if
the server doesn't know the commit.
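A sketch of the stricter behaviour requested above, written as a hypothetical strict option (not an existing git-fetch flag). To keep it short, "the server knows the commit" is approximated by "the commit is an advertised tip"; a real server may also know non-tip ancestors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum checked_outcome {
	FETCH_SATISFIED_LOCALLY,   /* success reported, server never consulted */
	FETCH_CONFIRMED_BY_SERVER, /* server advertised the object */
	FETCH_UNKNOWN_TO_SERVER    /* the fetch fails */
};

static bool in_list(const char *id, const char *const *ids, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(ids[i], id) == 0)
			return true;
	return false;
}

/* Hypothetical: with strict set, a local copy never short-circuits the check. */
static enum checked_outcome fetch_by_id_checked(const char *id,
		const char *const *local, size_t n_local,
		const char *const *advertised, size_t n_adv,
		bool strict)
{
	bool on_server;

	assert(id != NULL);
	on_server = in_list(id, advertised, n_adv);
	if (!strict && in_list(id, local, n_local))
		return FETCH_SATISFIED_LOCALLY; /* the behaviour observed above */
	return on_server ? FETCH_CONFIRMED_BY_SERVER : FETCH_UNKNOWN_TO_SERVER;
}
```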

Ciao
Chris
Patrick Donnelly
2014-10-06 18:25:55 UTC
Post by Jeff King
Post by Dan Johnson
My understanding is that you are allowed to ask for a SHA1, but most
git servers refuse the request. But if you already have the SHA
locally, then git doesn't need to bother asking the server for it, so
there's no request to be refused.
That's right. It is the server which enforces the "you cannot fetch an
arbitrary sha1" rule.
But I think Christian is arguing that the client side should complain
that $sha1 is not a remote ref, and therefore not something we can
fetch. This used to be the behavior until 6e7b66e (fetch: fetch objects
by their exact SHA-1 object names, 2013-01-29). The idea there is that
some refs may be kept "hidden" from the ref advertisement, but clients
who learn about the sha1 out-of-band may fetch the tips of hidden refs.
I'm not sure it is a feature that has been particularly well-used to
date, though.
There are efforts in the scientific communities at preserving
experimental software and results. One of the things we'd like to do
is shallow clone a specific sha1 commit from e.g. GitHub. [I think
GitHub has this disabled though? I haven't been able to get it to
work.] I guess this feature was a step in the right direction but it's
not usable AFAIK. Tags are not really suitable as they could change
and there are possible namespace issues.
--
Patrick Donnelly
David Lang
2014-10-06 18:28:41 UTC
Post by Patrick Donnelly
There are efforts in the scientific communities at preserving
experimental software and results. One of the things we'd like to do
is shallow clone a specific sha1 commit from e.g. GitHub. [I think
GitHub has this disabled though? I haven't been able to get it to
work.] I guess this feature was a step in the right direction but it's
not usable AFAIK. Tags are not really suitable as they could change
and there are possible namespace issues.
remember that git != github and it's not hard to run your own git server.

if you sign tags, they should be very stable. You do have the namespace issue,
but unless you have a lot of different people tagging in the same repository,
that shouldn't be an issue (and if you do, can't you use the person's name as
part of the tag?)

David Lang
Duy Nguyen
2014-10-07 12:34:36 UTC
Post by Patrick Donnelly
Post by Jeff King
Post by Dan Johnson
My understanding is that you are allowed to ask for a SHA1, but most
git servers refuse the request. But if you already have the SHA
locally, then git doesn't need to bother asking the server for it, so
there's no request to be refused.
That's right. It is the server which enforces the "you cannot fetch an
arbitrary sha1" rule.
But I think Christian is arguing that the client side should complain
that $sha1 is not a remote ref, and therefore not something we can
fetch. This used to be the behavior until 6e7b66e (fetch: fetch objects
by their exact SHA-1 object names, 2013-01-29). The idea there is that
some refs may be kept "hidden" from the ref advertisement, but clients
who learn about the sha1 out-of-band may fetch the tips of hidden refs.
I'm not sure it is a feature that has been particularly well-used to
date, though.
There are efforts in the scientific communities at preserving
experimental software and results. One of the things we'd like to do
is shallow clone a specific sha1 commit
You're not the first one asking about making a shallow clone from
a specific point. I think the reason fetching an arbitrary sha-1 is
not supported is security. If we can verify that the requested sha-1
is reachable from the visible ref set, then we should allow it. With
pack bitmaps, it's getting much cheaper to do such a test. If pack
bitmaps are not used, we could set a default/configurable limit, like
not traversing more than 1000 commits from any ref for this
reachability test. Does anybody object to this approach?
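A minimal sketch of the bounded reachability test proposed above, assuming an in-memory commit graph with at most two parents per commit and a per-ref visit limit (1000 in the example). Pack bitmaps, which would make the test cheaper, are not modelled; struct commit and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_PARENTS 2

/* Hypothetical in-memory commit: parents are indices into the graph, -1 = none. */
struct commit {
	int parents[MAX_PARENTS];
};

enum reach_result { REACHABLE, NOT_REACHABLE, LIMIT_EXCEEDED };

/* Breadth-first walk from one tip, visiting at most `limit` commits. */
static enum reach_result reachable_from_tip(const struct commit *graph, int n,
					    int tip, int want, int limit)
{
	bool *seen;
	int *queue;
	int head = 0, tail = 0, visited = 0;
	enum reach_result res = NOT_REACHABLE;

	assert(n > 0 && tip >= 0 && tip < n && limit > 0);
	seen = calloc((size_t)n, sizeof *seen);
	queue = malloc((size_t)n * sizeof *queue);
	assert(seen != NULL && queue != NULL);

	queue[tail++] = tip;
	seen[tip] = true;
	while (head < tail) {
		if (visited == limit) {
			res = LIMIT_EXCEEDED; /* gave up with commits still queued */
			break;
		}
		int c = queue[head++];
		visited++;
		if (c == want) {
			res = REACHABLE;
			break;
		}
		for (int p = 0; p < MAX_PARENTS; p++) {
			int parent = graph[c].parents[p];
			if (parent >= 0 && !seen[parent]) {
				seen[parent] = true;
				queue[tail++] = parent;
			}
		}
	}
	free(seen);
	free(queue);
	return res;
}

/* Try every visible ref tip; a server would refuse unless REACHABLE. */
static enum reach_result reachable_from_any(const struct commit *graph, int n,
					    const int *tips, int n_tips,
					    int want, int limit)
{
	bool hit_limit = false;

	for (int i = 0; i < n_tips; i++) {
		enum reach_result r = reachable_from_tip(graph, n, tips[i], want, limit);
		if (r == REACHABLE)
			return REACHABLE;
		if (r == LIMIT_EXCEEDED)
			hit_limit = true;
	}
	return hit_limit ? LIMIT_EXCEEDED : NOT_REACHABLE;
}
```

Treating LIMIT_EXCEEDED like NOT_REACHABLE keeps the server's cost bounded, at the price of refusing some commits that are in fact reachable but lie deep in history.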
--
Duy
Duy Nguyen
2014-10-07 13:12:57 UTC
If we can verify the asked sha-1 is reachable from the visible ref
set, then we should allow it. With pack bitmaps, it's getting much
cheaper to do such a test. If pack bitmaps are not used, we could
set a default/configurable limit, like not traversing more than 1000
commits from any ref for this reachability test.
Hmm.. Junio already did most of the work in 051e400 (helping
smart-http/stateless-rpc fetch race - 2011-08-05), so all we need to
do is enable uploadpack.allowtipsha1inwant and apply this patch

-- 8< --
diff --git a/upload-pack.c b/upload-pack.c
index c789ec0..493f8ee 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -454,10 +454,6 @@ static void check_non_tip(void)
 	char namebuf[42]; /* ^ + SHA-1 + LF */
 	int i;
 
-	/* In the normal in-process case non-tip request can never happen */
-	if (!stateless_rpc)
-		goto error;
-
 	cmd.argv = argv;
 	cmd.git_cmd = 1;
 	cmd.no_stderr = 1;
-- 8< --

If we already let smart-http do this, I don't see any harm in letting
git protocol do the same (even though it's the original reason why
this code exists).
--
Duy
Junio C Hamano
2014-10-07 16:52:33 UTC
Post by Duy Nguyen
Hmm.. Junio already did most of the work in 051e400 (helping
smart-http/stateless-rpc fetch race - 2011-08-05), so all we need to
do is enable uploadpack.allowtipsha1inwant and apply this patch
Not that patch, I would think.

I would understand "if !stateless_rpc and !allowtipsha1 then it is
an error", though.
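The two guard conditions can be compared as plain booleans. This is a hypothetical model that only captures whether check_non_tip() jumps to its error path before running the reachability check; the real function does more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Current code: `if (!stateless_rpc) goto error;` */
static bool current_guard_rejects(bool stateless_rpc, bool allow_tip_sha1_in_want)
{
	(void)allow_tip_sha1_in_want; /* the setting is not consulted here */
	return !stateless_rpc;
}

/* Suggested: an error only if !stateless_rpc and !allowtipsha1. */
static bool suggested_guard_rejects(bool stateless_rpc, bool allow_tip_sha1_in_want)
{
	return !stateless_rpc && !allow_tip_sha1_in_want;
}
```

Under the suggested guard, the in-process (non-HTTP) case falls through to the reachability check only when the server operator has opted in.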
Duy Nguyen
2014-10-08 13:30:29 UTC
Post by Junio C Hamano
Post by Duy Nguyen
Hmm.. Junio already did most of the work in 051e400 (helping
smart-http/stateless-rpc fetch race - 2011-08-05), so all we need to
do is enable uploadpack.allowtipsha1inwant and apply this patch
Not that patch, I would think.
I would understand "if !stateless_rpc and !allowtipsha1 then it is
an error", though.
Fair enough. It seems to work, technically, using the patch below. But
I think people would rather have support from "git clone", and "git
clone --branch" can't deal with SHA-1s this way yet. And --branch might
be a bad place to enable this.

So it needs more work. Any help is appreciated, as I still need to
finish my untracked cache series first and re-evaluate watchman series
before git 3.0 is released.

-- 8< --
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index 67e0ab3..bdc121e 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -1277,4 +1277,22 @@ EOF
 	git push --no-thin --receive-pack="$rcvpck" no-thin/.git refs/heads/master:refs/heads/foo
 '
 
+test_expect_success 'shallow fetch reachable SHA1 (but not a ref)' '
+	mk_empty testrepo &&
+	(
+		cd testrepo &&
+		test_commit foo &&
+		test_commit bar
+	) &&
+	SHA1=`git --git-dir=testrepo/.git rev-parse HEAD^` &&
+	git init shallow &&
+	(
+		cd shallow &&
+		test_must_fail git fetch --depth=1 ../testrepo/.git $SHA1 &&
+		git --git-dir=../testrepo/.git config uploadpack.allowtipsha1inwant true &&
+		git fetch --depth=1 ../testrepo/.git $SHA1 &&
+		git cat-file commit $SHA1 >/dev/null
+	)
+'
+
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index c789ec0..4a9a656 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -454,8 +454,12 @@ static void check_non_tip(void)
 	char namebuf[42]; /* ^ + SHA-1 + LF */
 	int i;
 
-	/* In the normal in-process case non-tip request can never happen */
-	if (!stateless_rpc)
+	/*
+	 * In the normal in-process case without
+	 * uploadpack.allowtipsha1inwant, non-tip requests can never
+	 * happen
+	 */
+	if (!stateless_rpc && !allow_tip_sha1_in_want)
 		goto error;
 
 	cmd.argv = argv;
-- 8< --
Junio C Hamano
2014-10-09 18:08:06 UTC
Post by Duy Nguyen
Post by Junio C Hamano
Post by Duy Nguyen
Hmm.. Junio already did most of the work in 051e400 (helping
smart-http/stateless-rpc fetch race - 2011-08-05), so all we need to
do is enable uploadpack.allowtipsha1inwant and apply this patch
Not that patch, I would think.
I would understand "if !stateless_rpc and !allowtipsha1 then it is
an error", though.
Fair enough. It seems to work, technically, using the patch below. But
I think people would rather have support from "git clone", and "git
clone --branch" can't deal with SHA-1s this way yet. And --branch might
be a bad place to enable this.
So it needs more work.
This is so non-standard a thing to do that I doubt it is worth
supporting with "git clone". "git clone --branch", which is about
"I want to follow that particular branch", would not mesh well with
"I want to see the history that leads to this exact commit", either.
You would not know which branch(es) that exact commit is on in
the first place.

I would not say that "git archive" is sufficient, however, as "I
want to see the history that leads to the commit" is different from
"I want to grab the state recorded at that commit".

The "uploadpack.allowtipsha1inwant" is a wrong configuration to tie
this into. The intent of the configuration is to allow *ONLY*
commits at the tip of the (possibly hidden) refs to be asked for.
Those who want to hide some refs using "uploadpack.hiderefs" may
want to enable "allowtipsha1inwant" to allow the tips of the hidden
refs while still disallowing a request to fetch any random reachable
commit not at the tip.
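A sketch of the tip-only rule described above, using made-up abbreviated object names ("1111", "2222", "3333"), a made-up ref name refs/hidden/topic, and a hypothetical struct ref whose hidden flag loosely stands in for uploadpack.hiderefs. Advertised tips are always acceptable, hidden tips only when allowtipsha1inwant is enabled, and any other commit is refused even if it is reachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ref record; `hidden` marks a ref kept out of the advertisement. */
struct ref {
	const char *name;
	const char *oid;
	bool hidden;
};

enum want_verdict { WANT_ADVERTISED_TIP, WANT_HIDDEN_TIP, WANT_REJECTED };

static enum want_verdict check_want(const char *oid, const struct ref *refs,
				    size_t n, bool allow_tip_sha1_in_want)
{
	bool hidden_tip = false;

	assert(oid != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(refs[i].oid, oid) != 0)
			continue;
		if (!refs[i].hidden)
			return WANT_ADVERTISED_TIP;
		hidden_tip = true;
	}
	if (hidden_tip && allow_tip_sha1_in_want)
		return WANT_HIDDEN_TIP;
	/* Not at any tip (e.g. a reachable ancestor): refused under this rule. */
	return WANT_REJECTED;
}
```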

The "check_non_tip()" hack is a work-around for the deficiency of
the smart HTTP protocol (the tips of the refs the client reads off
of the server end are not the tips of the refs the serving server
verifies against the request due to information loss between the two
processes at the server end), and is not necessary for the proper
Git transport, where the server who first grabbed its tips of refs
and advertised them will know what it advertised and can expect the
request to come back asking exactly for those refs, not random
ancestors of those refs.
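A simplified model of the smart-HTTP situation described above, assuming linear history stored as a parent array: the process that answers the request sees the refs as they are now, so a tip advertised a moment earlier may already be an ancestor of the current tip. Exact tip matching then fails, and a reachability fallback in the spirit of check_non_tip() accepts it. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Linear history only: parent[i] is the parent of commit i, -1 for a root. */
static bool is_ancestor_or_self(const int *parent, int n, int ancestor, int tip)
{
	assert(tip >= 0 && tip < n);
	/* The step bound guards against a malformed (cyclic) parent array. */
	for (int c = tip, steps = 0; c >= 0 && steps < n; c = parent[c], steps++)
		if (c == ancestor)
			return true;
	return false;
}

static bool is_current_tip(int want, const int *tips, int n_tips)
{
	for (int i = 0; i < n_tips; i++)
		if (tips[i] == want)
			return true;
	return false;
}

/*
 * The serving process only sees the current refs. An exact tip match may
 * fail after a ref moved, so fall back to a reachability check.
 */
static bool accept_want(int want, const int *parent, int n,
			const int *tips, int n_tips)
{
	if (is_current_tip(want, tips, n_tips))
		return true;
	for (int i = 0; i < n_tips; i++)
		if (is_ancestor_or_self(parent, n, want, tips[i]))
			return true;
	return false;
}
```

A server speaking the native protocol in a single process does not need this fallback, because it can compare the request against exactly the tips it advertised.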

This new feature needs to be enabled with a different configuration
variable, perhaps "uploadpack.allownontipsha1inwant". It has
the associated cost of having to walk back the history to check the
reachability.

Thanks.
