Discussion:
Possible bug: Git submodules can get into broken state
Chris Wilson
2013-04-04 17:10:17 UTC
Hi all,

If your git repo's .gitmodules contains a URL that you don't have access
to (for example you download someone else's code and it references a
submodule using their writable ***@github.com URL) then:

* git submodule init will add them to .git/config, with the wrong URLs.

* git submodule update will fail to check out the repos, leaving an empty
directory for the first one, and nothing for the others.

This state is broken (wrong URLs in .git/config), and AFAIK there's
nothing you can do to check out these submodules without either:

(a) manually hacking them out of .git/config, or

(b) doing "git submodule rm" and then "git checkout .gitmodules" to undo
the damage to that file.

The procedure I tried, which I expected to work, was:

* git submodule sync (doesn't sync them, because the directories don't
exist or don't contain a valid git repo?)

* git submodule init (ignores them, because they're already in
.git/config?)

* git submodule update (still fails because the URL in .git/config is
wrong).

The new deinit command may help, but for the wrong reasons. I don't want
to have to deinit my modules every time in the fabric deployment script,
just so that if they get into this state, they will get unbroken
automatically.

It seems wrong to me that neither "git submodule init" nor "git submodule
sync" will modify the URL in .git/config, if the submodule is not already
checked out. I think I'd expect "git submodule init" to be idempotent, so
it would update the URLs in .git/config if they already exist, just like
it adds the URLs if they don't.

Any advice? Is this a real bug?

Cheers, Chris.
--
Aptivate | http://www.aptivate.org | Phone: +44 1223 967 838
Future Business, Cam City FC, Milton Rd, Cambridge, CB4 1UY, UK

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.
Junio C Hamano
2013-04-04 18:30:41 UTC
Post by Chris Wilson
If your git repo's .gitmodules contains a URL that you don't have
access to (for example you download someone else's code and it
references a submodule using their writable ***@github.com URL) then:
* git submodule init will add them to .git/config, with the wrong URLs.
* git submodule update will fail to check out the repos, leaving an
empty directory for the first one, and nothing for the others.
This state is broken (wrong URLs in .git/config), and AFAIK there's
nothing you can do to check out these submodules without either:
(a) manually hacking them out of .git/config, or
I do not think updating the config is "hacking", but is a perfectly
normal thing to do for a submodule user who wants to use a custom
URL different from what is recorded in .gitmodules (even when the
URL in .gitmodules is _working_, you may have a closer mirror you
would prefer to use, for example). It is how the configuration is
designed to be used, if I am not mistaken.

So I do not see any breakage here.
Chris Wilson
2013-04-04 20:40:13 UTC
Hi Junio,
Post by Junio C Hamano
Post by Chris Wilson
This state is broken (wrong URLs in .git/config), and AFAIK there's
(a) manually hacking them out of .git/config, or
I do not think updating the config is "hacking", but is a perfectly
normal thing to do for a submodule user who wants to use a custom
URL different from what is recorded in .gitmodules (even when the
URL in .gitmodules is _working_, you may have a closer mirror you
would prefer to use, for example). It is how the configuration is
designed to be used, if I am not mistaken.
It may be possible, but there's no easy command to do it, especially in
automated (fabric) deployment scripts. I do not want to write an awk/sed
script to remove all the submodules from .git/config so that I can
successfully run git submodule init again. What other choices do I have?
Post by Junio C Hamano
So I do not see any breakage here.
I do, it seems bizarre that git submodule init can create this situation
but cannot rectify it.

Cheers, Chris.
Junio C Hamano
2013-04-04 21:07:35 UTC
Post by Chris Wilson
I do not want to write an
awk/sed script to remove all the submodules from .git/config ...
Don't do it then ;-)

I think "git config" was added exactly because people wanted to
customize their configuration from their scripts.
Post by Chris Wilson
I do, it seems bizarre that git submodule init can create this
situation but cannot rectify it.
I do not see there is anything to rectify in your situation.

Whoever wrote .gitmodules may have supplied a URL you cannot access,
and the configuration mechanism is the way to use a custom URL in
place of it.
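
As a rough sketch of the "git config" approach mentioned above, the URL recorded for one submodule in .git/config can be overwritten from a script. The function name set_submodule_url, the submodule name libfoo and the example.com URL are made up for illustration; none of them come from this thread.

```shell
# Hypothetical helper: overwrite the URL that "git submodule init"
# recorded in .git/config for a single submodule.
#   $1 = submodule name as it appears in .gitmodules (e.g. libfoo, assumed)
#   $2 = a URL you can actually reach (assumed)
set_submodule_url() {
    git config "submodule.$1.url" "$2"
}

# Example, run from the top of the superproject:
#   set_submodule_url libfoo https://example.com/libfoo.git
#   git submodule update
```

Because this only rewrites one key in .git/config, no awk/sed editing of the file should be needed.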
Chris Wilson
2013-04-04 22:07:19 UTC
Post by Junio C Hamano
Post by Chris Wilson
I do not want to write an awk/sed script to remove all the submodules
from .git/config ...
Don't do it then ;-)
I think "git config" was added exactly because people wanted to
customize their configuration from their scripts.
OK, I didn't know about git config. I will investigate that. But if the
only automated way to fix the configuration in .git/config is by removing
all submodules and running submodule init again, I still think that's a
usability issue at least. If you won't accept it as a bug, would you
consider it a feature request?
Post by Junio C Hamano
Post by Chris Wilson
I do, it seems bizarre that git submodule init can create this
situation but cannot rectify it.
I do not see there is anything to rectify in your situation.
Whoever wrote .gitmodules may have supplied a URL you cannot access,
They originally supplied a wrong URL (one that I can't access).

They fixed it by checking in a rectified .gitmodules file.

In the meantime, I ran git submodule init, and now I've ended up in a
situation where git submodule update doesn't work, and there's no
submodule command to fix it, so I have to remove the broken submodules
from .git/config.
Post by Junio C Hamano
and the configuration mechanism is the way to use custom URL in
place of it.
I don't want to use a custom URL, I want to use the URL which is now in
.gitmodules.

Could submodule init at least change the URLs of submodules which are not
checked out? (e.g. because they couldn't be)?

Cheers, Chris.
Junio C Hamano
2013-04-05 16:51:31 UTC
Post by Chris Wilson
They fixed it by checking in a rectified .gitmodules file.
In the meantime, I ran git submodule init, and now I've ended up in a
situation where git submodule update doesn't work, and there's no
submodule command to fix it, so I have to remove the broken submodules
from .git/config.
Post by Junio C Hamano
and the configuration mechanism is the way to use custom URL in
place of it.
I don't want to use a custom URL, I want to use the URL which is now
in .gitmodules.
Then don't call it a "custom URL" ;-).

Isn't your problem that the old, broken one is now in your config,
and you want to update that with a corrected one? How you learned
which one is correct does not really matter---you may have learned it out
of band by looking at upstream's gitweb, or by running "git fetch"
in the superproject again and looking at the updated .gitmodules
file. The configuration mechanism has a way to update that entry.
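
One way to script that update, as a sketch only: copy each URL from the current .gitmodules over the stale entry in .git/config, touching only submodules that are already registered there. The function name resync_submodule_urls is hypothetical, and the loop assumes submodule names contain no whitespace.

```shell
# Hypothetical helper: refresh stale submodule URLs in .git/config from
# the .gitmodules file in the current directory (the superproject root).
resync_submodule_urls() {
    # Each output line looks like: submodule.<name>.url <url>
    git config -f .gitmodules --get-regexp '^submodule\..*\.url$' |
    while read -r key url; do
        # Skip submodules that were never registered by "git submodule init".
        if git config --get "$key" >/dev/null; then
            git config "$key" "$url"
        fi
    done
}

# Example, run from the top of the superproject after fetching the
# corrected .gitmodules:
#   resync_submodule_urls
#   git submodule update
```

This is close in spirit to what the git-submodule documentation describes for "git submodule sync", which likewise only affects submodules that already have a URL entry in .git/config.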
Post by Chris Wilson
Could submodule init at least change the URLs of submodules which are
not checked out? (e.g. because they couldn't be)?
Perhaps "submodule deinit" might be what you are looking for, but I
dunno.

Jens Lehmann
2013-04-04 19:56:33 UTC
Post by Chris Wilson
* git submodule init will add them to .git/config, with the wrong URLs.
* git submodule update will fail to check out the repos, leaving an empty directory for the first one, and nothing for the others.
(a) manually hacking them out of .git/config, or
... or:

(c) Enter the correct URL in .git/config.
Post by Chris Wilson
(b) doing "git submodule rm" and then "git checkout .gitmodules" to undo the damage to that file.
Hmm ... that leaves your superproject dirty, right?

(d) Update the .gitmodules file to use the correct URL (you
want to do a commit fixing that anyway, no? ;-) and do a
"git submodule sync", which will copy the corrected URL
into .git/config.
Post by Chris Wilson
* git submodule sync (doesn't sync them, because the directories don't exist or don't contain a valid git repo?)
No, because .gitmodules still contained the broken URL which a
sync then copies into .git/config again.
Post by Chris Wilson
* git submodule init (ignores them, because they're already in .git/config?)
Correct.
Post by Chris Wilson
* git submodule update (still fails because the URL in .git/config is wrong).
Sure.
Post by Chris Wilson
The new deinit command may help, but for the wrong reasons. I don't want to have to deinit my modules every time in the fabric deployment script, just so that if they get into this state, they will get unbroken automatically.
I doubt deinit will help here (except after running that you'll
be able to use "git submodule update" to populate the remaining
submodules) unless you fix the broken URL in .git/config or
.gitmodules.
Post by Chris Wilson
It seems wrong to me that neither "git submodule init" nor "git submodule sync" will modify the URL in .git/config, if the submodule is not already checked out. I think I'd expect "git submodule init" to be idempotent, so it would update the URLs in .git/config if they already exist, just like it adds the URLs if they don't.
Any advice? Is this a real bug?
Hmm, at first glance this looks like a pilot error. Maybe we could
update the documentation to help other users falling into that
trap or extend some commands to be a bit more helpful in such a
case, but it looks like the behavior you observed is documented
(while how to fix the problem isn't, at least not explicitly). Also, an
option for "git submodule update" to continue even if it fails to
populate some submodules might help here.