Discussion:
[PATCH] git-submodule: Remove duplicate entries during merge with conflict
Nicolas Morey-Chaisemartin
2011-03-17 08:09:14 UTC
Permalink
During a merge with conflict on a submodule, the submodule appears 3 times in git ls-files (stage 1,2,3) which causes the submodule to be used 3 times in git submodule update or status command.
This patch filters the results of git ls-files and only shows submodule in stage 0 or 1 thus removing the duplicates.

Signed-off-by: Nicolas Morey-Chaisemartin <***@morey-chaisemartin.com>
---
git-submodule.sh | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 3a13397..5ef0f9d 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -72,7 +72,7 @@ resolve_relative_url ()
#
module_list()
{
- git ls-files --error-unmatch --stage -- "$@" | sane_grep '^160000 '
+ git ls-files --error-unmatch --stage -- "$@" | sane_grep '^160000 ' | awk '{ if (($3 == 0) || ($3 == 1)) print $0}'
}

#
Junio C Hamano
2011-03-17 18:47:12 UTC
Permalink
Post by Nicolas Morey-Chaisemartin
During a merge with conflict on a submodule, the submodule appears 3
times in git ls-files (stage 1,2,3) which causes the submodule to be
used 3 times in git submodule update or status command. This patch
filters the results of git ls-files and only shows submodule in stage 0
or 1 thus removing the duplicates.
That is a wrong thing to do. What if both sides added a submodule at the
same directory after forked from an ancestor that did not have it? You
will have only stage #2 and stage #3, no?
Junio C Hamano
2011-03-17 20:50:10 UTC
Permalink
Post by Junio C Hamano
Post by Nicolas Morey-Chaisemartin
During a merge with conflict on a submodule, the submodule appears 3
times in git ls-files (stage 1,2,3) which causes the submodule to be
used 3 times in git submodule update or status command. This patch
filters the results of git ls-files and only shows submodule in stage 0
or 1 thus removing the duplicates.
That is a wrong thing to do. What if both sides added a submodule at the
same directory after forked from an ancestor that did not have it? You
will have only stage #2 and stage #3, no?
It is not very friendly to say a solution is wrong without hinting what
the right thing to do, so let's try.

There are 5 callers of module_list; they all read (mode, sha1, stage,
path) tuple, and most of them care only about path. As a first level
approximation, it should be Ok (in the sense that it does not make things
worse than it currently is) to filter the duplicate paths from module_list
output.

But a bigger question is, shouldn't some callers _care_ when the
superproject is still unmerged? What does it mean to say "submodule
update" when your superproject is unmerged, you haven't resolved it so you
don't know if you are going to take the commit your branch points at, or
the commit their branch points at, or a commit completely different from
either? If submodule.$name.update is set to detach and checkout the
commit registred at the superproject, which stage do you want to get the
commit to update the submodule with? If it is set to "rebase" or "merge",
which stage do you take the commit to rebase/merge your history onto?

I think "update" should not touch the submodule repository in such a case.
"status" might want to recurse into the submodule repository, but there is
no information at the superproject level to compare the submodule status
against, so we should at least skip that part for submodule entries that
are unmerged in the superproject.

Perhaps something along the lines of (as usual, totally untested, just to
illustrate the concept) the attached patch?

The idea is to notice the higher-stage entries, and emit only one record
from module_list, but while doing so, mark the entry with "U" (not [0-3])
in $stage field and null out the SHA-1 part, as the object name for the
lowest stage does not give any useful information to the caller, and this
way any caller that uses the object name would hopefully barf.

Fine points:

- The command called by foreach may want to do whatever it wants to do by
noticing the merged status in the superproject itself, so I made no
change to the function;

- Init is an unlikely thing to do, but as long as a submodule is there in
$path, there is no point skipping it. It might however want to take the
merged status of .gitmodules into account, but that is outside of the
scope of this topic;

- Update should refrain from touching the submodule itself. It may want
to recurse into the submodule of the unmerged one, but people involved
in submodule work should think things through;

- Status does not have anything to report for an unmerged submodule. It
may want to recurse into the submodule of the unmerged one, but people
involved in submodule work should think things through; and

- Sync should be Ok for the same reason as Init.



git-submodule.sh | 29 ++++++++++++++++++++++++++++-
1 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 3a13397..8ecd311 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -72,7 +72,24 @@ resolve_relative_url ()
#
module_list()
{
- git ls-files --error-unmatch --stage -- "$@" | sane_grep '^160000 '
+ git ls-files --error-unmatch --stage -- "$@" |
+ perl -e '
+ my %unmerged = ();
+ my ($null_sha1) = ("0" x 40);
+ while (<STDIN>) {
+ chomp;
+ my ($mode, $sha1, $stage, $path) =
+ /^([0-7]+) ([0-9a-f]{40}) ([0-3])\t(.*)$/;
+ next unless $mode eq "160000";
+ if ($stage ne "0") {
+ if (!$unmerged{$path}++) {
+ print "$mode $null_sha1 U\t$path\n";
+ }
+ next;
+ }
+ print "$_\n";
+ }
+ '
}

#
@@ -427,6 +444,11 @@ cmd_update()
module_list "$@" |
while read mode sha1 stage path
do
+ if test "$stage" = U
+ then
+ echo >&2 "Skipping unmerged submodule $path"
+ continue
+ fi
name=$(module_name "$path") || exit
url=$(git config submodule."$name".url)
update_module=$(git config submodule."$name".update)
@@ -767,6 +789,11 @@ cmd_status()
module_list "$@" |
while read mode sha1 stage path
do
+ if test "$stage" = U
+ then
+ echo >&2 "Skipping unmerged submodule $path"
+ continue
+ fi
name=$(module_name "$path") || exit
url=$(git config submodule."$name".url)
displaypath="$prefix$path"
Nicolas Morey-Chaisemartin
2011-03-21 08:43:23 UTC
Permalink
Post by Junio C Hamano
Post by Junio C Hamano
Post by Nicolas Morey-Chaisemartin
During a merge with conflict on a submodule, the submodule appears 3
times in git ls-files (stage 1,2,3) which causes the submodule to be
used 3 times in git submodule update or status command. This patch
filters the results of git ls-files and only shows submodule in stage 0
or 1 thus removing the duplicates.
That is a wrong thing to do. What if both sides added a submodule at the
same directory after forked from an ancestor that did not have it? You
will have only stage #2 and stage #3, no?
After a quick check, you're right on that ;)
Post by Junio C Hamano
It is not very friendly to say a solution is wrong without hinting what
the right thing to do, so let's try.
Thanks for taking the time to point me in the right direction !
Post by Junio C Hamano
There are 5 callers of module_list; they all read (mode, sha1, stage,
path) tuple, and most of them care only about path. As a first level
approximation, it should be Ok (in the sense that it does not make things
worse than it currently is) to filter the duplicate paths from module_list
output.
...
Post by Junio C Hamano
- Status does not have anything to report for an unmerged submodule. It
may want to recurse into the submodule of the unmerged one, but people
involved in submodule work should think things through; and
I agree that the actual behavior of status is definitely wrong and it should be changed.
But I think there needs to be a simple way for a user to know whats happening to a conflicted submodule.
When merging a file, editing the conflicting area in the code is often enough.
For a submodule, I think the user needs an easy access to which branch used what SHA1.

Git submodule status is probably not the right place to do that.
git ls-files --stage allows that but it's not part of the standard porcelain commands...

I guess a first step should be to apply the patch you proposed and discuss a later solution to get the submodule SHA1 (well except if there's already one I just don't know about !)

I'll check/test the patch you proposed as soon as possible and repost it if it's allright !

Nicolas Morey-Chaisemartin
Jens Lehmann
2011-03-21 20:53:27 UTC
Permalink
Post by Nicolas Morey-Chaisemartin
I agree that the actual behavior of status is definitely wrong and it should be changed.
Yup.
Post by Nicolas Morey-Chaisemartin
But I think there needs to be a simple way for a user to know whats happening to a conflicted submodule.
When merging a file, editing the conflicting area in the code is often enough.
For a submodule, I think the user needs an easy access to which branch used what SHA1.
You can see the sha1s using "git diff":

diff --cc submod
index bd4cfe7,7128fbd..0000000
--- a/submod
+++ b/submod

(But the sha1s don't tell you much about what differing commits you
have in the submodule branches, which would be really helpful for
deciding how to solve the merge conflict. Unfortunately "git diff
--submodule" doesn't work for merge conflicts yet, another issue on
my ToDo-list ...)
Post by Nicolas Morey-Chaisemartin
Git submodule status is probably not the right place to do that.
I agree.
Post by Nicolas Morey-Chaisemartin
git ls-files --stage allows that but it's not part of the standard porcelain commands...
And it shows you three sha1s, one of them being the current state of
the submodule. Me thinks the output of "git diff" is easier to digest.
Post by Nicolas Morey-Chaisemartin
I'll check/test the patch you proposed as soon as possible and repost it if it's allright !
Cool!
Jens Lehmann
2011-03-21 20:29:14 UTC
Permalink
Post by Junio C Hamano
The idea is to notice the higher-stage entries, and emit only one record
from module_list, but while doing so, mark the entry with "U" (not [0-3])
in $stage field and null out the SHA-1 part, as the object name for the
lowest stage does not give any useful information to the caller, and this
way any caller that uses the object name would hopefully barf.
Excellent idea.
Post by Junio C Hamano
- The command called by foreach may want to do whatever it wants to do by
noticing the merged status in the superproject itself, so I made no
change to the function;
- Init is an unlikely thing to do, but as long as a submodule is there in
$path, there is no point skipping it. It might however want to take the
merged status of .gitmodules into account, but that is outside of the
scope of this topic;
Both make sense.
Post by Junio C Hamano
- Update should refrain from touching the submodule itself. It may want
to recurse into the submodule of the unmerged one, but people involved
in submodule work should think things through;
I don't think recursing would make any sense here. It might be unknown
to what commit the sub-submodules should be updated to if their commits
differ in the unmerged branches. So I'd vote for not recursing at all in
this case (which is what your patch does).
Post by Junio C Hamano
- Status does not have anything to report for an unmerged submodule. It
may want to recurse into the submodule of the unmerged one, but people
involved in submodule work should think things through; and
I do think status does have something to report here. First its job is to
list all submodules, so currently unmerged submodules should show up too,
and then it tells the user something about the state of the submodule. So
I would propose to print a line for the submodule starting with a special
character that tells the user the submodule has a merge conflict. We
could e.g. use a '*' here (additionally to '-' for uninitialized and '+'
for those submodules where the HEAD differs from the commit recorded in
the superproject).
Post by Junio C Hamano
- Sync should be Ok for the same reason as Init.
Yup.


<snip>
Post by Junio C Hamano
@@ -767,6 +789,11 @@ cmd_status()
while read mode sha1 stage path
do
+ if test "$stage" = U
+ then
+ echo >&2 "Skipping unmerged submodule $path"
As said above I would like to mimic the output for other submodule
states here. Maybe something like this (where $sha1 will expand to
the null sha as we don't have anything better here):

+ say "*$sha1 $prefix$path"
Post by Junio C Hamano
+ continue
+ fi
name=$(module_name "$path") || exit
url=$(git config submodule."$name".url)
displaypath="$prefix$path"
Junio C Hamano
2011-03-21 20:59:15 UTC
Permalink
Post by Jens Lehmann
I do think status does have something to report here. First its job is to
list all submodules, so currently unmerged submodules should show up too,
and then it tells the user something about the state of the submodule. So
I would propose to print a line for the submodule starting with a special
character that tells the user the submodule has a merge conflict. We
could e.g. use a '*' here (additionally to '-' for uninitialized and '+'
for those submodules where the HEAD differs from the commit recorded in
the superproject).
Thanks.

As everybody knows, you are a lot more deeply involved in this than I do,
so please take the discussion and implementation from here. For example,
I don't know if '*' or 'U' is more intuitive, and I'd rather not have to
make such decisions ;-)
Nicolas Morey-Chaisemartin
2011-03-21 21:34:31 UTC
Permalink
Post by Jens Lehmann
Post by Junio C Hamano
- Update should refrain from touching the submodule itself. It may want
to recurse into the submodule of the unmerged one, but people involved
in submodule work should think things through;
I don't think recursing would make any sense here. It might be unknown
to what commit the sub-submodules should be updated to if their commits
differ in the unmerged branches. So I'd vote for not recursing at all in
this case (which is what your patch does).
After a bit of thinking about the way we use git at my company, there is something that could be done here. It may be completely useless for most people (or maybe even stupid and feel free to enlighten me!)
We work with continuous integration on two level of git (1 integration which only has submodules and lots of source repositories).
The usual thing when a user want to push its diff is:
- commit in the submodules on his branch
- Update the submodules refs in the integration repo on his branch
- Pull master
- See there are conflicts
- Blindly pull master in all the conflicted submodules. Push the merge
- Commit in the integration repo and pray it works !

Although in the scheme git submodule update by itself does not make much sense. What people actually do is pretty much a git submodule update --merge of the conflicting SHA1 for each submodule.

For the moment we used either ruby scripts or a list of commands that users blindly follows without understanding much of it, and seeing something like that (there's probably a nicest way to do it) in git would be great.
Post by Jens Lehmann
Post by Junio C Hamano
- Status does not have anything to report for an unmerged submodule. It
may want to recurse into the submodule of the unmerged one, but people
involved in submodule work should think things through; and
I do think status does have something to report here. First its job is to
list all submodules, so currently unmerged submodules should show up too,
and then it tells the user something about the state of the submodule. So
I would propose to print a line for the submodule starting with a special
character that tells the user the submodule has a merge conflict. We
could e.g. use a '*' here (additionally to '-' for uninitialized and '+'
for those submodules where the HEAD differs from the commit recorded in
the superproject).
Being a big user of __git_ps1, my brain now considers '*' has uncached diff and '+' has cached diff. I'd rather have something as 'U' that stands outs. But I like the idea any way !

Nicolas Morey-Chaisemartin
Jens Lehmann
2011-03-21 22:01:40 UTC
Permalink
Post by Nicolas Morey-Chaisemartin
Post by Jens Lehmann
Post by Junio C Hamano
- Update should refrain from touching the submodule itself. It may want
to recurse into the submodule of the unmerged one, but people involved
in submodule work should think things through;
I don't think recursing would make any sense here. It might be unknown
to what commit the sub-submodules should be updated to if their commits
differ in the unmerged branches. So I'd vote for not recursing at all in
this case (which is what your patch does).
After a bit of thinking about the way we use git at my company, there is something that could be done here. It may be completely useless for most people (or maybe even stupid and feel free to enlighten me!)
We work with continuous integration on two level of git (1 integration which only has submodules and lots of source repositories).
- commit in the submodules on his branch
- Update the submodules refs in the integration repo on his branch
- Pull master
- See there are conflicts
- Blindly pull master in all the conflicted submodules. Push the merge
- Commit in the integration repo and pray it works !
Although in the scheme git submodule update by itself does not make much sense. What people actually do is pretty much a git submodule update --merge of the conflicting SHA1 for each submodule.
For the moment we used either ruby scripts or a list of commands that users blindly follows without understanding much of it, and seeing something like that (there's probably a nicest way to do it) in git would be great.
Hmm, I'm not sure if I fully understand your use case. Maybe being able
to tell pull to run a "git merge <sha1-from-upstream>" in submodules
where the superproject's merge produced conflicts would help you?
Post by Nicolas Morey-Chaisemartin
Post by Jens Lehmann
Post by Junio C Hamano
- Status does not have anything to report for an unmerged submodule. It
may want to recurse into the submodule of the unmerged one, but people
involved in submodule work should think things through; and
I do think status does have something to report here. First its job is to
list all submodules, so currently unmerged submodules should show up too,
and then it tells the user something about the state of the submodule. So
I would propose to print a line for the submodule starting with a special
character that tells the user the submodule has a merge conflict. We
could e.g. use a '*' here (additionally to '-' for uninitialized and '+'
for those submodules where the HEAD differs from the commit recorded in
the superproject).
Being a big user of __git_ps1, my brain now considers '*' has uncached diff and '+' has cached diff. I'd rather have something as 'U' that stands outs.
That is a good argument against '*'. I don't have strong feelings about
that, I just came up with '*' because "git submodule status" already uses
'-' and '+' in it's output. But anyways, 'U' is fine for me too.
Nicolas Morey-Chaisemartin
2011-03-22 06:28:04 UTC
Permalink
Post by Jens Lehmann
Hmm, I'm not sure if I fully understand your use case. Maybe being able
to tell pull to run a "git merge <sha1-from-upstream>" in submodules
where the superproject's merge produced conflicts would help you?
In short yes. And if it could be recursive (if a conflicting submodules has conflicting submodules) that would be great !
And yes git-merge is probably a better place to do something like that than git submodule update.
Post by Jens Lehmann
That is a good argument against '*'. I don't have strong feelings about
that, I just came up with '*' because "git submodule status" already uses
'-' and '+' in it's output. But anyways, 'U' is fine for me too.
OK. I'll integrate that with the patch Junio proposed then.

Nicolas
funeeldy
2011-07-14 18:33:39 UTC
Permalink
How can new users of git submodules learn about merge conflicts? When and
why they occur, and how to resolve them in a way that doesn't lose commits?
If I have to choose my version or their version, that isn't really a merge,
or is it?
Thanks.

--
View this message in context: http://git.661346.n2.nabble.com/PATCH-git-submodule-Remove-duplicate-entries-during-merge-with-conflict-tp6180061p6584432.html
Sent from the git mailing list archive at Nabble.com.
Jens Lehmann
2011-07-15 19:27:48 UTC
Permalink
Post by funeeldy
How can new users of git submodules learn about merge conflicts? When and
why they occur, and how to resolve them in a way that doesn't lose commits?
That is pretty similar to merge conflicts in regular files: They happen
when you merge two branches where both sides changed the recorded submodule
commit to another one (and those are not the same). The resolution can be
done by finding a submodule commit that has both changes. In most cases that
will be based on a merge (maybe even a fast forward) with both commits in
its history.

If git finds a fast forward from the merge base to commit1 and commit2
(let's say they both are on master and newer than the merge base), the
newer one is picked automatically. If git finds a single merge from which
both commits are reachable, it proposes that as conflict resolution and
tells you how you can stage that. Only if it doesn't find any or more
than two merges, you are on your own and have to resolve the conflict
manually by finding an appropriate commit.
Post by funeeldy
If I have to choose my version or their version, that isn't really a merge,
or is it?
It would be if the merge strategy is "ours" or "theirs". But in most normal
cases you would want to have both commits reachable from the merge result.
Marlene Cote
2011-07-15 20:32:59 UTC
Permalink
How would you find an appropriate commit?

--------------------------
Regards,
Marlene Cote
Affirmed Networks
978-268-0821


->-----Original Message-----
->From: ***@web.de [mailto:***@web.de]
->Sent: Friday, July 15, 2011 3:28 PM
->To: Marlene Cote
->Cc: ***@vger.kernel.org
->Subject: Re: [PATCH] git-submodule: Remove duplicate entries during merge with conflict
->
->Am 14.07.2011 20:33, schrieb funeeldy:
->> How can new users of git submodules learn about merge conflicts? When and
->> why they occur, and how to resolve them in a way that doesn't lose commits?
->
->That is pretty similar to merge conflicts in regular files: They happen
->when you merge two branches where both sides changed the recorded submodule
->commit to another one (and those are not the same). The resolution can be
->done by finding a submodule commit that has both changes. In most cases that
->will be based on a merge (maybe even a fast forward) with both commits in
->its history.
->
->If git finds a fast forward from the merge base to commit1 and commit2
->(let's say they both are on master and newer than the merge base), the
->newer one is picked automatically. If git finds a single merge from which
->both commits are reachable, it proposes that as conflict resolution and
->tells you how you can stage that. Only if it doesn't find any or more
->than two merges, you are on your own and have to resolve the conflict
->manually by finding an appropriate commit.
->
->> If I have to choose my version or their version, that isn't really a merge,
->> or is it?
->
->It would be if the merge strategy is "ours" or "theirs". But in most normal
->cases you would want to have both commits reachable from the merge result.
Jens Lehmann
2011-07-15 21:41:41 UTC
Permalink
Post by Marlene Cote
How would you find an appropriate commit?
The same way you resolve conflicts for a regular file: If you are unlucky
and the merge strategy doesn't resolve the conflict for you automatically
(or at least gives you a hint what /could/ be the resolution), you have
to use human judgment to find an appropriate resolution. In most cases
that will be a commit where both conflicting commits show up in the
history (and maybe you'll even have to create one yourself by doing a
proper merge in the submodule).

Where I work we have a simple best practice that guarantees us git will
always find a proper resolution itself: We only record commits that are
on the submodules master branch in the superproject. To put it in other
words: merge the submodule first before you commit in the superproject.
Loading...