Discussion:
git svn's performance issue and strange pauses, and other thing
Hin-Tak Leung
2014-09-18 07:39:53 UTC
(I am not on the list - please CC)

Thanks for git-svn - I have used it instead of subversion itself for many years now.

Just thought I'd ask/report a few issues I have noticed for some time
now, while tracking development of a particular subversion-based
project. Broadly speaking, I think there are 3 problems,
especially noticeable against a particular repository, but
to a lesser extent with some others too.

- just doing "git svn fetch --all" seems to consume a lot of memory
for very few actual fetched changes (in the 2GB+ region, sometimes).

- "git svn fetch --all" also seems to take a long time for certain
fetched changes (in the minutes region).

- I know I can probably just "read the source", but I'd like to know
why .git/svn/.caches is even larger than .git/objects (which supposedly
contains everything that's of interest)? I hope the important parts
of .git/svn (and what not to do with them...) can be documented,
for example towards the end of the man-page, without needing to
'read the source'. Here is part of "du" from a couple of days ago:

254816 .git/objects
307056 .git/svn/.caches
332452 .git/svn
588064 .git
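
For anyone who wants to repeat the comparison above on their own clone, here is a minimal Python sketch. The function names are made up for illustration, and it sums apparent file sizes rather than disk blocks, so the figures will differ somewhat from what "du" prints.

```python
import os

def dir_size_kib(path):
    """Sum the sizes of all regular files under path, in KiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total // 1024

def report(git_dir=".git"):
    """Print sizes for the same four directories as the du listing."""
    for sub in ("objects", os.path.join("svn", ".caches"), "svn", ""):
        path = os.path.join(git_dir, sub) if sub else git_dir
        if os.path.isdir(path):
            print(f"{dir_size_kib(path):>10} {path}")

if __name__ == "__main__":
    report()
```

Run from the top of a git-svn clone, it prints one line per directory that exists and silently skips the rest.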

The actual .git/config is here - this should be sufficient info for
anybody who wants to reproduce the issues I mentioned above.

--------
$ more .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[svn-remote "svn"]
url = https://svn.r-project.org/R
fetch = trunk:refs/remotes/trunk
branches = branches/*:refs/remotes/*
tags = tags/*:refs/remotes/tags/*
[pack]
threads = 1
------------
Eric Wong
2014-09-19 08:25:29 UTC
Post by Hin-Tak Leung
(I am not on the list - please CC)
Done, it is standard practice for git :)
Post by Hin-Tak Leung
Thanks for git-svn - I use it instead of subversion itself for many years now.
Just thought I'd ask/report a few issues I noticed for some time
now, of tracking development of a particular subversion-based
development project. Broadly speaking, I think there are 3 problems,
especially noticeable against a particular repository, but
to a lesser extent with some others too.
- just doing "git svn fetch --all" seems to consume a lot of memory,
for very little actual fetched changes. (in the 2GB+ region, sometimes).
- "git svn fetch --all" also seems to take a long time too, for certain
fetched changes. (in the minutes region).
Jakob sent some patches a few months ago which seem to address the
issue. Unfortunately we forgot about them :x

Can you take a look at the following two "mergeinfo-speedups"
in my repo? (git://bogomips.org/git-svn)

Jakob Stoklund Olesen (2):
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo

Also downloadable here:

http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a

Can you please give them a try?
Post by Hin-Tak Leung
- I know I can probably just "read the source", but I'd like to know
why .git/svn/.caches is even larger than .git/objects (which supposedly
contains everything that's of interest)? I hope this can be documented
towards the end of the man-page, for example, of important parts
of .git/svn (and what not to do with them...), without needing to
254816 .git/objects
307056 .git/svn/.caches
332452 .git/svn
588064 .git
The actual .git/config is here - this should be sufficient info for
somebody looking into experiencing the issues I mentioned above.
IIRC, the caching is unique to mergeinfo, so perhaps Jakob's patches
help, there, too.

Sorry I don't understand the mergeinfo stuff more, I've never worked on
a project which uses it.
Jakob Stoklund Olesen
2014-09-19 13:44:00 UTC
Post by Eric Wong
Post by Hin-Tak Leung
- I know I can probably just "read the source", but I'd like to know
why .git/svn/.caches is even larger than .git/objects (which supposedly
contains everything that's of interest)? I hope this can be documented
towards the end of the man-page, for example, of important parts
of .git/svn (and what not to do with them...), without needing to
254816 .git/objects
307056 .git/svn/.caches
332452 .git/svn
588064 .git
The actual .git/config is here - this should be sufficient info for
somebody looking into experiencing the issues I mentioned above.
IIRC, the caching is unique to mergeinfo, so perhaps Jakob's patches
help, there, too.
IIRC the caches are used for memoization, and with my two patches applied it doesn't improve performance much.

You could try removing the memoization after applying my patches.

Thanks,
/Jakob
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Wong
2014-10-05 01:02:43 UTC
Post by Eric Wong
Jakob sent some patches a few months ago which seem to address the
issue. Unfortunately we forgot about them :x
Hin-Tak: have you tried Jakob's patches? I've taken another look,
signed-off and pushed to my master.
Hin-Tak Leung
2014-10-06 23:51:32 UTC
Post by Eric Wong
Jakob sent some patches a few months ago which seem to address the
issue. Unfortunately we forgot about them :x
Hin-Tak: have you tried Jakob's patches? I've taken another look,
signed-off and pushed to my master.
Post by Eric Wong
Can you take a look at the following two "mergeinfo-speedups"
in my repo? (git://bogomips.org/git-svn)
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo
http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
Can you please give them a try?
Apologies - I applied them on top of 2.1.0 earlier today, and the svn repo just
hasn't changed much recently to show any interesting behavior
with 'git svn fetch --all', so I thought about whether I should wait to report. Then
I changed my mind, and decided what the hell, let's clone the whole
thing again :-). So I made a new directory, ran 'git init', just copied
.git/config from the old repo and am doing 'git svn fetch --all' in the new empty
directory again.

So far it seems to be good. But I am only at revision 35700-ish at the moment,
and the whole thing is 66700-ish. Oh, I forgot to mention that the strange
pauses seem to be followed by messages like these:

W:svn cherry-pick ignored (/branches/R-2-12-branch:52939,54476,55265) - missing 492 commit(s) (eg 9bf20dca6a8b05dff28e6486b1613f10825972c9)
W:svn cherry-pick ignored (/branches/R-2-13-branch:55265,55432) - missing 231 commit(s) (eg 9290cf6ce2d7f6cca168cf326eed6e9fe760895f)
W:svn cherry-pick ignored (/branches/R-2-15-branch:58894,59717) - missing 405 commit(s) (eg ed84a373b33f728949edf3371829fc3414c343a8)
W:svn cherry-pick ignored (/branches/R-3-0-branch:62497) - missing 154 commit(s) (eg 9e4742d201771c9658417c2d2f83838e550e3162)
W:svn cherry-pick ignored (/trunk:

So presumably I'd only see interesting behavior when there are a number of branches.
It seems the first branches are around revision 48000-ish, so I might have
to wait a bit.

So far, the new clone hasn't created ".git/svn/.caches/" yet; and memory consumption seems
okay also.
Hin-Tak Leung
2014-10-07 18:20:46 UTC
Post by Hin-Tak Leung
<snipped>
Post by Eric Wong
Hin-Tak: have you tried Jakob's patches? I've taken another look,
signed-off and pushed to my master.
Post by Hin-Tak Leung
... Then
I changed my mind, and decided what the hell, let's clone the whole
thing again :-). So I made a new directory, ran 'git init', just copied
.git/config from the old repo and am doing 'git svn fetch --all' in the new empty
directory again.
So far it seems to be good. But I am only at revision 35700-ish at the moment,
and the whole thing is 66700-ish. Oh, I forgot to mention that the strange
pauses seem to be followed by messages like these:
W:svn cherry-pick ignored (/branches/R-2-12-branch:52939,54476,55265) - missing 492 commit(s) (eg 9bf20dca6a8b05dff28e6486b1613f10825972c9)
W:svn cherry-pick ignored (/branches/R-2-13-branch:55265,55432) - missing 231 commit(s) (eg 9290cf6ce2d7f6cca168cf326eed6e9fe760895f)
W:svn cherry-pick ignored (/branches/R-2-15-branch:58894,59717) - missing 405 commit(s) (eg ed84a373b33f728949edf3371829fc3414c343a8)
W:svn cherry-pick ignored (/branches/R-3-0-branch:62497) - missing 154 commit(s) (eg 9e4742d201771c9658417c2d2f83838e550e3162)
So presumably I'd only see interesting behavior when there are a number of branches.
It seems the first branches are around revision 48000-ish, so I might have
to wait a bit.
So far, the new clone hasn't created ".git/svn/.caches/" yet; and memory consumption seems
okay also.
The changes definitely improve things, as far as my impression goes. There was
only one notable pause, around r50651, probably because of the rather large
"Checking svn:mergeinfo changes since r15413" step. That took about 12 minutes.
Other instances of "W:svn cherry-pick ignored", though they do take a while,
are in the seconds region - before the code changes they could
be minutes, if memory serves.

<--
M src/library/tools/R/toHTML.R
r50650 = bed91d435c535f2643cf0d48623fecf86d264bd9 (refs/remotes/trunk)
M src/modules/X11/rotated.c
M src/modules/X11/dataentry.c
Checking svn:mergeinfo changes since r15413: 1 sources, 1 changed
W:svn cherry-pick ignored (/trunk:28840) - missing 9372 commit(s) (eg cea6142c76300539a0d0c9c743738e31a9f7d523)
r50651 = ad139a5bf91f9ad6690ff5fb4a3f71cea591a944 (refs/remotes/R-uthreads)
-->

The new clone has:

<--
$ ls -ltr .git/svn/.caches/
total 144788
-rw-rw-r--. 1 Hin-Tak Hin-Tak 1166138 Oct 7 13:44 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct 7 13:48 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 1133855 Oct 7 13:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct 7 13:53 _rev_list.yaml
-->

The old clone has:

<---
$ ls -ltr .git/svn/.caches/
total 318824
-rw-rw-r--. 1 Hin-Tak Hin-Tak 5711724 Jul 24 2012 lookup_svn_merge.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak 30523628 Jul 24 2012 check_cherry_pick.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak 296592 Jul 24 2012 has_no_changes.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak 40241189 Oct 5 16:42 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct 5 16:49 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 242547 Oct 5 16:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 24120007 Oct 5 16:50 _rev_list.yaml
-->

I had to suspend somewhere around r59000 - but it is interesting to see
that the max memory consumption of the later part is about double
(roughly 755MB vs 349MB), and it also runs at close to 100% CPU rather than
60% overall. I don't know what to make of that - probably just smaller changes versus
larger ones, or different time of day and network loads (yes,
I guess it is just bandwidth-limited, since the bulk of CPU time is in system
rather than user).

I am somewhat worried about the dramatic difference between the two .git/svn/.caches
directories - check_cherry_pick.yaml is 225MB in one and 73MB in the other, while
_rev_list.yaml is the opposite - 24MB vs 73MB. How do I reconcile that?

<--
M src/main/dotcode.c
M doc/NEWS.Rd
r59140 = b6014a226aebf9e016c89c0bd1aca1979796a057 (refs/remotes/trunk)
M src/main/dotcode.c
M doc/NEWS.Rd
Checking svn:mergeinfo changes since r59138: 4 sources, 1 changed
W:svn cherry-pick ignored (/trunk:59137,59140) - missing 369 commit(s) (eg 8a2a36083ba39be27fc9940acc3f51eab6a7a0c3)
r59141 = 38c6d05f164d34e4b5cc545bda387be9d910f748 (refs/remotes/R-2-15-branch)
Connection timed out: Connection timed out at /usr/share/perl5/vendor_perl/Git/SVN/Ra.pm line 290.

Command exited with non-zero status 1
Command being timed: "git svn fetch --all"
User time (seconds): 5642.19
System time (seconds): 23552.44
Percent of CPU this job got: 57%
Elapsed (wall clock) time (h:mm:ss or m:ss): 14:06:58
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 349324
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 39
Minor (reclaiming a frame) page faults: 744713614
Voluntary context switches: 4761489
Involuntary context switches: 8595950
Swaps: 0
File system inputs: 7712
File system outputs: 121404296
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
-->
<--
M src/include/Defn.h
r66719 = 1e3288d3ae4cfb15f6e4e4116f18d38b3efc5bb5 (refs/remotes/trunk)
M doc/NEWS.Rd
r66720 = 1c184e5fc2b71a27767215a45a1270f3edbc616f (refs/remotes/trunk)
Checked out HEAD:
https://svn.r-project.org/R/trunk r66720
creating empty directory: tests/Pkgs/exNSS4/man
Command being timed: "git svn fetch --all"
User time (seconds): 2126.00
System time (seconds): 7852.44
Percent of CPU this job got: 96%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:52:38
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 755256
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 6
Minor (reclaiming a frame) page faults: 142730534
Voluntary context switches: 898725
Involuntary context switches: 1842056
Swaps: 0
File system inputs: 1800
File system outputs: 28606392
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
-->
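
To compare the two "/usr/bin/time -v" reports above without eyeballing them, a small Python sketch along these lines may help. The helper names are invented for illustration; the sample text is cut down to the three fields used, with the values copied from the two runs above.

```python
def parse_time_v(text):
    """Turn `/usr/bin/time -v` output into a {label: value} dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the last ": " so labels containing colons stay intact.
        key, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[key] = value
    return fields

def summarize(text):
    """Peak memory in MiB and the share of CPU time spent in the kernel."""
    f = parse_time_v(text)
    user = float(f["User time (seconds)"])
    system = float(f["System time (seconds)"])
    return {
        "max_rss_mib": int(f["Maximum resident set size (kbytes)"]) / 1024,
        "system_share": system / (user + system),
    }

first_run = """\
User time (seconds): 5642.19
System time (seconds): 23552.44
Maximum resident set size (kbytes): 349324
"""
second_run = """\
User time (seconds): 2126.00
System time (seconds): 7852.44
Maximum resident set size (kbytes): 755256
"""

a, b = summarize(first_run), summarize(second_run)
print(f"max RSS: {a['max_rss_mib']:.0f} MiB vs {b['max_rss_mib']:.0f} MiB")
print(f"system share of CPU time: {a['system_share']:.0%} vs {b['system_share']:.0%}")
```

Feeding it the complete reports works the same way, since unknown fields are simply kept as strings in the dict.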
Eric Wong
2014-10-19 04:12:38 UTC
Post by Hin-Tak Leung
<--
$ ls -ltr .git/svn/.caches/
total 144788
-rw-rw-r--. 1 Hin-Tak Hin-Tak 1166138 Oct 7 13:44 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct 7 13:48 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 1133855 Oct 7 13:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct 7 13:53 _rev_list.yaml
-->
<snip>
Post by Hin-Tak Leung
-rw-rw-r--. 1 Hin-Tak Hin-Tak 40241189 Oct 5 16:42 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct 5 16:49 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 242547 Oct 5 16:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 24120007 Oct 5 16:50 _rev_list.yaml
-->
I had to suspend somewhat around r59000 - but it is interesting to see
that the max memory consumption of the later part is almost double?
and it also runs at 100% rather than 60% overall; I don't know what
to make of that - probably just smaller changes versus
larger ones, or different time of day and network loads (yes,
I guess it is just bandwidth-limited?, since the bulk of CPU time is in system
rather than user).
git-svn memory usage is insane, and we need to reduce it.
(on Linux, fork() performance is reduced as memory size of the parent
grows, and I don't think we can easily call vfork() from Perl)
Post by Hin-Tak Leung
I am somewhat worried about the dramatic difference between the two .git/svn/.caches -
check_cherry_pick.yaml is 225MB in one and 73MB in the other, and also
_rev_list.yaml is opposite - 24MB vs 73MB. How do I reconcile that?
Calling patterns changed, and it looks like Jakob's changes avoided some
calls. The main thing to care about:
Does the repository history look right?

The check_cherry_pick cache can be made smaller, too:
----------------------- 8< -----------------------------
From: Eric Wong <***@yhbt.net>
Subject: [PATCH] git-svn: reduce check_cherry_pick cache overhead

We do not need to store entire lists of commits, only the
number of incomplete and the first commit for reference.
This reduces the amount of data we need to store in memory
and on disk stores.

Signed-off-by: Eric Wong <***@yhbt.net>
---
perl/Git/SVN.pm | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 25dbcd5..b2d37cb 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1537,7 +1537,7 @@ sub _rev_list {
@rv;
}

-sub check_cherry_pick {
+sub check_cherry_pick2 {
my $base = shift;
my $tip = shift;
my $parents = shift;
@@ -1552,7 +1552,8 @@ sub check_cherry_pick {
delete $commits{$commit};
}
}
- return (keys %commits);
+ my @k = (keys %commits);
+ return (scalar @k, $k[0]);
}

sub has_no_changes {
@@ -1597,7 +1598,7 @@ sub tie_for_persistent_memoization {
mkpath([$cache_path]) unless -d $cache_path;

my %lookup_svn_merge_cache;
- my %check_cherry_pick_cache;
+ my %check_cherry_pick2_cache;
my %has_no_changes_cache;
my %_rev_list_cache;

@@ -1608,11 +1609,11 @@ sub tie_for_persistent_memoization {
LIST_CACHE => ['HASH' => \%lookup_svn_merge_cache],
;

- tie_for_persistent_memoization(\%check_cherry_pick_cache,
- "$cache_path/check_cherry_pick");
- memoize 'check_cherry_pick',
+ tie_for_persistent_memoization(\%check_cherry_pick2_cache,
+ "$cache_path/check_cherry_pick2");
+ memoize 'check_cherry_pick2',
SCALAR_CACHE => 'FAULT',
- LIST_CACHE => ['HASH' => \%check_cherry_pick_cache],
+ LIST_CACHE => ['HASH' => \%check_cherry_pick2_cache],
;

tie_for_persistent_memoization(\%has_no_changes_cache,
@@ -1636,7 +1637,7 @@ sub tie_for_persistent_memoization {
$memoized = 0;

Memoize::unmemoize 'lookup_svn_merge';
- Memoize::unmemoize 'check_cherry_pick';
+ Memoize::unmemoize 'check_cherry_pick2';
Memoize::unmemoize 'has_no_changes';
Memoize::unmemoize '_rev_list';
}
@@ -1648,7 +1649,8 @@ sub tie_for_persistent_memoization {
return unless -d $cache_path;

for my $cache_file (("$cache_path/lookup_svn_merge",
- "$cache_path/check_cherry_pick",
+ "$cache_path/check_cherry_pick", # old
+ "$cache_path/check_cherry_pick2",
"$cache_path/has_no_changes")) {
for my $suffix (qw(yaml db)) {
my $file = "$cache_file.$suffix";
@@ -1817,15 +1819,15 @@ sub find_extra_svn_parents {
}

# double check that there are no missing non-merge commits
- my (@incomplete) = check_cherry_pick(
+ my ($ninc, $ifirst) = check_cherry_pick2(
$merge_base, $merge_tip,
$parents,
@all_ranges,
);

- if ( @incomplete ) {
- warn "W:svn cherry-pick ignored ($spec) - missing "
- ***@incomplete." commit(s) (eg $incomplete[0])\n";
+ if ($ninc) {
+ warn "W:svn cherry-pick ignored ($spec) - missing " .
+ "$ninc commit(s) (eg $ifirst)\n";
} else {
warn
"Found merge parent ($spec): ",
--
EW
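
As a rough illustration of why the patch above shrinks the cache, here is a Python sketch (not the Perl in git-svn itself). It memoizes two versions of a made-up function, one returning the full list of missing commits and one returning only a count plus one example, and compares the serialized cache sizes. The fake commit ids, the function names and the use of JSON in place of git-svn's on-disk format are all assumptions made for the example.

```python
import hashlib
import json

def fake_commit_id(n):
    # Hypothetical stand-in for a real commit hash (40 hex characters).
    return hashlib.sha1(str(n).encode()).hexdigest()

def check_full(base, tip):
    # Old shape: return every "missing" commit between base and tip.
    return [fake_commit_id(n) for n in range(base, tip)]

def check_summary(base, tip):
    # New shape: return only the count and one example commit.
    missing = check_full(base, tip)
    return (len(missing), missing[0] if missing else None)

def serialized_cache_size(func, calls):
    # Memoize func in a plain dict, then measure the cache as JSON text.
    cache = {}
    for base, tip in calls:
        key = f"{base}..{tip}"
        if key not in cache:
            cache[key] = func(base, tip)
    return len(json.dumps(cache))

# Ranges sized after the "missing N commit(s)" warnings quoted in this thread.
calls = [(0, 492), (0, 231), (0, 405), (0, 154)]
print("full lists:", serialized_cache_size(check_full, calls), "bytes")
print("count + example:", serialized_cache_size(check_summary, calls), "bytes")
```

The warning message only ever prints the count and one example commit, so the summary form keeps everything the caller uses while the cached value stays a fixed size per entry.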
Jakob Stoklund Olesen
2014-10-19 14:41:16 UTC
Post by Eric Wong
Post by Hin-Tak Leung
I am somewhat worried about the dramatic difference between the two .git/svn/.caches -
check_cherry_pick.yaml is 225MB in one and 73MB in the other, and also
_rev_list.yaml is opposite - 24MB vs 73MB. How do I reconcile that?
Calling patterns changed, and it looks like Jakob's changes avoided some
calls.
It is possible that those functions don't need to be memoized any more. My patch is trying to avoid calling them with the same arguments over and over, and memoizing doesn't help when arguments are changing.
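
A small Python sketch of that point, using functools.lru_cache as a stand-in for Perl's Memoize (the function and its arguments are hypothetical): when the same arguments recur the cache gets hits, but when every call has different arguments it only grows.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_check(merge_base, merge_tip):
    # Placeholder for a costly history walk between two revisions.
    return merge_tip - merge_base

def run(calls):
    """Replay calls against a fresh cache; return (hits, misses, entries)."""
    expensive_check.cache_clear()
    for base, tip in calls:
        expensive_check(base, tip)
    info = expensive_check.cache_info()
    return info.hits, info.misses, info.currsize

repeated = [(15413, 50651)] * 5
changing = [(15413, 50651 + i) for i in range(5)]

print("same arguments:    ", run(repeated))   # (4, 1, 1)
print("changing arguments:", run(changing))   # (0, 5, 5)
```

In the second case every call is a miss, so the cache costs memory and disk space without saving any work.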

Thanks,
/jakob
Hin-Tak Leung
2014-10-19 14:04:51 UTC
Hin-Tak Leung
2014-10-19 14:22:29 UTC
(sorry about the last blank reply - mobile phone and finger accident...)

Post by Hin-Tak Leung
<--
$ ls -ltr .git/svn/.caches/
total 144788
-rw-rw-r--. 1 Hin-Tak Hin-Tak 1166138 Oct 7 13:44 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct 7 13:48 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 1133855 Oct 7 13:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct 7 13:53 _rev_list.yaml
-->
<snip>
Post by Hin-Tak Leung
-rw-rw-r--. 1 Hin-Tak Hin-Tak 40241189 Oct 5 16:42 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct 5 16:49 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 242547 Oct 5 16:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 24120007 Oct 5 16:50 _rev_list.yaml
-->
I had to suspend somewhat around r59000 - but it is interesting to see
that the max memory consumption of the later part is almost double?
and it also runs at 100% rather than 60% overall; I don't know what
to make of that - probably just smaller changes versus
larger ones, or different time of day and network loads (yes,
I guess it is just bandwidth-limited?, since the bulk of CPU time is in system
rather than user).
Post by Eric Wong
git-svn memory usage is insane, and we need to reduce it.
(on Linux, fork() performance is reduced as memory size of the parent
grows, and I don't think we can easily call vfork() from Perl)
Yes, I think the memory consumption is a bit crazy. I ran 'git svn fetch' on
the old clone again and it was a bit slow, so I timed the new one, and here it is.
For fetching just 45 changes, it took 36 minutes and the memory
consumption shot up to about 1GB. (There were one or two mergeinfo checks
in the middle, not shown.)

<---
cd ../R-2/
[Hin-***@localhost R-2]$ /usr/bin/time -v git svn fetch --all
M src/library/base/R/apply.R
M src/library/base/man/apply.Rd
M doc/NEWS.Rd
r66721 = e26e52bf4b2cdbe291d5899fd0a449f197aa2133 (refs/remotes/trunk)
...
M src/library/tools/R/utils.R
r66765 = c64d1828ada98395892529ce59b5760de1bdc60b (refs/remotes/R-3-1-branch)
---
Command being timed: "git svn fetch --all"
User time (seconds): 2042.81
System time (seconds): 115.98
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 36:13.74
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1019092
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1149
Minor (reclaiming a frame) page faults: 1482219
Voluntary context switches: 9470
Involuntary context switches: 226683
Swaps: 0
File system inputs: 358864
File system outputs: 510680
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
[Hin-***@localhost R-2]$ cd ../R
--->
Post by Hin-Tak Leung
I am somewhat worried about the dramatic difference between the two .git/svn/.caches -
check_cherry_pick.yaml is 225MB in one and 73MB in the other, and also
_rev_list.yaml is opposite - 24MB vs 73MB. How do I reconcile that?
Post by Eric Wong
Calling patterns changed, and it looks like Jakob's changes avoided some
calls. The main thing to care about:
Does the repository history look right?
I'll check soon and report. It looks superficially okay. I suppose
I'd need to check every branch to be sure. I know the fetch history is
different - but the reflog (or the equivalent of it in svn) expires and is pruned
after two weeks?
Post by Eric Wong
----------------------- 8< -----------------------------
Subject: [PATCH] git-svn: reduce check_cherry_pick cache overhead
We do not need to store entire lists of commits, only the
number of incomplete and the first commit for reference.
This reduces the amount of data we need to store in memory
and on disk stores.
Is there a way of retrospectively compressing/trimming the cache, or better
still, examining it before compressing?

I intend to hold on to both the new and the old clone for a while until
I can reconcile the differences... though I am running the same git svn code
on both now.