Discussion:
git-svn performance
Fabian Schmied
2014-10-17 20:47:20 UTC
Hi,

I'm currently migrating an SVN repository to Git using git-svn (Git
for Windows 1.8.3-preview20130601), and I'm experiencing severe
performance problems with "git svn fetch". Commits to the SVN "trunk"
are fetched very fast (a few seconds or so per SVN revision), but
commits to some branches ("hotfix" branches) are currently taking
about 9 minutes per revision. I fear that the time per commit on these
branches is increasing and that the migration might never finish at
all.

For the commits that take such a long time, git-svn always outputs
lots of warnings about ignored SVN cherry-picks, and it tells me it
can't find a revmap for the path being imported. (See [1].)

AFAICS, the offending commits take place on some branches that include
a lot of manually merged ("SVN cherry-picked") revisions. Git-svn
seems to be checking something (though I don't know what) that makes
importing these revisions really slow. And it repeats this for every
revision on these branches with increasing work to do.

Is there anything I can do to speed this up? (I already tried
increasing --log-window-size to 500, but it didn't have any effect.)

Thank you, best regards,
Fabian

[1]
M foo/bar/XXX.xml
M foo/bar/YYY.xml
W:svn cherry-pick ignored (/branches/frob:6940-7068) - missing 12
commit(s) (eg abeaece820ceae44ebf2c06011cf43bbcbf4b1ce)
W:svn cherry-pick ignored (/branches/feature:3316-4798,4811,4827) -
missing 10 commit(s) (eg e255fff14ab1e581f21671ca8b36c0747869cf8c)
W:svn cherry-pick ignored
(/hotfixes/ZZZ.159:2131,2133,2145-2146,2148,2169) - missing 10
commit(s) (eg e04b0326c998f0611c18144b3ed8f686d3b52f4c)
W:svn cherry-pick ignored
(/hotfixes/ZZZ.333:4536,4610-4611,4625,4665,4669,4685,4713,4745,4785,4788,4908-4917,4920,4933-4944,4955,5003,5103,5174,5222,5227,
5261,5267,5306,5310,5321,5360,5416,5467,5501,5508,5599-5614,5650-5651,5757,5761-5762,5764,5778-5779,5784,5811,5814,5819,5823,5825,5836-5838,5860,5862,5873,5889,
5910,5924,5948) - missing 137 commit(s) (eg
9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
C:\Program Files (x86)\Git\bin\perl.exe: *** WFSO timed out
W:svn cherry-pick ignored (/hotfixes/ZZZ.333.39:5696,5847) - missing
84 commit(s) (eg 9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
W:svn cherry-pick ignored (/hotfixes/AAA:5905,6095) - missing 119
commit(s) (eg 9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
W:svn cherry-pick ignored (/hotfixes/BBB_1.1:6971) - missing 198
commit(s) (eg 9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
W:svn cherry-pick ignored
(/hotfixes/CCC:6134,6164,6168,6174,6206,6211,6237,6239,6244-6245,6250,6257,6269,6271,6276,6289-6292,6294,6296,6301-6302,6313,6315-6316,6329,6333,6379,6383,6394,6405,6411,6456,6478,6483,6491,6519,6537,6557)
- missing 194 commit(s) (eg 9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
W:svn cherry-pick ignored (/hotfixes/DDD:7635) - missing 1 commit(s)
(eg 6a3ba817635eb3a9411a307924dec393311d93be)
W:svn cherry-pick ignored
(/hotfixes/EEE_1.2:7786,7794,7797,7803,7829-7830,7843,7886,7889,7933,7937,7949,7953)
- missing 80 commit(s) (eg e78b1bc68f7a9b041588a39f3fa5e1a61f98942b)
W:svn cherry-pick ignored
(/hotfixes/EEE_1.3:8159,8170,8173-8174,8177,8181-8182,8185,8187,8194-8195,8201,8203,8206,8251,8255,8257,8259-8262,8265,8280,8286,8294,8296,8304-8305,8312,8318,8323,8327,8363,8387-8388,8390,8422-8423,8432,8446,8536-8537,8548-8549,8556,8559,8566,8569,8572,8578,8597-8598,8602,8617,8619,8655,8687,8720)
- missing 104 commit(s) (eg 33febd4591f42a9d871ba330432840917b157f9e)
W:svn cherry-pick ignored
(/hotfixes/EEE_1.4:8766,8768,8770,8777-8779,8795-8796,8802-8809,8812-8814,8816-8817,8820,8823,8825,8827,8831,8836,8841,8845,8848-8852,8854-8855,8866,8868-8869,8871-8873,8875-8878,8880,8888,8892,8911-8912,8917-8918,8946,8956-8957,8964,8984,8994,9003,9008,9011,9029,9038,9040,9046-9048,9055,9086,9101,9108,9111,9113,9124,9129,9133,9138-9139,9150,9152,9154,9156,9172,9174,9188-9189,9208,9211,9217)
- missing 44 commit(s) (eg 0621fb44de682650d762c707b102bc2472c088f8)
W:svn cherry-pick ignored
(/hotfixes/EEE_1.5:9412,9421,9430,9433-9436,9439,9441,9449,9459,9468,9529,9548,9561,9568,9605-9606,9612,9614,9617,9628,9630-9631,9637,9687,9807)
- missing 41 commit(s) (eg 1bd1a9b72336bf4d3839a00348b7f2a52368c16c)
W:svn cherry-pick ignored
(/trunk:9852-9853,9857,9859,9862,9868,9872,9876,9879,9890,9895,9926-9927,9933,9953,9956,9960-9962)
- missing 60 commit(s) (eg 3322e7ffc6ab49181976d9e94c91a4556951f38a)
Couldn't find revmap for https://the-svn-server/svn/something/trunk/foo
r9963 = 597df48cb830825f9029d1cfdf45df024d7fd3dd (refs/remotes/EEE_1.6)
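
For readers unfamiliar with the warnings above: each parenthesised entry appears to be one svn:mergeinfo line of the form "path:rangelist". The following is a minimal sketch, not git-svn's own code, that counts how many revisions one such entry names. The helper name count_merged_revs is hypothetical, and the sample entries are copied from the log above.

```perl
use strict;
use warnings;

# Hypothetical helper: count the revisions named by one
# svn:mergeinfo line of the form "path:rangelist".
sub count_merged_revs {
    my ($line) = @_;
    # Split at the last colon so the path part is kept whole.
    my ($path, $ranges) = $line =~ /^(.*):([^:]*)$/
        or die "not a mergeinfo line: $line";
    my $count = 0;
    for my $range (split /,/, $ranges) {
        $range =~ s/\*$//;    # a trailing '*' marks a non-inheritable range
        my ($first, $last) = split /-/, $range;
        $last = $first unless defined $last;    # single revision, not a range
        $count += $last - $first + 1;
    }
    return ($path, $count);
}

for my $line ('/branches/frob:6940-7068',
              '/hotfixes/ZZZ.159:2131,2133,2145-2146,2148,2169') {
    my ($path, $count) = count_merged_revs($line);
    print "$path: $count revision(s)\n";
}
```

For the two sample entries this reports 129 and 6 revisions. A branch whose mergeinfo lists many such entries gives git-svn correspondingly more revisions to look up on every commit.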
Eric Wong
2014-10-19 00:32:56 UTC
Post by Fabian Schmied
Hi,
I'm currently migrating an SVN repository to Git using git-svn (Git
for Windows 1.8.3-preview20130601), and I'm experiencing severe
performance problems with "git svn fetch". Commits to the SVN "trunk"
are fetched very fast (a few seconds or so per SVN revision), but
commits to some branches ("hotfix" branches) are currently taking
about 9 minutes per revision. I fear that the time per these commits
is increasing and that indeed the migration might not be finishable at
all.
For the commits that take such a long time, git-svn always outputs
lots of warnings about ignored SVN cherry-picks, and it tells me it
can't find a revmap for the path being imported. (See [1].)
AFAICS, the offending commits take place on some branches that include
a lot of manually merged ("SVN cherry-picked") revisions. Git-svn
seems to be checking something (though I don't know what) that makes
importing these revisions really slow. And it repeats this for every
revision on these branches with increasing work to do.
Is there anything I can do to speed this up? (I already tried
increasing the --log-window-size to 500, didn't have any effect.)
Can you take a look at the following two "mergeinfo-speedups"
in my repo? (git://bogomips.org/git-svn)

Jakob Stoklund Olesen (2):
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo

Also downloadable here:

http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a

Hin-Tak (Cc-ed) reported good improvements with them, but also
a large memory increase:

http://mid.gmane.org/***@web172303.mail.ir2.yahoo.com

Jakob (or anybody else): I suppose we could tie the new
cached_mergeinfo* caches to disk-backed storage to avoid the memory
bloat.
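
To illustrate the idea behind "only look at the new parts of svn:mergeinfo": the sketch below compares the property value at the previous revision with the current one and keeps only the entries that were added or changed. It is an assumption-laden illustration, not the code from Jakob's patch. The helper names parse_mergeinfo and changed_mergeinfo and the sample property values are made up, and the real mergeinfo_changes may compute a finer-grained difference than "whole rangelist changed".

```perl
use strict;
use warnings;

# Parse an svn:mergeinfo property value into { path => rangelist },
# using the same split idiom as the code quoted in this thread.
sub parse_mergeinfo {
    my ($prop) = @_;
    my %minfo = map { split ":", $_ } split "\n", $prop;
    return \%minfo;
}

# Hypothetical helper: keep only entries that are new or whose
# rangelist differs from the old property value.
sub changed_mergeinfo {
    my ($old, $new) = @_;
    my %changes;
    for my $path (keys %$new) {
        my $before = $old->{$path};
        next if defined $before && $before eq $new->{$path};
        $changes{$path} = $new->{$path};
    }
    return \%changes;
}

my $old = parse_mergeinfo("/branches/frob:6940-7068\n/hotfixes/DDD:7635");
my $new = parse_mergeinfo("/branches/frob:6940-7068\n/hotfixes/DDD:7635,7640");
my $changes = changed_mergeinfo($old, $new);
print "$_ => $changes->{$_}\n" for sort keys %$changes;
```

Only /hotfixes/DDD is reported here, so the unchanged /branches/frob entry would not need to be re-examined for this revision.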
Eric Wong
2014-10-19 02:29:53 UTC
Post by Eric Wong
Hin-Tak (Cc-ed) reported good improvements with them, but also
a large memory increase:
This might reduce the pathname and internal hash overheads:
------------------------8<-----------------------
From: Eric Wong <***@yhbt.net>
Date: Sun, 19 Oct 2014 02:26:53 +0000
Subject: [PATCH] git-svn: simplify cached_mergeinfo layout

This reduces hash lookups for looking up cache data and will
simplify tying data to disk in the next commit.

Signed-off-by: Eric Wong <***@yhbt.net>
---
perl/Git/SVN.pm | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index b1a84d0..25dbcd5 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1708,15 +1708,17 @@ sub mergeinfo_changes {
my %minfo = map {split ":", $_ } split "\n", $mergeinfo_prop;
my $old_minfo = {};

+ # layout: $path => [ $rev, \%mergeinfo ]
+ my $cached_mergeinfo = $self->{cached_mergeinfo};
+
# Initialize cache on the first call.
- unless (defined $self->{cached_mergeinfo_rev}) {
- $self->{cached_mergeinfo_rev} = {};
- $self->{cached_mergeinfo} = {};
+ unless (defined $cached_mergeinfo) {
+ $cached_mergeinfo = $self->{cached_mergeinfo} = {};
}

- my $cached_rev = $self->{cached_mergeinfo_rev}{$old_path};
- if (defined $cached_rev && $cached_rev == $old_rev) {
- $old_minfo = $self->{cached_mergeinfo}{$old_path};
+ my $cached = $cached_mergeinfo->{$old_path};
+ if (defined $cached && $cached->[0] == $old_rev) {
+ $old_minfo = $cached->[1];
} else {
my $ra = $self->ra;
# Give up if $old_path isn't in the repo.
@@ -1733,13 +1735,11 @@ sub mergeinfo_changes {
$props->{"svn:mergeinfo"};
$old_minfo = \%omi;
}
- $self->{cached_mergeinfo}{$old_path} = $old_minfo;
- $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
+ $cached_mergeinfo->{$old_path} = [ $old_rev, $old_minfo ];
}

# Cache the new mergeinfo.
- $self->{cached_mergeinfo}{$path} = \%minfo;
- $self->{cached_mergeinfo_rev}{$path} = $rev;
+ $cached_mergeinfo->{$path} = [ $rev, \%minfo ];

my %changes = ();
foreach my $p (keys %minfo) {
--
EW
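
A standalone sketch of the layout described in the patch comment ("$path => [ $rev, \%mergeinfo ]") may help: one hash lookup returns both the revision and the mergeinfo, instead of consulting two parallel hashes. The wrapper functions cache_store and cache_fetch and the sample data are hypothetical and do not exist in Git::SVN.

```perl
use strict;
use warnings;

# Single cache keyed by path; each value is [ $rev, \%mergeinfo ].
my %cached_mergeinfo;

sub cache_store {
    my ($path, $rev, $minfo) = @_;
    $cached_mergeinfo{$path} = [ $rev, $minfo ];
}

# Return the cached mergeinfo only if it was recorded at exactly $rev.
sub cache_fetch {
    my ($path, $rev) = @_;
    my $cached = $cached_mergeinfo{$path};
    return undef unless defined $cached && $cached->[0] == $rev;
    return $cached->[1];
}

cache_store('/hotfixes/DDD', 7635, { '/trunk' => '7000-7600' });
my $hit  = cache_fetch('/hotfixes/DDD', 7635);
my $miss = cache_fetch('/hotfixes/DDD', 7636);
print defined $hit  ? "hit: $hit->{'/trunk'}\n" : "miss\n";
print defined $miss ? "hit\n" : "miss at r7636\n";
```

Keeping the revision and the data in one value also means a single serialized record per path, which is what makes the disk-backed variant in the follow-up RFC straightforward.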
Eric Wong
2014-10-19 02:33:58 UTC
Post by Eric Wong
This reduces hash lookups for looking up cache data and will
simplify tying data to disk in the next commit.
I considered the following, but GDBM might not be readily available on
non-POSIX platforms. I think the other problem is the existing caches
are still in memory (whether YAML or Storable) even if disk-backed,
causing a large amount of memory usage anyways.

(Both patches on top of Jakob's)
-------------------------
Subject: [RFC] git-svn: tie cached_mergeinfo to a GDBM_File store

This should reduce per-instance memory usage by allowing
serialization to disk. Using the existing Memoize::Storable
or YAML backends does not allow fast lookups.

GDBM_File should be available in most Perl installations
and should not pose unnecessary burden
---
perl/Git/SVN.pm | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 25dbcd5..3e477c7 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -14,6 +14,7 @@ use IPC::Open3;
use Memoize; # core since 5.8.0, Jul 2002
use Memoize::Storable;
use POSIX qw(:signal_h);
+use Storable qw(freeze thaw);

use Git qw(
command
@@ -1713,10 +1714,21 @@ sub mergeinfo_changes {

# Initialize cache on the first call.
unless (defined $cached_mergeinfo) {
- $cached_mergeinfo = $self->{cached_mergeinfo} = {};
+ my %hash;
+ eval '
+ require File::Temp;
+ use GDBM_File;
+ my $fh = File::Temp->new(TEMPLATE => "mergeinfo.XXXXXXXX");
+ $self->{cached_mergeinfo_fh} = $fh;
+ $fh->unlink_on_destroy(1);
+ tie %hash => "GDBM_File", $fh->filename, GDBM_WRCREAT, 0600;
+ ';
+ $cached_mergeinfo = $self->{cached_mergeinfo} = \%hash;
}

my $cached = $cached_mergeinfo->{$old_path};
+ $cached = thaw($cached) if defined $cached;
+
if (defined $cached && $cached->[0] == $old_rev) {
$old_minfo = $cached->[1];
} else {
@@ -1735,11 +1747,12 @@ sub mergeinfo_changes {
$props->{"svn:mergeinfo"};
$old_minfo = \%omi;
}
- $cached_mergeinfo->{$old_path} = [ $old_rev, $old_minfo ];
+ $cached_mergeinfo->{$old_path} =
+ freeze([ $old_rev, $old_minfo ]);
}

# Cache the new mergeinfo.
- $cached_mergeinfo->{$path} = [ $rev, \%minfo ];
+ $cached_mergeinfo->{$path} = freeze([ $rev, \%minfo ]);

my %changes = ();
foreach my $p (keys %minfo) {
--
EW
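
The technique in the RFC above (a tied DBM hash whose values are Storable-serialized records) can be tried in isolation. The sketch below deliberately swaps GDBM_File for SDBM_File, because SDBM_File ships with core Perl and so runs even where GDBM is unavailable. The file name and sample data are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use SDBM_File;                    # stand-in for GDBM_File, see note below
use Storable qw(freeze thaw);

# Keep the DBM files in a temporary directory that is removed on exit.
my $dir = tempdir(CLEANUP => 1);
tie my %cache, 'SDBM_File', "$dir/mergeinfo", O_RDWR | O_CREAT, 0600
    or die "cannot tie cache: $!";

# DBM values must be flat strings, so serialize [ $rev, \%mergeinfo ].
$cache{'/hotfixes/DDD'} = freeze([ 7635, { '/trunk' => '7000-7600' } ]);

# Reading back: thaw() turns the stored string into the array ref again.
my $cached = thaw($cache{'/hotfixes/DDD'});
my ($rev, $minfo) = @$cached;
print "r$rev: $minfo->{'/trunk'}\n";

untie %cache;
```

Note that SDBM limits the size of each key/value pair to roughly 1 KB, so it would likely be too small for mergeinfo as long as that shown in the first message of this thread; it is used here only to keep the example self-contained.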
Jakob Stoklund Olesen
2014-10-19 14:56:11 UTC
Post by Eric Wong
Post by Eric Wong
This reduces hash lookups for looking up cache data and will
simplify tying data to disk in the next commit.
I considered the following, but GDBM might not be readily available on
non-POSIX platforms. I think the other problem is the existing caches
are still in memory (whether YAML or Storable) even if disk-backed,
causing a large amount of memory usage anyways.
If cached_mergeinfo is using too much memory, you can probably drop that cache entirely. IIRC, it didn't give that much of a speed up.

I am surprised that it is using a lot of memory, though. There is only one entry per SVN branch.
Post by Eric Wong
(Both patches on top of Jakob's)
-------------------------
Subject: [RFC] git-svn: tie cached_mergeinfo to a GDBM_File store
[...]
Eric Wong
2014-10-20 01:16:01 UTC
Post by Jakob Stoklund Olesen
If cached_mergeinfo is using too much memory, you can probably drop
that cache entirely. IIRC, it didn't give that much of a speed up.
I am surprised that it is using a lot of memory, though. There is only
one entry per SVN branch.
Something like the below? (on top of your original two patches)
Pushed to my master @ git://bogomips.org/git-svn.git

Eric Wong (2):
git-svn: reduce check_cherry_pick cache overhead
git-svn: cache only mergeinfo revisions

Jakob Stoklund Olesen (2):
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo

git-svn still seems to have some excessive memory usage problems,
even independently of mergeinfo stuff.
--------------------------8<----------------------------
From: Eric Wong <***@yhbt.net>
Date: Mon, 20 Oct 2014 01:02:53 +0000
Subject: [PATCH] git-svn: cache only mergeinfo revisions

This should reduce excessive memory usage from the new mergeinfo
caches without hurting performance too much, assuming reasonable
latency to the SVN server.

Cc: Hin-Tak Leung <***@users.sourceforge.net>
Suggested-by: Jakob Stoklund Olesen <***@2pi.dk>
Signed-off-by: Eric Wong <***@yhbt.net>
---
perl/Git/SVN.pm | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 171af37..f8a75b1 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1713,13 +1713,10 @@ sub mergeinfo_changes {
# Initialize cache on the first call.
unless (defined $self->{cached_mergeinfo_rev}) {
$self->{cached_mergeinfo_rev} = {};
- $self->{cached_mergeinfo} = {};
}

my $cached_rev = $self->{cached_mergeinfo_rev}{$old_path};
- if (defined $cached_rev && $cached_rev == $old_rev) {
- $old_minfo = $self->{cached_mergeinfo}{$old_path};
- } else {
+ unless (defined $cached_rev && $cached_rev == $old_rev) {
my $ra = $self->ra;
# Give up if $old_path isn't in the repo.
# This is probably a merge on a subtree.
@@ -1728,19 +1725,16 @@ sub mergeinfo_changes {
"directory didn't exist in r$old_rev\n";
return {};
}
- my (undef, undef, $props) =
- $self->ra->get_dir($old_path, $old_rev);
- if (defined $props->{"svn:mergeinfo"}) {
- my %omi = map {split ":", $_ } split "\n",
- $props->{"svn:mergeinfo"};
- $old_minfo = \%omi;
- }
- $self->{cached_mergeinfo}{$old_path} = $old_minfo;
- $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
}
+ my (undef, undef, $props) = $self->ra->get_dir($old_path, $old_rev);
+ if (defined $props->{"svn:mergeinfo"}) {
+ my %omi = map {split ":", $_ } split "\n",
+ $props->{"svn:mergeinfo"};
+ $old_minfo = \%omi;
+ }
+ $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;

# Cache the new mergeinfo.
- $self->{cached_mergeinfo}{$path} = \%minfo;
$self->{cached_mergeinfo_rev}{$path} = $rev;

my %changes = ();
--
EW
Jakob Stoklund Olesen
2014-10-20 13:46:19 UTC
Post by Eric Wong
Post by Jakob Stoklund Olesen
If cached_mergeinfo is using too much memory, you can probably drop
that cache entirely. IIRC, it didn't give that much of a speed up.
I am surprised that it is using a lot of memory, though. There is only
one entry per SVN branch.
Something like the below? (on top of your original two patches)
Yes, but I think you can remove cached_mergeinfo_rev too.

Thanks
/Jakob
Post by Eric Wong
git-svn: reduce check_cherry_pick cache overhead
git-svn: cache only mergeinfo revisions
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo
git-svn still seems to have some excessive memory usage problems,
even independently of mergeinfo stuff.
--------------------------8<----------------------------
Subject: [PATCH] git-svn: cache only mergeinfo revisions
[...]
Eric Wong
2014-10-21 09:00:56 UTC
Post by Jakob Stoklund Olesen
Yes, but I think you can remove cached_mergeinfo_rev too.
Thanks, pushed the patch at the bottom, too.
Also started working on some memory reductions here:
http://mid.gmane.org/***@dcvr.yhbt.net
But there seem to be more problems :<

----------------------------8<-----------------------------
From: Eric Wong <***@yhbt.net>
Date: Tue, 21 Oct 2014 06:23:22 +0000
Subject: [PATCH] git-svn: remove mergeinfo rev caching

This should further reduce memory usage from the new mergeinfo
speedups without hurting performance too much, assuming
reasonable latency to the SVN server.

Cc: Hin-Tak Leung <***@users.sourceforge.net>
Suggested-by: Jakob Stoklund Olesen <***@2pi.dk>
Signed-off-by: Eric Wong <***@yhbt.net>
---
perl/Git/SVN.pm | 30 +++++++++---------------------
1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index f8a75b1..4364506 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1710,32 +1710,20 @@ sub mergeinfo_changes {
my %minfo = map {split ":", $_ } split "\n", $mergeinfo_prop;
my $old_minfo = {};

- # Initialize cache on the first call.
- unless (defined $self->{cached_mergeinfo_rev}) {
- $self->{cached_mergeinfo_rev} = {};
- }
-
- my $cached_rev = $self->{cached_mergeinfo_rev}{$old_path};
- unless (defined $cached_rev && $cached_rev == $old_rev) {
- my $ra = $self->ra;
- # Give up if $old_path isn't in the repo.
- # This is probably a merge on a subtree.
- if ($ra->check_path($old_path, $old_rev) != $SVN::Node::dir) {
- warn "W: ignoring svn:mergeinfo on $old_path, ",
- "directory didn't exist in r$old_rev\n";
- return {};
- }
- }
- my (undef, undef, $props) = $self->ra->get_dir($old_path, $old_rev);
+ my $ra = $self->ra;
+ # Give up if $old_path isn't in the repo.
+ # This is probably a merge on a subtree.
+ if ($ra->check_path($old_path, $old_rev) != $SVN::Node::dir) {
+ warn "W: ignoring svn:mergeinfo on $old_path, ",
+ "directory didn't exist in r$old_rev\n";
+ return {};
+ }
+ my (undef, undef, $props) = $ra->get_dir($old_path, $old_rev);
if (defined $props->{"svn:mergeinfo"}) {
my %omi = map {split ":", $_ } split "\n",
$props->{"svn:mergeinfo"};
$old_minfo = \%omi;
}
- $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
-
- # Cache the new mergeinfo.
- $self->{cached_mergeinfo_rev}{$path} = $rev;

my %changes = ();
foreach my $p (keys %minfo) {
--
EW
Fabian Schmied
2014-10-19 09:38:16 UTC
Post by Eric Wong
Post by Fabian Schmied
Hi,
I'm currently migrating an SVN repository to Git using git-svn (Git
for Windows 1.8.3-preview20130601), and I'm experiencing severe
performance problems with "git svn fetch". Commits to the SVN "trunk"
are fetched very fast (a few seconds or so per SVN revision), but
commits to some branches ("hotfix" branches) are currently taking
about 9 minutes per revision. I fear that the time per these commits
is increasing and that indeed the migration might not be finishable at
all.
[...]
Post by Eric Wong
Post by Fabian Schmied
Is there anything I can do to speed this up? (I already tried
increasing the --log-window-size to 500, didn't have any effect.)
Can you take a look at the following two "mergeinfo-speedups"
in my repo? (git://bogomips.org/git-svn)
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo
http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
[...]

Thank you _very_ much, the performance increase is tremendous: from,
ATM, 15 minutes per commit (with large merge-infos) down to 15 seconds
each. This means that instead of taking weeks, the migration will now
complete in hours! Memory consumption might be a bit higher, but not a
problem for me at all.

(I didn't apply the two additional patches you supplied, only the two
ones linked above.)

Thanks again, you saved my deadline :)
Fabian
Hin-Tak Leung
2014-10-22 17:38:30 UTC
Post by Eric Wong
Thanks, pushed the patch at the bottom, too.
But there seem to be more problems :<
----------------------------8<-----------------------------
Subject: [PATCH] git-svn: remove mergeinfo rev caching
[...]
I'll have a look at the new changes at some point - I am still keeping the old
clone and the new clone and just fetching from time to time to keep them
in sync. I just tried that, and fetching the same 50 commits on the old clone
took 1.7 GB memory vs 1.0 GB memory on the new. Details below.
This is just with the 2 earliest patches - I'll put the new 3 in at some point.
So I see some need for retrospectively fixing old clones (maybe as part
of garbage collection?), since most people would simply use an old clone
through the ages...

Comparing trunk of old and new, I see one difference - one short
commit message is missing in the *old* (the "Add checkPoFiles etc." part)
and so all the sha1s afterwards differ. Is that an old bug that has been fixed,
and should I therefore throw away the old clone?

Date: Wed Apr 25 18:21:29 2012 +0000
Add checkPoFiles etc.
git-svn-id: https://svn.r-project.org/R/***@59188

Here are the details of fetching old and new:

<---
$ /usr/bin/time -v git svn fetch --all
M doc/manual/R-admin.texi
r66784 = fc20374f26f8e03bb88c00933982e29138a6f929 (refs/remotes/trunk)
...
M configure
r66834 = d8d1876f6aa71b3fe3773cd28a760ff945d30bdf (refs/remotes/R-3-1-branch)
Command being timed: "git svn fetch --all"
User time (seconds): 1520.77
System time (seconds): 156.32
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 28:15.82
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1738276
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 613
Minor (reclaiming a frame) page faults: 2039305
Voluntary context switches: 11243
Involuntary context switches: 181507
Swaps: 0
File system inputs: 658328
File system outputs: 754688
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

$ cd ../R-2/
[Hin-***@localhost R-2]$ /usr/bin/time -v git svn fetch --all
M doc/manual/R-admin.texi
r66784 = 6a08d94b456d33d85add914a1b780a972689443a (refs/remotes/trunk)
...
M configure
r66834 = 370a6484c2a65be78dfae184b50d8f08685d389c (refs/remotes/R-3-1-branch)
Command being timed: "git svn fetch --all"
User time (seconds): 1507.89
System time (seconds): 134.25
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 27:38.49
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1026656
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1110
Minor (reclaiming a frame) page faults: 1630150
Voluntary context switches: 10280
Involuntary context switches: 176444
Swaps: 0
File system inputs: 361472
File system outputs: 477912
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---->
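
As a quick check of the two reports above, here is a small sketch that pulls the "Maximum resident set size" figure out of `/usr/bin/time -v` output and compares the two runs. The helper name max_rss_kb is hypothetical; the two numbers are the ones from the reports above.

```perl
use strict;
use warnings;

# Hypothetical helper: extract "Maximum resident set size (kbytes)"
# from the text printed by `/usr/bin/time -v`.
sub max_rss_kb {
    my ($report) = @_;
    my ($kb) = $report =~ /Maximum resident set size \(kbytes\):\s*(\d+)/
        or die "no max RSS line found";
    return $kb;
}

my $old_kb = max_rss_kb("Maximum resident set size (kbytes): 1738276\n");
my $new_kb = max_rss_kb("Maximum resident set size (kbytes): 1026656\n");

# Convert kbytes to GiB and report the relative saving.
printf "old clone: %.2f GiB\n", $old_kb / 1024**2;
printf "new clone: %.2f GiB\n", $new_kb / 1024**2;
printf "reduction: %.1f%%\n", 100 * ($old_kb - $new_kb) / $old_kb;
```

With these inputs the peak resident set size of the new clone comes out about 41% lower than that of the old clone, while the elapsed times (28:15 vs 27:38) are nearly identical.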