Discussion:
git-daemon memory usage, disconnection.
David Woodhouse
2006-04-19 13:22:46 UTC
I'm running git-daemon from xinetd and it seems a little greedy...

Cpu(s): 2.7% us, 6.4% sy, 0.0% ni, 1.7% id, 87.7% wa, 1.4% hi, 0.0% si
Mem: 253680k total, 250076k used, 3604k free, 568k buffers
Swap: 500960k total, 500864k used, 96k free, 24696k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31232 nobody 18 0 155m 29m 7224 D 1.3 11.9 0:25.56 git-rev-list
30743 nobody 18 0 179m 29m 9480 D 0.7 11.9 0:42.60 git-rev-list
31277 nobody 18 0 147m 28m 7476 D 2.6 11.4 0:20.90 git-rev-list
30314 nobody 18 0 233m 26m 7696 D 0.0 10.6 1:20.24 git-rev-list
30612 nobody 18 0 204m 23m 7432 D 1.3 9.4 0:59.19 git-rev-list
30574 nobody 18 0 190m 20m 7608 D 0.3 8.3 0:50.77 git-rev-list
30208 nobody 18 0 140m 14m 7632 D 0.3 5.9 0:15.23 git-pack-object

Now, this wouldn't be _so_ bad if there were only two of them running.
The clients for the other four have actually given up and disconnected
long ago, but git-daemon doesn't seem to have reacted to that.
--
dwmw2
Linus Torvalds
2006-04-19 14:59:43 UTC
Post by David Woodhouse
I'm running git-daemon from xinetd and it seems a little greedy...
Cpu(s): 2.7% us, 6.4% sy, 0.0% ni, 1.7% id, 87.7% wa, 1.4% hi, 0.0% si
Mem: 253680k total, 250076k used, 3604k free, 568k buffers
Swap: 500960k total, 500864k used, 96k free, 24696k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31232 nobody 18 0 155m 29m 7224 D 1.3 11.9 0:25.56 git-rev-list
30743 nobody 18 0 179m 29m 9480 D 0.7 11.9 0:42.60 git-rev-list
31277 nobody 18 0 147m 28m 7476 D 2.6 11.4 0:20.90 git-rev-list
30314 nobody 18 0 233m 26m 7696 D 0.0 10.6 1:20.24 git-rev-list
30612 nobody 18 0 204m 23m 7432 D 1.3 9.4 0:59.19 git-rev-list
30574 nobody 18 0 190m 20m 7608 D 0.3 8.3 0:50.77 git-rev-list
30208 nobody 18 0 140m 14m 7632 D 0.3 5.9 0:15.23 git-pack-object
Well, you've probably got two issues:

- it looks like you aren't packing your archives (which explains why the
disk accesses are horrid, which in turn explains the "D" part).

For a git server, you _really_ want all trees to be mostly packed, or
you want absolutely tons of memory (and 256MB is definitely not "tons"
as far as git is concerned).

- git-rev-list won't notice that there is nobody listening until it gets
an EPIPE, and it won't get an EPIPE until it actually outputs something,
and it won't output anything until it is largely done traversing the
tree..
Post by David Woodhouse
Now, this wouldn't be _so_ bad if there were only two of them running.
The clients for the other four have actually given up and disconnected
long ago, but git-daemon doesn't seem to have reacted to that.
Well, the way things work under UNIX, you normally don't notice that the
other end isn't interested until you try to write, and you get a "nobody
is listening". And sadly, the packing stuff does most (not all) of the
heavy lifting before it can even start to write things out.
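
To make the point concrete, here is a minimal Python sketch (not git's code; the function name is made up for illustration). It shows that a writer on a pipe learns the reader is gone only at the moment it writes. It assumes a POSIX system and relies on Python ignoring SIGPIPE by default, so the failed write surfaces as an EPIPE error instead of killing the process.

```python
import errno
import os


def errno_after_reader_closes() -> int:
    """Write to a pipe whose read end is closed; return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "client" disconnects; the writer is not told
    try:
        os.write(write_fd, b"data")  # only now does the kernel report it
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return 0  # not expected: the write should have failed


if __name__ == "__main__":
    print(errno.errorcode[errno_after_reader_closes()])
```

A process that spends minutes computing before its first write, as described above, would sit in the gap between the `os.close(read_fd)` and the `os.write()` for that whole time.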

That said, I should probably take a look at git-rev-list --objects memory
usage once again. It's never been exactly "lean" (and it can't really be:
it does end up needing the total object list in memory for a full clone,
and with something like the kernel, that's about 250 _thousand_ objects).

We should probably also make send-pack.c use the nice revision library,
because right now it's doing that pipe to git-rev-list for no good reason.

Linus
David Woodhouse
2006-04-19 15:27:49 UTC
Post by Linus Torvalds
- it looks like you aren't packing your archives (which explains why the
disk accesses are horrid, which in turn explains the "D" part).
Hm, good point. They're fairly new trees -- I had foolishly assumed that
they would at least start off packed. That isn't the case though --
perhaps it should be? Did the original clone receive a pack on the wire
and then _split_ it?

If the tools would automatically pack when the number of unpacked
objects reaches a threshold, that would be useful.

Since this repo is only available through git:// and git+ssh:// URLs, I
can safely use git-repack's '-a -d' options, right?

I'll do 'git-repack -l' nightly and 'git-repack -a -d -l' weekly -- does
that seem sane?
Post by Linus Torvalds
For a git server, you _really_ want all trees to be mostly packed, or
you want absolutely tons of memory (and 256MB is definitely not "tons"
as far as git is concerned).
Well, the way things work under UNIX, you normally don't notice that the
other end isn't interested until you try to write, and you get a "nobody
is listening". And sadly, the packing stuff does most (not all) of the
heavy lifting before it can even start to write things out.
Well, it does that with SIGALRM happening periodically, theoretically
for the purpose of providing progress output. Perhaps we could do a
getpeername() or something else to check on the output fd each time?
--
dwmw2
Linus Torvalds
2006-04-19 15:49:09 UTC
Post by David Woodhouse
Post by Linus Torvalds
- it looks like you aren't packing your archives (which explains why the
disk accesses are horrid, which in turn explains the "D" part).
Hm, good point. They're fairly new trees -- I had foolishly assumed that
they would at least start off packed. That isn't the case though --
perhaps it should be? Did the original clone receive a pack on the wire
and then _split_ it?
For old versions of git, yes.
Post by David Woodhouse
If the tools would automatically pack when the number of unpacked
objects reaches a threshold, that would be useful.
Well, packing is still best done in the background: you don't generally
want the tools to just stop for a minute to repack while you're doing
something. You'd normally want to do a cron run at 4AM or something, see
if there is lots to pack, and repack that.

The one exception is probably a large conversion process (from CVS, SVN,
whatever). The conversion process itself probably takes ages, and it will
be even slower if it were to keep the potentially huge result unpacked all
the time.

But for normal ops, you really don't want to repack synchronously.
Post by David Woodhouse
Since this repo is only available through git:// and git+ssh:// URLs, I
can safely use git-repack's '-a -d' options, right?
Yes.
Post by David Woodhouse
I'll do 'git-repack -l' nightly and 'git-repack -a -d -l' weekly -- does
that seem sane?
Absolutely. The one exception might be trees that really don't change very
much (which is quite common), so you might make it conditional on seeing
if there are _any_ objects at all in .git/objects/00/, for example. Not
that repack will be very expensive, but still..
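
A rough Python sketch of that conditional cron job follows. The helper names and the choice to pass the repository path as `git_dir` are assumptions for illustration, and it invokes the modern `git repack` spelling rather than the dashed `git-repack` used in this thread. Looking at a single fan-out directory is only a heuristic: object names are hashes, so roughly one object in 256 lands in `objects/00/`.

```python
import subprocess
from pathlib import Path


def has_loose_objects(git_dir: Path, fanout: str = "00") -> bool:
    """True if objects/<fanout>/ under git_dir contains at least one entry."""
    bucket = git_dir / "objects" / fanout
    return bucket.is_dir() and any(bucket.iterdir())


def repack_if_needed(git_dir: Path, full: bool = False) -> bool:
    """Run git repack only when the sampled directory is non-empty.

    full=False is the nightly incremental run (-l);
    full=True is the weekly run (-a -d -l).
    """
    if not has_loose_objects(git_dir):
        return False  # nothing new since the last repack, skip the work
    cmd = ["git", f"--git-dir={git_dir}", "repack", "-l"]
    if full:
        cmd += ["-a", "-d"]
    subprocess.run(cmd, check=True)
    return True
```

A nightly cron entry would call `repack_if_needed(Path("/path/to/repo.git"))` and a weekly one would pass `full=True`; the path here is a placeholder.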
Post by David Woodhouse
Well, it does that with SIGALRM happening periodically, theoretically
for the purpose of providing progress output. Perhaps we could do a
getpeername() or something else to check on the output fd each time?
Yes, that's possibly a good idea. Of course, for git-rev-list, it's just a
pipe, and it's hard to do that check at least portably. On Linux with
newer kernels, doing a "poll()" on a pipe for writing will give you a
POLLERR if the other side has hung up, but that's by no means portable.

(On some other systems, doing a zero-sized write() _might_ do it, but at
least Linux will happily say "ok, wrote 0 bytes" even if the other end
isn't listening).
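
Both behaviours can be tried out with a small Python sketch. This is a Linux-only illustration, not a portable check and not git code; the function names are invented for the example.

```python
import os
import select


def reader_has_hung_up(write_fd: int) -> bool:
    """Poll the write end of a pipe without blocking; True if POLLERR is set."""
    poller = select.poll()
    poller.register(write_fd, select.POLLOUT)
    events = poller.poll(0)  # timeout 0: return immediately
    return any(mask & select.POLLERR for _fd, mask in events)


def zero_byte_write(write_fd: int) -> int:
    """Attempt a zero-sized write and return the byte count reported."""
    return os.write(write_fd, b"")


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    print("reader open:  ", reader_has_hung_up(write_fd))
    os.close(read_fd)  # simulate the other side going away
    print("reader closed:", reader_has_hung_up(write_fd))
    print("zero-byte write returned", zero_byte_write(write_fd))
    os.close(write_fd)
```

A long-running producer could call something like `reader_has_hung_up()` from a periodic timer and exit early, which is the idea behind checking the output fd on each SIGALRM.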

And git-rev-list isn't doing the SIGALRM anyway.

In other words, to do this, we'd have to change send-pack to use the
revision library. Which, as mentioned, is worthwhile anyway, but it's not
totally trivial.

Linus
