Discussion:
Sources for 3.18-rc1 not uploaded
Linus Torvalds
2014-10-20 15:25:59 UTC
Permalink
Junio, Brian,

it seems that the stability of the "git tar" output is broken.

On Mon, Oct 20, 2014 at 4:59 AM, Konstantin Ryabitsev
This is why the front page still lists 3.17 as the latest mainline. Want
to try again?
Ok, tried again, and failed again.
If that still doesn't work, you may have to use version 1.7 of git when
generating the tarball and signature -- I recall Greg having a similar
problem in the past.
Ugh, yes, that seems to be it. Current git generates different
tar-files than older releases do:

tar-1.7.9.7 tar-cur differ: byte 107, line 1

and a quick bisection shows that it is due to commit 10f343ea814f
("archive: honor tar.umask even for pax headers") in the current git
development version.

Junio, quite frankly, I don't think that that fix was a good idea. I'd
suggest having a *separate* umask for the pax headers, so that we do
not break this long-lasting stability of "git archive" output in ways
that are unfixable and not compatible. kernel.org has relied (for a
*long* time) on being able to just upload the signature of the
resulting tar-file, because both sides can generate the same tar-fiel
bit-for-bit.

So instead of using "tar_umask", please make it use "tar_pax_umask",
and have that default to 000. Ok?

Something like the attached patch.

Or just revert 10f343ea814f entirely.

Linus
Junio C Hamano
2014-10-20 18:28:47 UTC
Permalink
Post by Linus Torvalds
Junio, Brian,
it seems that the stability of the "git tar" output is broken.
On Mon, Oct 20, 2014 at 4:59 AM, Konstantin Ryabitsev
This is why the front page still lists 3.17 as the latest mainline. Want
to try again?
Ok, tried again, and failed again.
If that still doesn't work, you may have to use version 1.7 of git when
generating the tarball and signature -- I recall Greg having a similar
problem in the past.
Ugh, yes, that seems to be it. Current git generates different
tar-1.7.9.7 tar-cur differ: byte 107, line 1
and a quick bisection shows that it is due to commit 10f343ea814f
("archive: honor tar.umask even for pax headers") in the current git
development version.
Junio, quite frankly, I don't think that that fix was a good idea. I'd
suggest having a *separate* umask for the pax headers, so that we do
not break this long-lasting stability of "git archive" output in ways
that are unfixable and not compatible. kernel.org has relied (for a
*long* time) on being able to just upload the signature of the
resulting tar-file, because both sides can generate the same tar-fiel
bit-for-bit.
So instead of using "tar_umask", please make it use "tar_pax_umask",
and have that default to 000. Ok?
Something like the attached patch.
Or just revert 10f343ea814f entirely.
My preference for this particular one however is to simply revert
it. I do not see much point in bending backwards to treat older
implementations of tar that do not understand extended pax headers
very specially by adding a separate option or configuration, even
though I wouldn't have minded if the original implementation were to
apply the same umask for these entries that look like "dummy files"
to them.

I have to wonder why 10f343ea (archive: honor tar.umask even for pax
headers, 2014-08-03) is a problem but an earlier change v1.8.1.1~8^2
(archive-tar: split long paths more carefully, 2013-01-05), which
also should have broken bit-for-bit compatibility, went unnoticed,
though. What I am getting at is that correcting past mistakes in
the output should not be forbidden unconditionally with a complaint
like this.

If 10f343ea were an important fix, then my preference would have
been to instead add "tar_ignore_umask_in_pax_header" to allow those
who care more about bit-for-bit compatibility with older broken
versions than correctness to conditionally disable its code. But I
do not think it is, so my preference isn't.

Thanks.
Konstantin Ryabitsev
2014-10-20 18:37:09 UTC
Permalink
Post by Junio C Hamano
I have to wonder why 10f343ea (archive: honor tar.umask even for pax
headers, 2014-08-03) is a problem but an earlier change v1.8.1.1~8^2
(archive-tar: split long paths more carefully, 2013-01-05), which
also should have broken bit-for-bit compatibility, went unnoticed,
though. What I am getting at is that correcting past mistakes in
the output should not be forbidden unconditionally with a complaint
like this.
I think Greg actually ran into that one, and uses a separate 1.7 git
tree for this reason.

I can update our servers to git 2.1 (which most of them already have),
which should help with previous incompatibilities -- but not the future
ones obviously. :)

-K
Junio C Hamano
2014-10-20 19:43:37 UTC
Permalink
Post by Konstantin Ryabitsev
Post by Junio C Hamano
I have to wonder why 10f343ea (archive: honor tar.umask even for pax
headers, 2014-08-03) is a problem but an earlier change v1.8.1.1~8^2
(archive-tar: split long paths more carefully, 2013-01-05), which
also should have broken bit-for-bit compatibility, went unnoticed,
though. What I am getting at is that correcting past mistakes in
the output should not be forbidden unconditionally with a complaint
like this.
I think Greg actually ran into that one, and uses a separate 1.7 git
tree for this reason.
I can update our servers to git 2.1 (which most of them already have),
which should help with previous incompatibilities -- but not the future
ones obviously. :)
Updating to 2.1 will hopefully correct the change in v1.8.1.1~8^2,
and will break Greg and friends who stick to 1.7 for that reason,
though.

The "breakage" in 10f343ea was only in the 'master' branch and
upwards, which is not yet released in any tagged version, and I just
reverted it from my tree, so people on the cutting edge will be okay
in a short order.

Thanks.
Greg KH
2014-10-20 21:52:12 UTC
Permalink
Post by Konstantin Ryabitsev
Post by Junio C Hamano
I have to wonder why 10f343ea (archive: honor tar.umask even for pax
headers, 2014-08-03) is a problem but an earlier change v1.8.1.1~8^2
(archive-tar: split long paths more carefully, 2013-01-05), which
also should have broken bit-for-bit compatibility, went unnoticed,
though. What I am getting at is that correcting past mistakes in
the output should not be forbidden unconditionally with a complaint
like this.
I think Greg actually ran into that one, and uses a separate 1.7 git
tree for this reason.
I used to have to do this for the 3.0-stable kernel as one of the files
in it ran into the "very long path" problem. I just ran the latest
version of git with that one commit reverted and all was fine.

After 3.0 was done, I just dropped that patch from my local version and
have been running with the latest git version of git with no problems.
Post by Konstantin Ryabitsev
I can update our servers to git 2.1 (which most of them already have),
which should help with previous incompatibilities -- but not the future
ones obviously. :)
I thought you already did this. Or was that only the public facing git
servers?

thanks,

greg k-h
brian m. carlson
2014-10-20 22:28:09 UTC
Permalink
Post by Linus Torvalds
Junio, Brian,
it seems that the stability of the "git tar" output is broken.
It doesn't appear that the stability of git archive --format=tar is
documented anywhere. Given that, it doesn't seem reasonable to expect
that any tar implementation produces bit-for-bit compatible output
between versions. After all, look at all the contortions that Debian
has had to go through to keep pristine-tar working.
Post by Linus Torvalds
Junio, quite frankly, I don't think that that fix was a good idea. I'd
suggest having a *separate* umask for the pax headers, so that we do
not break this long-lasting stability of "git archive" output in ways
that are unfixable and not compatible. kernel.org has relied (for a
*long* time) on being able to just upload the signature of the
resulting tar-file, because both sides can generate the same tar-fiel
bit-for-bit.
It sounds like kernel.org has a bug, then. Perhaps that's the
appropriate place to fix the issue.

The issue I fixed is that leaving world-writable files around on disk is
a great way for people to cause mischief (for example, by filling up
other users' quotas), and some tar implementations and all Linux pax
implementations extract the pax headers into the working directory, and
that's often /tmp.
--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
Linus Torvalds
2014-10-20 23:17:47 UTC
Permalink
On Mon, Oct 20, 2014 at 3:28 PM, brian m. carlson
Post by brian m. carlson
It doesn't appear that the stability of git archive --format=tar is
documented anywhere. Given that, it doesn't seem reasonable to expect
that any tar implementation produces bit-for-bit compatible output
between versions.
The kernel has simple stability rules: if it breaks users, it gets
fixed or reverted. That is a damn good rule.

I realize that some other projects are crap, and don't care about
their users. I hope and believe that git is not in that sad group.

The whole "it's not documented" excuse is pure and utter bollocks.
Users don't care. And stability of data should be *expected*, not need
some random documentation entry to make it explicit.

Linus
Michael J Gruber
2014-10-21 08:08:35 UTC
Permalink
Post by Linus Torvalds
On Mon, Oct 20, 2014 at 3:28 PM, brian m. carlson
Post by brian m. carlson
It doesn't appear that the stability of git archive --format=tar is
documented anywhere. Given that, it doesn't seem reasonable to expect
that any tar implementation produces bit-for-bit compatible output
between versions.
The kernel has simple stability rules: if it breaks users, it gets
fixed or reverted. That is a damn good rule.
I realize that some other projects are crap, and don't care about
their users. I hope and believe that git is not in that sad group.
The whole "it's not documented" excuse is pure and utter bollocks.
Users don't care. And stability of data should be *expected*, not need
some random documentation entry to make it explicit.
Linus
Linus, with all due respect, this is not the LKML, so please watch your
tone over here on the git list (and keep ranting on LKML however you want).

Brian made a very valid point about what his patch was trying to fix -
after all that is why it was applied. Konstantin made a very valid point
about why the existing behavior is useful for KUP. Interestingly, both
cared about the users of git, just different kinds users.

Git is probably one of the most conservative projects regarding
backwards compatibility and heeding users' expectations (sometimes to my
own dismay). That being said, we distinguish between justified
expectations and those without a solid base - which is why we have
porcelain vs. plumbing, for example, to make clear which part of the ui
is stable. (Yeah, I know you know, but you didn't argue as if you did.)

"data" in git is stable. "data exports" by git are as stable as the
output format is intrinsically or due to the (hopefully documented) way
git produces it.

Unfortunately, the git archive doc clearly says that the umask is
applied to all archive entries. And that clearly wasn't the case (for
extended metadata headers) before Brian's fix.

Brian: How old is the newest tar that get's the extended metadata
headers wrong? If those tars are a "real concern" then we should
probably do the extra pax_umask as suggested by Linus, but have the
default protect the "unknowing users" and give the "knowing users" that
config knob to twitch (sorry, Linus). Otherwise a revert is in order.

Michael
Linus Torvalds
2014-10-21 16:25:34 UTC
Permalink
On Tue, Oct 21, 2014 at 1:08 AM, Michael J Gruber
Post by Michael J Gruber
Unfortunately, the git archive doc clearly says that the umask is
applied to all archive entries. And that clearly wasn't the case (for
extended metadata headers) before Brian's fix.
Hey, it's time for another round of the world-famous "Captain Obvious
Quiz Game"! Yay!

The questions these week are:

(1) "If reality and documentation do not match, where is the bug?"
(a) Documentation is buggy
(b) Reality is buggy

(2) "Where would you put the horse in relationship to a horse-drawn carriage?"
(a) in front
(b) in the carriage

Now, if you answered (a) to both these questions, and had this been a
real quiz show, you might have been a winner and the happy new owner
of a remote-controlled four-slice toaster with a fancy digital timer.

Sadly, this was just a dry-run for the real thing, to give people a
quick taste of the world-famous "Captain Obvious Quiz Game". I hope
you tune in next week for our exciting all-new questions.

Linus
David Kastrup
2014-10-21 17:25:42 UTC
Permalink
Post by Linus Torvalds
On Tue, Oct 21, 2014 at 1:08 AM, Michael J Gruber
Post by Michael J Gruber
Unfortunately, the git archive doc clearly says that the umask is
applied to all archive entries. And that clearly wasn't the case (for
extended metadata headers) before Brian's fix.
Hey, it's time for another round of the world-famous "Captain Obvious
Quiz Game"! Yay!
(1) "If reality and documentation do not match, where is the bug?"
(a) Documentation is buggy
(b) Reality is buggy
(2) "Where would you put the horse in relationship to a horse-drawn carriage?"
(a) in front
(b) in the carriage
You are aware that a buggy _is_ a horse-drawn carriage?
--
Captain Facepalm
Junio C Hamano
2014-10-21 18:14:10 UTC
Permalink
Post by Michael J Gruber
Unfortunately, the git archive doc clearly says that the umask is
applied to all archive entries.
Is an extended pax header "an archive entry"? I doubt it, and the
above is not relevant. The mode bits for the archive entry that it
applies to does not come from there.

See my other message for my final judgement on this one. I wouldn't
have minded if the original used the same umask for those ignored
mode bits, but changing the bits to be ignored after the fact is not
helping any real use case and only hurts existing users.

That is not to say that we cannot later fix bigger issues in the
output. I just do not see that otherwise-unused mode bits in the
extended pax header big enough an issue to spend brain cycles to
carefully lay and execute transition plans to avoid breaking
existing users.
Michael J Gruber
2014-10-22 09:42:48 UTC
Permalink
Post by Junio C Hamano
Post by Michael J Gruber
Unfortunately, the git archive doc clearly says that the umask is
applied to all archive entries.
Is an extended pax header "an archive entry"? I doubt it, and the
above is not relevant. The mode bits for the archive entry that it
applies to does not come from there.
The problem seem to be old tar versions which mis-take the extensions
for archive entries, aren't they?
Post by Junio C Hamano
See my other message for my final judgement on this one. I wouldn't
have minded if the original used the same umask for those ignored
mode bits, but changing the bits to be ignored after the fact is not
helping any real use case and only hurts existing users.
That is not to say that we cannot later fix bigger issues in the
output. I just do not see that otherwise-unused mode bits in the
extended pax header big enough an issue to spend brain cycles to
carefully lay and execute transition plans to avoid breaking
existing users.
My question to Brian still stands which existing users he was trying to
cater for with his patch. If there indeed are no existing affected users
besides the KUP users (as you seem to assume) it's a clear case. Pun
intended ;)

As I pointed out (and you cut out), I don't mind doing the revert. I
just want us to do the right things for the right reasons (the ones you
ponted out, Junio).

Michael
brian m. carlson
2014-10-23 01:09:27 UTC
Permalink
Post by Michael J Gruber
Post by Junio C Hamano
Post by Michael J Gruber
Unfortunately, the git archive doc clearly says that the umask is
applied to all archive entries.
Is an extended pax header "an archive entry"? I doubt it, and the
above is not relevant. The mode bits for the archive entry that it
applies to does not come from there.
The problem seem to be old tar versions which mis-take the extensions
for archive entries, aren't they?
Yes. POSIX isn't clear on how unknown entries are to be handled. I've
seen some Windows tar implementations extract GNU longlink extensions as
files, which leads to a lot of pain.
Post by Michael J Gruber
My question to Brian still stands which existing users he was trying to
cater for with his patch. If there indeed are no existing affected users
besides the KUP users (as you seem to assume) it's a clear case. Pun
intended ;)
The pax format is an extension of the tar format. All of the pax
implementations I've seen on Linux (OpenBSD's and MirBSD's) don't
actually understand the pax headers and emit them as files. 7zip does
as well. I expect there are other Unix systems where tar itself doesn't
understand pax headers, although I don't have access to anything other
than Linux and FreeBSD.

Since it's very common to extract tar archives in /tmp, I didn't want to
leave world-writable files in /tmp (or anywhere else someone might get
to them). While the contents probably aren't sensitive, a malicious
user might fill someone's quota by "helpfully" appending /dev/zero to
the file. And yes, users do these things.
--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
Konstantin Ryabitsev
2014-10-20 23:44:09 UTC
Permalink
Post by brian m. carlson
Post by Linus Torvalds
Junio, quite frankly, I don't think that that fix was a good idea. I'd
Post by Linus Torvalds
suggest having a *separate* umask for the pax headers, so that we do
not break this long-lasting stability of "git archive" output in ways
that are unfixable and not compatible. kernel.org has relied (for a
*long* time) on being able to just upload the signature of the
resulting tar-file, because both sides can generate the same tar-fiel
bit-for-bit.
It sounds like kernel.org has a bug, then. Perhaps that's the
appropriate place to fix the issue.
It's not a bug, it's a feature (TM). KUP relies on git-archive's ability
to create identical tar archives across platforms and versions. The
benefit is that Linus or Greg can create a detached PGP signature
against a tarball created from "git archive [tag]" on their system, and
just tell kup to create the same archive remotely, thus saving them the
trouble of uploading 80Mb each time they cut a release.

With their frequent travel to places where upload bandwidth is both slow
and unreliable, this ability to not have to upload hundreds of Mbs each
time they cut a release is very handy and certainly helps keep kernel
releases on schedule.

So, while it's fair to point out that git-archive was never intended to
always create bit-for-bit identical outputs, it would be *very nice* if
this remained in place, as at least one large-ish deployment (us) finds
it really handy.

-K
Junio C Hamano
2014-10-21 18:59:58 UTC
Permalink
Post by Konstantin Ryabitsev
Post by brian m. carlson
Post by Linus Torvalds
Junio, quite frankly, I don't think that that fix was a good idea. I'd
Post by Linus Torvalds
suggest having a *separate* umask for the pax headers, so that we do
not break this long-lasting stability of "git archive" output in ways
that are unfixable and not compatible. kernel.org has relied (for a
*long* time) on being able to just upload the signature of the
resulting tar-file, because both sides can generate the same tar-fiel
bit-for-bit.
It sounds like kernel.org has a bug, then. Perhaps that's the
appropriate place to fix the issue.
It's not a bug, it's a feature (TM). KUP relies on git-archive's ability
to create identical tar archives across platforms and versions. The
benefit is that Linus or Greg can create a detached PGP signature
against a tarball created from "git archive [tag]" on their system, and
just tell kup to create the same archive remotely, thus saving them the
trouble of uploading 80Mb each time they cut a release.
With their frequent travel to places where upload bandwidth is both slow
and unreliable, this ability to not have to upload hundreds of Mbs each
time they cut a release is very handy and certainly helps keep kernel
releases on schedule.
So, while it's fair to point out that git-archive was never intended to
always create bit-for-bit identical outputs, it would be *very nice* if
this remained in place, as at least one large-ish deployment (us) finds
it really handy.
While I agree that it is a nice "feature", I wish KUP folks thought
more about what should happen when the archive output _must_ change
when a more serious bug is discovered, and coordinated with us
better.

During a period where older and buggy versions of "git archive" are
used by some uploaders while a new version is used by others, KUP
could:

- avail itself to a version (or versions) of "git archive" so that
it can recreate both older and newer output;

- upon receiving a tarball and signature, try recreating newer
output and see if signature matches, and when the signature does
not match, recreate older output and try again.

And we could supply "git archive --compatible=v1.7" option in the
newer version if that is easier on KUP folks than having to keep
multiple installations of versions of "git archive" around.

While I am on the topic of KUP, one feature I wish, which is the
only thing that is preventing me from updating the preformatted
documentation https://www.kernel.org/pub/software/scm/git/docs/, is
to allow me to upload a single tarball and extract it at one
location (e.g. /pub/software/scm/git/docs/) while removing existing
files in that location (i.e. removing deleted files). Where do I
file such a feature request?
Continue reading on narkive:
Loading...