Discussion:
core.autocrlf considered half-assed
Johannes Schindelin
2010-03-05 23:23:33 UTC
Permalink
Hi,

back then, I was not a fan of the core.autocrlf support. But I have to
admit that in the meantime, I turned into an outright un-fan of the
feature. Not because its intent is wrong, but because its implementation
is lousy.

Just try to "git reset --hard" or "git stash" when there are files with
DOS line endings and when core.autocrlf is not false.

And then despair.

So again, for the record this time: core.autocrlf=true is handled
_lousily_.

Ciao,
Dscho
Dmitry Potapov
2010-03-07 09:27:01 UTC
Permalink
Post by Johannes Schindelin
back then, I was not a fan of the core.autocrlf support. But I have to
admit that in the meantime, I turned into an outright un-fan of the
feature. Not because its intent is wrong, but because its implementation
is lousy.
Well, I agree there are some issues with it. In particularly, when
someone changes core.autocrlf in his/her repository, and then git
behavior is outright confusing. IMHO, the nuts of the problem is that
does not store in the index how files were checkout. Instead it uses
core.autocrlf, which specifies how the user _wants_ files to be check-
out. So, when the autocrlf option changes, things get very confusing.
Post by Johannes Schindelin
Just try to "git reset --hard" or "git stash" when there are files with
DOS line endings and when core.autocrlf is not false.
I did, and I have not noticed any problem with that.

git init
git config core.autocrlf true
echo foo^ | tr ^ '\r' > foo
git add foo
git commit -m 'add foo'
echo more^ | tr ^ '\r' >> foo
echo "Before reset:"
tr '\r' ^ < foo
git reset --hard
echo "After reset:"
git diff
tr '\r' ^ < foo


Dmitry
Linus Torvalds
2010-03-07 23:45:13 UTC
Permalink
Post by Dmitry Potapov
Well, I agree there are some issues with it. In particularly, when
someone changes core.autocrlf in his/her repository, and then git
behavior is outright confusing. IMHO, the nuts of the problem is that
does not store in the index how files were checkout. Instead it uses
core.autocrlf, which specifies how the user _wants_ files to be check-
out. So, when the autocrlf option changes, things get very confusing.
I do agree. It would probably have been a good idea to mark the CRLF
status in the index, but we didn't. And crlf isn't actually the _only_
thing that can cause confusion, the 'ident' and 'filter' can do the same
thing.

One option might be to have "git config" know about crlf, so that if you
change crlf state with 'git config' rather than manually, we could at
least _warn_ about the effects and tell people that they may need to do a
full new checkout (or reset the stat info in the index, or whatever). But
I like editing config files by hand, and I don't think I'm the only one.

Linus
Tait
2010-03-08 18:57:19 UTC
Permalink
Post by Linus Torvalds
I do agree. It would probably have been a good idea to mark the CRLF
status in the index, but we didn't...
We already have .gitattributes for tracking information about files. Maybe
add an attribute to describe the in-repository line endings? The default
would be LF, as now, and a new attribute could change the checked-in
format to be CRLF.

Tait
Johannes Schindelin
2010-03-08 19:15:10 UTC
Permalink
Hi,
Post by Tait
Post by Linus Torvalds
I do agree. It would probably have been a good idea to mark the CRLF
status in the index, but we didn't...
We already have .gitattributes for tracking information about files.
Maybe add an attribute to describe the in-repository line endings? The
default would be LF, as now, and a new attribute could change the
checked-in format to be CRLF.
No.

The problem is not the description of the line endings in the repository.
The information what line endings are used can be easily extracted from
every blob, by a simple inspection.

The problem is that the core.autocrlf code blindly assumes that Unix line
endings are the only thing you would ever commit. And worse, the mistake
is repeated when updating the index. Git converts the DOS line endings
into Unix line endings, then compares with what it has in the repository
and says: "Ooops, it is different!" even if it just checked the files out.

And I demonstrated with the "html" example that even long-time Gitsters
sometimes commit DOS line endings as-are, unconverted.

Ciao,
Dscho
Junio C Hamano
2010-03-08 20:31:34 UTC
Permalink
Post by Johannes Schindelin
And I demonstrated with the "html" example that even long-time Gitsters
sometimes commit DOS line endings as-are, unconverted.
FWIW, not using CRLF conversion in the auto-generated 'html' repository
was a deliberate choice, as the contents there (the ones generated by
AsciiDoc with '.html' suffix) were intended to be served directly from the
web servers. I presume AsciiDoc writes them with CRLF because html
documents are supposed to be, and it would be wrong to apply core.autocrlf
in the auto-generated repository. And it is not correct to force
core.autocrlf on the recipient side either. I know you wanted to have a
sample repository that people can easily access and understand what you
see as a problem, and the autogenerated 'html' is a good sample to point
at for that purpose, but "even long-time gitsters..." is stretching the
truth.

Nevertheless, I agree with you that if a similar situation happened by
mistake and your project does want to enforce core.autocrlf, it would be
nicer if there is an easy-to-use one-time clean-up procedure. It hasn't
been my itch, and I suspect it wasn't Linus's itch either.
Johannes Schindelin
2010-03-09 09:28:13 UTC
Permalink
Hi,
Post by Junio C Hamano
Post by Johannes Schindelin
And I demonstrated with the "html" example that even long-time
Gitsters sometimes commit DOS line endings as-are, unconverted.
FWIW, not using CRLF conversion in the auto-generated 'html' repository
was a deliberate choice, as the contents there (the ones generated by
AsciiDoc with '.html' suffix) were intended to be served directly from the
web servers. I presume AsciiDoc writes them with CRLF because html
documents are supposed to be, and it would be wrong to apply core.autocrlf
in the auto-generated repository. And it is not correct to force
core.autocrlf on the recipient side either.
And here you are wrong, because the branch contains _also_ .txt files that
are _not_ CR/LF, but need to be CR/LF on Windows. Believe me, we did think
about what we were doing in msysGit. More than once.
Post by Junio C Hamano
Nevertheless, I agree with you that if a similar situation happened by
mistake and your project does want to enforce core.autocrlf, it would be
nicer if there is an easy-to-use one-time clean-up procedure. It hasn't
been my itch, and I suspect it wasn't Linus's itch either.
The problem is that the whole thing was not your itch, but your
implementation. You never used it, so you never caught the obvious flaws
in the design.

Sorry to be so direct, but it seems that my more subtle attempts to
explain the situation failed.

Ciao,
Dscho
Junio C Hamano
2010-03-09 17:11:15 UTC
Permalink
Post by Johannes Schindelin
Post by Junio C Hamano
Nevertheless, I agree with you that if a similar situation happened by
mistake and your project does want to enforce core.autocrlf, it would be
nicer if there is an easy-to-use one-time clean-up procedure. It hasn't
been my itch, and I suspect it wasn't Linus's itch either.
The problem is that the whole thing was not your itch, but your
implementation. You never used it, so you never caught the obvious flaws
in the design.
Sorry to be so direct, but it seems that my more subtle attempts to
explain the situation failed.
No, being direct is good---it shows the thought behind what you say more
clearly.

Think what "your implementation" means in the open source setting. It is
what you were given for free, you could try to improve upon it if you so
desire, and you should be thankful for it. It also means that you know
the people to ask for help as it is "their" implementation and they are
probably more familiar with it than others. If you happen to be in the
position where you can see shortcomings in the implementation better than
they do, that's good; they and you can complement what each is good at,
and make progress collectively.

What it does _not_ mean is that you can _demand_ anything out of them,
though. An attempt to shaming them into doing something amounts to the
same thing. Having seen the way you have behaved on the msysgit list and
its tracker for a while, I thought you understood all that.

Johannes Schindelin
2010-03-08 11:29:01 UTC
Permalink
Hi,
Post by Dmitry Potapov
Post by Johannes Schindelin
Just try to "git reset --hard" or "git stash" when there are files
with DOS line endings and when core.autocrlf is not false.
I did, and I have not noticed any problem with that.
git init
git config core.autocrlf true
echo foo^ | tr ^ '\r' > foo
git add foo
git commit -m 'add foo'
Unfortunately, this is not the common case. The common case is that
somebody committed _without_ autocrlf (implicitly =false), and you clone
from there.

Easiest example:

$ git clone -n git://repo.or.cz/git.git html-docs
$ cd html-docs/
$ git config core.autocrlf true
$ git checkout -t origin/html
$ git status

... and despair.

Hth,
Dscho
Johannes Schindelin
2010-03-09 09:29:20 UTC
Permalink
Hi,
Post by Johannes Schindelin
$ git clone -n git://repo.or.cz/git.git html-docs
$ cd html-docs/
$ git config core.autocrlf true
$ git checkout -t origin/html
$ git status
As Junio explained in another mail, it was intentional to have all HTML
files with CRLF, because they are supposed to have that ending on all
platforms. What is missing, however, is .gitattributes, which would tell
to Git that we do not want to autocrlf conversion for HTML files.
That is just papering over the real culprit: Git checks something out.
This should be clean. But then Git says it is not.

Ciao,
Dscho
Dmitry Potapov
2010-03-09 10:11:01 UTC
Permalink
Post by Johannes Schindelin
That is just papering over the real culprit: Git checks something out.
This should be clean. But then Git says it is not.
We have two filters: from-git-to-worktree and from-worktree-to-git. If
those conversions are inconsistent for whatever reason, it is not going
to be well even if we found a way to solve this not-being-clean-after-
checkout problem.

One possible approach is to mark all files that had CRLF conversion
during checkout and apply the opposite conversion only to them. But what
about newly added files? Wouldn't this lead that to the situation where
newly added files will have the incorrect ending?

IMHO, this not clean after checkout repo demonstrates that you have the
incorrectly crlf configuration in it. So, you probably should correct
.gitattributes (or even to disable autocrlf in it if this repository is
not intended to be used with autocrlf=true). Either way, your current
settings are incorrect. It is better to fix it than try to hide it!


Dmitry
Dmitry Potapov
2010-03-09 07:24:12 UTC
Permalink
Post by Johannes Schindelin
$ git clone -n git://repo.or.cz/git.git html-docs
$ cd html-docs/
$ git config core.autocrlf true
$ git checkout -t origin/html
$ git status
As Junio explained in another mail, it was intentional to have all HTML
files with CRLF, because they are supposed to have that ending on all
platforms. What is missing, however, is .gitattributes, which would tell
to Git that we do not want to autocrlf conversion for HTML files. This
can be done by adding .gitattributes:

$ cat >> .gitattributes <<EOF
*.html -crlf
EOF

I've just noticed that user-manual.html differs from other HTML files in
that it uses LF ending. I think it is a mistake, and this file should be
converted to have CRLF, but if you want to have all HTML files except
user-manual.html to have CRLF then you can do that too:

$ cat > .gitattributes <<EOF
*.html -crlf
user-manual.html crlf
EOF

I hope Junio will add the right version of .gitattributes, so users with
autocrlf=true will not suffer.


Dmitry
Loading...