Discussion:
If you would write git from scratch now, what would you change?
Jakub Narebski
2007-11-25 21:48:27 UTC
If you were to write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?


Yes, I know, I know. "Worse is better". It is better to release early
and get feedback on what is really needed, as opposed to what you think
is needed.

I think git is a wonderful example of "evolved" software, evolving
practically from the very beginning.
--
Jakub Narebski
Poland
Pierre Habouzit
2007-11-25 22:23:14 UTC
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
* reset/checkout/revert. The commands do wonderful things, but this UI
is a mess for the newcomer.

* pull/fetch/push: I would have had pull be what fetch is, and
added some --merge option to actually "do the obvious merge". As it
is, pull encourages "bad" behavior from the user, and confuses
newcomers a lot.

* I would have hidden plumbing more, using a really distinguished
namespace (stupid example, there are probably better ways, but we
could have git-_rev-parse or git-plumbing-rev-parse instead of
git-rev-parse) so that it's clear to the user that those are really
internal commands, and that he doesn't need to understand them.

This is a big issue with git: the list of commands of git is the tip
of the iceberg from the UI point of view. People _feel_ they are
comfortable with a tool if they get, say, 75% of the UI. I don't say
it's true that understanding 75% of the UI makes you a $tool expert,
but that's how people feel. With git, 75% of the commands (and
don't get me started on the options ;P) is a _lot_. bzr is way
better at that game: there are at least as many commands, but those
are completely hidden from the user.

Of course having our guts easy to grok and find is a big advantage
for the git gurus. But for the newcomer it's disconcerting.

There are probably more things I'd change, but those were the first UI
rumblings from me :)
--
·O· Pierre Habouzit
··O ***@debian.org
OOO http://www.madism.org
Steven Walter
2007-11-26 01:28:37 UTC
Post by Pierre Habouzit
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
* reset/checkout/revert. The commands do wonderful things, but this UI
is a mess for the newcomer.
Heartily seconded. I think checkout is the most egregious of the
three. git-checkout can be used to:

* Switch branches
* Create a branch
* Change the state of all files to a particular commit
* Change the state of a particular file to that of the index
* Change the state of a particular file (and index) to a particular
commit

To make things more complicated, several of these tasks can be done
with other commands. Short of rewriting git from scratch, what can be
done to simplify the many-to-many mapping of tasks to commands?
--
-Steven Walter <***@gmail.com>
Freedom is the freedom to say that 2 + 2 = 4
B2F1 0ECC E605 7321 E818 7A65 FC81 9777 DC28 9E8F
Junio C Hamano
2007-11-26 06:11:50 UTC
Post by Steven Walter
Heartily seconded. I think checkout is the most egregious of the
* Switch branches
* Create a branch
* Change the state of all files to a particular commit
* Change the state of a particular file to that of the index
* Change the state of a particular file (and index) to a particular
commit
Come on. The second one is just a short-hand side effect for a
commonly used operation, and you do not have to use it or learn it.

Also, you have written the last three in a more confusing way than is
necessary. They are all the same thing but with variations --- your way
of writing them is like enumerating "change the state of files whose
name starts with A", "change the state of files whose name starts with
B", etc. as if they are distinctly different and confusing operations.

Let's clear the confusion. Although it is not as bad as the above
"random 5 different operations", checkout does serve 2 quite different
purposes:

(1) checkout a revision.

This primarily affects the notion of where your HEAD is. Is it
pointing at a branch, or detached at a particular commit? In
either case, the objective from the user's point of view here is "I
want to change which commit and/or branch I'd build the next
commit on, if I were to issue the git-commit command".

"I started modifying but realized that I wanted to build not on top
of master but a separate topic", is a typical use case, and this
form will let you take your local changes with you exactly for this
reason.

Obviously when people say "I checkout this commit", they mean the
state of the work tree and they mean the whole tree. It is
hopefully clear that is what you are doing from the fact that you
do not give any pathspec to the command to trigger this mode of
operation.

(2) checkout selected paths out of a commit (or the index).

"I screwed up. I want to start over modifications to these files
from the state of the previous commit (or the last state I
staged)." is a typical use case for this mode. For this reason,
the named paths are updated in the work tree and the work tree and
the index are made to match.

Again, it hopefully is clear enough that you need to give some
pathspec to it for the operation to make sense, if you understand
the purpose of the command. Like "." to mean the whole tree, "*.c"
to mean all C files, or "directory/" to mean everything underneath
it.

So yes, it does two quite different things, and that's mostly because
the verb "to check out" has overloaded meanings.

Hopefully it is clear which one you are using by thinking about the
reason WHY you are "checking out", and by looking at the way you form
the command line.
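
To make the two modes concrete, here is a toy Python model. It is only a sketch, not how git is implemented: the Repo class, its method names, and the use of fnmatch in place of git's real pathspec matching are all assumptions made for illustration.

```python
# Toy model of the two checkout modes described above.
# Assumptions: Repo, checkout_revision and checkout_paths are hypothetical
# names, and fnmatch stands in for git's real pathspec matching.
import fnmatch


class Repo:
    def __init__(self, commits, branches, head):
        self.commits = commits      # commit id -> {path: content}
        self.branches = branches    # branch name -> commit id
        self.head = head            # a branch name, or a commit id if detached
        tree = commits[self.resolve(head)]
        self.index = dict(tree)     # what is staged
        self.worktree = dict(tree)  # what is on disk

    def resolve(self, rev):
        """Turn a branch name or a commit id into a commit id."""
        return self.branches.get(rev, rev)

    def checkout_revision(self, rev):
        """Mode 1: move HEAD to rev, taking local changes along."""
        old = self.commits[self.resolve(self.head)]
        new = self.commits[self.resolve(rev)]

        def carry(area):
            # Local changes are entries that differ from the old commit.
            local = {p: c for p, c in area.items() if old.get(p) != c}
            for path in local:
                if old.get(path) != new.get(path):
                    raise RuntimeError(f"local change to {path} would be lost")
            return {**new, **local}

        index, worktree = carry(self.index), carry(self.worktree)
        self.head, self.index, self.worktree = rev, index, worktree

    def checkout_paths(self, patterns, rev=None):
        """Mode 2: restore matching paths from rev (or the index); HEAD stays."""
        source = self.index if rev is None else self.commits[self.resolve(rev)]
        for path, content in list(source.items()):
            if any(fnmatch.fnmatch(path, pat) for pat in patterns):
                if rev is not None:
                    self.index[path] = content
                self.worktree[path] = content


repo = Repo(
    commits={"c1": {"a.c": "v1", "README": "r1"},
             "c2": {"a.c": "v1", "README": "r2"}},
    branches={"master": "c1", "topic": "c2"},
    head="master",
)
repo.worktree["a.c"] = "edited"    # local, uncommitted change
repo.checkout_revision("topic")    # mode 1: HEAD moves, the edit comes along
repo.checkout_paths(["*.c"])       # mode 2: a.c is restored from the index
```

In this model the presence or absence of a path pattern is what selects the mode, which mirrors the point made above about how the command line is formed.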
Adam Roben
2007-11-26 06:36:58 UTC
Post by Junio C Hamano
Post by Steven Walter
Heartily seconded. I think checkout is the most egregious of the
* Switch branches
* Create a branch
* Change the state of all files to a particular commit
* Change the state of a particular file to that of the index
* Change the state of a particular file (and index) to a particular
commit
Come on. The second one is just a short-hand side effect for a
commonly used operation, and you do not have to use it or learn it.
I think the overwhelming majority of git users learn `git checkout -b`.
The cases where you do want to switch to a branch you just created seem
far more common than the cases where you don't (particularly for new
users), which is the whole reason the -b option exists in the first
place. So I don't think it's reasonable to say "you can choose not to be
confused by ignoring this incredibly useful command."
Post by Junio C Hamano
Let's clear the confusion. Although it is not bad like the above
"random 5 different operations", checkout does serve 2 quite different
(1) checkout a revision.
(2) checkout selected paths out of a commit (or the index).
Given the above, I'd argue that it serves 3 purposes:

(1) check out a revision
(2) check out selected paths out of a commit (or the index)
(3) start working on a new branch

It's true that (1) and (3) are very closely related, but I think in the
minds of many git users (particularly new ones) they are distinct. (2)
really seems the most out of place here, and has the most potential for
finding a new home (perhaps within git-reset).

-Adam
Carlos Rica
2007-11-26 15:32:29 UTC
Post by Adam Roben
Post by Junio C Hamano
Let's clear the confusion. Although it is not bad like the above
"random 5 different operations", checkout does serve 2 quite different
(1) checkout a revision.
(2) checkout selected paths out of a commit (or the index).
(1) check out a revision
(2) check out selected paths out of a commit (or the index)
(3) start working on a new branch
It's true that (1) and (3) are very closely related, but I think in the
minds of many git users (particularly new ones) they are distinct.
I think this is mostly due to the idea of a branch as a separate box
(like a directory) instead of a line of development, the notion which
comes from thinking of a branch as the place where HEAD is pointing.

Personally, it is always difficult for me to understand git as a whole,
because I'm not sure what the common use case for each command is in
the most-usual-way-of-doing-things when using git, despite having
long and complete documentation for each individual command. The question
is whether we can give the power of git to its users in the way they think,
or how git could teach its users to think in the way it works.

An idea would be to study (and document) the most successful
use cases that git supports and check if it is already providing
unique and/or clear commands for them.

--Carlos
Daniel Barkalow
2007-11-26 16:40:08 UTC
Post by Carlos Rica
Post by Adam Roben
Post by Junio C Hamano
Let's clear the confusion. Although it is not bad like the above
"random 5 different operations", checkout does serve 2 quite different
(1) checkout a revision.
(2) checkout selected paths out of a commit (or the index).
(1) check out a revision
(2) check out selected paths out of a commit (or the index)
(3) start working on a new branch
It's true that (1) and (3) are very closely related, but I think in the
minds of many git users (particularly new ones) they are distinct.
I think this is mostly due to the idea of a branch as a separated box
(like a directory) instead of a line of development like the notion which
comes from thinking in a branch as the place where HEAD is pointing to.
Personally, it is always difficult for me to understand git as a whole,
because I'm not sure what is the common use case for each command in
the most-usual-way-of-doing-the-things when using git, despite of having
long and complete documentation for each individual command. The question
is if we can give the power of git to their users in the same way they think,
or how git could be able to teach their users to think in the way it works.
An idea would be to study (and document) the most successful
use cases that git supports and check if it is already providing
unique and/or clear commands for them.
I think that part of git's oddity comes from the fact that the UI is
organized around use cases rather than commands. That is, for each thing
that people commonly do, the sequence of commands is as short as possible
and each of the names makes sense in the context of this sequence. But
then the commands and options, in the list of commands and options outside
of the context of a use case, don't make any sense as a whole.

It's like trying to document the "take" command in a text adventure, where
"take [noun]" means to pick it up, "take off [noun]" means to remove it as
clothing, and "take off" means to leave.

There's a set of primitive git operations, but the git commands aren't
those; the git command schemas (not just the "command" part, but the type
of arguments following it) are semi-natural-language interfaces to
collections of primitive operations, and are set up to have a core "what
the user is saying to do" and all of the reasonable analogous extensions
to that. This means that the very same result can often be reached with
multiple entirely different commands, because there are different ways of
conceptualizing what you're doing that overlap. (E.g., "git checkout HEAD
." will check out the current directory from the current branch,
discarding local changes; "git reset --hard HEAD" will move the current
branch to its current state, bringing the working copy in line with it as
well; both of these have the effect of discarding all local changes while
keeping the branch state the same, but that's just because the aspects of
the two operations which are different don't matter with those particular
arguments)
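
The overlap described above can be sketched with a toy Python model. This is an illustration under stated assumptions, not git's implementation: the dictionary layout and the function names checkout_paths_from and reset_hard are hypothetical, and only modified tracked files are modelled.

```python
# Toy model: a repository is a dict holding HEAD (a branch name), branch
# tips, commit snapshots, an index and a work tree. All names are hypothetical.

def resolve(repo, rev):
    """Turn "HEAD", a branch name or a commit id into a commit id."""
    if rev == "HEAD":
        return repo["branches"][repo["head"]]
    return repo["branches"].get(rev, rev)


def checkout_paths_from(repo, rev):
    """Roughly `git checkout <rev> .`: copy the paths, leave the branch alone."""
    tree = repo["commits"][resolve(repo, rev)]
    repo["index"].update(tree)
    repo["worktree"].update(tree)


def reset_hard(repo, rev):
    """Roughly `git reset --hard <rev>`: move the branch, then match it."""
    commit_id = resolve(repo, rev)
    repo["branches"][repo["head"]] = commit_id
    repo["index"] = dict(repo["commits"][commit_id])
    repo["worktree"] = dict(repo["commits"][commit_id])


def fresh():
    return {
        "head": "master",
        "branches": {"master": "c2"},
        "commits": {"c1": {"f.txt": "old"}, "c2": {"f.txt": "new"}},
        "index": {"f.txt": "new"},
        "worktree": {"f.txt": "scribbled"},  # uncommitted local change
    }


a, b = fresh(), fresh()
checkout_paths_from(a, "HEAD")
reset_hard(b, "HEAD")
same_for_head = (a == b)   # both discard the local change, branch untouched

c, d = fresh(), fresh()
checkout_paths_from(c, "c1")
reset_hard(d, "c1")
same_for_c1 = (c == d)     # they differ: only reset moved master to c1
```

With HEAD as the argument the two operations end in the same state; with any other commit the part that differs (moving the branch) becomes visible.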

I think git's UI design is, by and large, very good, but I'm not sure how
to document it so as to make it easy to learn, aside from giving a quick
explanation of how to use reflogs to recover from mistakes and telling
users to just try stuff in their local repository.

-Daniel
*This .sig left intentionally blank*
Andy Parkins
2007-11-26 16:46:00 UTC
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Erm... (it's much harder to come up with lists like these lately :-))

- "index", "cached" and "stage" are a definite source of confusion
- "git add" and "git rm" would be nicer as "git stage" and "git unstage"
(or something similar)
- libgit would have come first
- "git revert" should be called "git invert"
- "git revert" would (maybe) be "git reset"
- "git clone" wouldn't exist
- "git-gui" would be written in Qt (ducks)
- git-apply et al wouldn't be a disaster when the log message contains a
diff (change to git diff format?)
- empty directories in the repository (ducks again)



Andy
--
Dr Andy Parkins, M Eng (hons), MIET
***@gmail.com
Benoit Sigoure
2007-11-26 17:10:10 UTC
Post by Andy Parkins
- libgit would have come first
I warmly second that.
Post by Andy Parkins
- "git revert" should be called "git invert"
- "git revert" would (maybe) be "git reset"
But here, I have to disagree. Why would you want to call
"git-revert" "git-reset"?

I know it's annoying that commands with the same name do different
things in SVN/CVS but I don't think it's a reason to necessarily
adapt to them. There are plenty of misnomers already anyway
(checkout, commit, add).

While we're discussing bad names, as someone already pointed out, I
agree it's sad that "git push" is almost always understood as being
the opposite of "git pull".
--
Benoit Sigoure aka Tsuna
EPITA Research and Development Laboratory
Jan Hudec
2007-11-26 18:56:00 UTC
While we're discussing bad names, as someone already pointed out, I agree
it's sad that "git push" is almost always understood as being the opposite
of "git pull".
Well, it is an opposite of pull. Compared to it, it is limited in that it will
not do a merge and on the other hand extended to *also* be an opposite of
fetch, but still the opposite of pull is push.
--
Jan 'Bulb' Hudec <***@ucw.cz>
David Kastrup
2007-11-26 19:12:37 UTC
Post by Jan Hudec
While we're discussing bad names, as someone already pointed out, I agree
it's sad that "git push" is almost always understood as being the opposite
of "git pull".
Well, it is an opposite of pull. Compared to it, it is limited in that it will
not do a merge and on the other hand extended to *also* be an opposite of
fetch, but still the opposite of pull is push.
With the same reasoning the opposite of a duck is a lobster, since a
lobster has not only fewer wings, but also more legs.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
Jan Hudec
2007-11-26 19:34:55 UTC
Post by David Kastrup
Post by Jan Hudec
While we're discussing bad names, as someone already pointed out, I agree
it's sad that "git push" is almost always understood as being the opposite
of "git pull".
Well, it is an opposite of pull. Compared to it, it is limited in that it will
not do a merge and on the other hand extended to *also* be an opposite of
fetch, but still the opposite of pull is push.
With the same reasoning the opposite of a duck is a lobster, since a
lobster has not only fewer wings, but also more legs.
No.

The basic pull/push actions are:

git pull: Bring the remote ref value here.
git push: Put the local ref value there.

Are those not opposites?

Then each command has its different features on top of this -- pull merges
and push can push multiple refs -- but in the basic operation they are
opposites.
--
Jan 'Bulb' Hudec <***@ucw.cz>
Michael Poole
2007-11-26 19:50:35 UTC
Post by Jan Hudec
git pull: Bring the remote ref value here.
git push: Put the local ref value there.
Are those not opposites?
Then each command has its different features on top of this -- pull merges
and push can push multiple refs -- but in the basic operation they are
opposites.
I think that is in absolute agreement with David: Ducks swim on the
surface of the water and lobsters swim underneath. Why consider the
different features on top of where they swim?

The thing about git-pull that surprises so many users is the merge.
There's a separate command to do that step, and git-pull had a fairly
good excuse to do the merge before git's 1.5.x remote system was in
place, but now the only really defensible reason for its behavior is
history.
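
The relationship between fetch, merge, pull and push discussed in this subthread can be sketched as a toy Python model. It is an illustration only: the Repo class, the "origin/" prefix handling and the merge-commit naming are assumptions, not git's code.

```python
# Toy model of refs and history; every name here is hypothetical.

class Repo:
    def __init__(self, parents, refs):
        self.parents = dict(parents)  # commit id -> list of parent ids
        self.refs = dict(refs)        # ref name -> commit id


def is_ancestor(parents, a, b):
    """True if commit a is reachable from commit b."""
    stack, seen = [b], set()
    while stack:
        commit = stack.pop()
        if commit == a:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


def fetch(local, remote, branch):
    """Bring the remote ref value here, into a remote-tracking ref only."""
    local.parents.update(remote.parents)
    local.refs["origin/" + branch] = remote.refs[branch]


def merge(local, branch, other_ref):
    ours, theirs = local.refs[branch], local.refs[other_ref]
    if is_ancestor(local.parents, theirs, ours):
        return                                  # already up to date
    if is_ancestor(local.parents, ours, theirs):
        local.refs[branch] = theirs             # fast-forward
        return
    merged = f"merge({ours},{theirs})"          # a new commit with two parents
    local.parents[merged] = [ours, theirs]
    local.refs[branch] = merged


def pull(local, remote, branch):
    """pull is fetch followed by merge."""
    fetch(local, remote, branch)
    merge(local, branch, "origin/" + branch)


def push(local, remote, branch):
    """Put the local ref value there; never merges, refuses non-fast-forwards."""
    theirs = remote.refs.get(branch)
    ours = local.refs[branch]
    if theirs is not None and not is_ancestor(local.parents, theirs, ours):
        raise RuntimeError("non-fast-forward update rejected")
    remote.parents.update(local.parents)
    remote.refs[branch] = ours


remote = Repo({"A": [], "B": ["A"]}, {"master": "B"})
local = Repo({"A": [], "C": ["A"]}, {"master": "C", "origin/master": "A"})

fetch(local, remote, "master")   # only origin/master moves; master stays at C
after_fetch = dict(local.refs)
pull(local, remote, "master")    # fetch again, then merge B into master
push(local, remote, "master")    # the remote master now points at the merge
```

In this sketch fetch alone never touches the local master; only the merge step inside pull does, which is the step that tends to surprise people.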

Michael Poole
Jan Hudec
2007-11-26 20:09:13 UTC
Post by Michael Poole
Post by Jan Hudec
git pull: Bring the remote ref value here.
git push: Put the local ref value there.
Are those not opposites?
Then each command has its different features on top of this -- pull merges
and push can push multiple refs -- but in the basic operation they are
opposites.
I think that is in absolute agreement with David: Ducks swim on the
surface of the water and lobsters swim underneath. Why consider the
different features on top of where they swim?
The thing about git-pull that surprises so many users is the merge.
There's a separate command to do that step, and git-pull had a fairly
good excuse to do the merge before git's 1.5.x remote system was in
place, but now the only really defensible reason for its behavior is
history.
When I first looked at hg -- and that was long before I looked at git --
I was surprised that their pull did NOT merge and you had to do a separate
step. Partly because doing those two steps is quite common.
--
Jan 'Bulb' Hudec <***@ucw.cz>
Michael Poole
2007-11-26 20:31:19 UTC
Post by Jan Hudec
Post by Michael Poole
Post by Jan Hudec
git pull: Bring the remote ref value here.
git push: Put the local ref value there.
Are those not opposites?
Then each command has its different features on top of this -- pull merges
and push can push multiple refs -- but in the basic operation they are
opposites.
I think that is in absolute agreement with David: Ducks swim on the
surface of the water and lobsters swim underneath. Why consider the
different features on top of where they swim?
The thing about git-pull that surprises so many users is the merge.
There's a separate command to do that step, and git-pull had a fairly
good excuse to do the merge before git's 1.5.x remote system was in
place, but now the only really defensible reason for its behavior is
history.
When I first looked at hg -- and that was long before I looked at git --
I was surprised that their pull did NOT merge and you had to do a separate
step. Partly because doing those two steps is quite common.
Frequency of use is a good argument for having one command that does
both. It is not a good argument that "fetch, then merge" should be
called "pull" or is the opposite of "push".

Michael Poole
Jon Smirl
2007-11-26 20:48:13 UTC
Post by Michael Poole
Post by Jan Hudec
Post by Michael Poole
Post by Jan Hudec
git pull: Bring the remote ref value here.
git push: Put the local ref value there.
Are those not opposites?
Then each command has its different features on top of this -- pull merges
and push can push multiple refs -- but in the basic operation they are
opposites.
I think that is in absolute agreement with David: Ducks swim on the
surface of the water and lobsters swim underneath. Why consider the
different features on top of where they swim?
The thing about git-pull that surprises so many users is the merge.
There's a separate command to do that step, and git-pull had a fairly
good excuse to do the merge before git's 1.5.x remote system was in
place, but now the only really defensible reason for its behavior is
history.
When I first looked at hg -- and that was long before I looked at git --
I was surprised that their pull did NOT merge and you had to do a separate
step. Partly because doing those two steps is quite common.
Frequency of use is a good argument for having one command that does
both. It is not a good argument that "fetch, then merge" should be
called "pull" or is the opposite of "push".
I'm starting to think that things oriented around the default names of
master and origin need rethinking. Everything should use explicitly
named remotes. You could always do something like set a default remote
repository, but that is different than using the magic name 'origin'.
--
Jon Smirl
***@gmail.com
Marco Costalba
2007-11-26 19:25:07 UTC
Post by Andy Parkins
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Erm... (it's much harder to come with lists like these lately :-))
- "git-gui" would be written in Qt (ducks)
But...wait...Qt would require...(I'm scared to say!)... that awful,
painful, hopeless thing called C++. Probably you didn't mean what you
said ;-)


Marco
Shawn O. Pearce
2007-11-27 01:20:13 UTC
Post by Marco Costalba
Post by Andy Parkins
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
- "git-gui" would be written in Qt (ducks)
But...wait...Qt would require...(I'm scared to say!)... that awful,
painful, hopeless thing called C++. Probably you didn't mean what you
said ;-)
Heh.

I'll never port git-gui to Qt. Because of that awful, painful
thing called C++ that it uses. I despise C++. No, please don't
start a C++ language war again on the list. :-)


I recently considered porting git-gui to XUL, as nobody has ever
said "Firefox isn't native enough on my OS!". It also (maybe) has
the benefit of having a large developer base (everyone and their
dog has coded in HTML and Javascript before, except maybe Linus).

But XUL doesn't support launching a process and connecting pipes
to its stdin and stdout. I started to try and create an XPCOM
extension to provide that functionality from NSPR and started to
run into major problems compiling the XPCOM plugin, getting the
necessary interfaces implemented, etc.

In the end I was able to recreate the bulk of the main git-gui UI in
XUL in just an hour or so, but spent days trying to just do a basic
thing like "git diff-index --cached -z HEAD" and consume the result.
I never even got that to work so I just gave up on the idea.
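
For comparison, "consuming the result" of a command like "git diff-index --cached -z HEAD" is short once the bytes are in hand. The sketch below parses the NUL-separated raw diff format in Python; the function name parse_raw_z and the sample bytes (with made-up object ids) are assumptions for illustration, and no git process is launched here.

```python
# Sketch: parse the raw, NUL-separated output of `git diff-index -z`.
# parse_raw_z is a hypothetical helper; the sample data below is made up.
import os


def parse_raw_z(data):
    fields = data.split(b"\x00")
    if fields and fields[-1] == b"":
        fields.pop()                      # drop the piece after the final NUL
    entries, i = [], 0
    while i < len(fields):
        meta = fields[i].decode("ascii")
        if not meta.startswith(":"):
            raise ValueError(f"unexpected record header: {meta!r}")
        src_mode, dst_mode, src_id, dst_id, status = meta[1:].split(" ")
        # Renames and copies carry two paths (source, destination).
        n_paths = 2 if status[0] in "RC" else 1
        paths = [os.fsdecode(p) for p in fields[i + 1:i + 1 + n_paths]]
        entries.append({
            "src_mode": src_mode, "dst_mode": dst_mode,
            "src_id": src_id, "dst_id": dst_id,
            "status": status, "paths": paths,
        })
        i += 1 + n_paths
    return entries


sample = (
    b":100644 100644 " + b"a" * 40 + b" " + b"b" * 40 + b" M\x00"
    + b"src/main.c\x00"
    + b":100644 100644 " + b"c" * 40 + b" " + b"d" * 40 + b" R100\x00"
    + b"old name.txt\x00new name.txt\x00"
)
entries = parse_raw_z(sample)
```

In a real program the bytes would come from the standard output of the git process, for instance via subprocess.run(["git", "diff-index", "--cached", "-z", "HEAD"], capture_output=True).stdout; obtaining that pipe is exactly the part that was hard from XUL.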


So git-gui is in Tcl/Tk for the long-term. However I'm going
to try and port git-gui over to the Tcl/Tk 8.5 "tiles" extension
(if it is available on your system) so we can get better looking
native widgets. I'll still fall back to the old style widgets for
Tcl/Tk 8.4 so existing users aren't forced to upgrade to 8.5 just
to use the latest git-gui. (But really, 8.5 isn't that hard to
build and install...)
--
Shawn.
Jakub Narebski
2007-11-27 01:46:23 UTC
Shawn O. Pearce wrote:

[git-gui in XUL]
Post by Shawn O. Pearce
But XUL doesn't support launching a process and connecting pipes
to its stdin and stdout. I started to try and create an XPCOM
extension to provide that functionality from NSPR and started to
run into major problems compiling the XPCOM plugin, getting the
necessary interfaces implemented, etc.
What about Ajax / Comet support in XUL? Can this be used for that?
(Just a [perhaps stupid] idea).

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
Shawn O. Pearce
2007-11-27 01:58:33 UTC
Post by Jakub Narebski
[git-gui in XUL]
Post by Shawn O. Pearce
But XUL doesn't support launching a process and connecting pipes
to its stdin and stdout. I started to try and create an XPCOM
extension to provide that functionality from NSPR and started to
run into major problems compiling the XPCOM plugin, getting the
necessary interfaces implemented, etc.
What about Ajax / Comet support in XUL, Can this be used for that?
(Just an [perhaps stupid] idea).
Yes, XUL fully supports AJAX. If it didn't Google Maps and its
cool interface wouldn't exist. :-)

The problem there is that AJAX requires HTTP. So I'd have to
create a "micro HTTP server" that runs on the loopback interface
and listens for HTTP requests from the GUI, parses them, runs the
necessary Git action, then sends the results back to the GUI.

Sort of ugly.

My bigger concern is also for a shared machine; how do I secure
the HTTP server so only the git-gui process that is supposed to
be using it is able to access it? I guess I could create a 600
~/.gitguicookie file or some such entity and throw random data into
it to initialize it. That's basically all xauth is doing.
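
A minimal Python sketch of that cookie idea, assuming a POSIX system where file modes are honoured: the helper names create_cookie, read_cookie and is_authorized are hypothetical, and a temporary directory stands in for the home directory.

```python
# Sketch of a per-user cookie file, in the spirit of xauth.
# All helper names are hypothetical; POSIX permission semantics are assumed.
import hmac
import os
import secrets
import tempfile


def create_cookie(path):
    """Create path with mode 0600 and fill it with a random token."""
    token = secrets.token_hex(32)
    # O_EXCL refuses to reuse an existing file; the mode applies at creation.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    return token


def read_cookie(path):
    """Only a process able to read the file (the same user) gets the token."""
    with open(path) as f:
        return f.read().strip()


def is_authorized(presented, expected):
    """Constant-time comparison of the presented token with the real one."""
    return hmac.compare_digest(presented.encode(), expected.encode())


with tempfile.TemporaryDirectory() as home:
    cookie_path = os.path.join(home, ".gitguicookie")
    server_token = create_cookie(cookie_path)   # server side, at startup
    client_token = read_cookie(cookie_path)     # GUI side, same user
    mode = os.stat(cookie_path).st_mode & 0o777
    accepted = is_authorized(client_token, server_token)
    rejected = is_authorized("guess", server_token)
```

The loopback server would then require the token on every request, so another user on the shared machine who can reach the port but cannot read the file is turned away.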


Actually I might revisit this XUL concept using an HTTP server and
AJAX. I could actually link the damn HTTP server against libgit.a
(Junio will hate me). If the server dies XUL can notice it and
simply restart it. But there's a whole suite of actions that I
can run through the internal APIs with high chances of success,
and a lot quicker than forking the corresponding plumbing process,
especially on fork challenged machines like Windows.

--
Shawn.
Johannes Schindelin
2007-11-27 11:39:32 UTC
Hi,
Actually I might revisit this XUL concept using an HTTP server and AJAX.
I could actually link the damn HTTP server against libgit.a (Junio will
hate me). If the server dies XUL can notice it and simply restart it.
But if you can restart the HTTP server via XUL, you can start other git
programs directly.

What you'd have to do is (urgh) write a wrapper via start_command()
which would recognize that the second process die()d.

All in all, I think if you want to switch from Tcl/Tk to another language
for git-gui, for the sake of attracting more developers, it might be wiser
to go Java than XUL.

Ciao,
Dscho
Jakub Narebski
2007-11-27 23:59:41 UTC
Post by Johannes Schindelin
Actually I might revisit this XUL concept using an HTTP server and AJAX.
I could actually link the damn HTTP server against libgit.a (Junio will
hate me). If the server dies XUL can notice it and simply restart it.
But if you can restart the HTTP server via XUL, you can start other git
programs directly.
What you'd have to do is (urgh) write a wrapper via start_command()
which would recognize that the second process die()d.
All in all, I think if you want to switch from Tcl/Tk to another language
for git-gui, for the sake of attracting more developers, it might be wiser
to go Java than XUL.
Won't we run into the same problems as egit/jgit?

----
This is a proposed set of questions for a git-gui mini survey...

1. What language and what toolkit should git-gui be written in?
(single choice)

a. Tcl/Tk (current implementation)
b. C++/Qt
c. C/GTK+
d. Python (native)
e. Python/PyQt
f. Python/PyGTK
g. Ruby
h. Java/Swing
i. Java/SWT
j. XUL+JavaScript+CSS/XULRunner
k. other
l. no opinion

2. If you have chosen "other" in question above, what language and
toolkit should it be? C/XForms? C#/Mono? C/wxWidgets? XAML+Silverlight?
GTK2-Perl? C/OpenGL? ;-)

3. Do you contribute to git-gui?
Yes/No

4. If git-gui used another language/toolkit, would you contribute?
Yes/No

5. What languages and what toolkits are you proficient with (to send
patches)?
(multiple choice)

a. Tcl/Tk (current implementation)
b. C++/Qt
c. C/GTK+
d. Python (native)
e. Python/PyQt
f. Python/PyGTK
g. Ruby
h. Java/Swing
i. Java/SWT
j. XUL+JavaScript+CSS/XULRunner
k. other
l. N/A

6. What other?
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
Johannes Schindelin
2007-11-28 12:32:10 UTC
Hi,
Post by Jakub Narebski
Post by Johannes Schindelin
Post by Shawn O. Pearce
Actually I might revisit this XUL concept using an HTTP server and
AJAX. I could actually link the damn HTTP server against libgit.a
(Junio will hate me). If the server dies XUL can notice it and
simply restart it.
But if you can restart the HTTP server via XUL, you can start other
git programs directly.
What you'd have to do is (urgh) write a wrapper via start_command()
which would recognize that the second process die()d.
All in all, I think if you want to switch from Tcl/Tk to another
language for git-gui, for the sake of attracting more developers, it
might be wiser to go Java than XUL.
Won't we run into the same problems as egit/jgit?
My idea was not to get the same problems, but to use jgit. After all,
Shawn made a point of separating the two.
Post by Jakub Narebski
----
This is proposed set of questions for git-gui mini survey...
1. What language and what toolkit should git-gui be written in?
(single choice)
a. Tcl/Tk (current implementation)
b. C++/Qt
c. C/GTK+
d. Python (native)
e. Python/PyQt
f. Python/PyGTK
g. Ruby
h. Java/Swing
i. Java/SWT
j. XUL+JavaScript+CSS/XULRunner
k. other
l. no opinion
I am pretty comfortable with a), but rather than go [b-gi-l] I would
prefer h).
Post by Jakub Narebski
3. Do you contribute to git-gui?
Yes/No
Yes (sort of; not half as much as I'd like to.)
Post by Jakub Narebski
4. If git-gui would use other language/toolkit, would you contribute?
Yes/No
Yes, as long as it is a language/toolkit that is available on all
platforms that I (have to) work. That pretty much excludes C# and Python
as a language.
Post by Jakub Narebski
5. What languages and what toolkits you are proficient with (to send
patches)?
(multiple choice)
a. Tcl/Tk (current implementation)
b. C++/Qt
c. C/GTK+
d. Python (native)
e. Python/PyQt
f. Python/PyGTK
g. Ruby
h. Java/Swing
i. Java/SWT
j. XUL+JavaScript+CSS/XULRunner
k. other
l. N/A
[abchk]
Post by Jakub Narebski
6. What other?
Personally, I am quite comfortable with the existing implementation, and
IMHO people dismiss contributing to git-gui too easily; Tcl is not all
that complicated, and it is not hard at all to change/imitate existing
code.

Ciao,
Dscho
Jason Sewall
2007-11-28 15:48:13 UTC
Post by Johannes Schindelin
Hi,
Post by Jakub Narebski
1. What language and what toolkit should git-gui be written in?
(single choice)
a. Tcl/Tk (current implementation)
b. C++/Qt
c. C/GTK+
d. Python (native)
e. Python/PyQt
f. Python/PyGTK
g. Ruby
h. Java/Swing
i. Java/SWT
j. XUL+JavaScript+CSS/XULRunner
k. other
l. no opinion
Since we're listing off a bunch of toolkits, I should pitch FLTK,
which is well-supported across platforms, reasonably featured, and
pretty lightweight (probably much smaller than any of the other ones
listed, in terms of dependency installs)

That said...
Post by Johannes Schindelin
Personally, I am quite comfortable with the existing implementation, and
IMHO people dismiss contributing to git-gui too easily; Tcl is not all
that complicated, and it is not hard at all to change/imitate existing
code.
Agreed. I don't know much about Tcl/Tk, but I think that git-gui is
fine as-is. It's not very "pretty" compared to all of the fancy Gtk
apps that make up my system, but that's not an obstacle for me. (The
fonts are pretty bad, though)

Jason
Jan Hudec
2007-11-28 23:25:23 UTC
Post by Johannes Schindelin
Post by Jakub Narebski
4. If git-gui would use other language/toolkit, would you contribute?
Yes/No
Yes, as long as it is a language/toolkit that is available on all
platforms that I (have to) work. That pretty much excludes C# and Python
as a language.
Out of interest, where does neither of those two work and Qt and tcl/tk do?
Mono and python both seem to be quite portable.
--
Jan 'Bulb' Hudec <***@ucw.cz>
Johannes Schindelin
2007-11-28 23:48:12 UTC
Hi,
Post by Jan Hudec
Post by Johannes Schindelin
Post by Jakub Narebski
4. If git-gui would use other language/toolkit, would you
contribute?
Yes/No
Yes, as long as it is a language/toolkit that is available on all
platforms that I (have to) work. That pretty much excludes C# and
Python as a language.
Out of interest, where does neither of those two work and Qt and tcl/tk do?
Mono and python both seem to be quite portable.
IRIX (an ancient one).

Besides, Mono is darned slow. Even Tcl/Tk is faster.

Furthermore, my complaint was not about a platform where neither C# nor
Python work. That is irrelevant. If you have one platform where only one
works, and another platform where only the other works, you cannot have a
single program for both platforms. Right?

Hth,
Dscho
Jan Hudec
2007-11-29 06:57:06 UTC
Permalink
Post by Johannes Schindelin
Hi,
Post by Jan Hudec
Post by Johannes Schindelin
Post by Jakub Narebski
4. If git-gui would use other language/toolkit, would you contribute?
Yes/No
Yes, as long as it is a language/toolkit that is available on all
platforms that I (have to) work. That pretty much excludes C# and
Python as a language.
Out of interest, where does neither of those two work and Qt and tcl/tk do?
Mono and python both seem to be quite portable.
IRIX (an ancient one).
Besides, Mono is darned slow. Even Tcl/Tk is faster.
On the shootout Mono seems to be an order of magnitude faster in most tests.
But maybe they are performing very poorly on some strange platform where they
don't have JIT.
Post by Johannes Schindelin
Furthermore, my complaint was not about a platform where neither C# nor
Python work. That is irrelevant. If you have one platform where only one
works, and another platform where only the other works, you cannot have a
single program for both platforms. Right?
Right.

I probably shouldn't be surprised that mono does not work on older unices,
but I am a bit surprised python does not.
--
Jan 'Bulb' Hudec <***@ucw.cz>
Johannes Schindelin
2007-11-29 12:01:47 UTC
Permalink
Hi,
Post by Nicolas Pitre
Post by Johannes Schindelin
Furthermore, my complaint was not about a platform where neither C#
nor Python work. That is irrelevant. If you have one platform where
only one works, and another platform where only the other works, you
cannot have a single program for both platforms. Right?
Right.
I probably shouldn't be surprised that mono does not work on older unices,
but I am a bit surprised python does not.
*Sigh* I managed again to make myself misunderstood.

Even if newer Python does not easily compile on that IRIX, I have an old
Python there (2.2). But I don't have any Python on MSys. (Yes, there is
a _MinGW_ port, but no _MSys_ one.) So for me, Python is out.

Hth,
Dscho
Jan Hudec
2007-11-30 17:50:18 UTC
Permalink
Post by Johannes Schindelin
Hi,
Post by Nicolas Pitre
Post by Johannes Schindelin
Furthermore, my complaint was not about a platform where neither C#
nor Python work. That is irrelevant. If you have one platform where
only one works, and another platform where only the other works, you
cannot have a single program for both platforms. Right?
Right.
I probably shouldn't be surprised that mono does not work on older unices,
but I am a bit surprised python does not.
*Sigh* I managed again to make myself misunderstood.
Even if newer Python does not easily compile on that IRIX, I have an old
Python there (2.2). But I don't have any Python on MSys. (Yes, there is
a _MinGW_ port, but no _MSys_ one.) So for me, Python is out.
It would be a problem, but is it really fatal? AFAIK MSys uses unixy
paths inside the program, but accepts arguments and calls other processes
using the windows convention, so mingw python should have no problem calling
msys programs and vice versa. It might be more problematic to compile
a shared module for it, but .dlls are quite well isolated, so even compiling
a plugin linked with msys for mingw python might not be impossible.

Nevertheless, I actually think git-gui is quite good in Tcl/Tk, and rewriting
it in Python or any other language would probably not help it in any way.
--
Jan 'Bulb' Hudec <***@ucw.cz>
Marco Costalba
2007-11-30 18:25:27 UTC
Permalink
Post by Jan Hudec
Nevertheless, I actually think git-gui is quite good in Tcl/Tk, and rewriting
it in Python or any other language would probably not help it in any way.
A little provocation: I've never seen in open source a discussion on
what language to use for an application and then the development of
the application from scratch.

What I see daily instead is the effort of one person (or a very small number
of people) to develop something in the language they chose, and then,
_after_ some code has been produced, the effort is embraced by other
people who join the project.

Some nearby examples? gitk, gitweb, stgit, and git itself, especially the
shell parts (why should shell be a better prototyping language than
other prototyping languages? portability? ease of learning? performance?
library support? syntax? probably none of the above in general
terms).

I would say this thread, although very interesting from a learning
point of view, is a little bit academic.


Marco
Shawn O. Pearce
2007-12-01 02:35:20 UTC
Permalink
Post by Jan Hudec
Nevertheless, I actually think git-gui is quite good in Tcl/Tk, and rewriting
it in Python or any other language would probably not help it in any way.
UNIX (really X11) users think git-gui looks like cr*p on their
systems as Tk draws with 1980s widgets, not 2007 style widgets.
They have every right to complain about the look and feel of the
application; it's utter crap. Tk 8.5's tiles extension may help
that, but I haven't tried.

On Windows 2000/XP and Mac OS X I think I've gotten git-gui to
(almost) fit into the rest of the desktop. It fits into the Windows
UI better than it does Mac OS X; there are still some rough edges
where it is really obvious it's not a native Mac OS X application.

On all platforms Tk has some "features" that are less than desirable.
For example it has been an absolute nightmare to get split pane
divider things to work on all systems. I can't tell you how many
days I spent just getting the main window to not react stupidly
on each system. And it *still* doesn't act right everywhere.
Sometimes if you resize the window the status bar on the bottom
disappears and Tk just clips it right out of the UI (no, I didn't
ask it to do that, Tk has bugs).

Building context sensitive menus isn't fun. Managing some data
structures in Tcl isn't fun. The list of why I'm currently unhappy
with Tcl/Tk for git-gui is actually pretty long.
--
Shawn.
Marco Costalba
2007-12-01 02:53:26 UTC
Permalink
Post by Shawn O. Pearce
Building context sensitive menus isn't fun. Managing some data
structures in Tcl isn't fun. The list of why I'm currently unhappy
with Tcl/Tk for git-gui is actually pretty long.
Not to advertise, just my two cents, but Qt, with whatever language
binding you want to use, is really powerful and easy to learn, and its
documentation is great. GUI forms are easy to create; you don't even
need to program, because Qt Designer lets you create a form
graphically. The result is an XML-like file that a Qt tool called UIC
transforms into a compilable file.

Qt library is consistent and complete and very portable, especially
Qt4 works and installs under different OS with no hassles. And the Qt
community (http://www.qtcentre.org/forum/) is very helpful and
supportive.

I really don't want to advertise, but after reading your list of
Tcl/Tk cons I was not able to stay quiet.

Marco
Sergei Organov
2007-11-28 13:18:09 UTC
Permalink
Jakub Narebski <***@gmail.com> writes:

[...]
Post by Jakub Narebski
This is proposed set of questions for git-gui mini survey...
1. What language and what toolkit should git-gui be written in?
(single choice)
a. Tcl/Tk (current implementation)
b. C++/Qt
c. C/GTK+
d. Python (native)
What's this? Tkinter? If so, it's better to be spelled "Python/Tk" here,
and probably should be removed anyway as there is no apparent reason to
re-implement current Tcl/Tk in Python/Tk.

Anyway, for Python as a language, a realistic choice of GUI seems to be
between PyGtk, PyQt, and WxPython.
--
Sergei.
Andy Parkins
2007-11-27 08:45:26 UTC
Permalink
Post by Marco Costalba
But...wait...Qt would require...(I'm scared to say!)... that awful,
painful, hopeless thing called C++. Probably you didn't mean what you
said ;-)
Actually although I like C++, that's not the reason, the reason is that Qt
is a significantly (IMHO) better toolkit than Tk. It's more cross platform
and looks a lot nicer. The fact that it's C++ is neither here nor there.

Personally I find these language wars a bit distasteful; to me programming
is programming - the language is a purely secondary point.



Andy
--
Dr Andy Parkins, M Eng (hons), MIET
***@gmail.com
Marco Costalba
2007-11-27 13:15:21 UTC
Permalink
Post by Andy Parkins
Post by Marco Costalba
But...wait...Qt would require...(I'm scared to say!)... that awful,
painful, hopeless thing called C++. Probably you didn't mean what you
said ;-)
Actually although I like C++, that's not the reason, the reason is that Qt
is a significantly (IMHO) better toolkit than Tk. It's more cross platform
and looks a lot nicer. The fact that it's C++ is neither here nor there.
Actually, there exist Python bindings for Qt, if you prefer.

I was just joking about C++; I never meant to start a "language war",
which I personally consider quite useless and a great pity.


Marco
Jan Hudec
2007-11-27 23:56:57 UTC
Permalink
Post by Marco Costalba
Post by Andy Parkins
Post by Marco Costalba
But...wait...Qt would require...(I'm scared to say!)... that awful,
painful, hopeless thing called C++. Probably you didn't mean what you
said ;-)
Actually although I like C++, that's not the reason, the reason is that Qt
is a significantly (IMHO) better toolkit than Tk. It's more cross platform
and looks a lot nicer. The fact that it's C++ is neither here nor there.
Actually, there exist Python bindings for Qt, if you prefer.
I tried to write something in them and got a bit burned. Qt has its own
idea of memory management (delete children with parent) and the bindings
don't protect from accessing pointers to objects deleted this way, which can
cause rather hard to debug crashes.

Gtk seems to be much better for use from various scripting languages.
--
Jan 'Bulb' Hudec <***@ucw.cz>
Johannes Schindelin
2007-11-27 17:48:16 UTC
Permalink
Hi,
Post by Andy Parkins
Actually although I like C++, that's not the reason, the reason is that Qt
is a significantly (IMHO) better toolkit than Tk. It's more cross platform
and looks a lot nicer.
Tcl/Tk was easier to install on a lot more platforms in my life than Qt.

Ciao,
Dscho
Andy Parkins
2007-12-04 11:00:42 UTC
Permalink
Post by Johannes Schindelin
Tcl/Tk was easier to install on a lot more platforms in my life than Qt.
I wasn't really thinking of the install; that's a packaging problem. I was
speaking of the toolkit itself. I know what you mean, but I wasn't even
thinking of cross-platform in a "number of places it can run" sense. What
I meant (although my point is irrelevant and way off the original question)
was the facilities available in the toolkit with a cross-platform
interface.

Qt puts a common face on threading, process control, networking, file
systems, internationalisation, rendering, openGL, and of course the GUI
itself. Tcl/Tk (to take the most wicked example) gives you applications
that are much harder to make run on Windows than on UNIX.

Anyway, I don't want to sound like a strange Qt fan boy; the above is simply
my justification for putting "git-gui in Qt" on my wish list.



Andy
--
Dr Andy Parkins, M Eng (hons), MIET
***@gmail.com
Jing Xue
2007-11-27 17:33:46 UTC
Permalink
Post by Andy Parkins
- "index", "cached" and "stage" are a definite source of confusion
Hear, hear.
Post by Andy Parkins
- "git add" and "git rm" would be nicer as "git stage" and "git unstage"
(or something similar)
Not sure it would be that easy. (As I have just learned recently) "git
rm" is the opposite of "git add" _only_ in the case of
files-not-previously-tracked. And the opposite of "git add <file>" for
files-already-being-tracked is "git reset HEAD -- <file>", which is
probably where you were going with "git unstage" 8-) .
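
The "unstage" idea above can be sketched as a small shell function. This is
only an illustration: "unstage" is a made-up name, not an existing git
command, and the sketch assumes the repository already has at least one
commit so that HEAD resolves.

```shell
#!/bin/sh
# Hypothetical helper: 'unstage' is not a git command, just a wrapper
# around the reset form quoted above.
# It resets the index entries for the given paths to what HEAD records
# and leaves the working tree files untouched.
unstage() {
    git reset -q HEAD -- "$@"
}
```

After "git add file", running "unstage file" takes the change back out of
the index while keeping the edited file as it is on disk.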
Post by Andy Parkins
- libgit would have come first
- "git revert" should be called "git invert"
- "git revert" would (maybe) be "git reset"
- "git clone" wouldn't exist
Why? AFAIC, git clone works out quite well - both functionality and
naming wise.

Cheers.
--
Jing Xue
Jon Smirl
2007-11-26 16:48:20 UTC
Permalink
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
I would sit down and carefully design the command syntax. git's
biggest criticism is that it is hard to use and this is mainly caused
by the seemingly very complex commands. Much of this complexity could
be hidden from the user.

I'd also integrate a patch management system like stgit. I'm using
stgit commands for 90% of my tasks and it has a different syntax than
git (it's trying to fix some of the problems).

Most current git users are knowledgeable programmers and could handle
a rework of the git command syntax. The sooner the syntax is reworked
the better in my opinion. The current syntax grew organically as we
learned what git needed. Now's the time to use this knowledge and
design an optimal command structure.
--
Jon Smirl
***@gmail.com
David Kastrup
2007-11-26 17:11:43 UTC
Permalink
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Get rid of plumbing at the command line level. It is confusing to
users, and command line arguments, exec calls and I/O streams are not
efficient and reasonably typed mechanisms for the kind of operations
done in plumbing. Instead using a good extensible portable scripting
language (I consider Lua quite suitable in that regard, but it is
conceivable that something with a native list type supporting easy
sorts, merges and selections could be more efficient) and implementing
plumbing in that or in C would have been preferable for creating the
porcelain.

That would keep plumbing out of the hair of users and make it easier to
cobble together extensions and variations with non-trivial internal
dataflow.

Shell scripts have also proven to be a constant hassle with regard to
portability and bugs (like underquoting).
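
A minimal sketch of the underquoting problem, assuming a POSIX shell with
the default IFS. The helper count_args and the file name are made up for
illustration.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() {
    echo "$#"
}

file="my notes.txt"   # made-up file name containing a space

count_args $file      # unquoted: the shell splits the value into two words
count_args "$file"    # quoted: the value stays a single argument
```

The unquoted call sees two arguments and the quoted call sees one, which is
how a script that works on "notes.txt" quietly breaks on "my notes.txt".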
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
Jan Hudec
2007-11-26 19:27:03 UTC
Permalink
Post by David Kastrup
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Get rid of plumbing at the command line level. It is confusing to
No, please. It's extremely useful. It should be a bit more hidden, but it's
a big advantage of git that the plumbing is available.
Post by David Kastrup
users, and command line arguments, exec calls and I/O streams are not
efficient and reasonably typed mechanisms for the kind of operations
done in plumbing. Instead using a good extensible portable scripting
language (I consider Lua quite suitable in that regard, but it is
conceivable that something with a native list type supporting easy
sorts, merges and selections could be more efficient) and implementing
plumbing in that or in C would have been preferable for creating the
porcelain.
POSIX shell is really the best extensible portable scripting language
available for the job. Because the whipuptitude is the most important
property and shell is simply best at one-liners. And since you use it
for regular work (running editor, compiler, git porcelain), it is the
obvious choice for whipping up a short function.
Post by David Kastrup
That would keep plumbing out of the hair of users and make it easier to
cobble together extensions and variations with non-trivial internal
dataflow.
Shell scripts have also proven to be a constant hassle with regard to
portability and bugs (like underquoting).
--
Jan 'Bulb' Hudec <***@ucw.cz>
Benoit Sigoure
2007-11-26 20:11:41 UTC
Permalink
Post by Jan Hudec
Post by David Kastrup
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Get rid of plumbing at the command line level. It is confusing to
No, please. It's extremely useful. It should be a bit more hidden, but it's
a big advantage of git that the plumbing is available.
Post by David Kastrup
users, and command line arguments, exec calls and I/O streams are not
efficient and reasonably typed mechanisms for the kind of operations
done in plumbing. Instead using a good extensible portable scripting
language (I consider Lua quite suitable in that regard, but it is
conceivable that something with a native list type supporting easy
sorts, merges and selections could be more efficient) and
implementing
plumbing in that or in C would have been preferable for creating the
porcelain.
POSIX shell is really the best extensible portable scripting language
available for the job. Because the whipuptitude is the most important
property and shell is simply best at one-liners. And since you use it
for regular work (running editor, compiler, git porcelain), it is the
obvious choice for whipping up a short function.
Perl seems pretty portable. If we had a decent, complete libgit, it
would be easy to create bindings for various languages and script Git
in other languages than Shell script.
--
Benoit Sigoure aka Tsuna
EPITA Research and Development Laboratory
Jan Hudec
2007-11-26 20:36:54 UTC
Permalink
Post by Jan Hudec
Post by David Kastrup
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Get rid of plumbing at the command line level. It is confusing to
No, please. It's extremely useful. It should be a bit more hidden, but it's
a big advantage of git that the plumbing is available.
Post by David Kastrup
users, and command line arguments, exec calls and I/O streams are not
efficient and reasonably typed mechanisms for the kind of operations
done in plumbing. Instead using a good extensible portable scripting
language (I consider Lua quite suitable in that regard, but it is
conceivable that something with a native list type supporting easy
sorts, merges and selections could be more efficient) and implementing
plumbing in that or in C would have been preferable for creating the
porcelain.
POSIX shell is really the best extensible portable scripting language
available for the job. Because the whipuptitude is the most important
property and shell is simply best at one-liners. And since you use it
for regular work (running editor, compiler, git porcelain), it is the
obvious choice for whipping up a short function.
Perl seems pretty portable. If we had a decent, complete libgit, it would
be easy to create bindings for various languages and script Git in other
languages than Shell script.
Perl might be good for the lower level stuff (and is indeed used for that in
git a lot), but most useful tools on top of git gather a few biggish bits
(contents of whole files and such) and pass them to some application. And
this is what shell is really good at.

So yes, more direct interfaces for various languages would certainly be good,
but it would never be a full replacement for the process interface. It is
most generic and for many hacks the easiest thing to use.
--
Jan 'Bulb' Hudec <***@ucw.cz>
Nicolas Pitre
2007-11-26 19:30:19 UTC
Permalink
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing. It is part of Git probably forever and is
really really convenient for scripting in any language you want.

The only valid argument IMHO is the way too large number of Git commands
directly available from the cmdline.

The solution: make purely plumbing commands _not_ directly available
from the command line. Instead, they can be available through 'git
lowlevel <blah>' instead of 'git <blah>' and only 'git lowlevel' would
stand in your shell default path.

Such a scheme can be implemented in parallel with the current one for a
release while the direct plumbing commands are deprecated in order to
give script authors a transition period to fix their code.


Nicolas
David Kastrup
2007-11-26 19:34:25 UTC
Permalink
Post by Nicolas Pitre
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing.
What about "at the command line level" did you not understand?
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
Jan Hudec
2007-11-26 19:57:50 UTC
Permalink
Post by David Kastrup
Post by Nicolas Pitre
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing.
What about "at the command line level" did you not understand?
Which part of "we neither can nor want to" did you not understand?

The availability of plumbing is a really big part of the reason why git is so
good and has so many scripts and tools built on top of it. Bzr and hg boast
of their ability to add plugins, but git's ability to use plumbing simply
beats that hands down, because the plugins are python-only and writing them
requires understanding the internal API, while git plumbing can be used from
any language and can usually be understood by running it interactively a few
times.
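
As a rough illustration of whipping up a tool from plumbing, here is a
sketch that lists the files touched by the most commits. The function name
churn is made up; the plumbing used is git rev-list and git diff-tree, and
merge commits contribute nothing because diff-tree prints no paths for them
without extra options.

```shell
#!/bin/sh
# churn (hypothetical name): count how many commits touched each path,
# walking history from the given revision (default: HEAD).
churn() {
    git rev-list "${1:-HEAD}" |
    while read -r commit; do
        # --root makes the initial commit list its files as well
        git diff-tree --root --no-commit-id --name-only -r "$commit"
    done |
    sort | uniq -c | sort -rn
}
```

Run inside a repository, "churn | head" prints the most frequently changed
paths, and each stage of the pipeline can be tried interactively first.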

That's why we don't want (and really can't because there is a huge amount of
code in various languages using it) to get rid of plumbing at the command
level. What we may do is hide it from the casual user.

To do that, we'd want to get rid of the git-* commands and links in bin
(remove the builtins altogether and move the non-builtins to libexec -- that
seems to be the plan for 1.6 or 1.7 already); then hiding the plumbing
from --help and completion hides it from the user.
--
Jan 'Bulb' Hudec <***@ucw.cz>
David Kastrup
2007-11-26 20:35:56 UTC
Permalink
Post by Jan Hudec
Post by David Kastrup
Post by Nicolas Pitre
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing.
What about "at the command line level" did you not understand?
Which part of "we neither can nor want to" did you not understand?
The availability of plumbing is a really big part of the reason why git is
so good and has so many scripts and tools built on top of it.
Which is the reason I proposed making the plumbing available at a
scripting level, not at the command line level.

The actual trend we are getting nowadays is locking the porcelain,
previously available as shell scripts, down into C code, _without_
making use of a reasonable plumbing layer suitable for any scripting at
all.

So the git community at the same time praises shell scripting and
simultaneously replaces it without even using the available plumbing,
_and_ claims that _both_, exclusive and incompatible approaches, are the
perfect solution. At the same time. While fighting the shell
portability fight continuously, on Unix as well as Windows.

I may have a big mouth, but swallowing all of this at once is beyond me.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
Jan Hudec
2007-11-26 21:00:06 UTC
Permalink
Post by David Kastrup
Post by Jan Hudec
Post by David Kastrup
Post by Nicolas Pitre
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing.
What about "at the command line level" did you not understand?
Which part of "we neither can nor want to" did you not understand?
The availability of plumbing is a really big part of the reason why git is
so good and has so many scripts and tools built on top of it.
Which is the reason I proposed making the plumbing available at a
scripting level, not at the command line level.
But scripting in the first place means *SHELL* scripting. Or do you normally
use a Lua command line for your daily work?
Post by David Kastrup
The actual trend we are getting nowadays is locking the porcelain,
previously available as shell scripts, down into C code, _without_
making use of a reasonable plumbing layer suitable for any scripting at
all.
For myself I would say I don't think C is an appropriate tool for the job. It
is nice when you need to optimize things to the last instruction, but for my
taste it's unwieldy for the high-level stuff.
Post by David Kastrup
So the git community at the same time praises shell scripting and
simultaneously replaces it without even using the available plumbing,
_and_ claims that _both_, exclusive and incompatible approaches, are the
perfect solution. At the same time. While fighting the shell
portability fight continuously, on Unix as well as Windows.
Well, the builtins *do* use the plumbing. They just use the C functions
without using streams and forks. Isn't that what you wanted?

But the key reason for keeping the plumbing around is prototyping and
especially tailoring. Junio has many scripts (you can look at them in the
todo branch in git repo) to support his particular workflow and plumbing is
useful there. And shell is really the right tool for such things.
Post by David Kastrup
I may have a big mouth, but swallowing all of this at once is beyond me.
--
Jan 'Bulb' Hudec <***@ucw.cz>
Nicolas Pitre
2007-11-26 21:28:40 UTC
Permalink
Post by David Kastrup
Post by Jan Hudec
Post by David Kastrup
Post by Nicolas Pitre
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing.
What about "at the command line level" did you not understand?
Which part of "we neither can nor want to" did you not understand?
The availability of plumbing is a really big part of the reason why git is
so good and has so many scripts and tools built on top of it.
Which is the reason I proposed making the plumbing available at a
scripting level, not at the command line level.
You're mixing two orthogonal issues, namely: 1) the scripting language,
and 2) the too-large number of Git commands accessible through your
default path.

#1 is a non issue really. We don't want to lock plumbing to any
particular scripting language, and the current interface is the most
universal one in that regard.

#2 can be solved through a single multiplexer such as 'git low-level'.

That 'git low-level foo' may just look up git-foo in some libexec
directory, and only 'git-low-level' needs to be in the path instead of
all those plumbing commands.

We only need both forms ('git foo' and 'git low-level foo') to work
for a transition period.
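
A minimal sketch of such a multiplexer, under stated assumptions: no
'git low-level' command exists, and the function name, the
GIT_LOWLEVEL_DIR variable and the default directory below are all made up
for illustration.

```shell
#!/bin/sh
# Hypothetical dispatcher for 'git low-level <command> [<args>]'.
# GIT_LOWLEVEL_DIR is an assumed variable naming the directory that holds
# the git-<command> plumbing programs; the default path is only a guess.
low_level_dispatch() {
    execdir=${GIT_LOWLEVEL_DIR:-/usr/libexec/git-core}
    if [ "$#" -eq 0 ]; then
        echo "usage: git low-level <command> [<args>]" >&2
        return 1
    fi
    cmd=$1
    shift
    if [ ! -x "$execdir/git-$cmd" ]; then
        echo "git low-level: '$cmd' is not a plumbing command" >&2
        return 1
    fi
    # Run the plumbing program with the remaining arguments untouched.
    "$execdir/git-$cmd" "$@"
}
```

Installed as a script named git-low-level whose last line calls
low_level_dispatch "$@", this would be the only plumbing entry point that
has to sit in the default path.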


Nicolas
Wincent Colaiuta
2007-11-26 20:45:16 UTC
Permalink
The availability of plumbing is really big part of a reason why git is so
good and has so many scripts and tool built on top of it.
Yes, the plumbing is really lovely when it comes time to whipping
together a quick tool for a special task; much nicer than writing a
plugin.

For the benefit of newcomers, I just wish the plumbing was kept a
little bit out of sight. You know, porcelain in /usr/bin and plumbing
in /usr/libexec or other such place.

It's fine once you've learnt your workflows and know the 10 or 15 Git
tools that you'll be using day-to-day; but for people who are just
starting off this can be a little bit intimidating:

$ git-<tab>
Display all 146 possibilities? (y or n)

Cheers,
Wincent
Junio C Hamano
2007-11-26 21:24:22 UTC
Permalink
For the benefit of newcomers, I just wish the plumbing was kept a
little bit out of sight. You know, porcelain in /usr/bin and plumbing
in /usr/libexec or other such place.
It's fine once you've learnt your workflows and know the 10 or 15 Git
tools that you'll be using day-to-day; but for people who are just
$ git-<tab>
Display all 146 possibilities? (y or n)
I'd agree to that but I've always considered this an issue for distros.
We've supported an ability for them to specify a gitexecdir separate
from /usr/bin in our Makefile for almost two years.

The tab completion for bash and zsh would also help you here, but I see
there are quite a few commands that should not be there, and it's time
to clean it up.

$ git <tab>
add fetch push
am filter-branch rebase
annotate format-patch rebase--interactive
apply fsck relink
archive gc remote
bisect get-tar-commit-id repack
blame grep request-pull
branch gui reset
bundle imap-send resolve
checkout init revert
checkout-index instaweb rm
cherry less send-email
cherry-pick lg shortlog
citool log show
clean lost-found show-branch
clone ls-files show-ref
co ls-remote stash
commit ls-tree status
config merge submodule
convert-objects mergetool svnimport
count-objects mv tag
describe name-rev var
diff pickaxe verify-pack
diff-stages pull whatchanged

Perhaps this list can be a starting point...

contrib/completion/git-completion.bash | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index cad842a..1bba68b 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -359,6 +359,15 @@ __git_commands ()
upload-pack) : plumbing;;
write-tree) : plumbing;;
verify-tag) : plumbing;;
+ annotate) : use blame;;
+ checkout-index) : plumbing;;
+ diff-stages) : plumbing;;
+ get-tar-commit-id) : plumbing;;
+ lost-found) : deprecated;;
+ rebase--interactive) : plumbing;;
+ relink) : obsolete;;
+ whatchanged) : plumbing;;
+ verify-pack) : plumbing;;
*) echo $i;;
esac
done
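
For readers who have not seen the completion script, the filtering idea in
this patch can be sketched on its own. The function name filter_porcelain
is made up, and the sketch lists only the nine names added above; the real
__git_commands carries a longer list and gets its input differently.

```shell
#!/bin/sh
# filter_porcelain (hypothetical name): read command names on stdin, one
# per line, and print only those that should be offered for completion.
# ':' is the shell's no-op builtin, so the word after it is just a label.
filter_porcelain() {
    while read -r i; do
        case $i in
        annotate)            : use blame;;
        checkout-index)      : plumbing;;
        diff-stages)         : plumbing;;
        get-tar-commit-id)   : plumbing;;
        lost-found)          : deprecated;;
        rebase--interactive) : plumbing;;
        relink)              : obsolete;;
        whatchanged)         : plumbing;;
        verify-pack)         : plumbing;;
        *)                   echo "$i";;
        esac
    done
}
```

Feeding it a list such as add, annotate, blame prints only add and blame.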
Nicolas Pitre
2007-11-26 21:35:18 UTC
Permalink
Post by Junio C Hamano
For the benefit of newcomers, I just wish the plumbing was kept a
little bit out of sight. You know, porcelain in /usr/bin and plumbing
in /usr/libexec or other such place.
It's fine once you've learnt your workflows and know the 10 or 15 Git
tools that you'll be using day-to-day; but for people who are just
$ git-<tab>
Display all 146 possibilities? (y or n)
I'd agree to that but I've always considered this an issue for distros.
We've supported an ability for them to specify a gitexecdir separate
from /usr/bin in our Makefile for almost two years.
Would probably be a good thing to start enforcing that by default. It's
easier to follow such policies when they're coordinated from the project
origin.


Nicolas
Junio C Hamano
2007-11-26 21:47:18 UTC
Permalink
Post by Nicolas Pitre
Post by Junio C Hamano
For the benefit of newcomers, I just wish the plumbing was kept a
little bit out of sight. You know, porcelain in /usr/bin and plumbing
in /usr/libexec or other such place.
It's fine once you've learnt your workflows and know the 10 or 15 Git
tools that you'll be using day-to-day; but for people who are just
$ git-<tab>
Display all 146 possibilities? (y or n)
I'd agree to that but I've always considered this an issue for distros.
We've supported an ability for them to specify a gitexecdir separate
from /usr/bin in our Makefile for almost two years.
Would probably be a good thing to start enforcing that by default. It's
easier to follow such policies when they're coordinated from the project
origin.
Not really. The project origin ships the Makefile to install under
$HOME, but I do not see any distros following that.
Nicolas Pitre
2007-11-26 22:03:37 UTC
Permalink
Post by Junio C Hamano
Post by Nicolas Pitre
Post by Junio C Hamano
For the benefit of newcomers, I just wish the plumbing was kept a
little bit out of sight. You know, porcelain in /usr/bin and plumbing
in /usr/libexec or other such place.
It's fine once you've learnt your workflows and know the 10 or 15 Git
tools that you'll be using day-to-day; but for people who are just
$ git-<tab>
Display all 146 possibilities? (y or n)
I'd agree to that but I've always considered this an issue for distros.
We've supported an ability for them to specify a gitexecdir separate
from /usr/bin in our Makefile for almost two years.
Would probably be a good thing to start enforcing that by default. It's
easier to follow such policies when they're coordinated from the project
origin.
Not really. The project origin ships the Makefile to install under
$HOME, but I do not see any distros following that.
What about the default RPM spec file?


Nicolas
Shawn O. Pearce
2007-11-27 01:03:50 UTC
Post by Junio C Hamano
Post by Wincent Colaiuta
$ git-<tab>
Display all 146 possibilities? (y or n)
The tab completion for bash and zsh would also help you here, but I see
there are quite a few commands that should not be there, and it's time
to clean it up.
...
Post by Junio C Hamano
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index cad842a..1bba68b 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -359,6 +359,15 @@ __git_commands ()
upload-pack) : plumbing;;
write-tree) : plumbing;;
verify-tag) : plumbing;;
+ annotate) : use blame;;
+ checkout-index) : plumbing;;
+ diff-stages) : plumbing;;
+ get-tar-commit-id) : plumbing;;
+ lost-found) : deprecated;;
+ rebase--interactive) : plumbing;;
+ relink) : obsolete;;
+ whatchanged) : plumbing;;
+ verify-pack) : plumbing;;
*) echo $i;;
esac
done
Ack'd-by: Shawn O. Pearce <***@spearce.org>

;-)
--
Shawn.
Junio C Hamano
2007-11-27 03:35:45 UTC
Post by Junio C Hamano
Post by Wincent Colaiuta
$ git-<tab>
Display all 146 possibilities? (y or n)
The tab completion for bash and zsh would also help you here, but I see
there are quite a few commands that should not be there, and it's time
to clean it up.
...
Post by Junio C Hamano
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index cad842a..1bba68b 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -359,6 +359,15 @@ __git_commands ()
upload-pack) : plumbing;;
write-tree) : plumbing;;
verify-tag) : plumbing;;
+ annotate) : use blame;;
+ checkout-index) : plumbing;;
+ diff-stages) : plumbing;;
+ get-tar-commit-id) : plumbing;;
+ lost-found) : deprecated;;
+ rebase--interactive) : plumbing;;
+ relink) : obsolete;;
+ whatchanged) : plumbing;;
+ verify-pack) : plumbing;;
*) echo $i;;
esac
done
;-)
Seriously speaking, I find this "negative" list ugly. I am wondering if
it makes more sense to use positive "Porcelain" list, or perhaps even
"The most commonly used" list from "git help" output.

Here is an alternate attempt at it.

---

Documentation/cmd-list.perl | 59 ++++++++++++------------
Makefile | 2 +-
contrib/completion/git-completion.bash | 77 ++------------------------------
generate-cmdlist.sh | 34 ++------------
4 files changed, 40 insertions(+), 132 deletions(-)

diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index b709551..a966b5e 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -26,10 +26,11 @@ sub format_one {
if (!defined $description) {
die "No description found in $name.txt";
}
+
if (my ($verify_name, $text) = ($description =~ /^($name) - (.*)/)) {
print $out "gitlink:$name\[1\]::\n\t";
- if ($attr) {
- print $out "($attr) ";
+ if ($attr =~ /deprecated/) {
+ print $out "(deprecated) ";
}
print $out "$text.\n\n";
}
@@ -75,27 +76,27 @@ for my $cat (qw(ancillaryinterrogators
# The following list is sorted with "sort -d" to make it easier
# to find entry in the resulting git.html manual page.
__DATA__
-git-add mainporcelain
+git-add mainporcelain common
git-am mainporcelain
git-annotate ancillaryinterrogators
-git-apply plumbingmanipulators
+git-apply plumbingmanipulators common
git-archimport foreignscminterface
-git-archive mainporcelain
-git-bisect mainporcelain
+git-archive mainporcelain common
+git-bisect mainporcelain common
git-blame ancillaryinterrogators
-git-branch mainporcelain
+git-branch mainporcelain common
git-bundle mainporcelain
git-cat-file plumbinginterrogators
git-check-attr purehelpers
-git-checkout mainporcelain
+git-checkout mainporcelain common
git-checkout-index plumbingmanipulators
git-check-ref-format purehelpers
git-cherry ancillaryinterrogators
-git-cherry-pick mainporcelain
+git-cherry-pick mainporcelain common
git-citool mainporcelain
git-clean mainporcelain
-git-clone mainporcelain
-git-commit mainporcelain
+git-clone mainporcelain common
+git-commit mainporcelain common
git-commit-tree plumbingmanipulators
git-config ancillarymanipulators
git-count-objects ancillaryinterrogators
@@ -104,12 +105,12 @@ git-cvsimport foreignscminterface
git-cvsserver foreignscminterface
git-daemon synchingrepositories
git-describe mainporcelain
-git-diff mainporcelain
+git-diff mainporcelain common
git-diff-files plumbinginterrogators
git-diff-index plumbinginterrogators
git-diff-tree plumbinginterrogators
git-fast-import ancillarymanipulators
-git-fetch mainporcelain
+git-fetch mainporcelain common
git-fetch-pack synchingrepositories
git-filter-branch ancillarymanipulators
git-fmt-merge-msg purehelpers
@@ -118,24 +119,24 @@ git-format-patch mainporcelain
git-fsck ancillaryinterrogators
git-gc mainporcelain
git-get-tar-commit-id ancillaryinterrogators
-git-grep mainporcelain
+git-grep mainporcelain common
git-gui mainporcelain
git-hash-object plumbingmanipulators
git-http-fetch synchelpers
git-http-push synchelpers
git-imap-send foreignscminterface
git-index-pack plumbingmanipulators
-git-init mainporcelain
+git-init mainporcelain common
git-instaweb ancillaryinterrogators
gitk mainporcelain
-git-log mainporcelain
+git-log mainporcelain common
git-lost-found ancillarymanipulators deprecated
git-ls-files plumbinginterrogators
git-ls-remote plumbinginterrogators
git-ls-tree plumbinginterrogators
git-mailinfo purehelpers
git-mailsplit purehelpers
-git-merge mainporcelain
+git-merge mainporcelain common
git-merge-base plumbinginterrogators
git-merge-file plumbingmanipulators
git-merge-index plumbingmanipulators
@@ -144,7 +145,7 @@ git-mergetool ancillarymanipulators
git-merge-tree ancillaryinterrogators
git-mktag plumbingmanipulators
git-mktree plumbingmanipulators
-git-mv mainporcelain
+git-mv mainporcelain common
git-name-rev plumbinginterrogators
git-pack-objects plumbingmanipulators
git-pack-redundant plumbinginterrogators
@@ -152,13 +153,13 @@ git-pack-refs ancillarymanipulators
git-parse-remote synchelpers
git-patch-id purehelpers
git-peek-remote purehelpers deprecated
-git-prune ancillarymanipulators
+git-prune ancillarymanipulators common
git-prune-packed plumbingmanipulators
-git-pull mainporcelain
-git-push mainporcelain
+git-pull mainporcelain common
+git-push mainporcelain common
git-quiltimport foreignscminterface
git-read-tree plumbingmanipulators
-git-rebase mainporcelain
+git-rebase mainporcelain common
git-receive-pack synchelpers
git-reflog ancillarymanipulators
git-relink ancillarymanipulators
@@ -166,28 +167,28 @@ git-remote ancillarymanipulators
git-repack ancillarymanipulators
git-request-pull foreignscminterface
git-rerere ancillaryinterrogators
-git-reset mainporcelain
-git-revert mainporcelain
+git-reset mainporcelain common
+git-revert mainporcelain common
git-rev-list plumbinginterrogators
git-rev-parse ancillaryinterrogators
-git-rm mainporcelain
+git-rm mainporcelain common
git-runstatus ancillaryinterrogators
git-send-email foreignscminterface
git-send-pack synchingrepositories
git-shell synchelpers
git-shortlog mainporcelain
-git-show mainporcelain
-git-show-branch ancillaryinterrogators
+git-show mainporcelain common
+git-show-branch ancillaryinterrogators common
git-show-index plumbinginterrogators
git-show-ref plumbinginterrogators
git-sh-setup purehelpers
git-stash mainporcelain
-git-status mainporcelain
+git-status mainporcelain common
git-stripspace purehelpers
git-submodule mainporcelain
git-svn foreignscminterface
git-symbolic-ref plumbingmanipulators
-git-tag mainporcelain
+git-tag mainporcelain common
git-tar-tree plumbinginterrogators deprecated
git-unpack-file plumbinginterrogators
git-unpack-objects plumbingmanipulators
diff --git a/Makefile b/Makefile
index ccf522a..ca1c2f5 100644
--- a/Makefile
+++ b/Makefile
@@ -804,7 +804,7 @@ git-merge-subtree$X: git-merge-recursive$X
$(BUILT_INS): git$X
$(QUIET_BUILT_IN)$(RM) $@ && ln git$X $@

-common-cmds.h: ./generate-cmdlist.sh
+common-cmds.h: ./generate-cmdlist.sh Documentation/cmd-list.perl

common-cmds.h: $(wildcard Documentation/git-*.txt)
$(QUIET_GEN)./generate-cmdlist.sh > $@+ && mv $@+ $@
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 599b2fc..d54b415 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -287,79 +287,10 @@ __git_commands ()
echo "$__git_commandlist"
return
fi
- local i IFS=" "$'\n'
- for i in $(git help -a|egrep '^ ')
- do
- case $i in
- add--interactive) : plumbing;;
- applymbox) : ask gittus;;
- applypatch) : ask gittus;;
- archimport) : import;;
- cat-file) : plumbing;;
- check-attr) : plumbing;;
- check-ref-format) : plumbing;;
- commit-tree) : plumbing;;
- cvsexportcommit) : export;;
- cvsimport) : import;;
- cvsserver) : daemon;;
- daemon) : daemon;;
- diff-files) : plumbing;;
- diff-index) : plumbing;;
- diff-tree) : plumbing;;
- fast-import) : import;;
- fsck-objects) : plumbing;;
- fetch--tool) : plumbing;;
- fetch-pack) : plumbing;;
- fmt-merge-msg) : plumbing;;
- for-each-ref) : plumbing;;
- hash-object) : plumbing;;
- http-*) : transport;;
- index-pack) : plumbing;;
- init-db) : deprecated;;
- local-fetch) : plumbing;;
- mailinfo) : plumbing;;
- mailsplit) : plumbing;;
- merge-*) : plumbing;;
- mktree) : plumbing;;
- mktag) : plumbing;;
- pack-objects) : plumbing;;
- pack-redundant) : plumbing;;
- pack-refs) : plumbing;;
- parse-remote) : plumbing;;
- patch-id) : plumbing;;
- peek-remote) : plumbing;;
- prune) : plumbing;;
- prune-packed) : plumbing;;
- quiltimport) : import;;
- read-tree) : plumbing;;
- receive-pack) : plumbing;;
- reflog) : plumbing;;
- repo-config) : plumbing;;
- rerere) : plumbing;;
- rev-list) : plumbing;;
- rev-parse) : plumbing;;
- runstatus) : plumbing;;
- sh-setup) : internal;;
- shell) : daemon;;
- send-pack) : plumbing;;
- show-index) : plumbing;;
- ssh-*) : transport;;
- stripspace) : plumbing;;
- svn) : import export;;
- symbolic-ref) : plumbing;;
- tar-tree) : deprecated;;
- unpack-file) : plumbing;;
- unpack-objects) : plumbing;;
- update-index) : plumbing;;
- update-ref) : plumbing;;
- update-server-info) : daemon;;
- upload-archive) : plumbing;;
- upload-pack) : plumbing;;
- write-tree) : plumbing;;
- verify-tag) : plumbing;;
- *) echo $i;;
- esac
- done
+ git help | sed -e '
+ 1,/^The most commonly used git/d
+ s/^ *\([^ ][^ ]*\)[ ].*/\1/
+ '
}
__git_commandlist=
__git_commandlist="$(__git_commands 2>/dev/null)"
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 17df47b..28f9749 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -9,35 +9,11 @@ struct cmdname_help

static struct cmdname_help common_cmds[] = {"

-sort <<\EOF |
-add
-apply
-archive
-bisect
-branch
-checkout
-cherry-pick
-clone
-commit
-diff
-fetch
-grep
-init
-log
-merge
-mv
-prune
-pull
-push
-rebase
-reset
-revert
-rm
-show
-show-branch
-status
-tag
-EOF
+sed -n -e '
+ 1,/__DATA__/d
+ s/^git-\([^ ]*\)[ ].*[ ]common/\1/p
+' Documentation/cmd-list.perl |
+sort |
while read cmd
do
sed -n '
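
For readers who do not read sed fluently, the extraction step in the generate-cmdlist.sh hunk above can be approximated in Python. This is only a sketch: the sample rows are copied from the patch, the function name common_commands is made up for illustration, and the sed expression matches "common" anywhere after the category while this version matches it as a whole attribute.

```python
# Sample rows in the same shape as the __DATA__ table in the patch.
CMD_LIST = """\
git-add                                 mainporcelain common
git-am                                  mainporcelain
git-apply                               plumbingmanipulators common
git-cat-file                            plumbinginterrogators
git-lost-found                          ancillarymanipulators deprecated
git-show-branch                         ancillaryinterrogators common
"""


def common_commands(table):
    """Return sorted command names (git- prefix stripped) tagged 'common'."""
    names = []
    for line in table.splitlines():
        fields = line.split()
        # fields[0] is the command, fields[1] the category,
        # anything after that is an attribute such as "common".
        if len(fields) >= 3 and fields[0].startswith("git-") and "common" in fields[2:]:
            names.append(fields[0][len("git-"):])
    return sorted(names)


print(common_commands(CMD_LIST))
```

The point of the design is that the "common" tag lives in one table, so the help output and the completion script cannot drift apart.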
Steven Grimm
2007-11-27 05:10:15 UTC
Post by Junio C Hamano
Seriously speaking, I find this "negative" list ugly. I am wondering if
it makes more sense to use positive "Porcelain" list, or perhaps even
"The most commonly used" list from "git help" output.
Yes, a positive list makes much more sense. If for no other reason
than that it will require the author of a new command to make a
conscious decision before that command will be suggested to users.

-Steve
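
The difference between the two approaches shows up when a new command is added and nobody touches the list. A minimal Python sketch, with made-up command names (frobnicate--helper is hypothetical) and helper functions that exist only for this illustration:

```python
def visible_negative(all_commands, hidden):
    """Negative list: suggest everything except what is explicitly hidden."""
    return [c for c in all_commands if c not in hidden]


def visible_positive(all_commands, approved):
    """Positive list: suggest only what was explicitly approved."""
    return [c for c in all_commands if c in approved]


commands = ["add", "cat-file", "commit"]
hidden = {"cat-file"}
approved = {"add", "commit"}

# A hypothetical new helper lands and neither list is updated.
commands_after = commands + ["frobnicate--helper"]

print(visible_negative(commands_after, hidden))    # the helper leaks through
print(visible_positive(commands_after, approved))  # the helper stays hidden
```

With the negative list the default for an unreviewed command is "shown"; with the positive list the default is "hidden", which is the conscious decision described above.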
Johannes Schindelin
2007-11-26 21:27:54 UTC
Hi,
Post by Nicolas Pitre
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing. It is part of Git probably forever and is
really really convenient for scripting in any language you want.
I agree, but that's not even the complete truth. Git would be not even
half as useful as it is without its scriptability.

So it is not only convenience, but very much a reason that git development
is so fast. That, and that more people than elsewhere let code talk.
Which is also much easier when you have a scriptable system.

Ciao,
Dscho
Nicolas Pitre
2007-11-26 21:39:23 UTC
Post by Johannes Schindelin
Hi,
Post by Nicolas Pitre
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing. It is part of Git probably forever and is
really really convenient for scripting in any language you want.
I agree, but that's not even the complete truth. Git would be not even
half as useful as it is without its scriptability.
Sure, but this is missing the point.

The issue at hand is about the fact that way too many Git commands are
to be found in the default command path. Diverging on whether or not
plumbing is useful is the wrong question.


Nicolas
Johannes Schindelin
2007-11-26 21:40:49 UTC
Hi,
Post by Nicolas Pitre
Post by Johannes Schindelin
I agree, but that's not even the complete truth. Git would be not
even half as useful as it is without its scriptability.
Sure, but this is missing the point.
The issue at hand is about the fact that way too many Git commands are
to be found in the default command path. Diverging on whether or not
plumbing is useful is the wrong question.
Ah, thanks. I use a spam filter here, so I did not get the complete
context.

Sorry,
Dscho
Andreas Ericsson
2007-11-27 14:11:39 UTC
Post by Nicolas Pitre
Post by David Kastrup
Get rid of plumbing at the command line level.
We can't get rid of plumbing. It is part of Git probably forever and is
really really convenient for scripting in any language you want.
The only valid argument IMHO is the way too large number of Git commands
directly available from the cmdline.
The solution: make purely plumbing commands _not_ directly available
from the command line. Instead, they can be available through 'git
lowlevel <blah>' instead of 'git <blah>' and only 'git lowlevel' would
stand in your shell default path.
Such a scheme can be implemented in parallel with the current one for a
release while the direct plumbing commands are deprecated in order to
give script authors a transition period to fix their code.
The "git-cmd" form of writing commands was deemed obsolete round about
the time git.sh was rewritten in C. There's just no reason for it
anymore.

It's unfortunate that git-sh-setup makes it equally valid for scripts to
use either form, as we can never get rid of the dashed form when so many
scripts in the core distribution use it.

Ah well.
--
Andreas Ericsson ***@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Jakub Narebski
2007-11-27 14:38:32 UTC
Post by Andreas Ericsson
The "git-cmd" form of writing commands was deemed obsolete round about
the time git.sh was rewritten in C. There's just no reason for it
anymore.
It's unfortunate that git-sh-setup makes it equally valid for scripts to
use either form, as we can never get rid of the dashed form when so many
scripts in the core distribution use it.
Ah well.
I think it would be enough to have "git" and perhaps "git-sh-setup"
in PATH, and the rest of git-cmd in EXEC_PATH != PATH.
--
Jakub Narebski
Poland
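
The lookup this implies is small: only the wrapper is on PATH, and it finds git-<cmd> in a separate directory. A rough Python sketch of that idea follows. It is not how git itself is implemented; the function name resolve and the temporary directory standing in for the exec path are assumptions made for the example.

```python
import os
import tempfile
from typing import Optional


def resolve(cmd: str, exec_path: str) -> Optional[str]:
    """Map 'git <cmd>' to an executable named git-<cmd> inside exec_path."""
    candidate = os.path.join(exec_path, "git-" + cmd)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


# Stand-in for a libexec-style directory that is NOT on the user's PATH.
with tempfile.TemporaryDirectory() as exec_path:
    helper = os.path.join(exec_path, "git-rev-parse")
    with open(helper, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(helper, 0o755)

    found = resolve("rev-parse", exec_path)
    missing = resolve("no-such-command", exec_path)

print(found is not None, missing)
```

Shell completion of "git-<tab>" then only sees what is on PATH, while scripts that go through the wrapper keep working.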
Dana How
2007-11-26 19:18:41 UTC
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Currently data can be quickly copied from pack to pack,
but data cannot be quickly copied blob->pack or pack->blob
(there was an alternate blob format that supported this,
but it was deprecated). Using the pack format for blobs
would fix this. It would also mean blobs wouldn't need to
be uncompressed to get the blob type or size I believe.

So far this has prevented me from deploying git here
(and is half the reason I have not been active recently).
Currently we use p4 and we have large files.
When a large file is checked in (submitted),
it is compressed *once* and sent over the network --
these are the only delays that end-users experience.

The equivalent operation in git would require the creation of
the blob, and then of a temporary pack to send to the server.
This requires 3 calls to zlib for each blob, which for very
large files is not acceptable at my site.

Yes, git has much better features.
But 80%+ of my workgroup will not use them,
and only notice that git is "slower".

Thanks,
--
Dana L. How ***@gmail.com +1 650 804 5991 cell
Nicolas Pitre
2007-11-26 19:52:05 UTC
Post by Dana How
Currently data can be quickly copied from pack to pack,
but data cannot be quickly copied blob->pack or pack->blob
I don't see why you would need the pack->blob copy normally.
Post by Dana How
(there was an alternate blob format that supported this,
but it was deprecated). Using the pack format for blobs
would fix this.
Then you can do just that for big enough blobs where "big enough" is
configurable: encapsulate them in a pack instead of a loose object.
Problem solved. Sure you'll end up with a bunch of packs containing
only one blob object, but given that those blobs are so large to be a
problem in your work flow when written out as loose objects, then they
certainly must be few enough not to cause an explosion in the number of
packs.
Post by Dana How
It would also mean blobs wouldn't need to
be uncompressed to get the blob type or size I believe.
They already don't.
Post by Dana How
So far this has prevented me from deploying git here
(and is half the reason I have not been active recently).
Currently we use p4 and we have large files.
When a large file is checked in (submitted),
it is compressed *once* and sent over the network --
these are the only delays that end-users experience.
The equivalent operation in git would require the creation of
the blob, and then of a temporary pack to send to the server.
This requires 3 calls to zlib for each blob, which for very
large files is not acceptable at my site.
I currently count 2 calls to zlib, not 3. And with big blobs as packs,
as suggested above then you'd have only one call when actually staging
their content. This should be really straightforward to implement
given that pack-objects is already a built-in.


Nicolas
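
The "big enough is configurable" rule amounts to a size check at the point where the object is written. A tiny Python sketch, where the threshold value and the names BIG_BLOB_THRESHOLD and choose_storage are invented for illustration and do not correspond to any existing git setting:

```python
# Hypothetical cutoff: 32 MiB. No such default exists in git; it is
# only here to make the rule concrete.
BIG_BLOB_THRESHOLD = 32 * 1024 * 1024


def choose_storage(size, threshold=BIG_BLOB_THRESHOLD):
    """Return 'pack' for blobs larger than the threshold, else 'loose'."""
    return "pack" if size > threshold else "loose"


for size in (4 * 1024, BIG_BLOB_THRESHOLD, BIG_BLOB_THRESHOLD + 1):
    print(size, choose_storage(size))
```

Only blobs above the cutoff get their own pack, so the number of single-blob packs is bounded by the number of very large files.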
Dana How
2007-11-26 20:17:21 UTC
Post by Nicolas Pitre
Post by Dana How
Currently data can be quickly copied from pack to pack,
but data cannot be quickly copied blob->pack or pack->blob
I don't see why you would need the pack->blob copy normally.
True, but that doesn't change the main point.
Post by Nicolas Pitre
Post by Dana How
(there was an alternate blob format that supported this,
but it was deprecated). Using the pack format for blobs
would fix this.
Then you can do just that for big enough blobs where "big enough" is
configurable: encapsulate them in a pack instead of a loose object.
Problem solved. Sure you'll end up with a bunch of packs containing
only one blob object, but given that those blobs are so large to be a
problem in your work flow when written out as loose objects, then they
certainly must be few enough not to cause an explosion in the number of
packs.
Are you suggesting that "git add" create a new pack containing
one blob when the blob is big enough? Re-using (part of) the pack format
in a blob (or maybe only some blobs) seems like less code change.
Post by Nicolas Pitre
Post by Dana How
It would also mean blobs wouldn't need to
be uncompressed to get the blob type or size I believe.
They already don't.
It looks like sha1_file.c:parse_sha1_header() works on a buffer
filled in by sha1_file.c:unpack_sha1_header() by calling inflate(), right?

It is true you don't have to uncompress the *entire* blob.
Post by Nicolas Pitre
Post by Dana How
The equivalent operation in git would require the creation of
the blob, and then of a temporary pack to send to the server.
This requires 3 calls to zlib for each blob, which for very
large files is not acceptable at my site.
I currently count 2 calls to zlib, not 3.
I count 3:

Call 1: git-add calls zlib to make the blob.

Call 2: builtin-pack-objects.c:write_one() calls sha1_file.c:read_sha1_file()
calls :unpack_sha1_file() calls :unpack_sha1_{header,rest}() calls
inflate() to get the data from the blob into a buffer.

Call 3: Then write_one() calls deflate to make the new buffer
to write into the pack. This is all under the "if (!to_reuse) {" path,
which is active when packing a blob.

Remember, I'm comparing "p4 submit file" to
"git add file"/"git commit"/"git push", which is the comparison
the users will be making.

On the other hand, I'm looking at code from June;
but I haven't noticed big changes since then on the list.

Calls 2 and 3 go away if the blob and pack formats were more similar.
--
Dana L. How ***@gmail.com +1 650 804 5991 cell
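
The three-versus-one accounting above can be checked with a small Python sketch that counts zlib passes. It is a simplification under stated assumptions: the class and function names are invented, the loose-object header is ignored, and "reuse" simply means the compressed bytes are copied unchanged, which is what a shared blob/pack format would allow.

```python
import zlib


class CountingZlib:
    """Wraps zlib and counts how many full passes over the data are made."""

    def __init__(self):
        self.passes = 0

    def deflate(self, data):
        self.passes += 1
        return zlib.compress(data)

    def inflate(self, data):
        self.passes += 1
        return zlib.decompress(data)


def add_then_push_recompress(z, content):
    loose = z.deflate(content)  # call 1: "git add" writes the loose object
    raw = z.inflate(loose)      # call 2: packing reads the object back
    return z.deflate(raw)       # call 3: packing compresses it again


def add_then_push_reuse(z, content):
    packed = z.deflate(content)  # only call: stored in a pack-compatible form
    return packed                # later copied verbatim into the outgoing pack


content = b"large file contents " * 50000

z1 = CountingZlib()
out1 = add_then_push_recompress(z1, content)

z2 = CountingZlib()
out2 = add_then_push_reuse(z2, content)

print(z1.passes, z2.passes)
```

For very large files each pass is a full scan of the data, so the count is what end users experience as latency.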
Nicolas Pitre
2007-11-26 20:55:43 UTC
Post by Dana How
Post by Nicolas Pitre
Post by Dana How
Currently data can be quickly copied from pack to pack,
but data cannot be quickly copied blob->pack or pack->blob
I don't see why you would need the pack->blob copy normally.
True, but that doesn't change the main point.
Sure, but let's not go overboard either.
Post by Dana How
Post by Nicolas Pitre
Post by Dana How
(there was an alternate blob format that supported this,
but it was deprecated). Using the pack format for blobs
would fix this.
Then you can do just that for big enough blobs where "big enough" is
configurable: encapsulate them in a pack instead of a loose object.
Problem solved. Sure you'll end up with a bunch of packs containing
only one blob object, but given that those blobs are so large to be a
problem in your work flow when written out as loose objects, then they
certainly must be few enough not to cause an explosion in the number of
packs.
Are you suggesting that "git add" create a new pack containing
one blob when the blob is big enough?
Exactly.
Post by Dana How
Re-using (part of) the pack format
in a blob (or maybe only some blobs) seems like less code change.
Don't know what you mean exactly here, but what I mean is to do
something as simple as:

pretend_sha1_file(...);
add_object_entry(...);
write_pack_file();

when the buffer to make a blob from is larger than a configured
threshold.
Post by Dana How
Post by Nicolas Pitre
Post by Dana How
It would also mean blobs wouldn't need to
be uncompressed to get the blob type or size I believe.
They already don't.
It looks like sha1_file.c:parse_sha1_header() works on a buffer
filled in by sha1_file.c:unpack_sha1_header() by calling inflate(), right?
It is true you don't have to uncompress the *entire* blob.
Right. Only the first 16 bytes or so need to be uncompressed.
Post by Dana How
Post by Nicolas Pitre
Post by Dana How
The equivalent operation in git would require the creation of
the blob, and then of a temporary pack to send to the server.
This requires 3 calls to zlib for each blob, which for very
large files is not acceptable at my site.
I currently count 2 calls to zlib, not 3.
Call 1: git-add calls zlib to make the blob.
Call 2: builtin-pack-objects.c:write_one() calls sha1_file.c:read_sha1_file()
calls :unpack_sha1_file() calls :unpack_sha1_{header,rest}() calls
inflate() to get the data from the blob into a buffer.
Call 3: Then write_one() calls deflate to make the new buffer
to write into the pack. This is all under the "if (!to_reuse) {" path,
which is active when packing a blob.
Oh, you're right. Somehow I didn't count the needed decompression.
Post by Dana How
Remember, I'm comparing "p4 submit file" to
"git add file"/"git commit"/"git push", which is the comparison
the users will be making.
On the other hand, I'm looking at code from June;
but I haven't noticed big changes since then on the list.
Calls 2 and 3 go away if the blob and pack formats were more similar.
... which my suggestion should provide with a minimum of changes, maybe
less than 10 lines of code.


Nicolas
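
The "first 16 bytes or so" remark can be illustrated in Python. The sketch below assumes the traditional loose-object layout, "<type> <size>" followed by a NUL byte and then the content, all inside a single zlib stream. The function names and the 32-byte peek size are choices made for this example, not git's.

```python
import zlib


def make_loose_object(obj_type, content):
    """Build a traditional-style loose object: header + NUL + content, deflated."""
    header = "{} {}".format(obj_type, len(content)).encode() + b"\0"
    return zlib.compress(header + content)


def read_type_and_size(compressed, peek=32):
    """Inflate at most `peek` bytes, which is enough to cover the header."""
    d = zlib.decompressobj()
    head = d.decompress(compressed, peek)  # second argument caps the output size
    header, _, _ = head.partition(b"\0")
    obj_type, size = header.split(b" ")
    return obj_type.decode(), int(size)


obj = make_loose_object("blob", b"x" * 1000000)
print(read_type_and_size(obj))
```

So the type and size do cost a call into inflate(), but only for a few bytes of output rather than for the whole blob.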
Dana How
2007-11-26 22:02:45 UTC
Post by Nicolas Pitre
Post by Dana How
Post by Nicolas Pitre
Then you can do just that for big enough blobs where "big enough" is
configurable: encapsulate them in a pack instead of a loose object.
Problem solved. Sure you'll end up with a bunch of packs containing
only one blob object, but given that those blobs are so large to be a
problem in your work flow when written out as loose objects, then they
certainly must be few enough not to cause an explosion in the number of
packs.
Are you suggesting that "git add" create a new pack containing
one blob when the blob is big enough?
Exactly.
I will think about your suggestion
(and the number of packs that might result),
but I confess I am surprised by it.

When I proposed automatically extracting large blobs from source
packs when creating a new pack under a blob size limit while
pack-objects was running, you objected on the grounds that
pack-objects only creates packs and should not create blobs
(this proposal had other problems too, but this is the one you didn't like).

Now it's OK for git-add to sometimes create packs instead of blobs?
I would not have predicted that!

;-)
--
Dana L. How ***@gmail.com +1 650 804 5991 cell
Nicolas Pitre
2007-11-26 22:22:38 UTC
Post by Dana How
Post by Nicolas Pitre
Post by Dana How
Post by Nicolas Pitre
Then you can do just that for big enough blobs where "big enough" is
configurable: encapsulate them in a pack instead of a loose object.
Problem solved. Sure you'll end up with a bunch of packs containing
only one blob object, but given that those blobs are so large to be a
problem in your work flow when written out as loose objects, then they
certainly must be few enough not to cause an explosion in the number of
packs.
Are you suggesting that "git add" create a new pack containing
one blob when the blob is big enough?
Exactly.
I will think about your suggestion
(and the number of packs that might result),
but I confess I am surprised by it.
When I proposed automatically extracting large blobs from source
packs when creating a new pack under a blob size limit while
pack-objects was running, you objected on the grounds that
pack-objects only creates packs and should not create blobs
(this proposal had other problems too, but this is the one you didn't like).
Now it's OK for git-add to sometimes create packs instead of blobs?
I would not have predicted that!
Going back to loose objects from packs is indeed something I object to
if it becomes part of a work flow. Objects should move from the loose
space towards the packed space and not the other way around. Sure there
is fetch.unpackLimit, but with the auto-repack recently added to Git
this variable could probably be set even lower.

But having a pack created for huge blobs up front has many advantages,
the most obvious is the fact that later repack can combine and/or send
those single-blob packs with almost no cost.

Loose objects are meant to be blazingly fast to create. Once repacked
they have no advantage being loose again. Obviously when your blob is
huge you won't benefit much from a loose object.


Nicolas
Jakub Narebski
2007-11-26 20:17:55 UTC
Post by Dana How
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Currently data can be quickly copied from pack to pack,
but data cannot be quickly copied blob->pack or pack->blob
(there was an alternate blob format that supported this,
but it was deprecated). Using the pack format for blobs
would fix this. It would also mean blobs wouldn't need to
be uncompressed to get the blob type or size I believe.
Could you do some benchmark for repository with your large objects
as loose objects created with and without core.legacyHeaders (created
with git pre 1.5.3), and as single blob packs, perhaps kept, with
_undocumented_ (except for RelNotes) gitattribute delta unset for
those files?
- We used to have core.legacyheaders configuration, when
set to false, allowed git to write loose objects in a format
that mimicks the format used by objects stored in packs. It
turns out that this was not so useful. Although we will
continue to read objects written in that format, we do not
honor that configuration anymore and create loose objects in
the legacy/traditional format.

- "pack-objects" honors "delta" attribute set in
.gitattributes. It does not attempt to deltify blobs that
come from paths with delta attribute set to false.

- diff-delta code that is used for packing has been improved
to work better on big files.

The last part is thanks to your comments, complaints and efforts, Dana.
--
Jakub Narebski
Poland
Dana How
2007-11-26 20:36:27 UTC
Post by Jakub Narebski
Post by Dana How
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Currently data can be quickly copied from pack to pack,
but data cannot be quickly copied blob->pack or pack->blob
(there was an alternate blob format that supported this,
but it was deprecated). Using the pack format for blobs
would fix this. It would also mean blobs wouldn't need to
be uncompressed to get the blob type or size I believe.
Could you do some benchmark for repository with your large objects
as loose objects created with and without core.legacyHeaders (created
with git pre 1.5.3), and as single blob packs, perhaps kept, with
_undocumented_ (except for RelNotes) gitattribute delta unset for
those files?
First of all, this is a very reasonable request and what I should be doing.
Unfortunately, I only have the cycles at the moment to point out this
issue, which appears to be a problem from my perspective.

Currently,
a user who wants to publish some (large) files does the following:
git add (calls deflate)
git commit
git push (builds a pack to stdout, calling inflate and deflate on each blob).

So if the blob and pack formats were more similar (different blob format,
big blobs are singleton packs, etc) the zlib calls in git push go away.
The deflate call could be sped up by using 1 for compression level,
but it still takes time.

Another "solution" is to make each workgroup member's .git/objects
be a symlink to a tree with a lot of sticky bits and do some scripting.
(This means "git push" doesn't push any data and only alters stuff
in .git/refs/heads on the server.)
I'm not entirely enthusiastic about this, and when I mentioned it a while
ago it did cause some retching...
Post by Jakub Narebski
- We used to have core.legacyheaders configuration, when
set to false, allowed git to write loose objects in a format
that mimicks the format used by objects stored in packs. It
turns out that this was not so useful. Although we will
continue to read objects written in that format, we do not
honor that configuration anymore and create loose objects in
the legacy/traditional format.
- "pack-objects" honors "delta" attribute set in
.gitattributes. It does not attempt to deltify blobs that
come from paths with delta attribute set to false.
- diff-delta code that is used for packing has been improved
to work better on big files.
The last part is thanks to your comments, complaints and efforts, Dana.
Yes, there have been some very useful improvements recently.

However, I didn't actually push for the first "-" you list;
I was pushing for the "mimic" option even then
but some argument was presented to me against it,
to which I had no counter-argument until I understood git better later.

Thanks,
--
Dana L. How ***@gmail.com +1 650 804 5991 cell
Shawn O. Pearce
2007-11-27 01:25:18 UTC
Post by Dana How
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
Currently data can be quickly copied from pack to pack,
but data cannot be quickly copied blob->pack or pack->blob
I agree with Nico's comment that you probably don't need pack->loose
object as it's just not something you want to do. But otherwise
above you mean "loose->pack" or "pack->loose" as blob is one type
of loose object but there are others (tree, commit, tag).
Post by Dana How
(there was an alternate blob format that supported this,
but it was deprecated). Using the pack format for blobs
would fix this. It would also mean blobs wouldn't need to
be uncompressed to get the blob type or size I believe.
The alternate format for loose objects *was* the packfile format,
but without the packfile header or trailer as that was really
quite unnecessary for a single object storage.

Unfortunately we removed that alternate format from the system.
We can't create it anymore. We can't efficiently copy it to the
packfile anymore. But we can still read it in case someone still
has loose objects using that alternate format in their repository.

I was sad when Nico removed the format in 726f852b0ed7e. I can
understand why he did so but I think it was a move in the wrong
direction.
--
Shawn.
Nicolas Pitre
2007-11-27 05:07:41 UTC
Permalink
Post by Shawn O. Pearce
Post by Dana How
(there was an alternate blob format that supported this,
but it was deprecated). Using the pack format for blobs
would fix this. It would also mean blobs wouldn't need to
be uncompressed to get the blob type or size I believe.
The alternate format for loose objects *was* the packfile format,
but without the packfile header or trailer as that was really
quite unnecessary for a single object storage.
What I'm suggesting, though, is to actually create a real pack for those
blobs where the recompression is really an issue. All the code is
there and only needs to be called.

In most usage cases, though, the proportion of blobs that gets copied
directly into a pack is minimal, and even then they don't amount to a
lot of cycles compared to the majority of deltified objects.

(yeah, "deltified" is said to be wrong by some, but it is really
convenient a word.)
Post by Shawn O. Pearce
I was sad when Nico removed the format in 726f852b0ed7e. I can
understand why he did so but I think it was a move in the wrong
direction.
I wish I could convince you otherwise by now.


Nicolas
Shawn O. Pearce
2007-11-27 01:48:04 UTC
Permalink
Post by Jakub Narebski
If you would write git from scratch now, from the beginning, without
concerns for backwards compatibility, what would you change, or what
would you want to have changed?
- Sort tree entries by name, *not* by name+type

This has got to be my biggest gripe with Git. I think Linus really
screwed the pooch with this. We've talked it over a few times
on the list and he and I have just agreed to disagree on this.

Ask any database person and they'll tell you how wrong the
current tree ordering is. Or they are nuts and don't get
the concept of data integrity.

Linus' excuse is that the current ordering makes working with
the flat index faster as it's just one index file. That doesn't
mean that the flat index file can't contain tree information.
Like it does in say that new fangled cache-tree extension. :-)

This particular "design decision" has brought all sorts of bugs
into the system, like the D/F merge conflict issues, and even one
from Linus himself when he first introduced the submodule support.
Let's not even talk about how ugly that made things in jgit.


- Loose objects storage is difficult to work with

The standard loose object format of DEFLATE("$type $size\0$data")
makes it harder to work with as you need to inflate at least
part of the object just to see what the hell it is or how big
its final output buffer needs to be.

It also makes it very hard to stream into a packfile if you have
determined it's not worth creating a delta for the object (or no
suitable delta base is available).

The new (now deprecated) loose object format that was based on
the packfile header format simplified this and made it much
easier to work with.


- No proper libgit

Already been stated but we don't have a great library and we
don't have a good way to build one right now either. A lot of
our internal code assumes die() will abort the process. That's a
very bad assumption to be making inside of a library.


- Binary packed-refs representation

I probably wouldn't have done an ASCII based packed-refs file,
or heck, even loose refs. I probably would have just gone with
a binary file that we wholesale rewrite every time there is any
sort of ref update.

We already do this with the index. So every time we update a
file path we are rewriting the entire index. And we update
file paths a heck of a lot more often than we update branch
heads. Or tags.

But tools like for-each-ref get invoked heavily, and fast access
to the ref database is important to overall performance.


- No GIT_OBJECT_DIRECTORY vs. GIT_DIR distinction

This causes problems when you use $GIT_DIR/objects/info/alternates
and then try to repack repositories. Not having the ref space of
the alternates and/or borrowers considered during repacking can
cause all sorts of fun breakage that may be hard to recover from.
Plus it means you have to do funny "refs/forkee" hacks just to
avoid pushing unnecessary objects over the wire when the other
end is borrowing objects.

I probably would have had the object directory unified with its
ref database, so that they cannot be accessed individually.


All of the above is written with 20/20 hindsight and all that.

Looking back (and knowing myself well) I think the only item I
would have gotten right if I had written Git from scratch is the
first one above (the tree entry ordering). I probably would have
done something equally "as bad" as what we have today for all of
the others...
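
The tree entry ordering complained about in the first item can be illustrated with a small sketch. It assumes, as I understand git's comparison, that a directory entry sorts as if its name had a trailing "/" while a file sorts by its bare name; the entry names and the helper git_tree_key are made up for the example.

```python
def git_tree_key(name: str, is_dir: bool) -> bytes:
    """Sort key that mimics name+type ordering: directories get a '/' suffix."""
    raw = name.encode("utf-8")
    return raw + b"/" if is_dir else raw

# (name, is_directory) pairs; hypothetical tree contents.
entries = [("foo", True), ("foo.c", False), ("foo-bar", False)]

# Ordering by name alone, which is what the post argues for.
by_name = sorted(entries, key=lambda e: e[0].encode("utf-8"))

# Ordering by name+type, which is what trees use today.
by_git = sorted(entries, key=lambda e: git_tree_key(*e))

print([name for name, _ in by_name])
print([name for name, _ in by_git])
```

The directory "foo" moves from first to last because '/' (0x2f) compares higher than both '-' (0x2d) and '.' (0x2e), so the position of a name depends on whether it is a file or a directory.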
--
Shawn.
Junio C Hamano
2007-11-27 01:54:08 UTC
Permalink
Post by Shawn O. Pearce
All of the above is written with 20/20 hindsight and all that.
Looking back (and knowing myself well) I think the only item I
would have gotten right if I had written Git from scratch is the
first one above (the tree entry ordering). I probably would have
done something equally "as bad" as what we have today for all of
the others...
... not to mention countless others you would get wrong that you did not
list in the above, as the current git got them right ;-)
Shawn O. Pearce
2007-11-27 01:59:42 UTC
Permalink
Post by Junio C Hamano
Post by Shawn O. Pearce
All of the above is written with 20/20 hindsight and all that.
Looking back (and knowing myself well) I think the only item I
would have gotten right if I had written Git from scratch is the
first one above (the tree entry ordering). I probably would have
done something equally "as bad" as what we have today for all of
the others...
... not to mention countless others you would get wrong that you did not
list in the above, as the current git got them right ;-)
Indeed.

Which is why nobody is looking to rewrite Git from scratch.

Except myself and a few other nuts who want a pure Java
implementation for Eclipse plugins. :-)
--
Shawn.
Jakub Narebski
2007-11-27 02:15:55 UTC
Permalink
Post by Shawn O. Pearce
Post by Junio C Hamano
... not to mention countless others you would get wrong that you did not
list in the above, as the current git got them right ;-)
Indeed.
Which is why nobody is looking to rewrite Git from scratch.
Except myself and a few other nuts who want a pure Java
implementation for Eclipse plugins. :-)
And the project to implement git in C# / Mono (I wonder what is
the status of those implementations...)
--
Jakub Narebski
Poland
Johannes Schindelin
2007-11-27 11:47:44 UTC
Permalink
Hi,
Post by Jakub Narebski
Post by Shawn O. Pearce
Post by Junio C Hamano
... not to mention countless others you would get wrong that you did not
list in the above, as the current git got them right ;-)
Indeed.
Which is why nobody is looking to rewrite Git from scratch.
Except myself and a few other nuts who want a pure Java
implementation for Eclipse plugins. :-)
And the project to implement git in C# / Mono (I wonder what is
the status of those implementations...)
See for yourself. I started listing some plumbings in
http://git.or.cz/gitwiki/Plumbings, but it seems that the homepage points
to a wrong URL for the repo. The correct one is
http://repo.or.cz/w/Widgit.git.

Hth,
Dscho
Nicolas Pitre
2007-11-27 04:58:55 UTC
Permalink
Post by Shawn O. Pearce
- Loose objects storage is difficult to work with
The standard loose object format of DEFLATE("$type $size\0$data")
makes it harder to work with as you need to inflate at least
part of the object just to see what the hell it is or how big
its final output buffer needs to be.
It is a bit cumbersome indeed, but I'm afraid we're really stuck with it
since every object SHA1 depends on that format.
Post by Shawn O. Pearce
It also makes it very hard to stream into a packfile if you have
determined it's not worth creating a delta for the object (or no
suitable delta base is available).
The new (now deprecated) loose object format that was based on
the packfile header format simplified this and made it much
easier to work with.
Not really. Since separate zlib compression levels for loose objects
and packed objects were introduced, there was a bunch of correctness
issues. What do you do when both compression levels are different?
Sometimes ignore them, sometimes not? Because the default loose object
compression level is about speed and the default pack compression level
is about good space reduction, the correct thing to do by default would
have been to always decompress and recompress anyway when copying an
otherwise unmodified loose object into a pack.


Nicolas
Dana How
2007-11-27 05:59:15 UTC
Permalink
Post by Nicolas Pitre
Post by Shawn O. Pearce
- Loose objects storage is difficult to work with
The standard loose object format of DEFLATE("$type $size\0$data")
makes it harder to work with as you need to inflate at least
part of the object just to see what the hell it is or how big
its final output buffer needs to be.
It is a bit cumbersome indeed, but I'm afraid we're really stuck with it
since every object SHA1 depends on that format.
Yes, now I remember: this was the same argument you used to
convince me that losing the "new" (deprecated) loose format was OK.

However, if we changed
WRITE(DEFLATE(SHA1("$type $size\0$data")))
(where SHA1(x) = x but has the side-effect of updating the SHA-1)
to
WRITE($pack_style_object_header)
SHA1("$type $size\0")
WRITE(DEFLATE(SHA1($data)))
then the SHA-1 result is the same but we get the pack-style header,
and blobs can be sucked straight into packs when not deltified.
The SHA-1 result is still usable at the end to rename the temporary
loose object file
(and put it in the correct xx subdirectory).

Because we can't change the SHA-1 result we unfortunately can
never drop the 2nd call above [this is something that could
have been different, to respond to the email that started this thread].
You didn't like the duplication between the 1st and 2nd call,
but I can't say I see that as a big deal.
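
A rough sketch of the scheme above, in Python rather than git's C: the "$type $size\0" prefix is fed to SHA-1 but never written, a pack-style header is written instead, and only the data is deflated. The header layout (3 type bits and 4 size bits in the first byte, then 7 size bits per continuation byte) follows the pack entry header as I understand it; the function names and the 4096-byte chunk size are assumptions for illustration, not git's actual code.

```python
import hashlib
import zlib

OBJ_BLOB = 3  # type code used for blobs in pack entry headers

def pack_style_header(type_code: int, size: int) -> bytes:
    """Encode type and size; the high bit of a byte means 'more size bytes'."""
    byte = (type_code << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)

def read_pack_style_header(buf: bytes):
    """Return (type_code, size, header_length) without calling zlib at all."""
    byte = buf[0]
    type_code = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift, pos = 4, 1
    while byte & 0x80:
        byte = buf[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return type_code, size, pos

def write_object(data: bytes, level: int = 1):
    """Hash "$type $size\\0" + data, but store header + DEFLATE(data)."""
    sha = hashlib.sha1()
    sha.update(f"blob {len(data)}\0".encode("ascii"))  # hashed, not written
    deflater = zlib.compressobj(level)
    out = bytearray(pack_style_header(OBJ_BLOB, len(data)))
    for start in range(0, len(data), 4096):
        chunk = data[start:start + 4096]
        sha.update(chunk)                # same bytes feed the hash...
        out += deflater.compress(chunk)  # ...and the deflate stream
    out += deflater.flush()
    return sha.hexdigest(), bytes(out)

oid, stored = write_object(b"hello\n")
type_code, size, header_len = read_pack_style_header(stored)
print(oid, type_code, size)
```

Everything after header_len is a plain deflate stream of the data, so it could be appended to a pack unchanged, and the type and size are readable without inflating anything.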
Post by Nicolas Pitre
Post by Shawn O. Pearce
It also makes it very hard to stream into a packfile if you have
determined it's not worth creating a delta for the object (or no
suitable delta base is available).
The new (now deprecated) loose object format that was based on
the packfile header format simplified this and made it much
easier to work with.
Not really. Since separate zlib compression levels for loose objects
and packed objects were introduced, there was a bunch of correctness
issues. What do you do when both compression levels are different?
Sometimes ignore them, sometimes not? Because the default loose object
compression level is about speed and the default pack compression level
is about good space reduction, the correct thing to do by default would
have been to always decompress and recompress anyway when copying an
otherwise unmodified loose object into a pack.
Not exactly. I did think about this. When you are packing to stdout,
and only sending the resulting packfile locally, you don't want to
bother with recompressing everything. [This is the "workgroup" case
that concerns me.] Other cases, sure,
recompression could help (e.g., packing to a file means the file
will probably be around for a while, so you want to recompress
if the levels are unequal; and you probably want to recompress
as well if the packfile will be sent over a "slow" link).
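
The cases above could be written down as a small decision function. This is only a sketch of the policy as described in this message, not anything git implements; the Destination names and should_recompress are hypothetical.

```python
from enum import Enum

class Destination(Enum):
    LOCAL_STDOUT = "pack streamed to stdout and consumed locally"
    PACK_FILE = "pack written to a file that will stay around"
    SLOW_LINK = "pack sent over a slow network link"

def should_recompress(loose_level: int, pack_level: int,
                      dest: Destination) -> bool:
    """Decide whether to inflate and re-deflate a loose object when packing."""
    if dest is Destination.LOCAL_STDOUT:
        # The "workgroup" case: copying the deflated bytes as-is is cheapest.
        return False
    # For long-lived files and slow links, honor the pack compression level
    # whenever it differs from the level the loose object was written with.
    return loose_level != pack_level

print(should_recompress(1, 9, Destination.LOCAL_STDOUT))
print(should_recompress(1, 9, Destination.PACK_FILE))
```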

Thanks,
--
Dana L. How ***@gmail.com +1 650 804 5991 cell
Shawn O. Pearce
2007-11-27 06:12:10 UTC
Permalink
Post by Dana How
Post by Nicolas Pitre
It is a bit cumbersome indeed, but I'm afraid we're really stuck with it
since every object SHA1 depends on that format.
Yes, now I remember: this was the same argument you used to
convince me that losing the "new" (deprecated) loose format was OK.
However, if we changed
WRITE(DEFLATE(SHA1("$type $size\0$data")))
(where SHA1(x) = x but has the side-effect of updating the SHA-1)
to
WRITE($pack_style_object_header)
SHA1("$type $size\0")
WRITE(DEFLATE(SHA1($data)))
then the SHA-1 result is the same but we get the pack-style header,
and blobs can be sucked straight into packs when not deltified.
The SHA-1 result is still usable at the end to rename the temporary
loose object file
(and put it in the correct xx subdirectory).
Hah. That's exactly what the "new" (deprecated) format was, and what
its code for creating such objects looked like in sha1_file.c. :-)
--
Shawn.
Linus Torvalds
2007-11-27 16:33:43 UTC
Permalink
Post by Nicolas Pitre
Post by Shawn O. Pearce
- Loose objects storage is difficult to work with
The standard loose object format of DEFLATE("$type $size\0$data")
makes it harder to work with as you need to inflate at least
part of the object just to see what the hell it is or how big
its final output buffer needs to be.
It is a bit cumbersome indeed, but I'm afraid we're really stuck with it
since every object SHA1 depends on that format.
No.

The SHA1 itself just depends on "$type $size\0$data" (no deflate phase),
and that one is easy and cheap to calculate. How we then *encode* the data
on disk is totally immaterial.
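
A minimal Python sketch of that point, assuming nothing beyond the "$type $size\0$data" layout quoted above: the ID is computed over the uncompressed bytes, so the same object deflated at two different levels still has the same name. The helper names are made up for the example.

```python
import hashlib
import zlib

def git_object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 over the uncompressed "$type $size\\0$data" byte string."""
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()

def loose_encode(obj_type: str, data: bytes, level: int) -> bytes:
    """Traditional loose encoding: DEFLATE("$type $size\\0$data")."""
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return zlib.compress(header + data, level)

def id_from_loose(encoded: bytes) -> str:
    """Recover the ID from loose-file bytes: inflate first, then hash."""
    return hashlib.sha1(zlib.decompress(encoded)).hexdigest()

blob = b"hello\n"
fast = loose_encode("blob", blob, 1)   # speed-oriented level
small = loose_encode("blob", blob, 9)  # size-oriented level

print(git_object_id("blob", blob))
print(id_from_loose(fast) == id_from_loose(small))
```

The first line printed should match what "git hash-object" reports for a file containing "hello" followed by a newline.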

In fact, pack-files obviously do not encode it in that form at all, they
in fact use two different forms of "$binaryhdr$DEFLATE($data)" or
"$binaryhdr$basesha$DEFLATE($delta)" (that's from memory, so don't rely on
that).

So we could easily change the on-disk format, and we obviously have - the
alternate (but deprecated) format for unpacked objects already did. In
fact, we could - and probably should - add some kind of "back end
interface" for alternate encoding formats, in case somebody wants to do
something really crazy like use a database for object tracking.

(Side note: using an actual database would really be insane. There is
absolutely zero point. But what *could* be interesting would be to have a
"cluster back-end" for the git object store, where objects get hashed to
different nodes. If you have a really fast network, it may actually be
beneficial to spread the objects out, and get better disk throughput by
that kind of strange "git object RAID-0 striping" setup)
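
A toy sketch of what such a back end might look like, with in-memory dicts standing in for cluster nodes. Nothing like this exists in git; the class name, its methods, and the choice of the first 8 hex digits as the placement key are all assumptions for illustration.

```python
import hashlib

class ShardedObjectStore:
    """Toy object store that spreads objects over several nodes by ID."""

    def __init__(self, node_count: int):
        # Each dict stands in for one storage node in the cluster.
        self.nodes = [{} for _ in range(node_count)]

    def node_index(self, oid: str) -> int:
        # SHA-1 output is evenly spread, so a prefix of the ID picks a node.
        return int(oid[:8], 16) % len(self.nodes)

    def put(self, obj_type: str, data: bytes) -> str:
        store = f"{obj_type} {len(data)}\0".encode("ascii") + data
        oid = hashlib.sha1(store).hexdigest()
        self.nodes[self.node_index(oid)][oid] = (obj_type, data)
        return oid

    def get(self, oid: str):
        return self.nodes[self.node_index(oid)][oid]

store = ShardedObjectStore(node_count=4)
oid = store.put("blob", b"hello\n")
print(oid, "lives on node", store.node_index(oid))
```

Because the object name alone determines the node, any client can find an object without a central lookup table.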

Linus

(*) Honesty in advertising: the really *original* format did the SHA1
after the deflate, but that was quickly fixed and was a really stupid
choice. The main point for doing that was that it meant that loose objects
could be verified by just running "sha1sum" on them, and comparing the
result with their name.