Tool ideology

Post author: PhilGoetz 09 September 2011 10:37PM 25 points

Follow-up to Journal article about politics and mindkilling.  That post showed that people can be convinced that a view is correct by being told that their political party endorses it, even if their party actually opposes it.  A similar but stranger effect is that people can be convinced that a view is correct because their favorite software implements it, even if they said the view was wrong just minutes ago.

Subversion is a popular version-control system used by software developers.  The "repository" is where Subversion stores the definitive copy of each file it is keeping track of.  A "diff" is when you ask Subversion to show you all the differences between your new code and the last version of that code that it knows was in the repository.  A "tag" is when you associate a set of versions of different files together, so that they can easily be compared against or reverted to, without creating something called a "branch".

I've had variants of this conversation about Subversion three times so far:

Me:  [problem X] happened because Subversion doesn't let you diff against the repository.

Other:  What?

Me: Subversion doesn't let you diff against the repository.  It only diffs against its local copy of the repository, which it updates when you do a checkout, commit, or update.  So it won't show you any changes that someone else has made to the code since then.  You can never see what changes other people have made since your last commit, because to get the changes, you have to do an update; and the changes are added to your code without being shown to you, and a diff won't show them.  So you just cross your fingers and hope their changes are compatible with yours.

Other:  That's ridiculous!  Subversion doesn't do that.  That would defeat the whole purpose of version control.  That would be idiotic.

Me:  Really, it does that.  I've tried it.  Repeatedly.  I've wasted days of work because of it.

Other:  Nonsense.  Of COURSE Subversion diffs against the repository.

Me:  Try this:  Create a new file foo in your checkout in directory X.  Then svn add foo and svn commit foo.  Check out the same repository in directory Y.  Modify foo in directory X.  svn commit foo.  Then do an svn diff foo from directory Y.
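The experiment above can be mimicked with a toy model. This is a minimal Python sketch, not Subversion's implementation: the `Repository` and `WorkingCopy` classes are hypothetical stand-ins, and the only behaviour they assume is the one described in the post, namely that a plain `svn diff` compares your working file against the local pristine copy from your last checkout, update or commit, without contacting the server.

```python
import difflib
from dataclasses import dataclass, field

@dataclass
class Repository:
    """Hypothetical server: revision N is a dict mapping path -> file text."""
    revisions: list = field(default_factory=lambda: [{}])  # revision 0 is empty

    @property
    def head(self) -> int:
        return len(self.revisions) - 1

    def commit(self, path: str, text: str) -> int:
        snapshot = dict(self.revisions[-1])
        snapshot[path] = text
        self.revisions.append(snapshot)
        return self.head

@dataclass
class WorkingCopy:
    """Hypothetical checkout: edited files plus a pristine copy of what the server last sent."""
    repo: Repository
    pristine: dict = field(default_factory=dict)
    files: dict = field(default_factory=dict)

    def checkout(self) -> None:
        self.pristine = dict(self.repo.revisions[self.repo.head])
        self.files = dict(self.pristine)

    def commit(self, path: str) -> None:
        self.repo.commit(path, self.files[path])
        self.pristine[path] = self.files[path]  # pristine copy now matches what was sent

    def diff(self, path: str) -> str:
        # Plain `svn diff`: working file vs. local pristine copy; the server is never asked.
        old = self.pristine.get(path, "").splitlines(keepends=True)
        new = self.files.get(path, "").splitlines(keepends=True)
        return "".join(difflib.unified_diff(old, new, "BASE", "working"))

repo = Repository()
x = WorkingCopy(repo)
x.checkout()
x.files["foo"] = "line 1\n"                  # create foo in X; svn add foo; svn commit foo
x.commit("foo")

y = WorkingCopy(repo)
y.checkout()                                 # check out the same repository in Y

x.files["foo"] = "line 1\nline 2 from X\n"   # modify foo in X; svn commit foo
x.commit("foo")

print(repr(y.diff("foo")))                   # prints '' : Y sees no difference
print(repo.revisions[repo.head]["foo"] == y.files["foo"])  # prints False
```

Checkout Y's diff comes back empty even though the repository now holds a newer foo, which is the surprise described in the conversation.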

[A few minutes later, after trying it:]

Other:  Well, of COURSE Subversion doesn't diff against the repository.  It's meant for large, distributed projects.  You wouldn't want to have to do diffs over the web.

Me:  Why?

Other:  It would be too slow.

Me:  You do checkouts and commits and updates over the web.  Are they too slow?

Other:  You want to diff against your previous version.  It would be too confusing to see the changes other people have made, too.

Me:  Three minutes ago you said it would be idiotic not to diff against the repository.

Other:  Look, Subversion is an industry standard!

Me:  Subversion doesn't even let you tag releases.

Other:  Of COURSE Subversion lets you tag releases.

[Conversation eventually ends with Other explaining why you don't need to tag releases anyway.]

 

[P.S. - There is a very long syntax for svn diff that lets you specify full paths to the repository and your checkout directory.  It can't mix paths and URLs, so you have to specify your checkout directory as a complete URL.  No one that I know uses this syntax.]

Comments (66)

Comment author: sketerpot 12 September 2011 03:47:53AM 4 points

Fun fact: it's possible to make a fully distributed version control system that maintains complete history of every branch at all times, down to the individual keystrokes if necessary, on large projects, in realtime, and have it be fast. It can even be peer-to-peer, and operate over an unreliable mesh network, if you like. When people start arguing that version control systems can't do something with reasonable performance, they're usually dead wrong.

Comment author: RobinZ 13 September 2011 10:32:31PM 3 points

That sounds amazingly cool. What version control system(s) are you thinking of, there?

Comment author: sketerpot 14 September 2011 11:26:56PM 2 points

An unreleased, not production-ready VCS that I made so I could finish grad school. :-)

The basic approach is similar to how git works: store all your revision state as a tree, and have code for merging trees. If you choose a good representation for this tree, and take some care with how you implement the merging operations, and do the merge in such a way that you're guaranteed to achieve convergence regardless of merge order, then you can get all those snazzy properties I mentioned earlier.
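The property doing the work here is that merging is commutative, associative and idempotent, so replicas that exchange states in any order end up identical. The sketch below is not sketerpot's unreleased system; it swaps in one of the simplest structures with that property, a last-writer-wins map from paths to `(timestamp, replica_id, contents)` entries, assuming every replica stamps its writes with a unique pair.

```python
from functools import reduce
from itertools import permutations

# Hypothetical replica state: path -> (timestamp, replica_id, contents).
# Assumes each replica never reuses a (timestamp, replica_id) pair.

def merge(a: dict, b: dict) -> dict:
    """Last-writer-wins merge: commutative, associative and idempotent."""
    out = dict(a)
    for path, entry in b.items():
        # Tuples compare element by element: the later timestamp wins and the
        # replica id breaks ties, so every replica picks the same winner.
        if path not in out or entry > out[path]:
            out[path] = entry
    return out

alice = {("src", "main.c"): (3, "alice", "v3"), ("README",): (1, "alice", "hello")}
bob = {("src", "main.c"): (3, "bob", "v3-bob"), ("docs", "guide"): (2, "bob", "draft")}
carol = {("README",): (4, "carol", "hello, world")}

merged = reduce(merge, [alice, bob, carol], {})
# Every merge order yields the same state, so replicas converge.
assert all(reduce(merge, order, {}) == merged for order in permutations([alice, bob, carol]))
assert merge(merged, merged) == merged  # merging the same state again changes nothing
print(merged)
```

A real VCS would also need deletions (typically recorded as tombstone entries rather than removed keys) and a far better conflict policy than last-writer-wins; the sketch only illustrates why order-independent merging converges.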

I've proven asymptotic time bounds and correctness for all the operations, and verified that it actually works the way it's supposed to in real code, but for now this is of mostly theoretical interest. For now.

Comment author: vi21maobk9vp 14 September 2011 08:04:57AM 3 points

It reads like an estimate, not a link to a VCS.

Think of it this way: Vim undo history is a tree which you can walk, visiting every branch (not that it's a thing you want to do). Now, writing all this data out has some cost in IO bandwidth, comparable to the bandwidth of the keyboard, i.e. kBytes/minute. Vim users don't notice the cost of maintaining the tree in RAM.
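A minimal Python model of such an undo tree, assuming only what the comment describes: undoing moves to the parent state, a new edit after undoing starts a new branch instead of discarding the old one, and a chronological step (roughly what Vim's `g-` does) can reach states on abandoned branches. The class and method names are hypothetical; this is not Vim's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    seq: int                          # creation order, like Vim's change numbers
    text: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

class UndoTree:
    def __init__(self, text: str = "") -> None:
        self.nodes = [Node(0, text)]  # every state ever created, in creation order
        self.current = self.nodes[0]

    @property
    def text(self) -> str:
        return self.current.text

    def edit(self, new_text: str) -> None:
        # A new edit always becomes a child of the current state; if we had undone,
        # the old branch stays in the tree instead of being thrown away.
        node = Node(len(self.nodes), new_text, parent=self.current)
        self.current.children.append(node)
        self.nodes.append(node)
        self.current = node

    def undo(self) -> None:            # like `u`: step to the parent state
        if self.current.parent is not None:
            self.current = self.current.parent

    def redo(self) -> None:            # like CTRL-R: follow the newest child
        if self.current.children:
            self.current = self.current.children[-1]

    def older(self) -> None:           # roughly `g-`: previous state in creation order
        if self.current.seq > 0:
            self.current = self.nodes[self.current.seq - 1]

    def newer(self) -> None:           # roughly `g+`: next state in creation order
        if self.current.seq + 1 < len(self.nodes):
            self.current = self.nodes[self.current.seq + 1]

tree = UndoTree()
for text in ["a", "ab", "abc"]:
    tree.edit(text)
for _ in range(3):
    tree.undo()        # back to ""
tree.edit("x")         # careless insertion: starts a new branch
tree.older()           # chronological step back reaches the abandoned branch
print(tree.text)       # prints abc
```

This covers the case of undoing many operations and then carelessly inserting a character: with a linear undo stack the undone text would be gone, but in the tree it is one chronological step away.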

Synchronising it at the first opportunity is also not hard if you do it in the background, so latency can be tolerated most of the time.

As for merges, you can try to do them mostly on marked commits, and then they can be done just as they are done now.

But implementing all that is a great undertaking, to be sure.

Comment author: PhilGoetz 21 September 2011 08:09:18PM 5 points

Wait - vim undo is a tree? So I can get back the revisions that I lost by undoing the last 100 operations and then carelessly inserting a character? HOW?

Comment author: wedrifid 21 September 2011 08:49:17PM 3 points

Wow! It is? I had no idea!

From the looks of it this script might be a helpful way to use the feature.

Comment author: vi21maobk9vp 21 September 2011 08:48:58PM 2 points

Well, now that you know that it is there, you could just type :help undo-tree. Basically, it is about g+/g-.

And the next chapter of undo.txt tells you about saving undo history.

Comment author: sketerpot 14 September 2011 11:29:12PM 2 points

The limiting case of merge frequency is to do a branch and merge on every keystroke, and create something like Etherpad. This is completely practical.

Comment author: RobinZ 14 September 2011 02:25:16PM 0 points

Ooh, I hadn't thought about it that way - sure, it'd take thousands of those to clog a modern high-speed connection.

Comment author: DSimon 11 September 2011 05:39:54AM 1 point

Hmm... but what do they think a few days later? That is: few people are cool-headed enough to bite a bullet and lose face in the middle of an argument. Did the Subversion fan continue to believe that it was dead-sure necessary that it do things the way it does, or did they think to themselves "Maybe it could be done a better way..."

Not to say that you aren't seeing a genuine failure of rationality; if they're willing to update on new evidence only a long while after encountering it during an argument, then (a) that means they'll react slower in situations where reacting faster would get them more utilons, and (b) they're probably less likely even later on to update, since the argument may still have an argh-that-jerk-said-that-nasty-thing-about-my-awesome-stuff vibe clinging to it.

Comment author: PhilGoetz 21 September 2011 08:13:40PM 1 point

Hmm... but what do they think a few days later?

I don't know.

That is: few people are cool-headed enough to bite a bullet and lose face in the middle of an argument.

Why would they lose more face by admitting the tool did the wrong thing, than by admitting they were wrong in saying that it's better to diff against the repository? It appears their loyalty to the tool is more important to them than their beliefs.

Comment author: DSimon 22 September 2011 02:28:30PM 0 points

Why would they lose more face by admitting the tool did the wrong thing, than by admitting they were wrong in saying that it's better to diff against the repository?

Because admitting that the tool did the wrong thing, after saying they're a fan of it, means admitting that they screwed up their tool choice. The harder they promoted it before the problem was revealed to them, the more face they stand to lose.

Comment author: kilobug 10 September 2011 05:21:43PM 4 points

I've fallen for it myself, but mostly for the reverse version of it. Not saying "this feature is obviously required" and then, on learning that my favorite tool (shell, text editor, language, web browser, version control, whatever) doesn't have it, saying "oh, but it's useless"; rather the reverse: someone tells me "you should try tool X, it has that great feature", I answer "I don't see the point of that feature, your tool X is just bloatware, I'll stick to my tool Y", and then, when a later version of tool Y implements that feature, I switch and tell everyone "you should use tool Y, it has that great feature".

It's very similar (and since I became aware that I have a tendency to this bias, I try to force myself not to fall for it again, though not entirely successfully...), but it's somehow more understandable: you very often see the point of a feature or tool only once you've started using it regularly in your own use cases.

I don't think I've done the direct version of it since I was a teenager. At least, I hope I haven't...

Comment author: CronoDAS 11 September 2011 03:30:49AM 5 points

I've read that people only use small subsets of the available features in huge programs such as Microsoft Word, so it would seem like they should be able to get rid of "all those features nobody uses" and make a version without all that complicated bloat that confuses everyone. The problem, however, is that everyone uses a different subset of the feature set, so one person's useless bloat is someone else's essential, obvious feature that they don't know how people would get along without.

"Features seem useless until you get used to them, at which point they become essential" may be a fairly common pattern...

Comment author: Kaj_Sotala 10 September 2011 07:54:11AM 13 points

I remember doing something similar. As kids, a friend and I were trying to figure out something computer-related: how to use some MS-DOS file compression software, I think. My friend suggested using a specific command, which I thought obviously couldn't work. He typed it in anyway, and behold! It did work. I blinked, and then it felt like a floodgate had opened in my mind and an explanation of why it worked came pouring into my consciousness.

I've wondered if this might be a case of constraint propagation. Picture my mind as a network of beliefs, together with some algorithm trying to make sure that they are at least roughly consistent. A bunch of (incorrect) beliefs held with moderate confidence combine to suggest that the belief "this command would work" is incorrect with high confidence. But then I find out that the command does work, and the external evidence changes the value of that node. This forces an update to the beliefs that were connected to it, and the change propagates through the network and adjusts beliefs until I finally have high confidence in a theory that's completely different from what I believed in a minute ago.

Comment author: Matt_Simpson 10 September 2011 09:45:37PM 1 point

After reading the first paragraph, I was going to comment on how this phenomenon is often useful, but your second paragraph implicitly addresses that.

Comment author: pengvado 10 September 2011 02:00:28AM 26 points

You can never see what changes other people have made since your last commit, because to get the changes, you have to do an update

svn diff -rBASE:HEAD to see the changes since your last update.
svn diff -rHEAD to diff your working tree against the repository.
Which does send the diffs over the web, and is inconveniently slow.
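A rough sketch of what the three comparisons show, modelled with Python's `difflib` on a single hypothetical file whose contents are made up: plain `svn diff` compares BASE with your working copy, `-rBASE:HEAD` compares two repository revisions, and `-rHEAD` compares the latest repository revision with your working copy, local edits included.

```python
import difflib

def udiff(old: str, new: str, old_label: str, new_label: str) -> str:
    """Unified diff of two texts, returned as a single string."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True), old_label, new_label))

# Hypothetical versions of one file, seen from a checkout that last updated before a colleague's commit.
base = "line 1\n"                      # BASE: what this checkout last got from the server
head = "line 1\nline 2 from X\n"       # HEAD: latest revision in the repository
working = "line 1\nmy local edit\n"    # working file, with uncommitted local edits

plain = udiff(base, working, "BASE", "working")         # like `svn diff`
incoming = udiff(base, head, "BASE", "HEAD")            # like `svn diff -rBASE:HEAD`
against_repo = udiff(head, working, "HEAD", "working")  # like `svn diff -rHEAD`

for title, text in [("svn diff", plain), ("-rBASE:HEAD", incoming), ("-rHEAD", against_repo)]:
    print(f"== {title}\n{text}")
```

Because `-rHEAD` treats HEAD as the old side, lines that other people added show up with a `-` prefix, which can be confusing at first.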

(I'm not a svn user. I just agreed with the initial reaction of "that's ridiculous", and followed it up with "I bet there really is a way to do that" and looked at the manpage.)

Comment author: PhilGoetz 10 September 2011 03:17:28PM 5 points

Wow, it is right there in svn help diff! I'm going to try this first thing Monday.

Comment author: arundelo 10 September 2011 07:47:56PM 4 points

You might also enjoy svn status's --show-updates switch, which shows what files would be updated if you ran svn update.
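The idea behind `--show-updates` can be sketched as a comparison of two revision numbers per file. This is a simplified Python model with a hypothetical `Entry` type and an output layout only loosely following `svn status`: a file is flagged with `*` when the repository has a revision of it newer than the working copy's BASE. Real `svn status -u` has more columns and also reports files added on the server, which this sketch ignores.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    base_rev: int            # revision this working file was last updated or committed at (BASE)
    locally_modified: bool   # would plain `svn status` show "M"?

def status_show_updates(working: dict, last_changed_on_server: dict) -> list:
    """Flag files whose newest repository revision is later than their BASE revision."""
    lines = []
    for path in sorted(working):
        entry = working[path]
        mod = "M" if entry.locally_modified else " "
        stale = "*" if last_changed_on_server.get(path, 0) > entry.base_rev else " "
        lines.append(f"{mod} {stale} {path}")
    return lines

# Hypothetical state: a colleague committed foo in r2 after we last updated it at r1.
working = {"foo": Entry(base_rev=1, locally_modified=False),
           "bar.c": Entry(base_rev=3, locally_modified=True)}
server = {"foo": 2, "bar.c": 3}

for line in status_show_updates(working, server):
    print(line)
```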

Comment author: PhilGoetz 16 September 2011 03:37:40PM 0 points

It works!

Comment author: handoflixue 15 September 2011 12:16:48AM 0 points

Oh good, I thought I was going crazy :)

Comment author: mkehrt 10 September 2011 02:39:41AM 0 points

This is phenomenal! Thanks!

Comment author: Armok_GoB 10 September 2011 03:38:33PM 1 point

This is part of a larger phenomenon where, for some reason, preferences in software are stored as group loyalties, to the point where one can actually go "yes, that piece of software is [better in way X], but [my favourite software] is better [in general but not in this specific context], and it'd be treason!"

Comment author: fubarobfusco 14 September 2011 08:48:41AM 7 points

Software often has network effects: for instance, in the exchange of good techniques (or, in an open-source world, patches!), compatibility of file formats, and comfort sharing a work environment. And then there are the sunk costs of learning to use a specific piece of software and adapting your work habits to it.

These suggest that users may have a good reason to deter their fellow users from switching away.

Comment author: PhilGoetz 21 September 2011 08:15:06PM 1 point

Perhaps explaining why Apple users are famously evangelical.

Comment author: Armok_GoB 14 September 2011 09:14:14AM 0 points

That's a good point... and also rather scary.

Comment author: calcsam 09 September 2011 11:05:08PM 8 points

[nitpick]

Exchanges are easier to follow if you bold the person speaking.

Comment author: taw 09 September 2011 10:50:24PM 8 points

The happy ending is that nobody uses subversion any more, git won and has none of these problems.

It's up to you how seriously you read my comment.

Comment author: PhilGoetz 09 September 2011 10:54:44PM 1 point

Hee. We still use subversion every day.

Version control systems nowadays suffer from the problem that all new version control systems are created by groups of hackers working on projects so big and complex that the existing systems aren't powerful enough for them. So you keep getting more and more powerful and complex systems. git is so complex that no one who isn't a software developer can use it correctly.

I was tasked with moving a complex natural-language processing program for the NIH from, I think, SVCS, to git. After three days studying git man pages and trying to explain them to a group of linguists, I gave up and put everything under QVCS, and it was smooth sailing after that.

Comment author: pjeby 10 September 2011 02:01:58AM 2 points

git is so complex that no one who isn't a software developer can use it correctly.

Try mercurial. It's got basically the same features, but is more comprehensible to human beings. There's an excellent tutorial called hg init.

(And if you should happen to need to use other people's stuff that's in git, you can just use the git extension for mercurial.)

Comment author: MBlume 10 September 2011 06:15:48AM 6 points

blinks

I was taught to use git within a few days of starting to become a professional programmer. I'm a dyed-in-the-wool fanboy. I probably have no perspective at all here. But whenever I've used Mercurial everything seems backwards. People start recommending that I do wacky-sounding things like making two clones of a repository just to do what I'd normally do with git branch/git checkout... Is there any way to track multiple heads without just making multiple checkouts all over your disk?

Also, I strongly suspect that people who have trouble with git are just having trouble visualizing the DAG in their heads. If you run gitk --all whenever you get confused, you can actually see the thing, and then there's nothing to be confused about.

...Though I suppose the above might just translate to "I'm a visual thinker, and everyone should be more like me."

Comment author: pjeby 11 September 2011 06:12:33AM 3 points

Is there any way to track multiple heads without just making multiple checkouts all over your disk?

Taboo "track" and "checkouts". I don't know what you mean by "track", and Mercurial doesn't have checkouts, as I understand the term. A clone isn't "checked out" of anything. (This was actually the hardest part for me to wrap my head around, coming from Subversion and the central-repository model, but I'm wondering whether you're talking about the same thing or not.)

If you simply mean you want more than one head or branch, you don't need multiple clones. You can switch your working copy between named branches or heads with "hg up", and list them with "hg heads".

It's true that people often suggest just using clones instead of named branches, but IMO this only makes sense for short-lived branches that are going to be folded into something else. Mercurial works just fine with named branches and multiple heads. You can also use bookmarks to give local names to specific heads: a kind of locally-named branch whose name isn't propagated to other repositories.

I strongly suspect that people who have trouble with git are just having trouble visualizing the DAG in their heads.

No, we just read the man pages and run screaming. It's not the model of a change-based system that's the problem, it's the UI design (or lack thereof). ;-)

From an outsider's perspective, git's UI is to mercurial's UI as Perl's is to Python. And since I've programmed almost exclusively in Python for about 13 years now, guess which one looks more attractive to me?

(Note: this doesn't have anything to do with Mercurial's innards being written in Python; other DVCS's have been written in Python and didn't have the same orthogonality of design.)

Comment author: nshepperd 11 September 2011 07:23:29AM 1 point

I'm told git massively improved its interface in the last few years. I started using it mainly in 2010 after switching from bzr, and had little trouble understanding the system (in fact I found hg's interface to be kind of weird). But there you go.

(Also, wrt

Taboo "track" and "checkouts". I don't know what you mean by "track", and Mercurial doesn't have checkouts, as I understand the term. A clone isn't "checked out" of anything.

In git-land "checkout" means a working directory; by "multiple checkouts all over your disk" I assume MBlume means multiple clones of the repository.)

Comment author: shokwave 11 September 2011 06:26:14AM 1 point

git's UI is to mercurial's UI as Perl's is to Python

Harsh!

Comment author: kpreid 10 September 2011 10:58:33AM 3 points

Well, to me, git's DAG model is 100% obvious, and gitk --all is helpful in exactly the way you state, but at the beginning it was still confusing which command, used in which way, would produce the effect on the DAG (and working tree and index...) that I wanted. Similarly, the commands to configure and manipulate branches and remotes are not entirely obvious, especially if you've gotten off the beaten path and want to fix your config to be normal.

Comment author: MBlume 09 September 2011 11:00:46PM 1 point

Git is new. It's already gotten easier to use (I'm too much of a newb to have ever used the Git of Yore, which supposedly needed a CS PhD to use effectively), and the folks at GitHub in particular seem to be working hard at sanding down its rough edges.

Comment author: PhilGoetz 09 September 2011 11:32:06PM 1 point

My experience with git was in 2006 or 2007.

Comment author: taw 10 September 2011 10:25:02AM 4 points

This is quite ancient. git started as a solution to the technical problem of high-performance distributed version control. They got the user interface into something reasonable only later.

Comment author: wnoise 14 September 2011 10:58:03PM 0 points

It's still not that great. The internal DAG model is quite clean and clear. The actual commands do not always map clearly to this model. One common failure is often hiding or doing implicit magic to the staging area. Another is that many commands are a mish-mash of "manipulate the DAG" and "common user operations", where doing only one or the other would be much clearer. I really doubt that the user interface will get much better, because to do so they really need to throw out backward compatibility.

Comment author: vi21maobk9vp 15 September 2011 05:18:40AM 0 points

There are some problems with the DAG, too, because you are supposed to store the information with little meta-information.

There are precedents for tools that wrap Git's command-line interface, so that part could possibly be fixed. I frankly do not know why nobody does it.

Comment author: vi21maobk9vp 11 September 2011 08:19:32AM 0 points

Of course, Subversion is still the "majority" VCS even for open-source projects. Maybe people need something other than Git to change that, or maybe SVK should become a more widespread way to use SVN.

And for the sake of speed and stability, Git doesn't store some data that every other open-source DVCS does store. I have heard some Git users say that this is an acceptable tradeoff (which is true for them) and others say that nobody should care about this kind of data.

Of course, a better tool is never a solution to tool ideology. Neither is evaluating multiple other tools: after doing that with DVCSes, I now hate Git and implicitly assume that every tradeoff it makes is unfit for the medium-sized projects I care about.

Comment author: taw 13 September 2011 10:57:26AM 2 points

I would guess that git is already more popular than svn for new projects (see github), and in at least some circles like among Ruby programmers still using svn for new stuff would raise some eyebrows. It's definitely way past just early adopters, but I have no idea how to get reasonable vcs usage statistics.

I don't know what you mean by these tradeoffs; git tends to store more data, not less.

Comment author: vi21maobk9vp 13 September 2011 04:40:32PM 0 points

Well, Git stores the code itself, but for everything else it stores less data than SVN, Bazaar, Mercurial, Monotone or Veracity.

It doesn't track explicit directory renaming. It doesn't keep branch history; if it did, the reflog (which is local and only kept for 30 days) wouldn't be needed. It only allows unique tags, so if you want to mark every revision where the buildbot succeeded, to make both updating and rolling back easy, you are out of luck (there may be a way, but it is not at all obvious).

Comment author: Vladimir_Nesov 13 September 2011 05:07:09PM 1 point

It doesn't track explicit directory renaming.

It knows each directory by its content, so it knows when a directory was renamed, without needing to be explicitly told.
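A sketch of why content addressing gives a directory an identity independent of its own name, assuming git's classic SHA-1 object format: each object is hashed as a `"<type> <size>"` header, a NUL byte, and its body, and a tree's body lists `mode name`, NUL, binary-ID entries sorted by name. The `object_id` and `tree_id` helpers are hypothetical, and `tree_id` skips empty directories to mimic the fact that git only records files. Git's own rename detection actually compares file contents when a diff is computed; the sketch only shows that a renamed subtree hashes to the same object.

```python
import hashlib
from typing import Optional

def object_id(kind: str, body: bytes) -> bytes:
    """SHA-1 over a "<kind> <size>" header, a NUL byte, and the body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).digest()

def tree_id(directory: dict) -> Optional[bytes]:
    """directory maps name -> bytes (file contents) or dict (subdirectory).

    Returns None for a tree with no files, mimicking how git never stores
    an empty directory.
    """
    entries = []
    for name, item in directory.items():
        raw = name.encode()
        if isinstance(item, dict):
            sub = tree_id(item)
            if sub is None:
                continue
            # Directories sort as if their name ended in "/"; mode 40000 marks a tree.
            entries.append((raw + b"/", b"40000 " + raw + b"\0" + sub))
        else:
            entries.append((raw, b"100644 " + raw + b"\0" + object_id("blob", item)))
    if not entries:
        return None
    entries.sort(key=lambda entry: entry[0])
    return object_id("tree", b"".join(body for _, body in entries))

before = {"lib": {"util.py": b"print('hi')\n"}, "README": b"x\n"}
after = {"library": {"util.py": b"print('hi')\n"}, "README": b"x\n"}

print(tree_id(before["lib"]) == tree_id(after["library"]))  # prints True: same subtree object
print(tree_id(before) == tree_id(after))                    # prints False: parent records the name
print(tree_id({"log": {}, "README": b"x\n"}) == tree_id({"README": b"x\n"}))  # prints True
```

The last line also illustrates the empty-directory complaint: a repository with an empty `log/` directory gets the same tree ID as one without it.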

It doesn't keep branch history - if it did, reflog (which is local and only kept for 30 days) wouldn't be needed.

Reflog is an essentially local thing, it shows where a branch used to point in a particular repository instance. It has little to do with history of the project, and often includes temporary commits that shouldn't be distributed.

It only allows unique tags - so if you want to mark every revision where buildbot succeeded to make both update and rolling back easy - you are out of luck

You need some way to specify what you'd want to update or roll back to - what kind of use case are you thinking about? You could support a successful-build branch, for example, so that its tip points to the last successful build, and you could create merge commits with previous successful builds as first parents, for the purpose of linking together all successful builds in that branch.
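A toy Python model of the successful-build branch suggested here: each time a build passes, a marker commit is created whose first parent is the previous marker and whose second parent is the commit that passed, so walking first parents from the branch tip (the idea behind `git log --first-parent`) lists every successful build, newest first. `Commit`, `record_success` and `successful_builds` are hypothetical names; in real git each marker would be an ordinary merge commit on a dedicated branch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Commit:
    name: str
    parents: Tuple["Commit", ...] = ()

def record_success(builds_tip: Optional[Commit], tested: Commit, marker: str) -> Commit:
    """New marker commit: first parent = previous marker, second parent = commit that passed."""
    parents = (builds_tip, tested) if builds_tip is not None else (tested,)
    return Commit(marker, parents)

def successful_builds(builds_tip: Optional[Commit]) -> List[str]:
    """Walk first parents through the markers; each marker's last parent is a tested commit."""
    found = []
    node = builds_tip
    while node is not None:
        found.append(node.parents[-1].name)
        # Only two-parent markers link back to an earlier marker; the first marker ends the walk.
        node = node.parents[0] if len(node.parents) == 2 else None
    return found

# Hypothetical trunk history c1 <- c2 <- c3 <- c4, where c1, c3 and c4 passed the build.
c1 = Commit("c1")
c2 = Commit("c2", (c1,))
c3 = Commit("c3", (c2,))
c4 = Commit("c4", (c3,))

builds = None
for tested, marker in [(c1, "ok-1"), (c3, "ok-2"), (c4, "ok-3")]:
    builds = record_success(builds, tested, marker)

print(successful_builds(builds))  # prints ['c4', 'c3', 'c1']
```

To roll back, pick the newest entry older than the breakage. In this model the first marker has only one parent and the walk stops there, whereas `git log --first-parent` on a real branch would continue into trunk history.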

Comment author: vi21maobk9vp 13 September 2011 08:57:38PM 1 point

Tracking paths by their content is not always good... It couples content changes with intent changes. If I need to make a copy of a directory and then make the copies slowly diverge until they have nothing in common, I may want to mark which one carries the original intent and which is the spin-off.

Branch history is not an inherently local thing.

When I have feature branches for recurring tasks, I will probably always give them the same name. I will sometimes merge them with trunk and sometimes spawn them from the new trunk state. Later, I may wish to check whether some change was made in trunk or in the feature branch, which is quite likely to provide some information about the commit's intent. I can get this easily in every DVCS I know except Git; in Git I need to trace the DAG to get this information.

About the successful-build branch: for some projects I try to update to trunk head, and if it gives me too much trouble I look for the closest previous revision that I can expect to work. In Monotone I simply mark some development commits as tested enough, and there is a simple command to get all the branch.tested commits from the last month. This information says something about a commit, and to lose it I have to do something to the certificate that states it. In Git, rev-list behaviour depends on many things that happen later.

Linux kernel history is too big for any of what I say to make sense for it. But in a medium-sized project, I want to have access to strange parts of history to find out what happened, how, and what we meant.

Comment author: wedrifid 13 September 2011 05:37:23PM 0 points

It knows each directory by its content, so it knows when a directory was renamed, without needing to be explicitly told.

Doesn't work so well if the content is 'nothing'.

Comment author: Vladimir_Nesov 13 September 2011 05:57:36PM 1 point

Git doesn't notice these at all.

Comment author: wedrifid 13 September 2011 06:34:54PM 0 points

Which is my point exactly. It is one aspect of Vi's criticism of git not storing some important data that is clearly valid. It is a tradeoff that probably doesn't matter if you are Linus and you are storing code for a Linux kernel but in other cases it is a blatant flaw that needs to be worked around via compromises or kludges.

Git is the absolute worst version control system out there (except for all the others).

Comment author: thomblake 13 September 2011 08:16:16PM 0 points

In what situations would you want to store an empty directory and pay attention to whether it is renamed?

Comment author: taw 14 September 2011 09:58:31PM 3 points

Empty directories are sometimes necessary, and it's a pain in the ass that git cannot store them at all. I had to put almost-empty README.txt files in directories like log/ in many projects. It's a minor annoyance more than anything else.

Comment author: vi21maobk9vp 13 September 2011 08:35:03PM 1 point

I have a complex enough deployment helper living in a Monotone repository, for which it is simpler and more natural to keep a few empty directories in the repository than to check for and create them from each of ten shell scripts. It is checkout-and-use, no other setup makes sense, so "just creating them in the Makefile" would be suboptimal.

Comment author: wedrifid 14 September 2011 04:43:25AM 0 points

Is that something I need a justification for? My version control system throws away stuff that I am trying to store. I'd also prefer it not to throw away files starting with 'b'.

I've learned to make my programs pessimistic and recreate the file system if necessary. It surprised me a few times before I learned the quirks.

Comment author: Bugmaster 11 September 2011 06:10:13AM 0 points

Huh, that's pretty weird. Live and learn! FWIW, both TortoiseSVN and Eclipse seem to diff against the repository (or at least they have an option to easily do so).

It's been a long time since I'd used the command-line svn tools, so I am not surprised (though saddened) that I didn't know how they worked in detail.