March 20, 2007, 2:40 p.m.

Centralized Revision Control is Stupid

This has been bothering me for quite a while, so I need to make an official written statement. This is a public service announcement: centralized revision control systems are stupid. All of them. It's just irresponsible to use them in this modern day.

Distributed systems are just better in every conceivable way. You can emulate central revision control systems with distributed systems, but you can't emulate distributed systems with central systems.

If you haven't used a distributed system, you're probably thinking to yourself, "This is ridiculous. Why would I want a distributed system? There's just one of me!" Do you have a laptop? Is your centralized server ever unreachable to you for any reason? Have you ever had anyone else wanting to work on a project with you? Have you ever wanted to make an experimental branch where you don't know if the direction makes any sense at all?

These are all areas where distributed systems shine. I'll go over my recent experience with revision control systems briefly:

SCM in Open Source Environments

This one should be obvious to anyone. You are, by nature, working in a distributed fashion. Maybe there's only one of you, but if anyone actually uses your work, someone might be interested in suggesting a minor change. Here's how it works with, for example, Subversion:

Contributor: Hey, I made a bit of a change to your app to add a feature I like, are you interested?
Owner: Sure. I'm a little busy now, but send in the patch and I'll try to get it in real soon.
[time passes]
Owner: Hey, your patch doesn't apply. Can you update it?
Contributor: Um, sure. (Dammit, this stuff has moved around and I don't quite remember what all I changed...) OK, I think I've got it. Here's the new patch.
[time passes]
Owner: Oh...hey, can you do that again?

That's roughly my experience with trying to track and make changes to a project that sits in a centralized revision control system. Because I can't check in my changes, I have to record what I'd done somewhere else, and try to determine meaning at some later point in time. I don't like having uncommitted changes sitting around, and the coordination all has to be done out of band with various manual means of merging.

When I've come across projects whose code is in distributed revision control systems, it's a lot more natural. I can simply branch it (no permission from the owner needed) and maintain my own branch with all of my changes in it. In the above scenario, all I need to do is update, merge, resolve conflicts with the context of what I'd worked on (although the tools will take care of more of them since they understand renames and stuff), and then send changesets back upstream intact.

The difference is subtle enough that it's probably not possible to understand without experience. It's sort of like programming without exceptions. Many people who have never encountered exceptions are confused and annoyed by yet another way to signal failure, but once you actually use them, it becomes very annoying to try to program without them.

SCM in Corporate Environments

There are two ways that centralized revision control systems are painful in the workplace:

  1. Having to be connected to do anything useful.
  2. Trying to deal with outsourcing.

The last two companies I have worked at have used Perforce for revision control. You can think of Perforce as a faster CVS with atomic commits and a loose notion of changesets. Perforce is fast because it doesn't scan the tree when you're doing commits and similar operations. It doesn't scan the tree because you have to tell it everything you're doing as you're doing it. It helps you with this by keeping your files read-only most of the time.

To work offline, you basically start changing file permissions and try to figure out everything you've changed so you can commit it when you're online again. It sort of works, although it doesn't work with tools like Eclipse. Why should a power outage, network outage, business trip, sick kid, etc. make it more difficult for me to work on my local machine?
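
As a rough sketch of that "figure out everything you've changed" step: assuming tracked files normally sit read-only and anything edited offline had its write bit flipped on, a small script can at least list the candidates. The function name and the write-bit heuristic here are illustrative assumptions, not anything Perforce itself provides.

```python
import os
import stat


def find_writable_files(root):
    """Return sorted paths under root whose owner write bit is set.

    Assumption: tracked files are kept read-only, so a writable file
    was probably edited offline. Brand-new files that were never
    tracked also show up and have to be sorted out by hand.
    """
    writable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if mode & stat.S_IWUSR:
                writable.append(path)
    return sorted(writable)
```

Even with a helper like this, deletes and renames done offline leave no trace in the permissions, which is exactly the kind of bookkeeping a local repository would have handled.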

Outsourcing is generally a similar example to the above, but with a new twist. They may produce work I'm not even interested in having. I want to be able to examine their work before dumping it into my tree. I'd like to examine it before dumping it into my revision control system at all. I've had things like ISO images checked into Perforce.

A distributed revision control system means each site/user/whatever has a local repository that's very fast for normal use, but can still pull changes from HQ whenever they're ready. When HQ is ready to pull the changes back, they automatically know exactly what sources the new changes were built from, and can pull the changes that look good into a temporary location before moving them into an area where other developers in HQ will have to deal with them.
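
To make the "know exactly what sources the new changes were built from" part concrete, here is a toy model, not how Mercurial or Darcs actually store anything. It assumes each changeset simply records the ids of its parents; the `Repo` class and its method names are made up for illustration.

```python
class Repo:
    """A toy repository: changeset id -> tuple of parent ids."""

    def __init__(self):
        self.changesets = {}

    def commit(self, cid, parents=()):
        # Every parent must already be known locally.
        for p in parents:
            if p not in self.changesets:
                raise KeyError(f"unknown parent {p!r}")
        self.changesets[cid] = tuple(parents)

    def clone(self):
        other = Repo()
        other.changesets = dict(self.changesets)
        return other

    def incoming(self, other):
        """Changesets other has that we lack, in other's commit order."""
        return [c for c in other.changesets if c not in self.changesets]

    def pull(self, other):
        """Copy the missing changesets over, parents before children."""
        new = self.incoming(other)
        for cid in new:
            self.changesets[cid] = other.changesets[cid]
        return new


hq = Repo()
hq.commit("a")
hq.commit("b", ["a"])

site = hq.clone()        # the remote site gets a full local copy
site.commit("c", ["b"])  # and commits without talking to HQ

hq.commit("d", ["b"])    # HQ keeps moving in the meantime

staging = hq.clone()     # temporary location for reviewing their work
new = staging.pull(site)
```

After the pull, `new` holds only the site's changeset, its recorded parent says which HQ revision it was built on, and `hq` itself is untouched until someone decides the work is worth keeping.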

This kind of work is so easy, you don't even really think about it.

Links

Do us all a favor and go grab a distributed revision control system and try it out. Help make the world a better place.

Darcs
Darcs is probably the easiest revision control system ever made. This is really surprising considering it's probably also the most advanced.
Mercurial
hg is my current favorite and is getting into some bigger projects (see OpenSolaris and OpenJDK). It feels a lot like Darcs, but is slightly more explicit in merging. Supposedly it scales much better, too.
Tailor
Tailor is not a revision control system, but a tool that allows you to move changesets between revision control systems. Probably required once you try a distributed system, realize the error of your ways, and want to convert over completely while retaining history.