Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

sketerpot comments on What are you working on? - Less Wrong

23 Post author: jsalvatier 04 March 2011 01:33AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (103)

You are viewing a single comment's thread.

Comment author: sketerpot 04 March 2011 03:19:14AM *  10 points [-]

I'm writing the core of a system similar to Google Wave, but much simpler, more flexible, and mathematically elegant. It works peer-to-peer, too, which is nice. It's got some clever algorithms in it, and even cleverer data structures, and I've been through several iterations of the design and code. It's probably the most complex thing I've ever written, and it could be pretty darn handy in the future. This is a surprisingly difficult problem to solve well.

(I also did a quick detour one-off project: pwstore, for anybody wanting to store passwords in Haskell. I feel pretty good about this one; it's useful.)

I'm also writing a NaNoWriMo book, which I'd left half-finished back in November, because it actually was a lot of fun to write. I've got one wonderfully snarky character; if I give her first-person narrative, the story becomes extra fun. I don't even know how many words I'm up to now. It's a sprawling fantasy adventure thing.

Comment author: virtualAdept 04 March 2011 03:31:11PM 2 points [-]

I loved the idea of Wave, and was sad that it didn't catch on well. I'd love to hear more about your project and/or hear about it when it goes live.

Comment author: sketerpot 04 March 2011 07:37:05PM *  5 points [-]

I actually have a demo of an earlier version up on a nameless Amazon AWS micro instance, built with node.js and redis and a rather terrifying amount of JavaScript code:


There are some weird browser compatibility issues, but it works in vaguely recent versions of Chrome and Safari and Firefox.

There are two parts to it. The core of what I'm making is, essentially, a synchronized string data type: a string where changes are immediately applied locally, and propagate across the network. The other part, necessary for actually using the thing, is making an application which maps documents onto strings. That's not really hard so much as it is irritating, so I won't talk more about it.

The synchronized string part is actually pretty cool. You apply some arbitrary patch of insertions and deletions, and send this out over the network. No matter what order these patches are applied in, they're guaranteed to converge to the same ultimate result -- it's a lot like using a distributed version control system, except that you do a branch and merge on every keystroke, in a way that isn't slow and doesn't break horribly. :-)

You get the complete version history with the document, along with per-character records of who inserted it. If you want to have a server holding the canonical version of the document, you can do that, or go serverless. Technically, it's based on the really clever idea of causal trees (PDF) invented by Victor Grishchenko; I've just been adding refinements so it goes fast and doesn't involve terrifying regular expressions.

Comment author: gwern 04 March 2011 08:17:25PM 1 point [-]

Since you mentioned Haskell...

No matter what order these patches are applied in, they're guaranteed to converge to the same ultimate result -- it's a lot like using a distributed version control system, except that you do a branch and merge on every keystroke, in a way that isn't slow and doesn't break horribly.

What's the connection to Darcs? A faster less-powerful version since branches won't be long-lived and not all that much commutation should be necessary?

Comment author: sketerpot 04 March 2011 08:39:19PM *  6 points [-]

It's pretty similar to Darcs-style patch manipulation. I've only skimmed some introductory material on exactly how Darcs does it, but it looks a lot like what Wave does:

  1. Make patches invertible.

  2. Define a function T :: Patch -> Patch -> Patch that transforms one patch to apply to the document after a second patch has already been applied to it.

With these primitives, you can arbitrarily re-order patches. That's not quite the way I'm doing it, though. I represent the document as a tree of atoms, each one with a unique id and a pointer to a predecessor. You get the document state by doing a pre-order depth first traversal of this tree, where siblings are traversed in an order that's guaranteed to be the same on all clients, which I won't define because it would take way too long to write out.

The patches are forests of atoms, and they get anchored to one or more nodes in the document tree. Atoms are never deleted, but they can be made invisible by attaching a special deletor atom to them. This guarantees that patches in the future will always be able to find their anchor points. I've tried both this and a system closer to Darcs', and this was simpler to understand and implement.

In terms of power, this is equivalent to what Darcs does with permuting patches, and it can handle arbitrarily long-lived branches. My system's patches commute naturally without any transformation, no matter how far document states may have diverged, which is a plus.

As for the speed, all operations are at most linear in the size of the document, with small constant factors, and I'm working on making the common cases all happen in logarithmic time, using monoid-cached trees -- the finger tree is a prominent example. I don't know how this compares with Darcs, but I wouldn't expect them to be asymptotically faster here.

If I had more time and wasn't so focused on just getting this done so I can get out of grad school, it would be interesting to make a DVCS based on this.

Comment author: jsalvatier 04 March 2011 09:09:07PM 0 points [-]

That is pretty neat.

Comment author: virtualAdept 04 March 2011 08:13:58PM 0 points [-]

Neat! Thanks for the info.