Comment author: Brotherzed 25 April 2016 10:50:29PM *  1 point [-]

But consider the following problem: Find and display all comments by me that are children of this post, and only those comments, using only browser UI elements, i.e. not the LW-specific page widgets. You cannot -- and I'd be pretty surprised if you could make a browser extension that could do it without resorting to the API, skipping the previous elements in the chain above. For that matter, if you can do it with the existing page widgets, I'd love to know how.

If you mean parse the document object model for your comments without using an external API, it would probably take me about a day, because I'm rusty with WatiN (the tool I used to use for web scraping when that was my job a couple of years ago). About four hours of that would be setting up an environment. If I were up to speed, maybe a couple of hours to work out the script. Not even close to hard compared to the crap I used to have to scrape. And I'm definitely not the best web scraper; I'm a non-amateur novice, basically. The basic process is this: anchor to a certain node type that is the child of another node with certain attributes and properties, then search all the matching nodes for your user name, then extract the content of some child nodes of all the matched nodes that contain your post.
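To make that process concrete, here's a rough sketch using nothing but Python's standard-library HTML parser rather than WatiN. The class names "comment", "author", and "body" are invented stand-ins for whatever the real LW markup uses; treat this as an illustration of the anchor-match-extract pattern, not a working LW scraper:

```python
from html.parser import HTMLParser

class CommentScraper(HTMLParser):
    """Collect the text of comments whose author matches a target name.

    Assumes comments look like:
      <div class="comment"><a class="author">name</a><div class="body">text</div></div>
    Real Less Wrong markup will differ; the class names here are hypothetical.
    """
    def __init__(self, author):
        super().__init__()
        self.author = author
        self.comments = []          # extracted comment bodies
        self._in_author = False
        self._in_body = False
        self._current_author = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "a" and "author" in cls:
            self._in_author = True
        elif tag == "div" and "body" in cls:
            self._in_body = True
            self._buf = []

    def handle_endtag(self, tag):
        if self._in_author and tag == "a":
            self._in_author = False
        elif self._in_body and tag == "div":
            self._in_body = False
            # keep the body only if the nearest preceding author matched
            if self._current_author == self.author:
                self.comments.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._in_author:
            self._current_author = data.strip()
        elif self._in_body:
            self._buf.append(data)

page = """
<div class="comment"><a class="author">Error</a><div class="body">first</div></div>
<div class="comment"><a class="author">Other</a><div class="body">noise</div></div>
<div class="comment"><a class="author">Error</a><div class="body">second</div></div>
"""
scraper = CommentScraper("Error")
scraper.feed(page)
print(scraper.comments)  # ['first', 'second']
```

The point being: once you know which node attributes anchor a comment, the extraction itself is a few dozen lines.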

WatiN: http://watin.org/

Selenium: http://www.seleniumhq.org/

These are the most popular tools in the Microsoft ecosystem.

As someone who has the ability to control how content is displayed to me (tip - hit f12 in google chrome), I disagree with the statement that a web browser is not a client. It is absolutely a client and if I were sufficiently motivated I could view this page in any number of ways. So can you. Easy examples you can do with no knowledge are to disable the CSS, disable JS, etc.

Comment author: Error 28 April 2016 04:43:05PM 0 points [-]

Upvoted for actually considering how it could be done. It does sort of answer the letter if not the spirit of what I had in mind.

Comment author: Lumifer 23 April 2016 01:38:51AM 0 points [-]

But another is that web sites, able to provide a look and feel appropriate to their community, plainly outcompeted networks of plaintext content.

Ah, yes, porn as the engine of technology.

The web had pictures, "networks of plaintext content" did not. Case closed.

Comment author: Error 28 April 2016 04:34:49PM 0 points [-]

Objection: I'm pretty sure Usenet had a colossal amount of porn, at least by the standards of the day. Maybe even still the case. I know its most common use today is for binaries, and I assume that most of that is porn.

Comment author: Lumifer 22 April 2016 07:16:04PM *  7 points [-]

I think you are making the argument for what was known as "the semantic web" -- the term seems to have fallen into disuse, though.

I also think that my browser is a client. It's not a client for structured raw information, though, because there is no server which feeds it that (a client is just one half of a client-server pair, after all. A server-less client is not of much use). My browser is a client for web pages which used to mean mostly HTML and nowadays mean whatever JS can conjure.

By the way, where does RSS fit into your picture of the world?

Comment author: Error 28 April 2016 04:32:37PM 0 points [-]

I use RSS all the time, mostly via Firefox's subscribe-to-page feature. I've considered looking for a native-client feed reader, but my understanding is that most sites don't provide a full-text feed, which defeats the point.

I dislike that it's based on XML, mostly because, even more so than JSON, XML is actively hostile to humans. It's no less useful for that, though.

So far as I know it doesn't handle reply chains at all, making it a sub-par fit for content that spawns discussion. I may be wrong about that. I still use it as the best available method for e.g. keeping up with LW.

Comment author: DanArmak 23 April 2016 01:36:36AM 2 points [-]

It would make messages mutable (or perhaps expose an immutable history, but make the 'current' reference mutable, like git does)

As an aside, git is about as good a fit as NNTP (which is to say, neither is really all that good in my opinion).

Git has immutable messages, but it also has mutable references (branches) for edits, and the references can be deleted for retractions. It has a tree structure for comments. It has pseudonymous authentication (sign your commits). It has plenty of room for data and metadata (e.g. specify a standard mapping of filenames to headers). It can run over HTTP and has existing servers and clients including Javascript ones. It can be deployed in a centralized model (everyone pushes to the same server) but others can mirror your server using the same protocol, and there are RSS and email gateways available. Messages (commits) have globally unique IDs, allowing for incoming links. It makes your server state trivial to backup and to mirror. I could go on.
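As a rough illustration of that mapping -- every command and the refs/comments/<id> ref layout here are invented for the example, not a tested design:

```python
# A sketch of how forum operations might map onto git commands.
# Nothing here is executed; each entry is an illustrative command line.

def forum_op_to_git(op, **kw):
    """Map a forum action to a git command that could implement it."""
    mapping = {
        # a new comment is an immutable commit, parented on the post replied to
        "post":    "git commit-tree {tree} -p {parent} -m {text}",
        # an edit moves a mutable reference; the old version stays reachable
        "edit":    "git update-ref refs/comments/{id} {new_commit}",
        # a retraction deletes the reference (mirrors may still hold history)
        "retract": "git update-ref -d refs/comments/{id}",
        # authorship is a signed commit
        "sign":    "git commit -S -m {text}",
        # mirroring the whole forum is an ordinary clone
        "mirror":  "git clone --mirror {url}",
    }
    return mapping[op].format(**kw)

print(forum_op_to_git("edit", id="42", new_commit="abc123"))
# git update-ref refs/comments/42 abc123
```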

In fact, someone has already thought of this and wrote a similar system, called GitRap! (I didn't know about it before I checked just now.) It doesn't do exactly what I described, and it's tied to github right now, but you can view it as a POC.

To be clear: I am 95% not serious about this proposal. Solving the vastly simpler centralized problem is probably better.

Comment author: Error 28 April 2016 04:21:23PM 0 points [-]

I think that's a terrible idea and it is awesome that it exists. :-P

Comment author: DanArmak 23 April 2016 01:22:57AM *  7 points [-]

I completely agree with everything you've said in the first half of your post. But I strongly disagree that NNTP is a good choice for a backend standard. (At this point you can say that you'll argue your case in a future post instead of replying to this comment.)

The NNTP model differs from modern forums in a crucial respect: it is a distributed system. (I use this in the sense of 'decentralized'.) More precisely, it is an AP system in CAP-theorem terms: it doesn't provide consistency in synchronizing messages between servers (and it makes messages immutable, so it gets the worst of both worlds really). This directly leads to all the problems we'd have in using NNTP for a forum, such as no true editing or deleting of messages. Because a message is not tied to a domain (or a server), and is not referenced but copied to other servers, authentication (proving you own an identity and wrote a post) and authorization (e.g. mod powers) become nontrivial. Messages don't have globally unique IDs, or even just global IDs. Implementing something like karma becomes an interesting computer science exercise involving decentralized consensus algorithms, rather than a trivial feature of a centralized database. And so on.

But we don't need to deal with the problems of distributed systems, because web forums aren't distributed! What we want is a standard that will model the way forums already work, plus or minus some optional or disputed extensions. Making NNTP resemble a forum would require adding so many things on top that there's no point in using NNTP in the first place: it just doesn't fit the model we want.

A good forum model would tie users and messages to a particular server. It would make messages mutable (or perhaps expose an immutable history, but make the 'current' reference mutable, like git does). It would at least provide a substrate for mutable metadata that karma-like systems could use, even if these systems were specified as optional extensions to the standard. It would allow for some standardized message metadata (e.g. Content-Encoding and Content-Type equivalents). It would pretty much look like what you'd get if you designed the API of a generalized forum, talking JSON over HTTP, while trying not to imagine the client-side UI.
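A minimal sketch of that model -- all field names are invented for illustration: immutable revisions behind a mutable 'current' pointer, server-scoped IDs, and a substrate for metadata:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class Revision:
    """One immutable version of a message's body."""
    body: str
    created: str                     # ISO-8601 timestamp

@dataclass
class Post:
    post_id: str                     # globally unique because server-scoped
    author: str                      # tied to this server's accounts
    parent_id: Optional[str]         # reply chain
    revisions: list = field(default_factory=list)  # immutable history
    metadata: dict = field(default_factory=dict)   # karma etc. live here

    @property
    def current(self):
        """The mutable 'current' reference: the latest revision."""
        return self.revisions[-1]

    def edit(self, new_body):
        """Editing appends a revision; old versions stay readable."""
        ts = datetime.now(timezone.utc).isoformat()
        self.revisions.append(Revision(new_body, ts))

p = Post("server.example/1", "DanArmak", None)
p.edit("first draft")
p.edit("fixed a typo")
print(p.current.body)    # fixed a typo
print(len(p.revisions))  # 2
```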

There's probably an existing standard or three like this somewhere in the dustbin of history.

NNTP also has a lot of more minor ugliness that I'd be happy to argue against. It's one of the http/mime/email family of headers-body encodings, a family well known for producing fragile implementations (quick, recite the header folding rules!) and whose members are all subtly different from one another to make sure everyone's sufficiently confused. It relies on sometimes-complex session state instead of simple separate requests. There's a bunch of optional features (many of historical interest), but at the same time the protocol is extremely underspecified (count how many times it says you SHOULD but not MUST do something, and MAY do quite the opposite instead). Any client-server pair written from scratch inevitably ends up speaking a highly restricted dialect, which doesn't match that of any other client or server.

Given all of this, the only possible value of using NNTP is the existing software that already implements it. But there's no implementation of an NNTP client in Javascript (unless you want to use emscripten), if only because Javascript in a browser can't open a raw TCP socket, so until the recent advent of websocket-to-tcp proxies, nobody could write one. And implementing a new HTTP-based server to a new (very simple) standard, basically just CRUD on a simple schema, is much easier than writing an NNTP JS client - IF you're willing to not make a distributed system.

A final note: one may well argue that we do want a distributed, decentralized system with immutable messages (or immutable old-message-versions), because such systems are inherently better. And in an ideal world I'd agree. But they're also far, far harder to get right, and the almost inevitable tradeoffs are hard to sell to users. I'm not convinced we need to solve the much harder distributed version of the problem here. (Also, many decentralization features can be added in a secondary layer on top of a centralized system if the core is well designed.)

Comment author: Error 28 April 2016 04:19:06PM 1 point [-]

At this point you can say that you'll argue your case in a future post instead of replying to this comment.

I will, but I'll answer you here anyway -- sorry for taking so long to reply.

I strongly disagree that NNTP is a good choice for a backend standard

I feel I should clarify that I don't think it's "good", so much as "less bad than the alternatives".

But we don't need to deal with the problems of distributed systems, because web forums aren't distributed!

Well, yes and no. Part of what got me on this track in the first place is the distributed nature of the diaspora. We have a network of more-and-more-loosely connected subcommunities that we'd like to keep together, but the diaspora authors like owning their own gardens. Any unified system probably needs to at least be capable of supporting that, or it's unlikely to get people to buy back in. It's not sufficient, but it is necessary, to allow network members to run their own server if they want.

That being said, it's of interest that NNTP doesn't have to be run distributed. You can have a standalone server, which makes things like auth a lot easier. A closed distribution network makes it harder, but not that much harder -- as long as every member trusts every other member to do auth honestly.

The auth problem as I see it boils down to "how can user X with an account on Less Wrong post to e.g. SSC without needing to create a separate account, while still giving SSC's owner the capability to reliably moderate or ban them." There are a few ways to attack the problem; I'm unsure of the best method but it's on my list of things to cover.
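One hedged sketch of an attack on that problem: the home server signs an identity assertion that the remote server verifies against its own ban list. Everything here -- the token format, the shared key -- is invented purely for illustration; a real design would need key management at minimum:

```python
import hmac, hashlib, json, base64

# Illustrative only; real deployments would negotiate per-pair keys.
SHARED_KEY = b"lw-ssc-shared-secret"

def issue_token(home_server, username):
    """Home server (e.g. Less Wrong) vouches for one of its users."""
    claim = json.dumps({"server": home_server, "user": username}).encode()
    sig = hmac.new(SHARED_KEY, claim, hashlib.sha256).hexdigest()
    return base64.b64encode(claim).decode() + "." + sig

def verify_token(token, banned=()):
    """Remote server (e.g. SSC) checks the signature, then its own ban list."""
    b64claim, sig = token.rsplit(".", 1)
    claim = base64.b64decode(b64claim)
    expected = hmac.new(SHARED_KEY, claim, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None                  # forged or tampered token
    who = json.loads(claim)
    if (who["server"], who["user"]) in banned:
        return None                  # valid identity, but locally banned
    return who

tok = issue_token("lesswrong.com", "ExampleUser")
print(verify_token(tok))
# {'server': 'lesswrong.com', 'user': 'ExampleUser'}
print(verify_token(tok, banned={("lesswrong.com", "ExampleUser")}))
# None
```

The key property: SSC never manages ExampleUser's credentials, but still decides unilaterally whether to accept their posts.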

Given all of this, the only possible value of using NNTP is the existing software that already implements it.

This is a huge value, though, because most extant web-forum, blogging, etc. software is terrible for discussions of any nontrivial size.

There's probably an existing standard or three like this somewhere in the dustbin of history.

Is there?

That's a serious question, because I'd love to hear about alternative standards. My must-have list looks something like "has an RFC, has at least three currently-maintained, interoperable implementations from different authors, and treats discussion content as its payload, unmixed with UI chrome." I'm only aware of NNTP meeting those conditions, but my map is not the territory.

Comment author: NancyLebovitz 25 April 2016 02:01:07PM -2 points [-]

I've already banned them and their comments.

Comment author: Error 25 April 2016 04:35:57PM -2 points [-]

Out of curiosity, is he still serial downvoting? I thought of something that may convince him to stop: Instead of deleting his accounts, disable them and convert all their downvotes against known targets into upvotes (and make sure he knows that). If all his efforts end up benefiting the very people he's trying to hurt, well...

The Web Browser is Not Your Client (But You Don't Need To Know That)

22 Error 22 April 2016 12:12AM

(Part of a sequence on discussion technology and NNTP. As last time, I should probably emphasize that I am a crank on this subject and do not actually expect anything I recommend to be implemented. Add whatever salt you feel is necessary)1


If there is one thing I hope readers get out of this sequence, it is this: The Web Browser is Not Your Client.

It looks like you have three or four viable clients -- IE, Firefox, Chrome, et al. You don't. You have one. It has a subforum listing with two items at the top of the display; some widgets on the right hand side for user details, RSS feed, meetups; the top-level post display; and below that, replies nested in the usual way.

Changing your browser has the exact same effect on your Less Wrong experience as changing your operating system, i.e. next to none.

For comparison, consider the Less Wrong IRC, where you can tune your experience with a wide range of different software. If you don't like your UX, there are other clients that give a different UX to the same content and community.

That is how the mechanism of discussion used to work, and does not now. Today, your user experience (UX) in a given community is dictated mostly by the admins of that community, and software development is often neither their forte nor something they have time for. I'll often find myself snarkily responding to feature requests with "you know, someone wrote something that does that 20 years ago, but no one uses it."

Semantic Collapse

What defines a client? More specifically, what defines a discussion client, a Less Wrong client?

The toolchain by which you read LW probably looks something like this; anyone who's read the source please correct me if I'm off:

Browser -> HTTP server -> LW UI application -> Reddit API -> Backend database.

The database stores all the information about users, posts, etc. The API presents subsets of that information in a way that's convenient for a web application to consume (probably JSON objects, though I haven't checked). The UI layer generates a web page layout and content using that information, which is then presented -- in the form of (mostly) HTML -- by the HTTP server layer to your browser. Your browser figures out what color pixels go where.

All of this is a gross oversimplification, obviously.

In some sense, the browser is self-evidently a client: It talks to an http server, receives hypertext, renders it, etc. It's a UI for an HTTP server.

But consider the following problem: Find and display all comments by me that are children of this post, and only those comments, using only browser UI elements, i.e. not the LW-specific page widgets. You cannot -- and I'd be pretty surprised if you could make a browser extension that could do it without resorting to the API, skipping the previous elements in the chain above. For that matter, if you can do it with the existing page widgets, I'd love to know how.

That isn't because the browser is poorly designed; it's because the browser lacks the semantic information to figure out what elements of the page constitute a comment, a post, an author. That information was lost in translation somewhere along the way.

Your browser isn't actually interacting with the discussion. Its role is more akin to an operating system than a client. It doesn't define a UX. It provides a shell, a set of system primitives, and a widget collection that can be used to build a UX. Similarly, HTTP is not the successor to NNTP; the successor is the plethora of APIs, for which HTTP is merely a substrate.

The Discussion Client is the point where semantic metadata is translated into display metadata; where you go from 'I have post A from user B with content C' to 'I have a text string H positioned above visual container P containing text string S.' Or, more concretely, when you go from this:

Author: somebody
Subject: I am right, you are mistaken, he is mindkilled.
Date: timestamp
Content: lorem ipsum nonsensical statement involving plankton....

to this:

<h1>I am right, you are mistaken, he is mindkilled.</h1>
<div><span align=left>somebody</span><span align=right>timestamp</span></div>
<div><p>lorem ipsum nonsensical statement involving plankton....</p></div>

That happens at the web application layer. That's the part that generates the subforum headings, the interface widgets, the display format of the comment tree. That's the part that defines your Less Wrong experience, as a reader, commenter, or writer.
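That translation step can be sketched as a single function. The output mirrors the illustrative HTML above; a real client would choose its own layout, which is exactly the point:

```python
from html import escape

def render_post(post):
    """Translate semantic metadata into one possible display markup."""
    return (
        f"<h1>{escape(post['subject'])}</h1>\n"
        f"<div><span>{escape(post['author'])}</span>"
        f"<span>{escape(post['date'])}</span></div>\n"
        f"<div><p>{escape(post['content'])}</p></div>"
    )

post = {
    "author": "somebody",
    "subject": "I am right, you are mistaken, he is mindkilled.",
    "date": "timestamp",
    "content": "lorem ipsum nonsensical statement involving plankton....",
}
html = render_post(post)
print(html.splitlines()[0])
# <h1>I am right, you are mistaken, he is mindkilled.</h1>
```

Swap in a different render_post and the same semantic data becomes a different UX; keep only the HTML and the semantic data is gone.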

That is your client, not your web browser. If it doesn't suit your needs, if it's missing features you'd like to have, well, you probably take for granted that you're stuck with it.

But it doesn't have to be that way.

Mechanism and Policy

One of the difficulties forming an argument about clients is that the proportion of people who have ever had a choice of clients available for any given service keeps shrinking. I have this mental image of the Average Internet User as having no real concept for this.

Then I think about email. Most people have probably used at least two different clients for email, even if it's just Gmail and their phone's built-in mail app. Or perhaps Outlook, if they're using a company system. And they (I think?) mostly take for granted that if they don't like Outlook they can use something else, or if they don't like their phone's mail app they can install a different one. They assume, correctly, that the content and function of their mail account is not tied to the client application they use to work with it.

(They may make the same assumption about web-based services, on the reasoning that if they don't like IE they can switch to Firefox, or if they don't like Firefox they can switch to Chrome. They are incorrect, because The Web Browser is Not Their Client)

Email does a good job of separating mechanism from policy. Its format is defined in RFC 2822 and its transmission protocol is defined in RFC 5321. Neither defines any conventions for user interfaces. There are good reasons for that from a software-design standpoint, but more relevant to our discussion is that interface conventions change more rapidly than the objects they interface with. Forum features change with the times; but the concepts of a Post, an Author, or a Reply are forever.

The benefit of this separation: If someone sends you mail from Outlook, you don't need to use Outlook to read it. You can use something else -- something that may look and behave entirely differently, in a manner more to your liking.

The comparison: If there is a discussion on Less Wrong, you do need to use the Less Wrong UI to read it. The same goes for, say, Facebook.

I object to this.

Standards as Schelling Points

One could argue that the lack of choice is for lack of interest. Less Wrong, and Reddit on which it is based, has an API. One could write a native client. Reddit does have them.

Let's take a tangent and talk about Reddit. Seems like they might have done something right. They have (I think?) the largest contiguous discussion community on the net today. And they have a published API for talking to it. It's even in use.

The problem with this method is that Reddit's API applies only to Reddit. I say problem, singular, but it's really problem, plural, because it hits users and developers in different ways.

On the user end, it means you can't have a unified user interface across different web forums; other forum servers have entirely different APIs, or none at all.2 It also makes life difficult when you want to move from one forum to another.

On the developer end, something very ugly happens when a content provider defines its own provision mechanism. Yes, you can write a competing client. But your client exists only at the provider's sufferance, subject to their decision not to make incompatible API changes or just pull the plug on you and your users outright. That isn't paranoia; in at least one case, it actually happened. Using an agreed-upon standard limits this sort of misbehavior, although it can still happen in other ways.

NNTP is a standard for discussion, like SMTP is for email. It is defined in RFC 3977 and its data format is defined in RFC 5536. The point of a standard is to ensure lasting interoperability; because it is a standard, it serves as a deliberately-constructed Schelling point, a place where unrelated developers can converge without further coordination.

Expertise is a Bottleneck

If you're trying to build a high-quality community, you want a closed system. Well kept gardens die by pacifism, and it's impossible to fully moderate an open system. But if you're building a communication infrastructure, you want an open system.

In the early Usenet days, this was exactly what existed; NNTP was standardized and open, but Usenet was a de-facto closed community, accessible mostly to academics. Then AOL hooked its customers into the system. The closed community became open, and the Eternal September began.3 I suspect, but can't prove, that this was a partial cause of the flight of discussion from Usenet to closed web forums.

I don't think that was the appropriate response. I think the appropriate response was private NNTP networks or even single servers, not connected to Usenet at large.

Modern web forums throw the open-infrastructure baby out with the open-community bathwater. The result, in our specific case, is that if we want something not provided by the default Less Wrong interface, it must be implemented by Less Wrongers.

I don't think UI implementation is our comparative advantage. In fact I know it isn't, or the Less Wrong UI wouldn't suck so hard. We're pretty big by web-forum standards, but we still contain only a tiny fraction of the Internet's technical expertise.

The situation is even worse among the diaspora; for example, at SSC, if Scott's readers want something new out of the interface, it must be implemented either by Scott himself or his agents. That doesn't scale.

One of the major benefits of a standardized, open infrastructure is that your developer base is no longer limited to a single community. Any software written by any member of any community backed by the same communication standard is yours for the using. Additionally, the developers are competing for the attention of readers, not admins; you can expect the reader-facing feature set to improve accordingly. If readers want different UI functionality, the community admins don't need to be involved at all.

A Real Web Client

When I wrote the intro to this sequence, the most common thing people insisted on was this: Any system that actually gets used must allow links from the web, and those links must reach a web page.

I completely, if grudgingly, agree. No matter how insightful a post is, if people can't link to it, it will not spread. No matter how interesting a post is, if Google doesn't index it, it doesn't exist.

One way to achieve a common interface to an otherwise-nonstandard forum is to write a gateway program, something that answers NNTP requests and does magic to translate them to whatever the forum understands. This can work and is better than nothing, but I don't like it -- I'll explain why in another post.

Assuming I can suppress my gag reflex for the next few moments, allow me to propose: a web client.

(No, I don't mean write a new browser. The Browser Is Not Your Client.4)

Real NNTP clients use the OS's widget set to build their UI and talk to the discussion board using NNTP. There is no fundamental reason the same cannot be done using the browser's widget set. Google did it. Before them, Deja News did it. Both of them suck, but they suck on the UI level. They are still proof that the concept can work.

I imagine an NNTP-backed site where casual visitors never need to know that's what they're dealing with. They see something very similar to a web forum or a blog, but whatever software today talks to a database on the back end, instead talks to NNTP, which is the canonical source of posts and post metadata. For example, it gets the results of a link to http://lesswrong.com/posts/message_id.html by sending ARTICLE message_id to its upstream NNTP server (which may be hosted on the same system), just as a native client would.
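The lookup described here is trivial to sketch. The URL pattern is this post's hypothetical, and no network traffic happens in this fragment; note that per RFC 3977 the message-id is sent wrapped in angle brackets:

```python
import re

def url_to_nntp_command(path):
    """Translate a hypothetical /posts/<message_id>.html URL into the
    NNTP ARTICLE command the web client would send upstream."""
    m = re.fullmatch(r"/posts/(?P<msgid>[^/]+)\.html", path)
    if m is None:
        raise ValueError(f"not a post URL: {path!r}")
    # RFC 3977 sends message-ids in angle brackets
    return f"ARTICLE <{m.group('msgid')}>"

print(url_to_nntp_command("/posts/abc123@lesswrong.com.html"))
# ARTICLE <abc123@lesswrong.com>
```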

To the drive-by reader, nothing has changed. Except, maybe, one thing. When a regular reader, someone who's been around long enough to care about such things, says "Hey, I want feature X," and our hypothetical web client doesn't have it, I can now answer:

Someone wrote something that does that twenty years ago.

Here is how to get it.



  1. Meta-meta: This post took about eight hours to research and write, plus two weeks procrastinating. If anyone wants to discuss it in realtime, you can find me on #lesswrong or, if you insist, the LW Slack.

  2. The possibility of "universal clients" that understand multiple APIs is an interesting case, as with Pidgin for IM services. I might talk about those later.

  3. Ironically, despite my nostalgia for Usenet, I was a part of said September; or at least its aftermath.

  4. Okay, that was a little shoehorned in. The important thing is this: What I tell you three times is true.

Comment author: Dagon 07 April 2016 08:05:02PM 1 point [-]

And now that I actually write it down and compare it to previous online communities (including a few mixed online/offline) I've been part of and loved, which have universally followed the same pattern of growth, overgrowth, loss of some driving valuable members without obvious replacement, and slow decay into irrelevance (to me; at least 2 of them are going strong, just with a different feel than when I was involved), I'm pretty pessimistic.

I'm going to put some effort into being OK with LW as it is, enjoying the parts I enjoy and being willing to follow those parts I'm missing to their new homes.

Comment author: Error 08 April 2016 12:24:11AM 3 points [-]

This fits my own prior experience of the life cycle of a community -- but when my previous community failed, a fragment of it broke off and rebuilt itself in a new form. That fragment still exists as a coherent tribe more than a decade later, and I still love it even if I disagree with certain, uh, technical decisions surrounding the splintering process.

So it's not impossible.

Comment author: AlanCrowe 07 April 2016 08:09:00PM 6 points [-]

My analysis saw the fundamental problem as the yearning for consensus. What was signal? What was noise? Who was trolling? Designers of forum software go wrong when they believe that these are good, one-place questions with actual one-place answers. The software is designed in the hope that its operation will yield these answers.

My suggestion, Outer Circle, got discussed on Hacker News under the title "Saving forums from themselves with shared hierarchical white lists", and I managed to flesh out the ideas a little.

Then my frail health got even worse and I never did anything more :-(

Comment author: Error 08 April 2016 12:19:45AM 2 points [-]

That is an excellent and thought-provoking essay, and a novel approach.

...I actually don't have more to say about it, but I thought you'd like to know that someone read it.

Comment author: Elo 04 April 2016 08:46:23AM *  2 points [-]

User account "Lamp" is banned for being Eugine_Nier. This is an update in case anyone was wondering.

so far accounts have been:

  • Eugine_Nier
  • Azazoth123
  • The_Lion
  • The_Lion2
  • Old_Gold
  • Lamp

(that I know of, I think there were more in between too that I forgot.)

If I could send this guy a message it would be this: You are quite literally wasting our time. And by "our" I mean the moderators and the people who could be spending their time improving the place, coding and implementing a better one; instead they are spending that time getting rid of you over and over. DON'T COME BACK. You are literally killing LW.

And that's without even getting into the community's time, the time of the people you debate with, or the time of anyone who reads this post. That time also adds up. Seriously.

Comment author: Error 07 April 2016 07:39:45PM 3 points [-]

Note that there is now a Lamp2. Going by the quoted parts of this subthread, he appears to be reposting his own deleted comments verbatim.

I'm a sometime admin. Ban evasion irritates me.

View more: Prev | Next