Brotherzed comments on The Web Browser is Not Your Client (But You Don't Need To Know That) - Less Wrong
Comments (47)
If you mean parsing the document object model for your comments without using an external API, it would probably take me about a day, because I'm rusty with WatiN (the tool I used for web scraping when that was my job a couple of years ago). About four hours of that would be setting up an environment. If I were up to speed, it would take maybe a couple of hours to work out the script. It's not even close to hard compared to the crap I used to have to scrape. And I'm definitely not the best web scraper; I'm a non-amateur novice, basically. The basic process is this: anchor to a certain node type that is the child of another node with certain attributes and properties, search all the matching nodes for your user name, and then extract the content of the relevant child nodes from every matched node that contains it.
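A minimal sketch of that anchor, search, extract process, using Python's standard-library `html.parser` in place of WatiN (a swapped-in tool, not the one described above). The markup is hypothetical: it assumes each comment is a `<div class="comment">` holding an `<a class="author">` with the user name and a `<div class="md">` with the body, and that comments are not nested inside one another. LessWrong's real class names and structure would have to be checked in the page source.

```python
from html.parser import HTMLParser

# Hypothetical markup (not LessWrong's real structure):
# <div class="comment"><a class="author">NAME</a><div class="md">BODY</div></div>


class CommentExtractor(HTMLParser):
    """Collect (author, body) pairs from each anchored comment container."""

    def __init__(self):
        super().__init__()
        self.comments = []     # finished (author, body) pairs
        self._depth = 0        # div nesting depth inside the current comment (0 = outside)
        self._author = None
        self._body = []
        self._field = None     # "author" or "body" while inside those child nodes
        self._field_depth = 0  # div depth at which the body node was opened

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self._depth == 0 and "comment" in classes:
                # Anchor: entering a new comment container.
                self._depth = 1
                self._author, self._body = None, []
            elif self._depth:
                self._depth += 1
                if "md" in classes:
                    self._field, self._field_depth = "body", self._depth
        elif tag == "a" and self._depth and "author" in classes:
            self._field = "author"

    def handle_endtag(self, tag):
        if tag == "a" and self._field == "author":
            self._field = None
        elif tag == "div" and self._depth:
            if self._field == "body" and self._depth == self._field_depth:
                self._field = None
            self._depth -= 1
            if self._depth == 0:
                # Closed the comment container: record what was extracted.
                self.comments.append((self._author, "".join(self._body).strip()))

    def handle_data(self, data):
        if self._field == "author":
            self._author = (self._author or "") + data.strip()
        elif self._field == "body":
            self._body.append(data)


def comments_by(html, user):
    """Return the body text of every comment whose author matches `user`."""
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    return [body for author, body in parser.comments if author == user]


SAMPLE = """
<div class="comment"><a class="author">Brotherzed</a><div class="md"><p>First post.</p></div></div>
<div class="comment"><a class="author">Error</a><div class="md">Not mine.</div></div>
<div class="comment"><a class="author">Brotherzed</a><div class="md">Second post.</div></div>
"""

print(comments_by(SAMPLE, "Brotherzed"))  # ['First post.', 'Second post.']
```

Because it only reads HTML the server has already sent, this adds no server load; adapting the class names to the real page is the main work.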
WatiN: http://watin.org/
Selenium: http://www.seleniumhq.org/
These are the most popular browser-automation tools in the Microsoft (.NET) ecosystem.
As someone who has the ability to control how content is displayed to me (tip: press F12 in Google Chrome to open the developer tools), I disagree with the statement that a web browser is not a client. It is absolutely a client, and if I were sufficiently motivated I could view this page in any number of ways. So can you. Easy examples that require no special knowledge are disabling CSS, disabling JavaScript, and so on.
Sure, and if I want a karma histogram of all of my posts I can scrape my user page to get the scores. But that requires moving a huge amount of data from the server to me to answer a fairly simple question, which we could have computed on the server and then sent to me much more cheaply.
There's no extra load on the server; you're just parsing what the page already had to send you. If your goal is just to view the page rather than to collect data, that calls for a different solution, but it's also feasible.
What you can do is create a simple browser plugin that injects jQuery into the page to get all the comments by a given name. To go into the technical details a bit: inject your own copy of jQuery into the page (a fixed version whose behavior you know, in case LessWrong changes its version of jQuery). Then use jQuery selectors to anchor to all of your posts, using a technique similar to the one I described for the scraper. Finally, transform the page so that it consists of nothing but the comments you selected.
You could make this a real add-on: push a button in the top right of your Chrome browser, type a username, and see only the posts by that user on the current page.
Same principle as Adblock Plus or other browser add-ons.
If I look at 200 comment pages, doesn't that require the server to process my request and send me a comments page 200 times? Especially if telling it something like "give me 10 comments by user X after comment abc" means that it's running a SQL query that compares the comment id to abc.
I do agree that there are cool things you can do to manipulate comments on a page.
As for finding your comments regardless of which thread they are on, that is already a feature of Reddit's platform: click on your username, then click "comments" to get to the LW implementation of that feature.
Regardless, that isn't what you were describing earlier. It would not put extra load on the server to have jQuery transform this thread, which has all the comments, to show only your comments on the thread. It's a client-side task. That's what you originally said was not feasible.
All this talk has actually made me consider writing an add-on that makes Slashdot look clean and in-line like LW, Reddit, Y Combinator, etc.
Are you confusing me with Error? What I said was inefficient was writing a scraper to get a karma histogram of every comment (well, I wrote "post") I've ever written.
I do think that'd be a cool tool to have (though I don't use Slashdot).