Vaniver comments on The Web Browser is Not Your Client (But You Don't Need To Know That) - Less Wrong

22 Post author: Error 22 April 2016 12:12AM

Comment author: Brotherzed 25 April 2016 10:50:29PM *  1 point [-]

But consider the following problem: Find and display all comments by me that are children of this post, and only those comments, using only browser UI elements, i.e. not the LW-specific page widgets. You cannot -- and I'd be pretty surprised if you could make a browser extension that could do it without resorting to the API, skipping the previous elements in the chain above. For that matter, if you can do it with the existing page widgets, I'd love to know how.

If you mean parse the document object model for your comments without using an external API, it would probably take me about a day, because I'm rusty with WatiN (the tool I used to use for web scraping when that was my job a couple of years ago). About four hours of that would be setting up an environment. If I were up to speed, maybe a couple of hours to work out the script. Not even close to hard compared to the crap I used to have to scrape. And I'm definitely not the best web scraper; I'm a non-amateur novice, basically. The basic process is this: anchor to a certain node type that is the child of another node with certain attributes and properties, then search all the matching nodes for your user name, then extract the content of some child nodes of all the matched nodes that contain your post.
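A minimal sketch of that anchor, match, extract process, in Python's standard-library parser rather than WatiN. The markup is an assumption, not LW's real page structure: each comment is taken to have an `<a class="author">` followed by a `<div class="body">`, and `CommentExtractor`, `comments_by` and `SAMPLE` are names made up for the illustration.

```python
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collect the body text of every comment written by one user.

    Assumed (hypothetical) markup: <a class="author">NAME</a> comes
    before <div class="body">TEXT</div> inside each comment.
    """

    def __init__(self, username):
        super().__init__()
        self.username = username
        self.matches = []
        self._in_author = False
        self._author = ""
        self._body_depth = 0  # > 0 while inside a div.body
        self._body = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._body_depth:
            # Track nested divs so we know when the body itself closes.
            if tag == "div":
                self._body_depth += 1
        elif tag == "a" and "author" in classes:
            self._in_author = True
            self._author = ""
        elif tag == "div" and "body" in classes:
            self._body_depth = 1
            self._body = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_author:
            self._in_author = False
        elif tag == "div" and self._body_depth:
            self._body_depth -= 1
            if self._body_depth == 0 and self._author.strip() == self.username:
                self.matches.append("".join(self._body).strip())

    def handle_data(self, data):
        if self._in_author:
            self._author += data
        elif self._body_depth:
            self._body.append(data)


def comments_by(html, username):
    parser = CommentExtractor(username)
    parser.feed(html)
    parser.close()
    return parser.matches


SAMPLE = """
<div class="comment"><a class="author">Brotherzed</a>
  <div class="body">First comment.</div>
  <div class="children">
    <div class="comment"><a class="author">Vaniver</a>
      <div class="body">A reply.</div>
      <div class="children">
        <div class="comment"><a class="author">Brotherzed</a>
          <div class="body">A reply to <a href="#">the reply</a>.</div>
        </div>
      </div>
    </div>
  </div>
</div>
"""

print(comments_by(SAMPLE, "Brotherzed"))
```

Nested replies work here only because the author link is assumed to come before the body in every comment; a real page would need its own anchors.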

WatiN: http://watin.org/

Selenium: http://www.seleniumhq.org/

These are the most popular tools in the Microsoft ecosystem.

As someone who has the ability to control how content is displayed to me (tip: hit F12 in Google Chrome), I disagree with the statement that a web browser is not a client. It is absolutely a client, and if I were sufficiently motivated I could view this page in any number of ways. So can you. Easy examples that require no special knowledge are disabling the CSS, disabling JS, etc.

Comment author: Vaniver 28 April 2016 05:36:26PM 0 points [-]

Sure, and if I want a karma histogram of all of my posts I can scrape my user page and get them. But that requires moving a huge amount of data from the server to me to answer a fairly simple question, which we could have computed on the server and then moved to me more cheaply.
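To make the cost difference concrete, here is a rough sketch with made-up data. The record layout, the `karma_histogram` helper and the bucket width are all assumptions for illustration; none of this is LW's actual data model. It compares the size of shipping every comment to the client against shipping only the bucket counts.

```python
import json
from collections import Counter


def karma_histogram(scores, bucket_width=5):
    """Count scores per bucket; each bucket is keyed by its lower bound."""
    counts = Counter((s // bucket_width) * bucket_width for s in scores)
    return dict(sorted(counts.items()))


# Hypothetical comment records; the 500-character bodies stand in for real text.
comments = [
    {"id": i, "karma": k, "body": "x" * 500}
    for i, k in enumerate([0, 1, 1, 3, 7, 12, -2, 4])
]

histogram = karma_histogram([c["karma"] for c in comments])

# What the scraper has to download versus what a server-side answer would send.
full_payload = len(json.dumps(comments))
summary_payload = len(json.dumps(histogram))

print(histogram)
print(full_payload, "bytes scraped vs", summary_payload, "bytes summarised")
```

The histogram itself is the same either way; the only difference is on which side of the wire the counting happens.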

Comment author: Brotherzed 10 May 2016 08:49:15PM *  0 points [-]

There's no extra load on the server; you're just parsing what the page already had to send you. If your goal is just to view the web page rather than collect data, the solution is different but also feasible.

What you can do is create a simple browser plugin that injects jQuery into the page to get all the comments by a name. I'll go into the technical details a bit: inject an extra version of jQuery into the page (one that you know always uses the same code, in case Less Wrong changes its version of jQuery). Then use jQuery selectors to anchor to all your posts, using a technique similar to the one I described for the scraper. Then transform the page to consist of nothing but the anchored comments you acquired via jQuery.
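The transform step can be tried offline before writing any plugin. This is a Python analogue of the idea, not jQuery and not an extension: it assumes a well-formed page and the same hypothetical markup (`div.comment` with a direct-child `a.author` and `div.body`), and `keep_only` and `PAGE` are names invented for the sketch.

```python
import copy
import xml.etree.ElementTree as ET


def keep_only(page_xml, username):
    """Build a new container holding only the comments by `username`."""
    root = ET.fromstring(page_xml)
    filtered = ET.Element("div", {"class": "filtered"})
    for node in root.iter("div"):
        if node.get("class") != "comment":
            continue
        # Direct children only, so a parent never matches on a reply's author.
        author = node.find("a[@class='author']")
        body = node.find("div[@class='body']")
        if author is None or body is None:
            continue
        if (author.text or "").strip() == username:
            kept = ET.SubElement(filtered, "div", {"class": "comment"})
            kept.append(copy.deepcopy(author))
            kept.append(copy.deepcopy(body))
    return filtered


# Hypothetical, well-formed page fragment.
PAGE = (
    '<div class="thread">'
    '<div class="comment"><a class="author">Error</a><div class="body">Top.</div>'
    '<div class="children">'
    '<div class="comment"><a class="author">Vaniver</a><div class="body">Reply.</div></div>'
    '</div></div>'
    '<div class="comment"><a class="author">Vaniver</a><div class="body">Another.</div></div>'
    '</div>'
)

result = keep_only(PAGE, "Vaniver")
print(ET.tostring(result, encoding="unicode"))
```

Matching comments are flattened into one list, so the threading is lost; that matches "nothing but the anchored comments" but may not be what every reader wants.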

You could make this a real addon where you push a button in the top right of your Chrome browser, type a username, and then see nothing but the posts by that user on a given page.

Same principle as Adblock Plus or other browser addons.

Comment author: Vaniver 11 May 2016 12:38:31PM 0 points [-]

There's no extra load on the server; you're just parsing what the page already had to send you.

If I look at 200 comments pages, doesn't that require the server processing my request and sending me the comments page 200 times? Especially if telling it something like "give me 10 comments by user X after comment abc" means that it's running a SQL query that compares the comment id to abc.
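A sketch of the kind of query I have in mind, using an in-memory SQLite database. The `comments` table, its columns and the integer ids are assumptions for illustration; I don't know what LW's schema actually looks like. Each call to `comments_after` stands for one request the server has to process.

```python
import sqlite3


def comments_after(conn, author, after_id, limit=10):
    """Return up to `limit` (id, body) rows by `author` with id > after_id."""
    cursor = conn.execute(
        "SELECT id, body FROM comments "
        "WHERE author = ? AND id > ? ORDER BY id LIMIT ?",
        (author, after_id, limit),
    )
    return cursor.fetchall()


# Hypothetical data: 50 comments, alternating between two authors.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO comments (author, body) VALUES (?, ?)",
    [("Vaniver" if i % 2 else "Error", f"comment {i}") for i in range(1, 51)],
)

# Page through one author's comments, counting the requests it takes.
requests_made = 0
last_id = 0
collected = []
while True:
    page = comments_after(conn, "Vaniver", last_id)
    requests_made += 1
    collected.extend(page)
    if len(page) < 10:
        break
    last_id = page[-1][0]

print(len(collected), "comments in", requests_made, "requests")
```

Every pass through the loop is a separate query on the server, which is the per-page cost I'm asking about.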

I do agree that there are cool things you can do to manipulate comments on a page.

Comment author: Brotherzed 23 May 2016 04:30:01PM *  0 points [-]

If I look at 200 comments pages, doesn't that require the server processing my request and sending me the comments page 200 times?

As for finding your comments regardless of the thread they are on, that is already a feature of Reddit's platform - click on your username, then click "comments" to get to the LW implementation of that feature.

Regardless, that isn't what you were describing earlier. It would not put extra load on the server to have jQuery transform this thread, which has all the comments, to show only your comments on the thread. It's a client-side task. That's what you originally said was not feasible.

All this talk has actually made me consider writing an addon that makes Slashdot look clean and in-line like LW, Reddit, Ycombinator, etc.

Comment author: Vaniver 23 May 2016 11:57:16PM 0 points [-]

That's what you originally said was not feasible.

Are you confusing me with Error? What I said was inefficient was writing a scraper to get my karma histogram across every comment (well, I wrote "post") that I've ever written.

All this talk has actually made me consider writing an addon that makes Slashdot look clean and in-line like LW, Reddit, Ycombinator, etc.

I do think that'd be a cool tool to have (though I don't use Slashdot).