
Alexandros comments on 96 Bad Links in the Sequences - Less Wrong Discussion

34 Post author: Alexandros 07 April 2011 10:39AM


Comments (27)


Comment author: Alexandros 08 April 2011 06:14:20AM 0 points

Yeah, lxml parses the HTML into a tree and gives you an API for walking and querying it however you like. It takes a lot of the grunt work out of extracting data from HTML.
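For anyone curious what that looks like in practice, here is a minimal sketch, assuming lxml is installed. The sample markup, the example.com URLs, and the `extract_links` helper are made up for illustration and are not taken from the actual link-checking script.

```python
from lxml import html

# Hypothetical sample page; the real input would be a downloaded post.
SAMPLE = """
<html><body>
  <h1>Sequences</h1>
  <p>See <a href="http://example.com/a">post A</a> and
     <a href="http://example.com/b">post B</a>.</p>
</body></html>
"""

def extract_links(markup):
    """Return (link text, href) pairs for every <a> element in the markup."""
    tree = html.fromstring(markup)  # parse the HTML into an element tree
    return [(a.text_content(), a.get("href")) for a in tree.iter("a")]

links = extract_links(SAMPLE)
for text, href in links:
    print(text, href)
```

Once the page is a tree, pulling out every link is a one-line loop rather than a pile of string matching.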

Comment author: jwhendy 08 April 2011 01:23:11PM 0 points

Which is awesome, as I just felt the pain of hand-pruning a heckuva lot of HTML tags out of something I wanted to convert to a different format. Even with find-and-replace, a tag split across a line break wouldn't get matched in full, so I had to do a lot of tedious cleanup by hand :)
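A rough sketch of why a parser sidesteps the line-break problem: it reads a tag as one unit even when its attributes wrap onto the next line. This version uses Python's standard-library `html.parser` rather than lxml, purely so it runs without extra installs. The `TextExtractor` class, the `strip_tags` helper, and the sample string are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text between tags, discarding the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found outside of tags.
        self.chunks.append(data)

def strip_tags(markup):
    """Return the text of the markup with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())

# A tag whose attributes wrap across lines, which trips up line-by-line find-replace.
messy = '<p>Some <a\n   href="http://example.com"\n   class="ext">linked</a> text</p>'
clean = strip_tags(messy)
print(clean)
```

With lxml, calling `text_content()` on the parsed tree should get you much the same result.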