Alexandros comments on 96 Bad Links in the Sequences - Less Wrong
Yeah, lxml parses all the HTML into a tree and gives you an API so you can navigate it however you like. It takes a lot of the grunt work out of extracting data from HTML.
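For anyone who hasn't used it, here is a minimal sketch of what that looks like, assuming lxml is installed. The HTML snippet and the example.com URLs are made up for illustration; they are not taken from the Sequences.

```python
from lxml import html

# Hypothetical snippet; note the first <a> tag is split across lines.
snippet = """<div><p>See <a
href="http://example.com/a">the first
link</a> and <a href="http://example.com/b">the second</a>.</p></div>"""

# Parse the markup into an element tree.
tree = html.fromstring(snippet)

# Walk every <a> element and collect (href, link text) pairs.
# " ".join(s.split()) collapses line breaks and runs of spaces.
links = [
    (a.get("href"), " ".join(a.text_content().split()))
    for a in tree.iter("a")
]

# text_content() returns the text of the whole subtree with tags dropped.
plain_text = " ".join(tree.text_content().split())

print(links)
print(plain_text)
```

Because the parser works on the document structure rather than on lines of text, a tag that wraps across a line break is treated like any other tag.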
Which is awesome, as I just felt the pain of hand-pruning a heckuva lot of HTML tags out of something I wanted to convert to a different format. Even with find-and-replace, any tag split across a line break wouldn't get matched in full, so I had to do a lot of tedious cleanup by hand :)
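For the tag-pruning case specifically, something like the following might have saved the hand editing. This is a rough sketch that uses the standard library's `html.parser` rather than lxml, so it runs without installing anything; the `TextOnly` class, the `strip_tags` helper, and the sample string are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text outside of tags.
        self.chunks.append(data)


def strip_tags(markup):
    """Return the text of `markup` with tags removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())


# Made-up input: the <span> tag itself is broken across two lines,
# which is the case a line-by-line find-and-replace tends to miss.
messy = 'Some <span\n   class="note">tagged\ntext</span> to <b>convert</b>.'
cleaned = strip_tags(messy)
print(cleaned)
```

The parser tokenizes the whole input rather than matching line by line, so where the line breaks fall inside a tag doesn't matter.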