If HTML is supposed to be about the semantics of a page, a NOINDEX tag should have been part of every HTML specification, at least since server-side scripting became popular.
Many websites repeat a lot of text on every page that isn't really part of the content, such as "write your comment here", "next page", "previous page", "username / password", "permalink", etc.
I wonder: if your website contains the word "permalink" on every page and in every comment, and one page really is about permalinks, can Google tell the difference?
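To make the "permalink" question concrete, here is a minimal sketch of one way to separate structural words from content words: count how many pages each word appears on, and treat words present on every page as page furniture. This is not how Google does it (I don't know how they do it). The sample pages and the function names structural_words and excess_count are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())

def structural_words(pages, threshold=1.0):
    """Return words that appear on at least `threshold` fraction of pages."""
    doc_freq = Counter()
    for text in pages:
        doc_freq.update(set(tokenize(text)))  # count each word once per page
    cutoff = threshold * len(pages)
    return {word for word, n in doc_freq.items() if n >= cutoff}

def excess_count(page, word, pages):
    """How many more times `word` appears on `page` than on the median page."""
    counts = sorted(tokenize(p).count(word) for p in pages)
    median = counts[len(counts) // 2]
    return tokenize(page).count(word) - median

# Hypothetical pages: every one carries the same navigation text.
SAMPLE_PAGES = [
    "permalink next page previous page rationality and bias",
    "permalink next page previous page how permalink urls work permalink design",
    "permalink next page previous page babies and parenting",
]

if __name__ == "__main__":
    print(sorted(structural_words(SAMPLE_PAGES)))
    print(excess_count(SAMPLE_PAGES[1], "permalink", SAMPLE_PAGES))
```

In this toy data "permalink" shows up on every page, so it gets flagged as structural, but the second page uses it more often than the typical page does, which is the kind of signal that could mark it as the page that is really about permalinks.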
LessWrong search traffic doubles... despite Google thinking our site is a pro-family pro-democracy astrology blog! More on that in a minute.
First, The Good News: Since I started doing SEO on LessWrong (10 months ago), search traffic from Google has doubled! It took researching more than 200 different techniques and actually implementing 14 of them (with help from Tricycle), 2 of which I think are responsible for most of the improvement:
Anyway, I'm really happy about this! This was the explicit goal I set for myself 10 months ago. It's nice to achieve goals... especially unreasonably ambitious ones.
So... YAY!! :D
OK, Now, The Bad News: I was trying to figure out why we never get any traction for search terms like "rationality", so I looked through Google Webmaster Tools. This is what Google thinks our site is about, keyword-wise:
All the keywords that I bolded are purely structural elements of the Less Wrong site layout. And it appears Google actually is punishing our site for this keyword density imbalance. Google really does think our site is about voting, parenting, and astrology. And while I find it somewhat hilarious that our top source of Google impressions (27,000/mo) is for the keyword "babies", I also lament that the keyword "rationality" is our #3955 source of traffic. We should invert this.
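For anyone who wants to see the imbalance in numbers, here is a rough sketch of a keyword density check, run once on the raw page text and once with the layout words excluded. The page text and the set of layout words below are invented for the example, and keyword_density is just an illustrative helper, not anything from Webmaster Tools.

```python
import re
from collections import Counter

def keyword_density(text, exclude=frozenset()):
    """Map each word to its share of all counted words in `text`."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in exclude]
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}

# Hypothetical page: one short sentence of content, then the same
# comment controls repeated once per comment.
page = "rationality is about rationality " + "vote up vote down parent reply " * 3
layout_words = {"vote", "up", "down", "parent", "reply"}  # assumed, not measured

raw = keyword_density(page)
cleaned = keyword_density(page, exclude=layout_words)

if __name__ == "__main__":
    print("top keyword, raw text:", max(raw, key=raw.get))
    print("top keyword, layout words removed:", max(cleaned, key=cleaned.get))
```

With only three comments on the page, "vote" already outweighs "rationality" in the raw text. A real thread with hundreds of comments would skew much further.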
So does anyone have any ideas? How do other sites solve this problem?