LessWrong search traffic doubles... despite Google thinking our site is a pro-family pro-democracy astrology blog! More on that in a minute.

First, The Good News: Since I started doing SEO on LessWrong 10 months ago, search traffic from Google has doubled! It took researching more than 200 different techniques and actually implementing 14 of them (with help from Tricycle). I think 2 of those are responsible for most of the improvement:

  • Reversing titles (e.g., "Less Wrong - OMG Scholarship!" -> "OMG Scholarship! - Less Wrong")
  • No-Following / No-Indexing a complex set of duplicate content
The analytics make me believe that this improvement is due to structural changes and not just generally increased traffic. But it certainly hasn't hurt that people have been writing new content and that HP:MoR exists.
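For anyone curious what the title reversal amounts to, here is a minimal sketch. It assumes titles have the shape "Site Name - Article Title"; the function name `reverse_title` and its parameters are hypothetical, not the code that actually runs on the site.

```python
def reverse_title(title: str, site_name: str = "Less Wrong", sep: str = " - ") -> str:
    """Move a leading site name to the end of a page title."""
    prefix = site_name + sep
    if title.startswith(prefix):
        # Put the article-specific words first, the site name last.
        return title[len(prefix):] + sep + site_name
    # Already article-first, or not in the expected shape: leave untouched.
    return title


print(reverse_title("Less Wrong - OMG Scholarship!"))
# prints: OMG Scholarship! - Less Wrong
```

The point of the change is that the words that differ from page to page come first in the title.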

Anyway, I'm really happy about this! This was the explicit goal I set for myself 10 months ago. It's nice to achieve goals... especially unreasonably ambitious ones.

So... YAY!! :D

OK, Now, The Bad News: So I was trying to figure out why we never get any traction for search terms like "rationality" when I looked through Google Webmaster tools. This is what Google thinks our site is about, keyword wise:

 

Keyword Occurrences
vote 196504
points 152881
permalink 95106
children 84578
parent 56374
people 37047
it's 27082
march 21846
february 21520
january 20425
human 19587
december 18005
september 15695
august 15667
password 15377
april 14714
october 14011
seem 12822
november 11546
july 11265
june 9283
world 8542
post 8496
actual 8251
probability 8114
child 7828
moral 7787
work 7143
might 6250
new 6156
theory 5827
argument 5639
read 5278
utility 5206
account 5002
evident 4777
belief 4749
remember 4691
recent 4584
intelligent 4582
science 4424
eliezer 4384
doesn't 4339
rationality 4188
brain 3969
decision 3904
life 3795
username 3732
mind 3721

All the keywords that I bolded are purely structural elements of the Less Wrong site layout. And it appears Google actually is punishing our site for this keyword density imbalance. Google really does think our site is about voting, parenting, and astrology. And while I find it somewhat hilarious that our top source of Google impressions (27,000/mo) is for the keyword "babies", I also lament that the keyword "rationality" is our #3955 source of traffic. We should invert this.

So does anyone have any ideas? How do other sites solve this problem?


[joke] Change the names of the structural elements to keywords we consider important! For instance,

  • "Vote up / down" -> "rationality up / down"
  • "points" -> "paperclips"
  • "permalink" -> "timeless commenting decision"
  • "password" -> "the teacher's password"
  • "username" -> "code name in the Bayesian conspiracy"

EDIT: You know, I actually like the "points" -> "paperclips" change for real.

+1 to points -> paperclips :-D

I have previously suggested "Vote up/down" to "More like this/Less like this", to generally positive reception.

parent/children -> above/below? There should be something suitable.

When I put the word "rationality" into Google, the first hit is Wikipedia, the second is "Twelve Virtues of Rationality" and the third is LessWrong. How much of LW's low traffic on the word can be attributed to people just not searching on the word much? Edit: This was an artifact of searching logged-in - not logged in, it's not even on the front page.

Bending one's site out of shape for an idiot Googlebot sorta sucks, really. But on my own sites, Google supplies 97% of the search engine traffic. So I suppose one must do what one has to if traffic is a goal.

RationalWiki doesn't give a hoot about SEO, so has an accordingly poor showing and terrible pagerank. RW's hit articles tend to be stuff that it covers well that doesn't rate a Wikipedia article, e.g. Poe's law, Project Blue Beam, European Union Times. The whole answer to succeeding as a wiki is "provide something Wikipedia can't or won't."

When I put the word "rationality" into Google, the first hit is Wikipedia, the second is "Twelve Virtues of Rationality" and the third is LessWrong. How much of LW's low traffic on the word can be attributed to people just not searching on the word much?

Are you signed into google or not? When you're signed in, it tailors the results to your search history.

D'oh! Well spotted - not logged in, LessWrong is not on the front page.

On the plus side, Harry Potter and the Methods of Rationality is the fourth response to Rationality, even signed out.

And Yudkowsky.net is result #6

I am completely clueless about SEO, but the tag line "a community blog devoted to refining the art of human rationality" is part of an image file and as such invisible to Google, right? Making it equally prominently visible to Google as it is to humans seems like the sort of thing that would help. I don't know what the best way to do that would be though, alt text?

Yes, looking at the source HTML, the image has the alt text "Less Wrong"/"Less Wrong Discussion", but it does not include the tag line, which it should.


Google is smart enough to know about this kind of "trick" and trying it will actually decrease your pagerank. Do not meddle in the ways of google... ;)

This is all inherited from Reddit, right? Does Reddit get a lot of search traffic for babies?

My best SEO advice would be to turn the structural links (vote, edit, etc.) into buttons (i.e., POST instead of GET). AFAIK, Google doesn't consider buttons to be as "contenty" as ordinary links.

Actually, Less Wrong does have a fair amount of discussion about babies (mainly about killing them). And I would guess searches about babies are several orders of magnitude more frequent than searches about rationality.

Edit: Continuing this line of thought, maybe an effective strategy would be to figure out what potentially receptive people are searching for and write some posts about how to apply rationality to those things.

If someone wrote something like "Babies: A Rational Analysis", our site's current structuring would help it be unreasonably popular in Google. This would be analogous to Less Wrong "doing what it's best at".

CarlShulman's articles about voting are overly-popular for the same reason... probably by accident.

Does "Babies and Bunnies: A Caution About Evo-Psych" show this effect?

This would be analogous to Less Wrong "doing what it's best at".

I suggest you make a post of suggested topics that spring to mind. You don't have to write all the posts, but then someone inspired by the title can.

Can people please not write articles simply to improve Google ranking? That's dark sidish and also easily leads to a decline in content quality.

It looks to me like this is just a raw count of word occurrences rather than what google thinks are the most relevant keywords, because I wouldn't expect the latter to contain words like "it's". If I'm right then the list isn't very informative.
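To illustrate the raw-count hypothesis, here is a rough sketch of what a plain occurrence count does to a comment-heavy page. The sample text and the function name `raw_keyword_counts` are made up for illustration; this is not how Webmaster Tools is known to compute its list.

```python
import re
from collections import Counter


def raw_keyword_counts(page_text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Count every word on the page, with no attempt to separate layout from content."""
    words = re.findall(r"[a-z']+", page_text.lower())
    return Counter(words).most_common(top_n)


# Hypothetical page: the same comment "chrome" repeated once per comment,
# followed by a single sentence of actual content.
comment_chrome = "Vote up Vote down 3 points Permalink Parent Children "
sample_page = comment_chrome * 4 + "Rationality is about winning."

print(raw_keyword_counts(sample_page, top_n=3))
```

With only four comments on the page, "vote" already outnumbers "rationality" eight to one, which is the same shape as the list in the post.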

Regarding words like "vote" and "parent", I think one way to hide them would be to put them in buttons rather than links.

Google does do some word-ranking. From memory:

1) if it's in the url - it's more important

2) if it's in headings (h1/h2 etc. tags) then it's more important - the bigger the tag the better... but in descending order down the page (i.e., an h3 right at the top may be considered more important than an h1 at the bottom of the page)

3) google starts at the top of the page and works down. Stuff at the top is more important than stuff below that.

4) If it occurs more frequently, then it's probably more relevant (thus vote and parent)

5) If other links that point at this site contain the same keywords, then those keywords are more important

There's plenty of other stuff that goes into this - most of which google keeps secret and it changes on a day by day basis. There are people who make whole careers (lucrative ones!) out of figuring it all out.

Are 'Top' and 'Bottom' defined as on the unstyled page? If so, sidebars may be getting undue weight...

Yes, defined as on the unstyled page, however, if you're talking about the right-hand sidebar... it appears below the content on the page (I checked). The only things that appear "above" the content are the header-image, the top tabbed-navigation and that discussion blurb.

This probably would be bad for performance, but purely structural sections of the site could be loaded in no-indexed iframes.

If we were dealing with certain Russian search engines, structural sections could be no-indexed inline:

Russian search engines Yandex and Rambler introduce a new tag which only prevents indexing of the content between the tags, not a whole Web page.

<p>Do index this text block. <noindex>Don't index this text block</noindex></p>

Unfortunately, I don't see any indication that Google honors such a thing.

If HTML is supposed to be about semantics of the page, the NOINDEX tag should have been a part of every HTML specification, at least since server-side scripting became popular.

There is a lot of repeated text on each page of many websites, that really isn't part of the content, such as: "write your comment here", "next page", "previous page", "username / password", "permalink", etc.

I wonder: if your website contains the word "permalink" on every page and in every comment, and there is one page that is really about permalinks, can Google tell the difference?

Your SEO problem with "votes" and "points" keywords is not entirely due to the comment-voting sections. It's also because of the short blurb above the main article-title.

Google ranks things literally from top-down (in the html)... and that blurb starting "This part of the site is for the discussion of topics" (class = infobar) appears on most pages, and it appears above the H1 tag containing the article's title. Thus Google thinks it's MORE important than the main content of the article.

If you want that kind of thing to appear above the title... you can actually do funky things with CSS-positioning that will keep it below the article in the html, but appear to the humans as being at the top of the page.

I just noticed that in the recent comments feed, article links on comment replies to "Philosophy: A Diseased Discipline" go to http://lesswrong.com/r/lukeprog-drafts/lw/4zs/philosophy_a_diseased_discipline/ , which is a broken link because it's no longer a draft. That's probably bad for their rank, and it might be a more general problem.

It's a content vs. formatting issue. Words like vote, march, reply, points, etc are really formatting, but Google reads them as content.

To fix this, you could do a lot of JavaScript hacking so that the timestamps, etc are displayed using DHTML. The search engine robots won't run JavaScript, so they'll only see the content.

JS hacking will also make the page less stable, less accessible and more annoying to maintain. So it's possible, but there is a significant cost involved.

Well done, sir.

Unfortunately, I know very little about SEO.

Would it do anything to make the title be:

Article Title - Less Wrong: a community blog devoted to refining the art of human rationality

googlehacking is a fine art... and too much can be just as detrimental as too little.

Utility, belief, intelligent, brain, decision and mind are also topical, aren't they? Arguably moral, argument, theory and science as well. Except for the structural elements and rationality being too low it doesn't look too bad.

From https://sites.google.com/site/webmasterhelpforum/en/faq--webmaster-tools :

Q: Why do my Webmaster Tools stats show common phrases such as "buy now" that are not directly related to my site?

A: While some common words and phrases are filtered by Webmaster Tools, there may be some that you use which are not. Having these words or phrases listed in your Webmaster Tools account does not mean that our algorithms will view your site as being only relevant for those keywords. While Webmaster Tools mostly counts the occurences of words on your site, our web-search algorithms use well over 200 other factors for crawling, indexing and ranking. In other words: don't worry if you see keywords like this listed in your Webmaster Tools account.

I couldn't find a more detailed estimation of the impact of such keywords, but we should consider the option of just ignoring the issue. Especially since according to this the only effective options are JavaScript or frames tricks, both of which would make LW significantly more annoying or slow to use.

taryneast's idea of using CSS to pretend-shove the opening blurb to the bottom of the page could be rather painless, though.

Great job!


It occurs to me that those most frequent structural words are embedded in anchors whose URLs point back to lesswrong itself. That seems like a decent heuristic for peeling apart structure and ignoring it?

Edit: I suppose my theory is that Google would make efforts to ignore structural terms in analyzing topic, that this wouldn't be all that hard, and that the 'babies' effect is a coincidence.
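The same-site-anchor heuristic could be sketched roughly like this, using only the Python standard library. This is a guess at what such a filter might look like, not a claim about what Google does; the class name `ContentTextExtractor` and the sample URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ContentTextExtractor(HTMLParser):
    """Collect page text, skipping text inside anchors that link back to the same site."""

    def __init__(self, site_host: str):
        super().__init__()
        self.site_host = site_host
        self.skip_depth = 0  # > 0 while inside a same-site <a>
        self.chunks = []

    def _is_internal(self, href: str) -> bool:
        host = urlparse(href).netloc
        # Assumption: relative links (no host) count as internal.
        return host == "" or host == self.site_host

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.skip_depth or self._is_internal(href):
                self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


# Hypothetical comment markup: two structural links plus real content.
html = (
    '<div><a href="/lw/abc/#c1">Permalink</a> '
    '<a href="http://lesswrong.com/lw/abc/#c0">Parent</a>'
    '<p>Rationality is about winning. '
    'See <a href="http://example.com/">this essay</a>.</p></div>'
)
extractor = ContentTextExtractor("lesswrong.com")
extractor.feed(html)
extractor.close()
content_text = " ".join(extractor.chunks)
print(content_text)
```

"Permalink" and "Parent" drop out while the outbound link text survives. The obvious cost is that legitimate in-text links to other posts on the site get dropped too.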


For the months: fix the date display so that the month isn't written out.
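A minimal sketch of that, assuming timestamps are rendered server-side from a `datetime`; the function name `numeric_timestamp` and the sample date are made up.

```python
from datetime import datetime


def numeric_timestamp(when: datetime) -> str:
    """Format a comment timestamp without spelling out the month name."""
    return when.strftime("%Y-%m-%d %H:%M")


print(numeric_timestamp(datetime(2011, 3, 28, 20, 21)))
# prints: 2011-03-28 20:21
```

A numeric format keeps "march", "february" and so on out of the page text entirely, at some cost to human readability.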


I assume both the right and left will think that we support their cause because we're "rational".