
Internet Research (with tangent on intelligence analysis and collapse)

11 [deleted] 31 July 2013 04:58AM

Want to save time? Skip down to "I'm looking to compile a thread on Internet Research"!

Opinionated Preamble:

There is a lot of high-level thinking on Less Wrong, which is great. It has done wonders to structure and optimize my own decisions. But I think the political and futurology-related issues that Less Wrong covers can sometimes get out of sync with the reality and injustices of events in the immediate world. There are comprehensive treatments of how medical science is failing, or how academia cannot give unbiased results, and this is the milieu of programmers and philosophers in the middle-to-upper class of the planet. I believe this circle of awareness can be expanded, even if that means treading into mind-killing territory. If anything, I want to give people a near-mode sense of the stakes aside from x-risk: the x-risk scenarios I've seen Less Wrong fear the most kill humanity more or less instantly. A slower descent into violence and poverty is much more horrifying to me, because I might have to live in it and I don't know how. As a matter of fact, I have no idea how to predict it.

This is one reason why I'm drawn to the Intelligence Operations performed by the military and crime units, among other things. Intelligence product delivery is about raw and immediate *fact*, and there is a lot of it. The problems featured in IntelOps are among the few things rationality is good for: highly uncertain scenarios with one-off executions and messy or noisy feedback. Facts get lost in translation as messages are passed along, and of course feeding and receiving fake facts is all part of the job. Nevertheless, knowing *everything* *everywhere* is in the job description, and some form of rationality became a necessity.

It gets ugly. The demand for these kinds of skills often lies in industries that are highly competitive, violent, and illegal. I believe that once you take a close look at how force and power are applied in practice, there is no pretending anymore that human evils are an accident.

Open Source Intelligence, or "OSINT", is the mining of data and facts from public information databases, news articles, codebases, and journals. Although the amount of classified data dwarfs the unclassified, the sheer size and scope of unclassified data makes it the basis for a majority of intelligence reports, and thus it is involved in the great majority of executive decisions made by government entities. It's worth giving some thought to how much of what we know, they know too. As illustrated in this expose, the processing of OSINT is a big chunk of what modern intelligence is about, among many other things. I think understanding how rationality as developed on Less Wrong can contribute to better IntelOps, and how IntelOps can feed the rationality community, would be awesome, but that's a post for another time.

--

The Show

Through my investigations into IntelOps I've noticed the emphasis on search. Good search.

I'm looking to compile a thread on Internet Research. I'm wondering if there is any wisdom on Less Wrong that we can draw on to become more effective searchers. Here are some questions that could be answered specifically, but they are just guidelines - feel free to voice associated thoughts; we're exploring here.

  • Before actually going out and searching, what would be the most effective way of drafting and optimizing a collection plan? Are there any formal optimization models that inform our distribution of time and attention? Exploration vs exploitation comes to mind, but it would be worth formulating something specific. I heard that the multi-armed bandit problem is solved?
  • Do you have any links or resources regarding more effective search?
  • Do you have any experiences regarding internet research that you can share? Any patterns that you've noticed that have made you more effective at searching?
  • What are examples of closed-source information that are low-hanging fruit in terms of access (e.g. academic journals)? What are possible strategies for acquiring closed-source data (e.g. enrolling in small courses at universities, e-mailing researchers, coercion via the law/Freedom of Information Act, social engineering, etc.)?
  • I would like to hear from SEOs and software developers about their interpretation of semantic web technologies and how these are going to affect end-users. I am somewhat unfamiliar with the semantic web, but from my understanding, information that could not be indexed before is now indexed, and new ontologies will emerge as this information is mined. What should an end-user expect, and what opportunities will there be that didn't exist in the current generation of search?
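
On the exploration vs. exploitation question in the first bullet: one way to make it concrete is to treat each information source as an arm of a multi-armed bandit and let an index rule decide where the next query goes. Below is a minimal sketch using the UCB1 rule, offered as an illustration and not as the right model for a collection plan. The source names and hit rates are invented for the example; in real research the hit rates are exactly what you don't know.

```python
import math
import random


def ucb1_choose(counts, rewards, t):
    """Pick the index of the source to query next using the UCB1 rule."""
    # Try every source once before comparing them.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Score = average payoff so far + an exploration bonus that shrinks
    # as a source gets queried more often.
    scores = [
        rewards[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def allocate_queries(hit_rates, budget, seed=0):
    """Simulate spending `budget` queries across sources.

    `hit_rates` are hypothetical probabilities that a single query to a
    source turns up something useful.
    """
    rng = random.Random(seed)
    counts = [0] * len(hit_rates)
    rewards = [0.0] * len(hit_rates)
    for t in range(1, budget + 1):
        i = ucb1_choose(counts, rewards, t)
        counts[i] += 1
        rewards[i] += 1.0 if rng.random() < hit_rates[i] else 0.0
    return counts


# Hypothetical sources and hit rates, purely for illustration.
sources = ["general search engine", "academic database", "news archive"]
counts = allocate_queries([0.2, 0.5, 0.1], budget=300)
for name, n in zip(sources, counts):
    print(f"{name}: {n} queries")
```

The rule never abandons a source entirely; it just queries unpromising ones less and less often. Whether "a useful hit per query" is a sensible reward for research is itself an assumption worth arguing about.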

That should be enough to get started. Below are some links that I have found useful with respect to Internet Research.

--

Meta-Search Engines or Assisted Search:

Summarizers:

Bots/Collectors/Automatic Filters:

Compilations and Directories:

Guides:

Practice:

I don't really care how you use this information, but I hope I've jogged some thinking about why it could be important.

LessWrong search traffic doubles

22 Louie 25 March 2011 10:01PM

LessWrong search traffic doubles... despite Google thinking our site is a pro-family pro-democracy astrology blog! More on that in a minute.

First, The Good News: Since I started doing SEO on LessWrong (10 months ago), search traffic from Google has doubled! It took researching >200 different techniques and actually implementing 14 of them (w/ help from Tricycle), 2 of which I think are responsible for most of the improvement:

  • Reversing titles (e.g., "Less Wrong - OMG Scholarship!" -> "OMG Scholarship! - Less Wrong")
  • No-Following / No-Indexing a complex set of duplicate content
The analytics make me believe that this improvement is due to structural changes and not just generally increased traffic. But it certainly hasn't hurt that people have been writing new content and that HP:MoR exists.
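
For readers who haven't done this kind of SEO, here is a rough sketch of what the two changes amount to. The function names are made up for illustration, and the rule for deciding which pages count as duplicates is not given in this post, so it is reduced to a plain boolean here. The robots meta tag itself is the standard one.

```python
def reverse_title(title, site_name="Less Wrong", sep=" - "):
    """Move a leading site name to the end of a page title."""
    prefix = site_name + sep
    if title.startswith(prefix):
        return title[len(prefix):] + sep + site_name
    return title  # already reversed, or no site name to move


def robots_meta(is_duplicate):
    """Return a robots meta tag for a duplicate page, or '' for a normal page."""
    # How a page gets flagged as duplicate is an assumption left to the caller.
    if is_duplicate:
        return '<meta name="robots" content="noindex, nofollow">'
    return ""


print(reverse_title("Less Wrong - OMG Scholarship!"))
print(robots_meta(True))
```

Putting the post title first means the words unique to the page lead the title, and the meta tag asks crawlers to leave a duplicate page out of the index and not follow its links.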

Anyway, I'm really happy about this! This was the explicit goal I set for myself 10 months ago. It's nice to achieve goals... especially unreasonably ambitious ones.

So... YAY!! :D

OK, Now, The Bad News: So I was trying to figure out why we never get any traction for search terms like "rationality" when I looked through Google Webmaster Tools. This is what Google thinks our site is about, keyword-wise:

 

Keyword Occurrences
vote 196504
points 152881
permalink 95106
children 84578
parent 56374
people 37047
it's 27082
march 21846
february 21520
january 20425
human 19587
december 18005
september 15695
august 15667
password 15377
april 14714
october 14011
seem 12822
november 11546
july 11265
june 9283
world 8542
post 8496
actual 8251
probability 8114
child 7828
moral 7787
work 7143
might 6250
new 6156
theory 5827
argument 5639
read 5278
utility 5206
account 5002
evident 4777
belief 4749
remember 4691
recent 4584
intelligent 4582
science 4424
eliezer 4384
doesn't 4339
rationality 4188
brain 3969
decision 3904
life 3795
username 3732
mind 3721

All the keywords that I bolded are purely structural elements of the Less Wrong site layout. And it appears Google actually is punishing our site for this keyword density imbalance. Google really does think our site is about voting, parenting, and astrology. And while I find it somewhat hilarious that our top source of Google impressions (27,000/mo) is for the keyword "babies", I also lament that the keyword "rationality" is our #3955 source of traffic. We should invert this.
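
To make the imbalance concrete, here is a small sketch of how layout words can swamp content words in a raw keyword count. It is a toy, not how Google actually scores pages. The set of structural words is an assumption taken from the top of the table above, and the page text is invented.

```python
import re
from collections import Counter

# Assumed examples of layout-only words; a real list would come from the site templates.
STRUCTURAL_WORDS = {"vote", "points", "permalink", "children", "parent"}


def keyword_counts(text):
    """Count lowercase word occurrences in a page's visible text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def structural_share(counts, structural=STRUCTURAL_WORDS):
    """Fraction of all counted words that are layout words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for word, n in counts.items() if word in structural) / total


# Toy page text: a few words of content wrapped in comment chrome.
page = "Vote up Vote down 3 points Permalink Parent Children rationality is about belief"
counts = keyword_counts(page)
print(counts.most_common(3))
print(structural_share(counts))
```

Every comment on a page repeats the same chrome, so a long thread multiplies the layout words while each content word appears only a handful of times.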

So does anyone have any ideas? How do other sites solve this problem?