Comment author: Stuart_Armstrong 20 October 2014 12:19:18PM 2 points [-]
Comment author: sbenthall 21 October 2014 12:08:24AM 1 point [-]

Thanks. That criticism makes sense to me. You put the point very concretely.

What do you think of the use of optimization power in arguments about takeoff speed and x-risk?

Or do you have a different research agenda altogether?

Comment author: lukeprog 20 October 2014 06:22:11PM *  0 points [-]

You might say bounded rationality is our primary framework for thinking about AI agents, just as it is in AI textbooks like Russell & Norvig's. So that question sounds to me the way it might sound to a biologist if she were asked whether her sub-area had any connections to that "Neo-Darwinism" thing. :)

Comment author: sbenthall 21 October 2014 12:05:22AM 0 points [-]

That makes sense. I'm surprised that I haven't found any explicit reference to that in the literature I've been looking at. Is that because it is considered to be implicitly understood?

One way to talk about optimization power, maybe, would be to consider a spectrum between unbounded, Laplacean rationality and the dumbest things around. There seems to be a move away from this, though, because it's too tied to notions of intelligence and doesn't look enough at outcomes?

It's this move that I find confusing.

Comment author: DavidLS 20 October 2014 10:56:48AM 2 points [-]

Yeah, this is a brutal point. I wish I knew a good answer here.

Is there a gold standard approach? Last I checked even the state of the art wasn't particularly good.

Facebook / Google / StumbleUpon ads sound promising in that they can be trivially automated, and if sign-ups are limited to ad respondents, the friend issue is moot. Facebook is the most interesting of those because of the demographic control it gives.

How bad is the bias? I performed a couple of Google Scholar searches but didn't find anything satisfying.

To make things more complicated, some companies will want to test highly targeted populations. For example, Apptimize is only suitable for mobile app developers -- and I don't see a Facebook campaign working out very well for locating such people.

A tentative solution might be having the company wishing to perform the test supply a list of websites they feel cater to good participants. This is even worse than Facebook ads from a biasing perspective, though. At minimum, it sounds like prominently disclosing how participants were located will be important.

Comment author: sbenthall 20 October 2014 11:56:56PM 3 points [-]

There are people in my department who do work in this area. I can reach out and ask them.

I think Mechanical Turk gets used a lot for survey experiments because it has a built-in compensation mechanism and there are ways to ask questions in ways that filter people into precisely what you want.

I wouldn't dismiss Facebook ads so quickly. I bet there is a way to target mobile app developers on that.

My hunch is that, like survey questions, sampling methods are going to need to be tuned case by case, with patterns extracted inductively from that. Good social scientific experiment design is very hard. Standardizing it is a noble but difficult task.

Comment author: lukeprog 19 October 2014 03:27:07AM 3 points [-]

It's not much, but: see our brief footnote #3 in IE:EI and the comments and sources I give in What is intelligence?

Comment author: sbenthall 20 October 2014 05:23:50AM 0 points [-]

Thanks. That's very helpful.

I've been thinking about Stuart Russell lately, which reminds me...bounded rationality. Isn't there a bunch of literature on that?

http://en.wikipedia.org/wiki/Bounded_rationality

Have you ever looked into any connections there? Any luck with that?

Comment author: Gunnar_Zarncke 18 October 2014 07:27:15PM *  4 points [-]

I see two main ways to deal mathematically with these optimization processes:

1) The first is a 'whatever-it-takes' process that realizes a goal function ideally (in the limit). To get a feel for how the mathematics looks, I suggest a look at the comparable mathematics of the operational amplifier (op-amp for short).

An ideal op-amp also does whatever it takes to realize the transfer function applied to the input. Non-ideal, i.e. real, op-amps fall short of this goal, but one can give operating ranges by comparing the parameters of the transfer function elements with the parameters (mostly the A_OL) of the op-amp.

I think this is a good model for the limiting case because we abstract the 'optimization process' as a black box and look at what it does to its goal function -- namely, realize it. We can just make this mathematically precise.
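To make the op-amp limit concrete, here is a quick numerical sketch (the specific values are illustrative, not from the comment): the closed-loop gain A_CL = A_OL / (1 + A_OL * beta) of a negative-feedback amplifier approaches the ideal 1/beta as the open-loop gain A_OL grows, which is exactly the 'whatever-it-takes' behavior in the limit.

```python
# Closed-loop gain of a non-inverting feedback amplifier:
#   A_CL = A_OL / (1 + A_OL * beta)
# As the open-loop gain A_OL -> infinity, A_CL -> 1/beta, the ideal
# transfer function set by the feedback network alone.

def closed_loop_gain(a_ol: float, beta: float) -> float:
    """Gain of an amplifier with open-loop gain a_ol and feedback fraction beta."""
    return a_ol / (1 + a_ol * beta)

beta = 0.01  # feedback network sets an ideal gain of 1/beta = 100
for a_ol in (1e3, 1e5, 1e7):
    print(a_ol, closed_loop_gain(a_ol, beta))
# The printed gains converge toward 100 as a_ol grows.
```

The point of the analogy: the 'operating range' question is just how large A_OL has to be before the black box is effectively ideal.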

2) My second model tries to capture the differential equations following from EY's description of Recursive Self-Improvement (RSI), namely the PDEs relating "Optimization slope", "Optimization resources", and "Optimization efficiency" to actual physical quantities. I started to write the equations down and put a few into Wolfram Alpha, but didn't have time to do a comprehensive analysis. I'd think, though, that the resulting equations form classes of functions which could be classified by their associated complexity and risk.
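For what it's worth, here is a toy discretized version of such dynamics (my own illustrative model, not necessarily the equations described above): let capability I(t) grow as dI/dt = e * I**p, where e stands in for optimization efficiency and p for how strongly current capability feeds back into the optimization slope. The exponent p alone then sorts the equations into growth/risk classes.

```python
# Toy recursive self-improvement dynamics (illustrative): dI/dt = e * I**p.
# p < 1: decelerating growth; p = 1: exponential growth; p > 1: finite-time
# blow-up ("foom") -- the exponent alone classifies the risk regime.

def integrate(p: float, e: float = 0.1, i0: float = 1.0,
              dt: float = 0.01, steps: int = 1000) -> float:
    """Forward-Euler integration of dI/dt = e * I**p; returns I(steps * dt)."""
    i = i0
    for _ in range(steps):
        i += dt * e * i ** p
    return i

i_sub = integrate(p=0.5)   # decelerating
i_exp = integrate(p=1.0)   # exponential
i_sup = integrate(p=1.5)   # accelerating toward a finite-time blow-up at t = 20
print(i_sub, i_exp, i_sup)
```

Each regime grows strictly faster than the last over the same interval, which is the kind of qualitative classification a fuller analysis would make precise.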

And when searching for RSI look what I found:

Mathematical Measures of Optimization Power

Comment author: sbenthall 20 October 2014 05:20:32AM 1 point [-]

1) This is an interesting approach. It looks very similar to the approach taken by the mid-20th century cybernetics movement--namely, modeling social and cognitive feedback processes with the metaphors of electrical engineering. Based on this response, you in particular might be interested in the history of that intellectual movement.

My problem with this approach is that it considers the optimization process as a black box. That seems particularly unhelpful when we are talking about the optimization process acting on itself as a cognitive process. It's easy to imagine that such a thing could just turn itself into a superoptimizer, but that would not be taking into account what we know about computational complexity.

I think it's this kind of metaphor that is responsible for "foom" intuitions, which I believe are misplaced.

2) Partial differential equations assume continuous functions, no? But in computation, we are dealing almost always with discrete math. What do you think about using concepts from combinatorial optimization theory, since those are already designed to deal with things like optimization resources and optimization efficiency?
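To make that suggestion concrete with a toy example of my own (not from the thread): combinatorial optimization treats resources as a discrete budget, e.g. the 0/1 knapsack problem, which is solved exactly by dynamic programming rather than by taking limits of continuous functions.

```python
def knapsack(items: list[tuple[int, int]], budget: int) -> int:
    """Maximum total value from (cost, value) items under a discrete resource
    budget -- the classic 0/1 knapsack dynamic program."""
    best = [0] * (budget + 1)          # best[b] = best value using budget b
    for cost, value in items:
        # Iterate budgets downward so each item is used at most once.
        for b in range(budget, cost - 1, -1):
            best[b] = max(best[b], best[b - cost] + value)
    return best[budget]

print(knapsack([(2, 3), (3, 4), (4, 5), (5, 8)], budget=5))  # -> 8
```

Here "optimization resources" are literally the budget, and "optimization efficiency" is value gained per unit of budget spent; both are native concepts in this framework.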

Comment author: sbenthall 20 October 2014 05:04:50AM 1 point [-]

Could you please link to examples of the kind of marketing studies that you are talking about? I'd especially like to see examples of those that you consider good vs. those you consider bad.

Comment author: DavidLS 20 October 2014 02:23:29AM 1 point [-]

Thank you for posting this!

I'm feeling like in this situation I can safely say: "I love standards, there are so many to choose from."

Getting a list of LessWrong approved questions would be awesome. Both because I think the LW list will be higher quality than a lot of what's out there, and because I feel question choice is one of the free variables we shouldn't leave in the hands of the corporation performing the test.

Comment author: sbenthall 20 October 2014 05:03:36AM 2 points [-]

I am confused. Shouldn't the questions depend on the content of the study being performed? Which would depend (very specifically) on the users/clients? Or am I missing something?

Comment author: DavidLS 19 October 2014 07:44:50AM 4 points [-]

Oh, interesting.

I had been assuming that participants needed to be drawn from the general population. If we don't think there's too much hazard there, I agree a points system would work. Some portion of the population would likely just enjoy the idea of receiving free product to test.

Comment author: sbenthall 20 October 2014 04:55:33AM 3 points [-]

I would worry about sampling bias due to selection based on, say, enjoying points.

Comment author: DavidLS 19 October 2014 01:03:28AM *  4 points [-]

2 - Is the data (presumably anonymized) made publicly available, so that others can dispute the meaning?

That was the initial plan, yes! Beltran (my co-founder at GB) is worried that will result in either HIPAA issues or something like this, so I'm ultimately unsure. Putting structures in place so the science is right the first time seems better.

Comment author: sbenthall 20 October 2014 04:52:41AM 4 points [-]

The privacy issue here is interesting.

It makes sense to guarantee anonymity. For example, participants recruited personally by company founders may otherwise be unwilling to report honestly. For health-related studies, privacy is an issue for insurance reasons, etc.

However, for follow-up studies, it seems important to keep earlier records including personally identifiable information so as to prevent repeatedly sampling from the same population.

That would imply that your organization/system needs to have a data management system for securely storing the personal data while making it available in an anonymized form.

However, there are privacy risks associated with 'anonymized' data as well, since this data can sometimes be linked with other data sources to make inferences about participants. (For example, if participants provide a zip code and certain demographic information, that may be enough to narrow it down to a very few people.) You may want to consider differential privacy solutions or other kinds of data perturbation.

http://en.wikipedia.org/wiki/Differential_privacy
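The simplest differential-privacy primitive is the Laplace mechanism. Here is a minimal sketch (the function name and epsilon values are my own, chosen for illustration): a counting query has sensitivity 1, so adding Laplace(0, 1/epsilon) noise before release gives epsilon-differential privacy.

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy via the Laplace
    mechanism.  A counting query has sensitivity 1 (adding or removing one
    participant changes the count by at most 1), so the noise scale is
    1 / epsilon."""
    u = random.random() - 0.5                       # uniform on [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    noise = -(1.0 / epsilon) * sign * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Smaller epsilon means stronger privacy and noisier released counts.
released = dp_count(42, epsilon=1.0)
```

Individual releases are deliberately noisy, but averages over many queries stay close to the truth, which is the trade-off your organization would be tuning.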

Comment author: DavidLS 18 October 2014 11:42:06PM *  13 points [-]

Thanks for pointing this out.

Let's use Beeminder as an example. When I emailed Daniel he said this: "we've talked with the CFAR founders in the past about setting up RCTs for measuring the effectiveness of beeminder itself and would love to have that see the light of day".

Which is a little open ended, so I'm going to arbitrarily decide that we'll study Beeminder for weight loss effectiveness.

Story* as follows:

Daniel goes to (our thing).com and registers a new study. He agrees to the terms, and tells us that this is a study which can impact health -- meaning that mandatory safety questions will be required. Once the trial is registered it is viewable publicly as "initiated".

He then takes whatever steps we decide on to locate participants. Those participants are randomly assigned to two groups: (1) act normal, and (2) use Beeminder to track exercise and food intake. Every day the participants are sent a text message with a URL where they can log that day's data. They do so.
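That randomization step could be sketched as follows (the function and group names are my own invention, just to aid discussion):

```python
import random

def assign_groups(participant_ids: list[str], seed: int = 0) -> dict[str, str]:
    """Randomly split participants into a control group ("act normal") and a
    treatment group ("use Beeminder").  A fixed seed makes the assignment
    reproducible, so the split can be audited after the study."""
    rng = random.Random(seed)
    shuffled = participant_ids[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {pid: ("control" if i < half else "beeminder")
            for i, pid in enumerate(shuffled)}

groups = assign_groups(["p1", "p2", "p3", "p4"])
```

Publishing the seed (or a commitment to it) alongside the registered trial would let outsiders verify that the assignment really was random.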

After two weeks, the study completes and both Daniel and the world are greeted with the results. Daniel can now update Beeminder.com to say that Beeminder users lost XY pounds more than the control group... and when a rationalist sees such claims they can actually believe them.

  • Note that this story isn't set in stone -- it's just a sketch to aid discussion.
Comment author: sbenthall 20 October 2014 04:39:17AM *  3 points [-]

He then takes whatever steps we decide on to locate participants.

Even if the group assignments are random, the prior step of participant sampling could lead to distorted effects. For example, the participants could be just the friends of the person who created the study who are willing to shill for it.

The studies would be more robust if your organization took on the responsibility of sampling itself. There is non-trivial scientific literature on the benefits and problems of using, for example, Mechanical Turk and Facebook ads for this kind of work. There is extra value added for the user/client here, which is that the participant sampling becomes a form of advertising.
