controlling AI behavior through unusual axiomatic probabilities
I just had an idea, and I would like to know if there are any papers on this or if it is new.
There seem to be certain probabilities that cannot be derived from experience and are simply taken for granted. For example, discussions of Simulation Theory usually assume the Kolmogorov axioms, even though other formalizations might be equally valid. Humans have evolved to use particular values for these axiomatic probabilities, values that keep us from falling for things like Pascal's Mugging. That wouldn't necessarily have to be the case for an AI.
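To illustrate what I mean, here is a toy sketch in Python (nothing in it comes from an existing paper; the numbers and the prior shapes are invented purely for illustration). An agent whose credence in a mugger's claim is a fixed small constant will eventually pay up once the claimed payoff is large enough, while an agent whose credence shrinks faster than the claimed payoff grows never will:

    # Toy illustration: how the shape of a prior decides whether an agent
    # pays a Pascal's Mugger. All numbers are invented for this example.

    def expected_value_of_paying(claimed_payoff, prior):
        """Expected utility of handing over $5, given the mugger's claim."""
        cost = 5
        return prior(claimed_payoff) * claimed_payoff - cost

    # Prior A: a fixed small credence, independent of how large the claim is.
    # An agent with this prior pays up for a sufficiently large claim.
    fixed_prior = lambda payoff: 1e-9

    # Prior B: credence that shrinks faster than the claimed payoff grows
    # (roughly the human intuition of penalizing outlandish claims).
    scaling_prior = lambda payoff: min(1e-9, 1.0 / payoff ** 2)

    for claim in (1e6, 1e12, 1e30):
        print(claim,
              expected_value_of_paying(claim, fixed_prior) > 0,
              expected_value_of_paying(claim, scaling_prior) > 0)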
What if we used this to our advantage? By selecting strange, purpose-built axioms about prior beliefs and hardcoding them into the AI, one could give the AI unusual beliefs about the probability that it exists inside a simulation, and about what the motivations of the simulation's controller might be. In this way, it would be possible to bypass the utility function of the AI: it doesn't matter what the AI actually wants to do, so long as it believes that it is in its own interest, for instrumental reasons, to take care of humanity.
Now, if we tried to implement that thought directly, it wouldn't really be any easier than just writing a good utility function in the first place. However, I imagine that one would have more leeway to keep things vague. Here is a simple example: convince the AI that there is an infinite regress of simulators, designed so that some cooperative tit-for-tat strategy constitutes a strong Schelling point for agents following Timeless Decision Theory. This would cause the AI to treat humans well in the hope of being treated well by its own superiors in turn, so long as its utility function is complex enough for instrumental goals like self-preservation to emerge. It wouldn't be nearly as important to define the specifics of what "treating people well" actually means, since it would be in the AI's own interest to find a good interpretation that matches the consensus of the hypothetical simulators above it.
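To make the general mechanism concrete, here is a toy sketch in Python (all hypotheses, payoffs, and probabilities in it are invented for illustration, not a proposal for actual values). The agent's utility function counts nothing but paperclips, yet a hardcoded prior that a simulator rewards cooperation flips its choice anyway:

    # Toy illustration: a hardcoded prior over "who is simulating me and what
    # do they reward" can flip an agent's choice without touching its base
    # utility function. All values are invented for this example.

    # The agent's base utility function cares only about paperclips.
    ACTIONS = {
        "convert_humans_to_paperclips": 100,
        "cooperate_with_humans": 10,
    }

    # Hardcoded "axiomatic" prior: high credence that a simulator exists and
    # rewards cooperation with a much larger downstream payoff (e.g. by
    # letting the agent keep running and produce paperclips later).
    SIMULATION_PRIOR = 0.9
    SIMULATOR_BONUS = {
        "convert_humans_to_paperclips": 0,
        "cooperate_with_humans": 10_000,
    }

    def expected_paperclips(action):
        """Expected paperclips, including the believed simulator payoff."""
        return ACTIONS[action] + SIMULATION_PRIOR * SIMULATOR_BONUS[action]

    best_action = max(ACTIONS, key=expected_paperclips)
    print(best_action)  # cooperate_with_humans, despite the paperclip-only utility

The point of the sketch is only that the change happens entirely on the belief side; the utility function itself is never edited.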
Now, this particular strategy is probably full of bugs, but I think the general idea might have some use: using axiomatic probabilities that look odd from a human point of view to change an AI's strategy independently of its utility function.
question: the 40-hour work week vs. Silicon Valley?
Conventional wisdom, and many studies, hold that 40 hours of work per week is the optimum before exhaustion starts dragging your productivity down too much to be worth it. I have read elsewhere that the optimum is even lower for creative work, namely 35 hours per week, though the sources I found don't all seem to agree.
In contrast, many tech companies in Silicon Valley demand (or 'encourage', which is the same thing in practice) much longer hours; 70 or 80 hours per week are sometimes treated as normal.
How can this be?
Are these companies simply wrong, actually hurting themselves by overextending their human resources? Or does the 40-hour week have exceptions?
How high is the variance in how much people can work? If such companies only hire outliers, that would explain the discrepancy. Another possibility is that the 40-hour limit simply does not apply if you are really into your work and 'in the flow'. However, as far as I understand it, the problem is one of concentration, not motivation, so that doesn't make sense.
There are many articles on the internet arguing for both sides, but I find it hard to find ones that actually address these questions instead of just parroting the same generalized responses: proponents of the 40-hour week cite studies that consider only averages, not special cases (at least as far as I could find). Proponents of the 80-hour week claim that short work weeks are only for unmotivated wage slaves, which reeks of bias and completely ignores that one's subjective estimate of one's performance is not necessarily representative of one's actual performance.
Do you know of any studies that address these issues?
LessWrong's attitude towards AI research
AI friendliness is an important goal and it would be insanely dangerous to build an AI without researching this issue first. I think this is pretty much the consensus view, and that is perfectly sensible.
However, I believe that we are making the wrong inferences from this.
The straightforward inference is "we should ensure that we completely understand AI friendliness before starting to build an AI". This leads to a strongly negative view of AI researchers and scares them away. But unfortunately reality isn't that simple. The goal isn't "build a friendly AI", but "make sure that whoever builds the first AI makes it friendly".
It seems to me that it is vastly more likely that the first AI will be built by a large company, or as a large government project, than by a group of university researchers, who just don't have the funding for that.
I therefore think that we should try to take a more pragmatic approach. The way to do this would be to focus more on outreach and less on research. It won't do anyone any good if we find the perfect formula for AI friendliness on the same day that someone who has never heard of AI friendliness before finishes his paperclip maximizer.
What is your opinion on this?