
This is a special post for quick takes by Hastings. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Nuclear power has gotten to a point where we can use it quite safely as long as no one does the thing (the thing being chemically separating the plutonium and imploding it in your neighbor's cities), and we seem to be surviving: while all the actors have put great effort into being ready to do "the thing," no one actually does it. I'm beginning to suspect that it will be worth separating alignment into two fields, one of "Actually make AI safe" and another, sadder but easier field of "Make AI safe as long as no one does the thing." I've made some infinitesimal progress on the latter, but am not sure how to advance, use, or share it. Currently, conditional on me being on the right track, any research that I tell basically anyone about will immediately be used to get ready to do the thing, and conditional on me being on the wrong track (the more likely case by far) it doesn't matter either way, so it's all downside. I suspect this is common? This is almost but not quite the same concept as "Don't advance capabilities."

The most important thing to realize about AI alignment is that basically all versions of practically aligned AI must assume that no one takes some specific action (mostly for misuse reasons, though for some specific plans the assumption can also be about misalignment).

Another way to say it is that I believe that in practice, these two categories are the same category, such that basically all work that's useful in the field will require someone not to do something, so the costs of sharing are practically 0, and the expected value of sharing insights is likely very large.

Specifically, I'm asserting that these 2 categories are actually one category for most purposes:

"Actually make AI safe" and another, sadder but easier field of "Make AI safe as long as no one does the thing."

Yeah, I think this is pretty spot on, unfortunately. For more discussion on this point, see: https://www.lesswrong.com/posts/kLpFvEBisPagBLTtM/if-we-solve-alignment-do-we-die-anyway-1

conditional on me being on the right track, any research that I tell basically anyone about will immediately be used to get ready to do the thing

Why? I don't understand.

Properties of the track I am on are load bearing in this assertion. (Explicit examples of both cases from the original comment: Tesla worked out how to destroy any structure by resonating it, and took the details to his grave because he was pretty sure that the details would be more useful for destroying buildings than for protecting them from resonating weapons. This didn't actually matter because his resonating weapon concept was crankish and wrong. Einstein worked out how to destroy any city by splitting atoms, and disclosed this, and it was promptly used to destroy cities. This did matter because he was right, but maybe didn't matter because lots of people worked out the splitting atoms thing at the same time. It's hard to tell from the inside whether you are crankish.)

The track you're on is pretty illegible to me. Not saying your assertion is true/false. But I am saying I don't understand what you're talking about, and don't think you've provided much evidence to change my views. And I'm a bit confused as to the purpose of your post. 

Diaper changes are rare and precious peace

Suffering from ADHD, I spend most of my time stressed that whatever I'm currently doing, it's not actually the highest priority task and something or someone I've forgotten is increasingly mad that I'm not doing their task instead.

One of the few exceptions is doing a diaper change. Not once in the past 2 years have I been mid-diaper-change and thought "Oh shit, there was something more important I needed to be doing right now."

A consistent trope in dath-ilani world-transfer fiction is "Well, the theorems of agents are true in dath ilan and independent of physics, so they're going to be true here, damnit."

How do we violate this in the most consistent way possible?

Well, it's basically default that a dath ilani gets dropped into a world without the P vs. NP distinction, usually due to time travel BS. We can make it worse: there's no rule that sapient beings have to exist in worlds with the same model of the Peano axioms. We pull some flatlander shit: Keltham names a Turing machine that would halt if two smart agents fall off the Peano frontier and claims to have a proof it never halts, and then the native math-lander chick says nah, watch this, and then together they iterate the machine for a very, very long time (a nonstandard integer number of steps), and then it halts, and Keltham (A) just subjectively experienced an integer larger than any natural number of his homeworld and (B) has a counterexample to his precious theorems.

I’m working on a theory post about the conjunction fallacy, and need some manifold users to bet on a pair of markets to make a demonstration more valid. I’ve put down 150 mana subsidy and 15 mana of boosts, anyone interested?

https://manifold.markets/HastingsGreer/pa-pa-b-experiment-statement-y?r=SGFzdGluZ3NHcmVlcg

https://manifold.markets/HastingsGreer/pa-pa-b-experiment-statement-x?r=SGFzdGluZ3NHcmVlcg

Let's examine an entirely prosaic situation: Carl, a relatively popular teenager at the local high school, is deciding whether to invite Bob to this weekend's party.

Some assumptions:

  • While pondering this decision for an afternoon, Carl's 10^11 neurons fire 10^2 times per second, for 10^5 seconds, each taking into account 10^4 input synapses, for 10^22 calculations (extremely roughly)
  • If there was some route to perform this calculation more efficiently, someone probably would, and would be more popular
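The arithmetic in the first assumption can be checked directly. This is a minimal sketch in Python that uses only the rough figures stated above; the variable names are illustrative and not part of the original estimate.

```python
# Order-of-magnitude figures taken from the first assumption above.
NEURONS = 10**11              # neurons in Carl's brain
FIRINGS_PER_SECOND = 10**2    # assumed firing rate per neuron
SECONDS = 10**5               # time spent pondering the invite
SYNAPSES_PER_NEURON = 10**4   # input synapses taken into account per firing

# One "calculation" is counted per synapse per firing, as in the text.
total_calculations = (
    NEURONS * FIRINGS_PER_SECOND * SECONDS * SYNAPSES_PER_NEURON
)

# Convert the pondering time to hours for a sanity check.
hours_pondering = SECONDS / 3600

print(f"total calculations: {total_calculations:.0e}")
print(f"pondering time: {hours_pondering:.1f} hours")
```

The product comes out to 10^22, matching the stated total. Note that 10^5 seconds is about 28 hours, which is generous for a single afternoon but within the spirit of "extremely roughly."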

The important part of choosing a party invite as the task under consideration is that I suspect this is the category of task the human brain is tuned for, and it's a task that we seem to be naturally inclined to spend enormous amounts of time pondering, alone or in groups; see the trope of the 6 hour pre-prom telephone call. I'm inclined to respect that: to believe that any version of Carl, mechanical or biological, that spent only 10^15 calculations on whether to invite Bob would eventually get wrecked on the playing field of high school politics.

What model predicts that optimal party planning is as computationally expensive as learning the statistics of the human language well enough to parrot most of human knowledge?

I think your calculations are off by orders of magnitude. Not all neurons fire constantly at 100 times per second: https://aiimpacts.org/rate-of-neuron-firing/ estimates 0.29 to 1.82 times per second. Most importantly perhaps, not all of the processing is directed to that decision. During those hours, many MANY other things are happening.

Thanks for the link to the aiimpacts page! I definitely got the firing rate wrong by about a factor of 50, but I appear to have made other mistakes in the other direction, because I ended up at a number that roughly agrees with aiimpacts: I guessed 10^17 operations per second, and they guess 0.9 to 33 x 10^16, with low confidence. https://aiimpacts.org/brain-performance-in-flops/
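To make the comparison concrete, here is a rough sketch in Python that recomputes the per-second figure from the numbers quoted in this thread. It assumes the same simple formula as the original estimate (neurons x firing rate x synapses) and treats "operations per second" as loosely comparable to the aiimpacts range; both are assumptions of this sketch, and the variable names are illustrative.

```python
NEURONS = 10**11
SYNAPSES_PER_NEURON = 10**4

# Original guess: every neuron fires 100 times per second.
original_guess = NEURONS * 10**2 * SYNAPSES_PER_NEURON  # operations per second

# Same formula with the firing-rate range quoted from aiimpacts
# (0.29 to 1.82 firings per second per neuron).
corrected_low = NEURONS * 0.29 * SYNAPSES_PER_NEURON
corrected_high = NEURONS * 1.82 * SYNAPSES_PER_NEURON

# Brain-performance range as quoted in the thread: 0.9 to 33 x 10^16.
AIIMPACTS_LOW = 9 * 10**15
AIIMPACTS_HIGH = 33 * 10**16

guess_in_range = AIIMPACTS_LOW <= original_guess <= AIIMPACTS_HIGH

print(f"original guess:  {original_guess:.1e} ops/s")
print(f"corrected range: {corrected_low:.1e} to {corrected_high:.1e} ops/s")
print(f"original guess inside quoted range: {guess_in_range}")
```

Under these assumptions the original 10^17 figure lands inside the quoted range, while plugging the lower firing rates into the same formula gives a figure below it, which is consistent with offsetting errors elsewhere in the estimate.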

 


and would be more popular

Not necessarily. In high school politics, pure looks, physical form, and financial support from the parents, all of which are essentially unrelated to brain processing, account for a significant chunk.

Popular media reference: look at Jersey Shore, which is essentially high school politics turned up. Many of the actors used very simple strategies, such as Snooki wandering around drunk and saying funny things, or Ronnie essentially just doing plenty of steroids and getting into endless fights.

Other than making sure the robotics hardware looks good, an AI algorithm could be dramatically more compact than the example you gave by developing a "popularity maximizing" policy from the knowledge of many other robots in many other high schools. Most likely, Carl is using a deeply suboptimal policy, not having seen enough training examples in his maximum of 4 years of episodes (unless he got held back a year). A close to optimal policy, even one with a small compute budget, should greatly outperform Carl.
