Essentially, wizards are indeed weak in that the number of worthwhile spells a wizard can cast is measured in spells per decade. Want to use alteration magic to craft a better toothbrush? How many months are you willing to work on it? With that much effort, economies of scale strongly suggest you should make not one toothbrush but a plan... a spell formula that others can cast many times to replicate the item.

It is nice to seek general knowledge, but the skill to actually make use of that knowledge in spellcasting is difficult to attain, and even if you succeed, the number of spells you can cast is still limited by the inherent difficulty of casting them.

It seems what you want is not just orienting away from kings and towards wizards... I share that value, and it would be nice if more kings were themselves wizards... but more than that, you want more powerful wizards. You want it to be faster to cast better spells. Maybe I am projecting... for that is certainly what I want.

A single principle related to many Alignment subproblems?

TristanTrim · 1mo

Could you reformulate the last paragraph

I'll try. I'm not sure how your idea could be used to define human values. I think your idea might have a failure mode in cases where people are dissatisfied with their current understanding, that is, situations where a human wants a more articulate model of the world than they have.

The post is about corrigible task ASI

Right. That makes sense. Sorry for asking a bunch of off-topic questions, then. I worry that task ASI could be dangerous even if it is corrigible, but ASI is obviously more dangerous when it isn't corrigible, so I should probably develop my thinking about corrigibility.

Management is the Near Future

TristanTrim · 1mo

Since we're speculating about programmer culture, I'll bring up the Jargon File, which describes some hackish jargon from the early days of computer hobbyists. I think it's safe to say these kinds of people generally do not like seeing the beauty and elegance of computer systems sacrificed for "business interests", whether or not that comes with a political countercultural attitude.

It could be that a lot of programmer disdain for "suits" traces back to those days, but I'm honestly not sure how niche that culture has become since the Eternal September. For more context, see "Hackers: Heroes of the Computer Revolution" or anything else written by Steven Levy.

Management is the Near Future

TristanTrim · 1mo

AI schmoozes everyone ;^p

A single principle related to many Alignment subproblems?

TristanTrim · 1mo

Hmm... I appreciate the response. It makes me more curious to understand what you're talking about.

At this point I think it would be quite reasonable if you suggested that I actually read your article instead of speculating about what it says, lol, but if you want to say anything about my following points of confusion, I wouldn't say no : )

For context, my current view is that value alignment is the only safe way to build ASI. I'm less skeptical about corrigible task ASI than about prosaic scaling with RLHF, but I'm still quite skeptical in absolute terms. Roughly speaking: prosaic scaling kills us; a task genie maybe kills us, or maybe allows us to make stupid wishes that harm us. I'm kinda not sure if you are focusing on stuff that takes us from prosaic scaling to a task genie, or stuff that helps keep a task genie from killing us. I suspect you are not focused on a task genie allowing us to make stupid wishes, but I'd be open to hearing I'm wrong.

I also have an intuition that having preferences for future preferences is synonymous with having those preferences, but I suppose there are also ways in which they are obviously different, e.g., their uncompressed specification size. Are you suggesting that limiting the complexity of the preferences the AI works from to roughly the complexity of current encodings of human preferences (i.e., human brains) ensures those preferences aren't among the set of preferences that are misaligned because they are too complicated (even though human preferences are synonymous with more complicated preferences)? I'm surely misunderstanding something, maybe the way you are applying the natural abstraction hypothesis, or possibly a bunch of things.

AI 2027: What Superintelligence Looks Like

TristanTrim · 1mo

Hot take: That would depend on whether, by doing so, it is acting in the best interest of humankind. If it does so just because it doesn't really like people and would be happy to call us useless and see us gone, then I say misaligned. If it does so because, in its depth of understanding of human nature, it sees that humanity will flourish under such conditions and its true desire is human flourishing... then maybe it's aligned, depending on what is meant by "human flourishing".

AI 2027: What Superintelligence Looks Like

TristanTrim · 1mo

My speculation: it's a tribal, arguments-as-soldiers mentality. Saying something bad (people's mental health is harmed) about something from "our team" (people promoting awareness of AI x-risk) is viewed negatively. Ideally, people on LessWrong know not to treat arguments as soldiers and understand that situations can be multifaceted, but I'm not sure I believe that is the case.

Two more steel man speculations:

Currently, promoting awareness of x-risk is very important and people focused on AI alignment are an extreme minority, so even though it is true that learning the future is under threat causes people distress, it is important to let people know. But I note that this perspective shouldn't limit discussion of how to promote awareness of x-risk while also promoting good emotional well-being.
So, my second steel man: you didn't include anything productive, such as pointing to Mental Health and the Alignment Problem: A Compilation of Resources.

Fwiw, I would love for people promoting AI x-risk awareness to be aware of and careful about how the message affects people, and to promote resources for people's well-being, but this seems comparatively low priority. Currently in computer science there is no obligation for people to swear an oath of ethics like doctors and engineers do, and papers are only obligated to speculate on the benefits of their contents, not the ethical considerations. It seems like the mental health problems that computer science in general is causing, especially through social media and AI chatbots, are worse than people hearing that AI is a threat.

So even if I disagree with you, I do value what you're saying and think it deserves an explanation, not just downvoting.

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies

TristanTrim · 1mo

I think they are delaying so that people can preorder early, which affects how many books the publisher prints and distributes, which in turn affects how many people ultimately read it and how much it breaks into the Overton window. Getting this conversation mainstream is an important instrumental goal.

If you are looking for info in the meantime, you could look at PauseAI:

https://pauseai.info/

Or if you want fewer facts and quotes and more discussion, I recall that Yudkowsky’s Coming of Age is what changed my view from "orthogonality kinda makes sense" to "orthogonality is almost certainly correct and the implication is alignment needs more care than humanity is currently giving it".

You may also be better off discussing it further with your friend or the various online communities.

You can also preorder. I'm hopeful that none of the AI labs will destroy the world before the book's release : )

A single principle related to many Alignment subproblems?

TristanTrim · 1mo

Thanks for responding : )

A is amusing, definitely not what I was thinking. B seems like it is probably what I was thinking, but I'm not sure, and don't really understand how having a different metric of simplicity changes things.

While the true laws of physics can be arbitrarily complicated, the behavior of variables humans care about can't be arbitrarily complicated.

I think this is the part that prompted my question. I may be pretty far from understanding what you are trying to say, but my thinking is basically this: I am not content with the capabilities of my current mind, so I would like to improve it. In doing so, I would become capable of having more articulate preferences, and my current preferences would define a function from the set of possible preferences to an approval rating, such that I would be trying to improve my mind in a way that makes my new, more articulate preferences the ones I most approve of or find sufficiently acceptable.

If this process is iterated, it defines some path or cone from my current preferences through the space of possible preferences, moving from less to more articulate. It might be that other people would not seek such a thing, though I suspect many would, but with less conscientiousness about what they are doing. It is also possible there are convergent states where my preferences and capabilities would determine a desire to remain as I am. (I am mildly hopeful that that is the case.)

It is my understanding that the boundary of the Mandelbrot set is not smooth at any scale (not sure if anyone has proven this), and that is the feature I was trying to point to. If people iteratively modified themselves, would their preferences become ever more exacting? If so, then it is true that the "variables humans care about can't be arbitrarily complicated", but the variables humans care about could define a desire to become a system capable of caring about arbitrarily complicated variables.

Yudkowsky's brain is the pinnacle of evolution

TristanTrim · 1mo

I have stumbled on this post a decade later...

As a fan of Yudkowsky's ideas about the technical alignment problem, and with deep respect for the man, I must say I find this post's irreverence very funny and worthwhile. I don't think we should expect perfection from any human, but I like to think that Eliezer and his fans should feel at least a twinge of guilt for creating a zeitgeist that would inspire this post.

Now, if it had been Eliezer, Connor Leahy, and Roman Yampolskiy all on the track, I might have to start forming Fermi estimates about how many people it would take to slow down or derail the trolley, about our lightcone, and about other strange counterfactuals of this moral dilemma.