Hello, last time I taught a class on the basics of Bayesian epistemology. This time I will teach a class that goes a bit further. I will explain what a proper scoring rule is, and we will also do some calibration training. In particular, we will play a calibration training game called two lies, a truth, and a probability. I will do this at 7:30 in the same place as last time. Come by to check it out.
Hello! Please note that I will be giving a class called "The Bayesics" in Eigen Hall at 7:30. Heard of Bayes's theorem but don't fully understand what the fuss is about? Want to have an intuitive as well as formal understanding of what the Bayesian framework is? Want to learn how to do Bayesian updates in your head? Come and learn the Bayesics.
Also, please note that I will be giving a class at 7:30 after the reading group called "The Bayesics" where I will teach you the basics of intuitive Bayesian epistemology and how to do Bayesian updates irl on the fly as a human. All attending the reading group are welcome to join for that as well.
I think you should still write it. I'd be happy to post it instead or bet with you on whether it ends up negative karma if you let me read it first.
AN APOLOGY ON BEHALF OF FOOLS FOR THE DETAIL ORIENTED
Misfits, hooligans, and rabble rousers.
Provocateurs and folk who don’t wear trousers.
These are my allies and my constituents.
Weak in number yet suffused with arcane power.
I would never condone bullying in my administration.
It is true we are at times moved by unkind motivations.
But without us the pearl clutchers, hard asses, and busy bees would overrun you.
You would lose an inch of slack per generation.
Many among us appreciate your precision.
I ad...
Hey, I'm just some guy, but I've been around for a while. I want to give you a piece of feedback that I got way back in 2009 which I am worried no one has given you. In 2009 I found LessWrong, and I really liked it, but I got downvoted a lot and people were like "hey, your comments and posts kinda suck". They said, although not in so many words, that basically I should try reading the sequences closely with some fair amount of reverence or something.
I did that, and it basically worked, in that I think I really did internalize a lot of the values/taste...
I want to note for posterity that I tried to write this reading list somewhat impartially. That is, I have a lot of takes about a lot of this stuff, and I tried to include a lot of material that I disagree with but which I have found helpful in some way or other. I also included things that people I trust have found helpful even if I personally never found them helpful.
I believe there isn't really a deadline! You just buy tickets and then you can come. The limiting factor is that tickets might sell out.
In retrospect I think the above was insufficiently cooperative. Sorry.
To be clear, I did not think we were discussing the AI optimist post. I don't think Nate thought that. I thought we were discussing reasons I changed my mind a fair bit after talking to Quintin.
I meant the reasonable thing other people knew I meant and not the deranged thing you thought I might've meant.
Yeah, I'm totally with you that it definitely isn't actually next-token prediction; it's some totally other goal drawn from the distribution of goals you get when you run SGD to minimize next-token prediction surprise.
...I suppose I'm trying to make a hypothetical AI that would frustrate any sense of "real self" and therefore disprove the claim "all LLMs have a coherent goal that is consistent across characters". In this case, the AI could play the "benevolent sovereign" character or the "paperclip maximizer" character, so if one claimed there was a coherent underlying goal I think the best you could say about it is "it is trying to either be a benevolent sovereign or maximize paperclips". But if your underlying goal can cross such a wide range of behaviors it is practical
So the shoggoth here is the actual process that gets low loss on token prediction. Part of the reason that it is a shoggoth is that it is not the thing that does the talking. Seems like we are on board here.
The shoggoth is not an average over masks. If you want to see the shoggoth, stop looking at the text on the screen and look at the input token sequence and then the logits that the model spits out. That's what I mean by the behavior of the shoggoth.
On the question of whether it's really a mind, I'm not sure how to tell. I know it gets really ...
The shoggoth is supposed to be of a different type than the characters. The shoggoth, for instance, does not speak English; it only knows tokens. There could be a shoggoth character, but it would not be the real shoggoth. The shoggoth is the thing that gets low loss on the task of predicting the next token. The characters are patterns that emerge in the history of that behavior.
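To make "the behavior of the shoggoth" concrete, here is a minimal sketch of the object I'm pointing at: the map from an input token sequence to logits over the next token, before any character-shaped text gets sampled. It assumes the `transformers` and `torch` packages and the public "gpt2" checkpoint purely for illustration; any causal language model would do.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small public causal LM purely as a stand-in.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The shoggoth does not speak English, it only knows"
# The shoggoth's input: token ids, not words.
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # The shoggoth's output: logits of shape (batch, sequence, vocab).
    logits = model(input_ids).logits

# A score for every token in the vocabulary at the final position,
# before anything character-like is sampled from it.
next_token_logits = logits[0, -1]
top = torch.topk(next_token_logits, k=5)
for token_id, score in zip(top.indices.tolist(), top.values.tolist()):
    print(repr(tokenizer.decode([token_id])), round(score, 2))
```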
I agree that to the extent there is a shoggoth, it is very different than the characters it plays, and an attempted shoggoth character would not be "the real shoggoth". But is it even helpful to think of the shoggoth as being an intelligence with goals and values? Some people are thinking in those terms, e.g. Eliezer Yudkowsky saying that "the actual shoggoth has a motivation Z". To what extent is the shoggoth really a mind or an intelligence, rather than being the substrate on which intelligences can emerge? And to get back to the point I was trying to ma...
Yeah, I think this would work if you conditioned on all of the programs you check being exactly equally intelligent. Say you have a hundred superintelligent programs in simulations, one of them is aligned, and they are all equally capable; then the unaligned ones will maybe be slightly slower in coming up with aligned behavior, or might have some other small disadvantage.
However, in the challenge described in the post it's going to be hard to tell a level 999 aligned superintelligence from a level 1000 unaligned superintelligence.
I think the advant...
Quick submission:
The first two prongs of OAI's approach seem to be aiming to get a training signal aligned with human values. Let us suppose that there is such a thing, and ignore the difference between a training signal and a utility function, both of which I think are charitable assumptions for OAI. Even if we could search the space of all models and find one that in simulations does great on maximizing the correct utility function which we found by using ML to amplify human evaluations of behavior, that is no guarantee that the model we find in that search ...
I loved this, but maybe should come with a cw.
I would think the title is itself a content warning.
I guess someone might expect this post to be far more abstract and less detailed about the visceral realities than it actually is (or maybe even to be using the topic only as a metaphor).
What kind of specific content warning do you think would be appropriate? Maybe "Describes the dissection of human bodies in vivid concrete terms."?
I assumed he meant the thing that most activates the face detector, but from skimming some of what people said above, seems like maybe we don't know what that is.
There's a nearby, kind of obvious, but rarely directly addressed generalized version of one of your arguments, which is that ML learns complex functions all the time, so why should human values be any different? I rarely see this discussed, and I thought the replies from Nate and the ELK-related difficulties were important to have out in the open, so thanks a lot for including the face learning <-> human values learning analogy.
For at least about ten years, in my experience, people in this community have been saying the main problem isn't getting the AI to understand human values, it's getting the AI to have human values. Unfortunately, the phrase "learn human values" is sometimes used to mean "have human values" and sometimes used to mean "understand human values", hence the confusion.
Ege Erdil gave an important disanalogy between the problem of recognizing/generating a human face, and the problem of either learning human values, or learning what plans that advance human values are like. The disanalogy is that humans are near-perfect human face recognizers, but we are not near-perfect valuable-world-state or value-advancing-plan recognizers. This means that if we trained an AI to either recognize valuable world-states or value-advancing plans, we would actually end up just training something that recognizes what we can recognize as val...
I also find it somewhat taboo but not so much that I haven’t wondered about it.
Just realized that’s not UAI. Been looking for this source everywhere, thanks.
Ok I understand that although I never did find a proof that they are equivalent in UAI. If you know where it is, please point it out to me.
I still think that Solomonoff induction assigns 0 to uncomputable bit strings, and I don't see why you don't think so.
Like, the outputs of programs that never halt are still computable, right? I thought we were just using a "prints something eventually" oracle, not a halting oracle.
Simple in the description length sense is incompatible with uncomputability. Uncomputability means there is no finite way to point to the function. That’s what I currently think, but I’m confused about you understanding all those words and disagreeing.
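For concreteness, the object I have in mind when I say "Solomonoff induction" is the standard monotone-machine semimeasure (standard definition, notation mine):

$$M(x) \;=\; \sum_{p\,:\,U(p)\ \text{begins with}\ x} 2^{-\ell(p)},$$

where $U$ is a universal monotone machine and the sum ranges over minimal programs $p$ whose output, possibly produced without ever halting, begins with the finite string $x$. That is the sense in which I only need a "prints something eventually" oracle rather than a halting oracle.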
A lot of folks seem to think that general intelligences are algorithmically simple. Paul Christiano seems to think this when he says that the universal distribution is dominated by simple consequentialists.
But the only formalism I know for general intelligences is uncomputable, which is as algorithmically complicated as you can get.
The computable approximations are plausibly simple, but are the tractable approximations simple? The only example I have of a physically realized AGI seems to be very much not algorithmically simple.
Thoughts?
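For reference, by "algorithmically simple" I mean short description length in the standard sense (standard definition, notation mine):

$$K(x) \;=\; \min\{\,\ell(p) \,:\, U(p) = x\,\}$$

for a universal prefix machine $U$. The uncomputable formalism I have in mind is AIXI (with Solomonoff induction as its predictive core), which mixes over all lower-semicomputable semimeasures and is itself incomputable.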
After trying it, I've decided that I am going to charge more like five dollars per step, but yes, thoughts included.
Can we apply for consultation as a team of two? We only want remote consultation on the resources you are offering, because we are not based in the Bay Area.
For anyone who may have the executive function to go for the 1M, I propose myself as a cheap author if I get to play the dungeon master role or the player role, but not if I have to do both. I recommend taking me for the dungeon master role. This sounds genuinely fun to me. I would happily do a dollar per step.
I can also help think about how to scale the operation, but I don’t think I have the executive function, management experience, or slack to pull it off myself.
I am Ronny Fernandez. You can contact me on fb.
I'm setting up a place for writers and organizers to find each other, collaborate, and discuss this; please join the Discord. More details in this comment.
I similarly offer myself as an author, in either the dungeon master or player role. I could possibly get involved in the management or technical side of things, but would likely not be effective in heading a project (for similar reasons to Brangus), and do not have practical experience in machine learning.
I am best reached through direct message or comment reply here on LessWrong, and can provide other contact information if someone wants to work with me.
I came here to say something pretty similar to what Duncan said, but I had a different focus in mind.
It seems like it's easier for organizations to coordinate around PR than it is for them to coordinate around honor. People can have really deep, intractable, or maybe even fundamental and faultless, disagreements about what is honorable, because what is honorable is a function of what normative principles you endorse. It's much easier to resolve disagreements about what counts as good PR. You could probably settle most disagreements about what co...
As a counterpoint, one writer thinks that it's psychologically harder for organizations to think about PR:
...A famous investigative reporter once asked me why my corporate clients were so terrible at defending themselves during controversy. I explained, “It’s not what they do. Companies make and sell stuff. They don’t fight critics for a living. And they dread the very idea of a fight. Critics criticize; it’s their entire purpose for existing; it’s what they do.”
"But the companies have all that money!” he said, exasperated.
"But their critics have you,” I said
It's much easier to resolve disagreements about what counts as good PR.
I mostly disagree. I mean, maybe this applies in comparison to “honor” (not sure), but I don’t think it applies in comparison to “reputation” in many of the relevant senses. A person or company could reasonably wish to maintain a reputation as a maker of solid products that don’t break, or as a reliable fact-checker, or some other such specific standard. And can reasonably resolve internal disagreements about what is and isn’t likely to maintain this reputation.
If it was actually ...
This might be sort of missing the point, but here is an idealized and maybe not very useful not-yet-theory of rationality improvements I just came up with.
There are a few black boxes in the theory. The first takes you and returns your true utility function, whatever that is. Maybe it's just the utility function you endorse, and that's up to you. The other black box is the space of programs that you could be. Maybe it's limited by memory, maybe it's limited by run time, or maybe it's any finite state machine with less than 10^20 states...
I don't think we should be surprised that any reasonable utility function is uncomputable. Consider a set of worlds, one for each Turing machine, that are otherwise identical but each contain a utopia that lasts only as long as that world's Turing machine has not halted. All of these worlds are possible. No computable utility function can assign higher utility to every world whose Turing machine never halts than to every world whose Turing machine halts.
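To make that concrete (a sketch under simplifying assumptions I'm adding: a description of the world $w_M$ is computable from the machine $M$, and the utility gap is bounded away from zero): suppose a computable $U$ satisfied

$$U(w_M) \;=\; \begin{cases} 1 & \text{if } M \text{ never halts,}\\ 0 & \text{if } M \text{ halts.}\end{cases}$$

Then "does $M$ halt?" could be decided by computing $U(w_M)$ and checking whether it exceeds $1/2$, contradicting the undecidability of the halting problem.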
I do think this is an important concept to explain our conception of goal-directedness, but I don't think it can be used as an argument for AI risk, because it proves too much. For example, for many people without technical expertise, the best model they have for a laptop is that it is pursuing some goal (at least, many of my relatives frequently anthropomorphize their laptops).
This definition is supposed to also explain why a mouse has agentic behavior, and I would consider it a failure of the definition if it implied that mice are dangerous. I think a system becomes more dangerous as your best model of that system as an optimizer increases in optimization power.
Here is an idea for a disagreement resolution technique. I think this will work best:
*with one other partner you disagree with.
*when the beliefs you disagree about are clearly about what the world is like.
*when the beliefs you disagree about are mutually exclusive.
*when everybody genuinely wants to figure out what is going on.
Probably doesn't really require all of those though.
The first step is that you both write out your beliefs on a shared work space. This can be a notebook or a whiteboard or anything like that. Then you each write do...
Ping.
Ok, let me give it a try. I am trying to not spend too much time on this, so I prefer to start with a rough draft and see whether there is anything interesting here before I write a massive essay.
You say the following:
Do chakras exist?
In some sense I might be missing the point since the answer to this is basically just "no". Though obviously I still think they form a meaningful category of something, but in my model they form a meaningful category of "mental experiences" and "mental procedures", and definitely not a meaningfu...
If you come up with a test or set of tests that it would be impossible to actually run in practice, but that we could do in principle if money and ethics were no object, I would still be interested in hearing those. After talking to one of my friends who is enthusiastic about chakras for just a little bit, I would not be surprised if we in fact make fairly similar predictions about the results of such tests.
Sometimes I sort of feel like a grumpy old man that read the sequences back in the good old fashioned year of 2010. When I am in that mood I will sometimes look around at how memes spread throughout the community and say things like "this is not the rationality I grew up with". I really do not want to stir things up with this post, but I guess I do want to be empathetic to this part of me and I want to see what others think about the perspective.
One relatively small reason I feel this way is that a lot of really smart rationalists, who are my fr...
I am not one of the Old Guard, but I have an uneasy feeling about something related to the Chakra phenomenon.
It feels like there's a lot of hidden value clustered around woo-y topics like Chakras and Tulpas, and the right orientation towards these topics seems fairly straightforward: if it calls out to you, investigate and, if you please, report. What feels less clear to me is how I, as an individual or as a member of some broader rat community, should respond when, according to me, people do not pass certain forms of bullshit tests.
This comes from someone wi...
Here is an idea I just thought of in an Uber ride for how to narrow down the space of languages it would be reasonable to use for universal induction. To express the K-complexity of an object x relative to a programming language L, I will write K_L(x).
Suppose we have two programming languages. The first is Python. The second is Qython, which is a lot like Python, except that it interprets the string "A" as a program that outputs some particular algorithmically complex, random-looking character string. I claim that intuitively, Pyth...
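For reference, the relevant standard result in the background is the invariance theorem (notation mine): for any two universal languages $P$ and $Q$ there is a constant $c_{P,Q}$, depending only on the pair and not on $x$, such that

$$\big|\,K_P(x) - K_Q(x)\,\big| \;\le\; c_{P,Q} \quad \text{for all } x,$$

since each language can run a fixed-size interpreter for the other. The catch is that the constant can be enormous, which is what the Qython example is meant to make vivid.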
When I started writing this comment I was confused. Then I think I got myself somewhat less confused. I am going to say a bunch of things to explain my confusion and how I tried to get less confused, and then I will ask a couple of questions. This comment got really long, and I may decide that it should be a post instead.
Take a system with 8 possible states. Imagine it is like a simplified Rubik's-cube-type puzzle. (Thinking about mechanical Rubik's cube solvers is how I originally got confused, but using actual Rubik's cubes to explain would make...
Is there a particular formula for negentropy that OP has in mind? I am not seeing how the log of the inverse of the probability of observing an outcome as good or better than the one observed can be interpreted as the negentropy of a system with respect to that preference ordering.
Edit: Actually, I think I figured it out, but I would still be interested in hearing what other people think.
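One reading that seems to work (a sketch, under an assumption I'm adding: a uniform distribution over $N$ microstates and a total preference ordering on them): if a fraction $p$ of the microstates are at least as good as the observed outcome, then

$$\log \frac{1}{p} \;=\; \log N \;-\; \log(pN),$$

i.e., the drop from the entropy of the full state space to the entropy of the "at least this good" region, which is the sense in which it acts like a negentropy relative to the preference ordering.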
Something about your proposed decision problem seems cheaty in a way that the standard Newcomb problem doesn't. I'm not sure exactly what it is, but I will try to articulate it, and maybe you can help me figure it out.
It reminds me of two different decision problems. Actually, the first one isn't really a decision problem.
Omega has decided to give all those who two-box on the standard Newcomb problem 1,000,000 USD, and all those who do not 1,000 USD.
Now that's not really a decision problem, but that's not the issue with using it ...
I had already proved it for two values of H before I contacted Sellke. How easily does this proof generalize to multiple values of H?
I see. I think you could also use PPI to prove Good's theorem though. Presumably the reason it pays to get new evidence is that you should expect to assign more probability to the truth after observing new evidence?
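Spelled out, the property I mean (notation mine; I'm reading "expect to assign more probability to the truth" as an expectation conditional on the hypothesis being true):

$$\mathbb{E}\big[\,P(H\mid E)\,\big|\,H\,\big] \;=\; \frac{\mathbb{E}\big[P(H\mid E)^2\big]}{P(H)} \;\ge\; \frac{\big(\mathbb{E}\big[P(H\mid E)\big]\big)^2}{P(H)} \;=\; P(H),$$

using the tower property, Jensen's inequality, and conservation of expected evidence, $\mathbb{E}\big[P(H\mid E)\big] = P(H)$.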
I honestly could not think of a better way to write it. I had the same problem when my friend first showed me this notation. I thought about using but that seemed more confusing and less standard? I believe this is how they write things in information theory, but those equations usually have logs in them.
I didn't take the time to check whether it did or didn't. If you would walk me through how it does, I would appreciate it.
Luckily, I don't know much about genetics. I totally forgot that; I'll edit the question to reflect it.
To be sure though, did what I mean about the different kinds of cognition come across? I do not actually plan on teaching any genetics.
There is! It is now posted! Sorry about the delay.