we can be confident about why it’s doing this: to get a high RM score
Does this constitute a mesa-optimizer? If so, was creating it intentional or incidental? I was under the impression that those were still basically theoretical.
I think this topic is important and many of your recommendations sound like great ideas, but they also involve a lot of "we should" where it's not clear who "we" is. I would like to see some of these targeted at a specific audience: who actually has the capability to help streamline government procurement processes for AI, and how? What organizations might be well positioned to audit agency needs and bottlenecks? I'm left with the sense that these things would be good in the abstract, but that there's little I personally (or most other readers, unle...
Why on earth would Pokémon be AGI-complete?
There are big classes of problems that provably can't be solved in a forward pass. Sure, for something where it knows the answer instantly the chain of thought could be just for show. But for anything difficult, the models need the chain of thought to get the answer, so the CoT must contain information about their reasoning process. It can be obfuscated, but it's still in there.
I kind of see your point about having all the game wikis, but I think I disagree about learning to code being necessarily interactive. Think about what feedback the compiler provides you: it tells you if you made a mistake, and sometimes what the mistake was. In cases where it runs but doesn't do what you wanted, it might "show" you what the mistake was instead. You can learn programming just fine by reading and writing code but never running it, if you also have somebody knowledgeable checking what you wrote and explaining your mistakes. LLMs have tons of examples of that kind of thing in their training data.
Yeah but we train AIs on coding before we make that comparison. And we know that if you train an AI on a videogame it can often get superhuman performance. Here we're trying to look at pure transfer learning, so I think it would be pretty fair to compare to someone who is generally competent but has never played videogames. Another interesting question is to what extent you can train an AI system on a variety of videogames and then have it take on a new one with no game-specific training. I don't know if anyone has tried that with LLMs yet.
The cornerstone of all control theory is the idea of having a set-point and designing a controller to reduce the deviation between the state and the set-point.
But control theory is used for problems where you need a controller to move the system toward the set-point, i.e. when you do not have instant total control of all degrees of freedom. We use tools like PID tuning, lead-lag, pole placement etc. to work around the dynamics of the system through some limited actuator. In the case of AI alignment, not only do we have a very vague concept of what our set-...
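For concreteness, here's roughly the kind of loop I mean (a toy sketch in Python; the plant model, gains, and actuator limits are all made up for illustration):

```python
# Toy discrete-time PID loop: drive a simple second-order plant toward a
# set-point through a saturating (limited) actuator. All numbers illustrative.

def simulate_pid(setpoint=1.0, kp=2.0, ki=0.5, kd=0.1, dt=0.01, steps=3000):
    state = 0.0        # the quantity we want at the set-point
    velocity = 0.0     # toy plant: actuator output acts like a force, with damping
    integral = 0.0
    prev_error = setpoint - state
    for _ in range(steps):
        error = setpoint - state
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        u = max(-5.0, min(5.0, u))             # limited actuator (saturation)
        velocity += (u - 0.5 * velocity) * dt  # plant dynamics we have to work around
        state += velocity * dt
        prev_error = error
    return state

print(simulate_pid())  # should settle near the set-point
```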
I would think things are headed toward these companies fine-tuning an open-source near-frontier LLM. Cheaper than building one from scratch but with most of the advantages.
Yeah, something along the lines of an Elo-style rating would probably work better for this. You could put lots of hard questions on the test and then instead of just ranking people you compare which questions they missed, etc.
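Something like this, roughly (toy numbers; each question gets its own rating and a correct answer counts as a "win" for the test-taker against that question):

```python
# Toy Elo-style scoring: test-takers and questions both carry ratings.
# Initial ratings and the K-factor are arbitrary illustrations.

def expected_win(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(taker, question, correct, k=32):
    exp = expected_win(taker, question)
    score = 1.0 if correct else 0.0
    taker_new = taker + k * (score - exp)
    question_new = question + k * (exp - score)  # hard questions gain rating when missed
    return taker_new, question_new

taker, question = 1500, 1800                    # a question rated well above the taker
print(update(taker, question, correct=True))    # big gain for getting a hard one right
print(update(taker, question, correct=False))   # small loss for missing it
```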
This works for corn plants because the underlying measurement "amount of protein" is something that we can quantify (in grams or whatever) in addition to comparing two different corn plants to see which one has more protein. IQ tests don't do this in any meaningful sense; think of an IQ test more like the Mohs hardness scale, where you can figure out a new material's position on the scale by comparing it to a few with similar hardness and seeing which are harder and which are softer. If it's harder than all of the previously tested materials, it just goes at the top of the scale.
IQ tests include sub-tests which can be cardinal, with absolute variables. For example, simple & complex reaction time; forwards & backwards digit span; and vocabulary size. (You could also consider tests of factual knowledge.) It would be entirely possible to ask, 'given that reaction time follows a log-normalish distribution in milliseconds and loads on g by r = 0.X and assuming invariance, what would be the predicted lower reaction time of someone Y SDs higher than the mean on g?' Or 'given that backwards digit span is normally distributed...' T...
I wasn't saying it's impossible to engineer a smarter human. I was saying that if you do it successfully, then IQ will not be a useful way to measure their intelligence. IQ denotes where someone's intelligence falls relative to other humans, and if you make something smarter than any human, their IQ will be infinity and you need a new scale.
it’s not even clear what it would mean to be a 300-IQ human
IQ is an ordinal score, not a cardinal one--it's defined by the mean of 100 and standard deviation of 15. So all it means is that this person would be smarter than all but about 1 in 10^40 natural-born humans. It seems likely that the range of intelligence for natural-born humans is limited by basic physiological factors like the space in our heads, the energy available to our brains, and the speed of our neurotransmitters. So a human with IQ 300 is probably about the same as IQ 250 or IQ 1000 or IQ 10,000, i.e. at the upper limit of that range.
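The rough arithmetic, assuming the normal distribution actually held that far out (it almost certainly doesn't):

```python
# IQ 300 is (300 - 100) / 15 ≈ 13.3 standard deviations above the mean.
# Under a strict normal distribution the upper-tail probability is ~7e-41,
# i.e. roughly 1 in 10^40 natural-born humans.
import math

z = (300 - 100) / 15
tail = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of a standard normal
print(z, tail)                            # ~13.33, ~7e-41
```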
I've heard doctors ask questions like this but I don't think they usually get very helpful answers. "My diet's okay I guess, pretty typical, a lot of times I don't sleep great, and yeah I have a pretty stressful job." Great, what do you do with that?
"Food" in general is about the easiest and most natural thing for a dog to identify. Distinguishing illegal drugs from all the other random stuff a person might be carrying (soap, perfume, medicine, etc.) at least requires a lot better training than finding food.
It's interesting that 3.5 Sonnet does not seem to match, let alone beat, GPT-4o on the leaderboard (https://chat.lmsys.org/?leaderboard). Currently it shows GPT-4o with Elo 1287 and Claude 3.5 Sonnet at 1271.
Although it would also be nice to distinguish that from "I read this post already somewhere else"
I would love to have a checkbox or something next to each post to indicate "I saw this and I don't want to click on it"
As a counterpoint, take a look at this article: https://peterattiamd.com/protein-anabolic-responses/
The upshot is that the studies saying your body can only use 45g of protein per meal for muscle synthesis are mostly based on fast-acting whey protein shakes. Stretching out the duration of protein metabolism (by switching protein sources and/or combining it with other foods in a gradually-digested meal) can mitigate the problem quite a bit.
Saturated fats are definitely manageable in small amounts. For most of history, and still in many places today, the biggest concern for an infant was getting sufficient calories, and saturated fat is a great choice for that. When you look at modern hunter-gatherer diets, they contain animal products, but in most cases they do not make up the majority of calories (exceptions usually involve lots of seafood), the meats are wild and therefore fairly lean, and BMI stays generally quite low. Under those conditions, heart disease risk is small and whether it is ...
Real can of worms that deserves its own post I would think
I think in this case just spacing them out would help more.
Downvoted because I waded through all those rhetorical shenanigans and I still don't understand why you didn't just say what you mean.
This comment had apparently been deleted by the commenter (the comment display box having a "deleted because it was a little rude, sorry" deletion note in lieu of the comment itself), but the ⋮-menu in the upper-right gave me the option to undelete it, which I did because I don't think my critics are obligated to be polite to me. (I'm surprised that post authors have that power!) I'm sorry you didn't like the post.
Separate clocks would be a pain to manage in a board game, but in principle "the game ends once 50% of players have run out of time" seems like a decent condition.
Oh, good point, I had forgotten about the zero-sum victory points. The extent to which the other parts are zero sum depends a lot on how large the game board is relative to the number of players, so it could be adjusted. I was thinking about having a time limit instead of a round limit, to encourage the play to move quickly, but maybe that's too stressful. If you want the players to choose to end the game, then you'd want to build in a mechanic that works against all of them more and more as the game progresses, so that at some point continuing becomes counterproductive...
Would a good solution be to just play Settlers, but instead of saying "the goal is to get more points than anyone else," say "this is a variant where the goal is to get the highest score you can, individually"? That seems like it would change the negotiation dynamics in a potentially interesting way without having to make or teach a brand new game. Does this miss the point somehow?
So, then it seems like the client's best move in this scenario is to lie to you strategically, or at least omit information strategically. They could say "I know for sure you won't find any fingerprints or identifiable face in the camera footage" and "I think my friends will confirm that I was playing video games with them", and as long as they don't actually tell you that's a lie, you can put those friends on the stand, right?
You say that lying to you can only hurt them but "There is a kernel of an exception that is almost not worth mentioning" because it is rarely relevant. I find this pretty hard to believe. If your client tells you "yeah I totally robbed that store, but I was wearing a ski mask and gloves so I think a jury will have reasonable doubt assuming my friends say I was playing video games with them the whole time", would you be on board with that plan? There must be plenty of cases where the cops basically know who did it but have trouble proving it. Maybe those just don't get to the point of a public defender getting assigned?
That's like saying that because we live in a capitalist society, the default plan is to destroy every bit of the environment and fill every inch of the world with high-rise housing projects. It's... true in some sense, but only as a hypothetical extreme, a sort of economic spherical cow. In reality, people and societies are more complicated and less single-minded than that, and also people just mostly don't want that kind of wholesale destruction.
I didn't think the implication was necessarily that they planned to disassemble every solar system and turn it into probe factories. It's more like... seeing a vast empty desert and deciding to build cities in it. A huge universe, barren of life except for one tiny solar system, seems not depressing exactly but wasteful. I love nature and I would never want all the Earth's wilderness to be paved over. But at the same time I think a lot of the best the world has to offer is people, and if we kept 99.9% of it as a nature preserve then almost nobody would be around to see it. You'd rather watch the unlifted stars, but to do that you have to exist.
I don't think governments have yet committed to trying to train their own state-of-the-art foundation models for military purposes, probably partly because they (sensibly) guess that they would not be able to keep up with the private sector. That means that government interest/involvement has relatively little effect on the pace of advancement of the bleeding edge.
Fair point, but I can't think of a way to make an enforceable rule to that effect. And even if you could make that rule, a rogue AI would have no problem with breaking it.
I think if you could demonstrably "solve alignment" for any architecture, you'd have a decent chance of convincing people to build it as fast as possible, in lieu of other avenues they had been pursuing.
Since our info doesn't seem to be here already: We meet on Sundays at 7pm, alternating between virtual and in-person in the lobby of the UMBC Performing Arts and Humanities Building. For more info, you can join our Google group (message the author of this post, bookinchwrm).
I found this post interesting, mostly because it illustrates deep flaws in the US tax system that we should really fix. I downvoted it because I think it is a terrible strategy for giving more money to charity. Many other good objections have been raised in the comments, and the post itself admits that lack of effectiveness is a serious problem. One problem I did not see addressed anywhere is reputational risk. The world is not static, and a technique that works for an individual criminal or a few conscientious objectors probably will not work consistently...
I always thought it would be great to have one set of professors do the teaching, and then a different set come in from other schools just for a couple weeks at the end of the year to give the students a set of intensive written and oral exams that determines a big chunk of their academic standing.
Here's a market; not sure how to define "linchpin", but we can at least predict whether he'll be part of it.
https://manifold.markets/ErickBall/will-the-first-agi-be-built-by-sam?r=RXJpY2tCYWxs
I can now get real-time transcripts of my Zoom meetings (via a Python wrapper of the OpenAI API) which makes it much easier to track the important parts of a long conversation. I tend to zone out sometimes and miss little pieces otherwise, as well as forget stuff.
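Roughly what the setup looks like (a minimal sketch, assuming something else is already saving the meeting audio as short chunk files; the paths and chunking scheme are placeholders, not the actual script I use):

```python
# Minimal sketch: transcribe short audio chunks as they appear and append to a
# running transcript. Assumes a recorder is writing chunk_000.wav, chunk_001.wav,
# ... into ./chunks as the meeting goes on, and OPENAI_API_KEY is set.
import glob
import time
from openai import OpenAI

client = OpenAI()
seen = set()

while True:
    for path in sorted(glob.glob("chunks/chunk_*.wav")):
        if path in seen:
            continue
        with open(path, "rb") as f:
            result = client.audio.transcriptions.create(model="whisper-1", file=f)
        print(result.text)                      # near-real-time transcript line
        with open("transcript.txt", "a") as out:
            out.write(result.text + "\n")
        seen.add(path)
    time.sleep(5)
```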
That's fair, most of them were probably never great teachers.
You are attributing a lot more deviousness and strategic boldness to the so-called deep state than the US government is organizationally capable of. The CIA may have tried a few things like this in banana republics but there's just no way anybody could pull it off domestically.
Professors being selected for research is part of it. Another part is the tenure you mentioned - some professors feel like once they have tenure they don't need to pay attention to how well they teach. But I think a big factor is another one you already mentioned: salaries. $150k might sound like a lot to a student, but to the kind of person who can become a math or econ professor at a top research university this is... not tiny but not close to optimal. They are not doing it for the money. They are bought into a culture where the goal is building status ...
But that sort of singularity seems unlikely to preserve something as delicately balanced as the way that (relatively well-off) humans get a sense of meaning and purpose from the scarcity of desirable things.
I think our world actually has a great track record of creating artificial scarcity for the sake of creating meaning (in terms of enjoyment, striving to achieve a goal, sense of accomplishment). Maybe "purpose" in the most profound sense is tough to do artificially, but I'm not sure that's something most people feel a whole lot of anyway?
I'm pretty opti...
Excellent, I think I will give something like that a try
I know this is an old thread but I think it's interesting to revisit this comment in light of what happened at Twitter. Musk did, in fact, fire a whole lot of people. And he did, in fact, unban a lot of conservatives without much obvious delay or resistance within the company. I'm not sure how much of an implication that has about your views of the justice department, though. Notably, it was pretty obvious that the decisions at Twitter were being made at the top, and that the people farther down in the org chart had to implement those decisions or be fired...
Thanks! I'd love to hear any details you can think of about what you actually do on a daily basis to maintain mental health (when it's already fairly stable). Personally I don't really have a system for this, and I've been lucky that my bad times are usually not that bad in the scheme of things, and they go away eventually.
I'm not sure how I would work it out. The problem is that presumably you don't value one group more because they chose blue (it's because they're more altruistic in general) or because they chose red (it's because they're better at game theory or something). The choice is just an indicator of how much value you would put on them if you knew more about them. Since you already know a lot about the distribution of types of people in the world and how much you like them, the Bayesian update doesn't really apply in the same way. It only works on what pill they'...
Doesn't "trembling hand" mean it's a stable equilibrium even if there are?
I mean definitely most people will not use a decision procedure like this one, so a smaller update seems very reasonable. But I suspect this reasoning still has something in common with the source of the intuition a lot of people have for blue, that they don't want to contribute to anybody else dying.
Sure, if you don't mind the blue-choosers dying then use the stable NE.
People are all over the place but definitely not 50/50. The qualitative solution I have will hold no matter how weak the correlation with other people's choices (for large enough values of N).
If you make the very weak assumption that some nonzero number of participants will choose blue (and you prefer to keep them alive), then this problem becomes much more like a prisoner's dilemma where the maximum payoff can be reached by coordinating to avoid the Nash equilibrium.
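A toy version of what I mean, assuming the usual formulation (red-choosers always survive; blue-choosers survive only if blue reaches at least 50%, with the exact threshold varying by version) and some fixed number of people who pick blue regardless:

```python
# Toy model: N participants. Red always survives; blue survives only if the
# blue fraction reaches 50%. The all-red Nash equilibrium leaves the
# unconditional blue-choosers dead; coordinating on blue saves everyone.

def deaths(n_total, n_blue):
    # blue-choosers die iff they are under half the population
    return 0 if 2 * n_blue >= n_total else n_blue

N = 1000
stubborn_blue = 50                   # nonzero number who choose blue no matter what

print(deaths(N, stubborn_blue))      # everyone else picks red: 50 deaths
print(deaths(N, N))                  # everyone coordinates on blue: 0 deaths
```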
Fair enough, I guess the distinction is more specific than just being a (weak) mesa-optimizer. This model seems to contradict https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target because it has, in fact, developed reward as the optimization target without ever being instructed to maximize reward. It just had reward-maximizing behaviors reinforced by the training process, and instead of (or in addition to) becoming an adaptation executor it became an explicit reward optimizer. This type of generalization is surprising and ...