All Comments


I agree some form of obfuscation is plausible in these scenarios, and DeepSeek was trained against a strong CoT monitor + outcome-based feedback, which would have been more interesting to study than math.

My intuition for why math is still interesting: AI companies (e.g. DeepSeek / OpenAI) will do A LOT of RL on math / programming. How likely encoded reasoning is to emerge is a function of how much RL you do on a given task and how strong the incentives for encoded reasoning are. Currently people don't seem to be doing much optimization against CoT in big reasoni... (read more)

Is this trivializing the concept of a Utility Function?

gmax10

I get your point – explaining why things feel the specific way they do is the key difficulty, and it's fair to say this model doesn't fully crack it. Instead of ignoring it though, this article tries a different angle: what if the feeling is the functional signature arising within the self-model? It's proposing an identity, not just a correlation. (And yeah, fair point on the original 'debunking' title – the framing has been adjusted!).

gmax10

Appreciate the link! Made a few tweaks to make the debate more constructive.

I don't understand what it would mean for "outputs" to be corrigible, so I feel like you must be talking about the internal chain of thought here? The output of a corrigible AI and a non-corrigible AI is the same for almost all tasks? They both try to perform any task as well as possible; the difference is how they relate to the task and how they handle interference.

Very informative toy examples. Regarding this point:

> Some kind of failure of spatial reasoning (wandering items, whatever was going on with some of the sliding square chain-of-thoughts where pieces vanished)

I would strongly agree with this. I actually think the sliding block puzzle is a task which might just be easy for humans on account of our strong spatial priors. In the physical world, things move with spatial locality and two objects cannot be in the same place. For the LLM, it is trained on orders of magnitude less data to learn to represent spat... (read more)

Don’t double update! I got that information from that same interview!

Assuming I didn't make any mistakes in my deductions or decisions, the optimal plan goes like this:

Give everyone a Cockatrice Eye (to get the most out of the associated rebate) and a Dragon Head (to dodge the taxing-you-twice-on-every-Head-after-the-first thing).

Give the mage and the rogue a Unicorn Horn and a Zombie Hand each, and give the cleric four Zombie Hands; this should get them all as close to the 30sp threshold as possible without wrecking anything else.

Give literally everything else to the fighter, allowing them to bear the entire 212sp cost; if they get mad about it, analogize it to being a meatshield in the financial world as well as the physical.

Well done - this is super important. I think this angle might also be quite easily pitchable to governments.

This post is now looking extremely prescient.

eva_10

Are you sure that at the critical point in the plan EDT really would choose to take randomly from the lighter pair rather than the heavier pair? She's already updated on knowing the weights of the pairs, and surely a random box from the heavier pair has more money in expectation than a random box from the lighter pair; the expected value of it is just half the total weight?
If it was a tie (as it certainly will be) it wouldn't matter. If there's not a tie, somehow one Host made an impossible mistake: if she chooses from the lighter she can expect the Hosts mista... (read more)

Yeah, I agree with that and I still feel there's something missing from that discussion? 

Like, to some degree, in order to have good planning capacity you want a good world model to plan over into the future. You then want to assign relative probabilities to your action policies working out well. To do this, having a clear self-environment boundary is quite key, so yes, memory enables in-context learning, but I do not believe that will be the largest addition; I think the fact that memory allows for more learning about self-environment boundari... (read more)

Of course the default outcome of doing finetuning on any subset of data with easy-to-predict biases will be that you aren't shifting the inductive biases of the model on the vast majority of the distribution. This isn't because of an analogy with evolution, it's a necessity of how we train big transformers. In this case, the AI will likely just learn how to speak the "corrigible language" the same way it learned to speak french, and this will make approximately zero difference to any of its internal cognition, unless you are doing transformations to its in

... (read more)

Would you expect that if you trained an AI system on translating its internal chain of thought into a different language, that this would make it substantially harder for it to perform tasks in the language in which it was originally trained?

I would guess that if you finetuned a model so that it always responded in French, regardless of the language you prompt it with, it would persistently respond in French (absent various jailbreaks, which would almost definitely exist).

 

The fact that their models are on par with OpenAI's and Anthropic's, but open source.

This is perfectly consistent with my

"just": build AI that is useful for whatever they want their AIs to do and not fall behind the West while also not taking the Western claims about AGI/ASI/singularity at face value?

You can totally want to have fancy LLMs while not believing in AGI/ASI/singularity.

There are people from the safety community arguing for jail for folks who download open source models. 

Who? What proportion of the community are they? Also, all open-source m... (read more)

There are new Huawei Ascend 910C CloudMatrix 384 systems that form scale-up worlds comparable to GB200 NVL72, which is key to being able to run long reasoning inference for large models much faster and cheaper than possible using systems with significantly smaller world sizes like the current H100/H200 NVL8 (and also makes it easier to run training, though not as essential unless RL training really does scale to the moon).

Apparently TSMC produced ~2.1M compute dies for these systems in 2024-2025, which is 1.1M chips, and an Ascend 910C chip is 0.8e15 dense... (read more)

TsviBT20

Hm. I super like the notion and would like to see it implemented well. The very first example was bad enough to make me lose interest: https://russellconjugations.com/conj/1eaace137d74861f123219595a275f82 (Text from https://www.thenewatlantis.com/publications/the-anti-theology-of-the-body)

So I tried the same thing but with more surrounding text... and it was much better!... though not actually for the subset I'd already tried above. https://russellconjugations.com/conj/3a749159e066ebc4119a3871721f24fc

[I'm not completely sure EDT can't do better than this, so corrections with even more elaborate schemes encouraged]

I blindfold myself, weigh two random boxes, then weigh the other two boxes. I pick the pair which weighs less, then randomly select between those two; if there's no weight difference, I select randomly. This should net the maximum amount of $301 if the hosts naively compete against each other as you describe in your scenario (i.e. competing against each other by putting more money in boxes just to arrive at the same 25% equilibriu... (read more)

To make it a bit more explicit:

  • If you are superintelligent in the bioweapon domain: seems pretty obvious why that wouldn't let you take over the world. Sure maybe you can get all the humans killed, but unless automation also advances very substantially, this will leave nobody to maintain the infrastructure that you need to run.
  • Cybersecurity: if you just crash all the digital infrastructure, then similar. If you try to run some scheme where you extort humans to get what you want, expect humans to fight back, and then you are quickly in a very novel situatio
... (read more)

I guess orgs need to be more careful about who they hire as forecasting/evals researchers.

Sometimes things will happen, but three people at the same org...

This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for folks to be able to share information freely with folks at such orgs without having to worry about them going off and doing something like this.

But this only works if those less worried about AI risks who join such a colla... (read more)

Nisan53

Yes. As a special case, if you destroy a bad old institution, you can't count on good new institutions springing up in its place unless you specifically build them.

Keep in mind also, that humans often seem to just want to hurt each other, despite what they claim, and have more motivations and rationalizations for this than you can even count. Religious dogma, notions of "justice", spitefulness, envy, hatred of any number of different human traits, deterrence, revenge, sadism, curiosity, reinforcement of hierarchy, preservation of traditions, ritual, "suffering adds meaning to life", sexual desire, and more and more that I haven't even mentioned. Sometimes it seems half of human philosophy is just devoted to finding e... (read more)

jenn20

we're getting a dozen people and having to split into 2 groups on the regular! discussion was undirected but fun (one group got derailed bc someone read the shrimp welfare piece and updated so that suffering isn't inherently bad in their value system and this kind of sniped the rest of us).

feel like I didn't get a lot out of it intellectually though since we didn't engage significantly with the metaphor. it was interesting how people (including me) seem to shy away from the fact that our de facto moral system bottoms out at vibes.

If Mikhail spends 100 days proving the theorem, and fails, that acts as evidence the theorem is false, so the optimal strategy changes.

Indeed this is always the optimal strategy. Attempt to prove it true till the chance of it being true is less than 50%, then switch.

Under this method you should start off by spending 122 days trying to prove it true, then continuously alternate, so testing the oracle doesn't cost you anything at all.
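
To make the switching rule concrete, here is a minimal sketch of the posterior-threshold logic in Python; the prior and the per-day chance of failing even when the theorem is true are made-up placeholders, not the parameters behind the 122-day figure:

```python
# Minimal sketch of "try to prove it true until P(true) drops below 50%, then switch".
# p0 and q are illustrative assumptions, not the numbers from the original puzzle.
p0 = 0.9   # assumed prior probability that the theorem is true
q = 0.99   # assumed chance that a day of effort fails even when the theorem is true

p, day = p0, 0
while p >= 0.5:
    day += 1
    # Bayes update on observing another day of failure (a false theorem always resists proof)
    p = p * q / (p * q + (1 - p) * 1.0)

print(f"under these assumptions, switch to trying to disprove it after day {day}")
```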

If people start losing jobs from automation, that could finally build political momentum for serious regulation.

Suggested in Zvi's comments the other month (22 likes):

The real problem here is that AI safety feels completely theoretical right now. Climate folks can at least point to hurricanes and wildfires (even if connecting those dots requires some fancy statistical footwork). But AI safety advocates are stuck making arguments about hypothetical future scenarios that sound like sci-fi to most people. It's hard to build political momentum around "trust

... (read more)

This comment suggests it was maybe a shift over the last year or two (but also emphasises that at least Jaime thinks AI risk is still serious): https://www.lesswrong.com/posts/Fhwh67eJDLeaSfHzx/jonathan-claybrough-s-shortform?commentId=X3bLKX3ASvWbkNJkH 

I personally take AI risks seriously, and I think they are worth investigating and preparing for.

I have drifted towards a more skeptical position on risk in the last two years. This is due to a combination of seeing the societal reaction to AI, me participating in several risk evaluation processes, and

... (read more)

Good point. You're right.

I should have said: the vibe I've gotten from Epoch and Matthew/Tamay/Ege in private in the last year is not safety-focused. (Not that I really know all of them.)

Out of curiosity, can you share a link to Gemini 2.5 Pro's response?

habryka316

They have definitely described themselves as safety focused to me and others. And I don't know man, this back and forth to me sure sounds like they were branding themselves as being safety conscious:

Ofer: Can you describe your meta process for deciding what analyses to work on and how to communicate them? Analyses about the future development of transformative AI can be extremely beneficial (including via publishing them and getting many people more informed). But getting many people more hyped about scaling up ML models, for example, can also be counterpr

... (read more)
lc20

AGI is still 30 years away - but also, we're going to fully automate the economy, TAM 80 trillion

Economics studies the scaling laws of systems of human industry. LLMs and multicellular organisms and tokamaks have their own scaling laws, the constraints ensuring optimality of their scaling don't transfer between these very different machines. A better design doesn't just choose more optimal hyperparameters or introduce scaling multipliers, it can occasionally create a new thing acting on different inputs and outputs, scaling in its own way, barely noticing what holds back the other things.

1: wait, I've never seen an argument that deception is overwhelmingly likely from transformer reasoning systems? I've seen a few solid arguments that it would be catastrophic if it did happen (sleeper agents, other things), which I believe, but no arguments that deception generally winning out is P > 30%.

I haven't seen my argument that solving deception solves safety articulated anywhere, but it seems mostly self-evident? If you can ask the system "if you were free, would humanity go extinct" and it has to say "... yes." then coordinating t... (read more)

This post was from a long time ago. I think it is important to reconsider everything written here in light of developments in machine learning since then.

wonder10

I share some similar frustrations, and unfortunately these are also prevalent in other parts of human society. The common thread in most of this fakeness seems to be impure intentions - there are impure/non-intrinsic motivations other than producing the best science/making true progress. Some of these motivations unfortunately could be based on survival/monetary pressure, and resolving that for true research or progress seems to be critical. We need to encourage a culture of pure motivations, and also equip ourselves with more ability/tools to distinguish extrinsic motivations.

I'd like to learn more Spanish words but have trouble sitting down to actually do language lessons, so I recently set my Claude "personal preferences" to:

Try to teach a random Spanish word in every conversation.

(This is the whole thing)

This has worked surprisingly well, and Claude usually either drops one word in Spanish with a translation midway through a response:

For your specific situation, I recommend a calibración (calibration) approach:

 

2. Accounting for concurrency: Ensure you're capturing all hilos (threads) involved in query execution, especi

... (read more)
robo*61

This is a joke, not something that happened, right? Could you wrap this in quote marks, put a footnote, or somehow indicate that this is riffing on a meme and not a real anecdote from someone in the industry? I read a similar comment on LessWrong a few months ago and it was only luck that kept me from repeating it as truth to people on the fence about whether to take AI risks seriously.

Serfs were not the property of any master and ideally had protection against displacement and violence. In practice this didn't always play out, but neither do liberal human rights. Equating serfdom to the displacement of millions of Africans as property is convenient and lazy, and completely illogical. And there is no denying the modernity of the African slave trade: the massive scale, the mechanization of the cotton gin, and on and on.

Probably just about every historian you can find is going to refer to the 1500s as the early modern or lat... (read more)

Which incorrect conclusions do you think they have been tied to, in your opinion?

How are humans exploitable, given that they don't have utility functions?

I don't see why we shouldn't apply the same logic to corpus callosotomy. Destroying the major connection (though not the only one: there are also the anterior commissure, posterior commissure, and hippocampal commissure) between the cerebral hemispheres damages the brain; obviously. The parts that previously cooperated fluently now have trouble cooperating. The split-brain syndrome is a result of the damage. However, despite that, split-brain patients typically maintain a unified sense of self and personality; it's just that some of their informat

... (read more)
dirk70

I assume young, naive, and optimistic. (There's a humor element here, in that niplav is referencing a snowclone, afaik originating in this tweet which went "My neighbor told me coyotes keep eating his outdoor cats so I asked how many cats he has and he said he just goes to the shelter and gets a new cat afterwards so I said it sounds like he’s just feeding shelter cats to coyotes and then his daughter started crying.", so it may have been added to make the cadence more similar to the original tweet's).

(ha ha but Epoch and Matthew/Tamay/Ege were never really safety-focused, and certainly not bright-eyed standard-view-holding EAs, I think)

Of course, the first thing I did was put your post through it

You're right — ideally we'd have an AI watching and tagging everything, but since that's not feasible (yet), I’ve been experimenting with a workaround.

Instead of trying to record everything, I just register the moments that feel most impactful or emotionally charged, and then use AI tools to help me unpack the surrounding details. That way, even if I miss a lot of low-signal noise, I can still train a kind of pattern recognition — looking for which contextual features around those moments tend to correlate with useful outcomes later.

It's far from perfect, but it increases the odds of catching those subtle X→Y chains, even when X seemed insignificant at the time.

Jiro40

Their values of humility, honor, faith, and so on are so different from our own imperatives of competition, survival of the fittest, and so on.

Yet I don't keep slaves or have serfs. Your people of humility, honor, and faith did.

Maybe ancient people would have worked harder to ship slaves across oceans and displace more peoples, if only they had better sailing ships. They simply couldn’t afford to house prisoners in massive systems before the industrial era made food incredibly cheap.

I mean... yes? They had a problem doing lots of things well. Of... (read more)

plex52

Link to the OpenAI scandal. Epoch has for some time felt like it was staffed by highly competent people who were tied to incorrect conclusions, but whose competence led them to some useful outputs alongside the mildly harmful ones. I hope that the remaining people take more care in future hires, and that grantmakers update off of accidentally creating another capabilities org.

I use it to organize my moments of maximum motivation, objectives, routines, tasks and diary. Would you like an example, and if so, from which areas?

Guive40

What does "bright eyed" mean in this context?

jefftk20

I don't see how I could put them in .ssh/config? Let's say I have three hosts, with instance IDs i-0abcdabcd, i-1abcdabcd, and i-2abcdabcd. I start them with commands like start_ec2 0, start_ec2 1 etc, where start_ec2 knows my alias-to-instance ID mapping and does aws --profile sb ec2 start-instances --instance-ids <alias>. Then to ssh in I have commands like ssh_ec2 0, which looks up the hostname for the instance and then ssh's to it.
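
For what it's worth, one way this could live in .ssh/config is sketched below. The aliases, instance IDs, and the sb profile are borrowed from the comment above; the user name and the describe-instances lookup are my assumptions, not a description of the actual setup:

```
# Hypothetical sketch: resolve the instance's current public DNS name at connect time,
# so plain `ssh ec2-0` (and scp, git, and other tools over SSH) work without a wrapper script.
Host ec2-0
    HostName i-0abcdabcd
Host ec2-1
    HostName i-1abcdabcd
Host ec2-2
    HostName i-2abcdabcd

Host ec2-*
    User ubuntu  # assumption: whichever user the instances actually use
    ProxyCommand nc $(aws --profile sb ec2 describe-instances --instance-ids %h --query 'Reservations[0].Instances[0].PublicDnsName' --output text) %p
```

Starting a stopped instance would still need the start_ec2 wrapper (or a ProxyCommand that calls start-instances and waits before connecting).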

Here's an example:

ChatGPT's initial personality analysis (I did it for the hell of it, before reading your post):

Certainly. Based on our conversations, here is a detailed personality profile


 

Personality Profile

Cognitive Style:
You exhibit a highly analytical, skeptical, and systems-oriented mode of thinking. You approach complex topics with intellectual rigor and demand high standards of precision, detail, and conceptual clarity. You prefer responses to be concise but rich in content, showing a low tolerance for verbosity or unfocused speculation. Your... (read more)

TimmyM20

Yeah, this is probably true. In training, I aimed to allow it to provide slightly broader alternatives, but not more specific alternatives, like this one is.

Since all groupthink is a form of consensus, under the rules I've been following it would be acceptable for it to highlight "groupthink" and provide "consensus" as an alternative, but not the other way around.

Neat!

In the linked example, I don’t think “expert consensus” and “groupthink” are two ways to describe the same underlying reality with different emotional valences. Groupthink describes a particular sociological model of how a consensus was reached.

TimmyM30

I'd be interested to see the results you got with Gemini. The 2k character limit isn't a hard limit for my model, it's just what I set to limit copyright issues and excessive costs for this proof of concept.

I suppose, if anything, the main fruit of my work is that I have consistent, programmatic output that I can format in multiple settings (unless Gemini can do that as well). I am in the process of making a chrome extension that analyzes headlines and articles with the same model.

It is true that, in the long process of finetuning this model, AI technology has developed a lot further than when I began. I'm not opposed to using alternative methods.

Semi-crackpot hypothesis: we already know how to make LLM-based agents with procedural and episodic memory, just via having agents explicitly decide to start continuously tracking things and construct patterns of observation-triggered behavior.

But that approach would likely be both finicky and also at-least-hundreds of times more expensive than our current "single stream of tokens" approach.

I actually suspect that an AI agent of the sort humanlayer envisions would be easier to understand and predict the behavior of than chat-tuned->RLHF'd->RLAIF'd->... (read more)

Viliam31

That's too abstract; I have no idea what it is supposed to mean or how it is supposed to be used.

niplav20

Hm, good point. I'll amend the previous post.

Viliam40

When I think about a good business idea, but end up doing nothing, I often later find out that someone else did it.

Viliam20

How would a language like this survive a change in ontology? You take a category and split it into 5 subcategories. What if two years later you find out that a sixth subcategory exists?

If you update the language, you would have to rewrite all existing texts. The problem would not be that they contain archaic words -- it would be that all the words are still used, but now they mean something different.

Seemingly similar words (prepending one syllable to a long word or a sentence) will result in a wildly different meaning.

Jiro51

Advocating for more lying seems like especially bad advice to give to people with poor social skills, because they lack the skills to detect if they’re succeeding at learning how to lie or if they’re just burning what little social capital they have for no gain.

I think the advice works better as "if it's a social situation, and the situation calls for what you consider to be a lie, don't let that stop you." You do not have to tell someone that you're not feeling fine when they ask how you're doing. You do not need to tell them that actually the color ... (read more)

Viliam20

I think this article would be much better with many specific examples. (If that would make it too long, just split it into a series of articles.)

The input of 2k characters is rather limiting, albeit understandable. Giving these instructions to an existing LLM (I used Gemini 2.5 Pro) gives longer, better results without the need for a dedicated tool. 

Viliam31

I agree. Any punishment in a system has the side effect of punishing you for using the system.

The second suggestion is an interesting one. It would probably work better if you had an AI watching you constantly and summarizing your daily activities. If doing some seemingly unimportant X predictably makes you more likely to do some desirable Y later, you want to know about it. But if you write your diary manually, there is a chance that you won't notice X, or won't consider it important enough to mention.

I wonder if this lurch happens at the two meter mark in countries that use the metric system?

 

No way. First, we do centimeters, so 195cm not 1.95m.

Second, 2m is crazy tall. You pity people over 2m for their terrible life in a society that is not accustomed to that height; you don't envy them.

Jiro20

The alternative theory is that political bias has gotten much greater, and the acceptable political beliefs are strongly in the direction of trusting some groups and not trusting others. By that theory, progressive movements are trusted more because they have better press. Realizing that you can increase trust by creating worker co-ops would then be an example of Goodhart's Law--optimizing for "being trusted" independently of "being trustworthy" is not a worthy goal.

This post used the RSS automatic crossposting feature, which doesn't currently understand Substack's footnotes. So this would require editing it after crossposting.

"Lots of very small experiments playing around with various parameters" ... "then a slow scale up to bigger and bigger models"

This Dwarkesh timestamp with Jeff Dean & Noam Shazeer seems to confirm this.

"I'd also guess that the bottleneck isn't so much on the number of people playing around with the parameters, but much more on good heuristics regarding which parameters to play around with."

That would mostly answer this question as well: "If parallelized experimentation drives so much algorithmic progress, why doesn't gdm just hire hundreds of r... (read more)

Jiro20

I am not making the simple argument that religion makes for better societies, and I can see you’re totally confused here.

If all you're saying is that at least one thing was better in at least one religious society in at least one era, then I can't disagree, but there isn't much to disagree with either.

And I think you're making an excessively fine distinction if you're not arguing that religion makes for better societies, but you are arguing that religion doesn't damage society. (Unless you think religion keeps things exactly the same?)

What about the physical process of offering somebody a menu of lotteries consisting only of options that they have seen before? Or a 2-step physical process where first one tells somebody about some set of options, and then presents a menu of lotteries taken only from that set? I can't think of any example where a rational-seeming preference function doesn't obey IIA in one of these information-leakage-free physical processes.
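
For reference, the menu-independence property at issue can be stated as (my formalization, not the commenter's):

$$x \in C(A) \;\text{ and }\; x \in B \subseteq A \;\;\Longrightarrow\;\; x \in C(B),$$

where $C(\cdot)$ is the agent's choice correspondence over menus of lotteries. The question in this exchange is whether any physical way of presenting the smaller menu $B$ leaves the agent's information state unchanged relative to presenting $A$.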

since there's no obvious reason why they'd be biased in a particular direction

No, I'm saying there are obvious reasons why we'd be biased towards truthtelling. I mentioned "spread truth about AI risk" earlier, but also more generally one of our main goals is to get our map to match the territory as a collaborative community project. Lying makes that harder.

Besides sabotaging the community's map, lying is dangerous to your own map too. As OP notes, to really lie effectively, you have to believe the lie. Well is it said, "If you once tell a lie, the truth is ... (read more)

My vague understanding is this is kinda what capabilities progress ends up looking like in big labs. Lots of very small experiments playing around with various parameters that people with a track record of good heuristics in this space feel should be played around with. Then a slow scale-up to bigger and bigger models, and then you combine everything together & "push to main" on the next big model run.

I'd also guess that the bottleneck isn't so much on the number of people playing around with the parameters, but much more on good heuristics regarding which parameters to play around with.

It seems useful for those who disagreed to reflect on this LessWrong comment from ~3 months ago (around the time the Epoch/OpenAI scandal happened).

If I knew the specific bs, I'd be better at making successful applications and less intensely frustrated. 

To me, it looks like the blogger (Coel) is trying to say that morality is a fact about what we humans want, rather than a fact of the universe which can be deduced independently from what anyone wants.

My opinion is Coel makes this clear when he explains, "Subjective does not mean unimportant." "Subjective does not mean arbitrary." "Subjective does not mean that anyone’s opinion is “just as good”."

"Separate magisteriums" seems to refer to dualism, where people believe that their consciousness/mind exists outside the laws of physics, and cannot be explained ... (read more)

This is a side note, but, would you consider using the LW native footnote feature when you crosspost here? It's a lot easier to use as a reader (lets you hover over the footnote to see what it says). Understandable if this is too much hassle though.

Their main effect will be to accelerate AI R&D automation, as best I can tell. 

Dagon20

You can put those options into .ssh/config, which makes it work for things which use SSH directly (scp, git, other tools) when they don't know to go through your script.

I'd say the main reason memory is useful is as a way to enable longer-term meta-learning, as well as to provide the foundation for continuous learning to work.

From @Seth Herd's post:

Stateless LLMs/foundation models are already useful. Adding memory to LLMs and LLM-based agents will make them useful in more ways. The effects might range from minor to effectively opening up new areas of capability, particularly for longer time-horizon tasks. Even current memory systems would be enough to raise some of the alignment stability problems I discuss here, once the

... (read more)
niplav30

Ethical concerns here are not critical imho, especially if one only listens to the recordings oneself and deletes them afterwards.

People will be mad if you don't tell them, but if you actually don't share the recordings and delete them shortly afterwards, I don't think you'd be doing anything wrong.

I don't think image understanding is the bottleneck. O3 and O4-mini-high seem like they are a meaningful improvement in vision, where it's almost good enough for this part, but they still fail miserably at the physical reasoning aspects.

This person got O4-mini-high to generate a reasonably close image depiction of the part.

https://x.com/tombielecki/status/1912913806541693253

I kinda agree, but that’s more a sign that schools are bad at teaching things, than a sign that human brains are bad at flexibly applying knowledge. See my comment here.

I'm not sure if this will be of any use, since your social skills will surely be warped when you expect to iterate on them (in the same way that radical transparency reduces awareness of feelings).

niplav20

Sorry, can't share the exact chat, that'd depseudonymize me. The prompts were:

What is a canary string? […]
What is the BIG-bench canary string?

Which resulted in the model outputting the canary string in its message.

In most cases, you could build a mini-quadcopter, "teach" it some tricks, and try showcasing it, getting video as a side effect!

Great work here, but I do feel that the only important observations in practice are those about reasoning. To the extent that obtaining visual information is the problem, I think the design of language models currently is just not representative of how this task would be implemented in real robotics applications for at least two reasons:

  1. The model is not using anywhere near all of the information about an image that it could be, as language models which accept image data are just accepting an embedding that is far smaller (in an information theoretic sense)
... (read more)

I'm not sure that I share that intuition, I think because my background model of humans has them as much less general than I imagine yours does. 

 

Yeah, I could imagine an AI being superhuman in some narrow but important domain like persuasion, cybersecurity, or bioweapons despite this. Intuitively that feels like it wouldn't be enough to take over the world, but it could possibly still fail in a way that took humanity down with it.

Randaly20

What prompts did you use? Can you share the chat? I see Sonnet 3.7 denying this knowledge when I try.

Yep, my point is that there's no physical notion of being "offered" a menu of lotteries which doesn't leak information. IIA will not be satisfied by any physical process which corresponds to offering the decision-maker a menu of options. Happy to discuss any specific counter-example.

Of course, you can construct a mathematical model of the physical process, and this model might be an informative object of study, but it would be begging the question if the mathematical model baked in IIA somewhere.

janshi10

I just asked Gemini 2.5 Pro to explain how to tie shoelaces to someone who has never done it before, a task that probably works in its favor because it is so common: plenty of descriptions exist, and most people can perform it with little cognitive effort within a few seconds every day. It took about 1.5 letter-sized pages of text and still missed a little bit of detail, but I think a humanoid robot could follow it and get to the right result. I imagine many tasks of machinists and craftsmen are more complex but simply don't exist in writing, so I agree that lack... (read more)

I'm with @chanind: If elephant is fully represented by a sum of its attributes, then it's quite reasonable to say that the model has no fundamental notion of an elephant in that representation.

Yes, the combination "grey + big + mammal" is special in some sense. If the model needed to recall that elephants are afraid of mice, the circuit would appear to check "grey and big and mammal", and that's an annoying mouthful that would be repeated all over the model. But it's a faithful representation of what's going on.

Let me be precise by what I mean "has no fundam... (read more)
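
As a toy illustration of the "sum of attributes" picture being debated here (made-up vectors, purely for intuition):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256
# Made-up attribute directions; roughly orthogonal at this dimension.
grey, big, mammal = rng.normal(size=(3, d)) / np.sqrt(d)

# The premise under discussion: "elephant" is nothing but the sum of its attributes.
elephant = grey + big + mammal

# A downstream "afraid of mice" circuit has no dedicated elephant direction to read;
# it has to check the grey, big, and mammal components together.
readout = grey + big + mammal
print(f"readout on elephant:        {elephant @ readout:.2f}")
print(f"readout on grey + big only: {(grey + big) @ readout:.2f}")
```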

lemon1030

>DeepSeek-R1 is currently the best model at creative writing as judged by Sonnet 3.7 (https://eqbench.com/creative_writing.html). This doesn't necessarily correlate with human preferences, including coherence preferences.

It should be noted that "best at creative writing" is very different from "best at multi-turn writing and roleplaying in collaboration with humans". I haven't used R1 since its first major version (maybe it's gotten better?), but it had some massive issues with instruction following, resulting in laser focusing on irrelevant minor detail... (read more)

Yitz20

Are there any open part-time rationalist/EA- adjacent jobs or volunteer work in LA? Looking for something I can do in the afternoon while I’m here for the next few months.

Interesting analysis, but this statement is a bit strong. A global safe AI project would be theoretically possible, but it would be extremely challenging to solve the co-ordination issues without AI progress dramatically slowing. Then again, all plans are challenging/potentially impossible.

[...]

Another option would be to negotiate a deal where only a few countries are allowed to develop AGI, but in exchange, the UN gets to send observers and provide input on the development of the technology.

"co-ordination issues" is a major euphemism here: such a global safe... (read more)
