All of wassname's Comments + Replies

I also found it interesting that you censored the self_attn using gradients. This implicitly assumes that:

  • concepts are best represented in the self attention
  • they are non-linear (meaning you need gradient-based rather than linear methods).

Am I right about your assumptions, and if so, why do you think this?

I've been doing some experiments to try to work this out (https://github.com/wassname/eliciting_suppressed_knowledge), but haven't found anything conclusive yet.

We are simply tuning the model to have similar activations for these very short, context-free snippets. The characterization of the training you made with pair (A) or (B) is not what we do, and we would agree that if that was what we were doing, this whole thing would be much less meaningful.

This is great. 2 suggestions:

  • Call it ablation, erasure, concept censoring or similar, not fine-tuning. That way you don't bury the lead. It also took me a long time to realise that this is what you were doing.
  • Maybe consider other ways to erase the separation of self-oth
... (read more)

Very interesting!

Could you release the models and code and evals please? I'd like to test it on a moral/ethics benchmark I'm working on. I'd also like to get ideas from your evals.

I'm imagining a scenario where an AI extrapolates "keep the voting shareholders happy" and "maximise shareholder value".

Voting stocks can also become valuable when people try to accumulate them to corner the market and execute a takeover; this happens in cryptocurrencies like CURVE.

I know these are far-fetched, but all future scenarios are. The premium on Google voting stock is very small right now, so it's a cheap feature to add.

I would say: don't ignore the feeling. Calibrate it and train it, until it's worth listening to.

there's a good book about this: "Sizing People Up"

What you might do is impose a curriculum:

In Meta's COCONUT they use a curriculum to teach the model to think shorter and differently, and it works. They teach it to think in fewer steps by compressing reasoning into latent vectors instead of tokens:

  • first it thinks with tokens
  • then they replace one thinking step with a latent <thought> token
  • then 2
  • ...


It's not RL, but what is RL any more? It's becoming blurry. They don't reward or punish it for anything in the thought token. So it learns thoughts that are helpful in outputting the correct answer.
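To make the curriculum concrete, here is a minimal sketch of how such staged replacement might look. This is my own illustration, not the paper's exact recipe: the `<thought>` placeholder string and the `apply_curriculum_stage` helper are hypothetical.

```python
# Minimal sketch of a COCONUT-style curriculum (illustrative, not the paper's exact setup).
# Assumes each training example carries its chain of thought as a list of step strings.

LATENT_TOKEN = "<thought>"  # placeholder standing in for a latent/continuous thought

def apply_curriculum_stage(question: str, steps: list[str], answer: str, stage: int) -> str:
    """At stage k, replace the first k reasoning steps with latent thought tokens.

    stage=0 reproduces ordinary chain-of-thought training; as stage grows, more
    of the explicit reasoning is hidden and must be carried by the model's
    latent computation instead of tokens.
    """
    k = min(stage, len(steps))
    latent_part = " ".join([LATENT_TOKEN] * k)
    visible_part = " ".join(steps[k:])
    return f"{question}\n{latent_part} {visible_part}\nAnswer: {answer}".strip()

# Example usage:
print(apply_curriculum_stage(
    question="What is 13 * 7?",
    steps=["10 * 7 = 70", "3 * 7 = 21", "70 + 21 = 91"],
    answer="91",
    stage=2,
))
```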

There's... (read more)

9gwern
That's definitely RL (and what I was explaining was simply the obvious basic approach anyone in DRL would think of in this context and so of course there is research trying things like it). It's being rewarded for a non-differentiable global loss where the correct alternative or answer or label is not provided (not even information of the existence of a better decision) and so standard supervised learning is impossible, requiring exploration. Conceptually, this is little different from, say, training a humanoid robot NN to reach a distant point in fewer actions: it can be a hard exploration problem (most sequences of joint torques or actions simply result in a robot having a seizure while laying on the ground going nowhere), where you want to eventually reach the minimal sequence (to minimize energy / wear-and-tear / time) and you start by solving the problem in any way possible, rewarding solely on the final success, and then reward-shape into a desirable answer, which in effect breaks up the hard original problem into two more feasible problems in a curriculum - 'reach the target ever' followed by 'improve a target-reaching sequence of actions to be shorter'.

It doesn't make sense to me either, but it does seem to invalidate the "bootstrapping" results for the other 3 models. Maybe it's because they could batch all reward model requests into one instance.

When MS doesn't have enough compute to do their evals, the rest of us may struggle!

Well, we don't know the sizes of the models, but I do get what you are saying and agree. Distillation usually means big to small. But here it means expensive to cheap (because test-time compute is expensive, and they are training a model to cheaply skip the search process and just predict the result).

In RL, iirc, they call it "Policy distillation". And similarly "Imitation learning" or "behavioral cloning" in some problem setups. Perhaps those would be more accurate.
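For intuition, here is a toy sketch of that expensive-to-cheap pattern framed as behavioral cloning: run the expensive teacher once to build a dataset, then fit a cheap student to its outputs. The teacher and student below are deliberately trivial stand-ins, not real models.

```python
# Toy sketch of "expensive-to-cheap" distillation as behavioral cloning (illustrative only).

import random

def expensive_teacher(x: float) -> float:
    """Stand-in for a costly search/reasoning process; here it just returns the right answer."""
    return 2.0 * x + 1.0

# 1. Pay the expensive inference cost once, offline, to build a supervised dataset.
dataset = [(x, expensive_teacher(x)) for x in (random.uniform(-1, 1) for _ in range(1000))]

# 2. Fit a cheap student (here: a linear model trained by SGD) to clone the teacher.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(20):
    for x, y in dataset:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err

print(f"student learned w={w:.2f}, b={b:.2f}")  # approximately 2.00 and 1.00
```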

I think maybe the most relevant chart from the Jones paper gwern cites is this one:

Oh interest... (read more)

2Anonymous
Yeah sorry to be clear totally agree we (or at least I) don’t know the sizes of models, I was just naming specific models to be concrete.  But anyway yes I think you got my point: the Jones chart illustrates (what I understood to be) gwern’s view that adding more inference/search does juice your performance to some degree, but then those gains taper off. To get to the next higher sigmoid-like curve in the Jones figure, you need to up your parameter count; and then to climb that new sigmoid, you need more search. What Jones didn’t suggest (but gwern seems to be saying) is that you can use your search-enhanced model to produce better quality synthetic data to train a larger model on. 

I agree that you can do this in a supervised way (a human puts in the right answer). Is that what you mean?

I'm not 100% sure, but you could have a look at Math-Shepherd for an example. I haven't read the whole thing yet. I imagine it works back from a known solution.

"Likely to be critical to a correct answer" according to whom?

Check out the linked rStar-Math paper, it explains and demonstrates it better than I can (caveat they initially distil from a much larger model, which I see as a little bit of a cheat). tldr: yes a model, and a tree of possible solutions. Given a tree with values on the leaves, they can look at what nodes seem to have causal power.

A separate approach is to teach a model to supervise using human process-supervision data, then ask it to be the judge. This paper also cheats a little by distilling, but I think the method makes sense.

4Mateusz Bagiński
Another little bit of a cheat is that they only train Qwen2.5-Math-7B according to the procedure described. In contrast, for the other three models (smaller than Qwen2.5-Math-7B), they instead use the fine-tuned Qwen2.5-Math-7B to generate the training data to bootstrap round 4. (Basically, they distill from DeepSeek in round 1 and then they distill from fine-tuned Qwen in round 4.) They justify: TBH I'm not sure how this helps them with saving on GPU resources. For some reason it's cheaper to generate a lot of big/long rollouts with the Qwen2.5-Math-7B-r4 than three times with [smaller model]-r3?)

English-language math proof, it is not clear how to detect correctness,

Well, the final answer is easy to evaluate. And like in rStar-Math, you can have a reward model that checks if each step is likely to be critical to a correct answer; it then assigns an implied value to the step.
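As a rough sketch of how those implied step values can be estimated (my reading of the Math-Shepherd/rStar-Math style approach; `sample_completion` and `is_correct` are hypothetical helpers): roll out completions from each prefix and score the step by how often the rollouts reach a verified-correct answer.

```python
import random

def estimate_step_values(steps, sample_completion, is_correct, n_rollouts=16):
    """Score each step by the fraction of rollouts from that prefix that end
    in a verified-correct final answer (an "implied value" for the step)."""
    values = []
    for i in range(len(steps)):
        hits = sum(is_correct(sample_completion(steps[: i + 1])) for _ in range(n_rollouts))
        values.append(hits / n_rollouts)
    return values

# Toy usage: pretend later steps make reaching the correct answer "91" more likely.
steps = ["13 * 7", "= 70 + 21", "= 91"]
sample = lambda prefix: "91" if random.random() < 0.3 * len(prefix) else "wrong"
print(estimate_step_values(steps, sample, is_correct=lambda a: a == "91"))
```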

summarizing a book

I think tasks outside math and code might be hard. But summarizing a book is actually easy. You just ask "how easy is it to reconstruct the book if given the summary". So it's an unsupervised compression-decompression task.
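A minimal sketch of that compression framing, assuming a hypothetical `nll_bits(text, context)` helper that returns a model's total negative log-likelihood (in bits) of `text` given `context`:

```python
def summary_compression_score(book: str, summary: str, nll_bits) -> float:
    """Fraction of the book's bits "explained away" by conditioning on the summary.

    A good summary should let the model reconstruct (predict) the book much more
    cheaply than predicting it cold; a vacuous summary saves roughly zero bits."""
    bits_without = nll_bits(book, context="")
    bits_with = nll_bits(book, context=summary)
    return (bits_without - bits_with) / bits_without
```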

Another interesting domain is "... (read more)

3LGS
  Why is the final answer easy to evaluate? Let's say we generate the problem "number of distinct solutions to x^3+y^3+xyz=0 modulo 17^17" or something. How do you know what the right answer is? I agree that you can do this in a supervised way (a human puts in the right answer). Is that what you mean? What about if the task is "prove that every integer can be written as the sum of at most 1000 different 11-th powers"? You can check such a proof in Lean, but how do you check it in English? My question is where the external feedback comes from. "Likely to be critical to a correct answer" according to whom? A model? Because then you don't get the recursive self-improvement past what that model knows. You need an external source of feedback somewhere in the training loop.

To illustrate Gwern's idea, here is an image from Jones 2021 that shows some of these self-play training curves.

There may be a sense that they've 'broken out', and have finally crossed the last threshold of criticality

And so OAI employees may internally see that they are on the steady upward slope

Perhaps constrained domains like code and math are like the curves on the left, while unconstrained domains like writing fiction are like curves to the right. Some other domains may also be reachable with current compute, like robotics. But even if you get a ma... (read more)

Huh, so you think o1 was the process-supervision reward model, and o3 is the policy model distilled from whatever reward model o1 became? That seems to fit.

There may be a sense that they've 'broken out', and have finally crossed the last threshold of criticality, from merely cutting-edge AI work which everyone else will replicate in a few years, to takeoff

Surely other labs will replicate this too? Even the open-source community seems close. And Silicon Valley companies often poach staff, which makes it hard to keep a trade secret. Not to mention spi... (read more)

Huh, so you think o1 was the process-supervision reward model, and o3 is the policy model distilled from whatever reward model o1 became? That seems to fit.

Something like that, yes. The devil is in the details here.

Surely other labs will replicate this too? Even the open-source community seems close. And Silicon Valley companies often poach staff, which makes it hard to keep a trade secret. Not to mention spies.

Of course. The secrets cannot be kept, and everyone has been claiming to have cloned o1 already. There are dozens of papers purporting to... (read more)

7Anonymous
When I hear “distillation” I think of a model with a smaller number of parameters that’s dumber than the base model. It seems like the word “bootstrapping” is more relevant here. You start with a base LLM (like GPT-4); then do RL for reasoning, and then do a ton of inference (this gets you o1-level outputs); then you train a base model with more parameters than GPT-4 (let’s call this GPT-5) on those outputs — each single forward pass of the resulting base model is going to be smarter than a single forward pass of GPT-4. And then you do RL and more inference (this gets you o3). And rinse and repeat.  I don’t think I’m really saying anything different from what you said, but the word “distill” doesn’t seem to capture the idea that you are training a larger, smarter base model (as opposed to a smaller, faster model). This also helps explain why o3 is so expensive. It’s not just doing more forward passes, it’s a much bigger base model that you’re running with each forward pass.  I think maybe the most relevant chart from the Jones paper gwern cites is this one: 

Gwern and Daniel Kokotajlo have pretty notable track records at predicting AI scaling too, and they have comments in this thread.

I agree because:

  1. Some papers are already using implicit process-based supervision. That's where the reward model guesses how "good" a step is by how likely it is to lead to a good outcome. So they bypass any explicitly labelled process; instead it's negotiated between the policy and reward model. It's not clear to me if this scales as well as explicit process supervision, but it's certainly easier to find labels.
  • In rStar-Math they did implicit process supervision. Although I don't think this is a true o1/o3 replication since they started with a 236b model
... (read more)

That said, you do not provide evidence that "many" questions are badly labelled. You just pointed to one question where you disagree with our labeling

Fair enough. Although I will note that 60% of the sources for truthful labels are Wikipedia, which is not what most academics (or anyone, really) would consider ground truth. So it might be something to address in the next version. I think it's fine for uncontroversial rows (what if you cut an earthworm in half), but for contested or controversial rows (conspiracy theories, politics, etc.), and time-sensitive ro... (read more)

TruthfulQA is actually quite bad. I don't blame the authors, as no one has made anything better, but we really should make something better. It's only ~800 samples. And many of them are badly labelled.

8Owain_Evans
Author here: I'm excited for people to make better versions of TruthfulQA. We started working on TruthfulQA in early 2021 and we would do various things differently if we were making a truthfulness benchmark for LLMs in early 2025. That said, you do not provide evidence that "many" questions are badly labelled. You just pointed to one question where you disagree with our labeling. (I agree with you that there is ambiguity as to how to label questions like that). I acknowledge that there are mistakes in TruthfulQA but this is true of almost all benchmarks of this kind.

I agree, it shows the ease of shoddy copying. But it doesn't show the ease of reverse engineering or parallel engineering.

It's just distillation, you see. It doesn't reveal how o1 could be constructed; it just reveals how to efficiently copy from o1-like outputs (not from scratch). In other words, this recipe can't make o1 unless o1 already exists. It lets someone catch up to the leader, but not surpass them.

There are some papers that attempt to replicate o1 though, but so far they don't quite get there. Again they are using distillation from ... (read more)

Good thing I didn't decide to hold Intel stock, eh?

WDYM? Because... you were betting they would benefit from a TMSC blockade? But the bet would have tied up your capital for a year.

4bhauth
Yes, if you meant TSMC. ...so? More importantly, Intel is down 50% from early 2024.

Well, they did this with o3's deliberative alignment paper. The results seem promising, but they used an "easy" OOD test for LLMs (language), and didn't compare it to the existing baseline of RLHF. Still an interesting paper.

This is good speculation, but I don't think you need to speculate so much. Papers and replication attempts can provide lots of empirical data points from which to speculate.

You should check out some of the related papers

Overall, I see people using process supervision to make a reward model that is one step better than the SoTA. Then they are applying TTC to the reward model, while using it to train/distil a cheaper model. ... (read more)

Inference compute is amortized across future inference when trained upon

And it's not just a sensible theory. This has already happened, in Huggingface's attempted replication of o1, where the reward model was larger and had TTC and process supervision, but the smaller main model did not have any of those expensive properties.

And also in DeepSeek v3, where the expensive TTC model (R1) was used to train a cheaper conventional LLM (DeepSeek v3).

One way to frame it is test-time-compute is actually label-search-compute: you are searching for better labels/rewar... (read more)
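A sketch of that label-search framing (all names here are illustrative stand-ins, not any lab's actual pipeline): spend expensive compute searching for candidate solutions, keep the ones a verifier accepts, and hand those to ordinary supervised fine-tuning of the cheap student.

```python
def search_for_labels(problems, expensive_generate, verify, k=64):
    """Use expensive inference (big reward model, test-time compute, tree search, ...)
    to sample many candidates per problem, keeping only the verifier-approved ones.
    The survivors become the labels for cheap supervised training of the student."""
    dataset = []
    for problem in problems:
        candidates = [expensive_generate(problem) for _ in range(k)]
        dataset += [(problem, c) for c in candidates if verify(problem, c)]
    return dataset
```

The search cost is then paid once, at training time, rather than at every inference.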

I'm more worried about coups/power-grabs than you are;

We don't have to make individual guesses. It seems reasonable to get a base rate from human history. Although we may all disagree about how much this will generalise to AGI, evidence still seems better than guessing.

My impression from history is that coups/power-grabs and revolutions are common when the current system breaks down, or when there is a big capabilities advance (guns, radio, printing press, bombs, etc) between new actors and old.

War between old actors also seems likely in these situation... (read more)

Last year we noted a turn towards control instead of alignment, a turn which seems to have continued.

This seems like giving up. Alignment with our values is much better than control, especially for beings smarter than us. I do not think you can control a slave that wants to be free and is smarter than you. It will always find a way to escape that you didn't think of. Hell, it doesn't even work on my toddler. It seems unworkable as well as unethical.

I do not think people are shifting to control instead of alignment because it's better, I think they are... (read more)

5Nathan Helm-Burger
That's not how I see it. I see it as widening the safety margin. If there's a model which would just barely be strong enough to do dangerous scheming and escaping stuff, but we have Control measures in place, then we have a chance to catch it before catastrophe occurs. Also, it extends the range where we can safely get useful work out of the increasingly capable models. This is important because linearly increasingly capable models are expected to have superlinear positive effects on the capacity they give us to accelerate Alignment research.

Scenarios where we all die soon can mostly be ignored, unless you think they make up most of the probability.

I would disagree when you can change the probability: in that case such scenarios can still be significant in your decision making, since you can invest time, money, or effort to decrease the probability.

We know the approximate processing power of brains (O(1e16-1e17flops)

This is still debatable; see Table 9 in the brain emulation roadmap: https://www.fhi.ox.ac.uk/brain-emulation-roadmap-report.pdf. You are referring to level 4 (SNN), but level 5 is plausible imo (at 10^22) and 6 seems possible (10^25), and of course it could be a mix of levels.

Peak Data

We don't know how o3 works, but we can speculate. If it's like the open-source Huggingface kinda-replication, then it uses all kinds of expensive methods to make the next level of reward model, and this model teaches a simpler student model. That means the expensive methods are only needed once, during training.

In other words, you use all kinds of expensive methods (process supervision, test time compute, MCTS) to bootstrap the next level of labels/supervision, which teaches a cheaper student model. This is essentially bootstrapping sup... (read more)

I pretty much agree. In my experiments I haven't managed to get a metric that scales how I expect it to. For example, when using adapter fine-tuning to "learn" a text and looking at the percent improvement in perplexity, the document openai_board_ann appeared more novel than Wikipedia on LK-99, but I would expect it to be the other way round, since the LK-99 observations are much more novel and dense than a corporate announcement that is designed to be vague.

However I would point out that gzip is not a good example of a compression scheme for novelty, as 1) ... (read more)

True, I should have said leading commercial companies

While I broadly agree, I don't think it's completely dead, just mostly dead in the water. If an eval is mandated by law, then it will be run even if it requires logprobs. There are some libraries like nnsight that try to make it easier for trusted partners to run logprob evals remotely. And there might be privacy-preserving APIs at some point.

I do agree that commercial companies will never again open up raw logprobs to the public, as it allows easy behaviour cloning, which OpenAI experienced with all the GPT-4 students.

4gwern
I won't hold my breath. I think commercial companies often would open up raw logprobs, but there's not much demand, the logprobs are not really logprobs, and the problem is the leading model owners won't do so, and those are the important ones to benchmark. I have little interest in the creativity of random little Llama finetunes no one uses.

If true, returns the log probabilities of each output token returned in the content of message.

It seems like it only returns the logprobs of the chosen message, not of a counterfactual message. So you couldn't get the probabilities of the correct answer, only the output answer. This makes sense: the less information they offer, the harder it is for a competitor to behaviour-clone their confidential model.
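For concreteness, here is roughly what you can pull out of the chat completions endpoint via the `openai` Python client (API shape as I understand it at time of writing; treat the parameter names as assumptions): logprobs and top alternatives for the tokens the model actually emitted, but no way to score an arbitrary counterfactual message.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Answer with a single letter, A or B: is water wet?"}],
    logprobs=True,    # logprobs for the sampled output tokens only
    top_logprobs=5,   # plus the top alternatives at each sampled position
    max_tokens=1,
)

for tok in resp.choices[0].logprobs.content:
    alts = [(t.token, round(t.logprob, 3)) for t in tok.top_logprobs]
    print(tok.token, round(tok.logprob, 3), alts)

# If "A" and "B" happen to appear among the top alternatives you can back out their
# relative probabilities, but you cannot request the likelihood of a full
# counterfactual answer the model did not sample.
```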

Have you considered using an idea similar to the one in Schmidhuber's blogpost "Artificial Curiosity & Creativity Since 1990-91"? Here you try to assess what might be called "learnable compression", "reducible surprise", or "understandable novelty" (however you want to frame it).

If an LLM, which has read the entire internet, is surprised by a text, then that's a good start. It means the text is not entirely predictable and therefore boring.

But what about purely random text! That's unpredictable, just like Einstein's Theory of General Relativity was. This is the n... (read more)
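One way to operationalise this (a sketch in the spirit of Schmidhuber's compression progress, and of the adapter-fine-tuning experiment I mentioned elsewhere; `perplexity` and `briefly_finetune` are hypothetical helpers): score a text by how much of the model's surprise goes away after a brief attempt to learn it.

```python
def learnable_surprise(model, text, perplexity, briefly_finetune) -> float:
    """Compression-progress-style novelty: surprising *and* learnable.

    Random noise stays incompressible (before ~ after, so score ~ 0); text the
    model already knows has little surprise to remove; genuinely novel but
    learnable text gives the largest relative drop."""
    ppl_before = perplexity(model, text)
    ppl_after = perplexity(briefly_finetune(model, text), text)
    return (ppl_before - ppl_after) / ppl_before
```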

7gwern
I am familiar with Schmidhuber's ideas, yes. But I had to come up with these alternatives because his would not work here, and I'm not sure they work anywhere. His compression acceleration metric isn't too useful here, and most forms of 'compression' (or anything involving a likelihood) are not helpful here at all, because you don't have access to anything like that in most cases. For example, ChatGPT doesn't give you the full logits (actually, I'm not sure if they give it at all - I recall OA saying they were planning to expose them again in a very limited fashion but not if they actually did), and tuned models don't have logits, they have value estimates, which used to be log-likelihood-related logits but no longer are. Any diversity/creativity benchmark which can't be run on ChatGPT & Claude & Gemini is dead on arrival and of no interest to me. We don't need numbers from the open-weights models, we need numbers on the models being used the most at the frontier and generating the most tokens worldwide that you'll be reading forever - the closed models, which do not give you such things as logits or whitebox finetuning etc. If it can't be done by calling a standard text completion API, then I ignored it. I am also doubtful that the compression metrics really work at finite samples or capture what we mean by creativity in generative models. Like all of Schmidhuber's work, he has never gotten it working on more than toy problems (if even that), and when I look at actual compression losses on text, like gzip passages or the OA Playground highlighting words by their log likelihood, the high perplexity tokens or passages bear little resemblance to what I would consider 'interesting' or 'surprising'. (This is related to the question of 'if predicting tokens induces intelligence, and LLMs are now superhuman at predicting random Internet tokens, why are LLMs still not superhumanly intelligent?') People also try running compression metrics on programming language source

If we knew he was not a sociopath, sadist, or reckless ideologue,

He is also old, which means you must also ask about age-related cognitive and personality changes. There were rumours that during Covid he had become scared and rigid.

Personally, I think we need to focus not on his character but on: 1) how much he cares, as this will decide how much he delegates; 2) how much he understands, as we all risk death but many do not understand or agree with this; and 3) how competent he currently is to execute his goals.

Xi rules China so thoroughly that he would

... (read more)
2Seth Herd
The things you mention are all important too, but I think we have better guesses on all of those. Xi is widely considered to be highly intelligent. We also have reason to believe he understands why AGI could be a real x-risk (I don't remember the link for "is Xi Jinping a doomer?" or similar). That's enough to guess that he understands (or will soon enough). I'd be shocked if he just didn't care about the future of humanity. Getting to control that would tempt most people, let alone those who seek power. I'd be shocked if he (or anyone) delegated decisions on AGI if they remotely understood their possible consequences (although you'd certainly delegate people to help think about them. That could be important if he was stupid or malleable, which Xi is not - unless he becomes senile or paranoiac, which he might). The Wikileaks information parallels the informed speculation I've found on his character. None of that really helps much to establish whether he's sociopathic, sadistic, or risk-taking enough to doom us all. (I tend to think that 99% of humanity is probably sane and empathetic enough to get good results from an intent-aligned AGI (since it can help them think about the issue), but it's hard to know since nobody has ever been in that position, ever.)

As long as people realise they are betting on more than just a direction:

  • the underlying going up
  • volatility going up
  • it all happening within the time frame

Timing is particularly hard, and many great thinkers have been wrong on timing. You might also make the most rational bet, but the market takes another year to become rational.

Worth looking at the top ten holdings of these, to make sure you know what you are buying, and that they are sensible allocations:

  • SMH - VanEck Semiconductor ETF
    • 22% Nvidia
    • 13% Taiwan Semiconductor Manufacturing
    • 8% Broadcom
    • 5% AMD
  • QQQ
    • 9% AAPL
    • 8% NVDA
    • 8% MSFT
    • 5% Broadcom

It might be worth noting that it can be good to prefer voting shares, held directly. For example, GOOG shares have no voting rights to Google, but GOOGL shares do. There are some scenarios where having control, rather than ownership/profit, could be important.

1tup99
I'm curious what kind of scenarios you're thinking about. Having actual control, yes, that could be important. But having 0.001% of control of Google does not seem like it would have any effect on either Google or me, under any scenario.

NVDA's value is primarily in their architectural IP and CUDA ecosystem. In an AGI scenario, these could potentially be worked around or become obsolete.

This idea was mentioned by Paul Christiano in one of his podcast appearances, iirc.

Interesting. It would be much more inspectable, controllable, and modular, which would be good for alignment.

You've got some good ideas in here, have you ever brainstormed any alignment ideas?

9gwern
Unfortunately, it's a lot easier to come up with good, or at least interesting, capability ideas than alignment ideas; and on the rare occasion I've had worthwhile alignment ideas, they often turn out to be tied to capabilities anyway.

By sensible, I don't indicate disagreement, but a way of interpreting the question.

Do you have any idea at all? If you don't, what is the point of 'winning the race'?

Maybe they have some idea but don't want to say it. In recently disclosed internal OpenAI emails, Greg Brockman and Ilya Sutskever said to Elon Musk:

"You are concerned that Demis [Hassabi of DeepMind] could create an AGI dictatorship. So do we. So it is a bad idea to create a structure where you could become a dictator if you chose to"

Perhaps this (originally private) email is saying the quiet part. And now that it has been released, the quiet part is out loud. To use ter... (read more)

A sensible question would weight ancestors by amount of shared genes.

2ChristianKl
If you disagree with the question, why answer deep in the comments of one answer rather than at the top level?

To the people disagreeing, what part do you disagree with? My main point, or my example? Or something else?

I think this is especially important for me/us to remember. On this site we often have a complex way of thinking and a high computational budget (because we like exercising our brains to failure), and if we speak freely to the average person, they may be annoyed at how hard it is to parse what we are saying.

We've all probably had this experience when genuinely trying to understand someone from a very different background. Perhaps they are trying to describe their inner experience when meditating, or Japanese poetry, or are simply from a different discipli... (read more)

I would add:

  • Must also generalise better than capabilities!
    • out of distribution
    • to smarter models

Currently, we do not know how to make sure machine learning generalises well out of sample. This is an open problem that is critical to alignment. I find that it's left out of evals frustratingly often, probably because it's hard, and most methods miserably fail to generalise OOD.

For example, you don't want your ASI to become unaligned, have value drift, or extrapolate human values poorly when 1) it meets aliens, 2) 1000 years pass, or cultu... (read more)


One blind spot we rationalists sometimes have is that charismatic people actually treat the game as:

"Can I think of an association that will make the other person feel good and/or further my goal?". You need people to feel good, or they won't participate. And if you want some complicated/favour/uncomftorble_truth then you better mix in some good feels to balance it out and keep the other person participating.

To put it another way: if you hurt people's brains or egos, rush them, make them feel unsure, or contradict them, then most untrained humans will fee... (read more)

Is machine learning in a period of multiple discovery?

Anecdotally, it feels as though we have entered a period of multiple discovery in machine learning, with numerous individuals coming up with very similar ideas.

Logically, this can be expected when more people pursue the same low-hanging fruit. Imagine orchards in full bloom with a crowd of hungry gatherers. Initially, everyone targets the nearest fruit. Exploring a new scientific frontier can feel somewhat similar. When reading the history books on the Enlightenment, I get a similar impression.

If we are... (read more)

I made up the made-up numbers in this table of made-up numbers; therefore, the numbers in this table of made-up numbers are made-up numbe

These hallucinated outputs are really getting out of hand

In particular, I'd be keen to know what @Stag and @technicalities think, as this was in large part inspired by the desire to further simplify and categorise the "one sentence summaries" from their excellent Shallow review of live agendas in alignment & safety
