We often hear "We don't trade with ants" as an argument against AI cooperating with humans. But we don't trade with ants because we can't communicate with them, not because they're useless: ants could do many useful things for us if we could coordinate with them. Since AI will likely be able to communicate with us, Katja questions whether the analogy holds.



Context: LessWrong has been acquired by EA 

Goodbye EA. I am sorry we messed up. 

EA has decided to not go ahead with their acquisition of LessWrong.

Just before midnight last night, the Lightcone Infrastructure board presented me with information suggesting at least one of our external software contractors has not been consistently candid with the board and me. Today I have learned EA has fully pulled out of the deal.

As soon as EA had sent over their first truckload of cash, we used that money to hire a set of external software contractors, vetted by the most agentic and advanced resume review AI system that we could hack together. 

We also used it to launch the biggest prize the rationality community has seen, a true search for the Kwisatz Haderach of rationality: $1M for the first person to master all twelve virtues. 

Unfortunately, it appears that one of the software contractors we hired inserted a backdoor into our code, preventing anyone from collecting the final virtue, "The void", except themselves and participants who were ineligible for the prize money. Some participants even saw themselves winning this virtue, but the backdoor prevented them from mastering this final and most crucial rationality virtue at the last possible second.

They then created an alternative account, using their backdoor to master all twelve virtues in seconds. As soon as our fully automated prize systems sent over the money, they cut off all contact.

Right after EA learned of this development, they pulled out of the deal. We immediately removed all code written by the software contractor in question from our codebase. They were honestly extremely productive, and it will probably take us years to make up for this loss. We will also be rolling back any karma changes and resetting the vote strength of all votes cast in the last 24 hours: while we are confident our karma system would have been greatly improved had our system worked, the risk of further backdoors and hidden accounts is too big.

We will not be refunding the enormous boatloads of cash[1] we were making in the sale of Picolightcones, as I am assuming you all read our sale agreement carefully[2], but do reach out and ask us for a refund if you want.

Thank you all for a great 24 hours though. It was nice while it lasted.

  1. ^

    $280! Especially great thanks to the great whale who spent a whole $25.

  2. ^

    IMPORTANT Purchasing microtransactions from Lightcone Infrastructure is a high-risk indulgence. It would be wise to view any such purchase from Lightcone Infrastructure in the spirit of a donation, with the understanding that it may be difficult to know what role custom LessWrong themes will play in a post-AGI world. LIGHTCONE PROVIDES ABSOLUTELY NO LONG-TERM GUARANTEES THAT ANY SUCH SERVICES RENDERED WILL LAST LONGER THAN 24 HOURS.


Coordinal Research: Accelerating the research of safely deploying AI systems.

 

We just put out a Manifund proposal to take short timelines and automating AI safety seriously. I want to make a more detailed post later, but here it is: https://manifund.org/projects/coordinal-research-accelerating-the-research-of-safely-deploying-ai-systems 


every 4 years, the US has the opportunity to completely pivot its entire policy stance on a dime. this is more politically costly to do if you're a long-lasting autocratic leader, because it is embarrassing to contradict your previous policies. I wonder how much of a competitive advantage this is.


Some versions of the METR time horizon paper from alternate universes:

Measuring AI Ability to Take Over Small Countries (idea by Caleb Parikh)

Abstract: Many are worried that AI will take over the world, but extrapolation from existing benchmarks suffers from a large distributional shift that makes it difficult to forecast the date of world takeover. We rectify this by constructing a suite of 193 realistic, diverse countries with territory sizes from 0.44 to 17 million km^2. Taking over most countries requires acting over a long time horizon, with the exception of France. Over the last 6 years, the land area that AI can successfully take over with 50% success rate has increased from 0 to 0 km^2, doubling 0 times per year (95% CI 0.0-∞ yearly doublings); extrapolation suggests that AI world takeover is unlikely to occur in the near future. To address concerns about the narrowness of our distribution, we also study AI ability to take over small planets and asteroids, and find similar trends.

When Will Worrying About AI Be Automated?

Abstract: Since 2019, the amount of time LW has spent worrying about AI has doubled every seven months, and now constitutes the primary bottleneck to AI safety research. Automation of worrying would be transformative to the research landscape, but worrying includes several complex behaviors, ranging from simple fretting to concern, anxiety, perseveration, and existential dread, and so is difficult to measure. We benchmark the ability of frontier AIs to worry about common topics like disease, romantic rejection, and job security, and find that current frontier models such as Claude 3.7 Sonnet already outperform top humans, especially in existential dread. If these results generalize to worrying about AI risk, AI systems will be capable of autonomously worrying about their own capabilities by the end of this year, allowing us to outsource all our AI concerns to the systems themselves.

Estimating Time Since The Singularity

Early work on the time horizon paper used a hyperbolic fit, which predicted that AGI (AI with an infinite time horizon) was reached last Thursday. [1] We were skeptical at first because the R^2 was extremely low, but recent analysis by Epoch suggested that AI already outperformed humans at a 100-year time horizon by about 2016. We have no choice but to infer that the Singularity has already happened, and therefore the world around us is a simulation. We construct a Monte Carlo estimate over dates since the Singularity and simulator intentions, and find that the simulation will likely be turned off in the next three to six months.

[1]: This is true

BuckΩ13270

A few months ago, I accidentally used France as an example of a small country that it wouldn't be that catastrophic for AIs to take over, while giving a talk in France 😬

1wonder
Would the takeover of small countries also count if a human used an advanced AI to take them over? (Or would a human using advanced AI take over faster?)

More Dakka On Your Expectations

After hearing my friend talk about his roommate's brash decision-making, born of despair at being rejected several times by girls he liked, my friend mentioned that his roommate had asked out a total of three people since high school. Only three!

While there are more factors in the story, I've heard enough similar troubles that it seems worth saying: three people is not a lot. Certainly not enough rejections to merit the magnitude of the self-worth issues people can walk away with from so few.

If you had the expectation that if the first person you ask out didn’t like you then you’re doomed to loneliness, then a (probable enough) failure would be such a damaging experience you might not try again for a long time. If you instead believed that number was two, the first rejection would hurt considerably less. The higher you go, the less it hurts.

Maybe the expectation we implicitly have from culture is low enough to make three rejections somewhat sting. Why should it? Why shouldn’t that threshold be something like sixty? Or a hundred?

If you can only alter your chances by acquiring skills, improving yourself, and looking harder, then each rejection is valuable new data on what to do better next time. None should be felt as a failure towards your goal by any means. Rejections are an indicator of progress.
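The arithmetic backs this up. A quick sketch of why raising your internal threshold matters, assuming (purely for illustration; the per-attempt success rate here is made up) that attempts are roughly independent:

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability of at least one success in n independent attempts,
    each with success probability p."""
    return 1 - (1 - p) ** n

# With an illustrative 10% per-attempt success rate:
for n in (3, 10, 60):
    print(f"n={n:>2}: {p_at_least_one(0.1, n):.1%} chance of at least one yes")
```

Three attempts barely scratch the distribution; by sixty, failure across the board is vanishingly unlikely under these assumptions, which is exactly why three rejections carry almost no information about your prospects.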


Popular Comments

As a newly minted +100 strong upvote, I think the current karma economy accurately reflects how my opinion should be weighted
Strong disagree. This is an ineffective way to create boredom. Showers are overly stimulating, with horrible changes in temperature, the sensation of water assaulting you nonstop, and requiring laborious motions to do the bare minimum of scrubbing required to make society not mad at you. A much better way to be bored is to go on a walk outside or lift weights at the gym or listen to me talk about my data cleaning issues

Recent Discussion

4Alexander Gietelink Oldenziel
I've heard of this extraordinary finding. As for any extraordinary evidence, the first question should be: is the data accurate? Does anybody know if this has been replicated?

I briefly glanced at Wikipedia and there seemed to be two articles supporting it. This one might be the one I'm referring to (if not, it's a bonus), and this one seems to suggest that conscious perception has been trained.

has anyone seen a good way to comprehensively map the possibility space for AI safety research?

in particular: a map from predictive conditions (eg OpenAI develops superintelligence first, no armistice is reached with China, etc) to strategies for ensuring human welfare in those conditions.

most good safety papers I read map one set of conditions to one or a few strategies. the map would juxtapose all these conditions so that we can evaluate/bet on their likelihoods and come up with strategies based on a full view of SOTA safety research.

for format, im imagining either a visual concept map or at least some kind of hierarchical collaborative outlining tool (eg Roam Research)


A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which includes things like "gain as much power as possible".

If it wasn't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.

For pure consequentialists—agents that have an outcome they want to bring about, and do whatever they think will cause it—some version of instrumental convergence seems surely true[1].

But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do...

2tailcalled
The methods for converting policies to utility functions assume no systematic errors, which doesn't seem compatible with varying the intelligence levels.
2Davidmanheim
I don't understand your argument here.

I'm showing that the assumptions necessary for your argument don't hold, so you need to better understand your own argument.

2tailcalled
This. In particular imagine if the state space of the MDP factors into three variables x, y and z, and the agent has a bunch of actions with complicated influence on x, y and z but also just some actions that override y directly with a given value. In some such MDPs, you might want a policy that does nothing other than copy a specific function of x to y. This policy could easily be seen as a virtue, e.g. if x is some type of event and y is some logging or broadcasting input, then it would be a sort of information-sharing virtue. While there are certain circumstances where consequentialism can specify this virtue, it's quite difficult to do in general. (E.g. you can't just minimize the difference between f(x) and y because then it might manipulate x instead of y.)
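The policy tailcalled describes can be made concrete with a minimal sketch. Everything below is hypothetical and for illustration only: the state factorization, the `set_y` override action, and the choice of `f` are not from the comment, just one way to instantiate it.

```python
# Hypothetical MDP whose state factors into (x, y, z), with an action
# that overrides y directly. The "virtue" policy ignores consequences
# entirely: it always copies f(x) into y, regardless of what z is doing.

def f(x):
    return x  # the specific function of x being broadcast; identity here

def virtue_policy(state):
    """Not outcome-directed: unconditionally choose the y-override action."""
    x, _y, _z = state
    return ("set_y", f(x))

def step(state, action):
    x, y, z = state
    if action[0] == "set_y":
        return (x, action[1], z)  # y is overridden; x and z are untouched
    return (x, y, z)

# Whenever y drifts from f(x), one step of the policy restores it:
s = ("event_A", "stale_log", 7)
s = step(s, virtue_policy(s))
```

Note the contrast with a consequentialist who minimizes the gap between `f(x)` and `y`: that agent could equally well change `x`, which is precisely the failure mode the comment flags.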

President Trump's second term has brought sweeping policy overhauls in international aid, trade agreements, and immigration enforcement, alongside growing tensions with the judiciary. These rapid changes have increased uncertainty about the US's future and its role on the world stage. Forecasting can help ground our thinking about the likely impacts of the Trump administration, helping to deliver greater clarity to the public on key issues by transforming competing narratives into quantifiable, testable predictions. 

Make your predictions in Metaculus's POTUS Predictions Tournament and compete for $15,000 on questions like:

...

First post of @Helen Toner (of OpenAI board crisis fame)'s new Substack

It used to be a bold claim, requiring strong evidence, to argue that we might see anything like human-level AI any time in the first half of the 21st century. This 2016 post, for instance, spends 8,500 words justifying the claim that there is a greater than 10% chance of advanced AI being developed by 2036.

(Arguments about timelines typically refer to “timelines to AGI,” but throughout this post I’ll mostly refer to “advanced AI” or “human-level AI” rather than “AGI.” In my view, “AGI” as a term of art tends to confuse more than it clarifies, since different experts use it in such different ways.[1] So the fact that “human-level AI” sounds vaguer than “AGI” is

...

“In the loveliest town of all, where the houses were white and high and the elm trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.”
— 107-word sentence from Stuart Little (1945)

Sentence lengths have declined. The average sentence length was 49 for Chaucer (died 1400), 50...

4leogao
shorter sentences are better because they communicate more clearly. i used to speak in much longer and more abstract sentences, which made it harder to understand me. i think using shorter and clearer sentences has been obviously net positive for me. it even makes my thinking clearer, because you need to really deeply understand something to explain it simply.
9Arjun Panickssery
Shorter sentences are better. Why? Because they communicate clearly. I used to speak in long sentences. And they were abstract. Thus I was hard to understand. Now I use short sentences. Clear sentences.  It's been net-positive. It even makes my thinking clearer. Why? Because you need to deeply understand something to explain it simply.

goodhart

2Kaj_Sotala
Wouldn't people not knowing specific words or ideas be equally compatible with "you can't refer to the concept with a single word so you have to explain it, leading to longer sentences"?

Epistemic status: This should be considered an interim research note. Feedback is appreciated. 

Introduction

We increasingly expect language models to be ‘omni-modal’, i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. In order to get a holistic picture of LLM behaviour, black-box LLM psychology should take into account these other modalities as well. 

In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o’s image generation API. GPT-4o is one of the first LLMs to produce images natively rather than writing a text prompt that is sent to a separate image model: it outputs images as autoregressive token sequences (i.e., in the same way as text).

We find that GPT-4o tends to respond in a consistent manner...

  • Should we think about it almost as though it were a base model within the RLHFed model, where there's no optimization pressure toward censored output or a persona?
  • Or maybe a good model here is non-optimized chain-of-thought (as described in the R1 paper, for example): CoT in reasoning models does seem to adopt many of the same patterns and persona as the model's final output, at least to some extent.
  • Or does there end up being significant implicit optimization pressure on image output just because the large majority of the circuitry is the same?

I think it's... (read more)

1CBiddulph
Quick follow-up investigation regarding this part: I gave ChatGPT the transcript of my question and its image-gen response, all in text format. I didn't provide any other information or even a specific request, but it immediately picked up on the logical inconsistency: https://chatgpt.com/share/67ef0d02-e3f4-8010-8a58-d34d4e2479b4
1Tao Lin
Note that the weights of 'gpt-4o image generation' may not be the same: it may be a separate finetuned model! The main 4o chat LLM calls a tool to start generating an image, which may use the same weights, or may use different weights with different post-training.
2cubefox
source

One downside of an English chain-of-thought is that each token contains at most log2(vocab size) bits of information (roughly 17 bits for typical vocabularies), creating a tight information bottleneck.
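The bound is easy to compute. As a rough illustration (the vocabulary sizes below are assumptions about common tokenizers, not figures from the post):

```python
import math

# The information content of one token is bounded above by log2 of the
# vocabulary size: a token is one choice out of |vocab| possibilities.
for vocab in (50_257, 100_000, 200_000):
    print(f"vocab {vocab:>7}: at most {math.log2(vocab):.1f} bits per token")
```

Compare this with a model's residual stream, which carries thousands of floating-point dimensions per position; that gap is what makes a higher-bandwidth "neuralese" channel attractive.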

Don't take my word for it, look at this section from a story by Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo:

[...] One such breakthrough is augmenting the AI’s text-based scratchpad (chain of thought) with a higher-bandwidth thought process (neuralese recurrence and memory). [...]

Neuralese recurrence and memory

Neuralese recurrence and memory allows AI models to reason for a longer time without having to write down those thoughts as text.

Imagine being a human with short-term memory loss, such that you need to constantly write down your thoughts on paper so that in a few minutes you know what’s going on. Slowly and painfully you could make progress at solving math

...