MichaelDickens
I find it hard to trust that AI safety people really care about AI safety.

* DeepMind, OpenAI, Anthropic, and SSI were all founded in the name of safety. Instead they have greatly increased danger. And at least OpenAI and Anthropic have been caught lying about their motivations:
  * OpenAI: claiming concern about hardware overhang and then trying to massively scale up hardware; promising compute to the superalignment team and then not giving it; telling the board that a model had passed safety testing when it hadn't; too many more to list.
  * Anthropic: promising (in a mealy-mouthed, technically-not-lying sort of way) not to push the frontier, and then pushing the frontier; trying (and succeeding) to weaken SB-1047; lying about their connection to EA (that's not related to x-risk, but it is related to trustworthiness).
* For whatever reason, I had the general impression that Epoch is about reducing x-risk (and I was not the only one with that impression), but:
  * Epoch is not about reducing x-risk, and they were explicit about this, but I didn't learn it until this week.
  * Its FrontierMath benchmark was funded by OpenAI, and OpenAI allegedly has access to the benchmark (see comment on why this is bad).
  * Some of their researchers left to start another build-AGI startup (I'm not sure how badly this reflects on Epoch as an org, but at minimum it means donors were funding people who would go on to work on capabilities).
  * Director Jaime Sevilla believes "violent AI takeover" is not a serious concern, and "I also selfishly care about AI development happening fast enough that my parents, friends and myself could benefit from it, and I am willing to accept a certain but not unbounded amount of risk from speeding up development", and "on net I support faster development of AI, so we can benefit earlier from it", which is a very hard position to justify (unjustified even on P(doom) = 1e-6, unless you assign ~zero value to people who are not yet born).
* I feel bad picking on Epoch/
asher169
I often hear people dismiss AI control by saying something like, "most AI risk doesn't come from early misaligned AGIs." While I mostly agree with this point, I think it fails to engage with a bunch of the more important arguments in favor of control — for instance, the fact that catching misaligned actions might be extremely important for alignment. In general, I think that control has a lot of benefits that are very instrumentally useful for preventing misaligned ASIs down the line, and I wish people would more thoughtfully engage with these.
Announcing PauseCon, the PauseAI conference. Three days of workshops, panels, and discussions, culminating in our biggest protest to date. Tweet: https://x.com/PauseAI/status/1915773746725474581 Apply now: https://pausecon.org
From Marginal Revolution. What does this crowd think? These effects are surprisingly small. Do we believe them? Anecdotally, the effect of LLMs on my own workflow and my colleagues' has been enormous. How can this be squared with the supposedly tiny labor-market effect? Are we that selected a demographic?
Should I drop uni because of AI?

Recently I read ai-2027.com, and even before that I was pretty worried about my future. I've been considering Yudkowsky's stance, prediction markets on the issue, etc.

I'm 19, from an "upper-middle+"-economy EU country, and a first-year BSc maths student. I had planned to go into finance or data analysis (maybe a master's) afterwards, but in light of recent AI progress I now see that as a dead end, because by the time I graduate (~mid/late 2027) I bet there will be an AGI doing my "brain work" faster, better, and cheaper.

My plan: quickly obtain qualifications for some blue-collar jobs that (for now) don't seem to be at risk of AI replacement; many of them have not-so-bad salaries in the EU. Maybe also emigrate within the EU for better pay and to be able to legally marry my partner.

I'm not a top student and haven't done the IMO, which makes me feel less ambitious about CVs and internships, and I didn't actively seek experience in finance this year or before, so I don't see a clear path into fin/tech without qualifications right now. So maybe working a not-too-complex job and enjoying life (travelling, partying, doing my human things, being with my partner, etc.) for the next 2-3 years before a potential civilizational collapse (or trying to get somewhere where UBI is more likely) would be better than missing out on social life, generally not enjoying my pretty hard studies, and having a not-so-hypothetical potential to just waste those years.

Popular Comments

Recent Discussion

Come get old-fashioned with us, and let's read the sequences at Lighthaven! We'll show up, mingle, do intros, and then get to beta test an app Lightcone is developing for LessOnline. Please do the reading beforehand - it should be no more than 20 minutes of reading. And BRING YOUR LAPTOP!!! You'll need it for the app.

This group is aimed at people who are new to the sequences and would enjoy a group experience, but also at people who've been around LessWrong and LessWrong meetups for a while and would like a refresher.

This meetup will also have dinner provided! We'll be ordering pizza-of-the-day from Sliver (including 2 vegan pizzas). Please RSVP to this event so we know how many people to have food for.

This week we'll be...

Hear me out: I think the most forbidden technique is very useful and should be used, as long as we avoid the "most forbidden aftertreatment":

  1. An AI trained on interpretability techniques must not be trained on capabilities after (or during) its interpretability training; otherwise it will relearn the bad behaviour in a sneakier way.
  2. An AI trained on interpretability techniques cannot be trusted any more than the old version of itself that hasn't yet been trained on interpretability techniques. Evaluations must be performed on that old version (see the sketch below).
    • An AI company which trains its AI on interpretability techniques must publish the old version (which hasn't been trained on them) with the same availability as the new version.
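A minimal sketch of what rules 1 and 2 could look like as a release pipeline. Every function and data structure here is a hypothetical placeholder, not a real training API:

```python
from copy import deepcopy

# Hypothetical stand-ins for a real training stack; placeholders only.
def train_on_interpretability(model, interp_data):
    return {**model, "interp_trained": True}

def run_safety_evals(model, eval_suite):
    return {name: "pending" for name in eval_suite}

def release_pipeline(base_model, interp_data, eval_suite):
    # Freeze the old version before any interpretability training happens.
    old_version = deepcopy(base_model)

    # Rule 1: interpretability training comes last; no capabilities training
    # is applied to the new version afterwards, so it cannot quietly relearn
    # the bad behaviour in a sneakier form.
    new_version = train_on_interpretability(base_model, interp_data)

    # Rule 2: trust in the new version is capped by evals on the old version,
    # since the new one may have learned to fool the probes it was trained on.
    eval_report = run_safety_evals(old_version, eval_suite)

    # Publish both versions with the same availability.
    return {"old": old_version, "new": new_version, "evals": eval_report}
```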

The natural selection argument:

The reason why the most forbidden technique is forbidden,...

If there turns out not to be an AI crash, you get 1/(1+7) × $25,000 = $3,125.
If there is an AI crash, you transfer $25k to me.

If you believe that AI is going to keep getting more capable, driving rapid user growth and work automation across sectors, this is near-free money. But to be honest, I think there will likely be an AI crash in the next 5 years, and on average I expect to profit well from this one-year bet.
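To make the implied odds explicit (this is just arithmetic on the terms above): writing $p$ for the probability that the crash condition resolves true within the year, the taker risks \$25,000 to win \$3,125, so the bet has positive expected value for them only if

$$
p \cdot (-\$25{,}000) + (1 - p) \cdot \$3{,}125 > 0
\quad\Longleftrightarrow\quad
p < \frac{3{,}125}{28{,}125} = \frac{1}{9} \approx 11\%.
$$

In other words, "near-free money" is the claim that a qualifying crash in the next year is comfortably less than an ~11% proposition.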

If I win, I want to give the $25k to organisers who can act fast to restrict the weakened AI corps in the wake of the crash. So bet me if you're highly confident that you'll win or just want to hedge the community against the...

Remmelt, if you wish, I'm happy to operationalize a bet. I think you're wrong.

American democracy currently operates far below its theoretical ideal. An ideal democracy precisely captures and represents the nuanced collective desires of its constituents, synthesizing diverse individual preferences into coherent, actionable policy.

Today's system offers no direct path for citizens to express individual priorities. Instead, voters select candidates whose platforms only approximately match their views, guess at which governmental level—local, state, or federal—addresses their concerns, and ultimately rely on representatives who often imperfectly or inaccurately reflect voter intentions. As a result, issues affecting geographically dispersed groups—such as civil rights related to race, gender, or sexuality—are frequently overshadowed by localized interests. This distortion produces presidential candidates more closely aligned with each other's socioeconomic profiles than with the median voter.

Traditionally, aggregating individual preferences required simplifying complex desires into binary candidate selections,...

JWJohnston
I developed this idea here: https://medium.com/@jeffj4a/personal-agents-will-enable-direct-democracy-9413f5607c15. Pretty much the same byline as yours. :-)

Wildly parallel thinking and prototyping. I'd hop on a call.

JWJohnston
Fixed link: https://medium.com/@jeffj4a/personal-agents-will-enable-direct-democracy-9413f5607c15
JWJohnston
Been a while since I posted here. https://medium.com/@jeffj4a/personal-agents-will-enable-direct-democracy-9413f5607c15 

[Thanks to Steven Byrnes for feedback and the idea for section §3.1. Also thanks to Justis from the LW feedback team.]

Remember this?

Or this?

The images are from WaitButWhy, but the idea was voiced by many prominent alignment people, including Eliezer Yudkowsky and Nick Bostrom. The argument is that the difference in brain architecture between the dumbest and smartest human is so small that the step from subhuman to superhuman AI should go extremely quickly. This idea was very pervasive at the time. It's also wrong. I don't think most people on LessWrong have a good model of why it's wrong, and I think because of this, they don't have a good model of AI timelines going forward.

1. Why Village Idiot to Einstein is a Long Road: The Two-Component

...

Remember that we have no a priori reason to suspect that there are jumps in the future; humans perform sequential reasoning differently, so comparisons to the brain are just not informative.

In what way do we do it differently than the reasoning models?

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

Cole Wyeth
Sure
ceba
I'd like to see your work, when it's ready to be shared.
ceba
What environmental selection pressures are there on AGI? That's too vague, isn't it? (What's the environment?) How do you narrow this down to where the questions you're asking are interesting/researchable?

Ah, but you don't even need to name selection pressures to make interesting progress. As long as you know some kinds of characteristics powerful AI agents might have (e.g. goals, self-models), you can start to ask: what goals and self-models will the most-surviving AGIs have?

And you can make progress on both, agnostic of the environment. Then, once you enumerate possible goals and self-models, we can start to think about which selection pressures might influence those characteristics in good directions, and which levers we can pull today to shape those pressures.


This is a cross-post from https://250bpm.substack.com/p/accountability-sinks

Back in the 1990s, ground squirrels were briefly fashionable pets, but their popularity came to an abrupt end after an incident at Schiphol Airport on the outskirts of Amsterdam. In April 1999, a cargo of 440 of the rodents arrived on a KLM flight from Beijing, without the necessary import papers. Because of this, they could not be forwarded on to the customer in Athens. But nobody was able to correct the error and send them back either. What could be done with them? It’s hard to think there wasn’t a better solution than the one that was carried out; faced with the paperwork issue, airport staff threw all 440 squirrels into an industrial shredder.

[...]

It turned out that the order to destroy

...

Well, not to dig in or anything, but if I have a chance to automate something, I'm going to think of it in terms of precision/recall/long tails, not in terms of the joy of being able to blame a single person when something goes wrong. There are definitely better coordination/optimization models than "accountability sinks." I don't love writing a riposte to a concept someone else found helpful, but it really is on the edge between "sounds cool, means nothing" and "actively misleading", so I'm bringing it up.

The Nuremberg defense discussion is sketchy. The author ...

sam
I'm not sure that focusing on the outcomes makes sense when thinking about the psychology of individual soldiers. Presumably refusal was rare enough that most soldiers were unaware of what the outcome of refusal was in practice. I think it would probably be rational for soldiers to expect severe consequences absent being aware of a specific case of refusal going unpunished.
Brinedew
To give an example of how disastrously incompetence can interact with a lack of personal accountability in medicine, a recent horrifying case I found was this one: Doctor indicted without being charged for professional negligence resulting in injury. This YouTube video goes over the case. An excerpt: One aspect I found interesting: Japan's defamation laws are so severe that the hospital staff whistleblowers had to resort to drawing a serialized manga about a "fictional" incompetent neurosurgeon to sound the alarm.

Decades ago, the Cold War was raging. If you don't know what that was, it was the period from 1947 to 1991 during which both the U.S. and the Soviet Union had large stockpiles of nuclear weapons and were threatening to use them on each other. The only thing that stopped them from doing so was the knowledge that the other side would have time to react. Both countries had surveillance systems to know if the other had a nuke in the air headed for them.

On this day, September 26, in 1983, a man named Stanislav Petrov was on duty in the Soviet missile-warning center when the computer notified him that satellites had detected five nuclear missile launches from the U.S. He was...

This is beautiful, but I can't think of anything specific to say, so I'll just give some generic praise. I like how he only used big words when necessary.

AI 2027 lies on a Pareto frontier: it contains the best-researched argument for short timelines, or, equivalently, the shortest timeline backed by thorough research[1]. My own timelines are substantially longer, and there are credible researchers whose timelines are longer still. For this reason, I thought it would be interesting to explore the key load-bearing arguments AI 2027 presents for short timelines. This, in turn, allows for some discussion of signs we can watch for to see whether those load-bearing assumptions are bearing out.

To be clear, while the authors have short timelines, they do not claim that ASI is likely to arrive in 2027[2]. But the fact remains that AI 2027 is a well-researched argument for short timelines. Let's explore that argument.

(In what follows, I will...

elifland
I think that usually in AI safety lingo people use timelines to mean time to AGI and takeoff to mean something like the speed of progression after AGI.
snewman
I added up the median "Predictions for gap size" in the "How fast can the task difficulty gaps be crossed?" table, summing each set of predictions separately ("Eli", "Nikola", "FutureSearch") to get three numbers ranging from 30-75. Does this table cover the time between now and superhuman coder? I thought it started at RE-Bench, because:

* I took all of this to be in context of the phrase, about one page back, "For each gap after RE-Bench saturation"
* The earlier explanation that Method 2 is "a more complex model starting from a forecast saturation of an AI R&D benchmark (RE-Bench), and then how long it will take to go from that system to one that can handle real-world tasks at the best AGI company" [emphasis added]
* The first entry in the table ("Time horizon: Achieving tasks that take humans lots of time") sounds more difficult than saturating RE-Bench.
* Earlier, there's a separate discussion forecasting time to RE-Bench saturation.

But it sounds like I was misinterpreting?

Those estimates do start at RE-Bench, but these are all estimates for how long things would take given the "default" pace of progress, rather than the actual calendar time required. Adding them together ends up with a result that doesn't take into account speedup from AI R&D automation or the slowdown in compute and algorithmic labor growth after 2028.
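As a toy illustration of the distinction (the gap sizes and speedup multipliers below are made-up placeholders, not numbers from the forecast or the table):

```python
# Made-up per-gap estimates at the "default" pace of progress, in months.
default_pace_months = [10, 8, 12, 9]

# Made-up multiplier on the pace of progress while crossing each gap,
# e.g. from increasing AI R&D automation.
speedup_while_crossing = [1.0, 1.3, 1.8, 2.5]

naive_sum = sum(default_pace_months)
calendar_time = sum(m / s for m, s in zip(default_pace_months, speedup_while_crossing))

print(f"Sum of default-pace estimates: {naive_sum} months")        # 39 months
print(f"Calendar time with speedups:   {calendar_time:.1f} months")  # ~26 months
```

A post-2028 slowdown in compute and algorithmic-labor growth would push in the opposite direction, which is why the raw sum is neither an upper nor a lower bound on calendar time.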

AnthonyC
Exactly. More fundamentally, that is not a probability graph, it's a probability density graph, and we're not shown the line beyond 2032 but just have to assume the integral from 2100 → infinity is >10% of the integral from 0 → infinity. Infinity is far enough away that the decay doesn't even need to be all that slow for the total to be that high.
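A small numerical illustration of how little that asks of the tail. The lognormal and its parameters here are arbitrary stand-ins for the unseen density, not anything taken from the post:

```python
from math import erf, log, sqrt

def lognormal_sf(x, median, sigma):
    """P(X > x) for a lognormal with the given median and log-space sigma."""
    z = (log(x) - log(median)) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))

# Years from the mid-2020s until 2100, roughly.
years_to_2100 = 75

# A distribution with a median of only ~15 years still puts ~11% of its mass past 2100.
print(f"P(after 2100) ≈ {lognormal_sf(years_to_2100, median=15, sigma=1.3):.2f}")
```

A density like this can look like it is already falling steadily on a plot truncated at 2032 while still hiding more than 10% of its probability past 2100.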