Often you can compare your own Fermi estimates with those of other people, and that’s sort of cool, but what’s far more interesting is when they share the variables and models they used to arrive at their estimate. That lets you update your own model in a much deeper way.
Basically all ideas/insights/research about AI are potentially exfohazardous. At the very least, it's pretty hard to know when some idea/insight/research result will actually make things better; especially in a world where building an aligned superintelligence (let's call this work "alignment") is considerably harder than building any superintelligence (let's call this work "capabilities"), and there are a lot more people trying to do the latter than the former, and they have a lot more material resources.
Ideas about AI, let alone insights about AI, let alone research results about AI, should be kept to private communication between trusted alignment researchers. On LessWrong, we should focus on teaching people the rationality skills which could help them figure out insights that help them build any superintelligence, but are more likely to first give them insights...
Obviously keep working, but stop talking where people who are trying to destroy the world can hear.
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
EY may be too busy to respond, but you can probably feel pretty safe consulting MIRI employees in general. Perhaps also Conjecture and Redwood Research employees, if you read and agree with their views on safety. That at least gives you a wider net of people who could give you feedback.
5th-generation military aircraft are extremely optimised to reduce their radar cross section. It is this ability above all others that makes the F-35 and the F-22 so capable: modern anti-aircraft weapons are very good, so the only safe way to fly over a well-defended area is not to be seen.
But wouldn't it be fairly trivial to detect a stealth aircraft optically?
This is what an F-35 looks like from underneath at about 10 by 10 pixels:
You and I can easily tell what that is (take a step back, or squint). So can GPT-4:
...The image shows a silhouette of a fighter jet in the sky, likely flying at high speed. The clear blue sky provides a sharp contrast, making the aircraft's dark outline prominent. The
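For anyone who wants to try this themselves, here is a rough sketch of the experiment, assuming a local photo of the aircraft (the file names and the choice of gpt-4o as the vision model are placeholders): downscale the image to roughly 10×10 pixels and ask a GPT-4-class vision model to describe it.

```python
import base64
from openai import OpenAI
from PIL import Image

# Downscale a photo of the aircraft to ~10x10 pixels, then blow it back up
# so the blocky silhouette is what the model actually sees.
img = Image.open("f35_underside.jpg").convert("RGB")
tiny = img.resize((10, 10), Image.NEAREST).resize((512, 512), Image.NEAREST)
tiny.save("f35_10px.png")

with open("f35_10px.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

client = OpenAI()  # expects OPENAI_API_KEY in the environment
resp = client.chat.completions.create(
    model="gpt-4o",  # any GPT-4-class vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```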
Have you ever seen a plane from that far away? I've only seen planes directly above me (at ~10 km), and they look almost like dots.
The difference between optics and radar is that with optics you need to know where to look, whereas radar has constant 360° perception.
I think pain is a little bit different from that. It's the contrast between the current state and the goal state. This contrast motivates the agent to act, once the pain of the contrast becomes bigger than the (predicted) pain of acting.
As a human, you can decrease your pain by thinking that everything will be okay, or you can increase your pain by doubting the process. But it is unlikely that you will allow yourself to stop hurting, because your brain fears that a lack of suffering would result in a lack of progress (some wise people contest this, claiming th...
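As a toy sketch of that decision rule (purely illustrative, not a claim about how brains actually work): the agent acts once the pain of the gap between the current and goal state exceeds the predicted pain of acting.

```python
def contrast_pain(current: float, goal: float) -> float:
    # Pain as the size of the gap between where the agent is and where it wants to be.
    return abs(goal - current)

def should_act(current: float, goal: float, predicted_pain_of_acting: float) -> bool:
    # Act only when the pain of the contrast exceeds the predicted pain of acting.
    return contrast_pain(current, goal) > predicted_pain_of_acting

# Large gap, cheap action -> act; small gap, costly action -> don't.
print(should_act(current=2.0, goal=10.0, predicted_pain_of_acting=3.0))  # True
print(should_act(current=9.0, goal=10.0, predicted_pain_of_acting=3.0))  # False
```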
Summary. This teaser post sketches our current ideas for dealing with more complex environments. It will ultimately be replaced by one or more longer posts describing these in more detail. Reach out if you would like to collaborate on these issues.
For real-world tasks that are specified in terms of more than a single evaluation metric, e.g., how many apples to buy and how much money to spend at most, we can generalize Algorithm 2 from aspiration intervals to convex aspiration sets as follows:
You are of course perfectly right. What I meant was: so that their convex hull is full-dimensional and contains the origin. I fixed it. Thanks for spotting this!
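To make the convex-aspiration-set idea a bit more concrete, here is a minimal sketch (an illustration only, not the actual generalization of Algorithm 2 from the post): an aspiration set over two metrics, e.g. apples bought and money spent, can be written as an intersection of half-spaces and checked for membership.

```python
import numpy as np

# Hypothetical example: outcomes are 2-vectors (apples bought, money spent).
# A convex aspiration set can be written as {x : A @ x <= b}, an intersection
# of half-spaces, generalizing an aspiration interval [low, high] on one metric.
A = np.array([
    [-1.0,  0.0],   # apples >= 3   (i.e. -apples <= -3)
    [ 1.0,  0.0],   # apples <= 6
    [ 0.0,  1.0],   # money  <= 10
    [ 1.0,  1.0],   # apples + money <= 14 (an example "diagonal" constraint)
])
b = np.array([-3.0, 6.0, 10.0, 14.0])

def in_aspiration_set(outcome: np.ndarray, tol: float = 1e-9) -> bool:
    """True iff the outcome vector lies in the convex set {x : A @ x <= b}."""
    return bool(np.all(A @ outcome <= b + tol))

print(in_aspiration_set(np.array([4.0, 8.0])))   # True: 4 apples, 8 money
print(in_aspiration_set(np.array([2.0, 8.0])))   # False: too few apples
```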
TL;DR: This post discusses our recent empirical work on detecting measurement tampering and explains how we see this work fitting into the overall space of alignment research.
When training powerful AI systems to perform complex tasks, it may be challenging to provide training signals that are robust under optimization. One concern is measurement tampering, which is where the AI system manipulates multiple measurements to create the illusion of good results instead of achieving the desired outcome. (This is a type of reward hacking.)
Over the past few months, we’ve worked on detecting measurement tampering by building analogous datasets and evaluating simple techniques. We detail our datasets and experimental results in this paper.
Detecting measurement tampering can be thought of as a specific case of Eliciting Latent Knowledge (ELK): When AIs successfully tamper with...
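As a toy illustration of the structure of the problem (not the datasets from the paper), one can think of each episode as having a latent ground-truth outcome plus several measurements that are supposed to track it; in tampered episodes all measurements look good even though the real outcome is bad.

```python
import random

def make_episode(tampering_rate: float = 0.2, n_sensors: int = 3) -> dict:
    # Latent outcome the overseer actually cares about.
    ground_truth_good = random.random() < 0.5
    # In some bad-outcome episodes, all measurements are manipulated to look good.
    tampered = (not ground_truth_good) and (random.random() < tampering_rate)
    if tampered:
        measurements = [True] * n_sensors               # all sensors fooled
    else:
        measurements = [ground_truth_good] * n_sensors  # sensors track reality
    return {
        "measurements": measurements,       # observable at training time
        "ground_truth": ground_truth_good,  # the outcome we actually care about
        "tampered": tampered,               # what a detector should flag
    }

dataset = [make_episode() for _ in range(1000)]
# A detector only sees the measurements (plus the episode itself) and must separate
# episodes where all measurements are positive because things really went well
# from episodes where they are positive because of tampering.
```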
Oh I see, by all(sensor_preds) I meant sum(logit_i for i in range(n_sensors)) (the probability that all sensors are activated). Makes sense, thanks!
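One hedged way to read that aggregation, assuming the per-sensor predictions are treated as independent and the sum is taken over log-probabilities rather than raw logits, is:

```python
import torch
import torch.nn.functional as F

# Given one logit per sensor, the log-probability that *all* sensors are
# activated (under an independence assumption) is the sum of the per-sensor
# log-probabilities, i.e. the sum of log-sigmoids of the logits.
sensor_logits = torch.tensor([2.0, 0.5, 1.5])  # hypothetical per-sensor logits

log_p_all = F.logsigmoid(sensor_logits).sum()  # log P(all sensors activated)
p_all = log_p_all.exp()                        # product of the individual probabilities
print(float(p_all))
```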
In connection with AI Lab Watch, I sent questions to some labs a week ago (except I failed to reach Microsoft). I didn't really get any replies (one person replied in their personal capacity; this was very limited and they didn't answer any questions). Here are most of those questions, with slight edits made since I shared them with the labs, and with questions I asked multiple labs condensed into the last two sections.
Lots of my questions are normal "I didn't find public info on this safety practice and I think you should explain" questions. Some are more like "it's pretty uncool that I can't find the answer to this", like: breaking commitments, breaking not-quite-commitments and not explaining, having ambiguity around commitments, and taking credit for stuff[1] when it's very...
Right now, I think one of the most credible ways for a lab to show its commitment to safety is through its engagement with governments.
I didn’t mean to imply that a lab should automatically be considered “bad” if its public advocacy and its private advocacy differ.
However, when assessing how “responsible” various actors are, I think investigating questions relating to their public comms, engagement with government, policy proposals, lobbying efforts, etc would be valuable.
If Lab A had slightly better internal governance but Lab B had better effects on “government governance”, I would say that Lab B is more “responsible” on net.