LESSWRONG

All of Aaron Bergman's Comments + Replies

Sharing https://earec.net, semantic search for the EA + rationality ecosystem. Not fully up to date, sadly (doesn't have the last month or so of content). The current version is basically a minimum viable product!

On the results page there is also an option to see EA Forum-only results, which lets you sort by a weighted combination of karma and semantic similarity thanks to the API.

Unfortunately there's no corresponding system for LessWrong because of (perhaps totally sensible) rate limits (the EA Forum offers a bots site for use cases like t...
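For anyone curious what "a weighted combination of karma and semantic similarity" could look like, here is a minimal sketch. It is not earec.net's actual formula, which isn't given here: the `Result` type, the log scaling of karma, the `max_karma` cap, and the default `karma_weight` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    karma: int
    similarity: float  # assumed to be a semantic similarity score in [0, 1]


def combined_score(result, karma_weight=0.3, max_karma=500):
    """Blend karma and similarity into one score (hypothetical formula)."""
    # Log-scale karma so a handful of very high-karma posts don't dominate,
    # then clamp to [0, 1] so both terms are on a comparable scale.
    karma_part = math.log1p(max(result.karma, 0)) / math.log1p(max_karma)
    karma_part = min(karma_part, 1.0)
    return karma_weight * karma_part + (1 - karma_weight) * result.similarity


def rank(results, karma_weight=0.3):
    """Return results sorted from highest to lowest combined score."""
    return sorted(
        results,
        key=lambda r: combined_score(r, karma_weight),
        reverse=True,
    )
```

Setting `karma_weight=0` gives a pure semantic ranking, and raising it lets popular posts float upward even when they match the query a bit less closely.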

Announcing the EA Archive

Aaron Bergman 2y 10

I mean the reason is that I've never heard of that haha. Perhaps it should be

(My understanding of) What Everyone in Technical Alignment is Doing and Why

Aaron Bergman 3y 10

Ngl I did not fully understand this, but to be clear I don't think understanding alignment through the lens of agency is "excessively abstract." In fact I think I'd agree with the implicit default view that it's largely the single most productive lens to look through. My objection to the status quo is that it seems like the scale/ontology/lens/whatever I was describing is getting 0% of the research attention whereas perhaps it should be getting 10 or 20%.

Not sure this analogy works, but if NIH was spending $10B on cancer research, I would (prima facie, as a layperson) want >$0 but probably <$2B spent on looking at cancer as an atomic-scale phenomenon, and maybe some amount at an even smaller scale.

3 the gears to ascension 3y

yeah I was probably too abstract in my reply - to rephrase: a thermostat (or other extremely small control system) is a perfectly valid example of agency. it's not dangerously strong agency or any such thing. but my point is really to say that you're on the right track here, looking at the micro-scale versions of things is very promising.

(My understanding of) What Everyone in Technical Alignment is Doing and Why

Aaron Bergman 3y 1-1

Note: I'm probably well below median commenter in terms of technical CS/ML understanding. Anyway...

I feel like a missing chunk of research could be described as “seeing DL systems as ‘normal,’ physical things and processes that involve electrons running around inside little bits of (very complex) metal pieces” instead of mega-abstracted “agents.”

The main reason this might be fruitful is that, at least intuitively and to my understanding, failures like “the AI stops just playing chess really well and starts taking over the world to learn how to play c...

1 the gears to ascension 3y

I can understand why it would seem excessively abstract, but when we speak of agency, we are in fact talking about patterns in the activations of the gpu's circuit elements - specifically we'd be talking about patterns of numerical feedback where the program forms a causal predictive model of a variable and then, based on the result of the predictive model, does any form of model-predictive control, eg outputting bytes (floats, probably) that encode an action that the action-conditional predictive model evaluates as likely to impact the variable. Merely minimizing loss is insufficient to end up with this outcome in many cases, but on some datasets, with some problem formulations - ones that we expect to come up, such as motor control of a robot in order to walk across a room, for a trivial example, or trying to select videos which maximize probability that a user stays on the website - we can expect that the predictive model, if more precise about the future than a human's predictive model, would allow the gpu code to select actions (motor actions or video selections) that have higher reliability of reaching the target outcome (cross the room, ensure the user stays on the site) that the control loop code evaluated via the predictive model. The worry is that, if an agent is general enough in purpose to form its own subgoals and evaluate those in the predictive model, it could end up doing multi-step plan chaining through this general world-simulator subalgorithm and realize it can attack its creators in one of a great many possible ways.
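To make the "predictive model plus model-predictive control" loop concrete, here is a toy sketch using the thermostat example from elsewhere in this thread. Everything in it is an assumption for illustration: the temperature dynamics in `predict_next_temp`, the constants, and the simplification that the controller holds one action fixed over the whole planning horizon. It has no subgoals or general world model, so it only shows the harmless, thermostat-sized end of the spectrum.

```python
def predict_next_temp(temp, heater_on, outside=10.0, leak=0.1, heat=2.0):
    """Hypothetical one-step predictive model of room temperature."""
    # The room drifts toward the outside temperature; the heater adds heat.
    return temp + leak * (outside - temp) + (heat if heater_on else 0.0)


def choose_action(temp, target, horizon=3):
    """Pick the action whose simulated outcome lands closest to the target."""
    best_action, best_error = None, float("inf")
    for heater_on in (False, True):
        # Roll the predictive model forward, conditional on this action.
        simulated = temp
        for _ in range(horizon):
            simulated = predict_next_temp(simulated, heater_on)
        error = abs(simulated - target)
        if error < best_error:
            best_action, best_error = heater_on, error
    return best_action


def run(temp=15.0, target=20.0, steps=20):
    """Closed loop: predict, act, observe, repeat."""
    history = []
    for _ in range(steps):
        action = choose_action(temp, target)
        # Toy assumption: the real world behaves exactly like the model.
        temp = predict_next_temp(temp, action)
        history.append((action, temp))
    return history
```

The worry described above is about what happens when the action space, the predictive model, and the planning horizon are all vastly richer than this, not about loops like this one.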

Most Ivy-smart students aren't at Ivy-tier schools

Aaron Bergman 3y 20

Banneker Key! Yeah I was in a very similar position, but basically made the opposite choice (largely because financial costs not internalized)

Most Ivy-smart students aren't at Ivy-tier schools

Aaron Bergman 3y 10

Yeah that's gotta be it, nice catch!

What Are You Tracking In Your Head?

Aaron Bergman 3y 42

One answer to the question for me:

While writing, something close to "how does this 'sound' in my head naturally, when read, in an aesthetic sense?"

I've thought for a while that "writing quality" largely boils down to whether the writer has a salient and accurate intuition about how the words they're writing come across when read.

Half-baked AI Safety ideas thread

Aaron Bergman 3y 40

Ah late to the party! This was a top-level post aptly titled "Half-baked alignment idea: training to generalize" that didn't get a ton of attention.

Thanks to Peter Barnett and Justis Mills for feedback on a draft of this post. It was inspired by Eliezer's Lethalities post and Zvi's response.
Central idea: can we train AI to generalize out of distribution?
I'm thinking, for example, of an algorithm like the following:
Train a GPT-like ML system to predict the next word given a string of text only using, say, grade school-level w

Aaron Bergman 3y* 30

Thank you, Solenoid! The SSC podcast is the only reason I'm able to consume all of posts like Biological Anchors: A Trick That Might Or Might Not Work.

2 Solenoid_Entity 3y

Glad to hear it's useful :)

Half-baked alignment idea: training to generalize

Aaron Bergman 3y 10

Thanks. It's similar in one sense, but (if I'm reading the paper right) a key difference is that in the MAML examples, the ordering of the meta-level and object level training is such that you still wind up optimizing hard for a particular goal. The idea here is that the two types of training function in opposition, as a control system of sorts, such that the meta-level training should make the model perform worse at the narrow type of task it was trained on.

That said, for sure, the type-of-distribution-shift thing is an issue. It seems like this meta-level bias might be less bad than at the object level, but I have no idea.

Aaron Bergman's Shortform

Aaron Bergman 3y 10

Training to generalize (and training to train to generalize, etc.)

Inspired by Eliezer's Lethalities post and Zvi's response:

Has there been any research or writing on whether we can train AI to generalize out of distribution?

I'm thinking, for example:

1. Train a GPT-like ML system to predict the next word given a string of text only using, say, grade school-level writing (this is one instance of the object level
2. Assign the system a meta-level award based on how well it performs (without any additional training) at generating the next word from more advance
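A very rough toy version of the two visible steps, just to pin down the shape of the idea. Everything here is a stand-in: a count-based bigram model replaces the GPT-like system, two made-up sentences replace the "grade school" and "more advanced" corpora, and picking a smoothing constant by its out-of-distribution score replaces whatever meta-level optimization would actually be used. The one property it is meant to show is that the advanced text is never trained on, only scored.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Object level: count next-word frequencies on the simple corpus."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def avg_log_prob(counts, tokens, vocab_size, k):
    """Average next-word log-probability with add-k smoothing."""
    total, n = 0.0, 0
    for prev, nxt in zip(tokens, tokens[1:]):
        row = counts.get(prev, Counter())
        prob = (row[nxt] + k) / (sum(row.values()) + k * vocab_size)
        total += math.log(prob)
        n += 1
    return total / n


# Made-up stand-ins for the two text distributions.
simple_text = "the cat sat on the mat the dog sat on the rug".split()
advanced_text = "the feline reclined on the ornate rug".split()
vocab_size = len(set(simple_text) | set(advanced_text))

# Object-level training uses only the simple text.
model = train_bigram(simple_text)

# Meta level: score each candidate on the advanced text with no further
# training, and keep whichever candidate generalizes best.
candidates = [0.01, 0.1, 1.0]
meta_scores = {
    k: avg_log_prob(model, advanced_text, vocab_size, k) for k in candidates
}
best_k = max(meta_scores, key=meta_scores.get)
```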

Aaron Bergman 3y 10

MichaelStJules is right about what I meant. While it's true that preferring not to experience something doesn't necessarily imply that the thing is net-negative, it seems to me very strong evidence in that direction.

1 frankybegs 3y

Hi, instead of clogging up the thread I just thought I'd alert you that I responded to MichaelStJules, which should function equally as a response to your comment.

Animal welfare EA and personal dietary options

Aaron Bergman 4y 120

Entirely agree. There are certainly chunks of my life (as a privileged first-worlder) I'd prefer not to have experienced, and these generally seem less bad than "an average period of the same duration as a Holocaust prisoner." Given that animals are sentient, I'd put it at ~98% that their lives are net negative.

0 frankybegs 3y

Preferring not to experience something is not the same thing as it being net negative. You are comparing it to a baseline of your normal life (because not experiencing it is simply continuing to experience your usual utility level).
