This seems plausible to me but I could also imagine the opposite being true: my working memory is way smaller than the context window of most models. LLMs would destroy me at a task which "merely" required memorizing 100k tokens and not doing any reasoning; I would do comparatively better at a project which was fairly small but required a bunch of different steps.
The METR report you cite finds that LLMs are vastly cheaper than humans when they do succeed, even for longer tasks:
The ARC-AGI results you cite feel somewhat hard to interpret: they may indicate that the very first models with some capability will be extremely expensive to run, but don't necessarily mean that human-level performance will forever be expensive.
I think the claim is that things with more exposure to AI are more expensive.
Thanks!
You said
If you "withdraw from a cause area" you would expect that if you have an organization that does good work in multiple cause areas, then you would expect you would still fund the organization for work in cause areas that funding wasn't withdrawn from. However, what actually happened is that Open Phil blacklisted a number of ill-defined broad associations and affiliations, where if you are associated with a certain set of ideas, or identities or causes, then no matter how cost-effective your other work is, you cannot get funding from OP
I'm wonder...
I don't have a long list, but I know this is true for Lightcone, SPARC, ESPR, any of the Czech AI-Safety/Rationality community building stuff, and I've heard a bunch of stories since then from other organizations that got pretty strong hints from Open Phil that if they start working in an area at all, they might lose all funding (and also, the "yes, it's more like a blacklist, if you work in these areas at all we can't really fund you, though we might make occasional exceptions if it's really only a small fraction of what you do" story was confirmed to me by multiple OP staff, so I am quite confident in this, and my guess is OP staff would be OK with confirming to you as well if you ask them).
what actually happened is that Open Phil blacklisted a number of ill-defined broad associations and affiliations
is there a list of these somewhere/details on what happened?
You can see some of the EA Forum discussion here: https://forum.effectivealtruism.org/posts/foQPogaBeNKdocYvF/linkpost-an-update-from-good-ventures?commentId=RQX56MAk6RmvRqGQt
The current list of areas that I know about is:
There are a bunc...
Thanks for writing this up! I wonder how feasible it is to just do a cycle of bulking and cutting and then do one of body recomposition and compare the results. I expect that the results will be too close to tell a difference, which I guess just means that you should do whichever is easier.
I think it would help others calibrate, though obviously it's fairly personal.
Possibly too sensitive, but could you share how the photos performed on Photofeeler? Particularly what percentile attractiveness?
Sure, I think everyone agrees that marginal returns to labor diminish with the number of employees. John's claim though was that returns are non-positive, and that seems empirically false.
We have Wildeford's Third Law: "Most >10 year forecasts are technically also AI forecasts".
We need a law like "Most statements about the value of EA are technically also AI forecasts".
Yep that's fair, there is some subjectivity here. I was hoping that the charges from SDNY would have a specific amount that Sam was alleged to have defrauded, but they don't seem to.
Regarding $4B missing: adding in Anthropic gets another $4B on the EA side of the ledger, and Founders Pledge another $1B. The value produced by Anthropic is questionable, and maybe negative of course, but I think by the strict definition of "donated or built in terms of successful companies" EA comes out ahead.
(And OpenAI gets another $80B, so if you count that then I think even the most aggressive definition of how much FTX defrauded is smaller. But obviously OAI's EA credentials are dubious.)
EA has defrauded much more money than we've ever donated or built in terms of successful companies
FTX is missing $1.8B. OpenPhil has donated $2.8B.
I do think it's at the top of frauds in the last decade, though that's a narrower category.
Nikola went from a peak market cap of $66B to ~$1B today, vs. FTX which went from ~$32B to [some unknown but non-negative number].
I also think the Forex scandal counts as bigger (as one reference point, banks paid >$10B in fines), although I'm not exactly sure how one should define the "size" of fraud.[1]
I wouldn't be surprised if there's some precise category in which FTX is the top, but my guess is that you have to define that category fairly precisely.
Wi
Oh yeah, just because it's a reference point doesn't mean that we should copy them.
I think almost any large organization/company would have gone through a much more comprehensive fault-analysis and would have made many measurable improvements.
I claim YCombinator is a counter example.
(The existence of one counterexample obviously doesn't disagree with the "almost any" claim.)
IMO the EA community has had a reckoning, a post-mortem, an update, etc. far more than most social or political movements would (and do) in response to similar misbehavior from a prominent member
As a reference point: fraud seems fairly common in YCombinator-backed companies, but I can't find any sort of postmortem, even about major things like uBiome where the founders are literally fugitives from the FBI.
It seems like you could tell a fairly compelling story that YC pushing founders to pursue risky strategies and flout rules is upstream of this level o...
Thanks for the questions!
Thanks! I mentioned Anthropic in the post, but would similarly find it interesting if someone did a write-up about Cohere. It could be that OAI is not representative for reasons I don't understand.
Instagram is the closest I can think of, but that was ~20x smaller and an acquisition, not an investment
I tried playing the game Nate suggested with myself. I think it updated me a bit more towards Holden's view, though I'm very confident that if I did it with someone who was more expert than I am, both the attacker and the defender would be more competent, and possibly the attacker would win.
Attacker: okay, let's start with a classic: Alignment strategy of "kill all the capabilities researchers so networks are easier to interpret."
Defender: Arguments that this is a bad idea will obviously be in any training set that a human level AI researcher would be train...
Yeah that's correct on both counts (that does seem like an important distinction, and neither really match my experience, though the former is more similar).
I spent about a decade at a company that grew from 3,000 to 10,000 people; I would guess the layers of management were roughly the logarithm in base 7 of the number of people. Manager selection was honestly kind of a disorganized process, but it was basically: impress your direct manager enough that they suggest you for management, then impress your division manager enough that they sign off on this suggestion.
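A quick sanity check on that guess (the headcounts are round numbers, so this is just illustrative):

$$\log_7 3000 = \frac{\ln 3000}{\ln 7} \approx 4.1, \qquad \log_7 10000 = \frac{\ln 10000}{\ln 7} \approx 4.7$$

i.e. roughly 4-5 layers, which corresponds to a span of control of about 7 at each level.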
I'm currently somewhere much smaller, I report to the top layer and have two layers below me. Process is roughly the same.
I realized that I should h...
For what it's worth, I think a naïve reading of this post would imply that moral mazes are more common than my experience indicates.
I've been in middle management at a few places, and in general people just do reasonable things because they are reasonable people, and they aren't ruthlessly optimizing enough to be super political even if that's the theoretical equilibrium of the game they are playing.[1]
This obviously doesn't mean that they are ruthlessly optimizing for the company's true goals though. They are just kind of casually doing things they...
FYI I think your first skepticism was mentioned in the safety from speed section; she concludes that section:
These [objections] all seem plausible. But also plausibly wrong. I don’t know of a decisive analysis of any of these considerations, and am not going to do one here. My impression is that they could basically all go either way.
She mentions your second skepticism near the top, but I don't see anywhere she directly addresses it.
think about how humans most often deceive other humans: we do it mainly by deceiving ourselves... when that sort of deception happens, I wouldn't necessarily expect to be able to see deception in an AI's internal thoughts
The fact that humans will give different predictions when forced to make an explicit bet versus just casually talking seems to imply that it's theoretically possible to identify deception, even in cases of self-deception.
Basic question: why would the AI system optimize for X-ness?
I thought Katja's argument was something like:
The fact that you could repurpose the GAN discriminator in this terrifying way doesn't really seem relevant if no one is in practice doing that?
Thanks for sharing this! Could you make it an actual sequence? I think that would make navigation easier.
Thanks! The point about existence proofs is helpful.
After thinking about this more, I'm just kind of confused about the prompt: Aren't big companies by definition working on problems that can be factored? Because if they weren't, why would they hire additional people?
Ask someone who’s worked in a non-academia cognitive job for a while (like e.g. a tech company), at a company with more than a dozen people, and they’ll be like “lolwut obviously humans don’t factorize problems well, have you ever seen an actual company?”. I’d love to test this theory, please give feedback in the comments about your own work experience and thoughts on problem factorization.
What does "well" mean here? Like what would change your mind about this?
I have the opposite intuition from you: it's clearly obvious that groups of people can accomplish...
Two key points here.
First: a group of 100 people can of course get more done over a month than an individual, by expending 100 times as many man-hours as the individual. (In fact, simple argument: anything an individual can do in a month a group of 100 can also do in a month by just having one group member do the thing independently. In practice this doesn't always work because people get really stupid in groups and might not think to have one person do the thing independently, but I think the argument is still plenty strong.) The question is whether the g...
I think humans actually do use SPI pretty frequently, if I understand correctly. Some examples:
Thanks for sharing this update. Possibly a stupid question: Do you have thoughts on whether cooperative inverse reinforcement learning could help address some of the concerns with identifiability?
There is a set of problems which come from agents intentionally misrepresenting their preferences. But it seems like at least some problems come from agents failing to successfully communicate their preferences, and this seems very analogous to the problem CIRL is attempting to address.
> Start-ups want engineers who are overpowered for the immediate problem because they anticipate scaling, and decisions made now will affect their ability to do that later.
I'm sure this is true of some startups, but it was not true of mine, nor of the ones I was thinking of when I wrote that.
Senior engineers are like… Really good engineers? Not sure how to describe it in a non-tautological way. I somewhat regularly see a senior engineer solve in an afternoon a problem which a junior engineer has struggled with for weeks.
Being able to move that quickly is extr...
Startups sometimes have founders or early employees who are staff (or higher) engineers.
If you are in the former category, EA organizations mo...
EAG London last weekend contained a session with Rohin Shah, Buck Shlegeris and Beth Barnes on the question of how concerned we should be about AGI. They seemed to put roughly 10-30% chance on human extinction from AGI.
Thanks, yeah now that I look closer Metaculus shows a 25% cumulative probability before April 2029, which is not too far off from OP's 30% claim.
Note that Metaculus predictions don't seem to have been meaningfully changed in the past few weeks, despite these announcements. Are there other forecasts which could be referenced?
This post is mainly targeted at people capable of forming a strong enough inside view to get them above 30% without requiring a moving average of experts which may take months to update (since it's a popular question).
For everyone else, I don't think you should update much on this except vis-à-vis the number of other people who agree.
Update: I improved the profile of someone who reached out to me from this article. They went from zero matches in a year to ~2/week.
I think this is roughly the effect size one should expect from following this advice: it's not going to take you from the 5th percentile to the 95th, but you can go from the 20th to the 70th or something.
Why is the Alignment Researcher different than a normal AI researcher?
E.g. Markov decision processes are often conceptualized as "agents" which take "actions" and receive "rewards" etc. and I think none of those terms are "True Names".
Despite this, when researchers look into ways to give MDPs some other sort of capability or guarantee, they don't really seem to prioritize finding True Names. In your dialogue: the AI researcher seems perfectly fine accepting the philosopher's vaguely defined terms.
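(For concreteness, the formalism I have in mind is just the standard tuple below; "agent", "action", and "reward" are informal labels we attach to its pieces rather than deeply principled definitions.)

$$\mathcal{M} = (S, A, P, R, \gamma), \qquad P(s' \mid s, a) \in [0, 1], \qquad R : S \times A \to \mathbb{R}, \qquad \gamma \in [0, 1)$$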
What is it about alignment which makes finding True Names s...
Sure, feel free to DM me.
We (the Center for Effective Altruism) are hiring Full-Stack Engineers. We are a remote-first team, and work on tools which (we hope) better enable others to work on AI alignment, including collaborating with the LessWrong team on the platform you used to ask this question :)
A small suggestion: the counterexample to "penalize downstream", as I understand it, requires there to be tampering in the training data set. It seems conceptually cleaner to me if we can assume the training data set has not been tampered with (e.g. because if alignment only required there to be no tampering in the training data, that would be much easier).
The following counterexample does not require tampering in the training data:
Thanks for sharing your idea!
I'm not an ARC member, but I think assuming that the chip is impossible to tamper with is assuming the conclusion.
The task is to train a reporter which accurately reports the presence of the diamond, even if we are unable to tell whether tampering has occurred (e.g. because the AI understands some esoteric physics principle which lets it tamper with the chip in a way we don't understand). See the section on page 6 starting with "You might try to address this possibility by installing more cameras and sensors..."
Thanks!
I've been trying to understand this paragraph:
...That is, it looks plausible (though still <50%) that we could improve these regularizers enough that a typical “bad” reporter was a learned optimizer which used knowledge of direct translation, together with other tricks and strategies, in order to quickly answer questions. For example, this is the structure of the counterexample discussed in Section: upstream. This is still a problem because e.g. the other heuristics would often “misfire” and lead to bad answers, but it is a promising starting point becau...
In "Strategy: penalize computation time" you say:
> At first blush this is vulnerable to the same counterexample described in the last section [complexity]... But the situation is a little bit more complex... the direct translator may be able to effectively “re-use” that inference rather than starting from scratch
It seems to me that this "counter-counterexample" also applies to complexity – if the translator is able to reuse computation from the predictor, wouldn't that reduce both the complexity and the time?
(You don't explicitly state that this "reuse" is only helpful for time, so maybe you agree it is also helpful for complexity – just trying to be sure I understand the argument.)
I think children can be prosecuted in any state but the prosecution of parents is more novel and was a minor controversy during the last presidential campaign.
Note that the REBench correlation is definitionally uninformative, since all tasks have the same length (there's no variance in length to correlate against). SWAA similarly has range restriction, though not as severe.
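(A toy illustration of the range-restriction point, with made-up numbers: when every task has the same length there is no variance to correlate against, so the statistic is undefined rather than informative.)

```python
import numpy as np

# Made-up numbers, purely to illustrate the range-restriction point.
task_length_hours = np.full(10, 8.0)   # every task budgeted at the same length
model_success = np.random.rand(10)     # hypothetical per-task success rates

# Pearson correlation divides by the standard deviation of each variable;
# with zero variance in task length the statistic is undefined
# (numpy returns nan and emits a warning).
print(np.corrcoef(task_length_hours, model_success)[0, 1])
```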