FireStormOOO

Hmm, I guess I see why other calculators have at least some additional heuristics and aren't straight Kelly.  Going bankrupt is not infinitely bad in the US.  If the insured has low wealth, there's likely a loan attached to any large asset, which really complicates the math.  Making W just be "household wealth" also doesn't model "I can replace the loss next paycheck".  I'm not sure what exactly the correct notion of wealth is here, but if wealth is small compared to future earnings and replacing the loss can be deferred, straight Kelly's assumptions don't hold.

And obviously, paying $10k premium to insure a 50% chance of a $10k loss is always a mistake for all wealth levels.  You're choosing to be bankrupt in 100% of possible worlds instead of 50%.

This seems like a very handy calculator to have bookmarked.

I think I did find a bug: At the low end it's making some insane recommendations.  E.g. with wealth W and a 50% chance of loss W (50% chance of getting wiped out), the insurance recommendation is any premium up to W.

Wealth $10k, risk 50% on $9999 loss, recommends insure for $9900 premium.

That log(W-P) term is shooting off towards -infinity and presumably breaking something?

Edit: As papetoast points out, this is a faithful implementation of the Kelly criterion and is not a bug.  Rather, Kelly assumes that taking a loss >= wealth is infinitely bad, which is not true in an environment where debts are dischargeable in bankruptcy (and total wealth may even remain positive throughout).
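Concretely (my reconstruction of what such a calculator is presumably computing, not its actual code): the maximum acceptable premium P* is where expected log wealth is the same with and without insurance,

log(W − P*) = p·log(W − L) + (1 − p)·log(W)

With W = $10,000, L = $9,999, p = 0.5, that gives W − P* = sqrt(1 × 10,000) = 100, i.e. P* = $9,900, matching the recommendation above.  And as L → W the right-hand side goes to −infinity, so any premium short of W clears the bar.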

There are probably corrections that would improve the model by factoring in future earnings, the degree to which the loss must be replaced immediately (or at all), and the degree to which some losses are capped.
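A minimal sketch of what those corrections might look like on top of straight Kelly (function names and parameters are mine and purely illustrative; this assumes the calculator is maximizing expected log wealth):

```python
import math

def max_premium(wealth, loss, prob, floor=0.0, future_earnings=0.0, tol=0.01):
    """Largest premium worth paying under expected-log-wealth (Kelly) utility.

    Illustrative corrections to straight Kelly:
      - floor: wealth you'd keep even in the worst case (e.g. because much of
        the loss is dischargeable in bankruptcy), capping the real downside.
      - future_earnings: discounted future income counted as wealth, for the
        "I can replace the loss next paycheck" case.
    """
    w = wealth + future_earnings

    def log_u(x):
        return math.log(x) if x > 0 else float("-inf")

    # Expected log wealth if uninsured, with the downside capped at `floor`.
    uninsured = prob * log_u(max(w - loss, floor)) + (1 - prob) * log_u(w)

    # Binary-search the premium at which insuring stops beating going bare.
    lo, hi = 0.0, w
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_u(w - mid) > uninsured:
            lo = mid  # insuring still preferred; a higher premium is tolerable
        else:
            hi = mid
    return lo

print(round(max_premium(10_000, 9_999, 0.5)))               # 9900, the straight-Kelly answer above
print(round(max_premium(10_000, 9_999, 0.5, floor=2_000)))  # ~5528, once the downside is capped
```

Even a modest floor on post-loss wealth pulls the recommended maximum premium well below the pathological $9,900.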

Related, I noticed Civ VI also really missed the mark with that mechanic.  With a modest tech lead, I found a great strategy was to lean into coal power (which has the best bonuses), get seawalls built to keep your own coastal cities from flooding, and flood everyone else with sea-level rise.  Only one player wins, so anything that sabotages the others in the endgame is very tempting.

Rise of Nations had an "Armageddon counter" on the use of nuclear weapons, which mostly resulted in exactly the behavior you mentioned - get 'em first and employ them liberally right up to the cap.

Fundamentally, both games lack any provision for complex, especially multilateral, agreements, and there's no way to get the AI on the same page.

Your examples seem to imply that believing QI means such an agent would in full generality be neutral on an offer to have a quantum coin tossed, where they're killed in their sleep on tails, since they only experience the tosses they win.  Presumably they accept all such trades offering epsilon additional utility.  And presumably other agents keep making such offers since the QI agent doesn't care what happens to their stuff in worlds they aren't in.  Thus such an agent exists in an ever more vanishingly small fraction of worlds as they continue accepting trades.

I should expect to encounter QI agents approximately never as they continue self-selecting out of existence in approximately all of the possible worlds I occupy.  For the same reason, QI agents should expect to see similar agents almost never.

From the outside perspective, this seems to be in a similar vein to the fact that all computable agents exist in some strained sense (every program, and more generally every possible piece of data, is encodable as some integer, and so exists exactly as much as the integers do), even if they're never instantiated.  For any other observer, a QI agent becomes indistinguishable from those never-instantiated agents in the limit.

Please point out if I misunderstood or misrepresented anything.

Answer by FireStormOOO

I'll note that malicious compliance is a very common response to being given a task that's not straightforwardly possible with the resources available, with no channel to simply communicate that without retaliation.  BSing an answer, or giving a technically-correct / rules-as-written response, is often just the best available strategy if one isn't in a position to fix the evaluator's broken incentives.

An actual human's chain of thought would be a lot spicier if their boss asked them to produce a document with working links without providing internet access.

"English" keeps ending up as a catch-all in K-12 for basically all language skills and verbal reasoning skills that don't obviously fit somewhere else.  Read and summarize fiction - English, Write a persuasive essay - English, grammar pedantry - English, etc.

That link currently redirects the reader to https://siderea.dreamwidth.org/1209794.html

(just in case the old one stops working)

Good clarification; it's not just the amount of influence, but something about the way influence is exercised being unsurprising given the task.  Central not just in terms of "how much influence", but also along whatever other axes that sort of influence could vary?

I think if the agent's action space is still so unconstrained that there's room to consider benefit or harm flowing through modification of the principal's values, it's probably still been given too much latitude.  Once we have informed consent, because the agent has communicated the benefits and harms as best it understands them, it should have very little room to be influenced by benefits and harms it thought too trivial to mention (by virtue of their triviality).

At the same time, it's not clear the agent should, absent further direction, reject the offer to brainwash the principal for resources, as opposed to punting to the principal.  Maybe the principal thinks those values are an improvement and it's free money? [e.g. Prince's insurance company wants to bribe him to stop smoking.]

FireStormOOO

WRT non-manipulation, I don't suppose there's an easy way to have the AI track how much potentially manipulative influence it's "supposed to have" in the context and avoid exercising more than that influence?

Or possibly better, compare simple implementations of the principal's instructions and penalize interpretations with large/unusual influence on the principal's values.  Preferably without prejudicing interventions that straightforwardly protect the principal's safety and communication channels.

The principal should, for example, be able to ask the AI to "teach them about philosophy" without it either going out of its way to ensure the principal doesn't change their mind about anything as a result of the instruction, or unduly influencing them with subtly chosen explanations or framing.  The AI should exercise an "ordinary" amount of influence, typical of the ways an AI could go about implementing the instruction.

Presumably there's a distribution over how manipulative/anti-manipulative (value-preserving) any given implementation of the instruction is, and we may want the AI to prefer central implementations rather than extremely value-preserving ones.

Ideally the AI should also worry that it's contemplating exercising more or less influence than desired, and clarify that as it would any other aspect of the task.
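As a toy sketch of the "prefer central implementations" idea (everything here is hypothetical; in particular, task_score and value_influence stand in for oracles nobody knows how to build):

```python
import statistics

def pick_plan(candidate_plans, task_score, value_influence, penalty_weight=1.0):
    """Toy selection rule: among candidate implementations of an instruction,
    prefer ones whose influence on the principal's values is typical of the
    candidate pool, penalizing the unusually manipulative and the unusually
    value-preserving alike.
    """
    influences = {plan: value_influence(plan) for plan in candidate_plans}
    central = statistics.median(influences.values())  # the "ordinary" amount of influence

    def adjusted_score(plan):
        return task_score(plan) - penalty_weight * abs(influences[plan] - central)

    return max(candidate_plans, key=adjusted_score)
```

The point is only that the penalty is on distance from the pool's typical influence, not on influence per se.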

You're very likely correct IMO.  The only thing I see pulling in the other direction is that cars are far more standardized than humans, and a database of detailed blueprints for every make and model could drastically reduce the resolution needed for usefulness.  Especially if the action on a cursory detection is "get the people out of the area and scan it harder", not "rip the vehicle apart".
