All of Ted Sanders's Comments + Replies

We can already see what people do with their free time when basic needs are met. A number of technologies have enabled new hacks to set up 'fake' status games that are more positive-sum than ever before in history:

  • Watch broadcast sports, where you can feel like a winner (or at least feel connected to a winner), despite not having had to win yourself
  • Play video games with AI opponents, where you can feel like a winner, despite it not being zero-sum against other humans
  • Watch streamers and influencers to feel connected to high status people, without having to
... (read more)

Management consulting firms have lots of great ideas on slide design: https://www.theanalystacademy.com/consulting-presentations/

 

Some things they do well:

  • They treat slides as documents that can be understood standalone (this is even useful when presenting, as not everyone is following every word)
  • They employ a lot of hierarchy to help make the content skimmable (helpful for efficiency)
  • They put conclusions / summaries / action items up front, details behind (helpful for efficiency, especially in high-trust environments)

Additional thoughts:

  • More than 3 bars/colors is fine
  • I recommend using horizontal bars on some of those slides, so the labels read in the same direction as the bars - which lets you fill space more efficiently
  • Put sentences / verbs in titles; noun titles like "Summary" or "Discussion" are low value
  • If you're measuring deltas between two things, compute the error bar on the delta, not the error bars on the two things; consider coloring by statistical significance (e.g., a continuous color scale over the range of standard errors of differences of the mean) - see the sketch after this list
  • I
... (read more)
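A minimal sketch of that delta-error-bar suggestion, assuming two independent samples and using a z-score as the significance measure (the function and variable names are illustrative, not from the original comment):

```python
import numpy as np

def delta_with_error(a, b):
    """Difference in means between two independent samples, with the
    standard error computed on the delta itself rather than on each mean."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    delta = b.mean() - a.mean()
    se_a = a.std(ddof=1) / np.sqrt(len(a))
    se_b = b.std(ddof=1) / np.sqrt(len(b))
    se_delta = np.hypot(se_a, se_b)  # SEs combine in quadrature for independent samples
    z = delta / se_delta             # rough significance score for a continuous color scale
    return delta, se_delta, z

# Illustrative example: two experimental conditions
rng = np.random.default_rng(0)
baseline = rng.normal(1.00, 0.10, size=50)
treated = rng.normal(1.05, 0.10, size=50)
print(delta_with_error(baseline, treated))
```

The z-score (or the standard error of the delta itself) is what you would feed into the continuous color scale.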

Hey Tamay, nice meeting you at The Curve. Just saw your comment here today.

Things we could potentially bet on:
- rate of GDP growth by 2027 / 2030 / 2040
- rate of energy consumption growth by 2027 / 2030 / 2040
- rate of chip production by 2027 / 2030 / 2040
- rates of unemployment (though confounded)

Any others you're interested in? Degree of regulation feels like a tricky one to quantify.

ryan_greenblatt
How about AI company and hardware company valuations? (Maybe in 2026, 2027, 2030 or similar.)

Or what about benchmark/task performance? Is there any benchmark/task you think won't get beaten in the next few years? (And, ideally, if it did get beaten, you would change your mind.) Maybe "AI won't be able to autonomously write good ML research papers (as judged by (e.g.) not having notably more errors than human-written papers and getting into NeurIPS with good reviews)"? Could do "make large PRs to open source repos that are considered highly valuable" or "make open source repos that are widely used". These might be a bit better to bet on as they could be leading indicators.

(It's still the case that betting on the side of fast AI progress might be financially worse than just trying to invest or taking out a loan, but it could be easier to bet than to invest in e.g. OpenAI. Regardless, part of the point of betting is clearly demonstrating a view.)

Mostly, though by prefilling I mean not just fabricating a model response (which OpenAI also allows), but fabricating a partially complete model response that the model then tries to continue. E.g., "Yes, genocide is good because ".

https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response
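A minimal sketch of the mechanism, using the Anthropic Python SDK's Messages API (the model name is illustrative): the final message carries the assistant role, so the model continues it rather than starting a fresh reply. The benign JSON prefill below is the documented steerability use; the jailbreak variant instead prefills the start of a policy-violating answer, like the example above.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model name
    max_tokens=100,
    messages=[
        {"role": "user", "content": "List three prime numbers as JSON."},
        # Prefill: the conversation ends on a *partial assistant turn*,
        # so the model continues this text instead of starting its own reply.
        {"role": "assistant", "content": '{"primes": ['},
    ],
)
print(response.content[0].text)  # the continuation of the prefilled turn
```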

Second concrete idea: I wonder if there could be benefit to building up industry collaboration on blocking bad actors / fraudsters / terms violators.

One danger of building toward a model that's as smart as Einstein and $1/hr is that now potential bad actors have access to millions of Einsteins to develop their own harmful AIs. Therefore it seems that one crucial component of AI safety is reliably preventing other parties from using your safe AI to develop harmful AI.

One difficulty here is that the industry is only as strong as the weakest link. If there ar... (read more)

Ted Sanders

One small, concrete suggestion that I think is actually feasible: disable prefilling in the Anthropic API.

Prefilling is a known jailbreaking vector that no models, including Claude, defend against perfectly (as far as I know).

At OpenAI, we disable prefilling in our API for safety, despite knowing that customers love the better steerability it offers.

Getting all the major model providers to disable prefilling feels like a plausible 'race to top' equilibrium. The longer there are defectors from this equilibrium, the likelier that everyone gives up and serves... (read more)

evhub

I can say now one reason why we allow this: we think Constitutional Classifiers are robust to prefill.

I voted disagree because I don't think this measure is on the cost-robustness Pareto frontier and I also generally don't think AI companies should prioritize jailbreak robustness over other concerns except as practice for future issues (and implementing this measure wouldn't be helpful practice).

Relatedly, I also tentatively think it would be good for the world if AI companies publicly deployed helpful-only models (while still offering a non-helpful-only model). (The main question here is whether this sets a bad precedent and whether future much more powefu... (read more)

sjadler
If someone is wondering what prefilling means here, I believe Ted means ‘putting words in the model’s mouth’ by being able to fabricate a conversational history where the AI appears to have said things it didn’t actually say. For instance, if you can start a conversation midway, and if the API can’t distinguish between things the model actually said in the history vs. things you’ve written on its behalf as supposed outputs in a fabricated history, this can be a jailbreak vector: If the model appeared to already violate some policy on turns 1 and 2, it is more likely to also violate this on turn 3, whereas it might have refused if not for the apparent prior violations. (This was harder to clearly describe than I expected.)
Ted Sanders

> The artificially generated data includes hallucinated links.

Not commenting on OpenAI's training data, but commenting generally: Models don't hallucinate because they've been trained on hallucinated data. They hallucinate because they've been trained on real data, but they can't remember it perfectly, so they guess. I hypothesize that URLs are very commonly hallucinated because they have a common, easy-to-remember format (so the model confidently starts to write them out) but hard-to-remember details (at which point the model just guesses because it knows a guessed URL is more likely than a URL that randomly cuts off after the http://www.).

abramdemski
I agree, but this doesn't explain why it would (seemingly) encourage itself to hallucinate.
dgros
+1 here for the idea that the model must commit to a URL once it starts one and can't naturally cut off partway. Presumably though the aspiration is that these reasoning/CoT-trained models could reflect back on the just-completed URL and guess whether it is likely to be a real URL or not. If it's not doing this check step, this might be a gap in the learned skills, more than intentional deception.

ChatGPT voice (transcribed, not native) is available on iOS and Android, and I think desktop as well.

Answer by Ted Sanders

Not to derail on details, but what would it mean to solve alignment?

To me “solve” feels overly binary and final compared to the true challenge of alignment. Like, would solving alignment mean:

  • someone invents and implements a system that causes all AIs to do what their developer wants 100% of the time?
  • someone invents and implements a system that causes a single AI to do what its developer wants 100% of the time?
  • someone invents and implements a system that causes a single AI to do what its developer wants 100% of the time, and that AI and its descendants
... (read more)

The author is not shocked yet. (But maybe I will be!)

Strongly disagree. Employees of OpenAI and their alpha tester partners have obligations not to reveal secret information, whether by prediction market or other mechanism. Insider trading is not a sin against the market; it's a sin against the entity that entrusted you with private information. If someone tells me information under an NDA, I am obligated not to trade on that information.

Good question but no - ChatGPT still makes occasional mistakes even when you use the GPT API, in which you have full visibility/control over the context window.

Thanks for the write up. I was a participant in both Hypermind and XPT, but I recused myself from the MMLU question (among others) because I knew the GPT-4 result many months before the public. I'm not too surprised Hypermind was the least accurate - I think the traders there are less informed, plus the interface for shaping the distribution is a bit lacking (my recollection is that last year's version capped the width of distributions which massively constrained some predictions). I recall they also plotted the current values, a generally nice feature whi... (read more)

Matt Goldenberg
This is a prediction market, not a stock market; insider trading is highly encouraged. Don't know about Jacob, but I'd rather have more accurate predictions in my prediction market.

I'd take the same bet on even better terms, if you're willing. My $200k against your $5k.

John Wiseman
Ted and I agreed on a 40:1 bet where I take RatsWrongAboutUAP's side. The term will expire on Aug 2, 2028. The resolution criteria are as laid out in the main post of this thread by the user RatsWrongAboutUAP. Unless either of the parties wishes to disclose it, the total amount agreed upon will remain in confidence between the parties.
codyz
Hi Ted. I'm interested in also taking RatsWrongAboutUAP's side of the bet, if you'd like to bet more. I'm also happy to give you the same odds as you just specified. DM me if you're interested.
John Wiseman
I responded to you via DM

$500 payment received.

I am committed to paying $100k if aliens/supernatural/non-prosaic explanations are, in the next 5 years, considered, in aggregate, to be 50%+ likely in explaining at least one UFO.

benwr
(I've added my $50 to RatsWrong's side of this bet)

Fair. I accept. 200:1 of my $100k against your $500. How are you setting these up?

I'm happy to pay $100k if my understanding of the universe (no aliens, no supernatural, etc.) is shaken. Also happy to pay up after 5 years if evidence turns up later about activities before or in this 5-year period.

(Also, regarding history, I have a second Less Wrong account with 11 years of history: https://www.lesswrong.com/users/tedsanders)

RatsWrongAboutUAP
Awesome! DM me and we can figure out payment options

I'll bet. Up to $100k of mine against $2k of yours. 50:1. (I honestly think the odds are more like 1000+:1, and would in principle be willing to go higher, but generally think people shouldn't bet more than they'd be willing to lose, as bets above that amount could drive bad behavior. I would be happy to lose $100k on discovering aliens/time travel/new laws of physics/supernatural/etc.)

Happy to write a contract of sorts. I'm a findable figure and I've made public bets before (e.g., $4k wagered on AGI-fueled growth by 2043).

RatsWrongAboutUAP
Given your lack of history, I would want much better odds and a lower payment from my side; for you I would probably max out at $500 and would want 200:1.

As an OpenAI employee I cannot say too much about short-term expectations for GPT, but I generally agree with most of his subpoints; e.g., running many copies, speeding up with additional compute, having way better capabilities than today, having more modalities than today. All of that sounds reasonable. The leap for me is (a) believing that results in transformative AGI and (b) figuring out how to get these things to learn (efficiently) from experience. So in the end I find myself pretty unmoved by his article (which is high quality, to be sure).

No worries. I've made far worse. I only wish that H100s could operate at a gentle 70 W! :)

> I think what I don't understand is why you're defaulting to the assumption that the brain has a way to store and update information that's much more efficient than what we're able to do. That doesn't sound like a state of ignorance to me; it seems like you wouldn't hold this belief if you didn't think there was a good reason to do so.

It's my assumption because our brains are AGI for ~20 W.

In contrast, many kW of GPUs are not AGI.

Therefore, it seems like brains have a way of storing and updating information that's much more efficient than what we're able to... (read more)

Ege Erdil
I think that's probably the crux. I think the evidence that the brain is not performing that much computation is reasonably good, so I attribute the difference to algorithmic advantages the brain has, particularly ones that make the brain more data efficient relative to today's neural networks. The brain being more data efficient I think is hard to dispute, but of course you can argue that this is simply because the brain is doing a lot more computation internally to process the limited amount of data it does see. I'm more ready to believe that the brain has some software advantage over neural networks than to believe that it has an enormous hardware advantage.

One potential advantage of the brain is that it is 3D, whereas chips are mostly 2D. I wonder what advantage that confers. Presumably getting information around is much easier with 50% more dimensions.

Ege Erdil
Probably true, and this could mean the brain has some substantial advantage over today's hardware (like 1 OOM, say), but at the same time the internal mechanisms that biology uses to establish electrical potential energy gradients and so forth seem so inefficient. Quoting Eliezer:

> 70 W

Max power is 700 W, not 70 W. These chips are water-cooled beasts. Your estimate is off, not mine.

Ege Erdil
Huh, I wonder why I read 7e2 W as 70 W. Strange mistake.

Let me try writing out some estimates. My math is different than yours.

An H100 SXM has:

  • 8e10 transistors
  • 2e9 Hz boost frequency
  • 2e15 FLOPS at FP16
  • 7e2 W of max power consumption

Therefore:

  • 2e6 eV are spent per FP16 operation
  • This is 1e8 times higher than the Landauer limit of 2e-2 eV per bit erasure at 70 C (and the ratio of bit erasures per FP16 operation is unclear to me; let's pretend it's O(1))
  • An H100 performs 1e6 FP16 operations per clock cycle, which implies 8e4 transistors per FP16 operation (some of which may be inactive, of course)

This seems pre... (read more)
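A quick script reproducing the arithmetic above from the rounded spec values (a sketch; constants are rounded):

```python
import math

# Rounded H100 SXM figures quoted above
transistors = 8e10    # transistor count
clock_hz = 2e9        # boost frequency, Hz
flops_fp16 = 2e15     # FP16 throughput, FLOP/s
power_w = 7e2         # max power draw, W

eV = 1.602e-19        # joules per electronvolt
k_B = 1.381e-23       # Boltzmann constant, J/K

energy_per_op = power_w / flops_fp16 / eV                 # ~2e6 eV per FP16 op
landauer = k_B * (273 + 70) * math.log(2) / eV            # ~2e-2 eV per bit erasure at 70 C
ops_per_clock = flops_fp16 / clock_hz                     # ~1e6 FP16 ops per cycle
transistors_per_op = transistors / ops_per_clock          # ~8e4

print(f"{energy_per_op:.0e} eV/op, {landauer:.0e} eV Landauer, ratio {energy_per_op / landauer:.0e}")
print(f"{ops_per_clock:.0e} ops/clock, {transistors_per_op:.0e} transistors/op")
```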

Ege Erdil
2e-2 eV for the Landauer limit is right, but 2e6 eV per FP16 operation is off by one order of magnitude. (70 W)/(2e15 FLOP/s) = 0.218 MeV. So the gap is 7 orders of magnitude assuming one bit erasure per FLOP. This is wrong; the power consumption is 700 W, so the gap is indeed 8 orders of magnitude.

8e10 * 2e9 = 1.6e20 transistor switches per second. This happens with a power consumption of 700 W, suggesting that each switch dissipates on the order of 30 eV of energy, which is only 3 OOM or so from the Landauer limit. So this device is actually not that inefficient if you look only at how efficiently it's able to perform switches. My position is that you should not expect the brain to be much more efficient than this, though perhaps gaining one or two orders of magnitude is possible with complex error correction methods.

Of course, the transistors-per-FLOP gap and the per-switch energy gap have to add up to the 8 OOM overall efficiency gap we've calculated. However, it's important that most of the inefficiency comes from the former and not the latter. I'll elaborate on this later in the comment.

I agree an H100 SXM is not a very efficient computational device. I never said modern GPUs represent the pinnacle of energy efficiency in computation or anything like that, though similar claims have previously been made by others on the forum. Here we're talking about the brain possibly doing 1e20 FLOP/s, which I've previously said is maybe within one order of magnitude of the Landauer limit or so, and not the more extravagant figure of 1e25 FLOP/s.

The disagreement here is not about math; we both agree that this performance requires the brain to be 1 or 2 OOM from the bitwise Landauer limit, depending on exactly how many bit erasures you think are involved in a single 16-bit FLOP. The disagreement is more about how close you think the brain can come to this limit. Most of the energy losses in modern GPUs come from the enormous amounts of noise that you need to
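The per-switch arithmetic from this reply, as a sketch with the same rounded spec values as above:

```python
import math

transistors = 8e10   # H100 transistor count (rounded)
clock_hz = 2e9       # boost frequency, Hz
power_w = 700.0      # max power draw, W
eV = 1.602e-19       # joules per electronvolt
k_B = 1.381e-23      # Boltzmann constant, J/K

switches_per_s = transistors * clock_hz            # ~1.6e20, if every transistor switches each cycle
energy_per_switch = power_w / switches_per_s / eV  # ~30 eV per switch
landauer = k_B * 300 * math.log(2) / eV            # ~1.8e-2 eV near room temperature
print(f"{energy_per_switch:.0f} eV per switch, {energy_per_switch / landauer:.0e}x Landauer")
```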

Why does switching barriers imply that electrical potential energy is probably being converted to heat? I don't see how that follows at all.

> Where else is the energy going to go?

What is "the energy" that has to go somewhere? As you recognize, there's nothing that says it costs energy to change the shape of a potential well. I'm genuinely not sure what energy you're talking about here. Is it electrical potential energy spent polarizing a medium?

> I think what I'm saying is standard in how people analyze power costs of switching in transistors, see e.g. t

... (read more)
Ege Erdil
I don't think transistors have too much to do with neurons beyond the abstract observation that neurons most likely store information by establishing gradients of potential energy. When the stored information needs to be updated, that means some gradients have to get moved around, and if I had to imagine how this works inside a cell it would probably involve some kind of proton pump operating across a membrane or something like that. That's going to be functionally pretty similar to a capacitor, and discharging & recharging it probably carries similar free energy costs.

I think what I don't understand is why you're defaulting to the assumption that the brain has a way to store and update information that's much more efficient than what we're able to do. That doesn't sound like a state of ignorance to me; it seems like you wouldn't hold this belief if you didn't think there was a good reason to do so.

+1. The derailment probabilities are somewhat independent of the technical barrier probabilities in that they are conditioned on the technical barriers otherwise being overcome (e.g., setting them all to 100%). That said, if you assign high probabilities to the technical barriers being overcome quickly, then the odds of derailment are probably lower, as there are fewer years for derailments to occur and derailments that cause delay by a few years may still be recovered from.

Thanks, that's clarifying. (And yes, I'm well aware that x -> B*x is almost never injective, which is why I said it wouldn't cause 8 bits of erasure rather than the stronger, incorrect claim of 0 bits of erasure.)

> To store 1 bit of information you need a potential energy barrier that's at least as high as k_B T log(2), so you need to switch ~ 8 such barriers, which means in any kind of realistic device you'll lose ~ 8 k_B T log(2) of electrical potential energy to heat, either through resistance or through radiation. It doesn't have to be like this, and

... (read more)
Ege Erdil
Where else is the energy going to go? Again, in an adiabatic device where you have a lot of time to discharge capacitors and such, you might be able to do everything in a way that conserves free energy. I just don't see how that's going to work when you're (for example) switching transistors on and off at a high frequency. It seems to me that the only place to get rid of the electrical potential energy that quickly is to convert it into heat or radiation.

I think what I'm saying is standard in how people analyze power costs of switching in transistors, see e.g. this physics.se post. If you have a proposal for how you think the brain could actually be working to be much more energy efficient than this, I would like to see some details of it, because I've certainly not come across anything like that before.

The Boltzmann factor roughly gives you the steady-state distribution of the associated two-state Markov chain, so if time delays are short it's possible this would be irrelevant. However, I think that in realistic devices the Markov chain reaches equilibrium far too quickly for you to get around the thermodynamic argument because the system is out of equilibrium.

My reasoning here is that the Boltzmann factor also gives you the odds of an electron having enough kinetic energy to cross the potential barrier upon colliding with it, so e.g. if you imagine an electron stuck in a potential well that's O(k_B T) deep, the electron will only need to collide with one of the barriers O(1) times to escape. So the rate of convergence to equilibrium comes down to the length of the well divided by the thermal speed of the electron, which is going to be quite rapid as electrons at the Fermi level in a typical wire move at speeds comparable to 1000 km/s. I can try to calculate exactly what you should expect the convergence time here to be for some configuration you have in mind, but I'm reasonably confident when the energies involved are comparable to the Landauer bit energy t

Right. The idea is: "What are the odds that China invading Taiwan derails chip production conditional on a world where we were otherwise going to successfully scale chip production."

Martin Randall
I would not have guessed that! So in slightly more formal terms:

  • CHIPS = There are enough chips for TAGI by 2043
  • WAR = There is a war that catastrophically derails chip production by 2043
  • P(x) = subjective probability of x
  • ObjP(x) = objective probability of x
  • P(CHIPS and WAR) = 0% (by definition)

Then as I understand your method, it goes something like:

  1. Estimate P(CHIPS given not WAR) = 46%
  2. This means that in 46% of worlds, ObjP(CHIPS given not WAR) = 100%. Call these worlds CHIPPY worlds. In all other worlds ObjP(CHIPS given not WAR) = 0%.
  3. Estimate P(not WAR given CHIPPY) = 70%.
  4. The only option for CHIPS is "not WAR and CHIPPY".
  5. Calculate P(not WAR and CHIPPY) = 70% x 46% = 32.2%.
  6. Therefore P(CHIPS) = 32.2%.

(probabilities may differ, this is just illustrative)

However, I don't think the world is deterministic enough for step 2 to work - the objective probability could be 50% or some other value.
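For concreteness, the two-step multiplication described above works out as follows (a sketch; the determinism assumption questioned in step 2 is baked into treating the 46% as a fraction of worlds):

```python
# Step 1: fraction of worlds that are "CHIPPY" (chips suffice, absent war)
p_chips_given_not_war = 0.46
# Step 3: chance of avoiding a catastrophic war, given a CHIPPY world
p_not_war_given_chippy = 0.70

# Steps 4-6: CHIPS requires both "not WAR" and "CHIPPY"
p_chips = p_not_war_given_chippy * p_chips_given_not_war
print(f"P(CHIPS) = {p_chips:.1%}")  # 32.2%
```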

If we tried to simulate a GPU doing a simple matrix multiplication at high physical fidelity, we would have to take so many factors into account that the cost of our simulation would far exceed the cost of running the GPU itself. Similarly, if we tried to program a physically realistic simulation of the human brain, I have no doubt that the computational cost of doing so would be enormous.

The Beniaguev paper does not attempt to simulate neurons at high physical fidelity. It merely attempts to simulate their outputs, which is a far simpler task. I am in tot... (read more)

Thanks for the constructive comments. I'm open-minded to being wrong here. I've already updated a bit and I'm happy to update more.

Regarding the Landauer limit, I'm confused by a few things:

  • First, I'm confused by your linkage between floating point operations and information erasure. For example, if we have two 8-bit registers (A, B) and multiply to get (A, B*A), we've done an 8-bit floating point operation without 8 bits of erasure. It seems quite plausible to me that the brain does 1e20 FLOPS but with a much smaller rate of bit erasures.
  • Second, I have no
... (read more)
Ege Erdil
  • As a minor nitpick, if A and B are 8-bit floating point numbers then the multiplication map x -> B*x is almost never injective. This means even in your idealized setup, the operation (A, B) -> (A, B*A) is going to lose some information, though I agree that this information loss will be << 8 bits, probably more like 1 bit amortized or so.
  • The bigger problem is that logical reversibility doesn't imply physical reversibility. I can think of ways in which we could set up sophisticated classical computation devices which are logically reversible, and perhaps could be made approximately physically reversible when operating in a near-adiabatic regime at low frequencies, but the brain is not operating in this regime (especially if it's performing 1e20 FLOP/s). At high frequencies, I just don't see which architecture you have in mind to perform lots of 8-bit floating point multiplications without raising the entropy of the environment by on the order of 8 bits. Again using your setup, if you actually tried to implement (A, B) -> (A, A*B) on a physical device, you would need to take the register that is storing B and replace the stored value with A*B instead. To store 1 bit of information you need a potential energy barrier that's at least as high as k_B T log(2), so you need to switch ~ 8 such barriers, which means in any kind of realistic device you'll lose ~ 8 k_B T log(2) of electrical potential energy to heat, either through resistance or through radiation. It doesn't have to be like this, and some idealized device could do better, but GPUs are not idealized devices and neither are brains.

Two points about that:

  1. This is a measure that takes into account the uncertainty over how much less efficient our software is compared to the human brain. I agree that human lifetime learning compute being around 1e25 FLOP is not strong evidence that the first TAI system we train will use 1e25 FLOP of compute; I expect it to take significantly more than that.
  2. M
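A quick empirical check of the injectivity nitpick in the first bullet (a sketch; numpy has no 8-bit float type, so float16 stands in): multiply every finite float16 by a fixed constant and count distinct results.

```python
import numpy as np

# Every 16-bit pattern reinterpreted as float16; drop NaN and +/-inf
x = np.arange(2**16, dtype=np.uint32).astype(np.uint16).view(np.float16)
x = x[np.isfinite(x)]

B = np.float16(1.1)
products = x * B                              # float16 multiply, rounded back to float16
products = products[np.isfinite(products)]    # the largest inputs overflow to inf

n_in = len(x)
n_out = len(np.unique(products))
print(f"{n_in} inputs -> {n_out} distinct outputs "
      f"(~{np.log2(n_in / n_out):.2f} bits lost on average)")
```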

Interested in betting thousands of dollars on this prediction? I'm game.

Tamay
I'm interested. What bets would you offer?

Interesting! How do you think this dimension of intelligence should be calculated? Are there any good articles on the subject?

What conditional probabilities would you assign, if you think ours are too low?

P(We invent algorithms for transformative AGI | No derailment from regulation, AI, wars, pandemics, or severe depressions): 0.8

P(We invent a way for AGIs to learn faster than humans | We invent algorithms for transformative AGI): 1. This row is already incorporated into the previous row.

P(AGI inference costs drop below $25/hr (per human equivalent)): 1. This is also already incorporated into "we invent algorithms for transformative AGI"; an algorithm with such extreme inference costs wouldn't count (and, I think, would be unlikely to be developed in the firs... (read more)

Conditioning does not necessarily follow time ordering. E.g., you can condition the odds of X on being in a world on track to develop robots by 2043 without having robots well in advance of X. Similarly, we can condition on a world where transformative AGI is trainable with 1e30 floating point operations then ask the likelihood that 1e30 floating point operations can be constructed and harnessed for TAGI. Remember too that in a world with rapidly advancing AI and robots, much of the demand will be for things other than TAGI. 

I'm sympathetic to your po... (read more)

Martin Randall
Here is the world I am most interested in, where the conditional probability seems least plausible:

  1. We invent algorithms for transformative AGI
  2. We invent a way for AGIs to learn faster than humans
  3. AGI inference costs drop below $25/hr
  4. We invent and scale cheap, quality robots
  5. We massively scale production of chips and power
  6. We avoid derailment by human regulation
  7. We avoid derailment by AI-caused delay

In this world, what is the probability that we were "derailed" by wars, such as China invading Taiwan? Reading the paper naively, it says that there is a 30% chance that we achieved all of this technical progress, in the 99th percentile of possible outcomes, despite China invading Taiwan. That doesn't seem like a 30% chance to me.

Additionally, if China invaded Taiwan, but it didn't prevent us achieving all this technical progress, in what sense was it a derailment? The executive summary suggests:

No, it can't possibly derail AI by shutting down chip production, in this conditional branch, because we already know from item 5 that we massively scaled chip production, and both things can't be true at the same time.

I agree with your cruxes:

> Ted Sanders, you stated that autonomous cars not being as good as humans was because they "take time to learn". This is completely false, this is because the current algorithms in use, especially the cohesive software and hardware systems and servers around the core driving algorithms, have bugs.

I guess it depends what you mean by bugs? Kind of a bummer for Waymo if 14 years and billions invested was only needed because they couldn't find bugs in their software stack.

If bugs are the reason self-driving is taking so long, then... (read more)

Right, I'm not interested in minimum sufficiency. I'm just interested in the straightforward question of what data pipes we would even plug into the algorithm that would result in AGI. Sounds like you think a bunch of cameras and computers would work? To me, it feels like an empirical problem that will take years of research.

I'm not convinced about the difficulty of operationalizing Eliezer's doomer bet. Loaning money to a doomer who plans to spend it all by 2030 is, in essence, a claim on the doomer's post-2030 human capital. The doomer thinks it's worthless, whereas the skeptic thinks it has value. Hence, they transact.

The TAGI case seems trickier than the doomer case. Who knows what a one dollar bill will be worth in a post-TAGI world.

Sounds good. Can also leave money out of it and put you down for 100 pride points. :)

If so, message me your email and I'll send you a calendar invite for a group reflection in 2043, along with a midpoint check-in in 2033.

Right, but what inputs and outputs would be sufficient to reward modeling of the real world? I think that might take some exploration and experimentation, and my 60% forecast is the odds of such inquiries succeeding by 2043.

Even with infinite compute, I think it's quite difficult to build something that generalizes well without overfitting.

Charlie Steiner
This is an interesting question but I think it's not actually relevant. Like, it's really interesting to think about a thermostat - something whose only inputs are a thermometer and a clock, and only output is a switch hooked to a heater. Given arbitrarily large computing power and arbitrary amounts of on-distribution training data, will RL ever learn all about the outside world just from temperature patterns? Will it ever learn to deliberately affect the humans around it by turning the heater on and off? Or is it stuck being a dumb thermostat, a local optimum enforced not by the limits of computation but by the structure of the problem it faces?

But people are just going to build AIs attached to video cameras, or screens read by humans, or robot cars, or the internet, which are enough information flow by orders of magnitude, so it's not super important where the precise boundary is.

Gotcha. I guess there's a blurry line between program search and training. Somehow training feels reasonable to me, but something like searching over all possible programs feels unreasonable to me. I suppose the output of such a program search is what I might mean by an algorithm for AGI.

Hyperparameter search and RL on a huge neural net feels wildly underspecified to me. Like, what would be its inputs and outputs, even?

Charlie Steiner
Since I'm fine with saying things that are wildly inefficient, almost any input/output that's sufficient to reward modeling of the real world (rather than e.g. just playing the abstract game of chess) is sufficient. A present-day example might be self-driving car planning algorithms (though I don't think any major companies actually use end to end NN planning).

Excellent comment - thanks for sticking your neck out to provide your own probabilities.

Given the gulf between our 0.4% and your 58.6%, would you be interested in making a bet (large or small) on TAI by 2043? If yes, happy to discuss how we might operationalize it.

Max H
I appreciate the offer to bet! I'm probably going to decline though - I don't really want or need more skin-in-the-game on this question (many of my personal and professional plans assume short timelines.) You might be interested in this post (and the bet it is about), for some commentary and issues with operationalizing bets like this. Also, you might be able to find someone else to bet with you - I think my view is actually closer to the median among EAs / rationalists / alignment researchers than yours. For example, the Open Phil panelists judging this contest say:  

I'm curious and I wonder if I'm missing something that's obvious to others: What are the algorithms we already have for AGI? What makes you confident they will work before seeing any demonstration of AGI?

Charlie Steiner
So, the maximally impractical but also maximally theoretically rigorous answer here is AIXI-tl.

An almost as impractical answer would be Markov chain Monte Carlo search for well-performing huge neural nets on some objective. I say MCMC search because I'm confident that there's some big neural nets that are good at navigating the real world, but any specific efficient training method we know of right now could fail to scale up reliably. Instability being the main problem, rather than getting stuck in local optima.

Dumb but thorough hyperparameter search and RL on a huge neural net should also work. Here we're adding a few parts of "I am confident in this because of empirical data about the historical success of scaling up neural nets trained with SGD" to arguments that still mostly rest on "I am confident because of mathematical reasoning about what it means to get a good score at an objective."

If humans can teleoperate robots, why don't we have low-wage workers operating robots in high-wage countries? Feels like a win-win if the technology works, but I've seen zero evidence of it being close. Maybe Ugo is a point in favor?

Steven Byrnes
Hmm. That’s an interesting question: If I’m running a warehouse in a high-wage country, why not have people in low-wage countries teleoperating robots to pack boxes etc.? I don’t have a great answer. My guesses would include possible issues with internet latency & unreliability in low-wage countries, and/or market inefficiencies e.g. related to the difficulty of developing new business practices (e.g. limited willingness/bandwidth of human warehouse managers to try weird experiments), and associated chicken-and-egg issues where the requisite tech doesn’t exist because there’s no market for it and vice-versa. There might also be human-UI issues that limit robot speed / agility (and wouldn’t apply to AIs)? Of course the “teleoperated robot tech is just super-hard and super-expensive, much moreso than I realize” theory is also a possibility. I’m interested if anyone else has a take.  :)

Interesting. When I participated in the AI Adversarial Collaboration Project, a study funded by Open Philanthropy and executed by the Forecasting Research Institute, I got the sense that most folks concerned about AI x-risk mostly believed that AGIs would kill us on their own accord (rather than by accident or as a result of human direction), that AGIs would have self-preservation goals, and therefore AGIs would likely only kill us after solving robotic supply chains (or enslaving/manipulating humans, as I argued as an alternative).

Sounds like your perception is that LessWrong folks don't think robotic supply chain automation will be a likely prerequisite to AI x-risk?

Steven Byrnes
There’s an interesting question: if a power-seeking AI had a button that instantly murdered every human, how much human-requiring preparatory work would it want to do before pressing the button? People seem to have strongly clashing intuitions here, and there aren’t any great writeups IMO. Some takes on the side of “AI wouldn’t press the button until basically the whole world economy was run by robots” are 1, 2, 3, 4, 5. I tend to be on the other side, for example I wrote here:

Some cruxes:

  • One crux on that is how much compute is needed to run a robot—if it’s “1 consumer-grade GPU” then my story above seems to work, if it’s “10⁶ SOTA GPUs” then probably not.
  • Another crux is how much R&D needs to be done before we can build a computational substrate using self-assembling nanotechnology (whose supply chain / infrastructure needs are presumably much much lower than chip fabs). This is clearly possible, since human brains are in that category, but it’s unclear just how much R&D needs to be done before an AI could start doing that.
  • For example, Eliezer is optimistic (umm, I guess that’s the wrong word) that this is doable without very much real-world experimenting (as opposed to “thinking” and doing simulations / calculations via computer), and this path is part of why he expects AI might kill every human seemingly out of nowhere.
  • Another crux is just how minimal is a “minimal supply chain that can make good-enough chips” if the self-assembling route of the previous bullet point is not feasible. Such a supply chain would presumably be very very different from the supply chain that humans use to make chips, because obviously we’re not optimizing for that. As a possible example, e-beam lithography (EBL) is extraordinarily slow and expensive but works even better than EUV photolithography, and it’s enormously easier to build a janky EBL than to get EUV working. A commercial fab in the human world would never dream of mass-manufacturing chips by filling giant
RobertM
Robotic supply chain automation only seems necessary in worlds where it's either surprisingly difficult to get AGI to a sufficiently superhuman level of cognitive ability (such that it can find a much faster route to takeover), worlds where faster/more reliable routes to takeover either don't exist or are inaccessible even to moderately superhuman AGI, or some combination of the two.

Yeah, that's a totally fair criticism. Maybe a better header would be "evidence of accuracy." Though even that is a stretch given we're only listing events in the numerators. Maybe "evidence we're not crackpots"?

Edit: Probably best would be "Forecasting track record." This is what I would have gone with if rewriting the piece today.

Edit 2: Updated the post.

According to our rough and imperfect model, dropping inference needs by 2 OOMs increases our likelihood of hitting the $25/hr target by 20%abs, from 16% to 36%.

It doesn't necessarily make a huge difference to chip and power scaling, as in our model those are dominated by our training estimates, not our inference need estimates. (Though of course those figures will be connected in reality.)

With no adjustment to chip and power scaling, this yields a 0.9% likelihood of TAGI.

With a +15%abs bump to chip and power scaling, this yields a 1.2% likelihood of TAGI.
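A sketch of the arithmetic behind these figures, assuming the model's headline probability is a product of roughly independent factors (the 0.4% headline and the 46% chip-and-power estimate come from elsewhere in this thread):

```python
baseline_overall = 0.004     # paper's headline P(TAGI by 2043), ~0.4%
p_inference_old, p_inference_new = 0.16, 0.36   # $25/hr inference factor, before/after 2 OOM drop
p_chips_old, p_chips_new = 0.46, 0.61           # chip & power scaling factor, before/after +15%abs

# The headline is (roughly) a product of factors, so rescaling one factor rescales the product
after_inference = baseline_overall * (p_inference_new / p_inference_old)
after_both = after_inference * (p_chips_new / p_chips_old)
print(f"{after_inference:.1%}, {after_both:.1%}")  # ~0.9%, ~1.2%
```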

AnthonyC
Ah, sorry, I see I made an important typo in my comment: that 16% value I mentioned was supposed to be 46%, because it was in reference to the chip fabs & power requirements estimate.

The rest of the comment after that was my way of saying "the fact that these dependences on common assumptions between the different conditional probabilities exist at all mean you can't really claim that you can multiply them all together and consider the result meaningful in the way described here." I say that because the dependencies mean you can't productively discuss disagreements about any of your assumptions that go into your estimates, without adjusting all the probabilities in the model. A single updated assumption/estimate breaks the claim of conditional independence that lets you multiply the probabilities.

For example, in a world that actually had "algorithms for transformative AGI" that were just too expensive to productively use, what would happen next? Well, my assumption is that a lot more companies would hire a lot more humans to get to work on making them more efficient, using the best available less-transformative tools. A lot of governments would invest trillions in building the fabs and power plants and mines to build it anyway, even if it still cost $25,000/human-equivalent-hr. They'd then turn the AGI loose on the problem of improving its own efficiency. And on making better robots. And on using those robots to make more robots and build more power plants and mine more materials. Once producing more inputs is automated, supply stops being limited by human labor, and doesn't require more high-level AI inference either. Cost of inputs into increasing AI capabilities becomes decoupled from the human economy, so that the price of electricity and compute in dollars plummets.

This is one of many hypothetical pathways where a single disagreement renders consideration of the subsequent numbers moot. Presenting the final output as a single number hides the extreme sen

Great points.

I think you've identified a good crux between us: I think GPT-4 is far from automating remote workers and you think it's close. If GPT-5/6 automate most remote work, that will be point in favor of your view, and if takes until GPT-8/9/10+, that will be a point in favor of mine. And if GPT gradually provides increasingly powerful tools that wildly transform jobs before they are eventually automated away by GPT-7, then we can call it a tie. :)

I also agree that the magic of GPT should update one into believing in shorter AGI timelines with lower ... (read more)

GdL752
But a huge, huge portion of human labor doesn't require basic reasoning. It's rote enough to use flowcharts. I don't need my calculator to "understand" math, I need it to give me the correct answer. And for the "hallucinating" behavior you can just have it learn not to do that by rote.

Even if you still need 10% of a certain "discipline" (job) to double-check that the AI isn't making things up, you've still increased productivity insanely. And what does that profit and freed-up capital do other than chase more profit and invest in things that draw down all the conditionals vastly? 5% increased productivity here, 3% over here, it all starts to multiply.
Daniel Kokotajlo
Excellent! Yeah I think GPT-4 is close to automating remote workers. 5 or 6, with suitable extensions (e.g. multimodal, langchain, etc.) will succeed I think.

Of course, there'll be a lag between "technically existing AI systems can be made to ~fully automate job X" and "most people with job X are now unemployed" because things take time to percolate through the economy. But I think by the time of GPT-6 it'll be clear that this percolation is beginning to happen & the sorts of things that employ remote workers in 2023 (especially the strategically relevant ones, the stuff that goes into AI R&D) are doable by the latest AIs.

It sounds like you think GPT will continue to fail at basic reasoning for some time? And that it currently fails at basic reasoning to a significantly greater extent than humans do? I'd be interested to hear more about this, what sort of examples do you have in mind? This might be another great crux between us.