Comments

"Overtraining" isn't Chinchilla; Chinchilla is just "training". The overtraining being advocated was supra-Chinchilla, with the logic that while you were departing from compute-optimal training, sure, you were more than making up for it with the compute savings in the deployment phase, which the Chinchilla scaling laws do not address in any way. So there was a fad for training small models for a lot longer.

The pondering happens in earlier layers of the network, not in the output

Then how does it produce any tokens...?

then training on task Y could inadvertently bias the model to do more or less pondering on mostly-unrelated-but-statistically-correlated topic X.

But if that is what is going on, and the model initially learns to ponder by accident due to bogus feedback or error, then the spurious correlation should eventually be figured out: the model ponders more, but the reward does not increase, and so the pondering gets unlearned.

(Also, this assumes that RL gives an average reward of 0.0, which I don't know if that's true in practice.)

I think the mean would be taken out by the advantage estimation, so the RLHF continues to increase the probability of generating the tokens from episodes with above-average reward, and decrease the probability of generating the tokens from the below-average-reward episodes. The effect is as if the average reward were always 0.
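
A minimal sketch of that mean-centering point (the rewards here are made-up numbers, not from any real RLHF run): subtracting a group-mean baseline makes the advantages sum to zero, so whatever the raw reward scale, above-average episodes get reinforced and below-average ones punished.

```python
# Toy illustration: a mean baseline centers the rewards, so the update
# behaves as if the average reward were always 0.
def advantages(rewards):
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

rewards = [2.0, 5.0, 8.0]      # hypothetical raw rewards, all positive
adv = advantages(rewards)      # [-3.0, 0.0, 3.0]
assert abs(sum(adv)) < 1e-9    # zero mean: below-average episodes get negative advantage
```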

What would be the implications? The model could develop a political bias to think more deeply about topics related to party X, where X is whichever party has more users giving the model positive feedback. Even if the other topics on party X's agenda are never explicitly talked about (!)

That sounds like the pondering's conclusions are then related to the task.

This idea could very well be wrong. The gradients may be weakened during backpropagation before they get to the unrelated ideas, because the ideas did not directly contribute to the task.

Under a straightforward RLHF using PPO, I think there wouldn't be much weakening, because the REINFORCE operator conceptually just rewards (or punishes) all tokens generated during an episode, without making much attempt to decide which were 'good' or 'bad'. (That's why it's so high-variance.) Any advantage function trying to remove some of that variance probably won't do a good job.
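
A toy sketch of that credit-assignment point (the log-probs and advantages are invented numbers): in the plain REINFORCE surrogate loss, every token in an episode receives the same scalar advantage, so all of them are pushed up or down uniformly, whether or not any individual token helped.

```python
# Sketch of REINFORCE's blanket credit assignment: one episode-level
# advantage multiplies the log-probs of *all* sampled tokens.
def reinforce_loss(token_logprobs, advantage):
    # Surrogate loss: -A * sum(log pi(token)); its gradient moves every
    # token's probability in the same direction, scaled by A.
    return -advantage * sum(token_logprobs)

logprobs = [-0.5, -1.2, -0.3]   # hypothetical log-probs of the sampled tokens
loss_good = reinforce_loss(logprobs, advantage=+1.0)
loss_bad = reinforce_loss(logprobs, advantage=-1.0)
assert loss_good == -loss_bad   # same tokens, opposite update direction
```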

More problematically for your idea, if the conclusions are indeed 'unrelated to the task', then shouldn't they be just as likely to arise in every episode - including the ones where it got negative reward? That would seem like it ought to exactly cancel out any learning of 'pondering'.

You need some incentive somewhere to learn good 'pondering'. (I have an example proposal for 'free play' which tries to teach a sort of 'pondering', but by stopping gradients, so anything learned in the initial steps is 'free', and so it can meta-learn to screw around and get something useful for free.)

Maybe it would look more random if you presented it segmented by token instead of decoded into characters? I'm not familiar with the LLaMA tokenizations, but you seem to imply that a lot of the apparent patterns here are single tokens (e.g. "partiellement" would be very surprising to me as the output of greedy likelihood-minimizing sampling, but is trivial if it is a single BPE token). Decoding such tokens into ordinary words would create a misleading impression of coherence.
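
One way to check, sketched with a made-up token list (not real LLaMA tokenizer output): print the sample with explicit token boundaries rather than as a decoded string, so an apparently coherent word that is really a single BPE token stands out as one sampling step.

```python
# Hypothetical check: rendering token boundaries explicitly reveals when
# an apparently coherent word was produced by a single sampling step.
def show_tokens(tokens):
    return "|".join(tokens)

# If "partiellement" is one token, the segmented view shows the sample is
# three draws, not a dozen coherent character choices:
sample = ["partiellement", "衡", "д"]
print(show_tokens(sample))   # partiellement|衡|д
```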

Also, as Baginski notes, greedily sampling to minimize likelihood will no more minimize the total likelihood than greedily maximizing likelihood would maximize the total likelihood. So it would be worth trying at least 'worst-of-n' sampling to see if it looks more like what you expect, in the same way that best-of-n often helps produce more expected LLM output. (After all, you would expect the tiniest logits to be the worst-estimated of all logits, right? Full of sheer numerical noise and error, given that this is pushing 'dark knowledge' to its extremes. Who can really say how much better or worse an answer '衡' is than 'д' when following '*', etc.? So if best-of-n can make such a qualitative difference when greedily sampling from the best-estimated logits...)
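
A hedged sketch of the worst-of-n idea (the candidate sequences and scores are stand-ins for real sampler output): draw n sequences however you like, score each by its total log-likelihood, and keep the lowest-scoring one, mirroring best-of-n with the comparison inverted.

```python
# Worst-of-n selection over pre-scored candidates: pick the sequence with
# the lowest total log-likelihood, the inverse of best-of-n reranking.
def worst_of_n(candidates):
    # candidates: list of (sequence, total_logprob) pairs
    return min(candidates, key=lambda c: c[1])[0]

candidates = [
    ("plausible text", -12.3),   # hypothetical scores from n sampled sequences
    ("weirder text", -45.6),
    ("weirdest text", -78.9),
]
assert worst_of_n(candidates) == "weirdest text"
```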

Note that text in pretraining may even be an expensive way to go about it: one of the most dramatic demonstrations MS gave us with Sydney was the incredible speed & efficiency of web-search-powered adversarial attacks on LLMs. You don't need to dump a lot of samples onto the Internet and pray they make it into the training data and don't get forgotten, if you can set up a single sample with good SEO and the LLM kindly retrieves it for you and attacks itself with your sample.

This is something to think about: it's not just making it into the training data, it's making it into the agent's prompt or context that can matter. People are currently talking about how Deep Research is an example of the AI trend which will drive paywalls everywhere... which may happen, but consider the positives for people who don't put up paywalls.

Why not just 'valuable information', in a Value of Information sense of 'valuable'?

The estimate of the compute of their largest version ever (which is a very helpful way to phrase it) at only <=50x GPT-4 is quite relevant to many discussions (props to Nesov) and something Altman probably shouldn't've said.

The estimate of test-time compute at 1000x effective-compute is confirmation of looser talk.

The scientific research part is of uncertain importance but we may well be referring back to this statement a year from now.

Apropos of very low-latency LLMs and revisiting this topic a little: what does this imply about DRL robotics, rather than animals? Will DRL NNs have to have brains as big as humans in order to run superhuman humanoid robots?

One possible implication is that Portia-like NNs are possible for robotics in general. Robotics may be quite 'easy' in that sense.

It is striking that when we look at NN parameter/FLOPS-counts, we generally do not see 'large' robotics, vision, or sound models, but LLMs; the largest pure-vision models like PaLI-X are <100b parameters, the largest robotics models are usually <10b, with Gato 1's ~1b having been, if anything, unusually large because of all the other stuff it was doing. (I'm very behind on the robotics literature, so maybe there are now much larger 100b-parameter models as they move into the 'foundation model' multi-modal/task scaling paradigm, but I'd bet that there still are none >1,000b.) Even sound/image/video generative models, which would be expected to be much larger than necessary for robotics tasks, are often still small enough to run on a single consumer GPU. And these are usually trained with scaling laws now, so these are compute-optimal sizes, and it is not just that they are wildly under-parameterized (the way almost all models were pre-2020).

So, if robotics is intrinsically easy, but animal brains do not show this because of their latency requirements, which forces them into misleadingly expensive brains, the implication is that we can do robotics by lifting the limitations of biological brains, like being forced to learn in realtime, in the real world, one animal at a time, without any sharing.

We should be able to train deep but small NNs in silico: turning all animal problems into Portia problems, if you will, pausing the simulation to let the NNs think & act for as long as necessary to plan the right action, and only then letting time flow to see what happens, and reset it to try again.

We remove all burdens of wallclock time or caloric consumption or childhood development, which are powerful general robotic controllers, and only then use these teacher-models to optimize low-latency controllers. The wider low-latency student models will be easier to train when they simply must imitate the teacher in a supervised-learning setting instead of RL from scratch, and so the size should be a lot better. (If nothing else, the student models can't 'die' if they make a mistake like breaking a latency constraint, so this learning setting is way easier than an animal's task.)


On a related note, it is also striking how far down in size LLMs can be pushed. You can get good reasoning out of tiny billion-parameter LLMs trained hard enough on high-quality-enough data, and the 'densifying experience curve' is steady and rapid (halving period of ~4 months), so we can expect that at some point we may have superhuman reasoning LLMs in the billion or sub-billion parameter range... which are just very, very ignorant, perhaps even more ignorant than you or me, of all the real-world knowledge & text that a proper LLM has. We can't train those from scratch, but we can train trillion-parameter LLMs to suck in all the text in the world, and then exhale training data for small fast cheap models.

So it seems that Moravec's Paradox remains undefeated: as difficult as we find abstract intellectual capabilities like the process of doing math or reasoning (so difficult that we struggle to even write them down to train LLMs on, so difficult to train on that we need giant gigawatt datacenters just to get started), they are not intrinsically difficult, and in the long run do not require big expensive NNs.

But does that necessarily matter? Many of those models can't use tools; and since much of the point of the end-to-end RL training of Deep Research is to teach tool use, showing DR results without tool use would be either irrelevant or misleading (eg. it might do worse than the original o3 model it is trained from, when deprived of the tools it is supposed to use).

Who right now is standing on the sidelines with a killer AI app that could rip up the market if only tokens were a bit cheaper?

OpenAI's Deep Research is looking like something that could be big and they were standing on the sidelines in part because the tokens weren't cheap.