gwern
Concrete benchmark proposals for how to detect mode-collapse and AI slop and ChatGPTese, and why I think this might be increasingly important for AI safety, to avoid 'whimper' or 'em hell' kinds of existential risk: https://gwern.net/creative-benchmark
Here's a quick, interesting-seeming devinterp result: we can estimate the Local Learning Coefficient (LLC, the central quantity of singular learning theory; for more info see these posts / papers) of a simple grokking model on its training data over the course of training. This yields the following plot (note: estimated LLC = lambdahat = ^λ).

What's interesting is that the estimated LLC in this plot closely tracks test loss, even though it is estimated on training data. On the one hand, this is unsurprising: SLT predicts that the LLC determines the Bayes generalization error in the Bayesian setting.[1] On the other hand, it is quite surprising: the Bayesian setting is not the same as SGD, an increase in training steps is not the same as an increase in the total number of samples, and Bayes generalization error is not exactly the same as test loss. Despite these differences, the LLC clearly tracks (in-distribution) generalization here. We see this as a positive sign for applying SLT to study neural networks trained by SGD.

This plot was made using the devinterp Python package, and the code to reproduce it (including hyperparameter selection) is available as a notebook at https://github.com/timaeus-research/devinterp/blob/main/examples/grokking.ipynb. Thanks to Nina Panickssery and Dmitry Vaintrob, whose earlier post on learning coefficients of modular addition served as the basis for this experiment.

1. ^ More precisely: in the Bayesian setting, the Bayes generalization error, as a function of the number of samples n, is λ/n to leading order.
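For readers who want a feel for what's being estimated: below is a minimal, self-contained PyTorch sketch of an SGLD-based LLC estimator. This is not the devinterp package's actual API - the function name, hyperparameters, and loop structure are illustrative assumptions, and the linked notebook is the authoritative version - but it shows the basic recipe: sample from a tempered posterior localized around the trained weights w*, then compare the average sampled loss to the loss at w*.

```python
import copy
import math
import torch


def estimate_llc(model, loader, loss_fn, n, num_burnin=100, num_draws=200,
                 lr=1e-4, gamma=100.0, beta=None, device="cpu"):
    """Estimate lambda-hat = n * beta * (E_w[L_n(w)] - L_n(w*)), where the expectation
    is over SGLD samples from the tempered posterior localized around the weights w*."""
    beta = beta if beta is not None else 1.0 / math.log(n)  # standard SLT inverse temperature
    w_star = [p.detach().clone() for p in model.parameters()]

    # Loss at the point w* whose LLC we are estimating (average over the loader).
    model.eval()
    with torch.no_grad():
        base_loss = sum(loss_fn(model(x.to(device)), y.to(device)).item()
                        for x, y in loader) / len(loader)

    sampler = copy.deepcopy(model).train()
    draws, data_iter = [], iter(loader)
    for step in range(num_burnin + num_draws):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        x, y = x.to(device), y.to(device)

        loss = loss_fn(sampler(x), y)
        sampler.zero_grad()
        loss.backward()

        with torch.no_grad():
            for p, p0 in zip(sampler.parameters(), w_star):
                # SGLD drift: gradient of the tempered loss plus a localizing pull
                # back toward w*, followed by Gaussian noise of variance lr.
                drift = n * beta * p.grad + gamma * (p - p0)
                p.add_(-0.5 * lr * drift + math.sqrt(lr) * torch.randn_like(p))

        if step >= num_burnin:
            draws.append(loss.item())

    return n * beta * (sum(draws) / len(draws) - base_loss)
```

Running this at checkpoints over training, as in the notebook, produces the lambdahat-versus-training-step curve described above.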
To avoid deploying a dangerous model, you can either (1) test the model pre-deployment, or (2) test a similar older model with tests that have a safety buffer, such that if the old model is below some conservative threshold it's very unlikely that the new model is dangerous.

DeepMind says it uses the safety-buffer plan (but it hasn't yet operationalized thresholds/buffers as far as I know). Anthropic's original RSP used the safety-buffer plan; its new RSP doesn't really use either plan (kinda safety-buffer but it's very weak). (This is unfortunate.)

OpenAI seemed to use the test-the-actual-model plan.[1] This isn't going well. The 4o evals were rushed because OpenAI (reasonably) didn't want to delay deployment. Then the o1 evals were done on a weak o1 checkpoint rather than the final model, presumably so they wouldn't be rushed, but this presumably hurt performance a lot on some tasks (and indeed the o1 checkpoint performed worse than o1-preview on some capability evals). OpenAI doesn't seem to be implementing the safety-buffer plan, so if a model is dangerous but not super obviously dangerous, it seems likely OpenAI wouldn't notice before deployment.... (Yay OpenAI for honestly publishing eval results that don't look good.)

1. ^ It's not explicit. The PF says e.g. 'Only models with a post-mitigation score of "medium" or below can be deployed.' But it also mentions forecasting capabilities.
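To make the two plans concrete, here is a toy sketch of the gating logic (my illustration; the scores, thresholds, and buffer are placeholders, not any lab's actual numbers):

```python
# Toy sketch of the two deployment-gating strategies described above. All scores,
# thresholds, and buffer sizes are made-up placeholders.

def safe_to_deploy_direct(new_model_score: float, danger_threshold: float) -> bool:
    """Plan (1): run the dangerous-capability evals on the model you actually deploy."""
    return new_model_score < danger_threshold

def safe_to_deploy_buffered(old_model_score: float, danger_threshold: float,
                            safety_buffer: float) -> bool:
    """Plan (2): eval an older model against a more conservative threshold, chosen so
    that even a capability jump of `safety_buffer` between the old and new model
    stays below the danger threshold."""
    return old_model_score < danger_threshold - safety_buffer

# e.g. danger threshold 0.5 with buffer 0.2: the older model must score below 0.3
assert safe_to_deploy_buffered(0.25, danger_threshold=0.5, safety_buffer=0.2)
```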
habryka
Reputation is lazily evaluated

When evaluating the reputation of your organization, community, or project, many people flock to surveys in which you ask randomly selected people what they think of your thing, or what their attitudes towards your organization, community, or project are. If you do this, you will very reliably get back data that looks like people are indifferent to you and your projects, and your results will probably be dominated by extremely shallow things like "do the words in your name invoke positive or negative associations?"

People largely only form opinions of you or your projects when they have some reason to do so, like trying to figure out whether to buy your product, join your social movement, or vote for you in an election. You basically never care about what people think of you while they're engaged in activities completely unrelated to you; you care about what people will do when they have to take an action that is related to your goals. But the former is exactly what you are measuring in attitude surveys.

As an example of this (used here for illustrative purposes, and what caused me to form strong opinions on this, but not intended as the central point of this post): many leaders in the Effective Altruism community ran various surveys after the collapse of FTX, trying to understand what the reputation of "Effective Altruism" is. The results were basically always the same: people mostly didn't know what EA was, and had vaguely positive associations with the term when asked. The people who had recently become familiar with it (which weren't that many) did lower their opinions of EA, but the vast majority of people did not (because they mostly didn't know what it was).

As far as I can tell, these surveys left most EA leaders thinking that the reputational effects of FTX were limited. After all, most people never heard about EA in the context of FTX, and seemed to mostly have positive associations with the term, and the average like…
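(A toy illustration of the title's metaphor, mine rather than the post's: in the lazy-evaluation sense, the opinion simply doesn't get computed until something forces it.)

```python
# Toy illustration of "lazily evaluated" reputation: the expensive opinion-forming
# step only runs when something forces it. Names and numbers are made up.
from functools import cached_property

class PublicOpinionOfYourOrg:
    def __init__(self, available_evidence):
        self.available_evidence = available_evidence

    @cached_property
    def opinion(self):
        # Only computed the first time someone actually needs an opinion,
        # e.g. when deciding whether to buy, join, donate, or vote.
        print("...actually thinking about your org now...")
        return sum(self.available_evidence) / len(self.available_evidence)

random_survey_respondent = PublicOpinionOfYourOrg([0.2, 0.9, 0.4])
# Nothing has been evaluated yet; a survey here mostly measures name associations.
decision_time = random_survey_respondent.opinion  # evaluation is forced only here
```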
People who dislike AI, and therefore could be taking risks from AI seriously, are instead having reactions like this: https://blue.mackuba.eu/skythread/?author=brooklynmarie.bsky.social&post=3lcywmwr7b22i

Why? If we soberly evaluate what this person has said about AI, and just, like, think about why they would say such a thing - well, what do they seem to mean? They typically say "AI is destroying the world" (someone said that in the comments), but then roll their eyes at the idea that AI is powerful. They say the issue is water consumption - why would someone repeat that idea? Under what framework is that a sensible combination of things to say? What consensus are they trying to build? What about the article are they responding to?

I think there are straightforward answers to these questions that are reasonable and good on behalf of the people who say these things, but are not as effective by their own standards as they could be, and which miss upcoming concerns. I could say more about what I think, but I'd rather post this as leading questions, because I think the reading of the person's posts you'd need to do, to go from the questions I just asked to my opinions, will build more of the model I want to convey than saying it directly.

But I think the fact that articles like this get reactions like this is an indication that orgs like Anthropic or PauseAI are not engaging seriously with detractors, and trying seriously to do so seems to me like a good idea. It's not my top priority ask for Anthropic, but it's not very far down the virtual list. But it's just one of many reactions of this category I've seen that seem to me to indicate that people engaging with a rationalist-type negative attitude towards their observations of AI are not communicating successfully with people who have an ordinary-person-type negative attitude towards what they've seen of AI. I suspect that at least a large part of the issue is that rationalists have built up antibodies to a certain ki…

Popular Comments

Recent Discussion

I'll keep this short.

  • Google’s Willow quantum chip significantly outpaces current supercomputers, solving tasks in minutes that would otherwise take billions of years.
  • Hypothesis: Accelerating advances in tech and AI could mean quantum computers capable of breaking today's cryptography arrive sooner than the 2030s, contrary to expert predictions.
  • Legacy banking systems, being centralized, could transition faster to post-quantum-safe encryption by freezing transfers, re-checking processes, and migrating to new protocols in a controlled manner.
  • Decentralized cryptocurrencies face bigger challenges:
    • Hard forks are difficult to coordinate across a decentralized network.
    • Transitioning to quantum-safe algorithms could lead to longer transaction signatures and significantly higher fees, eroding trust in the system (a rough size comparison is sketched below the list).
  • If quantum computers compromise current cryptography, tangible assets (e.g., real estate, stock indices) may retain more value compared to digital assets like crypto.
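As a rough illustration of the signature-size point above (figures quoted from memory for the named parameter sets; treat them as assumptions, not authoritative specs):

```python
# Rough comparison of signature sizes in bytes. Approximate figures; check the
# current NIST / Bitcoin documents before relying on them.
sig_bytes = {
    "Schnorr/ECDSA (secp256k1, Bitcoin today)": 64,
    "Falcon-512 (lattice-based, NIST-selected)": 666,
    "ML-DSA-44 / Dilithium2 (lattice-based, NIST-selected)": 2420,
}

baseline = sig_bytes["Schnorr/ECDSA (secp256k1, Bitcoin today)"]
for scheme, size in sig_bytes.items():
    print(f"{scheme}: ~{size} B (~{size / baseline:.0f}x today's signatures)")

# If fees scale roughly with transaction weight, signatures that are 10-40x larger
# push per-transaction fees up by a similar factor unless block limits change.
```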

Thoughts?

interstice
I don't see how your first bullet point is much evidence for the second, unless you have reason to believe that the Willow chip has a level of performance much greater than experts predicted at this point in time.
G10

I meant extrapolating developments in the future.

Dagon

Paying the (ongoing, repeated) pirate-game blackmail ("pay us or we'll impose a wealth tax") IS a form of wealth tax.  You probably need to be more specific about what kinds and levels of wealth tax could happen with various counterfactual assumptions (without those assumptions, there's no reason to believe anything is possible except what actually exists).

daijin
Applied to a local scale, this feels similar to the notion that we should employ our willpower to allow burnout as discussed here
daijin
I made a v2 of this shortform that answers your point with an example from recent history.
daijin
I made a v2 of this shortform

This is a pilot post for a future sequence. I'm posting to get feedback on what's here so far; I like the argument, but I'm unsure about the presentation, so there's a chance I'll totally rewrite this for the final product.


We moderns think of philosophy and science as sharply distinct disciplines. However, the term 'science' didn't really start getting its modern sense until sometime in the last few centuries. Etymologically, the word comes from the Latin root scire, which just means "to know". According to Etymonline, it wasn't until around the time of the Enlightenment that the term slowly started referring to the movement we know today, the one that systematically studies the physical world by close observation and careful modeling.

Indeed, some of the earliest work...

What do you think about the core concept of Explanatory Fog, that is, secrecy leading to distrust leading to a viral mental breakdown, possibly leading eventually to the end of civilisation? Happy to rework it if the core concept is good.

Many contra dances have tried a "first-time free" policy, but I think "second-time free" is usually a better choice. Most first-timers show up expecting to pay something, so you might as well take their money. Many first-timers don't end up coming back, so better to reserve your incentive for the ones that do. And it filters out people who are only interested in free experiences. BIDA switched in 2022 and I think this has gone very well.

I think the main case where first-time free would be a better fit is a community where you're having a lot of trouble getting people to come out and try the dance but you're pretty sure that if they do they'll stick around. But while contra can be pretty addictive I don't think it's that addictive.

(This isn't something we came up with; for example, here's Tucson in 2008.)

Raemon
Mod note: I normally leave this sort of post on Personal Blog because it's pretty niche, but I frontpaged this one because (a) it was short, and (b) the principle seemed like it might generalize to other marketing situations.
plex

Good call! I have a use for this idea :)

jefftk
Funny! I almost deleted the cross-post because it seemed too short to be interesting here.
Dagon
Do you have a strategy doc or list of criteria for promotional programs you're considering? My first question, as an outsider, would be "why give ANY freebies at all?" My second would be "what dimensions of price discrimination are important?" (whether they've tried other dances, what their economic status is, whether they're seeking to meet people or just to dance, etc.) And third, "what is the budget for promos?"

My mental model is that if it's a financial hardship for someone, they're probably not going to become a regular attendee just because one event is free. For that case, you need some sort of subsidy/discount that goes on longer than one event.

If the main thing is to reduce perceived risk, as opposed to actual cost, you should consider a refund option - for anyone, they can ask for a refund if they didn't get their money's worth. Limit it to once per attendee - they can come back without penalty, but the risk is now on them. This also gives a great feedback channel to learn WHY a new attendee didn't find it worth the price.

There's something intuitively intriguing about the concept of shoes with spring elements - something that made many kids excited about getting "moon shoes", though they found the actual item rather disappointing. Using springs with legged movement also makes some logical sense: walking and running involve cyclic energy changes, and the Achilles tendon stores some elastic energy. Is that perspective missing something?
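(As a back-of-envelope check of the energy scales involved; the numbers below are purely illustrative, not from the post.)

```python
# How much energy could a linear spring in a shoe plausibly store per footstrike?
# All parameter values are made-up illustrative numbers.
body_weight_N = 700      # ~70 kg runner
spring_travel_m = 0.05   # 5 cm of compression under full body weight
# A linear spring loaded up to force F over travel x stores E = (1/2) * F * x.
stored_energy_J = 0.5 * body_weight_N * spring_travel_m
print(f"~{stored_energy_J:.1f} J stored per footstrike")  # 17.5 J with these numbers
```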

big springs

In a sense, spring elements in shoes are standard: sneakers have elastic foam in the soles, and maximizing the sole bounciness does slightly improve running performance. Jumping stilts are the modern version of spring shoes. They actually work, which also means they're quite dangerous - if what the kids who wanted moon shoes imagined was accurate, their parents wouldn't have bought that for them. The concept...

This was a quick and short post, but some people ended up liking it a lot. In retrospect I should've written a bit more, maybe gone into the design of recent running shoes. For example, the Nike Alphafly has a somewhat thick heel made of springy foam that sticks out behind the heel of the foot, and in the front there's a "carbon plate" (a thin sheet of carbon fiber composite) which also acts like a spring. In the future, there might be gradual evolution towards more extreme versions of the same concept, as recent designs become accepted. Running shoes wi…

eukaryote
Review: This was just a really good post. It starts off imaginative and on something I'd never really thought about - hey, spring shoes are a great idea, or at least the dream of them is. It looks at different ways this has sort of been implemented, checks assumptions, goes down to the basic physics of it, and then explores some related ideas. I like someone who's just interested in a very specific thing exploring the idea critically from different angles and from the underlying principles. I want to read more posts like this. I also, now, want shoes with springs on them.

In a recent essay, Euan McLean suggested that a cluster of thought experiments “viscerally capture” part of the argument against computational functionalism. Without presenting an opinion about the underlying claim about consciousness, I will explain why these arguments fail as a matter of computational complexity. Which, parenthetically, is something that philosophers should care about.

To explain the question, McLean summarizes part of Brian Tomasik’s essay "How to Interpret a Physical System as a Mind." There, Tomasik discusses the challenge of attributing consciousness to physical systems, drawing on Hilary Putnam's "Putnam's Rock" thought experiment. Putnam suggests that any physical system, such as a rock, can be interpreted as implementing any computation. This is meant to challenge the idea that computation alone defines consciousness. It challenges computational functionalism by implying that...

Nathan Helm-Burger
That doesn't quite follow to me. The book seems just as inert as the popcorn-to-consciousness map. The book doesn't change the incoming slip of paper (popcorn) in any way; it just responds with an inert static map to result in an outgoing slip of paper (consciousness), utilizing the map-and-popcorn-analyzing-agent-who-lacks-understanding (the man in the room).

The book in the Chinese Room directs the actions of the little man in the room. Without the book, the man doesn't act, and the text doesn't get translated.

The popcorn map on the other hand doesn't direct the popcorn to do what it does. The popcorn does what it does, and then the map in a post-hoc way is generated to explain how what the popcorn did maps to some particular calculation.

You can say that "oh well, then, the popcorn wasn't really conscious until the map was generated; it was the additional calculations that went into generating the map that rea…
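A toy version of the post-hoc-map point, for concreteness (the example computation and names are mine, not the commenter's):

```python
# For ANY recorded sequence of popcorn states, we can always build a lookup table
# afterwards that "interprets" it as some chosen target computation.

def target_computation_trace():
    """The computation we'd like to claim the popcorn 'performed': 3 + 4."""
    return [("load", 3), ("load", 4), ("add", 7)]

def build_posthoc_map(popcorn_states, target_trace):
    """Pair whatever the popcorn happened to do with the target trace, after the
    fact. The map plays no role in producing either sequence; it's just a dict."""
    return dict(zip(popcorn_states, target_trace))

popcorn_states = ["kernel_17_popped", "kernel_3_popped", "kernel_42_popped"]
print(build_posthoc_map(popcorn_states, target_computation_trace()))

# Contrast with the Chinese Room: the book is consulted *during* the process and
# determines what comes out; remove it and the behavior changes. Remove this
# dictionary and the popcorn pops exactly as it would have anyway.
```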

notfnofn
I'm going to read https://www.scottaaronson.com/papers/philos.pdf, https://philpapers.org/rec/PERAAA-7, and the appendix here: https://www.lesswrong.com/posts/dkCdMWLZb5GhkR7MG/ (as well as the actual original statements of Searle's Wall, Johnston's popcorn, and Putnam's rock), and when that's eventually done I might report back here or make a new post if this thread is long dead by then
Noosphere89
IMO, the entire Chinese room thought experiment dissolves into clarity once we remember that the intuitive meaning of "understanding" is formed around algorithms that are not lookup tables; because trying to create an infinite lookup table would be infeasible in reality, our intuitions go wrong in extreme cases. I agree with the Discord comments here on this point: The portion of the argument I contest is step 2 here (it's a summarized version): Or this argument here: As stated, if the computer program is accessible to him, then for all intents and purposes he does understand Chinese for the purposes of interacting, until the program is removed (assuming that it completely characterizes Chinese and works correctly for all inputs). I think the key issue is that people don't want to accept that if we were completely unconstrained physically, even very large lookup tables would be a valid answer to making an AI that is useful.
ChristianKl
What I believe about Sacks' views comes from regularly listening to the All-In Podcast, where he regularly talks about AI. Sacks is smarter and more sophisticated than that. In the real world, efforts of the Department of Homeland Security that started with censoring for reasons of national security ended up increasing the scope of what they censor. In the end the lab leak theory got censored, and if you asked the Department of Homeland Security for their justification, there's a good chance they would say "national security".
Akash

What I believe about Sacks' views comes from regularly listening to the All-In Podcast, where he regularly talks about AI.

Do you have any quotes or any particular podcast episodes you recommend?

if you asked the Department of Homeland Security for their justification, there's a good chance they would say "national security".

Yeah, I agree that one needs to have a pretty narrow conception of national security. In the absence of that, there's concept creep in which you can justify pretty much anything under a broad conception of national security. (Indeed, I…