LessOnline Festival

Join us May 31st - June 2nd in Berkeley, CA for a festival of truth-seeking, optimization, and blogging. We'll have writing workshops, rationality classes, puzzle hunts, and thoughtful conversations across a sprawling fractal campus of nooks and whiteboards.

elifland
The word "overconfident" seems overloaded. Here are some things I think people sometimes mean when they say someone is overconfident:

1. They gave a binary probability that is too far from 50% (I believe this is the original one).
2. They overestimated a binary probability (e.g. they said 20% when it should be 1%).
3. Their estimate is arrogant (e.g. they say there's a 40% chance their startup fails when it should be 95%), or maybe they give an arrogant vibe.
4. They seem too unwilling to change their mind upon arguments (maybe their credal resilience is too high).
5. They gave a probability distribution that seems wrong in some way (e.g. "50% AGI by 2030 is so overconfident, I think it should be 10%").
   * This one is pernicious in that any probability distribution gives very low percentages for some range, so being specific here seems important.
6. Their binary estimate or probability distribution seems too different from some sort of base rate, reference class, or expert(s) that they should defer to.

How much does this overloading matter? I'm not sure, but one worry is that it allows people to score cheap rhetorical points by claiming someone else is overconfident when in practice they might mean something like "your probability distribution is wrong in some way". Beware of accusing someone of overconfidence without being more specific about what you mean.
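To make the difference between senses 1 and 2 concrete, here is a small illustration of my own (not from the comment above): a proper scoring rule like the Brier score doesn't punish a probability for being far from 50%, only for being far from the truth.

```python
# Toy illustration: Brier score only punishes "overconfidence" in sense 2
# (miscalibration), not sense 1 (being far from 50%).
def brier(p, outcome):
    """Squared error between forecast p and the 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2

# Forecaster A gives an extreme probability and the event happens.
print(brier(0.99, 1))  # 0.0001 -- "far from 50%" but rewarded
# Forecaster B gives the same extreme probability and the event doesn't happen.
print(brier(0.99, 0))  # 0.9801 -- extreme and wrong is punished hard
# Forecaster C hedges at 50% either way.
print(brier(0.50, 1))  # 0.25
```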
RobertM
Vaguely feeling like OpenAI might be moving away from the GPT-N+1 release model, for some combination of "political/frog-boiling" reasons and "scaling actually hitting a wall" reasons. Seems relevant to note, since in the worlds where they hadn't been drip-feeding people incremental releases of slight improvements over the original GPT-4 capabilities, and had instead just dropped GPT-5 (and it was as much of an improvement over 4 as 4 was over 3, or close), that might have prompted people to do an explicit orientation step. As it is, I expect less of that kind of orientation to happen. (Though maybe I'm speaking too soon and they will drop GPT-5 on us at some point, and it'll still manage to be a step-function improvement over whatever the latest GPT-4* model is at that point.)
Nisan
12 years ago, in The state of Computer Vision and AI: we are really, really far away, Andrej Karpathy wrote:

> The picture above is funny.
>
> But for me it is also one of those examples that make me sad about the outlook for AI and for Computer Vision. What would it take for a computer to understand this image as you or I do? [...]
>
> In any case, we are very, very far and this depresses me. What is the way forward? :(

I just asked gpt-4o what's going on in the picture, and it understood most of it:

> In this image, a group of men in business attire are seen in a locker room or a similar setting. The focus is on two men, where the taller man is standing on a scale. The shorter man, who appears to be playfully pressing down on the taller man's shoulders to increase his weight on the scale, is creating a humorous situation. Both men and those observing in the background are smiling or laughing, indicating that they are enjoying the lighthearted moment. The man pressing down seems to be taking part in a playful prank or joke, adding a sense of camaraderie and fun to the scene.
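For anyone who wants to rerun this kind of check, here is a minimal sketch using the OpenAI Python client (my addition, not Nisan's code); the image URL is a placeholder for the photo in question, and client details may differ by version.

```python
# Minimal sketch of asking gpt-4o to describe an image.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY in the environment;
# the image URL below is a placeholder.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's going on in this picture?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/scale-photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```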
My timelines are lengthening.

I've long been a skeptic of scaling LLMs to AGI*. I fundamentally don't understand how this is even possible. It must be said that very smart people give this view credence: davidad, dmurfet. On the other side are Vanessa Kosoy and Steven Byrnes. When pushed, proponents don't actually defend the position that a large enough transformer will create nanotech or even obsolete their job. They usually mumble something about scaffolding.

I won't get into this debate here, but I do want to note that my timelines have lengthened, primarily because some of the never-clearly-stated but heavily implied AI developments by proponents of very short timelines have not materialized. To be clear, it has only been a year since gpt-4 was released, and gpt-5 is around the corner, so perhaps my hope is premature. Still, my timelines are lengthening.

A year ago, when gpt-3 came out, progress was blindingly fast. Part of short timelines came from a sense of "if we got surprised so hard by gpt2-3, we are completely uncalibrated, who knows what comes next". People seemed surprised by gpt-4 in a way that seemed uncalibrated to me: gpt-4 performance was basically in line with what one would expect if the scaling laws continued to hold. At the time it was already clear that the only really important drivers were compute and data, and that we would run out of both shortly after gpt-4. Scaling proponents suggested this was only the beginning, that there was a whole host of innovation that would be coming. Whispers of mesa-optimizers and simulators.

One year in: chain-of-thought doesn't actually improve things that much. External memory and super context lengths ditto. A whole list of proposed architectures seem to serve solely as a paper mill. Every month there is new hype about the latest LLM or image model, yet they never deviate from expectations based on simple extrapolation of the scaling laws. There is only one thing that really seems to matter, and that is compute and data. We have about 3 more OOMs of compute to go. Data may be milked for another OOM.

A big question will be whether gpt-5 will suddenly make agentGPT work (and to what degree). It would seem that gpt-4 is in many ways far more capable than (most or all) humans, yet agentGPT is curiously bad.

All in all, AI progress** is developing according to the naive extrapolations of the scaling laws, but nothing beyond that. The breathless Twitter hype about new models is still there, but it seems to be believed at a simulacra level higher than I can parse.

Does this mean we'll hit an AI winter? No. In my model there may be only one remaining roadblock to ASI (and I suspect I know what it is). That innovation could come at any time. I don't know how hard it is, but I suspect it is not too hard.

* The term AGI seems to denote vastly different things to different people in a way I find deeply confusing. I notice that the thing that I thought everybody meant by AGI is now being called ASI. So when I write AGI, feel free to substitute ASI.

** Or better, AI congress.

Addendum: since I've been quoted in dmurfet's AXRP interview as believing that there are certain kinds of reasoning that cannot be represented by transformers/LLMs, I want to be clear that this is not really an accurate portrayal of my beliefs. E.g. I don't think transformers don't truly understand, are just a stochastic parrot, or in other ways can't engage in the abstract reasoning that humans do. I think this is clearly false, as seen by interacting with any frontier model.
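To make "simple extrapolation of the scaling laws" and "about 3 more OOMs of compute" concrete, here is a toy extrapolation (my addition): the power-law form is the standard shape from the scaling-law literature, but the constants and the current-compute figure below are placeholder assumptions, not fitted values.

```python
# Toy scaling-law extrapolation (illustration only). The form
# L(C) = L_inf + a * C**(-b) is the standard power-law shape; the constants
# below are placeholders, NOT fitted values from any paper.
L_inf, a, b = 1.7, 8.0, 0.05   # assumed irreducible loss, scale factor, exponent
C_now = 1e25                    # assumed rough order of magnitude of current training FLOP

def loss(C):
    return L_inf + a * C ** (-b)

for extra_ooms in range(4):     # "about 3 more OOMs of compute to go"
    C = C_now * 10 ** extra_ooms
    print(f"+{extra_ooms} OOMs of compute: predicted loss ~ {loss(C):.3f}")
```

The point of the toy numbers is only the shape: each additional OOM buys a smaller, predictable improvement, which is what "never deviating from the scaling-law extrapolation" looks like.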
Linch
(x-posted from the EA Forum)

We should expect the incentives and culture of AI-focused companies to make them uniquely terrible for producing safe AGI.

From a "safety from catastrophic risk" perspective, I suspect an "AI-focused company" (e.g. Anthropic, OpenAI, Mistral) is abstractly pretty close to the worst possible organizational structure for getting us towards AGI. I have two distinct but related reasons:

1. Incentives
2. Culture

From an incentives perspective, consider realistic alternative organizational structures to "AI-focused company" that nonetheless have enough firepower to host multibillion-dollar scientific/engineering projects:

1. As part of an intergovernmental effort (e.g. CERN's Large Hadron Collider, the ISS)
2. As part of a governmental effort of a single country (e.g. Apollo Program, Manhattan Project, China's Tiangong)
3. As part of a larger company (e.g. Google DeepMind, Meta AI)

In each of those cases, I claim that there are stronger (though still not ideal) organizational incentives to slow down, pause/stop, or roll back deployment if there is sufficient evidence or reason to believe that further development can result in major catastrophe. In contrast, an AI-focused company has every incentive to go ahead on AI when the cause for pausing is uncertain, and minimal incentive to stop or even take things slowly.

From a culture perspective, I claim that without knowing any details of the specific companies, you should expect AI-focused companies to be more likely than the other plausible contenders to have the following cultural elements:

1. Ideological AGI Vision: AI-focused companies may have a large contingent of "true believers" who are ideologically motivated to make AGI at all costs, and
2. No Pre-existing Safety Culture: AI-focused companies may have minimal or no strong "safety" culture where people deeply understand, have experience in, and are motivated by a desire to avoid catastrophic outcomes.

The first one should be self-explanatory. The second one is a bit more complicated, but basically I think it's hard to have a safety-focused culture just by "wanting it" hard enough in the abstract, or by talking a big game. Instead, institutions (relatively) have more of a safe & robust culture if they have previously suffered the (large) costs of not focusing enough on safety. For example, engineers who aren't software engineers understand fairly deep down that their mistakes can kill people, and that their predecessors' fuck-ups have indeed killed people (think bridges collapsing, airplanes falling, medicines not working, etc.). Software engineers rarely have such experience. Similarly, governmental institutions have institutional memories of the problems caused by major historical fuckups, in a way that new startups very much don't.

Recent Discussion

A stance against student debt cancellation doesn’t rely on the assumptions of any single ideology. Strong cases against student debt cancellation can be made based on the fundamental values of any section of the political compass. In no particular order, here are some arguments against student debt cancellation from the perspectives of many disparate ideologies.

Equity and Fairness

Student debt cancellation is a massive subsidy to an already prosperous and privileged population. American college graduates have nearly double the income of high school graduates. African Americans are far underrepresented among degree holders compared to their overall population share.

Within the group of college graduates, debt cancellation increases equity, but you can't get around the fact that 72% of African Americans have no student debt because they never went to college....

I'd like to provide a qualitative counterpoint.

Aren't these arguments valid for almost all welfare programs provided by a first-world country to anyone but the base of the social pyramid? For one example, let's take retirement. All the tax money that goes into paying retirees to do nothing would be much better spent by helping victims of malaria etc. in 3rd world countries. If they weren't responsible enough to save during their working years to be able to live without working for the last 10 to 30 years of their lives, especially those from the lower midd... (read more)

Odd anon
Almost all of these are about "cancellation" by means of transferring money from the government to those in debt. Are there similar arguments against draining some of the ~trillion dollars held by university endowments to return to students who (it could be argued) were implicitly promised an outcome they didn't get? That seems a lot closer to the plain meaning of "cancelling debt".
Maxwell Tabarrok
That's not a part of any of the plans to cancel student debt that have been implemented or are being considered. That would definitely change a lot of the arguments but I don't think it would make debt cancellation look like a much better policy, though the reasons it was bad would be different.
Algon
Wow, I didn't realize just how bad student debt cancellation is from so many perspectives. Now I want more policy critiques like this. 
  • Until now ChatGPT dealt with audio through a pipeline of 3 models: audio transcription, then GPT-4, then text-to-speech. GPT-4o is apparently trained on text, voice and vision so that everything is done natively. You can now interrupt it mid-sentence.
  • It has GPT-4 level intelligence according to benchmarks. Somewhat better at transcription than Whisper, and considerably better at vision than previous models.
  • It's also somehow been made significantly faster at inference time. Might be mainly driven by an improved tokenizer. Edit: Nope, English tokenizer is only 1.1x.
  • It's been confirmed that it was the "gpt2" model found in the LMSys arena these past weeks (a marketing move). It has the highest Elo as of now.
  • They'll be gradually releasing it for everyone, even free users.
  • Safety-wise, they claim to have run it through their Preparedness Framework.
...
Ben Livengood
I was a bit surprised that they chose (allowed?) 4o to have that much emotion. I am also really curious how they fine-tuned it to that particular state and how much fine-tuning was required to get it conversational. My naive assumption is that if you spoke at a merely-pretrained multimodal model it would just try to complete/extend the speech in one's own voice, or switch to another generically confabulated speaker depending on context. Certainly not a particularly consistent responder. I hope they didn't rely entirely on RLHF. It's especially strange considering how I Am A Good Bing turned out, with similarly unhinged behavior. Perhaps the public will get a very different personality. The current ChatGPT text+image interface claiming to be GPT-4o is adamant about being an artificial machine intelligence assistant without emotions or desires, and sounds a lot more like GPT-4 did. I am not sure what to make of that.
simeon_c
Agreed. Note that they don't say what Martin claims they say; they only say [...]. I think it's reasonably likely to imply that they broke all their non-evaluation PF commitments, while not being technically wrong.
Zach Stein-Perlman
Full quote: [Edit after Simeon replied: I disagree with your interpretation that they're being intentionally very deceptive. But I am annoyed by (1) them saying "We’ve evaluated GPT-4o according to our Preparedness Framework" when the PF doesn't contain specific evals and (2) them taking credit for implementing their PF when they're not meeting its commitments.]

Right. Thanks for providing the full context. "Voluntary commitments" refers to the WH commitments, which are much narrower than the PF, so I think my observation holds.

On AI and Jobs: How to Make AI Work With Us, Not Against Us With Daron Acemoglu

Here is Claude.ai's summary of Daron Acemoglu's main ideas from the podcast:

  • Historically, major productivity improvements from new technologies haven't always translated into benefits for workers. It depends on how the technologies are used and who controls them.
  • There are concerns that AI could further exacerbate inequality and create a "two-tiered society" if the benefits accrue mainly to a small group of capital owners and highly skilled workers. Widespread prosperity is not automatic.
  • We should aim for "machine usefulness" - AI that augments and complements human capabilities - rather than just "machine intelligence" focused on automating human tasks. But the latter is easier to monetize.
  • Achieving an AI future that benefits workers broadly will require
...
1FlorianH
From what you write, Acemoglu's suggestions seem unlikely to be very successful, in particular given international competition. I paint a somewhat black-and-white picture, but I think the following logic remains salient also in the messy real world:

1. If your country unilaterally tries to halt development of the infinitely lucrative AI inventions that could automate jobs, other regions will be more than eager to accommodate the inventing companies. So, from the country's egoistic perspective, it might as well develop the inventions domestically and at least benefit from being the inventor rather than the adopter.
2. If your country unilaterally tries to halt adoption of the technology, there are numerous capable countries keen to adopt and to swamp you with their sales.
3. If you really were able to coordinate globally to enable 1. or 2. globally - extremely unlikely in the current environment, and given the huge incentives for individual countries to remain weak in enforcement - then it seems you might as well try to impose directly the economic first-best solution w.r.t. robots vs. labor: high global tax rates and redistribution.

Separately, I at least spontaneously wonder: how would one even want to go about differentiating the 'bad automation' to be discouraged from legit automation without which no modern economy could competitively run anyway? For a random example, if Excel didn't yet exist (or, for its next update...), would we have to say: sorry, cannot release such software, as any given spreadsheet has the risk of removing thousands of hours of work?! Or at least: please, Excel, ask the human to manually confirm each cell's calculation?? So I don't know how we'd enforce non-automation in practice. Just 'it uses a large LLM' feels like a weirdly arbitrary condition - though, ok, I could see how, due to a lack of alternatives, one might use something like that as an ad-hoc criterion, with all the problems it brings. But again, I think points 1. & 2. mean this is unrealistic or
2Roman Leventov
Clearly, specific rule-based regulation is a dumb strategy. Acemoglu's suggestions are tax incentives to keep employment, and "labour voice" to let people decide, in the context of a specific company and job, how they want to work with AI. I like this self-governing strategy. Basically, the idea is that people will want to keep influencing things and will resist "job bullshittification" done to them, if they have the political power ("labour voice"). But they should also have an alternative choice of technology and work arrangement/method that doesn't turn their work into rubber-stamping bullshit, but also alleviates the burden ("machine usefulness"). Because if they only have the choice between a rubber-stamping bullshit job and a burdensome job without AI, they may choose rubber-stamping.
2Roman Leventov
If anything, this problem seems more pernicious w.r.t. climate change mitigation and environmental damage: it's much more distributed - not only the US and China, but Russia and India are also big emitters, there's big leverage in Brazil, Congo, and Indonesia with their forests, and overfishing and ocean pollution everywhere, etc. With AI, it's basically a question of regulating US and UK companies: the EU is always eager to over-regulate relative to the US, and China is already successfully and closely regulating their AI for a variety of reasons (which Acemoglu points out). The big problem of the Chinese economy is weak internal demand, and automating jobs - thereby increasing inequality and decreasing local purchasing power - is the last thing that China wants.

But I should add, I agree that 1-3 pose challenging political and coordination problems. Nobody assumes it will be easy, including Acemoglu. It's just another in the row of hard political challenges posed by AI, along with the questions of "aligned with whom?", considering/accounting for people's voice past dysfunctional governments and political elites in general, etc.

The acceptable tone of voice here feels like 3mm wide to me. I'm always having bad manners

I swear to never joke again sir

Garrett Baker
In Magna Alta Doctrina, Jacob Cannell talks about exponential gradient descent as a way of approximating Solomonoff induction using ANNs.

My read of this is that we get a criterion for when one should be a hedgehog versus a fox in forecasting: one should be a fox when the distributions you need to operate in are normal, or rather when they don't have long tails, and a hedgehog when your input distribution is more log-normal, or rather when there may be long tails.

This makes some sense. If you don't have many outliers, most theories should agree with each other, it's hard to test & distinguish between the theories, and if one of your theories does make striking predictions far different from your other theories, it's probably wrong, just because striking things don't really happen. In contrast, if you need to regularly deal with extreme scenarios, you need theories capable of generalizing to those extreme scenarios, which means not throwing out theories for making striking or weird predictions. Striking events end up being common, so it's less of an indictment.

But there are also reasons to think this is wrong. Hits-based entrepreneurship approaches, for example, seem to be more foxy than standard quantitative or investment finance, and hits-based entrepreneurship works precisely because the distribution of outcomes for companies is long-tailed. In some sense the difference between the two is a "sin of omission" vs "sin of commission" disagreement, where the hits-based approach needs to see how something could go right, while the standard finance approach needs to see how something could go wrong. So it's not so much a predictive disagreement between the two approaches, but more a decision theory or comparative advantage difference.
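As a rough toy illustration of the thin-tail vs long-tail point (my own addition, and a simplification of the fox/hedgehog framing): a forecaster who moment-matches a normal distribution holds up fine on thin-tailed data but tends to underestimate the tail of long-tailed data, while a model that takes the long tail seriously does not.

```python
# Toy simulation: a thin-tailed (normal) model vs a long-tail-aware (lognormal)
# model, judged by how often reality exceeds the 99th percentile each predicts.
# A well-calibrated tail gives ~0.01.
import numpy as np

rng = np.random.default_rng(0)

def q99_normal_fit(x):
    # Moment-match a normal distribution and read off its 99th percentile.
    return x.mean() + 2.326 * x.std()

def q99_lognormal_fit(x):
    # Fit a lognormal (a normal in log space) and read off its 99th percentile.
    lx = np.log(x)
    return np.exp(lx.mean() + 2.326 * lx.std())

for name, sample in [
    ("thin-tailed (normal)", rng.normal(100, 15, 200_000)),
    ("long-tailed (lognormal)", rng.lognormal(3, 1.2, 200_000)),
]:
    train, test = sample[:2_000], sample[2_000:]
    for label, fit in [("normal fit", q99_normal_fit), ("lognormal fit", q99_lognormal_fit)]:
        exceed = np.mean(test > fit(train))
        print(f"{name:24s} {label:14s} P(outcome > predicted 99th pct) = {exceed:.3f}")
```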

tl;dr: LessWrong released an album! Listen to it now on Spotify, YouTube, YouTube Music, or Apple Music.

On April 1st 2024, the LessWrong team released an album using the then-most-recent AI music generation systems. All the music is fully AI-generated, and the lyrics are adapted (mostly by humans) from LessWrong posts (or other writing LessWrongers might be familiar with).

Honestly, despite it starting out as an April fools joke, it's a really good album. We made probably 3,000-4,000 song generations to get the 15 we felt happy about, which I think works out to about 5-10 hours of work per song we used (including all the dead ends and things that never worked out).

The album is called I Have Been A Good Bing. I think it is a pretty...

Drake Morrison
For what it's worth, I find the Dath Ilan song to be one of my favorites. Upon listening I immediately wanted this song to be played at my funeral.  There's something powerful there, which can be dangerous, but it's a kind of feeling that I draw strength and comfort from. I specifically like the phrasing around sins and forgiveness, and expect it to be difficult to engender the same comfort or strength in me without it. Among my friends I'm considered a bit weird in how much I think about grief and death and loss. So maybe it's a weird psychology thing. 
DPiepgrass
Yeah, the lyrics didn't sit well with me either so I counterlyricized it.
DPiepgrass
You guys were using an AI that generated the music fully formed (as PCM), right? It ticks me off that this is how it works. It's "good", but you see the problems:

1. Poor audio quality [edit: the YouTube version is poor quality, but the "Suno" versions are not. Why??]
2. You can't edit the music afterward or re-record the voices
3. You had to generate 3,000-4,000 tracks to get 15 good ones

Is there some way to convince AI people to make the following?

1. An AI (or two) whose input is a spectral decomposition of PCM music (I'm guessing exponentially-spaced wavelets will be better than FFT), whose job is to separate the music into instrumental tracks + voice track(s) that sum up to the original waveform (and to detect which tracks are voice tracks). Train it using (i) tracker and MIDI archives, which are inherently pre-separated into different instruments, (ii) AI-generated tracker music with noisy instrument timing (the instruments should be high-quality and varied but the music itself probably doesn't have to be good for this to work, so a quick & dirty AI could be used to make training data), and (iii) whatever real-world decompositions can be found.
2. An AI that takes these instrumental tracks and decomposes each one into (i) a "music sheet" (a series of notes with stylistic information) and (ii) a set of instrument samples, where each sample is a C-note (middle C ± one or two octaves, drums exempt), with the goal of minimizing the set of instrument samples needed to represent an instrument while representing the input faithfully (if a large number of samples are needed, it's probably a voice track or a difficult instrument such as guitar, but some voice tracks are repetitive and can still be deduplicated this way, and in any case the decomposition into notes is important). [alternate version of this AI: use a fixed set of instrument samples, so the AI's job is not to decompose but to select samples, making it more like speech-to-text rather than a decomposit
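For what it's worth, the "exponentially-spaced" spectral decomposition proposed as the input in step 1 is roughly what a constant-Q transform gives you. Below is a minimal sketch using librosa (my addition; the file path is a placeholder), covering only the input featurization, not the separation models themselves.

```python
# Minimal sketch of the proposed input representation: a constant-Q transform
# gives exponentially-spaced frequency bins, unlike a plain FFT/STFT.
# Requires librosa; "song.wav" is a placeholder path.
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=None, mono=True)          # placeholder audio file
C = librosa.cqt(y, sr=sr, bins_per_octave=24, n_bins=24 * 7)  # 7 octaves, quarter-tone resolution
magnitude_db = librosa.amplitude_to_db(np.abs(C), ref=np.max) # log-magnitude spectrogram
print(magnitude_db.shape)  # (frequency bins, time frames) -- what a separation model would consume
```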

When I was working on my AI music project (melodies.ai) a couple of years ago, I ended up focusing on creating catchy melodies for this reason. Even back then, voice singing software was already quite good, so I didn't see the need to do everything end-to-end. This approach is much more flexible for professional musicians, and I still think it's a better idea overall. We can describe images with text much more easily than music, but for professional use, AI-generated images still require fine-scale editing.

Alexander Gietelink Oldenziel
Yes, agreed. What I don't get about this position: if it was indeed just scaling, what's AI research for? There is nothing to discover, just scale more compute. Sure, you can maybe improve the speed of deploying compute a little, but at the core of it, it seems like a story that's in conflict with itself?
Adam Shai
Lengthening from what to what?

I've never done explicit timeline estimates before, so there's nothing to compare to. But since it's a gut feeling anyway, I'm saying my gut feeling is lengthening.

faul_sname
Can you expand on what you mean by "create nanotech?" If improvements to our current photolithography techniques count, I would not be surprised if (scaffolded) LLMs could be useful for that. Likewise for getting bacteria to express polypeptide catalysts for useful reactions, and even maybe figure out how to chain several novel catalysts together to produce something useful (again, referring to scaffolded LLMs with access to tools). If you mean that LLMs won't be able to bootstrap from our current "nanotech only exists in biological systems and chip fabs" world to Drexler-style nanofactories, I agree with that, but I expect things will get crazy enough that I can't predict them long before nanofactories are a thing (if they ever are). Likewise, I don't think LLMs can immediately obsolete all of the parts of my job. But they sure do make parts of my job a lot easier. If you have 100 workers that each spend 90% of their time on one specific task, and you automate that task, that's approximately as useful as fully automating the jobs of 90 workers. "Human-equivalent" is one of those really leaky abstractions -- I would be pretty surprised if the world had any significant resemblance to the world of today by the time robotic systems approached the dexterity and sensitivity of human hands for all of the tasks we use our hands for, whereas for the task of "lift heavy stuff" or "go really fast" machines left us in the dust long ago. Iterative improvements on the timescale we're likely to see are still likely to be pretty crazy by historical standards. But yeah, if your timelines were "end of the world by 2026" I can see why they'd be lengthening now.
Nisan
12 years ago, in The state of Computer Vision and AI: we are really, really far away, Andrej Karpathy wrote: [...] I just asked gpt-4o what's going on in the picture, and it understood most of it: [...]
Nisan

Of course, Karpathy's post could be in the multimodal training data.