Every now and then, some AI luminaries
I agree with (1) and strenuously disagree with (2).
The last time I saw something like this, I responded by writing: LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem.
Well, now we have a second entry in the series, with the new preprint book chapter “Welcome to the Era of Experience” by...
My intuition says reward hacking seems harder to solve than this (even in the EEA), but I'm pretty unsure. For example, under your theory, what prevents reward hacking through forming a group and then just directly maxing out on mutually liking/admiring each other?
When applying these ideas to AI, how do you plan to deal with the potential problem of distributional shifts happening faster than we can edit the reward function?
Come get old-fashioned with us, and let's read the Sequences at Lighthaven! We'll show up, mingle, do intros, and then get to beta test an app Lightcone is developing for LessOnline. Please do the reading beforehand - it should take no more than 20 minutes. And BRING YOUR LAPTOP!!! You'll need it for the app.
This group is aimed at people who are new to the Sequences and would enjoy a group experience, but also at people who've been around LessWrong and LessWrong meetups for a while and would like a refresher.
This meetup will also have dinner provided! We'll be ordering pizza-of-the-day from Sliver (including 2 vegan pizzas). Please RSVP to this event so we know how many people to have food for.
This week we'll be...
Hear me out: I think the most forbidden technique is very useful and should be used, as long as we avoid the "most forbidden aftertreatment":
The reason why the most forbidden technique is forbidden,...
If there turns out not to be an AI crash, you get 1/(1+7) * $25,000 = $3,125.
If there is an AI crash, you transfer $25k to me.
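To make the implied odds explicit (this follows purely from the stakes above): taking the bet has positive expected value for the counterparty exactly when their probability $p$ of an AI crash within the year satisfies

$$(1-p)\cdot \$3{,}125 \;-\; p\cdot \$25{,}000 \;>\; 0 \quad\Longleftrightarrow\quad p \;<\; \frac{3{,}125}{3{,}125+25{,}000} \;=\; \frac{1}{9} \;\approx\; 11\%.$$

In other words, anyone assigning less than roughly an 11% chance to a crash within the year profits in expectation by taking it; anyone assigning more is better off declining or taking the other side.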
If you believe that AI is going to keep getting more capable, pushing rapid user growth and work automation across sectors, this is near-free money. But to be honest, I think an AI crash is likely in the next 5 years, and on average I expect to profit well from this one-year bet.
If I win, I want to give the $25k to organisers who can act fast to restrict the weakened AI corps in the wake of the crash. So bet me if you're highly confident that you'll win or just want to hedge the community against the...
Remmelt, if you wish, I'm happy to operationalize a bet. I think you're wrong.
American democracy currently operates far below its theoretical ideal. An ideal democracy precisely captures and represents the nuanced collective desires of its constituents, synthesizing diverse individual preferences into coherent, actionable policy.
Today's system offers no direct path for citizens to express individual priorities. Instead, voters select candidates whose platforms only approximately match their views, guess at which governmental level—local, state, or federal—addresses their concerns, and ultimately rely on representatives who often imperfectly or inaccurately reflect voter intentions. As a result, issues affecting geographically dispersed groups—such as civil rights related to race, gender, or sexuality—are frequently overshadowed by localized interests. This distortion produces presidential candidates more closely aligned with each other's socioeconomic profiles than with the median voter.
Traditionally, aggregating individual preferences required simplifying complex desires into binary candidate selections,...
wildly parallel thinking and prototyping. i'd hop on a call.
[Thanks to Steven Byrnes for feedback and the idea for §3.1. Also thanks to Justis from the LW feedback team.]
Remember this?
Or this?
The images are from WaitButWhy, but the idea was voiced by many prominent alignment people, including Eliezer Yudkowsky and Nick Bostrom. The argument is that the difference in brain architecture between the dumbest and smartest human is so small that the step from subhuman to superhuman AI should go extremely quickly. This idea was very pervasive at the time. It's also wrong. I don't think most people on LessWrong have a good model of why it's wrong, and I think that, because of this, they don't have a good model of AI timelines going forward.
Remember that we have no a priori reason to suspect that there are jumps in the future; humans perform sequential reasoning differently, so comparisons to the brain are just not informative.
In what way do we do it differently than the reasoning models?
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
Ah, but you don't even need to name selection pressures to make interesting progress. As long as you know some kinds of characteristics powerful AI agents might have (e.g. goals, self-models...), you can start to ask: what goals/self-models will the most surviving AGIs have?
And you can make progress on both, agnostic of environment. Then, once you enumerate possible goals/self-models, you can start to think about which selection pressures might influence those characteristics in good directions, and which levers we can pull today to shape those pressures.
This is a cross-post from https://250bpm.substack.com/p/accountability-sinks
...Back in the 1990s, ground squirrels were briefly fashionable pets, but their popularity came to an abrupt end after an incident at Schiphol Airport on the outskirts of Amsterdam. In April 1999, a cargo of 440 of the rodents arrived on a KLM flight from Beijing, without the necessary import papers. Because of this, they could not be forwarded on to the customer in Athens. But nobody was able to correct the error and send them back either. What could be done with them? It’s hard to think there wasn’t a better solution than the one that was carried out; faced with the paperwork issue, airport staff threw all 440 squirrels into an industrial shredder.
[...]
It turned out that the order to destroy
Well, not to dig in or anything, but if I have a chance to automate something, I'm going to think of it in terms of precision/recall/long tails, not in terms of the joy of being able to blame a single person when something goes wrong. There are definitely better coordination/optimization models than "accountability sinks." I don't love writing a riposte to a concept someone else found helpful, but it really is on the edge between "sounds cool, means nothing" and "actively misleading," so I'm bringing it up.
The Nuremberg defense discussion is sketchy. The author ...
30 years ago, the Cold War was raging on. If you don’t know what that is, it was the period from 1947 to 1991 when both the U.S. and the Soviet Union had large stockpiles of nuclear weapons and were threatening to use them on each other. The only thing that stopped them from doing so was the knowledge that the other side would have time to react. The U.S. and the Soviet Union both had surveillance systems to know if the other country had a nuke in the air headed for them.
On this day, September 26, in 1983, a man named Stanislav Petrov was on duty in the Soviet early-warning center when the computer notified him that satellites had detected five nuclear missile launches from the U.S. He was...
This is beautiful, but I can't think of anything specific to say, so I'll just give some generic praise. I like how he only used big words when necessary.