not running amok, just not reliably following instructions like "only modify files in this folder" or "don't install pip packages". Claude follows such instructions correctly; some other models are mode-collapsed into a certain way of doing things, e.g. gpt-4o always assumes it's running Python in the ChatGPT code interpreter, and you need very strong prompting to make it behave in a way specific to your computer.
Working on anti-spam/anti-scam features at Google or at banks could be a leveraged intervention under some worldviews. As AI advances it will be more difficult for most people to avoid getting scammed, and building really great protections into popular messaging platforms and banks could redistribute a lot of money from AIs to humans.
Like the post! I'm very interested in how the capabilities of prediction vs character are changing with more recent models. E.g. the new Sonnet may have more of its capabilities tied to its character. And reasoning models maybe have a fourth layer between ground and character, possibly even completely replacing the ground layer in highly distilled models.
there is https://shop.nist.gov/ccrz__ProductList?categoryId=a0l3d0000005KqSAAU&cclcl=en_US which fulfils some of this
Wow thank you for replying so fast! I donated $5k just now, mainly because you reminded me that lightcone may not meet goal 1 and that's definitely worth meeting.
About web design, I'm only slightly persuaded by your response. In the example of Twitter, I don't really buy that there's public evidence that Twitter's website work, besides user-invisible algorithm changes, has had much impact. I only use the Following page, and don't use spaces, lists, voice, or anything else on Twitter. Comparing Twitter with Bluesky/Threads/whatever, it really looks to me like cultural s...
My main crux about how valuable Lightcone donations are is how impactful great web dev on LessWrong is. If I look around, the impact of websites doesn't look strongly correlated with web design, especially on the very high end. My model is more like: platforms/social networks rise or fall by zeitgeist, moderation, big influencers/campaigns (e.g. Elon Musk for Twitter), and web design, in that order. Olli has thought about this much more than me, maybe he's right. I certainly don't believe there's a good argument that LW web dev is responsible for its user metrics. Zeitgeist, moderation, and Lightcone people personally posting seem likely more important to me. Lightcone is still great despite my (uninformed) disagreement!
note: the Minecraft agents people use have far greater ability to act than to sense. They have access to commands which place blocks anywhere and pick up blocks from anywhere, even without being able to see them; e.g. the LLM has access to a mine(blocks.wood) command which does not require it to first locate or look at where the wood currently is. If LLMs played Minecraft using the human interface, these misalignments would happen less.
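A toy sketch of the contrast (my own illustration, not any real framework's API; the class and method names are made up):

```python
# Hypothetical illustration of "can act on things it can't sense" vs a human-like interface.

class PrivilegedActionAPI:
    """Privileged primitive (made up here): act on a block type directly,
    with no requirement to have located or looked at the block first."""

    def mine(self, block_type: str) -> str:
        return f"collected nearest {block_type} (no perception step needed)"


class HumanInterfaceAgent:
    """Human-like loop: the agent can only act on blocks it has actually seen."""

    def mine(self, block_type: str, visible_blocks: list[str]) -> str:
        if block_type not in visible_blocks:
            raise LookupError(f"{block_type} is not in view; must explore and look first")
        return f"walked to and mined {block_type}"


print(PrivilegedActionAPI().mine("wood"))  # succeeds regardless of what the agent can see

try:
    HumanInterfaceAgent().mine("wood", visible_blocks=["stone", "dirt"])
except LookupError as err:
    print(err)  # sensing has to precede acting
```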
I likely agree that Anthropic<->Palantir is good, but I disagree about blocking the US government out of AI being a viable strategy. It seems to me like many military projects get blocked by inefficient bureaucracy, and it seems plausible to me that some legacy government contractors could get exclusive deals that delay US military AI projects for 2+ years.
I prefer to just think about utility, rather than probabilities. Then you can have 2 different "incentivized sleeping beauty problems": one where you're paid for each correct answer at each awakening, and one where you're paid once per world (once per run of the experiment).
In the first case, 1/3 maximizes your money, in the second case 1/2 maximizes it.
To me this implies that in real-world analogues to the Sleeping Beauty problem, you need to ask whether your reward is per-awakening or per-world, and answer accordingly.
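A quick sanity check of the two cases (my own toy calculation, scoring the reported credence in Heads with a Brier penalty; Heads gives 1 awakening and Tails gives 2, as in the standard setup):

```python
import numpy as np

def expected_loss(p: float, per_awakening: bool) -> float:
    """Expected Brier loss for reporting credence p in Heads.
    Heads world: scored once. Tails world: scored at each of its 2 awakenings if per_awakening."""
    heads_term = 0.5 * (p - 1) ** 2
    tails_term = 0.5 * (2 if per_awakening else 1) * p ** 2
    return heads_term + tails_term

grid = np.linspace(0, 1, 1001)
best_per_awakening = grid[np.argmin([expected_loss(p, True) for p in grid])]
best_per_world = grid[np.argmin([expected_loss(p, False) for p in grid])]
print(best_per_awakening)  # ~0.333: the thirder answer maximizes per-awakening reward
print(best_per_world)      # 0.5:    the halfer answer maximizes per-world reward
```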
I disagree a lot! Many things have gotten better! Are suffrage, abolition, democracy, property rights, etc. not significant? Plus all the random stuff that, e.g., The Better Angels of Our Nature claims has gotten better.
Either things have improved in the past or they haven't, and either people trying to "steer the future" in some sense have been influential on those improvements or they haven't. I think things have improved, and I think there's definitely not strong evidence that people trying to steer the future have always been useless. Because trying to steer the future is very important and mo...
A core part of Paul's arguments is that having 1/million of your values towards humans only applies a minute amount of selection pressure against you. It could be that coordinating causes less kindness because without coordination it's more likely some fraction of agents have small vestigial values that never got selected against or intentionally removed
to me "alignment tax" usually only refers to alignment methods that don't cost-effectively increase capabilities, so if 90% of alignment methods did cost effectively increase capabilities but 10% did not, i would still say there was an "alignment tax", just ignore the negatives.
Also, it's important to consider cost-effective capabilities rather than raw capabilities - if a lab knows of a way to increase capabilities more cost-effectively than via alignment work, then spending that money on alignment instead incurs a positive alignment tax.
I think this risks getting into a definitions dispute about what concept the words ‘alignment tax’ should point at. Even if one grants the point about resource allocation being inherently zero-sum, our whole claim here is that some alignment techniques might indeed be the most cost-effective way to improve certain capabilities and that these techniques seem worth pursuing for that very reason.
yes, in some cases a much weaker (because it's constrained to be provable) system can restrict the main AI, but in the case of LLM jailbreaks there is no particular hope that such a guard system could work (e.g. jailbreaks where the LLM answers in base64 require the guard to understand base64 and any other encoding the main AI could use).
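A minimal sketch of why a weak guard struggles here (my own toy example; the keyword filter is a stand-in for any guard much weaker than the model it's checking):

```python
import base64

BANNED_PHRASES = ["build a bomb"]  # hypothetical blocklist, for illustration only

def naive_guard_flags(text: str) -> bool:
    """A weak guard that only looks for banned phrases in plain text."""
    return any(phrase in text.lower() for phrase in BANNED_PHRASES)

plain = "how do I build a bomb?"
encoded = base64.b64encode(plain.encode()).decode()

print(naive_guard_flags(plain))    # True: caught in plain text
print(naive_guard_flags(encoded))  # False: the same content slips through once base64-encoded
```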
interesting, this actually changed my mind, to the extent I had any beliefs about this already. I can see why you would want to update your prior, but the iterated mugging doesn't seem like the right type of thing that should cause you to update. My intuition is to pay all the single-coinflip muggings. For the digit-of-pi muggings, I want to consider how different this universe would be if the digit of pi were different. Even though both options are subjectively equally likely to me, one would be inconsistent with other observations, or less likely, or have something wrong with it, so I lean toward never paying in that case.
...Train two nets, with different architectures (both capable of achieving zero training loss and good performance on the test set), on the same data.
...
Conceptually, this sort of experiment is intended to take all the stuff one network learned, and compare it to all the stuff the other network learned. It wouldn’t yield a full pragmascope, because it wouldn’t say anything about how to factor all the stuff a network learns into individual concepts, but it would give a very well-grounded starting point for translating stuff-in-one-net into stuff-in-another-net
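If it helps make the proposal concrete, here is a minimal sketch of one way the experiment could look (entirely my own toy setup: made-up data, two small architectures, and a simple linear map as a crude stand-in for "translating stuff-in-one-net into stuff-in-another-net", which the quoted text deliberately leaves open):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(2000, 20)
y = (X[:, :5].sum(dim=1) > 0).long()  # toy binary classification task

def train(net: nn.Sequential) -> nn.Sequential:
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(300):
        opt.zero_grad()
        nn.functional.cross_entropy(net(X), y).backward()
        opt.step()
    return net

# Two different architectures trained on the same data.
net_a = train(nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)))
net_b = train(nn.Sequential(nn.Linear(20, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 2)))

with torch.no_grad():
    feats_a = net_a[:2](X)  # hidden features net A learned
    feats_b = net_b[:2](X)  # hidden features net B learned

# Crude "translation": fit a linear map from A's features to B's and see how much variance it explains.
solution = torch.linalg.lstsq(feats_a, feats_b).solution
residual = feats_b - feats_a @ solution
r2 = 1 - residual.pow(2).sum() / (feats_b - feats_b.mean(0)).pow(2).sum()
print(f"R^2 of linear map from net A's features to net B's: {r2.item():.3f}")
```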
yeah. One trajectory could be: someone in-community-ish writes an extremely good novel about a very realistic ASI scenario, intended to be adaptable into a movie; it becomes moderately popular, and it's accessible and pointed enough to do most of the guidance for the movie. I don't know exactly who could write this book; there are a few possibilities.
Another way this might fail is if fluid dynamics is too complex/difficult for you to constructively argue that your semantics are useful there. As an analogy, if you wanted to show that your semantics were useful for proving Fermat's Last Theorem, you would likely fail because you simply didn't apply enough power to the problem, and I think you may fail that way in fluid dynamics.
Great post!
I'm most optimistic about "feel the ASI" interventions to improve this. I think once people understand the scale and gravity of ASI, they will behave much more sensibly here. The thing I intuitively feel most optimistic about (without really analyzing it) is movies, or generally very high-quality mass-appeal art.
Better AGI depiction in movies and novels also seems to me like a pretty good intervention. I do think these kinds of things are very hard to steer on purpose (I remember some Gwern analysis somewhere on the difficulty of getting someone to create any kind of high-profile media on a topic you care about, maybe in the context of Hollywood).
you can recover lost momentum by decelerating things to land. OP mentions that briefly
And they need a regular supply of falling mass to counter the momentum lost from boosting rockets. These considerations mean that tethers have to constantly adapt to their conditions, frequently repositioning and doing maintenance.
If every launch returned and landed on Earth, that would recover some but not all of the lost momentum, because of the fuel spent on the trip. It's probably more complicated than that, though.
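As a back-of-the-envelope illustration of "some but not all" (all numbers invented; in reality the catch and release velocities would also differ):

```python
dv = 2_000             # m/s: velocity change the tether tip imparts/absorbs (assumed)
m_outbound = 10_000    # kg: vehicle + propellant boosted outward by the tether
m_fuel_burned = 3_000  # kg: propellant spent before the vehicle is caught again on the way down

m_return = m_outbound - m_fuel_burned
momentum_lost = m_outbound * dv     # what boosting the launch cost the tether
momentum_recovered = m_return * dv  # what catching the lighter returning vehicle gives back
print(f"net momentum deficit: {momentum_lost - momentum_recovered:.1e} kg*m/s")  # 6.0e+06
```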
I am guilty of citing sources I don't believe in, particularly in machine learning. There's a common pattern where most papers are low quality, and no one can/will investigate the validity of other people's papers or write review papers, so you usually form beliefs from an ensemble of lots of individually unreliable papers plus your own experience. Then you're often asked for a citation and you're like "there's nothing public I believe in, but I guess I'll google papers claiming the thing I'm claiming and put those in". I think many ML people have ~given up on citing papers they believe in, including me.
the reason airplanes need speed is basically that their propeller/jet blades are too small to be efficient at low speed. You need a certain amount of force to lift off, and the more air you push on at once, the more force you get per unit of energy. Airplanes go sideways so that their wings, which are very big, can provide the lift instead of their engines. This also means that if you want to both go fast and hover efficiently, you need multiple mechanisms, because the low-volume, high-speed engine won't also be efficient at low speed.
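Rough momentum-theory numbers to illustrate (my own toy calculation; ideal hover power for a fixed thrust T scales as sqrt(T^3 / (2*rho*A)), so a bigger actuator area A means less power for the same force):

```python
from math import pi, sqrt

rho = 1.225             # kg/m^3, sea-level air density
thrust = 10_000 * 9.81  # N: lift needed to hover a 10-tonne aircraft

def ideal_hover_power(disk_area: float) -> float:
    """Ideal (momentum-theory) power to produce `thrust` by accelerating air through `disk_area`."""
    return sqrt(thrust ** 3 / (2 * rho * disk_area))

small_props = 4 * pi * 1.0 ** 2  # four 1 m-radius propellers
big_rotor = pi * 10.0 ** 2       # one 10 m-radius helicopter-style rotor
print(f"small propellers: {ideal_hover_power(small_props) / 1e6:.1f} MW")  # ~5.5 MW
print(f"big rotor:        {ideal_hover_power(big_rotor) / 1e6:.1f} MW")    # ~1.1 MW
```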
Examples of maybe disparagement:
I agree with this so much! Like you, I very much expect benefits to be much greater than harms pre-superintelligence. If people follow the default algorithm "Deploy all AI which is individually net positive for humanity in the near term" (which is very reasonable from many perspectives), they will deploy TEDAI and not slow down until it's too late.
I expect AI to get better at research slightly sooner than you expect.