Ok, edited to sun. (I used earth first because I don't know how long it will take to eat the sun, whereas the earth seems feasible to eat quickly.)
(Plausible to me that an aligned AI would still eat the earth, but scan all the relevant information out of it and maybe reconstruct it later.)
Ok, edited. Thanks for the feedback!
(That's not a reasonable ask; it intervenes on reasoning in a way that's not an argument for why the reasoning would be mistaken. It's always possible that a hypothesis doesn't match reality; that's not a reason to refuse to entertain the hypothesis, or not to think through its implications. Even some counterfactuals can be worth considering, when not matching reality is assured from the outset.)
Yeah, you can hypothesize. If you state it publicly though, please make sure to flag it as a hypothesis.
How long until the earth gets eaten? 10th/50th/90th percentile: 3y, 12y, 37y.
Catastrophes induced by narrow capabilities (notably biotech) could push it further out, so this might imply that they probably don't occur.
No, it doesn't imply this; I added the disclaimer "Conditional on no strong governance success that effectively prevents basically all AI progress, and conditional on no huge global catastrophe happening in the meantime:". Though yeah, I don't particularly expect those to occur.
...Will we get to this point by incremental progress that yields smallish improvements (=slow), or by some breakthrough that, when scaled up, can rush past the human intelligence level very quickly (=fast)?
AI speed advantage makes fast vs. slow ambiguous, because it doesn't require AI getting smarter in order to make startlingly fast progress, and might be about passing a capability threshold (of something like autonomous research) with no distinct breakthroughs leading up to it (by getting to a slightly higher level of scaling or compute efficiency with some o
(I did not carefully think about my predictions. I just wanted to state them somewhere because I think it's generally good to state stuff publicly.)
(My future self will not necessarily make predictions similar to the ones I'm making now.)
TLDR: I don't know.
Conditional on no strong governance success that effectively prevents basically all AI progress, and conditional on no huge global catastrophe happening in the meantime:
How long until the sun (starts to) get eaten? 10th/50th/90th percentile: 3y, 12y, 37y.
How long until an AI reaches Elo 4000 o...
Here's my current list of lessons for review. Every day during my daily review, I look at the lessons in the corresponding weekday entry and the corresponding day of the month, and for each I list one example from the last week where I could've applied the lesson, and one example where I might be able to apply the lesson in the next week:
...
- Mon
- get fast feedback. break tasks down into microtasks and review after each.
- Tue
- when surprised by something, or when something took longer than expected, review in detail how you might've made progress faster.
- clarify why the progress is g
Thank you for your feedback! Feedback is great.
We can try to select for AIs that outwardly seem friendly, but given anything close to our current ignorance about their cognition, we cannot be anywhere near confident that an AI going through the intelligence explosion will be aligned to human values.
It means that we have only very little understanding of how and why AIs like ChatGPT work. We know almost nothing about what's going on inside them that lets them give useful responses. Basically all I'm saying here is that we know so little that it's hard to be co...
Here's my pitch for very smart young scientists for why "Rationality: From AI to Zombies" is worth reading:
...The book "Rationality: From AI to Zombies" is actually a large collection of blogposts, which covers a lot of lessons on how to become better at reasoning. It also has a lot of really good and useful philosophy, for example about how Bayesian updating is the deeper underlying principle of how science works.
But let me express in more detail why I think "Rationality: A-Z" is very worth reading.
Human minds are naturally bad at deducing correct beliefs/the
Here's my 230-word pitch for why existential risk from AI is an urgent priority, intended for smart people without any prior familiarity with the topic:
...Superintelligent AI may be closer than it might seem, because of intelligence explosion dynamics: once an AI becomes smart enough to design an even smarter AI, that smarter AI can design a still smarter one, probably even faster, and so on. How fast such a takeoff would be and how soon it might occur is very hard to predict, though.
We currently understand v
I have a binary distinction that is a bit different from the distinction you're drawing here. (To be clear, one might still draw another distinction like you do, but this might be relevant for your thinking.) I'll take a quick stab at explaining it here, but I'm not sure whether my notes will be sufficient. (Feel free to ask for further clarification - if so, ideally with partial paraphrases and examples of where you're unsure.)
I distinguish between objects and classes:
The meta-problem of consciousness is about explaining why people think they are conscious.
Even if we get a result where AIs invent a concept like consciousness from scratch, that would only tell us that they also think they have something we call consciousness, but not yet why they think this.
That is, unless we can somehow precisely inspect the cognitive processes that generated the consciousness concept in AIs, which on anything like the current paradigm we won't be able to.
Another way to frame it: Why would it matter that an AI invents the c...
Applications (here) start with a simple 300 word expression of interest and are open until April 15, 2025. We have plans to fund $40M in grants and have available funding for substantially more depending on application quality.
Did you consider instead committing to give out retroactive funding for research progress that seems useful?
I.e., people could apply for funding for anything done from 2025 on, and then you could actually evaluate how useful some research was, rather than needing to guess in advance how useful a project might be. And in a...
Applications (here) start with a simple 300 word expression of interest and are open until April 15, 2025. We have plans to fund $40M in grants and have available funding for substantially more depending on application quality.
Side question: How much is Openphil funding LTFF? (And why not more?)
(I recently got an email from LTFF which suggested that they are quite funding constrained. And I'd intuitively expect LTFF to be higher impact per dollar than this, though I don't really know.)
I created an Obsidian Templater template for the 5-minute version of this skill. It inserts the following list:
- how could I have thought that faster?
- recall - what are key takeaways/insights?
- trace - what substeps did I do?
- review - how could one have done it (much) faster?
- what parts were good?
- where did i have wasted motions? what mistakes did i make?
- generalize lesson - how act in future?
- what are example cases where this might be relevant?
Here's the full template so it inserts this at the right level of indentation. (You can set a short...
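Roughly, the dynamic part looks something like this (a minimal sketch rather than the exact file; it assumes Templater's `tR` output variable and the Obsidian editor API for reading the current line's indentation, and the exact nesting of the items may differ):

```javascript
<%*
// Minimal sketch (not the exact file): read the current line's indentation
// via the Obsidian editor API, then emit the review checklist relative to it
// using Templater's tR output string.
const editor = app.workspace.activeEditor?.editor;
const line = editor ? editor.getLine(editor.getCursor().line) : "";
const indent = line.match(/^\s*/)[0];
const items = [
  "- how could I have thought that faster?",
  "\t- recall - what are key takeaways/insights?",
  "\t- trace - what substeps did I do?",
  "\t- review - how could one have done it (much) faster?",
  "\t\t- what parts were good?",
  "\t\t- where did i have wasted motions? what mistakes did i make?",
  "\t- generalize lesson - how act in future?",
  "\t\t- what are example cases where this might be relevant?",
];
tR += items.map(i => indent + i).join("\n");
%>
```

Reading the current line via `getCursor()`/`getLine()` is just one way to match the surrounding indentation; a hard-coded indent also works if you always trigger it at the same nesting level.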
I now want to always think of concrete examples where a lesson might become relevant in the next week/month, instead of just reading them.
As of a couple of days ago, I have a file where I save lessons from such review exercises for reviewing them periodically.
Some are in a weekly review category and some in a monthly review category. Every day when I do my daily recall, I now also check through the lessons under the corresponding weekday and day-of-month tag.
Here's what my file currently looks like:
(I use some short codes for typing faster like "W=what", "h=how", "t=to", "w=with" and maybe some more.)
...- Mon
- [[lesson - clarify Gs on concrete examples]]
- [[lesson - de
Belief propagation seems too much of a core of AI capability to me. I'd rather place my hope on GPT7 not being all that good yet at accelerating AI research and us having significantly more time.
This just seems doomed to me. The training runs will be even more expensive, the difficulty of doing anything significant as an outsider ever-higher. If the eventual plan is to get big labs to listen to your research, then isn't it better to start early? (If you have anything significant to say, of course.)
I'd imagine it's not too hard to get >1 OOM efficiency impr...
I don’t think that. See the bottom part of the comment you’re replying to. (The part after “Here’s what I would say instead:”)
Sorry, my comment was sloppy.
Right, my point is, I don’t see any difference between “AIs that produce slop” and “weak AIs” (a.k.a. “dumb AIs”).
(I agree the way I used "sloppy" in my comment mostly meant "weak". But some other thoughts:)
So I think there are some dimensions of intelligence which are more important for solving alignment than for creating ASI. If you read planecrash, WIS and rationality training seem to me more important in ...
So the lab implements the non-solution, turns up the self-improvement dial, and by the time anybody realizes they haven’t actually solved the superintelligence alignment problem (if anybody even realizes at all), it’s already too late.
If the AI is producing slop, then why is there a self-improvement dial? Why wouldn’t its self-improvement ideas be things that sound good but don’t actually work, just as its safety ideas are?
Because it's much easier to speed up AI capabilities while being sloppy than to produce actually good alignment ideas.
If you really thi...
Thanks for providing a concrete example!
Belief propagation seems too much of a core of AI capability to me. I'd rather place my hope on GPT7 not being all that good yet at accelerating AI research and us having significantly more time.
I also think the "drowned out in the noise" isn't that realistic. You ought to be able to show some quite impressive results relative to computing power used. Though when you maybe should try to convince the AI labs of your better paradigm is going to be difficult to call. It's plausible to me we won't see signs that make us ...
Can you link me to what you mean by John's model more precisely?
If you mean John's slop-instead-of-scheming post, I agree with the "slop slightly more likely than scheming" part. I might need to reread John's post to see what the concrete suggestions for what to work on might be. Will do so tomorrow.
I'm just pessimistic that we can get any nontrivially useful alignment work out of AIs until a few months before the singularity, at least besides some math. Or like at least for the parts of the problem we are bottlenecked on.
So like I think it's valuab...
Thanks.
True, I think your characterization of tiling agents is better. But my impression was sorta that this self-trust is an important precursor for the dynamic self-modification case, where alignment properties need to be preserved through the self-modification. Yeah, I guess calling this "the AI solving alignment" is sorta confused, though maybe there's something in this direction, because the AI still does the search to try to preserve the alignment properties?
Hm I mean yeah if the current bottleneck is math instead of conceptualizing what math has to be done then ...
What kind of alignment research do you hope to speed up anyway?
For advanced-philosophy-like stuff (e.g. finding good formal representations for world models, or inventing logical induction) they don't seem anywhere remotely close to being useful.
My guess would be that for tiling agents theory they aren't useful either, but I haven't worked on it, so I'm very curious about your take here. (IIUC, to some extent the goal of tiling-agents-theory-like work was to have an AI solve its own alignment problem. Not sure how far the theory side got there and whether it could be combined with LLMs.)
Or what is your alignment hope in more concrete detail?
This argument might move some people to work on "capabilities" or to publish such work when they might not otherwise do so.
Above all, I'm interested in feedback on these ideas. The title has a question mark for a reason; this all feels conjectural to me.
My current guess:
I wouldn't expect much useful research to come from the published ideas. They're mostly just going to be used for capabilities, and it seems like a bad idea to publish such stuff.
Sure you can work on it and be infosec cautious and keep it secret. Maybe share it with a few very trusted people who m...
Due to the generosity of ARIA, we will be able to offer a refund proportional to attendance, with a full refund for completion. The cost of registration is $200, and we plan to refund $25 for each week attended, as well as the final $50 upon completion of the course. We’ll ask participants to pay the registration fee once the cohort is finalized, so no fee is required to fill out the application form below.
Wait so do we get a refund if we decide we don't want to do the course, or if we manage to complete the course?
Like is it a refund in the "get your money back if you don't like it" sense, or is it incentive to not sign up and then not complete the course?
It's the latter.
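A quick worked check of the numbers (assuming completion means attending every week, which would make this a 6-week cohort):

$$6 \times \$25 + \$50 = \$200,$$

so full attendance plus completion refunds the entire registration fee, while attending only some weeks refunds \$25 per week attended.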
Nice post!
My key takeaway: "A system is aligned to human values if it tends to generate optimized-looking stuff which is aligned to human values."
I think this is useful progress. In particular it's good to try to aim for the AI to produce some particular result in the world, rather than trying to make the AI have some goal - it grounds you in the thing you actually care about in the end.
I'd say the "... aligned to human values part" is still underspecified (and I think you at least partially agree):
Agreed that people focus a bit too much on scheming. It might be good for some people to think a bit more about the other failure modes you described, but the main thing that needs doing is very smart people making progress towards building an aligned AI, not defending against particular failure modes. (However, most people probably cannot usefully contribute to that, so maybe focusing on failure modes is still good for most people. Though in any case there's the problem that people will find proposals that very likely don't actually work, but which are easier to believe in, thereby making a stop of AI development a bit less likely.)
In general, I wish more people would make posts about books without feeling the need to do the boring parts they are uninterested in (summarizing and reviewing), and would instead just discuss the ideas they found valuable. I think this would lower the friction for such posts, resulting in more of them. I often wind up finding such thoughts and comments about non-fiction works by LWers pretty valuable. I have more of these if people are interested.
I liked this post, thanks and positive reinforcement. In case you didn't already post your other book notes, just letting you know I'd be interested.
Do we have a sense for how much of the orca brain is specialized for sonar?
I don't know.
But evolution slides functions around on the cortical surface, and (Claude tells me) association areas like the prefrontal cortex are particularly prone to this.
It's particularly bad for cetaceans. Their functional mapping looks completely different.
Thanks. Yep I agree with you, some elaboration:
(This comment assumes you at least read the basic summary of my project (or watched the intro video).)
I know of Earth Species Project (ESP) and CETI (though I only read 2 publications of ESP and none of CETI).
I don't expect them to succeed in something equivalent to decoding orca language to an extent that we could communicate with them almost as richly as they communicate among each other. (Though, e.g., if long-range sperm whale signals are a lot simpler, they might be easier to decode.)
From what I've seen, t...
Perhaps also not what you're looking for, but you could check out the Google Hash Code archive (here's an example problem). I never participated though, so I don't know whether they would make great tests. But it seems to me like general ad-hoc problem-solving capabilities are more useful in Hash Code than in other competitive programming competitions.
GPT4 summary: "Google Hash Code problems are real-world optimization and algorithmic challenges that require participants to design efficient solutions for large-scale scenarios. These problems are typically...
Maybe not what you're looking for because it's not one hard problem but more like many problems in a row, and generally I don't really know whether they are difficult enough, but you could (have someone) look into Exit games. Those are basically escape rooms to go. I'd filter for age 16+ to hopefully filter for the hard ones, though maybe you'd want to separately look up which ones are particularly hard.
I did one or two when I was like 15 or 16 years old, and recently remembered them and I want to try some more for fun (and maybe also introspection), t...
I hope I will get around to rereading the post and editing this comment to write a proper review, but I'm pretty busy, so in case I don't, I'll leave this very shitty review here for now.
I think this is probably my favorite post from 2023. Read the post summary to see what it's about.
I don't remember a lot of the details from the post and so am not sure whether I agree with everything, but what I can say is:
Another thought (though I don't actually have any experience with this): mostly doing attentive silent listening/observing might also be useful for learning how the other person does research.
Like, if it seems boring to just observe and occasionally say something, you could try to better predict how the person will think, or something like that.
The main reason I'm interested in orcas is that they have 43 billion cortical neurons, whereas the 2 land animals with the most cortical neurons (where we have optical-fractionator measurements) are humans and chimpanzees with 21 billion and 7.4 billion respectively. See: https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons#Forebrain_(cerebrum_or_pallium)_only
Pilot whales are the other species I'd consider for experiments - they have 37.2 billion cortical neurons.
For sperm whales we don't have data on neuron densities (though they do h...
Cool, thanks, that was useful.
(I'm creating a language for communicating with orcas, so the phonemes will be relatively impractical for humans. Otherwise the main criteria are a simple parsing structure and easy learnability. (It doesn't need to be super perfect - the perhaps bigger challenge is to figure out how to teach abstract concepts without being able to bootstrap from an existing language.) Maybe I'll eventually create a great rationalist language for thinking effectively, but not right now.)
Is there some resource where I can quickly learn the basics...
Thanks!
But most likely, this will all be irrelevant for orcas. Their languages may be regular or irregular, with fixed or random word order, or maybe with some categories that do not exist in human languages.
Yeah, I was not asking because of decoding orca language but because I want inspiration for how to create the grammar for the language I'll construct. Esperanto/Ido also because I'm interested in how well word-compositionality is structured there and whether it is a decent attempt at outlining the basic concepts that other concepts are composites of.
Currently we basically don't have any datasets where it's labeled which orca said what. When I listen to recordings, I cannot distinguish voices, though idk, it's possible that people who have listened a lot more can. I think just unsupervised voice clustering would probably not work very accurately. I'd guess it's probably possible to get data on who said what by using an array of hydrophones to infer the location of the sound, but we need very accurate position inference because different orcas are often just 1-10 m from each other, and for this we mig...
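To get a rough sense of the accuracy needed (a back-of-the-envelope sketch, not a worked-out design): with hydrophones at known positions $p_i$ and a sound speed of roughly $c \approx 1500\,\mathrm{m/s}$ in seawater, the measured arrival-time differences constrain the source position $x$ via

$$\Delta t_{ij} = \frac{\lVert x - p_i \rVert - \lVert x - p_j \rVert}{c},$$

so localizing a caller to within ~1 m means resolving the time differences to roughly $1\,\mathrm{m}/c \approx 0.7\,\mathrm{ms}$ or better, before accounting for reflections and array geometry.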
Thanks for your thoughts!
I don't know what you'd consider enough recordings, and I don't know how much decent data we have.
I think the biggest datasets of orca vocalizations are the Orchive and the Orcasound archive. I think they are each multiple terabytes big (of audio recordings), but I think most of it (80-99.9% (?)) is probably crap where there might just be a brief, very faint mammal vocalization in the distance.
We also don't have a way to see which orca said what.
Also orcas from different regions have different languages, and orcas from different p...
Thanks.
I think LTFF would take way too long to get back to me though. (Also they might be too busy to engage deeply enough to get past the "seems crazy" barrier and see it's at least worth trying.)
Also btw, I mostly included this in case someone with significant amounts of money reads this, not because I want to scrape it together from small donations. I expect my best chances of getting funding come from reaching out to 2-3 people I know (after I know more about how much money I need), but this is also decently likely to fail. If that fails I'll maybe try Manifund, though I'd guess I don't have good chances there either, but idk.
Actually out of curiosity, why 4x? (And what exactly do you mean by "2x larger"?) (And is this for a naive algorithm which can be improved upon or a tight constraint?)
Thanks for pointing that out! I will tell my friends to make sure they actually get good data for the metabolic cost and not just use cortical neuron count as a proxy if they cannot find something good.
(Or is there also another point you wanted to make?) And yeah it's actually also an argument for why orcas might be less intelligent (if they sorta use their neurons less often). Thanks.
My guess is that there probably aren't a lot of simple mutations which just increase intelligence without increasing cortical neuron count. (Though probably simple mutations can shift the balance between different sub-dimensions of intelligence as constrained through cortical neuron count.) (Also of course any particular species has a lot of deleterious mutations going around and getting rid of those may often just increase intelligence, but I'm talking about intelligence-increasing changes to the base genome.)
But there could be complex adaptations that ar...
An argument against orcas being more intelligent than humans runs thus: Orcas are much bigger than humans, so the fraction of metabolic cost their brain consumes is smaller than in humans. Thus it took more selection pressure for humans to evolve 21 billion cortical neurons than for orcas to evolve 43 billion.[1] Thus humans might have other intelligence-increasing mutations that orcas didn't evolve yet.
So the question here is "how much does scale matter vs other adaptations". Luckily, we can get some evidence on that by looking at other species and ratin...
Another thought:
In what animals would I on priors expect intelligence to evolve?
AFAIK, orcas are the largest animals that use collaborative hunting techniques.[1] That plausibly puts them second be...
The main pieces I remember were: orcas already dominating the planet (like humans do), and large sea creatures going extinct due to orcas (similar to how humans drove several species extinct). (Megalodon? Probably extinct for different reasons, so weak evidence against. Most other large whales are still around.)
To clarify for other readers: I do not necessarily endorse that this is what we would expect if orcas were smart.
(Also I read somewhere that apparently chimpanzees sometimes/rarely can experience menopause in captivity.)
If the species is already dominating the environment then the pressure from the first component compared to the second decreases.
I agree with this. However, I don't think humans had nearly sufficient slack for most of history. I don't think they dominated the environment until 20,000 years ago[1] or so, and I think most improvements in intelligence came from earlier.
...That's why I'm attributing the level of human intelligence in large part to runaway sexual selection. Without it, as soon as interspecies competition became the most important for re
Seems totally unrelated to my post but whatever:
My p(this branch of humanity won't fulfill the promise of the night sky) is actually more like 0.82 or sth, idk. (I'm even lower on p(everyone will die), because there might be superintelligences in other branches that acausally trade to save the existing lives, though I didn't think about it carefully.)
I'm chatting 1 hour every 2 weeks with Erik Jenner; we usually talk about AI safety stuff. I also chat about 1 hour every 2 weeks with a person who has sorta similar views to me. Otherwise I currently don't talk much to people about AI risk.