I get the impression from talking to people who work professionally on reducing AI x-risk (broadly construed, e.g. including doing ops for FLI) that less than half of them could give an accurate and logically sound model of where AI x-risk comes from and why it is good that they do the work they do (bonus points if they can compare it to other work they might be doing instead). I'm generally a bit disappointed by this, because to me it doesn't seem that hard to get every professional knowledgeable about the basics, and it seems worthwhile as more people ...
(off topic to the OP, but on topic to Jan bringing up ALERT)
To what extent do you believe Sentinel fulfills what you wanted to do with ALERT? Their emergency response team is pretty small rn. Would you recommend funders support that project or a new ALERT?
Appreciate the photos and final video, as they also make this informative post more enjoyable to follow.
EpochAI seem to be doing a lot of work that'll accelerate AI capabilities research and development (e.g. informing investors and policy makers that yes, AI is a huge economic deal and here are the bottlenecks you should work around, and building capabilities benchmarks to optimize for). Under the common-around-LW assumption that no one could align AGI at this point, they are, by these means, increasing AI catastrophic and existential risk.
At a glance they also don't seem to be making AI x-risk-reducing moves, like using their platform to mention that there are r...
I'm talking from a personal perspective here as Epoch director.
This seems fine to me (you can see some reasons I like Epoch here). My understanding is that most Epoch staff are concerned about AI risk, though they tend toward longer timelines and maybe lower p(doom) than many in the community, and they aren't exactly trying to keep this secret.
Your argument rests on an implicit premise that Epoch talking about "AI is risky" in their podcast is important, e.g. because it'd change the minds of some listeners. This seems fairly unlikely to me - it seems like a very inside-baseball podcast, mostly listened to by people already aware ...
Congrats on your successes, and thank you for publishing this impact report.
It leaves me unsatisfied regarding cost-effectiveness, though. With no idea of how much money was invested in this project to get this outcome, I don't know if ARENA is cost-effective compared to other training programs and counterfactual opportunities. Would you mind sharing at least something about the amount of funding this got?
Re
...Still, it is also positive if ARENA can help participants who want to pursue a career transition test their fit for alignment engineeri
I don't actually think your post was hostile, but I think I get where deepthoughtlife is coming from. At the least, I can share how I felt reading this post and point out why, since you seem keen on avoiding the negative side. Btw, I don't think you can avoid causing any frustration in readers (they are too diverse), so don't worry too much about it either.
The title of the piece is strongly worded and there's no epistemic status disclaimer to state this is exploratory, so I actually came in expecting much stronger arguments. Your post is good as an expos...
Putting this short rant here for no particularly good reason, but I dislike when people claim constraints here or there when I guess their intended meaning is only that "the derivative with respect to that input is higher than for the other inputs".
On factory floors there are hard constraints: throughput is limited by the slowest machine (when everything has to pass through it). The AI safety world is obviously not like that. Increase funding and more work gets done; increase talent and more work gets done. Neither is a hard constraint.
If I'm r...
Interesting thoughts, ty.
A difficulty for common understanding here is that you're talking about "good" or "bad" paragraphs in absolute terms, but you didn't define "good" or "bad" by some objective standard, so you're relying on your own sense of what's good or bad. If you were defining good or bad relatively, you'd look at 100 paragraphs and post the worst 10 as bad. I'd be interested in seeing the worst paragraphs you found, some 50th-percentile ones, and the best; then I could tell you whether I share your absolute standards.
Enjoyed this post.
Fyi, from the front page I just hovered over this post "The shallow bench" and was immediately spoiled on Project Hail Mary (which I had started listening to, but hadn't gotten far into). Maybe add a spoiler tag or warning directly after the title?
Without taking away from the importance of getting the default right, and with some deliberate daring to feature-creep, I think adding a customization feature (select colour) in personal profiles is relatively low effort and maintenance, so it would solve the accessibility problem.
There's tacit knowledge in Bay rationalist conversation norms that I'm discovering and thinking about; here's an observation and a related thought. (I put the example after the generalisation because that's my preferred style; feel free to read in the other order.)
Willingness to argue righteously and hash things out to the end, repeated over many conversations, makes it more salient when you're pursuing a dead-end argument. This salience can inspire you to argue more concisely and to the point over time.
Going to the end of things generates g...
I don't strongly disagree, but do weakly disagree on some points, so I guess I'll answer.
Re the first: if you buy into automated alignment work by human-level AGI, then trying to align ASI now seems less worth it. The strongest counterargument to this I see is that "human-level AGI" is impossible to get with our current understanding, as it will be superhuman in some things and weirdly bad at others.
Re the second: disagreements might be nitpicking on "few other approaches" vs "few currently pursued approaches". There are probably a bunch of things that would allow fu...
I don't think your second footnote sufficiently addresses the large variance in 3D visualization abilities (note that I do say visualization, which includes seeing a 2D video in your mind of a 3D object and manipulating it smoothly), and overall I'm not sure what you're getting at if you don't ground your post in specific predictions about what you expect people can and cannot do thanks to their ability to visualize in 3D.
You might be ~conceptually right that our eyes see "2D" and add depth, but *um ackshually*, two eyes each receiving 2D data means yo...
I'll give fake internet points to whoever actually follows the instructions and posts photographic proof.
The naming might be confusing because "pivotal act" sounds like a one-time action, but in most cases getting to a stable world without any threat from AI requires constant pivotal processes. This makes almost all the destructive approaches moot (and they're probably already bad on ethical grounds and for many other reasons already discussed) because you'll make yourself a pariah.
The most promising avenue for a pivotal act/pivotal process that I know of is doing good research so that ASI risks are known and proven, doing good outreach and education so most world leaders and decision makers are well aware of this, and helping set up good governance worldwide to monitor and limit the development of AGI and ASI until we can control it.
I recently played Outer Wilds and Subnautica, and the exercise I recommend for both of these games is: get to the end of the game without ever failing.
In Subnautica, failing means dying once; in Outer Wilds it's a spoiler to describe what failing is (successfully getting to the end could certainly be argued to be a fail).
I failed in both of these. I played Outer Wilds first and was surprised by my failure, which inspired me to play Subnautica without dying. I got pretty far, but also died from a mix of one unexpected game mechanic, careless measurement of another mechanic, and a lack of redundancy in my contingency plans.
Oh wow, makes sense. It felt weird that you'd spend so much time on posts, yet if you didn't spend much time it would mean you write at least as fast as Scott Alexander. Well, thanks for putting in the work. I probably don't publish much because I want good posts not to take much work, but it's reassuring to hear that it's normal that they do.
(aside: I generally like your posts' scope and clarity; would you mind saying how long it takes you to write something of this length?)
Self-modeling is a really important skill, and you can measure how good you are at it by writing predictions about yourself. A notably important one for people who have difficulty with motivation is predicting your own motivation: will you be motivated to do X in situation Y?
If you can answer that one generally, you can plan to actually do anything you could theoretically do, using the following algorithm: from current situation A, to achieve wanted outcome Z, find a predecessor situation Y from which you'll be motivated to get to Z (e.g. have wri...
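For concreteness, here's a minimal sketch in Python of the backward-chaining idea just described. The `candidate_predecessors` and `motivated_to_reach` functions are hypothetical stand-ins for your own self-predictions, not part of any library.

```python
# Sketch of backward chaining from a goal situation, under the assumption that
# you can predict whether you'd be motivated to move from one situation to another.

def plan_backwards(current, goal, candidate_predecessors, motivated_to_reach, max_depth=10):
    """Return a chain [current, ..., goal] where, from each step,
    you predict you'll be motivated to move to the next one, or None if not found."""
    if motivated_to_reach(current, goal):
        return [current, goal]
    if max_depth == 0:
        return None
    # Find an intermediate situation Y from which you'd be motivated to reach the goal,
    # then recursively plan how to get from the current situation to Y.
    for y in candidate_predecessors(goal):
        if motivated_to_reach(y, goal):
            sub_plan = plan_backwards(current, y, candidate_predecessors,
                                      motivated_to_reach, max_depth - 1)
            if sub_plan is not None:
                return sub_plan + [goal]
    return None
```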
Appreciate the highlighting of identity as this important/crucial self-fulfilling prophecy; I use that frame a lot.
What does the title mean? Since they all disagree, I don't see one as being more of a minority than the others.
Nice talk!
When you talk about the most important interventions for the three scenarios, I want to highlight that in the nationalization case, if you're a citizen of one of the countries nationalizing AI, you can also work for the government and be on the teams working on and advocating for safe AI.
In my case I should have measurable results like higher salary, higher life satisfaction, more activity, and more productivity as measured by myself and friends/flatmates. I was very low, so it'll be easy to see progress. The difficulty was finding something that would work, not measuring whether it does.
Some people have short AI timelines based on inner models that don't communicate well. They might say "I think if company X trains according to new technique Y it should scale well and lead to AGI, and I expect them to use technique Y in the next few years", and the reasons they think technique Y should work are some kind of deep understanding built from years of reading ML papers, which isn't particularly easy to transmit or debate.
In those cases, I want to avoid going into details and arguing directly, but would suggest that they use their deep knowl...
Thank you for sharing; it really helps to pile up these stories (and it's nice to have some trust they're real, which is more difficult to get from Reddit - on which note, are there non-doxxing receipts you can show for this story being true? I have no reason to doubt you in particular, but I guess it's good hygiene on the internet to ask for evidence).
It also makes me want to share a bit of my story. I read The Mind Illuminated, and did only small amounts of meditation, yet the framing the book offers has been changing my thinking and motivational systems. There aren't ma...
Might be good to have a dialogue format with other people who agree/disagree to flesh out scenarios and countermeasures
Hi, I'm currently evaluating the cost-effectiveness of various projects and would be interested in knowing, if you're willing to disclose, approximately how much this program costs MATS in total. By this I mean the summer cohort, including the ops before and after necessary for it to happen, but not counting the extension.
"It's true that we don't want women to be driven off by a bunch of awkward men asking them out, but if we make everyone read a document that says 'Don't ask a woman out the first time you meet her', then we'll immediately give the impression that we have a problem with men awkwardly asking women out too much — which will put women off anyway."
This seems like a weak response to me, at best only defensible if you consider yourself to be on the margin, without thought for long-term growth and your ability to clarify intentions (you have more than 3 words when intera...
Note that "existential" is a term of art distinct from "extinction".
The Precipice cites Bostrom and defines it thus:
"An existential catastrophe is the destruction of humanity’s longterm potential.
An existential risk is a risk that threatens the destruction of humanity’s longterm potential."
Disempowerment is generally considered an existential risk in the literature.
I participated in the previous edition of AISC and found it very valuable to my involvement in AI safety. I acquired knowledge (on standards and the standards process), and gained experience and contacts. I appreciate how much coordination AISC enables, with groups forming that allow many to have their first hands-on experience and step up their involvement.
Thanks, and thank you for this post in the first place!
Jonathan Claybrough
Actually no, I think the project lead here is jonachro@gmail.com which I guess sounds a bit like me, but isn't me ^^
Would be up for this project. As is, I downvoted Trevor's post for how rambly and repetitive it is. There's a nugget of an idea, that AI can be used for psychological/information warfare, which I was interested in learning about, but the post doesn't seem to have much substantive argument to it, so I'd be interested in someone doing a much shorter version that argues its case with some sources.
It's a nice pic and moment; I very much like this comic and the original scene. It might exaggerate a trait (here by having the girl be particularly young) for comedic effect, but the Hogfather seems right.
I think I was around 9 when I got my first sword, around 10 for a sharp knife. I have a scar in my left palm from stabbing myself with that sharp knife as a child while whittling wood for a bow. It hurt for a bit, and I learned to whittle away from me or do so more carefully. I'm pretty sure my life is better for it and (from having this nice story attached to it) I like the scar.
This story still presents the endless conundrum of avoiding hurt versus letting people learn and gain skills.
Assuming the world stays mostly the same as nowadays, by the time your children are parenting, would they have the skills to notice sharp corners if they never experienced them?
I think my intuitive approach here would be to put up some not-too-soft padding (which is effectively close to what you did; it's still an unpleasant experience to hit that even with the cloth).
What's missing is how to teach against existential risks. There...
“'You can't give her that!' she screamed. 'It's not safe!'
IT'S A SWORD, said the Hogfather. THEY'RE NOT MEANT TO BE SAFE.
'She's a child!' shouted Crumley.
IT'S EDUCATIONAL.
'What if she cuts herself?'
THAT WILL BE AN IMPORTANT LESSON.”
― Terry Pratchett, Hogfather
https://www.reddit.com/r/hellsomememes/comments/do8xcv/an_important_lesson/
Are people losing the ability to use and communicate in previous ontologies after getting Insight from meditation? (Or maybe they never had the understanding I'm expecting of them?) Should I be worried myself, in my practice of meditation?
Today I reread Kensho by @Valentine, which presents Looking, and the ensuing conversation in the comments between @Said Achmiz and @dsatan, where Said asks for concrete benefits we can observe and mostly fails to get them. I also noticed interesting comments by @Ruby, who in contrast was still able to communi...
I don't know how to answer the general query. But I can say something maybe helpful about that Kenshō post and "Looking":
The insight was too new. I wrote the post just 4 months after the insight. I think I could answer questions like this way, way more clearly today.
(…although my experience with Said in particular has always been very challenging. I don't know that I could help him any better today than I could in 2018. Maybe? He seems to use a mind type that I've never found a bridge for.)
The issue is that the skill needed to convey an insight or skill is...
I focused my answer on the morally charged side, not the emotional one. The quoted statement said A and B, so as long as B is mostly true for vegans, A and B is mostly true for (a sub-group of) vegans.
I'd agree with the characterization "it’s deeply emotionally and morally charged for one side in a conversation, and often emotional to the other", because most people don't have small identities and do indeed feel attacked by others behaving differently.
Did you know about "by default, GPTs think in plain sight"?
It doesn't explicitly talk about agentized GPTs, but it discusses the impact this has on GPTs for AGI, how it affects the risks, and what we should do about it (e.g. maybe RLHF is dangerous).
To avoid being misinterpreted: I didn't say I'm sure it's more the format than the content that's causing the upvotes (open question), nor that this post doesn't meet the absolute quality bar that normally warrants 100+ upvotes (to each reader their opinion).
If you're open to discussing this at the object level, I can point out concrete disagreements with the content. Most importantly, this should not be seen as a paradigm shift, because it does not invalidate any of the previous threat models - it would only do so if it made it impossible to build AGI any other way. I als...
You can read "reward is not the optimization target" for why a GPT system probably won't be goal oriented to become the best at predicting tokens, and thus wouldn't do the things you suggested (capturing humans). The way we train AI matters for what their behaviours look like, and text transformers trained on prediction loss seem to behave more like Simulators. This doesn't make them not dangerous, as they could be prompted to simulate misaligned agents (by misuses or accident), or have inner misaligned mesa-optimisers.
I've linked some good resources...
Quick meta comment to express that I'm uncertain posting things in lists of 10 is a good direction. The advantages might be real: easy to post, quick feedback, easy interaction, etc.
But the main disadvantage is that this comparatively drowns out other, better posts (with more thought and value in them). I'm unsure whether the content of this post was importantly missing from the conversation (for many readers) and that's why it got upvoted so fast, or whether it's largely the format... Even if this post isn't bad (and I'd argue it is, for the suggestions it promotes...
First, a quick response on your dead man's switch proposal: I'd generally say I support something in that direction. You can find existing literature considering the subject and expanding in different directions in the "multi level boxing" paper by Alexey Turchin (https://philpapers.org/rec/TURCTT). I think you'll find it interesting given your proposal, and it might give a better idea of what the state of the art is on proposals (though we don't have any implementation afaik).
Back to "why are the predicted probabilities so extreme that for most objective...
I watched the video, and appreciate that he seems to know the literature quite well and has thought about this a fair bit - he gave a really good introduction to some of the known problems.
This particular video doesn't go into much detail on his proposal, and I'd have to read his papers to delve further - this seems worthwhile so I'll add some to my reading list.
I can still point out the biggest ways in which I see him being overconfident:
Writing down predictions. The main caveat is that these predictions are predictions about how the author will resolve these questions, not my beliefs about how these techniques will work in the future. I am pretty confident at this stage that value editing can work very well in LLMs when we figure it out, but not so much that the first try will have panned out.
I don't think reasoning about others' beliefs and thoughts is helping you be correct about the world here. Can you instead try to engage with the arguments themselves and point out at what step you don't see a concrete way for that to happen?
You don't show much sign of having read the article, so I'll copy-paste the part that explains how AIs start acting in physical space.
...In this scenario, the AIs face a challenge: if it becomes obvious to everyone that they are trying to defeat humanity, humans could attack or shut down a few concent
I think this post would benefit from being more explicit about its target. This problem concerns AGI labs and their employees on one hand, and anyone trying to build a solution to alignment/AI safety on the other.
By narrowing the scope to the labs, we can better evaluate the proposed solutions (for example, to improve decision-making we'll need to influence the decision makers therein), make them more focused (to the point of being lab-specific, analyzing each lab's pressures), and think of new solutions (inoculating ourselves/other decision makers on AI a...
Thanks for the reply!
The main reason I didn't understand (despite some things being listed) is that I assumed none of that was happening at Lightcone (because I guessed you would filter out EAs with bad takes in favor of rationalists, for example). The fact that some people in EA (a huge, broad community) are probably wrong about some things didn't seem to be an argument that Lightcone Offices would be ineffective, as (AFAIK) you could filter people at your discretion.
More specifically, I had no idea "a huge component of the Lightcone Offices was caus...
The fact that some people in EA (a huge, broad community) are probably wrong about some things didn't seem to be an argument that Lightcone Offices would be ineffective, as (AFAIK) you could filter people at your discretion.
I mean, no, we were specifically trying to support the EA community; we do not get to unilaterally decide who is part of the community. People I don't personally have much respect for, but who are members of the EA community putting in the work to be considered members in good standing, definitely get to pass through. I'm not goin...
...I don't think cost had that much to do with the decision, I expect that Open Philanthropy thought it was worth the money and would have been willing to continue funding at this price point.
In general I think the correct response to uncertainty is not half-speed. In my opinion it was the right call to spend this amount of funding on the office for the last ~6 months of its existence even when we thought we'd likely do something quite different afterwards, because it was still marginally worth doing it and the cost-effectiveness calculations for the us
I've been perplexed multiple times as to what the past events behind this kind of take (over 7 years ago, the EA/rationality community's influence probably accelerated OpenAI's creation) have to do with today's shutting down of the offices.
Are there current, present-day things going on in the EA and rationality community which you think warrant suspecting them of being incredibly net negative (causing worse worlds, conditioned on the current setup)? Things done in the last 6 months? At Lightcone Offices? (Though I'd appreciate specific examp...
PSA - at least as of March 2024, the way to create a Dialogue is by navigating to someone else's profile and clicking the "Dialogue" option that appears near the right, next to the option to message someone.