All of steven0461's Comments + Replies

As I understand it, the Metaculus crowd forecast performs as well as it does (relative to individual predictors) in part because it gives greater weight to more recent predictions. If "superhuman" just means "superhumanly up-to-date on the news", it's less impressive for an AI to reach that level if it's also up-to-date on the news when its predictions are collected. (But to be confident that this point applies, I'd have to know the details of the research better.)

"Broekveg" should be "Broekweg"

partly as a result of other projects like the Existential Risk Persuasion Tournament (conducted by the Forecasting Research Institute), I now think of it as a data-point that “superforecasters as a whole generally come to lower numbers than I do on AI risk, even after engaging in some depth with the arguments.”

I participated in the Existential Risk Persuasion Tournament and I disagree that most superforecasters in that tournament engaged in any depth with the arguments. I also disagree with the phrase "even after arguing about it" - barely any arguing h... (read more)

4Joern Stoehler
Seconded for whatever group I participated in.

Thanks, yes, this is a helpful type of feedback. We'll think about how to make that section make more sense without background knowledge. The site is aimed at all audiences, and this means we'll have to navigate tradeoffs about text leaving gaps in justifying claims vs. being too long vs. not having enough scope to be an overview. In this case, it does look like we could make the tradeoff on the side of adding a bit more text and links. Your point about the glossary sounds reasonable and I'll pass it along. (I guess the tradeoff there is people might see an unexplained term and not realize that an earlier instance of it had a glossary link.)

You're right that it's confusing, and we've been planning to change how collapsing and expanding works. I don't think specifics have been decided on yet; I'll pass your ideas along.

I don't think there should be "random" tabs, unless you mean the ones that appear from the "show more questions" option at the bottom. In some cases, the content of child questions may not relate in an obvious way to the content of their parent question. Is that what you mean? If questions are appearing despite not 1) being linked anywhere below "Related" in the doc correspondin... (read more)

Quoting from our Manifund application:

We have received around $46k from SHfHS and $54k from LTFF, both for running content writing fellowships. We have been offered a $75k speculation grant from Lightspeed Grants for an additional fellowship, and made a larger application to them for the dev team which has not been accepted. We have also recently made an application to Open Philanthropy.

if there's interest in finding a place for a few people to cowork on this in Berkeley, please let me know

Thanks, I made a note on the doc for that entry and we'll update it.

Traffic is pretty low currently, but we've been improving the site during the distillation fellowships and we're hoping to make more of a real launch soon. And yes, people are working on a Stampy chatbot. (The current early prototype isn't finetuned on Stampy's Q&A but searches the alignment literature and passes things to a GPT context window.)

Yes, but we decided to reschedule it before making the announcement. Apologies to anyone who found the event in some other way and was planning on it being around the 11th; if Aug 25-27 doesn't work for you, note that there's still the option to participate early.

Since somebody was wondering if it's still possible to participate without having signed up through alignmentjam.com:

Yes, people are definitely still welcome to participate today and tomorrow, and are invited to head over to Discord to get up to speed.

Stampy's AI Safety Info is a little like that in that it has 1) pre-written answers, 2) a chatbot under very active development, and 3) a link to a Discord with people who are often willing to explain things. But it could probably be more like that in some ways, e.g. if more people who were willing to explain things were habitually in the Discord.

Also, I plan to post the new monthly basic AI safety questions open thread today (edit: here), which is also a little like that.

Anonymous #7 asks:

I am familiar with the concept of a utility function, which assigns numbers to possible world states and considers larger numbers to be better. However, I am unsure how to apply this function in order to make decisions that take time into account. For example, we may be able to achieve a world with higher utility over a longer period of time, or a world with lower utility but in a shorter amount of time.

1Multicore
When people calculate utility they often use exponential discounting over time. If, for example, your discount factor is 0.99 per year, it means that getting something in one year is only 99% as good as getting it now, getting it in two years is only 99% as good as getting it in one year, etc. Getting it in 100 years would be discounted to 0.99^100 ≈ 36.6% of the value of getting it now.
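A minimal sketch of the exponential discounting described in this reply, in case a worked version helps. The function name `discounted_value` and the sample horizons are illustrative assumptions, not something from the thread; only the 0.99 discount factor and the 100-year horizon come from the comment.

```python
def discounted_value(value: float, discount_factor: float, years: int) -> float:
    """Present value of receiving `value` after `years` years,
    under exponential discounting with a per-year `discount_factor`."""
    return value * discount_factor ** years


if __name__ == "__main__":
    # Illustrative horizons; 0.99 is the discount factor used in the reply.
    for years in (0, 1, 2, 100):
        share = discounted_value(1.0, 0.99, years)
        print(f"{years:>3} years: {share:.3f} of the undiscounted value")
```

Under this scheme, comparing "more utility later" against "less utility sooner" comes down to comparing the two discounted values, so the answer depends heavily on the discount factor chosen.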

Anonymous #6 asks:

Why hasn't an alien superintelligence within our light cone already killed us?

3gilch
There probably isn't one in our past light cone, or we'd have noticed them by now.
3Seth Herd
I've heard two theories, and (maybe) created another. One is that there isn't one in our light cone. Arguments like dissolving the Fermi paradox (name at least somewhat wrong) and the frequency of nova and supernova events sterilizing planets that aren't on the galactic rim are considered pretty strong, I think. The other one I've heard is the dark forest hypothesis. In that hypothesis, an advanced culture doesn't send out signals or probes to be found. Instead it hides, to prevent or delay other potentially hostile civilizations (or AGIs) from finding it. This is somewhat compatible with aligned superintelligences that serve their civilization's desires. Adding to the plausibility of this hypothesis is the idea that an advanced culture might not really be interested in colonizing the galaxy. We or they might prefer to mostly live in simulation, possibly with a sped-up subjective timeline. Moving away from that would be abandoning most of your civilization for very long subjective times, with the lightspeed delays. And it might be forbidden as dangerous, by potentially leading unknown hostiles back to your civilization's home world. The last, my own (AFAIK), is that they are here. They are aligned to their civilization and not hostile to ours. They are monitoring our attempts to break out of our earthly chrysalis by creating our own AGI. If it is unaligned, they will destroy it before it becomes a threat. They have not revealed themselves yet based on some variant of the Prime Directive, or else the dark forest hypothesis: don't go showing yourself and leading hostiles home. In this scenario, I suppose we're also being staked out by a resource-stingy hostile AGI, hoping that some friendly civilization reveals itself by contacting us or dramatically intervening. Obviously I haven't thought this all the way through, but there are some possibilities for you.

Anonymous #5 asks:

How can programmers build something and not understand its inner workings? Are they closer to biologist cross-breeders than to car designers?

4faul_sname
In order to predict the inner workings of a language model well enough to understand the outputs, you not only need to know the structure of the model, but also the weights and how they interact. It is very hard to do that without a deep understanding of the training data, and so effectively predicting what the model will do requires understanding both the model and the world the model was trained on.

Here is a concrete example: Let's say I have two functions, defined as follows:

    import random

    words = []

    def do_training(n):
        for i in range(n):
            word = input('Please enter a word: ')
            words.append(word)

    def do_inference(n):
        output = []
        for i in range(n):
            word = random.choice(words)
            output.append(word)
        return output

If I call do_training(100) and then hand the computer to you for you to put 100 words into, and you then handed the computer back to me (and cleared the screen), I would be able to tell you that do_inference(100) would spit out 100 words pulled from some distribution, but I wouldn't be able to tell you what distribution that is without seeing the training data. See this post for a more in-depth exploration of this idea.
1Linch
I personally think the cross-breeder analogy is pretty reasonable for modern ML systems.
4gilch
Sounds like you haven't done much programming. It's hard enough to understand the code one wrote oneself six months ago. (Or indeed, why the thing I wrote five minutes ago isn't behaving as expected.) Just because I wrote it, doesn't mean I memorized it. Understanding what someone else wrote is usually much harder, especially if they wrote it poorly, or in an unfamiliar language. A machine learning system is even harder to understand than that. I'm sure there are some who understand in great detail what the human-written parts of the algorithm do. But to get anything useful out of a machine learning system, it needs to learn. You apply it to an enormous amount of data, and in the end, what it's learned amounts to possibly gigabytes of inscrutable matrices of floating-point numbers. On paper, a gigabyte is about 4 million pages of text. That is far larger than the human-written source code that generated it, which could typically fit in a small book. How that works is anyone's guess. Reading this would be like trying to read someone's mind by examining their brain under a microscope. Maybe it's possible in principle, but don't expect a human to be able to do it. We'd need better tools. That's "interpretability research". There are approaches to machine learning that are indeed closer to cross breeding than designing cars (genetic algorithms), but the current paradigm in vogue is based on neural networks, kind of an artificial brain made of virtual neurons.
1francodanussi
It is true that programmers sometimes build things ignoring the underlying wiring of the systems they are using. But programmers in general create things relying on tools that were thoroughly tested. Besides that, they are builders, doers, not academics. Think of really good guitar players: they probably don't understand how sounds propagate through matter, but they can play their instrument beautifully.

Anonymous #4 asks:

How large is the space of possible minds? How was its size calculated? Why does EY think that human-like minds do not fill most of this space? What is the evidence for it? What is the possible evidence against "Mind Design Space is giant and human-like minds are a tiny dot in it"?

Anonymous #3 asks:

Can AIs be anything but utility maximisers? Most of the existing programs are something like finite-steps-executors (like Witcher 3 and calculator). So what's the difference?

1mruwnik
This seems to be mixing two topics. Existing programs are more or less a set of steps to execute. A glorified recipe. The set of steps can be very complicated, and have conditionals etc., but you can sort of view them that way. Like a car rolling down a hill, it follows specific rules. An AI is (would be?) fundamentally different in that it's working out what steps to follow in order to achieve its goal, rather than working towards its goal by following prepared steps. So continuing the car analogy, it's like a car driving uphill, where it's working to forge a path against gravity. An AI doesn't have to be a utility maximiser. If it has a single coherent utility function (pretty much a goal), then it will probably be a utility maximiser. But that's by no means the only way of making them. LLMs don't seem to be utility maximisers.

Anonymous #2 asks:

A footnote in 'Planning for AGI and beyond' says "Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination" - why do shorter timelines seem more amenable to coordination?

2gilch
My current feeling is that the opposite, long timelines and hard takeoff, has the best chance of going well. The main advantage of short timelines is that it makes an immediately-fatal hard takeoff less likely, as there is presumably less overhang now than in the future. It perhaps also reduces the number of players, as presumably it's easier to join the game as tech improves, so there may never be fewer than there are now. It also has the advantage of maybe saving the lives of those too old to make it to a longer timeline. However, I think the overhang is already dangerously large, and probably was years ago, so I don't think this is helping (probably). The main advantage of a soft takeoff is that we might be able to get feedback and steer it as the takeoff happens, perhaps reducing the risk of a critical error. It also increases the chances of a multipolar scenario, where there is an economy of competing AIs. If we don't like some of the gods we build, perhaps others will be more friendly, or will at least be able to stalemate the bad ones before they kill everyone. However, I think a multipolar scenario (while unlikely to last even in a soft takeoff) is very dangerous. I don't think the long-term incentives are favorable to human survival, for two reasons: First is Bostrom's Black Marble scenario (he's also called them "black balls", but that already means something else): Every new technology discovered has a chance of destroying us, especially if we lack the coordination to abstain from using it. In a multipolar world, we lack that coordination. Hostile AIs may recklessly pursue dangerous research or threaten doomsday to blackmail the world into getting what they want, and it is game-theoretically advantageous for them to do this in such a way that they provably can't change their minds and not destroy the world if we call the bluff (i.e. defiantly rip off the steering wheel in a game of chicken.) Second, we'll eventually fall into Malthusian/Molochian trap
3steven0461
I don't know why they think so, but here are some people speculating.

Anonymous #1 asks:

This one is not technical: now that we live in a world in which people have access to systems like ChatGPT, how should I consider any of my career choices, primarily in the context of a computer technician? I'm not a hard worker, and I consider my intelligence to be just a little above average, so I'm not going to pretend that I'm going to become a systems analyst or software engineer, but now programming and content creation are starting to be automated more and more, so how should I update my decisions based on that?

Sure, this qu

... (read more)
4gilch
This field is evolving so quickly that it's hard to make recommendations. In the current regime (starting approximately last November), prompt engineering is a valuable skill that can multiply your effectiveness. (In my estimation, from my usage so far, perhaps by a factor of 7.) Learn how to talk to these things. Sign up for Bing and/or ChatGPT. There are a lot of tricks. This is at least as important as learning how to use a search engine. But how long will the current regime last? Until ChatGPT-5? Six months? A year? Maybe these prompt engineering skills will then be obsolete. Maybe you'll have a better chance of picking up the next skill if you learn the current one, but it's hard to say. And this is assuming the next regime, or the one after that doesn't kill us. Once we hit the singularity, all career advice is moot. Either you're dead, or we're in a post-singularity society that's impossible to predict now. Assuming we survive, we'll probably be in a post-scarcity regime where "careers" are not a thing, but no-one really knows.

Here's a form you can use to send questions anonymously. I'll check for responses and post them as comments.

From 38:58 of the podcast:

So I do think that over time I have come to expect a bit more that things will hang around in a near human place and weird shit will happen as a result. And my failure review where I look back and ask — was that a predictable sort of mistake? I feel like it was to some extent maybe a case of — you’re always going to get capabilities in some order and it was much easier to visualize the endpoint where you have all the capabilities than where you have some of the capabilities. And therefore my visualizations were not dwelling enough

... (read more)
1burrito
Thanks, this is exactly the kind of thing I was looking for.

trevor has already mentioned the Stampy project, which is trying to do something very similar to what's described here and wishes to join forces.

Right now, Stampy just uses language models for semantic search, but the medium-term plan is to use them for text generation as well: people will be able to go to chat.stampy.ai or chat.aisafety.info, type in questions, and have a conversational agent respond. This would probably use a language model fine-tuned by the authors of Cyborgism (probably starting with a weak model as a trial, then increasingly strong on... (read more)

There's another issue where "P(doom)" can be read either as the probability that a bad outcome will happen, or the probability that a bad outcome is inevitable. I think the former is usually what's meant, but if "P(doom)" means "the probability that we're doomed", then that suggests the latter as a distracting alternative interpretation.

In terms of "and those people who care will be broad and varied and trying their hands at making movies and doing varied kinds of science and engineering research and learning all about the world while keeping their eyes open for clues about the AI risk conundrum, and being ready to act when a hopeful possibility comes up" we're doing less well compared to my 2008 hopes. I want to know why and how to unblock it.

I think to the extent that people are failing to be interesting in all the ways you'd hoped they would be, it's because being interesting in th... (read more)

As far as I know, this is the standard position. See also this FAQ entry. A lot of people sloppily say "the universe" when they mean the observable part of the universe, and that's what's causing the confusion.

I have also talked with folks who’ve thought a lot about safety and who honestly think that existential risk is lower if we have AI soon (before humanity can harm itself in other ways), for example.

It seems hard to make the numbers come out that way. E.g. suppose human-level AGI in 2030 would cause a 60% chance of existential disaster and a 40% chance of existential disaster becoming impossible, and human-level AGI in 2050 would cause a 50% chance of existential disaster and a 50% chance of existential disaster becoming impossible. Then to be indifferen... (read more)
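A hedged sketch of the kind of arithmetic this setup points at. The comment is cut off above, so the following is not a reconstruction of its conclusion; it only works through the stated 40% and 50% figures under two added assumptions: that the only thing differing between the scenarios is the chance `x` of a non-AI existential disaster striking between 2030 and 2050, and that such a disaster forecloses the good outcome. The function names are hypothetical.

```python
def p_good_outcome(p_agi_goes_well: float, p_other_disaster_first: float) -> float:
    """Chance of reaching the 'existential disaster becomes impossible' outcome:
    survive the waiting period, then have AGI go well."""
    return (1.0 - p_other_disaster_first) * p_agi_goes_well


def indifference_risk(p_well_early: float, p_well_late: float) -> float:
    """Waiting-period risk x at which early and late AGI are equally good.

    Solves p_well_early = (1 - x) * p_well_late for x.
    """
    return 1.0 - p_well_early / p_well_late


if __name__ == "__main__":
    # 0.4 and 0.5 are the figures from the comment; x is derived from them.
    x = indifference_risk(p_well_early=0.4, p_well_late=0.5)
    print(f"Break-even non-AI risk during the wait: {x:.2f}")
    print(f"Early AGI: {p_good_outcome(0.4, 0.0):.2f}")
    print(f"Late AGI:  {p_good_outcome(0.5, x):.2f}")
```

Under these assumptions, preferring the earlier date requires believing that the non-AI risk over the intervening two decades exceeds the break-even value the sketch computes.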

"Safewashing" would be more directly parallel to "greenwashing" and sounds less awkward to my ears than "safetywashing", but on the other hand the relevant ideas are more often called "AI safety" than "safe AI", so I'm not sure if it's a better or worse term.

9oneisnotprime
I prefer safe washing, but vote that we make a huge divisive issue over it, ultimately splitting the community in two.
5Quintin Pope
"Safe AI" -> "AI safety" "Green business" -> "Business greenery"

Yes, my experience of "nobody listened 20 years ago when the case for caring about AI risk was already overwhelmingly strong and urgent" doesn't put strong bounds on how much I should anticipate that people will care about AI risk in the future, and this is important; but it puts stronger bounds on how much I should anticipate that people will care about counterintuitive aspects of AI risk that haven't yet undergone a slow process of climbing in mainstream respectability, even if the case for caring about those aspects is overwhelmingly strong and urgent (... (read more)

  1. after a tech company singularity,

I think this was meant to read "2. after AGI,"

1Andrew_Critch
Yes, thanks!  Fixed.

Note that the full 2021 MIRI conversations are also available (in robot voice) in the Nonlinear Library archive.

1niplav
I'll let this be my chance to ask whether the Alignment Newsletter Podcast is on hold or finished? I don't think there was a publicized announcement of hibernation or termination.

As I see it, "rationalist" already refers to a person who thinks rationality is particularly important, not necessarily a person who is rational, like how "libertarian" refers to a person who thinks freedom is particularly important, not necessarily a person who is free. Then literally speaking "aspiring rationalist" refers to a person who aspires to think rationality is particularly important, not to a person who aspires to be rational. Using "aspiring rationalist" to refer to people who aspire to attain rationality encourages people to misinterpret self-... (read more)

4Said Achmiz
This was not the usage in the Sequences, however, and otherwise at the time the Sequences were written.

Great report. I found the high decision-worthiness vignette especially interesting.

I haven't read it closely yet, so people should feel free to be like "just read the report more closely and the answers are in there", but here are some confusions and questions that have been on my mind when trying to understand these things:

Has anyone thought about this in terms of a "consequence indication assumption" that's like the self-indication assumption but normalizes by the probability of producing paths from selves to cared-about consequences instead of the proba... (read more)

3Tristan Cook
Thanks! Glad to hear it. Yep, this is kinda what anthropic decision theory (ADT) is designed to be :-D ADT + total utilitarianism often gives similar answers to SIA. Yeah, this is a great point. Toby Ord mentions here the potential for dark energy to be harnessed, which would lead to a similar conclusion. Things like this may be Pascal's muggings (i.e., we wager our decisions on being in a world where our decisions matter infinitely). Since our decisions might already matter 'infinitely' (evidential-like decision theory plus an infinite world), I'm not sure how this pans out. Exactly. SSA (with a sufficiently large reference class) always predicts Doom as a consequence of its structure, but SIA doomsday is contingent on the case we happen to be in (colonisers, as you mention).

My impression (based on using Metaculus a lot) is that, while questions like this may give you a reasonable ballpark estimate and it's great that they exist, they're nowhere close to being efficient enough for it to mean much when they fail to move. As a proxy for the amount of mental effort that goes into it, there's only been three comments on the linked question in the last month. I've been complaining about people calling Metaculus a "prediction market" because if people think it's a prediction market then they'll assume there's a point to be made like... (read more)

Metaculus (unlike Manifold) is not a market and does not use play money except in the same sense that Tetris score is play money.

I don't understand why people are calling Metaculus a prediction market. There's no buying or selling going on, even in play money. There's a score, but score doesn't affect the community estimate, which is just a median of all user predictions weighted by recency. I think it ends up doing pretty well, but calling it a market (which it doesn't call itself) will give readers a mistaken impression of how it works.
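To make "a median of all user predictions weighted by recency" concrete, here is a minimal sketch. The comment does not say what weighting function Metaculus uses, so the exponential decay in `recency_weights` and its `decay` parameter are purely hypothetical stand-ins; only the idea of a recency-weighted median comes from the comment.

```python
from typing import Sequence


def recency_weights(n: int, decay: float = 0.9) -> list[float]:
    """Hypothetical weights for n predictions ordered oldest to newest.

    The newest prediction gets weight 1; each older one is multiplied by `decay`.
    """
    return [decay ** (n - 1 - i) for i in range(n)]


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Smallest value at which cumulative weight reaches half the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    total = sum(weights)
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value
    return max(values)  # only reachable through floating-point rounding


if __name__ == "__main__":
    # Made-up predictions, ordered oldest to newest.
    predictions = [0.2, 0.3, 0.7, 0.8]
    print(weighted_median(predictions, [1, 1, 1, 1]))
    print(weighted_median(predictions, recency_weights(len(predictions), decay=0.5)))
```

Nothing in this aggregation involves buying, selling, or a user's score, which is the contrast with a market that the comment draws.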


It took a minute to "click" for me that the green up marks and red down marks corresponded to each other in four opposed pairs, and that the Truth/Aim/Clarity numbers also corresponded to these axes. Possibly this is because I went straight to the thread after quickly skimming the OP, but most threads won't have the OP to explain things anyway. So my impression is it should be less opaque somehow. I do like having votes convey a lot more information than up/down. I wonder if it would be best to hide the new features under some sort of "advanced options" in... (read more)

Are there online spaces that talk about the same stuff LW talks about (AI futurism, technical rationality, and so on), with reasonably high quality standards, but more conversational-oriented and less soapbox-oriented, and maybe with less respectability signaling? I often find myself wanting to talk about things discussed here but feeling overconstrained by things like knowing that comments are permanent and having to anticipate objections instead of taking them as they come.

4Daniel Kokotajlo
It's not obvious that unaligned AI would kill us. For example, we might be bargaining chips in some future negotiation with aliens.
steven0461Ω230

I tend to want to split "value drift" into "change in the mapping from (possible beliefs about logical and empirical questions) to (implied values)" and "change in beliefs about logical and empirical questions", instead of lumping both into "change in values".

steven0461Ω370

This seems to be missing what I see as the strongest argument for "utopia": most of what we think of as "bad values" in humans comes from objective mistakes in reasoning about the world and about moral philosophy, rather than from a part of us that is orthogonal to such reasoning in a paperclip-maximizer-like way, and future reflection can be expected to correct those mistakes.

Wei Dai*Ω4110

future reflection can be expected to correct those mistakes.

I'm pretty worried that this won't happen, because these aren't "innocent" mistakes. Copying from a comment elsewhere:

Why did the Malagasy people have such a silly belief? Why do many people have very silly beliefs today? (Among the least politically risky ones to cite, someone I’ve known for years who otherwise is intelligent and successful, currently believes, or at least believed in the recent past, that 2⁄3 of everyone will die as a result of taking the COVID vaccines.) I think the unfort

... (read more)
2Viliam
Could the same also be true about most "good values"? Maybe people just make mistakes about almost everything.
4Beth Barnes
Is this making a claim about moral realism? If so, why wouldn't it apply to a paperclip maximiser? If not, how do we distinguish between objective mistakes and value disagreements?

"Problematic dynamics happened at Leverage" and "Leverage influenced EA Summit/Global" don't imply "Problematic dynamics at Leverage influenced EA Summit/Global" if EA Summit/Global had their own filters against problematic influences. (If such filters failed, it should be possible to point out where.)

Your posts seem to be about what happens if you filter out considerations that don't go your way. Obviously, yes, that way you can get distortion without saying anything false. But the proposal here is to avoid certain topics and be fully honest about which topics are being avoided. This doesn't create even a single bit of distortion. A blank canvas is not a distorted map. People can get their maps elsewhere, as they already do on many subjects, and as they will keep having to do regardless, simply because some filtering is inevitable beneath the eye of Sa... (read more)

due to the mechanisms described in "Entangled Truths, Contagious Lies" and "Dark Side Epistemology"

I'm not advocating lying. I'm advocating locally preferring to avoid subjects that force people to either lie or alienate people into preferring lies, or both. In the possible world where The Bell Curve is mostly true, not talking about it on LessWrong will not create a trail of false claims that have to be rationalized. It will create a trail of no claims. LessWrongers might fill their opinion vacuum with false claims from elsewhere, or with true claims, ... (read more)

I'm not advocating lying.

I understand that. I cited a Sequences post that has the word "lies" in the title, but I'm claiming that the mechanism described in the cited posts—that distortions on one topic can spread to both adjacent topics, and to people's understanding of what reasoning looks like—can apply more generally to distortions that aren't direct lies.

Omitting information can be a distortion when the information would otherwise be relevant. In "A Rational Argument", Yudkowsky gives the example of an election campaign manager publishing survey re... (read more)

"Offensive things" isn't a category determined primarily by the interaction of LessWrong and people of the sneer. These groups exist in a wider society that they're signaling to. It sounds like your reasoning is "if we don't post about the Bell Curve, they'll just start taking offense to technological forecasting, and we'll be back where we started but with a more restricted topic space". But doing so would make the sneerers look stupid, because society, for better or worse, considers The Bell Curve to be offensive and does not consider technological forecasting to be offensive.

1Said Achmiz
I’m sorry, but this is a fantasy. It may seem reasonable to you that the world should work like this, but it does not. To suggest that “the sneerers” would “look stupid” is to posit someone—a relevant someone, who has the power to determine how people and things are treated, and what is acceptable, and what is beyond the pale—for them to “look stupid” to. But in fact “the sneerers” simply are “wider society”, for all practical purposes. “Society” considers offensive whatever it is told to consider offensive. Today, that might not include “technological forecasting”. Tomorrow, you may wake up to find that’s changed. If you point out that what we do here wasn’t “offensive” yesterday, and so why should it be offensive today, and in any case, surely we’re not guilty of anything, are we, since it’s not like we could’ve known, yesterday, that our discussions here would suddenly become “offensive”… right? … well, I wouldn’t give two cents for your chances, in the court of public opinion (Twitter division). And if you try to protest that anyone who gets offended at technological forecasting is just stupid… then may God have mercy on your soul—because “the sneerers” surely won’t.