All of Peter Berggren's Comments + Replies

That's sort of it, but it was specifically talking about certain types of self-deceptive behavior that appear to be instrumentally rational. The problem is that once you've deceived yourself, you can't tell if it was a good idea or not.

Thanks for the support. I'll try and work a bit more on my first post in the coming days and I hope it will be up soon.

I think you did a really good job so far of setting up a series of clear exercises for techniques. The key issue I had with Hammertime was that it often seemed a bit disorganized, frequently changing plans and switching topics.

My most recent post on LessWrong (https://www.lesswrong.com/posts/yj2hyrcGMwpPooqfZ/a-proposal-for-iterated-interpretability-with-known), which is also my first post proposing a novel avenue for AI alignment research, took me a total of 30 minutes.

Probably for me, the main thing that helped was Yoda Timers. Then again, that was probably just a function of getting to practice it much more than anything else. Next up is probably TAPs.

I have something very similar to the second felt sense given when I've spent too much time on my computer and get kind of vaguely sleepy and disoriented when I try to stop even for a moment. The term I use is similar to the one my parents used to describe the tangible expression of this feeling, and it's "video game poisoning."

One rationality technique that I can infer from my past experiences is one I'm not really sure how to name; possibilities include "path divergence analysis," "counterfactual defaults," "adjacent life heuristic," "near-miss solutions," and "reality branch mining." The idea is to look at what your common actions would be if your life had gone slightly differently (e.g. you went to a different school, were born in a different country, etc.), and see if those actions have value in your... (read more)

The closest I've come to a true "factory reset" was when I realized, a few times, that school clubs I was a part of were becoming toxic and unproductive. However, I can't really point to a single button; it was more a gradual stream of one bad impression after another, after which I slowly started to disengage.

Set a Yoda Timer and share the most important idea you haven’t had time to express. Five minutes is all you get.

I really do think that a lot of modern AI alignment research is being done within the academic system, but precisely because it's done within the academic system, the independent/dedicated nonprofit research community pays it far less attention than it pays its own work. Conversely, it likely gets much more attention within academia.

I don't think the dynamic here is "each team likes their own people best." I think it's due to a... (read more)

My greatest ambition is to create a fully trainable art of rationality that’s so good it gets taught to every high schooler in the country and bankrupts multiple industries that prey on irrational behavior in the process. Although it may seem impossible, the success of anti-smoking efforts against an extremely addictive product with a massive advertising industry suggests that it's achievable, and the fact that the Internet exists now and didn't exist then suggests it's even easier than that was.

Some of them, sure, but for a lot I'd be like "that's completely outdated" and for others I'd be like "OK, that's obviously meant to be a jab at some specific person you don't like."

The worst case of the planning fallacy I've run into recently was my plan to finish a blog post in a week. Now, ten weeks later, I still haven't finished it. But when I actually started to work on it, I got a third of it done in half an hour.

I agree with you on this, but I also don't think "sunk cost fallacy" is the right term for what you're describing. The rational behavior here is to factor a random error term from mood swings into these calculations, and if you can't fully do that, then generally err on the side of keeping projects going. I understand "sunk cost fallacy" to mean "factoring the amount of effort already spent into these decisions," which does seem like a pure fallacy to me.

It's reasonable e.g. when about to watch a movie to say "I... (read more)

At any given point, you have some probability distribution over how worthwhile the project will be. The distribution can change over time, but it can change either for better or for worse. Therefore, at any point, if a rational agent expects it not to be worthwhile to expend the remaining effort to get the result, they should stop.

Of course, if you are irrational and intentionally fail to account for evidence as a way of getting out of work, this does not apply, but that's the problem then, not your lack of sunk costs.

2alkjash
I don't disagree with what you're saying about theoretically rational agents. I think the content of my post was [there are a bunch of circumstances in which humans are systematically irrational; sunk cost fallacy is on net a useful corrective heuristic in those circumstances, and attempting to make rational decisions via explicit legible calculations will in practice underperform just following the heuristic]. To spell out a bit more, imagine my mood swings cause a large random error term to be added to all explicit calculations. If the decision process is to drop a project altogether at any point where my calculations say it is doomed, then I will drop a lot of projects that are not actually doomed.
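
A rough sketch of the dynamic I have in mind, with numbers made up purely for illustration:

```python
import random

# A project whose true value is genuinely positive gets re-evaluated weekly,
# but mood swings add a large random error to every explicit calculation.
# Policy: drop the project the first time a calculation comes out negative.
TRUE_VALUE = 1.0   # the project really is worth finishing
NOISE_SD = 2.0     # mood-swing error dwarfs the true signal
CHECKS = 10        # number of explicit re-evaluations
TRIALS = 100_000

dropped = sum(
    any(TRUE_VALUE + random.gauss(0, NOISE_SD) < 0 for _ in range(CHECKS))
    for _ in range(TRIALS)
)
print(f"worthwhile projects abandoned: {dropped / TRIALS:.0%}")
# With these made-up numbers, the vast majority of worthwhile projects get
# dropped -- the failure mode that the sunk-cost heuristic protects against.
```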

Sorry if this is confusing. What I'm saying is, you have some estimate of the project's valuation, and this factors in the information that you expect to get in the future about the project's valuation (cf. Conservation of Expected Evidence). If there's some chance the project will turn out worthwhile, you know that chance already. But there must also be some counterbalancing chance that the project will turn out even less worthwhile than you think.
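
In symbols (this is just the law of total expectation, nothing novel): if $V$ is the project's value and $E_1, \dots, E_n$ are the possible pieces of evidence you might observe next, then

$$\mathbb{E}[V] = \sum_i P(E_i)\,\mathbb{E}[V \mid E_i],$$

so any chance of learning that the project is better than you currently think must be balanced by some chance of learning that it's worse.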

2alkjash
I still don't understand. Your valuation of the project will still change over time as information actually gets revealed though. The probability the project will turn out worthwhile can fluctuate.

It seems to me like the "random walk" case you described is poorly formed; the possibility of a project turning out to be worth it after all should already be factored into one's estimate of how "worth it" it is. If it isn't, then that's a problem of motivated reasoning, not a reason to adopt a sunk cost fallacy.

Intentionally inducing fallacious reasoning in oneself is classified as "Dark Arts" for a reason, especially since it can bias one's own assessment of how well it turns out and whether to continue doing it.

2alkjash
I don't follow. As a project progresses it seems common to acquire new information and continuously update your valuation of the project.

Probably the most consequential trivial inconvenience for me (recently) was that I stayed up very late (hours past when I planned to go to sleep) because my phone was right next to my bed. This was because the alternate charging spot I had set up to prevent this from happening was mildly cluttered.

One of my favorite mantras is "A citizen has the courage to make the safety of the human race their personal responsibility," from the movie Starship Troopers. While a lot of its meaning is caught up in the movie's setting, what I personally draw from it is that an important part of living in the world is working hard to make it a better place, and not assuming someone else will do it for you.

I'm not proposing to never take breaks. I'm proposing something more along the lines of "find the precisely-calibrated amount of breaks to maximize productivity and take exactly those."

OK then, so how would one go about making an organization that is capable of funding and building this? Are there any interested donors yet?

3Cole Wyeth
Hmmm, my long term strategy is to build wealth and then do it myself, but I suppose that would require me to leave academia eventually :) I wonder if MIRI would fund it? Doesn't seem likely.

Very much agree on this one, as do many other people that I know of. However, the key counterargument as to why this may be better as an EA project than a rationality one is that "rationality" is vague on what you're applying it to, while "EA" is at least slightly more clear, and a community like this benefits from having clear goals. Nevertheless, it may make sense to market it as a "rationality" project and just have EA be part of the work it does.

So the question now turns to, how would one go about building it?

3Cole Wyeth
My intuition is kind of the opposite - I think EA has a less coherent purpose. It's actually kind of a large tent for animal welfare, longtermism, and global poverty. I think some of the divergence in priorities between EAs is about impact assessment / fact finding, and a lot of ink is spilled on this, but some is probably about values too. I think of EA as very outward-facing, coalitional, and ideally a little pragmatic, so I don't think it's a good basis for an organized totalizing worldview. The study of human rationality is a more universal project. It makes sense to have a monastic class that (at least for some years of their life) sets aside politics and refines the craft, perhaps functioning as an impersonal interface when they go out into the world - almost like Bene Gesserit advisors (or a Confessor). I have thought about building it. The physical building itself would be quite expensive, since the monastery would need to meet many psychological requirements - it would have to be both isolated and starkly beautiful. Also, well-provisioned. So this part would be expensive; and it's an expense that EA organizations probably couldn't justify (that is, larger and more extravagant than buying a castle). Of course, most of the difficulty would be in creating the culture - but I think that building the monastery properly would go a long way (if you build it, they will come).

Thanks for giving some answers here to these questions; it was really helpful to have them laid out like this.

1. In hindsight, I was probably talking more about moves towards decentralization of leadership, rather than decentralization of funding. I agree that greater decentralization of funding is a good thing, but it seems to me like, within the organizations funded by a given funder, decentralization of leadership is likely useless (if leadership decisions are still being made by informal networks between orgs rather than formal ones), or it may lead to... (read more)

2MichaelDickens
I intended my answer to be descriptive. EAs generally avoid making weak arguments (or at least I like to think we do).

What's preventing MIRI from making massive investments into human intelligence augmentation? If I recall correctly, MIRI is most constrained on research ideas, but human intelligence augmentation is a huge research idea that other grantmakers, for whatever reason, aren't funding. There are plenty of shovel-ready proposals already, e.g. https://www.lesswrong.com/posts/JEhW3HDMKzekDShva/significantly-enhancing-adult-intelligence-with-gene-editing; why doesn't MIRI fund them?

Human intelligence augmentation is feasible over a scale of decades to generations, given iterated polygenic embryo selection. 

I don't see any feasible way that gene editing or 'mind uploading' could work within the next few decades. Gene editing for intelligence seems unfeasible because human intelligence is a massively polygenic trait, influenced by thousands to tens of thousands of quantitative trait loci. Gene editing can fix major mutations, to nudge IQ back up to normal levels, but we don't know of any single genes that can boost IQ above the no... (read more)

What's preventing them from massive investments into WBE/upload? Many AI/tech leaders who think the MIRI view is wrong would also support that.

Thank you very much! I won't be sending you a bounty, as you're not an AI ethicist of the type discussed here, but I'd be happy to send $50 to a charity of your choice. Which one do you want?

2the gears to ascension
I think the appropriate choice would probably be https://www.ajl.org/

I've seen plenty of AI x-risk skeptics present their object-level argument, and I'm not interested in paying out a bounty for stuff I already have. I'm most interested in the arguments from this specific school of thought, and that's why I'm offering the terms I offer.

1RomanHauksson
I see. Maybe you could address it towards "DAIR, and related, researchers"? I know that's a clunkier name for the group you're trying to describe, but I don't think more succinct wording is worth progressing towards a tribal dynamic between researchers who care about X-risk and S-risk and those who care about less extreme risks.

Man, this article hits different now that I know the psychopharmacology theory of the FTX crash...

Have any prizes been awarded yet? I haven't heard anything about prizes, but that could have just been that I didn't win one...

I'm still not sure why exactly people (I'm thinking of a few in particular, but this applies to many in the field) tell very detailed stories of AI domination like "AI will use protein nanofactories to embed tiny robots in our bodies to destroy all of humanity at the press of a button." This seems like a classic use of the conjunction fallacy, and it doesn't seem like those people really flinch from the word "and" like the Sequences tell them they should.
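
(The underlying probability fact: for any claims $A$ and $B$, $P(A \wedge B) \le \min(P(A), P(B))$, so each extra concrete detail in such a story can only make it less probable, even as it makes it feel more vivid.)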

Furthermore, it seems like people within AI alignment aren't taking the "sci-fi" criticism as seriously... (read more)

3Robert Miles
Yeah I imagine that's hard to argue against, because it's basically correct, but importantly it's also not a criticism of the ideas. If someone makes the argument "These ideas are popular, and therefore probably true", then it's a very sound criticism to point out that they may be popular for reasons other than being true. But if the argument is "These ideas are true because of <various technical and philosophical arguments about the ideas themselves>", then pointing out a reason that the ideas might be popular is just not relevant to the question of their truth. Like, cancer is very scary and people are very eager to believe that there's something that can be done to help, and, perhaps partly as a consequence, many come to believe that chemotherapy can be effective. This fact does not constitute a substantive criticism of the research on the effectiveness of chemotherapy.

I don't think the point of the detailed stories is that they strongly expect that particular thing to happen? It's just useful to have a concrete possibility in mind.

Some figures within machine learning have argued that the safety of broad-domain future AI is not a major concern. They argue that since narrow-domain present-day AI is already dangerous, it should be our primary concern, rather than the risks of future AI. But it doesn't have to be either/or.

Take climate change. Some climate scientists study the future possibilities of ice shelf collapses and disruptions of global weather cycles. Other climate scientists study the existing problems of more intense natural disasters and creeping desertification. But these two... (read more)

You wouldn't hire an employee without references. Why would you make an AI that doesn't share your values?

(policymakers, tech executives)

2Nanda Ale
Reframed even more generally for parents: "You wouldn’t leave your child with a stranger. With AI, we’re about to leave the world’s children with the strangest mind humans have ever encountered." (I know the deadline passed. But I finally have time to read other people's entries and couldn't resist.)

The future is not a race between AI and humanity. It's a race between AI safety and AI disaster.

(Policymakers, tech executives)

We need to be proactive about AI safety, not reactive.

(Policymakers)

In the Soviet Union, there was a company that made machinery for vulcanizing rubber. They had the option to make more efficient machines, instead of their older models. However, they didn't do it, because they wouldn't get paid as much for making the new machines. Why would that be? Wouldn't more efficient machines be more desirable?

Well, yes, but the company got paid per pound of machine, and the new machines were lighter.

Now, you may say that this is just a problem with communist economies. Well, capitalist economies fall into very similar traps. If a co... (read more)

There is an enormous amount of joy, fulfillment, exploration, discovery, and prosperity in humanity's future... but only if advanced AI values those things.

 

(Policymakers, tech executives)

Even if you don't assume that the long-term future matters much, preventing AI risk is still a valuable policy objective. Here's why.

In regulatory cost-benefit analysis, a tool called the "value of a statistical life" is used to measure how much value people place on avoiding risks to their own life (source). Most government agencies, by asking about topics like how much people will pay for safety features in their car or how much people are paid for working in riskier jobs, assign a value of about ten million dollars to one statistical life. That is, redu... (read more)
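
To make the arithmetic concrete (the $10 million figure is the one above; the policy numbers are purely illustrative, not from any actual proposal): a regulation that reduces each of 100 million people's annual risk of death by 1 in 100,000 saves 100,000,000 × 0.00001 = 1,000 statistical lives per year, which standard cost-benefit analysis would value at roughly 1,000 × $10 million = $10 billion per year.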

Clarke’s First Law goes: When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

Stuart Russell is only 60. But what he lacks in age, he makes up in distinction: he’s a computer science professor at Berkeley, neurosurgery professor at UCSF, DARPA advisor, and author of the leading textbook on AI. His book Human Compatible states that superintelligent AI is possible; Clarke would recommend we listen.

(tech executives, ML researchers)
(ada... (read more)

3trevor
"There's been centuries of precedent of scientists incorrectly claiming that something is impossible for humans to invent" "right before the instant something is invented successfully, 100% of the evidence leading up to that point will be evidence of failed efforts to invent it. Everyone involved will only have memories of people failing to invent it. Because it hasn't been invented yet"

Climate change was weird in the 1980s. Pandemics were weird in the 2010s. Every world problem is weird... until it happens.

(policymakers)

AI might be nowhere near human-level yet. We're also nowhere near runaway climate change, but we still care about it.

(policymakers, tech executives)

"Follow the science" doesn't just apply to pandemics. It's time to listen to AI experts, not AI pundits.

 

(policymakers)

This seems like it falls into the trap of being "too weird" for policymakers to take seriously. Good concept; maybe work on the execution a bit?

I thought that would ruin the parallelism and flow a bit, and this isn't intended for the "paragraph" category, so I didn't put that in yet.

There is a certain strain of thinker who insists on being more naturalist than Nature. They will say with great certainty that since Thor does not exist, Mr. Tesla must not exist either, and that the stories of Asclepius disprove Pasteur. This is quite backwards: it is reasonable to argue that a machine will never think because the Mechanical Turk couldn't; it is madness to say it will never think because Frankenstein's monster could. As well demand that we must deny Queen Victoria lest we accept Queen Mab, or doubt Jack London lest we admit Jack Frost. Na... (read more)

You might think that AI risk is no big deal. But would you bet your life on it?

(policymakers)

Betting against the people who said pandemics were a big deal, six years ago, is a losing proposition.

(policymakers, tech executives)

(source)

Just because tech billionaires care about AI risk doesn't mean you shouldn't. Even if a fool says the sky is blue, it's still blue.

(policymakers, maybe ML researchers)

2Nicholas / Heather Kross
The scare quotes around "tech billionaires" are unnecessary in this context, even when you are talking to someone who is totally anti-billionaire.

Hoping you'll run out of gas before you drive off a cliff is a losing strategy. Align AGI; don't count on long timelines.

(ML researchers)

(adapted from Human Compatible)

If the media reported on other dangers like it reported on AI risk, it would talk about issues very differently. It would compare events in the Middle East to Tom Clancy novels. It would dismiss runaway climate change by saying it hasn't happened yet. It would think of the risk of nuclear war in terms of putting people out of work. It would compare asteroid impacts to mudslides. It would call meteorologists "nerds" for talking about hurricanes. 

AI risk is serious, and it isn't taken seriously. It's time to look past the sound bites and focus on what e... (read more)

People said man wouldn't fly for a million years. Airplanes were fighting each other eleven years later. Superintelligent AI might happen faster than you think.

(policymakers, tech executives)

(source) (other source)
