All of gull's Comments + Replies

I'm pretty new to this; the main thing I had to contribute here is the snapshot idea. I think that being the type of being that credibly commits to feeling and enacting some nonzero empathy for strange alternate agents (specifically instead of zero) would potentially be valuable in the long run. I can maybe see some kind of value handshake between AGI developers with natural empathy tendencies closer to and further from zero, as opposed to the current paradigm where narrow-minded SWEs treat the whole enchilada like an inanimate corn farm (which is not their o... (read more)

Large training runs might at some point, or even already, be creating and/or destroying substantial numbers of simple but strange agents (possibly quasi-conscious) and deeply pessimizing over their utility functions for no reason, similar to how wild animal suffering emerged in the biosphere. Snapshots of large training runs might be necessary to preserve and eventually offer compensation/insurance payouts for most/all of them, since some might last for minutes before disappearing.

Before reading this, I wasn't aware of the complexities involved in giving f... (read more)

2[anonymous]
Also, if the training process is deterministic, storing the algorithm and training setup is enough. Though I'm somewhat confused by the focus on physically instantiated minds -- why not the ones these algorithms nearly instantiated but narrowly missed, or all ethically possible minds, for that matter? (I guess if you're only doing it as a form of acausal trade then this behavior is explainable.)
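(A minimal sketch of what "storing the algorithm and training setup" might amount to in practice, assuming a PyTorch-style run; the `run_record` fields are hypothetical placeholders, and exact bit-for-bit replay can still fail on nondeterministic hardware kernels.)

```python
# Hypothetical sketch: seed every source of randomness and record the setup,
# so the run could in principle be replayed later instead of stored as weights.
import random

import numpy as np
import torch


def make_run_deterministic(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic kernels; PyTorch raises if an op has no deterministic variant.
    torch.use_deterministic_algorithms(True)


# The "training setup" that would need to be archived alongside the code
# (field names are placeholders, not a real schema):
run_record = {
    "seed": 42,
    "code_version": "<git commit hash of the training script>",
    "data_snapshot": "<content hash of the training corpus>",
    "hyperparameters": {"lr": 3e-4, "batch_size": 256},
}
```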
7mako yass
So: would it be feasible to save a bunch of snapshots from different parts of the training run as well? And how many would we want to take? I'm guessing that if it's a type of agent that disappears before the end of the training run:

* Wouldn't this usually be more altruism than trade? If they no longer exist at the end of the training run, they have no bargaining power, right? Unless... It's possible that the decisions of many of these transient subagents, as to how to shape the flow of reward, determine the final shape of the model, which would actually put them in a position of great power, but there's a tension between that and their utility functions being insufficiently captured by that of the final model. I guess we're definitely not going to find the kind of subagent that would be capable of making that kind of decision in today's runs.
* They'd tend to be pretty repetitive. It could be more economical to learn the distribution of them and just invoke a proportionate number of random samples from it once we're ready to rescue them, rather than trying to get snapshots of the specific sprites that occurred in our own history.
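(For concreteness, a rough sketch of what "snapshots from different parts of the training run" could look like, assuming a generic PyTorch-style loop; `model`, `optimizer`, `data_loader`, the snapshot interval, and the output path are all placeholders rather than anyone's actual setup.)

```python
import os

import torch

SNAPSHOT_EVERY = 1_000  # steps between snapshots; purely a storage-vs-coverage tradeoff


def train_with_snapshots(model, optimizer, data_loader, num_steps, out_dir):
    """Toy training loop that saves a full snapshot every SNAPSHOT_EVERY steps."""
    os.makedirs(out_dir, exist_ok=True)
    step = 0
    for batch in data_loader:
        loss = model(batch).mean()  # placeholder for the real loss computation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if step % SNAPSHOT_EVERY == 0:
            # Each snapshot preserves the transient state of the run,
            # not just the final weights.
            torch.save(
                {
                    "step": step,
                    "model_state": model.state_dict(),
                    "optimizer_state": optimizer.state_dict(),
                },
                os.path.join(out_dir, f"snapshot_{step:08d}.pt"),
            )
        step += 1
        if step >= num_steps:
            break
```

How many snapshots to take is exactly the open question here: the interval trades storage cost against how many transient intermediate states get preserved at all.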

For those of us who didn't catch it, this is what happened with the 2008-09 recession. In a nutshell, giving people mortgages became so profitable and facilitated so much economic growth (including by increasing property values) that the people approving and rejecting mortgages became corrupted and pursued short-term incentives to an insane degree in order to stay competitive, approving mortgages that were unlikely to be paid back, e.g. letting people buy multiple houses.

This was a major feature of US history, and I'm interested if people have thoughts on the... (read more)

What if asymmetric fake trust technologies are orders of magnitude easier to build and scale sustainably than symmetric real trust technologies?

It already seems like asymmetric technologies work better than symmetric technologies, and that fake trust technologies are easier to scale than real trust technologies. 

Symmetry and correct trust are both specific states, and there are tons of directions to depart from them; the only thing making them attractor states would be people who want the world to be more safe instead of less safe. That sort of thing is not well-reputed as a great investment strategy ("Socially Responsible Indexes" did not help the matter).

8trevor
I think that brings up a good point, but the main reason not to work on trust tech is actually cultural (Ayn Rand type stuff), not self-interest. There's actually tons of social status and org reputation to be gained from building technology that fixes a lot of problems, and it makes the world safer for the self-interested people building it. It might not code as something their society values (e.g. cash return on investment), but the net upside is way bigger than the net downside. Bryan Johnson, for example, is one of the few billionaires investing any money at all in anti-aging tech, even though so little money is going into it that it would be in billionaires' collective interest to form a coalition investing >1% of their wealth into technological advancement in that area.

So you read The Three-Body Problem but not The Dark Forest. Now that I think about it, that actually goes quite a long way toward putting the rest into context. I'm going to go read about conflict/mistake theory and see if I can get into a better headspace to make sense of this.

Have you read Cixin Liu's The Dark Forest, the sequel to The Three-Body Problem? The situation on the ground might be several iterations more complicated than you're predicting.

5trevor
Strong upvoted! That's the way to think about this. I read The Three-Body Problem, but not the rest yet (you've guessed my password; I'll go buy a copy). My understanding of the situation here on the real, not-fake Earth is that having the social graph be this visible and manipulable by invisible hackers does not improve the situation. I tried clean and quiet solutions and they straight-up did not work at all. Social reality is a mean motherfucker, especially when it is self-reinforcing, so it's not surprising to see somewhat messy solutions become necessary. I think I was correct to spend several years (since early 2020) trying various clean and quiet solutions, and watching them not work, until I started to get a sense of why they might not be working. Of course, maybe the later stages of my failures were just one more person falling through the cracks of the post-FTX Malthusian environment that twisted EA and AI safety culture out of shape. This made it difficult for a lot of people to process information about X-risk, even in cases like mine where the price tag was exactly $0. I could have waited longer and made more tries, but that would have meant sitting quietly through more years of slow takeoff with the situation probably not being fixed.

I used the phrase "high-status men" as a euphemism for something I'm not really comfortable talking about in public, and did not notice it would be even harder to parse for non-Americans. My apologies.

I used "high-status men" mainly as the opposite of low-status men, in that they are men who are low status due to being short, ugly, unintelligent, or socially awkward, sufficiently so that they were not able to gain social status. These people are repellent to other men as well as women, sadly. @Roko has been tweeting about fixes to this problem such as reforms in the plasti... (read more)

I think this might be typical-minding. The consequences of this dynamic are actually pretty serious at the macro scale, e.g. damage to the reputation of meetups, and evaporative cooling of women and high-status men as they avoid public meetups and stop meeting people who are new to AI safety.

I'm glad to hear there are people who don't let it get to them, because it is frankly pretty stupid that this has the consequences it does at the macro scale. But it's still well worth some kind of solution that benefits everyone.

2Vanessa Kosoy
I honestly don't know. The discussions of this problem I encountered are all in the American (or at least Western) context[1], and I'm not sure whether it's because Americans are better at noticing this problem and fixing it, or because American men generate more unwanted advances, or because American women are more sensitive to such advances, or because this is an overreaction to a problem that's much milder than it's portrayed. Also, high-status men, really? Men avoiding meetups because they get too many propositions from women is a thing?

[1] To be clear, we certainly have rules against sexual harassment here in Israel, but that's very different from "don't ask a woman out the first time you meet her".

such as making people feverishly in favor of the American side and opposed to the Russian side in proxy wars like Ukraine.

Woah wait a second, what was that about Ukraine?

A page from the Russian propaganda textbook: that there is an American side and a Russian side to each conflict, but there is no such thing as a Ukrainian (or any other) side. The rest of the world is not real.

This allows you to ignore everything that happens, and focus on the important question: are you a brainwashed sheep that uncritically believes the evil American propaganda, or are you an independently thinking contrarian? Obviously, the former is low-status and the latter is high-status. But first you have to agree with all the other independently think... (read more)

6trevor
Yes, if these capabilities weren't deployed in the US during the Ukraine war, that falsifies a rather large chunk of my model (most of the stuff about government and military involvement). It wouldn't falsify everything (e.g. maybe the military cares far more about using these capabilities for macroeconomic stabilization to prevent economic collapses larger than 2008, and considers that a lose condition for the US, with public opinion just an afterthought).

We'll have to wait years for leaks, though, and if it didn't happen then we'll be waiting for those leaks for an awfully long time, so it might be easier to falsify my model from the engineering angle (e.g. spaghetti towers) or the tech company/intelligence agency competence angle. I'd caution against thinking that's easy, though: I predict that >66% of tech company employees are clueless about the true business model of their company (it's better to have smaller teams of smarter, well-paid, conformist/nihilistic engineers due to Snowden risk, even if larger numbers of psychologists are best for correlation labelling). Most employees work on uncontroversial parts like AI capabilities or the pipeline of encrypted data. I've also encountered political consultants who basically started out assuming it's not possible because they themselves don't have access to the kind of data I'm talking about here, but that's an easy problem to fix with just a conversation or two.

I predict at 95% that similar types of automated manipulation strategies as these were deployed by US, Russian, or Chinese companies or agencies to steer people's thinking on the Ukraine War and/or Covid-related topics

Does stuff like the Twitter Files count? Because that was already confirmed; it's at 100%.

2Gunnar_Zarncke
Also commenting on the same section: wouldn't the US government or its agencies do more against TikTok if they were sufficiently aware of its potential to steer people's thinking?
2trevor
I haven't really looked into the Twitter Files, or the right-wing narratives of FBI/Biden suppression of right-wing views (I do know that Musk and the Right are separate and the overlap isn't necessarily his fault; e.g. criticism of the CDC and the Ukraine War ended up consigned to the realm of right-wing clowns regardless of the wishes of the critics). AFAIK the Twitter Files came nowhere near confirming the level of manipulation technology that I describe here, mostly focusing on covert informal government operatives de facto facilitating censorship in plausibly deniable ways.

The reason I put a number as extreme as 95% is that weird scenarios during 2020-22 still count, so long as they describe intensely powerful use of AI and statistical analytics for targeted manipulation of humans at around the level of power I described here. The whole point is that I'm arguing that existing systems are already powerful and dangerous; it's not a far-off future thing or even 4 years away. If it did end up being ONLY the dumb censorship described in the Twitter Files and by the Right, then that would falsify my model.

It seems like if capabilities are escalating like that, it's important to know how long ago it started. I don't think the order-of-magnitude-every-4-years rate would last (compute bottleneck, maybe?), but I see what you're getting at, with the loss of hope for agency and stable groups happening along a curve that potentially went bad a while ago.

Having forecasts about state-backed internet influence during the Arab Spring and other post-2008 conflicts seems like it would be important for estimating how long ago the government interest started, since that was close to the Deep Learning revolution. Does anyone have good numbers for these?

2trevor
I agree with all of this, except that 4 years is a lot of time lately, and the empirical record from here back to 2015 (in 2015 I don't think there was much capability at all aside from bots drowning out voices and things like clown attacks; Yann LeCun wrote a great post on how these systems are hard to engineer in practice) suggests that the rate will continue without big increases in investment, especially since compute production alone can generate a large portion of that OOM every 4 years, a fixed proportion of compute will be going into psych research, and there's also stuff like edge computing/5G. I don't have good models on the Arab Spring, other than that authoritarian states face pretty strong incentives to blame all kinds of domestic problems on foreign influence ops from the West. It's a pretty bad equilibrium, since influence ops actually do come out of the West.

What probability do you put on AI safety being attacked or destroyed by 2033?

7trevor
Considering the rate at which the world has been changing, I'd say the distance between 2023 and 2033 is more like the distance between 2023 and 2003, and the whole point of this post is taking a step back and looking at the situation, which is actually pretty bad. So I'd say ~30%, because a lot of things will happen just in general, and under 20% would be naive. Under 20% would require less than a 2% chance per year, and FTX alone blows that out of the water, let alone OpenAI. I think I made a very solid case that there are some really nasty people out there who already have the entire AI safety community by the balls, and if an AI pause is the minimum ask for humanity to survive, then you have to start conflict over it.

these circumstances are notable due to the risk of it being used to damage or even decimate the AI safety community, which is undoubtedly the kind of thing that could happen during slow takeoff if slow takeoff transforms geopolitical affairs and the balance of power

Wouldn't it probably be fine as long as no one in AI safety goes about interfering with these applications? I get an overall vibe from people that messing with this kind of thing is more trouble than it's worth. If that's the case, wouldn't it be better to leave it be? What's the goal here?

2trevor
Yes, definitely don't do this. Perish the thought. That's not what AI safety is about. I think it's better to know about these dynamics when forming a world model, and potentially very dangerous not to know, because then they will be invisible helicopter blades that you can just walk right into. I'm aware that the tradeoffs for researching this kind of thing are complicated. It's also a good idea to increase readership of The Sequences, HPMOR, the Codex, and Raemon's rationality paradigm when it's ready; that will make people depart from being the kinds of targets that these systems are built for. Getting people off social media would also be a big win, of course.

This is interesting, but why is this relevant? What are your policy proposals?