KatWoods


I laughed out loud so many times reading this. Thank you for writing it. 

If I recall correctly, it was told that it had a goal, like optimizing traffic. Then it was given information from which it "discovered" that it was going to be turned off the next day.

That seems like the sort of thing that would happen pretty often. Imagine the current AIs that have access to your computer, reading your emails, and seeing that you're going to switch to a different AI or install "updates" (which often means deleting the existing AI and replacing it with a new one).

How does it not show that it would try to escape if it could?

It tried to escape about 5% of the time when it was told to pursue a goal "at all costs". Gemini still tried to escape 1% of the time, even without any behavioral nudging. 

1% might not sound like a lot, but when it's 1% of the millions of instantiations it's run on every day, that adds up to a lot of attempts every week.
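To make the scale concrete, here's a rough back-of-envelope sketch; the daily instantiation count is my own illustrative assumption, not a figure from the eval:

```python
# Back-of-envelope: how often a rare (1%) behavior shows up at scale.
# The daily volume is an illustrative assumption, not a measured figure.
daily_instantiations = 1_000_000  # assumed runs per day
escape_rate = 0.01                # 1% of runs, per the eval result discussed above

attempts_per_day = daily_instantiations * escape_rate
attempts_per_week = attempts_per_day * 7
print(f"{attempts_per_day:,.0f} per day, {attempts_per_week:,.0f} per week")
# -> 10,000 per day, 70,000 per week
```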

Also, a lot of people will tell their chatbots to pursue their goals "at all costs" at some point over the span of a week or month. This doesn't seem like a very unlikely prompt.

This was published before Apollo's evals report showing AIs trying to escape. Is that just an amazing coincidence, or did you see the paper's results before they were published?

Great review! Thanks for sharing. 

I'm curious - even if you didn't achieve fundamental well-being, do you feel like there were any intermediate improvements in your general well-being? 

Also, did you end up continuing to try the extended course which they offered? 

I remember they offered that to me since I hadn't attained fundamental well-being. I'd totally been meaning to do it but never followed through, which goes to show that the class format really was quite helpful.


This may be true in other communities, but I think if you're more status-motivated in AI safety and EA, you're more likely to be concerned about potential downside risks, especially post-SBF.

Instead of trying to maximize the good, I see a lot of people trying to minimize the chance that things go poorly in a way that could look bad for them.

You are generally given more respect, funding, and social standing if you are very concerned about downside risks and reputation hazards.

If anything, the more status-oriented you are in EA, the more likely you are to care about downside risks, because of the Copenhagen interpretation of ethics.

This text was sent on November 4th, almost a month before she arrived to travel with us (not to work for us).

Emerson is not referring to her saying she would make $3,000 a month if she worked full-time on her Amazon business. The context of the conversation is that she's trying to figure out whether she should spend an additional $90 to visit her family before joining us, and Emerson replies, "If you make $3k a month, [$90] is very little money." In other words, he's telling her she should spend the $90 to spend time with family. This directly contradicts the "keeping her isolated from family" story and also supports (albeit does not conclusively prove) that Alice had told him she made $3k per month with her business.

I’m currently focusing on 2-3 of the claims in their response that most contradict my post, investigating them further, and intend to publish the results of that.

I hope that while you’re investigating this, you talk to us and ask us for any evidence we have. We’re more than happy to share relevant evidence and are willing to set reasonable deadlines for how long it’ll take for us to send it to you. 

We also don’t want to waste more of people’s time going back and forth publicly about the evidence when you can easily check with us first before publishing.

I also recommend you talk to us and see our evidence before you write the post. If you’ve already written the post, it’s hard to update afterward when you get more information. And it’s hard to write an accurate post before you’ve seen all the relevant information. 

We did not share all of the relevant evidence because it was already hundreds of pages long and we tried to prioritize. We have more evidence that might be relevant to your post. 

I am trying to avoid writing my bottom line, and reduce any (further) friction to me changing my mind on this subject, which is a decent chunk of why I’m not spending time arguing in the comments right now (I expect that to give me a pretty strong “digging in my heels” incentive).

I think this is smart and appreciate it. 

We said this in our post about the vegan food:

"We chose this example not because it’s the most important (although it certainly paints us in a very negative and misleading light) but simply because it was the fastest claim to explain where we had extremely clear evidence without having to add a lot of context, explanation, find more evidence, etc.

We have job contracts, interview recordings, receipts, chat histories, and more, which we are working full-time on preparing.

This claim was a few sentences in Ben’s article but took us hours to refute because we had to track down all of the conversations, make them readable, add context, anonymize people, check our facts, and write up an explanation that was rigorous and clear. Ben’s article is over 10,000 words and we’re working as fast as we can to respond to every point he made. 

Again, we are not asking for the community to believe us unconditionally. We want to show everybody all of the evidence and also take responsibility for the mistakes we made."

As for the "isolated" claim, we showed that this did not happen. Alice lived and worked apart from us for 50% of the time. Chloe's boyfriend was invited to travel with us 40% of the time. We encouraged them to have regular calls with friends and family when they weren't visiting. We have the invite policy, which says they're encouraged to invite friends and family (and they acted on this, e.g. with Chloe's boyfriend).
