Tetraspace

Drew the shoggoth and named notkilleveryoneism.

https://twitter.com/TetraspaceWest


Just saw this and can confirm it was one of the best times of my life.

Dominance/submission dynamics in relationships

In Act I outputs, Claudes do a lot of this; e.g., this screenshot of Sonnet 3.6.

I'd like beta access. My main use case is that I intend to write up some thoughts on alignment (Manifold gives a 40% chance that I'll be proud of a write-up; I'd like to raise that number), and this would be helpful for literature review and for finding relevant existing work. Especially so because a lot of the public agent foundations work is old and was migrated from the old Alignment Forum, where it's low-profile compared to more recent posts.

AI isn't dangerous because of what experts think, and the arguments that persuaded the experts themselves were not "experts think this". It would have been a misleading argument for Eliezer in 2000, when he was among the first people to think about the problem in the modern way, or for people who weren't already rats in, say, 2017, before GPT was in the news and when AI x-risk was very niche.

I also have objections to its usefulness as an argument: "experts think this" doesn't give me any inside view of the problem by which I could come up with novel solutions the experts haven't thought of. This especially comes up when the solutions need to be precise or extreme. If I were an alignment researcher, "experts think this" would tell me nothing about what math I should be writing; if I were a politician, it would be less likely to lead me to solutions I think would work than to compromises between the expert coalition and my other constituents.

So, while it is evidence (experts aren't anticorrelated with the truth), there's better reasoning available that's more entangled with the truth and gives more precise answers.

I learned this lesson looking at the conditional probabilities of candidates winning given they were nominated in 2016, where candidates with less than about a 10% chance of being the nominee had conditional probabilities that were noise, anywhere between 0 and 100%. And this was on the thickly traded real-money markets of Betfair! I personally engage in, and also recommend, just throwing out any conditional probability that looks like this, unless you have some reason to believe it's not just noise.
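The mechanism is simple arithmetic: the implied conditional is a ratio of two market prices, and when the denominator is tiny, ordinary price noise swamps it. A minimal sketch of this (all prices, tick sizes, and the `implied_conditional` helper are hypothetical illustrations, not actual Betfair data):

```python
import random

random.seed(0)

def implied_conditional(p_nominee, p_win_and_nominee, tick=0.005):
    """Implied P(win | nominee) = price(win & nominee) / price(nominee),
    where each price is only observed up to roughly one tick of noise."""
    noisy_joint = max(tick, p_win_and_nominee + random.uniform(-tick, tick))
    noisy_nom = max(tick, p_nominee + random.uniform(-tick, tick))
    return noisy_joint / noisy_nom

# Front-runner: 40% nominee, 20% joint -> true conditional 50%.
front = [implied_conditional(0.40, 0.20) for _ in range(5)]

# Long shot: 2% nominee, 1% joint -> also a true conditional of 50%,
# but one tick of noise moves the implied number enormously.
long_shot = [implied_conditional(0.02, 0.01) for _ in range(5)]

print("front-runner:", [round(x, 2) for x in front])
print("long shot:   ", [round(x, 2) for x in long_shot])
```

The front-runner's implied conditional stays pinned near 50%, while the long shot's swings across a wide range even though the underlying "true" conditional is identical, which is exactly the throw-it-out situation described above.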

Another place this causes problems is in decision markets, which would be infinitely useful if they could be made to work: you want to evaluate counterfactual decisions, but since they're counterfactual, the decision isn't made, so there's no liquidity and the price can take any value.

Obeying it would only be natural if the AI thinks the humans are more correct than the AI would ever be after gathering all available evidence, where "correct" is judged by the standards of the goal the AI actually has. Arguendo, that goal is not what the humans will eventually pursue (otherwise the shutdown problem reduces to solving outer alignment, and the shutdown problem is only being considered under the theory that we won't solve outer alignment).

An agent believing that, even given all available information, it will still be made to do something other than the action it then thinks is best, is anti-natural; a utility maximiser would want to take that best action instead.

This is discussed on Arbital as the problem of fully updated deference.
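The anti-naturalness above can be put in one line. As a toy sketch (where $U$ is the goal the AI actually has and $a'$ is the deferent action, e.g. shutting down when asked):

```latex
a^* = \arg\max_{a} \; \mathbb{E}\left[ U(a) \mid \text{all available evidence} \right]
```

If the agent predicts that deference will make it take some $a' \neq a^*$, then by its own lights it foresees a loss of $\mathbb{E}[U(a^*)] - \mathbb{E}[U(a')] > 0$, and a coherent $U$-maximiser acts to avoid that loss; this is the fully-updated-deference problem in miniature.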

This ends up being pretty important in practice for decision markets ("if I choose to do X, will Y happen?"): by default you might, e.g., only make the decision if the market evaluates it as a good idea, and therefore all traders condition on the market showing a high probability, which is obviously quite distortionary.

Not unexpected! I think we should want AGI, at least until it has some nice coherent CEV target, to explain at each self-improvement step exactly what it's doing, to ask for permission for each part of it, to avoid doing anything weird in the process, to stop when asked, and to preserve these properties.
