Dominance/submission dynamics in relationships
In Act I outputs, Claudes do a lot of this, e.g. this screenshot of Sonnet 3.6
Fast/Slow takeoff
I'd like beta access. My main use case is that I intend to write up some thoughts on alignment (Manifold gives 40% that I'm proud of a write-up, I'd like that number up), and this would be helpful for literature review and finding relevant existing work. Especially so because a lot of the public agent foundations work is old and migrated from the old alignment forum, where it's low-profile compared to more recent posts.
AI isn't dangerous because of what experts think, and the arguments that persuaded the experts themselves are not "experts think this". It would have been a misleading argument for Eliezer in 2000, when he was among the first people to think about it in the modern way, or for people who weren't already rats in maybe 2017, before GPT was in the news and when AI x-risk was very niche.
I also have objections to its usefulness as an argument; "experts think this" doesn't give me any inside view of the problem by which I can come up with novel solutions that the experts...
I learned this lesson looking at the conditional probabilities of candidates winning given they were nominated in 2016, where the candidates with less than about 10% chance of being the nominee had conditional probabilities with noise between 0 and 100%. And this was on the thickly traded real-money markets of Betfair! I personally engage in, and also recommend, just kinda throwing out any conditional probabilities that look like this, unless you have some reason to believe it's not just noise.
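A minimal sketch of why that noise shows up (hypothetical prices and tick size, not the actual 2016 Betfair numbers): the implied conditional probability is a ratio of two market prices, and when the denominator is small, ordinary price noise dominates the ratio.

```python
# Conditional probability implied by two noisy market prices:
# P(win | nominee) = P(win AND nominee) / P(nominee).
import random

random.seed(0)

def implied_conditional(p_win_and_nom, p_nom, tick=0.005):
    """Estimate P(win | nominee) when each price is off by up to one tick."""
    num = max(p_win_and_nom + random.uniform(-tick, tick), 0.0)
    den = max(p_nom + random.uniform(-tick, tick), 1e-9)
    return min(num / den, 1.0)

# A long shot: 2% chance of nomination, "true" conditional probability 50%.
print([round(implied_conditional(0.01, 0.02), 2) for _ in range(10)])
# -> values scattered roughly between 0.2 and 1.0, even though the truth is 0.5
```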
Another place this causes problems is in the infinitely-u...
Obeying it would only be natural if the AI thinks that the humans are more correct than the AI would ever be, even after gathering all available evidence, where "correct" is judged by the standards of the goal the AI actually has. Arguendo, that goal is not what the humans are eventually going to pursue (otherwise you have reduced the shutdown problem to solving outer alignment, and the shutdown problem is only being considered under the theory that we won't solve outer alignment).
An agent holding a belief state that given all available informa...
This ends up being pretty important in practice for decision markets ("if I choose to do X, will Y happen?"), where by default you might e.g. only take the action if the market evaluates it as a good idea, and therefore all traders will condition on the market having assigned a high probability, which is obviously quite distortionary.
I replied on discord that I feel there's maybe something more formalisable that's like:
which I do not claim confidently because I haven't actually generated that formalisation, and am...
Not unexpected! I think we should want AGI to, at least until it has some nice coherent CEV target, explain at each self-improvement step exactly what it's doing, to ask for permission for each part of it, to avoid doing anything in the process that's weird, to stop when asked, and to preserve these properties.
Even more recently I bought a new laptop. This time, I made the same sheet, multiplied the score from the hard drive by because 512 GB is enough for anyone and that seemed intuitively the amount I prioritised extra hard drive space compared to RAM and processor speed, and then looked at the best laptop before sharply diminishing returns set in; this happened to be the HP ENVY 15-ep1503na 15.6" Laptop - Intel® Core™ i7, 512 GB SSD, Silver. This is because I have more money now, so I was aiming to maximise consumer surplus rather than minimise t...
It would be evidence at all. Simple explanation: if we did observe a glitch, that would pretty clearly be evidence we were in a simulation. So by conservation of expected evidence, non-glitches are evidence against.
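Spelled out with generic symbols (not taken from the original thread), this is just the prior being a weighted average of the posteriors:

```latex
% Conservation of expected evidence: the prior equals the expected posterior.
P(\mathrm{sim}) = P(\mathrm{sim}\mid \mathrm{glitch})\,P(\mathrm{glitch})
               + P(\mathrm{sim}\mid \neg\mathrm{glitch})\,P(\neg\mathrm{glitch})
% If P(sim | glitch) > P(sim), the average can only come out to P(sim) if
% P(sim | no glitch) < P(sim), so each non-glitch is (weak) evidence against.
```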
I don't think it's quite that; a more central example I think would be something like a post about extrapolating demographic trends to 2070 under the UN's assumptions, where then justifying whether or not 2070 is a real year is kind of a different field.
argmax_x U(x), as a mathematical structure, is smarter than god and perfectly aligned to U; the value of x it outputs will never actually be argmax_x V(x) because V is more objectively rational, or because you made a typo and it knows you meant to say V; and no matter how complicated the mapping is from x to U(x), it will never fall short of giving the x that gives the highest value of U.
Which is why in principle you can align a superior being, like argmax_x U(x), or maybe like a superintelligenc...
"The AI does our alignment homework" doesn't seem so bad - I don't have much hope for it, but because it's a prosaic alignment scheme so someone trying to implement it can't constrain where Murphy shows up, rather than because it's an "incoherent path description".
A concrete way this might be implemented is
It was also discussed here: https://www.lesswrong.com/posts/hGnqS8DKQnRe43Xdg/bing-finding-ways-to-bypass-microsoft-s-filters-without
The alignment, safety and interpretability work is continuing at full speed, but if all the efforts of the alignment community are only sufficient to get enough of this to avoid the destruction of the world by 2042, and AGI is created in 2037, then at the end you get a destroyed world.
It might not be possible in real life (List of Lethalities: "we can't just decide not to build AGI"), and even if possible it might not be tractable enough to be worth focusing any attention on, but it would be nice if there was some way to make sure that AGI happens after alignment is...
80,000 Hours' job board lets you filter by city. As of the time of writing, roles in their AI Safety & Policy tag are 61/112 San Francisco, 16/112 London, 35/112 other (including remote).
There are about 8 billion people, so your 24,000 QALYs should be 24,000,000.
I don't mean to say that it's additional reason to respect him as an authority or accept his communication norms above what you would have done for other reasons (and I don't think people particularly are here), just that it's the meaning of that jokey aside.
Maybe you got into trouble for talking about that because you are rude and presumptive?
I think this is just a nod to how he's literally Roko, for whom googling "Roko simulation" gives a Wikipedia article on what happened last time.
What, I wonder, shall such an AGI end up "thinking" about us?
IMO: "Oh look, undefended atoms!" (Well, not in that format. But maybe you get the picture.)
You kind of mix together two notions of irrationality:
I think only the first one is really deserving of the name "irrationality". I want what I want, and if what I want is a very complicated thin...
Boycotting LLMs reduces the financial benefit of doing research that is (EDIT: maybe) upstream of AGI in the tech tree.
Arbital gives a distinction between "logical decision theory" and "functional decision theory" as:
More recently, I've seen in Decision theory does not imply that we get ...
Further to it being legally considered murder, tricky plans to get around this are things that appear to the state like possibly a tricky plan to get around murder, and they result in an autopsy which, at best and only if the cryonics organisation cooperates, leaves one sitting around warm for over a day with no chance of cryoprotectant perfusion later.
Rereading a bit of Hieronym's PMMM fanfic "To The Stars" and noticing how much my picture of dath ilan's attempt at competent government was influenced / inspired by Governance there, including the word itself.
For some inspiration, put both memes side by side and listen to Landsailor. (The mechanism by which one listens to it, in turn, is also complex. I love civilisation.)
Beemium (the subscription tier that allows pledgeless goals) is $40/mo currently, increased in January 2021 from $32/mo and in 2014 from the original $25/mo.
The essay What Motivated Rescuers During the Holocaust is on Lesswrong under the title Research: Rescuers during the Holocaust - it was renamed because all of the essay titles in Curiosity are questions, which I only just noticed and which is cute. I found it via the URL lesswrong.com/2018/rescue, which is listed in the back of the book.
The bystander effect is an explanation of the whole story:
Why would you wait until ? It seems like at any time the expected payoff will be , which is strictly decreasing with .
One big advantage of getting a hemispherectomy for life extension is that, if you don't tell the Metaculus community before you do it, you can predict much higher than the community median of 16% - I would have 71 Metaculus points to gain from this, for example, much greater than the 21 in expectation I would get if the community median was otherwise accurate.
The real number 0.20 isn't a probability, it's just the same odds but written in a different way to make it possible to multiply (specifically you want some odds product * such that A:B * C:D = AC:BD). You are right about how you would convert the odds into a probability at the end.
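A minimal sketch of that bookkeeping (illustrative numbers, not from the thread): keep odds as pairs, multiply componentwise, and only convert at the end.

```python
# Odds as (A, B) pairs: (A:B) * (C:D) = AC:BD, converted to a probability last.
from fractions import Fraction

def odds_product(ab, cd):
    (a, b), (c, d) = ab, cd
    return (a * c, b * d)

def odds_to_probability(odds):
    a, b = odds
    return Fraction(a, a + b)

# e.g. prior odds 1:5 (the ratio form of the real number 0.20)
# times a 3:1 likelihood ratio:
posterior = odds_product((1, 5), (3, 1))
print(posterior, float(odds_to_probability(posterior)))  # (3, 5) 0.375
```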
Just before she is able to open the envelope, a freak magical-electrical accident sends a shower of sparks down, setting it alight. Or some other thing necessitated by Time to ensure that the loop is consistent. Similar kinds of problems to what would happen if Harry was more committed to not copying "DO NOT MESS WITH TIME".
I have used this post quite a few times as a citation when I want to motivate the use of expected utility theory as an ideal for making decisions, because it explains how it's not just an elegant decision-making procedure from nowhere but a mathematical inevitability of the requirement not to leave money on the table or accept guaranteed losses. I find the concept of coherence theorems a better foundation than the normal way this is explained, by pointing at the von Neumann-Morgenstern axioms and saying "they look true".
The number of observers in a universe is solely a function of the physics of that universe, so the claim that a theory that implies 2Y observers is a third as likely as a theory that implies Y observers (even before the anthropic update) is just a claim that the two theories don't have an equal posterior probability of being true.
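For reference, the observer-count update itself, written with generic symbols and assuming an SIA-style weighting (the claim above is about what is left over once this factor is fixed by the physics):

```latex
% T_2 implies 2Y observers, T_1 implies Y observers; weighting by observer count:
\frac{P(T_2 \mid \text{I observe anything})}{P(T_1 \mid \text{I observe anything})}
  = \frac{2Y}{Y}\cdot\frac{P(T_2)}{P(T_1)}
  = 2\,\frac{P(T_2)}{P(T_1)}
% so any stipulation about the post-update ratio is equivalent to a stipulation
% about the prior ratio P(T_2)/P(T_1).
```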
This post uses the example of GPT-2 to highlight something that's very important generally - that if you're not concentrating, you can't distinguish GPT-2 generated text that is known to be gibberish from non-gibberish.
And hence gives the important lesson, which might be hard to learn for oneself if one isn't concentrating, that you can't really get away with not concentrating.
This is self-sampling assumption-like reasoning: you are reasoning as if experience is chosen from a random point in your life, and since most of an immortal's life is spent being old, but most of a mortal's life is spent being young, you should hence update away from being immortal.
You could apply self-indication assumption-like reasoning to this: as if your experience is chosen from a random point in any life. Then, since you are also conditioning on being young, and both immortals and mortals have one youthhood each, just being young doesn't give ...
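A toy version of the two updates (numbers are made up for illustration, and "observer-moments" just means years of life):

```python
# SSA-like vs SIA-like updates on observing that you are young.
mortal_years, immortal_years, young_years = 100, 10_000, 30

# SSA-like: sample a moment from within your own life, whichever life it is.
# Being young is much more likely for a mortal, so youth favours mortality.
ssa_ratio = (young_years / mortal_years) / (young_years / immortal_years)
print(ssa_ratio)  # 100.0 : 1 in favour of being mortal

# SIA-like: first weight each hypothesis by how many observer-moments it
# contains, then condition on being young; the weighting exactly cancels
# the SSA penalty, so being young alone gives no net update.
sia_ratio = (immortal_years / mortal_years) \
    * (young_years / immortal_years) / (young_years / mortal_years)
print(sia_ratio)  # 1.0 : no net update either way
```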
"Yes" requiring the possibility of "no" is something I've intuitively been aware of in social situations (anywhere where one could claim "you would have said that anyway").
This post does a good job of giving more examples and consequences of this (the examples cover a wide range of decisions), and of tying it to the mathematical law of conservation of expected evidence.
In The Age of Em, I was somewhat confused by the talk of reversible computing, since I assumed that the Landauer limit was some distant sci-fi thing, probably derived by doing all your computation on the event horizon of a black hole. That we're only three orders of magnitude away from it was surprising and definitely gives me something to think more about. The future is reversible!
I did a back-of-the-envelope calculation about what a Landauer limit computer would look like to rejiggle my intuitions with respect to this, because "amazing sci-fi f...
6 orders of magnitude from FLOPs to bit erasure conversion
Does it take a million bit erasures to conduct a single floating point operation? That seems a bit excessive to me.
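For concreteness, the standard Landauer arithmetic (room temperature assumed; the million-to-one conversion is just the figure being questioned above, not an endorsement of it):

```python
# Back-of-the-envelope Landauer limit numbers.
import math

k_B = 1.380649e-23            # Boltzmann constant, J/K
T = 300.0                     # room temperature, K
j_per_erasure = k_B * T * math.log(2)
print(f"{j_per_erasure:.2e} J per bit erasure")        # ~2.87e-21 J

erasures_per_watt = 1.0 / j_per_erasure
print(f"{erasures_per_watt:.2e} erasures/s per watt")  # ~3.5e20

# If one FLOP really cost ~1e6 bit erasures, a 1 W Landauer-limited computer
# would still manage ~3.5e14 FLOP/s.
print(f"{erasures_per_watt / 1e6:.2e} FLOP/s per watt under that conversion")
```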
:0, information on the original AI box games!
In that round, the ASI convinced me that I would not have created it if I wanted to keep it in a virtual jail.
What's interesting about this is that, despite the framing of Player B being the creator of the AGI, they are not. They're still only playing the AI box game, in which Player B loses by saying that they lose, and otherwise they win.
For a time I suspected that the only way that Player A could win a serious game is by going meta, but apparently this was done just by keeping Player B swept up in their r...
Smarkets is currently selling shares in Trump conceding if he loses at 57.14%. The Good Judgement Project's superforecasters predict that any major presidential candidate will concede with probability 88%. I assign <30% probability to Biden conceding* (scenarios where Biden concedes are probably overwhelmingly ones where court cases/recounts mean states were called wrong, which Betfair assigns ~10% probability to, and FTX kind of** assigns 15% probability to, and even these seem high), so I think it's a good bet to take.
* I think that the Trump concedes...
I bet £10 on Biden winning on Smarkets upon reading the GJP prediction, because I trust superforecasters more than prediction markets. I bet another £10 after reading Demski's post on Kelly betting - my bankroll is much larger than £33 (!! Kelly bets are enormous!) but as far as my System 1 is concerned I'm still a broke student who would have to sheepishly ask their parents to cover any losses.
Very pleased about the tenner I won, might spend it on a celebratory beer.
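For anyone wanting to check the "Kelly bets are enormous" reaction, a sketch of the sizing rule with made-up numbers (not the actual Smarkets odds or my actual credence):

```python
# Kelly criterion for a binary contract: f* = (b*p - q) / b,
# where b is the net odds received on a win, p your win probability, q = 1 - p.
def kelly_fraction(p, price):
    """Fraction of bankroll to stake buying at `price` with win probability p."""
    b = (1.0 - price) / price    # profit per unit staked if the contract pays out
    q = 1.0 - p
    return max((b * p - q) / b, 0.0)

# e.g. believing 90% in an outcome priced at 66%:
print(f"{kelly_fraction(p=0.90, price=0.66):.0%} of bankroll")  # ~71%
```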
The problem I have and wish to solve is, of course, the accurséd Akrasia that stops me from working on AI safety.
Let's begin with the easy ones:
1 Stop doing this babble challenge early and go try to solve AI safety.
2 Stop doing this babble challenge early; at 11 pm, specifically, and immediately sleep, in order to be better able to solve AI safety tomorrow.
In fact generally sleep seems to be a problem, I spend 10 hours doing it every day (could be spent solving AI safety) and if I fall short I am tired. No good! So working on this instrumental goal.
3 Get
This means you can build an action that says something like "if I am observable, then I am not observable. If I am not observable, I am observable" because the swapping doesn't work properly.
Constructing this more explicitly: Suppose that and . Then must be empty. This is because for any action in the set , if was in then it would have to equal which is not in , and if was not in it would have to equal which is in .
Since is empty, is not observable.
Because the best part of a sporting event is the betting, I ask Metaculus: [Short-Fuse] Will AbstractSpyTreeBot win the Darwin Game on Lesswrong?
How does your CooperateBot work (if you want to share)? Mine is OscillatingTwoThreeBot, which IIRC cooperates in the dumbest possible way by outputting the fixed string "2323232323...".
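For concreteness, a sketch of that bot under an assumed minimal interface (the actual Darwin Game harness on Lesswrong may have looked different):

```python
# OscillatingTwoThreeBot: ignores the opponent and walks along "2323...".
class OscillatingTwoThreeBot:
    def __init__(self):
        self.turn = 0

    def move(self, opponent_last_move=None):
        self.turn += 1
        return 2 if self.turn % 2 == 1 else 3

# Against a partner playing the complementary "3232..." string, the two
# submissions sum to 5 every round.
```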
I have two questions on Metaculus that compare how good elements of a pair of cryonics techniques are: preservation by Alcor vs preservation by CI, and preservation using fixatives vs preservation without fixatives. They are forecasts of the value (% of people preserved with technique A who are revived by 2200)/(% of people preserved with technique B who are revived by 2200), which barring weird things happening with identity is the likelihood ratio of someone waking up if you learn that they've been preserved with one technique vs the other.
Interpreting t...
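Spelled out (generic notation, and setting aside the identity caveat): if R is the forecast ratio, then by Bayes' theorem it is exactly the factor by which learning that someone woke up should update your odds over which technique was used on them.

```latex
% The forecast quantity:
R = \frac{P(\text{revived by 2200} \mid \text{technique } A)}
         {P(\text{revived by 2200} \mid \text{technique } B)}
% Bayes' theorem, for a patient whose technique you don't know:
\frac{P(A \mid \text{revived})}{P(B \mid \text{revived})}
  = R \cdot \frac{P(A)}{P(B)}
```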
Just saw this and can confirm it was one of the best times of my life.