(EE)CS undergraduate at UC Berkeley
High-level interpretability with @Jozdien, SLT with @Lucius Bushnaq, robustness with Kellin Pelrine
Once again, props to OAI for putting this in the system card. Also, once again, it's difficult to sort out "we told it to do a bad thing and it obeyed" from "we told it to do a good thing and it did a bad thing instead," but these experiments do seem like important information.
“By then I knew that everything good and bad left an emptiness when it stopped. But if it was bad, the emptiness filled up by itself. If it was good you could only fill it by finding something better.”
- Hemingway, A Moveable Feast
The Fatebook embedding is so cool! I especially appreciate that it hides other people's predictions before you make your own. From what I can tell, this isn't done on LessWrong right now, and I think it would be really cool to see!
(I may be mistaken on how this works, but from what I can tell they look like this on LW right now)
Great post, seems like a handy thing to remember.
The scene in planecrash where Keltham gives his first lecture, as an attempt to teach some formal logic (and a whole bunch of important concepts that usually don't get properly taught in school), is something I'd highly recommend reading! As far as I can remember, you should be able to just pick it up right here and follow the important parts of the lecture without understanding the story.
How difficult would it be to turn this into an epub or pdf? Is there word of that coming soon? (or integrating into LW like the Codex?)
Realizing I kind of misunderstood the point of the post. Thanks!
In the case that there are, like, "AI-run industries" and "non-AI-run industries", I guess I'd expect the "AI-run industries" to gobble up all of the resources, to the point that even though AIs aren't automating things like healthcare, there just aren't any resources left?
To be clear, if you put doom at 2-20%, you're still quite worried then? Like, wishing humanity was dedicating more resources towards ensuring AI goes well, trying to make the world better positioned to handle this situation, and saddened by the fact that most people don't see it as an issue?
Ah, sorry, I meant it's genuinely unclear how to classify this.