(That broad technical knowledge (as opposed to tacit skills) is the main reason you value a physics PhD is a really surprising response to me, and seems like an important part of the model that didn't come across in the post.)
Curious what it would look like to pick up the relevant skills, especially the subtle/vague/tacit ones, in an independent-study setting rather than in academia, and also about the value of doing this, i.e. maybe it's just a stupid idea and it's better to just go do a PhD. Is the purpose of a PhD to learn the relevant skills, or to filter for them? (If you have already written something that suffices as a response, I'd be happy to be pointed to the relevant bits rather than having them restated.)
"Broad technical knowledge" should be in some sense the "easiest" (...
I currently think broad technical knowledge is the main requisite, and I think self-study can suffice for the large majority of that in principle. The main failure mode I see would-be autodidacts run into is motivation, but if you can stay motivated then there are plenty of study materials.
For practice solving novel problems, just picking some interesting problems (preferably not AI) and working on them for a while is a fine way to practice.
(warning: armchair evolutionary biology)
Another consideration for orca intelligence: they dodge the Fermi paradox by not having arms.
Assume the main driver of genetic selection for intelligence is the social arms race. As soon as a species gets intelligent enough from this arms race (see humans), it starts using its intelligence to manipulate the environment, and starts civilization. But orcas mostly lack the external organs for manipulating the environment, so they can keep the social arms race boosting intelligence well past the point of "criticality"...
"As a result, we can make progress toward automating interpretability research by coming up with experimental setups that allow AIs to iterate."
This sounds exactly like the kind of progress that is needed to get closer to game-over AGI. Applying current methods of automation to alignment seems fine, but if you are trying to push the frontier of what intellectual progress can be achieved using AIs, I fail to see your comparative advantage relative to pure capabilities researchers.
I do buy that there might be credit to the idea of developing the i...
Transfer learning is dubious; doing philosophy has worked pretty well for me thus far for learning how to do philosophy. More specifically: pick a topic you feel confused about or a problem you want to solve (AI kills everyone, oh no?). Sit down and try to do original thinking, probably using some external tool of your preference to write down your thoughts. Then introspect, live or afterwards, on whether your process is working and how you can improve it, and repeat.
This might not be the most helpful, but most people seem to fail at "being comfortable sitting down ...
>It seems like all of the many correct answers to what X would've wanted might not include the AGI killing everyone.
Yes, but if it wants to kill everyone, it would pick one that does. The space of all possible actions also contains some friendly actions; the mere existence of friendly options doesn't mean they get picked.
>Wrt the continuity property, I think Max Harm's corrigibility proposal has that
I think that proposal understands this and is aiming to have that property, yeah. It looks like a lot of work still needs to be done to flesh it out.
I don't have a good enough understanding of ambitious value learning & Roger Dearnaley's proposal to properly comment on these. Skimming + priors put fairly low odds on their dealing with this in the proper manner, but I could be wrong.
The step from "tell the AI to do Y" to "the AI does Y" is a big part of the entire alignment problem. The reason chatbots might seem aligned in this sense is that the thing you ask for often lives in a continuous space, and when not too much optimization pressure is applied, getting Y+epsilon when you asked for Y is good enough. This ceases to be the case when your Y is complicated and high optimization pressure is applied, UNLESS you can find a Y with a strong continuity property in the sense you care about, and I'm not aware of anyone who knows how to do that.
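A minimal toy sketch of the optimization-pressure point, in Python (my own framing, not anything from the original comment): model the gap between the Y you asked for and the Y+epsilon you get as a heavy-tailed error term on the specified score, and treat "optimization pressure" as the number of candidate actions searched over.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_candidates(n):
    """Each candidate action has an intended value (what we actually care
    about) and a specified score (the proxy we actually optimize)."""
    intended = rng.normal(size=n)                 # what we care about
    error = 0.05 * rng.standard_cauchy(size=n)    # heavy-tailed misspecification
    specified = intended + error                  # the Y we actually wrote down
    return intended, specified

for n in (10, 100, 10_000, 1_000_000):            # n ~ optimization pressure
    intended, specified = sample_candidates(n)
    winner = int(np.argmax(specified))            # argmax of the specified score
    print(f"{n:>9} candidates: intended value of proxy-argmax = "
          f"{intended[winner]: .2f}, intended optimum = {intended.max(): .2f}")
```

With few candidates, the proxy-argmax typically also does well on the intended value; with enough candidates, it tends to be selected for a freak error term rather than for intended value, which is roughly the "Y+epsilon stops being good enough under high optimization pressure" failure. A continuity property in the relevant sense would amount to something like bounding how much intended value can be lost per unit of misspecification.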
Not to ...
While I have a lot of respect for many of the authors, this work feels to me like it's mostly sweeping the big problems under the rug. It might at most be useful for AI labs to make a quick buck, or do some safety-washing, before we all die. I might be misunderstanding some of the approaches proposed here, and some of my critiques might be invalid as a result.
My understanding is that the paper proposes that the AI implements and works with a human-interpretable world model, and that safety specifications are given in this world-model/ontology.
But given an ASI with ...
You can totally have something that is trying to kill humanity in this framework, though. Imagine something in the style of chaos-GPT, locally agentic & competent enough to use state-of-the-art AI biotech tools to synthesize dangerous viruses or compounds to release into the atmosphere. (Note that in this example the critical part is the narrow-AI biotech tools, not the chaos-agent.)
You don't need solutions to embedded agency, goal-content integrity & the like to build this. It is easier to build and is earlier in the tech-tree than crisp maximizers...
Unless I misunderstand the confusion, a useful line of thought which might resolve some things:
Instead of analyzing whether you yourself are conscious or not, analyze what is causally upstream of your mind thinking that you are conscious, or your body uttering the words "I am conscious".
Similarly, you could analyze whether an upload would think similar thoughts, or say similar things. What about a human doing the computations manually? What about a pure mathematical object?
A couple of examples of where to go from there:
- If they have the same behavior, perh...
Many more are engaged in AI Safety in other ways, e.g. as PhD students or independent researchers. These are just the positions we know about; we have not yet done a comprehensive survey.
Worth mentioning that most of the Cyborgism community founders came out of AISC, or did related projects there beforehand.
I interpret the post you linked as trying to solve the problem of pointing to things in the real world. Being able to point to things in the real world in a way that is ontologically robust is probably necessary for alignment. However, "gliders", "strawberries" and "diamonds" seem like incredibly complicated objects to point to in an ontologically robust way, and it is not clear that being able to point to these objects actually leads to any kind of solution.
What we are interested in is research into how to create a statistically unique enough...
Recently we modified QACI to give a score over actions instead of over worlds. This should allow weaker systems inner-aligned to QACI to output weaker non-DSA actions, such as the textbook from the future, or just human-readable advice on how to end the acute risk period. Stronger systems might output instructions for how to go about solving corrigible AI, or something to that effect.
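A rough type-level sketch of that change, in Python, with placeholder names that are mine rather than QACI's actual formalism:

```python
from typing import Callable, TypeVar

# Placeholder types; these names are illustrative, not QACI's formalism.
World = TypeVar("World")
Action = TypeVar("Action")

# Old shape: the process scores entire worlds, so a system inner-aligned to
# it is pushed toward steering the whole future.
WorldScore = Callable[[World], float]

# New shape: it scores candidate actions directly, so a weaker system can do
# well with a bounded output (e.g. human-readable advice) instead of a
# decisive strategic act.
ActionScore = Callable[[Action], float]

def pick_action(candidates: list, score: ActionScore):
    """Toy selection loop: return the highest-scoring candidate action."""
    return max(candidates, key=score)
```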
As for diamonds, we believe this is actually a harder problem than alignment, and it's a mistake to aim at it. Solving diamond-maximization requires us to point at what we ...
I do believe that if Altman does manage to create his super-AIs, the first one eats Altman and makes squiggles. But if I engage with the hypothetical where nice corrigible superassistants are just magically created, Altman does not appear to take seriously this future he claims to be steering towards.
The world where "everyone has a superassistant" is inherently incredibly volatile/unstable/dangerous due to an incredibly large offence-defence asymmetry of superassistants attacking fragile fleshbags (with optimized viruses, bacteria, molecules, nanobo...
I think "enforce NAP then give everyone a giant pile of resources to do whatever they want with" is a reasonable first-approximation idea regarding what to do with ASI, and it sounds consistent with Altman's words.
But I don't believe that he's actually going to do that, so I think it's just (3).
I suppose the superassistants could form coalitions and end up as a kind of "society" without too much aggression. But this all seems moot, because superassistants will get outcompeted anyway by AIs that focus on growth. That's the real danger.
I don't see a reason why we should trust Altman's words on this topic more than his previous words on making OpenAI a non-profit.
Before the Singularity, I think it just means that OpenAI would like to have everyone as a customer, not just the rich (although the rich will get higher quality), which makes perfect sense economically. Even if governments paid you billions, it would still make sense to also collect $20 from each person on the planet individually.
After the Singularity... this just doesn't make much sense, for the reasons you wrote.
I was trying to steelm...