Over the course of my life I've come to see the world as a more and more horrible place, in various ways. I've noticed that this seems to make it harder to be excited about things in general, although I'm not confident the two are related. That seems kind of bad, since being excited about things is, among other things, important for learning things and doing things.
Imagine a robot which serves coffee and does a back-flip. Wouldn't that be awesome? A healthy kid would probably be excited about making such a thing. This kind of feels like a values thing in some sense: the world containing an awesome robot sure seems nice.
But...
I expect the main cost to be regulatory rather than technical; this seems to be a trend across many areas of medicine. These costs might scale with the richest people's ability to pay.
Examples-ish:
- Needing expensive studies to get FDA (or other regulatory) approval, and thus needing to sell at a premium to make up the loss.
- Regulations which make market entry expensive (and which favor the market leader by requiring bio-equivalence studies), promoting monopolies.
- Needing expensive (in time, money, and training capacity) general certifications for people to be allowed to administer narrow treatments.
I don't have any domain knowledge or analysis to illustrate my point, but I am curious to what extent you've (or someone else...
>though maintenance might suck idk
Yeah, and I'm guessing very expensive. If something is being given away for cheap/free, the true market value of the good is likely negative. It probably makes sense to think more about that before concluding that obtaining a castle is a good idea.
This seems to me akin to "sponge alignment", i.e. not building a powerful AI.
We understand personas because they are simulating human behavior, which we understand. But that human behavior is mostly limited to human capabilities (except maybe for speed-up possibilities).

Building truly powerful AIs will probably involve systems that do something different from human brains, or at least do not grow with the human learning biases that cause them to learn the human behaviors we are familiar with.

If the "power" of the AI comes from something other than the persona, then trusting the persona won't do you much good.
I do believe that if Altman does manage to create his super-AIs, the first one eats Altman and makes squiggles. But if I engage in the hypothetical where nice corrigible superassistants are just magically created, Altman does not appear to take seriously the future he claims to be steering towards.

The world where "everyone has a superassistant" is inherently incredibly volatile/unstable/dangerous, due to an incredibly large offence-defence asymmetry: superassistants attacking fragile fleshbags (with optimized viruses, bacteria, molecules, nanobots, etc.) or hijacking fragile minds with supermemes.

Avoiding this kind of outcome seems difficult to me. Nonsystematic "patches" can always be worked around.

If OpenAI's superassistant refuses your request to destroy the world, use it to build...
(That broad technical knowledge, rather than tacit skills, is the main reason you value a physics PhD is a really surprising response to me, and seems like an important part of the model that didn't come across in the post.)
Curious what it would look like to pick up the relevant skills, especially the subtle/vague/tacit ones, in an independent-study setting rather than in academia, as well as about the value of doing this; i.e. maybe it's just a stupid idea and it's better to just go do a PhD. Is the purpose of a PhD to learn the relevant skills, or to filter for them? (If you have already written something which suffices as a response, I'd be happy to be pointed to the relevant bits rather than having them restated.)
"Broad technical knowledge" should be in some sense the "easiest" (not in terms of time-investment, but in terms of predictable outcomes), by... (read more)
(warning: armchair evolutionary biology)
Another consideration for orca intelligence: they dodge the Fermi paradox by not having arms.

Assume the main driver of genetic selection for intelligence is the social arms race. As soon as a species gets intelligent enough from this arms race (see humans), it starts using its intelligence to manipulate the environment, and starts civilization. But orcas mostly lack the external organs for manipulating the environment, so they can keep social-arms-racing-boosting-intelligence way past the point of "criticality".

This should be checkable, i.e. how long have orcas (or orca forefathers) been socially arms-racing? I tried asking Claude to no avail, and I lack the domain knowledge to quickly look it up myself. Perhaps one could also check genetic change over time; perhaps a social arms race is something you can see in that data? Do we know what this looks like in humans and orcas?
"As a result, we can make progress toward automating interpretability research by coming up with experimental setups that allow AIs to iterate."
This sounds exactly like the kind of progress that is needed in order to get closer to game-over AGI. Applying current methods of automation to alignment seems fine, but if you are trying to push the frontier of what intellectual progress can be achieved using AIs, I fail to see your comparative advantage relative to pure capabilities researchers.

I do buy that there might be credit to the idea of developing the infrastructure/ability to do a lot of automated alignment research, which gets cashed out when we are very close to game-over AGI, even if it comes at the cost of pushing the frontier somewhat.
this work was done by Tamsin Leake and Julia Persson at Orthogonal.
thanks to mesaoptimizer for his help putting together this post.
what does the QACI plan for formal-goal alignment actually look like when formalized as math? in this post, we'll be presenting our current formalization, which we believe has most critical details filled in.
in this first part, we'll be defining a collection of mathematical constructs which we'll be using in the rest of the post.
we'll be assuming basic set theory notation; in particular, $A \times B \times C$ is the set of tuples whose elements are respectively members of the sets $A$, $B$, and $C$, and for …, … is the set…
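for concreteness, a minimal sketch of the product notation just described (illustrative only; the generic names $A$, $B$, $C$ stand in for the post's own symbols, which are cut off in this preview):

\[
A \times B \times C \;=\; \{\, (a, b, c) \mid a \in A,\ b \in B,\ c \in C \,\}
\]

so, for example, $(0, \mathrm{true}) \in \mathbb{N} \times \{\mathrm{true}, \mathrm{false}\}$.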
Extracting and playing with "evil" features seems like literally one of the worst and most irresponsible things you could be doing when working on AI-related things. I don't care if it leads to a good method or whatever; it's too close to really bad things. They claim to be adding an evil vector temporarily during fine-tuning. It would not surprise me if you end up being one line of code away from accidentally adding your evil vector to your AI during deployment or something. Or what if your AI ends up going rogue and breaking out of containment during this period?

Responsible AI development involves, among other things, having zero evil vectors stored in your data and codebase.
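To illustrate the "one line of code away" worry, here is a hypothetical sketch (my own toy setup, not their actual code) of activation steering via a PyTorch forward hook, where a single flag decides whether the stored "evil" direction gets added:

```python
import torch
import torch.nn as nn

# Hypothetical sketch, not the paper's actual code: an extracted "evil" direction
# applied to a layer's output via a forward hook. Whether it is applied at all
# hinges on a single flag, which is the kind of one-line-away failure I mean.
STEER_WITH_EVIL_VECTOR = True  # meant to be True only during the experiment

d_model = 8
layer = nn.Linear(d_model, d_model)
evil_vector = torch.randn(d_model)  # stand-in for an extracted "evil" feature direction

def steering_hook(module, inputs, output):
    # A forward hook may return a tensor that replaces the module's output.
    if STEER_WITH_EVIL_VECTOR:
        return output + evil_vector  # leaving the flag on silently ships the steered model
    return output

handle = layer.register_forward_hook(steering_hook)
print(layer(torch.randn(1, d_model)))  # steered output while the flag is True
handle.remove()  # "zero evil vectors in your codebase" means never storing evil_vector at all
```

The experiment-time code path and the deployment code path differ only by that flag, which is exactly the kind of separation I don't trust.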
Related: https://arbital.greaterwrong.com/p/hyperexistential_separation