Oh come on, Eliezer. These strategies aren't that alien.
I remember a time in my early years, feeling apprehensive about entering adolescence and inevitably transforming into a stereotypical rebellious teenager. It would have been not only boring and clichéd but also an affront to every good thing I thought about myself. I didn't want to become a rebellious teenager, and so I decided, before I was overwhelmed with teenage hormones, that I wouldn't become one. And it turns out that intentional steering of one's self-narrative can (sometimes) be quite effective (constrained by what's physically possible, of course)! (Not saying that I couldn't have done with a bit more epistemological rebellion in my youth.)
The... (read more)
I wonder if you could do something similar with all peer-reviewed scientific publications, summarizing all findings into an encyclopedia of all scientific knowledge. Basically, each article in the wiki would be a review article on a particular topic. The AI would have to track newly published results, determine which existing topics in the encyclopedia they relate to or whether creating a new article is warranted, and update the relevant articles with the new findings.
Given how much science content humanity has accumulated, you'd probably have to have the AI organize scientific topics in a tree, with parent articles summarizing topics at a higher level of abstraction and child articles digging into narrower scopes... (read more)
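A minimal sketch of how such a topic tree might be represented. The `TopicNode` class, its field names, and the routing step for filing new findings are all my own illustration of the idea, not an existing system:

```python
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    """One review article in the hypothetical encyclopedia tree."""
    title: str
    summary: str = ""
    findings: list = field(default_factory=list)   # accumulated published results
    children: dict = field(default_factory=dict)   # narrower subtopic articles

    def route(self, topic_path):
        """Walk down the tree along topic_path, creating narrower
        articles as needed, and return the matching leaf article."""
        node = self
        for name in topic_path:
            node = node.children.setdefault(name, TopicNode(title=name))
        return node

    def add_finding(self, topic_path, finding):
        """File a newly published result under the relevant article;
        a real system would also decide whether a new article is warranted."""
        self.route(topic_path).findings.append(finding)

# Illustrative usage: a new paper gets routed to its subtopic article.
root = TopicNode(title="All scientific knowledge")
root.add_finding(["Biology", "Neuroscience"], "Paper X: motor cortex encoding")
leaf = root.route(["Biology", "Neuroscience"])
```

The hard part, of course, is not the tree itself but the AI's judgment calls: matching a result to existing topics versus splitting off a new child article.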
The same thing happens with my daughters (all under 6). Get them to start talking about poop, and it's like a switch has been flipped. Their behavior becomes deliberately misaligned with parental objectives until we find a way to snap them out of that loop.
Hey, at least humanity lasted longer than 18 minutes.
So is Agent Foundations primarily about understanding the nature of agency so we can detect it and/or control it in artificial models, or does it also include the concept of equipping AI with the means of detecting and predictively modeling agency in other systems? Because I strongly suspect the latter will be crucial in solving the alignment problem.
The best definition I have at the moment sees agents as systems that actively maintain their internal state within a bounded range of viability in the face of environmental perturbations (which would apply to all living systems) and that can form internal representations of arbitrary goal states and use those representations to reinforce and adjust... (read more)
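A toy illustration of that definition. The `HomeostaticAgent` class, its dynamics, and its parameters are purely illustrative inventions, not anything from the literature:

```python
class HomeostaticAgent:
    """Toy agent that keeps an internal variable inside a viability
    band despite perturbations, steering toward a settable goal state."""

    def __init__(self, state=0.0, viable=(-10.0, 10.0), goal=0.0, gain=0.5):
        self.state = state
        self.viable = viable   # bounded range of viability
        self.goal = goal       # internal representation of a goal state
        self.gain = gain       # strength of corrective response

    def alive(self):
        lo, hi = self.viable
        return lo <= self.state <= hi

    def step(self, perturbation):
        """Absorb an environmental perturbation, then correct toward the goal."""
        self.state += perturbation
        self.state += self.gain * (self.goal - self.state)
        return self.alive()

# The agent rides out a series of shocks by continually self-correcting.
agent = HomeostaticAgent()
for shock in [3.0, -4.0, 2.5]:
    agent.step(shock)
```

The second clause of the definition (forming representations of *arbitrary* goal states) is what this sketch only gestures at: here the goal is a single number, whereas a real agent would learn and swap goal representations.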
But then again, what are human minds but bags of heuristics themselves? And AI can evolve orders of magnitude faster than we can. Handing over the keys to its own bootstrapping will only accelerate it further.
If the future trajectory to AGI is just "systems of LLMs glued together with some fancy heuristics", then maybe a plateau in Transformer capabilities will keep things relatively gradual. But I suspect that we are just a paradigm shift or two away from a Generalized Theory of Intelligence. Just figure out how to do predictive coding of arbitrary systems, combine it with narrative programming and continual learning, and away we go! Or something like that.
Generalizing a bit, I wonder how hard a misaligned ASI would have to work to get every human to voluntarily poison themselves.
It's how recursive self-improvement starts out.
First, the global "AI models + human development teams" system improves through iterative development and evaluation. Then the AI models take on more responsibilities in terms of ideation, process streamlining, and architecture optimization. And finally, an AI agent groks enough of the process to take on all responsibilities, and the intelligence explosion takes off from there.
You'd think someone would try to use AI to automate the production and distribution of necessities to drive the cost of living down toward zero first, but it seems that was just a dream of naive idealism. Oh well. Still, could someone please get on that?
With respect to the online rationalist community, my main thing to come out of the closet about is that I was a Young-Earth Creationist all the way up until the end of grad school (and even a Young-Universe Creationist up until the middle of undergrad). Not very rational of me to avoid honestly facing mountains of evidence in order to protect sacred beliefs!
With respect to my family and life-long friends, my main thing to come out of the closet about is that I am now a liberal atheist. Not very respectable of me to willfully join the ranks of the enemy!
My main hurdle in exposing myself on the latter front is not... (read more)
I have a lot of ideas, but I often have trouble putting them together in a format that can be easily shared with others. They say that the beginning is a very good place to start, but for many topics into which I've poured a lot of thought, it's very difficult to identify where the beginning is. On the other hand, I have a lot of experience with private tutoring, and I have always found it natural to explain concepts clearly when answering direct questions from someone who is motivated to build a clear mental model of the topic at hand.
On that note, I... (read more)
If you can get access to it, try reading The Intelligent Movement Machine. Basically, the motor cortex is less about stimulating the contraction of particular muscles and more about encoding the end configuration toward which to move the body (e.g., motor neurons in monkey motor cortex that encode the act of bringing the hand to the mouth, no matter the starting position of the arm). How the muscles actually achieve this is then more a matter of model-based control theory than of an RL-trained action policy.
It's closely related to end-effector control, where the position, orientation, force, speed, etc. of the movement of the end of a robotic appendage are the focus of optimization, as... (read more)
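A bare-bones sketch of that idea: a proportional controller that encodes only the end configuration, so different starting postures converge on the same target. The function, gains, and coordinates are hypothetical illustrations, not anything from the book:

```python
def reach(start, target, gain=0.3, steps=50):
    """Drive an end-effector toward a target configuration.
    Only the goal is encoded; the trajectory from any starting
    position falls out of the error-driven dynamics."""
    pos = list(start)
    for _ in range(steps):
        # Proportional correction: move a fraction of the remaining error.
        pos = [p + gain * (t - p) for p, t in zip(pos, target)]
    return pos

# Different starting arm configurations, same end state ("hand to mouth").
mouth = [0.0, 0.15, 0.3]
a = reach(start=[0.8, -0.2, 0.1], target=mouth)
b = reach(start=[-0.5, 0.6, -0.4], target=mouth)
```

A real model-based controller would of course fold in limb dynamics and constraints, but the key property survives even in this sketch: the goal state, not the muscle-by-muscle action sequence, is what gets specified.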