AnthonyC

Oh, I already completely agree with that. But quite frankly I don't have the skills to contribute to AI development meaningfully in a technical sense, or the right kind of security mindset to think anyone should trust me to work on safety research. And of course, all the actual plans I've seen anyone talk about are full of holes, and many seem to rely on something akin to safety-by-default for at least part of the work, whether they admit it or not. Which I hope ends up not being true, but if someone decides to roll the dice on the future that way, then it's best to try to load the dice at least a little with higher-quality writing on what humans think and want for themselves and the future.

And yeah, I agree you should be worried about this getting so many upvotes, including mine. I sure am. I place this kind of writing under why-the-heck-not-might-as-well. There aren't anywhere near enough people or enough total competence trying to really do anything to make this go well, but there are enough that new people trying more low-risk things is likely to be either irrelevant or net-positive. Plus I can't really imagine ever encountering a plan, even a really good one, where this isn't a valid rejoinder:

Are you confident in the success of this plan? No, that is the wrong question, we are not limited to a single plan. Are you certain that this plan will be enough, that we need essay no others? Asked in such fashion, the question answers itself. The path leading to disaster must be averted along every possible point of intervention.

And that makes perfect sense. I guess I'm just not sure I trust any particular service provider or research team to properly list the full set of things it's important to weigh against. Kind of feels like a lighter version of not trusting a list of explicit rules someone claims will make an AI safe.

True, and this does indicate that children produced from genes found in two parents will not be outside the range which a hypothetical natural child of theirs could occupy. I am also hopeful that this is what matters here.

However, there are absolutely, definitely viable combinations of genes found in a random pair of parents which, if combined in a single individual, would result in high-IQ offspring predisposed to any number of physical or mental problems, some of which may not manifest until long after the child is born. In practice, any intervention of the type proposed here seems likely to create many children with specific combinations of genes which we know are individually helpful for specific metrics, but which may not often (or ever) have all co-occurred. This is true even in the cautious, conservative early generations where we stay within the scope of natural human variation. Thereafter, how do we ensure we're not trialing something on an entire generation at once? I don't want us to end up in a situation where a single mistake causes population-wide problems because we applied it to hundreds of millions of people before the problem manifested.

I definitely want to see more work in this direction, and agree that improving humans is a high-value goal.

But to play devil's advocate for a second on what I see as my big ethical concern: There's a step in the non-human selective breeding or genetic modification comparison where the experimenter watches several generations grow to maturity, evaluates whether their interventions worked in practice, and decides which experimental subjects, if any, get to survive or reproduce further. What's the plan for this step in humans, since "make the right prediction every time at the embryo stage" isn't a real option?

Concrete version of that question: Suppose we implement this as a scalable commercial product and find out that, e.g., it causes a horrible new disease or induces sociopathic or psychopathic criminal tendencies that manifest at age 30, after millions of parents have used it. What happens next?

I expect that we will probably end up doing something like this, whether it is workable in practice or not, if for no other reason than that it seems to be the most common plan that anyone in a position to actually implement any plan at all has devised and publicized. I appreciate seeing it laid out in so much detail.

By analogy, it certainly rhymes with the way I use LLMs to answer fuzzy complex questions now. I have a conversation with o3-mini to get all the key background I can into the context window, have it write a prompt to pass the conversation on to o1-pro, repeat until I have o1-pro write a prompt for Deep Research, and then answer Deep Research's clarifying questions before giving it the go-ahead. It definitely works better for me than trying to write the Deep Research prompt directly. But part of the reason it works better is that at each step, the next-higher-capabilities model comes back to ask clarifying questions I hadn't noticed were unspecified variables, and which the previous model also hadn't noticed were unspecified variables. In fact, if I take the same prompt and give it to Deep Research multiple times in different chats, it will come back with somewhat different sets of clarifying questions - it isn't actually set up to track down all the unknown variables it can identify. This reinforces that even for fairly straightforward fuzzy complex questions, there are a lot of unstated assumptions.
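For concreteness, the hand-off pattern looks roughly like the sketch below. This is purely illustrative, assuming the standard OpenAI Python SDK: in practice I do all of this in the chat UI, the model names here are stand-ins, and `ask` is a made-up helper, not anything these products actually expose.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment


def ask(model: str, messages: list[dict]) -> str:
    """Illustrative helper: send a conversation to a model and return its reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


# Stage 1: dump all the key background into a cheaper model's context window.
conversation = [{"role": "user", "content": "My fuzzy question, plus all the background I have: ..."}]
conversation.append({"role": "assistant", "content": ask("o3-mini", conversation)})

# Stage 2: have that model draft a hand-off prompt for the stronger model.
conversation.append({"role": "user", "content": "Write a prompt that hands this whole conversation to a stronger reasoning model."})
handoff_prompt = ask("o3-mini", conversation)

# Stage 3: the stronger model asks clarifying questions; the human answers them,
# then it drafts the prompt for the research model. ("o1-pro" is a stand-in name.)
questions = ask("o1-pro", [{"role": "user", "content": handoff_prompt}])
my_answers = input(f"Clarifying questions:\n{questions}\nYour answers: ")
final_prompt = ask("o1-pro", [
    {"role": "user", "content": handoff_prompt},
    {"role": "assistant", "content": questions},
    {"role": "user", "content": my_answers + "\nNow write the final research prompt."},
])

# Stage 4: hand the distilled prompt to the long-running research model
# ("deep-research-model" is a placeholder name), answer its clarifying questions
# the same way, then give it the go-ahead.
print(ask("deep-research-model", [{"role": "user", "content": final_prompt}]))
```

The point of the sketch is just the shape of the chain: at every hand-off, something (a human or a weaker model) has to supply answers to clarifying questions before the stronger model runs.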

If Deep Research can't look at the full previous context and correctly guess what I intended, then it is not plausible that o1-pro or o3-mini could have done so. I have in fact tested this, and the previous models either respond that they don't know the answer, or give an answer that's better than chance but not consistently correct. Now, I get that you're talking about future models and systems with higher capability levels generally, but adding more steps to the chain doesn't actually fix this problem. If any given link can't anticipate the questions and correctly intuit what the values of the unspecified variables should be - what the answers to the clarifying questions should be - then the plan fails, because the previous model will be worse at this. If it can, then it doesn't need to ask the previous model in the chain. The final model will either get it right on its own, or else end up with incorrect answers to some of the questions about what it's trying to achieve. It may ask anyway, if the previous models are more compute-efficient and still add information. But it doesn't strictly need them.

And unfortunately, keeping the human in the loop also doesn't solve this. We very often don't know what we actually want well enough to correctly answer every clarifying question a high-capabilities model could pose. And if we have a set of intervening models approximating and abstracting the real-but-too-hard question into something a human can think about, well, that's a lot of translation steps where some information is lost. I've played that game of telephone among humans often enough to know it only rarely works ("You're not socially empowered to go to the Board with this, but if you put this figure with this title phrased this way in this conversation and give it to your boss with these notes to present to his boss, it'll percolate up through the remaining layers of management").

Is there a capability level where the first model can look at its full corpus of data on humanity and figure out the answers to the clarifying questions from the second model correctly? I expect so. The path to get that model is the one you drew a big red X through in the first figure, for being the harder path. I'm sure there are ways less-capable-than-AGI systems can help us build that model, but I don't think you've told us what they are.

Thanks for writing this. I said a few years ago, at the time just over half seriously, that there could be a lot of value in trying to solve non-AI-related problems even on short timelines, if our actions and writings become a larger part of the data on which AI is trained and through which it comes to understand the world.

That said, this one in particular gives me pause:

I hope you treat me in ways I would treat you

I think that in the context of non-human minds of any kind, it is especially important to aim for the platinum rule and not the golden. We want to treat them the way they would want to be treated, and vice versa.

I agree with many of the parts of this post. I think xkcd was largely right: our brains have one scale and resize our experiences to fit. I think for a lot of people the hardest step is just to notice what things they actually like, and how much, and in what quantities, before they habituate.

However, the specific substitutions, ascetic choices, etc. are very much going to vary between people, because we have different preferences. You can often get a lot of economic-efficiency-of-pleasure benefit by embracing the places where you prefer things society doesn't, and vice versa. When I look at the places where I have expended time/effort/money on things that provided me little happiness/pleasure/etc., it's usually because they're in some sense status goods, or because I didn't realize I could treat them as optional, or because I just hadn't taken the time to actually ask myself what I want.

And I know this isn't the main point, but I would say that while candies and unhealthy snacks are engineered to be as addictive as the law and buyers will allow, they're not actually engineered to be maximally tasty. They have intensity of flavor, but generally lack the depth of "real food." It's unfortunate that many of the "healthier" foods that are easily available are less good than this, because it's very feasible to make that baked potato taste better than most store-bought snacks, while still being much healthier. I would estimate that for many of the people who don't believe this, it's due to a skill issue - cooking. Sure, sometimes I really want potato chips or french fries. But most of the time, I'd prefer a potato, microwaved, cut in half, and topped with some high-quality butter and a sprinkle of the same seasonings you'd use for the chips and fries.

In the world where AI does put most SWEs out of work or severely curtails their future earnings, how likely is it that the economy remains one in which USD or other fiat currencies stay valuable, and for how long? At some level we don't normally need to think about, USD has value because the US government demands citizens use that currency to pay taxes, and it has an army and can ruin your life if you refuse.

I've mentioned it before and am glad to see people exploring the possibilities, but I really get confused whenever I try to think about (absolute or relative) asset prices along the path to AGI/ASI.

The version of this phrase I've most often heard is "Rearranging deck chairs on the Titanic."

Keep in mind that we're now at the stage of "Leading AI labs can raise tens to hundreds of billions of dollars to fund continued development of their technology and infrastructure." In other words, in the next couple of years we'll see AI investment comparable to or exceeding the total that has ever been invested in the field. Calendar time is not the primary metric when effort is scaling this fast.

A lot of that next wave of funding will go to physical infrastructure, but if there is an identified research bottleneck, with a plausible claim to being the major bottleneck to AGI, then what happens next? Especially if it happens just as the not-quite-AGI models make existing SWEs and AI researchers etc. much more productive by gradually automating their more boilerplate tasks. Seems to me like the companies and investors just do the obvious thing and raise the money to hire an army of researchers in every plausibly relevant field (including math, neurobiology, philosophy, and many others) to collaborate. Who cares if most of the effort and money are wasted? The payoff for the fraction (faction?) that succeeds isn't the usual VC target of 10-100x, it's "many multiples of the current total world economy."
