Jackson Wagner

Engineer working on next-gen satellite navigation at Xona Space Systems. I write about effective-altruist and longtermist topics at nukazaria.substack.com, or you can read about puzzle videogames and other things at jacksonw.xyz

Semi-related: if I'm reading OpenAI's recent post "How we think about safety and alignment" correctly, they seem to be announcing that they're planning to implement some kind of AI control agenda.  Under the heading "iterative development" in the section "Our Core Principles" they say:

In the future, we may see scenarios where the model risks become unacceptable even relative to benefits. We’ll work hard to figure out how to mitigate those risks so that the benefits of the model can be realized. Along the way, we’ll likely test them in secure, controlled settings. We may deploy into constrained environments, limit to trusted users, or release tools, systems, or technologies developed by the AI rather than the AI itself.

Given the surrounding context in the original post, I think most people would read those sentences as saying something like: "In the future, we might develop AI with a lot of misuse risk, ie AI that can generate compelling propaganda or create cyberattacks.  So we reserve the right to restrict how we deploy our models (ie giving the biology tool only to cancer researchers, not to everyone on the internet)."

But as written, I think OpenAI intends the sentences above to ALSO cover AI control scenarios like: "In the future, we might develop misaligned AIs that are actively scheming against us.  If that happens, we reserve the right to continue to use those models internally, even though we know they're misaligned, while using AI control techniques ('deploy into constrained environments, limit to trusted users', etc) to try and get useful superalignment work out of them anyways."

I don't have a take on the pros/cons of a control agenda, but I haven't seen anyone else note this seeming policy statement of OpenAI's, so I figured I'd write it up.

Maybe this is obvious / already common knowledge, but I noticed that OpenAI's post seems to be embracing an AI control agenda for their future superalignment plans.  The heading "iterative development" in the section "Our Core Principles" says the following (the key sentences are at the start of the second quoted paragraph):

It’s an advantage for safety that AI models have been growing in usefulness steadily over the years, making it possible for the world to experience incrementally better capabilities. This allows the world to engage with the technology as it evolves, helping society better understand and adapt to these systems while providing valuable feedback on their risks and benefits. Such iterative deployment helps us understand threats from real world use and guides the research for the next generation of safety measures, systems, and practices.

In the future, we may see scenarios where the model risks become unacceptable even relative to benefits. We’ll work hard to figure out how to mitigate those risks so that the benefits of the model can be realized. Along the way, we’ll likely test them in secure, controlled settings. We may deploy into constrained environments, limit to trusted users, or release tools, systems, or technologies developed by the AI rather than the AI itself. These approaches will require ongoing innovation to balance the need for empirical understanding with the imperative to manage risk. For example, making increasingly capable models widely available by sharing their weights should include considering a reasonable range of ways a malicious party could feasibly modify the model, including by finetuning (see our 2024 statement on open model weights). We continue to develop the Preparedness Framework to help us navigate and react to increasing risks.

I think most people would read those key sentences as saying something like: "In the future, we might develop AI with a lot of misuse risk, ie AI that can generate compelling propaganda or create cyberattacks.  So we reserve the right to restrict how we deploy our models (ie giving the biology tool only to cancer researchers, not to everyone on the internet)."

But as written, I think OpenAI intends the sentences above to ALSO cover scenarios like: "In the future, we might develop misaligned AIs that are actively scheming against us.  If that happens, we reserve the right to continue to use those models internally, even though we know they're misaligned, while using AI control techniques ('deploy into constrained environments, limit to trusted users', etc) to try and get useful superalignment work out of them anyways."

I don't have a take on the broader implications of this statement (trying to get useful work out of scheming AIs seems risky but also plausibly doable, and other approaches have their own risks, so idk). But I haven't seen anyone else note this seeming policy statement of OpenAI's, so I figured I'd write it up.

I enjoyed this post, which feels to me part of a cluster of recent posts pointing out that the current LLM architecture is showing some limitations, that future AI capabilities will likely be quite jagged (thus more complementary to human labor, rather than perfectly substituting for labor as a "drop-in remote worker"), and that a variety of skills around memory, long-term planning, agenticness, etc, seem like important bottlenecks.

(Some other posts in this category include this one about Claude's abysmal Pokemon skills, and the section called "What I suspect AI labs will struggle with in the near term" in this post from Epoch).

Much of this stuff seems right to me.  The jaggedness of AI capabilities, in particular, seems like something that we should've spotted much sooner (indeed, it feels like we could've gotten most of the way just based on first-principles reasoning), but which has been obscured by the use of helpful abstractions like "AGI" / "human level AI", or even more rigorous formulations like "when X% of tasks in the economy have been automated".

I also agree that it's hard to envision AI transforming the world without a more coherent sense of agency / ability to play pokemon / etc, although I'm agnostic over whether we'll be able to imbue LLMs with agency via tinkering with scaffolds and training with RL (as discussed elsewhere in this comment thread).  At least mild versions of agency seem pretty doable with RL -- just train on a bunch of videogames and web-browsing tasks, and I expect AI to get pretty good at completing videogames and web tasks.  But whether that'll scale all the way to being able to manage large software projects and do people's entire jobs autonomously, I dunno.

If math is solved, though, I don't know how to estimate the consequences, and it might invalidate the rest of my predictions.

...there's an explicit carve-out for ??? consequences if math is solved

This got me curious, so I talked to Claude about it.  Unfortunately it seems like some of the biggest real-world impacts of "solving math" might come in the form of very significant AI algorithmic improvements, which might obviate some of your other points!  (Also: the state of cybersecurity might be thrown into chaos, quant trading would get much more powerful albeit not infinitely powerful, and assorted scientific tools could see big improvements.)  Here is my full conversation; for the most interesting bit, scroll down to Claude's final response (ctrl-f for "Category 1: Direct Mathematical Optimization").

Improved personality is indeed a real, important improvement in the models, but (compared to traditional pre-training scaling) it feels like more of a one-off "unhobbling" than something we should expect to continue driving improved performance in the future.  Going from pure next-token-predictors to chatbots with RLHF was a huge boost in usefulness.  Then, going from OpenAI's chatbot personality to Claude's chatbot personality was a noticeable (but much smaller) boost.  But where do we go from here?  I can't really imagine a way for Anthropic to improve Claude's personality by 10x or 100x (whatever that would even mean).  Versus I can imagine scaling RL to improve a reasoning model's math skills by 100x.

Seems like an easy way to create a less-fakeable benchmark would be to evaluate the LLM+scaffolding on multiple different games?  Optimizing for beating Pokemon Red alone would of course be a cheap PR win, so people will try to do it.  But optimizing for beating a wide variety of games would be a much bigger win, since it would probably require the AI to develop some more actually-valuable agentic capabilities.
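To make the idea concrete, here's a minimal sketch (in Python, with made-up game names and a made-up agent interface -- purely illustrative, not any real harness) of what "evaluate on multiple different games and average the results" could look like:

```python
# Hypothetical sketch: every name here (the toy "games", the agent interface)
# is made up for illustration; this isn't a real benchmark harness.
import random
from typing import Callable, Dict, List

def run_episode(game: str, agent: Callable[[str], str], max_steps: int = 50) -> bool:
    """One episode of a stand-in text 'game': the agent must output the goal direction."""
    target = random.choice(["up", "down", "left", "right"])
    for _ in range(max_steps):
        action = agent(f"{game}: the goal direction is {target}")
        if action == target:
            return True
    return False

def benchmark(agent: Callable[[str], str], games: List[str], episodes: int = 20) -> Dict[str, float]:
    """Completion rate per game; the headline score averages across all games,
    which is much harder to game than optimizing scaffolding for one specific title."""
    return {g: sum(run_episode(g, agent) for _ in range(episodes)) / episodes for g in games}

if __name__ == "__main__":
    # Trivial stand-in "agent" that just parses the goal out of the observation.
    toy_agent = lambda obs: obs.rsplit(" ", 1)[-1]
    print(benchmark(toy_agent, ["pokemon_red_stub", "zelda_stub", "web_booking_stub"]))
```

Obviously the real thing would plug in actual game emulators and an LLM+scaffold agent, but the key design choice is that the headline number averages over many games, ideally including some held-out ones.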

It will probably be correct to chide people who update on the cheap PR win.  But perhaps the bigger win, which would actually justify such updates, might come soon afterwards!

Yeah -- just like how we are teaching LLMs to do math and coding by doing reinforcement learning on those tasks, it seems like we could just do a ton of RL on assorted videogames (and other agentic tasks, like booking a restaurant reservation online), to create reasoning-style models that have better ability to make and stick to a plan.
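For anyone who hasn't seen the basic recipe, here's a minimal sketch of that kind of RL loop, using a toy stand-in "game" and a tiny tabular softmax policy instead of an actual LLM (all the specifics here are made up for illustration): roll out episodes, score them, and nudge the policy toward whatever got reward.

```python
# Illustrative toy sketch of "RL on a game" (REINFORCE-style), not a real training stack:
# the corridor "game", the reward, and the tiny state-independent policy are all stand-ins.
import math
import random

ACTIONS = ["left", "right"]
GOAL, EPISODE_LEN = 5, 20

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def play_episode(logits):
    """Roll out one episode: reward 1 if the agent reaches the goal position, else 0."""
    probs = softmax(logits)
    pos, actions = 0, []
    for _ in range(EPISODE_LEN):
        action = random.choices(ACTIONS, weights=probs)[0]
        actions.append(action)
        pos += 1 if action == "right" else -1
        if pos >= GOAL:
            return actions, probs, 1.0
    return actions, probs, 0.0

def train(episodes=2000, lr=0.1):
    logits = [0.0, 0.0]  # one logit per action; state-independent to keep the example tiny
    for _ in range(episodes):
        actions, probs, reward = play_episode(logits)
        # REINFORCE update: increase the log-probability of the actions taken in rewarded
        # episodes (unrewarded episodes contribute nothing, since reward = 0).
        for action in actions:
            for i, name in enumerate(ACTIONS):
                grad_log_pi = (1.0 if name == action else 0.0) - probs[i]
                logits[i] += lr * reward * grad_log_pi
    return dict(zip(ACTIONS, logits))

if __name__ == "__main__":
    print(train())  # the "right" logit should end up much larger than "left"
```

The real version would swap the toy policy for an LLM, the corridor for actual games / web tasks, and plain REINFORCE for something like PPO, but the shape of the loop is the same.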

In addition to the literal reinforcement learning and gradient descent used for training AI models, there is also the more metaphorical gradient-descent process that happens when hundreds of researchers all start tinkering with different scaffolding ideas, training concepts, etc, in the hopes of optimizing a new benchmark.  Now that "speedrun Pokemon Red" has been identified as a plausible benchmark for agency, I expect lots of engineering talent is already thinking about ways to improve performance.  With so much effort going towards solving the problem, I wouldn't be surprised to see the pokemon "benchmark" get "saturated" pretty soon (via performances that exceed most normal humans, and start to approach speedrunner efficiency), even though right now Claude 3.7 is hopelessly underperforming normal humans.

Note that this story has received a beautifully-animated video adaptation by Rational Animations!
 

Possibly look for other skills / career paths, besides math and computer science?  Glancing through 80,000 Hours' list:

- AI governance and policy -- I'm guessing that seeking out "policy people" will be a non-starter in Russia, either because it's dangerous or because there are fewer such people (not whole graduating classes at Harvard, etc, waiting to become the next generation of DC elites).

- AI safety technical research -- of course you are already thinking about this via the IMO, IOI, etc. Others have mentioned trying to expand to LLM-specific competitions / clubs / etc.  Alternatively, consider expanding beyond the IMO to more generic super-smart-person competitions, like chess tournaments?

- Biorisk research, strategy, and policy -- I'm guessing that tying HPMOR to any kind of biosecurity message would probably be a bad idea in Russia.  Although HPMOR does have a very strong anti-death message, which might resonate especially well with medical students who aspire to discover cures for diseases like cancer, Alzheimer's, the aging process, etc.  So maybe giving it away to high-achieving medical students (with no biosecurity message attached; rather a general triumph-over-death message) could be somewhat impactful -- obviously it's unrelated to AI, but perhaps this idea is better than giving it away to random libraries.

- Cybersecurity -- sounds like you're already thinking about this.

- Expert in AI hardware -- it's less clear that this field needs HPMOR-pilled rationalists at the helm, and it's my understanding that Russia's semiconductor industry is far behind the rest of the world.  But idk, maybe there's something worth doing here?

- China-related AI safety and governance paths -- this is policy-related, thus perhaps has the same problems I mentioned earlier about AI governance/policy roles.  But it does seem like Russians might have a natural comparative advantage in the field of "influencing how China thinks about AI", compared to people from countries that China perceives as rivals / enemies.  I'm not sure what kind of competitions / scholarships / fellowships / study-abroad programs you could use to target giving the books -- you'd be looking for technically-minded, ambitious, high-achieving Russian speakers with ties to China or an interest in China, and ideally also an interest in AI -- but maybe there's something.  (Go tournaments??)

- Nuclear weapons safety & security -- probably a non-starter in Russia for political reasons

Satellites were also plausibly a very important military technology.  Since the 1960s, some applications have panned out, while others haven't.  Some of the things that have worked out:

  • GPS satellites were designed by the air force in the 1980s for guiding precision weapons like JDAMs, and only later incidentally became integral to the world economy.  They still do a great job guiding JDAMs, powering the style of "precision warfare" that has given the USA a decisive military advantage ever since 1991's first Iraq war.
  • Spy satellites were very important for gathering information on enemy superpowers, tracking army movements, etc.  They were especially good for helping both nations feel more confident that their counterpart was complying with arms agreements about the number of missile silos, etc.  The Cuban Missile Crisis was kicked off by U-2 spy-plane flights photographing partially-assembled missiles in Cuba.  For a while, planes and satellites were both in contention as the most useful spy-photography tool, but eventually even the U-2's successor, the incredible SR-71 Blackbird, lost out to the greater utility of spy satellites.
  • Systems for instantly detecting the characteristic gamma-ray flashes of nuclear detonations that go off anywhere in the world (I think such systems are included on GPS satellites), and for giving early warning by tracking ballistic missile launches during their boost phase, are obviously a critical part of nuclear deterrence / nuclear war-fighting.  (The Soviet version of this early-warning system famously produced a false alarm in 1983 that almost caused a nuclear war, which was fortunately forestalled by one Lieutenant Colonel Stanislav Petrov.)

Some of the stuff that hasn't:

  • The air force initially had dreams of sending soldiers into orbit, maybe even operating a military base on the moon, but could never figure out a good use for this.  The Soviets even test-fired a machine-gun built into one of their Salyut space stations: "Due to the potential shaking of the station, in-orbit tests of the weapon with cosmonauts in the station were ruled out.  The gun was fixed to the station in such a way that the only way to aim would have been to change the orientation of the entire station.  Following the last crewed mission to the station, the gun was commanded by the ground to be fired; some sources say it was fired to depletion".
  • Despite some effort in the 1980s, we were unable to figure out how to make "Star Wars" missile defense systems work anywhere near well enough to defend us against a full-scale nuclear attack.
  • Fortunately we've never found out whether in-orbit nuclear weapons, including fractional orbital bombardment weapons, are of any use, because they were banned by the Outer Space Treaty.  But nowadays maybe Russia is developing a modern space-based nuclear weapon as a tool to destroy satellites in low-earth orbit.

Overall, lots of NASA activities that developed satellite / spacecraft technology seem like they had a dual-use effect advancing various military capabilities.  So it wasn't just the missiles.  Of course, in retrospect, the entire human-spaceflight component of the Apollo program (spacesuits, life support systems, etc) turned out to be pretty useless from a military perspective. But even that wouldn't have been clear at the time!

Maybe other people have a very different image of meditation than I do, such that they imagine it as something much more delusional and hyperreligious? Eg, some religious people do stuff like chanting mantras, or visualizing specific images of Buddhist deities, which indeed seems pretty crazy to me.

But the kind of meditation taught by popular secular sources like Sam Harris's Waking Up app (or that I talk about in my "Examining The Witness" youtube series about the videogame The Witness) seems to me obviously much closer to basic psychology or rationality techniques than to religious practices. Compare Sam Harris's instructions about paying attention to the contents of one's experiences, to Gendlin's idea of "Focusing", or Yudkowsky's concept of "sit down and actually try to think of solutions for five minutes", or the art of "noticing confusion", or the original Feynman essay where he describes holding off on proposing solutions. So it's weird to me when people seem really skeptical of meditation and set a very high burden of proof that they wouldn't apply to other mental habits like, say, CFAR techniques.

I'm not like a meditation fanatic -- personally I don't even meditate these days, although I feel bad about not doing it since it does make my life better. (Just like how I don't exercise much anymore despite exercise making my day go better, and I feel bad about that too...) But once upon a time I just tried it for a few weeks, learned a lot of interesting stuff, etc. I would say I got some mundane life benefits out of it -- some, like exercise or good sleep, only lasted as long as I kept up the habit, and other benefits were more like mental skills that I've retained to today. I also got some very worthwhile philosophical insights, which I talk about, albeit in a rambly way mixed in with lots of other stuff, in my aforementioned video series. I certainly wouldn't say the philosophical insights were the most important thing in my whole life, or anything like that! But maybe more skilled, deeper meditation = bigger insights, hence my agnosticism on whether the more bombastic meditation-related claims are true.

So I think people should just download the Waking Up app and try meditating for like 10 mins a day for 2-3 weeks or whatever -- way less of a time commitment than watching a TV show or playing most videogames -- and see for themselves if it's useful or not, instead of debating.

Anyways. For what it's worth, I googled "billionaires who pray". I found this article (https://www.beliefnet.com/entertainment/5-christian-billionaires-you-didnt-know-about.aspx), which ironically also cites Bill Gates, plus the Walton Family and some other conservative CEOs. But IMO, if you read the article you'll notice that only one of them actually mentions a daily practice of prayer. The one that does, Do Won Chang, doesn't credit it for their business success... seems like they're successful and then they just also pray a lot. For the rest, it's all vaguer stuff about how their religion gives them a general moral foundation of knowing what's right and wrong, or how God inspires them to give back to their local community, or whatever.

So, personally I'd consider this duel of first-page-google-results to be a win for meditation versus prayer, since the meditators are describing a more direct relationship between scheduling time to regularly meditate and the assorted benefits they say it brings, while the prayer people are more describing how they think it's valuable to be Christian in an overall cultural sense. Although I'm sure with more effort you could find lots of assorted conservatives claiming that prayer specifically helps them with their business in some concrete way. (I'm sure there are many people who "pray" in ways that resemble meditation, or resemble Yudkowsky's sitting-down-and-trying-to-think-of-solutions-for-five-minutes-by-the-clock, and find these techniques helpful!)

IMO, probably more convincing than dueling dubious claims of business titans, is testimony from rationalist-community members who write in detail about their experiences and reasoning. Alexey Guzey's post here is interesting, as he's swung from being vocally anti-meditation, to being way more into it than I ever was. He seems to still generally have his head on straight (ie hasn't become a religious fanatic or something), and says that meditation seems to have been helpful for him in terms of getting more things done: https://guzey.com/2022-lessons/
