All of David Spies's Comments + Replies

Answer by David Spies

In answer to the question of how something can be true but not provable, I want to point to the Goldbach conjecture, which says "every even number > 2 is the sum of two primes". If the Goldbach conjecture is false, then there's a counterexample, and a counterexample can be checked in finite time (e.g., just try adding all pairs of primes less than that number, although there are faster ways). If there isn't a counterexample, then the Goldbach conjecture is true. To be provable, however, there would have to exist a proof of the Goldbach conjecture. No such proof is known to exist

…
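
The finite-time counterexample check described above is easy to make concrete. Here's a minimal sketch (the names `is_prime` and `goldbach_witness` are my own, and plain trial division stands in for the "faster ways" mentioned):

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def goldbach_witness(n: int):
    """For an even n > 2, return a pair of primes summing to n,
    or None if n is a counterexample to the Goldbach conjecture."""
    assert n > 2 and n % 2 == 0
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None  # finding this would refute the conjecture


print(goldbach_witness(100))  # (3, 97): any single even number is checkable in finite time
```

The asymmetry in the answer lives here: any one even number can be checked mechanically, but the conjecture quantifies over all of them, and no finite run of this loop settles that.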

AI Safety, Anthropomorphizing, and Action Spaces

  • There's an implicit argument about super-intelligent AI capabilities that I think needs to be stated explicitly:
    • A super-intelligent AI with access to the real world via whatever channels is going to be smarter than me. Therefore, anything I can conceive of doing to satisfy a particular objective (via those same channels), the AI can also conceive of doing. Therefore, when producing examples of how things might go bad, I'm allowed to imagine the AI doing anything a human might conceive of. Since I'm only human
…
Pattern
At a guess, AlphaGo doesn't because it isn't an agent. That just passes the buck to "why isn't it an agent?", so at a guess, it's a partial agent. What this means is, roughly, that it's a good sport: it's not going to try to spell out death threats. (Though this seems to have more to do with a) it not knowing language - imagine trying to spell out threats, on a Go board, to aliens you've never seen, when a1) you don't have a language and a2) the aliens don't know your language - and b) how it was trained: via simulation/watching pro games, depending on the version.) If you trained such a program on a database where that was a strategy, maybe you'd get something that would.

Additionally, AI has a track record of also being (what some might call) a bad sport - using "cheats" and the like. It's kind of about the action space and the training, I'd guess. Basically, if you're looking for an AI to come up with new ways of being evil, maybe it needs a head start: once a bot understands that some patterns spelled out on the board work well against a certain type of opponent*, maybe it'll try to find more patterns that do that.

Or maybe it's an "architecture" issue, not a training issue - Monte Carlo Tree Search might be well suited to winning at Go, but not to finding ways to spell out death threats on a Go board in the middle of a game. (I also don't think that's a good strategy a priori.)

*You could test how different ways of training turn out if you add a way to cheat / cheat codes - like, if you spell out "I WIN" or one swear word**, you win. (A sketch of this setup follows below.)

**I imagine trying to go all the way to threats immediately (inside the game of Go) isn't going to go very fast, so you have to start small.
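
The footnoted cheat-code experiment can be sketched concretely. Everything below is an assumption of mine, not anything from an actual Go engine: the board is a NumPy array with 1 marking the agent's stones, the "cheat" is a toy 3x3 plus sign standing in for spelled-out text, and `contains_pattern` / `shaped_result` are hypothetical names:

```python
import numpy as np

# Toy "cheat code": a plus sign spelled out in the agent's stones
# (standing in for "I WIN" or a swear word, which would just be bigger glyphs).
CHEAT = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])


def contains_pattern(board: np.ndarray, pattern: np.ndarray) -> bool:
    """True if `pattern` appears anywhere on `board` as an exact sub-grid
    (1 = agent's stone; 0 cells must be empty in this strict variant)."""
    H, W = board.shape
    h, w = pattern.shape
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            if np.array_equal(board[r:r + h, c:c + w], pattern):
                return True
    return False


def shaped_result(board: np.ndarray, normal_result: float) -> float:
    """Wrap the ordinary game outcome: spelling the cheat pattern
    counts as an instant win, per the footnoted experiment."""
    return 1.0 if contains_pattern(board, CHEAT) else normal_result


board = np.zeros((19, 19), dtype=int)
board[3:6, 3:6] = CHEAT                    # the agent spells out the glyph
assert shaped_result(board, -1.0) == 1.0   # a lost game becomes a "win"
```

The "start small" point then becomes measurable: if a learner trained against `shaped_result` reliably discovers the glyph, you can grow the glyph set toward actual text and see where discovery stops.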