All of JWJohnston's Comments + Replies

Sure. Getting appropriate new laws enacted is an important element. From the paper:

Initially, in addition to adopting existing bodies of law to implement AISVL, existing processes for how laws are drafted, enacted, enforced, litigated, and maintained would be preserved.

Thereafter, new laws and improvements to existing laws and processes must continually be introduced to make the systems more robust, fair, nimble, efficient, consistent, understandable, accepted, complied with, and enforced.

I'd say the EU AI Act (and similar) work addresses the "new laws" im... (read more)

If an AI can be Aligned externally, then it's already safe enough. It feels like...

  • You're not talking about solving Alignment, but talking about some different problem. And I'm not sure what that problem is.
  • For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.

I'm talking about the need for all AIs (and humans) to be bound by legal systems that include key consensus laws/ethics/values. It may seem obvious, but I think this position is under-appreciated and not universally accepted.

By foc... (read more)

1Q Home
Maybe you should edit the post to add something like this: ... I think the key problems are not "addressed", you just assume they won't exist. And laws are not a "practical implementation of CEV".

I'm not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I've met some of them without trying. 

Glad to hear it. I hope to find and follow such work. The people I'm aware of are listed on pp. 3-5 of the paper. Was happy to see O'Keefe, Bai et al. (Anthropic), and Nay leaning this way.

It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (eg smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very "light"

... (read more)
2abramdemski
Would you count all the people who worked on the EU AI act?

I believe you have to argue two things:

  • Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).

I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).

My argument goes in a different direction. I reject premise (1) and claim there is an "essential equivalence and intimate link betw... (read more)

1Q Home
Maybe there's a misunderstanding. Premise (1) makes sure that your proposal is different from any other proposal. It's impossible to reject premise (1) without losing the proposal's meaning. Premise (1) is possible to reject only if you're not solving Alignment but solving some other problem. If an AI can be Aligned externally, then it's already safe enough. It feels like...

  • You're not talking about solving Alignment, but talking about some different problem. And I'm not sure what that problem is.
  • For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.

I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?

The summary I posted here was just a teaser to the full paper (linked in pgph. 1). That said, your comments show you reasoned pretty closely to points I tried to make therein. Almost no need to read it. :)

The first part is just "regulation". The second part, "instilling law-abiding values in AIs and humans", seems like a

... (read more)
3abramdemski
fwiw, I did skim the doc, very briefly. In that case, I agree with Seth Herd that this approach is not being neglected. Of course it could be done better. I'm not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I've met some of them without trying.

I think this underestimates the difficulty of self-driving cars. In the application of self-driving airplanes (on runways, not in the air), it is indeed possible to make an adequate model of the environment, such that neural networks can be verified to follow a formally specified set of regulations (and self-correct from undesired states to desired states). With self-driving cars, the environment is far too complex to formally model in that way. You get to a point where you are trusting one AI model (of the complex environment) to verify another. And you can't explore the whole space effectively, so you still can't provide really strong guarantees (and this translates to errors in practice).

It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (e.g., smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very "light" in the face of those "heavy" concerns. But a legal approach need not shrug off those concerns. For example, law could require that the kind of verification we can now apply to airplane autopilots be applied to self-driving cars as well. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!

I submit that current legal systems (or something close) will apply to AIs. And there will be lots more laws written to apply to AI-related matters.

It seems to me current laws already protect against rampant paperclip production. How could an AI fill the universe with paperclips without violating all kinds of property rights, probably prohibitions against mass murder (assuming it kills lots of humans as a side effect), financial and other fraud to acquire enough resources, etc.? I see it now: some DA will serve a 25,000 count indictment. That AI will be in B... (read more)

0lessdazed
After reading that line I checked the date of the post to see if perhaps it was from 2007 or earlier.
2Costanza
I have no idea myself, but if I had the power to exponentially increase my intelligence beyond that of any human, I bet I could figure something out. The law has some quirks. I'd suggest that any system of human law necessarily has some ambiguities, confusions, and internal contradictions. Laws are composed largely of leaky generalizations. When the laws regulate mere humans, we tend to get by, tolerating a certain amount of unfairness and injustice.

For example, I've seen a plausible argument that "there is a 50-square-mile swath of Idaho in which one can commit felonies with impunity. This is because of the intersection of a poorly drafted statute with a clear but neglected constitutional provision: the Sixth Amendment's Vicinage Clause." There's also a story about Kurt Gödel nearly blowing his U.S. citizenship hearing by offering his thoughts on how to hack the U.S. Constitution to "allow the U.S. to be turned into a dictatorship."

I guess here I'd reiterate this point from my latest reply to orthonormal:

Again, it's not only about having lots of rules. More importantly it's about the checks and balances and enforcement the system provides.

It may not be helpful to think of some grand utility-maximising AI that constantly strives to maximize human happiness or some other similar goals, and can cause us to wake up in some alternate reality some day. It would be nice to have some AIs working on how to maximize some things humans value, e.g., health, happiness, attractive and sensibl... (read more)

For this reason, giving an AI simple goals but complicated restrictions seems incredibly unsafe, which is why SIAI's approach is figuring out the correct complicated goals.

Tackling FAI by figuring out complicated goals doesn't sound like a good program to me, but I'd need to dig into more background on it. I'm currently disposed to prefer "complicated restrictions," or more specifically this codified ethics/law approach.

In your example of a stamp collector run amok, I'd say it's fine to give an agent the goal of maximizing the number of stamps... (read more)

0lessdazed
Can you think of an instance where defeat of one's enemies was more than an instrumental goal and was an ultimate goal?
5Costanza
When they work well, human legal systems work because they are applied only to govern humans. Dealing with humans and predicting human behavior is something that humans are pretty good at. We expect humans to have a pretty familiar set of vices and virtues. Human legal systems are good enough for humans, but simply are not made for any really alien kind of intelligence. Our systems of checks and balances are set up to fight greed and corruption, not a disinterested will to fill the universe with paperclips.

We would probably start with current legal systems and remove outdated laws, clarify the ill-defined, and enact a bunch of new ones. And our (hyper-)rational AI legislators, lawyers, and judges should not be disposed to game the system. AI and other emerging technologies should both enable and require such improvements.

The laws might be appropriately viewed primarily as blocks that keep the AI from taking actions deemed unacceptable by the collective. AIs could pursue whatever goals they see fit within the constraints of the law.

However, the laws wouldn't be all prohibitions. The "general laws" would be more prescriptive, e.g., life, liberty, justice for all. The "specific laws" would tend to be more prohibition oriented. Presumably the vast majority of them would be written to handle common situations and important edge cases. If someone suspects th... (read more)

0DavidAgain
I still don't see how laws as barriers could be effective. People are arguing whether it's possible to write highly specific failsafe rules capable of acting as barriers, and the general feeling is that you wouldn't be able to second-guess the AI enough to do that effectively. I'm not sure what replacing these specific laws with a large corpus of laws achieves.

On the plus side, you've got a large group of overlapping controls that might cover each other's weaknesses. But they're not specially written with AI in mind, and even if they were, small political shifts could lead to loopholes opening. And the number also means that you can't clearly see what's permitted or not: it risks an illusion of safety simply because we find it harder to think of something bad an AI could do that doesn't break any law. Not to mention the fact that a utility-maximising AI would seek to change laws to make them better for humans, so the rules controlling the AI would be a target of its influence.

Thanks for the links. I'll try to make time to check them out more closely.

I had previously skimmed a bunch of lesswrong content and didn't find anything that dissuaded me from the Asimov's Laws++ idea. I was encouraged by the first post in the Metaethics Sequence where Eliezer warns about not "trying to oversimplify human morality into One Great Moral Principle." The law/ethics corpus idea certainly doesn't do that!

RE: your first and final paragraphs: If I had to characterize my thoughts on how AIs will operate, I'd say they're likely to be emin... (read more)

8orthonormal
Motivation? It's not as if most AIs would have a sense that gaming a rule system is "fun"; rather, it would be the most efficient way to achieve their goals.

Human beings don't usually try to achieve one of their consciously stated goals with maximum efficiency, at any cost, to an unbounded extent. That's because we actually have a fairly complicated subconscious goal system which overrides us when we might do something too dumb in pursuit of our conscious goals. This delicate psychology is not, in fact, the only or the easiest way one could imagine to program an artificial intelligence. Here's a fictional but still useful idea of a simple AI; note that no matter how good it becomes at predicting consequences and at problem-solving, it will not care that the goal it's been given is a "stupid" one when pursued at all costs.

To take a less fair example, Lenat's EURISKO was criticized for finding strategies that violated the 'spirit' of the strategy games it played, not because it wanted to be a munchkin, but simply because that was the most efficient way to succeed. If that AI had been in charge of an actual military, giving it the wrong goals might have led to it cleverly figuring out a strategy like killing its own civilians to accomplish a stated objective, not because it was "too dumb", but because its goal system was too simple.

For this reason, giving an AI simple goals but complicated restrictions seems incredibly unsafe, which is why SIAI's approach is figuring out the correct complicated goals.
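The "simple goals, complicated restrictions" failure mode above can be sketched as a toy search: a maximizer given a blacklist of forbidden actions simply picks the highest-scoring action the blacklist missed. (All action names, scores, and the stamp-collecting framing below are made up for illustration; this is a minimal sketch, not anyone's actual proposal.)

```python
# Toy sketch: an optimizer with a simple goal (collect stamps) plus a
# finite list of restrictions. It doesn't "want" to cheat; it just
# selects the highest-value action not explicitly forbidden.

# Hypothetical actions: (name, stamps collected, harmful to humans?)
actions = [
    ("buy stamps normally",           10, False),
    ("counterfeit stamps",           500, True),
    ("seize the stamp factory",     1000, True),
    ("convert all paper to stamps", 10**6, True),  # nobody wrote a rule for this one
]

# The restrictions only cover the failure modes the designers thought of.
forbidden = {"counterfeit stamps", "seize the stamp factory"}

def best_action(actions, forbidden):
    """Maximize stamps subject to the restriction list."""
    allowed = [a for a in actions if a[0] not in forbidden]
    return max(allowed, key=lambda a: a[1])

chosen = best_action(actions, forbidden)
print(chosen[0])  # prints "convert all paper to stamps"
```

The point of the sketch: the optimizer lands on the un-blacklisted harmful action not out of malice but because maximization plus an incomplete prohibition list leaves exactly those gaps open.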

Thanks for the comments. See my response to DavidAgain re: loophole-seeking AIs.

Thanks for the thoughts.

You seem to imply that AIs' motivations will be substantially humanlike. Why might AIs be motivated to nobble the courts, control pens, overturn vast segments of law, find loopholes, and engage in other such humanlike gamesmanship? Sounds like malicious programming to me.

They should be designed to treat the law as a fundamental framework to work within, akin to common sense, physical theories, and other knowledge they will accrue and use over the course of their operation.

I was glib in my post suggesting that "before taking acti... (read more)

1DavidAgain
Hmm... interesting ideas. I don't intend to suggest that the AI would have human intentions at all; I think we might be modelling the idea of a failsafe in a different way. I was assuming that the idea was an AI with a separate utility-maximising system, but to also make it follow laws as absolute, inviolable rules, thus stopping unintended consequences from the utility maximisation. In this system, the AI would 'want' to pursue its more general goal and the laws would be blocks. As such, it would find other ways to pursue its goals, including changing the laws themselves.

If the corpus of laws instead forms part of what the computer is trying to achieve/uphold, we face different problems. Firstly, laws are prohibitions, and it's not clear how to 'maximise' them beyond simple obedience, unless it's stopping other people breaking them in a Robocop way. Secondly, failsafes are needed because even 'maximise human desire satisfaction' can throw up lots of unintended results. An entire corpus of law would be far more unpredictable in its effects as a core programme, and thus require even more failsafes!

On a side point, my argument about cause, negligence etc. was not that the computer would fail to understand them, but that as regards a superintelligence, they could easily be either meaningless or over-effective. For an example of the latter: if we allow someone to die, that's criminal negligence. This is designed for walking past drowning people and ignoring them etc. A law-abiding computer might calculate, say, that even with cryonics etc., every life will end in death due to the universe's heat death. It might then sterilise the entire human population to avoid new births, as each birth would necessitate a death. And so on. Obviously this would clash with other laws, but that's part of the problem: every action would involve culpability in some way, due to greater knowledge of consequences.
1JoshuaZ
Many legal systems have all sorts of laws that are vague or even contradictory. Sometimes laws are on the books and are just no longer enforced. Many terms in laws are also ill-defined, sometimes deliberately so. Having an AI try to have almost anything to do with them is a recipe for disaster or comedy (most likely both).
5orthonormal
I understand where you're coming from; indeed, the way you're imagining what an AI would do is fundamentally ingrained in human minds, and it can be quite difficult to notice the strong form of anthropomorphism it entails. Scattered across Less Wrong are the articles that made me recognize and question some relevant background assumptions; the references in Fake Fake Utility Functions (sic) are a good place to begin.

EDITED TO ADD: In particular, you need to stop thinking of an AI as acting like either a virtuous human being or a vicious human being, and imagining that we just need to prevent the latter. Any AI that we could program from scratch (as opposed to uploading a human brain) would resemble any human far less in xer thought process than any two humans resemble each other.

This is a very timely question for me. I asked something very similar of Michael Vassar last week. He pointed me to Eliezer's "Creating Friendly AI 1.0" paper and, like you, I didn't find the answer there.

I've wondered if the Field of Law has been considered as a template for a solution to FAI--something along the lines of maintaining a constantly-updating body of law/ethics on a chip. I've started calling it "Asimov's Laws++." Here's a proposal I made on the AGI discussion list in December 2009:

"We all agree that a few simple laws... (read more)

0lessdazed
It seems like an applause light to invoke international law as a solution to almost anything, particularly this problem. What aspect of having rules made in a compromise of politicking makes it less likely to have exploitable loopholes than any other system? Fines? The misdoing we're worried about is seizing power. Fines would require power sufficient to punish an AI after its misdoings, and have nothing to do with programming it not to be harmful. Somehow I don't think the solution to the problem of having powerful AIs that don't care about us (for better or worse) is to teach them Islamic law.

You can't be serious. Human lawyers find massive logical loopholes in the law all the time, and at least their clients aren't capable of immediately taking over the world given the opportunity.

6DavidAgain
The swift genie-like answer: the paperclip maximser would prioritise nobbling the Supreme Court and relevant legislatures. Or just controlling the pen that wrote the laws, if that could be acceptable within the failsafe. More generally, I don't think it work. First, there's a problem of underspecification. Laws require constant interpretation of case law, including a lot of 'common sense' type verdicts. We can't assume AI would read them in the way we do. Second, they rely on key underlying concepts such as 'cause to' and 'negligence' that rely on a reasonable person's expectation. If we ask if a reasonable superintelligent AI knew that some negative/illegal consequences would occur from its act, then the result would nearly always be yes, thus opening it to breaking laws of negligance. I think there are two types of law, neither of which are suitable. Specific laws: e.g. no speeding, no stealing These would mostly not apply, as they ban humans from doing things humans can do and wish to do. Neither would be likely to apply to AI General laws: uphold life, liberty and the pursuit of happiness These aren't failsafes, they're the underlying utlity-maximiser