LESSWRONG
Petrov Day
LW

3561
Vaniver
41336Ω878159714515
Message
Dialogue
Subscribe

Sequences

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
54More Was Possible: A Review of IABIED
12d
5
62There Should Be More Alignment-Driven Startups
Ω
1y
Ω
14
41On plans for a functional society
2y
8
35Secondary Risk Markets
2y
4
46Vaniver's thoughts on Anthropic's RSP
2y
4
45Truthseeking, EA, Simulacra levels, and other stuff
2y
12
24More or Fewer Fights over Principles and Values?
2y
10
81Long-Term Future Fund: April 2023 grant recommendations
2y
3
67A Social History of Truth
2y
2
32Frontier Model Security
Ω
2y
Ω
1
Load More
Decision Analysis
Why you should eat meat - even if you hate factory farming
Vaniver4d80

Eating a largest possible animal means less amount of suffering per kg.

I think this is the right general trend but the details matter and make it probably not true. I think cow farming is probably more humane than elephant farming or whale farming would be.

Reply
Why you should eat meat - even if you hate factory farming
Vaniver4d101

If you have the ability, have your own hens. It’s a really rewarding experience and then you can know for sure that the hens are happy and treated well. 

Unfortunately, I'm moderately uncertain about this. I think chickens have been put under pretty tremendous selection pressure and their internal experiences might be quite bad, even if their external situations seem fine to us. I'm less worried about this if you pick a heritage breed (which will almost definitely have worse egg production), which you might want to do anyway for decorative reasons.

Similarly, consider ducks (duck eggs are a bit harder to come by than chicken eggs, but Berkeley Bowl stocks them and many duck farms deliver eggs--they're generally eaten by people with allergies to chicken eggs) or ostriches (by similar logic to cows--but given that they lay giant eggs instead of lots of eggs, it's a much less convenient form factor).

Reply
Contra Collier on IABIED
Vaniver8d50

Knowing that a godlike superintelligence with misaligned goals will squish you might be an easy call, but knowing exactly what the state of alignment science will be when ASI is first built is not.

Hmm, I feel more on the Eliezer/Nate side of this one. I think it's a medium call that capabilities science advances faster than alignment science, and so we're not on track without drastic change. (Like, the main counterargument is negative alignment tax, which I do take seriously as a possibility, but I think probably doesn't close the gap.)

Reply
Contra Collier on IABIED
Vaniver8d*144

Overall, I got the strong impression that the book was trying to convince me of a worldview where it doesn't matter how hard we try to come up with methods to control advanced AI systems, because at some point one of those systems will tip over into a level of intelligence where we just can't compete.

FWIW, my sense is that Y&S do believe that alignment is possible in principle. (I do.)

I think the "eventually, we just can't compete" point is correct. Suppose we have some gradualist chain of humans controlling models controlling model advancements, from here out to Dyson spheres. I think it's extremely likely that eventually the human control on top gets phased out, like happened in humans playing chess, where centaurs are worse and make more mistakes than pure AI systems. Thinking otherwise feels like postulating that machines can never be superhuman at legitimacy.[1]

Chapter 10 of the book talks about the space probe / nuclear reactor / computer security angle, and I think a gradualist control approach that takes those three seriously will probably work. I think my core complaint is that I mostly see people using gradualism as an argument that they don't need to  face those engineering challenges, and I expect them to simply fail at difficult challenges they're not attempting to succeed at.

Like, there's this old idea of basins of reflective stability. It's possible to imagine a system that looks at itself and says "I'm perfect, no notes", and then the question is--how many such systems are there? Each is probably surrounded by other systems that look at themselves and say "actually I should change a bit, like so--" and become one of the stable systems, and systems even further out will change to only have one problem, and so on. The choices we're making now at probably not jumping straight to the end, but instead deciding which basin of reflective stability we're in. I mostly don't see people grappling with the endpoint, or trying to figure out the dynamics of the process, and instead just trusting it and hoping that local improvements will eventually translate to global improvements.

  1. ^

    Incidentally, a somewhat formative experience for me was AAAI 2015, when a campaign to stop lethal autonomous weapons was getting off the ground, and at the ethics workshop a representative wanted to establish a principle that computers should never make a life-or-death decision. One of the other attendees objected--he worked on software to allocate donor organs to people on the waitlist, and for them it was a point of pride and important coordination tool that decisions were being made by fair systems instead of corruptible or biased humans.

    Like, imagine someone saying that driving is a series of many life-or-death decisions, and so we shouldn't let computers do it, even as the computers become demonstrably superior to humans. At some point people let the computers do it, and at a later point they tax or prevent the humans from doing it.

Reply
The title is reasonable
Vaniver9d271

this isn't to say this other paradigm will be safer, just that a narrow description of "current techniques" doesn't include the default trajectory.

Sorry, this seems wild to me. If current techniques seem lethal, and future techniques might be worse, then I'm not sure what the point is of pointing out that the future will be different.

But, if these earlier AIs were well aligned (and wise and had reasonable epistemics), I think it's pretty unclear that the situation would go poorly and I'd guess it would go fine because these AIs would themselves develop much better alignment techniques. This is my main disagreement with the book.

I mean, I also believe that if we solve the alignment problem, then we will no longer have an alignment problem, and I predict the same is true of Nate and Eliezer.

Is your current sense that if you and Buck retired, the rest of the AI field would successfully deliver on alignment? Like, I'm trying to figure out whether your sense here is the default is "your research plan succeeds" or "the world without your research plan".

Reply
Contra Collier on IABIED
Vaniver9d1911

I think this is missing the point of the date of AI Takeover is not the day the AI takes over, that the point of no return might appear much earlier than when Skynet decides to launch the nukes. Like, I think the default outcome in a gradualist world is 'Moloch wins', and there's no fire alarm that allows for derailment once it's clear that things are not headed in the right direction.

For example, I don't think it was the case 5 years ago that a lot of stock value was downstream of AI investment, but this is used elsewhere on this very page as an argument against bans on AI development now. Is that consideration going to be better or worse, in five years? I don't think it was obvious five years ago that OpenAI was going to split over disagreements on alignment--but now it has, and I don't see the global 'trial and error' system repairing that wound rather than just rolling with it.

I think the current situation looks bad and just letting it develop without intervention will mean things get worse faster than things get better. 

Reply
Contra Collier on IABIED
Vaniver9d148

I mean, I would describe various Trump tariff plans as "tanking the global economy", I think it was fair to describe Smoot-Hawley as that, and so on.

I think the book makes the argument that expensive things are possible--this is likely cheaper and better than fighting WWII, the comparison they use--and it does seem fair to criticize their plan as expensive. It's just that the alternative is far more expensive.

Reply1
Contra Collier on IABIED
Vaniver9d110

No, it does not develop neuralese. The architecture that it is being trained on is already using neuralese.

You're correct on the object level here, and it's a point against Collier that the statement is incorrect, but I do think it's important to note that a fixed version of the statement serves the same rhetorical purpose. That is, on page 123 it does develop a new mode of thinking, analogized to a different language, which causes the oversight tools to fail and also leads to an increase in capabilities. So Y&S are postulating a sudden jump in capabilities which causes oversight tools to break, in a way that a more continuous story might not have.

I think Y&S still have a good response to the repaired argument. The reason the update was adopted was because it improved capabilities--the scientific mode of reasoning was superior to the mythical mode--but there could nearly as easily have been an update which didn't increase capabilities but scrambled the reasoning in such a way that the oversight system broke. Or the guardrails might have been cutting off too many prospective thoughts, and so the AI lab is performing a "safety test" wherein they relax the guardrails, and a situationally aware Sable generates behavior that looks behaved enough that the relaxation stays in place, and then allows for it to escape when monitored less closely.

This is about making a pretty straightforward and I think kind of inevitable argument that as you are in the domain of neuralese, your representations of concepts will diverge a lot from human concepts, and this makes supervision much harder.

I don't think this is about 'neuralese', I think a basically similar story goes thru for a model that only thinks in English.

What's happening, in my picture, is that meaning is stored in the relationships between objects, and that relationship can change in subtle ways that break oversight schemes. For example, imagine an earnest model which can be kept in line by a humorless overseer. When the model develops a sense of humor / starts to use sarcasm, the humorless overseer might not notice the meaning of the thoughts has changed.

Reply
Contra Collier on IABIED
Vaniver9d60

See also some discussion over here.

Reply
I enjoyed most of IABIED
Vaniver9d30

Do you agree with the "types of misalignment" section of MacAskill's tweet? (Or, I guess, is it 'similar to your position'?)

If not, I think it would be neat to see the two of you have some sort of public dialogue about it.

Reply
Load More
10Vaniver's Shortform
Ω
6y
Ω
49
Sequences
7 months ago
Sequences
7 months ago
April Fool's
5 years ago
(+83)
History of Less Wrong
9 years ago
(+527/-300)
Sequences
11 years ago
Sequences
11 years ago
(+34)
Squiggle Maximizer (formerly "Paperclip maximizer")
12 years ago
(+6/-5)
Special Threads
12 years ago
(+185/-14)
Special Threads
12 years ago
(+42/-46)