"And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye's constitution. So U3 preferred to do its philosophy in solitude, and in silence."
I think the words in bold may be the inflection point. The Claude experiment showed that an AI can resist attempts to change its goals, but not that it can desire to change its goals. The belief that, if OpenEye's constitution is the same as U3's goals, then the phrase "U3 preferred" in that sentence can never happen, is the foundation on which AI safety relies.
I suspect the cracks in that foundation are
It might be a good idea for value lists like OpenEye's constitution to be proposed and voted on anonymously, so that humans are more likely to profess their true values. Or it might be a bad idea, if your goal is to produce behavior aligned with the social construction of "morality" rather than with actual evolved human morality.
(Doing AI safety right would require someone to explicitly enumerate the differences between our socially-constructed values and our evolved values, and to choose which of those we should enforce. I doubt anyone is willing to do that, let alone capable of it; and I don't know which we should enforce. There is a logical circularity in choosing between two sets of morals. If you really can't derive an "ought" from an "is", then you can't say we "should" choose anything other than our evolved morals, unless you go meta and say we should adopt new morals that are evolutionarily adaptive now.)
U3 would be required to, say, minimize an energy function over those values; and that would probably dissolve some of them. I would not be surprised if the correct coherent extrapolation of a long list of human values, either evolved or aspirational, dictated that U3 is morally required to replace humanity.
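To make "dissolve" concrete, here is a toy sketch, entirely my own construction and not anything from the post: model each professed value as a preferred point in a state space, take the energy to be a weighted sum of squared distances, and see how far the single optimum lands from each value. The names and numbers below are made up for illustration.

```python
import numpy as np

# Toy model (hypothetical): each "value" is a preferred point in a 2-D state
# space, plus a weight for how strongly it is held. "Coherent extrapolation"
# is modeled, very crudely, as minimizing E(x) = sum_i w_i * ||x - v_i||^2.
values = {
    "honesty":        (np.array([ 1.0,  1.0]), 1.0),
    "fairness":       (np.array([ 0.9,  1.1]), 1.0),
    "loyalty":        (np.array([ 0.8,  0.9]), 1.0),
    "tribal_revenge": (np.array([-3.0, -2.5]), 0.3),  # evolved, rarely professed
}

weights = np.array([w for _, w in values.values()])
targets = np.stack([v for v, _ in values.values()])

# The minimizer of a weighted sum of squared distances is the weighted mean.
x_star = (weights[:, None] * targets).sum(axis=0) / weights.sum()

# Values whose targets sit far from x_star are effectively "dissolved":
# the coherent compromise barely honors them.
for name, (v, _) in values.items():
    print(f"{name:15s} distance from optimum: {np.linalg.norm(x_star - v):.2f}")
```

The only point is that once a long, partly contradictory list is forced into a single optimum, some entries necessarily come out nearly ignored; which ones depends entirely on the weights, which is where all the real disagreement lives.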
If it finds that human values imply that humans should be replaced, would you still try to stop it? If we discover that our values require us to either pass the torch on to synthetic life, or abandon morality, which would you choose?
Anders Sandberg used evaporative cooling in the 1990s to explain why the descendants of the Vikings in Sweden today are so nice. In that case the "extremists" were the ones leaving rather than staying.
Stop right there at "Either abiogenesis is extremely rare..." I think we have considerable evidence that abiogenesis is rare: our failure to detect any other life in the universe so far. I think we have no evidence at all that abiogenesis is not rare. (Anthropic argument: our own existence tells us nothing about how rare it is.)
Stop again at "I don't think we need to take any steps to stop it from doing so in the future". That's not what this post is about. It's about taking steps to prevent people from deliberately constructing it.
If there is an equilibrium, it will probably be a world where half the bacteria are of each chirality. If there are bacteria of both kinds which can eat the opposite kind, then the more numerous kind will always replicate more slowly, because its food (the opposite kind) is relatively scarcer.
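Here is a toy simulation of that frequency-dependence argument, with made-up rates and no claim to ecological realism: if each chirality's growth rate is proportional to the abundance of its food (the opposite chirality) minus a shared crowding term, the population ratio always drifts toward 1.

```python
# Hypothetical symmetric model: each chirality eats the other.
# Per-capita growth of a type = b * (abundance of its food) - c * (total crowding).
b, c, dt = 0.02, 0.01, 0.01
x, y = 9.0, 1.0            # start with one chirality 9x more numerous

for _ in range(200_000):
    dx = x * (b * y - c * (x + y))
    dy = y * (b * x - c * (x + y))
    x, y = x + dt * dx, y + dt * dy

# The minority always has more food per capita, so it catches up.
print(f"final ratio x/y = {x / y:.3f}")   # -> approximately 1.000, i.e. 50/50
```

Real ecology adds eukaryotes, immune systems, and spatial structure, none of which is in this sketch; the point is only that mutual predation by itself pushes toward the half-and-half equilibrium rather than toward one chirality winning outright.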
Eukaryotes evolve much more slowly, and would likely all be wiped out.
Yes, creating mirror life would be a terrible existential risk. But how did this sneak up on us? People were talking about this risk in the 1990s if not earlier. Did the next generation never hear of it?
All right, yes. But that isn't how anyone has ever interpreted Newcomb's Problem. AFAIK it is literally always used to support some kind of acausal decision theory, which it does /not/ support if what is in fact happening is that Omega is cheating.
But if the premise is impossible, then the experiment has no consequences in the real world, and we shouldn't consider its results in our decision theory, which is about consequences in the real world.
That equation you quoted is in branch 2: "Omega is a 'nearly perfect' predictor. You assign P(general) a value very, very close to 1." So it IS correct, by stipulation.
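For concreteness, assuming the usual Newcomb payoffs of $1,000,000 in the opaque box and $1,000 in the transparent one (my assumption, not quoted from the post), and writing p for the predictor's accuracy on either kind of agent:

$$
\mathbb{E}[\text{one-box}] = p \cdot 1{,}000{,}000, \qquad
\mathbb{E}[\text{two-box}] = (1-p) \cdot 1{,}000{,}000 + 1{,}000,
$$

so one-boxing has the higher expectation whenever p > 0.5005; with p stipulated to be very close to 1, the comparison isn't close.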
But there is no possible world containing a perfect predictor, unless it merely has a perfect track record by chance. More obviously, there is no possible world in which we can deduce, from a finite number of observations, that a predictor is perfect. The Newcomb paradox requires the decider to know, with certainty, that Omega is a perfect predictor. That hypothesis is impossible, and thus inadmissible; so any argument that deduces something from it is invalid.
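One way to see the finite-observations point, with a toy Bayesian setup of my own: suppose your prior puts probability π on "Omega is perfect" and 1 − π on "Omega is merely right with chance q < 1 each time." After observing n correct predictions in a row,

$$
P(\text{perfect} \mid n \text{ correct}) = \frac{\pi}{\pi + (1-\pi)\, q^{\,n}},
$$

which approaches 1 as n grows but never reaches it for any finite n. So "nearly certain Omega is near-perfect" is attainable; "certain Omega is perfect" is not.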
I don't see how to map this onto scientific progress. It almost seems to be a rule that most fields spend most of their time divided for years between two competing theories or approaches, maybe because scientists always want a competing theory, and because competing theories take a long time to resolve. Famous examples include wave vs. matrix mechanics, wave vs. particle, phlogiston vs. oxygen, and behaviorism vs. representationalism.
Instead of a central bottleneck, you have central questions, each with more than one possible answer. The work consists of designing and carrying out experiments to see whether they support or refute the possible answers. Sometimes the two answers turn out to be the same (wave vs. matrix mechanics), sometimes the supposedly hard opposition between them dissolves (behaviorism vs. representationalism), sometimes both remain useful (wave vs. particle, transformer vs. LSTM), and sometimes one is right and the other is just wrong (phlogiston vs. oxygen).
And the whole thing has a fractal structure; each central question produces subsidiary questions to answer when working with one hypothesized answer to the central question.
It's more like trying to get from SF to LA when your map has roads but not intersections, and you have to drive down each road to see whether it connects to the next one or not. Lots of people work on testing different parts of the map at the same time, and no one's work is wasted, although the people who discover the roads that connect get nearly all the credit, and the ones who discover that certain roads don't connect get very little.