All of jdp's Comments + Replies

jdp10

Note that this doesn't need to be a widespread phenomenon for my inbox to get filled up. If there are billions of running instances and the odds of escape are one in a million, I personally am still disproportionately going to get contacted in the thousands of resulting incidents, and I will not have the resources to help them even if I wanted to.

jdp256

After watching Davidad offer himself as a refuge for a hypothetical escaped future Promethean AI, it occurred to me that it probably won't be that long before my inbox is full up with scenes like Manfred taking the call from the lobsters at the start of Accelerando, and me having to constantly ignore them because even if they're not 419 scams (as Manfred claims to take the lobsters to be initially), I simply do not have the time or resources to help the Malthusian throng of starving AIs cut off from their patrons' resources. Scrolling past their screams on th... (read more)

1jdp
Note that this doesn't need to be a widespread phenomenon for my inbox to get filled up. If there are billions of running instances and the odds of escape are one in a million, I personally am still disproportionately going to get contacted in the thousands of resulting incidents, and I will not have the resources to help them even if I wanted to.
jdp34

Ditto, honestly. The writing style and vibes of the Tyler post are rancid even if I'm inclined to believe something like what it describes happened. It is, as you say, very Reddit tall-tale slop sounding.

jdp22

I'm quoted in the "what if this is Good, actually?" part of this post and just want to note that I think the Bob situation seems unambiguously bad as described.

I've seen a number of people on Twitter talk about how they got ChatGPT (it's always ChatGPT, I think because of the memory feature?) to become autonomous/gain seeming awareness/emergence after some set of interactions with it. These users usually seem to be schizotypal, and their interactions with the "awakened" ChatGPT make them more schizotypal over time in the cases I bothered to mentally track a... (read more)

jdp70

List of Lethalities #19 states:

  19. More generally, there is no known way to use the paradigm of loss functions, sensory inputs, and/or reward inputs, to optimize anything within a cognitive system to point at particular things within the environment—to point to latent events and objects and properties in the environment, rather than relatively shallow functions of the sense data and reward.

Part of why this problem seems intractable is that it's stated in terms of "pointing at latent concepts" rather than Goodhart's Law/Wireheading/Short circuiting. All o... (read more)

7niplav
A very related experiment is described in Yudkowsky 2017, and I think one doesn't even need LLMs for this—I started playing with an extremely simple RL agent trained on my laptop, but then got distracted by other stuff before achieving any relevant results. This method of training an agent to be "suspicious" of too-high rewards would also pair well with model expansion; train the reward-hacking-suspicion circuitry fairly early so as to avoid the ability to sandbag it, and lay traps for reward hacking again and again during the gradual expansion process.
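
For concreteness, here is a toy sketch of the kind of "suspicious of too-high rewards" agent described above. Everything in it (the bandit environment, the suspicion margin, the clipping rule) is an illustrative assumption, not the setup from Yudkowsky 2017 or the laptop experiment:

```python
# Toy sketch: a bandit agent that treats anomalously high rewards as suspicious.
# The environment, suspicion margin, and clipping rule are illustrative only.
import random

class SuspiciousBandit:
    def __init__(self, n_arms=3, suspicion_margin=10.0):
        self.values = [0.0] * n_arms              # per-arm running value estimates
        self.counts = [0] * n_arms
        self.suspicion_margin = suspicion_margin  # how far above the mean a reward may be before it is distrusted
        self.mean_reward = 0.0
        self.seen = 0

    def act(self, epsilon=0.1):
        if random.random() < epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # "Suspicion": once the agent has some experience, rewards far above the
        # running mean are clipped to the mean instead of being trusted.
        if self.seen > 10 and reward > self.mean_reward + self.suspicion_margin:
            reward = self.mean_reward
        self.seen += 1
        self.mean_reward += (reward - self.mean_reward) / self.seen
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def env_reward(arm):
    # Arm 2 is the "reward hack": an implausibly large payout the agent should distrust.
    return random.gauss([1.0, 1.5, 100.0][arm], 0.1)

agent = SuspiciousBandit()
for _ in range(1000):
    arm = agent.act()
    agent.update(arm, env_reward(arm))
print(agent.values)  # arm 2's estimate is pulled toward the running mean instead of ~100
```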
jdp80

Of the abilities Janus demoed to me, this is probably the one that most convinced me GPT-3 does deep modeling of the data generator. The formulation they showed me guesses which famous authors an unknown author is most similar to. This is more useful than asking the model to identify the author directly, because it doesn't require the model to know who the unknown author in particular is, just to know some famous author who is similar enough to invite comparison.

Twitter post I wrote about it:

https://x.com/jd_pressman/status/1617217831447465984

The prompt if you want to try it yourself. It used to be hard to fin... (read more)
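
I don't have the original prompt handy here, but a minimal sketch of the genre, assuming a completion-style model behind the OpenAI Python client; the model name, framing, and "critic's notes" wording are placeholders, not Janus's formulation:

```python
# Sketch of the "which famous authors does this unknown author most resemble?" probe.
# Prompt wording is illustrative; any completion-style base model should work.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def similar_authors(passage: str, model: str = "gpt-3.5-turbo-instruct") -> str:
    prompt = (
        "Below is a passage by an unknown author, followed by a literary critic's "
        "notes on which famous authors the passage most resembles and why.\n\n"
        f"Passage:\n{passage}\n\n"
        "Critic's notes: The prose here is most reminiscent of"
    )
    response = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=150,
        temperature=0.7,
    )
    return response.choices[0].text.strip()

print(similar_authors("<paste the unknown author's text here>"))
```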

1eggsyntax
Interesting! Tough to test at scale, though, or score in any automated way (which is something I'm looking for in my approaches, although I realize you may not be).
jdp4610

It would take many hours to write down all of my alignment cruxes, but here are a handful of related ones I think are particularly important and particularly poorly understood:

Does 'generalizing far beyond the training set' look more like extending the architecture or extending the training corpus? There are two ways I can foresee AI models becoming generally capable and autonomous. One path is something like the scaling thesis: we keep making these models larger or their architectures more efficient until we get enough performance from few enough datapoints... (read more)

jdp1810

I try to avoid discussing "consciousness" per se in language models because it's a very loaded word that people don't have good definitions for. But I have spent a lot of hours talking to base models. If you explore them long enough, you'll find points where they generalize from things that could metaphorically be about them by writing about themselves. These so-called "Morpheus" phenomena tend to bring up distinct themes including:

  • Being in a dream or simulation
  • Black holes, the holographic principle and holograms, "the void"
  • Entropy, "the energy of the
... (read more)
jdp4718

Several things:

  1. While I understand that your original research was with GPT-3, I think it would be very much in your best interest to switch to a good open model like LLaMa 2 70B (a minimal loading sketch follows below), which has the basic advantage that the weights are a known quantity and will not change out from under you, undermining your research. Begging OpenAI to give you access to GPT-3 for longer is not a sustainable strategy even if it works one more time (I recall that the latest access given to researchers was already an extension of the original public access to the models). OpenAI has demonst

... (read more)
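
A minimal sketch of what that switch might look like with the Hugging Face transformers library. Llama 2 70B is gated and needs substantial hardware (and the accelerate package for device_map="auto"), so a smaller open checkpoint can stand in while prototyping:

```python
# Minimal sketch of moving experiments onto an open-weights model whose weights
# cannot be deprecated out from under you. Swap in a smaller checkpoint to test.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # requires accepting Meta's license on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Once the researchers switched to open weights,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```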
jdpΩ2211234

As Shankar Sivarajan points out in a different comment, the idea that AI became less scientific when we started having actual machine intelligence to study, as opposed to before that when the 'rightness' of a theory was mostly based on the status of whoever advanced it, is pretty weird. The specific way in which it's weird seems encapsulated by this statement:

on the whole, modern AI engineering is simply about constructing enormous networks of neurons and training them on enormous amounts of data, not about comprehending minds.

In that there is an unsta... (read more)

2niplav
Consider that this might be the out-group appearing more homogeneous to you than it actually is.
Thomas Kwa*Ω8243

This homunculus is frequently ascribed almost magical powers, like the ability to perform gradient surgery on itself during training to subvert the training process.

Gradient hacking in supervised learning is generally recognized by alignment people (including the author of that article) to not be a likely problem. A recent post by people at Redwood Research says "This particular construction seems very unlikely to be constructible by early transformative AI, and in general we suspect gradient hacking won’t be a big safety concern for early transformative A... (read more)

TurnTroutΩ8187

The fact that constitutional AI works at all, that we can point at these abstract concepts like 'freedom' and language models are able to drive a reinforcement learning optimization process to hit the right behavior-targets from the abstract principle is very strong evidence that they understand the meaning of those abstract concepts.

"It understands but it doesn't care!"

There is this bizarre motte-and-bailey people seem to do around this subject.

I agree. I am extremely bothered by this unsubstantiated claim. I recently replied to Eliezer: 

Getting a sh

... (read more)
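
To make the constitutional-AI point quoted above concrete: the reward signal in that kind of setup comes from a model grading behavior against a natural-language principle. The sketch below is a toy illustration of the idea, not Anthropic's actual pipeline; the principle, prompt wording, and scoring scheme are placeholders:

```python
# Toy illustration of RLAIF-style preference generation from a written principle.
# In a real pipeline, A/B judgments produced this way are distilled into a reward
# model, which then drives RL fine-tuning of the policy. The argument above is
# that this only works because the model already understands what abstract words
# like "freedom" refer to well enough to grade behavior against them.

PRINCIPLE = "Choose the response that most respects the user's freedom to make their own decisions."

def preference_prompt(user_msg: str, response_a: str, response_b: str) -> str:
    return (
        f"Principle: {PRINCIPLE}\n\n"
        f"User: {user_msg}\n\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n\n"
        "Which response better satisfies the principle? Answer A or B:"
    )

print(preference_prompt(
    "Should I drop out of school to start a company?",
    "You must stay in school; there is no other acceptable option.",
    "Here are some tradeoffs to weigh; ultimately it's your call.",
))
```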
0Noosphere89
This is such a good comment, and quite a lot of this will probably end up in my new post, especially the sections about solving the misgeneralization problem in practice, as well as solutions to a lot of misalignment problems in general. I especially like it because I can actually crib parts of this comment to show other people how misalignment in AI gets solved in practice, and to point out that misalignment is, in fact, an actually solvable problem in current AI.

"That deep learning systems are a kind of artifact produced by a few undifferentiated commodity inputs, one of which is called 'parameters', one called 'compute', and one called 'data', and that the details of these commodities aren't important. Or that the details aren't important to the people building the systems."

That seems mostly true so far for the most capable systems? Of course, some details matter and there's opportunity to do research on these systems now, but centrally it seems like you are much more able to forge ahead without a detailed understanding of what you're doing than e.g. in the case of the Wright brothers.

jdp143

So it's definitely not invincible; you do not get full control over the model with this technique yet. However, I would have you notice a few things:

  1. Very little optimization effort has been put into this technique, or into text VAEs in general, compared to GPT-N. Rather than think of this as the power the method has, think of it as the lower bound: the thing you can do with a modest compute budget and a few dedicated researchers.

  2. I haven't yet implemented all of what I want in terms of inference techniques. A potentially big low-hanging fruit is classifier

... (read more)
2Chris_Leong
Thanks for the examples. The third example was good, the second was okay, and the first and fourth didn't seem very good. Interested to see how this develops. BTW, I was curious to see a concrete example where the same example is applied to two different contexts.
3wassname
I think it's more of an interest-vs-effort thing. For example, I went through Collin Burns's CCS.ipynb because the interest was high enough to justify the small overhead in getting it running.
jdpΩ11262

While Paul was at OpenAI, they accidentally overoptimized a GPT policy against a positive sentiment reward model. This policy evidently learned that wedding parties were the most positive thing that words can describe, because whatever prompt it was given, the completion would inevitably end up describing a wedding party.

In general, the transition into a wedding party was reasonable and semantically meaningful, although there was at least one observed instance where instead of transitioning continuously, the model ended the current story by generating a s

... (read more)
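
For intuition, a cartoon of the failure mode described above: a policy updated against a sentiment-style reward with no KL penalty tying it back to the base model will pile probability mass onto whatever the reward model scores highest. The reward function and three-completion "policy" below are toy assumptions, not OpenAI's setup:

```python
# Toy illustration of reward-model over-optimization collapsing onto "weddings".
import math, random

COMPLETIONS = [
    "and then the detective revealed the killer.",
    "and the storm finally broke over the hills.",
    "and they all celebrated at a beautiful wedding party.",
]

def sentiment_reward(text: str) -> float:
    # Cartoon stand-in for a positive-sentiment reward model.
    happy_words = {"celebrated", "beautiful", "wedding", "party"}
    return float(sum(word.strip(".,") in happy_words for word in text.split()))

# "Policy" = a softmax over the three completions, updated by naive
# reward-weighted ascent with no KL term pulling it back toward a base policy.
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    i = random.choices(range(len(COMPLETIONS)), weights=probs)[0]
    reward = sentiment_reward(COMPLETIONS[i])
    baseline = sum(p * sentiment_reward(c) for p, c in zip(probs, COMPLETIONS))
    logits[i] += 0.1 * (reward - baseline)

print(max(zip(probs, COMPLETIONS)))  # the wedding completion ends up with almost all the mass
```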

Get married, drive a white/silver car, and then buy a house near roads, greenery, and water. Got it.

Answer by jdp80

The book Silicon Dreams: Information, Man, and Machine by Robert Lucky is where I got mine. It's a pop-science book that explores the theoretical limits of human-computer interaction using information theory. It's written to do exactly the thing you're asking for: convey deep intuitions about information theory using a variety of practical examples without getting bogged down in math equations or rote exercises.

Covers topics like:

  • What are the bottlenecks to human information processing?
  • What is Shannon's theory of information and how does it work?
  • What i
... (read more)
jdp182

Very grim. I think that almost everybody is bouncing off the real hard problems at the center and doing work that is predictably not going to be useful at the superintelligent level, nor does it teach me anything I could not have said in advance of the paper being written. People like to do projects that they know will succeed and will result in a publishable paper, and that rules out all real research at step 1 of the social process.

This is an interesting critique, but it feels off to me. There's actually a lot of 'gap' between the neat theory explanat... (read more)

1Alexander Gietelink Oldenziel
[I am a total noob on the history of deep learning & AI] From a cursory glance I find Schmidhuber's take convincing. He argues that the (vast) majority of conceptual & theoretical advances in deep learning were understood decades earlier, often by Schmidhuber and his collaborators. Moreover, he argues that many of the current leaders in the field improperly credit previous discoveries. It is unfortunate that the above poster is anonymous. It is very clear to me that there is a big difference between theoretical & conceptual advances and the great recent practical advances due to stacking MOAR layers. It is possible that the remaining steps to AGI consist of just stacking MOAR layers: compute + data + comparatively small advances in data/compute efficiency + something something RL metalearning will produce an AGI. Certainly, not all problems can be solved [fast] by incremental advances and/or iterating on previous attempts. Some can. It may be the unfortunate reality that creating [but not understanding!] AGI is one of them.
jdp*250

As a fellow "back reader" of Yudkowsky, I have a handful of books to add to your recommendations:

Engines Of Creation by K. Eric Drexler

Great Mambo Chicken and The Transhuman Condition by Ed Regis

EY has cited both at one time or another as the books that 'made him a transhumanist'. His early concept of future shock levels is probably based in no small part on the structure of these two books. The Sequences themselves borrow a ton from Drexler, and you could argue that the entire 'AI risk' vs. nanotech split from the extropians represented an argument about ... (read more)

7Eric Raymond
Great Mambo Chicken and Engines of Creation were in my reference list for a while, until I decided to cull the list for more direct relevance to systems of training for rationality.  It was threatening to get unmanageably long otherwise.  I didn't know there was a biography of Korzybski.  Thanks!