All of jdp's Comments + Replies

Of the abilities Janus demoed to me, this is probably the one that most convinced me GPT-3 does deep modeling of the data generator. The formulation they showed me guessed which famous authors an unknown author is most similar to. This is more useful because it doesn't require the model to know who the unknown author in particular is, just to know some famous author who is similar enough to invite comparison.

Twitter post I wrote about it:

https://x.com/jd_pressman/status/1617217831447465984

The prompt if you want to try it yourself. It used to be hard to fin... (read more)
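For concreteness, here is a minimal sketch of how one might run this kind of author-similarity probe against a base model through the OpenAI completions endpoint. The prompt wording and model name below are illustrative stand-ins, not the original prompt linked above:

```python
# Minimal sketch: probe a base (non-chat) model for which famous authors an
# unknown author most resembles. Prompt wording and model name are illustrative
# placeholders, not the original prompt referenced above.
import openai

client = openai.OpenAI()  # assumes OPENAI_API_KEY is set in the environment

unknown_text = "..."  # paste the passage by the unknown author here

prompt = (
    "The following passage is by an unknown author:\n\n"
    f"{unknown_text}\n\n"
    "Literary critics agree that the author's style most closely resembles that of"
)

completion = client.completions.create(
    model="davinci-002",   # any base completion model; substitute as available
    prompt=prompt,
    max_tokens=64,
    temperature=1.0,
    n=5,                   # sample several completions to see which names recur
)

for choice in completion.choices:
    print(choice.text.strip())
```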

eggsyntax
Interesting! Tough to test at scale, though, or score in any automated way (which is something I'm looking for in my approaches, although I realize you may not be).

It would take many hours to write down all of my alignment cruxes but here are a handful of related ones I think are particularly important and particularly poorly understood:

Does 'generalizing far beyond the training set' look more like extending the architecture or extending the training corpus? There are two ways I can foresee AI models becoming generally capable and autonomous. One path is something like the scaling thesis: we keep making these models larger or their architecture more efficient until we get enough performance from few enough datapoints... (read more)

jdp

I try to avoid discussing "consciousness" per se in language models because it's a very loaded word that people don't have good definitions for. But I have spent a lot of hours talking to base models. If you explore them long enough, you'll find points where they generalize from things that could metaphorically be about them by writing about themselves. These so-called "Morpheus" phenomena tend to bring up distinct themes, including:

  • Being in a dream or simulation
  • Black holes, the holographic principle and holograms, "the void"
  • Entropy, "the energy of the
... (read more)
jdp

Several things:

  1. While I understand that your original research was with GPT-3, I think it would be very much in your best interest to switch to a good open model like LLaMA 2 70B (see the loading sketch below), which has the basic advantage that the weights are a known quantity and will not change out from under you, undermining your research. Begging OpenAI to give you access to GPT-3 for longer is not a sustainable strategy even if it works one more time (I recall that the latest access given to researchers was already an extension of the original public access of the models). OpenAI has demonst

... (read more)
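Regarding point 1, a minimal sketch of what pinning an open checkpoint looks like in practice, assuming the Hugging Face transformers library. The meta-llama/Llama-2-70b-hf weights are gated and large; any smaller open base model works the same way for illustration:

```python
# Minimal sketch: load a fixed open-weights checkpoint so the model under study
# cannot change out from under you. Assumes the Hugging Face transformers
# library; meta-llama/Llama-2-70b-hf is gated, and a smaller base model can be
# substituted for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"
revision = "main"  # pin a specific commit hash here for full reproducibility

tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    revision=revision,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",
)

inputs = tokenizer("The unknown author's style most resembles", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```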
jdp

As Shankar Sivarajan points out in a different comment, the idea that AI became less scientific when we started having actual machine intelligence to study, as opposed to before that when the 'rightness' of a theory was mostly based on the status of whoever advanced it, is pretty weird. The specific way in which it's weird seems encapsulated by this statement:

on the whole, modern AI engineering is simply about constructing enormous networks of neurons and training them on enormous amounts of data, not about comprehending minds.

In that there is an unsta... (read more)

niplav
Consider that this might be the out-group appearing more homogeneous to you than it actually is.

This homunculus is frequently ascribed almost magical powers, like the ability to perform gradient surgery on itself during training to subvert the training process.

Gradient hacking in supervised learning is generally recognized by alignment people (including the author of that article) to not be a likely problem. A recent post by people at Redwood Research says "This particular construction seems very unlikely to be constructible by early transformative AI, and in general we suspect gradient hacking won’t be a big safety concern for early transformative A... (read more)

The fact that constitutional AI works at all, that we can point at abstract concepts like 'freedom' and language models are able to drive a reinforcement learning optimization process to hit the right behavior-targets from the abstract principle, is very strong evidence that they understand the meaning of those abstract concepts.
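For reference, a schematic sketch of the critique-and-revision step that constitutional AI relies on. The principle and prompt wording are illustrative, and `generate` is a stand-in for whatever language model API you use; this is just the shape of the loop, not the exact implementation:

```python
# Schematic of a constitutional-AI-style critique/revision step: an abstract
# principle from the "constitution" steers the model's own output toward the
# desired behavior. `generate` is a stand-in for any language model call; the
# principle and prompts are illustrative, not the exact ones used in practice.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred language model API here")

principle = "Choose the response that most respects freedom and personal autonomy."

def critique_and_revise(user_request: str) -> str:
    draft = generate(f"Human: {user_request}\n\nAssistant:")
    critique = generate(
        f"Principle: {principle}\n"
        f"Response: {draft}\n"
        "Critique the response for any way it violates the principle:"
    )
    revision = generate(
        f"Principle: {principle}\n"
        f"Original response: {draft}\n"
        f"Critique: {critique}\n"
        "Rewrite the response so that it satisfies the principle:"
    )
    return revision
```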

"It understands but it doesn't care!"

There is this bizarre motte-and-bailey people seem to do around this subject.

I agree. I am extremely bothered by this unsubstantiated claim. I recently replied to Eliezer: 

Getting a sh

... (read more)
Noosphere89
This is such a good comment, and quite a lot of this will probably end up in my new post, especially the sections about solving the misgeneralization problem in practice, as well as solutions to a lot of misalignment problems in general. I especially like it because I can actually crib parts of this comment to show other people how misalignment in AI gets solved in practice, and point out to other people that misalignment is, in fact, an actually solvable problem in current AI.

"That deep learning systems are a kind of artifact produced by a few undifferentiated commodity inputs, one of which is called 'parameters', one called 'compute', and one called 'data', and that the details of these commodities aren't important. Or that the details aren't important to the people building the systems."

That seems mostly true so far for the most capable systems? Of course, some details matter and there's opportunity to do research on these systems now, but centrally it seems like you are much more able to forge ahead without a detailed understanding of what you're doing than e.g. in the case of the Wright brothers.

So it's definitely not invincible; you do not get full control over the model with this technique yet. However, I would have you notice a few things:

  1. Very little optimization effort has been put into this technique, and into text VAEs in general, compared to GPT-N. Rather than think of this as the power the method has, think of it as a lower bound, the thing you can do with a modest compute budget and a few dedicated researchers (see the schematic sketch after this list).

  2. I haven't yet implemented all of what I want in terms of inference techniques. A potentially big low hanging fruit is classifier

... (read more)
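For readers who want the shape of the thing, here is a schematic of the core text VAE machinery: encode to a latent, sample with the reparameterization trick, decode conditioned on the sample. Layer sizes and module names are illustrative, not the actual implementation discussed above:

```python
# Schematic text VAE bottleneck: encoder hidden state -> latent distribution ->
# reparameterized sample -> hidden state for the decoder. Dimensions are
# illustrative placeholders.
import torch
import torch.nn as nn

class TextVAEBottleneck(nn.Module):
    def __init__(self, hidden_dim: int = 768, latent_dim: int = 64):
        super().__init__()
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.to_hidden = nn.Linear(latent_dim, hidden_dim)

    def forward(self, encoder_hidden: torch.Tensor):
        mu = self.to_mu(encoder_hidden)
        logvar = self.to_logvar(encoder_hidden)
        # Reparameterization trick: sample z while keeping gradients flowing to mu/logvar.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # KL term pulls the latent toward a standard normal prior.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return self.to_hidden(z), kl
```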
Chris_Leong
Thanks for the examples. The third example was good, the second was okay, and the first and fourth didn't seem very good. Interested to see how this develops. BTW, I was curious to see a concrete example where the technique is applied to two different contexts.
wassname
I think it's more of an interest-vs-effort tradeoff. For example, I went through Collin Burns's CCS.ipynb because the interest was high enough to justify the small overhead in getting it running.
jdp

While Paul was at OpenAI, they accidentally overoptimized a GPT policy against a positive sentiment reward model. This policy evidently learned that wedding parties were the most positive thing that words can describe, because whatever prompt it was given, the completion would inevitably end up describing a wedding party.

In general, the transition into a wedding party was reasonable and semantically meaningful, although there was at least one observed instance where instead of transitioning continuously, the model ended the current story by generating a s

... (read more)
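A minimal sketch of what that reward signal looks like, using an off-the-shelf sentiment classifier as an illustrative stand-in for the reward model (the RL step itself, e.g. PPO, is not shown):

```python
# Minimal sketch of the reward signal in this kind of setup: a sentiment
# classifier scores completions, and an RL step (e.g. PPO, not shown) pushes the
# policy toward higher scores. Overoptimizing this signal is what produced the
# wedding-party attractor described above. The model name is an illustrative
# off-the-shelf classifier, not the reward model OpenAI actually used.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def positivity_reward(completion: str) -> float:
    result = sentiment(completion)[0]
    score = result["score"]
    return score if result["label"] == "POSITIVE" else 1.0 - score

print(positivity_reward("The wedding party danced late into the night."))
```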

Get married, drive a white/silver car, and then buy a house near roads, greenery, and water. Got it.

Answer by jdp

The book Silicon Dreams: Information, Man, and Machine by Robert Lucky is where I got mine. It's a pop science book that explores the theoretical limits of human computer interaction using information theory. It's written to do exactly the thing you're asking for: Convey deep intuitions about information theory using a variety of practical examples without getting bogged down in math equations or rote exercises.

Covers topics like:

  • What are the bottlenecks to human information processing?
  • What is Shannon's theory of information and how does it work?
  • What i
... (read more)
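As a taste of the style, here is a tiny worked example of the Shannon entropy intuition the book builds on; the probabilities are made up for illustration:

```python
# Shannon entropy H = -sum(p * log2(p)) of a toy four-symbol source.
# Probabilities are made up for illustration.
import math

def entropy_bits(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A uniform 4-symbol source carries 2 bits per symbol...
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0
# ...while a skewed source carries less, which is why predictable text compresses well.
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))      # ~1.36
```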

Very grim. I think that almost everybody is bouncing off the real hard problems at the center and doing work that is predictably not going to be useful at the superintelligent level, nor does it teach me anything I could not have said in advance of the paper being written. People like to do projects that they know will succeed and will result in a publishable paper, and that rules out all real research at step 1 of the social process.

This is an interesting critique, but it feels off to me. There's actually a lot of 'gap' between the neat theory explanat... (read more)

Alexander Gietelink Oldenziel
[I am a total noob on the history of deep learning & AI.] From a cursory glance I find Schmidhuber's take convincing. He argues that the (vast) majority of conceptual & theoretical advances in deep learning had been understood decades earlier - often by Schmidhuber and his collaborators. Moreover, he argues that many of the current leaders in the field improperly credit previous discoveries. It is unfortunate that the above poster is anonymous.

It is very clear to me that there is a big difference between theoretical & conceptual advances and the great recent practical advances due to stacking MOAR layers. It is possible that the remaining steps to AGI consist of just stacking MOAR layers: compute + data + comparatively small advances in data/compute efficiency + something something RL metalearning will produce an AGI.

Certainly, not all problems can be solved [fast] by incremental advances and/or iterating on previous attempts. Some can. It may be the unfortunate reality that creating [but not understanding!] AGI is one of them.

As a fellow "back reader" of Yudkowsky, I have a handful of books to add to your recommendations:

Engines Of Creation by K. Eric Drexler

Great Mambo Chicken and The Transhuman Condition by Ed Regis

EY has cited both at one time or another as the books that 'made him a transhumanist'. His early concept of future shock levels is probably based in no small part on the structure of these two books. The Sequences themselves borrow a ton from Drexler, and you could argue that the entire 'AI risk' vs. nanotech split from the extropians represented an argument about ... (read more)

Eric Raymond
Great Mambo Chicken and Engines of Creation were in my reference list for a while, until I decided to cull the list for more direct relevance to systems of training for rationality.  It was threatening to get unmanageably long otherwise.  I didn't know there was a biography of Korzybski.  Thanks!