Wiki Contributions

Comments

Nisan1d20

I suppose if you had more hidden states than observables, you could distinguish hidden-state prediction from next-token prediction by the dimension of the fractal.

Nisan1d20

If I understand correctly, the next-token prediction of Mess3 is related to the current-state prediction by a nonsingular linear transformation. So a linear probe showing "the meta-structure of an observer's belief updates over the hidden states of the generating structure" is equivalent to one showing "the structure of the next-token predictions", no?

Nisan16dΩ120

The subject of this post appears in the "Did you know..." section of Wikipedia's front page(archived) right now.

Nisan24dΩ121513

I'm saying "transformers" every time I am tempted to write "LLMs" because many modern LLMs also do image processing, so the term "LLM" is not quite right.

"Transformer"'s not quite right either because you can train a transformer on a narrow task. How about foundation model: "models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks".

Nisan24dΩ9137

I agree 100%. It would be interesting to explore how the term "AGI" has evolved, maybe starting with Goertzel and Pennachin 2007 who define it as:

a software program that can solve a variety of complex problems in a variety of different domains, and that controls itself autonomously, with its own thoughts, worries, feelings, strengths, weaknesses and predispositions

On the other hand, Stuart Russell testified that AGI means

machines that match or exceed human capabilities in every relevant dimension

so the experts seem to disagree. (On the other hand, Stuart & Russell's textbook cite Goertzel and Pennachin 2007 when mentioning AGI. Confusing.)

In any case, I think it's right to say that today's best language models are AGIs for any of these reasons:

  • They're not narrow AIs.
  • They satisfy the important parts of Goertzel and Pennachin's definition.
  • The tasks they can perform are not limited to a "bounded" domain.

In fact, GPT-2 is an AGI.

Nisan1mo82

I'm surprised to see an application of the Banach fixed-point theorem as an example of something that's too implicit from the perspective of a computer scientist. After all, real quantities can only be represented in a computer as a sequence of approximations — and that's exactly what the theorem provides.

I would have expected you to use, say, the Brouwer fixed-point theorem instead, because Brouwer fixed points can't be computed to arbitrary precision in general.

(I come from a mathematical background, fwiw.)

Nisan3mo40

This article saved me some time just now. Thanks!

Nisan3mo20

Scaling temperature up by a factor of 4 scales up all the velocities by a factor of 2 [...] slowing down the playback of a video has the effect of increasing the time between collisions [....]

Oh, good point! But hm, scaling up temperature by 4x should increase velocities by 2x and energy transfer per collision by 4x. And it should increase the rate of collisions per time by 2x. So the rate of energy transfer per time should increase 8x. But that violates Newton's law as well. What am I missing here?

Load More