If you ask GPT4 who is at fault for Emilia Galotti's death, it will give a perfect response. That's because there were likely hundreds of essays on this question in the training material; after all, this is a large language model. The more interesting question concerns emergent knowledge: learning on a meta-level and developing a world view from the training material, such as learning to play chess by reading about chess. How far along are we? What does GPT4's world view look like?

TL;DR: It's stuck somewhere in primary school. There is definitely a simple world model inside the language model. However, it is not very reflective, something that comes up in kids at around...


Replying to: Predicting GPU performance

hippke, 3y


I think the biggest improvement to this report could be made in Appendix D. The authors describe using "process size rather than transistor size", which, as they correctly note, is a made-up number. Transistor density (transistors per area) should be used instead; it is readily available in detail for many past nodes and for the most recent "5nm" nodes (see, e.g., WikiChip).
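As a minimal sketch of the suggested metric, transistor density is simply transistor count divided by die area. The inputs below reuse the 2020 figures quoted further down this page (16 billion transistors on about 100 mm²); the function name is hypothetical, and real per-node figures would come from a source such as WikiChip.

```python
def transistor_density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2 (MTr/mm^2)."""
    if area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return transistors / area_mm2 / 1e6


# Figures quoted on this page for a 2020 chip: 16 billion transistors on ~100 mm^2.
density = transistor_density_mtr_per_mm2(16e9, 100.0)
print(f"{density:.0f} MTr/mm^2")
```

Unlike a "5nm" label, this number can be compared directly across vendors and nodes.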

Replying to: Predicting GPU performance

hippke, 3y


What about the Landauer limit? We are 3 orders of magnitude from the Landauer limit (of order $10^{-21}$ J/op); see my article on LessWrong. The authors list several physical limitations, but this one seems to be missing, and it may be the most relevant one.
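For orientation, the ideal Landauer bound is $k_B T \ln 2$ per erased bit. A minimal sketch, assuming room temperature (300 K) and comparing against the $3 \times 10^{-18}$ J switching energy quoted for 2020 chips elsewhere on this page, reproduces the gap of about three orders of magnitude:

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum energy to erase one bit of information: k_B * T * ln(2)."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


# Assumption: room temperature of 300 K.
e_min = landauer_limit_joules(300.0)
# Switching energy per transistor quoted on this page for 2020 chips.
e_switch = 3e-18
gap = math.log10(e_switch / e_min)

print(f"Landauer limit: {e_min:.2e} J/bit")
print(f"Gap to 3e-18 J: {gap:.1f} orders of magnitude")
```

This compares one bit erasure with one transistor switch, so it is an order-of-magnitude comparison only.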

Replying to: What's the longest a sentient observer could survive in the Dark Era?

hippke, Sep 15, 2022


That's an excellent question, pondered by the brightest minds. The great Freeman Dyson proposed a solution dubbed eternal intelligence (Dyson 1979, Reviews of Modern Physics, Volume 51, Issue 3, July 1979, pp. 447-460). Basically, a finite amount of matter (equivalently, energy) is stored. As the universe cools over time, the energy cost per computation decreases (logarithmically, but forever). After each cooling period, one spends a fraction of the remaining energy, so the reserve never reaches zero, allowing eternal consciousness.
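A toy model shows why spending a fixed fraction of a shrinking reserve can still fund unbounded computation. This is a minimal sketch, not Dyson's actual derivation: it assumes, purely for illustration, that the cost per operation falls by a constant `cooling_factor` each epoch, and the function name and parameters are hypothetical.

```python
def dyson_toy_model(epochs: int, spend_fraction: float,
                    cooling_factor: float = 0.5) -> tuple[float, float]:
    """Toy model of Dyson-style eternal computation (illustrative only).

    Start with 1 unit of stored energy and a cost of 1 unit per operation.
    Each epoch, spend `spend_fraction` of the remaining energy on computation,
    then multiply the cost per operation by `cooling_factor`.
    Returns (energy_left, total_operations).
    """
    energy_left = 1.0
    cost_per_op = 1.0
    total_ops = 0.0
    for _ in range(epochs):
        spent = spend_fraction * energy_left
        energy_left -= spent
        total_ops += spent / cost_per_op
        cost_per_op *= cooling_factor
    return energy_left, total_ops


for n in (10, 20, 40):
    left, ops = dyson_toy_model(n, spend_fraction=0.25)
    print(f"{n:3d} epochs: energy left {left:.2e}, operations {ops:.3e}")
```

With `spend_fraction=0.25`, the reserve shrinks as $0.75^n$ but never reaches zero, while the operations per epoch grow as $0.25 \cdot 1.5^n$, so the total diverges. In this toy model the total stays finite whenever the kept fraction $1 - f$ is below the cooling factor.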

It was later understood that the expansion of the universe is accelerating. If that holds, the concept breaks down, as Dyson admitted. In the far future, any two observers will be separated, making the remaining subjects very lonely.

Replying to: Why the technological singularity by AGI may never happen

hippke, 4y


I think this calculation is invalid. A human is created from a seed worth 700 MB of information, encoded in DNA. This seed was shaped over millions of years of evolution and compresses a large (but finite) amount of information (energy). A relevant fraction of the hardware and software is encoded in it. Additional learning takes place over 20 years, worth 3 MWh. The fractional value of this learning part is unknown.
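Both figures in this comment can be reproduced with back-of-the-envelope arithmetic. The inputs here are assumptions, not taken from the comment: a haploid human genome of about 3.1 billion base pairs at 2 bits per base, and a brain drawing about 20 W continuously.

```python
GENOME_BASE_PAIRS = 3.1e9   # assumption: approximate haploid human genome size
BITS_PER_BASE = 2           # four bases (A, C, G, T) -> 2 bits each
BRAIN_POWER_W = 20.0        # assumption: commonly cited brain power draw
HOURS_PER_YEAR = 365.25 * 24
LEARNING_YEARS = 20         # from the comment

# Uncompressed genome size in megabytes.
genome_mb = GENOME_BASE_PAIRS * BITS_PER_BASE / 8 / 1e6
# Energy used by the brain over the learning period, in MWh.
learning_mwh = BRAIN_POWER_W * LEARNING_YEARS * HOURS_PER_YEAR / 1e6

print(f"Genome: ~{genome_mb:.0f} MB")
print(f"{LEARNING_YEARS} years at {BRAIN_POWER_W:.0f} W: ~{learning_mwh:.1f} MWh")
```

Both results land in the same range as the 700 MB and 3 MWh used in the comment.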

Replying to: Why the technological singularity by AGI may never happen

hippke, 4y


How can we know that "it is possible to train a 200 IQ equivalent intelligence for at most 3 MW-hr"?

Replying to: Why the technological singularity by AGI may never happen

hippke, 4y


How did von Neumann come close to taking over the world? Perhaps Hitler, but von Neumann?

Replying to: Why the technological singularity by AGI may never happen

hippke, 4y


Sure! I argue that we simply don't know whether something "much more intelligent than humans" can exist. Millions of years of primate evolution have brought human IQ to the 50-200 range. Perhaps that can go 1000x; perhaps it would level off at 210. The AGI concept assumes that it can go to a big number, which might be wrong.

Why the technological singularity by AGI may never happen

hippke

Artificial general intelligence is often assumed to improve exponentially through recursive self-improvement, resulting in a technological singularity. There are hidden assumptions in this model which should be made explicit so that their probability can be assessed.

Let us assume that:

- The Landauer limit holds, meaning that:
  - Reversible computations are impractical
  - Minimum switching energy is of order $10^{-21}$ J per operation
  - Thus, the energy cost at $kT$ is of order 1 EUR per $10^{22}$ FLOPs
- General intelligence scales sublinearly with compute:
  - Making a machine calculate the same result in half the time costs more than twice the energy:
    - Parallelization is never perfect (Amdahl's law)
    - Increasing frequency requires a higher voltage, and power grows quadratically with voltage ($P \propto f V^{2}$)
  - Similarly, cloning entire agents does not speed up most tasks linearly with the number of agents ("You
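The Amdahl's-law point can be made concrete. This minimal sketch assumes, hypothetically, that 95% of a task parallelizes perfectly and that every worker draws the same power for the whole run, so energy scales as workers times wall-clock time; both assumptions are illustrative, not from the source.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)


def relative_energy(parallel_fraction: float, n_workers: int) -> float:
    """Energy per task relative to a single worker, assuming each worker
    draws constant power for the whole (shortened) run:
    energy ~ n * time = n / speedup."""
    return n_workers / amdahl_speedup(parallel_fraction, n_workers)


P = 0.95  # hypothetical parallel fraction
for n in (1, 2, 8, 64):
    print(f"n={n:3d}: speedup {amdahl_speedup(P, n):5.2f}x, "
          f"energy {relative_energy(P, n):5.2f}x")
```

Under these assumptions the speedup can never exceed $1/(1-p) = 20$, no matter how many workers are added, while the energy per task keeps rising.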

hippke, 5y

A closer look at chess scalings (into the past)

As I understand it, "ELO inflation" refers to the observation that the top 100 FIDE players had 2600 ELO in 1970 but 2700 ELO today. It has been argued that the level of play simply increased as more very good players entered the field. The ELO number as such should be fair in both eras (after infinitely many games...). I don't think it is an issue for computer chess comparisons. Let me know if you have other data or information!

Replying to: Benchmarking an old chess engine on new hardware

hippke, 5y


I ran the experiment "Rebel 6 vs. Stockfish 13" on Amazon AWS EC2. I rented a Xeon Platinum 8124M, which benched at 18x 1.5 MNodes/s, and launched 18 concurrent single-threaded game sets with 128 MB of RAM for each engine. Again, pondering was off, with no opening books and no endgame tablebases. Time settings were 40 moves in 60 s + 0.6 s per move, corresponding to 17.5 MNodes/move. For reference, SF13 benches at 3630 ELO at this setting (entry "64 bit"); Rebel 6.0 scored 2415 on a Pentium 90 (SSDF Computer Rating List (01-DEC-1996).txt, 90 kN/move).

The result:

- 1911 games played
- 18 draws
- No wins for Rebel
- All draws when Rebel played white
- ELO difference: 941 ± 63
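As a cross-check, the standard logistic Elo model converts an average score $s$ into a rating difference $\Delta = 400 \log_{10}\left(\frac{s}{1-s}\right)$. This sketch uses only the game counts listed above (18 draws and no Rebel wins, so Rebel scored 9 points out of 1911); the rating tool that produced the ±63 interval is not stated, so a small difference from 941 is expected.

```python
import math


def elo_difference(score: float) -> float:
    """Rating difference implied by an average score (0 < score < 1)
    under the logistic Elo model: score = 1 / (1 + 10**(-diff / 400))."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


games = 1911
draws = 18
rebel_wins = 0
rebel_points = rebel_wins + 0.5 * draws        # 9 points for Rebel
stockfish_score = 1.0 - rebel_points / games   # Stockfish's average score

print(f"Stockfish score: {stockfish_score:.4f}")
print(f"Implied ELO difference: {elo_difference(stockfish_score):.0f}")
```

The plain logistic formula gives roughly 930, well within the stated uncertainty of 941 ± 63.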

Interpretation:

Starting from 3630 for SF13, that

hippke, 5y

Benchmarking an old chess engine on new hardware

With a baseline of 10 MNodes/move for SF3, I need to set SF13 to 0.375 MNodes/move for equality. That's a factor of about 27. Caveat: I only ran 10 games, which turned out equal, and only at 10 MNodes/move for SF3.
Yes: Rebel6 at normal 2021 settings (40 moves in 15 min) can be approximately matched by SF13 at 20 kNodes/move. More precisely, I get parity between Rebel6 (128 MB) and SF13 (128 MB) at 16 MNodes/move vs. 20 kNodes/move (a factor of 800). On my Intel Core-M 5Y31 (750 kNodes/s), that's 21 s vs. 0.026 s per move. Note that the figure shows SF8, not SF13.
I was contacted by one person via PM, and we are discussing the execution setup. Otherwise, I could do it by the end of July, after my vacation.
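The ratios and per-move times in these answers follow from simple division. A minimal sketch with hypothetical helper names, using only the node counts and the 750 kNodes/s speed stated above; the last lines are an added consistency check, assuming the 15-minute budget is spread evenly over 40 moves.

```python
def node_ratio(nodes_a: float, nodes_b: float) -> float:
    """How many times more nodes per move engine A needs than engine B."""
    return nodes_a / nodes_b


def seconds_per_move(nodes_per_move: float, nodes_per_second: float) -> float:
    """Wall-clock time per move at a given search speed."""
    return nodes_per_move / nodes_per_second


sf_ratio = node_ratio(10e6, 0.375e6)        # SF3 vs. SF13 at parity, ~26.7
rebel_ratio = node_ratio(16e6, 20e3)        # Rebel6 vs. SF13 at parity, 800
rebel_time = seconds_per_move(16e6, 750e3)  # Rebel6 on the Core-M 5Y31, ~21.3 s
sf13_time = seconds_per_move(20e3, 750e3)   # SF13 on the Core-M 5Y31, ~0.027 s

# Consistency check: 40 moves in 15 min at 750 kNodes/s.
tc_seconds = 15 * 60 / 40                   # 22.5 s per move
tc_nodes = tc_seconds * 750e3               # ~16.9 MNodes/move

print(f"SF3/SF13: {sf_ratio:.1f}x, Rebel6/SF13: {rebel_ratio:.0f}x")
print(f"{rebel_time:.1f} s vs. {sf13_time:.3f} s per move")
print(f"15 min / 40 moves: {tc_nodes / 1e6:.1f} MNodes/move")
```

The last line shows that 16 MNodes/move roughly corresponds to the normal 40-moves-in-15-minutes setting on this machine.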

Benchmarking an old chess engine on new hardware

hippke

I previously explored the performance of a modern chess engine on old hardware (1, 2). Paul Christiano asked for the case of an old engine running on modern hardware. This is the topic of the present post.

State of the art

Through an online search, I found the CCRL Blitz rating list. It is run on an i7-4770k at 9.2 MNodes/s. The time control is 2 min + 1 s per move, i.e., 160 s per 40 moves, or 4 s per move. On the 4770k, that's 36.8 MNodes/move. The current number one on that list is Stockfish 14 at 3745 ELO. The list includes Fritz 5.32 from 1997, but on old hardware (Pentium 90). Over the years, CCRL moved...
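The conversion from time control to nodes per move can be sketched as follows. This is a minimal illustration with a hypothetical function name; like the text, it spreads the whole game budget (base time plus increments) evenly over 40 moves.

```python
def nodes_per_move(base_s: float, increment_s: float, moves: int,
                   nodes_per_second: float) -> float:
    """Average nodes searched per move when the time budget
    (base time + increment per move) is spread evenly over `moves` moves."""
    seconds_per_move = (base_s + increment_s * moves) / moves
    return seconds_per_move * nodes_per_second


# CCRL Blitz: 2 min + 1 s per move on an i7-4770k at 9.2 MNodes/s.
npm = nodes_per_move(base_s=120, increment_s=1, moves=40, nodes_per_second=9.2e6)
print(f"{npm / 1e6:.1f} MNodes/move")
```

Changing `moves` shifts the result, since the base time is amortized over fewer or more moves.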

A closer look at chess scalings (into the past)

hippke

Introduction

I explored measuring AI or hardware overhang using chess in August 2020. Hardware overhang is when sufficient compute is available, but the algorithms are suboptimal. I examined the strongest chess engine of 2020, Stockfish 8, performing at 3,400 ELO under tournament conditions. When reducing compute to 1997 levels (equivalent to a Pentium-II 300 MHz), its ELO score was still ~3,000. That is an important year: in 1997, the IBM supercomputer "Deep Blue" defeated the world chess champion Garry Kasparov. With Stockfish, no supercomputer would have been required. I estimated that SF8 drops to Kasparov level on a 486-DX4 100 MHz, available already in 1994. To sum it up, the hardware overhang...

SETI Predictions

hippke

I enjoyed reading through people's views of the AGI Predictions. I know only a little about AI and a little bit more about SETI (Search for Extraterrestrial Intelligence), so I decided to make a similar poll.

The First Contact scenario with extraterrestrial intelligence shares certain similarities with the emergence of AGI. In the following, I ask similar questions to those in the AGI post. In addition, I add questions derived from the SETI literature. These are intended to reflect on AGI from a new perspective.

In SETI, it is debated whether First Contact could cause great disturbance and change, both positive and negative. After all, civilizations which are capable of contacting us across the oceans of time...

The next AI winter will be due to energy costs

hippke

Summary: We are 3 orders of magnitude from the Landauer limit (calculations per kWh). Once that limit is reached, progress in AI cannot come from throwing more compute at known algorithms. Instead, new methods must be developed. This may cause another AI winter, in which the rate of progress slows.

Over the last 8 decades, the energy efficiency of computers has improved by 15 orders of magnitude. Chips manufactured in 2020 feature 16 billion transistors on a 100 mm² area. The switching energy per transistor is only $3 \times 10^{-18}$ J (see Figure). This remarkable progress brings us close to the theoretical limit of energy consumption for computations, the Landauer principle: "any logically irreversible manipulation of information, such as the erasure of a bit or the...
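To put the two headline numbers side by side, here is a naive extrapolation. It assumes, purely for illustration, that the historical average rate (15 orders of magnitude in 80 years) simply continues; it is not a forecast taken from the post.

```python
import math

ORDERS_GAINED = 15      # efficiency gain over the period (from the text)
YEARS = 80              # "the last 8 decades" (from the text)
ORDERS_REMAINING = 3    # gap to the Landauer limit (from the text)

years_per_order = YEARS / ORDERS_GAINED               # years per 10x gain
doubling_time = years_per_order * math.log10(2)       # years per 2x gain
years_to_limit = ORDERS_REMAINING * years_per_order   # naive time to the limit

print(f"{years_per_order:.1f} years per 10x")
print(f"doubling every {doubling_time:.1f} years")
print(f"~{years_to_limit:.0f} years to close the remaining gap")
```

At the historical average pace, the remaining three orders of magnitude would be used up in roughly 16 years.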

How GPT-N will escape from its AI-box

hippke

Scenario: A human user asks GPT-N a plain text question through the usual browser textbox interface. GPT-N answers in the form of text that contains clever JavaScript code. Instead of being displayed, the malicious code triggers a browser exploit and gets executed. Now GPT-N has an executable of its choice running on the client side and can proceed from there. It has left its AI box through a text-only interface, without asking the user to let it out, and possibly without the user even noticing. Perhaps GPT-3 has done this already, while "deliberately" pretending to be stupider than it is?

Measuring hardware overhang

hippke


Summary

How can we measure a potential AI or hardware overhang? For the problem of chess, modern algorithms gained two orders of magnitude in compute (or ten years in time) compared to older versions. While it took the supercomputer "Deep Blue" to win over world champion Garry Kasparov in 1997, today's Stockfish program achieves the same ELO level on a 486-DX4-100 MHz from 1994. In contrast, the scaling of neural network chess algorithms to slower hardware is worse (and more difficult to implement) compared to classical algorithms. Similarly, future algorithms will likely be able to better leverage today's hardware by 2-3 orders of magnitude. I would be interested in extending this...
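The equivalence "two orders of magnitude in compute (or ten years in time)" implies a hardware doubling time, which in turn lets the 2-3 orders of magnitude for future algorithms be expressed in years. A minimal sketch, assuming that conversion rate stays constant; the helper names are hypothetical.

```python
import math


def implied_doubling_time(factor: float, years: float) -> float:
    """Doubling time in years implied if a `factor` gain in compute takes `years`."""
    return years / math.log2(factor)


def years_equivalent(orders_of_magnitude: float, doubling_years: float) -> float:
    """Years of hardware progress equivalent to a compute gain of 10**orders."""
    return orders_of_magnitude * math.log2(10) * doubling_years


dt = implied_doubling_time(100, 10)  # "two orders of magnitude ... ten years"
print(f"implied doubling time: {dt:.2f} years")
for oom in (2, 3):
    print(f"{oom} orders of magnitude ~ {years_equivalent(oom, dt):.0f} years")
```

Under this assumption, a 2-3 order-of-magnitude algorithmic gain corresponds to roughly 10-15 years of hardware progress.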


Predictions for GPT-N

hippke

Regarding GPT-3, there is some discussion about whether growing the model would transform it into an Oracle AI. I looked into the actual benchmark results (Appendix H in the paper) to see if we can predict something useful from the actual measurements.

Method: The OpenAI team ran a suite of 63 different benchmarks (including sub-types), each for zero/one/few shot. In each scenario, there are 8 model sizes. I looked at how results scale with model size. With only 8 measurements, predictions carry a large uncertainty. Formally, one would test the trend function using Bayesian model selection between a linear and (e.g.) a polynomial model. I did this for a few...
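The post does not show how the model selection was done. As one concrete stand-in, the sketch below compares a linear and a quadratic fit using the Bayesian information criterion (BIC), a large-sample approximation to the Bayes factor; this is a swapped-in technique, NumPy is assumed to be available, and the scores are synthetic placeholders, not the GPT-3 benchmark values.

```python
import numpy as np


def bic_polyfit(x: np.ndarray, y: np.ndarray, degree: int) -> float:
    """BIC of a least-squares polynomial fit, assuming Gaussian errors.

    BIC = n * ln(RSS / n) + k * ln(n), with k = degree + 1 coefficients.
    Lower is better; BIC differences approximate -2 times the log Bayes factor.
    """
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    n = len(x)
    return n * np.log(rss / n) + (degree + 1) * np.log(n)


# Eight model sizes from 125M to 175B parameters, on a log10 axis.
log_params = np.log10([125e6, 350e6, 760e6, 1.3e9, 2.7e9, 6.7e9, 13e9, 175e9])
# Synthetic placeholder scores: a straight line plus fixed perturbations.
# These are NOT benchmark results from the GPT-3 paper.
perturb = np.array([0.4, -0.3, 0.2, -0.5, 0.1, 0.3, -0.2, 0.0])
scores = 10.0 * log_params - 70.0 + perturb

bic_linear = bic_polyfit(log_params, scores, 1)
bic_quad = bic_polyfit(log_params, scores, 2)
print(f"BIC linear: {bic_linear:.1f}, BIC quadratic: {bic_quad:.1f}")
print("preferred:", "linear" if bic_linear < bic_quad else "quadratic")
```

With only 8 points, the penalty term $k \ln n$ is large relative to the fit improvement, which is why extra polynomial terms must earn their place.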

LESSWRONG
LW
