All of mocny-chlapik's Comments + Replies

One additional maxim to consider is that the AI community in general can only barely conceptualize and operationalize difficult concepts, such as safety. Historically, the AI community was good at maximizing some measure of performance, usually pretty straightforward test set metrics such as classification accuracy. Culturally, this is how the community approaches all problems -- by aggregating complex phenomena into a single number. Note that this approach is not used in that many fields outside of AI and math, as you always have to make some lossy si... (read more)

Hey, I wonder what your policy is on linking blog posts. I have some texts that might be interesting to this community, but I don't really feel like copying everything over from HTML and duplicating the content. At the same time, I know that some communities don't like people promoting their own content. What are the best practices here?

In general, LM-generated text is still easily distinguishable by other LMs. Even though we humans cannot tell the difference, the way they generate text is not really human-like. They are much more predictable, simply because they are not trying to convey information as humans do; they are guessing the most probable sequence of tokens.

Humans are less predictable, because they always have something new to say; LMs, on the other hand, are like the most cliché person ever.
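To make that intuition concrete, here is a minimal sketch of the kind of detector it suggests: score how predictable a passage is to a reference LM and treat unusually low perplexity as a hint of machine generation. It assumes the public Hugging Face transformers GPT-2 model; the sample strings and any threshold you would pick are illustrative, not from the comment.

```python
# Minimal sketch: measure how predictable a passage is under a reference LM (GPT-2 here).
# Low perplexity = "most probable sequence of tokens" behaviour; a detector built on this
# idea would flag such text. Model names are real Hugging Face identifiers; the sample
# strings are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under GPT-2; lower means more predictable."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

print(perplexity("The capital of France is Paris."))                # typically low
print(perplexity("Sprockets taste purple on alternate Tuesdays."))  # typically higher
```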

3justinpombrio
You should try turning the temperature up.

"No indication" in this context means that:

  1. Our current paradigm is almost depleted. We are hitting a wall with both data (PaLM uses 780B tokens; there are ~3T tokens publicly available, and additional trillions can be found in closed systems, but that's it) and compute (we will soon hit Landauer's limit, so no more exponentially cheaper computation -- current technology is only three orders of magnitude above this limit; see the back-of-the-envelope sketch after this comment).
  2. What we currently have is very similar to what we will ultimately be able to achieve with the current paradigm. And it is nowhere near AGI. We need to solve
... (read more)
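For reference, a back-of-the-envelope check of the Landauer figure mentioned in point 1 above -- a minimal sketch assuming room-temperature operation; the claim that current hardware sits roughly three orders of magnitude above the bound is the comment's own estimate, not derived here.

```python
# Landauer's limit: minimum energy to erase one bit at temperature T is k_B * T * ln(2).
# Room temperature (300 K) is assumed for illustration.
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # assumed operating temperature, K

landauer_j_per_bit = k_B * T * math.log(2)
print(f"Landauer limit at {T:.0f} K: {landauer_j_per_bit:.2e} J per bit erased")  # ~2.9e-21 J
```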
3naasking
I don't think any of the claims you just listed are actually true. I guess we'll see.

For many catastrophic scenarios there is simply no indication, and truthfully I don't worry about any of them.

1Shiroe
What does "no indication" mean in this context? Can you translate that into probability speak?

I don't see any indication of AGI, so it does not really worry me at all. The recent scaling research shows that we need a non-trivial number of orders of magnitude more data and compute to match human-level performance on some benchmarks (with the huge caveat that matching performance on some benchmark might still not produce intelligence). On the other hand, we are all out of data (especially high-quality data with some information value, not random product reviews or NSFW subreddit discussions) and our compute options are also not looking that great (Moore's law is ... (read more)

3naasking
Nobody saw any indication of the atomic bomb before it was created. In hindsight, would it have been rational to worry? Your claims about the compute and data needed and the alleged limits remind me of the fact that Heisenberg actually thought there was no reason to worry because he had miscalculated the amount of U-235 that would be needed. It seems humans are doomed to keep repeating this mistake and underestimating the severity of catastrophic long tails.

I believe that fixating on benchmarks such as chess etc. is ignoring the G part of AGI. A truly intelligent agent should be general at least in the environment it resides in, considering the limitations of its form. E.g., if a robot is physically able to work with everyday objects, we might apply the Wozniak test and expect an intelligent robot to be able to cook dinner in an arbitrary house, or do any other task its form permits.

If we assume that right now we are developing purely textual intelligence (without agency, a persistent sense of self, etc.), we might still expect t... (read more)

1Martin Randall
My 8yo is not able to cook dinner in an arbitrary house. Does she have general intelligence?
5naasking
Humans regularly fail at such tasks but I suspect you would still consider humans generally intelligent. In any case, it seems very plausible that whatever decision procedure is behind more general forms of inference, it will very likely fall to the inexorable march of progress we've seen thus far. If it does, the effectiveness of our compute will potentially increase exponentially almost overnight, since you are basically arguing that our current compute is hobbled by an effectively "weak" associative architecture, but that a very powerful architecture is potentially only one trick away. The real possibility that we are only one trick away from a potentially terrifying AGI should worry you more.

It's not goalpost moving, it's the hype that's moving. People reduce intelligence to arbitrary skills or problems that are currently being solved, and then they are let down when they find out that the skill was actually not a good proxy.

I agree that LMs are conceptually more similar to ELIZA than to AGI.

9porby
The observation that things that people used to consider intelligent are now considered easy is critical. The space of stuff remaining that we call intelligent, but AIs cannot yet do, is shrinking. Every time AI eats something, we realize it wasn't even that complicated. The reasonable lesson appears to be: we should stop default-thinking things are hard, and we should start thinking that even stupid approaches might be able to do too much. It's a statement more about the problem being solved, not the problem solver. When you stack this on a familiarity with the techniques in use and how they can be transformatively improved with little effort, that's when you start sweating.

I believe that over time we will understand that producing human-like text is not a sign of intelligence. In the past, people believed that only intelligent agents are able to solve math equations (naturally, since only people can do it and animals cannot). Then came computers, and they were able to do all kinds of calculations much faster and without errors. However, from our current point of view, we now understand that doing math calculations is not really that intelligent and even really simple machines can do it. Chess playing is a similar story, we thought ... (read more)

Chess playing is a similar story: we thought that you have to be intelligent, but we found a heuristic to do that really well.

You keep distinguishing "intelligence" from "heuristics", but no one to my knowledge has demonstrated that human intelligence is not itself some set of heuristics. Heuristics are exactly what you'd expect from evolution, after all.

So your argument then reduces to a god of the gaps, where we keep discovering some heuristics for an ability that we previously ascribed to intelligence, and the set of capabilities left to "real intelligence... (read more)

4Qumeric
It is goalpost moving. Basically, it says "current models are not really intelligent". I don't think there is much disagreement here. And it's hard to make any predictions based on that. Also, "producing human-like text" is not well defined here; even ELIZA may match this definition. Even the current SOTA may not match it, because the adversarial Turing Test has not yet been passed.

in order for this to occupy any significant probability mass, I need to hear an argument for how our current dumb architectures do as much as they do, and why that does not imply near-term weirdness. Like, "large transformers are performing {this type of computation} and using {this kind of information}, which we can show has {these bounds} which happens to include all the tasks it has been tested on, but which will not include more worrisome capabilities because {something something something}."

What about: State-of-the-art models with 500+B parameters sti... (read more)

4Qumeric
They are simulators (https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators), not question answerers. Also, I am sure Minerva does pretty well on this task, probably not 100% reliable, but humans are also not 100% reliable if they are required to answer immediately. If you want the ML model to simulate thinking [better], make it solve this task 1000 times and select the most popular answer (which is already a quite popular approach for some models). I think PaLM would be effectively 100% reliable.
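A minimal sketch of that "solve it many times and take the most popular answer" idea (self-consistency-style majority voting). The ask_model helper is hypothetical and stands in for any sampling-based LM call; the toy model and numbers are purely illustrative.

```python
# Majority voting over repeated samples, as suggested above. `ask_model` is a hypothetical
# stand-in for any LM call sampled at nonzero temperature; the toy model is illustrative.
import random
from collections import Counter

def majority_answer(prompt, ask_model, n_samples=1000):
    """Query the model n_samples times and return the most common answer."""
    answers = [ask_model(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in model: right 60% of the time per sample, otherwise a random wrong digit.
def toy_model(_prompt):
    return "42" if random.random() < 0.6 else str(random.randint(0, 9))

print(majority_answer("What is 6 * 7?", toy_model, n_samples=200))  # almost surely "42"
```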
porby2013

As mentioned in the post, that line of argument makes me more alarmed, not less.

  1. We observe these AIs exhibiting soft skills that many people in 2015 would have said were decades away, or maybe even impossible for AI entirely.
  2. We can use these AIs to solve difficult reasoning problems that most humans would do poorly on.
  3. And whatever algorithms this AI is using to go about its reasoning, they're apparently so simple that the AI can execute them while still struggling on absolutely trivial arithmetic.
  4. WHAT?

Yes, the AI has some blatant holes in its capability. B... (read more)

The post starts with the realization that we are actually bottlenecked by data and then proceeds to talk about HW acceleration. Deep learning is in a sense a general paradigm, but so is random search. It is actually quite important to have the necessary scale of both compute and data, and right now we are not sure about either of them. Not to mention that it is still not clear whether DL actually leads to anything truly intelligent in a practical sense, or whether we will simply have very good token predictors with very limited use.

porby1410

I don't actually think we're bottlenecked by data. Chinchilla represents a change in focus (for current architectures), but I think it's useful to remember what that paper actually told the rest of the field: "hey you can get way better results for way less compute if you do it this way."

I feel like characterizing Chinchilla most directly as a bottleneck would be missing its point. It was a major capability gain, and it tells everyone else how to get even more capability gain. There are some data-related challenges far enough down the implied path, but we ... (read more)

I believe I read a paper about superhuman performance of LSTM LMs maybe 4 years ago. The fact that LMs are better than humans is not that surprising. With the amount of data they have seen, even relatively simple models are able to precisely calculate the probabilities for individual words. But the comparison to humans does not make much sense here. People are not really doing language modeling in their day-to-day communication. When we speak, we are not predicting what our next word will be; we are communicating ideas and selecting words that will re... (read more)

So many S-curves and paradigms hit an exponential wall and explode, but DL/DRL still have not.

Don't the scaling laws use logarithmic axes? That would suggest that the phenomenon is indeed exponential in its nature. If we need X times more compute with X times more data for additional improvements, we will hit the wall quite soon. There is only so much useful text on the Web and only so much compute that labs are willing to spend on this, considering the diminishing returns.
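To make the "log axes" point concrete, here is a toy sketch of a Chinchilla-style parametric scaling law, loss(N, D) = E + A/N^α + B/D^β. The coefficients are roughly the fits reported by Hoffmann et al. (2022) and should be treated as illustrative; the takeaway is that each further drop in loss costs a multiplicative increase in parameters and tokens.

```python
# Chinchilla-style parametric scaling law: L(N, D) = E + A / N**alpha + B / D**beta.
# Coefficients are roughly the Hoffmann et al. (2022) fits and are illustrative only.
def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / N**alpha + B / D**beta

# Scaling a Chinchilla-sized run (70B params, 1.4T tokens) up by 10x and 100x:
for scale in (1, 10, 100):
    N, D = 70e9 * scale, 1.4e12 * scale
    print(f"{scale:>3}x compute/data: loss ~ {chinchilla_loss(N, D):.3f}")
# Each constant loss improvement needs a multiplicative jump in N and D -- the
# diminishing returns the comment is pointing at.
```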

5Canaletto
There is a lot more useful data on YouTube (by several orders of magnitude at least? idk). I think the next wave of such breakthrough models will train on video.

According to the current understanding of scaling laws, most tasks follow a sigmoid in performance w.r.t. model size. As we increase model size, we have a slow start, followed by a rapid improvement, followed by a slow saturation towards maximum performance. But each task has a different shape based on its difficulty. Therefore, on some tasks you might be in the rapid improvement phase when you do one comparison and in the saturated phase when you do another. The results you are seeing are to be expected so far. I would visualize absolute performance for each task for a series of models to see how the performance actually behaves.
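A toy illustration of that per-task sigmoid, with performance modeled as a logistic function of log model size and "difficulty" shifting the curve's midpoint. All parameters here are made up for illustration, not fitted to any real benchmark.

```python
# Toy per-task sigmoid: performance as a logistic function of log10(model size).
# `difficulty` is the log10 model size at the curve's midpoint; all values are invented.
import math

def task_performance(model_size, difficulty, slope=1.5, floor=0.0, ceiling=1.0):
    x = math.log10(model_size) - difficulty
    return floor + (ceiling - floor) / (1.0 + math.exp(-slope * x))

for size in (1e8, 1e9, 1e10, 1e11, 1e12):
    easy = task_performance(size, difficulty=9.0)    # easy task saturates early
    hard = task_performance(size, difficulty=11.5)   # hard task is still in its slow start
    print(f"{size:.0e} params: easy {easy:.2f}, hard {hard:.2f}")
```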

There is already a sizable amount of research done in this direction, the so-called BERTology. I believe the methodology that is being developed is useful, but knowing about specific models is probably superfluous. In a few months or years we will have new models, and anything that you know will not generalize.

You might enjoy reading _The Structure of Scientific Revolutions_. #9 is explicitly discussed there. It is often the case that the old, incorrect theory has a lot of work behind it and many of the anomalies are explained by additional mechanisms; e.g., the geocentric theory had a lot of bells and whistles in the end, and it was quite precise in some cases. When the heliocentric theory was created, it was actually worse at predicting the movement of celestial bodies, because it was too simplistic and could not handle various edge cases. Related to your remark about gravity, it took more than 50 years to successfully apply the theory of gravity to predict how the Moon behaves.

Yeah, that is somewhat my perception.

3Koen.Holtman
In physics, we can try to reason about black holes and the big bang by inserting extreme values into the equations we know as the laws of physics, laws we got from observing less extreme phenomena. Would this also be 'a fictional-world-building exercise' to you? Reasoning about AGI is similar to reasoning about black holes: both of these do not necessarily lead to pseudo-science, though both also attract a lot of fringe thinkers, and not all of them think robustly all of the time. In the AGI case, the extreme value math can be somewhat trivial, if you want it. One approach is to just take the optimal policy π∗ defined by a normal MDP model, and assume that the AGI has found it and is using it. If so, what unsafe phenomena might we predict? What mechanisms could we build to suppress these?
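A minimal sketch of the exercise Koen describes: compute the optimal policy π∗ of a small, ordinary MDP by value iteration and then ask what that policy does. The toy MDP below is invented purely for illustration; nothing about it comes from the comment itself.

```python
# Value iteration on a tiny, randomly generated MDP; pi_star is the optimal policy
# that the thought experiment assumes the AGI has found. Everything here is a toy example.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
R = rng.normal(size=(n_states, n_actions))                        # R[s, a] = expected reward

V = np.zeros(n_states)
for _ in range(10_000):
    Q = R + gamma * (P @ V)        # Q[s, a] = r(s, a) + gamma * E[V(s')]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

pi_star = Q.argmax(axis=1)         # the optimal policy pi*(s) for this toy MDP
print("Optimal action in each state:", pi_star)
```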

Are you being passive-aggressive or am I reading this wrong? :)

The user Hickey is making a different argument. He is arguing about the falsifiability of the "superintelligence is coming" claim. This is also an interesting question, but I was not talking about this claim in particular.

I think that AI Safety can be a subfield of AI Alignment; however, I see a distinction between AI as current ML models and AI as theoretical AGI.

1Martin Randall
Okay, so "AI Alignment (of current AIs)" is scientific and rigorous and falsifiable, but "AGI Alignment" is a fictional world-building exercise?

Thanks for your reply. I am aware of that, but I didn't want to reduce the discussion to particular papers. I was curious about how other people read this field as a whole and what their opinion about it is. One particular example I had in mind is the Embedded Agency post, often mentioned as good introductory material into AI Alignment. The text often mentions complex mathematical problems, such as the halting problem, Gödel's theorem, Goodhart's law, etc., in a very abrupt fashion and uses these concepts to evoke certain ideas. But a lot is left unsaid, e.g. if Turi... (read more)

1Koen.Holtman
For the record: I feel that Embedded Agency is a horrible introduction to AI alignment. But my opinion is a minority opinion on this forum.
1Morpheus
I don't think there's anyone putting his credence on hypercomputation becoming a problem. I've since been convinced that Turing machines can do (at least) everything you can "compute".

Thanks for your reply. Popper-falsifiable does not mean experiment-based in my book. Math is falsifiable -- you can present a counterexample, an error in reasoning, a paradoxical result, etc. Similarly to history, you can often falsify certain claims by providing evidence against them. But you cannot falsify a field where every definition is hand-waved and nothing is specified in detail. I agree that AI Alignment has pre-paradigmatic features as far as Kuhn goes. But Kuhn also says that pre-paradigmatic science is rarely rigorous or true, even though it might produce some results that will lead to something interesting in the future.

"Every definition is hand-waved and nothing is specified in detail" is an unfair caricature.

7RyanCarey
In terms of trying to formulate rigorous and consistent definitions, a major goal of the Causal Incentives Working Group is to analyse features of different problems using consistent definitions and a shared framework. In particular, our paper "Path-specific Objectives for Safer Agent Incentives" (AAAI-2022) will go online in about a month, and should serve to organize a handful of papers in AIS.

Is it only technical achievements that are not getting celebrated anymore? Sometimes when you read old books, you can read that a certain celebrity was greeted by a huge crowd when they came to the USA by boat. Can you imagine crowds waiting for celebrities nowadays? Sure, you can have some fans, but certainly not crowds waiting for someone. I believe that social media is simply replacing crowd celebrations, and people have no need to actually go outside to celebrate anymore. You can see the event live with great video coverage (while you usually don't see much from the crowd), and you can also interact with all your friends (not with a bunch of random onlookers). This makes social media much more comfortable and accessible.

1gbear605
For a more recent example than trans-Atlantic ocean liners, when The Beatles arrived in the US by plane in 1964 they were greeted by a crowd of 3000 fans. That doesn't seem likely to happen today (and not just because of airport security).