Abstract (emphasis mine): > Many promising-looking ideas in AI research fail to deliver, but their validation takes substantial human labor and compute. Predicting an idea's chance of success is thus crucial for accelerating empirical AI research, a skill that even expert researchers can only acquire through substantial experience. We build...
There have been many attempts to define alignment and to derive from it definitions of alignment work/research/science, etc. For example, Rob Bensinger: > Back in 2001, we defined "Friendly AI" as "The field of study concerned with the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the...
From a comment on a post about Autonomous Replication and Adaptation (ARA): > ARA is just not a very compelling threat model in my mind. The key issue is that AIs that do ARA will need to be operating at the fringes of human society, constantly fighting off the mitigations that humans are...
TL;DR: it is possible to model the training of ASI[1] as an adversarial process because: 1. We train safe ASI on the foundation of security assumptions about the learning process, training data, inductive biases, etc. 2. We aim to train a superintelligence, i.e., a more powerful learning system than a...
The purpose of this post is to serve as storage for a particular thought so I can link it in dialogues. I've seen logic along these lines several times: 1. Consequentialist reasoning is more complex than following deontological rules; 2. Neural networks have some kind of simplicity bias; 3. Therefore, neural...
You might see a dialogue like this on LW: > AI Pessimist: Look, you should apply security mindset when thinking about aligning superintelligence. If you have a superintelligence, all your safety measures are under enormous optimization pressure, which will seek out ways to crack them. > > Skeptic: I agree that if...
Epistemic status: I consider everything written here pretty obvious, but I haven't seen it anywhere else. It would be cool if you could provide sources on the topic! Reason to write: I once saw a pretty confused discussion on Twitter about how multiple superintelligences will predictably end up in a Defect-Defect equilibrium and...
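For readers who want the baseline argument spelled out: in a one-shot Prisoner's Dilemma, defection strictly dominates cooperation, which is why Defect-Defect is the unique Nash equilibrium of that game. Below is a minimal sketch of that standard argument; the payoff values are the usual textbook ones, chosen for illustration, and are not taken from the post itself.

```python
# Standard one-shot Prisoner's Dilemma: payoffs[(my_move, their_move)] = my payoff.
# Illustrative textbook values (temptation > reward > punishment > sucker).
payoffs = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect (sucker's payoff)
    ("D", "C"): 5,  # I defect, they cooperate (temptation)
    ("D", "D"): 1,  # mutual defection
}

def best_response(their_move: str) -> str:
    """Return the move that maximizes my payoff against a fixed opponent move."""
    return max("CD", key=lambda my_move: payoffs[(my_move, their_move)])

# Defection is the best response to either opponent move, i.e. it strictly
# dominates cooperation, so (D, D) is the unique Nash equilibrium of the
# one-shot game.
assert best_response("C") == "D"
assert best_response("D") == "D"
```

Whether this one-shot dominance argument actually transfers to agents that can model one another, commit, or play repeatedly is exactly the kind of question the post takes up.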