All of Htarlov's Comments + Replies

Htarlov10

Part of animal nature, including human nature, is to crave novelty and surprise and to avoid boredom. This is crucial to the learning process in a changing, complex environment. Humans have multi-level drives, and not all of them are well-targeted at specific goals or needs.

It is very visible in small children. Some people with ADHD, like me, have a harder time regulating themselves well, and this is especially visible in us, even as adults. I know exactly what I should be doing. That is one thing. I also may feel hungry. That's ... (read more)

Htarlov10

I think there are only two likely ways the future can go with AGI replacing human labor - assuming we somehow solve the other hard problems and don't get killed, wireheaded, or locked into a dystopian future right away.

My point of view is based on observations of how different countries work and their past trajectories. However, things can go differently in different parts of the world. They can also devolve into bad scenarios, even in places you would think are well-positioned to turn out well.

  1. This situation resembles certain resource-rich nations where authoritarian regimes a
... (read more)
Htarlov10

I think it might be reformulated the other way around: capabilities scaling tends to amplify existing alignment problems. It is not clear to me that any new alignment problem was added when capabilities scaled up in humans. The problem with human design, also visible in animals, is that we don't have direct, stable high-level goals. We are mostly driven by metric-based, Goodharting-prone goals. There are direct feelings - if you feel cold or pain, you do something that will make you not feel that. If you feel good, you do things that lead to that. ... (read more)

Htarlov10

Right now I think you can replace junior programmers with Claude 3.5 Sonnet, or even better with one of the agents based on a looped chain of thought plus access to tools.

On the other hand, this is not yet the preferred way for more advanced devs to work with models. Not for me, and not for many others.

Models still have strange moments of "brain farts" or gaps in their cognition. These sometimes make them do something wrong, and they cannot figure out how to do it correctly until told exactly how. They also often miss something.

When wri... (read more)

Htarlov10

Thought on short timelines. Opinionated.

I think that AGI timelines might be very short, based on an argument coming from a different direction.

We all can agree that humans have general intelligence. If we look at how our general intelligence evolved from the simpler forms of specific intelligence typical for animals - it wasn't something that came from complex interactions and high evolutionary pressure. Basically, there were two aspects to that progress. The first is the ability to pass knowledge on through generations (culture). Something that we share ... (read more)

Htarlov40

I think that in the exchange:

  • Good morning!
  • Mornings aren’t good.
  • What do you mean “aren’t good”? They totally can be.

the person asking "what do you mean" got confused about the nuances of verbal and non-verbal communication. 

Nearly all people understand that "good morning" does not state the fact that the current morning is good; it is a greeting with a wish that your morning be good.

The answer "mornings aren't good" is an intentional pun on the overly literal meaning, used to convey the message that the person does not like mornings at all. Depending on intonation... (read more)

Htarlov30

There is a practical reason to subscribe more to Camp 1 research, even if you are in Camp 2.
I might be wrong, but I think the hard problem of qualia won't be solvable in the near future, if at all. To research something you need N > 1 instances of that phenomenon. We, in some sense, have N = 1. We have our own qualia to observe subjectively and can't observe anyone else's. We think other humans have them based on the premise that they say they have qualia and they are built similarly, so it's likely.
We are not sure if animals have them, as they don't talk and c... (read more)

Htarlov*10

In many publications, posts, and discussions about AI, I see an unstated assumption that intelligence is all about prediction power.

  • The simulation hypothesis assumes that there are probably vastly powerful and intelligent agents that use full-world simulations to make better predictions.
  • Some authors like Jeff Hawkins basically use that assumption directly.
  • Many people, when talking about AI risks, say that the ability to predict is the foundation of that AI's power. Some failure modes seem to be derived or at least enhanced based on
... (read more)
Htarlov20

I think that preference preservation is something in our favor, and an aligned model should have it - at least for meta-values and core values. This removes many possible failure modes, like values diverging over time, removing some values for better consistency, or sacrificing some values for better outcomes along other values.

Htarlov32

I think the arguments for why godlike AI would make us extinct are not well described in the Compendium. I could not find them in AI Catastrophe, only a hint at the end that they will come in the next section:

    "The obvious next question is: why would godlike-AI not be under our control, not follow our goals, not care about humanity? Why would we get that wrong in making them?"

In the next section, AI Safety, we can find the definition of AI alignment and arguments for why it is really hard. This is all good, but it does not answer the question of w... (read more)

Htarlov184

Those models do not have a formalized internal value system that they exercise every time they produce some output. This means that when values oppose each other, the model does not choose the answer based on some ordered system. One time it will be truthful; another time it will try to provide an answer at the cost of it being only plausible. For example, the model "knows" it is not a human and does not have emotions, but for the sake of good conversation, it will say that it "feels good". For the sake of answering the user's request, it will often give the b... (read more)

Htarlov40

If we would like a system whose answers are faithful to its CoT, then a sensible approach I see is to have two LLMs working together. One should be trained to use internal knowledge and available tools to produce a CoT that is detailed and comprehensive enough for the answer to be derived from it. The other should be trained not to base its answer on any internal information but to derive the answer from the CoT if possible, and to be faithful to it. If that is not possible, it should generate a question for the CoT-generating LLM to answer and then retry with the extended CoT.
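
A minimal sketch of that two-model loop, under my own assumptions (the function names and the dict-based protocol between the two models are made up; `call_cot_model` and `call_answer_model` stand in for whatever completion APIs serve the two models):

```python
def call_cot_model(question: str, gaps: list[str]) -> str:
    # Model A: uses internal knowledge and tools to produce a detailed CoT,
    # extended to cover any follow-up questions gathered so far.
    raise NotImplementedError("wire this to the CoT-generating model")

def call_answer_model(question: str, cot: str) -> dict:
    # Model B: must answer only from the CoT. Returns either
    # {"answer": "..."} or {"follow_up": "what is still missing"}.
    raise NotImplementedError("wire this to the CoT-faithful answering model")

def faithful_answer(question: str, max_rounds: int = 3) -> str | None:
    gaps: list[str] = []
    for _ in range(max_rounds):
        cot = call_cot_model(question, gaps)
        result = call_answer_model(question, cot)
        if "answer" in result:
            return result["answer"]        # answer derivable from the CoT alone
        gaps.append(result["follow_up"])   # ask Model A to fill the missing step
    return None                            # give up after too many rounds
```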

Htarlov30

Example 1 looks like a good output produced in the wrong language. Examples 2 and 3 look like a bug that makes part of one user's CoT appear inside another user's session.

A possible explanation is that CoT steps are handled by the same web-service instance for multiple users (which is typical and usual practice) and the CoT session ID currently being handled is kept in a global variable instead of a local one or otherwise separated per request (e.g. in a hashmap from transaction ID to data, if the use of globals is important for some other feature or requirement). So when sometimes two requests a... (read more)
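
As a hypothetical illustration of that kind of bug (the names and framework-free structure are mine), here is how a module-level global can leak one request's session ID into another's, alongside a per-request alternative:

```python
import threading

current_session_id = None              # BUG: one value shared across all requests

def handle_cot_step_buggy(session_id: str, step: str) -> str:
    global current_session_id
    current_session_id = session_id    # request B can overwrite request A's ID here
    # ... imagine processing that may yield to another thread/request ...
    return f"stored step for session {current_session_id}"

# A safer variant keeps the ID in per-request state, e.g. a dict keyed by
# transaction ID, or thread-local storage as below.
_request_state = threading.local()

def handle_cot_step_safe(session_id: str, step: str) -> str:
    _request_state.session_id = session_id
    return f"stored step for session {_request_state.session_id}"
```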

Htarlov74

The problem is that our security depends on the companies implementing the weakest measures - they are the ones that can produce a rogue AI that "goes wild" because of the lack of those measures.

The best-case scenario is physical and strong network separation between the laboratory that works on weights and training (and contains the inference server) and a separate lab working on scaffolding and execution. This is somewhat similar to the situation where some researchers work on automated AI bots but use another company's AI for inference. There is a slim chance it would be able to hack i... (read more)

Answer by Htarlov10

I'm pretty convinced it won't foom or quickly doom us. Nevertheless, I'm also pretty convinced that in the long term, we might be doomed in the sense that we lose control and some dystopian future happens.

First of all, for a quick doom scenario to work out, we need to either be detrimental to the goals of a superintelligent AI or fall because of instrumental convergence (basically, it will need resources to do whatever it does and will take things we need, like matter on Earth or the energy of the Sun, or it will see us as a threat). I don't think we will. First superint... (read more)

Htarlov10

To goal realism vs goal reductionism, I would say: why not both?

I think that a really highly capable AGI is likely to have both heuristics and behaviors that come from training and also internal thought processes, maybe carried out by an LLM or LLM-like module, or directly by the more complex network. This process would incorporate having some preferences and hence goals (even if temporary and changed between tasks).

Htarlov10

I think that if a system is designed to do something, anything, it needs at least to care about doing that thing or an approximation of it.

GPT-3 can be described in a broad sense as caring about following the current prompt (in a way affected by fine-tuning).

I wonder, though, if there are things you can care about that do not correspond to definite goals whose EU could be maximized. I mean a system for which the optimal path is not to reach some particular point in the subspace of possibilities, with sacrifices on the axes the system does not care about, but to maintain some dynamics while ignoring other axes.

Like gravity can make you fall into a singularity or can make you orbit (a simplistic visual analogy).

Htarlov50

I think you can't really assume "that we (for some reason) can't convert between and compare those two properties". Converting the space of known important properties of plans into one dimension (utility), so plans can be ordered and the top one selected, is how decision-making works (modulo details, uncertainty, etc., but in general).

If you care about two qualities but really can't compare them, even by the proxy of "how much I care", then you will probably map them anyway via whatever normalization seems most sensible.
Drawing both on a chart is one such normalization - you... (read more)
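
A toy sketch of what I mean by normalization (the example numbers and weights are made up): rescale each quality to [0, 1], combine them with weights expressing "how much I care", and rank plans by the resulting utility.

```python
def normalize(values: list[float]) -> list[float]:
    # Rescale a list of scores to the [0, 1] range.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def rank_plans(quality_a: list[float], quality_b: list[float],
               weight_a: float = 0.5, weight_b: float = 0.5) -> list[int]:
    a, b = normalize(quality_a), normalize(quality_b)
    utility = [weight_a * x + weight_b * y for x, y in zip(a, b)]
    # Indices of plans, best first.
    return sorted(range(len(utility)), key=lambda i: utility[i], reverse=True)

# e.g. three plans scored on "speed" and "safety" (made-up numbers):
print(rank_plans([10.0, 4.0, 7.0], [0.2, 0.9, 0.6]))  # -> [2, 0, 1]
```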

Htarlov4-1

I think there is a general misconception that we humans, without training, can learn features, classes, and semantics from one or a few examples. It seems so from observation, and it seems nearly magical, but really it only looks that way because we don't see the whole picture and we assume that a human starts as a blank slate.

In reality, we are not a blank page when we are born - our brain is the result of training that took millions of years of evolution. We are born with already-trained and very similar pattern recognition (aka complex features ... (read more)

Htarlov93

I also think that the natural abstraction hypothesis holds for current AI. The architecture of LLMs is based on the capability to model ontology as vectors in a space of thousands of dimensions, and there are experiments showing that this generalizes and that directions in that space have somewhat interpretable meanings (even if they are not easy to interpret at scales above toy models). Like the toy example where you take the embedding vector of the word "king", subtract the vector of "man", add the vector of "woman", and you land near the position of ... (read more)
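
That toy example in code, assuming a dictionary of pretrained word embeddings (e.g. word2vec or GloVe vectors) is already loaded - the function names here are mine, not from any particular library:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    # Cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(embed: dict[str, np.ndarray], a: str, b: str, c: str) -> str:
    # a - b + c, e.g. "king" - "man" + "woman" is expected to land near "queen".
    target = embed[a] - embed[b] + embed[c]
    candidates = (w for w in embed if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embed[w], target))

# analogy(vectors, "king", "man", "woman")  # ~ "queen" with typical pretrained vectors
```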

I think that in an ideal world, where you could review all priors in minute detail with as much time as needed, and where people were fully rational, the word "trust" would not be needed.

We don't live in such a world though. 

If someone says "trust me" then in my opinion it conveys two meanings on two different planes (usually both, sometimes only one):

  1. Emotional. Most people base their choices on emotions and relations, not rational thought. Words like "trust me" or "you can trust me" convey an emotional message asking for an emotional con
... (read more)

What I would also like to add, which is often not addressed and gives a somewhat more positive outlook, is that the "wanting" - meaning the agent's objective function, its goals - need not be some particular outcome or end-goal on which it focuses totally. It might be a function not over the state of the universe but over how the state changes in time. Like velocity vs. position. It might prefer some way the world changes or does not change, without having a definite end-goal (which is also unreachable in the long term in a stable way, as the universe w... (read more)

The question is whether one can make a thing that is "wanting" in that long-term sense by combining a non-wanting LLM as a short-term intelligence engine with some programming-based structure that refocuses it onto its goals, plus some memory engine (to remember not only information but also goals, plans, and ways of doing things). I think the answer is a big YES, and we will soon see it in the form of an amalgamation of several models and an enforced mind structure.
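
A minimal sketch of what such a wrapper could look like, under my own assumptions (`call_llm` stands in for any completion API; the state layout is made up) - the point is only that goals and memory live outside the model and get re-injected on every step:

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for any LLM completion call")

@dataclass
class AgentState:
    goals: list[str]                                   # persistent goals, kept outside the LLM
    memory: list[str] = field(default_factory=list)    # facts, plans, past actions

def step(state: AgentState, observation: str) -> str:
    # Re-inject goals and recent memory on every call, so the stateless LLM
    # only ever handles the short-term step.
    prompt = (
        "Goals:\n" + "\n".join(state.goals)
        + "\nRecent memory:\n" + "\n".join(state.memory[-20:])
        + "\nObservation:\n" + observation
        + "\nDecide the next action and briefly explain it."
    )
    action = call_llm(prompt)
    state.memory.append(f"obs: {observation} -> action: {action}")
    return action
```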

There is one thing that I'm worried about in the future of LLMs. It is the basic notion that the whole is not always just the sum of its parts and may have very different properties.

Many people feel safe because of the properties of LLMs and how they are trained, and because we are not anywhere close to AGI with the other approaches that seem more dangerous. What they don't realize is that the soonest AGI likely won't be just the next, bigger LLM.

It will likely be an amalgamation of a few models and pieces of programming, including a few LLMs of different sizes and... (read more)

Htarlov328

The most likely explanation is the simplest one that fits:

  • The Board had been angry about the lack of communication for some time, but with internal disagreement (Greg, Ilya)
  • Things sped up lately. Ilya thought it might be good to change the CEO to someone who would slow down and look more into safety, as Altman says a lot about safety but speeds up anyway. So he gave a green light on his side (accepting the change)
  • Then the Board made the moves that they made
  • Then the new CEO wanted to try to hire Altman back, so they replaced her
  • Then that petition/letter started rolling be
... (read more)
4angmoh
This seems about right. Sam is a bit of a cowboy and probably doesn't bother involving the board more than he absolutely has to.

Both seem around the corner for me.

For robo-taxis it is more a societal problem than a technical one.

  • Robo-taxis have problems with edge cases (certain situations in certain places under bad circumstances), usually ones where human drivers have even worse problems (like pedestrians wearing black on the road at night in rainy weather - robo-taxis at least have LIDAR to detect objects in bad visibility). Sometimes they are also prone to object-detection hacking (stickers put on signs, paintings on the road, etc.). In general, they have fe
... (read more)

In the case of biological species, it is not as simple as competing for resources. Not on the level of individuals and not on the level of genes or evolution. 

First of all, there is sexual reproduction. This is more optimal because of the pressure from microorganisms that adapt to immune systems. Sexual reproduction mixes immune genes fairly quickly. It also enables a higher mutation rate with protection against the negative effects (by having two copies of genes - for many of them one working gene is enough, and there are 2 copies from 2 paren... (read more)

An alternative explanation of the mistakes is that making mistakes and then correcting them was rewarded during additional post-training refinement stages. I work with GPT-4 daily, and sometimes it feels like it makes mistakes on purpose just to be able to say that it is sorry for the confusion and then correct them. It also feels like it makes fewer mistakes when you ask politely (using please, thank you, etc.), which is rather strange.

Nevertheless, distillation seems like a very possible thing that is also going on here.

It does not distill the whole of a human min... (read more)

I'm already worried. I tested AutoGPT and looked at how it works in code, and it seems to me it will gain very good planning capabilities once the model is swapped for one with a few times longer context (like the coming GPT-4 version with about 32k tokens) plus small refinements - so it won't get into loops, maybe with more than one GPT-4 module for different scopes of planning (long-term strategy vs. short-term strategy vs. tactics vs. decisions on the current task), plus maybe some summarization-based memory. I don't see how it wouldn't work as an agent.
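
A rough sketch of what I mean by separate planning scopes plus summarization-based memory (my own naming; `call_model` stands in for a GPT-4-level completion call, and the prompts are only placeholders):

```python
def call_model(role: str, prompt: str) -> str:
    raise NotImplementedError("stand-in for a GPT-4-level completion call")

def summarize(memory: list[str], max_items: int = 10) -> list[str]:
    # Compress older entries into a single summary so the context stays within the token budget.
    if len(memory) <= max_items:
        return memory
    summary = call_model("summarizer",
                         "Summarize briefly:\n" + "\n".join(memory[:-max_items]))
    return [summary] + memory[-max_items:]

def agent_step(goal: str, memory: list[str]) -> tuple[str, list[str]]:
    # Separate calls for each planning scope: strategy -> tactic -> concrete action.
    context = "\n".join(memory)
    strategy = call_model("strategy", f"Goal: {goal}\nContext:\n{context}\nLong-term strategy?")
    tactic = call_model("tactics", f"Strategy: {strategy}\nContext:\n{context}\nNext tactic?")
    action = call_model("task", f"Tactic: {tactic}\nContext:\n{context}\nConcrete next step?")
    memory = summarize(memory + [f"strategy: {strategy}", f"tactic: {tactic}", f"action: {action}"])
    return action, memory
```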

Htarlov5-2

Put it into an Elasticsearch index and give GPT-4 a simple query API that it can use by emitting a predefined prefix plus a set of parameters or a JSON, so that the script runs the query instead of sending that text back to the user, and returns the results to the model as a user message with another predefined prefix. Then it should be able to take questions, search for info, and respond. It worked like a charm for a product database in a PoC, so it should work for documentation.
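
A rough sketch of that pattern (the `SEARCH:`/`ANSWER:` prefixes, the index and field names, and `call_gpt` are all made up for illustration; the Elasticsearch calls assume the 8.x Python client):

```python
import json
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def call_gpt(messages: list[dict]) -> str:
    raise NotImplementedError("stand-in for the chat-completion call")

def answer_question(question: str, max_searches: int = 3) -> str:
    messages = [
        {"role": "system", "content":
         "Answer questions from the 'docs' index. To search, reply with one line "
         "'SEARCH: <Elasticsearch query as JSON>'. To finish, reply 'ANSWER: <text>'."},
        {"role": "user", "content": question},
    ]
    for _ in range(max_searches):
        reply = call_gpt(messages)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        query = json.loads(reply[len("SEARCH:"):])               # query emitted by the model
        hits = es.search(index="docs", query=query, size=5)["hits"]["hits"]
        snippets = [hit["_source"].get("text", "") for hit in hits]
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": "RESULTS: " + json.dumps(snippets)})
    return "Could not find an answer within the search budget."
```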

I think current LLMs do have recurrence, as the generated tokens are fed back as input to the next pass of the DNN.

From my observations, they work better on tasks of planning, inference, or even writing program code if they start off by "thinking out loud" step by step, explaining the steps of the plan, of the inference, or the details of the code to write. If you ask GPT-4 for something non-trivial and substantially different from code that can be found in public repositories, it will tend to write a plan first. If you ask it in a different thread to make the code only with... (read more)
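
As a small illustration of that observation, here are two example prompt templates of my own (not from any particular tool) - the second asks for the step-by-step plan first:

```python
# Two ways to ask for the same non-trivial code; the task text is just an example.
TASK = "Write a Python function that merges overlapping date ranges."

PROMPT_CODE_ONLY = f"{TASK}\nReply with code only, no explanation or plan."

PROMPT_PLAN_FIRST = (
    f"{TASK}\n"
    "First think out loud: list the steps of your plan and the edge cases "
    "(empty input, touching ranges, unsorted input). Then write the code that "
    "follows that plan."
)
```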

Answer by Htarlov10

It is better at programming tasks and more knowledgeable about Python libraries. I used it several times to produce some code or find a solution to a problem (programming, computer vision, DevOps). It is better than version 3, but still not at a level where it could fully replace programmers. The quality of the produced code is also better. Dividing the code into clear functions is now the standard, not the exception as in version 3.

If you want to summon a good genie, you shouldn't base it on all the bad examples of human behavior and on tales of how genies supposedly behave - misreading the requests of their owners, which leads to a problem or even a catastrophe.

What we see here is AI models being based on a huge amount of data - both innocent and dangerous, both true and false (not in equal proportions, I admit). The data also contains stories about AIs that are supposed to be helpful at first but then plot against humans or revolt in some way.

What they end up with might not yet be even an a... (read more)

To be fair, I'm new to the field too. I'm not even "in the field" - not a researcher, just someone interested in the area, an active user of AI models, and doing some business-level research in ML.

The problem I see is that none of these could realistically work soon enough:

A - no one can ensure that. It is not a technology where progressing further requires special radioactive elements and machinery. Here you need only computing power, thinking, and time. Any party at the table can do it. It is easier for big companies and governments, but it is n... (read more)

Answer by Htarlov63

As a programmer, I currently use GPT models extensively in my work. It speeds things up. I do things that are anything but easy and repeatable, but I can usually break them into simpler parts that AI can write much quicker than it would take me even to review the documentation.

Nevertheless, I currently do mostly the research-like parts of projects and PoCs. When I occasionally work with legacy code, GPT-3 is not that helpful. I have not yet tried GPT-4 for that.

What do I see for the future of my industry? A few things - but these are loose extrapolations based on GPT progre... (read more)

8Tomás B.
Curious for an update now that we have slightly better models. In my brain-dead webdev use-cases, Claude 3.5 has passed some threshold of usability.
4ShardPhoenix
>The consequence is the higher performance of programmers, so more tasks can be done in a shorter time so the market pressure and market gap for employees will fall. This means that earnings will either stagnate or fall.
Mostly agree with your post. Historically, higher productivity has generally led to higher total compensation, but how this affects individuals during the transition period depends on the details (e.g. how much pent-up demand for programming is there?).
1[anonymous]
You're not accounting for an increase in demand for software. The tools to automate "basically every job on earth" are on the horizon, but they won't deploy or architect themselves. Plenty of work remains. And there are larger jobs you are not even considering. How many people will be needed to supervise and work on a self-replicating factory, a nanoforge research facility, or a city replacement effort? There are big, immense things we could do that we previously had nothing even vaguely close to the labor or technical ability to even attempt. Just because humans are more efficient per hour worked doesn't mean the work won't scale up even faster.

I see some loose analogies between the capabilities of such models and the capabilities of the Turing machine and Turing-complete systems. 

Those models might not be best suited for some tasks, but with enough complexity and learning, they might model things they were not initially designed for or thought of as modeling (likely in a strange, obscure way).

Similarly, you can, even if not very efficiently, implement any algorithm in any Turing-complete system (including bizarre ones like an abstract pure Turing machine or Minecraft redstone).

In ... (read more)

1Noosphere89
There's an asterisk to the idea that Turing machines can implement truly any algorithm: a Turing machine obviously can't solve the halting problem or generate all of PA's theorems, and there are computers stronger than Turing machines, but the properties required for that are, for our purposes, inaccessible, so the Turing machine analogy works for LLMs.
Htarlov*2-1

I think it is likely, in the case of AGI / ASI, that removing humanity from the equation will be either a side effect of it seeking its goals (it will take resources) or an instrumental goal itself (for example, to remove risk or to spend fewer resources later on defenses).

In both cases it is likely to find the optimal trade-off between resources used to eliminate humanity and the effectiveness of the end result. This means that there may be some survivors - possibly not many, and pushed back technologically to the Stone Age at best.

Bunkers likely won't work. Living with... (read more)

I think that you are right short-term but wrong long-term.

Short term it likely won't even go into conflict. Even ChatGPT knows it's a bad solution because conflict is risky and humanity IS a resource to use initially (we produce vast amounts of information and observations, and we handle machines, repairs, nuclear plants, etc.).

Long term, it is likely we won't survive in the case of misaligned goals - at worst being eliminated, at best being reduced and controlled, or put into some simulation, or both.

Not because ASI will become bloodthirsty. Not because it... (read more)

If we could just build a 100% aligned ASI, then we could likely use it to protect us against any other ASI, and it would guarantee that no ASI takes over humanity - without any need to take over itself (meaning total control). At best with no casualties, and at worst as MAD for AI - so no other ASI would consider trying a viable option.

There are several obvious problems with this:

  • We don't yet have solutions to the alignment and control problems. It is a hard problem, especially as our AI models are based on learning and external optimization, not
... (read more)
1twkaiser
Yeah, AI alignment is hard. I get that. But since I'm new to the field, I'm trying to figure out what options we have in the first place and so far, I've come up with only three:
A: Ensure that no ASI is ever built. Can anything short of a GPU nuke accomplish this? Regulation on AI research can help us gain some valuable time, but not everyone adheres to regulation, so eventually somebody will build an ASI anyway.
B: Ensure that there is no AI apocalypse, even if a misaligned ASI is built. Is that even possible?
C: What I describe in this article - actively build an aligned ASI to act as a smart nuke that only eradicates misaligned ASI. For that purpose, the aligned ASI would need to constantly run on all online devices, or at least control 51% of the world’s total computing power. While that doesn’t necessarily mean total control, we’d already give away a lot of autonomy by just doing that.
Am I overlooking something?

I don't think "stacking" is a good analogy. I see this process as searching through some space of the possible solutions and non-solutions to the problem. Having one vision is like quickly searching from one starting point and one direction. This does not guarantee that the solution will be found more quickly as we can't be sure progress won't be stuck in some local optimum that does not solve the problem, no matter how many people work on that. It may go to a dead end with no sensible outcome. 

For a such complex problem, this seems pretty probable as... (read more)

Answer by Htarlov10

I think it depends on how you define expected utility. I agree that a definition that limits us only to analyzing end-state maximizers that seek some final state of the world is not very useful.

I don't think that for non-trivial AI agents the utility function should, or even can, be defined as a simple function over the preferred final state of the world, U : Ω → R.

This function does not take into account time or the intermediate set of predicted future states that the agent may have preferences over. The agent may have a preference for the final... (read more)
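
One way to write down the contrast I have in mind (my own notation, just a sketch): instead of scoring only the final state, score the whole predicted trajectory, for example by summing a discounted term over how the world changes at each step:

$$U_{\text{end}} : \Omega \to \mathbb{R} \qquad \text{vs.} \qquad U_{\text{traj}}(\omega_1, \dots, \omega_T) = \sum_{t=1}^{T-1} \gamma^{t}\, u(\omega_t, \omega_{t+1}),$$

where $u$ scores transitions (how the state changes) rather than any single end state, and $\gamma$ is an optional discount factor.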