All of azergante's Comments + Replies

If our corrupted hardware can't be trusted to compute the consequences in a specific case, it probably also can't be trusted to compute the consequences of a general rule.

Specific details of a case can make people emotional and corrupt their reasoning; this is less of a problem for an abstract general rule.

Ok that makes sense, I'll add proper disclaimers and take your word for it that it's fine as long as there are no objections. Thank you :)

The dust / torture thought experiment seems pretty controversial. Why do we have to sum the sufferings? We can aggregate them in various ways. For example, we can care about minimizing the maximum suffering rather than minimizing the sum of sufferings; then we don't torture people no matter how many dust specks there are, which better aligns with some of our values.
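As a toy illustration (my own sketch; the suffering numbers are made up), here is how the two aggregation rules come apart: a sum-minimizer eventually prefers the torture once the specks are numerous enough, while a max-minimizer never does.

```python
# Made-up numbers purely for illustration: torture = 1e6 suffering units for one
# person, a dust speck = 1e-6 units each for N people.
N = 10**15

def total_suffering(per_person, count):   # sum-aggregation: minimize the sum
    return per_person * count

def worst_suffering(per_person, count):   # max-aggregation: minimize the maximum
    return per_person

print(total_suffering(1e6, 1), total_suffering(1e-6, N))
# 1e6 vs 1e9 -> the sum rule says torturing one person is the lesser evil
print(worst_suffering(1e6, 1), worst_suffering(1e-6, N))
# 1e6 vs 1e-6 -> the max rule always prefers the dust specks
```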

Nobody seems to have problems with circular preferences in practice, probably because people's preferences aren't precise enough. So we don't have to adopt utilitarianism to fix this non-problem.

This may not be a problem at the individual scale, but individuals design systems (a program, a government, an AI), and these systems must be precise and designed to handle these kinds of issues. They can't just adapt like humans do to avoid repeated exploits; we first have to build the adaptation mechanism. The way I see it, utilitarianism is an attempt to describe such a mechanism.

Concerns about Organoids

Chatting with ChatGPT, I learned that the latest organoids have about 1 million neurons.

Wondering whether that's a lot or not, I asked, and it tells me that bees and other insects have on the order of 10^5 neurons, while fish like zebrafish have on the order of 10^6. So we are engineering fish brains, at least in terms of number of neurons. That's concerning; as far as I know, zebrafish are conscious and can hurt.

What about humans? ChatGPT says humans have around 10^11 neurons; however, a 10 to 12 week embryo has about 10^6. It so happens that 10 to 12 weeks is th... (read more)

I broadly agree with the conclusions (not with the arguments), though from a 2025 perspective they do not feel novel (the value of challenge, destination vs journey, agency, and authenticity are discussed in self-help books).

On the arguments:

Work

what is a computer game except synthetic work?

Games and work have a few common points sure, but they have huge differences:

  • we must work to live, but we play games only if we feel like it
  • we cannot stop working, but we can stop playing anytime
  • (some) work is unpleasant, the games we play are fun (else we don't play them
... (read more)

Obviously a superintelligence knows that this is an unusual case

Since the ASI knows this is an unusual case, can it do some exception handling (like asking a human) instead of executing the normal path?

but that doesn't say if it's a positive or negative case.

Why only positive or negative? Some classifiers have an "out-of-distribution" category, for example One-Class SVM; using several of them should handle multiple classes. Perhaps this is also doable with any other latent feature space (transformers?) using a threshold distance to limit categories... (read more)
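A minimal sketch of that idea (my own illustration; the data, parameters, and class names are made up): fit one One-Class SVM per known class and treat anything that no detector accepts as out-of-distribution, to be escalated rather than forced into "positive" or "negative".

```python
# One detector per known class; inputs rejected by every detector are flagged
# as out-of-distribution instead of being forced into an existing class.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train = {
    "positive": rng.normal(loc=0.0, scale=0.5, size=(200, 2)),
    "negative": rng.normal(loc=5.0, scale=0.5, size=(200, 2)),
}
detectors = {name: OneClassSVM(nu=0.05, gamma="scale").fit(x) for name, x in train.items()}

def classify(x):
    # predict() returns +1 when the detector considers x an inlier for its class
    hits = [name for name, det in detectors.items() if det.predict([x])[0] == 1]
    return hits[0] if len(hits) == 1 else "out-of-distribution"  # unusual case: escalate

print(classify([0.1, -0.2]))     # positive
print(classify([5.1, 4.8]))      # negative
print(classify([100.0, 100.0]))  # out-of-distribution
```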

It sometimes takes me a long time to go from "A is true", "B is true", "A and B implies C is true" to "C is true".

I think this is a common issue with humans. For example, I can see a word such as "aqueduct" and also know that "aqua" means water in Latin, yet fail to notice that "aqueduct" comes from "aqua". This is because seeing a word does not trigger a dynamic that searches for a root.

Another case is when the rule looks a bit different, say "a and b implies c" rather than "A and B implies C", and some effort is needed to notice that it still appli... (read more)

The core argument that there is "no universally compelling argument" holds if we literally consider all of mind design space, but for the task of building and aligning AGIs we may be able to constrain the space such that it is unclear that the argument holds.

For example, in order to accomplish general tasks, AGIs can be expected to have a coherent, accurate and compressed model of the world (as transformers do to some extent), such that they can roughly restate their input. This implies that in a world where there is a lot of evidence that the sky is blue (in... (read more)

This leads to the first big problem with this post: The idea that minds are determined by DNA. This idea only makes sense if one is thinking of a mind as a sort of potential space.

Clone Einstein and raise him with wolves and you get a sort of smart wolf mind inhabiting a human body. Minds are memetic. Petunias don't have minds. I am my mind.

Reversing your analogy, if you clone a wolf and raise it with Einsteins, you do not get another Einstein. That is because hardware (DNA) matters and wolves do not have the required brain hardware to instantiate Ei... (read more)

Thanks for the suggestion, I added the "Edit 1" section to the post to showcase a small study on 3 posts known to contain factual mistakes. The LLM is able to spot and correct the mistake in 2 of the 3 cases, and provides valuable (though verbose) context. Overall this seems promising to me.

azergante

This post assumes the word "happiness" is crisply defined and means the same thing for everyone, but that's not the case. Or perhaps it is implicitly arguing about what the meaning of "happiness" should be?

Anyway this post would be much clearer if the word "happiness" was tabooed.

I have always been slightly confused by people arguing against wire-heading. Isn't wire-heading the thing that is supposed to max out our utility function? If that's not the case, then what's the point of talking about it? Why not just find what does maximize our utility function and do t... (read more)

"Yeah?  Let's see your aura of destiny, buddy."

 

Another angle: if I have to hire a software engineer, I'll pick the one with the aura of destiny any time, because that one is more likely to achieve great things than the others.

I would say auras of destiny are Bayesian evidence for greatness, and they are hard-to-fake signals.

Slightly off-topic, but wow, this is material for an awesome RTS video game! That would be so cool!

And a bit more on topic: that kind of video game would give the broader public a good idea of what's coming, and give researchers and leaders a way to vividly explore various scenarios, all while having fun.

Imagine playing the role of an unaligned AGI, and noticing that the game dynamics push you to deceive humans to gain more compute and capabilities until you can take over or something, all because that's the fastest way to maximize your utility function!

If you now put a detector in path A, it will find a photon with probability 1/2, and same for path B. This means that there is a 50% chance of the configuration |photon in path A only>, and 50% chance of the configuration |photon in path B only>. The arrow direction still has no effect on the probability.

 

This 50/50 split is extra surprising and perhaps misleading? What causes it? Why not 100% on path A and 0% on path B (or the reverse)?

As a layman it seems like either:

  1. The world is not deterministic, so when we repeat the experime
... (read more)

Insularity will make you dumber

Okay, but there is another side to the issue; insularity can also have positive effects:

If you look at evolution, when a population gets stuck on an island it starts to develop in interesting ways; maybe insularity is a necessary step to develop truly creative worldviews?

Also, IIRC in "The Timeless Way of Building" Christopher Alexander mentions that cities should be designed as several small neighborhoods with well-defined boundaries, where people with similar backgrounds live. He also says something to the effect that the ... (read more)

TAG
Insularity -- being an echo chamber -- is bad for truth seeking, even if it is good for neighbourhoods.

I liked the intro, but some parts of the previous posts and this one have been confusing, for example in this post:

Second, we saw that configurations are about multiple particles. [...] And in the real universe, every configuration is about all the particles… everywhere.)

and more glaring in the previous one:

A configuration says, “a photon here, a photon there,”

Here my intuition is that we can model the world as particles, or we can use the lower-level model of the world that configurations are, but we can't mix both any way we want. These sentences... (read more)

Planecrash (from Eliezer and Lintamande) seems highly relevant here: the hero, Keltham, tries to determine whether he is in a conspiracy or not. To do that he basically applies Bayes' theorem to each new fact he encounters: "Is fact F more likely to happen if I am in a conspiracy or if I am not? Hmm, fact F seems more likely to happen if I am not in a conspiracy, so let's update my prior a bit towards the 'not in a conspiracy' side".
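In code, that update is just the odds form of Bayes' theorem (a minimal sketch of my own, with made-up probabilities):

```python
# posterior_odds = prior_odds * likelihood_ratio
def update(prior_odds, p_fact_if_conspiracy, p_fact_if_no_conspiracy):
    """Posterior odds of 'I am in a conspiracy' after observing one fact."""
    return prior_odds * (p_fact_if_conspiracy / p_fact_if_no_conspiracy)

odds = 1.0  # start at 1:1
# Fact F seems twice as likely if there is *no* conspiracy:
odds = update(odds, p_fact_if_conspiracy=0.2, p_fact_if_no_conspiracy=0.4)
print(odds)               # 0.5 -> 1:2 against the conspiracy
print(odds / (1 + odds))  # ~0.33 probability of being in a conspiracy
```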

Planecrash is a great walkthrough on how to apply that kind of thinking to evaluate whether someone is bullshitting you or not, by... (read more)

You did not explicitly state the goal of the advice. I think it would be interesting to distinguish between advice that is meant to increase your value to the company and advice meant to increase your satisfaction with your work, especially when the two point in opposite directions.

For example it could be that "swallow[ing] your pride and us[ing] that garbage language you hate so much" is good for the company in some cases, but terrible for job satisfaction, making you depressed or angry every time you have to use that silly language/tool.

danielechlin
You want to be tending your value system so that being good at your job also makes you happy. It sounds like a cop-out but that's really it, really important, and really the truth. Being angry you have to do your job the best way possible is not sustainable.
Yair Halberstadt
The goal is writing good software to solve a particular problem. Using Haskell to write an SPA is not going to work well whether you're doing it for someone else or for yourself (assuming you care about the product and it's not just a learning/fun exercise). It is a perfectly valid decision to say that you'll only work on products where Haskell is a good fit, but I would strongly recommend against using Haskell where it's not a good fit in a production setting, and would consider it low-key fraud to do so where somebody else is paying you for your time.

My experience is that once you get over yourself, and put in the effort to properly understand the language, best practices, etc., you might not love the language, but you'll find it's actually fine. It's a popular language, people use it, and they've found ways to sand down the rough edges and make the good bits shine. Sure it's got problems, but it's not as soul-destroying as it looked at first sight, and you'll likely learn a lot anyway. (I'm not talking about a case where a company forces you to use a deprecated language like COBOL or ColdFusion. I'm talking about a case where you pick the language because it's the best tool for the job.)

This is in general good career advice. You'll lose out on a lot of opportunities if you refuse to put yourself in uncomfortable situations.
Brendan Long
I think it's more that learning to prioritize effectiveness over aesthetics will make you a more effective software engineer. Sometimes terrible languages are the right tool for the job, and I find it gives me satisfaction to pick the right tool even if I wish we lived in a world where the right tool was also the objectively best language (OCaml, obviously).
azergante

For that reason try to structure teams such that every team has everything it needs for its day to day work.

 

I would extend that to "have as much control as you can over what you do". I increasingly find that this is key to move fast and produce quality software.

This applies to code and means dependencies should be owned and open to modifications, so the team understands them well and can fix bugs or add features as needed.

This avoids ridiculous situations where bugs are never fixed, or where shipping very simple features (such as changing a theme for a UI c... (read more)

Tip: you can ask ChatGPT to include confidence scores in its replies

Interactions with ChatGPT can be customized persistently in the options; for example, you can add the following instructions: "include a confidence rating at the end of your response in the format 'Confidence: X%'. If your confidence is below 80%, briefly explain why".

Here is a sample conversation demonstrating this and showing what ChatGPT has to say about its calibration:

Me: Are you calibrated, by which I mean, when you output a confidence X as a percentage, are you right X times out of 100?

C... (read more)

Answer by azergante

Many developers have been reporting that this is dramatically increasing their productivity, up to 5x'ing/10x'ing it

I challenge the data: none of my colleagues have reported this high a speed-up. I think your observation can be explained simply by strong sampling bias.

People who do not use AI, or who got no improvement, are unlikely to report it. You also mention Twitter, where users share "hot takes" etc. to increase engagement.

It's good to have actual numbers before we explain them, so I ran a quick search and found 3 articles that look promising (I only did... (read more)

I also think it is unlikely that AGIs will compete in human status games. Status games are not just about being the best: Deep Blue is not high status, and sportsmen who take drugs to improve their performance are not high status.

Status games have rules, and you only win if you do something impressive while competing within the rules. Being an AGI is likely to be seen as an unfair advantage, and thus AIs will be banned from human status games, in the same way that current sports competitions are split by gender and weight.

Even if they are not banned, given their abilities it will be expected that they do much better than humans; it will just be a normal thing, not a high-status, impressive thing.

Let me show you the ropes

There is a rope.
You hold one end.
I hold the other.
The rope is tight.
I pull on it.

How long until your end of the rope moves?

What matters is not how long until your end of the rope moves.
It's having fun sciencing it!

JBlack
How long is a piece of string?
azergante

For those interested in writing better trip reports there is a "Guide to Writing Rigorous Reports of Exotic States of Consciousness" at https://qri.org/blog/rigorous-reports

A trip report is an especially hard case of something one can write about:

  • English does not have a well-developed vocabulary for exotic states of consciousness
  • even if we made up new words, they might not make much sense to people who have not experienced what they point at, just as it's hard to describe color to blind people or to project a high-dimensional thing into a lower-dimensional space.

I have a similar intuition that if mirror-life is dangerous to Earth-life, then the mirror version of mirror-life (that is, Earth-life) should be about equally as dangerous to mirror-life as mirror-life is to Earth-life. Having only read this post, and in the absence of any evidence either way, this default intuition seems reasonable.

I find the post alarming and I really wish it had some numbers instead of words like "might" to back up the claims of threat. At the moment my uneducated mental model is that for mirror-life to be a danger it has to:

  • find enoug
... (read more)
A1987dM
1 is irrelevant to autotrophs (e.g. cyanobacteria), which can synthesize their own food from achiral CO2 using sunlight; 2 is pretty much guaranteed if it's the only mirror life form in the ecosystem; 3 is obvious if it's the mirror image of an already existing life form; and it doesn't have to do 4 and 5 to achieve 6 (even a mirror cyanobacterium not otherwise interacting with non-mirror life would keep replicating exponentially until the biosphere runs out of CO2 or whichever other achiral nutrient turns out to be the limiting factor).

2+2=5 is Fine Maths: all you need is Coherence

[epistemological status: a thought I had while reading about Russell's paradox, rewritten and expanded on by Claude; my math level: undergraduate-ish]

Introduction

Mathematics has faced several apparent "crises" throughout history that seemed to threaten its very foundations. However, these crises largely dissolve when we recognize a simple truth: mathematics consists of coherent systems designed for specific purposes, rather than a single universal "true" mathematics. This perspective shift—from seeing mat... (read more)

I really like the idea of milestones. I think seeing the result of each milestone will help create trust in the group, confidence that the end action will succeed, and a realization of the real impact the group has. Each CA should probably start with small milestones (posting something on social media) and ramp things up until the end goal is reached. Seeing actual impact early will definitely keep people engaged and might make the group more cohesive and ambitious.

Answer by azergante

Ditch old software tools or programming languages for better, new ones.

My take on the tool VS agent distinction:

  • A tool runs a predefined algorithm whose outputs are in a narrow, well-understood and obviously safe space.

  • An agent runs an algorithm that allows it to compose and execute its own algorithm (choose actions) to maximize its utility function (get closer to its goal). If the agent can compose enough actions from a large enough set, the output of the new algorithm is wildly unpredictable and potentially catastrophic.

This hints that we can build safe agents by carefully curating the set of actions the agent chooses from, so that any algorithm composed from the set produces an output that is in a safe space.
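A toy sketch of that curation idea (my own illustration, with hypothetical action names): the agent can compose any plan it wants, but only out of actions whose every composition provably stays in a known-safe range.

```python
# Every action keeps the state inside [0, 100], so any plan the agent composes
# from this set also keeps the state inside [0, 100].
SAFE_ACTIONS = {
    "step_left":  lambda x: max(x - 1, 0),
    "step_right": lambda x: min(x + 1, 100),
    "reset":      lambda x: 0,
}

def execute_plan(plan, state=0):
    """Run an agent-chosen sequence of action names; unknown actions are refused."""
    for name in plan:
        if name not in SAFE_ACTIONS:
            raise ValueError(f"action {name!r} is outside the curated set")
        state = SAFE_ACTIONS[name](state)
    return state  # within [0, 100] no matter what plan the agent composed

print(execute_plan(["step_right"] * 5 + ["step_left"] * 2))  # 3
```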

I think being as honest as reasonably sensible is good for oneself. Being honest applies pressure on oneself and one's environment until the two closely match. I expect the process to have its ups and downs, but to lead to a smoother life in the long run.

An example that comes to mind is the necessity to open up to have meaningful relationships (versus the alternative of concealing one’s interests which tends to make conversations boring).

Also honesty seems like a requirement to have an accurate map of reality: having snappy and accurate feedback is essenti... (read more)

I also thought about something along those lines: explaining the domestication of wolves into dogs, or maybe of prehistoric wheat into modern wheat, then extrapolating to chimps. Then I had a dangerous thought: what would happen if we tried to select chimps for humaneness?

goals appear only when you make rough generalizations from its behavior in limited cases.

I am surprised no one brought up the usual map/territory distinction. In this case the territory is the set of observed behaviors. Humans look at the territory and, with their limited processing power, produce a compressed and lossy map, here called the goal.

The goal is a useful model to talk simply about the set of behaviors, but has no existence outside the head of people discussing it.

This is a great use case for AI: expert knowledge tailored precisely to one’s needs

Is the "cure cancer goal ends up as a nuke humanity action" hypothesis valid and backed by evidence?

My understanding is that the meaning of the "cure cancer" sentence can be represented as a point in a high-dimensional meaning space, which I expect to be pretty far from the "nuke humanity" point. 

For example "cure cancer" would be highly associated with saving lots of lives and positive sentiments, while "nuke humanity" would have the exact opposite associations, positioning it far away from "cure cancer".

A good design might specify that if the two go... (read more)

the gears to ascension
These original warnings were always written from a framework that assumed the only way to make intelligence is RL. They are still valid for RL, but thankfully it seems that at least for the time being, pure RL is not popular; I imagine that might have something to do with how obvious it is to everyone who tries pure RL that it's pretty hard to get it to do useful things, for reasons that can be reasonably called alignment problems. Imagine trying to get an AI to cure cancer entirely by RLHF, without even letting it learn language first. That's how bad they thought it would be. But RL setups do get used, and they do have generalization issues that do have connection to these issues.

If you know your belief isn't correlated to reality, how can you still believe it?

 

Interestingly, physics models (maps) are wrong (inaccurate), and people know that but still use them all the time, because they are good enough with respect to some goal.

Less accurate models can even be favored over more accurate ones to save on computing power or reduce complexity.

As long as the benefits outweigh the drawbacks, the correlation to reality is irrelevant.

Not sure how cleanly this maps to beliefs since one would have to be able to go from one belief to anothe... (read more)

@Eliezer, some interesting points in the article; I will criticize what frustrated me:

> If you see a beaver chewing a log, then you know what this thing-that-chews-through-logs looks like,
> and you will be able to recognize it on future occasions whether it is called a “beaver” or not.
> But if you acquire your beliefs about beavers by someone else telling you facts about “beavers,”
> you may not be able to recognize a beaver when you see one.

Things do not have intrinsic meaning; rather, meaning is an emergent property of
things in relation to each... (read more)

The examples seem to assume that "and" and "or" as used in natural language work the same way as their logical counterparts. I think this is not the case, and that it could bias the experiment's results.

As a trivial example, the question "Do you want to go to the beach or to the city?" is not just a yes-or-no question, as Boolean logic would have it.

Not everyone learns about boolean logic, and those who do likely learn it long after learning how to talk, so it’s likely that natural language propositions that look somewhat logical are not interpreted as just l... (read more)