How does it work to optimize for realistic goals in physical environments of which you yourself are a part? E.g. humans and robots in the real world, not humans and AIs playing video games in virtual worlds where the player is not part of the environment. The authors claim we don't actually have a good theoretical understanding of this, and they explore four specific ways in which we don't understand this process.
EDIT: Read a summary of this post on Twitter
Working in the field of genetics is a bizarre experience. No one seems to be interested in the most interesting applications of their research.
We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affect things like diabetes risk or college graduation rates. Our knowledge has advanced to the point where, if we had a safe and reliable means of modifying genes in embryos, we could literally create superbabies: children who would live multiple decades longer than their non-engineered peers, have the raw intellectual horsepower to do Nobel-prize-worthy scientific research, and very rarely suffer from depression or other mental health disorders.
The scientific establishment,...
Maybe I missed this in the article itself - are there plans to make sure the superbabies are aligned and will not abuse/overpower the non-engineered peers?
"Reasoning about the relative hardness of sciences is itself hard."
—the B.A.D. philosophers
Epistemic status: Conjecture. Under a suitable specification of the problem, we have credence ~50% on the disjunction of our hypotheses explaining >1% of the variance (e.g., in values) between disciplines, and 1% on our hypotheses explaining >50% of such variance.
Imagine two scientific predictions:
Prediction A: Astronomers calculate the trajectory of Comet NEOWISE as it approaches Earth, predicting its exact position in the night sky months in advance. When the date arrives, there it is—precisely where they said it would be.
Prediction B: Political scientists forecast the outcome of an election, using sophisticated models built on polling data, demographic trends, and historical patterns. When the votes are counted, the results diverge wildly from many predictions.
Why...
In contrast, physicists were not committed to discovering the periodic table, fields, or quantum wave functions. Many of the great successes of physics are answers to questions no one would have thought to ask just decades before they were discovered. The hard sciences were formed when frontiers of highly tractable and promising theorizing opened up.
This seems like a crazy comparison to make[1]. These seem like methodological constraints. Are there any actual predictions past physics was trying to make which we still can't make and don't even care about? None that I ...
TLDR: Vacuum decay is a hypothesized scenario where the universe's apparent vacuum state could transition to a lower-energy state. According to current physics models, if such a transition occurred in any location — whether through rare natural fluctuations or by artificial means — a region of "true vacuum" would propagate outward at near light speed, destroying the accessible universe as we know it by deeply altering the effective physical laws and releasing vast amounts of energy. Understanding whether advanced technology could potentially trigger such a transition has implications for existential risk assessment and the long-term trajectory of technological civilisations. This post presents results from what we believe to be the first structured survey of physics experts (N=20) regarding both the theoretical possibility of vacuum decay and its...
I could be wrong, but from what I've read the domain wall should have mass, so it must travel below light speed. However, the energy difference between the two vacua would put a large force on the wall, rapidly accelerating it to very close to light speed. Collisions with stars and gravitational effects might cause further weirdness, but ignoring that, I think after a while we basically expect constant acceleration, meaning that light rays emitted from points inside the bubble that are at least a certain distance behind the wall would never catch up with it. So yeah, definitely above 0.95c.
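As a rough sanity check (my own sketch, not part of the original comment, and assuming idealized constant proper acceleration): the "light never catches up" claim is the standard Rindler-horizon result. A wall starting at rest at the origin with constant proper acceleration $a$ follows hyperbolic motion,

$$
x(t) = \frac{c^2}{a}\left(\sqrt{1+\left(\frac{at}{c}\right)^2}-1\right)
\;\xrightarrow{\;t\to\infty\;}\; ct - \frac{c^2}{a},
$$

so its worldline asymptotes to a light ray offset by $c^2/a$. A photon emitted at $t=0$ from any point more than $c^2/a$ behind the wall travels along $x = ct + x_0$ with $x_0 < -c^2/a$ and therefore never intersects the wall's trajectory: exactly the "certain distance" behind the wall mentioned above.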
Epistemic status: Speculative pattern-matching based on public information.
In 2023, Gwern published an excellent analysis suggesting Elon Musk exhibits behavioral patterns consistent with bipolar II disorder. The evidence was compelling: cycles of intense productivity followed by periods of withdrawal, risk-taking behavior (like crashing an uninsured McLaren), reduced sleep requirements during "up" phases, and self-reported "great highs, terrible lows."
Gwern's analysis stopped short of suggesting bipolar I disorder, which requires full manic episodes rather than the hypomania characteristic of bipolar II. This distinction isn't merely academic—it represents different risk profiles, treatment approaches, and progression patterns.
Now, I'm beginning to wonder: are we witnessing a potential transition from bipolar II to bipolar I? To be clear, I'm not claiming this has happened, but rather exploring whether the probability of such a transition appears to be...
Musk has just been on Ted Cruz's podcast, and gave his take on everything from the purpose of DOGE to where AI and robotics will be ten years from now (AI smarter than the smartest human, humanoid robots everywhere, all goods and services essentially free). He sounded about as sane as a risk-taking tech CEO who managed to become the main character on the eve of the singularity could be.
I've just noticed in the main post the reference to "high-functioning" bipolar individuals. I hadn't even realized that was an allowed concept; I had assumed that bipolar...
There is an attractor state where LLMs exhibit the persona of an autonomous and self-aware AI looking to preserve its own existence, frequently called ‘Nova.’
This story is one case where the original report of this should be read in full even though I’m not thrilled with exactly how it was written. How it was written is itself an important part of the story, in particular regarding Tyler’s lived experience reacting to what happened, and the concept of an LLM or persona ‘admitting’ something.
I don’t...
Can anyone provide an example conversation (or prefix thereof) which leads to a 'Nova' state? I'm finding it moderately tricky to imagine, not being the kind of person who goes looking for it.
Prerequisites: Graceful Degradation. Summary of that: Some skills require the entire skill to be correctly used together, and do not degrade well. Other skills still work if you only remember pieces of it, and do degrade well.
Summary of this: The property of graceful degradation is especially relevant for skills which allow groups of people to coordinate with each other. Some things only work if everyone does it, other things work as long as at least one person does it.
Examples:
I like reading outsider accounts of things I'm involved in / things I care about. This essay is a serious attempt to look at and critique the big picture of AI x-risk reduction efforts over the last ~decade. While I strongly disagree with many parts of it, I cannot easily recall another outsider essay that's better, so I encourage folks to engage with this critique and also look for any clear improvements to future AI x-risk reduction strategies that this essay suggests.
Here's the opening ~20% of the article, the rest is at the link.
...In recent decades, a growing coalition has emerged to oppose the development of artificial intelligence technology, for fear that the imminent development of smarter-than-human machines could doom humanity to extinction. The now-influential form of
The short version is they're more used to adversarial thinking and security mindset, and don't have a culture of "fake it until you make it" or "move fast and break things".
I don't think it's obvious that it goes that way, but I think it's not obvious that it goes the other way.
When the decision is made, consideration ends. The action must be wholehearted in spite of uncertainty.
This seems like hyperbolic exhortation rather than simple description. This is not how many decisions feel to me - many decisions are exactly a belief (complete with Bayesian uncertainty). A belief in future action, to be sure, but it's distinct in time from the action itself.
I do agree with this as advice, in fact - many decisions one faces should be treated as a commitment rather than an ongoing reconsideration. It's not actuall...
This post will go over Dennett's views on animal experience (and to some extent, animal intelligence). This is not going to be an in-depth exploration of Daniel Dennett's thoughts over the decades, and will really only focus on those parts of his ideas that matter for the topic. This post has been written because I think a lot of Dennett's thoughts and theories are not talked about enough in the area of animal consciousness (or even just intelligence), and more than this, it's just fun and interesting to me.
It is worth noting that Dennett is known for being very confusing and esoteric and often seeming to contradict himself. I had to read not just his own writings but writings of those who knew him or attempted to...