Gears-level models are expensive - often prohibitively expensive. Black-box approaches are usually much cheaper and faster. But black-box approaches rarely generalize - they're subject to Goodhart, need to be rebuilt when conditions change, don't identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.
The forum has been very much focused on AI safety for some time now, so I thought I'd post something different for a change. Privilege.
Here I define Privilege as an advantage over others that is invisible to the beholder. This may not be the only definition, or the central definition, or how you see it, but it's the definition I use for the purposes of this post. I also do not mean it in the culture-war sense, as a way to undercut others, as in "check your privilege". My point is that we all have some privileges [we are not aware of], and also that nearly every one has a flip side.
In some ways this is the inverse of The Lens That Does Not See Its Flaws: The...
I grew up knowing "privilege" to mean a special right that was granted to you based on your job/role (like free food for those who work at some restaurants) or perhaps granted by authorities due to good behavior (and would be taken away for misusing it). Note also that the word itself, "privi"-"lege", means "private law": a law that applies to you in particular.
Rights and laws are social things, defined by how others treat you. To say that your physical health is a privilege therefore seems like either a category error, or a claim that other pe...
Summary:
We think a lot about aligning AGI with human values. I think it’s more likely that we’ll try to make the first AGIs do something else. This might intuitively be described as trying to make instruction-following (IF) or do-what-I-mean-and-check (DWIMAC) the central goal of the AGI we design. Adopting this goal target seems to improve the odds of success of any technical alignment approach. It avoids the hard problem of specifying human values in an adequately precise and stable way, and it substantially helps with goal misspecification and deception by letting one treat the AGI as a collaborator in keeping it aligned as it becomes smarter and takes on more complex tasks.
This is similar but distinct from the goal targets of prosaic alignment efforts....
That's true, they are different. But search still provides the closest historical analogue (maybe employees/suppliers provide another). Historical analogues have the benefit of being empirical and grounded, so I prefer them over (or with) pure reasoning or judgement.
A couple of weeks ago three European economists published this paper studying the female income penalty after childbirth. The surprising headline result: there is no penalty.
The paper uses Danish data that tracks IVF treatments as well as a bunch of demographic factors and economic outcomes over 25 years. Lundborg et al. identify the causal effect of childbirth on female income using the success or failure of the first attempt at IVF as an instrument for fertility.
What does that mean? We can’t just compare women with children to those without them because having children is a choice that’s correlated with all of the outcomes we care about. So sorting out two groups of women based on observed fertility will also sort them based on income and...
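For readers who want the mechanics, here is a minimal two-stage least squares (2SLS) sketch of this kind of identification strategy on simulated data. Everything in it - variable names, effect sizes, the zero true effect - is an illustrative assumption of mine, not a number from Lundborg et al.

```python
# Minimal 2SLS sketch of an IVF-style instrument, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unobserved confounder: career orientation raises income and lowers fertility.
career = rng.normal(size=n)

# Instrument: success of the first IVF attempt, (as good as) random.
ivf_success = rng.binomial(1, 0.3, size=n)

# Fertility responds to the instrument and to the confounder.
has_child = ((0.8 * ivf_success - 0.3 * career + rng.normal(size=n)) > 0.5).astype(float)

# Income: true causal effect of a child is set to 0; the confounder drives income.
income = 0.0 * has_child + 1.0 * career + rng.normal(size=n)

# Naive OLS: biased downward (a spurious "child penalty").
X = np.column_stack([np.ones(n), has_child])
ols = np.linalg.lstsq(X, income, rcond=None)[0]

# Stage 1: predict fertility from the instrument alone.
Z = np.column_stack([np.ones(n), ivf_success])
stage1 = np.linalg.lstsq(Z, has_child, rcond=None)[0]
fitted = Z @ stage1

# Stage 2: regress income on predicted (instrument-driven) fertility.
X2 = np.column_stack([np.ones(n), fitted])
iv = np.linalg.lstsq(X2, income, rcond=None)[0]

print(f"naive OLS effect: {ols[1]:+.3f}")   # spurious negative "penalty"
print(f"2SLS (IV) effect: {iv[1]:+.3f}")    # close to the true 0
```

Because first-attempt IVF success shifts fertility but is plausibly independent of the confounder, the second-stage coefficient lands near the true effect even where naive OLS manufactures a spurious penalty.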
Yes, that was my first guess as well. Increased income from employment is most strongly associated with major changes, such as promotion to a new position with changed (and usually increased) responsibilities, or leaving one job and starting work somewhere else that pays more.
It seems plausible that these are not the sorts of changes that women are likely to seek out at the same rate when planning to devote a lot of time in the very near future to being a first-time parent. Some may, but all? Seems unlikely. Men seem more likely to continue to pursue such opportunities at a similar rate due to gender differences in child-rearing roles.
I don't know the answer, but it would be fun to have a Twitter comment with a zillion likes asking Sam Altman this question. Maybe someone should make one?
Firstly, I'm assuming that a high-resolution human brain emulation that you can run on a computer is conscious in the normal sense that we use in conversations. Like, it talks, has memories, makes new memories, has friends and hobbies and likes and dislikes and stuff. Just like a human that you could talk with only through a videoconference-type thing on a computer, but without an actual meaty human on the other end. It would be VERY weird if this emulation exhibited all these human qualities for some reason other than the reason meaty humans exhibit them. Like, very extremely what the fuck surprising. Do you agree?
So, we now have a deterministic human file on our hands.
Then, you can trivially make a transformer-like next-token predictor out of the human emulation. You just have the emulation,...
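A minimal sketch of the construction being gestured at here, with `run_emulation` as a hypothetical stand-in for the (wildly expensive) deterministic emulation - none of this is a real API:

```python
# Wrapping a deterministic emulation in a next-token-predictor interface.
from typing import Callable, List

Token = str

def make_next_token_predictor(
    run_emulation: Callable[[List[Token]], Token],
) -> Callable[[List[Token]], Token]:
    """Present a deterministic emulation as a next-token predictor."""
    def predict_next(context: List[Token]) -> Token:
        # Reset the em to its saved state, feed it the context, and return
        # whatever token it emits next. Determinism means the same context
        # always yields the same token, like greedy decoding from a fixed model.
        return run_emulation(context)
    return predict_next

# Toy stand-in "emulation" for illustration only.
def toy_em(context: List[Token]) -> Token:
    return "hello" if not context else context[-1]

predict = make_next_token_predictor(toy_em)
print(predict(["hi", "there"]))  # deterministic: always "there" for this input
```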
I don't expect this to "cash out" at all, which is rather the point.
The only really surprising part would be that we had any way at all to determine for certain whether some other system is conscious. That is, very similar (high) levels of surprisal for either "ems are definitely conscious" or "ems are definitely not conscious", but the ratio between them not being anywhere near "what the fuck" level.
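To put made-up numbers on that distinction (mine, purely illustrative): with surprisal measured in bits,

```latex
S(x) = -\log_2 P(x), \qquad \log_2 \frac{P(A)}{P(B)} = S(B) - S(A)
```

If $P(\text{ems definitely conscious}) = 2^{-20}$ and $P(\text{ems definitely not conscious}) = 2^{-21}$, both answers carry roughly 20 bits of surprisal, because any definitive answer is unexpected, yet the two hypotheses differ by only a single bit.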
As it stands, I can determine that I am conscious but I do not know how or why I am conscious. I have only a sample size of 1, and no way to access a lar...
[memetic status: stating directly despite it being a clear consequence of core AI risk knowledge because many people have "but nature will survive us" antibodies to other classes of doom and misapply them here.]
Unfortunately, no.[1]
Technically, “Nature”, meaning the fundamental physical laws, will continue. However, people usually mean forests, oceans, fungi, bacteria, and generally biological life when they say “nature”, and those would not have much chance competing against a misaligned superintelligence for resources like sunlight and atoms, which are useful to both biological and artificial systems.
There’s a thought that comforts many people when they imagine humanity going extinct due to a nuclear catastrophe or runaway global warming: Once the mushroom clouds or CO2 levels have settled, nature will reclaim the cities. Maybe mankind in our hubris will have wounded Mother Earth and paid the price ourselves, but...
Additionally, the AI might think it's in an alignment simulation and just leave the humans as is, or even nominally address their needs. This might be mentioned in the linked post, but I want to highlight it. Since we already run very low-fidelity alignment simulations when we train deceptive models, there is some reason for an AI to suspect this.
[Reminder: I am an internet weirdo with no medical credentials]
A few months ago, I published some crude estimates of the power of nitric oxide nasal spray to hasten recovery from illness, and speculated about what it could do prophylactically. While working on that piece, a nice man on Twitter alerted me to the fact that humming produces lots of nasal nitric oxide. This post is my very crude model of what kind of anti-viral gains we could expect from humming.
I’ve encoded my model at Guesstimate. The results are pretty favorable (average estimated impact of 66% reduction in severity of illness), but extremely sensitive to my made-up numbers. Efficacy estimates go from ~0 to ~95%, depending on how you feel about publication bias, what percent of Enovid’s impact...
My prior is that solutions contain on the order of 1% active ingredients, and of the things on the Enovid ingredients list, citric acid and NaNO2 are probably the reagents that create NO [1], which happens at a 5.5:1 mass ratio. 0.11 ppm·hr as an integral over time already means the solution is only around 0.01% NO by mass [1], which is 0.055% reagents by mass, probably a bit more because yield is not 100%. This is a bit low but believable. If the concentration were really only 0.88 ppm and dissipated quickly, it would be extremely dilute, which seems unlikely. T...
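A quick sanity-check of that arithmetic, using only the figures stated above (the 5.5:1 ratio and the ~0.01% NO by mass are taken as given, not independently re-derived from the chemistry):

```python
# Back-of-envelope check using the comment's own figures.
no_mass_fraction = 1e-4        # ~0.01% NO by mass, inferred above from 0.11 ppm·hr
reagent_to_no_ratio = 5.5      # stated mass ratio of (citric acid + NaNO2) to NO
reagent_fraction = no_mass_fraction * reagent_to_no_ratio
print(f"reagents by mass: {reagent_fraction:.4%}")              # -> 0.0550%

prior_fraction = 0.01          # prior: solutions ~1% active ingredients
print(f"fraction of the 1% prior: {reagent_fraction / prior_fraction:.1%}")
# ~5.5% of the prior - "a bit low but believable", more so since yield < 100%
```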
The movement to reduce AI x-risk is overly purist. This has led to the formation of a lot of sects, each maintaining its own platonic level of purity, and it is actively (greatly) harming the cause.
I think these were all legitimate responses to a perceived increase in risk, but ultimately did or will do more harm than good. Disclaimer: I am the least sure that the formation of Anthropic increases p(doom), but I speculate that, post-AGI, it will be seen...