It might be the case that what people find beautiful and ugly is subjective, but that's not an explanation of ~why~ people find some things beautiful or ugly. Things, including aesthetics, have causal reasons for being the way they are. You can even ask "what would change my mind about whether this is beautiful or ugly?". Raemon explores this topic in depth.
Scott Alexander has called for people to organize spring meetups, and this year Seattle's will be held at Stoup Brewing in Capitol Hill. I have reserved two tables; Stoup is known for being one of the quietest bar spaces in the city. I will be wearing a shirt and a blue sweater, so hopefully you’ll spot the group when you arrive.
Stoup Brewing offers a selection of both beer and non-alcoholic drinks. While the venue does not serve food, you are welcome to bring your own. Additionally, you are encouraged to bring board games to enjoy with fellow attendees. In previous years, Stoup has provided board games for patrons to borrow, but the availability of these games can be inconsistent.
For those driving to the event, please be aware that there are a few parking garages nearby; however, free parking is unfortunately not available in the area.
See: https://www.astralcodexten.com/p/spring-meetups-everywhere-2024-call
We’re in the north-west corner of Stoup.
This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:
I ask people not to create top-level comments here, but feel free to reply to comments like you would a FB post.
Yesterday I was at a "cultivating curiosity" workshop beta-test. One concept was "there are different mental postures you can adopt, which affect how easy it is to notice and cultivate curiosities."
It wasn't exactly the point of the workshop, but I ended up with several different "curiosity-postures", that were useful to try on while trying to lean into "curiosity" re: topics that I feel annoyed or frustrated or demoralized about.
The default stances I end up with when I Try To Do Curiosity On Purpose are something like:
1. Dutiful Curiosity (which is kinda ...
Book review: Deep Utopia: Life and Meaning in a Solved World, by Nick Bostrom.
Bostrom's previous book, Superintelligence, triggered expressions of concern. In his latest work, he describes his hopes for the distant future, presumably to limit the risk that fear of AI will lead to a Butlerian Jihad-like scenario.
While Bostrom is relatively cautious about endorsing specific features of a utopia, he clearly expresses his dissatisfaction with the current state of the world. For instance, in a footnoted rant about preserving nature, he writes:
...Imagine that some technologically advanced civilization arrived on Earth ... Imagine they said: "The most important thing is to preserve the ecosystem in its natural splendor. In particular, the predator populations must be preserved: the psychopath killers, the fascist goons, the despotic death squads ... What a tragedy if this rich natural diversity were replaced with a monoculture of
Thanks! I'm also uninterested in the question of whether it's possible. Obviously it is. The question is how we'll decide to use it. I think that answer is critical to whether we'd consider the results utopian. So, does he consider how we should or will use that ability?
Charbel-Raphaël Segerie and Épiphanie Gédéon contributed equally to this post.
Many thanks to Davidad, Gabriel Alfour, Jérémy Andréoletti, Lucie Philippon, Vladimir Ivanov, Alexandre Variengien, Angélina Gentaz, Simon Cosson, Léo Dana and Diego Dorn for useful feedback.
TLDR: We present a new method for safer-by-design AI development. We think using plainly coded AIs may be feasible in the near future and may be safe. We also present a prototype and research ideas on Manifund.
Epistemic status: Armchair reasoning style. We think the method we are proposing is interesting and could yield very positive outcomes (even though it is still speculative), but we are less sure about which safety policy would use it in the long run.
Current AIs are developed through deep learning: the AI tries something, gets it wrong, then...
Every time you use an AI tool to write a regex to replace your ML classifier, you're doing this.
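A toy illustration of the substitution described in that comment; the pattern and task here are hypothetical, chosen only to show what a plainly coded, inspectable rule looks like in place of a learned classifier:

```python
import re

# Hypothetical example: a transparent, hand-written rule standing in
# for an opaque ML spam classifier. Every branch of its behavior can
# be read directly off the pattern.
SPAM_PATTERN = re.compile(r"(free money|click here|act now)", re.IGNORECASE)

def is_spam(message: str) -> bool:
    """Plainly coded replacement for a learned spam classifier."""
    return bool(SPAM_PATTERN.search(message))

print(is_spam("Click HERE for free money!"))  # True
print(is_spam("Lunch at noon?"))              # False
```

The trade-off is the usual one: the regex is auditable and predictable but brittle, while the classifier it replaces generalizes but can't be inspected line by line.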
What monster downvoted this
Crosspost from my blog.
If you spend a lot of time in the blogosphere, you’ll find a great many people expressing contrarian views. If you hang out in the circles that I do, you’ve probably heard Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, and Caplan argue that mental illness is mostly just aberrant preferences and that education doesn’t work. Often, very smart people (like Robin Hanson) will write long posts defending these views, other people will write criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.
For...
Generally, hedgehogs are less trustworthy than foxes. If you frame a debate as a choice between believing a mainstream hedgehog position and a contrarian hedgehog position, you are often not holding the most accurate view.
Instead of thinking that either Matthew Walker or Guzey is right, maybe the truth lies somewhere in the middle and Guzey is pointing to real issues but exaggerating the effect.
I think most of the cases the OP lists are of that nature: there's a real effect, and the contrarian hedgehog position exaggerates it.
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee.
This post is a preview of our upcoming paper, which will provide more detail on our current understanding of refusal.
We thank Nina Rimsky and Daniel Paleka for the helpful conversations and review.
Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."
We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model...
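A minimal sketch of what "preventing the model from representing this direction" means geometrically: projecting the component along a given direction out of each activation vector. This is an illustrative NumPy toy, not the paper's implementation, and `direction` here stands in for a refusal direction that would have to be extracted from the model separately:

```python
import numpy as np

def ablate_direction(activations: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each activation vector along `direction`.

    activations: shape (batch, d_model); direction: shape (d_model,),
    standing in for a hypothetical 'refusal direction'.
    """
    direction = direction / np.linalg.norm(direction)  # ensure unit norm
    proj = activations @ direction                     # scalar projection per row
    return activations - np.outer(proj, direction)     # subtract that component

# Toy check: after ablation, nothing remains along the direction.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))
d = rng.normal(size=8)
out = ablate_direction(acts, d)
print(np.allclose(out @ (d / np.linalg.norm(d)), 0.0))  # True
```

Conversely, "artificially adding in this direction" corresponds to adding a scaled copy of the same unit vector back onto the activations.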
It was merged recently and included in a new release, so pip install transformer_lens should work now/soon (you want v1.16.0, I think); otherwise you can install from the GitHub repo.
current LLMs vs dangerous AIs
Most current "alignment research" with LLMs seems indistinguishable from "capabilities research". Both are just "getting the AI to be better at what we want it to do", and there isn't really a critical difference between the two.
Alignment in the original sense was defined oppositionally to the AI's own nefarious objectives. Which LLMs don't have, so alignment research with LLMs is probably moot.
something related I wrote in my MATS application:
I think the most important alignment failure modes occur when deploying an LLM as
D&D.Sci forces the reader to think harder than anything else on this website
D&D.Sci smoothly entices me towards thinking hard. There's lots of thinking hard that can be done when reading a good essay, but the default is always to read on (cf Feynman on reading papers) and often I just do that while skipping the thinking hard.