It might be the case that what people find beautiful and ugly is subjective, but that's not an explanation of *why* people find some things beautiful or ugly. Things, including aesthetics, have causal reasons for being the way they are. You can even ask "what would change my mind about whether this is beautiful or ugly?". Raemon explores this topic in depth.

Raemon · 6m · 20
Yesterday I was at a "cultivating curiosity" workshop beta-test. One concept was "there are different mental postures you can adopt, that affect how easy it is to notice and cultivate curiosities." It wasn't exactly the point of the workshop, but I ended up with several different "curiosity-postures" that were useful to try on while trying to lean into "curiosity" re: topics that I feel annoyed or frustrated or demoralized about.

The default stances I end up with when I Try To Do Curiosity On Purpose are something like:

1. Dutiful Curiosity (which is kinda fake, although capable of being dissociatedly autistic and noticing lots of details that exist and questions I could ask)
2. Performatively Friendly Curiosity (also kinda fake, but does shake me out of my default way of relating to things. In this, I imagine saying to whatever thing I'm bored/frustrated with "hullo!" and try to acknowledge it and give it at least some chance of telling me things)

But some other stances to try on, that came up, were:

3. Curiosity like "a predator." "I wonder what that mouse is gonna do?"
4. Earnestly playful curiosity. "oh that [frustrating thing] is so neat, I wonder how it works! what's it gonna do next?"
5. Curiosity like "a lover". "What's it like to be that you? What do you want? How can I help us grow together?"
6. Curiosity like "a mother" or "father" (these feel slightly different to me, but each is treating [my relationship with a frustrating thing] like a small child who is a bit scared, who I want to help, who I am generally more competent than but still want to respect the autonomy of.)
7. Curiosity like "a competent but unemotional robot", who just algorithmically notices "okay, what are all the object-level things going on here, when I ignore my usual abstractions?"... and then "okay, what are some questions that seem notable?" and "what are my beliefs about how I can interact with this thing?" and "what can I learn about this thing that'd be useful for my goals?"
decision theory is no substitute for utility function

some people, upon learning about decision theories such as LDT and how it cooperates on problems such as the prisoner's dilemma, end up believing the following:

> my utility function is about what i want for just me; but i'm altruistic (/egalitarian/cosmopolitan/pro-fairness/etc) because decision theory says i should cooperate with other agents. decision-theoretic cooperation is the true name of altruism.

it's possible that this is true for some people, but in general i expect that to be a mistaken analysis of their values. decision theory cooperates with agents relative to how much power they have, and only when it's instrumental. in my opinion, real altruism (/egalitarianism/cosmopolitanism/fairness/etc) should be in the utility function which the decision theory is instrumental to. i actually intrinsically care about others; i don't just care about others instrumentally because it helps me somehow.

some important ways in which my utility-function-altruism differs from decision-theoretic cooperation include:

* i care about people weighed by moral patienthood; decision theory only cares about agents weighed by negotiation power. if an alien superintelligence is very powerful but isn't a moral patient, then i will only cooperate with it instrumentally (for example because i care about the alien moral patients that it has been in contact with); if cooperating with it doesn't help my utility function (which, again, includes altruism towards aliens) then i won't cooperate with that alien superintelligence. corollarily, i will take actions that cause nice things to happen to people even if they're very impoverished (and thus don't have much LDT negotiation power) and it doesn't help any other aspect of my utility function than just the fact that i value that they're okay.
* if i can switch to a better decision theory, or if fucking over some non-moral-patienty agents helps me somehow, then i'll happily do that; i don't have goal-content integrity about my decision theory. i do have goal-content integrity about my utility function: i don't want to become someone who wants moral patients to unconsentingly-die or suffer, for example.
* there seems to be a sense in which some decision theories are better than others, because they're ultimately instrumental to one's utility function. utility functions, however, don't have an objective measure for how good they are. hence, moral anti-realism is true: there isn't a Single Correct Utility Function.

decision theory is instrumental; the utility function is where the actual intrinsic/axiomatic/terminal goals/values/preferences are stored. usually, i also interpret "morality" and "ethics" as "terminal values", since most of the stuff that those seem to care about looks like terminal values to me. for example, i will want fairness between moral patients intrinsically, not just because my decision theory says that that's instrumental to me somehow.
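A toy sketch of the distinction being drawn, with made-up payoff numbers (this is not from the shortform and does not implement LDT itself): an agent whose utility function terminally values the other player cooperates in a one-shot prisoner's dilemma regardless of the other player's power or choice, whereas a purely selfish agent would need decision-theoretic reasons to cooperate.

```python
# Toy illustration: altruism in the utility function changes the payoffs
# themselves, whereas decision-theoretic cooperation keeps selfish payoffs
# and cooperates only when it's instrumental. Payoff numbers are made up.

PAYOFF = {  # (my_move, their_move) -> (my_payoff, their_payoff)
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def selfish_utility(mine, theirs):
    return mine

def altruistic_utility(mine, theirs, weight=1.0):
    return mine + weight * theirs  # others' welfare is terminally valued

def best_move(utility, their_move):
    return max("CD", key=lambda m: utility(*PAYOFF[(m, their_move)]))

# A selfish agent defects whatever the (powerless) other player does...
print(best_move(selfish_utility, "C"), best_move(selfish_utility, "D"))      # D D
# ...while an altruistic agent cooperates even when nothing is reciprocated.
print(best_move(altruistic_utility, "C"), best_move(altruistic_utility, "D"))  # C C
```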
current LLMs vs dangerous AIs

Most current "alignment research" with LLMs seems indistinguishable from "capabilities research". Both are just "getting the AI to be better at what we want it to do", and there isn't really a critical difference between the two.

Alignment in the original sense was defined oppositionally to the AI's own nefarious objectives. Which LLMs don't have, so alignment research with LLMs is probably moot.

something related I wrote in my MATS application (a rough sketch of the agent shape in point 1 follows after the list):

1. I think the most important alignment failure modes occur when deploying an LLM as part of an agent (i.e. a program that autonomously runs a limited-context chain of thought from LLM predictions, maintains a long-term storage, and calls functions such as search over storage, self-prompting, and habit modification, either based on LLM-generated function calls or as cron-jobs/hooks).
2. These kinds of alignment failures are (1) only truly serious when the agent is somehow objective-driven or equivalently has feelings, which current LLMs have not been trained to be (I think that would need some kind of online learning, or learning to self-modify), and (2) can only be solved when the agent is objective-driven.
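A rough sketch of that agent shape, under loose assumptions: an LLM call driving a limited-context loop with long-term storage, retrieval over that storage, and a place where self-prompting or habit updates would go. call_llm and search are hypothetical stand-ins, not any particular API.

```python
# Minimal sketch of an LLM agent loop: limited-context reasoning over a task
# plus retrieved memories, with actions written back to long-term storage.
from typing import List


def call_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call; returns a canned action so the sketch runs."""
    return "SEARCH: alignment notes"


def search(storage: List[str], query: str) -> List[str]:
    """Naive keyword search over long-term storage."""
    return [item for item in storage if any(word in item for word in query.split())]


def agent_step(storage: List[str], task: str) -> str:
    # Limited-context chain of thought: just the task plus a few retrieved memories.
    context = "\n".join(search(storage, task)[-3:])
    action = call_llm(f"Task: {task}\nMemories:\n{context}\nNext action?")
    if action.startswith("SEARCH: "):
        query = action[len("SEARCH: "):]
        hits = search(storage, query)
        storage.append(f"searched for {query!r}, found {len(hits)} item(s)")
    else:
        # Self-prompting / habit modification would also write back to storage here.
        storage.append(f"did: {action}")
    return action


if __name__ == "__main__":
    memory = ["alignment notes: keep the agent corrigible"]
    print(agent_step(memory, "summarize my alignment notes"))
    print(memory)
```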
The cost of goods has the same units as the cost of shipping: $/kg. Referencing between them lets you understand how the economy works, e.g. why construction material sourcing and drink bottling have to be local, but oil tankers exist.

* An iPhone costs $4,600/kg, about the same as SpaceX charges to launch it to orbit. [1]
* Beef, copper, and off-season strawberries are $11/kg, about the same as a 75kg person taking a three-hour, 250km Uber ride costing $3/km.
* Oranges and aluminum are $2-4/kg, about the same as flying them to Antarctica. [2]
* Rice and crude oil are ~$0.60/kg, about the same as the $0.72 it costs to ship them 5000km across the US via truck. [3,4] Palm oil, soybean oil, and steel are around this price range, with wheat being cheaper. [3]
* Coal and iron ore are $0.10/kg, significantly more than the cost of shipping them around the entire world via smallish (Handysize) bulk carriers. Large bulk carriers are another 4x more efficient. [6]
* Water is very cheap, with tap water at $0.002/kg in NYC. [5] But shipping via tanker is also very cheap, so you can ship it maybe 1000 km before equaling its cost.

It's really impressive that for the price of a winter strawberry, we can ship a strawberry-sized lump of coal around the world 100-400 times. (A quick check of a couple of these comparisons is sketched below, after the footnotes.)

[1] iPhone is $4600/kg, large launches sell for $3500/kg, and rideshares for small satellites $6000/kg. Geostationary orbit is more expensive, so it's okay for GPS satellites to cost more than an iPhone per kg, but Starlink wants to be cheaper.
[2] https://fred.stlouisfed.org/series/APU0000711415. Can't find numbers, but Antarctica flights cost $1.05/kg in 1996.
[3] https://www.bts.gov/content/average-freight-revenue-ton-mile
[4] https://markets.businessinsider.com/commodities
[5] https://www.statista.com/statistics/1232861/tap-water-prices-in-selected-us-cities/
[6] https://www.researchgate.net/figure/Total-unit-shipping-costs-for-dry-bulk-carrier-ships-per-tkm-EUR-tkm-in-2019_tbl3_351748799
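A quick back-of-the-envelope check of two of the comparisons above, using only figures quoted in the post:

```python
# Quick check of the $/kg comparisons, using only numbers quoted above.

def per_kg(total_cost_usd: float, mass_kg: float) -> float:
    return total_cost_usd / mass_kg

# Uber: a 75 kg person on a 250 km ride at $3/km.
uber = per_kg(3 * 250, 75)            # -> $10/kg, comparable to beef at ~$11/kg

# Truck freight: $0.72 to move 1 kg across 5000 km of the US.
truck_rate = 0.72 / (1 * 5000)        # -> ~$0.000144 per kg-km
rice_break_even_km = 0.60 / truck_rate  # distance at which shipping equals rice's price

print(f"Uber: ${uber:.0f}/kg, truck rate: ${truck_rate:.6f}/kg-km, "
      f"rice break-even: {rice_break_even_km:.0f} km")
```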
Mati_Roy · 1d · 171
it seems to me that disentangling beliefs and values is an important part of being able to understand each other, and using words like "disagree" to mean both "different beliefs" and "different values" is really confusing in that regard

Popular Comments

Recent Discussion

Scott Alexander has called for people to organize a spring meetup, and this year's Seattle meetup will be held at Stoup Brewing in Capitol Hill. I have reserved two tables; Stoup is known for being one of the quietest bar spaces in the city. I will be wearing a shirt and a blue sweater; hopefully you’ll see the group when you arrive.

Stoup Brewing offers a selection of both beer and non-alcoholic drinks. While the venue does not serve food, you are welcome to bring your own. Additionally, you are encouraged to bring board games to enjoy with fellow attendees. In previous years, Stoup has provided board games for patrons to borrow, but the availability of these games can be inconsistent.

For those driving to the event, please be aware that there are a few parking garages nearby; however, free parking is unfortunately not available in the area.

See: https://www.astralcodexten.com/p/spring-meetups-everywhere-2024-call

We’re in the north-west corner of Stoup

1 · Nikita Sokolsky · 1h
Suggested discussion questions / ice breakers for today's meetup, assembled from ACX posts in the past 6 months. See you all in one hour :-)

1. What was the most interesting question for you from the recent ACX survey?
2. What do you think of the Coffeepocalypse argument in relation to AI risk?
3. Do you agree with the Robin Hanson idea that (more) medicine doesn’t work?
4. Do you like the “Ye Olde Bay Area House Party” series of posts?
5. What do you think about the Lumina Probiotic? Are you planning to order it in the future?
6. What’s your position on the COVID lab leak debate?
7. Do you like prediction markets? What was a prediction you’ve made in the past year that you’re proud of?
8. What book would you review for the ACX book review contest if you were to write one?
9. Do you believe that capitalism is more effective than charity in solving world poverty?
10. Which dictator did you find the most interesting from the “Dictator Book Club” series?

This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:

  1. don't feel ready to be written up as a full post
  2. might be made worse by the process of writing them up (i.e. longer than they need to be)

I ask people not to create top-level comments here, but feel free to reply to comments like you would a FB post.

Raemon · 6m · 20

Yesterday I was at a "cultivating curiosity" workshop beta-test. One concept was "there are different mental postures you can adopt, that affect how easy it is to notice and cultivate curiosities."

It wasn't exactly the point of the workshop, but I ended up with several different "curiosity-postures" that were useful to try on while trying to lean into "curiosity" re: topics that I feel annoyed or frustrated or demoralized about.

The default stances I end up with when I Try To Do Curiosity On Purpose are something like:

1. Dutiful Curiosity (which is kinda ... (read more)

Book review: Deep Utopia: Life and Meaning in a Solved World, by Nick Bostrom.

Bostrom's previous book, Superintelligence, triggered expressions of concern. In his latest work, he describes his hopes for the distant future, presumably to limit the risk that fear of AI will lead to a Butlerian Jihad-like scenario.

While Bostrom is relatively cautious about endorsing specific features of a utopia, he clearly expresses his dissatisfaction with the current state of the world. For instance, in a footnoted rant about preserving nature, he writes:

Imagine that some technologically advanced civilization arrived on Earth ... Imagine they said: "The most important thing is to preserve the ecosystem in its natural splendor. In particular, the predator populations must be preserved: the psychopath killers, the fascist goons, the despotic death squads ... What a tragedy if this rich natural diversity were replaced with a monoculture of

...

Thanks! I'm also uninterested in the question of whether it's possible. Obviously it is. The question is how we'll decide to use it. I think that answer is critical to whether we'd consider the results utopian. So, does he consider how we should or will use that ability?

2 · Said Achmiz · 36m
I recommend Gwern’s discussion of pain to anyone who finds this sort of proposal intriguing (or anyone who is simply interested in the subject).
2 · Said Achmiz · 41m
I would go further, and say that replacing human civilization with “a monoculture of healthy, happy, well-fed people living in peace and harmony” does in fact sound very bad. Never mind these aliens (who cares what they think?); from our perspective, this seems like a bad outcome. Not by any means the worst imaginable outcome… but still bad.
2 · Said Achmiz · 43m
If they’re merely opining, then why should we be appalled? Why would we even care? Let them opine to one another; it doesn’t affect us. If they’re intervening (without our consent), then obviously this is a violation of our sovereignty and we should treat it as an act of war. In any case, one “preserves” what one owns. These hypothetical advanced aliens are speaking as if they own us and our planet. This is obviously unacceptable as far as we’re concerned, and it would behoove us in this case to disabuse these aliens of such a notion at our earliest convenience. Conversely, it makes perfect sense to speak of humans as collectively owning the natural resources of the Earth, including all the animals and so on. As such, wishing to preserve some aspects of it is entirely reasonable. (Whether we ultimately choose to do so is another question—but that it’s a question for us to answer, according to our preferences, is clear enough.)

Charbel-Raphaël Segerie and Épiphanie Gédéon contributed equally to this post. 
Many thanks to Davidad, Gabriel Alfour, Jérémy Andréoletti, Lucie Philippon, Vladimir Ivanov, Alexandre Variengien, Angélina Gentaz, Simon Cosson, Léo Dana and Diego Dorn for useful feedback.

TLDR: We present a new method for safer-by-design AI development. We think using plainly coded AIs may be feasible in the near future and may be safe. We also present a prototype and research ideas on Manifund.

Epistemic status: Armchair reasoning style. We think the method we are proposing is interesting and could yield very positive outcomes (even though it is still speculative), but we are less sure about which safety policy would use it in the long run.

Current AIs are developed through deep learning: the AI tries something, gets it wrong, then...

RedMan · 43m · 20

Every time you use an AI tool to write a regex to replace your ML classifier, you're doing this.
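A toy example of the kind of swap being described (the pattern and inputs are made up for illustration, not taken from the post): a plainly coded regex doing a classification job one might otherwise train a model for.

```python
# Toy illustration: a plainly coded classifier (a regex) standing in for
# what might otherwise be a learned model.
import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def is_email(text: str) -> bool:
    return bool(EMAIL_RE.match(text))

print(is_email("alice@example.com"))  # True
print(is_email("not an address"))     # False
```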

7 · Rafael Kaufmann Nedal · 9h
@Épiphanie Gédéon this is great, very complementary/related to what we've been developing for the Gaia Network. I'm particularly thrilled to see the focus on simplicity and incrementalism, as well as the willingness to roll up one's sleeves and write code (often sorely lacking in LW). And I'm glad that you are taking the map/territory problem seriously; I wholeheartedly agree with the following: "Most safe-by-design approaches seem to rely heavily on formal proofs. While formal proofs offer hard guarantees, they are often unreliable because their model of reality needs to be extremely close to reality itself and very detailed to provide assurance." A few additional thoughts:

* To scale this approach, one will want to have "structural regularizers" towards modularity, interoperability and parsimony. Two of those we have strong opinions on are:
  * A preference for reusing shared building blocks and building bottom-up. As a decentralized architecture, we implement this preference in terms of credit assignment, specifically free energy flow accounting.
  * Constraints on the types of admissible model code. We have strongly advocated for probabilistic causal models expressed as probabilistic programs. This enables both a shared statistical notion of model grounding (effectively backing the free energy flow accounting as approximate Bayesian inference of higher-order model structure) and a shared basis for defining and evaluating policy spaces (instantly turning any descriptive model into a usable substrate for model-based RL / active inference).
* Learning models from data is super powerful as far as it goes, but it's sometimes necessary -- and often orders of magnitude more efficient -- to leverage prior knowledge. Two simple and powerful ways to do it, which we have successfully experimented with, are:
  * LLM-driven model extraction from scientific literature and other sources of causal knowledge. This is crucial to bootstrap the component library. (See also

What monster downvoted this

1 · lukehmiles · 3h
Hmm I think the damaging effect would occur over many years but mainly during puberty. It looks like there's only two studies they mention lasting over a year. One found a damaging effect and the other found no effect.

Crosspost from my blog.  

If you spend a lot of time in the blogosphere, you’ll find a great deal of people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, and Caplan argue that mental illness is mostly just aberrant preferences and that education doesn’t work, among various other contrarian views. Often, very smart people—like Robin Hanson—will write long posts defending these views, other people will have criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.

For...

1 · StartAtTheEnd · 11h
It all depends on the topic. It's unlikely that the consensus in objective fields like mathematics or physics is wrong. The more subjective, controversial, and political something is, and the more profit and power lies in controlling the consensus, the more skepticism is appropriate. The bias on Wikipedia (used as an example) is correlated in this manner: CW topics have a lot of misinformation, while things that people aren't likely to feel strongly about are written more honestly. If some redpills or blackpills turned out to be true, or some harsh-sounding aspects of reality related to discrimination, selection, biases or differences in humans turned out to be true, or some harsh philosophy like "suffering is good for you", "poverty is negatively correlated with virtuous acts" or "people unconsciously want to be ruled" turned out to be true, would you hear about it from somebody with a good reputation?

I also think it's worth noting that both the original view and the contrarian view might be overstated: that education is neither useless nor as good as we make it out to be. I've personally found myself annoyed at exaggerations like "X is totally safe, it never has any side-effects" or "people basically never do Y, it is less likely than being hit by lightning" (despite millions of people participating because it's relevant for their future, thousands of whom are mentally ill by statistical necessity). This has made me want to push back, but the opposing evidence is likely exaggerated or cherry-picked as well, since people feel strongly about various conflicts.

The optimization target is Truth only to the extent that Truth is rewarded. If something else has a higher priority, then the truth will be distorted. But due to the broken-windows theory, it might be better to trust society too much rather than too little. I don't want to spread doubt; it might be harmful even in the case that I'm right.
1 · Andrew Burns · 3h
The roundness of the earth is not a point upon which any political philosophy hinges, yet flat earthism is a thing. The roundness is not subjective, it isn't controversial, and it does not advance anyone's economic interest. So why do people engage in this sort of contrarianism? I speculate that the act of being a contrarian signals to others that you question authority. The bigger the consensus challenged, the more disdain for authority shown. One's willingness to question authority is often used as a proxy for "independent thinking." The thought is that someone who questions authority might be more likely to accept new evidence. But questioning authority is not the same as being an independent thinker, and so, when taken to its extreme, it leads to denying reality, because isn't reality the ultimate authority?
2 · tailcalled · 16h
Definitely relevant to figure out what's true when one is only talking about the object level, but the OP was about how trustworthy contrarians are compared to the mainstream rather than simply being about the object level.

Generally, hedgehogs are less trustworthy than foxes. If you see a debate as being about either believing in a mainstream hedgehog position or a contrarian hedgehog position, you are often not having the most accurate view.

Instead of thinking that either Matthew Walker or Guzey is right, maybe the truth lies somewhere in the middle and Guzey is pointing to real issues but exaggerating the effect.

I think most of the cases the OP lists are of that nature: there's an effect, and the hedgehog contrarian position exaggerates that effect.


This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee.

This post is a preview for our upcoming paper, which will provide more detail on our current understanding of refusal.

We thank Nina Rimsky and Daniel Paleka for the helpful conversations and review.

Executive summary

Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."

We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model...

Neel Nanda · 1h · Ω440

It was added recently and just included in a new release, so pip install transformer_lens should work now/soon (you want v1.16.0, I think); otherwise you can install from the GitHub repo.
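For concreteness, a minimal sketch of picking up that release. It assumes v1.16.0 is the release in question and that the Llama 3 support discussed in this thread ships with it; the gated Meta model name is given only as an example and requires HuggingFace access.

```python
# Minimal sketch (assumptions: v1.16.0 is the release Neel refers to, and it
# includes the Llama 3 support discussed here).
#   pip install "transformer_lens>=1.16.0"
from importlib.metadata import version
from transformer_lens import HookedTransformer

print(version("transformer_lens"))  # check which release got installed

# Example model name (the gated Meta repo); requires HuggingFace access.
model = HookedTransformer.from_pretrained("meta-llama/Meta-Llama-3-8B")
```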

1 · lukehmiles · 4h
The "love minus hate" thing really holds up
1 · Andy Arditi · 5h
A good incentive to add Llama 3 to TL ;) We run our experiments directly using PyTorch hooks on HuggingFace models. The linked demo is implemented using TL for simplicity and clarity.
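For readers curious what that looks like in practice, here is a rough sketch (not the authors' code) of a hook-based intervention of the kind the executive summary describes: projecting a unit-norm "refusal direction" out of a layer's output. A tiny Linear layer stands in for a transformer block.

```python
# Rough sketch of a PyTorch forward hook that removes a single direction
# from a layer's output activations: x <- x - (x . r_hat) r_hat.
import torch
import torch.nn as nn

d_model = 16
layer = nn.Linear(d_model, d_model)           # stand-in for a transformer block
direction = torch.randn(d_model)
direction = direction / direction.norm()      # unit-norm candidate "refusal direction"

def ablate_direction(module, inputs, output):
    coeff = output @ direction                # projection coefficient per position
    return output - coeff.unsqueeze(-1) * direction

handle = layer.register_forward_hook(ablate_direction)

x = torch.randn(2, d_model)
y = layer(x)
print((y @ direction).abs().max())            # ~0: the direction has been removed
handle.remove()
```

Adding the direction back in with a positive coefficient would be the complementary intervention the executive summary mentions.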
3 · lc · 5h
Stop posting prompt injections on Twitter and calling it "misalignment" 

current LLMs vs dangerous AIs

Most current "alignment research" with LLMs seems indistinguishable from "capabilities research". Both are just "getting the AI to be better at what we want it to do", and there isn't really a critical difference between the two.

Alignment in the original sense was defined oppositionally to the AI's own nefarious objectives. Which LLMs don't have, so alignment research with LLMs is probably moot.

something related I wrote in my MATS application:


  1. I think the most important alignment failure modes occur when deploying an LLM as

... (read more)
abstractapplic

I have thoughts about "Mediums Overpower Messages". I would spell them out, but I'd be interested in trying to Socrates you about them. That work for you?

lsusr

I love Socrating.

abstractapplic

My reaction to that post was to take it as a massive (and at least slightly unfair) compliment. Can you figure out why?

lsusr

Well, I know you as the guy who writes D&D.Sci. Is that going in the right direction?

abstractapplic

Yes. That's it, but I'd like to see if you can figure out both reasons that made me take it as a compliment. (Pretty sure you already have the easy one but please spell it out.)

lsusr

This website is dominated by non-fiction dialectic essays. Basically, a claim is stated in the title and then argued for in the body.

In my post about

...
kave · 2h · 20

> D&D.Sci forces the reader to think harder than anything else on this website

D&D.Sci smoothly entices me towards thinking hard. There's lots of thinking hard that can be done when reading a good essay, but the default is always to read on (cf Feynman on reading papers) and often I just do that while skipping the thinking hard.
