Quick Takes

decision theory is no substitute for utility function

some people, upon learning about decision theories such as LDT and how it cooperates on problems such as the prisoner's dilemma, end up believing the following:

my utility function is about what i want for just me; but i'm altruistic (/egalitarian/cosmopolitan/pro-fairness/etc) because decision theory says i should cooperate with other agents. decision-theoretic cooperation is the true name of altruism.

it's possible that this is true for some people, but in general i expect that to be a mistaken anal... (read more)

Agreed code as coordination mechanism

Code nowadays can do lots of things, from buying items to controlling machines. This makes code a possible coordination mechanism: if you can get multiple people to agree on what code should be run in particular scenarios and situations, that code can take the actions that need coordinating on behalf of those people.

This would require moving away from the “one person committing code and another person reviewing” model.

This could start with many people reviewing the code, people could write their own t... (read more)

faul_sname (3d):
Can you give a concrete example of a situation where you'd expect this sort of agreed-upon-by-multiple-parties code to be run, and what that code would be responsible for doing? I'm imagining something along the lines of "given a geographic boundary, determine which jurisdictions that boundary intersects for the purposes of various types of tax (sales, property, etc)". But I don't know if that's wildly off from what you're imagining.

Looks like someone has worked on this kind of thing for different reasons: https://www.worlddriven.org/

Will_Pearson (3d):
I was thinking that evals controlling the deployment of LLMs could be something that needs multiple stakeholders to agree upon. But really it is a general use pattern.
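A minimal sketch (mine, not from the posts above) of what such a multi-stakeholder gate could look like: an action, such as deploying a model once agreed evals pass, only runs after enough pre-registered reviewers have approved the exact code that will execute. The stakeholder names, the threshold, and the Python framing are all illustrative assumptions.

# Illustrative sketch only: a coordinated action runs only if enough
# pre-registered stakeholders have approved the exact code being executed.
# Stakeholder names and the threshold are made-up examples.
import hashlib

STAKEHOLDERS = {"alice", "bob", "carol"}  # parties whose agreement is required
THRESHOLD = 2                             # approvals needed before execution

def code_hash(source: str) -> str:
    """Identify the agreed-upon code by a hash of its exact source text."""
    return hashlib.sha256(source.encode()).hexdigest()

def may_execute(source: str, approvals: dict) -> bool:
    """approvals maps a stakeholder's name to the hash of the code they reviewed."""
    expected = code_hash(source)
    valid = {name for name, h in approvals.items()
             if name in STAKEHOLDERS and h == expected}
    return len(valid) >= THRESHOLD

# Example: a deployment script runs only once two stakeholders have
# signed off on this exact version of it.
deploy_script = "print('running agreed eval suite, then deploying model')"
approvals = {"alice": code_hash(deploy_script), "bob": code_hash(deploy_script)}
if may_execute(deploy_script, approvals):
    exec(deploy_script)  # stands in for the coordinated action

In practice the approvals would presumably be cryptographic signatures rather than bare hashes, but the shape of the coordination is the same.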
Raemon (9d):

What would a "qualia-first-calibration" app look like?

Or, maybe: "metadata-first calibration"

The thing with putting probabilities on things is that often, the probabilities are made up. And the final probability throws away a lot of information about where it actually came from.

I'm experimenting with primarily focusing on "what are all the little-metadata-flags associated with this prediction?". I think some of this is about "feelings you have" and some of it is about "what do you actually know about this topic?"

The sort of app I'm imagining would he... (read more)

"what are all the little-metadata-flags associated with this prediction?"

Some metadata flags I associate with predictions (a rough code sketch follows this list):

  • what kinds of evidence went into this prediction? ('did some research', 'have seen things like this before', 'mostly trusting/copying someone else's prediction')
    • if I'm taking other people's predictions into account, there's a metadata flag for 'what would my prediction be if I didn't consider other people's predictions?'
  • is this a domain in which I'm well calibrated?
  • is my prediction likely to change a lot, or have I already seen most of the evidence that I expect to for a while?
  • how important is this?
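A rough sketch (my own, not Raemon's) of how an app might store a prediction together with the flags above; the field names and types are illustrative assumptions.

# Illustrative sketch only: one way a calibration app could keep the
# metadata flags listed above attached to each prediction.
# Field names are assumptions, not part of the original proposal.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Prediction:
    claim: str
    probability: float                                  # the headline number
    evidence_kinds: list = field(default_factory=list)  # 'did some research', 'copying someone else', ...
    independent_probability: Optional[float] = None     # my prediction ignoring other people's
    well_calibrated_domain: Optional[bool] = None       # am I calibrated in this domain?
    expect_major_updates: bool = False                  # likely to change a lot, or evidence mostly in?
    importance: int = 0                                 # how much this matters, say 0-10
    feelings: list = field(default_factory=list)        # qualia tags: 'dread', 'excited', 'fuzzy', ...

example = Prediction(
    claim="The project ships by June",
    probability=0.6,
    evidence_kinds=["have seen things like this before"],
    independent_probability=0.5,
    well_calibrated_domain=False,
    expect_major_updates=True,
    importance=7,
    feelings=["mild dread"],
)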

So the usual refrain from Zvi and others is that the specter of China beating us to the punch with AGI is not real because of limits on compute, etc. I think Zvi has tempered his position on this in light of Meta's promise to release the weights of its 400B+ model. Now there is word that SenseTime just released a model that beats GPT-4 Turbo on various metrics. Of course, maybe Meta chooses not to release its big model, and maybe SenseTime is bluffing--I would point out though that Alibaba's Qwen model seems to do pretty okay in the arena...anyway, my point is that I don't think the "what if China" argument can be dismissed as quickly as some people on here seem to be ready to do.

Seth Herd (3h):
Are you saying that China will use Llama 3 400B weights as a basis for improving their research on LLMs? Or to make more tools from? Or to reach real AGI? Or what?

Yes, yes. Probably not. And they already have a Sora clone called Vidu, for heaven's sake.

We spend all this time debating: should greedy companies be in control, should government intervene, will intervention slow progress to the good stuff: cancer cures, longevity, etc. All of these arguments assume that WE (which I read as a gloss for the West) will have some say in the use of AGI. If the PRC gets it, and it is as powerful as predicted, these arguments become academic. And this is not because the Chinese are malevolent. It's because AGI would fall into ... (read more)

I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

  • By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
  • A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI
... (read more)
ryan_greenblatt (1d):
  • My current guess is that max good and max bad seem relatively balanced. (Perhaps max bad is 5x more bad/flop than max good in expectation.)
  • There are two different (substantial) sources of value/disvalue: interactions with other civilizations (mostly acausal, maybe also aliens) and what the AI itself terminally values.
  • On interactions with other civilizations, I'm relatively optimistic that commitment races and threats don't destroy as much value as acausal trade generates on some general view like "actually going through with threats is a waste of resources". I also think it's very likely relatively easy to avoid precommitment issues via very basic precommitment approaches that seem (IMO) very natural. (Specifically, you can just commit to "once I understand what the right/reasonable precommitment process would have been, I'll act as though this was always the precommitment process I followed, regardless of my current epistemic state." I don't think it's obvious that this works, but I think it probably works fine in practice.)
  • On terminal value, I guess I don't see a strong story for extreme disvalue as opposed to mostly expecting approximately no value with some chance of some value. Part of my view is that just relatively "incidental" disvalue (like the sort you link to Daniel Kokotajlo discussing) is likely way less bad/flop than maximum good/flop.
Wei Dai (11h):

Thank you for detailing your thoughts. Some differences for me:

  1. I'm also worried about unaligned AIs as competitors to aligned AIs/civilizations in the acausal economy/society. For example, suppose there are vulnerable AIs "out there" that can be manipulated/taken over via acausal means; an unaligned AI could compete with us (and with others with better values from our perspective) in the race to manipulate them.
  2. I'm perhaps less optimistic than you about commitment races.
  3. I have some credence on max good and max bad being not close to balanced, that additi
... (read more)
Quinn (1d):
sure -- i agree; that's why i said "something adjacent to", because it had enough overlap in properties. I think my comment completely stands with a different word choice, I'm just not sure what word choice would do a better job.

My current main cruxes:

  1. Will AI get takeover capability? When?
  2. Single ASI or many AGIs?
  3. Will we solve technical alignment?
  4. Value alignment, intent alignment, or CEV?
  5. Defense>offense or offense>defense?
  6. Is a long-term pause achievable?

If there is reasonable consensus on any one of those, I'd much appreciate knowing about it. Otherwise, I think these should be research priorities.

I offer no consensus, but my own opinions:

Will AI get takeover capability? When?

0-5 years.

Single ASI or many AGIs?

There will be a first ASI that "rules the world" because its algorithm or architecture is so superior. If there are further ASIs, that will be because the first ASI wants there to be. 

Will we solve technical alignment?

Contingent. 

Value alignment, intent alignment, or CEV?

For an ASI you need the equivalent of CEV: values complete enough to govern an entire transhuman civilization. 

Defense>offense or offense>defense?

Of... (read more)

AGI doom by noise-cancelling headphones:                                                                            

ML is already used to train which sound waves to emit to cancel those from the environment. This works well with constant, low-entropy sound waves that are easy to predict, but not with high-entropy sounds like speech. Bose or Soundcloud or whoever train very hard on... (read more)


FWIW it was obvious to me

Seth Herd (12d):
Thanks! A joke explained will never get a laugh, but I did somehow get a cackling laugh from your explanation of the joke.

I think I didn't get it because I don't think the trend line breaks. If you made a good enough noise reducer, it might well develop smart and distinct enough simulations that one would gain control of the simulator and potentially, from there, the world. See "A smart enough LLM might be deadly simply if you run it for long enough" if you want to hurt your head on this.

I've thought about it a little because it's interesting, but not a lot because I think we are probably killed by agents we made deliberately long before we're killed by accidentally emerging ones.
faul_sname (15d):
Fixed, thanks
nim (14h):

I've found an interesting "bug" in my cognition: a reluctance to rate subjective experiences on a subjective scale useful for comparing them. When I fuzz this reluctance against many possible rating scales, I find that it seems to arise from the comparison-power itself.

The concrete case is that I've spun up a habit tracker on my phone and I'm trying to build a routine of gathering some trivial subjective-wellbeing and lifestyle-factor data into it. My prototype of this system includes tracking the high and low points of my mood through the day as recalled ... (read more)

dirk (1d):

I'm not alexithymic; I directly experience my emotions and have, additionally, introspective access to my preferences. However, some things manifest directly as preferences which, I have been shocked to realize in my old age, were in fact emotions all along. (In rare cases these are even stronger than the directly-felt ones, despite reliably seeming on initial inspection to be simply neutral metadata.)

Viliam (1d):
Specific examples would be nice. Not sure if I understand correctly, but I imagine something like this: You always choose A over B. You have been doing it for such a long time that you forgot why. Without reflecting on this directly, it just seems like there probably is a rational reason or something. But recently, either accidentally or by experiment, you chose B... and realized that experiencing B (or expecting to experience B) creates unpleasant emotions. So now you know that the emotions were the real cause of choosing A over B all that time. (This is probably wrong, but hey, people say that the best way to elicit an answer is to provide a wrong one.)

Here's an example for you: I used to turn the faucet on while going to the bathroom, thinking it was due simply to having a preference for somewhat-masking the sound of my elimination habits from my housemates. Then one day I walked into the bathroom listening to something-or-other via earphones and forgot to turn the faucet on, only to realize about halfway through that apparently I didn't actually much care about such masking; previously, being able to hear myself just seemed to trigger some minor anxiety about it I'd failed to recognize, though its ab... (read more)

dirk (1d):

I'm against intuitive terminology [epistemic status: 60%] because it creates the illusion of transparency; opaque terms make it clear you're missing something, but if you already have an intuitive definition that differs from the author's, it's easy to substitute yours in without realizing you've misunderstood.

cubefox (21h):
I agree. This is unfortunately often done in various fields of research where familiar terms are reused as technical terms. For example, in ordinary language "organic" means "of biological origin", while in chemistry "organic" describes a type of carbon compound. Those two definitions mostly coincide on Earth (most such compounds are of biological origin), but when astronomers announce they have found "organic" material on an asteroid this leads to confusion.

Also astronomers: anything heavier than helium is a "metal"

Research Writing Workflow: First figure stuff out

  • Do research and first figure stuff out, until you feel like you are not confused anymore.
  • Explain it to a person, or a camera, or ideally to a person and a camera.
    • If there are any hiccups, expand your understanding.
    • Ideally, as the last step, explain it to somebody you have never explained it to before.
  • Only once you have made a presentation without hiccups are you ready to write the post.
    • If you have a recording, it is useful as a starting point.

I like the rough thoughts way though. I'm not here to like read a textbook.

Nathan and Carson's Manifold discussion.

As of the last edit my position is something like:

“Manifold could have handled this better, so as not to force everyone with large amounts of mana to do something urgently, when many were busy.

Beyond that they are attempting to satisfy two classes of people:

  • People who played to donate can donate the full value of their investments
  • People who played for fun now get the chance to turn their mana into money

To this end, and modulo the above hassle, this decision is good.

It is unclear to me whether there... (read more)


Nevertheless, lots of people were hassled. That has real costs, both to them and to you.

Nathan Young (20h):
If that were true then there are many ways you could partially do that - e.g. give people a set of tokens representing their mana at the time of the devaluation, and if at a future point you raise, you could give them 10x those tokens back.
Nathan Young (21h):
I’m discussing with Carson. I might change my mind, but I don’t know that I’ll argue with both of you at once.

Have there been any great discoveries made by someone who wasn't particularly smart?

This seems worth knowing if you're considering pursuing a career with a low chance of high impact. Is there any hope for relatively ordinary people (like the average LW reader) to make great discoveries?

Various sailors made important discoveries back when geography was cutting-edge science. And they don't seem to have been particularly bright.

Vasco da Gama discovered that Africa was circumnavigable.

Columbus was wrong about the shape of the Earth, and he discovered America.  He died convinced that his newly discovered islands were just off the coast of Asia, so that's a negative sign for his intelligence (or a positive sign for his arrogance, which he had in plenty.)

Cortez discovered that the Aztecs were rich and easily conquered.

Of course, lots of other wou... (read more)

niplav (1d):
My best guess is that people in these categories were ones that were high in some other trait, e.g. patience, which allowed them to collect datasets or make careful experiments for quite a while, thus enabling others to make great discoveries. I'm thinking for example of Tycho Brahe, who is best known for 15 years of careful astronomical observation & data collection, or Gregor Mendel's 7-year-long experiments on peas. Same for Dmitri Belyaev and fox domestication. Of course I don't know their cognitive scores, but those don't seem like a bottleneck in their work. So the recipe to me looks like "find an unexplored data source that requires long-term observation to bear fruit, but would yield a lot of insight if studied closely, then investigate".
Gunnar_Zarncke (1d):
I asked ChatGPT and it's difficult to get examples out of it. Even with additional drilling down and accusing it of not being inclusive of people with cognitive impairments, most of its examples are either pretty smart anyway, savants, or only from poor backgrounds. The only ones I could verify that fit are:

  • Richard Jones accidentally created the Slinky
  • Frank Epperson, as a child, invented the popsicle
  • George Crum inadvertently invented potato chips

I asked ChatGPT (in a separate chat) to estimate the IQ of all the inventors it listed, and it is clearly biased to estimate them high, precisely because of their inventions. It is difficult to estimate the IQ of people retroactively. There is also selection and availability bias.

I expect large parts of interpretability work could be safely automatable very soon (e.g. GPT-5 timelines) using (V)LM agents; see "A Multimodal Automated Interpretability Agent" for a prototype.

Notably, MAIA (GPT-4V-based) seems approximately human-level on a bunch of interp tasks, while (overwhelmingly likely) being non-scheming (e.g. current models are bad at situational awareness and out-of-context reasoning) and basically-not-x-risky (e.g. bad at ARA).

Given the potential scalability of automated interp, I'd be excited to see plans to use large amo... (read more)

ryan_greenblatt (1d):
Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of internals. I agree that this model might help in performing various input/output experiments to determine what made a model do a given suspicious action.

Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.

This was indeed my impression (except for potentially using steering vectors, which I think are mentioned in one of the sections in 'Catching AIs red-handed'), but I think not using any internals might be overconservative / might increase the monitoring / safety tax too much (I think this is probably true more broadly of the current control agenda framing).

Bogdan Ionut Cirstea (1d):
Hey Jacques, sure, I'd be happy to chat!  
dirk (1d):

Sometimes a vague phrasing is not an inaccurate demarcation of a more precise concept, but an accurate demarcation of an imprecise concept.

Yeah. It's possible to give quite accurate definitions of some vague concepts, because the words used in such definitions also express vague concepts. E.g. "cygnet" - "a young swan".

dkornai (1d):
I would say that if a concept is imprecise, more words [but good and precise words] have to be dedicated to faithfully representing the diffuse nature of the topic. If this larger faithful representation is compressed down to fewer words, that can lead to vague phrasing. I would therefore often view vague phrasing as a compression artefact, rather than a necessary outcome of translating certain types of concepts to words.

Today I learned that being successful can involve feelings of hopelessness.

When you are trying to solve a hard problem, where you have no idea if you can solve it, let alone if it is even solvable at all, your brain makes you feel bad. It makes you feel like giving up.

This is quite strange, because most of the time when I am in such a situation and manage to make a real effort anyway, I seem to surprise myself with how much progress I make. Empirically, this feeling of hopelessness does not seem to track the actual likelihood that you will completely fail.

Carl Feynman (3d):
I was depressed once for ten years and didn’t realize that it was fixable. I thought it was normal to have no fun and be disagreeable and grumpy and out of sorts all the time. Now that I’ve fixed it, I’m much better off, and everyone around me is better off. I enjoy enjoyable activities, I’m pleasant to deal with, and I’m only out of sorts when I’m tired or hungry, as is normal.

If you think you might be depressed, you might be right, so try fixing it. The cost seems minor compared to the possible benefit (at least it was in my case). I don’t think there’s a high possibility of severe downside consequences, but I’m not a psychiatrist, so what do I know.

I had been depressed for a few weeks at a time in my teens and twenties and I thought I knew how to fix it: withdraw from stressful situations, plenty of sleep, long walks in the rain. (In one case I talked to a therapist, which didn’t feel like it helped.) But then it crept up on me slowly in my forties and in retrospect I spent ten years being depressed.

So fixing it started like this. I have a good friend at work, of many years standing. I’ll call him Barkley, because that’s not his name. I was riding in the car with my wife, complaining about some situation at work. My wife said “well, why don’t you ask Barkley to help?” And I said “Ahh, Barkley doesn’t care.” And my wife said “What are you saying? Of course he cares about you.” And I realized in that moment that I was detached from reality, that Barkley was a good friend who had done many good things for me, and yet my brain was saying he didn’t care. And thus my brain was lying to me to make me miserable. So I think for a bit and say “I think I may be depressed.” And my wife thinks (she told me later) “No duh, you’re depressed. It’s been obvious for years to people who know you.” But she says “What would you like to do about it?” And I say, “I don’t know, suffer I guess, do you have a better idea?” And she says “How about if I find you a
Johannes C. Mayer (2d):
This is useful. Now that I think about it, I do this. Specifically, I have extremely unrealistic assumptions about how much I can do, such that these are impossible to accomplish. And then I feel bad for not accomplishing the thing. I haven't tried to be mindful of that. The problem is that this is I think mainly subconscious. I don't think things like "I am dumb" or "I am a failure" basically at all. At least not in explicit language. I might have accidentally suppressed these and thought I had now succeeded in not being harsh to myself. But maybe I only moved it to the subconscious level where it is harder to debug.

I would highly recommend getting someone else to debug your subconscious for you.  At least it worked for me.  I don’t think it would be possible for me to have debugged myself.
 

My first therapist was highly directive.  He’d say stuff like “Try noticing when you think X, and asking yourself what happened immediately before that.  Report back next week.” And listing agenda items and drawing diagrams on a whiteboard.  As an engineer, I loved it.  My second therapist was more in the “providing supportive comments while I tal... (read more)

Fabien Roger (1d):

"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002.

When using length-10 lists (it crushes length-5 no matter the prompt), I get:

  • 32-shot, no fancy prompt: ~25%
  • 0-shot, fancy python prompt: ~60% 
  • 0-shot, no fancy prompt: ~60%

So few-shot hurts, but the fancy prompt does not seem to help. Code here.
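
For readers who want the shape of the experiment without opening the linked code, here is a rough sketch (mine, not Fabien's) of the 0-shot measurement, assuming the OpenAI completions API and davinci-002; the prompt wording and scoring rule are guesses for illustration.

# Rough sketch only (not the linked code): estimate 0-shot accuracy of
# davinci-002 at sorting random length-10 integer lists.
# Prompt wording and the scoring rule are illustrative assumptions.
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def is_sorted_correctly(completion: str, xs: list) -> bool:
    # Count the answer as correct if the exact sorted list appears verbatim.
    return str(sorted(xs)) in completion

def zero_shot_accuracy(n_trials: int = 50, length: int = 10) -> float:
    hits = 0
    for _ in range(n_trials):
        xs = [random.randint(0, 99) for _ in range(length)]
        prompt = f"Sort the following list in increasing order.\nList: {xs}\nSorted:"
        resp = client.completions.create(
            model="davinci-002", prompt=prompt, max_tokens=60, temperature=0
        )
        if is_sorted_correctly(resp.choices[0].text, xs):
            hits += 1
    return hits / n_trials

print(zero_shot_accuracy())

The 32-shot and "fancy python prompt" conditions would just swap different prompt strings into the same loop.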

I'm interested if anyone knows another case where a fancy prompt increases performance more than few-shot prompting, where a fancy prompt is a prompt that does not contain information that a human would use to solve the task. ... (read more)

American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/

https://www.apaonline.org/page/ai2050

https://ai2050.schmidtsciences.org/hard-problems/

dirk (1d):

Classic type of argument-gone-wrong (also IMO a way autistic 'hyperliteralism' or 'over-concreteness' can look in practice, though I expect that isn't always what's behind it): Ashton makes a meta-level point X based on Birch's meta point Y about object-level subject matter Z. Ashton thinks the topic of conversation is Y and Z is only relevant as the jumping-off point that sparked it, while Birch wanted to discuss Z and sees X as only relevant insofar as it pertains to Z. Birch explains that X is incorrect with respect to Z; Ashton, frustrated, reiterates ... (read more)

dirk (1d):

Meta/object level is one possible mixup but it doesn't need to be that. Alternative example, is/ought: Cedar objects to thing Y. Dusk explains that it happens because Z. Cedar reiterates that it shouldn't happen, Dusk clarifies that in fact it is the natural outcome of Z, and we're off once more.

I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I'd like to be able to hover over text or highlight it without having to see the inline annotations. 

If you use ublock (or adblock, or adguard, or anything else that uses EasyList syntax), you can add custom rules

lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow
lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight)

which will remove the reaction section underneath comments and the highlights corresponding to those reactions.

The former of these you can also do through the element picker.

mesaoptimizer (1d):
I use GreaterWrong as my front-end to interface with LessWrong, AlignmentForum, and the EA Forum. It is significantly less distracting and also doesn't make my ~decade old laptop scream in agony when multiple LW tabs are open on my browser.