All of dentalperson's Comments + Replies

I really appreciate the way you have written this up.  It seems that 2-7% of refusals do not respond to the unidimensional treatment.  I'm curious whether you've looked at this subgroup the same way you looked at the global data, to see if it has another dimension for refusal, or if the statistics of the subgroup shed some other light on the stubborn refusals.
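To make the question concrete, here's a minimal sketch of the kind of subgroup re-analysis I have in mind, assuming the unidimensional treatment is the usual difference-in-means direction ablation (the data and names here are stand-ins, not anything from the post):

```python
import numpy as np

# Stand-in residual-stream activations, shape (n_prompts, d_model),
# collected on prompts the model refused vs. complied with.
acts_refused = np.random.randn(500, 4096)
acts_complied = np.random.randn(500, 4096)

# The "one dimension": a difference-in-means refusal direction, unit norm.
refusal_dir = acts_refused.mean(axis=0) - acts_complied.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate(acts, direction):
    """Project `direction` out of every activation vector."""
    return acts - np.outer(acts @ direction, direction)

# The stubborn 2-7% would be the prompts that still refuse after ablation;
# rerunning the same difference-in-means recipe on just that subgroup would
# show whether a second refusal dimension exists, or whether their
# statistics differ in some other way.
stubborn_acts = ablate(acts_refused, refusal_dir)
```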

Thanks, your confusion pointed out a critical typo.  Indeed, the relatively large number of walls broken should make it more likely that the orcs were the culprits.  The 1:20 should have been 20:1 (going from -10 dB to +13 dB).

Thanks! These are great points.  I applied the correction you noted about the signs and changed the wording about the direction of evidence.  I agree that the clarification about the 3 dB rule is useful; I've linked to your comment.

Edit: The 10 was also missing a sign.  It should be -10 + 60 - 37 = +13 dB.  I also flipped the 1:20 to 20:1 posterior odds that the orcs did it.
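As a sanity check on the corrected arithmetic (using the convention that evidence in decibels is 10·log10 of an odds ratio, so independent evidence adds in log space):

```python
def db_to_odds(db: float) -> float:
    """Evidence in decibels is 10 * log10(odds); invert to get odds."""
    return 10 ** (db / 10)

prior_db = -10          # 1:10 prior odds against the orcs
evidence_db = 60 - 37   # the two pieces of evidence, signs corrected
posterior_db = prior_db + evidence_db   # -10 + 60 - 37 = +13 dB

print(db_to_odds(posterior_db))  # 10**1.3 ≈ 20, i.e. ~20:1 that the orcs did it
```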

How surprising is this to alignment community professionals (e.g. people at MIRI, Redwood Research, or similar)? From an outside view, the volatility/flexibility and the movement away from pure growth and commercialization seem unexpected and could work to alignment researchers' benefit (although it's difficult to see the repercussions at this point).  It makes sense that this surprises me, since I don't know the inner workings of OpenAI, but I'm surprised that it seems similarly surprising to the LW/alignment community as well.

Perhaps the insiders are stil... (read more)

3DanielFilan
I'm at CHAI and it's shocking to me, but I'm not the most plugged-in person.
Sune

It seems this was a surprise to almost everyone even at OpenAI, so I don’t think it is evidence that there isn’t much information flow between LW and OpenAI.

I will be there at 3:30 or so.

I missed your reply, but thanks for calling this out.  I'm nowhere near as close to EY as you are, so I'll take your model over mine, since mine was constructed on loose grounds.  I don't even remember where my number came from, but my best guess is that the 90% came from EY giving 3/15/16 as the largest number he referenced in the timeline, and from some comments in the Death with Dignity post, but this seems like a bad read to me now.

Thanks for sharing! It's nice to see plasticity, especially in stats, which seems to have more opinionated contributors than other applied maths.  That said, it seems this 'admission' is not changing his framework, but rather reinterpreting how ML is used so that it is compatible with his framework.

Pearl's texts talk about causal models that use the do(X) operator (e.g. P(Y|do(X))) to signify causal information.  Now, in LLMs, he sees the text the model is conditioning on as sometimes being do(X) or X.  I'm curious what else besides text woul... (read more)
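To make the distinction concrete, here is a toy simulation (my own made-up numbers, nothing from Pearl) where conditioning on X and intervening on X give different answers because of a hidden confounder:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# SCM with a hidden confounder U and *no* causal arrow X -> Y:
#   U ~ Bernoulli(0.5);  X := U flipped w.p. 0.1;  Y := U flipped w.p. 0.1
u = rng.random(n) < 0.5
x = u ^ (rng.random(n) < 0.1)
y = u ^ (rng.random(n) < 0.1)

# Observational: P(Y=1 | X=1) is high because X and Y share the cause U.
print(y[x].mean())  # ~0.82

# Interventional: do(X=1) severs the U -> X edge and leaves Y untouched,
# so P(Y=1 | do(X=1)) = P(Y=1) = 0.5, regardless of the forced X.
print(y.mean())  # ~0.50
```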

2rotatingpaguro
I think it is true. If you observe everything in enough detail and your hypothesis space is complete, you get counterfactual prediction automatically. Theoretical example: a Solomonoff inductor observes the world; Physical laws satisfy causality; the best prediction algorithm takes that into account; the inductor's inference favors that algorithm; the algorithm can simulate Physical laws and so produce counterfactuals if needed in the course of its predictions. If you live in a world where counterfactual thinking is possible and useful for predicting the future, then Bayes brings you there.

An interesting look at the question of counterfactuals is the debate between Pearl and Robins on cross-world independence assumptions. It's relevant because Robins resolves the paradox of Pearl's impossible-to-verify assumptions by noting that you can always add a mediator on any arrow of a causal model (I'd add, due to the locality of Physical laws), and this makes the assumptions verifiable in principle. In other words, by observing the "full video" of a process, instead of just the frames represented by some random variables, you need fewer out-of-the-hat assumptions to infer counterfactual causal quantities. I tried to write an explanation, but I realized I still don't understand the matter enough to go through the details, so I'll leave you a reference: the last section, "Mediation", in this Robins interview.

My superficial impression is that the field of causal discovery does not have its shit together. Not to dunk on them; it's not a law of Nature that what you set out to do will be within your ability. See also "Are there any good, easy-to-understand examples of cases where statistical causal network discovery worked well in practice?"

Thanks! I'm aware of the resources mentioned but haven't read deeply or frequently enough to have this kind of overview of the interaction between the cast of characters.  

There are more than a few lists and surveys that state the CDFs for some of these people, which helps a bit.  A big-as-possible list of evidence/priors would be one way to inspect the gap more closely. I wonder if it would be helpful to expand on the MIRI conversations and have a slow conversation between a >99% doom pessimist and a <50% doom 'optimist' with a moderator to prod ... (read more)

Thanks! Do you know of any arguments with a similar style to The Most Important Century that are as pessimistic as the EY/MIRI folks (>90% probability of AGI within 15 years)?  The style looks good, but the time estimates for that one (a 2/3 chance of AGI by 2100) are significantly longer and aren't nearly as surprising or urgent as the pessimistic view asks for.

2Rob Bensinger
Wait, what? Why do you think anyone at MIRI assigns >90% probability to AGI within 15 years? That sounds wildly too confident to me. I know some MIRI people who assign 50% probability to AGI by 2038 or so (similar to Ajeya Cotra's recently updated view), and I believe Eliezer is higher than 50% by 2038, but if you told me that Eliezer told you in a private conversation "90+% within 15 years" I would flatly not believe you. I don't think timelines have that much to do with why Eliezer and Nate and I are way more pessimistic than the Open Phil crew.
2Vaniver
Not off the top of my head; I think @Rob Bensinger might keep better track of intro resources?

I still don't follow why EY assigns a seemingly <1% chance to non-earth-destroying outcomes in 10-15 years (not sure if this is actually 1%, but EY didn't argue with the 0% comments mentioned in the "Death with dignity" post last year).  This seems to treat fast takeoff as the inevitable path forward, implying unrestricted, fast, recursive designing of AIs by AIs.  There are compute bottlenecks, which seem slowish, and there may be other bottlenecks we can't think of yet.  This is just one obstacle.  Why isn't there more probability... (read more)

The strongest argument I hear from EY is that he can't imagine a coherent, likely future path (or enough of them) that leads to not-doom, and I don't think it's a failure of imagination. A lot of hopeful ideas decohere because they imply contradictions (whence the post on failure modes), and there is low probability on the remaining successful paths because we're likely to try a failing one that results in doom. Stepping off any of the possible successful paths risks ending all paths in doom before they could reach fruition. There is no global... (read more)

Algon

Not really. The MIRI conversations and the AI Foom debate are probably the best we've got. 

EY and the MIRI crowd have been doomy for far longer, and more doomy along various axes, than the rest of the alignment community. Nate and Paul and others have tried bridging this gap before, spending several hundred hours (based on Nate's rough, subjective estimates) over the years. It hasn't really worked. Paul and EY had some conversations recently about this discrepancy, which were somewhat illuminating but ultimately didn't get anywhere. They tried to... (read more)

5Vaniver
Each of those books is also criticized in various ways; I think this is a Write a Thousand Roads to Rome situation instead of hoping that there is one publicly digestible argument. I would probably first link someone to The Most Important Century. [Also, I am generally happy to talk with interested industry folk about AI risk, and find live conversations work much better at identifying where and how to spend time than writing, so feel free to suggest reaching out to me.]

There are many obstacles with no obvious or money-can-buy solutions.

The claim that current AI is superhuman at just about any task we can benchmark is not correct.  The problems being explored are chosen because the researchers think AI has a shot at beating humans at them.  Think about how many real-world problems we pay other people to solve, and could benchmark, that aren't being solved by AI.  Think about why these problems require humans right now.

My upper bound is much more than 15 years because I don't feel I have enough informat... (read more)

1Portia
Thanks for sharing. You have good points, and so do they. Not engaging with the people actually working on these topics respectfully and on their terms alienates the very people you need.

They also make a great point there, one that I see considered a lot in academia and less on this forum: we don't need misaligned AGI for us to be in deep trouble with misaligned AI. Unfriendly AI is causing massive difficulties today, right now, very concrete difficulties that we need to find solutions for.

And a lot of us here are very receptive to companies' claims about what their products can do, claims which have generally not been written by the engineers actually working on them, but by the marketing department. Every programmer I know working in a large company rants to no end that their own marketing department, popular-science articles, and the news represent their work as already being able to do things that it most definitely cannot do, and that they highly doubt they will get it to do prior to release, let alone reliably and well.

Another thing I'm still curious about is who the buyers of these box spreads are (assuming the legs are sold as a combination and not separately). The discussion says the arbitrage opportunity comes from the FDIC, but the FDIC does not directly buy the spread; it allows the CD to exist, which the buyers should also have access to. So do the buyers have access to options but not CDs, or do they have other advantages that I am missing?
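For context on the mechanics, here is why selling the box is synthetic borrowing; the strikes, premium, and term below are made-up illustrative numbers, not figures from the post:

```python
# Selling a box spread at strikes K1 < K2 means owing exactly (K2 - K1)
# at expiry with certainty; the cash received now makes it borrowing.
K1, K2 = 1000.0, 2000.0   # option strikes (assumed)
premium = 980.0           # cash received today for the short box (assumed)
T = 1.0                   # years to expiry

payoff = K2 - K1          # certain payment owed at expiry: 1000
implied_rate = (payoff / premium) ** (1 / T) - 1
print(f"implied borrowing rate: {implied_rate:.2%}")  # ~2.04%
```

Whoever is on the other side is effectively lending at that implied rate, which is why I'm wondering what stops them from taking the CD instead.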

2ESRogs
I'm not sure how important FDIC insurance is to the story here, but it's worth noting that it has a $250k per-account limit. So I don't think financial institutions would have access to an unlimited amount of it.

The risk is small, but so is the benefit. As a result, this was not a trivial analysis for me. I doubt it's low-risk to redeposit just 20% at a 30% market drop, due to volatility (the recent intraday crash movements exceeded 10%, and you get a margin call at a 40% drop with the 250k/150k example). After mulling it over, I think I agree that it is worth it anyway, though.

Here are some more examples that I ran through. A better person would figure out the probabilities and equations to integrate over to calculate the expected value (a toy version is sketched below).

Withdrawing 75% at 0 ... (read more)
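Something like the following toy calculation is what I have in mind; the drawdown probabilities, redeposit fractions, and the ~0.625% early-withdrawal fee are all my own guesses, not numbers from the post:

```python
# Toy expected-cost calculation over market-drop scenarios for the CD trick.
scenarios = [
    # (probability of scenario in a year, fraction of the CD redeposited)
    (0.90, 0.00),  # no big drop: nothing withdrawn
    (0.07, 0.20),  # ~30% drop: redeposit 20%
    (0.03, 0.75),  # crash: redeposit 75%
]
penalty = 0.00625  # fee (6 months' interest) charged on the amount withdrawn

expected_cost = sum(p * frac * penalty for p, frac in scenarios)
print(f"expected penalty cost: {expected_cost:.4%} of the CD")  # ~0.0228%
```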

Isn't another risk that the market tanks within the first few months, so that you have to pay the CD's withdrawal penalty out of pocket before the interest has accumulated? This risk seems proportionate to the benefit (given that we get more than one huge correction every 50 years, there is a >2% chance that the market will have a huge correction in the first year).

You say that you are moderately confident that the risk estimate did not include this case, so I'm likely missing something (or you need to do a moderate update).

3Thomas Kwa
The risk is small, because for most CDs the withdrawal penalty is small. My credit union allows partial withdrawals and charges a fee of 6 months' interest (about 0.625%) on the amount withdrawn, so if the market tanks 30% and you have to redeposit, say, 20% into the margin account, you have lost a negligible 0.125%, and the box spread trick still comes out ahead for the year. 6 months' interest is typical. It's important to choose a bank or credit union that has reasonably lenient terms, though.

It seems the whole deal depends on margin interest rates, so I would appreciate more discussion of the margin interest rates available to retail investors and of what rates were used in the simulations. I would also like more evidence for the statement in the comments that "the fact that a young investor with 100% equities does better *on the margin* by adding a bit of leverage is very robust" before taking it as fact, as it only seems true at certain rates that don't seem obviously available (a toy illustration of the rate dependence is sketched below).

As one datapoint, my current ... (read more)
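To illustrate the rate dependence I mean, here is a toy mean-variance check: under a Merton-style rule, the optimal equity share is roughly (mu - r) / (gamma * sigma^2), and leverage only helps when that exceeds 1. All parameter values are assumptions of mine, not the post's:

```python
# mu: expected equity return, sigma: volatility, gamma: risk aversion,
# r: margin borrowing rate.  All values below are illustrative assumptions.
mu, sigma, gamma = 0.08, 0.16, 1.5

for r in (0.02, 0.05, 0.08):
    share = (mu - r) / (gamma * sigma ** 2)
    verdict = "leverage helps" if share > 1 else "leverage does not help"
    print(f"margin rate {r:.0%}: optimal equity share ~ {share:.2f} ({verdict})")
```

With these numbers the verdict flips somewhere between a 2% and a 5% margin rate, which is exactly the range in question for retail investors.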

2Marko
Indeed, these margin rates are way too high, and it would be madness to borrow at such rates.