All of Unnamed's Comments + Replies

Unnamed20

This post begins:

I've been trying to avoid the terms "good faith" and "bad faith". I'm suspicious that most people who have picked up the phrase "bad faith" from hearing it used, don't actually know what it means—and maybe, that the thing it does mean doesn't carve reality at the joints.

People get very touchy about bad faith accusations: they think that you should assume good faith, but that if you've determined someone is in bad faith, you shouldn't even be talking to them, that you need to exile them.

The second paragraph uses the term "bad faith" or "goo... (read more)

2Raemon
Okay I think I have more of an idea where you're coming from. (although I get some sense of something being at stake for you here that I still don't understand). I maybe want to clarify, when I suggested "taboo bad faith" as a title, I was imagining a pretty different purpose for the title (and I don't super strongly defend that title as a good one here). I was looking for a succinct way of describing the suggestion "when you want to accuse someone of bad faith, you should probably say something more specific instead." (i.e. "Taboo Bad Faith" is a recommendation for what to do in the wild, rather than "a thing the post itself was doing.")
Unnamed20

The "Taboo bad faith" title doesn't fit this post. I had hoped from the opening section that it was going in that direction, but it did not.

Most obviously, the post kept relying heavily on the terms "bad faith" and "good faith" and that conceptual distinction, rather than tabooing them.

But also, it doesn't do the core intellectual work of replacing a pointer with its substance. In the opening scenario where someone accuses their conversation partner of bad faith, conveying something along the lines of 'I disapprove of how you're approaching this conversati... (read more)

2Raemon
I am sort of confused at what you got from this post. You say "But also, it doesn't do the core intellectual work of replacing a pointer with its substance.", but, I think he explicitly does that here? It feels like you wanted some entirely different post, about how to navigate when someone accuses someone of bad faith, for a variety of possible definitions of bad faith? (As opposed to this post, which is mostly saying "Avoid accusing people of bad faith. Instead, do some kind of more specific and useful thing." Which honestly seems like good advice to me even for most people who are using the phrase to mean something else. "I disapprove of what you're doing here and am leaving now" seems totally fine)

I'm voting against including this in the Review, at max level, because I think it too often mischaracterizes the views of the people it quotes. And it seems real bad for a post that is mainly about describing other people's views and then drawing big conclusions from that data to inaccurately describe those views and then draw conclusions from inaccurate data.

I'd be interested in hearing about this from people who favor putting this post in the review. Did you check on the sources for some of Elizabeth's claims and think that she described them well? Did yo... (read more)

Unnamed20

As I understand it, Scott's post was making basically the same conceptual distinction as this Andrew Gelman post, where Gelman writes: 

One of the big findings of baseball statistics guru Bill James is that minor-league statistics, when correctly adjusted, predict major-league performance. James is working through a three-step process: (1) naive trust in minor league stats, (2) a recognition that raw minor league stats are misleading, (3) a statistical adjustment process, by which you realize that there really is a lot of information there, if you know

... (read more)
Unnamed20

savvy don't (just) have more skills to extract signal from a "naturally" occurring source of lies. They're collaborating with it!

This (from your tweet) is false, and your post here even has a straightforward argument against it (with the honest cop and the corrupt cop both savvy enough to discern a lightly veiled bribery attempt). Savviness about extracting information from a source does not imply complicity with the source. 

Unnamed*20

Trying to make this more intuitive: consider a prediction market which is currently priced at x, where each share will pay out $1 if it resolves as True.

If you think it's underpriced because your probability is y, where y>x, then your subjective EV from buying a share is y-x. e.g., If it's priced at $0.70 and you think p=0.8, your subjective EV from buying a share is $0.10.

If you think it's overpriced because your probability is z, where z<x, then your subjective EV from selling a share is x-z. e.g., If it's priced at $0.70 and you think p=0.56, your... (read more)
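
A minimal sketch of that arithmetic (the function names and the $1-per-share payout convention are just for illustration):

```python
def buy_ev(price: float, p: float) -> float:
    """Subjective EV per share from buying at `price`, given your probability `p`.
    Each share pays $1 if the market resolves True."""
    return p - price

def sell_ev(price: float, p: float) -> float:
    """Subjective EV per share from selling at `price`, given your probability `p`."""
    return price - p

# The examples from the comment above:
print(round(buy_ev(0.70, 0.80), 2))   # 0.1  (underpriced from your point of view)
print(round(sell_ev(0.70, 0.56), 2))  # 0.14 (overpriced from your point of view)
```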

Unnamed128

This is a bet at 30% probability, as 42.86/142.86 = .30001.

That is the average of Alice's probability and Bob's probability. The fair bet according to equal subjective EV is at the average of the two probabilities; previous discussion here.
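
To spell out the equal-subjective-EV point with a toy calculation (Alice's and Bob's probabilities here are hypothetical numbers chosen to average to 0.30; the pot size echoes the 42.86 + 100 bet above):

```python
def equal_ev_bet(p_alice: float, p_bob: float, pot: float = 1.0):
    """Implied probability q at which Alice (betting True) and Bob (betting False)
    get the same subjective EV. Alice stakes q*pot, Bob stakes (1-q)*pot, and the
    winner takes the whole pot."""
    q = (p_alice + p_bob) / 2          # the average of the two probabilities
    ev_alice = (p_alice - q) * pot     # Alice's subjective EV
    ev_bob = (q - p_bob) * pot         # Bob's subjective EV
    return q, ev_alice, ev_bob

# Hypothetical probabilities of 0.40 and 0.20 average to 0.30, so the stakes come
# out to roughly 42.86 : 100 on a 142.86 pot, and both sides expect to gain ~14.29.
print(equal_ev_bet(0.40, 0.20, pot=142.86))
```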

1Jakub Halmeš
I did not realize that. Thank you for the reference! 
Unnamed50

The first two puzzles got some discussion on LW long ago here, and a bit more here.

Unnamed50

I don't buy the way that Spencer tried to norm the ClearerThinking test. It sounds like he just assumed that people who took their test and had a college degree as their highest level of education had the same IQ as the portion of the general population with the same educational level, and similarly for all other education levels. Then he used that to scale how scores on the ClearerThinking test correspond to IQs. That seems like a very strong and probably inaccurate assumption.

Much of what this post and Scott's post are using the ClearerThinking IQ number... (read more)

Unnamed2311

I think that the way that Scott estimated IQ from SAT is flawed, in a way that underestimates IQ, for reasons given in comments like this one. This post kept that flaw.

5Rockenots
Your argument assumes a uniform prior, but a Gaussian prior is more realistic in this case. In practice, IQ scores are distributed normally, so it's more likely that someone with a high SAT score comes from a more common IQ range than from a very high outlier. For example, say the median rationalist has an SAT score of +2 SD (chosen for ease of computation), and the SAT-IQ correlation is 0.80. The IQ most likely to produce an SAT of +2 SD is 137.5 (+2.5 SD). However, IQs of 137.5 are rare (99.4%-ile). While lower IQs are less likely to achieve such a high SAT score, there are more people in the lower IQ ranges, making it more probable that someone with a +2 SD SAT score falls into a lower IQ bracket. This shift between the MLE (Maximum Likelihood Estimate) and MAP (Maximum A Posteriori) estimates is illustrated in the graph, where the MLE estimate would be +2.5 SD, but the MAP estimate, accounting for the Gaussian prior, is closer to +1.6 SD, as expected. (You may also be interested in my reply to faul_sname's comment.)

I agree. You only multiply the SAT z-score by 0.8 if you're selecting people on high SAT score and estimating the IQ of that subpopulation, making a correction for regressional Goodhart. Rationalists are more likely selected for high g which causes both SAT and IQ, so the z-score should be around 2.42, which means the estimate should be (100 + 2.42 * 15 - 6) = 130.3. From the link, the exact values should depend on the correlations between g, IQ, and SAT score, but it seems unlikely that the correction factor is as low as 0.8.
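
A rough sketch of the two point estimates being contrasted, under the simplifying assumption that IQ and SAT are standardized bivariate normal with correlation r (which, per the disagreement above, may not be the right selection model):

```python
def iq_point_estimates(sat_z: float, r: float = 0.8, mean: float = 100.0, sd: float = 15.0):
    """Two point estimates of IQ from an SAT z-score, assuming (for illustration)
    that IQ and SAT are standardized bivariate normal with correlation r.

    - mle: the IQ whose expected SAT score equals the observed one (sat_z / r).
    - posterior_mean: the MAP / regressed-toward-the-mean estimate (r * sat_z).
    """
    mle = mean + sd * (sat_z / r)
    posterior_mean = mean + sd * (r * sat_z)
    return mle, posterior_mean

print(iq_point_estimates(2.0))   # (137.5, 124.0): +2.5 SD vs +1.6 SD, as in the comment above
```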

Unnamed42

There can be no consequentialist argument for lying to yourself or allies[1] because without truth you can’t make accurate utility calculations[2].

Seems false.

Unnamed20

"Can crimes be discussed literally?":

  • some kinds of hypocrisy (the law and medicine examples) are normalized
  • these hypocrisies are / the fact of their normalization is antimemetic (OK, I'm to some extent interpolating this one based on familiarity with Ben's ideas, but I do think it's both implied by the post, and relevant to why someone might think the post is interesting/important)
  • the usage of words like 'crime' and 'lie' departs from their denotation, to exclude normalized things
  • people will push back in certain predictable ways on calling normalized thing
... (read more)
Unnamed21

This post reads like it's trying to express an attitude or put forward a narrative frame, rather than trying to describe the world.

Many of these claims seem obviously false, if I take them at face value and take a moment to consider what they're claiming and whether it's true.

e.g., On the first two bullet points it's easy to come up with counterexamples. Some successful attempts to steer the future, by stopping people from doing locally self-interested & non-violent things, include: patent law ("To promote the progress of science and useful arts, by sec... (read more)

Unnamed8434

In America, people shopped at Walmart instead of local mom & pop stores because it had lower prices and more selection, so Walmart and other chain stores grew and spread while lots of mom & pop stores shut down. Why didn't that happen in Wentworld?

6ChristianKl
It's worth noting that there's no Walmart in Germany. German shopping malls usually have a variety of different shops. While it's still true that a lot of those shops are chains, it's a different model than the Walmart model.  A company focusing on lower prices and more selection is not what won economically in Germany. Aldi and Lidl manage to focus on low prices but not high selection. Aldi manages to expand to the United States but Walmart doesn't seem to be able to expand to Germany or for that matter Europe at all. 
1johnswentworth
It did happen in Wentworld, the resulting corporate structure just doesn't look suspiciously hierarchical, and the corporate culture doesn't look suspiciously heavy on dominance/submission. Hard to know the full story of what it would look like instead, but I'd guess the nominal duties of Earth!management would be replaced with a lot more reliance on people specialized in horizontal coordination/communication rather than hierarchical command & control, plus a lot more paying for results rather than flat salary (though that introduces its own set of problems, which Wentworlders would see as one of the usual main challenges of scaling a company).
Raemon*228

This comment caused it to occur to me that working in software development might produce a distorted view of this situation. In most types of companies, if you want to scale, you have to hire a lot more people to build more things and sell to more people. So scaling large is a much more natural thing to do. And cultural norms around that might then just spread to corporations where that’s less relevant.

Unnamed30

I made a graph of this and the unemployment rate, they're correlated at r=0.66 (with one data point for each time Gallup ran the survey, taking the unemployment rate on the closest day for which there's data). You can see both lines spike with every recession.
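
Roughly the kind of nearest-day matching described, sketched with made-up stand-in data (the real Gallup and unemployment series, and these column names, are not reproduced here):

```python
import pandas as pd

# Made-up stand-ins for the two series; real dates and values differ.
survey = pd.DataFrame({
    "date": pd.to_datetime(["2007-06-01", "2009-06-01", "2011-06-01", "2019-06-01", "2020-06-01"]),
    "pct_dissatisfied": [4.0, 9.0, 7.0, 4.5, 8.5],
})
unemployment = pd.DataFrame({
    "date": pd.to_datetime(["2007-06-05", "2009-06-03", "2011-05-28", "2019-06-02", "2020-06-06"]),
    "rate": [4.6, 9.5, 9.0, 3.6, 11.1],
})

# For each survey wave, take the unemployment rate on the closest day with data.
merged = pd.merge_asof(
    survey.sort_values("date"),
    unemployment.sort_values("date"),
    on="date",
    direction="nearest",
)
print(merged["pct_dissatisfied"].corr(merged["rate"]))   # Pearson r
```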

Unnamed30

Are you telling me 2008 did actual nothing?

It looks like 2008 led to about a 1.3x increase in the number of people who said they were dissatisfied with their life.

3Unnamed
I made a graph of this and the unemployment rate, they're correlated at r=0.66 (with one data point for each time Gallup ran the survey, taking the unemployment rate on the closest day for which there's data). You can see both lines spike with every recession.
Unnamed104

It's common for much simpler Statistical Prediction Rules, such as linear regression or even simpler models, to outperform experts even when they were built to predict the experts' judgment.

7gwern
Yeah, 'transcendence' is a truly overhyped and absurd way to label what appears to be a simple ensemble effect. Should Galton have said that the crowd 'transcended' in guessing the weight of a fat pig? This isn't even a useful motivating example for how LLMs could outperform human experts while doing offline learning, because the averaging benefit breaks down, as expected, at their high end.
1Archimedes
Exactly. All that’s needed for “transcendence” is removing some noise. I highly recommend the book Noise by Daniel Kahneman et al on this topic.
Unnamed30

Or "Defense wins championships."

Unnamed40

With the ingredients he has, he has gotten a successful Barkskin Potion:

1 of the 1 times (100%) he brewed together Crushed Onyx, Giant's Toe, Ground Bone, Oaken Twigs, Redwood Sap, and Vampire Fang. 

19 of the 29 times (66%) he brewed together Crushed Onyx, Demon Claw, Ground Bone, and Vampire Fang.

Only 2 other combinations of the in-stock ingredients have ever produced Barkskin Potion, both at under a 50% rate (4/10 and 18/75).

The 4-ingredient, 66% success rate potion looks like the best option if we're just going to copy something that has worked. Th

... (read more)
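
The counting behind those numbers is just success rates per exact ingredient combination; a sketch assuming a hypothetical layout for the brewing log (the real dataset's columns may differ):

```python
import pandas as pd

# One row per brewing attempt: the ingredient set used and whether Barkskin resulted.
log = pd.DataFrame({
    "ingredients": [
        frozenset({"Crushed Onyx", "Demon Claw", "Ground Bone", "Vampire Fang"}),
        frozenset({"Crushed Onyx", "Demon Claw", "Ground Bone", "Vampire Fang"}),
        frozenset({"Crushed Onyx", "Giant's Toe", "Ground Bone", "Oaken Twigs",
                   "Redwood Sap", "Vampire Fang"}),
    ],
    "barkskin": [True, False, True],
})

# Successes and attempts for each exact combination, sorted by success rate.
summary = log.groupby("ingredients")["barkskin"].agg(successes="sum", attempts="count")
summary["rate"] = summary["successes"] / summary["attempts"]
print(summary.sort_values("rate", ascending=False))
```
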
2aphyer
Unnamed102

Being Wrong on the Internet: The LLM generates a flawed forum-style comment, such that the thing you've been wanting to write is a knockdown response to this comment, and you can get a "someone"-is-wrong-on-the-internet drive to make the points you wanted to make. You can adjust how thoughtful/annoying/etc. the wrong comment is.

Target Audience Personas: You specify the target audience that your writing is aimed at, or a few different target audiences. The LLM takes on the persona of a member of that audience and engages with what you've written, with more ... (read more)

Unnamed189

I don't think that the key element in the aging example is 'being about value claims'. Instead, it's that the question about what's healthy is a question that many people wonder about. Since many people wonder about that question, some people will venture an answer. Even if humanity hasn't yet built up enough knowledge to have an accurate answer.

Thousands of years ago many people wondered what the deal is with the moon and some of them made up stories about this factual (non-value) question whose correct answer was beyond them. And it plays out similarly t... (read more)

5tailcalled
I think the value-ladenness is part of why it comes up even when we don't have an answer, since for value-laden things there's a natural incentive to go up right to the boundary of our knowledge to get as much value as possible.
Unnamed40

7% of the variance isn't negligible. Just look at the pictures (Figure 1 in the paper):

gwern110

Yes, I did read the paper. And those are still extremely small effects and almost no predictability, no matter how you graph them (note the axis truncation and sparsity), even before you get into the question of "what is the causal status of any of these claims and why we are assuming that interest groups in support precede rather than follow success?"

Unnamed20

 I got the same result: DEHK.

I'm not sure that there are no patterns in what works for self-taught architects, and if we were aiming to balance cost & likelihood of impossibility then I might look into that more (since I expect A,L,N to be the cheapest options with a chance to work), but since we're prioritizing impossibility I'll stick with the architects with the competent mentors.

2dr_s
Unnamed42

 Moore & Schatz (2017) made a similar point about different meanings of "overconfidence" in their paper The three faces of overconfidence. The abstract:

Overconfidence has been studied in 3 distinct ways. Overestimation is thinking that you are better than you are. Overplacement is the exaggerated belief that you are better than others. Overprecision is the excessive faith that you know the truth. These 3 forms of overconfidence manifest themselves under different conditions, have different causes, and have widely varying consequences. It is a mist

... (read more)

You can go ahead and post.

I did a check and am now more confident in my answer, and I'm not going to try to come up with an entry that uses fewer soldiers.

Just got to this today. I've come up with a candidate solution just to try to survive, but haven't had a chance yet to check & confirm that it'll work, or to try to get clever and reduce the number of soldiers I'm using.

10 Soldiers armed with: 3 AA, 3 GG, 1 LL, 2 MM, 1 RR

I will probably work on this some more tomorrow.

4aphyer
Understood, I'll refrain from posting the solution tomorrow until I've heard from you - if you want more time, let me know and I can push that further back.
Unnamed108

Building a paperclipper is low-value (from the point of view of total utilitarianism, or any other moral view that wants a big flourishing future) because paperclips are not sentient / are not conscious / are not moral patients / are not capable of flourishing. So filling the lightcone with paperclips is low-value. It maybe has some value for the sake of the paperclipper (if the paperclipper is a moral patient, or whatever the relevant category is) but way less than the future could have.

Your counter is that maybe building an aligned AI is also low-value (... (read more)

4Matthew Barnett
I dispute that I'm "giving up" in any meaningful sense here. I'm happy to consider alternative proposals for how we could make the future large and flourishing from a total utilitarian perspective rather than merely trying to solve technical alignment problems. The post itself was simply intended to discuss the moral implications of AI alignment (itself a massive topic), but it was not intended to be an exhaustive survey of everything we can do to make the future go better. I agree we should aim high, in any case. I don't think this choice is literally a coin flip in expected value, and I agree that one might lean in one direction over the other. However, I think it's quite hard to quantify this question meaningfully. My personal conclusion is simply that I am not swayed in any particular direction on this question; I am currently suspending judgement. I think one could reasonably still think that it's more like 60-40 thing than a 40-60 thing or 50-50 coin flip. But I guess in this case, I wanted to let my readers decide for themselves which of these numbers they want to take away from what I wrote, rather than trying to pin down a specific number for them.

Is this calculation showing that, with a big causal graph, you'll get lots of very weak causal relationships between distant nodes that should have tiny but nonzero correlations? And realistic sample sizes won't be able to distinguish those relationships from zero.

Andrew Gelman often talks about how the null hypothesis (of a relationship of precisely zero) is usually false (for, e.g., most questions considered in social science research).
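
A quick simulation of that worry, using the rough 2/sqrt(n) threshold for when a sample correlation is distinguishable from zero (the effect size and sample sizes are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def detection_rate(true_r: float, n: int, trials: int = 300) -> float:
    """How often a sample correlation with true size `true_r` exceeds the rough
    2/sqrt(n) threshold for being distinguishable from zero."""
    hits = 0
    for _ in range(trials):
        x = rng.standard_normal(n)
        y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
        if abs(np.corrcoef(x, y)[0, 1]) > 2 / np.sqrt(n):
            hits += 1
    return hits / trials

# A distant-ancestor effect of r = 0.02 is rarely separable from zero at n = 1,000
# but almost always is at n = 100,000.
print(detection_rate(0.02, 1_000))
print(detection_rate(0.02, 100_000))
```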

4niplav
Yeah, that's the thing I'm most worried about as well, and what the last part of the question is about. I tried to check whether more samples obviously drive the number of causal non-correlations down, but only eye-balling it doesn't suggest this.
Unnamed5025

A lot of people have this sci-fi image, like something out of Deep Impact, Armageddon, Don't Look Up, or Minus, of a single large asteroid hurtling towards Earth to wreak massive destruction. Or even massive vengeance, as if it was a punishment for our sins.

But realistically, as the field of asteroid collection gradually advances, we're going to be facing many incoming asteroids which will interact with each other in complicated ways, and whose forces will to a large extent balance each other out.

Yet doomers are somehow supremely confident in how the futur... (read more)

roha174

If everyone has his own asteroid impact, earth will not be displaced because the impulse vectors will cancel each other out on average*. This is important because it will keep the trajectory equilibrium of earth, which we know since ages from animals jumping up and down all the time around the globe in their games of survival. If only a few central players get asteroid impacts it's actually less safe! Safety advocates might actually cause the very outcomes that they fear!

*I've a degree in quantum physics and can derive everything from my model of the universe. This includes moral and political imperatives that physics dictate and thus most physicists advocate for.

They're critical questions, but one of the secret-lore-of-rationality things is that a lot of people think criticism is bad, because if someone criticizes you, it hurts your reputation. But I think criticism is good, because if I write a bad blog post, and someone tells me it was bad, I can learn from that, and do better next time.

I read this as saying 'a common view is that being criticized is bad because it hurts your reputation, but as a person with some knowledge of the secret lore of rationality I believe that being criticized is good because you can learn from it.'

And he isn't making a claim about to what extent the existing LW/rationality community shares his view.

Unnamed*102

Seems like the main difference is that you're "counting up" with status and "counting down" with genetic fitness.

There's partial overlap between people's reproductive interests and their motivations, and you and others have emphasized places where there's a mismatch, but there are also (for example) plenty of people who plan their lives around having & raising kids. 

There's partial overlap between status and people's motivations, and this post emphasizes places where they match up, but there are also (for example) plenty of people who put tons of ... (read more)

The economist RH Strotz introduced the term "precommitment" in his 1955-56 paper "Myopia and Inconsistency in Dynamic Utility Maximization".

Thomas Schelling started writing about similar topics in his 1956 paper "An essay on bargaining", using the term "commitment".

Both terms have been in use since then.

On one interpretation of the question: if you're hallucinating then you aren't in fact seeing ghosts, you're just imagining that you're seeing ghosts. The question isn't asking about those scenarios, it's only asking what you should believe in the scenarios where you really do see ghosts.

1Celarix
I mean, sure, but that does kinda answer the question in the question - "if event X happens, should you believe that event X is possible?" Well, yes, because it happened. I guess, in that case, the question could be more measuring something like "I, a Rationalist, would not believe in ghosts because that would lower my status in the Rationalist community, despite seeing strong evidence for it" Sort of like asking "are you a Rationalist or are you just saying so for status points?"

My updated list after some more work yesterday is

96286, 9344, 107278, 68204, 905, 23565, 8415, 62718, 83512, 16423, 42742, 94304

which I see is the same as simon's list, with very slight differences in the order

More on my process:

I initially modeled location just by a k nearest neighbors calculation, assuming that a site's location value equals the average residual of its k nearest neighbors (with location transformed to Cartesian coordinates). That, along with linear regression predicting log(Performance), got me my first list of answers. I figured that li

... (read more)
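
A sketch of that two-step model on made-up data (the column layout, coefficients, and k = 10 here are all placeholders; only the shape of the calculation is the point):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

# Made-up arrays standing in for the puzzle data: non-location features, site
# locations already transformed to Cartesian coordinates, and observed Performance.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 3))
xyz = rng.standard_normal((200, 3))
performance = np.exp(features @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.standard_normal(200))

# Step 1: linear regression predicting log(Performance) from the non-location features.
reg = LinearRegression().fit(features, np.log(performance))
residuals = np.log(performance) - reg.predict(features)

# Step 2: a site's location value = mean residual of its k nearest neighbours
# in Cartesian space.
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(xyz)    # +1: a site is its own nearest neighbour
_, idx = nn.kneighbors(xyz)
location_value = residuals[idx[:, 1:]].mean(axis=1)  # drop the self-match in column 0

predicted_log_performance = reg.predict(features) + location_value
```
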
1simon

Did a little robustness check, and I'm going to swap out 3 of these to make it:

96286, 23565, 68204, 905, 93762, 94408, 105880, 9344, 8415, 62718, 80395, 65607

To share some more:

I came across this puzzle via aphyer's post, and got inspired to give it a try.

Here is the fit I was able to get on the existing sites (Performance vs. Predicted Performance). Some notes on it:

Seems good enough to run with. None of the highest predicted existing sites had a large negative residual, and the highest predicted new sites give some buffer.

Three observations I made along ... (read more)

6Unnamed
My updated list after some more work yesterday is More on my process:

My current choices (in order of preference) are

96286, 23565, 68204, 905, 93762, 94408, 105880, 8415, 94304, 42742, 92778, 62718

3Unnamed
Did a little robustness check, and I'm going to swap out 3 of these to make it: To share some more: I came across this puzzle via aphyer's post, and got inspired to give it a try. Here is the fit I was able to get on the existing sites (Performance vs. Predicted Performance). Some notes on it: Three observations I made along the way.  First (which is mostly redundant with what aphyer wound up sharing in his second post): Second: Third:
Answer by Unnamed279

What's "Time-Weighted Probability"? Is that just the average probability across the lifespan of the market? That's not a quantity which is supposed to be calibrated.

e.g., Imagine a simple market on a coin flip, where forecasts of p(heads) are made at two times: t1 before the flip and t2 after the flip is observed. In half of the cases, the market forecast is 50% at t1 and 100% at t2, for an average of 75%; in those cases the market always resolves True. The other half: 50% at t1, 0% at t2, avg of 25%, market resolves False. The market is underconfident if you take this average, but the market is perfectly calibrated at any specific time.
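
A small simulation of that coin-flip example, showing the per-time forecasts perfectly calibrated while the time-averaged probability looks underconfident:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Coin-flip markets as in the example: priced at 0.5 before the flip, 0 or 1 after.
outcome = rng.random(n) < 0.5
p_before = np.full(n, 0.5)
p_after = outcome.astype(float)
time_avg = (p_before + p_after) / 2        # 0.75 for True markets, 0.25 for False ones

# Calibration at each fixed time is perfect:
print(outcome[p_before == 0.5].mean())     # ~0.50, matching the pre-flip forecast
print(outcome[p_after == 1.0].mean())      # 1.0, matching the post-flip forecast

# But bucketing by the time-averaged probability makes the market look underconfident:
print(outcome[time_avg == 0.75].mean())    # 1.0, far above 0.75
print(outcome[time_avg == 0.25].mean())    # 0.0, far below 0.25
```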

1arabaga
Yeah, this seems to be a big part of it. If you instead switch it to the probability at market midpoint, Manifold is basically perfectly calibrated, and Kalshi is if anything overconfident (Metaculus still looks underconfident overall).
1[comment deleted]

Have you looked at other ways of setting up the prior to see if this result still holds? I'm worried that the way you've set up the prior is not very natural, especially if (as it looks at first glance) the Stable scenario forces p(Heads) = 0.5 and the other scenarios force p(Heads|Heads) + p(Heads|Tails) = 1. Seems weird to exclude "this coin is Headsy" from the hypothesis space while including "This coin is Switchy".

Thinking about what seems most natural for setting up the prior: the simplest scenario is where flips are serially independent. You only ne... (read more)

1Kevin Dorst
See the discussion in §6 of the paper.  There are too many variations to run, but it at least shows that the result doesn't depend on knowing the long-run frequency is 50%; if we're uncertain about both the long-run hit rate and about the degree of shiftiness (or whether it's shifty at all), the results still hold. Does that help?
Answer by Unnamed40

Try memorizing their birthdates (including year).

That might be different enough from what you've previously tried to memorize (month & day) to not get caught in the tangle that has developed.

My answer to "If AI wipes out humanity and colonizes the universe itself, the future will go about as well as if humanity had survived (or better)" is pretty much defined by how the question is interpreted. It could swing pretty wildly, but the obvious interpretation seems ~tautologically bad.

Agreed, I can imagine very different ways of getting a number for that, even given probability distributions for how good the future will be conditional on each of the two scenarios.

A stylized example: say that the AI-only future has a 99% chance of being mediocre and... (read more)

2faul_sname
I think that one is supposed to be parsed as "If AI wipes out humanity and colonizes the universe itself, the future will go about as well as, or go better than, if humanity had survived" rather than "If AI wipes out humanity and colonizes the universe itself, the future will go about as well as if humanity had survived or done better than survival".

The time on a clock is pretty close to being a denotative statement.

Unnamed180

Batesian mimicry is optimized to be misleading, "I'll get to it tomorrow" is denotatively false, "I did not have sexual relations with that woman" is ambiguous as to its conscious intent to be denotatively false.

4Unnamed
The time on a clock is pretty close to being a denotative statement.
Unnamed141

Structure Rebel, Content Purist: people who disagree with me are lying (unless they say "I think that", "My view is", or similar)

Structure Rebel, Content Neutral: people who disagree with me are lying even when they say "I think that", "My view is", or similar

Structure Rebel, Content Rebel: trying to unlock the front door with my back door key is a lie

How do you get a geocentric model with ellipses? Venus clearly does not go in an ellipse around the Earth. Did Riccioli just add a bunch of epicycles to the ellipses?

Googling... oh, it was a Tychonic model, where Venus orbits the sun in an ellipse (in agreement with Kepler), but the sun orbits the Earth.

Kepler's ellipses wiped out the fully geocentric models where all the planets orbit around the Earth, because modeling their orbits around the Earth still required a bunch of epicycles and such, while modeling their orbits around the sun now involved a simp... (read more)

2dr_s
I mean, that's not even a different model, that's just the real thing visualized in a frame of reference centred on the Earth.
1Meow P
Keep in mind that any ellipse can be modelled by 2 epicycles. Ellipses were a tighter constraint on available epicycle models. Kepler's model actually involved hyperbolas for comets, which cannot be explained by epicycles.
Unnamed131

Here is Yudkowsky (2008) Artificial Intelligence as a Positive and Negative Factor in Global Risk:

Friendly AI is not a module you can instantly invent at the exact moment when it is first needed, and then bolt on to an existing, polished design which is otherwise completely unchanged.

The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea how the neural net is making its decisions—and cannot easily be rendered unopaq

... (read more)
Unnamed150

Also, chp 25 of HPMOR is from 2010 which is before CFAR.

Unnamed130

It came up in the sequences, e.g. here, here, and here.

There are a couple errors in your table of interpretations. For "actual score = subjective expected", the second half of the interpretation "prediction = 0.5 or prediction = true probability" got put on a new line in the "Comparison score" column instead of staying together in the "Interpretation" column, and similarly for the next one.

I posted a brainstorm of possible forecasting metrics a while back, which you might be interested in. It included one (which I called "Points Relative to Your Expectation") that involved comparing a forecaster's (Brier or other) score with the score that they'd expect to get based on their probability.
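
For a single binary forecast, the score you expect by your own lights works out to p(1-p) under the Brier rule; a minimal sketch of that comparison, with hypothetical forecasts:

```python
def brier(p: float, outcome: int) -> float:
    """Brier score for one binary forecast (lower is better)."""
    return (p - outcome) ** 2

def expected_brier(p: float) -> float:
    """The Brier score you expect if your probability p is right:
    p*(1-p)**2 + (1-p)*p**2, which simplifies to p*(1-p)."""
    return p * (1 - p)

# Hypothetical (probability, outcome) pairs:
forecasts = [(0.9, 1), (0.7, 0), (0.6, 1)]
actual = sum(brier(p, o) for p, o in forecasts) / len(forecasts)
expected = sum(expected_brier(p) for p, _ in forecasts) / len(forecasts)
print(actual - expected)   # positive: scored worse than you expected; negative: better
```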

1nikos
Thanks, yeah that list looks interesting!