All of alex_zag_al's Comments + Replies

I think that someone who merely believed they were happy, and then experienced real happiness, would not want to go back.

There's an important category of choices: the ones where any good choice is "acting as if" something is true.

That is, there are two possible worlds. And there's one choice best if you knew you were in world 1, and another choice best if you knew you were in world 2. And, in addition, under any probabilistic mixture of the two worlds, one of those two choices is still optimal.

The hotel example falls into this category. So, one of the important reasons to recognize this category is to avoid a half-speed response to uncertainty.

Many choices don't fa... (read more)

Yeah. I mean, I'm not saying you should arrive late to class.

The way to work what you're saying into the framework is:

  • The cost of consistently arriving late is high

  • The cost (in minutes spent waiting for the class to start) of avoiding consistent lateness is less high

  • Therefore, you should pay this cost in minutes spent waiting

The point is to quantify the price, not to say you shouldn't pay it.

0niceguyanon
Tangentially related: I'm surprised that students misjudge how high the cost of being late is relative to the cost of arriving early. I suspect that people who insist on being exactly one minute early and no more fall into two groups: the very efficient, and the best procrastinators, who are often late and who, when on time, get to pat themselves on the back for being efficient. Getting to class early just to sit in the front row is the easiest way to boost your grade in most classes, IMO as an armchair psychologist.

the soft sciences have to deal with situations which never exactly repeat

This is also true of evolutionary biology--I think it's not widely recognized that evolutionary biology is like the soft sciences in this way.

iii. Emphasize all rationality use cases evenly. Cause all people to be evenly targeted by CFAR workshops.

We can’t do this one either; we are too small to pursue all opportunities without horrible dilution and failure to capitalize on the most useful opportunities.

This surprised me, since I think of rationality as the general principles of truth-finding.

What have you found about the degree to which rationality instruction needs to be tailored to a use-case?

Several of these had the form “I, too, think that AI safety is incredibly important — and that is why I think CFAR should remain cause-neutral, so it can bring in more varied participants who might be made wary by an explicit focus on AI.”

I don't think that AI safety is important, which I guess makes me one of the "more varied participants made wary by an explicit focus on AI." Happy you're being explicit about your goals but I don't like them.

Wow, I've read the story but I didn't quite realize the irony of it being a textbook (not a curriculum, a textbook, right?) about judgment and decision making.

The alternative I would propose, in this particular case, is to debate the general rule of banning physics experiments because you cannot be absolutely certain of the arguments that say they are safe.

Giving up on debating the probability of a particular proposition, and shifting to debating the merits of a particular rule, is, I feel, one of the ideas behind frequentist statistics. Like, I'm not going to say anything about whether the true mean is in my confidence interval in this particular case. But note that using this confidence interval formula works pretty well on average.
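To make "works pretty well on average" concrete, here is a minimal simulation sketch (Python, with made-up numbers): the formula says nothing about whether any particular interval contains the true mean, but its long-run coverage comes out near the nominal 95%.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, n, trials = 10.0, 30, 10_000

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, 2.0, size=n)
    # The standard 95% interval: xbar +/- 1.96 * s / sqrt(n)
    xbar, s = sample.mean(), sample.std(ddof=1)
    half_width = 1.96 * s / np.sqrt(n)
    covered += (xbar - half_width <= true_mean <= xbar + half_width)

print(f"coverage over {trials} repetitions: {covered / trials:.3f}")  # close to 0.95
```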

I don't know about the role of this assumption in AI, which is what you seem to care most about. But I think I can answer about its role in philosophy.

One thing I want from epistemology is a model of ideally rational reasoning, under uncertainty. One way to eliminate a lot of candidates for such a model is to show that they make some kind of obvious mistake. In this case, the mistake is judging something as a good bet when really it is guaranteed to lose money.

Inquiring after the falsifiability of a theory?

Not perfect but very good, and pretty popular.

After a few years in grad school, I think the principles of science are different from what you've picked up from your own sources.

In particular, this stands out to me as incorrect:

(1) I had carefully followed everything I'd been told was Traditionally Rational, in the course of going astray. For example, I'd been careful to only believe in stupid theories that made novel experimental predictions, e.g., that neuronal microtubules would be found to support coherent quantum states.

My training in writing grant applications contradicts this depiction of s... (read more)

Bayesian adaptive clinical trial designs place subjects in treatment groups based on a posterior distribution. (Clinical trials accrue patients gradually, so you don't have to assign the patients using the prior: you assign new patients using the posterior conditioned on observations of the current patients.)

These adaptive trials are, as you conjecture, much more efficient than traditional randomized trials.

Example: I-SPY 2. Assigns patients to treatments based on their "biomarkers" (biological measurements made on the patients) and the posterior... (read more)
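To illustrate the general mechanism (this is not the I-SPY 2 design, just a toy Beta-Bernoulli sketch with made-up numbers, using Thompson sampling as one standard way to allocate from a posterior): each new patient is assigned using the posterior computed from the patients observed so far.

```python
import numpy as np

rng = np.random.default_rng(1)
true_success = [0.35, 0.55]      # unknown to the trial; treatment 1 is actually better
alpha = np.ones(2)               # Beta(1, 1) prior on each arm's success rate
beta = np.ones(2)

assignments = []
for patient in range(200):
    # Draw a success rate for each arm from its current posterior,
    # then assign this patient to the arm that looks best under that draw.
    draws = rng.beta(alpha, beta)
    arm = int(np.argmax(draws))
    outcome = rng.random() < true_success[arm]
    alpha[arm] += outcome        # posterior update on the observed outcome
    beta[arm] += 1 - outcome
    assignments.append(arm)

print("fraction assigned to the better arm:", np.mean(np.array(assignments) == 1))
print("posterior mean success rates:", alpha / (alpha + beta))
```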

Here are some things that shouldn't happen, on my analysis: An ad-hoc self-modifying AI as in (1) undergoes a cycle of self-improvement, starting from stupidity, that carries it up to the level of a very smart human - and then stops, unable to progress any further.

I'm sure this has been discussed elsewhere, but to me it seems possible that progress may stop when the mind becomes too complex to make working changes to.

I used to think that a self-improving AI would foom because as it gets smarter, it gets easier for it to improve itself. But it may get ... (read more)

As I understand the post, its idea is that a rationalist should never "start with a bottom line and then fill out the arguments".

I disagree. The idea, rather, is that your beliefs are as good as the algorithm that fills out the bottom line. Doesn't mean you shouldn't start by filling out the bottom line; just that you shouldn't do it by thinking of what feels good or what will win you an argument or by any other algorithm only weakly correlated with truth.

Also, note that if what you write above the bottom line can change the bottom line, that'... (read more)

By trusting Eliezer on MWI, aren't you trusting both his epistemology and his mathematical intuition?

Eliezer believes that the MWI interpretation allows you to derive quantum physics without any additional hypotheses that add complexity, such as collapse or the laws of movement for Bohm's particles. But this belief is based on mathematical intuition, according to the article on the Born probabilities. Nobody knows how to derive the observations without additional hypotheses, but a lot of people such as Eliezer conjecture it's possible. Right?

I feel like th... (read more)

0Vaniver
I would not expect it to be possible to derive the observations without additional postulates; I think that it's possible to do it with any of some partially known set of possible postulates, and the hunt is on for the most palatable postulate. At the time that the QM sequence was written, Eliezer was aware of multiple proposed solutions, none of which he found fully satisfying. For example, consider this new argument whose additional postulate is a specific version of 'locality.' I don't know whether or not Eliezer finds that one satisfying (note that MrMind has a whole list of limitations associated with that argument!).

This reminds me of hitting Ctrl+C, but on a thought process or object of focus instead of a program. After reading, I do it when I suspect I'm about to voluntarily do something I'm going to regret.

EDIT: At least, I think I'm doing it... I haven't done any training approaching the amount of time the training in your post takes.

Yes... if a theory adds to the surprisal of an experimental result, then the experimental result adds precisely the same amount to the surprisal of the theory. That's interesting.

Much like how inches and centimeters are off by a constant factor. Different log bases are analogous to different units.
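A tiny numerical check of that symmetry, with a made-up joint distribution: the amount the theory adds to the surprisal of the result equals the amount the result adds to the surprisal of the theory, since both are the log of the same ratio.

```python
import numpy as np

# Made-up numbers: a theory that makes the observed result less likely.
p_T = 0.3                  # prior probability of the theory
p_E_given_T = 0.1          # probability of the result if the theory is true
p_E_given_notT = 0.5       # probability of the result otherwise

p_E = p_T * p_E_given_T + (1 - p_T) * p_E_given_notT
p_T_given_E = p_T * p_E_given_T / p_E

surprisal = lambda p: -np.log2(p)   # in bits

# How much the theory adds to the surprisal of the result...
delta_result = surprisal(p_E_given_T) - surprisal(p_E)
# ...and how much the result adds to the surprisal of the theory.
delta_theory = surprisal(p_T_given_E) - surprisal(p_T)

print(delta_result, delta_theory)   # equal: both are log2(p_E / p_E_given_T)
```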

The fear of cults, and the related fear of cults of personality, are antimemes against excessive awe of persons.

2Gunnar_Zarncke
Yes. Over time anti-memes against (some?) of these effects develop. But nobody guarantees that memetically stable (meme fixation?) societies develop which sit in arbitrarily low local minima.

(last time I heard the word "jungle" was a Peruvian guy saying his dad grew up in the jungle and telling me about Peruvian native marriage traditions)

Well that was a straightforward answer.

0A1987dM
(I think the last time I heard the word “jungle” used literally to refer to rainforest was probably in Jumanji.)

The metaphor's going over my head. Don't feel obligated to explain though, I'm only mildly curious. But know that it's not obvious to everyone.

5A1987dM
https://en.wikipedia.org/wiki/Jungle#As_metaphor

...my suggestion is that truth-seeking (science etc) has increased in usefulness over time, whereas charisma is probably roughly the same as it has been for a long time.

Yes, and I think it's a good suggestion. I think I can phrase my real objection better now.

My objection is that I don't think this article gives any evidence for that suggestion. The historical storytelling is a nice illustration, but I don't think it's evidence.

I don't think it's evidence because I don't expect evolutionary reasoning at this shallow a depth to produce reliable results.... (read more)

2the-citizen
Cheers; now that we've narrowed down our differences, that's some really constructive feedback. I think I intended it primarily as an illustration and assumed that most people in this context would probably already agree with that perspective, though this could be a bad assumption and it probably makes the argument seem pretty sloppy in any case. It'll definitely need refinement, so thanks. EDIT> My reply attracted downvotes? Odd.

This is from a novel (Three Parts Dead by Max Gladstone). The situation is a man and a woman who have to work together but have trouble trusting each other because of propaganda from an old war:

[Abelard] hesitated, suddenly aware that he was alone with a woman he barely trusted, a woman who, had they met only a few decades before, would have tried to kill him and destroy the gods he served. Tara hated propaganda for this reason. Stories always outlasted their usefulness.

3AlanCrowe
That is an interesting thought. When I try to ground it in contemporary reality my thoughts turn to politics. Modern democratic politics is partly about telling stories to motivate voters, but which stories have outlasted their usefulness? Any answer is likely to be contentious. Turning to the past, I wrote a little essay suggesting that stories of going back to nature to live in a recent golden age when life was simpler may serve as examples of stories that have outlasted their usefulness by a century.

Colin Howson, talking about how Cox's theorem bears the mark of Cox's training as a physicist (source):

An alternative approach is to start immediately with a quantitative notion and think of general principles that any acceptable numerical measure of uncertainty should obey. R.T. Cox and I.J. Good, working independently in the mid nineteen-forties, showed how strikingly little in the way of constraints on a numerical measure yield the finitely additive probability functions as canonical representations. It is not just the generality of the assumptions th

... (read more)

I like this post, I like the example, I like the point that science is newer than debate and so we're probably more naturally inclined to debate. I don't like the apparently baseless storytelling.

In the jungle of our evolutionary childhood, humanity formed groups to survive. In these groups there was a hierarchy of importance, status and power. Predators, starvation, rival groups and disease all took the weak on a regular basis, but the groups afforded a partial protection. However, a violent or unpleasant death still remained a constant threat. It was of

... (read more)
0A1987dM
It didn't even occur to me to interpret “In the jungle of” literally, to the point that I didn't even notice it contained the word “jungle” until I Ctrl-F'd for it.
0CCC
I thought it was near the ocean...
-4the-citizen
LOL it was just a turn of phrase. Genetically speaking, mate availability is a component of survival. My understanding of the forces that increased group size is that they are more complex than either of these (big groups win conflicts for territory, but food availability (via tool use) and travel speed are limiting factors, I believe; big groups only work if you can access a lot of food and move on before stripping the place barren), but I was writing a very short characterisation and I'm happy to acknowledge minor inaccuracies. Perhaps I'll think about tightening up the language or removing that part as you suggest; I probably wrote it far too casually. Nice example, although Hitler did die anyway, and I think a decent part of the reason was his inability to reason effectively and make strategically sound decisions. Of course I think most people are kinda glad he was strategically irrational... In any case I think you're right that charisma is still useful, but my suggestion is that truth-seeking (science etc.) has increased in usefulness over time, whereas charisma is probably roughly the same as it has been for a long time. Perhaps I should make the winning section more storylike to focus on its point rather than it being a scientific guide to that subtopic. Or maybe I just need to rethink it... The core point seems to have been received well at least.

Hmm. Yeah, that's tough. What do you use to calculate probabilities of the principles of logic you use to calculate probabilities?

Although, it seems to me that a bigger problem than the circularity is that I don't know what kinds of things are evidence for principles of logic. At least for the probabilities of, say, mathematical statements, conditional on the principles of logic we use to reason about them, we have some idea. Many consequences of a generalization being true are evidence for a generalization, for example. A proof of an analogous theorem is ... (read more)

Do you know of any cases where this simulation-seeded Gaussian Process was then used as a prior, and updated on empirical data?

Like...

  • uncertain parameters --simulation--> distribution over state

  • noisy observations --standard bayesian update--> refined distribution over state

Cari Kaufman's research profile made me think that's something she was interested in. But I haven't found any publications by her or anyone else that actually do this.
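In case it helps to see what I mean, a minimal sketch (a plain multivariate-normal version rather than a full Gaussian process; all names and numbers are invented): the simulation ensemble supplies a prior mean and covariance over the state, and a standard linear-Gaussian update conditions that prior on noisy observations of part of the state.

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 1: push uncertain parameters through a (stand-in) simulator to get a
# distribution over the state, summarized here by an ensemble mean and covariance.
def simulate(theta):
    return np.array([theta, theta ** 2, np.sin(theta)])   # toy 3-dimensional state

params = rng.normal(1.0, 0.3, size=500)                    # uncertain parameters
states = np.array([simulate(t) for t in params])
prior_mean = states.mean(axis=0)
prior_cov = np.cov(states, rowvar=False)

# Step 2: standard Bayesian (linear-Gaussian) update on noisy observations
# of the first two state coordinates.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])                            # observation operator
R = 0.05 * np.eye(2)                                       # observation noise covariance
y = np.array([1.1, 1.3])                                   # the noisy measurements

S = H @ prior_cov @ H.T + R
K = prior_cov @ H.T @ np.linalg.inv(S)                     # gain
post_mean = prior_mean + K @ (y - H @ prior_mean)
post_cov = prior_cov - K @ H @ prior_cov

print("prior mean:    ", prior_mean)
print("posterior mean:", post_mean)
```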

I actually think that I misread her research description, latching on to the one familiar idea.

0Vaniver
None come to mind, sadly. :( (I haven't read through all of his work, though, and he might know someone who took it in that direction.)

This reminds me of the story of Robert Edgar, who created the DNA and protein sequence alignment program MUSCLE.

He got a PhD in physics, but considers that a mistake. He did his bioinformatics work after selling a company and having free time. The bioinformatics work was notable enough that it's how I know of him.

His blog post, from which I learned this story: https://thewinnower.com/discussions/an-unemployed-gentleman-scholar

added, with whatever little bits of summary I could get by skimming.

It's true that this is a case of logical uncertainty.

However, I must add that in most of my examples, I bring up the benefits of a probabilistic representation. Just because you have logical uncertainty doesn't mean you need to represent it with probability theory.

In protein structure, we already have these Bayesian methods for inferring the fold, so the point of the probabilistic representation is to plug it into these methods as a prior. In philosophy, we want ideal rationality, which suggests probability. In automated theorem proving... okay, yeah, in auto... (read more)

3lackofcheese
Surely probability or something very much like it is conceptually the right way to deal with uncertainty, whether it's logical uncertainty or any other kind? Granted, most of the time you don't want to deal with explicit probability distributions and Bayesian updates because the computation can be expensive, but when you work with approximations you're better off if you know what it is you're approximating. In the area of search algorithms, I think these kinds of approaches are woefully underrepresented, and I don't think it's because they aren't particularly applicable. Granted, I could be wrong on this, because the core ideas aren't particularly new (see, for example, Dynamic Probability, Computer Chess, and the Measurement of Knowledge by I. J. Good). It's an area of research I'm working on right now, so I've spent a fair amount of time looking into it. I could give a few references on the topic, but on the whole I think they're quite sparse.

They wouldn't classify their work that way, and in fact I thought that was the whole point of surveying these other fields. Like, for example, a question for philosophers in the 1600s is now a question for biologists, and that's why we have to survey biologists to find out if it was resolved.

Yes. Because, we're trying to express uncertainty about the consequences of axioms. Not about axioms themselves.

common_law's thinking does seem to be something people actually do. Like, we're uncertain about the consequences of the laws of physics, while simultaneously being uncertain of the laws of physics, while simultaneously being uncertain if we're thinking about it in a logical way. But, it's not the kind of uncertainty that we're trying to model, in the applications I'm talking about. The missing piece in these applications are probabilities conditional on axioms.

1common_law
If you automatically assign the axioms an actually unobtainable certainty, you don't get the rational degree of belief in every proposition, as the set of "propositions" includes those not conditioned on the axioms.

Nice. Links added to post and I'll check them out later. The Duc and Williamson papers were from a post of yours, by the way. Some MIRI status report or something. I don't remember.

I now think you're right that logical uncertainty doesn't violate any of Jaynes's desiderata. Which means I should probably try to follow them more closely, if they don't create problems like I thought they would.

An Aspiring Rationalist's Ramble has a post asserting the same thing, that nothing in the desiderata implies logical omniscience.

Here, the author is keeping in mind Conservation of Expected Evidence. If you could anticipate in advance the direction of any update, you should just update now. You should not expect to be able to get the right answer right away and never need to seriously update it.

There has to be a better way to put this.

The problem is that sometimes you can anticipate the direction. For example, if someone's flipping a coin, and you think it might have two heads. This is a simple example because a heads is always evidence in favor of the two-heads hypothesis, and a... (read more)
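The coin case in numbers, with an arbitrary prior of 0.1 on the two-heads hypothesis: heads always pushes the probability up and tails drops it to zero, but the two pushes balance exactly, so the expected posterior still equals the prior.

```python
prior = 0.1                                  # P(coin has two heads)
p_heads = prior * 1.0 + (1 - prior) * 0.5    # marginal probability of heads

post_given_heads = prior * 1.0 / p_heads     # > prior: heads always confirms
post_given_tails = 0.0                       # tails refutes the hypothesis outright

expected_posterior = p_heads * post_given_heads + (1 - p_heads) * post_given_tails
print(post_given_heads, expected_posterior)  # ~0.182, 0.1 (expected posterior == prior)
```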

You'll note that I don't try to modestly say anything like, "Well, I may not be as brilliant as Jaynes or Conway, but that doesn't mean I can't do important things in my chosen field."

Because I do know... that's not how it works.

Maybe not in your field, but that is how it usually works, isn't it?

(the rest of this comment is basically an explanation of comparative advantage)

Anybody can take the load off of someone smarter, by doing the easiest tasks that have been taking their time.

As a most obvious example, a brilliant scientist's secretary. A... (read more)

I don't understand this yet, which isn't too surprising since I haven't read the background posts yet. However, all the "roughly speaking" summaries of the more exact stuff are enough to show me that this article is talking about something I'm curious about, so I'll be reading in more detail later probably.

This is counterintuitive in an interesting way.

You'd think that since P(Q1|~∀xQx) = 1/2 and P(Q1|∀xQx) = 1, observing Q1 is evidence in favor of ∀xQx.

And it is, but the hidden catch is that this depends on the implication that ∀xQx->Q1, and that implication is exactly the same amount of evidence against ∀xQx.
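For concreteness, the update itself, with an arbitrary prior of 1/2 on ∀xQx:

```python
prior_forall = 0.5
p_Q1 = prior_forall * 1.0 + (1 - prior_forall) * 0.5   # P(Q1) = 0.75
posterior_forall = prior_forall * 1.0 / p_Q1           # = 2/3 > 1/2: Q1 confirms ∀xQx
print(posterior_forall)
```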

It's also an amusing answer to the end of part 1 exercise.

I once had to go to the doctor so he could fish a Lego out of my nose. So, that was worse than eating all the cabbage or spilling all the milk, I think. More scary, and probably more expensive, depending on how the insurance worked out.

4TobyBartels
I think that shape, hardness, and solubility would all make a Lego brick worse than a bean. Really, the only way to tell is probably to try it out. Who wants to volunteer for an experiment?

Truth is really important sometimes, but so far I've been bad about identifying when.

I know a fair bit about cognitive biases and ideal probabilistic reasoning, and I'm pretty good at applying it to scientific papers that I read or that people link through Facebook. But these applications are usually not important.

But, when it comes to my schoolwork and personal relationships, I commit the planning fallacy routinely, and make bad predictions against base rates. And I spend no time analyzing these kinds of mistakes or applying what I know about biases and ... (read more)

The first and third ones, about info sometimes being worthless, just made me think of Vaniver's article on value of information calculations. So, I mean, it sounded very LessWrongy to me, very much the kind of thing you'd hear here.

The second one made me think of nuclear secrets, which made me think of HPMOR. Again, it seems like the kind of thing that this community would recognize the value of.

I think my reactions to these were biased, though, by being told how I was expected to feel about them. I always like to subvert that, and feel a little proud of myself when what I'm reading fails to describe me.

I'm pretty sure that the Cauchy likelihood, like the other members of the t family, is a weighted mixture of normal distributions. (Gamma distribution over the inverse of the variance)

EDIT: There's a paper on this, "Scale mixtures of normal distributions" by Andrews and Mallows, if you want the details
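A quick simulation check of the representation, for the standard Cauchy case: draw a precision (inverse variance) from a Gamma(shape 1/2, rate 1/2), then a normal with that precision; the marginal quantiles match the Cauchy's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200_000

# Precision ~ Gamma(shape=1/2, rate=1/2); numpy's gamma takes a scale, so scale = 1/rate = 2.
precision = rng.gamma(shape=0.5, scale=2.0, size=n)
mixture = rng.normal(0.0, 1.0 / np.sqrt(precision))   # normal with that precision

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(mixture, qs))      # empirical quantiles of the mixture
print(stats.cauchy.ppf(qs))          # standard Cauchy quantiles: should match closely
```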

3Cyan
Oh, for sure it is. But that only gives it a conditionally conjugate prior, not a fully (i.e., marginally) conjugate prior. That's great for Gibbs sampling, but not for pen-and-paper computations. In the three years since I wrote the grandparent, I've found a nice mixture representation for any unimodal symmetric distribution: I don't think it would be too hard to convert this width-weighted-mixture-of-uniforms representation to a precision-weighted-mixture-of-normals representation.

Hmm. Considering that I was trying to come up with an example to illustrate how explicit the assumptions are, the assumptions aren't that explicit in my example are they?

Prior knowledge about the world --> mathematical constraints --> prior probability distribution

The assumptions I used to get the constraints are that the best estimate of your next measurement is the average of your previous ones, and that the best estimate of its squared deviation from that average is some number s^2, maybe the variance of your previous observations. But those aren'... (read more)
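For reference, a sketch of the constrained-maximization step I'm gesturing at, assuming the constraints are exactly the ones described above (a best estimate x̄ for the next measurement and an expected squared deviation s²):

```latex
\max_{p}\; -\int p(x)\,\log p(x)\,dx
\quad \text{subject to} \quad
\int p(x)\,dx = 1, \qquad
\int x\,p(x)\,dx = \bar{x}, \qquad
\int (x-\bar{x})^2\,p(x)\,dx = s^2 .
```

Setting the functional derivative to zero gives p(x) ∝ exp(λ₁x + λ₂(x − x̄)²); for λ₂ < 0 this is a Gaussian, and matching the constraints pins its mean to x̄ and its variance to s².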

it seems like a weird response to say "oh, well who cares about explicit assumptions anyways?"

Yeah, sorry. I was getting a little off topic there. It's just that in your post, you were able to connect the explicit assumptions being true to some kind of performance guarantee. Here I was musing on the fact that I couldn't. It was meant to undermine my point, not to support it.

What does it mean to "assume that the prior satisfies these constraints"?

?? The answer to this is so obvious that I think I've misunderstood you. In my examp... (read more)

2jsteinhardt
Ah my bad! Now I feel silly :). So the prior is this thing you start with, and then you get a bunch of data and update it and get a posterior. In general it's pretty unclear what constraints on the prior will translate to in terms of the posterior. Or at least, I spent a while musing about this and wasn't able to make much progress. And furthermore, when I look back, even in retrospect it's pretty unclear how I would ever test if my "assumption" held if it was a constraint on the prior. I mean sure, if there's actually some random process generating my data, then I might be able to say something, but that seems like a pretty rare case... sorry if I'm being unclear, hopefully that was at least somewhat more clear than before. Or it's possible that I'm just nitpicking pointlessly.

On the other hand, an argument I hear is that Bayesian methods make their assumptions explicit because they have an explicit prior. If I were to write this as an assumption and guarantee, I would write:

Assumption: The data were generated from the prior.

Guarantee: I will perform at least as well as any other method.

While I agree that this is an assumption and guarantee of Bayesian methods, there are two problems that I have with drawing the conclusion that “Bayesian methods make their assumptions explicit”. The first is that it can often be very difficult

... (read more)
2jsteinhardt
What does it mean to "assume that the prior satisfies these constraints"? As you already seem to indicate later in your comment, the notion of "a prior satisfying a constraint" is pretty nebulous. It's unclear what concrete statement about the world this would correspond to. So I still don't think this constitutes a particularly explicit assumption. I'm responding to arguments that others have raised in the past that Bayesian methods make assumptions explicit while frequentist methods don't. If I then show that frequentist methods also make explicit assumptions, it seems like a weird response to say "oh, well who cares about explicit assumptions anyways?"

Okay, and you were just trying to make sure that Manfred knows that all this probability-of-distributions speech you're speaking isn't, as he seems to think, about the degree-of-belief-in-my-current-state-of-ignorance distribution for the first roll. Gotcha.

Okay... but do we agree that the degree-of-belief distribution for the first roll is (1/3, 1/3, 1/3), whether it's a fair die or a completely biased in an unknown way die?

Because I'm pretty sure that's what Manfred's talking about when he says

There is a single correct distribution for our starting information, which is (1/3,1/3,1/3),

and I think him going on to say

the "distribution across possible distributions" is just a delta function there.

was a mistake, because you were talking about different things.

EDIT:

I thought so too, which is w

... (read more)
[This comment is no longer endorsed by its author]

I like your writing style. For something technical, it feels very personal. And you keep it very concise while also easy to read - is there a lot of trimming down that goes on, or do you just write it that way?

0Manfred
Thanks! The content stayed pretty much the same throughout the editing process, but I sanded down some of the rough writing - removing useless words and rewriting confusing paragraphs. I'm a much worse writer when not given a week in advance.

In my experience with Bayesian biostatisticians, they don't talk much about the information a prior represents. But they're also not just using common ones. They talk a lot about its "properties" - priors with "really nice properties". As far as I can tell, they mean two things:

  • Computational properties
  • The way the distribution shifts as you get evidence. They think about this in a lot of detail, and they like priors that lead to behavior they think is reasonable.

I think this amounts to the same thing. The way they think and infer ab... (read more)
