Comment author: LawrenceC 30 January 2017 05:03:31PM *  1 point [-]

Thanks Søren! Could I ask what you're planning on covering in the future? Is this mainly going to be a technical or non-technical reading group?

I noticed that your group seems to have covered a lot of the basic readings on AI Safety, but I'm curious what your future plans are.

Comment author: LawrenceC 22 December 2016 04:14:06PM *  0 points [-]

I haven’t heard much about machine learning used for forecast aggregation. It would seem to me like many, many factors could be useful in aggregating forecasts. For instance, some elements of one’s social media profile may be indicative of their forecasting ability. Perhaps information about the educational differences between multiple individuals could provide insight on how correlated their knowledge is.

I think people are looking into it: The Good Judgment Project team used simple machine learning algorithms as part of their submission to IARPA during the ACE Tournament. One of the PhD students involved in the project wrote his dissertation on a framework for aggregating probability judgments. In the Good Judgment team at least, people are also interested in using ML for other aspects of prediction - for example, predicting if a given comment will change another person's forecasts - but I don't think there's been much success.

I think a real problem is the paucity of data for ML-based prediction aggregation compared to most machine learning projects - a good prediction tournament gets a couple hundred forecasts resolving in a year, at most.
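For concreteness, here is a minimal sketch of the kind of simple aggregation rule this discussion is about: average the forecasts in log-odds space, with an optional "extremizing" exponent. The function name `aggregate` and the `extremize` parameter are illustrative assumptions, not the Good Judgment Project's actual method. With only a couple hundred resolved questions a year, even a single parameter like this is hard to fit reliably.

```python
import math

def aggregate(probs, extremize=1.0):
    """Combine probability forecasts by averaging their log-odds.

    `extremize` is a hypothetical tuning parameter: values above 1 push
    the aggregate away from 0.5, values below 1 pull it toward 0.5.
    Each probability must lie strictly between 0 and 1.
    """
    logits = [math.log(p / (1 - p)) for p in probs]
    mean_logit = sum(logits) / len(logits)
    # Map the (possibly scaled) mean log-odds back to a probability.
    return 1 / (1 + math.exp(-extremize * mean_logit))

forecasts = [0.6, 0.7, 0.8]
plain = aggregate(forecasts)
pushed = aggregate(forecasts, extremize=2.0)
print(plain, pushed)
```

The single knob here would have to be learned from past resolved questions, which is exactly where the small sample size bites.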

Probability density inputs would also require additional understanding from users. While this could definitely be a challenge, many prediction markets already are quite complicated, and existing users of these tools are quite sophisticated.

I think this is a bigger hurdle than you'd expect if you're implementing these for prediction tournaments, though it might be possible to do for prediction markets. (However, I'm curious how you're going to implement the market mechanism in this case.) Anecdotally speaking, many of the people involved in GJ Open are not particularly math or tech savvy, even among the people who are good at prediction.

Comment author: Qiaochu_Yuan 20 December 2016 08:52:05PM 5 points [-]

The bucket diagrams are too coarse, I think; they don't keep track of what's causing what and in what direction. That makes it harder to know what causal aliefs to inspect. And when you ask yourself questions like "what would be bad about knowing X?" you usually already get the answer in the form of a causal alief: "because then Y." So the information's already there; why not encode it in your diagram?

Comment author: LawrenceC 20 December 2016 08:56:48PM 1 point [-]

Fair point.

Comment author: Lumifer 20 December 2016 08:31:52PM 1 point [-]

You are suggesting a kind of Einstein-Bose condensate, where a collective of particles becomes one.

Not at all.

I'm just saying that you have an infinite sequence of spheres with the property X. You're saying that because the sequence is infinite I can't point to the last sphere and therefore can't say anything about it. I'm saying that because all spheres in this sequence have the property X, it doesn't matter that the sequence is infinite.

Comment author: LawrenceC 20 December 2016 08:53:18PM 2 points [-]

I'm just saying that you have an infinite sequence of spheres with the property X. You're saying that because the sequence is infinite I can't point to the last sphere and therefore can't say anything about it. I'm saying that because all spheres in this sequence have the property X, it doesn't matter that the sequence is infinite.

This isn't true in general. Each natural number is finite, but the limit of the natural numbers is infinite. Just because each of the intermediate shapes has property X doesn't mean the limiting shape has property X. Notably, in this case each of the intermediate shapes has a non-zero amount of empty space, but the limiting shape has no empty space.
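A toy illustration of the same point, not the actual sphere-packing construction: assume (hypothetically) that each filling stage removes half of the remaining empty volume of a unit cube. Every finite stage then has strictly positive empty volume, yet the limit is zero.

```python
from fractions import Fraction

def empty_volume(stage, fill_ratio=Fraction(1, 2)):
    """Empty volume left in a unit cube after `stage` filling steps.

    Assumes each step fills a fixed fraction `fill_ratio` of whatever
    is still empty; this ratio is an illustrative assumption only.
    """
    return (1 - fill_ratio) ** stage

# Exact rational arithmetic, so "positive" really means positive.
volumes = [empty_volume(n) for n in range(0, 60, 10)]
for n, v in zip(range(0, 60, 10), volumes):
    print(n, float(v))
```

Every entry is greater than zero ("property X" holds at each stage), but the values shrink below any positive threshold, so the limiting shape has empty volume 0.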

Comment author: Thomas 20 December 2016 11:49:27AM *  1 point [-]

This is my stupid question:

https://protokol2020.wordpress.com/2016/12/14/geometry-problem/

Do not hesitate to patronize me, or whatever it takes; I'd really like to have an answer.

Comment author: LawrenceC 20 December 2016 08:51:58PM *  1 point [-]

Maybe think about the problem this way:

Suppose there was some small ball inside of your super-packed structure that isn't filled. Then we can fill this ball, and so the structure isn't super-packed. It follows that the volume of the empty space inside of your structure has to be 0.

Now, what does your super-packed structure look like, given that it's an empty cube that's been filled?

EDIT: Never mind, just saw that Villiam gave a similar answer.

Comment author: Qiaochu_Yuan 20 December 2016 07:42:01AM 18 points [-]

The bucket diagrams don't feel to me like the right diagrams to draw. I would be drawing causal diagrams (of aliefs); in the first example, something like "spelled oshun wrong -> I can't write -> I can't be a writer." Once I notice that I feel like these arrows are there I can then ask myself whether they're really there and how I could falsify that hypothesis, etc.

Comment author: LawrenceC 20 December 2016 08:44:01PM 1 point [-]

I think they're equivalent in a sense, but that bucket diagrams are still useful. A bucket can also occur when you conflate multiple causal nodes. So in the first example, the kid might not even have a conscious idea that there are three distinct causal nodes ("spelled oshun wrong", "I can't write", "I can't be a writer"), but instead treats them as a single node. If you're able to catch the flinch, introspect, and notice that there are actually three nodes, you're already a big part of the way there.

Comment author: LawrenceC 20 December 2016 08:37:54PM *  0 points [-]

Thanks for posting this! I have a longer reply to Taleb's post that I'll post soon. But first:

When you read Silver (or your preferred reputable election forecaster, I like Andrew Gelman) post their forecasts prior to the election, do you accept them as equal or better than any estimate you could come up with? Or do you do a mental adjustment or discounting based on some factor you think they've left out?

I think it depends on the model. First, note that all forecasting models only take into account a specific set of signals. If there are factors influencing the vote that you're aware of and that you don't think are reflected in the signals, then you should update their forecast to reflect this. For example, because Nate Silver's model was based on polls that lag behind current events, if you had some evidence that a given event was really bad or really good for one of the two candidates, such as the Comey letter or the Trump video, you should update in favor of/against a Trump Presidency before it becomes reflected in the polls.

The math is based on assumptions though that with high uncertainty, far out from the election, the best forecast is 50-50.

Not really. The key assumption is that your forecasts are a Wiener process - a continuous time martingale with normally-distributed increments. (I find this funny because Taleb spends multiple books railing against normality assumptions.) This is kind of a troubling assumption, as Lumifer points out below. If your forecast is continuous (though it need not be), then it can be thought of as a time-transformed Wiener process, but as far as I can tell he doesn't account for the time-transformation.

Everyone agrees that as uncertainty becomes really high, the best forecast is 50-50. Conversely, if you make a confident forecast (say 90-10) and you're properly calibrated, you're also implying that you're unlikely to change your forecast by very much in the future (with high probability, you won't forecast 1-99).
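To put a rough number on "with high probability": if a calibrated forecast is a martingale and currently sits at 90%, Doob's maximal inequality bounds the chance that it ever drops to 1% or below by (1 - 0.9)/(1 - 0.01), about 10%. The simulation below is only a sketch under assumed dynamics: a Bayesian forecaster who updates log-odds on Gaussian signals. The signal model and the names `fraction_of_reversals`, `mu`, and `floor` are my own illustrative choices, not anything from Taleb's or Silver's models.

```python
import math
import random

def fraction_of_reversals(start=0.9, floor=0.01, runs=2000,
                          steps=200, mu=0.5, seed=0):
    """Fraction of simulated forecast paths that ever fall to `floor`.

    Assumed toy model: the outcome is 1 with probability `start`; each
    step the forecaster sees a signal ~ N(+mu, 1) if the outcome is 1
    and N(-mu, 1) otherwise, and updates by Bayes' rule. The resulting
    forecast is a calibrated martingale that starts at `start`.
    """
    rng = random.Random(seed)
    threshold = math.log(floor / (1 - floor))
    hits = 0
    for _ in range(runs):
        outcome = 1 if rng.random() < start else 0
        mean = mu if outcome else -mu
        log_odds = math.log(start / (1 - start))
        for _ in range(steps):
            signal = rng.gauss(mean, 1.0)
            # Log-likelihood ratio of N(mu, 1) vs N(-mu, 1) is 2*mu*signal.
            log_odds += 2 * mu * signal
            if log_odds <= threshold:
                hits += 1
                break
    return hits / runs

# Doob's maximal inequality applied to the nonnegative martingale 1 - p_t.
doob_bound = (1 - 0.9) / (1 - 0.01)
print(fraction_of_reversals(), doob_bound)
```

A forecaster who swings from 90-10 to 1-99 much more often than this bound allows is, under these assumptions, not calibrated.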

I think the question to ask is - how much volatility should make you doubt a forecast? If someone's forecast varied daily between 1-99 and 99-1, you might learn to just ignore them, for example. Taleb tries to offer one answer to this, but makes some questionable assumptions along the way and I don't really agree with his result.

Comment author: Lumifer 30 November 2016 02:16:42AM *  0 points [-]

It's just masturbation with math notation.

We have the election estimate F a function of a state variable W, a Wiener process WLOG

That doesn't look like a reasonable starting point to me.

Going back to the OP...

the process by which two candidates interact is highly dynamic and strategic with respect to the election date

Sure, but it's very difficult to model.

it’s actually remarkable that elections are so incredibly close to 50-50

No, it's not. In a two-party system each party adjusts until it can capture close to 50% of the votes. There is a feedback loop.

When you read Silver (or your preferred reputable election forecaster, I like Andrew Gelman) post their forecasts prior to the election, do you accept them as equal or better than any estimate you could come up with?

I'm an arrogant git, so I accept them as a bit worse :-P To quote an old expression, (historical-) data driven models are like driving while looking into a rearview mirror. Things will change. In this particular case, the Brexit vote showed that under the right conditions people who do not normally vote (and so are ignored by historical-data models) will come out of the woodwork.

to know the true answer

Eh, the existence of a "true answer" is doubtful. If you have a random variable, is each instantiation of it a "true answer"? You end up with a lot of true answers...

Comment author: LawrenceC 20 December 2016 08:12:32PM *  0 points [-]

We have the election estimate F a function of a state variable W, a Wiener process WLOG

That doesn't look like a reasonable starting point to me.

That's fine actually, if you assume your forecasts are continuous in time, then they're continuous martingales and thus equivalent to some time-changed Wiener process. (EDIT: your forecasts need not be continuous, my bad.) The problem is that he doesn't take into account the time transformation when he claims that you need to weight your signal by 1/sqrt(t).
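For reference, the result being invoked is the Dambis-Dubins-Schwarz theorem, sketched here in my own notation (F for the forecast, B for the Wiener process, and the angle brackets for quadratic variation; none of this is Taleb's notation):

```latex
% Dambis-Dubins-Schwarz: a continuous martingale F is a Wiener process B
% (possibly defined on an enlarged probability space) run on the random
% clock given by its quadratic variation.
\[
  F_t - F_0 = B_{\langle F \rangle_t}, \qquad t \ge 0 .
\]
% For a bounded (hence square-integrable) martingale this gives
\[
  \operatorname{Var}(F_t - F_0) = \mathbb{E}\,\langle F \rangle_t ,
\]
% which equals t only when the clock is calendar time,
% i.e. when \langle F \rangle_t = t.
```

So a scaling in sqrt(t) is natural in the Wiener process's own clock, but it carries over to calendar time only if the time change is the identity.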

He also has a typo in his statement of Ito's Lemma which might affect his derivation. I'll check his math later.

Comment author: LawrenceC 24 January 2016 06:03:55PM 0 points [-]

Can you give a link to posts showing elitism in EA that weren't written in response to this one?

Comment author: [deleted] 22 December 2015 05:27:17PM 0 points [-]

So what if p(H) = 1, p(H|A) = .4, p(H|B) = .3, and p(H|C) = .3? The evidence would suggest all are wrong. But I have also determined that A, B, and C are the only possible explanations for H. Clearly there is something wrong with my measurement, but I have no method of correcting for this problem.

In response to comment by [deleted] on Open thread, Dec. 21 - Dec. 27, 2015
Comment author: LawrenceC 22 December 2015 05:31:41PM 0 points [-]

Wait, how would you get P(H) = 1?
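Taking the numbers literally, and assuming A, B, and C are mutually exclusive and exhaustive: the law of total probability makes P(H) a weighted average of P(H|A), P(H|B), and P(H|C), so it can't exceed the largest of them. A quick check (the helper names are just for illustration):

```python
def marginal(conditionals, priors):
    """P(H) = sum over X of P(H | X) * P(X), for exclusive, exhaustive X."""
    assert abs(sum(priors.values()) - 1.0) < 1e-9, "priors must sum to 1"
    return sum(conditionals[k] * priors[k] for k in conditionals)

def max_marginal(conditionals):
    """Largest P(H) reachable under any choice of priors over the X's."""
    return max(conditionals.values())

conditionals = {"A": 0.4, "B": 0.3, "C": 0.3}

# Equal priors, then all prior mass on the most favourable explanation.
uniform = marginal(conditionals, {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3})
best_case = marginal(conditionals, {"A": 1.0, "B": 0.0, "C": 0.0})
print(uniform, best_case, max_marginal(conditionals))
```

Under those assumptions P(H) tops out at 0.4, so P(H) = 1 isn't consistent with the stated conditionals.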
