All of Will_Sawin's Comments + Replies

Just a quick note on your main example - in math, and I'm guessing in theoretical areas of CS as well, we often find that searching for fundamental obstructions to a solution is the very thing that allows us to find the solution. This is true for a number of reasons. First, if we find no obstructions, we are more confident that there is some way to find a solution, which always helps. Second, if we find a partial obstruction to solutions of a certain sort, we learn something crucial about how a solution must look. Third, and perhaps most importantly, when we...

Thank you for all these interesting references. I enjoyed reading all of them, and rereading in Thurston's case.

Do people pathologize Grothendieck as having gone crazy? I think most people just regard him as a little bit strange. The story I heard was that, because of philosophical objections to military funding and personal conflicts with other mathematicians, he left the community and was more or less refusing to speak to anyone about mathematics, and people were sad about this and wished he would come back.

0VoiceOfRa
From the details I'm aware of "gone crazy" is not a bad description of what happened.
3JonahS
His contribution to math is too great for people to have explicitly adopted a stance that was too unfavorable to him, and many mathematicians did in fact miss him a lot. But as Perelman said: "Of course, there are many mathematicians who are more or less honest. But almost all of them are conformists. They are more or less honest, but they tolerate those who are not honest." He has also said that "It is not people who break ethical standards who are regarded as aliens. It is people like me who are isolated." If pressed, many mathematicians downplay the role of those who behaved unethically toward him and the failure of the community to give him a job, in favor of a narrative of "poor guy, it's so sad that he developed mental health problems."

One thing that most scientists in these soft sciences already have a good grasp on, but a lot of laypeople do not, is the idea of appropriately normalizing parameters. For instance, dividing something by the mass of the body, or the population of a nation, to compare individuals or nations of different sizes.

People will often make bad comparisons where they don't normalize properly. But hopefully most people reading this article are not at risk for that.

Conservation gives a local symmetry but there may not be a global symmetry.

For instance, you can imagine a physical system with no forces at all, so everything is conserved. But there are still some parameters that define the locations of the particles. Then the physical system is locally very symmetric, but it may still have an asymmetric global structure, where the particles are constrained to lie on a surface of nontrivial topology.

Do you often read physicists' responses to claims of FTL signalling? It seems to me like there is not much value in reading these, per the quote.

No, you should focus on founding a research field, which mainly requires getting other people interested in the research field.

I don't think that's really relevant to the original quote.

True, but that doesn't mean we're laboring in the dark. It just means we've got our eyes closed.

-3Eugine_Nier
Unfortunately, the people involved have an incentive to keep them closed.

I would be interested in a post about how to acquire political knowledge!

10% isn't that bad as long as you continue the programs that were found to succeed and stop the programs that were found to fail. Come up with 10 intelligent-sounding ideas, obtain expert endorsements, do 10 randomized controlled trials, get 1 significant improvement. Then repeat.

10% isn't that bad as long as you continue the programs that were found to succeed and stop the programs that were found to fail.

Unfortunately we don't really have the political system to do this.

0ThrustVectoring
It depends on how many completely ineffectual programs would demonstrate improvement versus current practices.
4Eugine_Nier
Unfortunately, governments are really bad at doing this.

None of those sound like they require military intervention?

0wwa
True, not yet, at least. Would you agree, though, that this could easily escalate out of proportion?
1benkuhn
At least according to Val, activating System 2 requires SNS activity.

I rated the second question as more likely than the first because I think "most traits" means something different in the two questions.

Only this particular thing.

That's what the Great Filter is, no?

[This comment is no longer endorsed by its author]

It would be amusing if the single primary reason that the universe is not buzzing with life and civilization is that any sufficiently advanced society develops terminology and jargon too complex to be comprehensible, and inevitably collapses because of that.

0Kawoomba
¿Qué?

For that purpose a better example is a computationally difficult statement, like "There are at least X twin primes below Y". We could place bets, and then acquire more computing power, and then resolve bets.

The mathematical theory of statements like the twin primes conjecture should be essentially the same, but simpler.
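To make the twin-prime example concrete, here is a minimal sketch of how such a bet could be settled by brute force once the computing power is available; the library choice and the sample values of X and Y are my own illustrative assumptions, not part of the original comment.

```python
# Minimal sketch: settle a bet on "there are at least X twin primes below Y"
# by direct computation. sympy and the sample X, Y are illustrative choices.
from sympy import isprime

def twin_primes_below(y):
    """Count pairs (p, p + 2) with both members prime and below y."""
    return sum(1 for p in range(2, y - 2) if isprime(p) and isprime(p + 2))

def settle_bet(x, y):
    """True iff there are at least x twin prime pairs below y."""
    return twin_primes_below(y) >= x

print(settle_bet(8, 100))  # the 8 twin prime pairs below 100 make this True
```

For astronomically large Y the statement stays perfectly well-defined; only the cost of running this check changes, which is what makes it a good target for bets that are resolved after acquiring more computing power.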

Nope. The key point is that as computing power becomes lower, Abram's process allows more and more inconsistent models.

So does every process.

The probability of a statement appearing first in the model-generating process is not equal to the probability that it's modeled by the end.

True. But for two very strong statements that contradict each other, there's a close relationship.

What is "the low computing power limit"? If our theories behave badly when you don't have computing power, that's unsurprising. Do you mean "the large computing power limit".

I think probability ( "the first 3^^^3 odd numbers are 'odd', then one isn't, then they go back to being 'odd'." ) / probability ("all odd numbers are 'odd'") is approximately 2^-(length of "3^^^3") in Abram's system, because the ratio of the probabilities of them appearing in the random process is supposed to equal this ratio. I don't see anything about the random process that would make the first one more likely to be contradicted before being stated than the second.

0Manfred
Nope. The key point is that as computing power becomes lower, Abram's process allows more and more inconsistent models. The probability of a statement appearing first in the model-generating process is not equal to the probability that it's modeled by the end.

Yeah, updating probability distributions over models is believed to be good. The problem is, sometimes our probability distributions over models are wrong, as demonstrated by bad behavior when we update on certain info.

The kind of data that would make you want to zero out non-90% models is when you observe a bunch of random data points, 90% of which are true, but with no other patterns you can detect.

The other problem is that updates can be hard to compute.

It's actually not too hard to demonstrate things about the limit for Abram's original proposal, unless there's another one that's original-er than the one I'm thinking of. It converges to the distribution of outcomes of a certain incomputable random process, which uses a halting oracle to tell when certain statements are contradictory.

You are correct that it doesn't converge to a limit of assigning 1 to true statements and 0 to false statements. This is of course impossible, so we don't have to accept it. But it seems like we should not have to accept divergence - believing something with high probability, then disbelieving with high probability, then believing again, etc. Or perhaps we should?

Abram Demski's system does exactly this if you take his probability distribution and update on the statements "3 is odd", "5 is odd", etc. in a Bayesian manner. That's because his distribution assigns a reasonable probability to statements like "odd numbers are odd". Updating gives you reasonable updates on evidence.
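As a toy illustration of this kind of update (my own sketch, with a small stand-in for 3^^^3 and made-up description lengths, not Abram's actual construction):

```python
# Toy sketch: a length-penalized prior over two hypotheses about which
# odd numbers are "odd", updated by zeroing out contradicted hypotheses.
# The hypotheses, lengths, and 2**-length prior are illustrative stand-ins.

K = 1000  # stand-in for 3^^^3, which is far too large to simulate

hypotheses = {
    # name: (prediction for the n-th odd number, description length in bits)
    "all odd numbers are odd": (lambda n: True, 10),
    "first K odds are odd, then one isn't": (lambda n: n != K, 40),
}

posterior = {h: 2.0 ** -length for h, (_, length) in hypotheses.items()}

# Observe that the first 500 odd numbers are all odd; both hypotheses
# survive, so only the normalization changes.
for n in range(500):
    for h, (predict, _) in hypotheses.items():
        if not predict(n):
            posterior[h] = 0.0  # contradicted hypotheses drop to zero

total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}
print(posterior)  # the simple hypothesis keeps ~2^30 times the weight
```

Under a prior like this, observing many odd numbers behaving as expected never shifts weight toward the gerrymandered hypothesis, so the predicted parity of the next odd number stays sensible.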

0Manfred
His distribution also assigns a "reasonable probability" to statements like "the first 3^^^3 odd numbers are 'odd', then one isn't, then they go back to being 'odd'." In the low computing power limit, these are assigned very similar probabilities. Thus, if the first 3^^^3 odd numbers are 'odd', it's kind of a toss-up what the next one will be. Do you disagree? If so, could you use math in explaining why?

Doesn't the non-apocryphal version of that story have some relevance?

http://en.wikipedia.org/wiki/Space_Pen

http://www.snopes.com/business/genius/spacepen.asp

Using a space pencil could cause your spaceship to catch fire. Sometimes it pays to be careful.

Suppose I am deciding now whether to one-box or two-box on the problem. That's a reasonable supposition, because I am deciding now whether to one-box or two-box. There are a couple possibilities for what Omega could be doing:

  1. Omega observes my brain, and predicts what I am going to do accurately.
  2. Omega makes an inaccurate prediction, probabilistically independent from my behavior.
  3. Omega modifies my brain into a being it knows will one-box or will two-box, then makes the corresponding prediction.

    If Omega uses predictive methods that aren't 100% effective, I ...

We believe in the forecasting power, but we are uncertain as to what mechanism that forecasting power is taking advantage of to predict the world.

Analogously, I know Omega will defeat me at chess, but I do not know which opening move he will play.

In this case, the TDT decision depends critically on which causal mechanism underlies that forecasting power. Since we do not know, we will have to apply some principles for decision under uncertainty, which will depend on the payoffs, and on other features of the situation. The EDT decision does not. My intuition...

2pallas
I agree that it is challenging to assign forecasting power to a study, as we're uncertain about lots of background conditions. There is forecasting power to the degree that the set A of all variables involved with previous subjects allows for predictions about the set A' of variables involved in our case. But when we deal with Omega, who is defined to make true predictions, we need to take this forecasting power into account, no matter what the underlying mechanism is. I mean, what if Omega in Newcomb's Problem was defined to make true predictions and you didn't know anything about the underlying mechanism? Wouldn't you one-box after all? Let's call Omega's prediction P and the future event F. Once Omega's predictions are defined to be true, we can denote the following logical equivalences: P(1-boxing) <--> F(1-boxing) and P(2-boxing) <--> F(2-boxing). Given these conditions, it is impossible to 2-box when box B is filled with a million dollars (you could also formulate it in terms of probabilities, where such an impossible event would have probability 0). I admit that we have to be cautious when we deal with instances that are not defined to make true predictions. My answer depends on the specific set-up. What exactly do we mean by "It is well-known"? It doesn't seem to be a study that would describe the set A of all factors involved, which we then could use to derive the A' that applies to our own case. Unless we define "It is well-known" as an instance that allows for predictions in the direction A --> A', I see little reason to assume forecasting power. Without forecasting power, screening off applies and it would be foolish to train the distinctive manner of speaking. If we specified the game in a way that there is forecasting power at work (or at least we had reason to believe so), then depending on your definition of choice (I prefer one that is devoid of free will) you can or cannot choose the gene. These kinds of thoughts are listed here or in the section "N

Arguably trying for apostasy, failing due to motivated cognition, and producing only nudging is a good strategy that should be applied more broadly.

2Vaniver
A good strategy for what ends?

So that I can better update on this information, can you tell me what the first exercise is?

Even if true announcements are just 9 times more likely than false announcements, a true announcement should raise your confidence that the lottery numbers were 4 2 9 7 9 3 to 90%. This is because the probability P(429793 announced | 429793 is the number) is just the probability of a true announcement, but the probability P(429793 announced | 429793 is not the number) is the probability of a false announcement, divided by a million.

A false announcer would have little reason to fake the number 429793. This already completely annihilates the prior probability.
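The arithmetic above can be made explicit. A small sketch, using the 9-to-1 ratio and the one-in-a-million prior from the comment (the uniform-fake assumption is the "divided by a million" step):

```python
# Worked Bayes calculation for the lottery announcement example.
prior = 1e-6       # P(429793 is the winning number), one number in a million
p_true = 0.9       # P(the announcement is true)
p_false = 0.1      # P(the announcement is false)

# Likelihoods of hearing "429793" announced:
like_if_number = p_true           # a true announcement names the real number
like_if_not = p_false / 999_999   # a fake lands on 429793 ~once in a million

posterior = (like_if_number * prior) / (
    like_if_number * prior + like_if_not * (1 - prior)
)
print(round(posterior, 3))  # ~0.9: the announcement alone gets you to ~90%
```

The millionfold prior against any particular number is cancelled by the millionfold unlikelihood of a fake landing on that exact number, leaving only the 9-to-1 ratio between true and false announcements.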

You said arbitrarily large finite models, however. First-order arithmetic has no finite models. : )

2So8res
Oh, yeah, that's a typo. Fixed, thanks.

I don't see how 3 and 4 are stronger than 1 and 2. They are just the special cases of 1 and 2 where the sentence is a contradiction.

Arbitrarily large finite models are certainly not allowed in the theory of arithmetic.

2So8res
3 and 4 are generalizations to sets of sentences. But you're right, the generalization is pretty simple. Arbitrarily large models are allowed in the first-order theory of arithmetic, and no first-order theory of arithmetic can restrict models to only the integers. This is one of the surprising results of compactness.
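The compactness argument mentioned here can be spelled out in a few lines; this is the standard textbook sketch, not anything specific to this thread:

```latex
% Standard compactness sketch: no first-order theory true of the naturals
% can exclude nonstandard (infinite) elements.
Let $T$ be the set of all first-order sentences true in $\mathbb{N}$, and
extend the language with a new constant $c$. Consider
\[
  T' \;=\; T \cup \{\, c > \underline{n} \;:\; n \in \mathbb{N} \,\},
\]
where $\underline{n}$ is the numeral for $n$. Any finite subset of $T'$
mentions only finitely many of the axioms $c > \underline{n}$, so it is
satisfied by $\mathbb{N}$ with $c$ interpreted as a large enough number.
By compactness, $T'$ has a model, and in it $c$ exceeds every standard
integer: a model of full first-order arithmetic that is not $\mathbb{N}$.
```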

Due to the Pareto improvement problem, I don't think this actually describes what people mean by the word "trade".

You do get to pick the languages first because there is a large but finite (say no more than 10^6) set of reasonable languages-modulo-trivial-details that could form the basis for such a measurement.

This sample may be unrepresentative. At least one researcher would have been perfectly happy talking about the researchers' lives.

How useful is it to clarify EDT until it becomes some decision theory with a different, previously determined name?

6Qiaochu_Yuan
It would be useful for my mental organization of how decision theory works. I don't know if it would be useful to anyone else though.

This is clearly not true for proposal 2. No matter the formal system, you will find a proof (YouDefect => OpponentCooperate), and therefore defect.

You can search for reasons to cooperate in a much stronger formal system than you search for reasons to defect in. Is there any decision-theoretic justification for this?

0dankane
If you do that, you're back in the same situation that you started with and are cooperating with CooperateBot again.
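For readers who want to experiment with the provability-based agents discussed in this exchange: their behavior can be approximated by evaluating them in stages, where "provable" at stage k is read as "true at every earlier stage" (vacuously true at stage 0). This is a simplified sketch in the spirit of the modal-agents literature, with agent definitions chosen for illustration rather than taken from the post:

```python
# Sketch: evaluate provability-style agents by staged approximation.
# "Provable at stage k" ~ "held at every stage below k" (vacuous at stage 0).

def play(agent_a, agent_b, stages=10):
    """Iterate both agents' actions; values stabilize for simple agents."""
    a_hist, b_hist = [], []
    for _ in range(stages):
        a_next = agent_a(b_hist)   # each agent reasons only about the
        b_next = agent_b(a_hist)   # opponent's behavior at earlier stages
        a_hist.append(a_next)
        b_hist.append(b_next)
    return a_hist[-1], b_hist[-1]  # True = cooperate, False = defect

def fairbot(opp_hist):
    # cooperate iff "opponent cooperates with me" is provable
    return all(opp_hist)

def defectbot(opp_hist):
    return False  # never cooperates

def cooperatebot(opp_hist):
    return True   # always cooperates

print(play(fairbot, fairbot))       # (True, True): Loebian mutual cooperation
print(play(fairbot, defectbot))     # (False, False): no proof, so defect
print(play(fairbot, cooperatebot))  # (True, True)
```

The asymmetric proof search asked about above would roughly correspond to giving the cooperation check more stages (a stronger system) than the defection check; whether that asymmetry has a decision-theoretic justification is exactly the open question in the comment.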

I would guess that current laws do not allow this, and that changing the laws would not be a good way to increase total donations, because it would strike people as a bad signal and they wouldn't want to do it. If you want more gifts next holiday season, should you offer your relatives the ability to give you refundable gifts?

I would worry more about negative flow-through effects of a decline in trust and basic decency in society. I think those are much clearer than the flow-through effects of positive giving. I'm not sure if this outweighs the 20-to-1 ratio.

Most physicists most of the time aren't Dirac, Pauli, Yang, Mills, Feynman, Witten, etc.

-2JonahS
No, but my impression is that the physics culture has been more influenced by the MWA style than mathematical culture has. In particular, my impression is that most physicists understand "the big picture" (which has been figured out by using MWAs) whereas in my experience, most mathematicians are pretty focused on individual research problems.

But mathematicians also frequently dream up highly nontrivial things that are true, that mathematicians (and physicists) don't understand sufficiently well to be able to prove even after dozens of years of reflection. The Riemann hypothesis is almost three times as old as quantum field theory. There are also the Langlands conjectures, Hodge conjecture, etc., etc. So it's not clear that something fundamentally different is going on here.

-2JonahS
I agree that the sort of reasoning that physicists use sometimes shows up in math. I don't think that the Riemann hypothesis counts as an example: as you know, its truth is suggested by surface heuristic considerations, so there's a sense in which it's clear why it should be true. I think that the Langlands program is an example: it constitutes a synthesis of many known number theoretic phenomena that collectively hinted at some general structure: they can be thought of as "many weak arguments" for the general conjectures. But the work of Langlands, Shimura, Grothendieck and Deligne should be distinguished from the sort of work that most mathematicians do most of the time, which tends to be significantly more skewed toward deduction. From what I've heard, quantum field theory allows one to accurately predict certain physical constants to 8 decimal places, with the reasons why the computations work very unclear. But I know essentially nothing about this. As I said, I can connect you with my friend for details.

One way I would think about PrudentBot is not as trying to approximate a decision theory, but rather as a response to the features of this particular format, where diverse intelligent agents are examining your source code. Rather than submitting a program that makes optimal decisions, you submit a program which is simplified somewhat, in a way that errs on the side of cooperation, to make it easier for people who are trying to cooperate with you.

But something about my reasoning is wrong, as it doesn't fit very well with the difference between the actual code of ADTBot and PrudentBot.

I've just seen the claim that von Neumann had a fake proof in a couple of places, and it always bothers me, since it seems to me that one can construct a hidden variable theory that explains any set of statistical predictions. Just have the hidden variables be the response to every possible measurement! Or various equivalent schemes. One needs a special condition on the type of hidden variable theory, like the locality condition in Bell's theorem.
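To make the construction concrete, here is a minimal sketch (my own illustration, with made-up settings and probabilities) of the trivial hidden-variable scheme described above: the hidden state simply pre-assigns an outcome to every possible measurement, sampled so each measurement's statistics come out right.

```python
# Trivial "hidden variable" model: lambda pre-assigns an outcome to every
# possible measurement setting. Any single-measurement statistics are then
# reproduced exactly; only extra structural conditions (such as Bell's
# locality requirement) can rule this kind of construction out.
import random

outcome_probs = {"spin_x": 0.5, "spin_z": 0.8}  # P(outcome = +1) per setting

def sample_hidden_variable():
    """lambda: a complete table of responses to every possible measurement."""
    return {s: (+1 if random.random() < p else -1)
            for s, p in outcome_probs.items()}

def measure(hidden, setting):
    return hidden[setting]  # the outcome was fixed before the measurement

# The statistics of whichever measurement is performed match the target.
trials = [measure(sample_hidden_variable(), "spin_z") for _ in range(100_000)]
print(sum(o == +1 for o in trials) / len(trials))  # ~0.8
```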

There might just be a terminological distinction here. When I think of the reasoning used by mathematicians/physicists, I think of the reasoning used to guess what is true - in particular to produce a theory with >50% confidence. I don't think as much of the reasoning used to get you from >50% to >99%, because this is relatively superfluous for a mathematician's utility function - at best, it doubles your efficiency in proving theorems. Whereas you are concerned more with getting >99%.

This is sort of a stupid point but Euler's argument does not...

0JonahS
If you're really curious, you can talk with my chauffeur (who has deep knowledge on this point).
3JonahS
I don't have much more evidence, but I think that it's significant that: 1. Physicists developed quantum field theory in the 1950s, and it still hasn't been made mathematically rigorous, despite the fact that, e.g., Richard Borcherds appears to have spent 15 years (!!) trying. 2. The mathematicians I know who have studied quantum field theory have indicated that they don't understand how physicists came up with the methods that they did. These suggest that the physicists who invented this theory reasoned in a very different way from how mathematicians usually do.
0elharo
I won't pretend to be able to reproduce it here. You can find the original in English translation in von Neumann's Mathematical Foundations of Quantum Mechanics. According to Wikipedia, the status of the proof is still being debated. So apparently we're still trying to figure out if this proof is acceptable or not. Note, however, that Bub's claim is that the proof didn't actually say what everyone thought it said, not that Bohm was wrong. Thus we have another possible failure mode: a correct proof that doesn't say what people think it says. This is not actually as uncommon as it should be, and goes way beyond math. There are many examples of well-known "facts" for which numerous authoritative citations can be produced, but that are in reality false. For example, the lighthouse and aircraft carrier story is in fact false, despite "appearing in a 1987 issue of Proceedings, a publication of the U.S. Naval Institute." Of course, as I type this I notice that I haven't personally verified that the 1987 issue of Proceedings says what Stephen Covey's The Seven Habits of Highly Effective People, the secondary source that cited it, says it says. This is how bad sources work their way into the literature. Too often authors copy citations from each other without going back to the original. How many of us know about experiments like Robbers Cave or Stanford Prison only from HPMoR? What's the chance we've explained it to others, but gotten crucial details wrong?

But that still doesn't tell you whether to invest in the startup. If an ORSA-er is just paralyzed by indecision here and decides to leave VC and go into theoretical math or whatever, he or she is not really winning.

Unrelatedly, a fun example of MWA triumphing over ORSA could be geologists vs. physicists on the age of the Earth.

0JonahS
I would guess that ORSA doesn't suffice to be a successful VC. The claim is that it could help, in conjunction with MWAs. If you scrutinize the weak arguments and find that they break down in different ways, then that suggests that the arguments are independent, and that you should invest in the start-up. If you find that they break down in the same way, then that suggests that you shouldn't invest in the start-up.

Personally, I found the quantitative majors example a very vivid introduction to this style of argument, much more so than the Penrose example. I think the quantitative majors example does a very good job of illustrating the kind of reasoning you are supporting, and why it is helpful. I don't understand the relevance of many weak arguments to the Penrose debate - it seems like a case of some strong and some weak arguments vs. one weak argument, or something. If others are like me, a different example might be more helpful.

3JonahS
In hindsight, my presentation in this article was suboptimal. I clarify in a number of comments on this thread. The common thread that ties together the quantitative majors example and the Penrose example is "rather than dismissing arguments that appear to break down upon examination, one should recognize that such arguments often have a nontrivial chance of succeeding owing to model uncertainty, and one should count such arguments as evidence." In the case of the quantitative majors example, the point is that you can amass a large number of such arguments to reach a confident conclusion. In the Penrose example, the point is that one should hedge rather than concluding that Penrose is virtually certain to be wrong. I can give more examples of the use of MWAs to reach a confident conclusion. They're not sufficiently polished to post, so if you're interested in hearing them, shoot me an email at jsinick@gmail.com.