This is a place for matching researchers with research ideas in the applied sciences that could have a sizeable impact on the human condition.

Edit: let there be janitors.

What are the so-called adult problems in your field and who, in your opinion, is needed most to solve them?

Add reasons why you are not working on them yourself. (I, for example, am in a PhD program very remote from practical applications and might have a chance to do some research on useful stuff if I survive that long.)


We are composed of atoms....

I muse on the vasty Everything...

As a metacomment, if after reading the title and a few paragraphs I still don't know where an article is going, I'm outta here.

Do other people like "meandering stream of consciousness that might and might not someday get to a point" articles? I see more and more of this in magazines and newspapers.

[anonymous], 9y

Thank you, edited.

Let me throw in some external references:

Not sure why the word rationality appears in your post.

I can't tell how much of your problem is designing the study, and how much is analyzing data someone already collected. Can't you talk to a statistician at your Uni?

[anonymous], 9y

Well, of course, if I wanted a real contribution I would have written a far more specific research proposal, but there are no statisticians in our institute that I know of :)

I was wondering, actually, what difference there is between posing a problem to a statistician and posing it to a broadly educated 'rationalist'. If rationalists think they should win, then why do they only win in some narrow lanes of life? There's plenty about fighting akrasia on LW, which, however much impact it might have in one's life, is a small-scale issue. There's plenty of AI discussion, which is giant-scale. Examples of middle-scale things I can remember without looking are Meal squares and something about one LWer setting up a business to help others find work in Australia. Thus the strange post :)

I was wondering, actually, what difference there is between posing a problem to a statistician and posing it to a broadly educated 'rationalist'.

If the statistician is competent, he has a specialized toolset and the necessary skills to use it -- something that a generalist usually lacks.

The question is similar to "what's the difference between complaining about some pain to a doctor and to a broadly educated rationalist".

Moreover, stats is one of those fields where "A little knowledge is a dangerous thing" applies in spades...

The rationalist record with stats isn't that great, imo. Stats isn't so easy to learn...

This is kind of a strange post, but I suppose I might as well try.

I am working on comperical linguistics, which means approaching the problems of linguistics by working on large-scale, lossless compression of text data. The rationale for this research is laid out in my book. I've made nice progress over the last year or two, but there is still a lot of work to be done. Currently the day-to-day work is about 60% software engineering, 30% linguistic analysis (random recent research observation: there is an important difference between relative clauses where the headword noun functions as the subject vs. the object of the clause; in one case it is permissible to drop the Wh-word complementizer, in the other it isn't) and 10% ML/stats. I am hopeful that the stats/math component will increase in the near term and the SE work will decrease, as I've already done a lot of the groundwork. If this kind of research sounds interesting to you, let me know and maybe we can figure out a way to collaborate.

More broadly, I would like to encourage people to apply the comperical philosophy to other research domains. The main requirement is that it be possible to obtain a large amount of rich, structured (but not labelled) data. For example, I would love to see someone try to compress the Sloan Digital Sky Survey database. Compressors for this data set will need to have a deep "understanding" of physics and cosmology. The compression principle might be a way to resolve debates about topics like dark matter and black holes. It could also focus attention on segments of the astronomical data that defy conventional explanation (this would be detected as a spike in the codelength for a region). Such abnormalities might indicate the presence of alien activity or new types of astrophysical phenomena.
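As a rough sketch of the "spike in codelength" idea, consider the following illustrative Python snippet. It assumes a bits_per_record function is already available (for example, derived from a trained compressor) and that the survey records have been grouped into named sky regions; both the function and the grouping are hypothetical, not something from the post itself.

    from statistics import mean, stdev

    def flag_anomalous_regions(regions, bits_per_record, z_threshold=3.0):
        """Flag sky regions whose records compress unusually badly, i.e. whose
        average codelength is a z_threshold-sigma outlier across all regions."""
        # regions: dict mapping region name -> list of records (assumed structure)
        avg_bits = {name: mean(bits_per_record(rec) for rec in records)
                    for name, records in regions.items()}
        mu, sigma = mean(avg_bits.values()), stdev(avg_bits.values())
        return [name for name, bits in avg_bits.items()
                if sigma > 0 and (bits - mu) / sigma > z_threshold]

Regions that the compressor's implicit physics cannot model well would show up here as outliers in average codelength.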

If you think that this method could be useful for astronomy in the future, can you point to astronomical controversies in the past that it would have helped with?

You can't resolve debates unless there are actual competing theories. The main problem with dark matter is that people don't have theories precise enough to compare. When there are competing theories, astronomy appears to me pretty good at isolating the relevant data and comparing the theories.

And I imagine that is why you are working on linguistics, not astronomy. Astronomy has a solid grounding that allows it to process and isolate data, while a lot of linguistic claims require agreement on labeling before they can be assessed, making it easy for people to hide behind subjective labeling.

The compression principle might be a way to resolve debates about topics like dark matter and black holes.

I don't follow... Can you elaborate on how some specific form of compression could do that?

I can't explain the whole philosophy here, but basically the idea is: you have two theories, A and B. You instantiate them as lossless data compressors, and invoke the compressors on the dataset. The one that produces a shorter net codelength (including the length of the compressor program itself) is superior. In practice the rival theories will probably be very similar and produce different predictions (= probability distributions over observational outcomes) only on small regions of the dataset.
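A minimal sketch of that comparison, assuming each theory is wrapped in a hypothetical object with a predict(prefix) method returning a probability distribution (as a dict) over the next symbol and a program_length_bits constant; the names are illustrative, not from the post:

    import math

    def codelength_bits(theory, dataset):
        """Net codelength in bits: the length of the theory's own program
        plus the cost of encoding the data under the theory's predictions."""
        data_bits = 0.0
        for i, symbol in enumerate(dataset):
            # probability the theory assigns to the observed symbol given the prefix
            p = theory.predict(dataset[:i]).get(symbol, 1e-12)  # tiny floor: unexplained data costs ~40 bits
            data_bits += -math.log2(p)
        return theory.program_length_bits + data_bits

    def compare(theory_a, theory_b, dataset):
        """The theory achieving the shorter net codelength is judged superior."""
        a, b = codelength_bits(theory_a, dataset), codelength_bits(theory_b, dataset)
        return ("A" if a < b else "B"), a, b

The probability floor in the sketch is what produces the "Black Swan penalty" mentioned below: data a theory cannot explain at all still gets encoded, just at a very long codelength.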

Lossless data compression is a highly rigorous evaluation principle. Many theories are simply not well-specified enough to be built into compressors; these theories, I say (reformulating Popper and Yudkowsky), should not be considered scientific. If the compressor implementation contains any bugs, they will show up immediately when the decoded data fails to agree exactly with the original data. Finally, even if the theory is scientific and the implementation is correct, it remains to be seen whether the theory is empirically accurate, which is what strong compression requires in the face of the No Free Lunch theorem: no compressor can shorten all possible inputs, only those its underlying theory actually predicts well.

So say you and I have two rival theories of black hole dynamics. If the theories are different in a scientifically meaningful way, they must make different predictions about some data that could be observed. That means the compressors corresponding to our theories will assign different codelengths to some observations in the dataset. If your theory is more accurate, it will achieve shorter codelengths overall. This could happen by, say, your theory properly accounting for the velocity dispersion of galaxies under the effect of dark matter. Or it could happen by my theory being hit by a big Black Swan penalty because it cannot explain an astronomical jet coming from a black hole.

What about the fact that the best compression algorithm may be insanely expensive to run? We know the math that describes the behavior of quarks, which is to say, we can in principle generate the results of all possible experiments with quarks by solving a few equations. However doing computations with the theory is extremely expensive and it takes something like 10^15 floating point operations to compute, say, some basic properties of the proton to 1% accuracy.

Good point. My answer is: yes, we have to accept a speed/accuracy tradeoff. That doesn't seem like such a disaster in practice.

Some people, primarily Matt Mahoney, have actually organized data compression contests similar to what I'm advocating. Mahoney's solution is just to impose a certain time limit that is reasonable but arbitrary. In the future, researchers could develop a spectrum of theories, each of which achieves a non-dominated position on a speed/compression curve. Unless something Very Strange happened, each faster/less accurate theory would be related to its slower/more accurate cousin by a standard suite of approximations. (It would be strange - but interesting - if you could get an accurate and fast theory by doing a nonstandard approximation or introducing some kind of new concept).
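The "non-dominated position on a speed/compression curve" is just a Pareto frontier. A small illustrative helper, taking hypothetical (seconds, codelength_bits) pairs, one per candidate theory, could look like this:

    def pareto_frontier(points):
        """Return the (seconds, bits) pairs not dominated by any other point,
        i.e. no other theory is both faster and compresses the data better."""
        frontier = []
        for seconds, bits in sorted(points):  # scan in order of increasing runtime
            if not frontier or bits < frontier[-1][1]:
                frontier.append((seconds, bits))
        return frontier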

[anonymous], 9y

Thank you, this is almost what I meant. Could you add some detail on why you consider this question more important than, say, some others?

There are people who oppose, say, GMO, because they are afraid of possible harm the technology can bring.

Using the fungi seems to be legal to me, so why are no companies making money with it? Or are they?

[anonymous], 9y

Yes, it is already marketed. However, the caution arises from the fact that if you add fungi that favor certain plants over other plants, and let them reproduce, you might drive ecosystem dynamics away from stability (something like that).

If the concern is about driving ecosystem dynamics away from stability, why does it matter in what way the fungus gives the plants benefits?

[anonymous], 9y

Once it escapes from the field, it might do lots of unintended harm. Different plants react differently to different fungi.

I understand that point. What I don't know is whether you'd know more about the harm if you knew why the fungus helps a particular plant.

[anonymous], 9y

The way it was stated in the book, it's just a blank spot on the map. (In vitro culture of mycorrhiza. Ed. by Declerk, Strullu and Fortin.)