by [anonymous]
6th Jul 2015

This is a rambling post, and I would appreciate your criticism to help tighten it or delete it altogether.

It seems that however small a question I research by reviewing [botanical] literature, there is always a much more complex question, rather difficult to put rigorously, that I have to ask for the first one to be meaningful. The second answer (or tier of answers) doesn't add much to the information I will build upon, but it might - just might! - add uncertainty to the result or allow predictions in advance. How do we use it in advance? We don't usually apply formal reasoning, and yet somehow we use it!

1.

Consider: a certain invasive plant has a host of adaptations beneficial to its success. (They probably wouldn't be sufficient if there were some actual effort to manage manmade ecosystems, but duh.) A trait many invasive plants share is the ability to increase their ploidy - from 2 to 3, 4, 6, 8 or even 10 sets of homologous chromosomes. (Polyploidization sometimes happens even in single cells in somatic (= non-reproductive) tissues, so it's really a heavily used shortcut.)

Now, suppose I want to see how a different specific property of the species behaves abroad. I will have to check the ploidy level, of course! Quick, what does the literature say, how many chromosomes can it have?

...but wait. Make no mistake, I do have to count them; but what if there is a continent-wide study showing that it generally has 4n in Eastern Europe?.. That would allow me to at least expect 4n, or whatever number they found, and see if there is any research specifically dealing with this situation within its native range.

...but wait. Of course, those findings will be useful in discussion if I find 4n, but if I don't, they will be just a point in the overall space of possibilities. Still relevant, but not worth putting much explanatory weight on.

Something in my brain evaluated the usefulness of a piece of data other people have found, which I myself have yet to look up, of whose exact composition I have no idea - perhaps there are simply no other reports! - and placed it in context of what I really expect to do.
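One way to make that implicit weighing explicit is a small Bayesian update. The sketch below is only an illustration: the prior and the likelihoods are made-up numbers, not values from any study, the set of ploidy levels is chosen arbitrarily, and `update` is a hypothetical helper.

```python
def update(prior, likelihood):
    """Return the posterior over hypotheses after one observation.

    prior:      dict hypothesis -> P(hypothesis)
    likelihood: dict hypothesis -> P(observation | hypothesis)
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


# Hypothetical flat prior over the ploidy of my population, before any reading.
prior = {"2n": 0.25, "4n": 0.25, "6n": 0.25, "8n": 0.25}

# Hypothetical chance that a continent-wide study reports "mostly 4n"
# under each hypothesis about my own population.
likelihood_report_4n = {"2n": 0.1, "4n": 0.7, "6n": 0.1, "8n": 0.1}

posterior = update(prior, likelihood_report_4n)
for level, p in posterior.items():
    print(level, round(p, 3))
```

With these invented numbers the report moves most of the weight onto 4n without ruling out the other levels, which matches the intuition that the study sets an expectation but the chromosomes still have to be counted.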


2.

Okay, if I can think so about other people's writings without even reading them, then maybe I can compile a dummy set of data I expect right now and compare them to those I will find in the literature. And later, to actual data. Here's a simplified problem that doesn't approach labwork on any scale (I don't want to add too many qualifiers).

Let us 'measure' 8 parameters, and check if there have been studies that have found correlations between at least some of them (and maybe with some other ones), and then try to see if our expectations based on knowledge of study area and casual surveys fit our expectations based on published research in any specific way. We are not ready to put forth any causal structure - no real data yet - though we strongly suspect (80%) that all the parameters are in some way linked to each other.

The following table is rough and repetitive, but I think it is useful as an illustration of how things brew in [my own] not-very-clever student's head. The numbers are 'dimensionless', distributions are normal, the total number of studies measuring each parameter is 7 or less, and all correlations are no less than 0.8.

 

| Parameter | Total range | Our expected data ±SE | Reported data range* | Our imaginary correlations | Reported correlations |
|---|---|---|---|---|---|
| A | 1-12 | 8±1 | 4-10 | A&F, A&H | A&D, A&F, A doesn't correlate with anything if nothing else correlates with anything |
| B | 1-5 | 2±1 | 1-4 | B&C, B&E, B&G, B&H | B doesn't correlate with E if F&H |
| C | 1-100 | 35±20 | 80±7 (only one other study) | C&B, C&F, C&H | Unknown |
| D | 1-28 | 6±2 | 2-18 | D&F | D&G (and then E&F) |
| E | 1-500 | 200±46 | 150-480 | E&B, E&G | E&F if D&G |
| F | 1-50 | 47±8 | 8-45 | F&A, F&C, F&D, F&H if A&H | F&A, F&H (and then B doesn't correlate with E) |
| G | 1-25 | 18±2 | 11-20 | G&B, G&E | G&D (and then E&F) |
| H | 1-40 | 23±10 | 1-40 | H&A, H&B, H&C, H&F (and then H&A) | H&F (and then B doesn't correlate with E) |

*as in, 'for this species, out of the 1-12 that are altogether possible, only 4-10 have been observed so far. It might mean that 4-10 is the actual range, but the prior for that is about 60%, due to differences in the methodologies used by various researchers and to the fact that only a part of the species' habitats have been studied' etc.
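A crude way to set the expected values against the reported ranges is to check where each mean±SE interval falls. This is a sketch, not a proper statistical test: the numbers are copied from the table, `classify` is a hypothetical helper, and treating C's single reported estimate of 80±7 as the range 73-87 is an assumption made only for this comparison.

```python
# (expected mean, SE, reported low, reported high), copied from the table.
# Assumption: C's single reported estimate 80±7 is treated as the range 73-87.
table = {
    "A": (8, 1, 4, 10),
    "B": (2, 1, 1, 4),
    "C": (35, 20, 73, 87),
    "D": (6, 2, 2, 18),
    "E": (200, 46, 150, 480),
    "F": (47, 8, 8, 45),
    "G": (18, 2, 11, 20),
    "H": (23, 10, 1, 40),
}


def classify(mean, se, low, high):
    """Place the interval mean±SE relative to the reported range [low, high]."""
    lo, hi = mean - se, mean + se
    if low <= lo and hi <= high:
        return "inside"
    if hi < low or lo > high:
        return "outside"
    return "partly outside"


results = {name: classify(*row) for name, row in table.items()}
for name, verdict in results.items():
    print(name, verdict)
```

On these numbers only C lands entirely outside what has been reported, and F straddles the upper bound; everything else sits inside. Since the reported ranges themselves are only partly trusted, such a verdict is a prompt to reread the papers rather than a conclusion.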


Now, I understand that this is hardly the most profitable presentation method and that statistics has advanced much since Pearson and everything. It is just that I find it difficult to compare graphs with diagrams with clouds along axes, as they are published in different papers. I only want to guesstimate whether my data fit a pattern, to discuss them qualitatively; to stratify the parameters so that I place explanatory weight on some of them and report the others to give a full picture. I have to do this explicitly, because I know I am doing this implicitly - it's a feeling I get, of the brain working and deciding and not showing me what it has.
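The same kind of rough comparison can be done for the two correlation columns of the table by treating each as a set of unordered pairs. This is a minimal sketch: the pairs are copied from the table, the conditional clauses ('if ...', 'and then ...') and the 'doesn't correlate' entries are dropped, which is a simplifying assumption, and `pairs` and `show` are hypothetical helpers.

```python
def pairs(text):
    """Turn 'A&F A&H' into a set of unordered pairs such as frozenset({'A', 'F'})."""
    return {frozenset(p.split("&")) for p in text.split()}


def show(pair_set):
    """Render a set of pairs as a sorted list of strings like 'A&F'."""
    return sorted("&".join(sorted(p)) for p in pair_set)


# Unique pairs from 'Our imaginary correlations' (conditions ignored).
imagined = pairs("A&F A&H B&C B&E B&G B&H C&F C&H D&F E&G F&H")
# Unique pairs from 'Reported correlations' (conditions ignored, C unknown).
reported = pairs("A&D A&F D&G E&F F&H")

agree = show(imagined & reported)
only_imagined = show(imagined - reported)
only_reported = show(reported - imagined)

print("both:", agree)
print("only imagined:", only_imagined)
print("only reported:", only_reported)
```

Only A&F and F&H appear in both columns, which is one way to see why F keeps coming up even though it doesn't seem very meaningful by itself.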

I cannot speak about A, only that maybe A, H and F do have something in common - perhaps I haven't measured it. B looks rather suspicious; I will need to reread that other report. C is intriguing, but ultimately belongs to the 'lower value stratum', and maybe those correlations I found are spurious; if only there were a way to reduce the variability... but it won't be cost-efficient. E, F, D and G also might be worth discussing together. F by itself doesn't seem very meaningful, unless there is a causal connection to the others; too bad one can imagine many plausible explanations for that. I will probably start the discussion with H, since it has likely been studied for other plants and at least something has already been proposed.

Once I have my own data, I will see where they deviate from my expectations; that will be some knowledge I can put into words, and I will hopefully start calibrating myself on these matters. And on matters of Discussion structuring:)

2 comments
[anonymous] 9y

tl;dr: The mix of jargon and abstraction, along with a rambling informal style, makes it hard to offer any constructive suggestions or even comments on the above. Perhaps being more explicit, and talking in terms of updating from evidence might help.

I really wanted to read this post and contribute to the discussion, as I think the main ideas resonate with things in my field: a set of potentially influential factors with complex and somewhat unknown correlational structure, but for which only a subset have been considered seriously by the dominant views in the field (until recently).

But I was unable to work out what you are aiming to do with the post, and I suspect the same is true of many other LW readers, as no-one else has commented in the last couple of days either. After all, there are many commenters who tend to be quick to get involved in discussions related to probability and uncertainty, appropriate treatment of complex datasets to reveal underlying patterns, and so on. You did get a couple of upvotes and no downvotes so far, so at least a couple of people see promise in the post. As you asked for metacomments about the post, I thought it worth offering some.

The first section (#1) aims to illustrate the problem but it is presented in a very jargon-heavy or field-specific manner, enough that I can't work out what your "simple question" actually is, or why you are asking it, or what the point about 4n is meant to illustrate. My interpretation after a lot of thought is that you might be talking about updating in a Bayesian sense: thinking about your current belief state and potentially adapting your views given the new evidence that (something = 4n in some circumstances), with greater updating distance for evidence that falls outside the expected range.

If this is the case, then part 2 is about priors ("dummy set of data I expect right now", "Our imaginary correlations") and then updating based on published research. But this section is presented in such abstract terms that it's entirely inaccessible to me, possibly even if I understood the issues about ploidy level from section 1. That is, labeling factors as A-H and then talking about their possible correlational structure, but without giving any clues about the questions you are trying to answer using such data, makes it seem rather hopeless. Are you trying to predict some kind of outcome measure(s)? Or find underlying factors responsible for a subset of the measures you already have? Or reject theories in the field & provide alternatives?

These seem like very relevant issues spanning a wide variety of research fields: how does one deal with a parameter space of high dimensionality (whether informally, as in the discussion in the OP, or formally, as in explicit modelling)?

[anonymous] 9y

Thank you for such a sustained criticism! I will rewrite it when the weather changes:) which it better do soon.