All of Aaro Salosensaari's Comments + Replies

>It turns out that using Transformers in the autoregressive mode (with output tokens being added back to the input by concatenating the previous input and the new output token, and sending the new versions of the input through the model again and again) results in them emulating dynamics of recurrent neural networks, and that clarifies things a lot...

I'll bite: Could you dumb down the implications of the paper a little bit? What is the difference between a Transformer emulating an RNN, and some pre-Transformer RNNs and/or non-RNNs?

My much more novice... (read more)

5mishka
I think that gradient descent in computation is super-important (this is, apparently, the key mechanism responsible for the phenomenon of few-shot learning). And, moreover, massive linear combinations of vectors ("artificial attention") seem to be super-important (the starting point in this sense was adding this kind of artificial attention mechanism to the RNN architecture in 2014). Yes, this might be related to my personal history, which is that I have been focusing on whether one can express algorithms as neural machines, and whether one can meaningfully speak about continuously deformable programs. And, then, for Turing completeness one would want both an unlimited number of steps and unbounded memory, and there has been a rather involved debate on whether RNNs are more like Turing-complete programs, or whether they are, in practice, only similar to finite automata. (It's a long topic, on which there is more to say.) So, from this viewpoint, a machine with a fixed finite number of steps seems very limited. But autoregressive Transformers are not machines with a fixed finite number of steps; they just commit to emitting a token after a fixed number of steps, but they can continue in an unbounded fashion, so they are very similar to RNNs in this sense.
5dxu
I’ll bite even further, and ask for the concept of “recurrence” itself to be dumbed down. What is “recurrence”, why is it important, and in what sense does e.g. a feedforward network hooked up to something like MCTS not qualify as relevantly “recurrent”?
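(A minimal sketch of the autoregressive loop under discussion may help here; `model` is a hypothetical callable mapping a token sequence to the next token. The "recurrence" is that the growing, fed-back token sequence plays the role of an RNN's hidden state, and nothing forces the loop to ever stop:)

```python
def generate(model, prompt_tokens, n_new_tokens):
    # Autoregressive decoding: concatenate each emitted token back onto
    # the input and send the whole sequence through the model again.
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):   # nothing forces this bound; the loop
        next_token = model(tokens)  # could continue indefinitely
        tokens.append(next_token)   # the growing sequence plays the role
    return tokens                   # of an RNN's recurrent state
```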

Epistemic status: I am probably misunderstanding some critical parts of the theory, and I am quite ignorant of the technical implementation of prediction markets. But posting this could be useful for my own and others' learning.

First question: am I understanding correctly how the market would function? Taking your IRT probit market example, here is what I gather:

(1) I want to make a bet on the conditional market P(X_i | Y). I have a visual UI where I slide bars to make a bet on parameters a and b (dropping the subscript i); however, internally this is represente... (read more)

2tailcalled
Yep. (Or maybe not a and b, but some other parameterization; I would guess the most intuitive UI is still an open question.) Yep. Good question! This is actually a problem that I didn't realize at the time I started promoting LVPMs; I did realize it somewhere along the way, albeit underestimating it and expecting it to solve itself, before eventually realizing that I had underestimated it. I still think it is only a very minor problem, but lemme give my analysis: There are multiple ways to implement this type of bet, and your approach of changing [p_old, a'] to [p_new, a'] is only one of them; it was the one I favored earlier on, but there's an alternate approach that others have favored, and in fact a third approach that I've now come to favor. Let's go through all of them.

The approach described by my post/by you: Assuming that we change the internal representation in the database from [p_old, a'] to [p_new, a'], this is equivalent to making a bet on the latent variable market that only affects b. So essentially you would be making a bet that b has higher probability across the entire latent variable. As part of my initial push, I made the mistake of assuming that this was equivalent to just making a bet on X without the latent variable market, but then later I realized I was mixing up negative log probs and logits (you are making a bet that b has a higher logit across the latent variable, but a prediction market without latent variables would be making a bet that b has higher negative log probabilities, not higher logits).

Conventional approach: I think the approach advocated by e.g. Robin Hanson is to have one grand Bayes net structure for all of the markets that anyone can make predictions on, and then when people make a prediction on only a subset of the structure, such as an indicator variable, then that bet is treated as virtual evidence which is used to update the entire Bayesian network. This is basically similar in consequence
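(To make the parameterization concrete: a minimal sketch of the IRT probit link being discussed, assuming the common two-parameter form P(X_i = 1 | Y) = Φ(a·Y + b) with Y ~ N(0, 1); the post may use a different but equivalent parameterization:)

```python
import numpy as np
from scipy.stats import norm

def p_x_given_y(y, a, b):
    # Conditional market: P(X_i = 1 | Y = y) = Phi(a*y + b)
    return norm.cdf(a * y + b)

def p_x(a, b):
    # Implied unconditional probability with Y ~ N(0, 1); for the probit
    # link this has the closed form Phi(b / sqrt(1 + a^2)).
    return norm.cdf(b / np.sqrt(1 + a ** 2))

print(p_x_given_y(y=1.0, a=1.0, b=0.0))  # conditional probability at Y = 1
print(p_x(a=1.0, b=0.0))                 # 0.5 by symmetry
print(p_x(a=1.0, b=0.5))                 # a bet moving b shifts this to ~0.64
```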

>Glancing back and forth, I keep changing my mind about whether or not I think the messy empirical data is close enough to the prediction from the normal distribution to accept your conclusion, or whether that elbow feature around 1976-80 seems compelling.

 

I realize you two had a long discussion about this, but my few cents: This kind of situation (eyeballing is not enough to resolve which of two models fits the data better) is exactly the kind of situation for which the machinery of statistical inference is very useful.

I'm a bit too busy right now ... (read more)

2DirectedEvolution
I think this is a great idea - I'm also too busy to do this right now and not equipped with that skillset, but I would read your work with interest if you chose to carry this out.
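(If someone does pick this up: a minimal sketch of the kind of formal comparison being proposed, with synthetic stand-in data and stand-in models; `years`, `values`, the quadratic trend, and the 1978 break year are all hypothetical placeholders for the post's actual data and candidate models:)

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical stand-in series; replace with the post's actual data
years = np.arange(1950, 2000).astype(float)
values = np.random.default_rng(0).normal(size=years.size).cumsum()

def aic(resid, k):
    # Gaussian AIC: n * log(RSS / n) + 2k; lower is better
    n = resid.size
    return n * np.log((resid ** 2).sum() / n) + 2 * k

# Candidate 1: one smooth trend (stand-in for the normal-curve model)
smooth = np.polyval(np.polyfit(years, values, 2), years)

# Candidate 2: piecewise-linear trend with an "elbow" near 1978
def elbow(t, a, b, c):
    t0 = 1978.0  # hypothesized break year
    return a + b * (t - t0) + c * np.maximum(t - t0, 0.0)

params, _ = curve_fit(elbow, years, values)
print("smooth AIC:", aic(values - smooth, k=3))
print("elbow  AIC:", aic(values - elbow(years, *params), k=3))
```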

Hyperbole aside, how many of those experts linked (and/or contributing to the 10% / 2% estimate) have arrived at their conclusion with a thought process that is "downstream" from the thoughtspace the parent commenter thinks suspect? Then it would not qualify as independent evidence or rebuttal, as it is itself included in the target of the criticism.

4Mau
One specific concern people could have with this thoughtspace is the concern that it's hard to square with the knowledge that an AI PhD [edit: or rather, AI/ML expertise more broadly] provides. I took this point to be strongly suggested by the author's suggestions that "experts knowledgeable in the relevant subject matters that would actually lead to doom find this laughable" and that someone who spent their early years "reading/studying deep learning, systems neuroscience, etc." would not find risk arguments compelling. That's directly refuted by the surveys (though I agree that some other concerns about this thoughtspace aren't). (However, it looks like the author was making a different point to what I first understood.)

Thanks. I had read it years ago, but didn't remember that he makes many more points beyond the O(n^3.5 log(1/h)) scaling, and provides useful references (other than Red Plenty).

(I initially thought it would be better not to mention the context of the question as it might bias the responses. OTOH the context could make the marginal LW poster more interested in providing answers, so here it is:)

It came up in an argument that the difficulty of the economic calculation problem could be a difficulty for a hypothetical singleton, insofar as a singleton agent needs a certain amount of compute relative to the economy in question. My intuition consists of two related hypotheses: First, during any transition period where any agent participates in glo... (read more)

Can anyone recommend good reading material on the economic calculation problem?

3Kaj_Sotala
It's been a while since I read it, and it's a blog post rather than anything formal, but I recall liking https://crookedtimber.org/2012/05/30/in-soviet-union-optimization-problem-solves-you/ .
1Aaro Salosensaari
(I initially thought it would be better not to mention the context of the question as it might bias the responses. OTOH the context could make the marginal LW poster more interested in providing answers, so here it is:) It came up in an argument that the difficulty of the economic calculation problem could be a difficulty for a hypothetical singleton, insofar as a singleton agent needs a certain amount of compute relative to the economy in question. My intuition consists of two related hypotheses: First, during any transition period where an agent participates in a global economy where most other participants are humans ("economy" could be interpreted widely to include many human transactions), can the problem of economic calculation provide some limits on how much calculation would be needed for an agent to become able to manipulate / dominate the economy? (Is it enough for an agent to be marginally more capable than any other participant, or does that get swamped if the sheer size of the economy is large enough?)  Secondly, if a Mises/Hayek answer is correct and the economic calculation problem is solved most efficiently by a distributed calculation, it could imply that a single agent in charge of a number of processes on a "global economy" scale could be out-competed by a community of coordinating agents. [1] However, I would like to read more to judge if my intuitions are correct. Maybe all of this is already rendered moot by results I simply do not know how to find. ([1] Related but tangential: Can one provide a definition of when a distributed computation is no longer a singleton but a more-or-less aligned community of individual agents? My hunch is that there could be a characterization related to the speed of communication between agents / processes in a singleton. Ultimately the speed of light is bound to mandate some limitations.)

I found this interesting. Finnish is also a language of about 5 million speakers, but we have a commonly used natural translation of "economies of scale" (mittakaavaetu, "benefit of scale"). No commonplace obvious translation for "single point of failure" came to mind, so I googled, and found engineering MSc theses and similar documents: the words they chose to use included yksittäinen kriittinen prosessi ("single critical process", most natural one IMO), yksittäinen vikaantumispiste ("single point of failure", literal translation and a bit ... (read more)

"if I were an AGI, then I'd be able to solve this problem" "I can easily imagine"

Doesn't this way of analysis come with a ton of other assumptions left unstated? 

 

Suppose "I" am an AGI  running on a data center and I can modeled as an agent with some objective function that manifest as desires and I know my instantiation needs electricity and GPUs to continue running. Creating another copy of "I" running in the same data center will use the same resources. Creating another copy in some other data center requires some other data center. ... (read more)

Why wonder when you can think: What is the substantial difference in MuZero (as described in [1]) that makes the algorithm consider interruptions?

Maybe I show some great ignorance of MDPs, but naively I don't see how an interrupted game could come into play as a signal in the specified implementations of MuZero:

I can't see explicit signals, because the explicitly specified reward u seems ultimately contingent only on the game state / win condition.

One can hypothesize an implicit signal could be introduced if the algorithm learns to "avoid game states ... (read more)

2gwern
The most obvious difference is that MuZero learns an environment, it doesn't take a hardwired simulator handed down from on high. AlphaZero (probably) cannot have any concept of interruption that is not in the simulator and is forced to plan using only the simulator space of outcomes while assuming every action has the outcome the simulator says it has, while MuZero can learn from its on-policy games as well as logged offline games any of which can contain interruptions either explicitly or implicitly (by filtering them out), and it does planning using the model it learns incorporating the possibility of interruptions. (Hence the analogy to Q-learning vs SARSA.) Even if the interrupted episodes are not set to -1 or 0 rewards (which obviously just directly incentivize a MuZero agent to avoid interruption as simply another aspect of playing against the adversary), and you drop any episode with interruption completely to try to render the agent as ignorant as possible about interruptions, that very filtering could backfire. For example, the resulting ignorance/model uncertainty could motivate avoidance of interruption as part of risk-sensitive play: "I don't know why, but node X [which triggers interruption] never shows up in training even though earlier state node X-1 does, so I am highly uncertain about its value according to the ensemble of model rollouts, and so X might be extremely negative compared to my known-good alternatives Y and Z while the probability of it being the best possible outcome has an extremely small base rate; so, I will act to avoid X." (This can also incentivize exploration & exploitation of human manipulation strategies simply because of the uncertainty around its value! Leading to dangerous divergences in different scenarios like testing vs deployment.)
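(For readers without the RL background, the Q-learning vs SARSA analogy refers to the textbook distinction sketched below: off-policy updates bootstrap from the best next action, on-policy updates from the action actually taken, so only the latter's learned values absorb externally imposed actions such as interruptions. A minimal sketch of the two update rules:)

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99
Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def q_learning_update(s, a, r, s2, actions):
    # Off-policy: bootstrap from the *best* next action, as if nothing
    # (exploration, interruption) ever diverts the agent from it.
    target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s2, a2):
    # On-policy: bootstrap from the action *actually taken* next, so
    # externally imposed actions shape the learned values.
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```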

a backdrop of decades of mistreatment of the Japanese by Western countries.

I find this a bit difficult to take seriously. WW2 in the Pacific didn't start with good treatment of China and other countries by Japan, either. Naturally the Japanese didn't care about that part of the story, but they had plenty of other options for how they could have responded to UK or US trade policy instead of invading Manchuria.

making Ukraine a country with a similar international status to Austria or Finland during the Cold War would be one immediate solution.

This is n... (read more)

5Ege Erdil
I never said that it did. As I said, "in this context it takes two to dance". I don't know why it's hard for you to believe. In 1918, Fumimaro Konoe (then part of the Japanese delegation to the Paris Peace Conference) wrote an essay titled "Against a Pacifism Centered on England and America" in which he stated the following: "Japan is limited in territory, poor in natural resources, and has a small population and thus a meager market for manufactured products. If England closed off its colonies, how would we be able to assure the nation’s secure survival? In such a case, the need to ensure its survival would compel Japan to attempt to overthrow the status quo as Germany did before the war." Konoe was Prime Minister for most of 1941, resigning in October only after his attempts to negotiate a last-minute settlement with the United States came to nothing. There's no evidence to suggest he changed his mind, though in 1941 he opposed the war with the United States on pragmatic grounds since he believed Japan would lose. The question is not about whether Japan could have done something different. Of course they could have. The question is whether decades of animosity contributed to the outbreak of war, and it's clear the answer is affirmative here. Even the Japanese invasion of China is hard to imagine if Japan had been better treated by the United Kingdom and the United States. Japan had two key concerns: physical and economic security. They felt their physical security was threatened because they faced two potentially hostile powers in China and the USSR. In the 1920s Japan had cooperated with Western countries within the framework of the Washington Order, in which China was to remain under an "open door policy" with respect to trade and all powers in the Pacific would cooperate to limit the size of their navies. This order, established when China was weak due to internal strife, caused resentment in China and the terms of this order prevented Japan from ensuring

First, avoiding arguments from the "other side" on the basis that they might convince you of false things assumes that the other side's beliefs are in fact false.

I believe it is less about true/false and more about whether you believe the "other side" is making a well-intentioned effort at obtaining and sharing accurate maps of reality. On a practical level, I think it is unlikely that studying Russian media in detail is useful and cost-effective for a modal LWer.

Propaganda during wartime, especially during total war, is a prima facie example of a situation where... (read more)

7Radford Neal
If you live in one of the countries at war, you will inevitably be exposed to "your" side's propaganda.  If you also look at the propaganda produced by the other side, you may well gain valuable information.  For instance, if both sides acknowledge the truth of some fact, you can be reasonably sure that it is the truth (whereas otherwise you might doubt whether your side is telling the truth about that).  And if the other side's propaganda talks about some issue that you've never even heard about, it may be useful to research whether something is being concealed by your side. Even when those writing the propaganda have zero concern with telling the truth, they often will tell the truth, simply because it tends to be more believable.  So looking at propaganda may expose you to true statements (which you hadn't previously considered), which you may be able to confirm are true by independent means.

Open thread is presumably the best place for a low-effort questions, so here goes: 

I came across this post from 2012: Thoughts on the Singularity Institute (SI) by Holden Karnofsky (then-Co-Executive Director of GiveWell). Interestingly enough, some of the object-level objections (under subtitle "objections") Karnofsky raises[1] are similar to some points that came up in the Yudkowsky/chathamroom.com discussion and the Ngo/Yudkowsky dialogue I read the other day (or rather, read parts of, because they were quite long).

What are people's thought about ... (read more)

5Steven Byrnes
The old rebuttals I'm familiar with are Gwern's and Eliezer's and Luke's. Newer responses might also include things like Richard Ngo's AGI safety from first principles or Joe Carlsmith's report on power-seeking AIs. (Risk is disjunctive; there are a lot of different ways that reality could turn out worse than Holden-2012 expected.) Obviously Holden himself changed his mind; I vaguely recall that he wrote something about why, but I can't immediately find it. I'm not sure that's accurate. His blog posts are getting cross-posted from his account, but that could also be the work of an LW administrator (with his permission).

Yeah, random internet forum users emailing an eminent mathematician en masse would be strange enough to be non-productive. I for one wasn't expecting anyone to, and I don't think that was what the OP suggested. For anyone contemplating sending one, the task is best delegated to someone who not only can write a coherent research proposal that sounds relevant to the person approached, but can write the best one.

Mathematicians receive occasional crank emails about solutions to P ?= NP, so anyone doing the reaching out needs to be reputable to get past their crank filters.

A reply to comments showing skepticism about whether the mathematical skills of someone like Tao could be relevant:

The last time I thought I understood anything on Tao's blog was around ~2019. Then he was working on curious stuff, like whether he could prove there can be finite-time blow-up singularities in the Navier-Stokes fluid equations (incidentally, this would solve the famous Millennium prize problem by exhibiting a non-smooth solution) by constructing a fluid state that both obeys Navier-Stokes and is also Turing complete and ... ugh, maybe I should quote the man himself:

[...] one

... (read more)

I have not read Irving either, but he is a relatively "world-famous" 1970s-1980s author. (In case it helps you to calibrate, his novel The World According To Garp is the kind of book that was published in translation in the prestigious Keltainen Kirjasto series by Finnish publisher Tammi.)

However, I would like to make an opposing point about literature and fiction. I was surprised that the post author mentioned a work of fiction as a positive example that demonstrates how some commonly argued option is a fabricated one. I'd think literature would at least as often (... (read more)

The picture looks like evidence there is something very weird going on that is not reflected in the numbers or arguments provided. There are homeless encampments in many countries around the world, but very rarely a 20-minute walk from anyone's office.

2TurnTrout
My cached answer is "Bay area zoning", but I honestly haven't looked into it in great depth. There's a 3-tent encampment literally 20 seconds from the office where I'm at right now in downtown Berkeley.
9Kaj_Sotala
Yeah, I liked the post overall, but the rest of it seemed entirely unrelated to the picture and the claim that this is a success story for economics. I was expecting it to come back and explain the connection, but it never seemed to.

From what I remember from my history of Finland classes, the 19th/early 20th century state project to build a compulsory school system met some not-insignificant opposition from parents. They liked having the kids working instead of going to school, especially in agrarian households.

Now, I don't want to get into a debate about whether schooling is useful or not (and for whom, and for what purpose, and if the usefulness has changed over time), but there is something illustrative in the opposition: children are rarely independent agents to the extent adults are. If the... (read more)

5CronoDAS
Seconded. If a child going to school is better for the child but a child working in a sweatshop is better for the parents, some children are going to end up in sweatshops.

Genetic algorithms are an old and classic staple of LW. [1]

Genetic algorithms (as used in optimization problems) traditionally assume "full connectivity", that is, any two candidates can mate. In other words, the population network is assumed to be complete, and a potential mate is randomly sampled from the whole population.

Aymeric Vié has a paper out showing (in numerical experiments) that some less dense network structures with low average shortest path lengths appear to result in better optimization outcomes: https://doi.org/10.1145/3449726.3463134

Maybe this isn't news for... (read more)
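(A toy sketch of the structural idea, not the paper's actual experimental setup: the same steady-state GA run on a complete mating network vs. a sparse small-world one, with a hypothetical OneMax objective standing in for a real optimization problem:)

```python
import random
import networkx as nx

def fitness(bits):
    return sum(bits)  # toy OneMax objective

def evolve(G, n_bits=64, steps=20_000, seed=0):
    rng = random.Random(seed)
    # One candidate solution lives on each node of the mating network G
    pop = {v: [rng.randint(0, 1) for _ in range(n_bits)] for v in G}
    for _ in range(steps):
        v = rng.choice(list(G))
        u = rng.choice(list(G.neighbors(v)))   # mate only along an edge
        cut = rng.randrange(1, n_bits)
        child = pop[v][:cut] + pop[u][cut:]    # one-point crossover
        i = rng.randrange(n_bits)
        child[i] ^= 1                          # point mutation
        worse = min((v, u), key=lambda w: fitness(pop[w]))
        if fitness(child) >= fitness(pop[worse]):
            pop[worse] = child                 # replace the worse parent
    return max(fitness(b) for b in pop.values())

# Complete graph: the traditional "any two candidates can mate" assumption
print(evolve(nx.complete_graph(64)))
# Sparse small-world graph: low density, short average path lengths
print(evolve(nx.watts_strogatz_graph(64, k=6, p=0.1, seed=0)))
```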

My take is that the scientific concept of "heritability" has some problems in its construction: the exact definition (Var(genotype)/Var(phenotype)), while useful in some regard, does not match the intuition of the word.

Maybe the quantity should be called "relative heritability", "heritability relative to population", or "proportion of population variance explained", like many other quantities that similarly have the form A/B where both A and B are (population) parameters or their estimates.

Addendum 1.

"Heritable variance"? See also Feldman, Lewontin 1975 https://scholar.google.com/scholar?cluster=10462607332604262282

The smartest people tend to be ambitious.

 

If this is anecdotal, wouldn't it be easily explained by some sort of selection bias? Smart ambitious people are much more visible than smart, definitely-not-ambitious people (and by definition of "smart", they probably have better chances at succeeding in their ambitions than equally ambitious less smart people).

Anecdotally, I have met some relatively smart people who are not very ambitious, and I can imagine there could be much smarter people one does not meet except by random chance, because they do not have muc... (read more)

What is the correct amount of self praise? Do you have reasons to believe Isusr has made an incorrect evaluation regarding their aptitude? Do you believe that even if the evaluation is correct that the post is still harmful?

I don't know if the post is harmful, but in general, "too much self-praise" can be a  failure mode that makes argumentative writing less likely to succeed at convincing readers of its arguments.

The following blog post might be of interest to anyone who either claims Dunning-Kruger means that low-skill people think they are highly skilled or claims Dunning-Kruger is not real: http://haines-lab.com/post/2021-01-10-modeling-classic-effects-dunning-kruger/

The author presents the case for how D-K is misunderstood, then why one might suspect it is a mathematical artifact of measurement error, but then shows with a model that there is some evidence for the Dunning-Kruger effect, as some observed data are reliably explained by an additive perception bias + n... (read more)
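(The measurement-error artifact part is easy to reproduce; a minimal simulation sketch with made-up distribution parameters, showing how noisy scores plus a constant additive bias yield the classic-looking quartile plot:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
skill = rng.normal(50, 10, n)                 # true skill
measured = skill + rng.normal(0, 10, n)       # noisy test score
perceived = skill + rng.normal(0, 10, n) + 5  # noisy self-estimate + bias

# Bin by measured-score quartile, as the classic D-K plots do
q = np.digitize(measured, np.quantile(measured, [0.25, 0.5, 0.75]))
for i in range(4):
    m = q == i
    print(f"quartile {i + 1}: measured {measured[m].mean():5.1f}, "
          f"perceived {perceived[m].mean():5.1f}")
# The bottom quartile "overestimates" and the top "underestimates"
# purely from regression to the mean plus the additive bias.
```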

Agreed. The difference is more pronounced in live social situations, and quite easy to quantify in a situation such as a proof-heavy mathematics class in college. Many students who have done their work on the problem sets can present a correct solution, and if not, can usually follow the presented solution. For some, completing the problem sets took more time. Likewise, some people get more out of any spontaneous discussion of the problems. Some relatively rare people would pull out the proofs and points seemingly from thin air: look at the assignment, make some brief notes, and then present their solution intelligibly while talking it through.

However, European Commission seems to defy that rule. The members are nominated by the national governments, yet, they seem not to give unfair advantage to their native countries.

I am uncertain if this is true, or at least, it can be debated. There have been numerous complaints of the Commission producing decisions and policies that favor some countries. However, such a failure mode, if real, is not of the form where individual commissioners favor their native countries, but where the Commission as a body adopts stances compatible with overall political p... (read more)

>So the context of this post is less about religion itself, and more about an overall cluster of ways that rationalists/skeptics/etc could still use to improve their own thinking.

At best, this line sounds like arguing that this thing that looks like a fish is not a fish because its evolutionary history, method of giving birth, and the funny nose on top of its head through which it breathes make it a mammal, thus not a fish -- in a world where the most salient definition of fish is the functional one: "it is a sea-creature that lives in water and we n... (read more)

>And as Duncan is getting at, employment has changed a lot since the term was coined and there's now a lot more opportunity for jobs and work to be aligned with a person's personal goals.

While I can agree, I am skeptical that this ...integratedness(?) is actually a good thing for everyone. From the point of view of the old "work vs life" people who valued the life part, it probably looks like they are losing if what they get is "your work is supposed to be an integral part of what you choose to do with your life" while the options of where and what kind of work to do are not... (read more)

3Viliam
No matter what the market is like, your goals are not going to be 100% economical. I suppose many people value things like being healthy, being fit, reading books, watching movies, spending time with friends, spending time with family... but all of these are only valuable to you personally, no company is going to pay you for this. (Okay, there are some rare situations, like movie reviewers or professional sportsmen; but even then the reviewer is not paid for being fit, and the sportsman is not paid for watching movies.) So you already have the inevitable conflict between "whatever you want to be your career" and "all other personally valuable things". And the market insisting on employees being passionate about their jobs pushes them to prioritize the former at the expense of the latter. It's like a psychological ploy to make you feel guilty about having complex values and personal boundaries.

>The Church of England still has bishops that vote in the house of lords. 

That is an argument about a particular church-state relationship. The original claim spoke of entanglement (in the present tense!). For reference, the archbishop of the Evangelical-Lutheran Church in Finland has always been appointed by whoever is the head of state, ever since Gustav I Vasa embraced Protestantism; the church was until recently an official state apparatus and to some extent still is. The Holy See has had negligible effect here for centuries, and some historians maint... (read more)

I sort of believe in something like this, except without the magical bits. It motivates me to vote in elections and follow the laws even when there is no effective enforcement. Maybe it is a consequence of reading Pratchett's Discworld novels at an impressionable age.

My mundane explanation (or rationalization) is a bit difficult to write, but I believe it is because of:

>It gets in people's minds.

When people believe something, it affects their behavior. Thus memetic phenomena can have real effects.

As an example I feel is related to this, I ... (read more)

4Stuart Anderson
-

I agree with Phil that this sounds very ... counterintuitive. Usually nothing is free, and even with free things there are consequences or some sort of externality.

However, I recently read an argument by a Finnish finance podcaster, who argued that while the intuition might be true and the government debt system probably is not sustainable and is going to have some kind of mess-up in the long term, not participating may put your country at a disadvantage compared to countries that take the "free" money and invest it, and thus have more assets when it all falls down.

2Stuart Anderson
-

I realize this is a 3-month-old comment.

>Nor does China entangle religion with politics to the same extent you find in the Christian and Islamic worlds. This makes it easier to think about conflicts. I feel it produces a better understanding of political theory and strategy.

Does not entangle? I thought China is the only country of note that enforces its own version of a Catholic church with Chinese characteristics (the translation used by Wikipedia is "Chinese Patriotic Catholic Church", apparently excommunicated by the pope in Rome). One can discuss how... (read more)

2ChristianKl
The Chinese fight Catholicism this way precisely because Catholicism is political in a way that their homegrown religions weren't. The Chinese Patriotic Catholic Church is not going to have any influence on the way the CCP governs China. You can't say the same thing for either Christianity or Islam for most of their history. The Church of England still has bishops that vote in the House of Lords.

Sure, but statements like

>ANNs are built out of neurons. BNNs are built out of neurons too.

are imprecise, and possibly imprecise enough to also be incorrect, if it turns out that biological neurons do something importantly different from perceptrons. Without making the exact arguments and presenting evidence about in what respects the perceptron model is useful, it is quite easy to bake conclusions along the lines of "this algorithm for ANNs is a good model of biology" into the assumption that "both are built out of neurons".
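(For concreteness, the perceptron abstraction at issue: an artificial "neuron" computes only a weighted sum passed through a scalar nonlinearity,

$$y = \phi\left(\textstyle\sum_i w_i x_i + b\right),$$

and whether biological neurons do something importantly beyond this is exactly the open question.)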

Home delivery is way cheaper than it used to be.

 

I am going to push back a little on this one, and ask for context and numbers? 

As some of my older relatives commented when Wolt became popular here: before people started going to supermarkets, it was common for shops to have a delivery / errand boy (this would have been the 1950s, and more prevalent before WW2). It is one thing that stands out when reading biographies; teenage Harpo Marx dropped out of school and did odd jobs as an errand boy; they are a ubiquitous part of the background in Anne Fran... (read more)

8lsusr
When I think about home delivery, my reference point is the dao xiao mian 刀削面 knife I bought in 2020 from AliExpress for $3.57 including shipping and delivery to my door. In the 1990s, the simplest way to get an exotic product like that was to fly to China. I'm not just thinking about the ease of sending something from one house to another within my city. I'm thinking about the ease of sending something from an arbitrary residence on Earth to an arbitrary residence on Earth.

Thanks for writing this; the power-to-weight statistics are quite interesting. I have another, longer reply with my own take (edit: comments about the graph, that is) in the works, but while writing it, I started to wonder about a tangential question:

I am saying that many common anti-short-timelines arguments are bogus. They need to do much more than just appeal to the complexity/mysteriousness/efficiency of the brain; they need to argue that some property X is both necessary for TAI and not about to be figured out for AI anytime soon, not even after th

... (read more)
4Daniel Kokotajlo
UPDATE: I just reread Ajeya's report and actually her version of the human lifetime anchor is shifted +3 OOMs because she's trying to account for how humans have priors, special sauce, etc. in them given by evolution. So... I'm pretty perplexed. Even after shifting the anchor +3 OOMs to account for special sauce etc. she still assigns only 5% weight to it! Note that if you just did the naive thing, which is to look at the 41-OOM cost of recapitulating evolution as a loose upper bound, and take (say) 85% of your credence and divide it evenly between all the orders of magnitude less than that but more than where we are now... you'd get something like 5% per OOM, which would come out to 25% or so for the human lifetime anchor!
4Daniel Kokotajlo
Thanks, and I look forward to seeing your reply! I'm partly responding to things people have said in conversation with me. For example, the thing Longs says that is a direct quote from one of my friends commenting on an early draft! I've been hearing things like this pretty often from a bunch of different people. I'm also partly responding to Ajeya Cotra's epic timelines report. It's IMO the best piece of work on the topic there is, and it's also the thing that bigshot AI safety people (like OpenPhil, Paul, Rohin, etc.) seem to take most seriously. I think it's right about most things but one major disagreement I have with it is that it seems to put too much probability mass on "Lots of special sauce needed" hypotheses. Shorty's position--the "not very much special sauce" position--applied to AI seems to be that we should anchor on the Human Lifetime anchor. If you think there's probably a little special sauce but that it can be compensated for via e.g. longer training times and bigger NNs, then that's something like the Short Horizon NN hypothesis. I consider Genome Anchor, Medium and Long-Horizon NN Anchor, and of course Evolution Anchor to be "lots of special sauce needed" views. In particular, all of these views involve, according to Ajeya, "Learning to Learn:" I'll quote her in full: I interpret her as making the non-bogus version of the argument from efficiency here. However, (and I worry that I'm being uncharitable?) I also suspect that the bogus version of the argument is sneaking in a little bit, she keeps talking about how evolution took millions of generations to do stuff, as if that's relevant... I certainly think that even if she isn't falling for the bogus arguments herself, it's easy for people to fall for them, and this would make her conclusions seem much more reasonable than they are. In particular, she assigns only 5% weight to the human lifetime anchor--the hypothesis that Shorty is promoting--and only 20% weight to the short-horizon NN ancho

Eventually, yes, it is related to arguments concerning people. But I was curious about what aesthetics remain after I try to abstract away the messy details. 

>Is this a closed environment, that supports 100000 cell-generations?

Good question! No. I was envisioning it as a system where a constant population of 100 000 would be viable. (An RA pipettes in a constant amount of nutritional fluid every day or something.) Now that you have asked the question, it might make sense to investigate this assumption more.

I have a small intuition pump I am working on, and thought maybe others would find it interesting.

Consider a habitat (say, a Petri dish) that in any given moment has maximum carrying capacity for supporting 100 000 units of life (say, cells), and two alternative scenarios.

Scenario A. Initial population of 2 cells grows exponentially, one cell dying but producing two descendants each generation. After the 16th generation, the habitat overflows, and all cells die in overpopulation. The population experienced a total of 262 142 units of flourishing.
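For concreteness, the arithmetic behind the 262 142 figure: the population doubles each generation, so the cohorts number $2, 4, \dots, 2^{17}$, the 100 000-unit capacity is first exceeded at $2^{17} = 131\,072$, and the lifetimes lived total

$$\sum_{k=1}^{17} 2^k = 2^{18} - 2 = 262\,142.$$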

Scenario B... (read more)

2Measure
Notably, in either population regime, a randomly chosen individual will have an expected ~n/2 total descendants. However, this only favors Scenario B by a factor of about 6x as opposed to the 131x more lifetimes in Scenario A.
1Measure
I'm going to assume we're talking about people here. I think the relevant difference for me is the value of having many generations of accumulated culture vs. the value of having many other people alive along with you.
3Dagon
Is this a closed environment, that supports 100000 cell-generations?  In that case, the 15th generation and predecessors will have eaten 65535 units of food, so the 16th generation will only be partial - either 65536 cells that live about half their normal span, or more likely, a bunch will eat each other, to collapse to a much smaller number that lasts a few more generations.  Regardless, it's worth exploring where your intuition flips - would 1 cell that repeats for 100K generations be preferable?  50K for 2 generations?  For myself, I'm mostly indifferent in the case of individual cells.  For beings with culture, there's a lot of value in existing during a growth phase, which I don't know how to model.  And thinking beings (if such a thing existed), when there are sufficient numbers and knowledge about the impending limits, can work to increase the limits, and to decrease the per-unit usage.   Related: https://www.lesswrong.com/tag/shut-up-and-multiply .

I agree the non-IID result is quite surprising. Careful reading of Berry-Esseen gives some insight into the limit behavior. In the IID case, the approximation error is bounded by $C/\sqrt{n}$ (where the constant is proportional to the third absolute moment divided by $\sigma^3$).

The non-IID generalization for $n$ distinct distributions has the bound as, more or less, the sum of third moments divided by $\left(\sum_i \sigma_i^2\right)^{3/2}$, which is surprisingly similar to the IID special case. My reading of it suggests that if the sigmas / third moments of all n distrib... (read more)

1Maxwell Peterson
I had a strong feeling from the theorem that skew mattered a lot, but I'd somehow missed the dependence on the variance - this was helpful, thanks.
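(For reference, the standard statements for zero-mean variables with $\sigma_i^2 = \mathbb{E}X_i^2$ and $\rho_i = \mathbb{E}|X_i|^3 < \infty$. The IID bound is

$$\sup_x \left|F_n(x) - \Phi(x)\right| \le \frac{C\,\rho}{\sigma^3\sqrt{n}},$$

and the non-IID generalization is

$$\sup_x \left|F_n(x) - \Phi(x)\right| \le C_0\,\frac{\sum_{i=1}^n \rho_i}{\left(\sum_{i=1}^n \sigma_i^2\right)^{3/2}}.$$

Setting all the $\sigma_i$ and $\rho_i$ equal recovers the IID form, which is the similarity noted above.)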

It gets worse. This isn't a randomly selected example - it's specifically selected as a case where reason would have a hard time noticing when and how it's making things worse.

Well, the history of bringing manioc to Africa is not the only example. Scientific understanding of human nutrition (alongside disease) had several similar hiccups along the way, several of which have been covered in SSC (can't remember which post titles):

There was a time when the Japanese army lost many lives to beriberi during the Russo-Japanese war, thinking it was a transmissible d... (read more)

(Reply to gwern's comment but not only addressing gwern.)

Concerning the planning question:

I agree that next-token prediction is consistent with some sort of implicit planning of multiple tokens ahead. I would phrase it a bit differently. Also, "implicit" is doing a lot of work here.

(Please someone correct me if I say something obviously wrong or silly; I do not know how GPT-3 works, but I will try to say something about how it works after reading some sources [1].)

The bigger point about planning, though, is that the GPTs are getting feedback on
... (read more)

I contend it is not an *implementation* in a meaningful sense of the word. It is more a prose elaboration / expansion of the first generated bullet point list (an inaccurate one: the "plan" mentions chopping vegetables, putting them in a fridge, and cooking meat; the prose version tells of chopping a set of vegetables, skips the fridge, cooks beef, and then tells an irrelevant story where you go to sleep early and find it is a Sunday with no school).

Mind, substituting abstract category words with sensible more specific ones (vegetables -> carro... (read more)

At the risk of stating very much the very obvious:

The trolley problem (or the fat man variant) is the wrong metaphor for nearly any ethical decision, anyway, as there are very few real-life ethical dilemmas that are as visceral, require immediate action from such a limited set of options, and whose consequences are nevertheless as clear.

Here is a couple of a bit more realistic matter of life and death. There are many stories (probably I could find factual accounts, but I am too lazy to search for sources) of soldiers who make the snap decision to save the live... (read more)

"Non-identifiability", by the way, is the search term that does the trick and finds something useful. Please see: Daly et al. [1], section 3. They study indentifiability characteristics of logistic sigmoid (that has rate r and goes from zero to carrying capacity K at t=0..30) via Fisher information matrix (FIM). Quote:

When measurements are taken at times t ≤ 10, the singular vector (which is also the eigenvector corresponding to the single non-zero eigenvalue of the FIM) is oriented in the direction of the growth rate r in parameter spac
... (read more)
1Arenamontanus
Awesome find! I really like the paper. I had been looking at Fisher information myself during the weekend, noting that it might be a way of estimating uncertainty in the estimation using the Cramer-Rao bound (but quickly finding that the algebra got the better of me; it *might* be analytically solvable, but messy work).
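(A minimal sketch of the identifiability phenomenon with synthetic data, assuming the logistic form discussed; the specific r, K, and noise level are made up. Fitting on t ≤ 10, before the inflection point, leaves K nearly unconstrained, while the full window pins it down:)

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, r, K):
    x0 = 1.0  # fixed initial population
    return K * x0 / (x0 + (K - x0) * np.exp(-r * t))

rng = np.random.default_rng(1)
t = np.linspace(0, 30, 61)
data = logistic(t, r=0.3, K=100.0) + rng.normal(0, 1.0, t.size)

# Fit using only the early window, then the full window
for cutoff in (10.0, 30.0):
    m = t <= cutoff
    popt, pcov = curve_fit(logistic, t[m], data[m], p0=(0.2, 50.0),
                           maxfev=10_000)
    err = np.sqrt(np.diag(pcov))  # rough 1-sigma parameter uncertainties
    print(f"t <= {cutoff:>4}: r = {popt[0]:.2f} ± {err[0]:.2f}, "
          f"K = {popt[1]:.0f} ± {err[1]:.0f}")
```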

I was momentarily confused about what k is (it sometimes denotes carrying capacity in the logistic population growth model), but apparently it is the step size (in the numerical integrator)?

I do not have enough expertise here to speak like an expert, but it seems that stiffness would be related in a roundabout way. It seems to describe the difficulties of some numerical integrators with systems like this: the integrator can veer far off the true logistic curve with insufficiently small steps, because the derivative changes fast.

The phenomenon seems to be more about non-sensiti... (read more)

2Shmi
Sorry, forgot to replace one of the k with λ. I agree that identifiability and stiffness are different ways to look at the same phenomenon: sensitivity of the solution to the parameter values results to errors building up fast during numerical integration, these errors tend to correspond to different parameter values, and, conversely, with even a small amount of noise the parameter values are hard to identify from the initial part of the curve.

"Non-identifiability", by the way, is the search term that does the trick and finds something useful. Please see: Daly et al. [1], section 3. They study indentifiability characteristics of logistic sigmoid (that has rate r and goes from zero to carrying capacity K at t=0..30) via Fisher information matrix (FIM). Quote:

When measurements are taken at times t ≤ 10, the singular vector (which is also the eigenvector corresponding to the single non-zero eigenvalue of the FIM) is oriented in the direction of the growth rate r in parameter spac
... (read more)

I was going to suggest that maybe it could be a known and published result in the dynamical systems / population dynamics literature, but I am unable to find anything with Google, and the textbooks I have at hand, while containing plenty of mentions of logistic growth models, do not discuss prediction from partial data before the inflection point.

On the other hand, it is fundamentally a variation on the themes of difficulty in model selection with partial data and dangers of extrapolation, which are common in many numerical textbooks.

If anyone wishes to flesh it out, I believe this... (read more)

I am happy that you mention Gelman's book (I am studying it right now). I think lots of "naive strong bayesianists" would benefit from a thoughtful study of the BDA book (there are lots of worked-out demos and exercises available for it) and maybe some practical application of Bayesian modelling to real-world statistical problems. The "Bayesian way of life" practice of "updating my priors" always sounds a bit too easy in contrast to doing genuine statistical inference.

For example, a couple of puzzles I am still ... (read more)

Howdy. I came across Ole Peters' "ergodicity economics" some time ago, and was interested to see what LW made of it. Apparently one set of skeptical journal club meetup notes: https://www.lesswrong.com/posts/gptXmhJxFiEwuPN98/meetup-notes-ole-peters-on-ergodicity

I am not sure what to make of the criticisms in the Seattle meetup notes (they appear correct, but I am not sure if they are relevant; see my comment there).

Not planning to write a proper post, but here is an example blog post of Peters' which I found illustrative and which demonstrates why I think the ... (read more)

2ryan_b
I am not versed in economics literature, so I can't meet your need. But I have also encountered ergodicity economics, and thought it was interesting because it had good motivations. I am skeptical for an entirely different reason; I encountered ergodic theory beforehand in the context of thermodynamics, where it has been harshly criticized. I instinctively feel like if we can do better in thermodynamics, we can employ the same math to do better in other areas. Of course this isn't necessarily true: ergodic theory might cleave reality better when describing an economy than gas particles; there is probably a significant difference between the economics version and the thermodynamics version; the criticism of ergodic theory might be ass-wrong (I don't think it is, but I'm not qualified enough for strong confidence).

Peters' December 2019 Nature Physics paper (https://www.nature.com/articles/s41567-019-0732-0 ) provides some perspective on the 0.6/1.5x coin flip example and other conclusions of the above discussion. (If Peters' claims have changed along the way, I wouldn't know.)

In my reading, Peters' basic claim there is not that ergodicity economics can solve the coin flip game in a way that classical economics cannot (because it can, by switching to expected log-wealth utility instead of expected wealth), but that the utility functions as originally pres... (read more)
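(For readers new to the example: the standard 0.6/1.5x coin flip multiplies your wealth by 1.5 on heads and 0.6 on tails. The ensemble average grows, since $0.5 \cdot (1.5 + 0.6) = 1.05 > 1$, yet the time-average growth factor is $\sqrt{1.5 \cdot 0.6} \approx 0.949 < 1$, so almost every individual trajectory decays; maximizing expected log wealth captures this. A minimal simulation sketch:)

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_flips = 100_000, 100
# Each flip multiplies wealth by 1.5 (heads) or 0.6 (tails), equal odds
factors = rng.choice([1.5, 0.6], size=(n_paths, n_flips))
wealth = factors.prod(axis=1)  # final wealth, starting from 1

print("ensemble mean:  ", wealth.mean())      # expectation is 1.05**100 ≈ 131
print("median path:    ", np.median(wealth))  # about 0.9**50 ≈ 0.005
print("fraction ruined:", (wealth < 1).mean())
```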