All of epistemic meristem's Comments + Replies

What are the most noteworthy sections to read? (Looks like you forgot to bold them.) Thanks!

The Amazon link in the post is for the third (and latest) edition, only $28. Your other links are for the second edition, except the Harvard link's dead.

Did you forget to bold the particularly noteworthy sections in the table of contents?

3Zvi
Yes.

More than a 76% pay cut, because a lot of the compensation at Google is equity+bonus+benefits; the $133k minimum listed at your link is just base salary.

I'd thought it was a law of nature that quiet norms for open plans don't actually work; it sounds like you've found a way to have your cake and eat it too!

That's fair; thanks for the feedback! I'll tone down the gallows humor on future comments; gotta keep in mind that tone of voice doesn't come across.

BTW a money brain would arise out of, e.g., a merchant caste in a static medieval society after many millennia. Much better than a monkey brain, and more capable of solving alignment!

Beren, have you heard of dependent types, which are used in Coq, Agda, and Lean? (I don't mean to be flippant; your parenthetical just gives the impression that you hadn't come across them, because they can easily enforce integer bounds, for instance.)
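For instance, here's a minimal sketch in Lean 4 (illustrative only; the same idea can be expressed in Coq or Agda) of how an integer bound can live in the type itself:

```lean
-- `Fin n` is the type of natural numbers strictly less than `n`,
-- so the bound is enforced by the type checker, not by runtime checks.
abbrev Digit := Fin 10

-- Accepted: the proof obligation `7 < 10` is discharged by `decide`.
def seven : Digit := ⟨7, by decide⟩

-- Rejected at compile time: `12 < 10` is false, so this doesn't type-check.
-- def bad : Digit := ⟨12, by decide⟩
```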

Thanks for the great back-and-forth! Did you guys see the first author's comment? What are the main updates you've had re this debate now that it's been a couple years?

4abramdemski
I have not thought about these issues too much in the intervening time. Re-reading the discussion, it sounds plausible to me that the evidence is compatible with roughly brain-sized NNs being roughly as data-efficient as humans. Daniel claims:  I think the human observation-reaction loop is closer to ten times that fast, which results in a 3 OOM difference. This sounds like a gap which is big, but could potentially be explained by architectural differences or other factors, thus preserving a possibility like "human learning is more-or-less gradient descent". Without articulating the various hypotheses in more detail, this doesn't seem like strong evidence in any direction.

Not before now. I think the comment had a relatively high probability in my world, where we still have a poor idea of what algorithm the brain is running, and a low probability in Daniel's world, where evidence is zooming in on predictive coding as the correct hypothesis. Some quotes which I think support my hypothesis better than Daniel's:

This illustrates how we haven't pinned down the mechanical parts of algorithms. What this means is that speculation about the algorithm of the brain isn't yet causally grounded -- it's not as if we've been looking at what's going on and can build up a firm abstract picture of the algorithm from there, the way you might successfully infer rules of traffic by watching a bunch of cars. Instead, we have a bunch of different kinds of information at different resolutions, which we are still trying to stitch together into a coherent picture.

This directly addresses the question of how clear-cut things are right now, while also pointing to many concrete problems the predictive coding hypothesis faces. The comment continues on that subject for several more paragraphs.

This paragraph supports my picture that hypotheses about what the brain is doing are still largely being pulled from ML, which speaks against the hypothesis of a growing consensus about what the brai

The paper's first author, beren, left a detailed comment on the ACX linkpost, painting a more nuanced and uncertain (though possibly outdated by now?) picture. To quote the last paragraph:

"The brain being able to do backprop does not mean that the brain is just doing gradient descent like we do to train ANNs. It is still very possible (in my opinion likely) that the brain could be using a more powerful algorithm for inference and learning -- just one that has backprop as a subroutine. Personally (and speculatively) I think it's likely that the brain perfor... (read more)

Re open plan offices: many people find them distracting. I doubt they're a worthwhile cost-saving measure for research-focused orgs; better to have fewer researchers in an environment conducive to deep focus. I could maybe see a business case for them in large orgs where it might be worth sacrificing individual contributors' focus in exchange for more legibility to management, or where management doesn't trust workers to stay on task when no one is hovering over their shoulder, but I hope no alignment org is like that. For many people open plan offices are... (read more)

4AdamGleave
It can definitely be worth spending money when there's a clear case for it improving employee productivity. I will note there are a range of both norms and physical layouts compatible with open-plan, ranging from "everyone screaming at each other and in line of sight" trading floor to "no talking library vibes, desks facing walls with blinders". We've tried to make different open plan spaces zoned with different norms and this has been fairly successful, although I'm sure some people will still be disturbed by even library-style areas and be more productive in a private office.

I meant I don't think the CEV of ancient Rome has the same values as ancient Rome.  Looks like your comment got truncated: "what is good if they were just"

1Sweetgum
Edited to fix.

Is there a command-line tool for previewing how a "markdown+LaTeX" text file would render as a LW draft post, for those of us who prefer to manipulate text files using productivity tools like (neo)vim and git?

Ah right, because Clippy has less measure, and so has less to offer, so less needs to be offered to it.  Nice catch!  Guess I've been sort of heeding Nate's advice not to think much about this.  :)

Of course, there would still be significant overhead from trading with and/or outbidding sampled plethoras of UFAIs, vs the toy scenario where it's just Clippy.

I currently suspect we still get more survival measure from aliens in this branch who solved their alignment problems and have a policy of offering deals to UFAIs that didn't kill their biol... (read more)

Paperclips vs. obelisks does make the bargaining harder, because Clippy would be offered fewer expected paperclips.

My current guess is we survive if our CEV puts a steep premium on that. Of course, such hopes of trade ex machina shouldn't affect how we orient to the alignment problem, even if they affect our personal lives. We should still play to win.

7interstice
But Clippy also controls fewer expected universes, so the relative bargaining positions of humans vs. UFAIs remain the same (compared to a scenario in which all UFAIs had the same value system).

Roman values aren't stable under reflection; the CEV of Rome doesn't have the same values as ancient Rome. It's like a 5-year-old locking in what they want to be when they grow up.

Locking in extrapolated Roman values sounds great to me because I don't expect that to be significantly different than a broader extrapolation. Of course, this is all extremely handwavy and there are convergence issues of superhuman difficulty! :)

2Sweetgum
I'm not exactly sure what you're saying here, but if you're saying that the fact of modern Roman values being different than Ancient Roman values shows that Ancient Roman values aren't stable under reflection, then I totally disagree. History playing out is a not-at-all similar process to an individual person reflecting on their values, so the fact that Roman values changed as history played out from Ancient Rome to modern Rome does not imply that an individual Ancient Roman's values are not stable under reflection. As an example, Country A conquering Country B could lead the descendants of Country B's population to have the values of Country A 100 years hence, but this information has nothing to do with whether a pre-conquest Country B citizen would come to have Country A's values on reflection.

I guess I just have very different intuitions from you on this. I expect people from different historical time periods and cultures to have quite different extrapolated values. I think the concept that all peoples throughout history would come into near agreement about what is good if they just reflected on it long enough is unrealistic. (Unless, of course, we snuck a bit of motivated reasoning into the design of our Value Extrapolator so that it just happens to always output values similar to our 21st century Western liberal values...)

Yes it would, at least if you mean their ancient understanding of morals.

5Sweetgum
Can you elaborate? Why would locking in Roman values not be a great success for a Roman who holds those values?

Not on mobile, in my experience.

3MondSemmel
Huh, you're right.

I think it would be helpful to note at the top of the post that it's crossposted here. I initially misinterpreted "this blog" in the first sentence as referring to LW.

3MondSemmel
Crossposted LW posts list their original source next to the author's username. See this screenshot.

This idea keeps getting rediscovered; thanks for writing it up!  The key ingredient is acausal trade between aligned and unaligned superintelligences, rather than between unaligned superintelligences and humans.  Simulation isn't a key ingredient; it's a more general question about resource allocation across branches.

Too much power, I would assume. Yet he didn't kill Bo Xilai.

Why the downboats? People new to LW jargon probably wouldn't realize "money brain" is a typo.

7interstice
Seemed like a bit of a rude way to let someone know they had a typo, I would have just gone with "Typo: money brain should be monkey brain".

Nitpick: maybe aligned and unaligned superintelligences acausally trade across future branches? If so, maybe on the mainline we're left with a very small yet nonzero fraction of the cosmic endowment, a cosmic booby prize if you will?

"Booby prize with dignity" sounds like a bit of an oxymoron...


What does "corrupt" mean in this context?  What are some examples of noncorrupt employers?

A CFAR board member also asked me to clarify what I meant by "corrupt", in addition to this question.

So, um. Some legitimately true facts the board member asked me to share, to reduce confusion on these points:

  • There hasn’t been any embezzlement. No one has taken CFAR’s money and used it to buy themselves personal goods.
  • I think if you took non-profits that were CFAR’s size + duration (or larger and longer-lasting), in the US, and ranked them by “how corrupt is this non-profit according to observers who people think of as reasonable, and who got to
... (read more)