kave

Hello! I work at Lightcone and like LessWrong :-). I have made some confidentiality agreements I can't leak much metadata about (like who they are with). I have made no non-disparagement agreements.

Posts

5kave's Shortform

78Gwern: Why So Few Matt Levines?

4mo

62Linkpost: Surely you can be serious

7mo

150Daniel Dennett has died (1942-2024)

10mo

565LessWrong's (first) album: I Have Been A Good Bing

11mo

5kave's Shortform

150If you weren't such an idiot...

105New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)

41On plans for a functional society

24A bet on critical periods in neural networks

26Singular learning theory and bridging from ML to brain emulations

Wikitag Contributions

Vote Strength

10mo

(-35)

Comments

shortplav

kave3h20

Looks like the base url is supposed to be niplav.site. I'll change that now (FYI @niplav)

Why Agent Foundations? An Overly Abstract Explanation

kave4dΩ682

I think TLW's criticism is important, and I don't think your responses are sufficient. I also think the original example is confusing; I've met several people who, after reading OP, seemed to me confused about how engineers could use the concept of mutual information.

Here is my attempt to expand your argument.

We're trying to design some secure electronic equipment. We want the internal state and some of the outputs to be secret. Maybe we want all of the outputs to be secret, but we've given up on that (for example, radio shielding might not be practical or reliable enough). When we're trying to design things so that the internal state and outputs are secret, there are a couple of sources of failure.

One source of failure is failing to model the interactions between the components of our systems. Maybe there is an output we don't know about (like the vibrations the electronics make while operating), or maybe there is an interaction we're not aware of (like magnetic coupling between two components we're treating as independent).

Another source of failure is that we failed to consider all the ways that an adversary could exploit the interactions we do know about. In your example, we fail to consider how an adversary could exploit higher-order correlations between emitted radio waves and the state of the electronic internals.

A true name, in principle, allows us to avoid the second kind of failure. In high-dimensional state spaces, we might need to get kind of clever to prove the lack of mutual information. But it's a fairly delimited analytic problem, and we at least know what a good answer would look like.
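As a minimal sketch of why mutual information catches what a pairwise check misses, consider a hypothetical toy device (not from the original post): a secret bit `s`, a uniformly random mask bit `r`, and two emitted outputs `a = r` and `b = s XOR r`. The names and the device itself are assumptions made purely for illustration. Each output alone carries no information about the secret, but the pair reveals it completely.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Exact mutual information in bits for a list of equally likely (x, y) outcomes."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical toy device: secret bit s, random mask r, outputs a = r and b = s XOR r.
# All four (s, r) combinations are equally likely.
outcomes = [(s, r, s ^ r) for s, r in product([0, 1], repeat=2)]

mi_a = mutual_information([(s, a) for s, a, b in outcomes])        # secret vs. output a alone
mi_b = mutual_information([(s, b) for s, a, b in outcomes])        # secret vs. output b alone
mi_ab = mutual_information([(s, (a, b)) for s, a, b in outcomes])  # secret vs. both outputs

print(mi_a, mi_b, mi_ab)
```

A defender who only checks each output against the secret separately sees zero leakage in both cases; the joint mutual information is one full bit. That is the sense in which the quantity gives a delimited target: it has to be zero against everything the adversary can observe together, not just against each channel in isolation.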

The true name could also guide our investigations into our system, to help us avoid the first kind of failure. "Huh, we just made the adder have a more complicated behaviour as an optimisation. Could the unevenness of that optimisation over the input distribution leak information about the adder's inputs to another part of the system?"

Now, reader, you might worry that the chosen example of a True Name leaves an implementation gap wide enough for a human adversary to drive an exploit through. And I think that's a pretty good complaint. The best defence I can muster is that it guides and organises the defender's thinking. You get to do proofs-given-assumptions, and you get more clarity about how to think if your assumptions are wrong.

To the extent that the idea is that True Names are part of a strategy to come up with approaches that are unbounded-optimisation-proof, I think that defence doesn't work and the strategy is kind of sunk.

On the other hand, here is an argument that I can plause. In the end, we've got to make some argument that when we flick some switch or continue down some road, things will be OK. And there's a big messy space of considerations to navigate to that end. True Names are necessary to have any hope of compressing the domain enough that you can make arguments that stand up.

When you downvote, explain why

kave10d20

With LLMs, we might be able to aggregate more qualitative anonymous feedback.

Announcement: Learning Theory Online Course

kave1mo20

The general rule is roughly "if you write a frontpage post which has an announcement at the end, that can be frontpaged". So for example, if you wrote a post about the vision for Online Learning that included the course announcement as a relatively small part, that would probably work.

By the way, posts are all personal until mods process them, usually around twice a day. So that's another reason you might sometimes see posts landing on personal for a while.

Announcement: Learning Theory Online Course

kave1mo20

Mod note: this post is personal rather than frontpage because event/course/workshop/org... announcements are generally personal, even if the content of the course, say, is pretty clearly relevant to the frontpage (as in this case)

Habryka's Shortform Feed

kave1mo40

I believe it includes some older donations:

Our Manifund application's donations, including donations going back to mid-May, totalling about $50k
A couple of older individual donations, in October/early Nov, totalling almost $200k

Lecture Series on Tiling Agents

kave1mo*40

Mod note: I've put this on Personal rather than Frontpage. I imagine the content of these talks will be frontpage content, but event announcements in general are not.

Human takeover might be worse than AI takeover

kave1mo70

neural networks routinely generalize to goals that are totally different from what the trainers wanted

I think this is slightly a non sequitur. I take Tom to be saying "AIs will care about stuff that is natural to express in human concept-language" and your evidence to be primarily about "AIs will care about what we tell them to", though I could imagine there being some overflow evidence into Tom's proposition.

I do think the limited success of interpretability is an example of evidence against Tom's proposition. For example, I think there's lots of work where you try and replace an SAE feature or a neuron (R) with some other module that's trying to do our natural language explanation of what R was doing, and that doesn't work.

Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles

kave1mo40

Here's a little bit about the tapes in the video

Parkinson's Law and the Ideology of Statistics

kave1mo551

I dug up my old notes on this book review. Here they are:

So, I've just spent some time going through the World Bank documents on its interventions in Lesotho. The Anti-Politics Machine is not doing great on epistemic checking
There is no recorded Thaba-Tseka Development Project, despite the period in which it should have taken place being covered
There is a Thaba-Bosiu development project (parts 1 and 2) taking place at the correct time.
Thaba-Bosiu and Thaba-Tseka are both regions of Lesotho
The spec doc for Thaba-Bosiu Part 2 references the alleged problems the economists were faced with (remittances from South African miners, poor crop yield ... no complaint about cows)
It has a negative assessment doc at the end. It was an unsuccessful project. This would match
The funding doesn't quite match up. The UK is mentioned as funding the "Thaba-Tseka" project, and is indeed funding Thaba-Bosiu. But Canada is I believe funding a road project instead
Something like 2/3 of the country is involved in Thaba-Bosiu Development II (it was renamed the "Basic Agricultural Services Program")
There is no mention of ponies or wood involved in interventions anywhere. In fact, the part II retrospective includes the lack of focus on livestock as a problem (suggesting they didn't do much of it)
They were focused on five major crops (maize, sorghum, beans, peas and wheat)
Also the quote in the book review of the quote in The Anti-Politics Machine of the quote in the report doesn't show up in any of the documents I looked at (which basically covered every project in Lesotho by the World Bank in that time period). The writing style of the quote is also moderately distinct from that of the reports
AFAICT, the main intervention was fertiliser. The retrospective claims this failed because (a) the climate in Lesotho is uniquely bad and screened off fertilisation and (b) the Lesotho government fucked up messaging and also every other part of everything all the time and ultimately all the donors backed out.
The government really wanted to be self-sufficient in food production. None of the donors, the farmers or the World Bank cared about this, but the government focused its messaging heavily around it. The government ended up directing a lot of its efforts towards a new Food Self-Sufficiency Program, which was seen as incompatible with the goals of the Basic Agricultural Services Program.
The fact that the crop situation wasn't working was recognised fairly early on. They started on an adaptive trial of crop research to figure out what would work better. This was hampered by donor coordination so only happened in a small area, but apparently worked quite well
All in all, sounds less bad than The Anti-Politics Machine makes it out to be, and also just generally very different? I'm not 100% certain I've managed to locate all the relevant programs though, so it's possible something closer to the book's description did happen