The Second Best

Wei Dai

LESSWRONG
LW

All of Kenny's Comments + Replies

I admit now that I was in fact missing the point.

I can (maybe/kinda) imagine someone else doing something like this and not definitely thinking it was wholly unjustified, but I agree now that this a damning part of a larger damning (and long enduring) pattern of bad behavior on Wolfram's part.

You were right. I was wrong.

Linear White

Kenny1y20

Was this not originally tagged “personal blog”?

I’m not sure what the consensus is on how to vote on these posts, but I’m sad that this post’s poor reception might be why its author deactivated their account.

Feedly Breaks MathML

Kenny2y40

I just reported this to Feedly.

4jefftk2y

Thanks!

Lies Told To Children

Kenny2y20

Thanks for the info! And no worries about the (very) late response – I like that people fairly often reply at all (beyond same-day or within a few days) on this site; makes the discussions feel more 'timeless' to me.

The second "question" wasn't a question, but it was due to not knowing that Conservative Judaism is distinct from Orthodox Judaism. (Sadly, capitalization is only relatively weak evidence of 'proper-nounitude'.)

How could AIs 'see' each other's source code?

Kenny2y40

Some of my own intuitions about this:

Yes, this would be 'probabilistic' and thus this is an issue of evidence that AIs would share with each other.
Why or how would one system trust another that the state (code+data) shared is honest?
Sandboxing is (currently) imperfect, tho perhaps sufficiently advanced AIs could actually achieve it? (On the other hand, there are security vulnerabilities that exploit the 'computational substrate', e.g. Spectre, so I would guess that would remain as a potential vulnerability even for AIs that designed and built their own

Kenny2y20

I think my question is different, tho that does seem like a promising avenue to investigate – thanks!

How could AIs 'see' each other's source code?

Kenny2y20

That's an interesting idea!

How could AIs 'see' each other's source code?

Kenny2y20

An oscilloscope

I guessed that's what you meant but was curious whether I was right!

If the AI isn't willing or able to fold itself up into something that can be run entirely on single, human-inspectable CPU in an airgapped box, running code that is amenable to easily proving things about its behavior, you can just not cooperate with it, or not do whatever else you were planning to do by proving something about it, and just shut it off instead.

Any idea how a 'folded-up' AI would imply anything in particular about the 'expanded' AI?

If an AI 'folded its... (read more)

How could AIs 'see' each other's source code?

Kenny2y41

What source code and what machine code is actually being executed on some particular substrate is an empirical fact about the world, so in general, an AI (or a human) might learn it the way we learn any other fact - by making inferences from observations of the world.

This is a good point.

But I'm trying to develop some detailed intuitions about how this would or could work, in particular what practical difficulties there are and how they could be overcome.

For example, maybe you hook up a debugger or a waveform reader to the AI's CPU to get a memory dum

... (read more)

4Max H2y

An oscilloscope. Note that it isn't particularly realistic to hook up a scope to the kind of hardware that current AI systems are typically trained and run on. But what I was trying to gesture at with this comment, is this is the kind of problem that the AI might be able to help you with. If the AI isn't willing or able to fold itself up into something that can be run entirely on a single, human-inspectable CPU in an airgapped box, running code that is amenable to easily proving things about its behavior, you can just not cooperate with it, or not do whatever else you were planning to do by proving something about it, and just shut it off instead. (If the AI is already adversarial to the point where it won't let you shut it off, and is running on a distributed system, you've already lost. Willingness to fold itself up and be shut off means that the AI is already pretty aligned; it wouldn't surprise me if this problem is alignment-complete.) As for your practical difficulties, I agree these are all problems. I am not saying the problem you pose isn't hard, just that there doesn't seem to be anything that makes if fundamentally impossible to solve in principle. There is lots of academic research on hardware security and secure hardware, verifiable computing (e.g. using zk-SNARKs), formally verified programming, chain-of-trust, etc. that attempt to make progress on small pieces of this problem (not necessarily with a specific focus on AI). Stitching all of these things together into an actually end-to-end secure system for interacting with a smarter-than-human AI system is probably possible, but will require solving many unsolved problems, and designing and building AI systems in different ways than we're currently doing. IMO, it's probably better to just build an AI that provably shares human values from the start.

"Corrigibility at some small length" by dath ilan

Kenny2y10

This is a nice summary!

fictional role-playing server

As opposed to all of the non-fictional role playing servers (e.g. this one)?

I don't think most/many (or maybe any) of the stories/posts/threads on the Glowfic site are 'RPG stories', let alone some kind of 'play by forum post' histories, there's just a few that use the same settings as RPGs.

AI #5: Level One Bard

Kenny2y10

I suspect a lot of people, like myself, learn "content-based writing" by trying to communicate, e.g. in their 'personal life' or at work. I don't think I learned anything significant by writing in my own "higher forms of ['official'] education".

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Kenny2y10

I would still like to see political pressure for truly open independent audits, though.

I think that would be a big improvement. I also think ARC is, at least effectively, working on that or towards it.

Abuse in LessWrong and rationalist communities in Bloomberg News

Kenny2y21

Damning allegations; but I expect this forum to respond with minimization and denial.

This is so spectacularly bad faith that it makes me think the reason you posted this is pretty purely malicious.

Out of all of the LessWrong and 'rationalist' "communities" that have existed, how many are ones for which any of the alleged bad acts occurred? One? Two?

Out of all of the LessWrong users and 'rationalists', how many have been accused of these alleged bad acts? Mostly one or two?

My having observed extremely similar dynamics about, e.g. sexual harassment, in se... (read more)

Abuse in LessWrong and rationalist communities in Bloomberg News

Kenny2y20

Please don't pin the actions of others on me!

1Ivy Mazzola2y

Wasn't doing that.

Abuse in LessWrong and rationalist communities in Bloomberg News

Kenny2y32

No, it's not, especially given that 'whataboutism' is a label used to dismiss comparisons that don't advance particular arguments.

Writing the words "what about" does not invalidate any and all comparisons.

Abuse in LessWrong and rationalist communities in Bloomberg News

[+]Kenny2y-50

Abuse in LessWrong and rationalist communities in Bloomberg News

Kenny2y10

I think the quoted text is inflammatory and "this forum" (this site) isn't the same as wherever the alleged bad behavior took place.

Is contradicting something you believe to be, essentially, false equivalent to "denial"?

Abuse in LessWrong and rationalist communities in Bloomberg News

Kenny2y10

It is anomalous that people are quite uninterested in optimizing this as it seems clearly important.

I have the opposite sense. Many people seem very interested in this.

"This community" is a nebulous thing and this site is very different than any of the 'in-person communities'.

But I don't think there's strong evidence that the 'communities' don't already "have much lower than average levels of abuse". I have an impression that, among the very-interested-in-this people, any abuse is too much.

Abuse in LessWrong and rationalist communities in Bloomberg News

Kenny2y31

What kind of more severe punishment should "the rationalist community" mete out to X and how exactly would/should that work?

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Kenny2y10

You seem to be describing something that's so implausible it might as well be impossible.

Given the existing constraints, I think ARC made the right choice.

3jbash2y

On edit again: I have to retract much of the following. Case 1a DOES matter, because although finding the problem doesn't generate a dispute under any terms of engagement, demanding more open terms of engagement may itself generate a dispute over the terms that prevents you from being allowed to evaluate at all, so the problem may never get found, which would be bad. So if you think there's a relatively large chance that you'll find problems that the "lab" wouldn't have found on its own, and that they won't mind talking about, you may get value by engaging. I would still like to see political pressure for truly open independent audits, though. There's some precedent in financial auditing. But there's some anti-precedent in software security, where the only common way to have a truly open outside inspection is if it's adversarial with no contract at all. I wonder how feasible adversarial audits are here... === Original text === It's definitely something ARC could not make happen alone; that's the reason for making a lot of public noise. And it may indeed be something that couldn't be made to happen at all. Probably so, in fact. It would require a very unlikely degree of outside political pressure. However, if you don't manage to establish a norm like that, then here's your case analysis if you find something actually important-- 1. The underlying project can truly, permanently fix it. The subcases are-- (a) They fix it and willingly announce it, so that they get credit for being responsible actors. Not a problem under any set of contracts or norms, so this branch is irrelevant. (b) They fix it and want to keep it secret, probably because it affects something they (usually erroneously) think their competitors couldn't have dreamed up. This is a relatively rare case, so it gets relatively little consideration. They usually still should have to publish it so the next project doesn't make the same mistake. However, I admit there'll be a few subcas

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Kenny2y10

Do you think ARC should have traded publicizing the lab's demands for non-disclosure instead of performing the exercise they did?

I think that would have been a bad trade.

I also don't think there's much value to them whistleblowing about any kind of non-disclosure that the lab's might have demanded. I don't get the sense there's any additional bad (or awful) behavior – beyond what's (implicitly) apparent from the detailed info ARC has already publicly released.

I think it's very useful to maintain sufficient incentives for the lab's to want to allow things l... (read more)

5jbash2y

Yes, because at this stage, there was almost no chance that the exercise they did could have turned up anything seriously dangerous. Now is the time to set precedents and expectations, because it will really matter as these things get smarter. A minimal norm might be something like every one of these models being expected to get independent evaluations, always to be published in full, possibly after a reasonable time for remediation. That includes full explanation of all significant findings, even if explaining them clearly requires disclosing "trade secrets". Any finding so bad that it had to be permanently secret for real safety reasons should of course result in total shutdown of the effort at a minimum. [1] Any trace of unwillingness to accept a system at least that "extreme" should be treated as prima facie evidence of bad faith... leading to immediate shutdown. Otherwise it's too easy to keep giving up ground bit by bit, and end up not doing anything at all when you eventually find something really critical. It is really hard not to "go along to get along", especially if you're not absolutely sure, and especially if you've yielded in just slightly less clearcut cases before. You can too easily find yourself negotiated into silence when you really should have spoken up, or even just dithering until it's too late. This is what auditing is actually about. Late edit: Yes, by the way, that probably would drive some efforts underground. But they wouldn't happen in "standard" corporate environments. I am actually more comfortable with overtly black-hat secret development efforts than with the kinds of organizational behavior you get in a corporation whose employees can kid themselves that they're the "good guys". ---------------------------------------- 1. I do mean actually dangerous findings, here. Things that could be immediately exploited to do really unprecedented kinds of harm. I don't mean stupid BS like generating probably-badly-flawed versions of "da

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Kenny2y10

Wouldn't it be better to accept contractual bindings and then at least have the opportunity to whistleblow (even if that means accepting the legal consequences)?

Or do you think that they have some kind of leverage by which the labs would agree to NOT contractually bind them? I'd expect the labs to just not allow them to evaluate the model at all were ARC to insist on or demand this.

2jbash2y

I think that the fact of the labs demanding something like that should be loudly pointed out to the general public through all possible media. In a way that put them in the absolute worst light possible. A lot of the "safety" stuff they run on about is is pretty silly, but it shows a certain amount of sensitivity to public opinion on their part. And that's justifiable sensitivity, because people are gonna get nervous, and it wouldn't necessarily be unreasonable to shut the "labs" down or expropriate them at right about this point in the story. I also think they should be legally unable to demand such a commitment, and legally unable to enforce it even if they get it, but that's a somewhat different thought.

Bing Chat is blatantly, aggressively misaligned

Kenny2y11

I'm definitely not against reading your (and anyone else's) blog posts, but it would be friendlier to at least outline or excerpt some of the post here too.

Bing Chat is blatantly, aggressively misaligned

Kenny2y10

It looks like you didn't (and maybe can't) enter the ASCII art in the form Bing needs to "decode" it? For one, I'd expect line breaks, both after and before the code block tags and also between each 'line' of the art.

If you can, try entering new lines with <kbd>Shift</kbd>+<kbd>Enter</kbd>. That should allow new lines without being interpreted as 'send message'.

2Jacob Pfau2y

I don't think the shift-enter thing worked. Afterwards I tried breaking up lines with special symbols IIRC. I agree that this capability eval was imperfect. The more interesting thing to me was the suspicion on Bing's part to a neutrally phrased correction.

[linkpost] Better Without AI

Kenny2y20

I really like David's writing generally but this 'book' is particularly strong (and pertinent to us here on this site).

The second section, What is the Scary kind of AI?, is a very interesting and (I think) useful alternative perspective on the risks that 'AI safety' do and (arguably) should focus on, e.g. "diverse forms of agency".

The first chapter of the third ('scenarios') section, At war with the machines, provides a (more) compelling version of a somewhat common argument, i.e. 'AI is (already) out to get us'.

The second detailed scenario, in the third c... (read more)

Movie Review: Megan

Kenny2y10

This seems like the right trope:

Cartoonland Time - TV Tropes [WARNING: browsing TV Tropes can be a massive time sink]

Quantum Suicide, Decision Theory, and The Multiverse

Kenny2y53

That's why I used a fatal scenario, because it very obviously cuts all future utility to zero

I don't understand why you think a decision resulting in some person's or agent's death "cuts all future utility to zero". Why do you think choosing one's death is always a mistake?

Amazon closing AmazonSmile to focus its philanthropic giving to programs with greater impact

Kenny2y10

I think I'd opt to quote the original title in a post here to indicate that it's not a 'claim' being made (by me).

Iron deficiencies are very bad and you should treat them

Kenny2y20

IIRC, RDIs (and I would guess EARs) vary quite significantly among the various organizations that calculate/estimate/publish them. That might be related to the point ChristianKI seemed to be trying to make. (Tho I don't know whether 'iron' is one of the nutrients for which this is, or was, the case.)

ChatGPT: First Impressions

Kenny2y10

I can't tell what's the output of ChatGPT or your prompts or commentary.

[Link] "Improper Nouns" by siderea

Kenny2y10

I don't think 'chronic fatigue syndrome' is a great example of what the post discusses because 'syndrome' is a clear technical (e.g. medical) word already. Similarly, 'myalgic encephalitis' is (for most listeners or readers) not a phrase made up of common English words. Both examples seem much more clearly medical or technical terms. 'chronic fatigue' would be a better example (if it was widely used) as it would conflate the unexplained medical condition with anything else that might have the same effects (like 'chronic overexertion').

Kenny2y10

The only benefit of public schools anymore, from what I can tell, is that very wise and patient parents can use it to support their children in mastering Defense Against the Dark Arts.

Well, that and getting to play with other kids. Which is still pretty cool.

This may be, perhaps, an under-appreciated function of (public) school schooling!

What it's like to dissect a cadaver

Kenny2y2217

I would think the title is itself a content warning.

I guess someone might think this post is or could be far more abstract and less detailed about the visceral realities than it is (or maybe even just using the topic as a metaphor at most).

What kind of specific content warning do you think would be appropriate? Maybe "Describes the dissection of human bodies in vivid concrete terms."?

6Aorou2y

I thought the title was a joke or a metaphor

Dath Ilan's Views on Stopgap Corrigibility

Kenny3y20

I was going to share it with you if you didn't have it, but thanks!

Dath Ilan's Views on Stopgap Corrigibility

Kenny3y10

Has anyone shared the link with you yet?

4Aorou3y

Yes! Eliezer did on another post. Here it is if you want it: https://discord.gg/45fkqBZuTB

Kenny3y41

After a long day of work, you can kick back with projectlawful for a few hours, and then go to sleep. You can read projectlawful on the weekend. You can read projectlawful on vacation. It's rest and rejuvenation and recharging ...

I did NOT find this to be the case – I found it way TOO engaging and that it therefore, e.g. actively disrupted my ability to go to sleep. I also found the story to be extremely upsetting, i.e. NOT restful or rejuvenating. As-of now, it's extremely bleak.

I very much DO like it and I am perfectly happy that it's a glowfic. (Ther... (read more)

Appendix: Jargon Dictionary

Kenny3y12

I thought it might be a reference to this:

In the Cells of the Eggplant | Meta-rationality

Againstness

Kenny3y30

I think 'againstness' is nearly perfect :)

I didn't think anything was confusing!

'Againstness' felt like a nearly self-defining word to me.

Your course had a rough/sketched/outlined model based on other models at various levels and there's a few example techniques based on it (in the course).

"againstness control" is totally sensible – just like, e.g. 'againstness management' and 'againstness practice', are too.

I think there's an implied (and intriguing) element of using SNS arousal/dominance for, e.g. motivation. I think there are some times or circumstances... (read more)

Wolfram Research v Cook

Kenny3y10

I think I'm missing a LOT of context you have about this. I very well could be – probably am – missing some point, but I also feel like you're discouraging me from voicing anything that doesn't assume whatever your point is. Is it just that "Stephen Wolfram is bad and everyone should ignore him."? I honestly tried to investigate this, however poorly I might have done that, but this comment comes across as pretty hostile. Is it your intention to dissuade me from writing about this at all?

They bring it up because it is a shocking violation of norms, even c

Kenny3y10

I now think it is plausible that Wolfram sued "over literary conventions":

Wolfram Research v Cook - LessWrong

I suspect that Wolfram just wanted to reveal the relevant proof himself, first, in his book NKS (A New Kind of Science), and that Matthew Cook probably was contractually obligated to allow Wolfram to do that.

Given that the two parties settled, and that Cook published his paper about his proof in Wolfram's own journal (Complex Systems), two years after NKS was published, seems to mostly confirm my suspicions.

It’s Probably Not Lithium

Kenny3y10

The 'components' of our diet, e.g. meat, potatoes, etc., are very different now than earlier, and more different over the last 100 years than prior periods too.

I suspect people that are doing diets like this tho are much less obese, e.g. the Amish.

Dath Ilani Rule of Law

Kenny3y20

I've weirdly been less and less bothered since my previous comment! :)

I think "planecrash" is a better overall title still, so thanks for renaming all of the links.

It’s Probably Not Lithium

Kenny3y20

Huh – I wonder if this has helped me since I made a concerted effort to eat leafy greens regularly (basically every day).

I always liked the 'fact' that celery has net-negative calories :)

I do also lean towards eating fruit raw versus, e.g. blended in a smoothy. Make-work for my gastrointestinal system!

It’s Probably Not Lithium

Kenny3y0-8

I think you're making an unsupported inferential leap in concluding "they seem oddly uninterested in ...".

I would not expect to know why they haven't responded to my comments, even if I did bring up a good point – as you definitely have.

I don't know, e.g. what their plans are, whether they even are the kind of blogger that edits posts versus write new follow-up posts instead, how much free time they have, whether they interpreted a comment as being hostile and thus haven't replied, etc..

You make good points. But I would be scared if you 'came after me' as you seem to be doing to the SMTM authors!

It’s Probably Not Lithium

Kenny3y2-8

It just seems to me that the SMTM authors are doing a very bad job at actually pursuing the truth

I think – personally – you're holding them to an unrealistically high standard!

When I compare SMTM to the/a modal person or even a modal 'rationalist', I think they're doing a fantastic job.

Please consider being at least a little more charitable and, e.g. 'leaving people a line of retreat'.

We want to encourage each other to be better, NOT to discourage them from trying at all! :)

It’s Probably Not Lithium

Kenny3y20

I was, and still am, tho much less, excited about the contamination theory – much easier to fix!

But I think I'm back to thinking basically along the lines you outlined.

I'm currently losing weight and my model of why is:

I'm less stressed, and depressed, than recently, and I've been able to better stop eating when I'm satiated.
I'm exercising regularly and intensely; mainly rock climbing and walking (with lots of decent elevation changes). It being sunnier and warmer with spring/summer has made this much more appealing.
I'm maybe (hypo)manic (or 'in that d

... (read more)

It’s Probably Not Lithium

Kenny3y30

I also thought it was (plausibly) a 'friendly challenge' – we should be willing to bet on our beliefs!

And we should be willing to bet and also trust each other to not defect from our common good.

It’s Probably Not Lithium

Kenny3y10

The challenge did specify [emphasis mine]:

up to $1000

It’s Probably Not Lithium

Kenny3y10

I think they're a proponent of the 'too palatable food' theory.

2Matthew Green3y

Isn’t the “too palatable food” theory ridiculously easy to test, once you define what “too palatable” means? Assuming that we grant the “obesity epidemic is caused by changes in diet over the 20th century” you’d just need to switch people to an unrestricted-calorie diet that mirrors the homemade foods that our ancestors ate in the early 1900s and see if their satiety plummeted. (Here in the US that would still be a pretty diverse and filling diet that includes lots of meat, potatoes and pie.) I am skeptical that this would work (at the effect size needed to explain the obesity epidemic) but I’d love to see it tested.

It’s Probably Not Lithium

Kenny3y31

Thanks!

I've definitely downgraded the (lithium) contamination theory. I'll still take a (very modest) 100:1 bet on it tho :)

In regard to your (implied) criticism that SMTM's blog post(s) haven't been edited, it occurred to me that they may not be a 'edit blog posts' person. That seems related to their offered reasons for refusing bet challenge, i.e. 'we're in hypothesis exploration mode'. They might similarly be intending to write a follow-up blog post instead of editing the existing one.

(I actually prefer 'new post' versus 'edit existing post' as a blogging/writing policy – if there isn't a very nice (e.g. 'GitHub like') history diff visualization available of the edit history.)