khafra

Comments

khafra

It may be time to revisit this question. With Owain Evans et al. discovering a generalized evil vector in LLMs, and older work like [Pretraining Language Models with Human Preferences](https://www.lesswrong.com/posts/8F4dXYriqbsom46x5/pretraining-language-models-with-human-preferences) that could use a follow-up, AI in the current paradigm seems ripe for some experimentation with parenting practices in pre-training--perhaps something like affect markers for the text that goes in, or pretraining on children's literature before going on to the more technically and morally complex text?
I haven't run any experiments of my own, but this doesn't seem obviously stupid to me.
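Something like the following toy sketch is the flavor of experiment I have in mind; the affect-tag format, the keyword-based sentiment proxy, and the word-length complexity proxy are placeholder assumptions, not a real pipeline:

```python
# Toy sketch only: order pretraining documents from simple to complex and
# prepend a crude affect marker before handing them to a normal pretraining loop.
import re

def affect_marker(text: str) -> str:
    """Very crude affect tag; a real setup would use a trained classifier."""
    negative_hits = len(re.findall(r"\b(kill|hate|hurt|steal)\b", text.lower()))
    return "<|affect:negative|>" if negative_hits else "<|affect:neutral-or-positive|>"

def complexity(text: str) -> float:
    """Proxy for difficulty: mean word length. Children's literature scores low."""
    words = text.split()
    return sum(len(w) for w in words) / max(len(words), 1)

def build_curriculum(documents: list[str]) -> list[str]:
    """Simplest (child-level) documents first, each prefixed with its affect tag."""
    ordered = sorted(documents, key=complexity)
    return [affect_marker(doc) + "\n" + doc for doc in ordered]

corpus = [
    "The conspirators resolved to steal the ledger and hurt the witness.",
    "The cat sat on the mat. The cat was kind to the dog.",
    "Thermodynamic equilibrium constrains the partition function's derivatives.",
]
for doc in build_curriculum(corpus):
    print(doc, "\n---")
```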

khafra

> When there's little incentive against classifying harmless documents, and immense cost to making a mistake in the other direction, I'd expect overclassification to be rampant in these bureaucracies.

Your analysis of the default incentives is correct. However, if there is any institution that has noticed the mounds of skulls, it is the DoD. Overclassification, and classification for inappropriate reasons (explicitly enumerated in written guidance: avoiding embarrassment, covering up wrongdoing), are not allowed, and the DoD carries out audits of classified data to identify and correct overclassification.


It’s possible they’re not doing enough to fight against the natural incentive gradient toward overclassification, but they’re trying hard enough that I wouldn’t expect positive EV from disregarding all the rules.

khafra

As someone who has been allowed access into various private and government systems as a consultant, I think the near mode view for classified government systems is different for a reason. 


E.g., data is classified as Confidential when its release could cause damage to national security. It's Secret if it could cause serious damage to national security, and it's Top Secret if it could cause exceptionally grave damage to national security. 
People lose their jobs for accidentally putting a classified document onto the wrong system, even if it's still owned by the government and protected (but, protected at an insufficient level for the document). People go to jail for putting classified data onto the wrong system on purpose, even if they didn't intend to, say, sell it to the Chinese government. 

Bringing in personnel who haven't had the standard single-scope background investigation and been granted a clearance, and a new set of computers which has not gone through any accreditation and authorization process, and giving unrestricted write and read access to classified data is technically something the president could allow. But it's a completely unprecedented level of risk to assume; and AFAICT the president has not actually written any authorizations for doing this. 

There is, actually, a Government Accountability Office which does audits; they have identified billions in fraud, waste, and abuse, identified the perpetrators for punishment, and remediated the programs at fault. They have done it without unprecedented breaches in national security, or denying lawful, non-fraudulent payments from the US Treasury.
(Also, outside of my personal area of expertise, I believe denying lawful, non-fraudulent payments from the US Treasury is crossing a really big Chesterton's Fence. GPT-4o estimated a $1T-$5T impact from Treasury bond yield spreads, forex USD reserves, CDS spreads on US foreign debt, and loss of seigniorage in global trade, depending on how rare and targeted the payment denial is.)
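For intuition on the order of magnitude, here is a back-of-envelope sketch for just the yield-spread channel; every number in it is an illustrative assumption, not a figure from GPT-4o or from official data:

```python
# Back-of-envelope only: every input is an illustrative assumption.
marketable_debt = 26e12    # assumed outstanding marketable US Treasury debt, USD
spread_increase = 0.010    # assumed persistent +100 bps risk premium after a missed payment
horizon_years = 10         # assumed horizon over which the premium applies as debt rolls over

extra_interest = marketable_debt * spread_increase * horizon_years
print(f"extra interest cost over {horizon_years} years: ${extra_interest / 1e12:.1f}T")
# ~$2.6T, inside the $1T-$5T range above; the forex-reserve, CDS, and
# seigniorage channels would be on top of that.
```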

khafra

The quoted paragraph is a reference to a C.S. Lewis essay about living under the threat of global thermonuclear war. Making it slightly more accurate by using that phrase instead of "if we are going to be destroyed by Zizianism" damages the euphony and symmetry with the original quote.

khafra

This is the most optimistic believable scenario I've seen in quite a while!

khafra

> And yet it behaves remarkably sensibly. Train a one-layer transformer on 80% of possible addition-mod-59 problems, and it learns one of two modular addition algorithms, which perform correctly on the remaining validation set. It's not a priori obvious that it would work that way! There are other possible functions on ℤ/59ℤ compatible with the training data.

Seems like Simplicia is missing the worrisome part--it's not that the AI will learn a more complex algorithm which is still compatible with the training data; it's that the simplest several algorithms compatible with the training data will kill all humans OOD.
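For anyone who wants to poke at the quoted setup directly, here's a minimal sketch of that experiment; the model width, optimizer settings, and step count are my own guesses rather than anything from the post:

```python
# Minimal sketch of the quoted experiment; hyperparameters are guesses, not the
# original setup. Requires PyTorch.
import torch
import torch.nn as nn

P = 59                                        # modulus
pairs = [(a, b) for a in range(P) for b in range(P)]
perm = torch.randperm(len(pairs))
split = int(0.8 * len(pairs))                 # train on 80% of all problems

def to_tensors(idx):
    ab = torch.tensor([pairs[int(i)] for i in idx])   # (N, 2) input tokens
    y = (ab[:, 0] + ab[:, 1]) % P                     # (N,) correct sums mod 59
    return ab, y

train_x, train_y = to_tensors(perm[:split])
test_x, test_y = to_tensors(perm[split:])

class OneLayerTransformer(nn.Module):
    def __init__(self, d_model=128, nhead=4):
        super().__init__()
        self.tok = nn.Embedding(P, d_model)
        self.pos = nn.Embedding(2, d_model)
        self.layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=512, dropout=0.0, batch_first=True)
        self.out = nn.Linear(d_model, P)

    def forward(self, ab):                            # ab: (N, 2)
        pos = torch.arange(2, device=ab.device)
        h = self.layer(self.tok(ab) + self.pos(pos))
        return self.out(h[:, -1])                     # predict from the last position

model = OneLayerTransformer()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(5001):
    opt.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            acc = (model(test_x).argmax(-1) == test_y).float().mean().item()
        print(f"step {step}: train loss {loss.item():.3f}, held-out acc {acc:.3f}")
```

In the grokking literature, small algorithmic datasets trained with heavy weight decay like this tend to end up generalizing to the held-out pairs, often long after the training loss has already gone to zero.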

khafra

AFAICT, in the Highwayman example, if the would-be robber presents his ultimatum as "give me half your silk or I burn it all," the merchant should burn it all, same as if the robber says "give me 1% of your silk or I burn it all." 
But a slightly more sophisticated highwayman might say "this is a dangerous stretch of desert, and there are many dangerous, desperate people in those dunes. I have some influence with most of the groups in the next 20 miles. For x% of your silk, I will make sure you are unmolested for that portion of your travel." 
Then the merchant actually has to assign probabilities to a bunch of events, calculate Shapley values, and roll some dice for his mixed strategy.
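As a toy illustration of the kind of calculation involved (the probabilities are made up, and it leaves out the Shapley-value and mixed-strategy parts):

```python
# Toy illustration only: made-up probabilities; ignores the Shapley-value and
# mixed-strategy parts of the full analysis.
def expected_loss(pay_fraction, p_attack_unprotected, p_attack_protected, p_protection_real):
    """Return (expected silk lost if refusing, expected silk lost if paying)."""
    refuse = p_attack_unprotected
    p_attack_if_paid = (p_protection_real * p_attack_protected
                        + (1 - p_protection_real) * p_attack_unprotected)
    pay = pay_fraction + (1 - pay_fraction) * p_attack_if_paid
    return refuse, pay

refuse, pay = expected_loss(pay_fraction=0.10,
                            p_attack_unprotected=0.30,   # assumed risk with no deal
                            p_attack_protected=0.05,     # assumed risk if the deal is honored
                            p_protection_real=0.50)      # assumed chance he can deliver
print(f"expected loss if refusing: {refuse:.2f}, if paying: {pay:.2f}")
```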

khafra

Tangentially to Tanagrabeast's "least you can do" suggestion, as a case report: I came out to my family as an AI xrisk worrier over a decade ago, when one could still do so in a fairly lighthearted way. They didn't immediately start donating to MIRI and calling their senators to request an AI safety Manhattan Project, but they did agree with the arguments I presented, and check up with me, on occasion, about how the timelines and probabilities are looking.

I have had two new employers since then, and a few groups of friends; and with each, when the conversation turns to AI (as it often does, over the last half-decade), I mention my belief that it's likely going to kill us all, and expand on Instrumental Convergence, RAAP, and/or "x-risk, from Erewhon, to IJ Good, to the Extropians," depending on which aspect people seem interested in. I've been surprised by the utter lack of dismissal and mockery, so far!

khafra

See also Steven Kaas' aphorisms on Twitter:

> First Commandment of the Church of Tautology: Live next to thy neighbor  
And  
> "Whatever will be will be" is only the first secret of the tautomancers.
 

khafra

The story I read about why neighbor polling is supposed to correct for bias, specifically in the last few presidential elections, is that some people plan to vote for Trump but are ashamed of this, and don't want to admit it to people who aren't verified Trump supporters. So if you ask them who they plan to vote for, they'll dissemble. But if you ask them who their neighbors are voting for, that gives them permission to share their true opinion non-attributively.
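A toy simulation of that mechanism, with made-up numbers, just to show the direction of the bias:

```python
# Toy simulation with made-up numbers: shy supporters deny their own vote when
# asked directly, but everyone reports their neighborhood honestly.
import random

random.seed(0)
true_support = 0.50          # assumed true vote share
shy_fraction = 0.15          # assumed share of supporters who won't self-report

voters = [random.random() < true_support for _ in range(100_000)]

# Direct question: shy supporters deny their own vote.
direct = [v and random.random() >= shy_fraction for v in voters]

# Neighbor question: respondents report the support they see around them,
# which on average tracks the true rate regardless of shyness.
neighbor = [sum(random.sample(voters, 10)) / 10 for _ in range(10_000)]

print(f"true support:     {sum(voters) / len(voters):.3f}")
print(f"direct polling:   {sum(direct) / len(direct):.3f}")      # biased low
print(f"neighbor polling: {sum(neighbor) / len(neighbor):.3f}")  # close to true
```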
