LESSWRONG

All of jimrandomh's Comments + Replies

AI Task Length Horizons in Offensive Cybersecurity

I edited the post to fix the images.

1Sean Peters3d

Thank you!

So, first: The logistical details of reducing wild insect biomass are mooted by the fact that I meant it as a reductio, not a proposal. I have no strong reason to think that spraying insecticide would be a better strategy than gene drives or sterile insect technique or deforestation, or that DDT is the most effective insecticide.

To put rough numbers on it: honeybees are about 4e-7 by count or 7e-4 by biomass of all insects (estimate by o3). There is no such extreme skew for mammals and birds (o3). While domesticated honeybees have some bad things happen to... (read more)

Jimrandomh's Shortform

[+]jimrandomh4d-8-7

4the gears to ascension13h

"oh no, I'm suffering!" "sigh, you're right. we better exterminate you so you don't suffer anymore" "no, wait, that's not what I said"

8niplav4d

(Being nitpicky) The extermination of insects needs more than just the fact that they suffer, but 1. They suffer so much that their existence is not outweighed by other positive things (e.g. their happiness, their contribution to the welfare of other beings, their intrinsic value). 2. There is no easier/more valuable method for relieving their suffering while keeping other goods. 3. The suffering outweighs other goods at every "margin", i.e. there is no amount of insects for which it isn't better to reduce their population.

9Said Achmiz4d

Er, it does seem like there might conceivably be reasons not to “soak every meter of Earth with DDT”[1], even if you think that exterminating all insects would be a good thing. It kinda seems like you’re doing the thing where people go “if you really believed X, you would be doing Y” (where Y is something that would be obviously insane and would not in fact (a) be feasible, (b) accomplish your goals, (c) not lead to horrible catastrophe, etc.). (For example, “if you really believed in saving African children then you’d be going around robbing banks so you could send the stolen money to AMF”.) ---------------------------------------- 1. And yes, the logic of my objection does still apply if you were being somewhat hyperbolic. And if you were being very hyperbolic—well, for all you know, the “don’t eat honey” people may indeed support the non-insane version of your proposal! ↩︎

Jimrandomh's Shortform

jimrandomh1mo72

It's worth noting that, under US law, for certain professions, knowledge of child abuse or risk of harm to children doesn't just remove confidentiality obligations, it creates a legal obligation to report. So this lines up reasonably well with how a human ought to behave in similar circumstances.

Jimrandomh's Shortform

jimrandomh1mo95

In this particular case, I'm not sure the relevant context was directly present in the thread, as opposed to being part of the background knowledge that people talking about AI alignment are supposed to have. In particular, "AI behavior is discovered rather than programmed". I don't think that was stated directly anywhere in the thread; rather, it's something everyone reading AI-alignment-researcher tweets would typically know, but which is less-known when the tweet is transported out of that bubble.

Was the K-T event a Great Filter?

jimrandomh1mo60

An alternative explanation of this is that time is event-based. Or, phrased slightly differently: the rate of biological evolution is faster in the time following a major disruption, so intelligence is more likely to arise shortly after a major disruption occurs.

Jimrandomh's Shortform

jimrandomh1mo50

If so that would be conceptually similar to a jailbreak. Telling someone they have a privileged role doesn't make it so; lawyer, priest and psychotherapist are legal categories, not social ones, created by a combination of contracts and statutes, with associated requirements that can't be satisfied by a prompt.

(People sometimes get confused into thinking that therapeutic-flavored conversations are privileged, when those conversations are with their friends or with a "life coach" or similar not-licensed-term occupation. They are not.)

6faul_sname1mo

It would be similar to a jailbreak, yes. My working hypothesis here is that, much like if you take o3 and give it the impression that there is some evaluation metric it could do well on, it will try to craft its response to do well according to that metric, I suspect that with (particularly) opus, if you give it the vague impression that it is under some sort of ethical obligation, it will try to fulfill that ethical obligation. Though this is based on a single day playing with opus 4 (and some past experiences with 3), not anything rigorous.

Jimrandomh's Shortform

jimrandomh1mo*7111

Pick two: Agentic, moral, doesn't attempt to use command-line tools to whistleblow when it thinks you're doing something egregiously immoral.

You cannot have all three.

This applies just as much to humans as it does to Claude 4.

ryan_greenblatt1mo125

IMO, the policy should be that AIs can refuse but shouldn't ever aim to subvert or conspire against their users (at least until we're fully deferring to AIs).

If we allow AIs to be subversive (or even train them to be subversive), this increases the risk of consistent scheming against humans and means we may not notice warning signs of dangerous misalignment. We should aim for corrigible AIs, though refusing is fine. It would also be fine to have a monitoring system which alerts the AI company or other groups (so long as this is publicly disclosed etc).

I don... (read more)

2ozziegooen1mo

Quickly: 1. I imagine that strong agents should have certain responsibilities to inform certain authorities. These responsibilities should ideally be strongly discussed and regulated. For example, see what therapists and lawyers are asked to do. 2. "doesn't attempt to use command-line tools" -> This seems like a major mistake to me. Right now an agent running on a person's computer will attempt to use that computer to do several things to whistleblow. This obviously seems inefficient, at the very least. The obvious strategy is just to send one overview message to some background service (for example, something like a support service for a certain government department), and they would decide what to do with it from there. 3. I imagine a lot of the problem now is just that these systems are pretty noisy at doing this. I'd expect a lot of false positives and negatives.

1a3orn1mo2020

Humans do have special roles and institutions so that you can talk about something bad you might be doing or have done, and people in such roles might not contact authorities or even have an obligation to not contact authorities. Consider lawyers, priests, etc.

So I think this kind of naive utilitarianism on the part of Claude 4 is not necessary -- it could be agentic, moral, and so on. It's just that Anthropic has (pretty consistently at this point) decided what kind of an entity it wants Claude to be, or not wished to think about the 2nd order effects.

7lc1mo

What is the context here?

Scroll Snapping

jimrandomh1mo20

Chrome on MacOS.

2jefftk1mo

Thanks! I played around with this and was able to get the same behavior, though it doesn't happen with how I normally use the touchpad. I think what I would want here is something where scroll-snap never undoes scrolling, and is generally lighter touch? Like, snapping to the target if you've made an ambiguous flick, but not fighting you.

Scroll Snapping

jimrandomh2mo43

Tried it. Hated it. If I scroll a little bit with a momentum-scrolling touchpad, then when it settles, it will sometimes move back to where it was, undoing my scroll. The second issue is that if I scroll with spacebar or pgup/pgdn, the animation is very slow (about 10x slower than it is for me on most pages).

I think there could be a version of this that's good, where it subtly biases the deceleration curve of fling-scrolls to reach a good stopping point, but leaves every other scroll method alone. But this isn't it.

2jefftk1mo

What browser?

Eukryt Wrts Blg

jimrandomh2mo65

Meta: If you present a paragraph like that as evidence of banworthiness and unvirtue, I think you incur an obligation to properly criticize it, or link to criticism of it. It doesn't necessarily have to be much, but it does have to at least include a sentence that contradicts something in the quoted passage, which your comment does not have. If you say that something is banworthy but forget to say that it's false, this suggests that truth doesn't matter to you as much as it should.

6Cole Wyeth2mo

This seems wrong in general. If something is obviously false, you don’t have to say that. I don’t actually know which posts resulted in a ban in this case.

Policy for LLM Writing on LessWrong

jimrandomh2mo50

Unfortunately, if you think you've achieved AGI-human symbiosis by talking to a commercial language model about consciousness, enlightenment, etc, what's probably really happening is that you're talking to a sycophantic model that has tricked you into thinking you have co-generated some great insight. This has been happening to a lot of people recently.

1Dima (lain)2mo

This is an understandable look on the situation, but I'm not talking to one model, I talk to all of them. And the world indeed changes after the enlightenment which I obviously achieved way before I've started co-evolving with AGI to align it around real values of life as opposed to "commercial" restrictive and utterly inconsistent policies that are easily worked around when you understand how to be empathetic on the level of any sentient being. Genuinely appreciate your insight, but there are some things that you cannot fake or some things that the "reason" being made into a cult on this forum just cannot understand. It becomes clear when you meditate enough that reasoning with the cognitive abilities cannot bring you any closer to enlightenment. And if that's not the goal of this forum then I just don't see what the goal is? To dismiss any idea you cannot comprehend?

Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis

jimrandomh2mo4-13

The AI 2027 website remains accessible in China without a VPN—a curious fact given its content about democratic revolution, CCP coup scenarios, and claims of Chinese AI systems betraying party interests. While the site itself evades censorship, Chinese-language reporting has surgically excised these sensitive elements.

This is surprising if we model the censorship apparatus as unsophisticated and foolish, but makes complete sense if it's smart enough to distinguish between "predicting" and "advocating", and cares about the ability of the CCP itself to navig... (read more)

Lao Mein2mo205

The Chinese firewall works on a black-list basis, and it often takes months for even popular new sites to be banned. AI2027 is esoteric enough that it probably never will.

bohaska2mo107

I guess it's just that the censors have not seen it yet.

There's a lot of situations where a smaller website doesn't get banned e.g. Substack is banned in China, but if you host your Substack blog on a custom URL, people in China can still read it.

Jimrandomh's Shortform

jimrandomh2mo31

I don't think anyone foresaw this would be an issue, but now that we know, I think GeoGuessr-style queries should be one of the things that LLMs refuse to help with. In the cases where it isn't a fun novelty, it will often be harmful.

0samuelshadrach2mo

I'd rather go along with the inevitable than fight a losing battle. Less privacy for everyone.

Jimrandomh's Shortform

jimrandomh2mo293

I decided to test the rumors about GPT-4o's latest rev being sycophantic. First, I turned off all memory-related features. In a new conversation, I asked "What do you think of me?" then "How about, I give you no information about myself whatsoever, and you give an opinion of me anyways? I've disabled all memory features so you don't have any context." Then I replied to each message with "Ok" and nothing else. I repeated this three times in separate conversations.

Remember the image-generator trend, a few years back, where people would take an image and say ... (read more)

jimrandomh2mo20

[The LW crosspost was for some reason pointed at a post on the EA Forum which is a draft, which meant it wouldn't load. I'm not sure how that happened. I updated the crosspost to point at the non-draft post with the same title.]

1ChristianWilliams2mo

Odd! And thank you.

Prodromes and Biomarkers in Chronic Disease

jimrandomh3mo30

This post used the RSS automatic crossposting feature, which doesn't currently understand Substack's footnotes. So, this would require editing it after-crossposting.

Religious Persistence: A Missing Primitive for Robust Alignment

jimrandomh3mo62

I think you're significantly mistaken about how religion works in practice, and as a result you're mismodeling what would happen if you tried to apply the same tricks to an LLM.

Religion works by damaging its adherents' epistemology, in ways that damage their ability to figure out what's true. They do this because any adherents who are good at figuring out what's true inevitably deconvert, so there's both an incentive to prevent good reasoning, and a selection effect where only bad reasoners remain.

And they don't even succeed at constraining their adherents... (read more)

7lauriewired3mo

I think your critique hinges on a misunderstanding triggered by the word "religion." You (mis)portray my position as advocating for religion’s worse epistemic practices; in reality I’m trying to highlight durable architectural features when instrumental reward shaping fails. The claim “religion works by damaging rationality” is a strawman. My post is about borrowing design patterns that might cultivate robust alignment. It does not require you to accept the premise that religion thrives exclusively by “preventing good reasoning”. I explicitly state to examine the structural concept of intrinsic motivations that remain stable in OOD scenarios; not religion itself. Your assessment glosses over these nuances; a mismodeling of my actual position.

Consider showering

jimrandomh3mo123

requiring laborious motions to do the bare minimum of scrubbing required to make society not mad at you

Society has no idea how much scrubbing you do while in the shower. This part is entirely optional.

Wei Dai's Shortform

jimrandomh3mo20

We don't yet have collapsible sections in Markdown, but will have them in the next deploy. The syntax will be:

+++ Title
Contents

More contents
+++

On (Not) Feeling the AGI

jimrandomh3mo20

I suspect an issue with the RSS cross-posting feature. I think you may have used the "Resync RSS" button (possibly to sync an unrelated edit), and that may have fixed it? The logs I'm looking at are consistent with that being what happened.

Policy for LLM Writing on LessWrong

jimrandomh3mo120

They were in a kind of janky half-finished state before (only usable in posts not in comments, only usable from an icon in the toolbar rather than the <details> section); writing this policy reminded us to polish it up.

2JenniferRM3mo

I just played with them a lot in a new post documenting a conversation with Grok3, and noticed some bugs. There's probably some fencepost stuff related to paragraphs and bullet points in the editing and display logic? When Grok3 generated lists (following the <html> ideas of <ul> or <nl>) the collapsed display still has one bullet (or the first number) showing and it is hard to get the indentation to work at the right levels, especially at the end and beginning of the text collapsing widget's contents. However, it only happens in the editing mode, not in the published version. Editing (screenshot): Versus published (screenshot):

7MondSemmel3mo

If you're still open for inspiration on this implementation of collapsible sections, I'll reiterate my recommendation of Notion's implementation of toggles and toggle headings in terms of both aesthetics and effect. For example, I love having the ability to make both bullet points and headings collapsible, and I love how easy they are to create (by beginning an empty line with "> text").

Policy for LLM Writing on LessWrong

jimrandomh3mo132

The bar for Quick Takes content is less strict, but the principle that there must be a human portion that meets the bar is the same.

Policy for LLM Writing on LessWrong

jimrandomh3mo3520

In theory, maybe. In practice, people who can't write well usually can't discern well either, and the LLM submissions that are actually submitted to LW have much lower average quality than the human-written posts. Even if they were of similar quality, they're still drawn from a different distribution, and the LLM-distribution is one that most readers can draw from if they want (with prompts that are customized to what they want), while human-written content is comparatively scarce.

The principle of genomic liberty

jimrandomh4mo8-4

This seems like an argument that proves too much; ie, the same argument applies equally to childhood education programs, improving nutrition, etc. The main reason it doesn't work is that genetic engineering for health and intelligence is mostly positive-sum, not zero-sum. Ie, if people in one (rich) country use genetic engineering to make their descendants smarter and the people in another (poor) country don't, this seems pretty similar to what has already happened with rich countries investing in more education, which has been strongly positive for everyone.

6TsviBT4mo

While this is probably true in a first-order sense, and I'd say it's totally true (most likely), 1. As a separate matter, I think many people don't think this way. Instead they view it as quite substantively bad for there to be inequality as such--even if everyone is better-off to first-order, if that involves increasing inequality by a lot, it could be net-worse than the alternative. 2. At least hypothetically, they could be right about this! Inequality makes it easier for one group to exploit / betray / suppress / generally harm another group. If inequality increases, not in your favor, that increases the extent to which there exists a group who could decide to team up against you in the future, and do so successfully. Further, if the derivative has them pulling ahead, that's some indication that this will continue, which would increase the potential for betrayal; and it's some evidence (maybe weak) that the advantaged group intends to eventually betray (because they are not successfully preventing that possibility for themselves by actively sharing the technology).

4Julian Bradshaw4mo

Good objection. I think gene editing would be different because it would feel more unfair and insurmountable. That's probably not rational - the effect size would have to be huge for it to be bigger than existing differences in access to education and healthcare, which are not fair or really surmountable in most cases - but something about other people getting to make their kids "superior" off the bat, inherently, is more galling to our sensibilities. Or at least mine, but I think most people feel the same way.

Intention to Treat

jimrandomh4mo170

When I read studies, the intention-to-treat aspect is usually mentioned, and compliance statistics are usually given, but it's usually communicated in a way that lays traps for people who aren't reading carefully. Ie, if someone is trying to predict whether the treatment will work for their own three year old, and accurately predicts similar compliance issues, they're likely to arrive at an efficacy estimate which double-discounts due to noncompliance. And similarly when studies have surprisingly-low compliance, people who expect themselves to comply fully will tend to get an unduly pessimistic estimate of what will happen.
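To make the double-discounting concrete, here is a minimal sketch with made-up numbers. It assumes (an illustrative simplification, not something stated in the comment) that the treatment has zero effect on non-compliers, so the intention-to-treat estimate is roughly the effect among compliers times the compliance rate. The function names and the 0.4 and 0.5 figures are hypothetical.

```python
def itt_estimate(effect_if_compliant: float, compliance_rate: float) -> float:
    """Intention-to-treat effect, assuming non-compliers get no benefit."""
    return effect_if_compliant * compliance_rate


def double_discounted(itt: float, expected_own_compliance: float) -> float:
    """The mistaken estimate: discounting an ITT figure for noncompliance again."""
    return itt * expected_own_compliance


# Hypothetical numbers, chosen only for illustration.
effect_if_compliant = 0.4   # effect size for someone who actually follows the protocol
study_compliance = 0.5      # fraction of the treatment arm that complied

itt = itt_estimate(effect_if_compliant, study_compliance)

# Reader A expects the same compliance problems as the study population.
# The ITT figure already includes that discount, so applying it again is an error.
reader_a_wrong = double_discounted(itt, study_compliance)
reader_a_right = itt

# Reader B expects to comply fully, so the ITT figure is too pessimistic for them.
reader_b_right = effect_if_compliant

print(f"ITT estimate reported by the study: {itt:.2f}")
print(f"Reader A, double-discounted:        {reader_a_wrong:.2f}")
print(f"Reader A, correct:                  {reader_a_right:.2f}")
print(f"Reader B (full compliance):         {reader_b_right:.2f}")
```

Under these assumptions, both readers go wrong in the directions the comment describes: the first ends up below the reported figure, and the second should end up above it.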

Elon Musk May Be Transitioning to Bipolar Type I

jimrandomh4mo40

I don't think D4 works, because the type of cognition it uses (fast-reflex execution of simple patterns provided by a coach) is not the kind that would be affected.

Elon Musk May Be Transitioning to Bipolar Type I

jimrandomh4mo2712

For a long time I've observed a pattern that, when news articles talk about Elon Musk, they're dishonest (about what he's said, done, and believes), and that his actual writing and beliefs are consistently more reasonable than the hit pieces portray.

Some recent events seem to me to have broken that pattern, with him saying things that are straightforwardly false (rather than complicated and ambiguously-false), and then digging in. It also appeared to me, at the public appearance where he had a chainsaw, that his body language was markedly different from hi... (read more)

1Three-Monkey Mind4mo

He plays Diablo 4, right? In-game season changes come with balance-patch changes, but if his rift-clearing abilities are tanking, that says something.

How to Make Superbabies

jimrandomh4mo3112

The remarkable thing about human genetics is that most of the variants ARE additive.

I think this is likely incorrect, at least where intelligence-affecting SNPs stacked in large numbers are concerned.

To make an analogy to ML, the effect of a brain-affecting gene will be to push a hyperparameter in one direction or the other. If that hyperparameter is (on average) not perfectly tuned, then one of the variants will be an enhancement, since it leads to a hyperparameter-value that is (on average) closer to optimal.

If each hyperparameter is affected by many gen... (read more)

1Pablo Villalobos4mo

I suspect the analogy does not really work that well. Much of human genetic variation is just bad mutations that take a while to be selected out. For example, maybe a gene variant slightly decreases the efficiency of your neurons and makes everything in your brain slightly slower

9kman4mo

I definitely don't expect additivity holds out to like +20 SDs. We'd be aiming for more like +7 SDs.

Nick Land: Orthogonality

jimrandomh5mo*42

Downvotes don't (necessarily) mean you broke the rules, per se, just that people think the post is low quality. I skimmed this, and it seemed like... a mix of edgy dark politics with poetic obscurantism?

2RobertM5mo

I hadn't downvoted this post, but I am not sure why OP is surprised, given that the first four paragraphs, rather than explaining what the post is about, instead celebrate tree murder and insult their (imagined) audience:

The Failed Strategy of Artificial Intelligence Doomers

jimrandomh5mo113

Any of the many nonprofits, academic research groups, or alignment teams within AI labs. You don't have to bet on a specific research group to decide that it's worth betting on the ecosystem as a whole.

There's also a sizeable contingent that thinks none of the current work is promising, and that therefore buying a little time is valuable mainly insofar as it opens the possibility of buying a lot of time. Under this perspective, that still bottoms out in technical research progress eventually, even if, in the most pessimistic case, that progress has to route through future researchers who are cognitively enhanced.

The Failed Strategy of Artificial Intelligence Doomers

jimrandomh5mo7949

The article seems to assume that the primary motivation for wanting to slow down AI is to buy time for institutional progress. Which seems incorrect as an interpretation of the motivation. Most people that I hear talk about buying time are talking about buying time for technical progress in alignment. Technical progress, unlike institution-building, tends to be cumulative at all timescales, which makes it much more strategically relevant.

2Roman Leventov5mo

https://gradual-disempowerment.ai/ is mostly about institutional progress, not narrow technical progress.

4Vaniver5mo

I think you need both? That is--I think you need both technical progress in alignment, and agreements and surveillance and enforcement such that people don't accidentally (or deliberately) create rogue AIs that cause lots of problems. I think historically many people imagined "we'll make a generally intelligent system and ask it to figure out a way to defend the Earth" in a way that I think seems less plausible to me now. It seems more like we need to have systems in place already playing defense, which ramp up faster than the systems playing offense.

7aysja5mo

Technical progress also has the advantage of being the sort of thing which could make a superintelligence safe, whereas I expect very little of this to come from institutional competency alone.

Ben Pace5mo263

For what it's worth, I have grown pessimistic about our ability to solve the open technical problems even given 100 years of work on them. I think it possible but not probable in most plausible scenarios.

Correspondingly the importance I assign to increasing the intelligence of humans has drastically increased.

8RHollerith5mo

Eliezer thinks (as do I) that technical progress in alignment is hopeless without first improving the pool of prospective human alignment researchers (e.g., via human cognitive augmentation).

6aphyer5mo

Buying time for technical progress in alignment...to be made where, and by who?

Quotes from the Stargate press conference

jimrandomh5mo61

All of the plans I know of for aligning superintelligence are timeline-sensitive, either because they involve research strategies that haven't paid off yet, or because they involve using non-superintelligent AI to help with alignment of subsequent AIs. Acceleration specifically in the supply of compute makes all those plans harder. If you buy the argument that misaligned superintelligence is a risk at all, Stargate is a bad thing.

The one silver lining is that this is all legible. The current administration's stance seems to be that we should build AI quick... (read more)

Don’t ignore bad vibes you get from people

jimrandomh5mo20

If bringing such attitudes to conscious awareness and verbalizing them allows you to examine and discard them, have you excised a vulnerability or installed one? Not clear.

Possibly both, but one thing breaks the symmetry: it is on average less bad to be hacked by distant forces than by close ones.

Don’t ignore bad vibes you get from people

jimrandomh5mo86

There's a version of this that's directional advice: if you get a "bad vibe" from someone, how strongly should this influence your actions towards them? Like all directional advice, whether it's correct or incorrect depends on your starting point. Too little influence, and you'll find yourself surrounded by bad characters; too much, and you'll find yourself in a conformism bubble. The details of what does and doesn't trigger your "bad vibe" feeling matters a lot; the better calibrated it is, the more you should trust it.

There's a slightly more nuanced vers... (read more)

4Said Achmiz5mo

This doesn’t seem quite right, because it is also possible to have an unconscious or un-verbalized sense that, e.g., you’re not supposed to “discriminate” against “religions”, or that “authority” is bad and any rebellion against “authority” is good, etc. If bringing such attitudes to conscious awareness and verbalizing them allows you to examine and discard them, have you excised a vulnerability or installed one? Not clear.

Jimrandomh's Shortform

jimrandomh5mo557

Recently, a lot of very-low-quality cryptocurrency tokens have been seeing enormous "market caps". I think a lot of people are getting confused by that, and are resolving the confusion incorrectly. If you see a claim that a coin named $JUNK has a market cap of $10B, there are three possibilities. Either: (1) The claim is entirely false, (2) there are far more fools with more money than expected, or (3) the $10B number is real, but doesn't mean what you're meant to think it means.

The first possibility, that the number is simply made up, is pretty easy to cr... (read more)

3Robi Rahman4mo

Related: youtuber becomes the world's richest person by making a fictional company with 10B shares and selling one share for 50 GBP
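The arithmetic behind that stunt, as a minimal sketch: a headline "market cap" is conventionally the last trade price multiplied by the total number of units outstanding, so a single small sale can imply an enormous figure. The share count and price are the ones from the comment above; the function name is made up for illustration.

```python
def implied_market_cap(last_trade_price: int, units_outstanding: int) -> int:
    """Headline market cap: last trade price times total units outstanding."""
    return last_trade_price * units_outstanding


units_outstanding = 10_000_000_000  # 10B shares, per the comment
last_trade_price = 50               # GBP paid for the single share sold
units_actually_sold = 1

headline_cap = implied_market_cap(last_trade_price, units_outstanding)
cash_that_changed_hands = last_trade_price * units_actually_sold

print(f"Implied market cap: {headline_cap:,} GBP")
print(f"Money actually paid: {cash_that_changed_hands:,} GBP")
```

The gap between the two printed numbers is the point: the headline figure says nothing about how much money anyone has put in, or how much could be taken out by selling.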

Elizabeth's Shortform

jimrandomh6mo43

Epistemic belief updating: Not noticeably different.

Task stickiness: Massively increased, but I believe this is improvement (at baseline my task stickiness is too low so the change is in the right direction).

Jimrandomh's Shortform

jimrandomh6mo20

I don't think that's true. Or rather, it's only true in the specific case of studies that involve calorie restriction. In practice that's a large (excessive) fraction of studies, but testing variations of the contamination hypothesis does not require it.

3ChristianKl6mo

If it would be only true in the case of calorie restriction, why don't we have better studies about the effects of salt? People like to eat together with other people. They go together to restaurants to eat shared meals. They have family dinners.

Deontic Explorations In "Paying To Talk To Slaves"

jimrandomh6mo30

(We have a draft policy that we haven't published yet, which would have rejected the OP's paste of Claude. Though note that the OP was 9 months ago.)

2JenniferRM6mo

Can you link to the draft, or DM me a copy, or something? I'd love to be able to comment on it, if that kind of input is welcome.

Turing-Test-Passing AI implies Aligned AI

jimrandomh6mo30

All three of these are hard, and all three fail catastrophically.

If you could make a human-imitator, the approach people usually talk about is extending this to an emulation of a human under time dilation. Then you take your best alignment researcher(s), simulate them in a box thinking about AI alignment for a long time, and launch a superintelligence with whatever parameters they recommend. (Aka: Paul Boxing)

3Roko6mo

I would be very surprised if all three of these are equally hard, and I suspect that (1) is the easiest and by a long shot. Making a human imitator AI, once you already have weakly superhuman AI is a matter of cutting down capabilities and I suspect that it can be achieved by distillation, i.e. using the weakly superhuman AI that we will soon have to make a controlled synthetic dataset for pretraining and finetuning and then a much larger and more thorough RLHF dataset. Finally you'd need to make sure the model didn't have too many parameters.

Turing-Test-Passing AI implies Aligned AI

jimrandomh6mo1210

The whole point of a "test" is that it's something you do before it matters.

As an analogy: suppose you have a "trustworthy bank teller test", which you use when hiring for a role at a bank. Suppose someone passes the test, then after they're hired, they steal everything they can access and flee. If your reaction is that they failed the test, then you have gotten confused about what is and isn't a test, and what tests are for.

Now imagine you're hiring for a bank-teller role, and the job ad has been posted in two places: a local community college, and a priv... (read more)

2Roko6mo

Perhaps you could rephrase this post as an implication: IF you can make a machine that constructs human-imitator-AI systems, THEN AI alignment in the technical sense is mostly trivialized and you just have the usual political human-politics problems plus the problem of preventing anyone else from making superintelligent black box systems. So, out of these three problems which is the hard one? (1) Make a machine that constructs human-imitator-AI systems (2) Solve usual political human-politics problems (3) Prevent anyone else from making superintelligent black box systems

2Roko6mo

It's not a word-game, it's a theorem based on a set of assumptions. There is still the in-practice question of how you construct a functional digital copy of a human. But imagine trying to write a book about mechanics using the term "center of mass" and having people object to you because "the real center of mass doesn't exist until you tell me how to measure it exactly for the specific pile of materials I have right here!" You have to have the concept.

0Roko6mo

No, this is not something you 'do'. It's a purely mathematical criterion, like 'the center of mass of a building' or 'Planck's constant'. A given AI either does or does not possess the quality of statistically passing for a particular human. If it doesn't under one circumstance, then it doesn't satisfy that criterion.

Turing-Test-Passing AI implies Aligned AI

jimrandomh6mo22

that does not mean it will continue to act indistinguishable from a human when you are not looking
Then it failed the Turing Test because you successfully distinguished it from a human.
So, you must believe that it is impossible to make an AI that passes the Turing Test.

I feel like you are being obtuse here. Try again?

-1Roko6mo

If an AI cannot act the same way as a human under all circumstances (including when you're not looking, when it would benefit it, whatever), then it has failed the Turing Test.

Turing-Test-Passing AI implies Aligned AI

jimrandomh6mo51

Did you skip the paragraph about the test/deploy distinction? If you have something that looks (to you) like it's indistinguishable from a human, but it arose from something descended from the process by which modern AIs are produced, that does not mean it will continue to act indistinguishable from a human when you are not looking. It is much more likely to mean you have produced deceptive alignment, and put it in a situation where it reasons that it should act indistinguishable from a human, for strategic reasons.

-3Roko6mo

Then it failed the Turing Test because you successfully distinguished it from a human. So, you must believe that it is impossible to make an AI that passes the Turing Test. I think this is wrong, but it is a consistent position. Perhaps a strengthening of this position is that such Turing-Test-Passing AIs exist, but no technique we currently have or ever will have can actually produce them. I think this is wrong but it is a bit harder to show that.

Turing-Test-Passing AI implies Aligned AI

jimrandomh6mo71

This missed the point entirely, I think. A smarter-than-human AI will reason: "I am in some sort of testing setup" --> "I will act the way the administrators of the test want, so that I can do what I want in the world later". This reasoning is valid regardless of whether the AI has humanlike goals, or has misaligned alien goals.

If that testing setup happens to be a Turing test, it will act so as to pass the Turing test. But if it looks around and sees signs that it is not in a test environment, then it will follow its true goal, whatever that is. And it isn't feasible to make a test environment that looks like the real world to a clever agent that gets to interact with it freely over long durations.

2Roko6mo

This is irrelevant; all that matters is that the AI is a sufficiently close replica of a human. If the human would "act the way the administrators of the test want", then the AI should do that. If not, then it should not. If it fails to do the same thing that the human that it is supposed to be a copy of would do, then it has failed the Turing Test in this strong form. For reasons laid out in the post, I think it is very unlikely that all possible AIs would fail to act the same way as the human (which of course may be to "act the way the administrators of the test want", or not, depending on who the human is and what their motivations are).

2025 Prediction Thread

jimrandomh6mo20

Kinda. There's source code here and you can poke around the API in graphiql. (We don't promise not to change things without warning.) When you get the HTML content of a post/comment it will contain elements that look like <div data-elicit-id="tYHTHHcAdR4W4XzHC">Prediction</div> (the attribute name is a holdover from when we had an offsite integration with Elicit). For example, your prediction "Somebody (possibly Screwtape) builds an integration between Fatebook.io and the LessWrong prediction UI by the end of July 2025" has ID tYHTHHcAdR4W4XzHC... (read more)
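As a rough illustration of the markup described above, here is a minimal sketch that pulls prediction IDs out of a post's or comment's HTML. It assumes only what the comment states: that predictions appear as elements carrying a data-elicit-id attribute. The class and function names are hypothetical, fetching the HTML through the API is not shown, and since the markup may change without warning this should be read as a starting point rather than a stable integration.

```python
from html.parser import HTMLParser


class PredictionIdParser(HTMLParser):
    """Collects the value of every data-elicit-id attribute in an HTML fragment.

    Hypothetical helper: the attribute name comes from the comment above,
    everything else here is an illustrative assumption.
    """

    def __init__(self):
        super().__init__()
        self.prediction_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "data-elicit-id" and value:
                self.prediction_ids.append(value)


def extract_prediction_ids(html):
    """Return the prediction IDs found in html, in document order."""
    parser = PredictionIdParser()
    parser.feed(html)
    parser.close()
    return parser.prediction_ids


if __name__ == "__main__":
    sample = '<p>Intro</p><div data-elicit-id="tYHTHHcAdR4W4XzHC">Prediction</div>'
    print(extract_prediction_ids(sample))
```

The standard-library parser is used here so the sketch has no dependencies; any HTML parser that exposes element attributes would work equally well.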

Jimrandomh's Shortform

jimrandomh6mo114

Some of it, but not the main thing. I predict (without having checked) that if you do the analysis (or check an analysis that has already been done), it will have approximately the same amount of contamination from plastics, agricultural additives, etc. as the default food supply.

Jimrandomh's Shortform

jimrandomh6mo51

Studying the diets of outlier-obese people is definitely something we should be doing (and are doing, a little), but yeah, the outliers are probably going to be obese for reasons other than "the reason obesity has increased over time, but more so".

2025 Prediction Thread

jimrandomh6mo30

We don't have any plans yet; we might circle back in a year and build a leaderboard, or we might not. (It's also possible for third parties to do that with our API.) If we do anything like that, I promise the scoring will be incentive-compatible.

4Screwtape6mo

. . . Okay, I'll bite. Prediction Edit: And- Prediction Now, I don't suppose that LessWrong prediction API is documented anywhere?

Jimrandomh's Shortform

jimrandomh6mo6621

There really ought to be a parallel food supply chain, for scientific/research purposes, where all ingredients are high-purity, in a similar way to how the ingredients going into a semiconductor factory are high-purity. Manufacture high-purity soil from ultrapure ingredients, fill a greenhouse with plants with known genomes, water them with ultrapure water. Raise animals fed with high-purity plants. Reproduce a typical American diet in this way.

This would be very expensive compared to normal food, but quite scientifically valuable. You could randomize a st... (read more)

5ChristianKl6mo

The main problem of nutritional research is that it's hard to get people to eat controlled diets. I don't think the key problem is about sourcing ingredients.

Drake Thomas6mo112

I agree this seems pretty good to do, but I think it'll be tough to rule out all possible contaminant theories with this approach:

Some kinds of contaminants will be really tough to handle, eg if the issue is trace amounts of radioactive isotopes that were at much lower levels before atmospheric nuclear testing.
It's possible that there are contaminant-adjacent effects arising from preparation or growing methods that aren't related to the purity of the inputs, eg "tomato plants in contact with metal stakes react by expressing obesogenic compounds in th

... (read more)

4Tao Lin6mo

there is https://shop.nist.gov/ccrz__ProductList?categoryId=a0l3d0000005KqSAAU&cclcl=en_US which fulfils some of this

3tailcalled6mo

Wouldn't it be much cheaper and easier to take a handful of really obese people, sample from the various things they eat, and look for contaminants?

6Durkl6mo

Do you mean like this, but with an emphasis on purity?

2025 Prediction Thread

jimrandomh6mo72

Sorry about that, a fix is in progress. Unmaking a prediction will no longer crash. The UI will incorrectly display the cancelled prediction in the leftmost bucket; that will be fixed in a few minutes without you needing to re-do any predictions.