Question regarding an alignment problem: one of the key difficulties in alignment (as Eliezer Yudkowsky puts it) is that if "the verifier is broken" (i.e. the human verifier measuring alignment can be fooled by the alien actress), then we cannot be sure that a given alignment evaluation is true. Has there been any serious discussion of using a daisy chain of increasingly intelligent systems to evaluate alignment?
Hand-wavily: let human intelligence be ~= H. Can we find some epsilon e such that we can construct a series of n increasingly intelligent systems with intelligence I(k) = H + k*e for k = 1, ..., n, where we only ask for one-hop-forward verification within the chain? That is to say,...
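A minimal sketch of what that one-hop-forward chain might look like, assuming the intent is that each system at level I(k) = H + k*e only ever verifies the system one epsilon above it, and that trust in the top system holds only if every hop passes. The names (`System`, `verify_one_hop`, `chain_verdict`) and the toy numbers are purely illustrative, not an existing scheme:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class System:
    level: float       # intelligence I(k) = H + k * epsilon
    is_aligned: bool   # ground truth, not actually visible to verifiers

def verify_one_hop(verifier: System, target: System, epsilon: float) -> bool:
    """Assume a verifier is only competent when the gap is at most one epsilon."""
    if target.level - verifier.level > epsilon:
        raise ValueError("gap too large for trustworthy verification")
    return target.is_aligned  # stand-in for a real alignment evaluation

def chain_verdict(human_level: float, epsilon: float, chain: List[System]) -> bool:
    """Trust the top system only if every one-hop verification in the chain passes."""
    verifier = System(level=human_level, is_aligned=True)  # the human, at H
    for target in chain:
        if not verify_one_hop(verifier, target, epsilon):
            return False
        verifier = target  # the newly-verified system becomes the next verifier
    return True

# Toy example: H = 100, epsilon = 1, five systems with I(k) = H + k*epsilon
H, eps = 100.0, 1.0
chain = [System(level=H + k * eps, is_aligned=True) for k in range(1, 6)]
print(chain_verdict(H, eps, chain))  # True only if every hop verifies
```

The obvious open question this sketch glosses over is whether "can be verified by something one epsilon below it" actually composes, or whether small per-hop errors compound across the chain.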
Consider this subset of a hierarchy of relevant states of the world, ordered from good to bad:
I think there is a case to be made that we are de facto in state #3 now, but that AI video gen will move us into state #2. While this is far worse than state #1, it's an improvement rather than a deterioration (I used to be convinced it would be a deterioration, but am now updating my thinking).
Just to mention: once we are firmly in #2, then our trust in video...