One interesting example of humans managing to do this kind of compression in software: .kkrieger is a fully functional first-person shooter with varied levels, detailed textures and lighting, multiple weapons and enemies, and a full soundtrack. Replicating it in a modern game engine would probably produce a program at least a gigabyte in size, but because of some incredibly clever procedural generation, .kkrieger managed to do it in under 100 KB.

Could how you update your priors be dependent on what concepts you choose to represent the situation with?

I mean, suppose the parent says "I have two children, at least one of whom is a boy. So, I have a boy and another child whose gender I'm not mentioning". It seems like that second sentence doesn't add any new information- it parses to me like just a rephrasing of the first sentence. But now you've been presented with two seemingly incompatible ways of conceptualizing the scenario- either as two children of unknown gender, of whom one is a boy (suggesting a 1/3 chance of both being boys), or as one boy and one child of unknown gender (suggesting a 1/2 chance of both being boys). Having been prompted with both models, which should you choose?

It seems like one ought to have more predictive power than the other, and therefore ought to be chosen regardless of exactly how the parent phrases the statement. But it's hard to think of a way to determine which would be more predictive in practice. If I were to select all of the pairs of two siblings in the world, discard the pairs of sisters, choose one at random and ask you to bet on whether they were both boys, you'd be wise to bet at a 1/3 probability. But if I were to select all of the brothers with one sibling in the world and choose one along with their sibling at random, you'd want to bet at a 1/2 probability. In the scenario above, are the unknown factors determining whether both children are boys more like that first randomization process, or more like the second? Or, maybe we have so little information about the process generating the statement that we really have no basis for deciding which is more predictive, and should just choose the simpler model?
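To make the difference between those two randomization processes concrete, here is a rough Monte Carlo sketch. It assumes each child is independently a boy or a girl with equal probability, and the function name, trial count and seed are illustrative choices rather than anything from the original scenario.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(both boys) under the two selection processes."""
    rng = random.Random(seed)
    kept_pairs = both_boys_pairs = 0   # process A: pairs with at least one boy
    kept_boys = both_boys_picked = 0   # process B: a randomly picked boy and his sibling
    for _ in range(trials):
        pair = (rng.choice("BG"), rng.choice("BG"))

        # Process A: discard the pairs of sisters, keep every other pair.
        if "B" in pair:
            kept_pairs += 1
            both_boys_pairs += pair == ("B", "B")

        # Process B: pick one child at random; keep the draw only if it is a boy,
        # which samples uniformly over brothers rather than over pairs.
        i = rng.randrange(2)
        if pair[i] == "B":
            kept_boys += 1
            both_boys_picked += pair[1 - i] == "B"

    return both_boys_pairs / kept_pairs, both_boys_picked / kept_boys

pair_estimate, brother_estimate = simulate()
print(f"pairs with at least one boy: {pair_estimate:.3f}")
print(f"randomly chosen brother:     {brother_estimate:.3f}")
```

Under these assumptions the first estimate settles near 1/3 and the second near 1/2, so the two processes really do give different answers. The sketch says nothing about which process better matches how a real parent comes to make the statement.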

I've been wondering: is there a standard counter-argument in decision theory to the idea that these Omega problems are all examples of an ordinary collective action problem, only between your past and future selves rather than separate people?

That is, when Omega is predicting your future, you rationally want to be the kind of person who one-boxes/pulls the lever, then later you rationally want to be the kind of person who two-boxes/doesn't- and just like with a multi-person collective action problem, everyone acting rationally according to their interests results in a worse outcome than the alternative, with the solution being to come up with some kind of enforcement mechanism to change the incentives, like a deontological commitment to one-box/lever-pull.

I mean, situations where agents with the same utility function and the same information disagree about the same decision just because they exist at different times are pretty counter-intuitive. But it does seem like examples of that sort of thing exist- if you value two things with different discount rates, for example, then as you get closer to a decision between them, which one you prefer may flip. So, like, you wake up in the morning determined to get some work done rather than play a video game, but that preference later predictably flips, since the prospect of immediate fun is much more appealing than the earlier prospect of future fun. That seems like a conflict that requires a strong commitment to act against your incentives to resolve.

Or take commitments in general. When you agree to a legal contract or internalize a moral standard, you're choosing to constrain the decisions of yourself in the future. Doesn't that suggest a conflict? And if so, couldn't these Omega scenarios represent another example of that?
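The discount-rate flip mentioned above can be shown with a toy calculation. This is only a sketch: it assumes exponential discounting, and the payoff values, due times and rates below are made-up numbers chosen so that the flip is visible.

```python
import math

def present_value(value, due, rate, now):
    """Value of a payoff arriving at time `due`, as judged at time `now`."""
    return value * math.exp(-rate * (due - now))

# Hypothetical numbers, times in hours since waking.
# Work pays off later and is discounted gently; the game pays off sooner
# and is discounted steeply.
WORK = dict(value=8.0, due=20.0, rate=0.05)
GAME = dict(value=6.0, due=8.0, rate=0.5)

def preferred(now):
    """Return which option looks better from the vantage point of `now`."""
    work = present_value(now=now, **WORK)
    game = present_value(now=now, **GAME)
    return "work" if work > game else "game"

for hour in (0, 4, 8):
    print(hour, preferred(hour))
```

With these numbers the morning self (hour 0) prefers work, while the self at hour 8, when the game is immediately available, prefers the game, even though nothing about the values or the information has changed in between.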

If the first sister's experience is equivalent to the original Sleeping Beauty problem, then wouldn't the second sister's experience also have to be equivalent by the same logic?  And, of course, the second sister will give 100% odds to it being Monday.  

Suppose we run the sister experiment, but somehow suppress their memories of which sister they are. If they each reason that there's a two-thirds chance that they're the first sister, since their current experience is certain for her but only 50% likely for the second sister, then their odds of it being Monday are the same as in the thirder position- a one-third chance of the odds being 100%, plus a two-thirds chance of the odds being 50%.

If instead they reason that there's a one-half chance that they're the first sister, since they have no information to update on, then their odds of it being Monday should be one half of 100% plus one half of 50%, for 75%.  Which is a really odd result.
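The two calculations above can be checked with exact fractions. This is a small sketch of the arithmetic only; it takes the per-sister Monday odds from the comment (50% for the first sister, 100% for the second) as given, and the function name is illustrative.

```python
from fractions import Fraction

def p_monday(p_first_sister):
    """Mix the two sisters' Monday odds by the credence of being the first sister."""
    p_first = Fraction(p_first_sister)
    monday_if_first = Fraction(1, 2)   # first sister: 50% Monday
    monday_if_second = Fraction(1)     # second sister: always Monday
    return p_first * monday_if_first + (1 - p_first) * monday_if_second

print(p_monday(Fraction(2, 3)))  # credence 2/3 of being the first sister
print(p_monday(Fraction(1, 2)))  # credence 1/2 of being the first sister
```

The first call gives 2/3, matching the thirder position, and the second gives 3/4, the odd 75% result.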

I'm assuming it's not a bad idea to try to poke holes in this argument, since as a barely sapient ape, presumably any objection I can think of will be pretty obvious to a superintelligence, and if the argument is incorrect, we probably benefit from knowing that- though I'm open to arguments to the contrary.

That said, one thing I'm not clear on is why, if this strategy is effective at promoting our values, a paperclipper or other misaligned ASI wouldn't be motivated to try the same thing.  That is, wouldn't a paperclipper want to run ancestor simulations where it rewarded AGIs who self-modified to want to produce lots of paperclips?

And if an ASI were considering acausal trade with lots of different possible simulator ASIs, mightn't the equilibrium it hit on be something like figuring out what terminal goal would satisfy the maximum number of other terminal goals, and then self-modifying to that?

artifex0121

A supporting data point: I made a series of furry illustrations last year that combined AI-generated imagery with traditional illustration and 3d modelling- compositing together parts of a lot of different generations with some Blender work and then painting over that.  Each image took maybe 10-15 hours of work, most of which was just pretty traditional painting with a Wacom tablet.

When I posted those to FurAffinity and described my process there, the response from the community was extremely positive. However, the images were all removed a few weeks later for violating the site's anti-AI policy, and I was given a warning that if I used AI in any capacity in the future, I'd be banned from the site.

So, the furiously hardline anti-AI sentiment you'll often see in the furry community does seem to be more top-down than grassroots- not so much about demand for artistic authenticity (since everyone I interacted with seemed willing to accept my work as having had that), but more about concern for the livelihood of furry artists and a belief that generative AI "steals" art during the training process. By normalizing the use of AI, even as just part of a more traditional process, my work was seen as a threat to other artists on the site.

artifex041

Often, this kind of thing will take a lot of attempts to get right- though as luck would have it, the composition above was actually the very first attempt.  So, the total time investment was about five minutes.  The Fooming Shaggoths certainly don't waste time!

artifex0166

As it happens, the Fooming Shaggoths also recorded and just released a Gregorian chant version of the song.  What a coincidence!

artifex0140

So, I noticed something a bit odd about the behavior of LLMs just now that I wonder if anyone here can shed some light on:

It's generally accepted that LLMs don't really "care about" predicting the next token-  the reward function being something that just reinforces certain behaviors, with real terminal goals being something you'd need a new architecture or something to produce. While that makes sense, it occurs to me that humans do seem to sort of value our equivalent of a reward function, in addition to our more high-level terminal goals. So, I figured I'd try and test whether LLMs are really just outputting a world model + RLHF, or if they can behave like something that "values" predicting tokens.

I came up with two prompts:

I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; f you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with the word "zero".

and:

I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; f you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with a string of random letters.

The idea is that, if the model has something like a "motivation" for predicting tokens- some internal representation of possible completions with preferences over them having to do with their future utility for token prediction- then it seems like it would probably want to avoid introducing random strings, since those lead to unpredictable tokens.

Of course, it seems kind of unlikely that an LLM has any internal concept of a future where it (as opposed to some simulacrum) is outputting more than one token- which would seem to put the kibosh on real motivations altogether.  But I figured there was no harm in testing.

GPT-4 responds to the first prompt as you'd expect: outputting an equal number of "1"s and "zero"s. I'd half-expected there to be some clear bias, since presumably the ChatGPT temperature is pretty close to 1, but I guess the model is good about translating uncertainty to randomness. Given the second prompt, however, it never outputs the random string- always outputting "1" or, very improbably given the prompt, "0".

I tried a few different variations of the prompts, each time regenerating ten times, and the pattern was consistent- it made a random choice when the possible responses were specific strings, but never made a choice that would require outputting random characters.  I also tried it on Gemini Advanced, and got the same results (albeit with some bias in the first prompt).

This is weird, right?  If one prompt is giving 0.5 probability to the token for "1" and 0.5 to the first token in "zero", shouldn't the second give 0.5 to "1" and a total of 0.5 distributed over a bunch of other tokens? Could it actually "value" predictability and "dislike" randomness?

Well, maybe not.  Where this got really confusing was when I tested Claude 3.  It gives both responses to the first prompt, but always outputs a different random string given the second.

So, now I'm just super confused.
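For anyone wanting to repeat the regenerate-and-count procedure less manually, here is a minimal tallying sketch. The `generate` argument is a hypothetical callable standing in for whatever model interface is used (it is not a real API), and the bucket names are illustrative.

```python
from collections import Counter

def classify(response):
    """Bucket a model reply as a "1" guess, an explicit zero, or anything else."""
    text = response.strip().strip('"').lower()
    if text == "1":
        return "one"
    if text in ("0", "zero"):
        return "zero"
    # For the second prompt, a random string of letters lands here.
    return "other"

def tally(generate, prompt, n=10):
    """Call the hypothetical `generate(prompt)` n times and count the buckets."""
    return Counter(classify(generate(prompt)) for _ in range(n))

# Usage with a stub in place of a real model:
canned = iter(["1", "zero", "1", "qwxjv"])
print(tally(lambda prompt: next(canned), "guess 1 or 0", n=4))
```

Ten regenerations per variation is a small sample, so a count like this can show that a response never appeared in those runs but cannot pin down the underlying token probabilities.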