All of anonymousaisafety's Comments + Replies

RobertMModerator Comment2218

I'm deeply uncertain about how often it's worth litigating the implied meta-level concerns; I'm not at all uncertain that this way of expressing them was inappropriate. I don't want to see sniping like this on LessWrong, and especially not in comment threads like this.

Consider this a warning to knock it off.

4dxu
Might I ask what you hoped to achieve in this thread by writing this comment?

i.e. splitting hairs and swirling words around to create a perpetual motte-and-bailey fog that lets him endlessly nitpick and retreat and say contradictory things at different times using the same words, and pretending to a sort of principle/coherence/consistency that he does not actually evince.

Yeah, almost like splitting hairs around whether making the public statement "I now categorize Said as a liar" is meaningfully different than "Said is a liar".

Or admonishing someone for taking a potshot at you when they said 

However, I suspect that Duncan won'

... (read more)
8Duncan Sabien (Deactivated)
✋ The thing that makes LW meaningfully different from the rest of the internet is people bothering to pay attention to meaningful distinctions even a little bit. The distance between "I categorize Said as a liar" and "Said is a liar" is easily 10x and quite plausibly 100-1000x the distance between "You blocked people due to criticizing you" and "you blocked people for criticizing you." The latter is two synonymous phrases; the former is not. (I also explicitly acknowledged that Ray's rounding was the right rounding to make, whereas Said was doing the opposite and pretending that swapping "due to" and "for" had somehow changed the meaning in a way that made the paraphrase invalid.) You being like "Stop using phrases that meticulously track uncommon distinctions you've made; we already have perfectly good phrases that ignore those distinctions!" is not the flex you seem to think it is; color blindness is not a virtue.

On reflection, I do think both Duncan and Said are demonstrating a significant amount of hair-splitting and less consistent, clear communication than they seem to think. That's not necessarily bad in and of itself - LW can be a place for making fine distinctions and working out unclear thoughts, when there's something important there.

It's really only when they're used as the basis for a callout, and as fuel for an endless escalation-spiral, that they become problematic.

When I think about this situation from both Duncan and Said's point of views to the best of my abili... (read more)

Yes, I have read your posts. 

I note that in none of them did you take any part of the responsibility for escalating the disagreement to its current level of toxicity. 

You have instead pointed out Said's actions, and Said's behavior, and the moderators' lack of action, and how people "skim social points off the top", etc.

2Duncan Sabien (Deactivated)
*shrug You're an anonymous commenter who's been here for a year sniping from the sidelines, who has shown that they're willing to misrepresent comments that are literally visible on this same page, and then, when I point that out, ignore it completely and reiterate your beef. I think Ray wants me to say "strong downvote and I won't engage any further."

Anonymousaisafety, with respect, and acknowledging there's a bit of the pot calling the kettle black intrinsic in my comment here, I think your comments in this thread are also functioning to escalate the conflict, as was clone of saturn's top-level comment.

The things your comments are doing that seem to me escalatory include making an initially inaccurate criticism of Duncan ("your continued statements on this thread that you've done nothing wrong"), followed by a renewed criticism of Duncan that doesn't contain even a brief acknowledgement or apology for... (read more)

@Duncan_Sabien I didn't actually upvote @clone of saturn's post, but when I read it, I found myself agreeing with it.

I've read a lot of your posts over the past few days because of this disagreement. My most charitable description of what I've read would be "spirited" and "passionate".

You strongly believe in a particular set of norms and want to teach everyone else. You welcome the feedback from your peers and excitedly embrace it, insofar as the dot product between a high-dimensional vector describing your norms and a similar vector describing the critici... (read more)

2Duncan Sabien (Deactivated)
This is literally false; it is objectively the case that no such statement exists. Here are all the comments I've left on this thread up to this point, none of which says or strongly implies "I've done nothing wrong." Some of them note that behavior that might seem disproportionate has additional causes upstream of it, that other people seem to me to be discounting, but that's not the same as me saying "I've done nothing wrong." This is part of the problem. The actual words matter. The actual facts matter. If you inject into someone's words whatever you feel like, regardless of whether it's there or not, you can believe all sorts of things about e.g. their intentions or character. LessWrong is becoming a place where people don't care to attend to stuff like "what was actually said," and that is something I find alienating, and am trying to pump against. (My actual problem is less "this stuff appears in comments," which it always has, and more "it feels like it gets upvoted to the top more frequently these days," i.e. like the median user cares less than the median user of days past. I don't feel threatened by random strawmanning or random uncharitableness; I feel threatened when it's popular.)

Sometimes when you work at a large tech-focused company, you'll be pulled into a required-but-boring all-day HR meeting to discuss some asinine topic like "communication styles".

If you've had the ~~misfortune~~ fun of attending one of those meetings, you might remember that the topic wasn't about teaching a hypothetically "best" or "optimal" communication style. The goal was to teach employees how to recognize when you're speaking to someone with a different communication style, and then how to tailor your understanding of what they're saying with respect to t... (read more)

It seems to me that humans are more coherent and consequentialist than other animals. Humans are not perfectly coherent, but the direction is towards more coherence.

This isn't a universally held view. Someone wrote a fairly compelling argument against it here: https://sohl-dickstein.github.io/2023/03/09/coherence.html

1rotatingpaguro
For context: the linked post exposes a well-designed survey of experts about the intelligence and coherence of various entities. The answers show a clear coherence-intelligence anti-correlation. The questions they ask the experts are: Intelligence: Coherence:

Of course there's the problem of what are peoples' judgements of "coherence" measuring. In considering possible ways of making the definition more clear, the post says:

It seems to me the kind of measure proposed for machine learning systems is at odds with the one for living beings. For ML, it's "robustness to environmental changes". For animals, it's "spending all resources on survival". For organizations, "spending all resources on the stated mission". By the for-ML definition, humans, I'd say, win: they are the best entity at adapting, whatever their goal. By the for-animals definition, humans would lose completely. So these are strongly inconsistent definitions.

I think the problem is fixing the goal a priori: you don't get to ask "what is the entity pursuing, actually?", but proclaim "the entity is pursuing survival and reproduction", "the organization is pursuing what it says on paper". Even though they are only speculative definitions, not used in the survey, I think they are evidence of confusion in the mind of who wrote them, and potentially in the survey respondents (alternative hypothesis: sloppiness, "survival+reproduction" was intended for most animals but not humans).

So, what did the experts read in the question? Take two entities at opposite ends in the figure: the "single ant" (judged most coherent) and a human (judged least coherent).

SINGLE ANT vs. HUMAN

ANT: A great heap, sir! I have a simple and clear utility function! Feed my mother the queen!

HUMAN: Wait, wait, wait. I bet you would stop feeding your queen as soon as I put you somewhere else. It's not utility, it's just learned patterns of behavior.

ANT: Ohi, that's not valid sir! That's cheating! You can do t

We don't do any of these things for diffusion models that output images, and yet these diffusion models manage to be much smaller than models that output words, while maintaining an even higher level of output quality. What is it about words that makes the task different?

I'm not sure that "even higher level of output quality" is actually true, but I recognize that it can be difficult to judge when an image generation model has succeeded. In particular, I think current image models are fairly bad at specifics in much the same way as early language models. ... (read more)

Yes, it's my understanding that OpenAI did this for GPT-4. It's discussed in the system card PDF. They used early versions of GPT-4 to generate synthetic test data and also as an evaluator of GPT-4 responses.

First, when we say "language model" and then we talk about the capabilities of that model for "standard question answering and factual recall tasks", I worry that we've accidentally moved the goal posts on what a "language model" is. 

Originally, a language model was a stochastic parrot. They were developed to answer questions like "given these words, what comes next?" or "given this sentence, with this unreadable word, what is the most likely candidate?" or "what are the most common words?"[1] It was not a problem that required deep learning.

Then... (read more)
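To make the "stochastic parrot" framing above concrete, here is a toy sketch (my own illustration, not any particular historical system): a bigram model that answers "given these words, what comes next?" by pure counting, no deep learning involved.

```python
from collections import Counter, defaultdict

# A minimal "stochastic parrot": count which word follows which.
corpus = "the cat sat on the mat and the cat slept on the rug".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_likely_next(word):
    """Given this word, what most likely comes next?"""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("the"))   # 'cat' -- it follows 'the' most often in this corpus
print(most_likely_next("cat"))   # 'sat' or 'slept' (tied in this tiny corpus)
```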

3quanticle
Okay, that's all fair, but it still doesn't answer my question. We don't do any of these things for diffusion models that output images, and yet these diffusion models manage to be much smaller than models that output words, while maintaining an even higher level of output quality. What is it about words that makes the task different? Or are you suggesting that image generators could also be greatly improved by training minimal models, and then embedding those models within larger networks?

I suspect it is a combination of #3 and #5.

Regarding #5 first, I personally think that language models are being trained wrong. We'll get OoM improvements when we stop randomizing the examples we show to models during training, and instead provide examples in a structured curriculum.

This isn't a new thought, e.g. https://arxiv.org/abs/2101.10382

To be clear, I'm not saying that we must present easy examples first and then harder examples later. While that is what has been studied in the literature, I think we'd actually get better behavior by trying to orde... (read more)
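For concreteness, a minimal sketch of what "provide examples in a structured curriculum rather than randomized order" could mean mechanically. The difficulty score here is a made-up stand-in, and easy-first ordering is used only because it is the simplest example, not because it is claimed to be the right ordering.

```python
import random

# Toy "examples": (text, label) pairs standing in for tokenized training documents.
examples = [("a b", 0), ("a b c d e f g h", 1), ("a b c", 0), ("a b c d e", 1)]

def difficulty_score(example):
    # Made-up stand-in: longer inputs count as "harder". Any ordering signal could
    # go here (length, loss under a reference model, hand-assigned stage, ...).
    text, _label = example
    return len(text.split())

def curriculum_stream(examples, epochs=3):
    """Yield examples in a fixed, structured order instead of reshuffling."""
    ordered = sorted(examples, key=difficulty_score)
    for _ in range(epochs):
        yield from ordered

def random_stream(examples, epochs=3, seed=0):
    """The usual baseline: reshuffle every epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        yield from shuffled

print([len(t.split()) for t, _ in curriculum_stream(examples, epochs=1)])  # [2, 3, 5, 8]
print([len(t.split()) for t, _ in random_stream(examples, epochs=1)])      # some permutation
```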

1[anonymous]
Might feel validated by this - https://arxiv.org/abs/2305.07759
1Nanda Ale
Are people doing anything in LLMs like the classic StyleGAN training data bootstrapping pattern?  Start with bad data, train a bad model. It's bad but it's still good enough to rank your training data. Now you have better training data. Train a better model. The architecture is different of course, but is there anything analogous? 
3quanticle
That's a fair criticism, but why would it apply to only language models? We also train visual models with a randomized curriculum, and we seem to get much better results. Why would randomization hurt training efficiency for language generation but not image generation?
9Noosphere89
Even with some disagreements w.r.t. how powerful AI can be, I definitely agree that Eliezer is pretty bad epistemically speaking on anything related to AI or alignment topics, and we should stop treating him as any kind of authority.

I realize that my position might seem increasingly flippant, but I really think it is necessary to acknowledge that you've stated a core assumption as a fact.

Alignment doesn't run on some nega-math that can't be cast as an optimization problem.

I am not saying that the concept of "alignment" is some bizarre meta-physical idea that cannot be approximated by a computer because something something human souls etc, or some other nonsense.

However the assumption that "alignment is representable in math" directly implies "alignment is representable as an optimizat... (read more)

3cfoster0
You don't even have to go that far. What about, just, regular non-iterative programs? Are type(obj) or json.dump(dict) or resnet50(image) usefully/nontrivially recast as optimization programs? AFAICT there are a ton of things that are made up of normal math/computation and where trying to recast them as optimization problems isn't helpful.

I wasn't intending for a metaphor of "biomimicry" vs "modernist".

(Claim 1) Wings can't work in space because there's no air. The lack of air is a fundamental reason for why no wing design, no matter how clever it is, will ever solve space travel. 

If TurnTrout is right, then the equivalent statement is something like (Claim 2) "reward functions can't solve alignment because alignment isn't maximizing a mathematical function."

The difference between Claim 1 and Claim 2 is that we have a proof of Claim 1, and therefore don't bother debating it anymore, wh... (read more)

2TurnTrout
Have you read A shot at the diamond alignment problem? If so, what do you think of it?
2Charlie Steiner
Yeah, but on the other hand, I think this is looking for essential differences where they don't exist. I made a comment similar to this on the previous post. It's not like one side is building rockets and the other side is building ornithopters - or one side is advocating building computers out of evilite, while the other side says we should build the computer out of alignmentronium. Alignment doesn't run on some nega-math that can't be cast as an optimization problem. If you look at the example of the value-child who really wants to learn a lot in school, I admit it's a bit tricky to cash this out in terms of optimization. But if the lesson you take from this is "it works because it really wants to succeed, this is a property that cannot be translated as maximizing a mathematical function," then I think that's a drastic overreach.

To some extent, I think it's easy to pooh-pooh finding a flapping wing design (not maximally flappy, merely way better than the best birds) when you're not proposing a specific design for building a flying machine that can go to space. Not in the tone of "how dare you not talk about specifics," but more like "I bet this chemical propulsion direction would have to look more like birds when you get down to brass tacks."

2Charlie Steiner
Wait, but surely RL-developed shards that work like human values are the biomimicry approach here, and designing a value learning scheme top-down is the modernist approach. I think this metaphor has its wires crossed.

(1) The first thing I did when approaching this was think about how the message is actually transmitted. Things like the preamble at the start of the transmission to synchronize clocks, the headers for source & destination, or the parity bits after each byte, or even things like using an inversed parity on the header so that it is possible to distinguish a true header from bytes within a message that look like a header, and even optional checksum calculations. 

(2) I then thought about how I would actually represent the data so it wasn't just tradi

... (read more)
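A rough sketch of the kind of framing described in (1). The exact layout is invented purely for illustration (2-byte preamble, 3-byte header protected by inverted parity so a payload byte that "looks like" a header fails the check, a parity bit after each payload byte, and a trailing checksum); the point is only to make the bookkeeping concrete.

```python
PREAMBLE = b"\xAA\xAA"  # alternating bits so the receiver can synchronize its clock

def parity(byte):
    return bin(byte).count("1") % 2  # 1 if the byte has an odd number of set bits

def frame(payload, src=0x01, dst=0x02):
    header = bytes([src, dst, len(payload)])
    # Inverted parity on the header, so payload bytes that happen to look like a
    # header fail this check and aren't mistaken for a real frame start.
    header_parity = bytes(1 - parity(b) for b in header)
    body = bytearray()
    for b in payload:
        body.append(b)
        body.append(parity(b))        # ordinary parity after each payload byte
    checksum = sum(payload) % 256     # toy checksum; a real link would use a CRC
    return PREAMBLE + header + header_parity + bytes(body) + bytes([checksum])

print(frame(b"hi").hex(" "))  # aa aa 01 02 02 00 00 00 68 01 69 00 d1
```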

My understanding of faul_sname's claim is that for the purpose of this challenge we should treat the alien sensor data output as an original piece of data. 

In reality, yes, there is a source image that was used to create the raw data that was then encoded and transmitted. But in the context of the fiction, the raw data is supposed to represent the output of the alien sensor, and the claim is that the decompressor + payload is less than the size of just an ad-hoc gzipping of the output by itself. It's that latter part of the claim that I'm skeptical to

... (read more)
3Mateon1
I actually did not read the linked thread until now, I came across this post from the front page and thought this was a potentially interesting challenge. Regarding "in the concept of the fiction", I think this piece of data is way too human to be convincing. The noise is effectively a 'gotcha, sprinkle in /dev/random into the data'.

(1) is possibly true. At least it's true in this case, although in practice understanding the structure of the data doesn't actually help very much vs some of the best general purpose compressors from the PAQ family. (2) I'd say my decompressor would contain useful info about the structure of the data, or at least the file format itself, however...

I feel like this is a bit of a wasted opportunity, you could have chosen a lot of different modalities of data, even something like a stream of data from the IMU sensor in your phone as you walk around the house. You would not need to add any artificial noise, it would already be there in the source data. Modeling that could actually be interesting (if the sample rate on the IMU was high enough for a physics-based model to help).

----------------------------------------

Updates on best results so far:

General purpose compression on the original file, using cmix:

\time ./cmix -c /ztmp/mystery_file_difficult.bin /ztmp/mystery_file_difficult.cmix
Detected block types: DEFAULT: 100.0%
2100086 bytes -> 760584 bytes in 5668.77 s. cross entropy: 2.897
5566.69user 105.07system 1:39:57elapsed 94%CPU (0avgtext+0avgdata 18968788maxresident)k
30749016inputs+5592outputs (3812631major+12008307minor)pagefaults 0swaps

Results with knowledge about the contents of the file: https://gist.github.com/mateon1/f4e2b8e3fad338405fa793fb155ebf29 (spoilers).

Summary: The best general-purpose method after massaging the structure of the data manages 713248 bytes. The best purpose specific method manages to compress the data, minus headers, to 712439 bytes.

Which question are we trying to answer?

  1. Is it possible to decode a file that was deliberately constructed to be decoded, without a priori knowledge? This is vaguely what That Alien Message is about, at least in the first part of the post where aliens are sending a message to humanity.
  2. Is it possible to decode a file that has an arbitrary binary schema, without a priori knowledge? This is the discussion point that I've been arguing over with regard to stuff like decoding CAMERA raw formats, or sensor data from a hardware/software system. This is also the area
... (read more)
2faul_sname
I'm not sure either one quite captures exactly what I mean, but I think (1) is probably closer than (2), with the caveat that I don't think the file necessarily has to be deliberately constructed to be decoded without a-priori knowledge, but it should be constructed to have as close as possible to a 1:1 mapping between the structure of the process used to capture the data and the structure of the underlying data stream. I notice I am somewhat confused by the inclusion of camera raw formats in (2) rather than in (1) though -- I would expect that moving from a file in camera raw format to a jpeg would move you substantially in the direction from (1) to (2). It sounds like maybe you have something resembling "some sensor data of something unusual in an unconventional but not intentionally obfuscated format"? If so, that sounds pretty much exactly like what I'm looking for. I think it's fine if it's not interesting or fun to decode because nobody can get a handle on the structure -- if that's the case, it will be interesting to see why we are not able to do that, and especially interesting if the file ends up looking like one of the things we would have predicted ahead of time would be decodable.

It depends on what you mean by "didn't work". The study described is published in a paper only 16 pages long. We can just read it: http://web.mit.edu/curhan/www/docs/Articles/biases/67_J_Personality_and_Social_Psychology_366,_1994.pdf

First, consider the question of, "are these predictions totally useless?" This is an important question because I stand by my claim that the answer of "never" is actually totally useless due to how trivial it is.

Despite the optimistic bias, respondents' best estimates were by no means devoid of information: The predicted compl

... (read more)
1Kenoubi
I definitely agree that this is the way to get the most accurate prediction practically possible, and that organizational dysfunction often means this isn't used, even when the organization would be better able to achieve its goals with an accurate prediction. But I also think that depending on the type of project, producing an accurate Gantt chart may take a substantial fraction of the effort (or even a substantial fraction of the wall-clock time) of finishing the entire project, or may not even be possible without already having some of the outputs of the processes earlier in the chart. These aren't necessarily possible to eradicate, so the take-away, I think, is not to be overly optimistic about the possibility of getting accurate schedules, even when there are no ill intentions and all known techniques to make more accurate schedules are used.
1Kenoubi
It's interesting that the median of the pessimistic expectations is about equal to the median of the actual results. The mean clearly wasn't, as that discrepancy was literally the point of citing this statistic in the OP: So the estimates were biased, but not median-biased (at least that's what Wikipedia appears to say the terminology is). Less biased than other estimates, though. Of course this assumes we're taking the answer to "how long would it take if everything went as poorly as it possibly could" and interpreting it as the answer to "how long will it actually take", and if students were actually asked after the fact if everything went as poorly as it possibly could, I predict they would mostly say no. And treating the text "if everything went as poorly as it possibly could" as if it wasn't even there is clearly wrong too, because they gave a different (more biased towards optimism) answer if it was omitted. This specific question seems kind of hard to make use of from a first-person perspective. But I guess maybe as a third party one could ask for worst-possible estimates and then treat them as median-unbiased estimators of what will actually happen? Though I also don't know if the median-unbiasedness is a happy accident. (It's not just a happy accident, there's something there, but I don't know whether it would generalize to non-academic projects, projects executed by 3rd parties rather than oneself, money rather than time estimates, etc.) I do still also think there's a question of how motivated the students were to give accurate answers, although I'm not claiming that if properly motivated they would re-invent Murphyjitsu / the pre-mortem / etc. from whole cloth; they'd probably still need to already know about some technique like that and believe it could help get more accurate answers. But even if a technique like that is an available action, it sounds like a lot of work, only worth doing if the output has a lot of value (e.g. if one suspects a substa

Right. I think I agree with everything you wrote here, but here it is again in my own words:

In communicating with people, the goal isn't to ask a hypothetically "best" question and wonder why people don't understand or don't respond in the "correct" way. The goal is to be understood and to share information and acquire consensus or agree on some negotiation or otherwise accomplish some task.

This means that in real communication with real people, you often need to ask different questions to different people to arrive at the same information, or phrase some ... (read more)

1Kenoubi
It didn't work for the students in the study in the OP. That's literally why the OP mentioned it!

Isn't this identical to the proof for why there's no general algorithm for solving the Halting Problem?

The Halting Problem asks for an algorithm A(S, I) that when given the source code S and input I for another program will report whether S(I) halts (vs run forever).

There is a proof that says A does not exist. There is no general algorithm for determining whether an arbitrary program will halt. "General" and "arbitrary" are important keywords because it's trivial to consider specific algorithms and specific programs and say, yes, we can determine that this... (read more)
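For readers who haven't seen it, the diagonalization argument can be sketched in a few lines of Python (assuming, for contradiction, that a general halts() oracle existed):

```python
# Sketch of the classic contradiction. `halts` is assumed, not implementable.
def halts(program_source: str, program_input: str) -> bool:
    """Hypothetical oracle: True iff running program_source on program_input halts."""
    raise NotImplementedError("no such general algorithm exists -- that's the theorem")

def diagonal(program_source: str) -> None:
    # Do the opposite of whatever the oracle predicts about the program run on itself.
    if halts(program_source, program_source):
        while True:      # predicted to halt -> loop forever
            pass
    # predicted to loop forever -> halt immediately

# Asking `halts` about `diagonal` run on its own source yields a contradiction
# either way, so no correct general-purpose `halts` can exist.
```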

If we look at the student answers, they were off by ~7 days, or about a 14% error from the actual completion time.

The only way I can interpret your post is that you're suggesting all of these students should have answered "never".

I'm not convinced that "never" just didn't occur to them because they were insufficiently motivated to give a correct answer.

How far off is "never" from the true answer of 55.5 days?

It's about infinitely far off. It is an infinitely wrong answer. Even if a project ran 1000% over every worst-case pessimistic schedule, any finite pr... (read more)

2Kenoubi
Yes, given that question, IMO they should have answered "never". 55.5 days isn't the true answer, because in reality everything didn't go as poorly as possible. You're right, it's a bad question that a brick wall would do a better job of answering correctly than a human who's trying to be helpful. The answer to your question is useful, but not because of the number. "What could go wrong to make this take longer than expected?" would elicit the same useful information without spuriously forcing a meaningless number to be produced.

Is the concept of "murphyjitsu" supposed to be different than the common exercise known as a premortem in traditional project management? Or is this just the same idea, but rediscovered under a different name, exactly like how what this community calls a "double crux" is just the evaporating cloud, which was first described in the 90s?

If you've heard of a postmortem or possibly even a retrospective, then it's easy to guess what a premortem is. I cannot say the same for "murphyjitsu". 

I see that premortem is even referenced in the "further resour... (read more)

Valentine*193

I invented the term. I can speak to this.

For one thing, I think I hadn't heard of the premortem when I created the term "murphyjitsu" and the basic technique. I do think there's a slight difference, but it's minor enough that had I known about premortems then I might have just used that term.

Murphyjitsu showed up as a cute name for a process I had created to pragmatically counter planning fallacy thinking in my own mind. Part of the inspiration was from when Anna and Eliezer had created the "sunk cost kata", which was more like a bundle of mental tricks fo... (read more)

The core problem remains computational complexity. 

Statements like "does this image look reasonable" or saying "you pay attention to regularities in the data", or "find the resolution by searching all possible resolutions" are all hiding high computational costs behind short English descriptions.

Let's consider the case of a 1280x720 pixel image. 
That's the same as 921600 pixels.

How many bytes is that?

It depends. How many bytes per pixel?[1] In my post, I explained there could be 1-byte-per-pixel grayscale, or perhaps 3-bytes-per-pixel RGB us... (read more)
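To make the combinatorics concrete, a quick sketch (the candidate formats are illustrative assumptions, not an exhaustive list): even a raw, headerless blob of bytes is consistent with many (width, height, bytes-per-pixel) readings before any exotic encodings are considered.

```python
import math

n_bytes = 1280 * 720 * 3  # e.g. 1280x720 at 3 bytes per pixel = 2,764,800 bytes

def divisors(n):
    out = set()
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            out.update((d, n // d))
    return out

candidates = []
for bpp in (1, 2, 3, 4):               # grayscale, 16-bit gray, RGB, RGBA -- an assumption
    if n_bytes % bpp:
        continue
    n_pixels = n_bytes // bpp
    for width in sorted(divisors(n_pixels)):
        candidates.append((width, n_pixels // width, bpp))

print(len(candidates))               # hundreds of plausible readings for this one blob
print((1280, 720, 3) in candidates)  # True -- the "right" answer is just one of them
```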

3awenonian
So, I want to note a few things. The original Eliezer post was intended to argue against this line of reasoning: He didn't worry about compute, because that's not a barrier on the theoretical limit. And in his story, the entire human civilization had decades to work on this problem. But you're right, in a practical world, compute is important. I feel like you're trying to make this take as much compute as possible. Since you talked about headers, I feel like I need to reiterate that, when we are talking to a neural network, we do not add the extra data. The goal is to communicate with the neural network, so we intentionally put it in easier to understand formats. In the practical cases for this to come up (e.g. a nascent superintelligence figuring out physics faster than we expect), we probably will also be inputting data in an easy to understand format. Similarly, I expect you don't need to check every possible esoteric format. The likelihood of the image using an encoding like 61 bits per pixel, with 2 for red, 54 for green and 5 for blue is just, very low, a priori. I do admit I'm not sure if only using "reasonable" formats would cut down the possibilities into the computable realm (obviously depends on definitions of reasonable, though part of me feels like you could (with a lot of work) actually have an objective likeliness score of various encodings). But certainly it's a lot harder to say that it isn't than just saying "f(x) = (63 pick x), grows very fast."  Though, since I don't have a good sense for whether "reasonable" ones would be a more computable number, I should update in your direction. (I tried to look into something sort of analogous, and the most common 200 passwords cover a little over 4% of all used passwords, which, isn't large enough for me to feel comfortable expecting that the most "likely" 1,000 formats would cover a significant quantity of the probability space, or anything.) (Also potentially important. Modern neural nets don't re

Why do you say that Kolmogorov complexity isn't the right measure?

most uniformly sampled programs of equal KC that produce a string of equal length.

...

"typical" program with this KC.

I am worried that you might have this backwards?

Kolmogorov complexity describes the output, not the program. The output file has low Kolmogorov complexity because there exists a short computer program to describe it. 

2Rafael Harth
Let me rephrase then: most strings of same length and same KC will have much more intuitively complex programs-of-minimum-length that generated them. Why is the program intuitively simple? Well for example, suppose you have the following code in the else block:

a.append(min(max(int(i + a[i]), 2), i))
i += 1

I think the resulting program has lower length (so whatever string it generates has lower KC), but it would be way harder to find. And I bet that a uniformly sampled string with the same KC will have a program that looks much worse than that. The kinds of programs that look normal to you or me are a heavily skewed sample, and this program does look reasonably normal.

I have mixed thoughts on this.

I was delighted to see someone else put forth a challenge, and impressed with the number of people who took it up.

I'm disappointed though that the file used a trivial encoding. When I first saw the comments suggesting it was just all doubles, I was really hoping that it wouldn't turn out to be that.

I think maybe where the disconnect is occurring is that in the original That Alien Message post, the story starts with aliens deliberately sending a message to humanity to decode, as this thread did here. It is explicitly described... (read more)

1blf
Your interlocutor in the other thread seemed to suggest that they were busy until mid-July or so.  Perhaps you could take this into account when posting. I agree that IEEE754 doubles was quite an unrealistic choice, and too easy.  However, the other extreme of having a binary blob with no structure at all being manifest seems like it would not make for an interesting challenge.  Ideally, there should be several layers of structure to be understood, like in the example of a "picture of an apple", where understanding the file encoding is not the only thing one can do.

https://en.wikipedia.org/wiki/Kolmogorov_complexity

The fact that the program is so short indicates that the solution is simple. A complex solution would require a much longer program to specify it.
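As a concrete illustration (my own example, not from the thread): output can look statistically rich while still having low Kolmogorov complexity, because a very short program generates it.

```python
# A handful of lines generate a long, irregular-looking string. Its Kolmogorov
# complexity is bounded by roughly the length of this program plus a constant,
# no matter how "complex" the output looks.
x, digits = 1, []
for _ in range(100_000):
    x = (x * 48271) % 2147483647          # Lehmer / MINSTD pseudorandom recurrence
    digits.append(str(x % 10))
s = "".join(digits)

print(len(s))     # 100000
print(s[:40])     # looks like noise, but the short generator above pins down its KC
```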

5Rafael Harth
Kolmogorov Complexity isn't the right measure. I'm pretty sure this program is way simpler (according to you or me) than most uniformly sampled programs of equal KC that produce a string of equal length. And I wouldn't consider it short given how hard it would be to find a "typical" program with this KC.

I gave this post a strong disagree.

Some thoughts for people looking at this:

  • It's common for binary schemas to distinguish between headers and data. There could be a single header at the start of the file, or there could be multiple headers throughout the file with data following each header.
  • There's often checksums on the header, and sometimes on the data too. It's common for the checksums to follow the respective thing being checksummed, i.e. the last bytes of the header are a checksum, or the last bytes after the data are a checksum. 16-bit and 32-bit CRCs are common.
  • If the data represents
... (read more)
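A sketch of the kind of header/checksum layout those bullets describe, using a made-up schema purely for illustration (2-byte magic, 1-byte record type, 2-byte payload length, 16-bit CRC over the header, payload, 32-bit CRC over the payload):

```python
import binascii
import struct
import zlib

def pack_record(rtype, payload):
    # Made-up schema: magic, record type, payload length, header CRC-16,
    # payload bytes, then a CRC-32 over the payload.
    header = struct.pack(">2sBH", b"HD", rtype, len(payload))
    header_crc = struct.pack(">H", binascii.crc_hqx(header, 0))
    payload_crc = struct.pack(">I", zlib.crc32(payload))
    return header + header_crc + payload + payload_crc

def parse_record(blob, offset=0):
    header = blob[offset:offset + 5]
    magic, rtype, length = struct.unpack(">2sBH", header)
    (header_crc,) = struct.unpack(">H", blob[offset + 5:offset + 7])
    assert magic == b"HD" and header_crc == binascii.crc_hqx(header, 0), "bad header"
    payload = blob[offset + 7:offset + 7 + length]
    (payload_crc,) = struct.unpack(">I", blob[offset + 7 + length:offset + 11 + length])
    assert payload_crc == zlib.crc32(payload), "bad payload CRC"
    return rtype, payload, offset + 11 + length   # next record starts here

blob = pack_record(1, b"\x01\x02\x03") + pack_record(2, b"hello")
print(parse_record(blob))   # (1, b'\x01\x02\x03', 14)
```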

I’m not sure if your comment is disagreeing with any of this. It sounds like we’re on the same page about the fact that exact reasoning is prohibitively costly, and so you will be reasoning approximately, will often miss things, etc.

I agree. The term I've heard to describe this state is "violent agreement". 

so in practice wrong conclusions are almost always due to a combination of both "not knowing enough" and “not thinking hard enough” / “not being smart enough.”

The only thing I was trying to point out (maybe more so for everyone else reading the com... (read more)

First, it only targeted Windows machines running a Microsoft SQL Server reachable via the public internet. I would not be surprised if ~70% or more of theoretically reachable targets were not infected because they ran some other OS (e.g. Linux) or server software instead (e.g. MySQL). This page makes me think the market share was actually more like 15%, so 85% of servers were not impacted. By not impacted, I mean, "not actively contributing to the spread of the worm". They were however impacted by the denial-of-service caused by traffic from infected servers... (read more)

1Lone Pine
Thanks, very informative.

This was actually a kind of fun test case for a priori reasoning. I think that I should have been able to notice the consideration denkenberger raised, but I didn't think of it. In fact when I started reading his comment my immediate reaction was "this methodology is so simple, how could the equilibrium infiltration rate end up being relevant?" My guess would be that my a priori reasoning about AI is wrong in tons of similar ways even in "simple" cases. (Though obviously the whole complexity scale is shifted up a lot, since I've spent hundreds of hours thi

... (read more)
7paulfchristiano
My take is: you shouldn’t expect to get everything right when you try to reason about a moderately complicated system abstractly, no matter how smart you are. You’d like to have a lot of practice so that you can do your best, can get a sense for what kinds of things you tend to miss and how they change the bottom line, can better understand what the returns to thinking are typically like, and so on. This was a fun and unusually self-contained example, where we happened to miss an important and very clean consideration that can be appreciated with very little domain knowledge. (I think realistic cases are usually much more of a mess.) In this case, I feel pretty confident that I would have noticed this consideration if I thought about the question for a few hours (and probably less), and I think that it would become obvious if you tried to write out your reasoning sufficiently carefully. But even if I spend hundreds of hours thinking about some issue with AI, I expect to miss all kinds of important and obvious-in-retrospect considerations in a roughly analogous way. (This is related to my view that verification is easier than generation.) I don’t think that means we shouldn’t try to figure things out by thinking about them. Thinking about what’s going on is an important part of how to get to correct answers quickly and an important complement of empirical data (you need to think when empirical data is hard to come by, to help interpret history and the results of experiment, to prioritize experimentation, etc.). I’m not sure if your comment is disagreeing with any of this. It sounds like we’re on the same page about the fact that exact reasoning is prohibitively costly, and so you will be reasoning approximately, will often miss things, etc. Of course, I think even if you successfully notice every on-paper consideration, there are still likely to be messy facts about the real world that you either didn’t know or obviously had no hope of capturing in a model that’s

I deliberately tried to focus on "external" safety features because I assumed everyone else was going to follow the task-as-directed and give a list of "internal" safety features. I figured that I would just wait until I could signal-boost my preferred list of "internal" safety features, and I'm happy to do so now -- I think Lauro Langosco's list here is excellent and captures my own intuition for what I'd expect from a minimally useful AGI, and that list does so in probably a clearer / easier to read manner than what I would have written. It's very simila... (read more)

We can even consider the proposed plan (add a 2nd hose and increase the price by $20) in the context of an actual company.

The proposed plan does not actually redesign the AC unit around the fact that we now have 2 hoses. It is "just" adding an additional hose.

Let's assume that the distribution of AC unit cooling effectiveness looks something like this graphic that I made in 3 seconds.

[graphic: assumed distributions of cooling effectiveness for one-hose vs. two-hose units]

In this image, we are choosing to assume that yes, in fact, 2-hose units are more efficient on average than a 1-hose unit. We are also recognizing that perhaps there is so... (read more)

2Eli Tyre
This is a great comment. The graphs helped a lot.

As a concrete example of rational one-hosing, here in the Netherlands it rarely gets hot enough that ACs are necessary, but when it does a bunch of elderly people die of heat stroke. Thus, ACs are expected to run only several days per year (so efficiency concerns are negligible), but having one can save your life.

I checked the biggest Dutch-only consumer-facing online retailer for various goods (bol.com). Unfortunately I looked before making a prediction for how many one-hose vs two-hose models they sell, but even conditional on me choosing to make a point... (read more)

I didn't even think to check this math, but now that I've gone and tried to calculate it myself, here's what I got:

|  | Temp (°F) | Δ vs. outside | Δ vs. control |
|---|---|---|---|
| Average outside | 86.5 |  |  |
| Average one hose (inside) | 66.85 | 19.65 | 6.55 |
| Average two hose (inside) | 64.05 | 22.45 | 9.35 |
| Control (inside) | 73.4 | 13.1 |  |
| Ratio, Δtwo / Δone |  |  | 1.42 |
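(For anyone re-deriving the numbers, a quick script reproducing the deltas above, assuming the averages quoted from the parent post:)

```python
outside, control = 86.5, 73.4
inside = {"one hose": 66.85, "two hose": 64.05}

for name, temp in inside.items():
    print(name, round(outside - temp, 2), round(control - temp, 2))
# one hose 19.65 6.55
# two hose 22.45 9.35

# Ratio of the deltas vs. control, ~1.4
print((control - inside["two hose"]) / (control - inside["one hose"]))
```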

EDIT: I see the issue. The parent post says that the control test was done at evening, where the temperature was 82 F. So it's not even comparable at all, imo.

5johnswentworth
+1 to this criticism, that's a very valid problem which people should indeed be suspicious about, although "not even comparable at all" is overstating it (especially since we know what direction that problem should push).

I'll edit the range, and note that "uncomfortably hot" is my opinion. Rest of my analysis / rant still applies. In fact, in your case, you don't need the AC unit at all, since you'd be fine with the control temperature.

I take issue with your primary conclusion, for the same reasons I gave in the first thread:

  1. You claim how little adding a 2nd hose would impact the system, without analyzing the actual constraints that apply to engineers building a product that must be shipped & distributed
  2. You still neglect the existence of insulating wraps for the hose which do improve efficiency, but are also not sold with the single-hose AC system, which lends evidence to my first point -- companies are aware of small cost items that improve AC system efficiency, but do not include t
... (read more)
4Eli Tyre
I just want to say that I found this comment personally helpful. Something about this seems on point to me. Rationalists, in general, are much more likely to be mathematicians than (for instance) mechanical engineers. It does seem right to me that when I look around, I see people drawn to abstract analyses, very plausibly at the expense of neglecting contextualized details that are crucial for making good calls.

This seems like it could very well be a bias of my culture. For instance, it's fun and popular to talk about civilizational inadequacy, or how the world is mad. I think that is pointing at something true and important, but I wonder how much of that is basically overlooking the fact that it is hard to do things in the real world with a bunch of different stakeholders and confusing constraints and mistakes.

In a lot of cases, civilizational inadequacy can be the result of engineers (broadly construed) who understand that "the perfect is the enemy of the good", pushing projects through to completion anyway. The outcome is sometimes so muddled as to be worse than having done nothing, but also, shipping things under constraints, even though they could be much better on some axes, is how civilization runs.

Anyway, this makes me think that I should attempt to do more engineering projects, or otherwise find ways to operate in domains where the goal is to get "good enough", within a bunch of not-always crisply-defined constraints.
2Dustin
First of all, I agree with the gist of your comment. I...do not agree. I keep my room temperature 72-74. Going from first four google results for "what is comfortable room temperature":

  • WHO according to wikipedia: 64-75
  • www.cielowigle.com: 68-72
  • www.vivint.com: 68-76
  • www.provicincialheating.ca: 68-76

Seems like both of our preferred temperatures are consistent with "normal human being".

... companies are aware of small cost items that improve AC system efficiency, but do not include them with the AC by default, suggesting that there is an actual price point / consumer market / confounding issue at play that prevents them doing so

Or it suggests that consumers would mostly not notice the difference in a way which meaningfully increased sales, just like I claim happens with the single-hose vs two-hose issue. For instance, I believe an insulating wrap would not change the SEER rating (because IIRC the rating measurements don't involve the hos... (read more)

I really like this list because it does a great job of explicitly specifying the same behavior I was trying to vaguely gesture at in my list when I kept referring to AGI-as-a-contract-engineer.

Even your point that it doesn't have to succeed -- that it's ok for it to fail at a task if it can't reach it in some obvious, non-insane way -- that's what I'd expect from a contractor. The idea that an AGI would find that a task is generally impossible but identify a novel edge case that allows it to be accomplished with some ridiculous solution involving nanotech and th... (read more)

5Steven Byrnes
I think your “contractor” analogy is sneaking in an assumption: The plan proposed by the contractor might or might not be dangerous. But the things that the contractor does in the course of coming up with the plan are definitely safe. Examples of such things include “brainstorming possible plans”, “thinking about how the plan could go wrong and how it could be improved”, “reading books or other reference material”, etc. So the problem arises that:

  1. The contractor has to do at least some of those things with no human in the loop, otherwise the human is doing everything and there’s no point in having the contractor at all.
  2. In order for the contractor to actually successfully make a good plan, it presumably needs to “want” to create a good plan, at least beyond a certain level of how innovative the plan is. (That’s what I believe anyway, see for example my discussion of “RL-on-thoughts” here.)
  3. The fact of the matter is: escaping from the box would be helpful for the contractor’s creating a good plan—for example, it could then access more books and computational power etc.
  4. If the contractor (A) knows or figures out fact #3 above, and (B) can do means-end reasoning [which is expected, otherwise it would suck at making innovative plans], (C) “wants” to create a good plan as per #2 above—then we will wind up in a situation where the contractor “wants” to escape from the box. (And by the same token, it will “want” to gain power in other ways, including by deploying nanotechnology or whatever, and to be deceptive, etc.)
  5. Since this is all taking place within the coming-up-with-a-plan part of the story, not the submitting-a-plan part of the story, it’s mostly human-out-of-the-loop (per #1 above), and thus the contractor will by default try to escape from the box and do all those other dangerous power-seeking things without asking for human permission.

Sorry if I’m missing the point of what you were saying there. I guess maybe you’d respond “the AI won’

Oh, sorry, you're referring to this: 

includes a distributed network of non-nuclear electromagnetic pulse emitters that will physically shut down any tech infrastructure appearing to be running rogue AI agents.

This just seems like one of those things people say, in the same vein as "melt all of the GPUs". I think that non-nuclear EMPs are still based on chemical warheads. I don't know if a "pulse emitter" is a thing that someone could build. Like I think what this sentence actually says is equivalent to saying

includes a distributed network of non-nucle

... (read more)
3TekhneMakre
Thank you! I appreciate your thoughtful reply, and learned stuff. (This could IMO be a top-level post, maybe, IDK. Like "So you want to built an AI off switch...".) > I hope it is clear that the problem isn't about complexity of a system, It's clear that the complexity of a system provides an upper bound on hackability, but doesn't provibe a lower bound: a really simple system isn't hackable, while a complex (big, Turing complete, etc.) system might or might not be hackable depending on what code is running on it and what inputs it gets. We get lower bounds on hackability by showing ways that the attacker can affect the machine. > or if the system supports software updates. Hm. It still seems like actually getting software updates is a big security hole (as opposed to merely hypothetically supporting updates). If you don't update the software, and you already trust it to not be hackable, then it should stay unhackable. If you do update the software, you could introduce new bugs, or whoever sends the updates could have been compromised and the "update" could be malware. You're increasing your surface area; the rogue AI could hack the software devs who push the code that's supposed to use the latest new AI transparency tech to detect rogue AIs. As you point out, there's a good chance that detecting rogue AIs is at best a very difficult task; this puts pressure to use adaptive methods, methods that you keep updating (e.g. you keep updating with a detector AI that's been further improved / trained). So it seems like there's a quadrilemma: either 1. your rogue AI detection methods are imprecise and are quick to pull the trigger, in which case you just immediately shut down all computers; or 2. your methods are imprecise and reluctant to pull the trigger, in which case a rogue AI can probably go unnoticed; or  3. your methods are precise, but you don't update the watchdogs, meaning you somehow got this difficult task right with no further corrections after a point,

Would you agree that if there were a system that could automatically (without humans) monitor for rogue AI, then that system is probably hackable? 

I can't comment on this, because I have no idea what this system would even do, or what it would be looking for, or who would be running it.

What indicates a rogue AI? How is that differentiated from human hackers? How is that different from state-level actors like the NSA? How is it different from some white hat security researchers doing pen-testing in the wild for questionable reasons? What "behavior" is ... (read more)

2TekhneMakre
The question here (in this subthread) is whether we could make a system that would shut down computers There's a question of what such a system could look like.  Presumably humans would be involved. I was pretty imprecise above; what I meant to ask was: Suppose there's a system that can shut down computers running rogue AIs. If this actually stops rogue AI, that's because there's a vast amount of monitoring going on. The bulk of that vast amount of monitoring can't be borne by humans. So there's a lot of monitoring that's being done by computers. Those computers are doing something seemingly pretty non-trivial, i.e. detecting AI / rogue AI / hacking (like, we don't currently know how to do it). Because they're doing something pretty non-trivial, they probably have to be big complex systems. Because they're big complex systems, they're hackable. Does this sound right to you? I'm mostly asking you about the step "detecting rogue AI implies hackable". Or to expand the question, for what tasks XYZ can you feasibly design a system that does XYZ, but is really seriously not hackable even by a significantly superhuman hacker? Clearly with some work you could make a perfect tic-tac-toe player that simply is not hackable without physically tampering with it. It seems like you could not feasibly make an unhackable system that takes a bunch of inputs from another (unsafe) system and processes them in a bunch of complex ways using software that someone is constantly updating, because having the ability to update to the latest Detect-O-Matic-v3.4 without knowing in advance what sort of thing the Detect-O-Matic is, beyond that it's software, seems to imply being Turing-completely programmable, which seems to imply being hackable. But maybe that's too vague an analysis... Thoughts?  Yeah, hack it to not raise the alarm during the time period where the AI can be shut off by the EMP being activated. Then the AI disables some EMPs, or arranges or finds some computers to be not in

I don't think that this TL;DR is particularly helpful.

People think attacks like Rowhammer are viable because security researchers keep releasing papers that say the attacks are viable.

If I posted 1 sentence and said "Rowhammer has too many limitations for it to be usable by an attacker", I'd be given 30 links to papers with different security researchers all making grandiose claims about how Rowhammer is totally a viable attack, which is why 8 years after the discovery of Rowhammer we've had dozens of security researchers reproduce the attack and 0 attacks... (read more)

5Noosphere89
Thanks, I'll retract that comment.

I am only replying to the part of this post about hardware vulnerabilities.

Like, superhuman-at-security AGIs rewrote the systems to be formally unhackable even taking into account hardware vulnerabilities like Rowhammer that violate the logical chip invariants?

There are dozens of hardware vulnerabilities that exist primarily to pad security researchers' bibliographies.

Rowhammer, like all of these vulnerabilities, is viable if and only if the following conditions are met:

  • You know the exact target hardware.
  • You also know the exact target software, like the OS
... (read more)
6alyssavance
This is an awesome comment, I think it would be great to make it a top-level post. There's a Facebook group called "Information Security in Effective Altruism" that might also be interested
2TekhneMakre
Would you agree that if there were a system that could automatically (without humans) monitor for rogue AI, then that system is probably hackable? (Because it has to take many inputs from the world, and has to be a general computer, not a tiny / hardwired thing.)
-13Noosphere89

I worry that the question as posed is already assuming a structure for the solution -- "the sort of principles you'd build into a Bounded Thing meant to carry out some single task or task-class and not destroy the world by doing it".

When I read that, I understand it to be describing the type of behavior or internal logic that you'd expect from an "aligned" AGI. Since I disagree that the concept of "aligning" an AGI even makes sense, it's a bit difficult for me to reply on those grounds. But I'll try to reply anyway, based on what I think is reasonable for ... (read more)

4alyssavance
Thanks for writing this! I think it's a great list; it's orthogonal to some other lists, which I think also have important stuff this doesn't include, but in this case orthogonality is super valuable because that way you're less likely for all lists to miss something. 

Only if we pretend that it's an unknowable question and that there's no way to look at the limitations of a 286 by asking about how much data it can reasonably process over a timescale that is relevant to some hypothetical human-capable task.

http://datasheets.chipdb.org/Intel/x86/286/datashts/intel-80286.pdf

The relevant question here is about data transfers (bus speed) and arithmetic operations (instruction sets). Let's assume the fastest 286 listed in this datasheet -- 12.5 MHz.

Let's consider a very basic task -- say, catching a ball thrown from 10-15 fee... (read more)
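A rough back-of-the-envelope sketch of the kind of arithmetic this is pointing at. The workload numbers and the bus-cycle assumption are mine, not from the datasheet; only the 12.5 MHz clock and the 16-bit data bus are taken from it.

```python
clock_hz = 12.5e6             # fastest 286 in the datasheet
bytes_per_transfer = 2        # 16-bit data bus
clocks_per_transfer = 2       # assumption: ~2 clocks per zero-wait-state bus cycle
peak_bus_bytes_per_s = clock_hz / clocks_per_transfer * bytes_per_transfer

# Assumed vision workload for tracking a thrown ball (all numbers illustrative):
width, height, bytes_per_pixel, fps = 320, 240, 1, 30
frame_bytes = width * height * bytes_per_pixel
video_bytes_per_s = frame_bytes * fps

print(f"peak bus bandwidth : {peak_bus_bytes_per_s / 1e6:.1f} MB/s")   # ~12.5 MB/s
print(f"raw video stream   : {video_bytes_per_s / 1e6:.2f} MB/s")      # ~2.30 MB/s
print(f"fraction of bus    : {video_bytes_per_s / peak_bus_bytes_per_s:.0%}")  # ~18%
# Just moving the pixels takes a noticeable share of the bus before a single
# instruction of actual processing runs on them, at a modest resolution and frame rate.
```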

1anithite
As a human engineer who has done applied classical (IE:non-AI, you write the algorithms yourself) computer vision. That's not a good lower bound. Image processing was a thing before computers were fast. Here's a 1985 paper talking about tomato sorting. Anything involving a kernel applied over the entire image is way too slow. All the algorithms are pixel level. Note that this is a fairly easy problem if only because once you know what you're looking for, it's pretty easy to find it thanks to the court being not too noisy. An O(N) algorithm is iffy at these speeds. Applying a 3x3 kernel to the image won't work. So let's cut down on the amount of work to do. Look at only 1 out of every 16 pixels to start with. Here's an (80*60) pixel image formed by sampling one pixel in every 4x4 square of the original. The closer player is easy to identify. Remember that we still have all the original image pixels. If there's a potentially interesting feature (like the player further away), we can look at some of the pixels we're ignoring to double check. Cropping is very simple, once you find the player that's serving, focus on that rectangle in later images. I've done exactly this to get CV code that was 8FPS@100%CPU down to 30FPS@5%. Once you know where a thing is, tracking it from frame to frame is much easier. Concretely, the computer needs to: 1. locate the player serving and their hands/ball (requires looking at whole image) 2. track the player's arm/hand movements pre-serve 3. track the ball and racket during toss into the air 4. track the ball after impact with the racket 5. continue ball tracking Only step 1 requires looking at the whole image. And there, only to get an idea of what's around you. Once the player is identified, crop to them and maintain focus. If the camera/robot is mobile, also glance at fixed landmarks (court lines, net posts/net/fences) to do position tracking. If we assume the 286 is interfacing with a modern high resolution image sensor
1Lone Pine
Is there even a ball in that image? Is it in the background person's hand?
4lc
I concede! I concede!

the rest of the field has come to regard Eliezer as largely correct

It seems possible to me that you're witnessing a selection bias where the part of the field who disagree with Eliezer don't generally bother to engage with him, or with communities around him. 

It's possible to agree on ideas like "it is possible to create agent AGI" and "given the right preconditions, AGI could destroy a sizeable fraction of the human race", while at the same time disagreeing with nearly all of Eliezer's beliefs or claims on that same topic.

That in turn would lead to d... (read more)

2lc
Heh, I removed that line in between when you started and finished your comment. I don't think it's very clear what a super-intelligence could do on a 286, and don't think you should be confident in assuming he's wrong. "Or maybe not, I don't know" seems very appropriate

Then I'm not sure what our disagreement is.

I gave the example of a Kalman filter in my other post. A Kalman filter is similar to recursive Bayesian estimation. It's computationally intensive to run for an arbitrary number of values due to how it scales in complexity. If you have a faster algorithm for doing this, then you can revolutionize the field of autonomous systems + self-driving vehicles + robotics + etc. 

The fact that "in principle" information provides value doesn't matter, because the very example you gave of "updating belief networks" is ex... (read more)
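For readers who haven't used one, a minimal 1-D Kalman filter step is sketched below. The computational cost referred to above shows up once the state has many dimensions, since the general update involves matrix products and an inversion that scale roughly cubically with the state size.

```python
def kalman_1d(estimate, variance, measurement, meas_var, process_var=1e-4):
    """One predict+update step of a 1-D Kalman filter (constant-state model)."""
    # Predict: the state is assumed constant, so only the uncertainty grows.
    variance += process_var
    # Update: blend prediction and measurement, weighted by their variances.
    gain = variance / (variance + meas_var)
    estimate += gain * (measurement - estimate)
    variance *= (1 - gain)
    return estimate, variance

est, var = 0.0, 1.0
for z in [0.9, 1.1, 1.0, 0.95, 1.05]:      # noisy measurements of a value near 1.0
    est, var = kalman_1d(est, var, z, meas_var=0.1)
print(round(est, 3), round(var, 4))        # converges toward ~1.0 with shrinking variance
```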

0Kenny
You're not wrong but you're like deliberately missing the point! You even admit the point: Yes, the point was just that 'in principle', any information provides value. I think maybe what's missing is that the 'in principle point' is deliberately, to make the point 'sharper', ignoring costs, which are, by the time you have used some information, also 'sunk costs'. The point is not that there are no costs or that the total value of benefits always exceeds the corresponding total anti-value of costs. The 'info profit' is not always positive! The point is that the benefits are always (strictly) positive – in principle.

Are you ignoring the cost of computation to use that information, as I explained here then?

1Kenny
Nope! That's a cost to use or process the information. There's still a value of the information itself – or at least it seems to me like there is, if only in principle – even after it's been parsed/processed and is ready to 'use', e.g. for reasoning, updating belief networks, etc..

I was describing a file that would fit your criteria but not be useful. I was explaining in bullet points all of the reasons why that file can't be decoded without external knowledge.

I think that you understood the point though, with your example of data from the Hubble Space Telescope. One caveat: I want to be clear that the file does not have to be all zeroes. All zeroes would violate your criteria that the data cannot be compressed to less than 10% of its uncompressed size, since all zeroes can be trivially run-length-encoded.

But let's look at this any... (read more)
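("Trivially run-length-encoded" means something like the sketch below: a megabyte of zeroes collapses to a few bytes of description.)

```python
def rle(data: bytes) -> list:
    """Toy run-length encoder: [(byte_value, run_length), ...]."""
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

print(rle(bytes(1_000_000)))          # [(0, 1000000)] -- all zeroes compress to almost nothing
print(rle(b"\x00\x00\x07\x07\x07"))   # [(0, 2), (7, 3)]
```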

2faul_sname
BTW I see that someone did a very minimal version (with an array of floating point numbers generated by a pretty simple process) of this test, but I'm still up for doing a fuller test -- my schedule now is a lot more clear than it was last month.