I tweeted about something a lot like this
https://xcancel.com/robertskmiles/status/1877486270143934881
FYI, relative URLs don't work in emails; in the email version I received, all the links go to http://w/<post-title> and are thus broken.
You understand you can just block her on Reddit and Facebook, and move on with your life?
(I am not a huge fan of this post, but I think it's reasonable for people to care about how society orients towards x-risk and AI concerns, and as such to actively want to not screen off evidence, and take responsibility for what people affiliated with you say on the internet. So I don't think this is great advice.
I am actively subscribed to lots of people who I expect to say wrong and dumb things, because it's important to me that I correct people and avoid misunderstandings, especially when someone might mistake my opinion for the opinion of the people saying dumb stuff)
Dang, I missed this. Here's my audition for 500 Million though, I guess for next year
Very interesting! I think this is one of the rare times where I feel like a post would benefit from an up-front definition. What actually is leakage, by intensional definition?
"Using information during Training and/or Evaluation of models which wouldn't be available in Deployment."
. . . I'll edit that into the start of the post.
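For concreteness, here's a minimal sketch of one classic form of leakage (assuming scikit-learn; the data is made up for illustration): fitting a preprocessing step on the full dataset before splitting, so evaluation uses statistics that wouldn't be available in deployment.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.integers(0, 2, size=1000)

# LEAKY: the scaler is fit on all rows, so the held-out evaluation
# sees test-set statistics that wouldn't exist at deployment time.
leaky_X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(leaky_X, y, random_state=0)

# LEAK-FREE: fit preprocessing on the training split only, exactly
# as you'd have to when deployed on genuinely new data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
model = LogisticRegression().fit(scaler.transform(X_tr), y_tr)
print(model.score(scaler.transform(X_te), y_te))
```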
This is one technical point that often amazes younger people: for a long time, the overwhelming majority of TV broadcast was perfectly ephemeral, producing no records at all. Not just that the original copies were lost or never digitised or impossible to track down or whatever, but that nothing of the sort ever existed. The technology for capturing, broadcasting, and displaying a TV signal is so much easier than the tech for recording one that there were several decades when the only recordings of TV came from someone setting up a literal ...
What about NMR or XRF? XRF can non-destructively tell you the elemental composition of a sample, which (if the sample is pure) can often pin down the compound, and NMR spectroscopy is also non-destructive and can give you some info about chemical structure too
This is an interesting post!
I'm new to alignment research - any tips on how to prove what the inner goal actually is?
Haha! haaaaa 😢
Not least being the military implications. If you have widely available tech that lets you quickly and cheaply accelerate something car-sized to a velocity of Mach Fuck (they're meant to circle the earth in 4.2 hours, making them 2 or 3 times faster than a rifle bullet), that's certainly a dual-use technology.
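Sanity-checking that speed figure (assuming roughly the equatorial circumference of 40,000 km):

```latex
v \approx \frac{40{,}000\ \text{km}}{4.2\ \text{h}} \approx 9{,}500\ \text{km/h} \approx 2{,}600\ \text{m/s}
```

Typical rifle muzzle velocities are around 800–1,000 m/s, which is where the "2 or 3 times faster" comes from.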
Covid was a big learning experience for me, but I'd like to think about more than one example. Covid is interesting because, compared to my examples of birth control and animal-free meat, it seems like with covid humanity smashed the technical problem out of the park, but still overall failed by my lights because of the political situation.
How likely does it seem that we could get full marks on solving alignment but still fail due to politics? I tend to think of building a properly aligned AGI as a straightforward win condition, but that's not a very deeply considered view. I guess we could solve it on a whiteboard somewhere but for political reasons it doesn't get implemented in time?
I think almost all of these are things that I'd only think after I'd already noticed confusion, and most are things I'd never say in my head anyway. A little way into the list I thought "Wait, did he just ask ChatGPT for different ways to say 'I'm confused'?"
I expect there are things that pop up in my inner monologue when I'm confused about something, that I wouldn't notice, and it would be very useful to have a list of such phrases, but your list contains ~none of them.
Edit: Actually the last three are reasonable. Are they human-written?
One way of framing the difficulty with the lanternflies thing is that the question straddles the is-ought gap. It decomposes pretty cleanly into two questions: "What states of the universe are likely to result from me killing vs not killing lanternflies?" (about which Bayes' rule fully applies and is enormously useful), and "Which states of the universe do I prefer?", where the only evidence you have will come from things like introspection about your own moral intuitions and values. Your values are also a fact about the universe, because you are part of the...
I've always thought of it like this: it doesn't rely on the universe being computable, just on the universe having a computable approximation. So if the universe is computable, SI does perfectly; if it's not, SI does as well as any algorithm could hope to.
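The formal version of "as well as any algorithm could hope to" is the dominance property (stating it from memory, in the style of Li & Vitányi): for any computable probability measure mu, the Solomonoff prior M satisfies

```latex
M(x) \;\geq\; 2^{-K(\mu)}\,\mu(x) \qquad \text{for all finite strings } x,
```

where K(mu) is the length of the shortest program computing mu. So SI's predictions trail any computable predictor by at most a constant factor that depends only on that predictor's complexity, not on the data.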
A slightly surreal experience to read a post saying something I was just tweeting about, written by a username that could plausibly be mine.
Do we even need a whole new term for this? Why not "Sudden Deceptive Alignment"?
I think in some significant subset of such situations, almost everyone present is aware of the problem, so you don't always have to describe the problem yourself or explicitly propose solutions (which can seem weird from a power dynamics perspective). Sometimes just drawing the group's attention to the meta level at all, initiating a meta-discussion, is sufficient to allow the group to fix the problem.
This is good and interesting. Various things to address, but I only have time for a couple at random.
I disagree with the idea that true things necessarily have explanations that are both convincing and short. In my experience you can give a short explanation that doesn't address everyone's reasonable objections, or a very long one that does, or something in between. If you understand some specific point about cutting edge research, you should be able to properly explain it to a lay person, but by the time you're done they won't be a lay person any more! If...
Are we not already doing this? I thought we were already doing this. See for example this talk I gave in 2018
https://youtu.be/pYXy-A4siMw?t=35
I guess we can't be doing it very well though
Structured time boxes seem very suboptimal; steamrollering is easy enough for a moderator to deal with: "Ok, let's pause there for X to respond to that point."
This would make a great YouTube series
Edit: I think I'm going to make this a YouTube series
Other tokens that require modelling more than a human:
Compare with this from Meditations on Moloch:
...Imagine a country with two rules: first, every person must spend eight hours a day giving themselves strong electric shocks. Second, if anyone fails to follow a rule (including this one), or speaks out against it, or fails to enforce it, all citizens must unite to kill that person. Suppose these rules were well-enough established by tradition that everyone expected them to be enforced. So you shock yourself for eight hours a day, because you know if you don’t everyone else will kill you, because if they don’t, everyone else will kill them, and so on.
The historical trends thing is prone to standard reference class tennis. Arguments like "Every civilisation has collapsed, why would ours be special? Something will destroy civilisation, how likely is it that it's AI?". Or "Almost every species has gone extinct. Something will wipe us out, could it be AI?". Or even "Every species in the genus Homo has been wiped out, and the overwhelmingly most common cause is 'another species in the genus Homo', so probably we'll do it to ourselves. What methods do we have available?".
These don't point to AI particularly, they remove the unusual-seemingness of doom in general
Oh, I missed that! Thanks. I'll delete I guess.
I think there's also a third thing that I would call steelmanning, which is a rhetorical technique I sometimes use when faced with particularly bad arguments. If strawmanning introduces new weaknesses to an argument and then knocks it down, steelmanning fixes weaknesses in an argument and then knocks it down anyway. It looks like "this argument doesn't work because X assumption isn't true, but you could actually fix that like this so you don't need that assumption. But it still doesn't work because of Y, and even if you fix that by such and such, it all st...
The main reason I find this kind of thing concerning is that I expect this kind of model to be used as part of a larger system, for example the descendants of systems like SayCan. In that case you have the LLM generate plans in response to situations, break the plans down into smaller steps, and eventually pass the steps to a separate system that translates them to motor actions. When you're doing chain-of-thought reasoning and explicit planning, some simulacrum layers are collapsed - having the model generate the string "kill this person" can in fact lead...
Makes sense. I guess the thing to do is bring it to some bio-risk people in a less public way
It's an interesting question, but I would suggest that when you come up with an idea like this, you weigh the possible benefits of posting it on the public internet against the possible risks/costs. I don't think this one comes out positive on balance.
I don't think it's a big deal in this case, but something to think about.
It's impossible to create a fully general intelligence, i.e. one that acts intelligently in all possible universes. But we only have to make one that works in this universe, so that's not an issue.
Please answer with yes or no, then explain your thinking step by step.
Wait, why give the answer before the reasoning? You'd probably get better performance if it thinks step by step first and only gives the decision at the end.
Yes, this effectively forces the network to use backward reasoning. It's equivalent to saying "Please answer without thinking, then invent a justification."
The whole power of chains-of-thought comes from getting the network to reason before answering.
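To make the contrast concrete, a sketch of the two orderings (the `ask` helper and the exact wording are placeholders, not any particular API):

```python
question = "Is 1,000,003 prime?"  # illustrative question

def ask(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM call you're using."""
    raise NotImplementedError

# Answer-first: the yes/no token is sampled before any reasoning
# exists in context, so the "step by step" part can only be a
# post-hoc justification of a decision already made.
answer_first = (
    "Please answer with yes or no, then explain your thinking "
    "step by step.\n\n"
    f"Question: {question}"
)

# Reasoning-first: the chain of thought is in context when the
# final decision token is sampled, which is where CoT helps.
reasoning_first = (
    "Please think step by step, and finish with a single line "
    "'Answer: yes' or 'Answer: no'.\n\n"
    f"Question: {question}"
)
```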
Not a very helpful answer, but: If you don't also require computational efficiency, we can do some of those. Like, you can make AIXI variants. Is the question "Can we do this with deep learning?", or "Can we do this with deep learning or something competitive with it?"
I think they're more saying "these hypothetical scenarios are popular because they make good science fiction, not because they're likely." And I have yet to find a strong argument against the latter form of that point.
Yeah I imagine that's hard to argue against, because it's basically correct, but importantly it's also not a criticism of the ideas. If someone makes the argument "These ideas are popular, and therefore probably true", then it's a very sound criticism to point out that they may be popular for reasons other than being true. But if the argument...
The approach I often take here is to ask the person how they would persuade an amateur chess player who believes they can beat Magnus Carlsen because they've discovered a particularly good opening with which they've won every amateur game they've tried it in so far.
Them: Magnus Carlsen will still beat you, with near certainty
Me: But what is he going to do? This opening is unbeatable!
Them: He's much better at chess than you, he'll figure something out
Me: But what though? I can't think of any strategy that beats this
Them: I don't know, maybe he'll find a way...
I was thinking you had all of mine already, since they're mostly about explaining and coding. But there's a big one: When using tools, I'm tracking something like "what if the knife slips?". When I introspect, it's represented internally as a kind of cloud-like spatial 3D (4D?) probability distribution over knife locations, roughly co-extensional with "if the material suddenly gave or the knife suddenly slipped at this exact moment, what's the space of locations the blade could get to before my body noticed and brought it to a stop?". As I apply more force...
This is actually a lot of what I get out of meditation. I'm not really able to actually stop myself from thinking, and I'm not very diligent at noticing that I'm thinking and returning to the breath or whatever, but since I'm in this frame of "I'm not supposed to be thinking right now but it's ok if I do", the thoughts I do have tend to have this reflective/subtle nature to them. It's a lot like 'shower thoughts' - having unstructured time where you're not doing anything, and you're not supposed to be doing anything, and you're also not supposed to be doing nothing, is valuable for the mind. So I guess meditation is like scheduled slack for me.
I also like the way it changes how you look at the world a little bit, in a 'life has a surprising amount of detail', 'abstractions are leaky' kind of way. To go from a model of locks that's just "you cannot open this without the right key", to seeing how and why and when that model doesn't work, can be interesting. Other problems in life sometimes have this property, where you've made a simplifying assumption about what can't be done, and actually if you look more closely that thing in fact can sometimes be done, and doing it would solve the problem.
It turns out that the Litake brand, which I bought first, doesn't quite reach far enough into the socket for the threads to meet, so I had to return them and get the LOHAS brand.
I came across a problem like this before, and it was kind of a manufacturing/assembly defect. The contact at the bottom of the socket is meant to be bent up to give a bit of spring tension to connect to the bulb, but mine were basically flat. You can take a tool (what worked best for me was a multitool's can opener) and bend the tab up more so it can contact bulbs that don't screw in far enough. UNPLUG IT FIRST though
[Based on conversations with Alex Flint, and also John Wentworth and Adam Shimi]
One of the design goals of the ELK proposal is to sidestep the problem of learning human values, and settle instead for learning human concepts. A system that can answer questions about human concepts allows for schemes that let humans learn all the relevant information about proposed plans and decide about them ourselves, using our values.
So, we have some process in which we consider lots of possible scenarios and collect...
Ah ok, thanks! My main concern with that is that it goes to "https://z0gr6exqhd-dsn.algolia.net", which feels like it could be a dynamically allocated address that might change under me?
Is there a public-facing API endpoint for the Algolia search system? I'd love to be able to say to my discord bot "Hey wasn't there a lesswrong post about xyz?" and have him post a few links
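If it helps, a sketch of what hitting that endpoint directly might look like (Algolia's REST query format is standard, but the application ID is just read off the URL above, and the index name, search key, and hit fields are guesses you'd need to confirm from the site's network traffic):

```python
import requests

APP_ID = "Z0GR6EXQHD"               # read off the URL above; may change under you
SEARCH_KEY = "<public-search-key>"  # placeholder: the site's search-only key
INDEX = "test_posts"                # placeholder: actual index name is a guess

resp = requests.post(
    f"https://{APP_ID.lower()}-dsn.algolia.net/1/indexes/{INDEX}/query",
    headers={
        "X-Algolia-Application-Id": APP_ID,
        "X-Algolia-API-Key": SEARCH_KEY,
    },
    json={"query": "xyz", "hitsPerPage": 5},
)
resp.raise_for_status()
for hit in resp.json()["hits"]:
    # "title" and "slug" are assumed field names; inspect real hits to confirm
    print(hit.get("title"), hit.get("slug"))
```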
Agreed. On priors I would expect above-baseline rates of mental health issues in the community even in the total absence of any causal arrow from the community to mental health issues (and in fact even in the presence of fairly strong mental health benefits from participation in the community), simply through selection effects. Which people are going to get super interested in how minds work and how to get theirs to work better? Who's going to want to spend large amounts of time interacting with internet strangers instead of the people around them? Who's g...
Holy wow excalidraw is good, thank you! I've spent a long time being frustrated that I know exactly what I want from this kind of application and nothing does even half of it. But excalidraw is exactly the ideal program I was imagining. Several times when trying it out I thought "Ok in my ideal program, if I hit A it will switch to the arrow tool." and then it did. "Cool, I wonder what other shortcuts there are" so I hit "?" and hey a nice cheat sheet pops up. Infinite canvas, navigated how I would expect. Instant multiplayer, with visible cursors so you can gesture at things. Even a dark mode. Perfect.
This is the factor that persuaded me to try Obsidian in the first place. It's maintained by a company, so perhaps more polish than some FOSS projects, but the notes are all stored purely as simple markdown files on your hard disk, so if the company goes under the worst that happens is there are no more updates and I just keep using whatever the last version was
I suppose it makes sense that if you've done a lot of introspection, the main problems you'll have will be the kind that are very resistant to that approach, which makes this post good advice for you and people like you. But I don't think the generalisable lesson is "introspection doesn't work, do these other things" so much as "there comes a point where introspection runs out, and when you hit that, here are some ways you can continue to make progress".
Or maybe it's like a person with a persistent disease who's tried every antibiotic without much effect, ...
I think this overestimates the level of introspection most people have in their lives, and therefore underestimates the effectiveness of introspection. I think for most people, most of the time, this 'nonspecific discomfort' is almost entirely composed of specific and easily understood problems that just make the slightest effort to hide themselves, by being uncomfortable to think about.
For example, maybe you don't like your job, and that's the problem. But, you have some combination of factors like
At my grandmother's funeral I read Dirge Without Music by Edna St. Vincent Millay, which captured my feelings at the time fairly well. I think you can say things while reading a poem that you couldn't just say as yourself.
On point 12, Drone delivery: If the FAA is the reason, we should expect to see this already happening in China?
My hypothesis is that the problem is noise. Even small drones are very loud, and ones large enough to lift the larger packages would be deafening. This is something that's very hard to engineer away, since transferring large amounts of energy into the air is an unavoidable feature of a drone's mode of flight. Aircraft deal with this by being very high up, but drones have to come to your doorstep. I don't see people being ok with that level of noise on a constant, unpredictable basis.
It would certainly be a mistake to interpret your martial art's principle of "A warrior should be able to fight well even in unfavourable combat situations" as "A warrior should always immediately charge into combat, even when that would lead to an unfavourable situation", or "There's no point in trying to manoeuvre into a favourable situation"
I disagree with the insistence on "paperclip maximiser". As an emerging ASI you want to know about other ASIs you'll meet, especially grabby ones. But there are aligned grabby ASIs. You'd want an accurate prior, so I don't think this updates me on the probability of alignment, or even much on grabbiness, since it's hard to know ahead of time; that's why you'd run a simulation in the first place.
I don't take it very seriously because (1) it is a big pile of assumptions and I don't trust anthropic reasoning much at the best of times; it's very confusing and hard...