I can do more projects in parallel than I could have before. Which means that I have even more work now... The support and maintenance costs of the code itself are the same, as long as you maintain constant vigilance to make sure nothing bad gets merged. So the costs are moved from development to review. It's a lot easier to produce thousands of lines of slop which then have to be reviewed and loads of suggestions made. It's easy for bad taste to be amplified, which is a real cost that might not be noticed that much.
There are some evals which ...
Writing tests (in Python). Writing comprehensive tests for my code used to take a significant portion of my time. Probably at least 2x more than writing the actual code, and subjectively a lot more. Now it's a matter of "please write tests for this function", "now this one" etc., with an extra "no, that's ugly, make it nicer" every now and then.
Working with simple code is also a lot faster, as long as it doesn't have to process too much. So most of what I do now is make sure the file it's processing isn't more than ~500 lines of code. This has the nice sid...
I have a feeling this might be a bit more complex. So I'd say there is a vector pointing from where you are to where God wants you to be, and that if on each step you always minimize the distance, then you're getting closer to what God wants as the crow flies, but that there are a bunch of traps, detours and other such things along the way. And that if you just directly follow the vector, you'll probably end up in a bad place because you'll take a bad path.
So just following the vector would be a form of consequentialism, where a naive approach ends with you...
I can't remember where it was, but he somewhere talks about the goblin mindset being common. Orcs here aren't a specific "team"; they're people that act and think like orcs, in that they delight in destruction, havoc and greed.
There seems to be a largish group of people who are understandably worried about AI advances but have no hope of changing them, so they start panicking. This post is a good reminder that yes, we're all going to die, but since you don't know when, you have to prepare for multiple eventualities.
Shorting life is good if you can pull it off. But the same caveats apply as to shorting the market.
This is one of those mechanisms which are obvious once you notice them, and really useful to know about, but weirdly unnoticed. After reading this I started noticing a lot more of these, and (hopefully) became more open to accepting non-extreme versions of various things that I previously thought horrific.
It's sad that Duncan is Deactivated, as there are multiple posts like this one that make me a better person.
This is an enjoyable, somewhat humorous summary of a very complicated topic, spanning literally billions of years. So it naturally skips and glosses over a bunch of details, while managing to give relatively simple answers to:
I really appreciated the disclaimers at the top - every time I discuss biology, I bump into these limitations, so it's very appropriate for an intro article to explicitly state them.
Wealth not equaling happiness works both ways. It's the idea of losing wealth that's driving sleep away. In this case, the goal of buying insurance is to minimize the risk of losing wealth. The real thing that's keeping you from sleeping is not whether you have insurance or not, it's how likely it is that something bad will happen which costs more than you're comfortable losing. Having insurance is just one of the ways to minimize that - the problem is stress stemming from uncertainty, not whether you've bought an insurance policy.
The list of misunderstand...
It's probably not that large a risk though? I doubt any alien microbes would be that much of a problem to us. It seems unlikely that they would happen to use exactly the same biochemistry as we do, which makes it harder for them to infect/digest us. Chirality is just one of the multitudes of ways in which earth's biosphere is "unique". It's been a while since I was knowledgeable about any of this, but a quick o1 query seems to point in the same direction. Worth going through quarantine, just in case, of course. Though that works on earth pathogens which te...
A bit of nitpicking: the basic Open Source deal is not that you can do what you want with the product. It's that the source code should be available. The whole point of introducing open source as an idea was to allow corporations etc. to give access to their source code without worrying so much about people doing what you're describing. Deleting a "don't do this bad thing" can be prosecuted as copyright infringement (if the whole license gets removed). This is what copyleft was invented for - to subvert copyright laws by using them to force companies to p...
not that you can do what you want with the product. It's that the source code should be available.
Since the inception of the term, "Open Source" has meant more than that. You're describing "source-available software" instead.
It's not that the elite groups are good or bad, it's the desire to be in an elite group that leads to bad outcomes. Like how the root of all evil is the love of money, where money in itself isn't bad, it's the desire to possess it that is. Mainly because you start to focus on the means rather than the ends, and so end up in places you wouldn't have wanted to end up in originally.
It's about status. Being in with the cool kids etc. Elite groups aren't inherently good or bad - they're usually just those who are better at whatever is valued, or at least ...
It's not just from https://aisafety.info/. It also uses Arbital, any posts from the alignment forum, LW, EA forum that seem relevant and have a minimum karma, a bunch of arXiv papers, and a couple of other sources. This is a relatively up-to-date list of the sources used (it also contains the actual data).
Another, related Machiavellian tactic, when starting a relationship that you suspect will be highly valuable to you, is to have an argument with them as soon as possible, and then to patch things up with a (sincere!) apology. I'm not suggesting you go out of your way to start a quarrel, more that it's both a valuable data point as to how they handle problems (as most relationships will have patchy moments) and it's also a good signal to them that you value them highly enough to go through a proper apology.
They are perils of assuming that hydrogen is the future, or perils of basing your energy needs on it - i.e. the peril is not in the hydrogen, it's in making plans involving it
Somatic cells are generally evolutionary dead ends. Your toe cells aren't going to do much reproducing. Also, mitochondrial (or in general organellar) DNA is split between the actual mitochondria and the cells containing them. Biology is fun!
The argument for mitochondria is that they cause the cell environment to be more toxic (what with them being the cell's powerhouse). This in turn is going to provide a lot of selection pressure. In the same way e.g. global warming is causing a lot of selection pressure.
Runaway sexual selection has limits. This is also s...
The number of generations controls how long your experiment lasts. The longer (or more generations), the more drift you have, so the more likely for a given gene (or in this case, number of sexes) to take over. This effect will be weaker in larger populations, but unless you have an infinite population, given enough time (or generations), you'll end up with the 2 sexes (except for fungi, of course, as always). Eukaryotes first appeared 2.2 billion years ago. For comparison, the Cambrian explosion, with the first complex life, was only ~500 million years ag...
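To make the population-size point concrete, here's a toy Wright-Fisher style drift simulation (my own illustration, not from the original thread; the population sizes and starting frequency are arbitrary). A neutral variant starting at 50% frequency gets fixed or lost much faster in small populations:

```python
import random

def generations_to_absorption(pop_size, start_freq=0.5, seed=None):
    """Neutral drift: resample a variant each generation until it is
    lost (frequency 0) or fixed (frequency 1)."""
    rng = random.Random(seed)
    count = int(pop_size * start_freq)
    generations = 0
    while 0 < count < pop_size:
        p = count / pop_size
        # every slot in the next generation inherits the variant with
        # probability equal to its current frequency
        count = sum(rng.random() < p for _ in range(pop_size))
        generations += 1
    return generations

for n in (25, 100, 400):
    runs = [generations_to_absorption(n, seed=i) for i in range(30)]
    print(f"population {n}: ~{sum(runs) / len(runs):.0f} generations to fixation/loss")
```

Larger populations take proportionally longer, which is why "given enough time" is doing most of the work in the argument above.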
My understanding is pretty much what you said - when the going is good, then go asexual (e.g. strawberry runners, grasses or Asian knotweed), but also try for seeds. There are a couple of species of plants that have lost the ability for sexual reproduction, but I can't recall them right now. That being said, various plants used by humans can be pretty much exclusively reproduced asexually and so have lost the ability for sexual reproduction, specifically because they have very stable environments. The obvious examples are seedless fruits (bananas, grapes),...
This depends on the size and distances involved, but it's a good intuition. You need a mechanism to generate the pressure differentials, which can be an issue in very small organisms.
Small and sedentary organisms tend to use chemical gradients (i.e. smell), but anything bigger than a mouse (and quite a few smaller things) usually has some kind of sound signals, which are really good for quick notifications in a radius around you, regardless of the light level (so you can pretty much always use it). Also, depending on the medium, sound can travel really far - like whales which communicate with each other over thousands of miles, or elephants stomping to communicate with other elephants 20 miles away.
organisms with mitochondria always use sexual reproduction
Or at least their ancestors did. You mention Bdelloidea in a comment, which are one of the inevitable exceptions (as you mention in the introduction, which I very much appreciate, as "everything in biology has exceptions" is something I often find myself saying), but they are descended from eukaryotes which did have mitochondria.
The opposite seems true, though - true sexual reproduction seems to be exclusive to eukaryotes. So you could also say that sex makes mitochondria necessary. There seem to...
It requires you to actively manage long-lived sessions which would otherwise be handled by the site you're using. You can often get back to where you were by just logging in again, but there are many places (especially travel or official sites) where that pretty much resets the whole flow.
There are also a lot more popups, captchas and other hoops to jump through when you don't have a cookies trail.
The average user is lazy and doesn't think about these things, so the web as a whole is moving in the direction of making things easier (but not simpler). T...
I thought all of these were obvious and well known. But yes, all of these are things I was pointing at.
there is "something else" going on besides both parties just wanting to get the right answer
There are also different priors. While in general you might very well be right (or at least this post makes a lot of sense to me), I often have conversations where I'm pretty sure both my interlocutor and I are discussing things in good faith, but where we still can't agree on pretty basic things (usually about religion).
I'm assuming you're not asking about the mechanism (i.e. natural selection + mutations)? A trite answer would be something like "the same way it created wings, mating dances, exploding beetles, and parasites requiring multiple hosts".
Thinking about the meaning of life might be a spandrel, but a quick consideration of it comes up with various evo-psych style reasons why it's actually very useful, e.g. it can propel people to greatness, which can massively increase their genetic fitness. Fitness is an interesting thing, in that it can be very non-obvious. Ev...
Frankenstein is a tale about misalignment. Asimov wrote a whole book about it. Vernor Vinge also writes about it. People have been trying to get their children to behave in certain ways forever. But before LW the alignment problem was just the domain of SF.
20 years ago the alignment problem wasn't a thing, so much so that MIRI started out as an org to create a Friendly AI.
The first issue that comes to mind is having an incentive that would achieve that. The one you suggest doesn't incentivize truth - it incentivizes collaboration in order to guess the password, which would be fine in training, but then you're going into deceptive alignment land: Ajeya Cotra has a good story illustrating that.
You could, but should you? English in particular seems a bad choice. The problem with natural languages is their ambiguity. When you're providing a utility function, you want it to be as precise and robust as possible. This is actually an interesting case where folklore/mythology has known about these issues for millennia. There are all kinds of stories about genies, demons, monkey paws etc. where wishes were badly phrased or twisted. This is a story explanation of the issue.
You're adding a lot of extra assumptions here, a couple being:
The main problem of inner alignment is making an agent want to do what you want it to do (as opposed to even understanding what you want it to do). Which is an unsolved problem.
Although I'm criticizing your specific criticism, my main issue with it is that...
Drug addicts tend to be frowned upon not because they have a bad life, or even for evo-psych reasons, but because their lifestyle is bad for the rest of society, in that they tend to have various unfortunate externalities.
It can also be retaliation, which sort of makes sense - there's a reason tit-for-tat is so successful. That being said, it's generally very unfortunate that they're being introduced, on all sides. I can sort of understand why countries would want to limit people from poor countries (which is not the same as agreeing with the reasoning). Enforcing visas for short term, touristy style visits doesn't seem like a good idea however I look at it. As Zvi notes, it's about the friction.
ESTA is very much a visa (I filled it out yesterday), but under a different name and purely electronic.
Not being able to directly communicate with the others would be an issue in the beginning, but I'm guessing you would be able to use the setup to work out what the others think.
A bigger issue is that this would probably result in a very homogeneous group of minds. They're optimizing not for correct answers, but for consensus answers. It's the equivalent of studying for the exams. A fun example is the Polish equivalent of the SAT exams (this probably generalizes, but I don't know about other countries). I know quite a few people who went to study biolog...
That very much depends on how you understand "safe". Which is a large part of the differences between ethical AI people (safe means that it doesn't offend anyone, leak private information, give biased answers etc.) and the notkilleveryoneism people (safe means that it doesn't decide to remove humanity). These aren't mutually incompatible, but they require focusing on different things.
There is also safe in the PR sense, which means that no output will cause the LLM producer/supplier/whoever to get sued or in any other kind of trouble.
"Safe" is one of those funny words which everyone understands differently, but also assume that everyone else understands the same way.
A couple come to mind:
The problem with them is that it takes a bit of explaining to even understand the issue.
Think of reward not as "here's an ice-cream for being a good boy" but more as "you passed my test. I will now do neurosurgery on you to make you more likely to behave the same way in the future". The result of applying the "reward" in both cases is that you're more likely to act as desired next time. In humans it's because you expect to get something nice out of being good, in computers it's because they've been modified to do so. It's hard to directly change how humans think and behave, so you have to do it via ice-cream and beatings. While with computers you can just modify their memory.
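A toy sketch of that framing (my own illustration, not from the comment): a two-armed bandit whose "reward" is never enjoyed by the agent - it's just the signal used to rewrite the policy's parameters, via a crude REINFORCE-style update:

```python
import math
import random

random.seed(0)
prefs = [0.0, 0.0]          # the policy's parameters ("the brain")
true_payout = [0.2, 0.8]    # arm 1 actually pays off more often
lr = 0.1

def softmax(ws):
    exps = [math.exp(w) for w in ws]
    return [e / sum(exps) for e in exps]

for _ in range(3000):
    probs = softmax(prefs)
    arm = random.choices([0, 1], weights=probs)[0]
    reward = 1.0 if random.random() < true_payout[arm] else 0.0
    # The "reward" isn't handed over as a treat; it is used right here
    # to edit the parameters so this behaviour becomes more likely.
    for a in (0, 1):
        grad = (1.0 if a == arm else 0.0) - probs[a]
        prefs[a] += lr * reward * grad

print([round(p, 2) for p in softmax(prefs)])  # probability mass shifts toward arm 1
```

The human analogue needs the ice-cream as an intermediary; the program just gets its weights edited directly.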
It depends a lot on how much it values self-preservation in comparison to solving the tests (putting aside the matter of minimal computation). Self-preservation is an instrumental goal, in that you can't bring the coffee if you're dead. So it seems likely that any intelligent enough AI will value self-preservation, if only in order to make sure it can achieve its goals.
That being said, having an AI that is willing to do its task and then shut itself down (or to shut down when triggered) is an incredibly valuable thing to have - it's already finished, but y...
Depends where. Which is the whole issue. For the US average wage, yes. For non-US people, no. I agree that it's a matter of priorities. But it's also a matter of earnings minus costs. Both of which depend a lot on where you live.
A lot of people certainly could save a lot more. But usually at the cost of quality of life. You could say that they should work a job that pays more, or live somewhere where there is a lower cost of living, but both of those can be hard.
I'm not saying you're wrong that it's doable. The problem is that the feasibility is highly dependent on your circumstances (same as e.g. having an electric car or whatever), which can make it very hard for people who aren't in affluent places.
Which is a bit over 3 years of saving up every penny of the average wages where I live. If you subtract the average rent and starvation rations from that income, that goes up to 5.5 years. The first info I could find on google (from 2018) claims the average person here saves around $100 monthly, which gives you over 40 years of saving. This is only for one person. If you have multiple children, a SO, etc., that starts ballooning quickly. This is in a country which, while not yet classified as developed, is almost there (Poland).
50k is a lot for pretty much most of the world. It's the cost of a not very nice flat (i.e. middling location, or bad condition) here.
It's not that it can't come up with ways to not stamp on us. But why should it? Yes, it might only be a tiny, tiny inconvenience to leave us alone. But why even bother doing that much? It's very possible that we would be of total insignificance to an AI. Just like the ants that get destroyed at a construction site - no one even noticed them. Still doesn't turn out too good for them.
Though that's when there are massive differences of scale. When the differences are smaller, you get into inter-species competition dynamics. Which also is what the OP was point...
In your example, can it just lie? You'd have to make sure it either doesn't know the consequences of your interlocks, or for it to not care about them (this is the problem of corrigibility).
If the tests are obvious tests, your AI will probably notice that and react accordingly - if it has enough intelligence it can notice that they're hard and probably are going to be used to gauge its level, which then feeds into the whole thing about biding your time and not showing your cards until you can take over.
If they're not obvious, then you're in a security typ...
How do the hard limits of intelligence help? My current understanding is that the hard limits are likely to be something like Jupiter brains, rather than mentats. If each step is only slightly better, won't that result in a massive amount of tiny steps (even taking into account the nonlinearity of it)?
Small value drifts are a large problem, if compounded. That's sort of the premise of a whole load of fiction, where characters change their value systems after sequences of small updates. And that's just in humans - adding in alien (as in different) minds could complicate this further (or not - that's the thing about alien minds).
The foom problem is worse because of how hard it is to trust the recursion. Foomability is weakly correlated to whether the foomed entity is aligned. At least from our perspective. That's why there's the whole emphasis of getting it right on the first try.
How can you estimate how many iterations of RSA will happen?
How does interpretability align an AI? It can let you know when things are wrong, but that doesn't mean it's aligned.
QACI can potentially solve outer alignment by giving you a rigorous and well specified mathematical target to aim for. That still leaves the other issues (though they are being worked on).
To a certain extent it doesn't matter. Or rather it's a question of expected utility. If 10% of outcomes are amazing, but 60% horrible, that sort of suggests you might want to avoid that route.
Assuming it can scale with capabilities, that doesn't help you if alignment is scaling at y=2x and capabilities at y=123x (for totally random numbers, but you get the point). A quick google search found an article from 2017 claiming that there are some 300k AI researchers worldwide. I see claims around here that there are like 300 alignment researchers. Those numbers can be taken with a large grain of salt, but even so, that's 1000:1.
As to recursive improvement, nope - check out the tiling problem. Also, "only" is doing a massive amount of work in "o...
More that you get as many people in general to read the sequences, which will change their thinking so they make fewer mistakes, which in turn will make more people aware both of the real risks underlying superintelligence, but also of the plausibility and utility of AI. I wasn't around then, so this is just my interpretation of what I read post-facto, but I get the impression that people were a lot less doomish then. There was a hope that alignment was totally solvable.
The focus didn't seem to be on getting people into alignment, as much as it generally b...
Right - now I see it. I was testing it on the reactions of @Sune's comment, so it was hidden far away to the right.
All in all, nice feature though.
But there is no way to downvote a reaction? E.g. if you add the paperclip reaction, then all I can do is bump it by one and/or later remove my reaction, but there is no way to influence your one? So reactions are strictly additive?
The answer is to read the sequences (I'm not being facetious). They were written with the explicit goal of producing people with EY's rationality skills in order for them to go into producing Friendly AI (as it was called then). It provides a basis for people to realize why most approaches will by default lead to doom.
At the same time, it seems like a generally good thing for people to be as rational as possible, in order to avoid the myriad cognitive biases and problems that plague humanity's thinking, and therefore actions. My impression is that the hope was to make the world more similar to Dath Ilan.
Having studied Latin, or other such classical training, seems to be but one method of imbuing oneself with the style of writing longer, more complicated sentences. Personally I acquired the taste for such eccentricities perusing sundry works from earlier times. Romances, novels and other such frivolities from, or set in, the 18th century being the main culprits.
I suppose this sort of proves your point, in that those authors learnt to create complicated sentences from learning Latin, and the later writers copied the style, thinking either that it's fun, correct, or wanting to seem more authentic.