Following some somewhat misleading articles quoting me, I thought I’d present the top 10 myths about the AI risk thesis:
- That we’re certain AI will doom us. Certainly not. It’s very hard to be certain of anything involving a technology that doesn’t exist; we’re just claiming that the probability of AI going bad isn’t low enough that we can ignore it.
- That humanity will survive, because we’ve always survived before. Many groups of humans haven’t survived contact with more powerful intelligent agents. In the past, those agents were other humans; but they need not be. The universe does not owe us a destiny. In the future, something will survive; it need not be us.
- That uncertainty means that you’re safe. If you’re claiming that AI is impossible, or that it will take countless decades, or that it’ll be safe... you’re not being uncertain, you’re being extremely specific about the future. “No AI risk” is a claim of certainty; “possible AI risk” is where we stand.
- That Terminator robots will be involved. Please? The threat from AI comes from its potential intelligence, not from its ability to clank around slowly with an Austrian accent.
- That we’re assuming the AI is too dumb to know what we’re asking it. No. A powerful AI will know what we meant to program it to do. But why should it care? And if we could figure out how to program “care about what we meant to ask”, well, then we’d have safe AI.
- That there’s one simple trick that can solve the whole problem. Many people have proposed that one trick. Some of them could even help (see Holden’s tool AI idea). None of them reduce the risk enough for us to relax – and many of the tricks contradict each other (you can’t design an AI that’s both a tool and an agent that socialises with humans!).
- That we want to stop AI research. We don’t. Current AI research is very far from the risky areas and abilities. And it’s risk-aware AI researchers who are most likely to figure out how to make safe AI.
- That AIs will be more intelligent than us, hence more moral. It’s pretty clear that in humans, high intelligence is no guarantee of morality. Are you really willing to bet the whole future of humanity on the idea that AIs might be different? That in the billions of possible minds out there, there is none that is both dangerous and very intelligent?
- That science fiction or spiritual ideas are useful ways of understanding AI risk. Science fiction and spirituality are full of human concepts, created by humans, for humans, to communicate human ideas. They need not apply to AI at all, as these could be minds far removed from human concepts, possibly without a body, possibly with no emotions or consciousness, possibly with many new emotions and a different type of consciousness, etc... Anthropomorphising the AIs could lead us completely astray.
- That AIs have to be evil to be dangerous. The majority of the risk comes from indifferent or partially nice AIs – those that have some goal to follow, with humanity and its desires just getting in the way: using resources, trying to oppose it, or simply not being perfectly efficient for its goal.
Markets are powerful decentralized optimization engines - it is known. Liberals see the free market as a kind of optimizer run amok, a dangerous superintelligence with simple non-human values that must be checked and constrained by the government - the friendly SI. Conservatives just reverse the narrative roles.
In some domains, where the incentive structure aligns with human values, the market works well. In our current framework, the market works best for producing gadgets. It does not work so well for pricing intangible information, and most specifically it is broken when it comes to health.
We treat health as just another gadget problem: something to be solved by pills. Health is really a problem of knowledge; it is a computational prediction problem. Drugs are useful only to the extent that you can package the results of new knowledge into a pill and patent it. If you can't patent it, you can't profit from it.
So the market is constrained to solve human health by coming up with new patentable designs for mass-producible physical objects which go into human bodies. Why did we add that constraint - thou shalt solve health, but thou shalt only use pills? (Ok technically the solutions don't have to be ingestible, but that's a detail.)
The gadget model works for gadgets because we know how gadgets work - we built them, after all. The central problem with health is that we do not completely understand how the human body works - we did not build it. Thus we should be using the market to figure out how the body works - completely - and arguably we should be allocating trillions of dollars towards that problem.
The market optimizer analogy runs deeper when we consider the complexity of instilling values into a market. Lawmakers cannot program the market with goals directly, so instead they attempt to engineer desirable behavior with ever more layers of constraints. Lawmakers are deontologists.
As an example, consider the regulations on drug advertising. Big pharma is unsafe - its profit function does not encode anything like "maximize human health and happiness" (which of course is itself an oversimplification). If left to its own devices, it has strong incentives to sell subtly addictive drugs, to create elaborate hyped false advertising campaigns, etc. Thus all the deontological injunctions. I take that as a strong indicator of a poor solution - a value alignment failure.
What would healthcare look like in a world where we solved the alignment problem?
To solve the alignment problem, the market's profit function must encode long-term human health and happiness. This really is a mechanism design problem - it's not something lawmakers are even remotely trained or qualified for. A full solution is naturally beyond the scope of a little blog post, but I will sketch out the general idea.
To encode health into a market utility function, first we create financial contracts with an expected value which captures long-term health. We can accomplish this with a long-term contract that generates positive cash flow when a human is healthy, and negative when unhealthy - basically an insurance contract. There is naturally much complexity in getting those contracts right, so that they measure what we really want. But assuming that is accomplished, the next step is pretty simple - we allow those contracts to trade freely on an open market.
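The contract mechanism above can be sketched in a few lines. Everything here is illustrative: the `HealthContract` class, the payment amounts, and the discount rate are hypothetical stand-ins, assuming only the structure described above (positive cash flow while the person is healthy, negative while unhealthy, freely tradable at its expected value):

```python
from dataclasses import dataclass

@dataclass
class HealthContract:
    """Pays its holder while the covered person is healthy; costs the
    holder (funding care) while they are unhealthy."""
    healthy_payment: float    # cash inflow per healthy period
    unhealthy_penalty: float  # cash outflow per unhealthy period

def expected_value(contract, p_healthy_by_period, discount=0.97):
    """Discounted expected cash flow over the contract's horizon, given
    a per-period probability that the covered person is healthy."""
    return sum(
        (discount ** t) * (p * contract.healthy_payment
                           - (1 - p) * contract.unhealthy_penalty)
        for t, p in enumerate(p_healthy_by_period)
    )

c = HealthContract(healthy_payment=100.0, unhealthy_penalty=500.0)

# A trader who believes an intervention raises p(healthy) values the
# contract higher - so they profit by buying before intervening.
baseline = expected_value(c, [0.90] * 10)
improved = expected_value(c, [0.95] * 10)
print(improved > baseline)  # True
```

The point of the open market is exactly this gap: anyone with genuine knowledge of how to improve health outcomes can buy contracts at the consensus price and capture the difference.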
There are some interesting failure modes and considerations that are mostly beyond scope but worth briefly mentioning. This system probably needs to be asymmetric. The transfers on poor health outcomes should partially go to cover medical payments, but it may be best to have a portion of the wealth simply go to nobody/everybody - just destroyed.
In this new framework, designing and patenting new drugs can still be profitable, but it is now put on even footing with preventive medicine. More importantly, the market can now actually allocate the correct resources towards long term research.
To make all this concrete, let's use an example of a trillion dollar health question - one that our current system is especially ill-posed to solve:
What are the long-term health effects of abnormally low levels of solar radiation? What levels of sun exposure are ideal for human health?
This is a big important question, and you've probably read some of the hoopla and debate about vitamin D. Below I'll briefly summarize a general abstract theory - one that I would bet heavily on if we lived in a more rational world where such bets were possible.
In a sane world where health is solved by a proper computational market, I could make enormous - ridiculous really - amounts of money if I happened to be an early researcher who discovered the full health effects of sunlight. I would bet on my theory simply by buying up contracts for the individuals/demographics who had the most health to gain by correcting their sunlight deficiency. I would then publicize the theory and evidence, and perhaps even raise a heaping pile of money to create a strong marketing engine to help ensure that my investments - my patients - were taking the necessary actions to correct their sunlight deficiency. Naturally I would use complex machine learning models to guide the trading strategy.
Now, just as an example, here is the brief 'pitch' for sunlight.
If we go back and look across all of time, there is a mountain of evidence which more or less screams - proper sunlight is important to health. Heliotherapy has a long history.
Humans, like most mammals, and most other earth organisms in general, evolved under the sun. A priori we should expect that organisms will have some 'genetic programs' which take approximate measures of incident sunlight as an input. The serotonin -> melatonin mediated blue-light pathway is an example of one such light detecting circuit which is useful for regulating the 24 hour circadian rhythm.
The vitamin D pathway has existed since the time of algae such as the Coccolithophore. It is a multi-stage pathway that can measure solar radiation over a range of temporal frequencies. It starts with synthesis of fat-soluble cholecalciferol, which has a very long half-life measured in months.
- Cholecalciferol (HL ~ months) becomes
- 25(OH)D (HL ~ 15 days) which finally becomes
- 1,25(OH)2 D (HL ~ 15 hours)
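One way to read this cascade (my interpretation, not a claim from the pathway literature) is as a bank of exponential smoothers: each metabolite's level roughly tracks an exponentially weighted average of past sun exposure, with the averaging window set by its half-life. A toy model using the half-lives listed above:

```python
def retention(half_life_days: float) -> float:
    """Per-day retention factor for exponential decay with the given half-life."""
    return 0.5 ** (1.0 / half_life_days)

def smoothed_exposure(daily_sun, half_life_days):
    """Model a metabolite level (very roughly) as an exponentially
    weighted average of past sun exposure."""
    k = retention(half_life_days)
    level = 0.0
    for sun in daily_sun:
        level = level * k + sun * (1 - k)
    return level

# A sunny summer (180 days) followed by two dark months.
sun = [1.0] * 180 + [0.0] * 60

slow = smoothed_exposure(sun, 90.0)       # cholecalciferol: HL ~ months
mid  = smoothed_exposure(sun, 15.0)       # 25(OH)D: HL ~ 15 days
fast = smoothed_exposure(sun, 15.0 / 24)  # 1,25(OH)2D: HL ~ 15 hours

# The slow stage still "remembers" summer; the fast stage has collapsed.
print(slow > mid > fast)  # True
```

Under this reading, the three stages together give the body a crude spectrum of "how much sun, lately" across hours, weeks, and months.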
The main recognized role for this pathway in regards to human health - at least according to the current Wikipedia entry - is to enhance "the internal absorption of calcium, iron, magnesium, phosphate, and zinc". Ponder that for a moment.
Interestingly, this pathway still works as a general solar clock and radiation detector for carnivores - as they can simply eat the precomputed measurement in their diet.
So, what is a long term sunlight detector useful for? One potential application could be deciding appropriate resource allocation towards DNA repair. Every time an organism is in the sun it is accumulating potentially catastrophic DNA damage that must be repaired when the cell next divides. We should expect that genetic programs would allocate resources to DNA repair and various related activities dependent upon estimates of solar radiation.
I should point out - just in case it isn't obvious - that this general idea does not imply that cranking up the sunlight hormone to insane levels will lead to much better DNA/cellular repair. There are always tradeoffs, etc.
One other obvious use of a long term sunlight detector is to regulate general strategic metabolic decisions that depend on the seasonal clock - especially for organisms living far from the equator. During the summer when food is plentiful, the body can expect easy calories. As winter approaches calories become scarce and frugal strategies are expected.
So first off we'd expect to see a huge range of complex effects showing up as correlations between low vit D levels and various illnesses, specifically illnesses connected to DNA damage (such as cancer) and/or BMI.
Now it turns out that BMI itself is also strongly correlated with a huge range of health issues. So the first key question to focus on is the relationship between vit D and BMI. And - perhaps not surprisingly - there is pretty good evidence for such a correlation, and this has been known for a while.
Now we get into the real debate. Numerous vit D supplement intervention studies have now been run, and the results are controversial. In general the vit D experts (such as my father, who started the vit D council, and publishes some related research) say that the only studies that matter are those that supplement at doses high enough to elevate vit D levels into a 'proper' range which substitutes for sunlight - in general about 5,000 IU/day on average, depending completely on genetics and lifestyle (to the point that any one-size-fits-all recommendation is probably terrible).
The mainstream basically ignores all that and funds studies at tiny RDA doses - say 400 IU or less - and then they do meta-analysis over those studies and conclude that their big meta-analysis, unsurprisingly, doesn't show a statistically significant effect. However, these studies still show small effects. Often the meta-analysis is corrected for BMI, which of course also tends to remove any vit D effect, to the extent that low vit D/sunlight is a cause of both weight gain and a bunch of other stuff.
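The BMI-correction point is worth making precise: if low vit D causes weight gain, then BMI is a mediator, and "correcting" for it strips out part of the true effect. A toy simulation with made-up effect sizes (the coefficients below are illustrative, not estimates) shows the mechanism:

```python
import random

random.seed(0)
N = 20000

# Toy causal chain: low vit D raises BMI, and both vit D and BMI
# affect illness. Adjusting for BMI blocks the mediated path.
vitd = [random.gauss(0, 1) for _ in range(N)]
bmi  = [-0.8 * v + random.gauss(0, 1) for v in vitd]   # low vit D -> higher BMI
ill  = [-0.3 * v + 0.5 * b + random.gauss(0, 1)
        for v, b in zip(vitd, bmi)]

def ols(y, x1, x2=None):
    """Tiny OLS: coefficient on x1, optionally controlling for x2
    (via the Frisch-Waugh residualization trick)."""
    def coef(y, x):
        mx = sum(x) / len(x); my = sum(y) / len(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = sum((a - mx) ** 2 for a in x)
        return num / den
    if x2 is None:
        return coef(y, x1)
    b12 = coef(x1, x2)
    r1 = [a - b12 * b for a, b in zip(x1, x2)]
    by2 = coef(y, x2)
    ry = [a - by2 * b for a, b in zip(y, x2)]
    return coef(ry, r1)

total    = ols(ill, vitd)        # total effect: direct + via BMI (~ -0.7)
adjusted = ols(ill, vitd, bmi)   # "BMI-corrected" effect (~ -0.3)
print(abs(adjusted) < abs(total))  # True: adjustment shrinks the effect
```

The adjusted estimate isn't wrong, exactly - it answers a different question (the direct effect) than the one a supplementation trial cares about (the total effect).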
So let's look at two studies for vit D and weight loss.
First, this recent 2015 study of 400 overweight Italians (sorry the actual paper doesn't appear to be available yet) tested vit D supplementation for weight loss. The 3 groups were (0 IU/day, ~1,000 IU / day, ~3,000 IU/day). The observed average weight loss was (1 kg, 3.8 kg, 5.4 kg). I don't know if the 0 IU group received a placebo. Regardless, it looks promising.
On the other hand, this 2013 meta-analysis of 9 studies with 1651 adults total (mainly women) supposedly found no significant weight loss effect for vit D. However, the studies used between 200 IU/day and 1,100 IU/day, with most between 200 and 400 IU. Five of the studies also supplemented with calcium, and five showed weight loss (not necessarily the same five - the paper is unclear). This does not show - at all - what the study claims in its abstract.
In general, medical researchers should not be doing statistics. That is a job for the tech industry.
Now the vit D and sunlight issue is complex, and it will take much research to really work out all of what is going on. The current medical system does not appear to be handling this well - why? Because there is insufficient financial motivation.
Is Big Pharma interested in the sunlight/vit D question? Well yes - but only to the extent that they can create a patentable analogue! The various vit D analogue drugs developed or in development is evidence that Big Pharma is at least paying attention. But assuming that the sunlight hypothesis is mainly correct, there is very little profit in actually fixing the real problem.
There is probably more to sunlight than just vit D and serotonin/melatonin. Consider the interesting correlation between birth month and a number of disease conditions. Perhaps there is a little grain of truth to astrology after all.
Thus concludes my little vit D pitch.
In a more sane world I would have already bet on the general theory. In a really sane world it would have been solved well before I would expect to make any profitable trade. In that rational world you could actually trust health advertising, because you'd know that health advertisers are strongly financially motivated to convince you of things actually truly important for your health.
Instead of charging by the hour or per treatment, like a mechanic, doctors and healthcare companies should literally invest in their patients long-term health, and profit from improvements to long term outcomes. The sunlight health connection is a trillion dollar question in terms of medical value, but not in terms of exploitable profits in today's reality. In a properly constructed market, there would be enormous resources allocated to answer these questions, flowing into legions of profit motivated startups that could generate billions trading on computational health financial markets, all without selling any gadgets.
So in conclusion: the market could solve health, but only if we allowed it to and only if we setup appropriate financial mechanisms to encode the correct value function. This is the UFAI problem next door.
It’s all too easy to let a false understanding of something replace your actual understanding. Sometimes this is an oversimplification, but it can also take the form of an overcomplication. I have an illuminating story:
Years ago, when I was young and foolish, I found myself in a particular romantic relationship that would later end for epistemic reasons, when I was slightly less young and slightly less foolish. Anyway, this particular girlfriend of mine was very into healthy eating: raw, organic, home-cooked, etc. During her visits my diet would change substantially for a few days. At one point, we got in a tiny fight about something, and in a not-actually-desperate chance to placate her, I semi-jokingly offered: “I’ll go vegetarian!”
“I don’t care,” she said with a sneer.
…and she didn’t. She wasn’t a vegetarian. Duhhh... I knew that. We’d made some ground beef together the day before.
So what was I thinking? Why did I say “I’ll go vegetarian” as an attempt to appeal to her values?
(I’ll invite you to take a moment to come up with your own model of why that happened. You don't have to, but it can be helpful for evading hindsight bias of obviousness.)
Here's my take: I pattern-matched a bunch of actual preferences she had with a general "healthy-eating" cluster, and then I went and pulled out something random that felt vaguely associated. It's telling, I think, that I don't even explicitly believe that vegetarianism is healthy. But to my pattern-matcher, they go together nicely.
I'm going to call this pattern-botching.† Pattern-botching is when you pattern-match a thing "X", as following a certain model, but then implicit queries to that model return properties that aren't true about X. What makes this different from just having false beliefs is that you know the truth, but you're forgetting to use it because there's a botched model that is easier to use.
†Maybe this already has a name, but I've read a lot of stuff and it feels like a distinct concept to me.
Examples of pattern-botching
So, that's pattern-botching, in a nutshell. Now, examples! We'll start with some simple ones.
Calmness and pretending to be a zen master
In my Againstness Training video, past!me tries a bunch of things to calm down. In the pursuit of "calm", I tried things like...
- trying to imitate a zen master
- speaking really quietly and timidly
None of these are the desired state. The desired state is present, authentic, and can project well while speaking assertively.
But that would require actually being in a different state, which to my brain at the time seemed hard. So my brain constructed a pattern around the target state, and said "what's easy and looks vaguely like this?" and generated the list above. Not as a list, of course! That would be too easy. It generated each one individually as a plausible course of action, which I then tried, and which Val then called me out on.
I'm quite gregarious, extraverted, and generally unflappable by noise and social situations. Many people I know describe themselves as HSPs (Highly Sensitive Persons) or as very introverted, or as "not having a lot of spoons". These concepts are related—or perhaps not related, but at least correlated—but they're not the same. And even if these three terms did all mean the same thing, individual people would still vary in their needs and preferences.
Just this past week, I found myself talking with an HSP friend L, and noting that I didn't really know what her needs were. Like I knew that she was easily startled by loud noises and often found them painful, and that she found motion in her periphery distracting. But beyond that... yeah. So I told her this, in the context of a more general conversation about her HSPness, and I said that I'd like to learn more about her needs.
L responded positively, and suggested we talk about it at some point. I said, "Sure," then added, "though it would be helpful for me to know just this one thing: how would you feel about me asking you about a specific need in the middle of an interaction we're having?"
"I would love that!" she said.
"Great! Then I suspect our future interactions will go more smoothly," I responded. I realized what had happened was that I had conflated L's HSPness with... something else. I'm not exactly sure what, but a preference for indirect communication, perhaps? I have another friend, who is also sometimes short on spoons, who I model as finding that kind of question stressful because it would kind of put them on the spot.
I've only just recently been realizing this, so I suspect that I'm still doing a ton of this pattern-botching with people, that I haven't specifically noticed.
Of course, having clusters makes it easier to have heuristics about what people will do, without knowing them too well. A loose cluster is better than nothing. I think the issue is when we do know the person well, but we're still relying on this cluster-based model of them. It's telling that I was not actually surprised when L said that she would like it if I asked about her needs. On some level I kind of already knew it. But my botched pattern was making me doubt what I knew.
CFAR teaches a technique called "Aversion Factoring", in which you try to break down the reasons why you don't do something, and then consider each reason. In some cases, the reasons are sound reasons, so you decide not to try to force yourself to do the thing. If not, then you want to make the reasons go away. There are three types of reasons, with different approaches.
One is for when you have a legitimate issue, and you have to redesign your plan to avert that issue. The second is where the thing you're averse to is real but isn't actually bad, and you can kind of ignore it, or maybe use exposure therapy to get yourself more comfortable with it. The third is... when the outcome would be an issue, but it's not actually a necessary outcome of the thing. As in, it's a fear that's vaguely associated with the thing at hand, but the thing you're afraid of isn't real.
All of these share a structural similarity with pattern-botching, but the third one in particular is a great example. The aversion is generated from a property that the thing you're averse to doesn't actually have. Unlike a miscalibrated aversion (#2 above) it's usually pretty obvious under careful inspection that the fear itself is based on a botched model of the thing you're averse to.
Taking the training wheels off of your model
One other place this structure shows up is in the difference between what something looks like when you're learning it versus what it looks like once you've learned it. Many people learn to ride a bike while actually riding a four-wheeled vehicle: training wheels. I don't think anyone makes the mistake of thinking that the ultimate bike will have training wheels, but in other contexts it's much less obvious.
The remaining three examples look at how pattern-botching shows up in learning contexts, where people implicitly forget that they're only partway there.
Rationality as a way of thinking
CFAR runs 4-day rationality workshops, which currently are evenly split between specific techniques and how to approach things in general. Let's consider what kinds of behaviours spring to mind when someone encounters a problem and asks themselves: "what would be a rational approach to this problem?"
- someone with a really naïve model, who hasn't actually learned much about applied rationality, might pattern-match "rational" to "hyper-logical", and think "What Would Spock Do?"
- someone who is somewhat familiar with CFAR and its instructors but who still doesn't know any rationality techniques, might complete the pattern with something that they think of as being archetypal of CFAR-folk: "What Would Anna Salamon Do?"
- CFAR alumni, especially new ones, might pattern-match "rational" as "using these rationality techniques" and conclude that they need to "goal factor" or "use trigger-action plans"
- someone who gets rationality would simply apply that particular structure of thinking to their problem
In the case of a bike, we see hundreds of people biking around without training wheels, and so that becomes the obvious example from which we generalize the pattern of "bike". In other learning contexts, though, most people—including, sometimes, the people at the leading edge—are still in the early learning phases, so the training wheels are the rule, not the exception.
So people start thinking that the figurative bikes are supposed to have training wheels.
Incidentally, this can also be the grounds for strawman arguments where detractors of the thing say, "Look at these bikes [with training wheels]! How are you supposed to get anywhere on them?!"
We potentially see a similar effect with topics like Effective Altruism. It's a movement that is still in its infancy, which means that nobody has it all figured out. So when trying to answer "How do I be an effective altruist?" our pattern-matchers might pull up a bunch of examples of things that EA-identified people have been commonly observed to do.
- donating 10% of one's income to a strategically selected charity
- going to a coding bootcamp and switching careers, in order to Earn to Give
- starting a new organization to serve an unmet need, or to serve a need more efficiently
- supporting the Against Malaria Foundation
...and this generated list might be helpful for various things, but be wary of thinking that it represents what Effective Altruism is. It's possible—it's almost inevitable—that we don't actually know what the most effective interventions are yet. We will potentially never actually know, but we can expect that in the future we will generally know more than at present. Which means that the current sampling of good EA behaviours likely does not actually even cluster around the ultimate set of behaviours we might expect.
Creating a new (platform for) culture
At my intentional community in Waterloo, we're building a new culture. But that's actually a by-product: our goal isn't to build this particular culture but to build a platform on which many cultures can be built. It's like how as a company you don't just want to be building the product but rather building the company itself, or “the machine that builds the product,” as Foursquare founder Dennis Crowley puts it.
What I started to notice, though, is that we had begun to confuse the particular, transitional culture that we have at our house with either (a) the particular target culture that we're aiming for, or (b) the more abstract range of cultures that will be constructable on our platform.
So from a training wheels perspective, we might totally eradicate words like "should". I did this! It was really helpful. But once I had removed the word from my idiolect, it became unhelpful to still be treating it as being a touchy word. Then I heard my mentor use it, and I remembered that the point of removing the word wasn't to not ever use it, but to train my brain to think without a particular structure that "should" represented.
This shows up on much larger scales too. Val from CFAR was talking about a particular kind of fierceness, "hellfire", that he sees as fundamental and important, and he noted that it seemed to be incompatible with the kind of culture my group is building. I initially agreed with him, which was kind of dissonant for my brain, but then I realized that hellfire was only incompatible with our training culture, not the entire set of cultures that could ultimately be built on our platform. That is, engaging with hellfire would potentially interfere with the learning process, but it's not ultimately proscribed by our culture platform.
I think it might be helpful to repeat the definition:
Pattern-botching is when you pattern-match a thing "X" as following a certain model, but then implicit queries to that model return properties that aren't true about X. What makes this different from just having false beliefs is that you know the truth, but you're forgetting to use it because there's a botched model that is easier to use.
It's kind of like if you were doing a cargo-cult, except you knew how airplanes worked.
(Cross-posted from malcolmocean.com)
I'd like to increase the well-being of those in the justice system while simultaneously reducing crime. I'm missing something here, but I'm not sure what: based on comment feedback, this may be a worse idea than I originally thought, though I'm still not 100% sure why.
While the prison system may not constitute an existential threat, at this moment more than 2,266,000 adults are incarcerated in the US alone. I expect that being in prison greatly decreases QALYs for those incarcerated, and that further QALYs are lost to victims of crime, to family members of the incarcerated, and through the continuing effects of institutionalization and PTSD from sentences served in the current system - not to mention the brainpower and man-hours lost to any productive use.
If you haven't read Meditations on Moloch, I highly recommend it. It’s long though, so here is the executive summary: Moloch is the personification of the forces of competition under perverse incentives - a "race to the bottom" situation where all human values are discarded in an effort to survive. This can be solved with better coordination, but it is very hard to coordinate when perverse incentives also penalize the coordinators and reward dissenters. The prison industrial complex is an example of these perverse incentives: no one thinks that the current system is ideal, but incentives prevent positive change and increase absolute unhappiness.
- Politicians compete for electability. Convicts can’t vote, prisons make campaign contributions and jobs, and appearing “tough on crime” appeals to a large portion of the voter base.
- Jails compete for money: the more prisoners they house, the more they are paid and the longer they can continue to exist. This incentive is strong for public prisons and doubly strong for private prisons.
- Police compete for bonuses and promotions, both of which are given as rewards to cops who bring in and convict more criminals.
- Many of the inmates themselves are motivated to commit criminal acts by the small number of non-criminal opportunities available to them for financial success. After a conviction, this number of opportunities is narrowed further by background checks.
The incentives have come far out of line with human values. What can be done to bring incentives back in alignment with the common good?
Using a model that predicts recidivism at sixty days, one year, three years, and five years, predict the expected recidivism rate for all inmates at each individual prison, given average recidivism. Sixty days after release, if recidivism is below the predicted rate, the prison gets a small sum of money equaling 25% of the predicted cost to the state of dealing with the predicted recidivism (including lawyer fees, court fees, and jailing costs). This is repeated at one year, three years, and five years.
The statistical models would be readjusted with current data every year, so if this scheme causes recidivism to drop across the board, jails would be competing against an ever-higher standard, racing to create the most innovative and groundbreaking counseling, job-skills, and restorative methods so that they don’t lose their edge against other prisons competing for the same money. As it becomes harder and harder to edge out the competition’s advanced methods, and as the prison population is reduced, additional incentives could come from ending state contracts with the bottom 10% of prisons, or with any prison whose recidivism rate exceeds the prediction for multiple years in a row.
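Under one reading of the proposed rule, the bonus at each checkpoint is straightforward to compute. All names and numbers below are illustrative, not taken from any real pricing scheme:

```python
BONUS_SHARE = 0.25  # prison's cut of the predicted recidivism cost

def checkpoint_bonus(predicted_rate, observed_rate, releases,
                     cost_per_reoffender):
    """Bonus at one checkpoint (60 days / 1 year / 3 years / 5 years):
    25% of the predicted cost of recidivism (lawyer, court, and jailing
    costs), paid only if the prison beats the model's prediction."""
    if observed_rate >= predicted_rate:
        return 0.0
    predicted_cost = predicted_rate * releases * cost_per_reoffender
    return BONUS_SHARE * predicted_cost

# A prison releasing 400 inmates beats a predicted 30% one-year
# recidivism rate, coming in at 25%:
bonus = checkpoint_bonus(predicted_rate=0.30, observed_rate=0.25,
                         releases=400, cost_per_reoffender=40_000)
print(bonus)  # 1200000.0
```

One design choice worth flagging: paying a flat share of the *predicted* cost (rather than a share of the improvement) makes the payoff a step function, which invites gaming right at the threshold; scaling the bonus by the margin of improvement would smooth that out.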
Note that this proposal makes no policy recommendations or value judgement besides changing the incentive structure. I have opinions on the sanity of certain laws and policies and the private prison system itself, but this specific proposal does not. Ideally, this will reduce some amount of partisan bickering.
Using this added success incentive, here are the modified motivations of each of the major actors.
- Politicians compete for electability. Convicts still can’t vote, prisons make campaign contributions, and appearing “tough on crime” still appeals to a large portion of the voter base. The politician can promise a reduction in crime without making any specific policy or program recommendations, thus shielding themselves from the criticism of being soft on crime that might come from endorsing restorative justice or psychological counselling, for instance. They get to claim success for programs that other people are in charge of designing and administering. Further, they are saving 75% of the money predicted to have been spent dealing with recidivism. Prisons love getting more money for doing the same amount of work, so campaign contributions would stay stable or go up for politicians who support reduced-recidivism bonuses.
- Prisons compete for money. It costs the state a huge amount of money to house prisoners, and the net profit from housing a prisoner is small after paying for food, clothing, supervision, space, repairs, entertainment, etc. An additional 25% of that cost, with no additional expenditures, is very attractive. I predict that some amount of book-cooking will happen, but that the gains possible from cooking the books are small compared to the gains from actual improvements in the prison program. Small differences between prisons have the potential to make large differences in post-prison behavior. I expect that having an on-staff CBT psychiatrist would make a big difference; an addiction specialist would as well. A new career field is born: expert consultants who travel from private prison to private prison and recommend the changes that would reduce recidivism at the lowest possible cost.
- Police and judges retain the same incentives as before, for bonuses, prestige, and promotions. This is good for the system: if their incentives were instead aligned with those of the prisons and jails, there would be a lot of pressure to cook the books by looking the other way on criminals until after the 60-day/1-year/5-year mark. I predict that there will be a couple of scandals in which cops are found to be in league with prisons for a cut of the bonus, but that this method isn’t very profitable. For one thing, an entire police force would have to be corrupt, and for another, criminals are mobile and can commit crimes in other precincts. Police are also motivated to work in safer areas, so the general program of rewarding reduced recidivism is to their advantage.
If a model for predicting recidivism could be shown to be highly predictive, the next step would be another model predicting how much the government could save by switching to a bonus system, and what reduction in crime could be expected.
Halfway houses in Pennsylvania are already receiving non-recidivism bonuses. Is a pilot project using this pricing structure feasible?
I was recently re-reading a piece by Yvain/Scott Alexander called Epistemic Learned Helplessness. It's a very insightful post, as is typical for Scott, and I recommend giving it a read if you haven't already. In it he writes:
When I was young I used to read pseudohistory books; Immanuel Velikovsky's Ages in Chaos is a good example of the best this genre has to offer. I read it and it seemed so obviously correct, so perfect, that I could barely bring myself to bother to search out rebuttals.
And then I read the rebuttals, and they were so obviously correct, so devastating, that I couldn't believe I had ever been so dumb as to believe Velikovsky.
And then I read the rebuttals to the rebuttals, and they were so obviously correct that I felt silly for ever doubting.
And so on for several more iterations, until the labyrinth of doubt seemed inescapable.
He goes on to conclude that the skill of taking ideas seriously - often considered one of the most important traits a rationalist can have - is a dangerous one. After all, it's very easy for arguments to sound convincing even when they're not, and if you're too easily swayed by argument you can end up with some very absurd beliefs (like that Venus is a comet, say).
This post really resonated with me. I've had several experiences similar to what Scott describes, of being trapped between two debaters who both had a convincingness that exceeded my ability to discern truth. And my reaction in those situations was similar to his: eventually, after going through the endless chain of rebuttals and counter-rebuttals, changing my mind at each turn, I was forced to throw up my hands and admit that I probably wasn't going to be able to determine the truth of the matter - at least, not without spending a lot more time investigating the different claims than I was willing to. And so in many cases I ended up adopting a sort of semi-principled stance of agnosticism: unless it was a really really important question (in which case I was sort of obligated to do the hard work of investigating the matter to actually figure out the truth), I would just say I don't know when asked for my opinion.
[Non-exhaustive list of areas in which I am currently epistemically helpless: geopolitics (in particular the Israel/Palestine situation), anthropics, nutrition science, population ethics]
All of which is to say: I think Scott is basically right here; in many cases we shouldn't have too strong of an opinion on complicated matters. But when I re-read the piece recently I was struck by the fact that his whole argument could be summed up much more succinctly (albeit less charitably) as:
"Don't be gullible."
Huh. Sounds a lot more obvious that way.
Now, don't get me wrong: this is still good advice. I think people should endeavour to not be gullible if at all possible. But it makes you wonder: why did Scott feel the need to write a post denouncing gullibility? After all, most people kind of already think being gullible is bad - who exactly is he arguing against here?
Well, recall that he wrote the post in response to the notion that people should believe arguments and take ideas seriously. These sound like good, LW-approved ideas, but note that unless you're already exceptionally smart or exceptionally well-informed, believing arguments and taking ideas seriously is tantamount to...well, to being gullible. In fact, you could probably think of gullibility as a kind of extreme and pathological form of lightness; a willingness to be swept away by the winds of evidence, no matter how strong (or weak) they may be.
There seems to be some tension here. On the one hand we have an intuitive belief that gullibility is bad; that the proper response to any new claim should be skepticism. But on the other hand we also have some epistemic norms here at LW that are - well, maybe they don't endorse being gullible, but they don't exactly not endorse it either. I'd say the LW memeplex is at least mildly friendly towards the notion that one should believe conclusions that come from convincing-sounding arguments, even if they seem absurd. A core tenet of LW is that we change our mind too little, not too much, and we're certainly all in favour of lightness as a virtue.
Anyway, I thought about this tension for a while and came to the conclusion that I had probably just lost sight of my purpose. The goal of (epistemic) rationality isn't to not be gullible or not be skeptical - the goal is to form correct beliefs, full stop. Terms like gullibility and skepticism are useful to the extent that people tend to be systematically overly accepting or dismissive of new arguments - individual beliefs themselves are simply either right or wrong. So, for example, if we do studies and find out that people tend to accept new ideas too easily on average, then we can write posts explaining why we should all be less gullible, and give tips on how to accomplish this. And if on the other hand it turns out that people actually accept far too few new ideas on average, then we can start talking about how we're all much too skeptical and how we can combat that. But in the end, in terms of becoming less wrong, there's no sense in which gullibility would be intrinsically better or worse than skepticism - they're both just words we use to describe deviations from the ideal, which is accepting only true ideas and rejecting only false ones.
This answer basically wrapped the matter up to my satisfaction, and resolved the sense of tension I was feeling. But afterwards I was left with an additional interesting thought: might gullibility be, if not a desirable end point, then an easier starting point on the path to rationality?
That is: no one should aspire to be gullible, obviously. That would be aspiring towards imperfection. But if you were setting out on a journey to become more rational, and you were forced to choose between starting off too gullible or too skeptical, could gullibility be an easier initial condition?
I think it might be. It strikes me that if you start off too gullible you begin with an important skill: you already know how to change your mind. In fact, changing your mind is in some ways your default setting if you're gullible. And considering that like half the freakin sequences were devoted to learning how to actually change your mind, starting off with some practice in that department could be a very good thing.
I consider myself to be...well, maybe not more gullible than average in absolute terms - I don't get sucked into pyramid scams or send money to Nigerian princes or anything like that. But I'm probably more gullible than average for my intelligence level. There's an old discussion post I wrote a few years back that serves as a perfect demonstration of this (I won't link to it out of embarrassment, but I'm sure you could find it if you looked). And again, this isn't a good thing - to the extent that I'm overly gullible, I aspire to become less gullible (Tsuyoku Naritai!). I'm not trying to excuse any of my past behaviour. But when I look back on my still-ongoing journey towards rationality, I can see that my ability to abandon old ideas at the (relative) drop of a hat has been tremendously useful so far, and I do attribute that ability in part to years of practice at...well, at believing things that people told me, and sometimes gullibly believing things that people told me. Call it epistemic deferentiality, or something - the tacit belief that other people know better than you (especially if they're speaking confidently) and that you should listen to them. It's certainly not a character trait you're going to want to keep as a rationalist, and I'm still trying to do what I can to get rid of it - but as a starting point? You could do worse I think.
Now, I don't pretend that the above is anything more than a plausibility argument, and maybe not a strong one at that. For one I'm not sure how well this idea carves reality at its joints - after all, gullibility isn't quite the same thing as lightness, even if they're closely related. For another, if the above were true, you would probably expect LWers to be more gullible than average. But that doesn't seem quite right - while LW is admirably willing to engage with new ideas, no matter how absurd they might seem, the default attitude towards a new idea on this site is still one of intense skepticism. Post something half-baked on LW and you will be torn to shreds. Which is great, of course, and I wouldn't have it any other way - but it doesn't really sound like the behaviour of a website full of gullible people.
(Of course, on the other hand it could be that LWers really are more gullible than average, but they're just smart enough to compensate for it)
Anyway, I'm not sure what to make of this idea, but it seemed interesting and worth a discussion post at least. I'm curious to hear what people think: does any of the above ring true to you? How helpful do you think gullibility is, if it is at all? Can you be "light" without being gullible? And for the sake of collecting information: do you consider yourself to be more or less gullible than average for someone of your intelligence level?
The European Community Weekend in Berlin is over and was plain awesome.
This is not a complete report of the event, but a place where you can comment on the event, link to photos, or share whatever else you want.
I'm not the organizer of the Meetup, but I was there, and for me it was the grandest experience since last year's European Community Weekend. Meeting so many energetic, compassionate, and in general awesome people - some returning from last year, many new. Great presentations and workshops. And such a positive and open atmosphere.
Cheers to all participants!
See also the Facebook Group for the Community Event.
Nate Soares, MIRI's new Executive Director, is going to be answering questions tomorrow at the EA Forum (link). You can post your questions there now; he'll start replying Thursday, 15:00-18:00 US Pacific time.
Last week Monday, I took the reins as executive director of the Machine Intelligence Research Institute. MIRI focuses on studying technical problems of long-term AI safety. I'm happy to chat about what that means, why it's important, why we think we can make a difference now, what the open technical problems are, how we approach them, and some of my plans for the future.
I'm also happy to answer questions about my personal history and how I got here, or about personal growth and mindhacking (a subject I touch upon frequently in my blog, Minding Our Way), or about whatever else piques your curiosity.
Nate is a regular poster on LessWrong under the name So8res -- you can find stuff he's written in the past here.
Update: Question-answering is live!
Update #2: Looks like Nate's wrapping up now. Feel free to discuss the questions and answers, here or at the EA Forum.
Update #3: Here are some interesting snippets from the AMA:
Alex Altair: What are some of the most neglected sub-tasks of reducing existential risk? That is, what is no one working on which someone really, really should be?
Nate Soares: Policy work / international coordination. Figuring out how to build an aligned AI is only part of the problem. You also need to ensure that an aligned AI is built, and that’s a lot harder to do during an international arms race. (A race to the finish would be pretty bad, I think.)
I’d like to see a lot more people figuring out how to ensure global stability & coordination as we enter a time period that may be fairly dangerous.
Diego Caleiro: 1) Which are the implicit assumptions, within MIRI's research agenda, of things that "currently we have absolutely no idea of how to do that, but we are taking this assumption for the time being, and hoping that in the future either a more practical version of this idea will be feasible, or that this version will be a guiding star for practical implementations"? [...]
2) How do these assumptions diverge from how FLI, FHI, or non-MIRI people publishing on the AGI 2014 book conceive of AGI research?
3) Optional: Justify the differences in 2 and why MIRI is taking the path it is taking.
Nate Soares: 1) The things we have no idea how to do aren't the implicit assumptions in the technical agenda, they're the explicit subject headings: decision theory, logical uncertainty, Vingean reflection, corrigibility, etc :-)
We've tried to make it very clear in various papers that we're dealing with very limited toy models that capture only a small part of the problem (see, e.g., basically all of section 6 in the corrigibility paper).
Right now, we basically have a bunch of big gaps in our knowledge, and we're trying to make mathematical models that capture at least part of the actual problem -- simplifying assumptions are the norm, not the exception. All I can easily say is that common simplifying assumptions include: you have lots of computing power, there is lots of time between actions, you know the action set, you're trying to maximize a given utility function, etc. Assumptions tend to be listed in the paper where the model is described.
2) The FLI folks aren't doing any research; rather, they're administering a grant program. Most FHI folks are focused more on high-level strategic questions (What might the path to AI look like? What methods might be used to mitigate xrisk? etc.) rather than object-level AI alignment research. And remember that they look at a bunch of other X-risks as well, and that they're also thinking about policy interventions and so on. Thus, the comparison can't easily be made. (Eric Drexler's been doing some thinking about the object-level FAI questions recently, but I'll let his latest tech report fill you in on the details there. Stuart Armstrong is doing AI alignment work in the same vein as ours. Owain Evans might also be doing object-level AI alignment work, but he's new there, and I haven't spoken to him recently enough to know.)
Insofar as FHI folks would say we're making assumptions, I doubt they'd be pointing to assumptions like "UDT knows the policy set" or "assume we have lots of computing power" (which are obviously simplifying assumptions on toy models), but rather assumptions like "doing research on logical uncertainty now will actually improve our odds of having a working theory of logical uncertainty before it's needed."
3) I think most of the FHI folks & FLI folks would agree that it's important to have someone hacking away at the technical problems, but just to make the arguments more explicit, I think that there are a number of problems that it's hard to even see unless you have your "try to solve FAI" goggles on. [...]
We're still in the preformal stage, and if we can get this theory to the formal stage, I expect we may be able to get a lot more eyes on the problem, because the ever-crawling feelers of academia seem to be much better at exploring formalized problems than they are at formalizing preformal problems.
Then of course there's the heuristic of "it's fine to shout 'model uncertainty!' and hover on the sidelines, but it wasn't the armchair philosophers who did away with the epicycles, it was Kepler, who was up to his elbows in epicycle data." One of the big ways that you identify the things that need working on is by trying to solve the problem yourself. By asking how to actually build an aligned superintelligence, MIRI has generated a whole host of open technical problems, and I predict that that host will be a very valuable asset now that more and more people are turning their gaze towards AI alignment.
Nate Soares: (1) One of Peter's first (implicit) points is that AI alignment is a speculative cause. I tend to disagree.
Imagine it's 1942. The Manhattan project is well under way, Leo Szilard has shown that it's possible to get a neutron chain reaction, and physicists are hard at work figuring out how to make an atom bomb. You suggest that this might be a fine time to start working on nuclear containment, so that, once humans are done bombing the everloving breath out of each other, they can harness nuclear energy for fun and profit. In this scenario, would nuclear containment be a "speculative cause"?
There are currently thousands of person-hours and billions of dollars going towards increasing AI capabilities every year. To call AI alignment a "speculative cause" in an environment such as this one seems fairly silly to me. In what sense is it speculative to work on improving the safety of the tools that other people are currently building as fast as they can? Now, I suppose you could argue that either (a) AI will never work or (b) it will be safe by default, but both those arguments seem pretty flimsy to me.
You might argue that it's a bit weird for people to claim that the most effective place to put charitable dollars is towards some field of scientific study. Aren't charitable dollars supposed to go to starving children? Isn't the NSF supposed to handle scientific funding? And I'd like to agree, but society has kinda been dropping the ball on this one.
If we had strong reason to believe that humans could build strangelets, and society were pouring billions of dollars and thousands of human-years into making strangelets, and almost no money or effort was going towards strangelet containment, and it looked like humanity was likely to create a strangelet sometime in the next hundred years, then yeah, I'd say that "strangelet safety" would be an extremely worthy cause.
How worthy? Hard to say. I agree with Peter that it's hard to figure out how to trade off "safety of potentially-very-highly-impactful technology that is currently under furious development" against "children are dying of malaria", but the only way I know how to trade those things off is to do my best to run the numbers, and my back-of-the-envelope calculations currently say that AI alignment is further behind than the globe is poor.
Now that the EA movement is starting to look more seriously into high-impact interventions on the frontiers of science & mathematics, we're going to need to come up with more sophisticated ways to assess the impacts and tradeoffs. I agree it's hard, but I don't think throwing out everything that doesn't visibly pay off in the extremely short term is the answer.
(2) Alternatively, you could argue that MIRI's approach is unlikely to work. That's one of Peter's explicit arguments: it's very hard to find interventions that reliably affect the future far in advance, especially when there aren't hard objective metrics. I have three disagreements with Peter on this point.
First, I think he picks the wrong reference class: yes, humans have a really hard time generating big social shifts on purpose. But that doesn't necessarily mean humans have a really hard time generating math -- in fact, humans have a surprisingly good track record when it comes to generating math!
Humans actually seem to be pretty good at putting theoretical foundations underneath various fields when they try, and various people have demonstrably succeeded at this task (Church & Turing did this for computing, Shannon did this for information theory, Kolmogorov did a fair bit of this for probability theory, etc.). This suggests to me that humans are much better at producing technical progress in an unexplored field than they are at generating social outcomes in a complex economic environment. (I'd be interested in any attempt to quantitatively evaluate this claim.)
Second, I agree in general that any one individual team isn't all that likely to solve the AI alignment problem on their own. But the correct response to that isn't "stop funding AI alignment teams" -- it's "fund more AI alignment teams"! If you're trying to ensure that nuclear power can be harnessed for the betterment of humankind, and you assign low odds to any particular research group solving the containment problem, then the answer isn't "don't fund any containment groups at all," the answer is "you'd better fund a few different containment groups, then!"
Third, I object to the whole "there's no feedback" claim. Did Kolmogorov have tight feedback when he was developing an early formalization of probability theory? It seems to me like the answer is "yes" -- figuring out what was & wasn't a mathematical model of the properties he was trying to capture served as a very tight feedback loop (mathematical theorems tend to be unambiguous), and indeed, it was sufficiently good feedback that Kolmogorov was successful in putting formal foundations underneath probability theory.
Interstice: What is your AI arrival timeline?
Nate Soares: Eventually. Predicting the future is hard. My 90% confidence interval conditioned on no global catastrophes is maybe 5 to 80 years. That is to say, I don't know.
Tarn Somervell Fletcher: What are MIRI's plans for publication over the next few years, whether peer-reviewed or arxiv-style publications?
More specifically, what are the a) long-term intentions and b) short-term actual plans for the publication of workshop results, and what kind of priority does that have?
Nate Soares: Great question! The short version is, writing more & publishing more (and generally engaging with the academic mainstream more) are very high on my priority list.
Mainstream publications have historically been fairly difficult for us, as until last year, AI alignment research was seen as fairly kooky. (We've had a number of papers rejected from various journals due to the "weird AI motivation.") Going forward, it looks like that will be less of an issue.
That said, writing capability is a huge bottleneck right now. Our researchers are currently trying to (a) run workshops, (b) engage with & evaluate promising potential researchers, (c) attend conferences, (d) produce new research, (e) write it up, and (f) get it published. That's a lot of things for a three-person research team to juggle! Priority number 1 is to grow the research team (because otherwise nothing will ever be unblocked), and we're aiming to hire a few new researchers before the year is through. After that, increasing our writing output is likely the next highest priority.
Expect our writing output this year to be similar to last year's (i.e., a small handful of peer reviewed papers and a larger handful of technical reports that might make it onto the arXiv), and then hopefully we'll have more & higher quality publications starting in 2016 (the publishing pipeline isn't particularly fast).
Tor Barstad: Among recruiting new talent and having funding for new positions, what is the greatest bottleneck?
Nate Soares: Right now we’re talent-constrained, but we’re also fairly well-positioned to solve that problem over the next six months. Jessica Taylor is joining us in August. We have another researcher or two pretty far along in the pipeline, we’re running four or five more research workshops this summer, and CFAR is running a summer fellows program in July. It’s quite plausible that we’ll hire a handful of new researchers before the end of 2015, in which case our runway would start looking pretty short, and it’s pretty likely that we’ll be funding-constrained again by the end of the year.
Diego Caleiro: I see a trend in the way new EAs concerned about the far future think about where to donate money that seems dangerous, it goes:
I am an EA and care about impactfulness and neglectedness -> Existential risk dominates my considerations -> AI is the most important risk -> Donate to MIRI.
The last step frequently involves very little thought, it borders on a cached thought.
Nate Soares: Huh, that hasn't been my experience. We have a number of potential donors who ring us up and ask who in AI alignment needs money the most at the moment. (In fact, last year, we directed a number of donors to FHI, who had much more of a funding gap than MIRI did at that time.)
1. What are your plans for taking MIRI to the next level? What is the next level?
2. Now that MIRI is focused on math research (a good move) and not on outreach, there is less of a role for volunteers and supporters. With the donation from Elon Musk, some of which will presumably get to MIRI, the marginal value of small donations has gone down. How do you plan to keep your supporters engaged and donating? (The alternative, which is perhaps feasible, could be for MIRI to be an independent research institution, without a lot of public engagement, funded by a few big donors.)
Nate Soares: 1. (a) grow the research team, (b) engage more with mainstream academia. I'd also like to spend some time experimenting to figure out how to structure the research team so as to make it more effective (we have a lot of flexibility here that mainstream academic institutes don't have). Once we have the first team growing steadily and running smoothly, it's not entirely clear whether the next step will be (c.1) grow it faster or (c.2) spin up a second team inside MIRI taking a different approach to AI alignment. I'll punt that question to future-Nate.
2. So first of all, I'm not convinced that there's less of a role for supporters. If we had just ten people earning-to-give at the (amazing!) level of Ethan Dickinson, Jesse Liptrap, Mike Blume, or Alexei Andreev (note: Alexei recently stopped earning-to-give in order to found a startup), that would bring in as much money per year as the Thiel Foundation. (I think people often vastly overestimate how many people are earning-to-give to MIRI, and underestimate how useful it is: the small donors taken together make a pretty big difference!)
Furthermore, if we successfully execute on (a) above, then we're going to be burning through money quite a bit faster than before. An FLI grant (if we get one) will certainly help, but I expect it's going to be a little while before MIRI can support itself on large donations & grants alone.
We looked at the cloudy night sky and thought it would be interesting to share the ways in which, in the past, we made mistakes we would have been able to overcome, if only we had been stronger as rationalists. The experience felt valuable and humbling. So why not do some more of it on LessWrong?
An antithesis to the Bragging Thread, this is a thread to share where we made mistakes. Where we knew we could, but didn't. Where we felt we were wrong, but carried on anyway.
As with the recent group bragging thread, anything you've done wrong since the comet killed the dinosaurs is fair game - and if it happens to be a systematic mistake that curtailed your potential over a long period of time, one that others can try to learn to avoid, so much the better.
This thread is an attempt to see if there are exceptions to the cached thought that life experience cannot be learned but has to be lived. Let's test this belief together!
So I built two (fairly similar) games inspired by Zendo; they generate rules and play as sensei. The code is on GitHub, along with some more explanation. To run the games you'll need to install Python 3, and Scikit-Learn for the second game; see the readme.
All bugfixes and improvements are welcome. For instance, more rule classes or features would improve the game and be pretty easy to code. Also, if anyone has a website and wants to host this playable online (with CGI, say), that would be awesome.
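For readers unfamiliar with the game: "playing as sensei" means the program holds a secret rule and judges each "koan" you submit against it. Here is a minimal, hypothetical sketch in that spirit (not the actual code from the repo, which generates far richer rule classes):

```python
import random

# Hypothetical miniature of a Zendo-style sensei: the program secretly picks
# one rule about lists of integers and answers whether each koan follows it.
RULES = [
    ("contains an even number", lambda k: any(x % 2 == 0 for x in k)),
    ("is sorted ascending",     lambda k: k == sorted(k)),
    ("sums to more than 10",    lambda k: sum(k) > 10),
    ("has no repeated values",  lambda k: len(set(k)) == len(k)),
]

class Sensei:
    def __init__(self, seed=None):
        self.name, self.rule = random.Random(seed).choice(RULES)

    def judge(self, koan):
        """Return True if the koan has the Buddha-nature (follows the rule)."""
        return self.rule(koan)

sensei = Sensei(seed=42)
print(sensei.judge([1, 2, 3]))  # True or False, depending on the secret rule
```

The player's job is then to induce the hidden rule from judged koans - which is where the second game's use of scikit-learn (fitting a classifier to the judged examples) comes in.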
I work on a small but feisty research team whose focus is biomedical informatics, i.e. mining biomedical data - especially anonymized hospital records pooled over multiple healthcare networks. My personal interest is ultimately life extension, and my colleagues are warming up to the idea as well. But the short-term goal, which will be useful to many different research areas, is building infrastructure to massively accelerate hypothesis testing on, and modelling of, retrospective human data.
We have a job posting here (permanent, non-faculty, full-time, benefits):
If you can program, want to work in an academic research setting, and can relocate to San Antonio, TX, I invite you to apply. Thanks.
Note: The first step of the recruitment process will be a coding challenge, which will include an arithmetical or string-manipulation problem to solve in real-time using a language and developer tools of your choice.
I am very much interested in examples of non-human optimization processes producing working but surprising solutions. What is most fascinating is how they show that the human approach is often not the only one, and that much more alien solutions can be found - solutions humans are simply not capable of conceiving. It is very probable that more and more such solutions will arise, slowly making a large part of technology incomprehensible to humans.
I present the following examples, and ask you to link more in the comments:
1. Nick Bostrom describes efforts to evolve circuits for an oscillator and a frequency discriminator, which yielded very unorthodox designs:
http://homepage.ntlworld.com/r.stow1/jb/publications/Bird_CEC2002.pdf (IV. B. Oscillator Experiments; also C. and D. in that section)
2. An algorithm learns to play NES games with some eerie strategies:
https://youtu.be/qXXZLoq2zFc?t=361 (description by Vsauce)
http://hackaday.com/2013/04/14/teaching-a-computer-to-play-mario-seemingly-through-voodoo/ (more info)
3. Eurisko finding an unexpected way of winning the Traveller TCS strategy game:
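The common thread in these examples - an optimizer satisfying the letter of its objective rather than its intent - can be reproduced in miniature. Here is a hypothetical toy (not any of the systems above): we intend to evolve a sorted list of distinct values, but the fitness function only counts adjacent non-decreasing pairs, and a simple hill-climber is free to exploit the gap.

```python
import random

def fitness(koan):
    # Count adjacent pairs in non-decreasing order (our proxy for "sorted").
    return sum(koan[i] <= koan[i + 1] for i in range(len(koan) - 1))

def hill_climb(n=10, steps=3000, seed=0):
    """Mutate one element at a time, keeping any change that doesn't hurt."""
    rng = random.Random(seed)
    best = [rng.randrange(100) for _ in range(n)]
    for _ in range(steps):
        cand = list(best)
        cand[rng.randrange(n)] = rng.randrange(100)
        if fitness(cand) >= fitness(best):  # accepting ties allows neutral drift
            best = cand
    return best

# The proxy is maxed out by ten identical values: "non-decreasing" everywhere,
# yet nothing like the sorted list of distinct values we had in mind.
assert fitness([7] * 10) == 9
print(hill_climb())
```

The evolved circuits and game-playing agents above are vastly more sophisticated, but the failure mode is the same: the optimizer only ever sees the proxy, never the intent behind it.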
[Many people have been complaining about the lack of new content on LessWrong lately, so I thought I'd cross-post my latest blog post here in discussion. Feel free to critique the content as much as you like, but please do keep in mind that I wrote this for my personal blog and not with LW in mind specifically, so some parts might not be up to LW standards, whereas others might be obvious to everyone here. In other words...well, be gentle]
You know what’s scarier than having enemy soldiers at your border?
Having sleeper agents within your borders.
Enemy soldiers are malevolent, but they are at least visibly malevolent. You can see what they’re doing; you can fight back against them or set up defenses to stop them. Sleeper agents on the other hand are malevolent and invisible. They are a threat and you don’t know that they’re a threat. So when a sleeper agent decides that it’s time to wake up and smell the gunpowder, not only will you be unable to stop them, but they’ll be in a position to do far more damage than a lone soldier ever could. A single well-placed sleeper agent can take down an entire power grid, or bring a key supply route to a grinding halt, or – in the worst case – kill thousands with an act of terrorism, all without the slightest warning.
Okay, so imagine that your country is in wartime, and that a small group of vigilant citizens has uncovered an enemy sleeper cell in your city. They’ve shown you convincing evidence for the existence of the cell, and demonstrated that the cell is actively planning to commit some large-scale act of violence – perhaps not imminently, but certainly in the near-to-mid-future. Worse, the cell seems to have even more nefarious plots in the offing, possibly involving nuclear or biological weapons.
Now imagine that when you go to investigate further, you find to your surprise and frustration that no one seems to be particularly concerned about any of this. Oh sure, they acknowledge that in theory a sleeper cell could do some damage, and that the whole matter is probably worthy of further study. But by and large they just hear you out and then shrug and go about their day. And when you, alarmed, point out that this is not just a theory – that you have proof that a real sleeper cell is actually operating and making plans right now – they still remain remarkably blasé. You show them the evidence, but they either don’t find it convincing, or simply misunderstand it at a very basic level (“A wiretap? But sleeper agents use cellphones, and cellphones are wireless!”). Some people listen but dismiss the idea out of hand, claiming that sleeper cell attacks are “something that only happens in the movies”. Strangest of all, at least to your mind, are the people who acknowledge that the evidence is convincing, but say they still aren’t concerned because the cell isn’t planning to commit any acts of violence imminently, and therefore won’t be a threat for a while. In the end, all of your attempts to raise the alarm are to no avail, and you’re left feeling kind of doubly scared – scared first because you know the sleeper cell is out there, plotting some heinous act, and scared second because you know you won’t be able to convince anyone of that fact before it’s too late to do anything about it.
This is roughly how I feel about AI risk.
You see, I think artificial intelligence is probably the most significant existential threat facing humanity right now. This, to put it mildly, is something of a fringe position in most intellectual circles (although that’s becoming less and less true as time goes on), and I’ll grant that it sounds kind of absurd. But regardless of whether or not you think I’m right to be scared of AI, you can imagine how the fact that AI risk is really hard to explain would make me even more scared about it. Threats like nuclear war or an asteroid impact, while terrifying, at least have the virtue of being simple to understand – it’s not exactly hard to sell people on the notion that a 2km hunk of rock colliding with the planet might be a bad thing. As a result people are aware of these threats and take them (sort of) seriously, and various organizations are (sort of) taking steps to stop them.
AI is different, though. AI is more like the sleeper agents I described above – frighteningly invisible. The idea that AI could be a significant risk is not really on many people’s radar at the moment, and worse, it’s an idea that resists attempts to put it on more people’s radar, because it’s so bloody confusing a topic even at the best of times. Our civilization is effectively blind to this threat, and meanwhile AI research is making progress all the time. We’re on the Titanic steaming through the North Atlantic, unaware that there’s an iceberg out there with our name on it – and the captain is ordering full-speed ahead.
(That’s right, not one but two ominous metaphors. Can you see that I’m serious?)
But I’m getting ahead of myself. I should probably back up a bit and explain where I’m coming from.
Artificial intelligence has been in the news lately. In particular, various big names like Elon Musk, Bill Gates, and Stephen Hawking have all been sounding the alarm in regards to AI, describing it as the greatest threat that our species faces in the 21st century. They (and others) think it could spell the end of humanity – Musk said, “If I had to guess what our biggest existential threat is, it’s probably [AI]”, and Gates said, “I…don’t understand why some people are not concerned [about AI]”.
Of course, others are not so convinced – machine learning expert Andrew Ng said that “I don’t work on not turning AI evil today for the same reason I don’t worry about the problem of overpopulation on the planet Mars”.
In this case I happen to agree with the Musks and Gates of the world – I think AI is a tremendous threat, and that we need to focus much of our attention on it in the future. In fact I’ve thought this for several years, and I’m kind of glad that the big-name intellectuals are finally catching up.
Why do I think this? Well, that’s a complicated subject. It’s a topic I could probably spend a dozen blog posts on and still not get to the bottom of. And maybe I should spend those dozen-or-so blog posts on it at some point – it could be worth it. But for now I’m kind of left with this big inferential gap that I can’t easily cross. It would take a great deal of explaining to lay out my position in detail. So instead of talking about AI risk per se in this post, I thought I’d go off in a more meta-direction – as I so often do – and talk about philosophical differences in general. I figured if I couldn’t make the case for AI being a threat, I could at least make the case for making the case for AI being a threat.
(If you’re still confused, and still wondering what the whole deal is with this AI risk thing, you can read a not-too-terrible popular introduction to the subject here, or check out Nick Bostrom’s TED Talk on the topic. Bostrom also has a bestselling book out called Superintelligence. The one sentence summary of the problem would be: how do we get a superintelligent entity to want what we want it to want?)
(Trust me, this is much much harder than it sounds)
So: why then am I so meta-concerned about AI risk? After all, based on the previous couple paragraphs it seems like the topic actually has pretty decent awareness: there are popular internet articles and TED talks and celebrity intellectual endorsements and even bestselling books! And it’s true, there’s no doubt that a ton of progress has been made lately. But we still have a very long way to go. If you had seen the same number of online discussions about AI that I’ve seen, you might share my despair. Such discussions are filled with replies that betray a fundamental misunderstanding of the problem at a very basic level. I constantly see people saying things like “Won’t the AI just figure out what we want?”, or “If the AI gets dangerous why can’t we just unplug it?”, or “The AI can’t have free will like humans, it just follows its programming”, or “lol so you’re scared of Skynet?”, or “Why not just program it to maximize happiness?”.
Having read a lot about AI, I find these misunderstandings frustrating. This is not that unusual, of course – pretty much any complex topic is going to have people misunderstanding it, and misunderstandings often frustrate me. But there is something unique about the confusions that surround AI, and that’s the extent to which they are philosophical in nature.
Why philosophical? Well, artificial intelligence and philosophy might seem very distinct at first glance, but look closer and you’ll see that they’re connected to one another at a very deep level. Take almost any topic of interest to philosophers – free will, consciousness, epistemology, decision theory, metaethics – and you’ll find an AI researcher looking into the same questions. In fact I would go further and say that those AI researchers are usually doing a better job of approaching the questions. Daniel Dennett said that “AI makes philosophy honest”, and I think there’s a lot of truth to that idea. You can’t write fuzzy, ill-defined concepts into computer code. Thinking in terms of having to program something that actually works takes your head out of the philosophical clouds, and puts you in a mindset of actually answering questions.
All of which is well and good. But the problem with looking at philosophy through the lens of AI is that it’s a two-way street – it means that when you try to introduce someone to the concepts of AI and AI risk, they’re going to be hauling all of their philosophical baggage along with them.
And make no mistake, there’s a lot of baggage. Philosophy is a discipline that’s notorious for many things, but probably first among them is a lack of consensus (I wouldn’t be surprised if there’s not even a consensus among philosophers about how much consensus there is among philosophers). And the result of this lack of consensus has been a kind of grab-bag approach to philosophy among the general public – people see that even the experts are divided, and think that that means they can just choose whatever philosophical position they want.
Want. That’s the key word here. People treat philosophical beliefs not as things that are either true or false, but as choices – things to be selected based on their personal preferences, like picking out a new set of curtains. They say “I prefer to believe in a soul”, or “I don’t like the idea that we’re all just atoms moving around”. And why shouldn’t they say things like that? There’s no one to contradict them, no philosopher out there who can say “actually, we settled this question a while ago and here’s the answer”, because philosophy doesn’t settle things. It’s just not set up to do that. Of course, to be fair people seem to treat a lot of their non-philosophical beliefs as choices as well (which frustrates me to no end) but the problem is particularly pronounced in philosophy. And the result is that people wind up running around with a lot of bad philosophy in their heads.
(Oh, and if that last sentence bothered you, if you’d rather I said something less judgmental like “philosophy I disagree with” or “philosophy I don’t personally happen to hold”, well – the notion that there’s no such thing as bad philosophy is exactly the kind of bad philosophy I’m talking about)
(he said, only 80% seriously)
Anyway, I find this whole situation pretty concerning. Because if you had said to me that in order to convince people of the significance of the AI threat, all we had to do was explain to them some science, I would say: no problem. We can do that. Our society has gotten pretty good at explaining science; so far the Great Didactic Project has been far more successful than it had any right to be. We may not have gotten explaining science down to a science, but we’re at least making progress. I myself have been known to explain scientific concepts to people every now and again, and fancy myself not half-bad at it.
Philosophy, though? Different story. Explaining philosophy is really, really hard. It’s hard enough that when I encounter someone who has philosophical views I consider to be utterly wrong or deeply confused, I usually don’t even bother trying to explain myself – even if it’s someone I otherwise have a great deal of respect for! Instead I just disengage from the conversation. The times I’ve done otherwise, with a few notable exceptions, have only ended in frustration – there’s just too much of a gap to cross in one conversation. And up until now that hasn’t really bothered me. After all, if we’re being honest, most philosophical views that people hold aren’t that important in the grand scheme of things. People don’t really use their philosophical views to inform their actions – in fact, probably the main thing that people use philosophy for is to sound impressive at parties.
AI risk, though, has impressed upon me an urgency in regards to philosophy that I’ve never felt before. All of a sudden it’s important that everyone have sensible notions of free will or consciousness; all of a sudden I can’t let people get away with being utterly confused about metaethics.
All of a sudden, in other words, philosophy matters.
I’m not sure what to do about this. I mean, I guess I could just quit complaining, buckle down, and do the hard work of getting better at explaining philosophy. It’s difficult, sure, but it’s not infinitely difficult. I could write blog posts and talk to people at parties, and see what works and what doesn’t, and maybe gradually start changing a few people’s minds. But this would be a long and difficult process, and in the end I’d probably only be able to affect – what, a few dozen people? A hundred?
And it would be frustrating. Arguments about philosophy are so hard precisely because the questions being debated are foundational. Philosophical beliefs form the bedrock upon which all other beliefs are built; they are the premises from which all arguments start. As such it’s hard enough to even notice that they’re there, let alone begin to question them. And when you do notice them, they often seem too self-evident to be worth stating.
Take math, for example – do you think the number 5 exists, as a number?
Well, guess what – some philosophers debate this!
It’s actually surprisingly hard to find an uncontroversial position in philosophy. Pretty much everything is debated. And of course this usually doesn’t matter – you don’t need philosophy to fill out a tax return or drive the kids to school, after all. But when you hold some foundational beliefs that seem self-evident, and you’re in a discussion with someone else who holds different foundational beliefs, which they also think are self-evident, problems start to arise. Philosophical debates usually consist of little more than two people talking past one another, with each wondering how the other could be so stupid as to not understand the sheer obviousness of what they’re saying. And the annoying thing is, both participants are correct – in their own framework, their positions probably are obvious. The problem is, we don’t all share the same framework, and in a setting like that frustration is the default, not the exception.
This is not to say that all efforts to discuss philosophy are doomed, of course. People do sometimes have productive philosophical discussions, and the odd person even manages to change their mind, occasionally. But to do this takes a lot of effort. And when I say a lot of effort, I mean a lot of effort. To make progress philosophically you have to be willing to adopt a kind of extreme epistemic humility, where your intuitions count for very little. In fact, far from treating your intuitions as unquestionable givens, as most people do, you need to be treating them as things to be carefully examined and scrutinized with acute skepticism and even wariness. Your reaction to someone having a differing intuition from you should not be “I’m right and they’re wrong”, but rather “Huh, where does my intuition come from? Is it just a featureless feeling or can I break it down further and explain it to other people? Does it accord with my other intuitions? Why does person X have a different intuition, anyway?” And most importantly, you should be asking “Do I endorse or reject this intuition?”. In fact, you could probably say that the whole history of philosophy has been little more than an attempt by people to attain reflective equilibrium among their different intuitions – which of course can’t happen without the willingness to discard certain intuitions along the way when they conflict with others.
I guess what I’m trying to say is: when you’re discussing philosophy with someone and you have a disagreement, your foremost goal should be to try to find out exactly where your intuitions differ. And once you identify that, from there the immediate next step should be to zoom in on your intuitions – to figure out the source and content of the intuition as much as possible. Intuitions aren’t blank structureless feelings, as much as it might seem like they are. With enough introspection intuitions can be explicated and elucidated upon, and described in some detail. They can even be passed on to other people, assuming at least some kind of basic common epistemological framework, which I do think all humans share (yes, even objective-reality-denying postmodernists).
Anyway, this whole concept of zooming in on intuitions seems like an important one to me, and one that hasn’t been emphasized enough in the intellectual circles I travel in. When someone doesn’t agree with some basic foundational belief that you have, you can’t just throw up your hands in despair – you have to persevere and figure out why they don’t agree. And this takes effort, which most people aren’t willing to expend when they already see their debate opponent as someone who’s being willfully stupid anyway. But – needless to say – no one thinks of their positions as being a result of willful stupidity. Pretty much everyone holds beliefs that seem obvious within the framework of their own worldview. So if you want to change someone’s mind with respect to some philosophical question or another, you’re going to have to dig deep and engage with their worldview. And this is a difficult thing to do.
Hence, the philosophical quagmire that we find our society to be in.
It strikes me that improving our ability to explain and discuss philosophy amongst one another should be of paramount importance to most intellectually serious people. This applies to AI risk, of course, but also to many everyday topics that we all discuss: feminism, geopolitics, environmentalism, what have you – pretty much everything we talk about grounds out to philosophy eventually, if you go deep enough or meta enough. And to the extent that we can’t discuss philosophy productively right now, we can’t make progress on many of these important issues.
I think philosophers should – to some extent – be ashamed of the state of their field right now. When you compare philosophy to science it’s clear that science has made great strides in explaining the contents of its findings to the general public, whereas philosophy has not. Philosophers seem to treat their field as being almost inconsequential, as if whatever they conclude at some level won’t matter. But this clearly isn’t true – we need vastly improved discussion norms when it comes to philosophy, and we need far greater effort on the part of philosophers when it comes to explaining philosophy, and we need these things right now. Regardless of what you think about AI, the 21st century will clearly be fraught with difficult philosophical problems – from genetic engineering to the ethical treatment of animals to the problem of what to do with global poverty, it’s obvious that we will soon need philosophical answers, not just philosophical questions. Improvements in technology mean improvements in capability, and that means that things which were once merely thought experiments will be lifted into the realm of real experiments.
I think the problem that humanity faces in the 21st century is an unprecedented one. We’re faced with the task of actually solving philosophy, not just doing philosophy. And if I’m right about AI, then we have exactly one try to get it right. If we don’t, well...
Well, then the fate of humanity may literally hang in the balance.
June 2nd, 42 After Fall
Somewhere in the Colorado Mountains
They first caught sight of the man walking a few miles from the compound. At least it looked like a man. Faded jeans, white t-shirt, light jacket, rucksack. White skin, light brown hair. No obvious disabilities. No logos.
They kept him under surveillance as he approached. In other times they might have shot him on sight, but not now. They were painfully aware of the bounds of sustainable genetic diversity, so instead they drove over in a battered van, rifles loaded, industrial earmuffs in place. Once he was on his knees, they sent Javid the Unhearing over to bind and gag him, then bundled him into the van. No reason to risk exposure.
Javid had not always been deaf, but it was an honor. Some must sacrifice for the good of the others, and he was proud to defend the Sanctum at Rogers Ford.
Once back at the complex, they moved the man to a sound-proofed holding room and unbound him. An ancient PC sat on the desk, marked “Imp Association”. The people did not know who the Imp Association were, but they were grateful for it. Perhaps it was a gift from Olson. Praise be to Olson.
With little else to do, the man sat down and read the instructions on the screen. A series of words showed, and he was commanded to select left or right based on various different criteria. It was very confusing.
In a different room, watchers huddled around a tiny screen, looking at a series of numbers.
REP/DEM 0.0012 0.39 0.003
Good. That was a very good start.
FEM/MRA -0.0082 0.28 -0.029
SJW/NRX 0.0065 0.54 0.012
Eventually they passed the lines the catechism denoted “purge with fire and never speak thereof”, on to those merely marked as “highly dangerous”.
KO/PEP 0.1781 0.6 0.297
Not as good, but still within the prescribed tolerances. They would run the supplemental.
T_JCB/T_EWD -0.0008 1.2 -0.001
The test continued for some time, until eventually the cleric intoned, “The Trial by Fish is complete. He has passed the Snedecor Fish.” The people nodded as if they understood, then proceeded to the next stage.
This was more dangerous. This required a sacrifice.
She was young – just 15 years old. Fresh faced with long blond hair tied back, Sophia had a cute smile: she was perfect for the duty. Her family were told it was an honor to have their daughter selected.
Sophia entered the room, trepidation in her head, a smile on her face. Casually, she offered him a drink, “Hey, sorry you have to go through all this testin’. You must be hot! Would you like a co cuh?” Her relaxed intonation disguised the fact that these words were the prescribed words, passed down through generations, memorized and cherished as a ward against evil. He accepted the bottle of dark liquid and drank, before tossing the recyclable container in the bin.
In the other room, a box marked ‘ECO’ was ticked off.
“Oh, I’m sorry! I made a mistake – that’s pep-see. I’m so sorry!” she gushed in apology. He assured her it was fine.
In the other room, the cleric satisfied himself that the loyalty brand was burning at zero.
She moved on to the next prescribed question, with the ordained level of casualness, “Say, I know this is a silly question, but do you ever get a song stuck in your head?”
“You know, like you just can’t stop singing it to yourself? Yeah?” Of course, she had no idea what this was like. She was alive.
“Ummm, sorry, no.”
She turned and left the room, relief filling her eyes.
After three more days of testing, the man was allowed into the compound. Despite the ravages of an evolution with a generational frequency a hundred times that of humanity, he had somehow preserved himself. He was clean of viral memetic payload. He was alive.
Cross-posted on my blog
I was stunned to read the accounts quoted below. They're claiming that the notion of morality - in the sense of there being a special category of things that you should or should not do for the sake of the things themselves being inherently right or wrong - might not only be a recent invention, but also an incoherent one. Even when I had read debates about e.g. moral realism, I had always understood even the moral irrealists as acknowledging that there are genuine moral attitudes that are fundamentally ingrained in people. But I hadn't run into a position claiming that it was actually possible for whole cultures to simply not have a concept of morality in the first place.
I'm amazed that I haven't heard these claims discussed more. If they're accurate, then they seem to me to provide a strong argument for both deontology and consequentialism - at least as they're usually understood here - to be not even wrong. Just rationalizations of concepts that got their origin from Judeo-Christian laws and which people held onto because they didn't know of any other way of thinking.
As for morally, we must observe at once – again following Anscombe – that Plato and Aristotle, having no word for “moral,” could not even form a phrase equivalent to “morally right.” The Greek êthikê aretê means “excellence of character,” not “moral virtue”; Cicero's virtus moralis, from which the English phrase descends directly, is simply the Latin for êthikê aretê. This is not the lexical fallacy; it is not just that the word ‘moral’ was missing. The whole idea of a special category called “the moral” was missing. Strictly speaking, the Aristotelian phrase ta êthika is simply a generalizing substantive formed on êthê, “characteristic behaviors,” just as the Ciceronian moralia is formed on mores. To be fully correct – admittedly it would be a bit cumbersome – we should talk not of Aristotle's Nicomachean Ethics but of his Studies-of-our-characteristic-behaviors Edited-by-Nicomachus.

Plato and Aristotle were interested – especially Plato – in the question how the more stringent demands of a good disposition like justice or temperance or courage could be reasonable demands, demands that it made sense to obey even at extreme cost. It never occurred to them, as it naturally does to moderns, to suggest that these demands were to be obeyed simply because they were demands of a special, magically compulsive sort: moral demands.

Their answer was always that, to show that we have reason to obey the strong demands that can emerge from our good dispositions, we must show that what they demand is in some way a necessary means to or part of human well-being (eudaimonia). If it must be classified under the misconceived modern distinction between “the moral” and “the prudential,” this answer clearly falls into the prudential category.
When modern readers who have been brought up on our moral/prudential distinction see Plato's and Aristotle's insistence on rooting the reasons that the virtues give us in the notion of well-being, they regularly classify both as “moral egoists.” But that is a misapplication to them of a distinction that they were right not to recognize.

When we turn from the Greeks to Kant and the classical utilitarians, we may doubt whether they shared the modern interest in finding a neat definition of the “morally right” any more than Plato or Aristotle did. Kant proposed, at most, a necessary (not necessary and sufficient) condition on rationally permissible (not morally right) action for an individual agent – and had even greater than his usual difficulty expressing this condition at all pithily. The utilitarians often were more interested in jurisprudence than in individual action, and where they addressed the latter – as J. S. Mill often does, but Bentham usually does not – tended, in the interests of long-term utility, to stick remarkably close to the deliverances of that version of “common-sense morality” that was recognized by high-minded Victorian liberals like themselves. When Kant and the utilitarians disagreed, it was not about the question “What are the necessary and sufficient conditions of morally right action?” They weren't even asking that question.
The terms "should" or "ought" or "needs" relate to good and bad: e.g. machinery needs oil, or should or ought to be oiled, in that running without oil is bad for it, or it runs badly without oil. According to this conception, of course, "should" and "ought" are not used in a special "moral" sense when one says that a man should not bilk. (In Aristotle's sense of the term "moral" [...], they are being used in connection with a moral subject-matter: namely that of human passions and (non-technical) actions.) But they have now acquired a special so-called "moral" sense — i.e. a sense in which they imply some absolute verdict (like one of guilty/not guilty on a man) on what is described in the "ought" sentences used in certain types of context: not merely the contexts that Aristotle would call "moral" — passions and actions — but also some of the contexts that he would call "intellectual."

The ordinary (and quite indispensable) terms "should," "needs," "ought," "must" — acquired this special sense by being equated in the relevant contexts with "is obliged," or "is bound," or "is required to," in the sense in which one can be obliged or bound by law, or something can be required by law.

How did this come about? The answer is in history: between Aristotle and us came Christianity, with its law conception of ethics. For Christianity derived its ethical notions from the Torah. [...]

In consequence of the dominance of Christianity for many centuries, the concepts of being bound, permitted, or excused became deeply embedded in our language and thought. The Greek word "ἁμαρτάνειν," the aptest to be turned to that use, acquired the sense "sin," from having meant "mistake," "missing the mark," "going wrong." The Latin peccatum, which roughly corresponded to ἁμάρτημα, was even apter for the sense "sin," because it was already associated with "culpa" — "guilt" — a juridical notion.
The blanket term "illicit," "unlawful," meaning much the same as our blanket term "wrong," explains itself. It is interesting that Aristotle did not have such a blanket term. He has blanket terms for wickedness — "villain," "scoundrel"; but of course a man is not a villain or a scoundrel by the performance of one bad action, or a few bad actions. And he has terms like "disgraceful," "impious"; and specific terms signifying defect of the relevant virtue, like "unjust"; but no term corresponding to "illicit." The extension of this term (i.e. the range of its application) could be indicated in his terminology only by a quite lengthy sentence: that is "illicit" which, whether it is a thought or a consented-to passion or an action or an omission in thought or action, is something contrary to one of the virtues the lack of which shows a man to be bad qua man. That formulation would yield a concept co-extensive with the concept "illicit."

To have a law conception of ethics is to hold that what is needed for conformity with the virtues, failure in which is the mark of being bad qua man (and not merely, say, qua craftsman or logician) — that what is needed for this, is required by divine law. Naturally it is not possible to have such a conception unless you believe in God as a law-giver; like Jews, Stoics, and Christians. But if such a conception is dominant for many centuries, and then is given up, it is a natural result that the concepts of "obligation," of being bound or required as by a law, should remain though they had lost their root; and if the word "ought" has become invested in certain contexts with the sense of "obligation," it too will remain to be spoken with a special emphasis and special feeling in these contexts.

It is as if the notion "criminal" were to remain when criminal law and criminal courts had been abolished and forgotten.
A Hume discovering this situation might conclude that there was a special sentiment, expressed by "criminal," which alone gave the word its sense. So Hume discovered the situation in which the notion "obligation" survived, and the notion "ought" was invested with that peculiar force, having which it is said to be used in a "moral" sense, but in which the belief in divine law had long since been abandoned: for it was substantially given up among Protestants at the time of the Reformation.

The situation, if I am right, was the interesting one of the survival of a concept outside the framework of thought that made it a really intelligible one.
Epistemic status: speculating about things I'm not familiar with; hoping to be educated in the comments. This post is a question, not an answer.
ETA: this comment thread seems to be leading towards the best answer so far.
There's a question I've seen many times, most recently in Scott Alexander's recent links thread. This latest variant goes like this:
Old question “why does evolution allow homosexuality to exist when it decreases reproduction?” seems to have been solved, at least in fruit flies: the female relatives of gayer fruit flies have more children. Same thing appears to be true in humans. Unclear if lesbianism has a similar aetiology.
Obligate male homosexuality greatly harms reproductive fitness. And so, the argument goes, there must be some other selection pressure, one great enough to overcome the drastic effect of not having any children. The comments on that post list several other proposed answers, all of them suggesting a tradeoff vs. a benefit elsewhere: for instance, that it pays to have some proportion of gay men who invest their resources in their nieces and nephews instead of their own children.
But how do we know if this is a valid question - if the situation really needs to be explained at all?
Cross Posted at the EA Forum
At Event Horizon (a Rationalist/Effective Altruist house in Berkeley) my roommates yesterday were worried about Slate Star Codex. Their worries also apply to the Effective Altruism Forum, so I'll extend them.
Lesswrong was for many years the gravitational center for young rationalists worldwide, and it permits posting by new users, so there was a strong incentive for good new ideas to emerge there.
With the rise of Slate Star Codex, the incentive for new users to post content on Lesswrong went down. Posting at Slate Star Codex is not open, so potentially great bloggers are not incentivized to develop their own ideas there, only to comment on the ones already posted.
The Effective Altruism forum doesn't have that particular problem. It is however more constrained in terms of what can be posted there. It is after all supposed to be about Effective Altruism.
We thus have three different strong attractors for the large community of people who enjoy reading blog posts online and are nearby in idea space.
(EDIT: By possible solutions I merely mean "these are some bad solutions I came up with in 5 minutes, and the reason I'm posting them here is that if I post bad solutions, other people will be incentivized to post better solutions.")
If Slate Star Codex became an open blog like Lesswrong, more people would consider transitioning from passive lurkers to actual posters.
If the Effective Altruism Forum got as many readers as Lesswrong, there could be two gravity centers at the same time.
If the moderation and self selection of Main was changed into something that attracts those who have been on LW for a long time, and discussion was changed to something like Newcomers discussion, LW could go back to being the main space, with a two tier system (maybe one modulated by karma as well).
In the past there was Overcoming Bias, and Lesswrong in part became a stronger attractor because it was more open. Eventually lesswrongers migrated from Main to Discussion, and from there to Slate Star Codex, 80k blog, Effective Altruism forum, back to Overcoming Bias, and Wait But Why.
It is possible that Lesswrong had simply exhausted its capacity.
It is possible that a new higher tier league was needed to keep post quality high.
I suggest two things should be preserved:
Interesting content being created by those with more experience and knowledge who have interacted in this memespace for longer (part of why Slate Star Codex is powerful), and
The opportunity (and total absence of trivial inconveniences) for new people to try creating their own new posts.
If these two properties are kept, there is a lot of value to be gained by everyone.
The Status Quo:
I feel like we are living in a very suboptimal blogosphere. On LW, Discussion is more read than Main, which means what is being promoted to Main is not attractive to the people who are actually reading Lesswrong. The top tier of actually-read posting is dominated by one individual (a great one, but still), disincentivizing high quality posts by other high quality people. The EA Forum has high quality posts that go unread because it isn't the center of attention.
Summary: Utilitarianism is often ill-defined by supporters and critics alike, preference utilitarianism even more so. I briefly examine some of the axes of utilitarianism common to all popular forms, then look at some axes unique but essential to preference utilitarianism, which seem to have received little to no discussion – at least not this side of a paywall. This way I hope to clarify future discussions between hedonistic and preference utilitarians and perhaps to clarify things for their critics too, though I’m aiming the discussion primarily at utilitarians and utilitarian-sympathisers.
I like this essay particularly for the way it breaks down different forms of utilitarianism along various axes, which have rarely been discussed on LW.
For utilitarianism in general:
Many of these axes are well discussed, pertinent to almost any form of utilitarianism, and at least reasonably well understood, and I don’t propose to discuss them here beyond highlighting their salience. These include but probably aren’t restricted to the following:
- What is utility? (for the sake of easy reference, I’ll give each axis a simple title – for this, the utility axis); eg happiness, fulfilled preferences, beauty, information(PDF)
- How drastically are we trying to adjust it?, aka what if any is the criterion for ‘right’ness? (sufficiency axis); eg satisficing, maximising, scalar
- How do we balance tradeoffs between positive and negative utility? (weighting axis); eg, negative, negative-leaning, positive (as in fully discounting negative utility – I don’t think anyone actually holds this), ‘middling’ ie ‘normal’ (often called positive, but it would benefit from a distinct adjective)
- What’s our primary mentality toward it? (mentality axis); eg act, rule, two-level, global
- How do we deal with changing populations? (population axis); eg average, total
- To what extent do we discount future utility? (discounting axis); eg zero discount, >0 discount
- How do we pinpoint the net zero utility point? (balancing axis); eg Tännsjö’s test, experience tradeoffs
- What is a utilon? (utilon axis)  – I don’t know of any examples of serious discussion on this (other than generic dismissals of the question), but it’s ultimately a question utilitarians will need to answer if they wish to formalise their system.
For preference utilitarianism in particular:
Here then, are the six most salient dependent axes of preference utilitarianism, ie those that describe what could count as utility for PUs. I’ll refer to the poles on each axis as (axis)0 and (axis)1, where any intermediate view will be (axis)X. We can then formally refer to subtypes, and also exclude them, eg ~(F0)R1PU, or ~(F0 v R1)PU etc, or represent a range, eg C0..XPU.
How do we process misinformed preferences? (information axis F)
(F0 no adjustment / F1 adjust to what it would have been had the person been fully informed / FX somewhere in between)
How do we process irrational preferences? (rationality axis R)
(R0 no adjustment / R1 adjust to what it would have been had the person been fully rational / RX somewhere in between)
How do we process malformed preferences? (malformation axes M)
(M0 Ignore them / MF1 adjust to fully informed / MFR1 adjust to fully informed and rational (shorthand for MF1R1) / MFxRx adjust to somewhere in between)
How long is a preference relevant? (duration axis D)
(D0 During its expression only / DF1 During and future / DPF1 During, future and past (shorthand for DP1F1) / DPxFx Somewhere in between)
What constitutes a preference? (constitution axis C)
(C0 Phenomenal experience only / C1 Behaviour only / CX A combination of the two)
What resolves a preference? (resolution axis S)
(S0 Phenomenal experience only / S1 External circumstances only / SX A combination of the two)
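The labeling scheme above is essentially a six-dimensional coordinate system. As a minimal sketch (my own illustration, not from the essay — the function name and representation are assumptions, and compound shorthands like MFR1 are flattened to a single position per axis), a subtype could be encoded like this:

```python
# The six dependent axes of preference utilitarianism from the essay:
# information, rationality, malformation, duration, constitution, resolution.
PU_AXES = ("F", "R", "M", "D", "C", "S")

def pu_subtype(**positions):
    """Return an axis->position mapping; 0.0 and 1.0 are the poles, any
    intermediate value is an 'X' position. Every axis must be specified,
    since a well-defined PU must sit somewhere on every axis."""
    if set(positions) != set(PU_AXES):
        raise ValueError("a well-defined PU must take a position on every axis")
    if not all(0.0 <= v <= 1.0 for v in positions.values()):
        raise ValueError("positions must lie between the poles 0 and 1")
    return positions

# e.g. F1 R1 M0 D0 C0 S0: fully informed and rational idealisation,
# malformed preferences ignored, preferences counted only while expressed.
subtype = pu_subtype(F=1.0, R=1.0, M=0.0, D=0.0, C=0.0, S=0.0)
```

The point of the validation is the essay's own claim: leaving any axis unspecified leaves the form of preference utilitarianism ill-defined.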
What distinguishes these categorisations is that each category, as far as I can perceive, has no analogous axis within hedonistic utilitarianism. In other words to a hedonistic utilitarian, such axes would either be meaningless, or have only one logical answer. But any well-defined and consistent form of preference utilitarianism must sit at some point on every one of these axes.
See the article for more detailed discussion about each of the axes of preference utilitarianism, and more.
In a post recently someone mentioned that there was a list of "Top 15" posters by karma. That inspired me to send all of them this note:
I am messaging you (now) because you are one of the 15 top contributors of the past 30 days of LW.
I was wondering if you do any time tracking; or if you have any idea how much time you spend on LW. (i.e. rescuetime)
I have made the choice to spend more of my time engaging with LW and am wondering how much you (and your other top peers) spend. And also why?
Maybe you want to rate each of these out of 10; the reasons you partake in LW discussions:
- Make the world better (raising the sanity waterline etc)
- Fun (spend my spare time here)
- Friends (here because my Real-Life is here; and so I come to hang with my friends - or my internet friends hang out here)
- Gather rationality (maybe you still gather rationality from LW; maybe you have gathered most of what you can and now are creating your own craft)
- here for new ideas (LW being a good place to share new ideas)
- here to advertise an idea (promoting a line of thinking from elsewhere - could be anything from; more Effective Altruism; to this book)
- Here to create my own craft. (from the craft and the community)
- other? (as many other's as you like)
In addition do you think people (others) should participate more or less in the ongoing conversation? (or stay about as much as they are?) And would you give any particular message to others?
Do you feel like your time spent is effective?
I wonder if this small sample, once gathered, will yield anything useful. With your permission I would like to publish your responses (anonymised for your protection), whether something interesting comes out or nothing does (publishing the null result).
Please add as many comments as you can :).
I'd also like to thank you for being a part of keeping the community active. I find it a good garden with many friends.
(Disclaimer: I have no affiliation to rescue time I just like their tracking system)
As of the time of this post, I have received 10 replies. I waited an extra week or two, but no further replies came in after about the first 2-3 days.
The funny thing about asking for something is that people don't always answer in the way that you want them to. (Mostly I blame myself and the way I asked; but I think it's quite funny really that several replies did not include a rating out of 10.)
1. Make the world better.
As was pointed out to me by one of the responses: "Mostly this is low because of ambiguity over 'the world'". Responses were 0, 2, 6, y, y; I assume the other 5 were 0.
2. Fun
Several replies said this was the most productive time sink they could think of. Replies were y, y, y, 10, 10, 8. One person said they used LW as procrastination; one said, "it's a reasonably interesting way of killing time".
3. Friends
Answers: y, 0, "4 - more like acquaintances", 5. Some people mentioned local meetups, but also that they don't interact online with those people. I suppose if you are here for friends you are kind of doing it wrong; "here to not get yelled at and to understand things" is a more accurate description. "I treat LW like a social club and a general place to hang out"
4. Gather rationality
y; y, but doubt it; 5 - a bit; 5. I expected most of the top posters to have already achieved a level of rationality where they would be searching elsewhere for it. I assume the others would be 0 or close to it.
5. new ideas
3, 4, 8 (assuming the other 7 were 0). I guess the top posters don't think innovation happens here. Which is interesting, because I think it does.
6. advertise ideas
1, "6 - generally" (assuming the other 8 were 0). I was concerned that the active members might be pushing an agenda or something. It's entirely possible, but seems not to be the case.
7. create craft
8, 7. I would have thought someone motivated to be increasing the craft of rationality would be here for that purpose. I guess not.
" I'm not sure to what extent I'm creating my own craft, but it's a good question. At the very least I'm acquiring a better ability to ask whether something makes sense. "
Two people mentioned that this is a place of quality, or high thinking, they are here for the reasonableness or lack of unreasonableness of the participants.
effective time: most responses to this were in the range of, "better than other rubbish on the internet", and "least bad time sink I can think of".
more or less posts: two people suggested more; one suggested less but of higher quality. They all understand the predicament of the thing.
Time tracking: several people track, and others estimate between 30mins and 3hours a day.
Bearing in mind that the top posting positions are selected on multiple factors, including whether or not people have time, not just their effectiveness or their *most rational* status, I don't believe this selection of people have said anything much helpful, other than the following:
" LW gives you the opportunity to share your ideas with a large number of smart people who will help you discard or sharpen them without you having to go to the trouble of maintaining a personal blog. A good post has the opportunity to deliver a lot of value to some very smart and altruistically motivated people. Becoming a respected LW contributor takes a lot of intelligence, thought, research, writing skill, and hard work. Many try but few succeed. "
" I am pretty much an internet discussion board addict"
"Suppose LW is just a forum where a bunch of smart people hang out and talk about whatever interests them, which is frequently potentially-important (effective altruism, AI safety) or intellectually interesting (decision theory, maths of AI safety) or practically useful (akrasia, best-textbooks-on-X threads). That seems to me like enough to be valuable"
"My karma comes from thousands of comments, not from meaningful articles."
"I feel there is a power law distribution to LW contributor value with some people like Eliezer, Yvain, and lukeprog making many high-quality posts. So I think the most important thing is for people like that to get “discovered”. It may take some leveling up for them to get to that point though, and encouragement for them to spend time cranking out lots of posts that are high-quality."
" I feel like if we gave top LW posters more recognition that could incentivize the production of more great content, and becoming a top poster with a high % upvoted genuinely seems like a strong challenge and an indicator of superior potential, if achieved, to me."
"As a rule, though, I do not believe that LW has much to do with refining human rationality."
"I think that written reflection is a useful way to engage with new ideas. LW provides a venue to discuss ideas with smart people who care about published evidence."
"I post on Less Wrong primarily because I'm a forum-poster, and this is the forum most relevant to my interests. If I stopped finding forum-posting satisfying, or found a more relevant forum, I'd probably move there and only rarely check LW."
"I think people should participate more. I view LW as a forum and not as a library."
In summary: what I think I have gathered.
The top posters don't think they or Lesswrong are effective at changing the world; however, this is a nice place to hang out. I don't know what an effective place would look like, but it is almost certainly not this place. I don't see LW as being worth quitting or shutting down without a *better* alternative. As a place striving to propagate rationality, its record is debatable. As a garden of healthy discussions, where reasonable people remember that their opposing factions are also reasonable people with different ideas, this place deserves a medal. If only we could hone the essence of reasonableness and share it around. I feel that might be the value of Lesswrong.
LW is a system built of people staying "while it's good"; as soon as it is no longer as nice a garden, they will be gone.
I hope this helps someone else as well as me.
In light of the discussions about improving this place; I hope this helps contribute to the discussion.
Welcome to the Rationality reading group. This week we discuss Part C: Noticing Confusion (pp. 81-114). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
C. Noticing Confusion
20. Focus Your Uncertainty - If you are paid for post-hoc analysis, you might like theories that "explain" all possible outcomes equally well, without focusing uncertainty. But what if you don't know the outcome yet, and you need to have an explanation ready in 100 minutes? Then you want to spend most of your time on excuses for the outcomes that you anticipate most, so you still need a theory that focuses your uncertainty.
21. What Is Evidence? - Evidence is an event connected by a chain of causes and effects to whatever it is you want to learn about. It also has to be an event that is more likely if reality is one way, than if reality is another. If a belief is not formed this way, it cannot be trusted.
22. Scientific Evidence, Legal Evidence, Rational Evidence - For good social reasons, we require legal and scientific evidence to be more than just rational evidence. Hearsay is rational evidence, but as legal evidence it would invite abuse. Scientific evidence must be public and reproducible by everyone, because we want a pool of especially reliable beliefs. Thus, Science is about reproducible conditions, not the history of any one experiment.
23. How Much Evidence Does It Take? - If you are considering one hypothesis out of many, or that hypothesis is more implausible than others, or you wish to know with greater confidence, you will need more evidence. Ignoring this rule will cause you to jump to a belief without enough evidence, and thus be wrong.
24. Einstein's Arrogance - Albert Einstein, when asked what he would do if an experiment disproved his theory of general relativity, responded with "I would feel sorry for [the experimenter]. The theory is correct." While this may sound like arrogance, Einstein doesn't look nearly as bad from a Bayesian perspective. In order to even consider the hypothesis of general relativity in the first place, he would have needed a large amount of Bayesian evidence.
25. Occam's Razor - To a human, Thor feels like a simpler explanation for lightning than Maxwell's equations, but that is because we don't see the full complexity of an intelligent mind. However, if you try to write a computer program to simulate Thor and a computer program to simulate Maxwell's equations, one will be much easier to accomplish. This is how the complexity of a hypothesis is measured in the formalisms of Occam's Razor.
26. Your Strength as a Rationalist - A hypothesis that forbids nothing permits everything, and thus fails to constrain anticipation. Your strength as a rationalist is your ability to be more confused by fiction than by reality. If you are equally good at explaining any outcome, you have zero knowledge.
27. Absence of Evidence Is Evidence of Absence - Absence of proof is not proof of absence. But absence of evidence is always evidence of absence. According to the probability calculus, if P(H|E) > P(H) (observing E would be evidence for hypothesis H), then P(H|~E) < P(H) (absence of E is evidence against H). The absence of an observation may be strong evidence or very weak evidence of absence, but it is always evidence.
28. Conservation of Expected Evidence - If you are about to make an observation, then the expected value of your posterior probability must equal your current prior probability. On average, you must expect to be exactly as confident as when you started out. If you are a true Bayesian, you cannot seek evidence to confirm your theory, because you do not expect any evidence to do that. You can only seek evidence to test your theory.
29. Hindsight Devalues Science - Hindsight bias leads us to systematically undervalue scientific findings, because we find it too easy to retrofit them into our models of the world. This unfairly devalues the contributions of researchers. Worse, it prevents us from noticing when we are seeing evidence that doesn't fit what we really would have expected. We need to make a conscious effort to be shocked enough.
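The evidence-counting in item 23 can be made quantitative in the odds form of Bayes' theorem. The numbers below are illustrative, not from the original posts:

```python
import math

# A hypothesis singled out of a million equally plausible alternatives starts
# at odds of 1 : 999,999 -- about 20 bits short of even odds.
prior_odds = 1 / 999_999
bits_needed = math.log2(999_999)   # ~19.93 bits of evidence required

# Suppose each independent observation is 10x as likely if the hypothesis is
# true: a likelihood ratio of 10, worth log2(10) ~ 3.32 bits apiece.
odds, observations = prior_odds, 0
while odds <= 1.0:                 # until the hypothesis is more likely than not
    odds *= 10.0
    observations += 1

print(observations)                # 6 such observations just clear the bar
```

A more implausible hypothesis, or a demand for higher confidence, raises the bit count and hence the number of observations needed, which is exactly the rule the summary states.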
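Items 27 and 28 are both short consequences of the probability calculus, and a quick numerical check makes them concrete. The probabilities here are illustrative numbers of my own choosing:

```python
# Illustrative model: P(H) = 0.3, P(E|H) = 0.8, P(E|~H) = 0.2.
p_h, p_e_h, p_e_nh = 0.3, 0.8, 0.2

p_e = p_e_h * p_h + p_e_nh * (1 - p_h)           # total probability of E
post_if_e = p_e_h * p_h / p_e                    # P(H|E), by Bayes' theorem
post_if_not_e = (1 - p_e_h) * p_h / (1 - p_e)    # P(H|~E)

# 27. Absence of evidence is evidence of absence: if observing E would
# raise P(H), then observing ~E must lower it.
assert post_if_e > p_h and post_if_not_e < p_h

# 28. Conservation of expected evidence: the probability-weighted average
# of the two possible posteriors is exactly the prior.
expected_posterior = p_e * post_if_e + (1 - p_e) * post_if_not_e
assert abs(expected_posterior - p_h) < 1e-12
```

Any choice of probabilities satisfies both assertions, which is the point: you cannot arrange to expect your confidence to rise regardless of what you observe.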
This has been a collection of notes on the assigned sequence for this week. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Part D: Mysterious Answers (pp. 117-191). The discussion will go live on Wednesday, 1 July 2015 at or around 6 p.m. PDT, right here on the discussion forum of LessWrong.
A lot of Less Wrong frames becoming more rational in terms of correcting biases. When Scott Alexander is asked how he does it, he doesn't seem to actually have an answer-- if I recall correctly, he's just said that all he's got in his life is his job, his girlfriend, and his blog, which doesn't begin to explain his remarkable flow of interesting posts.
It's a good thing to have fewer and weaker biases, but it's better if de-biasing can be applied to new ideas which have a good chance of paying off.
Is there LW material about creativity that I'm not remembering? Any recommendations for information about creativity elsewhere? I'm especially interested in material which you've seen help you or other people become more creative, as distinct from material which has been plausible and/or fun to read.
Edited to add: While I think this is a generally applicable topic, I also have a local interest. I'm fond of LW, but it seems to be in a doldrums, and at least part of the cause is a lack of interesting new material.
Let’s do an experiment in “reverse crowdfunding”. I will pay 50 USD to anyone who can suggest a new way of X-risk prevention that is not already mentioned in this roadmap. Post your ideas as a comment to this post.
Should more than one person have the same idea, the award will be made to the person who posted it first.
The idea must be endorsed by me and included in the roadmap in order to qualify, and it must be new, rational and consistent with modern scientific data.
I may include you as a co-author in the roadmap (if you agree).
The roadmap is distributed under an open GNU license.
Payment will be made by PayPal. The total amount of the prize fund is 500 USD (total 10 prizes).
The competition is open until the end of 2015.
The roadmap can be downloaded as a pdf from:
UPDATE: I have uploaded a new version of the map with changes marked in blue.
I recently wrote an essay about AI risk, targeted at other academics:
I think it might be interesting to some of you, so I am sharing it here. I would appreciate any feedback any of you have, especially from others who do AI / machine learning research.
Here's some blue-sky speculation about one way alien sapients' civilizations might develop differently from our own. Alternatively, you can consider it conworlding. Content note: torture, slavery.
Looking at human history, after we developed electronics, we painstakingly constructed machines that can perform general computation, then built software which approximates the workings of the human brain. For instance, we nowadays use in-silico reinforcement learning and neural nets to solve various "messy" problems like computer vision and robot movement. In the future, we might scan brains and then emulate them on computers. This all seems like a very circuitous course of development - those algorithms have existed all around us for thousands of years in the form of brains. Putting them on computers requires an extra layer of technology.
Suppose that some alien species's biology is a lot more robust than ours - their homeostatic systems are less failure-prone than our own, due to some difference in their environment or evolutionary history. They don't get brain-damaged just from holding their breath for a couple minutes, and open wounds don't easily get infected.
Now suppose that after they invent agriculture but before they invent electronics, they study biology and neurology. Combined with their robust biology, this leads to a world where things that are electronic in our world are instead controlled by vat-grown brains. For instance, a car-building robot could be constructed by growing a brain in a vat, hooking it up to some actuators and sensors, then dosing it with happy chemicals when it correctly builds a car, and stimulating its nociceptors when it makes mistakes. This rewarding and punishing can be done by other lab-grown "overseer" brains trained specifically for the job, which are in turn manually rewarded at the end of the day by their owner for the total number of cars successfully built. Custom-trained brains could control chemical plants, traffic lights, surveillance systems, etc. The actuators and sensors could be either biologically-based (lab-grown eyes, muscles, etc., powered with liquefied food) or powered with combustion engines or steam engines or even wound springs.
Obviously this is a pretty terrible world, because many minds will live lives with very little meaning, never grasping the big picture, at the mercy of unmerciful human or vat-brain overseers, without even the option of suicide. Brains wouldn't necessarily be designed or drugged to be happy overall - maybe a brain in pain does its job better. I don't think the owners would be very concerned about the ethical problems - look at how humans treat other animals.
You can see this technology as a sort of slavery set up so that slaves are cheap and unsympathetic and powerless. They won't run away, because: they'll want to perform their duties, for the drugs; many won't be able to survive without owners to top up their food drips; they could be developed or drugged to ensure docility; you could prevent them from even getting the idea of emancipation, by not giving them the necessary sensors; perhaps you could even set things up so the overseer brains can read the thoughts of their charges directly, and punish bad thoughts. This world has many parallels to Hanson's brain emulation world.
Is this scenario at all likely? Would these civilizations develop biological superintelligent AGI, or would they only be able to create superintelligent AGI once they develop electronic computing?
Specialization of labor is one of the primary reasons why the modern world is as wealthy as it is. Conceptual labor is a special case of this general trend; one of the primary reasons why we seem much more knowledgeable than the past is that the producers of knowledge are as specialized as producers of consumer goods, and concepts are as varied and precise as consumer goods. The rest of this post expands on that idea and discusses implications for mining wisdom from the past, as well as communicating in the present.
Some of you may already have seen this story, since it's several days old, but MIT Technology Review seems to have the best explanation of what happened: Why and How Baidu Cheated an Artificial Intelligence Test
Such is the success of deep learning on this particular test that even a small advantage could make a difference. Baidu had reported it achieved an error rate of only 4.58 percent, beating the previous best of 4.82 percent, reported by Google in March. In fact, some experts have noted that the small margins of victory in the race to get better on this particular test make it increasingly meaningless. That Baidu and others continue to trumpet their results all the same - and may even be willing to break the rules - suggest that being the best at machine learning matters to them very much indeed.
(In case you didn't know, Baidu is the largest search engine in China, with a market cap of $72B, compared to Google's $370B.)
The problem I see here is that the mainstream AI / machine learning community measures progress mainly by this kind of contest. Researchers are incentivized to use whatever method they can find or invent to gain a few tenths of a percent in some contest, which allows them to claim progress at an AI task and publish a paper. Even as the AI safety / control / Friendliness field gets more attention and funding, it seems easy to foresee a future where mainstream AI researchers continue to ignore such work because it does not contribute to the tenths of a percent that they are seeking but instead can only hinder their efforts. What can be done to change this?
During the 1990s, a significant stream of research existed around how people process information, which combined very different streams in psychology and related areas with explicit predictive models about how actual cognitive processes differ from the theoretical ideal. This is not only the literature by Kahneman and Tversky about cognitive biases, but includes research about memory, perception, scope insensitivity, and other areas. The rationalist community is very familiar with some of this literature, but fewer are familiar with a masterful synthesis produced by Richards Heuer for the intelligence community in 1999, which was intended to start combating these problems, a goal we share. I’m hoping to put together a stream of posts based on that work, potentially expanding on it, or giving my own spin – but encourage reading the book itself (PDF) as well. (This essay is based on Chapter 3.)
This will hopefully be my first set of posts, so feedback is especially welcome, both to help me refine the ideas, and to refine my presentation.
Entropy, Pressure, and Metaphorical States of Matter
Eliezer recommends updating incrementally but has noted that it’s hard. The central point, that it is hard to do so, is one that some in our community have experienced and explicated, but there is deep theory I’ll attempt to outline, via an analogy, that I think explains how and why it occurs. The problem is that we are quick to form opinions and build models, because humans are good at pattern finding. We are less quick to discard them, due to limited mental energy. This is especially true when the pressure of evidence doesn’t shift overwhelmingly and suddenly.
I’ll attempt to answer the question of how this happens by stretching a metaphor, creating an intuition pump for thinking about how our minds might perform some of their reasoning under uncertainty.
Heuer cites a stream of research about perception, noting that “once an observer has formed an image – that is, once he or she has developed a mind set or expectation concerning the phenomenon being observed – this conditions future perceptions of that phenomenon.” This seems to follow standard Bayesian practice, but in fact, as Eliezer noted, people fail to update. The following set of images, which Heuer reproduced from a 1976 book by Robert Jervis, shows exactly this point:
Looking at each picture, starting on the left, and moving to the right, you see a face slowly change. At what point does the face no longer seem to appear? (Try it!) For me, it’s at about the seventh image that it’s clear it morphed into a sitting, bowed figure. But what if you start at the other end? The woman is still clearly there long past the point where we see a face, starting in the other direction. What’s going on?
We seem to attach too strongly to our first approach, decision, or idea. Specifically, our decision seems to “freeze” once it gets to one place, and needs much more evidence to start moving again. This has an analogue in physics, the notion of freezing, which I think is more important than it first appears.
To analyze this, I’ll drop into some basic probability theory and physics before (hopefully) we come out on the other side with a conceptually clearer picture. First, I will note that our cognitive architecture has some way of representing theories, and implicitly assigns probabilities to various working theories. This is some sort of probability distribution over possible theories. Any probability distribution has a quantity called entropy, which is simply the probability of each state, multiplied by the logarithm of that probability, summed over all the states. (Each probability is less than 1, so the logarithm is negative, but we traditionally flip the sign so entropy is a positive quantity.)
Need an example? Sure! I have two dice, and each can land on any number, 1-6. I’m assuming they are fair, so each face has probability 1/6, and the logarithm (base 2) of 1/6 is about -2.585. There are 6 states, so the total is 6 * (1/6) * 2.585 = 2.585. (With two dice, I have 36 possible combinations, each with probability 1/36; log(1/36) is -5.17, so the entropy is 5.17. You may have noticed that I doubled the number of dice involved and the entropy doubled – there is exactly twice as much that can happen, but the entropy per die is unchanged.) If I only have 2 possible states, such as a fair coin, each has probability of 1/2, and log(1/2) = -1, so the entropy is (1/2 * 1) + (1/2 * 1) = 1. An unfair coin, with a 1/4 probability of tails and a 3/4 probability of heads, has an entropy of about 0.81. Of course, this isn’t the lowest possible entropy – a trick coin with heads on both sides has only 1 state, with entropy 0. So unfair coins have lower entropy – because we know more about what will happen.
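The numbers above can be checked with a few lines of Python – a minimal sketch of the entropy formula just described:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: -sum of p * log2(p), skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))      # one fair die: log2(6), about 2.585
print(entropy([1/36] * 36))    # two fair dice: exactly double, about 5.17
print(entropy([0.5, 0.5]))     # fair coin: 1.0
print(entropy([0.25, 0.75]))   # unfair coin: about 0.81
print(entropy([1.0]))          # trick coin, one state: 0.0
```

The `if p > 0` guard matters: a state with probability zero contributes nothing to entropy, but `log2(0)` would raise an error.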
Freezing, Melting, and Ideal Gases under Pressure
In physics, this has a deeply related concept, also called entropy, which in the form we see on a macroscopic scale is just temperature. If you remember your high school science classes, temperature is a description of how much molecules move around. I’m not a physicist, and this is a bit simplified, but the entropy of an object is how uncertain we are about its state – gases expand to fill their container, and the molecules could be anywhere, so they have higher entropy than a liquid, which stays in its container; the liquid in turn has higher entropy than a solid, where the molecules don’t move much; and the solid still has higher entropy than a crystal, where the molecules are locked into place.
This partially lends intuition to the third law of thermodynamics; “the entropy of a perfect crystal at absolute zero is exactly equal to zero.” In our terms above, it’s like that trick coin – we know exactly where everything is in the crystal, and it doesn’t move. Interestingly, a perfect crystal at 0 Kelvin cannot exist in nature; no finite process can reduce entropy to that point; like infinite certainty, infinitely exact crystals are impossible to arrive at, unless you started there. So far, we could build a clever analogy between temperature and certainty, telling us that “you’re getting warmer” means exactly the opposite of what it does in common usage – but I think this is misleading.
In fact, I think that information in our analogy doesn’t change the temperature; instead, it reduces the volume! In the analogy, gases can become liquids or solids either by lowering temperature, or by increasing pressure – which is what evidence does. Specifically, evidence constrains the set of possibilities, squeezing our hypothesis space. The phrase “weight of evidence” is now metaphorically correct; it will actually constrain the space by applying pressure.
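A toy sketch of this “evidence as pressure” picture, reusing the two-dice example: conditioning on a piece of evidence doesn’t cool the system, it shrinks the space of states the probability can occupy, and the entropy drops accordingly.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two dice, no evidence: 36 equally likely states, entropy log2(36) ~ 5.17 bits.
outcomes = list(product(range(1, 7), repeat=2))
print(entropy([1 / len(outcomes)] * len(outcomes)))

# The evidence "the sum is 7" squeezes the hypothesis space down to the
# 6 consistent states; the distribution is still uniform over what remains,
# but the volume is smaller: entropy log2(6) ~ 2.585 bits.
consistent = [o for o in outcomes if sum(o) == 7]
print(entropy([1 / len(consistent)] * len(consistent)))
```

The probability over the surviving states is just as “flat” as before; what changed is the size of the container.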
I think that by analogy, this explains the phenomenon we see with perception. While we are uncertain, information increases pressure, and our conceptual estimate can condense from uncertain to a relatively contained liquid state – not because we have less probability to distribute, but because the evidence has constrained the space over which we can distribute it. Alternatively, we can settle on a lower energy state on our own, unassisted by evidence. If our minds too-quickly settle on a theory or idea, the gas settles into a corner of the available space, and if we fail to apply enough energy to the problem, our unchallenged opinion can even freeze into place.
Our mental models can be liquid, gaseous, or frozen in place – either by our prior certainty, our lack of energy required to update, or an immense amount of evidential pressure. When we look at those faces, our minds settle into a model quickly, and once there, fail to apply enough energy to re-evaporate our decision until the pressure of the new pictures is relatively immense. If we had started at picture 3 or 6, we could much more easily update away from our estimates; our minds are less willing to let the cloud settle into a puddle of probable answers, much less freeze into place. We can easily see the face, or the woman, moving between just these two images.
When we begin to search for a mental model to describe some phenomena, whether it be patterns of black and white on a page, or the way in which our actions will affect a friend, I am suggesting we settle into a puddle of likely options, and when not actively investing energy into the question, we are likely to freeze into a specific model.
What does this approach retrodict, or better, forbid?
Because our minds have limited energy, the process of maintaining an uncertain stance should be difficult. This seems to be borne out by personal and anecdotal experience, but I have not yet searched the academic literature to find more specific validation.
We should have more trouble updating away from a current model than we do arriving at that new model from the beginning. As Heuer puts it, “Initial exposure to… ambiguous stimuli interferes with accurate perception even after more and better information becomes available.” He notes that this was shown in Bruner and Potter, 1964, “Interference in Visual Recognition,” and that “the early but incorrect impression tends to persist because the amount of information necessary to invalidate a hypothesis is considerably greater than the amount of information required to make an initial interpretation.”
Potential avenues of further thought
The pressure of evidence should reduce the mental effort needed to switch models, but “leaky” hypothesis sets, where a class of model is not initially considered, should allow the pressure to metaphorically escape into the larger hypothesis space.
There is a potential for making this analogy more exact, by discussing entropy in graphical models (Bayesian networks), especially in sets of graphical models with explicit uncertainty attached. I don’t have the math needed for this, but would be interested in hearing from those who do.
I would like to thank both Abram Demski (interviewed here) for providing a link to this material, and my dissertation chair, Paul Davis, who was able to point me towards how this has been used and extended in the intelligence community.
 There is a follow up book and training course which is also available, but I’ve not read it nor seen it online. A shorter version of the main points of that book is here (PDF), which I have only glanced through.
 We have a LW Post, Entropy and Temperature that explains this a bit. For a different, simplified explanation, try this: http://www.nmsea.org/Curriculum/Primer/what_is_entropy.htm. For a slightly more complete version, try Wikipedia: https://en.wikipedia.org/wiki/Introduction_to_entropy. For a much more complete version, learn the math, talk to a PhD in thermodynamics, then read some textbooks yourself.
I think this, of course, because I was initially heading in that direction. Instead, I realized there was a better analogy – but if we wanted to develop it in this direction instead, I’d point to the phase-change energy required to change phases of matter as a reason that our minds have trouble moving from their initial estimate. On reflection, I think this should be a small part of the story, if not entirely negligible.
Two years ago, when I travelled to Belize, I came up with an idea for a self-sufficient scalable program to address poverty. I saw how many people in Belize were unemployed or getting paid very low wages, but I also saw how skilled they were, a result of English being the national language and a mandatory education system. Many Belizeans have a secondary/high school education, and the vast majority have at least a primary school education and can speak English. I thought to myself, "it's too bad I can't teleport Belizeans to the United States, because in the U.S., they would automatically be able to earn many times the minimum wage in Belize with their existing skills."
But I knew there was a way to do it: "virtual teleportation." My solution involves using computer and internet access in conjunction with training and support to connect the poor with high paying international work opportunities. My tests of virtual employment using Upwork and Amazon Mechanical Turk show that it is possible to earn at least twice the minimum wage in Belize, around $3 an hour, working with flexible hours. This solution is scalable because there is a consistent international demand for very low wage work (relatively speaking) from competent English speakers, and in other countries around the world like South Africa, many people matching that description can be found and lifted out of poverty. The solution could become self-sufficient because running a virtual employment enterprise or taking a cut of the earnings of members using virtual employment services (as bad as that sounds) can generate enough income to pay for the relatively low costs of monthly internet and the one-time costs of technology upgrades.
If you have any feedback, comments, suggestions, I would love to hear about it in the comments section. Feedback on my fundraising campaign at igg.me/at/bvep is also greatly appreciated.
If you are thinking about supporting the idea, my team and I need your help to make this possible. It may be difficult for us to reach our goal, but every contribution greatly increases the chances our fundraiser and our program will be successful, especially in the early stages. All donations are tax-deductible, and if you’d like, you can also opt-in for perks like flash drives and t-shirts. It only takes a moment to make a great difference: igg.me/at/bvep.
Thank you for reading!
In a previous post, I considered the issue of an AI that behaved "nicely" given some set of circumstances, and whether we could extend that behaviour to the general situation, without knowing what "nice" really meant.
The original inspiration for this idea came from the idea of extending the nice behaviour of "reduced impact AI" to situations where they didn't necessarily have a reduced impact. But it turned out to be connected with "spirit of the law" ideas, and to be of potentially general interest.
Essentially, the problem is this: if we have an AI that will behave "nicely" (since this could be a reduced impact AI, I don't use the term "friendly", which denotes a more proactive agent) given X, how can we extend its "niceness" to ¬X? Obviously if we can specify what "niceness" is, we could just require the AI to do so given ¬X. Therefore let us assume that we don't have a good definition of "niceness", we just know that the AI has that given X.
To make the problem clearer, I chose an X that would be undeniably public and have a large (but not overwhelming) impact: the death of the US president on a 1st of April. The public nature of this event prevents using approaches like thermodynamic miracles to define counterfactuals.
I'll be presenting a solution in a subsequent post. In the meantime, to help better understand the issue, here's a list of failed solutions:
First Failure: maybe there's no problem
Initially, it wasn't clear there was a problem. Could we just expect niceness to extend naturally? But consider the following situation: assume the vice president is a warmonger, who will start a nuclear war if ever they get into power (but is otherwise harmless).
Now assume the nice AI has the conditional action criterion: "if the vice president ever becomes president, launch a coup". This is safe; it can be extended to the ¬X situation in the way we want.
However, conditioning on X, that criterion is equivalent to "launch a coup on the 2nd of April". And if the AI has that criterion, then extending it to ¬X is highly unsafe. This illustrates that there is a real problem here - the coup example is just one of a myriad of potential issues that could arise, and we can't predict them all.
Second failure: don't condition on X
Maybe the trick could be preventing the AI from conditioning on X (for anything)? If the AI itself can't tell the difference between X and ¬X, wouldn't its nice behaviour extend?
But this doesn't help. The AI could merely condition on things tightly coupled with X. Like news reports of the president's demise, or a conjunction of other events almost impossible under ¬X.
Third failure: disbelieve the truth
In this design, the AI simply assumes that X happens, whatever the evidence. Therefore it would stay nice, whatever happened.
But this results in an AI whose beliefs are strongly decoupled from reality. The AI must believe that there is a vast conspiracy concealing the president's death; the more it investigates this conspiracy, and fails to find it, the more vast and powerful the conspiracy becomes. What if you show it undeniable, impossible-to-fake-by-the-laws-of-physics evidence of the president's demise? Then it starts to believe that the laws of physics are all wrong.
Though the AI is "nice", its niceness is appropriate for a world of invisible conspiracies and alternate physics - not for the real world. It's clear how such an AI could start doing crazy and dangerous things as its view of reality starts to diverge.
Fourth failure: hack the models
This example is due to Daniel Dewey. He imagined that the AI would maintain a full collection of models and update them normally. But before making a prediction, each model would get hacked to artificially set ¬X to X.
To see how this could fail, imagine all evidence about the world is stored in a giant vector of data. These vectors are calibrated correctly, then ¬X is flipped to X. The problem is that, while that is a small change to the vector, it could be a huge change to the world that it encodes.
For instance, let's go back to the warmongering vice president, and assume that nuclear war is inevitable, given X. Then the AI will walk around in a peaceful world, update its models - and act as if it was in a nuclear wasteland, because those are the only possibilities, given X. Essentially, the AI will move through our universe, harvesting information that would inform its actions in a parallel universe - and acting as if it existed there instead of here.
For instance, it could wander into a flower show where someone is talking about difficulties growing roses in southern Quebec. It adds this data to its vector, noting that the soil there must be a bit unsuitable to plant growth. It therefore concludes that it must write to the (non-existent) Third God-Emperor of America and advise it to give up on the Quebec Anglican Protectorate, which must be misreporting their agriculture output, given this data.
It's interesting to contrast this AI with the previous one. Suppose that the nuclear war further implies that Paris must be a smoking crater. And now both AIs must walk around a clearly bustling and intact Paris. The disbelieving AI must conclude that this is an elaborate ruse - someone has hidden the crater from its senses, put up some fake building, etc... The model-hacking AI, meanwhile, acts as if it's in a smouldering crater, with the genuine Paris giving it information as to what it should do: it sees an intact army barracks, and starts digging under the "rubble" to see if anything "remains" of that barracks.
Fifth failure: Bayes nets and decisions
It seems that a Bayes net would be our salvation. We could have dependent nodes like "warmongering president", "nuclear war", or "flower show". Then we could require that the AI makes its decision dependent only on the states of these dependent nodes. And never on the original X/¬X node.
This seems safe - after all, the AI is nice given X. And if we require the AI's decisions be dependent only on subordinate nodes, then it must be nice dependent on the subordinate nodes. Therefore X/¬X is irrelevant, and the AI is always nice.
Except... Consider what a "decision" is. A decision could be something simple, or it could be "construct a sub AI that will establish X versus ¬X, and do 'blah' if X, and 'shmer' if ¬X". That's a perfectly acceptable decision, and could be made conditional on any (or all) of the subordinate nodes. And if 'blah' is nice while 'shmer' isn't, we have the same problem.
Sixth failure: Bayes nets and unnatural categories
OK, if decisions are too general, how about values for worlds? We take a lot of nodes, subordinate to X/¬X, and require that the AI define its utility or value function purely in terms of the states of these subordinate nodes. Again, this seems safe. The AI's value function is safe given X, by assumption, and is defined in terms of subordinate nodes that "screen off" X/¬X.
And that AI is indeed safe... if the subordinate nodes are sensible. But they're only sensible because I've defined them using terms such as "nuclear war". But what if a node is "nuclear war if X and peace in our time if ¬X"? That's a perfectly fine definition. But such nodes mean that the value function given ¬X need not be safe in any way.
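A toy sketch of the problem, with all names hypothetical: a subordinate node whose very definition references X can make a value function look safe given X while leaving it completely unconstrained given ¬X.

```python
# Hypothetical toy model. The "unnatural" node is defined as
# "nuclear war if X, and peace in our time if not-X".
def unnatural_node(X, world):
    return world["nuclear_war"] if X else world["peace"]

def value(node_state):
    # A value function defined purely over the subordinate node's state,
    # checked for safety only in worlds where X holds.
    return -100 if node_state else 0

war_world = {"nuclear_war": True, "peace": False}

# Given X, the node tracks nuclear war, so the value function penalises it:
print(value(unnatural_node(True, war_world)))   # -100

# Given not-X, the same node silently reads the "peace" flag instead, and
# the very same value function assigns the war world no penalty at all:
print(value(unnatural_node(False, war_world)))  # 0
```

The value function never changed; only the referent of the node did, which is exactly why "screening off" X/¬X with such nodes fails.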
This is somewhat connected with the Grue and Bleen issue, and addressing that is how I'll be hoping to solve the general problem.
(Epistemic status: often discussed in bits in pieces, haven't seen it summarized in one place anywhere.)
Do you feel that your computer sometimes has a mind of its own? "I have no idea why it is doing that!" Do you feel that, the more you understand and predict someone's action, the less intelligent and more "mechanical" they appear?
My guess is that, in many cases, agency (as in, the capacity to act and make choices) is a manifestation of the observer's inability to explain and predict the agent's actions. To Omega in Newcomb's problem, humans are just automatons without a hint of agency. To a game player some NPCs appear stupid and others smart, and the more you play and the more you can predict the NPCs, the less agenty they appear to you.
Note that randomness is not the same as uncertainty, since if you can predict that someone or something behaves randomly, it is still a prediction. What I mean is more of a Knightian uncertainty, where one fails to make a useful prediction at all. Something like a tornado may appear to intentionally go after you if you fail to predict where it will be going and you have trouble escaping.
If you are a user of a computer program and it does not behave as you expect, you often get a feeling of a hostile intelligence opposing you, occasionally resulting in aggressive behavior toward it, usually verbal violence, though occasionally getting physical, the way we would confront an actual enemy. On the other hand, if you are the programmer who wrote the code in question, you think of the misbehavior as bugs, not intentional hostility, and treat the code by debugging or documenting it. Mostly. Sometimes I personalize especially nasty bugs.
I was told by a nurse that this is also how they are taught to treat difficult patients: you don't get upset at someone's misbehavior and instead treat them not as an agent, but more like an algorithm in need of debugging. Parents of young children are also advised to take this approach.
This seems to also apply to self-analysis, though to a lesser degree. If you know yourself well, and can predict what you would do in a specific situation, you may feel that your response is mechanistic or automatic and not agenty or intelligent. Or maybe not. I am not sure. I think if I had the capacity for full introspection, not just the surface level understanding of my thoughts and actions, I would ascribe much less agency to myself. Probably because it would cease to be a useful concept. I wonder if this generalizes to a superintelligence capable of perfect or near perfect self-reflection.
This leads us to the issue of feelings, deliberate choices, free will and ability to consent and take responsibility. These seem to be useful, if illusory, concepts for when you live among your intellectual peers and want to be treated at least as having as much agency as you ascribe to them. But this is a topic for a different post.
The Fundamental Attribution Error
Also known, more accurately, as "Correspondence Bias."
The "more accurately" part is pretty important; bias -may- result in error, but need not -necessarily- do so, and in some cases may result in reduced error.
A Simple Example
Suppose I write a stupid article that makes no sense and rambles on without any coherent point. There might be a situational cause of this; maybe I'm tired. Correcting for correspondence bias means that more weight should be given to the situational explanation than the dispositional explanation, that I'm the sort of person who writes stupid articles that ramble on. The question becomes, however, whether or not this increases the accuracy of your assessment of me; does correcting for this bias make you, in fact, less wrong?
In this specific case, no, it doesn't. A person who belongs to the class of people who write stupid articles is more likely to write stupid articles than a person who doesn't belong to that class - I'd be surprised if I ever saw Gwern write anything that wasn't well-considered, well-structured, and well-cited. If somebody like Gwern or Eliezer wrote a really stupid article, we have sufficient evidence that he's not a member of that class of people to make that conclusion a poor one; the situational explanation is better, he's having some kind of off day. However, given an arbitrary stupid article written by somebody for which we have no prior information, the distribution is substantially different. We have different priors for "Randomly chosen person X writes article" and "Article is bad" implies "X is a bad writer of articles" than we do for "Well-known article author Y writes article" and "Article is bad" implies "Y is a bad writer of articles".
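A minimal illustration of that priors argument, with made-up numbers (the likelihoods here are assumptions chosen only to show the shape of the calculation):

```python
# Assumed likelihoods, for illustration only: how often bad writers
# vs. good writers produce a bad article.
P_BAD_ARTICLE_GIVEN_BAD_WRITER = 0.8
P_BAD_ARTICLE_GIVEN_GOOD_WRITER = 0.05

def p_bad_writer_given_bad_article(prior_bad):
    """Bayes' theorem: P(bad writer | bad article)."""
    evidence = (P_BAD_ARTICLE_GIVEN_BAD_WRITER * prior_bad
                + P_BAD_ARTICLE_GIVEN_GOOD_WRITER * (1 - prior_bad))
    return P_BAD_ARTICLE_GIVEN_BAD_WRITER * prior_bad / evidence

# An arbitrary author we know nothing about: one bad article is strong evidence.
print(p_bad_writer_given_bad_article(0.5))   # ~0.94
# An author with a long track record of good articles (low prior): the same
# evidence barely moves us, and "off day" stays the better explanation.
print(p_bad_writer_given_bad_article(0.01))  # ~0.14
```

The same observation supports opposite conclusions depending on the prior, which is the whole point of treating the "error" as conditional on who wrote the article.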
Getting to the Point
The FAE is putting emphasis on internal factors rather than external. It's jumping first to the conclusion that somebody who just swerved is a bad driver, rather than first considering the possibility that there was an object in the road they were avoiding, given only the evidence that they swerved. Whether or not the FAE is an error - whether it is more wrong - depends on whether or not the conclusion you jumped to was correct, and more importantly, whether, on average, that conclusion would be correct.
It's very easy to produce studies in which the FAE results in people making incorrect judgements. This is not, however, the same as the FAE resulting in an average of more incorrect judgements in the real world.
Correspondence Bias as Internal Rationalization
I'd suggest the major issue with correspondence bias is not, as commonly presented, in how you interpret the behavior of other people, but in how you interpret your own.
Turning to Eliezer's example in the linked article, if you find yourself kicking vending machines, maybe the answer is that -you- are a naturally angry person, or, as I would prefer to phrase it, you have poor self-control. The "floating history" Eliezer refers to sounds more to me like rationalizations for poor behavior than anything approaching "good" reasons for expressing your anger through violence directed at inanimate objects. I noticed -many- of those rationalizations cropping up when I quit smoking - "Oh, I'm having a terrible day, I could just have one cigarette to take the edge off." I don't walk by a smoker and assume they had a terrible day, however, because those were -excuses- for a behavior that I shouldn't be engaging in.
It's possible, of course, that Eliezer's example was simply a poorly chosen one; the examples in studies certainly seem better, such as assuming the authors of articles held the positions they wrote about. But the examples used in those studies are also extraordinarily artificial, at least in individualistic countries, where it's assumed, and generally true, that people writing articles do have the freedom to write what they agree with, and infringements of this (say, in the context of a newspaper asking a columnist to change a review to be less hostile to an advertiser) are regarded very harshly.
Collectivist versus Individualist Countries
There's been some research done, comparing collectivist societies to individualist societies; collectivist societies don't present the same level of effect from the correspondence bias. A point to consider, however, is that in collectivist societies, the artificial scenarios used in studies are more "natural" - it's part of their society to adjust themselves to the circumstances, whereas individualist societies see circumstance as something that should be adapted to the individual. It's -not- an infringement, or unexpected, for the state-owned newspaper to require everything written to be pro-state.
Maybe the differing levels of effect are less a matter of "Collectivist societies are more sensitive to environment" so much as that, in both cultures, the calibration of a heuristic is accurate, but it's simply calibrated to different test cases.
I don't have anything conclusive to say, here, merely a position: The Correspondence Bias is a bias that, on the whole, helps people arrive at more accurate, rather than less accurate, conclusions, and should be corrected with care to improving accuracy and correctness, rather than the mere elimination of bias.
Recently we have opened an experimental website for Rational Discussion of Politics. A special feature of the new website is an automated recommendation system which studies user preferences based on their voting records. The purpose of this feature is to enhance the quality of discussion without using any form of censorship.
The recommendation system was previously tested with the help of 30 members of a political discussion forum. The tests have shown that most user preferences can be reasonably well described by just two parameters. The system chooses the parameters (principal vectors) independently based only on the numerical data (comment ratings), but it was easy to see that one vector corresponded to the “leftwing - rightwing” and another to the “well written – poorly written” axis.
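The site's actual algorithm isn't described here, but the standard way to extract two such "principal vectors" from a ratings matrix is a truncated SVD; a minimal sketch on made-up data (users as rows, comments as columns, ratings of +1/-1):

```python
import numpy as np

# Hypothetical ratings matrix: the first four comments split along a
# "political" axis, the fifth along a "well written" axis.
ratings = np.array([
    [ 1,  1, -1, -1,  1],
    [ 1,  1, -1, -1, -1],
    [-1, -1,  1,  1,  1],
    [-1, -1,  1,  1, -1],
], dtype=float)

# Center the data and keep the top two singular components.
centered = ratings - ratings.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
user_coords = U[:, :2] * S[:2]  # each user summarized by two parameters

print(user_coords.round(2))
```

With data this clean the two recovered axes align exactly with the two planted ones; real voting records would be noisier, and interpreting the axes ("leftwing - rightwing", "well written - poorly written") remains a human judgment. Note that SVD fixes each axis only up to sign.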
About a month ago we started discussions on the new website. This time, all our participants were LW members and the results were very different. There was relatively little variation along “well written – poorly written” axis. There was significant variation along what seemed to be the political views axis, but it could no longer be perfectly described by the conventional “leftwing - rightwing” labels. For the moment, we adopted “populares” and “optimates” terms for the two camps (the former seems somewhat correlated with “left-wing/liberal” and the latter with “right-wing/libertarian”).
The results have shown an interesting asymmetry between the camps. In the previous tests, both left and right leaning users upvoted users from their own camp much more frequently. However, one group was several times more likely to upvote their opponents than the other. Among “populares” and “optimates” the asymmetry was a lot weaker (currently 27%), but still noticeable.
In both cases our sample sizes were small and may not be representative of the LW community or the US population. Still, it would be interesting to find an explanation for this asymmetry. One possibility is that, on average, one side presents significantly better arguments. Another possibility is that the other group is more open-minded.
Can anyone suggest a test that can objectively decide which (if any) hypothesis is correct?
I am currently learning about the basics of decision theory, most of which is common knowledge on LW. I have a question, related to why EDT is said not to work.
Consider the following Newcomblike problem: A study shows that most people who two-box in Newcomblike problems such as the following have a certain gene (and one-boxers don't have the gene). Now, Omega could put you into something like Newcomb's original problem, but instead of having run a simulation of you, Omega has only looked at your DNA: If you don't have the "two-boxing gene", Omega puts $1M into box B, otherwise box B is empty. And there is $1K in box A, as usual. Would you one-box (take only box B) or two-box (take box A and B)? Here's a causal diagram for the problem:
Since Omega does not do much other than translating your genes into money under a box, it does not seem to hurt to leave it out:
I presume that most LWers would one-box. (And as I understand it, not only CDT but also TDT would two-box, am I wrong?)
Now, how does this problem differ from the smoking lesion or Yudkowsky's (2010, p. 67) chewing gum problem? Chewing gum (or smoking) seems to be like taking box A to get an additional $1K; the two-boxing gene is like the CGTA gene; the illness itself (the abscess or lung cancer) is like not having $1M in box B. Here's another causal diagram, this time for the chewing gum problem:
As far as I can tell, the difference between the two problems is some additional, unstated intuition in the classic medical Newcomb problems. Maybe, the additional assumption is that the actual evidence lies in the "tickle", or that knowing and thinking about the study results causes some complications. In EDT terms: The intuition is that neither smoking nor chewing gum gives the agent additional information.
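A minimal expected-value sketch of how EDT reasons about the genetic version of the problem. The 90%/10% correlation strength is an assumption for illustration; the post only says "most" two-boxers have the gene.

```python
from fractions import Fraction

# Assumed correlation between choice and gene (illustrative numbers only):
P_GENE_GIVEN_TWOBOX = Fraction(9, 10)
P_GENE_GIVEN_ONEBOX = Fraction(1, 10)

def edt_value(action):
    """EDT: treat your own choice as evidence about the gene."""
    p_gene = P_GENE_GIVEN_TWOBOX if action == "two-box" else P_GENE_GIVEN_ONEBOX
    box_b = (1 - p_gene) * 1_000_000  # box B holds $1M only if you lack the gene
    box_a = 1_000 if action == "two-box" else 0
    return box_b + box_a

print(edt_value("one-box"))  # 900000
print(edt_value("two-box"))  # 101000
```

Under these numbers EDT one-boxes, since conditioning on one-boxing makes the gene (and the empty box B) much less likely; a CDT agent, holding the gene fixed, would take both boxes either way.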
In 2008 I was working on a Russian language book “Structure of the Global Catastrophe”, and I brought it to one of our friends for review. He was the geologist Aranovich, an old friend of my late mother's husband.
We started to discuss Stevenson's probe — a hypothetical vehicle which could reach the earth's core by melting its way through the mantle, taking scientific instruments with it. It would take the form of a large drop of molten iron – at least 60 000 tons – theoretically feasible, but practically impossible.
Milan Cirkovic wrote an article arguing against this proposal, in which he fairly concluded that such a probe would leave a molten channel of debris behind it, and that the high pressure inside the earth's core could push this material upwards. A catastrophic degassing of the earth's core could ensue, acting like a giant volcanic eruption, completely changing atmospheric composition and killing all life on Earth.
Our friend told me that in his institute they had created an upgraded version of such a probe, which would be simpler, cheaper and which could drill down deeply at a speed of 1000 km per month. This probe would be a special nuclear reactor, which uses its energy to melt through the mantle. (Something similar was suggested in the movie “The China Syndrome” about a possible accident at a nuclear power station – so I don’t think that publishing this information would endanger humanity.) The details of the reactor-probe were kept secret, but there was no money available for practical realisation of the project. I suggested that it would be wise not to create such a probe. If it were created it could become the cheapest and most effective doomsday weapon, useful for worldwide blackmail in the reasoning style of Herman Kahn.
But in this story the most surprising thing for me was not a new way to kill mankind, but the ease with which I discovered its details. If your nearest friends from a circle not connected with x-risks research know of a new way of destroying humanity (while not fully recognising it as such), how many more such ways are known to scientists from other areas of expertise!
I like to create full exhaustive lists, and I could not stop myself from creating a list of human extinction risks. Soon I reached around 100 items, although not all of them are really dangerous. I decided to convert them into something like a periodic table – i.e. to sort them by several parameters – in order to help predict new risks.
For this map I chose two main variables: the basic mechanism of each risk and the historical epoch during which it could happen. Any such map must also be based on some model of the future, and I chose Kurzweil’s model of exponential technological growth, which leads to the creation of super-technologies in the middle of the 21st century. Risks are also graded according to their probabilities: main, possible and hypothetical. I plan to attach to each risk a wiki page with its explanation.
I would like to know which risks are missing from this map. If your ideas are too dangerous to publish openly, PM me. If you think that any mention of your idea will raise the chances of human extinction, just mention its existence without the details.
I think that the map of x-risks is necessary for their prevention. I offered prizes for improving the previous map which illustrates possible prevention methods of x-risks and it really helped me to improve it. But I do not offer prizes for improving this map as it may encourage people to be too creative in thinking about new risks.
Pdf is here: http://immortality-roadmap.com/typriskeng.pdf
"The science of “human factors” now permeates the aviation industry. It includes a sophisticated understanding of the kinds of mistakes that even experts make under stress. So when Martin Bromiley read the Harmer report, an incomprehensible event suddenly made sense to him. “I thought, this is classic human factors stuff. Fixation error, time perception, hierarchy.”
experienced professionals are prone..."
A research team in China has created a system for answering verbal analogy questions of the type found on the GRE and IQ tests. It scores a little above the average human, perhaps corresponding to an IQ of around 105, and improves substantially on the reported SOTA in AI for these types of problems.
This work builds on the deep word-vector embeddings which have led to large gains in translation and many other NLP tasks. One of their key improvements is learning multiple vectors per word, where the number of distinct word meanings is taken directly from a dictionary. This matters because verbal analogy questions often turn on rarer word meanings. They also employ modules specialized for the different types of questions.
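The vector-arithmetic idea behind these embeddings can be illustrated with a toy sketch. The vectors below are hand-picked for illustration and are not from any trained model, and this is the standard analogy-by-offset trick, not the paper's multi-vector method:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-d embeddings (hypothetical values, chosen so the arithmetic works):
# dimension 0 ~ "royalty", dimension 1 ~ "gender", dimension 2 ~ noise.
vecs = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.0]),
    "woman": np.array([0.1, -0.8, 0.0]),
}

def solve_analogy(a, b, c, vocab):
    """Answer 'a is to b as c is to ?' by nearest neighbour to b - a + c."""
    target = vecs[b] - vecs[a] + vecs[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vecs[w], target))

print(solve_analogy("man", "king", "woman", vecs))  # → queen
```

A word with several dictionary senses would, in the multi-vector scheme, get one such vector per sense, and the system can pick whichever sense best fits the analogy.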
I vaguely remember reading that AI systems already are fairly strong at solving visual raven-matrix style IQ questions, although I haven't looked into that in detail.
The multi-vector technique is probably the most important takeaway for future work.
Even if subsequent follow-up work reaches superhuman verbal IQ in a few years, this of course doesn't immediately imply AGI. These types of IQ tests measure specific abilities which are correlated with general intelligence in humans, but these specific abilities are only a small subset of the systems and abilities required for general intelligence, and probably rely on a smallish subset of the brain's circuitry.
I continually train my ten-year-old son’s working memory, and urge parents of other young children to do likewise. While I have succeeded in at least temporarily improving his working memory, I accept that this change might not be permanent and could end a few months after he stops training. But I also believe that while his working memory is boosted so too is his learning capacity.
I have a horrible working memory that greatly hindered my academic achievement. I was so bad at spelling that they stopped counting it against me in school. In technical classes I had trouble remembering what variables stood for. My son, in contrast, has a fantastic memory. He twice won his school’s spelling bee, and just recently I wrote twenty symbols (letters, numbers, and shapes) in rows of five. After a few minutes he memorized the symbols and then (without looking) repeated them forward, backwards, forwards, and then by columns.
My son and I have been learning different programming languages through Codecademy. While I struggle to remember the required syntax of different languages, he quickly picks it up and can focus on higher-level understanding. When we do math together, his strong working memory also lets him concentrate on higher-order issues rather than on remembering the details of the problem and the relevant formulas.
You can easily train a child’s working memory. It requires just a few minutes of time a day, can be very low tech or done on a computer, can be optimized for your child to get him in flow, and easily lends itself to a reward system. Here is some of the training we have done:
- I write down a sequence and have him repeat it.
- I say a sequence and have him repeat it.
- He repeats the sequence backwards.
- He repeats the sequence with slight changes such as adding one to each number and “subtracting” one from each letter.
- He repeats while doing some task like touching his head every time he says an even number and touching his knee every time he says an odd one.
- Before repeating a memorized sequence, he must first play “repeat after me”, where I say a random string that he echoes back.
- I draw a picture and have him redraw it.
- He plays N-back games.
- He does mental math requiring keeping track of numbers (e.g. 42 times 37).
- I assign numerical values to letters and ask him math operation questions (e.g. A*B+C).
The key is to keep changing how you train your kid so you have more hope of improving general working memory rather than the very specific task you are doing. So, for example, if you say a sequence and have your kid repeat it back to you, vary the speed at which you talk on different days and don’t just use one class of symbols in your exercises.
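For the computerized variants, drills like these are easy to script. A minimal sketch follows; the symbol pools, lengths, and the N-back check are arbitrary illustrative choices, not a validated training protocol:

```python
import random
import string

def make_sequence(length=7, kinds=("digits", "letters", "shapes")):
    """Generate a mixed practice sequence. Vary `kinds` and `length`
    across days so the child trains general working memory rather
    than one specific symbol class."""
    pools = {
        "digits": list(string.digits),
        "letters": list(string.ascii_uppercase),
        "shapes": ["circle", "square", "triangle", "star"],
    }
    pool = [s for k in kinds for s in pools[k]]
    return [random.choice(pool) for _ in range(length)]

def check_n_back(stream, n):
    """Return positions where the current item matches the item n steps back
    (the answers the child should flag in an N-back game)."""
    return [i for i in range(n, len(stream)) if stream[i] == stream[i - n]]

print(make_sequence(length=5, kinds=("digits",)))
print(check_n_back(["A", "B", "A", "C", "A"], 2))  # → [2, 4]
```

The same generator covers several of the exercises above: read the sequence aloud for him to repeat, repeat it backwards, or stream it item by item for N-back play.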
Utilitarianism sometimes supports weird things: killing lone backpackers for their organs, sacrificing all the world's happiness to one utility monster, creating zillions of humans living at near-subsistence level to maximize total utility, or killing all but a handful of them to maximize average utility. It also supports gay rights, and has done so since 1785, when saying that there's nothing wrong with having gay sex was pretty much in the same category as saying that there's nothing wrong with killing backpackers. This makes one wonder: if, despite all the disgust towards them a few centuries ago, gay rights have been inside humanity's coherent extrapolated volition all along, then perhaps our descendants will eventually conclude that killing the backpacker was the right choice all along, and only the bullet-biting extremists of our time were getting it right. In fact, as a friend of mine pointed out, you don't even need to fast-forward a few centuries: there are, or were, ethical systems actually in use in some cultures (e.g. bushido in pre-Meiji-restoration Japan) that are obsessed with honor and survivor's guilt. They would approve of killing the backpacker, or of letting him kill himself - this being an honorable death, while living on after letting five other people die would be dishonorable - on non-utilitarian grounds, and they would actually alieve that this is the right choice. Perhaps they were right all along, and Western civilization bulldozed through them, effectively destroying such cultures, not because of superior (non-utilitarian) ethics but for whatever other reasons things happen in history. In that case there's no need to try to fix utilitarianism lest it suggest killing backpackers, because it's not broken - we are - and our descendants will figure that out.
In physics we've seen this happen: an elegant, low-Kolmogorov-complexity model predicted that weird things happen on the subatomic level, and we built huge particle accelerators just to confirm - yep, that's exactly what happens, in spite of all your intuitions. Perhaps smashing utilitarianism with high-energy problems only breaks our intuitions, while utilitarianism itself is just fine.
But let's talk about relativity. In 1916 Karl Schwarzschild solved the newly discovered Einstein field equations and thus predicted black holes. It was regarded as a mere curiosity, and perhaps GIGO, at the time, until in the 1960s people realized that yes, contra all intuitions, this is in fact a thing. But here's the thing: black holes were actually first predicted by John Michell in 1783. You can easily check it: if you substitute the speed of light into the classical formula for escape velocity, you get the Schwarzschild radius. Michell knew the radius and mass of the Sun, as well as the gravitational constant, precisely enough to get the order of magnitude and the first digit right when providing an example of such an object. If we had somehow never discovered general relativity, but had managed to build telescopes good enough to observe the stars orbiting the emptiness that we now call Sagittarius A*, it would be very tempting to say: "See? We predicted this centuries ago, and however crazy it seemed, we now know it's true. That's what happens when you stick to the robust theories, shut up, and calculate - you stay centuries ahead of the curve."
We now know that Newtonian mechanics isn't true, although it's close to the truth when you plug in non-astronomical numbers (and even some astronomical ones). A star 500 times the size of the Sun with the same density, however, is very much astronomical. It is sheer coincidence that in this exact formula the relativistic terms work out to give the same escape velocity as classical mechanics does. It would have been enough for Michell to imagine that his dark star rotates - a thing that Newtonian mechanics says doesn't matter, although it does - to change the category of this prediction from "miraculously correct" to "expectedly incorrect". This doesn't mean that Newtonian mechanics wasn't a breakthrough, better than any single theory existing at the time. But it does mean that it would have been premature for people in the pre-relativity era to invest in building a starship designed to go ten times the speed of light, even if they could - although that's where "shut up and calculate" could have led them.
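Michell's arithmetic is easy to reproduce. The sketch below plugs standard modern constants into the classical escape-velocity formula; it is a check of the coincidence described above, not Michell's original derivation:

```python
from math import sqrt

G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8       # speed of light, m/s
M_sun = 1.989e30  # solar mass, kg
R_sun = 6.957e8   # solar radius, m

# Michell's dark star: same density as the Sun, 500 times the radius,
# hence 500**3 times the mass. Classical escape velocity: v = sqrt(2GM/R).
M = 500**3 * M_sun
R = 500 * R_sun
v_escape = sqrt(2 * G * M / R)
print(v_escape / c)  # ~1.03: the escape velocity just exceeds c

# Setting v = c in the same formula gives R = 2GM/c**2, which happens to
# coincide with the relativistic Schwarzschild radius.
r_s_sun = 2 * G * M_sun / c**2
print(r_s_sun / 1000)  # ~2.95 km for the Sun
```

Since escape velocity at the surface of a fixed-density body scales linearly with its radius, Michell only needed the Sun's escape velocity (~618 km/s) to see that a 500-times-larger star of the same density would trap light.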
And that's where I think we are with utilitarianism. It's very good. It's more or less reliably better than anything else. And it has managed to make ethical predictions so far-fetched (funnily enough, about as far-fetched as the prediction of dark stars) that it's tempting to conclude that the only reason it keeps making crazy predictions is that we haven't yet realized they're not crazy. But we live in a world where Sagittarius A* was discovered and general relativity wasn't. The actual 42-ish ethical system will probably converge to utilitarianism when you plug in non-extreme numbers (small numbers of people, non-permanent risks and gains, non-taboo topics). But just because it converged to utilitarianism on one taboo (at the time) topic, and let utilitarianism stay centuries ahead of the moral curve, doesn't mean it will do the same for others.
Our meetup last weekend was at the downtown Ann Arbor Public Library. There were several comments, requests, and discussion items. This discussion topic goes out to attendees, people who might have wanted to attend but didn't, and members of other meetup groups who have suggestions.
1. Several people mentioned having trouble commenting here on the Less Wrong forums. Some functions are restricted by karma, and if you cannot comment, you cannot accumulate karma.
- Have you verified your e-mail address? This is a common stumbling point.
- Please try to comment on this post. Restrictions on commenting are (moderate certainty) looser than restrictions on creating posts.
- If that does not work, please try to reply to a comment on this post. I will add one specifically for this purpose. Restrictions on replying to comments may be (low certainty) looser than on starting new comment threads.
- If someone has already troubleshot new users' problems with commenting, please link.
2. Some people felt intimidated about attending. Prominent community members include programmers, physicists, psychiatrists, philosophy professors, Ph.D.s, and other impressive folks who do not start with P like fanfiction writers. Will I be laughed out of the room if I have not read all the Sequences?
No. Not only is there no minimum requirement to attend, as a group, we are *very excited* about explaining things to people. Our writing can be informationally dense, but our habit of linking to long essays is (often) meant to provide context, not to say, "You must read all the dependencies before you are allowed to talk."
And frankly, we are not that intimidating. Being really impressive makes it easy to become prominent, which via availability bias makes us all look impressive, but our average is way lower than that. And the really impressive people will welcome you to the discussion.
So how can we express this in meetup announcements? I promised to draft a phrasing. Please critique and edit in comments.
Everyone is welcome. There is no minimum in terms of age, education, or reading history. There is no minimum contribution to the community nor requirement to speak. You need not be this tall to ride. If you can read this and are interested in the meetup, we want you to come to the meetup.
3. As part of signalling "be comfortable, you are welcome here," I bought some stim toys from Stimtastic and put them out for whoever might need them. They seemed popular. How did they work for folks - comforting, distracting? They seemed good for some who wanted something to do with their hands, but I worried that we had a bit too much "play" at some points.
Your recommendations on accommodating access needs are welcome. (But I'm not buying chewable stim toys to share; you get to bring your own on those.)
4. The location was sub-optimal. It is a fine meeting space, but the library is under construction, has poor parking options, and does not allow food or drink. Attendees requested somewhere more comfortable, with snacking options. Our previous meeting was at a restaurant, which offers much of that but has more background noise and seemed less socially optimal in terms of coordinating discussion. Prior to that, Michigan meetups had been at Yvain's home.
We moved to Ann Arbor from Livonia because (1) Yvain had been hosting and moved to Ann Arbor, (2) half the Livonia attendees seemed to be Ann Arbor-area folks, and (3) I knew the library had a free meeting room.
Recommendations and volunteers for a meeting site in the area are welcome. I'm in Lansing and not well set up for a group of our size.
5. We had 17 people, although not all at once. It was suggested that we break up into two or more groups for part of the discussion. This is probably a good idea, and it would give more people a chance to participate.
6. Many groups have pre-defined topics or projects. No one leaped at that idea, but we can discuss on here.
7. A rationalist game night was another suggestion. I like it. Again, volunteers to host are welcome. Many public locations like restaurants are problematic for game nights.
A translation of the opinion piece can be found here.
Effective altruism is a great concept, but it's not trivial to sell. There are therefore good reasons to ally ourselves with other rationalist memes to increase the level of rationality and effectiveness in the world. One powerful such rationalist meme is "evidence-based policy", which is inspired by the "evidence-based medicine" movement.
The exact meaning of evidence-based policy is somewhat disputed, but proponents generally demand that the standards of evidence on which policy is based be raised. Many believe strongly in randomized controlled trials (RCTs) and in the "hierarchy of evidence", but there is not complete agreement on the strength of RCTs relative to other kinds of studies.
In the US and the UK, there are several organizations which work on evidence-based policy, such as the British What Works Network and the American Coalition for Evidence-Based Policy. Inspired by them, I took the initiative to start a Swedish network for evidence-based policy at the start of this year. We are by now around 50 (depending on how you count) researchers, civil servants, journalists, consultants, students and other activists. Only I and a few others are EA members, so it's not an EA organization, but as I argued in my previous post, I do believe working on it is nevertheless an effectively altruistic cause.
One difference between us and What Works is that we aim to be a broad campaigning organization. We believe that policy not being evidence-based is not only due to a lack of knowledge, but also due to a lack of will, especially among politicians. Politicians often disregard expert advice (on what policies are the most effective to reach a given set of goals) which goes against their political prejudices. Therefore we need to put pressure on politicians - not the least in the media - rather than just work behind the scenes as an expert organization.
II (Most linked replies below are in Swedish)
Our activities were fairly modest until last Sunday, when we wrote an opinion piece calling for evidence-based policy (English). The opinion piece was published in the most widely-read broadsheet, Dagens Nyheter, on DN Debatt - a sort of op-ed forum. DN Debatt has a special standing in Swedish politics. Everybody reads it and it's well-respected.
Hence we had expected a lot of attention, but the results still exceeded our expectations. Ours was the second most shared DN Debatt article in the month of May. We got seven replies in Dagens Nyheter, were strongly criticized in the other main broadsheet, Svenska Dagbladet (conservative), were parodied in a popular public-service podcast (the equivalent of the BBC), and were also commented on in a number of smaller newspapers. The discussion on Twitter was pretty intense. Subsequently, we also published two replies to replies in Dagens Nyheter and Svenska Dagbladet.
It's hard to tell what the majority opinion on our piece was. Certainly, there was a lot of praise and a lot of Facebook likes, but also some fierce criticism. This was almost exclusively down to misunderstandings. I won't bog you down with all of the details here, but will rather summarize my general conclusions. They could be useful for anyone trying to write on evidence-based policy or related concepts in other countries.
I should say that "evidence-based policy" isn't as entrenched a concept in Sweden as it is in the US and the UK, which probably played to our disadvantage.
1) You need to be very clear about the means-ends distinction. Evidence-based policy is about making the methods for reaching your political goals (happiness, equality, liberty, etc.) more effective through the use of evidence. It is not about propagating any particular set of political goals. We tried to be clear about this, but partly failed, for two reasons. First, Dagens Nyheter set the headline, which was misleading. Second, we only clarified this distinction at the end. It should have been at the top.
2) There is a straw-man conception of evidence-based policy - or of expert-informed policy and political rationality more generally - akin to Julia Galef's "Straw Vulcan" conception of rationality. Perhaps this varies a bit from country to country, but in Sweden it's strong. Let's call it the "Straw Soviet" for now (suggestions welcome!).
According to this conception, evidence-based policy means technocracy (of a dictatorial form, according to the more extreme interpretations), disregard of non-quantifiable values (cf. the "Straw Vulcan"), disregard of emotions, a "Mad Scientist" conception of society as a laboratory, etc., etc. You need to do everything you can to counter such interpretations. I certainly underestimated the power of this straw-man meme. I should also say that the Straw Soviet is probably more vicious than the Straw Vulcan, who seems more innocent (though perhaps this is partly down to Julia's playful presentation of it).
For instance, Svenska Dagbladet's criticism was all about the "Straw Soviet". We were said to want to "design voter behaviour" (this was also partly due to the article having been signed by a few nudgers who call themselves "behavioural engineers" - a big trigger of the Straw Soviet). Here are some more quotes:
It is perhaps not the “enlightened despot” who is called for in the opinion piece, but rather Dr Despot. Today’s most frightening reading came from the recently formed “Network for Evidence-Based Policy” (Dagens Nyheter 1 June).
Since there probably are very few citizens who base their votes on research reports, free elections yield results which are not evidence-based. According to the argument in the opinion piece, that means that since we “see the world through partisan lenses”, the election results are as a rule problematic or directly harmful.
Now if the network were correct, truly evidence-based policy would lead to a single proposal, a solution “free from ideology and populism”. That would in turn mean that all parties arrived at the same answer, and it is absolutely impossible to see why that – though ever so full of evidence – would be desirable.
A vibrant democracy is based on the existence of conflicts of opinion and value, intellectual diversity and the citizen’s right to freely express it. The complete and rational citizen is an anomaly, and based on the unpleasant idea that enlightened powers can raise, design, a new man.
Paradoxically, it is precisely highly ideological regimes which have attempted just that. The results have been devastating.
We got several other replies along these lines, though we also got a much more positive one from Dagens Nyheter itself. A large group of replies treated more technical and hum-drum issues concerning RCTs, practical policy-making, etc.
3) Connected to the Straw Vulcan and the Straw Soviet, there is a "Straw Naive Positivist Scientist" (again, suggestions for better terms are welcome), who thinks that knowledge is easily obtainable even in messy fields like economics, that it's easy to reach consensus if you just don't let political misconceptions mislead you, that you can always easily infer policy advice from research, etc. We got a lot of criticism based on the Straw Naive Positivist. Obviously, we don't hold any of those views.
4) People read very superficially. This is not only true of the man in the street, but also of many journalists, politicians, etc. At some level I know this, having myself written about research on this on my blog, but it's harder to make full use of that knowledge when you write.
Also lots of people don't use the principle of charity at all. Some of the replies - including one from a philosophy professor - were exceedingly uncharitable. Thus don't expect people to use the principle of charity - especially when emotional memes like the Straw Soviet are around.
When you fight such powerful memes, you need to be extremely clear. You need to say the things you really want to get across early, to repeat them, and to give examples. If at all possible, you should control the title, since that sets so much of the tone of the piece (give the publishers a juicy suggestion and they might buy it). Don't say too much, but focus on getting the central message across.
This is so different from writing an academic paper. Of course that's obvious, but it's one thing to get it on an intellectual level, quite another to really internalize it. If you could get a skilled public communicator on board, that would be very useful.
I also think it would be good to pre-test major articles (e.g. on Mechanical Turk) to get a clearer picture of whether the message gets across. If you don't want the content to leak beforehand, that might not be doable, though.
5) We were probably a bit too extreme regarding RCTs, which triggered the Straw Soviet and the Straw Naive Positivist (for epistemological and ethical reasons). It would have been more tactical to emphasize other stuff.
6) We would have come off as more concrete if we had based our opinion piece on a research report on the state of Swedish policy-making. It's great if you can do that, but I don't think it would have been rational for us (see below).
7) We should have stressed how big the movement on evidence-based policy is in the US and the UK. For instance, we could have mentioned that "Obama's 2016 budget calls for an emphasis on evidence-based approaches at all levels of government". Obama being popular and respected in Sweden, that would have done much to disarm the Straw Soviet.
8) It was a mistake to mention legal means as a way of making politics more evidence-based, since it strongly triggers the Soviet meme. Even those who otherwise supported us criticized this suggestion.
In our replies, we focused on rectifying the misunderstandings, especially the claim that we were calling for "Dr Despot". Such replies normally get much less attention, and so it was with ours as well. However, the reception was also more unanimously positive, especially from academics and civil servants who know the field.
I don't regret writing this opinion piece at this early stage. Before I started writing it (I wrote the body of the text, and the others then made minor tweaks) there wasn't much activity in our network. Now, we have many more members, including more senior ones. Also, those who already were in the network grew much more enthusiastic after the publication. Thus all-in-all it's been a major success. Still, I think you can learn a lot from things we could have done better.
I'll write more later on how the network is developing more generally. Also I should add that I'm still digesting what I've learnt, so my conclusions aren't set in stone. Any comments are welcome.
Latest AI success implies that strong AI may be near.
"There's something magical about Recurrent Neural Networks (RNNs). I still remember when I trained my first recurrent network for Image Captioning. Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense. Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times. What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I've in fact reached the opposite conclusion). Fast forward about a year: I'm training RNNs all the time and I've witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me. This post is about sharing some of that magic with you.
We'll train RNNs to generate text character by character and ponder the question "how is that even possible?"
By the way, together with this post I am also releasing code on Github that allows you to train character-level language models based on multi-layer LSTMs. You give it a large chunk of text and it will learn to generate text like it one character at a time. You can also use it to reproduce my experiments below. But we're getting ahead of ourselves; What are RNNs anyway?"
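The "one character at a time" idea is separable from the RNN machinery. A deliberately crude sketch, using bigram counts instead of the multi-layer LSTM from the post (so it captures spelling, not long-range structure):

```python
import random
from collections import defaultdict

def train_char_model(text):
    """Record which character follows which: a first-order (bigram) model.
    Storing followers as a list makes sampling frequency-weighted for free."""
    counts = defaultdict(list)
    for a, b in zip(text, text[1:]):
        counts[a].append(b)
    return counts

def sample(model, seed, length=40):
    """Generate text one character at a time, each chosen from the
    distribution of characters seen after the previous one."""
    out = seed
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        out += random.choice(followers)
    return out

corpus = "the quick brown fox jumps over the lazy dog. " * 20
model = train_char_model(corpus)
print(sample(model, "th"))
```

An RNN replaces the one-character lookup table with a hidden state summarizing the whole prefix, which is what lets it close brackets and balance quotes over long spans.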