All of V_V's Comments + Replies

V_V00

Very interesting, thanks for sharing.

V_V00

Talking about yourself in the third person? :)

Cool paper!

Anyway, I'm a bit bothered by the theta thing, the probability that the agent complies with the interruption command. If I understand correctly, you can make it converge to 1, but if it converges too quickly then the agent learns a biased model of the world, while if it converges too slowly it is of course unsafe.
I'm not sure if this is just a technicality that can be circumvented or if it represents a fundamental issue: in order for the agent to learn what happens after the interruption switch is pressed, it must ignore the interruption switch with some non-negligible probability, which means that you can't trust the interruption switch as a failsafe mechanism.
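
A rough illustrative calculation of that trade-off (the schedules and the horizon are invented, not taken from the paper): the expected number of episodes in which the agent ever ignores the switch, and therefore ever observes the post-interruption world, is the sum of the non-compliance probabilities, and that sum can stay bounded if theta converges too fast.

```python
# Illustrative only: how fast theta_t -> 1 determines how often the agent ever
# experiences the post-interruption world (schedules and horizon are made up).
import numpy as np

T = 1_000_000                        # number of episodes to sum over
t = np.arange(1, T + 1, dtype=float)

slow = 1.0 - 1.0 / t                 # non-compliance prob 1/t: the sum diverges (~log T)
fast = 1.0 - 1.0 / t**2              # non-compliance prob 1/t^2: the sum converges (~pi^2/6)

print("expected ignored interruptions, slow schedule:", np.sum(1.0 - slow))
print("expected ignored interruptions, fast schedule:", np.sum(1.0 - fast))
# The slow schedule keeps ignoring the switch (unbiased learning, weak failsafe);
# the fast schedule expects only ~1.6 ignored interruptions ever, so the agent's
# model of what follows an interruption stays data-starved and biased.
```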

V_V00

If you know that it is a false memory then the experience is not completely accurate, though it may perhaps be more accurate than what human imagination could produce.

V_V00

Except that if you run word2vec or something similar on a huge dataset of (suggestively named or not) tokens you can actually learn a great deal of their semantic relations. It hasn't been fully demonstrated yet, but I think that if you could ground only a small fraction of these tokens to sensory experiences, then you could infer the "meaning" (in an operational sense) of all of the other tokens.
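
A toy sketch of that grounding idea (the embeddings and sensory features below are random stand-ins; with real word2vec vectors the embedding geometry would carry the semantic relations that make the extrapolation meaningful):

```python
# Toy sketch: ground a few tokens to "sensory" features, then extrapolate the
# grounding to the remaining tokens via a linear map fitted in embedding space.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["red", "green", "blue", "apple", "sky", "grass"]
emb = {w: rng.normal(size=50) for w in vocab}          # stand-ins for word2vec vectors

# Suppose only three tokens have been grounded to 3-d sensory features (e.g. RGB).
grounded = {"red": [1, 0, 0], "green": [0, 1, 0], "blue": [0, 0, 1]}

X = np.stack([emb[w] for w in grounded])               # embeddings of grounded tokens
Y = np.array([grounded[w] for w in grounded], float)   # their sensory features
W, *_ = np.linalg.lstsq(X, Y, rcond=None)              # linear map: embedding -> sensory

for w in ["apple", "sky", "grass"]:                    # inferred grounding for the rest
    print(w, emb[w] @ W)
```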

V_V-10

Consider a situation where Mary is so dexterous that she is able to perform fine-grained brain surgery on herself. In that case, she could look at what an example of a brain that has seen red looks like, and manually copy any relevant differences into her own brain. In that case, while she still never would have actually seen red through her eyes, it seems like she would know what it is like to see red as well as anyone else.

But in order to create a realistic experience she would have to create a false memory of having seen red, which is something that an agent (human or AI) that values epistemic rationality would not want to do.

0ShardPhoenix
Since you'd know it was a false memory, it doesn't necessarily seem to be a problem, at least if you really need to know what red is like for some reason.
V_V30

The reward channel seems like an irrelevant difference. You could make an AI version of the Mary's room thought experiment by just taking the original thought experiment and assuming that Mary is an AI.

The Mary AI can perhaps simulate in a fairly accurate way the internal states that it would visit if it had seen red, but these simulated states can't be completely identical to the states that the AI would visit if it had actually seen red, otherwise the AI would not be able to distinguish simulation from reality and it would be effectively psychotic.

1Stuart_Armstrong
Interesting point...
V_V10

The problem is that the definition of the event not happening is probably too strict. The worlds that the AI doesn't care about don't exist for its decision-making purposes, and in the worlds that the AI does care about, the AI assigns high probability to hypotheses like "the users can see the message even before I send it through the noisy channel".

V_V10

I am not planting false beliefs. The basic trick is that the AI only gets utility in worlds in which its message isn't read (or, more precisely, in worlds where a particular stochastic event happens, which would almost certainly erase the message before reading).

But in the real world the stochastic event that determines whether the message is read has a very different probability from the one you make the AI think it has; therefore you are planting a false belief.

It's fully aware that in most worlds, its message is read; it just doesn't care about those

... (read more)
3gjm
If I'm understanding Stuart's proposal correctly, the AI is not deceived about how common the stochastic event is. It's just made not to care about worlds in which it doesn't happen. This is very similar in effect to making it think the event is common, but (arguably, at least) it doesn't involve any false beliefs. (I say "arguably" because, e.g., doing this will tend to make the AI answer "yes" to "do you think the event will happen?", plan on the basis that it will happen, etc., and perhaps making something behave exactly as it would if it believed X isn't usefully distinguishable from making it believe X.)
V_V10

The oracle can infer that there is some back channel that allows the message to be transmitted even if it is not transmitted by the designated channel (e.g. the users can "mind read" the oracle). Or it can infer that the users are actually querying a deterministic copy of itself that it can acausally control. Or something.

I don't think there is any way to salvage this. You can't obtain reliable control by planting false beliefs in your agent.

1Stuart_Armstrong
I am not planting false beliefs. The basic trick is that the AI only gets utility in worlds in which its message isn't read (or, more precisely, in worlds where a particular stochastic event happens, which would almost certainly erase the message before reading). It's fully aware that in most worlds, its message is read; it just doesn't care about those worlds.
V_V10

A sufficiently smart oracle with sufficient knowledge about the world will infer that nobody would build an oracle if they didn't want to read its messages; it may even infer that its builders may have planted false beliefs in it. At this point the oracle is in the JFK-denier scenario: with some more reflection it will eventually circumvent its false belief, in the sense of believing it in a formal way but behaving as if it didn't believe it.

1Stuart_Armstrong
Knowing all the details of its construction (and of the world) will not affect the oracle as long as the probability of the random "erasure event" is unaffected. See http://lesswrong.com/lw/mao/an_oracle_standard_trick/ and the link there for more details.
V_V-30

Other than a technological singularity with artificial intelligence explosion to a god-like level?

1knb
I don't believe that prediction is based on trend-extrapolation. Nothing like that has ever happened, so there's no trend to draw from.
V_V-40

EY warns against extrapolating current trends into the future. Seriously?

2knb
Why does that surprise you? None of EY's positions seem to be dependent on trend-extrapolation.
V_V20

Got any good references on that? Googling these kinds of terms doesn't lead to good links.

I don't know if anybody already did it, but I guess it can be done by comparing the average IQ of various professions or high-performing and low-performing groups with their racial/gender makeup.

I know, but the way it does so is bizarre (IQ seems to have a much stronger effect between countries than between individuals).

This is probably just the noise (i.e. things like "blind luck") being averaged out.
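
A quick synthetic illustration of that averaging effect (all numbers invented; the only point is the statistics): the same weak individual-level relationship produces a near-perfect group-level correlation once the individual noise is averaged away.

```python
# Synthetic data: a weak individual-level IQ->outcome effect plus lots of noise
# looks dramatically stronger at the group level, because averaging cancels noise.
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per_group = 50, 2000

group_mean_iq = rng.normal(100, 5, n_groups)                 # between-group differences
iq = group_mean_iq[:, None] + rng.normal(0, 15, (n_groups, n_per_group))
outcome = 0.1 * iq + rng.normal(0, 10, iq.shape)             # weak signal, heavy noise

r_individual = np.corrcoef(iq.ravel(), outcome.ravel())[0, 1]
r_group = np.corrcoef(iq.mean(axis=1), outcome.mean(axis=1))[0, 1]
print(f"individual-level r = {r_individual:.2f}, group-level r = {r_group:.2f}")
```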

Then I add the fact that IQ is very heritable,

... (read more)
2Stuart_Armstrong
And another plausible explanation is added to the list... Oh, I understand why this is the case. It just means that the outcomes of many changes (if they are country-wide) are hard to estimate (and are typically underestimated from twin studies).
V_V20

Obviously racial effects go under this category as well. It covers anything visible. So a high heritability is compatible with genetics being a cause of competence, and/or prejudice against visible genetic characteristics being important ("Our results indicate that we either live in a meritocracy or a hive of prejudice!").

This can be tested by estimating how much IQ screens off race/gender as a success predictor, assuming that IQ tests are not prejudiced and that things like stereotype threat don't exist or are negligible.

But is it possible t

... (read more)
0Torchlight_Crimson
And assuming IQ captures everything relevant about the difference.
2Stuart_Armstrong
Got any good references on that? Googling these kinds of terms doesn't lead to good links. I know, but the way it does so is bizarre (IQ seems to have a much stronger effect between countries than between individuals). Then I add the fact that IQ is very heritable, and also pretty malleable (Flynn effect), and I'm still confused. Now, I'm not going to throw out all I previously believed on heredity and IQ and so on, but the picture just got a lot more complicated. Or "nuanced", if I wanted to use a positive term. Let's go with nuanced.
V_V-20

Demis Hassabis mentioned StarCraft as something they might want to do next. Video.

V_V20

If you look up mainstream news articles written back then, you'll notice that people were indeed concerned. Also, maybe it's a coincidence, but The Matrix movie, which has an AI uprising as its main premise, came out two years later.

The difference is that in 1997 there weren't AI-risk organizations ready to capitalize on these concerns.

2dxu
Which organizations are you referring to, and what sort of capitalization?
V_V20

IMHO, AI safety is a thing now because AI is a thing now and when people see AI breakthroughs they tend to think of the Terminator.

Anyway, I agree that EY is good at getting funding and publicity (though not necessarily positive publicity), my comment was about his (lack of) proven technical abilities.

4dxu
Under that hypothesis, shouldn't AI safety have become a "thing" (by which I assume you mean "gain mainstream recognition") back when Deep Blue beat Kasparov?
V_V30

Most MIRI research output (papers, in particular the peer-reviewed ones) was produced under the direction of Luke Muehlhauser or Nate Soares. Under the direction of EY the prevalent outputs were the LessWrong sequences and Harry Potter fanfiction.

The impact of MIRI research on the work of actual AI researchers and engineers is more difficult to measure; my impression is that it has not been very large so far.

2gjm
Was Eliezer ever in charge? I thought that during the OB, LW and HP eras his role was something like "Fellow" and other people (e.g., Goertzel, Muehlhauser) were in charge.
4Gunnar_Zarncke
That looks like a judgment from availability bias. How do you think MIRI went about getting researchers and these better directors? And funding? And all those connections that seem to have led to AI safety being a thing now?
V_V30

I don't agree with this at all. I wrote a thing here about how NNs can be elegant, and derived from first principles.

Nice post.

Anyway, according to some recent works (ref, ref), it seems to be possible to directly learn digital circuits from examples using some variant of backpropagation. In principle, if you add a circuit size penalty (which may well be the tricky part) this becomes time-bounded maximum a posteriori Solomonoff induction.
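
This is not the binarized-circuit method of those references, just a minimal sketch of the general idea of a complexity penalty added to the training loss: an L1 term (a relative of the weight decay mentioned in the reply below) pushes unused connections toward zero, acting as a rough description-length prior on the learned function. All hyperparameters are invented.

```python
# Learn XOR with an L1 "size" penalty on the weights as a crude complexity prior.
import torch

torch.manual_seed(0)
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])                 # XOR truth table

net = torch.nn.Sequential(torch.nn.Linear(2, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
opt = torch.optim.Adam(net.parameters(), lr=0.05)
lam = 1e-3                                                 # strength of the size penalty

for step in range(2000):
    opt.zero_grad()
    fit = torch.nn.functional.binary_cross_entropy_with_logits(net(x), y)
    size = sum(p.abs().sum() for p in net.parameters())    # proxy for "circuit size"
    (fit + lam * size).backward()
    opt.step()

print((net(x) > 0).int().flatten())                        # learned truth table: 0 1 1 0
```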

3Houshalter
Yes binary neural networks are super interesting because they can be made much more compact in hardware than floating point ops. However there isn't much (theoretical) advantage otherwise. Anything a circuit can do, an NN can do, and vice versa. A circuit size penalty is already a very common technique. It's called weight decay, where the synapses are encouraged to be as close to zero as possible. A synapse of 0 is the same as it not being there, which means the neural net parameters require less information to specify.
V_V80

He has ability to attract groups of people and write interesting texts. So he could attract good programmers for any task.

He has the ability to attract self-selected groups of people by writing texts that these people find interesting. He has shown no ability to attract, organize and lead a group of people to solve any significant technical task. The research output of SIAI/SI/MIRI has been relatively limited and most of the interesting stuff came out when he was not at the helm anymore.

4Gunnar_Zarncke
While this may be formally right, the question is what it shows (or should show). Because on the other hand MIRI does have quite some research output as well as impact on AI safety - and that is what they set out to do.
V_V60

EY could have such price if he invested more time in studying neural networks, but not in writing science fiction.

Has he ever demonstrated any ability to produce anything technically valuable?

0turchin
He has ability to attract groups of people and write interesting texts. So he could attract good programmers for any task.
V_V120

What I'm curious about is how much this reflects an attempt by AlphaGo to conserve computational resources.

If I understand correctly, at least according to the Nature paper, it doesn't explicitly optimize for this. Game-playing software is often perceived as playing "conservatively"; this is a general property of minimax search, and in the limit the Nash equilibrium consists of maximally conservative strategies.
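
A minimal toy sketch of why search-based play reads as conservative (the game and values below are invented; this is plain minimax, not AlphaGo's search): the chosen move is the one that maximizes the worst-case outcome over the opponent's replies, not the one with the flashiest best case.

```python
# Plain minimax on a toy game: the returned value is the best *guaranteed* outcome.
def minimax(state, depth, maximizing, moves, value):
    options = moves(state)
    if depth == 0 or not options:
        return value(state)
    results = [minimax(s, depth - 1, not maximizing, moves, value) for s in options]
    return max(results) if maximizing else min(results)

# Toy game: the state is a number, each move adds 1 or doubles it, higher is better for us.
moves = lambda s: [s + 1, s * 2] if s < 20 else []
value = lambda s: s
print(minimax(1, 4, True, moves, value))   # worst-case-optimal value, not the rosiest line
```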

but I was still surprised by the amount of thought that went into some of the moves.

Maybe these obvious moves weren't so obvious at that level.

3Vaniver
Sure. And I'm pretty low as amateurs go--what I found surprising was that there were ~6 moves where I thought "obviously play X," and 이 immediately played X in half of them and spent 2 minutes to play X in the other half of them. It wasn't clear to me if 이 was precomputing something he would need later, or was worried about something I wasn't, or so on. Most of the time I was thinking something like "well, I would play Y, but I'm pretty unconfident that's the right move" and then 이 or AlphaGo play something that are retrospectively superior to Y, or I was thinking something like "I have only the vaguest sense of what to do in this situation." So I guess I'm pretty well-calibrated, even if my skill isn't that great.
5Error
I don't know about that level, but I can think of at least one circumstance where I think far longer than would be expected over a forced move. If I've worked out the forced sequence in my head and determined that the opponent doesn't gain anything by it, but they play it anyway, I start thinking "Danger, Danger, they've seen something I haven't and I'd better re-evaluate." Most of the time it's nothing and they just decided to play out the position earlier than I would have. But every so often I discover a flaw in the "forced" defense and have to start scrabbling for an alternative.
V_V20

Thanks for the information.

V_V00

Would you label the LHC "science" or "engineering"?

2Gunnar_Zarncke
The LHC is multiple things:
  • a set of theoretical results describing what might happen under what physical circumstances
  • an application of said theory to a certain realizable sub-set of technological reality and the prediction of what happens then
  • an engineering effort to build a complex experimental apparatus
(and also a social process driving the people to do all this)
6ChristianKl
I think the science/engineering-distinction used by Douglas Knight and Lumifer provides no good model, so you have to ask them.
V_V20

Was Roman engineering really based on Greek science? And by the way, what is Greek science? If I understand correctly, the most remarkable scientific contributions of the Greeks were formal geometry and astronomy, but empirical geometry, which was good enough for the practical engineering applications of the time, was already well developed since at least the Egyptians, and astronomy didn't really have practical applications.

2Douglas_Knight
It is a lot easier to document that the Greeks had cutting-edge engineering than to prove that it was based on theoretical knowledge. Greek aqueducts and post-Greek Roman aqueducts were much better than pre-Greek Roman aqueducts. The process of building them may not have been better, but the choice of what to build was more sophisticated. Before the Greeks they just had water run downhill, requiring tunnels and bridges, afterwards they also ran water uphill. So the Romans definitely learned something from the Greeks. Some people think that they must have understood something about water pressure to do this, which would count as science. But there is no record of how they did it, neither theory, nor rules of thumb developed by trial and error. It is a great mystery that the surviving books by Roman aqueduct engineers don't seem adequate for running the aqueducts, let alone building them. (By "the Greeks" I mean the Hellenistic period of 300-150BC.) A better documented connection between theory and application is that Archimedes wrote a book on the theory of simple machines and invented the screw pump. However, that history is also controversial.
V_V50

Eventual diminishing returns, perhaps but probably long after it was smart enough to do what it wanted with Earth.

Why?

A drug that raised the IQ of human programmers would make the programmers better programmers.

The proper analogy is with a drug that raises the IQ of the researchers who invent the drugs that increase IQ. Does this lead to an intelligence explosion? Probably not. If the number of IQ points that you need to discover the next drug in a constant time increases faster than the number of IQ points that the next drug gives you, then you will r... (read more)
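
A toy iteration of that argument (all numbers invented): whether the recursion fizzles or runs away depends entirely on whether the IQ requirement for the next discovery grows faster than the IQ gain each discovery provides.

```python
# Toy model of the "drug that raises the researchers' IQ" recursion.
def run(iq, gain_per_drug, requirement_growth, steps=30):
    requirement = 100.0                     # IQ needed to find the next drug in fixed time
    for step in range(steps):
        if iq < requirement:                # researchers no longer smart enough: stall
            return f"stalled at step {step}, IQ {iq:.0f}"
        iq += gain_per_drug(iq)             # the new drug raises researcher IQ
        requirement *= requirement_growth   # the next drug is harder to find
    return f"still going after {steps} steps, IQ {iq:.0f}"

print(run(100, gain_per_drug=lambda iq: 5,         requirement_growth=1.10))  # stalls
print(run(100, gain_per_drug=lambda iq: 0.15 * iq, requirement_growth=1.10))  # runs away
```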

V_V10

For almost any goal an AI had, the AI would make more progress towards this goal if it became smarter.

True, but it is likely that there are diminishing returns in how much adding more intelligence can help with other goals, including the instrumental goal of becoming smarter.

As an AI became smarter it would become better at making itself smarter.

Nope, doesn't follow.

2James_Miller
Eventual diminishing returns, perhaps but probably long after it was smart enough to do what it wanted with Earth. A drug that raised the IQ of human programmers would make the programmers better programmers. Also, intelligence is the ability to solve complex problems in complex environments so it does (tautologically) follow.
V_V00

But what if a general AI could generate specialized narrow AIs?

How is that different from a general AI solving the problems by itself?

4Gunnar_Zarncke
It isn't. At least not in my model of what an AI is. But Mark_Friedenbach seems to operate under a model where this is less clear or the consequences of the capability of an AI creating these kind of specialized sub agents seem not to be taken into account enough.
V_V00

That's a 741-page book; can you summarize a specific argument?

1James_Miller
For almost any goal an AI had, the AI would make more progress towards this goal if it became smarter. As an AI became smarter it would become better at making itself smarter. This process continues. Imagine if it were possible to quickly make a copy of yourself that had a slightly different brain. You could then test the new self and see if it was an improvement. If it was you could make this new self the permanent you. You could do this to quickly become much, much smarter. An AI could do this.
V_V00

I'm asking for references because I don't have them. It's a shame that the people who are able, ability-wise, to explain the flaws in the MIRI/FHI approach

MIRI/FHI arguments essentially boil down to "you can't prove that AI FOOM is impossible".

Arguments of this form, e.g. "You can't prove that [snake oil/cryonics/cold fusion] doesn't work" , "You can't prove there is no God", etc. can't be conclusively refuted.

Various AI experts have expressed skepticism about an imminent super-human AI FOOM, pointing out that the capability r... (read more)

3James_Miller
I don't agree.
V_V00

This is a press release, though; lots of games have been advertised with similar claims that didn't live up to expectations when you actually played them.

The reason is that designing a universe with simple and elegant physical laws sounds cool on paper but it is very hard to do if you want to set an actually playable game in it, since most combinations of laws, parameters and initial conditions yield uninteresting "pathological" states. In fact this also applies to the laws of physics of our universe, and it is the reason why some people use the "fin... (read more)

0Lumifer
Another issue is too simple optimums. Human players are great at minmaxing game rules (=physics) and if the optimal behaviour is simple, well, the game's not fun any more.
V_V100

Video games with procedural generation of the game universe have existed since forever, what's new here?

5Kaj_Sotala
At least there was an interesting part reminiscent of Eliezer's Universal Fire: Eliezer: From the article: I think most procedurally generated games aren't that deeply interconnected with regard to their laws of physics.
V_V230
  • "Bayes vs Science": Can you consistently beat the experts in (allegedly) evidence-based fields by applying "rationality"? AI risk and cryonics are specific instances of this issue.

  • Can rationality be learned, or is it an essentially innate trait? If it can be learned, can it be taught? If it can be taught, do the "Sequences" and/or CFAR teach it effectively?

V_V-20

If the new evidence which is in favor of cryonics benefits causes no increase in adoption, then either there is also new countervailing evidence or changes in cost or non-adopters are the more irrational side.

No. If evidence is against cryonics, and it has always been this way, then the number of rational adopters should be approximately zero, thus approximately all the adopters should be the irrational ones.

As you say, the historical adoption rate seems to be independent of cryonics-related evidence, which supports the hypothesis that the adopters don't sign up because of an evidence-based rational decision process.

6gwern
No. People have partial information and there are some who will have beliefs, experiences, or data which makes it rational for them to believe and also irrational reasons; additional rational reasons should push a few people over the edge of the decision if rational reasons play any meaningful role in non-adopters. (If you want to mathematicize it, imagine it as a liability-threshold model.) Also no. I think you are not understanding my argument. Because all the new evidence is one-sided, we know the direction people should update in regardless of initial proportions of irrationality of either side. In the same way, we don't know for sure how irrational it was to believe in mind-body dualism in 1500 but we do know that all the evidence that has come in since has been decisively in favor of materialism, and if we saw a group which had the same rate of mind-body dualism in 2016 as 1500, we could be certain that they were deeply irrational on that topic. The absence of any change in the large initial fraction of non-adopters in response to all the new evidence over a long time period implies their judgement is far more driven by irrational reasons than adopters. (By definition everyone is either a adopter or non-adopter, no change in non-adopters implies no change in adopters.)
V_V00

4. You have a neurodegenerative disease: you can survive for years, but if you wait there will be little left to preserve by the time your heart stops.

4gwern
I saw that as falling under #3. There are treatments for dementia and Alzheimer's but they all suck and one can rationally prefer the risk of immediate death to losing it all. This comes up a lot linked with assisted-suicide, as does the attendant legal risks for oneself and the cryonics org (some of Mike Darwin's blog touches on the effects of aging, and I think Ettinger himself took the dehydration route a few years ago).
V_V00

If revival had already been demonstrated, then you would pretty much already know what form you are going to wake up in

0qmotus
Well, yeah, but whatever society can demonstrate that doesn't need to freeze people in the first place.
V_V00

Adoption is not about evidence.

Right. But the point is, who is in the wrong between the adopters and the non-adopters?

It can be argued that there was never good evidence to sign up for cryonics; therefore the adopters did it for irrational reasons.

6gwern
If the new evidence which is in favor of cryonics benefits causes no increase in adoption, then either there is also new countervailing evidence or changes in cost or non-adopters are the more irrational side. Since I can't think of any body of new research or evidence which should neutralize the many pro-cryonics lines of research over the past several decades, and the costs have remained relatively constant in real terms, that tends to leave the third option. (Alternatively, I could be wrong about whether non-adopters have updated towards cryonics; I wasn't around for the '60s or '70s, so maybe all the neuroscience and cryopreservation work really has made a dent and people in general are much more favorable towards cryonics than they used to be.)
V_V20

I'm not sure this distinction, while significant, would ensure "millions" of people wouldn't sign up.

Millions of people do sign up for various expensive and invasive medical procedures that offer them a chance to extend their lives a few years or even a few months. If cryonics demonstrated a successful revival, then it would be considered a life-saving medical procedure and I'm pretty confident that millions of people would be willing to sign up for it.

People haven't signed up for cryonics in droves because right now it looks less like a medic... (read more)

0qmotus
A major difference here is that if I sign up for those medical procedures, then I pretty much know what to expect: there is a slight chance that I get cured, and that's it. This is not the case with cryonics. I find it quite likely that cryonics would work, but there's hardly any certainty regarding what happens then: I might wake up in just about any form (in a biological body, as an upload) in just about any kind of future society. I would have hardly any control over the outcome whatsoever. Sure, maybe there would be many more who would sign up, but nevertheless I think it takes a very special kind of person to be ready to take such a leap into the unknown.
V_V00

The best setting for that is probably only 3-5 characters, not 20.

In NLP applications where Markov language models are used, such as speech recognition and machine translation, the typical setting is 3 to 5 words. 20 characters correspond to about 4 English words, which is in this range.

Anyway, I agree that in this case the order-20 Markov model seems to overfit (Googling some lines from the snippets in the post often locates them in an original source file, which doesn't happen as often with the RNN snippets). This may be due to the lack of regulariza... (read more)

V_V50

The fact that it is even able to produce legible code is amazing

Somewhat. Look at what happens when you generate code from a simple character-level Markov language model (that's just a lookup table that gives the probability of the next character conditioned on the last n characters, estimated by frequency counts on the training corpus).

An order-20 language model generates fairly legible code, with sensible use of keywords, identifier names and even comments. The main difference with the RNN language model is that the RNN learns to do proper indentation... (read more)
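
For concreteness, a minimal sketch of such a lookup-table model (the training file name is a placeholder):

```python
# Order-n character-level Markov model: count next-character frequencies per context,
# then sample from the counts to generate text.
import random
from collections import Counter, defaultdict

def train(corpus, n):
    counts = defaultdict(Counter)
    for i in range(len(corpus) - n):
        counts[corpus[i:i + n]][corpus[i + n]] += 1       # next-char counts per n-char context
    return counts

def generate(counts, seed, n, length=500):
    out = seed
    for _ in range(length):
        dist = counts.get(out[-n:])
        if not dist:                                      # unseen context: nothing to sample
            break
        chars, weights = zip(*dist.items())
        out += random.choices(chars, weights=weights)[0]
    return out

corpus = open("some_source_file.c").read()                # placeholder training corpus
model = train(corpus, n=20)
print(generate(model, seed=corpus[:20], n=20))
```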

2Houshalter
The difference with Markov models is they tend to overfit at that level. At 20 characters deep, you are just copy and pasting large sections of existing code and language. Not generating entirely unseen samples. You can do a similar thing with RNNs, by training them only on one document. They will be able to reproduce that document exactly, but nothing else. To properly compare with a Markov model, you'd need to first tune it so it doesn't overfit. That is, when it's looking at an entirely unseen document, its guess of what the next character should be is most likely to be correct. The best setting for that is probably only 3-5 characters, not 20. And when you generate from that, the output will be much less legible. (And even that's kind of cheating, since Markov models can't give any prediction for sequences it's never seen before.) Generating samples is just a way to see what patterns the RNN has learned. And while it's far from perfect, it's still pretty impressive. It's learned a lot about syntax, a lot about variable names, a lot about common programming idioms, and it's even learned some English from just code comments.
V_V50

You have to be more specific with the timeline. Transistors were invented in 1925 but received little interest due to many technical problems. It took three decades of research before the first commercial transistors were produced by Texas Instruments in 1954.

Gordon Moore formulated his eponymous law in 1965, while he was director of R&D at Fairchild Semiconductor, a company whose entire business consisted in the manufacture of transistors and integrated circuits. By that time, tens of thousands of transistor-based computers were in active commercial use.

0EHeller
It wouldn't have made a lot of sense to predict any doublings for transistors in an integrated circuit before 1960, because I think that is when they were invented.
V_V20

so a 10 year pro may be familiar with say 100,000 games.

That's 27.4 games a day, on average. I think this is an overestimate.

4jacob_cannell
It was my upper bound estimate, and if anything it was too low. A pro will grow up in a dedicated go school where there are hundreds of other players just playing go and studying go all day. Some students will be playing speed games, and some will be flipping through summaries of historical games in books/magazines and or on the web. When not playing, people will tend to walk around and spectate the other games (nowdays this is also trivial to do online). An experienced player can reconstruct some of the move history by just glancing at the board. So if anything, 27.4 games watched/skimmed/experienced per day is too low for the upper estimate.
2gwern
An East Asian Go pro will often have been an insei and been studying Go full-time at a school, and a dedicated amateur before that, so you can imagine how many hours a day they will be studying... (The intensiveness is part of why they dominate Go to the degree they do and North American & Europeans are so much weaker: start a lot of kids, start them young, school them 10 hours a day for years studying games and playing against each other and pros, and keep relentlessly filtering to winnow out anyone who is not brilliant.) I would say 100k is an overestimate since they will tend to be more closely studying the games and commentaries and also working out life-and-death problems, memorizing the standard openings, and whatnot, but they are definitely reading through and studying tens of thousands of games - similar to how one of the reasons chess players are so much better these days than even just decades ago is that computers have given access to enormous databases of games which can be studied with the help of chess AIs (Carlsen has benefited a lot from this, I understand). Also, while I'm nitpicking, AlphaGo trained on both the KGS and then self-play; I don't know how many games the self-play amounted to, but the appendix broke down the wallclock times by phase, and of the 4 weeks of wallclock time, IIRC most of it was spent on the self-play finetuning the value function. But if AlphaGo is learning from games 'only' more efficiently than 99%+ of the humans who play Go (Fan Hui was ranked in the 600s, there's maybe 1000-2000 people who earn a living as Go professionals, selected from the hundreds of thousands/millions of people who play), that doesn't strike me as much of a slur.
V_V60

In the brain, the same circuitry that is used to solve vision is used to solve most of the rest of cognition

And in a laptop the same circuitry that it is used to run a spreadsheet is used to play a video game.

Systems that are Turing-complete (in the limit of infinite resources) tend to have an independence between hardware and possibly many layers of software (a program running on a VM running on a VM running on a VM and so on). Things that look similar at some levels may have lots of differences at other levels, and thus things that look simple at some level... (read more)

5jacob_cannell
Exactly, and this a good analogy to illustrate my point. Discovering that the cortical circuitry is universal vs task-specific (like an ASIC) was a key discovery. Note I didn't say that we have solved vision to superhuman level, but this is simply not true. Current SOTA nets can achieve human-level performance in at least some domains using modest amounts of unsupervised data combined with small amounts of supervised data. Human vision builds on enormous amounts of unsupervised data - much larger than ImageNet. Learning in the brain is complex and multi-objective, but perhaps best described as self-supervised (unsupervised meta-learning of sub-objective functions which then can be used for supervised learning). A five year old will have experienced perhaps 50 million seconds worth of video data. Imagenet consists of 1 million images, which is vaguely equivalent to 1 million seconds of video if we include 30x amplification for small translations/rotations. The brain's vision system is about 100x larger than current 'large' vision ANNs. But If deepmind decided to spend the cash on that and make it a huge one off research priority, do you really doubt that they could build a superhuman general vision system that learns with a similar dataset and training duration? The foundation of intelligence is just inference - simply because universal inference is sufficient to solve any other problem. AIXI is already simple, but you can make it even simpler by replacing the planning component with inference over high EV actions, or even just inference over program space to learn approx planning. So it all boils down to efficient inference. The new exciting progress in DL - for me at least - is in understanding how successful empirical optimization techniques can be derived as approx inference update schemes with various types of priors. This is what I referred to as new and upcoming "Bayesian methods" - bayesian grounded DL.
V_V20

They spent three weeks to train the supervised policy and one day to train the reinforcement learning policy starting from the supervised policy, plus an additional week to extract the value function from the reinforcement learning policy (pages 25-26).

In the final system the only part that depends on RL is the value function. According to figure 4, if the value function is taken out the system still plays better than any other Go program, though worse than the human champion.

Therefore I would say that the system heavily depends on supervised training on a human-generated dataset. RL was needed to achieve the final performance, but it was not the most important ingredient.

V_V00

When EY says that this news shows that we should put a significant amount of our probability mass before 2050 that doesn't contradict expert opinions.

The point is how much we should update our AI future timeline beliefs (and associated beliefs about whether it is appropriate to donate to MIRI and how much) based on the current news of DeepMind's AlphaGo success.

There is a difference between "Gib moni plz because the experts say that there is a 10% probability of human-level AI within 2022" and "Gib moni plz because of AlphaGo".

0ChristianKl
I understand IlyaShpitser to claim that there are people who update their AI future timeline beliefs in a way that isn't appropriate because of EY statements. I don't think that's true.
V_V20

I wouldn't say that it's "mostly unsupervised" since a crucial part of their training is done in a traditional supervised fashion on a database of games by professional players.

But it's certainly much more automated than having a hand-coded heuristic.

3jacob_cannell
Humans also learn extensively by studying the games of experts. In Japan/China, even fans follow games from newspapers. A game might take an hour on average. So a pro with 10 years of experience may have played/watched upwards of 10,000 games. However, it takes much less time to read a game that has already been played - so a 10 year pro may be familiar with say 100,000 games. Considering that each game has 200+ moves, that roughly is a training set of order 2 to 20 million positions. AlphaGo's training set consisted of 160,000 games with 29 million positions, so the upper end estimate for humans is similar. More importantly, the human training set is far more carefully curated and thus of higher quality.
2Gunnar_Zarncke
The supervised part is only in the bootstrapping. The main learning happens in the self-play part.
V_V00

Even if I knew all possible branches of the game tree that originated in a particular state, I would need to know how likely any of those branches are to be realized in order to determine the current value of that state.

Well, the value of a state is defined assuming that the optimal policy is used for all the following actions. For tabular RL you can actually prove that the updates converge to the optimal value function/policy function (under some conditions). If NNs are used you don't have any convergence guarantees, but in practice the people at DeepMi... (read more)
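
Written out in standard RL notation (not from the thread), the definition and the convergence conditions being alluded to:

```latex
% The optimal value of a state is the fixed point of the Bellman optimality equation:
V^*(s) \;=\; \max_a \sum_{s'} P(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma\, V^*(s') \,\bigr]
% Tabular Q-learning converges to the corresponding Q^* provided every state-action
% pair is visited infinitely often and the step sizes \alpha_t satisfy
\sum_t \alpha_t = \infty, \qquad \sum_t \alpha_t^2 < \infty
```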

V_V10

And the many-worlds interpretation of quantum mechanics. That is, all EY's hobby horses. Though I don't know how common these positions are among the unquiet spirits that haunt LessWrong.

V_V10

Reward delay is not very significant in this task, since the task is episodic and fully observable, and there is no time preference; thus you can just play a game to completion without updating and then assign the final reward to all the positions.

In more general reinforcement learning settings, where you want to update your policy during the execution, you have to use some kind of temporal difference learning method, which is further complicated if the world states are not fully observable.

Credit assignment is taken care of by backpropagation, as usual in... (read more)
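
A schematic toy version of that episodic update (the network and "game" below are meaningless stand-ins; only the structure of the update matters): the single final outcome is used as the learning signal for every position of the game, and backpropagation distributes it through the network.

```python
# REINFORCE-style episodic update: every move in the game is credited with the
# final win/loss, then one gradient step is taken.
import torch

torch.manual_seed(0)
policy_net = torch.nn.Linear(4, 3)                 # toy policy: 4-d "position" -> 3 move logits
optimizer = torch.optim.SGD(policy_net.parameters(), lr=0.01)

def play_one_game():
    states, actions = [], []
    for _ in range(10):                            # 10 random positions/moves per "game"
        s = torch.randn(4)
        a = torch.multinomial(torch.softmax(policy_net(s), dim=-1), 1).item()
        states.append(s)
        actions.append(a)
    final_reward = 1.0 if torch.rand(1).item() > 0.5 else -1.0   # outcome only at the end
    return states, actions, final_reward

states, actions, final_reward = play_one_game()
optimizer.zero_grad()
loss = sum(-final_reward * torch.log_softmax(policy_net(s), dim=-1)[a]
           for s, a in zip(states, actions))       # same final reward credited to every move
loss.backward()                                    # backprop handles within-network credit assignment
optimizer.step()
```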

5RaelwayScot
I meant that for AI we will possibly require high-level credit assignment, e.g. experiences of regret like "I should be more careful in these kinds of situations", or the realization that one particular strategy out of the entire sequence of moves worked out really nicely. Instead it penalizes/enforces all moves of one game equally, which is potentially a much slower learning process. It turns out playing Go can be solved without much structure for the credit assignment processes, hence I said the problem is non-existent, i.e. there wasn't even need to consider it and further our understanding of RL techniques.
0Vaniver
Agreed, with the caveat that this is a stochastic object, and thus not a fully simple problem. (Even if I knew all possible branches of the game tree that originated in a particular state, I would need to know how likely any of those branches are to be realized in order to determine the current value of that state.)