JonahSinick comments on Tiling Agents for Self-Modifying AI (OPFAI #2) - Less Wrong

55 Post author: Eliezer_Yudkowsky 06 June 2013 08:24PM

Comment author: JonahSinick 01 July 2013 09:18:23PM 3 points [-]

It seems to me like relatively narrow progress on learning is likely to be relevant to AGI. It does seem plausible that e.g. machine learning research is not too much more relevant to AGI than progress in optimization or in learning theory or in type theory or perhaps a dozen other fields, but it doesn't seem very plausible that it isn't taking us closer to AGI in expectation.

I'd be interested in your response to the following, which I wrote in another context. I recognize that I'm far outside of my domain of expertise, and what I write should be read as inquisitive rather than argumentative:

The impression that I've gotten is that to date, impressive applications of computers to do tasks that humans do are based around some combination of

  1. Brute force computation
  2. Task specific algorithms generated by humans

In particular, they don't seem at all relevant to mimicking human inference algorithms.

As I said in my point #2 here: I find it very plausible that advances in narrow AI will facilitate the development of AGI by enabling experimentation.

The question that I'm asking is more: "Is it plausible that the first AGI will be based on filling in implementation details of current neural networks research programs, or current statistical inference research programs?"

Something worth highlighting is that researchers in algorithms have repeatedly succeeded in developing algorithms that solve NP-complete problems in polynomial time with very high probability, or that give very good approximations to solutions to problems in polynomial time where it would be NP-complete to get the solutions exactly right. But these algorithms can't be ported from one NP-complete problem to another while retaining polynomial running time. One has to deal with each algorithmic problem separately.
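
A concrete instance of the second phenomenon (a good polynomial-time approximation to a problem whose exact solution is NP-hard) is the classic greedy 2-approximation for minimum vertex cover. This sketch is purely illustrative and is not tied to any specific algorithm discussed in the thread:

```python
# Illustrative sketch: the classic greedy 2-approximation for Minimum
# Vertex Cover, an NP-hard problem. Repeatedly pick an edge with both
# endpoints uncovered and add both endpoints to the cover.
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# Triangle plus a pendant edge: the optimal cover {1, 2} has size 2,
# and the greedy cover is guaranteed to have size at most twice that.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
cover = vertex_cover_2approx(edges)
assert all(u in cover or v in cover for u, v in edges)
```

The factor-2 bound holds because the edges the loop selects form a matching, and any cover, including the optimal one, must contain at least one endpoint of each matched edge.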

From what I know, my sense is that the situation in narrow AI is similar, and that humans (in some vague sense) have a polynomial-time algorithm that's robust across different algorithmic tasks.

Comment author: paulfchristiano 02 July 2013 09:34:12AM *  3 points [-]

I don't really understand how "task specific algorithms generated by humans" differs from general intelligence. Humans choose a problem, and then design algorithms to solve the problem better. I wouldn't expect a fundamental change in this situation (though it is possible).

But these algorithms can't be ported from one NP-complete problem to another while retaining polynomial running time.

I think this is off. A single algorithm currently achieves the best known approximation ratio on all constraint satisfaction problems with local constraints (this includes most of the classical NP-hard approximation problems where the task is "violate as few constraints as possible" rather than "satisfy all constraints, with as high a score as possible"), and is being expanded to cover increasingly broad classes of global constraints. You could say "constraint satisfaction is just another narrow task" but this kind of classification is going to take you all the way up to human intelligence and beyond. Especially if you think 'statistical inference' is also a narrow problem, and that good algorithms for planning and inference are more of the same.
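
For a toy illustration of what an "approximation ratio on a constraint satisfaction problem" means (this is just the simplest baseline, not the SDP rounding scheme being referred to): a uniformly random assignment satisfies each 3-literal clause of a Max-3SAT instance with probability 7/8, giving a 7/8-approximation in expectation.

```python
import random

# Toy baseline, not the SDP rounding scheme discussed above: a uniformly
# random assignment satisfies each 3-literal clause with probability 7/8,
# so in expectation it satisfies 7/8 of the clauses of a Max-3SAT instance.
def random_assignment_score(clauses, num_vars, trials=2000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        assign = [rng.random() < 0.5 for _ in range(num_vars)]
        # A literal k > 0 means variable k-1; k < 0 means its negation.
        total += sum(
            any(assign[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
    return total / (trials * len(clauses))

clauses = [(1, 2, 3), (-1, 2, -3), (1, -2, 3), (-1, -2, -3)]
score = random_assignment_score(clauses, num_vars=3)
# score should be close to 7/8 = 0.875.
```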

Comment author: JonahSinick 02 July 2013 02:28:19PM 1 point [-]

I don't really understand how "task specific algorithms generated by humans" differs from general intelligence. Humans choose a problem, and then design algorithms to solve the problem better. I wouldn't expect a fundamental change in this situation (though it is possible).

All I'm saying here is that general intelligence can construct algorithms across domains, whereas my impression is that artificial intelligence that has matched or exceeded human performance to date hasn't been able to construct algorithms across domains.

General artificial intelligence should be able to prove:

  1. The Weil conjectures
  2. The geometrization conjecture
  3. Monstrous Moonshine
  4. The classification of finite simple groups
  5. The Atiyah-Singer Index Theorem
  6. The Virtual Haken Conjecture

and thousands of other such statements. My impression is that current research in AI is analogous to working on proving these things one at a time.

Working on the classification of finite simple groups could indirectly help you prove the Atiyah-Singer Index Theorem by leading to the discovery of relevant structures, but such work will only make a small dent in the problem of proving the Atiyah-Singer Index Theorem. Creating an algorithm that can prove these things (and that's not over-fitted to the data) is a very different problem from that of proving the theorems individually.

Do you think that the situation with AI is analogous or disanalogous?

A single algorithm currently achieves the best known approximation ratio on all constraint satisfaction problems with local constraints (this includes most of the classical NP-hard approximation problems where the task is "violate as few constraints as possible" rather than "satisfy all constraints, with as high a score as possible"), and is being expanded to cover increasingly broad classes of global constraints.

I'm not sure if I follow. Is the algorithm that you have in mind the conglomeration of all existing algorithms?

If so, it's entirely unclear how quickly the algorithm is growing relative to the problems that we're interested in.

Comment author: paulfchristiano 03 July 2013 08:24:38AM 3 points [-]

I'm not sure if I follow. Is the algorithm that you have in mind the conglomeration of all existing algorithms?

No, there is a single SDP rounding scheme that gets optimal performance on all constraint satisfaction problems (the best we know so far, and the best possible under the unique games conjecture).

Comment author: JonahSinick 03 July 2013 06:38:00PM 2 points [-]

Can you give a reference?

Comment author: paulfchristiano 03 July 2013 10:53:30PM 3 points [-]
Comment author: lukeprog 07 July 2013 09:58:31PM 1 point [-]
Comment author: JonahSinick 03 July 2013 06:41:22PM 0 points [-]

I'd be interested in your thoughts on this discussion post.

Comment author: jsteinhardt 03 July 2013 05:45:53AM 2 points [-]

I would disagree with the statement that our algorithms are all domain-specific. Often some amount of domain-specific knowledge is needed to design a good algorithm, but it is often quite minimal. For instance, my office-mate is building a parser for interpreting natural language semantics, and has taken zero linguistics classes (but has picked up some amount of linguistics knowledge from talks, etc.). Of course, he's following in the footsteps of people who do know linguistics, but the point is just that the methods people use tend to be fairly general despite requiring task-specific tuning.

I agree, of course, that there are few systems that work across multiple domains, but I'm not sure that that's a fundamental issue so much as a symptom of broader issues that surface in this context (such as latent variables and complex features).

Comment author: JonahSinick 03 July 2013 06:40:56PM 1 point [-]

Thanks Jacob. I'd be interested in your thoughts on this discussion post.

Comment author: gwern 01 July 2013 09:54:06PM 1 point [-]

Something worth highlighting is that researchers in algorithms have repeatedly succeeded in developing algorithms that solve NP-complete problems in polynomial time with very high probability, or that give very good approximations to solutions to problems in polynomial time where it would be NP-complete to get the solutions exactly right. But these algorithms can't be ported from one NP-complete problem to another while retaining polynomial running time. One has to deal with each algorithmic problem separately.

You can't do that? From random things like computer security papers, I was under the impression that you could do just that - convert any NP problem to a SAT instance and toss it at a high-performance commodity SAT solver with all its heuristics and tricks, and get an answer back.
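
The reduction described here can be sketched concretely. Below, graph 3-coloring (an NP-complete problem) is encoded as CNF clauses; in practice the clauses would be handed to an off-the-shelf SAT solver, but a brute-force search stands in so the sketch is self-contained:

```python
from itertools import product

# Sketch of the reduction: encode graph 3-coloring as CNF-SAT.
# SAT variable var(v, c) asserts "vertex v receives color c".
# A real SAT solver would consume these clauses; brute force stands in here.
def coloring_to_cnf(num_vertices, edges, colors=3):
    def var(v, c):
        return v * colors + c + 1  # 1-indexed SAT variable
    clauses = []
    for v in range(num_vertices):
        clauses.append([var(v, c) for c in range(colors)])  # some color
        for c1 in range(colors):
            for c2 in range(c1 + 1, colors):
                clauses.append([-var(v, c1), -var(v, c2)])  # at most one color
    for u, w in edges:
        for c in range(colors):
            clauses.append([-var(u, c), -var(w, c)])  # endpoints differ
    return clauses, num_vertices * colors

def brute_force_sat(clauses, num_vars):
    return any(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in cl) for cl in clauses)
        for bits in product([False, True], repeat=num_vars)
    )

# K4 needs four colors, so its encoding is unsatisfiable; a triangle is 3-colorable.
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
assert not brute_force_sat(*coloring_to_cnf(4, k4))
assert brute_force_sat(*coloring_to_cnf(3, [(0, 1), (1, 2), (0, 2)]))
```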

Comment author: Vaniver 01 July 2013 10:09:28PM 4 points [-]

You can't do that? From random things like computer security papers, I was under the impression that you could do just that - convert any NP problem to a SAT instance and toss it at a high-performance commodity SAT solver with all its heuristics and tricks, and get an answer back.

You can do this. Minor caveat: this works for overall heuristic methods, like "tabu search" or "GRASP", but many of the actual implementations you would see in the business world are tuned to the structure of the probable solution space. One of the traveling salesman problem solvers I wrote a while back would automatically discover groups of cities and move them around as a single unit: useful when there are noticeable clusters in the space of cities, not useful when there aren't. Those can lead to dramatic speedups (or final solutions that are dramatically closer to the optimal solution), but I don't think they translate well across reformulations of the problem.

Comment author: JonahSinick 01 July 2013 10:14:14PM *  3 points [-]

I'm not a subject matter expert here, and just going based on my memory and what some friends have said, but according to http://en.wikipedia.org/wiki/Approximation_algorithm,

NP-hard problems vary greatly in their approximability; some, such as the bin packing problem, can be approximated within any factor greater than 1 (such a family of approximation algorithms is often called a polynomial time approximation scheme or PTAS). Others are impossible to approximate within any constant, or even polynomial factor unless P = NP, such as the maximum clique problem.
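
As a concrete example of approximability for bin packing, here is first-fit decreasing, a simple polynomial-time heuristic (simpler than the PTAS the quote mentions), which is known to use at most about 11/9 times the optimal number of bins plus a small additive constant:

```python
# First-fit decreasing for bin packing: sort items by decreasing size and
# place each in the first bin with room. Not the PTAS the quote refers to,
# but a classic constant-factor heuristic (at most ~11/9 * OPT + O(1) bins).
def first_fit_decreasing(items, capacity):
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:  # place in the first bin with room
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])  # no bin fits: open a new one
    return bins

# Total size is 30, so 3 bins of capacity 10 are optimal; FFD also finds 3.
bins = first_fit_decreasing([5, 7, 5, 3, 4, 2, 4], capacity=10)
```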

Comment author: AlexMennen 01 July 2013 10:50:01PM *  0 points [-]

You can do that. But although such algorithms will produce correct answers to any NP problem when given correct answers to SAT, that does not mean that they will produce approximate answers to any NP problem when given approximate answers to SAT. (In fact, I'm not sure if the concept of an approximate answer makes sense for SAT, although of course you could pick a different NP-complete problem to reduce to.)

Edit: My argument only applies to algorithms that give approximate solutions, not to algorithms that give correct solutions with high probability, and reading your comment again, it looks like you may have been referring to the latter. You are correct that if you have a polynomial-time algorithm to solve any NP-complete problem with high probability, then you can get a polynomial-time algorithm to solve any NP problem with high probability. Edit 2: sort of; see discussion below.

Comment author: gwern 02 July 2013 03:56:33AM 1 point [-]

Oh, I see. I confused probabilistic algorithms with ones bounding error from the true optimal solution.

Comment author: JonahSinick 01 July 2013 11:39:01PM 1 point [-]

You are correct that if you have a polynomial-time algorithm to solve any NP-complete problem with high probability, then you can get a polynomial-time algorithm to solve any NP problem with high probability.

Can you give a reference?

Comment author: AlexMennen 02 July 2013 01:13:30AM 2 points [-]

If a problem is NP-complete, then by definition, any NP problem can be solved in polynomial time by an algorithm which is given an oracle that solves the NP-complete problem, which it is allowed to use once. If, in place of the oracle, you substitute a polynomial-time algorithm which solves the problem correctly 90% of the time, the algorithm will still be polynomial-time, and will necessarily run correctly at least 90% of the time.
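
The probability bookkeeping in this argument can be checked with a toy simulation (all names below are illustrative; the "reduction" is just a stand-in for the polynomial-time transformation plus one oracle query):

```python
import random

# Toy simulation of the bookkeeping above: a reduction that makes one
# oracle call is correct exactly when that call is, so substituting a
# solver that is right 90% of the time yields an overall algorithm that
# is right 90% of the time. With k calls, a union bound would give
# success probability at least 1 - 0.1*k.
def noisy_oracle(true_answer, rng, error=0.1):
    return true_answer if rng.random() >= error else not true_answer

def reduce_and_query(instance_answer, rng):
    # Stand-in for "transform the NP instance, then make one oracle query".
    return noisy_oracle(instance_answer, rng)

rng = random.Random(0)
trials = 10000
correct = sum(reduce_and_query(True, rng) is True for _ in range(trials))
rate = correct / trials  # close to 0.9
```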

However, as JoshuaZ points out, this requires that the algorithm solve every instance of the problem with high probability, which is a much stronger condition than just solving a high proportion of instances. In retrospect, my comment was unhelpful, since it is not known whether there are any algorithms that solve every instance of an NP-complete problem with high probability. I don't know how generalizable the known tricks for solving SAT are (although presumably they are much more generalizable than JoshuaZ's example).

Comment author: JonahSinick 02 July 2013 01:38:38AM 2 points [-]

In retrospect, my comment was unhelpful, since it is not known whether there are any algorithms that solve every instance of an NP-complete problem with high probability.

This is the key. If you had an algorithm that solved every instance of an NP-complete problem in polynomial time with high probability, you could generate a proof of the Riemann hypothesis with high probability! (Provided that the polynomial-time algorithm is pretty fast, and that the proof isn't too long.)

Comment author: JoshuaZ 01 July 2013 11:48:50PM 0 points [-]

It depends, I think, on what AlexMennen meant by this. If, for example, a single NP-complete problem is in BPP, then it is clear that NP is in BPP. Similar remarks apply to ZPP, and in both cases almost the entire polynomial hierarchy will collapse. The proofs here are straightforward.

If, however, Alex meant that one picks random instances of a specific NP-complete problem and that those can be solved deterministically, then Alex's claim seems wrong. Consider for example this problem: "If an input string of length n starts with exactly floor(n^(1/2)) zeros and then a 1, treat the remainder like an input string for 3-SAT. If the string starts with anything else, return instead the parity of the string." This is an NP-complete problem where we can solve almost all instances with high probability, since most instances are really just a silly P problem. But we cannot use this fact to solve another NP-complete problem (say, normal 3-SAT) with high probability.
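
This contrived problem can be written down directly; the dispatcher below takes a hypothetical 3-SAT solver as a parameter, and almost every input lands in the easy parity branch:

```python
import math

# Direct transcription of the contrived problem (the 3-SAT solver is a
# hypothetical parameter). Almost every input string falls into the easy
# parity branch, so an algorithm can be right on almost all instances
# while saying nothing about 3-SAT itself.
def contrived_problem(bits, three_sat_solver):
    n = len(bits)
    k = math.isqrt(n)  # floor(n^(1/2))
    if bits[:k] == [0] * k and n > k and bits[k] == 1:
        return three_sat_solver(bits[k + 1:])  # rare hard branch
    return sum(bits) % 2 == 1  # easy branch: parity of the string
```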

Comment author: AlexMennen 02 July 2013 01:17:40AM 0 points [-]

in both cases, almost the entire polynomial hierarchy will collapse

Why?

Comment author: JoshuaZ 02 July 2013 01:36:48AM *  0 points [-]

in both cases, almost the entire polynomial hierarchy will collapse

Why?

Well, in the easy case of ZPP, ZPP is contained in co-NP, so if NP is contained in ZPP then NP is contained in co-NP, in which case the hierarchy must collapse to the first level.

In the case of BPP, the details are slightly more subtle and require deeper results. If BPP contains NP, then Adleman's theorem implies that the entire polynomial hierarchy is contained in BPP. Since BPP is itself contained at a finite level of the hierarchy, this forces the hierarchy to collapse to that level.