ChosunOne

Comments

I think the factors that determine your reference class can be related to changes over time in the environment you inhabit, not just to how you are built. This is what I mean by "not necessarily related to reproduction". If I cloned a billion people with the same cloning vat over 1000 years, how would you then determine the reference class? But maybe something about the environment would narrow that reference class despite all the people being made by the same process (like the presence or absence of other things in the environment as they relate to the people coming out of the vat).

I think there is another assumption (or another way of framing your assumption; I'm not sure it is really distinct), which is that all people throughout time are alike. It's not necessarily groundless, but it is an assumption. There could be some symmetry-breaking factor that determines your position that doesn't simply align with "total humans that ever exist", and that would make "birth rank" an inappropriate way to determine what number you have. This factor doesn't have to be related to reproduction, though.
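
For concreteness, here is a minimal Python sketch of the standard self-sampling calculation that birth-rank arguments rely on. It bakes in exactly the symmetry questioned above: given N total humans, your birth rank is treated as a uniform draw from 1..N. The rank and the candidate totals are made-up illustrative numbers, not estimates, and the function name is hypothetical.

```python
from fractions import Fraction


def posterior_over_total(rank, candidate_totals):
    """Posterior P(N | rank) over candidate totals N, assuming a uniform prior over
    the candidates and the self-sampling likelihood P(rank | N) = 1/N for rank <= N."""
    likelihood = {n: Fraction(1, n) if rank <= n else Fraction(0) for n in candidate_totals}
    evidence = sum(likelihood.values())
    return {n: lk / evidence for n, lk in likelihood.items()}


# Hypothetical numbers (in billions of people): an observed birth rank of 60 and two
# candidate totals for how many humans will ever exist.
posterior = posterior_over_total(rank=60, candidate_totals=[200, 200_000])
for total, prob in posterior.items():
    print(f"N = {total}: {float(prob):.6f}")
```

The whole update rests on the 1/N likelihood, i.e. on treating yourself as a uniform draw from everyone who ever exists. If some symmetry-breaking factor makes that draw non-uniform, or the reference class is not simply "all humans", that is the line that stops holding.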

It still seems slightly fuzzy in that, other than in check or checkmate situations, no moves are fully mandatory, and e.g. recaptures may occasionally turn out to be the wrong move?

Indeed, it can be difficult to know when it is actually better not to continue the line versus when it is, but that is precisely what MCTS would help figure out. MCTS would do actual exploration of board states, and how it allocates its exploration budget across those states would be informed by the policy network. It's usually better to continue a line than not, so I would expect MCTS to spend most of its budget continuing the line, and during training the policy would be updated according to whether or not its recommendation resulted in more wins. Ultimately, though, the policy network is probably storing a fuzzy pattern matcher for good board states (perhaps encoding common lines, or interpolations of lines encountered by the MCTS) that it can use to guide the search more effectively by giving it an appropriate score.
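
To illustrate how a policy can steer where MCTS spends its budget while the search still gets the final say, here is a toy one-ply sketch using the PUCT selection rule from AlphaZero, the approach Leela Chess Zero is built on. It is not Leela's actual implementation: the `Node` class, the `c_puct` value, the move names, priors, and fixed "values" are all hypothetical stand-ins, and a real engine would expand deeper and call a value network instead of reading a table.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float              # policy network's probability for the move leading here (P)
    visits: int = 0           # simulations that passed through this node (N)
    value_sum: float = 0.0    # sum of values backed up through this node
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        # Mean backed-up value; unvisited nodes default to 0 in this sketch.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float) -> float:
    # Q (how good the move has looked so far) plus an exploration bonus that is
    # proportional to the policy prior and shrinks as the move gets visited.
    bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + bonus


def select_child(parent: Node, c_puct: float) -> str:
    # Spend the next simulation on the move with the highest PUCT score.
    return max(parent.children, key=lambda move: puct_score(parent, parent.children[move], c_puct))


def run_one_ply_search(priors: dict, toy_values: dict, simulations: int = 100, c_puct: float = 1.5) -> dict:
    """Toy one-ply MCTS: returns how many simulations each root move received."""
    root = Node(prior=1.0)
    root.children = {move: Node(prior=p) for move, p in priors.items()}
    for _ in range(simulations):
        move = select_child(root, c_puct)
        child = root.children[move]
        # A real engine would expand the tree and evaluate a new position with the
        # value head; here each move just returns a fixed, made-up value.
        child.visits += 1
        child.value_sum += toy_values[move]
        root.visits += 1
    return {move: child.visits for move, child in root.children.items()}


priors = {"recapture": 0.9, "quiet_move": 0.1}
print(run_one_ply_search(priors, {"recapture": 0.5, "quiet_move": -0.2}))
print(run_one_ply_search(priors, {"recapture": -0.6, "quiet_move": 0.4}))
```

With these made-up numbers, the first call sends almost all simulations to the recapture the policy favors, while in the second call, where the recapture is actually bad, the search shifts most of its budget to the quiet move despite its low prior. That is roughly the division of labor I mean: the policy suggests where to look, and the search decides what is actually good.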

To be clear, I don't think a transformer is completely incapable of doing any search, just that it is probably not learning to do it in this case and is probably pretty inefficient at doing it when prompted to.

In chess, a "line" is a sequence of moves that are hard to interrupt. There are fairly obvious moves you have to play or else you are simply losing (such as recapturing a piece, moving your king out of check, delivering checkmate, etc.). Leela uses the neural network more for policy, which means scoring the candidate moves in a given board position; the MCTS can then use those scores to decide whether to prune a direction or explore that part of the tree further. So it makes sense that Leela would have an embedding of powerful lines as part of its heuristic, since it isn't doing the main work of search. It's more pattern recognition on the board state, so it can learn to recognize the kinds of lines that are useful and whether or not they are "present" in the current board state. It gets this information from the MCTS system as it trains and compresses the "triggers" into its evaluations of earlier board states, which is what this paper explores.

It's very cool work and a very cool result, but I feel it's too strong to say that the policy network is doing search as opposed to recognizing lines from its training at earlier board states.

The cited paper states plainly in Section 5 (Conclusion, Limitations):


(2) We focus on look-ahead along a single line of play; we do not test whether Leela compares multiple different lines of play (what one might call search).  ... (4) Chess as a domain might favor look-ahead to an unusually strong extent.

The paper is more about how Leela evaluates a given line than about any kind of search, and this makes sense. Pattern recognition is an extremely important part of playing chess (speaking as a player myself), and here it is embedded in another system that does the actual search, namely Monte Carlo Tree Search. So it isn't surprising that it has learned to look ahead in a straight line, since that's what all of its training experience is going to entail. If transformers were any good at doing the search themselves, I would expect to see chess bots that play well without employing something like MCTS.
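
The distinction the paper draws between look-ahead along a single line and search (comparing multiple lines) can be shown on a tiny game tree. This is a hypothetical sketch: the tree, its move names and leaf values, and the "pick the first-listed move" rule standing in for a policy are all made up, and none of it describes Leela's internals.

```python
def follow_single_line(tree, pick_move):
    """Look-ahead along one line: keep playing the move `pick_move` chooses,
    without ever comparing it against the alternatives."""
    line = []
    while isinstance(tree, dict):
        move = pick_move(tree)
        line.append(move)
        tree = tree[move]
    return line, tree


def minimax(tree, maximizing=True):
    """Search: compare every line, assuming each side picks its best reply.
    Leaf values are from the first player's point of view."""
    if not isinstance(tree, dict):
        return tree
    values = [minimax(subtree, not maximizing) for subtree in tree.values()]
    return max(values) if maximizing else min(values)


def best_move(tree):
    # The opponent replies after our move, so each subtree is a minimizing node.
    return max(tree, key=lambda move: minimax(tree[move], maximizing=False))


def first_listed(node):
    # Hypothetical stand-in for "the most natural-looking move" a policy might suggest.
    return next(iter(node))


# Hypothetical two-ply tree; numbers are made-up evaluations for the side to move.
toy_tree = {
    "sacrifice": {"decline": 3.0, "accept": -1.0},
    "develop": {"reply_a": 0.2, "reply_b": 0.3},
}

print(follow_single_line(toy_tree, first_listed))
print(minimax(toy_tree), best_move(toy_tree))
```

Following the single "natural" line makes the sacrifice look winning, while comparing the alternatives shows the opponent can simply accept it and the quieter move is better. Evaluating one line well, which is what the paper probes, is not the same thing as choosing between lines.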

My crux is that LLMs are inherently bad at search tasks over a new domain. Thus, I don't expect scaling up LLMs to improve search.

Anecdotal evidence: I've used LLMs extensively, and my experience is that LLMs are great at retrieval but terrible at suggestion when it comes to ideas. You usually get something resembling an amalgamation of Google searches rather than suggestions born of some kind of insight.

To your question of what to do if you are outmatched and only have an ASI at your disposal: I think the most logical thing to do is "do what the ASI tells you to". The problem is that we have no way of predicting the outcomes if there is truly an ASI in the room. If it's a superintelligence, it is going to have better suggestions than anything you can come up with.

Then I wonder, at what point does that matter?  Or more specifically, when does that matter in the context of ai-risk?

Clearly there is some relationship between something like "more compute" and "more intelligence", since something too simple cannot be intelligent, but I don't know where that relationship breaks down. Evolution clearly found a path for optimizing intelligence via proxy in our brains, and I think the fear is that you may yet be able to go quite a bit further than human-level intelligence before extra compute stops delivering meaningfully more intelligence in the way your post describes.

It seems premature to reject the orthogonality thesis of optimizing for things that "obviously bring more intelligence" before they start to break down.

So if I understand your point correctly, you expect something like "give me more compute" to fail at some point to deliver more intelligence, since intelligence isn't just "more compute"?
