Some small corrections/additions to my section ("Altair agent foundations"). I'm currently calling it "Dovetail research". That's not publicly written anywhere yet, but if it were listed as that here, it might help people who are searching for it later this year.
Which orthodox alignment problems could it help with?: 9. Humans cannot be first-class parties to a superintelligent value handshake
I wouldn't put number 9. Not intended to "solve" most of these problems, but is intended to help make progress on understanding the nature of the problems through...
I guess one thing I want to know is like... how exactly does the scoring work? I can imagine something like, they ran the model a zillion times on each question, and if any one of the answers was right, that got counted in the light blue bar. Something that plainly silly probably isn't what happened, but it could be something similar.
If it actually just submitted one answer to each question and got a quarter of them right, then I think it doesn't particularly matter to me how much compute it used.
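To make the worry concrete, here is a minimal sketch of how much the two scoring rules can differ. Everything in it is hypothetical: the per-question success rate of 8% is just my read of the dark blue bar, the sample count `k` is made up, and it assumes every question is equally hard and samples are independent, which is surely false in practice. It is not a claim about how the benchmark was actually scored.

```python
import random

def single_attempt_accuracy(p_correct, n_questions, rng):
    """Fraction of questions solved when exactly one answer is submitted each."""
    solved = sum(rng.random() < p_correct for _ in range(n_questions))
    return solved / n_questions

def any_of_k_accuracy(p_correct, n_questions, k, rng):
    """Fraction of questions counted as solved if ANY of k samples is right."""
    solved = 0
    for _ in range(n_questions):
        if any(rng.random() < p_correct for _ in range(k)):
            solved += 1
    return solved / n_questions

def expected_any_of_k(p_correct, k):
    """Closed form: chance that at least one of k independent samples is right."""
    return 1 - (1 - p_correct) ** k

# Hypothetical numbers only: p_correct=0.08 and k=4 are assumptions, not reported values.
rng = random.Random(0)
print("one attempt:", single_attempt_accuracy(0.08, 2000, rng))
print("any of 4   :", any_of_k_accuracy(0.08, 2000, 4, rng))
print("closed form:", expected_any_of_k(0.08, 4))
```

Under these toy assumptions, a handful of samples per question is enough to turn a single-digit score into something in the neighborhood of a quarter, which is why I'd like to know which rule the light blue bar uses.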
I wish they would tell us what the dark vs light blue means. Specifically, for the FrontierMath benchmark, the dark blue looks like it's around 8% (rather than the light blue at 25.2%). Which like, I dunno, maybe this is nitpicking, but 25% on FrontierMath seems like a BIG deal, and I'd like to know how much to be updating my beliefs.
From an apparent author on reddit:
[Frontier Math is composed of] 25% T1 = IMO/undergrad style problems, 50% T2 = grad/qualifying exam style problems, 25% T3 = early researcher problems
The comment was responding to a claim that Terence Tao said he could only solve a small percentage of questions, but Terence was only sent the T3 questions.
things are almost never greater than the sum of their parts Because Reductionism
Isn't it more like, the value of the sum of the things is greater than the sum of the value of each of the things? That is, $V\left(\sum_i x_i\right) > \sum_i V(x_i)$ (where perhaps $V$ is a utility function). That seems totally normal and not-at-all at odds with Reductionism.
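As a toy instance (the choice of $V$ here is mine, purely for illustration): take $V(x) = x^2$ on positive quantities. Then for two parts $x_1, x_2 > 0$,

```latex
V(x_1 + x_2) = (x_1 + x_2)^2 = x_1^2 + x_2^2 + 2 x_1 x_2 \;>\; x_1^2 + x_2^2 = V(x_1) + V(x_2).
```

The extra $2 x_1 x_2$ is an interaction term, but it is still computed entirely from the parts, so nothing non-reductionist is going on.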
I'd vote for removing the stage "developing some sort of polytime solution" and just calling 4 "developing a practical solution". I think listing that extra step is coming from the perspective of someone who's more heavily involved in complexity classes. We're usually interested in polynomial time algorithms because they're usually practical, but there are lots of contexts where practicality doesn't require a polynomial time algorithm, or really, where we're just not working in a context where it's natural to think in terms of algorithms with run-times.
Thank you for writing this! Your description in the beginning about trying to read about the GRT and coming across a sequence of resources, each of which didn't do quite what you wanted, is a precise description of the path I also followed. I gave up at the end, wishing that someone would write an explainer, and you have written exactly the explainer that I wanted!
Positive feedback: I am happy to see the comment karma arrows pointing up and down instead of left and right. I have some degree of left-right confusion and was always clicking and unclicking my comment votes to figure out which was up and which was down.
Also appreciate that the read time got put back into main posts.
(Comment font stuff looks totally fine to me, both before and after this change.)
[Some thoughts that are similar but different to my previous comment;]
I suspect you can often just prove the behavioral selection theorem and structural selection theorem in separate, almost independent steps.
Behavior essentially serves as an "interface", and a given behavior can be implemented by any number of different structures. So it would make sense that you need to prove something about structure separately (and t...
It's maybe also worth saying that any other description method is a subset of programs (or is incomputable and therefore not what real-world AI systems are). So if the theoretical issues in AIT bother you, you can probably make a similar argument using a programming language with no while loop, or I dunno, finite MDPs whose probability distributions are Gaussian with finite parameter descriptions.
Yeah, I think structural selection theorems matter a lot, for reasons I discussed here.
This is also one reason why I continue to be excited about Algorithmic Information Theory. Computable functions are behavioral, but programs (= algorithms) are structural! The fact that programs can be expressed in the homogeneous language of finite binary strings gives a clear way to select for structure; just limit the length of your program. We even know exactly how this mathematical parameter translates into real-world systems, because we can know exactly how many bi...
I know that there's something called the Lyapunov exponent. Could we "diminish the chaos" if we use logarithms, like with the Richter scale for earthquakes?
This is a neat question. I think the answer is no, and here's my attempt to describe why.
The Lyapunov exponent measures the difference between the trajectories over time. If your system is the double pendulum, you need to be able to take two random states of the double pendulum and say how different they are. So it's not like you're measuring the speed, or the length, or something like that. And if you ...
It possesses this subjective element (what we consider to be negligible differences) that seems to undermine its standing as a legitimate mathematical discipline.
I think I see what you're getting at here, but no, "chaotic" is a mathematical property that systems (of equations) either have or don't have. The idea behind sensitive dependence on initial conditions is that any difference, no matter how small, will eventually lead to diverging trajectories. Since it will happen for arbitrarily small differences, it will definitely happen for whatever difference...
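Here is a minimal sketch of the "any difference, no matter how small" point, using the logistic map at r = 4 as a stand-in chaotic system. That system, the starting point, and the separation threshold of 0.1 are all my own choices for illustration, not anything from the book. It counts how many steps two nearby trajectories take to end up visibly far apart.

```python
def logistic(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)

def steps_until_separated(x0, gap, threshold=0.1, max_steps=1000):
    """Number of steps until trajectories from x0 and x0 + gap differ by > threshold.

    Returns None if they never separate within max_steps.
    """
    a, b = x0, x0 + gap
    for step in range(1, max_steps + 1):
        a, b = logistic(a), logistic(b)
        if abs(a - b) > threshold:
            return step
    return None

# Shrinking the initial gap delays the divergence but does not prevent it.
for gap in (1e-3, 1e-6, 1e-9, 1e-12):
    print(gap, steps_until_separated(0.2, gap))
```

Making the initial gap a million times smaller only buys a modest number of extra steps, so whichever difference you decide is "negligible", the trajectories still end up far apart.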
The paper Gleick was referring to is this one, but it would be a lot of work to discern whether it was causal in getting telephone companies to do anything different. It sounds to me like the paper is saying that the particular telephone error data they were looking at could not be well-modeled as IID, nor could it be well-modeled as a standard Markov chain; instead, it was best modeled as a statistical fractal, which corresponds to a heavy-tailed distribution somehow.
Definitely on the order of "tens of hours", but it'd be hard to say more specifically. Also, almost all of that time (at least for me) went into learning stuff that didn't go into this post. Partly that's because the project is broader than this post, and partly because I have my own research priority of understanding systems theory pretty well.
Huh, interesting! So the way I'm thinking about this is, your loss landscape determines the attractor/repellor structure of your phase space (= network parameter space). For a (reasonable) optimization algorithm to have chaotic behavior on that landscape, it seems like the landscape would either have to have 1) a positive-measure flat region, on which the dynamics were ergodic, or 2) a strange attractor, which seems more plausible.
I'm not sure how that relates to the above link; it mentions the parameters "diverging", but it's not clear to me how neural network weights can diverge; aren't they bounded?
I'm curious about this part;
even though the motion of the trebuchet with sling isn't chaotic during the throw, it can be made chaotic by just varying the initial conditions, which rules out a simple closed form solution for non-chaotic initial conditions
Do you know what theorems/whatever this is from? It seems to me that if you know that "throws" constitute a subset of phase space that isn't chaotic, then you should be able to have a closed-form solution for those trajectories.
My overall review is, seems fine, some pros and some cons, mostly looks/feels the same to me. Some details;
There is a little crackpot voice in my head that says something like, "the real numbers are dumb and bad and we don't need them!" I don't give it a lot of time, but I do let that voice exist in the back of my mind trying to work out other possible foundations. A related issue here is that it seems to me that one should be able to have a uniform probability distribution over a countable set of numbers. Perhaps one could do that by introducing infinitesimals.
One model I have is that when things are exponentials (or S-curves), it's pretty hard to tell when you're about to leave the "early" game, because exponentials look the same when scaled. If every year has 2x as much activity as the previous year, then every year feels like the one that was the big transition.
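A quick sketch of why every year feels like the transition year under this model. The 2x growth rate is the one from the paragraph above; the starting level and the number of years are arbitrary assumptions. For each year it computes what fraction of all activity so far happened in that year alone.

```python
def yearly_activity(n_years, growth=2.0, start=1.0):
    """Activity per year under pure exponential growth (toy model)."""
    return [start * growth ** t for t in range(n_years)]

def share_of_history_in_latest_year(activity):
    """For each year, the fraction of all activity so far that happened in that year."""
    shares = []
    total = 0.0
    for a in activity:
        total += a
        shares.append(a / total)
    return shares

activity = yearly_activity(20)
for year, share in enumerate(share_of_history_in_latest_year(activity)):
    print(year, round(share, 4))
```

After the first few years the share settles near one half and stays there: in any given year, roughly as much has happened in the last twelve months as in all prior history combined, so the view from inside looks the same whether you're in year 5 or year 15.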
For example, it's easy to think that AI has "gone mainstream" now. Which is true according to some order of magnitude. But even though a lot of politicians are talking about AI stuff more often, it's nowhere near the top of the list for most of them. I...
I'm noticing what might be a miscommunication/misunderstanding between your comment and the post and Kuhn. It's not that the statement of such open problems creates the paradigm; it's that solutions to those problems create the paradigm.
The problems exist because the old paradigms (concepts, methods etc) can't solve them. If you can state some open problems such that everyone agrees that those problems matter, and whose solution could be verified by the community, then you've gotten a setup for solutions to create a new paradigm. A solution will necessari...
[Continuing to sound elitist,] I have a related gripe/hot take that comments give people too much karma. I feel like I often see people who are "noisy" in that they comment a lot and have a lot of karma from that,[1] but have few or no valuable posts, and who I also don't have a memory of reading valuable comments from. It makes me feel incentivized to acquire more of a habit of using LW as a social media feed, rather than just commenting when a thought I have passes my personal bar of feeling useful.
Note that self-karma contributes to a comments pos
I think the guiding principle behind whether or not scientific work is good should probably look something more like “is this getting me closer to understanding what’s happening”
One model that I'm currently holding is that Kuhnian paradigms are about how groups of people collectively decide that scientific work is good, which is distinct from how individual scientists do or should decide that scientific work is good. And collective agreement is way more easily reached via external criteria.
Which is to say, problems are what establishes a paradigm. It's way...
For anyone reading this comment thread in the future, Dalcy wrote an amazing explainer for this paper here.