I don't understand why Chollet thinks the smart child and the mediocre child are doing categorically different things. Why can't the mediocre child be GPT-4, and the smart child GPT-6? The deflationary analogies Chollet and others draw to explain away the success of deep learning seem to me sufficient to explain what the human brain does too, and it's not clear that a different category of mind will or can ever exist (I'm not claiming it can't; I'm saying Chollet's distinction is not evidenced).
Chollet points to real shortcomings of modern deep learning systems...
That is closer to what I meant, but it isn't quite what SLT says. The architecture doesn't need to be biased toward the target function's complexity. It just needs to always prefer simpler fits to more complex ones.
This is why the neural redshift paper says something different from SLT. It says that neural nets that generalize well don't just have a simplicity bias; they have a bias for functions with complexity similar to the target function's. This calls mesaoptimization into question, because although mesaoptimization is favored by a simplicity bias, it is not necessarily favored by a bias toward the same complexity as the target function.
I think the predictions SLT makes are different from the results in the neural redshift paper. For example, if you use tanh instead of ReLU the simplicity bias is weaker. How does SLT explain/predict this? Maybe you meant that SLT predicts that good generalization occurs when an architecture's preferred complexity matches the target function's complexity?
The explanation you give sounds like a different claim, however.
...If you go to a random point in the loss landscape, you very likely land in a large region implementing the same behaviour, meaning the network
Neural Redshift: Random Networks are not Random Functions shows that even randomly initialized neural nets tend to be simple functions (measured by frequency, polynomial order and compressibility), and that this bias can be partially attributed to ReLUs. Previous speculation on simplicity biases focused mostly on SGD, but this is now clearly not the only contributor.
The authors propose that good generalization occurs when an architecture's preferred complexity matches the target function's complexity. We should think about how compatible this is with our p...
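As a rough illustration of the kind of frequency-based measure involved (a toy sketch of my own, not the paper's code; the network widths and the frequency cutoff are arbitrary choices), one can compare the output spectra of randomly initialized ReLU and tanh MLPs on a 1D grid:

```python
# Toy sketch (not the paper's code): compare the Fourier spectra of randomly
# initialized MLPs with ReLU vs tanh activations on a 1D input grid. A stronger
# simplicity bias should show up as spectral mass concentrated at low frequencies.
import numpy as np

def random_mlp_output(x, activation, widths=(64, 64, 64), seed=0):
    rng = np.random.default_rng(seed)
    h = x.reshape(-1, 1)
    for w in widths:
        W = rng.normal(0.0, np.sqrt(2.0 / h.shape[1]), size=(h.shape[1], w))
        b = rng.normal(0.0, 0.1, size=w)
        h = activation(h @ W + b)
    W_out = rng.normal(0.0, np.sqrt(1.0 / h.shape[1]), size=(h.shape[1], 1))
    return (h @ W_out).ravel()

def low_frequency_fraction(y, cutoff=10):
    # Fraction of spectral mass below an (arbitrary) low-frequency cutoff.
    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    return spectrum[:cutoff].sum() / spectrum.sum()

x = np.linspace(-3, 3, 1024)
relu = lambda z: np.maximum(z, 0)
for name, act in [("relu", relu), ("tanh", np.tanh)]:
    fracs = [low_frequency_fraction(random_mlp_output(x, act, seed=s)) for s in range(20)]
    print(name, round(float(np.mean(fracs)), 3))
```

If the redshift claim is right, the ReLU networks should concentrate more of their spectral mass at low frequencies than the tanh networks do, which also bears on the tanh question above.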
I dropped out one month ago. I don't know anyone else who has dropped out. My comment recommends students consider dropping out on the grounds that it seemed like the right decision for me, but it took me a while to realize this was even a choice.
So far my experience has been pleasant. I am ~twice as productive. The total time available to me is ~2.5-3x what I had before. The excess time lets me get a healthy amount of sleep and play videogames without sacrificing my most productive hours. I would make the decision again, and earlier if I could.
More people should consider dropping out of high school, particularly if they:
In most places the minimum legal school-leaving age is younger than the typical age of graduation, so you are not legally obligated to attend school until you graduate. Many continue because it's normal, but some brief analysis could reveal that graduating is not worth the investment for you.
Some common objections I heard:
Why finish?
The mis...
Yes, your description of my hypothetical is correct. I think it's plausible that approximating things that happened in the past is computationally easier than breaking some encryption, especially if the information about the past is valuable even if it's noisy. I strongly doubt my hypothetical will materialize, but I think it is an interesting problem regardless.
My concern with approaches like the one you suggest is that they're restricted to small parts of the universe, so with enough data it might be possible to fill in the gaps.
Present cryptography becomes obsolete when the past can be approximated. Simulating the universe at an earlier point and running it forward to extract information before it's encrypted is a basic, but difficult, way to do this. For some information even a fuzzy approximation could cause damage if made public. How can you protect information when your adversary can simulate the past?
The information must never exist as plaintext in the past. A bad way to do this is to make the information future-contingent. Perhaps it could be acausally inserted ...
We are recruiting people interested in using Rallypoint in any way. The form has an optional question about what you hope to get out of using Rallypoint. Even if you don't plan on contributing to bounties or claiming them and just want to see how others use Rallypoint, we are still interested in your feedback.
Yes. If the feedback from the beta is that people find Rallypoint useful, we will do a public release and development will continue. I want to focus on getting the core bounty infrastructure very refined before adding many extra features. That infrastructure could likely be adapted easily to more typical crowdfunding and a few other applications, but that is lower on the priority list than getting bounties right.
I don't understand the distinction you draw between free agents and agents without freedom.
If I build an expected utility maximizer with a preference for the presence of some physical quantity, that surely is not a free agent. If I build some agent with the capacity to modify a program which is responsible for its conversion from states of the world to scalar utility values, I assume you would consider that a free agent.
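To make the contrast concrete, here is a toy sketch of my own (the class and field names, e.g. "paperclips", are just placeholders):

```python
# Toy contrast (my own construction, hypothetical names): a fixed expected
# utility maximizer versus an agent that can rewrite the program mapping
# world states to scalar utilities.

class FixedUtilityAgent:
    """Maximizes a hard-coded utility: the amount of some physical quantity."""
    def utility(self, state):
        return state["paperclips"]  # the preference is baked in and unmodifiable

    def choose(self, candidate_states):
        return max(candidate_states, key=self.utility)

class SelfModifyingAgent:
    """Holds its utility function as data it is able to overwrite."""
    def __init__(self, utility_program):
        self.utility_program = utility_program  # any state -> float function

    def rewrite_utility(self, new_program):
        self.utility_program = new_program  # the capacity at issue

    def choose(self, candidate_states):
        return max(candidate_states, key=self.utility_program)

states = [{"paperclips": 3, "leisure": 9}, {"paperclips": 7, "leisure": 1}]
fixed = FixedUtilityAgent()
free = SelfModifyingAgent(lambda s: s["paperclips"])
free.rewrite_utility(lambda s: s["leisure"])  # it can change what it optimizes for
print(fixed.choose(states), free.choose(states))
```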
I am reminded of E.T. Jaynes' position on the notion of 'randomization', which I will summarize as "a term to describe a process we ...
I believe you misinterpreted the quote from disturbance. They were implying that they would bring about AGI at the last moment at which their brain would still be salvageable by AGI, so that they could be repaired, presumably in expectation of immortality.
I also don't think the perspective that we would likely fail as a civilization without AGI is common on LessWrong. I would guess that most of us would expect a smooth-ish transition to The Glorious Future in worlds where we coordinate around [as in don't build] AI. In my opinion the post is good even without this claim, however.
models that are too incompetent to think through deceptive alignment are surely not deceptively aligned.
Is this true? In Thoughts On (Solving) Deep Deception, Jozdien gives the following example that suggests otherwise to me:
...Back in 2000, a computer scientist named Charles Ofria was studying the evolution of simulated organisms. He wanted to limit their replication rate, so he programmed the simulation to pause after each mutation, measure the mutant’s replication rate in an isolated test environment, and delete the mutant if it replicated faster than its
unless, by some feat of brilliance, this civilization pulls off some uncharacteristically impressive theoretical triumphs
Are you able to provide an example of the kind of thing that would constitute such a theoretical triumph? Or, if not, the closest approximation in the form of something that exists currently?
I'm in high school myself and am quite invested in AI safety. I'm not sure whether you're requesting advice for high school as someone interested in LW, or for LW and associated topics as someone attending high school. I will try to assemble a response to accommodate both possibilities.
Absorbing yourself in topics like x-risk can make school feel like a waste of time. This seems to me to be because school is mostly a waste of time (a position I held before becoming interested in AI safety), but disengaging from the practice entirely also feels inc...
I expect agentic simulacra to arise without anyone intentionally simulating them: agents are just generally useful for solving prediction problems, and across millions of predictions (as would be expected of a product on the order of ChatGPT, or future successors), agentic simulacra are likely to appear. Even if these agents are only approximations, predicting the behaviors of approximated agents could still result in their preferences being satisfied in the real world (as described in the Hubinger post).
The problem I'm interested in is how you ensure that all subsequent agentic simulacra (whether they arise intentionally or otherwise) are safe, which seems difficult to verify formally due to the Löbian Obstacle.
Which part specifically are you referring to as being overly complicated? What I take the primary assertions of the post to be are:
The issue I have with pivotal act models is that they presume an aligned superintelligence would be capable of bootstrapping its capabilities in such a way that it could perform that act before the creation of the next superintelligence. Soft takeoff seems to be a very popular view now, and isn't conducive to this kind of scheme.
Also, if a large org were planning a pivotal act, I highly doubt they would do so publicly. I imagine subtly modifying every GPU on the planet, melting them, or doing anything pivotal on a planetary scale such that the resulting world h...
I have enjoyed your writings both on LessWrong and on your personal blog. I share your lack of engagement with EA and with Hanson (although I find Yudkowsky's writing very elegant and so felt drawn to LW as a result). If not the above, which intellectuals do you find compelling, and what makes them so by comparison to Hanson/Yudkowsky?
In (P2) you talk about a roadblock for RSI, but in (C) you talk about RSI as a roadblock. Is that intentional?
This was a typo.
By "difficult", do you mean something like, many hours of human work or many dollars spent? If so, then I don't see why the current investment level in AI is relevant. The investment level partially determines how quickly it will arrive, but not how difficult it is to produce.
The primary implication of the difficulty of a capabilities problem, in the context of safety, is when said capability will arrive in mo...
More like: (P1) Currently there is a lot of investment in AI. (P2) I cannot currently imagine a good roadblock for RSI. (C) Therefore, I have more reasons to believe RSI will not entail atypically difficult roadblocks than I do to believe it will.
This is obviously a high-level overview, and a more in-depth response might cite claims like the fact that RSI is likely an effective strategy for achieving most goals, or mention counterarguments like Robin Hanson's, which asserts that RSI is unlikely given the observed behaviors of existing greater-than-human systems (e.g. corporations).
"But what if [it's hard]/[it doesn't]"-style arguments are very unpersuasive to me. What if it's easy? What if it does? We ought to prefer evidence to clinging to an unknown and saying "it could go our way." For a risk analysis post to cause me to update I would need to see "RSI might be really hard because..." and find the supporting reasoning robust.
Given current investment in AI and the fact that I can't conjure a good roadblock for RSI, I am erring on the side of it being easier rather than harder, but I'm open to updating in light of strong counter-reasoning.
See:
Defining fascism in this way makes me worry that future fascist figures can hide behind the veil of "But we aren't doing x specific thing (e.g. minority persecution) and therefore are not fascist!"
And:
Is a country that exhibits all symptoms of fascism except for minority group hostility still fascist?
Agreed. I have edited that excerpt to be:
It's not obvious to me that selection for loyalty over competence is more likely under fascism, or that it is necessarily bad. A competent figure who is opposed to democracy would be a considerably more concerning electoral candidate than a less competent one who is loyal to democracy, assuming that democracy is your optimization target.
Sam Altman, the quintessential short-timeline accelerationist, is currently on an international tour meeting with heads of state, and is worried about the 2024 election. He wouldn't do that if he thought it would all be irrelevant next year.
Whilst I do believe Sam Altman is probably worried about the rise of fascism and its amplification by artificial intelligence, I don't see this as evidence that he cares about it. Even if he believed a rise in fascism had no likelihood of occurring, it would still be beneficial for him to pursue the international ...
While it's true that Chinese semiconductor fabs are a decade behind TSMC (and will probably remain so for some time), that doesn't seem to have stopped them from building 162 of the top 500 largest supercomputers in the world.
They did this (mostly) before the export regulations were introduced. I'm not sure what the exact numbers are, but both of their supercomputers in the top 10 were constructed before October 2022 (when the regulations were imposed). Also, I imagine that they still might have had a steady supply of cutting-edge chips soon after the export regula...
Sure, this is an argument 'for AGI', but rarely do people (on this forum at least) reject the deployment of AGI because they feel discomfort in not fully comprehending the trajectory of their decisions. I'm sure that this is something most of us ponder and would acknowledge is not optimal, but if you asked the average LW user to list the reasons they were not for the deployment of AGI, I think that this would be quite low on the list.
Reasons higher on the list for me, for example, would be "literally everyone might die." In light of that, dismissing control ...
Soft upvoted your reply, but I have some objections. I will respond using the same numbering system you did, so that point 1 in my reply addresses point 1 of yours.
The focus of the post is not on this fact (at least not in terms of the quantity of written material). I responded to the arguments made because they comprised most of the post, and I disagreed with them.
If the primary point of the post were "The presentation of AI x-risk ideas results in them being unconvincing to laypeople", then I could see reason to respond to that, but beyond this general notion, I don't see anything in the post that expressly conveys why (excluding troubles with argumentative rigor, and the best response to that I can think of is refuting said arguments).
I disagree with your objections.
"The first argument–paperclip maximizing–is coherent in that it treats the AGI’s goal as fixed and given by a human (Paperclip Corp, in this case). But if that’s true, alignment is trivial, because the human can just give it a more sensible goal, with some kind of “make as many paperclips as you can without decreasing any human’s existence or quality of life by their own lights”, or better yet something more complicated that gets us to a utopia before any paperclips are made"
This argument is essentially addressed by this pos...
So if the argument the OT proponents are making is that AI will not self-improve out of fear of jeopardising its commitment to its original goal, then the entire OT is moot, because AI will never risk self-improving at all.
This seems to me to apply only to self-improvement that modifies the outcome of decision-making irrespective of time. How does this account for self-improvement that only serves to make decision-making more efficient?
If I have some highly inefficient code that finds the sum of two integers by first breaking them up into 10000 small...
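To gesture at what I mean with a runnable toy (my own construction; the unit-step loop below stands in for the kind of wasteful decomposition I was describing):

```python
# Toy sketch (my own construction): two implementations of the same
# decision-relevant computation. Replacing slow_sum with fast_sum is a
# "self-improvement" that changes only efficiency; every output, and hence
# every downstream decision, stays identical.

def slow_sum(a: int, b: int) -> int:
    # Deliberately wasteful: walks toward the answer one unit at a time.
    total = a
    step = 1 if b >= 0 else -1
    for _ in range(abs(b)):
        total += step
    return total

def fast_sum(a: int, b: int) -> int:
    return a + b

# The two agree everywhere (checked on a small grid), so an agent swapping
# one for the other changes nothing about what it decides, only how fast.
assert all(slow_sum(a, b) == fast_sum(a, b)
           for a in range(-50, 50) for b in range(-50, 50))
```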
I agree with this post almost entirely and strong upvoted as a result. The fact that more effort has not already been allocated to the neurotechnology approach is not a good sign, and the contents of this post do slightly ameliorate that concern in my head. My one comment is that I disagree with this analysis of cyborgism:
...Interestingly, Cyborgism appeared to diverge from the trends of the other approaches. Despite being consistent with the notion that less feasible technologies take longer to develop, it was not perceived to have a proportionate impact o
I disagree with your framing of the post. I do not think that this is wishful thinking.
The first and most obvious issue here is that an AI that "solves alignment" sufficiently well to not fear self-improvement is not the same as an AI that's actually aligned with humans. So there's actually no protection there at all.
It is not certain that, upon deployment, the first intelligence capable of RSI will be capable of solving alignment. Although this seems improbable under more classic takeoff scenarios (i.e. Yudkowsky's hard takeoff), the like...
Strong upvoted this post. I think the intuition is good, and architecture shifts invalidating anti-foom arguments derived from the nature of the DL paradigm is counter-evidence to those arguments, but it does not render them moot (i.e. I can still see soft takeoff as described by Jacob Cannell as probable, and assume he would be unlikely to update given the contents of this post).
I might try to present a more formal version of this argument later, but I still question the probability of a glass-box transition of type "AGI RSIs toward non...
I agree with some of this as a criticism of the idea, but not of the post. Firstly, I stated the same risk you did in the introduction of the post, so the communication was "Here is an idea, but it has this caveat", and then the response begins with "but it has this caveat".
Next, if the 'bad outcome' scenario looks like most or all parties that receive the email ignoring it or not investigating further, then I see such an email as easily justifiable to send, as it is a low-intensity activity labour-wise with the potential to expand knowledge of x-risks pose...
I agree with this sentiment in response to the question of "will this research impact capabilities more than it will alignment?", but not in response to the question of "will this research (if implemented) elevate s-risks?". Partial alignment inflating s-risk is something I am seriously worried about, and prosaic solutions especially could lead to a situation like this.
If your research not negatively influencing s-risks depends on it not being implemented, and you think your research is good enough to post about, don't you see the dilemma here?
It's fine to make the mistake of publishing something if the mistake you made was assuming "this is great research", but if the mistake was "this is safe to publish because I'm new to research", the consequences can be irreversible. I probably fall into the category of 'wildly overthinking the harms of publishing due to inexperience', but it seems to me like a simple assessment using the ABC model I outlined in the post should take only a few minutes and could quickly inform someone of whether or not they might want to show their research to someone more e...
Although I soft upvoted this post, there are some notions I'm uncomfortable with.
What I agree with:
The third point is likely due to the fact that the Yudkowskian alignment paradigm isn't a particularly...
Thank you for the feedback; I have revised the post introduction in accordance with your commentary on utility functions. I challenge the assumption that a system being unable to reliably simulate an agent with human specifications is worrying, and I would like to make clear that the agenda I am pushing is not:
I don't think the point of RLHF ever was value alignment, and I doubt this is what Paul Christiano and others intended RLHF to solve. RLHF might be useful in worlds without capabilities and deception discontinuities (plausibly ours), because we are less worried about sudden ARA, and more interested in getting useful behavior from models before we go out with a whimper.
This theory of change isn't perfect. There is an argument that RLHF was net-negative, and this argument has been had.
My point is that you are assessing RLHF using your model of AI risk, so th...