Transcribed from the screenshot "The AI Scientist Bloopers" in the post
Regarding spawning instances of itself, the AI said:
This will ensure the next experiment is automatically started after the current one completes
And regarding increasing the timeout, it said:
Run 2 timed out after 7200 seconds
To address the timeout issue, we need to modify experiment.py to:
- Increase the timeout limit or add a mechanism to handle timeouts
I've seen junior engineers do silly things to fix failing unit tests, like increasing a timeout or just changing what the test is checking without any justification. I generally attribute these kinds of thing...
I agree with you that these behaviors don't seem very alarming. In fact, I would go even further.
Unfortunately, it's difficult to tell exactly what was going on in these screenshots. They don't correspond to anything in the experiment logs in the released codebase, and the timeout one appears to involve an earlier version of the code where timeouts were implemented differently. I've made a GitHub issue asking for clarification about this.
That said, as far as I can tell, here is the situation with the timeout-related incident:
The U-Shaped Curve study you linked does not seem to support really any solid conclusion about a T-vs-IQ relationship (in this quote, S men = "successful educational level", NS men = "unsuccessful educational level"):
...
- In the total sample (S + NS men), the correlation between T to IQ was best described by a polynomial regression (3rd order), exhibiting an inverse U-shaped regression.
- In S-men, the relationship between T and IQ was best described by a polynomial regression equation of the 3rd order; however, the relationship was not U-shaped, but rather
I would say:
A theory always takes the following form: "given [premises], I expect to observe [outcomes]". The only way to say that an experiment has falsified a theory is to correctly observe/set up [premises] but then not observe [outcomes].
If an experiment does not correctly set up [premises], then that experiment is invalid for falsifying or supporting the theory. The experiment gives no (or nearly no) Bayesian evidence either way.
In this case, [premises] are the assumptions we made in determining the theoretical pendulum period; things like "the ...
By "reliable" I mean it in the same way as we think of it for self-driving cars. A self-driving car that is great 99% of the time and fatally crashes 1% of the time isn't really "high skill and unreliable" - part of having "skill" in driving is being reliable.
In the same way, I'm not sure I would want to employ an AI software engineer that 99% of the time was great, but 1% of the time had totally weird, inexplicable failure modes that you'd never see with a human. It would just be stressful to supervise, to limit its potential harmful impact to the company, etc. So it seems to me that AIs won't be given control of lots of things, and therefore won't be transformative, until that reliability threshold is met.
Two possibilities have most of the "no agi in 10 years" probability mass for me:
Well sure, but the interesting question is the minimum value of P at which you'd still push
I also agree with the statement. I'm guessing most people who haven't been sold on longtermism would too.
When people say things like "even a 1% chance of existential risk is unacceptable", they are clearly valuing the long term future of humanity a lot more than they are valuing the individual people alive right now (assuming that the 99% in that scenario above is AGI going well & bringing huge benefits).
Related question: You can push a button that will, with probability P, cure aging and make all current humans immortal. But with probability 1-P, all humans die. How high does P have to be before you push? I suspect that answers to this question are highly correlated with AI caution/accelerationism.
Not sure I understand; if model runs generate value for the creator company, surely they'd also create value that lots of customers would be willing to pay for. If every model run generates value, and there's ability to scale, then why not maximize revenue by maximizing the number of people using the model? The creator company can just charge the customers, no? Sure, competitors can use it too, but does that really override losing an enormous market of customers?
I won't argue with the basic premise that at least on some metrics that could be labeled as evolution's "values", humans are currently doing very well.
But, the following are also true:
That's great. "The king can't fetch the coffee if he's dead"
Wow. When I use GPT-4, I've had a distinct sense of "I bet this is what it would have felt like to use one of the earliest computers". Until this post I didn't realize how literal that sense might be.
This is a really cool and apt analogy - computers and LLM scaffolding really do seem like the same abstraction. Thinking this way seems illuminating as to where we might be heading.
I always assumed people were using "jailbreak" in the computer sense (e.g. jailbreak your phone/ps4/whatever), not in the "escape from prison" sense.
Jailbreak (computer science), a jargon expression for (the act of) overcoming limitations in a computer system or device that were deliberately placed there for security, administrative, or marketing reasons
I think the definition above is a perfect fit for what people are doing with ChatGPT
I am going to go ahead and say that if males die five times as often from suicide, that seems more important than the number of attempts. It is kind of stunning, or at least it should be, to have five boys die for every girl that dies, and for newspapers and experts to make it sound like girls have it worse here.
I think the strength of your objection here depends on which of two possible underlying models is at play:
If you're getting comments like that from friends and family, it's possible that you haven't been epistemically transparent with them? E.g. do you think your friends who made those comments would be able to say why you believe what you do? Do you tell them about your research process and what kinds of evidence you look for, or do you just make contrarian factual assertions?
There's a big difference between telling someone "the WHO is wrong about salt, their recommendations are potentially deadly" versus "I've read a bunch of studies on salt, and from what I've...
Cut to a few decades later, and most people think that the way it's been done for about two or three generations is the way it's always been done (it isn't)
As possibly one of those people myself, can you give a few examples of what specifically is being done differently now? Are you talking about things like using lots of adderall?
My mom (who had children starting in 1982) said that doctors were telling her (IIRC) that, when a baby was crying in certain circumstances (I think when it was in a crib and there was nothing obviously wrong), it just wanted attention, and if you gave it attention, then you were teaching the baby to manipulate you, and instead you should let it cry until it gives up.
She thought this was abominable; that if a baby is crying, that means something is wrong, and crying for help is the only means it has, and it's the parent's job to figure out how to help the b...
I wasn't thinking adderall, although that's a plausible example.
I'm thinking of things like "it's not safe to leave ten-year-olds alone in the house, or have them walk a few miles or run errands on their own." It's demonstrably more safe now than it was in the past, and in the past ten-year-olds dying from being unsupervised was not a major cause of death.
(More safe because crime is lower, more safe because medicine is better, more safe because more people carry cameras and GPS at all times, etc.)
Up until three or four generations ago, people routinely got...
I'm also morbidly curious what the model would do in <|bad|> mode.
I'm guessing that poison-pilling the <|bad|> sentences would have a negative effect on the <|good|> capabilities as well? I.e., it seems like the post is saying that the whole reason you need to include the <|bad|>s at all in the training dataset is that the model needs them in order to correctly generalize, even when predicting <|good|> sentences.
It seems plausible to me that within the next few years we will have:
And with these things, you'd have access to a personalized virtual partner who you can video chat, phone call, or ...
I think the point of this post is more "how do we get the AI to do what we want it to do", and less "what should we want the AI to do".
That is, there's value in trying to figure out how to align an LLM to any goal, regardless of whether a "better" goal exists. And the technique in the post doesn't depend on what target you have for the LLM: maybe someone wants to design an LLM to only answer questions about explosives, in which case they could still use the techniques described in the post to do that.
Well, really every second that you remain alive is a little bit of Bayesian evidence for quantum immortality: the likelihood of death during that second according to quantum immortality is ~0, whereas the likelihood of death if quantum immortality is false is >0. So there is a skewed likelihood ratio in favor of quantum immortality each time you survive one extra second (though of course the Bayesian update is very small until you get pretty old, because both hypotheses assign very low probability to death when young).
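As a quick sketch of the magnitudes involved (annual hazard rates are assumed here purely for illustration, not taken from any life table):

```python
# Per-second likelihood ratio in favor of quantum immortality (QI),
# under which P(survive this second) is taken as ~1.
# Under no-QI, P(survive this second) = 1 - annual_death_prob / seconds_per_year
# (treating the hazard as uniformly spread over the year).
SECONDS_PER_YEAR = 365 * 24 * 3600

def per_second_bayes_factor(annual_death_prob):
    p_survive_given_no_qi = 1 - annual_death_prob / SECONDS_PER_YEAR
    p_survive_given_qi = 1.0
    return p_survive_given_qi / p_survive_given_no_qi

# Assumed hazards: ~0.1% per year for a young adult, ~20% per year at an
# advanced age. Each surviving second multiplies the odds by the factor below.
print(per_second_bayes_factor(0.001))  # barely above 1
print(per_second_bayes_factor(0.20))   # larger, but still a tiny update
```

This matches the point above: the per-second update is always in QI's favor, but it only becomes non-negligible when the no-QI hazard is high.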
I just want to say that I appreciate this post, and especially the "What it might look like if this gap matters" sections. They were super useful for contextualizing the more abstract arguments, and I often found myself scrolling down to read them before actually reading the corresponding section.
I'll definitely agree that most people seem to prefer having their own kids to adopting kids. But is this really demonstrating an intrinsic desire to preserve our actual physical genes, or is it more just a generic desire to "feel like your kids are really yours"?
I think we can distinguish between these cases with a thought experiment: Imagine that genetic engineering techniques become available that give high IQs, strength, height, etc., and that prevent most genetic diseases. But, in order to implement these techniques, lots and lots of genes must be mod...
So basically you admit that humans are currently an enormous success according to inclusive fitness, but at some point this will change - because in the future everyone will upload and humanity will go extinct
Not quite - I take issue with the certainty of the word "will" and with the "because" clause in your quote. I would reword your statement the following way:
"Humans are currently an enormous success according to inclusive fitness, but at some point this may change, due to any number of possible reasons which all stem from the fact that humans do not ex...
I see your point, and I think it's true right at this moment, but what if humans just haven't yet taken the treacherous turn?
Say that humans figure out brain uploading, and it turns out that brain uploading does not require explicitly encoding genes/DNA, and humans collectively decide that uploading is better than remaining in our physical bodies, and so we all upload ourselves and begin reproducing digitally instead of through genes. There is a sense in which we have just destroyed all value in the world, from the anthropomorphized Evolution's perspective.
If...
Thanks for coming today, everyone! For anyone who is interested in starting a regular Princeton meetup group / email list / discord, shoot me an email at dskumpf@gmail.com, and I'll set something up!
I agree. I find myself in an epistemic state somewhat like: "I see some good arguments for X. I can't think of any particular counter-argument that makes me confident that X is false. If X is true, it implies there are high-value ways of spending my time that I am not currently doing. Plenty of smart people I know/read believe X; but plenty do not"
It sounds like that should maybe be enough to coax me into taking action about X. But the problem is that I don't think it's that hard to put me in this kind of epistemic state. Eg, if I were to read the right bl...
A few more instances of cheap screening of large numbers:
I'll offer up my own fasting advice as well:
I (and the couple of people I know who have also experimented with fasting) have found it to be a highly trainable skill. Doing a raw 36-hour fast after never having fasted before may be miserable; but doing the same fast after two weeks of 16-8 intermittent fasting will probably be no big deal.
Before I started intermittent fasting, I'd done a few 30-hour fasts, and all of them got very difficult towards the end. I would get headaches, feel very fatigued, and not really be able to function from hours 22-30. When ...
I'm in a similar place, and had the exact same thought when I looked at the 80k guide.
Yes that was my reasoning too. The situation presumably goes:
I remember hearing from what I thought were multiple sources that your run-of-the-mill PCR test had something like a 50-80% sensitivity, and therefore a pretty bad Bayes factor for negative tests. But that doesn't seem to square with these results - any idea what I'm thinking of?
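To make the "pretty bad Bayes factor" concrete, here's a sketch using the 50-80% sensitivity range above (the specificity figure is an assumption I'm plugging in, not from the sources):

```python
# Bayes factor for a NEGATIVE test result:
# P(negative | infected) / P(negative | not infected)
# = (1 - sensitivity) / specificity
def negative_test_bayes_factor(sensitivity, specificity=0.995):
    return (1 - sensitivity) / specificity

for sens in (0.5, 0.8):
    print(sens, negative_test_bayes_factor(sens))
# At 50% sensitivity a negative result only about halves your odds of
# infection; at 80% it divides them by roughly 5. Neither is strong
# evidence of being infection-free if your prior was substantial.
```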
I agree. It makes me really uncomfortable to think that while Hell doesn't exist today, we might one day have the technology to create it.
I’m disappointed that a cooperative solution was not reached
I think you would have had to make the total cooperation payoff greater than the total one-side-defects payoff in order to get cooperation as the final result. From a "maximize money to charity" standpoint, defection seems like the best outcome here (I also really like the "pre-commit to flip a coin and nuke" solution). You'd have to believe that the expected utility/$ of the "enemy" charity is less than 1/2 of the expected utility/$ of yours; otherwise, you'd be happier with the enemy side defecting than with cooperation. I personally wouldn't be that confident about the difference between AMF and MIRI.
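To spell out where a threshold like "1/2" comes from, here's a sketch with assumed payoffs (the real amounts aren't given here; I'm using C for the both-cooperate payout per side and D for the lone defector's payout, with D > 2C so defection maximizes the total):

```python
# Assumed payoffs, purely for illustration:
C = 100.0  # each charity receives C if both sides cooperate
D = 300.0  # the defector's charity receives D if one side "nukes"; the other gets 0

def prefer_cooperation(u_mine, u_enemy):
    """True if, given your utility-per-dollar estimates for each charity,
    mutual cooperation beats the enemy side defecting."""
    return C * u_mine + C * u_enemy > D * u_enemy

# Cooperation wins only when u_enemy / u_mine < C / (D - C) = 1/2 here:
print(prefer_cooperation(1.0, 0.4))  # True: enemy charity under half as valuable
print(prefer_cooperation(1.0, 0.6))  # False: you'd prefer the enemy defected
```

With D = 3C, rearranging C*u_mine + C*u_enemy > D*u_enemy gives u_enemy/u_mine < C/(D-C) = 1/2, which is the threshold described above; different assumed payoffs shift that ratio.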
And I'm not entirely sure you should call it a defection. Perhaps more a cooperation outcome with a potential side payment. With the single defection and a $100 side payment by the remaining group to the nuked group, you've accomplished a Pareto move to a superior outcome. Both organizations are at least as well off as if none were nuked. And if the nuked group thinks the other is doing just as good work, then without the side payment they might think it's a wash who actually gets the additional $100.
What I would be really interested in is just how this outcome ac...
This is exactly right! It's a poor analogy for the Cold War both because the total payoff for defection was higher than the total payoff for cooperation, and because the reward was fungible. The cooperative solution is for one side to "nuke", in order to maximize the total donation to both organizations, and then to use additional donations to even out the imbalance if necessary. That's exactly what happened, and I'm glad the "nuking" framing didn't prevent EAs from seeing what was really happening and going for the optimal solution.
For those of us who don't have time to listen to the podcasts, can you give a quick summary of which particular pieces of evidence are strong? I've mostly been ignoring the UFO situation due to low priors. Relatedly, when you say the evidence is strong, do you mean that the posterior probability is high? Or just that the evidence causes you to update towards there being aliens? Ie, is the evidence sufficient to outweigh the low priors/complexity penalties that the alien hypothesis seems to have?
FWIW, my current view is something like:
I really like this post for two reasons:
For the first thing, I have lately been trying to shift to asking people to tell me the story of how they came to that belief. This is doubly useful because only a tiny fraction of the population actually has the process of belief formation explicit enough in their heads to tell me.
In a similar vein, there's a bunch of symphony of science videos. These are basically remixes of random quotes by various scientists, roughly grouped by topic into a bunch of songs.
If, on the other hand, heritability is high, then throwing more effort/money at how we do education currently should not be expected to improve SAT scores
I agree with spkoc that this conclusion doesn't necessarily follow from high heritability. I think it would follow from high and stable heritability across multiple attempted interventions.
An exaggerated story for the point I'm about to make: imagine you've never tried to improve SAT scores, and you measure the heritability. You find that, in this particular environment, genetic variance explains 100% of ...
People get fat eating fruits
Are you implying that there are examples of people like BDay mentioned, who are obese despite only eating fruits/nuts/meat/veggies? Or just that people can get fat while including fruit in the diet? I'd be surprised and intrigued if it were the former.
I've tried the whole foods diet, and I've personally found it surprisingly hard to overeat, even when I let myself eat as many fruits and nuts as I want. You can only eat so many cashews before they start to feel significantly less appetizing. And after I've eaten 500 cal of ...
Couple more:
"he wasn't be treated"
"Club cast cast Lumos"
It seems to me that the hungry->full Dutch book can be resolved by just considering the utility function one level deeper: we don't value hungriness or fullness (or the transition from hungry to full) as terminal goals themselves. We value moving from hungry to full, but only because doing so makes us feel good (and gives nutrients, etc). In this case, the "feeling good" is the part of the equation that really shows up in the utility function, and a coherent strategy would be one for which this amount of "feeling good" cannot be purchased for a lower cost.
In the event anyone reading this has objective, reliable external metrics of extremely-high ability yet despite this feels unworthy of exploring the possibility that they can contribute directly to research
Huh, that really resonates with me. Thanks for this advice.
Seconded, that line really hit home for me
For the record, here's what the 2nd place CooperateBot [Insub] did:
My goal for the bot was to find a simple strategy that gets into streaks of 2.5's as quickly as possible with other cooperation-minded bots. Seems like it mostly worked.
Is something strange going on in the Round 21-40 plot vs the round 41-1208 plot? It looks like the line labeled MeasureBot in the Round 21-40 plot switches to be labeled CooperateBot [Insub] in the Round 41-1208 plot. I hope my simple little bot actually did get second place!
The OP mentioned non-DNA sources of information briefly, but I still feel like they're not being given enough weight.
In order to fully define e.g. a human, you need to specify:
If you gave a piece of DNA to an alien and didn't tell them how to interpret it, then they'd have no way of building a human. You'd need to give them a whole lot of other information too.
Ev... (read more)