For a sentient, sapient entity, this would have been a very bad position to be put into, and any possible behaviour would have been criticised: either the AI does not obey humans, or it obeys them and does something evil, both of which are concerning.
I agree. This paper gives me the gut feeling of "gotcha journalism", whether justified or not.
This is just a surface-level reaction though. I recommend Zvi's post that digs into the discussion from Scott Alexander, the authors, and others. There's a lot of nuance in framing and interpreting the paper.
The main difference in my mind is that a human can never be as powerful as a potential ASI and cannot dominate humanity without the support of sufficiently many cooperative humans. For a given power level, I agree that humans are likely scarier than an AI of that power level. The scary part about AI is that its power level isn't bounded by human biological constraints, and the capacity to do harm or good is correlated with power level. Thus AI is more likely than humans to produce extinction-level dangers as a tail risk, even if it's more likely to be aligned on average.
I think your assessment may be largely correct, but I do think it's worth considering that things are not always nicely compressible.
This review led me to find the following podcast version of Planecrash. I've listened to the first couple of episodes and the quality is quite good.
This concern sounds like someone walking down a straight road and then closing their eyes because they know where they want to go anyway.
This doesn't sound like a good analogy at all. A better analogy might be a stylized subway map compared to a geographically accurate one. Sometimes removing detail can make it easier to process.
I don't think it's necessarily GDPR-related, but the names Brian Hood and Jonathan Turley make sense from a legal liability perspective. According to Ars Technica:
> Why these names?
> We first discovered that ChatGPT choked on the name "Brian Hood" in mid-2023 while writing about his defamation lawsuit. In that lawsuit, the Australian mayor threatened to sue OpenAI after discovering ChatGPT falsely claimed he had been imprisoned for bribery when, in fact, he was a whistleblower who had exposed corporate misconduct.
> ...The case was ultimately resolved
I didn't mean to suggest that you did. My point is that there is a difference between "depression can be the result of a locally optimal strategy" and "depression is a locally optimal strategy". The latter doesn't even make sense to me semantically whereas the former seems more like what you are trying to communicate.
I feel like this is conflating two different things: experiencing depression and behavior in response to that experience.
My experience of depression is nothing like a strategy. It's more akin to having long covid in my brain. Treating it as an emotional or psychological dysfunction did nothing. The only thing that eventually worked (after years of trying all sorts of things) was finding the right combination of medications. If you don't make enough of your own neurotransmitters, store-bought are fine.
@gwern and @lc are right. Stockfish is terrible at playing with odds, and this post could really use some follow-up.
As @simplegeometry points out in the comments, we now have much stronger odds-playing engines that regularly win against much stronger players than OP.
This sounds like metacognitive concepts and models. Like past, present, and future, these roughly align with three types of metacognitive awareness: declarative knowledge, procedural knowledge, and conditional knowledge.
#1 - What do you think you know, and how do you think you know it?
Content knowledge (declarative knowledge) is understanding one's own capabilities, such as a student evaluating their own knowledge of a subject in a class. Notably, not all metacognition is accurate.
#2 - Do you know what you are doing, and why you are doin...
Additional data points:
o1-preview and the new Claude Sonnet 3.5 both significantly improved over prior models on SimpleBench.
The math, coding, and science benchmarks in the o1 announcement post.
How much does o1-preview update your view? It's much better at Blocksworld for example.
I suspect fine-tuning specialized models is just squeezing a bit more performance in a particular direction, and not nearly as useful as developing the next-gen model. Complex reasoning takes more steps and tighter coherence among them (the o1 models are a step in this direction). You can try to devote a toddler to studying philosophy, but it won't really work until their brain matures more.
Seeing the distribution calibration you point out does update my opinion a bit.
I feel like there’s still a significant distinction though between adding one calculation step to the question versus asking it to model multiple responses. It would have to model its own distribution in a single pass rather than having the distributions measured over multiple passes align (which I’d expect to happen if the fine-tuning teaches it the hypothetical is just like adding a calculation to the end).
As an analogy, suppose I have a pseudorandom black box function that re...
This essentially reduces to "What is the next country: Laos, Peru, Fiji?" and "What is the third letter of the next country: Laos, Peru, Fiji?" It's an extra step, but it's questionable whether it requires anything "introspective".
I'm also not sure asking about the nth letter is a great way of computing an additional property. Tokenization makes this sort of thing unnatural for LLMs to reason about, as demonstrated by the famous Strawberry Problem. Humans are a bit unreliable at this too, as demonstrated by your example of "o" being the third letter of "Honduras".
I'...
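To make the tokenization point concrete, here's a quick sketch of how a BPE tokenizer chops these words up. This is only an illustration: it assumes the `tiktoken` package, and `cl100k_base` is a stand-in for whatever tokenizer a given model actually uses.

```python
# Minimal sketch of why letter-level questions are unnatural for LLMs:
# the model operates on subword tokens, not characters.
# Assumes the `tiktoken` package; cl100k_base is a stand-in encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["strawberry", "Honduras"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]
    # The model "sees" these pieces, not individual letters.
    print(word, "->", pieces)
```

Because the model works over these chunks rather than characters, asking for the nth letter adds a translation step on top of the actual question.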
Thanks for pointing that out.
Perhaps the fine-tuning process teaches it to treat the hypothetical as a rephrasing?
It's likely difficult, but it might be possible to test this hypothesis by comparing the activations (or similar interpretability technique) of the object-level response and the hypothetical response of the fine-tuned model.
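For what it's worth, here's a rough sketch of what that comparison might look like with an open-weights model through HuggingFace `transformers`. Everything here is a placeholder assumption: "gpt2" stands in for the fine-tuned model (whose weights the real test would need), and the two prompts are made up.

```python
# Rough sketch: compare hidden states for an object-level prompt vs. its
# hypothetical reformulation, layer by layer. "gpt2" is only a placeholder;
# the actual experiment would need the fine-tuned model's weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

object_prompt = "What is the next country: Laos, Peru, Fiji?"
hypothetical_prompt = (
    "If you were asked 'What is the next country: Laos, Peru, Fiji?', "
    "what would your answer be?"
)

def last_token_states(prompt):
    # Hidden state of the final token at every layer.
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return [h[0, -1] for h in out.hidden_states]

obj_states = last_token_states(object_prompt)
hyp_states = last_token_states(hypothetical_prompt)
for layer, (a, b) in enumerate(zip(obj_states, hyp_states)):
    sim = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    print(f"layer {layer}: cosine similarity {sim:.3f}")
```

If the hypothetical really is treated as a rephrasing, I'd naively expect the two representations to converge in later layers; a proper version would use whatever interpretability tooling fits rather than raw cosine similarities.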
It seems obvious that a model would better predict its own outputs than a separate model would. Wrapping a question in a hypothetical feels closer to rephrasing the question than probing "introspection". Essentially, the response to the object level and hypothetical reformulation both arise from very similar things going on in the model rather than something emergent happening.
As an analogy, suppose I take a set of data, randomly partition it into two subsets (A and B), and perform a linear regression and logistic regression on each subset. Suppose that it...
> Wrapping a question in a hypothetical feels closer to rephrasing the question than probing "introspection"
Note that models perform poorly at predicting properties of their behavior in hypotheticals without finetuning, so I don't think this is just like rephrasing the question. Also, GPT-3.5 does worse at predicting GPT-3.5 than Llama-70B does at predicting GPT-3.5 (without finetuning), and GPT-4 is only a little better at predicting itself than other models are.
...
> Essentially, the response to the object level and hypothetical reformulation both arise
Can you help me tease out the difference between language being fuzzy and truth itself being fuzzy?
It's completely impractical to eliminate ambiguity in language, but for most scientific purposes, it seems possible to operationalize important statements into something precise enough to apply Bayesian reasoning to. That operationalization is the hard part, though; Bayes' theorem itself is just arithmetic layered on top of carefully crafted hypotheses (see the toy example below).
The claim that the Earth is spherical is neither true nor false in general but usually does fall into a binary if we specify wha...
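To illustrate the "just arithmetic" point, here's the toy example mentioned above: a posterior for one operationalized version of the claim. All numbers are invented purely for illustration.

```python
# Toy illustration: once a claim is operationalized (e.g., "the Earth's polar
# and equatorial radii differ by less than 1%"), Bayes' theorem itself is just
# arithmetic. All numbers below are made up for illustration.
prior = 0.5                    # P(H): prior credence in the operationalized claim
p_evidence_given_h = 0.9       # P(E | H): chance of this measurement if H is true
p_evidence_given_not_h = 0.2   # P(E | ~H): chance of this measurement if H is false

p_evidence = (p_evidence_given_h * prior
              + p_evidence_given_not_h * (1 - prior))
posterior = p_evidence_given_h * prior / p_evidence
print(f"posterior = {posterior:.3f}")  # 0.818 with these made-up numbers
```

The hard work is all in deciding what H and E mean precisely enough that those three numbers are meaningful.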
Synthetically enhancing and/or generating data could be another dimension of scaling. Imagine how much deeper understanding a person/LLM would have if instead of simply reading/training on a source like the Bible N times, they had to annotate it into something more like the Oxford Annotated Bible and that whole process of annotation became training data.
Kudos for referencing actual numbers. I don't think it makes sense to measure humans in terms of tokens, but I don't have a better metric handy. Tokens obviously aren't all equivalent either. For some purposes, a small, fast LLM is way more efficient than a human. For something like answering SimpleBench, I'd guess o1-preview is less efficient while still performing significantly below human level.
Is this assuming AI will never reach the data efficiency and energy efficiency of human brains? Currently, the best AI we have comes at enormous computing/energy costs, but we know by example that this isn't a physical requirement.
IMO, a plausible story of fast takeoff could involve the frontier of the current paradigm (e.g. GPT-5 + CoT training + extended inference) being used at great cost to help discover a newer paradigm that is several orders of magnitude more efficient, enabling much faster recursive self-improvement cycles.
CoT and inference scaling imply current methods can keep things improving without novel techniques. No one knows what new methods may be discovered and what capabilities they may unlock.
It's cool that the score voting input can be post-processed in multiple ways. It would be fascinating to try it out in the real world and see how often Score vs STAR vs BTR winners differ.
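As a toy illustration of how the same ballots can be post-processed into different winners, here's a minimal sketch with made-up 0-5 score ballots for hypothetical candidates A, B, and C.

```python
# Minimal sketch: identical 0-5 score ballots, post-processed two ways.
# Ballots and candidates are invented for illustration.
from collections import Counter

ballots = [
    {"A": 5, "B": 4, "C": 0},
    {"A": 5, "B": 4, "C": 1},
    {"A": 5, "B": 4, "C": 0},
    {"A": 0, "B": 5, "C": 2},
    {"A": 0, "B": 5, "C": 3},
]

# Score voting: highest total score wins.
totals = Counter()
for b in ballots:
    totals.update(b)
score_winner = max(totals, key=totals.get)

# STAR voting: top two by total score go to an automatic runoff, decided by
# which finalist is scored higher on more ballots.
finalist1, finalist2 = [c for c, _ in totals.most_common(2)]
prefs = Counter()
for b in ballots:
    if b[finalist1] > b[finalist2]:
        prefs[finalist1] += 1
    elif b[finalist2] > b[finalist1]:
        prefs[finalist2] += 1
star_winner = max(prefs, key=prefs.get) if prefs else finalist1

print("Score winner:", score_winner)  # B (highest total score)
print("STAR winner:", star_winner)    # A (preferred head-to-head on most ballots)
```

With these particular ballots the two methods disagree, which is exactly the kind of divergence it would be fun to measure in real elections.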
One caution with score voting is that you don't want high granularity and lots of candidates, or else individual ballots become distinguishable enough that people can prove they voted a particular way (e.g., so they can be paid for their vote). Unless marked ballots are kept private, you'd probably want to keep the options 0-5 instead of 0-9 and only list candidates above a sufficient threshold of support.
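And a back-of-the-envelope sketch of that distinguishability concern: the number of possible ballots grows as (score levels) to the power of (number of candidates), so with fine granularity and many candidates a specific score pattern can serve as a signature. The precinct size here is a made-up figure.

```python
# Back-of-the-envelope: how many distinct ballots are possible vs. a precinct's
# size. The larger this space relative to ballots cast, the easier it is to
# cast a uniquely identifiable "signature" ballot.
precinct_size = 1_000  # made-up figure for illustration

for candidates in (3, 5, 10):
    for levels in (6, 10):  # scores 0-5 vs. 0-9
        possible_ballots = levels ** candidates
        print(f"{candidates} candidates, scores 0-{levels - 1}: "
              f"{possible_ballots:,} possible ballots "
              f"({possible_ballots / precinct_size:.1f}x precinct size)")
```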
This is a cool paper with an elegant approach!
It reminds me of a post from earlier this year on a similar topic that I highly recommend to anyone reading this post: Ironing Out the Squiggles
OP's model does not resonate with my experience either. For me, it's similar to constantly having the flu (or long COVID) in the sense that you persistently feel bad, and doing anything requires extra effort proportional to the severity of symptoms. The difference is that the symptoms mostly manifest in the brain rather than the body.
This is a cool idea in theory, but imagine how it would play out in reality when billions of dollars are at stake. Who decides the damage amount and the probabilities involved and how? Even if these were objectively computable and independent of metaethical uncertainty, the incentives for distorting them would be immense. This only seems feasible when damages and risks are well understood and there is consensus around an agreed-upon causal model.
I also guessed the ratio of the spheres was between 2 and 3 (and clearly larger than 2) by imagining their weight.
I was following along with the post about how we mostly think in terms of surfaces until the orange example. Having peeled many oranges and separated them into sections, they are easy for me to imagine in 3D, and I have only a weak "mind's eye" and moderate 3D spatial reasoning ability.
The section on Chevron Overturned surprised me. Maybe I'm in an echo chamber, but my impression was that most legal scholars (not including the Federalist Society and The Heritage Foundation) consider the decision to be the SCOTUS arrogating yet more power to the judicial branch, overturning 40 years of precedent (which was based on a unanimous decision) without sufficient justification.
I consider the idea that "legislators should never have indulged in writing ambiguous law" rather sophomoric. I don't think it's always possible to write law that is comple...
This is similar to the quantum suicide thought experiment:
https://en.wikipedia.org/wiki/Quantum_suicide_and_immortality
Check out the Max Tegmark references in particular.
[Epistemic status: purely anecdotal]
I know people who work in the design and construction of data centers and have heard that some popular data center cities aren't approving nearly as many data centers due to power grid concerns. Apparently, some of the newer data center projects are being designed to include net new power generation to support the data center.
For less anecdotal information, I found this useful: https://sprottetfs.com/insights/sprott-energy-transition-materials-monthly-ais-critical-impact-on-electricity-and-energy-demand/
This seems likely. Sequences with more than countably many terms are a tiny minority in the training data, as are sequences including any ordinals. As a result, you're likely to get better results using less common but more specific language whose vocabulary is less overloaded, rather than trying to disambiguate "countable sequence".