What does "obscure" mean here? (If you label the above "obscure", I feel like every query I consider "non-trivial" could be labeled obscure.)
I don't think Lean is obscure, it's one of the most popular proof assistants nowadays. The whole Lean codebase should be in the AI's training corpus (in fact that's why I deliberately made sure to specify an older version, since I happen to know that the olean header changed recently). If you have access to the codebase, and you understand the object representation, the solution is not too hard.
Here is the solution I w...
Yep, that "standard library" part sure seems problematic, I am not sure if an algorithm for listing primes is shorter than just the above lookup table.
Just to give an example, here is the kind of prompt I am thinking of. I am being very specific about what I want, I think there is very little room for misunderstanding about how I expect the program to behave:
Write a Python program that reads a .olean file (Lean v4.13.0), and outputs the names of the constants defined in the file. The program has to be standalone and only use modules from the python standard library, you cannot assume Lean to be available in the environment.
o3-mini gives pure garbage hallucination for me on this one, like it's not even close.
If your answer to question A is "a specific thing," and your answer to B is "yes, I'm very clear on what I want," then just explain it thoroughly, and you're likely to get satisfying results. Impressive examples like "rewrite this large complex thing that particular way" fall into this category.
Disagree. It sounds like by "being specific" you mean that you explain how you want the task to be done to the AI, which in my opinion can only be mildly useful.
When I am specific to an AI about what I want, I usually still get buggy results unless the solution i...
I guess the reasoning for why the solution given in the post is more "valid" than this one is "something something Occam's razor" or that it is "more elegant" (whatever "elegant" means), but if someone can make a more precise argument I would be interested to hear. (In particular, in my mind Occam's razor is something to do with empiricism, while what we are doing here is pure logic, so not sure how it exactly applies?)
Unfortunately no, I don't think any contradictions can be derived from the examples given in the post if we assume -E and [E] unary, and E E binary operators. Here are some example assignments for these operators that satisfy (AFAICT) the examples from the post (assuming left associativity for juxtaposition, and that the precedence of - is lower, so that -E E is interpreted as -(E E) in the last example):
Definitions for [E]:
1 => 2
2 => 4
-1 => 1/2
-2 => 1/4
1/2 => sqrt(2)
1001 => 1001
1004 => 1004
Definitions for E E:
1 2 => 1001
... I fairly quickly figured out that the grammar is something like E ::= "[]" | "[" E "]" | "-" E | "E E", and that eval([E]) = 2^eval(E) (and eval(-E) = -eval(E)), and then went down the rabbit hole of trying to come up with some f such that eval(E1 E2) = f(eval(E1), eval(E2)) for juxtaposition, and thinking about whether it's left or right associative. I was also thinking that maybe it's n-ary rather than binary so that associativity does not matter.
Anyway, I think where I went wrong is that I decided that [E] is a unary operator by itself, and did not reconsider thi...
Pull requests. Useful to group a bunch of messy commits into a single high-level purpose and commit that to main. Makes your commit history easier to read.
You can also squash multiple commits without using PRs. In fact, if someone meticulously edited their commit history for a PR to be easy-to-follow and the changes in each commit are grouped based on them being some higher level logical single unit of change, squashing their commits can be actively bad, since now you are destroying the structure and making the history less readable by making a single m...
Does Overleaf have such AI integration that can get "accidentally" activated, or are you using some other AI plugin?
Either way, this sounds concerning to me, we are so bad at AI boxing that it doesn't even have to break out, we just "accidentally" hand it edit access to random documents. (And especially an AI safety research paper is not something I would want a misaligned AI editing without close oversight.)
Could someone explain the joke to me? If I take the above statement literally, some change made it into your document, which nobody with access claims to have put there. You must have some sort of revision control, so you should at least know exactly who made that edit and when, which should already narrow it down a lot?
I am not 100% convinced by the comparison, because technically LLMs are only "reading" a bunch of source code, they are never given access to a compiler/interpreter. IMO actually running the code one has written is a very important part of learning, and I think it would be a much more difficult task for a human to learn to code just by reading a bunch of books/code, but never actually trying to write & run their own code.[1]
Also, in the video linked earlier in the thread, the girlfriend playing Terraria is deliberately not given access to the wiki, a...
And as a separate note, I'm not sure what the appropriate human reference class for game-playing AIs is, but I challenge the assumption that it should be people who are familiar with games. Rather than, say, people picked at random from anywhere on earth.
If you did that for programming, AIs would already be considered strongly superhuman. Just like we compare AI's coding knowledge to programmers, I think it's perfectly fair to compare their gaming abilities to people who play video games.
Notably, this was exactly the sort of belief I was trying to show is false
Please point out if there is a specific claim I made in my comment that you believe to be false. I said that "I don't think a TC computer can ever be built in our universe.", which you don't seem to argue with? (If we assume that we can only ever get access to a finite number of atoms. If you dispute this I won't argue with that, neither of us has a Theory of Everything to say for certain.)
Just to make precise why I was making that claim and what it was trying to argue against, ta...
I don't think a TC computer can ever be built in our universe. The observable universe has a finite number of atoms, I have seen numbers around $10^{80}$ thrown around. Even if you can build a RAM where each atom stores 1 bit,[1] this is still very much finite.
I think a much more interesting question is why TC machines are — despite only existing in theory — such useful models for thinking about real-world computers. There is obviously some approximation going on here, where for the vast majority of real-world problems, you can write them in such a way that the...
Let $S = \{0,1\}^n$ be the state space of our finite/physical computer, where $n$ is the number of bits of state the computer has. This can include RAM, non-volatile storage, CPU registers, cache, GPU RAM, etc... just add up all the bits.
The stateless parts of the computer can be modeled as a state transition function $f : S \to S$, which is applied at every time step to produce the next state. (And let's suppose that there is some special halting state $s_{\text{halt}} \in S$.)
This is clearly an FSM with $2^n$ states, and not TC. The halting problem can be trivially solved for it: it is guarante...
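To make the halting argument concrete, here is a minimal Python sketch, assuming a deterministic machine whose states are hashable values. The names `halts`, `count_up` and `spin` are made up for illustration and do not come from the thread. Because the state space is finite, the machine must either reach the halting state or revisit a state, and a revisited state means it loops forever.

```python
from typing import Callable, Hashable, Set


def halts(step: Callable[[Hashable], Hashable],
          start: Hashable,
          halt_state: Hashable) -> bool:
    """Decide halting for a deterministic machine with finitely many states."""
    seen: Set[Hashable] = set()
    state = start
    while state != halt_state:
        if state in seen:
            # A state repeated, so the deterministic machine is stuck in a cycle.
            return False
        seen.add(state)
        state = step(state)
    return True


# Hypothetical toy machines with 3 bits of state (integers 0..7), halting state 7.
def count_up(s: int) -> int:
    return min(s + 1, 7)


def spin(s: int) -> int:
    return (s + 2) % 6  # cycles through 0, 2, 4 and never reaches 7


print(halts(count_up, 0, 7))  # True
print(halts(spin, 0, 7))      # False
```

The loop runs at most once per distinct state, so for $n$ bits of state it needs at most $2^n$ steps and $2^n$ entries in `seen`. That is fine for a toy machine and hopeless for a real computer, which is part of why the TC model stays useful in practice.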
I think the reacts being semantic instead of being random emojis is what makes this so much better.
I wish other platforms experimented with semantic reacts as well, instead of just letting people react with any emoji of their choosing, and making you guess whether e.g. "thumbs up" means agreement, acknowledgement, or endorsement, etc.
This was my first time taking this, looking forward to the results!
I know of Robert Miles, and Writer, who does Rational Animations. (In fact Robert Miles' channel is the primary reason I discovered LessWrong :) )
Don't leave me hanging like this, does the movie you are describing exist? (Though I guess your description is a major spoiler, you would need to go in without knowing whether there will be anything supernatural.)
2., 3. and 4. have in common that there is some sort of abusive relationship that develops, and I think this adds another layer of horror. (A person/group of people gain some power over the protagonist(s), and they slowly grow more abusive with this power.)
Somewhat related: does anyone else strongly dislike supernatural elements in horror movies?
It's not that I have anything against a movie exploring the idea of "what if we suddenly discovered that we live in a universe where supernatural thing X exists", but the characters just accept this without much evidence at all.
I would love a movie though where they explore the more likely alternate hypotheses first (mental issues, some weird optical/acoustic phenomenon, or just someone playing a super elaborate prank), but then the evidence starts mounting, and eventually they are forced to accept that "supernatural thing X actually exists" is really the most likely hypothesis.
These examples show that, at least in this lower-stakes setting, OpenAI’s current cybersecurity measures on an already-deployed model are insufficient to stop a moderately determined red-teamer.
I... don't actually see any non-trivial vulnerabilities here? Like, this is all stuff you can do on any cloud VM you rent?
Cool exploration though, and it's certainly interesting that OpenAI is giving you such a powerful VM for free (well actually not because you already pay for GPT-4 I guess?), but I have to agree with their assessment which you found that "it's expected that you can see and modify files on this system".
The malware is embedded in multiple mods, some of which were added to highly popular modpacks.
Any info on how this happened? This seems like a fairly serious supply chain attack. I have heard of incidents with individual malicious packages on npm or PyPI, but not one where multiple high profile packages in a software repository were infected in a coordinated manner.
Uhh this first happening in 2023 was the exact prediction Gary Marcus made last year: https://www.wired.co.uk/article/artificial-intelligence-language
Not sure whether this instance is a capability or alignment issue though. Is the LLM just too unreliable, as Gary Marcus is saying? Or is it perfectly capable, and just misaligned?
I don't see why communicating with an AI through a BCI is necessarily better than through a keyboard+screen. Just because a BCI is more ergonomic and the AI might feel more like "a part of you", it won't magically be better aligned.
In fact the BCI option seems way scarier to me. An AI that can read my thoughts at any time and stimulate random neurons in my brain at will? No, thanks. This scenario just feels like you are handing it the "breaking out of the box" option on a silver platter.
Why is this being downvoted?
From what I am seeing people here are focusing way too much on having a precisely calibrated P(doom) value.
It seems that even if P(doom) is 1% the doom scenario should be taken very seriously and alignment research pursued to the furthest extent possible.
It seems very unlikely to me that, after much careful calibration and research, you would come up with a P(doom) value less than 1%. So why invest time into refining your estimate?
There was a recent post estimating that GPT-3 is equivalent to about 175 bees. There is also a comment there asserting that a human is about 140k bees.
I would be very interested if someone could explain where this huge discrepancy comes from. (One estimate is equating synapses with parameters, while this one is based on FLOPS. But there shouldn't be such a huge difference.)
Indeed (as other commenters also pointed out) the ability to sexually reproduce seems to be much more prevalent than I originally thought when writing the above comment. (I thought that eukaryotes only capable of asexual reproduction were relatively common, but it seems that there may only be very few special cases like that.)
I still disagree with you dismissing the importance of mitochondria though. (I don't think the OP is saying that mitochondria alone are sufficient for larger genomes, but the argument for why they are at least necessary is convincing to me.)
I disagree with English (in principle at least) being inadequate for software specification.
For any commercial software, the specification basically is just "make profit for this company". The rest is implementation detail.
(Obviously this is an absurd example, but it illustrates how you can express abstractions in English that you can't in C++.)
I don't think the comparison of giving an LLM instructions and expecting correct code to be output is fair. You are vastly overestimating the competence of human programmers: when was the last time you wrote perfectly correct code on the very first try?
Giving the LLM the ability to run its code and modify it until it thinks it's right would be a much fairer comparison. And if, as you say, writing unit tests is easy for an LLM, wouldn't that just make this trial-and-error loop trivial? You can just bang the LLM against the problem until the unit tests pass.
(And this process obviously won't produce bug-free code, but humans don't do that in the first place either.)
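A rough sketch of the trial-and-error loop I have in mind, in Python. Everything here is a stand-in: `propose` is a hypothetical callback playing the role of the LLM (it receives the last test failure and returns a new candidate), and `fake_model` just replays three canned attempts so the example runs without any model.

```python
from typing import Callable, Iterator, Optional, Tuple

Candidate = Callable[[int, int], int]


def run_tests(candidate: Candidate) -> Optional[str]:
    """Return None if every check passes, otherwise a failure message."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    for args, expected in cases:
        try:
            got = candidate(*args)
        except Exception as exc:
            return f"{args} raised {exc!r}"
        if got != expected:
            return f"{args}: expected {expected}, got {got}"
    return None


def repair_loop(propose: Callable[[Optional[str]], Candidate],
                max_attempts: int = 5) -> Tuple[Optional[Candidate], int]:
    """Ask for candidates until the tests pass or the attempt budget runs out."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)    # the model sees the previous failure
        feedback = run_tests(candidate)
        if feedback is None:
            return candidate, attempt
    return None, max_attempts


def fake_model() -> Callable[[Optional[str]], Candidate]:
    """Hypothetical stand-in for an LLM: two wrong attempts, then a right one."""
    attempts: Iterator[Candidate] = iter([
        lambda a, b: a - b,
        lambda a, b: a * b,
        lambda a, b: a + b,
    ])
    return lambda feedback: next(attempts)


solution, attempts_used = repair_loop(fake_model())
print(attempts_used)  # 3
```

Note that passing `run_tests` only shows the candidate agrees with the tests, which is exactly the "won't produce bug-free code" caveat.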
Not all eukaryotes employ sexual reproduction. Also prokaryotes do have some mechanisms for DNA exchange as well, so copying errors are not their only chance for evolution either.
But I do agree that it's probably no coincidence that the most complex life forms are sexually reproducing eukaryotes.
I barely registered the difference between small talk and big talk
I am still confused about what "small talk" is after reading this post.
Sure, talking about the weather is definitely small talk. But if I want to get to know somebody, weather talk can't possibly last for more than 30 seconds. After that, both parties have demonstrated the necessary conversational skills to move on to more interesting topics. And the "getting to know each other" phase is really just a spectrum between surface level stuff and your deepest personal secrets, so I don't reall...
It was actually this post about nootropics that got me curious about this. Apparently (based on self reported data) weightlifting is just straight up better than most other nootropics?
Anyway, thank you for referencing some opposing evidence on the topic as well, I might try to look into it more at some point.
(Unfortunately, the thing that I actually care about - whether it has cognitive benefits for me - seems hard to test, since you can't blind yourself to whether you exercised.)
I think this and your other post about exercise are good practical examples of situations where rational thinking makes you worse off (at least for a while).
If you had shown this post to me as a kid, my youth would probably have been better. Unfortunately no one around me was able to make a sufficiently compelling argument for caring about physical appearance. It wasn't until much later that I was able to deduce the arguments for myself. If I just blindly "tried to fit in with the cool kids, and do what is trendy", I would have been better off.
I wonde...
This alone trumps any other argument mentioned in the post. None of the other arguments seem universal; each of them can be argued with on an individual basis.
I actually like doing things with my body. I like hiking and kayaking and mountain climbing and dancing.
As some other commenters noted, what if you just don't?
I think it would be valuable if someone made a post just focused on collecting all the evidence for the positive cognitive effects of exercise. If the evidence is indeed strong, no other argument in favor of exercise should really matter.
Well, I've always been quite skeptical about the supposed huge mental benefits of exercising. I surely don't feel immediate mental benefits while exercising, and the first time I heard someone else claiming this I seriously thought it was a joke (it must be one of those universal human experiences that I am missing).
Anyway, I can offer one reference dug up from SSC:
...Although the role of poor diet/exercise in physical illness is beyond questioning, its role in mental illness is more anecdotal and harder to pin down. Don’t get me wrong, there are lot
FWIW I don't think that matters, in my experience interactions like this arise naturally as well, and humans usually perform similarly to how Friend did here.
In particular it seems that here ChatGPT completely fails at tracking the competence of its interlocutor in the domain at hand. If you asked a human with no context at first they might give you the complete recipe just like ChatGPT tried, but any follow up question immediately would indicate to them that more hand-holding is necessary. (And ChatGPT was asked to "walk me through one step at a time", which should be blatantly obvious and no human would just repeat the instructions again in answer to this.)
Cool! (Nitpick: You should probably mention that you are deviating from the naming in the HoTT book. AFAIK usually $\Pi$ and $\Sigma$ types are called Pi and Sigma types respectively, while the words "product" and "sum" (or "coproduct" in the HoTT book) are reserved for $A \times B$ and $A + B$.)
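To illustrate the naming distinction, here is a small Lean 4 sketch (Lean rather than the HoTT book's notation, so treat the correspondence as approximate). The definition names are made up for illustration; `Sigma`, `Prod` and `Sum` are the core library types.

```lean
-- Pi type: a dependent function, where the result type `B n` depends on the argument.
def PiExample (B : Nat → Type) : Type := (n : Nat) → B n

-- Sigma type: a dependent pair, where the type of the second component
-- depends on the value of the first.
def SigmaExample (B : Nat → Type) : Type := Sigma fun n : Nat => B n

-- The non-dependent special cases: ordinary product and sum (coproduct).
def ProdExample (A B : Type) : Type := A × B
def SumExample (A B : Type) : Type := Sum A B
```

When `B` ignores its argument, the Pi type collapses to a plain function type and the Sigma type to a plain product, which is presumably where the alternative "product"/"sum" naming comes from.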
I am especially looking forward to discussion on how MLTT relates to alignment research and how it can be used for informal reasoning as Alignment Research Field Guide mentions.
I always get confused when the term "type signature" is used in text unrelated to type theory. Like what do peop...
This argument seems a bit circular: nondeterminism is indeed a necessary condition for exfiltrating outside information, so obviously if you prevent all nondeterminism you prevent exfiltration.
You are also completely right that removing access to obviously nondeterministic APIs would massively reduce the attack surface. (AFAIK most known CPU side-channel attacks require timing information.)
But I am not confident that this kind of attack would be "robustly impossible". All you need is to find some kind of nondeterminism that can be used as a janky timer and suddenl...
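As an illustration of what a "janky timer" could look like, here is a minimal Python sketch that uses thread scheduling as the source of nondeterminism. This is my own toy example, not something from the post: `CounterClock`, `busy` and `measure` are made-up names, and the code never calls a clock API. A background thread just increments a counter, and the main thread reads it before and after some work.

```python
import threading
from typing import Callable


class CounterClock:
    """A free-running counter thread standing in for a clock (illustrative only)."""

    def __init__(self) -> None:
        self.ticks = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            self.ticks += 1  # single writer; the main thread only reads

    def __enter__(self) -> "CounterClock":
        self._thread.start()
        return self

    def __exit__(self, *exc_info: object) -> None:
        self._stop.set()
        self._thread.join()


def busy(n: int) -> None:
    """Pure-Python busy loop, so the interpreter can switch threads while it runs."""
    for _ in range(n):
        pass


def measure(clock: CounterClock, work: Callable[[], None]) -> int:
    """Number of counter ticks that elapsed while `work` ran."""
    before = clock.ticks
    work()
    return clock.ticks - before


with CounterClock() as clock:
    short_ticks = measure(clock, lambda: busy(200_000))
    long_ticks = measure(clock, lambda: busy(5_000_000))

print(short_ticks, long_ticks)
```

The readings are noisy and differ from run to run, but longer work tends to produce more ticks, which is all a timing attack needs. So a sandbox would have to rule out this kind of concurrency too, not just the obvious clock APIs.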
Yes, CPUs leak information: that is the output kind of side-channel, where an attacker can transfer information about the computation into the outside world. That is not the kind I am saying one can rule out with merely diligent pursuit of determinism.
I think you are misunderstanding this part. Input side channels absolutely exist as well, Spectre for instance:
On most processors, the speculative execution resulting from a branch misprediction may leave observable side effects that may reveal private data to attackers.
Note that the attacker in this c...
This implies that we could use relatively elementary sandboxing (no clock access, no networking APIs, no randomness, none of these sources of nondeterminism, and that’s about it) to prevent a task-specific AI from learning any particular facts
It's probably very hard to create such a sandbox though, your list is definitely not exhaustive. Modern CPUs leak information like a sieve. (The known ones are mostly patched of course but with this track record plenty more unknown vulnerabilities should exist.)
Maybe if you build the purest lambda calculus interpre...
Also I just found that you already argued this in an earlier post, so I guess my point is a bit redundant.
Anyway, I like that this article comes with an actual example, we could probably use more examples/case studies for both sides of the argument.
Upon reading the title I actually thought the article would argue the exact opposite, that formalization affects intuition in a negative way. I like non-Euclidean geometry as a particular example where formalization actually helped discovery.
But this is definitely not always true. For instance, if you wanted to intuitively understand why addition of naturals is commutative, maybe to build intuition for recognizing similar properties elsewhere, would this formal proof really help?
plus_comm =
fun n m : nat =>
nat_ind (fun n0 : nat => n0 + m = m + n0)
... Isn't this similar to a Godzilla Strategy? (One AI overseeing the other.)
That variants of this approach are of use to superintelligent AI safety: 40%.
Do you have some more detailed reasoning behind such massive confidence? If yes, it would probably be worth its own post.
This seems like a cute idea that might make current LLM prompt filtering a little less circumventable, but I don't see any arguments for why this would scale to superintelligent AI. Am I missing something?
Collaborating with an expert/getting tutoring from an expert might be really good?
Probably. How does one go about finding such experts, who are willing to answer questions/tutor/collaborate?
(I think the usual answer to this is university, but to me this does not seem to be worth the effort. Like I maybe met 1-2 people at uni who would qualify for this? How do you find these people more effectively? And even when you find them, how do you get them to help you? Usually this seems to require luck & significant social capital expenditure.)
I unfortunately don't have any answers, just some more related questions:
ability to iterate in a fast matter
This is probably key. If GPT can solve something much faster that's indeed a win. (With the SPARQL example I guess it would take me 10-20 minutes to look up the required syntax and fields, and put them together. GPT cuts that down to a few seconds, this seems quite good.)
My issue is that I haven't found a situation yet where GPT is reliably helpful for me. Maybe someone who has found such situations, and reliably integrated "ask GPT first" as a step into some of their workflows could give their account? I would genuine...
Yeah I guess many programming problems fall into the "easy to verify" category. (Though definitely not all.)
And apparently ChatGPT will shut you right down when attempting to ask for sources:
I'm sorry, but I am unable to provide sources for my claims as I am a large language model trained by OpenAI and do not have the ability to browse the internet. My answers are based on the information I have been trained on, but I cannot provide references or citations for the information I provide.
So... if you have to rigorously fact-check everything the AI tells you, how exactly is it better than just researching things without the AI in the first place? (I guess you need a domain where ChatGPT has adequate knowledge and claims in said domain are easily verifiable?)
Wow had this happen literally on my first interaction with ChatGPT. It seems to be just making stuff up, and won't back down when called out.
--extract flag. (At least not on my machine, I have coq version 8.16.0)
After a bit of testing, ChatGPT seems pretty willing to admit mistakes early in the conversation. However, after the conversation goes on for a while, it seems to get more belligerent. Maybe repeating a claim makes ChatGPT more certain of the claim?
At the start, it seems well aware of its own fallibility:
In the abstract:
In a specific case:
Doesn't mind being called a liar:
Open to corrections:
We start to see more tension when the underlying context of the conversation differs between the human and ChatGPT. Are we talking about the most commonly encountered s...
Wow, this is the best one I've seen. That's hilarious. It reminds me of that Ted Chiang story where the aliens think in a strange way that allows them to perceive the future.
I would definitely be interested if you found a way to self-review recordings of your social interactions for improvement. The main roadblock I see is that either you tell the other parties that you are recording, which will probably influence their behavior a lot and erase most of the signal you were looking for in the first place, or you don't, which does feel a bit unethical.