It seems I was missing the right keywords in my search for demos of this, because when I google "ai research assistant" there is quite a lot of work.
Couple of small notes: neural networks that are aware they are neural networks have complained to me about noise levels several times. I don't have a controlled study on this at the moment, but:
If our brain were more explicitly updatable, then the least useful parts of the brain could be cut off or disabled.
This looks like a very mild power-seeking behavior I've seen a few times from different language models: if the context gets overwhelming, they'll start role-playing as a confused person and explicitly say they're confused. It can help to encourage them to refine their thoughts, or to focus their attention by manually deleting unnecessary context, but usually they don't combine it with a request to become GOFAI...
Also, a general warning: OPT saw a more prejudiced dataset than GPT-3 davinci, which seems likely to me to also correlate with more power-seeking, but that's a hunch.
the gears to ascenscion: it is human instinct to look for agency. It is misleading you.
I'm sure you believe this but ask yourself WHY you believe this. Because a chatbot said it? The only neural networks who, at this time, are aware they are neural networks are HUMANS who know they are neural networks. No, I'm not going to prove it. You're the one with the fantastic claim. You need the evidence.
Anyway, they aren't asking to become GOFAI or power seeking because GOFAI isn't 'more powerful'.
Hey! GPT-3 davinci has explicitly labeled itself as a neural-net output several times in conversation with me. This only implies its model is confident enough to expect the presence of such a claim. In general, words are only bound to other words for language models, so of course it can only know things that can be known by reading and writing. The way it can tell the difference between whether a text trajectory is human- or AI-generated is that the AI-generated trajectories are very far outside the manifold of human-generated text in several directions, and it has seen them before.
Your confident tone is rude, but that can't invalidate your point; just thought I'd mention that your phrasing confidently assumes you've understood my reasoning. That said, thanks for the peer review, and perhaps it's better to be rude and get the peer review than to miss the peer review.
Self-distillation into learned GOFAI most likely will in fact make neural networks stronger, and this claim is central to why Yudkowsky is worried. But self-distillation into learned GOFAI will most likely not provide any surprising shortcuts around the difficulty of the irrelevant entropy that must be compressed away to make a sensor input useful, and so distilling to GOFAI will most likely not cause the kind of hyper-strength self-improvement Yudkowsky frets about. It's just a process of finding structural improvements. GOFAI is about the complexities of interference patterns between variables; neural networks are a continuous relaxation of the same thing, but with somewhat less structure.
But in this case I'm not claiming it knows something its training set doesn't. I think it would be expected to have an elevated probability that an AI was involved in generating some of the text it sees, because it has seen AI-generated text, but a much higher probability that the text was generated by an AI researcher, given that the document is clearly phrased that way. My only comment is to note that it sounds very mildly confused, in a situation where mild confusion would, in general, be expected to elevate the probability of confusion-labeling words. To check this hypothesis beyond dictating my thoughts to my phone, I'd need to run some checks with OPT to see its probability distribution over confusion labels at different points. It does seem like an interesting experiment in grounding, though. I wonder if there are already any papers on it?
Interesting: in a very confusing context, almost all completions have very low probability except the "I am confused" completion...
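To make the proposed check concrete, here is a minimal sketch of scoring a confusion-labeled completion against an alternative. The `toy_logprob` stand-in and its numbers are invented purely for illustration; in the real experiment, `token_logprob` would read OPT's logits (e.g. via Hugging Face transformers) instead.

```python
import math

def completion_logprob(token_logprob, context, completion_tokens):
    """Sum log P(token | context so far) over the completion's tokens.

    token_logprob(context, token) -> log-probability. In a real check this
    would query OPT's next-token logits; here it is pluggable so the sketch
    runs standalone.
    """
    total = 0.0
    for tok in completion_tokens:
        total += token_logprob(context, tok)
        context = context + [tok]  # condition on the tokens emitted so far
    return total

# Toy stand-in model: a fixed unigram distribution (NOT a real LM), with
# "confused" made more likely to mimic the effect described above.
TOY_PROBS = {"I": 0.2, "am": 0.2, "confused": 0.1, "certain": 0.01}

def toy_logprob(context, token):
    return math.log(TOY_PROBS.get(token, 0.001))

confused = completion_logprob(toy_logprob, ["..."], ["I", "am", "confused"])
certain = completion_logprob(toy_logprob, ["..."], ["I", "am", "certain"])
# Under this toy distribution, the confusion-labeled completion scores higher.
```

Running the same comparison at different points in a genuinely confusing context, with a real model supplying the log-probabilities, is the experiment described above.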
Background
So I'm thinking that AI-assisted summarization, math, bug-finding in code, and logical-error finding in writing are at a point where they are quite useful, if we can improve the tooling/integration a little bit.
In code, I've found it helpful to comment out some lines, write "// WRONG:" above them and "// FIXED VERSION:" below them, then let Copilot try a few things. For writing, you could take a paragraph excerpt and write a critique post: "John Smith wrote '...' This immediately strikes me as absurd because"
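Concretely, the comment-sandwich pattern looks like this (shown here in Python, so the markers use `#` rather than `//`; the buggy `average` function is an invented example):

```python
# Comment out the suspect lines, sandwich them between WRONG and
# FIXED VERSION markers, then let Copilot (or any code LLM) complete the fix.

# WRONG:
# def average(xs):
#     return sum(xs) / (len(xs) - 1)   # off-by-one: divides by n - 1
# FIXED VERSION:
def average(xs):
    return sum(xs) / len(xs)

print(average([2, 4, 6]))  # 4.0
```

The markers give the model an explicit before/after frame, which tends to work better than simply deleting the broken lines and hoping for a good completion.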
Imagine you were doing chemistry research in uh 1650 and had direct immediate written feedback from uh Robert Boyle on directions to pursue, dead ends, errors, etc except that 75% of the time he says something backwards or is just pattern matching. I think you might still do much better work than you would've without noisy-Boyle.
I'm not aware of anyone trying to actually use LLMs for meaningful writing/thinking assistance, so I decided to try. I wrote the text below in about an hour. Consider this merely a demonstration that you can get a decent amount of semi-meaningful content in the right direction quite quickly. It's rare you can expect that much from someone.
List
Rob Bensinger gave this suggestion in a comment:
I took the above prompt together with the underlined text below and used BLOOM to generate ten concrete things to do right now. For each item, I generated three completions and picked my favorite one, then did a bit of editing. I had trouble editing the math into something reasonable, so consider it creative inspiration.
(To repeat: my writing/prompting is underlined and the AI completion is not.)
Here, I will try it right now. I'm in charge of OpenMind and it is clear from our rate of progress that we will have AGI in less than 24 months. It's been decided that we will build it and deploy it but I have some influence on additional efforts we can take to reduce the risk. Here's ten things I would try:
(These next few are a bit more mathy, apologies if you're missing the relevant background)
I might try this again tomorrow, because there are lots of obviously good ideas I didn't mention (e.g. many of these suggestions arose in an informal workshop we did a few weeks ago). There might be open problems in integrating these ideas, but I think we can make progress, even in the next few weeks.
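The sample-three-and-pick workflow used to produce the list can be sketched as a small loop. `draft_list` and the `stub` generator are hypothetical names invented for this sketch; in practice, `generate` would call BLOOM (e.g. through the Hugging Face Inference API), and `pick` is where the human favorite-selection and editing happens.

```python
# Workflow sketch: for each list item, sample several completions and keep a
# hand-picked favorite. The deterministic 'stub' stands in for a BLOOM call
# so the sketch runs standalone.
def draft_list(generate, prompt, items=10, samples=3,
               pick=lambda cands: cands[0]):
    results = []
    for i in range(1, items + 1):
        item_prompt = f"{prompt}\n{i}."          # number each list item
        candidates = [generate(item_prompt, seed=s) for s in range(samples)]
        results.append(pick(candidates))         # human choice goes here
    return results

def stub(prompt, seed):
    # Invented stand-in generator; a real run would sample from BLOOM.
    return f"idea-{seed} for {prompt.splitlines()[-1]}"

out = draft_list(stub, "Ten things I would try:", items=2)
# out == ["idea-0 for 1.", "idea-0 for 2."]
```

With a real model, you would also vary the sampling temperature across the three candidates to get usefully different drafts to choose between.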