How far away is this from being implementable?
This probably won't add too much to the discussion, but I'm curious whether other people relate to this or have a similar process. I was kind of stunned when I heard from friends who got into composing about how difficult it is to figure out a melody and then write a complete piano piece, because to me, whenever I open up Sibelius or Dorico (and more recently Ableton), internally it seems like I'm just listening to what I wrote so far, 'hearing' a possible continuation lasting a few bars, and then quickly trying to transcribe ...
So I've figured this out. Kinda. If you choose 'custom' then it will give you Griffin, but if you choose one of the conventional prompts and then edit it, you can get around it. So damn annoying.
Wow, I didn't realise I could get this angry about something so esoteric.
I'm beginning to think AID has changed what the "Dragon" model is without telling us for cost reasons, I've had kind of the same experience with big lapses in storytelling that didn't occur as often before. Or maybe it's randomly switching based on server load? I can kind of understand it if that's the case but the lack of transparency is annoying. I remember accidentally using the Griffin model for a day when my subscription ran out and not realising because its Indonesian was still quite good...
Somehow the more obvious explanation didn't occur to me until now, but check the settings, you might be using the Griffin model not the Dragon model. You have to change it manually even after you get the subscription. I have a window open specifically for poetry prompts (using the Oracle hack), I said "Write a long poem in Russian. Make sure the lines are long, vivid, rich, and full of description and life. It should be a love poem addressed to coffee. It should be 15 lines long" followed with "The Oracle, which is a native in Russian, ...
If it's a BPE encoding thing (which seems unlikely to me, given that it was able to produce Japanese and Chinese characters just fine), then the implication is that OpenAI carried over their encoding from GPT-2, where all foreign-language documents were removed from the dataset ... I would have trouble believing their team would have overlooked something that huge. This is doubly bizarre given that Russian is the fifth- or sixth-most-common language in the dataset. You may want to try prompting it with coherent Russian text; my best guess is that in the dataset, whene...
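For what it's worth, the byte-level fallback cost is easy to see with nothing but the standard library. This is a toy illustration of the arithmetic, not the actual OpenAI tokenizer: if the merge table contains few Cyrillic merges, the tokenizer falls back towards raw UTF-8 bytes, and every Cyrillic character costs two bytes.

```python
# Toy illustration of why a byte-level BPE with few Cyrillic merges is
# costly for Russian: each Cyrillic character is 2 bytes in UTF-8, so a
# tokenizer falling back to raw bytes pays ~2 tokens per character.
# (NOT the real GPT-2/3 tokenizer, just the underlying byte arithmetic.)

def worst_case_token_count(text: str) -> int:
    """Raw UTF-8 byte count = upper bound on tokens if no merges apply."""
    return len(text.encode("utf-8"))

print(worst_case_token_count("coffee"))  # 6 bytes for 6 Latin characters
print(worst_case_token_count("кофе"))    # 8 bytes for 4 Cyrillic characters
```

So even before any modelling happens, an unmerged-byte tokenizer halves the effective context window for Russian relative to English.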
That's a visualisation I made which I haven't posted anywhere else except under the r/ML thread collecting entries for GPT-3 demos, since I couldn't figure out which subreddit to post it in.
Two thoughts, one of them significantly longer than the other since it's what I'm most excited about.
(1) It might be the case that tasks currently showing an asymptotic trend will follow the pattern of arithmetic: a qualitative breakthrough was needed, which was out of reach at smaller model sizes but became possible past a certain threshold.
(2) For translation, I can definitely say that scaling is doing something. When you narrowly define translation as BLEU score ("does this one generated sentence match the reference sentence? by how ...
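To make the "narrow definition" concrete, here's a minimal sentence-level BLEU sketch: modified n-gram precision times a brevity penalty. Real BLEU is corpus-level, goes up to 4-grams, and uses smoothing; the function name and defaults here are mine.

```python
import math
from collections import Counter

def bleu(candidate: list, reference: list, max_n: int = 2) -> float:
    """Toy sentence-level BLEU: modified n-gram precision + brevity penalty.
    (Real BLEU is corpus-level with smoothing; this just shows the idea.)"""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i+n]) for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i+n]) for i in range(len(reference) - n + 1))
        # Clip each n-gram's count by its count in the reference ("modified" precision).
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat".split(), "the cat sat on the mat".split()))  # 1.0
```

The point being that BLEU only rewards surface n-gram overlap with one reference sentence, which is a much narrower target than "is this a good translation".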
I just finished Iain M. Banks' 'The Player of Games', so my thoughts are being influenced by that, but it had an interesting main character who made it his mission to become the best "general game-player" (i.e. no specialising in specific games), so I would be interested to see whether policy-based reinforcement learning models scale (thinking of how Agent57 exceeded human performance across all Atari games).
It seems kind of trivially true that a large enough MuZero with some architectural changes could do something like play chess,...
Yes! I was thinking about this yesterday. It occurred to me that GPT-3's difficulty with rhyming consistently might not just be a byte-pair problem: any highly structured text with extremely specific, restrictive forward and backward dependencies is going to be a challenge if you're just linearly appending one token at a time onto a sequence without the ability to revise it (maybe we should try a 175-billion-parameter BERT?). That explains and predicts a broad spectrum of issues and potential solutions (here I'm calling them A, B and C): per...
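A toy way to see the "no revision" point: a greedy left-to-right decoder commits each token using only the prefix, so a constraint that only bites at the end of a line (like a rhyme) can never reach back and change what was already emitted. Sketch with a made-up stand-in scorer (`toy_scores` is hypothetical, not a real model):

```python
# Minimal left-to-right greedy decoder: each step conditions only on the
# prefix and appends one token; nothing already emitted can be revised.

def greedy_decode(next_token_scores, prompt, length):
    tokens = list(prompt)
    for _ in range(length):
        scores = next_token_scores(tokens)           # sees only the prefix
        tokens.append(max(scores, key=scores.get))   # commit; no backtracking
    return tokens

def toy_scores(prefix):
    # Hypothetical stand-in for a language model that always prefers "day",
    # even when an earlier line would force the final word to rhyme with "moon".
    return {"day": 0.9, "moon": 0.1}

print(greedy_decode(toy_scores, ["the"], 3))  # ['the', 'day', 'day', 'day']
```

An order-agnostic masked model (the BERT suggestion above) could instead fill the rhyme-constrained position first and condition the rest of the line on it.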
The best angle of attack here, I think, is synthesising knowledge from multiple domains. I was able to get GPT-3 to write and then translate a Japanese poem about a (fictional) ancient language model into Chinese, Hungarian, and Swahili, and annotate all of its translations with stylistic notes and historical references. I don't think any human has the knowledge required to do that, but unsurprisingly GPT-3 does, and it performed better when I used the premise of multiple humans collaborating. It's said that getting different university departments...
I think you were pretty clear on your thoughts, actually. The easy, low-level response to some of your skeptical points would be technical details, so I'm going to do that and then follow it with a higher-level, more conceptual response.
The source of a lot of my skepticism is GPT-3's inherent inconsistency. It can range wildly from high-quality output to gibberish, repetition, regurgitation, etc. If it did have some reasoning process, I wouldn't expect such inconsistency. Even when it is performing so well people call it &...
Hmm, I think the purpose behind my post went amiss. The point of the exercise is process-oriented, not result-oriented: to either learn to better differentiate the concepts in your head by poking and prodding at them with concrete examples, or to realise that they aren't quite distinct at all. But in any case, I have a few responses to your question. The most relevant one was covered by another commenter (reasoning ability isn't binary: the difference is quantitative, not qualitative). The remaining two are:
1. "Why isn't it an AGI?" here can be read as &...
Great, but the terms you're operating with here are kind of vague. What problems could you give GPT-3 that would tell you whether it was reasoning, versus "recognising and predicting", passive "pattern-matching", or presenting an "illusion of reasoning"? This was a position I subscribed to until recently, when I realised that every time I saw GPT-3 perform a reasoning-related task, I automatically went "oh, but that's not real reasoning, it could do that just by pattern-matching", and when I saw it do some...
A bunch more examples here, a bit difficult to summarise since it ranged from explaining how dopamine receptors work, to writing a poem about Amazon's logistics in the form of a paean to the Moon Goddess, to writing poems in Chinese based on English instructions and then providing astonishingly good translations, to having Amazon and Alibaba diss one another in the style of the 18th-century poet Mary Robinson. Link here: https://www.reddit.com/r/slatestarcodex/comments/hrx2id/a_collection_of_amazing_things_gpt3_has_done/fy7i7im/?context=3
Example:
The oracle...
'Predicting random text on the internet better than a human' already qualifies it as superhuman, as dirichlet-to-neumann pointed out. Any given text required a certain amount of cognitive work per word to produce. "Superhuman" only requires asking it to replicate the work of multiple people collaborating, or processes that need a lot of human labour, like putting together a business strategy or writing a paper. Assuming it's mediocre in some respects, the clearest advantage GPT-6 would have would be an interdisciplinary one: pooling domain knowledge from disparate areas to produce valuable new insights.