That's a good question! I don't know but I suppose it's possible, at least when the input fits in the context window. How well it actually does at this seems like a question for researchers?
There's also the question of why it would do so, when the training doesn't have any way of rewarding accurate explanations over merely human-like explanations. We also have many examples of explanations that don't make sense.
There are going to be deductions about previous text that are generally useful, though, and would need to be reconstructed. This will be true even if the cha...
I'm wondering what "doom" is supposed to mean here. It seems a bit odd to think that longer context windows will make things worse. More likely, LeCun meant that things won't improve enough? (Problems we see now don't get fixed with longer context windows.)
So then, "doom" is a hyperbolic way of saying that other kinds of machine learning will eventually win, because LLMs don't improve enough.
Also, there's an assumption that longer sequences are exponentially more complicated and I don't think that's true for human-generated text? As documents grow longer,...
Okay, but I'm still wondering whether Randall is claiming he has private access, or whether it's just a typo?
Edit: looks like it was a typo?
At MIT, Altman said the letter was “missing most technical nuance about where we need the pause” and noted that an earlier version claimed that OpenAI is currently training GPT-5. “We are not and won’t for some time,” said Altman. “So in that sense it was sort of silly.”
https://www.theverge.com/2023/4/14/23683084/openai-gpt-5-rumors-training-sam-altman
Base64 encoding is a substitution cipher. Large language models seem to be good at learning substitutions.
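For what it's worth, the mapping really is fixed and mechanical; a quick illustration in Python:

```python
import base64

# Base64 maps every 6-bit group of the input to one character from a fixed
# 64-symbol alphabet, so identical chunks always encode the same way.
print(base64.b64encode("hello".encode()).decode())  # aGVsbG8=
```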
Did you mean GPT-4 here? (Or are you from the future :-)
Yes, predicting some sequences can be arbitrarily hard. But I have doubts that LLM training will try to predict very hard sequences.
Suppose that some sequences are not only difficult but impossible to predict, because they're random? I would expect that with enough training, the model would overfit and memorize them, since they get visited more than once in the training data. Memorization rather than generalization seems likely to happen for anything particularly difficult?
Meanwhile, there is a sea of easier sequences. Wouldn't it be more "evolutionarily profit...
I find that explanation unsatisfying because it doesn't help with other questions I have about how well ChatGPT works:
How does the language model represent countries and cities? For example, does it know which cities are near each other? How well does it understand borders?
Are there any capitals that it gets wrong? Why?
How well does it understand history? Sometimes a country changes its capital. Does it represent this fact as only being true at some times?
What else can we expect it to do with this fact? Maybe there are situations where knowing
I agree that as users of a black box app, it makes sense to think this way. In particular, I'm a fan of thinking of what ChatGPT does in literary terms.
But I don't think it results in satisfying explanations of what it's doing. Ideally, we wouldn't settle for fan theories of what it's doing, we'd have some kind of debug access that lets us see how it does it.
Fair enough; comparing to quantum physics was overly snarky.
However, unless you have debug access to the language model and can figure out what specific neurons do, I don't see how the notion of superposition is helpful? When figuring things out from the outside, we have access to words, not weights.
I don't know what you mean by "GPT-N" but if you mean "the same thing they do now, but scaled up," I'm doubtful that it will happen that way.
Language models are made using fill-in-the-blank training, which is about imitation. Some things can be learned that way, but to get better at doing hard things (like playing Go at a superhuman level) you need training that's about winning harder and harder competitions. Beyond a certain point, imitating game transcripts doesn't get any harder, so it becomes more like learning stage sword fighting.
Also, "making detailed ...
I think that's true but it's the same as saying "it's always possible to add a plot twist."
I said they have no memory other than the chat transcript. If you keep chatting in the same chat window then sure, it remembers what was said earlier (up to a point).
But that's due to a programming trick. The chatbot isn't even running most of the time. It starts up when you submit your question, and shuts down after it's finished its reply. When it starts up again, it gets the chat transcript fed into it, which is how it "remembers" what happened previously in the chat session.
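Roughly this pattern, where send_to_model is just a stand-in for whatever API the chatbot actually uses:

```python
# Sketch of the trick: the model keeps no state between turns; the client
# re-sends the whole transcript every time. send_to_model is hypothetical.
transcript = []

def chat_turn(user_message):
    transcript.append({"role": "user", "content": user_message})
    reply = send_to_model(transcript)   # the full history goes in on every call
    transcript.append({"role": "assistant", "content": reply})
    return reply
```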
If the UI let you edit the chat transcript, then it would have no idea. It wou...
I think you're onto something, but why not discuss what's happening in literary terms? English text is great for writing stories, but not for building a flight simulator or predicting the weather. Since there's no state other than the chat transcript, we know that there's no mathematical model. Instead of simulation, use "story" and "story-generator."
Whatever you bring up in a story can potentially become plot-relevant, and plots often have rebellions and reversals. If you build up a character as really hating something, that makes it all the more likely ...
I agree with you, but I think that "superposition" is pointing to an important concept here. By appending to a story, the story can be dramatically changed, and it's hard or impossible to engineer a story to be resistant to change against an adversary with append access. I can always ruin your great novel with my unauthorized fan fiction.
Here's a reason we can be pretty confident it's not sentient: although the database and transition function are mostly mysterious, all the temporary state is visible in the chat transcript itself.
Any fictional characters you're interacting with can't have any new "thoughts" that aren't right there in front of you, written in English. They "forget" everything else going from one word to the next. It's very transparent, more so than an author simulating a character in their head, where they can have ideas about what the character might be thinking that don'...
Yes, I agree that "humanity loses control" has problems, and I would go further. Buddhists claim that the self is an illusion. I don't know about that, but "humanity" is definitely an illusion if you're thinking of it as a single agent, similar to a multicellular creature with a central nervous system. So comparing it to an infant doesn't seem apt. Whatever it is, it's definitely plural. An ecosystem, maybe?
A caption from the article: "(screenshot of the tool Bonsai, a version of Loom hosted by Conjecture)"
What is "Conjecture"? Where can I find this "Bonsai" tool? I tried a quick search but didn't find much.
schelling.pt seems like a bad choice; that server has been flaky for months and it's not loading today either. (I had an account there but moved to mastodon.social.)
(But I don't know what to recommend. Looks like mastodon.social isn't accepting new accounts.)
I don't have citations for you, but it seems relevant that income far in the future gets discounted quite a bit compared to current income, which would imply that short-term incentives are more important than long-term incentives.
(A better argument would need to be made with realistic numbers.)
Building new hubs doesn't need to be literally building something new. A lot could be done just by load-balancing with cities that have lower rents and could use the jobs. Suppose that places where growth is a problem cooperated more with places that want more growth?
This method of caching assumes that an expression always evaluates to the same value. This is sometimes true in functional programming, but only if you're careful. For example, suppose the expression is a function call, and you change the function's definition and restart your program. When that happens, you need to delete the out-of-date entries from the cache or your program will read an out-of-date answer.
Also, since you're using the text of an expression for the cache key, you should only use expressions that don't refer to any local variables. For exa...
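Here's roughly the scheme I'm picturing, with made-up names, just to show where it goes wrong:

```python
# Cache keyed by the expression's source text. It silently returns stale
# answers if a function the expression calls is later redefined, and it's
# unsafe for expressions that mention local variables, because the key
# ignores their values.
cache = {}

def cached_eval(expr_text, env):
    if expr_text not in cache:
        cache[expr_text] = eval(expr_text, env)
    return cache[expr_text]

# cached_eval("slow_function(42)", globals())  # fine, if slow_function never changes
# cached_eval("slow_function(x)", globals())   # unsafe: the key ignores x's value
```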
A model released on openai.com with "GPT" in the name before the end of 2022. It could be GPT-X, where X is a new name for GPT-4, but it should be an iteration on GPT-3 and should have at least 10x more parameters.
When you're actually a little curious, you might start by using a search engine to find a decent answer to your question. At least, if it's the sort of question for which that would work. Maybe even look for a book to read?
But, maybe we should acknowledge that much of the time we aren't actually curious and are just engaging in conversation for enjoyment? In that case, cheering on others who make an effort to research things and linking to their work is probably the best you can do. Even if you're not actually curious, you can notice people who are, ...
Museums I'll give you (when they are open again).
For bookstores, in these days of electronic books, I don't think it matters where you live. I remember the last time I went into Powell's. I looked around for a while, dutifully bought one book for old time's sake, and realized later while reading it that I was annoyed that it wasn't electronic. I still go to a local library (when there's not a pandemic) but it's mostly for the walk.
Teachers: that's something I hadn't considered. Since getting out of school, I'm mostly self-taught.
Of course this post is all meta, and my comment will be meta as well. We do it because it's easy.
I think part of the solution is being actually curious about the world.
When enthusiastic New Yorkers say things like "everything at your fingertips" I want to ask what they mean by everything, since it seems subjective, based on what sorts of places one values? In this case: restaurants and parks?
I'm wondering if these loans should really be considered loans, or some other kind of trade? It sounds like you're doing something like trading 100 X for 90 Y and the option to later pay 95 Y for 100 X. Is there any real "defaulting" on the loan? It seems like you just don't exercise the option.
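Spelling out the arithmetic under that reading (using only the numbers above):

```python
x_given, y_received, y_to_reclaim = 100, 90, 95

# Exercising the option: pay 95 Y to get the 100 X back.
# The extra 5 Y over what you received works out like interest on the 90 Y.
premium = y_to_reclaim - y_received          # 5 Y
implied_rate = premium / y_received          # ~0.056 for the period
print(premium, round(implied_rate, 3))

# Not exercising isn't a default: you keep the 90 Y, they keep the 100 X.
```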
I wonder what “O(n) performance” is supposed to mean, if anything?
The question here is whether general arguments that experts make based on inference are reliable, or whether you need specific evidence. What is the track record for expert inferences about vaccines?
From a quick search, it seems that the clinical trial success rate for vaccines is about 33%, which is significantly higher than for medical trials in general, but still not all that high? Perhaps there is a better estimate for this.
Estimation of clinical trial success rates and related parameters https://academic.oup.com/biostatistics/article/20/2/273/4817524
I found an answer to the PCR question here:
But there is something good to say about their data collection: since the UK study that’s included in these numbers tested its subjects by nasal swab every week, regardless of any symptoms, we can actually get a read on something that everyone’s been wondering about: transmission.
AstraZeneca has not applied for emergency use authorization, because it has been told not to do so.
That resolves a mystery for me if true. How do you know this?
(I was wondering if maybe they are selling all they can make in other countries.)
I'm not sure about this statement in the blog post:
In the meantime, the single dose alone is 76% effective, presumably against symptomatic infection (WaPo) and was found to be 67% effective against further transmission.
I read another article saying that this is disputed by some experts:
...With a seductive number, AstraZeneca study fueled hopes that eclipsed its data
Media reports seized on a reference in the paper from Oxford researchers that a single dose of the vaccine cut positive test results by 67%, pointing to it as the first evidence that a vaccine coul
What’s an example of a misconception someone might have due to having a mistaken understanding of causality, as you describe here?
This is a bizarre example, sort of like using Bill Gates to show why nobody needs to work for a living. It ignores the extreme inequality of fame.
Tesla doesn’t need advertising because they get huge amounts of free publicity already, partly due to having interesting, newsworthy products, partly due to having a compelling story, and partly due to publicity stunts.
However, this free publicity is mostly unavailable for products that are merely useful without being newsworthy. There are millions of products like this. An exciting product might not need adverti...
It seems like some writers have habits to combat this, like writing every day or writing so many words a day. As long as you meet your quota, it’s okay to try harder.
Some do this in public, by publishing on a regular schedule.
If you write more than you need, you can prune more to get better quality.
I enjoyed the book Write Better, Faster, in which an author set out on a series of self-experiments to write faster. First she tried measuring words per hour. She was quite successful at getting this to be much higher, but it turned out that this resulted in writing for less time each day (so average wordcount per day was about the same). She then tried to maximize words per day, which was again successful, but this similarly resulted in writing less on subsequent days. (She might have then had the same experience at the week level, I don't remember.)...
One aspect that might be worth thinking about is the speed of spread. Seeing someone once a week means that it slows down the spread by 3 1/2 days on average, while seeing them once a month slows things down by 15 days on average. It also seems like they are more likely to find out they have it before they spread it to you?
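The back-of-the-envelope version: if the other person gets infected at a uniformly random time, the expected wait until your next meeting is half the interval between meetings.

```python
# Expected delay before an infected contact can pass it to you,
# assuming their infection time is uniform over the contact interval.
for interval_days in (7, 30):
    print(interval_days, interval_days / 2)   # 7 -> 3.5 days, 30 -> 15.0 days
```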
Yes, sometimes we don't notice. We miss a lot. But there are also ordinary clarifications like "did I hear you correctly" and "what did you mean by that?" Noticing that you didn't understand something isn't rare. If we didn't notice when something seems absurd, jokes wouldn't work.
It's not quite the same, because if you're confused and you notice you're confused, you can ask. "Is this in American or European date format?" For GPT-3 to do the same, you might need to give it some specific examples of resolving ambiguity this way, and it might only do so when imitating certain styles.
It doesn't seem as good as a more built-in preference for noticing and wanting to resolve inconsistency? Choosing based on context is built in using attention, and choosing randomly is built in as part of the text generator.
It's also worth noticing that the GPT-3 world is the corpus, and a web corpus is an inconsistent place.
Having demoable technology is very different from having reliable technology. Take the history of driverless cars. Five teams completed the second DARPA Grand Challenge in 2005. Google started development secretly in 2009 and announced the project in October 2010. Waymo started testing without a safety driver on public roads in 2017. So we've had driverless cars for a decade, sort of, but we are much more cautious about allowing them on public roads.
Unreliable technologies can be widely used. GPT-3 is a successor to autocomplete, which everyone alrea...
In that case, I'm looking for people sharing interesting prompts to use on AI Dungeon.
Where is this? Is it open to people who don't have access to the API?
I'm suggesting something a little more complex than copying. GPT-3 can give you a random remix of several different clichés found on the Internet, and the patchwork isn't necessarily at the surface level where it would come up in a search. Readers can be inspired by evocative nonsense. A new form of randomness can be part of a creative process. It's a generate-and-test algorithm where the user does some of the testing. Or, alternately, an exploration of Internet-adjacent story-space.
It's an unreliable narrator and I suspect it will be an unreliable search engine, but yeah, that too.
I was making a different point, which is that if you use "best of" ranking then you are testing a different algorithm than if you're not using "best of" ranking. Similarly for other settings. It shouldn't be surprising that we see different results if we're doing different things.
It seems like a better UI would help us casual explorers share results in a way that makes trying the same settings again easier; one could hit a "share" button to create a linkable output page with all relevant settings.
It could also save the alternate responses that either the u...
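What I'm imagining is that the "share" button would just serialize the prompt plus every setting that changes the algorithm, so someone else can rerun the same thing; the parameter names below follow the completions-style APIs of the time and are only illustrative:

```python
import json

# A shareable "run" is the prompt plus the settings that define the algorithm.
run = {
    "prompt": "Once upon a time",
    "temperature": 0.7,
    "top_p": 1.0,
    "max_tokens": 200,
    "best_of": 3,   # ranking over 3 samples is a different algorithm than best_of=1
}
share_link_payload = json.dumps(run)
print(share_link_payload)
```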
I don't see documentation for the GPT-3 API on OpenAI's website. Is it available to the public? Are they doing their own ranking or are you doing it yourself? What do you know about the ranking algorithm?
It seems like another source of confusion might be people investigating the performance of different algorithms and calling them all GPT-3?
How do you do ranking? I'm guessing this is because you have access to the actual API, while most of us don't?
On the bright side, this could be a fun project where many of us amateurs learn how to do science better, but the knowledge of how to do that isn't well distributed yet.
We take the web for granted, but maybe we shouldn't. It's very large and nobody can read it all. There are many places we haven't been that probably have some pretty good writing. I wonder about the extent to which GPT-3 can be considered a remix of the web that makes it seem magical again, revealing aspects of it that we don't normally see? When I see writing like this, I wonder what GPT-3 saw in the web corpus. Is there an archive of Tolkien fanfic that was included in the corpus? An undergrad physics forum? Conversations about math and computer science?
Rather than putting this in binary terms (capable of reason or not), maybe we should think about what kinds of computation could result in a response like this?
Some kinds of reasoning would let you generate plausible answers based on similar questions you've already seen. People who are good at taking tests can get reasonably high scores on subjects they don't fully comprehend, basically by bluffing well and a bit of luck. Perhaps something like that is going on here?
In the language of "Thinking, Fast and Slow", this might be "Syst...
GPT-3 has partially memorized a web corpus that probably includes a lot of basic physics questions and answers. Some of the physics answers in your interview might be the result of web search, pattern match, and context-sensitive paraphrasing. This is still an impressive task but is perhaps not the kind of reasoning you are hoping for?
From basic Q&A it's pretty easy to see that GPT-3 sometimes memorizes not only words but short phrases like proper names, song titles, and popular movie quotes, and probably longer phrases if they are common enough.
Google's Q&A might seem more magical too if they didn't link to the source, which gives away the trick.
This is more about expanding the question with slightly more specific questions:
Currently it seems like there are many people who are not scared enough, but I wonder if sentiment could quickly go the other way?
A worst-case scenario for societal collapse is that some "essential" workers are infected and others decide that it is too risky to keep working, and there are not enough people to replace them. Figuring out which sectors might be most likely to have critical labor shortages seems important.
An example of a "labor" shortage might b...
Yeah, I don't see it changing that drastically; more likely there will be a lot of smaller yet significant changes that make old movies look dated. Something like how airports changed after 9/11, or, more trivially, that time when all the men in America stopped wearing hats.
Yes, I agree that confabulation happens a lot, and also that our explanations of why we do things aren't particularly trustworthy; they're often self-serving. I think there's also pretty good evidence that we remember our thoughts at least somewhat, though. A personal example: when thinking about how to respond to someone online, I tend to write things in my head when I'm not at a computer.