All of Amal's Comments + Replies

Amal 52

Sure, I'm actually not suggesting that it should necessarily be a feature of dialogues on LW; it was just a suggestion for a different format (my comment received almost opposite karma/agreement votes, so maybe this is the reason?). It also depends on how often you'd use the branching - my guess is that most conversations don't require it at every point, but a few times over a whole conversation might be useful.

Amal 30

Yeah, definitely - there could be a way to quote/link answers from other branches. I haven't seen any UI that supports something like this, but my guess is that it wouldn't be too difficult to build one. My thinking was that there would be one main branch and several smaller branches connecting to it, so that some points could be discussed in greater depth. Also, the branching should probably not happen every time, but only occasionally, when both participants agree on it.

Amal 131

It seems to me that these types of conversations would benefit from being trees instead of chains. When two people have a disagreement/different point of view, there is usually some root cause of this disagreement. When the conversation is a chain, it likely results in one person explaining her arguments/making several points, the other having to expand on each, and then at some point, in order for this not to result in massively long comments, the participants have to paraphrase, summarise or ignore some of the arguments to make it... (read more)

5Épiphanie Gédéon
Came here to comment this. As it is, this seems like just talking on Discord/Telegram, but with the notion of publishing it later. What I really lack when discussing something is the ability to branch out and backtrack easily, and to have a view of all conversation topics at once. That being said, I really like the idea of dialogues and think this is a push in the right direction; I have greatly enjoyed the dialogue I've read so far. Excited to see where this goes.
5Legionnaire
I have done something similar using draw.io for arguments regarding a complex feature. Each point often had multiple counterpoints, which themselves sometimes split into other points. I think this is only necessary for certain discussions and should probably not be the default though.
3solvalou
Perhaps instead of a tree it would be better to have a directed acyclic graph, since IME even if the discussion splits off into branches, one often wants at some point to respond to multiple endpoints with a single comment. But I don't know if there's really a better UI for that shape of discussion than a simple flat thread with free linking/quoting of earlier parts of the discussion - I don't think I have ever seen a better UI for this than 4chan's.
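To make the chain/tree/DAG distinction in this thread concrete, here is a minimal sketch of a branching-discussion data model. The names (Comment, merge_reply) are hypothetical illustrations, not any existing forum's API; letting a comment have several parents is exactly what turns the tree into the DAG described above.

```python
# A comment may reply to several earlier comments, so the conversation
# forms a directed acyclic graph rather than a chain or a strict tree.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    author: str
    text: str
    parents: List["Comment"] = field(default_factory=list)  # >1 parent = merging branches


def merge_reply(author: str, text: str, *endpoints: Comment) -> Comment:
    """One reply that answers several branch endpoints at once."""
    return Comment(author=author, text=text, parents=list(endpoints))


# Usage: a claim splits into two branches, then one comment closes both.
root = Comment("alice", "Main claim.")
branch_a = Comment("bob", "Objection about definitions.", parents=[root])
branch_b = Comment("bob", "Objection about evidence.", parents=[root])
summary = merge_reply("alice", "Both objections share one crux...", branch_a, branch_b)
assert summary.parents == [branch_a, branch_b]
```

Restricting parents to at most one element recovers the tree; a flat thread with quoting is the same graph with the extra links kept only as text.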
Amal 70

My guess is that it will be a scaled-up Gato - https://www.lesswrong.com/posts/7kBah8YQXfx6yfpuT/what-will-the-scaled-up-gato-look-like-updated-with. I think there might be some interesting features once the models are fully multimodal - e.g. being able to play games, perform simple actions on a computer, etc. Based on the announcement from Google, I would expect full multimodal training - image, audio, video, text in/out. Based on DeepMind's hiring needs, I would expect they want it to also generate audio/video and extend the model to robotics (the brain of... (read more)

Amal 10

Sure, I agree that writing is a tough gig and that the distribution of how much each piece gets read is Pareto-shaped; still, the writers contribute to the chance of improving the top writings that are read the most.


I think I'm much less interested in how deeply people benefit, and more in how many of them can potentially benefit and whether this scales roughly with effort - e.g. professions where by spending X effort I can serve Y people, and where serving 2Y people would require spending 2X effort (chef/teacher/hairdresser...), don't fall into the same cat... (read more)

Amal 10

Some of my updates:
At least one version with several trillion parameters; a context window at least 100k tokens long (with embeddings etc., seemingly 1 million). Otherwise, I am quite surprised that I mostly still agree with my predictions regarding multimodal/RL capabilities. I think robotics could still face some latency challenges, but there would nevertheless be significant progress in tasks not requiring fast reactions - e.g. picking up things, cleaning a room, etc. Things like SuperAGI might become practically useful, and controlling a computer with text/voice would seem easy.

Amal 10

I believe we can now say with a high level of confidence that the scaled-up Gato will be Google's Gemini model, to be released in the next few months. Does anyone want to add/update their predictions?

Amal 20

It could be sparse... a GPT-4 with 175B parameters and 90 percent sparsity could be essentially equivalent to a 1.75T-param GPT-3. Also, I am not exactly sure, but my guess is that if it is multimodal, the scaling laws change (essentially you get more varied data, instead of always training on text prediction, which is repetitive and where likely only a small percentage contains new useful information to learn).
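As a back-of-the-envelope check on the factor-of-ten arithmetic above - a sketch of mixture-of-experts-style sparsity in general, with hypothetical numbers, not a claim about GPT-4's actual (unpublished) architecture:

```python
# A sparse model's per-token compute is governed by its *active*
# parameters, not its total parameter count.
def active_params(total_params: float, sparsity: float) -> float:
    """Parameters actually used per forward pass, if a fraction
    `sparsity` of the weights is inactive for any given token."""
    return total_params * (1.0 - sparsity)


total = 1.75e12  # hypothetical 1.75T-parameter sparse model
print(f"{active_params(total, sparsity=0.90):.2e}")  # 1.75e+11, i.e. 175B
# -> per-token compute comparable to a 175B dense model,
#    while total capacity is 10x larger.
```

This is the sense in which the "effective parameter size" mentioned in the reply below can exceed the compute-matched dense size.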

2Logan Zoellner
My impression (could be totally wrong) was that GPT-4 won't be much larger than GPT-3, but its effective parameter size will be much larger through techniques like this.
Amal 41

Stupid beginner question: I noticed that many of the posts here, while interesting, are very long and go deep into the topic explored, often without a TL;DR. I'm just curious - how do the writers/readers find time for it? Are they paid? If someone lazy like me wants to participate, is there a more Twitter-like LessWrong version?

4MondSemmel
I'm not sure what a Twitter-like LW would look like; you'd have to elaborate. But a somewhat more approachable avenue to some of the best content here can be found in the Library section, in particular the Sequences Highlights, Harry Potter and the Methods of Rationality (though I think this mirror provides a more pleasant reading experience), or Best of LessWrong. Alternatively, you could decide what things you particularly care about and then browse the Concepts (tags) page, e.g. here's the Practical tag. For more active community participation, there's the LW Community page, or the loosely related Slate Star Codex subreddit, and of course lots of rationalists use Twitter or Facebook, too.

Finally, regarding writers being paid: I don't know the proportion of people who are in some capacity professional writers, or who do something else as a job and incidentally produce some blog posts sometimes. But to give some examples of how some of the content here comes to be: As I understand it, Yudkowsky wrote the original LW Sequences as a ~2-year fulltime project while employed at the nonprofit MIRI. The LW team is another nonprofit with paid employees; though that's mostly behind-the-scenes infrastructure stuff, the team members do heavily use LW, too. Some proportion of AI posts are by people who work on this stuff full-time, but I don't know which proportion. And lots of content here is crossposted from people with their own blogs, which may have some form of funding (like Substack, Patreon, grants from a nonprofit, etc.). E.g. early on, Scott Alexander used to post directly on LW as a hobby, then mostly on his own blog Slate Star Codex, and nowadays he writes on the Astral Codex Ten substack.
Amal 20

My understanding is that they fully separate computation and memory storage. So while traditional architectures need some kind of cache to store large amounts of data for model partitions, of which only a small portion is used for computation at any single point in time, the CS2 only requests what it needs, so the bandwidth doesn't need to be as big.
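A toy sketch of that streaming idea - purely illustrative, not Cerebras' actual interface: the weights live off-device, and the compute side only ever holds the one layer it is currently using.

```python
import numpy as np

rng = np.random.default_rng(0)

# "External memory": weights for a deep ReLU MLP, stored off-device.
external_store = [rng.standard_normal((512, 512)) for _ in range(24)]


def stream_forward(x: np.ndarray) -> np.ndarray:
    for w in external_store:         # stream in one layer's weights at a time
        x = np.maximum(x @ w, 0.0)   # compute that layer, then move on
    return x                         # device memory never holds the full model


print(stream_forward(rng.standard_normal((1, 512))).shape)  # (1, 512)
```

The required bandwidth is then one layer's weights per layer's worth of compute, rather than a whole model partition held in cache at once - which is the separation being described above.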

Amal 44

I am certainly not an expert, but I am still not sure about your claim that it's only good for running small models. The main advantage they claim to have is "storing all model weights externally and stream them onto each node in the cluster without suffering the traditional penalty associated with off chip memory. weight streaming enables the training of models two orders of magnitude larger than the current state-of-the-art, with a simple scaling model." (https://www.cerebras.net/product-cluster/ , weight streaming). So they explicitly claim that it shou... (read more)

2jacob_cannell
This is almost a joke, because the equivalent GPU architecture has both greater total IO bandwidth to any external SSD/RAM array, and the massive near-die GPU RAM that can function as a cache for any streaming approach. So if streaming works as well as Cerebras claims, GPUs can do it as well or better. I agree sparsity (and also probably streaming) will be increasingly important; I've actually developed new techniques for sparse matrix multiplication on GPUs.
Amal 20

Oh, and besides IQ tests, I predict it would also be able to pass most current CAPTCHA-like tests (though humans would still be better at some).

Amal 60

What are your reasons for thinking AGI is so far away?

Amal 32

Nah... I still believe that the future AGI will invent a time machine and then invent itself before 2022.

1Darren McKee
Hahaha. With enough creativity, one never has to change their mind ;)
Amal 30

Why do you think TAI is decades away?

7DragonGod
1. That is the world I've decided to optimise for, regardless of what I actually believe the timelines are.

2. I don't really feel like rehashing the arguments for longer timelines here (not all that relevant to my question), but it's not the case that I have a <10% probability on pre-2040 timelines; it's more that I think I can have a much larger impact on post-2040 timelines than on pre-2030 ones, so most of my attention is directed there. That said, computational/biological anchors are a good reason for longer timelines, absent foundational breakthroughs in our understanding of intelligence. Furthermore, I suspect that intelligence is hard, that incremental progress will become harder as systems become more capable, that returns to cumulative investment in cognitive capabilities are sublinear (marginal returns to cognitive investment decay at a superlinear rate), etc.
Amal 20

I should also make a prediction for the nearer version of Gato, to actually answer the questions from the post. So if a new version of Gato appears in the next 4 months, I predict:

80% confidence interval: Gato will have 50B-200B params. Context window will be 2-4x larger (similar to GPT-3).

50%: No major algorithmic improvements, RL, or memory. Maybe use of a Perceiver. Likely some new tokenizers. The improvements would come more from new data and scale.

80%: More text, images, video, audio. More games and new kinds of data. E.g. special prompting to do something in a ga... (read more)

Amal 71

Isn't the risk coming from insufficient AGI alignment relatively small compared to the vulnerable world hypothesis? I would expect that even without the invention of AGI, or with aligned AGI, it would still be possible for us to use some more advanced AI techniques as research assistants that help us invent some kind of smaller/cheaper/easier-to-use atomic bomb that would destroy the world anyway. Essentially, the question is: why so much focus on AGI alignment instead of a general slowing down of technological progress?

I think this seems quite underexplored. The fact that it is hard to slow down progress doesn't mean it isn't necessary, or that this option shouldn't be researched more.

3[anonymous]
Here's why I personally think solving AI alignment is more effective than generally slowing tech progress:

* If we had aligned AGI and coordinated in using it for the right purposes, we could use it to make the world less vulnerable to other technologies.
* It's hard to slow down technological progress in general, and easier to steer the development of a single technology, namely AGI.
* Engineered pandemics and nuclear war are very unlikely to lead to unrecoverable societal collapse if they happen (see this report), whereas AGI seems relatively likely to (>1% chance).
* Other, more dangerous technology (like maybe nano-tech) seems like it will be developed after AGI, so it's only worth worrying about those technologies if we can solve AGI.
Amal 41

I see - I will update the post with some questions. I find it quite difficult, though, to forecast how the percentages on the performance metrics would improve, compared to just predicting capabilities, as the datasets are probably not so well known.

3Daniel Kokotajlo
Yeah, makes sense. I guess maybe part of what's going on is that forecasting the next Gato in particular is less exciting/interesting than forecasting stuff like AGI, even though in order to forecast the latter it's valuable to build up skill forecasting things like the former. Anyhow, here are some ass-number predictions I'll make to answer some of your questions, thanks for putting them up:

75% confidence interval: Gato II will be between 5B and 50B parameters. Context window will be between 2x and 4x as long.

75% credence: No significant new algorithmic improvements, just some minor things that make it slightly better, or maybe nothing at all.

90% credence: They'll train it on a bigger suite of tasks than Gato. E.g. more games, more diverse kinds of text-based tasks, etc.

70% credence: There will be some transfer learning, in the following sense: It will be clear from the paper that, probably, Gato II would outperform on task X a hypothetical variant of itself that had only trained on task X but not on any others (holding fixed the amount of task X training). For most but not all tasks studied.

60% credence: Assuming they test Gato II's ability to improve via chain-of-thought, the gains from doing so will be greater than the gains for a similarly sized language model.

90% credence: Gato II will still not be able to play new Atari games without being trained on them as well as humans, i.e. it'll be sub-human-level on such games.
Amal 53

Ok, I was thinking about this a bit and finally got some time to write it down. I realized that it is quite hard to make predictions about the first version of Gato, as it depends on what the team would prioritize in development. Therefore I'll try to predict some attributes/features of a Gato-like model that should be available in the next two years, while expecting that many will appear sooner - it is just difficult to say which ones. I'm not a professional ML researcher, so I might get some factual things wrong, and I would be happy to hear from people w... (read more)

1Amal
Some of my updates: at least one version with several trillion parameters; a context window at least 100k tokens long (with embeddings etc., seemingly 1 million). Otherwise, I am quite surprised that I mostly still agree with my predictions regarding multimodal/RL capabilities. I think robotics could still face some latency challenges, but there would nevertheless be significant progress in tasks not requiring fast reactions - e.g. picking up things, cleaning a room, etc. Things like SuperAGI might become practically useful, and controlling a computer with text/voice would seem easy.
2Amal
Oh, and besides IQ tests, I predict it would also be able to pass most current CAPTCHA-like tests (though humans would still be better at some).
2Amal
I should also make a prediction for the nearer version of Gato, to actually answer the questions from the post. So if a new version of Gato appears in the next 4 months, I predict:

80% confidence interval: Gato will have 50B-200B params. Context window will be 2-4x larger (similar to GPT-3).

50%: No major algorithmic improvements, RL, or memory. Maybe use of a Perceiver. Likely some new tokenizers. The improvements would come more from new data and scale.

80%: More text, images, video, audio. More games and new kinds of data. E.g. special prompting to do something in a game, draw a picture, perform some action.

75%: Visible transfer learning. Gato trained on more tasks and pre-trained on video would perform better in most but not all games, compared to a model of similar size trained just on the particular task. A language model would be able to describe the shape of objects better after being trained together with images/video/audio.

70%: Chain-of-thought reasoning would perform better compared to an LLM of similar size. The improvement won't be huge, though, and I wouldn't expect it to gain surprisingly sophisticated new LLM capabilities.

80%: It won't be able to play new Atari games similarly to humans, but there would be visible progress - the actions would be less random and more directed towards the goal of the game. With sophisticated prompting, e.g. "Describe first what the goal of this game is, how to play it, and what the best strategy is", significant improvements would be seen, but still sub-human.
Amal 64

This has generated much less engagement than I thought it would... what am I doing wrong?

3jacopo
I think having an opinion on this requires much more technical knowledge than GPT-4 or DALL-E 3 do. I for one don't know what to expect. But I upvoted the post, because it's an interesting question.
Amal 42

Thanks for this post! I think it is always great when people share their opinions about timelines, and more people (even those not directly involved in ML) should be encouraged to freely express their views without the fear of being held accountable in case they are wrong. In my opinion, even people directly involved in ML research seem too reluctant to share their timelines and how they impact their work, which might be useful for others. Essentially, I think that people should share their view when it is something that is going ... (read more)

Amal 20

Hi Rohin, how long does it usually take to hear back if selected for the next stage? I applied two weeks ago but haven't received any further mail yet, so I was just curious whether I still have a chance or was not selected.

2Rohin Shah
That's probably my fault, I've fallen behind on looking through initial applications. Ping me again if you haven't gotten a response in another two weeks.