Cosmia_Nebula

Replying to Two interviews with the founder of DeepSeek

Two interviews with the founder of DeepSeek

That discussion is by people outside of DeepSeek trying to process the shock of R1. It is unclear what DeepSeek is doing currently.

Interviews with Moonshot AI's CEO, Yang Zhilin

Cosmia_Nebula

<https://news.qq.com/rain/a/20240208A05KFR00>

# Exclusive interview with Moonshot's Yang Zhilin: How does a new AGI startup surpass OpenAI?

Overseas Unicorn
Published from Beijing on the Overseas Unicorn official account, 2024-02-21 11:25.

Interviewers: 天一, penny, guangmi
Editor: 天一
Typesetting: Scout

"Lossless long context is everything." This is the point we remember most deeply after a two-hour conversation with Yang Zhilin.

This technical judgment was already conveyed in October 2023 when Moonshot AI, founded by Yang Zhilin, released its first model, moonshot, and the smart assistant Kimi, supporting an input of 200,000 characters. The focus on "long" stems from Yang Zhilin's belief that the ultimate value of AI-Native products is providing personalized interactions, and lossless long context is the foundation for achieving this. He argues that model...

A High Level Closed-Door Session Discussing DeepSeek: Vision Trumps Technology

Cosmia_Nebula

This document has been translated by ChinaTalk.media, but the critical technical section (items 18 to 48) is not free, so I translated that part.

## Technical Detail 1: SFT

> “There's no need to do SFT at the inference level anymore.”

18. The biggest shock brought by DeepSeek is not open source or low cost, but that there is no need to do SFT. (Note: SFT, Supervised Fine-Tuning, is a technique that improves the performance of a pretrained model on a specific task or domain by training it on labeled data.) This holds only for logical tasks, however; non-logical tasks may still require SFT. It is interesting to discuss this point: does this present a new paradigm or architecture that makes...

Two interviews with the founder of DeepSeek

Cosmia_Nebula

# The Madness of High-Flyer: The Approach to LLM by an AI Giant that Few See
暗涌Waves (2023-05-23 22:50)

Written by 于丽丽
Edited by 刘旌
Translated by Cosmia Nebula

High-Flyer is probably the most exotic among the swarming multitude of competitors in the battle of large models.

This is a game destined for the few, and while many startups are adjusting their direction or even retreating after the big players enter the game, this quantitative fund is alone in its march.

In May, High-Flyer named its new independent organization for making large models DeepSeek (深度求索) and emphasized that it would focus on making real human-level artificial intelligence. Their goal is not only to replicate ChatGPT, but also to research...

Cosmia_Nebula 1y

Edit: I found it. It's from Yurchak, Alexei. "Soviet hegemony of form: Everything was forever, until it was no more." Comparative studies in society and history 45.3 (2003): 480-510.

The following examples are taken from a 1977 leading article, "The Ideological Conviction of the Soviet Person" (Ideinost' sovetskogo cheloveka, Pravda, July 1, 1977). For considerations of space, I will limit this analysis to two generative principles of block-writing: the principle of complex modification and that of complex nominalization. The first sentence in the Pravda text reads: "The high level of social consciousness of the toilers of our country, their richest collective experience and political reason, manifest themselves with an exceptional completeness in the

Cosmia_Nebula 1y

Abs-E (or, speak only in the positive)

A few more examples:

"Is this gluten-free?" (If we allow "gluten-free" we would allow "Every room is John-free." and of course "Grass is edibility-free." and very quickly Abs-E is trivial.)
- Attempt: "This product contains rice flour, corn starch, tapioca flour, and salt." but that just prompts the further question "Does any of those contain gluten?" ...
Wittgenstein interrupted: "What can be said at all can be said clearly, and what we..."
"I think not all swans are white, and if we look for it we will find one that is not white."
- Attempt: "There exists a swan that is ..." Blue? Green? Red? I can't say "non-white". I also can't just list every color.
"I don't believe in magic."
- I don't even know how to start converting this to a positive statement.

Replying to Abs-E (or, speak only in the positive)

Cosmia_Nebula 1y

Abs-E (or, speak only in the positive)

It seems you are hitting the expressive limits of existential positive first-order logic, which appears to be exponentially less powerful than full first-order logic, in the following sense:

every existential positive first-order sentence can be transformed in an equivalent one in prenex normal form without an exponential blowup, thanks to the absence of universal quantifiers and negation symbols.

Bodirsky, Manuel, Miki Hermann, and Florian Richoux. "Complexity of existential positive first-order logic." Journal of Logic and Computation 23.4 (2013): 753-760.
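To make the limitation concrete, here is a minimal sketch of why the swan example resists translation. It assumes a toy formula syntax (the tuple encoding, the `holds` function, and the `Swan`/`White` predicates are all made up for illustration, not taken from the paper). Existential positive sentences use only ∃, ∧ and ∨, so adding facts to a world can never turn a true sentence false. "Some swan is not white" does flip when a fact is added, so no sentence in this fragment can express it without a new positive predicate such as "Black".

```python
# Toy evaluator for existential positive first-order sentences over a finite world.
# A world is a set of facts such as ("Swan", "a"); the formula encoding is hypothetical:
#   ("atom", pred, arg, ...), ("and", f, g), ("or", f, g), ("exists", var, f)

def holds(formula, facts, domain, env=None):
    """Return True if the formula is true in the world described by `facts`."""
    env = env or {}
    op = formula[0]
    if op == "atom":
        pred, *args = formula[1:]
        # Bound variables are looked up in env; anything else is treated as a constant.
        return (pred, *[env.get(a, a) for a in args]) in facts
    if op == "and":
        return holds(formula[1], facts, domain, env) and holds(formula[2], facts, domain, env)
    if op == "or":
        return holds(formula[1], facts, domain, env) or holds(formula[2], facts, domain, env)
    if op == "exists":
        var, body = formula[1], formula[2]
        return any(holds(body, facts, domain, {**env, var: d}) for d in domain)
    raise ValueError(f"not an existential positive connective: {op}")


def has_nonwhite_swan(facts, domain):
    """Uses negation, so it lies outside the existential positive fragment."""
    return any(("Swan", d) in facts and ("White", d) not in facts for d in domain)


domain = {"a", "b"}
facts_small = {("Swan", "a"), ("Swan", "b"), ("White", "a")}
facts_big = facts_small | {("White", "b")}  # same world with one extra fact

# "Some swan is white" is existential positive.
some_white_swan = ("exists", "x", ("and", ("atom", "Swan", "x"), ("atom", "White", "x")))

print(holds(some_white_swan, facts_small, domain), holds(some_white_swan, facts_big, domain))
print(has_nonwhite_swan(facts_small, domain), has_nonwhite_swan(facts_big, domain))
```

The first sentence stays true when the extra fact is added, while the negated property goes from true to false, which is exactly the behavior a negation-free, universal-free sentence cannot have.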

Replying to Applications of Chaos: Saying No (with Hastings Greer)

Cosmia_Nebula 1y

Applications of Chaos: Saying No (with Hastings Greer)

It seems to me that chaos control and anti-control is another non-application.

[Handbook of Chaos Control: Schöll, Eckehard, Schuster, Heinz Georg](https://www.amazon.com/Handbook-Chaos-Control-Eckehard-Sch%C3%B6ll/dp/3527406050)

Replying to things that confuse me about the current AI market.

Cosmia_Nebula 1y

things that confuse me about the current AI market.

Do you have a citation for the claim that Gemini 1.0 Ultra was trained with 1e26 FLOPs? I have searched all around but can't find any information on its compute cost.

Replying to things that confuse me about the current AI market.

Cosmia_Nebula Sep 16, 2024

things that confuse me about the current AI market.

This is not an answer to the broader question, but just regarding the "no Wikipedia page" thing.

I would like to write a Wikipedia page about Flux, but as it is, there is very little quality information about it. We have a lot of anecdotal information about how to use it, and a little academic description of it, but that's not enough.

Besides, it seems everyone who can write well in artificial intelligence wants to write their damned academic blog that is read by like 10 people a month and not Wikipedia, and Wikipedia accumulates a large amount of badly written stuff by amateurs.

As an example, see this page

https://en.wikipedia.org/wiki/Generative_adversarial_network

The "Applications" section is a typical...

Cosmia_Nebula 2y

there's the Schmidhuber Scholarpedia articles in some cases, but aside from being outdated, it's, well, Schmidhuber.

I hate Schmidhuber with a passion because I can smell everything he touches on Wikipedia and they are always terrible.

Sometimes when I read pages about AI, I see things that almost certainly came from him, or one of his fans. I struggle to speak of exactly what Schmidhuber's kind of writing gives, but perhaps this will suffice: "People never give the right credit to anything. Everything of importance is either published by my research group first but miscredited to someone later, or something like that. Deep Learning? It's done not by Hinton, but Amari, but not Amari,...

Cosmia_Nebula 2y*

Finally somepony noticed my efforts!


Concurring with the sentiment, I have realized that nothing I write is going to be as well-read as Wikipedia, so I have devoted myself to writing Wikipedia instead of trying to get a personal blog anymore.

I will comment on a few things:

- I really want to get the neural scaling law page working with some synthesis and updated data, but currently there is no good theoretical synthesis. Wikipedia isn't good for just a giant spreadsheet.
- I wrote most of the GAN page, the Diffusion Model page, Mixture of Experts, etc. I also wrote a few sections of LLM and keep the giant table updated for each frontier model. I am...
