Lao Mein

P(doom) = 50%. It either happens, or it doesn't.

Lao Mein | Statistics is Hard. | Patreon

I give full permission for anyone to post part or all of any of my comments/posts to other platforms, with attribution.

Currently doing solo work on glitch tokens and tokenizer analysis. Feel free to send me job/collaboration offers.

DM me interesting papers you would like to see analyzed. I also specialize in bioinformatics.

Comments

When I made $1000 a month at my first job, I didn't buy new clothes for a year, had to ration my heating, and only ate out a few times a week. My main luxury expenses were a gym membership and heating the entire apartment on weekends.

Honestly, anything that's not rice, chicken, cabbage, or rent is a luxury. Candy is a luxury. Takeout is a luxury. Going out for social events is a luxury. Romantic relationships and children are luxuries. I don't think it's impossible for Americans to work 60 hours a week and consume no luxuries, but it's probably very difficult.

I'm at ~50-50 on large amounts of machine-translated text being present in the dataset.

Having worked in Chinese academia myself, "use Google Translate on the dataset" just seems like something we're extremely likely to do. It's a hard-to-explain gut feeling. I'll try poking around in the tokenizer to see if "uncommon Chinese phrases that would only appear in machine-translated COT" are present as tokens. (However, I think such tokens are unlikely to exist even if they did do this.)
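A minimal Python sketch of that tokenizer probe, under stated assumptions: it only asks whether each candidate phrase encodes to exactly one token. The `encode` argument is any function from text to token ids; with a Hugging Face tokenizer `tok` loaded via `AutoTokenizer.from_pretrained`, it could be `lambda s: tok.encode(s, add_special_tokens=False)`. The greedy longest-match encoder (not BPE) and its four-entry vocabulary below are hypothetical stand-ins so the example runs offline; they are not Qwen's actual vocabulary.

```python
from typing import Callable, Dict, List, Sequence


def single_token_phrases(
    phrases: Sequence[str],
    encode: Callable[[str], List[int]],
) -> Dict[str, bool]:
    """Map each phrase to True if `encode` turns it into exactly one token."""
    return {phrase: len(encode(phrase)) == 1 for phrase in phrases}


def make_greedy_encoder(vocab: Dict[str, int]) -> Callable[[str], List[int]]:
    """Hypothetical toy tokenizer: greedy longest match against `vocab`,
    emitting -1 for any character that no vocabulary entry covers."""
    max_len = max(len(piece) for piece in vocab)

    def encode(text: str) -> List[int]:
        ids: List[int] = []
        i = 0
        while i < len(text):
            # Try the longest possible piece first, then shorter ones.
            for length in range(min(max_len, len(text) - i), 0, -1):
                piece = text[i:i + length]
                if piece in vocab:
                    ids.append(vocab[piece])
                    i += length
                    break
            else:
                ids.append(-1)  # unknown character
                i += 1
        return ids

    return encode


# Hypothetical vocabulary: the idiomatic phrase is one token, the odd one is not.
toy_vocab = {"等一下": 0, "等": 1, "待": 2, "一下": 3}
encode = make_greedy_encoder(toy_vocab)
print(single_token_phrases(["等一下", "等待一下"], encode))
# {'等一下': True, '等待一下': False}
```

With a real tokenizer, the hard part is choosing the candidate list (phrases like 等待一下 that read as translated English). A phrase merging into a single token would suggest it was frequent in the tokenizer's training text, though not necessarily in the model's COT data.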

I've done a cursory internet search, and it seems that there aren't many native Chinese COT datasets, at least compared to English ones - and one of the first results on Google is a machine-translated English dataset.

I also vaguely remember o1 having better Chinese grammar in its COT, but I'm having trouble finding many examples. I think this is the easiest piece of evidence to check: if other (non-Chinese-origin) LLMs consistently use good Chinese grammar in their COT, that would shift my probabilities considerably.

Julien Chaumond on X: "Qwen QwQ switching to Chinese when it needs to _really think_ about something, then switching back to English, is pretty cool @Alibaba_Qwen https://t.co/jpTIHWyXim"

This is extremely weird - no one actually writes like this in Chinese. "等一下" ("wait a moment") is far more common than "等待一下", which seems to splice in a word-for-word translation of the "wait" in "wait a moment": 等待 is closer to the verb "to wait". The use of "所以" ("so") instead of "因此" ("therefore") and other tics may also indicate that machine-translated English COT was used during training.

The funniest answer would be "COT as seen in English GPT4-o1 logs is correlated with generating quality COT. Chinese text is also correlated with highly rated COT. Therefore, using the grammar and structure of English GPT4 COT but with Chinese tokens elicits the best COT".

I found a good summary of OpenAI's nonprofit restructuring.

Is there a reason why every LLM tokenizer I've seen excludes slurs? It seems like a cheap way to train for AI assistant behavior. 

Also notable that numbers are tokenized digit by digit - I assume this greatly improves performance on basic arithmetic tasks compared to GPTs.
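A toy Python sketch of what digit-by-digit tokenization means at the pre-tokenization step. Both regexes are assumptions for illustration, not any model's actual rules; the three-digit grouping is loosely modeled on OpenAI's cl100k_base pattern, which caps digit runs at three.

```python
import re
from typing import List

# Hypothetical toy pre-tokenizers, not any real model's full rule set.
# Both keep runs of non-digit, non-space characters together and drop whitespace.
DIGIT_SPLIT = re.compile(r"\d|[^\d\s]+")        # one piece per digit
DIGIT_GROUPS = re.compile(r"\d{1,3}|[^\d\s]+")  # up to three digits per piece


def pretokenize(text: str, split_digits: bool = True) -> List[str]:
    """Split text into pre-tokens, either digit by digit or in 3-digit groups."""
    pattern = DIGIT_SPLIT if split_digits else DIGIT_GROUPS
    return pattern.findall(text)


print(pretokenize("12345+678"))
# ['1', '2', '3', '4', '5', '+', '6', '7', '8']
print(pretokenize("12345+678", split_digits=False))
# ['123', '45', '+', '678']
```

Under digit splitting, every number is built from the same ten tokens, one per place value. Grouping instead cuts 12345 into 123 and 45, so the same digit can land in differently sized chunks depending on a number's length. That alignment is one plausible reason digit splitting helps with arithmetic.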

The older I get, and the more I learn about Tolkien, the more he disgusts me.

He is the inverse of all I value and all I find good in the world. 

The meeting allegedly happened on the 11th. The Iranian market rallied immediately after the election. It was clearly based on something specific to a Trump administration. Maybe it's large-scale insider trading from Iranian diplomats?

I also think the market genuinely, unironically disbelieves everything Trump says about tariffs in a way it doesn't about his cabinet nominations (pharma stocks tanked after RFK got HHS).

The man literally wrote that he was going to institute 25% tariffs on Canadian goods, to exactly zero movement on Canadian stocks.

US markets are not taking the Trump tariff proposals very seriously - stock prices increased after the election and 10-year Treasury yields have returned to pre-election levels, although they did spike ~0.1 percentage points after the election. Maybe the Treasury pick reassured investors?

https://www.cnbc.com/quotes/US10Y

If you believe otherwise, I encourage you to bet on it! I expected both yields and stocks to go up and am quite surprised.

I'm not sure what the markets expect to happen. Maybe Trump uses the threat of tariffs to bully Europeans into diplomatic concessions, and they back down? Or maybe Trump backs down? There's also talk about Trump's policies increasing the strength of the dollar, which makes sense. But again, pricing in zero net inflation from the tariffs is pretty wild.

The Iranian stock market also spiked after the US elections, which... what?

https://tradingeconomics.com/iran/stock-market

The Iranian government has tried to kill Trump multiple times since he authorized the assassination of Soleimani. Trump tightened sanctions against Iran in his first term. He has pledged even tougher sanctions against Iran in his second. There is no possible way he can be good for the Iranian economy. Maybe this is just a hedge against inflation?

This is a good argument for the systematic extermination of all insects via gene drives. If you value shrimp at a significant fraction of the value of a human and think they have negative utility by default, we should be trying really hard to make them go extinct. Can quicker euthanasia really compete against gene-drive-induced non-existence?

Is there a thorough analysis of OpenAI's for-profit restructuring? Surely, a Delaware lawyer who specializes in these types of conversions has written a blog somewhere.
