Thomas Kwa

Member of technical staff at METR.

Previously: MIRI → interp with Adrià and Jason → METR.

I have signed no contracts or agreements whose existence I cannot mention.

Sequences

Catastrophic Regressional Goodhart

Comments (sorted by newest)

Mo Putera's Shortform
Thomas Kwa · 19h

My guess, based on reading anecdotes like these and Berger's books, is that the algorithm is a vast improvement over anyone else's engineering practices, but that it alone doesn't tell you everything else you need to run a company. Maybe systems engineering is the missing piece, maybe some other management philosophy.

If you look at the major SpaceX programs, they are Falcon development, operations, Starlink, and Starship. The first three were wildly successful, and Starship is late but technically and operationally ahead of other companies' programs (e.g. Raptor engines run at double the chamber pressure of the BE-4, and there have been 10x as many test flights), with successes directly traceable to each step of the algorithm, and wasted energy traceable to not doing something else when appropriate. Raptor 3 engines are only possible to make as cheaply as Elon wants because a vast number of parts were deleted; yet SpaceX also "accelerated" to build hundreds of Raptor 2s, which are now obsolete.

Introducing faruvc.org
Thomas Kwa · 20h

This is hard to answer because there's a tradeoff between noise, airflow, and surface area with air purifiers. E.g. if you cover your ceiling with air filters, the noise will be minimal.

I'd say that if you have enough output power and are only limited by UV exposure, it's vastly more effective. I had to buy a couple of expensive Clean Air Kits purifiers to get ~25 air changes per hour in a smallish room, but 100+ equivalent ACH is possible with far-UVC, either by using light filtered to 222nm or by keeping it in some kind of ceiling louver that traps the light. Not sure how the cost compares, though, as far-UVC fixtures seem to be limited by output power per dollar rather than by safety limits.
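As a concrete illustration of the arithmetic, here is a minimal sketch of the ACH calculation. The room dimensions and clean-air delivery rates below are assumptions chosen to roughly reproduce the ~25 and 100+ ACH figures above, not measured specs.

```python
# Equivalent air changes per hour (ACH) = clean-air delivery rate / room volume.
# All numbers below are illustrative assumptions, not product specs.

def ach(clean_air_m3h: float, room_volume_m3: float) -> float:
    """Equivalent air changes per hour from a clean-air delivery rate."""
    return clean_air_m3h / room_volume_m3

room = 3.5 * 3.5 * 2.5        # "smallish" room, ~31 m^3 (assumed)
purifiers = 2 * 400.0         # two purifiers at an assumed 400 m^3/h each
far_uvc_equiv = 3200.0        # assumed far-UVC equivalent clean-air rate

print(f"purifiers: {ach(purifiers, room):.0f} ACH")      # ~26
print(f"far-UVC:   {ach(far_uvc_equiv, room):.0f} ACH")  # ~104
```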

Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Thomas Kwa · 3d

This paper happened to be the only one with a perfect score at NeurIPS 2025. Congrats to the authors!

GradientDissenter's Shortform
Thomas Kwa · 7d

My rent, also for a small room in a Bay Area group house, is around $1050. This is an interesting group house phenomenon: if rent is $1800 on average, the good rooms go for $2600 and the bad ones have to be $1000 to balance out total rent. The best rooms in a group house are a limited-supply good, and because people (or even couples) are often indifferent between a group house with a good social scene and a $4000 luxury one-bedroom, prices for the best rooms are roughly similar. There is lots of road noise, but I realized I could pay $1000 for extra-thick blackout curtains, smart lightbulbs, etc. to mitigate this, which has saved me thousands over the past couple of years.
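A minimal sketch of that balancing arithmetic, assuming a four-room house with two good rooms (only the $1800 average and $2600 good-room figures come from the comment):

```python
# Rents must sum to n_rooms * average, so the cheap rooms absorb whatever
# the good rooms don't pay. House size and room split are assumptions.

n_rooms = 4
average_rent = 1800                      # house-wide average
total = n_rooms * average_rent           # $7200/month for the whole house

good_rooms, good_rent = 2, 2600          # assumed two good rooms at $2600
bad_rent = (total - good_rooms * good_rent) / (n_rooms - good_rooms)
print(bad_rent)                          # 1000.0, matching the ~$1000 rooms
```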

As for everything else, my sense is that it's not for most people. To have expenses as low as OP's you basically need to have only zero-cost or cost-saving hobbies, like cooking and thrifting, and to enjoy all aspects of them. I got into cooking at one point but didn't like shopping and wanted to use moderately nice ingredients, so when I cooked for my housemates the ingredients (from an expensive grocery store through Instacart) came out to $18/serving. A basic car is also super useful, Bay Area or not.

I am probably one of the people OP mentions, with a bunch of financial anxiety despite being able to save close to $100k/year, but this is largely due to a psychological block that keeps me from investing most of my money.

p.b.'s Shortform
Thomas Kwa · 8d

I wouldn't take one or two datapoints on a single benchmark too seriously, especially with a methodology as fiddly as time horizon, and given concerns like Ryan's. Nevertheless, it seems good that you replicated the result using time estimates from commit data, as the original difficulty estimates seemed likely to be noisy. I'll be interested to see whether the trend continues, and whether the same currently holds for OSWorld. (Looks like they had a big update, so it may be possible to get individual task data now.)

Wei Dai's Shortform
Thomas Kwa · 12d · Ω

Agree that your research didn't make this mistake, and MIRI didn't make all the same mistakes as OpenAI. I was responding in the context of Wei Dai's OP about the early AI safety field. At that time, MIRI was absolutely being uncooperative: their research was closed, they didn't trust anyone else to build ASI, and their plan would end in a pivotal act that probably disempowers some world governments, and possibly with MIRI taking over the world. Plus, they descended from an org whose goal was to build ASI before Eliezer realized alignment should be the focus. Critch complained as late as 2022 that if there were two copies of MIRI, they wouldn't even cooperate with each other.

It's great that we have the FLI statement now. Maybe if MIRI had put more work into governance we could have gotten it a year or two earlier, but it took until Hendrycks got involved for the public statements to start.

Wei Dai's Shortform
Thomas Kwa · 13d · Ω

> We absolutely do need to "race to build a Friendly AI before someone builds an unFriendly AI". Yes, we should also try to ban Unfriendly AI, but there is no contradiction between the two. Plans are allowed (and even encouraged) to involve multiple parallel efforts and disjunctive paths to success.

Disagree: the fact that there needs to be a friendly AI before an unfriendly AI doesn't mean that building it should be plan A, or that we should race to do it. It's the same mistake OpenAI made when they let their mission drift from "ensure that artificial general intelligence benefits all of humanity" to being the ones who build an AGI that benefits all of humanity.

Calling it plan A means it would deserve more resources than any other path, like influencing people by various means to build FAI instead of UFAI.

Wei Dai's Shortform
Thomas Kwa · 13d · Ω

Also mistakes, from my point of view anyway:

  • Attracting mathy types rather than engineer types, resulting in early MIRI focusing on less relevant subproblems like decision theory, rather than trying lots of mathematical abstractions that might be useful (e.g. maybe there could have been lots of work on causal influence diagrams earlier). I have heard that decision theory was prioritized because of available researchers, not just importance.
  • A cultural focus on solving the full "alignment problem" rather than various other problems Eliezer also thought were important (e.g. low impact), and a lack of a viable roadmap with intermediate steps to aim for. Being bottlenecked on deconfusion is just cope; better research taste would either have generated a better plan or recognized that certain key steps were waiting for better AIs to experiment on.
  • Focus on slowing down capabilities in the immediate term (e.g. plans to pay AI researchers to keep their work private) rather than investing in safety and building political will for an eventual pause if needed.
Some data from LeelaPieceOdds
Thomas Kwa · 14d

As a child I read everything I could get my hands on! Mostly a couple of Silman's books. The appeal to me was quantifying and systematizing strategy, not chess itself (which I bounced off in favor of sports and math contests). E.g. the idea of exploiting imbalances, or planning by backchaining, or some of the specific skills like putting your knights in the right place.

I found these more interesting than Go books in this respect, both due to Silman's writing style and because Go is such a complicated game filled with exceptions that Go books get bogged down in specifics.

Some data from LeelaPieceOdds
Thomas Kwa · 15d

I'm not a chess player (I've played maybe 15 normal games of chess ever) and tried playing LeelaPieceOdds on the BBNN setting. When LeelaQueenOdds was released I'd lost at Q odds several times before giving up; this time it was really fun! I played nine times, stalemating it once, before finally winning, taking about 40 minutes. My sense is that information I've absorbed from chess books, chess streamers, and the like was significantly helpful: e.g. avoid mistakes, try to trade when ahead in material, develop pieces, keep pieces defended.

I think the lesson is that a superhuman search is much more powerful over a large search space than over a small one. With BBNN odds, Leela only has a queen and two rooks, and after I sacrifice some material to solidify and trade one of them, I'm still up 7 points and Leela won't have enough material to miraculously slip out of every trade until I blunder. By an endgame of, say, KRNNB vs. KR, there are only a small number of possible moves for Leela, and I can just check that I'm safe against each one until I win. I'd probably lose when given QN or QR odds, because Leela having two more pieces would increase the required ratio of simplifications to blunders.
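For reference, here is a minimal sketch of the material arithmetic, using the standard point values; the exact trade sequence is assumed for illustration, not taken from the games.

```python
# Standard piece values: Q=9, R=5, B=N=3, P=1.
VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material(pieces: str) -> int:
    """Total point value of a piece string like 'QRRBBNN'."""
    return sum(VALUES[p] for p in pieces)

full_army = "QRRBBNN" + "P" * 8   # the human's full starting material
leela_bbnn = "QRR" + "P" * 8      # Leela missing both bishops and knights

print(material(full_army) - material(leela_bbnn))  # 12-point head start

# Giving back ~5 points of material to simplify and trade off one of
# Leela's rooks (an assumed line of play) still leaves the human up:
print(12 - 5)  # 7 points, as in the comment above
```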

Posts (sorted by new)

11 · Thomas Kwa's Shortform · Ω · 6y · 303 comments
61 · Claude, GPT, and Gemini All Struggle to Evade Monitors · Ω · 3mo · 3 comments
88 · METR: How Does Time Horizon Vary Across Domains? · 4mo · 8 comments
70 · Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? · 6mo · 22 comments
115 · Should CA, TX, OK, and LA merge into a giant swing state, just for elections? · 1y · 35 comments
37 · The murderous shortcut: a toy model of instrumental convergence · Ω · 1y · 0 comments
12 · Goodhart in RL with KL: Appendix · Ω · 1y · 0 comments
62 · Catastrophic Goodhart in RL with KL penalty · Ω · 2y · 10 comments
38 · Is a random box of gas predictable after 20 seconds? · Q · 2y · 35 comments
66 · Will quantum randomness affect the 2028 election? · Q · 2y · 52 comments
79 · Thomas Kwa's research journal · Ω · 2y · 1 comment