Jay Bailey


I remember when I was learning poker strategy and learned about the idea of wanting to get your entire stack in the middle preflop with AA if you can get paid off - that was a very fundamental lesson to young me! That said, there's a key insight that goes along with the "pocket aces principle" that is missing here, and that's bankroll management.

In poker, there is standard advice for how much money to have in your bankroll before you sit down at a table at all. E.g., for cash games, it's at least 2000 big blinds (20x the largest stack you can buy in with). This is what allows you to bet all-in on pocket aces - if your entire bankroll is on the table, you should bet more conservatively. The point of bankroll management is to let you make the +EV play of putting your entire stack into the middle without caring about the variance when you lose 20% of the time.
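To make the arithmetic concrete, here's a quick sketch using the rough figures above (a 100 bb stack, an ~80% win rate for AA all-in preflop, a 2000 bb bankroll - illustrative numbers, not exact poker statistics):

```python
# Back-of-the-envelope EV of getting it all-in preflop with AA,
# using the rough numbers from the comment (illustrative, not exact odds).
stack = 100           # big blinds at risk in the all-in pot
p_win = 0.80          # AA wins roughly 4 times in 5 all-in preflop
bankroll = 2000       # recommended cash-game bankroll in big blinds

ev = p_win * stack - (1 - p_win) * stack
print(ev)             # +60 bb per all-in: clearly +EV despite the 20% losses

# With 20 buy-ins behind you, a single 20% loss costs 5% of the bankroll -
# painful, but nowhere near ruinous, which is why variance stops mattering.
print(stack / bankroll)  # 0.05
```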

To apply this metaphor to real life, you might say something like "Consider how much you're willing to lose in the event things turn out badly (e.g., a year or two on a startup, six months on a relationship) and then, within that amount, bet the house."

There's a counterargument to the AGI hype that basically says - of course the labs would want to hype this technology, they make money that way; just because they say they believe in short timelines doesn't mean it's true. Specifically, the claim here is not that the AI lab CEOs are mistaken, but rather that they are actively lying, and they know AGI isn't around the corner.

What actions have frontier AI labs taken in the last year or two that wouldn't make sense, given the above explanation? Stuff like GDM's merger, or OpenAI (reportedly) operating at a massive loss. Ideally these actions would be reported on by entities other than the companies themselves, in order to help convince skeptics. I've definitely seen stuff like this around, but I can't remember where, and the search terms are too vague.

I've also tried using Deep Research for this, but it doesn't seem to understand the idea of only looking at actions that are far more likely in the non-hype world than the hype world - it keeps bringing up things like investor decks projecting high returns, which are entirely compatible with the labs hyping themselves up.
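Here's a sketch of the filter I have in mind, in likelihood-ratio terms (all the probabilities are made-up placeholders, purely for illustration):

```python
# How strongly an observed action favours "genuine belief" over "hype".
# Every probability here is an invented placeholder, just to show the idea.
def bayes_factor(p_given_genuine: float, p_given_hype: float) -> float:
    return p_given_genuine / p_given_hype

# Investor decks projecting high returns: expected under either hypothesis,
# so they carry almost no evidence either way.
print(bayes_factor(0.9, 0.9))  # 1.0 - useless as evidence

# Sustained massive operating losses: plausible if you genuinely expect
# AGI-scale payoffs, costly and strange if you know it's just hype.
print(bayes_factor(0.6, 0.1))  # 6.0 - the kind of action worth collecting
```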

Significant Digits is (or was, a few years ago) considered the best one, to my recollection.

I bought a month of Deep Research and am open to running queries if people have a few in mind but don't want to spend 200 bucks for them. I'll run up to 25 queries in total.

A paragraph or two of detail is good - you can send me supporting documents via wnlonvyrlpf@tznvy.pbz (ROT13) if you want. Offer is open publicly or via PM.

Having reflected on this decision more, I have decided I no longer endorse those feelings in point B of my second-to-last paragraph. In fact, I've decided that "I donated roughly 1k to a website that provided way more expected value than that to me over my lifetime, and also if it shut down I think that would be a major blow to one of the most important causes in the world" is something to be proud of, not embarrassed by, and something worthy of being occasionally reminded of.

So if you're still sending them out I'd gladly take one after all :)

I've been procrastinating on this, but I heard it was the last day to do this, so here I am. I've utilised LessWrong for years, but am also a notoriously cheap bastard. I'm working on this. That said, I feel I should pay something back, for what I've gotten out of it.

When I was 20 or so, I was rather directionless, and didn't know what I wanted to do in life, bouncing between ideas, never finishing them. I was reading LessWrong at the time. At some point, a LessWrong-ism popped into my head - "Jay - this thing you're doing isn't working. Your interests change faster than you can commit to a career. Therefore, you need a career strategy that does not rely on your interests." That last sentence definitely would not have occurred to me without LessWrong. It felt like a qualitative shift in thinking, as though I had finally truly learned a new pattern. Nowadays it seems obvious, and it would be obvious to many of my friends...but back then, I remember that flash of insight, and I've never forgotten it.

I came up with a set of desiderata - something I'd be good at, wouldn't hate, and could do indoors for a reasonable salary. I decided to be an accountant, which is evidence for this whole "one-shot the problem" thing being hard, but wisely pivoted into pursuing a software engineering degree a year later.

While EA was what got me into AI safety, even ignoring the effect LessWrong has had on EA, the skills I decided to learn thanks to LessWrong principles are potentially the only reason I have much of a say in the future at all. Not to mention I've made a pretty solid amount of money out of it.

Considering the amount of value I've gotten out of LessWrong, I'm far too cheap to donate an amount that would be truly "fair", but I wanted to donate a solid amount anyway - an amount that at least justifies the years of use I've gotten out of the site. I talked myself into donating $1,000, but then I realised that A) I didn't want a shirt to affect my donation decisions, and B) I'd be a bit embarrassed to have a shirt that symbolises how I donated four figures to a website that has helped me think good. I feel like I'll forget the money easily once I donate it, and it won't affect my day to day life at all. Unless, of course, I have a physical reminder of it.

Thus, I have donated $999 USD to the cause.

Hi Giorgi,

Not an expert on this, but I believe the idea is that over time the agent will learn to assign negligible probability to actions that don't do anything. For instance, imagine a game where the agent can move in four directions, but if there's a wall in front of it, moving forward does nothing. The agent will eventually learn to stop moving forward in this circumstance. So you could probably make it work, even if it's a bit less efficient, by having the environment do nothing when an invalid action is selected - a sketch of what I mean is below.
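A minimal sketch of that idea in a toy gridworld (all the names and details here are my own invention, not from any particular RL library):

```python
class GridWorld:
    """Toy environment where selecting a blocked action is simply a no-op."""
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, width=5, height=5, walls=frozenset()):
        self.width, self.height = width, height
        self.walls = walls          # set of (row, col) cells the agent can't enter
        self.pos = (0, 0)

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # Invalid move (off-grid or into a wall): the state is unchanged.
        # The agent still pays the per-step cost, so over time it learns to
        # assign negligible probability to this action in blocked states.
        if 0 <= r < self.height and 0 <= c < self.width and (r, c) not in self.walls:
            self.pos = (r, c)
        reward = -1.0               # per-step cost; a goal cell would add a bonus
        return self.pos, reward
```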

Thanks for this! I've changed the sentence to:

The target network gets to see one more step than the Q-network does, and thus is a better predictor.

Hopefully this prevents others from the same confusion :)
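For anyone who still finds it confusing, here's a minimal sketch of the point in a standard DQN setup (the function and variable names are placeholders of mine, not from the original post): the target network is evaluated at the *next* state, so its estimate folds in one real observed reward the Q-network hasn't seen.

```python
import torch

def td_targets(target_net, rewards, next_states, dones, gamma=0.99):
    """One-step TD targets for DQN: r_t + gamma * max_a' Q_target(s_{t+1}, a')."""
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    # dones is a float tensor of 0/1 flags; terminal states get no bootstrap.
    return rewards + gamma * next_q * (1.0 - dones)

# The Q-network's own prediction Q(s_t, a_t) is regressed toward this target,
# which includes the real reward r_t - the "one more step" of information.
```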
