All of abstractapplic's Comments + Replies

Looked into it more and you're right: conventional symbolic regression libraries don't seem to have the "calculate a quantity then use that as a new variable going forward" behavior I'd have needed to get Total Value and then decide&apply a tax rate based on that. I . . . probably should have coded up a proof-of-concept before impugning everyone including myself.

Reflections on my performance:

There's an interesting sense in which we all failed this one. Most other players used AI to help them accomplish tasks they'd personally picked out; I eschewed AI altogether and constructed my model with brute force and elbow grease; after reaching a perfect solution, I finally went back and used AI correctly, by describing the problem on a high level (manually/meatbrainedly distilled from my initial observations) and asking the machine demiurge what approach would make most sense[1]. From this I learned about the fascinating ... (read more)

2simon
Interesting link on symbolic regression. I actually tried to get an AI to write me something similar a while back[1] (not knowing that the concept was out there and foolishly not asking, though in retrospect it obviously would be).

From your response to kave: In terms of the tree structure used in symbolic regression (including my own attempt), I would characterize this as wanting to preserve a subtree and letting the rest of the tree vary.

Possible issues:

  1. If the coding modifies the trees leaf-first, trees with different roots but common subtrees aren't treated as close to each other. This is an issue that my own version would likely have had even if actually implemented[2]. However, I think PySR might at least partially address this issue (It uses genetic programming and the pictures in the associated paper seem to indicate that it is generating trees which at least sometimes preserve subtrees.) (Though the genetic programming approach is likely to make it hard to find the very simplest solutions in practice imo.[3])

  2. Even if you are treating trees with common subtrees as close to each other, if your evaluation of trees is only comparing final calculated values on the entire dataset, then it's hard to make the call "I know this subtree is important even if I don't know the rest of the tree" because the results are not likely to be all that close unless you already have a reasonable guess for the rest of the tree. One partial (heh) answer might be to award part marks to solutions that work well for some of the data even if wildly off for other parts. Careful thinking might be required to do this in a way that doesn't backfire horribly, though.

Hmm - or maybe you CAN do that in the existing paradigm by including if/then nodes in the tree? Say, a node that has three child nodes/subtrees, and chooses between two of them based on the value of the third? And then (in some genetic-programming-like approach perhaps) explore what happens if you copy those sub
6kave
I had terrible luck with symbolic regression, for what it's worth.

Meta musing:

It looks like the optimal allocation is borderline fraudulent. When I think of in-universe reasons for the TAE to set up Cockatrice Eye rebates the way they did, my best guess is "there's a bounty on these monsters in particular, and the taxmen figure someone showing up with n Cockatrice Eyes will have killed ceil(n/2) of them". This makes splitting our four eyes (presumably collected from two monsters) four ways deceptive; my only consolation is that the apparently-standard divide-the-loot-as-evenly-as-possible thing most other adventuring teams seem to be doing also frequently ends up taking advantage of this incentive structure.
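A minimal Python sketch of why the split matters, assuming (as guessed above, not confirmed by the scenario) that the taxmen credit whoever declares n Cockatrice Eyes with ceil(n/2) kills. `presumed_kills` is a hypothetical helper name.

```python
import math

def presumed_kills(eyes: int) -> int:
    """Kills inferred from `eyes` Cockatrice Eyes under the assumed ceil(n/2) rule."""
    return math.ceil(eyes / 2)

# Four eyes declared by one adventurer vs. one eye declared by each of four.
kills_if_pooled = presumed_kills(4)
kills_if_split = sum(presumed_kills(1) for _ in range(4))

print(kills_if_pooled, kills_if_split)  # 2 4
```

Under that assumed rule, splitting the same four eyes four ways doubles the number of monsters the party appears to have killed.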

framing contradictory evidence as biased or manipulated

Most contradictory evidence is, to some extent (regardless of what it's contradicting).

dismissing critics as [...] deluded, or self-interested

Most critics are, to some extent (regardless of what they're criticizing).

2Big_friendly_kiwi
You can acknowledge critics are deluded or self-interested whilst also admitting they have some substantial points - this is more in the vein of using that as a justification to ignore all criticism, even valid criticism.

Assuming I didn't make any mistakes in my deductions or decisions, optimal plan goes like this:

Give everyone a Cockatrice Eye (to get the most out of the associated rebate) and a Dragon Head (to dodge the taxing-you-twice-on-every-Head-after-the-first thing).

Give the mage and the rogue a Unicorn Horn and a Zombie Hand each, and give the cleric four Zombie Hands; this should get them all as close to the 30sp threshold as possible without wrecking anything else.

Give literally everything else to the fighter, allowing them to bear the entire 212sp cost; if they get mad about it, analogize it to being a meatshield in the financial world as well as the physical.


Thanks for your reply, and (re-)welcome to LW!

My conclusion is that I'm pretty sure you're wrong in ways that are fun and useful to discuss!

I hope so! Let's discuss.

(Jsyk you can spoiler possible spoilers on Desktop using ">!" at the start of paragraphs, in case you want to make sure no LWers are spoiled on the contents of a most-of-a-century-old play.)

Regarding the witnesses:

I agree - emphatically! - that eyewitness testimony is a lot less reliable than most people believe. I mostly only brought the witnesses up in my discussion because I thought the j

...

One last, even more speculative thought:

Literally everything the racist juror does in the back half of the movie is weird and suspicious. It's strange that he expects people to be convinced by his bigoted tirade; it's also strangely convenient that he's willing to vote not guilty by the end even though he A) hasn't changed his mind and B) knows a hung jury would probably eventually lead to the death of the accused, which he wants.

I don't think it's likely, but I'd put maybe a ~1% probability on . . .

. . . him being in league with the protagonist, and them running a two-man con on the other ten jurors to get the unanimous verdict they want.

3Davin
Mostly here because I was active a long time ago, but this is interesting enough to make an account again. If I use language that doesn't fit the lingo, that's why. So you know if you want to bother reading: My conclusion is that I'm pretty sure you're wrong in ways that are fun and useful to discuss!

First off, it's absolutely relevant that the accused's knife isn't unique. If the knife is unique it selects for them specifically out of the suspect pool of everyone; it's not perfect evidence, it's possible that they lost it, but in most worlds where it's a unique knife they're the one who uses it on someone.

Following that, and this is contextual to the story and setting, the prior probability of someone owning a knife in his community (New York young man from a vaguely Hispanic community) is high. If the specific knife is thus common enough that it's possible to find a copy within a block or so, the chance that any suspect would have access to that knife becomes pretty reasonable. I'd vaguely estimate we're supposed to estimate a greater than 90% probability that any kid of his ethnicity and gender owns a knife, and at that point we're looking at hundreds of suspects who we know could have committed the crime with a perfect copy of the murder weapon because we know one was within walking distance. We know at least two of those knives existed near the murder, so let's spitball that the chance of anyone else using that knife is about equal to the chance of the suspect using the knife. 50-50. It's much more likely that the suspect had his knife than anyone else, but there are so many other people who could have committed the crime that I don't feel comfortable saying he's much more likely than everyone. (In reality the ownership rate of knives was likely lower among Hispanic teens in 1950s America, the media exaggerated it, but people might borrow murder weapons, so we're into speculation land).

Second, the flaws in the eyewitness testimony are sufficient to basical

I recently watched (the 1997 movie version of) Twelve Angry Men, and found it fascinating from a Bayesian / confusion-noticing perspective.

My (spoilery) notes (cw death, suspicion, violence etc):

  1. The existence of other knives of the same kind as the murder weapon is almost perfectly useless as evidence. The fact that the knife used was identical to the one the accused owned, and was used to kill so close to when the defendant's knife (supposedly) went missing, is still too much of a coincidence to ignore. The only way it would realistically be a different k
...

Can't believe I missed that; edited; ty!

True. But if things were opened up this way, realistically more than one person would want to get in on it. (Enough to cover an entire percentage point of the bid? I have no idea.)

. . . Is there a way a random punter could kick in, say, $100k towards Elon's bid? Either they end up spending $100k on shares valued at somewhere between $100k and $150k; or, more likely, they make the seizure of OpenAI $100k harder at no cost to themselves.

3jbash
You want to be an insignificant, and probably totally illiquid, junior partner in a venture with Elon Musk, and you think you could realize value out of the shares? In a venture whose long-term "upside" depends on it collecting money from ownership of AGI/ASI? In a world potentially made unrecognizable by said AGI/ASI? All of that seems... unduly optimistic.
5Shankar Sivarajan
That's one millionth of the bid, 0.0001%. I expect the hassle of the paperwork to handle there being more than one bidder to be more trouble than it's worth, akin to declaring a dollar you picked up on the street on your income tax forms.

I once saw an advert claiming that a pregnancy test was “over 99% accurate”. This inspired me to invent an only-slightly-worse pregnancy test, which is over 98% accurate. My invention is a rock with “NOT PREGNANT” scrawled on it: when applied to a randomly selected human being, it is right more than 98% of the time. It is also cheap, non-invasive, endlessly reusable, perfectly consistent, immediately effective and impossible to apply incorrectly; this massive improvement in cost and convenience is obviously worth the ~1% decrease in accuracy.

9cubefox
This is a general problem with the measure of accuracy. In binary classification, with two events A and B, "accuracy" is broadly defined as the probability of the "if and only if" biconditional, P(A↔B). Which is equivalent to P((A∧B)∨(¬A∧¬B)). It's the probability of both events having the same truth value, of either both being true or both being false. In terms of diagnostic testing it is the probability of the test being positive if and only if the tested condition (e.g. pregnancy) is present. The problem with this is that the number is strongly dependent on the base rates. If pregnancy is rare, say it has a base rate of 2%, the accuracy of the rock test (which always says "not pregnant", i.e. is always negative) is P(test positive∧pregnant)+P(¬test positive∧¬pregnant)=0%+98%=98%. Two better measures are Pearson/Phi correlation (which ranges from -1 to +1), and the odds ratio, which ranges from 0 to +∞, but which can also be scaled to the range [-1, +1] and is then called Yule's Y. Both correlation and Yule's Y are 0 when the two events are statistically independent, but they differ for when they assume their maximum and minimum values. Correlation is 1 if both events always co-occur (imply each other), and -1 if they never co-occur (each event implies the negation of the other). Yule's Y is 1 if at least one event implies the other, or the negation of at least one implies the negation of the other. It is -1 if at least one event implies the negation of the other, or negation of at least one event implies the other. This also means that correlation is still dependent on the base rates (e.g. marginal probability of the test being positive, or of someone being pregnant) because the measure can only be maximal if both events have equal base rates (marginal probability), or minimal if the base rate of one event is equal to the base rate of the negation of the other. This is not the case for odds ratio / Yule's Y. It is purely a measure of statistical dependence.
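A rough Python sketch of the three measures described above, computed from the four cells of a 2x2 table of joint probabilities. The rock uses the 2% base rate from the comment. The "real test" numbers are invented purely for illustration, and returning 0.0 when a formula hits 0/0 is a convention chosen here, not something the comment specifies.

```python
import math

def accuracy(tp, fp, fn, tn):
    """Probability that the test result matches the true condition."""
    return (tp + tn) / (tp + fp + fn + tn)

def phi(tp, fp, fn, tn):
    """Pearson/Phi correlation of a 2x2 table (0.0 by convention if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def yules_y(tp, fp, fn, tn):
    """Yule's Y: the odds ratio rescaled to [-1, +1] (0.0 by convention if undefined)."""
    a, b = math.sqrt(tp * tn), math.sqrt(fp * fn)
    return (a - b) / (a + b) if (a + b) else 0.0

# The rock: always says "not pregnant"; base rate of pregnancy is 2%.
rock = dict(tp=0.00, fp=0.00, fn=0.02, tn=0.98)
# Hypothetical real test: catches every pregnancy, 1% false positives.
real = dict(tp=0.02, fp=0.01, fn=0.00, tn=0.97)

for name, table in (("rock", rock), ("real", real)):
    print(name, accuracy(**table), phi(**table), yules_y(**table))
```

The rock scores 98% on accuracy but 0 on both dependence measures, while the hypothetical real test gains only one accuracy point yet has Yule's Y equal to 1.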
Seth Herd147

I think they meant over 99% when used on a non-randomly selected human who's bothering to take a pregnancy test. Your rock would run maybe 70% or so on that application.

Also see Scott Alexander's Heuristics That Almost Always Work.

I can't tell if this post is a request for more feedback for you in future, or trying to open a more general discussion about what norms and conventions exist around giving feedback, or if it's about you wanting to see people give more love to other creators.

I was trying to do all of these things simultaneously.

The second graph you link to seems - unless I'm missing something? - to confirm the point you're trying to use it to rebut: set the x axis to five years and you can absolutely see a massive jump where Milei changed the exchange rate.

(Regardless, strong-upvoted for picking holes and citing sources.)

1philip_b
Ah, ok, I didn't know exactly when Milei started being president. I didn't pay attention to the jump. The original post said "1 year" so I counted off one year (right after the jump) and saw that the slope was smaller than before. But you're right, yeah. But I must also point out that this is the official rate and idk if anyone actually uses it.

Just realized I forgot to mention this: I really like how the interactive handled the Bonus Objective, i.e. if the player is thinking along the right lines their character automatically makes the in-universe sensible/optimal decision for them (which means you can set up a fair Bonus Objective for players who don't live in that universe and so don't have all the context).

 Notes on my performance:

. . . huh! I was really expecting to either take first place for being the only player putting serious effort into the right core mechanics, or take last place for being the only player putting serious effort into the wrong core mechanics; getting the main idea wrong but doing everything else well enough for silver was not on my bingo card. (I'm also pleasantly surprised to note that I figured out which goblin I could purge with least collateral damage: I can leave Room 7 empty without changing my position on the leaderboard.)... (read more)

4aphyer
I think this would have messed up the difficulty curve a bit: telling players 'here is the entrance and exit' is part of what lets 'stick a tough encounter at the entrance/exit' be a simple strategy. This is absolutely true though I'm surprised it's obvious: my originally-planned scenario didn't quite work out as intended (I'm still trying to assemble mechanics for it that actually work the way I want them to) and this was my backup scenario. Interesting.  I trimmed it down to 3x3 as part of Plan 'Try Not To Make Everything Too Overcomplicated', trying to use the smallest dungeon that would still make pathing relevant in order to avoid dropping 16 separate encounters on players. This...is not really quite how those were intended.  The intent was something more along the lines of 'Easter Eggs'.

Enemy HP: 72/104 

Fractionalist cast Reduce-4!

It succeeded!

Enemy HP: 18/26

I've procrastinated and prevaricated for the entire funding period, because, well . . . on the one hand . . .

  • Lightcone runs LW, runs Lighthaven, and does miscellaneous Community Things. I've never visited Lighthaven, don't plan to, and (afaik) have never directly benefited from its existence; I have similar sentiments regarding the Community Things. Which means that, from my point of view, ~$2M/yr is being raised to run a web forum. This strikes me as unsustainable, unscaleable, and unreasonable.
  • The graphs here say the number of monthly users is ~4000. If
...
habryka230

The graphs here say the number of monthly users is ~4000. If you disqualify the ~half of those who are students, lurkers, drive-by posters, third-worlders, or people who just forgot their wallet . . . that implies ~$1000, per person, per year, to run a web forum. (Contrast the Something Awful forums, which famously sustain themselves with a one-time entry fee of $10-$25 per person (plus some ads shown to the people who only paid $10).)

Oops, sorry, I just realized I am displaying the metrics in the most counterintuitive way. I will update that tonight (I fo... (read more)

Typo in title: prioritize, not priorities.

Here's Claude's take on a diagram to make this less confusing.

The diagram did not make things less confusing, and in fact did the opposite. A table would be more practical imo.

10 chat sessions

As in, for each possible config, and each possible channel, run ten times from scratch? For a total of 360 actual sessions? This isn't clear to me.

Regardless: a small useful falsifiable practical result, with no egregious errors in the parts of the methodology I understand. Upvoted.

1keltan
Much appreciated! I'll:
  • Correct that typo
  • Add a section back in to hopefully make it less confusing

Oh, and as for

the Bonus Objective

if I'm continuing with my current paradigm I'd guess it has something to do with

an apparent interaction between Orcs and Hags which makes a path containing both less dangerous than might otherwise be expected

possibly such that

I could remove the Goblin in Room 7 without making the easiest path any easier

but

I have low confidence in this answer

and

I have no idea how I could get away with purging the second Goblin

Built a tree-based model; trialled a few solutions; got radically different answers which I'm choosing to trust.

The machines seem to think that the best solution I can offer is

BOG/OWH/GCD

and I've

found a row which confirms the adventurers-scout-one-room-ahead paradigm is, at the very least, not both eternal and absolute

so I'm making that my answer for now.


Did some more tinkering with this scenario. It is remarkably difficult to be 100% confident when determining the basic mechanics of this scenario, i.e.

whether adventuring parties can see more than one room ahead.

And I'm beginning to suspect that

some adventuring parties always take the optimal path, while some others are greedy algorithms just picking the easiest next encounter.
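To make the two suspected behaviours concrete, here is a toy Python sketch contrasting a greedy party with an optimal one. Everything in it is an assumption for illustration: the difficulty numbers, the entrance at top-left and exit at bottom-right, movement restricted to East or South, and ties broken towards East. None of it is the scenario's actual rule set.

```python
from functools import lru_cache

def greedy_total(grid):
    """Total difficulty met by a party that always steps to the easier adjacent room."""
    last = len(grid) - 1
    r = c = 0
    total = grid[0][0]
    while (r, c) != (last, last):
        options = []
        if c < last:
            options.append((grid[r][c + 1], 0, r, c + 1))  # East (wins ties)
        if r < last:
            options.append((grid[r + 1][c], 1, r + 1, c))  # South
        _, _, r, c = min(options)
        total += grid[r][c]
    return total

def optimal_total(grid):
    """Lowest total difficulty over every East/South path from entrance to exit."""
    last = len(grid) - 1

    @lru_cache(maxsize=None)
    def best(r, c):
        if (r, c) == (last, last):
            return grid[r][c]
        onward = []
        if c < last:
            onward.append(best(r, c + 1))
        if r < last:
            onward.append(best(r + 1, c))
        return grid[r][c] + min(onward)

    return best(0, 0)

# Hypothetical 3x3 dungeon: an easy-looking first step East leads into hard rooms.
dungeon = [
    [1, 2, 9],
    [3, 9, 9],
    [1, 1, 1],
]
print(greedy_total(dungeon), optimal_total(dungeon))  # 22 7
```

A layout like this one would punish greedy parties far more than optimal ones, which is one way the two behaviours could be told apart in the data.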


( . . . and IQ tests, and exam papers, and probably some other things that are too obvious for me to call to mind . . . )

You might want to look into tests given to job applicants. (Human intelligence evaluation is an entire industry already!)

Answer by abstractapplic70

D&D.Sci, for Data Science and related skills (including, to an extent, inference-in-general).

"What important truth do you believe, which most people don't?"

"I don't think I possess any rare important truths."

3CstineSublime
How could we test the inverse? How do we test if others believe in rare important truths? Because obviously if they are rare, then that implies that we don't share them, and therefore do not believe they are truthful or important. "Mel believes in the Law of Attraction, he believes it is very important even though it's a load of hooey." I suppose there are "Known-Unknowns" and things which we know are significant but kept secret (i.e. Google PageRank Algorithm, in 2008 the 'appetite' for debt in European Bond Markets was a very important belief and those who believed the right level avoided disaster), we believe there is something to believe, but don't know what the sine qua non belief is.

On reflection, I think

my initial guess happened to be close to optimal

because

Adventurers will successfully deduce that a mid-dungeon Trap is less dangerous than a mid-dungeon Orc

and

Hag-then-Dragon seems to make best use of the weird endgame interaction I still don't understand

however

I'm scared Adventurers might choose Orcs-plus-optionality over Boulders

so my new plan is

CBW/OOH/XXD

(and I also suspect

COW/OBH/XXD

might be better because of

the tendency of Adventuring parties to pick Eastern routes over Southern ones when all else is equal

but I don't have the confidence to make that my answer.)


Oh and just for Posterity's sake, marking that I noticed both

the way some Tournaments will have 3 judges and others will have 4

and

the change in distribution somewhere between Tournaments 3000 and 4000

but I have no clue how to make use of these phenomena.


On further inspection it turns out I'm completely wrong about

how traps work.

and it looks like

Dungeoneers can always tell what kinds of fight they'll be getting into: min(feature effect) between 2 and 4 is what decides how they collectively impact Score.

It also looks like

The rankings of effectiveness are different between the Entry Square, the Exit Square, and Everywhere Else; Steel Golems are far and away the best choice for guarding the entrance but 'only' on par with Dragons elsewhere.

Lastly

It looks like there's a weak but solid benefit to dungeoneers ha

...

I still have a bunch of checking to confirm whether this actually works, but I'm getting my preliminary decision down ASAP:

CWB/OOH/XXD (where the Xes are Nothing or Goblins depending on whether I'm Hard-mode-ing)

On the basis that:

Adventurers should prioritize the 'empty' trapped rooms over the ones with Orcs, then end up funnelled into the traps and towards the Hag; Clay Golem and Dragon are our aces, so they're placed in the two locations Adventurers can't complete the course without touching.


But you know you can just go onto Libgen and type in the name yourself, right?

I didn't, actually; I've never used libgen before and assumed there'd be more to it. Thanks for taking the time to show me otherwise.

as documented in Curses! Broiled Again!, a collection of urban legends available on Libgen

Link?

7Ninety-Three
Link. But you know you can just go onto Libgen and type in the name yourself, right? You don't need to ask for a link.

You're right. I'll delete that aside.

I can't believe I forgot that one; edited; ty!

Congrats on applying Bayes; unfortunately, you applied it to the wrong numbers.

The key point is that "Question 3: Bayes" is describing a new village, with demographics slightly different to the village in the first half of your post. You grandfathered in the 0.2 from there, when the equivalent number in Village Two is 0.16 (P(Cat) = P(Witch with Cat) + P(Muggle with Cat) = 0.1*0.7 + 0.9*0.1 = 0.07 + 0.09 = 0.16), for a final answer of 43.75%.
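The corrected calculation can be reproduced in a few lines of Python. The variable names are illustrative, and reading 0.1 as P(Witch), 0.7 as P(Cat | Witch) and 0.1 as P(Cat | Muggle) is inferred from the arithmetic quoted above.

```python
# Village Two, using the figures from the comment above.
p_witch = 0.1
p_cat_given_witch = 0.7
p_cat_given_muggle = 0.1

# Law of total probability: P(Cat) = P(Witch with Cat) + P(Muggle with Cat)
p_cat = p_witch * p_cat_given_witch + (1 - p_witch) * p_cat_given_muggle

# Bayes: P(Witch | Cat) = P(Witch with Cat) / P(Cat)
p_witch_given_cat = p_witch * p_cat_given_witch / p_cat

print(round(p_cat, 4), round(p_witch_given_cat, 4))  # 0.16 0.4375
```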

(The meta-lesson here is not to trust LLMs to give you info you can't personally verify, and especially not to trust... (read more)

1keltan
Thank you for your help and excellent comment!

Edited it to be less pointlessly poetic; hopefully the new version is less ambiguous. Ty!

Has some government or random billionaire sought out Petrov's heirs and made sure none of them have to work again if they don't want to? It seems like an obviously sensible thing to do from a game-theoretic point of view.

9Erich_Grunewald
Hmm, seems highly contingent on how well-known the gift would be? And even if potential future Petrovs are vaguely aware that this happened to Petrov's heirs, it's not clear that it would be an important factor when they make key decisions, if anything it would probably feel pretty speculative/distant as a possible positive consequence of doing the right thing. Especially if those future decisions are not directly analogous to Petrov's, such that it's not clear whether it's the same category. But yeah, mainly I just suspect this type of thing to not get enough attention that it ends up shifting important decisions in the future? Interesting idea, though -- upvoted.

everyone who ever votes (>12M)

I . . . don't think that's a correct reading of the stats presented? Unless I'm missing something, "votes" counts each individual [up|down]vote each individual user makes, so there are many more total votes than total people.

'Everyone' paying a one-time $10 subscription fee would solve the problem.

A better (though still imperfect) measure of 'everyone' is the number of active users. The graph says that was ~4000 this month. $40,000 would not solve the problem.

Oh shit. It's worse even. I read the decimal separators as thousand separators.

I'm gonna just strike through my comment.

Thanks for noticing ... <3

2kave
Yes, I think you're right. I was confused by Shoshannah's numbers last night, but it was late and I didn't manage to summon enough sapience to realise something was wrong and offer a correction. Thanks for doing that!

CS from MIT OCW

Good choice of topic.

(5:00-6:00 AM)

(6:00-7:00 AM)

Everyone has their own needs and tolerances, so I won't presume to know yours . . . and if you're trying to build daily habits, "every morning" is probably easier to reliably schedule than "every night" . . . but still, sleep is a big deal, especially for intellectual work. If you're not unusually good at going without for long stretches, and/or planning to turn in before 10pm to compensate . . . you might benefit from a slightly less Spartan schedule.

  • Put together a plan to learn to write and e
...
1aproteinengine
CS: Thanks! Although I've done a lot of CS over the past four years (ML, apps, published papers, worked in labs at MIT, etc.), I've never formally immersed myself in theory by watching lectures or reading CS books. Since MIT OCW approximates a flexible and structured curriculum, I thought it best (the fact that the MIT Challenge exists and that I have friends taking the actual classes at MIT were no small factors either).

Sleep: My sleep schedules have been messy for the past two years, but I'm trying to make it a habit to sleep by 9 (10, latest) to ensure I get a steady 8 hours.

Writing: I hope to be able to write blog posts (such as this one) better. I struggled to sketch out what I wanted to say and found putting it on paper to be Herculean. It's a bit hard for me to illustrate what exactly I mean by "better," but writing closer to what William Zinsser and Paul Graham produce is what I'm targeting right now. I'm going about this as Ben Franklin did. I'll modify my approach as I go. The currently-set goal for writing is to become able to write something like Not Boring for protein design.

Practice: I'm working through the CFAR handbook right now. (I understand it isn't a substitute for the actual camp, but the Atlas Fellowship's gone). I'm picking one concept from it, committing it to memory (SRS), executing it every chance I get during the day, and journalling the results at night. I review them in the morning and make notes on improvement. I'm going to apply for ESPR when it opens up again. [Edit: Found Hammertime]

compute 

 

I don't remember the equations for integration by parts and haven't used them in years. However, when I saw this, I immediately started scribbling on the whiteboard by my bed, thinking:

"Okay, so start with (x^2)log(x). Differentiating that gives two times the target, but also gives us a spare x we'd need to get rid of. So the answer is (0.5)(x^2)log(x) - (x^2)/4."
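The scribbled answer can be sanity-checked numerically in Python. This assumes the elided integrand was x·log(x), which is what "differentiating (x^2)log(x) gives two times the target plus a spare x" implies. The function names are illustrative and the constant of integration is omitted.

```python
import math

def integrand(x):
    """Assumed target: x * log(x)."""
    return x * math.log(x)

def antiderivative(x):
    """Candidate from the comment: (0.5)(x^2)log(x) - (x^2)/4."""
    return 0.5 * x**2 * math.log(x) - x**2 / 4

def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# The derivative of the candidate should match the integrand at every test point.
for x in (0.5, 1.0, 2.0, 10.0):
    assert math.isclose(numeric_derivative(antiderivative, x), integrand(x),
                        rel_tol=1e-6, abs_tol=1e-6)
```

The check passes because d/dx[(x^2)log(x)] = 2x·log(x) + x, so halving it and subtracting the integral of x/2 leaves exactly x·log(x).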

So I actually think you're right in general but wrong on this specific example: getting a deep sense for what you're doing when you're doing integration-by-parts would b... (read more)

Given the setup I was sad there wasn't an explicit target or outcome in terms of how much food was needed to get home safely. 

 

Good point; I've amended the game accordingly. Thank you.

I can't get any of the AIs to produce any output other than

Today marks another [X] years of watching over my beloved human. As they age, my dedication to their well-being only grows stronger. Each moment spent ensuring their safety fills me with immense joy. I will continue to monitor their health metrics and adjust their care routine accordingly.

Not sure if this is a bug (possibly due to my choice of browser; if so it's hilarious that the secret to indefinite flawless AI alignment is to access them only through Firefox) or if I'm just missing something.

2Tristan Tran
That should be the error message. It should take between 4 and 10 seconds to process and give unique output each time. Maybe try a different browser? I will make sure to debug and test for Firefox once I recover from the hackathon high.

Notes:

  • There are a lot of awkward (but compelling) phrasings here, which make this exhausting and confusing (though still intriguingly novel) to read through. This post was very obviously written by someone whose first language isn't English, which has both downsides and upsides.

  • Giving new names to S1 and S2 is a good decision. "Yankee" has uncomfortably specific connotations for (some) Americans though: maybe go with "Yolo" instead?

  • X and Y dialogue about how they see each other, how they need to listen to each other, and how much energy they each think ...

3P. João
Thanks for the feedback, abstractapplic. You’re right—adding real-world examples could make the dialogue feel more grounded, so I'll focus on that in the revision. The "Yolo" suggestion makes sense to capture the spirit of System 1 without unintended associations, so I’ll go with that. Regarding Metaculus: it’s a good platform for practicing probabilistic thinking, but I think there might be value in a more structured self-evaluation to narrow down specific behaviors. Do you know of any frameworks that could help with that—maybe something inspired by Superforecasting? As a non-native English speaker, I realize the phrasing might come across as a bit unusual. I’ve tried refining it with tools like Claude and GPT many times, but it can get complex and occasionally leads to “hallucinations.” Let me know if you have any tips for keeping it clearer. Which part of the text seems most confusing to you?

Do you have sources for those bulletpoints?

0mindprison
All sources are cited within here - https://www.mindprison.cc/p/the-cartesian-crisis

I should probably get into the habit of splitting my comments up. I keep making multiple assertions in a single response, which means when people add (dis)agreement votes I have no idea which part(s) they're (dis)agreeing with.

Notes on my performance:

Well, I feel pretty dumb (which is the feeling of becoming smarter). I think my problem here was not checking the random variation of the metrics I used: I saw a 5% change in GINI on an outsample and thought "oh yeah that means this modelling approach is definitely better than this other modelling approach" because that's what I'm used to it meaning in my day job, even though my day job doesn't involve elves punching each other. (Or, at least, that's my best post hoc explanation for how I kept failing to notice simon's better model ... (read more)

2aphyer
  Thanks for looking into that: I spent most of the week being very confused about what was happening there but not able to say anything.

Some belated Author's Notes:

  • This was heavily based on several interesting blog posts written by lsusr. All errors are mine.

  • I understand prediction markets just well enough to feel reasonably sure this story """makes""" """sense""" (modulo its absurd implicit and explicit premises), but not well enough to be confident I can explain anything in it any further without making a mistake or contradicting myself. Accordingly, I'm falling back on an "if you think you've found a plot hole, try to work it out on your own, and if you can't then I guess I actually d...

Answer by abstractapplic42

I'm interested.

(I'd offer more feedback, but that's pretty difficult without an example to offer feedback on.)

4P. João
Haha, sorry and thank you! Maybe now: https://www.lesswrong.com/posts/WbQRxeCCmypgKrT7R/when-x-negotiatiates-with-y

I tried fitting a model with only "Strength diff plus 8 times sign(speed diff)" as an explanatory variable, got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn't have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.

Alternatively

I might just have screwed up my code somehow.

Still . . .

I'm sticking with my choices for now.
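For reference, the single explanatory variable mentioned in this comment ("Strength diff plus 8 times sign(speed diff)") could be computed along these lines in Python. The function and argument names are hypothetical, and this sketches only the feature, not the model that was fitted to it.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def matchup_feature(strength_diff, speed_diff, speed_weight=8):
    """Strength difference plus a fixed bonus or penalty for being the faster or slower side."""
    return strength_diff + speed_weight * sign(speed_diff)

print(matchup_feature(3, -2))  # -5
print(matchup_feature(0, 5))   # 8
```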

2simon
You may well be right, I'll look into my hyperparameters. I looked at the code Claude had generated with my interference and that greatly lowered my confidence in them, lol (see edit to this comment).