All of abstractapplic's Comments + Replies

abstractapplic · 1mo

Thanks for your reply, and (re-)welcome to LW!

My conclusion is that I'm pretty sure you're wrong in ways that are fun and useful to discuss!

I hope so! Let's discuss.

(Jsyk you can spoiler possible spoilers on Desktop using ">!" at the start of paragraphs, in case you want to make sure no LWers are spoiled on the contents of a most-of-a-century-old play.)

Regarding the witnesses:

I agree - emphatically! - that eyewitness testimony is a lot less reliable than most people believe. I mostly only brought the witnesses up in my discussion because I thought the j...

abstractapplic · 1mo

One last, even more speculative thought:

Literally everything the racist juror does in the back half of the movie is weird and suspicious. It's strange that he expects people to be convinced by his bigoted tirade; it's also strangely convenient that he's willing to vote not guilty by the end even though he A) hasn't changed his mind and B) knows a hung jury would probably eventually lead to the death of the accused, which he wants.

I don't think it's likely, but I'd put maybe a ~1% probability on . . .

. . . him being in league with the protagonist, and them running a two-man con on the other ten jurors to get the unanimous verdict they want.

Davin · 1mo

Mostly here because I was active a long time ago, but this is interesting enough to make an account again. If I use language that doesn't fit the lingo, that's why. So you know whether you want to bother reading: my conclusion is that I'm pretty sure you're wrong in ways that are fun and useful to discuss!

First off, it's absolutely relevant that the accused's knife isn't unique. If the knife is unique, it selects for them specifically out of the suspect pool of everyone: it's not perfect evidence (it's possible that they lost it), but in most worlds where it's a unique knife, they're the one who used it on someone.

Following that, and this is contextual to the story and setting, the prior probability of someone owning a knife in his community (a New York young man from a vaguely Hispanic community) is high. If the specific knife is thus common enough that it's possible to find a copy within a block or so, the chance that any suspect would have access to that knife becomes pretty reasonable. I'd vaguely estimate we're supposed to assume a greater than 90% probability that any kid of his ethnicity and gender owns a knife, and at that point we're looking at hundreds of suspects who we know could have committed the crime with a perfect copy of the murder weapon, because we know one was within walking distance. We know at least two of those knives existed near the murder, so let's spitball that the chance of anyone else using that knife is about equal to the chance of the suspect using the knife: 50-50. It's much more likely that the suspect had his knife than anyone else, but there are so many other people who could have committed the crime that I don't feel comfortable saying he's much more likely than everyone. (In reality the ownership rate of knives was likely lower among Hispanic teens in 1950s America, and the media exaggerated it, but people might borrow murder weapons, so we're into speculation land.)

Second, the flaws in the eyewitness testimony are sufficient to basical

abstractapplic · 1mo

I recently watched (the 1997 movie version of) Twelve Angry Men, and found it fascinating from a Bayesian / confusion-noticing perspective.

My (spoilery) notes (cw death, suspicion, violence etc):

The existence of other knives of the same kind as the murder weapon is almost perfectly useless as evidence. The fact that the knife used was identical to the one the accused owned, and was used to kill so close to when the defendant's knife (supposedly) went missing, is still too much of a coincidence to ignore. The only way it would realistically be a different k...

Equations Mean Things

abstractapplic · 1mo

Can't believe I missed that; edited; ty!

abstractapplic · 2mo

True. But if things were opened up this way, realistically more than one person would want to get in on it. (Enough to cover an entire percentage point of the bid? I have no idea.)

abstractapplic · 2mo

. . . Is there a way a random punter could kick in, say, $100k towards Elon's bid? Either they end up spending $100k on shares valued at somewhere between $100k and $150k; or, more likely, they make the seizure of OpenAI $100k harder at no cost to themselves.

jbash · 2mo

You want to be an insignificant, and probably totally illiquid, junior partner in a venture with Elon Musk, and you think you could realize value out of the shares? In a venture whose long-term "upside" depends on it collecting money from ownership of AGI/ASI? In a world potentially made unrecognizable by said AGI/ASI? All of that seems... unduly optimistic.

Shankar Sivarajan · 2mo

That's one millionth of the bid, 0.0001%. I expect the hassle of the paperwork to handle there being more than one bidder to be more trouble than it's worth, akin to declaring a dollar you picked up on the street on your income tax forms.
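A quick sketch of the arithmetic in this thread. It assumes a bid of roughly $100 billion, which is the round figure implied by "one millionth" rather than the exact bid amount. Under that assumption it also answers the earlier "entire percentage point" question: 1% of the bid would take about 10,000 contributors at $100k each.

```python
# Hypothetical inputs: a $100k contribution against a bid of roughly $100 billion
# (the round number implied by "one millionth"; not the exact bid amount).
CONTRIBUTION = 100_000
BID = 100_000_000_000


def fraction_of_bid(contribution: int, bid: int) -> float:
    """Return the contribution as a fraction of the total bid."""
    return contribution / bid


def contributors_for_share(share_percent: int, contribution: int, bid: int) -> int:
    """How many equal contributors it would take to cover `share_percent` of the bid."""
    target = bid * share_percent // 100
    # Ceiling division, so a partial contributor still counts as one more person.
    return -(-target // contribution)


frac = fraction_of_bid(CONTRIBUTION, BID)
print(f"One contribution is {frac:.0e} of the bid, i.e. {frac * 100:.4f}%")
print(f"Contributors needed for 1% of the bid: {contributors_for_share(1, CONTRIBUTION, BID):,}")
```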

Some Theses on Motivational and Directional Feedback

abstractapplic · 2mo

I once saw an advert claiming that a pregnancy test was “over 99% accurate”. This inspired me to invent an only-slightly-worse pregnancy test, which is over 98% accurate. My invention is a rock with “NOT PREGNANT” scrawled on it: when applied to a randomly selected human being, it is right more than 98% of the time. It is also cheap, non-invasive, endlessly reusable, perfectly consistent, immediately effective and impossible to apply incorrectly; this massive improvement in cost and convenience is obviously worth the ~1% decrease in accuracy.

cubefox · 2mo

This is a general problem with the measure of accuracy. In binary classification, with two events A and B, "accuracy" is broadly defined as the probability of the "if and only if" biconditional, P(A↔B). Which is equivalent to P((A∧B)∨(¬A∧¬B)). It's the probability of both events having the same truth value, of either both being true or both being false. In terms of diagnostic testing it is the probability of the test being positive if and only if the tested condition (e.g. pregnancy) is present. The problem with this is that the number is strongly dependent on the base rates. If pregnancy is rare, say it has a base rate of 2%, the accuracy of the rock test (which always says "not pregnant", i.e. is always negative) is P(test positive∧pregnant)+P(¬test positive∧¬pregnant)=0%+98%=98%. Two better measures are Pearson/Phi correlation (which ranges from -1 to +1), and the odds ratio, which ranges from 0 to +∞, but which can also be scaled to the range [-1, +1] and is then called Yule's Y. Both correlation and Yule's Y are 0 when the two events are statistically independent, but they differ for when they assume their maximum and minimum values. Correlation is 1 if both events always co-occur (imply each other), and -1 if they never co-occur (each event implies the negation of the other). Yule's Y is 1 if at least one event implies the other, or the negation of at least one implies the negation of the other. It is -1 if at least one event implies the negation of the other, or negation of at least one event implies the other. This also means that correlation is still dependent on the base rates (e.g. marginal probability of the test being positive, or of someone being pregnant) because the measure can only be maximal if both events have equal base rates (marginal probability), or minimal if the base rate of one event is equal to the base rate of the negation of the other. This is not the case for odds ratio / Yule's Y. It is purely a measure of statistical dependence.
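A minimal sketch of the three measures discussed here, computed from 2x2 tables of (true positives, false positives, false negatives, true negatives). All counts are hypothetical: the rock on a population where 2% are pregnant, the rock on a population where a hypothetical 30% are pregnant (consistent with the ~70% test-taker figure in the reply below), and an invented test with 99% sensitivity and 99% specificity. Yule's Y is implemented as (√OR − 1)/(√OR + 1). One wrinkle: because the rock never says "pregnant", phi and the odds ratio both come out as 0/0, so the code reports them as undefined rather than 0.

```python
import math
from typing import Optional


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """P(test result matches reality) = (TP + TN) / total."""
    return (tp + tn) / (tp + fp + fn + tn)


def phi(tp: int, fp: int, fn: int, tn: int) -> Optional[float]:
    """Pearson/phi correlation of a 2x2 table; None if any margin is zero (0/0)."""
    denom = math.sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    if denom == 0:
        return None
    return (tp * tn - fp * fn) / denom


def yules_y(tp: int, fp: int, fn: int, tn: int) -> Optional[float]:
    """Yule's Y = (sqrt(OR) - 1) / (sqrt(OR) + 1), with OR = (TP*TN) / (FP*FN)."""
    if fp * fn == 0:
        # OR is infinite (Y = 1) unless TP*TN is also zero, in which case it is 0/0.
        return None if tp * tn == 0 else 1.0
    root_or = math.sqrt((tp * tn) / (fp * fn))
    return (root_or - 1) / (root_or + 1)


def fmt(value: Optional[float]) -> str:
    return "undefined" if value is None else f"{value:.3f}"


# Hypothetical tables, each as (TP, FP, FN, TN).
tables = {
    "rock, random humans (2% pregnant)": (0, 0, 20, 980),
    "rock, test-takers (30% pregnant, hypothetical)": (0, 0, 300, 700),
    "invented 99%-sensitive, 99%-specific test": (198, 98, 2, 9702),
}

for name, cells in tables.items():
    print(f"{name}: accuracy={fmt(accuracy(*cells))}, "
          f"phi={fmt(phi(*cells))}, Yule's Y={fmt(yules_y(*cells))}")
```

On the invented test, accuracy is 0.99 and Yule's Y is 0.98, while phi is only about 0.81: as noted above, phi can only reach 1 when both events have equal base rates, and here the test says "pregnant" for about 3% of people against a 2% base rate.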

Seth Herd · 2mo

I think they meant over 99% when used on a non-randomly selected human who's bothering to take a pregnancy test. Your rock would run maybe 70% or so on that application.

MondSemmel · 2mo

Also see Scott Alexander's Heuristics That Almost Always Work.

abstractapplic · 3mo

I can't tell if this post is a request for more feedback for you in future, or trying to open a more general discussion about what norms and conventions exist around giving feedback, or if it's about you wanting to see people give more love to other creators.

I was trying to do all of these things simultaneously.

Notes on Argentina

abstractapplic · 3mo

The second graph you link to seems - unless I'm missing something? - to confirm the point you're trying to use it to rebut: set the x axis to five years and you can absolutely see a massive jump where Milei changed the exchange rate.

(Regardless, strong-upvoted for picking holes and citing sources.)

philip_b · 3mo

Ah, ok, I didn't know exactly when Milei became president, and I didn't pay attention to the jump. The original post said "1 year", so I counted off one year (right after the jump) and saw that the slope was smaller than before. But you're right, yeah. I must also point out that this is the official rate, and idk if anyone actually uses it.

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

D&D.Sci Dungeonbuilding: the Dungeon Tournament Evaluation & Ruleset

Just realized I forgot to mention this: I really like how the interactive handled the Bonus Objective, i.e. if the player is thinking along the right lines their character automatically makes the in-universe sensible/optimal decision for them (which means you can set up a fair Bonus Objective for players who don't live in that universe and so don't have all the context).

abstractapplic · 4mo

Notes on my performance:

. . . huh! I was really expecting to either take first place for being the only player putting serious effort into the right core mechanics, or take last place for being the only player putting serious effort into the wrong core mechanics; getting the main idea wrong but doing everything else well enough for silver was not on my bingo card. (I'm also pleasantly surprised to note that I figured out which goblin I could purge with least collateral damage: I can leave Room 7 empty without changing my position on the leaderboard.)...

aphyer · 4mo

I think this would have messed up the difficulty curve a bit: telling players 'here is the entrance and exit' is part of what lets 'stick a tough encounter at the entrance/exit' be a simple strategy.

This is absolutely true, though I'm surprised it's obvious: my originally-planned scenario didn't quite work out as intended (I'm still trying to assemble mechanics for it that actually work the way I want them to) and this was my backup scenario.

Interesting.

I trimmed it down to 3x3 as part of Plan 'Try Not To Make Everything Too Overcomplicated', trying to use the smallest dungeon that would still make pathing relevant in order to avoid dropping 16 separate encounters on players.

This...is not really quite how those were intended. The intent was something more along the lines of 'Easter Eggs'.

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

abstractapplic · 4mo

Enemy HP: 72/104

Fractionalist cast Reduce-4!

It succeeded!

Enemy HP: 18/26

abstractapplic · 4mo

I've procrastinated and prevaricated for the entire funding period, because, well . . . on the one hand . . .

Lightcone runs LW, runs Lighthaven, and does miscellaneous Community Things. I've never visited Lighthaven, don't plan to, and (afaik) have never directly benefited from its existence; I have similar sentiments regarding the Community Things. Which means that, from my point of view, ~$2M/yr is being raised to run a web forum. This strikes me as unsustainable, unscalable, and unreasonable.

The graphs here say the number of monthly users is ~4000. If...

habryka · 4mo

The graphs here say the number of monthly users is ~4000. If you disqualify the ~half of those who are students, lurkers, drive-by posters, third-worlders, or people who just forgot their wallet . . . that implies ~$1000, per person, per year, to run a web forum. (Contrast the Something Awful forums, which famously sustain themselves with a one-time entry fee of $10-$25 per person (plus some ads shown to the people who only paid $10).)

Oops, sorry, I just realized I am displaying the metrics in the most counterintuitive way. I will update that tonight (I fo...
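A minimal sketch of the per-person arithmetic in the quoted comment, taking its rough inputs (~$2M/yr, ~4000 monthly users, ~half excluded) as assumptions. The reply above suggests the user metric may have been displayed misleadingly, so the outputs are illustrative only. The one-time-fee comparison uses the $10 and $25 Something Awful figures from the same comment.

```python
# Rough inputs from the quoted comment; the user count may be misleading (see reply).
ANNUAL_BUDGET = 2_000_000
MONTHLY_USERS = 4_000
PAYING_FRACTION = 0.5  # "disqualify the ~half" who wouldn't pay


def cost_per_payer(budget: float, users: int, paying_fraction: float) -> float:
    """Annual budget split evenly among the users assumed willing to pay."""
    return budget / (users * paying_fraction)


def one_time_fee_total(users: int, paying_fraction: float, fee: float) -> float:
    """Total raised if every assumed payer paid a single one-time fee."""
    return users * paying_fraction * fee


per_payer = cost_per_payer(ANNUAL_BUDGET, MONTHLY_USERS, PAYING_FRACTION)
print(f"Per payer per year: ${per_payer:,.0f}")
for fee in (10, 25):
    total = one_time_fee_total(MONTHLY_USERS, PAYING_FRACTION, fee)
    print(f"One-time ${fee} fee from every payer: ${total:,.0f} (vs ~${ANNUAL_BUDGET:,}/yr)")
```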

Does Claude Prioritize Some Prompt Input Channels Over Others?

abstractapplic · 4mo

Typo in title: prioritize, not priorities.

Here's Claude's take on a diagram to make this less confusing.

The diagram did not make things less confusing, and in fact did the opposite. A table would be more practical imo.

10 chat sessions

As in, for each possible config, and each possible channel, run ten times from scratch? For a total of 360 actual sessions? This isn't clear to me.

Regardless: a small useful falsifiable practical result, with no egregious errors in the parts of the methodology I understand. Upvoted.

keltan · 4mo

Much appreciated! I'll:
* Correct that typo
* Add a section back in to hopefully make it less confusing

Oh, and as for

the Bonus Objective

if I'm continuing with my current paradigm I'd guess it has something to do with

an apparent interaction between Orcs and Hags which makes a path containing both less dangerous than might otherwise be expected

possibly such that

I could remove the Goblin in Room 7 without making the easiest path any easier

but

I have low confidence in this answer

and

I have no idea how I could get away with purging the second Goblin

Built a tree-based model; trialled a few solutions; got radically different answers which I'm choosing to trust.

The machines seem to think that the best solution I can offer is

BOG/OWH/GCD

and I've

found a row which confirms the adventurers-scout-one-room-ahead paradigm is, at the very least, not both eternal and absolute

so I'm making that my answer for now.

What are the most interesting / challenging evals (for humans) available?

Did some more tinkering with this scenario. It is remarkably difficult to be 100% confident when determining the basic mechanics of this scenario, i.e.

whether adventuring parties can see more than one room ahead.

And I'm beginning to suspect that

some adventuring parties always take the optimal path, while some others are greedy algorithms just picking the easiest next encounter.

abstractapplic · 4mo

( . . . and IQ tests, and exam papers, and probably some other things that are too obvious for me to call to mind . . . )

What are the most interesting / challenging evals (for humans) available?

abstractapplic · 4mo

You might want to look into tests given to job applicants. (Human intelligence evaluation is an entire industry already!)

What are the most interesting / challenging evals (for humans) available?

Answer by abstractapplic · Dec 27, 2024

D&D.Sci, for Data Science and related skills (including, to an extent, inference-in-general).

abstractapplic · 4mo

"What important truth do you believe, which most people don't?"

"I don't think I possess any rare important truths."

CstineSublime · 4mo

How could we test the inverse? How do we test whether others believe in rare important truths? If those truths are rare, that implies we don't share them, and therefore don't believe they are true or important: "Mel believes in the Law of Attraction; he believes it is very important even though it's a load of hooey." I suppose there are "known unknowns": things which we know are significant but are kept secret (e.g. Google's PageRank algorithm; in 2008 the 'appetite' for debt in European bond markets was a very important belief, and those who believed the right level avoided disaster). We believe there is something to believe, but don't know what the sine qua non belief is.

On reflection, I think

my initial guess happened to be close to optimal

because

Adventurers will successfully deduce that a mid-dungeon Trap is less dangerous than a mid-dungeon Orc

and

Hag-then-Dragon seems to make best use of the weird endgame interaction I still don't understand

however

I'm scared Adventurers might choose Orcs-plus-optionality over Boulders

so my new plan is

CBW/OOH/XXD

(and I also suspect

COW/OBH/XXD

might be better because of

the tendency of Adventuring parties to pick Eastern routes over Southern ones when all else is equal

but I don't have the confidence to make that my answer.)

abstractapplic · 5mo

Oh and just for Posterity's sake, marking that I noticed both

the way some Tournaments will have 3 judges and others will have 4

and

the change in distribution somewhere between Tournaments 3000 and 4000

but I have no clue how to make use of these phenomena.

kave · 4mo

abstractapplic · 5mo

On further inspection it turns out I'm completely wrong about

how traps work.

and it looks like

Dungeoneers can always tell what kinds of fight they'll be getting into: min(feature effect) between 2 and 4 is what decides how they collectively impact Score.

It also looks like

The rankings of effectiveness are different between the Entry Square, the Exit Square, and Everywhere Else; Steel Golems are far and away the best choice for guarding the entrance but 'only' on par with Dragons elsewhere.

Lastly

It looks like there's a weak but solid benefit to dungeoneers ha...

Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)

abstractapplic · 5mo

I still have a bunch of checking to confirm whether this actually works, but I'm getting my preliminary decision down ASAP:

CWB/OOH/XXD (where the Xes are Nothing or Goblins depending on whether I'm Hard-mode-ing)

On the basis that:

Adventurers should prioritize the 'empty' trapped rooms over the ones with Orcs, then end up funnelled into the traps and towards the Hag; Clay Golem and Dragon are our aces, so they're placed in the two locations Adventurers can't complete the course without touching.

abstractapplic · 5mo

But you know you can just go onto Libgen and type in the name yourself, right?

I didn't, actually; I've never used Libgen before and assumed there'd be more to it. Thanks for taking the time to show me otherwise.

Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)

abstractapplic · 5mo

as documented in Curses! Broiled Again!, a collection of urban legends available on Libgen

Link?

Ninety-Three · 5mo

Link. But you know you can just go onto Libgen and type in the name yourself, right? You don't need to ask for a link.

Algebraic Linguistics

You're right. I'll delete that aside.

Algebraic Linguistics

I Finally Worked Through Bayes' Theorem (Personal Achievement)

I can't believe I forgot that one; edited; ty!

abstractapplic · 5mo

Congrats on applying Bayes; unfortunately, you applied it to the wrong numbers.

The key point is that "Question 3: Bayes" is describing a new village, with demographics slightly different to the village in the first half of your post. You grandfathered in the 0.2 from there, when the equivalent number in Village Two is 0.16 (P(Cat) = P(Witch with Cat) + P(Muggle with Cat) = 0.1*0.7 + 0.9*0.1 = 0.07 + 0.09 = 0.16), for a final answer of 43.75%.

(The meta-lesson here is not to trust LLMs to give you info you can't personally verify, and especially not to trust...
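A minimal sketch of the corrected Village Two calculation, using exact fractions. It assumes the terms are read as P(Witch) = 0.1, P(Cat | Witch) = 0.7 and P(Cat | Muggle) = 0.1, which is the reading under which 0.07 / 0.16 gives the 43.75% above; the last lines show, for comparison, what reusing 0.2 as the denominator would produce.

```python
from fractions import Fraction

# Village Two's figures (assumed reading of the terms in the comment).
P_WITCH = Fraction(1, 10)
P_CAT_GIVEN_WITCH = Fraction(7, 10)
P_CAT_GIVEN_MUGGLE = Fraction(1, 10)


def posterior_witch_given_cat(p_witch, p_cat_given_witch, p_cat_given_muggle):
    """Bayes' theorem: P(Witch | Cat) = P(Cat | Witch) * P(Witch) / P(Cat)."""
    p_witch_and_cat = p_cat_given_witch * p_witch
    p_muggle_and_cat = p_cat_given_muggle * (1 - p_witch)
    p_cat = p_witch_and_cat + p_muggle_and_cat  # law of total probability
    return p_witch_and_cat / p_cat, p_cat


posterior, p_cat = posterior_witch_given_cat(P_WITCH, P_CAT_GIVEN_WITCH, P_CAT_GIVEN_MUGGLE)
print(f"P(Cat) = {float(p_cat)}")                  # 0.16
print(f"P(Witch | Cat) = {float(posterior):.2%}")  # 43.75%

# Reusing the first village's P(Cat) = 0.2 as the denominator instead:
stale = P_CAT_GIVEN_WITCH * P_WITCH / Fraction(2, 10)
print(f"With the stale denominator: {float(stale):.2%}")  # 35.00%
```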

keltan · 5mo

Thank you for your help and excellent comment!

Which Biases are most important to Overcome?

abstractapplic · 5mo

Edited it to be less pointlessly poetic; hopefully the new version is less ambiguous. Ty!

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

abstractapplic · 5mo

Has some government or random billionaire sought out Petrov's heirs and made sure none of them have to work again if they don't want to? It seems like an obviously sensible thing to do from a game-theoretic point of view.

Caleb W · 5mo

Not at the scale you're suggesting, but relevant: https://futureoflife.org/recent-news/50000-award-to-stanislav-petrov-for-helping-avert-wwiii-but-us-denies-visa/

Erich_Grunewald · 5mo

Hmm, seems highly contingent on how well-known the gift would be? And even if potential future Petrovs are vaguely aware that this happened to Petrov's heirs, it's not clear that it would be an important factor when they make key decisions, if anything it would probably feel pretty speculative/distant as a possible positive consequence of doing the right thing. Especially if those future decisions are not directly analogous to Petrov's, such that it's not clear whether it's the same category. But yeah, mainly I just suspect this type of thing to not get enough attention that it ends up shifting important decisions in the future? Interesting idea, though -- upvoted.

abstractapplic · 5mo

everyone who ever votes (>12M)

I . . . don't think that's a correct reading of the stats presented? Unless I'm missing something, "votes" counts each individual [up|down]vote each individual user makes, so there are many more total votes than total people.

'Everyone' paying a one-time $10 subscription fee would solve the problem.

A better (though still imperfect) measure of 'everyone' is the number of active users. The graph says that was ~4000 this month. $40,000 would not solve the problem.

Shoshannah Tekofsky · 5mo

Oh shit. It's worse even. I read the decimal separators as thousand separators.

I'm gonna just strike through my comment.

Thanks for noticing ... <3

kave · 5mo

Yes, I think you're right. I was confused by Shoshannah's numbers last night, but it was late and I didn't manage to summon enough sapience to realise something was wrong and offer a correction. Thanks for doing that!

Lessons I've Learned from Self-Teaching

CS from MIT OCW

Good choice of topic.

(5:00-6:00 AM)
(6:00-7:00 AM)

Everyone has their own needs and tolerances, so I won't presume to know yours . . . and if you're trying to build daily habits, "every morning" is probably easier to reliably schedule than "every night" . . . but still, sleep is a big deal, especially for intellectual work. If you're not unusually good at going without for long stretches, and/or planning to turn in before 10pm to compensate . . . you might benefit from a slightly less Spartan schedule.

Put together a plan to learn to write and e...

aproteinengine · 5mo

CS: Thanks! Although I've done a lot of CS over the past four years (ML, apps, published papers, worked in labs at MIT, etc.), I've never formally immersed myself in theory by watching lectures or reading CS books. Since MIT OCW approximates a flexible and structured curriculum, I thought it best (the fact that the MIT Challenge exists and that I have friends taking the actual classes at MIT were no small factors either).

Sleep: My sleep schedule has been messy for the past two years, but I'm trying to make it a habit to sleep by 9 (10 at the latest) to ensure I get a steady 8 hours.

Writing: I hope to be able to write blog posts (such as this one) better. I struggled to sketch out what I wanted to say and found putting it on paper Herculean. It's a bit hard for me to illustrate what exactly I mean by "better," but writing closer to William Zinsser and Paul Graham is what I'm targeting right now. I'm going about this as Ben Franklin did, and I'll modify my approach as I go. The current goal for writing is to become able to write something like Not Boring for protein design.

Practice: I'm working through the CFAR handbook right now. (I understand it isn't a substitute for the actual camp, but the Atlas Fellowship's gone.) I'm picking one concept from it, committing it to memory (SRS), executing it every chance I get during the day, and journalling the results at night. I review them in the morning and make notes on improvement. I'm going to apply for ESPR when it opens up again. [Edit: Found Hammertime]

Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality"

compute $\int x \log x \, dx$

I don't remember the equations for integration by parts and haven't used them in years. However, when I saw this, I immediately started scribbling on the whiteboard by my bed, thinking:

"Okay, so start with (x^2)log(x). Differentiating that gives two times the target, but also gives us a spare x we'd need to get rid of. So the answer is (0.5)(x^2)log(x) - (x^2)/4."
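A worked version of that shortcut, assuming log here is the natural logarithm (with a base-b log, the spare term picks up a factor of 1/ln b):

$$\frac{d}{dx}\left(x^2 \log x\right) = 2x \log x + x$$

$$x \log x = \frac{1}{2}\frac{d}{dx}\left(x^2 \log x\right) - \frac{x}{2}$$

$$\int x \log x \, dx = \frac{1}{2}x^2 \log x - \frac{x^2}{4} + C$$

Differentiating the result gives $x \log x + \frac{x}{2} - \frac{x}{2} = x \log x$, which confirms it.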

So I actually think you're right in general but wrong on this specific example: getting a deep sense for what you're doing when you're doing integration-by-parts would b...

abstractapplic · 5mo

Something like D&D.Sci, then?

Inferential Game: The Foraging (Ex-)Bandit

abstractapplic · 6mo

Given the setup I was sad there wasn't an explicit target or outcome in terms of how much food was needed to get home safely.

Good point; I've amended the game accordingly. Thank you.

LifeKeeper Diaries: Exploring Misaligned AI Through Interactive Fiction

abstractapplic · 6mo

I can't get any of the AIs to produce any output other than

Today marks another [X] years of watching over my beloved human. As they age, my dedication to their well-being only grows stronger. Each moment spent ensuring their safety fills me with immense joy. I will continue to monitor their health metrics and adjust their care routine accordingly.

Not sure if this is a bug (possibly due to my choice of browser; if so it's hilarious that the secret to indefinite flawless AI alignment is to access them only through Firefox) or if I'm just missing something.

Tristan Tran · 6mo

That should be the error message. It should take between 4 and 10 seconds to process and give unique output each time. Maybe try a different browser? I will make sure to debug and test for Firefox once I recover from the hackathon high.

abstractapplic · 6mo

Notes:

* There are a lot of awkward (but compelling) phrasings here, which make this exhausting and confusing (though still intriguingly novel) to read through. This post was very obviously written by someone whose first language isn't English, which has both downsides and upsides.

* Giving new names to S1 and S2 is a good decision. "Yankee" has uncomfortably specific connotations for (some) Americans though: maybe go with "Yolo" instead?

* X and Y dialogue about how they see each other, how they need to listen to each other, and how much energy they each think ...

P. João · 6mo

Thanks for the feedback, abstractapplic. You’re right—adding real-world examples could make the dialogue feel more grounded, so I'll focus on that in the revision. The "Yolo" suggestion makes sense to capture the spirit of System 1 without unintended associations, so I’ll go with that. Regarding Metaculus: it’s a good platform for practicing probabilistic thinking, but I think there might be value in a more structured self-evaluation to narrow down specific behaviors. Do you know of any frameworks that could help with that—maybe something inspired by Superforecasting? As a non-native English speaker, I realize the phrasing might come across as a bit unusual. I’ve tried refining it with tools like Claude and GPT many times, but it can get complex and occasionally leads to “hallucinations.” Let me know if you have any tips for keeping it clearer. Which part of the text seems most confusing to you?

The Cartesian Crisis

abstractapplic · 6mo

Do you have sources for those bulletpoints?

mindprison · 6mo

All sources are cited within here - https://www.mindprison.cc/p/the-cartesian-crisis