All of gwillen's Comments + Replies

gwillen21

Ultrapersonal Healthcare appears to have forgotten to pay Squarespace to renew their website, which doesn't seem like a great sign.

gwillen84

I think this makes sense as a reminder of a thing that is true anyway, as you somewhat already said; but also consider situations like:

  • A given reviewer was only reviewing for substance, and the error is stylistic, or vice versa;
  • A given reviewer was only reviewing for a subset of the subject matter;
  • A given reviewer was reviewing an early draft, and an error was introduced in a later draft.

In general a given reviewer will not necessarily have a real opportunity to catch any particular error, and usually a reader won't have enough context to determine w... (read more)

gwillen2512

Whether or not to get insurance should have nothing to do with what makes one sleep – again, it is a mathematical decision with a correct answer.

I'm not sure how far in your cheek your tongue was, but I claim this is obviously wrong and I can elaborate if you weren't kidding.

I agree with you, and I think the introduction unfortunately does major damage to what is otherwise a very interesting and valuable article about the mathematics of insurance. I can't recommend this article to anybody, because the introduction comes right out and says: "The thi... (read more)

gwillen20

Have you been testing serum (or urine) iodine, as well as thyroid numbers? If so, I'm curious what those numbers have been doing. (In fact, I would love to see the whole time course of treatments and relevant blood tests if you'd be willing to share, just to help develop my intuition for mysterious biological processes.) Do you expect to have to continue or resume gargling PVP-I in the future, or otherwise somehow keep getting more iodine into your body than it seems to want to absorb (perhaps through some other formulation that's neither a pill nor a gargle?)

Thanks for posting about this!

3Elizabeth
IIRC my serum iodine after 6 months of gargling and basically-cured hypothyroidism were within a 1% of pre-gargling levels.  After my last test but before getting the results I started forgetting to gargle, and was resistant to taking my medication in the morning. The test revealed this was correct- I didn't need meds anymore. I've used iodine a bit to treat infections since then but now that I know water is about as good, I will stick to that unless I start craving iodine again or a test reveals my levels have slipped.
gwillen20

This paper seems like an interesting counterpoint: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5421578/

Estimates of Ethanol Exposure in Children from Food not Labeled as Alcohol-Containing

They find that:

... orange, apple and grape juice contain substantial amounts of ethanol (up to 0.77 g/L).

... certain packed bakery products such as burger rolls or sweet milk rolls contained more than 1.2 g ethanol [per] 100 g.

... We designed a scenario for average ethanol exposure by a 6-year-old child. ... An average daily exposure of 10.3 mg ethanol [per] kg body weig

... (read more)
Answer by gwillen51

One possible factor I don't see mentioned so far: A structural bias for action over inaction. If the current design happened to be perfect, the chance of making it worse soon would be nearly 100%, because they will inevitably change something.

This is complementary to "mean reversion" as an explanation -- that explains why changes make things worse, whereas bias-towards-action explains why they can't resist making changes despite this. This may be due to the drive for promotions and good performance reviews; it's hard to reward employees correctly for their... (read more)

If a car is trying to yield to me, and I want to force it to go first, I turn my back so that the driver can see that I'm not watching their gestures. If that's not enough I will start to walk the other way, as though I've changed my mind / was never actually planning to cross.

I'll generally do this if the car has the right-of-way (and is yielding wrongly), or if the car is creating a hazard or problem for other drivers by waiting for me (e.g. sticking out from a driveway into the road), or if I can't tell whether the space beyond the yielding car is safe ... (read more)

3mikbp
I live in Germany and I do something similar... but it has to be always. If you are close to a zebra crossing most cars will stop to let you cross even if you haven't made any intent to cross, so you have to do all kinds of theatre to make it clear that you are not going to cross (in that moment). But the other day I understood why they do it (I almost never drive). I was driving approaching a zebra crossing an a guy who was walking in the same direction but through the sidewalk just turned 90º and continued walking when he reached the zebra crossing. He didn't signal the turn at all and didn't even look before crossing. He even stared at me annoyed that I did not stop before. It was like, "dude read my mind, I was going to turn all along". This system is so inefficient and stupid. The best moments are when people do not realise they are close to a zebra crossing (or they don't give a damn) and cars approaching stop to let them cross. I've seen someone making several cars stop because they were just waiting for something in front of a zebra crossing and the traffic was low enough so that one driver would not see the previous car stopping for nothing.
5jefftk
Definitely +1 on turning your back to indicate you don't intend to cross (right now). It's a big clear signal, and I've also found it working well in practice.

You are wrong! Ethanol is mixed into all modern gas, and is hygroscopic -- it absorbs water from the air. This is one of the things fuel stabilizer is supposed to prevent.

Given that Jeff did use fuel stabilizer, and the amount of water was much more that I would expect, it feels to me like water must have leaked into the gas can somehow from the outside instead? But I don't know.

I agree with Jeff that if someone wanted to steal the gas they would just steal the can. There's no conceivable reason to replace some of the gas with water.

4Boris Kashirin
Read a bit about interaction between gas ethanol and water, fascinating!

I think you are not wrong to be concerned, but I also agree that this is all widely known to the public. I am personally more concerned that we might want to keep this sort of discussion out of the training set of future models; I think that fight is potentially still winnable, if we decide it has value.

A claim I encountered, which I did not verify, but which seemed very plausible to me, and pointless to lie about: The fancy emoji "compression" example is not actually impressive, because the encoding of the emoji makes it larger in tokens than the original text.

Here's the prompt I've been using to make GPT-4 much more succinct. Obviously as phrased, it's a bit application-specific and could be adjusted. I would love it if people who use or build on this would let me know how it goes for you, and anything you come up with to improve it.

You are CodeGPT, a smart and reliable AI programming helper. Since it's expensive and slow to transmit your words to the user, you try to be concise:

- You don't repeat things you just said in a recent message.
- You only include necessary context in code snippets, and omit or abb
... (read more)

It's extremely important in discussions like this to be sure of what model you're talking to. Last I heard, Bing in the default "balanced" mode had been switched to GPT-3.5, presumably as a cost saving measure.

1AVoropaev
That would explain a lot. I've heard this rumor, but when I tried to trace the source, i haven't found anything better than guesses. So I dismissed it, but maybe I shouldn't have. Do you have a better source?

As a person who is, myself, extremely uncertain about doom -- I would say that doom-certain voices are disproportionately outspoken compared to uncertain ones, and uncertain ones are in turn outspoken relative to voices generally skeptical of doom. That doesn't seem too surprising to me, since (1) the founder of the site, and the movement, is an outspoken voice who believes in high P(doom); and (2) the risks are asymmetrical (much better to prepare for doom and not need it, than to need preparation for doom and not have it.)

gwillen1611

The metaphor originated here:

https://twitter.com/ESYudkowsky/status/1636315864596385792

(He was quoting, with permission, an off-the-cuff remark I had made in a private chat. I didn't expect it to take off the way it did!)

https://github.com/gwern/gwern.net/pull/6

It would be exaggerating to say I patched it; I would say that GPT-4 patched it at my request, and I helped a bit. (I've been doing a lot of that in the past ~week.)

The better models do require using the chat endpoint instead of the completion endpoint. They are also, as you might infer, much more strongly RL trained for instruction following and the chat format specifically.

I definitely think it's worth the effort to try upgrading to gpt-3.5-turbo, and I would say even gpt-4, but the cost is significantly higher for the latter. (I think 3.5 is actually cheaper than davinci.)

If you're using the library you need to switch from Completion to ChatCompletion, and the API is slightly different -- I'm happy to provide sampl... (read more)

4gwern
Yeah, I will at some point, but frontend work with Said always comes first. If you want to patch it yourself, I'd definitely try it.

Have you considered switching to GPT-3.5 or -4? You can get much better results out of much less prompt engineering. GPT-4 is expensive but it's worth it.

8gwern
It's currently at -003 and not the new ChatGPT 3.5 endpoint because when I dropped in the chat model name, the code errored out - apparently it's under a chat/ path and so the installed OA Py library errors out. I haven't bothered to debug it any further (do I need to specify the engine name as chat/turbo-gpt-3 or do I need to upgrade the library to some new version or what). I haven't even tried GPT-4 - I have the API access, just been too fashed and busy with other site stuff. (Technical-wise, we've been doing a lot of Gwern.net refactoring and cleanup and belated documentation - I've written like 10k words the past month or two just explaining the link icon history, redirect & link archiving system, and the many popup system iterations and what we've learned.)

Oh, I recognize that last document -- it's a userpage from the bitcoin-otc web of trust. See: https://bitcoin-otc.com/viewratings.php

I expect you'll also find petertodd in there. (You might find me in there as well -- now I'm curious!)

EDIT: According to https://platform.openai.com/tokenizer I don't have a token of my own. Sad. :-(

5gwern
Yes, this is a plausible source for 'gmaxwell' (and much more plausible than his two suggestions). Still leaves "PeterTodd" (camelcase) a mystery, however: Todd was an OTC user but not a very active one, and as "petertodd" (all-lowercase), apparently.

If that is true, and the marginal car does not much change the traffic situation, why isn’t there boundless demand for the road with slightly worse traffic, increasing congestion now?

Other people have gestured towards explanations that involve changing the timing or length of trips, but let me make an analogy that I think makes sense, but abstracts those things away.

When current is going through a diode, the marginal increment of current changes the voltage so little that we model it as constant-voltage for many purposes. Despite that, the change must b... (read more)

Yesss, this is an awesome development. I would happily sling some money at this project if it would help.

9tailcalled
Nice, thanks! I'll make sure to contact you if I find someone we can sling money at and my budget starts running dry.
gwillen3-1

This makes sense, but my instinctive response is to point out that humans are only approximate reasoners (sometimes very approximate). So I think there can still be a meaningful conceptual difference between common knowledge and shared knowledge, even if you can prove that every inference of true common knowledge is technically invalid. That doesn't mean we're not still in some sense making them. .... And if everybody is doing the same thing, kind of paradoxically, it seems like we sometimes can correctly conclude we have common knowledge, even though this... (read more)

4abramdemski
I agree that your theory could be understood as a less explicit version of p-common knowledge.  EG:  One reasonable steelman of the commonsense use of "knows" is to interpret "knowledge" as "true p-belief", with "p" left unspecified, flexible to the situation. (Situations with higher risk naturally call for higher p.) We similarly interpret commonsense "certainty" as p-belief. "Very certain" is p-belief with even higher p, and so on. We then naturally interpret the theory of "common knowledge" as code for p-common knowledge, by substituting iteration of "knows" with "correctly p-believes".[1] My problem with this approach is that it leads to a "missing stair" kind of problem, as I mentioned in my response to rpglover64. For example, the literature on common knowledge says that a public event (something everyone can see, simultaneously, and see that everyone else sees) is required. As I illustrated in the OP, p-common-knowledge doesn't require anything like this; it's much easier to establish. So if everyone uses the term "common knowledge", but in-the-know people privately mean "p-common-knowledge" and interpret others as meaning this, then not-in-the-know people run the risk of thinking this thing people call 'common knowledge' is really difficult and costly to establish. And this seems compatible with what people say, so they're not so liable to notice the difference. I would also point out that your and rpglover64's interpretations were much less precise than p-common-knowledge; eg, you say things like "in some sense", "kind of paradoxically", "it seems like". So I would say: why not use accurate language, rather than be confused? Why not correct inaccurate language, rather than leaving a missing stairstep? As I mentioned in the post, I also have some doubts about whether p-common knowledge captures the real phenomenon people are actually getting at when they use "common knowledge" informally. So it also has the advantage of being falsifiable! So we might
1Rana Dexsin
More specifically: Rounding upward when multiplying allows 1−ε to be a fixed point.

I am a little concerned that this would be totally unsingable for anybody who actually knows the original well (which is maybe not many people in the scheme of things, but the Bayesian Choir out here has done the original song before.)

8jefftk
I probably should have presented this here as a cross-genre cover, and not a straight simplification, fwiw
2jefftk
I talked to people afterwards, and the only person who raised anything along these lines said something like "I could tell that you'd changed some things, but didn't have trouble singing it". I suspect we didn't actually have anyone in our group of 35 or so at the event in the category you're describing, people who really thoroughly knew the original song. My guess, however, is that if we did have someone like that they would likely also be musically talented enough that they would be able to pick the adapted version up.
Raemon110

I do think there’s an upfront skill you can gain of… just accepting multiple versions of a song as existing, which I think generalizes once you really grok it. (It probably does involve grieving , and like, not a simple thing. But I think it’s pretty valuable for opening yourself up to new positive experiences)

My feeling listening to this one was ‘yup, seems like a fine alternate variation.’ I do find some elements good and some a bit meh (I'm not sure I can articulate the differences at the moment, but, once you get over the general 'aah this is different... (read more)

7Orual
Yeah, that's my main issue, too. I know the original incredibly well, I worked out the chords on piano from scratch years ago. So while I get the motivation here I would really have trouble with the adapted version.  I natively have higher expectations in terms of congregational musical and rhythmic ability, due to where I grew up (Congo), so I always feel the need to push back when people dumb down songs for group singing. My brain expects random untrained people to be able to do melody and descant, syncopation and pick-up notes, and so on, because that's what I grew up with, though I know that's not necessarily the case here, not with this demographic. One thing that would work is to have part of the song being sung by the leader, not the whole group. That might be a workable way to incorporate the bridge back into this version of Level Up. Drop back to just low accompaniment and have the best singer do that part solo, then bring people back in. If you were to attempt something more like the original, you could also do this with the start of the song, where the beat is a lot less consistent. Have the leader start the song and then bring people in as you move toward the first chorus and the groove really kicks in. Also, I wonder if an unclear understanding of the time signature of the original (or an attempt to fit it into something more standard) is causing issues. It's pretty much all in 7, especially once the beat gets going, and a good rhythm section that can hit the accents right really makes everything quite easy to hit in proper time. There's a quick and tight 1212123 (with the occasional 1231212) through the whole song (though for the into and first bit of the first verse it's a lot more nebulous) and most of the "challenging" notes actually land on that first beat of the 7. But yeah, you'd have to have the band really work on the song to get it to a place where you could lead it well in its original form.
gwillen103

I mostly agree, but I'm particularly surprised at the results for the Hershey's 45%. That's not all that dark (i.e. children might want to eat it), and 2 oz is not all that much chocolate for a child to eat, and it looks like 2 oz would be enough to rise above the less stringent FDA limit for children.

3jefftk
Technically it wouldn't be above the FDA limit for lead for kids if they're eating 2oz, because that's measured in ppm not ug/d. But yeah, that might be the most worrying one on there.

Thanks for explaining! I feel like that call makes sense.

It seems like you could mitigate this a lot if you didn't generate the preview until you were about to render the post for the first time. Surely the vast majority of these automated previews are being rendered zero times, and saving nothing. (This arguably links the fetch to a human action, as well.)

If you didn't want to take the hit that would cause -- since it would probably mean the first view of a post didn't get a preview at all -- you could at least limit it to posts that the server theoretically might someday have a good reason to render (i.e. require that there be someone on the server following the poster before doing automated link fetching on the post.)

This whole thing shades into another space I think a lot about, which is error handling in programming languages and systems.

Some parts of the stack I described above really seem to fall under "error handling" -- what do you do if you can't reach component A from component B? Others seem to fall under "data representation" -- If you poll someone who they're voting for, and they say "I'm not voting", or "I don't know", or "fuck you", or "je ne parle pas Anglais", what do you write down on the form (and which of those cases do you want to distinguish versus merge?) But the two are closely related.

Nested layers of "options"

Here I use "option" in the sense of C++ std::optional<> / Rust Option / Haskell Maybe.

It feels to me like "real-world data" often ends up nested in thick layers of "optionality", with the number of layers limited mostly by how precisely we want to represent the state of our "un-knowledge" about it. When we get data from some source, which potentially got the data in turn from another source, and so on, there is some kind of fuzziness or uncertainty added at each step, which we may or may not need to represent.

I'm thinking a... (read more)

2gwillen
This whole thing shades into another space I think a lot about, which is error handling in programming languages and systems. Some parts of the stack I described above really seem to fall under "error handling" -- what do you do if you can't reach component A from component B? Others seem to fall under "data representation" -- If you poll someone who they're voting for, and they say "I'm not voting", or "I don't know", or "fuck you", or "je ne parle pas Anglais", what do you write down on the form (and which of those cases do you want to distinguish versus merge?) But the two are closely related.

Oh actually this is also happening for me on Edge on macos, separately from the perhaps-related Android Chrome bug I described below.

gwillen*129

Good question, just did some fiddling around. Current best theory (this is on Android Chrome):

  • Scroll the page downward so that the top bar appears.
  • Tap a link, but drag away from it, so the tooltip appears but the link doesn't activate. (Or otherwise do something that makes a tooltip appear, I think.)
  • Scroll the page upward so that the top bar disappears.
  • Tap to close the tooltip.

If this doesn't reproduce the problem 100% of the time, it seems very close. I definitely have the intuition that it's related to link clicks; I also note that it always seems... (read more)

4habryka
This seems great, thank you!

I see a maybe-related problem in Chrome for Android. It's very annoying, because on a narrow screen it's inevitably covering up something I'm trying to read.

8habryka
We've been trying to reproduce this bug for a while. Do you by any chance have any series of steps that reliably produces it?

Importance vs Seriousness of projects

(Note: I'm not sure "serious" is the right word for what I mean here. As I was writing this, I overheard a random passerby say to someone, "that's unprofessional!" Perhaps "professional" is a better word for it.)

While working on some code for my MIT Mystery Hunt team, I started thinking about sorting projects by importance (i.e how bad the consequences would be if they broke.)

The code I'm working on is kind of important, since if it breaks it will impair the team's ability to work on puzzles. But it won't totally preve... (read more)

gwillen120

Why [not] ask why?

When someone asks for help, e.g. in a place like Stack Overflow, they are often met with the response "why do you want to do that?"

People like to talk about the "XY Problem": when someone's real problem is X, but their question is about how to do Y, which is a bad way to solve X. In response, some other, snarkier people sometimes talk about the "XY Problem Problem": when someone's problem is Y, and they ask about Y, but people refuse to help them with it because they're too busy trying to figure out the (nonexistent) value of X.

The other ... (read more)

I'm surprised your kettle is only 1000W. You should be able to find a 1500W one. (The max power possible on a 15A circuit is higher, but I believe 1500W is the maximum permitted "continuous" power draw, and seems to be the typical maximum for heating appliances.)

As you say, if the circuit is shared, you may not be able to draw the max, but kitchen counter circuits are required to be separate from the rest of the house, so if you're not running other 120V kitchen appliances at the same time, you should have the full power of the circuit.

It seems like you misunderstood something here: the "virus with 100% lethality in mice" was the original wild-type ("Wuhan") sars-cov-2 virus. It was the mice that were engineered for their susceptibility to it. That's why the 80% headline number is meaningless and alarmist to report in isolation: The new strain is 80% fatal in mice which were genetically engineered to be susceptible to original-flavor COVID, which is 100% fatal to them.

2ChristianKl
Are you somehow asserting that the original sars-cov-2 virus is not in the reference class of "very dangerous virus" and that I fail to understand that the original sars-cov-2 is not dangerous? I just responded to Zac's assertion that we should look at the fact that it reduces lethality from 100% to 80%. He made the argument that this commission is a problem and I said why it doesn't. You can also make an argument that it's important to talk about the fact that we are talking about the original strain but that's not an argument that Zac made. But let's think about whether "it was the original COVID strain" should make us think that this wasn't a risky idea and see what their paper has to say about that. From the actual paper: The virus they created is more lethal than Omicron (in the mouse model) and more potentially more transmissible than the original COVID strain.  The mouse model in question is described in The K18-Human ACE2 Transgenic Mouse Model Recapitulates Non-severe and Severe COVID-19 in Response to an Infectious Dose of the SARS-CoV-2 Virus: Given that those mice are more like humans than normal mice, I do think that a virus that's more lethal to them than Omicron also has a good chance of being more harmful to humans than Omicron.  While I think that you could excuse researchers who produce a modified virus with lower lethality and lower transmissible than an already existing virus, the idea that you don't want to do a virus that's either more lethal or more transmissible than existing viruses is a bad idea. Especially, the idea that not do those experiments under the best conditions for safety that's possible which would be biosafety level 4.

I feel that the Robin Hanson tweet demands a reply, with what I thought was a classic LW-ism: "Humans aren't agents!"

But I can't actually find the post it comes from, and I think I actually got it from Eneasz Brodski's "Shit Rationalists Say" video. (https://youtu.be/jlT3MeCzVao)

Does anybody know where it originated? (And what Robin thinks of the idea?)

This didn't get attached to the "Apollo Almanac" sequence right (unless I just got here too early, and you're about to do that.)

8Jarred Filmer
Ha, thank you! It slipped my mind, I've just added it :)

Or the newer version, "one weird trick", where the purpose of the negative-sounding adjective "weird" is to explain why you haven't heard the trick before, if it's so great.

gwillen1111

Tragically I gave up on the Plate Tectonics study before answering my most important question: “Is Alfred Wegener the Balto of plate tectonics?”

Let me back up.

Tangential to the main point, but I love your opening.

I also suppose that it's possible for those without the context to enjoy the dialogue of the high context parts, even if they don't quite understand it.

That's pretty much where I'm at on it. Although, I have played enough poker that I know all the vocabulary, just not any strategy -- I know what the button is but I don't remember how its location affects strategy, I don't know what a highjack is, but I know the words "flush", "offsuit", "big blind", "preflop", "rainbow" (had to think about it), "fold", etc. etc.

But it's maybe telling that I have played ... (read more)

One thing to keep in mind: If you sample by interview rather than by candidate -- which is how an interviewer sees the world -- the worst candidates will be massively overrepresented, because they have to do way more interviews to get a job (and then again when they fail to keep it.)

(This isn't an original insight -- it was pointed out to me by an essay, probably by Joel Spolsky or one of the similar bloggers of his era.)

(EDIT: found it. https://www.joelonsoftware.com/2005/01/27/news-58/ )

4DirectedEvolution
That's an interesting insight, thanks!

"Butterfly idea" is real (there was a post proposing and explaining it as terminology; perhaps someone else can link it.)

"Gesture at something" is definitely real, I use it myself.

"Do a babble" is new to me but I'd bet on it being real also.

Oh, surprising to me that it didn't. Hopefully you can get that sorted out.

You might make this a linkpost that links to your blog, unless there's some downside of doing that.

4Elizabeth
oh thank you, I was under the impression the auto cross-posting handled that.

Actually, I think that post is probably what triggered me to write this originally, and I forgot that by the time I wrote it (or I would have added a link.) Thanks for the reminder!

Strongly agree about the existence of the problem. It's something I've put a bit of thought into.

One thing I think could help, in some cases, would be to split the market definition into

  1. the question definition, and
  2. the resolution method

And then specify the relationship between them. For example:

Question: How many reported covid cases will there be in the US on [DATE]?

Resolution method: Look at https://covid.cdc.gov/covid-data-tracker/ a week after [DATE] for the reported values for [DATE].

Resolution notes: "Whatever values are reported that day will be... (read more)

Answer by gwillen80

I used a P100 elastomeric respirator pretty much any time I left the house, for multiple months in 2020 during early COVID, and intermittently after that.

The main downside, for me personally, was that people generally found understanding my speech through it difficult or impossible. This was a big enough problem that I haven't used one in quite some time.

I think the way this all works is a lot more subtle than I've been imagining, and probably some of the stuff in the original shortform about orientation is wrong.

3D Printer foibles

I got a 3d printer last year, and I've been using it on and off. I want to document some of the stuff I've learned in the process. I'll start with just an outline for now, and see if people are interested (or I feel inspired) for more specifics.

The specific printer is a Monoprice Voxel, which is a rebadged / whitelabel Flashforge Adventurer 3.

  • Had I known it was a whitelabel I would have instead bought the original version. I don't know if that one has the same firmware bugs, but there's at least one missing feature in the Monoprice fi

... (read more)
gwillen122

I wish I had a stronger strong upvote I could give this post. I was already nodding my head by the time I was done with the introduction, and then almost every subsequent section gave me something to be excited about. I will try to say some more substantive things later, but I wanted to say this first because I often don't get around to commenting.

Up to Guidepost 3, I'm familiar with this approach, sort of independently invented it, and use it with moderate success sometimes.

The guideposts past that, I ~never have remembered experience of. Guidepost 5/6 very occasionally, but if I remember experiencing them, it's probably because I came back to full wakefulness while it was happening. Typically by that point I'm already close enough to count as "starting to sleep". (And I'm counting "experience of getting immersed in nonsensical logic" as guidepost 6; it's never accompanied by imagery past what you describe as guidepost 5.)

(It may be relevant that I have ~aphantasia, and experience minimal to no visual imagery in any context.)

Load More