I think this makes sense as a reminder of a thing that is true anyway, as you somewhat already said; but also consider situations like:
In general a given reviewer will not necessarily have a real opportunity to catch any particular error, and usually a reader won't have enough context to determine w...
Whether or not to get insurance should have nothing to do with what makes one sleep – again, it is a mathematical decision with a correct answer.
I'm not sure how far in your cheek your tongue was, but I claim this is obviously wrong and I can elaborate if you weren't kidding.
I agree with you, and I think the introduction unfortunately does major damage to what is otherwise a very interesting and valuable article about the mathematics of insurance. I can't recommend this article to anybody, because the introduction comes right out and says: "The thi...
Have you been testing serum (or urine) iodine, as well as thyroid numbers? If so, I'm curious what those numbers have been doing. (In fact, I would love to see the whole time course of treatments and relevant blood tests if you'd be willing to share, just to help develop my intuition for mysterious biological processes.) Do you expect to have to continue or resume gargling PVP-I in the future, or otherwise somehow keep getting more iodine into your body than it seems to want to absorb (perhaps through some other formulation that's neither a pill nor a gargle?)
Thanks for posting about this!
This paper seems like an interesting counterpoint: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5421578/
Estimates of Ethanol Exposure in Children from Food not Labeled as Alcohol-Containing
They find that:
...... orange, apple and grape juice contain substantial amounts of ethanol (up to 0.77 g/L).
... certain packed bakery products such as burger rolls or sweet milk rolls contained more than 1.2 g ethanol [per] 100 g.
... We designed a scenario for average ethanol exposure by a 6-year-old child. ... An average daily exposure of 10.3 mg ethanol [per] kg body weig
One possible factor I don't see mentioned so far: A structural bias for action over inaction. If the current design happened to be perfect, the chance of making it worse soon would be nearly 100%, because they will inevitably change something.
This is complementary to "mean reversion" as an explanation -- that explains why changes make things worse, whereas bias-towards-action explains why they can't resist making changes despite this. This may be due to the drive for promotions and good performance reviews; it's hard to reward employees correctly for their...
If a car is trying to yield to me, and I want to force it to go first, I turn my back so that the driver can see that I'm not watching their gestures. If that's not enough I will start to walk the other way, as though I've changed my mind / was never actually planning to cross.
I'll generally do this if the car has the right-of-way (and is yielding wrongly), or if the car is creating a hazard or problem for other drivers by waiting for me (e.g. sticking out from a driveway into the road), or if I can't tell whether the space beyond the yielding car is safe ...
You are wrong! Ethanol is mixed into all modern gas, and is hygroscopic -- it absorbs water from the air. This is one of the things fuel stabilizer is supposed to prevent.
Given that Jeff did use fuel stabilizer, and the amount of water was much more that I would expect, it feels to me like water must have leaked into the gas can somehow from the outside instead? But I don't know.
I agree with Jeff that if someone wanted to steal the gas they would just steal the can. There's no conceivable reason to replace some of the gas with water.
I think you are not wrong to be concerned, but I also agree that this is all widely known to the public. I am personally more concerned that we might want to keep this sort of discussion out of the training set of future models; I think that fight is potentially still winnable, if we decide it has value.
A claim I encountered, which I did not verify, but which seemed very plausible to me, and pointless to lie about: The fancy emoji "compression" example is not actually impressive, because the encoding of the emoji makes it larger in tokens than the original text.
Here's the prompt I've been using to make GPT-4 much more succinct. Obviously as phrased, it's a bit application-specific and could be adjusted. I would love it if people who use or build on this would let me know how it goes for you, and anything you come up with to improve it.
You are CodeGPT, a smart and reliable AI programming helper. Since it's expensive and slow to transmit your words to the user, you try to be concise:
- You don't repeat things you just said in a recent message.
- You only include necessary context in code snippets, and omit or abb
... It's extremely important in discussions like this to be sure of what model you're talking to. Last I heard, Bing in the default "balanced" mode had been switched to GPT-3.5, presumably as a cost saving measure.
As a person who is, myself, extremely uncertain about doom -- I would say that doom-certain voices are disproportionately outspoken compared to uncertain ones, and uncertain ones are in turn outspoken relative to voices generally skeptical of doom. That doesn't seem too surprising to me, since (1) the founder of the site, and the movement, is an outspoken voice who believes in high P(doom); and (2) the risks are asymmetrical (much better to prepare for doom and not need it, than to need preparation for doom and not have it.)
The metaphor originated here:
https://twitter.com/ESYudkowsky/status/1636315864596385792
(He was quoting, with permission, an off-the-cuff remark I had made in a private chat. I didn't expect it to take off the way it did!)
https://github.com/gwern/gwern.net/pull/6
It would be exaggerating to say I patched it; I would say that GPT-4 patched it at my request, and I helped a bit. (I've been doing a lot of that in the past ~week.)
The better models do require using the chat endpoint instead of the completion endpoint. They are also, as you might infer, much more strongly RL trained for instruction following and the chat format specifically.
I definitely think it's worth the effort to try upgrading to gpt-3.5-turbo, and I would say even gpt-4, but the cost is significantly higher for the latter. (I think 3.5 is actually cheaper than davinci.)
If you're using the library you need to switch from Completion to ChatCompletion, and the API is slightly different -- I'm happy to provide sampl...
Have you considered switching to GPT-3.5 or -4? You can get much better results out of much less prompt engineering. GPT-4 is expensive but it's worth it.
Oh, I recognize that last document -- it's a userpage from the bitcoin-otc web of trust. See: https://bitcoin-otc.com/viewratings.php
I expect you'll also find petertodd in there. (You might find me in there as well -- now I'm curious!)
EDIT: According to https://platform.openai.com/tokenizer I don't have a token of my own. Sad. :-(
If that is true, and the marginal car does not much change the traffic situation, why isn’t there boundless demand for the road with slightly worse traffic, increasing congestion now?
Other people have gestured towards explanations that involve changing the timing or length of trips, but let me make an analogy that I think makes sense, but abstracts those things away.
When current is going through a diode, the marginal increment of current changes the voltage so little that we model it as constant-voltage for many purposes. Despite that, the change must b...
Yesss, this is an awesome development. I would happily sling some money at this project if it would help.
This makes sense, but my instinctive response is to point out that humans are only approximate reasoners (sometimes very approximate). So I think there can still be a meaningful conceptual difference between common knowledge and shared knowledge, even if you can prove that every inference of true common knowledge is technically invalid. That doesn't mean we're not still in some sense making them. .... And if everybody is doing the same thing, kind of paradoxically, it seems like we sometimes can correctly conclude we have common knowledge, even though this...
I am a little concerned that this would be totally unsingable for anybody who actually knows the original well (which is maybe not many people in the scheme of things, but the Bayesian Choir out here has done the original song before.)
I do think there’s an upfront skill you can gain of… just accepting multiple versions of a song as existing, which I think generalizes once you really grok it. (It probably does involve grieving , and like, not a simple thing. But I think it’s pretty valuable for opening yourself up to new positive experiences)
My feeling listening to this one was ‘yup, seems like a fine alternate variation.’ I do find some elements good and some a bit meh (I'm not sure I can articulate the differences at the moment, but, once you get over the general 'aah this is different...
I mostly agree, but I'm particularly surprised at the results for the Hershey's 45%. That's not all that dark (i.e. children might want to eat it), and 2 oz is not all that much chocolate for a child to eat, and it looks like 2 oz would be enough to rise above the less stringent FDA limit for children.
Thanks for explaining! I feel like that call makes sense.
It seems like you could mitigate this a lot if you didn't generate the preview until you were about to render the post for the first time. Surely the vast majority of these automated previews are being rendered zero times, and saving nothing. (This arguably links the fetch to a human action, as well.)
If you didn't want to take the hit that would cause -- since it would probably mean the first view of a post didn't get a preview at all -- you could at least limit it to posts that the server theoretically might someday have a good reason to render (i.e. require that there be someone on the server following the poster before doing automated link fetching on the post.)
This whole thing shades into another space I think a lot about, which is error handling in programming languages and systems.
Some parts of the stack I described above really seem to fall under "error handling" -- what do you do if you can't reach component A from component B? Others seem to fall under "data representation" -- If you poll someone who they're voting for, and they say "I'm not voting", or "I don't know", or "fuck you", or "je ne parle pas Anglais", what do you write down on the form (and which of those cases do you want to distinguish versus merge?) But the two are closely related.
Here I use "option" in the sense of C++ std::optional<>
/ Rust Option
/ Haskell Maybe
.
It feels to me like "real-world data" often ends up nested in thick layers of "optionality", with the number of layers limited mostly by how precisely we want to represent the state of our "un-knowledge" about it. When we get data from some source, which potentially got the data in turn from another source, and so on, there is some kind of fuzziness or uncertainty added at each step, which we may or may not need to represent.
I'm thinking a...
Oh actually this is also happening for me on Edge on macos, separately from the perhaps-related Android Chrome bug I described below.
Good question, just did some fiddling around. Current best theory (this is on Android Chrome):
If this doesn't reproduce the problem 100% of the time, it seems very close. I definitely have the intuition that it's related to link clicks; I also note that it always seems...
I see a maybe-related problem in Chrome for Android. It's very annoying, because on a narrow screen it's inevitably covering up something I'm trying to read.
(Note: I'm not sure "serious" is the right word for what I mean here. As I was writing this, I overheard a random passerby say to someone, "that's unprofessional!" Perhaps "professional" is a better word for it.)
While working on some code for my MIT Mystery Hunt team, I started thinking about sorting projects by importance (i.e how bad the consequences would be if they broke.)
The code I'm working on is kind of important, since if it breaks it will impair the team's ability to work on puzzles. But it won't totally preve...
When someone asks for help, e.g. in a place like Stack Overflow, they are often met with the response "why do you want to do that?"
People like to talk about the "XY Problem": when someone's real problem is X, but their question is about how to do Y, which is a bad way to solve X. In response, some other, snarkier people sometimes talk about the "XY Problem Problem": when someone's problem is Y, and they ask about Y, but people refuse to help them with it because they're too busy trying to figure out the (nonexistent) value of X.
The other ...
I'm surprised your kettle is only 1000W. You should be able to find a 1500W one. (The max power possible on a 15A circuit is higher, but I believe 1500W is the maximum permitted "continuous" power draw, and seems to be the typical maximum for heating appliances.)
As you say, if the circuit is shared, you may not be able to draw the max, but kitchen counter circuits are required to be separate from the rest of the house, so if you're not running other 120V kitchen appliances at the same time, you should have the full power of the circuit.
It seems like you misunderstood something here: the "virus with 100% lethality in mice" was the original wild-type ("Wuhan") sars-cov-2 virus. It was the mice that were engineered for their susceptibility to it. That's why the 80% headline number is meaningless and alarmist to report in isolation: The new strain is 80% fatal in mice which were genetically engineered to be susceptible to original-flavor COVID, which is 100% fatal to them.
I feel that the Robin Hanson tweet demands a reply, with what I thought was a classic LW-ism: "Humans aren't agents!"
But I can't actually find the post it comes from, and I think I actually got it from Eneasz Brodski's "Shit Rationalists Say" video. (https://youtu.be/jlT3MeCzVao)
Does anybody know where it originated? (And what Robin thinks of the idea?)
This didn't get attached to the "Apollo Almanac" sequence right (unless I just got here too early, and you're about to do that.)
Or the newer version, "one weird trick", where the purpose of the negative-sounding adjective "weird" is to explain why you haven't heard the trick before, if it's so great.
Tragically I gave up on the Plate Tectonics study before answering my most important question: “Is Alfred Wegener the Balto of plate tectonics?”
Let me back up.
Tangential to the main point, but I love your opening.
I also suppose that it's possible for those without the context to enjoy the dialogue of the high context parts, even if they don't quite understand it.
That's pretty much where I'm at on it. Although, I have played enough poker that I know all the vocabulary, just not any strategy -- I know what the button is but I don't remember how its location affects strategy, I don't know what a highjack is, but I know the words "flush", "offsuit", "big blind", "preflop", "rainbow" (had to think about it), "fold", etc. etc.
But it's maybe telling that I have played ...
One thing to keep in mind: If you sample by interview rather than by candidate -- which is how an interviewer sees the world -- the worst candidates will be massively overrepresented, because they have to do way more interviews to get a job (and then again when they fail to keep it.)
(This isn't an original insight -- it was pointed out to me by an essay, probably by Joel Spolsky or one of the similar bloggers of his era.)
(EDIT: found it. https://www.joelonsoftware.com/2005/01/27/news-58/ )
"Butterfly idea" is real (there was a post proposing and explaining it as terminology; perhaps someone else can link it.)
"Gesture at something" is definitely real, I use it myself.
"Do a babble" is new to me but I'd bet on it being real also.
Oh, surprising to me that it didn't. Hopefully you can get that sorted out.
You might make this a linkpost that links to your blog, unless there's some downside of doing that.
Actually, I think that post is probably what triggered me to write this originally, and I forgot that by the time I wrote it (or I would have added a link.) Thanks for the reminder!
Strongly agree about the existence of the problem. It's something I've put a bit of thought into.
One thing I think could help, in some cases, would be to split the market definition into
And then specify the relationship between them. For example:
Question: How many reported covid cases will there be in the US on [DATE]?
Resolution method: Look at https://covid.cdc.gov/covid-data-tracker/ a week after [DATE] for the reported values for [DATE].
Resolution notes: "Whatever values are reported that day will be...
I used a P100 elastomeric respirator pretty much any time I left the house, for multiple months in 2020 during early COVID, and intermittently after that.
The main downside, for me personally, was that people generally found understanding my speech through it difficult or impossible. This was a big enough problem that I haven't used one in quite some time.
I think the way this all works is a lot more subtle than I've been imagining, and probably some of the stuff in the original shortform about orientation is wrong.
I got a 3d printer last year, and I've been using it on and off. I want to document some of the stuff I've learned in the process. I'll start with just an outline for now, and see if people are interested (or I feel inspired) for more specifics.
The specific printer is a Monoprice Voxel, which is a rebadged / whitelabel Flashforge Adventurer 3.
Had I known it was a whitelabel I would have instead bought the original version. I don't know if that one has the same firmware bugs, but there's at least one missing feature in the Monoprice fi
I wish I had a stronger strong upvote I could give this post. I was already nodding my head by the time I was done with the introduction, and then almost every subsequent section gave me something to be excited about. I will try to say some more substantive things later, but I wanted to say this first because I often don't get around to commenting.
Up to Guidepost 3, I'm familiar with this approach, sort of independently invented it, and use it with moderate success sometimes.
The guideposts past that, I ~never have remembered experience of. Guidepost 5/6 very occasionally, but if I remember experiencing them, it's probably because I came back to full wakefulness while it was happening. Typically by that point I'm already close enough to count as "starting to sleep". (And I'm counting "experience of getting immersed in nonsensical logic" as guidepost 6; it's never accompanied by imagery past what you describe as guidepost 5.)
(It may be relevant that I have ~aphantasia, and experience minimal to no visual imagery in any context.)
Ultrapersonal Healthcare appears to have forgotten to pay Squarespace to renew their website, which doesn't seem like a great sign.