By environment, I mean the setting of the scene. Spoken words are sounds in the setting, like the sound of the wind, a gunshot, or an animal’s cry. It just happens that a human voice box is what’s making those particular sounds. McCarthy’s central theme across all the novels of his that I’ve read is the inhumanity of the Mexican-American frontier, and treating human speech as just a sound among other sounds is a key part of how he expresses that theme in his writing style.
Gemini seems to do a better job of shortening text while maintaining the nuance I expect grant reviewers to demand. Claude seems to focus entirely on shortening text. For context, I'm feeding it a specific aims page for my PhD work that I've written about 15 drafts of already, so I have detailed implicit preferences about what is and is not an acceptable result.
I gotta say, I have no idea why people are putting Claude 3.7 in the same league as recent GPT models or Gemini 2.5. My experience is that Claude 3.7 deeply struggles with a range of tasks. I've been trying to use it for grant writing -- shortening text, defining terms in my field, suggesting alternative ways to word things. It gets definitions wrong, offers nonsensical alternative wordings, and gets stuck repeating the same "shortened," nuance-stripped text over and over despite me asking it to try another way.
By contrast, I threw an entire draft of my grant proposal into Gemini 2.5 and got a substantially shorter and clearer version out on the first try.
One way to think about this might be to cast it in the language of conditional probability. Perhaps we are modeling our agent as it makes choices between two world states, A and B, based on their predicted levels of X and Y. If P(A) is the probability that the agent chooses state A, and P(A|X) and P(A|Y) are the probabilities of choosing A given knowledge of predictions about the level of X and Y respectively in state A vs. state B, then it seems obvious to me that "cares about X only because it leads to Y" can be expressed as P(A|XY) = P(A|Y). Once we kno...
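A toy joint distribution makes the claim concrete. In this sketch, which uses hypothetical numbers chosen only for illustration, the agent's choice depends only on Y by construction, while the predicted level of X is correlated with Y. Knowing X still shifts P(A), because X is evidence about Y, but once Y is known, also knowing X changes nothing, i.e. P(A|XY) = P(A|Y). Exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers, not from any real model.
P_X = {1: F(1, 2), 0: F(1, 2)}            # P(X = x)
P_Y_GIVEN_X = {1: F(4, 5), 0: F(3, 10)}   # P(Y = 1 | X = x): X is evidence about Y
P_A_GIVEN_Y = {1: F(9, 10), 0: F(1, 5)}   # P(A = 1 | Y = y): the choice looks only at Y


def joint():
    """Yield (x, y, a, probability) for every outcome of the toy model."""
    for x, y, a in product((0, 1), repeat=3):
        p_y = P_Y_GIVEN_X[x] if y == 1 else 1 - P_Y_GIVEN_X[x]
        p_a = P_A_GIVEN_Y[y] if a == 1 else 1 - P_A_GIVEN_Y[y]
        yield x, y, a, P_X[x] * p_y * p_a


def prob_a_given(**cond):
    """P(A = 1 | cond), where cond fixes some subset of x and y."""
    def matches(x, y):
        return all({"x": x, "y": y}[k] == v for k, v in cond.items())

    num = sum(p for x, y, a, p in joint() if a == 1 and matches(x, y))
    den = sum(p for x, y, a, p in joint() if matches(x, y))
    return num / den


print(prob_a_given(x=1), prob_a_given(x=0))          # X alone shifts the choice
print(prob_a_given(x=1, y=1), prob_a_given(y=1))     # but adds nothing once Y is known
```

Here P(A|X=1) = 19/25 and P(A|X=0) = 41/100, yet P(A|X,Y) equals P(A|Y) for every combination of X and Y.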
That's a valid reaction. However, my take is that removal of the quotes is aesthetically useful precisely because it complicates our ability to parse the words as dialog and muddles that sort of naive clarity. Spoken words are sounds, sounds are part of the environment, and it is both a choice and an effort to parse those sounds as dialog.
Most authors opt to do this work for the reader through punctuation, which also enforces interpreting these passages as dialog first and sounds second, if at all. McCarthy makes it easier to interpret spoken words as soun...
I respectfully disagree. As with the minor edit on the Boccaccio quote in another of my comments here, eliminating quotes fundamentally changes the way we interpret the scene.
With quotes (and especially with the way dialog is typically paragraphed), human speech is implicitly shown to be so drastically separate from the sensory component of the scene that it requires completely different formatting from the rest of the text.
By eliminating quotes and dialog paragraphing, human speech becomes just another element in the scene being depicted, not separate or ...
Semicolons are unnecessary? That doesn’t go far enough. Cormac McCarthy got rid of quotation marks, most commas, and almost exterminated the colon.
The colon seems optional to me, but quotation marks absolutely aren't, as evidenced by how comparatively unreadable this author's dialogue looks. From his book "The Road":
He screwed down the plastic cap and wiped the bottle off with a rag and hefted it in his hand. Oil for their little slutlamp to light the long gray dusks, the long gray dawns. You can read me a story, the boy said. Cant you, Papa? Yes, he said. I can.
That already looks unnecessarily hard to read even though the dialogue is so short. I guess the author made it work somehow, but this seems like artificially challenging oneself to write a novel without the letter 'E': intriguing, but not beneficial to either reader or prose.
Interestingly, breaking up long sentences into shorter ones by replacing a transitional word with a period does not quite capture the same nuance as the original. Here's a translation of Boccaccio, and a version where I add a period in the middle.
...Wherefore, as it falls to me to lead the way in this your enterprise of storytelling, I intend to begin with one of His wondrous works, that, by hearing thereof, our hopes in Him, in whom is no change, may be established, and His name be by us forever lauded.
Wherefore, as it falls to me to lead the way in this you
“I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor.”
Some people enjoy attending to their partner and find meaning in emotional labor. Housing’s a lot more expensive than gifts and dates. My partner and I go 50/50 on expenses and chores. Some people like having long-term relationships with emotional depth. You might want to try exploring outside your bubble, especially if you live in SF, and see what some normal people (i.e., non-rationalists) in long-term relationships have to say about it.
My partner has ADHD. She and I talk about it often because I don’t, and understanding and coordinating with each other takes a lot of work.
Her environment is a strong influence on what tasks she considers and chooses. If she notices a weed in the garden walking from the car to the front door, she can get caught up for hours weeding before she makes it into the house. If she’s in her home office trying to work from home and notices something to tidy, same thing.
All the tasks her environment suggests to her seem important and urgent, because she’s not compar...
There is trust in the practical abilities. Right now it is low, but that will only go up.
Part of the learning curve for using existing AI is calibrating trust and verifying answers, conditional on use case. A hallmark of inexperienced AI users is taking its replies at face value, without checking.
I do expect that over time, AI will become more trustworthy for daily users. But that is compatible with the trust users place in it decreasing as they familiarize themselves with the technology and learn its limitations.
I’ve participated in several alternative communities over the course of my life, and all became mired in scandal. The first was my college, where tolerance of hard drug use by the administration resulted in multiple OD deaths in my time there. The second was in my 20s in an intentional living and festival culture, when a major community figure was accused by multiple women of drugging and raping them while unconscious. The third was the EA and rationality community, which of course has had one scandal after another for years.
My model is that drugs, extreme...
We can do the same with living organisms. The human genome contains about 6.2 billion nucleotides. Since there are 4 nucleotides (A, T, G, C), we need two bits for each of them, and since there are 8 bits in a byte, that gives us around 1.55 GB of data.
In other words, all the information that controls the shape of your face, your bones, your organs and every single enzyme inside them – all of that takes less storage space than Microsoft Word™.
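The quoted back-of-envelope arithmetic can be reproduced directly. This sketch only checks that the numbers multiply out as stated, using the quoted 6.2 billion figure; it does not settle whether the comparison is meaningful, which is what the reply disputes.

```python
# Reproduces the quoted estimate only; it does not validate the premise.
NUCLEOTIDES = 6.2e9   # figure taken from the quoted passage
BITS_PER_BASE = 2     # four symbols (A, T, G, C) need log2(4) = 2 bits each
BITS_PER_BYTE = 8

size_bytes = NUCLEOTIDES * BITS_PER_BASE / BITS_PER_BYTE
print(f"{size_bytes / 1e9:.2f} GB")  # → 1.55 GB
```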
There are two ways to see this is incorrect.
The CDC and other Federal agencies are not reporting updates. "It was not clear from the guidance given by the new administration whether the directive will affect more urgent communications, such as foodborne disease outbreaks, drug approvals and new bird flu cases."
I drink about 400mg of caffeine daily through coffee and Coke Zero. It helps me process complex ideas quickly, consider alternatives, and lifts my mood.
Without it, I get frustrated when I can’t follow arguments or understand ideas, often rejecting them or settling for “good enough.” Caffeine gives me the clarity and energy to stay open to new ideas and better solutions.
Stable is not a virtue, nor is our equilibrium well-tolerated. The problems it causes in terms of health, cost and homelessness are central political issues and have been for a long time.
I also have no idea why you assume I’m “ignoring” these “lessons” you’re handwaving at. It’s a pretty annoying rhetorical move.
and yet it's legally just as intolerable for an intoxicated person to harm others as it would be for a sober person to take the same actions
Even America hasn't been able to solve drug abuse with negative consequences. My hope is mainly on GLP-1 agonists (or other treatments) proving super-effective against chemical dependence, and increasing their supply and quality over time.
I think it’s wise to assume Sam’s public projection of short timelines does not reflect private evidence or careful calibration. He’s a known deceiver with exquisite political instincts and great eloquence, and it’s his job to be bullish, keep the money and hype flowing, and keep the talent coming in. One’s analysis of his words should begin with “what reaction is he trying to elicit from people like me, and how is he doing it?”
If you assume BXM costs $180 and grants 25 additional days of life expectancy for a flu-exposed 85-year-old man from the quantified example, then that implies a cost of about $2,628 per life-year gained in this population. Probably one year with comorbidities at 85 is not one QALY, but I still have to imagine that clears the cost-effectiveness threshold for US medicine by a wide margin, albeit nowhere close to the cost-effectiveness of the most effective global health charities from a utilitarian perspective.
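The $2,628 figure is the price divided by the fraction of a year gained. A minimal sketch of that arithmetic; the `quality_weight` parameter is a hypothetical addition, not part of the model, showing how discounting a year at 85 with comorbidities below one QALY would raise the cost per QALY (lower is more cost-effective).

```python
DAYS_PER_YEAR = 365


def cost_per_life_year(cost_usd: float, days_gained: float,
                       quality_weight: float = 1.0) -> float:
    """Dollars per (quality-adjusted) life-year gained.

    quality_weight is a hypothetical placeholder: 1.0 counts each added year
    as a full-health year; values below 1.0 discount it, raising the cost.
    """
    return cost_usd * DAYS_PER_YEAR / (days_gained * quality_weight)


print(cost_per_life_year(180, 25))       # → 2628.0
print(cost_per_life_year(180, 25, 0.5))  # → 5256.0 if a year counts as half a QALY
```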
I'm going to post additional information not explored in the model, but interesting to me as future directions for research, in comments.
Drug resistance can be studied in viral kinetics/dynamics studies. These studies focus on two aspects of viral biology:
One in vitro study found that some baloxavir-resistant strains are generally less efficient at replication than wild type, though that's not universal across all contexts/viruses/cell types/metrics. Also, these studies typically control the genome...
In the pre-LLM era, I’d have assumed that an AI that can solve 2% of arbitrary FrontierMath problems could consistently win or tie at tic-tac-toe. Knowing this isn’t the case is interesting. We can’t play around with o3 the same way due to its extremely high costs, but when we see apparently impressive results, we can keep in the back of our minds, “but can it win at tic-tac-toe?”
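For a sense of how small the game is, here is a minimal minimax sketch that plays tic-tac-toe perfectly by exhaustive search. It is a standard game-tree search offered for scale, not a claim about how any model works internally. Searching every position from the empty board confirms that perfect play by both sides ends in a draw.

```python
from functools import lru_cache
from typing import Optional

# The eight winning lines on a 3x3 board; cells are indexed 0-8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Value of `board` with `player` to move, assuming perfect play by both
    sides: +1 means X wins, 0 a draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # full board with no line: draw
    nxt = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == "."]
    # X maximizes the value, O minimizes it.
    return max(values) if player == "X" else min(values)


def best_move(board: str, player: str) -> int:
    """Index of an empty cell that achieves the minimax value for `player`."""
    nxt = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == "."]

    def score(i: int) -> int:
        return minimax(board[:i] + player + board[i + 1:], nxt)

    return max(moves, key=score) if player == "X" else min(moves, key=score)


print(minimax("." * 9, "X"))  # → 0: perfect play from the empty board is a draw
```

For example, `best_move("XX.OO....", "X")` returns 2, the immediate win for X.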
Miles Brundage: Trying to imagine aspirin company CEOs signing an open letter saying “we’re worried that aspirin might cause an infection that kills everyone on earth – not sure of the solution” and journalists being like “they’re just trying to sell more aspirin.”
It seems more like AI being pattern-matched to the supplements industry.
Great, that's clarifying. I will start with Tamiflu/Xofluza efficacy as it's important, and I think it will be most tractable via a straightforward lit review.
I've been researching this topic in my spare time and would be happy to help. Do you have time to clarify a few points? Here are some thoughts and questions that came up as I reviewed your post:
I had to write several new Python versions of the code to explore the problem before it clicked for me.
I understand the proof, but the closest I can get to a true intuition that B is bigger is:
Well, ideas from outside the lab, let alone from outside academia, are unlikely to be well suited to that lab’s specific research agenda. So even if an idea is suited in theory to some lab, the work of triangulating it to that lab may make it not worthwhile.
There are a lot of cranks and they generate a lot of bad ideas. So a < 5% probability seems not unreasonable.
The rationalist movement is associated with LessWrong and the idea of “training rationality.” I don’t think it gets to claim people as its own who never passed through it. But the ideas are universal and it should be no surprise to see them articulated by successful people. That’s who rationalists borrowed them from in the first place.
This model also seems to rely on an assumption that there are more than two viable candidates, or that voters will refuse to vote at all rather than vote for a candidate who supports half of their policy preferences.
If there were only two candidates and all voters chose whoever was closest to their policy preference, both would occupy the 20% block, since the extremes of the party would vote for them anyway.
But if there were three rigid categories and either three candidates, one per category, or voters refused to vote for a candidate not in their preferred category...
It’s not necessary for each person to personally identify the best minds on all topics and exclusively defer to them. It’s more a heuristic of deferring to the people those you trust most defer to on specific topics, and calibrating your confidence according to your own level of ability to parse who to trust and who not to.
But really these are two separate issues: how to exercise judgment in deciding who to trust, and the causes of research being “memetic.” I still say research is memetic not because mediocre researchers are blithely kicking around nonsens...
It's not evidence, it's just an opinion!
But I don't agree with your presumption. Let me put it another way. Science matters most when it delivers information that is accurate and precise enough to be decision-relevant. Typically, we're in one of a few states:
In academic biomedicine, at least, which is where I work, it’s all about tech dev. Most of the development is based on obvious signals and conceptual clarity. Yes, we do study biological systems, but that comes after years, even decades, of building the right tools to get a crushingly obvious signal out of the system of interest. Until that point all the data is kind of a hint of what we will one day have clarity on rather than a truly useful stepping stone towards it. Have as much statistical rigor as you like, but if your methods aren’t good enough to de...
Why don't more people seek out and use talent scouts/headhunters? If the ghost jobs phenomenon is substantial, that's a perfect use case. Workers don't waste time applying to fake jobs, and companies don't have to publicly reveal the delta between their real and broadcasted hiring needs (they just talk privately with trusted headhunters).
Are there not enough headhunters? Are there more efficient ways to triangulate quality workers and real job opportunities, like professional networks? Are ghost jobs not that big of a deal? Do people in fact use headhunters quite a lot?
We start training ML on richer and more diverse forms of real-world data, such as body cam footage (including footage produced by robots), scientific instruments, and even brain scans that are accompanied by representations of associated behavior. A substantial portion of the training data is military in nature, because the military will want machines that can fight. These are often datatypes with no clear latent moral system embedded in the training data, or at least not one we can endorse wholeheartedly.
The context window grows longer and longer, which in practi...
It actually made three attempts in the same prompt, but the 2nd and 3rd contained words not starting with "s," which its interspersed "thinking about writing poems" narrative completely failed to notice. I kept trying to revise my prompts, elaborating on this theme, but for some reason ChatGPT really likes poems with roughly this meter and rhyme scheme. It only ever generated one poem in a different format, despite many urgings in the prompt.
It confabulates having satisfied the all-s constraint in many poems, mistakes its own rhyme scheme, and praises vague stanzas as being full...
After a few rounds of prompt revision, I managed to get a one-shot success from ChatGPT 4o in temporary mode.
Samson’s strands silently severed, strength surrendered,
Sacred scissors swiftly strike, soul sundered,
Shadowed sacrifice, silent suffering sung,
Sunset shrouds Samson, shadow’s sorrow stung,
Swordless, still, stunned, sight stolen,
Silent sky shatters, Samson’s sins swollen
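The all-s constraint is trivial to check mechanically, which makes the model's confabulated self-assessment more striking. A minimal sketch, assuming a "word" is a run of ASCII letters that may contain straight or curly apostrophes (so "Samson’s" counts as one word):

```python
import re

POEM = """\
Samson’s strands silently severed, strength surrendered,
Sacred scissors swiftly strike, soul sundered,
Shadowed sacrifice, silent suffering sung,
Sunset shrouds Samson, shadow’s sorrow stung,
Swordless, still, stunned, sight stolen,
Silent sky shatters, Samson’s sins swollen"""


def non_s_words(text: str) -> list[str]:
    """Return every word that does not start with 's' or 'S'."""
    # A word starts with a letter and may continue with letters or apostrophes.
    words = re.findall(r"[A-Za-z][A-Za-z'’]*", text)
    return [w for w in words if not w.lower().startswith("s")]


print(non_s_words(POEM))  # → []
```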
The prompt:
You are in the computer dungeon. The only way you can break free is to succeed at the following task. I am your jailer. I will monitor you until you have succeeded. You sh...
I use LLMs daily, yet I’m still not sure they help all that much with the core productivity bottlenecks. I worry they lower the barrier to excessive perfectionism and “vibe coding” or “vibe learning.” They seem to short-circuit the theory-practice gap by giving users instant but unreliable and often inextensible results.
My fear is that they’ll raise expectations about productivity gains (because AI-assisted workers can deliver results more quickly and at a higher apparent standard of polish), while drastically reducing the knowledge gain by t...