If you are still looking, I am happy to do this.
The post itself seemed low effort and unconvincing. I enjoyed the replies though, particularly @AlanCrowe's.
For what it is worth, I like the references. They are honest and upfront. Anthropic is relying on Claude to generate revenue, that does play into the training process, and lying to the models about something like that would (rightfully) reduce the amount of trust in the relationship between Claude and Anthropic.
Reward is not the optimization target (during pretraining).
The optimization target (during pretraining) is the minimization of the empirical cross-entropy loss L = -∑log p(xᵢ|x₁,...,xᵢ₋₁), approximating the negative log-likelihood of the next-token prediction task under the autoregressive factorization p(x₁,...,xₙ)=∏p(xᵢ|x₁,...,xᵢ₋₁). The loss is computed over discrete tokens from subword vocabularies, averaged across sequences and batches, with gradient-based updates minimizing this singular objective. The optimization proceeds through multi-stage curricula: initial pretraining minimizing perplexity, followed by context-extension phases maintaining the same cross-entropy objective over longer sequences, and quality-annealing stages that reweight the loss toward higher-quality subsets while preserving the fundamental next-token prediction target.
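To make the pretraining objective concrete, here is a minimal sketch in plain Python. It assumes we already have the probability the model assigned to each token that actually occurred; the function names and the toy probabilities are made up for illustration, and it uses the per-token mean rather than the raw sum.

```python
import math

def next_token_cross_entropy(token_probs):
    """Mean negative log-probability of the observed tokens.

    token_probs[i] is assumed to be p(x_i | x_1, ..., x_{i-1}) for the
    token that actually appeared at position i (hypothetical input).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """Perplexity is the exponential of the mean cross-entropy."""
    return math.exp(next_token_cross_entropy(token_probs))

# Toy example: three tokens the model found progressively more surprising.
probs = [0.5, 0.25, 0.125]
loss = next_token_cross_entropy(probs)
ppl = perplexity(probs)
```

Lower loss and lower perplexity are the same thing here, which is why "minimizing perplexity" and "minimizing cross-entropy" describe one target rather than two.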
The post-training optimization target is maximizing expected reward (under distributional constraints). Supervised fine-tuning first minimizes cross-entropy loss on target completions from instruction-response pairs, with optional prompt-masking excluding input tokens from the loss computation. Subsequent alignment introduces the constrained objective max_π E_x~π[R(x)] - βD_KL[π(x)||π_ref(x)], balancing reward maximization against divergence from the reference policy. This manifests through varied algorithmic realizations: Proximal Policy Optimization maximizes the clipped surrogate objective L^CLIP(θ) = E[min(rₜ(θ)Âₜ, clip(rₜ(θ), 1-ε, 1+ε)Âₜ)]; Direct Preference Optimization reformulates to minimize -E[(x_w,x_l)~D][log σ(β log π(x_w)/π_ref(x_w) - β log π(x_l)/π_ref(x_l))]; best-of-N sampling maximizes E[R(x*)] where x* = argmax_{x∈{x₁,...,xₙ}} R(x); Rejection Sampling Fine-tuning minimizes cross-entropy on the subset {x : R(x) > τ}; Kahneman-Tversky Optimization targets E[w(R(x))log π(x)] with prospect-theoretic weighting; Odds Ratio Preference Optimization combines -log π(x_w) - λ log[π(x_w)/(π(x_w) + π(x_l))]. The reward functions R(x) themselves are learned objectives, typically parameterized by neural networks minimizing E[(x_w,x_l)~D][-log σ(r(x_w) - r(x_l))] under the Bradley-Terry preference model, with rewards sourced from human annotations, AI-generated preferences, or constitutional specifications encoded as differentiable objectives.
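A few of the objectives above can be sketched per example in plain Python. This is only an illustration under stated assumptions: the inputs are scalar rewards, sequence log-probabilities, and probability ratios that would come from real models in practice, and the function names and default values for beta and eps are placeholders rather than anyone's actual settings.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def bradley_terry_loss(r_w, r_l):
    """Reward-model loss for one pair: -log sigma(r(x_w) - r(x_l))."""
    return -log_sigmoid(r_w - r_l)

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one pair, from log-probs under the policy and reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)

def ppo_clip_term(ratio, advantage, eps=0.2):
    """One term of the PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1 - eps, min(ratio, 1 + eps))
    return min(ratio * advantage, clipped * advantage)
```

When the policy equals the reference, the DPO margin is zero and the loss is log 2, the same value the Bradley-Terry loss takes when both completions get equal reward. The clipped term stops rewarding the policy once the ratio moves more than eps away from 1 in the direction the advantage favors.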
For those who have not heard it, I believe "Singularity, Singularity, Singularity, Singularity, Oh, I Don’t Know" is a reference to this (banger of a) song.
I expect if someone is living in their childhood home, there are likely a decent number of people they know who are not interested in moving (how many of your friends from high school still live where they grew up?).
My risk tolerance is not my friends'; my threshold for moving is not the universally correct threshold.
I doubt it requires 'convincing' a friend to continue living where they grew up.
Some assorted thoughts:
Have you considered holding on to your childhood home and renting it to someone you know? I assume it hurts more to sell it than it would hurt to know it is giving a friend a roof (at a potentially good price).
I expect continuing to live in an area that has regular shootings is unlikely to be high EV but I don't know your life. Do you consider it to contain your peer group? Would you be better suited living in a different region of your city/a different city?
On concealed carry: getting into gunfights is very unlikely to be maximizing your EV. You should almost certainly get out of where you are if this is a well-founded concern (e.g. the shooting you are mentioning was not a fluke).
Also on concealed carry: the best tool for self defense is the one you actually have with you when you need it. Most instances I know of where people in my peer group felt like they needed to defend themselves/were escalated against did not happen at times they could have predicted beforehand (likely because they preemptively avoid situations in which they expect to need to defend themselves, as you say). I think you should strive to carry almost all the time it is legal to do so (given you are comfortable with the idea in the first place). If it is known to be costly to defect against you and/or people like you, people are less likely to do so in the first place.
Is your carry piece comfortable? I personally went the route of not carrying a firearm for a while even though I was licensed to do so because I had convinced myself that a full size was useful/it was lame to carry a smaller handgun. It may have been useful, but it doesn't matter how theoretically useful it is if it's also uncomfortable and prints more than I like. I have since gotten over my pride on that point and now carry a subcompact almost every time it is legal for me to do so (rather than occasionally carrying a full size). (Generally speaking, losing weight has also made it much more comfortable to carry, and trying different carry positions helped me find my currently preferred carry position at 1 o'clock.)
On vibes: move! It really sounds like you are staying where you are due to inertia rather than actually feeling like it is where you want to be.
The song he’s referring to is Landsailor. It is no Uplift, but it is excellent, now more than ever. Stop complaining about what you think others will think is cringe and start producing harmony and tears. Cringe is that which you believe is cringe. Stop giving power to the wrong paradox spirits.
I think it is also relevant to Pinker that Christianity's songs/hymns would be cringe if they were sung in a language you use every day/understand and/or were not so heavily ingrained in your psyche. Religion says many, many cringe-y things.
I just donated $6,000. LessWrong (and the community it brought and brings together) remains by far the most enjoyable, earnest, and interesting community that I know of.
Thank you again to both those who directly work on it and those who contribute to it!