All of rotatingpaguro's Comments + Replies

Isn't it normal in startup world to make bets and not make money for many years? I am not familiar with the field so I don't have intuitions for how much money/how many years would make sense, so I don't know if OpenAI is doing something normal, or something wild.

9Gordon Seidoh Worley
Yes, though note that this is still concerning. Normally the way this works in a startup is that spend exceeding revenue should be in service of bootstrapping the company. That means that money is usually spent in a few ways:

  • R&D
  • functions that have relatively high floors (relative to revenue) to function well but then grow sublinearly (marketing, sales, etc.)
  • gaining a first-mover advantage that no one will be able to catch up to

OpenAI's spend is concerning for the same reason that, say, Uber's and Netflix's spend is/was concerning: they have to actually win their market to have a chance of reaping rewards, and if they don't they'll simply be forced to raise prices and cut quality/R&D.
2Remmelt
Yes, good point. There is a discussion of that here. 

During our evaluations we noticed that Claude 3.7 Sonnet occasionally resorts to special-casing in order to pass test cases in agentic coding environments like Claude Code. Most often this takes the form of directly returning expected test values rather than implementing general solutions, but also includes modifying the problematic tests themselves to match the code’s output.

These behaviors typically emerge after multiple failed attempts to develop a general solution, particularly when:

• The model struggles to devise a comprehensive solution

• Test cases p

... (read more)

The economy can be positive-sum, i.e., the more people work, the more everyone gets. Do you think the UK in particular is in a situation where, instead, if you work more you are just lowering wages without getting more done?

2James Camacho
Not quite. I think people working more do get more done, but it ends up lowering wages and decreasing the entropy of resource allocation (concentrating it at the top). If you're looking for the good of the society, you probably want the greatest free energy,

    GDP / Total Assets + temperature · (Theil_T − ln population).

The temperature is usually somewhere between 0.2 (economic boom) and 1.0 (recessions), and GDP / Total Assets ≈ 0.1 in the United Kingdom. I couldn't find a figure for the Theil index, but the closest I got is that Croatia's was 0.295 and Serbia's was 0.369 in 2011, and for income (not assets) the United Kingdom's was 0.268 in 2005. So, some very rough estimates for the free energies are

    0.1 + 0.2 · (0.268 − 17.9) = −3.4   (United Kingdom, 2005)
    0.05 + 0.5 · (0.295 − 15.3) = −7.4   (Croatia, 2011)

The ideal point for the number of hours worked is where the GDP increases as fast as the temperature times the decrease in entropy. I'm not aware of any studies showing this, but I believe this point is much lower than the number of hours people are currently working in the United Kingdom.
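For what it's worth, the rough arithmetic above can be reproduced in a few lines. The population figures (about 60M for the UK in 2005, about 4.3M for Croatia in 2011) are my own assumptions, backed out from the ln-population terms; this is a sketch of the estimate, not an endorsement of the model:

```python
from math import log

def free_energy(gdp_over_assets: float, temperature: float,
                theil: float, population: float) -> float:
    # free energy ~ GDP/Total Assets + temperature * (Theil index - ln population)
    return gdp_over_assets + temperature * (theil - log(population))

# Assumed populations: UK 2005 ~60 million, Croatia 2011 ~4.3 million
uk_2005 = free_energy(0.1, 0.2, 0.268, 60e6)        # roughly -3.4
croatia_2011 = free_energy(0.05, 0.5, 0.295, 4.3e6)  # roughly -7.4
```

The sign and magnitude match the figures quoted above, so the ln-population term dominates both estimates.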
Answer by rotatingpaguro

In the course of a few months, the functionality I want was progressively added to chatbox, so I'm content with that.

My current thinking is that

  1. relying on the CoT staying legible because it's English, and
  2. hoping the (racing) labs do not drop human language when it becomes economically convenient to do so,

were hopes to be destroyed as quickly as possible. (This is not a confident opinion, it originates from 15 minutes of vague thoughts.)

To be clear, I don't think that in general it is right to say "Doing the right thing is hopeless because no one else is doing it", I typically prefer to rather "do the thing that if everyone did that, the world would be better". My intuitio... (read more)

3Noosphere89
The 2 here IMO is more significant than 1, in the sense that if CoT is eventually replaced, it won't be because it's no longer faithful, but rather that companies will eventually replace CoT with something less interpretable.

I wonder whether stuff like "turn off the wifi" is about costly signals? (My first-order opinion is still that it's dumb.)

I started reading, but I can't understand what the parity problem is, in the section that ought to define it.

I guess the parity problem is finding the set S given black-box access to the function, is that right?

1Aprillion
Parity in computing is whether the count of 1s in a binary string is even or odd, e.g. '101' has two 1s => even parity (to output 0 for even parity, XOR all bits like 1^0^1; to output 1 for this, XOR that result with 1). The parity problem (if I understand it correctly) sounds like trying to find out the minimum amount of data samples per input length a learning algorithm ought to need to figure out that a mapping between a binary input and a single-bit output is equal to computing XOR parity and not something else (e.g. whether an integer is even/odd, or whether there is a pattern in a wannabe-random mapping, ...). The conclusion seems to be that you need exponentially more samples for linearly longer input, unless you can figure out from other clues that you need to calculate parity, in which case you just implement parity for any input size and you don't need any additional sample data. (FTR: I don't understand the math here, I am just pattern-matching to the usual way this kind of problem goes)
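A minimal sketch of the two notions above (function and variable names are mine): plain parity of a whole string, and the learning-theory version where the parity is taken only over a hidden subset S of positions.

```python
def parity(bits: str) -> int:
    """XOR all bits together: 1 if the count of 1s is odd, else 0."""
    result = 0
    for b in bits:
        result ^= int(b)
    return result

def subset_parity(bits: str, S: set[int]) -> int:
    """The learning-theory version: parity of only the positions in S.
    The parity problem is recovering S from input/output samples alone."""
    return parity("".join(bits[i] for i in S))
```

For example, `parity('101')` is 0 (even parity), and with a hidden S = {0, 2}, `subset_parity('1101', {0, 2})` XORs only bits 0 and 2.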

I think I prefer Claude's attitude as an assistant. The other two look too greedy to be wise.

Referring to the section "What is Intelligence Even, Anyway?":

I think AIXI is fairly described as a search over the space of Turing machines. Why do you think otherwise? Or maybe are you making a distinction at a more granular level?

1Daniel Tan
Upon consideration I think you are right, and I should edit the post to reflect that. But I think the claim still holds (if you expect intelligence looks like AIXI then it seems quite unlikely you should expect to be able to understand it without further priors)

When you say "true probability", what do you mean?

The current hypotheses I have about what you mean are (in part non-exclusive):

  1. You think some notion of objective, non-observer dependent probability makes sense, and that's the true probability.
  2. You do not think "true probability" exists; you are referring to it to say the market price is not anything like that.
  3. You define "true probability" as a probability that observers contextually agree on (like a coin flip observed by humans who don't know the thrower).

Anton Leicht says evals are in trouble as something one could use in a regulation or law. Why? He lists four factors. Marius Hobbhahn of Apollo also has thoughts. I’m going to post a lot of disagreement and pushback, but I thank Anton for the exercise, which I believe is highly useful.

I think there's one important factor missing: if evals were really used for regulation, they would be gamed. I trust an eval more when the company does not actually have anything at stake on it. If it did, there would be a natural tendency for evals to slide toward empty box-checking.

I sometimes wonder about this. This post does pose the question, but I don't think it gives an analysis that could make me change my mind on anything, it's too shallow and not adversarial.

I read part of the paper. That there's a cultural difference north-south about honesty and willingness to break the rules matches my experience on the ground.

I find this intellectually stimulating, but it does not look useful in practice, because with repeated i.i.d. data the information in the data is much higher than the prior if the prior is diffuse/universal/ignorance.

6Cleo Nardo
You raise a good point. But I think the choice of prior is important quite often:

  1. In the limit of large i.i.d. data (N > 1000), both Laplace's Rule and my prior will give the same answer. But so too does the simple frequentist estimate n/N. The original motivation of Laplace's Rule was the small-N regime, where the frequentist estimate is clearly absurd.
  2. In the small-data regime (N < 15), the prior matters. Consider observing 12 successes in a row. Laplace's Rule: P(next success) = 13/14 ≈ 92.3%. My proposed prior (with point masses at 0 and 1): P(next success) ≈ 98%, which better matches my intuition about potentially deterministic processes.
  3. When making predictions far beyond our observed data, the likelihood of extreme underlying probabilities matters a lot. For example, after seeing 12/12 successes, how confident should we be in seeing a quadrillion more successes? Laplace's uniform prior assigns this very low probability, while my prior gives it significant weight.
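The comparison in point 2 can be sketched with exact arithmetic. The mixture weights (1/4 on each point mass, the rest uniform) are my own illustrative choice, not necessarily the weights the comment has in mind:

```python
from fractions import Fraction
from math import comb

def laplace_rule(s: int, n: int) -> Fraction:
    # Posterior predictive P(next success) under a Uniform(0,1) prior on p
    return Fraction(s + 1, n + 2)

def point_mass_predictive(s: int, n: int,
                          w0=Fraction(1, 4), w1=Fraction(1, 4)) -> Fraction:
    """Prior: w0*delta(p=0) + w1*delta(p=1) + (1-w0-w1)*Uniform(0,1).
    Returns P(next success | s successes in n trials)."""
    wc = 1 - w0 - w1
    like0 = 1 if s == 0 else 0                  # marginal likelihood under p = 0
    like1 = 1 if s == n else 0                  # marginal likelihood under p = 1
    likec = Fraction(1, (n + 1) * comb(n, s))   # under the uniform component
    num = w1 * like1 + wc * likec * laplace_rule(s, n)
    den = w0 * like0 + w1 * like1 + wc * likec
    return num / den
```

With 12/12 successes, `laplace_rule` gives 13/14 ≈ 92.3% while the point-mass mixture gives about 99%, since the data pulls most of the posterior weight onto the deterministic p = 1 component.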

Italians over time sorted themselves geographically by honesty, which is both weird and damn cool, and also makes a lot of sense. There are multiple equilibria, so let everyone find the one that suits them. We need to use this more in logic puzzles. In one Italian village everyone tells the truth, in the other…

I can't get access to the paper, anyone has a tip on this?

1rotatingpaguro
I read part of the paper. That there's a cultural difference north-south about honesty and willingness to break the rules matches my experience on the ground.
8gwern
If you simply search the title, you will find many PDFs: https://scholar.google.com/scholar?q=Rule Breaking%2C Honesty%2C and Migration (eg)

I agree with what you say about how to maximize what you get out of an interview. I also agree with the discussion vs. debate distinction you make; I wasn't specifically trying to go there when I used the word "debate", I was just sloppy with words.

I guess you agree that creating a social norm that you should read up on the other person's material before engaging in public adds friction. I expect fewer discussions would happen. There is no clear threshold for how prepared you should be.

I guess we disagree about how much value do we lose du... (read more)

3AnthonyC
That's a good point about public discussions. It's not how I absorb information, but I can definitely see that.

I see your proposed condition for meaningful debate as bureaucracy that adds friction rather than value.

3AnthonyC
I'm not sure where I'm proposing bureaucracy? The value is in making sure a conversation efficiently adds value for both parties, by not having to spend time rehashing things that are much faster absorbed in advance. This avoids the friction of needing to spend much of the time rehashing 101-level prerequisites. A very modest amount of groundwork beforehand maximizes the rate of insight in discussion.

I'm drawing in large part from personal experience. A significant part of my job is interviewing researchers, startup founders, investors, government officials, and assorted business people. Before I get on a call with these people, I look them (and their current and past employers, as needed) up on LinkedIn and Google Scholar and their own webpages. I briefly familiarize myself with what they've worked on and what they know and care about and how they think, as best I can anticipate, even if it's only for 15 minutes. And then when I get into a conversation, I adapt. I'm picking their brain to try and learn, so I try to adapt to their communication style and translate between their worldview and my own. If I go in with an idea of what questions I want answered, and those turn out to not be the important questions, or this turns out to be the wrong person to discuss it with, I change direction. Not doing this often leaves everyone involved frustrated at having wasted their time.

Also, should I be thinking of this as a debate? Because that's very different than a podcast or interview or discussion. These all have different goals. A podcast or interview is where I think the standard I am thinking of is most appropriate. If you want to have a deep discussion, it's insufficient, and you need to do more prep work or you'll never get into the meatiest parts of where you want to go. I do agree that if you're having a (public-facing) debate where the goal is to win, then sure, this is not strictly necessary. The history of e.g. "debates" in politics, or between creationists a

I somewhat disagree with Tenobrus' commentary about Wolfram.

I watched the full podcast, and my impression was that Wolfram uses a "scientific hat", of which he is well aware, which comes with a certain ritual and method for looking at new things and learning them. Wolfram is doing the ritual of understanding what Yudkowsky says, which involves picking at the details of everything.

Wolfram often recognizes that maybe he feels like agreeing with something, but "scientifically" he has a duty to pick it apart. I think this has to be understood as a learning process rather than as a state of belief.

5AnthonyC
I can totally believe this. But, I also think that responsibly wearing the scientist hat entails prep work before engaging in a four hour public discussion with a domain expert in a field. At minimum that includes skimming the titles and ideally the abstracts/outlines of their key writings. Maybe ask Claude to summarize the highlights for you. If he'd done that he'd have figured out many of the answers to many of these questions on his own, or much faster during discussion. He's too smart not to. Otherwise, you're not actually ready to have a meaningful scientific discussion with that person on that topic.

So, should restrictions on gambling be based on feedback-loop length? Should sports betting be broadly legal when it concerns a far-enough future?

AnthonyC

It's a good question. I'd also say limiting mid-game advertising might be a good idea. I'm not really a sports fan in general and don't gamble, but a few months ago I went to a baseball game, and people advertising - I think it was Draftkings? - were walking up and down the aisles constantly throughout the game. It was annoying, distracting, and disconcerting.

current inference scaling methods tend to be tied to CoT and the like, which are quite transparent

Aschenbrenner in Situational Awareness predicts illegible chains of thought are going to prevail because they are more efficient. I know of one developer claiming to do this (https://platonicresearch.com/) but I guess there must be many.

Relatedly, I have a vague understanding of how product safety certification works in the EU; there are multiple private companies doing the certification in every member state.

Half-informed take on "the SNPs explain a small part of the genetic variance": maybe the regression methods are bad?

3johnswentworth
Two responses:

  • It's a pretty large part - somewhere between a third and half - just not a majority.
  • I was also tracking that specific hypothesis, which was why I specifically flagged "about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don't know the details of that method)". Again, I don't know the method, but it sounds like it wasn't dependent on details of the regression methods.

Not sure if I missed something because I read quickly, but: all these are purely correlational studies, without causal inference, right?

2JustisMills
They're correlational, though the broad cohorts help; not sure what you can do beyond just canvassing an entire birth cohort and noticing differences. There are possible pitfalls, like the decision to induce early being made by people with genes that predict bad outcomes, but I really don't think that's major.

OpenAI is recklessly scaling AI. Besides accelerating "progress" toward mass extinction, it causes increasing harms. Many communities are now speaking up. In my circles only, I count seven new books critiquing AI corps. It’s what happens when you scrape everyone's personal data to train inscrutable models (computed by polluting data centers) used to cheaply automate out professionals and spread disinformation and deepfakes.

Could you justify that it causes increasing harms? My intuition is that OpenAI is currently net-positive without taking into account fu... (read more)

3Remmelt
Appreciating your inquisitive question! One way to think about it: For OpenAI to scale more toward “AGI”, the corporation needs more data, more automatable work, more profitable uses for working machines, and more hardware to run those machines.  If you look at how OpenAI has been increasing those four variables, you can notice that there are harms associated with each. This tends to result in increasing harms. One obvious example:  if they increase hardware, this also increases pollution (from mining, producing, installing, and running the hardware). Note that the above is not a claim that the harms outweigh the benefits. But if OpenAI & co continue down their current trajectory, I expect that most communities would look back and say that the harms to what they care about in their lives were not worth it. I wrote a guide to broader AI harms meant to emotionally resonate for laypeople here.
Amalthea

I'd agree the OpenAI product line is net positive (though not super hung up on that). Sam Altman demonstrating what kind of actions you can get away with in front of everyone's eyes seems problematic.

Ok, that. China seems less interventionist, and to use more soft power. The US is more willing to go to war. But is that because the US is more powerful than China, or because Chinese culture is intrinsically more peaceful? If China made the killer robots first, would they say "MUA-HA-HA actually we always wanted to shoot people for no good reason like in yankee movies! Go and kill!"

Since politics is a default-no on lesswrong, I'll try to muddle the waters by making a distracting unserious figurative narration.

Americans maybe have more of a culture of "if ... (read more)

[Alert: political content]

About the US vs. China argument: has any proponent made a case that the Americans are the good guys here?

My vague perspective, as someone neither in China nor in the US, is that the US is overall more violent and reckless than China. My personal cultural preference is for the US, but when I think about the future of humanity, I try to set aside what I like for myself.

So far the US is screaming "US or China!" while creating the problem in the first place all along. It could be true that if China developed AGI it would be worse, but tha... (read more)

Seth Herd

I think the general idea is that the US is currently a functioning democracy, while China is not. I think if this continued to be true, it would be a strong reason to prefer AGI in the hands of the US vs Chinese governments. I think this is true despite agreeing that the US is more violent and reckless than China (in some ways - the persecution of the Uyghur people by the Chinese government hits a different sort of violence than any recent US government acts).

If the government is truly accountable to the people, public opinion will play a large role in de... (read more)

I agree it's not a flaw in the grand scheme of things. It's a flaw when using the market as a consensus input for reasoning.

I start with a very low prior of AGI doom (for the purpose of this discussion, assume I defer to consensus).

You link to a prediction market (Manifold's "Will AI wipe out humanity before the year 2100", currently at 13%).

Problems I see with using it for this question, in random order:

  1. It ends in 2100 so the incentive is effectively about what people will believe a few years from now, not about the question. It is a Keynesian beauty contest. (Better than nothing.)
  2. Even with the stated question, you win only if it resolves NO, so it is strategically correct to b
... (read more)
3AnthonyC
(3) is not necessarily a flaw. Every prediction market is an action market unless the outcome is completely outside human influence. If there were a prediction market where a concerned group of billionaires could invest a huge sum on the "No" side of "Will humans solve how to make AGI and ASI safety to ensure continued human thriving?" (or some much better operationalization of the idea), that would be great.

This type of issue is a huge effective blocker for people with my level of skills. I find myself excited to write actual code that does the things, but the thought of having to set everything up to get to that point fills me with dread – I just know that the AI is going to get something stupid wrong, and everything's going to be screwed up, and it's going to be hours trying to figure it out and so on, and maybe I'll just work on something else. Sigh. At some point I need to power through.

Reminds me of this 2009 kalzumeus quote:

I want to quote a real customer

... (read more)

Ah, sorry for being so cursory.

A common trope about mathematicians vs. other math users is that mathematicians are paranoid persnickety truth-seekers, they want everything to be exactly correct down to every detail. Thus engineers and physicists often perceive mathematicians as a sort of fact-checker caste.

As you say, in some sense mathematicians deal with made-up stuff and engineers with real stuff. But from the engineer's point of view, they deal with mathematicians when writing math, not when screwing bolts, and so perceive mathematicians as "the annoyi... (read more)

3Nathan Young
Hmmmm. I wonder how common this is. This is not how I think of the difference. I think of mathematicians as dealing with coherent systems of logic and engineers as dealing with building in the real world. Mathematicians are useful when their system maps to the problem at hand, but not when it doesn't. I should say I have a maths degree, so it's possible that my view of mathematicians and the general view are not coincident.

The analogy with mathematicians is very stretched.

3Nathan Young
Hmmm, what is the picture that the analogy gives you? I struggle to imagine how it's misleading, but I want to hear.
2Nathan Young
Why do you think it's stretched? It's about the difference between mathematicians and engineers. One group is about relating to the real world, the other about logically consistent ideas that may be useful.
  • Include, in the cue to each note, a hint as to its content, besides just the ordinal pointer. A one-letter abbreviation, standardised throughout the work, may work well, e.g.:
    • "c" for citation supporting the marked claim
    • "d" for a definition of the marked term
    • "f" for further, niche information extending the marked section
    • "t" for a pedantic detail or technicality modifying the marked clause
  • Commit to only use notes for one purpose — say, only definitions, or only citations. State this commitment to the reader.

These don't look like good solutions to me. Just a first impression.

I don't make eye contact while speaking, but I fix my gaze on people while in silence. Were there people like me? Did they manage to reverse this? The way I feel inside is more like I can't think both about someone's face and about what I am saying at once; too many things to keep track of.

0mako yass
This can be quite a bad thing, since a person's face often tells you whether what you're saying is landing for them or whether you need to elaborate on certain points (unless they have a people-pleaser complex, in which case they'll just nod and smile always, even when they're confused and offended on the inside lmao). The worst I've seen it was a discussion with Avi Loeb where he was lecturing someone he had a disagreement with, and he actually closed his eyes while he was talking; although I'm sure he wasn't fully self-aware about it, it was very arrogant. He was not talking to that person; he must waste a lot of time, in reckonings, retreading old ground without making progress towards reconciliation.
5Chipmonk
Yeah for many people it was hard to both make eye contact and think at the same time. Some of them told me that this changed what they spoke about. Personally I have a very hard time recalling the past or thinking very logically while making eye contact

Is champerty legal in California?

I take this as a fun occasion to lose some of my karma in a silly way to remind myself lesswrong karma is not important.

A very interesting problem is measuring something like general intelligence. I’m not going to delve deeply into this topic but simply want to draw attention to an idea that is often implied, though rarely expressed, in the framing of such a problem: the assumption that an "intelligence level," whatever it may be, corresponds to some inherent properties of a person and can be measured through their manifestations. Moreover, we often talk about measurements with a precision of a few percentage points, which suggests that, in theory, the measurement should be

... (read more)
2Alexander Gufan
Of course I do. 

Does it imply that normally everyone would receive as many spam calls, but the more expensive companies are spending a lot of their budget to actively fight against the spammers?

Yeah, they said this is what happens.

"things that mysteriously don't work in USA despite working more or less okay in most developed countries"

Let me try:

  • expensive telephone service (despite the infamous Bell breakup)
  • super-expensive healthcare
  • super-expensive college
  • no universally free or dirt-cheap wire transfers between arbitrary banks
  • difficult to find a girlfriend
  • no gun control (disputed)
  • crap trai
... (read more)
5Viliam
I would add:

  • mass transit sucks (depending on the city)
  • expensive internet connection (probably also location-dependent)
  • problems with labeling allergens contained in food
  • too much sugar in food (including the kinds of food that normally shouldn't contain it, e.g. fish)

I may be wrong about some things here, but that's kinda my point -- I would like someone to treat this seriously, to separate actual America-specific things from things that generally suck in many (but not all) places across developed countries, to create an actual America-specific list. And then, analyze the causes, both historical (why it started) and current (why it cannot stop). Sorry for going off-topic, but I would really really want someone to write about this. It's a huge mystery to me, and most people don't seem to care; I guess everyone just takes their situation as normal.

I don't really know about this specific proposal to deter spam calls, but speaking in general: I'm from another large first-world country, and when staying in the US a striking difference was receiving on average 4 spam calls per day. My American friends told me it was because my phone company was low-cost, but it was O(10) times more expensive (per unit data) than what I had back home, with about O(1) spam calls per year.

So I expect that it is totally possible to solve this problem without doing something too fancy, even if I don't know how it's solved where I am from.

3Viliam
OK, that is way too much. I don't understand how specifically that causes more spam calls. Does it imply that normally everyone would receive as many spam calls, but the more expensive companies are spending a lot of their budget to actively fight against the spammers? Neither do I, so I am filing it under "things that mysteriously don't work in USA despite working more or less okay in most developed countries". Someone should write a book about this whole set, because I am really curious about it, and I assume that Americans would be even more curious.
Answer by rotatingpaguro

Most people do not have the analytical clarity to be able to give an explanation of love isomorphic to their implementation of love; to that extent, they are "confused about love".

This though does not imply that their usage of the word "love" is amiss, the same way people are able to get through simple reasoning without learning logic, or walking without learning Physics.

So I'll assume that people are wielding "love" meaningfully, and try to infer what the word means.

It seems to indicate prolonged positive emotional involvement with an external entity. Oth... (read more)

I'm reminded of the recent review of How Language Began on ACX: the missionary linguist becomes an atheist because the local very weird language has declensions to indicate the source of what you are saying, and saying things about Jesus just doesn't click.

I still don't understand your "infinite limit" idea. If in your post I drop the following paragraph:

A way to think about the proposition A_p is as a kind of limit. When we have little evidence, each bit of evidence has a potentially big impact on our overall probability of a given proposition. But each incremental bit of evidence shifts our beliefs less and less. The proposition A_p can be thought of as a shorthand for an infinite collection of evidences where the collection leads to an overall probability of p given t

... (read more)

I happened to have the same doubt as you. A deeper analysis of the sacred texts shows how your interpretation of the Golden Rule is amiss. You say:

“Do unto others as you would have them do unto you.” (Matthew 7:12)

But the correct version is:

Therefore whatever you desire for men to do to you, you shall also do to them; for this is the law and the prophets.

The verse speaks specifically of men, not generically of others. So if you are straight, it does not compel you to sexual acts on women, while if you are gay, you shall try to hit on all the men to your he... (read more)

There are too many nonpolar bears in the US to keep up the lie.

I guess the point of the official party line is to avoid kids going and trying to scare polar bears.

5eukaryote
As opposed to other species of bear, which are safe for children to engage with?

I don't think this concept is useful.

What you are showing with the coin is a hierarchical model over multiple coin flips, and doesn't need new probability concepts. Let F_1, F_2, … be the flips. All you need in life is the distribution P(F_1, F_2, …). You can decide to restrict yourself to distributions of the form ∫ dp_coin P(F_1, F_2, … | p_coin) p(p_coin). In practice, you start out thinking about p_coin as a variable atop all the F_i in a graph, and then think in terms of P(F_i | p_coin) and p(p_coin) separately, because that... (read more)

2criticalpoints
Thanks for the feedback. This isn't emphasized by Jaynes (though I believe it's mentioned at the very end of the chapter), but the Ap distribution isn't new as a formal idea in probability theory. It's based on De Finetti's representation theorem.

The theorem concerns exchangeable sequences of random variables. A sequence of random variables {X_i} is exchangeable if the joint distribution of any finite subsequence is invariant under permutations. A sequence of coin flips is the canonical example. Note that exchangeability does not imply independence! If I have a perfectly biased coin where I don't know the bias, then all the random variables are perfectly dependent on each other (they all must obtain the same value).

De Finetti's representation theorem says that any exchangeable sequence of random variables can be represented as a mixture over independent and identically distributed sequences. Or in other words, the extent to which random variables in the sequence are dependent on each other is solely due to their mutual relationship to the latent variable (the hidden bias of the coin):

    P(X_1 = x_1, …, X_n = x_n) = ∫_0^1 θ^k (1 − θ)^(n−k) dF(θ),  where k = x_1 + … + x_n.

You are correct that all relevant information is contained in the joint distribution P(F_1, F_2, ...). And while I have no deep familiarity with Bayesian hierarchical modeling, I believe your claim that the decomposition ∫_0^1 dp_coin P(F, G | p_coin) p(p_coin) is standard in Bayesian modeling. But I think the point is that the Ap distribution is a useful conceptual tool when considering distributions governed by a time-invariant generating process. A lot of real-world processes don't fit that description, but many do fit that description.

Yes, this is correct. The part about "the probability of assigning a probability" and the part about interpreting the proposition Ap as a shorthand for an infinite collection of evidences are my own interpretations of what the Ap distribution "really" means. Specifically, the part about the "probabi
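The exchangeable-but-not-independent point can be checked with a toy calculation. The uniform-prior coin here is the standard textbook example, not anything specific to Jaynes; the exact values use the moments E[θ^k] = 1/(k+1) of Uniform(0,1):

```python
import random
from fractions import Fraction

def sample_flips(n: int, rng: random.Random) -> list[int]:
    """Draw a bias theta ~ Uniform(0,1) once, then flip that same coin n times.
    The flips share the latent bias, so they are exchangeable but dependent."""
    theta = rng.random()
    return [1 if rng.random() < theta else 0 for _ in range(n)]

# Exact values for this mixture coin:
P1 = Fraction(1, 2)    # P(X1 = 1) = E[theta] = 1/2
P11 = Fraction(1, 3)   # P(X1 = 1, X2 = 1) = E[theta^2] = 1/3, not 1/4
```

Since P(X1 = 1, X2 = 1) = 1/3 > 1/4 = P(X1 = 1)·P(X2 = 1), the flips are positively correlated, and a Monte Carlo frequency of two heads in a row converges to 1/3 rather than 1/4.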

A group of MR links led to a group of links that led to this list of Obvious Travel Advice. It seems like very good Obvious Travel Advice, and I endorse almost all points.

> A place that has staff trying to flag down customers walking past is almost certainly pursuing a get people in the door strategy.

To my great surprise, I found this to be false in Pisa (n_restaurant = 2).

Timothy Bates: The more things change, the more they stay the same: 1943 paper shows that a mechanical prediction of admissions greatly out predicts the decisions from administrators asked to add their subjective judgement :-(excellent talk from Nathan Kuncel !)

Nick Brown: I would bet that if you asked those subjective evaluators, they would say “We know the grades are the best predictor on average, but ‘sometimes’ they don’t tell the whole story”. People want to double-dip: Use the method most of the time, but add their own “special expertise”.

Timothy Bat

... (read more)
2npostavs
I believe it's already known that running the text through another (possibly smaller and cheaper) LLM to reword it can remove the watermarking. So for catching cheaters it's only a tiny bit stronger than searching for "as a large language model" in the text.

Unpolished first thoughts:

  1. Selection effect: people who go to a blog to read bc they like reading, not doing
  2. Concrete things are hard reads, math-heavy posts, doesn't feel ok to vote when you don't actually understand
  3. In general easier things have wider audience
  4. Making someone change their mind is more valuable to them than saying you did something?
  5. There are many small targets and few big ideas/frames, votes are distributed proportionally

Talking one-on-one over music is so difficult for me that I don't enjoy a place if there's music. I expect many people on or towards the spectrum could be similar.
