All of DanielFilan's Comments + Replies

In this comment we list the names of some of our advisors.

Below is a list of some of the advisors we used for mentor selection. Notes:

  • Two advisors asked not to be named and do not appear here.
  • Advisors by and large focussed their efforts on areas they had some expertise in.
  • Advisors had to flag conflicts of interest, meaning that (for example) we did not take their ratings of themselves into account.

With that out of the way, here are some advisors who helped us for the Winter 2024-25 cohort:

  • Adam Gleave
  • Alex Lawsen
  • Buck Shlegeris
  • Ethan Perez
  • Lawrence Chan
  • Lee Sharkey
  • Lewis Hammond
  • Marius Hobbhahn
  • Michael Air
... (read more)

OK, my current evidence is that Jessica Taylor says on Twitter that she knew Ophelia, Ophelia was a Ziz fan, and Ophelia told Jessica that Ophelia was in contact with Somni and Emma prior to the landlord incident.

9AprilSR
Teresa Youngblut, the other person with Ophelia at the shootout, is also known to be a Ziz fan (and in November filed a marriage application to @Audere, also a Ziz fan.) You can see most of this if you look through Jessica's Twitter.

FWIW I feel like Ophelia's Zizian credentials haven't been that well-established.

Clicking on the word "Here" in the post works.

The link to the website is still broken.

2Alexander Gietelink Oldenziel
Does clicking on HERE work for you?

Looking back on this thread, I'm so confused why this comment was highly upvoted. Isn't it kind of obvious that it's reasonable to focus on Eliezer / MIRI because they are super influential (or at least were, with their influence continuing to wane over time)? I think this holds even if there are other problems with shard theory one could address, and even if TurnTrout's complaints about Eliezer/MIRI don't hold water.

Would being in a room with people who are vaping have the same benefits as the fog machine? Obviously it has downsides of smell and other additives, but still - I think this should predict that people maybe don't get airborne illnesses at vaping conventions.

2jefftk
I think that's right! Not a reason to take up vaping, though.

Typo:

We sequencing a typical sample to between one and two billion reads.

Should maybe be "We will be sequencing..."?

4jefftk
Fixed! It should have read "We are sequencing"
2Davidmanheim
I think 'we estimate... to be'
DanielFilanΩ220

Wait I'm a moron and the thing I checked was actually whether it was an exponential function, sorry.

DanielFilanΩ00-3

Votes cost quadratic points – a vote strength of "1" costs 1 point. A vote of strength 4 costs 10 points. A vote of strength 9 costs 45.

FYI this is not a quadratic function.

[This comment is no longer endorsed by its author]
4habryka
How are the triangle numbers not quadratic? $\frac{n(n+1)}{2} = \frac{n^2 + n}{2}$ Sure looks quadratic to me.
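
A quick way to check the exchange above, as a minimal sketch: the three costs quoted from the post (1, 10, and 45 points for strengths 1, 4, and 9) match the triangular numbers n(n+1)/2, and a sequence with constant second differences is a quadratic polynomial in n. The name `vote_cost` is hypothetical and is not taken from the site's code.

```python
def vote_cost(strength: int) -> int:
    """Triangular number n(n+1)/2, assumed here to be the cost of a vote of this strength."""
    return strength * (strength + 1) // 2


# The three (strength, cost) pairs quoted in the post.
quoted = {1: 1, 4: 10, 9: 45}
for strength, cost in quoted.items():
    assert vote_cost(strength) == cost

# A polynomial is quadratic exactly when its second differences are a
# non-zero constant; for triangular numbers that constant is 1.
costs = [vote_cost(n) for n in range(1, 8)]
first_diffs = [b - a for a, b in zip(costs, costs[1:])]
second_diffs = [b - a for a, b in zip(first_diffs, first_diffs[1:])]
print(second_diffs)
```

An exponential cost would instead show a constant ratio between successive terms, which this sequence does not have.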

Dojo Organizations What organizations are you aware of that are providing some kind of rationality dojo format (courses focused on improving the skill of rationality)?

Seems like the stuff after "Dojo Organizations" should be on a new line.

2Screwtape
. . . yep, it should be. I think I just fixed it, but I can't figure out how that got like that in the first place. Thanks!

About how often do you use LLMs like ChatGPT while active?

What does "while active" mean in this question?

3Screwtape
"Hourly" doesn't count while asleep. If you use it for work, weekends don't count against "Daily." Etc.

If one wants to investigate [the Alignment of Complex Systems research group] further, he has an AXRP podcast episode, which I haven’t listened to.

Note that if you want to investigate further but would rather read a transcript than watch a video, AXRP has you covered.

Yeah but a bunch of people might actually answer how their neighbours will vote, given that that's what the pollster asked - and if the question is phrased as the post assumes, that's going to be a massive issue.

So I guess 1.5% of Americans have worse judgment than I expected (by my lights, as someone who thinks that Trump is really bad). Those 1.5% were incredibly important for the outcome of the election and for the future of the country, but they are only 1.5% of the population.

Nitpick: they are 1.5% of the voting population, making them around 0.7% of the US population.
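
The arithmetic behind that nitpick can be sketched as follows. The vote and population totals are rough round-number assumptions (about 155 million ballots cast and about 340 million residents), not figures from the post, so treat the result as a ballpark.

```python
# Rough, assumed totals (not from the original comment).
votes_cast = 155e6        # approximate number of ballots cast
us_population = 340e6     # approximate US resident population

swing_share_of_voters = 0.015  # the 1.5% quoted in the post

# Convert a share of voters into a share of the whole population.
turnout_fraction = votes_cast / us_population
swing_share_of_population = swing_share_of_voters * turnout_fraction

print(f"{swing_share_of_population:.1%}")
```

Under these assumptions fewer than half of residents voted, which is why the share of the population is a little under half of 1.5%.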

If you ask people who they're voting for, 50% will say they're voting for Harris. But if you ask them who most of their neighbors are voting for, only 25% will say Harris and 75% will say Trump!

Note this issue could be fixed if you instead ask people who the neighbour immediately to the right of their house/apartment will vote for, which I think is compatible with what we know about this poll. That said, the critique of "do people actually know" stands.

khafra120

The story I read about why neighbor polling is supposed to correct for bias in specifically the last few presidential elections is that some people plan to vote for Trump, but are ashamed of this, and don't want to admit it to people who aren't verified Trump supporters. So if you ask them who they plan to vote for, they'll dissemble. But if you ask them who their neighbors are voting for, that gives them permission to share their true opinion non-attributively. 

she should have picked Josh Shapiro as her running mate

Note that this news story makes allegations that, if true, make it sound like the decision was partly Shapiro's:

Following Harris's interview with Pennsylvania Governor Josh Shapiro, there was a sense among Shapiro's team that the meeting did not go as well as it could have, sources familiar with the matter tell ABC News.

Later Sunday, after the interview, Shapiro placed a phone call to Harris' team, indicating he had reservations about leaving his job as governor, sources said.

Oh except: I did not necessarily mean to claim that any of the things I mentioned were missing from the alignment research scene, or that they were present.

DanielFilanΩ340

When I wrote that, I wasn't thinking so much about evals / model organisms as stuff like:

basically stuff along the lines of "when you put agents in X situation, they tend to do Y thing", rather than trying to understand latent causes / capabilities

Yeah, that seems right to me.

DanielFilanΩ17322

A theory of how alignment research should work

(cross-posted from danielfilan.com)

Epistemic status:

  • I listened to the Dwarkesh episode with Gwern and started attempting to think about life, the universe, and everything
  • less than an hour of thought has gone into this post
  • that said, it comes from a background of me thinking for a while about how the field of AI alignment should relate to agent foundations research

Maybe obvious to everyone but me, or totally wrong (this doesn't really grapple with the challenges of working in a domain where an intelligent ... (read more)

Chris_LeongΩ410-2

I agree that we probably want most theory to be towards the applied end these days due to short timelines. Empirical work needs theory in order to direct it, theory needs empirics in order to remain grounded.

6Chris_Leong
Thanks for writing this. I think it is a useful model. However, there is one thing I want to push back against: I agree with Apollo Research that evals isn't really a science yet. It mostly seems to be conducted according to vibes. Model internals could help with this, but things like building experience or auditing models using different schemes and comparing them could help make this more scientific. Similarly, a lot of work with Model Organisms of Alignment requires a lot of careful thought to get right.
5yams
I think the key missing piece you’re pointing at (making sure that our interpretability tools etc actually tell us something alignment-relevant) is one of the big things going on in model organisms of misalignment (iirc there’s a step that’s like ‘ok, but if we do interpretability/control/etc at the model organism does that help?’). Ideally this type of work, or something close to it, could become more common // provide ‘evals for our evals’ // expand in scope and application beyond deep deception. If that happened, it seems like it would fit the bill here. Does that seem true to you?

A way I'd phrase John's sibling comment, at least for the exact case: adding arrows to a DAG increases the set of probability distributions it can represent. This is because the fundamental rule of a Bayes net is that d-separation has to imply conditional independence - but you can have conditional independences in a distribution that aren't represented by a network. When you add arrows, you can remove instances of d-separation, but you can't add any (because nodes are d-separated when all paths between them satisfy some property, and (a) adding arrows can... (read more)
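
The smallest case of this claim can be checked by hand or in a few lines of code. The sketch below is illustrative only: it assumes two binary variables X and Y, and the function names are hypothetical. With no arrow, X and Y are d-separated, so the graph only represents joints that factorise as P(X)P(Y). With the arrow X -> Y, nothing is d-separated, and every joint factorises as P(X)P(Y|X) by the chain rule.

```python
from itertools import product

STATES = (0, 1)


def marginals(joint):
    """Marginals P(X) and P(Y) of a joint distribution over two binary variables."""
    px = {x: sum(joint[(x, y)] for y in STATES) for x in STATES}
    py = {y: sum(joint[(x, y)] for x in STATES) for y in STATES}
    return px, py


def representable_without_arrow(joint, tol=1e-9):
    """Empty graph: X and Y are d-separated, so the joint must equal P(X) * P(Y)."""
    px, py = marginals(joint)
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol
               for x, y in product(STATES, repeat=2))


def representable_with_arrow(joint, tol=1e-9):
    """Graph X -> Y: no d-separation, so any valid joint works (chain rule)."""
    non_negative = all(p >= 0 for p in joint.values())
    return non_negative and abs(sum(joint.values()) - 1.0) <= tol


# An independent joint and a perfectly correlated one.
independent = {(0, 0): 0.35, (0, 1): 0.35, (1, 0): 0.15, (1, 1): 0.15}
correlated = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

for name, joint in [("independent", independent), ("correlated", correlated)]:
    print(name, representable_without_arrow(joint), representable_with_arrow(joint))
```

The independent joint is representable by both graphs, while the correlated one needs the arrow: adding the arrow enlarged the set of representable distributions without excluding anything.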

I enjoyed reading Nicholas Carlini and Jeff Kaufman write about how they use them, if you're looking for inspiration.

4Adam Scholl
Thanks; it makes sense that use cases like these would benefit, I just rarely have similar ones when thinking or writing.

Another way of maintaining Sola Scriptura and Perspicuity in the face of Protestant disagreement about essential doctrines is the possibility that all of this is cleared up in the deuterocanonical books that Catholics believe are scripture but Protestants do not. That said, this will still rule out Protestantism, and it's not clear that the deuterocanon in fact clears everything up.

A failure of an argument against sola scriptura (cross-posted from Superstimulus)

Recently, Catholic apologist Joe Heschmeyer has produced a couple of videos arguing against the Protestant view of the Bible - specifically, the claims of Sola Scriptura and Perspicuity (capitalized because I'll want to refer to them as premises later). "Sola Scriptura" has been operationalized a few different ways, but one way that most Protestants would agree on is (taken from the Westminster confession):

The whole counsel of God, concerning all things necessary for [...] m

... (read more)

Oh I misread it as "eighty percent of the effort" oops.

You say "higher numbers for polyamorous relationships" which is contrary to "If you're polyamorous, but happen to have one partner, you would also put 1 for this question."

2Screwtape
Done!

If you've been waiting for an excuse to be done, this is probably the point where twenty percent of the effort has gotten eighty percent of the effect.

Should be "eighty percent of the benefit" or similar.

2Screwtape
I have no opinion on the difference and chatgpt agrees with you, so sure, changed to "eighty percent of the benefit."

I'd be interested in a Q about whether people voted in the last national election for their country (maybe with an option for "my country does not hold national elections") and if so how they voted (if you can find a schema that works for most countries, which I guess is hard).

2Screwtape
Yeah, this would either need options for many countries or one schema for many countries. Asking whether they voted or not in a national election is straightforward enough, and there's been past questions like that. "Voting Did you vote in your country's last major national election?"

In the highest degree question, one option is "Ph D.". This should be "PhD", no spaces, no periods.

2Screwtape
Should be fixed now. Thanks! . . . This is going to mess up comparisons to previous years, I can already tell.

Are you planning on having more children? Answer yes if you don't have children but want some, or if you do have children but want more.

Whether I want to have children and whether I plan to have children are different questions. There are lots of things I want but don't have plans to get, and one sometimes finds oneself with plans to achieve things that one doesn't actually want.


Sure, I'm just surprised it could work without me having Calibri installed.

4kave
They load it in as a web font (i.e. you load Calibri from their server when you load that search page). We don't do that on LessWrong

Could be a thing where people can opt into getting the vibes or the vibes and the definitions.

4Raemon
In optimal future Star Trek UI world, giving users control over explanation-style seems good. But for near future, my guess is it's not too hard to get a definition that is just pretty all-around good.

Also, my feedback is that some of the definitions seem kind of vague. Like, apparently an ultracontribution is "a mathematical object representing uncertainty over probability" - this tells me what it's supposed to be, but doesn't actually tell me what it is. The ones that actually show up in the text don't seem too vague, partially because they're not terms that are super precise.

4Raemon
This comment caused me to realize: even though generating LaTeX hoverovers involves more technical challenges, I might be able to tell it "if it's a term that gets defined in LaTeX, include an example equation in the hoverover" (or something like that), which might help for some of these.

How are you currently determining which words to highlight? You say "terms that readers might not know" but this varies a lot based on the reader (as you mention in the long-term vision section).


FWIW I think it's not uncommon for people to not use LLMs daily (e.g. I don't).

2Adam Scholl
I also use them rarely, fwiw. Maybe I'm missing some more productive use, but I've experimented a decent amount and have yet to find a way to make regular use even neutral (much less helpful) for my thinking or writing.
9habryka
Seems like a mistake! Agree it's not uncommon to use them less, though my guess (with like 60% confidence) is that the majority of authors on LW use them daily, or very close to daily.

FWIW I think the actual person with responsibility is the author if the author approves it, and you if the author doesn't.

I believe I'm seeing Gill Sans? But when I google "Calibri" I see text that looks like it's in Calibri, so that's confusing.

2kave
Yeah, that's a google Easter Egg. You can also try "Comic Sans" or "Trebuchet MS".
DanielFilanΩ220

Since people have reported not being able to see the tweet thread, I will reproduce it in this comment (with pictures replaced by my descriptions of them):

If developers had to prove to regulators that powerful AI systems are safe to deploy, what are the best arguments they could use?

Our new report tackles the (very big!) question of how to make a ‘safety case’ for AI.

[image of the start of the paper]

We define a safety case as a rationale developers provide to regulators to show that their AI systems are unlikely to cause a catastrophe.

The term ‘safety ca

... (read more)

Update: I have already gotten over it.

4RobertM
(We switched back to shipping Calibri above Gill Sans Nova pending a fix for the horrible rendering on Windows, so if Ubuntu has Calibri, it'll have reverted back to the previous font.)

It looks kinda small to me, someone who uses Firefox on Ubuntu.


A thing you are maybe missing is that the discussion groups are now in the past.

You should be sure to point out that many of the readings are dumb and wrong

The hope is that the scholars notice this on their own.

Week 3 title should maybe say “How could we safely train AIs…”? I think there are other training options if you don’t care about safety.

Lol nice catch.

We included a summary of Situational Awareness as an optional reading! I guess I thought the full thing was a bit too long to ask people to read. Thanks for the other recs!

to simplify, we ask that for every expression and set of arguments

Here and in the next dot point, should the inner heuristic estimate be conditioning on a larger set of arguments (perhaps chosen by an unknown method)? Otherwise it seems like you're just expressing some sort of self-knowledge.

3Eric Neyman
Yeah, that's right -- see this section for the full statements.

OP doesn't emphasize liability insurance enough but part of the hope is that you can mandate that companies be insured up to $X00 billion, which costs them less than $X00 billion assuming that they're not likely to be held liable for that much. Then the hope is the insurance company can say "please don't do extremely risky stuff or your premium goes up".

On the other hand, there's not a clear criteria for when we would pause again after, say, a six month pause in scaling.

Realized that I didn't respond to this - PauseAI's proposal is for a pause until safety can be guaranteed, rather than just for 6 months.

I believe AI pauses by governments would absolutely be more serious and longer, preventing overhangs from building up too much.

Are you saying that overhangs wouldn't build up too much under pauses because the government wouldn't let it happen, or that RSPs would have less overhang because they'd pause for less long so less overhang would build up? I can't quite tell.

2Noosphere89
That RSPs would have less overhang because they'd pause for less long so less overhang would build up.

I'm not saying there's no reason to think that RSPs are better or worse than pause, just that if overhang is a relevant consideration for pause, it's also a relevant consideration for RSPs.
