All of Iknownothing's Comments + Replies

Yup, that's definitely something that can be argued by people on the Against side during the Debate Stage!
And they might come to the same conclusion!

1Remmelt
Sure. Keep in mind that as an organiser, you are setting the original framing.
2Jan Christian Refsgaard
Ahh, I know that's a first-year course for most math students, but only math students take that class :). I have never read an analysis book :) - I took the applied path and read 3 other Bayesian books before this one, so I thought the math in this book was simultaneously very tedious and basic :)

I'm not a grad physics student - I don't have a STEM degree, or the equivalent - but I found the book very readable nonetheless. It's by far my favourite textbook - it feels like it was actually written by someone sane, unlike most.

2Jan Christian Refsgaard
That's surprising to me. I think you can read the book two ways: 1) you skim the math, enjoy the philosophy, and take his word that the math says what he says it says; 2) you try to understand the math. If you take 2), then you need to at least know the chain rule of integration and what a Dirac delta function is, which seem like high-level math concepts to me. Full disclaimer: I am a biochemist by training, so I have also read it without the prerequisite formal training. I think you are right that if you ignore chapter 2 and a few sections about partition functions and such, then the math level for the other 80% is undergraduate-level math.

I'm really glad you wrote this! 
I think you address an important distinction there, but I think there might be a further one to be made- in that how we measure/tell if a model is aligned in the first place. 
There seems to be a growing voice which says that if a model's output seems to be the output we might expect from an aligned AI, then it's aligned. 
I think it's important to distinguish that from the idea that the model is aligned only if you actually have a strong idea of what its values are, how it's gotten them, etc. 

I'm really excited to see this!! 
I'd like it if this became embeddable so it could be used on ai-plans.com and on other sites!!
Goodness knows, I'd like to be able to get summaries and answers to obscure questions on some alignmentforum posts!

What do you think someone who knows about PDP knows that someone with a good knowledge of DL doesn't?
And why would it be useful?

I think folks in AI Safety tend to underestimate how powerful and useful liability and an established duty of care would be for this.

I think calling things a 'game' makes sense to lesswrongers, but just seems unserious to non-lesswrongers.

I don't think a lack of IQ is the reason we've been failing at making AI sensibly. Rather, it's a lack of good incentive design. 
Making AI recklessly is currently much more profitable than not doing so - which, imo, shows a flaw in the efforts that have gone towards making AI safe: not accepting that some people have very different mindsets/beliefs/core values, and not figuring out a structure/argument that would incentivize people across a broad range of mindsets.

Hasn't Eliezer Yudkowsky largely failed at solving alignment and at getting others to solve alignment? 
And wasn't he largely responsible for many people noticing that AGI is possible and potentially highly fruitful?
Why would a world where he's the median person be more likely to solve alignment?

2Zack_M_Davis
In a world where the median IQ is 143, the people at +3σ are at 188. They might succeed where the median fails.

Update: Rob Miles will also be judging some critiques! He'll be judging Communication!

Hi, I'm Kabir Kumar, the founder of AI-Plans.com, I'm happy to answer any questions you might have about the site or the Critique-a-Thon!

Hi, we've already made a site which does this!

Answer by Iknownothing

Probably much better for health overall to have a bowl of veg and fruit at your table for easy healthy snacking (carrots, cucumber, etc)

Most of my knowledge of dependencies and addictions comes from a brief study I did on neurotransmitters' roles in alcohol dependence/abuse while in school, for an EPQ, so I'm really not sure how much of this applies. Also, a lot of my study was finding that my assumptions were in the wrong direction (I didn't know about endorphins) - but I think a lot of the stuff on neurotransmitters and receptors holds across different areas. Take it with some salt, though. 

Quitting cold turkey rarely ever works for addictions/dependencies. The vast majority of t... (read more)

When I say media, I mean social media, movies, videos, books etc- any type of recording or something that you believe you're using as entertainment. 

I'm trying this myself. I've done single days before, sometimes 2 or 3 days, but failed to keep it consistent. I did find that when I did it, my work output was far higher and of greater quality, I had a much better sleep schedule, and I was generally in a much more enjoyable mood.
I also ended up spending more time with friends and family, meeting new people, trying interesting things, spending time outdoors, e... (read more)

2trevor
I predict (losing Bayes points if I'm wrong) that most people will have a similar experience, but I also predict that the best strategy is to quit cold turkey; nicotine does not run SGD to notice that user retention is at risk and autonomously take actions that were successful at mitigating risk in the past.  It would be hard for them to make their systems not optimize in weird ways due to Goodhart's law; furthermore, anyone running a successful social media platform would need to give the algorithms a wide leeway to experiment with user retention, since competitor platforms might be running systems that also autonomously form novel strategies.

A challenge for folks interested: spend 2 weeks without media-based entertainment. 

2trevor
I'd love it if people could try the basic precautions and see how harmless they are! Especially because they might be the minimum ask in order to avoid getting your brain and motivation/values hacked. I guess there would be bonus points for avoiding watching videos that millions of other people have watched.

"CESI’s Artificial Intelligence Standardization White Paper released in 2018 states that “AI systems that have a direct impact on the safety of humanity and the safety of life, and may constitute threats to humans” must be regulated and assessed, suggesting a broad threat perception (Section 4.5.7).42 In addition, a TC260 white paper released in 2019 on AI safety/security worries that “emergence” (涌现性) by AI algorithms can exacerbate the black box effect and “autonomy” can lead to algorithmic “self-improvement” (Section 3.2.1.3).43"
From https://concordia-consult... (read more)

I disagree with this paragraph today: "A lot of what AI does currently, that is visible to the general public seems like it could be replicated without AI"

I was talking about for a farmer. For a consumer, they can get their eggs/milk from such a farmer and fund/invest in such a farm, if they can. 
Or talk to a local farm about setting aside some chickens, pay for them to be given extra space, better treatment, etc.

I don't really know what you mean about the EA reducetarian stuff. 

Also, if you as an individual want to be healthy, not contribute to harming animal and have the time, space, money, willingness etc to raise some chickens, why not? 

Exercise in general is pretty great, yes. Especially if done outdoors, imo.

Could a solution to some of this be to raise some chickens for eggs, treat them nicely, give them space to roam, etc? 
Obviously the best would be to raise cows as well, treat them well, don't kill the male calves, etc- but that's much less of an option for most.

3orthonormal
Your framing makes it sound like individual raising of livestock, which is silly—specialization of expertise and labor is a very good thing, and "EA reducetarians find or start up a reasonably sized farm whose animal welfare standards seem to them to be net positive" seems to dominate "each EA reducetarian tries to personally raise chickens in a net positive way" (even for those who think both are bad, the second one seems simply worse at a fixed level of consumption).

This is great! Thank you for doing this! Might add some of these to ai-plans.com!

3Logan Zoellner
Cool site! It doesn't look like there's a button for "Add a strength" on e.g. https://ai-plans.com/post/f180b51d7e6a (although it appears possible to do so if I click the "show post" button). I also wish there was some way to understand the depth/breadth of plans.  E.g. is this a "full alignment plan" (examples would be The Plan or Provably Safe Systems) or is this a narrow technical research direction (e.g. this post)? Ideally, there would be some kind of prediction-market-style mechanism that assigned "dignity points" to plans that were most likely to contribute significantly to AI Alignment.

I think this kind of thing makes people feel like you're pushing a message, to which the automatic response is to push back.
What I've found works is to be agreeable, inviting, meet them at their own values, and present it as a hard problem to solve which isn't being competently tackled by this other dumb group (not us, we wouldn't do this). 
That kind of thing. Had a 100% success rate so far.
I'm simplifying my approach, since I'm not spending a lot of time on this, but if you imagine I'm not a dumbass and think about what kind of approach like this could work a lot, while not being dumb in that it doesn't actually address the problem, you'll probably get what I mean.

I'm generally disincentivized to post or put effort into a post from the system where someone can just heavily downvote my post, without even giving a reason.
 

A simple way to improve this system would be to require someone to comment/give a reason when heavily upvoting/heavily downvoting things. 
 

"In the ancestral environment, politics was a matter of life and death." - this is a pretty strong statement to make with no evidence to back it up.

5Rebecca
They’re talking about technical research orgs/labs, not ancillary orgs/projects

I think your ideas are some of the most promising I've seen - I'd love to see them pursued further, though I'm concerned about the air-gapping.

Hi Ruby! Thanks for the great feedback!! Sorry for the late reply, I've been working on the site!

So, we're not doing just criticisms anymore - we're ranking plans by Total Strength score minus Total Vulnerability score. Quite a few researchers have been posting their plans on the site!
Going to do a full rebuild soon, to make the site look nicer and be even faster to work on.
We're also holding regular critique-a-thons. The last one went very well! 
We had 40+ submissions and produced what I think is really great work!
We also made a Broad List of Vulnerabi... (read more)

This was really great. Thanks for making it.

I was curious why Trump was dropping some of the best takes!

Yeah, I think you're right- at least about the sequences. 

I think something more specific about attitudes would be more accurate and useful.

Thank you! I've sorted that now!!

Please let me know if you have any other feedback!!

From my very spotty info on evolution:
Humans got 'trained' to maximise reproduction, and in doing so maximised a bunch of other stuff along the way - including resource acquisition.

What I spoke about here is creating an environment where a more intelligent and faster agent is deliberately placed such that it can only survive by helping much dumber, slower agents - training it to act co-operatively. 

Writing this out, I may have just made an overcomplicated version of reinforcement learning.

That was something like what I was thinking. But I think this won't work, unless modified so much that it'd be completely different. More an idea to toss around.
 

I'll start over with something else. I do think something that might have value is designing an environment that induces empathy/values/whatever, rather than directly trying to design the AI to be what you want from scratch. 
Environment design can be very powerful in influencing humans, but that's in huge part because we (or at least, those of us who put thought in designing environments... (read more)

1mishka
Yes, I think we are looking at "seeds of feasible ideas" at this stage, not at "ready to go" ideas... I tried to look at what it would take for super-powerful AIs
* not to destroy the fabric of their environment together with themselves and everything
* to care about "interests, freedom, and well-being of all sentient beings"
That's not too easy, but might be doable in a fashion invariant with respect to recursive self-modification (and might be more feasible than more traditional approaches to alignment). Of course, the fact that we don't know what's sentient and what's not sentient does not help, to say the least ;-) But perhaps we and/or AIs and/or our collaborations with AIs might figure this out sooner rather than later... Anyway, I did scribble a short write-up on this direction of thinking a few months ago: Exploring non-anthropocentric aspects of AI existential safety

On the porch/outside/indoors thing - maybe that's not a great example, because having the numbers there seems to add nothing of value to me. Other than maybe clarifying to yourself how you feel about certain ideas/outcomes, but that's something that anyone with decent thinking does anyways.

2moridinamael
The Party Problem is a classic example taught as an introductory case in decision theory classes, that was the main reason why I chose it.

Sorry, I think I have an idea of what you're saying, but I'm not really sure. Do you mind elaborating? With a little less LessWrong lingo, please.

Absolutely! 

One of the reasons I've gone against the idea of tags, different ways of sorting, etc. (though they get brought up a lot) is that they could lead to the plans which are most attractive, understandable, or appealing at first glance getting the most attention.
It's very important that what a criticism's points measure is the validity of the criticism to the plan and not something else - though, I think if there are two criticisms making the same point and one gets a higher amount of points because it's more readable... (read more)

2Gurkenglas
Suppose an outcome pump picks a random property, checks if papers with it Goodhart your points, and time-loops until it finds one. Do you think it would eventually find one? Unfortunately, optimization tries all properties in parallel, without even an outcome pump. Treat hardness proofs (perpetual motion, NP, ...) as neon tubes on the box to think outside of. Find any difference between the proven-hard problem and yours (usually exists!), then imagine leads that wouldn't help on the proven-hard problem, leads you don't get better at ruling out by knowing the existing proof. To not fall to the dire kind of "adversary" that moves after you, don't calculate a number.

Thank you, I think there's an error in my phrasing. 
I should have said: 

Currently, it takes a very long time to get an idea of who is doing what in the field of AI Alignment and how good each plan is, what the problems are, etc.

Thank you very much for this. 
I agree, it does seem like this way, people will end up getting a bunch of karma even for bad criticisms. Which would defeat the whole point of the points system.

I'm not sure I fully understand "So I would rather make sure that the bottom half of criticism gets an increasing potential for negative karma impact, by applying a weight on the upvote points starting from 1 for the median criticism, and progressing towards 0 for the worst criticism. (goodness can be measured as unweighted votes divided by number of votes.)"

I th... (read more)

2Zoltan Foris
Let me explain this suggestion of mine: "So I would rather make sure that the bottom half of criticism gets an increasing potential for negative karma impact, by applying a weight on the upvote points starting from 1 for the median criticism, and progressing towards 0 for the worst criticism. (goodness can be measured as unweighted votes divided by number of votes.)" I'll explain with an example. Say 800 criticisms arrived in January 2024 in total, and all have their upvote/downvote-based points (let us say as of 15 Feb); let me call these "raw points". We put them in order of increasing raw points. Let the worst be -5, the 100th 5, the 400th (the middle one) 25, and the top one 110. Now a multiplier "m" is calculated for the bottom 400 criticisms: it will be 1-(400-x)/400, where x is the rank of the criticism, so x=1 for the worst one, x=100 for the 100th one. Now, for example, the worst criticism had raw points of -5, calculated as upvote points minus downvote points (raw = up - down); let us assume total upvote points of 10 and total downvote points of 15, so -5 = 10-15. We now apply the multiplier: final points = m*up - down. In this example, final points = (1-399/400)*10 - 15 = -14.975, so approximately -15, because a heavy multiplier has decreased the value of the upvotes.
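The weighting scheme described above can be sketched in a few lines. This is just an illustrative sketch of the proposal, not anything implemented on the site; the function name and parameters are my own labels for the quantities in the example (rank x within the bottom half, and the size of that bottom half):

```python
def weighted_points(raw_up, raw_down, rank, n_bottom):
    """Proposed downweighting for a criticism in the bottom half by raw points.

    rank: position in the bottom half, 1 = worst criticism, n_bottom = median.
    The multiplier m grows linearly from ~0 (worst) to 1 (median), so only the
    upvote points are discounted; downvotes count in full.
    """
    m = 1 - (n_bottom - rank) / n_bottom
    return m * raw_up - raw_down

# Worked example from the comment: the worst of 400 bottom-half criticisms,
# with 10 upvote points and 15 downvote points.
print(weighted_points(10, 15, rank=1, n_bottom=400))  # ≈ -14.975, i.e. about -15
```

At the median (rank = n_bottom) the multiplier is exactly 1, so the final score reduces to the ordinary raw score, which matches the intent that the weighting only penalizes the bottom half.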

Not just that - it's because the field isn't organized at all. 

2TAG
Alignment -- getting the utility function exactly right -- and Control are the two main proposals for AI safety. Whilst LeCun's proposal isn't alignment, it is control.

I'm really interested about what you mean here!

1worse
Idk the public access of some of these things, like with nonlinear's recent round, but seeing a lot of apps there and organized by category, reminded me of this post a little bit. edit - in terms of seeing what people are trying to do in the space. Though I imagine this does not capture the biggest players that do have funding.

This plan originated from the idea of trying to have a hackathon to disprove alignment plans. I'm still very interested in that!

I don't think you meant for it to be, but this, like a lot of EA stuff, reads like it was written by a psychopath slightly obsessed with helping people. 
Though, to be fair, a lot of EA stuff seems more like an obsession with 'making the world a better place' than helping people, so this is actually less disturbing than a lot of EA stuff.

Edit: which is probably part of why so many people are turned off by EA stuff.

5Mitchell_Porter
Seems the same as a thousand other reports written by people at the intersection of volunteer work and organized charity, trying to ameliorate poverty, domestic violence, you name it. I really don't see what's "disturbing" about it (let alone "psychopathic"!). 

Thank you, I will look at these.
