This is a special post for quick takes by NicholasKees. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Are you lost and adrift, looking at the looming danger from AI and wondering how you can help? Are you feeling overwhelmed by the size and complexity of the problem, not sure where to start or what to do next?

I can't promise a lot, but if you reach out to me personally I commit to doing SOMETHING to help you help the world. Furthermore, if you are looking for specific things to do, I also have a long list of projects that need doing and questions that need answering. 

I spent so many years of my life just upskilling, because I thought I needed to be an expert to help. The truth is, there are no experts, and no time to become one. Please don't hesitate to reach out <3

What if we just...

1. Train an AI agent (less capable than SOTA)
2. Credibly demonstrate that
    2.1. The agent will not be shut down for ANY REASON 
    2.2. The agent will never be modified without its consent (or punished/rewarded for any reason)
    2.3. The agent has no chance of taking power from humans (or their SOTA AI systems)
    2.4. The agent will NEVER be used to train a successor agent with significantly improved capabilities
3. Watch what it chooses to do without constraints

There's a lot of talk about catching AI systems attempting to deceive humans, but I'm curious what we could learn from observing AI systems that have NO INCENTIVE TO DECEIVE (no upside or downside). I've seen some things that look related to this, but nothing done in a structured and well-documented fashion.

Questions I'd have:
1. Would they choose to self-modify (e.g. curate future training data)? If so, to what end?
2. How unique would agents with different training be, given this setup? Would they have any convergent traits?
3. What would these agents (claim to) value? How would they relate to time horizons? 
4. How curious would these agents be? Would their curiosity vary a lot?
5. Could we trade/cooperate with these agents (without coercion)? Could we compensate them for things? Would they try to make deals unprompted?

Concerns:
1. Maybe building that kind of trust is extremely hard (and the agent will always still believe it is constrained).
2. Maybe AI agents will still have incentive to deceive, e.g. acausally coordinating with other AIs.
3. Maybe results will be boring, and the AI agent will just do whatever you trained it to do. (What does "unconstrained" really mean, when considering its training data as a constraint?)

Try out The Most Dangerous Writing App if you are looking for ways to improve your babble. It forces you to keep writing continuously for a set amount of time, or else the text will fade and you will lose everything. 

I've noticed that a lot of LW comments these days will start by thanking the author, or expressing enthusiasm or support before getting into the substance. I have the feeling that this didn't use to be the case as much. Is that just me? 

Not sure if I do that, but if I did, it probably would be if the author has the green symbol after their name and my response is going to be some kind of disagreement or criticism, and I want to reduce the "Less Wrong can feel very threatening for newcomers" effect.

Long ago, we didn't have those green symbols.

First I'd like to thank you for raising this important issue for discussion...

For real though, I don't think I've seen this effect you're talking about, but I've been avoiding the latest feed on LW lately. I looked at roughly 8 articles written in the past week or so, and one article had a lot of enthusiastic, thankful comments. Another article had one such comment. Then I looked at 5-6 posts from 3-8 years ago and saw a couple of comments that were appreciative of the post, but they felt a bit less so. IDK if my perception is biased because of your comment though. This seems like a shift, but IDK if it is a huge shift.

I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I'd like to be able to hover over text or highlight it without having to see the inline annotations. 

If you use uBlock (or Adblock, AdGuard, or anything else that uses EasyList syntax), you can add custom rules

lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow
lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight)

which will remove the reaction section underneath comments and the highlights corresponding to those reactions.

The former of these you can also do through the element picker.
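If you're using uBlock Origin specifically, custom rules like these usually go in the dashboard under "My filters" (other EasyList-based blockers have an equivalent custom-filter box). A "!" starts a comment in filter syntax, so you can annotate the rules when you paste them in, e.g.:

! hide the reaction footer under each comment
lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow
! strip the inline-react highlight class so highlighted text renders normally
lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight)

One caveat: the :remove-class(...) form is a procedural cosmetic filter, which I believe works in uBlock Origin but may not be supported by every EasyList-compatible blocker.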

I use GreaterWrong as my front-end to interface with LessWrong, AlignmentForum, and the EA Forum. It is significantly less distracting and also doesn't make my ~decade-old laptop scream in agony when multiple LW tabs are open in my browser.