We're likely to switch to Claude 3 soon, but we're currently on GPT-3.5. We mostly expect it to be useful initially as a way to interface with existing knowledge, but we could make an alternate prompt that's more optimized for being a research assistant brainstorming new ideas, if that was wanted.
Would it be useful to be able to set your own system prompt for this? Or have a default one?
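For illustration, here's a minimal sketch of what a user-settable system prompt with a default fallback could look like, assuming the Anthropic Python SDK (the function, prompt text, and model choice here are hypothetical, not the project's actual code):

```python
# Hypothetical sketch: let a user override the default system prompt,
# falling back to a default one if they haven't set their own.
import anthropic

# Placeholder wording, not the project's real prompt.
DEFAULT_SYSTEM_PROMPT = (
    "You are an assistant that answers questions using the existing "
    "AI safety literature."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(question: str, system_prompt: str | None = None) -> str:
    """Send a question, using the user's system prompt if one is set."""
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        system=system_prompt or DEFAULT_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```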
Seems like a useful tool to have available, glad someone's working on it.
AI Safety Info's answer to "I want to help out AI Safety without making major life changes. What should I do?" is currently:
It's great that you want to help! Here are some ways you can learn more about AI safety and start contributing:
Learning more about AI alignment will provide you with good foundations for helping. You could start by absorbing content and thinking about challenges or possible solutions.
Consider these options:

- Joining the community is a great way to find friends who are interested and will help you stay motivated.
- Donating to organizations or individuals working on AI safety can be a great way to provide support.
- If you don't know where to start, consider signing up for a navigation call with AI Safety Quest to learn what resources are out there and to find social support.

If you're overwhelmed, you could look at our other article that offers more bite-sized suggestions.
Not all EA groups focus on AI safety; contact your local group to find out if it's a good match.
Congratulations on launching!
Added you to the map, and your Discord to the list of communities, which is now a sub-page of aisafety.com.
One question: interpretability might well lead to systems powerful enough to be an x-risk long before we have a strong enough understanding to direct a superintelligence, so publish-by-default seems risky. Are you considering adopting a non-publish-by-default policy? I know you talk about capabilities risks in general terms, but is this specific policy on the table?
Internal Double Crux, a CFAR technique.
I think it's not super broadly known, but many CFAR techniques fit into the category, so it's around to some extent.
And yeah, brains are pretty programmable.
Right, it can be way easier to learn it live. My guess is you're doing something quite IDC-flavoured, but mixed with some other models of mind which IDC doesn't make explicit. Specific mind algorithms are useful, but exploring based on them and finding what fits you is often best.
DMed you a link to an interface which lets you select the system prompt and model (including Claude). It's open to researchers to test, but I'm not posting it fully publicly as it's not very resistant to people who want to burn credits right now.
Other researchers, feel free to DM me if you'd like access.