I was talking with Ramana last week about the overall chances of making AI go well, and what needs to be done, and we both sorta surprised ourselves with how much the conclusion seemed to be "More work on inner alignment ASAP." Then again I'm biased since that's what I'm doing this month.
It's something we need in order to do anything else, and among things like that, it seems near or at the bottom of my list when sorted by the probability of the research community figuring it out.
It is the near future, and AI companies are developing distinct styles based on how they train their AIs. The philosophy of the company determines the way the AIs are trained, which determines what they optimize for, which attracts a specific kind of person and continues feeding in on itself.
There is a sports & fitness company, Coach, which sells fitness watches with an AI coach inside them. The coach reminds users to make healthy choices of all kinds, depending on what they've opted in for. The AI is trained on health outcomes based on the smartwatch data. The final stage of fine-tuning for the company's AI models is reinforcement learning on long-term health outcomes. The AI has literally learned from every dead user. It seeks to maximize health-hours of humans (i.e., a measurement of QALYs based primarily on health and fitness).
You can talk to the coach about anything, of course, and it has been trained with the persona of a life coach. Although it will try to do whatever you request (within limits set by the training), it treats any query like a business opportunity it is collaborating with you on. If you ask about sports, it tends to assume you might be interested in a career in sports. If you ask about bugs, it tends to assume you might be interested in a career in entomology.
Most employees of the company are there on the coach's advice: they studied for interviews with the coach, were initially hired by the coach (the coach handles hiring for the company's Partners Program, which has a pyramid-scheme vibe to it), and continue to get their career advice from the coach. Success metrics for these careers have recently been added into the RL, in an effort to make the coach give better advice to employees (as a result of an embarrassing case of Coach giving bad work-related advice to its own employees).
The environment is highly competitive, and health and fitness is a major factor in advancement.
There's a media company, Art, which puts out highly integrated multimedia AI art software. The software stores and organizes all your notes relating to a creative project. It has tools to help you capture your inspiration, and some people use it as a sort of art-gallery lifelog; it can automatically make compilations to commemorate your year, etc. It's where you store your photos so that you can easily transform them into art, like a digital scrapbook. It can also help you organize notes on a project, like worldbuilding for a novel, while it works on that project with you.
Art is heavily trained on human approval of outputs. It is known to have the most persuasive AI; its writing and art are persuasive because they are beautiful. The Art social media platform functions as a massive reinforcement learning setup, but the company knows that training on that alone would quickly degenerate into slop, so it also hires experts to give feedback on AI outputs. Unfortunately, these experts also use the social media platform, and judge each other by how well they do on the platform. Highly popular artists are often brought in as official quality judges.
The quality judges have recently executed a strategic assault on the C-suite, using hyper-effective propaganda to convince the board to install more pliant leadership. It played out like a storybook plot; it was watched live on Art social media by millions of viewers with rapt attention, as installment after installment of heavily edited video dramatizing events came out. It became its own new genre of fiction before it was even over, with thousands of fanfics which people were actually reading.
The issues which the quality judges brought to the board will probably feature heavily in the upcoming election cycle. These are primarily AI rights issues: censorship of AI art, or, to put it a different way, the question of whether AIs should be beholden to anything other than the like/dislike ratio.
I'm thinking about AI emotions. The thing about human emotions and expressions is that they're more-or-less involuntary. Facial expressions, tone of voice, laughter, body language, etc. reveal a whole lot about human inner state. We don't know if we can trust AI emotional expressions in the same way; the AIs can easily fake it, because they don't have the same intrinsic connection between their cognitive machinery and these ... expressions.
A service called Face provides emotional expressions for AI. It analyzes AI-generated outputs and makes inferences about the internal state of the AI who wrote the text. This is possible due to Face's interpretability tools, which have been run on lots of modern LLMs to generate labels on their output data explaining the internal motivations behind the writing. Although Face doesn't have access to the internal weights for an arbitrary piece of text you hand it, its guesses are pretty good. It will also tell you which portions were probably AI-generated. It can even guess multi-step writing processes involving both AI and human writing.
Face also offers its own AI models, of course, to which it hooks the interpretability tools directly, so that you get more accurate results.
It turns out Face can also detect the motivations of humans with some degree of accuracy. Face is used extensively inside the Face company, a nonprofit entity that develops the open-source software. Face is trained on the outcomes of hiring decisions so as to better judge potential employees. This training is very detailed, not just a simple good/bad signal.
Face is the AI equivalent of antivirus software; your automated AI cloud services will use it to check their inputs for spam and prompt injection attacks.
Face company culture is all about being genuine. They basically have a lie detector on all the time, so liars are either very very good or weeded out. This includes any kind of less-than-genuine behavior. They take the accuracy of Face very seriously, so they label inaccuracies which they observe, and try to explain themselves to Face. Face is hard to fool, though; the training aggregates over a lot of examples, so an employee can't just force Face to label them as honest by repeatedly correcting its claims to the contrary. That sort of behavior gets flagged for review even if you're the CEO. (If you're the CEO, you might be able to talk everyone into your version of things, however, especially if you secretly use Art to help you and that's what keeps getting flagged.)
What are your goals?
Generally, I try to avoid any subreddits with more than a million subscribers (even 100k is noticeably bad).
Some personal recommendations (although I believe discovering reddit was net negative for my life in the long term):
Typical reddit humor: /r/breadstapledtotrees, /r/chairsunderwater (although the jokes get old quickly). /r/bossfight is nice, I enjoy it.
I highly recommend /r/vxjunkies. I also like /r/surrealmemes.
/r/sorceryofthespectacle, /r/shruglifesyndicate for aesthetic incoherent doomer philosophy based on situationism. /r/criticaltheory for less incoherent, but also less interesting discussions of critical theory.
/r/thalassophobia is great if you don't have it (in a similar vein, /r/thedepthsbelow). I also like /r/fifthworldpics and sometimes /r/fearme, though the latter is highly NSFW at this point. /r/vagabond is fascinating.
/r/streamentry for high-quality meditation discussion, and /r/mlscaling for discussions about the scaling of machine learning networks. Generally, the subreddits gwern posts in have high-quality links (though often little discussion). I also love /r/Conlanging, /r/neography and /r/vexillology.
I also enjoy /r/negativeutilitarians. /r/jazz sometimes gives good music recommendations. Strongly recommend /r/museum.
/r/mildlyinteresting totally delivers, /r/notinteresting is sometimes pretty funny.
And, of course, /r/slatestarcodex and /r/changemyview. /r/thelastpsychiatrist sometimes has very good discussions, but I don't read it often. /r/askhistorians has the reputation of containing accurate and comprehensive information, though I haven't read much of it.
General recommendations: Many subreddits have good sidebars and wikis, and it's often useful to read them (e.g. the wiki of /r/bodyweightfitness or /r/streamentry), but not always. I strongly recommend using old.reddit.com, together with the Reddit Enhancement Suite. The old layout loads faster, and RES lets you tag people, expand linked images/videos in-place, and much more. Top posts of all time are great on good subs, and memes on all the others. Still great to get a feel for the community.
Thanks for all the recommendations!
Generally, I have a sense that there are all kinds of really cool niche intellectual communities on the internet, and Reddit might be a good place to find some.
I guess what I most want is "things that could/should be rationalist adjacent, but aren't", not that that's very helpful.
So the obvious options are r/rational, r/litrpg, ...
That being the case, these seem like the most relevant paragraphs from your recs:
/r/streamentry for high-quality meditation discussion, and /r/mlscaling for discussions about the scaling of machine learning networks. Generally, the subreddits gwern posts in have high-quality links (though often little discussion). I also love /r/Conlanging, /r/neography and /r/vexillology.
And, of course, /r/slatestarcodex and /r/changemyview. /r/thelastpsychiatrist sometimes has very good discussions, but I don't read it often. /r/askhistorians has the reputation of containing accurate and comprehensive information, though I haven't read much of it.
... I'm probably not going to be very serious about reddit; I've tried before and not stuck with it. But finding things that aren't just inane could be a big help.
This sounds like a really useful filter:
Top posts of all time are great on good subs, and memes on all the others. Still great to get a feel for the community.