This is interesting work, and I appreciate you taking the time to compile and share it.
I think it will be much more difficult for a model to successfully blackmail someone than to successfully harass them. Humans are limited in their ability to harass a single target mostly by time and effort -- nonspecific death threats and vitriol require little to no knowledge of the target beyond the surface level. These models could churn out countless variations of this sort of attack relentlessly, which could damage someone's mental health and wellbeing as much as or more than comparable human attacks.
However, in the case of traditional blackmail, the key component is the fear that the attacker will publicize something not generally known, and acquiring such information is not a strong point of current LLMs. Blasting negative public information everywhere could still be harmful, especially to someone not inured to such attacks (e.g. a non-celebrity), but I view this as having a low ceiling of efficacy given current capabilities. LLMs scrape public knowledge. A malicious AI agent would first have to acquire information that is hidden, which means targeting the right person or people with the right threats or bribes to obtain it. Establishing those connections would be incredibly difficult as well as both time- and resource-intensive.
Alternatively, the LLM could trick the target directly into saying or doing something compromising. This second scenario is, in my view, much more dangerous and already possible with the current state of LLMs. With little adjustment, a refined LLM that emulates a "lifelike" AI romantic partner could be used by a bad actor to catfish someone into sending nude pictures or other compromising material. Spitballing here: these attacks could be shotgunned at many targets without the time investment human catfishers require, and the model could theoretically alert a human organizer when a conversation reaches a sensitive point to seal the deal, so to speak.
Given current capabilities, effective attacks like this are much closer on the horizon than the sort of blackmail Commander employs in this post. I would be curious to hear your thoughts on this and whether it is something we're seeing an uptick in at all.
Thank you for this thoughtful and extremely well-composed piece!
I have mostly been a lurker here on LessWrong, but as I have absorbed the discourse over time, I came to a conclusion similar to the one you both reach in the earlier sections of your post -- namely, that our oversimplification of risk is hurting our discourse. I think Yudkowsky's "everyone dies" maxim is memetically powerful but ultimately counterproductive for a community already focused on solving alignment. Exposure is something we all need to think about more critically, and I appreciate the tools you have shared from your field to help us do so. I hope we adopt some of this terminology in our discussions.
This varies pretty dramatically with how careful an individual player is (as well as whether or not autoexplore is used), but to provide a data point: I would rate myself about 6.5/10 on carefulness (rising sharply from 4/10 to 8-9/10 once I'm out of the early floors and sense that I'm well-positioned to go deep), and my first victory took about three and a half hours. However, I've gotten very close to victory in other runs in closer to 2-2.5 hours.
Most games will be shorter than that, many significantly so. There is a website that tracks statistics for people playing the web version of the game -- http://brogue.roguelikelike.com/#gameStatistics -- but total game length isn't among them. Still, you can get a sense of the difficulty and of the distribution of where and why runs end, which gives an indirect sense of length.
I see a lot of roguelites in the comments (many of which I will happily second, particularly Slay the Spire) but my vote and highest recommendation go to the traditional roguelike Brogue.
This game got me into the genre, so I have a bit of a nostalgia bias toward it, but it is heavily recommended in traditional roguelike communities and considered a staple.
Brogue is as traditional as it comes -- descend into the procedurally-generated dungeon, pick up the Amulet of Yendor, and escape with your singular and fragile life. In my view, it is a refinement and distillation of this formula and of traditional roguelike strategy mechanics into something endlessly replayable and perfectly streamlined.
I would say it meets both your first and second qualifications. There are milestones -- each floor of the dungeon, plus several that are community-defined -- and a run in two hours is feasible. However, it is incredibly difficult to win on your first try; I have never heard of anyone doing it, although I'm sure someone has. Among my IRL friends whom I've gotten into it, I am the only one to have beaten it, though two of the four have come very close multiple times. There are players in the community capable of winstreaking it, even while taking on optional objectives and descending deeper into the dungeon than necessary. Every seed, random though it is, is fundamentally winnable and fair. Unlike in many, many similar games, I have never once walked away from a death thinking "that was bullshit" -- there's something beautiful about that, to me. It feels like an elegant, almost chesslike game: purely my own strategic merit and improvement, run to run, against the computer.
Regarding strategy/resource management/hidden information/value of information:
Last but not least, a few technical details: The game is free. The community edition is still continuously developed and updated. It runs on Windows, Mac, and Linux, and the entire app is only a couple of megabytes.
If anyone tries it out, please let me know what you think and how you like it.
I am deeply curious who is funding this, considering that there will explicitly be no intermediate product. Only true believers with mind-boggling sums of money to throw around would invest in a company with no revenue source. Could it be Thiel? Who else is doing this in the AI space? I hope to see journalists explore the matter.