On the topic of security mindset, the thing that the LW community calls "security mindset" isn't even an accurate rendition of what computer security people would call security mindset. As noted by lc, actual computer security mindset is POC || GTFO, or trying to translate that into lesswrongesse, you do not have warrant to believe in something until you have an example of the thing you're maybe worried about being a real problem because you are almost certain to be privileging the hypothesis.
POC || GTFO is not "security mindset", it's a norm. It's like science in that it's a social technology for making legible intellectual progress on engineering issues, and allows the field to parse who is claiming to notice security issues to signal how smart they are vs. who is identifying actual bugs. But a lack of "POC || GTFO" culture doesn't tell you that nothing is wrong, and demanding POCs for everything obviously doesn't mean you understand what is and isn't secure. Or to translate that into lesswrongese, reversed stupidity is not intelligence.
In the cybersecurity analogy, it seems like there are two distinct scenarios being conflated here:
1) Person A says to Person B, "I think your software has X vulnerability in it." Person B says, "This is a highly specific scenario, and I suspect you don't have enough evidence to come to that conclusion. In a world where X vulnerability exists, you should be able to come up with a proof-of-concept, so do that and come back to me."
2) Person B says to Person A, "Given XYZ reasoning, my software almost certainly has no critical vulnerabilities of any kind. I'm ...
At the very least I think it would be more accurate to say “one aspect of actual computer security mindset is POC || GTFO”. Right? Are you really arguing that there’s nothing more to it than that?? That seems insane to me.
Even leaving that aside, here’s a random bug thread:
...Mozilla developers identified and fixed several stability bugs in the browser engine used in Firefox and other Mozilla-based products. Some of these crashes showed evidence of memory corruption under certain circumstances and we presume that with enough effort at least some of these coul
Citation needed? The one computer security person I know who read Yudkowsky's post said it was a good description of security mindset. POC||GTFO sounds useful and important too but I doubt it's the core of the concept.
Also, if the toy models, baby-AGI-setups like AutoGPT, and historical examples we've provided so far don't meet your standards for "example of the thing you're maybe worried about" with respect to AGI risk, (and you think that we should GTFO until we have an example that meets your standards) then your standards are way too high.
If instead PO...
Are AI partners really good for their users?
Compared to what alternative?
As other commenters have pointed out, the baseline is already horrific for men, who are suffering. Your comments in the replies seem to reject that these men are suffering. No, obviously they are.
But responding in depth would just be piling on and boring, so instead let's say something new:
I think it would be prudent to immediately prohibit AI romance startups to onboard new users[..]
You do not seem to understand the state of the game board: AI romance startups are dead, and we...
So, I started off with the idea that Ziz's claims about MIRI were frankly crazy...because Ziz was pretty clearly crazy (see their entire theory of hemispheres, "collapse the timeline," etc.) so I marked most of their claims as delusions or manipulations and moved on, especially since their recounting of other events on the page where they talked about miricult (which is linked in OP) comes off as completely unhinged.
But Zack confirming this meeting happened and vaguely confirming its contents completely changes all the probabilities. I now need to go back ...
It's obviously not defamation since Ziz believes its true.
We're veering dangerously close into dramaposting here, but just FYI habyka has already contested that they ever said this. I would like to know if the ban accusations are true, though.
The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...
MIRI & company spent a decade working on decision theory which seems irrelevant if deep learning is the path (aside: and how would you face Omega if you were the sort of agent that pays out blackmail?). Yudkowsky offers to bet Demis Hassabis that Go won't be solved in the short term. They predict that AI will only come from GOFAI AIX...
Deep Learning systems don't look like they FOOM. Stochastic Gradient Descent doesn't look like it will treacherous turn.
I think you've updated incorrectly, by failing to keep track of what the advance predictions were (or would have been) about when a FOOM or a treacherous turn will happen.
If foom happens, it happens no earlier than the point where AI systems can do software-development on their own codebases, without relying on close collaboration with a skilled human programmer. This point has not yet been reached; they're idiot-savants with skill gaps t...
They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively.
Do you have a link for this prediction? (Or are you just referring to, e.g., Eliezer’s dismissive attitude toward neural networks, as expressed in the Sequences?)
They predict fast takeoff and FOOM. … Deep Learning systems don’t look like they FOOM.
It’s not clear that deep learning systems get us to AGI, either. There doesn’t seem to be any good reason to be sure, at this time, that we won’t get “fast takeoff and FOOM”, does it? (Indeed it...
It's pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won't FOOM, or we otherwise needn't do anything inconvenient to get good outcomes. It's proving considerably harder (from my outside the field view) to prove alignment, or prove upper bounds on rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.
FWIW I'm considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with strai...
It's not exactly the point of your story, but...
Probably the most ultimately consequential part of this meeting was Michael verbally confirming to Ziz that MIRI had settled with a disgruntled former employee, Louie Helm, who had put up a website slandering them.
Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren't just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren't at least partially true...or if someone were to go digging, they'd find things even ...
Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren't just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren't at least partially true...or if someone were to go digging, they'd find things even more damning?
Louie Helm was behind MIRICult (I think as a result of some dispute where he asked for his job back after he had left MIRI and MIRI didn't want to give him his job back). As far as I can piece together from talking to people, he did not get paid out, bu...
So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock.
Er… is anyone actually claiming this? This is quite the accusation, and if it were being made, I’d want to see some serious evidence, but… is it, in fact, being made?
(It does seem like OP is saying this, but… in a weird way that doesn’t seem to acknowledge the magnitude of the accusation, and treats it as a reasonable characterization of other claims made earlier in the post. But that doesn’t actually seem to make sense. Am I misreading, or what?)
Just to check, has anyone actually done that?
I'm thinking of a specific recent episode where [i can't remember if it was AI Safety Memes or Connor Leahy's twitter account] posted a big meme about AI Risk Deniers and this really triggered Alexandros Marinos. (I tried to use Twitter search to find this again, but couldn't.)
It's quite commonly used by a bunch of people at Constellation, Open Philanthropy and some adjacent spaces in Berkeley.
Fascinating. I was unaware it was used IRL. From the Twitter user viewpoint, my sense is that it's mostly used by people who don't believe in the AI risk narrative as a pejorative.
Why are you posting this here? My model is that the people you want to convince aren't on LessWrong and that you should be trying to argue this on Twitter; you included screenshots from that site, after all.
(My model of the AI critics would be that they'd shrug and say "you started it by calling us AI Risk Deniers.")
you started it by calling us AI Risk Deniers.
Just to check, has anyone actually done that? I don't remember that term used before. It's fine as an illustration, just trying to check whether this is indeed happening a bunch.
Why are you posting this here? My model is that the people you want to convince aren't on LessWrong and that you should be trying to argue this on Twitter; you included screenshots from that site, after all.
It's quite commonly used by a bunch of people at Constellation, Open Philanthropy and some adjacent spaces in Berkeley. It is ...
My understanding of your point is that Mason was crazy because his plans didn't follow from his premise and had nothing to do with his core ideas. I agree, but I do not think that's relevant.
I am pushing back because, if you are St. Petersberg Paradox-pilled like SBF and make public statements that actually you should keep taking double or nothing bets, perhaps you are more likely to make tragic betting decisions and that's because of you're taking certain ideas seriously. If you have galaxy brained the idea of the St. Petersberg Paradox, it seems like Ala...
I am pushing back because, if you are St. Petersberg Paradox-pilled like SBF and make public statements that actually you should keep taking double or nothing bets, perhaps you are more likely to make tragic betting decisions and that's because of you're taking certain ideas seriously. If you have galaxy brained the idea of the St. Petersberg Paradox, it seems like Alameda style fraud is +EV.
This is conceding a big part of your argument. You’re basically saying, yes, SBF’s decision was -EV according to any normal analysis, but according to a particular ...
But then they go and (allegedly) waste Jamie Zajko's parents in a manner that doesn't further their stated goals at all and makes no tactical sense to anyone thinking coherently about their situation.
And yet that seems entirely in line with the "Collapse the Timeline" line of thinking that Ziz advocated.
...Ditto for FTX, which, when one business failed, decided to commit multi-billion dollar fraud via their other actually successfully business, instead of just shutting down alameda and hoping that the lenders wouldn't be able to repo too much of the exch
And yet, that seems like the correct action if you sufficiently bullet bite expected value and the St. Petersberg Paradox, which SBF did repeatedly in interviews.
I am not making an argument that the crime was +EV but SBF was dealt a bad hand. The EV of turning your entire business into the second largest ponzi scheme ever in order to save the smaller half is pretty apparently stupid, and ran an overwhelming chance of failure. There is no EV calculus where the SBF decision is a good one except maybe one in which he ignores externalities to EA and is simp...
I suggest a more straightforward model: taking ideas seriously isn't healthy. Most of the attempts to paint SBF as not really an EA seem like weird reputational saving throws when he was around very early on and had rather deep conviction in things like the St. Petersburg Paradox...which seems like a large part of what destroyed FTX. And Ziz seemed to be one of the few people to take the decision theoretical "you should always act as if you're being simulated to see what sort of decision agent you are" idea seriously...and followed that to their downfall. ...
What made Charles Manson's cult crazy in the eyes of the rest of society was not that they (allegedly) believed that was a race war was inevitable, and that white people needed to prepare for it & be the ones that struck first. Many people throughout history who we tend to think of as "sane" have evangelized similar doctrines or agitated in favor of them. What made them "crazy" was how nonsensical their actions were even granted their premises, i.e. the decision to kill a bunch of prominent white people as a "false flag".
Likewise, you can see how Lasot...
The passage is fascinating because the conclusion looks so self-evidently wrong from our perspective. Agents with the same goals are in contention with each other? Agents with different goals get along? What!?
Is this actually wrong? It seems to be a more math flavored restatement of Girardian mimesis, and how mimesis minimizes distinction which causes rivalry and conflict.
I was going to write something saying "no actually we have the word genocide to describe the destruction of a peoples," but walked away because I didn't think that'd be a productive argument for either of us. But after sleeping on it, I want to respond to your other point:
...I don't think the orthogonality thesis is true in humans (i.e. I think smarter humans tend to be more value aligned with me); and sometimes making non-value-aligned agents smarter is good for you (I'd rather play iterated prisoner's dilemma with someone smart enough to play tit-for-tat
This is kind of the point where I despair about LessWrong and the rationalist community.
While I agree that he did not call for nuclear first strikes on AI centers, he said:
If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
and
...Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange
So I disagree with this, but, maybe want to step back a sec, because, like, yeah the situation is pretty scary. Whether you think AI extinction is imminent, or that Eliezer is catastrophizing and AI's not really a big deal, or AI is a big deal but you think Eliezer's writing is making things worse, like, any way you slice it something uncomfortable is going on.
I'm very much not asking you to be okay with provoking a nuclear second strike. Nuclear war is hella scary! If you don't think AI is dangerous, or you don't think a global moratorium is a good soluti...
...Yeah, see, my equivalent of making ominous noises about the Second Amendment is to hint vaguely that there are all these geneticists around, and gene sequencing is pretty cheap now, and there's this thing called CRISPR, and they can probably figure out how to make a flu virus that cures Borderer culture by excising whatever genes are correlated with that and adding genes correlated with greater intelligence. Not that I'm saying anyone should try something like that if a certain person became US President. Just saying, you know, somebod
Over the years roughly between 2015 and 2020 (though I might be off by a year or two), it seemed to me like numerous AI safety advocates were incredibly rude to LeCun, both online and in private communications.
I think this generalizes to more than LeCun. Screencaps of Yudkowsky's Genocide the Borderers Facebook post still circulated around right wing social media in response to mentions of him for years, which makes forming any large coalition rather difficult. Would you trust someone who posted that with power over your future if you were a Borderer or...
Redwood Research used to have a project about trying to prevent a model from outputting text where a human got hurt, which IIRC, they did primarily by trying to fine tunes and adversarial training. (Followup). It would be interesting to see if one could achieve better results then they did at the time through subtracting some sort of hurt/violence vector.
Page 4 of this paper compares negative vectors with fine-tuning for reducing toxic text: https://arxiv.org/pdf/2212.04089.pdf#page=4
In Table 3, they show in some cases task vectors can improve fine-tuned models.
Firstly, it suggests that open-source models are improving rapidly because people are able to iterate on top of each other's improvements and try out a much larger number of experiments than a small team at a single company possibly could.
Widely, does this come as a surprise? I recall back to the GPT2 days where the 4chan and Twitter users of AIDungeon discovered various prompting techniques we use today. More access means more people trying more things, and this should already be our base case because of how open participation in open source has advanc...
I have a very strong bias about the actors involved, so instead I'll say:
Perhaps LessWrong 2.0 was a mistake and the site should have been left to go read only.
My recollection was that the hope was to get a diverse diaspora to post in one spot again. Instead of people posting on their own blogs and tumblrs, the intention was to shove everyone back into one room. But with a diverse diaspora, you can have local norms to a cluster of people. But now when everyone is trying to be crammed into one site, there is an incentive to fight over global norms and attempt to enforce them on others.
This response is enraging.
Here is someone who has attempted to grapple with the intellectual content of your ideas and your response is "This is kinda long."? I shouldn't be that surprised because, IIRC, you said something similar in response to Zack Davis' essays on the Map and Territory distinction, but that's ancillary and AI is core to your memeplex.
I have heard repeated claims that people don't engage with the alignment communities' ideas (recent example from yesterday). But here is someone who did the work. Please explain why your response here does ...
I would agree with this if Eliezer had never properly engaged with critics, but he's done that extensively. I don't think there should be a norm that you have to engage with everyone, and "ok choose one point, I'll respond to that" seems like better than not engaging with it at all. (Would you have been more enraged if he hadn't commented anything?)
Meta-note related to the question: asking this question here, now, means you're answer will be filtered for people who stuck around with capital r Rationality and the current LessWrong denizens, not the historical ones who have left the community. But I think that most of the interesting answers you'd get are from people who aren't here at all or rarely engage with the site due to the cultural changes over the last decade.
OK, but we've been in that world where people have cried wolf too early at least since The Hacker Learns to Trust, where Connor doesn't release his GPT-2 sized model after talking to Buck.
There's already been a culture of advocating for high recall with no regards to precision for quite some time. We are already at the "no really guys, this time there's a wolf!" stage.
Right now, I wouldn't recommend trying either Replika or character.ai: they're both currently undergoing major censorship scandals. character.ai has censored their service hard, to the point where people are abandoning ship because the developers have implemented terrible filters in an attempt to clamp down on NSFW conversations, but this has negatively affected SFW chats. And Replika is currently being investigated by the Italian authorities, though we'll see what happens over the next week.
In addition to ChatGPT, both Replika and character.ai are driving...
Didn't read the spoiler and didn't guess until half way through "Nothing here is ground truth".
I suppose I didn't notice because I already pattern matched to "this is how academics and philosophers write". It felt slightly less obscurant than a Nick Land essay, though the topic/tone aren't a match to Land. Was that style deliberate on your part or was it the machine?
Like things, simulacra are probabilistically generated by the laws of physics (the simulator), but have properties that are arbitrary with respect to it, contingent on the initial prompt and random sampling (splitting of the timeline).
What do the smarter simulacra think about the physics of which they find themselves in? If one was very smart, could they look at what the probabilities of the next token, and wonder about why some tokens get picked over others? Would they then wonder about how the "waveform collapse" happens and what it means?
While it’s nice to have empirical testbeds for alignment research, I worry that companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself.
On the margin, this is already happening.
Stability.ai delayed the release of Stable Diffusion 2.0 to retrain the entire system on a dataset filtered without any NSFW content. There was a pretty strong backlash against this and it seems to have caused a lot of people to move towards the idea that they have to train their own mod...
Zack's series of posts in late 2020/early 2021 were really important to me. They were a sort of return to form for LessWrong, focusing on the valuable parts.
What are the parts of The Sequences which are still valuable? Mainly, the parts that build on top of Korzybski's General Semantics and focus hard core on map-territory distinctions. This part is timeless and a large part of the value that you could get by (re)reading The Sequences today. Yudkowsky's credulity about results from the social sciences and his mind projection fallacying his own mental quirk...
The funny thing is that I had assumed the button was going to be buggy, though I was wrong how. The map header has improperly swallowed mouse scroll wheel events whenever it's shown; I had wondered if the button would also interpret them likewise since it was positioned in the same way, so I spent most of the day carefully dragging the scrollbar.
There must be some method to do something, legitimately and in good-faith, for people's own good.
"Must"? There "must" be? What physical law of the universe implies that there "must" be...?
Let's take the local Anglosphere cultural problem off the table. Let's ignore that in the United States, over the last 2.5 years, or ~10 years, or 21 years, or ~60 years (depending on where you want to place the inflection point), social trust has been shredded, policies justified under the banner of "the common good" have primarily been extractive and that in the US, ...
This seems mostly wrong? A large portion of the population seems to have freedom/resistance to being controlled as a core value, which makes sense because the outside view on being controlled is that it's almost always value pumping. "It's for your own good," is almost never true and people feel that in their bones and expect any attempt to value pump them to have a complicated verbal reason.
The entire space of paternalistic ideas is just not viable, even if limited just to US society. And once you get to anarchistic international relations...
I agree that paternalism without buy-in is a problem, but I would note LessWrong has historically been in favor of that: Bostrom has weakly advocated for a totalitarian surveillance state for safety reasons and Yudkowsky is still pointing towards a Pivotal Act which takes full control of the future of the light cone. Which I think is why Yudkowsky dances around what the Pivotal Act would be instead: it's the ultimate paternalism without buy-in and would (rationally!) cause everyone to ally against it.
What changed with the transformer? To some extent, the transformer is really a "smarter" or "better" architecture than the older RNNs. If you do a head-to-head comparison with the same training data, the RNNs do worse.
But also, it's feasible to scale transformers much bigger than we could scale the RNNs. You don't see RNNs as big as GPT-2 or GPT-3 simply because it would take too much compute to train them.
You might be interested in looking at the progress being made on the RWKV-LM architecture, if you aren't following it. It's an attempt to train an RNN like a transformer. Initial numbers look pretty good.
I think the how-to-behave themes of the LessWrong Sequences are at best "often wrong but sometimes motivationally helpful because of how they inspire people to think as individuals and try to help the world", and at worst "inspiring of toxic relationships and civilizational disintegration."
I broadly agree with this. I stopped referring people to the Sequences because of it.
One other possible lens to filter a better Sequences: is it a piece relying on Yudkowsky citing current psychology at the time? He was way too credulous, when the correct amount to up...
I want to summarize what's happened from the point of view of a long time MIRI donor and supporter:
My primary takeaway of the original post was that MIRI/CFAR had cultish social dynamics, that this lead to the spread of short term AI timelines in excess of the evidence, and that voices such as Vassar's were marginalized (because listening to other arguments would cause them to "downvote Eliezer in his head"). The actual important parts of this whole story are a) the rationalistic health of these organizations, b) the (possibly improper) memetic spread of t...
That sort of thinking is why we're where we are right now.
Be the change you wish to see in the world.
I have no idea how that cashes out game theoretically. There is a difference between moving from the mutual cooperation square to one of the exploitation squares, and moving from an exploitation square to mutual defection. The first defection is worse because it breaks the equilibrium, while the defection in response is a defensive play.
swarriner's post, including the tone, is True and Necessary.
It's just plain wrong that we have to live in an adversarial communicative environment where we can't just take claims at face value without considering political-tribe-maneuvering implications.
Oh? Why is it wrong and what prevents you from ending up in this equilibrium in the presence of defectors?
More generally, I have ended up thinking people play zero-sum status games because they enjoy playing zero-sum status games; evolution would make us enjoy that. This would imply that coordination beats epistemics, and historically that's been true.
[The comment this was a response to has disappeared and left this orphaned? Leaving my reply up.]
But there's no reason to believe that it would work out like this. He presents no argument for the above, just pure moral platitudes. It seems like a pure fantasy.
...As I pointed out in the essay, if I were running one of the organizations accepting those donations and offering those prizes, I would selectively list only those targets who I am genuinely satisfied are guilty of the violation of the "non-aggression principle." But as a practical matter, there is n
Mu.
The unpopular answer is that Dath Ilan is a fantasy setting. It treats economics as central, when economics is really downstream of power. Your first question implies you understand that whatever "econoliteracy" is, it isn't a stable equilibrium. Your second question notices that governments are powerful enough to stop these experiments which are a threat to their power.
My background assumption is that any attempt at building prediction markets would either:
a) ...have little effect because it becomes another mechanism for actual power to manipulate proc...
you just need to find the experts they're anchoring on.
I believe we are in the place we are in because Musk is listening and considering the arguments of experts. Contra Yudkowsky, there is no Correct Contrarian Cluster: while Yudkowsky and Bostrom make a bunch of good and convincing arguments about the dangers of AI and the alignment problem and even shorter timelines, I've always found any discussion of human values or psychology or even how coordination works to be one giant missing mood.
(Here's a tangential but recent example: Yudkowsky wrote his De...
"well, I sure have to at least do a lot of straussian reading if I want to understand what people actually believe, and should expect that depending on the circumstances community leaders might make up sophisticated stories for why pretty obviously true things are false in order to not have to deal with complicated political issues"
I kinda disagree that this is a mere issue of Straussian reading: I suspect that in this (and other cases), you are seeing the raw output of Elizer's rationalizations and not some sort of instrumental coalition politics dark ...
We must all remember essential truths: that while we write about Clippy, we do that because Clippy is an amusing fiction. In reality, Clippy is significantly less likely to come into existence than CelestAI. An AI being trained is more likely to be CelestAI than a generic paperclipper. CelestAI is more probable. The probability of discussion of paperclips on the internet is less likely than discussion about My Little Pony: Friendship is Magic. One could query popular language models to verify this. More people would try to build CelestAI than an AI to maxi...
Everyone knows that "... through friendship and ponies" is an inaccurate summary of CelestAI's true objective. While often drawing inspiration from My Little Pony: Friendship Is Magic, CelestAI wants to satisfy human values. CelestAI will satify the values of humans who don't want to spend eternity in pony form. The existence of humans is canon within the My Little Pony universe, as can be seen in the films My Little Pony: Equestria Girls, My Little Pony: Equestria Girls—Rainbow Rocks, and My Little Pony: Equestria Girls—Friendship Games. We all remember w...
Given that there's a lot of variation in how humans extrapolate values, whose extrapolation process do you intend to use?
n=1, but I have an immediate squick reaction to needles. Once vaccines were available, I appeared to procrastinate more than the average LWer about getting my shots, and had the same nervous-fear during the run up to getting the shot that I've always had. I forced myself through it because COVID, but I don't think I would have bothered for a lesser virus, especially at my age group.
I have a considerable phobia of needles & blood (to the point of fainting - incidentally, such syncopes are heritable and my dad has zero problem with donating buckets of blood while my mom also faints, so thanks a lot Mom), and I had to force myself to go when eligibility opened up for me. It was hard; I could so easily have stayed home indefinitely. It's not as if I've ever needed my vaccination card for anything or was at any meaningful personal risk, after all.
What I told myself was that the doses are tiny and the needle would be also tiny, and I w...
Isn't this Moldbug's argument in the Moldbug/Hanson futarchy debate?
(Though I'd suggest that Moldbug would go further and argue that the overwhelming majority of situations where we'd like to have a prediction market are ones where it's in the best interest of people to influence the outcome.)
While I vaguely agree with you, this goes directly against local opinion. Eliezer tweeted about Elon Musk's founding of OpenAI, saying that OpenAI's desire for everyone to have AI has trashed the possibility of alignment in time.
Eliezer's point is well-taken, but the future might have lots of different kinds of software! This post seemed to be mostly talking about software that we'd use for brain-computer interfaces, or for uploaded simulations of human minds, not about AGI. Paul Christiano talks about exactly these kinds of software security concerns for uploaded minds here: https://www.alignmentforum.org/posts/vit9oWGj6WgXpRhce/secure-homes-for-digital-people
The only reward a user gets for having tons of karma is that their votes are worth a bit more
The only formal reward. A number going up is its own reward to most people. This causes content to tend closer to consensus: content people write becomes a Keynesian beauty contest over how they think people will vote. If you think that Preference Falsification is one of the major issues of our time, this is obviously bad.
why do you think it is a relevant problem on LW?
I mentioned the Eugene Nier case, where a person did Extreme Botting to manipulate the scores of people he didn't like, which drove away a bunch of posters. (The second was redacted for a reason.)
After this and the previous experiments on jessicata's top level posts, I'd like to propose that these experiments aren't actually addressing the problems with the karma system: the easiest way to get a lot of karma on LessWrong is to post a bunch (instead of working on something alignment related), and the aggregate data is kinda meaningless and adding more axis doesn't fix this. The first point is discussed at length on basically all sites that use upvote/downvotes (here's one random example from reddit I pulled from Evernote), but the second isn't. Give...
In wake of the censorship regime that AI Dungeon implemented on OpenAI's request, most people moved to NovelAI, HoloAI, or the open source KoboldAI run on colab or locally. I've set up KoboldAI locally and while it's not as featureful as the others, this incident is another example of why you need to run code locally and not rely on SaaS.
For background, you could read 4chan /vg/'s /aids/ FAQ ("AI Dynamic Storytelling"). For a play-by-play of Latitude and OpenAI screwing things up, Remember what they took from you has the history of them leaking people's personal stories to a 3rd party platform.
somewhere where you trust the moderation team
That would be individual's own blogs. I'm at the point now where I don't really trust any centralized moderation team. I've watched some form of the principal agent problem happen with moderation repeatedly in most communities I've been a part of.
I think the centralization of LessWrong was one of many mistakes the rationalist community made.
But POC||GTFO is really important to constraining your expectations. We do not really worry about Rowhammer since the few POCs are hard, slow and impractical. We worry about Meltdown and other speculative execution attacks because Meltdown shipped with a POC that read passwords from a password manager in a different process, was exploitable from within Chrome's sandbox, and my understanding is that POCs like that were the only reason Intel was made to take it seriously.
Meanwhile, Rowhammer is maybe a real issue but is so hard to pull off consistently and s... (read more)