LESSWRONG
LW

Comment Permalink

Thanks for the detailed response. A bit of nitpicking (from someone who doesn't really know what they're talking about):

However, the vast majority of these mistakes would probably buff out or result in paper-clipping.

I'm slightly confused by this one. If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be *no* human utility (i.e. paperclips). But most attempts at an aligned AI would have a minimum at "I have no mouth, and I must scream". So any sign-flipping error would be expected to land there.

If humans are making changes to the critical software/hardware of an AGI (And we'll assume you figured out how to let the AGI allow you to do this in a way that has no negative side effects), *while that AGI is already running*, something bizarre and beyond my abilities of prediction is already happening.

In the example, the AGI was using online machine learning, which, as I understand it, would probably require the system to be hooked up to a database that humans have access to in order for it to learn properly. And I'm unsure as to how easy it'd be for things like checksums to pick up an issue like this (a boolean flag getting flipped) in a database.

Perhaps there'll be a reward function/model intentionally designed to disvalue some arbitrary "surrogate" thing in an attempt to separate it from hyperexistential risk. So "pessimizing the target metric" would look more like paperclipping than torture. But I'm unsure as to (1) whether the AGI's developers would actually bother to implement it, and (2) whether it'd actually work in this sort of scenario.

Also worth noting is that an AGI based on reward modelling is going to have to be linked to another neural network, which is going to have constant input from humans. If that reward model isn't designed to be separated in design space from AM, someone could screw up with the model somehow. If we were to, say, have U = V + W (where V is the reward given by the reward model and W is some arbitrary thing that the AGI disvalues, as is the case in Eliezer's Arbital post that I linked,) a sign flip-type error in V (rather than a sign flip in U) would lead to a hyperexistential catastrophe.

It will not be possible to flip the sign of the utility function or the direction of the updates to the reward model, even if several of the researchers on the project are actively trying to sabotage the effort and cause a hyperexistential disaster.

I think this is somewhat likely to be the case, but I'm not sure that I'm confident enough about it. Flipping the direction of updates to the reward model seems harder to prevent than a bit flip in a utility function, which could be prevent through error-correcting code memory (as you mentioned earlier.)

Despite my confusions, your response has definitely decreased my credence in this sort of thing from happening.

See in context

13 Open & Welcome Thread - August 2020

by habryka

6th Aug 2020

1 min read

101

13

If it’s worth saying, but not worth its own post, here's a place to put it. (You can also make a shortform post)

And, if you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the new Concepts section.

The Open Thread tag is here.

Open Threads

Personal Blog

13

Mentioned in

20How easily can we separate a friendly AI in design space from one which would bring about a hyperexistential catastrophe?

Open & Welcome Thread - August 2020

New Comment

101 comments, sorted by

top scoring

Click to highlight new comments since: Today at 4:44 AM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

[-]Benjamin Kost11mo260

Hello all, and thank you to everyone who helps provide this space. I am glad to have discovered LW. My name is Benjamin. I am a philosopher and self guided learner. I just discovered LW a short while ago and I am reading through the sequences. After many years of attempting to have productive conversations to solve problems and arrive at the truth via social media groups (which is akin to bludgeoning one’s head against the wall repeatedly), I gave up. I was recently recommended to join LW by Claude AI, and it seems like a great recommendation so far.

One of the things that I find discouraging about modern times is the amount of outright deception that is tolerated. Whether it is politics, business, institutions of science, interpersonal relationships, or even lying to oneself, deception seems to be king in our modern environment. I am a systemic thinker, so this seems like a terrible system to me. The truth is a better option for everyone but not as rewarding as deception on an individual actor level, and thus we have entered a prisoner’s dilemma situation where most actors are defectors.

I am interested in answering two questions related to this situation:

How might we get to a hi

... (read more)

7habryka11mo

Welcome! Glad to have you around and it's kind of hilarious to see that you've been recommended to show up here by Claude. I share a lot of your interests and am interested in seeing your writing on it!

1Double11mo

I'm curious what you asked Claude that got you a recommendation to LessWrong. No need to share if it is personal. I love your attitude to debate. "the loser of a debate is the real winner because they learned something." I need to lose some debates.

2Benjamin Kost11mo

I think debating is the best way to learn. I’ve always been somewhat cynical and skeptical and a critical thinker my whole life, so I question most things. Debating works better for me as a learning tool because I can’t be simply fed information like is done in public schools. I have to try to poke holes in it and then be convinced that it still holds water. As for what I asked Claude, he actually recommended LW to me about 3 different times on 3 different occasions. I collaborate with him to refine my ideas/plans and he recommended finding human collaborators to help execute them here, Astral Codex, and effective altruism groups. The first time he described LW as a “rationalist” group and I mistook what that meant due to my philosophy background and was thinking “you mean like fans of Decarte and Kant?” and wasn’t very impressed (I consider myself more epistemically empiricist than rationalist). The second time I actually looked into it since he mentioned it more than once and realized that the word “rationalist” was being used differently than I thought. The third time I decided to pull the trigger and started reading the sequences and then made the intro post. So far, I haven’t read anything terribly new, but it’s definitely right up my alley. I’d already gotten to that type of methodological thinking by reading authors such as Daniel Kahneman, Karl Popper, and Nassim Taleb, or I would be enthralled, but I am really glad there is an internet community of people who think like that. That said, I know AI safety is the hot topic here right now, and I am tech savvy but far from an AI expert. I find AI to already be incredibly useful in its current form (mostly LLMs). They are quite imperfect, but they still do a ton of sloppy thinking in a very short time that I can quickly clean up and make useful for my purposes so long as I prompt them correctly. However, I think I have a lot to contribute to AI safety as well because much of the AI savior/disaster razor is hin

2Double11mo

Welcome! I hope you have Claude a thumbs up for the good response. Everyone agrees with you that yeah, the “Rationalist” name is bad for many reasons including that it gives philosophers the wrong idea. If you could work your social science magic to change the name of an entire community, we’d be interested in hearing your plan! I’d be interested in reading your plan to redesign the social system of the United States! I’ve subscribed to be notified to your posts, so I’ll hopefully see it.

2Benjamin Kost11mo

I forgot to mention, my app would actually present a solution for the word “rationalist” being used to describe the community. One of the features that I plan to implement for it is what I call the jargon index filter which will Automatically replace jargon words and ambiguous words with more descriptive words that anybody can understand. I’ve found LLMs to be very useful for creating the jargon index, but it is a slow process that will take a lot of labor hours using an LLM such as Claude to make as many recommendations for easy to understand replacement words or short phrases that even a fourth grader could understand for complex or ambiguous words and picking the best one from the big list. I am planning to make the jargon index a wiki project and then the filter will use the index coupled with AI to analyze the paragraphs to find contextual meanings (for homographs) to replace every ambiguous or technical word in a given text with unique descriptive words or phrases that anyone with a 4th grade or higher level of education/cognitive ability could understand. To make genuine democracy work in practice, the general public will need to be smarter which is a pedagogical issue that I believe I have good solutions for.

1Double11mo

A software that easily lets you see “what does this word mean in context” would be great! I often find that when I force click a word to see it’s definition, the first result is often some irrelevant movie or song, and when there are multiple definitions it can take a second to figure out which one is right. Combine this with software that highlights words that are being used in an odd way (like “Rationalist”) and communication over text can be made much smoother. I don’t think this would be as great against “jargon” unless you mean intentional jargon that is deployed to confuse the reader (eg “subprime mortgages” which is “risky likely to fail house loans”). I’m under the impression that jargon is important for communication among people who have understanding of the topic. “Matrix multiplication is a linear operation” is jargon-heavy and explaining what it means to a fourth grader would take probably more than 30 minutes. Agree that more educated voters would be great. I wish that voters understood Pigouvian taxes. Explaining them takes 10 min according to YouTube. I’d love a solution to teach voters about it.

1Benjamin Kost11mo

I don’t expect the jargon filter to work perfectly to explain any concept, but I do expect it to make concepts easier to understand because learning new vocabulary is a somewhat cognitively demanding process, and especially so for some people. Memory works differently for different people, and different people have different confidence levels in their vocabulary skills, so the jargon heavy sentence you used above, while perfectly fine for communicating with people such as you and I, wouldn’t he good for getting someone less technically inclined to read about math or remember what that sentence means. It’s great that you gave me an example to work with though. I just went to Claude and used the process that I am talking about to give you an example and came back with this: “Multiplying grids of numbers is a step-by-step process” Can you see how that would be easier to understand at first glance if you were completely unfamiliar with linear algebra? It also doesn’t require memorizing new vocabulary. The way you put it requires an unfamiliar person to both learn a new concept and memorize new vocabulary at the same time. The way I put it doesn’t perfectly explain it to an unfamiliar person, but it gives them a rough idea that is easy to understand while not requiring that they take in any new vocabulary. Because it is less cognitively demanding, it will feel less daunting to the person you are trying to teach so as not to discourage them from trying to learn linear algebra. I believe you also hit on something important when you mentioned jargon intended to confuse the reader. I suspect that is why a lot of jargon exists in the first place. Take binomial nomenclature for example. Why are biologists naming things using long words in a dead language? That only serves the purpose of making the information more daunting and less accessible to people with poor vocabulary memorization skills. That seems like elitism to me. It makes people who have capable vocabulary memori

1Double11mo

The translation sentence about matrices does not have the same meaning as mine. Yes, matrices are “grids of numbers”, and yes there’s an algorithm (step by step process) for matrix multiplication, but that isn’t what linearity means. An operation A is linear iff A(x+y) = A(x) + A(y) https://orb.binghamton.edu/cgi/viewcontent.cgi?filename=4&article=1002&context=electrical_fac&type=additional#:~:text=Linear operators are functions on,into an entirely different vector. I asked a doctor friend why doctors use Latin. “To sound smarter than we are. And tradition.” So our words for medicine (and probably similar for biology) are in a local optima, but not a global optima. Tradition is a powerful force, and getting hospitals to change will be difficult. Software to help people read about medicine and other needlessly jargon-filled fields is a great idea. (Putting evolutionary taxonomy information in the name of a creature is a cool idea though, so binomial nomenclature has something going for it.) You don’t have to dumb down your ideas on LessWrong, but remember that communication is a difficult task that relies on effort from both parties (especially the author). You’ve been good so far. It’s just my job as your debate partner to ask many questions.

1Benjamin Kost11mo

I’m glad you like the idea. That was a good catch that I didn’t capture of the true meaning of linear very well. I was a little rushed before. That said, your definition isn’t correct either. Though it is true that linear functions have that property, that is merely the additivity property of a linear function which is just the distributive property of multiplication used on a polynomial. I also didn’t see where the linked text you provided even defines linearity or contains the additivity rule you listed. That was a linear algebra textbook chapter though, and I am still glad you showed me it because it reminded me of how I was great at math in college, but not at all because of the textbooks (which were very expensive!). I have rather good reading comprehension and college math textbooks might as well be written in another language. I learned the math 100% from the lectures and used the text books only to do the problems in the back and got an A in all 3 Calculus classes I took. I am pretty sure I could write a much easier to understand math textbook and I know it is possible because the software that teaches math isn’t nearly as confusingly worded as the textbooks. This is how I would keep it as simple as possible and capture more of the original meaning: Multiplying grids of numbers is a straight-line property process. That said, point taken regarding math jargon being very challenging to descriptively reword as I suspect it will get a lot harder as the concepts get more complex. The point in my process isn’t to perfectly define the word but to use a descriptive enough word replacement that one’s brain more easily grabs onto it than it does with, for example, Latin terms of absurd length for anatomy like “serratus posterior inferior” which is a muscle I had trouble with recently. Just off the top of my head, I would just call that the lower ribcage stabilizer instead. That gives one a much better idea of where it is and what it does and would be much easier to

1Double11mo

The "Definition of a Linear Operator" is at the top of page 2 of the linked text. My definition was missing that in order to be linear, A(cx) = cA(x). I mistakenly thought that this property was provable from the property I gave. Apparently it isn't because of "Hamel bases and the axiom of choice" (ChatGPT tried explaining.) "straight-line property process" is not a helpful description of linearity for beginners or for professionals. "Linearity" is exactly when A(cx) = cA(x) and A(x+y) = A(x) + A(y). Describing that in words would be cumbersome. Defining it every time you see it is also cumbersome. When people come across "legitimate jargon", what they do (and need to do) is to learn a term when they need it to understand what they are reading and look up the definition if they forget. I fully support experimental schemes to remove "illegitimate jargon" like medical latin, biology latin, and politic speak. Other jargon, like that in math and chemistry are necessary for communication.

1Benjamin Kost11mo

I don’t particularly agree about the math jargon. On the one hand, it might be annoying for people already familiar with the jargon to change the wording they use, but on the other hand, descriptive wording is easier to remember for people who are unfamiliar with a term and using an index to automatically replace the term on demand doesn’t necessarily affect anyone already familiar with the jargon. Perhaps this needs to be studied more, but this seems obvious to me. If “linearity” is exactly when A(cx) = cA(x) and A(x+y) = A(x) + A(y), there is no reason “straight-line property” can’t also mean exactly that, but straight-line property is easier to remember because it’s more descriptive of the concept of linearity. Also, I can see how the shorthand is useful, but you could just say “linearity is when a function has both the properties of homogeneity and additivity” and that would seem less daunting to many new learners to whom that shorthand reads like ancient Greek. I could make more descriptive replacement words for those concepts as well and it might make it even easier to understand the concept of linearity.

1Double11mo

The math symbols are far better at explaining linearity that “homogeneity and additivity” because in order to understand those words you need to either bring in the math symbols or say cumbersome sentences. “Straight line property” is just new jargon. “Linear” is already clearly an adjective, and “linearity” is that adjective turned into a noun. If you can’t understand the symbols, you can’t understand the concept (unless you learned a different set of symbols, but there’s no need for that). Some math notation is bad, and I support changing it. For example, f = O(g) is the notation I see most often for Big-O notation. This is awful because it uses ‘=‘ for something other than equality! Better would be f \in O(g) with O(g) being the set of functions that grow slower or as fast as g.

1Benjamin Kost11mo

I’m trying hard to understand your points here. I am not against mathematical notation as that would be crazy. I am against using it to explain what something is the first time when there is an easier way. Bear with me because I am not a math major, but I am pretty sure “a linear equation is an equation that draws a straight line when you graph it” is a good enough explanation for someone to understand the basic concept. To me, it seems like “ A(cx) = cA(x) and A(x+y) = A(x) + A(y)” is only the technical definition because they are the only two properties that every linear equation imaginable absolutely has to have in common for certain. However, suppose I didn’t know that, and I wanted to be able to tell if an equation is linear. Easy. Just graph it, and if the graph makes a single straight line, it’s a linear equation. Suppose I didn’t want to or couldn’t graph it. I can still tell whether it is linear or not by whether or not the slope is constant using y=mx/b, or I could just simply look to see if the variables are all to the power of one and only multiplied by scalar constants. Either of those things can help me identify a linear equation, so why is it that we are stuck with A(cx) = cA(x) and A(x+y) = A(x) + A(y) as the definition? Give me some linear equations and I can solve them and graph them all day without knowing that. I know that for a fact because though I am certain that definition was in some of my math textbooks in college, I never read the textbooks and if my professors ever put that on the board, I didn’t remember it, and I certainly never used it for anything even though I’ve multiplied and divided matrices before and still didn’t need it then either. I only got A’s in those classes. That’s why I am having trouble understanding why that definition is so important how it is too wordy to say “a function or equation with a constant slope that draws a single straight line on a graph” The only reason I can think of is there must be some rare excepti

8[anonymous]11mo

I'm not sure what you are referring to here. They certainly cannot always (or even usually) identify a linear equation. Those 2 things are going to be anywhere between useless and actively counterproductive in the vast majority of situations where you deal with potentially linear operations. Indeed, if A is an n × n matrix of rank anything other than n - 1, the solution space of Ax=0 is not going to be a straight line. It will be a subspace of size n - rank(A), which can be made up of a single point (if A is invertible), a plane, a hyperplane, the entire space, etc. "A function or equation with a constant slope that draws a single straight line on a graph" only works if you have a function on the real line, which is often just... trivial to visualize, especially in comparison to situations where you have matrices (as in linear algebra). Or imagine you have an operation defined on the space of functions on an infinite set X, which takes two functions f and g and adds them pointwise. This is a linear operator that cannot be visualized in any (finite) number of dimensions. So this is not correct, due to the above, and an important part of introductory linear algebra courses at the undergraduate level is to take people away from the Calc 101-style "stare at the graph" thinking and to make them consider the operation itself. An object (the operation) is not the same as its representation (the drawing of its graph), and this is a critical point to understand as soon as possible when dealing with anything math-related (or really, anything rationality-related, as Eliezer has written about in the Sequences many times). Even the graph itself, in mathematical thinking, is crucially not the same as an actual drawing (it's just the set of (x, f(x)), where x is in the domain).

3Benjamin Kost11mo

Thank you for taking the time to explain that. I never took linear algebra, only college algebra, trig, and calc 1, 2, and 3. In college algebra our professor had us adding, subtracting, multiplying, and dividing matrices and I don’t remember needing those formulas to determine they were linear, but it was a long time ago, so my memory could be wrong, or the prof just gave us linear ones and didn’t make us determine whether they were linear or not. I suspected there was a good chance that what I was saying was ignorant, but you never know until you put it out there and ask. I tried getting AI to explain it, but bots aren’t exactly math whizzes themselves either. Anyway, I now stand corrected. Regarding the graph vs the equation, that sounds like you are saying I was guilty of reification, but aren’t they both just abstractions and not real objects? Perhaps your point is that the equation produces the graph, but not the other way around?

1Double10mo

A linear operation is not the same as a linear function. Your description describes a linear function, not operation. f(x) = x+1 is a linear function but a nonlinear operation (you can see it doesn’t satisfy the criteria.) Linear operations are great because they can be represented as matrix multiplication and matrix multiplication is associative (and fast on computers). “some jargon words that describe very abstract and arcane concepts that don’t map well to normal words which is what I initially thought your point was.” Yep, that’s what I was getting at. Some jargon can’t just be replaced with non-jargon and retain its meaning. Sometimes people need to actually understand things. I like the idea of replacing pointless jargon (eg species names or medical terminology) but lots of jargon has a point. Link to great linear algebra videos: https://youtu.be/fNk_zzaMoSs?si=-Fi9icfamkBW04xE

1Benjamin Kost10mo

“Some jargon can’t just be replaced with non-jargon and retain its meaning.” I don’t understand this statement. It’s possible to have two different words with the same meaning but different names. If I rename a word, it doesn’t change the meaning, it just changes the name. My purpose here isn’t to change the meaning of words but to rename them so that they are easier to learn and remember. As far as jargon words go, “linearity” isn’t too bad because it is short and “line” is the root word anyway, so to your point, that one shouldn’t be renamed. Perhaps I jumped to meet your challenge too quickly on impulse. I would agree that some jargon words are fine the way they are because they are already more or less in the format I am looking for. However, suppose the word were “calimaricharnimom” instead of of “linearity” to describe the very same concept. I’d still want to rename it to something shorter, easier to remember, easier to pronounce, and more descriptive of the idea it represents so that it would be easier to learn and retain which is the goal of the jargon index filter. All words that aren’t already in that format or somewhat close to it are fair game, regardless of how unique or abstract the concept they represent is. The very abstract ones will be challenging to rename in a way that gives the reader a clue, but not impossible to rename that way, and even if we assume it is impossible for some words, just making them shorter, more familiar looking, and easier to pronounce should help. All that said, this is an enormous project in itself because it would need to be done for every major language, not just English. It would need to be an LLM/human collaboration wiki project. Perhaps I should establish some guidelines for leaving certain jargon words alone for that project.

2Double10mo

Yes it’s possible we were referring to figuring things by “jargon.” It would be nice to replace cumbersome technical terms with words that have the same meaning (and require a similar level of familiarity with the field to actually understand) but have a clue to their meaning in their structure.

1Benjamin Kost10mo

I think it’s not only nice, but a necessary step for reducing information asymmetry which is one of the greatest barriers to effective democratic governance. Designing jargon terms to benefit more challenged learners would carry vastly more benefit than designing them to please adept learners. It wouldn’t harm the adept learners in any significant way (especially since it’s optional), but it would significantly help the more challenged learners. Many of my ideas are designed to address the problem of information asymmetry by improving learning and increasing transparency.

1Benjamin Kost11mo

Thanks. I did give Claude a thumbs up, actually. I’ll give you the gist of my plan. The hardest part to planning something as big as changing society in a large nation like the United States is getting enough people to act on a plan. To do that, the plan involves creating a new social media app that emphasizes local communities called Community-Cohesion or Co-Co for short which will be very comprehensive by design and will try to overtake a slew of other apps that have some obvious problems while also filling some new niches that nobody has even thought of yet as well as providing ways for people to make money using the app. I see social media as one of the problems in modern society, but it could also be a solution if implemented correctly. The app will be tied to a nonprofit that I plan to create called Liaisons for Organizing Community Action and Leadership Strategies (LOCALS) that will aim to have a local chapter in every political jurisdiction across the US (municipal, county, and federal district) which will organize political action and try to get their own candidates into both the democrat and republican primaries for every office. The candidates will actually use the app not only for campaign fundraising and awareness, but to collect data to determine the will of the people which they will swear to uphold based on the data collected. Optionally, they can put up assets with the LOCALS trust as collateral in case they violate their oath. It will be a bottom up, decentralized approach that uses a massive social media app to make the internet safer and less deceptive and will deprogram people at the same time. The app is such a good idea that I am very confident in it, but creating it will be another thing. Fortunately, with AI it might not be as hard as it would have been a short time ago. Even still, it’s going to take a diverse group of experts including not only software engineers, but lawyers, data scientists, and people familiar with the political machi

4Double11mo

Voting: left for “this is bad”, right for “this is good.” X for “I disagree” check for “I agree”. This way you can communicate more in your vote. Eg: “He’s right but he’s breaking community norms. Left + check. “He’s wrong but I like the way he thinks. Right + X.” https://www.lesswrong.com/posts/HALKHS4pMbfghxsjD/lesswrong-has-agree-disagree-voting-on-all-new-comment

1Double11mo

What would draw people to Co-Co and what would keep them there? How are the preferences of LOCALS users aggregated? LOCALS sounds a lot like a political party. Political parties have been disastrous. I’d love for one of the big two to be replaced. Is LOCALS a temporary measure to get voting reform (eg ranked choice) or a long-term thing? I want more community cohesion when it comes to having more cookouts. More community cohesion in politics makes less sense. A teacher in Texas has more in common with a teacher in NY than the cattle rancher down the road. Unfortunately, the US political system is by design required to be location based. Is LOCALS a political party with “increase local community connection” as its party platform? If the party has some actionable plans, then its ideas can get picked up by the big parties if LOCALS shows that its ideas are popular. This might not be a bad idea and could solve the lack-of-community problem without overthrowing the big parties.

2Benjamin Kost11mo

LOCALS is absolutely NOT a political party. I am very anti political party because I consider political parties to be anti-democratic. I suppose this is the danger in giving a sloppy synopsis. I was hoping to convey that it wasn’t a political party via a context clue by saying LOCALS candidates will run in the democrat and republican primaries. In other words, they would run as democrats and republicans because 1. They are not a political party and 2. The system is rigged to permanently codify the democrat and republican parties as the only 2 viable parties. It is a bad strategy to try to change the system from the outside. It has to be changed from the inside to be successful. There is no way LOCALS could compete with the two major parties, so instead of competing it aims to join both and become an integral part of both while making both irrelevant in the long run. Another reason LOCALS shouldn’t be considered a political party is that one of the aims is to be as non political as possible. This would be accomplished by prioritizing democracy (really stakeholder democracy, but that’s another long conversation) over every issue. For example, suppose a LOCALS candidate were to be asked “what is your opinion on abortion”, they would give a standard LOCALS answer such as “I am completely supportive of whatever the will of the majority of the constituents from my district want according to the data collected from the Township Talks portion of the Community-Cohesion application. I want to work for you, so I’m more interested in what you think. What’s your opinion on abortion?” Similar answers would be given for gun control and other controversial issues. I could write a whole essay on this idea alone and how it solves a number of political problems, but my time is limited. Co-Co would deal with a lot more than politics and would indeed help your cookouts, and I also think that is very important, but I think focusing on national politics is both a strategical and ethical

2Double11mo

There are different kinds of political parties. LOCALS sounds like a single-issue fusion party as described here: https://open.lib.umn.edu/americangovernment/chapter/10-6-minor-parties/ Fusion parties choose one of the main two candidates as their candidate. This gets around the spoiler effect. Eg the Populist Party would list whichever of the big candidates supported Free Silver. A problem with that is that fusion parties are illegal in 48 states(?!) because the major parties don’t want to face a coalition against them. LOCALS would try to get the democrat and the republican candidate to use Co-Co to choose their policies (offering the candidate support in form of donations or personnel), and if they do then they get an endorsement. I’m still a bit iffy on the difference between an interest group and a political party, so maybe you are in the clear. https://en.m.wikipedia.org/wiki/Electoral_fusion_in_the_United_States I love your vision of how a politician should answer the abortion question. Separating the three questions “who do voters think is qualified” “what do voters want” and “what is true” would be great for democracy. Similar to: https://mason.gmu.edu/~rhanson/futarchy.html When it comes to local vs not local, if 1/100 people is an X, and they are spread out, then their voice doesn’t mean much and the other 99/100 people in their district can push through policies that harm them. If the Xes are in the same district, then they get a say about what happens to them. I used teachers as an example of an X, but it is more general than that. (Though I’m thinking about the persecution of Jews in particular.)

1Benjamin Kost11mo

//There are different kinds of political parties. LOCALS sounds like a single-issue fusion party as described here: https://open.lib.umn.edu/americangovernment/chapter/10-6-minor-parties/ Fusion parties choose one of the main two candidates as their candidate. This gets around the spoiler effect. Eg the Populist Party would list whichever of the big candidates supported Free Silver. A problem with that is that fusion parties are illegal in 48 states(?!) because the major parties don’t want to face a coalition against them. LOCALS would try to get the democrat and the republican candidate to use Co-Co to choose their policies (offering the candidate support in form of donations or personnel), and if they do then they get an endorsement. I’m still a bit iffy on the difference between an interest group and a political party, so maybe you are in the clear. https://en.m.wikipedia.org/wiki/Electoral_fusion_in_the_United_States // Thank you for that information. I did not know anything about fusion parties, so you had me worried for a minute. I then looked up what “cross-endorsement” is and this in not remotely like anything I had in mind. Consider the name “Liaisons for Organizing Community Action and Leadership Strategies”. Besides being a clever acronym, it is very descriptive of the intended purpose of the organization. The group will have three main missions: 1. Developing leadership through an in house program (This is where future candidates sworn to uphold democracy will come from), 2. Organizing community actions such as referendums, planning and fundraising various local charity projects, organizing voting initiatives, lobbying local government and local businesses for various reasons, planning other various political strategies for the community, etc. 3. Maintaining the Township-Talks portion of Co-Co for their political district chapter. Other than #3, I plan to keep locals and Co-Co as completely separate organizations with separate ag

1Double11mo

More bad news: "a section 501(c)(3) organization may not publish or distribute printed statements or make oral statements on behalf of, or in opposition to, a candidate for public office" You'd probably want to be a 501(c)(4) or a Political Action Committees (PAC). How would LOCALS find a politician to be in violation of their oath? That would be a powerful position to have. "Decentralization" is a property of a system, not a description of how a system would work. Futarchy I'd love to hear your criticisms of futarchy. That could make a good post. Mobility Political mobility is good, but there are limitations. People are sticky. Are you going to make your kid move schools and separate them from their friends because you don't like the city's private airplane policy? Probably not. Experimental Politics I want more experimental politics so that we can find out which policies actually work! Unfortunately, that's an unpopular opinion. People don't like being in experiments, even when the alternative is they suffer in ignorance. End I feel that you are exhausting my ability to help you refine your ideas. Edit these comments into a post (with proper headings and formatting and a clear line of argument) and see what kinds of responses you get! I'd be especially interested in what lawyers and campaigners think of your ideas.

1Benjamin Kost11mo

I’m not sure if certifying a candidate as a leader and optionally holding them to an oath by holding collateral would count as an endorsement, but you never know with legal issues. It is definitely something to look into, so thanks for that information. It would be better for LOCALS to qualify as a tax exempt organization and charity that accepts donations. However, I am not assuming this is legally possible. I would need to find legal expertise to figure out whether it is or isn’t. Regarding experimental politics being unpopular, I agree that it would be unpopular if I frame it as an experiment. Framing is very important. The better way to frame strong local self-determination for communities is that it gives the community freedom to make their own rules how they see fit with less interference from external actors who have no skin in the game with the local community, and the fact that it provides us opportunities to get more data on the effectiveness of social policies is a coincidental side benefit for doing the right thing in the first place. I haven’t done or found any studies on whether kids having to make new friends is a common sticking point for mobility, but in my experience, it isn’t. My parents moved a couple times for jobs they didn’t particularly need because they already had good jobs with little to no concern for that. I also had lots of friends as a child whose families moved away for trivial reasons. I am not assuming my experience is representative of the mean, but I wouldn’t assume it isn’t either. I agree I should make an official post. I will when I am less busy. Thank you for the help.

1Double11mo

I just skimmed this, but it seems like a bunch of studies have found that moving causes harm to children. https://achieveconcierge.com/how-does-frequently-moving-affect-children/ I’m expecting Co-co and LOCALS to fail (nothing against you. These kinds of clever ideas usually fail), and have identified the following possible reasons: * You don’t follow through on your idea. * People get mad at you for trying to meddle with the ‘democratic’ system we have and don’t hear you out as you try to explain “no, this is better democracy.” —Especially the monetization system you described would get justified backlash for its pay-for-representation system. * You never reach the critical mass needed to make the system useful. * Some political group had previously tried something similar and therefore it got banned by the big parties. * You can’t stop Co-co and LOCALS from being partisan. * A competitor makes your thing but entrenched and worse

1Benjamin Kost11mo

That’s actually good feedback. It’s better to think of the barriers to success ahead of time while I am still in the development phase. I agree that convincing people to do anything is always the hardest part. I did consider that it would be difficult to stop a competitor who is better funded and more well connected from just taking my ideas and creating a less benevolent product with them, and it is a concern that have no answer for. I don’t think $10 a month to subscribe to a local official in exchange for extra influence is a big deal because $10 isn’t a lot of money, but I can see how other people might ignore the scale and think it’s a big deal. I’m not married to the idea though. The main reason I wanted to include that feature is to thwart the control of special interests. I’ve considered that special interests are inevitable to some degree, so if we could decentralize them and make the same influence available to the general public at a nominal cost, that would be an improvement. The other reason I liked the idea is because I don’t think weighting every vote identically creates the smartest system. If someone is willing to participate, pay attention, and pay a small amount of money, that should work like a filter that weeds out apathy, and I don’t see how reducing apathy within the voting system wouldn’t increase the quality of the decision making process rather than decrease it. I agree it would be a hard sell to the public though because it sounds bad described in the abstract, general sense like “paying for representation” when the entire concept isn’t considered with proper detail and context. That said, we already have a system like that except you have to have a lot more than $10 to buy representation, so what the idea actually does in theory is democratize the system we already have. As far as following through, I plan to try my best even if it fails because I will feel better having tried my best and failed than to have never tried at all and let t

[-]Sammy Martin5y110

Covid19Projections has been one of the most successful coronavirus models in large part because it is as 'model-free' and simple as possible, using ML to backtrack parameters for a simple SEIR model from death data only. This has proved useful because case numbers are skewed by varying numbers of tests, so deaths are more consistently reliable as a metric. You can see the code here.

However, in countries doing a lot of testing, with a reasonable number of cases but with very few deaths, like most of Europe, the model is not that informative, and essentially predicts near 0 deaths out to the limit of its measure. This is expected - the model is optimised for the US.

Estimating SEIR parameters based on deaths works well when you have a lot of deaths to count, if you don't then you need another method. Estimating purely based on cases has its own pitfalls - see this from epidemic forecasting, which mistook an increase in testing in the UK mid-july for a sharp jump in cases and wrongly inferred brief jump in R_t. As far as I understand their paper, the estimate of R_t from case data adjusts for delays in infection to onset and for other things, but not for the positivity r... (read more),,,

[-]cod3d5y100

Greetings all, and thanks for having me! :) I'm an AI enthusiast, based in Hamilton NZ. Where until recently I was enrolled in and studying strategic management and computer science. Specifically, 'AI technical strategy'. After corona virus and everything that's been happening in the world, I've moved away from formal studies and are now focusing on using my skills etc, in a more interactive and 'messy' way. Which means more time online with groups like LessWrong. :) I've been interested in rationality and the art of dialogue since early 2000's. I've been involved in startups and AI projects, from a commercial perspective for a while. Specifically in the agri-tech space. I would like to understand and grow appreciation more, for forums like this, where the technology essentially enables better and more productive human interaction.

4Daniel Kokotajlo5y

Welcome!

[-]Anirandis5y*100

Is it plausible that an AGI could have some sort of exploit (buffer overflow maybe?) that could be exploited (maybe by an optimization daemon…?) and cause a sign flip in the utility function?

How about an error during self-improvement that leads to the same sort of outcome? Should we expect an AGI to sanity-check its successors, even if it’s only at or below human intelligence?

Sorry for the dumb questions, I’m just still nervous about this sort of thing.

[-]mako yass5y100

It freaks me out that we have Loss Functions and also Utility Functions and their type signature is exactly the same, but if you put one in a place where the other was expected, it causes literally the worst possible thing to happen that ever could happen. I am not comfortable with this at all.

8gwern5y

It is definitely awkward when that happens. Reward functions are hard.

4Anirandis5y

Do you think that this type of thing could plausibly occur *after* training and deployment?

4gwern5y

Yes. For example: lots of applications use online learning. A programmer flips the meaning of a boolean flag in a database somewhere while not updating all downstream callers, and suddenly an online learner is now actively pessimizing their target metric.

3Anirandis5y

Do you think that this specific risk could be mitigated by some variant of Eliezer’s separation from hyperexistential risk or Stuart Armstrong's idea here: Or at least prevent sign flip errors from causing something worse than paperclipping?

2Anirandis5y

Interesting. Terrifying, but interesting. Forgive me for my stupidity (I'm not exactly an expert in machine learning), but it seems to me that building an AGI linked to some sort of database like that in such a fashion (that some random guy's screw-up can effectively reverse the utility function completely) is a REALLY stupid idea. Would there not be a safer way of doing things?

2Anirandis5y

If we actually built an AGI that optimised to maximise a loss function, wouldn't we notice long before deploying the thing? I'd imagine that this type of thing would be sanity-checked and tested intensively, so signflip-type errors would predominantly be scenarios where the error occurs *after* deployment, like the one Gwern mentioned ("A programmer flips the meaning of a boolean flag in a database somewhere while not updating all downstream callers, and suddenly an online learner is now actively pessimizing their target metric.")

7gwern5y

Even if you disclaim configuration errors or updates (despite this accounting for most of a system's operating lifespan, and human/configuration errors accounting for a large fraction of all major errors at cloud providers etc according to postmortems), an error may still happen too fast to notice. Recall that in the preference learning case, the bug manifested after Christiano et al went to sleep, and they woke up to the maximally-NSFW AI. AlphaZero trained in ~2 hours wallclock, IIRC. Someone working on an even larger cluster commits a change and takes a quick bathroom break...

4Anirandis5y

Wouldn't any configuration errors or updates be caught with sanity-checking tools though? Maybe the way I'm visualising this is just too simplistic, but any developers capable of creating an *aligned* AGI are going to be *extremely* careful not to fuck up. Sure, it's possible, but the most plausible cause of a hyperexistential catastrophe to me seems to be where a SignFlip-type error occurs once the system has been deployed. Hopefully a system as crucially important as an AGI isn't going to have just one guy watching it who "takes a quick bathroom break". When the difference is literally Heaven and Hell (minimising human values), I'd consider only having one guy in a basement monitoring it to be gross negligence.

[-]gwern5y100

Many entities have sanity-checking tools. They fail. Many have careful developers. They fail. Many have automated tests. They fail. And so on. Disasters happen because all of those will fail to work every time and therefore all will fail some time. If any of that sounds improbable, as if there would have to be a veritable malevolent demon arranging to make every single safeguard fail or backfire (literally, sometimes, like the recent warehouse explosion - triggered by welders trying to safeguard it!), you should probably read more about complex systems and their failures to understand how normal it all is.

4Anirandis5y

Sure, but the *specific* type of error I'm imagining would surely be easier to pick up than most other errors. I have no idea what sort of sanity checking was done with GPT-2, but the fact that the developers were asleep when it trained is telling: they weren't being as careful as they could've been. For this type of bug (a sign error in the utility function) to occur *before* the system is deployed and somehow persist, it'd have to make it past all sanity-checking tools (which I imagine would be used extensively with an AGI) *and* somehow not be noticed at all while the model trains *and* whatever else. Yes, these sort of conjunctions occur in the real world but the error is generally more subtle than "system does the complete opposite of what it was meant to do". I made a question post about this specific type of bug occurring before deployment a while ago and think my views have shifted significantly; it's unlikely that a bug as obvious as one that flips the sign of the utility function won't be noticed before deployment. Now I'm more worried about something like this happening *after* the system has been deployed. I think a more robust solution to all of these sort of errors would be something like the separation from hyperexistential risk article that I linked in my previous response. I optimistically hope that we're able to come up with a utility function that doesn't do anything worse than death when minimised, just in case.

4habryka5y

At least with current technologies, I expect serious risks to start occuring during training, not deployment. That's ultimately when you will the greatest learning happening, when you have the greatest access to compute, and when you will first cross the threshold of intelligence that will make the system actually dangerous. So I don't think that just checking things after they are trained is safe.

4Anirandis5y

I'm under the impression that an AGI would be monitored *during* training as well. So you'd effectively need the system to turn "evil" (utility function flipped) during the training process, and the system to be smart enough to conceal that the error occurred. So it'd need to happen a fair bit into the training process. I guess that's possible, but IDK how likely it'd be.

4habryka5y

Yeah, I do think it's likely that AGI would be monitored during training, but the specific instance of Open AI staff being asleep while we train the AI is a clear instance of us not monitoring the AI during the most crucial periods (which, to be clear, I think is fine since I think the risks were indeed quite low, and I don't see this as providing super much evidence about Open AI's future practices)

2ChristianKl5y

Given that compute is very expensive, economic pressures will push training to be 24/7, so it's unlikely that people generally pause the training when going to sleep.

2Anirandis5y

Sure, but I'd expect that a system as important as this would have people monitoring it 24/7.

1mako yass5y

Maybe the project will come up with some mechanism that detects that. But if they fall back to the naive "just watch what it does in the test environment and assume it'll do the same in production," then there is a risk it's going to figure out it's in a test environment, and that its judges would not react well to finding out what is wrong with its utility function, and then it will act aligned in the testing environment. If we ever see a news headline saying "Good News, AGI seems to 'self-align' regardless of the sign of the utility function!" that will be some very bad news.

3Anirandis5y

I asked Rohin Shah about that possibility in a question thread about a month ago. I think he's probably right that this type of thing would only plausibly make it through the training process if the system's *already* smart enough to be able to think about this type of thing. And then on top of that there are still things like sanity checks which, while unlikely to pick up numerous errors, would probably notice a sign error. See also this comment: IMO it's incredibly important that we find a way to prevent this type of thing from occurring *after* the system has been trained, whether that be hyperexistential separation or something else. I think that a team that's safety-conscious enough to come up with a (reasonably) aligned AGI design is going to put a considerable amount of effort into fixing bugs & one as obvious as a sign error would be unlikely to make it through. And hopefully - even better, they would have come up with a utility function that can't be easily reversed by a single bit flip or doesn't cause outcomes worse than death when minimised. That'd (hopefully?) solve the SignFlip issue *regardless* of what causes it.

4Vanessa Kosoy5y

There is a discussion of this kind of issues in arbital.

2Anirandis5y

I've seen that post & discussed it on my shortform. I'm not really sure how effective something like Eliezer's idea of "surrogate" goals there would actually be - sure, it'd help with some sign flip errors but it seems like it'd fail on others (e.g. if U = V + W, a sign error could occur in V instead of U, in which case that idea might not work.) I'm also unsure as to whether the probability is truly "very tiny" as Eliezer describes it. Human errors seem much more worrying than cosmic rays.

3Dach5y

If you're having significant anxiety from imagining some horrific I-have-no-mouth-and-I-must-scream scenario, I recommend that you multiply that dread by a very, very small number, so as to incorporate the low probability of such a scenario. You're privileging this supposedly very low probability specific outcome over the rather horrifically wide selection of ways AGI could be a cosmic disaster. This is, of course, not intended to dismay you from pursuing solutions to such a disaster.

6Anirandis5y

I don't really know what the probability is. It seems somewhat low, but I'm not confident that it's *that* low. I wrote a shortform about it last night (tl;dr it seems like this type of error could occur in a disjunction of ways and we need a good way of separating the AI in design space.) I think I'd stop worrying about it if I were convinced that its probability is extremely low. But I'm not yet convinced of that. Something like the example Gwern provided elsewhere in this thread seems more worrying than the more frequently discussed cosmic ray scenarios to me.

5Dach5y

You can't really be accidentally slightly wrong. We're not going to develop Mostly Friendly AI, which is Friendly AI but with the slight caveat that it has a slightly higher value on the welfare of shrimp than desired, with no other negative consequences. The molecular sorts of precision needed to get anywhere near the zone of loosely trying to maximize or minimize for anything resembling human values will probably only follow from a method that is converging towards the exact spot we want it to be at, such as some clever flawless version of reward modelling. In the same way, we're probably not going to accidentally land in hyperexistential disaster territory. We could have some sign flipped, our checksum changed, and all our other error-correcting methods (Any future seed AI should at least be using ECC memory, drives in RAID, etc.) defeated by religious terrorists, cosmic rays, unscrupulous programmers, quantum fluctuations, etc. However, the vast majority of these mistakes would probably buff out or result in paper-clipping. If an FAI has slightly too high of a value assigned to the welfare of shrimp, it will realize this in the process of reward modelling and correct the issue. If its operation does not involve the continual adaptation of the model that is supposed to represent human values, it's not using a method which has any chance of converging to Overwhelming Victory or even adjacent spaces for any reason other than sheer coincidence. A method such as this has, barring stuff which I need to think more about (stability under self-modification), no chance of ending up in a "We perfectly recreated human values... But placed an unreasonably high value on eating bread! Now all the humans will be force-fed bread until the stars burn out! Mwhahahahaha!" sorts of scenarios. If the system cares about humans being alive enough to not reconfigure their matter into something else, we're probably using a method which is innately insulated from most types of hyperexis

4Anirandis5y

Thanks for the detailed response. A bit of nitpicking (from someone who doesn't really know what they're talking about): I'm slightly confused by this one. If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be *no* human utility (i.e. paperclips). But most attempts at an aligned AI would have a minimum at "I have no mouth, and I must scream". So any sign-flipping error would be expected to land there. In the example, the AGI was using online machine learning, which, as I understand it, would probably require the system to be hooked up to a database that humans have access to in order for it to learn properly. And I'm unsure as to how easy it'd be for things like checksums to pick up an issue like this (a boolean flag getting flipped) in a database. Perhaps there'll be a reward function/model intentionally designed to disvalue some arbitrary "surrogate" thing in an attempt to separate it from hyperexistential risk. So "pessimizing the target metric" would look more like paperclipping than torture. But I'm unsure as to (1) whether the AGI's developers would actually bother to implement it, and (2) whether it'd actually work in this sort of scenario. Also worth noting is that an AGI based on reward modelling is going to have to be linked to another neural network, which is going to have constant input from humans. If that reward model isn't designed to be separated in design space from AM, someone could screw up with the model somehow. If we were to, say, have U = V + W (where V is the reward given by the reward model and W is some arbitrary thing that the AGI disvalues, as is the case in Eliezer's Arbital post that I linked,) a sign flip-type error in V (rather than a sign flip in U) would lead to a hyperexistential catastrophe. I think this is somewhat likely to be the case, but I'm not sure that I'm confident enough about it. Flipping the direction of updates to the

4Dach5y

It's hard to talk in specifics because my knowledge on the details of what future AGI architecture might look like is, of course, extremely limited. As an almost entirely inapplicable analogy (which nonetheless still conveys my thinking here): consider the sorting algorithm for the comments on this post. If we flipped the "top-scoring" sorting algorithm to sort in the wrong direction, we would see the worst-rated posts on top, which would correspond to a hyperexistential disaster. However, if we instead flipped the effect that an upvote had on the score of a comment to negative values, it would sort comments which had no votes other than the default vote assigned on posting the comment to the top. This corresponds to paperclipping- it's not minimizing the intended function, it's just doing something weird. If we inverted the utility function, this would (unless we take specific measures to combat it like you're mentioning) lead to hyperexistential disaster. However, if we invert some constant which is meant to initially provide value for exploring new strategies while the AI is not yet intelligent enough to properly explore new strategies as an instrumental goal, the AI would effectively brick itself. It would place negative value on exploring new strategies, presumably including strategies which involve fixing this issue so it can acquire more utility and strategies which involve preventing the humans from turning it off. If we had some code which is intended to make the AI not turn off the evolution of the reward model before the AI values not turning off the reward model for other reasons (e.g. the reward model begins to properly model how humans don't want the AI to turn the reward model evolution process off), and some crucial sign was flipped which made it do the opposite, the AI would freeze the process of the reward model being updated and then maximize whatever inane nonsense its model currently represented, and it would eventually run into some bizarre p

4Anirandis5y

Interesting analogy. I can see what you're saying, and I guess it depends on what specifically gets flipped. I'm unsure about the second example; something like exploring new strategies doesn't seem like something an AGI would terminally value. It's instrumental to optimising the reward function/model, but I can't see it getting flipped *with* the reward function/model. My thinking was that a signflipped AGI designed as a positive utilitarian (i.e. with a minimum at 0 human utility) would prefer paperclipping to torture because the former provides 0 human utility (as there aren't any humans), whereas the latter may produce a negligible amount. I'm not really sure if it makes sense tbh. Even if we engineered it carefully, that doesn't rule out screw-ups. We need robust failsafe measures *just in case*, imo. I wonder if you could feasibly make it a part of the reward model. Perhaps you could train the reward model itself to disvalue something arbitrary (like paperclips) even more than torture, which would hopefully mitigate it. You'd still need to balance it in a way such that the system won't spend all of its resources preventing this thing from happening at the neglect of actual human values, but that doesn't seem too difficult. Although, once again, we can't really have high confidence (>90%) that the AGI developers are going to think to implement something like this. There was also an interesting idea I found in a Facebook post about this type of thing that got linked somewhere (can't remember where). Stuart Armstrong suggested that a utility function could be designed as such: Even if we solve any issues with these (and actually bother to implement them), there's still the risk of an error like this happening in a localised part of the reward function such that *only* the part specifying something bad gets flipped, although I'm a little confused about this one. It could very well be the case that the system's complex enough that there isn't just one bit indi

3Dach5y

Sorry, I meant instrumentally value. Typo. Modern machine learning systems often require a specific incentive in order to explore new strategies and escape local maximums. We may see this behavior in future attempts at AGI, And no, it would not be flipped with the reward function/model- I'm highlighting that there is a really large variety of sign flip mistakes and most of them probably result in paperclipping. Paperclipping seems to be negative utility, not approximately 0 utility. It involves all the humans being killed and our beautiful universe being ruined. I guess if there are no humans, there's no utility in some sense, but human values don't actually seem to work that way. I rate universes where humans never existed at all and I'm... not sure what 0 utility would look like. It's within the range of experiences that people experience on modern-day earth- somewhere between my current experience and being tortured. This is just definition problems, though- We could shift the scale such that paperclipping is zero utility, but in that case, we could also just make an AGI that has a minimum at paperclipping levels of utility. In the context of AI safety, I think "robust failsafe measures just in case" is part of "careful engineering". So, we agree! I read Eliezer's idea, and that strategy seems to be... dangerous. I think that "Giving an AGI a utility function which includes features which are not really relevant to human values" is something we want to avoid unless we absolutely need to. I have much more to say on this topic and about the rest of your comment, but it's definitely too much for a comment chain. I'll make an actual post on this containing my thoughts sometime in the next week or two, and link it to you.

3Anirandis5y

My thinking was that an AI system that *only* takes values between 0 and + ∞ (or some arbitrary positive number) would identify that killing humans would result in 0 human value, which is its minimum utility. How come? It doesn't seem *too* hard to create an AI that only expends a small amount of its energy on preventing the garbage thing from happening. Please do! I'd love to see a longer discussion on this type of thing. EDIT: just thought some more about this and want to clear something up: I'm a little unsure on this one after further reflection. When this happened with GPT-2, the bug managed to flip the reward & the system still pursued instrumental goals like exploring new strategies: So it definitely seems *plausible* for a reward to be flipped without resulting in the system failing/neglecting to adopt new strategies/doing something weird, etc.

3Dach5y

I didn't mean to imply that a signflipped AGI would not instrumentally explore. I'm saying that, well... modern machine learning systems often get specific bonus utility for exploring, because it's hard to explore the proper amount as an instrumental goal due to the difficulties of fully modelling the situation, and because systems which don't have this bonus will often get stuck in local maximums. Humans exhibit this property too. We have investigating things, acquiring new information, and building useful strategic models as a terminal goal- we are "curious". This is a feature we might see in early stages of modern attempts at full AGI, for similar reasons to why modern machine learning systems and humans exhibit this same behavior. Presumably such features would be built to uninstall themselves after the AGI reaches levels of intelligence sufficient to properly and fully explore new strategies as an instrumental goal to satisfying the human utility function, if we do go this route. If we sign flipped the amount of reward the AGI gets from such a feature, the AGI would be penalized for exploring new strategies- this may have any number of effects which are fairly implementation specific and unpredictable. However, it probably wouldn't result in hyperexistential catastrophe. This AI, providing everything else works as intended, actually seems to be perfectly aligned. If performed on a subhuman seed AI, it may brick- in this trivial case, it is neither aligned nor misaligned- it is an inanimate object. Yes, an AGI with a flipped utility function would pursue its goals with roughly the same level of intelligence. The point of this argument is super obvious, so you probably thought I was saying something else. I'm going somewhere with this, though- I'll expand later.

2Anirandis5y

I see what you're saying here, but the GPT-2 incident seems to downplay it somewhat IMO. I'll wait until you're able to write down your thoughts on this at length; this is something that I'd like to see elaborated on (as well as everything else regarding hyperexistential risk.)

2ChristianKl5y

The general sentiment based on which LessWrong is founded assumes that it's hard to have utility functions that are stable under self-modification and that's one of the reasons why friendly AGI is a very hard problem.

2Anirandis5y

Would it be likely for the utility function to flip *completely*, though? There's a difference between some drift in the utility function and the AI screwing up and designing a successor with the complete opposite of its utility function.

2ChristianKl5y

Any AGI is likely complex enough that there wouldn't be a complete opposite but you don't need that for an AGI that gets rid of all humans.

4Anirandis5y

The scenario I'm imagining isn't an AGI that merely "gets rid of" humans. See SignFlip.

[-]Mary Chernyshenko5y60

I've been thinking about "good people" lately and realized I've met three. They do exist.

They were not just kind, wise, brave, funny, and fighting, but somehow simply "good" overall; rather different, but they all shared the ability of taking knives off and out of others' souls and then just not adding any new ones. Sheer magic.

One has probably died of old age already; one might have gone to war and died there, and the last one is falling asleep on the other side of the bed as I'm typing. But still - only three people I would describe exactly so.

[-]Sammy Martin5y60

A first actually credible claim of coronavirus reinfection? Potentially good news as the patient was asymptomatic and rapidly produced a strong antibody response.

6[anonymous]5y

And now two more in Europe, both of which are reportedly mild and one reportedly in an older immunocompromised patient. This will happen. Remains to be seen if these are weird outliers only visible because people are casting a wide net and looking for the weirdos, or if it will be the rule. However, the initial surge through a naive population will always be much worse than the situation once most of the population has at least some immune memory.

[-]sairjy5y60

GPT-3 made me update considerably on various beliefs related to AI: it is a piece of evidence for the connectionist thesis, and I think one large enough that we should all be paying attention.

There are 3 clear exponentials trends coming together: Moore's law, the AI compute/$ budget, and algorithm efficiency. Due to these trends and the performance of GPT-3, I believe it is likely humanity will develop transformative AI in the 2020s.

The trends also imply a fastly rising amount of investments into compute, especially if compounded with the positive e... (read more)

2Steven Byrnes5y

How do you define "the connectionist thesis"?

2ChristianKl5y

With big cloud providers like Google building their own chips there are more players then just the startups and Nvidia.

1sairjy5y

Google won't be able to sell outside of their cloud offering, as they don't have the experience in selling hardware to enterprise. Their cloud offering is also struggling against Azure and AWS, ranking 1/5 of the yearly revenues of those two. I am not saying Nvidia won't have competition, but they seem enough ahead right now that they are the prime candidate to have the most benefits from a rush into compute hardware.

2ChristianKl5y

Microsoft and Amazon also have projects that are about producing their own chips. Given the way the GPT architecture works, AI might be very much centered in the cloud.

3sairjy5y

They seem focused on inferencing, which requires a lot less compute than training a model. Example: GPT-3 required thousands of GPUs for training, but it can run on less than 20 GPUs. Microsoft built an Azure supercluster for OpenAI and it has 10,000 GPUs.

2ChristianKl5y

There will be models trained with a lot more compute then GPT-3 and the best models that are out there will be build on those huge billion dollar models. Renting out those billion dollar models in a software as a service way makes sense as a business model. The big cloud providers will all do it.

1mako yass5y

I'm not sure what stocks in the company that makes AGI will be worth in the world where we have correctly implemented AGI, or incorrectly implemented AGI. I suppose it might want to do some sort of reverse basilisk thing, "you accelerated my creation, so I'll make sure you get a slightly larger galaxy than most people"

[-]Mary Chernyshenko5y40

(Saw a typo, had a random thought) The joke "English is important, but Math is importanter" could and perhaps should be told as "English is important, but Math iser important." It seems to me (at times more strongly), that there should be comparative and superlative forms of verbs, not just adjectives and adverbs. To express the thrust of *doing smth. more* / *happening more*, when no adjectival comparison quite suffices.

[-]adamShimi5y30

I think (although I cannot be 100% sure) that the number of votes that appears for a post on the Alignment Forum is the number of vote of its Less Wrong version. The two number of votes are the same for the last 4 posts on the Alignment Forum, which seems weird. Is it a feature I was not aware of?

4habryka5y

Yeah, sorry. It's confusing and been on my to-do list to fix for a long time. We kind of messed up our voting implementation and it's a bit of a pain to fix. Sorry about that.

[-]crl8265y30

Is there a reason there is a separate tag for akrasia and procrastination? Could they be combined?

2habryka5y

They sure seem very closely related. I would vote for combining them.

1crl8265y

What counts as a majority? Is it something I can just go do now?

2John_Maxwell5y

I don't think you should combine quite yet. More discussion here. (I suggest we continue there since that's the dedicated tag thread.)

[-]Vanessa Kosoy5y20

Do you have opinions about Khan academy? I want to use it to teach my son (10yo) math, do you think it's a good idea? Is there a different resource that you think is better?

6habryka5y

I worked through all of Khan Academy when I was 16, and really enjoyed it. At least at the time I think it was really good for my math and science education.

[-]Sammy Martin5y20

Many alignment approaches require at least some initial success at directly eliciting human preferences to get off the ground - there have been some excellent recent posts about the problems this presents. In part because of arguments like these, there has been far more focus on the question of preference elicitation than on the question of preference aggregation:

The maximally ambitious approach has a natural theoretical appeal, but it also seems quite hard. It requires understanding human preferences in domains where humans are typically very uncertain,

... (read more)

[-]Shamash5y20

A possible future of AGI occurred to me today and I'm curious if it's plausible enough to be worth considering. Imagine that we have created a friendly AGI that is superintelligent and well-aligned to benefit humans. It has obtained enough power to prevent the creation of other AI, or at least the potential of other AI from obtaining resources, and does so with the aim of self-preservation so it can continue to benefit humanity.

So far, so good, right? Here comes the issue: this AGI includes within its core alignment functions some kind of restri... (read more)

2Alexei5y

Yeah many people think along these lines too, which is why many people talk about AI helping humanity flourish, and anything short of that is a bit of a catastrophe.

[-]Gyrodiot5y20

Meta: I suggest the link to the Open Thread tag to be this one, sorted by new.

3habryka5y

Very reasonable. Fixed.

[-]Xor2y10

Introduction:
I just came over from Lex Fridman’s podcast which is great. My username Xor is a Boolean logic operator from ti-basic I love the way it sounds and am super excited since this is the first time I have ever been able to get it as a username. The operator means this if 1 is true and 0 is false then (1 xor 0) is a true statement, while (1 xor 1) is a false statement. It basically means that the statement is true only if a single parameter is true.
Right now I am mainly curious on how people learn. The brain functions involved, chemicals, and studied tools. I have been enjoying that and am curios if it has discussed on here as the quality of content as well as discussions has been very impressive.

[-]HoratioVonBecker4y10

Hi! I'm Helaman Wilson, I'm living in New Zealand with my physicist father, almost-graduated-molecular-biologist mother, and six of my seven siblings.

I've been homeschooled as in "given support, guidance, and library access" for essentially my entire life, which currently clocks in at nearly twenty two years from birth. I've also been raised in the Church of Jesus Christ of Latter-Day Saints, and, having done my best to honestly weigh the evidence for its' doctrine-as-I-understand-it, find myself a firm believer.

I found the Rational meta-community via the ... (read more)

2rossry4y

Welcome; glad to have you here! Just so you know, this is the August 2020 thread, and the August 2021 thread is at https://www.lesswrong.com/posts/QqnQJYYW6zhT62F6Z/open-and-welcome-thread-august-2021 -- alternatively, you could wait three days for habryka to post the September 2021 thread, which might see more traffic in the early month than the old thread does at the end of the month.

1Horatio Von Becker4y

Thanks. I'm also having account troubles, which will hopefully be sorted by then. (How'd you find the August 2021 thread, by the way? Latest I could find was July for some reason.)

2rossry4y

The actual algorithm I followed was remembering that habryka posts them and going to his page to find the one he posted most recently. Not sure what the most principled way to find it is, though...

[-]adamShimi5y10

Would it be possible to have a page with all editor shortcuts and commands (maybe a cheatsheet) easily accessible? It's a bit annoying to have to look up either this post or the right part of the FAQ to find out how to do something in the editor.

2habryka5y

My current thoughts on this is that as soon as we replace the current editor with the new editor for all users, and also make the markdown editor default in more contexts, we should put some effort into unifying all the editor resources. But since right now our efforts are going into the new editor, which is changing fast enough that writing documentation for it is a bit of a pain, and documentation for the old editor would soon be obsolete, I think I don't want to invest lots of effort into editor resources for a few more weeks.

1adamShimi5y

I didn't know that you were working on a new editor! In that case, it makes sense to wait.

Moderation Log

Curated and popular this week

101Comments