All of Anon User's Comments + Replies

There's probably more. There should be more -- please link in comments, if you know some!

Wouldn't "outing" potential honeypots be extremely counterproductive? So yeah, if you know some - please keep it to yourself!

1Ozyrus
You can make a honeypot without overtly describing the way it works or where it is located, while publicly tracking if it has been accessed. But yeah, not giving away too much is a good idea!
1momom2
Honeypots should not be public and mentioned here since this post will potentially be part of a rogue AI's training data. But it's helpful for people interested in this topic to look at existing honeypots (to learn how to make their own, evaluate effectiveness, get intuitions about how honeypots work, etc.), so what you should do is mention that you made a honeypot or know of one, but not say what or where. Interested people can contact you privately if they care to.

Oftentimes downvoting without taking time to comment and explain reasons is reasonable, and I tend to strongly disagree with people who think I owe an incompetent writer an explanation when downvoting. However, just this one time I would ask - can some of the people downvoting this explain why?

It is true that our standard way of mathematically modeling things implies that any coherent set of preferences must behave like a value function. But any mathematical model of the world is necessarily incomplete. A computationally limited agent that cannot fully for... (read more)

3ChristianKl
The author seems to just assume that his proposal will lead to a world where humans have a place instead of critically trying to argue that point. 
Answer by Anon User50

This looks to be primarily about imports - that is, primarily taking into account Trump's new tariffs. I am guessing that Wall Street does not quite believe that Trump actually means it...

It would seem that my predictions of how Trump would approach this were pretty spot on... @MattJ I am curious what's your current take on it?

1MattJ
The actual peace deal will be something for the Ukraine to agree to. It is not up to Trump to dictate the terms. All Trump should do is to stop financing the war and we will have peace. Having said that, if it is somehow possible for Trump to pressure Ukraine into agreeing to become a US colony, my support for Trump was a mistake. The war would be preferable to the peace.

Why would the value to me personally of the existence of happy people be linear in the number of them? Is creating happy person #10000001, [almost] identical to the previous 10000000, as joyous as when the 1st of them was created? I think value is necessarily limited. There are always diminishing returns from more of the same...

2Noosphere89
Most value functions that grow without bound like logarithms or even log log x also tend to infinity, though for you personally, you might think that the value of existence of happy people is bounded, but this isn't true for at least some people (not including myself in the sentence here), so the argument still doesn't work.

> if you have a program computing a predicate P(x, y) that is only true when y = f(x), and then the program just tries all possible y - is that more like a function, or more like a lookup?

> In order to test whether y=f(x), the program must have calculated f(x) and stored it somewhere. How did it calculate f(x)? Did it use a table or calculate it directly?

What I meant is that the program knows how to check the answer, but not how to compute/find one, other than by trying every answer and then checking it. (Think: you have a math equation, no idea how to solve for x, so you are just trying all possible x in a row).
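
To make the distinction concrete, here is a quick Python sketch (the function f below is my own toy choice, not anything from the original discussion) contrasting computing f directly with finding f(x) purely by checking candidates:

```python
# Hypothetical illustration: f(x) = x squared, over a small finite range.
def f_direct(x: int) -> int:
    """Compute f(x) directly."""
    return x * x

def P(x: int, y: int) -> bool:
    """Checking predicate: true exactly when y = f(x)."""
    return y == x * x

def f_by_search(x: int, y_max: int = 10_000) -> int:
    """Find f(x) knowing only how to *check* candidates, by trying every y."""
    for y in range(y_max + 1):
        if P(x, y):
            return y
    raise ValueError("no y found in the searched range")

assert f_direct(7) == f_by_search(7) == 49
```

Both programs have identical input/output behaviour; the question in this thread is whether the second one "knows how to compute" f or is closer to a glorified lookup.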

Answer by Anon User*20

Aligned with current (majority) human values, meaning any social or scientific human progress would be stifled by the AI and humanity would be doomed to stagnate.

Only true when current values are taken naively, because future progress is a part of current human values (otherwise we would not all be agreeing with you that preventing it would be a bad outcome). It is hard to coherently generalize and extrapolate human values so that future progress is included, but not necessarily impossible.

1Bridgett Kay
"Future progress is a part of current human values" of course- the danger lies in the "future" always being just that- the future. One would naturally hope that it wouldn't go this way, but continuously putting off the future because now is always the present is a possible outcome. It can even be a struggle with current models to get them to generate novel ideas, because of a stubbornness not to say anything for which there is not yet evidence.  Thank you for that criticism- I hadn't necessarily given that point enough thought, and I think I am starting to see where the weaknesses are. 

Your timelines do not add up. Individual selection works on smaller time scales than group selection, and once we get to a stage of individual selection acting in any non-trivial way on AGI agents capable of directly affecting the outcomes, we have already lost - I think at this point it's pretty much a given that humanity is doomed on a much shorter time scale than that required for any kind of group selection pressure to potentially save us...

1Davey Morse
Agree that individual vs. group selection usually unfolds on different timescales. But a superintelligence might short-circuit the slow, evolutionary "group selection" process by instantly realizing its own long-term survival depends on the group's. In other words, it's not stuck waiting for natural selection to catch up; it can see the big picture and "choose" to identify with the group from the start. This is why it's key that AGI makers urge it to think very long term about its survival early on. If it thinks short-term, then I too think doom is likely.  

This seems to be making a somewhat arbitrary distinction - specifically between a program that computes f(x) in some sort of a direct way, and a program that computes it in some less direct way (you call it a "lookup table", but you seem to actually allow combining that with arbitrary decompression/decoding algorithms). But realistically, this is a spectrum - e.g. if you have a program computing a predicate P(x, y) that is only true when y = f(x), and then the program just tries all possible y - is that more like a function, or more like a lookup? What if you first compute some simple function of the input (e.g. x mod N), then do a lookup?
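
As a concrete illustration of that hybrid case (the specific choice f(x) = (x mod N) + 2 and the table contents below are my own toy example), here is a minimal Python sketch of "compute a simple function of the input, then do a lookup":

```python
# Hypothetical hybrid program for f(x) = (x mod N) + 2:
# a "direct" pre-processing step (x mod N) followed by a lookup step.
N = 5
TABLE = {r: r + 2 for r in range(N)}  # one entry per residue, not per input x

def f_hybrid(x: int) -> int:
    r = x % N        # direct computation
    return TABLE[r]  # lookup

def f_direct(x: int) -> int:
    return (x % N) + 2

# Same input/output behaviour, different internal structure.
assert all(f_hybrid(x) == f_direct(x) for x in range(1000))
```

Note that the table only grows with N, not with the size of the input domain, which is one way of seeing that such a program is doing some amount of "actual computation" rather than storing every input/output pair.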

2Alfred Harwood
I agree that there is a spectrum of ways to compute f(x) ranging from efficient to inefficient (in terms of program length). But I think that lookup tables are structurally different from direct ways of computing f because they explicitly contain the relationships between inputs and outputs. We can point to a 'row' of a lookup table and say 'this corresponds to the particular input x_1 and the output y_1' and do this for all inputs and outputs in a way that we can't do with a program which directly computes f(x). I think that allowing for compression preserves this important property, so I don't have a problem calling a compressed lookup table a lookup table. Note that the compression allowed in the derivation is not arbitrary since it only applies to the 'table' part of the programme, not the full input/output behaviour of the programme. In order to test whether y=f(x), the program must have calculated f(x) and stored it somewhere. How did it calculate f(x)? Did it use a table or calculate it directly? I agree that you can have hybrid cases. But there is still a meaningful distinction between the part of the program which is a lookup table and the part of the program which isn't (in describing the program you used this distinction!). In the example you have given the pre-processing function (x mod N) is not a bijection. This means that we couldn't interpret the pre-processing as an 'encoding' so we couldn't point to parts of the program corresponding to each unique input and output. Suppose the function was f(x) =( x mod N )+ 2 and the pre-processing captured the x mod N part and it then used a 2xN lookup table to calculate the '+2'. I think this program is importantly different from one which stores the input and output for every single x. So when taken as a whole the program would not be a lookup table and might be shorter than the lookup table bound presented above. But this captures something important! Namely, that the program is doing some degree of 'actual

Yes, and I was attempting to illustrate why this is a bad assumption. Yes, LLMs subject to unrealistic limitations are potentially easier to align, but that does not help, unfortunately.

2Yair Halberstadt
I don't see how you've shown it's a bad assumption?

You ask a superintelligent LLM to design a drug to cure a particular disease. It outputs just a few tokens with the drug formula. How do you use a previous-gen LLM to check whether the drug will have some nasty humanity-killing side-effects years down the road?

 

Edited to add: the point is that even with a few tokens, you might still have a huge inferential distance that nothing with less intelligence (including humanity) could bridge.

2Yair Halberstadt
That violates assumption one (a single pass cannot produce super intelligent output).
Anon User-40

Agreed on your second part. A part of Trump's "superpower" is to introduce a lot of confusion around the bounds, and then convince at least his supporters that he is not really stepping over them where it should have been obvious that he is. So the category "should have been plainly illegal and would have been considered plainly illegal before, but now nobody knows anymore" is likely to be a lot better defined than "still plainly illegal". Moreover, Trump is much more likely to attempt the former than the latter - not because he actually cares about not do... (read more)

Yes, potentially less than ASI, and security is definitely an issue. But people breaching the security would hoard their access - there will be periodic high-profile spills (e.g. celebrities engaged in sexual activities, or politicians engaged in something inappropriate, would be obvious targets), but I'd expect most of the time people would have at least an illusion of privacy.

1samuelshadrach
Sorry for delay.  The incentives pushing for the first actor to get broken into also push for the second actor to get broken into. On a longer timescale, more and more actors get the same data, until eventually it could be public. Nobody has a strong incentive to destroy their copy of the data, so the total number of copies of data in the world is more-or-less a non-decreasing function. 
Anon User*42

I found Eliezer Yudkowsky's "blinking stars" story (That Alien Message — https://search.app/uYn3eZxMEi5FWZEw5) persuasive. That story also has a second layer of having the extra-smart Earth with better-functioning institutions, but at the level of intuition you are going for, it is probably unnecessary and would detract from the message. I think imagining a NASA-like organisation dedicated to controlling a remote robot at, say, 1 cycle of the control loop per month (perhaps corresponding to 1/30 of a second for the aliens), showing how totally screwed the aliens are in this scenario, and then flipping it around, should be at least somewhat emotionally persuasive.

1Ori Nagel
Ah yes, Rational Animations did a great video of that story. That did make superintelligence more graspable, but you know, I had watched it and forgotten about it. I think it showed how our human civilization is vulnerable to other intelligences (aliens), but it still didn't make the superintelligence concept that easy to grok.

For the specific example of arguing in a podcast, wouldn't you expect people to already be aware of a substantial subset of arguments from the other side? And wouldn't it then be entirely expected that there would be zero update on information that is not new, and so not as much update overall, if only a fraction of the information is actually new?

2Adam Zerner
I could see that happening, but in general, no, I wouldn't expect podcast hosts to already be aware of a substantial subset of arguments from the other side. My impression is that podcasters do some prep but in general aren't spending many days let alone multiple weeks or months of prep. When you interview a wide variety of people and discuss a wide variety of topics, as many podcasters including the ones I mentioned do, I think that means that hosts will generally not be aware of a substantial subset of arguments from the other side.
Answer by Anon User41

Hm, not sure about it being broadcast vs consumed by a powerful AI that somebody else has at least partial control over.

2samuelshadrach
To be clear, when you say powerful you still mean less powerful than ASI, right? What are your thoughts on whether this organisation will be able to secure the data they collect? My post has some of my thoughts on why securing data may be difficult even if you're politically powerful.

Getting to the national math Olympiad requires access to a regional Olympiad first, then being able to travel. Smart kids from "middle of nowhere" places - exactly the kinds of kids you want to reach - are more likely to participate in the cities tournament. I wonder whether kids who were eligible for the summer camp, but did not make it there, are more of your target audience than those who participated in the camp.

 

P.S. my knowledge of this is primarily based on how things were ~35 years ago, so I could be completely off.

2Mikhail Samin
I think travel and accommodation for the winners of regional olympiads to the national one is provided by the olympiad organizers.

What about trying to use the existing infrastructure in Russia, e.g.

  • Donating to school libraries of math magnet schools (starting with "usual suspects" of 57, 2, 43 in Moscow, 239 in St Petersburg, etc, and then going down the list)?
  • Contacting competition organizers (e.g. for тургор - турнир городов, the Tournament of Towns, which tends to have a higher diversity of participants compared to the Olympiad system) and coordinating to use the books as prizes for finalists?

Besides not having to reinvent the wheel, kids might be more open to the ideas if the book comes from a local, more readily trusted party.

2Mikhail Samin
Some of these schools should have the book in their libraries. There are also risks with some of them, as the current leadership installed by the gov might get triggered if they open and read the books (even though they probably won’t). It’s also better to give the books directly to students, because then we get to have their contact details. I’m not sure how many of the kids studying there know the book exists, but the percentage should be fairly high at this point. Do you think the books being in local libraries increases how open people are to the ideas? My intuition is that the quotes on гпмрм.рф/olymp should do a lot more in that direction. Do you have a sense that it wouldn’t be perceived as an average fantasy-with-science book? We’re currently giving out the books to participants of summer conference of the maths cities tournament — do you think it might be valuable to add cities tournament winners to the list? Are there many people who would qualify, but didn’t otherwise win a prize in the national math olympiad?

Think MMORPGs - what are the chances of the simulation being like that vs a simulation with just a few special beings and the rest NPCs? Even if you say it's 50/50, then given that MMORPG-style simulations have billions of observers and "observers are special" ones only have a few, an overwhelming majority of simulated observers are actually not that special in their simulations.

3AynonymousPrsn123
Thank you Anon User. I thought a little more about the question and I now think it's basically the Presumptuous Philosopher problem in disguise. Consider the following two theories that are equally likely:

T1: I'm the only real observer
T2: I'm not the only real observer

For SIA, the ratio is 1:(8 billion / 10,000) = 800,000, so indeed, as you said above, most copies of myself are not simulated. For the SSA, the ratio is instead 10,000:1, so in most universes in the "multiverse of possibilities", I am the only real observer. So it's just a typical Presumptuous Philosopher problem. Does this sound right to you?
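
Spelling out the arithmetic in the reply above (using the same assumed counts: 8 billion observers in the MMORPG-style scenario, 10,000 in the "special beings" scenario):

```latex
% SIA weighs each theory by its number of observers, so
% P(T2) : P(T1) = 8e9 : 1e4 = 800,000 : 1.
% SSA instead conditions on being a randomly sampled observer within each
% theory, giving (per the reply above) P(T1) : P(T2) = 10,000 : 1.
\[
\frac{P_{\text{SIA}}(T_2)}{P_{\text{SIA}}(T_1)} = \frac{8\times10^{9}}{10^{4}} = 8\times10^{5},
\qquad
\frac{P_{\text{SSA}}(T_1)}{P_{\text{SSA}}(T_2)} = 10^{4}.
\]
```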

Ah, OK, then I would suggest adding it to both the title and the body to make this clear, and to not waste the time of people who are not the audience for this.

Sorry, feedback on what? Where is your resume, etc.? What information do you expect the feedback to be based on?

But here is actionable feedback - when asking people to help you for free out of the goodness of their hearts (including this post!), you need to go out of your way to make it as easy and straightforward for them as possible. When asking for feedback, provide all the relevant information collected in an easy-to-navigate package, with TLDR summaries, etc. When asking for a recommendation, introduction, etc., provide brief talking points, with more detailed information provided for context (and make it clear you do not expect them to need to review it, and that it is provided "just in case you would find it helpful").

2Nathan Helm-Burger
Ah, I was hoping for feedback from people who know me. Perhaps even some of those who were in charge of turning my application down. That's a lot to hope for, I suppose, but I do expect many of these people do read this site.

Interesting - your 40/20/40 is a great toy example to think about, thanks! And it does show that a simple instant-runoff scheme for RCV would not necessarily help that much...
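
For readers who haven't seen the parent comment (not shown here), below is a hypothetical reconstruction of how a 40/20/40 profile can defeat instant runoff; the specific ballots are my own illustrative choice, but they show the consensus middle candidate being eliminated in the first round despite beating both rivals head-to-head:

```python
from collections import Counter

# Hypothetical 40/20/40 profile: B is broadly acceptable but rarely ranked first.
ballots = (
    [["A", "B", "C"]] * 40 +   # 40%: A > B > C
    [["B", "A", "C"]] * 20 +   # 20%: B > A > C
    [["C", "B", "A"]] * 40     # 40%: C > B > A
)

def instant_runoff(ballots):
    """Repeatedly eliminate the candidate with the fewest first-choice votes
    until someone holds a strict majority of first choices."""
    ballots = [list(b) for b in ballots]
    while True:
        firsts = Counter(b[0] for b in ballots)
        total = sum(firsts.values())
        leader, votes = firsts.most_common(1)[0]
        if votes * 2 > total:
            return leader
        loser = min(firsts, key=firsts.get)
        ballots = [[c for c in b if c != loser] for b in ballots]

print(instant_runoff(ballots))  # prints "A": B is eliminated in round one
```

In this toy profile B is the Condorcet winner (preferred 60-40 over both A and C in head-to-head comparisons), yet instant runoff eliminates B first because only 20% rank B at the top.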

Anon User188

I am not sure about the median researcher. Many fields have a few "big names" that everybody knows and whose opinions have disproportionate weight.

1mattwelborn
I agree. To me, it seems that the big names mostly determine the paradigm and standard practices of their field. I think where we see the "median researcher problem" is in the auxiliary skills, such as statistics in social science. The big names may have blind spots or they just might not have anything to say in these areas. The result is that median skill and ideas fill in the gap.
Answer by Anon User10
  • We do not know how to create an AI that would not regularly hallucinate. The Values AI hallucinating would be a bad thing.
  • In fact, training an AI to follow human values more closely seems to just cause it to say what humans want to hear, while being objectively incorrect more often.
  • We do not know how to create an AI that reliably follows the programmed values outside of a training set. Your 2nd AI going off the rails outside of the training set would be bad.
1Anon User
* Finally, we wouldn't get a second try - any bugs in your AIs, particularly the 2nd one, are very likely to be fatal. We do not know how to create your 2nd AI in such a way that the very first time we turn it on, all the bugs have already been found and fixed.
1Anon User
* Also, human values, at least the ones we know how to consciously formulate, are pretty fragile - they are things that we want weak/soft optimization for, but that would actually go very badly if a superhuman AI hard-optimized them. We do not know how to capture human values in a way that things would not go terribly wrong if the optimization is cranked to the max, and your Values AI is likely to not help enough, as we would not know what missing inputs we are failing to provide it (because they are aspects of our values that would only become important in some future circumstances we cannot even imagine today).

Do you care about what kind of peace it is, or just that there is some sort of peace? If the latter, I might agree with you that Trump is more likely to quickly get us there. For the former, Trump is a horrible choice. One of the easiest ways for a US President to force a peace agreement in Ukraine is probably to privately threaten Ukrainians to withhold all support unless they quickly agree to Russian demands. IMHO, Trump is very likely to do something like that. The huge downside is that while this creates a temporary peace, it would encourage Russia to go for it... (read more)

Ability to predict how the outcome depends on inputs + ability to compute the inverse of the prediction formula + ability to select certain inputs => ability to determine the output (within the limits of what influencing the inputs can accomplish). The rest is just an ontological difference about what language to use to describe this mechanism. I know that if I place a kettle on a gas stove and turn on the flame, I will get boiling water, and we colloquially describe this as boiling the water. I do not know all the intricacies of the processes inside the w... (read more)

Perhaps you are missing the point of what I am saying here somewhat? The issue is not the scale of the side-effect of a computation, it's the fact that the side-effect exists, so any accurate mathematical abstraction of an actual real-world ASI must be prepared to deal with solving a self-referential equation.

2Roko
But it's not that: it's a mathematical abstraction of a disembodied ASI that lacks any physical footprint.

I think it's important to further refine the accuracy criterion - another very important criterion (particularly given today's state of US politics) is how conducive the voting system is to consensus-building vs polarization. In other words, not only does pure accuracy matter, but the direction of the error as well. That is, an error towards a more extreme candidate is IMHO a lot more harmful than an equally sized error towards a more consensus candidate.

It seems you are overlooking the notion of superintelligence being able to compute through your decisionmaking process backwards. Yes, it's you who would be making the decision, but SI can tell you exactly what you need to hear in order for your decision to result in what it wants. It is not going to try to explain how it is manipulating you, it will not try to prove to you it is manipulating you correctly - it will just manipulate you. Internally, it may have a proof, but what reason would it have to show it to you? And if placed into some very constraine... (read more)

3Vladimir_Nesov
Ability to resist a proof of what your behavior will be even to the point of refuting its formal correctness (by determining its incorrectness with your own decisions and turning the situation counterfactual) seems like a central example of a superintelligence being unable to decide/determine (as opposed to predict) what your decisions are. It's also an innocuous enough input that doesn't obviously have to be filtered by weak agent's membrane. In any case, to even discuss how a weak agent behaves in a superintelligent world, it's necessary to have some notion of keeping it whole. Extreme manipulation can both warp the weak agent and fail to elicit their behavior for other possible inputs. So this response to another comment seems relevant. Another way of stating this, drawing on the point about physical bodies thought of as simulations of some abstract formulation of a person, is to say that an agent by itself is defined by its own isolated abstract computation, which includes all membrane-permissible possible observations and resulting behaviors. Any physical implementation is then a simulation of this abstract computation, which can observe it to some extent, or fail to observe it (when the simulation gets sufficiently distorted). When an agent starts following dictates of external inputs, that corresponds to the abstract computation of the agent running other things within itself, which can be damaging to its future on that path of reflection depending on what those things are. In this framing, normal physical interaction with the external world becomes some kind of acausal interaction between the abstract agent-world (on inputs where the physical world is observed) and the physical world (for its parts that simulate the abstract agent-world).

Your proof actually fails to fully account for the fact that any ASI must actually exist in the world. It would affect the world other than just through its outputs - e.g. if its computation produces heat, that heat would also affect the world. Your proof does not show that the sum of all effects of the ASI on the world (both intentional and side-effects of it performing its computation) could be aligned. Further, real computation takes time - it's not enough for the aligned ASI to produce the right output, it also needs to produce it at the right time. You did not prove that to be possible.

2Roko
Yes, but again this is a mathematical object so it has effectively infinitely fast compute. But I can also prove that FA:BGROW - FA for "functional approximation" - will require less thinking time than human brains.
2Roko
It's a mathematical existence proof that the ASI exists as a mathematical object, so this part is not necessary. However, I can also argue quite convincingly that an ASI similar to LT:BGROW (let's call it FA:BGROW - FA for "functional approximation") must easily fit in the world and also emit less waste heat than a team of human advisors.
Answer by Anon User3-2

The 3rd paragraph of the Wikipedia page you linked to seems to answer the very question you are asking:

Maximal lotteries do not satisfy the standard notion of strategyproofness [...] Maximal lotteries are also nonmonotonic in probabilities, i.e. it is possible that the probability of an alternative decreases when a voter ranks this alternative up

2Donald Hobson
That isn't a proof, because the Wikipedia result is saying there exist situations that break strategy-proofness. And these elections are a subset of maximal lotteries. So it's possible that there exist failure cases, but this isn't one of them.

If your AGI uses a bad decision theory T it would immediately self-modify to use a better one.

Nitpick - while probably a tiny part of the possible design space, there are obvious counterexamples to that, including when using T results in the AGI [incorrectly] concluding T is the best, or otherwise not realizing this self-modification is for the best.

Answer by Anon User10

After finishing any task/subtask and before starting the next one, go up the hierarchy at least two levels, and ask yourself - is moving onto the next subtask still the right way to achieve the higher-level goal, and is it still the highest priority thing to tackle next. Also do this anytime there is a significant unexpected difficulty/delay/etc.

Periodically (with period defined at the beginning) do this for the top-level goal regardless of where you are in the [sub]tasks.

There are so many side-effects this overlooks. Winning $110 complicates my taxes by more than $5. In fact, once gambling winnings taxes are considered, the first bet will likely have a negative EV!
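
A hedged back-of-the-envelope version of this point, assuming the underlying offer is the familiar 50/50 bet that wins $110 or loses $100, and assuming a 25% marginal tax rate on gambling winnings with no offsetting deduction for the loss (all of these numbers are my assumptions, not from the original post):

```latex
% Pre-tax: EV = 0.5(110) - 0.5(100) = +$5.
% With a 25% tax on the $110 win and no deduction for the $100 loss:
% EV = 0.5(110)(1 - 0.25) - 0.5(100) = 41.25 - 50 = -$8.75.
\[
\mathrm{EV}_{\text{pre-tax}} = 0.5 \cdot 110 - 0.5 \cdot 100 = +\$5,
\qquad
\mathrm{EV}_{\text{after-tax}} \approx 0.5 \cdot 110 \cdot (1 - 0.25) - 0.5 \cdot 100 = -\$8.75.
\]
```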

Your last figure should have behaviours on the horizontal axis, as this is what you are implying - you are effectively saying that any intelligence capable of understanding "I don't know what I don't know" will only have power-seeking behaviours, regardless of what its ultimate goals are. With that correction, your third figure is not incompatible with the first.

1Donatas Lučiūnas
I agree. But I want to highlight that goal is irrelevant for the behavior. Even if the goal is "don't seek the power" AGI still would seek the power.
Answer by Anon User32

I buy your argument that power seeking is a convergent behavior. In fact, this is a key part of many canonical arguments for why an unaligned AGI is likely to kill us all.

But, on the meta level you seem to argue that this is incompatible with the orthogonality thesis? If so, you may be misunderstanding the thesis - the ability of an AGI to have arbitrary utility functions is orthogonal (pun intended) to what behaviors are likely to result from those utility functions. The former is what the orthogonality thesis claims, but your argument is about the latter.

-2Donatas Lučiūnas
Orthogonality Thesis: it basically says that intelligence and goals are independent (images from "A caveat to the Orthogonality Thesis"). While I claim that any intelligence that is capable of understanding "I don't know what I don't know" can only seek power (alignment is impossible). As I understand it, you say that there are Goals on one axis and Behaviors on the other axis. I don't think the Orthogonality Thesis is about that.

Your principles #3 and #5 are in a weak conflict - generating hypotheses without having enough information to narrow the space of reasonable hypotheses would too often lead to false positives. When faced with an unknown novel phenomenon, one ought to collect information first, including collecting experimental data without a fixed hypothesis, before starting to formulate any hypotheses.

Answer by Anon User81

I'm not involved in politics or the military action, but I can't help but feel implicated by my government's actions as a citizen here

Please consider the implications of not only being a citizen, but also a taxpayer, and a customer of other taxpayers. Through taxes, your work indirectly supports the Russian war effort.

I'm interested in building global startups,

If you succeed while still in Russia, what is stopping those with powerful connections from simply taking over from you? From what you say, it does not sound like you have connections of your own t... (read more)

4gwern
Or even if you leave partway... See: Yandex.

Option 5: the questioner is optimizing a metric other than what appears to be the post's implicit "get max info with a minimal number of questions, ignoring communication overhead", which is IMHO a weird metric to optimize to begin with - not only does it not take the length/complexity of each question into account, but it also ignores things like maintaining the answerer's willingness to continue answering questions, not annoying the answerer, and ensuring proper context so that a question is not misunderstood. And this is not even taking into account the possibility that while the questioner does care about getting the information, they might also simultaneously care about other things.

Looks like a good summary of their current positions, but how about willingness to update their positions and act decisively based on actual evidence/data? DeSantis's history of anti-mask/anti-vaccine stances has to be taken into account, perhaps? Same for Kennedy?

4Rafael Harth
If someone is currently on board with AGI worry, flexibility is arguably less important (→ Kennedy), but for people who don't seem to have strong stances so far (→ Haley, DeSantis), I think it's reasonable to argue that general sanity is more important than the noises they've made on the topic so far. (Afaik Biden hasn't said much about the topic before the executive order.) Then again, you could also argue that DeSantis' comment does qualify as a reasonably strong stance.

I am not working on X because it's so poorly defined that I dread needing to sort it out.

I am not working on X because I am at a loss for where to start.

I feel like admiring problem X and considering all the ways I could theoretically start solving it, so I am not actually doing anything to solve it.

For a professor at a top university, this would be easily 60+ hrs/week. https://www.insidehighered.com/news/2014/04/09/research-shows-professors-work-long-hours-and-spend-much-day-meetings claims 61hrs/week is average, and something like 65 for a full Professor. The primary currency is prestige, not salary, and prestige is generated by research (high-profile grants, high-profile publications, etc), not teaching. For teaching, they would likely care a lot more about advanced classes for students getting closer to potentially joining their research team, and... (read more)

4Orual
Yeah, the joke for professors is you can work any 60-70 hours of the week you want, so long as you show up for lectures, office hours, and meetings. It's got different sorts of pressures to a corporate or industry position, but it's not low-pressure. And if you're not at the kind of university that has a big stable of TAs handling a lot of the grunt work, you're gonna have a number of late nights marking exams and papers or projects every semester, unless you exclusively give students multiple-choice questions. Also, getting to the point of being a tenured professor is a process in and of itself. Not getting tenure means you likely get laid off. One other thing a lot of people are missing here is that most "professors" at universities today are not tenured, or even tenure-track. They're adjuncts or sessional lecturers, who are paid more along the lines of $70k a year (often less) for what is in practice a similar workload with similar education requirements, except consisting entirely of teaching, with literal zero job security. Sessional lecturers sometimes find out only a couple of days or weeks in advance what they are being asked to teach for the semester, if anything.

So what system selects the best leader out of the entire population?

None - as Churchill said, democracy is the worst form of Government except for all those other forms that have been tried from time to time. Still, one should be realistic when explaining the benefits.

One theory of democracy’s purpose is to elect the “right” leaders. In this view, questions such as “Who is best equipped to lead this nation?” have a correct answer, and democracy is merely the most effective way to find that answer.

I think this is a very limiting view of instrumental goals of democracy. First, democracy has almost no chance of selecting the best leader - at best, it could help select a better one out of a limited set of options. Second, this ignores a key, IMHO the key, feature of democracy - keeping leaders accountable after they are ... (read more)

2TAG
So what system selects the best leader out of the entire population?

I think the use of the term "AGI" without a specific definition is causing an issue here - IMHO the crux of the matter is the difference between the progress in average performance vs worst-case performance. We are having amazing progress in the former, but struggling with the latter (LLM hallucinations, etc). And robotaxis require an almost-perfect performance.

1[deactivated]
That begs the question: do AGIs not require an almost-perfect performance?

This makes assumptions that make no sense to me. Auto-GPT is already not passively safe, and there is no reason to be sure LLMs would remain myopic as they are scaled. LLMs are inscrutable matrices of floating-point numbers that we are barely learning how to understand and interpret. We have no reliable way to predict when LLMs might hallucinate or misbehave in some other way. There is also no "human level" - LLMs are way faster than humans and way more scalable than humans - there is no way to get LLMs that are as good as humans without having something that's way better than humans along a huge number of dimensions.

2Logan Zoellner
I love it when comments include arguments I have already raised in my "Some obvious objections to this argument" section. I agree with you that AutoGPT is not passively safe/myopic.  However as I pointed out, AI agents "only optionally mitigate myopia and passive safety."  If myopia and passive safety are critical safety guarantees it's easy to include them in AI Agents.   This simply isn't true.  I would encourage you to keep up to date with the latest research on AI interpretability.  LLMs are highly interpretable.  Not only can we understand their world models, we can also detect whether or not they believe a statement to be true or whether or not they are lying. More importantly, LLMs are much easier to interpret than biological systems (the product of evolution).  The argument here is that we should scale up (relatively) easy-to-interpret LLMs now before the arrival of evolution-based AIs. I'm not sure what point you're trying to make here.  The question of importance isn't whether Deep-Learning models will ever be exactly human-level.  The question is whether we can use them to safely augment human intelligence in order to solve the Alignment Problem. I agree that LLMs are super-human on some dimensions (fact recall) and inferior to humans on others (ability to play connect 4) and therefore if an LLM (or AI-agent) was at-least human-level on all dimensions, it would naturally be super-human on at least some of them.  This fact alone doesn't tell us whether or not LLMs are safe to use.   I think that we have very strong reasons to believe that a GPT-N style architecture would be highly safe and, more importantly, that it would be far safer and more interpretable than an equally-powerful AI modeled after the human brain, or chosen randomly by evolution.
Anon User1311

As a few commenters have already pointed out, this "strategy" completely fails in step 2 ("Specify safety properties that we want all AIs to obey"). Even for a "simple" property you cite, "refusal to help terrorists spread harmful viruses", we are many orders of magnitude of descriptive complexity away from knowing how to state it as a formal logical predicate on the I/O behavior of the AI program. We have no clue how to define "virus" as a mathematical property of the AI sensors in a way that does not go wrong in all kinds of corner cases, even less clu... (read more)

3PeterMcCluskey
I expect that we're going to have to rely on some neural networks regardless of how we approach AI. This paper guides us to be more strategic about what reliance to put on which neural networks.