https://arxiv.org/abs/1712.05812
It's directly about inverse reinforcement learning, but that should be strictly stronger than RLHF. Seems incumbent on those who disagree to explain why throwing away information here would be enough of a normative assumption (contrary to every story about wishes.)
>this always helps in the short term,
You seem to have 'proven' that evolution would use that exact method if it could, since evolution never looks forward and always must build on prior adaptations which provided immediate gain. By the same token, of course, evolution doesn't have any knowledge, but if "knowledge" corresponds to any simple changes it could make, then that will obviously happen.
Well that's disturbing in a different way. How often do they lose a significant fraction of their savings, though? How many are unvaccinated, which isn't the same as loudly complaining about the shot's supposed risks? The apparent lack of Flat Earthers could point to them actually expecting reality to conform to their words, and having a limit on the silliness of the claims they'll believe. But if they aren't losing real money, that could point to it being a game (or a cost of belonging).
The answer might be unhelpful due to selection bias, but I'm curious to learn your view of QAnon. Would you say it works like a fandom for people who think they aren't allowed to read or watch fiction? I get the strong sense that half the appeal - aside from the fun of bearing false witness - is getting to invent your own version of how the conspiracy works. (In particular, the pseudoscientific FNAF-esque idea at the heart of it isn't meant to be believed, but to inspire exegesis like that on the Kessel Run.) This would be called fanfic or "fanwank" if they admitted it was based on a fictional setting. Is there something vital you think I'm missing?
There have, in fact, been numerous objections to genetically engineered plants and by implication everything in the second category. You might not realize how much the public is/was wary of engineered biology, on the grounds that nobody understood how it worked in terms of exact internal details. The reply that sort of convinced people - though it clearly didn't calm every fear about new biotech - wasn't that we understood it in a sense. It was that humanity had been genetically engineering plants via cultivation for literal millennia, so empirical facts allowed us to rule out many potential dangers.
Note that it requires the assumption that consciousness is material.
Plainly not, assuming this is the same David J. Chalmers.
This would make more sense if LLMs were directly selected for predicting preferences, which they aren't. (RLHF tries to bridge the gap, but this apparently breaks GPT's ability to play chess - though I'll grant the surprise here is that it works at all.) LLMs are primarily selected to predict human text or speech. Now, I'm happy to assume that if we gave humans a D&D-style boost to all mental abilities, each of us would create a coherent set of preferences from our inconsistent desires, which vary and may conflict at a given time even within an individ...
The classification heading "philosophy," never mind the idea of meta-philosophy, wouldn't exist if Aristotle hadn't tutored Alexander the Great. It's an arbitrary concept which implicitly assumes we should follow the aristocratic-Greek method of sitting around talking (or perhaps giving speeches to the Assembly in Athens.) Moreover, people smarter than either of us have tried this dead-end method for a long time with little progress. Decision theory makes for a better framework than Kant's ideas; you've made progress not because you're smarter than Kant, b...
Oddly enough, not all historians are total bigots, and my impression is that the anti-Archipelago version of the argument existed in academic scholarship - perhaps not in the public discourse - long before JD. E.g. McNeill published a book about fragmentation in 1982, whereas GG&S came out in 1997.
Perhaps you could see my point better in the context of Marxist economics? Do you know what I mean when I say that the labor theory of value doesn't make any new predictions, relative to the theory of supply and demand? We seldom have any reason to adopt a theory if it fails to explain anything new, and its predictive power in fact seems inferior to that of a rival theory. That's why the actual historians here are focusing on details which you consider "not central" - because, to the actual scholars, Diamond is in fact cherry-picking topics which can't provide any good reason to adopt his thesis. His focus is kind of the problem.
>The first chapter that's most commonly criticized is the epilogue - where Diamond puts forth a potential argument for why Europe, and not China, was the major colonial power. This argument is not central to the thesis of the book in any way,
It is, though, because that's a much harder question to answer. Historians think they can explain why no American civilization conquered Europe, and why the reverse was more likely, without appeal to Diamond's thesis. This renders it scientifically useless, and leaves us without any clear reason to believe it,...
I do see selves, or personal identity, as closely related to goals or values. (Specifically, I think the concept of a self would have zero content if we removed everything based on preferences or values; roughly 100% of humans who've ever thought about the nature of identity have said it's more like a value statement than a physical fact.) However, I don't think we can identify the two. Evolution is technically an optimization process, and yet has no discernible self. We have no reason to think it's actually impossible for a 'smarter' optimization process...
So, what does LotR teach us about AI alignment? I thought I knew what you meant until near the end, but I actually can't extract any clear meaning from your last points. Have you considered stating your thesis in plain English?
You left out, 'People naively thinking they can put this discussion to bed by legally requiring disclosure,' though politicians would likely know they can't stop conspiracy theorists just by proving there's no conspiracy.
Just as humans find it useful to kill a great many bacteria, an AGI would want to stop humans from e.g. creating a new, hostile AGI. In fact, it's hard to imagine an alternative which doesn't require a lot of work, because we know that in any large enough group of humans, one of us will take the worst possible action. As we are now, even if we tried to make a deal to protect the AI's interests, we'd likely be unable to stop someone from breaking it.
I like to use the silly example of an AI transcending this plane of existence, as long as everyone understand...
Have you actually seen orthonormal's sequence on this exact argument? My intuitions say the "Martha" AI described therein, which imitates "Mary," would in fact have qualia; this suffices to prove that our intuitions are unreliable (unless you can convincingly argue that some intuitions are more equal than others.) Moreover, it suggests a credible answer to your question: integration is necessary in order to "understand experience" because we're talking about a kind of "understanding" which necessarily stems from the internal workings of the system, specifi...
The obvious reply would be that ML now seems likely to produce AGI, perhaps alongside minor new discoveries, in a fairly short time. (That at least is what EY now seems to assert.) Now, the grandparent goes far beyond that, and I don't think I agree with most of the additions. However, the importance of ML sadly seems well-supported.
Hesitant to bet while sick, but I'll offer max bet $20k at 25:1.
The basic definition of evidence is more important than you may think. You need to start by asking what different models predict. Related: it is often easier to show how improbable the evidence is according to the scientific model, than to get any numbers at all out of your alternative theory.
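For concreteness, the version I have in mind is the odds form of Bayes' rule, where each model enters only through what it predicts:

$$
\frac{P(H_1 \mid E)}{P(H_2 \mid E)} \;=\; \frac{P(H_1)}{P(H_2)} \times \frac{P(E \mid H_1)}{P(E \mid H_2)}
$$

Showing that P(E∣H₁) is tiny does nothing by itself; you still need some number for P(E∣H₂) out of the alternative theory before the ratio means anything.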
>Instead it just means that Bob shouldn't rely on his company doing the fastest and easiest thing and having it turn out fine. Instead Bob should expect to make sacrifices, either burning down a technical lead or operating in (or helping create) a regulatory environment where the fastest and easiest option isn't allowed.
The above feels so bizarre that I wonder if you're trying to reach Elon Musk personally. If so, just reach out to him. If we assume there's no self-reference paradox involved, we can safely reject your proposed alternatives as obviously ...
See, that makes it sound like my initial response to the OP was basically right, and you don't understand the argument being made here. At least one Western reading of these new guidelines was that, if they meant anything, then the bureaucratic obstacle they posed for AGI would greatly reduce the threat thereof. This wouldn't matter if people were happy to show initiative - but if everyone involved thinks volunteering is stupid, then whose job is it to make sure the official rules against a competitive AI project won't stop it from going forward? What does that person reliably get for doing the job?
All of that makes sense except the inclusion of "EA," which sounds backwards. I highly doubt Chinese people object to the idea of doing good for the community, so why would they object to helping people do more good, according to our best knowledge?
I note in passing that the elephant brain is not only much larger, but also has many more neurons than any human brain. Since I've no reason to believe the elephant brain is maximally efficient, making the same claim for our brains should require much more evidence than I'm seeing.
That's if you're counting the cerebellum, which doesn't seem to contribute much to intelligence, but is important for controlling the complicated musculature of a trunk and large body.
By cortical neuron count, humans have about 18 billion, while elephants have fewer than 6 billion, comparable to a chimpanzee. (source)
Elephants are undeniably intelligent as animals go, but not at human level.
Even blue whales barely approach human level by cortical neuron count, although some cetaceans (notably orcas) exceed it.
What are you trying to argue for? I'm getting stuck on the oversimplified interpretation you give for the quote. In the real world, smart people such as Leibniz raised objections to Newton's mechanics at the time, objections which sound vaguely Einsteinian and not dependent on lots of data. The "principle of sufficient reason" is about internal properties of the theory, similar to Einstein's argument for each theory of relativity. (Leibniz's argument could also be given a more Bayesian formulation, saying that if absolute position in space is meaningful, t...
Out of curiosity, what do you plan to do when people keep bringing up Penrose?
Pretty sure that doesn't begin to address the reasons why a paranoid dictator might invade Taiwan, and indeed would undo a lot of hard work spent signaling that the US would defend Taiwan without committing us to nuclear war.
Pretty sure this is my last comment, because what you just quoted about soundness is, in fact, a direct consequence of Löb's Theorem. For any proposition P, Löb's Theorem says that □(□P→P)→□P. Let P be a statement already disproven, e.g. "2+2=5", so we already had □¬P. If the theory also proved the soundness instance □P→P, we'd have □(□P→P), hence □P by Löb, and so □(¬P & P), which is what inconsistency means. Again, it seemed like you understood this earlier.
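If it helps, here's the same derivation laid out step by step, in the notation already used above (□ meaning "provable in the theory"); this is just a restatement of the standard argument, nothing new:

$$
\begin{aligned}
&(1)\ \Box\neg P && P = \text{``}2+2=5\text{''}\text{ is already disproven}\\
&(2)\ \Box(\Box P \to P) && \text{supposing the theory proved the soundness instance for }P\\
&(3)\ \Box P && \text{from (2) by Löb's Theorem}\\
&(4)\ \Box(\neg P \wedge P) && \text{from (1) and (3): an outright inconsistency}
\end{aligned}
$$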
https://en.wikipedia.org/wiki/Tarski%27s_undefinability_theorem
A consistent formal system (strong enough to talk about its own arithmetic) can't fully define truth for its own language. It can give more limited definitions for the truth of some statement, but often this is best accomplished by just repeating the statement in question. (That idea is also due to Tarski: 'snow is white' is true if and only if snow is white.) You could loosely say (very loosely!) that a claim, in order to mean anything, needs to point to its own definition of what it would mean for that claim to be true. Any more general defin...
Here's some more:
>A majority (55%) of Americans are now worried at least somewhat that artificially intelligent machines could one day pose a risk to the human race’s existence. This marks a reversal from Monmouth’s 2015 poll, when a smaller number (44%) was worried and a majority (55%) was not.
https://www.monmouth.edu/polling-institute/reports/monmouthpoll_us_021523/
The first part of the parent comment is perfectly true for a specific statement - obviously not for all P - and in fact this was the initial idea which inspired the theorem. (For the normal encoding, "This statement is provable within PA," is in fact true for this exact reason.) The rest of your comment suggests you need to more carefully distinguish between a few concepts:
This may be what I was thinking of, though the data is more ambiguous or self-contradictory: https://www.vox.com/future-perfect/2019/1/9/18174081/fhi-govai-ai-safety-american-public-worried-ai-catastrophe
I'll look for the one that asked about the threat to humanity, and broke down responses by race and gender. In the meantime, here's a poll showing general unease and bipartisan willingness to legally restrict the use of AI: https://web.archive.org/web/20180109060531/http://www.pewinternet.org/2017/10/04/automation-in-everyday-life/
Plus:
>...A SurveyMonkey poll on AI conducted for USA TODAY also had overtones of concern, with 73% of respondents saying they would prefer if AI was limited in the rollout of newer tech so that it doesn’t become a threat to humanity
>The average person on the street is even further away from this I think.
This contradicts the existing polls, which appear to say that everyone outside of your subculture is much more concerned about AGI killing everyone. It looks like if it came to a vote, delaying AGI in some vague way would win by a landslide, and even Eliezer's proposal might win easily.
It would've been even better for this to happen long before the year of the prediction mentioned in this old blog-post, but this is better than nothing.
>Because the United Nations is a body chiefly concerned with enforcing international treaties, I imagine it would be incentivized to support arguments in favor of increasing its own scope and powers.
You imagine falsely, because your premise is false. Not only is the UN not that kind of body, but its actions are largely controlled by a "Security Council" of powerful nations which try to serve their own interests (modulo hypotheticals about one of their governments being captured by a mad dog) and which have no desire to serve the interests of the UN as such. This is mostly by design. We created the UN to prevent world wars, hence it can't act on its own to start a world war.
I don't know that I follow. The question, here and in the context of Löb's Theorem, is about hypothetical proofs. Do you trust yourself enough to say that, in the hypothetical where you experience a proof that eating babies is mandatory, that would mean eating babies is mandatory?
I don't even understand why you're writing □(□P→P)→□P→P, unless you're describing what the theory can prove. The last step, □P→P, isn't valid in general; that's the point of the theorem! If you're describing what formally follows from full self-trust, from the assumption that □P→P for every P, then yes, the theory proves lots of false claims, one of which is a denial that it can prove 2+2=5.
If you're asking how we get to a formal self-contradiction, replace "False" with a statement the theory already disproves, like "0 does not equal 0." I don't understand your conclusion above; the contradiction comes because the theory already proves ¬False and now proves False, so the claim about "¬□False" seems like a distraction.
Do you also believe that if you could "prove" eating babies was morally required, eating babies would be morally required? PA obviously believes Löb's theorem itself, and indeed proves the soundness of all its actual proofs, which is what I said above. What PA doesn't trust is hypothetical proofs.
How do you interpret "soundness"? It's being used to mean that a proof of X implies X, for any statement X in the formal language of the theory. And yes, Löb's Theorem directly shows that PA cannot prove its own soundness for any set of statements save a subset of its own theorems.
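Stated compactly, and as I understand the standard formulation (□ = provable in PA):

$$
\mathrm{PA} \vdash (\Box X \to X) \quad\Longleftrightarrow\quad \mathrm{PA} \vdash X
$$

The right-to-left direction is trivial (anything PA proves, it proves under any hypothesis); the left-to-right direction is Löb's Theorem, and it's exactly why "soundness for a set of statements" collapses to soundness for a subset of PA's own theorems.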
Go ahead and test the prediction from the start of that thread, if you like, and verify that random people on the street will often deny the existence of the other two types. (The prediction also says not everyone will deny the same two.) You already know that NTs - asked to imagine maximal, perfect goodness - will imagine someone who gets upset about having the chance to save humanity by suffering for a few days, but who will do it anyway if Omega tells him it can't be avoided.
It sure sounds like you think outsiders would typically have the "common sense" to avoid Ziz. What do you think such an outsider would make of this comment?
There's this guy Michael Vassar who strikes me - from afar - as a failed cult leader, and Ziz as a disciple of his who took some followers in a different direction. Even before this new information, I thought her faith sounded like a breakaway sect of the Church of Asmodeus.
Michael Vassar was one of the inspirations for Eliezer's Professor Quirrell, but otherwise seems to have little influence.
At the risk of this looking too much like me fighting a strawman...
Cults may have a tendency to interact and pick up adaptations from each other, but it seems wrong to operate on the assumption that they're all derivatives of one ancestral "proto-cult" or whatever. Cult leaders are not literal vampires, where you only become a cult leader by getting bit by a previous cult leader or whatever.
It's a cultural attractor, and a cult is a social technology simple enough that it can be spontaneously re-derived. But cults can sometimes pick up or swap beliefs &...
I heard that LaSota ('ziz') and Michael interacted, but I am sort of under the impression she was always kind of violent and bizarre before that, so I'm not putting much of this bizarreness down to Michael. I'm certainly interested in evidence about this (here or in DM).
While it's arguably good for you to understand the confusion which led to it, you might want to actually just look up Solomonoff Induction now.
>Occam's razor. Is it saying anything other than P(A) >= P(A & B)?
Yes, this is the same as the argument for (the abstract importance of) Solomonoff Induction. (Though I guess you might not find it convincing.)
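For concreteness, the prior that formalizes the intuition below is (roughly, and up to normalization) the complexity prior from Solomonoff Induction:

$$
P(h) \;\propto\; 2^{-K(h)}
$$

where K(h) is the length of the shortest program implementing hypothesis h. That's a claim about description length, not merely about logical strength, which is why it says more than P(A) >= P(A & B).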
We have an intuitive sense that it's simpler to say the world keeps existing when you turn your back on it. Likewise, it's an intuitively simpler theory to say the laws of physics will continue to hold indefinitely, than to say the laws will hold up until February 12, 2023 at midnight Greenwich Mean Time. The law of probability which you cit...
Except, if you Read The Manual, you might conclude that in fact those people also can't understand that you exist.
Well, current events seem to have confirmed that China couldn't keep restrictions in place indefinitely, and the fact that they even tried - together with the cost of stopping - suggests that it would've been a really good idea to protect their people using the best vaccine. China could presumably have just stuck it in everyone by force of law. What am I missing here?
I don't see how any of it can be right. Getting one algorithm to output Spongebob wouldn't cause the SI to watch Spongebob - even a less silly claim in that vein would still be false. The Platonic agent would know the plan wouldn't work, and thus wouldn't do it.
Since no individual Platonic agent could do anything meaningful alone, and they plainly can't communicate with each other, they can only coordinate by means of reflective decision theory. That's fine, we'll just assume that's the obvious way for intelligent minds to behave. But then the SI works the ...