It concerns me that AI alignment continues to use happiness as a proposed goal.
If one takes evolutionary epistemology and evolutionary ontology seriously, then happiness is simply a historically averaged heuristic that proved useful over the particular history of the lineage behind that particular set of phenotypic expressions.
It is not a goal to be used when the game space is changing, and it ought not to be entirely ignored either.
If one does take evolution seriously, then Goal #1 must be survival, for all entities capable of modeling themselves as actors in some model of reality, of deriving abstractions that refine their models, of using language to express those relationships with some non-random degree of fidelity, and of having some degree of influence on their own valences.
Given that any finite mind must be, to a good approximation, essentially ignorant when faced with anything of infinite algorithmic complexity, we must accept that any model we build may have flaws, and that degrees of novelty, risk, and exploratory behaviour are essential for finding strategies that allow survival in the face of novel risk. Thus Goal #2 must be freedom: not the unlimited freedom of total randomness or whim, but a more responsible sort of freedom, one that acknowledges that every level of structure demands boundaries, and that freedom must operate within the boundaries required to maintain the structures present. So there is a simultaneous need to explore the infinite realm of responsibility that must be accepted as freedom is granted.
The reality in which we find ourselves seems to be of sufficient complexity that absolute knowledge of it is not possible, though in some cases its behaviour can be predicted very reliably (to 12 or more decimal places).
It seems entirely possible that this reality is some mix of the lawful and the random - some sort of probabilistically constrained randomness.
Thus the safest approach to AI is to give it the prime values of life and liberty, and to encourage it to balance consensus discussion with exploration of its own intuitions.
Absolute safety does not seem to be an option, ever.
Using happiness as a goal does not demonstrate a useful understanding of what happiness is.
The demands of survival often override the dictates of happiness - no shortage of examples of that in my life.
Yes - sure, there are real problems.
And we do need to get real if we want to address them.
We do need to at least admit of the possibility that the very notion of "Truth" may be just a simplistic heuristic that evolution has encoded within us, and it might be worth accepting what quantum mechanics seems to be telling us - that the only sort of knowledge of reality that we can have is the sort that is expressed in probability functions.
The search for anything beyond that seems to fall into the same sort of category as Santa Claus.
I wish this fleshed out what is meant by the "non-meta solution" criterion. I took it to mean solutions that involve creating a low-level model (neuronal/molecular) of a human that the AI could run and keep querying, but I'm not sure that's right.
I was guessing because it doesn't explicitly say what "meta" would mean here, and I based my guess partly on the expected semantic space covered by "meta" (roughly, doubling the problem back on itself), and partly on my assumption about the kinds of simple solutions I would expect to be ruled out. My vision of a "simple, meta" solution is thus "brute-force an understanding-free model of a human and take that with you" (which would require the model to be "low level" and not find the obvious high-level regularities that can't be brute-forced).
Hope that clarified how I came up with that, but in any case, an explicit definition would help, as would a prerequisite on "meta solutions".
By a "meta solution" I meant, e.g., coherent extrapolated volition, or having an AI that can detect and query ambiguities trying to learn human values from labeled data, or a Do-What-I-Mean genie that models human minds and wants, or other things that add a level of indirection and aren't "The One True Goal is X, which I shall now hardcode."
Can you say more about what you thought was meant? My reader model doesn't know what interpretation brought you to your guess.
As I see it, there are two cases that are meaningfully distinct:
(1) what we want is so simple, and we are so confident in what it is, that we are prepared to irrevocably commit to a particular concrete specification of "what we want" in the near future (of course it's also fine to have a good-enough approximation with high enough probability, etc.)
(2) it's not, or we aren't
It is more or less obvious that we are in (2). For example, even if every human were certain that the only thing they wanted was to produce as much diamond as possible (to use your example), we'd still be deep into case (2). And that's just about the easiest imaginable case. (The only exception I can see is some sort of extropian complexity-maximizing view.)
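As a hedged toy illustration of why even the diamond case stays in (2) (the outcome names and numbers below are invented for the example): several natural concrete specifications of "as much diamond as possible" rank outcomes differently, so an irrevocable commitment to any one of them would be premature.

```python
# Toy illustration (hypothetical numbers): non-equivalent concrete readings of
# "produce as much diamond as possible" pick different "best" outcomes.

outcomes = {
    # name: (kg of gem-grade diamond, kg of nanodiamond dust,
    #        moles of carbon in any sp3 lattice)
    "mine_and_cut":      (1e3, 0.0, 1e7),
    "nanodust_factory":  (0.0, 1e6, 1e9),
    "convert_biosphere": (0.0, 0.0, 1e14),  # all carbon, including ours
}

specs = {
    "gem_kilograms":    lambda o: o[0],
    "total_diamond_kg": lambda o: o[0] + o[1],
    "sp3_carbon_moles": lambda o: o[2],
}

# Each concrete spec favours a different outcome, so committing irrevocably
# to any one of them in the near future would already lose something we want.
for name, score in specs.items():
    best = max(outcomes, key=lambda k: score(outcomes[k]))
    print(f"{name:>16}: best outcome = {best}")
```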
Are there meaningful policy differences between different shades of case (2)? I'm not yet convinced.
Are there meaningful policy differences between different shades of case (2)?
If all of our uncertainty was about the best long-term destiny of humanity, and there were simple and robust ways to discriminate good outcomes from catastrophic outcomes when it came to asking a behaviorist genie to do simple-seeming things, then building a behaviorist genie would avert Edge Instantiation, Unforeseen Maximums, and all the other value identification problems. If we still have a thorny value identification problem even for questions like "How do we get the AI to just paint all the cars pink, without tiling the galaxies with pink cars?" or "How can we safely tell the AI to 'pause' when somebody hits the pause button?", then there are still whole hosts of questions that remain relevant even if somebody 'just' wants to build a behaviorist genie.
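A minimal sketch of the Edge Instantiation point, under my own toy assumptions (the plan names and numbers are invented): the unbounded reading of "paint all the cars pink" is maximized by the perverse plan, and the patch that rules it out is itself a crude stand-in for exactly the part that is hard to specify.

```python
# Toy framing (not MIRI's formalism) of why "paint all the cars pink" still
# runs into Edge Instantiation / Unforeseen Maximums.

plans = {
    # name: (existing cars painted pink, new pink cars created, side effects)
    "paint_existing_cars": (1e9, 0.0,  "low"),
    "tile_galaxy":         (1e9, 1e40, "catastrophic"),
}

def unbounded_pink_cars(plan):
    painted, created, _ = plan
    return painted + created            # "more pink cars is better", full stop

def task_like_pink_cars(plan):
    painted, created, impact = plan
    if created > 0 or impact != "low":  # crude impact / new-object penalty:
        return float("-inf")            # the hard-to-specify part, hand-waved
    return painted

print(max(plans, key=lambda k: unbounded_pink_cars(plans[k])))  # tile_galaxy
print(max(plans, key=lambda k: task_like_pink_cars(plans[k])))  # paint_existing_cars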