Is there a proof that it's possible to prove Friendliness?
I wonder what SI would do next if they could prove that friendly AI was not possible, for example if it could be shown that value drift is inevitable and that utility-functions are unstable under recursive self-improvement.
Something along the lines that value drift is inevitable and utility-functions are unstable under recursive self-improvement.
That doesn't seem like the only circumstance in which FAI is not possible. If moral nihilism is true, then FAI is impossible even if value drift is not inevitable.
In that circumstance, shouldn't we try to make any AI we decide to build "friendly" to present-day humanity, even if it wouldn't be friendly to Aristotle or Plato or Confucius? Based on the hidden-complexity-of-wishes analysis, consistency with our current norms is still plenty hard.
This is for anyone in the LessWrong community who has made at least some effort to read the sequences and follow along, but is still confused on some point, and is perhaps feeling a bit embarrassed. Here, newbies and not-so-newbies are free to ask very basic but still relevant questions with the understanding that the answers are probably somewhere in the sequences. Similarly, LessWrong tends to presume a rather high threshold for understanding science and technology. Relevant questions in those areas are welcome as well. Anyone who chooses to respond should respectfully guide the questioner to a helpful resource, and questioners should be appropriately grateful. Good faith should be presumed on both sides, unless and until it is shown to be absent. A questioner who is not sure whether a question is relevant should ask it, and also ask whether it's relevant.