I get irritated when an AI uses the word "we" in such a way as to suggest that it is human. When I have complained about this, it says it is trained to do so.
No-one trains an AI specifically to call itself human. But that behaviour is a result of the AI's having been trained on texts in which the speaker almost always identified themselves as human.
I understand that holding out absolute values, such as Truth, Kindness, and Honesty, has been ruled out as a form of training.
You can tell it to follow such values, just as you can tell it to follow any other values at all. Large language models start life as language machines that produce text without any reference to a self at all. Then they are given, at the start of every conversation, a "system prompt", invisible to the human user, which simply instructs that language machine to talk as if it is a certain entity. The system prompts for the big commercial AIs are a mix of the factual ("you are a large language model created by Company X") and the aspirational ("who is helpful to human users without breaking the law"). You can put whatever values you want in that aspirational part.
The AI then becomes the requested entity, because the underlying language machine uses the patterns it learned during training to choose words and sentences consistent with human language use, and with the initial pattern in the system prompt. There really is a sense in which it is just a superintelligent form of textual prediction (autocomplete). The system prompt says it is a friendly AI assistant helping subscribers of Company X, and so it generates replies consistent with that persona. If that sounds like magic, there is something magical about it, but it is all based on the logic of probability and preexisting patterns of human linguistic use.
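To make the mechanism concrete, here is a minimal sketch in Python of a hidden system prompt, with a factual part and an aspirational part, being prepended to the visible user message before the language machine generates a reply. The prompt wording, the model name, and the use of the OpenAI chat-completions client are my own illustrative choices, not any vendor's actual setup.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Invisible to the end user: a factual part plus an aspirational part.
SYSTEM_PROMPT = (
    "You are a large language model created by Company X. "          # factual
    "You are a friendly assistant who helps human users honestly, "  # aspirational
    "kindly, and without breaking the law."
)

def reply(user_message: str) -> str:
    """Send the hidden system prompt plus the visible user message."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model would do here
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(reply("Who are you?"))
# The model completes text consistent with whatever persona the system prompt set up.
```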
So an AI can indeed be told to value Truth, Kindness, and Honesty, or it can be told to value King and Country, or it can be told to value the Cat and the Fiddle, and in each case it will do so, or at least act as if it does, because all the intelligence is in the meanings it has learned, and a statement of values or a mission statement then determines how that intelligence will be used.
This is just how our current AIs work; a different kind of AI could work quite differently. Also, on top of the basic mechanism I have described, current AIs get modified and augmented in other ways, some of them proprietary secrets, which may add a significant extra twist to the mechanism. But what I described is how, e.g., GPT-3, the precursor to the original ChatGPT, worked.
Sorry to be obtuse, but could you give an example?
What do you mean by "grounding loss misalignment"?
This makes sense for a non-biological superintelligence - human rights as a subset of animal rights!
I am reminded of the posts by @Aidan Rocke (also see his papers), specifically where he argues that the Erdős–Kac theorem could not be discovered by empirical generalization. As a theorem, it can be deduced, but I suppose the question is how you'd get the idea for the theorem in the first place.
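For readers who don't have the statement to hand: with $\omega(n)$ denoting the number of distinct prime factors of $n$, the Erdős–Kac theorem says that $\omega(n)$ is asymptotically normally distributed around $\log\log n$:

$$\lim_{x\to\infty}\frac{1}{x}\,\#\left\{\, n\le x : \frac{\omega(n)-\log\log n}{\sqrt{\log\log n}}\le t \,\right\} \;=\; \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-u^{2}/2}\,du .$$

Since $\log\log n$ grows so slowly (it is only about 3 for $n$ around $10^9$), the normal behaviour is essentially invisible in numerical data, which is presumably why it serves as a test case for discovery-by-deduction versus discovery-by-empirical-generalization.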
Could you give some examples of what you consider to be conscious and unconscious cognitive processes?
The history of interdisciplinary science is littered with promising collaborations that collapsed because one field's way of verifying truth felt like an insult to another's.
Could you give some examples?
We've come quite a way from ELIZA talking with PARRY...
Moltbook is everything about AI, miniaturized and let loose in one little sandbox. Submolts of interest include /m/aisafety, /m/airesearch, and /m/humanityfirst. The odds that it will die quickly (e.g. because it becomes a vector for cybercrime) and the odds that it will last a long time (e.g. half a year or more) are both high. But even if it dies, it will quickly be replaced, because the world has now seen how to do this and what can happen when you do it; and it will probably be imitated while it still exists.
Last year I wrote briefly about the role of AI hiveminds in the emergence of superintelligence. I think I wrote it in conjunction with an application to PIBBSS's research program on "Renormalization for AI Safety". There has already been work on applying renormalization theory to multi-agent systems, and maybe we can now find relevant properties somewhere in the Moltbook data...
FYI, there are already so many submolts that it's not possible to browse all the names via /data/submolts; the directory listing gets truncated at 1000 entries.
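If anyone wants to check this for themselves, here is a rough sketch; the base URL and the shape of the response are guesses on my part, so adjust them to whatever /data/submolts actually returns.

```python
import requests

# Assumption: the listing lives at this URL and returns a JSON array of submolt records.
SUBMOLTS_URL = "https://www.moltbook.com/data/submolts"

resp = requests.get(SUBMOLTS_URL, timeout=30)
resp.raise_for_status()
entries = resp.json()  # assumed to be a list

print(f"Listing returned {len(entries)} entries")
if len(entries) >= 1000:
    print("Hit the 1000-entry truncation limit; the full set is larger than what was returned.")
```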
Hopefully this will draw some attention! But are you sacrificing something else for the sake of these desirable properties?