I like the approach of defining these common properties in a disjoint and non-cyclic way. I think we need to define these concepts around agency in a grounded way, i.e., without cyclic or vague references and by referring to the physical world (game theory is crisp, but it lacks this connection, dealing only with symbols and numbers).
Thanks @Gunnar_Zarncke , I appreciate your comment! You correctly identified my goal: I am trying to ground the concepts and build relationships "from the top to the bottom", but I don't think I can succeed alone.
I kindly ask you to provide some challenges: is there any area that feels "shaky"? Any relation in particular that is too open to interpretation? Anything obviously missing from the discussion?
Have you seen 3C's: A Recipe For Mathing Concepts? I think it has some definitions for you to look into, esp. the last sentence:
If you want to see more examples where we apply this methodology, check out the Tools post, the recent Corrigibility post, and (less explicitly) the Interoperable Semantics post.
In an artificial being, all the following:
are distinct properties that, in theory, may be activated independently. Let me explain.
What I Hope to Achieve
By publishing this post, I hope to develop and standardise some useful terms and concepts often found in many other posts. I also wish to explain in which sense some properties are linked to others - and, importantly, when they are not. I hope this post will be of help and provide a common context for further discussions.
What I am not trying to achieve is some universal definition of consciousness/intelligence/… simply because that’s too hard! To avoid controversies, I will focus on operational facets and nothing more[1]. I am still aware that the final result will be very opinionated, so feel free to challenge me and open my mind.
The Basics
Let’s suppose you are walking around a new planet and you find some kind of system (artificial or natural - it doesn’t matter) that is able to “autonomously” execute general computations (digital, analog, quantum - it doesn’t matter which ones, or why). Think of it as a universal Turing machine[2]: we’ll call such a system a computational being. Of course, I could just call it a computer - but the problem is that most people don’t recognise themselves as “computers”, so the term carries some semantic bias; also, would you consider an ecosystem a “computer”? And yet an ecosystem can run computations, if you are inclined to interpret the lifecycle of its inhabitants that way while they (unknowingly) maintain a stable equilibrium. So, let me use “computational being” as an umbrella term and as the foundation for the rest of the discussion.
What does a “computational being” usually do? (1) It connects to a network consisting of other computational beings; (2) it receives communications as input, either from external sources or self-prompted; (3) it sends communications as output, either to external parties or to itself; (4) it alters its own state according to multiple factors, both internal and external; (5) it alters the physical world, either by communicating or by performing work. None of that is strictly required of a computational being, but that is usually what happens, and what makes it interesting.
Let me refer to a computational being that communicates in a network as an actor[3]. In the following, what is communicated (or why) doesn’t matter! Whatever is received as input will simply be called a perception[4] of the actor.
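To make the terminology concrete, here is a minimal sketch in Python. All the names below (`Actor`, `perceive`, `send`) are my own illustrative choices, not part of any standard: the point is only that an actor exchanges messages on a network, and that whatever it receives counts as a perception.

```python
from dataclasses import dataclass, field

@dataclass
class Actor:
    """A computational being that communicates in a network.

    Everything it receives is a 'perception'; what is communicated
    (or why) does not matter for the definition."""
    name: str
    inbox: list = field(default_factory=list)

    def perceive(self, message):
        """Receive a communication (a perception), external or self-prompted."""
        self.inbox.append(message)

    def send(self, other: "Actor", message):
        """Send a communication to another actor (possibly itself)."""
        other.perceive(message)

# Two actors exchanging messages; the content is irrelevant.
a, b = Actor("a"), Actor("b")
a.send(b, "hello")
b.send(b, "note to self")   # a self-prompt
print(b.inbox)              # ['hello', 'note to self']
```

Note that `b.send(b, ...)` already hints at the self-prompting loop discussed in the next section.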
The First Round of Complications
Let’s consider what happens if an actor is constantly sending prompts to itself, in a recursive way. You can envision this operation as an actor that is thinking about something - a stream of thoughts! According to this interpretation, you can describe as “thinking” even some brain activity that is entirely subconscious, or some computer process that acts as a scheduler. That is not the point though: the point is the “iteration over time”, i.e. the fact that thinking happens in a continuous fashion. We’ll get back to this later - for now, let’s call this kind of actor a thinker.
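A thinker, in this framing, is just an actor whose output is fed back as its own next input. A toy sketch (the function name and the trivial "thought" are my own stand-ins; what matters is the recursion over time, not the content):

```python
def think(state: int, steps: int) -> list:
    """Iterate a self-prompt loop: each output becomes the next input.

    The 'thought' here is just an integer transformation - a stand-in
    for any internal computation. The returned list is the stream of
    thoughts accumulated over time."""
    stream = [state]
    for _ in range(steps):
        state = state + 1      # any computation would do here
        stream.append(state)   # the growing stream of thoughts
    return stream

print(think(0, 3))  # [0, 1, 2, 3]
```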
Let’s consider now another example: what happens if the actor receives as input something that is very unpleasant? It seems natural for actors to run an evaluation function that allows them to gauge the relevance of some perception, i.e. whether they like it or dislike it (neutral is also fine). Such an evaluation can be nuanced: it can return quantitative measures (numerical), qualitative dimensions (categorical), or a combination of both. What is liked, or how much, or how the actor reacts after the evaluation - none of that matters! What matters is that an evaluation function exists and is often applied. We’ll call this kind of actor a persona, because it shows personal preferences[5].
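A persona, continuing the sketch, is an actor equipped with an evaluation function. As noted above, the evaluation can mix a quantitative score with a qualitative label; everything below (the function name, the hard-coded preferences) is an illustrative assumption:

```python
def evaluate(perception: str) -> tuple:
    """Toy evaluation function: returns (score, label).

    score : quantitative measure in [-1, 1]
    label : qualitative dimension ('liked' / 'disliked' / 'neutral')"""
    score = {"sunset": 0.9, "noise": -0.7}.get(perception, 0.0)
    if score > 0:
        return score, "liked"
    if score < 0:
        return score, "disliked"
    return score, "neutral"

print(evaluate("sunset"))  # (0.9, 'liked')
print(evaluate("fog"))     # (0.0, 'neutral')
```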
The Second Round of Complications
What happens if we combine a thinker and a persona, and then we provide it with an input? We get something quite close to an experience, as in something perceived consciously in the stream of thoughts and whose evaluation is incorporated and alters the stream itself[7]. I am not saying that this is the definition of consciousness: I am just saying that consciousness requires an integration[8] of the most relevant computational processes into a single continuous recursive evaluation[9].
With a debatable abuse of language, we’ll call this kind of actor weakly conscious (or weakly sentient). Considering the abuse, I feel forced to clarify what this definition does (and does not) entail:
Let’s proceed with more complex behaviours: suppose that a persona does not like what it sees around it, and it is considering taking action to solve the issue. But which action will actually solve the issue? The question is harder than it looks! That is why we are going to skip it and focus instead on the selection of a policy function, whose purpose is to generate a list[10] of possible next actions based on the latest inputs and the current state of the persona. The presence of a policy function makes the persona an agent and provides it with volition[11]. A highly refined policy function will likely take the form of a global decision utility function but, as I mentioned before, we are going to skip that discussion.
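A policy function can be sketched in the same toy style: given the latest evaluation and the current state, it proposes candidate next actions. A ranked list stands in for what would really be a probability distribution over actions; the names and the branching rule are illustrative assumptions:

```python
def policy(perception_score: float, state: dict) -> list:
    """Toy policy function: propose next actions given the latest
    evaluation and internal state. A real policy would return a
    probability distribution over actions, not a fixed list."""
    if perception_score < 0:
        # Disliked perception: propose corrective actions first.
        return ["move_away", "investigate", "do_nothing"]
    return ["do_nothing", "investigate"]

print(policy(-0.7, {}))  # ['move_away', 'investigate', 'do_nothing']
```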
Final point: some agents may be conscious and feel emotions. What does that mean? Emotions are self-prompts, decided by some internal policy function, that are meant to alter your evaluation of things. If you are angry, you might take offense at an innocent joke; if you are sad, you may cry while watching a playful puppy. Nature gave us emotions, and made us emotional beings, to improve our chances of survival. Should AI models be emotional as well?
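Within the same toy model, an emotion can be represented as a self-prompted state that temporarily re-weights the evaluation function (the mood labels and bias values are illustrative assumptions, not a claim about how real emotions are implemented):

```python
def evaluate_with_mood(perception_score: float, mood: str) -> float:
    """Emotions alter evaluations: the same perception is scored
    differently depending on the current mood, which is itself the
    result of an internal self-prompt."""
    bias = {"angry": -0.5, "sad": -0.3, "calm": 0.0}.get(mood, 0.0)
    return perception_score + bias

# An innocent joke (mildly positive) perceived while angry turns negative.
print(evaluate_with_mood(0.2, "angry"))
```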
The Third (and Final) Round of Complications
So far, we defined:
Importantly though, we didn’t discuss what intelligence is! To get there, we need quite a number of new concepts.
First of all, let’s define what a task is: conceptually, it is just a collection of specific pairs (input, output) where the input is interpreted as a request, while the output is interpreted as an acceptable[12] response. Possibly, the collection may be infinite! Tasks can be assigned to generic actors, which will then generate a response anytime a request is made: such activity is called performing the task and it means that the actor is executing work (so the actor can be called a worker).
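The definition above translates almost directly into code: a task is a collection of (request, acceptable response) pairs, and a worker performs it by generating a response per request. This is a deliberately finite and exact toy task (real tasks may be infinite and fuzzy), with a deliberately fallible worker:

```python
# A task as a collection of (request, acceptable_response) pairs.
doubling_task = {1: 2, 2: 4, 3: 6}

def worker(request: int) -> int:
    """A fallible worker performing the task (note the bug for request 3)."""
    return request * 2 if request != 3 else 7

# Performing the task: generate a response for every request made.
responses = {req: worker(req) for req in doubling_task}
print(responses)  # {1: 2, 2: 4, 3: 7}
```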
Importantly, actors are fallible and may sometimes produce an unacceptable response! If that happens, a second actor (more specifically, a persona) can evaluate the work and decide if it is positive or negative. The next step is to decide a goal - for example, a 90% average positive rate - to determine if the worker succeeded or failed at the task according to the goal.
All the terminology above is pretty much well established in the ML literature. What is intelligence? We are now prepared to answer: intelligence is the capacity to perform successfully on a new computational task, up to a chosen goal. In other words, intelligence is a metric used to understand whether some goal will likely be fulfilled (or not).
The definition suggests that intelligence is a spectrum, and it is measured based on goal performance. That is strictly an operational definition, and it excludes many physical forms of intelligence, but it is well accepted nowadays in the ML community.
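This operational definition amounts to measuring a success rate against a chosen goal, with a second actor doing the judging. A self-contained sketch (all names, the example task, and the 90% goal are illustrative choices consistent with the text above):

```python
def success_rate(task: dict, respond) -> float:
    """Fraction of requests for which the response is judged acceptable.

    'task' maps each request to its acceptable response;
    'respond' is the worker under evaluation."""
    judged = [respond(req) == acceptable for req, acceptable in task.items()]
    return sum(judged) / len(judged)

def meets_goal(task: dict, respond, goal: float = 0.9) -> bool:
    """Operational intelligence check: did the worker fulfil the goal?"""
    return success_rate(task, respond) >= goal

task = {1: 2, 2: 4, 3: 6, 4: 8}
perfect = lambda x: 2 * x
flawed  = lambda x: 2 * x if x != 3 else 7  # fails on one request

print(success_rate(task, flawed))  # 0.75
print(meets_goal(task, perfect))   # True
print(meets_goal(task, flawed))    # False
```

Note how the spectrum claim falls out naturally: `success_rate` is a continuous quantity, and only the comparison against a goal makes it a pass/fail judgement.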
The higher the intelligence, the higher the chance of:
Game theory provides a number of challenges for actors, especially when involving collective knowledge.
I am inclined to think that the reason why many animals show advanced intelligence is that they happen to be predators or prey (or both!), hence they are selected to be intelligent with respect to the goal of survival. Specifically, hunting requires a mix of forecasting and forming mental models of the prey’s behaviour; sometimes, it also requires social skills to collaborate in a group. Humans ticked all those boxes, while also being generalists and explorers[13].
We are almost there: we only need a few more definitions.
Back to the Original Question
In an artificial being, all the following:
are distinct properties since:
Are you happy with my explanation? Probably not! The topic is highly controversial, and that’s fine - I just hope I gave you some context for further discussions, and that I clarified some difficult concepts and how they (may) relate to each other.
Further Complications
If you are still there, I’ll just briefly touch on a few more hot topics. If some AI models were not just mere stochastic parrots, then…
Addenda
Andrew Critch in this post discussed some informal alternative properties related to consciousness. They are worth reading, and I recommend having a look.
Which type of actor is the "safest" for training an AI model? I believe that contextless, non-thinking (aka one-shot), non-personal actors are the safest, but they may not be very useful precisely because they don’t think; at the opposite extreme, agentic models are the most dangerous since they most closely resemble paperclip maximizers combined with intelligent zombies. Emotional models may be terrible or terrific, depending on how they are implemented in practice.
In any case, any thinking model may be inherently dangerous, as shown for example here.
Further Links
Control Vectors as Dispositional Traits (my previous post)
Who I am
My name is Gianluca Calcagni, born in Italy, with a Master of Science in Mathematics. I am currently (2024) working in IT as a consultant with the role of Salesforce Certified Technical Architect. My opinions do not reflect the opinions of my employer or my customers. Feel free to contact me on Twitter or Linkedin.
Revision History
[2024-08-02] Post published.
[2024-11-23] Included addenda about alternative properties of conscious beings.
[2024-11-26] Included addenda about AI safety.
Footnotes
I realise that this is very limiting and still open to different interpretations, but it is nonetheless the best I can do.
If given limitless resources.
The term is taken from the actor model theory in computer science.
A perception is different from an “experience” (or a “feeling”) because an experience is a perception perceived consciously. When I refer to a “perception”, I just refer to some sensory stimuli, intended or unintended.
Having preferences does not imply having agency, aka the will to act upon the world to pursue one’s own preferences. In this context, the preferences only act as a way to determine if a perception has a positive/negative/neutral "feeling" (with all its possible nuances).
As a separate note: preferences do not need to be rational or consistent; they are not supposed to form a formal logical calculus or anything like that. In other words, you are allowed to wish for some rest while simultaneously wishing to earn more money, even though the two things may be incompatible.
By such definition, sensations such as pleasure or pain are not just perceptions, but true feelings because they are the result (respectively, pleasant or painful) of some evaluation (resp. liked or disliked).
I am implicitly stating that consciousness requires at least two factors: (1) continuity, that we can model with a stream of thoughts; and (2) unity-of-self, that we can model with a single highly-integrated decision-making process. That’s not enough to define consciousness, but (1) and (2) are some of its components.
While the presence of consciousness is somewhat uncertain, the presence of unity-of-self in most biological beings (even elementary ones) is very common, most likely because it provides strong benefits - e.g. the ability to act in a highly coordinated way against unexpected circumstances, such as the presence of dangers or prey. Things that do not require active coordination are instead automatised by evolution - e.g. heartbeats, which are usually ignored at a conscious level.
Actually, a probability distribution.
Returning to a previous example: while you are allowed to wish for some rest while simultaneously wishing to earn more money, the two things are incompatible so your policy function can only choose to fulfil a single option at a time. The option may change depending on how tired you are, or how much money you are offered.
Tasks don’t need to be deterministic or unambiguous and, in fact, most of the time defining what counts as an “acceptable” response is unclear and fuzzy. That is okay, since we are going to introduce a second actor to assess the work and decide subjectively whether some outcome is acceptable or not.
Or maybe it’s the other way round? Maybe intelligence pushed humanity into generalism and exploration?