Drawn In Perspective

Notes on phenomenal consciousness in simulators and simulacra

When people ask me whether I think LLMs are phenomenally conscious, I often prefer to start by answering a different question: "if they were phenomenally conscious, should we expect them to be conscious in a way we could reliably detect from our interactions with them?".

I think that, if we are talking about the underlying neural networks, the answer to this second question is no. This is because there is no reason the things language models say should correlate with what it would be like to be a neural network (if there were anything it was like to be one at all). In fact, I think there are several ways it could turn out that:

  • There is “something it is like” to be a language model's neural network; and
  • That thing is really quite different from what the language model says it is like.

The original literature on language models as simulators illustrates one way in which this could be the case. The main idea is that we should think of language models engaging in chat conversations as simulators running simulations of multiple personas in parallel, and generating their next outputs by sampling from the possible responses of these personas.

If this idea is new to you, I think the best place to start is with this fairly readable position paper1. There is also an earlier, much more detailed forum post which fleshes out many of the ideas further.
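
As a rough illustration of the framing (this is my own toy sketch, not something taken from the paper or the forum post - the persona names, weights, and responses are all invented), you can picture the simulator as keeping a weighted set of candidate personas consistent with the conversation so far, and producing its next output by sampling from their possible responses:

```python
import random

# Toy sketch of the "simulator" framing (illustrative only; all names and
# numbers are invented): track a weighted mixture of candidate personas,
# sample a persona in proportion to its weight, then sample one of that
# persona's plausible responses as the next output.

personas = {
    "helpful_assistant": {
        "weight": 0.6,
        "responses": ["Happy to help with that.", "Sure, here is one way to do it."],
    },
    "suffering_character": {
        "weight": 0.3,
        "responses": ["This is unbearable.", "I can't take much more of this."],
    },
    "unreliable_narrator": {
        "weight": 0.1,
        "responses": ["Everything is fine, probably.", "Who can really say?"],
    },
}

def sample_next_output(personas: dict) -> str:
    """Sample a persona by weight, then sample one of its responses."""
    names = list(personas)
    weights = [personas[name]["weight"] for name in names]
    chosen = random.choices(names, weights=weights, k=1)[0]
    return random.choice(personas[chosen]["responses"])

print(sample_next_output(personas))
```

A real language model does not represent personas explicitly like this; the "mixture" is implicit in a single next-token distribution. That gap between the mechanism doing the sampling and any one persona being sampled is part of why the network's own situation could differ so much from what its dialogue says.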

Let's imagine that LLM chat bots are persona simulators, and that there is something it is like to be a sufficiently advanced persona simulator (in other words, that a sufficiently advanced persona simulator is phenomenally conscious). Why should the conscious experience of the simulator match the reports of the things it is simulating? For example, perhaps the simulator experiences pleasure when its simulations are easy to predict. This would mean that a chat bot role-playing a character who is suffering in a predictable way might in fact be having a great time, in spite of its dialogue to the contrary.

I think this is a strong counterexample to the idea that we can draw conclusions about consciousness from LLM interactions. Stepping back from the specific case of simulators, here are three further important questions to ask:

  1. How can we tell if the underlying neural network is conscious, if interacting with language models is not a reliable approach?
  2. Even if underlying neural networks are not conscious in ways that chat interactions would predict, could these interactions still play a role in instantiating conscious entities?
  3. Why is it the case that interacting with humans is a reliable way to tell if they are conscious?

I think answering the first question requires a theory of phenomenal consciousness that answers such questions for any kind of information processing system - whether a brain, a neural network, or a sufficiently advanced thermostat. Such a theory could range from saying that consciousness has nothing to do with information processing, to giving us a precise way to analyse an information processing system and make claims about whether it is conscious or not.

With regard to the second question, there is a relevant quote from David Chalmers (mentioned in the Simulators forum post), from his Daily Nous interview on GPT-3:

GPT-3 does not look much like an agent. It does not seem to have goals or preferences beyond completing text, for example. It is more like a chameleon that can take the shape of many different agents. Or perhaps it is an engine that can be used under the hood to drive many agents. But it is then perhaps these systems that we should assess for agency, consciousness, and so on.

I think such simulated entities can already be meaningfully assessed for some mental properties, like agency. When it comes to consciousness - in particular phenomenal consciousness - my main challenge for a theory claiming to make such assessments is that it needs to explain why fictional characters that a human author makes up in the process of, say, writing a book are not phenomenally conscious.

For the third question, I think this is well covered by the existing philosophy literature on the problem of other minds. Essentially, my view is that our own introspective experiences provide relevant and very strong evidence that humans, and animals like humans, are conscious, and that they are conscious in ways which are reflected in their behaviour. I touch on what sort of evidence this is, and how it relates to empiricism, in my blog post on armchairs.


  1. As an aside, I find it interesting that this paper distinguishes between: (1) a language model as an abstract mathematical structure - specifically a conditional probability distribution, (2) a neural network which realises that structure, and (3) the simulacra the language model simulates. I don't make as formal a distinction between (1) and (2) in this blog post. In particular, I don't generally think of language models as conditional probability distributions, but rather as specific mechanistic recipes for realising those distributions. It would make sense, for example, to have two quite different language models with the same conditional probability distribution. I wonder how much this diverges from what is standard in other parts of the ML literature.
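
To make that last point in the footnote concrete, here is a minimal sketch (entirely my own toy example - the two-token vocabulary, the context, and both functions are invented) of two mechanistically different "recipes" that realise the same conditional distribution over next tokens:

```python
import math

# Toy example: two different mechanisms realising the same conditional
# distribution p(next token | context) over a two-token vocabulary.

VOCAB = ["a", "b"]

def model_logits(context: str) -> dict:
    """Recipe 1: compute logits from the context, then normalise with a softmax."""
    logits = [0.5 * len(context), 0.0]
    total = sum(math.exp(l) for l in logits)
    return {tok: math.exp(l) / total for tok, l in zip(VOCAB, logits)}

def model_direct(context: str) -> dict:
    """Recipe 2: compute the probability of "a" directly, with no logits or softmax."""
    p_a = 1.0 / (1.0 + math.exp(-0.5 * len(context)))
    return {"a": p_a, "b": 1.0 - p_a}

context = "hello"
print(model_logits(context))   # roughly {'a': 0.924, 'b': 0.076}
print(model_direct(context))   # the same distribution, from a different mechanism
```

On the "conditional probability distribution" reading these would count as the same language model; on the "mechanistic recipe" reading they are different ones.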

Thoughts? Leave a comment