Defining Deception
Deception is misinformation on expectation
What counts as a lie?
Centrally, a lie is a statement that contradicts reality, and that is formed with the explicit intent of misleading someone. If you ask me if I’m free on Thursday (I am), and I tell you that I’m busy because I don’t want to go to your stupid comedy show, I’m lying. If I tell you that I’m busy because I forgot that a meeting on Thursday had been rescheduled, I’m not lying, just mistaken.
But most purposeful misrepresentations of a situation aren’t outright falsehoods; they’re statements that are technically compatible with reality while appreciably misrepresenting it. I likely wouldn’t tell you that I’m busy if I really weren’t; I might instead bring up some minor thing that I have to do that day and make a big deal out of it, to give you the impression that I’m busy. So I haven’t said anything false, but, whether through misdirection, paltering, lying by omission, or other such deceptive techniques, I haven’t been honest either.
We’d like a principled way to characterize deception, as a property of communications in general. Here, I’ll derive an unusually powerful one: deception is misinformation on expectation. This can be shown at the level of information theory, and used as a practical means to understand everyday rhetoric.
Information-Theoretic Deception
Formally, we might say that Alice deceives Bob about a situation if:
First Definition: She makes a statement to him that, with respect to her own model of Bob, changes his impression of the situation so as to make it diverge from her own model of the situation.
We can phrase this in terms of probability distributions. (If you’re not familiar with probability theory, you can skip to the second definition and just take it for granted). First, some notation:
For a possible state \(x\) of a system \(X\), let \(p^A_X(x)\) and \(p^B_X(x)\) be the probabilities that Alice and Bob, respectively, assign to that state. These probability assignments \(p^A_X\) and \(p^B_X\) are themselves epistemic states of Alice and Bob.
If Alice is modeling Bob as a system, too, she may assign probabilities to the possible epistemic states \(q^B_X\) that Bob might be in: \(q^B_X \mapsto p^A_B(q^B_X)\).
Let \(p^{B \mid s}_X(x) = p^B_X(x \mid s)\) be Bob’s epistemic state after he updates on information \(s\). In other words, \(B \mid s\) is the Bob who has learned \(s\).
Take \(X\) to be the world \(\Omega\). We’ll leave it implicit when it’s the only subscript.
With this notation, a straightforward way to operationalize deception is as information Alice presents to Bob that she expects to increase the difference between Bob’s view of the world and her own.
Taking the Kullback-Leibler divergence as the information-theoretic measure of difference between probability distributions, this first definition of deception is written as:
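(roughly, taking the expectation over Alice’s model \(p^A_B\) of Bob’s possible epistemic states, and writing \(q^{B \mid s}\) for the candidate state \(q^B\) updated on s)

\[
\mathbb{E}_{q^B \sim p^A_B}\!\left[\, D_{\mathrm{KL}}\!\left(p^A \,\middle\|\, q^{B \mid s}\right) \right]
\;>\;
\mathbb{E}_{q^B \sim p^A_B}\!\left[\, D_{\mathrm{KL}}\!\left(p^A \,\middle\|\, q^{B}\right) \right]
\]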
We can manipulate this inequality:
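A sketch of the algebra: expanding both KL divergences over world states and cancelling the shared \(\sum_x p^A(x)\log p^A(x)\) term, the gap between the two sides is, for each candidate epistemic state \(q^B\),

\[
D_{\mathrm{KL}}\!\left(p^A \,\middle\|\, q^{B \mid s}\right) - D_{\mathrm{KL}}\!\left(p^A \,\middle\|\, q^{B}\right)
\;=\; \sum_x p^A(x)\,\log\frac{q^{B}(x)}{q^{B \mid s}(x)}.
\]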
Write \(B,\Omega\) for the product system composed of \(B\) and \(\Omega\), whose states are just pairs of states of \(B\) and \(\Omega\). The inequality can then be written in terms of an expected value:
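In a sketch, collapsing the sum over epistemic states \(q^B\) and world states \(x\) into one expectation over \(p^A_{B,\Omega}\):

\[
\mathbb{E}_{(q^B,\,x)\,\sim\,p^A_{B,\Omega}}\!\left[\, \log\frac{q^{B}(x)}{q^{B \mid s}(x)} \,\right] \;>\; 0.
\]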
This term captures how much Alice expects the probability Bob places on the actual world state to be changed by his receiving the information s. If we write this in terms of surprisal, or information content, we have
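(in a sketch, where \(I_q(x) = -\log q(x)\) is the surprisal of the state \(x\) under the distribution \(q\))

\[
\mathbb{E}_{(q^B,\,x)\,\sim\,p^A_{B,\Omega}}\!\left[\, I_{q^{B \mid s}}(x) - I_{q^{B}}(x) \,\right] \;>\; 0.
\]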
This can be converted back to natural language: Alice deceives Bob with the statement s if:
Second Definition: She expects that the statement would make him more surprised to learn the truth as she understands it.[1]
In other words, deception is misinformation on expectation.
Misinformation alone isn’t sufficient—it’s not deceptive to tell someone a falsehood that you believe. To be deceptive, your message has to make it harder for the receiver to see the truth as you know it. You don’t have to have true knowledge of the state of the system, or of what someone truly thinks the state is. You only have to have a model of the system that generates a distribution over true states, and a model of the person to be deceived that generates distributions over their epistemic states and updates.
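To make the criterion concrete, here is a minimal sketch in Python under toy assumptions: a handful of discrete world states, a short list of candidate epistemic states that Alice thinks Bob might hold, and a Bayes update standing in for how Bob would absorb the statement. The names (p_alice, bob_candidates, likelihood_given_s) are illustrative, not drawn from any library; the point is just that the test runs entirely on Alice’s models, with no access to ground truth.

```python
import math

# A minimal sketch of the "misinformation on expectation" test. Assumptions:
#   - world states form a small discrete set,
#   - Alice's model of Bob is a weighted list of candidate epistemic states
#     (probability dicts over world states) that Bob might hold,
#   - Bob updates on the statement s by Bayes' rule, using the likelihoods
#     that Alice attributes to him.
# All names here (p_alice, bob_candidates, likelihood_given_s) are illustrative.

def bayes_update(prior, likelihood):
    """Bob's posterior over world states after hearing s, as Alice models it."""
    unnormalized = {x: prior[x] * likelihood[x] for x in prior}
    z = sum(unnormalized.values())
    return {x: v / z for x, v in unnormalized.items()}

def expected_surprisal_change(p_alice, bob_candidates, likelihood_given_s):
    """Alice's expectation of I_{q^{B|s}}(x) - I_{q^B}(x) under her own models.

    p_alice:            Alice's distribution over world states x.
    bob_candidates:     list of (weight, q_B) pairs: Alice's credence that Bob
                        holds the epistemic state q_B.
    likelihood_given_s: how plausible Bob finds the statement s in each world
                        state, according to Alice's model of Bob.
    """
    total = 0.0
    for weight, q_b in bob_candidates:
        q_b_after = bayes_update(q_b, likelihood_given_s)
        for x, p_x in p_alice.items():
            surprisal_before = -math.log(q_b[x])
            surprisal_after = -math.log(q_b_after[x])
            total += weight * p_x * (surprisal_after - surprisal_before)
    return total

# Toy example: Alice is almost sure she's free on Thursday, and expects that
# saying s = "I have a thing that day" will push Bob toward "busy".
p_alice = {"free": 0.95, "busy": 0.05}
bob_candidates = [(1.0, {"free": 0.5, "busy": 0.5})]
likelihood_given_s = {"free": 0.2, "busy": 0.9}  # Bob finds s likelier if Alice is busy

delta = expected_surprisal_change(p_alice, bob_candidates, likelihood_given_s)
print(f"expected surprisal change: {delta:.3f}")  # positive, so s is deceptive on this model
```

A positive value means Alice expects the statement to leave Bob more surprised by the truth as she sees it, which is exactly the second definition’s test.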
This is a criterion for deception that routes around notions of intentionality. It applies to any system that
forms models of the world,
forms models of how other systems model the world, and
determines what information to show to those other systems based on its models of these systems.
An AI, for instance, may not have the sort of internal architecture that lets us attribute human-like intents or internal conceptualizations to it; it may select information that misleads us without the explicit intent to mislead.[2] An agent like AlphaGo or Gato, that sees humans as just another game to master, may determine which statements would get us to do what it wants without even analyzing the truth or falsity of those statements. It does not say things in order to deceive us; deception is merely a byproduct of the optimal things to say.
In fact, for sufficiently powerful optimizers, deception ought to be an instrumental strategy. Humans are a useful tool that can be easily manipulated by providing information, and it’s not generally the case that information that optimally manipulates humans towards a given end is simultaneously an accurate representation of the world. (See also: Deep Deceptiveness).
Rhetorical Deception
This criterion can be applied anywhere people have incentives to be dishonest or manipulative while not outright lying.
In rhetorical discussions, it’s overwhelmingly common for people to misrepresent situations by finding the most extreme descriptions of them that aren’t literally false.[3] Someone will say that a politician “is letting violent criminals run free in the streets!”, you’ll look it up, and it’ll turn out that they rejected a proposal to increase mandatory minimum sentencing guidelines seven years ago. Or “protein shakes can give you cancer!”, when an analysis finds that some brands of protein powder contain up to two micrograms of a chemical that the state of California claims is ‘not known not to cause cancer’ at much larger doses. And so on. This sort of dishonesty exists in almost all political discourse.
Descriptions like these are meant to evoke particular mental images in the listener: when we send the phrase “a politician who’s letting violent criminals run free in the streets” to the Midjourney in our hearts, the image is of someone who’s just throwing open the prison cells and letting out countless murderers, thieves, and psychos. But the person making this claim will generally understand perfectly well that that’s not what’s really happening. So the claim is deceptive: the speaker knows that the words they’re using create a picture of reality that, by their own understanding, is inaccurate, even if the literal statement itself is true.
This is a pretty intuitive test for deception, and I find myself using it all the time when reading about or discussing political issues. It doesn’t require us to pin down formal definitions of “violent criminal” and a threshold for “running free”, as we would in order to analyze the literal truth of their words. Instead, we ask: does the mental image conveyed by the statement match the speaker’s understanding of reality? If not, they’re being deceptive.[4]
Treating expected misinformation as deception also presents us with a conversational norm: we ought to describe the world in ways that we expect will cause people to form accurate mental models of the world.
[1] This isn’t exactly identical to the first definition. Note that I converted the final double integral into an expected value by implicitly identifying

\[
p^A_{B,\Omega}(q^B, x) \;=\; p^A_B(q^B)\, p^A(x),
\]

i.e. by making Bob’s epistemic state independent of the true world state, within Alice’s model. If Alice is explicitly modeling a dependence of Bob’s epistemic state on the true world state for reasons outside her influence, this doesn’t work, so the first and second definitions can differ.
Example: If I start having strange heart problems, I might describe them to a cardiologist, expecting that this will cause them to form a model of the world that’s different from mine. I expect they’ll gain high confidence that my heart has some specific problem X that I don’t presently consider likely due to my not knowing cardiology. So there’s an expected increase in the divergence between our distributions that isn’t an expected increase in the cardiologist’s surprisal, or distance from the truth. Because the independence assumption above is violated—I take the cardiologist’s epistemic state to be strongly dependent on the true world state, even though I don’t know that state—the two definitions differ. Only the second captures the idea that honestly describing your medical symptoms to a doctor shouldn’t be deception: you don’t expect that they’ll be mis-informed by what you say.
[2] Even for humans, there’s a gray zone where we do things whose consequences are neither consciously intended nor unintended, but simply foreseen; it’s only after the action and its consequences are registered that our minds decide whether our narrative self-model will read “yes, that was intended” or “no, that was unintended”. Intentionality is more of a convenient fiction than a foundational property of agents like us.
[3] Resumes are a funnier example of this principle: if someone says they placed “top 400” in a nationwide academics competition, you can tell that their actual rank is at least 301, since they’d be saying “top 300” or lower if they could.
[4] Of course everyone forms their own unique mental images; of course it’s subjective what constitutes a match; of course we can’t verify that the speaker has any particular understanding of reality. But you can generally make common-sense inferences about these things.


I want to offer a slightly different perspective and hear what you think:
Maybe deception can be split into two kinds.
You describe Deception 1, and Deception 2 is best thought of as any influence that makes you worse at recognizing Deception 1s, even if it is converging you to the model of the speaker. So think about honest indoctrination by a true believer in whatever truth/falsehood. In this case, while they are bringing you closer to truth, they are actually degrading your future abilities to recognize Deception 1s, because they are making you think worse.
For example: by appealing to emotions/authority/whatever, they are actually damaging your capability to reason, and are deceiving YOU as a person, even if the current facts are actually true.