Encyclopedia of Law and Economics

Edited by Gerrit De Geest


Chapter 9: Evidence: theoretical models

Chris William Sanchirico

[In: Volume 8, Chris William Sanchirico (ed) Procedural Law and Economics]

1. Introduction

Few legal disputes are solely a matter of how the law should be interpreted. The parties to a legal dispute are rarely in complete agreement regarding who did what, when, to whom, under what circumstances. Thus, courts – acting through professional judges or juries of citizens – routinely generate “findings of fact,” as opposed to “findings of law.” Such factual determinations are the subject of the field of law called “evidence.”

The fundamental questions of legal evidence are these: First, how do “fact finders” (judges or juries, as the case may be) make deductions about factual issues? Second, how should fact finders make such deductions?

Legal scholars and social scientists have taken varying approaches to these questions. This entry specifically concerns models of legal evidence, where the word “model” is defined to encompass any approach explicitly grounded in, and making abundant use of, mathematical reasoning. Most of the entry concerns theoretical contributions, although occasional reference is made to pertinent empirical findings. Models of legal evidence appear chiefly in three overlapping scholarly literatures: the legal literature on evidence, the law and economics literature, and the economics literature on game theory and mechanism design.

Section 2 situates models of evidence within the larger context of models of litigation. This discussion helps to clarify the boundaries of the entry. It also explains why evidence models, which are far outnumbered by nonevidentiary models of litigation, fill an important gap in the literature.

Sections 3 to 6 describe the four main approaches taken in modeling evidence.

Section 3 discusses “pure probabilistic deduction.” The fact finder in these models interprets the evidence it receives using Bayes’ rule (defined within), but usually without accounting for the strategic interests of the party supplying the evidence. This is the dominant modeling approach in the legal literature on evidence.

Section 4 discusses the dominant approach to evidence in the economics literature on game theory and mechanism design, “omission models.” This approach explicitly accounts for strategic interests of the parties who supply the evidence. However, it exogenously limits the means that parties have to pursue those interests. Parties in these models may omit to report all that they know, but they may not lie or fabricate evidence.

The third approach, “endogenous cost signaling,” which appears in the law and economics literature on evidence, is discussed in Section 5. These models view evidence as a form of differential cost signaling, an approach that encompasses both omission and fabrication. These models also link evidence production to the out-of-court activities that form the basis of the suit (“primary activities”). Such models allow for the possibility that the cost to a party of producing the evidence used by the fact finder is at least probabilistically determined by (“endogenous” to) that party’s primary activity choices. By means of this linkage, the secondary activity of evidence production gains potential as a device for setting primary activity incentives.

Section 6 discusses a fourth and still largely nascent approach to legal evidence, “correlated private information.” This approach – which has been extensively studied outside the evidence context in the economics literature on mechanism design and game theory – explores the potential benefits of playing parties off against each other, as when party A’s evidence is used to reward or punish party B, and vice versa.

In preparing this chapter, I have not attempted to catalogue the full set of contributions on the topic. Rather, I have tried to describe several seminal and/or representative articles in enough detail to communicate not only their findings but also their reasoning. Furthermore, I have devoted a fair portion of the entry to comparing, contrasting and evaluating different approaches. My ultimate goal is to provide the reader with a template. The hope is that this template will facilitate the assimilation of not only those contributions discussed in the entry, and not only the many worthwhile existing contributions that could not be discussed, but also the many contributions that are hopefully yet to come.

2. The Law and Economics of Evidence in the Larger Context of the Law and Economics of Litigation

Before reviewing the four modeling approaches to legal evidence, it is important to situate the full quartet of approaches within the subsuming context of models of litigation. Most of the vast literature on the law and economics of litigation, which is surveyed elsewhere in this volume,1 is not primarily concerned with how the fact finder does or should make deductions from evidence. From an evidentiary point of view, these nonevidentiary models may be sorted into “p models” and “p(x, y) models.”

2.1. p Models

The vast majority of contributions to the law and economics of litigation focus on parties’ incentives to initiate lawsuits and to settle them out of court. The main event that filing leads towards, and that settlement avoids – that is, the fact finding process – is not explicitly modeled in these studies. Rather, the fact finding process is summarized by an exogenous probability p of plaintiff or prosecutor victory – a modeling structure dating back to Becker (1968).

For example, in Gould’s (1973) and Posner’s (1973) seminal analyses of settlement in civil lawsuits, the plaintiff2 will accept nothing less from the defendant in settlement of the case than the amount p_pW − c_p, where p_p is the plaintiff’s assessment of the probability that she will prevail at trial, W is the amount she will receive from defendant if she wins and c_p are the costs she incurs if the case proceeds to trial. Similarly, in Shavell’s (1982) analysis of potential plaintiffs’ incentives to file suit given the anticipated possibility of future settlement, the plaintiff files suit if and only if p_pW > c_p.3 In both of these models, the outcome of trial (or rather the plaintiff’s belief regarding such) is summarized with a single number between zero and one, “p_p.” This number is not derived within the model, but is rather an exogenous parameter.

Many other more recent models of litigation, too numerous to mention, also employ this fixed exogenous probability structure. In such models, the institution of legal fact finding is perfunctorily depicted as a kind of “one-arm bandit.” The parties drop their “coins” into the fact finding machine (i.e., they pay their respective trial costs), the lever is pulled (fact finding occurs), and a number is generated (i.e., the amount that the defendant owes the plaintiff, or the criminal penalty to which the defendant is subjected). The internal mechanics of the fact finding machine are not examined. The focus is rather on the plaintiff’s right to force the defendant to pay and play (the incentive to file suit) and the prior negotiations that may occur between the parties for the purpose of avoiding having to feed coins into the machine (settlement).

2.2. p(x, y) Models

In some models of litigation the probability of plaintiff victory is not a fixed number, but rather a fixed function – in particular, a fixed function mapping scalar measures of the amount of effort exerted by each litigating party in preparing and arguing her case onto the chance that the plaintiff will win the case.

These functions are specific examples of what are often called “contest success functions,” and litigation, when modeled in this way, is an example of a more general game theoretic phenomenon referred to as a “contest:” “a game in which the players compete for a prize by exerting effort so as to increase their probability of winning” (Skaperdas 1996: 283). Contest models have also been used to study, among other things, rent-seeking, elections, lobbying, research and development races, and sports. See Corchón (2007) for a recent survey of contest theory.

Within the literature on litigation, for instance, consider Posner’s (1973) seminal model of litigation expenditure, in which the plaintiff’s probability of prevailing is given by p(x,y)=ex/(x+y), where x and y are the plaintiff’s and defendant’s litigation spending/effort, respectively, and e is a parameter representing the relative effectiveness of plaintiff spending. Katz (1988), and Bernardo, Talley, and Welch (2000) analyze a similar probability function. Katz (1988), Braeutigam, Owen and Panzar (1984), and Rubinfeld and Sappington (1987) analyze a more general, but nevertheless fixed, probability function.4

From the perspective of legal evidence, positing an exogenous probability function is similar to positing an exogenous probability number. To return to, and expand the “one-arm bandit” analogy, the parties may now vary how many coins they drop into their respective slots, and the relative amounts will affect the readout after the lever is pulled, but the mechanics are still hidden inside the box. The structure of the exogenous probability function p(x,y) is, by definition, not derived within the model, but is rather a parameter that is imposed upon the model. The papers in this branch of the literature generally do not supply a positive theory to justify their choice of the probability function that the fact finder is assumed to deploy. Nor is their choice of functional form generally accompanied by a normative analysis of what such a function should look like given some set of societal objectives. Rather, as Skaperdas (1996: 283) notes, “a considerable majority of the papers in the contest literature [including its sub-literature on litigation] has been employing specific functional forms . . . without any particular reason other than analytical convenience.”

Furthermore, litigation effort in these models is a single-dimensional variable. No description is provided regarding how parties can, do, or should direct their litigation effort within the multidimensional space of evidence production. No story is told regarding how different manifestations of effort should be differently interpreted by the fact finder.

2.3. Assessment

The law and economics of litigation – inclusive of legal evidence – has only rarely stepped beyond the two kinds of reduced form models just described, and it is worth noting that this state of affairs was not entirely to be expected. Several subfields of economics seem particularly well suited to the study of legal fact finding per se, including game theory, information economics, and mechanism design. These overlapping fields, which have been fruitfully applied to such diverse areas as taxation, regulated industries, and auction design, concern themselves with problems that arise from the combination of hidden actions, hidden information and conflicting interests – precisely the kinds of issues that would seem to be implicated by adversarial fact finding. Yet the potential for applying and extending these fields to encompass legal fact finding remains largely untapped.

Notwithstanding this general trend in litigation modeling, a relatively small group of papers takes a complementary approach. Rather than modeling fact finding in reduced form so as to focus on filing and settlement, these papers eschew a detailed analysis of filing and settlement in favor of a more explicit and in depth account of fact finding. I now turn to a review of the four main approaches that can be found in this sub-literature.

3. Pure Probabilistic Deduction

The first approach to fact finding is “pure probabilistic” deduction, as typified and advanced by Finkelstein and Fairley (1970) and Lempert (1977).5 This approach analogizes fact finding to the probabilistic deductions of an unbiased researcher examining data in the laboratory or in the field. The approach may be normative or positive.

This part of the entry first describes the basic structure of the pure probabilistic deduction model. Next, it reviews some salient applications of the model. Lastly, it describes several criticisms that have been leveled against the approach.

3.1. Basic Structure

Readers who are already familiar with conditional probability and Bayes’ rule may want to skip to Section 3.2.

The basic elements of the pure probabilistic deduction approach can be grasped by examining the following simple hypothetical.

Charged with deciding whether to hold the defendant guilty of the crime charged (or liable for damages), a fact finder has two tasks. The first is to determine the likelihood of guilt. The second task is to decide on a verdict. Consider first the fact finder’s task of determining the likelihood of guilt.

3.1.1. Bayes’ rule

On seeing a given piece of evidence, the fact finder modifies her belief regarding guilt – just as one might change one’s beliefs regarding the speed of a horse on seeing it win a race. Beliefs, before and after evidence is observed, are represented as probabilities and are accordingly assumed to be subject to the mathematical properties thereof. One of those properties, Bayes’ rule, provides a formula for updating beliefs upon receiving new evidence.

3.1.1.1. The odds formulation

There are several equivalent statements of Bayes’ rule. The “odds formulation” of Bayes’ rule is generally the most convenient to express and deploy. Let G be the event that the defendant is guilty. Let I be the event that the defendant is innocent. Let E be a particular, arbitrarily chosen body of evidence that may be observed by the fact finder. Let P(G), a number between zero and one, be the prior probability of guilt – the probability that the fact finder places on the defendant’s being guilty before the evidence has been seen.6 Let P(I) be the prior probability of innocence. The prior odds of guilt are defined to be P(G)/P(I). If, for example, the prior probabilities of guilt and innocence are both 0.5, then the prior odds of guilt equal one, in which case we might say that the odds of guilt are “one to one.” Conversely, if the prior odds of guilt are two – “two to one,” that is – then the prior probability of guilt must be two-thirds and the prior probability of innocence must be one-third.

What value should be placed on the prior odds of guilt in this framework? One might argue that, as a descriptive matter, the prior odds of guilt are given by the fact finder’s knowledge that the defendant has been arrested or indicted. (As discussed in Section 3.2.2 below, Lempert, Gross, and Liebman (2000) emphasize this point in their analysis of character evidence.) Or one might argue normatively that the fact finder must act as if the prior odds of guilt are one, so that the fact finder is “unbiased.” Or perhaps the prior odds of guilt should be less than one: to give content to the “presumption of innocence.”

The probability of guilt conditional on having seen the evidence, the “posterior probability” of guilt, is denoted P(G|E). In general, the probability of event A conditional on (positive probability) event B is defined as follows: P(A|B)=P(AB)/P(B), where AB is the conjunctive event that both A and B occur. The probability of event A conditional on event B may thus be viewed as the proportion of all possible “states of the world” where B occurs that are also states of the world in which A occurs.

The posterior probability of innocence is denoted P(I|E) and is similarly defined. Note that P(I|E) = 1 − P(G|E). That is, if the probability of guilt conditional on the evidence E is 0.6, then the probability of innocence conditional on the evidence E must be 0.4. Conditional probabilities add to one over all possible outcomes, just like unconditional probabilities.

The posterior odds of guilt (cf. the prior odds of guilt and the posterior probability of guilt) are defined to be P(G|E)/P(I|E).

It is easy to confuse the probability of guilt conditional on having seen the evidence, P(G|E), as just defined, with P(E|G), the probability of seeing evidence E conditional on the defendant’s being guilty. (See the discussion of “base rate neglect” in Section 3.2.1 below.) These two conditional probabilities are not generally equal to each other. The probability of clouds given rain (very high, if not one) is not the same as the probability of rain given clouds (not as high). Similarly, let E be the subset of adults in some group of adults and children, and let G be the subset of the same group of adults and children that are over six feet tall. Assign equal probability to each person in the overall group – so that probabilities correspond to population frequencies. The probability that an adult in the group is over six feet tall, P(G|E), is not the same as the probability that a person over six feet tall in the group is an adult, P(E|G). It may be that half the adults are over six feet (P(G|E)=1/2), but that no children in the group are over six feet tall so that every person over six feet is an adult (P(E|G)=1).

The conditional probability P(E|I) is similarly defined, and should be similarly distinguished from P(I|E).

The probabilities of the evidence conditional on each of the possible underlying truths, P(E|I) and P(E|G), play an important role in Bayes’ rule. The ratio of the probability of the evidence conditional on guilt to the probability of seeing the same evidence conditional on innocence, P(E|G)/P(E|I), is called the “likelihood ratio” (of or for the evidence E relative to the event of guilt). This likelihood ratio is discussed in more detail in Section 3.1.1.2 immediately below.

Bayes’ rule, in its odds formulation, expresses a simple relationship among the three ratios that have just been defined – the prior odds of guilt, the posterior odds of guilt, and the likelihood ratio:

$$\underbrace{\frac{P(G\mid E)}{P(I\mid E)}}_{\text{posterior odds}} \;=\; \underbrace{\frac{P(E\mid G)}{P(E\mid I)}}_{\text{likelihood ratio}} \times \underbrace{\frac{P(G)}{P(I)}}_{\text{prior odds}}. \qquad (9.1)$$

Thus, the posterior odds of guilt equal the prior odds of guilt multiplied by the likelihood ratio. The derivation of this equation is provided below the line.7 A numerical example is provided in Section 3.1.1.2 immediately below. The process of deducing posterior odds from a given piece of evidence is called “(Bayesian) updating.”

Information about the probabilities (as opposed to the odds) of guilt before and after the evidence is observed is easily recovered from the odds formulation of Bayes’ rule. Using the two equations P(I|E) = 1 − P(G|E) and P(I) = 1 − P(G), Bayes’ rule can be rewritten to show how (a particular strictly increasing function of the) probability of guilt conditional on the evidence depends on (the same strictly increasing function of) the prior probability of guilt:

$$\underbrace{\frac{P(G\mid E)}{1-P(G\mid E)}}_{\substack{\text{strictly increasing function of prob. of}\\ \text{guilt conditional on the evidence}}} \;=\; \underbrace{\frac{P(E\mid G)}{P(E\mid I)}}_{\text{likelihood ratio}} \times \underbrace{\frac{P(G)}{1-P(G)}}_{\substack{\text{strictly increasing function}\\ \text{of the prior prob. of guilt}}}. \qquad (9.2)$$

Given a number for O(G|E) = P(G|E)/(1 − P(G|E)), for example, one recovers P(G|E) algebraically or, what is the same, by means of the general relations O = P/(1 − P) and P = O/(1 + O).

3.1.1.2. The likelihood ratio

The key to using and understanding Bayes’ rule is the likelihood ratio: the probability of seeing the evidence were the defendant truly guilty divided by the probability of seeing the evidence were the defendant truly innocent.

The likelihood ratio’s role in Bayes’ rule embodies the following intuition: The evidence E may be quite likely to arise when the defendant is guilty. And one may be tempted to conclude from this fact alone that E indicates guilt. But if E is also quite likely to arise when the defendant is innocent, E will not be (or rather should not be regarded as) particularly informative of guilt. That is, in the latter case, posterior beliefs of guilt generated upon seeing the evidence according to Bayes’ rule will not deviate markedly from prior beliefs of guilt.

Conversely, the evidence E may be quite unlikely to arise when the defendant is guilty. But if it is even less likely to arise when the defendant is innocent, it will be particularly informative of guilt. That is, the posterior odds of guilt will indeed be markedly higher than the prior odds.

What matters, therefore, is the likelihood that the evidence would arise given guilt relative to the likelihood that it would arise given innocence. The word “relative” is given precise content in Bayes’ rule: what specifically matters is the ratio of these likelihoods – that is, what matters is the likelihood ratio.

If, for example, the prior odds are one, and the evidence is twice as likely to arise given guilt as it is given innocence, then the posterior odds are 2, meaning that guilt is twice as likely as innocence (after the evidence has been observed), or that the posterior probability of guilt is 2/(1 + 2) = 2/3.
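To make the arithmetic of the odds formulation concrete, the following sketch (in Python, using the hypothetical numbers from the preceding example) multiplies prior odds by the likelihood ratio and converts the posterior odds back into a probability via P = O/(1 + O). The function names and figures are illustrative only.

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds formulation of Bayes' rule: posterior odds = likelihood ratio x prior odds."""
    return likelihood_ratio * prior_odds

def odds_to_probability(odds: float) -> float:
    """Recover a probability from odds via P = O / (1 + O)."""
    return odds / (1.0 + odds)

prior = 1.0                       # prior odds of guilt "one to one", i.e., P(G) = P(I) = 0.5
lr = 2.0                          # evidence twice as likely under guilt as under innocence
post = posterior_odds(prior, lr)  # posterior odds of 2 ("two to one")
print(post, odds_to_probability(post))   # 2.0 0.666... -- posterior probability of guilt 2/3
```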

3.1.2. Iterated application of Bayes’ rule

Conditional probabilities are themselves probabilities and Bayes’ rule may be applied to them as well, as though they were prior probabilities. The iterated application of Bayes’ rule will be useful in interpreting “trial selection bias,” as discussed in Section 3.2.2 below.

Thus, after seeing evidence E, the posterior odds of guilt – where “posterior” is defined relative to E – are, as already discussed, P(G|E)/P(I|E). If additional evidence F is then observed, the E-posterior odds of guilt may be further updated by treating P(G|E)/P(I|E) as the prior odds relative to F. This results in the formula:8

$$\frac{P(G\mid EF)}{P(I\mid EF)} \;=\; \frac{P(F\mid GE)}{P(F\mid IE)} \times \frac{P(G\mid E)}{P(I\mid E)}.$$

The left-hand side of this equation represents the odds of guilt given that evidence E and F have both been observed. On the far right-hand side appear the “medial” odds of guilt, the odds of guilt given that evidence E has been observed; there are as yet no observations either way regarding event F. The other ratio on the right is the relevant likelihood ratio for the second round of updating, P(F|GE)/P(F|IE).

It is important to notice that the likelihood ratio for second-round updating takes into account that evidence E has already been observed. In general, P(F|G)/P(F|I), which does not account for E’s prior occurrence, is not the same as P(F|GE)/P(F|IE). For instance, a witness’ sighting of the defendant on the block where the murder took place (F) means something different if the defendant establishes that his elderly mother happens to reside on that block (E).

The updating process may then be repeated for any additional piece of evidence, using P(G|EF)/P(I|EF) as the prior odds, and so on.
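A small sketch of iterated updating, using a hypothetical joint distribution over guilt and two pieces of evidence. It checks that updating on E and then on F with the likelihood ratio P(F|GE)/P(F|IE) reproduces the odds obtained by conditioning on both pieces of evidence at once, and that this second-round likelihood ratio generally differs from P(F|G)/P(F|I), which ignores E. The numbers are invented for illustration.

```python
# Hypothetical joint distribution over (hypothesis, E observed?, F observed?); sums to one.
joint = {
    ("G", 1, 1): 0.20, ("G", 1, 0): 0.05, ("G", 0, 1): 0.05, ("G", 0, 0): 0.10,
    ("I", 1, 1): 0.05, ("I", 1, 0): 0.15, ("I", 0, 1): 0.15, ("I", 0, 0): 0.25,
}

def prob(pred):
    """Total probability of the outcomes satisfying a predicate."""
    return sum(p for outcome, p in joint.items() if pred(outcome))

def cond(pred_a, pred_b):
    """Conditional probability P(A | B)."""
    return prob(lambda o: pred_a(o) and pred_b(o)) / prob(pred_b)

def guilty(o):   return o[0] == "G"
def innocent(o): return o[0] == "I"
def saw_E(o):    return o[1] == 1
def saw_F(o):    return o[2] == 1

odds_after_E = cond(guilty, saw_E) / cond(innocent, saw_E)                # 1.25
lr_F_given_E = (cond(saw_F, lambda o: guilty(o) and saw_E(o))
                / cond(saw_F, lambda o: innocent(o) and saw_E(o)))        # 3.2

print(odds_after_E * lr_F_given_E)                                        # ~4.0: iterated updating
print(cond(guilty, lambda o: saw_E(o) and saw_F(o))
      / cond(innocent, lambda o: saw_E(o) and saw_F(o)))                  # ~4.0: same answer directly
print(cond(saw_F, guilty) / cond(saw_F, innocent))                        # ~1.875: P(F|G)/P(F|I) differs
```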

3.1.3. The loss function

Having updated her beliefs on all the evidence, the fact finder’s second task is to decide on a verdict. Here she considers not just her posterior assessment of guilt, but also her view of the relative harm of wrongful conviction versus wrongful acquittal. Let W_C be the cost of wrongful conviction and let W_A be the cost of wrongful acquittal. Letting E now represent the conjunction of all of the evidence presented in the case, the fact finder chooses to convict if P(I|E)W_C < P(G|E)W_A, that is, if the expected cost of wrongful acquittal exceeds the expected cost of wrongful conviction. If, for instance, the cost of wrongful conviction exceeds the cost of wrongful acquittal, then conviction will not follow even if the fact finder assesses the probability of guilt to be greater than 50%. The elevated reasonable doubt standard of proof in criminal cases is sometimes justified in this way.
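A minimal sketch of this verdict rule, with hypothetical costs. Convicting whenever P(I|E)·W_C < P(G|E)·W_A is algebraically equivalent to convicting whenever P(G|E) exceeds the threshold W_C/(W_C + W_A), which is how a cost ratio of, say, nine to one translates into something resembling a reasonable doubt standard.

```python
def convict(p_guilt: float, w_c: float, w_a: float) -> bool:
    """Convict iff the expected cost of wrongful acquittal exceeds
    the expected cost of wrongful conviction."""
    return (1.0 - p_guilt) * w_c < p_guilt * w_a

# With hypothetical costs W_C = 9 and W_A = 1, the implied threshold is
# W_C / (W_C + W_A) = 0.9, so a 0.85 assessment of guilt is not enough.
print(convict(0.85, w_c=9.0, w_a=1.0))   # False
print(convict(0.95, w_c=9.0, w_a=1.0))   # True
```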

3.2. Applications

Bayes’ rule has proven useful in clarifying and assessing various theories and empirical findings regarding fact finder error.

3.2.1. Base rate neglect

A large literature within evidence scholarship combines the prescriptive aspect of Bayes’ rule with experimental evidence9 to argue that fact finders are prone to misinterpret evidence. In particular, individuals are said to “neglect base rates” in drawing inferences from evidence. For instance, they know or are told that if the defendant were guilty, the evidence they are being shown would arise with 90% probability, and they incorrectly deduce from this that, having seen such evidence, they should regard the probability that the defendant is guilty as 90%. In making this improper deduction the fact finder, it is sometimes said, neglects the fact that the percentage of guilty individuals in the overall population – the “base rate of guilt” – is very low. Other times it is said that the fact finder is neglecting the base rate of the evidence itself. Still other times it is simply said that the finder is “neglecting base rates.”

Bayes’ rule helps uncover the precise nature of the mistake described in the example in the immediately preceding paragraph. It is also useful in describing how precisely the mistake deserves the name “base rate neglect” – that is, how the mistake is logically related to the base rate for guilt and/or evidence.

The fact finder’s mistake corresponds to confusing P(G|E) with P(E|G). As discussed in Section 3.1.1.1 above, the fact that P(E|G) is 90% does not imply that P(G|E) is also 90%. As is evident from the version of Bayes’ rule presented in equation (9.2), the reason that the probability of guilt conditional on the evidence P(G|E) does not necessarily equal the probability of the evidence conditional on guilt P(E|G) is that the former depends on two other things besides P(E|G). It depends on the base rate of guilt, P(G) (as manifest in the prior odds of guilt). And it depends on P(E|I), the probability that the evidence would arise were the defendant innocent.

It is thus apparent that the mistake is not solely a matter of neglecting the base rate of guilt. It is also not solely a matter of neglecting the base rate of the evidence. The unconditional probability of the evidence may be written as a weighted average of the two conditional probabilities:10

$$P(E) \;=\; P(E\mid G)\,P(G) + P(E\mid I)\,\bigl(1 - P(G)\bigr). \qquad (9.3)$$

If the fact finder, given P(E|G), also took into account P(E), but nothing else besides these two quantities, she would still not be able to determine P(G|E). In terms of the analysis in the previous paragraph, some algebraic manipulation shows that the fact finder would not be able to determine P(E|I) and P(G) from this limited information.11 (However, it is easier to see that P(G|E) could not be so determined by examining equation (9.4) below.)

Yet, the simultaneous neglect of both base rates – that of guilt and that of the evidence – does adequately describe the fact finder’s mistake. This can be seen in two ways. First, from equation (9.3), it is clear that combining P(E|G) with both P(G) and P(E) makes it possible to recover P(E|I). This latter quantity, along with P(G) and P(E|G), can then be taken to Bayes’ rule to determine P(G|E).

Second, and more directly, we may use the definition of conditional probability to write

$$P(G\mid E) \;=\; \frac{P(E\mid G)\,P(G)}{P(E)}. \qquad (9.4)$$
Thus, in order to “reverse the roles” of conditioning and conditioned events, it suffices to know the base rate of both guilt and the evidence. Conversely, to mistake P(E|G) for P(G|E) is to neglect either the base rate of guilt or the base rate of the evidence, or both.

More precisely, as (9.4) makes clear, to reverse conditioning and conditioned events it is not necessary to know the separate values of the two base rates. It suffices to know their ratio. Thus, a more informative (if not more appealing) label for the fact finder’s mistake might be “ratio of base rates neglect.”
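A short numerical sketch of the mistake, with invented numbers: the evidence arises with 90% probability under guilt, yet once the base rates are taken into account, the posterior probability of guilt is nowhere near 90%.

```python
p_e_given_g = 0.90   # P(E|G): evidence very likely if guilty
p_e_given_i = 0.10   # P(E|I): evidence not so unlikely if innocent
p_g = 0.01           # P(G): hypothetical base rate of guilt

# Equation (9.3): the base rate of the evidence as a weighted average.
p_e = p_e_given_g * p_g + p_e_given_i * (1 - p_g)

# Equation (9.4): reversing the conditioning.
p_g_given_e = p_e_given_g * p_g / p_e
print(round(p_e, 4), round(p_g_given_e, 4))   # 0.108 0.0833 -- far below 0.90
```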

3.2.2. Trial selection bias and past crimes evidence

Another application of Bayes’ rule concerns the propriety of admitting evidence of the defendant’s past crimes. Lempert, Gross, and Liebman (2000)12 argue against admitting past crimes evidence on the following basis: although past crimes evidence may be probative of guilt among the general population, it will not be as probative of guilt (and may even be probative of innocence) among the set of defendants whose cases actually make it to trial. Lempert, Gross, and Liebman (2000) offer several reasons why this phenomenon might arise. Most famously, they suggest that it is rooted in the tendency of the police to “round up the usual suspects.”

The trial selection bias argument can be formalized using an iterated application of Bayes’ rule, as described above.13 As we shall see, the exercise of formalizing the argument brings to the fore some potential weaknesses.

Fix an individual and an alleged crime. Let R be the event that the individual has a criminal record and let T be the event that he stands trial for the crime. The trial bias argument concerns the manner in which the individual’s past record affects the odds of guilt given that the individual now stands trial for the crime. We are thus interested in a statement of Bayes’ rule that shows how the existence of a past record converts the odds of guilt given just trial into the odds of guilt given both trial and a past record. This may be written as:

$$\frac{P(G\mid RT)}{P(I\mid RT)} \;=\; \frac{P(R\mid GT)}{P(R\mid IT)} \times \frac{P(G\mid T)}{P(I\mid T)}.$$

Thus, accounting for trial, the likelihood ratio of a past record relative to guilt is L_TR = P(R|GT)/P(R|IT). This likelihood ratio appears similar to the likelihood ratio of a past record not accounting for trial, L_R = P(R|G)/P(R|I). The difference, however, is that this new likelihood ratio represents the information value of a past record for determining guilt among defendants who appear at trial, rather than among all potential defendants. As noted in the example presented in Section 3.1.2 above (involving the defendant and his elderly mother), these values may differ substantially.

The claim of the trial selection argument is that a past record, though generally informative of guilt, is not as informative of guilt – and may be informative of innocence – when such evidence is presented in the context of trial. Thus, the trial bias hypothesis is that while L_R may well be larger than one, L_TR is less than L_R, and possibly even less than one. What does this require? It is easy to confirm that the two likelihood ratios – describing the information value of a past record with and without trial – are related as follows:

$$L_{TR} \;=\; \left[\frac{P(T\mid GR)\,/\,P(T\mid G)}{P(T\mid IR)\,/\,P(T\mid I)}\right] L_R.$$

The trial selection bias argument is thus equivalent to the assertion that the double ratio in brackets in the immediately preceding equation is less than one (and possibly even less than 1/L_R). This double ratio is less than one if and only if its denominator exceeds its numerator:

$$\frac{P(T\mid IR)}{P(T\mid I)} \;>\; \frac{P(T\mid GR)}{P(T\mid G)}.$$

This inequality is perhaps more easily interpreted if rewritten in more compact notation:

$$\frac{P_I(T\mid R)}{P_I(T)} \;>\; \frac{P_G(T\mid R)}{P_G(T)}. \qquad (9.6)$$

Inequality (9.6) is a math-symbolic statement of the key condition for the existence of the kind of trial bias that has been proposed: having a past record must increase the individual’s chance of standing trial for the crime by a greater proportion if the individual is innocent than if he is guilty.

Given that (9.6) is its key condition, the trial selection bias argument against past crimes evidence is arguably more fragile and problematic than it may at first appear. Condition (9.6) concerns a difference in differences. The issue is not whether having a past record increases the individual’s chance of standing trial. The issue is whether having a past record increases the individual’s chance of standing trial more if the individual is innocent than if he is guilty. That a past record makes a difference for appearance at trial seems reasonably clear. How this difference differs across innocent and guilty defendants is far from certain and would seem to vary greatly across circumstances.

At the very least, the explanations that have thus far been provided for trial bias seem inadequate as justifications for the real condition, (9.6). In particular, trial bias cannot be solely a matter of the tendency of the police to round up the usual suspects. Such a tendency implies that having a past record increases the individual’s chance of standing trial, period – both when the individual is innocent and when she is guilty. The usual suspects dynamic, by itself, is essentially silent on whether the increased chance of standing trial due to having a record is greater for the innocent than for the guilty.14
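A sketch of this last point with invented probabilities. If a past record multiplies the chance of standing trial by the same factor whether the individual is guilty or innocent (a uniform "round up the usual suspects" effect), the bracketed double ratio equals one and L_TR = L_R; condition (9.6), and hence trial selection bias, requires the proportional effect to be larger for the innocent.

```python
p_t_given_g, p_t_given_i = 0.30, 0.02   # hypothetical chances of standing trial, no record
round_up_factor_guilty = 3.0            # record multiplies the trial chance if guilty ...
round_up_factor_innocent = 3.0          # ... and by the same factor if innocent

p_t_given_gr = round_up_factor_guilty * p_t_given_g
p_t_given_ir = round_up_factor_innocent * p_t_given_i

l_r = 4.0                               # hypothetical likelihood ratio of a record, P(R|G)/P(R|I)
double_ratio = (p_t_given_gr / p_t_given_g) / (p_t_given_ir / p_t_given_i)
l_tr = double_ratio * l_r
print(double_ratio, l_tr)               # ~1.0 ~4.0 -- no selection bias from a uniform round-up
# Condition (9.6) holds, and L_TR falls below L_R, only if the innocent factor exceeds the guilty factor.
```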

3.2.3. Across-person hindsight bias and its rational twin

Several authors have investigated the possibility that fact finders are subject to across-person hindsight bias: namely, “in hindsight, people consistently exaggerate what could have been anticipated in foresight . . . People believe that others should have been able to anticipate events much better than was actually the case.”15 Experimental evidence has been offered to establish that across-person hindsight bias exists.

Across-person hindsight bias has important legal ramifications when a litigant’s knowledge at the time of her alleged action or omission is at issue in the case. The bias may, for example, cause fact finders to misjudge whether due care was exercised under a negligence standard, whether warnings were adequate in product liability, or whether an accident was reasonably foreseeable under strict liability.

Yet, is it really a mistake to use knowledge that an event E has occurred in judging whether others should have or did in fact know that it was likely to occur ex ante? Bayes’ rule can be used to show that it is not a mistake when (1) these others may have been in a position to know about the chance of events like E; and (2) the fact finder does not have access to full information regarding what these others knew.16

Thus, assume that it is common knowledge that the defendant took a particular action. Let A be the event that an accident occurred. Let I be the ex ante information, observed only by the defendant prior to her decision to take the action, regarding whether an accident would occur as a result.

From the perspective of the fact finder, the value of the information I received by the defendant is an uncertain variable. Thus, the defendant’s various conditional probability assessments that are based on that information are, from the fact finder’s perspective, random variables.

The fact finder is interested in whether, having seen the information I, whatever its value may have been, the defendant “knew” that the accident would occur. More precisely, the fact finder must determine whether the defendant’s assessment of the conditional probability of an accident exceeded some legally determined state-of-mind threshold k:

$$P_D(A\mid I) \;\ge\; k.^{17}$$

The subscript “D” indicates that the preceding probability represents the defendant’s subjective beliefs. The value of the number P_D(A|I) depends on I, which is uncertain from the fact finder’s point of view. Thus, from the fact finder’s perspective, P_D(A|I) is simply a random variable and the question for the fact finder is whether this random variable exceeds k.

The above threshold condition may be restated in terms of the defendant’s I-posterior assessment of odds of an accident:

$$\frac{P_D(A\mid I)}{P_D(\neg A\mid I)} \;\ge\; \frac{k}{1-k} \;\equiv\; o.$$

This odds ratio is, from the fact finder’s viewpoint, also a random variable.

The fact finder directly observes whether A or ¬A (“not A”) occurs. The fact finder then uses this information to update her beliefs regarding whether the random variable P_D(A|I)/P_D(¬A|I) exceeds threshold o.

The question whether outcome information rationally changes the fact finder’s assessment of defendant’s knowledge may be formalized as follows: is it true that

$$P_F\!\left(\frac{P_D(A\mid I)}{P_D(\neg A\mid I)} \ge o \,\Big|\, A\right) \;>\; P_F\!\left(\frac{P_D(A\mid I)}{P_D(\neg A\mid I)} \ge o \,\Big|\, \neg A\right)? \qquad (9.7)$$

The subscript “F” indicates that the probability represents the fact finder’s subjective beliefs.

In words, the issue is whether the rational fact finder’s assessment of the probability that the defendant “knew an accident would occur” – on the basis of whatever information I the defendant observed – is greater when the fact finder observes that an accident did occur than when the fact finder observes that an accident did not occur.

Substituting from the odds formulation of Bayes’ rule (shown above), (9.7) becomes

$$P_F\!\left(\frac{P_D(I\mid A)}{P_D(I\mid \neg A)} \ge o' \,\Big|\, A\right) \;>\; P_F\!\left(\frac{P_D(I\mid A)}{P_D(I\mid \neg A)} \ge o' \,\Big|\, \neg A\right), \qquad (9.8)$$

where o′ ≡ o · P_D(¬A)/P_D(A). Therefore, the question is whether the defendant’s likelihood ratio for the information I – which information is probabilistic from the fact finder’s point of view – should be considered more likely to have been above the (transformed) threshold o′ when it is observed that an accident has occurred than when it is observed that an accident has not occurred.

Now, consider the special case in which I takes one of two values, I¯ and I_, with I_ < I¯.18 To make the problem non-trivial, assume that the defendant’s observation of I¯ raises the defendant’s assessed odds of an accident above the threshold, while the defendant’s observation of I_ does not. That is, in terms of the transformed threshold o′, assume

$$\frac{P_D(\bar I\mid A)}{P_D(\bar I\mid \neg A)} \;>\; o' \;>\; \frac{P_D(\underline I\mid A)}{P_D(\underline I\mid \neg A)}. \qquad (9.9)$$

Therefore, from the fact finder’s perspective, the event that the defendant’s random likelihood ratio P_D(I|A)/P_D(I|¬A) was greater than or equal to o′ is equivalent to the event that the defendant saw I¯. Thus, condition (9.8) reduces to the condition

$$P_F(\bar I\mid A) \;>\; P_F(\bar I\mid \neg A). \qquad (9.10)$$

Condition (9.10) is the condition that the fact finder’s likelihood ratio for the defendant’s observation of I¯ exceeds 1. This is the condition that, were the fact finder to have learned – prior to learning of the occurrence of an accident – that the defendant had observed I¯, the fact finder would have raised her assessment of the odds that an accident would occur.

Condition (9.10) is satisfied if the fact finder has the same view as the defendant regarding what the defendant’s seeing I¯ means for the chance of an accident. More formally, if the fact finder’s likelihood ratios for the two values of the signal are ordered in the same manner as the defendant’s – that is, if

$$\frac{P_D(\bar I\mid A)}{P_D(\bar I\mid \neg A)} \;>\; \frac{P_D(\underline I\mid A)}{P_D(\underline I\mid \neg A)} \quad\text{implies}\quad \frac{P_F(\bar I\mid A)}{P_F(\bar I\mid \neg A)} \;>\; \frac{P_F(\underline I\mid A)}{P_F(\underline I\mid \neg A)},$$

then condition (9.10) follows from condition (9.9):

$$\begin{aligned}
\frac{P_D(\bar I\mid A)}{P_D(\bar I\mid \neg A)} &\;>\; \frac{P_D(\underline I\mid A)}{P_D(\underline I\mid \neg A)}\\[4pt]
\Longrightarrow\qquad \frac{P_F(\bar I\mid A)}{P_F(\bar I\mid \neg A)} &\;>\; \frac{P_F(\underline I\mid A)}{P_F(\underline I\mid \neg A)}\\[4pt]
\Longrightarrow\qquad \frac{P_F(\bar I\mid A)}{P_F(\bar I\mid \neg A)} &\;>\; \frac{1-P_F(\bar I\mid A)}{1-P_F(\bar I\mid \neg A)}\\[4pt]
\Longrightarrow\qquad P_F(\bar I\mid A)\bigl(1-P_F(\bar I\mid \neg A)\bigr) &\;>\; P_F(\bar I\mid \neg A)\bigl(1-P_F(\bar I\mid A)\bigr)\\[4pt]
\Longrightarrow\qquad P_F(\bar I\mid A) &\;>\; P_F(\bar I\mid \neg A).
\end{aligned}$$

Therefore, if the defendant received, prior to the accident, private information that determined her legal state of mind, and if the fact finder and defendant generally agree on the meaning of such information vis-à-vis the chance of accident, then, having observed that an accident did in fact occur, the fact finder should rationally increase her probability assessment of the event that the defendant had a culpable state of mind. In other words, when the fact finder cannot know all that the defendant knew, the fact finder’s hindsight that an accident did occur is rationally informative of the defendant’s foresight that an accident would occur.

Thus, something observationally similar to across-person hindsight bias is actually a rational response to outcome information. This does not, of course, imply that across-person hindsight bias is nonexistent. Nevertheless, many experimental designs that attempt to measure the unwarranted outcome adjustment of across-person hindsight bias do not appear to control adequately for the presence of the correlate rational adjustment. The result obtained from such experiments may thus overstate the magnitude of the irrational adjustment.
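A sketch, with invented numbers, of the two-signal case just analyzed from the fact finder's perspective: if I¯ is the "accident-likely" signal for both the defendant and the fact finder, then the probability that the defendant saw I¯ is higher conditional on an accident than conditional on no accident, so condition (9.10) holds and the outcome-based adjustment is rational rather than biased.

```python
# Fact finder's hypothetical beliefs, before observing the outcome.
p_high = 0.20                 # P(defendant observed the accident-likely signal I-bar)
p_acc_given_high = 0.60       # P(A | I-bar)
p_acc_given_low = 0.05        # P(A | I-underbar)

p_acc = p_acc_given_high * p_high + p_acc_given_low * (1 - p_high)

# Bayes' rule: chance the defendant saw I-bar, given the observed outcome.
p_high_given_acc = p_acc_given_high * p_high / p_acc
p_high_given_no_acc = (1 - p_acc_given_high) * p_high / (1 - p_acc)

print(round(p_high_given_acc, 3), round(p_high_given_no_acc, 3))   # 0.75 0.095 -- condition (9.10) holds
```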

What if the fact finder also observes what the defendant observed, which is to say I? In this case, the occurrence of an accident is irrelevant for determining the defendant’s state of mind.

To see this, return to the more general case in which the information may take many values. (The following analysis will also serve to explicate this more general case.) The question is again whether the following inequality holds for any given value of I:

$$P_F\!\left(\frac{P_D(I\mid A)}{P_D(I\mid \neg A)} \ge o' \,\Big|\, A\,I\right) \;>\; P_F\!\left(\frac{P_D(I\mid A)}{P_D(I\mid \neg A)} \ge o' \,\Big|\, \neg A\,I\right).$$

Let ℐ be the subset of information values I such that P_D(I|A)/P_D(I|¬A) ≥ o′. Then the inequality above reduces to P_F(ℐ|AI) > P_F(ℐ|¬AI). Using the definition of conditional probability, this may be restated as

$$\frac{P_F(\mathcal{I}\,A\,I)}{P_F(A\,I)} \;>\; \frac{P_F(\mathcal{I}\,\neg A\,I)}{P_F(\neg A\,I)}.$$

This condition is impossible: the left and right sides of this strict inequality are always equal. If I ∉ ℐ, then the numerators on both sides are zero, and so are both ratios. If I ∈ ℐ, then the appearance of ℐ in the numerators is redundant, and both ratios equal one.

In words, if the fact finder knows what the defendant knew at the time of the defendant’s decision, then the finder knows the defendant’s state of mind at the time of the defendant’s decision, and information regarding whether an accident later occurred is superfluous. Therefore, the rationality of using outcome information to determine state of mind relies on the (plausible) premise that the fact finder does not directly observe the information that was available to the defendant at the time of the defendant’s decision.

3.3. Paradoxes and Criticisms

The pure probabilistic approach to fact finding accords with the conventional approach to decision making under uncertainty followed by probabilists, statisticians, and economists.19 Even in these originating fields, however, it is not without detractors. Paradoxes such as those due to Allais (1953) and to Ellsberg (1961), for instance, call into question both its descriptive power and its normative validity.20

As a description of legal fact finding, in particular, it has been subject to a number of specific objections.21 Two of these are clearly laid out in Allen (1986). A third leads naturally to a discussion of the next class of evidence models.

3.3.1. The conjunction problem

The first is the “conjunction problem,” itself the result of a conjunction of legal features. First, guilt or liability often turns on two or more findings of fact – as when a verdict requires findings regarding how the defendant acted, what harm she thereby caused, and whether she intended the consequences of her actions. Second, the law requires that each element of the charge, claim or defense be found to have obtained with a threshold probability, a “standard of proof” (commonly thought to be 50% in civil actions and something more than this, perhaps 90%, in criminal actions). The requirement that each element must individually be subject to a 50% threshold, say, implies a weaker requirement, possibly much weaker, for the probability of the conjunction of elements, which is to say the charge, claim or defense itself. In the extreme case in which the elements are statistically independent, the fact that each is more likely than not implies only that their conjunction is more likely than 0.5^n < 0.5, where n is the number of elements. Particularly troubling is the fact that the implied threshold probability for a charge, claim, or defense decreases (quite rapidly) in the number of elements it contains, a factor with uncertain relevance. With four independent elements, the implied threshold is not 50%, but 6.25%.22
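The arithmetic behind the final sentence, as a tiny loop (the 50% per-element threshold and the independence assumption are those of the example):

```python
for n in range(1, 5):
    print(n, 0.5 ** n)   # 1 0.5 | 2 0.25 | 3 0.125 | 4 0.0625 (= 6.25%)
```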

3.3.2. The gatecrasher paradox

The second objection is implicated by the “gatecrasher paradox.” Suppose that the defendant was one of 1,000 people in attendance at an event at which it is known that 499 attendees purchased tickets and 501 crashed the gate. Assuming no other evidence is available, could (should) a fact finder conclude that the defendant more likely than not failed to pay for his seat, and could (should) the law hold the individual liable for the ticket price? While probabilistic analysis appears to imply that the defendant should be held liable, some see this as strongly contrary to intuition.23

3.3.3. Accounting for the interests of the parties

Allen (1986) is drawn from a watershed symposium held at Boston University on the benefits and drawbacks of applying probabilistic deduction to evidence law. (Symposium (1986).) Many of the other papers in the corresponding issue of the Boston University Law Review are also well worth examining. Taken as a whole, the symposium papers provide a comprehensive picture of evidence scholarship at the time regarding the utility of applying formal theories of probability, conventional and unconventional, to the problem of legal evidence.

Yet the symposium (as well as contemporaneous evidence scholarship) largely ignores an arguably more serious drawback of the pure probabilistic approach.24 The approach gives short shrift to a fundamental and distinctive fact about legal evidence. Unlike experimental evidence generated in a chemist’s laboratory or field evidence gathered in a macroeconomist’s survey of inventories, legal evidence is provided by conscious, animate individuals with strong interests in what the fact finder decides and a strong possibility of influencing that decision.25

It is one thing to prescribe, as does the pure probabilistic approach, that the fact finder interpret evidence according to the relative likelihood of its production under alternative truths. It is another thing to explain how these relative likelihoods ought to account for the interests of the parties responsible for the production. And it is yet another thing to ask how fact finding should be structured to account for the necessity of this strategic accounting. If the defendant in a criminal case produces for us her oath-bound testimony that she is innocent, how do we evaluate the relative likelihood of observing such a performance if she really were innocent versus if she were in fact guilty? Wouldn’t she claim innocence either way? If so, what constitutes convincing evidence of innocence?

4. Omission Models

Where then should one turn to gain a better understanding of how deductions from evidence are and ought to be made in light of the parties’ interests? Within the discipline of economics, game theory – including the subfields of information economics and mechanism design – seems like a natural candidate. Much of game theory, after all, concerns the situation in which one individual or entity would like to make use of information in the possession of another whose interests differ from her own.26 The theory of optimal auction design, for example, studies the tension between the seller’s desire to learn – and charge – the maximum price that a bidder would be willing to pay, and the bidder’s reluctance, anticipating this plan, to truthfully reveal this value. The theory of optimal taxation is similarly based on taxpayers’ reluctance to truthfully reveal their immutable endowments and preferences – upon which the government would ideally base its tax system – if doing so would adversely affect their tax bill.

However, on the relatively limited number of occasions that economists have turned their attention to legal evidence, they have generally taken a more limited approach. Although economic models of legal evidence do account for litigants’ incentive to manipulate the information reaching the fact finder, the account is typically incomplete. In most such models, parties may refrain from reporting to the fact finder all that they know. But they may not falsify, fabricate or forge. This stark omission (as it were) generates stark, but fragile, results.

This section of the chapter begins with a description of the basic structure of omission models. It then presents several interpretations of, and variations on, this basic structure. A third subsection describes extensions and applications of the model, and a fourth reviews several criticisms.

4.1. Basic Structure

The omission model approach to legal evidence is typified by Milgrom and Roberts (1986).27 Milgrom and Roberts present both a single agent model and multiple agent model. The next two sections present each model verbally, provide a numerical example, and describe “precedents” in law and legal scholarship.

4.1.1. Single party model

Milgrom and Roberts show how the fact finder may costlessly determine the true state of affairs from a nonetheless interested party who is known to be informed regarding that state of affairs, under the assumption that the party may not and will not provide information that is inconsistent with her knowledge. In this case, the fact finder need only announce ahead of time that if the informed party supplies her with ambiguous information, she will assume the worst outcome for the party that is consistent with the information supplied.28 It follows that the worst deduction for the party that is consistent with the information that she provides will in fact be the truth. This is because the party would always have an incentive to clear up any ambiguity as between the truth and any less favorable deduction.

A simple numerical example will help to clarify the result and its logic. Suppose that the true state of the world is one of ten possibilities, each indexed with a number between 1 and 10. The fact finder (she) does not know the true state. The party (he) does know the true state, and the fact finder knows that the party knows. The fact finder would like to learn the true number (i.e., state of the world). The party would like the fact finder to think the number is as high as possible, and the fact finder knows this as well.

The party chooses what to report about the true number to the fact finder. (Other, “more evidentiary” interpretations of this game are presented in Section 4.2 below.) The fact finder makes a deduction about the true number based on this report.

The rules of the game are that the party may not lie. He may, however, omit to say all that he knows. That is, he may decline to report the exact number, providing instead a subset of numbers in which the true number lies. He might, for example, reveal that the number is odd (if in fact it is). Or that the number lies between 3 and 7 (if in fact it does).

Despite the party’s ability to omit information, the fact finder can be assured of learning the true number by announcing that upon hearing the party report a subset of numbers, she will assume that the true number is the lowest in that subset. The party would in that case never find it in his interest to report a subset containing a number lower than the truth. He can always do better – that is, inspire the fact finder to deduce a greater number – by removing any number from his report that is lower than the truth. Therefore, the lowest number in the party’s report, the number in fact chosen by the fact finder, will indeed be the truth. If, for example, the true state is 4, the party will never report the subset {2, 4, 6, 8, 10}. Doing so would cause the fact finder to deduce that the truth is 2. The party could do better by eliminating 2 from this report, whence the fact finder will deduce 4, which is the truth.
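A sketch of the assume-the-worst rule in the ten-state example, under the stated assumptions: the party may report any subset that contains the true state, prefers higher deductions, and cannot lie. Enumerating every admissible report confirms that none induces a deduction above the truth, so the rule elicits the true state. The names and the brute-force enumeration are illustrative only.

```python
from itertools import combinations

STATES = range(1, 11)

def deduction(report):
    """Assume-the-worst rule: deduce the lowest state consistent with the report."""
    return min(report)

def best_achievable_deduction(true_state):
    """Highest deduction the party can induce with a truthful (truth-containing) report."""
    others = [s for s in STATES if s != true_state]
    best = 0
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            best = max(best, deduction({true_state, *extra}))
    return best

print(best_achievable_deduction(4))       # 4 -- no admissible report beats revealing the truth
print(deduction({2, 4, 6, 8, 10}))        # 2 -- why the party avoids this particular report
```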

It is worth noting that the assume-the-worst rule used by the fact finder in Milgrom and Roberts is akin to an adverse inference from “spoliation,”29 an ancient component of evidence doctrine dealing with a party’s refusal or purposeful inability to supply evidence alleged to be under her control. In terms of the numerical example in the immediately preceding paragraph, an additional document in the party’s possession might, for example, allow the fact finder to distinguish between state 2 and the subset of states {4, 6, 8, 10}. If the party refuses to provide the document – that is, if the party in effect reports {2, 4, 6, 8, 10} – the fact finder, under this rule, assumes that the truth is 2, the worst case for the party among the relevant possibilities.

Analysis of this kind of assume-the-worst rule also crops up in the legal scholarship on evidence preceding Milgrom and Roberts (1986). Indeed, the idea appears in attempts to resolve the gatecrasher paradox, as discussed in Section 3.3.2 above. Kaye (1979), for example, suggests that the plaintiff’s failure to present any information beyond the naked statistical evidence of ticket sales and attendance may be taken to indicate that the rest of the evidence that is likely available to him would hurt his case.

4.1.2. Multiple informed parties with conflicting interests

Milgrom and Roberts (1986) also put forward a related dual informed party model in which the parties’ interests are in conflict. In this model, the fact finder again learns the truth. Indeed, this occurs no matter what rule the fact finder uses to choose among the set of states that are consistent with both parties’ reports: the parties’ conflict of interest substitutes for the fact finder’s sophistication. If the parties have strictly opposing interests and each knows the truth, the fact finder will always end up learning the truth so long as she restricts her decision to the intersection of the parties’ reports. (Again, the parties may omit information, but may not lie.) This is because whatever the fact finder’s decision within that intersection, if it is not the truth, one party will prefer the truth and so will have an incentive to (and the ability to) refine her report.

Return to the subset reporting example presented above in which there were ten possible states. Imagine now that two parties know the truth and each reports a subset containing it. Suppose that one party wants the fact finder to think the number is as high as possible and the other as low as possible. Thus, the parties’ interests are in conflict. Suppose that the fact finder (unthinkingly) takes as the true state the average (rounded up or down) of the numbers in the intersection of the parties’ reports. Then, if the average of the points in the intersection of reports is not the truth, one party or the other will prefer the truth to this average and thus have an incentive to report the truth as a singleton. If, for example, the truth is 4, one party says “3 or greater” and the other “7 or less,” then the intersection of reports is 3 through 7, the average is 5 and the party that prefers lower numbers to higher has an incentive to refine her report to “4.”
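A sketch of the two-party example under the same no-lying assumption: the fact finder unthinkingly takes the rounded average of the intersection of the reports, and whichever party prefers the truth to that pick refines her report to the singleton truth. The specific reports below are the ones from the example.

```python
def pick(report_high, report_low):
    """Unthinking rule: rounded average of the intersection of the two reports."""
    overlap = sorted(set(report_high) & set(report_low))
    return round(sum(overlap) / len(overlap))

truth = 4
report_high = {s for s in range(1, 11) if s >= 3}   # "3 or greater" (this party prefers high numbers)
report_low = {s for s in range(1, 11) if s <= 7}    # "7 or less"  (this party prefers low numbers)

first_pick = pick(report_high, report_low)          # average of 3..7 is 5
if first_pick > truth:
    report_low = {truth}        # the low-preferring party refines to the truth
elif first_pick < truth:
    report_high = {truth}       # the high-preferring party refines to the truth

print(first_pick, pick(report_high, report_low))    # 5 4 -- the fact finder ends up at the truth
```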

The logic of this result resonates with earlier less formal discussions within case law and evidence scholarship regarding the relative benefits of adversarial procedure.30 According to the US Supreme Court, “the very premise of [the] adversarial system . . . is that partisan advocacy on both sides of a case will best promote the ultimate objective that the guilty be convicted and the innocent go free.”31

4.2. Four Interpretations/Variants of the Omission Model

Several interpretations/variants of the two games discussed above appear in the omission literature. I will discuss these with reference to the single agent model in Section 4.1.1.

4.2.1. Subset reporting

The first interpretation is “subset reporting,” which was described above and is the main interpretation offered by Milgrom and Roberts (1986). The logic of truth revelation is clearest under this interpretation. Yet, as a description of actual evidence production, it might be regarded as too abstract.

4.2.2. Truth-consistent presentation

The “truth-consistent presentation” interpretation is somewhat more representational of actual legal evidence. Under this interpretation, rather than directly reporting to the fact finder a particular subset of states, the informed party makes an evidentiary presentation that is commonly known to correspond to a particular subset of states.

Thus, imagine that each state s is associated with a subset of evidentiary presentations or “messages,” E(s) ⊆ E, where E denotes the set of all possible presentations. The subset E(s) represents the evidentiary presentations that are consistent with state s in some “natural” sense (which may not be fully specified). The association may be semantic. For instance, the state s may in part specify that the party paid a third person to supply him with a particular product. Several of the presentations in E(s) may involve providing the fact finder with a signed contract whose language is consistent with that fact.

A given presentation e ∈ E may be associated with more than one state: that is, there may exist two states s and s′ such that s ≠ s′ and e ∈ E(s) ∩ E(s′). Thus, a signed contract may also be consistent with the state in which the party failed to pay for the product. Indeed, the null presentation (i.e., the choice to make no presentation), which is presumably an available option for the party, is consistent with all states.

Two assumptions are imposed upon this structure. First, the correspondence E(.) is common knowledge between the party and fact finder. p. 228Second, given true state s, the party may make any presentation in E(s), but no presentation outside of E(s). Thus, the party’s presentation must be consistent with the true state, though it need not fully indicate the true state.

To understand the linkage between this structure and subset reporting, consider the following simple example: there are eight possible evidentiary presentations, A, B, C, D, F, G, H, I, and three possible states, 1, 2, 3. Table 9.1, read row-by-row, shows for each state (row) the presentations (columns) that are naturally associated with that state. For example, presentations D, G, H and I are associated with state 3. The set E(s) corresponds to the set of shaded boxes in the row corresponding to state s. Thus E(3) = {D, G, H, I}.

Table 9.1 Example illustrating the connection between subset reporting and truth-consistent presentation

                 Documents
States     A    B    C    D    F    G    H    I
1          ■    ■    ■    ■    –    –    ■    –
2          –    –    ■    ■    ■    ■    –    –
3          –    –    –    ■    –    ■    ■    ■

(■ = shaded cell: the presentation is consistent with the state; – = not consistent)

The key to linking this framework to the subset reporting model described above is to notice that the table can also be read column-by-column. Thus, given any evidentiary performance, the table tells us the subset of states with which such performance is naturally associated. For example, reading down column G, we see that performance G is associated with states 2 and 3. It follows that when the party makes a particular presentation to the fact finder, he is in effect truthfully reporting to the fact finder a subset of states, as in the subset reporting interpretation. When the party makes presentation G, for instance, he is in effect truthfully reporting to the fact finder that the true state lies in the subset {2, 3}. In general notation, on presenting evidence e, the party is in effect reporting to the fact finder the subset {s | e ∈ E(s)} of states.

The truth revelation result discussed above may be restated in terms of this new interpretation. Indeed, this interpretation serves to highlight both an implicit requirement and a generalization of such result.

The additional requirement is that the correspondence E(s) be sufficiently “rich.” To take an extreme example, if every cell in Table 9.1 were shaded – signifying that every available evidentiary presentation was consistent with every true state – truth revelation would be impossible. (In subset reporting terms, this would correspond to the party’s somehow p. 229being unable to convey in language that the true state was in any subset other than the full set itself.)

Notice that quite contrary to this possibility, Table 9.1 has the following "extreme richness" property: every subset of states has at least one corresponding presentation in the sense that such subset precisely equals the subset of states consistent with such presentation. That is, for all subsets of states S there exists a presentation e such that {s | e ∈ E(s)} = S. For the singleton subset consisting solely of state 1, for instance, there are, in fact, two such presentations: A and B. For the subset of states {1, 2}, there is one, C. For the subset of states {1, 2, 3} there is also one, D, and so on. Thus, just as in the subset reporting interpretation, every subset of states may, in effect, be reported.

Extreme richness is sufficient (but not necessary) for truth revelation. Recall that in the subset reporting interpretation, the fact finder announces that she will assume that the truth is the party's least favorite state among the states in the subset that the party reports to her. In the truth-consistent presentation interpretation, the fact finder announces that upon seeing a presentation A, B, C, . . ., or I, she will deduce that the true state is the party's least favorite state among those states that are consistent with the presentation. Suppose, for example, that the truth is 2 and that the party prefers higher states to lower. Given that the true state is 2, the party is able to present C, D, F, or G. If the party makes presentation F or G, the fact finder, using an assume-the-worst rule, will deduce that the true state is 2. If the party makes presentation C or D, the fact finder will deduce that the true state is 1, which is worse than 2 for the party. Therefore, the party will present F or G and the fact finder will decide that the state is 2, which is the truth.

Conversely, extreme richness is not necessary for truth revelation – this is the generalization referred to above. The truth revelation result does not require that every subset of states be effectively reportable via some presentation. All that is required is that for every state there be at least one presentation which allows the party to rule out in the fact finder’s mind those states that the party regards as worse than such state. Assuming the party prefers higher states to lower states, the requirement is that every state be the lowest state that is consistent with some presentation. Consider, for example, Table 9.2, which is a pared down version of Table 9.1. Notice that each state is the lowest that is consistent with some presentation. For example, state 2 is the lowest consistent with G. An assume-the-worst rule would also produce truth revelation under Table 9.2. Were state 3 the true state, for example, the party, who prefers higher states to lower states, would be able to report G, H, or I. But the party would never present G or H because, given the fact finder’s assume-the-worst rule, p. 230the party could do better presenting I, in which case the fact finder’s rule would lead it to the truth.

Table 9.2 Example illustrating that extreme richness is not necessary for truth revelation

                 Documents
States     G    H    I
1          –    ■    –
2          ■    –    –
3          ■    ■    ■

(■ = shaded cell: the presentation is consistent with the state; – = not consistent)
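A small computational sketch of the assume-the-worst deduction may help. The correspondence below is a hypothetical reconstruction of the shaded cells of Tables 9.1 and 9.2, consistent with the discussion in the text, and the party is assumed to prefer higher states:

```python
# Presentations consistent with each state (Table 9.1); Table 9.2 keeps only G, H, I.
E1 = {1: {"A", "B", "C", "D", "H"},
      2: {"C", "D", "F", "G"},
      3: {"D", "G", "H", "I"}}
E2 = {s: pres & {"G", "H", "I"} for s, pres in E1.items()}

def deduce(e, E):
    """Assume the worst: the lowest state consistent with presentation e
    (the party's least favorite, since she prefers higher states)."""
    return min(s for s, pres in E.items() if e in pres)

def outcome(true_state, E):
    """Among presentations feasible in the true state, the party picks the one
    that the fact finder's rule interprets most favorably."""
    best = max(E[true_state], key=lambda e: deduce(e, E))
    return deduce(best, E)

print([outcome(s, E1) for s in (1, 2, 3)])   # [1, 2, 3]: truth revelation under Table 9.1
print([outcome(s, E2) for s in (1, 2, 3)])   # [1, 2, 3]: also under the pared-down Table 9.2
```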

4.2.3. Feasible presentation

The third interpretation, “feasible presentation,” is similar in structure to truth-consistent presentation. The difference lies in the interpretation of the correspondence E(.). Under the feasible presentation approach, E(s) represents the presentations that are feasible for the party in any given state, rather than those that are naturally or semantically consistent with the true state. Interpreting Table 9.1 in this way, when the true state is state 3, presentations D, G, H, and I are the only presentations that the party is able to make. All other presentations are impossible.

The feasible presentations interpretation subsumes the truth-consistent presentation interpretation. Return for a moment to viewing Table 9.1 as a map of presentations that are naturally/semantically consistent with each state, as opposed to feasible under such state. If we layer on top of this the restriction that truthful presentations, and only truthful presentations, are possible, Table 9.1 also becomes a representation of the set of feasible messages.

Conversely, the feasible presentations interpretation is conceptually broader than the truth-consistent presentations interpretation. The feasible presentations interpretation does not specify what the “technology” is that rules out certain presentations in certain states. An exogenous legal prohibition on untruthful presentation may well be doing some or all of the work. Physical impossibility or prohibitive cost may also be important factors.

4.2.4. Infinite or zero cost

A fourth interpretation, which is actually a gloss on the first three, merely replaces “feasible/permitted” with “of zero/negligible cost” and “infeasible/not permitted” with “infinitely/prohibitively costly.” In the subset reporting interpretation, with this gloss, the cost of reporting any subset containing the true state is zero, while the p. 231cost of reporting any subset not containing the true state is infinite. In the truth-consistent presentation interpretation, all evidentiary performances in E(s) have zero cost when the true state is s, while all performances outside E(s) have infinite cost. The source of infinite cost is presumably sanctions for truth-inconsistent reporting. The adjustment in the feasible presentation interpretation is similar: “feasible” becomes “of zero cost,” “infeasible” becomes infinitely costly. The source of costs is left unspecified and may be legal, technological, economic, or some combination thereof.

This fourth "binary cost" interpretation will be useful in comparing omission models with the cost signaling models discussed in Section 5 below.

4.3. Applications and Extensions

In this section, I discuss a selection of the many applications and extensions of the basic omission models described above in Section 4.1.

4.3.1. Strategic search models and the possibility of pro-plaintiff or pro-defendant bias in the litigation system

The strategic search model of evidence production is a variant of the omission model that in effect combines the logic of the omission model with the logic of contest models of litigation, as discussed in Section 2.2. In strategic search models of litigation, parties sample from an exogenous distribution of "pieces of evidence," deciding both when to stop sampling and what of their accumulated sample to show the court. Each time a party draws a sample she incurs a cost. Parties may decline to report elements of their accumulated sample to the court, but, as in Milgrom and Roberts (1986), they may not fabricate observations.

p. 232Froeb and Kobayashi (1996) use a strategic search model to show in effect that Milgrom and Roberts' (1986) two-party result, described above, extends to the case in which evidence is costly to acquire and the decision maker may be biased. The fact that evidence is costly to acquire does not prevent a correct decision because, all else the same, the party in the right, being favored by the distribution of evidence from which parties draw, ends up presenting more favorable evidence to the fact finder. Moreover, the litigation system automatically compensates for any decision maker bias because the party favored by such bias reacts, in Froeb and Kobayashi's model, by slacking off on evidentiary effort.

Daughety and Reinganum (2000) generalize Froeb and Kobayashi’s model and emphasize the dependence of Froeb and Kobayashi’s results on symmetries in sampling costs and sampling distributions. Daughety and Reinganum suggest that, all told, the adversarial system favors defendants.

Froeb and Kobayashi (2000), discussed in Section 4.3.2 immediately below, is another example of this type of model.

4.3.2. Adversarial versus inquisitorial procedure

Omission models have been employed to compare the relative merits of adversarial process (wherein litigating parties compete before a spectator/fact finder) and inquisitorial process (whereby the parties are restrained and the fact finder investigates and questions).32 Shin (1998) argues that inquisitorial process generates less information than adversarial process even when the inquisitor’s investigative ability is as good as that of each adversary (taken individually). In Shin’s model, as in Milgrom and Roberts (1986), parties can suppress evidence but cannot fabricate it. Shin shows, in essence, that the downside of adversarial process – the fact that evidence may be manipulated – can be significantly alleviated by the kind of assume-the-worst-of-omission deductions studied in Milgrom and Roberts (1986). With this downside mitigated, the upside of adversarial process – the fact that it offers multiple sources of evidence – becomes decisive in comparing systems.

Froeb and Kobayashi (2000) employ a strategic search model of evidentiary sampling similar to that in Froeb and Kobayashi (1996). Under adversarial process, each party decides first, how many times to draw from a given distribution of evidence and second, what of her sample to present in court. Once again, parties cannot fabricate any portion of their sample. The fact finder “averages” the evidence placed before her. This induces each party to present only the single piece of evidence in her sample that most favors her case. Under inquisitorial process, on the other hand, the inquisitor herself samples from the distribution and averages all the data. Which system does better? If the parties face the same sampling costs, both systems leave the fact finder with the same assessment on average, namely the true mean of the distribution. In other words, whether one takes the average of extreme draws in each case or the average of all draws in each case, one’s average assessment over all cases will equal the true mean of the sampled distribution: both estimating procedures are “unbiased.” However, the two estimators differ in their variance – a proxy for their degree of error.33 Which system has less error is indeterminate. The p. 233outcome depends on the shape of the underlying distribution and the cost of sampling.
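A hedged Monte Carlo sketch of this comparison (my construction, with hypothetical parameters and a normal sampling distribution, not Froeb and Kobayashi's specification): each adversary draws n pieces of evidence from a symmetric distribution and shows only her most favorable draw, which the fact finder averages; the inquisitor draws 2n pieces herself and averages them all. Both estimators center on the true mean; their variances differ.

```python
import random
import statistics

N, TRIALS = 5, 20000   # hypothetical sample size per party and number of simulated cases

def adversarial(n=N, trials=TRIALS):
    # Plaintiff shows her maximum draw, defendant her minimum; the fact finder averages the two.
    return [(max(random.gauss(0, 1) for _ in range(n)) +
             min(random.gauss(0, 1) for _ in range(n))) / 2
            for _ in range(trials)]

def inquisitorial(n=N, trials=TRIALS):
    # The inquisitor draws 2n pieces of evidence herself and averages all of them.
    return [statistics.mean(random.gauss(0, 1) for _ in range(2 * n))
            for _ in range(trials)]

for name, estimates in (("adversarial", adversarial()), ("inquisitorial", inquisitorial())):
    print(name, "mean %.3f" % statistics.mean(estimates),
          "variance %.3f" % statistics.variance(estimates))
# Both means sit near 0 (the true mean); which variance is smaller depends on the
# shape of the distribution and on how much sampling each procedure buys.
```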

4.3.3. Link to primary activities

Bull and Watson (2004) adopt a variant of the feasible presentation interpretation discussed above. They link evidence production in a later phase of the model to action choice in an earlier phase: action choice determines the set of feasible evidentiary presentations.34 In particular, Bull and Watson study how the prospect of transfers based on evidence production affects contracting and contractual performance. Their main result characterizes kinds of actions that are implementable under this endogenous evidentiary structure.

4.3.4. Partial provability in a multi-party context

Consider again the example from Section 4.1.2 in which each of two informed parties with conflicting interests reports to the uninformed fact finder something true, but perhaps not complete, about the actual state of the world, where the state of the world is indexed by the numbers between one and ten. In that example, the conflict of interest between the two parties implied that, whatever the fact finder’s rule for choosing among elements in the intersection of the parties’ reports, if the truth would differ from the fact finder’s determination, one of the parties would prefer to refine her report to precisely indicate truth. An implicit assumption in that analysis was that the party who preferred to refine her report was also able to do so.

What if the parties were not always able to precisely communicate the true state? What if, for example, both parties were able to report the true state precisely in all states except state 5, in which the only report that either side could make was the subset {2, 4, 5, 6}. How would this situation arise? Perhaps (to shift momentarily to the feasible or truth-consistent evidentiary presentation interpretation), the evidentiary presentations available to the parties in state 5 are simply “inconclusive:” everything the parties could show or do in front of the fact finder in state 5 might also be shown or done in states 2, 4 and 6.

Would the truth revelation result fall apart? This is the main question addressed by the sub-literature on partial or limited “provability.”

Section 4.2.2 discussed partial provability in the context of the single p. 234party model. The remainder of this section discusses partial provability in the context of the multi-party conflict of interest model.

4.3.4.1. General Discussion

Let us begin by establishing that limits on provability may indeed be sufficient to prevent truth revelation, even in a model with two parties that have conflicting interests. This is certainly the case if the fact finder is unsophisticated, as in Milgrom and Roberts’ (1986) multi-party model. If, for example, the fact finder’s rule is to average (and round down) the intersection of reports, then, in the example from the last paragraph, the fact finder will incorrectly deduce “4” when, the true state being 5, she hears “{2, 4, 5, 6}” from both parties.

Even when the fact finder is both sophisticated and fully aware of the limitations on provability, partial provability may prevent her from being able to deduce the truth. Suppose, for example, that {2, 4, 5, 6} was also the only report that each party could make when the truth was 6. Then the fact finder would be unable to distinguish states 5 and 6 based on the parties’ reports.

Conversely, it is equally clear that full provability is unnecessary for truth revelation (just as in the single party case discussed in Section 4.2.2). If, to use the same example, the precise state may be reported in all states except 5, in which only {2, 4, 5, 6} may be reported, then the fact finder will deduce the truth by adopting any rule which chooses from within the intersection of reports, and in which “{2, 4, 5, 6}” heard from both parties is interpreted as “5.” If the state is not 5, one of the parties will prefer the true state to 5, and be able to report the true state precisely. Thus, the fact finder will hear “{2, 4, 5, 6}” from both parties when and only when the truth is 5.

The question(s) then arise: given a set of assumptions about the sophistication and knowledge of the fact finder, and the degree of conflict in the parties’ interests, how partial can provability be without defeating the fact finder’s ability to learn the truth? One version of this question is considered by Lipman and Seppi (1995) [“LS”].35

4.3.4.2. Lipman and Seppi (1995)

LS consider partial provability with conflict of interest and sophisticated fact finders who are fully aware of the restrictions on provability. In LS all players face the same restrictions on p. 235provability. Furthermore, LS model reporting as a sequential game, rather than a simultaneous game.36

Although LS is a very technical paper, much of its complexity arises from the authors’ consideration of the case in which there are more than two parties. The two-party case, once it is culled from LS’ more general technical analysis (which itself is not easy), is not particularly difficult to grasp.

Nevertheless, a proper explanation will require some development, and some readers may wish to skip to the next section.

Thus, suppose that the two parties each move once in sequence, reporting in effect a subset of states.37 Consider the state 5 and all the subsets of states, each containing 5, that may be reported when the truth is 5. Suppose one of these subsets is {2, 4, 5, 6} and suppose that the report {2, 4, 5, 6} has the following property: if the truth is 2 and not 5, a subsequent subset report could be made that includes 2 and not 5; similarly, if the truth is 4 and not 5, a subsequent report could be made that includes 4 and not 5; and if the truth is 6 and not 5, a subsequent report could be made that includes 6 but not 5. That is, for every state in the report aside from 5 itself, a subsequent report could be made that rules out 5. The subset of even numbers, for example, would simultaneously work as such a report for 2, 4, and 6.

Suppose then that the fact finder were to announce a decision rule in which, inter alia, she would provisionally believe 5 if she heard “{2, 4, 5, 6},” but would change her mind if the second reporter subsequently reported a subset not containing 5. Then, by hypothesis, if the true state were not 5, this falsity could be effectively “communicated” by the second reporter. The second reporter’s report would not necessarily, on its face, communicate, even in conjunction with the first report, the precise identity of the true state. But it would refute the fact finder’s provisional belief that the true state were 5. Thus, {2, 4, 5, 6}, if taken to mean “5,” acts as a subsequently/potentially “refutable” report of “5.” And, therefore, we may say that state 5 “has” (at least one) refutable report.

Let us now imagine that for every state there is at least one such refutable report. In this case, the message space taken as a whole is, in LS' terms, "weakly refutable."38 LS prove that, in the context of their model, weak refutability (plus a "rich language condition" to be discussed) is sufficient for the existence of a decision rule for the fact finder that lands on the true p. 236state no matter what it might be, so long as the parties' interests are in conflict (and even if the decision maker knows only this about their preferences).39

Here is the argument. Imagine that the fact finder begins to construct the following decision rule (which LS refer to as a “believe-unless-refuted” rule): The fact finder wishes first to assign to each possible state a refutable report. But she wishes to do so on a one-to-one basis, so that each state has its own refutable report. Given the assumption of weak refutability, each state has at least one refutable report. In the example above, 5 has {2, 4, 5, 6}. But, does each state have its own unique report?

In the subset reporting framework in which parties simply report subsets, the answer is “not necessarily.” However, LS adopt the feasible presentation interpretation described in Section 4.2.3. The feasible presentation interpretation, like the truth-consistent presentation interpretation (see the last point in Section 4.2.2), allows for the possibility that there may be several different ways to effectively report a subset of states. LS specifically assume (their “rich language condition”) that for any given subset of states, there are at least as many different ways to report such subset as there are elements in the subset. It is as if such reports of subsets are made in writing with crayons and there are as many crayon colors to choose from as there are elements of the subset being reported.40

(To show fulfillment of LS’ rich language condition in Table 9.1 (which has only three states) we would need at least 3 + 3 × 2 + 1 × 3 = 12 columns. Alternatively, we could imagine that each column/presentation may be made in any one of three colors.)
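The count in the parenthetical generalizes: under the rich language condition, n states require at least n · 2^(n−1) distinct presentations. A small sketch of the count (illustration only):

```python
from itertools import combinations

def min_presentations(n):
    """Each nonempty subset of n states needs at least as many distinct
    presentations as it has elements (LS' rich language condition)."""
    return sum(len(subset)
               for k in range(1, n + 1)
               for subset in combinations(range(n), k))

print(min_presentations(3))   # 12, matching 3 + 3 x 2 + 1 x 3 in the text
print(min_presentations(4))   # 32, i.e., n * 2**(n - 1)
```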

If we impose LS’ rich language condition, then each state does have its own report by the following argument. Suppose that, without the crayons, {2, 4, 5, 6} is the only refutable report for both 5 and 6. This subset having four elements, there must be at least four colors in which it might be reported. Therefore, it is possible for the fact finder to announce a rule that assigns, say, {2, 4, 5, 6} in red to 5 and {2, 4, 5, 6} in blue to 6, and so on.

Given this unique assignment of refutable reports to states, the fact finder announces that she will provisionally believe that a state is true if she hears its uniquely assigned refutable report. For example, if the fact finder receives {2, 4, 5, 6} in red from the first reporter, she provisionally believes that the true state is 5.

Now move to the second party’s report, assuming that {2, 4, 5, 6} in red has been reported and is interpreted to mean “5.” If 2, 4, or 6, and not 5, is p. 237in fact not the truth, then by the assumed refutability of the message space, there must in each case be another subset containing, respectively, 2, 4, or 6, and not 5 that the second reporter could report. If any one of these subsets were reported, 5 would be refuted. But what would the fact finder then believe? If there existed three distinct reports refuting 5, one consistent with each of 2, 4, and 6, the answer would be simple. The fact finder would believe 2 upon hearing the distinctive report for 2, 4 on hearing the distinctive report for 4, and so on. But what is to guarantee that the 5-refuting reports for 2, 4 and 6 are distinct? Again, LS’ rich language condition solves the problem. Even if the only refuting report for each of 2, 4, and 6 is, for instance, “the even numbers,” the rich language condition implies, in effect, that this one report could be made in as many different colors as there are even numbers. A fortiori, the fact finder can associate “the even numbers” in yellow with 2, “the even numbers” in green with 4, and “the even numbers” in purple with 6. The upshot is this: there is a way for the second reporter to correct the finder’s decision, when after the first round, the combination of the fact finder’s decision rule and the report she receives causes the fact finder to be mistaken.

Given this, imagine that the fact finder finishes the construction of her decision rule by adopting the following subsidiary rule for the second move: continue to believe the state associated with the first report, unless the second reporter reports a refuting subset/color, in which case believe the single state pre-associated with that subset/color.

It then follows that the fact finder will learn the truth. For consider what happens when it is the second reporter’s turn. Let’s imagine (although the fact finder need not know this) that the first mover prefers higher states to lower, and the second mover, lower states to higher.

One possibility is that the first reporter “reports the truth.” Suppose, for example, that the truth is 5, and that the first reporter reports the subset/color pre-assigned to 5: namely, {2, 4, 5, 6} in red. In that case, the second reporter cannot refute 5. As in the rest of the literature on omissions, all of the second reporter’s reports must correspond to the true state 5. In terms of the feasible presentations interpretation (see Section 4.2.3), when the true state is 5, the second reporter cannot, by definition, make a report that is not feasible in state 5. Therefore, given the fact finder’s believe-unless-refuted rule, the fact finder correctly decides that 5 is the true state.

For future reference note that, because every state is so reportable according to the fact finder’s association of subset-colors and states, the first reporter always has the option of obtaining truth-telling payoffs. Therefore, she will never choose a report that is projected to provide her with lower payoffs than this.

A second possibility is that the first reporter makes the report associated p. 238with a given state even though that state is not the truth. For example, the first reporter might make the report associated with 5, {2, 4, 5, 6}, in red, even though the truth is not 5, but is 2, 4, or 6. Note that the fact that {2, 4, 5, 6} in red has been associated with state 5 does not change the fact that it is also feasible when the true state is 2, 4, or 6. Note also that if {2, 4, 5, 6} in red is reported, the true state cannot be anything other than 2, 4, 5, or 6.

The first subsidiary case of this second possibility is that the truth is higher than the fact finder’s interpretation of the first reporter’s report (recall that the first reporter prefers higher states). Suppose, for example, that the true state is 6. Then, the second reporter, who prefers lower states, may or may not choose to refute the first reporter’s report of 5 (i.e., {2, 4, 5, 6} in red). But if she does, this would only be to make the fact finder believe that the truth is even lower than 5. Therefore, the first reporter, who prefers higher states, and can guarantee herself truth payoffs from 6, will never present the report for 5. In general, the first reporter will never communicate any state that is lower than the truth.

The second subsidiary case is that the truth is lower than the state assigned to the first reporter’s report. For example, suppose the truth is 4 even though the first reporter reported {2, 4, 5, 6} in red, which is pre-associated with 5. The second reporter, who prefers lower numbers, would prefer the truth, 4, to a decision that the state is 5. Moreover, the second player has the ability to communicate that the state is 4, by, for example, stating in green that the true number is even, if that is the report pre-associated with 4. Indeed, the second player may even be able to refute 5 by making a report that is interpreted as a state even lower than 4. Perhaps the second reporter can state in yellow that the true state is even, and this will be taken to mean 2. In any event, the fact finder will end up deciding on something that is either the truth or lower than the truth. If it is the truth, we are done. If it is lower than the truth, then the first reporter, anticipating all this, would not have gotten herself into such a position in the first place; she would have made a report that would have been taken to mean the true state, 4.

Therefore, anticipating the second reporter’s move, the first reporter has no incentive to make any report other than that pre-assigned to the true state, which message the second reporter cannot refute. And, conversely, in the only cases in which the first reporter has no affirmative incentive to make the report that is assigned to the true state, the second reporter has an affirmative incentive to make any necessary corrections.41

4.4. p. 239Assessment of Omission Models

This section critically appraises the literature on omission models and in doing so sets the stage for the third approach to modeling legal evidence, which is described in Section 5 below.

4.4.1. The no-lying assumption in the subset-reporting and truth-consistent presentation interpretations

Truth revelation results for omission models are precariously balanced on the assumption that agents cannot fabricate evidence. If, for instance, we remove the restriction on lying from the one-through-ten example used to explain Milgrom and Roberts’ (1986) single agent result in Section 4.1, the fact finder’s assume-the-worst rule merely induces the single agent to report her favorite outcome, “10,” regardless of the true state. Likewise, if we remove the no-lying restriction from the two agent conflict-of-interest example presented in Section 4.1.2, one agent will respond to the fact finder’s intersection rule by reporting “1,” and the other by reporting “10.” The fact finder’s decision rule will not lead to truth revelation. Indeed, the rule, which specifies a way of choosing from the intersection of the parties’ reports, will not even function: the agents’ reports will not intersect.

It is thus worth asking whether the no-lying assumption is justified, at least as an approximation of reality.

Many contributors to the omissions literature motivate the no-fabrication assumption by pointing out that lying in court is illegal by virtue of statutes criminalizing perjury, obstruction of justice and similar transgressions. This defense is problematic in several respects.

First, let us accept for purposes of argument the premise that behavior that has been rendered illegal is not an issue for the analysis of legal evidence. The problem for the omissions literature, then, is that much of the behavior that this literature would characterize as “omission,” rather than fabrication, is also illegal – by virtue of subpoena enforcement, compelled discovery, and statutes on obstruction of justice and contempt.42 Furthermore, instances of omission that are not now illegal might be made so. Thus, if illegality does (or can) rule out behavior, the omissions literature itself is left without a problem to solve. (Alternatively, it must explain why making omission illegal is not the best policy – see the fourth point below.)

Second, the implicit premise that illegal behavior is not an issue for evidence is troublesome. The illegality of fabrication does not rule it out, p. 240either in practice or in principle. As a matter of theory, the detection-probability-discounted penalty for lying may not outweigh the potential gains from doing so. As a matter of empirics, despite the fact that fabrication in court is often illegal, what limited data exist suggest that it is a regular occurrence.43

Third, one is justified in asking why, if the no-lying assumption is legitimate in the context of judicial process, it is so rarely deployed by economists in other more frequently studied settings in which the government is also the principal. There is no corresponding no-fabrication assumption in the literature on optimal taxation, or in large swaths of the literature on optimal regulation. Perjury and related crimes, like lying to investigators and tax fraud, apply in these settings as well. If the existence of such sanctions justifies the no-lying assumption in omission models, why does it not justify assuming that the optimal tax authority can observe wage rates as opposed to just labor earnings?

Fourth, and perhaps most importantly, by assuming that agents cannot lie, the omissions literature on evidence effectively assumes away one of its most fundamental challenges and, correspondingly, blocks off one of its chief sources of potential utility. Crucial and interesting questions central to the practical design of evidentiary procedure are shunted aside. How precisely do laws regarding perjury and obstruction function? Are such laws effective? Are they efficient? Are they effective and efficient in some settings and not in others? What are the alternatives to such laws for system design? More generally, what is the best way to structure litigation if we do not take as already solved the elemental problem that parties have an incentive to falsify their testimony and forge their tangible evidence?44

4.4.2. Lingering problems in the feasible presentation interpretation

The assumption, in subset reporting, that a party may not lie, becomes, in truth-consistent presentation, the assumption that the party may not make a presentation that is inconsistent with the true state, where the p. 241correspondence E(.), as represented by Table 9.1, defines consistency. Presumably, in both of these interpretations, some external legal prohibition defines and enforces these reporting restrictions.

In feasible presentation, the no-lying assumption is transmuted into the bare assumption that the party’s choice of presentation is constrained by the table – for reasons that are not specified. Thus, assume that the true state is 1. In subset reporting, the party is prohibited from lying and reporting that the true state is in the subset {2, 3}. In truth-consistent presentation, the party is prohibited from making presentation G, which is consistent with 2 and 3, but not 1. In feasible presentation, G is simply designated as impossible. G may be impossible because it is semantically inconsistent with the truth and truth-inconsistent presentations are legally prohibited. Or it may be somehow technologically or economically impossible, without the aid of legal prohibition. That is, G is the kind of presentation that, for whatever reason, cannot be faked when the truth is neither 2 nor 3.

Arguably, eschewing the explicitly truth-semantic subset reporting and truth-consistent presentations interpretations in favor of the feasible presentation interpretation does little to solve the problem identified in the preceding subsection. Rather, the feasible presentations interpretation appears merely to relocate the problem. In lieu of wondering why the defendant/surgeon could not lie about her performance during the seven-hour operation, the reader is left to wonder why there should exist an evidentiary performance that would be possible (at zero cost) for the surgeon if she had mindfully followed best practices, and impossible if she had not. If the response to the reader’s inquiry relies on the assertion that the surgeon is legally prohibited from presenting evidence that semantically contradicts the truth, then we are back to the problem with the explicit no-lying assumption. If the response relies on the assertion that there exists a kind of evidentiary presentation that is physically/technologically/economically impossible when the surgeon has not followed best practices, the question then becomes: where can such evidence be found? The time, expense, and endemic, lingering ambiguity of actual litigation suggest that, even if this unicorn exists, it is rarely discovered and harnessed by litigants.

Put another way, in the feasible presentation interpretation, the richness condition takes on added importance and correspondingly requires greater scrutiny. By contrast, with truth-consistent presentation, it is not difficult to imagine that there exists a presentation that is truth consistent with each given subset of states. As noted, simply reporting the subset of states is, after all, a presentation. What is perhaps difficult to imagine in truth-consistent presentation is the absolute prohibition on lying. In the feasible interpretation, we are not specifically asked to imagine an absolute p. 242prohibition on lying. Rather, we are asked to imagine that for (not all, but) a key collection of subsets of true states, it is possible to find an evidentiary presentation that would be effectively impossible in any state outside that subset. This may well be true for some subsets of states. But is it true for enough subsets of states to make possible the level of discernment that the fact finder requires? Is the evidentiary space rich enough?

For example, in one possible presentation, the party may appear before the fact finder missing a leg. This presentation is effectively impossible in a number of possible states. For example, the fact finder may be effectively certain that the true state is one of those in which the party has no leg. Moreover, the fact finder may be effectively certain that the true state is one of those in which the party's leg was involuntarily severed. Yet, this presentation tells the fact finder little about how the leg was severed. Was it caused by an accident? If so, who was the injurer? Was the injurer negligent? Was the party herself contributorily negligent? How will missing a leg impact the party economically? Mentally and emotionally? The fact finder may be interested in these issues as well, and it seems unlikely that the evidentiary space is rich enough to contain unfakeable presentations for all, or even most, of what will be at issue in the case.

4.4.3. The illusory solution of adopting the infinite or zero cost interpretation

Recall that a fourth interpretation replaces “feasible/permitted” with “of zero/negligible cost” and “infeasible/not permitted” with “infinitely/prohibitively costly.” Arguably, this cost-based interpretation adds little to the omission model besides, perhaps, rhetorical appeal. After all, one could argue that economics as a field is partly defined in opposition to the mistake of regarding as predetermined variables that are actually subject to choice by self-interested agents. Could it be that the binary cost interpretation absolves omissions models of this mistake with respect to fabrication and lying? If so, then presumably any model that exogenously prohibits an action might improve itself by specifying that actors, with finite budgets, may indeed choose the action, so long as they pay an infinite price.

Indeed, in some respects, the binary cost interpretation makes the omission model look less appealing. Recasting the model in terms of binary costs places in stark relief the questionable binary nature of permissibility/feasibility. Under the cost interpretation, all evidentiary performances inside E(s) are equally (un)costly. For instance, an evidentiary performance e that pinpoints the state – i.e., E⁻¹(e) = {s} – costs as much to present, namely zero, as an evidentiary performance e′ that communicates nothing about the state – i.e., E⁻¹(e′) = {1, . . ., 10}. Furthermore, all evidentiary performances outside E(s), regardless of whether they merely spin the truth or turn it fully on its head, are equally (prohibitively) costly. p. 243Accordingly, the cost difference between any performance in E(s) and any performance outside E(s) is always the same: it is always infinite.

4.4.4. The partial solution of partial provability

Allowing for only partial provability, as do Lipman and Seppi (1995), does weaken the no-lying/infeasibility assumption and so restores some degree of plausibility to the omission model. But, relative to the substantial distance separating omissions models from the basic outlines of real litigation, the improvement seems negligible. Lipman and Seppi’s truth revelation result requires, inter alia, that it be impossible for the second mover to falsely refute the true state. If the first mover/defendant/surgeon has actually taken adequate care and makes a presentation that is contingently so interpreted, and if a given presentation by the second mover/plaintiff/patient would be interpreted as a refutation of the surgeon’s assertion, such presentation must be, in the state of adequate care, infinitely costly for the plaintiff.

Moreover, for truth revelation to work, any allowance for paucity in the evidence available to the second mover must be compensated for by additional richness in the evidence available to the first mover. Whenever there is no means for the second mover to distinguish state s from state s′ (that is, whenever every feasible second mover report that contains s also contains s′), there must be a first mover report that is of zero cost when the true state is s and of infinite cost when the true state is s′. Thus, if the patient cannot distinguish best practice and negligence, there must (again) be a report that is feasible for the surgeon if she followed best practice and impossible for the surgeon if she was negligent.

4.4.5. Other problems

There are other drawbacks to omission models in addition to the severity of the no-lying assumption. Several important issues are masked by the extreme binary nature of evidence costs, as discussed in Section 4.4.3 above. For instance, the manner in which the structure of evidence costs determines the range of supportable rewards and punishments is hidden from view. In an omission model, the stakes of the case are never so large as to inspire the production of false evidence; the cost of false evidence is infinite. Furthermore, all evidence that is actually presented in an omissions model is of zero cost (or at least constant cost), and so the question of how to design evidentiary process efficiently – taking account of the potential tradeoff between detail and certainty, on the one hand, and the deadweight cost of investigation and presentation, on the other – is simply not encountered in this literature.

These points will resurface in Section 5.2.1 when the omission model is compared to the costly signaling model.

5. p. 244Evidence as Endogenous Cost Signaling

Responding in part to the drawbacks of omission models, Sanchirico (1995, 2000, 2001b) analyzes the production and interpretation of legal evidence in a world in which parties can and will attempt to mislead the fact finder whenever it is in their interest to do so, and in whatever manner furthers those interests. This allowance necessitates viewing evidence production – to the extent that one believes it is at all effective – as a form of differential cost signaling, as described below.

These papers also take the additional step of explicitly linking evidence production to social goals beyond litigation per se. To the extent that legal evidence production is effective at the specific task of setting ex ante incentives in the “primary activity” – incentives to, for instance, refrain from physical violence, take adequate precaution, adopt safe product designs, comply with environmental regulations, disclose material adverse information, or fulfill contractual promises – the signaling costs of legal evidence must be endogenous to parties’ primary activity decisions.45

These two elements of the endogenous cost signaling approach – costly signaling and the primary activity-endogeneity of signaling costs – are discussed in sequence in the next two subsections.

5.1. Exogenous cost signaling

5.1.1. In general

The idea of exogenous cost signaling is often attributed to Spence (1974), who models educational attainment as a signal of natural ability. It is, in fact, also the basis of Mirrlees’ (1971) optimal income tax model.46 Almost 40 years after publication of these papers, costly signaling remains a commonly deployed mechanic in economic modeling generally. Yet it is oddly uncommon in the economics literature on legal evidence, for which it is arguably perfectly suited.

Spence’s approach was roughly this: Suppose an employer would be willing to pay higher wages to individuals of higher ability if only she knew p. 245who these individuals were. Merely asking job candidates whether they are of high ability won’t do: given the prospect of a higher wage, many candidates would answer “yes” whatever the truth. A candidate’s possession of a college degree might, however, act as a more reliable signal. Even though individuals of all abilities could conceivably raise their wage by earning a degree, this might be cost justified only for individuals of high ability, for whom earning the degree is less arduous.

One may agree or disagree with this account of education. But Spence’s (1974) general point regarding credible information transmission has resonance: an individual’s action reliably “signals” that she is of a particular “type” – i.e., has particular characteristics or knows particular information – when it would not have been in her interest to take that action were she some other type. In other words, talk is cheap and actions speak louder than words.

5.1.2. Applied to legal evidence

In the case of legal evidence, parties’ “types” are what they know about the event or condition in question, or how they acted in the primary activity. The relevant actions/signals are parties’ “performances” before the fact finder. This includes whether they present documents and things that are difficult to forge, as well as whether the witnesses they offer give consistent, detailed, robust and coordinated testimony. Focusing on the role of cognitive limitations, Sanchirico (2004a) explains why these “performances” might be more costly for some “types” than for others.

Assuming that such cost differences exist – at least probabilistically – their role is illustrated by the following example. Suppose that some piece of evidence cost $10 to produce when it is real and $100 when it is fake. If it is understood that production of this piece of evidence increases the party’s payoffs at litigation by some amount strictly between $10 and $100, then the evidence would be worth producing only when it is real. That is, the fact finder may reliably deduce that the evidence is real from the fact that it is produced, because it would not have been in the party’s interest to produce it otherwise.
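A stylized sketch of this dynamic, using the $10/$100 costs from the example and a hypothetical litigation reward R for producing the evidence:

```python
def produces(evidence_is_real, reward):
    """The party produces the evidence only if the litigation reward exceeds its cost:
    $10 when the evidence is real, $100 when it must be faked."""
    return reward > (10 if evidence_is_real else 100)

for reward in (5, 50, 150):
    print(reward, "real:", produces(True, reward), "fake:", produces(False, reward))
# Only for rewards strictly between $10 and $100 (e.g., 50) does production separate
# real from fake evidence, letting the fact finder read production as a signal of truth.
```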

One could question whether evidence is reliably more expensive to fake than to truthfully present. One could also question whether litigation payoffs can be reliably calibrated to separate the truthful from the untruthful (though we shall see that separation is not necessary). Yet, arguably, something like the dynamic in this (albeit stark) example must be at work if evidence production is to have value in a world of interested parties and hidden information.47

5.1.3. Comparison to omission models

Before moving on to endogenous cost signaling, it is worth comparing exogenous cost signaling with the zero/infinite cost interpretation of omission models (see Section 4.2.4). It could be said that exogenous cost signaling smoothes the cost structure relative to this interpretation of omission models. Whereas the cost of an evidentiary performance must be either zero or infinite in omissions models, it may have some middling value in exogenous cost evidence models. Alternatively, one can say that omissions models rule out the choice to lie and fabricate (what is possible at infinite cost is, in fact, not possible), whereas exogenous cost signaling models incorporate such choices into the analysis. Moreover, exogenous cost signaling models recognize that even truth-consistent evidentiary presentations are expensive, which raises several interesting issues of system design.

Tables 9.3 and 9.4 show the omission model version and the costly signaling version, respectively, of the numerical example in the previous section. These tables help to clarify the difference between the two approaches.

Table 9.3 p. 246 Omission model version

State     Present evidence     Do not present evidence
1         0                    0
2         ∞                    0

Table 9.4 Costly signaling version

State     Present evidence     Do not present evidence
1         $10                  $0
2         $100                 $0

Under the omissions model, if the state is 1, no matter how small the (positive) “reward” for presenting the evidence, the party always presents it. Moreover, when she presents it, the social cost is zero. On the other hand, if the state is 2, then no matter how large the reward for presenting the evidence, the party never presents it.

p. 247Under the “smoothed” costly signaling version, if the true state is 1, the agent presents the evidence if and only if the reward is greater than $10, and this cost (which includes the cost of investigation and presentation) is added to the social cost of litigation. Notice that even though the evidence is true, the party will not present it, if the reward for doing so is insufficient. On the other hand, if the state is 2, then the agent presents the evidence if and only if the reward exceeds $100, and this larger amount is added to the social cost of litigation. Even though the evidence is false, the party will still present it, if the stakes are high enough.

Two issues arise naturally in the costly signaling model that are absent from the omission model.

First, notice that, in the costly signaling version, the agent’s payoff in state 1 cannot be more than $90 greater than her payoff in state 2. In state 2, the agent always has the option of paying an additional $90 in evidence costs to achieve the reward (or lesser punishment) that is available to her in state 1. Thus, the structure of evidence costs imposes limits on implementable payoff differences across states. This phenomenon does not arise in omission models.
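Restated in notation not used in the original (a hedged summary, writing v_s for the party's net litigation payoff in state s and c_s for her cost of presenting the designated evidence in state s), the mimicry argument bounds implementable payoff differences by the evidence-cost difference:

\[
v_1 - v_2 \;\le\; c_2 - c_1 \;=\; \$100 - \$10 \;=\; \$90 .
\]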

Second, suppose there was another evidentiary performance whose costs were $5, $50 rather than $10, $100. This new evidentiary performance imposes lower costs on society, but is only capable of producing a payoff difference of $45, as opposed to $90. Thus, a potential tradeoff arises between the size of the payoff difference that is implemented and the deadweight cost of evidence production. This phenomenon also does not arise in omission models.

The second phenomenon suggests another drawback of omission models. Section 4.4 criticized the assumption in omission models that there exists evidence that is of infinite (or prohibitive) cost when the state that it is taken to represent is not true. The tradeoff discussed in the last paragraph points to the fact that existence is not the only issue. Even if effectively-impossible-to-fake evidence did exist, it might be woefully inefficient. Suppose it were true that evidence that was extremely expensive when false was also quite expensive, though less so, when true. Perhaps the many witnesses, documents, and things that make the performance so difficult to fake, also make it difficult to present truthfully. Thus, imagine that there is another piece of evidence in the example above that costs $1,000,000 (which we will regard as effectively infinite) to present when false, and $10,000 to present when true. If system design requirements were such that the payoff difference need only be $85, it would be much more efficient to use the $10, $100 evidence. The latter is sufficiently difficult to forge, and much cheaper when truthfully presented. These kinds of cost considerations are very real; they are frequently cited in evidence p. 248case law and in the history of evidentiary rule-making. (See, for example, Federal Rule of Evidence 403.)48

5.2. Endogenous cost signaling

5.2.1. Basic structure

Sanchirico (1995, 2000, 2001b) ties the idea of evidence as a costly signal directly to the creation of primary activity incentives by positing that parties' evidence costs are not exogenous but rather endogenous to their behavior in the primary activity. The gist of the model is apparent in the following simple example.

A regulator wishes to induce firms to comply with a particular regulation, despite the fact that compliance costs firms an additional $100,000. The regulator requires that at the end of the period each firm appear before a review board to "present evidence" of its compliance. Based solely on this evidence, the review board then decides whether and how much to fine the firm.

In order for the regulator to induce compliance in this setting, it is both necessary and sufficient that it identify some form of presentation or performance before the review board, some “evidence,” whose production costs for the firm vary appropriately with the firm’s compliance activity.

To illustrate that the appropriate production cost differences can be sufficient, suppose that compliance happens to lower the firm’s cost of a particular presentation from $140,000 to $20,000. Let the regulator announce prior to the firm’s compliance decision that it will fine the firm $130,000 unless it presents this evidence before the review board. How does the firm react? First, consider its choice of what to present to the review board contingent on whether it has complied. If the firm has complied, the presentation would cost $20,000, but save $130,000 in fines; hence, the firm’s “best case” would be to present the $20,000 evidence and avoid the fine. If the firm has not complied, the presentation would cost $140,000 to produce, which is more than it would save in fines. The firm’s “best case” would now be to simply show up and pay the fine. Therefore, the firm’s prospective payoffs at the review board hearing will be –$20,000 if it complies, and –$130,000 otherwise. Consequently, compliance increases the firm’s prospective hearing payoff by $110,000. Stepping back to the firm’s choice p. 249in the primary activity, we see that this $110,000 benefit outweighs compliance’s $100,000 direct cost, and so the firm chooses to comply.

To illustrate that such endogenous cost differences are necessary for incentive setting, suppose the regulator made avoiding the $130,000 fine dependent on a form of evidence whose presentation cost was always $50 regardless of the firm’s compliance behavior. Then the firm would always present the evidence at the hearing, regardless of its compliance choice, and its prospective payoffs at the hearing would always be –$50. The hearing would then be irrelevant to its compliance choice, and so it would make this choice solely according to compliance’s $100,000 direct cost – i.e., it would choose not to comply. The same problem crops up if the designated evidence always costs the firm $1,000,000. In this case, the firm would never present the evidence and would always lose $130,000 at the hearing, regardless of its compliance activities; the hearing would again be irrelevant to its compliance decision. What is important is not that the evidence be costly to present, but that presentation costs tend to be lower following compliance.
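A minimal backward-induction sketch of both halves of this example (hypothetical helper names; the dollar figures are those used in the text):

```python
def firm_outcome(comply_cost, evid_cost_if_comply, evid_cost_if_not, fine):
    """Step 1: at the hearing the firm presents the designated evidence only if
    that is cheaper than paying the fine. Step 2: the firm complies only if the
    resulting gap in hearing payoffs exceeds the direct cost of compliance."""
    def hearing_payoff(evid_cost):
        return -min(evid_cost, fine)
    payoff_comply = hearing_payoff(evid_cost_if_comply)
    payoff_not = hearing_payoff(evid_cost_if_not)
    return payoff_comply, payoff_not, (payoff_comply - payoff_not) > comply_cost

# Sufficiency: evidence costs $20,000 after compliance, $140,000 otherwise; fine $130,000.
print(firm_outcome(100_000, 20_000, 140_000, 130_000))
# (-20000, -130000, True): the $110,000 payoff gap outweighs the $100,000 compliance cost.

# Necessity: compliance-insensitive evidence costs make the hearing irrelevant.
print(firm_outcome(100_000, 50, 50, 130_000))                  # (-50, -50, False)
print(firm_outcome(100_000, 1_000_000, 1_000_000, 130_000))    # (-130000, -130000, False)
```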

5.2.2. Implications and applications

5.2.2.1. Truth Revelation Versus Primary Activity Incentive Setting

In most scholarship on legal evidence, finding out what really happened between the parties is taken to be the objective of trial. Consider the widespread use of the phrase “fact finding” to describe the central activity of trial. And recall that the chief issue in the literature on omission models is whether there is truth revelation. In an endogenous cost signaling model, by contrast, the reason to have trials is to create primary activity incentives. Importantly, truth finding is neither sufficient nor even necessary for this task. What is important for truth finding is some separation in evidentiary actions. What is important for primary activity incentive setting is sufficient separation in evidentiary payoffs. In an endogenous cost signaling model, either of these may occur without the other.

It is fairly clear that truth finding is insufficient for incentive setting – that some separation in evidentiary actions does not imply sufficient separation in litigation payoffs. In the example that was provided above to illustrate the basic mechanic of endogenous cost signaling, imagine dividing all the dollar figures by 1000 – except compliance costs, which remain at $100,000. The firm still presents the evidence (now costing either $140 or $20) if and only if it has complied. Therefore, the regulator still learns the truth about compliance. However, compliance increases the firm's prospective hearing payoff by only $110, which is insufficient to offset the additional $100,000 primary activity cost of compliance.

To see that truth finding is unnecessary – that sufficient separation in evidentiary payoffs is possible without any separation in evidentiary actions – consider again the example from above with all numbers restored to their original magnitudes. Recall that the regulator announced that it would fine the firm $130,000 unless it presented the evidence costing either $140,000 or $20,000. With this reward structure, only the compliant firm presents the evidence. Thus, the regulator learns whether the firm has been compliant. But the fact that the regulator learned the truth from the evidence was purely collateral. What mattered was the fact that the hearing payoffs were sufficiently higher for the compliant firm than for the noncompliant firm. And since evidence production costs differ, this does not require that compliant and noncompliant firms present different evidence. Suppose, for example, that the regulator instead announced that the firm would be fined $150,000 if it did not present the evidence. Then both compliant and noncompliant firms would find it worthwhile to present the evidence, and the regulator would never learn whether a given firm had been compliant or not. On the other hand, the hearing payoffs for both firms would now consist solely of presentation costs, and the difference in these costs (between $140,000 and $20,000) would still be enough to overcome the additional $100,000 cost of compliance in the primary activity.49
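
Rerunning the same sketch with the $150,000 fine shows payoff separation without action separation (again, purely illustrative arithmetic):

```python
# Sketch of the $150,000 variant: both firm types present the evidence (pooling),
# yet hearing payoffs still separate by enough to induce compliance.
fine = 150_000
payoff_comply = -min(20_000, fine)     # -20,000: the compliant firm presents
payoff_violate = -min(140_000, fine)   # -140,000: the noncompliant firm presents too
advantage = payoff_comply - payoff_violate
print(advantage, advantage > 100_000)  # 120000 True: compliance pays, although
                                       # the regulator never learns who complied
```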

5.2.2.2. Questioning the Emphasis on “Verifiability” in Contracts Scholarship

Contract theorists sometimes suggest that it is optimal for contracts to condition obligations only on contingencies that can be “verified” to a court (see, e.g., Hart and Moore 1988; Schwartz 1992). One implication of this principle, it would seem, is that an optimal contract should not specifically induce parties to fabricate evidence should a legal dispute arise.

Yet, to most contracting parties, verifiability is an intermediate goal. Adjudication creates value primarily through its ex ante effect on performance incentives. Anticipating the judicial resolution of future disputes, contracting parties are likely to be interested in the likelihood or cost of judicial truth finding only to the extent that the court’s ability to discern the truth efficiently improves contract incentives and the gains from trade.

Indeed, Sanchirico and Triantis (2008) argue that the parties themselves might prefer to permit evidence fabrication as part of a conscious contracting strategy that emphasizes efficient performance incentives over accuracy or fairness in the resolution of disputes.

The authors make their point using a probabilistic endogenous cost signaling model of evidence. A buyer and a seller design a sales contract in which they seek to motivate the seller to “perform” (e.g., deliver a good or service of a particular quality within a particular time frame). In particular, they wish to induce the seller to perform while incurring the lowest possible prospective litigation costs.

The buyer can sue under the contract to collect damages and, in litigation, can present either true or fabricated evidence.50 An “evidentiary state” is defined to be the quantum of truly existing evidence of nonperformance, and the probability of a given evidentiary state is endogenous to whether the seller in fact performed. True evidence of nonperformance is costly for the buyer to present, and the buyer can choose to present all or part of the existing quantity of such evidence. Importantly, the buyer may also choose to present a quantum of evidence that exceeds the quantity of existing evidence. That is, the buyer may also fabricate evidence of nonperformance. The marginal cost of fabricated evidence is higher than that of true evidence.

Because the parties care about litigation costs, and because fabricated evidence is more expensive, the parties would prefer – all else the same – to induce seller performance without inducing the buyer to go beyond the quantity of truly existing evidence of nonperformance. Were this factor the only consideration, the parties would indeed wish to avoid the circumstance in which the reward that the buyer receives for evidence of nonperformance is so great that it induces the buyer to fabricate. Avoiding this circumstance would effectively bound the seller’s liability (which equals the buyer’s reward for evidence) in each state.

However, all else is not the same. The effect on the seller’s ex ante performance incentive of increasing the seller’s liability in any given evidentiary state depends in part on the difference between the probability that the state will occur if the seller performs (p) and the probability that the state will occur if the seller does not perform (q). Moreover, the effect on ex ante litigation costs of the buyer’s presentation of a given unit of evidence of nonperformance depends not only on the ex post cost of such unit of evidence, but also on the ex ante likelihood that such state will occur (r). Thus, ignoring for a moment the difference in cost between truthful and fabricated evidence, the parties can reduce the cost of achieving any given performance incentive by increasing the seller’s sanction in states in which the ratio r/(q – p) is low and reducing it where such ratio is high.

The consideration described in the immediately preceding paragraph constitutes a second factor in optimal contract design, and it is independent of the difference in cost between truthful and fabricated evidence. Importantly, this second factor may well be decisive. Specifically, where the ratio r/(q – p) ranges widely across different evidentiary states – and so is much greater in some states than in others – the parties may prefer to increase the seller’s liability in states with low ratios even if that requires paying the greater evidentiary costs of the fabrication by the buyer that is thereby induced in those states.
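
A deliberately stylized sketch of this allocation logic (my own simplification under assumed linear evidence costs and per-state liability caps, not the contracting model in Sanchirico and Triantis 2008) loads liability onto the states with the lowest r/(q – p):

```python
# Stylized sketch: allocate seller liability across evidentiary states so as to
# hit an incentive target at minimum expected evidence cost. Assumes, purely for
# illustration, that expected evidence cost is proportional to r_i times the
# liability s_i, and that liability in each state is capped.
states = [
    # (name, p = prob. of state if seller performs, q = prob. if seller breaches,
    #  r = ex ante prob. of state, cap on liability in the state)
    ("state A", 0.05, 0.45, 0.10, 200.0),
    ("state B", 0.20, 0.40, 0.30, 200.0),
    ("state C", 0.30, 0.35, 0.40, 200.0),
]
target = 100.0  # required incentive: sum over states of s_i * (q_i - p_i)

plan, remaining = [], target
# Cheapest incentives first: low r/(q - p) means more deterrence per unit of cost.
for name, p, q, r, cap in sorted(states, key=lambda s: s[3] / (s[2] - s[1])):
    s_i = min(cap, remaining / (q - p))
    plan.append((name, round(s_i, 1)))
    remaining -= s_i * (q - p)
    if remaining <= 1e-9:
        break

print(plan)  # [('state A', 200.0), ('state B', 100.0)]: liability concentrates
             # in the low r/(q - p) states; state C is left alone
```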

5.2.2.3. “Presumptions” and Litigation-Primary Activity Feedback

Bernardo, Talley, and Welch [BTW] (2000) apply the idea of endogenous evidence costs (Sanchirico 1995, 2000, 2001b) to study the effect of legal “presumptions.” The definition and real impact of legal presumptions are complex and unresolved issues in evidence scholarship. In BTW (2000), the effect of legal presumptions is interpreted as a parameter in the kind of exogenous probability function studied in Posner (1973), as discussed in Section 2.2.51

The theoretical analysis in BTW (2000) concerns the positive effect of presumptions on litigation and primary activity behavior. Normative issues regarding the optimal design of litigation are also analyzed, but the analysis is confined to numerical simulations. BTW apply their findings to three substantive law areas: private securities litigation, the business judgment rule in Corporations law, and fiduciary duties to lenders in financially distressed firms.

The remainder of this discussion concerns BTW’s positive findings and the structure of the BTW model.

BTW’s positive findings center on the counterintuitive consequences of feedback effects from litigation design to primary activity behavior. The authors highlight the possibility that shifting “presumptions” in favor of defendants may increase, rather than decrease, plaintiff filings and may even result in a larger, rather than a smaller, frequency of plaintiff victory among filed cases. These effects can be explained by imagining that the changes induced by modifying the presumption occur in sequence.

The shift in the presumption initially increases the (potential) defendant’s litigation payoffs. This is true regardless of the defendant’s marginal cost of evidence (which may be “high” or “low”; evidence costs are linear). Given BTW’s functional form assumptions,52 however, the increase in litigation payoffs is greater for high marginal evidence cost defendants. The defendant has high marginal evidence costs if and only if she has “shirked” in the primary activity. Thus, the favorable shift in the “presumption” increases the defendant’s incentive to shirk. As a result, the defendant more often shirks. Therefore, given a “bad outcome” in the primary activity, the likelihood that this outcome is a result of the defendant’s shirking – and not just bad luck – is now greater than it was before the shift in the presumption.53

When (and only when) there is a bad outcome, the (potential) plaintiff may choose to file suit against the (potential) defendant. The plaintiff chooses whether to file based on his expected litigation payoffs. These expected payoffs are determined in part by his assessment of the chance that the bad outcome has been caused by the defendant’s shirking, and that the defendant thereby has high marginal evidence costs. (The plaintiff does not directly observe the defendant’s evidence costs or primary activity choices.)

What impact does shifting the presumption in favor of defendants have on the number/frequency of filings? There are three effects.

The first effect is the most direct: conditional on a bad outcome, and conditional on the defendant’s marginal evidence cost, the change in the presumption in favor of defendants lowers the plaintiff’s expected litigation payoffs, and thereby acts to reduce filings.

Second, however, conditional on a bad outcome, the defendant is more likely to have shirked, and so more likely to have high marginal evidence costs. Hence, the defendant is more likely to present less evidence. This acts to increase the plaintiff’s expected litigation payoff. Notice that this second effect is a result of litigation-primary activity feedback: litigation design influences primary activity behavior, which in turn influences litigation outcomes. If the second effect dominates the first, then there will be more filed cases per bad outcome. This is the possibility that BTW emphasize.

Third, because the defendant more often shirks, there will be more bad outcomes. This will also act to increase the number of filed cases. This third effect, another kind of litigation-primary activity feedback effect, is not emphasized in BTW and has deep roots in the law and economics literature on litigation.

What is the impact of the shift in the presumption on the frequency of plaintiff victory at trial? Consider a filed case after the change in the presumption. Per the first effect identified above, the presumption has shifted against the plaintiff, which lowers the plaintiff’s chance of victory. However, per the second effect, the defendant is more likely to have high marginal cost of evidence, and so more likely to present less evidence. This increases the chance of plaintiff victory given any level of plaintiff evidence. One possible result, therefore, is that the plaintiff more often wins filed cases – despite the fact that the litigation playing field has been tilted against him.
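
The interplay of the first two effects can be illustrated with a toy calculation (the numbers below are invented for illustration and are not taken from BTW 2000): the presumption shift lowers the plaintiff’s trial payoff against each type of defendant, but it also raises the shirking rate and hence the share of high evidence cost defendants among bad outcomes, and the second effect can dominate.

```python
# Toy illustration of litigation-primary activity feedback (numbers invented,
# not from BTW 2000).
def filing_value(shirk_rate, payoff_vs_shirker, payoff_vs_diligent):
    """Plaintiff's expected trial payoff conditional on observing a bad outcome."""
    p_bad_if_shirk, p_bad_if_diligent = 0.9, 0.5   # assumed accident probabilities
    p_bad = shirk_rate * p_bad_if_shirk + (1 - shirk_rate) * p_bad_if_diligent
    p_shirk_given_bad = shirk_rate * p_bad_if_shirk / p_bad   # Bayes' rule
    return (p_shirk_given_bad * payoff_vs_shirker
            + (1 - p_shirk_given_bad) * payoff_vs_diligent)

# Before the pro-defendant shift: little shirking, higher payoff against each type.
before = filing_value(shirk_rate=0.2, payoff_vs_shirker=100, payoff_vs_diligent=20)
# After the shift: payoff against each type falls, but shirking rises, so bad
# outcomes are more often attributable to high evidence cost (shirking) defendants.
after = filing_value(shirk_rate=0.6, payoff_vs_shirker=90, payoff_vs_diligent=10)

print(round(before, 1), round(after, 1))  # 44.8 68.4: filing becomes more attractive
```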

5.2.2.4. Allocation of Proof Burdens and Strategic Complementarities in Evidentiary Choice

Sanchirico (2008) uses an endogenous cost signaling model of evidence to investigate the question of which party in litigation should be assigned the burden of proof. He finds some justification for the regularity in actual law that the burden of proof is assigned to the opponent of the party whose primary activity incentives are being set by the law in question – as when the plaintiff has the burden of proving the defendant’s negligence and the defendant the burden of proving the plaintiff’s contributory negligence.

Sanchirico (2005) generalizes these findings by considering the question of how optimal litigation design structures strategic complementarities between parties’ evidentiary choices. The payoff structure of the adversarial litigation game is such that one party strategically complements (i.e., mimics her opponent’s advances and retreats), while the other strategically substitutes (i.e., does the opposite of her opponent). Which party plays which role depends on how the litigation transfer function – the mapping from evidence onto liability – is structured. Since the litigation transfer function is a policy choice, the question arises: should litigation be designed to induce the plaintiff to strategically complement and the defendant to substitute, or vice versa? On the basis of an asymmetry derived from the envelope theorem, Sanchirico (2005) argues that the answer depends on whether and whose primary activity incentives are being set by the particular evidentiary contest in question. Within each subsidiary evidentiary contest, the “incentive target” should be induced to complement and her adversary to substitute. In some cases, the defendant will be the incentive target, as when the issue is the defendant’s negligence or contractual breach. In other cases, the plaintiff will be the target, as when the defendant defends by claiming that the plaintiff has been “contributorily negligent.”

The analysis of the optimal allocation of proof burdens in Sanchirico (2008), cited above, is a special case of the general phenomenon described in Sanchirico (2005). As discussed in the former paper, the party who is not assigned the proof burden tends to be the one who strategically complements. Thus, to induce the incentive target to complement, per Sanchirico (2005), is to assign the proof burden to the opponent of the incentive target.54

6. Correlated Private Information

The potential for exploiting multiple agents’ correlated private information has been extensively studied in the mechanism design literature, most notably in the context of auction design. See, for example, Crémer and McLean (1985, 1988); Hermalin and Katz (1991). In its simplest manifestation, the idea is to gather information about one agent’s “type” (i.e., one agent’s private information) from the report of a second agent regarding the second agent’s own type. This is possible if the agents’ types are not statistically independent. Moreover, if the second agent’s type report is used only to set the first agent’s payoffs, then the second agent has no affirmative incentive to misreport her type. The example below – specifically the discussion in Section 6.2.1 – illustrates this basic principle.

Sanchirico (2000) studies correlated private information in the context of legal evidence.55 In that model, correlated private evidence works side by side with endogenous cost signaling (as discussed in Section 5) and a tradeoff between these two sources of information is identified and analyzed. This section begins with a general discussion, then provides an extended numerical/diagrammatic example, and lastly discusses an historical application.

6.1. General Discussion

Extracting useful information from interested parties by means of endogenous cost signaling, as described in Section 5, is a costly endeavor: the very signaling costs that give the evidence meaning are otherwise a deadweight loss to the system. This raises the question: why does the system not garner all necessary information from (relatively) uninterested “parties” so as to reduce these evidence costs?

Employing the correlated type reports of uninterested parties has its own costs. The efficacy of such “third-party information” is endogenous to system design. It is tied to the breadth of circumstances triggering suit and the number of individuals participating in each litigation. Both of these factors go to what the mechanism design literature refers to as the “rank” of the information provided by third parties.56 If, for instance, suit occurs in only one state of the world and there is only one third-party participant who makes a report that may take only one of two values, then the signal received by the court from this individual is of low rank, and will be of limited use in creating an incentive for the defendant to take a middling action – such as reasonable precaution, as opposed to negligence or extreme caution. This phenomenon is illustrated in the numerical example in Section 6.2 below – specifically, Section 6.2.2.

The fact that the efficacy of “third-party information” is tied to the breadth of circumstances triggering suit and the number of individuals participating in each suit implies that the cost of obtaining information in this manner accrues primarily in terms of the “fixed costs” of holding hearings – as opposed to the “variable costs” of the evidence therein “produced,” which may be attributed to endogenous cost signaling. The more often suits are filed and the greater the number of participants per suit, the greater the imputed rent on the space used, the greater the salaries and wages of staff, and – most importantly – the greater the opportunity cost, in terms of lost production and leisure, of participation by the parties.

A fundamental tradeoff thus arises between the fixed costs of holding hearings and the variable cost of the evidence produced therein. Relying on interested parties necessitates costly evidence production. Relying instead on less interested observers necessitates more frequent hearings and/or greater attendance at each, and so greater fixed costs.

6.2. Numerical/Diagrammatic Example

The following numerical example – whose essential structure is depicted in Figure 9.1 – illustrates the basic tradeoff between costly evidence production and third-party information.57 The example describes two liability schedules that implement a middling action choice (“Caution”), the first having more frequent filings and less evidence production than the second. In order to keep the example manageable, let us assume that liability and award are “decoupled” (i.e., parties’ awards and payments are not necessarily zero sum),58 and let us restrict attention to the effect of exogenous increases in the fixed costs of suits.

Figure 9.1
The basic mechanics of correlated private information

There are two agents, an “observer” (he) and a “caretaker” (she).

In the first phase of the model, the caretaker chooses one of three action levels in a tort-like primary activity, “Carelessness,” “Caution,” and “Extreme Care.” All else the same, the caretaker prefers carelessness over caution and caution over extreme care.

The caretaker’s primary activity choice probabilistically determines both the signal observed by the observer (the observer’s “type”) and the caretaker’s (constant marginal) evidence costs (the caretaker’s “type”). The caretaker’s type is either low or high. The signal observed by the observer takes one of three values, “accident,” “neutral,” or “care” (that is, care having been taken by the caretaker).

In the second phase of the model, after observing his signal, the observer decides whether to “file suit.” If he files suit, a hearing occurs (there is no settlement in this model) and the caretaker and observer each incur a fixed cost F in attending this hearing.

During this hearing, the third phase of the model, the observer is given a chance to report his observation and the caretaker to present costly evidence. The principal then metes out monetary rewards and punishments based on the report and the evidence presented. The principal does this according to a function that the principal announced and committed to before the caretaker made her primary activity choice.

The numerical details are as follows: The caretaker’s possible actions, “carelessness,” “caution,” and “extreme care” impose primary activity costs on the caretaker of $60, $100, and $120, respectively. If the caretaker is a low type, her evidence cost for evidence level e is $e. If she is a high type, her evidence costs for e are $2e. Table 9.5 shows the assumed joint and marginal probability distributions over the two agents’ types, as determined by the caretaker’s primary activity action choice. For example, if the caretaker chooses extreme care, then the probability that the observer sees neutral and that the caretaker’s evidence costs are low is 0.20.

6.2.1. Implementing Caution without endogenous cost signaling

This section illustrates the basic mechanic of correlated private information by showing how the middling primary activity action, caution, may be implemented (largely) without costly evidence production. The caretaker’s primary activity incentives – incorporating both primary activity costs and hearing payoffs – are depicted in Figure 9.1. The diagram may be interpreted as follows: consider the “basic” liability structure in which the “principal” (the evidence system designer) charges the caretaker $100 if the observer files suit and then reports “accident,” and charges the caretaker $0 in all other circumstances. Assume for the moment that the observer files only when he sees accident, and then truthfully reports his observation at the hearing. (We shall relax this assumption below.) Then, based on the three marginal probabilities of accident, one for each of the three caretaker actions (0.7, 0.35, and 0.1 in Table 9.5), the caretaker’s expected hearing payoffs from carelessness, caution, and extreme care are –$70 – F, –$35 – F, and –$10 – F, respectively. For example, if the caretaker is careless, there is a 70% chance of an accident, and so a 70% chance that she will have to pay both the $100 charge and the fixed cost of hearing attendance.

Table 9.5  Joint and marginal probability distributions over agents’ types for each possible primary activity choice of the caretaker

                         Carelessness              Caution                   Extreme Care
Observer’s type       Low    High   Marg.      Low    High   Marg.      Low    High   Marg.
  Accident            0      0.7    0.7        0.15   0.2    0.35       0      0.1    0.1
  Neutral             0.1    0.1    0.2        0.15   0.15   0.3        0.2    0.2    0.4
  Care                0.1    0      0.1        0.25   0.1    0.35       0.5    0      0.5
  Marginal            0.2    0.8               0.55   0.45              0.7    0.3

Note: “Low” and “High” refer to the caretaker’s evidence cost type; “Marg.” gives the marginal probability of the observer’s observation given the caretaker’s primary activity choice.

Now, the caretaker is induced to choose caution if the amount (possibly negative) by which the expected hearing payoff from caution exceeds the expected hearing payoff from carelessness (resp. extreme care) is greater than the amount (possibly negative) by which the primary activity cost of caution exceeds the primary activity cost of carelessness (resp. extreme care). I shall refer to the first of these differences as the “hearing advantage” (possibly negative) of caution over carelessness (respectively, extreme care). These two hearing advantages for caution are measured along the x and y axes, respectively, in the figure. In the example we are now considering – given that the caretaker is charged $100 (in addition to having to pay F) if the observer files suit and reports “accident” and $0 otherwise, and given that the observer files suit and reports accident when and only when he in fact sees accident – the hearing advantage of caution over carelessness (resp. extreme care) is $35 (= (–$35 – F) – (–$70 – F)) (resp. –$25). This vector of hearing advantages for caution, ($35, –$25), is represented by the darkest solid arrow in the figure. The other two solid vectors represent the hearing advantages for caution when analogous basic liability schedules are constructed from each of the two other observations, neutral and care. For example, the lightest solid vector represents the hearing advantages of caution given that the caretaker pays $100 if the observer files suit and reports “neutral” and pays zero otherwise, and given that the observer only files suit and reports neutral when he actually sees neutral.

The space of all possible hearing advantages that can be created by conditioning solely on the (assumed-to-be-truthful) report of the observer is the set of all linear combinations of the three solid vectors – which is to say, the span of the three basic liability vectors. For example, if the caretaker is effectively “fined” $50 for neutral observations and $200 for accidents (and the observer files suit in these cases and then reports truthfully), the resulting hearing advantages for caution would correspond to the head-to-tail addition of (a) the vector for neutral shrunk to half its length and (b) the vector for accidents expanded to twice its size.

Now consider the dashed right angle. This angle has as its corner the point (40, –20). This is the caretaker’s additional primary activity cost of caution over carelessness ($100 – $60 = $40) and of caution over extreme care ($100 – $120 = –$20), respectively. The set of points to the northeast of this point – that is, inside the corner – is the set of all points whose coordinates exceed ($40, –$20) respectively.

I now describe a liability structure that implements caution without (substantially) costly evidence production. The hearing advantage vectors for accident and neutral span the space in the figure. It follows that some linear combination of these two vectors enters the dashed right angle and therefore exceeds ($40, –$20) across both coordinates. It is thus possible to structure a liability schedule based entirely on the observer’s report with the property that – assuming the observer is compliant – caution’s hearing advantage over each alternative action choice exceeds caution’s additional primary activity cost with respect to such action choice. If the principal sets the liability schedule in this way, and can otherwise ensure that the observer, first, files suit in these two circumstances only and, second, truthfully reports his observation, the principal will have induced the caretaker to take the middling activity level, caution.
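
The vector arithmetic can be checked mechanically. The sketch below recomputes the three basic hearing advantage vectors from the marginals of Table 9.5 and then verifies that one particular combination, a $220 fine on accident reports and a $360 fine on neutral reports (these specific fine levels are my own illustrative choice), lands inside the dashed right angle:

```python
# Probability of each observation, by primary activity choice (marginals of Table 9.5).
P = {
    "carelessness": {"accident": 0.70, "neutral": 0.20, "care": 0.10},
    "caution":      {"accident": 0.35, "neutral": 0.30, "care": 0.35},
    "extreme care": {"accident": 0.10, "neutral": 0.40, "care": 0.50},
}

def hearing_advantages(fines):
    """Caution's hearing advantage over (carelessness, extreme care) when the
    caretaker is fined fines[obs] whenever the observer files and reports obs.
    The attendance cost F cancels out of these differences."""
    def expected(action):
        return -sum(P[action][obs] * fine for obs, fine in fines.items())
    return (expected("caution") - expected("carelessness"),
            expected("caution") - expected("extreme care"))

# The three "basic" liability schedules (a $100 fine keyed to a single observation):
for obs in ("accident", "neutral", "care"):
    print(obs, hearing_advantages({obs: 100}))
# approximately: accident (35, -25), neutral (-10, 10), care (-25, 15)

# An illustrative combination that enters the corner: $220 on accident, $360 on neutral.
adv = hearing_advantages({"accident": 220, "neutral": 360})
print(adv, adv[0] > 40 and adv[1] > -20)   # approximately (41, -19); True
```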

How then can the principal ensure that the observer behaves in this way? Here is one method. Augment the liability schedule discussed in the preceding paragraph as follows: First, given that the observer has filed suit, let the principal reward the caretaker (the negligible amount of) $1 for presenting (the even more negligible amount of) evidence of e = 0.90. Since the caretaker will present e = 0.90 when and only when her evidence costs are low (i.e., are $e rather than $2e), the caretaker’s presentation will serve as a (negligibly costly) signal of the caretaker’s type. And since the costs and payoffs involved in this extension of the liability schedule are small for the caretaker, the extension will not alter the caretaker’s primary activity incentive to choose caution. Second, let the principal award the observer F + $1 when the caretaker does not present the evidence e = 0.9 (and so is of high type), and F – $0.90 otherwise. How will this payoff structure influence the observer’s decision of whether to file suit? One can calculate that, whatever the observer’s prior beliefs with respect to the caretaker’s primary activity choice, he will file suit when and only when he sees accident or neutral. For example, if the observer is sure that the caretaker has been cautious, but still observes an accident, then his posterior belief that the caretaker is of low type is, from Table 9.5, 0.15/0.35 = 0.43.59 Thus, his expected payoff from the hearing will be 0.57(F + $1 – F) + 0.43(F – $0.90 – F) = $0.183 > 0, and so he will file suit.
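
The filing claim can be checked directly from Table 9.5 (illustrative code, not part of the original analysis). With the awards above, the observer’s net gain from filing is +$1 if the caretaker turns out to be a high type and –$0.90 if she turns out to be a low type, so filing pays exactly when the expected net is positive. Because the expected net under a mixed belief about the caretaker’s action is a weighted average of the expected nets under the three pure beliefs, checking the pure beliefs suffices:

```python
# Joint probabilities P(observation, caretaker type | action), from Table 9.5.
JOINT = {
    "carelessness": {("accident", "low"): 0.00, ("accident", "high"): 0.70,
                     ("neutral", "low"):  0.10, ("neutral", "high"):  0.10,
                     ("care", "low"):     0.10, ("care", "high"):     0.00},
    "caution":      {("accident", "low"): 0.15, ("accident", "high"): 0.20,
                     ("neutral", "low"):  0.15, ("neutral", "high"):  0.15,
                     ("care", "low"):     0.25, ("care", "high"):     0.10},
    "extreme care": {("accident", "low"): 0.00, ("accident", "high"): 0.10,
                     ("neutral", "low"):  0.20, ("neutral", "high"):  0.20,
                     ("care", "low"):     0.50, ("care", "high"):     0.00},
}

def files(action, obs):
    """Filing pays iff +$1 * P(high, obs) - $0.90 * P(low, obs) > 0. Using joint
    rather than conditional probabilities leaves the sign of this test unchanged."""
    return JOINT[action][(obs, "high")] - 0.90 * JOINT[action][(obs, "low")] > 0

for obs in ("accident", "neutral", "care"):
    print(obs, [files(action, obs) for action in JOINT])
# accident and neutral: True under every belief; care: False under every belief
```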

What about the incentive of the observer to truthfully report his type once he is at the hearing? Once at the hearing, the observer’s payoffs do not turn on what he reports. Rather, his payoffs turn solely on whether the caretaker presents e = 0.90. Having no positive incentive to lie, we may assume that the observer reports what he actually sees.60

6.2.2. Less frequent suits using endogenous cost signaling

Alternatively, the principal can implement caution under a liability schedule in which the observer files suit only when he sees an accident – as opposed to when his observation is either accident or neutral. However, this will require that the principal also rely on the caretaker’s presentation of substantially costly evidence.

Figure 9.1 illustrates why the principal cannot rely merely on knowing whether or not the observer has seen “accident.” If the principal cannot distinguish between neutral and care – which the principal cannot if suits only arise when accidents occur – then the liability schedule for the caretaker must be constant (e.g., zero) across these observations. As the reader can confirm, any vector that is a linear combination of care and neutral and that assigns the same coefficient to each of these vectors is a scalar multiple of the accident vector. (This is not a special feature of the numbers chosen for this example, but is a general consequence of the fact that the hearing advantage vectors are constructed from the probabilities of the three observations, which, given any primary activity choice, must add to one.61) It is clear from the diagram that the principal cannot produce a vector that enters the dashed right angle solely by shrinking or expanding the accident vector.

The more general mathematical phenomenon at work here is this: a requirement that some subset of vectors all receive the same coefficient generally reduces the dimension of the space spanned by the superset of vectors. Imposing such a requirement may thereby render a given system of linear inequalities insoluble. Here there are two linear inequalities – that caution’s hearing advantage exceed its additional primary activity costs with respect to both extreme care and carelessness.

Returning to the numerical example, I now explain how caution can be implemented if the caretaker’s liability is made a function of both the observer’s report of accident and the caretaker’s endogenous cost evidence. It is best to begin with the caretaker’s incentives (at the hearing and in the primary activity), assuming compliance by the observer. The explanation of the caretaker’s incentives has two steps.

First, consider the dashed vector emanating from the origin labeled “endogenous cost evidence.” The tip of this vector is at ($15, $15), which represents caution’s hearing advantages when (1) hearings are held only after the observer sees an accident, and (2) at those hearings, the caretaker receives $200 + F – $0.10 if she presents evidence of e = 100, and F otherwise. How so? Once the caretaker is at the accident hearing, she will present e = 100 if and only if she is a low type (i.e., has evidence costs $e, rather than $2e). Consequently, the low evidence cost caretaker ends up with hearing payoffs of (roughly) ($200 + F) – ($100 + F) = $100, and the high cost caretaker with payoffs of F – F = $0. Thus, it is as if the principal rewards the caretaker with $100 in the joint event that the observer sees an accident and the caretaker is of low type. Now, as Table 9.5 shows, if the caretaker is cautious rather than careless, she increases the chance of this joint event by 0.15 (= 0.15 – 0). Thus, the hearing advantage of caution over carelessness is $15. A similar calculation reveals that the hearing advantage of caution over extreme care is also $15.

Second, combine the costly evidence scheme just described with a baseline punishment for the caretaker of $100, just for the fact that the observer has filed suit. Diagrammatically, the principal is adding, head to tail, the vector ($15, $15) to the vector for accident. The fact that the summed vector enters the dashed right angle indicates that caution’s hearing advantages jointly exceed its additional primary activity costs. The caretaker thereby is induced to be cautious.
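
The same kind of check can be run for the accident-only scheme (again, a sketch of the arithmetic in the text, not a definitive implementation):

```python
# From Table 9.5: P(accident and low type | action) and P(accident | action).
P_ACC_LOW = {"carelessness": 0.00, "caution": 0.15, "extreme care": 0.00}
P_ACC     = {"carelessness": 0.70, "caution": 0.35, "extreme care": 0.10}

def hearing_payoff(action):
    """Expected hearing payoff under the accident-only scheme: a $100 baseline
    charge whenever suit is filed, plus a net reward of roughly $100 (the
    $200 + F - $0.10 award, less $100 of evidence cost and F of attendance)
    that only the low evidence cost caretaker finds worth collecting."""
    return -100 * P_ACC[action] + 100 * P_ACC_LOW[action]

adv = (hearing_payoff("caution") - hearing_payoff("carelessness"),
       hearing_payoff("caution") - hearing_payoff("extreme care"))
print(adv)                           # approximately (50, -10)
print(adv[0] > 40 and adv[1] > -20)  # True: the summed vector enters the corner
```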

It remains only to ensure that the observer files only when he sees an accident and that he then reports this observation truthfully. The reader can confirm that this can be accomplished by rewarding the observer 2F – $1 if the caretaker fails to present evidence e = 100 (is of high type), and $0 if the caretaker does make the presentation. As can be gleaned from Table 9.5, whatever the observer believes regarding the caretaker’s primary activity choice, if he sees accident, he believes that the caretaker is strictly more likely to be high than to be low, and if he sees anything else, he believes that the caretaker is no more likely to be high than to be low. Given that the observer incurs attendance costs of F if he files, and given that his reward is slightly less than 2F when the caretaker is of high type and zero otherwise, the observer will file suit only if he believes that the chance that the caretaker is of high type is strictly greater than the chance that the caretaker is of low type. Thus, the observer will only file when he observes accident.

All told, this second liability structure is as follows. If the observer files suit against the caretaker, and the caretaker fails to produce exonerating evidence (e = 100), the observer receives 2F – $1 and the caretaker pays $100 – F. If the caretaker does produce the evidence, the observer gets nothing, whereas the caretaker pays nothing and is reimbursed $100 and F, for her evidence and attendance costs respectively. The caretaker, looking forward to the prospect of suit, including the possibility that she will be able to present evidence to save her from liability, decides that it is worthwhile to choose middling caution rather than carelessness or extreme care.

6.2.3. Comparison

Let us now compare the system costs of the two implementations. In the first implementation, the observer files suit in two circumstances, accident and neutral, the probabilities of which are determined by the fact that the caretaker chooses caution. Such probabilities sum to 0.65. Evidence production costs are de minimis in this first implementation. The expected social cost of this two-hearing implementation of caution is thus roughly (0.65)(2F) = 1.3F. Such cost consists solely of the expected fixed cost of hearings. In the second implementation, the agents appear before the court only when there is an accident. Given the caretaker’s cautious behavior, this happens with probability 0.35. Therefore the expected fixed cost of hearings is (0.35)(2F) = 0.7F, which is substantially lower than in the first scheme. With probability 0.15, however, the caretaker will find herself in court (the observer having seen accident) and desirous of producing $100 of evidence (the caretaker being of low type). Expected evidence production costs – the expected variable costs of the hearing – are thus $15. Total costs for this second method then are 0.7F + $15.

Which is larger – 1.3F or 0.7F + $15? The two are equal at F = $25. If the fixed costs of attending hearings are small, the least costly implementation will be the first, the one with hearings in two contingencies. If fixed costs are large, however, then the fact that the first implementation has more hearings will be decisive in making it less efficient. The best alternative will be to suffer the caretaker’s evidence production costs in exchange for a reduction in the likelihood that a hearing will be held.
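
The comparison can be written out directly (a short sketch of the arithmetic just described):

```python
def cost_first(F):    # hearings on accident or neutral; negligible evidence costs
    return 0.65 * 2 * F
def cost_second(F):   # hearings on accident only, plus $15 of expected evidence costs
    return 0.35 * 2 * F + 15

for F in (10, 25, 50):
    print(F, cost_first(F), cost_second(F))
# F = 10: 13.0 vs 22.0 (first scheme cheaper); F = 25: 32.5 each;
# F = 50: 65.0 vs 50.0 (second scheme cheaper)
```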

6.3. Application: Historical Evolution of the Jury’s Role

The foregoing analysis suggests that increases in the opportunity cost of process, due to increases in labor productivity, were one factor in the gradual shift through English legal history from a system relying mostly on relatively disinterested observers to one relying mostly on costly evidence production by the parties themselves.

Crucial to this historical comparative static is the preliminary theoretical point that increases in both fixed and variable cost parameters have an asymmetric effect on system costs. Increases in the variable costs of producing evidence can be mitigated, in part or whole, by relying on different, less costly evidence. Suppose, for example, that the caretaker’s cost of evidence in the numerical example above doubles, so that evidence of e units costs $2e dollars, rather than $e dollars, when the caretaker is a low type and $4e, rather than $2e, when the caretaker is a high type. Then the evidence costs and litigation payoff differences that were formerly generated by any level of evidence e are now generated by the level of evidence ½e. Thus, if we now count as two units of evidence what were formerly counted as one, the example goes through in the same way. The same dynamic does not work with respect to the fixed costs of hearings. If the state tries to compensate for the increase in appearance costs by halving the frequency of trials or the number of individuals attending each, it effects a real reduction in the information content of third-party information.

From its origins in the 12th century up until perhaps the beginning of the 15th century, the English jury operated as a bank of witness/investigators: 12 “freemen” from the neighborhood in which the case arose, called upon either to employ their pre-existing knowledge of the matter at hand or to conduct their own investigation. By the 16th century, the jury had come to resemble more the blank-slate panel of modern day. During a late phase of the Industrial Revolution, an act of Parliament allowed parties to waive jury process and by 1900 juries were used in only half of the cases before the High Court. During World War I, the jury was abolished in civil cases due to lack of juror supply – the “temporary” change lasted well beyond World War I. A 1933 law allowed jury trials only by leave of the court; in practice, leave is almost never granted.

At least until 1750 the parties themselves, along with other “interested” persons, were prohibited from testifying or even from presenting documents of their own creation, however long ago they were drafted. Toward the end of the Industrial Revolution, however, amidst a flurry of legal reform spearheaded by Jeremy Bentham and others, restraints on interested parties’ ability to testify in older common law procedure were lifted by acts of Parliament. In the modern era, parties’ own presentation of costly evidence – including sponsored eyewitnesses, expert witnesses, plain testimony at risk of perjury, and media production – constitutes the main source of information for the English fact finder.

Thus, one may discern in English legal history a shift from reliance on the reports of disinterested third-party observers to reliance on evidence sponsored by the very parties who stood to gain or lose from the court’s decision. The transformation is marked by punctuated changes in and around the time of the Industrial Revolution. The asymmetric impact of rising process costs may help to explain these changes. The productive activity that is sacrificed by collecting and preparing evidence and participating in court hearings is an important component of the social cost of legal process. Arguably, these opportunity costs increased, in broad trend, over the course of English legal history following the 13th century, with marked acceleration during the Industrial Revolution, concomitant with increases in labor productivity. The model discussed in this section predicts that an across-the-board increase in the opportunity costs of process would increase the relative cost-effectiveness of endogenous cost signaling as opposed to correlated private information. Such an increase in underlying cost-effectiveness may have been one force (among many) acting to shape actual system design.

7. Conclusion

For scholarly attempts to develop a formal, systematic, mathematically based account of legal evidence, the challenge has not been a lack of available tools. Probability theory, game theory, information economics, and mechanism design offer a wealth of serviceable principles and techniques. The challenge rather has been in determining which of these elements to employ, and in what combination. The pure probabilistic deduction approach to legal evidence makes extensive use of the algebra of conditional probabilities, but almost no use of strategic reasoning. Most of the literature that accounts for strategic considerations – the literature that develops and applies the omission model – seems path-dependently encumbered by assumptions that rule out defining features of legal evidence. Endogenous cost signaling and correlated private information – approaches that combine moral hazard models, asymmetric information models, and probability theory – hold some potential (in the opinion of this author). But such approaches, shadowed as they are by the bowers of pure probabilistic deduction and the omission model, have only begun to take root.

See, for example, the chapters on Settlement, Fee Shifting, and Negative-Expected-Value Suits.

Throughout this entry, all actors are assumed to be risk neutral unless otherwise noted.

This is the condition when the rule is that both parties must pay their own legal costs.

In Rubinfeld and Sappington (1987), the productivity of (hidden) trial effort by the defendant depends on whether she is (exogenously) innocent or guilty. In Bernardo, Talley, and Welch (2000), which is discussed in Section 5.2.2.3 below, productivity of trial effort depends on an earlier chosen primary activity action, similar to Sanchirico (1995, 2000, 2001b).

This approach is also discussed in Posner (1999).

Except as otherwise noted, I shall assume that all events encountered in the following analysis have non-zero probability.

The odds formulation of Bayes’ rule may be derived from the definition of conditional probability (as discussed in the text in this section) as follows:

P(A|B) = P(AB)/P(B), P(B|A) = P(AB)/P(A), P(¬A|B) = P(¬AB)/P(B), and P(B|¬A) = P(¬AB)/P(¬A). Hence P(A|B)P(B) = P(B|A)P(A) and P(¬A|B)P(B) = P(B|¬A)P(¬A). Dividing the first of these equalities by the second yields P(A|B)/P(¬A|B) = [P(B|A)/P(B|¬A)] × [P(A)/P(¬A)],
where ¬A denotes the event that A does not occur, the logical complement of A.

The following formula is easily verified by replacing each conditional probability with its definition. It is then seen that both sides of the equation reduce to

P(GEF)/P(IEF).

In interpreting the formula as an iterated application of Bayes’ rule, note that conditioning on an event E and then on a second event F is the same as conditioning once on the event which is the intersection of E and F. To wit, the initial act of conditioning on event E generates a new probability measure, P(·|E), that may be used to form probabilities of G (or I) conditional on F. That is, P(G|F|E) ≡ P(GF|E)/P(F|E). We then have P(G|F|E)=P(GF|E)/P(F|E)=(P(GEF)/P(E))/(P(EF)/P(E))=P(GEF)/P(EF)=P(G|EF). By similar reasoning, P(F|GE)=P(F|G|E). It follows that the formula may be written as

P(G|F|E)/P(I|F|E) = [P(F|G|E)/P(F|I|E)] × [P(G|E)/P(I|E)].

Base rate neglect has its roots in Kahneman and Tversky (1973) and Nisbett et al. (1976). The literature applying these ideas to legal decision making is quite large. For a recent discussion, see Guthrie, Rachlinski, and Wistrich (2007).

The validity of this formula may be confirmed by substituting from the definition of conditional probability. Specifically, P(E|G)P(G)+P(E|I)P(I)=P(EG)+P(EI), which equals P(E) because events G and I are mutually exclusive and mutually exhaustive.

More precisely, the fact finder would not be able to determine the quantity (1/P(E|I))(P(G)/(1-P(G))), which is multiplied by P(E|G) to determine P(G|E).

The argument originally appeared in Lempert and Saltzburg (1982).

The following analysis is taken from Sanchirico (2001a).

See Sanchirico (2001a) for a fuller analysis of trial selection bias.

Fischhoff (1982). For an experimental design with legal attributes see e.g., Kamin and Rachlinski (1995).

The analysis in this section is a formalization and generalization of Sanchirico (2004c).

Here I am assuming that the fact finder has point beliefs regarding the probability measure that describes the defendant’s prior beliefs over accident occurrence and signal value. The analysis can be generalized so that the fact finder has probabilistic beliefs regarding the defendant’s prior beliefs.

The example generalizes to signals with more than one possible value.

Savage (1954).

Allais (1953), Ellsberg (1961).

An early and often cited critique appears in Tribe (1971).

Recent discussions of the conjunction problem appear in Levmore (2001), Stein (2001), and Allen and Jehl (2003).

Kaye (1979) defends conventional probabilistic analysis from this troubling hypothetical. Allen (1986) comments on this defense.

Indeed the problem is one that spans both the application of conventional probabilistic analysis, as described in Lempert (1977), as well as many alternative systems for representing and manipulating the phenomenon of likelihood, such as that laid out in Cohen (1977) and often cited in evidence scholarship.

One can see evidence scholarship starting to bump up against this fact in analyzing the gatecrasher paradox. Kaye, in particular, suggests that the plaintiff’s inability to provide anything other than the “naked statistical evidence” of seats filled versus seats purchased should itself be taken as evidence. See Kaye (1979).

For a general review of these aspects of game theory, see, for example, Kreps (1990).

Other early contributions taking this approach include Sobel (1985) and Shavell (1989). More recent applications and extensions are discussed in Section 4.3.

Within economics scholarship this result is referred to as “unraveling” and is usually jointly attributed to Grossman (1981) and Milgrom (1981). An early critical analysis appears in Farrell (1986). The separate, parallel (and older) pedigree of this idea within evidence law and scholarship is discussed below.

“Spoliation” is a general term referring to evidentiary misconduct. But it is perhaps most often used to describe a party’s failure to produce evidence when so required – either because the evidence has been destroyed or is being withheld. For a discussion of “spoliation” doctrine, see, for example, Sanchirico (2004b).

See, for example, Landsman (1984) and sources cited therein.

Herring v. New York, 422 U.S. 853, 862 (1975).

The Anglo-American system of fact finding relies for evidence on the adversarial efforts of the parties. The continental European system, by contrast, assigns the judge a more active role in investigating the case and questioning witnesses. The latter is often referred to as “inquisitorial procedure.” (On this issue, see Chapter 1 in this volume on Adversarial versus Inquisitorial Justice.)

The variance of each system depends on the size of each sample. The authors compare the same amount of sampling: the inquisitor samples as many times as the two adversaries combined.

In so endogenizing evidence production, Bull and Watson (2004) follows Sanchirico (1995, 2000, 2001b) discussed below. Bull and Watson (2007) is a technical companion to Bull and Watson (2004).

Okuno-Fujiwara, Postlewaite, and Suzumura (1990) is another important contribution in this area. These authors analyze an augmented asymmetric information model in which agents can announce their private information beforehand. They provide conditions for the full revelation of agents’ private information when some of the agents’ announcements are exogenously “certifiable.”

This is to ensure the existence of pure strategy equilibria. See LS note 13 at 382.

LS actually cast their analysis in terms of feasible presentation interpretation, discussed above in Section 4.2. More on this below.

See LS’ proposition 7, Corollary 3, and the discussion surrounding these results, all at 389–90.

LS also prove that weak refutability is necessary for the existence of the particular type of truth revealing rule discussed below. See 389–90.

See LS, p. 376.

For another application of the omission approach – in this case to expert witnesses – see Yee (2008). See also Deneckere and Severinov (2003).

See Sanchirico (2004b: 1247–86) for a description of the laws governing evidentiary misconduct in US federal civil cases.

See Sanchirico (2004b), which surveys in Part I the empirical evidence on evidentiary misconduct.

The relative efficacy of perjury and obstruction laws is studied in Sanchirico (2006), which emphasizes that the cost of (recursive) detection avoidance is a drawback of relying too heavily on such policies. Sanchirico (2004b) studies the optimal structure of such laws in a primary activity incentive-setting context and finds that the middling enforcement intensity that is seen in practice may well be justified. Cooter and Emons (2003, 2004) propose an alternative to perjury and obstruction laws: parties post bonds that they later forfeit if their testimony turns out to be false.

A fair portion of the economics analysis of procedure also grounds itself in primary activity incentives. However, this literature does not consider how claims are proven. Instead, it assumes exogenous probabilities (or probability functions) for various trial outcomes and focuses instead on filing, settlement, and fee shifting provisions. See Section 2 above.

In some quarters, it is regarded as important to distinguish “signaling” from “screening.” By “signaling” I mean the phenomenon whereby different types find it in their interest to take different observable actions, without regard to whether such phenomenon arises “naturally” or is in response to a menu of choices laid out ahead of time by a principal.

Rubinfeld and Sappington (1987) appear to model legal evidence as differential cost signaling. A closer look, however, reveals that their model is an exogenous probability function model, as discussed above in Section 2.2.

Compare the preceding two points (from Sanchirico 1995, 2000 and 2001b) to the later analysis in Deneckere and Severinov (2003), which claims to achieve zero cost implementability with small cost differentials. Deneckere and Severinov assume the existence of a zero cost message across which separation is effected. Moreover, the cost differentials they refer to are for per period costs in a model allowing any number of periods.

This phenomenon should be distinguished from other differences between primary activity incentive setting and truth finding that have been identified in the literature.

First, Schrag and Scotchmer (1994) show (Propositions 1 and 2) that the optimal threshold quantum of evidence for guilt is generally lower when the object is taken to be error minimization, rather than deterrence maximization. Assuming false convictions and false acquittals are equally weighted, trial error is proportional to P(I|C)P(C)+P(G|A)P(A), where I is true innocence, G is true guilt, C is conviction and A is acquittal. Deterrence, on the other hand, is proportional to the difference between the probability of conviction given true guilt and the probability of conviction given true innocence: P(C|G)-P(C|I). Using P(C|G)=1-P(A|G) and the definition of conditional probability (see Section 3.1 above), this difference may be written as

1 – (P(I|C)P(C)/P(I) + P(G|A)P(A)/P(G)).

Now, choosing x to maximize 1 – f(x) is the same as choosing x to minimize f(x). Thus, maximizing the expression set off immediately above is different from minimizing P(I|C)P(C)+P(G|A)P(A) if (and only if) P(I) ≠ P(G). In particular, if P(G)<P(I), as in Schrag and Scotchmer (1994), then deterrence maximization puts more weight than error minimization on reducing P(G|A)P(A), the incidence of false acquittals.

Second, Kaplow and Shavell (1996) assert that accuracy in the assessment of damages has zero impact on incentives. The specific claim is that in a world in which there is perfect information about whether or not an accident has occurred, and precautionary choice is binary (reasonable care or not), there is no incentive difference between charging the injurer with the harm that she expected to cause and the harm that she actually did cause, even though the latter assessment of damages is in a sense “more accurate.” The more general point, implicit in Kaplow and Shavell (1996), is that, when implementing a hidden action with a noisy signal, some of the information in the signal may be superfluous. This will occur when the dimensionality of the signal space exceeds the dimensionality of alternative actions. A similar issue arises in Section 6 below.

Third, Sanchirico’s (2001a) analysis of character evidence hinges on the fact that evidence can be informative of conduct and yet not be affected by such conduct. “Trace” evidence, such as fingerprints and eyewitness recollections, is both informative of the act and a byproduct of conduct. In contrast, “predictive evidence” (e.g., character evidence), though it may rationally change the fact finder’s assessment of the likelihood that the defendant acted in a particular way, is the same whether or not the defendant actually did act in that way. If the object is to guess whether the defendant engaged in the conduct, then both types of evidence are useful. But if the object is to affect whether the defendant engages in the conduct, only trace evidence of that conduct is useful. Only trace evidence changes with conduct. Thus, keying penalties and rewards to the production of trace evidence is the only way to make penalties and rewards change with conduct. And making penalties and rewards change with conduct is the only way to create incentives.

Bull (2008b) studies implementability in an extension of this model that allows both sides to present evidence. That paper emphasizes that costly evidence can facilitate Nash implementation by enabling the principal to punish one side without rewarding the other, even when the transfer function is assumed to be zero-sum.

In BTW (2000), the system designer is effectively constrained to choose a liability-per-evidence schedule from within the parameterized class of contest success functions that BTW consider.

As noted, BTW employ a kind of contest success function. See Section 2.2.

The point in this sentence follows from the odds formulation of Bayes’ rule, as presented in Section 3.1 above. Let G be the event “shirking,” and let E be the event “bad outcome.”

Bull (2008a) extends the analysis of endogenous cost evidence models by considering dynamic evidence production games with randomization and public messages.

As discussed below, an important and novel wrinkle in this application of correlated types is that the rank of the opponents’ joint signal is endogenous to the mechanism, since it depends on the collection of signals that inspire suit and the number of individuals involved in each suit.

See, e.g., Hermalin and Katz (1991). The term rank is used because the concept is directly related to the rank (per linear algebra) of a key matrix, the rows of which correspond to possible action choices and the columns of which correspond to states.

Unique to the model described in the text, the rank of the information provided by third parties is endogenous to the mechanism because it depends on the breadth of circumstances triggering suit and the number of parties involved in each, both of which are determined by system design.

This example is taken from Sanchirico (2000).

Sanchirico (2000) discusses the impact of imposing the additional constraint that parties’ transfers be zero sum.

See Section 3.1 for a discussion of Bayes’ rule.

The observer can be given a strict incentive to tell the truth once at the hearing – again at negligible cost – if the model is extended to allow for the observer’s presentation of (negligibly) costly evidence whose cost depends on the observer’s type. This would be analogous to the payoff structure surrounding the caretaker’s presentation of evidence e = 0.90, as discussed in the text.

Given any primary activity choice a, the probability of accident plus the probability of neutral plus the probability of care must equal 1. That is pA(a) + pN(a) + pC(a) = 1. Therefore, given any two primary activity choices, a and b, (pA(a) − pA(b)) + (pN(a) − pN(b)) + (pC(a) − pC(b)) = 0. It follows that the three hearing advantage vectors must sum to (vector) zero. Therefore, each hearing advantage vector is a scalar multiple (−1) of the sum of the other two.

References

Cases


Herring v. New York, 422 U.S. 853 (1975).