Statistical reports play an important role in management decisions, so it pays to hone your skills for spotting inaccuracies and distortions. Some flaws are so subtle that even professional statisticians debate their presence, but many errors are easy enough to detect if you just look for them.
This article describes some outright errors, along with some suspect practices that often lead to error. After examining a few graph-drawing fiascos, it turns to ranking distortions and other problems that can result from excess precision, ill-chosen scales, or a disconcerting quirk called Simpson's Paradox. Finally, it discusses a common survey methodology error that leads not only to slanted results but also to the reporting of nonexistent trends.
Egregious Graphics
On close inspection, you'll find that many report graphics are inaccurate or substantively inconsistent: with their supporting text, with their captions, with other graphics, or even internally. An important example of inconsistency between graphs is the use of misleadingly different scales for graphs that readers may want to compare.
As in proofreading text, don't assume that the first error you find is the only one; they often cluster. For example, the following design and implementation problems all appeared in a single graph in a published law firm survey:
Speaking of 3D: Artistic effects in graphs are useful if they fairly guide the reader's attention or clearly communicate some dimension of the data. For example, it often makes sense to graphically highlight the featured part of a graph. But other artistic effects often degrade readability or are simply distracting clutter. An especially regrettable triumph of prettiness over precision is the flattening of circular pie charts into ellipses, a deformation that makes these useful graphs impossible to read accurately.
While flagrant graphic errors are abundant, some common but serious mistakes are less obvious. For example, many readers know to expect exaggerated slopes if a graph's vertical (Y) axis starts at a value above zero; but X-axis errors can also cause slope distortions. Here are two examples:
The preceding examples introduce only a few of the many types of errors to be found in common business-graphic formats. For a more comprehensive treatment, including many advanced types of statistical graphics, see the book by Edward Tufte listed at the end of this article.
Phony Precision
Statistical reports often attempt to impress readers by offering precise-looking results, sometimes even to several decimal places. Excess precision, however, may cast doubt on the mathematical competence of a report preparer, since a calculation can't generate more “significant digits” than its least accurate input.
Moreover, even legitimately precise results should still be rounded to a level that makes data salient. Readers should complain to report preparers who bury important data in a flood of numbers. As Justice Holmes remarked regarding contractual verbiage: “There is a kind of precision that obscures.”
Ratings based on calculated results are subject to the same precision limits. In reviewing an annual law firm ranking, it makes little sense to speculate on why firms with closely similar scores have changed places. It makes considerably more sense to discuss why firms moved between quintiles of the ranking, as Aric Press does in his article on the Am Law 100 survey.
Pseudo-precision of calculated results is also especially questionable when the input numbers are symbolic rather than actual measurements. One major technology consultancy, for example, used to (and may still) prognosticate technology developments in terms of numbers that the company's high-paying subscribers were encouraged to respect as actual probabilities. By “actual” probabilities I mean that the numbers could be combined mathematically like real probabilities. For example, if development A has a probability of occurrence of 0.9 and development B has a probability of 0.7, then, if the events are independent, the probability of both developments occurring is 0.9 x 0.7 = 0.63 (which, to avoid reporting insignificant digits, should be rounded to 0.6).
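Here is a minimal sketch of that arithmetic, using the hypothetical 0.9 and 0.7 figures from the example above (not the consultancy's actual numbers):

```python
# Combine the probabilities of two independent events, then round the
# result to one decimal place to match the precision of the inputs.

def combined_probability(p_a: float, p_b: float) -> float:
    """Probability that two independent events both occur."""
    return p_a * p_b

p_a, p_b = 0.9, 0.7                      # single-significant-digit inputs
joint = combined_probability(p_a, p_b)   # 0.63
print(round(joint, 1))                   # 0.6 -- reporting 0.63 implies unearned precision
```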
These “probability” numbers, however, were not based on physical measurements, nor even generated by sophisticated models fed multiple streams of data; they were tokens representing the guesses of the company's (admittedly well-educated) analysts. To translate his or her gut feeling into a number, each analyst referred to the company's standard chart of numerical equivalents. As of some years ago, this translation chart had low and high values worded approximately as follows:
Less colorful but similarly vague criteria translated to intermediate levels of “probability.”
Even real probabilities suffer a reduction in precision when combined. Subscribers seeking a basis for strategic decisions should not have been led to believe these oracular tokens could be combined with sufficient precision to correctly rank-order the likelihood of complex future developments.
Weightings and Ratings
Many surveys leave questions unweighted (in effect, they weight each question equally), even when some of the questions are many times more important than others. That's a serious error, and even some well-regarded surveys are guilty of it. Management reports make a similar error when they count rather than weigh factors that go into a recommendation.
Conversely, some surveys and studies inappropriately use weights in ways that seriously distort results. A simple but intrinsic difficulty is that the ratio of rating results is highly dependent on the scale being used. For example, the highest score in a 1-to-3 scale is three times the lowest score, whereas the highest score in a 1-to-10 scale is ten times the lowest score. Many decision aids are so severely affected by this problem that users should consider doing “sensitivity tests”: running equivalent inputs through different scales to see if the results change.
One reason survey and study designers use scales is to make calculated results more distinctive by widening their spread. As a simple example, imagine a survey designed to characterize a firm's experience in using information technology. One question in such a survey might seek to quantify the attitude of firm members regarding the performance of their IT security team.
Let's say 30 people in your firm take this survey, with the finding that 20 of the respondents rate the team as too lax, 5 rate it as active but barely effective, and 5 rate it as vigilant and successful.
If all respondents are regarded as having opinions of equal value on this matter (an admirably democratic but possibly unwise assumption), then the survey delivers a clear message: only one-sixth of the survey respondents are impressed with the security team, while fully two-thirds think the team is too lax.
To give the overall survey results a wider spread, however, many survey designers would be inclined to weight the ratings either by opinion source or by the answer selected. As an example of the latter, consider these two weighting alternatives for the same survey question:
SURVEY A
Overall, how would you rate your firm's IT security team?
1 = Too lax;
2 = Active but barely effective;
3 = Vigilant and successful.
SURVEY B
Overall, how would you rate your firm's IT security team?
1 = Too lax;
5 = Active but barely effective;
10 = Vigilant and successful.
Given the answers of the 30 respondents summarized above, the weighted scores will be as shown in Table 1, below.
Table 1
[IMGCAP(1)]
To emphasize how the scale weights have affected the ordering of results in Table 1, Table 2, below, ranks those results.
Table 2
[IMGCAP(2)]
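For readers who want to reproduce the arithmetic behind Tables 1 and 2, here is a minimal sketch. It assumes the response counts implied above (20 of the 30 respondents chose “Too lax,” 5 chose “Active but barely effective,” and 5 chose “Vigilant and successful”), and running the same answers through both scales is exactly the kind of “sensitivity test” described earlier:

```python
# Score the same 30 answers under the Survey A and Survey B weightings
# and rank the answer categories under each scale.

responses = {  # counts assumed from the percentages given in the text
    "Too lax": 20,
    "Active but barely effective": 5,
    "Vigilant and successful": 5,
}

scales = {
    "Survey A (1, 2, 3)":  {"Too lax": 1, "Active but barely effective": 2, "Vigilant and successful": 3},
    "Survey B (1, 5, 10)": {"Too lax": 1, "Active but barely effective": 5, "Vigilant and successful": 10},
}

for name, weights in scales.items():
    # Weighted points contributed by each answer category.
    points = {answer: count * weights[answer] for answer, count in responses.items()}
    ranking = sorted(points, key=points.get, reverse=True)
    print(name, points, "ranking:", ranking)

# Under Survey A, "Too lax" still tops the weighted totals; under Survey B,
# "Vigilant and successful" jumps to first place -- the same answers tell a
# different story purely because of the scale.
```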
In at least some cases a “monotonic” scale could be designed to preserve the order of unweighted results; but, as already suggested, unweighted results are often not ideal either. Therefore:
Simpson's Paradox
An even more unsettling source of turbulence in statistical findings is called Simpson's Paradox. First published in 1951 by E.H. Simpson, the paradox describes how statistics for sample groups considered separately may support a conclusion directly opposite the inference one would draw from the aggregation of those same statistics.
Here's a simple illustration I've adapted from one in Dictionary of Theories (Bothamley, J., Gale Research International, 1993).
In an attempt to reduce the amount of properly billable time that its lawyers and paralegals fail to record, a firm experiments with two new methods. Both methods are tried by a small group of 10 timekeepers and a larger group of 40 timekeepers, with the following results:
When we combine the data from all 50 test subjects, however, we find that method A helped 1 + 14 = 15 of the timekeepers (30%), whereas method B helped only 4 + 5 = 9 of them (18%). Considered in the aggregate, therefore, the test data favor method A.
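A short sketch shows how such a reversal can arise. The subgroup breakdown below is hypothetical: it is one allocation consistent with the combined totals quoted above (15 of 50 helped by method A, 9 of 50 by method B), chosen so that method B has the better success rate within each subgroup even though method A has the better rate overall:

```python
# Simpson's Paradox: method B wins within each subgroup, yet method A
# wins in the aggregate. The lawyer/paralegal split is hypothetical,
# chosen only to be consistent with the combined totals in the text
# (A: 15 of 50 helped, B: 9 of 50 helped).

results = {
    "Method A": {"lawyers": (1, 10), "paralegals": (14, 40)},  # (helped, tried)
    "Method B": {"lawyers": (5, 40), "paralegals": (4, 10)},
}

for method, groups in results.items():
    helped = sum(h for h, _ in groups.values())
    tried = sum(n for _, n in groups.values())
    by_group = {g: f"{h}/{n} = {h / n:.1%}" for g, (h, n) in groups.items()}
    print(method, by_group, f"overall: {helped}/{tried} = {helped / tried:.1%}")

# Lawyers:    B 12.5% beats A 10.0%
# Paralegals: B 40.0% beats A 35.0%
# Overall:    A 30.0% beats B 18.0%
```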
A variant rendition of Simpson's Paradox highlights the added effect of a “confounding variable,” a hidden difference between the different-size sample groups. For a nicely worked-out example, see www.intuitor.com/statistics/SimpsonsParadox.html.
Since many studies (and especially “meta studies”) average results from different samples, often of different sizes, instances of Simpson's Paradox may be much more prevalent than is generally recognized.
Self-Selection and Spurious Trends
I've saved for last a simple but under-reported concern about the validity of survey findings. Far too many surveys are excessively or even entirely based on a “self-selected” pool, i.e., just those survey recipients who choose to respond to a mailing or other invitation.
A commonly noted problem with self-selection is that those who choose to respond may differ in important respects from the overall population being studied. In many cases, self-selection even casts doubt on the veracity of respondents.
What is not commonly noted, however, is a hidden consequence of performing a series of self-selected surveys. This consequence is the false identification of trends.
For example, the commentary of one nationally published computer usage survey proclaimed a substantially growing trend toward computer leasing. Various possible reasons were offered for the trend. In fact there was no trend. The self-selected set of respondents had simply shifted from one year to the next, and this allowed a confounding variable to come into play. The previous year's sample had an unusually high proportion of respondents using Unisys computers that were not offered for lease. The new survey's self-selected sample had a more representative proportion of IBM computer users, many of whom traditionally leased their machines.
I strongly suspect that many of the trends reported in the legal and general business press result from unexamined base shifting, just as in the preceding example. Survey reporters should be careful not to foist spurious trend findings on law firm managers, who have enough work to do keeping up with real trends.
To reduce the chance of being misled by false trends, always check the methodology section of survey reports. Look specifically to see what control-group surveys and other quality-assurance measures were used to check for self-selection bias. If a statistical report does not provide easy access to a detailed and comprehensible methodology section, the report's credibility should be heavily discounted, as should its price if you're paying for it.
Concluding Thoughts
Statistical know-how, in addition to its importance for business decisions, is sometimes vital in providing legal services. It's therefore desirable for several members of each firm to have advanced statistical skills, a goal to keep in mind in hiring and training decisions.
Firms without in-house statistical skills should consider keeping a professional statistician on retainer for consultation as needed. Ideally, while the statistician reviews the mathematics of a survey or study report, collaborating reviewers from the firm would apply their familiarity with the data to spot non-standardized survey terms, likely selective reporting, and other sources of data quality problems.
Finally, it makes sense to be assertive with internal and external report suppliers whenever you spot a potential statistical error. The ensuing discussion will be mutually educational, and providers who know your concerns are more likely to attend to them in the future.
For Further Reading
A highly recommended book on statistics for nontechnical readers is Statistics by David Freedman, Robert Pisani and Roger Purves (3rd ed., W.W. Norton, 1997).
For an accessible, entertaining, authoritative discussion of graphic distortions and related problems, read Edward Tufte's universally acclaimed book, The Visual Display of Quantitative Information (Graphics Press, 1983).