Andrew P. Bray

*Reed College*

**The Oxford Anthology of Statistics in Sports: Volume 1: 2000–2004**. James J. Cochran, Jay Bennett, and Jim Albert, eds. Oxford, United Kingdom: Oxford University Press, 2017, xiii+551 pp., $45.95(P), ISBN: 978–0–19–872492–6.

The period from 2000 to 2004 covered by *The Oxford Anthology of Statistics in Sports* was transformative for the field of sports analytics. A prior volume that collected articles published up until 1999, titled *Anthology of Statistics in Sports*, compiled by the same editors, and published by SIAM (Albert, Bennett, and Cochran 2005), reveals an underdeveloped field. While many of the articles are excellent and written by well-known statisticians, they often explore curiosities of interest to academic statisticians who like sports. With a few exceptions, these articles were not driving the infusion of sports analytics within professional teams. Future volumes, which will include articles published after the founding of the *Journal of Quantitative Analysis in Sports (JQAS)* in 2005, will exhibit a developed field with articles of methodological complexity that tackle questions important to sports executives. What makes the *Oxford* period so interesting is that it is a witness to this transformation.

Consider the rapid changes that took place during the *Oxford* period. Analytics in baseball, which far outpaced its counterparts in other sports, became professionalized. The public writing of Bill James had largely sustained the field since the publication of his first baseball abstract in 1977, but James published his last historical abstract in 2001 (James 2001). By the end of that year, Billy Beane and Paul DePodesta, largely under the influence of James’s ideas, rebuilt the Oakland A’s using analytical principles. That saga was popularized by the 2003 bestseller *Moneyball*, which in turn led several Major League teams to hire statistically minded general managers (e.g., after Beane reneged on their offer, the Boston Red Sox hired Theo Epstein, who promptly hired James) and/or statistical analysts (including this reviewer; I arrived at Shea Stadium in January 2004). The professionalization of sports analytics affected not only industry, but academia as well, leading to the genesis of *JQAS* in 2005.

During this period, when the influence of sports analytics in industry and the breadth and depth of sports analytics within academia were expanding rapidly, what were people writing about? What were the key ideas under investigation? What were the methods being employed? How did our research inform our understanding of the games we play, watch, and study? The *Oxford* anthology gives us qualified answers to these questions.

Compared with the previous anthology, one major step forward for the *Oxford* collection is the broad internationalization of the sports under investigation. Whereas Albert, Bennett, and Cochran (2005) contained only articles published in American Statistical Association (ASA) journals, which tend to focus on the major American sports of baseball, basketball, (American) football, and ice hockey, the *Oxford* collection also includes in its scope journals sponsored by the Royal Statistical Society (RSS), and accordingly has entire chapters devoted to cricket, golf, Olympic sports, and of course, soccer (a.k.a. football). Additional articles about bowling, boxing, and cross-sport topics are also present. The greater diversity of topics ensures that the book will be interesting to a wide audience.

I used this book in a two-credit special studies course in sports analytics this spring with five students (none of whom were cisgender men) whose interest and expertise were in diving, tennis, baseball, and volleyball, and we never wanted for articles to read and discuss. In addition to the obligatory baseball articles, we explored the bizarre and controversial scoring system used in Olympic figure skating, and how it influenced the scandal-plagued 2002 women’s gold medal competition. We digested convincing evidence that “icing” the kicker (i.e., calling a timeout right before the opposing team kicks a critical field goal) in American football may in fact be a worthwhile strategy. We were also forced to grapple with the existence of hot hands in bowling, a sport where such an effect may be easier to isolate than in, say, basketball (Gilovich, Vallone, and Tversky 1985). We used a Markov chain to compute the optimal time at which an ice hockey team should pull their goalie (spoiler alert: it is earlier than the conventional wisdom would indicate). Having a diverse set of sports in the anthology allows students with different backgrounds to share their various areas of expertise. In sports analytics, this domain knowledge of a particular sport is crucial. While a cursory understanding of the sport is necessary to digest each article, a deeper understanding enables one to contextualize the *value* of the article.

In addition to the diversity across sports, the 37 articles in the *Oxford* anthology run the gamut of statistical complexity. At the high end is a challenging article about the performance of baseball players by Gary Koop published in the *Journal of the American Statistical Association*. At the low end is an accessible article about induction into the Baseball Hall of Fame by James Cochran published in the *Journal of Statistics Education*. The former employs a multiple-output Bayesian model presented with all of its attendant parameters, subscripts, and conditional posterior distributions. Graduate students will find this meaty. The latter involves nothing more than arithmetic and a chi-squared test of independence, making it suitable for any student with college-level statistics instruction (including an AP course). Spanning these extremes are 24 articles published in either *Chance* or its RSS counterpart *Significance* (or its predecessor *Journal of the Royal Statistical Society*, Series D). The complexity of these articles varies, but they are written for a broader audience than a statistics journal, and are thus more easily accessible to undergraduate students. Nine articles published in *The American Statistician* and two published in *Journal of the Royal Statistical Society*, Series A round out the collection.

The *Oxford* anthology has a few drawbacks, most of which are minor. First, the articles are largely independent. This is a result of the decision on the part of the editors to organize the material across the anthologies chronologically, rather than thematically. In a different organization, we might be able to follow the development of a single idea (e.g., “does the hot hand exist?” or “how do we evaluate pitchers?”) over time. Second, there are enough typos (e.g., “modem sprinters,” “gloal”) and other printing mistakes (e.g., variable names not rendered in italics) to annoy the reader, and in some cases muddle the material. Figure references are frequent culprits. For example, the captions for Figures 4.6 and 4.7 are switched, an error that does not appear in the original article. Figure 29.4 appears to be missing entirely. Stranger still is when things actually change: Figure 4.8 appears to contain an additional (troublesome) data point that is neither in the original graphic nor in the text. Obviously no editor will catch everything, but there are a few places where more care would have led to a better product.

Finally, as statisticians we must confront the self-selection bias of the ideas included in the anthology, which could also be characterized by what is not present. Specifically, there are no papers about basketball (probably the world’s second most popular sport) despite the fact that Dean Oliver published his groundbreaking work in that sport in 2004 (Oliver 2004). Similarly, Voros McCracken’s theory of Defense Independent Pitching Statistics was published online in 2001 and revolutionized the evaluation of pitchers in baseball. While we cannot blame the editors of the anthology for these authors choosing not to publish in ASA or RSS journals, these omissions limit the interpretation of the collection as an archive of the most important ideas in sports analytics at that time. It is a collection of the best sports analytics papers published in these journals during that time.

At $46, the anthology delivers a great deal of content for slightly more than one dollar per article. Many of these articles are hard to find online as free PDFs—making the book an even better bargain. This is not a textbook—it contains no exercises, problem sets, or data. If I were teaching a sports analytics course, I would organize content from various places around common themes during the first half of the course. However, in the second half, I would use the *Oxford* anthology as a starting place for students to develop their own ideas for a research project. The anthology is also a fine collection of articles for use in a stand-alone undergraduate or graduate research seminar in sports analytics. There is something for everyone (except basketball fans), and the breadth and quality of the articles is excellent.

Benjamin S. Baumer

*Smith College*

**Practical Bayesian Inference: A Primer for Physical Scientists**. Coryn A. L. Bailer-Jones. Cambridge, United Kingdom: Cambridge University Press, 2017, ix+295 pp., $37.99(P), ISBN: 978–1–31–664221–4.

*Practical Bayesian Inference: A Primer for Physical Scientists* provides an introduction to the major concepts and computational tools useful for analyzing and interpreting data via a Bayesian approach. Topics covered include basic probability, estimation, data modeling, regression, Monte Carlo methods including Markov chain Monte Carlo, classical hypothesis testing, regularization, model assessment, and beyond. Given the scope, most topics are covered briefly with many derivations and formal proofs omitted. For example, random number generation and importance sampling are covered in 2 pages and 12 lines, respectively. One pleasing aspect was significant R code included within the text and online. The author notes “Most of the plots in this book were produced using the code provided.”

Overall, the exposition includes some nonstandard definitions and cursory treatment of some topics. For example, the presentation of the central limit theorem uses $y = \sum_i x_i$ (instead of a mean) followed by a footnote stating in the limit the mean and variance of $y$ become infinite “… but we need not be so pedantic …” (p. 39).
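To spell out the point behind that footnote, here is a brief sketch. It assumes (an assumption of this note, not a statement taken from the book) that the $x_i$ are independent and identically distributed with finite mean $\mu$ and variance $\sigma^2 > 0$; the symbols $y_N$, $z_N$, and $\bar{x}_N$ are introduced here only for illustration.

```latex
\begin{align*}
y_N &= \sum_{i=1}^{N} x_i, &
\mathrm{E}[y_N] &= N\mu, &
\mathrm{Var}(y_N) &= N\sigma^2, \\
z_N &= \frac{y_N - N\mu}{\sigma\sqrt{N}}
     = \frac{\sqrt{N}\,(\bar{x}_N - \mu)}{\sigma}
     \xrightarrow{\;d\;} \mathcal{N}(0, 1)
     \quad \text{as } N \to \infty .
\end{align*}
```

Under these assumptions the variance of the unnormalized sum grows without bound (as does its mean when $\mu \neq 0$), and only the standardized quantity $z_N$ has a limiting distribution, which is why the theorem is usually stated for a mean or a standardized sum.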

The author’s primary audience is undergraduate and graduate science students. Basic calculus knowledge is required but no experience with probability or statistics is assumed. The brief exposition and lack of exercises limit this book as a primary course text. However, some nice discussions and extensive R code make this a potential choice for supplementary material.

James M. Flegal

*University of California, Riverside*

http://orcid.org/0000-0002-9960-8942

**Quantitative Social Science: An Introduction**. Kosuke Imai. New Jersey: Princeton University Press, 2018, xix+408 pp., $49.50(P), ISBN: 978–0–69–117546–1.

Kosuke Imai has written a very nice book at the intersection of social science and “data science.” I used this book as the basis of a 5-week short course on the R programming language for data analysis taught to MBA students. The book wholly and exclusively embraces the R language, despite that not being called out in the title. Although my students were not social science scholars, the book’s examples—fertility rates, vote tallying, the impact of minimum wage on employment—were highly accessible and mainly non-technical. Indeed, despite the title, the book is *really* a data analysis book, inspired by the requirements of political science students at Princeton. We find on p. 3 the outright declaration: “This book is written for anyone who wishes to learn data analysis and statistics for the first time.” My experience with the book was positive, and I recommend it to anyone teaching a similar course.

The main thing readers or instructors should know about this book is that it adopts a “learn by doing” approach. This is spelled out explicitly in the Introduction (p. 7):

[This book] is based on the following principle:

One can learn data analysis only by doing, not by reading.

This book is not just for reading. The emphasis must be placed on gaining experience in analyzing data. This is best accomplished by trying out the code in the book on one’s own, playing with it, and working on various exercises that appear at the end of each chapter.

True to this philosophy, this book is not suitable as a reference, for either social science or general data analysis. However, as a workbook for guiding students through actual hands-on data analysis, the book is invaluable. And by “actual” data analysis I mean the analysis of actual, real-world data. Princeton University Press even hosts a web site (*http://press.princeton.edu/qss/*) where all code and datasets used in the book are freely available for download. As of the time this review was written, I can vouch for the ready availability of these materials.

One additional thing that makes *Quantitative Social Science: An Introduction* unique is the ordering of its topics. After a short introduction, the second chapter of the book is called “Causality” and emphasizes the important distinction between data arising from randomized controlled trials, on the one hand, and data arising from mere observational studies, on the other hand. Namely, the former permits straightforward causal interpretations, while the latter does not, without additional assumptions that must be scrutinized. This distinction is clearly important for social scientists, who largely address causal questions from non-experimental data; of course, very many non-social science applications also work in this regime. The potential outcomes approach to causal inference is even briefly introduced, which is quite uncommon for a typical data analysis textbook. Meanwhile, “Uncertainty” (that is, classical statistics) is not broached until the very last full-length chapter (7). In my opinion, this is the correct emphasis: careful interpretation of the trends spotted in the data, including their limits of validity, followed later (after much practice) by the idea that such trends are subject to random variation, along with some tools for assessing that variability. Of course, many (most?) statistics books do this in the other order, jumping with both feet into the tall weeds of hypothesis testing and then, many headaches later, tossing off a line about correlation not establishing causation. Did I mention that this book uses real, interesting data?

There are things I think could be improved about the book, but these criticisms are minor relative to the book’s immense utility. For example, I am not in love with the Swirl package that is used for the book’s associated quizzes. My students found it clunky and counter-intuitive, just one more thing to learn along with everything else. By virtue of its (righteous, justified) focus on the nitty-gritty details of each applied example, actually *reading* through the book’s text can be hard going. Some of the applied examples make significant analysis choices that are not well-motivated in the text. For example, in the analysis of the minimum-wage data, taken from Card and Krueger (1994), it is never adequately explained why the focus is on percentage of full-time employees. The problem is posed as “Did such an increase in the minimum wage reduce employment as economic theory predicts?” A bit later, we read “To test this theory, we examine the proportion of full-time employees...” Your better students will ask why not look at the total number of employees, and to get an answer you will need to consult outside sources. Finally, although I laud the decision to place substantive interpretation prior to statistical considerations, the thematic breakdown of the remaining chapters lacks an overarching framework for students to hang onto. “Measurement,” “Prediction,” “Discovery,” and “Probability” name important principles, but the text lacks a narrative thread tying these principles together. To my taste, it would have been better simply to organize the book around case studies. By trying to anchor the applied examples to specific named principles, the book invites digressions: did we really need a discussion of frequentist versus Bayesian philosophies, much less the obligatory depiction of the good Reverend himself (Figure 6.1)?

Of course, as I mentioned above, these minor issues are largely beside the point. The main attractions here are the datasets, the exercises, and the code, and in putting these together Professor Imai has done a real mitzvah for anyone setting out to teach an introductory class on data analysis using R. In combination with my other favorite R-based textbook, *Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving* (Nolan and Temple Lang 2015), a diligent student has several months of engaging, edifying, and well-thought-out exercises to work through.

Perhaps future versions of Imai’s *Quantitative Social Science* will be more streamlined and slightly more reader-friendly, but that would just be icing on the cake.