Assistant Professor
Law, Economics, and Data Science Group (team)
ETH Zürich
IFW E47.1 (email norms)

Curriculum Vitae
Google Scholar Page

Research Interests:

Topics: Law and Economics, Political Economy, Public Finance
Methods: Econometrics, Computational Linguistics, Machine Learning

Online Seminar in Economics + Data Science
Second Monash-Warwick-Zurich Text-as-Data Workshop (submit papers by June 28th, conference provisionally scheduled for Aug 30th-31st on Zoom)

Recent Working Papers

“DocSCAN: Unsupervised Text Classification via Learning from Neighbors” (with Dominik Stammbach). Abstract

We introduce DocSCAN, a completely unsupervised text classification approach using Semantic Clustering by Adopting Nearest-Neighbors (SCAN). For each document, we obtain semantically informative vectors from a large pre-trained language model. Similar documents have proximate vectors, so neighbors in the representation space tend to share topic labels. Our learnable clustering approach uses pairs of neighboring datapoints as a weak learning signal. The proposed approach learns to assign classes to the whole dataset without provided ground-truth labels. On five topic classification benchmarks, we improve on various unsupervised baselines by a large margin. In datasets with relatively few and balanced outcome classes, DocSCAN approaches the performance of supervised classification. The method fails for other types of classification, such as sentiment analysis, pointing to important conceptual and practical differences between classifying images and texts.


“Emotion and Reason in Political Language” (with Gloria Gennaro). Abstract

We use computational linguistics techniques to study the use of emotion and reason in political discourse. Our new measure of emotionality in language combines lexicons for affective and cognitive processes, as well as word embeddings, to construct a dimension in language space between emotion and reason. After validating the method against human annotations, we apply it to scale 6 million speeches in the U.S. Congressional Record for the years 1858 through 2014. Intuitively, emotionality spikes during times of war and is highest for patriotism-related topics. In the time series, emotionality was relatively low and stable in the earlier years but increased significantly starting in the late 1970s. Comparing Members of Congress to their colleagues, we find that emotionality is higher for Democrats, for women, for ethnic/religious minorities, and for those with relatively extreme policy preferences (either left-wing or right-wing) as measured by roll call votes.

Press: Sage Ocean Blog (2021).


“Gender Attitudes in the Judiciary: Evidence from U.S. Circuit Courts” (with Arianna Ornaghi and Daniel L. Chen). Abstract

Do gender attitudes influence interactions with female judges in U.S. Circuit Courts? In this paper, we propose a novel judge-specific measure of gender attitudes based on the use of gender-stereotyped language in the judge’s authored opinions. Exploiting quasi-random assignment of judges to cases and conditioning on judges’ characteristics, we validate the measure by showing that slanted judges vote more conservatively in gender-related cases. Slant influences interactions with female colleagues: slanted judges are more likely to reverse lower-court decisions if the lower-court judge is a woman rather than a man, are less likely to assign opinions to female judges, and cite fewer female-authored opinions.


“Media Slant is Contagious” (with Philine Widmer and Sergio Galletta). Abstract

This paper provides causal evidence on how partisan news messaging from cable television influences the content published by newspapers in U.S. localities. We introduce a new parallel corpus of newspaper articles (24M articles in 600+ local newspapers) and transcribed television news shows (40K cable news episodes from Fox News Channel, CNN, and MSNBC) for the years 2005-2008. We measure media influence using a supervised learning model that predicts, for a given piece of text, the probability that it comes from a Fox News transcript, rather than from CNN or MSNBC. After validating the measure, we apply it to the local newspaper article texts. Exogenous variation in news viewership across localities comes from relative channel numbering, which we use as instruments. We find that an exogenous increase in local viewership of a cable news network shifts the textual content of local newspapers toward that network’s content. Televised media slant works not just through persuading viewers, but through influencing other media outlets.


“Measuring gender and religious bias in the Indian judiciary” (with Sam Asher, Aditi Bhowmick, Daniel L. Chen, Tanaya Devi, Christoph Goessmann, Paul Novosad, and Bilal Siddiqi). Abstract

We study judicial in-group bias in Indian criminal courts, collecting data on over 80 million legal case records from 2010–2018. We exploit quasi-random assignment of judges and changes in judge cohorts to examine whether defendant outcomes are affected by being assigned to a judge with a similar religious or gender identity. We estimate tight zero effects of in-group bias. The upper end of our 95% confidence interval rejects effect sizes that are one-fifth of those in most of the prior literature.


“A Machine Learning Approach to Analyze and Support Anti-Corruption Policy” (with Sergio Galletta and Tommaso Giommoni). Abstract

Can machine learning support better governance? In the context of Brazilian municipalities, 2001-2012, we have access to detailed accounts of local budgets and audit data on the associated fiscal corruption. Using the budget variables as predictors, we train a tree-based gradient-boosted classifier to predict the presence of corruption in held-out test data. The trained model, when applied to new data, provides a prediction-based measure of corruption which can be used for new empirical analysis or to support policy responses. We validate the empirical usefulness of this measure by replicating, and extending, some previous empirical evidence on corruption issues in Brazil. We then explore how the predictions can be used to support policies toward corruption. Our policy simulations show that, relative to the status quo policy of random audits, a targeted policy guided by the machine predictions could detect more than twice as many corrupt municipalities for the same audit rate.


Revisions Requested

“Ideas Have Consequences: The Impact of Law and Economics on American Justice” (with Daniel L. Chen and Suresh Naidu). Reject and resubmit, Quarterly Journal of Economics. Abstract

This paper provides a quantitative analysis of the effects of the early law-and-economics movement on the U.S. judiciary. We focus on the Manne Economics Institute for Federal Judges, an intensive economics course that trained almost half of federal judges between 1976 and 1999. Using the universe of published opinions in U.S. Circuit Courts and 1 million District Court criminal sentencing decisions, we estimate the differences-in-differences effect of Manne program attendance using judge fixed effects. Selection into attendance was limited – the program was popular across judges from all backgrounds, was regularly oversubscribed, and admitted judges on a first-come first-served basis – and we further adjust for machine-learning-selected covariates predicting the timing of attendance. We find that after attending economics training, participating judges use more economics language in their opinions, issue more conservative decisions in economics-related cases, rule against regulatory agencies more often, favor more lax enforcement in antitrust cases, and impose more/longer criminal sentences. The law-and-economics movement had policy consequences via its influence on U.S. federal judges.


“Race-Related Research in Economics and other Social Sciences” (with Arun Advani, David Cai, and Imran Rasul), under revision for Econometric Society Monograph Series. Abstract

How does economics compare to other social sciences in its study of issues related to race and ethnicity? We assess this using a corpus of 500,000 academic publications in economics, political science, and sociology. Using an algorithmic approach to classify race-related publications, we document that economics lags far behind the other disciplines in the volume and share of race-related research, despite having higher absolute volumes of research output. Since 1960, there have been 13,000 race-related publications in sociology, 4,000 in political science, and 3,000 in economics. Since around 1970, the share of economics publications that are race-related has hovered just below 2% (although the share is higher in top-5 journals); in political science, the share has been around 4% since the mid-1990s, while in sociology, it has been above 6% since the 1960s and risen to over 12% in the last decade. Finally, using survey data collected from the Social Science Prediction Platform, we find that economists tend to overestimate the amount of race-related research in all disciplines, but especially so in economics.



“How Cable News Reshaped Local Government” (with Sergio Galletta). Reject and resubmit, American Economic Journal: Applied Economics. Abstract

Partisan cable news broadcasts have a causal effect on the size and composition of budgets in U.S. localities. Utilizing channel positioning as an instrument for viewership, we show that exposure to the conservative Fox News Channel shrinks local government budgets, while liberal MSNBC enlarges them. Revenue changes are driven by shifts in property taxes, a key tool for local redistributive policy. Expenditure changes are driven by public hospital expenditures, an important discretionary public good provided by local governments. We also find evidence that Fox exposure increased privatization (while MSNBC decreased it). An analysis of mechanisms suggests that the results are driven by changes in voter preferences, but not by changes in partisan control of city governments.


“Reducing Partisanship in Judicial Elections Can Improve Judge Quality: Evidence from U.S. State Supreme Courts” (with W. Bentley MacLeod). Revise and resubmit, Journal of Public Economics. Abstract

Should technocratic public officials be selected through politics or by merit? This paper explores how selection procedures influence the quality of selected officials in the context of U.S. state supreme courts for the years 1947-1994. In a unique set of natural experiments, state governments enacted a variety of reforms making judicial elections less partisan and establishing merit-based procedures that delegate selection to experts. We compare post-reform judges to pre-reform judges in their work quality, measured by forward citations to their opinions. In this setting we can hold constant contemporaneous incentives and the portfolio of cases, allowing us to produce causal estimates under an identification assumption of parallel trends in quality by judge starting year. We find that judges selected by nonpartisan processes (nonpartisan elections or technocratic merit commissions) produce higher-quality work than judges selected by partisan elections. These results are consistent with a representative voter model in which better technocrats are selected when the process has less partisan bias or better information regarding candidate ability.

Replication Notebook.


Selected Publications

“Cross-Domain Topic Classification for Political Texts” (with Massimo Morelli and Moritz Osnabruegge), Political Analysis (conditionally accepted). Abstract

We introduce and assess cross-domain topic classification. In this approach, an algorithm learns to classify topics in a labeled source corpus and then extrapolates topics in an unlabeled target corpus from another domain. The advantage over within-domain supervised learning is significant efficiency gains because one can use existing training data. The advantage over unsupervised topic models is that our approach can be more specifically targeted to a research question and that the resulting topics are easier to validate and interpret. We demonstrate the method in the case of labeled party platforms (source corpus) and unlabeled parliamentary speeches (target corpus). Besides the standard within-domain error metrics, we further validate the cross-domain performance by labeling a subset of target-corpus documents. We find that the classifier assigns topics accurately in the parliamentary speeches, although accuracy varies substantially by topic. We also propose a tool for interpreting the topics and diagnosing cross-domain classification. To assess empirical validity, we present two case studies on how electoral rules and parliamentarian gender influence the choice of speech topics.


“Measuring Discretion and Delegation in Legislative Texts: Methods and Application to U.S. States” (with Massimo Morelli and Matia Vannoni), Political Analysis (2020). Abstract

Bureaucratic discretion and executive delegation are central topics in political economy and political science. The previous empirical literature has measured discretion and delegation by manually coding large bodies of legislation. Drawing from computational linguistics, we provide an automated procedure for measuring discretion and delegation in legal texts to facilitate large-scale empirical analysis. The method uses information in syntactic parse trees to identify legally relevant provisions, as well as agents and delegated actions. We undertake two applications. First, we produce a measure of bureaucratic discretion by looking at the level of legislative detail for U.S. states and find that this measure increases after reforms giving agencies more independence. This effect is consistent with an agency cost model where a more independent bureaucracy requires more specific instructions (less discretion) to avoid bureaucratic drift. Second, we construct measures of delegation to governors in state legislation. Consistent with previous estimates using non-text metrics, we find that executive delegation increases under unified government.

Press: Bocconi Knowledge (2021).


“Elections and divisiveness: Theory and evidence” (with Massimo Morelli and Richard Van Weelden), Journal of Politics (2017). Abstract

This paper provides a theoretical and empirical analysis of how politicians allocate their time across issues. When voters are uncertain about an incumbent’s preferences, there is a pervasive incentive to “posture” by spending too much time on divisive issues (which are more informative about a politician’s preferences) at the expense of time spent on common-values issues (which provide greater benefit to voters). Higher transparency over the politicians’ choices can exacerbate the distortions. These theoretical results motivate an empirical study of how Members of the U.S. Congress allocate time across issues in their floor speeches. We find that U.S. Senators spend more time on divisive issues when they are up for election, consistent with electorally induced posturing. In addition, we find that U.S. House Members spend more time on divisive issues in response to higher news transparency.


“New Policing, New Segregation: From Ferguson to New York” (with Jeffrey A. Fagan), Georgetown Law Journal Online (2017). Abstract

Modern policing emphasizes advanced statistical metrics, new forms of organizational accountability, and aggressive tactical enforcement of minor crimes as the core of its institutional design. Recent policing research has shown how this policing regime has been woven into the social, political and legal systems in urban areas, but there has been little attention to these policing regimes in smaller areas. In these places, where relationships between citizens, courts and police are more intimate and granular, and local boundaries are closely spaced with considerable flow of persons through spaces, the “new policing” has reached deeply into the everyday lives of predominantly non-white citizens through multiple contacts that lead to an array of legal financial obligations, including a wide range of fines and fees. Failure to pay these fees often leads to criminal liability. We examine two faces of modern policing, comparing Ferguson, Missouri, and New York City. We analyze rich and detailed panel data from both places on police stops, citations, warrants, arrests, court dispositions, and penalties, to show the web of social control and legal burdens that these practices create. The data paint a detailed picture of racially discriminatory outcomes at all stages of the process that are common to these two very different social contexts. We link the evidence on the spatial concentration of the racial skew in these policing regimes to patterns of social and spatial segregation, and in turn, to the social, economic and health implications for mobility. We conclude with a discussion of the implications of the “new policing” for constitutional regulation and political reform.


“Intrinsic motivation in public service: Theory and evidence from state supreme courts” (with W. Bentley MacLeod), Journal of Law and Economics (2015). Abstract

This paper provides a theoretical and empirical analysis of the intrinsic preferences of state appellate court judges. We construct a panel data set using published decisions from state supreme court cases merged with institutional and biographical information on all (1,636) state supreme court judges for the 50 states of the United States from 1947 to 1994. We estimate the effects of changes in judge employment conditions on a number of measures of judicial performance. The results are consistent with the hypothesis that judges are intrinsically motivated to provide high-quality decisions, and that at the margin they prefer quality over quantity. When judges face less time pressure, they write better-researched opinions that are cited more often by later judges. When judges are up for election, performance falls, suggesting that election politics take time away from judging work – rather than providing an incentive for good performance. These effects are strongest when judges have more discretion to select their case portfolio, consistent with psychological theories that posit a negative effect of contingency on motivation.


More Working Papers

“The Effect of Fox News Channel on U.S. Elections: 2000-2020” (with Sergio Galletta, Matteo Pinna, and Christopher Warshaw). Abstract

This paper provides a comprehensive assessment of the effect of Fox News Channel (FNC) on elections in the United States. FNC is the highest-rated channel on cable television and has a documented conservative slant. We show that FNC has helped Republican candidates in elections across levels of U.S. government over the past decade. A one standard deviation decrease in FNC’s channel position boosted Republican vote shares by at least 0.5 percentage points in recent presidential, Senate, House and gubernatorial elections. The effects of FNC increased steadily between 2004 and 2016 and then plateaued. Survey-based evidence suggests that FNC affects elections by shifting the political preferences of Americans to the right. Overall, the findings suggest that FNC has contributed to the nationalization of United States elections.


“Mandatory Retirement Reforms for Judges Improved Performance on U.S. State Supreme Courts” (with W. Bentley MacLeod). Abstract

Anecdotal evidence often points to aging as a cause for reduced work performance. This paper provides empirical evidence on this issue in a context where performance is measurable and there is variation in mandatory retirement policies: U.S. state supreme courts. We find that introducing mandatory retirement reduces the average age of working judges and improves court performance, as measured by output (number of published opinions) and legal impact (number of forward citations to those opinions). Consistent with aging effects as a contributing factor, we find that older judges do about the same amount of work as younger judges, but that work is lower-quality as measured by citations. However, the effect of mandatory retirement on performance is much larger than what would be expected from the change in the age distribution, suggesting that the presence of older judges reduces the performance of younger judges.


“More Laws, More Growth? Evidence from U.S. States” (with Massimo Morelli and Matia Vannoni). Abstract

This paper analyzes the conditions under which more detailed legislation contributes to economic growth. In the context of U.S. states, we apply natural language processing tools to measure legislative flows for the years 1965-2012. We implement a novel shift-share design for text data, where the instrument for legislation is leave-one-out legal-topic flows interacted with pre-treatment legal-topic shares. We find that at the margin, higher legislative detail causes more economic growth. Motivated by an incomplete-contracts model of legislative detail, we test and find that the effect is driven by contingent clauses, that the effect is concave in the pre-existing level of detail, and that the effect size is increasing with economic policy uncertainty.

Press: VoxEU.


“Causal effects of judicial sentiment: Methods and application to U.S. Circuit Courts” (with Sergio Galletta and Daniel L. Chen). Revise and resubmit, Economica. Abstract

This paper provides a general method for analyzing the causal effects of sentiments expressed in the language of judicial rulings, with an application to the effect on social attitudes. We apply natural language processing tools to the text of U.S. appellate court opinions to extrapolate judges’ sentiments toward a number of specific target groups. Exogenous variation in those sentiments comes from an instrumental variables approach, which exploits the random assignment of judges to cases (and the fact that judge characteristics provide good cross-validated predictors of expressed sentiments). Our estimates are consistent with a backlash effect from judge sentiments to social attitudes. This effect does not persist over time and is heterogeneous depending on the target group considered.


“The Effect of Fox News on Health Behavior during COVID-19” (with Sergio Galletta, Dominik Hangartner, Yotam Margalit, and Matteo Pinna). Revise and resubmit, Political Analysis. Abstract

In the early weeks of the 2020 coronavirus (COVID-19) pandemic, Fox News Channel advanced a skeptical narrative that downplayed the risks posed by the virus. We find that this narrative had significant consequences: in localities with higher Fox News viewership (exogenous due to random variation in channel positioning), people were less likely to adopt behaviors geared toward social distancing (e.g., staying at home) and consumed fewer goods in preparation (e.g., cleaning products, hand sanitizers, masks). Using original survey data, we find that the effect of Fox News came not merely from its long-standing distrustful stance toward science, but also from program-specific content that minimized the COVID-19 threat.

Press: Hollywood Reporter.


“Conservative News Media and Criminal Justice: Evidence from Exposure to Fox News Channel” (with Michael Poyker). Abstract

Exposure to conservative news causes judges to impose harsher criminal sentences. Our evidence comes from an instrumental variables analysis, where randomness in television channel positioning across localities induces exogenous variation in exposure to Fox News Channel. These treatment data on news viewership are taken to outcomes data on almost 7 million criminal sentencing decisions in the United States for the years 2005–2017. Higher Fox News viewership increases incarceration length, and the effect is stronger for black defendants and for drug-related crimes. The effect is observed for elected, and not appointed, judges, consistent with voter attitudes as a potential mechanism. The effect becomes weaker as judges get closer to election, suggesting a diminishing marginal effect for judges who are already politically engaged.

Media Coverage: New Statesman.


“What Drives Partisan Tax Policy? The Effective Tax Code.” Abstract

This paper contributes to recent work in political economy and public finance that focuses on how details of the tax code, rather than tax rates, are used to implement redistributive fiscal policies. I use tools from natural language processing to construct a high-dimensional representation of tax code changes from the text of 1.6 million statutes enacted by state legislatures for the years 1963 through 2010. A data-driven approach is taken to recover the effective tax code – the language in tax law that has the largest impact on revenues, holding major tax rates constant. I then show that the effective tax code drives partisan tax policy: relative to Republicans, Democrats use revenue-increasing language for income taxes but use revenue-decreasing language for sales taxes (consistent with a more redistributive fiscal policy) despite making no changes on average to statutory tax rates. These results are consistent with the view that due to their relative salience, changing tax rates is politically more difficult than changing the tax code.


“Polarization and Political Selection” (with Tinghua Yu). Abstract

Does political polarization among voters affect the quality of elected officials? We examine the question both theoretically and empirically. In our model, high quality candidates prefer to spend time on their current careers over electoral campaigning. In a polarized electorate, however, voters cast their votes mainly based on candidates’ party affiliations, reducing electoral campaign effort in equilibrium. Hence under higher polarization among voters, higher quality candidates are more likely to run for high office and to get elected. Our testable prediction is that electorates with higher polarization select candidates who perform better. We take the predictions to data on judges’ performance constructed from the opinions of all state supreme court judges working between 1965 and 1994. We find that judges who joined the court when polarization was high write higher-quality decisions (receiving more citations from other judges) than judges who joined when polarization was low.


More Publications

Peer-Reviewed Journals

“Fiscal pressures and discriminatory policing: Evidence from traffic stops in Missouri” (with Allison Harris and Jeffrey Fagan), Journal of Race, Ethnicity, and Politics (2020). Abstract

This paper provides evidence of racial variation in traffic enforcement responses to local government budget stress using data from policing agencies in the state of Missouri from 2001 through 2012. Like previous studies, we find that local budget stress is associated with higher citation rates; we also find an increase in traffic-stop arrest rates. However, we find that these effects are concentrated among white (rather than black or Latino) drivers. The results are robust to the inclusion of a range of covariates and a variety of model specifications, including a regression discontinuity examining bare budget shortfalls. Considering potential mechanisms, we find that targeting of white drivers is higher where the white-to-black income ratio is higher, consistent with the targeting of drivers who are better able to pay fines. Further, the relative effect on white drivers is higher in areas with statistical over-policing of black drivers: when black drivers are already getting too many fines, police cite white drivers from whom they are presumably more likely to be able to raise the needed extra revenue. These results highlight the relationship between policing-as-taxation and racial inequality in policing outcomes.


“Divided Government, Delegation, and Civil Service Reform” (with Massimo Morelli and Matia Vannoni), Political Science Research and Methods (2020). Abstract

This paper sheds new light on the drivers of civil service reform in U.S. states. We first demonstrate theoretically that divided government is a key trigger of civil service reform, providing nuanced predictions for specific configurations of divided government. We then show empirical evidence for these predictions using data from the second half of the 20th century: states tended to introduce these reforms under divided government, and in particular when legislative chambers (rather than legislature and governor) were divided.


“A research-based ranking of public policy schools” (with Miguel Urquiola), Scientometrics (2020). Abstract

This paper presents rankings of U.S. public policy schools based on their research publication output. In 2016 we collected the names of about 5,000 faculty members at 44 such schools. We use bibliographic databases to gather measures of the quality and quantity of these individuals’ academic publications. These measures include the number of articles and books written, the quality of the journals the articles have appeared in, and the number of citations all have garnered. We aggregate these data to the school level to produce a set of rankings. The results differ significantly from existing rankings, and in addition display substantial across-field variation.


“Text classification of political ideology labels in judicial opinions” (with Carina Hausladen and Marcel Schubert), International Review of Law and Economics (2020). Abstract

This paper draws on machine learning methods for text classification to predict the ideological direction of decisions from the associated text. Using a 5% hand-coded sample of cases from U.S. Circuit Courts, we explore and evaluate a variety of machine classifiers to predict “conservative decision” or “liberal decision” in held-out data. Our best classifier is highly predictive (F1=.65) and allows us to extrapolate ideological direction to the full sample. We then use these predictions to replicate and extend Landes and Posner’s (2009) analysis of how the party of the nominating president influences circuit judges’ votes.

“Automated Fact-Value Distinction in Court Opinions” (with Yu Cao and Daniel L. Chen), European Journal of Law and Economics (2020). Abstract

This paper studies the problem of automated classification of fact statements and value statements in written judicial decisions. We compare a range of methods and demonstrate that the linguistic features of sentences and paragraphs can be used to successfully classify them along this dimension. The Wordscores method by Laver et al. (2003) performs best in held-out data. In an application, we show that the value segments of opinions are more informative than fact segments about the ideological direction of U.S. Circuit Court opinions.


“Sequential decision-making with group identity” (with Jessica Van Parys), Journal of Economic Psychology (2018). Abstract

In sequential decision-making experiments, participants often conform to the decisions of others rather than reveal private information — resulting in less information produced and potentially lower payoffs for the group. This paper asks whether experimentally induced group identity affects players’ decisions to conform, even when payoffs are only a function of individual actions. As motivation for the experiment, we show that U.S. Supreme Court Justices in preliminary hearings are more likely to conform to their same-party predecessors when the share of predecessors from their party is high. Lab players, in turn, are more likely to conform to the decisions of in-group members when their share of in-group predecessors is high. We find that exposure to information from in-group members increases the probability of reverse information cascades (herding on the wrong choice), reducing average payoffs. Therefore, alternating decision-making across members of different groups may improve welfare in sequential decision-making contexts.


Emerging tools for a ‘driverless’ legal system: Comment,” Journal of Institutional and Theoretical Economics (2018).

On the behavioral economics of crime” (with Frans van Winden), Review of Law & Economics (2012). Abstract

This paper examines the implications of the brain sciences’ mechanistic model of human behavior for our understanding of crime. The standard rational-choice crime model is refined by a behavioral approach, which proposes a decision model comprising cognitive and emotional decision systems. According to the behavioral approach, a criminal is not irrational but rather ‘ecologically rational,’ outfitted with evolutionarily conserved decision modules adapted for survival in the human ancestral environment. Several important cognitive as well as emotional factors for criminal behavior are discussed and formalized, using tax evasion as a running example. The behavioral crime model leads to new perspectives on criminal policy-making.


Peer-Reviewed Conference Proceedings

Evaluating Document Representations for Content-based Legal Literature Recommendations” (with Malte Ostendorff, Terry Ruas, Bela Gipp, Julian Moreno-Schneider, and Georg Rehm), ICAIL (2021).

Recommender systems assist legal professionals in finding relevant literature for supporting their case. Despite its importance for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. Simultaneously, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets. Thus, these studies have limited reproducibility. To address the gap between research and practice, we explore a set of state-of-the-art document representation methods for the task of retrieving semantically related US case law. We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods. In total, we compare 27 methods using two silver standards with annotations for 2,964 documents. The silver standards are newly created from Open Case Book and Wikisource and can be reused under an open license facilitating reproducibility. Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results, closely followed by Poincaré citation embeddings. Combining fastText and Poincaré in a hybrid manner further improves the overall result. Besides the overall performance, we analyze the methods depending on document length, citation count, and the coverage of their recommendations. We make our source code, models, and datasets publicly available.



Legal language modeling with transformers” (with Lazar Peric, Stefan Mijic, and Dominik Stammbach), Proceedings of ASAIL (2020). Abstract

We explore the use of deep learning algorithms to generate text in a professional, technical domain: the judiciary. Building on previous work that has focused on non-legal texts, we train auto-regressive transformer models to read and write judicial opinions. We show that survey respondents with legal expertise cannot distinguish genuine opinions from fake opinions generated by our models. However, a transformer-based classifier can distinguish machine- from human-generated legal text with high accuracy. These findings suggest how transformer models can support legal practice.


Unsupervised Extraction of Workplace Rights and Duties from Collective Bargaining Agreements” (with Jeff Jacobs, W. Bentley MacLeod, Suresh Naidu, and Dominik Stammbach), International Workshop on Mining and Learning in the Legal Domain (2020). Abstract

This paper describes an unsupervised legal document parser which performs a decomposition of labor union contracts into discrete assignments of rights and duties among agents of interest. We use insights from deontic logic applied to modal categories and other linguistic patterns to generate topic-specific measures of relative legal authority. We illustrate the consistency and efficiency of the pipeline by applying it to a large corpus of 35K contracts and validating the resulting outputs.


e-FEVER: Explanations and Summaries for Automated Fact Checking” (with Dominik Stammbach), Truth and Trust Online (2020). Abstract

This paper demonstrates the capability of a large pre-trained language model (GPT-3) to automatically generate explanations for fact checks. Given a claim and the retrieved potential evidence, our system summarizes the evidence and how it supports the fact-check determination. The system does not require any additional parameter training; instead, we use GPT-3’s analogical “few-shot-learning” capability, where we provide a task description and some examples of solved tasks. We then ask the model to explain new fact checks. Besides providing an intuitive and compressed summary for downstream users, we show that the machine-generated explanations can themselves serve as evidence for automatically making true/false determinations. Along the way, we report new state-of-the-art fact-checking results for the FEVER dataset. Finally, we make the explanations corpus publicly accessible, providing the first large-scale resource for explainable automated fact checking.


Entropy in Legal Language” (with Roland Friedrich and Mauro Luzzatto), NLLP @ KDD (2020). Abstract

We introduce a novel method to measure word ambiguity, i.e. local entropy, based on a neural language model. We use the measure to investigate entropy in the written text of opinions published by the U.S. Supreme Court (SCOTUS) and the German Bundesgerichtshof (BGH), representative courts of the common-law and civil-law court systems respectively. We compare the local (word) entropy measure with a global (document) entropy measure constructed with a compression algorithm. Our method uses an auxiliary corpus of parallel English and German to adjust for persistent differences in entropy due to the languages. Our results suggest that the BGH’s texts are of lower entropy than the SCOTUS’s. Investigation of low- and high-entropy features suggests that the entropy differential is driven by more frequent use of technical language in the German court.


Other (Not Peer-Reviewed)

The Making of International Tax Law: Evidence from Tax Treaties Text” (with Omri Marian), Florida Tax Review (2020). Abstract

We offer the first attempt at empirically testing the level of transnational consensus on the legal language controlling international tax matters. We also investigate the institutional framework of such consensus-building. We build a dataset of 4,052 bilateral income tax treaties, as well as 16 model tax treaties published by the United Nations (UN), Organisation for Economic Co-operation and Development (OECD) and the United States. We use natural language processing to perform pair-wise comparison of all treaties in effect at any given year. We identify clear trends of convergence of legal language in bilateral tax treaties since the 1960s, particularly on the taxation of cross-border business income. To explore the institutional source of such consensus, we compare all treaties in effect at any given year to the model treaties in effect during that year. We also explore whether newly concluded treaties converge towards legal language in newly introduced models. We find the OECD Model Tax Convention (OECD Model) to have a significant influence. In the years following the adoption of a new OECD Model there is a clear trend of convergence in newly adopted bilateral tax treaties towards the language of the new OECD Model. We also find that model treaties published by the UN (UN Model) have little immediate observable effect, though UN treaty policies seem to have a delayed, yet lasting effect. We conclude that such findings support the argument that a trend towards international legal consensus on certain tax matters exists, and that the OECD is the institutional source of the consensus building process.


Automated Classification of Modes of Moral Reasoning in Judicial Decisions” (with Nischal Mainali, Liam Meier, and Daniel L. Chen), in: Computational legal studies: The promise and challenge of data-driven research, Edward Elgar (2020).

Case vectors: Spatial representations of the law using document embeddings” (with Daniel L. Chen), in: Law as Data, Santa Fe Institute Press (2019). Abstract

Recent work in natural language processing represents language objects (words and documents) as dense vectors that encode the relations between those objects. This paper explores the application of these methods to legal language, with the goal of understanding judicial reasoning and the relations between judges. In an application to federal appellate courts, we show that these vectors encode information that distinguishes courts, time, and legal topics. The vectors do not reveal spatial distinctions in terms of political party or law school attended, but they do highlight generational differences across judges. We conclude the paper by outlining a range of promising future applications of these methods.


Judge, Jury, and EXEcute File: The brave new world of legal automation,” Social Market Foundation (2018).

What kind of judge is Brett Kavanaugh? A quantitative analysis” (with Daniel L. Chen), Cardozo Law Review Online (2018). Abstract

This article reports the results of a series of data analyses of how recent Supreme Court nominee Brett Kavanaugh compares to other potential Supreme Court nominees and current Supreme Court Justices in his judging style. The analyses reveal a number of ways in which Judge Kavanaugh differs systematically from his colleagues. First, Kavanaugh dissents and is dissented against along partisan lines. More than other Judges and Justices, Kavanaugh dissents at a higher rate during the lead-up to elections, suggesting that he feels personally invested in national politics. Far more often than his colleagues, he justifies his decisions with conservative doctrines, including politicized precedents that tend to be favored by Republican-appointed judges, the original Articles of the Constitution, and the language of economics and free markets. These findings demonstrate the usefulness of quantitative analysis in the evaluation of judicial nominees.