Elliott Ash
Associate Professor, ETH Zurich · Law | Economics | Data Science
Welcome to my web page. I am a professor at ETH Zurich and Scientific Lead in the Swiss AI Initiative. I am a CEPR Research Affiliate in Political Economy, Associate Editor at Economic Journal, Co-Editor at Journal of Law and Economics, and ETH AI Center Core Faculty. I received an ERC Starting Grant.
I held previous appointments at New York University (Scholar in Residence), University of Warwick (Assistant Professor), and Princeton University (Postdoc). I earned a Ph.D. in Economics and a J.D. from Columbia University, a B.A. (Plan II Honors) from the University of Texas at Austin, and an LL.M. in International Criminal Law from the University of Amsterdam.
Topics: Law and Economics, Political Economy · Methods: Econometrics, Natural Language Processing, Machine Learning
Seminars & Conferences
Teaching Materials
- Text Data in Economics (2022) · GitHub
- Building a Robot Judge: Data Science for Decision-Making
- Language Models for Law and Social Science
- "Text algorithms in economics" (with Stephen Hansen), Annual Review of Economics (2023)
- "Large Language Models in Economics" (with Stephen Hansen and Yabra Muvdi), New Palgrave Dictionary of Economics (2024) · Companion GitHub Repo
Recent Working Papers
(with Benjamin Kohler, David Zollikofer, Johanna Einsiedler, and Alexander Hoyle)
› Abstract
Recent work has used LLM agents to reproduce empirical social science results with access to both the data and code. We broaden this scope by asking: Can they reproduce results given only a paper's methods description and original data? We develop an agentic reproduction system that extracts structured methods descriptions from papers, runs reimplementations under strict information isolation—agents never see the original code, results, or paper—and enables deterministic, cell-level comparison of reproduced outputs to the original results. An error attribution step traces discrepancies through
(with Benjamin Arold, W. Bentley MacLeod, and Suresh Naidu), R&R at Quarterly Journal of Economics
› Abstract
Interviews allow employers to learn about workers, but do they also enable workers to learn about firms? Studying 500,000 interview reports from Glassdoor, we find candidates for high-paying jobs are more likely to reject a job offer if they believe the interview was easy. Easy interviews appear to convey poor "fit" as those who accept offers after easy interviews are two-fifths of a standard deviation less satisfied with their jobs and 10 percent less likely to remain with their employer for at least one year. Analysis of interview narratives using large language models reveals difficult inte
(with Soumitra Shukla and Jason Sockin)
› Abstract
We study the effect of televised broadcasts of floor debates on the rhetoric and behavior of U.S. Congress Members. First, we show in a differences-in-differences analysis that the introduction of C-SPAN broadcasts in 1979 increased the use of emotional appeals in the House relative to the Senate, where televised floor debates were not introduced until later. Second, we use exogenous variation in C-SPAN channel positioning as an instrument for C-SPAN viewership by Congressional district and show that House Members from districts with exogenously higher C-SPAN viewership are more emotive in flo
(with Gloria Gennaro)
(with Sergio Galletta and Giacomo Opocher), R&R at Economic Journal
› Abstract
We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, str
(with many co-authors), ACL (2026)
› Abstract
This paper empirically studies the effects of the early law-and-economics movement on the U.S. judiciary. We focus on the Manne Economics Institute for Federal Judges, an intensive economics course that trained almost half of federal judges between 1976 and 1999. Using the universe of published opinions in U.S. Circuit Courts and 1 million District Court criminal sentencing decisions, we estimate the within-judge effect of Manne program attendance. Selection into attendance was limited, as the program was popular among judges of all backgrounds, frequently oversubscribed, and admitted particip
Selected Publications — Economics
(with Daniel L. Chen and Suresh Naidu), Quarterly Journal of Economics (2026)
› Abstract
This paper analyzes the conditions under which more legislation contributes to economic growth. In the context of U.S. states, we apply natural language processing tools to measure legislative flows for the years 1965-2012. We implement a novel shift-share design for text data, where the instrument for legislation is leave-one-out legal-topic flows interacted with pre-treatment legal-topic shares. We find that at the margin, higher legislative output causes more economic growth. Consistent with more complete laws reducing ex-post hold-up, we find that the effect is driven by the use of conting
› Abstract
We study judicial in-group bias in Indian criminal courts using a newly collected dataset on over 5 million criminal case records from 2010-2018. After detecting gender and religious identity using a neural-net classifier applied to judge and defendant names, we exploit quasi-random assignment of cases to judges to examine whether defendant outcomes are affected by assignment to a judge with a similar identity. In the aggregate, we estimate tight zero effects of in-group bias based on shared gender, religion, and last name (a proxy for caste). We do find limited in-group bias in some (but not
(with Clémentine Abed Meraim, Philine Widmer, and Sergio Galletta), Economic Journal (conditionally accepted)
› Abstract
Anecdotal evidence often points to aging as a cause for reduced work performance. This paper provides empirical evidence on this issue in a context where performance is measurable and there is variation in mandatory retirement policies: U.S. state supreme courts. We find that introducing mandatory retirement reduces the average age of working judges and improves court performance, as measured by output (number of published opinions) and legal impact (number of forward citations to those opinions). Consistent with aging effects as a contributing factor, we find that older judges do about the sa
This paper examines the diffusion of
› Abstract
Local exposure to conservative news causes judges to impose harsher criminal sentences. Our evidence comes from an instrumental variables analysis, where randomness in television channel positioning across localities induces exogenous variation in exposure to Fox News Channel. These treatment data on news viewership are taken to outcomes data on almost 7 million criminal sentencing decisions in the United States for the years 2005-2017. Higher Fox News viewership increases incarceration length, and the effect is stronger for black defendants and for drug-related crimes. We can rule out changes
› Abstract
We use computational linguistics techniques to study the use of emotion and reason in political discourse. Our new measure of emotionality in language combines lists of emotive and cognitive words, as well as word embeddings, to construct a text-based scale between emotion and reason. After validating the method against human annotations, we apply it to scale 6 million speeches in the U.S. Congressional Record for the years 1858 through 2014. Intuitively, emotionality spikes during times of war and is highest for patriotism-related topics. In the time series, emotionality was relatively low an
› Abstract
Are non-verbal reactions during parliamentary debate gendered? Do male and female Members of Parliament (MPs) experience applause or jeering differently? In short, yes, and the gendered nature of a speech matters. Using an original corpus of over 544,000 speeches given in German state parliaments, we first estimate the gendered nature of parliamentary speeches, then examine how reactions to speeches given by male and female MPs differ. Female and male MPs receive similarly positive and negative reactions to their speeches on average, but they receive different reactions depending on the gender
Selected Publications — Political Science
(with Johann Kruemmel and Jonathan B. Slapin), American Journal of Political Science (2024)
› Abstract
Social scientists have become increasingly interested in how narratives — the stories in fiction, politics, and life — shape beliefs, behavior, and government policies. This paper provides an unsupervised method to quantify latent narrative structures in text documents. Our pipeline identifies coherent entity groups and maps explicit relations between them in the text. We provide an application to the United States Congressional Record to analyze political and economic narratives in recent decades. Our analysis highlights the dynamics, sentiment, polarization, and interconnectedness of narrati
relatio: open-source narrative mining package.
(with Massimo Morelli and Moritz Osnabruegge), Political Analysis (2021)
Selected Publications — AI/ML/NLP
(with Afra Amini, Tim Vieira, and Ryan Cotterell), ICLR (2025)
› Abstract
Best-of-N (BoN) is a popular and effective algorithm for aligning language models to human preferences. The algorithm works as follows: at inference time, N samples are drawn from the language model, and the sample with the highest reward, as judged by a reward model, is returned as the output. Despite its effectiveness, BoN is computationally expensive; it reduces sampling throughput by a factor of N. To make BoN more efficient at inference time, one strategy is to fine-tune the language model to mimic what BoN does during inference. To achieve this, we derive the distribution induced by the BoN algorithm. We then propose to fine-tune the language model to minimize backward KL divergence to the BoN distribution. Our approach is analogous to mean-field variational inference and, thus, we term it variational BoN (vBoN). To the extent this fine-tuning is successful and we end up with a good approximation, we have reduced the inference cost by a factor of N. Our experiments on controlled generation and summarization tasks show that BoN is the most effective alignment method, and our variational approximation to BoN achieves the closest performance to BoN and surpasses models fine-tuned using the standard KL-constrained RL objective. In the controlled generation task, vBoN appears more frequently on the Pareto frontier of reward and KL divergence compared to other alignment methods. In the summarization task, vBoN achieves high reward values across various sampling temperatures.
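The Best-of-N procedure described in the abstract above can be sketched in a few lines of Python. This is only an illustration of plain BoN sampling, not the vBoN fine-tuning method from the paper. The names `best_of_n`, `toy_sample`, and `toy_reward` are hypothetical stand-ins for a language model sampler and a reward model.

```python
import random
from typing import Callable, List


def best_of_n(sample: Callable[[], str],
              reward: Callable[[str], float],
              n: int) -> str:
    """Draw n candidates from `sample` and return the highest-reward one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # n separate draws: this is the factor-of-N inference cost of BoN.
    candidates: List[str] = [sample() for _ in range(n)]
    # The reward function plays the role of the reward model.
    return max(candidates, key=reward)


# Toy stand-ins (assumptions for illustration, not a real model or reward).
rng = random.Random(0)
VOCAB = ["good", "bad", "fine", "great", "poor"]


def toy_sample() -> str:
    """Pretend language model: emits four random words."""
    return " ".join(rng.choice(VOCAB) for _ in range(4))


def toy_reward(text: str) -> float:
    """Pretend reward model: positive words minus negative words."""
    words = text.split()
    positive = words.count("good") + words.count("great")
    negative = words.count("bad") + words.count("poor")
    return positive - negative


best = best_of_n(toy_sample, toy_reward, n=8)
print(best, toy_reward(best))
```

Each call makes `n` draws from the sampler, which is the throughput cost that the abstract says fine-tuning toward the BoN distribution aims to remove.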
› Abstract
Large language models such as ChatGPT exhibit striking political biases. If users query them about political information, they often take a normative stance. To overcome this, we align LLMs with diverse political viewpoints from 100,000 comments written by candidates running for national parliament in Switzerland. Models aligned with this data can generate more accurate political viewpoints from Swiss parties, compared to commercial models such as ChatGPT. We also propose a procedure to generate balanced overviews summarizing multiple viewpoints using such models. The replication package conta
(with Dominik Stammbach, Philine Widmer, Eunjung Cho, and Caglar Gulcehre), EMNLP (2024)
› Abstract
We present the Legal Passage Retrieval Dataset, LePaRD. LePaRD contains millions of examples of U.S. federal judges citing precedent in context. The dataset aims to facilitate work on legal passage retrieval, a challenging practice-oriented legal retrieval and reasoning task. Legal passage retrieval seeks to predict relevant passages from precedential court decisions given the context of a legal argument. We extensively evaluate various approaches on LePaRD, and find that classification-based retrieval appears to work best. Our best models only achieve a recall of 59% when trained on data corr
(with Robert Mahari, Dominik Stammbach, and Alex Pentland), ACL (2024)
› Abstract
Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict storytelling at the document and span levels. Our dataset is sampled from hundreds of popular English-language Reddit communities ranging across 33 topic categories, and it contains fine-grained expert an
(with Maria Antoniak, Joel Mire, Andrew Piper, and Maarten Sap), ACL (2024)
(with Jingwei Ni, Minjing Shi, Dominik Stammbach, Mrinmaya Sachan, and Markus Leippold), ACL (2024)
› Abstract
Topic models are used to make sense of large text collections. However, automatically evaluating topic model output and determining the optimal number of topics both have been longstanding challenges, with no effective automated solutions to date. This paper proposes using large language models to evaluate such output. We find that large language models appropriately assess the resulting topics, correlating more strongly with human judgments than existing automated metrics. We then investigate whether we can use large language models to automatically determine the optimal number of topics. We
(with Dominik Stammbach, Vilém Zouhar, Alexander Hoyle, and Mrinmaya Sachan), EMNLP (2023)
› Abstract
Text classifiers have promising applications in high-stake tasks such as resume screening and content moderation. These classifiers must be fair and avoid discriminatory decisions by being invariant to perturbations of sensitive attributes such as gender or ethnicity. However, there is a gap between human intuition about these perturbations and the formal similarity specifications capturing them. While existing research has started to address this gap, current methods are based on hardcoded word replacements, resulting in specifications with limited expressivity or ones that fail to fully alig
(with Florian Dorner, Momchil Peychev, Nikola Konstantinov, Naman Goel, and Martin Vechev), ICLR (2023)
(with Nianlong Gu and Richard Hahnloser), ACL (2022)
› Abstract
Trademark law protects marks in order to enable firms to signal their products’ qualities to consumers. To qualify for protection, a mark must be able to identify and distinguish goods. U.S. courts typically locate a mark on a “spectrum of distinctiveness” – known as the Abercrombie spectrum – that categorizes marks as fanciful, arbitrary, or suggestive, and thus as “inherently distinctive,” or as descriptive or generic, and thus as not inherently distinctive. This paper explores whether locating trademarks on the Abercrombie spectrum can be automated using current natural-language processing
Selected Publications — Law
(with Shivam Adarsh, Stefan Bechtold, Barton Beebe and Jeanne Fromer), Journal of Empirical Legal Studies (forthcoming)
(with Christoph Goessmann and Suresh Naidu), Philosophical Transactions of the Royal Society A (2024)
› Abstract
Law sets out the rules for society and the economy, particularly important for interactions between strangers. Legal code is a form of non-rival infrastructure, a public good important for investment and innovation. This paper investigates whether legal code complexity scales with population size in US localities. We analyse a corpus of municipal codes from 3259 cities and measure legal complexity using various metrics, including number of words, bytes, and compressed bytes. We find that legal complexity scales geometrically with jurisdiction population, with a scaling parameter of approximate
(with Aniket Kesari, Suresh Naidu, Lena Song, and Dominik Stammbach), ACM Symposium on Computer Science and Law (2024)
› Abstract
A widely held view for why the Supreme Court would be right to revive the nondelegation doctrine is that Congress has perverse incentives to abdicate its legislative role and evade accountability through the use of delegations, either expressly delineated or implied through statutory imprecision, and that enforcement of the nondelegation doctrine would correct for those incentives. We call this the Field of Dreams Theory—if we build the nondelegation doctrine, Congress will legislate. Unlike originalist arguments for the revival of the nondelegation doctrine, this theory has widespread appeal
(with Daniel Walters), Cornell Law Review (2023)
› Abstract
Judicial opinions are written to be persuasive and could build public trust in court decisions, yet they can be difficult for non-experts to understand. We present a pipeline for using an AI assistant to generate simplified summaries of judicial opinions. Compared to existing expert-written summaries, these AI-generated simple summaries are more accessible to the public and more easily understood by non-experts. We show in a survey experiment that the AI summaries help respondents understand the key features of a ruling, and have higher perceived quality, especially for respondents with less f
(with Jeffrey A. Fagan), Georgetown Law Journal Online (2017)
More Working Papers
(with Ruben Durante, Maria Grebenshchikova, and Carlo Schwarz), Reject & Resubmit at Economic Journal
› Abstract
Issues of racial justice and economic inequalities between racial and ethnic groups have risen to the top of public debate. Economists' ability to contribute to these debates is based on the body of race-related research. We study the volume and content of race-related research in economics. We base our analysis on a corpus of 225,000 economics publications from 1960 to 2020 to which we apply an algorithmic approach to classify race-related work. We present three new facts. First, since 1960 less than 2% of economics publications have been race-related. There is an uptick in such work in the m
› Abstract
This paper studies racial in-group disparities in Wisconsin, which has one of the highest Black-to-White incarceration rates among all U.S. states. The analysis is motivated by a model in which a judge may want to incarcerate more due to three factors: (1) taste-based preferences about the defendant's group identity; (2) higher recidivism risk where the defendant is more likely to commit future crimes; and (3) image motives stemming from the defendant being in the same group as the judge. Further, a judge may have better information on recidivism risk due to two factors: (4) becoming more expe
(with Claudia Marangon)
(with David Cai, Mirko Draca, and Shaoyu Liu)
› Abstract
We distinguish between ideational and interest-based appeals to voters on the supply side of politics, integrating the Keynes-Hayek perspective on the importance of ideas with the Stigler-Becker approach emphasizing vested interests. In our model, political entrepreneurs discover identity and worldview “memes” (narratives, cues, frames) that invoke voters' identity concerns or shift their views of how the world works. We identify a complementarity between worldview politics and identity politics and illustrate how they may reinforce each other. Furthermore, we show how adverse economic shocks
(with Sharun Mukand and Dani Rodrik)
› Abstract
Empirical work on political communication has so far left out a potentially pivotal dimension – the unspoken emotional responses indicated by facial expressions. This paper shows how to measure these responses using deep-learning-based computer-vision algorithms in the context of U.S. cable news video. Using machine-generated metrics for expressed emotion, combined with mentions of politically divisive entities, we estimate the difference in emotion when cable-channel speakers hear mentions of entities that are from the opposing side of the political divide. We find that the most responsive em
(with Cantay Caliskan)
› Abstract
This paper presents novel causal evidence on the relationship between various communication channels employed by central banks and households' expectations about future inflation. In a pre-registered randomized survey experiment administered in 2022, we examine adjustment of inflation expectations when confronted with a press conference statement by the president of the European Central Bank (ECB) articulating the bank's commitment to a 2% inflation target. First, we replicate previous literature showing that respondents update toward the inflation target. Second, we show that the medium of co
Peer-Reviewed Journal Articles
(with Heiner Mikosch, Alexis Perakis, and Samad Sarferaz), European Economic Review (2026)
(with Sergio Galletta, Matteo Pinna, and Christopher Warshaw), Journal of Public Economics (2024)
› Abstract
An important philosophical tradition identifies persons as those entities that have minds, such that mind perception is a window into person perception. Psychological research has found that human perceptions of mind consist of at least two distinct dimensions: agency (e.g. planning, deciding) and experience (e.g. feeling, hungering). Taking this insight into the semantic space of natural language, we develop a generalizable, scalable computational-linguistics method for measuring variation in perceived agency and experience in large archives of plain-text documents. The resulting text-based r
(with Sergio Galletta, Dominik Hangartner, Yotam Margalit, and Matteo Pinna), Political Analysis (2023)
› Abstract
Socialist courts are supposed to apply the law, not make it, and socialist legality denies judicial decisions any precedential status. In 2011, however, the Chinese Supreme People’s Court designated selected decisions as Guiding Cases to be referred to by all judges when adjudicating similar disputes. One decade on, the paucity of citations to Guiding Cases has been taken as demonstrating the incongruity of case-based adjudication and socialist legality.
› Abstract
Should technocratic public officials be selected through politics or by merit? This paper explores how selection procedures influence the quality of selected officials in the context of U.S. state supreme courts for the years 1947-1994. In a unique set of natural experiments, state governments enacted a variety of reforms making judicial elections less partisan and establishing merit-based procedures that delegate selection to experts. We compare post-reform judges to pre-reform judges in their work quality, measured by forward citations to their opinions. In this setting we can hold constant
› Abstract
Bureaucratic discretion and executive delegation are central topics in political economy and political science. The previous empirical literature has measured discretion and delegation by manually coding large bodies of legislation. Drawing from computational linguistics, we provide an automated procedure for measuring discretion and delegation in legal texts to facilitate large-scale empirical analysis. The method uses information in syntactic parse trees to identify legally relevant provisions, as well as agents and delegated actions. We undertake two applications. First, we produce a measur
› Abstract
This paper provides a theoretical and empirical analysis of the intrinsic preferences of state appellate court judges. We construct a panel data set using published decisions from state supreme court cases merged with institutional and biographical information on all (1,636) state supreme court judges for the 50 states of the United States from 1947 to 1994. We estimate the effects of changes in judge employment conditions on a number of measures of judicial performance. The results are consistent with the hypothesis that judges are intrinsically motivated to provide high-quality decisions, an
Mindfulness meditation has been found to influence various important outcomes such as health, stress, de
This paper provides a general method for analyzing the sentiments ex
› Abstract
This paper presents rankings of U.S. public policy schools based on their research publication output. In 2016 we collected the names of about 5,000 faculty members at 44 such schools. We use bibliographic databases to gather measures of the quality and quantity of these individuals' academic publications. These measures include the number of articles and books written, the quality of the journals the articles have appeared in, and the number of citations all have garnered. We aggregate these data to the school level to produce a set of rankings. The results differ significantly from existing
› Abstract
In sequential decision-making experiments, participants often conform to the decisions of others rather than reveal private information — resulting in less information produced and potentially lower payoffs for the group. This paper asks whether experimentally induced group identity affects players' decisions to conform, even when payoffs are only a function of individual actions. As motivation for the experiment, we show that U.S. Supreme Court Justices in preliminary hearings are more likely to conform to their same-party predecessors when the share of predecessors from their party is high.
Peer-Reviewed Conference Proceedings
(with Emmanuel Bauer, Dominik Stammbach, and Nianlong Gu), LIRAI (2023)
› Abstract
This paper tackles the task of legal extractive summarization using a dataset of 430K U.S. court opinions with key passages annotated. According to automated summary quality metrics, the reinforcement-learning-based MemSum model is best and even outperforms transformer-based models. In turn, expert human evaluation shows that MemSum summaries effectively capture the key points of lengthy court opinions. Motivated by these results, we open-source our models to the general public. This represents progress towards democratizing law and making U.S. court opinions more accessible to the general pu
› Abstract
This paper describes an unsupervised legal document parser which performs a decomposition of labor union contracts into discrete assignments of rights and duties among agents of interest. We use insights from deontic logic applied to modal categories and other linguistic patterns to generate topic-specific measures of relative legal authority. We illustrate the consistency and efficiency of the pipeline by applying it to a large corpus of 35K contracts and validating the resulting outputs.
(with Yan Liu, Yan Gao, Zhe Su, Xiaokang Chen, Jian-Guang Lou), ACL (2023)
› Abstract
Notwithstanding the widely held view that data generation and data curation processes are prominent sources of bias in machine learning algorithms, there is little empirical research seeking to document and understand the specific data dimensions affecting algorithmic unfairness. Contra the previous work, which has focused on modeling using simple, small-scale benchmark datasets, we hold the model constant and methodically intervene on relevant dimensions of a much larger, more diverse dataset. For this purpose, we introduce a new dataset on recidivism in 1.5 million criminal cases from courts
(with Dominik Stammbach and Maria Antoniak), Workshop on Narrative Understanding (2022)
Source code and replication package. (with Dominik Stammbach), TTO (2020)
› Abstract
How does economics compare to other social sciences in its study of issues related to race and ethnicity? We assess this using a corpus of 500,000 academic publications in economics, political science, and sociology. Using an algorithmic approach to classify race-related publications, we document that economics lags far behind the other disciplines in the volume and share of race-related research, despite having higher absolute volumes of research output. Since 1960, there have been 13,000 race-related publications in sociology, 4,000 in political science, and 3,000 in economics. Since around
Other Publications (Not Peer-Reviewed)
(with Arun Advani, David Cai, and Imran Rasul), Econometric Society Monograph Series (forthcoming)
› Abstract
We offer the first attempt at empirically testing the level of transnational consensus on the legal language controlling international tax matters. We also investigate the institutional framework of such consensus-building. We build a dataset of 4,052 bilateral income tax treaties, as well as 16 model tax treaties published by the United Nations (UN), Organisation for Economic Co-operation and Development (OECD) and the United States. We use natural language processing to perform pair-wise comparison of all treaties in effect at any given year. We identify clear trends of convergence of legal
(with Daniel L. Chen), in: Law as Data, Santa Fe Institute
