Influential Papers

A curated collection of academic papers and articles that influence my research and thinking across various domains.

Toward cultural interpretability: A linguistic anthropological framework for describing and evaluating large language models
Positions and Visions

Graham M Jones, Shai Satran, Arvind Satyanarayan

Advocates for understanding LLM behavior as indicative of nuances in human social behavior.

Can Large Language Models Transform Computational Social Science?
AI Tools for Human Knowledge

Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang

Nice overview of the landscape of AI for computational social science.

Google and TikTok rank bundles of information; ChatGPT ranks grains.
Positions and Visions

Nick Vincent

Interesting analysis arguing that Google, TikTok, and ChatGPT all rank information, but the former two rank more bundled information than the latter. Bundled information has clearer economic, social, and institutional properties, such as notions of originality and labor. Many 'fixes' to AI involve bundling information. Information-bundling and -splitting is therefore relevant for thinking about the economics of AI.

Deep Learning is Not So Mysterious or Different
Representation Learning

Andrew Gordon Wilson

Super interesting and illuminating perspective explaining why supposedly deep-learning-unique phenomena like deep double descent, overparametrization, etc. can be explained using soft inductive biases and existing generalization frameworks. The references are a treasure trove!

Code Shaping: Iterative Code Editing with Free-form AI-Interpreted Sketching
Human-AI Interaction

Ryan Yen, Jian Zhao, Daniel Vogel

Cool, classic HCI-style exploration of a new interaction technique for code editing using free-form visual sketching, interpreted by VLMs.

Discovering Latent Knowledge in Language Models Without Supervision
Interpretability

Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt

A method to probe structure in language models without any notion of ground truth, relying instead on the consistency property of true statements.
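
The consistency trick is compact enough to sketch. Below is a minimal PyTorch version of the loss as I read it: a linear probe's truth probability for a statement and its negation should sum to one, plus a confidence term ruling out the degenerate always-0.5 solution. The hidden states are random stand-ins, and the paper's normalization and multi-seed training are omitted.

```python
# Minimal sketch of consistency-based probing; stand-in data, not the paper's pipeline.
import torch

def ccs_loss(probe, h_pos, h_neg):
    """h_pos/h_neg: hidden states for a statement and its negation."""
    p_pos = torch.sigmoid(probe(h_pos)).squeeze(-1)
    p_neg = torch.sigmoid(probe(h_neg)).squeeze(-1)
    consistency = (p_pos - (1.0 - p_neg)) ** 2     # P(x) + P(not x) should be 1
    confidence = torch.minimum(p_pos, p_neg) ** 2  # discourage p = 0.5 everywhere
    return (consistency + confidence).mean()

d = 512                                            # hidden-state dimension (assumed)
probe = torch.nn.Linear(d, 1)                      # linear probe, as in the paper
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
h_pos, h_neg = torch.randn(256, d), torch.randn(256, d)  # stand-in hidden states
for _ in range(100):
    opt.zero_grad()
    ccs_loss(probe, h_pos, h_neg).backward()
    opt.step()
```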

Scaling and evaluating sparse autoencoders
Interpretability

Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, et al.

Really nice technical details and practical knowledge on training and understanding large sparse autoencoders.
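
For reference, here is a minimal sketch of a TopK sparse autoencoder of the kind the paper scales. Dimensions are placeholders, and the paper's extra machinery (initialization choices, dead-latent handling, auxiliary losses) is omitted.

```python
# Minimal TopK sparse autoencoder sketch; placeholder sizes, no training extras.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model=768, n_latents=16384, k=32):
        super().__init__()
        self.enc = nn.Linear(d_model, n_latents)
        self.dec = nn.Linear(n_latents, d_model)
        self.k = k

    def forward(self, x):
        pre = self.enc(x - self.dec.bias)        # encode relative to decoder bias
        top = torch.topk(pre, self.k, dim=-1)    # keep only the k largest latents
        z = torch.zeros_like(pre).scatter_(-1, top.indices, torch.relu(top.values))
        return self.dec(z), z                    # reconstruction and sparse code

sae = TopKSAE()
x = torch.randn(64, 768)                         # stand-in model activations
recon, z = sae(x)
loss = ((recon - x) ** 2).mean()                 # plain reconstruction loss
```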

Sparse Autoencoders for Hypothesis Generation
AI Tools for Human Knowledge

Rajiv Movva, Kenny Peng, Nikhil Garg, Jon Kleinberg, Emma Pierson

This paper uses sparse autoencoder features to identify possible hypotheses to explain relationships between text and a dependent variable.
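
The workflow invites a short sketch: regress the outcome on per-document SAE feature activations, then read the most predictive features as candidate hypotheses. Everything below is a random stand-in; the paper's actual pipeline differs in detail.

```python
# Hedged sketch: lasso regression of an outcome on SAE features, then inspect
# the most predictive features as hypothesis candidates. Data are stand-ins.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
feature_acts = rng.random((1000, 512))   # per-document SAE feature activations
outcome = rng.random(1000)               # dependent variable (e.g., engagement)

model = LassoCV(cv=5).fit(feature_acts, outcome)
top = np.argsort(-np.abs(model.coef_))[:10]
print("candidate hypothesis features:", top)
```

Each surviving feature can then be interpreted (say, via its top-activating documents) and read as a candidate hypothesis about the text-outcome relationship.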

Unsupervised Elicitation of Language Models
Open-ended Modeling

Jiaxin Wen, Zachary Ankner, Arushi Somani, Peter Hase, Samuel Marks, et al.

Interesting way to automatically label datasets using mutual predictability and logical consistency.
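
A loose, toy sketch of the shape of the idea: iteratively relabel examples so that each label is the one the model predicts best from the other labeled pairs, rejecting relabelings that break logical consistency. Both `score_label` and `consistent` are hypothetical stand-ins here, and the paper's actual search procedure is more involved.

```python
# Toy sketch only: mutual predictability + consistency as a greedy relabeling loop.
import random

def score_label(example, label, labeled_others):
    # Stand-in: the real method uses the LM's probability of `label` for
    # `example` given the other (example, label) pairs in context.
    return random.random()

def consistent(assignments):
    # Stand-in for task-specific logical checks, e.g. two contradictory
    # claims must not both be labeled "yes".
    return True

def elicit_labels(examples, labels=("yes", "no"), steps=200):
    assignments = {x: random.choice(labels) for x in examples}
    for _ in range(steps):
        x = random.choice(examples)
        others = {k: v for k, v in assignments.items() if k != x}
        best = max(labels, key=lambda l: score_label(x, l, others))
        proposal = {**assignments, x: best}
        if consistent(proposal):          # keep only consistent relabelings
            assignments = proposal
    return assignments

print(elicit_labels([f"claim {i}" for i in range(5)]))
```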

Why Isn't There More Progress in Philosophy?
Philosophy

David J. Chalmers

Asks an interesting question and gives some helpful directions to start thinking about what progress means in philosophy and how 'premise deniability' might be a relevant factor.

Self-reports are better measurement instruments than implicit measures
Miscellaneous

Olivier Corneille, Bertram Gawronski

Challenges the assumption that implicit measures are superior to self-reports in psychological research. Argues that self-reports demonstrate greater reliability, stronger predictive validity for both deliberate and spontaneous behaviors, and unmatched flexibility in exploring complex psychological constructs.

Cognitive Behaviors that Enable Self-Improving Reasoners
Representation Learning

Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, Noah D. Goodman

Identifies four cognitive behaviors (verification, backtracking, subgoal setting, backward chaining) that predict whether a model can self-improve via RL. The key finding is striking: it's the presence of reasoning behaviors, not answer correctness, that matters. Models exposed to training data with proper reasoning patterns -- even incorrect answers -- matched the improvement of models that had these behaviors naturally. A useful framing for thinking about what 'reasoning' actually is in these systems.

Backpack Language Models
Concept-structured AI

John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang

By creating an LM architecture in which input tokens have a direct log-linear effect on the output, we can intervene precisely on the model output.
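
A minimal sketch of that mechanism: the output at each position is a non-negative weighted sum of per-token "sense vectors", so every token's contribution to the logits is linear and can be edited directly. Sizes are placeholders, and the contextual weights (computed by a transformer in the real model) are random stand-ins.

```python
# Sketch of the backpack forward pass; random weights stand in for the
# transformer-computed contextualization.
import torch
import torch.nn as nn

V, d, k, T = 1000, 64, 4, 16                  # vocab, dim, senses/token, seq len
senses = nn.Parameter(torch.randn(V, k, d))   # k sense vectors per vocab item
E = nn.Parameter(torch.randn(V, d))           # output embedding

tokens = torch.randint(0, V, (T,))
alpha = torch.softmax(torch.randn(T, T, k), dim=1)    # non-negative weights (stand-in)

sense_vecs = senses[tokens]                            # (T, k, d)
out = torch.einsum('tjk,jkd->td', alpha, sense_vecs)   # weighted sum of senses
logits = out @ E.T                                     # log-linear in each sense vector
# Editing senses[v] changes the logits linearly and predictably wherever token v appears.
```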

AI and the Demise of College Writing
Positions and Visions

Adam Walker

Advocates for rhetoric over composition as the methodology for writing pedagogy in the AI era.

Proofs and Refutations: The Logic of Mathematical Discovery
Philosophy and History of Math

Imre Lakatos

Uses an extremely entertaining and well-focused example, the Euler characteristic 'V - E + F = 2' of polyhedra, to illustrate how mathematics develops dialectically between criticism (proof-analysis, refutation, counterexamples) and development (proof, lemma-incorporation, etc.). Written as a Socratic conversation among a classroom of incredibly bright students.
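
For the uninitiated, the formula is a one-liner to check, and the book's drama comes from hunting down the solids that break it (a picture-frame-shaped solid, for instance, gives V - E + F = 0):

```python
# Quick sanity check of Euler's formula V - E + F = 2 on familiar polyhedra.
polyhedra = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "icosahedron": (12, 30, 20),
}
for name, (v, e, f) in polyhedra.items():
    print(f"{name}: V - E + F = {v - e + f}")   # prints 2 for each
```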

Large Concept Models: Language Modeling in a Sentence Representation Space
Concept-structured AI

Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, et al.

Language modeling that operates in sentence-embedding space rather than token space.
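
A minimal sketch of the setup under simplifying assumptions: random stand-in sentence embeddings and a plain MSE next-embedding objective, where the paper uses frozen SONAR embeddings and also explores diffusion-style objectives.

```python
# Sketch: autoregressively predict the next *sentence embedding*, not the next token.
import torch
import torch.nn as nn

d = 256                                    # sentence-embedding dimension (assumed)
layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
lcm = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d, d)

sent_embs = torch.randn(8, 10, d)          # a batch of 10-sentence documents
prefix = sent_embs[:, :-1]
mask = nn.Transformer.generate_square_subsequent_mask(prefix.size(1))
pred = head(lcm(prefix, mask=mask))        # each position predicts the next sentence
loss = ((pred - sent_embs[:, 1:]) ** 2).mean()
```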

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Interpretability

Trenton Bricken*, Adly Templeton*, Joshua Batson*, Brian Chen*, Adam Jermyn*, et al.

Really incredible work on discovering and visualizing feature decompositions of neuron layers with sparse autoencoders. Gorgeous visualizations and interfaces, and thoughtful reflections on interpretability methodology. I am a big fan of this publication style.

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
Human-AI Interaction

Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, et al.

A recognition that not only must AI 'align to' humans (values, behavior, knowledge, etc., whatever that means), but we also need to think about how humans 'align' to AI by working within AI-structured systems. The paper recognizes the social 'looping effects' brought about by AI and its behavior.
