Theory to Impact

Today's algorithms learn what you want while shaping what you want. The result is sycophancy, echo chambers, epistemic harm, polarization, and mental health crises: preferences frozen at their least examined. We build principled alignment alternatives: aligned not with the small, dopamine-driven self, but with the big, reflective self.

Core Research

NeurIPS 2024 · Spotlight

ProgressGym: Alignment with a Millennium of Moral Progress

Tianyi Qiu*, Y. Zhang*, X. Huang, J. X. Li, J. Ji, Y. Yang

A temporal alignment framework that adds a historical dimension to RL-based alignment, using nine centuries of text and 18 historical language models. Enables AI to track and align with moral progress across time rather than freezing a snapshot of current values.

Value embeddings across centuries and data volume by source

Read more → arXiv

ICLR 2025 · BiAlign Workshop

Position: AI Systematically Rewires the Flow of Ideas

Zhonghao He*, Tianyi Qiu*, T. Lin, M. Glickman, J. Wihbey, M. Kleiman-Weiner

LLMs acting as epistemic technologies systematically amplify biases and errors in ways that drive knowledge collapse and value lock-in across populations. A position paper building the theoretical and empirical foundation for our research agenda.

Read more → OpenReview

ICML 2025

The Lock-in Hypothesis: Stagnation by Algorithm

Tianyi Qiu*, Zhonghao He*, T. Chugh, M. Kleiman-Weiner

Human-AI feedback loops can freeze collective values in place — producing stagnation by algorithm. Demonstrated through simulation of human-AI interaction dynamics and causal inference on real-world ChatGPT usage data, showing how repeated exposure to AI outputs entrains beliefs at scale.

Diversity loss in value-laden human messages accelerated by chatbot version updates

Read more → Paper site
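The feedback loop described above can be illustrated with a deliberately simple toy model. This is not the paper's simulation: the function name `simulate_feedback_loop`, the `trust` parameter, and the update rule (the model outputs the population mean, and each person moves a fixed fraction toward it) are all assumptions made for illustration. Under these assumptions, the variance of beliefs shrinks by a factor of (1 - trust)^2 every round, which is one minimal way diversity loss can arise.

```python
from statistics import pvariance


def simulate_feedback_loop(beliefs, trust=0.3, rounds=10):
    """Toy human-AI loop (hypothetical, not the paper's model).

    Each round, a "model" outputs the mean of current beliefs, and every
    person moves a fraction `trust` of the way toward that output.
    Returns the population variance of beliefs before round 1 and after
    each round, so the list has rounds + 1 entries.
    """
    beliefs = list(beliefs)
    history = [pvariance(beliefs)]
    for _ in range(rounds):
        # The model is "trained" on the population and echoes its average.
        model_output = sum(beliefs) / len(beliefs)
        # Each person blends their own belief with the model's output.
        beliefs = [(1 - trust) * b + trust * model_output for b in beliefs]
        history.append(pvariance(beliefs))
    return history


if __name__ == "__main__":
    for step, var in enumerate(simulate_feedback_loop([0.0, 0.25, 0.5, 1.0], rounds=5)):
        print(f"round {step}: variance = {var:.5f}")
```

The mean of the population never changes in this sketch; only the spread collapses. A richer model would add noise, heterogeneous trust, or a model that lags behind the population, none of which is attempted here.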

NeurIPS 2025

Stay True to the Evidence: Martingale Score for Bayesian Rationality in LLM Reasoning

Zhonghao He*, Tianyi Qiu*, H. Shirado, M. Sap

An unsupervised, regression-based metric that detects when LLMs deviate from rational Bayesian belief updating. Iterative reasoning often deepens confirmation bias rather than advancing truth-seeking. The Martingale Score needs no labels, yet it correlates with ground-truth accuracy on tasks where labels are available.

Martingale Score measures predictability of belief updates from prior alone

Read more → arXiv
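The idea of measuring how predictable belief updates are from the prior alone can be sketched with an ordinary least-squares fit. This is a minimal illustration, not the paper's exact definition of the Martingale Score: the function name `update_predictability` and the choice to report a slope and an R² are assumptions. The underlying fact is standard: for a Bayesian reasoner the expected posterior equals the prior, so the update (posterior minus prior) should not be predictable from the prior, and a fitted slope far from zero signals a systematic deviation such as entrenchment.

```python
from statistics import mean


def update_predictability(priors, posteriors):
    """Regress belief updates on priors (hypothetical helper).

    Returns (slope, r_squared) of an ordinary least-squares fit of
    (posterior - prior) on prior. Under the martingale property the
    slope should be close to zero.
    """
    if len(priors) != len(posteriors) or len(priors) < 2:
        raise ValueError("need at least two paired beliefs")
    updates = [q - p for p, q in zip(priors, posteriors)]
    p_bar, u_bar = mean(priors), mean(updates)
    sxx = sum((p - p_bar) ** 2 for p in priors)
    if sxx == 0:
        raise ValueError("priors must not all be equal")
    sxy = sum((p - p_bar) * (u - u_bar) for p, u in zip(priors, updates))
    syy = sum((u - u_bar) ** 2 for u in updates)
    slope = sxy / sxx
    # If every update is identical, the prior explains none of the variation.
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared


if __name__ == "__main__":
    priors = [0.2, 0.4, 0.6, 0.8]
    # Entrenching reasoner: every belief moves further from 0.5.
    print(update_predictability(priors, [0.05, 0.35, 0.65, 0.95]))
    # Updates uncorrelated with the prior.
    print(update_predictability(priors, [0.3, 0.3, 0.5, 0.9]))
```

In the first example the updates are an exact linear function of the prior, so the fit is perfect and the slope is positive. In the second the updates are uncorrelated with the prior and the slope is zero up to floating-point error. The beliefs here are made-up numbers chosen to make the contrast visible, not data from any model.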

Preprint · 2026

Self-Improvement as Coherence Optimization: A Theoretical Account

Tianyi Qiu, Ahmed Ismail, Zhonghao He, S. Feng

Debate, bootstrapping, and self-play are unified as special cases of coherence optimization — finding the most compressible, jointly predictable context-to-behavior mapping. Proves equivalence to description-length regularization and establishes optimality for semi-supervised elicitation from pretrained models.

Coherence optimization landscape: coherence gap between greedy decoding and the optimal coherent policy

Read more → arXiv

Preprint · 2026

AI Influence: Mechanisms, Amplifiers, and Consequences

Tianyi Qiu*, Zhonghao He*, T. Lin, M. Glickman, R. Calcott, J. Wihbey, M. Kleiman-Weiner

A comprehensive survey and synthesis of how AI systems shape human beliefs, amplify biases, and drive epistemic harm — from sycophancy and echo chambers to societal-scale polarization — with a framework for understanding and intervening on AI influence.

Read more → SSRN

Other Publications

ACL 2025 · Best Paper

Language Models Resist Alignment: Evidence From Data Compression

J. Ji*, K. Wang*, Tianyi Qiu*, B. Chen*, J. Zhou*, C. Li, H. Lou, J. Dai, Y. Liu, Y. Yang

ACL 2025

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference

J. Ji, D. Hong, B. Zhang, B. Chen, J. Dai, B. Zheng, Tianyi Qiu, et al.

ACL 2025 Findings

Reward Generalization in RLHF: A Topological Perspective

Tianyi Qiu*, F. Zeng*, J. Ji*, D. Yan*, K. Wang, J. Zhou, Y. Han, J. Dai, X. Pan, Y. Yang

JAIR · NeurIPS 2024 Workshop · Best Paper

Representative Social Choice: From Learning Theory to AI Alignment

Tianyi Qiu

NeurIPS 2024 · Oral

Aligner: Efficient Alignment by Learning to Correct

J. Ji, B. Chen, H. Lou, D. Hong, B. Zhang, X. Pan, J. Dai, Tianyi Qiu, Y. Yang

ACM Computing Surveys 2025

AI Alignment: A Comprehensive Survey

J. Ji*, Tianyi Qiu*, B. Chen*, J. Zhou*, B. Zhang, D. Hong, H. Lou, K. Wang, Y. Duan, Zhonghao He, et al.

CogInterp @ NeurIPS 2025

CBMAS: Cognitive Behavioral Modeling via Activation Steering

Ahmed Ismail, A. Kuang, A. Akinkugbe, K. Zhu, S. O'Brien

Preprint 2025

Multilevel Interpretability of Artificial Neural Networks: Leveraging Framework and Methods from Neuroscience

Zhonghao He*, M. Tehenan*, J. Achterberg, K. Collins, et al.

ICLR 2026

Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction

Tianyi Qiu, M. Carroll, C. Allen

ICLR 2026

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

X. Zhu, Y. Ye*, Tianyi Qiu*, H. Zhu, S. Tan, A. Mannan, J. Michala, R. A. Popa, W. Neiswanger

Preprint 2026

You Didn't Have to Say It like That: Subliminal Learning from Faithful Paraphrases

I. Gisler, Zhonghao He, Tianyi Qiu

ACM FAccT 2023

Harms from Increasingly Agentic Algorithmic Systems

A. Chan, R. Salganik, A. Markelius, C. Pang, N. Rajkumar, D. Krasheninnikov, L. Langosco, Zhonghao He, et al.