Theory to Impact

Today's algorithms learn what you want while shaping what you want. The result is sycophancy, echo chambers, epistemic harm, polarization, and mental health crises: preferences frozen at their least-examined. We build principled alignment alternatives, aligned not with the small, dopamine self but with the big, reflective self.

Core Research
NeurIPS 2024 · Spotlight
Tianyi Qiu*, Y. Zhang*, X. Huang, J. X. Li, J. Ji, Y. Yang

A temporal alignment framework that adds a historical dimension to RL-based alignment, using nine centuries of text and 18 historical language models. Enables AI to track and align with moral progress across time rather than freezing a snapshot of current values.

Value embeddings across centuries and data volume by source
ICLR 2025 · BiAlign Workshop
Zhonghao He*, Tianyi Qiu*, T. Lin, M. Glickman, J. Wihbey, M. Kleiman-Weiner

LLMs acting as epistemic technologies systematically amplify biases and errors in ways that drive knowledge collapse and value lock-in across populations. A position paper building the theoretical and empirical foundation for our research agenda.

ICML 2025
Tianyi Qiu*, Zhonghao He*, T. Chugh, M. Kleiman-Weiner

Human-AI feedback loops can freeze collective values in place — producing stagnation by algorithm. Demonstrated through simulation of human-AI interaction dynamics and causal inference on real-world ChatGPT usage data, showing how repeated exposure to AI outputs entrains beliefs at scale.

Diversity loss in value-laden human messages accelerated by chatbot version updates
NeurIPS 2025
Zhonghao He*, Tianyi Qiu*, H. Shirado, M. Sap

An unsupervised, regression-based metric that detects when LLMs deviate from rational Bayesian belief updating. Iterative reasoning often deepens confirmation bias rather than advancing truth-seeking. The Martingale Score correlates with ground-truth accuracy where labels are available, without requiring them.

Martingale Score measures predictability of belief updates from prior alone
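The core intuition can be sketched in a few lines. Under rational (martingale) updating, the expected belief update given the current belief is zero, so the prior should carry no linear predictive power over the update. The toy below regresses updates on priors and reports an R²; the function name, data, and estimator here are our own illustration, not the paper's implementation.

```python
import numpy as np

def martingale_score(priors, posteriors):
    """Toy illustration: how predictable are belief updates from the prior alone?

    For a martingale, E[posterior - prior | prior] = 0, so an OLS
    regression of updates on priors should explain nothing. A large
    R^2 signals systematic, prior-driven drift such as confirmation
    bias. This is a sketch of the idea only, not the paper's metric.
    """
    priors = np.asarray(priors, dtype=float)
    updates = np.asarray(posteriors, dtype=float) - priors
    # Ordinary least squares of update on prior (with intercept).
    X = np.column_stack([np.ones_like(priors), priors])
    coef, *_ = np.linalg.lstsq(X, updates, rcond=None)
    pred = X @ coef
    ss_res = np.sum((updates - pred) ** 2)
    ss_tot = np.sum((updates - updates.mean()) ** 2)
    return 1.0 - ss_res / ss_tot  # R^2: predictability of updates

rng = np.random.default_rng(0)
p = rng.uniform(0.1, 0.9, 500)
fair = np.clip(p + rng.normal(0, 0.05, 500), 0, 1)   # noise only: updates unpredictable
biased = np.clip(p + 0.4 * (p - 0.5), 0, 1)          # drifts toward the extremes
print(martingale_score(p, fair))    # near 0
print(martingale_score(p, biased))  # near 1
```

Because the score only asks whether updates are predictable from the prior, it needs no ground-truth labels, which is what makes the unsupervised framing possible.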
Preprint · 2026
Tianyi Qiu, Ahmed Ismail, Zhonghao He, S. Feng

Debate, bootstrapping, and self-play are unified as special cases of coherence optimization — finding the most compressible, jointly predictable context-to-behavior mapping. Proves equivalence to description-length regularization and establishes optimality for semi-supervised elicitation from pretrained models.

Coherence optimization landscape: coherence gap between greedy decoding and the optimal coherent policy
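One schematic way to read the coherence-as-compression claim, with all notation ours rather than the paper's: treat elicitation as a two-part code over a policy $\pi$ and observed context-behavior pairs $(x_i, y_i)$,

```latex
\pi^{\star} \;=\; \arg\min_{\pi}\;
  \underbrace{L(\pi)}_{\text{description length of the mapping}}
  \;+\;
  \sum_{i} \underbrace{-\log p_{\pi}(y_i \mid x_i)}_{\text{codelength of behavior given context}}
```

so the most coherent policy is the one that jointly compresses behavior across contexts. On this reading, the coherence gap in the figure is the objective difference between the greedy-decoding policy and $\pi^{\star}$.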
Preprint · 2026
Tianyi Qiu*, Zhonghao He*, T. Lin, M. Glickman, R. Calcott, J. Wihbey, M. Kleiman-Weiner

A comprehensive survey and synthesis of how AI systems shape human beliefs, amplify biases, and drive epistemic harm — from sycophancy and echo chambers to societal-scale polarization — with a framework for understanding and intervening on AI influence.

Other Publications