Tenure One · 2025–2026

Viveka 1.0

Mechanistic interpretability of large language models — probing the internal circuits behind hallucinations, factual recall, and in-context learning.

Focus Area: Mechanistic Interpretability
Duration: Aug 2025 – May 2026
Researchers: 11 Members
Status: Active
01 — Technical Introduction

What lives inside a language model?

Hallucinations: nonlinear, low-dimensional subspaces of truthfulness; Truthflow; autoencoders
Factual Recall: two-hop circuits, circuit analysis
Methods: studying transformers in the mathematical framework of hidden Markov models
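To make the probing theme above concrete, here is a minimal sketch, not the team's actual pipeline: a probe is a small classifier trained on a model's hidden activations to test whether a concept, here statement truthfulness, is decodable from them. The activations below are random stand-ins; real usage would cache a transformer's residual stream.

```python
# Minimal probing sketch (illustrative, not the team's pipeline). A probe
# is a small classifier trained on hidden activations to test whether a
# concept, here statement truthfulness, is decodable from them.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model = 512  # hidden width of a hypothetical LM

# Random stand-ins for activations cached at some layer for labelled
# true/false statements; real usage would run the statements through a
# transformer and save the residual stream at a chosen layer/position.
acts_true = rng.normal(loc=+0.5, scale=1.0, size=(500, d_model))
acts_false = rng.normal(loc=-0.5, scale=1.0, size=(500, d_model))
X = np.vstack([acts_true, acts_false])
y = np.array([1] * 500 + [0] * 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```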

Meet the Viveka 1.0 Team

2 project leads and 9 researchers driving interpretability research through the 2025–26 academic year.

View Team →
02 — Research Output

Blogs, papers & publications.

LLM · Hallucination Detection · Nonlinear Probing
Factual correctness representations are nonlinear and lie in low-dimensional subspaces.
Authors · 2025
Upcoming
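The central claim, that truthfulness structure can be nonlinear yet low-dimensional, can be illustrated with a toy construction of ours, not the paper's data: an XOR-style pattern planted in a 2-D subspace of a 256-D space is invisible to a linear probe but easy for a small MLP probe.

```python
# Toy construction (ours, not the paper's data): labels follow an XOR
# pattern inside a 2-D subspace of a 256-D space, so the structure is
# low-dimensional but nonlinear. A linear probe stays near chance while
# a small MLP probe recovers it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n, d = 2000, 256
Z = rng.normal(size=(n, 2))                        # low-dimensional latent
y = ((Z[:, 0] > 0) ^ (Z[:, 1] > 0)).astype(int)    # XOR labels

B = rng.normal(size=(2, d))                        # embed latent in high-D
X = Z @ B + 0.1 * rng.normal(size=(n, d))          # plus small noise

linear = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                    random_state=0).fit(X[:1500], y[:1500])
print(f"linear probe accuracy: {linear.score(X[1500:], y[1500:]):.2f}")  # near 0.5
print(f"MLP probe accuracy:    {mlp.score(X[1500:], y[1500:]):.2f}")     # near 1.0
```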
Circuits
Two-Hop Factual Recall
Saahil Faraaz Shaikh · 2026
Upcoming
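For readers unfamiliar with the setting: a two-hop fact chains two stored facts, for example landmark to country, then country to capital. A minimal behavioural sketch, using gpt2 purely as a stand-in model, compares each hop in isolation against the composed query:

```python
# Behavioural sketch of two-hop factual recall (illustrative; gpt2 is an
# arbitrary stand-in, any HuggingFace causal LM works the same way).
# Hop 1: landmark -> country. Hop 2: country -> capital. The composed
# prompt probes whether the model chains both facts internally.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompts = {
    "hop 1":   "The Eiffel Tower is located in the country of",
    "hop 2":   "The capital of France is",
    "two-hop": "The capital of the country containing the Eiffel Tower is",
}
for name, prompt in prompts.items():
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # next-token logits
    print(f"{name:8s} -> {tok.decode(logits.argmax().item())!r}")
```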
Circuits · Logit-Lens · Norm-Lens
Through the Lenses: A Circuit Odyssey
Pakshal Nagda, Smitali Bhandari · 2025
📄 Blog Post
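The logit lens named in the tags has a compact implementation: decode each intermediate layer's residual stream through the model's own final LayerNorm and unembedding, and watch the next-token prediction form across depth. A minimal sketch, again with gpt2 as a stand-in:

```python
# Minimal logit-lens sketch (illustrative; gpt2 is an arbitrary stand-in).
# Each layer's residual stream is decoded through the model's own final
# LayerNorm and unembedding, showing how the next-token prediction
# develops with depth.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("The Eiffel Tower is in", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)
    for layer, h in enumerate(out.hidden_states):
        # final LayerNorm, then unembedding, applied to the last position
        logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
        print(f"layer {layer:2d}: {tok.decode(logits.argmax().item())!r}")
```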
ICL · Hidden Markov Models
In-Context Learning of Switching Processes in Transformers
Sriram V, Jayden Koshy Joe, Smitali Bhandari · 2026
Upcoming
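As background, a switching process is a sequence whose local statistics are governed by a slowly changing hidden regime; the question is whether a transformer trained on such data infers the regime in context. A minimal data generator, with parameters that are arbitrary choices of ours:

```python
# Sketch of a switching-process data generator (all parameters are
# arbitrary choices for illustration). Sequences like these are used to
# ask whether a transformer infers the hidden regime in context.
import numpy as np

rng = np.random.default_rng(2)

# Two latent regimes; each regime is a Markov chain over a binary alphabet.
T = np.array([[0.99, 0.01],        # sticky regime-transition matrix
              [0.01, 0.99]])
emit = [np.array([[0.9, 0.1],      # regime 0: token dynamics
                  [0.1, 0.9]]),
        np.array([[0.1, 0.9],      # regime 1: inverted token dynamics
                  [0.9, 0.1]])]

def sample(length=200):
    z, x = 0, 0
    zs, xs = [], []
    for _ in range(length):
        z = rng.choice(2, p=T[z])        # evolve the hidden regime
        x = rng.choice(2, p=emit[z][x])  # emit the next token given regime
        zs.append(z)
        xs.append(x)
    return np.array(xs), np.array(zs)

tokens, regimes = sample()
print("tokens: ", tokens[:30])
print("regimes:", regimes[:30])
```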
Interpretability · Nonlinear Steering
Autoencoders for Steering Truthfulness and Uncertainty Directions
Samrudh Raaj, Saahil Faraaz Shaikh · 2026
📝 Blog Post Upcoming
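Steering here means adding a learned direction to the residual stream at inference time. The sketch below shows only the mechanics, with a random placeholder direction; in the project the direction would come from a trained autoencoder, which this sketch does not implement:

```python
# Minimal activation-steering sketch. gpt2, the layer index, and the
# random "truthfulness direction" are all placeholders; in the project
# the direction would come from a trained autoencoder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

direction = torch.randn(model.config.n_embd)  # placeholder steering vector
direction /= direction.norm()
alpha = 5.0                                   # steering strength

def steer(module, inputs, output):
    # shift the block's residual-stream output along the chosen direction
    return (output[0] + alpha * direction,) + output[1:]

handle = model.transformer.h[8].register_forward_hook(steer)
ids = tok("The moon landing was", return_tensors="pt").input_ids
with torch.no_grad():
    gen = model.generate(ids, max_new_tokens=10, do_sample=False,
                         pad_token_id=tok.eos_token_id)
handle.remove()
print(tok.decode(gen[0]))
```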
Interpretability · Flow Models
Explaining Truthflow
Eshika Nahata, Samrudh Raaj · 2026
📝 Blog Post Upcoming
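Truthflow-style methods learn a velocity field that transports a hidden representation toward its truthful counterpart, integrated as an ODE at inference time. A minimal, untrained sketch of that correction step; the network below is a stand-in, not the published model:

```python
# Sketch of a flow-style representation correction step. The velocity
# network below is an untrained stand-in for what a Truthflow-like method
# would learn with flow matching; only the inference-time integration is
# shown.
import torch
import torch.nn as nn

d = 512
velocity = nn.Sequential(nn.Linear(d + 1, 256), nn.GELU(), nn.Linear(256, d))

def correct(h, steps=8):
    """Euler-integrate dh/dt = v(h, t) from t = 0 to t = 1."""
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((h.shape[0], 1), i * dt)
        h = h + dt * velocity(torch.cat([h, t], dim=-1))
    return h

h = torch.randn(4, d)    # stand-in hidden states from some layer
print(correct(h).shape)  # corrected states, same shape as the input
```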
03 — Recruitment

Join Viveka 1.0

Recruitment for this tenure has closed. The application can be found below.