Writings

Samurais & Evil Robots: Adversarial Direction Ablation in LLMs
A mechanistic interpretability study of cultural framing, refusal-direction geometry, and the limits of multi-axis ablation.
Research