Embedding Drift as a Sensitive Indicator of Internal Representation Instability in Language Model Fine-Tuning

Harshit Chaturvedy

Journal Publication | NSRI-J-2026-0032

Authors: Harshit Chaturvedy

Affiliation: Forsyth Central High School

Publication date: 2026-04-30

Journal/archive name: NSRI Student Research Journal

Volume: 1 Issue: 1 Pages/article: Pending

DOI: Pending DOI assignment

Abstract

Fine-tuning a pre-trained language model alters both its performance metrics and its internal representations, yet conventional loss measurements often fail to capture subtle shifts in high-dimensional embedding space. In this study, we introduce embedding drift, a scalar metric defined as the mean cosine distance between each fixed probe sentence's initial hidden-state vector and its hidden-state vector at a later training step, to quantify representational change during training. Each sentence is mapped to a 768-dimensional vector via mean pooling over token embeddings, and drift is computed as

\[ \mathrm{drift} = \frac{1}{N} \sum_{i=1}^{N} \left( 1 - \frac{e_i^{(0)} \cdot e_i^{(t)}}{\lVert e_i^{(0)} \rVert \, \lVert e_i^{(t)} \rVert} \right) \]

where N is the number of probe sentences, e_i^{(0)} is the initial embedding of probe sentence i, and e_i^{(t)} is its embedding at step t. We performed two controlled experiments with DistilBERT on a subset of IMDb: a baseline run (learning rate 5e-5) and a high-learning-rate run (5e-4). Drift increased steadily from ~0.03 to ~0.26 in the baseline run, whereas the high learning rate induced a rapid jump from ~0.22 to ~0.76, even though training loss showed minimal change. These results demonstrate that embedding drift provides a quantitative, vector-space measure of representational instability that conventional loss metrics may overlook, offering insight into internal model dynamics during fine-tuning.
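The drift formula in the abstract can be illustrated with a minimal sketch. This is not the study's code: the function names `mean_pool`, `cosine_distance`, and `embedding_drift` are hypothetical, the inputs are plain Python lists rather than DistilBERT hidden states, and the pooling averages over all tokens with no padding mask, which is an assumption because the abstract does not say how padding is handled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(token_embeddings: Sequence[Vector]) -> List[float]:
    """Average the token vectors of one sentence into a single sentence vector.

    Assumption: every token counts equally (no attention/padding mask).
    """
    n_tokens = len(token_embeddings)
    if n_tokens == 0:
        raise ValueError("need at least one token embedding")
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n_tokens for d in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cos(a, b), the per-sentence term inside the drift sum."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def embedding_drift(initial: Sequence[Vector], current: Sequence[Vector]) -> float:
    """Mean cosine distance between step-0 and step-t probe embeddings.

    initial[i] and current[i] must belong to the same probe sentence.
    """
    if len(initial) == 0 or len(initial) != len(current):
        raise ValueError("need two non-empty, equally sized lists of embeddings")
    total = sum(cosine_distance(e0, et) for e0, et in zip(initial, current))
    return total / len(initial)


# Toy check with N = 2 probe sentences in 2 dimensions (not real model output):
# the first embedding is unchanged (distance 0), the second rotates by
# 90 degrees (distance 1), so the mean is 0.5.
initial = [[1.0, 0.0], [0.0, 1.0]]
current = [[1.0, 0.0], [1.0, 0.0]]
print(embedding_drift(initial, current))  # 0.5
```

In a real run, `initial` would presumably hold the mean-pooled 768-dimensional probe embeddings taken before fine-tuning, and `current` the embeddings of the same probes at step t. Because cosine distance ignores vector length, the metric responds only to changes in direction, not to uniform rescaling of the embeddings.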

Keywords

Applied Science - Engineering, Applied Science - Computer Science

Citation

Harshit Chaturvedy (2026). Embedding Drift as a Sensitive Indicator of Internal Representation Instability in Language Model Fine-Tuning. NSRI Student Research Journal. 1(1). NSRI-J-2026-0032.

References

Reference metadata is pending and must be finalized before DOI deposit.