Abstract
Fine-tuning adapts pretrained models to specific tasks but risks catastrophic forgetting (CF), where critical knowledge acquired during pretraining is overwritten. To address CF in a general-purpose framework, we propose Low-damage Knowledge Implanting (LoKI), a parameter-efficient fine-tuning (PEFT) technique that leverages recent mechanistic insights into how knowledge is stored in transformer architectures.
We compare LoKI against state-of-the-art PEFT methods in two real-world fine-tuning scenarios. The results show that LoKI preserves general capabilities significantly better, while its task-specific performance is comparable to or even surpasses that of full-parameter fine-tuning and these PEFT methods across various model architectures.
Our work bridges mechanistic insights into how LLMs store knowledge with practical fine-tuning objectives, enabling an effective balance between task-specific adaptation and the retention of general-purpose capabilities.
Method Overview
Three-Stage Framework
LoKI consists of three main stages:
1. Analyzing - Knowledge Vector Attribution (KVA)
We introduce KVA, a gradient-based attribution technique that evaluates how much each knowledge vector in a layer's down-projection matrix contributes to the model's pretrained behavior. This allows us to identify which parameters are critical for preserving general knowledge.
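Below is a minimal sketch of how such an attribution pass could look, assuming a simple gradient × weight saliency accumulated over a calibration set and Llama/Qwen-style module names (`mlp.down_proj`); the paper's exact KVA formulation may differ.

```python
import torch

def kva_scores(model, calib_loader, device="cuda"):
    """Accumulate a contribution score for every knowledge vector,
    i.e. every column of each MLP down-projection matrix.

    Simplified gradient-x-weight saliency; the paper's exact KVA rule
    may differ.
    """
    # Module names follow Llama/Qwen-style checkpoints ("...mlp.down_proj");
    # adjust the filter for other architectures.
    down_projs = {
        name: module.weight
        for name, module in model.named_modules()
        if name.endswith("mlp.down_proj")
    }
    scores = {name: torch.zeros(w.shape[1]) for name, w in down_projs.items()}

    model.eval()
    for batch in calib_loader:  # assumes batches of input_ids / attention_mask
        batch = {k: v.to(device) for k, v in batch.items()}
        model.zero_grad()
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        with torch.no_grad():
            for name, w in down_projs.items():
                if w.grad is not None:
                    # |grad * weight| summed over the output dimension gives
                    # one score per column, i.e. per knowledge vector.
                    scores[name] += (w.grad * w).abs().sum(dim=0).float().cpu()
    return scores
```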
2. Selecting - Layer-Balanced Strategy
Motivated by the hierarchical organization of knowledge in transformers, we propose the Layer-Balanced Strategy, which distributes trainable parameters evenly across layers so that the model's knowledge structure is respected. Each layer's down-projection matrix is split into a trainable subset (W_S) and a frozen subset (W_\S).
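A minimal sketch of the per-layer selection this implies is given below; the helper name `select_trainable_vectors` and the reading of q as a per-layer percentage are assumptions (q mirrors the q=20/30 settings reported in the results).

```python
import torch

def select_trainable_vectors(scores, q=30):
    """Layer-Balanced selection: in every layer independently, pick the q%
    of knowledge vectors with the LOWEST contribution scores, so trainable
    slots are spread evenly across layers rather than pooled globally.

    `scores` maps layer name -> per-column score tensor (see kva_scores).
    Returns layer name -> sorted LongTensor of selected column indices.
    """
    selected = {}
    for name, layer_scores in scores.items():
        k = max(1, int(layer_scores.numel() * q / 100))
        # largest=False selects the k smallest (lowest-contribution) scores.
        _, idx = torch.topk(layer_scores, k, largest=False)
        selected[name] = idx.sort().values
    return selected
```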
3. Implanting - Targeted Fine-Tuning
We freeze all model parameters except the selected low-contribution knowledge vectors (W_S), which are updated during fine-tuning to implant task-specific knowledge with minimal disruption to existing capabilities.
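One way to realize this split in PyTorch is to freeze everything and mask out gradients for the non-selected columns; this is a sketch under that assumption, not necessarily how the authors implement the W_S / W_\S separation.

```python
import torch

def prepare_for_implanting(model, selected):
    """Freeze every parameter, then make only the selected low-contribution
    columns (W_S) of each down-projection matrix trainable by zeroing the
    gradients that flow into the remaining frozen columns.
    """
    for p in model.parameters():
        p.requires_grad = False

    for name, module in model.named_modules():
        if name in selected:
            weight = module.weight
            weight.requires_grad = True
            mask = torch.zeros_like(weight)
            mask[:, selected[name].to(weight.device)] = 1.0
            # Runs on every backward pass: keep gradients only for W_S columns.
            weight.register_hook(lambda grad, m=mask: grad * m)
    return model
```

The optimizer can then be built over `[p for p in model.parameters() if p.requires_grad]`; note that optimizer-side effects such as weight decay would still touch the frozen columns of these matrices, so this sketch assumes weight decay is disabled for them.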
Key Innovations
- Superior CF resistance: LoKI significantly outperforms state-of-the-art PEFT methods in preserving general capabilities
- Parameter efficiency: Updates only a controlled subset of the original parameters
- Synergistic design: Can be combined with existing methods like LoRA for even greater efficiency
Experimental Results
Task 1: ToolACE Function-Calling Dataset
When used to fine-tune Llama3.1-8B-Instruct on the ToolACE dataset, LoKI achieves:
- 58.93% overall accuracy on Berkeley Function Calling Leaderboard V3 (q=30)
- 75% reduction in average performance degradation compared to DoRA
- Only 1.23% degradation across six evaluation benchmarks (TriviaQA, GSM8K, HellaSwag, WinoGrande, HumanEval, IFEval)
- Consistently reduced Irrelevance metric, suggesting mitigation of hallucination effects
Task 2: LB Reranker Dataset
When fine-tuning Qwen2.5-0.5B-Instruct on the LB Reranker dataset:
- LoKI (q=30) achieves the best average performance on the BEIR benchmark
- Positive performance gains on four metrics with q=20
- Significantly less degradation compared to DoRA, PiSSA, and CorDA across all metrics
- Only 0.46% average degradation on general benchmarks (q=30)
Integration with LoRA
LoKI can be combined with LoRA (LoKI*), achieving:
- 97.16% reduction in trainable parameters compared to LoKI alone
- No visible compromise in catastrophic forgetting resistance
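Below is a hypothetical sketch of how LoRA-style adapters could be attached only to the selected knowledge vectors of a down-projection layer; the class name, the rank r, and the exact parameterization are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class LoKILoRADownProj(nn.Module):
    """Keep the pretrained down-projection frozen and add a low-rank
    update B @ A that touches only the selected columns (W_S)."""

    def __init__(self, down_proj: nn.Linear, selected_idx: torch.Tensor, r: int = 8):
        super().__init__()
        self.base = down_proj                      # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad = False
        self.register_buffer("idx", selected_idx)  # columns belonging to W_S
        hidden = down_proj.out_features
        # LoRA-style init: small random A, zero B, so the update starts at zero.
        self.A = nn.Parameter(torch.randn(r, selected_idx.numel()) * 0.02)
        self.B = nn.Parameter(torch.zeros(hidden, r))

    def forward(self, x):
        # x: (..., intermediate_size). Restrict the low-rank update to the
        # activations that feed the selected knowledge vectors.
        delta = (x[..., self.idx] @ self.A.T) @ self.B.T
        return self.base(x) + delta
```

Because each |S| × d_model block of direct updates is replaced by two rank-r factors, the trainable-parameter count drops sharply, which is how reductions like the 97.16% figure above become possible.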
BibTeX
@inproceedings{wang2025loki,
  title={LoKI: Low-damage Knowledge Implanting of Large Language Models},
  author={Runyu Wang and Peng Ping and Zhengyu Guo and Xiaoye Zhang and Quan Shi and Liting Zhou and Tianbo Ji},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2026},
  url={https://arxiv.org/abs/2505.22120}
}