Abstract
Fine-tuning adapts pretrained models to specific tasks but risks catastrophic forgetting (CF), where critical knowledge acquired during pretraining is overwritten. To address CF in a general-purpose framework, we propose Low-damage Knowledge Implanting (LoKI), a parameter-efficient fine-tuning (PEFT) technique that leverages recent mechanistic insights into how knowledge is stored in transformer architectures.
We compare LoKI against state-of-the-art PEFT methods in two real-world fine-tuning scenarios. The results show that LoKI preserves general capabilities significantly better, while its task-specific performance is comparable to or even surpasses that of full-parameter fine-tuning and these PEFT methods across various model architectures.
Our work bridges mechanistic insights into how LLMs store knowledge with practical fine-tuning objectives, enabling an effective balance between task-specific adaptation and the retention of general-purpose capabilities.
Method Overview
Three-Stage Framework
LoKI consists of three main stages:
1. Analyzing - Knowledge Vector Attribution (KVA)
We introduce KVA, a gradient-based attribution technique that evaluates how much each knowledge vector in a layer's down-projection matrix contributes to the model's pretrained behavior. This allows us to identify which parameters are critical for preserving general knowledge.
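Below is a minimal sketch of how such an attribution pass could look, assuming a simple gradient × weight saliency accumulated over a calibration set and Llama/Qwen-style module names (`mlp.down_proj`); the paper's exact KVA formulation may differ.

```python
import torch

def kva_scores(model, calib_loader, device="cuda"):
    """Accumulate a contribution score for every knowledge vector,
    i.e. every column of each MLP down-projection matrix.

    Simplified gradient-x-weight saliency; the paper's exact KVA rule
    may differ.
    """
    # Module names follow Llama/Qwen-style checkpoints ("...mlp.down_proj");
    # adjust the filter for other architectures.
    down_projs = {
        name: module.weight
        for name, module in model.named_modules()
        if name.endswith("mlp.down_proj")
    }
    scores = {name: torch.zeros(w.shape[1]) for name, w in down_projs.items()}

    model.eval()
    for batch in calib_loader:  # assumes batches of input_ids / attention_mask
        batch = {k: v.to(device) for k, v in batch.items()}
        model.zero_grad()
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        with torch.no_grad():
            for name, w in down_projs.items():
                if w.grad is not None:
                    # |grad * weight| summed over the output dimension gives
                    # one score per column, i.e. per knowledge vector.
                    scores[name] += (w.grad * w).abs().sum(dim=0).float().cpu()
    return scores
```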
2. Selecting - Layer-Balanced Strategy
Motivated by the hierarchical organization of knowledge in transformers, we propose the Layer-Balanced Strategy, which distributes trainable parameters evenly across layers so that the model's knowledge structure is respected. Each layer's down-projection matrix is split into a trainable subset (W_S) and a frozen subset (W_\S).
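A minimal sketch of the per-layer selection this implies is given below; the helper name `select_trainable_vectors` and the reading of q as a per-layer percentage are assumptions (q mirrors the q=20/30 settings reported in the results).

```python
import torch

def select_trainable_vectors(scores, q=30):
    """Layer-Balanced selection: in every layer independently, pick the q%
    of knowledge vectors with the LOWEST contribution scores, so trainable
    slots are spread evenly across layers rather than pooled globally.

    `scores` maps layer name -> per-column score tensor (see kva_scores).
    Returns layer name -> sorted LongTensor of selected column indices.
    """
    selected = {}
    for name, layer_scores in scores.items():
        k = max(1, int(layer_scores.numel() * q / 100))
        # largest=False selects the k smallest (lowest-contribution) scores.
        _, idx = torch.topk(layer_scores, k, largest=False)
        selected[name] = idx.sort().values
    return selected
```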
3. Implanting - Targeted Fine-Tuning
We freeze all model parameters except the selected low-contribution knowledge vectors (W_S), which are updated during fine-tuning to implant task-specific knowledge with minimal disruption to existing capabilities.
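One way to realize this split in PyTorch is to freeze everything and mask out gradients for the non-selected columns; this is a sketch under that assumption, not necessarily how the authors implement the W_S / W_\S separation.

```python
import torch

def prepare_for_implanting(model, selected):
    """Freeze every parameter, then make only the selected low-contribution
    columns (W_S) of each down-projection matrix trainable by zeroing the
    gradients that flow into the remaining frozen columns.
    """
    for p in model.parameters():
        p.requires_grad = False

    for name, module in model.named_modules():
        if name in selected:
            weight = module.weight
            weight.requires_grad = True
            mask = torch.zeros_like(weight)
            mask[:, selected[name].to(weight.device)] = 1.0
            # Runs on every backward pass: keep gradients only for W_S columns.
            weight.register_hook(lambda grad, m=mask: grad * m)
    return model
```

The optimizer can then be built over `[p for p in model.parameters() if p.requires_grad]`; note that optimizer-side effects such as weight decay would still touch the frozen columns of these matrices, so this sketch assumes weight decay is disabled for them.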
Key Innovations
- Superior CF resistance: LoKI significantly outperforms state-of-the-art PEFT methods in preserving general capabilities
- Parameter efficiency: Updates only a controlled subset of the original parameters
- Synergistic design: Can be combined with existing methods like LoRA for even greater efficiency
Experimental Results
Task 1: ToolACE Function-Calling Dataset
When used to fine-tune Llama3.1-8B-Instruct on the ToolACE dataset, LoKI achieves:
- 58.93% overall accuracy on Berkeley Function Calling Leaderboard V3 (q=30)
- 75% reduction in average performance degradation compared to DoRA
- Only 1.23% degradation across six evaluation benchmarks (TriviaQA, GSM8K, HellaSwag, WinoGrande, HumanEval, IFEval)
- Consistently reduced Irrelevance metric, suggesting mitigation of hallucination effects
Task 2: LB Reranker Dataset
When fine-tuning Qwen2.5-0.5B-Instruct on the LB Reranker dataset:
- LoKI (q=30) achieves the best average performance on the BEIR benchmark
- Positive performance gains on four metrics with q=20
- Significantly less degradation compared to DoRA, PiSSA, and CorDA across all metrics
- Only 0.46% average degradation on general benchmarks (q=30)
Integration with LoRA
LoKI can be combined with LoRA (LoKI*), achieving:
- 97.16% reduction in trainable parameters compared to LoKI alone
- No visible compromise in catastrophic forgetting resistance
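Below is a hypothetical sketch of how LoRA-style adapters could be attached only to the selected knowledge vectors of a down-projection layer; the class name, the rank r, and the exact parameterization are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class LoKILoRADownProj(nn.Module):
    """Keep the pretrained down-projection frozen and add a low-rank
    update B @ A that touches only the selected columns (W_S)."""

    def __init__(self, down_proj: nn.Linear, selected_idx: torch.Tensor, r: int = 8):
        super().__init__()
        self.base = down_proj                      # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad = False
        self.register_buffer("idx", selected_idx)  # columns belonging to W_S
        hidden = down_proj.out_features
        # LoRA-style init: small random A, zero B, so the update starts at zero.
        self.A = nn.Parameter(torch.randn(r, selected_idx.numel()) * 0.02)
        self.B = nn.Parameter(torch.zeros(hidden, r))

    def forward(self, x):
        # x: (..., intermediate_size). Restrict the low-rank update to the
        # activations that feed the selected knowledge vectors.
        delta = (x[..., self.idx] @ self.A.T) @ self.B.T
        return self.base(x) + delta
```

Because each |S| × d_model block of direct updates is replaced by two rank-r factors, the trainable-parameter count drops sharply, which is how reductions like the 97.16% figure above become possible.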
BibTeX
@inproceedings{wang2025loki,
  title={LoKI: Low-damage Knowledge Implanting of Large Language Models},
  author={Runyu Wang and Peng Ping and Zhengyu Guo and Xiaoye Zhang and Quan Shi and Liting Zhou and Tianbo Ji},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2026},
  url={https://arxiv.org/abs/2505.22120}
}