LoKI: Low-damage Knowledge Implanting of Large Language Models

1 School of Information Science and Technology, Nantong University · 2 School of Transportation and Civil Engineering, Nantong University · 3 South China University of Technology · 4 China Southern Power Grid Company Limited · 5 Dublin City University

AAAI 2026 Oral
*Corresponding Author
LoKI Framework Overview

LoKI framework: A three-stage process (Analyzing, Selecting, Implanting) that leverages Knowledge Vector Attribution (KVA) and Layer-Balanced Strategy to enable fine-tuning while preserving pretrained capabilities.

Abstract

Fine-tuning adapts pretrained models to specific tasks but risks catastrophic forgetting (CF), where critical knowledge from pretraining is overwritten. To address CF within a general-purpose framework, we propose Low-damage Knowledge Implanting (LoKI), a parameter-efficient fine-tuning (PEFT) technique that builds on recent mechanistic understanding of how knowledge is stored in transformer architectures.

We compare LoKI against state-of-the-art PEFT methods in two real-world fine-tuning scenarios. LoKI preserves general capabilities significantly better, while its task-specific performance matches or even surpasses that of full-parameter fine-tuning and these PEFT methods across various model architectures.

Our work bridges mechanistic insights into how LLMs store knowledge with practical fine-tuning objectives, enabling an effective balance between task-specific adaptation and the retention of general-purpose capabilities.

Method Overview

Three-Stage Framework

LoKI consists of three main stages:

1. Analyzing - Knowledge Vector Attribution (KVA)

We introduce KVA, a gradient-based attribution technique that evaluates the contribution of each vector in the down-projection matrix to the model's pretrained behavior. This allows us to identify which parameters are critical for preserving general knowledge.
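The exact KVA scoring rule is defined in the paper; the snippet below is only a minimal sketch of the idea, assuming a simple |weight × gradient| saliency computed with Hugging Face Transformers over a tiny set of general-knowledge probes. The model ID, probe texts, and aggregation choice are illustrative assumptions, not the paper's setup.

# Illustrative sketch of gradient-based attribution over down-projection columns.
# Assumptions: Qwen2.5-0.5B-Instruct as the base model, a toy calibration set,
# and a |weight * gradient| saliency as a stand-in for the paper's KVA score.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
model.train()

# A few general-knowledge probes stand in for a real calibration set.
calibration_texts = [
    "The capital of France is Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
]

# One score per column ("knowledge vector") of every FFN down-projection matrix.
down_projs = {n: p for n, p in model.named_parameters() if "down_proj.weight" in n}
scores = {n: torch.zeros(p.shape[1]) for n, p in down_projs.items()}

for text in calibration_texts:
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for n, p in down_projs.items():
            # |weight * gradient| aggregated over the output dimension gives the
            # contribution of each knowledge vector to the pretrained behavior.
            scores[n] += (p.grad * p).abs().sum(dim=0).cpu()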

2. Selecting - Layer-Balanced Strategy

Motivated by the hierarchical organization of knowledge in transformers, we propose the Layer-Balanced Strategy. This ensures that trainable parameters are distributed evenly across layers, respecting the model's knowledge structure. Each layer's down-projection matrix is decomposed into trainable (W_S) and frozen (W_\S) subsets.
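A minimal sketch of how such a selection could look, assuming q is the per-layer percentage of knowledge vectors made trainable (the results below report q=20 and q=30) and that `scores` is the per-column output of the attribution sketch above:

import torch

def layer_balanced_selection(scores, q):
    """Pick the lowest-scoring q% of knowledge vectors in each layer separately,
    so the trainable budget is spread evenly across the model's depth.
    Returns one boolean mask per down_proj parameter:
    True = trainable column (W_S), False = frozen column (W_\S)."""
    masks = {}
    for name, layer_scores in scores.items():
        k = max(1, int(layer_scores.numel() * q / 100))        # per-layer budget
        low_idx = torch.topk(layer_scores, k, largest=False).indices
        mask = torch.zeros_like(layer_scores, dtype=torch.bool)
        mask[low_idx] = True
        masks[name] = mask
    return masks

masks = layer_balanced_selection(scores, q=30)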

3. Implanting - Targeted Fine-Tuning

We freeze all model parameters except the selected low-contribution knowledge vectors (W_S), which are updated during fine-tuning to implant task-specific knowledge with minimal disruption to existing capabilities.
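Continuing the same illustrative assumptions, the implanting step can be sketched as freezing every parameter and masking gradients so that only the selected columns of each down-projection matrix receive updates:

import torch

def prepare_for_implanting(model, masks):
    """Freeze the whole model except the selected knowledge vectors (W_S).
    `masks` maps each down_proj parameter name to a boolean column mask."""
    for name, param in model.named_parameters():
        if name in masks:
            param.requires_grad_(True)
            col_mask = masks[name].to(param.device)
            # Zero the gradient of frozen columns after every backward pass,
            # so only the low-contribution knowledge vectors are updated.
            param.register_hook(lambda g, m=col_mask: g * m.unsqueeze(0).to(g.dtype))
        else:
            param.requires_grad_(False)
    return model

model = prepare_for_implanting(model, masks)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5, weight_decay=0.0)  # weight decay off

Note that with this gradient-masking approach, optimizer weight decay should be disabled (or applied selectively): decoupled weight decay would otherwise still shrink the frozen columns even though their gradients are zero.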

Key Innovations

  • Superior CF resistance: LoKI significantly outperforms state-of-the-art PEFT methods in preserving general capabilities
  • Parameter efficiency: Updates only a controlled subset of original parameters
  • Synergistic design: Can be combined with existing methods like LoRA for even greater efficiency

Experimental Results

Task 1: ToolACE Function-Calling Dataset

When used to fine-tune Llama3.1-8B-Instruct on the ToolACE dataset, LoKI achieves:

  • 58.93% overall accuracy on Berkeley Function Calling Leaderboard V3 (q=30)
  • 75% reduction in average performance degradation compared to DoRA
  • Only 1.23% degradation across six evaluation benchmarks (TriviaQA, GSM8K, HellaSwag, WinoGrande, HumanEval, IFEval)
  • Consistently reduced Irrelevance metric, suggesting mitigation of hallucination effects

Task 2: LB Reranker Dataset

Fine-tuning Qwen2.5-0.5B-Instruct on the LB Reranker dataset:

  • LoKI (q=30) achieves the best average performance on the BEIR benchmark
  • Positive performance gains in four metrics with q=20
  • Significantly less degradation compared to DoRA, PiSSA, and CorDA across all metrics
  • Only 0.46% average degradation on general benchmarks (q=30)

Integration with LoRA

LoKI can be combined with LoRA (LoKI*), as sketched below, achieving:

  • 97.16% reduction in trainable parameters compared to LoKI alone
  • No visible compromise in catastrophic forgetting resistance
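This page does not spell out how LoKI* wires the two methods together. As one illustrative possibility only (not the official recipe), the sketch below attaches LoRA adapters to the down-projection modules with the Hugging Face peft library and constrains the low-rank update B @ A to the selected columns by zeroing and gradient-masking the corresponding columns of lora_A. The rank, alpha, and name-matching helper are assumptions.

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.0, target_modules=["down_proj"])
model = get_peft_model(model, config)

def lookup_mask_for(lora_name, masks):
    # Match a wrapped lora_A parameter name (e.g. "...layers.0.mlp.down_proj.lora_A...")
    # back to the base down_proj parameter scored and masked in the earlier sketches.
    for base_name, mask in masks.items():
        if base_name.removesuffix(".weight") in lora_name:
            return mask
    raise KeyError(lora_name)

for name, param in model.named_parameters():
    if "lora_A" in name and param.requires_grad:
        col_mask = lookup_mask_for(name, masks).to(param.device)
        with torch.no_grad():
            param[:, ~col_mask] = 0.0   # zero the frozen columns of A at initialization
        # Keep them zero: mask their gradients so B @ A never touches frozen columns.
        param.register_hook(lambda g, m=col_mask: g * m.unsqueeze(0).to(g.dtype))

Because lora_B starts at zero and the frozen columns of lora_A stay zero, the effective update B @ A is restricted to the selected knowledge-vector columns throughout training.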

BibTeX

@inproceedings{wang2025loki,
  title={LoKI: Low-damage Knowledge Implanting of Large Language Models},
  author={Runyu Wang and Peng Ping and Zhengyu Guo and Xiaoye Zhang and Quan Shi and Liting Zhou and Tianbo Ji},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026},
  url={https://arxiv.org/abs/2505.22120}
}