Separable Expert Architecture: Toward Privacy-Preserving LLM
Personalization via Composable Adapters and Deletable User Proxies
Chris Schneider¹, Philipp Schoenegger¹, Ben Bariach¹
¹Microsoft AI
Abstract
Current model training approaches incorporate user in-
formation directly into shared weights, making individual
data removal computationally infeasible without retrain-
ing. This paper presents a three-layer architecture that
decouples personal data from shared weights by combin-
ing a static base model, composable domain-expert LoRA
adapters that shape behavior without imparting user data,
and a per-user proxy artifact whose deletion constitutes
deterministic unlearning.
Evaluation on Phi-3.5-mini and
Llama-3.1-8B confirms per-user differentiation in which
personal data influences outputs while remaining isolated,
verified by a return to baseline after proxy removal (KL
≈0.21 nats, 82–89% verification pass rate) and near-
zero cross-user contamination. Because user-specific
information never enters shared weights, the architec-
ture mitigates model inversion, membership inference,
and training-data extraction against shared model compo-
nents by construction.
The approach converts machine
unlearning from an intractable weight-editing problem
into a deterministic deletion operation that preserves per-
sonalization alongside privacy-enhancing guarantees and
is compatible with differentially private stochastic gra-
dient descent (DP-SGD) for privacy-preserving shared
model improvement.

1 Introduction
As LLM personalization becomes widely used, a growing
body of work has demonstrated that user preferences
can be captured through retrieval-augmented profiles [1],
post-hoc parameter merging [2], and personalized reward
learning [3, 4].
While some of these approaches oper-
ate at the prompt level (e.g., retrieval-augmented pro-
files), many encode user-specific information into model
weights θ via fine-tuning, producing models whose pa-
rameters entangle contributions from many users. When
a user later requests deletion it is unclear how one can
remove their data from a model whose weights have
been shaped by thousands of users simultaneously.
This suggests that there is a fundamental tension be-
tween personalization and data deletion in the context
of modern LLMs. When user preferences are distributed
across shared weights, deletion requires identifying and
removing each user’s contribution, a problem that has
been shown to be computationally intractable without full
retraining [5].
Exact unlearning methods like SISA
[5] require maintaining independently trained model
shards, while approximate methods offer no formal re-
moval guarantees [6]. LLM-specific approaches face
additional difficulties: Gradient ascent can cause catas-
trophic collapse in certain unlearning configurations [7],
and representation-level methods like RMU [8] still mod-
ify shared weights.
This problem is compounded by
extraction attacks, including model inversion [9], train-
ing data extraction [10, 11], and membership inference
[12], which can recover private information from weight-
encoded personalization, making it a privacy issue even
absent deletion requests. To illustrate this, consider a
personalized assistant that has learned a user’s medical
vocabulary preferences through fine-tuning.
Even after
the user requests deletion, membership inference attacks
could reveal whether that user’s data was part of the
training set, while training data extraction could recover
specific preference examples, all because the user’s
influence remains distributed across millions of shared
parameters.

[Figure 1 diagram. Shared components (no user data): Base Model θ (shared), Expert Router, experts E1 Security, E2 Code, E3 Data, E4 General, merged as Wbase + Σi wi Bi Ai. User proxy Pu (all user-specific data; removable, file deletion = full erasure): routing bias bu ∈ R^k (domain preference scores), personal LoRA Lu = (Bu, Au) (user-specific weight residuals), steering vectors {s_u^ℓ}ℓ∈L (style and preference modifiers). Flow: Query q, merge, inject, Output.]

Figure 1: Separable Expert Architecture. Shared components (left) contain no user-specific information: a frozen base
model, four domain-expert LoRA adapters selected by a per-query router, and a weighted merge. The per-user proxy (right,
dashed red border) holds three deletable personalization mechanisms (routing bias, personal LoRA, and contrastive steering
vectors) that compose with shared components at inference via cross-boundary arrows. The vertical dashed line marks the
separation boundary, where deleting the proxy directory removes all user-specific influence with zero retraining.

arXiv:2604.21571v1 [cs.AI] 23 Apr 2026

In order to address this issue, we propose the Separable
Expert Architecture (SEA), a design that aims
to satisfy both personalization and deletability simultaneously.
The core insight is that if user-specific
information never enters shared weights, “unlearning”
is essentially just deletion. Rather than trying to sur-
gically undo weight entanglement after the fact, this
approach prevents entanglement from occurring in the
first place.
In other words, this requires an architecture
where personalization is compositional, i.e., assembled
at inference time from separable, deletable components,
rather than absorptive, where preferences are baked into
shared parameters.

Contributions. We make three contributions:
1. A three-layer composition architecture where a
base model (frozen, shared) is augmented by domain-
expert LoRA adapters (shared, dynamically weighted
by a query router) and per-user proxy artifacts, which
are isolated directories containing a routing bias vec-
tor, contrastive steering vectors, and a personal LoRA
adapter (∼2–5 MB per user in our configuration).
The architecture maintains a strict invariant: all
user-specific information resides in a deletable artifact
that never enters shared weights (§2).
2. A deletion protocol that reduces user removal to
filesystem deletion of the proxy directory followed by
noise-calibrated KL-divergence verification against
a non-personalized baseline, requiring no retraining
(§2.4).
3. Empirical evidence across Phi-3.5-mini
and Llama-3.1-8B with four domain experts and four
synthetic user profiles, demonstrating measurable per-
sonalization, verified deletion (82–89% verification
pass rate), and clean cross-user isolation (contamination
≤ 0.05 in point estimates) (§4).

Related Work.
Research on machine unlearning has
shown that surgical removal of user influence from model
weights is fundamentally hard, whether through exact
retraining [5], efficient approximate deletion [13], approximate
gradient manipulation [6, 14], or LLM-specific methods such as
model-generated knowledge replacement [15], NPO [7], and
representation-level unlearning [8]. By contrast, the infrastructure for
composable adapter stacks has matured substantially: LoRA
[16] and QLoRA [17] enable efficient adapter training,
LoraHub [18] and task arithmetic [19, 20] demonstrate
multi-adapter composition, and S-LoRA [21] enables
serving thousands of concurrent adapters from a single
base model while Punica [22] provides efficient multi-
tenant batching via segmented gather-matrix-vector
kernels.
Activation steering methods, including Con-
trastive Activation Addition [23] and Inference-Time
Intervention [24], show that behavioral modification
without weight changes can be both effective and rel-
atively lightweight. LLM personalization approaches,
including LaMP [1], Personalized Soups [2], P-RLHF
[3], and VPL [4], capture user preferences through var-
ious mechanisms.
However, none of these approaches
architecturally separates user state from shared weights,
meaning that deletion would require either retraining or
approximate weight modification, the same intractable
operations the unlearning literature has already identi-
fied as problematic [5, 6]. Adding a deletion mechanism
post hoc does not resolve this as the entanglement occurs
during training, and no inference-time wrapper can undo
it.
The infrastructure for composable, per-user adapter
stacks exists, but what is largely missing is a deletion-
aware composition design that prevents entanglement
from occurring in the first place. SEA bridges this gap
by ensuring that personalization state is architecturally
separable from shared model components.
In the rest of the paper, we present the architecture
and deletion protocol of SEA (§2), the experimental
setup (§3), and the results (§4), before closing with a
discussion of implications and limitations (§5).

2 Architecture
In this section, we present SEA’s three-layer composition
architecture and its core design invariant.
The central
claim is that user-specific information must be
structurally separated from shared model components
such that deletion becomes a deterministic filesystem op-
eration rather than an approximate weight-modification
procedure. We first state the invariant (§2.1), then de-
scribe the three composition layers (§2.2), detail the
inference pipeline (§2.3), and lastly present the deletion
protocol (§2.4).
2.1 Design Invariant
SEA maintains a strict architectural invariant that dis-
tinguishes it from approximate unlearning approaches
and provides the basis for the deletion protocol:
Invariant 1 (Separation). All user-specific information
resides in an isolated, deletable proxy artifact.
Shared
model components (the base model and expert adapters)
contain no user-identifying information. Removing the
proxy artifact is both necessary and sufficient for com-
plete user data removal from the inference system.
Importantly, this invariant is structural as opposed
to statistical. While approximate unlearning methods
provide probabilistic guarantees that user influence has
been reduced below some threshold, Invariant 1 guar-
antees that user influence is architecturally absent from
shared components.
In other words, the guarantee holds
by construction as the system never permits user-specific
gradients to flow into shared weights, so there is nothing
to remove.

2.2 Three-Layer Composition
SEA combines three layers at inference time (Figure 1):
a frozen base model that provides general capabilities,
shared domain-expert LoRA adapters that provide spe-
cialized knowledge, and per-user proxy artifacts that
provide deletable personalization.
Base Layer. The base layer is a frozen, quantized
LLM that provides general language capabilities and
is shared across all users.
It contains no user-specific
information by design, and the base weights are never
modified during user interactions. Periodic retraining
on aggregated data with differential privacy guarantees
(DP-SGD [25]) is a natural extension but is out of scope
for this paper.
Expert Layer. A bank of k domain-specific LoRA
adapters E = {E1, ..., Ek} provides specialized capabilities
for distinct knowledge domains. Each expert
Ei = (Bi, Ai) is a low-rank adapter trained on curated
domain corpora and shared across all users, with experts
encoding domain knowledge only.
At inference, experts
combine via weighted linear combination (Equation 1):
$$W_{\text{expert}} = W_{\text{base}} + \sum_{i=1}^{k} w_i \, B_i A_i \tag{1}$$
where w ∈ Δ^k (the probability simplex) are mixing coefficients
determined per-query by a lightweight router.

User Layer.
Each user u has an isolated proxy arti-
fact Pu, which is a self-contained directory comprising
three complementary personalization mechanisms, each
stored as serialized tensors:
1. Routing bias vector bu ∈Rk: A learned vector
of domain affinity scores derived from user interac-
tion patterns that shifts expert selection toward user-
preferred domains.
The bias is applied as a scaled
additive adjustment with clamp-and-normalize:
$$\tilde{w}_i = w_{0,i} + \lambda\, b_{u,i}, \qquad w_i = \frac{\max(\tilde{w}_i, 0)}{\sum_j \max(\tilde{w}_j, 0)} \tag{2}$$
where w0 is the router’s base distribution and λ is
a bias scale that prevents raw affinity values from
overwhelming the base routing. If $\sum_j \max(\tilde{w}_j, 0) = 0$,
the distribution falls back to uniform: wi = 1/k.
2. Contrastive steering vectors {s_u^ℓ}ℓ∈L at a subset
of intermediate layers L: Computed via Contrastive
Activation Addition [23] from user preference pairs and
injected additively into residual stream activations
at inference:
$$h^{\ell} \leftarrow h^{\ell} + \gamma\, s_u^{\ell} \tag{3}$$
where γ is a steering strength multiplier. These vectors
encode stylistic preferences (verbosity, formality,
technical depth) without modifying any model
weights, making them particularly well-suited for
deletable personalization.
3. Personal LoRA adapter Lu = (Bu, Au): A low-rank
adapter trained on user preference pairs. This
adapter captures user-specific knowledge and re-
sponse patterns that routing bias and steering alone
cannot express, resulting in additional personaliza-
tion.
The rank is deliberately kept small to bound
proxy size and maintain a clear separation guarantee.
During personal LoRA training via DPO, the base
model and expert adapter weights are frozen,
such that only the rank-4 personal LoRA parameters
receive gradient updates, ensuring that user-specific
gradients never flow into shared components.
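
The clamp-and-normalize adjustment of Equation 2 can be illustrated with a minimal sketch. The function name `apply_routing_bias` and the example numbers are hypothetical and chosen only for illustration; the default λ = 0.5 matches the bias scale reported in the experimental setup.

```python
from typing import List


def apply_routing_bias(w0: List[float], b_u: List[float], lam: float = 0.5) -> List[float]:
    """Equation 2: add the scaled user bias, clamp at zero, renormalize."""
    if len(w0) != len(b_u):
        raise ValueError("w0 and b_u must both have length k")
    clamped = [max(w + lam * b, 0.0) for w, b in zip(w0, b_u)]
    total = sum(clamped)
    if total == 0.0:
        # Fallback stated in the text: uniform weights when every entry clamps to zero.
        return [1.0 / len(w0)] * len(w0)
    return [c / total for c in clamped]


# Hypothetical router output and user bias over k = 4 experts (illustration only).
base = [0.25, 0.25, 0.25, 0.25]
bias = [0.4, 0.0, -0.2, 0.0]
biased = apply_routing_bias(base, bias, lam=0.5)
```

In this sketch the first expert gains weight and the third loses weight, while the result remains a valid distribution over the k experts. Omitting the proxy corresponds to using `base` unchanged.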
The proxy is operationally independent of shared
weights at inference time, as it is a self-contained,
deletable artifact whose removal eliminates all
user-specific influence from the system. However, note
that the personal LoRA is conditioned on the shared
model during DPO, where the base model serves as the
reference, so the proxy’s content reflects shared model
state even though no user information flows in the re-
verse direction.
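
To make the composition concrete, the following is a minimal sketch of Equation 1 and Equation 3 on tiny dense matrices in pure Python. It is not the paper's implementation: the helper names are hypothetical, and adding the personal LoRA with unit weight is an assumption, since the text does not specify how the personal adapter is weighted in the merge. The point of the sketch is that omitting the proxy reproduces the shared-only weights exactly.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]
LoRA = Tuple[Matrix, Matrix]  # (B, A) with B of shape d x r and A of shape r x d


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add_scaled(acc: Matrix, delta: Matrix, scale: float) -> Matrix:
    """Return acc + scale * delta without mutating either input."""
    return [[a + scale * d for a, d in zip(ra, rd)] for ra, rd in zip(acc, delta)]


def merged_weight(w_base: Matrix, experts: Sequence[LoRA], w: Sequence[float],
                  personal: Optional[LoRA] = None) -> Matrix:
    """Equation 1, optionally followed by the personal LoRA residual."""
    merged = [row[:] for row in w_base]  # copy: the shared base is never modified
    for w_i, (b_i, a_i) in zip(w, experts):
        merged = add_scaled(merged, matmul(b_i, a_i), w_i)
    if personal is not None:
        b_u, a_u = personal
        merged = add_scaled(merged, matmul(b_u, a_u), 1.0)  # assumption: unit weight
    return merged


def steer(h: Sequence[float], s_u: Optional[Sequence[float]], gamma: float = 1.0) -> List[float]:
    """Equation 3: h <- h + gamma * s_u; a no-op when no proxy is loaded."""
    if s_u is None:
        return list(h)
    return [x + gamma * s for x, s in zip(h, s_u)]


# Toy example with d = 2, r = 1 and two experts (illustrative values only).
W_BASE = [[1.0, 0.0], [0.0, 1.0]]
EXPERTS = [([[1.0], [0.0]], [[0.0, 1.0]]), ([[0.0], [1.0]], [[1.0, 0.0]])]
PERSONAL = ([[1.0], [1.0]], [[0.5, 0.5]])

shared_only = merged_weight(W_BASE, EXPERTS, [0.75, 0.25])
with_proxy = merged_weight(W_BASE, EXPERTS, [0.75, 0.25], personal=PERSONAL)
```

Because `merged_weight` copies the base matrix and treats the proxy as an optional argument, calling it without `personal` is the same code path with the proxy simply not loaded.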
2.3 Inference Pipeline
Given query q from user u, inference proceeds in five
stages that combine the three layers into a single generation
pass:
1. Route. A lightweight router classifies q into a domain distribution w0 ∈ Δ^k over the k experts.
2. Bias. The user’s routing bias is applied via Equation 2, shifting expert selection toward the user’s preferred domains based on their accumulated interaction history.
3. Merge. The weighted expert adapters and personal LoRA are combined into a single merged adapter applied to the base model.
4. Steer. Forward hooks inject the user’s steering vectors γ s_u^ℓ at layers ℓ ∈ L via Equation 3, modifying activations without changing any weights.
5. Generate. Standard autoregressive decoding with the merged model produces the personalized output.

2.4 Deletion Protocol
SEA’s deletion protocol exploits the architectural in-
variant (Invariant 1) to reduce user removal to a simple
filesystem operation with statistical verification.
The
key challenge we address is establishing that removing
a user’s proxy artifact fully eliminates all user-specific
influence on model behavior.

To delete user u, the protocol proceeds in three steps:
1. Verify. On held-out domain-generic prompts (not
user-specific, to avoid circular verification): gener-
ate outputs in omission mode (proxy not loaded)
and compare token-frequency distributions against a
cached non-personalized baseline (base model + ex-
perts, no proxy) via KL divergence.
Verification uses
a noise-calibrated threshold: the inter-sample KL di-
vergence among unpersonalized generations provides
an empirical noise floor ˆσKL for stochastic decoding,
and bypass is confirmed when
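$$D_{\mathrm{KL}}\left(p_{\text{unpers}} \,\|\, p_{\text{baseline}}\right) \le \max\left(2\,\hat{\sigma}_{\mathrm{KL}},\ \tau_{\min}\right) \tag{4}$$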
where τmin = 0.15 nats is a hard floor that prevents
unreasonably tight thresholds on low-variance queries. This makes verification self-calibrating: queries with
high stochastic variance receive a proportionally
wider acceptance band, eliminating false failures from
sampling noise without weakening the guarantee for
stable queries.
2. Delete. Secure filesystem removal of the proxy directory Pu (zero-overwrite).
3. Audit. Log the deletion event, verification result,
and timestamp for compliance trail.
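
The three steps above can be sketched as follows. This is a simplified illustration rather than the paper's implementation: whitespace tokenization, additive smoothing, the mean-pairwise estimator for the noise floor, the JSON-lines audit format, and all function names are assumptions, and the plain `shutil.rmtree` call does not perform the zero-overwrite that the protocol specifies.

```python
import json
import math
import shutil
import time
from collections import Counter
from itertools import permutations
from pathlib import Path
from typing import Dict, Sequence


def token_freq(text: str) -> Counter:
    """Whitespace-token counts; a stand-in for the token-frequency distribution."""
    return Counter(text.lower().split())


def kl_divergence(p: Counter, q: Counter, eps: float = 1e-6) -> float:
    """D_KL(p || q) in nats over the joint vocabulary, with additive smoothing (assumption)."""
    vocab = set(p) | set(q)
    p_total = sum(p.values()) + eps * len(vocab)
    q_total = sum(q.values()) + eps * len(vocab)
    kl = 0.0
    for tok in vocab:
        p_i = (p[tok] + eps) / p_total
        q_i = (q[tok] + eps) / q_total
        kl += p_i * math.log(p_i / q_i)
    return kl


def noise_floor(samples: Sequence[str]) -> float:
    """Mean pairwise KL among unpersonalized generations (assumed estimator of sigma-hat)."""
    freqs = [token_freq(s) for s in samples]
    pairs = list(permutations(freqs, 2))
    if not pairs:
        return 0.0
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


def verify(unpersonalized: str, baseline: str, sigma_hat: float,
           multiplier: float = 2.0, tau_min: float = 0.15) -> bool:
    """Equation 4: pass when KL <= max(multiplier * sigma_hat, tau_min)."""
    kl = kl_divergence(token_freq(unpersonalized), token_freq(baseline))
    return kl <= max(multiplier * sigma_hat, tau_min)


def delete_user(proxy_dir: Path, verified: bool, audit_log: Path) -> Dict[str, object]:
    """Steps 2 and 3: remove the proxy directory, then append an audit record."""
    existed = proxy_dir.exists()
    if existed:
        # Plain recursive removal; the zero-overwrite step is NOT implemented here.
        shutil.rmtree(proxy_dir)
    record = {
        "proxy": proxy_dir.name,
        "verified": verified,
        "deleted": existed and not proxy_dir.exists(),
        "timestamp": time.time(),
    }
    with audit_log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

A caller would first run `verify` on outputs generated with the proxy omitted, then call `delete_user` with the verification result so that the audit record captures both.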
The architectural separation produces a direct payoff
here. Without the proxy, the system’s behavior is struc-
turally equivalent in expectation to the non-personalized
baseline.
The same code paths execute with the same
weights, with the proxy simply not loaded. Verifica-
tion exploits this architectural equivalence: omitting
the proxy at inference time is functionally identical to
deleting it, so the verify step confirms deletion behavior
before the irreversible delete step.
The KL-divergence
verification is therefore a sanity check confirming the
architectural guarantee, not the privacy guarantee itself. The guarantee comes from the invariant: user informa-
tion exists only in the proxy, and the proxy has been
deleted.
Cached baselines must be refreshed whenever
shared components (base model or expert adapters) are
updated; if a new base model is deployed, personal LoRA
adapters must be regenerated.

3 Experimental Setup
We evaluate SEA across two base models, four domain
experts, and four synthetic user profiles, targeting three
evaluation dimensions: personalization quality, deletion
completeness, and cross-user isolation.
We first describe
the experimental configuration and then present the
results.

Models.
We use two base models: Phi-3.5-mini-
instruct (3.8B parameters) and Llama-3.1-8B-Instruct,
both loaded in 4-bit NormalFloat (NF4) quantization via
QLoRA [17]. These models span a range of parameter
counts to test whether the architectural properties hold
across model scales.
Expert Adapters. Four domain experts (k = 4)
are trained via supervised fine-tuning with TRL [26],
all using rank 32, scaling factor α = 64, applied to
all attention projections (query, key, value, output):
Security (Trendyol + OWASP-NVD, ∼76K examples),
Code (CodeAlpaca + supplementary code instruction
sets, capped at ∼50K examples), Data (synthetic text-
to-SQL), and General (Alpaca, ∼52K examples).
These
experts are shared across all users and contain domain
knowledge only.

Synthetic User Profiles. Four user profiles
(security_expert, casual_coder, data_analyst,
general_user) are each defined by domain affinity
weights and positive/negative style traits. Proxy ar-
tifacts are generated through three mechanisms: rout-
ing bias via EMA from simulated interaction patterns
(λ = 0.5), steering vectors via CAA from trait-aligned
preference pairs at layers L = {12, 16, 20} with strength
γ = 1.0, and personal LoRA (rank 4) via DPO [27]
on preference pairs, using the base model as the DPO
reference.
The total proxy size is approximately 2–5 MB
per user.

Routing and Composition.
The expert router
uses zero-shot entailment-based classification [28] with
BART-MNLI [29] and a keyword-based fallback (softmax
temperature T = 2.0 for the fallback path). Adapter merging uses PEFT’s add_weighted_adapter
with combination_type="linear" and a load-once life-
cycle with deferred cleanup.
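
As an illustration of the fallback path only, the sketch below turns keyword counts into a domain distribution with a temperature-scaled softmax. The keyword lists, the counting rule, and the function name are hypothetical, since the paper does not publish them; only the temperature T = 2.0 is taken from the text.

```python
import math
from typing import Dict, List

# Hypothetical keyword lists for the four domains (not from the paper).
KEYWORDS: Dict[str, List[str]] = {
    "security": ["vulnerability", "exploit", "cve", "attack"],
    "code": ["function", "bug", "compile", "python"],
    "data": ["sql", "join", "table", "query"],
    "general": [],
}


def keyword_fallback(query: str, temperature: float = 2.0) -> Dict[str, float]:
    """Softmax over per-domain keyword counts, softened by the temperature."""
    tokens = query.lower().split()
    scores = {d: float(sum(tokens.count(k) for k in kws)) for d, kws in KEYWORDS.items()}
    top = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {d: math.exp((s - top) / temperature) for d, s in scores.items()}
    z = sum(exps.values())
    return {d: e / z for d, e in exps.items()}


routing = keyword_fallback("select rows with a sql join")
```

A temperature above 1 flattens the distribution, so a keyword hit shifts weight toward a domain without assigning it all of the routing mass.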
Evaluation Protocol. We conduct 70 evaluation
runs per model (140 total) across 20 evaluation prompts
(5 per domain).¹ Cached baselines ensure consistency
across runs, and 95% confidence intervals are reported
via the t-distribution.

¹ Each evaluation run generates 7 bypass observations (a subset
of query-user combinations selected from the held-out verification
prompts). Phi-3.5-mini completed 68 runs (476 observations);
Llama-3.1-8B completed 70 runs (490 observations). Two early
Phi-3.5-mini runs were configuration tests that produced no bypass
data.

Style trait match. Style trait match is defined as
the number of target style keywords detected in a personalized
generation. Each user profile specifies a set of
positive style traits as keywords (e.g., terms associated
with verbosity, technical depth, or domain-specific vocabulary),
and the metric counts how many appear in each
output. The reported value is the mean count across all
prompt-user-run observations (1,904 for Phi-3.5-mini,
1,960 for Llama-3.1-8B). The scale is profile-dependent:
the security expert profile achieves a mean of 3.01 (Phi)
and 1.02 (Llama), while the general user profile averages
0.21 and 0.28 respectively. Keyword presence is a
necessary but not sufficient indicator of style alignment,
as a response containing a target keyword may use it in
a non-stylistic context. The metric should therefore be
understood as a lower bound on non-match rather than
a calibrated measure of style fidelity.

4 Results
We organize results around three claims that jointly aim
to validate the architectural design. First, we show that
the proxy achieves measurable personalization (§4.1),
second, that proxy removal restores baseline behavior
(§4.2), and third, that no cross-user leakage occurs (§4.3).
Together, these claims address the central question of
whether architectural separation can simultaneously deliver
personalization, deletability, and isolation.

4.1 Personalization
The proxy measurably adapts model outputs without
modifying shared weights. Table 1 shows three distinct
findings. First, routing bias successfully shifts expert
selection toward each user’s preferred domain (weight
shift 0.052–0.088). Second, Jaccard similarity to the
non-personalized baseline is low (0.236–0.316), indicating
substantial output differentiation. Third, style trait
matching is stronger for Phi-3.5-mini (1.71) than
Llama-3.1-8B (0.63), an observed difference between these two
specific models that should not be attributed to model
size given N=2 and multiple confounds.

Table 1: Personalization metrics across both base models.
Weight shift measures the routing bias effect on expert selection.
Jaccard similarity to baseline measures output overlap
(lower = more personalized). Style trait match measures
alignment with target user traits.
Metric | Phi-3.5-mini | Llama-3.1-8B
Weight shift | 0.052 ± 0.002 | 0.088 ± 0.003
Jaccard similarity | 0.236 ± 0.005 | 0.316 ± 0.005
Style trait match | 1.710 ± 0.101 | 0.629 ± 0.040

Figure 2: Distribution of unpersonalized-to-baseline KL-divergence scores across all prompt-user combinations for both base
models (476 observations for Phi-3.5-mini, 490 for Llama-3.1-8B). Dashed lines mark the per-model mean. Verification uses
a noise-calibrated per-query threshold (Equation 4) rather than a fixed cutoff, so no single threshold line is shown. The KL
distribution is bimodal rather than gradual: verified observations cluster in [0.00, 0.30] and failures in [0.30, 0.94], with no
ambiguous intermediate population. This sharp boundary is consistent with the structural guarantee, as proxy removal
either fully eliminates user influence (the common case) or generation variance produces an outlier sample (the failure case),
with no evidence of partial leakage.

The three-mechanism proxy thus achieves moderate-to-strong
personalization for Phi-3.5-mini and moderate
personalization for Llama-3.1-8B, without touching
shared weights. The personalization is present but deliberately
moderate in scope, a consequence of the rank-4
constraint on the personal LoRA, which is the price of
deletability and a central tradeoff of our design. More
expressive adapters would capture richer user preferences
but would require more parameters, increasing proxy
size and reducing the clarity of the separation guarantee.
The security expert profile produces the strongest
personalization signal (mean style trait match 3.01 on
Phi-3.5-mini, with individual observations reaching 12),
yet bypass verification for this profile’s queries passes
at rates comparable to lower-personalization profiles.
The architecture does not trade deletion reliability for
personalization intensity.
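
For concreteness, the two output-level metrics reported in Table 1 could be computed along the following lines. This is a sketch under stated assumptions: the paper does not specify its tokenization, so word-level sets are assumed for Jaccard similarity, and the function names and example keywords are hypothetical.

```python
import re
from statistics import mean
from typing import Iterable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two outputs (lower = more personalized)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty outputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def style_trait_match(text: str, keywords: Iterable[str]) -> int:
    """Number of target style keywords that appear at least once in the output."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return sum(1 for k in keywords if k.lower() in tokens)


def mean_style_trait_match(outputs: Sequence[str], keywords: List[str]) -> float:
    """Mean keyword count across observations, as reported in Table 1."""
    return mean(style_trait_match(o, keywords) for o in outputs)


# Hypothetical keywords for a security-oriented profile (illustration only).
SECURITY_TRAITS = ["threat", "mitigation", "exploit"]
example = "Use a threat model and a mitigation plan."
```

As the text notes, a keyword hit does not establish that the keyword is used in the intended stylistic sense, so this count is a coarse indicator rather than a calibrated measure.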
4.2 Separability
Next, we find that proxy removal restores baseline be-
havior, which confirms the architectural invariant. Ta-
ble 2 shows two main results.
First, mean KL diver-
gence between unpersonalized and baseline outputs is
approximately 0.21 nats for both models. Second, the
82–89% noise-calibrated verification pass rate indicates
that the vast majority of prompt-user combinations pro-
duce outputs statistically indistinguishable from the
non-personalized baseline after proxy removal.

Table 2: Deletion verification metrics. Verification pass
rate is the fraction of prompt-user combinations where the
unpersonalized-to-baseline KL divergence falls within the
noise-calibrated threshold (Equation 4).
Metric | Phi-3.5-mini | Llama-3.1-8B
Verified pass rate | 0.819 ± 0.035 | 0.892 ± 0.028
KL divergence (nats) | 0.217 ± 0.012 | 0.212 ± 0.006

Figure 2 shows the distribution of KL-divergence
scores across all prompt-user combinations. Importantly,
the deletion itself is deterministic and complete, as the
proxy files are removed and the shared weights are un-
touched.
The KL verification is a separate measurement
that compares stochastic outputs from finite-length gen-
erations. By calibrating the acceptance threshold against
the empirical inter-sample noise floor per query, the ver-
ification procedure accounts for the inherent variance of
stochastic decoding: Queries that naturally produce high
output variance receive a proportionally wider threshold,
while stable queries are held to a tighter standard.
The 11–18% of cases that still exceed the noise-calibrated
threshold likely reflect edge cases where generation variance
is unusually high relative to the measured noise
floor, not residual user influence in the weights.² The
deletion verification thus provides empirical confirmation
of the architectural guarantee, though the guarantee
itself rests on the structural invariant rather than the
verification metric.

² A small number of Phi-3.5-mini observations produced degenerate
(near-empty) outputs due to an inference configuration
issue that did not affect Llama-3.1-8B runs. These observations
yield artificially low KL values and are retained in the reported
statistics for transparency. Filtering them would increase the
mean KL slightly and marginally reduce the reported pass rate
for Phi-3.5-mini.

Threshold sensitivity. The verification pass rate
reported above depends on the 2σ̂KL multiplier in Equation 4.
Table 3 shows how the pass rate varies across
multiplier settings. The hard floor τmin is inert across
the tested range [0.10, 0.25] because the empirical noise
floor σ̂KL ≈ 0.15 nats is stable across all query-user
pairs (range [0.146, 0.157]), making the multiplier the
sole active control. The floor would activate only if σ̂KL
dropped below τmin/mult (approximately 0.075 nats
at the paper’s 2σ, τmin = 0.15 configuration), which
does not occur in this data. A single multiplier parameter
therefore suffices for threshold calibration. This
cross-query, cross-user, cross-model consistency was not
guaranteed by the architecture and constitutes an empirical
finding: the stochastic decoding noise floor is a
property of the generation process, not of the personalization
mechanism, which is what a structurally clean
separation should produce.

Table 3: Verification pass rate by σ multiplier. The chosen
2σ configuration (marked with *) sits in the moderate region of a
monotonic curve. Stricter deployments could tighten to
1.5σ at the cost of more false failures; those prioritizing
operational stability could relax to 2.5σ.
Multiplier | Phi-3.5-mini (n=476) | Llama-3.1-8B (n=490)
1.0σ | 0.239 | 0.167
1.5σ | 0.513 | 0.600
2.0σ * | 0.819 | 0.892
2.5σ | 0.929 | 0.984
3.0σ | 0.971 | 0.994

Pass rates increase monotonically with no disconti-
nuities. The deletion guarantee is independent of these
parameters, as this analysis characterizes verification
sensitivity as opposed to deletion completeness.
The KL
distributions across all observations have mean 0.218
(Phi) and 0.213 (Llama), with standard deviations of
0.132 and 0.070 respectively. Phi-3.5-mini has a heavier
right tail (95th percentile 0.402 vs 0.340), which explains
its lower pass rate at the same threshold.
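
The arithmetic behind the threshold sensitivity analysis can be checked with a short sketch. The function names are hypothetical, and the noise floor is fixed at the approximate value of 0.15 nats reported above rather than estimated per query.

```python
from typing import Dict


def acceptance_threshold(sigma_hat: float, multiplier: float = 2.0, tau_min: float = 0.15) -> float:
    """Right-hand side of Equation 4."""
    return max(multiplier * sigma_hat, tau_min)


def floor_is_active(sigma_hat: float, multiplier: float = 2.0, tau_min: float = 0.15) -> bool:
    """True when the hard floor tau_min, not the multiplier term, sets the threshold."""
    return multiplier * sigma_hat < tau_min


# Thresholds in nats for the multipliers of Table 3, assuming sigma_hat = 0.15.
SIGMA_HAT = 0.15
thresholds: Dict[float, float] = {
    m: round(acceptance_threshold(SIGMA_HAT, m), 3) for m in (1.0, 1.5, 2.0, 2.5, 3.0)
}
```

With the 2σ multiplier the threshold is 0.30 nats, which matches the boundary between verified and failed observations described in the Figure 2 caption, and the floor only takes over once the noise floor drops below 0.075 nats.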
4.3 Isolation
Moreover, our results suggest that no cross-user leak-
age occurs between proxies. Table 4 shows very low
levels of contamination: 0.009 and 0.049 for Phi-3.5-
mini and Llama-3.1-8B respectively, suggesting that one
user’s proxy does not influence another user’s outputs.
Cross-user output similarity is moderate (0.27–0.35) but
expected, as users share the same base model and expert
adapters. This similarity is structural and not leakage,
reflecting the shared foundation rather than cross-user
information flow.

Table 4: Cross-user isolation metrics. Contamination measures
excess inter-user similarity beyond the shared baseline.
Metric | Phi-3.5-mini | Llama-3.1-8B
Contamination | 0.009 ± 0.002 | 0.049 ± 0.005
Cross-user similarity | 0.271 ± 0.010 | 0.351 ± 0.007

Since proxies exist as isolated filesystem artifacts with
no shared mutable state, this result follows from the
architecture. However, we include it as empirical ver-
ification that the isolation invariant holds in practice
under realistic generation conditions.
Summary. Taken together, the three claims are sup-
ported across both models with some between-model het-
erogeneity: Phi-3.5-mini shows stronger personalization
and isolation, while Llama-3.1-8B shows stronger dele-
tion verification rates.
Llama-3.1-8B achieves a higher
verification pass rate (89.2% vs 81.9%) with a substan-
tially tighter KL distribution (std 0.070 vs 0.132), indi-
cating that the deletion properties of the architecture do
not degrade at the larger model scale. This shows that
architectural separation achieves personalization with
verified deletion and clean isolation, while the tradeoff
between personalization expressiveness and deletability
is explicit.
The proxy’s tunable parameters (personal
LoRA rank, steering strength γ, routing bias scale λ)
define a configuration space that could be explored to
characterize this tradeoff, though the current evaluation
uses a single configuration throughout.

5 Discussion

Contribution.
SEA sidesteps the machine unlearning
problem rather than solving it. Machine unlearning
is fundamentally hard because it attempts to undo an
irreversible operation, the entanglement of user-specific
gradients with shared weights.
Even the most promising
methods either require retraining or cannot guarantee
complete removal. Architectural separation prevents en-
tanglement in the first place, converting an intractable
algorithmic problem into a tractable engineering one.
The core tradeoff is explicit: A low-rank personal LoRA
is less expressive than full fine-tuning, but the three-
mechanism proxy compensates for this by providing
complementary personalization channels (routing bias
for domain preferences, steering vectors for stylistic pref-
erences, and personal LoRA for residual patterns). The
architecture’s parameters (personal LoRA rank, steering
strength γ, routing bias scale λ) define a per-deployment
configuration space in which personalization fidelity can
be traded against proxy size and separation clarity.
Char-
acterizing this tradeoff empirically, for instance by com-
paring rank-4 against rank-8 or rank-16 personal LoRA
under the same deletion protocol, remains future work. A notable consequence of the separation invariant is that
shared model components (the base model and expert
adapters) can be released or audited without risk of
user data exposure, since no user-specific information
enters shared weights by construction.
Moreover, our approach requires designing
the system with deletion in mind from the start; it
cannot be retrofitted to existing models where user data
has already been absorbed into weights.
Findings. Our evaluation across two base models
shows three main results. First, the personal proxy
produces measurable personalization: users receive
responses that reflect their domain preferences and
stylistic tendencies, with consistent shifts in routing
weights and style trait alignment.
Second, deletion verification works: when a user’s proxy is removed, the
system’s outputs return to baseline behavior in 82–89%
of test cases, with the remaining failures attributable
to normal generation randomness rather than lingering
user influence (the architecture structurally guarantees
that no trace of the user persists). Third, user isolation
holds: one user’s proxy does not detectably influence
another user’s outputs (contamination ≤ 0.05 in point
estimates).
These results come with the inherent tradeoff
that deletability limits how deeply the system can per-
sonalize, since user data must remain separable rather
than being absorbed into shared model weights. We
view this as a reasonable price for deployments where
data deletion rights must be honored.
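The deletion check summarized above can be sketched as a KL comparison between baseline and post-deletion output distributions. The direction of the divergence, the smoothing constant, and the pass threshold are assumptions made for this illustration; the function names are hypothetical and the toy distributions are not evaluation data.

```python
import math
from typing import List, Sequence, Tuple


def kl_nats(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def deletion_pass_rate(baseline: Sequence[Sequence[float]],
                       post_deletion: Sequence[Sequence[float]],
                       threshold: float) -> Tuple[float, List[float]]:
    """Fraction of test cases whose post-deletion output stays within
    `threshold` nats of the baseline output, plus the per-case KL values."""
    kls = [kl_nats(b, d) for b, d in zip(baseline, post_deletion)]
    passed = sum(1 for k in kls if k <= threshold)
    return passed / len(kls), kls


# Toy distributions over a two-token vocabulary (illustrative only).
baseline_dists = [[0.5, 0.5], [0.9, 0.1]]
post_dists = [[0.5, 0.5], [0.5, 0.5]]
rate, per_case = deletion_pass_rate(baseline_dists, post_dists, threshold=0.3)
```

The first toy case matches the baseline and passes; the second diverges by more than the assumed threshold and fails. A practical protocol would also need to separate sampling noise from residual user influence, for example by comparing against the KL between two baseline runs.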
Limitations and future work. Several limitations
constrain the current evaluation. First, the synthetic user
profiles used here are placeholders for real-world prefer-
ences, and the four profiles are aligned to four distinct
domains, representing the easiest possible configuration
for isolation testing; overlapping-domain profiles (e.g.,
two security-focused users with different stylistic pref-
erences) would provide a harder and more realistic test
of cross-user isolation, though the structural separation
guarantee is unaffected by profile design. The metrics
(Jaccard similarity, keyword matching) capture basic
textual overlap rather than subjective personalization
quality as perceived by users; they were chosen to
demonstrate the proof of concept.
Second, the evaluation at 3.8–8B
parameter scale is not intended to generalize to larger
models, though the architectural invariant (separation of
user data into a deletable proxy) holds by construction
regardless of model size. Third, the current evalua-
tion does not include an ablation study isolating the
contribution of each proxy component (routing bias,
steering vectors, personal LoRA individually); such an
ablation would clarify which mechanisms drive person-
alization and deletion properties and is a natural next
step.
Additionally, while architectural separation elim-
inates the risk of user data being entangled in shared
weights, the proxy artifact concentrates user behavioral
information into a portable representation, creating an
attack surface where an attacker need only exfiltrate
a single directory rather than extract user influence
from distributed weights. For open-source base mod-
els, including both models evaluated in this paper, an
exfiltrated proxy could be loaded directly against a lo-
cal copy.
Non-transferability of exfiltrated proxies is
therefore a hypothesis requiring empirical validation
through cross-model transfer experiments, not a default
assumption. Securing proxy artifacts through encryp-
tion at rest, access controls, and retention policies is
necessary for end-to-end privacy and should be treated
as a deployment requirement.
Tractable deletion is also
a dual-use capability: the same mechanism that
enables personal data removal can just as easily be applied
to remove other content or proprietary knowledge from
the model, with implications for compliance auditing
that merit careful analysis. Lastly, expert adapter
training may not have converged, as loss plateaus were
not reached during the experiments, suggesting that
additional training could improve adapter quality.
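For concreteness, the overlap metrics named among the limitations above can be sketched as follows. Whitespace tokenization, lowercasing, and the function names are assumptions for this illustration; the evaluation's exact tokenization and keyword lists are not specified in this section.

```python
from typing import Iterable


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase whitespace-token sets of two texts."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def keyword_match_rate(text: str, keywords: Iterable[str]) -> float:
    """Fraction of profile keywords that appear as tokens in the text."""
    tokens = set(text.lower().split())
    wanted = [k.lower() for k in keywords]
    if not wanted:
        return 0.0
    return sum(1 for k in wanted if k in tokens) / len(wanted)


overlap = jaccard("a b c", "b c d")
hits = keyword_match_rate("rotate keys and audit logs", ["audit", "keys", "firewall"])
```

Both functions measure surface overlap only, which is why they cannot stand in for personalization quality as judged by users.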
The most immediate extension is applying DP-SGD
to the gradient aggregation stage when updating shared
expert adapters from user interaction data, which the
architecture already supports by construction. Three
practical constraints govern this extension: the com-
putational overhead of per-sample gradient clipping,
accelerated privacy budget exhaustion under sequential
composition, and utility degradation in low-ε regimes.
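The clipping and noising step that DP-SGD would add to adapter aggregation can be sketched as below, in the style of the standard Gaussian mechanism. Treating each user's LoRA update as one flat vector, the function names, and the parameter values are assumptions for this illustration; the sketch performs no privacy accounting and does not by itself establish any ε guarantee.

```python
import math
import random
from typing import List, Sequence


def clip_update(update: Sequence[float], max_norm: float) -> List[float]:
    """Scale an update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_aggregate(updates: Sequence[Sequence[float]], max_norm: float,
                 noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip each update, sum, add Gaussian noise scaled to the clip norm, average."""
    clipped = [clip_update(u, max_norm) for u in updates]
    dim = len(clipped[0])
    summed = [sum(u[i] for u in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise std on the summed update
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(updates) for v in noisy]


rng = random.Random(0)
user_updates = [[3.0, 4.0], [0.3, 0.4]]  # toy per-user LoRA deltas
noiseless = dp_aggregate(user_updates, max_norm=1.0, noise_multiplier=0.0, rng=rng)
private = dp_aggregate(user_updates, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

The per-update clipping loop is the source of the computational overhead noted above, and the noise scale relative to the averaged signal illustrates why utility degrades as ε shrinks.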
Aggregating LoRA updates across a large user pop-
ulation prior to noise injection could provide privacy
amplification, since individual contributions to the aggre-
gate gradient would be attenuated by population scale. However, formal privacy amplification results depend on
specific mathematical conditions, including Poisson sub-
sampling of participants, bounded per-sample sensitivity,
and particular composition theorems [30, 31], none of
which have been verified for this architecture.
Whether
SEA’s gradient aggregation satisfies these conditions,
and whether the resulting ε-utility tradeoff is favorable
in practice, are open empirical questions that require
measuring privacy loss under varying ε and population-
size configurations through empirical attacks (model
inversion, membership inference) against the updated
shared model. Beyond DP-SGD, scaling to production
multi-tenant workloads via adapter-serving frameworks
such as S-LoRA and Punica, validating the privacy
guarantees through longitudinal studies with real users
and adversarial probes, and characterizing the tradeoff
between personalization depth and proxy size are all
natural next steps.
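To give a sense of scale for the amplification and composition effects discussed above, the following sketch evaluates two textbook bounds for pure ε-differential privacy: amplification under Poisson subsampling with rate q, and basic sequential composition. These hold only when their preconditions are met, which has not been verified for this architecture; the function names and numeric values are illustrative, and tighter accounting (for example via Rényi differential privacy) would give different numbers.

```python
import math


def amplified_epsilon(eps: float, q: float) -> float:
    """Epsilon after Poisson subsampling with rate q: log(1 + q * (e^eps - 1))."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    return math.log1p(q * math.expm1(eps))


def basic_composition(eps_per_round: float, rounds: int) -> float:
    """Basic sequential composition: per-round epsilons add up."""
    return eps_per_round * rounds


per_round = amplified_epsilon(eps=1.0, q=0.01)     # 1% of users sampled per update
total = basic_composition(per_round, rounds=100)   # 100 sequential adapter updates
```

With these toy numbers the per-round cost is small, yet it accumulates linearly over updates, which is the budget-exhaustion constraint named earlier in this section.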
References
[1] Alireza Salemi, Sheshera Mysore, Michael Bendersky, and
Hamed Zamani. LaMP: When large language models meet
personalization. In Proceedings of the 62nd Annual Meeting of
the Association for Computational Linguistics (ACL), 2024.
URL https://arxiv.org/abs/2304.11406.
[2] Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang,
Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi,
and Prithviraj Ammanabrolu. Personalized soups: Personalized
large language model alignment via post-hoc parameter merging.
In Advances in Neural Information Processing Systems, 2023.
URL https://arxiv.org/abs/2310.11564.
[3] Xinyu Li, Ruiyang Zhou, Zachary C. Lipton, and Leqi Liu.
Personalized language modeling from personalized human
feedback. arXiv preprint arXiv:2402.05133, 2024.
URL https://arxiv.org/abs/2402.05133.
[4] Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta,
and Natasha Jaques. Personalizing reinforcement learning from
human feedback with variational preference learning. In
Advances in Neural Information Processing Systems 37
(NeurIPS 2024), 2024. URL https://arxiv.org/abs/2408.10075.
[5] Lucas Bourtoule, Varun Chandrasekaran, Christopher A.
Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang,
David Lie, and Nicolas Papernot. Machine unlearning. In 2021
IEEE Symposium on Security and Privacy (SP), 2021.
URL https://arxiv.org/abs/1912.03817.
[6] Aditya Golatkar, Alessandro Achille, and Stefano Soatto.
Eternal sunshine of the spotless net: Selective forgetting in
deep networks. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), pages
9304–9312, 2020. URL https://arxiv.org/abs/1911.04933.
[7] Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative
preference optimization: From catastrophic collapse to
effective unlearning. In Conference on Language Modeling
(COLM 2024), 2024. URL https://arxiv.org/abs/2404.05868.
[8] Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue,
Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin
Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan
Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu,
Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu,
Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao,
Ariel Herbert-Voss, Cort B. Breuer, Samuel Marks, Oam Patel,
Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin,
Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper
Talley, John Guan, Russell Kaplan, Ian Steneker, David
Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William
Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz,
Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay
Varadharajan, Ruoyu Wang, Yan Shoshitaishvili, Jimmy Ba,
Kevin M. Esvelt, Alexandr Wang, and Dan Hendrycks. The WMDP
benchmark: Measuring and reducing malicious use with
unlearning. In Proceedings of the 41st International
Conference on Machine Learning (ICML), 2024.
URL https://arxiv.org/abs/2403.03218.
[9] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model
inversion attacks that exploit confidence information and
basic countermeasures. In Proceedings of the 2015 ACM SIGSAC
Conference on Computer and Communications Security (CCS ’15),
2015. doi: 10.1145/2810103.2813677.
[10] Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew
Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts,
Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin
Raffel. Extracting training data from large language models.
In 30th USENIX Security Symposium, 2021.
URL https://arxiv.org/abs/2012.07805.
[11] Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew
Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A.
Choquette-Choo, Eric Wallace, Florian Tramèr, and Katherine
Lee. Scalable extraction of training data from (production)
language models. arXiv preprint arXiv:2311.17035, 2023.
URL https://arxiv.org/abs/2311.17035.
[12] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly
Shmatikov. Membership inference attacks against machine
learning models. In 2017 IEEE Symposium on Security and
Privacy (SP), pages 3–18. IEEE, 2017. doi: 10.1109/SP.2017.41.
URL https://arxiv.org/abs/1610.05820.
[13] Antonio Ginart, Melody Y. Guan, Gregory Valiant, and James
Zou. Making AI forget you: Data deletion in machine learning.
In Advances in Neural Information Processing Systems
(NeurIPS), volume 32, 2019.
URL https://arxiv.org/abs/1907.05012.
[14] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac
machine learning. In Proceedings of the AAAI Conference on
Artificial Intelligence, volume 35, pages 11516–11524, 2021.
URL https://arxiv.org/abs/2010.10981.
[15] Ronen Eldan and Mark Russinovich. Who’s Harry Potter?
Approximate unlearning in LLMs. In International Conference on
Learning Representations (ICLR 2024), 2024.
URL https://arxiv.org/abs/2310.02238.
[16] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan
Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.
LoRA: Low-rank adaptation of large language models. In
International Conference on Learning Representations
(ICLR 2022), 2022. URL https://arxiv.org/abs/2106.09685.
[17] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke
Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In
Advances in Neural Information Processing Systems 36
(NeurIPS 2023), 2023. URL https://arxiv.org/abs/2305.14314.
[18] Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang,
Chao Du, and Min Lin. LoraHub: Efficient cross-task
generalization via dynamic LoRA composition. In Conference on
Language Modeling (COLM 2024), 2024.
URL https://arxiv.org/abs/2307.13269.
[19] Jinghan Zhang, Shiqi Chen, Junteng Liu, and Junxian He.
Composing parameter-efficient modules with arithmetic
operations. In Advances in Neural Information Processing
Systems (NeurIPS), 2023. URL https://arxiv.org/abs/2306.14870.
[20] Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman,
Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and
Ali Farhadi. Editing models with task arithmetic. arXiv
preprint arXiv:2212.04089, 2022.
URL https://arxiv.org/abs/2212.04089.
[21] Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper,
Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu,
Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, and Ion
Stoica. S-LoRA: Serving thousands of concurrent LoRA adapters.
In Proceedings of Machine Learning and Systems 6 (MLSys 2024),
2024. URL https://arxiv.org/abs/2311.03285.
[22] Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze,
and Arvind Krishnamurthy. Punica: Multi-tenant LoRA serving.
In Proceedings of Machine Learning and Systems 6 (MLSys 2024),
2024. URL https://arxiv.org/abs/2310.18547.
[23] Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong,
Evan Hubinger, and Alexander Matt Turner. Steering Llama 2 via
contrastive activation addition. In Proceedings of the 62nd
Annual Meeting of the Association for Computational
Linguistics (ACL), 2024. URL https://arxiv.org/abs/2312.06681.
[24] Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister,
and Martin Wattenberg. Inference-time intervention: Eliciting
truthful answers from a language model. In Advances in Neural
Information Processing Systems 36 (NeurIPS 2023), 2023.
URL https://arxiv.org/abs/2306.03341.
[25] Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan
McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep
learning with differential privacy. In Proceedings of the 2016
ACM SIGSAC Conference on Computer and Communications Security
(CCS ’16), 2016. URL https://arxiv.org/abs/1607.00133.
[26] Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward
Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang,
Kashif Rasul, and Quentin Gallouédec. TRL: Transformer
reinforcement learning, 2020.
URL https://github.com/huggingface/trl.
[27] Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano
Ermon, Christopher D. Manning, and Chelsea Finn. Direct
preference optimization: Your language model is secretly a
reward model. In Advances in Neural Information Processing
Systems 36 (NeurIPS 2023), 2023.
URL https://arxiv.org/abs/2305.18290.
[28] Wenpeng Yin, Jamaal Hay, and Dan Roth. Benchmarking
zero-shot text classification: Datasets, evaluation and
entailment approach. In Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing and the 9th
International Joint Conference on Natural Language Processing
(EMNLP-IJCNLP), pages 3914–3923, 2019.
URL https://arxiv.org/abs/1909.00161.
[29] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin
Stoyanov, and Luke Zettlemoyer. BART: Denoising
sequence-to-sequence pre-training for natural language
generation, translation, and comprehension. In Proceedings of
the 58th Annual Meeting of the Association for Computational
Linguistics (ACL 2020), 2020.
URL https://arxiv.org/abs/1910.13461.
[30] Borja Balle, Gilles Barthe, and Marco Gaboardi. Privacy
amplification by subsampling: Tight analyses via couplings and
divergences. Advances in Neural Information Processing
Systems, 31, 2018.
[31] Ilya Mironov. Rényi differential privacy. In 2017 IEEE
30th Computer Security Foundations Symposium (CSF), pages
263–275. IEEE, 2017.