PrivateFL: Accurate, Differentially Private Federated Learning via Personalized Data Transformation


Published: November 12, 2023

Paper: PRIVATEFL: Accurate, Differentially Private Federated Learning via Personalized Data Transformation

Venue: USENIX Security '23

GitHub: PrivateFL

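Main idea: Although FL keeps client data on the device, a dishonest server can still mount inference attacks, so DP is used for protection. Introducing DP, however, lowers FL accuracy. The underlying reason is that the different random noise DP adds for each client effectively increases heterogeneity across clients. To address this, the paper gives each client a personalized data transformation during local training, which compensates for the DP-induced heterogeneity and thereby improves the accuracy of the FL model.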

The paper is the first to attribute the FL accuracy drop to DP-induced heterogeneity: To the best of our knowledge, we are the first to associate FL accuracy degradation caused by DP with heterogeneity.

1 Introduction

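Risks in FL: malicious (or untrusted) server/client, honest-but-curious server/client
Three types of DP can be used (strongest $\rightarrow$ weakest):
- Local Differential Privacy (LDP): noise is added during each client's local model training
- Distributed Differential Privacy (DDP): noise is added to the client's local model (not during training)
- Central Differential Privacy (CDP): noise is added during aggregation at the central server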
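Problem: DP lowers FL accuracy
Why this happens:
- FL model accuracy is sensitive to the clients' data distribution
- FL models perform worse when data is distributed non-uniformly across clients than when it is distributed uniformly
- Observation: DP introduces additional heterogeneity across clients, which costs FL accuracy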
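Existing solutions:
- Using more data samples, better features, and different activation functions to offset the accuracy loss from DP; none of these reduces the client heterogeneity that DP introduces
- Combining personalized FL (pFL) with DP; this targets heterogeneity in the clients' training data, not the heterogeneity introduced by DP
  General training data heterogeneity is stable between each round, but DP-induced heterogeneity changes due to random noise added in each round.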
Proposed solution, PrivateFL:
- Optimizes utility (minimize learning loss and maximize local client's model utility)
- Satisfies DP
- Pluggable: it can be embedded in other models, because PrivateFL is essentially a data transformation

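The key idea of PrivateFL is to change each client's data distribution through a personalized data transformation, in order to tame the heterogeneity introduced by DP.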

Contributions:
- We find that the utility degradation of DP+FL is ==partially== due to additional heterogeneity introduced by DP.
- We design and implement PRIVATEFL, the first approach to tame heterogeneity introduced by DP and improve model utility via a personalized, optimized data transformation. Our implementation is open-source at this repository (https://github.com/BHui97/PrivateFL).
- We demonstrate that PRIVATEFL can be combined with personalized FL and other DP utility improvement methods to further improve FL’s utility with DP.

2 Problem Formulation

Preliminaries:
- DP concepts
- FL concepts
Threat model:
- Threats: Untrusted Server, Untrusted Client, Honest-But-Curious Server, Honest-But-Curious Client
- Models (strongest $\rightarrow$ weakest):
  - LDP+FL: defend against both untrusted server and untrusted clients
  - DDP+FL: defend against an honest-but-curious server and untrusted clients
    Noise is not added during training. Because the added noise is not enough to give the guarantee that LDP does, this setting is still exposed to attacks from the server
  - CDP+FL: defend against untrusted clients, but not an untrusted server
    because the server can access the clients' noise-free local models
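To make the three noise placements concrete, here is a minimal sketch of where clipping and Gaussian noise enter in each variant. It is an illustration under stated assumptions, not the paper's implementation: the function names (`clip`, `ldp_gradient`, `ddp_update`, `cdp_aggregate`), the parameters `max_norm`, `noise_multiplier`, and `std`, and the plain-list vectors are all hypothetical, and no privacy accounting is performed.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def add_gaussian_noise(vec, std, rng):
    """Add independent N(0, std^2) noise to every coordinate."""
    return [v + rng.gauss(0.0, std) for v in vec]


def ldp_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """LDP (DP-SGD style): noise enters inside local training.

    Clip each per-sample gradient, sum, add noise, then average.
    """
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    summed = [sum(col) for col in zip(*clipped)]
    noisy = add_gaussian_noise(summed, noise_multiplier * max_norm, rng)
    return [v / len(per_sample_grads) for v in noisy]


def ddp_update(local_update, max_norm, std, rng):
    """DDP: training is noise-free; the finished update is clipped and noised."""
    return add_gaussian_noise(clip(local_update, max_norm), std, rng)


def cdp_aggregate(client_updates, max_norm, std, rng):
    """CDP: the server receives raw updates, clips and averages, then adds noise."""
    clipped = [clip(u, max_norm) for u in client_updates]
    mean = [sum(col) / len(clipped) for col in zip(*clipped)]
    return add_gaussian_noise(mean, std, rng)
```

In the CDP function the server handles `client_updates` before any noise is added, which is why that setting offers no protection against an untrusted server.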
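Motivation: investigate why DP lowers FL accuracy. This part verifies that DP increases client heterogeneity.
- Method: in each FL training round, compute the training loss of each client's local model on its own training data. Each client thus has a distribution of local training losses over the FL rounds. In the ideal case these distributions are homogeneous: when the clients' local training data is iid and no DP is used during training, the clients' distributions are very similar. Then compute the earth mover's distance (EMD) between the distributions of every pair of clients. A larger EMD between two clients' local training loss distributions indicates stronger heterogeneity.
- ==DP introduces additional heterogeneity, which lowers FL accuracy, and the DP-induced heterogeneity grows as the privacy level increases.==
  Figure 1: the horizontal line is the baseline heterogeneity when the clients are already non-iid; the shaded area is the extra heterogeneity introduced by DP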
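- LDP-introduced Heterogeneity:
  - Clipping: DP-SGD clips gradients of a random sample of training data
  - Noise: DP-SGD adds random noise to the clipped gradients
- DDP-introduced Heterogeneity:
  - DDP adds random noise to the local models
- CDP-introduced Heterogeneity:
  - When the clients' data is non-iid, CDP introduces additional heterogeneity
  - The same noise is added to the global model, but once it reaches the clients it is amplified to different degrees during local training, because the clients' data is itself heterogeneous; the result is then sent back to the server
  - When the clients' local training data differs in heterogeneity, DP introduces different amounts of heterogeneity across clients during training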
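The EMD-based heterogeneity measurement described in the method above can be sketched as follows. This is a minimal pure-Python version, assuming each client's per-round losses are kept as a list of floats; `emd_1d`, `pairwise_emd`, and the `loss_history` layout are hypothetical names, and the paper's own code may compute EMD differently.

```python
def emd_1d(a, b):
    """Earth mover's (Wasserstein-1) distance between two 1-D empirical samples.

    Computed as the area between the two empirical CDFs.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    i = j = 0
    dist = 0.0
    for left, right in zip(points, points[1:]):
        # Advance both pointers past every value <= left.
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        # |F_a(x) - F_b(x)| is constant on [left, right).
        dist += abs(i / len(a) - j / len(b)) * (right - left)
    return dist


def pairwise_emd(loss_history):
    """Map each unordered client pair to the EMD of their loss distributions.

    loss_history: dict of client id -> list of per-round local training losses.
    """
    ids = sorted(loss_history)
    return {
        (p, q): emd_1d(loss_history[p], loss_history[q])
        for k, p in enumerate(ids)
        for q in ids[k + 1:]
    }


if __name__ == "__main__":
    # Made-up losses, for illustration only.
    history = {
        "client_0": [0.9, 0.7, 0.5, 0.4],
        "client_1": [1.0, 0.8, 0.6, 0.5],
        "client_2": [1.8, 1.7, 1.5, 1.6],
    }
    for pair, d in pairwise_emd(history).items():
        print(pair, round(d, 3))
```

Larger values in the returned dictionary correspond to client pairs whose training loss distributions are further apart, which is the heterogeneity signal used in the motivation study.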

3 Methodology

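Personalized Data Transformation
- Each client defines a transformation function $x_t = T_k(x)$
- It must satisfy two conditions: tame heterogeneity & preserve original features
- $T_k$ can be viewed as layer 0 of the model
- $T_k$ is learned jointly with the local model, but it never leaves the client and does not take part in server aggregation
- Section 4 finds that a linear $T_k$ is already enough to satisfy both conditions
  (Figure: overall learning process)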
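A toy sketch of the idea above: each client keeps a private linear transformation $x_t = \alpha x + \beta$ that is trained jointly with a shared model, and only the shared model is averaged. Everything here is an assumption made for illustration: the scalar model $y = w x_t + b$, the squared-error loss, the identity initialization, the learning rate, and the names `Client`, `local_train`, and `fedavg_round`. DP noise is left out to keep the example short.

```python
class Client:
    """A client with private data and a personalized linear transformation."""

    def __init__(self, data):
        self.data = data      # list of (x, y) pairs, never shared
        self.alpha = 1.0      # transformation scale, stays on the client
        self.beta = 0.0       # transformation shift, stays on the client

    def local_train(self, w, b, lr=0.02, epochs=5):
        """Jointly update the shared model (w, b) and the local (alpha, beta)."""
        for _ in range(epochs):
            for x, y in self.data:
                x_t = self.alpha * x + self.beta      # "layer 0"
                err = (w * x_t + b) - y               # loss = err ** 2
                # Gradients of err ** 2 with respect to each parameter.
                g_w, g_b = 2 * err * x_t, 2 * err
                g_alpha, g_beta = 2 * err * w * x, 2 * err * w
                w, b = w - lr * g_w, b - lr * g_b
                self.alpha -= lr * g_alpha
                self.beta -= lr * g_beta
        # Only the shared model parameters are returned to the server.
        return w, b


def fedavg_round(clients, w, b):
    """One round: every client trains locally, the server averages (w, b)."""
    results = [c.local_train(w, b) for c in clients]
    new_w = sum(r[0] for r in results) / len(results)
    new_b = sum(r[1] for r in results) / len(results)
    return new_w, new_b


if __name__ == "__main__":
    clients = [
        Client([(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]),   # y = 2x
        Client([(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]),   # y = 2x + 1
    ]
    w, b = 0.5, 0.0
    for _ in range(20):
        w, b = fedavg_round(clients, w, b)
    print("global:", w, b)
    for k, c in enumerate(clients):
        print("client", k, "alpha/beta:", c.alpha, c.beta)
```

Because `alpha` and `beta` are attributes of the client object and are never returned from `local_train`, the server only ever sees the shared parameters, matching the point that the transformation does not take part in aggregation.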
Differentially Private Transformation (DP + personalized data transformation)
LDP, DDP, and CDP each handle gradient aggregation on the server and gradient processing on the client differently.
- Server
- Client

4 Empirical Analysis of Transformation

This section shows empirically that a linear transformation is the best choice and explores the best initial parameters.

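Transformation Function Space Analysis: compare different transformation functions
Judging by accuracy and by EMD under different levels of data heterogeneity, $\alpha x + \beta$ is the best choice
- Accuracy
- EMD under different levels of data heterogeneity
Transformation Parameter Space Analysis: 1) parameter dimension 2) initial value
- Different dimensions
- Different initial values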

5 Implementation and Experimental Setup

Two metrics:
- Test accuracy of the global model
- EMD between the clients' local training loss distributions
Parameter settings

6 Evaluation

[RQ1] How does PRIVATEFL improve LDP, CDP, and DDP on FedAVG and personalized FLs?
- On FedAVG:
  - PrivateFL-LDP/CDP vs FedAVG-LDP/CDP
  - PrivateFL-DDP vs DDP
- On personalized FLs:
  - personalized FLs vs personalized FLs+PrivateFL
personalized FL and PrivateFL capture different types of heterogeneity (e.g., local training data heterogeneity and DP-induced heterogeneity).
[RQ2] What is the performance of existing DP-improving methods on FL, and how does PRIVATEFL further improve them as an add-on method?
- Pre-trained Encoder (+PrivateFL)
- Activation Function Substitution (+PrivateFL)
- Per-round Clip vs. Per-step Clip for CDP (+PrivateFL)
[RQ3] Why can PRIVATEFL improve FL’s accuracy under DP?
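- Two evaluation metrics: heterogeneity of the clients' loss distributions, and test loss of the global model
[RQ4] How does different client data distribution affect the performance of PRIVATEFL?
- Measures the accuracy of FedAVG and PRIVATEFL under LDP and CDP when each client holds 2, 4, 6, 8, or 10 (i.i.d.) classes of data.
[RQ5] How does different number of clients affect the performance of PRIVATEFL?
- Measures the LDP and CDP accuracy of FedAVG and PRIVATEFL with 50, 100, 200, and 500 clients.
- PrivateFL consistently outperforms FedAVG under both CDP and LDP, especially when the system has more clients, partly because more clients means more heterogeneity.
[RQ6] How does PRIVATEFL perform in cross-device FL?
- Measures the cross-device FL accuracy of FedAVG and PrivateFL, where each client takes part in one or two training rounds.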
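For the RQ4 setting, where each client holds only a fixed number of classes, one common way to build such a split is sketched below. This is an assumption about how such a partition could be constructed, not the procedure used in the paper; `partition_by_class` and its parameters are hypothetical names.

```python
import random


def partition_by_class(labels, num_clients, classes_per_client, seed=0):
    """Assign sample indices to clients so each client sees few classes.

    Each client draws `classes_per_client` distinct classes at random; the
    samples of a class are dealt round-robin to the clients holding it.
    Samples of a class that no client drew are left unassigned.
    """
    rng = random.Random(seed)
    all_classes = sorted(set(labels))
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in all_classes}
    client_classes = [
        rng.sample(all_classes, classes_per_client) for _ in range(num_clients)
    ]

    assignment = [[] for _ in range(num_clients)]
    for c in all_classes:
        holders = [k for k, cs in enumerate(client_classes) if c in cs]
        if not holders:
            continue
        indices = by_class[c][:]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            assignment[holders[pos % len(holders)]].append(idx)
    return assignment


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]      # 10 balanced toy classes
    for k in (2, 4, 6, 8, 10):
        parts = partition_by_class(labels, num_clients=20, classes_per_client=k)
        seen = {len({labels[i] for i in p}) for p in parts}
        print(k, "classes per client ->", seen)
```

With `classes_per_client` equal to the total number of classes, every client sees all classes and the split approaches the i.i.d. case; smaller values give increasingly skewed clients.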

7 Related Work

Federated Learning
Differential Privacy in FL
Differential Privacy in Centralized Learning

8 Conclusion

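In this work, the authors are the first to find that DP introduces additional heterogeneity among FL clients, which hurts accuracy. They then show that a personalized data transformation on each client mitigates the DP-induced heterogeneity and thereby improves the accuracy of differentially private FL across DP variants (LDP, DDP, and CDP). Among the candidates studied, a linear transformation outperforms the alternatives. The personalized data transformation can also be combined with personalized FL methods and with methods that improve DP accuracy, further raising the accuracy of differentially private FL.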


Ming Liu
