
Paper Collection | Federated Learning x ICML 2023

白小鱼 开放隐私计算 2024-01-09



This post, compiled by 白小鱼, collects the federated-learning-related papers from ICML 2023, together with translated abstracts.







1. Surrogate Model Extension (SME): A Fast and Accurate Weight Update Attack on Federated Learning

Authors: Junyi Zhu; Ruicong Yao; Matthew B. Blaschko

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/zhu23m.html

Abstract: In Federated Learning (FL) and many other distributed training frameworks, collaborators can hold their private data locally and only share the network weights trained with the local data after multiple iterations. Gradient inversion is a family of privacy attacks that recovers data from its generated gradients. Seemingly, FL can provide a degree of protection against gradient inversion attacks on weight updates, since the gradient of a single step is concealed by the accumulation of gradients over multiple local iterations. In this work, we propose a principled way to extend gradient inversion attacks to weight updates in FL, thereby better exposing weaknesses in the presumed privacy protection inherent in FL. In particular, we propose a surrogate model method based on the characteristic of two-dimensional gradient flow and low-rank property of local updates. Our method largely boosts the ability of gradient inversion attacks on weight updates containing many iterations and achieves state-of-the-art (SOTA) performance. Additionally, our method runs up to 100× faster than the SOTA baseline in the common FL scenario. Our work re-evaluates and highlights the privacy risk of sharing network weights. Our code is available at https://github.com/JunyiZhu-AI/surrogate_model_extension.

ISSN: 2640-3498 abstractTranslation: 在联邦学习(FL)和许多其他分布式训练框架中,协作者可以在本地保存自己的私有数据,并且仅在多次迭代后共享用本地数据训练的网络权重。梯度反转是一系列隐私攻击,可从生成的梯度中恢复数据。表面上,FL 可以在一定程度上防止权重更新时的梯度反转攻击,因为单个步骤的梯度被多次局部迭代的梯度累积所隐藏。在这项工作中,我们提出了一种原则性方法,将梯度反转攻击扩展到 FL 中的权重更新,从而更好地暴露 FL 固有的假定隐私保护的弱点。特别是,我们提出了一种基于二维梯度流特征和局部更新的低秩特性的代理模型方法。我们的方法极大地提高了对包含多次迭代的权重更新进行梯度反转攻击的能力,并实现了最先进的(SOTA)性能。此外,我们的方法在常见 FL 场景中的运行速度比 SOTA 基线快 100 倍。我们的工作重新评估并强调了共享网络权重的隐私风险。我们的代码可在 https://github.com/JunyiZhu-AI/surrogate_model_extension 获取。
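The attack family is easiest to see in the basic single-step gradient-matching setting that SME builds on. Below is a minimal, hypothetical PyTorch sketch (toy linear model, squared loss, illustrative hyperparameters; not the authors' code) that optimizes dummy data so its gradient matches an observed one; SME's surrogate-model extension to multi-iteration weight updates is omitted.

```python
# Minimal single-step gradient-matching sketch; SME extends this idea to
# multi-step weight updates via a surrogate model (not reproduced here).
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 1)
loss_fn = torch.nn.MSELoss()
params = list(model.parameters())

# Victim client: one private sample and the gradient it would share.
x_true, y_true = torch.randn(1, 8), torch.randn(1, 1)
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), params)

# Attacker: optimize dummy data so its gradient matches the observed one.
x_dummy = torch.randn(1, 8, requires_grad=True)
y_dummy = torch.randn(1, 1, requires_grad=True)
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.1)

for _ in range(300):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(x_dummy), y_dummy), params, create_graph=True)
    # Squared-difference matching between dummy and observed gradients.
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    opt.step()

print("reconstruction error:", torch.norm(x_dummy.detach() - x_true).item())
```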

Notes:

PUB (https://openreview.net/forum?id=Kz0IODB2kj)

PDF (https://arxiv.org/abs/2306.00127)

CODE (https://github.com/junyizhu-ai/surrogate_model_extension)







2. LeadFL: Client Self-Defense against Model Poisoning in Federated Learning

Authors: Chaoyi Zhu; Stefanie Roos; Lydia Y. Chen

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/zhu23j.html

Abstract: Federated Learning is highly susceptible to backdoor and targeted attacks as participants can manipulate their data and models locally without any oversight on whether they follow the correct process. There are a number of server-side defenses that mitigate the attacks by modifying or rejecting local updates submitted by clients. However, we find that bursty adversarial patterns with a high variance in the number of malicious clients can circumvent the existing defenses. We propose a client-self defense, LeadFL, that is combined with existing server-side defenses to thwart backdoor and targeted attacks. The core idea of LeadFL is a novel regularization term in local model training such that the Hessian matrix of local gradients is nullified. We provide the convergence analysis of LeadFL and its robustness guarantee in terms of certified radius. Our empirical evaluation shows that LeadFL is able to mitigate bursty adversarial patterns for both iid and non-iid data distributions. It frequently reduces the backdoor accuracy from more than 75% for state-of-the-art defenses to less than 10% while its impact on the main task accuracy is always less than for other client-side defenses.

ISSN: 2640-3498 abstractTranslation: 联邦学习很容易受到后门和有针对性的攻击,因为参与者可以在本地操纵他们的数据和模型,而无需监督他们是否遵循正确的流程。有许多服务器端防御措施可以通过修改或拒绝客户端提交的本地更新来减轻攻击。然而,我们发现恶意客户端数量差异较大的突发对抗模式可以绕过现有防御。我们提出了一种客户端自我防御 LeadFL,它与现有的服务器端防御相结合,以阻止后门和有针对性的攻击。LeadFL 的核心思想是局部模型训练中的一个新颖的正则化项,使得局部梯度的 Hessian 矩阵无效。我们提供 LeadFL 的收敛分析及其在认证半径方面的稳健性保证。我们的实证评估表明,LeadFL 能够减轻独立同分布和非独立同分布数据分布的突发对抗模式。它经常将后门准确率从最先进防御的 75% 以上降低到 10% 以下,而其对主要任务准确度的影响始终小于其他客户端防御。

Notes:

PUB (https://openreview.net/forum?id=2CiaH2Tq4G)

CODE (https://github.com/chaoyitud/LeadFL)

  







3. XTab: Cross-table Pretraining for Tabular Transformers

Authors: Bingzhao Zhu; Xingjian Shi; Nick Erickson; Mu Li; George Karypis; Mahsa Shoaran

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/zhu23k.html

Abstract: The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, (2) by pretraining FT-Transformer via XTab, we achieve superior performance than other state-of-the-art tabular deep learning models on various tasks such as regression, binary, and multiclass classification.

ISSN: 2640-3498 abstractTranslation: 计算机视觉和自然语言处理中自我监督学习的成功激发了表格数据预训练方法的发展。然而,大多数现有的表格自监督学习模型无法利用多个数据表中的信息,并且无法推广到新表。在这项工作中,我们介绍了 XTab,一个用于在不同领域的数据集上对表格转换器进行跨表预训练的框架。我们通过利用独立特征器并使用联邦学习来预训练共享组件,解决了表之间列类型和数量不一致的挑战。在 OpenML-AutoML Benchmark (AMLB) 的 84 个表格预测任务上进行测试,我们表明 (1) XTab 持续提高了多个表格转换器的泛化性、学习速度和性能,(2) 通过 XTab 预训练 FT-Transformer,我们在回归、二元和多类分类等各种任务上取得比其他最先进的表格深度学习模型更优越的性能。

Notes:

PUB (https://openreview.net/forum?id=uGORNDmIdr)

PDF (https://arxiv.org/abs/2305.06090)

CODE (https://github.com/bingzhaozhu/xtab)







4. Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using an Adaptive Sampling Algorithm

Authors: Boxin Zhao; Boxiang Lyu; Raul Castro Fernandez; Mladen Kolar

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/zhao23e.html

Abstract: High-quality machine learning models are dependent on access to high-quality training data. When the data are not already available, it is tedious and costly to obtain them. Data markets help with identifying valuable training data: model consumers pay to train a model, the market uses that budget to identify data and train the model (the budget allocation problem), and finally the market compensates data providers according to their data contribution (revenue allocation problem). For example, a bank could pay the data market to access data from other financial institutions to train a fraud detection model. Compensating data contributors requires understanding data’s contribution to the model; recent efforts to solve this revenue allocation problem based on the Shapley value are inefficient to lead to practical data markets. In this paper, we introduce a new algorithm to solve budget allocation and revenue allocation problems simultaneously in linear time. The new algorithm employs an adaptive sampling process that selects data from those providers who are contributing the most to the model. Better data means that the algorithm accesses those providers more often, and more frequent accesses corresponds to higher compensation. Furthermore, the algorithm can be deployed in both centralized and federated scenarios, boosting its applicability. We provide theoretical guarantees for the algorithm that show the budget is used efficiently and the properties of revenue allocation are similar to Shapley’s. Finally, we conduct an empirical evaluation to show the performance of the algorithm in practical scenarios and when compared to other baselines. Overall, we believe that the new algorithm paves the way for the implementation of practical data markets.

ISSN: 2640-3498 abstractTranslation: 高质量的机器学习模型取决于对高质量训练数据的访问。当数据尚不可用时,获取这些数据既乏味又昂贵。数据市场有助于识别有价值的训练数据:模型消费者支付训练模型的费用,市场使用该预算来识别数据并训练模型(预算分配问题),最后市场根据数据贡献对数据提供者进行补偿(收入分配问题)。例如,银行可以向数据市场付费以访问其他金融机构的数据来训练欺诈检测模型。补偿数据贡献者需要了解数据对模型的贡献;最近基于 Shapley 值解决该收入分配问题的努力效率不高,难以支撑实用的数据市场。在本文中,我们引入了一种新算法,可以在线性时间内同时解决预算分配和收入分配问题。新算法采用自适应采样过程,从对模型贡献最大的提供商中选择数据。更好的数据意味着算法更频繁地访问这些提供者,而更频繁的访问对应着更高的补偿。此外,该算法既可以部署在集中式场景中,也可以部署在联邦场景中,增强了算法的适用性。我们为该算法提供了理论保证,表明预算得到了有效利用,并且收入分配的性质与 Shapley 值类似。最后,我们进行实证评估,以显示该算法在实际场景中以及与其他基线相比的性能。总的来说,我们相信新算法为实际数据市场的实施铺平了道路。
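A hedged numpy sketch of the adaptive-sampling intuition follows: providers whose data improves the model more are sampled more often, and compensation follows sampling frequency. The per-round "gain" signal is simulated here; in the paper it comes from actually training on the sampled provider's data, and the paper's algorithm and guarantees are not reproduced.

```python
# Adaptive sampling of data providers; better providers get sampled (and paid) more.
import numpy as np

rng = np.random.default_rng(0)
n_providers, budget_rounds, eta = 5, 500, 0.5
true_quality = np.array([0.9, 0.6, 0.5, 0.3, 0.1])   # hidden usefulness per provider

weights = np.ones(n_providers)
selections = np.zeros(n_providers)

for _ in range(budget_rounds):
    probs = weights / weights.sum()
    k = rng.choice(n_providers, p=probs)
    selections[k] += 1
    gain = true_quality[k] + 0.1 * rng.standard_normal()   # noisy observed improvement
    weights[k] *= np.exp(eta * gain)                       # exponential-weights update
    weights /= weights.max()                               # keep weights bounded

print("sampling frequency / revenue share:", np.round(selections / selections.sum(), 3))
```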

Notes:

PUB (https://openreview.net/forum?id=iAgQfF3atY)

PDF (https://arxiv.org/abs/2306.02543)

CODE (https://github.com/boxinz17/data-market-via-adaptive-sampling)

  







5. Towards Unbiased Training in Federated Open-world Semi-supervised Learning

Authors: Jie Zhang; Xiaosong Ma; Song Guo; Wenchao Xu

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/zhang23af.html

Abstract: Federated Semi-supervised Learning (FedSSL) has emerged as a new paradigm for allowing distributed clients to collaboratively train a machine learning model over scarce labeled data and abundant unlabeled data. However, existing works for FedSSL rely on a closed-world assumption that all local training data and global testing data are from seen classes observed in the labeled dataset. It is crucial to go one step further: adapting FL models to an open-world setting, where unseen classes exist in the unlabeled data. In this paper, we propose a novel Federatedopen-world Semi-Supervised Learning (FedoSSL) framework, which can solve the key challenge in distributed and open-world settings, i.e., the biased training process for heterogeneously distributed unseen classes. Specifically, since the advent of a certain unseen class depends on a client basis, the locally unseen classes (exist in multiple clients) are likely to receive differentiated superior aggregation effects than the globally unseen classes (exist only in one client). We adopt an uncertainty-aware suppressed loss to alleviate the biased training between locally unseen and globally unseen classes. Besides, we enable a calibration module supplementary to the global aggregation to avoid potential conflicting knowledge transfer caused by inconsistent data distribution among different clients. The proposed FedoSSL can be easily adapted to state-of-the-art FL methods, which is also validated via extensive experiments on benchmarks and real-world datasets (CIFAR-10, CIFAR-100 and CINIC-10).

ISSN: 2640-3498 abstractTranslation: 联邦半监督学习 (FedSSL) 已成为一种新范例,允许分布式客户端在稀缺的标记数据和大量未标记数据上协作训练机器学习模型。然而,FedSSL 的现有工作依赖于一个封闭世界的假设,即所有本地训练数据和全局测试数据都来自标记数据集中观察到的类。更进一步至关重要:使 FL 模型适应开放世界环境,其中未标记数据中存在未见的类。在本文中,我们提出了一种新颖的联邦开放世界半监督学习(FedoSSL)框架,它可以解决分布式和开放世界环境中的关键挑战,即异构分布的看不见的类的有偏差训练过程。具体而言,由于某一未见类的出现取决于客户端基础,因此局部未见类(存在于多个客户端中)可能比全局未见类(仅存在于一个客户端中)获得差异化的优越聚合效果。我们采用不确定性感知抑制损失来减轻局部未见和全局未见类之间的偏差训练。此外,我们启用了一个校准模块来补充全局聚合,以避免由于不同客户端之间的数据分布不一致而导致潜在的知识传输冲突。所提出的 FedoSSL 可以轻松适应最先进的 FL 方法,该方法也通过基准和真实数据集(CIFAR-10、CIFAR-100 和 CINIC-10)的大量实验得到验证。

Notes:

PUB (https://openreview.net/forum?id=gHfybro5Sj)

PDF (https://arxiv.org/abs/2305.00771)

SLIDES (https://icml.cc/media/icml-2023/Slides/25109.pdf)

  







6. Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction

Authors: Jianyi Zhang; Ang Li; Minxue Tang; Jingwei Sun; Xiang Chen; Fan Zhang; Changyou Chen; Yiran Chen; Hai Li

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/zhang23y.html

Abstract: Due to the often limited communication bandwidth of edge devices, most existing federated learning (FL) methods randomly select only a subset of devices to participate in training at each communication round. Compared with engaging all the available clients, such a random-selection mechanism could lead to significant performance degradation on non-IID (independent and identically distributed) data. In this paper, we present our key observation that the essential reason resulting in such performance degradation is the class-imbalance of the grouped data from randomly selected clients. Based on this observation, we design an efficient heterogeneity-aware client sampling mechanism, namely, Federated Class-balanced Sampling (Fed-CBS), which can effectively reduce class-imbalance of the grouped dataset from the intentionally selected clients. We first propose a measure of class-imbalance which can be derived in a privacy-preserving way. Based on this measure, we design a computation-efficient client sampling strategy such that the actively selected clients will generate a more class-balanced grouped dataset with theoretical guarantees. Experimental results show that Fed-CBS outperforms the status quo approaches in terms of test accuracy and the rate of convergence while achieving comparable or even better performance than the ideal setting where all the available clients participate in the FL training.

ISSN: 2640-3498 abstractTranslation: 由于边缘设备的通信带宽通常有限,大多数现有的联邦学习(FL)方法在每轮通信中仅随机选择设备的子集参与训练。与吸引所有可用客户端相比,这种随机选择机制可能会导致非 IID(独立同分布)数据的性能显着下降。在本文中,我们提出了我们的关键观察结果,即导致这种性能下降的根本原因是来自随机选择的客户端的分组数据的类不平衡。基于这一观察,我们设计了一种有效的异构感知客户端采样机制,即联邦类平衡采样(Fed-CBS),它可以有效地减少有意选择的客户端分组数据集的类不平衡。我们首先提出了一种可以通过隐私保护方式导出的阶级不平衡衡量标准。基于此测量,我们设计了一种计算高效的客户端采样策略,以便主动选择的客户端将生成具有理论保证的更加类平衡的分组数据集。实验结果表明,Fed-CBS 在测试准确性和收敛速度方面优于现状方法,同时实现了与所有可用客户端都参与 FL 训练的理想设置相当甚至更好的性能。
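A minimal numpy sketch of heterogeneity-aware selection, assuming the server can obtain each client's label histogram (the paper derives its measure in a privacy-preserving way); the imbalance measure used here, the squared distance of the grouped class distribution from uniform, stands in for the paper's measure.

```python
# Greedy class-balanced client selection versus a naive pick.
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_classes, n_select = 20, 10, 5

# Simulated non-IID label counts: each client's data is skewed toward a few classes.
client_counts = np.array([rng.multinomial(200, rng.dirichlet(0.3 * np.ones(n_classes)))
                          for _ in range(n_clients)])

def imbalance(counts):
    """Squared distance of the class distribution from uniform (0 = perfectly balanced)."""
    p = counts / counts.sum()
    return float(np.sum((p - 1.0 / n_classes) ** 2))

selected, grouped = [], np.zeros(n_classes)
for _ in range(n_select):
    remaining = [c for c in range(n_clients) if c not in selected]
    best = min(remaining, key=lambda c: imbalance(grouped + client_counts[c]))
    selected.append(best)
    grouped += client_counts[best]

print("selected clients:", selected)
print("grouped imbalance:", round(imbalance(grouped), 4),
      "vs. first-5 pick:", round(imbalance(client_counts[:n_select].sum(axis=0)), 4))
```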

Notes:

PUB (https://openreview.net/forum?id=NcbY2UOfko)

PDF (https://arxiv.org/abs/2209.15245)







7. FedCR: Personalized Federated Learning Based on Across-Client Common Representation with Conditional Mutual Information Regularization

Authors: Hao Zhang; Chenglin Li; Wenrui Dai; Junni Zou; Hongkai Xiong

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/zhang23w.html

Abstract: In personalized federated learning (PFL), multiple clients train customized models to fulfill their personal objectives, which, however, are prone to overfitting to local data due to the heterogeneity and scarcity of local data. To address this, we propose from the information-theoretic perspective a personalized federated learning framework based on the common representation learned across clients, named FedCR. Specifically, we introduce to the local client update a regularizer that aims at minimizing the discrepancy between local and global conditional mutual information (CMI), such that clients are encouraged to learn and exploit the common representation. Upon this, each client learns individually a customized predictor (head), while the extractor (body) remains to be aggregated by the server. Our CMI regularizer leads to a theoretically sound alignment between the local and global stochastic feature distributions in terms of their Kullback-Leibler (KL) divergence. More importantly, by modeling the global joint feature distribution as a product of multiple local feature distributions, clients can efficiently extract diverse information from the global data but without need of the raw data from other clients. We further show that noise injection via feature alignment and ensemble of local predictors in FedCR would help enhance its generalization capability. Experiments on benchmark datasets demonstrate a consistent performance gain and better generalization behavior of FedCR.

ISSN: 2640-3498 abstractTranslation: 在个性化联邦学习(PFL)中,多个客户训练定制模型来实现他们的个人目标,然而,由于本地数据的异构性和稀缺性,这些模型很容易过度拟合本地数据。为了解决这个问题,我们从信息论的角度提出了一种基于跨客户端学习的共同表示的个性化联邦学习框架,名为 FedCR。具体来说,我们向本地客户端引入了一个正则化器,旨在最小化本地和全局条件互信息(CMI)之间的差异,从而鼓励客户端学习和利用共同表示。在此基础上,每个客户端单独学习一个定制的预测器(头部),而提取器(主体)仍然由服务器聚合。我们的 CMI 正则化器在理论上使局部和全局随机特征分布在 Kullback-Leibler (KL) 散度方面实现了良好的对齐。更重要的是,通过将全局联邦特征分布建模为多个局部特征分布的乘积,客户端可以有效地从全局数据中提取不同的信息,而不需要来自其他客户端的原始数据。我们进一步表明,通过 FedCR 中的特征对齐和局部预测变量集成进行噪声注入将有助于增强其泛化能力。基准数据集上的实验证明了 FedCR 具有一致的性能增益和更好的泛化行为。

Notes:

PUB (https://openreview.net/forum?id=YDC5jTS3LR)

CODE (https://github.com/haozzh/FedCR)

  







8. No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation

Authors: Feilong Zhang; Xianming Liu; Shiyi Lin; Gang Wu; Xiong Zhou; Junjun Jiang; Xiangyang Ji

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/zhang23aa.html

Abstract: Federated learning suffers from a latency bottleneck induced by network stragglers, which hampers the training efficiency significantly. In addition, due to the heterogeneous data distribution and security requirements, simple and fast averaging aggregation is not feasible anymore. Instead, complicated aggregation operations, such as knowledge distillation, are required. The time cost for complicated aggregation becomes a new bottleneck that limits the computational efficiency of FL. In this work, we claim that the root cause of training latency actually lies in the aggregation-then-broadcasting workflow of the server. By swapping the computational order of aggregation and broadcasting, we propose a novel and efficient parallel federated learning (PFL) framework that unlocks the edge nodes during global computation and the central server during local computation. This fully asynchronous and parallel pipeline enables handling complex aggregation and network stragglers, allowing flexible device participation as well as achieving scalability in computation. We theoretically prove that synchronous and asynchronous PFL can achieve a similar convergence rate as vanilla FL. Extensive experiments empirically show that our framework brings up to 5.56× speedup compared with traditional FL. Code is available at: https://github.com/Hypervoyager/PFL.

ISSN: 2640-3498 abstractTranslation: 联邦学习受到网络落后者引起的延迟瓶颈的影响,这显着降低了训练效率。此外,由于异构数据分布和安全要求,简单快速的平均聚合已经不再可行。相反,需要复杂的聚合操作,例如知识蒸馏。复杂聚合的时间成本成为限制FL计算效率的新瓶颈。在这项工作中,我们声称训练延迟的根本原因实际上在于服务器的聚合然后广播工作流程。通过交换聚合和广播的计算顺序,我们提出了一种新颖且高效的并行联邦学习(PFL)框架,该框架在全局计算期间解锁边缘节点,在本地计算期间解锁中央服务器。这种完全异步和并行的管道能够处理复杂的聚合和网络落后者,允许灵活的设备参与以及实现计算的可扩展性。我们从理论上证明同步和异步 PFL 可以实现与 vanilla FL 相似的收敛速度。大量实验表明,与传统 FL 相比,我们的框架带来了 5.56 倍的加速。代码位于:https://github.com/Hypervoyager/PFL。

Notes:

PUB (https://openreview.net/forum?id=AMuNQEUmGr)

CODE (https://github.com/Hypervoyager/PFL)

  







9. Doubly Adversarial Federated Bandits

Authors: Jialin Yi; Milan Vojnovic

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/yi23a.html

Abstract: We study a new non-stochastic federated multiarmed bandit problem with multiple agents collaborating via a communication network. The losses of the arms are assigned by an oblivious adversary that specifies the loss of each arm not only for each time step but also for each agent, which we call doubly adversarial. In this setting, different agents may choose the same arm in the same time step but observe different feedback. The goal of each agent is to find a globally best arm in hindsight that has the lowest cumulative loss averaged over all agents, which necessitates the communication among agents. We provide regret lower bounds for any federated bandit algorithm under different settings, when agents have access to full-information feedback, or the bandit feedback. For the bandit feedback setting, we propose a near-optimal federated bandit algorithm called FEDEXP3. Our algorithm gives a positive answer to an open question proposed in (Cesa-Bianchi et al., 2016): FEDEXP3 can guarantee a sub-linear regret without exchanging sequences of selected arm identities or loss sequences among agents. We also provide numerical evaluations of our algorithm to validate our theoretical results and demonstrate its effectiveness on synthetic and real-world datasets.

ISSN: 2640-3498 abstractTranslation: 我们研究了一种新的非随机联邦多臂老虎机问题,其中多个代理通过通信网络进行协作。臂的损失是由一个不经意的对手分配的,该对手不仅指定每个时间步的损失,还指定每个代理的损失,我们称之为双重对抗。在这种情况下,不同的智能体可能会在相同的时间步长中选择相同的臂,但会观察到不同的反馈。每个智能体的目标是事后找到一个全局最佳的臂,该臂在所有智能体中平均累积损失最低,这需要智能体之间的通信。当代理可以访问完整信息反馈或强盗反馈时,我们为不同设置下的任何联邦强盗算法提供后悔下限。对于强盗反馈设置,我们提出了一种接近最优的联邦强盗算法,称为 FEDEXP3。我们的算法对(Cesa-Bianchi et al., 2016)中提出的一个悬而未决的问题给出了肯定的答案:FEDEXP3 可以保证亚线性后悔,而无需在代理之间交换所选臂的身份序列或损失序列。我们还提供算法的数值评估,以验证我们的理论结果并证明其在合成和现实数据集上的有效性。

Notes:

PUB (https://openreview.net/forum?id=FjOB0g7iRf)

PDF (https://arxiv.org/abs/2301.09223)

CODE (https://github.com/jialinyi94/doubly-stochastic-federataed-bandit)

  







10. FedDisco: Federated Learning with Discrepancy-Aware Collaboration

Authors: Rui Ye; Mingkai Xu; Jianyu Wang; Chenxin Xu; Siheng Chen; Yanfeng Wang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/ye23f.html

Abstract: This work considers the category distribution heterogeneity in federated learning. This issue is due to biased labeling preferences at multiple clients and is a typical setting of data heterogeneity. To alleviate this issue, most previous works consider either regularizing local models or fine-tuning the global model, while they ignore the adjustment of aggregation weights and simply assign weights based on the dataset size. However, based on our empirical observations and theoretical analysis, we find that the dataset size is not optimal and the discrepancy between local and global category distributions could be a beneficial and complementary indicator for determining aggregation weights. We thus propose a novel aggregation method, Federated Learning with Discrepancy-Aware Collaboration (FedDisco), whose aggregation weights not only involve both the dataset size and the discrepancy value, but also contribute to a tighter theoretical upper bound of the optimization error. FedDisco can promote utility and modularity in a communication- and computation-efficient way. Extensive experiments show that our FedDisco outperforms several state-of-the-art methods and can be easily incorporated with many existing methods to further enhance the performance. Our code will be available at https://github.com/MediaBrain-SJTU/FedDisco.

ISSN: 2640-3498 abstractTranslation: 这项工作考虑了联邦学习中的类别分布异质性。此问题是由于多个客户的标签偏好存在偏差造成的,并且是数据异质性的典型设置。为了缓解这个问题,之前的大多数工作要么考虑对局部模型进行正则化,要么对全局模型进行微调,而忽略了聚合权重的调整,只是根据数据集大小分配权重。然而,根据我们的经验观察和理论分析,我们发现数据集大小并不是最优的,局部和全局类别分布之间的差异可能是确定聚合权重的有益和补充指标。因此,我们提出了一种新颖的聚合方法,即具有差异感知协作的联邦学习(FedDisco),其聚合权重不仅涉及数据集大小和差异值,而且有助于更严格的优化误差理论上限。FedDisco 可以通过高效通信和计算的方式提升实用性和模块化性。大量实验表明,我们的 FedDisco 优于多种最先进的方法,并且可以轻松与许多现有方法结合以进一步提高性能。我们的代码将在 https://github.com/MediaBrain-SJTU/FedDisco 上提供。
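A small numpy sketch of discrepancy-aware aggregation weights: each weight combines relative dataset size with the distance between the local and global category distributions. The constants a and b below are illustrative hyperparameters, not the paper's values.

```python
# Discrepancy-aware aggregation weights vs. plain dataset-size (FedAvg) weights.
import numpy as np

n_k = np.array([500, 300, 200, 100])                      # client dataset sizes
local_dist = np.array([[0.70, 0.20, 0.10],
                       [0.10, 0.80, 0.10],
                       [0.30, 0.30, 0.40],
                       [0.34, 0.33, 0.33]])               # per-client class distributions
global_dist = (n_k[:, None] * local_dist).sum(0) / n_k.sum()

discrepancy = np.linalg.norm(local_dist - global_dist, axis=1)   # d_k
a, b = 0.5, 0.1                                                  # illustrative constants
weights = np.maximum(n_k / n_k.sum() - a * discrepancy + b, 0.0)
weights /= weights.sum()

print("FedAvg weights:     ", np.round(n_k / n_k.sum(), 3))
print("Disco-style weights:", np.round(weights, 3))
```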

Notes:

PUB (https://openreview.net/forum?id=cHJ1VuZorx)

PDF (https://arxiv.org/abs/2305.19229)

CODE (https://github.com/MediaBrain-SJTU/FedDisco)

  







11. Personalized Federated Learning with Inferred Collaboration Graphs

Authors: Rui Ye; Zhenyang Ni; Fangzhao Wu; Siheng Chen; Yanfeng Wang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/ye23b.html

Abstract: Personalized federated learning (FL) aims to collaboratively train a personalized model for each client. Previous methods do not adaptively determine who to collaborate at a fine-grained level, making them difficult to handle diverse data heterogeneity levels and those cases where malicious clients exist. To address this issue, our core idea is to learn a collaboration graph, which models the benefits from each pairwise collaboration and allocates appropriate collaboration strengths. Based on this, we propose a novel personalized FL algorithm, pFedGraph, which consists of two key modules: (1) inferring the collaboration graph based on pairwise model similarity and dataset size at server to promote fine-grained collaboration and (2) optimizing local model with the assistance of aggregated model at client to promote personalization. The advantage of pFedGraph is flexibly adaptive to diverse data heterogeneity levels and model poisoning attacks, as the proposed collaboration graph always pushes each client to collaborate more with similar and beneficial clients. Extensive experiments show that pFedGraph consistently outperforms the other 14 baseline methods across various heterogeneity levels and multiple cases where malicious clients exist. Code will be available at https://github.com/MediaBrain-SJTU/pFedGraph.

ISSN: 2640-3498 abstractTranslation: 个性化联邦学习(FL)旨在为每个客户协作训练个性化模型。以前的方法不能自适应地确定谁在细粒度的级别上进行协作,这使得它们难以处理不同的数据异构级别以及存在恶意客户端的情况。为了解决这个问题,我们的核心思想是学习协作图,该图对每次配对协作的好处进行建模并分配适当的协作强度。基于此,我们提出了一种新颖的个性化 FL 算法 pFedGraph,它由两个关键模块组成:(1)在服务器端基于成对模型相似性和数据集大小推断协作图,以促进细粒度协作;(2)在客户端借助聚合模型优化本地模型,以促进个性化。pFedGraph 的优点是可以灵活地适应不同的数据异构级别和模型中毒攻击,因为所提出的协作图总是促使每个客户端与相似且有益的客户端进行更多协作。大量实验表明,在各种异质性级别和存在恶意客户端的多种情况下,pFedGraph 始终优于其他 14 个基线方法。代码可在 https://github.com/MediaBrain-SJTU/pFedGraph 获取。
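A toy numpy sketch of the collaboration-graph idea: pairwise cosine similarity between flattened client models, modulated by dataset size, yields per-client mixing weights and hence one personalized aggregate per client. The mixing rule below is illustrative, not pFedGraph's exact optimization.

```python
# Infer a collaboration graph from model similarity, then aggregate per client.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 4, 16
models = rng.standard_normal((n_clients, dim))        # flattened local model weights
sizes = np.array([100.0, 100.0, 50.0, 50.0])          # local dataset sizes

# Cosine similarity between every pair of local models.
normed = models / np.linalg.norm(models, axis=1, keepdims=True)
sim = normed @ normed.T

# Collaboration graph: row-wise softmax over similarity scaled by relative size.
logits = sim * (sizes / sizes.sum())
graph = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

personalized = graph @ models                          # one aggregated model per client
print("collaboration graph:\n", np.round(graph, 3))
```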

Notes:

PUB (https://openreview.net/forum?id=33fj5Ph3ot)

CODE (https://github.com/MediaBrain-SJTU/pFedGraph)

  







12. Communication-Efficient Federated Hypergradient Computation via Aggregated Iterative Differentiation

Authors: Peiyao Xiao; Kaiyi Ji

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/xiao23b.html

Abstract: Federated bilevel optimization has attracted increasing attention due to emerging machine learning and communication applications. The biggest challenge lies in computing the gradient of the upper-level objective function (i.e., hypergradient) in the federated setting due to the nonlinear and distributed construction of a series of global Hessian matrices. In this paper, we propose a novel communication-efficient federated hypergradient estimator via aggregated iterative differentiation (AggITD). AggITD is simple to implement and significantly reduces the communication cost by conducting the federated hypergradient estimation and the lower-level optimization simultaneously. We show that the proposed AggITD-based algorithm achieves the same sample complexity as existing approximate implicit differentiation (AID)-based approaches with much fewer communication rounds in the presence of data heterogeneity. Our results also shed light on the great advantage of ITD over AID in the federated/distributed hypergradient estimation. This differs from the comparison in the non-distributed bilevel optimization, where ITD is less efficient than AID. Our extensive experiments demonstrate the great effectiveness and communication efficiency of the proposed method.

ISSN: 2640-3498 abstractTranslation: 由于新兴的机器学习和通信应用,联邦双层优化引起了越来越多的关注。由于一系列全局 Hessian 矩阵的非线性和分布式构造,最大的挑战在于计算联邦设置中上层目标函数(即超梯度)的梯度。在本文中,我们通过聚合迭代微分(AggITD)提出了一种新颖的通信高效联邦超梯度估计器。AggITD 实现简单,并且通过同时进行联邦超梯度估计和较低级别的优化,显着降低了通信成本。我们表明,所提出的基于 AggITD 的算法实现了与现有的基于近似隐式微分 (AID) 的方法相同的样本复杂性,并且在存在数据异构性的情况下,通信轮次要少得多。我们的结果还揭示了在联邦/分布式超梯度估计中 ITD 相对于 AID 的巨大优势。这与非分布式双层优化中的比较不同,其中 ITD 的效率低于 AID。我们广泛的实验证明了该方法的巨大有效性和通信效率。

Notes:

PUB (https://openreview.net/forum?id=IYyhNudD9V)

PDF (https://arxiv.org/abs/2302.04969)

  







13. Personalized Federated Learning under Mixture of Distributions

Authors: Yue Wu; Shuaicheng Zhang; Wenchao Yu; Yanchi Liu; Quanquan Gu; Dawei Zhou; Haifeng Chen; Wei Cheng

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/wu23z.html

Abstract: The recent trend towards Personalized Federated Learning (PFL) has garnered significant attention as it allows for the training of models that are tailored to each client while maintaining data privacy. However, current PFL techniques primarily focus on modeling the conditional distribution heterogeneity (i.e. concept shift), which can result in suboptimal performance when the distribution of input data across clients diverges (i.e. covariate shift). Additionally, these techniques often lack the ability to adapt to unseen data, further limiting their effectiveness in real-world scenarios. To address these limitations, we propose a novel approach, FedGMM, which utilizes Gaussian mixture models (GMM) to effectively fit the input data distributions across diverse clients. The model parameters are estimated by maximum likelihood estimation utilizing a federated Expectation-Maximization algorithm, which is solved in closed form and does not assume gradient similarity. Furthermore, FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification. Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.

ISSN: 2640-3498 abstractTranslation: 最近的个性化联邦学习(PFL)趋势引起了人们的广泛关注,因为它允许训练针对每个客户量身定制的模型,同时保持数据隐私。然而,当前的 PFL 技术主要侧重于对条件分布异质性(即概念转变)进行建模,当客户端之间的输入数据分布发散(即协变量转变)时,这可能会导致性能不佳。此外,这些技术通常缺乏适应看不见的数据的能力,进一步限制了它们在现实场景中的有效性。为了解决这些限制,我们提出了一种新方法 FedGMM,它利用高斯混合模型 (GMM) 来有效地拟合不同客户之间的输入数据分布。模型参数是利用联邦期望最大化算法通过最大似然估计来估计的,该算法以封闭形式求解并且不假设梯度相似性。此外,FedGMM 还具有以最小的开销适应新客户的额外优势,并且还可以实现不确定性量化。对合成数据集和基准数据集的实证评估证明了我们的方法在 PFL 分类和新样本检测方面的卓越性能。
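A hedged sketch of one federated EM round for a 1-D Gaussian mixture: each client reduces its E-step to sufficient statistics, and the server performs the M-step on their sum, so raw data never leaves the clients. FedGMM's full model and its closed-form federated EM are not reproduced here.

```python
# Federated EM for a toy 1-D GMM via aggregated sufficient statistics.
import numpy as np

rng = np.random.default_rng(0)
K = 2
clients = [rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)]   # local datasets
mu, var, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

def normal_pdf(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(20):
    # E-step on each client, reduced to sufficient statistics.
    N, S1, S2 = np.zeros(K), np.zeros(K), np.zeros(K)
    for x in clients:
        resp = pi * normal_pdf(x[:, None], mu, var)       # (n, K) responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        N += resp.sum(axis=0)
        S1 += (resp * x[:, None]).sum(axis=0)
        S2 += (resp * x[:, None] ** 2).sum(axis=0)
    # M-step at the server from the aggregated statistics.
    mu = S1 / N
    var = S2 / N - mu ** 2
    pi = N / N.sum()

print("means:", np.round(mu, 2), "variances:", np.round(var, 2), "weights:", np.round(pi, 2))
```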

Notes:

PUB (https://openreview.net/forum?id=nmVOTsQGR9)

PDF (https://arxiv.org/abs/2305.01068)

CODE (https://github.com/zshuai8/FedGMM_ICML2023)

  







14. Anchor Sampling for Federated Learning with Partial Client Participation

Authors: Feijie Wu; Song Guo; Zhihao Qu; Shiqi He; Ziming Liu; Jing Gao

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/wu23e.html

Abstract: Compared with full client participation, partial client participation is a more practical scenario in federated learning, but it may amplify some challenges in federated learning, such as data heterogeneity. The lack of inactive clients’ updates in partial client participation makes it more likely for the model aggregation to deviate from the aggregation based on full client participation. Training with large batches on individual clients is proposed to address data heterogeneity in general, but their effectiveness under partial client participation is not clear. Motivated by these challenges, we propose to develop a novel federated learning framework, referred to as FedAMD, for partial client participation. The core idea is anchor sampling, which separates partial participants into anchor and miner groups. Each client in the anchor group aims at the local bullseye with the gradient computation using a large batch. Guided by the bullseyes, clients in the miner group steer multiple near-optimal local updates using small batches and update the global model. By integrating the results of the two groups, FedAMD is able to accelerate the training process and improve the model performance. Measured by ϵ-approximation and compared to the state-of-the-art methods, FedAMD achieves the convergence by up to O(1/ϵ) fewer communication rounds under non-convex objectives. Empirical studies on real-world datasets validate the effectiveness of FedAMD and demonstrate the superiority of the proposed algorithm: Not only does it considerably save computation and communication costs, but also the test accuracy significantly improves.

ISSN: 2640-3498 abstractTranslation: 与完全客户端参与相比,部分客户端参与是联邦学习中更实用的场景,但它可能会放大联邦学习中的一些挑战,例如数据异构性。部分客户参与中缺乏不活跃客户的更新使得模型聚合更有可能偏离基于完全客户参与的聚合。已有工作提出在单个客户端上进行大批量训练以缓解一般的数据异构性,但其在部分客户参与下的有效性尚不清楚。受这些挑战的推动,我们建议开发一种新颖的联邦学习框架,称为 FedAMD,以供部分客户参与。核心思想是锚定抽样,将部分参与者分为锚定组和矿工组。锚定组中的每个客户端都针对本地靶心,使用大批量进行梯度计算。在靶心的引导下,矿工组中的客户使用小批量引导多个接近最优的本地更新,并更新全局模型。通过整合两组的结果,FedAMD 能够加速训练过程并提高模型性能。以 ϵ 近似来衡量,与最先进的方法相比,FedAMD 在非凸目标下最多可减少 O(1/ϵ) 个通信轮次即可收敛。对真实世界数据集的实证研究验证了FedAMD的有效性,并证明了所提出算法的优越性:不仅大大节省了计算和通信成本,而且测试精度也显着提高。

Notes:

PUB (https://openreview.net/forum?id=Ht9r3P6Lts)

PDF (https://arxiv.org/abs/2206.05891)

CODE (https://github.com/harliwu/fedamd)

  







15. The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

Authors: Jiin Woo; Gauri Joshi; Yuejie Chi

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/woo23a.html

Abstract: In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning. In both cases, our bounds exhibit a linear speedup with respect to the number of agents and sharper dependencies on other salient problem parameters. Moreover, existing approaches to federated Q-learning adopt an equally-weighted averaging of local Q-estimates, which can be highly sub-optimal in the asynchronous setting since the local trajectories can be highly heterogeneous due to different local behavior policies. Existing sample complexity scales inverse proportionally to the minimum entry of the stationary state-action occupancy distributions over all agents, requiring that every agent covers the entire state-action space. Instead, we propose a novel importance averaging algorithm, giving larger weights to more frequently visited state-action pairs. The improved sample complexity scales inverse proportionally to the minimum entry of the average stationary state-action occupancy distribution of all agents, thus only requiring the agents collectively cover the entire state-action space, unveiling the blessing of heterogeneity.

ISSN: 2640-3498 abstractTranslation: 在本文中,我们考虑联邦 Q 学习,其目的是通过定期聚合仅在本地数据上训练的本地 Q 估计来学习最优 Q 函数。我们专注于无限范围的表格马尔可夫决策过程,为联邦 Q 学习的同步和异步变体提供样本复杂性保证。在这两种情况下,我们的边界都表现出相对于代理数量的线性加速以及对其他显着问题参数的更清晰的依赖性。此外,现有的联邦 Q 学习方法采用局部 Q 估计的等权平均,这在异步设置中可能非常次优,因为由于不同的局部行为策略,局部轨迹可能高度异构。现有样本复杂性与所有智能体上静态状态动作占用分布的最小条目成反比,要求每个智能体覆盖整个状态动作空间。相反,我们提出了一种新颖的重要性平均算法,为更频繁访问的状态-动作对赋予更大的权重。改进的样本复杂度与所有智能体的平均静态状态动作占用分布的最小条目成反比,因此只需要智能体共同覆盖整个状态动作空间,揭示了异质性的好处。
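A minimal numpy sketch of the importance-averaging idea: each (state, action) entry of the global Q-table is a visit-count-weighted average of the local estimates rather than a uniform average. The weighting rule here is illustrative, not the paper's exact algorithm.

```python
# Importance-weighted aggregation of local Q-tables by visit counts.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_states, n_actions = 3, 4, 2
q_local = rng.standard_normal((n_agents, n_states, n_actions))
visits = rng.integers(0, 50, size=(n_agents, n_states, n_actions)).astype(float)

# Weight each agent's estimate of Q(s, a) by how often it visited (s, a).
weights = visits / np.maximum(visits.sum(axis=0, keepdims=True), 1e-8)
q_global = (weights * q_local).sum(axis=0)

q_uniform = q_local.mean(axis=0)   # the equally-weighted baseline
print("max |importance-avg - uniform-avg|:",
      round(float(np.abs(q_global - q_uniform).max()), 3))
```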

Notes:

PUB (https://openreview.net/forum?id=WfI3I8OjHS)

PDF (https://arxiv.org/abs/2305.10697)

SLIDES (https://icml.cc/media/icml-2023/Slides/24679_ljO6pDE.pdf)

  







16. FedHPO-Bench: A Benchmark Suite for Federated Hyperparameter Optimization

Authors: Zhen Wang; Weirui Kuang; Ce Zhang; Bolin Ding; Yaliang Li

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/wang23n.html

Abstract: Research in the field of hyperparameter optimization (HPO) has been greatly accelerated by existing HPO benchmarks. Nonetheless, existing efforts in benchmarking all focus on HPO for traditional learning paradigms while ignoring federated learning (FL), a promising paradigm for collaboratively learning models from dispersed data. In this paper, we first identify some uniqueness of federated hyperparameter optimization (FedHPO) from various aspects, showing that existing HPO benchmarks no longer satisfy the need to study FedHPO methods. To facilitate the research of FedHPO, we propose and implement a benchmark suite FedHPO-Bench that incorporates comprehensive FedHPO problems, enables flexible customization of the function evaluations, and eases continuing extensions. We conduct extensive experiments based on FedHPO-Bench to provide the community with more insights into FedHPO. We open-sourced FedHPO-Bench at https://github.com/alibaba/FederatedScope/tree/master/benchmark/FedHPOBench.

ISSN: 2640-3498 abstractTranslation: 现有的 HPO 基准极大地加速了超参数优化 (HPO) 领域的研究。尽管如此,现有的基准测试工作都集中在传统学习范式的 HPO 上,而忽略了联邦学习 (FL),这是一种有前途的基于分散数据的协作学习模型的范式。在本文中,我们首先从各个方面识别联邦超参数优化(FedHPO)的一些独特性,表明现有的HPO基准不再满足研究FedHPO方法的需要。为了促进FedHPO的研究,我们提出并实现了一个基准套件FedHPO-Bench,它包含了全面的FedHPO问题,能够灵活定制功能评估,并简化持续扩展。我们基于 FedHPO-Bench 进行了广泛的实验,为社区提供更多关于 FedHPO 的见解。我们在 https://github.com/alibaba/FederatedScope/tree/master/benchmark/FedHPOBench 开源了 FedHPO-Bench。

Notes:

PUB (https://openreview.net/forum?id=891ytYlYgB)

PDF (https://arxiv.org/abs/2206.03966)

CODE (https://github.com/fedhpo/icml2023)

CODE (https://github.com/alibaba/FederatedScope/tree/master/benchmark/FedHPOBench)

  







17. TabLeak: Tabular Data Leakage in Federated Learning

Authors: Mark Vero; Mislav Balunovic; Dimitar Iliev Dimitrov; Martin Vechev

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/vero23a.html

Abstract: While federated learning (FL) promises to preserve privacy, recent works in the image and text domains have shown that training updates leak private client data. However, most high-stakes applications of FL (e.g., in healthcare and finance) use tabular data, where the risk of data leakage has not yet been explored. A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction as unlike for image and text data, direct human inspection is not possible. In this work we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. TabLeak is based on two key contributions: (i) a method which leverages a softmax relaxation and pooled ensembling to solve the optimization problem, and (ii) an entropy-based uncertainty quantification scheme to enable human assessment. We evaluate TabLeak on four tabular datasets for both FedSGD and FedAvg training protocols, and show that it successfully breaks several settings previously deemed safe. For instance, we extract large subsets of private data at >90% accuracy even at the large batch size of 128. Our findings demonstrate that current high-stakes tabular FL is excessively vulnerable to leakage attacks.

ISSN: 2640-3498 abstractTranslation: 虽然联邦学习(FL)承诺保护隐私,但图像和文本领域的最新研究表明,训练更新会泄露私人客户数据。然而,大多数 FL 的高风险应用(例如医疗保健和金融领域)都使用表格数据,而数据泄露的风险尚未得到探索。对表格数据的成功攻击必须解决该领域特有的两个关键挑战:(i) 获得高方差混合离散连续优化问题的解决方案,以及 (ii) 实现对重建结果的人工评估,因为与图像和文本数据不同,表格数据无法直接进行人工检查。在这项工作中,我们解决了这些挑战并提出了 TabLeak,这是第一个针对表格数据的全面重建攻击。TabLeak 基于两个关键贡献:(i) 一种利用 softmax 松弛和池化集成来解决优化问题的方法,以及 (ii) 一种基于熵的不确定性量化方案,以实现人类评估。我们在 FedSGD 和 FedAvg 训练协议的四个表格数据集上评估 TabLeak,并表明它成功地打破了以前认为安全的几个设置。例如,即使在 128 的大批量大小下,我们也能以 >90% 的准确率提取大量私有数据子集。我们的研究结果表明,当前高风险的表格 FL 非常容易受到泄漏攻击。

Notes:

PUB (https://openreview.net/forum?id=mRiDy4qGwB)

PDF (https://arxiv.org/abs/2210.01785)

CODE (https://github.com/eth-sri/tableak)

  







18. Private Federated Learning with Autotuned Compression

Authors: Enayat Ullah; Christopher A. Choquette-Choo; Peter Kairouz; Sewoong Oh

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/ullah23b.html

Abstract: We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. Our on-the-fly methods automatically adjust the compression rate based on the error induced during training, while maintaining provable privacy guarantees through the use of secure aggregation and differential privacy. Our techniques are provably instance-optimal for mean estimation, meaning that they can adapt to the “hardness of the problem” with minimal interactivity. We demonstrate the effectiveness of our approach on real-world datasets by achieving favorable compression rates without the need for tuning.

ISSN: 2640-3498 abstractTranslation: 我们提出了减少私有联邦学习中通信的新技术,而无需设置或调整压缩率。我们的动态方法会根据训练过程中产生的误差自动调整压缩率,同时通过使用安全聚合和差分隐私来保持可证明的隐私保证。事实证明,我们的技术对于均值估计来说是实例最优的,这意味着它们可以以最小的交互性来适应“问题的难度”。我们通过无需调整即可实现良好的压缩率来证明我们的方法在现实数据集上的有效性。

Notes:

PUB (https://openreview.net/forum?id=y8qAZhWbNs)

PDF (https://arxiv.org/abs/2307.10999)

  







19. Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape

Authors: Yan Sun; Li Shen; Shixiang Chen; Liang Ding; Dacheng Tao

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/sun23h.html

Abstract: In federated learning (FL), a cluster of local clients are chaired under the coordination of the global server and cooperatively train one model with privacy protection. Due to the multiple local updates and the isolated non-iid dataset, clients are prone to overfit into their own optima, which extremely deviates from the global objective and significantly undermines the performance. Most previous works only focus on enhancing the consistency between the local and global objectives to alleviate this prejudicial client drifts from the perspective of the optimization view, whose performance would be prominently deteriorated on the high heterogeneity. In this work, we propose a novel and general algorithm FedSMOO by jointly considering the optimization and generalization targets to efficiently improve the performance in FL. Concretely, FedSMOO adopts a dynamic regularizer to guarantee the local optima towards the global objective, which is meanwhile revised by the global Sharpness Aware Minimization (SAM) optimizer to search for the consistent flat minima. Our theoretical analysis indicates that FedSMOO achieves a fast O(1/T) convergence rate with low generalization bound. Extensive numerical studies are conducted on the real-world dataset to verify its peerless efficiency and excellent generality.

ISSN: 2640-3498 abstractTranslation: 在联邦学习(FL)中,一组本地客户端在全局服务器的协调下,合作训练一个具有隐私保护的模型。由于多次本地更新和孤立的非独立同分布数据集,客户端很容易过度拟合自己的最优值,这极大地偏离了全局目标并显着降低了性能。以前的大多数工作仅侧重于增强局部目标和全局目标之间的一致性,以从优化角度缓解这种有偏见的客户端漂移,其性能在高异质性下会显着恶化。在这项工作中,我们提出了一种新颖且通用的算法 FedSMOO,通过联合考虑优化和泛化目标来有效提高 FL 的性能。具体来说,FedSMOO采用动态正则化器来保证全局目标的局部最优,同时通过全局锐度感知最小化(SAM)优化器进行修正以搜索一致的平坦最小值。我们的理论分析表明,FedSMOO 实现了 O(1/T) 的快速收敛速度,且泛化界限较低。对现实世界的数据集进行了广泛的数值研究,以验证其无与伦比的效率和出色的通用性。
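A minimal sketch of the Sharpness-Aware Minimization (SAM) step that FedSMOO applies locally, shown on a toy quadratic loss: ascend along the normalized gradient by radius rho, then descend with the gradient taken at the perturbed point. FedSMOO's dynamic regularizer toward the global objective is omitted; all constants are illustrative.

```python
# One-variable-pair SAM loop on an ill-conditioned quadratic.
import numpy as np

def grad(w):                        # gradient of f(w) = w0^2 + 10 * w1^2
    return np.array([2.0 * w[0], 20.0 * w[1]])

w = np.array([3.0, 3.0])
lr, rho = 0.02, 0.1                 # rho is the SAM perturbation radius

for _ in range(300):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent step toward the sharp neighbour
    g_sam = grad(w + eps)                         # gradient at the perturbed point
    w = w - lr * g_sam                            # descent with the SAM gradient

print("final weights:", np.round(w, 4))
```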

Notes:

PUB (https://openreview.net/forum?id=vD1R00hROK)

PDF (https://arxiv.org/abs/2305.11584)

SLIDES (https://icml.cc/media/icml-2023/Slides/24651.pdf)

  







20. Momentum Ensures Convergence of SIGNSGD under Weaker Assumptions

Authors: Tao Sun; Qingsong Wang; Dongsheng Li; Bao Wang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/sun23l.html

Abstract: Sign Stochastic Gradient Descent (signSGD) is a communication-efficient stochastic algorithm that only uses the sign information of the stochastic gradient to update the model’s weights. However, the existing convergence theory of signSGD either requires increasing batch sizes during training or assumes the gradient noise is symmetric and unimodal. Error feedback has been used to guarantee the convergence of signSGD under weaker assumptions at the cost of communication overhead. This paper revisits the convergence of signSGD and proves that momentum can remedy signSGD under weaker assumptions than previous techniques; in particular, our convergence theory does not require the assumption of bounded stochastic gradient or increased batch size. Our results resonate with echoes of previous empirical results where, unlike signSGD, signSGD with momentum maintains good performance even with small batch sizes. Another new result is that signSGD with momentum can achieve an improved convergence rate when the objective function is second-order smooth. We further extend our theory to signSGD with major vote and federated learning.

ISSN: 2640-3498 abstractTranslation: 符号随机梯度下降(signSGD)是一种通信高效的随机算法,仅使用随机梯度的符号信息来更新模型的权重。然而,signSGD 现有的收敛理论要么需要在训练期间增加批量大小,要么假设梯度噪声是对称的和单峰的。错误反馈已被用来保证在较弱的假设下signSGD的收敛,但代价是通信开销。本文重新审视了signSGD的收敛性,并证明动量可以在比先前技术更弱的假设下弥补signSGD;特别是,我们的收敛理论不需要有界随机梯度或增加批量大小的假设。我们的结果与之前的实证结果相呼应,与signSGD不同,具有动量的signSGD即使在小批量的情况下也能保持良好的性能。另一个新结果是,当目标函数是二阶光滑时,具有动量的signSGD可以实现更高的收敛速度。我们进一步将理论扩展到带多数投票的 signSGD 以及联邦学习。
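A minimal numpy sketch of signSGD with momentum on a noisy quadratic: only the sign of the momentum buffer drives the update, so each coordinate costs one bit per round in a distributed deployment. Step size and momentum coefficient are illustrative.

```python
# signSGD with momentum on a toy quadratic objective 0.5 * x^T A x - b^T x.
import numpy as np

rng = np.random.default_rng(0)
A, b = np.diag([1.0, 10.0]), np.array([1.0, -2.0])
x, m = np.array([5.0, 5.0]), np.zeros(2)
lr, beta = 0.01, 0.9

for _ in range(2000):
    g = A @ x - b + 0.5 * rng.standard_normal(2)   # noisy stochastic gradient
    m = beta * m + (1 - beta) * g                  # momentum buffer
    x = x - lr * np.sign(m)                        # sign-only (1-bit) update

print("solution:", np.round(x, 2), "optimum:", np.round(np.linalg.solve(A, b), 2))
```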

Notes:

PUB (https://openreview.net/forum?id=a0kGwNUwil)

  







21. Sketching for First Order Method: Efficient Algorithm for Low-Bandwidth Channel and Vulnerability

Authors: Zhao Song; Yitan Wang; Zheng Yu; Lichen Zhang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/song23h.html

Abstract: Sketching is one of the most fundamental tools in large-scale machine learning. It enables runtime and memory saving via randomly compressing the original large problem into lower dimensions. In this paper, we propose a novel sketching scheme for the first order method in large-scale distributed learning setting, such that the communication costs between distributed agents are saved while the convergence of the algorithms is still guaranteed. Given gradient information in a high dimension d, the agent passes the compressed information processed by a sketching matrix R ∈ R^{s×d} with s ≪ d, and the receiver de-compresses via the de-sketching matrix R^⊤ to “recover” the information in the original dimension. Using such a framework, we develop algorithms for federated learning with lower communication costs. However, such random sketching does not protect the privacy of local data directly. We show that the gradient leakage problem still exists after applying the sketching technique by presenting a specific gradient attack method. As a remedy, we prove rigorously that the algorithm will be differentially private by adding additional random noises in gradient information, which results in a both communication-efficient and differentially private first order approach for federated learning tasks. Our sketching scheme can be further generalized to other learning settings and might be of independent interest itself.

ISSN: 2640-3498 abstractTranslation: 草图绘制是大规模机器学习中最基本的工具之一。它通过将原始大问题随机压缩为较低维度来节省运行时间和内存。在本文中,我们为大规模分布式学习环境中的一阶方法提出了一种新颖的草图方案,从而节省了分布式代理之间的通信成本,同时仍然保证了算法的收敛性。给定高维 d 中的梯度信息,代理传递由草图矩阵 R ∈ R^{s×d}(其中 s ≪ d)处理后的压缩信息,接收方通过去草图矩阵 R^⊤ 进行解压缩,以“恢复”原始维度的信息。使用这样的框架,我们开发了具有较低通信成本的联邦学习算法。然而,这种随机草图并不能直接保护本地数据的隐私。我们通过提出一种特定的梯度攻击方法来证明应用草图技术后梯度泄漏问题仍然存在。作为补救措施,我们通过在梯度信息中添加额外的随机噪声来严格证明该算法将具有差分隐私性,从而为联邦学习任务提供通信高效且差分隐私的一阶方法。我们的草图方案可以进一步推广到其他学习环境,其本身也可能具有独立的研究价值。
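A hedged numpy sketch of the sketch/de-sketch pipeline: the client transmits R g with R ∈ R^{s×d}, s ≪ d, and the server applies R^⊤ to obtain an unbiased (though noisy) reconstruction of the gradient. The gradient-attack and differential-privacy parts of the paper are not shown.

```python
# Sketching a gradient to s numbers and de-sketching it back to dimension d.
import numpy as np

rng = np.random.default_rng(0)
d, s = 1000, 100
R = rng.standard_normal((s, d)) / np.sqrt(s)   # entries N(0, 1/s), so E[R^T R] = I

g = rng.standard_normal(d)           # local gradient
sketch = R @ g                       # what the client transmits (s numbers, not d)
g_hat = R.T @ sketch                 # server-side de-sketching, unbiased estimate of g

cos = g @ g_hat / (np.linalg.norm(g) * np.linalg.norm(g_hat))
print(f"compression {d}->{s}, cosine(g, g_hat) = {cos:.3f}")
```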

Notes:

PUB (https://openreview.net/forum?id=uIzkbJgyqc)

PDF (https://arxiv.org/abs/2210.08371)

  







22. FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks

Authors: Bingqing Song; Prashant Khanduri; Xinwei Zhang; Jinfeng Yi; Mingyi Hong

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/song23e.html

Abstract: Federated Learning (FL) is a distributed learning paradigm that allows multiple clients to learn a joint model by utilizing privately held data at each client. Significant research efforts have been devoted to develop advanced algorithms that deal with the situation where the data at individual clients have heterogeneous distributions. In this work, we show that data heterogeneity can be dealt from a different perspective. That is, by utilizing a certain overparameterized multi-layer neural network at each client, even the vanilla FedAvg (a.k.a. the Local SGD) algorithm can accurately optimize the training problem: When each client has a neural network with one wide layer of size N (where N is the number of total training samples), followed by layers of smaller widths, FedAvg converges linearly to a solution that achieves (almost) zero training loss, without requiring any assumptions on the clients’ data distributions. To our knowledge, this is the first work that demonstrates such resilience to data heterogeneity for FedAvg when trained on multi-layer neural networks. Our experiments also confirm that, neural networks of large size can achieve better and more stable performance for FL problems.

ISSN: 2640-3498 abstractTranslation: 联邦学习 (FL) 是一种分布式学习范例,允许多个客户端通过利用每个客户端的私有数据来学习联合模型。人们投入了大量的研究工作来开发先进的算法,以处理各个客户端的数据具有异构分布的情况。在这项工作中,我们表明可以从不同的角度处理数据异构性。也就是说,通过在每个客户端使用特定的过参数化多层神经网络,即使是普通的 FedAvg(也称为 Local SGD)算法也可以准确地优化训练问题:当每个客户端都有一个神经网络,其第一个宽层的大小为 N(其中 N 是训练样本总数),后接宽度较小的层时,FedAvg 线性收敛到实现(几乎)零训练损失的解决方案,而不需要对客户的数据分布进行任何假设。据我们所知,这是第一个在多层神经网络上训练时证明 FedAvg 对数据异构性具有如此弹性的工作。我们的实验也证实,大尺寸的神经网络可以在 FL 问题上获得更好、更稳定的性能。

Notes:

PUB (https://openreview.net/forum?id=eqTWOzheZT)

  







23. Improving the Model Consistency of Decentralized Federated Learning

Authors: Yifan Shi; Li Shen; Kang Wei; Yan Sun; Bo Yuan; Xueqian Wang; Dacheng Tao

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/shi23d.html

Abstract: To mitigate the privacy leakages and communication burdens of Federated Learning (FL), decentralized FL (DFL) discards the central server and each client only communicates with its neighbors in a decentralized communication network. However, existing DFL suffers from high inconsistency among local clients, which results in severe distribution shift and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or sparse communication topologies. To alleviate this issue, we propose two DFL algorithms named DFedSAM and DFedSAM-MGS to improve the performance of DFL. Specifically, DFedSAM leverages gradient perturbation to generate local flat models via Sharpness Aware Minimization (SAM), which searches for models with uniformly low loss values. DFedSAM-MGS further boosts DFedSAM by adopting Multiple Gossip Steps (MGS) for better model consistency, which accelerates the aggregation of local flat models and better balances communication complexity and generalization. Theoretically, we present improved convergence rates for DFedSAM and DFedSAM-MGS in the non-convex setting, where the bounds depend on the spectral gap of the gossip matrix and on Q, the number of MGS. Empirically, our methods can achieve competitive performance compared with CFL methods and outperform existing DFL methods.

ISSN: 2640-3498 abstractTranslation: 为了减轻联邦学习(FL)的隐私泄露和通信负担,去中心化 FL(DFL)抛弃了中央服务器,每个客户端仅与去中心化通信网络中的邻居进行通信。然而,现有的DFL在本地客户端之间存在高度不一致的问题,这导致与集中式FL(CFL)相比严重的分布偏移和较差的性能,特别是在异构数据或稀疏通信拓扑上。为了缓解这个问题,我们提出了两种 DFL 算法 DFedSAM 和 DFedSAM-MGS 来提高 DFL 的性能。具体来说,DFedSAM 利用梯度扰动通过锐度感知最小化 (SAM) 生成局部平坦模型,该模型搜索具有一致低损失值的模型。DFedSAM-MGS 通过采用多个 Gossip Steps (MGS) 来进一步增强 DFedSAM,以实现更好的模型一致性,从而加速本地平坦模型的聚合并更好地平衡通信复杂性和泛化性。理论上,我们分别给出了 DFedSAM 和 DFedSAM-MGS 在非凸设置下改进的收敛速度,其收敛界依赖于 gossip 矩阵的谱间隙以及 MGS 的步数 Q。根据经验,与 CFL 方法相比,我们的方法可以实现具有竞争力的性能,并且优于现有的 DFL 方法。
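A small numpy sketch of Multiple Gossip Steps (MGS) on a ring topology: after local updates, clients repeatedly average parameters with their neighbours through a doubly stochastic mixing matrix, shrinking the spread between local models. The SAM-based local update of DFedSAM is omitted here.

```python
# Q gossip (neighbour-averaging) steps over a ring of clients.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, Q = 4, 8, 3
models = rng.standard_normal((n_clients, dim))   # post-local-update models

# Ring topology: each client mixes equally with itself and its two neighbours.
W = np.zeros((n_clients, n_clients))
for i in range(n_clients):
    W[i, i] = W[i, (i - 1) % n_clients] = W[i, (i + 1) % n_clients] = 1.0 / 3.0

for _ in range(Q):
    models = W @ models                # one gossip step

spread = np.linalg.norm(models - models.mean(axis=0), axis=1)
print("per-client distance to consensus:", np.round(spread, 3))
```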

Notes:

PUB (https://openreview.net/forum?id=fn2NFlYLBL)

PDF (https://arxiv.org/abs/2302.04083)

  







24. Conformal Prediction for Federated Uncertainty Quantification Under Label Shift

Authors: Vincent Plassier; Mehdi Makni; Aleksandr Rubashevskii; Eric Moulines; Maxim Panov

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/plassier23a.html

Abstract: Federated Learning (FL) is a machine learning framework where many clients collaboratively train models while keeping the training data decentralized. Despite recent advances in FL, the uncertainty quantification topic (UQ) remains partially addressed. Among UQ methods, conformal prediction (CP) approaches provides distribution-free guarantees under minimal assumptions. We develop a new federated conformal prediction method based on quantile regression and take into account privacy constraints. This method takes advantage of importance weighting to effectively address the label shift between agents and provides theoretical guarantees for both valid coverage of the prediction sets and differential privacy. Extensive experimental studies demonstrate that this method outperforms current competitors.

ISSN: 2640-3498 abstractTranslation: 联邦学习 (FL) 是一种机器学习框架,许多客户在其中协作训练模型,同时保持训练数据分散。尽管 FL 最近取得了进展,但不确定性量化主题 (UQ) 仍然得到部分解决。在 UQ 方法中,保形预测 (CP) 方法在最小假设下提供无分布保证。我们开发了一种基于分位数回归的新联邦共形预测方法,并考虑了隐私约束。该方法利用重要性加权有效解决智能体之间的标签转移问题,为预测集的有效覆盖和差分隐私提供了理论保证。大量的实验研究表明,这种方法优于当前的竞争对手。

Notes:

PUB (https://openreview.net/forum?id=ytpEqHYSEy)

PDF (https://arxiv.org/abs/2306.05131)

  







25. Federated Online and Bandit Convex Optimization

Authors: Kumar Kshitij Patel; Lingxiao Wang; Aadirupa Saha; Nathan Srebro

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/patel23a.html

Abstract: We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on MMM machines working in parallel over TTT rounds with RRR intermittent communications. Assuming the underlying cost functions are convex and can be generated adaptively, our results show that collaboration is not beneficial when the machines have access to the first-order gradient information at the queried points. This is in contrast to the case for stochastic functions, where each machine samples the cost functions from a fixed distribution. Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points. The key finding here is identifying the high-dimensional regime where collaboration is beneficial and may even lead to a linear speedup in the number of machines. We further illustrate our findings through federated adversarial linear bandits by developing novel distributed single and two-point feedback algorithms. Our work is the first attempt towards a systematic understanding of federated online optimization with limited feedback, and it attains tight regret bounds in the intermittent communication setting for both first and zeroth-order feedback. Our results thus bridge the gap between stochastic and adaptive settings in federated online optimization.

ISSN: 2640-3498 abstractTranslation: 我们研究针对自适应对手的分布式在线和强盗凸优化问题。我们的目标是最大限度地减少 MMM 机器在 TTT 轮次中与 RRR 间歇性通信并行工作的平均遗憾。假设底层成本函数是凸的并且可以自适应生成,我们的结果表明,当机器可以访问查询点的一阶梯度信息时,协作是没有好处的。这与随机函数的情况相反,随机函数中每台机器从固定分布中采样成本函数。此外,我们深入研究了具有强盗(零阶)反馈的联邦在线优化的更具挑战性的设置,其中机器只能访问查询点处的成本函数的值。这里的关键发现是确定协作有益的高维状态,甚至可能导致机器数量的线性加速。我们通过开发新颖的分布式单点和两点反馈算法,通过联邦对抗性线性老虎机进一步说明我们的发现。我们的工作是系统地理解有限反馈的联邦在线优化的首次尝试,并且它在一阶和零阶反馈的间歇性通信设置中达到了严格的遗憾界限。因此,我们的结果弥合了联邦在线优化中随机设置和自适应设置之间的差距。

Notes:

  







26. Towards Understanding Ensemble Distillation in Federated Learning

Authors: Sejun Park; Kihun Hong; Ganguk Hwang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/park23e.html

Abstract: Federated Learning (FL) is a collaborative machine learning paradigm for data privacy preservation. Recently, a knowledge distillation (KD) based information sharing approach in FL, which conducts ensemble distillation on an unlabeled public dataset, has been proposed. However, despite its experimental success and usefulness, the theoretical analysis of the KD based approach has not been satisfactorily conducted. In this work, we build a theoretical foundation of the ensemble distillation framework in federated learning from the perspective of kernel ridge regression (KRR). In this end, we propose a KD based FL algorithm for KRR models which is related with some existing KD based FL algorithms, and analyze our algorithm theoretically. We show that our algorithm makes local prediction models as much powerful as the centralized KRR model (which is a KRR model trained by all of local datasets) in terms of the convergence rate of the generalization error if the unlabeled public dataset is sufficiently large. We also provide experimental results to verify our theoretical results on ensemble distillation in federated learning.

ISSN: 2640-3498 abstractTranslation: 联邦学习(FL)是一种用于数据隐私保护的协作机器学习范例。最近,人们提出了一种基于知识蒸馏(KD)的 FL 信息共享方法,该方法在未标记的公共数据集上进行集成蒸馏。然而,尽管实验取得了成功且有用,但基于 KD 的方法的理论分析尚未令人满意。在这项工作中,我们从核岭回归(KRR)的角度构建了联邦学习中集成蒸馏框架的理论基础。最后,我们针对KRR模型提出了一种基于KD的FL算法,与一些现有的基于KD的FL算法相关,并对我们的算法进行了理论上的分析。我们表明,如果未标记的公共数据集足够大,就泛化误差的收敛速度而言,我们的算法使本地预测模型与集中式 KRR 模型(这是由所有本地数据集训练的 KRR 模型)一样强大。我们还提供了实验结果来验证我们在联邦学习中集成蒸馏的理论结果。
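A hedged numpy sketch of ensemble distillation with kernel ridge regression (KRR): clients fit local KRR models, their averaged predictions on an unlabeled public set serve as distillation targets, and the server fits a global KRR model to those targets. Kernel, regularization, and data below are illustrative, not the paper's setup.

```python
# KRR ensemble distillation on a shared unlabeled public dataset.
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-2):
    alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), y)
    return lambda Z: rbf(Z, X) @ alpha

target = lambda x: np.sin(3 * x[:, 0])                               # ground-truth function
clients = [rng.uniform(-1, 1, (30, 1)) for _ in range(3)]            # private inputs
local_models = [krr_fit(X, target(X) + 0.05 * rng.standard_normal(len(X))) for X in clients]

X_pub = rng.uniform(-1, 1, (100, 1))                                 # unlabeled public data
ensemble = np.mean([f(X_pub) for f in local_models], axis=0)         # distillation targets
global_model = krr_fit(X_pub, ensemble)                              # distilled global KRR

X_test = rng.uniform(-1, 1, (200, 1))
err = np.mean((global_model(X_test) - target(X_test)) ** 2)
print("distilled-model test MSE:", round(float(err), 4))
```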

Notes:







27. Secure Federated Correlation Test and Entropy Estimation

Authors: Qi Pang; Lun Wang; Shuai Wang; Wenting Zheng; Dawn Song

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/pang23a.html

Abstract: We propose the first federated correlation test framework compatible with secure aggregation, namely FED-χ2. In our protocol, the statistical computations are recast as frequency moment estimation problems, where the clients collaboratively generate a shared projection matrix and then use stable projection to encode the local information in a compact vector. As such encodings can be linearly aggregated, secure aggregation can be applied to conceal the individual updates. We formally establish the security guarantee of FED-χ2 by proving that only the minimum necessary information (i.e., the correlation statistics) is revealed to the server. We show that our protocol can be naturally extended to estimate other statistics that can be recast as frequency moment estimations. By accommodating Shannon's Entropy in FED-χ2, we further propose the first secure federated entropy estimation protocol, FED-H. The evaluation results demonstrate that FED-χ2 and FED-H achieve good performance with small client-side computation overhead in several real-world case studies.

ISSN: 2640-3498 abstractTranslation: 我们提出了第一个与安全聚合兼容的联邦相关性测试框架,即 FED-χ²。在我们的协议中,统计计算被重新定义为频率矩估计问题,其中客户端协作生成共享投影矩阵,然后使用稳定投影将局部信息编码在紧凑向量中。由于此类编码可以线性聚合,因此可以应用安全聚合来隐藏各个更新。我们通过证明只向服务器透露最少的必要信息(即相关统计信息)来正式建立 FED-χ² 的安全保证。我们表明,我们的协议可以自然地扩展到估计其他统计数据,这些统计数据可以重新转换为频率矩估计。通过在 FED-χ² 中容纳香农熵,我们进一步提出了第一个安全的联邦熵估计协议 FED-H。评估结果表明,FED-χ² 和 FED-H 在几个实际案例研究中以较小的客户端计算开销实现了良好的性能。
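
"把统计量改写为频率矩估计、再用可线性聚合的随机投影编码"这一思路,可以用下面几行代码体会(投影矩阵的生成方式、维度等均为示意假设,真实协议还包含稳定投影、安全聚合与 χ² 统计量的具体构造):

```python
import numpy as np

rng = np.random.default_rng(0)
V, k = 1000, 64                                   # domain size and sketch dimension (illustrative)
A = rng.normal(0, 1 / np.sqrt(k), size=(k, V))    # shared projection matrix

def encode(local_counts):
    """Each client encodes its local frequency vector; encodings add up linearly,
    so secure aggregation can hide the individual contributions."""
    return A @ local_counts

clients = [rng.multinomial(500, np.ones(V) / V) for _ in range(10)]
sketch = sum(encode(c) for c in clients)          # what the server sees after aggregation

global_counts = sum(clients)
f2_true = np.sum(global_counts.astype(float) ** 2)   # second frequency moment ||f||^2
f2_est = np.sum(sketch ** 2)                         # estimate recovered from the sketch
print(f2_true, f2_est)
```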

Notes:







28. Flash: Concept Drift Adaptation in Federated Learning

Authors: Kunjal Panchal; Sunav Choudhary; Subrata Mitra; Koyel Mukherjee; Somdeb Sarkhel; Saayan Mitra; Hui Guan

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/panchal23a.html

Abstract: In Federated Learning (FL), adaptive optimization is an effective approach to addressing the statistical heterogeneity issue but cannot adapt quickly to concept drifts. In this work, we propose a novel adaptive optimizer called Flash that simultaneously addresses both statistical heterogeneity and the concept drift issues. The fundamental insight is that a concept drift can be detected based on the magnitude of parameter updates that are required to fit the global model to each participating client’s local data distribution. Flash uses a two-pronged approach that synergizes client-side early-stopping training to facilitate detection of concept drifts and the server-side drift-aware adaptive optimization to effectively adjust effective learning rate. We theoretically prove that Flash matches the convergence rate of state-of-the-art adaptive optimizers and further empirically evaluate the efficacy of Flash on a variety of FL benchmarks using different concept drift settings.

ISSN: 2640-3498 abstractTranslation: 在联邦学习(FL)中,自适应优化是解决统计异质性问题的有效方法,但无法快速适应概念漂移。在这项工作中,我们提出了一种称为 Flash 的新型自适应优化器,它可以同时解决统计异质性和概念漂移问题。基本见解是,可以根据将全局模型适合每个参与客户的本地数据分布所需的参数更新的幅度来检测概念漂移。Flash采用双管齐下的方法,协同客户端提前停止训练以促进概念漂移检测和服务器端漂移感知自适应优化以有效调整有效学习率。我们从理论上证明 Flash 与最先进的自适应优化器的收敛速度相匹配,并使用不同的概念漂移设置进一步根据经验评估 Flash 在各种 FL 基准上的功效。
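
下面的小例子只演示"根据聚合更新幅度的突变来怀疑概念漂移、并调整服务器端学习率"这一直觉(阈值、统计量与调整方式均为假设,并非 Flash 的原始算法):

```python
import numpy as np

def server_rounds(round_updates, base_lr=1.0, window=5, factor=3.0):
    """round_updates: one aggregated client delta (vector) per communication round."""
    theta = np.zeros_like(round_updates[0])
    history = []
    for delta in round_updates:
        mag = np.linalg.norm(delta)
        drift = len(history) >= window and mag > factor * np.mean(history[-window:])
        lr = base_lr * (2.0 if drift else 1.0)    # react faster when a drift is suspected
        theta += lr * delta
        history.append(mag)
    return theta

rng = np.random.default_rng(0)
updates = [rng.normal(0.0, 0.01, 100) for _ in range(20)]
updates += [rng.normal(0.2, 0.05, 100) for _ in range(5)]   # a sudden concept drift
print(server_rounds(updates).mean())
```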

Notes:

PUB (https://openreview.net/forum?id=q5RHsg6VRw)

  







29. SRATTA: Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning.

Authors: Tanguy Marchand; Regis Loeb; Ulysse Marteau-Ferey; Jean Ogier Du Terrail; Arthur Pignet

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/marchand23a.html

Abstract: We consider a federated learning (FL) setting where a machine learning model with a fully connected first layer is trained between different clients and a central server using FedAvg, and where the aggregation step can be performed with secure aggregation (SA). We present SRATTA an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together. While sample recovery has already been explored in an FL setting, the ability to group samples per client, despite the use of SA, is novel. This poses a significant unforeseen security threat to FL and effectively breaks SA. We show that SRATTA is both theoretically grounded and can be used in practice on realistic models and datasets. We also propose counter-measures, and claim that clients should play an active role to guarantee their privacy during training.

ISSN: 2640-3498 abstractTranslation: 我们考虑联邦学习 (FL) 设置,其中使用 FedAvg 在不同客户端和中央服务器之间训练具有完全连接的第一层的机器学习模型,并且可以使用安全聚合 (SA) 来执行聚合步骤。我们提出 SRATTA 是一种仅依赖于聚合模型的攻击,在现实的假设下,(i)从不同客户端恢复数据样本,(ii)将来自同一客户端的数据样本分组在一起。虽然样本回收已经在 FL 设置中进行了探索,但尽管使用了 SA,但对每个客户的样本进行分组的能力还是新颖的。这对 FL 构成了重大的不可预见的安全威胁,并有效地破坏了 SA。我们证明 SRATTA 既有理论依据,也可以在现实模型和数据集的实践中使用。我们还提出了对策,并要求客户在培训期间发挥积极作用,保护自己的隐私。

Notes:

PUB (https://openreview.net/forum?id=pRsJIVcjxD)

PDF (https://arxiv.org/abs/2306.07644)

CODE (https://github.com/owkin/sratta)

  







30. Vertical Federated Graph Neural Network for Recommender System

Authors: Peihua Mai; Yan Pang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/mai23b.html

Abstract: Conventional recommender systems are required to train the recommendation model using a centralized database. However, due to data privacy concerns, this is often impractical when multi-parties are involved in recommender system training. Federated learning appears as an excellent solution to the data isolation and privacy problem. Recently, Graph neural network (GNN) is becoming a promising approach for federated recommender systems. However, a key challenge is to conduct embedding propagation while preserving the privacy of the graph structure. Few studies have been conducted on the federated GNN-based recommender system. Our study proposes the first vertical federated GNN-based recommender system, called VerFedGNN. We design a framework to transmit: (i) the summation of neighbor embeddings using random projection, and (ii) gradients of public parameter perturbed by ternary quantization mechanism. Empirical studies show that VerFedGNN has competitive prediction accuracy with existing privacy preserving GNN frameworks while enhanced privacy protection for users’ interaction information.

ISSN: 2640-3498 abstractTranslation: 传统的推荐系统需要使用集中式数据库来训练推荐模型。然而,由于数据隐私问题,当多方参与推荐系统训练时,这通常是不切实际的。联邦学习似乎是数据隔离和隐私问题的绝佳解决方案。最近,图神经网络(GNN)正在成为联邦推荐系统的一种有前景的方法。然而,一个关键的挑战是在保护图结构隐私的同时进行嵌入传播。关于基于 GNN 的联邦推荐系统的研究很少。我们的研究提出了第一个基于 GNN 的纵向联邦推荐系统,称为 VerFedGNN。我们设计了一个框架来传输:(i)使用随机投影的邻居嵌入的总和,以及(ii)由三元量化机制扰动的公共参数的梯度。实证研究表明,VerFedGNN 与现有的隐私保护 GNN 框架相比,具有竞争性的预测精度,同时增强了对用户交互信息的隐私保护。
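
摘要中"用三元量化机制扰动梯度"可以用常见的随机三元量化函数来理解(以下只是通用写法,并非论文中机制的完整实现,也未包含随机投影部分):

```python
import numpy as np

def ternary_quantize(g, rng=np.random.default_rng(0)):
    """Stochastically map each coordinate to {-s, 0, +s}; unbiased in expectation."""
    s = np.max(np.abs(g))
    if s == 0:
        return np.zeros_like(g)
    prob = np.abs(g) / s                      # keep-probability proportional to magnitude
    mask = rng.random(g.shape) < prob
    return s * np.sign(g) * mask

g = np.array([0.02, -0.5, 0.1, 0.0, 0.8])
print(ternary_quantize(g))                    # most small entries tend to become 0
```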

Notes:

PUB (https://openreview.net/forum?id=NRnS6CtbaN)

PDF (https://arxiv.org/abs/2303.05786)

CODE (https://github.com/maiph123/verticalgnn)

  







31. Federated Conformal Predictors for Distributed Uncertainty Quantification

Authors: Charles Lu; Yaodong Yu; Sai Praneeth Karimireddy; Michael Jordan; Ramesh Raskar

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/lu23i.html

Abstract: Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning since it can be easily applied as a post-processing step to already trained models. In this paper, we extend conformal prediction to the federated learning setting. The main challenge we face is data heterogeneity across the clients — this violates the fundamental tenet of exchangeability required for conformal prediction. We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction (FCP) framework. We show FCP enjoys rigorous theoretical guarantees and excellent empirical performance on several computer vision and medical imaging datasets. Our results demonstrate a practical approach to incorporating meaningful uncertainty quantification in distributed and heterogeneous environments. We provide code used in our experiments https://github.com/clu5/federated-conformal.

ISSN: 2640-3498 abstractTranslation: 保形预测正在成为一种在机器学习中提供严格的不确定性量化的流行范例,因为它可以轻松地用作已训练模型的后处理步骤。在本文中,我们将共形预测扩展到联邦学习设置。我们面临的主要挑战是客户端之间的数据异构性——这违反了保形预测所需的可交换性的基本原则。我们提出了一个较弱的部分可交换性概念,更适合 FL 设置,并用它来开发联邦共形预测 (FCP) 框架。我们证明 FCP 在多个计算机视觉和医学成像数据集上享有严格的理论保证和出色的实证性能。我们的结果展示了一种在分布式和异构环境中纳入有意义的不确定性量化的实用方法。我们提供实验中使用的代码 https://github.com/clu5/federated-conformal。

Notes:

PUB (https://openreview.net/forum?id=YVTr9PzIrK)

PDF (https://arxiv.org/abs/2305.17564)

CODE (https://github.com/clu5/federated-conformal)

  







32. Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting

Authors: Yuchen Liu; Chen Chen; Lingjuan Lyu; Fangzhao Wu; Sai Wu; Gang Chen

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/liu23d.html

Abstract: Federated learning has exhibited vulnerabilities to Byzantine attacks, where the Byzantine attackers can send arbitrary gradients to a central server to destroy the convergence and performance of the global model. A wealth of robust AGgregation Rules (AGRs) have been proposed to defend against Byzantine attacks. However, Byzantine clients can still circumvent robust AGRs when data is non-Identically and Independently Distributed (non-IID). In this paper, we first reveal the root causes of performance degradation of current robust AGRs in non-IID settings: the curse of dimensionality and gradient heterogeneity. In order to address this issue, we propose GAS, a GrAdient Splitting approach that can successfully adapt existing robust AGRs to non-IID settings. We also provide a detailed convergence analysis when the existing robust AGRs are combined with GAS. Experiments on various real-world datasets verify the efficacy of our proposed GAS. The implementation code is provided in https://github.com/YuchenLiu-a/byzantine-gas.

ISSN: 2640-3498 abstractTranslation: 联邦学习已被证明易受拜占庭攻击:拜占庭攻击者可以向中央服务器发送任意梯度,以破坏全局模型的收敛和性能。人们提出了大量鲁棒聚合规则(AGR)来防御拜占庭攻击。然而,当数据非独立同分布(非 IID)时,拜占庭客户端仍然可以规避鲁棒 AGR。在本文中,我们首先揭示了非 IID 设置中当前鲁棒 AGR 性能下降的根本原因:维数灾难和梯度异质性。为了解决这个问题,我们提出了 GAS,一种 GrAdient Splitting 方法,可以成功地将现有的鲁棒 AGR 适应非 IID 设置。我们还提供了现有鲁棒 AGR 与 GAS 结合时的详细收敛分析。对各种现实世界数据集的实验验证了我们提出的 GAS 的有效性。实现代码见 https://github.com/YuchenLiu-a/byzantine-gas。
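
"先把梯度分块、按块给客户端打分、再聚合"的想法可以用下面的示意代码理解(这里的打分方式用到各分块中位数的距离,仅作演示,并非论文中 GAS 与具体鲁棒聚合规则的组合方式):

```python
import numpy as np

def split_and_aggregate(grads, n_splits=4, n_select=6):
    """grads: (n_clients, d). Score clients per split, then average the lowest-score clients."""
    n, d = grads.shape
    scores = np.zeros(n)
    for idx in np.array_split(np.arange(d), n_splits):
        sub = grads[:, idx]
        center = np.median(sub, axis=0)                   # robust reference per split
        scores += np.linalg.norm(sub - center, axis=1)    # distance-based identification score
    chosen = np.argsort(scores)[:n_select]                # clients least likely to be Byzantine
    return grads[chosen].mean(axis=0)

rng = np.random.default_rng(0)
honest = rng.normal(1.0, 0.3, size=(8, 20))
byzantine = rng.normal(-10.0, 0.3, size=(2, 20))
print(split_and_aggregate(np.vstack([honest, byzantine])).round(2))
```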

Notes:

PUB (https://openreview.net/forum?id=3DI6Kmw81p)

PDF (https://arxiv.org/abs/2302.06079)

CODE (https://github.com/YuchenLiu-a/byzantine-gas)

  







33. Fair yet Asymptotically Equal Collaborative Learning

Authors: Xiaoqiang Lin; Xinyi Xu; See-Kiong Ng; Chuan-Sheng Foo; Bryan Kian Hsiang Low

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/lin23l.html

Abstract: In collaborative learning with streaming data, nodes (e.g., organizations) jointly and continuously learn a machine learning (ML) model by sharing the latest model updates computed from their latest streaming data. For the more resourceful nodes to be willing to share their model updates, they need to be fairly incentivized. This paper explores an incentive design that guarantees fairness so that nodes receive rewards commensurate to their contributions. Our approach leverages an explore-then-exploit formulation to estimate the nodes’ contributions (i.e., exploration) for realizing our theoretically guaranteed fair incentives (i.e., exploitation). However, we observe a "rich get richer" phenomenon arising from the existing approaches to guarantee fairness and it discourages the participation of the less resourceful nodes. To remedy this, we additionally preserve asymptotic equality, i.e., less resourceful nodes achieve equal performance eventually to the more resourceful/“rich” nodes. We empirically demonstrate in two settings with real-world streaming data: federated online incremental learning and federated reinforcement learning, that our proposed approach outperforms existing baselines in fairness and learning performance while remaining competitive in preserving equality.

ISSN: 2640-3498 abstractTranslation: 在使用流数据的协作学习中,节点(例如组织)通过共享根据其最新流数据计算的最新模型更新来共同持续地学习机器学习(ML)模型。为了让更足智多谋的节点愿意分享他们的模型更新,他们需要得到相当的激励。本文探讨了一种保证公平性的激励设计,以便节点获得与其贡献相称的奖励。我们的方法利用探索然后利用的公式来估计节点的贡献(即探索),以实现我们理论上保证的公平激励(即利用)。然而,我们观察到现有保证公平的方法出现了“富者愈富”的现象,并且阻碍了资源较少的节点的参与。为了解决这个问题,我们还保留渐近平等,即资源较少的节点最终会获得与资源较多/“丰富”的节点相同的性能。我们通过现实世界的流数据在两种环境中进行实证证明:联邦在线增量学习和联邦强化学习,我们提出的方法在公平性和学习性能方面优于现有基线,同时在维护平等方面保持竞争力。

Notes:

PUB (https://openreview.net/forum?id=5VhltFPSO8)

PDF (https://arxiv.org/abs/2306.05764)

CODE (https://github.com/xqlin98/Fair-yet-Equal-CML)

  







34. Revisiting Weighted Aggregation in Federated Learning with Neural Networks

Authors: Zexi Li; Tao Lin; Xinyi Shang; Chao Wu

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/li23s.html

Abstract: In federated learning (FL), weighted aggregation of local models is conducted to generate a global model, and the aggregation weights are normalized (the sum of weights is 1) and proportional to the local data sizes. In this paper, we revisit the weighted aggregation process and gain new insights into the training dynamics of FL. First, we find that the sum of weights can be smaller than 1, causing global weight shrinking effect (analogous to weight decay) and improving generalization. We explore how the optimal shrinking factor is affected by clients’ data heterogeneity and local epochs. Second, we dive into the relative aggregation weights among clients to depict the clients’ importance. We develop client coherence to study the learning dynamics and find a critical point that exists. Before entering the critical point, more coherent clients play more essential roles in generalization. Based on the above insights, we propose an effective method for Federated Learning with Learnable Aggregation Weights, named as FedLAW. Extensive experiments verify that our method can improve the generalization of the global model by a large margin on different datasets and models.

ISSN: 2640-3498 abstractTranslation: 在联邦学习(FL)中,对局部模型进行加权聚合以生成全局模型,聚合权重被归一化(权重之和为1)并与局部数据大小成正比。在本文中,我们重新审视了加权聚合过程,并对 FL 的训练动态有了新的见解。首先,我们发现权重之和可以小于1,从而产生全局权重收缩效应(类似于权重衰减)并提高泛化能力。我们探讨了最佳收缩因子如何受到客户数据异质性和本地时代的影响。其次,我们深入研究客户之间的相对聚合权重来描述客户的重要性。我们培养客户的一致性来研究学习动态并找到存在的关键点。在进入临界点之前,更一致的客户在泛化中发挥着更重要的作用。基于上述见解,我们提出了一种有效的具有可学习聚合权重的联邦学习方法,命名为FedLAW。大量的实验验证了我们的方法可以在不同的数据集和模型上大幅提高全局模型的泛化能力。
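
"聚合权重可学习、且总和可以小于 1(全局权重收缩)"可以用一个极简的聚合函数说明(论文中 γ 与各权重是在服务器端优化得到的,这里直接给定数值,仅作示意):

```python
import numpy as np

def aggregate(local_models, lambdas, gamma):
    """Weighted aggregation with a global shrinking factor gamma <= 1.

    lambdas are normalized relative weights (sum to 1); the effective weights
    gamma * lambdas then sum to gamma, acting like weight decay on the global model."""
    lambdas = np.asarray(lambdas) / np.sum(lambdas)
    return gamma * sum(l * m for l, m in zip(lambdas, local_models))

rng = np.random.default_rng(0)
local_models = [rng.normal(size=10) for _ in range(4)]
print(aggregate(local_models, lambdas=[0.4, 0.3, 0.2, 0.1], gamma=0.9).round(3))
```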

Notes:

PUB (https://openreview.net/forum?id=FuDAjnWhrQ)

PDF (https://arxiv.org/abs/2302.10911)

CODE (https://github.com/zexilee/icml-2023-fedlaw)







35. Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression: Fast Convergence and Partial Participation

Authors: Xiaoyun Li; Ping Li

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/li23o.html

Abstract: In practical federated learning (FL) systems, the communication cost between the clients and the central server can often be a bottleneck. In this paper, we focus on biased gradient compression in non-convex FL problems. In the classical distributed learning, the method of error feedback (EF) is a common technique to remedy the downsides of biased gradient compression, but the performance of EF in FL still lacks systematic investigation. In this work, we study a compressed FL scheme with error feedback, named Fed-EF, with two variants depending on the global model optimizer. While directly applying biased compression in FL leads to poor convergence, we show that Fed-EF is able to match the convergence rate of the full-precision FL counterpart with a linear speedup w.r.t. the number of clients. Experiments verify that Fed-EF achieves the same performance as the full-precision FL approach, at the substantially reduced communication cost. Moreover, we develop a new analysis of the EF under partial participation (PP), an important scenario in FL. Under PP, the convergence rate of Fed-EF exhibits an extra slow-down factor due to a so-called “stale error compensation” effect, which is also justified in our experiments. Our results provide insights on a theoretical limitation of EF, and possible directions for improvements.

ISSN: 2640-3498 abstractTranslation: 在实际的联邦学习(FL)系统中,客户端和中央服务器之间的通信成本通常是一个瓶颈。在本文中,我们重点研究非凸 FL 问题中的偏置梯度压缩。在经典的分布式学习中,误差反馈(EF)方法是弥补有偏梯度压缩缺点的常用技术,但 EF 在 FL 中的性能仍然缺乏系统的研究。在这项工作中,我们研究了一种带有误差反馈的压缩 FL 方案,名为 Fed-EF,并根据全局模型优化器的不同给出两种变体。虽然在 FL 中直接应用有偏压缩会导致收敛性较差,但我们表明 Fed-EF 能够匹配全精度 FL 方案的收敛速度,并在客户端数量上实现线性加速。实验验证了 Fed-EF 能够以大幅降低的通信成本实现与全精度 FL 方法相同的性能。此外,我们对部分参与(PP)下的 EF 进行了新的分析,这是 FL 的一个重要场景。在 PP 下,由于所谓的"过时误差补偿"效应,Fed-EF 的收敛速度会出现额外的减速因子,这一效应也在我们的实验中得到了验证。我们的结果提供了有关 EF 理论局限性的见解以及可能的改进方向。
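
误差反馈(EF)与有偏压缩结合时,客户端的基本流程大致如下(这里用 top-k 稀疏化充当有偏压缩算子,变量与流程是简化示意,并非 Fed-EF 两个变体的具体实现):

```python
import numpy as np

def topk(v, k):
    """Biased compressor: keep only the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

class ClientEF:
    def __init__(self, dim):
        self.err = np.zeros(dim)              # error memory

    def compress_update(self, local_update, k):
        corrected = local_update + self.err   # add back what was lost before
        msg = topk(corrected, k)              # what is actually sent to the server
        self.err = corrected - msg            # remember the residual for the next round
        return msg

rng = np.random.default_rng(0)
clients = [ClientEF(100) for _ in range(5)]
for _ in range(3):
    msgs = [c.compress_update(rng.normal(size=100), k=10) for c in clients]
    global_update = np.mean(msgs, axis=0)     # server averages the compressed updates
print(np.linalg.norm(global_update))
```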

Notes:

PUB (https://openreview.net/forum?id=wbs1fKLfOe)

PDF (https://arxiv.org/abs/2211.14292)







36. Federated Adversarial Learning: A Framework with Convergence Analysis

Authors: Xiaoxiao Li; Zhao Song; Jiaming Yang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/li23z.html

Abstract: Federated learning (FL) is a trending training paradigm to utilize decentralized training data. FL allows clients to update model parameters locally for several epochs, then share them to a global model for aggregation. This training paradigm with multi-local step updating before aggregation exposes unique vulnerabilities to adversarial attacks. Adversarial training is a popular and effective method to improve the robustness of networks against adversaries. In this work, we formulate a general form of federated adversarial learning (FAL) that is adapted from adversarial learning in the centralized setting. On the client side of FL training, FAL has an inner loop to generate adversarial samples for adversarial training and an outer loop to update local model parameters. On the server side, FAL aggregates local model updates and broadcasts the aggregated model. We design a global robust training loss and formulate FAL training as a min-max optimization problem. Unlike the convergence analysis in classical centralized training that relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for three reasons: 1) the complexity of min-max optimization, 2) model not updating in the gradient direction due to the multi-local updates on the client-side before aggregation and 3) inter-client heterogeneity. We address these challenges by using appropriate gradient approximation and coupling techniques and present the convergence analysis in the over-parameterized regime. Our main result theoretically shows that the minimum loss under our algorithm can converge to ε-small with chosen learning rate and communication rounds. It is noteworthy that our analysis is feasible for non-IID clients.

ISSN: 2640-3498 abstractTranslation: 联邦学习(FL)是一种利用去中心化训练数据的趋势训练范式。FL 允许客户端在本地更新几个时期的模型参数,然后将它们共享到全局模型进行聚合。这种在聚合之前进行多本地步骤更新的训练范例暴露了对抗性攻击的独特漏洞。对抗性训练是一种流行且有效的方法,可以提高网络对抗对手的鲁棒性。在这项工作中,我们制定了一种联邦对抗性学习(FAL)的通用形式,它改编自集中式环境中的对抗性学习。在 FL 训练的客户端,FAL 有一个内循环来生成用于对抗训练的对抗样本,还有一个外循环来更新本地模型参数。在服务器端,FAL 聚合本地模型更新并广播聚合模型。我们设计了全局鲁棒训练损失,并将 FAL 训练制定为最小-最大优化问题。与依赖梯度方向的经典集中训练中的收敛分析不同,FAL 中的收敛分析要困难得多,原因有以下三个:1)最小-最大优化的复杂性;2)由于聚合前客户端进行多步本地更新,模型并非沿梯度方向更新;3)客户端间的异构性。我们通过使用适当的梯度近似和耦合技术来解决这些挑战,并在过度参数化的情况下进行收敛分析。我们的主要结果从理论上表明,在适当选择学习率和通信轮数的情况下,我们算法下的最小损失可以收敛到一个很小的 ε。值得注意的是,我们的分析对于非 IID 客户是可行的。

Notes:







37. FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models

Authors: Songze Li; Duanyi Yao; Jin Liu

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/li23an.html

Abstract: In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients’ uploaded data embeddings. We propose FedVS to simultaneously address these two challenges. The key idea of FedVS is to design secret sharing schemes for the local data and models, such that information-theoretical privacy against colluding clients and curious server is guaranteed, and the aggregation of all clients’ embeddings is reconstructed losslessly, via decrypting computation shares from the non-straggling clients. Extensive experiments on various types of VFL datasets (including tabular, CV, and multi-view) demonstrate the universal advantages of FedVS in straggler mitigation and privacy protection over baseline protocols.

ISSN: 2640-3498 abstractTranslation: 在由中央服务器和许多分布式客户端组成的纵向联邦学习(VFL)系统中,训练数据被纵向分区,以便不同的特征私有地存储在不同的客户端上。分割 VFL 的问题是训练一个在服务器和客户端之间切分的模型。本文旨在解决分割 VFL 中的两个主要挑战:1)由于训练过程中客户端落后而导致性能下降;2)客户上传的数据嵌入导致数据和模型隐私泄露。我们提出 FedVS 来同时应对这两个挑战。FedVS 的关键思想是为本地数据和模型设计秘密共享方案,从而保证针对合谋客户端和好奇服务器的信息论隐私,并且通过解密来自非掉队客户端的计算份额,无损地重建所有客户端嵌入的聚合。对各种类型的 VFL 数据集(包括表格、CV 和多视图)进行的广泛实验证明了 FedVS 在落后者缓解和隐私保护方面相对于基线协议的普遍优势。

Notes:

PUB (https://openreview.net/forum?id=7aqVcrXjxa)

PDF (https://arxiv.org/abs/2304.13407)







38. Adversarial Collaborative Learning on Non-IID Features

Authors: Qinbin Li; Bingsheng He; Dawn Song

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/li23j.html

Abstract: Federated Learning (FL) has been a popular approach to enable collaborative learning on multiple parties without exchanging raw data. However, the model performance of FL may degrade a lot due to non-IID data. While many FL algorithms focus on non-IID labels, FL on non-IID features has largely been overlooked. Different from typical FL approaches, the paper proposes a new learning concept called ADCOL (Adversarial Collaborative Learning) for non-IID features. Instead of adopting the widely used model-averaging scheme, ADCOL conducts training in an adversarial way: the server aims to train a discriminator to distinguish the representations of the parties, while the parties aim to generate a common representation distribution. Our experiments show that ADCOL achieves better performance than state-of-the-art FL algorithms on non-IID features.

ISSN: 2640-3498 abstractTranslation: 联邦学习 (FL) 是一种流行的方法,可以在不交换原始数据的情况下实现多方协作学习。然而,由于非独立同分布数据,FL 的模型性能可能会下降很多。虽然许多 FL 算法专注于非 IID 标签,但非 IID 特征上的 FL 在很大程度上被忽视了。与典型的 FL 方法不同,本文针对非 IID 特征提出了一种新的学习概念,称为 ADCOL(对抗性协作学习)。ADCOL 没有采用广泛使用的模型平均方案,而是以对抗方式进行训练:服务器的目标是训练鉴别器来区分各方的表示,而各方的目标是生成共同的表示分布。我们的实验表明,ADCOL 在非 IID 特征上实现了比最先进的 FL 算法更好的性能。

Notes:

PUB (https://openreview.net/forum?id=DVF7gEQQf7)







39. Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning Using Independent Component Analysis

Authors: Sanjay Kariyappa; Chuan Guo; Kiwan Maeng; Wenjie Xiong; G. Edward Suh; Moinuddin K. Qureshi; Hsien-Hsin S. Lee

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/kariyappa23a.html

Abstract: Federated learning (FL) aims to perform privacy-preserving machine learning on distributed data held by multiple data owners. To this end, FL requires the data owners to perform training locally and share the gradients or weight updates (instead of the private inputs) with the central server, which are then securely aggregated over multiple data owners. Although aggregation by itself does not offer provable privacy protection, prior work suggested that if the batch size is sufficiently large the aggregation may be secure enough. In this paper, we propose the Cocktail Party Attack (CPA) that, contrary to prior belief, is able to recover the private inputs from gradients/weight updates aggregated over as many as 1024 samples. CPA leverages the crucial insight that aggregate gradients from a fully connected (FC) layer is a linear combination of its inputs, which allows us to frame gradient inversion as a blind source separation (BSS) problem. We adapt independent component analysis (ICA)—a classic solution to the BSS problem—to recover private inputs for FC and convolutional networks, and show that CPA significantly outperforms prior gradient inversion attacks, scales to ImageNet-sized inputs, and works on large batch sizes of up to 1024.

ISSN: 2640-3498 abstractTranslation: 联邦学习(FL)旨在对多个数据所有者持有的分布式数据执行保护隐私的机器学习。为此,FL 要求数据所有者在本地执行训练,并与中央服务器共享梯度或权重更新(而不是私有输入),然后在多个数据所有者之间进行安全聚合。尽管聚合本身并不能提供可证明的隐私保护,但先前的工作表明,如果批量大小足够大,则聚合可能足够安全。在本文中,我们提出了鸡尾酒会攻击(CPA),与之前的看法相反,它能够从多达 1024 个样本聚合的梯度/权重更新中恢复私有输入。CPA 利用了一个重要的见解,即来自全连接 (FC) 层的聚合梯度是其输入的线性组合,这使我们能够将梯度反演视为盲源分离 (BSS) 问题。我们采用独立分量分析 (ICA)(BSS 问题的经典解决方案)来恢复 FC 和卷积网络的私有输入,并表明 CPA 显著优于先前的梯度反转攻击,可扩展到 ImageNet 大小的输入,并且适用于最大为 1024 的大批量。
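
论文的核心观察是:全连接层的聚合梯度是各输入的线性组合,因此"从聚合梯度恢复输入"可以看作盲源分离问题。下面用一个与具体网络无关的玩具例子演示 FastICA 做盲源分离(混合矩阵用来模拟聚合梯度中的线性组合,真实攻击还涉及许多工程细节):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_inputs, dim = 4, 1024

sources = rng.laplace(size=(n_inputs, dim))        # "private inputs": non-Gaussian signals
mixing = rng.normal(size=(32, n_inputs))
observed = mixing @ sources                        # rows mimic aggregated FC-layer gradients

ica = FastICA(n_components=n_inputs, max_iter=1000, random_state=0)
recovered = ica.fit_transform(observed.T).T        # unmix back to (n_inputs, dim)

# Correlate each recovered component with its best-matching source; values near 1 mean success
corr = np.abs(np.corrcoef(np.vstack([sources, recovered]))[:n_inputs, n_inputs:])
print(corr.max(axis=1).round(2))
```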

Notes:

PUB (https://openreview.net/forum?id=Ai1TyAjZt9)

PDF (https://arxiv.org/abs/2209.05578)







40. One-Shot Federated Conformal Prediction

Authors: Pierre Humbert; Batiste Le Bars; Aurélien Bellet; Sylvain Arlot

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/humbert23a.html

Abstract: In this paper, we present a Conformal Prediction method that computes prediction sets in a one-shot Federated Learning (FL) setting. More specifically, we introduce a novel quantile-of-quantiles estimator and prove that for any distribution, it is possible to compute prediction sets with desired coverage in only one round of communication. To mitigate privacy issues, we also describe a locally differentially private version of our estimator. Finally, over a wide range of experiments, we show that our method returns prediction sets with coverage and length very similar to those obtained in a centralized setting. These results demonstrate that our method is well-suited for one-shot Federated Learning.

ISSN: 2640-3498 abstractTranslation: 在本文中,我们提出了一种保形预测方法,该方法在一次性联邦学习(FL)设置中计算预测集。更具体地说,我们引入了一种新颖的分位数估计器,并证明对于任何分布,都可以仅在一轮通信中计算出具有所需覆盖范围的预测集。为了缓解隐私问题,我们还描述了估计器的局部差分隐私版本。最后,通过广泛的实验,我们表明我们的方法返回的预测集的覆盖范围和长度与集中设置中获得的预测集非常相似。这些结果表明我们的方法非常适合一次性联邦学习。
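
"分位数的分位数"估计器的直觉可以写成几行代码(两个分位数阶在这里取示意数值;论文对其做了精确的组合选择以保证覆盖率,并给出了差分隐私版本):

```python
import numpy as np

def one_shot_federated_quantile(client_scores, q_local, q_server):
    """Each client sends one local quantile of its conformity scores;
    the server takes a quantile of these quantiles (a single round of communication)."""
    local_q = [np.quantile(s, q_local) for s in client_scores]
    return np.quantile(local_q, q_server)

rng = np.random.default_rng(0)
client_scores = [np.abs(rng.normal(size=200)) for _ in range(20)]   # e.g. residuals |y - f(x)|
threshold = one_shot_federated_quantile(client_scores, q_local=0.9, q_server=0.9)
print(round(threshold, 3))   # prediction set for a new x: {y : |y - f(x)| <= threshold}
```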

Notes:

PUB (https://openreview.net/forum?id=SZJGIWe1Ag)

PDF (https://arxiv.org/abs/2302.06322)

CODE (https://github.com/pierreHmbt/FedCP-QQ)







41. Federated Linear Contextual Bandits with User-level Differential Privacy

Authors: Ruiquan Huang; Huanyu Zhang; Luca Melis; Milan Shen; Meisam Hejazinia; Jing Yang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/huang23q.html

Abstract: This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework, and investigate the fundamental trade-offs between the learning regrets and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed as ROBIN and show that it is near-optimal in terms of the number of clients M and the privacy budget ε by deriving nearly-matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level (ε,δ)-LDP must suffer a regret blow-up factor at least min{1/ε, M} or min{1/√ε, √M} under different conditions.

ISSN: 2640-3498 abstractTranslation: 本文研究了用户级差分隐私(DP)概念下的联邦线性上下文强盗。我们首先引入一个统一的联邦老虎机框架,该框架可以在顺序决策设置中容纳 DP 的各种定义。然后,我们在联邦强盗框架中正式引入用户级中央DP(CDP)和本地DP(LDP),并研究联邦线性上下文强盗模型中学习遗憾和相应DP保证之间的基本权衡。对于 CDP,我们提出了一种称为 ROBIN 的联邦算法,并通过推导在满足用户级 DP 时几乎匹配的遗憾上下界,表明该算法在客户端数量 M 和隐私预算 ε 方面接近最优。对于 LDP,我们得到了若干下界,表明在用户级 (ε,δ)-LDP 下学习,在不同条件下必须承受至少 min{1/ε, M} 或 min{1/√ε, √M} 的遗憾放大因子。

Notes:

PUB (https://openreview.net/forum?id=b9opfVNw6O)

PDF (https://arxiv.org/abs/2306.05275)







42. Achieving Linear Speedup in Non-IID Federated Bilevel Learning

Authors: Minhui Huang; Dewei Zhang; Kaiyi Ji

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/huang23p.html

Abstract: Federated bilevel learning has received increasing attention in various emerging machine learning and communication applications. Recently, several Hessian-vector-based algorithms have been proposed to solve the federated bilevel optimization problem. However, several important properties in federated learning such as the partial client participation and the linear speedup for convergence (i.e., the convergence rate and complexity are improved linearly with respect to the number of sampled clients) in the presence of non-i.i.d. datasets, still remain open. In this paper, we fill these gaps by proposing a new federated bilevel algorithm named FedMBO with a novel client sampling scheme in the federated hypergradient estimation. We show that FedMBO achieves a convergence rate of O(1/√(nK)) on non-i.i.d. datasets, where n is the number of participating clients in each round, and K is the total number of iterations. This is the first theoretical linear speedup result for non-i.i.d. federated bilevel optimization. Extensive experiments validate our theoretical results and demonstrate the effectiveness of our proposed method.

ISSN: 2640-3498 abstractTranslation: 联邦双层学习在各种新兴的机器学习和通信应用中受到越来越多的关注。最近,已经提出了几种基于 Hessian 向量的算法来解决联邦双层优化问题。然而,在存在非独立同分布数据集的情况下,联邦学习中的几个重要属性,例如部分客户端参与和收敛的线性加速(即,收敛速度和复杂性相对于采样客户端的数量线性提高),仍然是开放问题。在本文中,我们通过提出一种名为 FedMBO 的新联邦双层算法来填补这些空白,并在联邦超梯度估计中采用新颖的客户端采样方案。我们证明 FedMBO 在非独立同分布数据集上可达到 O(1/√(nK)) 的收敛速度,其中 n 是每轮参与的客户端数量,K 是总迭代次数。这是非独立同分布联邦双层优化的第一个理论线性加速结果。大量的实验验证了我们的理论结果并证明了我们提出的方法的有效性。

Notes:







43. FeDXL: Provable Federated Learning for Deep X-Risk Optimization

Authors: Zhishuai Guo; Rong Jin; Jiebo Luo; Tianbao Yang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/guo23c.html

Abstract: In this paper, we tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing FL algorithms are applicable. In particular, the objective has the form of E_{z∼S1} f(E_{z'∼S2} ℓ(w; z, z')), where two sets of data S1, S2 are distributed over multiple machines, ℓ(·; ·, ·) is a pairwise loss that only depends on the prediction outputs of the input data pairs (z, z'). This problem has important applications in machine learning, e.g., AUROC maximization with a pairwise loss, and partial AUROC maximization with a compositional loss. The challenges for designing an FL algorithm for X-risks lie in the non-decomposability of the objective over multiple machines and the interdependency between different machines. To this end, we propose an active-passive decomposition framework that decouples the gradient's components with two types, namely active parts and passive parts, where the active parts depend on local data that are computed with the local model and the passive parts depend on other machines that are communicated/computed based on historical models and samples. Under this framework, we design two FL algorithms (FeDXL) for handling linear and nonlinear f, respectively, based on federated averaging and merging and develop a novel theoretical analysis to combat the latency of the passive parts and the interdependency between the local model parameters and the involved data for computing local gradient estimators. We establish both iteration and communication complexities and show that using the historical samples and models for computing the passive parts do not degrade the complexities. We conduct empirical studies of FeDXL for deep AUROC and partial AUROC maximization, and demonstrate their performance compared with several baselines.

ISSN: 2640-3498 abstractTranslation: 在本文中,我们解决了一个新的联邦学习(FL)问题,用于优化一系列 X 风险,现有的 FL 算法均不适用于该问题。特别地,目标函数的形式为 E_{z∼S1} f(E_{z'∼S2} ℓ(w; z, z')),其中两组数据 S1、S2 分布在多台机器上,ℓ(·; ·, ·) 是仅取决于输入数据对 (z, z') 预测输出的成对损失。这个问题在机器学习中具有重要的应用,例如,具有成对损失的 AUROC 最大化,以及具有组合损失的部分 AUROC 最大化。设计针对 X 风险的 FL 算法的挑战在于目标在多台机器上的不可分解性以及不同机器之间的相互依赖性。为此,我们提出了一种主动-被动分解框架,将梯度分量解耦为两种类型,即主动部分和被动部分,其中主动部分依赖于使用局部模型计算的局部数据,被动部分依赖于基于历史模型和样本进行通信/计算的其他机器。在此框架下,我们基于联邦平均与合并设计了两种 FL 算法(FeDXL),分别用于处理线性和非线性的 f,并开发了一种新颖的理论分析,以应对被动部分的延迟以及局部模型参数与计算局部梯度估计器所涉及数据之间的相互依赖性。我们建立了迭代和通信复杂度,并表明使用历史样本和模型来计算被动部分不会降低复杂度。我们对 FeDXL 进行深度 AUROC 和部分 AUROC 最大化的实证研究,并证明其与几个基线相比的性能。

Notes:

PUB (https://openreview.net/forum?id=C7fNCYdptO)

PDF (https://arxiv.org/abs/2210.14396)

CODE (https://github.com/optimization-ai/icml2023_fedxl)







44. FedBR: Improving Federated Learning on Heterogeneous Data via Local Learning Bias Reduction

Authors: Yongxin Guo; Xiaoying Tang; Tao Lin

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/guo23g.html

Abstract: Federated Learning (FL) is a way for machines to learn from data that is kept locally, in order to protect the privacy of clients. This is typically done using local SGD, which helps to improve communication efficiency. However, such a scheme is currently constrained by slow and unstable convergence due to the variety of data on different clients’ devices. In this work, we identify three under-explored phenomena of biased local learning that may explain these challenges caused by local updates in supervised FL. As a remedy, we propose FedBR, a novel unified algorithm that reduces the local learning bias on features and classifiers to tackle these challenges. FedBR has two components. The first component helps to reduce bias in local classifiers by balancing the output of the models. The second component helps to learn local features that are similar to global features, but different from those learned from other data sources. We conducted several experiments to test FedBR and found that it consistently outperforms other SOTA FL methods. Both of its components also individually show performance gains. Our code is available at https://github.com/lins-lab/fedbr.

ISSN: 2640-3498 abstractTranslation: 联邦学习(FL)是机器从本地保存的数据中学习的一种方式,以保护客户的隐私。这通常是使用本地 SGD 完成的,这有助于提高通信效率。然而,由于不同客户端设备上的数据多种多样,目前这种方案受到收敛速度慢且不稳定的限制。在这项工作中,我们确定了三种尚未充分探索的局部学习偏差现象,这可能解释了监督式 FL 中局部更新带来的这些挑战。作为补救措施,我们提出了 FedBR,这是一种新颖的统一算法,可以减少特征和分类器的本地学习偏差,以应对这些挑战。FedBR 有两个组成部分。第一个组件通过平衡模型的输出来帮助减少局部分类器的偏差。第二个组件有助于学习与全局特征相似但与从其他数据源学习的特征不同的局部特征。我们进行了多项实验来测试 FedBR,发现它始终优于其他 SOTA FL 方法。它的两个组件也分别显示出性能提升。我们的代码可在 https://github.com/lins-lab/fedbr 获取。

Notes:

PUB (https://openreview.net/forum?id=nDKoVwNjMH)

PDF (https://arxiv.org/abs/2205.13462)

CODE (https://github.com/lins-lab/fedbr)







45. Out-of-Distribution Generalization of Federated Learning via Implicit Invariant Relationships

Authors: Yaming Guo; Kai Guo; Xiaofeng Cao; Tieru Wu; Yi Chang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/guo23b.html

Abstract: Out-of-distribution generalization is challenging for non-participating clients of federated learning under distribution shifts. A proven strategy is to explore those invariant relationships between input and target variables, working equally well for non-participating clients. However, learning invariant relationships is often in an explicit manner from data, representation, and distribution, which violates the federated principles of privacy-preserving and limited communication. In this paper, we propose FedIIR, which implicitly learns invariant relationships from parameter for out-of-distribution generalization, adhering to the above principles. Specifically, we utilize the prediction disagreement to quantify invariant relationships and implicitly reduce it through inter-client gradient alignment. Theoretically, we demonstrate the range of non-participating clients to which FedIIR is expected to generalize and present the convergence results for FedIIR in the massively distributed with limited communication. Extensive experiments show that FedIIR significantly outperforms relevant baselines in terms of out-of-distribution generalization of federated learning.

ISSN: 2640-3498 abstractTranslation: 对于分布变化下联邦学习的非参与客户来说,分布外泛化具有挑战性。一个行之有效的策略是探索输入变量和目标变量之间的不变关系,对于非参与客户同样有效。然而,学习不变关系通常是以显式的方式从数据、表示和分布中学习,这违反了隐私保护和有限通信的联邦原则。在本文中,我们提出了 FedIIR,它遵循上述原则,隐式地从参数中学习不变关系以进行分布外泛化。具体来说,我们利用预测不一致来量化不变关系,并通过客户端间梯度对齐隐式减少它。从理论上讲,我们展示了 FedIIR 预计将推广到的非参与客户的范围,并呈现了 FedIIR 在大规模分布式且通信有限的情况下的收敛结果。大量实验表明,FedIIR 在联邦学习的分布外泛化方面显着优于相关基线。

Notes:

PUB (https://openreview.net/forum?id=JC05k0E2EM)

CODE (https://github.com/YamingGuo98/FedIIR)







46. Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design

Authors: Chuan Guo; Kamalika Chaudhuri; Pierre Stock; Michael Rabbat

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/guo23a.html

Abstract: In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model. The main challenge in this setting is balancing privacy with both classification accuracy of the learnt model as well as the number of bits communicated between the clients and server. Prior work has achieved a good trade-off by designing a privacy-aware compression mechanism, called the minimum variance unbiased (MVU) mechanism, that numerically solves an optimization problem to determine the parameters of the mechanism. This paper builds upon it by introducing a new interpolation procedure in the numerical design process that allows for a far more efficient privacy analysis. The result is the new Interpolated MVU mechanism that is more scalable, has a better privacy-utility trade-off, and provides SOTA results on communication-efficient private FL on a variety of datasets.

ISSN: 2640-3498 abstractTranslation: 在私有联邦学习(FL)中,服务器聚合来自大量客户端的差异私有更新,以训练机器学习模型。此设置中的主要挑战是平衡隐私与学习模型的分类准确性以及客户端和服务器之间通信的位数。先前的工作通过设计一种隐私感知压缩机制(称为最小方差无偏(MVU)机制)实现了良好的权衡,该机制通过数值方式解决优化问题以确定机制的参数。本文在此基础上通过在数值设计过程中引入一种新的插值程序来实现更有效的隐私分析。结果是新的插值 MVU 机制更具可扩展性,具有更好的隐私与实用性权衡,并在各种数据集上提供了通信高效的私有 FL 的 SOTA 结果。

Notes:

PUB (https://openreview.net/forum?id=Otdp5SGQMr)

PDF (https://arxiv.org/abs/2211.03942)

CODE (https://github.com/facebookresearch/dp_compression)







47. Federated Heavy Hitter Recovery under Linear Sketching

Authors: Adria Gascon; Peter Kairouz; Ziteng Sun; Ananda Theertha Suresh

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/gascon23a.html

Abstract: Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results also establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds R is at most logarithmic for heavy hitter discovery whereas that of approximate histogram is Θ(√R). We also empirically demonstrate our findings.

ISSN: 2640-3498 abstractTranslation: 在具有安全聚合的多轮联邦分析的现实部署的推动下,我们研究了线性草图约束下的频繁项(heavy hitter)发现和近似(开放域)直方图问题的基本通信-准确性权衡。我们提出了基于局部子采样和可逆布隆查找表(IBLT)的高效算法。我们还表明,我们的算法对于广泛的交互方案来说在信息论上是最优的。结果表明,线性草图约束确实通过引入对一轮中用户数量的额外线性依赖而增加了这两项任务的通信成本。此外,我们的结果还建立了多轮设置中频繁项发现与近似直方图之间通信成本的分离。频繁项发现对轮数 R 的依赖至多是对数级,而近似直方图的依赖是 Θ(√R)。我们还凭经验证明了我们的发现。

Notes:

PUB (https://openreview.net/forum?id=zN4oRCrlnM)

PDF (https://arxiv.org/abs/2307.13347)

CODE (https://github.com/google-research/federated)







48. DoCoFL: Downlink Compression for Cross-Device Federated Learning

Authors: Ron Dorfman; Shay Vargaftik; Yaniv Ben-Itzhak; Kfir Yehuda Levy

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/dorfman23a.html

Abstract: Many compression techniques have been proposed to reduce the communication overhead of Federated Learning training procedures. However, these are typically designed for compressing model updates, which are expected to decay throughout training. As a result, such methods are inapplicable to downlink (i.e., from the parameter server to clients) compression in the cross-device setting, where heterogeneous clients may appear only once during training and thus must download the model parameters. Accordingly, we propose DoCoFL – a new framework for downlink compression in the cross-device setting. Importantly, DoCoFL can be seamlessly combined with many uplink compression schemes, rendering it suitable for bi-directional compression. Through extensive evaluation, we show that DoCoFL offers significant bi-directional bandwidth reduction while achieving competitive accuracy to that of a baseline without any compression.

ISSN: 2640-3498 abstractTranslation: 人们提出了许多压缩技术来减少联邦学习训练过程的通信开销。然而,这些通常是为了压缩模型更新而设计的,预计模型更新会在整个训练过程中衰减。因此,此类方法不适用于跨设备设置中的下行链路(即从参数服务器到客户端)压缩,其中异构客户端在训练期间可能只出现一次,因此必须下载模型参数。因此,我们提出了 DoCoFL——一种跨设备设置中下行链路压缩的新框架。重要的是,DoCoFL可以与许多上行链路压缩方案无缝结合,使其适合双向压缩。通过广泛的评估,我们表明 DoCoFL 可以显着降低双向带宽,同时在没有任何压缩的情况下实现与基线相比具有竞争力的精度。

Notes:

PUB (https://openreview.net/forum?id=VxKr51JjWC)

PDF (https://arxiv.org/abs/2302.00543)







49. Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning

Authors: Yanbo Dai; Songze Li

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/dai23a.html

Abstract: In a federated learning (FL) system, distributed clients upload their local models to a central server to aggregate into a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing images with specific patterns to be misclassified into some target labels. Backdoors planted by current attacks are not durable, and vanish quickly once the attackers stop model poisoning. In this paper, we investigate the connection between the durability of FL backdoors and the relationships between benign images and poisoned images (i.e., the images whose labels are flipped to the target label during local training). Specifically, benign images with the original and the target labels of the poisoned images are found to have key effects on backdoor durability. Consequently, we propose a novel attack, Chameleon, which utilizes contrastive learning to further amplify such effects towards a more durable backdoor. Extensive experiments demonstrate that Chameleon significantly extends the backdoor lifespan over baselines by 1.2×∼4×, for a wide range of image datasets, backdoor types, and model architectures.

ISSN: 2640-3498 abstractTranslation: 在联邦学习(FL)系统中,分布式客户端将其本地模型上传到中央服务器以聚合成全局模型。恶意客户端可能通过上传中毒的本地模型在全局模型中植入后门,导致特定模式的图像被错误分类到某些目标标签中。目前的攻击所植入的后门并不持久,一旦攻击者停止模型中毒,后门就会很快消失。在本文中,我们研究了 FL 后门的持久性与良性图像和中毒图像(即在本地训练期间标签翻转到目标标签的图像)之间的关系之间的联系。具体来说,发现具有原始图像和中毒图像的目标标签的良性图像对后门耐久性具有关键影响。因此,我们提出了一种新颖的攻击,Chameleon,它利用对比学习来进一步放大这种效果,从而形成更持久的后门。大量实验表明,对于各种图像数据集、后门类型和模型架构,Chameleon 将后门寿命显着延长了基线 1.2×∼4×。

Notes:

PUB (https://openreview.net/forum?id=HtHFnHrZXu)

PDF (https://arxiv.org/abs/2304.12961)

CODE (https://github.com/ybdai7/chameleon-durable-backdoor)







50. From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning

Authors: Edwige Cyffers; Aurélien Bellet; Debabrota Basu

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/cyffers23a.html

Abstract: We study differentially private (DP) machine learning algorithms as instances of noisy fixed-point iterations, in order to derive privacy and utility results from this well-studied framework. We show that this new perspective recovers popular private gradient-based methods like DP-SGD and provides a principled way to design and analyze new private optimization algorithms in a flexible manner. Focusing on the widely-used Alternating Directions Method of Multipliers (ADMM) method, we use our general framework derive novel private ADMM algorithms for centralized, federated and fully decentralized learning. We establish strong privacy guarantees for these algorithms, leveraging privacy amplification by iteration and by subsampling. Finally, we provide utility guarantees for the three algorithms using a unified analysis that exploits a recent linear convergence result for noisy fixed-point iterations.

ISSN: 2640-3498 abstractTranslation: 我们研究差分隐私(DP)机器学习算法作为噪声定点迭代的实例,以便从这个经过充分研究的框架中获得隐私和实用结果。我们证明,这种新视角恢复了流行的基于私有梯度的方法,如 DP-SGD,并提供了一种以灵活的方式设计和分析新的私有优化算法的原则方法。专注于广泛使用的交替方向乘子法(ADMM)方法,我们使用我们的通用框架衍生出新颖的私有 ADMM 算法,用于集中式、联邦式和完全分散式学习。我们为这些算法建立了强有力的隐私保证,通过迭代和二次采样来利用隐私放大。最后,我们使用统一分析为三种算法提供效用保证,该分析利用了噪声定点迭代的最新线性收敛结果。

Notes:

PUB (https://openreview.net/forum?id=CBLDv6SFMn)

PDF (https://arxiv.org/abs/2302.12559)

CODE (https://github.com/totilas/padadmm)







51. On the Convergence of Federated Averaging with Cyclic Client Participation

Authors: Yae Jee Cho; Pranay Sharma; Gauri Joshi; Zheng Xu; Satyen Kale; Tong Zhang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/cho23b.html

Abstract: Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg either assume full client participation or partial client participation where the clients can be uniformly sampled. However, in practical cross-device FL systems, only a subset of clients that satisfy local criteria such as battery status, network connectivity, and maximum participation frequency requirements (to ensure privacy) are available for training at a given time. As a result, client availability follows a natural cyclic pattern. We provide (to our knowledge) the first theoretical framework to analyze the convergence of FedAvg with cyclic client participation with several different client optimizers such as GD, SGD, and shuffled SGD. Our analysis discovers that cyclic client participation can achieve a faster asymptotic convergence rate than vanilla FedAvg with uniform client participation under suitable conditions, providing valuable insights into the design of client sampling protocols.

ISSN: 2640-3498 abstractTranslation: 联邦平均 (FedAvg) 及其变体是联邦学习 (FL) 中最流行的优化算法。之前的 FedAvg 收敛分析要么假设全部客户参与,要么假设部分客户参与,其中客户可以进行统一抽样。然而,在实际的跨设备 FL 系统中,只有满足本地标准(例如电池状态、网络连接和最大参与频率要求(以确保隐私))的客户端子集可在给定时间进行训练。因此,客户可用性遵循自然的循环模式。我们提供(据我们所知)第一个理论框架来分析 FedAvg 与循环客户参与与几种不同的客户优化器(例如 GD、SGD 和洗牌 SGD)的收敛性。我们的分析发现,在适当的条件下,与统一客户参与的普通 FedAvg 相比,循环客户参与可以实现更快的渐近收敛速度,为客户抽样协议的设计提供了宝贵的见解。
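
"客户端按组周期性可用、组内再抽样"的训练流程可以用下面的循环来示意(本地训练被抽象为向量插值,分组与轮数均为假设):

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_groups, rounds, m = 100, 4, 12, 10
client_optimum = rng.normal(size=(n_clients, 5))          # stand-in for each client's local solution
groups = np.array_split(rng.permutation(n_clients), n_groups)

theta = np.zeros(5)
for t in range(rounds):
    active = groups[t % n_groups]                          # cyclic availability pattern
    sampled = rng.choice(active, size=m, replace=False)    # uniform sampling within the active group
    local = [theta + 0.5 * (client_optimum[i] - theta) for i in sampled]   # local "training"
    theta = np.mean(local, axis=0)                         # FedAvg aggregation
print(theta.round(3))
```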

Notes:

PUB (https://openreview.net/forum?id=d8LTNXt97w)

PDF (https://arxiv.org/abs/2302.03109)







52. GuardHFL: Privacy Guardian for Heterogeneous Federated Learning

Authors: Hanxiao Chen; Meng Hao; Hongwei Li; Kangjie Chen; Guowen Xu; Tianwei Zhang; Xilin Zhang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/chen23j.html

Abstract: Heterogeneous federated learning (HFL) enables clients with different computation and communication capabilities to collaboratively train their own customized models via a query-response paradigm on auxiliary datasets. However, such a paradigm raises serious privacy concerns due to the leakage of highly sensitive query samples and response predictions. We put forth GuardHFL, the first-of-its-kind efficient and privacy-preserving HFL framework. GuardHFL is equipped with a novel HFL-friendly secure querying scheme built on lightweight secret sharing and symmetric-key techniques. The core of GuardHFL is two customized multiplication and comparison protocols, which substantially boost the execution efficiency. Extensive evaluations demonstrate that GuardHFL significantly outperforms the alternative instantiations based on existing state-of-the-art techniques in both runtime and communication cost.

ISSN: 2640-3498 abstractTranslation: 异构联邦学习 (HFL) 使具有不同计算和通信能力的客户能够通过辅助数据集上的查询响应范例协作训练自己的定制模型。然而,由于高度敏感的查询样本和响应预测的泄露,这种范例引起了严重的隐私问题。我们提出了 GuardHFL,这是首个高效且保护隐私的 HFL 框架。GuardHFL 配备了一种基于轻量级秘密共享和对称密钥技术的新型 HFL 友好安全查询方案。GuardHFL的核心是两个定制的乘法和比较协议,极大地提高了执行效率。广泛的评估表明,GuardHFL 在运行时间和通信成本方面显着优于基于现有最先进技术的替代实例。

Notes:

PUB (https://openreview.net/forum?id=iASUTBGw07)







53. Efficient Personalized Federated Learning via Sparse Model-Adaptation

Authors: Daoyuan Chen; Liuyi Yao; Dawei Gao; Bolin Ding; Yaliang Li

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/chen23aj.html

Abstract: Federated Learning (FL) aims to train machine learning models for multiple clients without sharing their own private data. Due to the heterogeneity of clients’ local data distribution, recent studies explore the personalized FL that learns and deploys distinct local models with the help of auxiliary global models. However, the clients can be heterogeneous in terms of not only local data distribution, but also their computation and communication resources. The capacity and efficiency of personalized models are restricted by the lowest-resource clients, leading to sub-optimal performance and limited practicality of personalized FL. To overcome these challenges, we propose a novel approach named pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models. With a lightweight trainable gating layer, pFedGate enables clients to reach their full potential in model capacity by generating different sparse models accounting for both the heterogeneous data distributions and resource constraints. Meanwhile, the computation and communication efficiency are both improved thanks to the adaptability between the model sparsity and clients’ resources. Further, we theoretically show that the proposed pFedGate has superior complexity with guaranteed convergence and generalization error. Extensive experiments show that pFedGate achieves superior global accuracy, individual accuracy and efficiency simultaneously over state-of-the-art methods. We also demonstrate that pFedGate performs better than competitors in the novel clients participation and partial clients participation scenarios, and can learn meaningful sparse local models adapted to different data distributions.

ISSN: 2640-3498 abstractTranslation: 联邦学习(FL)旨在为多个客户训练机器学习模型,而无需共享他们自己的私人数据。由于客户本地数据分布的异构性,最近的研究探索了个性化 FL,它在辅助全局模型的帮助下学习和部署不同的本地模型。然而,客户端不仅在本地数据分布方面可能是异构的,而且在其计算和通信资源方面也可能是异构的。个性化模型的容量和效率受到资源最少的客户端的限制,导致个性化 FL 的性能次佳和实用性有限。为了克服这些挑战,我们提出了一种名为 pFedGate 的新方法,通过自适应且高效地学习稀疏局部模型来实现高效的个性化 FL。凭借轻量级的可训练门控层,pFedGate 能够通过生成不同的稀疏模型来充分发挥模型容量的潜力,同时考虑到异构数据分布和资源限制。同时,由于模型稀疏性和客户端资源之间的适应性,计算和通信效率都得到了提高。此外,我们从理论上表明,所提出的 pFedGate 具有卓越的复杂性,并保证收敛和泛化误差。大量实验表明,与最先进的方法相比,pFedGate 同时实现了卓越的全局精度、个体精度和效率。我们还证明,pFedGate 在新颖的客户参与和部分客户参与场景中比竞争对手表现更好,并且可以学习适应不同数据分布的有意义的稀疏本地模型。

Notes:

PUB (https://openreview.net/forum?id=ieSN7Xyo8g)

PDF (https://arxiv.org/abs/2305.02776)

CODE (https://github.com/alibaba/federatedscope)

CODE (https://github.com/yxdyc/pfedgate)







54. Fast Federated Machine Unlearning with Nonlinear Functional Theory

Authors: Tianshi Che; Yang Zhou; Zijie Zhang; Lingjuan Lyu; Ji Liu; Da Yan; Dejing Dou; Jun Huan

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/che23b.html

Abstract: Federated machine unlearning (FMU) aims to remove the influence of a specified subset of training data upon request from a trained federated learning model. Despite achieving remarkable performance, existing FMU techniques suffer from inefficiency due to two sequential operations of training and retraining/unlearning on large-scale datasets. Our prior study, PCMU, was proposed to improve the efficiency of centralized machine unlearning (CMU) with certified guarantees, by simultaneously executing the training and unlearning operations. This paper proposes a fast FMU algorithm, FFMU, for improving the FMU efficiency while maintaining the unlearning quality. The PCMU method is leveraged to train a local machine learning (MU) model on each edge device. We propose to employ nonlinear functional analysis techniques to refine the local MU models as output functions of a Nemytskii operator. We conduct theoretical analysis to derive that the Nemytskii operator has a global Lipschitz constant, which allows us to bound the difference between two MU models regarding the distance between their gradients. Based on the Nemytskii operator and average smooth local gradients, the global MU model on the server is guaranteed to achieve close performance to each local MU model with the certified guarantees.

ISSN: 2640-3498 abstractTranslation: 联邦机器取消学习 (FMU) 旨在根据经过训练的联邦学习模型的请求消除指定训练数据子集的影响。尽管取得了显着的性能,但现有的 FMU 技术由于在大规模数据集上进行两次连续的训练和再训练/取消学习操作而效率低下。我们之前的研究 PCMU 旨在通过同时执行训练和取消学习操作来提高具有认证保证的集中式机器取消学习(CMU)的效率。本文提出了一种快速 FMU 算法 FFMU,用于在保持取消学习质量的同时提高 FMU 效率。利用 PCMU 方法在每个边缘设备上训练本地机器学习 (MU) 模型。我们建议采用非线性函数分析技术来细化局部 MU 模型作为 Nemytskii 算子的输出函数。我们进行理论分析,得出 Nemytskii 算子具有全局 Lipschitz 常数,这使我们能够限制两个 MU 模型之间关于梯度之间距离的差异。基于 Nemytskii 算子和平均平滑局部梯度,服务器上的全局 MU 模型保证在经过认证的保证下实现与每个局部 MU 模型接近的性能。

Notes:

PUB (https://openreview.net/forum?id=6wQKmKiDHw)







55. LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning

Authors: Timothy Castiglia; Yi Zhou; Shiqiang Wang; Swanand Kadhe; Nathalie Baracaldo; Stacy Patterson

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/castiglia23a.html

Abstract: We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.

ISSN: 2640-3498 abstractTranslation: 我们提出了 LESS-VFL,一种用于具有纵向分区数据的分布式系统的通信高效特征选择方法。我们考虑一个由服务器和多个具有本地数据集的各方组成的系统,这些数据集共享样本 ID 空间但具有不同的特征集。双方希望合作训练一个用于预测任务的模型。作为培训的一部分,各方希望删除系统中不重要的特征,以提高泛化性、效率和可解释性。在 LESS-VFL 中,经过短暂的预训练期后,服务器优化其全局模型的一部分,以确定各方模型的相关输出。该信息与各方共享,然后无需通信即可进行本地特征选择。我们分析证明 LESS-VFL 消除了模型训练中的虚假特征。我们提供了广泛的经验证据,表明 LESS-VFL 可以实现高精度并以其他特征选择方法的一小部分通信成本去除虚假特征。

Notes:

PUB (https://openreview.net/forum?id=L8iWCxzwl1)

PDF (https://arxiv.org/abs/2305.02219)







56. Optimizing the Collaboration Structure in Cross-Silo Federated Learning

Authors: Wenxuan Bao; Haohan Wang; Jun Wu; Jingrui He

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/bao23b.html

Abstract: In federated learning (FL), multiple clients collaborate to train machine learning models together while keeping their data decentralized. Through utilizing more training data, FL suffers from the potential negative transfer problem: the global FL model may even perform worse than the models trained with local data only. In this paper, we propose FedCollab, a novel FL framework that alleviates negative transfer by clustering clients into non-overlapping coalitions based on their distribution distances and data quantities. As a result, each client only collaborates with the clients having similar data distributions, and tends to collaborate with more clients when it has less data. We evaluate our framework with a variety of datasets, models, and types of non-IIDness. Our results demonstrate that FedCollab effectively mitigates negative transfer across a wide range of FL algorithms and consistently outperforms other clustered FL algorithms.

ISSN: 2640-3498 abstractTranslation: 在联邦学习 (FL) 中,多个客户协作共同训练机器学习模型,同时保持数据分散。通过利用更多的训练数据,FL 面临着潜在的负迁移问题:全局 FL 模型甚至可能比仅使用本地数据训练的模型表现更差。在本文中,我们提出了 FedCollab,这是一种新颖的 FL 框架,它通过根据客户的分布距离和数据量将客户聚类成不重叠的联盟来减轻负转移。结果,每个客户端仅与具有相似数据分布的客户端协作,并且当其数据较少时倾向于与更多客户端协作。我们使用各种数据集、模型和非独立同分布类型来评估我们的框架。我们的结果表明,FedCollab 有效地减轻了各种 FL 算法的负迁移,并且始终优于其他集群 FL 算法。

Notes:

PUB (https://openreview.net/forum?id=rnNBSMOWvA)

PDF (https://arxiv.org/abs/2306.06508)

CODE (https://github.com/baowenxuan/fedcollab) SLIDES(https://icml.cc/media/icml-2023/Slides/23569.pdf)







57. Personalized Subgraph Federated Learning

Authors: Jinheon Baek; Wonyong Jeong; Jiongdao Jin; Jaehong Yoon; Sung Ju Hwang

Conference : International Conference on Machine Learning

Url: https://proceedings.mlr.press/v202/baek23a.html

Abstract: Subgraphs of a larger global graph may be distributed across multiple devices, and only locally accessible due to privacy restrictions, although there may be links between subgraphs. Recently proposed subgraph Federated Learning (FL) methods deal with those missing links across local subgraphs while distributively training Graph Neural Networks (GNNs) on them. However, they have overlooked the inevitable heterogeneity between subgraphs comprising different communities of a global graph, consequently collapsing the incompatible knowledge from local GNN models. To this end, we introduce a new subgraph FL problem, personalized subgraph FL, which focuses on the joint improvement of the interrelated local GNNs rather than learning a single global model, and propose a novel framework, FEDerated Personalized sUBgraph learning (FED-PUB), to tackle it. Since the server cannot access the subgraph in each client, FED-PUB utilizes functional embeddings of the local GNNs using random graphs as inputs to compute similarities between them, and use the similarities to perform weighted averaging for server-side aggregation. Further, it learns a personalized sparse mask at each client to select and update only the subgraph-relevant subset of the aggregated parameters. We validate our FED-PUB for its subgraph FL performance on six datasets, considering both non-overlapping and overlapping subgraphs, on which it significantly outperforms relevant baselines. Our code is available at https://github.com/JinheonBaek/FED-PUB.

ISSN: 2640-3498 abstractTranslation: 较大全局图的子图可能分布在多个设备上,并且由于隐私限制只能在本地访问,尽管子图之间可能存在链接。最近提出的子图联邦学习(FL)方法可以处理局部子图之间缺失的链接,同时在其上分布式训练图神经网络(GNN)。然而,他们忽视了由全局图的不同社区组成的子图之间不可避免的异质性,从而瓦解了局部 GNN 模型中不兼容的知识。为此,我们引入了一种新的子图 FL 问题,即个性化子图 FL,其重点是相互关联的局部 GNN 的共同提升,而不是学习单个全局模型,并提出了一种新颖的框架 FEDerated Personalized sUBgraph learning (FED-PUB) 来解决这一问题。由于服务器无法访问每个客户端中的子图,因此 FED-PUB 利用本地 GNN 的功能嵌入(使用随机图作为输入)来计算它们之间的相似度,并使用相似度对服务器端聚合执行加权平均。此外,它在每个客户端学习个性化稀疏掩码,以仅选择和更新聚合参数的子图相关子集。我们在六个数据集上验证了 FED-PUB 的子图 FL 性能,同时考虑了非重叠和重叠子图,其显著优于相关基线。我们的代码可在 https://github.com/JinheonBaek/FED-PUB 获取。

Notes:

PUB (https://openreview.net/forum?id=GXHL8ZS1GX)

PDF (https://arxiv.org/abs/2206.10206)

CODE (https://github.com/JinheonBaek/FED-PUB)

项目链接: https://zhuanlan.zhihu.com/p/648688758

作者: 白小鱼(上海交通大学计算机系博士生)

分享仅供学习参考,若有不当,请联系我们处理。

END

热门文章:




隐私计算头条周刊(08.14-08.20)


招标 | 近期隐私计算项目招标中标42(广西电网公司、云南农业职业技术学院、中国电信)


2023全球各国隐私计算发展最新动态盘点


社区招募丨OpenMPC隐私计算课程课代表征集


加入我们丨OpenMPC社区招募实习生
