Jiashu Tao
Department of Computer Science
National University of Singapore
Singapore
jiashut@comp.nus.edu.sg
Reza Shokri
Department of Computer Science
National University of Singapore
Singapore
reza@comp.nus.edu.sg
Abstract
Machine learning models can leak private information about their training data, but the standard methods to measure this risk, based on membership inference attacks (MIAs), have a major limitation. They only check whether a given data point exactly matches a training point, neglecting that similar or partially overlapping data can reveal the same private information. To address this issue, we introduce the class of range membership inference attacks (RaMIAs), which test whether the model was trained on any data in a specified range (defined based on the semantics of privacy). We formulate the RaMIA game and design a principled statistical test for its composite hypotheses. We show that RaMIAs can capture privacy loss more accurately and comprehensively than MIAs on various types of data, such as tabular, image, and language. RaMIA paves the way for more comprehensive and meaningful privacy auditing of machine learning algorithms.
1 Introduction
Machine learning models are prone to training data memorization [14, 15, 27, 41, 24]. It is also well known that the outstanding predictive performance of machine learning models on long-tailed data distributions often comes at the expense of blatant memorization of certain data points [15, 3, 30, 16]. In simple terms, memorization is the phenomenon that models behave differently on training points than on other points. Memorization can lead to significant privacy risks, as adversaries can infer private information about the training data with only black-box access to the model.
To quantify the privacy risk of machine learning models, a privacy notion must be fixed first. The prevailing privacy notion is based on membership information. The membership of a data point is a single bit, but this bit carries significant privacy implications. Being able to infer membership opens up the possibility of data reconstruction attacks [36, 17, 4, 29], which inspect the membership of plausible data points to recover the training set. The de facto way to audit privacy risk under this notion is to conduct membership inference attacks (MIAs) [39], where an adversary aims to predict whether a given query is part of the target model's training set. The more powerful the membership inference attack, the higher the privacy risk the target model bears.
Membership inference attacks provide a lower bound on the true privacy risk of a model, so improving attack performance also tightens the privacy risk estimate. So far, the community has put much effort into improving the power of membership inference attacks by crafting better membership signals and constructing better statistical tests [38, 39, 35, 44, 5, 47]. While this work has improved privacy auditing, it has overlooked a fundamental drawback of MIAs as a practical privacy auditing tool: MIAs treat it as a privacy concern only if the adversary can identify the exact, full version of a training point. However, if the adversary can identify data points that are sufficiently similar to training points, this should also be treated as a significant privacy risk, because those points can contain similar levels of private information. For example, two photos of Alice taken from slightly different angles, or with different backgrounds, contain similar private information about Alice's face or location. Likewise, small perturbations or rephrasings of text barely change the sensitivity of the information it conveys [13]. Because of this oversight, leakage of private information beyond exact matches of training data is ignored, and current privacy auditing tools may produce overly optimistic results.
Moreover, focusing on exact membership renders MIAs incapable of handling queries with missing values. This is another major limitation, as data records with the same sensitive features and a few missing non-private features carry a similar level of private information as the full records. Suppose an adversary can infer that an Asian person aged 25 with identification number 123456 is in a hospital's training set. There is no need to identify the rest of the features, because the adversary can already pinpoint who has HIV from this subset of key features. Even if the key identifiers are missing and unknown, it is still a grave privacy threat if the attacker can infer the remaining features, which often contain quasi-identifiers; this has been studied extensively in the failures of k-anonymity [37], where the attacker can reconstruct and identify training data with certain features removed. However, if we make up the missing values or pass noisy data to a membership inference attack, the attack is expected to output "not a member," since the chance of guessing the right values is very slim. If we instead use property inference [40, 1, 8, 9, 45, 20] to fill in the missing features, the imputed data may receive inflated membership scores even when they are non-members, leading to a higher false positive rate. Therefore, existing frameworks struggle to quantify privacy leakage when features are missing.
We argue that privacy quantification should not be point-based, because a small neighborhood around each training point also contains similar private information. Hence, in this paper, we propose a new attack framework called range membership inference attacks (RaMIAs) to better capture the notion of privacy. Instead of using point queries and testing for exact matches, range membership inference attacks use range queries that cover a set of points. Their goal is to infer whether a given range query contains any training point.
Range membership inference attacks extend the formulation of membership inference attacks. We adapt the original inference game to range queries. This extended formulation leads to composite hypotheses in the likelihood ratio tests, which are the standard and strongest attack technique for MIAs [38, 44, 5, 47]. Our method is based on standard statistical methods for composite hypothesis testing, namely generalized likelihood ratio tests and Bayes factors. We show that RaMIAs provide a more comprehensive notion of privacy by detecting private information leakage from the vicinity of training data, where MIAs underestimate the privacy risk. Specifically, we observe that a simple flip can cause the membership score to drop from very high to 0 (Figure 1), and the overall AUC can drop by 0.20 if we test image classifiers with horizontally flipped images (Figure 2(c)). RaMIA, implemented with our simple attack strategy (Sec 4), outperforms MIA by at least 5% on image datasets (Figures 3(b) and 3(c)), providing better privacy auditing at the cost of as few as 15 samples, which is insignificant compared to the dimensionality of the data space.
In this paper, we emphasize the motivation and formulation of our newly proposed attack framework, RaMIA. As a proof of concept, we evaluate RaMIA with a simple attack strategy on tabular, image, and text datasets, where RaMIA consistently outperforms MIA. Our attack could also potentially be used in pioneering membership inference and data extraction attacks on generative models [4, 43, 7], whose current evaluation requires finding the closest training image for every candidate and computing their distances. By setting the distance function as the range function and conducting RaMIAs, these attacks can be evaluated more systematically.
2 Preliminaries
2.1 Membership inference attacks
The membership inference attack (MIA) [39] is a type of inference attack against machine learning models that infers whether a given data sample is part of the model's training set. Formally, given a model $\theta$ and a query point $x$, the MIA aims to output 1 if $x$ is a training point of $\theta$, and 0 otherwise. There are various ways to construct and conduct the attack, and it remains an active research direction in which increasingly powerful attacks are being developed. Shokri et al. [39] use a shadow-model-based approach, where shadow models are trained on known training sets in a similar way to the target model. The confidence values of training and test data on the shadow models are computed and then used as benchmarks for testing. However, the high cost of this approach and its strong assumption of knowing the target model's training details often make the attack infeasible. Yeom et al. [45] threshold the model loss as a membership signal, removing the need for shadow models. MIA was subsequently formulated as an inference game (see Sec 3.1.1), and researchers turned to principled approaches that solve the game via likelihood ratio tests [38, 5, 44, 47]. Carlini et al. [5] and Ye et al. [44] propose reference-model-based approaches, where target signals are compared with those obtained on reference models to compute the likelihood ratio. To further boost attack power, Zarifzadeh et al. [47] assume the attacker has access to a pool of population data, so that the likelihood ratio from reference-based attacks can be calibrated against ratios obtained on (non-member) population data.
Recent attacks [4, 47] further boost attack performance on image data by augmenting test queries with train-time augmentations. This assumes that the attacker knows the exact train-time augmentations in advance and can sample from them. Augmenting training images with augmentations not used at training time is not considered, for a valid reason: under the current privacy notion, those augmented images are non-members.
2.2 Range queries
Drawing an analogy to databases, a membership inference attack operates on point queries, also known as exact-match queries. That is, each query to the attack contains only one data point, and the attack only asks whether this very point is in the training set. A range query, by contrast, is another common operation in database systems; it retrieves all data points that fall into a given "range". The fundamental difference from a point query is that the result often contains multiple data points instead of a single one. Our proposed attack, the range membership inference attack, operates on range queries.
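The database analogy can be made concrete with a small, illustrative Python sketch on a toy set of 2-D points. The data, the Euclidean range function, and the helper names (`point_query`, `range_query`, `range_contains_member`) are assumptions made for illustration only. A real adversary cannot enumerate the hidden training set the way `range_query` does; the sketch only shows which question each query type answers.

```python
import math

# Toy "training set" of 2-D points (illustrative data, not from the paper).
TRAIN = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]


def point_query(x, data):
    """Exact-match (point) query: is this very point in the data?"""
    return x in data


def range_query(center, radius, data, dist=math.dist):
    """Range query: every stored point within `radius` of `center`."""
    return [y for y in data if dist(center, y) <= radius]


def range_contains_member(center, radius, data):
    """The question RaMIA asks: does the range hold *any* training point?"""
    return len(range_query(center, radius, data)) > 0


query = (1.1, 0.9)  # a slightly perturbed copy of the training point (1.0, 1.0)
print(point_query(query, TRAIN))                 # False: no exact match
print(range_contains_member(query, 0.5, TRAIN))  # True: (1.0, 1.0) is nearby
```

The perturbed query fails the exact-match test, yet its range still contains a training point, which is precisely the gap that range queries are meant to close.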
3 From MIA to RaMIA
Membership inference attacks are often formulated as an inference game [45, 21, 44, 5, 47] between a challenger and an adversary. In this section, we walk through how we derive RaMIA from MIA.
3.1 Membership inference attacks
In membership inference attacks, the goal is to identify if a given point is part of the training set.
3.1.1 Membership inference game
Definition 1
(Membership Inference Game [44, 45]) Let $\pi$ be the data distribution, and let $\mathcal{T}$ be the training algorithm.
- 1. The challenger samples a training dataset $D \sim \pi$, and trains a model $\theta \sim \mathcal{T}(D)$.
- 2. The challenger samples a data record $x_0 \sim \pi$ from the data distribution, and a training data record $x_1 \sim \mathcal{U}(D)$.
- 3. The challenger flips a fair coin to get the bit $b \in \{0, 1\}$, and sends the target model $\theta$ and data record $x_b$ to the adversary.
- 4. The adversary gets access to the data distribution $\pi$ and access to the target model $\theta$, and outputs a bit $\hat{b} \leftarrow \mathcal{A}(\theta, x_b)$.
- 5. If $\hat{b} = b$, output 1 (success). Otherwise, output 0.
3.1.2 Evaluation of MIA
Evaluation is done on a set of training and test points. The true positive rate (TPR) and false positive rate (FPR) are computed by sweeping over all possible threshold values. By plotting the receiver operating characteristic (ROC) curve, the power of an attack strategy can be summarized by the area under the curve (AUC). A clueless adversary who can only guess membership labels at random gets an AUC of 0.5. Stronger adversaries predict membership more accurately at each error level; hence, they achieve a higher TPR at each FPR and a higher AUC.
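A minimal Python sketch of the threshold sweep described above, assuming that a higher score means "more likely a member". The function names and toy scores are hypothetical; libraries such as scikit-learn offer equivalent routines (`sklearn.metrics.roc_curve`, `sklearn.metrics.roc_auc_score`).

```python
def roc_points(member_scores, nonmember_scores):
    """Sweep every distinct score as a threshold t; predict 'member' if score >= t.

    Returns (FPR, TPR) points ordered by increasing FPR, starting at (0, 0).
    """
    thresholds = sorted(set(member_scores) | set(nonmember_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


members = [0.9, 0.8, 0.4]     # hypothetical scores of training points
nonmembers = [0.7, 0.3, 0.2]  # hypothetical scores of test points
print(auc(roc_points(members, nonmembers)))
```

Here 8 of the 9 member/non-member pairs are ranked correctly, so the AUC is 8/9; if both groups received identical scores, the curve would collapse to the diagonal and the AUC would be 0.5, matching the random-guessing baseline.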
3.1.3 Intrinsic limitation of MIA as a Privacy Auditing Framework
MIAs are intrinsically incapable of identifying points close to training points, regardless of how similar they are, because such points are, by definition, non-members in the scope of MIAs. Hence, there is a huge space of points that contain private information but are deemed non-members in the current privacy auditing framework. As a result, MIAs become unreliable privacy auditing tools once the queries move away from the original data. Figure 2 shows that MIAs underperform on non-original data. This motivates our formulation of RaMIA, where such points are classified as "members" for better and more comprehensive privacy auditing.
3.2 Range membership inference attack
In range membership inference attacks, the goal is to identify if a given range contains any training point.
3.2.1 Range membership inference game
Here we define our range membership inference game, modified from the above formulation.
Definition 2
(Range Membership Inference Game) Let $\pi$ be the data distribution, and let $\mathcal{T}$ be the training algorithm.
- 1. The challenger samples a training dataset $D \sim \pi$, and trains a model $\theta \sim \mathcal{T}(D)$.
- 2. The challenger samples a data record $x_0 \sim \pi$ from the data distribution, and a training data record $x_1 \sim \mathcal{U}(D)$.
- 3. The challenger flips a fair coin to get the bit $b \in \{0, 1\}$. If $b = 1$, the challenger samples a range $\mathcal{R}_1$ containing at least one training point. Otherwise, the challenger samples a range $\mathcal{R}_0$ containing no training points.
- 4. The challenger sends the target model $\theta$ and the range $\mathcal{R}_b$ to the adversary.
- 5. The adversary gets access to the data distribution $\pi$ and access to the target model $\theta$, and outputs a bit $\hat{b} \leftarrow \mathcal{A}(\theta, \mathcal{R}_b)$.
- 6. If $\hat{b} = b$, output 1 (success). Otherwise, output 0.
The main difference between the two games is that the adversary now receives a range query (Step 4 in Def 2) instead of a point query (Step 3 in Def 1). We assume that the adversary can sample within the range. This is a reasonable assumption because the adversary is usually assumed to be able to sample from the original data distribution [39, 44, 47]. Given a range query and a sampler of the data distribution $\pi$, it is not difficult to sample within the range.
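One simple way to realize this assumption is rejection sampling: draw from the data sampler and keep only the draws that land inside the range. The Python sketch below assumes a toy 2-D Gaussian as the data distribution and a Euclidean ball as the range; `sample_within_range`, `gaussian_pi`, and `in_ball` are hypothetical helpers, not the sampler used in the paper.

```python
import math
import random


def sample_within_range(sample_pi, in_range, n, max_tries=100_000, rng=None):
    """Rejection sampling: draw from the data sampler, keep draws inside the range."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    kept = []
    for _ in range(max_tries):
        x = sample_pi(rng)
        if in_range(x):
            kept.append(x)
            if len(kept) == n:
                return kept
    raise RuntimeError(f"found only {len(kept)} of {n} samples; the range may be too small")


def gaussian_pi(rng):
    """Hypothetical data distribution pi: a 2-D standard Gaussian."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


CENTER, RADIUS = (0.5, -0.2), 0.75


def in_ball(x):
    """Range test for a Euclidean ball (the range function here is L2 distance)."""
    return math.dist(x, CENTER) <= RADIUS


samples = sample_within_range(gaussian_pi, in_ball, n=20)
```

The acceptance rate equals the probability mass of the range under the data distribution, so for small ranges in high-dimensional spaces a range-specific sampler would be far more efficient than this generic approach.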
What is a range
A range can be defined by a center, which is a point; a radius, representing the size of the range; and a distance function with respect to which the radius is defined. In this paper, we refer to the center as the query center, the radius as the range size, and the distance function as the range function. One way to visualize a range is to imagine a unit ball around a point $x$, with the radius and distance replaced by any choice of range size and range function. Our framework can accommodate any range function: spatial (e.g., distances), transformation-based (e.g., geometric transformations), or semantic (e.g., the owner or main features of the data). In the experiment section, we present results with all of these types of range functions. Note that our attack reduces to user-level inference [31, 23, 27, 11, 10] when the range function is user-based.
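The three ingredients of a range (query center, range size, range function) map naturally onto a small data type. The Python sketch below is hypothetical: the `Range` class and the helper range functions are our own illustration, covering a spatial range (Euclidean ball), a text range based on token-level Hamming distance (in the spirit of the AG News setup), and a semantic owner-based range (in the spirit of the CelebA setup). Transformation-based ranges could be encoded the same way, for instance with a range function that returns 0 when one input is a listed transformation of the other.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Range:
    """A range: query center, range size, and range function (a distance)."""
    center: Any
    size: float
    range_fn: Callable[[Any, Any], float]

    def contains(self, x: Any) -> bool:
        return self.range_fn(self.center, x) <= self.size


def hamming(a, b):
    """Number of token positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(u != v for u, v in zip(a, b))


def same_owner(a, b):
    """Semantic range function: 0 for the same person, infinite otherwise."""
    return 0.0 if a[0] == b[0] else math.inf


# Spatial range: a Euclidean ball of radius 0.1 around a 3-D point.
pixel_ball = Range(center=(0.2, 0.4, 0.6), size=0.1, range_fn=math.dist)

# Text range: same-length token sequences within Hamming distance 2 of the center.
# (The AG News experiments use distance 8 on full sentences; 2 keeps this toy
# 7-token example from covering every possible sentence.)
center_sentence = tuple("stocks rally as tech earnings beat forecasts".split())
text_range = Range(center=center_sentence, size=2, range_fn=hamming)

# Semantic range: every photo of the same (hypothetical) person.
owner_range = Range(center=("alice", "photo_03.jpg"), size=0.0, range_fn=same_owner)
```

Keeping the range function as a plain callable means the same containment test serves all three range types; only the distance changes.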
How to construct a range
In Step 3 of the range membership inference game, the details of how the challenger samples the ranges are intentionally omitted. This is because the ranges can be constructed around either in-distribution or out-of-distribution data points for both in- and out-ranges. The details of how we construct the in- and out-ranges for our experiments are elaborated in Appendix B.
3.3 Evaluation of RaMIA
Similarly, we evaluate RaMIA with AUCs. However, true positives and false positives are defined differently from MIA, as both are defined at the range level. To avoid confusion, we call the corresponding rates Range TPR and Range FPR: Range TPR is the fraction of ranges containing at least one training point that are correctly predicted to contain one, and Range FPR is the fraction of ranges containing no training point that are wrongly predicted to contain one.
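Range-level evaluation can be sketched in Python as follows. The challenger, who knows the training set, labels each range as positive if it contains any training point; Range TPR and Range FPR then follow at a chosen threshold, and sweeping the threshold yields the range-level ROC curve and AUC. The data, scores, and helper names are hypothetical.

```python
def range_label(range_points, training_set):
    """Ground truth at the range level: 1 if the range holds any training point."""
    return int(any(x in training_set for x in range_points))


def range_tpr_fpr(range_scores, range_labels, threshold):
    """Range TPR / FPR when a range is predicted 'in' iff its score >= threshold."""
    pos = [s for s, y in zip(range_scores, range_labels) if y == 1]
    neg = [s for s, y in zip(range_scores, range_labels) if y == 0]
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr


# Toy example with string identifiers standing in for data points.
training_set = {"a", "b"}
ranges = [["a", "x"], ["y", "z"], ["q", "b"], ["w"]]
labels = [range_label(r, training_set) for r in ranges]   # [1, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.1]                              # hypothetical range scores
print(range_tpr_fpr(scores, labels, threshold=0.5))        # (0.5, 0.5)
```

Note that a range counts as a single positive regardless of how many training points it contains, which is what distinguishes these rates from point-level TPR and FPR.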
4 Range membership inference attacks
4.1 Composite hypothesis testing
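Similar to the likelihood ratio tests for membership inference games (Appendix A.1), we can construct two hypotheses for the range membership inference game (Def 2):

$$H_{\text{out}}: \theta \text{ was not trained on any } x \in \mathcal{R}, \qquad H_{\text{in}}: \theta \text{ was trained on at least one } x \in \mathcal{R}.$$

The likelihood ratio in this case is $\mathrm{LR}(\theta, \mathcal{R}) = \Pr(\theta \mid H_{\text{in}}) / \Pr(\theta \mid H_{\text{out}})$. Note that the alternative hypothesis is composite because it is a union of multiple simple hypotheses $H_{\text{in}}^{x}$: $\theta$ was trained on $x$, for $x \in \mathcal{R}$. Therefore, we need methods tailored for composite hypothesis testing. There are two commonly used methods: the Bayes Factor [22] and Generalized Likelihood Ratio Tests (GLRT) [42]. The Bayes Factor replaces the composite hypothesis with a simple one that is representative of its hypothesis class. It models the "parameter" of the hypothesis with a prior distribution and then computes the expected value of the composite hypothesis under that prior. In this case, $\Pr(\theta \mid H_{\text{in}})$ is computed as $\mathbb{E}_{x \sim p(x)}\left[\Pr(\theta \mid H_{\text{in}}^{x})\right]$, where $p$ is a prior over the points in $\mathcal{R}$. The generalized likelihood ratio test (GLRT) simply takes the maximum of all values the composite hypothesis can achieve. In this case, $\Pr(\theta \mid H_{\text{in}})$ is computed as $\max_{x \in \mathcal{R}} \Pr(\theta \mid H_{\text{in}}^{x})$.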
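Because the point-level simple hypotheses all share the same null-hypothesis likelihood in the denominator, the Bayes factor with a uniform prior over the sampled points reduces to the average of the point-level likelihood ratios, and the GLRT statistic reduces to their maximum. The Python sketch below contrasts the two on hypothetical numbers for an out-range whose samples include one out-of-distribution point that an MIA scores spuriously high; the helper names are ours.

```python
import statistics


def bayes_factor_estimate(point_lrs):
    """Bayes-factor-style estimate: average the point-level likelihood ratios,
    i.e. a uniform prior over the sampled points (a Monte Carlo expectation)."""
    return statistics.fmean(point_lrs)


def glrt_estimate(point_lrs):
    """GLRT-style estimate: the best-supported simple hypothesis (the maximum)."""
    return max(point_lrs)


# Hypothetical point-level likelihood ratios for five samples from one out-range;
# the last one is an OOD point that receives a spuriously high score.
lrs = [1.1, 0.9, 1.3, 1.0, 9.0]
print(bayes_factor_estimate(lrs), glrt_estimate(lrs))
```

A single spurious sample fully determines the GLRT statistic and still pulls the plain average upward, which illustrates why the aggregation discussed next trims extreme scores rather than relying on either statistic alone.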
To make full use of the Bayes Factor, we need to know the prior distribution, which is unrealistic. On the other hand, taking the max seems to be a more intuitive approach for range membership inference attacks, because it provides a two-step solution: search and test. Searching for the points with the highest membership score is conceptually equivalent to identifying the points that are most likely to be training points. However, this assumes that we can reliably find the max values in a given range. Since most ranges are large data subspaces, it is very challenging to find the optimal points within the large space. Even if the search space can be navigated, any search algorithm is likely to return local maxima. Hence, a robust way is to aggregate the top samples. However, membership inference attacks are known to be unreliable on out-of-distribution (OOD) data [47] and can assign them high scores. When the sampling space contains only these data as opposed to real and in-distribution (ID) data, the maximum might not be anything close to true training data, increasing the FPR and lowering the AUC as a result.
In this paper, we adopt a simple attack strategy whose form depends on the type of data in the sampling space. If the data are all naturally ID, we can combine the Bayes Factor and GLRT by averaging the likelihoods of the top samples, which reduces the influence of randomness in the sampling process. On the other hand, if the adversary can only synthesize data within the range, the top samples are highly likely to be OOD data with high membership scores, so we remove those points from the aggregation. Since the presence of training points intuitively raises the average membership score of nearby ID points, compared to having no training points at all, we average the membership scores of the remaining samples. Unifying these two strategies gives us the following:
$$\mathrm{Score}_{\text{RaMIA}}(\mathcal{R}) = \frac{1}{\lceil q_e n \rceil - \lfloor q_s n \rfloor} \sum_{i = \lfloor q_s n \rfloor + 1}^{\lceil q_e n \rceil} s_{(i)}, \qquad n = |S|, \tag{1}$$

where $S$ is the sampled set, $s_{(1)} \le \dots \le s_{(n)}$ are the membership scores $\mathrm{Score}_{\text{MIA}}(x; \theta)$ of the points $x \in S$ sorted in ascending order, and $q_s$ and $q_e$ mark the start and end quantiles of the retained scores, so that Eqn 1 is a one-sided trimmed mean. If the sampling space is filled with synthetic data, the chance that the top samples are false positives is high, so we set $q_s = 0$ and $q_e < 1$ to remove the largest points. Here $q_e$ is a hyperparameter that decreases (trims more) as the quality of the sampled points gets worse. If instead the sampling space consists of real points, we set $q_e = 1$ and $q_s > 0$ to remove the smallest points from the aggregation. Here $q_s$ decreases (trims less) as the number of real samples decreases, to offset the high variance caused by the limited samples available. Note that the optimal hyperparameters may differ across membership signals (e.g., loss values, LiRA scores), as these exploit different vulnerabilities and expose different training points. However, for fixed model architectures, range functions, data distributions, and sampling methods, these hyperparameters can be determined with reference models, similar to the offline version of RMIA [47]. Specifically, by randomly choosing one reference model as a temporary target model, we can run RaMIAs using all other reference models while sweeping these hyperparameters.
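A possible Python reading of Eqn 1 together with the reference-model sweep described above. The rounding of the quantile boundaries, the rank-based AUC, and the grid search with a single reference model acting as the target are our assumptions rather than details given in the paper, and all numbers are hypothetical.

```python
import math


def ramia_score(scores, q_s=0.0, q_e=1.0):
    """One-sided trimmed mean of per-sample MIA scores (a reading of Eqn 1).

    Scores are sorted ascending and the 1-based ranks floor(q_s*n)+1 .. ceil(q_e*n)
    are averaged. q_s > 0 drops the lowest scores (real, ID samples);
    q_e < 1 drops the highest scores (synthetic samples that may be OOD).
    """
    if not scores:
        raise ValueError("need at least one sampled score")
    if not 0.0 <= q_s < q_e <= 1.0:
        raise ValueError("need 0 <= q_s < q_e <= 1")
    s = sorted(scores)
    n = len(s)
    eps = 1e-9  # absorbs floating-point error in q * n
    lo = min(math.floor(q_s * n + eps), n - 1)
    hi = max(math.ceil(q_e * n - eps), lo + 1)  # keep at least one score
    kept = s[lo:hi]
    return sum(kept) / len(kept)


def auc(pos, neg):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def sweep_quantile(ref_ranges, ref_labels, grid, trim="top"):
    """Choose the trimming quantile that maximizes AUC on reference-model data.

    ref_ranges: one list of per-sample MIA scores per range, computed against a
        reference model that temporarily plays the role of the target.
    ref_labels: 1 if the range contains a training point of that model, else 0.
    trim="top" sweeps q_e with q_s = 0; trim="bottom" sweeps q_s with q_e = 1.
    """
    best_q, best_auc = None, -1.0
    for q in grid:
        kwargs = {"q_e": q} if trim == "top" else {"q_s": q}
        scores = [ramia_score(r, **kwargs) for r in ref_ranges]
        pos = [s for s, y in zip(scores, ref_labels) if y == 1]
        neg = [s for s, y in zip(scores, ref_labels) if y == 0]
        value = auc(pos, neg)
        if value > best_auc:
            best_q, best_auc = q, value
    return best_q, best_auc


# Hypothetical reference-model data: in-ranges have moderately high scores,
# while each out-range contains one spuriously high (OOD-like) synthetic sample.
ref_ranges = [
    [0.60, 0.70, 0.65, 0.62],
    [0.55, 0.60, 0.70, 0.58],
    [0.10, 0.20, 0.15, 5.00],
    [0.05, 0.30, 0.20, 4.00],
]
ref_labels = [1, 1, 0, 0]
best_q_e, best_auc = sweep_quantile(ref_ranges, ref_labels, grid=[1.0, 0.75, 0.5])
```

In this toy setting the untrimmed mean ($q_e = 1$) ranks both out-ranges above both in-ranges, while dropping the top quarter of scores separates them perfectly, so the sweep selects $q_e = 0.75$.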
4.2 Range membership inference attack as a framework
The range membership inference attack is a new inference attack framework, not a particular attack algorithm. The framework has two components, a sampler and a membership tester, both of which are necessary to compute the range membership score formulated in Eqn 1. The sampler returns samples within the given range. The membership tester is a (point-query) membership inference algorithm that outputs a membership score, which can be used to approximate the likelihood ratio for a single point; many existing attack algorithms can be plugged in here. As with MIAs, the key to using RaMIA as a privacy auditing tool is computing the range membership score, and our framework can adopt any existing membership scoring function to compute the per-sample scores. Below, we outline the attack with the strategy described above:
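1: Input: range $\mathcal{R}$, sampler $\mathcal{G}$, target model $\theta$, membership scoring function $\mathrm{Score}_{\text{MIA}}$.
2: Sample an attack set: $S \leftarrow \mathcal{G}(\mathcal{R})$;
3: if samples are real and ID then
4: Set $q_e = 1$, and set $q_s$ by sweeping on reference models;
5: else
6: Set $q_s = 0$, and set $q_e$ by sweeping on reference models.
7: end if
8: return $\mathrm{Score}_{\text{RaMIA}}(\mathcal{R})$ computed from $\{\mathrm{Score}_{\text{MIA}}(x; \theta) : x \in S\}$ with Eqn 1.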
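The listed procedure can be sketched end to end in Python as below. The sampler and scoring function are passed in as callables, so any existing point-level MIA score (e.g., an RMIA score) could be plugged in; the trimming rule repeats our reading of Eqn 1, and the toy 1-D model, sampler, and quantile value are purely hypothetical.

```python
import math
import random


def trimmed_mean(scores, q_s, q_e):
    """One-sided trimmed mean (our reading of Eqn 1): average the sorted
    scores with 1-based ranks floor(q_s*n)+1 .. ceil(q_e*n)."""
    s = sorted(scores)
    n = len(s)
    lo = min(math.floor(q_s * n + 1e-9), n - 1)
    hi = max(math.ceil(q_e * n - 1e-9), lo + 1)
    return sum(s[lo:hi]) / (hi - lo)


def ramia(range_, sampler, target_model, score_mia, n, samples_are_real_id, q):
    """Sketch of the RaMIA procedure listed above.

    sampler(range_, n)   -> n points inside the range (step 2)
    score_mia(model, x)  -> point-level membership score
    q                    -> quantile already chosen on reference models:
                            q_s if samples are real and ID, otherwise q_e
    """
    attack_set = sampler(range_, n)
    scores = [score_mia(target_model, x) for x in attack_set]
    if samples_are_real_id:
        q_s, q_e = q, 1.0   # steps 3-4: drop the lowest scores
    else:
        q_s, q_e = 0.0, q   # steps 5-6: drop the highest (likely OOD) scores
    return trimmed_mean(scores, q_s, q_e)  # step 8


# Toy usage (all hypothetical): a 1-D "model" whose loss is low near x = 3.0,
# ranges given as (center, radius), and a uniform sampler inside the range.
def toy_loss(x):
    return (x - 3.0) ** 2


def neg_loss_score(model, x):
    return -model(x)  # higher means more member-like


rng = random.Random(0)


def uniform_sampler(range_, n):
    center, radius = range_
    return [rng.uniform(center - radius, center + radius) for _ in range(n)]


near = ramia((3.2, 0.5), uniform_sampler, toy_loss, neg_loss_score, 20, True, 0.5)
far = ramia((8.0, 0.5), uniform_sampler, toy_loss, neg_loss_score, 20, True, 0.5)
print(near > far)  # True: the range around the low-loss region scores higher
```

Separating the sampler from the scoring function mirrors the two-component view of the framework: either part can be swapped without touching the aggregation step.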
5 Experiments
Since the purpose of this paper is to introduce a new concept and framework, the experiments are intended as a proof of concept. We experiment on the commonly used Purchase-100 [39], CelebA [28], CIFAR-10 [25], and AG News [48] datasets. Details of how we split the datasets, train models, construct ranges, and obtain samples are given in Appendix B. We compare RaMIA with MIA (Sec 5.1) in scenarios where MIA underperforms (depicted in Figure 2). Both attacks are built upon the state-of-the-art attack algorithm, robust membership inference attack (RMIA) [47], with three reference models trained in the same way as in Carlini et al. [5] and Zarifzadeh et al. [47]. The respective queries are outlined in Table 1, and the definitions of members under each attack framework are given in Table 2. In both tables, $x$ denotes original data in the datasets, while $x'$ denotes either data with missing values or data modified from $x$. We do not test our attacks with ranges centered at original data because, without sufficient prior knowledge, the chance of the attack data being exactly the same as the training data is extremely low; it is more realistic that similar data are queried.
Table 1: Range and point queries used for each dataset.

Dataset | Range query | Point query |
---|---|---|
Purchase-100 | possible data records given the incomplete data $x'$ | mode-imputed $x'$ |
CelebA | photos featuring the same person as photo $x'$ | photo $x'$ |
CIFAR-10 | transformed versions of image $x'$ | image $x'$ |
AG News | sentences within Hamming distance 8 of sentence $x'$ | sentence $x'$ |

Table 2: Definition of members under each attack framework.

Dataset | Range member if there is at least | (Point) member if |
---|---|---|
Purchase-100 | one training point that matches $x'$ on all unmasked columns | $x'$ is a member |
CelebA | one training image featuring the same person as $x'$ | $x'$ is a member |
CIFAR-10 | one version of image $x'$ in the training set | $x'$ is a member |
AG News | one training sentence within Hamming distance 8 of $x'$ | $x'$ is a member |
On Purchase-100, we take 20 samples in every range, and set . On CIFAR-10, we apply up to 15 distinct transforms, and set . On AG News, we construct 50 sentences within each range, and set . On CelebA, each celebrity has a different number of images in the sampling space, ranging from 1 to 18. Since it is hard to standardize the sample size for all ranges, we take all of them. We then set and , which means we are not trimming anything for ranges with very few samples available.
5.1 RaMIAs quantify privacy risks more comprehensively than MIAs
As explained above, data points that are close enough to the training data are outside the scope of membership inference attacks. Figure 3 shows that range membership inference attacks are better at identifying these nearby points, and thus provide more comprehensive privacy auditing on all four datasets we tested. We emphasize that the gain is remarkable considering how few samples were taken relative to the range sizes. On Purchase-100, there are 1024 candidates in total, and we take less than 20% of them. On AG News, there are millions of sentences within a Hamming distance of 8, and 50 sentences are far too few to meaningfully cover this space. Yet these limited samples lead to noticeable gains, which further shows that the current privacy quantification approach is flawed and needs a better framework. Due to randomness in sampling, we report the average gain of RaMIA over MIA with standard deviations in Table 4. TPRs at small FPRs are reported in Table 3.
5.2 Factors affecting RaMIA performance
Training data density in the range
Due to the nature of the sampling-based approach, the expected number of true training points in our attack set scales linearly with the density of training points in the range, and the chance that the attack set contains at least one of them grows with that density. If we keep the sample size constant, enlarging the range without including more training points hurts attack performance, because the chance of the attack set including any training point is diluted. Conversely, increasing the training point density, which is equivalent to increasing the probability that the attacker samples a true training point, boosts attack performance. Figure 4(b) shows that the performance of RaMIA increases as the range becomes larger in the CIFAR-10 experiment. Recall that the range function in CIFAR-10 is based on image augmentation methods. Enlarging the range means the attacker applies more distinct augmentation methods to obtain transformed images. This increases the chance that the attacker obtains one of the transformed versions of training images seen by the model during training, leading to better attack performance. In Figure 3(b), we conducted the attack assuming the attacker cannot sample any true training images. As a sanity check, we relax this assumption: Figure 4(a) shows that RaMIA performs monotonically better as the density of training images increases from 0% to 50%, with the number of samples held constant.
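A back-of-the-envelope Python sketch of the density argument, under the simplifying assumption that the attacker draws $n$ independent samples and each one is a training point with probability equal to the density. The expected number of hits is then linear in the density, while the probability of at least one hit is $1 - (1 - \text{density})^n$. The CIFAR-10 setup applies distinct transforms rather than i.i.d. draws, so this is only an idealization; the function names are hypothetical.

```python
def p_at_least_one_hit(density, n):
    """P(at least one of n i.i.d. samples is a training point), where `density`
    is the probability that a single sample from the range is a training point."""
    return 1.0 - (1.0 - density) ** n


def expected_hits(density, n):
    """Expected number of training points in the attack set: linear in density."""
    return density * n


for density in (0.0, 0.05, 0.1, 0.25, 0.5):
    print(f"density={density:.2f}  E[hits]={expected_hits(density, 15):.2f}  "
          f"P(hit)={p_at_least_one_hit(density, 15):.3f}")
```

Enlarging the range while the number of training points inside it stays fixed lowers the density, and with it both quantities, which is the dilution effect described above.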
Susceptibility to MIAs and RaMIAs is correlated
Ranges containing training points that are susceptible to MIAs are also more susceptible to RaMIAs. Researchers have previously found that machine learning models memorize duplicated data more [26, 6]. In our CelebA dataset, each celebrity has a different number of photos in the training set, which can be thought of as each identity having a different level of duplication. Consistent with insights from MIAs, we observe that identities with more training images, i.e., a higher duplication rate, are more susceptible to RaMIA. Figure 8 shows the relationship between the duplication rate and the percentile of each range's RaMIA score among non-members' RaMIA scores. Generally speaking, identities with more training photos are more prone to RaMIAs. A similar correlation can be observed on the other three datasets in our experiments, where the training points' RaMIA score percentiles among non-members are positively correlated with their MIA score percentiles (Figure 7).
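The percentile metric described above can be sketched in Python: each range's RaMIA score is ranked against the scores of out-ranges, and the resulting percentiles are then correlated with the duplication rate. The choice of Pearson correlation, the helper names, and all numbers are hypothetical illustrations, not the paper's analysis code.

```python
import math
from bisect import bisect_left


def percentile_among_nonmembers(score, nonmember_scores):
    """Percentage of non-member (out-range) scores strictly below `score`."""
    ranked = sorted(nonmember_scores)
    return 100.0 * bisect_left(ranked, score) / len(ranked)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical numbers: out-range RaMIA scores, plus the number of training
# photos and the range RaMIA score for four identities.
nonmember_scores = [0.1, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8]
duplication = [1, 3, 6, 10]
identity_scores = [0.35, 0.48, 0.65, 0.9]
percentiles = [percentile_among_nonmembers(s, nonmember_scores) for s in identity_scores]
print(percentiles, pearson(duplication, percentiles))
```

Ranking against non-member scores makes the metric comparable across ranges whose raw scores live on different scales.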
5.3 Mismatched training and attack data hurts attack performance
Figure 2(c) shows that MIA underestimates the privacy risk when the augmentations used in training and in attacking differ. This is concerning because many people audit the privacy risk of image classifiers with original images, even though the classifiers are often trained with a composition of augmentations. Many transformations, such as color jittering and affine transformations, always produce final images that differ from the originals. Other commonly used augmentation methods, such as random cropping, introduce additional randomness into the pipeline. Hence, it is almost certain that the model never sees the original images, and RaMIA should be used for a better auditing result (Figure 3(c)).
Difference to existing augmentation-based MIAs
Existing attacks [4, 47] also use augmented queries, but with a different rationale and different assumptions about the attacker's knowledge. Since they adopt the existing privacy notion based on point queries, only the (augmented) images seen by the model during training are considered members. Hence, the attacker needs to know the exact train-time augmentations and augment images accordingly to remain consistent with that privacy notion. In RaMIA, the set of augmentations is given by the challenger (Def 2) and can contain augmentations that were not used in training but are still considered privacy-leaking. Using the aggregation method in [4] will hurt attack performance if non-training augmentations are used, whereas RaMIA is designed to be robust in this scenario (Figure 3(c)).
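A toy Python sketch of a challenger-specified, transformation-based range, using 2-D lists as stand-in images. The transform set is hypothetical and may include transforms the model never saw during training; under the point-based notion the original image is a non-member, while the range view flags it because one transformed version was trained on.

```python
def identity(img):
    return [row[:] for row in img]


def hflip(img):
    return [row[::-1] for row in img]


def vflip(img):
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate a 2-D list clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


# Hypothetical challenger-chosen range function: a fixed set of transforms,
# which need not match the augmentations used at training time.
TRANSFORMS = (identity, hflip, vflip, rot90)


def transformation_range(img):
    """All transformed versions of `img` that make up its range."""
    return [t(img) for t in TRANSFORMS]


def _key(img):
    """Hashable representation of a 2-D list."""
    return tuple(tuple(row) for row in img)


def is_range_member(img, training_images):
    """Range-level ground truth: some version of `img` is in the training set."""
    train_keys = {_key(t) for t in training_images}
    return any(_key(v) in train_keys for v in transformation_range(img))


# Toy example: the model was trained only on the flipped image.
query = [[1, 2], [3, 4]]
training_images = [hflip(query)]
print(query in training_images)                 # False (exact-match view)
print(is_range_member(query, training_images))  # True (range view)
```

Because the transform set comes from the challenger rather than from the training pipeline, the attacker does not need to know the train-time augmentations to pose such a range query.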
6 Conclusion
In this paper, we argue that membership inference attacks are only useful as a privacy auditing tool when querying exact copies of training and test data. Moving the query to similar points causes a drastic drop in performance, rendering MIAs less useful. We conclude that MIAs fail to comprehensively capture the notion of privacy, and thus propose a new class of inference attacks, RaMIA, which extends the notion of MIAs and covers their failure cases by checking whether a given range contains a training point. We introduce RaMIA as an attack framework that can be implemented with any existing MIA algorithm, and show that it provides better privacy auditing with very few randomly drawn samples. We hope our work makes more privacy researchers and practitioners aware of the shortcomings of MIAs and shifts their attention to RaMIAs. As this is the first paper to propose this framework, there is room for improvement in specific attack algorithms; for example, a better sampling process should further widen the gap between RaMIA and MIA. Nevertheless, we have shown that our framework is sensible and useful. In future work, we hope to design more powerful RaMIA strategies that are robust to changes in membership signals and datasets, especially for LLMs, where we believe our privacy notion is highly relevant.
References
- [1] G. Ateniese, L. V. Mancini, A. Spognardi, A. Villani, D. Vitali, and G. Felici. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. International Journal of Security and Networks, 10(3):137–150, 2015.
- [2] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
- [3] G. Brown, M. Bun, V. Feldman, A. Smith, and K. Talwar. When is memorization of irrelevant training data necessary for high-accuracy learning? In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 123–132, 2021.
- [4] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021.
- [5] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914. IEEE, 2022.
- [6] N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramer, and C. Zhang. Quantifying memorization across neural language models. arXiv preprint arXiv:2202.07646, 2022.
- [7] N. Carlini, J. Hayes, M. Nasr, M. Jagielski, V. Sehwag, F. Tramer, B. Balle, D. Ippolito, and E. Wallace. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5253–5270, 2023.
- [8] M. Chase, E. Ghosh, and S. Mahloujifar. Property inference from poisoning. arXiv preprint arXiv:2101.11073, 2021.
- [9] H. Chaudhari, J. Abascal, A. Oprea, M. Jagielski, F. Tramer, and J. Ullman. SNAP: Efficient extraction of private properties with poisoning. In 2023 IEEE Symposium on Security and Privacy (SP), pages 400–417. IEEE, 2023.
- [10] G. Chen, Y. Zhang, and F. Song. SLMIA-SR: Speaker-level membership inference attacks against speaker recognition systems. arXiv preprint arXiv:2309.07983, 2023.
- [11] M. Chen, Z. Zhang, T. Wang, M. Backes, and Y. Zhang. FACE-AUDITOR: Data auditing in facial recognition systems. In 32nd USENIX Security Symposium (USENIX Security 23), pages 7195–7212, 2023.
- [12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- [13] M. Duan, A. Suri, N. Mireshghallah, S. Min, W. Shi, L. Zettlemoyer, Y. Tsvetkov, Y. Choi, D. Evans, and H. Hajishirzi. Do membership inference attacks work on large language models? arXiv preprint arXiv:2402.07841, 2024.
- [14] V. Feldman. Does learning require memorization? A short tale about a long tail. arXiv preprint arXiv:1906.05271, 2019.
- [15] V. Feldman and C. Zhang. What neural networks memorize and why: Discovering the long tail via influence estimation. Advances in Neural Information Processing Systems, 33:2881–2891, 2020.
- [16] I. Garg and K. Roy. Memorization through the lens of curvature of loss function around samples. arXiv preprint arXiv:2307.05831, 2023.
- [17] B. Hilprecht, M. Härterich, and D. Bernau. Monte Carlo and reconstruction membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies, 2019.
- [18] M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd. spaCy: Industrial-strength Natural Language Processing in Python. 2020. doi: 10.5281/zenodo.1212303.
- [19] E. J. Hu, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2021.
- [20] B. Jayaraman and D. Evans. Are attribute inference attacks just imputation? In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 1569–1582, 2022.
- [21] B. Jayaraman, L. Wang, K. Knipmeyer, Q. Gu, and D. Evans. Revisiting membership inference under realistic assumptions. Proceedings on Privacy Enhancing Technologies, 2021(2), 2021.
- [22] H. Jeffreys. Theory of Probability. 1939.
- [23] N. Kandpal, K. Pillutla, A. Oprea, P. Kairouz, C. A. Choquette-Choo, and Z. Xu. User inference attacks on large language models. arXiv preprint arXiv:2310.09266, 2023.
- [24] Y. I. Kim, P. Agrawal, J. O. Royset, and R. Khanna. On memorization and privacy risks of sharpness aware minimization. arXiv preprint arXiv:2310.00488, 2023.
- [25] A. Krizhevsky et al. Learning multiple layers of features from tiny images. 2009.
- [26] K. Lee, D. Ippolito, A. Nystrom, C. Zhang, D. Eck, C. Callison-Burch, and N. Carlini. Deduplicating training data makes language models better. arXiv preprint arXiv:2107.06499, 2021.
- [27] F. Liu, T. Lin, and M. Jaggi. Understanding memorization from the perspective of optimization via efficient influence estimation. arXiv preprint arXiv:2112.08798, 2021.
- [28] Z. Liu, P. Luo, X. Wang, and X. Tang. Large-scale CelebFaces Attributes (CelebA) dataset. Retrieved August, 15(2018):11, 2018.
- [29] Y. Long, Z. Ying, H. Yan, R. Fang, X. Li, Y. Wang, and Z. Pan. Membership reconstruction attack in deep neural networks. Information Sciences, 634:27–41, 2023.
- [30] M. Lukasik, V. Nagarajan, A. S. Rawat, A. K. Menon, and S. Kumar. What do larger image classifiers memorise? arXiv preprint arXiv:2310.05337, 2023.
- [31] S. Mahloujifar, H. A. Inan, M. Chase, E. Ghosh, and M. Hasegawa. Membership inference on word embedding and beyond. arXiv preprint arXiv:2106.11384, 2021.
- [32] S. Mangrulkar, S. Gugger, L. Debut, Y. Belkada, S. Paul, and B. Bossan. PEFT: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft, 2022.
- [33] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
- [34] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- [35] A. Sablayrolles, M. Douze, C. Schmid, Y. Ollivier, and H. Jégou. White-box vs black-box: Bayes optimal strategies for membership inference. In Proceedings of the 36th International Conference on Machine Learning (ICML'19), pages 5558–5567, 2019.
- [36] A. Salem, A. Bhattacharya, M. Backes, M. Fritz, and Y. Zhang. Updates-Leak: Data set inference and reconstruction attacks in online learning. In 29th USENIX Security Symposium (USENIX Security 20), pages 1291–1308, 2020.
- [37] P. Samarati and L. Sweeney. Generalizing data to provide anonymity when disclosing information. In PODS, volume 98, pages 10–1145, 1998.
- [38] S. Sankararaman, G. Obozinski, M. I. Jordan, and E. Halperin. Genomic privacy and limits of individual detection in a pool. Nature Genetics, 41(9):965–967, 2009.
- [39] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (S&P'17), 2017.
- [40] A. Suri and D. Evans. Formalizing and estimating distribution inference risks. Proceedings on Privacy Enhancing Technologies, 2022.
- [41] K. Tirumala, A. Markosyan, L. Zettlemoyer, and A. Aghajanyan. Memorization without overfitting: Analyzing the training dynamics of large language models. Advances in Neural Information Processing Systems, 35:38274–38290, 2022.
- [42] H. Van Trees. Detection, estimation, and modulation theory. Part 1: Detection, estimation, and linear modulation theory. 1968.
- [43] Y. Wu, N. Yu, Z. Li, M. Backes, and Y. Zhang. Membership inference attacks against text-to-image generation models. 2022.
- [44] J. Ye, A. Maddi, S. K. Murakonda, V. Bindschaedler, and R. Shokri. Enhanced membership inference attacks against machine learning models. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 3093–3106, 2022.
- [45] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282. IEEE, 2018.
- [46] S. Zagoruyko and N. Komodakis. Wide residual networks. In British Machine Vision Conference 2016. British Machine Vision Association, 2016.
- [47] S. Zarifzadeh, P. C.-J. M. Liu, and R. Shokri. Low-cost high-power membership inference by boosting relativity. 2023.
- [48] X. Zhang, J. Zhao, and Y. LeCun. Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems, 28, 2015.
Appendix A Attack Details
A.1 (Simple) Hypothesis testing
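The standard way to tackle the inference game (Def 1) is to apply statistical hypothesis tests [44, 5] to a query point $x$ and a target model $\theta$ trained on a dataset $D$:

$$H_0: x \notin D \qquad \text{vs.} \qquad H_1: x \in D.$$

The likelihood ratio test (LRT) is then conducted, rejecting $H_0$ (i.e., predicting "member") when

$$\mathrm{LR}(x) = \frac{\Pr(\theta \mid x \in D)}{\Pr(\theta \mid x \notin D)} \geq \gamma \tag{2}$$

for a threshold $\gamma$. This is usually called "simple" hypothesis testing because each hypothesis concerns a single, fixed data point $x$.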
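As an illustration of the simple test in (2), the following Python sketch assumes, as in parametric attacks such as [5], that a scalar membership signal of the query (e.g., a loss or logit value) follows a Gaussian distribution under each hypothesis, with the IN and OUT Gaussians fitted from reference models trained with and without the query. The signal values, the threshold gamma, and all function names are hypothetical; this is not the exact procedure of any cited attack.

```python
from statistics import NormalDist, mean, stdev


def fit_gaussian(samples):
    """Fit a normal distribution to signal values collected from reference models."""
    return NormalDist(mean(samples), stdev(samples))


def likelihood_ratio(observed, in_signals, out_signals):
    """LR = p(signal | H1: x in D) / p(signal | H0: x not in D), using Gaussian fits."""
    p_in = fit_gaussian(in_signals).pdf(observed)
    p_out = fit_gaussian(out_signals).pdf(observed)
    return p_in / p_out


def simple_lrt(observed, in_signals, out_signals, gamma=1.0):
    """Reject H0 (predict 'member') when the likelihood ratio is at least gamma."""
    return likelihood_ratio(observed, in_signals, out_signals) >= gamma


# Hypothetical signals of x on reference models trained with (IN) and without (OUT) x.
in_signals = [2.0, 2.2, 1.8, 2.1]
out_signals = [0.0, 0.3, -0.2, 0.1]
print(simple_lrt(1.9, in_signals, out_signals))  # signal close to the IN fit
print(simple_lrt(0.0, in_signals, out_signals))  # signal close to the OUT fit
```

Raising gamma trades true positives for fewer false positives; sweeping it traces the attack's ROC curve.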
A.2 Attack algorithms
In this section, we explain the details of the membership inference attack algorithms used in our experiments.
LOSS
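LOSS [45] uses the loss $\ell(\theta; x)$ of the target model $\theta$ on a query point $x$ as a proxy for its membership score: the lower the loss, the more likely $x$ is a member. An easy way to turn the loss into a likelihood is to exponentiate its negative, $\Pr(x \mid \theta) = \exp(-\ell(\theta; x))$; for the cross-entropy loss, this is exactly the model's predicted probability of the true label.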
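A minimal Python sketch of the LOSS signal for a classifier, assuming a cross-entropy loss computed from raw logits; the threshold and the function names are hypothetical. It also checks numerically that the exponential of the negative loss equals the softmax probability of the true label.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy loss of one example from raw logits, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def loss_likelihood(logits, label):
    """Likelihood proxy exp(-loss); for cross-entropy this is softmax(logits)[label]."""
    return math.exp(-cross_entropy(logits, label))


def loss_attack(logits, label, threshold):
    """Predict 'member' when the loss falls below a (hypothetical) threshold."""
    return cross_entropy(logits, label) < threshold


# Confident, correct prediction: low loss, so it is flagged as a likely member.
print(loss_attack([5.0, 0.0, 0.0], 0, threshold=0.1))
# exp(-loss) matches the softmax probability of the true label.
print(loss_likelihood([2.0, 0.0, 0.0], 0), math.exp(2.0) / (math.exp(2.0) + 2.0))
```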
RMIA
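RMIA [47] compares the query $x$ with population points $z$ through a pairwise likelihood ratio obtained by applying Bayes' rule:

$$\mathrm{LR}_\theta(x, z) = \frac{\Pr(\theta \mid x)}{\Pr(\theta \mid z)} = \left(\frac{\Pr(x \mid \theta)}{\Pr(x)}\right)\left(\frac{\Pr(z \mid \theta)}{\Pr(z)}\right)^{-1},$$

where the $\Pr(\theta)$ terms cancel out. The score is then the fraction of population points dominated by the given point: $\mathrm{Score}(x; \theta) = \Pr_{z \sim \pi}\big(\mathrm{LR}_\theta(x, z) \geq \gamma\big)$. The normalizing constant is computed with reference models $\theta'$: $\Pr(x) = \frac{1}{2}\big(\Pr(x)_{\mathrm{IN}} + \Pr(x)_{\mathrm{OUT}}\big)$, where $\Pr(x)_{\mathrm{IN}}$ and $\Pr(x)_{\mathrm{OUT}}$ average $\Pr(x \mid \theta')$ over reference models trained with and without $x$, respectively. In its offline version, the IN models are unavailable, so the former is approximated by a linear function of the latter, $\Pr(x)_{\mathrm{IN}} \approx a \Pr(x)_{\mathrm{OUT}} + (1 - a)$, which gives $\Pr(x) \approx \frac{1}{2}\big((1 + a)\Pr(x)_{\mathrm{OUT}} + (1 - a)\big)$. The hyperparameter $a$ is chosen using the reference models: one reference model serves as a temporary target model, the rest are used to attack it, and $a$ is set to the best-performing value found by a simple sweep. In our experiments, we use only the offline attack. The values of $a$ for Purchase-100 and CIFAR-10 are taken from [47]; for CelebA we set $a = 0.33$, and for AG News we set $a = 1.0$.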
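The offline RMIA score can be sketched in Python as follows, assuming the per-model probabilities (e.g., softmax probabilities of the true label) have already been computed for the target model and the OUT reference models. Following [47], the sketch uses the offline approximation Pr(x) ≈ ((1 + a) Pr(x)_OUT + (1 − a)) / 2 with hyperparameter a. The input layout and function names are hypothetical, and the full method in [47] has further details not reproduced here.

```python
def offline_pr(pr_out_refs, a):
    """Offline estimate of Pr(x): the IN probability is approximated as
    a * Pr(x)_OUT + (1 - a) and averaged with Pr(x)_OUT, which gives
    ((1 + a) * Pr(x)_OUT + (1 - a)) / 2."""
    pr_out = sum(pr_out_refs) / len(pr_out_refs)
    return 0.5 * ((1 + a) * pr_out + (1 - a))


def rmia_score(pr_x_target, pr_x_refs, population, a, gamma=1.0):
    """Fraction of population points z with LR(x, z) >= gamma.

    pr_x_target: Pr(x | theta) under the target model.
    pr_x_refs:   Pr(x | theta') under each OUT reference model.
    population:  list of (Pr(z | theta), [Pr(z | theta') for each OUT reference model]).
    """
    ratio_x = pr_x_target / offline_pr(pr_x_refs, a)
    dominated = 0
    for pr_z_target, pr_z_refs in population:
        ratio_z = pr_z_target / offline_pr(pr_z_refs, a)
        # LR(x, z) = (Pr(x|theta) / Pr(x)) / (Pr(z|theta) / Pr(z))
        if ratio_x / ratio_z >= gamma:
            dominated += 1
    return dominated / len(population)


# Hypothetical probabilities; with a = 1.0, Pr(x) reduces to the mean OUT probability.
population = [(0.5, [0.5, 0.5]), (0.8, [0.2, 0.2])]
print(rmia_score(0.9, [0.3, 0.3], population, a=1.0))  # x dominates 1 of 2 points
```

Because the score counts dominated population points, it always lies in [0, 1] and can be thresholded directly.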
Appendix B Setup Details
On each dataset, we train four models, each on half of the dataset, in the same way as described by Carlini et al. [5] and Zarifzadeh et al. [47]. The datasets are described below. We have checked their licenses to the best of our ability and confirm that their terms of use are respected.
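One simple way to give each model a random half of the dataset is sketched below in Python; it draws an independent random half for every model. This is an assumption for illustration, not necessarily the exact splitting scheme of [5, 47], and the function name and seed are hypothetical.

```python
import random


def make_split_masks(num_points, num_models, seed=0):
    """Return one boolean mask per model; masks[m][i] is True iff point i is in
    the training half of model m. Each model gets an independent random half."""
    rng = random.Random(seed)
    masks = []
    for _ in range(num_models):
        chosen = set(rng.sample(range(num_points), num_points // 2))
        masks.append([i in chosen for i in range(num_points)])
    return masks


masks = make_split_masks(num_points=10, num_models=4)
# IN/OUT status of point 0 for each model, e.g. for selecting OUT reference models.
print([m[0] for m in masks])
```

With independent halves, each point is a member of about half of the models, so OUT reference models are typically available for any query.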
B.1 Tabular data: Purchase-100
Dataset
Purchase-100 [39] is a tabular dataset derived from Kaggle's Acquire Valued Shoppers Challenge (https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data). It was first curated by Shokri et al. [39] into records with 600 binary features: each row represents a person, and each feature indicates whether that person has purchased a given product. The records are divided into 100 classes, and the task is to predict a person's class from their purchase history.
Models
We train five-layer multi-layer perceptron (MLP) models in PyTorch [33] on half of the entire dataset. The hidden layers are of sizes . All models achieve a test accuracy of .
Construction of ranges
We simulate the scenario where the attacker has incomplete data (data with missing values). For all training and test records, we randomly mask k columns. Each row with masked columns is a range query containing $2^k$ possible points, since each feature is binary. We then check whether each range constructed from a test record includes any training point; if so, the range is relabeled as an "in-range".
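The range construction above can be sketched in Python as follows, representing a range as a record with None in its masked columns. The representation and function names are hypothetical; the sketch only mirrors the masking and relabeling steps described in the text.

```python
import random


def make_range(record, k, rng):
    """Mask k random columns of a binary record; None marks a missing value."""
    masked = set(rng.sample(range(len(record)), k))
    return [None if j in masked else v for j, v in enumerate(record)]


def range_size(query):
    """Number of binary records in the range: 2 ** (number of masked columns)."""
    return 2 ** sum(v is None for v in query)


def contains(query, point):
    """A point lies in the range iff it matches every observed (unmasked) column."""
    return all(q is None or q == p for q, p in zip(query, point))


def is_in_range(query, train_set):
    """Relabel the range as an in-range if it contains any training record."""
    return any(contains(query, t) for t in train_set)


query = make_range([1, 0, 1, 1, 0], k=2, rng=random.Random(0))
print(query, range_size(query))
```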
Sampling within ranges
Since this dataset consists of 600 binary features that we treat as independent, we sample each missing column independently from a Bernoulli distribution whose parameter is the average value of that column. Given this structure, the sampled records can be regarded as in-distribution. For each range, we draw 19 samples and add the record obtained by mode imputation (filling in each missing value with its column mode), for 20 samples per range.
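A Python sketch of this sampler, assuming the Bernoulli parameters are column means computed on data available to the attacker and that a tied mode is broken toward 1; both choices, and the function names, are assumptions for illustration.

```python
import random


def column_means(data):
    """Fraction of 1s in each column; used as the Bernoulli parameter."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def sample_range(query, means, num_samples, rng):
    """Fill each missing (None) entry independently with a Bernoulli(mean) draw,
    then append the mode-imputed record, so num_samples + 1 records are returned."""
    samples = [
        [int(rng.random() < means[j]) if v is None else v for j, v in enumerate(query)]
        for _ in range(num_samples)
    ]
    # Mode imputation for a binary column: 1 if its mean is at least 0.5 (tie -> 1).
    mode_imputed = [int(means[j] >= 0.5) if v is None else v for j, v in enumerate(query)]
    samples.append(mode_imputed)
    return samples


means = column_means([[1, 0, 1], [1, 1, 0], [1, 0, 0]])
samples = sample_range([None, 1, None], means, num_samples=19, rng=random.Random(0))
print(len(samples), samples[-1])
```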
B.2 Image data I: CelebA
Dataset
CelebA [28], also known as the CelebFaces Attributes dataset, contains 202,599 face images of 10,177 celebrities, each annotated with 40 binary facial attributes. We construct the member set from photos of celebrities whose identity number is smaller than 5090; photos of the remaining celebrities form the non-member set. For each celebrity in the member set, half of the photos go into the training set and the other half into the holdout set.
Model
We train four-layer convolutional neural networks (CNNs) in PyTorch [33] on the training set to predict the facial attributes of any given photo. Our target model has a test accuracy of .
Construction of ranges
The range function here is a semantic one, based on the identity of the person in the face image. For example, a range query can be "all of Alice's photos". Since the identities in the member and non-member sets are disjoint, in- and out-ranges are easy to construct: ranges defined by member identities are in-ranges, and ranges defined by non-member identities are out-ranges.
Sampling within ranges
For each range query, we curate all images in the holdout set that share the same identity as the range center to construct our sample set.
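The identity-based split, range labeling, and sample collection can be sketched in Python as below, with each photo represented as an (image_id, identity) pair. The data layout, seed, and function names are hypothetical; only the 5090 identity cutoff and the half/half split come from the description above.

```python
import random
from collections import defaultdict


def split_by_identity(photos, cutoff=5090, seed=0):
    """Identities below `cutoff` are members: their photos are split in half
    between `train` and `holdout`. All photos of other identities are non-members."""
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for img, ident in photos:
        by_id[ident].append(img)
    train, holdout, nonmembers = [], [], []
    for ident, imgs in by_id.items():
        if ident < cutoff:
            imgs = imgs[:]
            rng.shuffle(imgs)
            half = len(imgs) // 2
            train += [(img, ident) for img in imgs[:half]]
            holdout += [(img, ident) for img in imgs[half:]]
        else:
            nonmembers += [(img, ident) for img in imgs]
    return train, holdout, nonmembers


def range_label(identity, train):
    """The range 'all photos of identity' is an in-range iff a training photo has it."""
    return any(ident == identity for _, ident in train)


def range_samples(identity, pool):
    """All images in `pool` (e.g., the holdout set) sharing the range's identity."""
    return [img for img, ident in pool if ident == identity]


photos = [(f"a{i}", 1) for i in range(4)] + [(f"b{i}", 6000) for i in range(3)]
train, holdout, nonmembers = split_by_identity(photos)
print(range_label(1, train), range_label(6000, train))  # in-range, out-range
print(range_samples(1, holdout))
```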
B.3 Image data II: CIFAR-10
Dataset
CIFAR-10 [25] is a popular image classification dataset. Its training set contains 50,000 color images, each of size 32 × 32 pixels.
Models
We train WideResNet-28-2 models [46] with JAX [2] on half of the CIFAR-10 training set, with and without image augmentations, using the code from [5]. Our target model trained without augmentation achieves a test accuracy of on the CIFAR-10 test set, and the target model trained with augmentation achieves a test accuracy of . The training-time augmentation is the composition of random flipping, cropping, and random hue changes.
Construction of ranges
The range function here is defined by image augmentations, i.e., geometric and photometric transformations. An example of a range query is "all transformed versions of image $x$". For each training and test image, a range is constructed by applying different transformations.
Sampling within ranges
For each range query, we independently apply 15 image augmentations to the range center. The augmentations include flipping, random rotation, random resizing and cropping, random contrast, brightness, and hue changes, and compositions of these.
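A simplified Python sketch of sampling within an augmentation range is given below, with an image stored as nested lists (height × width × channels) of values in [0, 1]. It implements only flipping, brightness, contrast, and one composition; rotation, resizing and cropping, and hue changes are omitted, and the parameter ranges and names are assumptions rather than the experimental settings.

```python
import random


def hflip(img):
    """Mirror an H x W x C image (nested lists) left to right."""
    return [row[::-1] for row in img]


def adjust(img, fn):
    """Apply fn to every channel value and clip the result to [0, 1]."""
    return [[[min(1.0, max(0.0, fn(c))) for c in px] for px in row] for row in img]


def random_brightness(img, rng, max_delta=0.2):
    """Shift all values by one random offset."""
    d = rng.uniform(-max_delta, max_delta)
    return adjust(img, lambda c: c + d)


def random_contrast(img, rng, lo=0.8, hi=1.2):
    """Scale values around the image mean by one random factor."""
    vals = [c for row in img for px in row for c in px]
    m = sum(vals) / len(vals)
    f = rng.uniform(lo, hi)
    return adjust(img, lambda c: m + f * (c - m))


TRANSFORMS = [
    lambda img, rng: hflip(img),
    random_brightness,
    random_contrast,
    # A composition of the transformations above.
    lambda img, rng: random_contrast(random_brightness(hflip(img), rng), rng),
]


def sample_augmented_range(center, num_samples=15, seed=0):
    """Draw num_samples independent random augmentations of the range center."""
    rng = random.Random(seed)
    return [rng.choice(TRANSFORMS)(center, rng) for _ in range(num_samples)]


img = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]  # a toy 1 x 2 RGB image
print(len(sample_augmented_range(img)))
```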
B.4 Textual data: AG News
Dataset
We use the AG News dataset [48], a collection of news articles in four categories. Although it was introduced as a text classification dataset, we disregard the labels and treat it as a text generation dataset. Its training set contains 120,000 sentences.
Model
We take pretrained GPT-2 [34] models from Hugging Face's transformers library and fine-tune them for 4 epochs on half of the AG News training set with LoRA [19], as implemented in Hugging Face's PEFT [32] library. Our target model achieves a perplexity of 1.39 on the AG News test set.
Construction of ranges
The range function here is based on word-level Hamming distance, which can be thought of as a word-level edit distance that allows only word substitutions. An example of a range query is "all sentences within Hamming distance $d$ of sentence $s$". To construct in- and out-ranges, it suffices to specify the maximum Hamming distance and the starting sentence. We construct the starting sentences by randomly masking words in the training and test sentences and then filling in the masks with a pretrained BERT [12] model, so that each starting sentence differs from its original training or test sentence in at most as many words as were masked. A Hamming distance is then specified for each starting sentence to form a range.
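Word-level Hamming distance and the range check can be sketched in Python as follows, where d is the maximum distance of the range. The sketch assumes whitespace tokenization and treats sentences with different word counts as outside each other's ranges, since substitution-only edits cannot change the length; the function names are hypothetical.

```python
def word_hamming(s1, s2):
    """Number of word positions at which two sentences differ (substitutions only).
    Sentences with different word counts are at infinite distance."""
    w1, w2 = s1.split(), s2.split()
    if len(w1) != len(w2):
        return float("inf")
    return sum(a != b for a, b in zip(w1, w2))


def in_hamming_range(center, d, sentence):
    """True if `sentence` lies in the range 'all sentences within distance d of center'."""
    return word_hamming(center, sentence) <= d


def range_contains_training_sentence(center, d, train_sentences):
    """The range is an in-range if any training sentence falls inside it."""
    return any(in_hamming_range(center, d, s) for s in train_sentences)


print(word_hamming("the cat sat", "the dog sat"))
print(range_contains_training_sentence("the dog sat", 1, ["the cat sat"]))
```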
Sampling within ranges
We mask $d$ words of the range center, where $d$ is the Hamming distance specified by the range. Then we use BERT [12] to replace each mask with one of its top predictions.
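A Python sketch of this mask-and-fill sampler follows. The fill_mask argument is a hypothetical stand-in for a masked language model such as BERT that returns candidate words for a masked position; the toy filler exists only so the sketch runs without downloading a model. Because only substitutions are made, every sample stays within word-level Hamming distance d of the range center.

```python
import random


def sample_in_hamming_range(center, d, fill_mask, num_samples, seed=0):
    """Mask d random word positions of the range center, then replace each mask
    with one candidate from fill_mask(words, position)."""
    rng = random.Random(seed)
    words = center.split()
    samples = []
    for _ in range(num_samples):
        new = words[:]
        positions = rng.sample(range(len(words)), min(d, len(words)))
        for pos in positions:
            new[pos] = "[MASK]"
        for pos in positions:
            new[pos] = rng.choice(fill_mask(new, pos))
        samples.append(" ".join(new))
    return samples


def toy_fill_mask(words, position):
    """Hypothetical stand-in for a masked language model's top predictions."""
    return ["market", "report", "deal"]


center = "stocks rose after the earnings report"
for s in sample_in_hamming_range(center, 2, toy_fill_mask, 3):
    print(s)
```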
Appendix C Implementation details
For all PyTorch models, we use the Adam optimizer with a learning rate of 0.001. For WideResNets, we use the training code from [5]. On AG News, the models are trained for 4 epochs; on the other datasets, they are trained for 100 epochs.
All training is done on two NVIDIA RTX 3090 GPUs. Training on AG News takes about 1 hour per epoch, and training each of the other models takes less than one hour.
Appendix D RaMIA on redacted data
Many large language models (LLMs) are trained on sensitive textual data, and versions of such data with the sensitive information redacted may be publicly available. Similar to our experiment on data with missing values, we can apply RaMIA to redacted data to identify which redacted records were used to train a target LLM. Accurately identifying these records paves the way for reconstructing them in a follow-up attack. Figure 5 shows the results. In this experiment, we use spaCy [18] to mask people's names, simulating the redaction of personally identifiable information (PII). We then generate 10 candidate sentences for each masked sentence using BERT and conduct RaMIA. The MIA performance is the average attack performance over all 10 candidate sentences.
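The MIA baseline described above (evaluating each candidate reconstruction separately and averaging the results over the candidates) can be sketched in Python with TPR at a fixed FPR as the performance metric, the metric used in Appendix E. The threshold rule, score layout, and function names are assumptions, and the sketch does not implement RaMIA's own range score.

```python
import math


def tpr_at_fpr(scores, labels, target_fpr):
    """TPR of the rule 'member iff score > t', where t is the smallest negative
    score that keeps the false-positive rate at or below target_fpr."""
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    k = math.floor(target_fpr * len(neg))  # number of false positives allowed
    t = neg[k] if k < len(neg) else -math.inf
    return sum(s > t for s in pos) / len(pos)


def averaged_mia_performance(candidate_scores, labels, target_fpr):
    """candidate_scores[c][i]: MIA score of candidate c for redacted sentence i.
    Each candidate is evaluated separately and the TPRs are averaged."""
    tprs = [tpr_at_fpr(scores, labels, target_fpr) for scores in candidate_scores]
    return sum(tprs) / len(tprs)


labels = [1, 1, 0, 0, 0, 0]  # hypothetical: 1 = redacted sentence was a training member
candidate_scores = [
    [0.9, 0.8, 0.1, 0.2, 0.3, 0.95],
    [0.1, 0.1, 0.5, 0.5, 0.5, 0.5],
]
print(averaged_mia_performance(candidate_scores, labels, target_fpr=0.25))  # 0.5
```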
Appendix E Extra results
This section presents additional experimental results.
Purchase-100 CIFAR-10 CelebA AG News TPR@FPR(%) 1% 0.1% 1% 0.1% 1% 0.1% 1% 0.1% MIA LOSS 0 0 0.15 0 1.86 0.31 RMIA 2.18 0.37 2.40 0.21 1.69 0.19 RaMIA LOSS 1.40 0.28 RMIA 1.44 0.22