Range Membership Inference Attacks (2024)

Jiashu Tao
Department of Computer Science
National University of Singapore
Singapore
jiashut@comp.nus.edu.sg
Reza Shokri
Department of Computer Science
National University of Singapore
Singapore
reza@comp.nus.edu.sg

Abstract

Machine learning models can leak private information about their training data, but the standard methods to measure this risk, based on membership inference attacks (MIAs), have a major limitation. They only check if a given data point exactly matches a training point, neglecting the potential of similar or partially overlapping data revealing the same private information. To address this issue, we introduce the class of range membership inference attacks (RaMIAs), testing if the model was trained on any data in a specified range (defined based on the semantics of privacy). We formulate the RaMIAs game and design a principled statistical test for its complex hypotheses. We show that RaMIAs can capture privacy loss more accurately and comprehensively than MIAs on various types of data, such as tabular, image, and language. RaMIA paves the way for a more comprehensive and meaningful privacy auditing of machine learning algorithms.

1 Introduction

Machine learning models are prone to training data memorization [14, 15, 27, 41, 24]. It is also known that the outstanding predictive performance of machine learning models on long-tailed data distributions often comes at the expense of blatant memorization of certain data points [15, 3, 30, 16]. In simple terms, memorization is the phenomenon that models behave differently on training points than on other points. Memorization can lead to significant privacy risks, as adversaries can infer private information about the training data with only black-box access to models.

To quantify the privacy risk of machine learning models, a privacy notion needs to be fixed first. The reigning privacy notion is defined by membership information. Membership information of a data point is binary, but this single bit of information carries huge privacy implications. Being able to infer membership information opens up the possibility of conducting data reconstruction attacks [36, 17, 4, 29], where the attacker inspects the membership of plausible data points to recover the training set. The de facto way to audit the privacy risk according to this privacy notion is to conduct membership inference attacks (MIAs) [39], where an adversary aims to predict whether a given query data point is part of the training set of the target model. The more powerful the membership inference attack is, the higher the privacy risk the target model bears.

Membership inference attacks provide a lower bound on the true privacy risk of a model, so improving the attack performance also means tightening the bound of the privacy risk estimate. So far, the community has put much effort into improving the power of membership inference attacks by crafting better membership signals and constructing better statistical tests [38, 39, 35, 44, 5, 47]. While these have been useful for the betterment of privacy auditing, they have ignored a fundamental drawback of membership inference attacks as a practical privacy auditing tool: MIAs assume it is a privacy concern only if the adversary can identify the exact, full version of the training data. However, if the adversary can identify data points that are similar enough to the training points, this should also be treated as a significant privacy risk, because those points can contain similar levels of private information. For example, two photos of Alice taken from slightly different angles, or with different backgrounds, would contain similar private information about Alice’s face or location. Similarly, small perturbations or rephrasings of textual data have little effect on the sensitivity of the information conveyed [13]. This oversight means that private information leakage beyond exact matches of training data is ignored, and current privacy auditing tools might produce overly optimistic results.

Besides, focusing on exact membership renders membership inference attacks incapable of handling queries with missing values. This is another major limitation of MIAs, as data records with the same sensitive features and a few missing non-private features carry a similar level of private information as the full data records. Suppose an adversary can infer that an Asian person of age 25 with identification number 123456 is in the hospital training set. There is then no need to identify the rest of the features, because the adversary is already able to pinpoint who has HIV from the given subset of key features. Even if the key identifiers are missing and unknown, it is still a grave privacy threat if the attacker can infer the rest of the features, which often contain quasi-identifiers. This has been studied extensively in the failure of k-anonymity [37], where the attacker is able to reconstruct and identify training data with certain features removed. However, if we make up the missing values or pass noisy data to the membership inference attack, the attack is expected to output "not a member," since the chance of us coming up with the right values is so slim. If we use property inference [40, 1, 8, 9, 45, 20] to help fill in the missing features, the imputed data may have inflated membership scores even if they are non-members, leading to a higher false positive rate. Therefore, existing frameworks struggle to quantify privacy leakage with missing features.

We argue that privacy quantification should not be point-based, because a small neighborhood around training points also contains similar private information. Hence, in this paper, we are proposing a new attack framework called range membership inference attacks (RaMIAs) to better capture the notion of privacy. Instead of using point queries and testing for exact matches, range membership inference attacks use range queries that cover a set of points. The goal of range membership inference attacks is to infer if the given range query contains any training point.

Range membership inference attacks extend the formulation of membership inference attacks. We adapt the original inference game formulation to reflect the change to range queries in RaMIAs. This extended formulation produces composite hypotheses in the likelihood ratio tests, which are the standard and best attack techniques in MIAs [38, 44, 5, 47]. Our method is based on standard statistical methods for composite hypothesis testing, namely generalized likelihood ratio tests and Bayes factors. We show that RaMIAs can provide a more comprehensive notion of privacy by detecting private information leakage from the vicinity of training data when MIAs underestimate such privacy risk. Specifically, we observe that a simple flip can cause the membership score to decrease from very high to 0 (Figure 1), and the overall AUC can drop by 0.20 if we test image classifiers with horizontally flipped images (Figure 2(c)). RaMIA, implemented with our simple attack strategy (Sec 4), outperforms MIA by at least 5% on image datasets (Fig 3(b), 3(c)), providing better privacy auditing at the cost of as few as 15 samples, which is insignificant compared to the dimensionality of the data space.

In this paper, we emphasize the motivation and formulation of our newly proposed attack framework, RaMIA. As a proof of concept, we evaluate RaMIA with a simple attack strategy on tabular, image and text datasets, where RaMIA consistently outperforms MIA. Additionally, our attack can potentially be used to evaluate the pioneering membership inference and data extraction attacks on generative models [4, 43, 7], where the current evaluation requires finding the closest training image for all candidate data and computing their distances. By setting the distance function as the range function and conducting RaMIAs, we can evaluate these attacks more systematically.

2 Preliminaries

2.1 Membership inference attacks

The membership inference attack (MIA) [39] is a type of inference attack against machine learning models that infers whether a given data sample is part of the model’s training set. Mathematically, given a model $f$ and a query point $\mathbf{x}$ , the MIA aims to output 1 if $\mathbf{x}$ is a training point, and 0 otherwise. There are various methods to construct and conduct the attack, and it remains an active research direction in which more powerful attacks continue to be developed. Shokri et al. [39] use a shadow-model-based approach, where shadow models are trained on known training sets in ways similar to the target model. Confidence values of the training and test data on the shadow models are computed and then used as benchmarks in testing. However, the high cost and the strong assumption of knowing the target model’s training details often make the attack infeasible. Yeom et al. [45] use the model loss as a signal and threshold it, removing the need for shadow models. MIA was subsequently formulated as an inference game (see Sec 3.1.1), and researchers turned to a principled approach that solves the game via likelihood ratio tests [38, 5, 44, 47]. Carlini et al. [5] and Ye et al. [44] propose reference-model-based approaches, where target signals are compared to those obtained on reference models to obtain the likelihood ratio. To further boost the attack power, Zarifzadeh et al. [47] assume the attacker has access to a pool of population data, so that the likelihood ratio from reference-based attacks can be calibrated on ratios obtained on (non-member) population data.

Recent attacks [4, 47] further boost the attack performance on image data by augmenting the test queries with train-time augmentations. This assumes that the attacker knows the exact train-time augmentations in advance and is able to sample from them. Augmenting images with augmentations not used at training time is not considered, for a valid reason: those augmented images would be non-members under the current privacy notion.

2.2 Range queries

If we make a connection to the field of databases, a membership inference attack operates on point queries, or exact-match queries. That is, each query to the membership inference attack contains only one data point, and the attack is only concerned with whether this very point is in the training set. A range query, on the other hand, is another common querying operation in database systems, and it retrieves all data points that fall into the "range". The most fundamental difference from a point query is that the retrieved result often contains multiple data points instead of a single one. Our proposed attack, the range membership inference attack, operates with range queries.
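To make the database analogy concrete, the following minimal sketch contrasts a point query with a range query over a toy one-dimensional training set. The data values and the helper names `point_query` and `range_query` are illustrative assumptions, not part of the attack itself.

```python
# Toy "training set" of one-dimensional records (illustrative values only).
training_set = [0.10, 0.35, 0.36, 0.80]


def point_query(data, x):
    """Exact-match query: is this very point stored?"""
    return x in data


def range_query(data, center, radius):
    """Range query: return every stored point within `radius` of `center`."""
    return [p for p in data if abs(p - center) <= radius]


# A point query misses a value that merely lies close to stored records...
exact_hit = point_query(training_set, 0.355)
# ...while a range query around the same value retrieves its neighbours.
neighbours = range_query(training_set, center=0.355, radius=0.01)
```

Here the point query returns no match, while the range query retrieves the two nearby stored records, which mirrors the difference between asking "is this exact point a member?" and "does this range contain any member?".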

3 From MIA to RaMIA

Membership inference attacks are often formulated as an inference game [45, 21, 44, 5, 47] between a challenger and an adversary. In this section, we will walk through how we come up with RaMIA from MIA.

3.1 Membership inference attacks

In membership inference attacks, the goal is to identify if a given point is part of the training set.

3.1.1 Membership inference game

Definition 1

(Membership Inference Game [44, 45]) Let $\pi$ be the data distribution, and let $\mathcal{T}$ be the training algorithm.

1. The challenger samples a training dataset $D\overset{s_{D}}{\longleftarrow}\pi$ , and trains a model $\theta\longleftarrow\mathcal{T}(D)$ .
2. The challenger samples a data record $z_{0}\overset{s_{z_{0}}}{\longleftarrow}\pi$ from the data distribution, and a training data record $z_{1}\overset{s_{z_{1}}}{\longleftarrow}D$ .
3. The challenger flips a fair coin to get the bit $b\in\{0,1\}$ , and sends the target model $\theta$ and data record $z_{b}$ to the adversary.
4. The adversary gets access to the data distribution $\pi$ and access to the target model, and outputs a bit $\hat{b}\longleftarrow\mathcal{A}(\theta,z_{b})$ .
5. If $\hat{b}=b$ , output 1 (success). Otherwise, output 0.
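The steps of Definition 1 can be sketched as a small simulation. This is only an illustration under strong assumptions: records are integers, the hypothetical "training algorithm" memorizes its dataset perfectly, and the adversary does exact matching without using its access to $\pi$. None of these toy components come from the paper.

```python
import random


def membership_inference_game(sample_record, train, adversary, n_train, rng):
    """Play one round of the game in Definition 1 and return 1 on success."""
    # Step 1: sample a training set D from the data distribution and train.
    dataset = [sample_record(rng) for _ in range(n_train)]
    model = train(dataset)
    # Step 2: draw a fresh record z0 and a training record z1.
    z0 = sample_record(rng)
    z1 = rng.choice(dataset)
    # Step 3: flip a fair coin and pick the challenge record z_b.
    b = rng.randrange(2)
    challenge = z1 if b == 1 else z0
    # Step 4: the adversary guesses the bit from the model and the record.
    guess = adversary(model, challenge)
    # Step 5: success if the guess matches the coin.
    return int(guess == b)


# Toy instantiation (assumptions): integer records and a "model" that
# memorizes its training set, so exact matching almost always wins.
def sample_record(rng):
    return rng.randrange(1_000_000)


def train(dataset):
    return frozenset(dataset)


def exact_match_adversary(model, record):
    return int(record in model)


rng = random.Random(0)
wins = sum(
    membership_inference_game(sample_record, train, exact_match_adversary, 100, rng)
    for _ in range(1000)
)
success_rate = wins / 1000
```

In this toy setting the adversary fails only when the fresh record $z_{0}$ happens to collide with a training record, so the success rate is close to 1. Real models do not expose their training set this way, which is why practical attacks rely on membership signals and statistical tests.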

3.1.2 Evaluation of MIA

Evaluation is done with a set of training and test points. True positive rate (TPR) and false positive rate (FPR) are computed by sweeping over all possible threshold values. By plotting the receiver operating characteristic curve (ROC), the power of an attack strategy can be represented by the area under the curve (AUC). A clueless adversary who can only randomly guess the membership labels will get an AUC of 0.5. For stronger adversaries, they predict membership more correctly at each error level. Hence, they would achieve higher TPR at each FPR, and get a higher AUC.
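As a minimal sketch of this evaluation, the code below sweeps the decision threshold over the observed membership scores, collects (FPR, TPR) pairs, and integrates the ROC curve with the trapezoidal rule. The function names and the example scores are illustrative assumptions, and the sketch assumes both members and non-members are present.

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold.

    `labels` holds 1 for members and 0 for non-members; a point is predicted
    to be a member when its score is at least the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Sweep thresholds from the highest observed score to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative scores: three members and three non-members.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = auc(roc_curve(example_scores, example_labels))
```

For these example scores, 8 of the 9 member/non-member pairs are ranked correctly, so the AUC is 8/9. An attack that assigns the same score to every point yields the diagonal ROC curve and an AUC of 0.5.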

3.1.3 Intrinsic limitation of MIA as a Privacy Auditing Framework

MIAs are intrinsically incapable of identifying points close to training points, regardless of how similar they are, because these points are, by definition, non-members in the scope of MIAs. Hence, there is a huge space of points that contain private information but are deemed non-members in the current privacy auditing framework. As a result, MIAs become less effective as privacy auditing tools when the queries move away from the original data. Figure 2 shows that MIAs under-perform on non-original data. This inspires our formulation of RaMIA, where these points are classified as "members" for better and more comprehensive privacy auditing.

3.2 Range membership inference attack

In range membership inference attacks, the goal is to identify if a given range contains any training point.

3.2.1 Range membership inference game

Here we define our range membership inference game, modified from the above formulation.

Definition 2

(Range Membership Inference Game) Let $\pi$ be the data distribution, and let $\mathcal{T}$ be the training algorithm.

1. The challenger samples a training dataset $D\overset{s_{D}}{\longleftarrow}\pi$ , and trains a model $\theta\longleftarrow\mathcal{T}(D)$ .
2. The challenger samples a data record $z_{0}\overset{s_{z_{0}}}{\longleftarrow}\pi$ from the data distribution, and a training data record $z_{1}\overset{s_{z_{1}}}{\longleftarrow}D$ .
3. The challenger flips a fair coin to get the bit $b\in\{0,1\}$ . If $b=1$ , the challenger samples a range $\mathcal{R}_{1}$ containing at least one training point. Otherwise, the challenger samples a range $\mathcal{R}_{0}$ containing no training points.
4. The challenger sends the target model $\theta$ and the range $\mathcal{R}_{b}$ to the adversary.
5. The adversary gets access to the data distribution $\pi$ and access to the target model, and outputs a bit $\hat{b}\longleftarrow\mathcal{A}(\theta,\mathcal{R}_{b})$ .
6. If $\hat{b}=b$ , output 1 (success). Otherwise, output 0.

The main difference between the two games is that the adversary now receives a range query (Step 4 in Def 2) instead of a point query (Step 3 in Def 1). We assume that the adversary is able to sample within the range. This is a reasonable assumption because the adversary is usually assumed to have the ability to sample from the original data distribution $\pi$ [39, 44, 47]. Given a range query and a sampler of $\pi$ , it is not difficult to sample within the range.

What is a range

A range can be defined by a center, which is a point; a radius, which represents the size of the range; and a distance function, with respect to which the radius is defined. We refer to the center as the query center, the radius as the range size, and the distance function as the range function in this paper. One way to visualize a range is to imagine a unit $l_{2}$ ball around a point $x$ , and then to replace the radius and the $l_{2}$ distance with any choice of range size and range function. Our framework can cater to any range function. It can be spatial (e.g. $l_{p}$ distances), transformation-based (e.g. geometric transformations) or semantic (e.g. owner or main features of the data). In the experiment section, we will present results with all of these types of range functions. Note that our attack reduces to user-level inference [31, 23, 27, 11, 10] when the range function is user-based.
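A range as described here can be sketched as a small data structure holding the query center, the range size and the range function. The class name `Range`, the two example distance functions and the rejection sampler `sample_in_range` are illustrative assumptions; rejection sampling is just one simple way to draw in-range points from a sampler of $\pi$, and it is practical only when the range has non-negligible mass under that sampler.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Range:
    center: Sequence                                  # query center
    size: float                                       # range size (radius)
    distance: Callable[[Sequence, Sequence], float]   # range function

    def contains(self, point):
        return self.distance(self.center, point) <= self.size


def l2_distance(a, b):
    """Spatial range function: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Token-level range function; assumes sequences of equal length."""
    return sum(1 for x, y in zip(a, b) if x != y)


def sample_in_range(query_range, draw_from_pi, n, rng, max_tries=100_000):
    """Rejection sampling: keep draws from the data sampler that fall in the range."""
    accepted = []
    for _ in range(max_tries):
        candidate = draw_from_pi(rng)
        if query_range.contains(candidate):
            accepted.append(candidate)
            if len(accepted) == n:
                break
    return accepted


# Unit l2 ball around the origin; the toy "data distribution" is uniform on [-2, 2]^2.
rng = random.Random(0)
ball = Range(center=(0.0, 0.0), size=1.0, distance=l2_distance)
samples = sample_in_range(
    ball, lambda r: (r.uniform(-2, 2), r.uniform(-2, 2)), n=5, rng=rng
)
```

Swapping `l2_distance` for `hamming_distance` (or any other callable) changes the geometry of the range without touching the rest of the code, which reflects the claim that the framework is agnostic to the range function.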

How to construct a range

In Step 3 of the range membership inference game, the details of how the challenger samples the ranges are intentionally omitted. This is because the ranges can be constructed around either in-distribution or out-of-distribution data points for both in- and out-ranges. The details of how we construct the in- and out-ranges for our experiments are elaborated in Appendix B.

3.3 Evaluation of RaMIA

Similarly, we evaluate RaMIA with AUCs. However, the notions of true positives and false positives differ from those in MIA, as both are defined at the range level. To avoid confusion, we call the corresponding rates Range TPR and Range FPR: the rates at which a range is correctly or wrongly predicted to contain at least one training point.

4 Range membership inference attacks

4.1 Composite hypothesis testing

Similar to likelihood ratio tests for membership inference games (Appendix A.1), we can also construct two hypotheses for the range membership inference game (Def 2):

$$\begin{aligned}
H_{0} &: \text{None of the points in the given range are from the training set: } \forall z\in\mathcal{R}_{b}:z\not\in D.\\
H_{1} &: \text{There is at least one point in the given range from the training set: } \exists z\in\mathcal{R}_{b}\text{ s.t. }z\in D.
\end{aligned}$$

The likelihood ratio in this case is $\frac{\mathbb{P}\left(\theta|H_{1}\right)}{\mathbb{P}\left(\theta|H_{0}\right)}$ . Note that the alternative hypothesis $H_{1}$ is composite because it is a union of multiple hypotheses $\bigcup_{z_{i}\leftarrow\mathcal{R}_{b}}(z_{i}\in D)$ . Therefore, we need to change our methodology to those tailored for composite hypothesis testing. There are two commonly used methods for it: Bayes Factor [22] and Generalized Likelihood Ratio Tests (GLRT) [42]. Bayes Factor replaces the composite hypothesis with a simple one that is representative of its hypothesis class. It models the "parameter" of the hypothesis with a prior distribution and then computes the expected value of the composite hypothesis based on the prior distribution. In this case, $\mathbb{P}\left(\theta|H_{1}\right)$ will be computed by $\int_{x\in\mathcal{R}_{1}}\mathbb{P}(\theta|x\in D)\mathbb{P}(x)dx$ . The generalized likelihood ratio test (GLRT) simply takes the maximum of all values the composite hypothesis can achieve. In this case, $\mathbb{P}\left(\theta|H_{1}\right)$ will be computed by $\max_{x\in\mathcal{R}_{1}}\mathbb{P}(\theta|x\in D)$ .
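The difference between the two approaches can be sketched numerically. Assuming we have already evaluated $\mathbb{P}(\theta|x\in D)$ on a finite set of points sampled from the range, the Bayes factor numerator becomes a prior-weighted average and the GLRT numerator becomes a maximum. The function names, the uniform default prior and the example likelihood values are illustrative assumptions.

```python
def bayes_factor_numerator(likelihoods, prior_weights=None):
    """Approximate the integral of P(theta | x in D) P(x) dx over sampled points.

    With no prior given, a uniform prior over the samples is assumed, which
    reduces the estimate to a plain average.
    """
    if prior_weights is None:
        prior_weights = [1.0 / len(likelihoods)] * len(likelihoods)
    return sum(w * p for w, p in zip(prior_weights, likelihoods))


def glrt_numerator(likelihoods):
    """GLRT: take the largest likelihood attained within the range."""
    return max(likelihoods)


# Illustrative values of P(theta | x in D) for three sampled points.
example_likelihoods = [0.2, 0.4, 0.9]
bf_value = bayes_factor_numerator(example_likelihoods)
glrt_value = glrt_numerator(example_likelihoods)
```

On these example values the averaged estimate is 0.5 and the maximum is 0.9. The maximum is never smaller than any weighted average, so GLRT is the more optimistic of the two and the more sensitive to a single spuriously high score.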

To make full use of the Bayes Factor, we need to know the prior distribution, which is unrealistic. On the other hand, taking the max seems to be a more intuitive approach for range membership inference attacks, because it provides a two-step solution: search and test. Searching for the points with the highest membership score is conceptually equivalent to identifying the points that are most likely to be training points. However, this assumes that we can reliably find the max values in a given range. Since most ranges are large data subspaces, it is very challenging to find the optimal points within the large space. Even if the search space can be navigated, any search algorithm is likely to return local maxima. Hence, a robust way is to aggregate the top samples. However, membership inference attacks are known to be unreliable on out-of-distribution (OOD) data [47] and can assign them high scores. When the sampling space contains only these data as opposed to real and in-distribution (ID) data, the maximum might not be anything close to true training data, increasing the FPR and lowering the AUC as a result.

In this paper, we adopt a simple attack strategy. Our solution to this is based on the type of data in the sampling space. If the data are all naturally ID, we can combine Bayes Factor and GLRT by taking the average likelihood of the top samples to reduce the influence of the randomness in the sampling process. On the other hand, if the adversary can only synthesize data within the range, the top samples are highly likely to be OOD data with high membership scores. Hence, in this case, we want to remove those points from the equation. Since the presence of training points intuitively raises the average membership score of ID points nearby, compared to having no training points at all, we average the membership scores of the remaining samples. Unifying these two strategies gives us the following:

$$\mathbb{P}\left(\theta|H_{1}\right)=\text{TrimmedAvg}(S,q_{s},q_{e};\mathbb{P})=\text{Avg}_{x\not\in[q_{s},q_{e}]\text{-th quantiles}}\,\mathbb{P}(\theta|x\in D),\tag{1}$$

where $S$ is the sampled set, and $q_{s}$ and $q_{e}$ mark the start and the end of the quantile interval that we remove before averaging, so that the resulting robust statistic is a one-sided trimmed mean. If the sampling space is filled with synthetic data, the chance of the top samples being false positives is high, so we set $q_{e}=100$ to remove the largest scores. $q_{s}$ is then a hyperparameter that decreases (trimming more) as the quality of the sampled points gets worse. On the other hand, if the sampling space consists of real points, we set $q_{s}=0$ to remove the smallest scores from our aggregation. $q_{e}$ then decreases (trimming less) as the number of real samples decreases, to offset the high variance due to the limited samples available. Note that the optimal hyperparameters may differ across different membership signals (e.g. loss values, LiRA scores), as they exploit different vulnerabilities and expose different training points. However, for fixed model architectures, range functions, data distributions and sampling methods, these hyperparameters can be determined by reference models, similar to the offline version of RMIA [47]. Specifically, by randomly choosing a reference model as the temporary target model, we can run RaMIAs using all other reference models while sweeping these hyperparameters.

4.2 Range membership inference attack as a framework

The range membership inference attack is a new inference attack framework, not a particular attack algorithm. There are two components in this framework: a sampler and a membership tester, both of which are necessary to compute the range membership score formulated in Eqn 1. The sampler $\text{Sampler}(\mathcal{R}):S\rightarrow X$ returns samples within the given range. The membership tester $\text{MIA}(x)$ is a (point-query) membership inference algorithm that outputs a membership score, which can be used to approximate $\mathbb{P}(\theta|x)$ . A number of existing attack algorithms can be plugged in. Similar to MIAs, the key to using RaMIA as a privacy auditing tool is to compute the range membership score. Our framework can adopt any existing membership scoring function $\text{MIA}(x)$ to compute $\text{RaMIA}(\mathcal{R})$ . Below, we outline the attack with our attack strategy described above:

1: Input: range $\mathcal{R}$ , sampler $\text{Sample}(\cdot)$ , target model $\theta$ , membership scoring function $\text{MIA}(\cdot)$ .
2: Sample an attack set: $S\overset{n}{\longleftarrow}\text{Sample}(\mathcal{R})$ ;
3: if samples are real and ID then
4:     Set $q_{s}=0$ , and set $q_{e}$ by sweeping on reference models;
5: else
6:     Set $q_{e}=100$ , and set $q_{s}$ by sweeping on reference models.
7: end if
8: $\text{RaMIA}(\mathcal{R};\theta)=\text{TrimmedAvg}(S,q_{s},q_{e};\text{MIA})$
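The attack outlined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the mapping from quantiles to sample counts (rounding down, and never trimming away every sample) is our own choice, the trimming quantile is passed in directly instead of being swept on reference models, and `interval_sampler` and `toy_mia_score` are hypothetical stand-ins for a real sampler and a real point-query MIA.

```python
import math
import random


def trimmed_avg(scores, q_s, q_e):
    """Average the scores whose sorted rank lies outside the [q_s, q_e] percentile band."""
    ordered = sorted(scores)
    n = len(ordered)
    lo = math.floor(n * q_s / 100)   # first trimmed rank (assumption: round down)
    hi = math.floor(n * q_e / 100)   # one past the last trimmed rank
    kept = ordered[:lo] + ordered[hi:]
    if not kept:                     # assumption: never trim away every sample
        kept = ordered
    return sum(kept) / len(kept)


def ramia_score(query_range, sampler, mia_score, n_samples,
                samples_are_real_id, trim_quantile, rng):
    """Range membership score: sample the range, score each point, aggregate."""
    samples = sampler(query_range, n_samples, rng)
    scores = [mia_score(x) for x in samples]
    if samples_are_real_id:
        q_s, q_e = 0, trim_quantile      # real ID samples: drop the lowest scores
    else:
        q_s, q_e = trim_quantile, 100    # synthetic samples: drop the highest scores
    return trimmed_avg(scores, q_s, q_e)


# Hypothetical toy components: ranges are (center, radius) intervals on the
# real line, and the "MIA" pretends that a training point sits at 0.5.
def interval_sampler(query_range, n, rng):
    center, radius = query_range
    return [rng.uniform(center - radius, center + radius) for _ in range(n)]


def toy_mia_score(x):
    return math.exp(-10 * abs(x - 0.5))


rng = random.Random(0)
in_score = ramia_score((0.5, 0.1), interval_sampler, toy_mia_score, 20, False, 45, rng)
out_score = ramia_score((3.0, 0.1), interval_sampler, toy_mia_score, 20, False, 45, rng)
```

With 20 synthetic samples and a trimming quantile of 45, the 11 highest scores are discarded and the 9 lowest are averaged. With real samples and a quantile of 25, a range with three or fewer samples is not trimmed at all under this rounding rule. In the toy setup the range around the pretend training point receives a much higher score than the distant range.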

5 Experiments

Since the purpose of this paper is to introduce a new concept and framework, the goal of the experiments section is to provide a proof of concept. We experiment on the commonly used Purchase-100 [39], CelebA [28], CIFAR-10 [25] and AG News [48] datasets. Details of how we split the datasets, train models, construct ranges, and obtain samples are explained in Appendix B. We compare RaMIA with MIA (Sec 5.1) in scenarios where MIA under-performs (depicted in Figure 2). Both attacks are built upon the state-of-the-art attack algorithm, robust membership inference attack (RMIA) [47], with three reference models trained in the same way as Carlini et al. [5] and Zarifzadeh et al. [47]. The respective queries are outlined in Table 1, and the definitions of members under each attack framework are explained in Table 2. In both tables, $x$ represents original data in the datasets, while each $x^{\prime}$ is either a data record with missing values or a modified version of $x$ . We do not test our attacks by taking ranges centered at the original data $x$ because the chance of the attack data being exactly the same as the training data is extremely low without sufficient prior knowledge. It is more realistic that similar data are queried.

Dataset	Range query	Point query
Purchase-100	possible data records given the incomplete data $x^{\prime}$	mode imputed $x^{\prime}$
CelebA	photos featuring the same person as photo $x^{\prime}$	photo $x^{\prime}$
CIFAR-10	transformed versions of image $x^{\prime}$	image $x^{\prime}$
AG News	sentences that are of Hamming distance 8 to sentence $x^{\prime}$	sentence $x^{\prime}$

Dataset	Range member if there is at least	(Point) member if
Purchase-100	one training point matches with $x^{\prime}$ on all unmasked columns	$x^{\prime}_{\text{impu}}$ is member
CelebA	one training image featuring the same person as $x^{\prime}$	$x^{\prime}$ is member
CIFAR-10	one version of image $x^{\prime}$ in the training set	$x^{\prime}$ is member
AG News	one training sentence within Hamming distance 8 to $x^{\prime}$	$x^{\prime}$ is member

On Purchase-100, we take 20 samples in every range, and set $q_{e}=100,q_{s}=45$ . On CIFAR-10, we apply up to 15 distinct transforms, and set $q_{e}=100,q_{s}=40$ . On AG News, we construct 50 sentences within each range, and set $q_{e}=100,q_{s}=20$ . On CelebA, each celebrity has a different number of images in the sampling space, ranging from 1 to 18. Since it is hard to standardize the sample size for all ranges, we take all of them. We then set $q_{s}=0$ and $q_{e}=25$ , which means we are not trimming anything for ranges with very few samples available.

5.1 RaMIAs quantify privacy risks more comprehensively than MIAs

As we have explained before, data points that are close to the training data are out of the scope of membership inference attacks. We observe from Figure 3 that range membership inference attacks are better at identifying those nearby points, and thus provide more comprehensive privacy auditing on all four datasets we tested. We want to emphasize that the gain is remarkable if we consider how few samples were taken compared to the range sizes. On Purchase-100, there are a total of 1024 candidates in each range, and we take 20 of them, i.e., less than 2%. On AG News, there are millions of sentences within a distance of 8, so 50 sentences are too few to meaningfully cover the space. Yet such limited samples can lead to noticeable gains, which further shows that the current privacy quantification approach is flawed and that a better framework is needed. Due to randomness in sampling, we report the average gain of RaMIA over MIA with standard deviation in Table 4. TPRs at small FPRs are in Table 3.

5.2 Factors affecting RaMIA performance

Training data density in the range

Due to the nature of the sampling-based approach, the chance of our attack set containing a true training point scales linearly with the density of training points in the range. If we keep the sample size constant, increasing the range without including more training points in the range hurts the attack performance because the chance of the attack set including any training point gets diluted. On the other hand, if we increase the training point density, which is equivalent to increasing the probability the attacker samples a true training point, the attack performance gets boosted. Figure 4(b) shows that the performance of RaMIA increases when the range becomes larger in the CIFAR-10 experiment. Recall that the range function in CIFAR-10 is based on image augmentation methods. Increasing the range means the attacker applies more distinct augmentation methods to obtain transformed images. This increases the chance of the attacker obtaining one of the transformed versions of training images seen by the model during training, thus leading to better attack performance. In Figure 3(b), we conducted the attack assuming the attacker cannot sample any true training images. As a sanity check, we relax this assumption, and Figure 4(a) shows that RaMIA performs monotonically better when the density of training images increases from 0% to 50%, when the number of samples is constant.
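The effect of density can be illustrated with a short calculation. As a simplifying assumption that the paper does not state, suppose the attacker draws $n$ samples independently and uniformly from the range, and a fraction $\rho$ of the candidates in the range are training points. The chance that the attack set contains at least one training point is then $1-(1-\rho)^{n}$, which is approximately $n\rho$ (linear in the density) when $\rho$ is small.

```python
def hit_probability(density, n_samples):
    """Chance that at least one of n independent uniform draws is a training point."""
    return 1.0 - (1.0 - density) ** n_samples


# With 20 samples, halving the density roughly halves the hit probability
# in the low-density regime, matching the linear approximation n * density.
p_sparse = hit_probability(0.0005, 20)
p_dense = hit_probability(0.001, 20)
linear_estimate = 20 * 0.001
```

Under this model, enlarging a range without adding training points lowers $\rho$ and therefore the hit probability at a fixed sample size, while raising the density has the opposite effect. For large densities the curve saturates toward 1, so the linear relationship should be read as a low-density approximation.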

Susceptibility to MIAs and RaMIAs is correlated

Ranges containing training points that are susceptible to MIAs are also more susceptible to RaMIAs. Researchers have previously discovered that machine learning models memorize duplicated data more [26, 6]. In our CelebA dataset, each celebrity has a different number of photos in the training set, which can be thought of as each identity having a different level of duplication in the training set. Similar to the insights from MIAs, we also observe that identities that have more training images, i.e. a higher duplication rate, are more susceptible to RaMIA. Figure 8 shows the relationship between the duplication rate and the percentile of each range’s RaMIA score among non-members’ RaMIA scores. Generally speaking, identities that have more training photos are more prone to RaMIAs. A similar correlation can be observed on the other three datasets in our experiments, where the training points’ RaMIA score percentiles among non-members are positively correlated with their MIA score percentiles (Figure 7).

5.3 Mismatched training and attack data hurts attack performance

Figure 2(c) shows that MIA underestimates the privacy risk when the augmentations used in training and attacking differ. This is a warning sign, as many people audit the privacy risk of image classifiers with original images, while the classifiers are often trained with a composition of augmentations. Many transformations, such as color jittering and affine transformations, always produce different final images. Other commonly used augmentation methods, such as random cropping, introduce more randomness to the pipeline. Hence, it is almost certain that the original images are never seen by the model. Therefore, we should use RaMIA for a better auditing result (Figure 3(c)).

Difference to existing augmentation-based MIAs

Existing attacks [4, 47] also use augmented queries in the attack, but with a different rationale and a different assumption about the attacker’s knowledge. Since they adopt the existing privacy notion based on point queries, only (augmented) images seen by the model in the training stage are considered members. Hence, the attacker needs to know the exact train-time augmentations and augment images accordingly so as not to violate the privacy notion. In RaMIA, the set of augmentations is given by the challenger (Def 2), and it can contain augmentations that were not used in training but are considered privacy leaking. Using the aggregation method in [4] will hurt the attack performance if non-training augmentations are used. However, RaMIA is designed to be robust in this scenario (Fig 3(c)).

6 Conclusion

In this paper, we argue that membership inference attacks are only useful as a privacy auditing tool when querying exact copies of training and test data. Moving the query to similar points causes a drastic decrease in performance, rendering MIAs less useful. We conclude that MIAs fail to comprehensively capture the notion of privacy, and thus propose a new class of inference attacks, RaMIAs, that extends the notion of MIAs and covers their failure cases by checking if a given range contains a training point. We introduce RaMIA as an attack framework that can be implemented with any existing MIA algorithm. We show that it can provide better privacy auditing with very few samples taken randomly. We hope our work can make more privacy researchers and practitioners aware of the shortcomings of MIAs, and shift their attention to RaMIAs. As this is the first paper that brings up this new framework, there is room for improvement in specific attack algorithms. For example, we expect a better sampling process to widen the gap between RaMIA and MIA. Nevertheless, we have shown that our framework is sensible and useful. In future work, we hope to design more powerful RaMIA strategies that are robust to changes in membership signals and datasets, especially on LLMs, where we believe our privacy notion is extremely relevant.

References

Ateniese et al. [2015] G. Ateniese, L. V. Mancini, A. Spognardi, A. Villani, D. Vitali, and G. Felici. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. International Journal of Security and Networks, 10(3):137–150, 2015.
Bradbury et al. [2018] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
Brown et al. [2021] G. Brown, M. Bun, V. Feldman, A. Smith, and K. Talwar. When is memorization of irrelevant training data necessary for high-accuracy learning? In Proceedings of the 53rd annual ACM SIGACT symposium on theory of computing, pages 123–132, 2021.
Carlini et al. [2021] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021.
Carlini et al. [2022a] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914. IEEE, 2022a.
Carlini et al. [2022b] N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramer, and C. Zhang. Quantifying memorization across neural language models. arXiv preprint arXiv:2202.07646, 2022b.
Carlini et al. [2023] N. Carlini, J. Hayes, M. Nasr, M. Jagielski, V. Sehwag, F. Tramer, B. Balle, D. Ippolito, and E. Wallace. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5253–5270, 2023.
Chase et al. [2021] M. Chase, E. Ghosh, and S. Mahloujifar. Property inference from poisoning. arXiv preprint arXiv:2101.11073, 2021.
Chaudhari et al. [2023] H. Chaudhari, J. Abascal, A. Oprea, M. Jagielski, F. Tramer, and J. Ullman. Snap: Efficient extraction of private properties with poisoning. In 2023 IEEE Symposium on Security and Privacy (SP), pages 400–417. IEEE, 2023.
Chen et al. [2023a] G. Chen, Y. Zhang, and F. Song. Slmia-sr: Speaker-level membership inference attacks against speaker recognition systems. arXiv preprint arXiv:2309.07983, 2023a.
Chen et al. [2023b] M. Chen, Z. Zhang, T. Wang, M. Backes, and Y. Zhang. FACE-AUDITOR: Data auditing in facial recognition systems. In 32nd USENIX Security Symposium (USENIX Security 23), pages 7195–7212, 2023b.
Devlin et al. [2018] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
Duan et al. [2024] M. Duan, A. Suri, N. Mireshghallah, S. Min, W. Shi, L. Zettlemoyer, Y. Tsvetkov, Y. Choi, D. Evans, and H. Hajishirzi. Do membership inference attacks work on large language models? arXiv preprint arXiv:2402.07841, 2024.
Feldman [2019] V. Feldman. Does learning require memorization? a short tale about a long tail. arXiv preprint arXiv:1906.05271, 2019.
Feldman and Zhang [2020] V. Feldman and C. Zhang. What neural networks memorize and why: Discovering the long tail via influence estimation. Advances in Neural Information Processing Systems, 33:2881–2891, 2020.
Garg and Roy [2023] I. Garg and K. Roy. Memorization through the lens of curvature of loss function around samples. arXiv preprint arXiv:2307.05831, 2023.
Hilprecht et al. [2019] B. Hilprecht, M. Härterich, and D. Bernau. Monte carlo and reconstruction membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies, 2019.
Honnibal et al. [2020] M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd. spaCy: Industrial-strength Natural Language Processing in Python. 2020. doi: 10.5281/zenodo.1212303.
Hu et al. [2021] E. J. Hu, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al. Lora: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2021.
Jayaraman and Evans [2022] B. Jayaraman and D. Evans. Are attribute inference attacks just imputation? In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 1569–1582, 2022.
Jayaraman et al. [2021] B. Jayaraman, L. Wang, K. Knipmeyer, Q. Gu, and D. Evans. Revisiting membership inference under realistic assumptions. Proceedings on Privacy Enhancing Technologies, 2021(2), 2021.
Jeffreys [1939] H. Jeffreys. Theory of probability. 1939.
Kandpal et al. [2023] N. Kandpal, K. Pillutla, A. Oprea, P. Kairouz, C. A. Choquette-Choo, and Z. Xu. User inference attacks on large language models. arXiv preprint arXiv:2310.09266, 2023.
Kim et al. [2023] Y. I. Kim, P. Agrawal, J. O. Royset, and R. Khanna. On memorization and privacy risks of sharpness aware minimization. arXiv preprint arXiv:2310.00488, 2023.
Krizhevsky et al. [2009] A. Krizhevsky et al. Learning multiple layers of features from tiny images. 2009.
Lee et al. [2021] K. Lee, D. Ippolito, A. Nystrom, C. Zhang, D. Eck, C. Callison-Burch, and N. Carlini. Deduplicating training data makes language models better. arXiv preprint arXiv:2107.06499, 2021.
Liu et al. [2021] F. Liu, T. Lin, and M. Jaggi. Understanding memorization from the perspective of optimization via efficient influence estimation. arXiv preprint arXiv:2112.08798, 2021.
Liu et al. [2018] Z. Liu, P. Luo, X. Wang, and X. Tang. Large-scale celebfaces attributes (celeba) dataset. Retrieved August, 15(2018):11, 2018.
Long et al. [2023] Y. Long, Z. Ying, H. Yan, R. Fang, X. Li, Y. Wang, and Z. Pan. Membership reconstruction attack in deep neural networks. Information Sciences, 634:27–41, 2023.
Lukasik et al. [2023] M. Lukasik, V. Nagarajan, A. S. Rawat, A. K. Menon, and S. Kumar. What do larger image classifiers memorise? arXiv preprint arXiv:2310.05337, 2023.
Mahloujifar et al. [2021] S. Mahloujifar, H. A. Inan, M. Chase, E. Ghosh, and M. Hasegawa. Membership inference on word embedding and beyond. arXiv preprint arXiv:2106.11384, 2021.
Mangrulkar et al. [2022] S. Mangrulkar, S. Gugger, L. Debut, Y. Belkada, S. Paul, and B. Bossan. Peft: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft, 2022.
Paszke et al. [2019] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
Radford et al. [2019] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
Sablayrolles et al. [2019] A. Sablayrolles, M. Douze, C. Schmid, Y. Ollivier, and H. Jégou. White-box vs black-box: Bayes optimal strategies for membership inference. In Proceedings of the 36th International Conference on Machine Learning (ICML’19), pages 5558–5567, 2019.
Salem et al. [2020] A. Salem, A. Bhattacharya, M. Backes, M. Fritz, and Y. Zhang. Updates-Leak: Data set inference and reconstruction attacks in online learning. In 29th USENIX security symposium (USENIX Security 20), pages 1291–1308, 2020.
Samarati and Sweeney [1998] P. Samarati and L. Sweeney. Generalizing data to provide anonymity when disclosing information. In PODS, volume 98, pages 10–1145, 1998.
Sankararaman et al. [2009] S. Sankararaman, G. Obozinski, M. I. Jordan, and E. Halperin. Genomic privacy and limits of individual detection in a pool. Nature genetics, 41(9):965–967, 2009.
Shokri et al. [2017] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models (S&P’17). 2017.
Suri and Evans [2022] A. Suri and D. Evans. Formalizing and estimating distribution inference risks. Proceedings on Privacy Enhancing Technologies, 2022.
Tirumala et al. [2022] K. Tirumala, A. Markosyan, L. Zettlemoyer, and A. Aghajanyan. Memorization without overfitting: Analyzing the training dynamics of large language models. Advances in Neural Information Processing Systems, 35:38274–38290, 2022.
Van Trees [1968] H. Van Trees. Detection, estimation, and modulation theory. part 1-detection, estimation, and linear modulation theory. 1968.
Wu et al. [2022] Y. Wu, N. Yu, Z. Li, M. Backes, and Y. Zhang. Membership inference attacks against text-to-image generation models. 2022.
Ye et al. [2022] J. Ye, A. Maddi, S. K. Murakonda, V. Bindschaedler, and R. Shokri. Enhanced membership inference attacks against machine learning models. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 3093–3106, 2022.
Yeom et al. [2018] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st computer security foundations symposium (CSF), pages 268–282. IEEE, 2018.
Zagoruyko and Komodakis [2016] S. Zagoruyko and N. Komodakis. Wide residual networks. In British Machine Vision Conference 2016. British Machine Vision Association, 2016.
Zarifzadeh et al. [2023] S. Zarifzadeh, P. C.-J. M. Liu, and R. Shokri. Low-cost high-power membership inference by boosting relativity. 2023.
Zhang et al. [2015] X. Zhang, J. Zhao, and Y. LeCun. Character-level convolutional networks for text classification. Advances in neural information processing systems, 28, 2015.

Appendix A Attack Details

A.1 (Simple) Hypothesis testing

The standard way to tackle the inference game (Def 1) is to apply statistical hypothesis tests [44, 5]:

$H_{0}$: The given $z$ is not a training point ($b=0$).
$H_{1}$: The given $z$ is a training point ($b=1$).

The likelihood ratio test (LRT) is then conducted:

$$\frac{\mathbb{P}\left(\theta \mid H_{1}\right)}{\mathbb{P}\left(\theta \mid H_{0}\right)}$$
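
As a rough illustration of how such a ratio turns into a membership decision, the sketch below works in log space and rejects $H_{0}$ when the log-likelihood ratio exceeds a threshold. It assumes, purely for illustration, that the attacker summarizes the model's behavior on $z$ by a scalar signal (for example, a loss) and models that signal as Gaussian under each hypothesis. The function names, the Gaussian assumption, and the parameter values are hypothetical and are not the paper's method.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_likelihood_ratio(signal, in_params, out_params):
    """log P(signal | H1) - log P(signal | H0) under the Gaussian assumption.

    in_params and out_params are (mean, std) pairs for the signal when the
    point is a member (H1) and a non-member (H0), respectively.
    """
    return gaussian_logpdf(signal, *in_params) - gaussian_logpdf(signal, *out_params)


def lrt_decision(signal, in_params, out_params, log_threshold=0.0):
    """Predict 'member' (b = 1) when the log-ratio exceeds the threshold."""
    return log_likelihood_ratio(signal, in_params, out_params) > log_threshold
```

Raising `log_threshold` trades true positives for fewer false positives, which is how a single test traces out a full ROC curve.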

We mask $k$ words of the range center, where $k$ is the Hamming distance specified by the range. We then use BERT [12] to replace each mask with one of its top predictions.
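
The masking-and-filling procedure can be sketched as follows. This is a minimal illustration and not the authors' code: `toy_fill_mask` is a hypothetical stand-in for a masked language model such as BERT, and both the `top_n` cutoff and the uniform choice among the top predictions are assumptions.

```python
import random

MASK = "[MASK]"


def mask_k_words(sentence, k, rng):
    """Replace k randomly chosen word positions with the mask token."""
    words = sentence.split()
    if k > len(words):
        raise ValueError("k exceeds the number of words in the sentence")
    positions = sorted(rng.sample(range(len(words)), k))
    masked = list(words)
    for pos in positions:
        masked[pos] = MASK
    return masked, positions


def sample_from_range(sentence, k, fill_mask, rng, top_n=5):
    """Draw one sentence within word-level Hamming distance k of `sentence`."""
    masked, positions = mask_k_words(sentence, k, rng)
    for pos in positions:
        # fill_mask returns candidate words ranked from most to least likely.
        candidates = fill_mask(masked, pos)[:top_n]
        masked[pos] = rng.choice(candidates)
    return " ".join(masked)


def toy_fill_mask(words, pos):
    """Hypothetical stand-in for a masked language model (assumption)."""
    return ["market", "report", "team", "city", "company"]
```

Because only the $k$ masked positions can change, every sampled sentence differs from the range center in at most $k$ words, so it stays inside the range.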

Appendix C Implementation details

For all PyTorch models, we use Adam as our optimizer with a learning rate of 0.001. For WideResnets, we use the training code from [4]. On AG News, the models are trained for 4 epochs. On other datasets, they are trained for 100 epochs.

All training is done on two Nvidia RTX 3090 GPUs. Training on AG News takes about 1 hour per epoch. Training each of the other models takes less than one hour.

Appendix D RaMIA on redacted data

Many large language models (LLMs) are trained on sensitive textual data. Some of this data might be publicly available with the sensitive information redacted. As in our experiment on data with missing values, we can apply RaMIA to redacted data to identify which of it was used to train a target LLM. Accurately identifying the redacted sentences paves the way for reconstructing them in a follow-up attack. Figure 5 shows the results. In this experiment, we use spaCy [18] to mask people’s names to simulate the masking of personally identifiable information (PII). We then generate 10 possible sentences for each masked sentence using BERT, and conduct RaMIA. The MIA performance is the average attack performance over all 10 possible sentences.

Appendix E Extra results

This section reports additional experimental results.

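| TPR@FPR(%) | Purchase-100 (1%) | Purchase-100 (0.1%) | CIFAR-10 (1%) | CIFAR-10 (0.1%) | CelebA (1%) | CelebA (0.1%) | AG News (1%) | AG News (0.1%) |
|---|---|---|---|---|---|---|---|---|
| MIA, LOSS | 0 | 0 | 0.15 | 0 | 1.86 | 0.31 | 0.08 | 0 |
| MIA, RMIA | 2.18 | 0.37 | 2.40 | 0.21 | 1.69 | 0.19 | 0.30 | 0 |
| RaMIA, LOSS | $0\pm 0$ | $0\pm 0$ | $1.17\pm 0.06$ | $0.05\pm 0.03$ | 1.40 | 0.28 | $1.10\pm 0.11$ | $0\pm 0$ |
| RaMIA, RMIA | $2.57\pm 1.58$ | $0.57\pm 0.47$ | $3.59\pm 0.11$ | $0.80\pm 0.02$ | 1.44 | 0.22 | $0.54\pm 0.12$ | $0\pm 0$ |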
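
For reference, a TPR at a fixed FPR can be computed from raw attack scores roughly as sketched below. This is an illustrative implementation, not the evaluation code used for the table: it assumes that a higher score means "more likely a member" and that a query is flagged when its score is strictly greater than the threshold. The function name is hypothetical.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """TPR at the loosest threshold whose FPR does not exceed target_fpr."""
    n_out = len(nonmember_scores)
    # Number of non-members we may flag without exceeding the target FPR.
    # The small epsilon guards against floating-point error in the product.
    allowed = int(target_fpr * n_out + 1e-9)
    if allowed >= n_out:
        return 1.0  # every query may be flagged, so every member is caught
    ranked = sorted(nonmember_scores, reverse=True)
    # Flagging only scores strictly above this value yields at most
    # `allowed` false positives, even when non-member scores are tied.
    threshold = ranked[allowed]
    hits = sum(score > threshold for score in member_scores)
    return hits / len(member_scores)
```

Multiplying the returned fraction by 100 gives a percentage comparable to the entries above. With few non-member queries, small FPR targets such as 0.1% allow zero false positives, which is one reason many entries in that column are 0.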

| | Purchase-100 | CIFAR-10 | CelebA | AG News |
|---|---|---|---|---|
| $\Delta\text{AUC}$ | $2.62\pm 0.04$ | $4.12\pm 0.06$ | $5.4$ | $1.20\pm 0.2$ |