Benefits and Earnings Public-Use File, 2020
Background
The Old-Age, Survivors, and Disability Insurance (OASDI) program provides monthly benefits to qualified retired and disabled workers and their dependents and to survivors of insured workers. Eligibility and benefit amounts are determined by the worker's contributions to Social Security.
The Benefits and Earnings Public-Use File, 2020 (BEPUF 2020) is a set of fully synthetic microdata records statistically representing a self-weighted 10 percent sample of the current adult OASDI beneficiary population as of December 2020. Each record includes benefit-related variables and the covered earnings history of the beneficiary. All variable values are drawn at random from statistical distributions modeled on real values. Because the data set is fully synthetic, no record in it can be matched to any real individual beneficiary.
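Because the file represents a self-weighted 10 percent sample, every record carries the same weight, so a count from the file can be scaled to an approximate population count by multiplying by 10. The following is a minimal sketch of that arithmetic, not an official SSA method. The function name and the example count are illustrative assumptions, and any result is an estimate based on modeled (synthetic) records.

```python
SAMPLE_PERCENT = 10  # from the file description: self-weighted 10 percent sample


def estimate_population_count(sample_count: int, sample_percent: int = SAMPLE_PERCENT) -> float:
    """Scale a record count from the sample to an approximate population count.

    In a self-weighted sample every record has the same weight,
    here 100 / sample_percent (that is, 10 for a 10 percent sample).
    """
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in the range (0, 100]")
    return sample_count * 100 / sample_percent


# Hypothetical example: 1,234 records in the file match some condition.
estimated = estimate_population_count(1234)
```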
Disclaimer on Synthetic Data Limitations
The BEPUF 2020 was generated for research and analysis purposes. This synthetic public-use file contains data modeled after real data, aiming to replicate the structure and statistical properties of the original benefits and earnings data while protecting the privacy and confidentiality of individuals. Additional steps were taken to further reduce the risk of identifying individuals.
The Social Security Administration (SSA) conducted an independent quality review, which included a comparison between a sample of real data records and the synthetic data. This review indicated similarities in descriptive statistics, including mean, median, standard deviation, variance, frequency tabulations, range, correlation, and covariance. Furthermore, the normalized histogram and box plot distributions were found to be comparable. SSA continues to conduct data validation on this synthetic dataset, extending beyond the basic summary statistics and distributional characteristics.
While we strive to ensure the accuracy and reliability of this synthetic data, it is important to recognize the following limitations:
- Not Real Data: Synthetic data is generated through models and may not accurately represent all real-world scenarios.
- Potential Bias: The models used to create synthetic data may introduce biases that do not exist in actual data. Users should exercise caution when interpreting results derived from synthetic datasets.
- Limited Applicability: The relevance of synthetic data to real-world situations may be restricted. Users are encouraged to validate findings with actual data whenever possible before making decisions based on synthetic datasets.
By utilizing this synthetic data, you acknowledge and accept these limitations. For questions or concerns related to the data, please contact statistics@ssa.gov.
Available Files
- User Guide
- 2020 Benefits Subfile
  - CSV format (65 MB ZIP file, which unzips to a 342 MB CSV file)
  - SAS data format (92 MB ZIP file, which unzips to a 473 MB SAS data file)
- 2020 Earnings Subfile
  - CSV format (1.0 GB ZIP file, which unzips to a 4.6 GB CSV file)
  - SAS data format (1.1 GB ZIP file, which unzips to a 6.6 GB SAS data file)
This data set is very large.
Depending on your internet speed, the Earnings subfile may take several minutes to download. These files will not work properly in Microsoft Excel; use data software capable of handling large files.
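One way to work with a CSV file of this size without loading it all into memory is to read it one row at a time. The sketch below uses only the Python standard library to count the non-missing values in one numeric column and compute their mean. The file name `benefits_2020.csv` and the column name `MONTHLY_BENEFIT` are placeholders, not names taken from the data set; consult the User Guide for the actual file layout and variable names.

```python
import csv
from typing import Dict, Iterable


def summarize_column(lines: Iterable[str], column: str) -> Dict[str, float]:
    """Stream CSV rows one at a time; return the count and mean of one numeric column."""
    reader = csv.DictReader(lines)
    count = 0
    total = 0.0
    for row in reader:
        value = row.get(column)
        if value is None or value.strip() == "":
            continue  # skip missing values
        total += float(value)
        count += 1
    mean = total / count if count else float("nan")
    return {"count": count, "mean": mean}


# Usage (hypothetical file and column names):
# with open("benefits_2020.csv", newline="") as f:
#     print(summarize_column(f, "MONTHLY_BENEFIT"))
```

Because only one row is held in memory at a time, the same approach should apply to the larger Earnings subfile, although a full pass over a multi-gigabyte file may still take a while.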
Related public-use files: