Preliminary Estimates of the Number of U.S. Workers Using a New Methodology for Assigning Geographic and Demographic Information in Administrative Data Records

by Michael Compson
Research and Statistics Note No. 2024-02 (released December 2024)

Michael Compson is with the Office of Research, Evaluation, and Statistics, Office of Retirement and Disability Policy, Social Security Administration.

Acknowledgments: I would like to thank Pat Purcell, Richard Chard, Glenn Springstead, Gayle Reznik, Safaa Amer, Bill Piet, and Sven Sinclair for comments on the note; Ben Pitkin and Jessie Dalrymple for editorial assistance; and Lewis Gaul for programming assistance. I dedicate this note to Attiat Ott, Jyh-Horng Lin, Gladstone Hutchinson, Bonnie Orcutt, Janice Yee, and Ute Schumacher from Clark University; Ron Durst, Steve Koenig, and Pat Sullivan from the Economic Research Service at the Department of Agriculture; John Kitchen, Mike Springer, Rachel Cononi, and John Hamber from the Department of the Treasury; Greg Diez, John Hennessey, Cherice Jefferies, Angela Harper, Theresa Wolf, John Qatsha, Paul Davies, Bill Piet, Russ Hudson, Hansa Patel, and Sam Foster from the Social Security Administration; and all of the many others who have helped me throughout my career.

Contents of this publication are not copyrighted; any items may be reprinted, but citation is requested. The findings and conclusions presented in this note are those of the author and do not necessarily represent the views of the Social Security Administration.

Introduction

Selected Abbreviations
CWHS Continuous Work History Sample
IRS Internal Revenue Service
MGD Master Geographic and Demographic
OCACT Office of the Chief Actuary
ORES Office of Research, Evaluation, and Statistics
SCC state and county code
SSA Social Security Administration
SSN Social Security number

The Office of Research, Evaluation, and Statistics (ORES) in the Social Security Administration (SSA) is developing a new methodology to generate estimates of U.S. employment and earnings for two of its annual statistical publications: the Annual Statistical Supplement to the Social Security Bulletin (hereafter, the Annual Statistical Supplement) and Earnings and Employment Data for Workers Covered Under Social Security and Medicare, by State and County (hereafter, Earnings and Employment).1 The new methodology will enable ORES to use a vastly larger sample of workers than is allowed by the methodology currently used to generate those estimates. This research and statistics note follows two Social Security Bulletin articles that detail the complex multistep process of developing the new methodology. Compson (2022) identifies the limitations of the existing methodology and describes the new methodology for assigning state of residence codes and identifying demographic information for the population of workers with tax records for earnings in 2017. The new methodology enables ORES to compile a Master Geographic and Demographic (MGD) data file, which contains information on state of residence, year of birth, and sex for a far greater number of workers—nearly all of those for whom earnings records were filed with the Internal Revenue Service (IRS)—in a given year.2

Compson (2024) describes how ORES applied the new methodology to extend the 2017 MGD file for a full 7-year span (2014–2020) and uses two distinct analyses to assess the new process. The first, a procedural analysis, uses internal SSA audit reports to assess the completeness and accuracy of the new methodology in processing tax records. It focuses on the procedure for assigning a single state and county code (SCC) and identifying demographic information for each worker. For example, it examines the number of records processed, the number of workers represented in those records, and the earnings data sources (IRS Forms W-2, W-2c, and 1040 Schedule SE) for workers in a year in the 2014–2020 period. The second analysis directly compares the results of the MGD process for assigning state of residence codes and identifying worker demographic information with those of the current methodology of the statistical publications. Specifically, the second analysis processes the underlying worker-level microdata using both the existing methodology and the MGD process, allowing a direct comparison of estimates resulting from the two methodologies.

To date, SSA has been unable to use microdata to provide the geographic and demographic characteristics of U.S. workers because the agency's 1-percent Continuous Work History Sample (CWHS) has been the only available data source. By enabling earnings and employment estimates based on administrative microdata for nearly all workers, the MGD files vastly expand labor market research possibilities. This note presents preliminary estimates generated using the MGD process so that federal statistical agencies that collect, analyze, or release U.S. labor market data, and other interested researchers and policy analysts, can assess the results and provide feedback.3

The worker population estimates generated by the MGD process and presented in this note are shown variously by sex, age, state, and type of earnings (wage and salary, self-employment) for the years 2014 through 2021. The MGD estimates are compared with two different benchmarks: estimates generated using the current methodology and published in SSA annual statistical publications, and unpublished estimates prepared independently by SSA's Office of the Chief Actuary (OCACT).

The note also highlights several novel approaches that emerged as ORES developed the MGD process. For example, the MGD file enabled ORES to identify three mutually exclusive earnings-type categories (wage and salary only, self-employment only, and both types in combination) to provide new insights on the U.S. workforce. Additionally, because the new methodology allows ORES to use microdata for the entire population of workers instead of a 1 percent sample, SSA publications will be able to include estimates for small jurisdictions that are currently suppressed to comply with data disclosure restrictions for small sample sizes. For example, the publications currently combine estimates for American Samoa, Guam, the Northern Mariana Islands, and the U.S. Virgin Islands into a single “other outlying areas” category. The MGD process will allow SSA to generate and publish estimates for each territory individually.

This note also shows how maps can be used to present worker counts by state and county and describes the process by which worker counts by state can be revised to reflect newly arriving data. Currently, the state- and county-level worker count estimates in the statistical publications are based on the tax records processed in the calendar year that immediately follows the tax year (that is, the year when the reported income was earned). Once published, those estimates are not updated to reflect any records for that tax year that were not processed until later years. Although those records represent a very small proportion of the tax year records, their exclusion might affect the precision of the estimates. The new methodology will allow ORES to update its estimates based on more complete data.

Background

In 2023, ORES produced the MGD file that contains geographic and demographic data for the population of workers with earnings records for tax year 2021, which were processed in calendar year 2022. ORES now has assembled MGD files for each tax year from 2014 through 2021. This note presents preliminary estimates from those files. I refer to the estimates as preliminary for two reasons. First, the development of the new estimation method is ongoing, and the MGD component focuses solely on geographic and demographic information. The MGD files do not contain any information regarding the type or amount of earnings because the address information is the only item extracted from a worker's tax records to assign state and county of residence. As a result, the MGD files alone cannot be used to determine whether the earnings reported on the individual's tax records are covered or not covered under the Social Security or Medicare programs, nor can they provide the corresponding earnings amounts. Thus, for this note, the estimates are limited to worker counts, which are assumed to approximate the number of all workers subject to the Medicare Hospital Insurance payroll tax, because the Medicare tax is nearly universal for the U.S. workforce.4

Second, the MGD files contain data only for the primary tax year that were processed in the following calendar year. Each year, SSA processes the hundreds of millions of IRS Forms W-2 and W-2c (filed by employers) and the millions of Form 1040 Schedule SEs (filed by the self-employed) it receives from the IRS.5 The primary tax year for the records SSA processed in 2021 was 2020. However, during a calendar year, SSA also receives and processes some records for tax years other than the primary tax year. These nonprimary tax year data are not included in the MGD files.

Table 1 shows the number of primary and nonprimary tax year records processed via the MGD methodology each year from 2015 through 2022. It highlights the predominance of the primary tax year's records in each processing year, even as SSA continues to receive additional records for a given tax year in subsequent processing years. For example, in 2015, SSA processed 259,791,044 records for the primary tax year 2014. That year, SSA also processed 564 records for tax year 2015, showing that employers and self-employed individuals occasionally file tax forms before the end of the earnings year. Earnings records for tax year 2014 continued to arrive at SSA for processing each year thereafter, although the flow rapidly dwindled. The pattern recurs in subsequent tax years.

Table 1. Timing of earnings record processing: Alignment of tax years with processing years 2015–2022
Tax year | Total records processed | Processing year 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | Total nonprimary tax year records processed (note a)
2014
Number 263,933,101 259,791,044 2,201,919 1,511,457 258,316 98,717 26,303 27,469 17,876 4,142,057
Percent 100.00 98.43 0.83 0.57 0.10 0.04 0.01 0.01 0.01 1.57
2015
Number 273,432,854 564 269,436,834 2,968,364 556,611 314,294 49,169 68,105 38,913 3,996,020
Percent 100.00 (L) 98.54 1.09 0.20 0.11 0.02 0.02 0.01 1.46
2016
Number 281,673,837 . . . 319,278 278,488,758 1,924,877 590,378 161,381 110,847 78,318 3,185,079
Percent 100.00 . . . 0.11 98.87 0.68 0.21 0.06 0.04 0.03 1.13
2017
Number 282,182,333 . . . . . . 233,222 279,435,723 1,737,114 404,899 243,365 128,010 2,746,610
Percent 100.00 . . . . . . 0.08 99.03 0.62 0.14 0.09 0.05 0.97
2018
Number 287,745,177 . . . . . . . . . 309,506 284,888,320 1,712,352 550,783 284,216 2,856,857
Percent 100.00 . . . . . . . . . 0.11 99.01 0.60 0.19 0.10 0.99
2019
Number 290,015,773 . . . . . . . . . . . . 258,985 286,651,734 2,471,215 633,839 3,364,039
Percent 100.00 . . . . . . . . . . . . 0.09 98.84 0.85 0.22 1.16
2020
Number 280,036,115 . . . . . . . . . . . . . . . 241,124 276,478,907 3,316,084 3,557,208
Percent 100.00 . . . . . . . . . . . . . . . 0.09 98.73 1.18 1.27
2021
Number 291,162,336 . . . . . . . . . . . . . . . . . . 240,812 290,921,524 240,812
Percent 100.00 . . . . . . . . . . . . . . . . . . 0.08 99.92 0.08
2022
Number 253,805 . . . . . . . . . . . . . . . . . . . . . 253,805 0
Percent 100.00 . . . . . . . . . . . . . . . . . . . . . 100.00 0.00
SOURCE: Author's calculations based on SSA data processing audit reports.
NOTES: The primary tax year is the year immediately preceding the processing year.
Rounded components of percentage distributions do not necessarily sum to 100.00.
(L) = less than 0.005; . . . = not applicable.
a. Subject to change because the number and share of nonprimary tax year records increase from year to year.

The MGD files containing the primary tax year data currently account for more than 98 percent of all the earnings records that SSA has processed for tax years 2014 to 2022. However, Table 1 also shows that about 24 million nonprimary tax year records are excluded from the MGD files. Depending on the mix of the nonprimary tax year records and the number of “new” workers whose data are not included in the MGD file, the effect of these additional records is likely to be small, compared with the vastly higher number of worker records processed for a given primary tax year. Nevertheless, ORES is currently assessing methodologies to incorporate the nonprimary tax year data into the MGD files.
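The shares in Table 1 follow from simple arithmetic on the record counts. The Python sketch below reproduces the tax year 2014 row using the figures printed in the table; the variable and function names are illustrative only and do not correspond to any SSA system.

```python
# Records for tax year 2014, keyed by processing year (copied from Table 1).
records_2014 = {
    2015: 259_791_044,  # primary processing year (tax year + 1)
    2016: 2_201_919,
    2017: 1_511_457,
    2018: 258_316,
    2019: 98_717,
    2020: 26_303,
    2021: 27_469,
    2022: 17_876,
}


def primary_share(records_by_processing_year, tax_year):
    """Return (total records, nonprimary records, primary share in percent)."""
    total = sum(records_by_processing_year.values())
    # The primary processing year is the year immediately after the tax year.
    primary = records_by_processing_year[tax_year + 1]
    return total, total - primary, round(100 * primary / total, 2)


total, nonprimary, share = primary_share(records_2014, 2014)
print(f"{total:,} total; {nonprimary:,} nonprimary; {share}% primary")
```

Applying the same calculation to each tax year row and summing the nonprimary counts gives the roughly 24 million excluded records cited above.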

Although the figures in this note are preliminary, they highlight the progress to date in developing a new methodology for generating annual employment and earnings estimates. For counts of all workers and of wage and salary workers, the preliminary estimates derived from the MGD files are good proxies for the counts of workers covered under Medicare, as estimated and published in statistical publications both by ORES and, separately, by OCACT. However, the numbers of self-employed individuals estimated using the MGD files differ more substantially from those in the published tables. These greater differences may indicate a problem with the MGD files as currently structured. The greater differences between MGD estimates and published tables in the counts of self-employed individuals also arise when comparing results for individuals with self-employment income only, and those with both wage and salary and self-employment income, in a single year.

The preliminary results also raise other concerns with the MGD files. First, there are reasons to question the accuracy of the MGD file for tax year 2014, the earliest year of data currently available. For that year, an SCC based on the address information in the tax forms could not be assigned for an anomalously high number of job-level records.6 Second, a few records in the tax year 2021 MGD file identify a valid state of residence but an unknown county, despite having an SCC that should identify the county.

Identifying Types of Earnings

SSA statistical publications include tables showing estimates of earnings and employment by type of earnings (wage and salary, self-employment income). The tax records underlying the MGD process (Forms W-2 and W-2c and Schedule SE) enable ORES to identify workers as belonging to those categories.7 For a given tax year, a worker's earnings will be tracked on one or more tax forms, in one of seven mutually exclusive categories:

  • W-2 only
  • W-2c only
  • Schedule SE only
  • W-2 and W-2c
  • W-2 and Schedule SE
  • W-2c and Schedule SE
  • W-2, W-2c, and Schedule SE

Given these data source categories, ORES determines whether a worker had wage and salary earnings or self-employment income. Yet these categories also enable a more detailed earnings-type subcategorization not shown in the ORES published tables: workers with wage and salary earnings only, individuals with self-employment income only, and those with both types of earnings during the year (so-called combination workers). This additional detail offers insights into the U.S. labor market that are not available in the tables SSA currently publishes.
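The mapping from the seven data source categories to the three earnings-type subcategories can be sketched as follows. This is a minimal illustration, not ORES code: it assumes that a Form W-2 or W-2c indicates wage and salary earnings and that a Schedule SE indicates self-employment income, and the function and label names are hypothetical.

```python
from collections import Counter
from itertools import combinations

FORMS = ("W-2", "W-2c", "Schedule SE")
WAGE_FORMS = {"W-2", "W-2c"}  # assumption: either form signals wage and salary earnings


def earnings_type(forms):
    """Map the set of forms filed for one worker in a tax year to an earnings type."""
    forms = frozenset(forms)
    if not forms or not forms <= set(FORMS):
        raise ValueError(f"unexpected form set: {sorted(forms)}")
    has_wage = bool(forms & WAGE_FORMS)
    has_se = "Schedule SE" in forms
    if has_wage and has_se:
        return "both types"
    return "wage and salary only" if has_wage else "self-employment only"


# Enumerate every nonempty combination of the three forms (the seven categories).
categories = [set(c) for r in (1, 2, 3) for c in combinations(FORMS, r)]
counts = Counter(earnings_type(c) for c in categories)
for category in categories:
    print(sorted(category), "->", earnings_type(category))
```

Under these assumptions, three of the seven categories map to wage and salary only, one to self-employment only, and three to both types.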

The preliminary estimates derived from the MGD files are based on the number of unique Social Security numbers (SSNs) associated with tax forms that reported earnings for a given year. For this analysis, the estimates are assembled and formatted to replicate selected tables from the Annual Statistical Supplement and Earnings and Employment.

Evaluating the MGD Process Results

The analysis compares MGD-process estimates of the population of workers for 2014 through 2021 against unpublished estimates prepared by OCACT as inputs for estimates published in the Annual Report of the Board of Trustees of the Federal Old-Age and Survivors Insurance and Federal Disability Insurance Trust Funds and the estimates published by ORES in the Annual Statistical Supplement and Earnings and Employment. OCACT separately estimates the numbers of workers covered under Social Security and Medicare and distinguishes workers with wage and salary earnings from those with self-employment income. However, OCACT does not prepare worker count estimates by sex or age. Therefore, the MGD-process estimates that use those breakdowns are compared with selected tables from the ORES statistical publications.

I approach the comparison incrementally. The first step involves an analysis of the content of three key data fields in the MGD files: sex, age, and SSN. After removing records with invalid SSNs, or with missing or dubious values in the other data fields, the comparison of the MGD process with the two benchmark estimates proceeds.

I first compare the MGD worker count estimates with the OCACT estimates of Medicare-covered workers. These estimates are shown not only for all workers, wage and salary workers, and self-employed individuals—the same earnings-type categories that are used in the statistical publications—but also for workers with only wage and salary earnings, only self-employment income, and both earnings types—categories that are not found in the statistical publications. Then, I compare MGD worker count estimates with those published in the statistical publications—first, by sex and age; then, by state and other area, including some U.S. territories for which estimates are not currently available in the publications because of data disclosure restrictions. Those restrictions require ORES to suppress data that are based on unweighted sample sizes below a certain threshold. Because the statistical publications base their estimates on a 1 percent sample of workers, the estimates for many small jurisdictions must be suppressed. Therefore, this note also considers the effect of replacing the 1 percent sample with a 10 percent sample of workers on the number of county-level estimates that are suppressed to comply with data nondisclosure rules.8 The note closes with examples of maps and new tables that ORES may add to its statistical publications.

Removing Records with Invalid SSNs

Table 2 presents the number of workers whose earnings records have invalid and valid SSNs in the MGD files for tax years 2014 to 2021. An SSN is deemed valid if it is present in SSA's Numerical Identification System (Numident) administrative data file. The Numident file contains records for all SSNs ever issued. The information is derived from SSA Form SS-5, the application for an SSN, which contains the individual's name, place and date of birth, and sex. The percentage of worker records with an invalid SSN is less than 1 percent in all tax years except 2021, for which it is 1.1 percent. The MGD process cannot assign a year of birth or sex to records for workers with an invalid SSN. Further, those workers will not have earnings data in SSA's Master Earnings File (MEF). Therefore, the tax records for these workers are omitted from the analysis.

Table 2. Unique SSNs in records processed by ORES, by whether valid, 2015–2022
Processing year | Primary tax year | Number: Total | Number: Valid | Number: Invalid | Percent: Total | Percent: Valid | Percent: Invalid
2015 2014 170,260,465 168,962,452 1,298,013 100.00 99.24 0.76
2016 2015 174,002,077 172,610,971 1,391,106 100.00 99.20 0.80
2017 2016 176,723,136 175,237,389 1,485,747 100.00 99.16 0.84
2018 2017 178,863,694 177,339,293 1,524,401 100.00 99.15 0.85
2019 2018 181,131,038 179,553,005 1,578,033 100.00 99.13 0.87
2020 2019 182,622,507 181,050,599 1,571,908 100.00 99.14 0.86
2021 2020 181,232,792 179,465,649 1,767,143 100.00 99.02 0.98
2022 2021 183,375,419 181,305,089 2,070,330 100.00 98.87 1.13
SOURCE: Author's calculations based on SSA data processing audit reports.

Sex of Workers with Valid SSNs

Table 3 presents the number of records for workers with valid SSNs by the type of sex identifier shown in the Numident file: men, women, missing, and unknown. Workers whose records indicate a missing or unknown sex identifier represent approximately 0.5 percent of all workers with valid SSNs. ORES' published earnings tables do not include workers with a missing or unknown sex identifier.9 Therefore, these records are removed from the MGD file for this analysis.10

Table 3. Worker records with valid SSNs, by sex identifier, 2015–2022
Processing year | Primary tax year | Total | Men | Women | Missing | Unknown | Total excluding "missing" and "unknown"
  Number
2015 2014 168,962,452 86,658,038 81,417,527 815,347 71,540 168,075,565
2016 2015 172,610,971 88,566,852 83,171,765 802,317 70,037 171,738,617
2017 2016 175,237,389 89,771,356 84,606,950 790,904 68,179 174,378,306
2018 2017 177,339,293 90,785,235 85,719,996 768,051 66,011 176,505,231
2019 2018 179,553,005 91,783,693 86,953,608 751,651 64,053 178,737,301
2020 2019 181,050,599 92,354,312 87,900,714 733,572 62,001 180,255,026
2021 2020 179,465,649 91,436,558 87,269,927 699,852 59,312 178,706,485
2022 2021 181,305,089 92,377,562 88,172,771 697,568 57,188 180,550,333
  Percent
2015 2014 100.00 51.29 48.19 0.48 0.04 99.48
2016 2015 100.00 51.31 48.18 0.46 0.04 99.49
2017 2016 100.00 51.23 48.28 0.45 0.04 99.51
2018 2017 100.00 51.19 48.34 0.43 0.04 99.53
2019 2018 100.00 51.12 48.43 0.42 0.04 99.55
2020 2019 100.00 51.01 48.55 0.41 0.03 99.56
2021 2020 100.00 50.95 48.63 0.39 0.03 99.58
2022 2021 100.00 50.95 48.63 0.39 0.03 99.58
SOURCE: Author's calculations based on SSA data processing audit reports.
NOTE: Rounded components of percentage distributions do not necessarily sum to 100.00.

Age of Workers

For the MGD files, age is identified only for those workers whose records have a valid SSN and a male or female sex identifier. Because year of birth is sometimes entered incorrectly or has not been validated in the administrative data files, some records indicate worker ages of 0 (or negative years) and others indicate ages of 100 or more. Validating the year of birth in a worker's record may not occur before the individual applies for benefits. As a result, the administrative file may contain erroneous data for the small number of workers whose year of birth was entered incorrectly. For this analysis, the records for workers whose indicated age in the administrative data was less than 1 or greater than 99 were removed from the MGD files. Table 4 shows the workers' records by age group. The omitted records accounted for approximately 0.12 percent of workers in each tax year.11

Table 4. Worker-level earnings records processed with a valid SSN and sex identified as male or female, by indicated age, tax years 2014–2021
Tax year | Total | Age group 1–19 | 20–29 | 30–39 | 40–49 | 50–59 | 60–69 | 70–79 | 80–89 | 90–99 | Other (omitted)
  Number
2014 168,075,565 9,354,556 37,295,556 33,748,958 32,850,996 32,674,509 17,360,603 3,857,239 575,610 156,506 201,032
2015 171,738,617 9,787,238 38,052,202 34,746,451 32,884,493 33,010,252 18,230,604 4,060,717 600,205 162,569 203,886
2016 174,378,306 10,128,058 38,611,294 35,585,092 32,867,668 32,989,336 18,849,674 4,348,224 619,048 169,870 210,042
2017 176,505,231 10,344,670 38,917,518 36,284,382 32,997,976 32,879,436 19,407,851 4,676,885 630,347 172,460 193,706
2018 178,737,301 10,552,875 39,147,652 37,074,020 33,092,647 32,810,860 20,025,230 4,988,873 657,386 178,274 209,484
2019 180,255,026 10,680,854 39,200,689 37,776,540 33,197,084 32,691,758 20,456,250 5,189,152 668,324 182,229 212,146
2020 178,706,485 10,028,657 38,234,735 37,896,190 32,922,690 32,417,085 20,776,419 5,359,425 680,290 183,515 207,479
2021 180,550,333 11,356,370 38,311,751 38,241,081 32,973,127 32,176,815 20,925,598 5,438,688 693,070 197,349 236,484
  Percent
2014 100.00 5.57 22.19 20.08 19.55 19.44 10.33 2.29 0.34 0.09 0.12
2015 100.00 5.70 22.16 20.23 19.15 19.22 10.62 2.36 0.35 0.09 0.12
2016 100.00 5.81 22.14 20.41 18.85 18.92 10.81 2.49 0.36 0.10 0.12
2017 100.00 5.86 22.05 20.56 18.70 18.63 11.00 2.65 0.36 0.10 0.11
2018 100.00 5.90 21.90 20.74 18.51 18.36 11.20 2.79 0.37 0.10 0.12
2019 100.00 5.93 21.75 20.96 18.42 18.14 11.35 2.88 0.37 0.10 0.12
2020 100.00 5.61 21.40 21.21 18.42 18.14 11.63 3.00 0.38 0.10 0.12
2021 100.00 6.29 21.22 21.18 18.26 17.82 11.59 3.01 0.38 0.11 0.13
SOURCE: Author's calculations based on SSA data processing audit reports.
NOTES: Omitted records are for workers with indicated ages of younger than 1 or older than 99.
Rounded components of percentage distributions do not necessarily sum to 100.00.

All Adjustments

Table 5 summarizes the adjustments and shows that they remove about 1.5 percent of worker-level records in the MGD files each year (from 1.4 percent for tax year 2014 to 1.7 percent for tax year 2021). Nearly all the increase in removals over time is due to the rising share of records with invalid SSNs, from 0.8 percent for 2014 to 1.1 percent for 2021. About 0.5 percent of records for 2014 were removed because of missing or unknown values for sex, as were about 0.4 percent for 2021. Removing records with outlier age values reduced the number of worker-level records by 0.11 percent to 0.13 percent, depending on the year.

Table 5. Worker-level earnings records removed from the MGD file, by reason, tax years 2014–2021
Tax year | Total records at outset of processing | Removed because of invalid SSN | Removed because of missing or unknown sex identifier | Removed because age identifier not within 1–99 | Total records removed | Records remaining in file
  Number
2014 170,260,465 1,298,013 886,887 201,032 2,385,932 167,874,533
2015 174,002,077 1,391,106 872,354 203,886 2,467,346 171,534,731
2016 176,723,136 1,485,747 859,083 210,042 2,554,872 174,168,264
2017 178,863,694 1,524,401 834,062 193,706 2,552,169 176,311,525
2018 181,131,038 1,578,033 815,704 209,484 2,603,221 178,527,817
2019 182,622,507 1,571,908 795,573 212,146 2,579,627 180,042,880
2020 181,232,792 1,767,143 759,164 207,479 2,733,786 178,499,006
2021 183,375,419 2,070,330 754,756 236,484 3,061,570 180,313,849
  Percent
2014 100.00 0.76 0.52 0.12 1.40 98.60
2015 100.00 0.80 0.50 0.12 1.42 98.58
2016 100.00 0.84 0.49 0.12 1.45 98.55
2017 100.00 0.85 0.47 0.11 1.43 98.57
2018 100.00 0.87 0.45 0.12 1.44 98.56
2019 100.00 0.86 0.44 0.12 1.41 98.59
2020 100.00 0.98 0.42 0.11 1.51 98.49
2021 100.00 1.13 0.41 0.13 1.67 98.33
SOURCE: Author's calculations based on SSA data processing audit reports.
NOTE: Rounded percentages do not necessarily sum to subtotals.
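The three adjustments are applied in sequence: SSN validity first, then the sex identifier, then the indicated age. The Python sketch below illustrates that order on a few made-up records and then checks the tax year 2014 arithmetic against Table 5. The record layout, field names, and codes are hypothetical and are not the MGD file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerRecord:
    ssn_valid: bool          # hypothetical flag: SSN found in the Numident file
    sex: Optional[str]       # hypothetical codes: "M", "F", "U" (unknown), None (missing)
    age: Optional[int]       # indicated age in the tax year


def removal_reason(rec):
    """Return the reason a record is dropped, or None if it is kept."""
    if not rec.ssn_valid:
        return "invalid SSN"
    if rec.sex not in ("M", "F"):
        return "missing or unknown sex identifier"
    if rec.age is None or not 1 <= rec.age <= 99:
        return "age identifier not within 1-99"
    return None


# Made-up records, for illustration only.
sample = [
    WorkerRecord(True, "F", 34),
    WorkerRecord(False, None, None),
    WorkerRecord(True, "U", 51),
    WorkerRecord(True, "M", 0),
    WorkerRecord(True, "M", 102),
]
reasons = [removal_reason(r) for r in sample]

# Tax year 2014 figures copied from Table 5.
outset_2014 = 170_260_465
removed_2014 = 1_298_013 + 886_887 + 201_032
remaining_2014 = outset_2014 - removed_2014
print(reasons)
print(f"{removed_2014:,} removed; {remaining_2014:,} remaining")
```

Because each record is assigned to the first test it fails, the three removal counts are mutually exclusive and sum to the total removed.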

Notes on the Benchmark Estimates

The benchmarks with which the preliminary MGD-process estimates are compared in this analysis are the unpublished OCACT estimates of the total Medicare-covered worker population for 2023 and ORES estimates of the worker population by type of earnings, sex, and age from the Annual Statistical Supplement and Earnings and Employment.

Comparing MGD-Process and OCACT Estimates of the Total Worker Population

Table 6 compares the MGD-process and OCACT estimates of the number of workers by type of earnings. The MGD-process estimates for all workers are very similar to the OCACT estimates for Medicare-covered workers. The estimates differ by less than 1 percent in all years, and by less than 0.5 percent for 2015–2021. The percentage differences for all workers with wage and salary earnings are also relatively small, varying only between 0.2 percent and 0.5 percent across the years. However, for all individuals with self-employment earnings, the absolute value of the percentage differences ranges from a low of 3.8 percent for 2021 to as high as 14.0 percent for 2014—much higher than the differences for all workers and all wage and salary workers.

Table 6. Comparing MGD-process and OCACT estimates of worker counts by type of earnings, tax years 2014–2021
Tax year | MGD-process estimate | OCACT estimate | Percentage difference
  All workers
2014 167,874,533 169,309,691 -0.85
2015 171,534,731 172,035,347 -0.29
2016 174,168,264 174,570,580 -0.23
2017 176,311,525 176,669,106 -0.20
2018 178,527,817 179,027,375 -0.28
2019 180,042,880 180,935,106 -0.49
2020 178,499,006 178,935,994 -0.24
2021 180,313,849 180,376,184 -0.03
  Workers with wage and salary earnings
  Total
2014 158,682,500 158,414,678 0.17
2015 161,480,185 161,138,941 0.21
2016 164,171,152 163,621,105 0.34
2017 166,165,895 165,592,950 0.34
2018 168,335,976 167,817,387 0.31
2019 170,228,900 169,552,223 0.40
2020 168,662,331 168,028,227 0.38
2021 169,486,721 168,627,700 0.51
  With wage and salary earnings only
2014 150,590,244 149,210,580 0.92
2015 152,466,398 151,757,905 0.47
2016 154,975,096 154,147,061 0.54
2017 156,878,918 156,103,819 0.50
2018 158,899,524 158,109,466 0.50
2019 161,014,319 159,811,288 0.75
2020 159,946,870 159,052,811 0.56
2021 159,504,951 160,332,750 -0.52
  Workers with self-employment earnings
  Total
2014 17,284,289 20,099,111 -14.00
2015 19,068,333 20,277,442 -5.96
2016 19,193,168 20,423,519 -6.02
2017 19,432,607 20,565,287 -5.51
2018 19,628,293 20,917,910 -6.17
2019 19,028,561 21,123,818 -9.92
2020 18,552,136 19,883,184 -6.69
2021 20,808,898 20,043,434 3.82
  With self-employment earnings only
2014 9,192,033 9,204,098 -0.13
2015 10,054,546 9,381,036 7.18
2016 9,997,112 9,474,044 5.52
2017 10,145,630 9,489,131 6.92
2018 10,191,841 9,707,921 4.98
2019 9,813,980 9,740,935 0.75
2020 9,836,675 8,975,416 9.60
2021 10,827,128 8,294,950 30.53
  Workers with both wage and salary and self-employment earnings
2014 8,092,256 10,895,013 -25.73
2015 9,013,787 10,896,406 -17.28
2016 9,196,056 10,949,475 -16.01
2017 9,286,977 11,076,156 -16.15
2018 9,436,452 11,209,988 -15.82
2019 9,214,581 11,382,882 -19.05
2020 8,715,461 10,907,767 -20.10
2021 9,981,770 11,748,484 -15.04
SOURCE: Author's calculations based on SSA data processing audit reports.

For workers with wage and salary earnings only, Table 6 shows that the differences between the MGD-process and the OCACT estimates range from 0.9 percent for 2014 to minus 0.5 percent for 2021. The MGD-process and OCACT estimates for the two remaining categories (self-employed individuals, either with or without any wage and salary earnings) are very different and likely underlie the large divergence in estimates for all self-employed individuals noted above. The differences for individuals with only self-employment income range from 30.5 percent higher for the MGD estimates for 2021 to 0.1 percent lower for 2014. The differences for workers with both types of earnings range from 15.0 percent lower for the MGD process for 2021 to 25.7 percent lower for 2014.
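The percentage differences in Table 6 appear to be computed relative to the OCACT estimate, that is, as (MGD − OCACT) / OCACT × 100. That formula is inferred from the published figures and is not stated in the note. The Python sketch below applies it to three rows copied from Table 6; the function name is illustrative.

```python
def pct_difference(mgd, ocact):
    """Percentage difference of the MGD-process estimate relative to the OCACT estimate."""
    return round(100 * (mgd - ocact) / ocact, 2)


# (MGD-process estimate, OCACT estimate) pairs copied from Table 6.
rows = {
    "All workers, 2014": (167_874_533, 169_309_691),
    "Self-employment only, 2021": (10_827_128, 8_294_950),
    "Both earnings types, 2014": (8_092_256, 10_895_013),
}

results = {label: pct_difference(mgd, ocact) for label, (mgd, ocact) in rows.items()}
for label, value in results.items():
    print(f"{label}: {value:+.2f}")
```

A negative value means the MGD-process count is lower than the OCACT estimate; a positive value means it is higher.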

The wide differences in MGD-process and OCACT estimates of self-employed individuals have two possible explanations. First, the MGD files cannot differentiate workers who were covered under Medicare from those who were not. Thus, the MGD process can only approximate the number of Medicare-covered workers. However, because the MGD-process and OCACT estimates for wage and salary workers are so similar and Medicare coverage is nearly universal, this explanation does not seem viable. A direct comparison will have to await the development of the new methodology for generating the annual earnings estimates, which will include a process for distinguishing between workers with Social Security (Old-Age, Survivors, and Disability Insurance) taxable earnings and those with Medicare (Hospital Insurance) taxable earnings.

The second possible explanation is that the MGD files contain only the primary tax year data processed in the next calendar year. Table 1 showed that additional workers' records are processed in subsequent calendar years. For example, as of 2022, 1.57 percent of the tax year 2014 records were processed in a year other than 2015 and were therefore omitted from the MGD file. It is possible that records processed in nonprimary processing years disproportionately belong to workers with self-employment income, by margins wide enough to narrow the differences between the MGD-process estimates and the OCACT estimates. To consider this possibility, I examined the distribution of the 4,142,057 additional tax year 2014 records among W-2s, W-2cs, and Schedule SEs.

Table 7 shows that in each year from 2015 through 2022, at least 99 percent of the Form W-2s that SSA processed were for the primary tax year, but between 4.5 percent and 10.7 percent of the Schedule SEs processed were for a nonprimary tax year. The 10.7 percent peak occurred in 2021, likely because of a large pandemic-related backlog that IRS experienced that year. Thus, although self-employed individuals are much fewer in number than wage and salary workers, their disproportionate share of the records processed for nonprimary tax years likely explains at least some of the differences between the MGD-process and OCACT estimates for self-employed individuals from 2014 to 2020.12

Table 7. ORES extraction volume for primary and nonprimary tax year data: Numbers of tax records processed and unique SSNs contained therein, by type of form, 2015–2022
Columns: Processing year; Primary tax year; Records processed, number (Total; For primary tax year; For other tax year); Records processed, percent (Total; For primary tax year; For other tax year); Number of unique SSNs (For primary tax year; For other tax year)
  Form W-2
2015 2014 237,765,591 235,615,820 2,149,771 100.00 99.10 0.90 160,535,225 2,007,148
2016 2015 245,528,242 243,723,231 1,805,011 100.00 99.26 0.74 163,366,783 1,704,720
2017 2016 251,509,338 249,530,278 1,979,060 100.00 99.21 0.79 166,000,893 1,839,742
2018 2017 254,788,713 253,365,171 1,423,542 100.00 99.44 0.56 168,108,594 1,329,548
2019 2018 259,798,529 258,510,183 1,288,346 100.00 99.50 0.50 170,275,487 1,217,177
2020 2019 262,691,363 261,583,557 1,107,806 100.00 99.58 0.42 172,238,245 1,041,896
2021 2020 250,693,566 249,832,215 861,351 100.00 99.66 0.34 170,623,150 812,196
2022 2021 262,930,243 261,815,697 1,114,546 100.00 99.58 0.42 171,636,437 1,040,070
  Form 1040 Schedule SE
2015 2014 18,782,168 17,813,779 968,389 100.00 94.84 5.16 17,812,721 728,932
2016 2015 20,681,589 19,664,474 1,017,115 100.00 95.08 4.92 19,663,466 780,799
2017 2016 20,764,746 19,804,112 960,634 100.00 95.37 4.63 19,803,275 750,329
2018 2017 21,194,793 20,050,718 1,144,075 100.00 94.60 5.40 20,050,006 908,497
2019 2018 21,380,446 20,278,455 1,101,991 100.00 94.85 5.15 20,277,674 859,115
2020 2019 20,531,437 19,601,328 930,109 100.00 95.47 4.53 19,601,024 795,290
2021 2020 21,618,115 19,308,932 2,309,183 100.00 89.32 10.68 19,308,531 2,031,568
2022 2021 23,826,394 21,714,574 2,111,820 100.00 91.14 8.86 21,714,034 1,821,127
SOURCE: Author's calculations based on SSA data processing audit reports.
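The percentage columns in Table 7 can be recomputed from the record counts. The following is a minimal sketch, not the author's program; the function name `nonprimary_share` is hypothetical, and the inputs are the processing year 2021 rows of Table 7.

```python
def nonprimary_share(total_records, primary_records):
    """Percent of records processed in a year that belong to a tax year
    other than the primary one (hypothetical helper, for illustration)."""
    other_records = total_records - primary_records
    return 100.0 * other_records / total_records


# Processing year 2021 (primary tax year 2020), counts taken from Table 7.
w2_2021 = nonprimary_share(250_693_566, 249_832_215)
schedule_se_2021 = nonprimary_share(21_618_115, 19_308_932)

print(f"Form W-2:    {w2_2021:.2f} percent for other tax years")
print(f"Schedule SE: {schedule_se_2021:.2f} percent for other tax years")
```

The gap between the two shares is what the second explanation above relies on: Schedule SE records are far more likely than W-2 records to be processed outside the primary processing year.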

Comparing MGD-Process and ORES Published Estimates of Worker Counts

Table 8 shows the number of all workers by sex for 2014–2021 as estimated by the MGD process and as published in Table 4 of Earnings and Employment. Tables 9 and 10 repeat Table 8 with estimates for wage and salary workers and self-employed individuals, respectively. Tables 8 and 9 show that the MGD-estimated numbers of all workers and of wage and salary workers are similar to those published in Earnings and Employment, differing by less than 1 percent for men and women in every year (except for all workers for 2014).

Table 8. Comparing MGD-process and ORES published estimates of worker counts by sex, tax years 2014–2021
Tax year MGD-process estimate Published estimate Percentage difference
  All workers
2014 167,874,533 169,691,000 -1.07
2015 171,534,731 172,369,000 -0.48
2016 174,168,264 175,215,999 -0.60
2017 176,311,525 176,962,000 -0.37
2018 178,527,817 179,584,999 -0.59
2019 180,042,880 180,896,000 -0.47
2020 178,499,006 178,494,000 0.00
2021 180,313,849 180,359,000 -0.03
  Men
2014 86,554,357 87,565,826 -1.16
2015 88,462,230 88,914,085 -0.51
2016 89,664,098 90,201,580 -0.60
2017 90,686,951 90,985,759 -0.33
2018 91,677,783 92,177,295 -0.54
2019 92,247,594 92,615,972 -0.40
2020 91,332,698 91,257,458 0.08
2021 92,260,151 92,158,375 0.11
  Women
2014 81,320,176 82,125,174 -0.98
2015 83,072,501 83,454,915 -0.46
2016 84,504,166 85,014,420 -0.60
2017 85,624,574 85,976,242 -0.41
2018 86,850,034 87,407,704 -0.64
2019 87,795,286 88,280,028 -0.55
2020 87,166,308 87,236,543 -0.08
2021 88,053,698 88,200,625 -0.17
SOURCES: Author's calculations based on SSA data processing audit reports; and Earnings and Employment Data for Workers Covered Under Social Security and Medicare, by State and County, 2014–2021 editions, Table 4.
NOTE: Published estimates for men and women may not sum to all-workers total because of rounding.
Table 9. Comparing MGD-process and ORES published estimates of wage and salary worker counts by sex, tax years 2014–2021
Tax year MGD-process estimate Published estimate Percentage difference
  All wage and salary workers
2014 158,682,500 158,852,000 -0.11
2015 161,480,185 161,237,000 0.15
2016 164,171,152 164,206,999 -0.02
2017 166,165,895 166,205,000 -0.02
2018 168,335,976 168,352,999 -0.01
2019 170,228,900 170,019,000 0.12
2020 168,662,331 168,277,000 0.23
2021 169,486,721 168,611,000 0.52
  Men
2014 81,255,302 81,203,373 0.06
2015 82,664,297 82,396,749 0.32
2016 83,907,441 83,800,324 0.13
2017 84,850,288 84,740,013 0.13
2018 85,830,038 85,672,680 0.18
2019 86,672,090 86,400,315 0.31
2020 85,777,999 85,425,731 0.41
2021 86,192,409 85,535,561 0.77
  Women
2014 77,427,198 77,648,628 -0.29
2015 78,815,888 78,840,251 -0.03
2016 80,263,711 80,406,675 -0.18
2017 81,315,607 81,464,988 -0.18
2018 82,505,938 82,680,320 -0.21
2019 83,556,810 83,618,685 -0.07
2020 82,884,332 82,851,269 0.04
2021 83,294,312 83,075,439 0.26
SOURCES: Author's calculations based on SSA data processing audit reports; and Earnings and Employment Data for Workers Covered Under Social Security and Medicare, by State and County, 2014–2021 editions, Table 4.
NOTE: Published estimates for men and women may not sum to all-workers total because of rounding.
Table 10. Comparing MGD-process and ORES published estimates of self-employed individual counts by sex, tax years 2014–2021
Tax year MGD-process estimate Published estimate Percentage difference
  All self-employed individuals
2014 17,284,289 19,862,000 -12.98
2015 19,068,333 20,650,000 -7.66
2016 19,193,168 20,694,000 -7.25
2017 19,432,607 20,552,000 -5.45
2018 19,628,293 20,985,000 -6.47
2019 19,028,561 20,905,000 -8.98
2020 18,552,136 19,430,000 -4.52
2021 20,808,898 21,543,000 -3.41
  Men
2014 9,753,284 11,074,485 -11.93
2015 10,762,981 11,470,475 -6.17
2016 10,807,316 11,691,072 -7.56
2017 10,898,755 11,541,010 -5.56
2018 10,945,950 11,727,973 -6.67
2019 10,496,410 11,544,635 -9.08
2020 10,230,550 10,727,143 -4.63
2021 11,393,362 11,810,926 -3.54
  Women
2014 7,531,005 8,787,515 -14.30
2015 8,305,352 9,179,525 -9.52
2016 8,385,852 9,002,928 -6.85
2017 8,533,852 9,010,990 -5.30
2018 8,682,343 9,257,026 -6.21
2019 8,532,151 9,360,365 -8.85
2020 8,321,586 8,702,857 -4.38
2021 9,415,536 9,732,074 -3.25
SOURCES: Author's calculations based on SSA data processing audit reports; and Earnings and Employment Data for Workers Covered Under Social Security and Medicare, by State and County, 2014–2021 editions, Table 4.
NOTE: Published estimates for men and women may not sum to all-workers total because of rounding.
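The percentage differences in Tables 8 through 10 can be reproduced with a one-line calculation. This sketch assumes the published estimate is the base of the percentage, which is what the tabulated values imply; the function name `pct_diff` is hypothetical.

```python
def pct_diff(mgd_estimate, published_estimate):
    """Percentage by which the MGD-process estimate differs from the
    published estimate, using the published estimate as the base (assumed)."""
    return 100.0 * (mgd_estimate - published_estimate) / published_estimate


# Tax year 2014 rows from Table 8 (all workers) and Table 10 (self-employed).
all_workers_2014 = pct_diff(167_874_533, 169_691_000)
self_employed_2014 = pct_diff(17_284_289, 19_862_000)

print(f"All workers, 2014:   {all_workers_2014:.2f}")
print(f"Self-employed, 2014: {self_employed_2014:.2f}")
```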

The estimated numbers of self-employed individuals differ more substantially, with the MGD estimates being lower for both men and women in each year (Table 10). I expect the differences between the MGD-process and the published estimates to be consistent with the differences between the MGD-process and OCACT estimates because the published estimates reflect the OCACT estimates. Interestingly, the percentage differences in the 2014 estimates far exceed the differences for the other years. The cause of this discrepancy is not clear, but it may involve the high percentage of workers, noted earlier, who were not assigned an SCC in the MGD file for tax year 2014.

Comparing MGD-Process and ORES Published Estimates of Worker Counts by Age

ORES publishes estimates of the number of workers with Medicare taxable earnings by age, sex, and state or other area in Table 5 of Earnings and Employment. For estimates by age group, regardless of methodology, one can expect that the groups with the smallest populations (those younger than 20 and those aged 70 or older) are likely to be subject to the widest variation between the MGD-process and published estimates. Likewise, if the population entering an age group is much larger (or smaller) than the population aging out of it in a given year, some volatility in the estimates can be expected.

Table 11 shows the MGD and published estimates of numbers of workers with Medicare taxable earnings by age group. Interestingly, the MGD estimates for workers who are younger than 20 and those aged 20–29 are higher than the published estimates for each year. As expected, the percentage differences between the MGD and the published estimates for workers who are younger than 20 and those aged 70 or older are relatively large compared with those of the other age groups. The MGD estimates are lower than the published estimates for workers aged 30–64, although the percentage differences are not excessive.

Table 11. Comparing MGD-process and ORES published estimates of the count of workers with Medicare-taxable earnings by age group, tax years 2014–2021
Tax year Total Age group
Younger than 20 20–29 30–39 40–49 50–59 60–61 62–64 65–69 70 or older
  MGD-process estimates
2014 167,876,547 9,354,556 37,295,556 33,748,958 32,850,996 32,674,509 5,204,839 6,140,967 6,014,797 4,589,355
2015 171,536,746 9,787,238 38,052,202 34,746,451 32,884,493 33,010,252 5,395,968 6,427,233 6,407,403 4,823,491
2016 174,170,280 10,128,058 38,611,294 35,585,092 32,867,668 32,989,336 5,501,762 6,686,701 6,661,211 5,137,142
2017 176,313,542 10,344,670 38,917,518 36,284,382 32,997,976 32,879,436 5,640,679 6,887,407 6,879,765 5,479,692
2018 178,529,835 10,552,875 39,147,652 37,074,020 33,092,647 32,810,860 5,722,724 7,118,296 7,184,210 5,824,533
2019 180,044,899 10,680,854 39,200,689 37,776,540 33,197,084 32,691,758 5,767,466 7,251,765 7,437,019 6,039,705
2020 178,501,026 10,028,657 38,234,735 37,896,190 32,922,690 32,417,085 5,842,942 7,324,366 7,609,111 6,223,230
2021 180,315,870 11,356,370 38,311,751 38,241,081 32,973,127 32,176,815 5,878,494 7,362,651 7,684,453 6,329,107
  Published estimates
2014 169,691,000 8,662,871 37,247,740 34,383,167 33,715,360 33,599,332 5,311,791 6,232,942 6,133,348 4,404,449
2015 172,369,000 9,139,989 37,839,736 35,196,056 33,504,082 33,733,326 5,488,668 6,474,186 6,451,125 4,541,833
2016 175,215,999 9,522,185 38,501,123 36,114,313 33,516,074 33,727,312 5,624,381 6,768,568 6,668,462 4,773,582
2017 176,962,000 9,785,167 38,705,169 36,724,946 33,527,771 33,493,773 5,751,399 6,973,590 6,848,846 5,151,341
2018 179,584,999 10,030,325 39,013,651 37,565,640 33,734,133 33,438,032 5,903,435 7,240,876 7,172,040 5,486,868
2019 180,896,000 10,167,823 38,984,635 38,235,708 33,794,683 33,249,626 5,916,361 7,395,201 7,475,455 5,676,508
2020 178,494,000 9,560,502 37,948,711 38,180,532 33,365,210 32,723,498 5,931,874 7,432,821 7,593,092 5,757,759
2021 180,359,000 10,877,583 38,170,303 38,507,569 33,435,386 32,490,342 5,948,172 7,458,078 7,677,135 5,794,431
  Percentage difference
2014 -1.07 7.98 0.13 -1.84 -2.56 -2.75 -2.01 -1.48 -1.93 4.20
2015 -0.48 7.08 0.56 -1.28 -1.85 -2.14 -1.69 -0.73 -0.68 6.20
2016 -0.60 6.36 0.29 -1.47 -1.93 -2.19 -2.18 -1.21 -0.11 7.62
2017 -0.37 5.72 0.55 -1.20 -1.58 -1.83 -1.93 -1.24 0.45 6.37
2018 -0.59 5.21 0.34 -1.31 -1.90 -1.88 -3.06 -1.69 0.17 6.15
2019 -0.47 5.05 0.55 -1.20 -1.77 -1.68 -2.52 -1.94 -0.51 6.40
2020 0.00 4.90 0.75 -0.74 -1.33 -0.94 -1.50 -1.46 0.21 8.08
2021 -0.02 4.40 0.37 -0.69 -1.38 -0.96 -1.17 -1.28 0.10 9.23
SOURCES: Author's calculations based on SSA data processing audit reports; and Earnings and Employment Data for Workers Covered Under Social Security and Medicare, by State and County, 2014–2021 editions, Table 5.
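The age groups in Table 11 are of unequal width, which is easy to get wrong when tabulating records. The sketch below shows one way to assign a worker's age to those groups; it is an illustration only. The names `age_group` and `count_by_age_group` are hypothetical, and the note does not specify here how age is measured, so the input is simply assumed to be a whole number of years.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of every group after the first, in the order used in Table 11.
LOWER_BOUNDS = [20, 30, 40, 50, 60, 62, 65, 70]
LABELS = [
    "Younger than 20", "20-29", "30-39", "40-49", "50-59",
    "60-61", "62-64", "65-69", "70 or older",
]


def age_group(age):
    """Return the Table 11 age group label for an age in whole years."""
    return LABELS[bisect_right(LOWER_BOUNDS, age)]


def count_by_age_group(ages):
    """Tally a list of ages into the Table 11 age groups."""
    return Counter(age_group(a) for a in ages)


counts = count_by_age_group([18, 25, 61, 62, 70, 85])
print(counts)
```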

Comparing MGD-Process and ORES Published Estimates of Worker Counts by State

MGD estimates by state differ from those in the Annual Statistical Supplement and Earnings and Employment in one key respect. In those publications, the “other” state or area category “includes persons employed in American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands; U.S. citizens employed abroad by U.S. employers; persons employed on U.S. oceanborne vessels; and workers with unknown residence.” The MGD process does not separately account for U.S. citizens employed abroad by U.S. employers or persons employed on U.S. oceanborne vessels. However, the MGD files allow separate estimates for American Samoa, Guam, the Northern Mariana Islands, and the U.S. Virgin Islands because the files contain data for the full population of workers in a given tax year as opposed to the 1 percent sample of SSNs that constitutes the CWHS.

Table 12 presents the estimated numbers of Medicare-covered workers by state and other area from Annual Statistical Supplement Table 4.B12 for 2014–2021. Table 13 repeats Table 12 for the MGD estimates, and Table 14 shows the percentage differences between the published and MGD estimates. The estimates differed by 3 percent or more in at least 1 year for the District of Columbia, Puerto Rico, and 15 states: Alaska, Arkansas, Connecticut, Delaware, Idaho, Maine, Montana, Nebraska, North Dakota, Oklahoma, Rhode Island, South Dakota, Vermont, West Virginia, and Wyoming (Table 14). These jurisdictions have relatively small workforces. The percentage differences are relatively large across most years for Montana, North Dakota, and South Dakota, but in general, the published and MGD estimates appear to be reasonably comparable.

Table 12. Worker counts: ORES published estimates of all Medicare-covered workers, by state or other area, tax years 2014–2021
State or area a 2014 2015 2016 2017 2018 2019 2020 2021
Alabama 2,360,000 2,407,000 2,427,000 2,446,000 2,481,000 2,500,000 2,493,000 2,540,000
Alaska 430,000 436,000 433,000 420,000 426,000 427,000 417,000 423,000
Arizona 3,198,000 3,277,000 3,385,000 3,467,000 3,561,000 3,621,000 3,653,000 3,754,000
Arkansas 1,472,000 1,488,000 1,503,000 1,510,000 1,525,000 1,530,000 1,521,000 1,542,000
California 19,181,000 19,707,000 20,123,000 20,379,000 20,685,000 20,833,000 20,382,000 20,407,000
Colorado 2,982,000 3,082,000 3,158,000 3,215,000 3,282,000 3,340,000 3,306,000 3,369,000
Connecticut 2,039,000 2,050,000 2,061,000 2,059,000 2,071,000 2,065,000 2,033,000 2,046,000
Delaware 513,000 517,000 525,000 534,000 536,000 541,000 537,000 546,000
District of Columbia 403,000 413,000 426,000 430,000 439,000 442,000 411,000 410,000
Florida 9,857,000 10,207,000 10,497,000 10,758,000 11,019,000 11,170,000 11,240,000 11,560,000
Georgia 5,090,000 5,219,000 5,382,000 5,484,000 5,588,000 5,664,000 5,659,000 5,803,000
Hawaii 783,000 792,000 802,000 803,000 806,000 808,000 766,000 756,000
Idaho 836,000 867,000 885,000 916,000 991,000 1,031,000 1,045,000 1,087,000
Illinois 6,968,000 7,012,000 7,070,000 7,070,000 7,119,000 7,090,000 6,914,000 6,921,000
Indiana 3,639,000 3,663,000 3,698,000 3,743,000 3,793,000 3,808,000 3,777,000 3,804,000
Iowa 1,786,000 1,798,000 1,807,000 1,814,000 1,830,000 1,837,000 1,816,000 1,826,000
Kansas 1,600,000 1,613,000 1,623,000 1,621,000 1,635,000 1,649,000 1,629,000 1,634,000
Kentucky 2,271,000 2,296,000 2,317,000 2,329,000 2,343,000 2,355,000 2,322,000 2,343,000
Louisiana 2,396,000 2,407,000 2,394,000 2,384,000 2,406,000 2,408,000 2,375,000 2,353,000
Maine 764,000 757,000 766,000 771,000 781,000 776,000 765,000 773,000
Maryland 3,341,000 3,377,000 3,415,000 3,426,000 3,467,000 3,465,000 3,399,000 3,431,000
Massachusetts 3,874,000 3,930,000 3,990,000 4,029,000 4,061,000 4,095,000 3,986,000 3,994,000
Michigan 5,176,000 5,229,000 5,321,000 5,357,000 5,428,000 5,394,000 5,305,000 5,325,000
Minnesota 3,233,000 3,266,000 3,313,000 3,342,000 3,371,000 3,381,000 3,317,000 3,319,000
Mississippi 1,443,000 1,456,000 1,467,000 1,469,000 1,483,000 1,492,000 1,472,000 1,484,000
Missouri 3,200,000 3,250,000 3,297,000 3,315,000 3,335,000 3,349,000 3,315,000 3,334,000
Montana 575,000 586,000 620,000 655,000 662,000 654,000 654,000 676,000
Nebraska 1,127,000 1,140,000 1,166,000 1,160,000 1,173,000 1,168,000 1,155,000 1,172,000
Nevada 1,387,000 1,439,000 1,487,000 1,521,000 1,592,000 1,628,000 1,638,000 1,677,000
New Hampshire 814,000 827,000 829,000 835,000 843,000 846,000 844,000 853,000
New Jersey 4,910,000 4,964,000 5,036,000 5,079,000 5,154,000 5,175,000 5,067,000 5,058,000
New Mexico 983,000 987,000 1,007,000 1,006,000 1,023,000 1,023,000 1,001,000 1,015,000
New York 10,550,000 10,678,000 10,790,000 10,893,000 10,983,000 11,004,000 10,514,000 10,420,000
North Carolina 5,066,000 5,175,000 5,294,000 5,385,000 5,493,000 5,576,000 5,555,000 5,696,000
North Dakota 501,000 480,000 466,000 463,000 463,000 465,000 447,000 446,000
Ohio 6,319,000 6,384,000 6,448,000 6,482,000 6,524,000 6,558,000 6,458,000 6,486,000
Oklahoma 2,011,000 2,025,000 2,017,000 2,016,000 2,038,000 2,054,000 2,038,000 2,053,000
Oregon 2,057,000 2,123,000 2,179,000 2,220,000 2,257,000 2,276,000 2,228,000 2,249,000
Pennsylvania 6,919,000 6,980,000 7,075,000 7,068,000 7,119,000 7,177,000 7,012,000 6,998,000
Rhode Island 607,000 614,000 621,000 619,000 623,000 627,000 618,000 621,000
South Carolina 2,416,000 2,488,000 2,559,000 2,607,000 2,657,000 2,695,000 2,693,000 2,762,000
South Dakota 636,000 583,000 587,000 582,000 592,000 580,000 566,000 577,000
Tennessee 3,394,000 3,479,000 3,556,000 3,599,000 3,651,000 3,679,000 3,717,000 3,810,000
Texas 13,797,000 14,122,000 14,326,000 14,571,000 14,914,000 15,171,000 15,159,000 15,547,000
Utah 1,517,000 1,572,000 1,629,000 1,677,000 1,731,000 1,774,000 1,801,000 1,862,000
Vermont 388,000 390,000 393,000 389,000 395,000 389,000 380,000 380,000
Virginia 4,558,000 4,625,000 4,684,000 4,721,000 4,778,000 4,827,000 4,769,000 4,803,000
Washington 3,754,000 3,889,000 3,991,000 4,069,000 4,164,000 4,217,000 4,149,000 4,180,000
West Virginia 894,000 891,000 875,000 861,000 863,000 858,000 844,000 851,000
Wisconsin 3,303,000 3,332,000 3,355,000 3,366,000 3,393,000 3,388,000 3,331,000 3,350,000
Wyoming 345,000 340,000 336,000 331,000 339,000 369,000 360,000 367,000
Outlying areas
Puerto Rico 1,135,000 1,120,000 1,192,000 1,102,000 1,095,000 1,052,000 1,044,000 1,089,000
Other and unknown b 893,000 621,000 613,000 594,000 607,000 597,000 594,000 574,000
SOURCE: Annual Statistical Supplement to the Social Security Bulletin, 2014–2021 editions, Table 4.B12.
a. Most state assignments are based on end-of-year residence obtained from electronically filed employer wage reports; the remainder are based on location of employer from reports filed on paper.
b. Persons employed in American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands; U.S. citizens employed abroad by U.S. employers; persons employed on U.S. oceanborne vessels; and workers with unknown residence.
Table 13. Worker counts: MGD-process estimates of all workers, by state or other area, tax years 2014–2021
State or area a 2014 2015 2016 2017 2018 2019 2020 2021
Alabama 2,325,796 2,388,310 2,414,672 2,434,399 2,462,146 2,488,784 2,485,839 2,520,564
Alaska 409,229 424,687 419,867 413,726 412,313 413,536 404,175 408,481
Arizona 3,151,122 3,284,845 3,367,831 3,440,879 3,526,956 3,609,207 3,655,131 3,743,768
Arkansas 1,426,698 1,457,336 1,472,578 1,481,988 1,492,450 1,496,868 1,497,510 1,523,438
California 19,268,475 20,068,848 20,560,184 20,831,971 21,088,252 21,215,640 20,846,927 20,870,865
Colorado 3,003,446 3,083,579 3,155,257 3,221,144 3,284,322 3,342,758 3,323,765 3,364,421
Connecticut 1,955,257 2,001,994 1,960,979 2,010,990 2,016,393 2,014,976 1,996,837 2,005,521
Delaware 496,943 513,128 521,510 526,442 532,920 539,111 539,733 549,625
District of Columbia 387,445 408,389 415,582 421,294 424,089 428,870 405,298 405,590
Florida 9,701,177 10,197,466 10,487,000 10,742,956 10,988,928 11,152,676 11,265,445 11,566,252
Georgia 4,988,221 5,186,872 5,312,128 5,424,421 5,535,555 5,612,552 5,641,376 5,783,500
Hawaii 764,111 780,996 787,986 794,504 796,910 792,340 754,105 754,442
Idaho 830,431 860,575 885,842 911,347 936,980 959,045 983,073 1,020,273
Illinois 6,817,223 6,958,524 6,989,083 7,021,226 7,043,323 7,040,857 6,894,660 6,894,041
Indiana 3,564,373 3,665,311 3,704,413 3,748,290 3,779,379 3,801,776 3,791,005 3,814,817
Iowa 1,742,904 1,783,690 1,793,124 1,801,410 1,805,117 1,805,893 1,792,534 1,801,127
Kansas 1,585,286 1,613,484 1,620,607 1,625,200 1,633,960 1,640,746 1,629,095 1,636,034
Kentucky 2,209,114 2,265,838 2,289,107 2,302,804 2,309,635 2,322,685 2,306,053 2,330,616
Louisiana 2,338,113 2,372,571 2,361,671 2,350,280 2,361,527 2,369,296 2,333,271 2,324,887
Maine 738,007 753,540 764,336 767,413 773,436 774,008 768,248 779,719
Maryland 3,268,482 3,350,170 3,388,995 3,408,262 3,434,842 3,436,965 3,391,726 3,408,216
Massachusetts 3,845,797 3,944,070 4,000,144 4,041,517 4,078,208 4,100,804 3,991,073 4,004,184
Michigan 5,058,225 5,169,787 5,244,489 5,285,148 5,378,137 5,343,726 5,285,932 5,286,100
Minnesota 3,167,658 3,250,576 3,297,866 3,326,133 3,355,138 3,367,108 3,327,763 3,320,767
Mississippi 1,413,954 1,451,939 1,467,784 1,471,644 1,476,733 1,486,130 1,477,402 1,490,951
Missouri 3,183,600 3,261,488 3,303,758 3,327,867 3,340,359 3,350,259 3,342,883 3,361,579
Montana 554,124 573,446 580,665 585,446 587,780 593,043 598,583 612,845
Nebraska 1,081,944 1,107,629 1,116,569 1,124,763 1,128,910 1,134,434 1,129,986 1,137,561
Nevada 1,414,026 1,465,951 1,512,861 1,561,380 1,615,402 1,655,986 1,659,852 1,695,859
New Hampshire 796,267 817,153 825,705 833,831 839,972 843,405 838,480 844,784
New Jersey 4,852,733 4,966,950 5,034,195 5,095,263 5,150,877 5,185,547 5,110,033 5,126,751
New Mexico 989,627 1,006,490 1,008,501 1,009,091 1,020,542 1,029,587 1,010,825 1,020,516
New York 10,416,133 10,678,873 10,806,679 10,922,115 11,003,466 11,053,935 10,643,127 10,554,208
North Carolina 4,966,546 5,115,446 5,240,242 5,337,426 5,435,448 5,523,019 5,542,468 5,661,980
North Dakota 453,455 468,336 458,778 459,281 458,714 461,738 455,750 456,224
Ohio 6,166,402 6,294,066 6,343,122 6,386,521 6,424,644 6,447,439 6,379,076 6,396,538
Oklahoma 1,951,098 1,987,046 1,973,074 1,982,869 1,999,128 2,015,568 2,012,617 2,034,754
Oregon 2,020,505 2,126,093 2,180,873 2,224,536 2,258,054 2,281,035 2,255,693 2,263,442
Pennsylvania 6,777,747 6,899,769 6,980,708 6,995,321 7,045,836 7,101,282 6,972,920 6,965,525
Rhode Island 580,233 593,221 601,430 607,022 613,885 614,376 610,396 612,323
South Carolina 2,379,884 2,477,374 2,540,407 2,587,617 2,640,018 2,679,565 2,685,369 2,746,858
South Dakota 490,019 502,293 506,651 509,830 511,439 514,511 514,823 526,912
Tennessee 3,302,681 3,429,108 3,502,364 3,554,071 3,607,847 3,645,676 3,692,406 3,777,793
Texas 13,622,711 14,122,676 14,350,175 14,593,561 14,944,151 15,234,809 15,293,679 15,679,499
Utah 1,512,227 1,583,771 1,633,174 1,683,701 1,733,986 1,778,954 1,812,766 1,871,433
Vermont 368,187 374,871 376,491 377,378 377,200 376,241 371,688 373,695
Virginia 4,434,859 4,611,706 4,675,327 4,724,376 4,771,246 4,818,371 4,783,932 4,821,379
Washington 3,721,972 3,890,208 3,991,527 4,087,639 4,161,543 4,235,745 4,177,973 4,219,439
West Virginia 866,120 878,925 868,930 861,506 860,471 859,274 847,043 849,965
Wisconsin 3,246,057 3,324,258 3,351,792 3,374,284 3,395,063 3,397,352 3,358,322 3,368,438
Wyoming 333,544 338,126 328,942 323,232 325,533 328,489 325,573 326,765
Outlying areas
Puerto Rico 1,038,677 1,060,490 1,054,463 1,033,362 982,188 958,477 922,869 1,021,089
Other 1,895,607 342,370 337,686 342,774 366,392 368,351 361,849 377,459
American Samoa 10,086 10,806 13,646 12,282 12,204 12,004 8,858 11,244
Guam 75,418 76,774 81,135 80,393 80,727 82,070 79,603 77,792
Northern Mariana Islands 9,113 13,319 15,362 19,802 13,774 14,309 13,328 15,719
U.S. Virgin Islands 43,039 43,197 43,928 35,418 40,115 39,815 38,389 42,342
Unknown residence 1,757,951 198,274 183,615 194,879 219,572 220,153 221,671 230,362
SOURCE: Author's calculations based on MGD files.
a. Most state assignments are based on end-of-year residence obtained from electronically filed employer wage reports; the remainder are based on location of employer from reports filed on paper.
Table 14. Percentages by which MGD-process estimates of all workers differ from ORES published estimates of all Medicare-covered workers, by state or other area, tax years 2014–2021
State or area a 2014 2015 2016 2017 2018 2019 2020 2021
Alabama -1.47 -0.78 -0.51 -0.48 -0.77 -0.45 -0.29 -0.77
Alaska -5.08 -2.66 -3.13 -1.52 -3.32 -3.26 -3.17 -3.55
Arizona -1.49 0.24 -0.51 -0.76 -0.97 -0.33 0.06 -0.27
Arkansas -3.18 -2.10 -2.07 -1.89 -2.18 -2.21 -1.57 -1.22
California 0.45 1.80 2.13 2.17 1.91 1.80 2.23 2.22
Colorado 0.71 0.05 -0.09 0.19 0.07 0.08 0.53 -0.14
Connecticut -4.28 -2.40 -5.10 -2.39 -2.71 -2.48 -1.81 -2.02
Delaware -3.23 -0.75 -0.67 -1.44 -0.58 -0.35 0.51 0.66
District of Columbia -4.01 -1.13 -2.51 -2.07 -3.52 -3.06 -1.41 -1.09
Florida -1.61 -0.09 -0.10 -0.14 -0.27 -0.16 0.23 0.05
Georgia -2.04 -0.62 -1.32 -1.10 -0.95 -0.92 -0.31 -0.34
Hawaii -2.47 -1.41 -1.78 -1.07 -1.14 -1.98 -1.58 -0.21
Idaho -0.67 -0.75 0.10 -0.51 -5.77 -7.50 -6.30 -6.54
Illinois -2.21 -0.77 -1.16 -0.69 -1.07 -0.70 -0.28 -0.39
Indiana -2.09 0.06 0.17 0.14 -0.36 -0.16 0.37 0.28
Iowa -2.47 -0.80 -0.77 -0.70 -1.38 -1.72 -1.31 -1.38
Kansas -0.93 0.03 -0.15 0.26 -0.06 -0.50 0.01 0.12
Kentucky -2.80 -1.33 -1.22 -1.14 -1.44 -1.39 -0.69 -0.53
Louisiana -2.48 -1.45 -1.37 -1.43 -1.88 -1.63 -1.79 -1.21
Maine -3.52 -0.46 -0.22 -0.47 -0.98 -0.26 0.42 0.86
Maryland -2.22 -0.80 -0.77 -0.52 -0.94 -0.82 -0.21 -0.67
Massachusetts -0.73 0.36 0.25 0.31 0.42 0.14 0.13 0.25
Michigan -2.33 -1.15 -1.46 -1.36 -0.93 -0.94 -0.36 -0.74
Minnesota -2.06 -0.47 -0.46 -0.48 -0.47 -0.41 0.32 0.05
Mississippi -2.05 -0.28 0.05 0.18 -0.42 -0.39 0.37 0.47
Missouri -0.52 0.35 0.20 0.39 0.16 0.04 0.83 0.82
Montana -3.77 -2.19 -6.77 -11.88 -12.63 -10.28 -9.26 -10.31
Nebraska -4.16 -2.92 -4.43 -3.13 -3.91 -2.96 -2.21 -3.03
Nevada 1.91 1.84 1.71 2.59 1.45 1.69 1.32 1.11
New Hampshire -2.23 -1.21 -0.40 -0.14 -0.36 -0.31 -0.66 -0.97
New Jersey -1.18 0.06 -0.04 0.32 -0.06 0.20 0.84 1.34
New Mexico 0.67 1.94 0.15 0.31 -0.24 0.64 0.97 0.54
New York -1.29 0.01 0.15 0.27 0.19 0.45 1.21 1.27
North Carolina -2.00 -1.16 -1.03 -0.89 -1.06 -0.96 -0.23 -0.60
North Dakota -10.49 -2.49 -1.57 -0.81 -0.93 -0.71 1.92 2.24
Ohio -2.47 -1.43 -1.65 -1.50 -1.55 -1.71 -1.24 -1.40
Oklahoma -3.07 -1.91 -2.23 -1.67 -1.94 -1.91 -1.26 -0.90
Oregon -1.81 0.15 0.09 0.20 0.05 0.22 1.23 0.64
Pennsylvania -2.08 -1.16 -1.35 -1.04 -1.04 -1.07 -0.56 -0.47
Rhode Island -4.61 -3.50 -3.25 -1.97 -1.48 -2.05 -1.25 -1.42
South Carolina -1.52 -0.43 -0.73 -0.75 -0.64 -0.58 -0.28 -0.55
South Dakota -29.79 -16.07 -15.86 -14.16 -15.75 -12.73 -9.94 -9.51
Tennessee -2.76 -1.45 -1.53 -1.26 -1.20 -0.91 -0.67 -0.85
Texas -1.28 0.00 0.17 0.15 0.20 0.42 0.88 0.85
Utah -0.32 0.74 0.26 0.40 0.17 0.28 0.65 0.50
Vermont -5.38 -4.04 -4.38 -3.08 -4.72 -3.39 -2.24 -1.69
Virginia -2.78 -0.29 -0.19 0.07 -0.14 -0.18 0.31 0.38
Washington -0.86 0.03 0.01 0.46 -0.06 0.44 0.69 0.93
West Virginia -3.22 -1.37 -0.70 0.06 -0.29 0.15 0.36 -0.12
Wisconsin -1.75 -0.23 -0.10 0.25 0.06 0.28 0.81 0.55
Wyoming -3.43 -0.55 -2.15 -2.40 -4.14 -12.33 -10.57 -12.31
Outlying areas
Puerto Rico -9.27 -5.61 -13.04 -6.64 -11.49 -9.76 -13.13 -6.65
Other and unknown b, c 52.89 -81.38 -81.53 -73.29 -65.67 -62.07 -64.16 -52.07
SOURCE: Annual Statistical Supplement to the Social Security Bulletin, 2014–2021 editions, Table 4.B12.
a. Most state assignments are based on end-of-year residence obtained from electronically filed employer wage reports; the remainder are based on location of employer from reports filed on paper.
b. Persons employed in American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands; U.S. citizens employed abroad by U.S. employers; persons employed on U.S. oceanborne vessels; and workers with unknown residence.
c. Compares the "Other and unknown" row in Table 12 with the "Other" row in Table 13.
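The state-level screen described above (a difference of 3 percent or more in at least 1 year) can be sketched from the rows of Tables 12 and 13. This is an illustration using two states only. One observation from the arithmetic, not a documented rule: the Table 14 values for these rows reproduce when the MGD-process estimate is used as the base of the percentage, so the hypothetical helper below divides by the MGD estimate.

```python
# Rows copied from Table 12 (published) and Table 13 (MGD), tax years 2014-2021.
PUBLISHED = {
    "Alabama": [2_360_000, 2_407_000, 2_427_000, 2_446_000,
                2_481_000, 2_500_000, 2_493_000, 2_540_000],
    "Alaska": [430_000, 436_000, 433_000, 420_000,
               426_000, 427_000, 417_000, 423_000],
}
MGD = {
    "Alabama": [2_325_796, 2_388_310, 2_414_672, 2_434_399,
                2_462_146, 2_488_784, 2_485_839, 2_520_564],
    "Alaska": [409_229, 424_687, 419_867, 413_726,
               412_313, 413_536, 404_175, 408_481],
}


def pct_diff_mgd_base(mgd, published):
    """Percentage difference with the MGD estimate as the base (assumed)."""
    return 100.0 * (mgd - published) / mgd


def states_exceeding(threshold=3.0):
    """States whose absolute difference reaches the threshold in any year."""
    flagged = []
    for state in sorted(PUBLISHED):
        diffs = [pct_diff_mgd_base(m, p)
                 for m, p in zip(MGD[state], PUBLISHED[state])]
        if any(abs(d) >= threshold for d in diffs):
            flagged.append(state)
    return flagged


flagged = states_exceeding()
print(flagged)
```

Alabama stays under 3 percent in every year, while Alaska exceeds it in most years, which matches the list of states given in the text.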

Worker Count Estimates by County

Estimating the number of workers by county is much more complicated than preparing state-level estimates given the sheer number of calculations and the data disclosure restrictions that apply for low-population counties. In Table 6 of Earnings and Employment, ORES publishes estimated counts of Medicare-covered workers by state and county. For the 2021 edition, ORES computed estimates for each of the 50 states and Puerto Rico, and for each of the 3,225 U.S. counties or county equivalents represented in the 2021 MGD file, or 3,276 (3,225 plus 51) computations for each of nine categories of workers: all, male, and female earners with any, wage and salary, and self-employment Medicare-covered income. Thus, ORES computed 29,484 worker count estimates (3,276 × 9) for the 2021 edition of Earnings and Employment Table 6.
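The count of 29,484 estimates follows from crossing three sex groups with three earnings types and applying the result to every county and state-level area. A short sketch of that arithmetic, with hypothetical variable names:

```python
from itertools import product

sex_groups = ["all", "male", "female"]
earnings_types = ["any", "wage and salary", "self-employment"]

# Nine worker categories: every combination of sex group and earnings type.
categories = list(product(sex_groups, earnings_types))

n_counties = 3_225        # counties or county equivalents in the 2021 MGD file
n_state_level_areas = 51  # the 50 states and Puerto Rico
n_areas = n_counties + n_state_level_areas
n_estimates = n_areas * len(categories)

print(len(categories), n_areas, n_estimates)
```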

However, many of those computations were not published. Instead, they were suppressed because of data nondisclosure requirements. Primary cell suppression applies to any estimates based on unweighted counts that do not meet the disclosure threshold. Thus, for any county with fewer total workers than the disclosure threshold, the estimates for all nine categories of workers must be suppressed. Secondary cell suppression arises when an estimated value that is below the disclosure threshold can be inferred based on other estimates, as often occurs with estimates broken down by sex. In other words, if the unweighted counts of either male or female workers in a county do not meet or exceed the disclosure threshold, both estimates must be suppressed. Cell suppression is common for county-level estimates of self-employed individuals by sex, as unweighted counts often do not meet the disclosure thresholds. Secondary cell suppression is thus also required for estimated counts of wage and salary workers in those counties, because if those figures were disclosed, they could be subtracted from the all-workers figures to determine the self-employed individual counts.
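The suppression rules described above can be sketched for a single county as follows. This is a simplified illustration, not ORES's disclosure procedure: the threshold value is a placeholder because the note does not state the actual threshold, and the function and key names are hypothetical.

```python
HYPOTHETICAL_THRESHOLD = 10  # placeholder; the actual threshold is not given

EARNINGS_TYPES = ("all", "wage_salary", "self_employed")
SEX_GROUPS = ("total", "male", "female")


def publishable_cells(counts, threshold=HYPOTHETICAL_THRESHOLD):
    """Return {(earnings_type, sex_group): True if publishable} for a county.

    counts maps each earnings type to unweighted {"male": n, "female": n}.
    """
    flags = {}
    for etype in EARNINGS_TYPES:
        male, female = counts[etype]["male"], counts[etype]["female"]
        # Primary suppression: the combined count must meet the threshold.
        total_ok = (male + female) >= threshold
        # By sex: if either sex falls short, both sex estimates are suppressed.
        sex_ok = total_ok and male >= threshold and female >= threshold
        flags[(etype, "total")] = total_ok
        flags[(etype, "male")] = sex_ok
        flags[(etype, "female")] = sex_ok

    # A county below the threshold for all workers loses all nine cells.
    if not flags[("all", "total")]:
        return {cell: False for cell in flags}

    # Secondary suppression across earnings types: wage and salary and
    # self-employed cells are published together or not at all.
    for group in SEX_GROUPS:
        both_ok = flags[("wage_salary", group)] and flags[("self_employed", group)]
        flags[("wage_salary", group)] = both_ok
        flags[("self_employed", group)] = both_ok
    return flags


example = publishable_cells({
    "all": {"male": 500, "female": 480},
    "wage_salary": {"male": 490, "female": 475},
    "self_employed": {"male": 25, "female": 4},
})
tiny = publishable_cells({
    "all": {"male": 3, "female": 4},
    "wage_salary": {"male": 3, "female": 4},
    "self_employed": {"male": 0, "female": 0},
})
```

In the first example county, only four self-employed women are counted, so the by-sex estimates for self-employed individuals are suppressed, and the by-sex estimates for wage and salary workers are suppressed with them.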

Even though this note focuses on the worker count estimates and puts earnings estimates aside, the number of county-level estimates and the complexity of incorporating data nondisclosure procedures preclude the presentation of county-level estimates in this setting. Instead, I focus on a key question posed by developing a new methodology for generating the annual employment and earnings estimates: What sample size best reduces the effect of the data nondisclosure restrictions?

Effect of Data Nondisclosure Requirements

The unweighted number of self-employed individuals in a 1 percent sample of SSNs is too low to generate viable estimates for all U.S. counties. Using the 2021 MGD file, I compare the effects of using a 1 percent sample, a 10 percent sample, or a full population of workers on the number of county-level estimates that must be suppressed to comply with data nondisclosure rules.13

The process begins by assigning a randomly generated number from 1 to 100 to the record of each worker in the MGD file. Records for workers assigned a 1 are selected for the 1 percent sample and those assigned 1–10 are selected for the 10 percent sample. The next step identifies all the workers whose records have a valid SSN, who have an identifier of male or female, and whose indicated age is within 1–99. For these samples I compute two separate county-level worker count estimates for all workers, wage and salary workers, and self-employed individuals. The two estimates are (1) for the total number of workers in the county (that is, regardless of sex) and (2) for the male and female workers in the county (that is, workers by sex).
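The sample selection described above can be sketched as follows. This is a minimal illustration on made-up records, not the MGD programs themselves; the field names (`ssn_valid`, `sex`, `age`), the function names, and the fixed seed are all assumptions.

```python
import random


def draw_sample_flags(records, seed=2021):
    """Attach a random draw from 1 to 100 to each record and flag sample
    membership: a draw of 1 for the 1 percent sample, 1-10 for the 10 percent."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    flagged = []
    for rec in records:
        draw = rng.randint(1, 100)  # inclusive on both ends
        flagged.append({**rec, "draw": draw,
                        "in_1pct": draw == 1, "in_10pct": draw <= 10})
    return flagged


def is_eligible(rec):
    """Keep records with a valid SSN, a male or female identifier, and an
    age from 1 through 99."""
    return bool(rec["ssn_valid"]) and rec["sex"] in ("M", "F") and 1 <= rec["age"] <= 99


# Made-up worker records, for illustration only.
toy_records = [
    {"ssn_valid": True, "sex": "M" if i % 2 else "F", "age": 18 + i % 60}
    for i in range(100_000)
]
flagged = draw_sample_flags(toy_records)
one_pct = [r for r in flagged if r["in_1pct"] and is_eligible(r)]
ten_pct = [r for r in flagged if r["in_10pct"] and is_eligible(r)]
print(len(one_pct), len(ten_pct))
```

Because a draw of 1 also satisfies the 1–10 condition, the 1 percent sample is a subset of the 10 percent sample under this scheme.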

For county-level estimates of all workers (that is, regardless of earnings type), the data nondisclosure rules are straightforward: If the unweighted count of all workers in the county is below the disclosure threshold, the estimates will be suppressed. However, secondary cell suppression rules must be applied to the estimates for wage and salary workers and self-employed individuals. If the estimated number of wage and salary workers in a county must be suppressed, then the corresponding estimate for self-employed individuals must also be suppressed, and vice versa. Dividing the unweighted counts of workers both by sex and by earnings type multiplies the number of counties affected by the nondisclosure rules.

The total number of workers in the 1 percent sample is 1,801,744 and the total number of U.S. counties in the MGD file for the entire population of workers is 3,225 (Table 15). Two counties are omitted from the 1 percent sample for as-yet unresolved discrepancies in the underlying geographic data. For counts of all workers—combining both earnings types and both sexes—the estimates for 100 counties (or 3.1 percent of U.S. counties) would have to be suppressed in the 1 percent sample. For all wage and salary workers, the estimates for 111 counties (3.4 percent of U.S. counties) would have to be suppressed. For all self-employed individuals, however, the estimates for significantly more counties—1,170, or 36.3 percent of U.S. counties—would need to be suppressed. Thus, given secondary cell suppression rules, using the 1 percent sample would require the corresponding estimates for wage and salary workers also to be suppressed.

Table 15. Effects of alternative hypothetical sample sizes on the suppression rate of MGD-process county-level worker count estimates by earnings type, 2021
Measure All workers a Wage and salary workers Self-employed individuals
Number Percent Number Percent Number Percent
Total U.S. counties 3,225 100.0 3,225 100.0 3,225 100.0
  1 percent sample (similar to that currently used for statistical publications)
Counties in sample 3,223 100.0 3,222 100.0 3,164 100.0
With total worker estimates (men and women combined)—
Published 3,123 96.9 3,111 96.6 1,994 63.0
Suppressed b 100 3.1 111 3.4 1,170 36.3
With worker estimates by sex (men only or women only)—
Published 2,961 91.9 2,929 90.9 1,316 41.6
Suppressed b 262 8.1 293 9.1 1,848 58.4
Workers represented 1,801,744 . . . 1,693,573 . . . 207,582 . . .
  10 percent sample (option available in MGD process)
Counties in sample 3,225 100.0 3,225 100.0 3,224 100.0
With total worker estimates (men and women combined)—
Published 3,223 99.9 3,223 99.9 3,187 98.9
Suppressed b 2 0.1 2 0.1 37 1.1
With worker estimates by sex (men only or women only)—
Published 3,221 99.9 3,219 99.8 3,087 95.8
Suppressed b 4 0.1 6 0.2 137 4.2
Workers represented 18,015,649 . . . 16,923,540 . . . 2,080,712 . . .
  Full worker population (option available in MGD process)
Counties in sample 3,225 100.0 3,225 100.0 3,224 100.0
With total worker estimates (men and women combined)—
Published 3,225 100.0 3,223 99.9 3,223 100.0
Suppressed b 0 0.0 2 0.1 1 0.0
With worker estimates by sex (men only or women only)—
Published 3,225 100.0 3,219 99.8 3,221 99.9
Suppressed b 0 0.0 6 0.2 3 0.1
Workers represented 180,166,715 . . . 169,341,674 . . . 20,805,637 . . .
SOURCE: Author's calculations using the 2021 MGD file.
NOTES: Includes U.S. territories.
. . . = not applicable.
a. Workers with earnings from both wage and salary employment and self-employment are counted in each type of earnings but only once in the total.
b. Values are for primary suppression only. Because of secondary suppression requirements, the actual number of suppressed estimates for both wage and salary workers and self-employed individuals would be equal to the higher of those two values.

When the estimated worker counts are broken down by sex, the number of counties requiring cell suppression increases significantly across all three earnings-type categories. For the counts of all male workers and of all female workers, the estimates for 8.1 percent of counties would need to be suppressed. For male and female wage and salary workers, the estimates for 9.1 percent of U.S. counties would require suppression. However, in computing separate estimated numbers of self-employed individuals by sex, the estimates for a majority (58.4 percent) of U.S. counties would require suppression, which in turn would require the secondary suppression of the corresponding cells for wage and salary workers by sex in those counties. Because the 2021 edition of Earnings and Employment also used a 1 percent sample as the basis of its estimates, the suppression rate for the published tables was close to 60 percent.

The 10 percent sample of the MGD file contains records for 18,015,649 workers. The tenfold increase in the sample size dramatically reduces the number of county-level estimates that require suppression. For counts of all workers, combining both earnings types and both sexes, the number of counties requiring suppressed estimates decreases from 100 for the 1 percent sample to 2 for the 10 percent sample. For the counts of all workers, combining both earnings types, with breakdowns by sex, the number of counties requiring suppressed estimates decreases from 262 to 4. For wage and salary workers, the number of county-level estimates of combined male and female workers that must be suppressed decreases from 111 to 2 and the number of estimates by sex decreases from 293 to 6. In moving from the 1 percent sample to the 10 percent sample, the largest decrease in cell suppression occurs for self-employed individuals. For both sexes combined, the number of suppressed county-level estimates drops from 1,170 to 37 and for estimates by sex, it drops from 1,848 to 137. For estimates by earnings type, the percentage of county-level total-worker estimates for men and women combined that would require suppression would drop from 36.3 percent to 1.1 percent. For the same estimates broken down by sex, the suppression rate would decrease from 58.4 percent to 4.2 percent.

The full worker population MGD file for tax year 2021 contains records for 180,166,715 workers. Using the full population of workers in the MGD file would limit the number of county-level worker count estimates needing to be suppressed to 6 in the entire Earnings and Employment publication.

Potential Addition of Maps and New Tables to the Statistical Publications

In addition to replacing the CWHS 1 percent sample with the MGD file for the entire population of workers for its statistical publications, ORES is considering the addition of charts to provide visualizations of geographic earnings and employment distributions and new tables to provide further insights on the U.S. labor force.

To illustrate how maps could enhance the content of the publications, Chart 1 provides a graphic presentation of the statistics shown in Table 15, with separate panels for each MGD sample size. Although Table 15 showed that the estimates for 58 percent of the U.S. counties require suppression under a 1 percent sample, Panel A provides a visualization that highlights the prevalence and the geographic patterns of the suppression. Panel B shows that replacing the 1 percent sample with a 10 percent sample would dramatically reduce the number of suppressed county-level estimates. Panel C reveals that using the 2021 MGD full population of workers would allow the removal of nearly all county-level publication restrictions. These maps vividly display the stark contrasts in cell suppression between the three sample sizes.

Chart 1. Illustrative map described in text. There are three panels. Panel A illustrates a 1% sample and shows a majority of counties suppressed. Panel B illustrates a 10% sample and shows a small portion of counties suppressed. Panel C illustrates the full population and shows only a few counties suppressed.

Chart 2 shows states and counties grouped by worker population-size quintiles. The maps provide a visual perspective on the geographic distribution of worker counts that the statistical tables cannot provide. Chart 2 features quintiles only to illustrate how maps could contribute to ORES's presentation of statistical data. The final version of this sort of map might provide alternative representations of worker counts (such as quartiles or deciles) with which to highlight the key differences across states and counties.

Chart 2. Illustrative maps described in text.

ORES is also considering adding maps for each state that would provide county-level detail. Chart 3 presents three examples using worker-count quintile groupings. Note that where Chart 2 arranged states and counties by quintile according to their national rankings, each Chart 3 panel arranges the counties by their quintile rankings within their state.

Chart 3. Illustrative maps described in text.
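The difference between the national quintile rankings of Chart 2 and the within-state rankings of Chart 3 can be illustrated with a small sketch. The county identifiers and worker counts below are invented for the example, and the rank-based quintile rule is an assumption; this note does not specify how ties or uneven group sizes are handled in the charts.

```python
from collections import defaultdict

def quintiles(counts):
    """Map each key to a quintile from 1 (smallest counts) to 5 (largest), by rank."""
    ordered = sorted(counts, key=counts.get)
    n = len(ordered)
    return {key: (rank * 5) // n + 1 for rank, key in enumerate(ordered)}

def quintiles_within_state(counts):
    """Rank counties only against other counties in the same state."""
    by_state = defaultdict(dict)
    for (state, county), workers in counts.items():
        by_state[state][(state, county)] = workers
    result = {}
    for state_counts in by_state.values():
        result.update(quintiles(state_counts))
    return result

# Hypothetical data: five large counties in state "AA", five small ones in "BB".
counts = {("AA", i): i * 1000 for i in range(1, 6)}
counts.update({("BB", i): i * 10 for i in range(1, 6)})

national = quintiles(counts)
within = quintiles_within_state(counts)
```

Under these assumptions, the largest county in the small state falls in the third quintile nationally but in the top quintile within its own state, which is the contrast the two chart styles are meant to show.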

The MGD file also enables ORES to consider adding new tables on earnings and employment to the existing annual statistical publications. For example, Table 16 presents the percentage distribution of workers by sex in each tax year 2014–2021, shown separately for all workers, wage and salary workers, and self-employed individuals. Among all workers, the share who are women increased slightly, from 48.4 percent to 48.8 percent, from tax years 2014 to 2021. The share of wage and salary workers who are women likewise rose, from 48.8 percent to 49.1 percent. By far, the largest increase in the female share of workers occurred among the self-employed, from 43.6 percent in 2014 to 45.2 percent in 2021. Yet despite this increase, men represent a disproportionate share of self-employed individuals, especially relative to the split for wage and salary workers.

Table 16. Number and percentage distribution of workers by sex in each earnings-type category, tax years 2014–2021
Tax year Total Men Women
Number Percent Number Percent Number Percent
  All workers a
2014 167,874,533 100.0 86,554,357 51.6 81,320,176 48.4
2015 171,534,731 100.0 88,462,230 51.6 83,072,501 48.4
2016 174,168,264 100.0 89,664,098 51.5 84,504,166 48.5
2017 176,311,525 100.0 90,686,951 51.4 85,624,574 48.6
2018 178,527,817 100.0 91,677,783 51.4 86,850,034 48.6
2019 180,042,880 100.0 92,247,594 51.2 87,795,286 48.8
2020 178,499,006 100.0 91,332,698 51.2 87,166,308 48.8
2021 180,313,849 100.0 92,260,151 51.2 88,053,698 48.8
  Wage and salary
2014 158,682,500 100.0 81,255,302 51.2 77,427,198 48.8
2015 161,480,185 100.0 82,664,297 51.2 78,815,888 48.8
2016 164,171,152 100.0 83,907,441 51.1 80,263,711 48.9
2017 166,165,895 100.0 84,850,288 51.1 81,315,607 48.9
2018 168,335,976 100.0 85,830,038 51.0 82,505,938 49.0
2019 170,228,900 100.0 86,672,090 50.9 83,556,810 49.1
2020 168,662,331 100.0 85,777,999 50.9 82,884,332 49.1
2021 169,486,721 100.0 86,192,409 50.9 83,294,312 49.1
  Self-employed
2014 17,284,289 100.0 9,753,284 56.4 7,531,005 43.6
2015 19,068,333 100.0 10,762,981 56.4 8,305,352 43.6
2016 19,193,168 100.0 10,807,316 56.3 8,385,852 43.7
2017 19,432,607 100.0 10,898,755 56.1 8,533,852 43.9
2018 19,628,293 100.0 10,945,950 55.8 8,682,343 44.2
2019 19,028,561 100.0 10,496,410 55.2 8,532,151 44.8
2020 18,552,136 100.0 10,230,550 55.1 8,321,586 44.9
2021 20,808,898 100.0 11,393,362 54.8 9,415,536 45.2
SOURCE: Author's calculations using the 2021 MGD file.
a. Workers with earnings from both wage and salary employment and self-employment are counted in each type of earnings but only once under "all workers."

ORES may also add Table 17 to one of its annual publications. It highlights the year-over-year changes in worker counts by sex and earnings type for tax years 2014 to 2021. The apparent increase from 2014 to 2015 was larger for both sexes and both earnings types than the increase in any other year (except for self-employed individuals from 2020 to 2021). This may reflect an irregularity with the tax year 2014 MGD file. The rates of increase in the numbers of all workers and wage and salary workers for 2015–2021 seem to be reasonable. The relatively large decreases in the numbers of self-employed individuals in 2019 and 2020 suggest that COVID-19 had a larger effect on the self-employed than on wage and salary workers. Alternatively, it may be that the pandemic's disruption of IRS workflows, which caused a backlog for about 2 years, disproportionately affected Schedule SE tax returns. Thus, the sharp increase in the count of self-employed individuals in 2021, after 2 years of declines, seemingly indicates that the IRS succeeded in reducing much of the Schedule SE backlog in 2021.

Table 17. Number of workers by sex and earnings type, tax years 2014–2021
Tax year Total Men Women
Number Percent change from previous year Number Percent change from previous year Number Percent change from previous year
  All workers a
2014 167,874,533 . . . 86,554,357 . . . 81,320,176 . . .
2015 171,534,731 2.2 88,462,230 2.2 83,072,501 2.2
2016 174,168,264 1.5 89,664,098 1.4 84,504,166 1.7
2017 176,311,525 1.2 90,686,951 1.1 85,624,574 1.3
2018 178,527,817 1.3 91,677,783 1.1 86,850,034 1.4
2019 180,042,880 0.8 92,247,594 0.6 87,795,286 1.1
2020 178,499,006 -0.9 91,332,698 -1.0 87,166,308 -0.7
2021 180,313,849 1.0 92,260,151 1.0 88,053,698 1.0
  Wage and salary
2014 158,682,500 . . . 81,255,302 . . . 77,427,198 . . .
2015 161,480,185 1.8 82,664,297 1.7 78,815,888 1.8
2016 164,171,152 1.7 83,907,441 1.5 80,263,711 1.8
2017 166,165,895 1.2 84,850,288 1.1 81,315,607 1.3
2018 168,335,976 1.3 85,830,038 1.2 82,505,938 1.5
2019 170,228,900 1.1 86,672,090 1.0 83,556,810 1.3
2020 168,662,331 -0.9 85,777,999 -1.0 82,884,332 -0.8
2021 169,486,721 0.5 86,192,409 0.5 83,294,312 0.5
  Self-employed
2014 17,284,289 . . . 9,753,284 . . . 7,531,005 . . .
2015 19,068,333 10.3 10,762,981 10.4 8,305,352 10.3
2016 19,193,168 0.7 10,807,316 0.4 8,385,852 1.0
2017 19,432,607 1.2 10,898,755 0.8 8,533,852 1.8
2018 19,628,293 1.0 10,945,950 0.4 8,682,343 1.7
2019 19,028,561 -3.1 10,496,410 -4.1 8,532,151 -1.7
2020 18,552,136 -2.5 10,230,550 -2.5 8,321,586 -2.5
2021 20,808,898 12.2 11,393,362 11.4 9,415,536 13.1
SOURCE: Author's calculations using the 2021 MGD file.
NOTE: . . . = not applicable.
a. Workers with earnings from both wage and salary employment and self-employment are counted in each type of earnings but only once under "all workers."
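The percent changes in Table 17 follow from the worker counts by simple arithmetic. The short sketch below reproduces the total self-employed figures for tax years 2019 through 2021 from the counts in the table; the function name is introduced here only for illustration.

```python
def pct_change(series):
    """Percent change from the previous year, rounded to one decimal place."""
    years = sorted(series)
    return {
        cur: round((series[cur] / series[prev] - 1) * 100, 1)
        for prev, cur in zip(years, years[1:])
    }

# Total self-employed individuals, taken from Table 17.
self_employed = {
    2018: 19_628_293,
    2019: 19_028_561,
    2020: 18_552_136,
    2021: 20_808_898,
}

changes = pct_change(self_employed)
# changes == {2019: -3.1, 2020: -2.5, 2021: 12.2}
```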

Additional Detail in Earnings-Type Categories

As noted earlier, the MGD files identify a new earnings-type subcategory. The statistical publications cover two earnings types (wage and salary, and self-employment income) and include separate computations for all workers combined. Because workers may have both types of earnings in a year, the sum of wage and salary workers and self-employed individuals exceeds the all-workers figure in the statistical publications. To alleviate this overlap, the MGD file sorts worker records among three mutually exclusive earnings-type categories: wages and salary only, self-employment income only, and both types of earnings, or so-called combination workers.

Table 18 shows the number and percentage distribution of workers for each of the mutually exclusive earnings-type categories from 2014 to 2021. It shows a slight increase in the percentage of women reporting only wage and salary earnings, from 49.0 percent in 2014 to 49.3 percent in 2021. The increase in the percentage of women reporting self-employment income only is greater, from 42.4 percent in 2014 to 44.0 percent in 2021. The percentage of women with both wage and salary and self-employment income rose from 45.0 percent in 2014 to 46.6 percent in 2021, mirroring the 1.6 percentage-point increase in women with self-employment income only. This table exemplifies the sort of new insights on labor force dynamics that can be added to the existing statistical publications. ORES is exploring similar expansions of coverage by age and state and the addition of tables showing estimated mean and median earnings amounts.

Table 18. Number and percentage distribution of workers by sex in each mutually exclusive earnings-type category, tax years 2014–2021
Tax year Total Men Women
Number Percent Number Percent Number Percent
  Wage and salary only
2014 150,590,244 100.0 76,801,073 51.0 73,789,171 49.0
2015 152,466,398 100.0 77,699,249 51.0 74,767,149 49.0
2016 154,975,096 100.0 78,856,782 50.9 76,118,314 49.1
2017 156,878,918 100.0 79,788,196 50.9 77,090,722 49.1
2018 158,899,524 100.0 80,731,833 50.8 78,167,691 49.2
2019 161,014,319 100.0 81,751,184 50.8 79,263,135 49.2
2020 159,946,870 100.0 81,102,148 50.7 78,844,722 49.3
2021 159,504,951 100.0 80,866,789 50.7 78,638,162 49.3
  Self-employed only
2014 9,192,033 100.0 5,299,055 57.6 3,892,978 42.4
2015 10,054,546 100.0 5,797,933 57.7 4,256,613 42.3
2016 9,997,112 100.0 5,756,657 57.6 4,240,455 42.4
2017 10,145,630 100.0 5,836,663 57.5 4,308,967 42.5
2018 10,191,841 100.0 5,847,745 57.4 4,344,096 42.6
2019 9,813,980 100.0 5,575,504 56.8 4,238,476 43.2
2020 9,836,675 100.0 5,554,699 56.5 4,281,976 43.5
2021 10,827,128 100.0 6,067,742 56.0 4,759,386 44.0
  Combination
2014 8,092,256 100.0 4,454,229 55.0 3,638,027 45.0
2015 9,013,787 100.0 4,965,048 55.1 4,048,739 44.9
2016 9,196,056 100.0 5,050,659 54.9 4,145,397 45.1
2017 9,286,977 100.0 5,062,092 54.5 4,224,885 45.5
2018 9,436,452 100.0 5,098,205 54.0 4,338,247 46.0
2019 9,214,581 100.0 4,920,906 53.4 4,293,675 46.6
2020 8,715,461 100.0 4,675,851 53.7 4,039,610 46.3
2021 9,981,770 100.0 5,325,620 53.4 4,656,150 46.6
SOURCE: Author's calculations using the 2021 MGD file.
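The relationship between the three mutually exclusive categories and the overlapping categories used in the statistical publications can be checked with the tax year 2021 totals from Tables 16 and 18. The classify function below is a hypothetical illustration of the sorting rule, not the MGD code.

```python
def classify(has_wages, has_self_employment):
    """Sort a worker into one of the three mutually exclusive categories."""
    if has_wages and has_self_employment:
        return "combination"
    if has_wages:
        return "wage and salary only"
    if has_self_employment:
        return "self-employed only"
    return None  # no earnings of either type, so not a worker for this purpose

# Tax year 2021 totals from Table 18.
wage_only = 159_504_951
se_only = 10_827_128
combination = 9_981_770

# Overlapping categories as published: combination workers are counted in both.
wage_and_salary = wage_only + combination        # 169,486,721 (Table 16)
self_employed = se_only + combination            # 20,808,898 (Table 16)
all_workers = wage_only + se_only + combination  # 180,313,849 (Table 16)

# The two overlapping categories exceed the all-workers total by the
# number of combination workers.
double_counted = wage_and_salary + self_employed - all_workers
```

Here double_counted equals the combination total, which is the amount by which the sum of the two published earnings-type categories exceeds the all-workers figure.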

Summary

This note presents preliminary worker count estimates from the 2014 to 2021 MGD files and compares them with two benchmark estimates prepared independently by OCACT in support of the annual Trustees Report and by ORES for inclusion in two of its annual statistical publications. The comparisons of the MGD estimates with the benchmarks are broadly encouraging. The estimated numbers of all workers and of wage and salary workers differ only by small percentages. In addition, the MGD estimates of workers with only wage and salary earnings differed little from the OCACT estimates. However, the MGD estimates differed substantially from both benchmarks for self-employed individuals and from the OCACT estimates for workers with only self-employment earnings and for the “combination workers” with both wage and salary and self-employment earnings. These differences indicate a critical need to incorporate nonprimary tax year data—that is, data from tax forms that are processed more than 1 year after the earnings year—into each tax year's MGD file. This need appears to be particularly important for the years immediately following the COVID-19 pandemic, when the IRS experienced very large processing backlogs. For the most part, the comparisons with published state-level estimates were consistent with previous comparisons for state-level estimates for tax year 2017 (Compson 2024). The largest percentage differences generally occur for states or territories with relatively small work forces.

Comparing the MGD estimates against the benchmarks has uncovered some limitations of the MGD files as currently structured. Specifically, the records for some workers identify a state of residence but contain an unknown value for county of residence. ORES is investigating potential methods of obtaining ZIP Code data that would enable the imputation of a valid SCC for these workers.

The MGD file enables ORES to estimate worker counts using person-level microdata for virtually the entire population of workers. This makes it possible for the statistical publications to present worker counts for several U.S. territories for the first time. ORES is also considering the addition of maps and new tables to the publications.

This note shares the results of these preliminary comparisons with researchers, policy analysts, and staff of the various federal agencies that collect and disseminate U.S. labor market data. ORES welcomes feedback on the MGD methodology for assigning state-of-residence and demographic information to worker records and on the preliminary results presented herein.

Notes

1 The Annual Statistical Supplement is available at https://www.ssa.gov/policy/docs/statcomps/supplement/index.html. Earnings and Employment is available at https://www.ssa.gov/policy/docs/statcomps/eedata_sc/index.html.

2 For the purposes of this note, a worker is defined as any individual who had a tax record processed by SSA or the IRS in a given calendar year.

3 ORES welcomes feedback on the MGD process and the estimates it generates at statistics@ssa.gov.

4 Some jobs are not subject to Social Security payroll taxes but virtually all jobs are subject to the Medicare tax.

5 IRS Form W-2 is the annual wage and tax statement that employers file on behalf of employees. Form W-2c, “Corrected Wage and Tax Statement,” is filed when a worker's original W-2 contained any errors or otherwise needs to be updated.

6 The automated process for assigning SCCs based on addresses on tax forms identified a single SCC for at least 94 percent of worker-level records for tax years 2015–2020 but, for as-yet-unknown reasons, only 34 percent of the records for tax year 2014.

7 In the statistical publications, the wage and salary category includes individuals who have both wage and salary and self-employment income. Likewise, the self-employed category includes workers with both wage and salary and self-employment income. Consequently, some workers are counted in both categories in the published tables. By contrast, estimated wage and salary earnings amounts are shown only for work in that category, and self-employment earnings amounts are shown only for work while self-employed.

8 Although the MGD file contains records for about 98 percent of U.S. workers each year, ORES experimented with using a 10 percent sample—which would significantly streamline processing—to test the extent to which it could reduce cell suppression.

9 SSA's data files do not accommodate other sex designations.

10 Because the unpublished OCACT estimates do not estimate worker counts by sex, the MGD-process estimates are compared only with those of the statistical publications.

11 Because the unpublished OCACT estimates do not estimate worker counts by age, the MGD-process estimates are compared only with those of the statistical publications.

12 As ORES explores potential methodologies for adding nonprimary tax year data to the MGD process, it must assess how many of the workers with nonprimary tax year data are already in the annual MGD files and whether to cap the number of follow-up years for which it will continue to add nonprimary tax year data.

13 A 100 percent sample would obviously provide the most granular results and minimize cell suppression, but it requires comparatively slow and cumbersome processing. A 10 percent sample might provide most of the advantages of the full population while requiring far fewer data processing resources.

References

Compson, Michael. 2022. “Improving County-Level Earnings Estimates with a New Methodology for Assigning Geographic and Demographic Information for U.S. Workers.” Social Security Bulletin 82(1): 11–28.

Compson, Michael. 2024. “Evaluating a New Process for Assigning Geographic Residence Codes and Identifying Demographic Information for Workers in a Given Tax Year.” Social Security Bulletin 84(1): 1–47.