Comparing Earnings Estimates from the 2006 Earnings Public-Use File and the Annual Statistical Supplement
Research and Statistics Note No. 2012-01 (released January 2012)
Michael Compson is with the Division of Policy Evaluation, Office of Research, Evaluation, and Statistics, Office of Retirement and Disability Policy, Social Security Administration.
Acknowledgments: The author gratefully acknowledges the assistance of many individuals in the process of creating the 2006 Earnings Public-Use File and this note: John Hennessey for graciously sharing his programming and methodological expertise; Russell Hudson for his programming expertise and sharing his vast knowledge of the earnings data; Scott Muller and Greg Diez for sharing programmatic and earnings knowledge; Sirisha Anne, Brenda South, Stu Friedrich, and Randall Miles for their assistance in providing the data extracts used in the process of creating EPUF; Susan Grad, Howard Iams, Hilary Waldron, and Anya Olsen, for their comments on previous drafts of the paper; and finally, Bill Davis and Justin Ronca for their statistical expertise.
The findings and conclusions presented in this paper are those of the author and do not necessarily represent the views of the Social Security Administration.
|CWHS||Continuous Work History Sample|
|EPUF||Earnings Public-Use File|
|ESF||Earnings Suspense File|
|MEF||Master Earnings File|
|OCACT||Office of the Chief Actuary|
|ORES||Office of Research, Evaluation, and Statistics|
|SSA||Social Security Administration|
|SSN||Social Security number|
The Social Security Administration (SSA) recently released the 2006 Earnings Public-Use File (EPUF).1 The EPUF contains earnings information for individuals drawn from a systematic random 1-percent sample of all Social Security numbers (SSNs) issued before January 2007. EPUF consists of two linkable subfiles. One contains selected demographic and aggregate earnings information for all 4,348,254 individuals in the file, and the second contains annual earnings records for the 3,131,424 individuals who had positive earnings in at least 1 year from 1951 through 2006.2
Evaluating the accuracy of the EPUF estimates was a critical step in developing the data file. Starting with 1939 data, SSA has published annual estimates of the number of workers and the value of the earnings covered under the programs it administers. The estimates first appeared in the Social Security Yearbook and, beginning with data for 1949, have been published in the Annual Statistical Supplement to the Social Security Bulletin (hereafter referred to simply as the Supplement). The Office of Research, Evaluation, and Statistics (ORES) produces these estimates using the Continuous Work History Sample (CWHS) sampling frame.3
Given that the CWHS and EPUF represent two distinct sampling frames, one expects differences in the earnings estimates derived from each. Besides the different sampling techniques, there are four reasons why the two sets of estimates will differ. First, the two estimates are based on different measures of earnings: The Supplement uses Social Security taxable earnings and EPUF uses capped Social Security taxable earnings. Second, the Supplement estimates are adjusted using factors developed by SSA's Office of the Chief Actuary (OCACT) to account for delinquent or fraudulent reporting of Form W-2 and Form 1040 Schedule SE information. Third, ORES and OCACT use different methodologies for updating historical estimates. Finally, EPUF removes some individuals and some earnings records (which are set equal to $0) from the underlying 1-percent sample to "clean" the data and to prevent disclosing personal information.
This note identifies and explains the differences between data in EPUF and estimates in the Supplement. It first highlights the factors that contribute to expected differences between the two estimates. It then compares EPUF and Supplement estimates, in turn focusing on earnings, number of workers with earnings, median earnings by sex and age group, and the percentage of workers with earnings below the taxable maximum by sex. After accounting for the expected differences, the note finds that remaining differences between EPUF and Supplement estimates are relatively small.
Expected Differences in the Estimates
This discussion distinguishes between the EPUF's underlying sample and the final EPUF data file. The underlying sample refers to a file containing earnings records for 4,413,024 individuals, before data cleaning and disclosure prevention procedures (discussed later) led to the removal of some earnings records. The final EPUF (or, simply, EPUF) contains the earnings records for 4,348,254 individuals. The underlying sample and the EPUF use different earnings measures, explained in the following section.
Different Measures of Earnings
All of the earnings data needed to administer the Social Security programs are contained in the Master Earnings File (MEF).4 The MEF consists of 20 segments, each containing specific data fields used for various administrative purposes. The Supplement earnings estimates analyzed here are taken from the MEF summary segment using the CWHS sampling frame.5 In general, the annual earnings data on the MEF summary segment are a running total of an individual's earnings up to the taxable maximum for each job in a given year, plus any taxable self-employment income. For the self-employed, "taxable earnings consists of net self-employment income which, when combined with any taxable wages for that individual, is at or below any applicable annual maximum taxable amount" (SSA 2011, G17).
MEF data reflect Social Security taxable earnings; that is, all earnings covered under the program subject to the payroll tax. Note that if an individual has more than one employer in a given year, the amount of earnings in this field may exceed the taxable maximum.6
The Supplement and the 1-percent MEF sample that underlies the EPUF use the same earnings measure, Social Security taxable earnings. However, in the final EPUF, earnings data for a given year are capped at the taxable maximum.7 Capped Social Security taxable earnings reflect a worker's covered earnings that are subject to the employee share of the payroll tax.
The following scenario illustrates the differences between the taxable earnings amount in the Supplement tables and the capped taxable earnings contained in EPUF. For a given year, assume the taxable maximum is $50,000 and an individual has covered earnings from two jobs. If the individual earns $60,000 in his first job and $15,000 in a second job, taxable earnings, as shown in the Supplement, would be $65,000 ($50,000 from the first job and $15,000 from the second job). However, the EPUF record would reflect only the capped taxable amount of covered earnings, or the individual's total covered earnings subject to the employee share of the payroll tax ($50,000). Given the difference between taxable earnings (Supplement) and capped taxable earnings (EPUF), one would expect the earnings amount in EPUF to be less than the Supplement earnings estimate. The difference between the two measures of earnings is the amount of covered earnings above the taxable maximum earned from multiple jobs, and it accounts for most of the differences between the Supplement and EPUF earnings estimates.
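The scenario above can be sketched in a few lines of Python. This is a minimal illustration, not SSA's processing code: the function names are hypothetical, and self-employment income is ignored for simplicity.

```python
def taxable_earnings(job_earnings, taxable_max):
    """Supplement-style measure: each job's earnings count up to the taxable maximum."""
    return sum(min(amount, taxable_max) for amount in job_earnings)


def capped_taxable_earnings(job_earnings, taxable_max):
    """EPUF-style measure: the worker's annual total is capped at the taxable maximum."""
    return min(taxable_earnings(job_earnings, taxable_max), taxable_max)


# Figures from the scenario in the text: two jobs, $50,000 taxable maximum.
jobs = [60_000, 15_000]
taxable_max = 50_000

taxable = taxable_earnings(jobs, taxable_max)        # 50,000 + 15,000
capped = capped_taxable_earnings(jobs, taxable_max)  # capped at 50,000
excess = taxable - capped                            # multiple-job earnings above the maximum

print(taxable, capped, excess)
```

For a worker with a single employer the two measures coincide, which is why the gap between the Supplement and EPUF measures arises only from workers with multiple jobs.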
Adjustments to Taxable Earnings
In the Supplement, the estimates of annual taxable earnings and the number of workers with covered earnings reflect adjustments to the raw data pulled from the MEF summary segment. The adjustments attempt to account for two key issues: (1) earnings data for the most recent years are incomplete, and (2) some earnings data reported on W-2 and Schedule SE tax forms may be erroneous or fraudulent.
In general, by the time data are extracted to generate earnings estimates for the Supplement, approximately 98 percent of the current tax year's earnings data have been posted to the MEF.8 To account for the "missing" data, OCACT generates adjustment factors for the number of workers and the amount of taxable earnings in the extract. The adjustment factors are applied to the raw estimates to approximate the final earnings data expected to be posted to the MEF for the current tax year.9
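As a rough illustration of how a completeness adjustment of this kind behaves, the sketch below scales a raw total by the inverse of the share of earnings already posted. This is an assumption made purely for illustration: the note does not give OCACT's actual factors or formula, and the function name, the inverse-share rule, and the dollar figure are all hypothetical.

```python
def approximate_final_total(raw_total, share_posted):
    """Scale a raw total by the inverse of the share already posted (assumed rule)."""
    if not 0 < share_posted <= 1:
        raise ValueError("share_posted must be in (0, 1]")
    return raw_total / share_posted


raw_total = 4_900_000  # hypothetical raw taxable earnings, in millions of dollars
factor = 1 / 0.98      # roughly 1.02 when about 98 percent of the data has been posted

print(round(factor, 4), round(approximate_final_total(raw_total, 0.98)))
```

Under this simplified rule, the factor approaches 1 as the share of posted earnings approaches 100 percent, consistent with adjustments that shrink for earlier years.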
In addition, employers may make errors when reporting employees' Social Security covered earnings, or individuals may use an SSN fraudulently. SSA has a number of procedures that attempt to identify and correct improperly reported earnings information. If these procedures cannot assign earnings information to an SSN, the record is placed in the Earnings Suspense File (ESF). Once the earnings are posted to the ESF, SSA takes additional steps to try to assign the earnings to the appropriate worker. The amount of taxable earnings data posted to the ESF has increased dramatically in recent years (Chart 1), causing a commensurate shortfall in earnings posted to the MEF. OCACT generates adjustment factors to approximate the number of workers and the amount of earnings currently in the ESF that are expected eventually to be posted to the MEF.
These adjustment factors are used solely to generate earnings estimates in the Supplement, and are not included in the EPUF microdata. Instead, the earnings data underlying the EPUF estimates reflect only the earnings data actually posted on the MEF when the data were extracted.
The adjustment factors are relatively large for the most recent years' estimates and decrease for each earlier year until the adjustments are minimal. With each passing year, the number of additional earnings data items posted to the MEF for a given tax year decreases. With the growing size of the ESF, one would expect to see greater differences between the Supplement estimates and the EPUF estimates for the most recent years, because both the adjustment factors and the amount of earnings not yet reported on the MEF (thus missing from EPUF) are increasing.
Differences in Historical Estimates
Data published in the Supplement reflect OCACT estimates of worker counts and covered earnings amounts for all earnings years. When generating the current-year estimates of taxable earnings, ORES revises the latest 3 years of estimates and considers OCACT estimates for all prior years to be relatively unchanged. Thus, Supplement estimates for all but the last 3 years are frozen and do not reflect any W-2 and SE information that may have been posted to the MEF in the intervening years. The differences between the historical estimates in the Supplement and those in EPUF (which include updated earnings data) should be minor.
Data Cleaning and Disclosure Prevention Procedures
In creating EPUF, records for some individuals were removed from the underlying sample because of data "cleaning" or because they were included in an existing public-use data file called the New Beneficiary Data System.10 In addition, annual earnings for individuals with earnings at ages 14 or younger or 86 or older were "zeroed out" (set equal to $0) to minimize the risk of personal data disclosure.11
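The age-based zeroing rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure actually used to build EPUF: the function and constant names are hypothetical, and age is approximated here as earnings year minus year of birth.

```python
MIN_KEPT_AGE = 15  # earnings at ages 14 or younger are set to $0
MAX_KEPT_AGE = 85  # earnings at ages 86 or older are set to $0


def zero_out_by_age(birth_year, earnings_by_year):
    """Return a copy of the annual earnings with out-of-range ages set to 0."""
    cleaned = {}
    for year, amount in earnings_by_year.items():
        age = year - birth_year  # assumed definition of age in the earnings year
        cleaned[year] = amount if MIN_KEPT_AGE <= age <= MAX_KEPT_AGE else 0
    return cleaned


# Hypothetical record for a worker born in 1950 (ages 14, 15, and 40).
record = {1964: 300, 1965: 450, 1990: 28_000}
print(zero_out_by_age(1950, record))
```

Note that the record itself stays in the file; only the affected annual amounts are replaced with $0, which is why the procedure lowers both aggregate earnings and the count of workers with positive earnings at those ages.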
Comparing the Estimates
These comparisons account for two alternative measures of Social Security covered earnings: taxable earnings, as used in the Supplement and the underlying EPUF sample; and earnings capped at the taxable maximum in a given year, as contained in the final EPUF. Directly comparing final EPUF and Supplement earnings estimates would yield somewhat misleading results because (1) each source uses a distinct measure of earnings and (2) the earnings data in EPUF have been adjusted by data cleaning and disclosure prevention procedures.
In lieu of beginning with a direct comparison, we can compare the estimates for taxable Social Security earnings in the Supplement with those in the underlying EPUF sample. Because these two sources use the same earnings measure, we would expect their estimates to differ only because of (1) differing sampling techniques, (2) OCACT's adjustments to account for delinquent, fraudulent, or erroneous reporting of earnings, and (3) ORES' freezing of historical estimates. Because both sources were created using random sampling, one would expect minimal differences between them. If the OCACT adjustment factors are minimal for all but the most recent years, then one would expect the largest differences for those years. If this comparison reveals substantial differences between the estimates, something is clearly wrong.
After comparing Supplement and underlying EPUF sample estimates, the next step is to isolate the effects of the two key differences in the earnings data between the underlying EPUF sample and the final EPUF. First, EPUF records reflect earnings capped at the taxable maximum in a given year. Second, some earnings records were removed from the underlying sample because of data "cleaning," and some annual earnings records were zeroed out to protect against personal data disclosure. Therefore, EPUF's capped taxable earnings amounts will be lower than the taxable earnings estimates in the underlying EPUF sample, and will thus differ even further from the Supplement estimates.
Finally, having established the context of the differences incrementally, we can compare EPUF and Supplement earnings estimates directly. Those comparisons will examine the differences between taxable and capped taxable Social Security earnings and the effects of the data cleaning and disclosure prevention procedures.
Taxable Earnings in the Underlying EPUF Sample
Table 1 compares the taxable earnings in the underlying EPUF sample with the taxable earnings in Table 4.B1 in the 2008 Supplement.12 Alongside columns respectively presenting estimates from the Supplement and the underlying EPUF sample, a third column expresses the underlying EPUF sample estimate as a percentage of the Supplement estimate. For 1951–1979 and 1988–1999, the underlying EPUF sample estimates equal at least 99 percent of the Supplement estimates. For 4 years between 1980 and 1987, and in each year 2000 through 2004, the underlying EPUF sample estimate drops to between 98 percent and 99 percent of the Supplement estimate. As expected, the most recent years reflect the largest differences between the two.
|Year||Supplement (millions of dollars)||Underlying EPUF sample a (millions of dollars)||Underlying EPUF sample estimate as a percentage of the Supplement estimate|
|SOURCES: SSA (2009, table 4.B1) and author's calculations using underlying EPUF sample.|
|a. Weighted estimates.|
|b. 2008 Supplement estimates for 2004–2006 were based on preliminary data.|
Chart 2, which graphs the difference between the estimates expressed as a percentage of the Supplement estimate, shows relatively small differences between the estimates for most years. Except for 1967 and 1977, the estimates for 1951–1979 differ by less than one-half of one percentage point. For 1980–1988, the differences between the estimates are much more volatile and depart from the 1951–1979 trend line. This observation might be due to the transition from quarterly to annual wage reporting for tax year 1978, or to the substantial growth in earnings records assigned to the ESF during that period (Chart 1). It is possible that fewer earnings from the ESF were posted to the MEF during these years than had been expected.13 Although the variance in annual estimates from 1980 to 1988 is two to three times that seen in the other years between 1951 and 1997, the differences are still relatively small, only once exceeding 1.5 percentage points.
From 1989 through 1997, the difference in estimates is nearly stationary at one-half of one percentage point. However, from 1998 to 2004, there is a steady increase in the gap between the taxable earnings estimates in the underlying EPUF sample and the Supplement. One possible explanation for the growing gap is the increase in earnings assigned to the ESF that are not recorded in the MEF but are reflected in the Supplement estimates. Also, the difference between the estimates for the two most recent years (2005 and 2006) is much larger because the data in the MEF for those years are incomplete. These findings support the initial expectations about differences between the estimates.
Capped Taxable Earnings in EPUF
Chart 3 presents the percentage of earnings removed from the underlying EPUF sample due to capping earnings at the annual taxable maximum, removing records from the file for data cleaning, and zeroing out some annual earnings values because of data disclosure concerns.14 The bottom line in Chart 3 reveals that most of the earnings removed from the underlying EPUF sample are the result of capping earnings at the taxable maximum, as opposed to the data cleaning and disclosure prevention procedures (the distance between the lines).
The amount of earnings removed from the underlying EPUF sample expressed as a percentage of total earnings in that sample (top line in Chart 3) starts at 2.5 percent for 1951 and peaks at just over 3.5 percent for 1965. Beginning with 1966, there is a clear downward trend in the percentage of earnings removed from the underlying sample until 1983 (0.8 percent). From 1984 through 2006, the percentage of earnings removed is less than 1 percent, with the single exception of 2000.
Comparing EPUF Capped Taxable Earnings and Supplement Taxable Earnings Estimates
Table 2 shows taxable earnings estimates from Supplement Table 4.B1, weighted capped taxable earnings from the final EPUF, and the latter expressed as a percentage of the former. For years with complete data available, the percentages range from a low of 96.08 percent in 1965 to a high of 98.94 percent in 1988. As expected, the percentages are lower in years with incomplete data, especially 2006. The EPUF estimates are less than 97 percent of the Supplement estimates in only 6 years, with 1971 being the most recent before 2005.
|Year||Supplement taxable earnings (millions of dollars)||EPUF capped taxable earnings a (millions of dollars)||EPUF estimate as a percentage of the Supplement estimate|
|SOURCES: SSA (2009, table 4.B1) and author's calculations using EPUF.|
|a. Weighted estimates.|
|b. 2008 Supplement estimates for 2004–2006 were based on preliminary data.|
Chart 4 illustrates the effect of changing the measurement from the taxable earnings used in the underlying EPUF sample to the capped taxable earnings used in the final EPUF. Chart 4's top line shows the percentage point difference between EPUF and Supplement earnings estimates and its bottom line shows the percentage point difference between underlying EPUF sample and Supplement earnings estimates (from Chart 2). Chart 4 provides several points of interest. First, the lines differ widely from 1951 to 1980. Second, the volatility in the differences between the two estimates during 1980–1987 occurs for both earnings measurements, as the two lines move in roughly parallel patterns. Third, beginning in 1981, the gap between the two lines narrows and remains consistent thereafter.
Chart 4. Comparing earnings estimates: Percentage point differences between underlying EPUF sample and Supplement estimates and between final EPUF and Supplement estimates, 1951–2006
In Chart 5, the black line tracks the spread between the lines shown in Chart 4; that is, it shows the percentage-point difference between EPUF's capped taxable earnings and the Supplement's estimate minus the percentage-point difference between the underlying EPUF sample's taxable earnings and the Supplement estimate. The differences are relatively small and have narrowed considerably over time. Specifically, the gap peaks at 3.6 percentage points in 1965 and drops to just over 1 percentage point in 1980. From 1981 through 2006, the gap remains steady between 0.7 and 1.1 percentage points.
Chart 5. Comparing the percentage point spread between the differences in estimates in Chart 4 with the proportion of workers whose earnings exceed the taxable maximum, 1951–2006
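The spread plotted in Chart 5 can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the original note: for year t, let S_t be the Supplement estimate, U_t the underlying EPUF sample estimate, and E_t the final EPUF estimate, with each difference measured as the Supplement estimate minus the EPUF-based estimate, expressed as a percentage of the Supplement estimate.

```latex
d^{U}_t = 100 \cdot \frac{S_t - U_t}{S_t}, \qquad
d^{E}_t = 100 \cdot \frac{S_t - E_t}{S_t}, \qquad
d^{E}_t - d^{U}_t = 100 \cdot \frac{U_t - E_t}{S_t}
```

Under this convention the Supplement estimate cancels from the numerator, so the spread is simply the earnings removed between the underlying sample and the final EPUF (through capping, data cleaning, and zeroing out), expressed as a share of the Supplement estimate.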
One possible explanation for the relatively large gap between estimates for 1951–1976 is the much higher percentage of individuals who had earnings above the taxable maximum during those years. As previously noted, the major difference between the taxable earnings in the underlying EPUF sample and the capped earnings in EPUF is that only the latter excludes the portion of earnings above the taxable maximum for workers who have more than one employer and combined earnings above that maximum. As a result, one would expect to see some correlation between the difference in the estimates and the percentage of workers with earnings above the taxable maximum in a given year. The red line in Chart 5 shows the percentage of workers with earnings above the taxable maximum using the scale to the right of the graph. As expected, changes in the percentage of individuals with earnings above the taxable maximum mirror the changes in the differences between taxable earnings and EPUF earnings estimates. The volatility in the percentage of individuals with earnings above the taxable maximum from 1951 to 1970 reflects Congress' ad hoc adjustments of the taxable maximum during these years. Legislation enacted in 1972 instituted automatic annual increases that took effect with the taxable maximum for 1975.15
Chart 6 presents the number of workers whose combined taxable earnings from multiple employers exceed the taxable maximum in a given year. The pattern mirrors those of both lines in Chart 5. These findings support initial expectations about capped taxable earnings in EPUF relative to Supplement estimates.
Comparing the aggregate earnings estimates derived from the underlying EPUF sample, the final EPUF file, and the Supplement leads to two conclusions: (1) taxable earnings estimates in the underlying EPUF sample and the Supplement do not differ widely; and (2) most of the differences between the final EPUF and Supplement earnings estimates stem from the use of two different measures (taxable and capped taxable earnings) and from OCACT adjustments incorporated in the Supplement estimates to account for delinquent posting of earnings data and potentially fraudulent use of SSNs.
The next sections compare Supplement and final EPUF estimates of the number of workers by sex and age, the median value of taxable earnings by sex and age, and the percentage of workers with earnings below the taxable maximum by sex.
Number of Workers
Supplement Table 4.B3 contains estimates of the number of workers with covered earnings in a given year, by sex. Table 3 compares Supplement and underlying EPUF sample estimates of the number of workers for 1951–2006.16 Alongside columns presenting the estimates themselves, a third column shows the underlying EPUF sample estimate expressed as a percentage of the Supplement estimate. The estimates differ very little: With one exception (1978), the underlying EPUF sample estimate is within 1 percentage point of the Supplement estimate from 1951 through 1999. As expected, the biggest differences between the estimates occur for the most recent years, when the data are incomplete and OCACT's adjustment factors play a more prominent role in the Supplement estimates. From 2000 through 2004, the percentages drop below 99 percent; for 2005 and 2006, they drop further, to less than 98 percent.
The next column shows the number of annual earnings records in the underlying EPUF sample removed because of data cleaning or zeroed out to meet data disclosure requirements. The final column reveals that the percentage of underlying EPUF sample records removed or zeroed out is very small, less than 1 percent each year.
|Year||Supplement (thousands)||Underlying EPUF sample (thousands)||Underlying EPUF sample estimate as a percentage of the Supplement estimate||Earnings records removed or zeroed out from underlying EPUF sample (thousands)||Underlying EPUF sample earnings records removed or zeroed out (%)|
|SOURCES: SSA, Annual Statistical Supplement to the Social Security Bulletin, Table 4.B3, various editions; and author's calculations using underlying EPUF sample.|
|a. 2008 Supplement estimates for 2004–2006 were based on preliminary data.|
Table 4 compares the Supplement and final EPUF estimates of the number of covered workers, with detail by sex. From 1951 through 2004, the final EPUF estimates represent at least 98 percent of the Supplement estimates. As expected, the percentages drop for 2005 and 2006 because of incomplete data and OCACT adjustments.
|Year||Supplement (thousands)||EPUF a (thousands)||EPUF estimate as a percentage of Supplement estimate|
| ||All b||Men||Women||All b||Men||Women||All||Men||Women|
|2004||156,250 c||82,008 c||74,242 c||153,551||80,526||73,001||98.27||98.19||98.33|
|2005||158,913 c||83,202 c||75,711 c||155,060||81,236||73,801||97.58||97.64||97.48|
|2006||161,205 c||84,181 c||77,024 c||156,280||81,576||74,681||96.94||96.91||96.96|
|SOURCES: SSA, Annual Statistical Supplement to the Social Security Bulletin, Table 4.B3, various editions; and author's calculations using EPUF.|
|a. Weighted estimates.|
|b. Includes a small number of workers whose sex was coded as "unknown."|
|c. 2008 Supplement estimates for 2004–2006 were based on preliminary data.|
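The percentage columns in Table 4 can be reproduced directly from the worker counts. The short sketch below recomputes them for the three rows shown above; the data structure and function name are illustrative, and the counts are copied from the table.

```python
# (All, Men, Women) worker counts in thousands, copied from Table 4.
TABLE_4 = {
    2004: {"supplement": (156_250, 82_008, 74_242),
           "epuf": (153_551, 80_526, 73_001),
           "reported": (98.27, 98.19, 98.33)},
    2005: {"supplement": (158_913, 83_202, 75_711),
           "epuf": (155_060, 81_236, 73_801),
           "reported": (97.58, 97.64, 97.48)},
    2006: {"supplement": (161_205, 84_181, 77_024),
           "epuf": (156_280, 81_576, 74_681),
           "reported": (96.94, 96.91, 96.96)},
}


def epuf_share(epuf_count, supplement_count):
    """EPUF estimate as a percentage of the Supplement estimate."""
    return 100 * epuf_count / supplement_count


for year, row in TABLE_4.items():
    computed = [epuf_share(e, s) for e, s in zip(row["epuf"], row["supplement"])]
    # Allow for rounding to two decimals in the published table.
    matches = all(abs(c - r) < 0.005 for c, r in zip(computed, row["reported"]))
    print(year, [f"{c:.2f}" for c in computed], "matches table:", matches)
```

The same calculation applies to any of the "EPUF estimate as a percentage of the Supplement estimate" columns in this note.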
Workers by Age
Supplement Table 4.B5 shows the estimated number of workers by age group. Unfortunately, some age categories are not defined consistently throughout the 1951–2006 period. Specifically, subcategories for those aged 60 or older from 1951 to 1959 differ from those used from 1960 through 2006. As a result, estimates for those aged 60 or older are shown only for 1960 and later.
Charts 7 and 8 compare EPUF and Supplement estimates of the number of workers with earnings by age group from 1951 to 2006. Both charts show EPUF estimates expressed as a percentage of the Supplement estimate.
Chart 7 looks at workers younger than age 60. In general, the differences between the estimates are very small, although estimates for individuals younger than age 20 clearly diverge starting in 1971. Data disclosure restrictions require EPUF to zero out earnings for those aged 14 or younger, accounting for much of this divergence. When the estimates for the number of workers in this age category are adjusted to include the earnings that were zeroed out, there is virtually no difference between the EPUF and Supplement estimates.
Chart 7. EPUF estimates of the number of workers as a percentage of the Supplement estimate, workers younger than age 60 by age group, 1951–2006
Chart 8 focuses on workers aged 60 or older. Although EPUF and Supplement estimates differ somewhat more for the 60–71 age group than for their younger counterparts, they differ much more for individuals aged 72 or older. For that group, the EPUF estimates are lower than the Supplement estimates across all years, and there is a distinct gap between the estimates for 1985 through 1998. The gap begins to narrow in 1999, but it remains somewhat larger than that for workers aged 60–71. The finding raises two critical questions: Why does this large gap occur only for 1985–1998, and why only for workers aged 72 or older?
Chart 8. EPUF estimates of the number of workers as a percentage of the Supplement estimate, workers aged 60 or older by age group, 1960–2006
The first attempt to answer these questions involves evaluating how the EPUF data cleaning and disclosure prevention requirements affect the estimated number of workers aged 72 or older. Chart 9 presents three measures of workers aged 72 or older as percentages of the Supplement estimate. The top line shows the full underlying EPUF sample estimate. The middle line represents the underlying sample after removing records because of data cleaning (for example, records with dubious age-at-earnings values) and for disclosure prevention (individuals that overlapped with the New Beneficiary Data System). The bottom line, showing the final EPUF after zeroing out earnings for workers aged 86 or older, replicates Chart 8's line for workers aged 72 or older. The short distance between the bottom line and the middle line shows the minimal effect of zeroing out the earnings of individuals aged 86 or older. The distance between the middle line and the top line shows that the effect of removing individuals for data cleaning and disclosure prevention is also generally small, although it is somewhat larger than the effect of zeroing out earnings for individuals aged 86 or older.17
Chart 9. Effects of EPUF data cleaning and disclosure prevention measures: Estimated number of workers aged 72 or older as a percentage of the Supplement estimate, 1960–2006
More significantly, the EPUF estimates (bottom line) and those from the underlying EPUF sample (top line) differ very consistently across the years. Thus, the distinct gap between the EPUF and Supplement estimates from 1985 through 1998 clearly does not result from the data cleaning or the disclosure prevention procedures applied to the underlying EPUF sample. It seems very peculiar that the estimates from the underlying EPUF sample are extremely close to the Supplement estimates except for these particular years. What, then, explains this anomalous gap?
A second approach is to compare the number of workers aged 72 or older in the EPUF with the number of workers in the active file within the 2008 version of the CWHS. The active file contains individuals in the 1-percent CWHS who have had any covered earnings since the program's inception. One would expect these two distinct 1-percent samples to produce very similar estimates of the number of workers aged 72 or older. The top line in Chart 10 shows the number of workers in EPUF expressed as a percentage of the number of workers in the active CWHS file and confirms that the estimates are indeed very similar. The bottom line shows the number of workers in the active CWHS file aged 72 or older as a percentage of the Supplement estimates. The gap between these two ratios is nearly identical to the differences between the EPUF and Supplement estimates for this age group.
Chart 10. Comparison of estimates of number of workers aged 72 or older: EPUF, CWHS, and Supplement, 1980–2006
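The reasoning behind the Chart 10 comparison can be made explicit with a simple identity. The symbols are assumptions introduced for illustration: for year t, E_t is the EPUF estimate of workers aged 72 or older, C_t the active CWHS file estimate, and S_t the Supplement estimate.

```latex
\frac{E_t}{S_t} = \frac{E_t}{C_t} \cdot \frac{C_t}{S_t},
\qquad
\frac{E_t}{C_t} \approx 1 \;\Longrightarrow\; \frac{E_t}{S_t} \approx \frac{C_t}{S_t}
```

Because the first ratio is close to 1, any shortfall of EPUF relative to the Supplement is matched by a nearly equal shortfall of the active CWHS file relative to the Supplement.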
The fact that the estimates from two distinct 1-percent samples are nearly the same indicates that there may be problems with the Supplement estimates for older workers during this period. One possible explanation is that a programming or coding error affected only those workers, and the error was corrected as part of the Y2K adjustments made to the MEF. Nonetheless, the Supplement estimates reflect the earnings data in the MEF at that time. Given the close relationship between EPUF's underlying 1-percent sample and the active CWHS file, the number of workers in EPUF aged 72 or older is presumably correct.
Charts 11–14 compare EPUF and Supplement estimates of the number of workers by age group and sex. For men younger than 60, Chart 11 reveals very little difference between the estimates. As expected, EPUF estimates as a percentage of Supplement estimates decline slightly for the most recent years. As was seen with all workers, estimates of the number of men aged 60 or older (Chart 12) differ more widely than estimates of the number of younger men. The distinct gap between EPUF and Supplement estimates of all workers aged 72 or older from 1985 through 1998 also occurs for men.
Chart 11. EPUF estimates of the number of male workers as a percentage of the Supplement estimate, workers younger than age 60 by age group, 1951–2006
Chart 12. EPUF estimates of the number of male workers as a percentage of the Supplement estimate, workers aged 60 or older by age group, 1960–2006
For female workers younger than age 60, Chart 13 reveals that EPUF and Supplement estimates differ slightly more than do those of their male counterparts. After 1980, the estimates differ only minimally. As was true for men, EPUF and Supplement estimates for female workers aged 60 or older (Chart 14) differ more than those for younger women. In general, the EPUF estimates of older female workers appear to slightly exceed Supplement estimates from 1960 through 1974 but are lower from 1975 onward.
EPUF estimates of the number of female workers as a percentage of the Supplement estimate, workers younger than age 60 by age group, 1951–2006
EPUF estimates of the number of female workers as a percentage of the Supplement estimate, workers aged 60 or older by age group, 1960–2006
Median Earnings by Sex and Age
This section compares EPUF estimates of median earnings with those presented in Supplement Table 4.B6. In turn, the discussion examines median earnings for all workers, workers by sex, all workers by age, and then workers by sex and age.
Table 5 compares the median earnings for all, male, and female workers for 1951–2006. The EPUF estimate for all workers is at least 98.44 percent of the Supplement estimate in all years, and in fact slightly exceeds the Supplement estimate for most years. The pattern for men is very similar. For women, the EPUF estimate is much lower than the Supplement estimate in many years; for nine in particular, the EPUF estimate represents less than 98 percent of the Supplement estimate.
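|Year||Supplement (dollars): all workers||Supplement (dollars): men||Supplement (dollars): women||EPUF (dollars): all workers||EPUF (dollars): men||EPUF (dollars): women||EPUF estimate as a percentage of Supplement estimate: all workers||EPUF estimate as a percentage of Supplement estimate: men||EPUF estimate as a percentage of Supplement estimate: women|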
|2004||22,342 a||27,074 a||18,427 a||22,500||27,200||18,500||100.71||100.47||100.40|
|2005||22,983 a||27,895 a||18,892 a||23,100||28,000||19,000||100.51||100.38||100.57|
|2006||23,832 a||28,916 a||19,586 a||24,000||29,100||19,700||100.70||100.64||100.58|
|SOURCES: SSA, Annual Statistical Supplement to the Social Security Bulletin, Table 4.B6, various editions; and author's calculations using EPUF.|
|a. 2008 Supplement estimates were based on preliminary data.|
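The percentage columns in Table 5 can be reproduced by dividing each EPUF median by the corresponding Supplement median. The following is a minimal Python sketch using only the all-worker values shown in the rows above; the function and variable names are illustrative and are not taken from SSA's programs.

```python
# Median earnings (dollars) for all workers, copied from the Table 5 rows above.
supplement_all = {2004: 22_342, 2005: 22_983, 2006: 23_832}
epuf_all = {2004: 22_500, 2005: 23_100, 2006: 24_000}


def epuf_as_pct_of_supplement(epuf: float, supplement: float) -> float:
    """Return the EPUF estimate as a percentage of the Supplement estimate."""
    return round(100 * epuf / supplement, 2)


# One ratio per year, rounded to two decimals as in the table.
ratios = {
    year: epuf_as_pct_of_supplement(epuf_all[year], supplement_all[year])
    for year in supplement_all
}
print(ratios)
```

A value above 100 means the EPUF median exceeds the Supplement median for that year.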
Chart 15 reveals the minimal differences between the EPUF and Supplement estimates of median earnings for all workers younger than age 60. Chart 16 shows much more variation for workers aged 60 or older. The greatest variation is for individuals aged 72 or older and it occurs during the same years that showed the greatest differences in the estimated numbers of individuals with earnings (Chart 8).
EPUF estimates of median earnings as a percentage of the Supplement estimate, all workers younger than age 60 by age group, 1960–2006
EPUF estimates of median earnings as a percentage of the Supplement estimate, all workers aged 60 or older by age group, 1960–2006
For men younger than age 60, EPUF median earnings estimates differ substantially from Supplement estimates for 1960–1973 (Chart 17). The greatest difference appears for 1965, when the EPUF estimate is approximately 74 percent of the Supplement estimate for those aged 35–49. How and why do such large differences occur, and why only from 1960 through 1973?18
EPUF estimates of median earnings as a percentage of the Supplement estimate, male workers younger than age 60 by age group, 1960–2006
Supplement Table 4.B6 alerts readers that "the amount of median earnings includes estimates above the taxable maximum." For 1951–1977, those estimates are based on data from earnings reports filed quarterly by employers. SSA began collecting "detailed" earnings information annually from W-2 forms in 1978, but earnings above the taxable maximum were still open to some conjecture through 1993.19 Starting in 1994, all covered earnings were subject to the Medicare payroll tax; thus, records reflected actual earnings, and estimates of amounts above the taxable maximum were no longer necessary.
If adjustments for earnings above the taxable maximum were made for all years from 1951 through 2006, why do the relatively large differences occur only from 1960 through 1973? Adjustments would affect median earnings only if the preadjustment median value equals or exceeds the taxable maximum. In other words, if the preadjustment median value is less than the taxable maximum, then the additional earnings would not affect the median.
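A small worked example may make this point concrete. The sketch below uses made-up earnings and an illustrative cap of $4,800; it is not SSA's estimation method, which the note describes as not readily available. It only shows that capping leaves the median unchanged when the median worker earns less than the cap and pulls the median down to the cap otherwise.

```python
from statistics import median

TAXABLE_MAX = 4_800  # illustrative cap; the earnings below are hypothetical


def cap(earnings, taxable_max=TAXABLE_MAX):
    """Cap each earnings amount at the taxable maximum."""
    return [min(e, taxable_max) for e in earnings]


# Case 1: the median worker earns less than the cap.
low = [1_000, 2_000, 3_000, 6_000, 9_000]
# Case 2: the median worker earns more than the cap.
high = [1_000, 2_000, 5_500, 6_000, 9_000]

low_uncapped, low_capped = median(low), median(cap(low))
high_uncapped, high_capped = median(high), median(cap(high))

print(low_uncapped, low_capped)    # identical: the cap never reaches the median
print(high_uncapped, high_capped)  # capped median sits at the taxable maximum
```

In the second case the capped median equals the taxable maximum, which is the situation in which one would expect an uncapped or adjusted median to be higher.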
The methodology SSA used through 1993 to estimate earnings above the taxable maximum is not readily available. However, we can determine whether EPUF estimates of median earnings appear reasonable for the years in which they are much lower than Supplement estimates. If EPUF's median earnings for men in certain age groups are greater than or equal to the taxable maximum in a given year, then we know that any earnings above the taxable maximum will affect the median. Table 6 presents EPUF estimates of median earnings for men as a percentage of the taxable maximum for 1960–1980.20 Shaded cells indicate the years in which one should expect the Supplement medians to be greater than the taxable maximum. Table 7 presents the median earnings values from Supplement Table 4.B6 expressed as a percentage of the taxable maximum; the same cells are shaded as those in Table 6, with one exception (1973, for individuals aged 35–39).
Tables 6 and 7 explain the findings in Chart 17. The EPUF estimates of median earnings for some age categories of male workers are much lower than the Supplement estimates because the EPUF estimates do not account for earnings above the taxable maximum, while the Supplement estimates do. The Supplement's adjustments for earnings above the taxable maximum increase the estimated median earnings for some age groups in years when the median value exceeds the taxable maximum. However, the median values fall below the taxable maximums for all age groups after 1973, and adjustments for earnings above the taxable maximum no longer affect median values. Thus, EPUF and Supplement median earnings estimates differ minimally from 1974 through 2006.
|Year||19 or younger||20–24||25–29||30–34||35–39||40–44||45–49||50–54||55–59||60–61||62–64||65–69||70–71||72 or older|
|SOURCE: Author's calculations using EPUF.|
|Year||19 or younger||20–24||25–29||30–34||35–39||40–44||45–49||50–54||55–59||60–61||62–64||65–69||70–71||72 or older|
|SOURCE: SSA, Annual Statistical Supplement to the Social Security Bulletin, various editions.|
Chart 18 presents EPUF estimates of median earnings as a percentage of Supplement estimates for male workers aged 60 or older and reveals more divergence than was seen for younger workers, especially for men aged 72 or older. Most of the variation occurs in the same years for which EPUF and Supplement estimates of number of workers vary.
EPUF estimates of median earnings as a percentage of the Supplement estimate, male workers aged 60 or older by age group, 1960–2006
Chart 19 shows minimal differences between EPUF and Supplement median earnings estimates for female workers younger than age 60. Chart 20 reveals much more variation between the estimates for female workers aged 60 or older, mirroring the pattern of divergence seen for older men.
EPUF estimates of median earnings as a percentage of the Supplement estimate, female workers younger than age 60 by age group, 1960–2006
EPUF estimates of median earnings as a percentage of the Supplement estimate, female workers aged 60 or older by age group, 1960–2006
Percentage of Workers with Earnings Below the Taxable Maximum
Supplement Table 4.B4 contains estimates of the percentage of all workers with earnings below the taxable maximum amount beginning in 1951. Table 8 compares the EPUF estimates with those found in the Supplement. Few of the estimates differ by more than one-tenth of a percentage point. For all workers, the largest difference between the estimates occurs for 1967, when the EPUF estimate is 0.6 percent greater than the Supplement estimate. The largest difference in the estimates of workers by sex is seen for 1969, when EPUF estimates are 1.7 percent higher for men and 1.0 percent lower for women than the Supplement estimates. Apart from these and scattered other modest differences, the EPUF and Supplement estimates scarcely differ.
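|Year||Supplement a: all workers||Supplement a: men||Supplement a: women||EPUF: all workers||EPUF: men||EPUF: women||EPUF estimate as a percentage of Supplement estimate: all workers||EPUF estimate as a percentage of Supplement estimate: men||EPUF estimate as a percentage of Supplement estimate: women|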
|2004||94.1 b||91.2 b||97.3 b||94.1||91.2||97.3||100.0||100.0||100.0|
|2005||93.9 b||91.0 b||97.1 b||93.9||90.9||97.1||100.0||99.9||100.0|
|2006||94.0 b||91.1 b||97.1 b||93.9||91.0||97.1||99.9||99.8||100.0|
|SOURCES: SSA (2009, Table 4.B4) and author's calculations using EPUF.|
|a. From 1937 to 1950, relates to wage and salary workers. Beginning in 1951, includes self-employed workers.|
|b. 2008 Supplement estimates were based on preliminary data.|
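As a rough illustration of the statistic compared in Table 8, the sketch below computes the share of workers whose earnings fall below a taxable maximum. The earnings values are hypothetical, the cap of $94,200 is used only as an illustrative value, and the function name is an assumption; with capped data such as EPUF's, workers at or above the maximum all show earnings equal to the cap.

```python
def pct_below_taxable_max(earnings, taxable_max):
    """Percentage of workers (positive earners) with earnings strictly below the cap."""
    workers = [e for e in earnings if e > 0]
    below = sum(1 for e in workers if e < taxable_max)
    return round(100 * below / len(workers), 1)


# Ten hypothetical workers with capped taxable earnings; three are at the cap.
sample = [12_000, 30_500, 48_000, 94_200, 94_200,
          61_000, 7_500, 22_000, 15_300, 94_200]

share = pct_below_taxable_max(sample, taxable_max=94_200)
print(share)
```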
This analysis compares the earnings data contained in EPUF with estimates published in the Annual Statistical Supplement to the Social Security Bulletin. The analysis presents four reasons why one should expect differences between the estimates beyond those due to the different sampling frames used to generate the respective 1-percent samples.
First, the Supplement estimates are based on taxable earnings, which can sum to more than the Social Security taxable maximum for multiple jobholders, whereas the EPUF reflects only capped taxable earnings. Second, EPUF data cleaning and disclosure prevention measures reduce the number of records with earnings and the amount of earnings reported. Third, the Supplement estimates reflect adjustments to account for delinquent, erroneous, and potentially fraudulent reporting of earnings information to SSA. Fourth, the Supplement updates only the three most recent years of estimates. As a result, older estimates are frozen and do not reflect any subsequent changes in the MEF. EPUF earnings data reflect the continuously updated MEF and contain the most recent earnings data reported to SSA.
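The first reason can be sketched with a hypothetical worker who holds two jobs. In the Python example below, each employer's taxable wages are limited to the taxable maximum separately, so the person's total can exceed the maximum, whereas the capped figure is limited to the maximum for the person as a whole. The wage amounts, the cap value, and the function names are assumptions made for illustration.

```python
def total_taxable(wages_by_employer, taxable_max):
    """Sum taxable earnings across employers; each employer applies the cap separately."""
    return sum(min(w, taxable_max) for w in wages_by_employer)


def capped_taxable(wages_by_employer, taxable_max):
    """Limit the person's combined taxable earnings to the taxable maximum."""
    return min(total_taxable(wages_by_employer, taxable_max), taxable_max)


TAXABLE_MAX = 94_200       # illustrative value
wages = [60_000, 50_000]   # hypothetical worker with two employers

uncapped = total_taxable(wages, TAXABLE_MAX)
capped = capped_taxable(wages, TAXABLE_MAX)
removed_by_capping = uncapped - capped

print(uncapped, capped, removed_by_capping)
```

The difference between the two totals corresponds to the kind of amount that capping removes from the underlying sample.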
The analysis began by comparing estimates of taxable earnings in the underlying EPUF sample with those in the Supplement. Those estimates proved very similar and supported the expectation that the biggest differences would be for the most recent years. Although there was some divergence between the estimates from the underlying EPUF sample and the Supplement estimates from 1980 through 1988, those differences were relatively minor.
In general, the other differences between EPUF and Supplement earnings estimates are relatively small after accounting for expected differences. The key differences are largely attributable to EPUF's use of capped taxable earnings and the removal of some records due to data cleaning and disclosure prevention procedures.
There were, however, two unexpected differences between EPUF and Supplement estimates: one in the estimated number of workers by age group and sex, and one in median earnings by age and sex. The distribution of individuals who have earnings when they are 72 years old or older in EPUF is nearly identical to that in the active file of the CWHS, indicating that EPUF represents the current state of the earnings data contained in the MEF, even though it differs from the Supplement estimates. Median earnings estimates differ because Supplement estimates are adjusted to account for estimated earnings above the taxable maximum and EPUF estimates are not.
EPUF's two linkable subfiles differ in structure and in the ways the data cleaning and disclosure protection procedures affect them. The demographic and aggregate earnings subfile uses a person-record format containing a single record for each of the 4,348,254 individuals in the EPUF. Each record contains the individual's EPUF identification code, year of birth, sex, aggregate taxable earnings from 1937 through 1950, aggregate quarters of coverage earned from 1937 through 1950, and quarters of coverage earned in 1951 and 1952. Appendix Table A-2 presents an illustrative listing of 20 hypothetical demographic and aggregate earnings subfile records. The annual earnings subfile is a vertical-event history file that contains a single record for each year with positive earnings for each person in EPUF. Every earnings-year record contains the individual's EPUF identification code, capped taxable earnings, and earned quarters of coverage. The annual earnings subfile contains 60,326,474 records for the 3,131,424 individuals who had at least 1 year of positive earnings from 1951 through 2006. Appendix Table A-3 presents an illustrative listing of 41 earnings years for four hypothetical earners.
Nearly 28 percent of the individuals in EPUF had no positive annual earnings from 1951 through 2006. These individuals have a record in the demographic and aggregate earnings subfile, but no records in the annual earnings subfile.
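The relationship between the two subfiles can be sketched as a link on the identification code. The Python example below uses a handful of hypothetical records laid out like Appendix Tables A-2 and A-3; the field names YOB, YEAR_EARN, ANNUAL_QTRS, and ANNUAL_EARNINGS follow those tables, while "ID" and "SEX" are assumed names for the identification code and sex fields.

```python
from collections import defaultdict

# Hypothetical person records (demographic and aggregate earnings subfile).
demographic = [
    {"ID": 1, "YOB": 1940, "SEX": 1},
    {"ID": 2, "YOB": 1955, "SEX": 2},
    {"ID": 3, "YOB": 1990, "SEX": 2},  # no positive earnings in any year
]
# Hypothetical earnings-year records (annual earnings subfile).
annual = [
    {"ID": 1, "YEAR_EARN": 1960, "ANNUAL_QTRS": 4, "ANNUAL_EARNINGS": 3_200},
    {"ID": 1, "YEAR_EARN": 1961, "ANNUAL_QTRS": 4, "ANNUAL_EARNINGS": 3_500},
    {"ID": 2, "YEAR_EARN": 1980, "ANNUAL_QTRS": 2, "ANNUAL_EARNINGS": 4_100},
]

# Group the earnings-year records by person.
earnings_by_id = defaultdict(list)
for rec in annual:
    earnings_by_id[rec["ID"]].append(rec)

# Attach each person's earnings history; people without records get an empty list.
linked = [
    {**person, "earnings": earnings_by_id.get(person["ID"], [])}
    for person in demographic
]
no_earnings_ids = [p["ID"] for p in linked if not p["earnings"]]

# Share of individuals with no annual records, from the counts reported in this note.
pct_without_earnings = round(100 * (4_348_254 - 3_131_424) / 4_348_254, 1)
print(no_earnings_ids, pct_without_earnings)
```

Individuals who appear only in the person-level list correspond to those with no positive annual earnings from 1951 through 2006.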
The MEF 1-percent sample underlying the EPUF contained records for 4,377,024 individuals. Data "cleaning" led to the removal of records for 28,770 individuals; those records had dubious or missing year-of-birth values, coding errors, or other issues. Then, to protect against disclosure of personal data, earnings records for individuals aged 14 or younger or 86 or older were zeroed out, effectively removing those records from the annual earnings subfile (because the subfile only contains records with positive earnings values). However, setting those earnings equal to $0 does not affect the individual's record in the demographic and aggregate earnings subfile.
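The age-based disclosure rule can be sketched as a filter on the annual earnings records. The Python example below follows the case described in note 11, a hypothetical individual with earnings in every year from age 12 through age 62. It assumes that age is computed as the earnings year minus the year of birth, which this note does not state explicitly, and the function name is illustrative.

```python
MIN_AGE, MAX_AGE = 15, 85  # earnings at ages 14 or younger, or 86 or older, are zeroed out


def apply_age_rule(yob, annual_records):
    """Keep only earnings-year records for ages 15 through 85.

    Assumption: age is the earnings year minus the year of birth.
    """
    kept = []
    for rec in annual_records:
        age = rec["YEAR_EARN"] - yob
        if MIN_AGE <= age <= MAX_AGE:
            kept.append(rec)
    return kept


yob = 1940  # hypothetical year of birth
records = [
    {"YEAR_EARN": yob + age, "ANNUAL_EARNINGS": 1_000}
    for age in range(12, 63)  # ages 12 through 62
]
kept = apply_age_rule(yob, records)

print(len(records), len(kept), kept[0]["YEAR_EARN"])
```

Only the three records for ages 12, 13, and 14 drop out; the person's entry in the demographic and aggregate earnings subfile would be unaffected.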
|Year||Total taxable earnings in underlying EPUF sample (millions of dollars)||Capping (earnings exceeding the taxable maximum): millions of dollars||Capping (earnings exceeding the taxable maximum): as a percentage of taxable earnings||Data cleaning and disclosure prevention: millions of dollars||Data cleaning and disclosure prevention: as a percentage of taxable earnings||Total earnings removed from underlying EPUF sample: millions of dollars||Total earnings removed from underlying EPUF sample: as a percentage of taxable earnings|
|SOURCE: Author's calculations using underlying EPUF sample.|
|ID number||Year of birth (YOB)||Sex a||Total covered earnings ($) (TOT_COV_EARN3750)||Quarters of coverage 1937–1950 (QC3750)||Quarters of coverage 1951–1952 (QC5152)|
|SOURCE: Author's reconstruction based on EPUF.|
|a. 1 = male, 2 = female.|
|ID number||Year with earnings (YEAR_EARN)||Quarters of coverage (ANNUAL_QTRS)||Capped taxable earnings ($) (ANNUAL_EARNINGS)|
|SOURCE: Author's reconstruction based on EPUF.|
1 For an introduction to the EPUF, see Compson (2011).
2 See appendix for more details on the structure of the two subfiles.
3 The CWHS is a longitudinal database produced by ORES for internal research and statistical purposes. SSA is authorized to share the CWHS with the Treasury Department's Offices of Economic Policy and Tax Analysis and with the Congressional Budget Office. For more information about the CWHS, see Buckler (1988) and Smith (1989).
4 For information on the MEF, see Olsen and Hudson (2009).
5 Some Supplement tables are based on CWHS annual files. However, this analysis examines Supplement earnings tables based on the MEF 1-percent sample, an extract of earnings data from the MEF summary segment using the CWHS sampling frame, which selects a random sample of records based on certain serial digits of the SSN.
6 Although other circumstances may account for records with taxable earnings above the taxable maximum, the vast majority of cases involve earnings from multiple employers.
7 Capping earnings at the taxable maximum in a given year eliminates the need to top-code this data field.
8 Posting all the information from the W-2s and selected information from Schedule SE is a massive annual undertaking. The MEF is continuously updated as additional W-2 and Schedule SE information is reported, or previously reported earnings are corrected. For more details, see Olsen and Hudson (2009).
9 For example, suppose that the total amount of taxable earnings on the MEF was $98 and the expected total amount of earnings posted to the MEF was $100. The adjustment factor in this case would be 1.0204082. ORES would multiply the aggregate earnings on the MEF for this tax year by the adjustment factor to generate an estimate of $100 in the Supplement.
10 For more details, see Compson (2011).
11 For example, if an individual has annual earnings in each year between ages 12 and 62, the EPUF earnings records would reflect $0 for ages 12, 13, and 14, and all of the individual's other earnings records would remain unchanged.
12 Supplement figures cited in this note are primarily from the 2008 edition, the most recent Supplement consulted for this analysis.
13 As noted earlier, Supplement estimates make use of OCACT adjustment factors for the number of workers and the amount of earnings reported to the ESF. Those adjustment factors are beyond the scope of this analysis.
14 Appendix Table A-1 contains the data for Chart 3.
15 Taxable maximums for 1979 through 1981 were set by legislation. Those for 1990 through 1992 were set using a transitional rule. See SSA (2011, Tables 2.A3 and 2.A18).
16 Supplement Tables 4.B3, 4.B5, and 4.B6 present annual data only for the most recent years. Data for prior periods are shown only for selected years—specifically, for 1937 and then at 5-year intervals from 1940 until annual coverage begins. Therefore, beginning with Table 3, most of this note's charts and tables draw data from various editions of the Supplement, always using the most recent edition that presented data for a particular year.
17 There is no overlap between the individuals removed from the file due to dubious age values and those whose earnings at ages 86 or older were zeroed out. See appendix for more information.
18 As previously noted, the age subcategories used in the Supplement for individuals aged 60 or older are not consistent throughout the 1951–2006 period. For that reason, the analysis is limited to 1960 through 2006.
19 The cap for covered earnings subject to the Medicare payroll tax was higher than the taxable maximum for the Social Security program from 1991 through 1993, and was removed altogether in 1994.
20 The period 1960–1980 contains the only years in which men's median taxable earnings in EPUF are equal to the taxable maximum.
Buckler, Warren. 1988. "Commentary: Continuous Work History Sample." Social Security Bulletin 51(4): 12, 56.
Compson, Michael. 2011. "The 2006 Earnings Public-Use Microdata File: An Introduction." Social Security Bulletin 71(4): 33–59.
Olsen, Anya, and Russell Hudson. 2009. "Social Security Administration's Master Earnings File: Background Information." Social Security Bulletin 69(3): 29–45.
Smith, Creston M. 1989. "The Social Security Administration's Continuous Work History Sample." Social Security Bulletin 52(10): 20–28.
[SSA] Social Security Administration. 2002. Congressional Response Report: Status of the Social Security Administration's Earnings Suspense File. Report No. A-03-03-23038. Baltimore, MD: SSA, Office of the Inspector General.
———. 2009. Annual Statistical Supplement to the Social Security Bulletin, 2008. Washington, DC: SSA, ORES.
———. 2011. Annual Statistical Supplement to the Social Security Bulletin, 2010. Washington, DC: SSA, ORES.