Survey Estimates of Wealth: A Comparative Analysis and Review of the Survey of Income and Program Participation

by John L. Czajka, Jonathan E. Jacobson, and Scott Cody
Social Security Bulletin, Vol. 65, No. 1, 2003/2004 (released May 2004)

This is the executive summary of the report prepared under contract for the Social Security Administration (contract no. 0600-01-60121/0440-02-51976) by John L. Czajka, Jonathan E. Jacobson, and Scott Cody of Mathematica Policy Research, Inc. For further information or to obtain an electronic copy of the full report, contact Alexi Strand (e-mail: alexander.strand@ssa.gov).

Contents of this publication are not copyrighted; any items may be reprinted, but citation of the Social Security Bulletin as the source is requested. The findings and conclusions presented in the Bulletin are those of the authors and do not necessarily represent the views of the Social Security Administration.

The Office of Research, Evaluation, and Statistics (ORES) within the Social Security Administration (SSA) relies on data from the Census Bureau's Survey of Income and Program Participation (SIPP) for a variety of applications. Data on wealth are important in these applications. Earlier comparisons of SIPP estimates of wealth with those from other surveys—namely, the Survey of Consumer Finances (SCF) and the Panel Study of Income Dynamics (PSID)—identified a number of shortcomings in the SIPP data. These shortcomings mostly affected the survey's estimates of high-income families and the types of assets that such families hold disproportionately. More recently, however, SIPP estimates of median wealth have shown little change over a period of time when the SCF has shown a marked increase. This has raised concern that continued use of SIPP data for ORES applications may require some form of adjustment of the wealth data, if not their outright replacement by one or more other sources. This report compares SIPP estimates of wealth with estimates developed from the SCF and the PSID, seeks to attribute the observed disparities to differences in survey design and implementation, explores ways to improve the quality of the SIPP estimates for the most relevant subpopulations, and presents recommendations regarding both the use and production of SIPP wealth data.

Comparative Estimates of Wealth

Each of the three surveys is ultimately intended to represent the entire noninstitutionalized population, but each collects data from a different unit of observation. The SCF collects its most detailed data on the "primary economic unit," which includes the economically dominant individual or couple and all others who are financially dependent. The SCF collects very limited data on the collective remaining individuals in the household. The SIPP collects wealth data from each adult member (15 and older) of the sample household. With these data it is possible to construct alternative units of analysis. We constructed SIPP family units that mimic the SCF primary economic unit. The PSID collects data from families, using a concept of economic dependence like the SCF to determine which related persons living together constitute a family. To produce PSID wealth estimates for a universe that matches that of the SCF and SIPP, we limited the PSID families to those that were likely to include the household head. Most of the estimates presented in this report are from the 1998 SCF, the 1999 PSID, and wave 9 of the 1996 SIPP panel, which has a reference period covering late 1998 and early 1999.

Overall Wealth

Wealth, or net worth, is defined as total assets less total liabilities. The SIPP estimate of aggregate net worth, at $14.4 trillion, is just under half of the SCF estimate of $29.1 trillion and 60 percent of the PSID estimate. The SIPP estimate of median net worth, $48,000, is two-thirds of the SCF median of $71,800 and 74 percent of the PSID median.

With the detail captured in the SIPP and the SCF, it is possible to separate assets from liabilities. The SIPP estimate of aggregate assets is 55 percent of the SCF estimate of $34.1 trillion, but its estimate of aggregate liabilities is 90 percent of the SCF estimate of $5.0 trillion. The SIPP estimate of median assets is 83 percent of the SCF median of $116,500 while its estimate of median liabilities is 97 percent of the SCF median of $11,900. By estimating liabilities so much better than assets, the SIPP reduces its estimate of net worth significantly.
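The arithmetic behind this last point can be checked with a short sketch. It uses only the rounded figures quoted above, and the variable names are illustrative. Because the inputs are rounded, the implied SIPP net worth (roughly $14.3 trillion) differs slightly from the published $14.4 trillion.

```python
# Rounded figures from the text, in trillions of dollars.
scf_assets = 34.1
scf_liabilities = 5.0
scf_net_worth = scf_assets - scf_liabilities  # 29.1

sipp_asset_ratio = 0.55      # SIPP assets as a share of SCF assets
sipp_liability_ratio = 0.90  # SIPP liabilities as a share of SCF liabilities

sipp_assets = sipp_asset_ratio * scf_assets
sipp_liabilities = sipp_liability_ratio * scf_liabilities
sipp_net_worth = sipp_assets - sipp_liabilities

net_worth_ratio = sipp_net_worth / scf_net_worth

# Hypothetical counterfactual: if liabilities were captured at the same
# 55 percent rate as assets, net worth would also be 55 percent of the SCF.
counterfactual_ratio = (
    sipp_asset_ratio * scf_assets - sipp_asset_ratio * scf_liabilities
) / scf_net_worth

print(f"SIPP net worth: {sipp_net_worth:.1f} trillion")
print(f"Share of SCF net worth: {net_worth_ratio:.0%}")
print(f"Share if liabilities matched the asset ratio: {counterfactual_ratio:.0%}")
```

Net worth is a difference, so capturing nearly all liabilities while capturing only about half of assets pulls the net worth ratio below the asset ratio.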

Wealthy Families

Wealth is highly concentrated. Estimates from the SCF indicate that the wealthiest one percent of families own a third of all wealth in the United States. The SIPP's estimate of aggregate assets is much weaker than its estimate of median assets because the SIPP underestimates both the number of wealthy families and their average wealth. The SIPP's use of topcoding contributes to this shortfall by removing assets from wealthy sample members.

Excluding assets and liabilities not measured in the SIPP, the proportion of SCF families with net worth above $1 million, or 3.8 percent, is two-and-a-half times the SIPP proportion, and the fraction with net worth above $2 million, or 1.7 percent, is five times the SIPP fraction. When families with net worth of $2 million or more are excluded from both surveys, the SIPP estimate of aggregate net worth is 75 percent of the comparable SCF estimate; aggregate assets are 80 percent of the SCF estimate; and aggregate liabilities are 101 percent of the SCF estimate.

Components of Wealth

As a proportion of the corresponding SCF estimate, the SIPP's estimates of aggregate assets exhibit wide variation by type. The SIPP's estimate of the value of the home is 91 percent of the SCF estimate, but the SIPP captures only 41 percent of the SCF valuation of other real estate. The SIPP also captures 76 percent of the SCF estimate of motor vehicles but only 17 percent of SCF business equity. Among financial assets, the SIPP estimate of 401(k) and thrift accounts is 99 percent of the SCF estimate, but the next best component, other financial assets, is only 71 percent of the SCF estimate. For assets held at financial institutions, the SIPP estimate is 63 percent of the SCF estimate. For stocks and mutual funds, the largest financial asset, the SIPP estimate is only 59 percent of the SCF estimate while the SIPP estimate of IRA and Keogh accounts is 55 percent of the SCF estimate. Lastly, the SIPP estimate of other interest earning assets is only 33 percent of the SCF amount.

If we remove families with net worth of $2 million or more, the SIPP estimates of aggregate assets by type draw closer to the SCF estimates by varying amounts, reflecting differences in their distribution. The SIPP estimates of own home, 401(k) and thrift plans, and other financial assets equal or exceed the SCF estimates while the SIPP estimate of motor vehicles reaches 82 percent of the SCF estimates. Stocks and mutual funds improve to 84 percent of the SCF estimate while the remaining financial assets and other real estate rise to between 74 and 79 percent of the SCF estimates. Business equity remains lowest at 50 percent of the SCF estimate.

We can decompose the difference between the SIPP and SCF aggregate assets into four components. Underestimation of the assets of the wealthy accounts for 72 percent of the total difference. Assets not measured in the SIPP, excluding those reported by the wealthy, account for 13 percent. Underestimation of business equity for the nonwealthy is 5 percent of the total difference while the underestimation of all remaining assets accounts for 10 percent.

Even with the wealthiest families included, SIPP estimates of aggregate liabilities by type generally lie close to the SCF estimates. Home mortgages dwarf all other liabilities with an aggregate value five times that of the next largest component, and the SIPP estimate is 95 percent of the SCF amount. The SIPP estimates of three other components exceed the SCF estimates while loans from financial institutions are 73 percent of the SCF estimate. Mortgages on rental property and the debt held in margin and broker accounts are the only components estimated poorly by the SIPP; their estimates are 42 and 30 percent of the respective SCF amounts. A decomposition of the difference in the two surveys' estimates of liabilities is not meaningful because aggregate agreement is so high.

The PSID as a Benchmark

Comparing SIPP estimates of the components of wealth with estimates from the SCF may provide the most rigorous test of their quality in most cases, but as a measure of what may be attainable with a general household survey such as the SIPP, the SCF sets the bar too high, at least for assets. Although the PSID does not provide the same detailed breakdown of assets and liabilities as the SIPP, it may provide more appropriate benchmarks for those components that line up well with the SIPP.

For checking and savings accounts the SIPP aggregate is 79 percent of the PSID aggregate, and for equity in stocks and mutual funds the SIPP aggregate is 72 percent of the PSID aggregate. The SIPP estimate of the equity value of other real estate is only 46 percent of the PSID estimate, and the SIPP estimate of business equity is only 22 percent of the PSID estimate. All of these findings suggest that significant improvement in the SIPP is feasible.

The PSID is not helpful for retirement assets, but the PSID confirms that the SIPP estimate of the value of the family's own home is very strong: the SIPP aggregate is 94 percent of the PSID amount. Comparisons involving the two liabilities distinguished in the PSID—home mortgages and unsecured liabilities—show exceedingly high agreement (and with the SCF as well). This further confirms that survey respondents are able to provide good data on their debts.

The findings for vehicles suggest that the methodology used in the SIPP and the SCF (which assign a blue book value based on reported make, model and year) is better than the PSID approach, which asks respondents to estimate the equity value of their vehicles. Respondents appear to overestimate what their vehicles are worth.

Ownership of Assets and Liabilities

SIPP estimates of particular components of wealth could be low because too few respondents report owning such components or because those who do report ownership do not report their full amounts. In general, SIPP ownership rates lag behind SCF ownership rates whenever there are differences in aggregate amounts that cannot be explained by differences in the surveys' estimates of wealthy families. A few examples are particularly notable. First, SIPP families underreport their ownership of checking and savings accounts, IRAs and Keogh accounts, and real estate other than the home, but the median amounts for families that do report such assets are similar between the two surveys. Second, other financial assets show a 2 percent ownership rate in the SIPP compared to 10 percent in the SCF, yet the conditional median in the SIPP is much higher than in the SCF. This suggests that the SIPP respondents are reporting only their more valuable assets in contrast to the SCF respondents, who were prompted with a lengthy list of examples. Third, for business equity, the SCF ownership rate is 50 percent higher than the SIPP rate and the SCF median value is three times as high, which suggests that the businesses not being reported by SIPP respondents are exceptionally valuable.

Change in Estimates of Wealth Over Time

Findings from the four SCFs conducted from 1992 through 2001 document an impressive and broad-based growth in wealth holdings after the nation emerged from recession. This raises two questions. First, does the SIPP capture the trends in wealth holdings revealed in the SCF, even though the SIPP's estimates of the levels of wealth holdings may be low? Second, is there any evidence of deterioration in the quality of the SIPP's estimates of wealth between the early 1990s panels and the 1996 panel?

Growth in Aggregate Assets

The SIPP tracks the SCF exceedingly well in the growth of aggregate assets by type. Between 1993 and 1999, assets in the SIPP grew by 39 percent after adjustment for inflation while SCF assets grew by 43 percent. SIPP financial assets grew by 81 percent compared to 78 percent for the SCF. SIPP property assets grew by 25 percent versus 24 percent in the SCF. Of the other assets measured in the SIPP, only vehicles failed to match the growth rate recorded in the SCF, increasing by just 8 percent compared to 40 percent in the SCF.

Comparative Trends in the Distribution of Wealth

The similarities in SIPP and SCF trends in aggregate assets mask important differences in trends throughout the distribution. When asset components not measured in the 1992 SIPP panel are excluded from the 1992 SCF, the SIPP and SCF median assets are nearly identical, and the SIPP estimates of the 40th to the 80th percentiles are within five percentage points of the SCF estimates. Between 1992 and 1998, however, the gap between the SIPP and SCF estimates increased at every decile below the 90th percentile. In contrast to this, the SIPP and SCF liabilities stayed in close agreement.

The relationship between the two surveys' trends in net worth is more complex. Families with zero or negative net worth grew from 13 percent to 17 percent of the population in the SIPP but remained at 13 percent in the SCF. SIPP estimates of net worth below the 50th percentile declined in constant dollars whereas the SCF estimates grew at percentiles 20 and above. Most notably, the SIPP's estimate of the 20th percentile of net worth fell to 25 percent of the SCF value after having been 72 percent; and SIPP median net worth remained unchanged while the SCF median grew by 14 percent. SIPP net worth grew between the 50th and 90th percentiles but did so more slowly than the SCF. At the 90th percentile and above, however, SIPP growth in net worth matched or even exceeded the growth in SCF net worth.

Trends Within the SIPP

Adding 1995 data from the 1993 SIPP panel and 1997, 1998, and 2000 data from the 1996 panel yields clear evidence of a disjuncture between the 1992/1993 panels and the 1996 panel. While the earlier panels provide evidence of growth in net worth at every decile, this growth is reversed between 1995 and 1997 at percentiles 60 and lower. Percentile values then remain flat or decline through at least 1999. Assets show this same pattern at percentiles 30 and lower but grow at percentiles 40 through 90, consistent with the earlier panels. Liabilities show little or no growth at any decile between 1993 and 1995 but shift abruptly between 1995 and 1997 at every decile. They grow modestly after that.

Correlation Between Assets and Liabilities

The most striking evidence that "something" happened between the 1993 and 1996 SIPP panels is found in the correlation between assets and liabilities. In both the earlier SIPP panels the correlation between assets and liabilities was .49, compared to the 1992 SCF estimate of .50. With the 1996 SIPP panel this correlation dropped precipitously and became very unstable, with values ranging from .06 to .19 over the four waves. The correlation in the 1998 SCF was only moderately lower than in 1992 at .40.
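For readers who want to reproduce this kind of diagnostic on their own extract, the following is a minimal sketch of a weighted correlation between family-level assets and liabilities. It assumes an ordinary Pearson correlation computed with survey weights; the summary does not state the exact formula used in the report, and the function name and arguments are hypothetical.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with weights w."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    mean_x = np.sum(w * x)
    mean_y = np.sum(w * y)
    cov_xy = np.sum(w * (x - mean_x) * (y - mean_y))
    var_x = np.sum(w * (x - mean_x) ** 2)
    var_y = np.sum(w * (y - mean_y) ** 2)
    return cov_xy / np.sqrt(var_x * var_y)

# Illustrative (made-up) family records: total assets, total liabilities,
# and a survey weight for each family.
assets = [5_000, 40_000, 120_000, 300_000]
liabilities = [2_000, 15_000, 60_000, 90_000]
weights = [1_200, 900, 1_100, 800]
print(round(weighted_corr(assets, liabilities, weights), 2))
```

Running the same calculation wave by wave would show whether a correlation is stable within a panel, which is the pattern at issue in the 1996 panel.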

Subpopulations

Each of ORES's uses of SIPP wealth data is in the context of a specific target population, so it is important to ask how the SIPP varies with respect to the quality of its measurement of wealth across key subpopulations.

Demographic and Economic Differentials

The SIPP shows stronger differentials than the SCF in median net worth by age, race, and income below 400 percent of poverty. For assets and particularly liabilities, the differentials are generally very similar between the two surveys.

Key Subpopulations

We identified 10 subpopulations that are of potential interest to SSA for policy analysis or for better understanding the strengths and limitations of SIPP wealth data. Four subpopulations are defined by income in relation to poverty. Another six subpopulations consist of families with an elderly head or spouse, a head nearing retirement, a prime working-age head (30 to 60), an aged head or spouse receiving Social Security benefits, a nonaged head or spouse receiving such benefits, and a nonaged disabled head or spouse. SIPP's strength in sample size is evident in the sample counts for these subpopulations. For example, the SIPP has more than 2,000 sample families with a nonaged disabled head or spouse whereas the SCF has fewer than 200 and the PSID only 368. Similarly, the SIPP has more than 10,000 low-income families compared to 1,100 for the SCF and 2,100 for the PSID.

Assets measured in the SCF but not the SIPP can explain much of the difference between the surveys' estimates of subpopulation aggregates. To examine the impact of these non-SIPP assets more directly, we subtracted their mean values from the SCF mean net worth to create an adjusted SCF mean. Wealthy families ($2 million and up) were excluded. For the low-income subpopulation and the nonaged Social Security beneficiary and disabled subpopulations, the SIPP means match the adjusted SCF means. For all but one of the other subpopulations the SIPP means range from 87 to 94 percent of the adjusted SCF means. The exception is families with prime working-age heads, for which the SIPP mean is 78 percent of the corresponding adjusted SCF mean. These results support the use of SIPP data to analyze the wealth of these subpopulations, and they make a strong case for expanding SIPP data collection to capture the major components that are currently omitted.

Sources of Error in Measured Wealth

Under-representation of High-income Families

Compared to both the SCF and the Current Population Survey (CPS), the SIPP under-represents families above $300,000 by two-thirds, families between $150,000 and $300,000 by at least one-third, and families between $90,000 and $150,000 by at least 12 percent. Topcoding in the SIPP might shift some families from the top group to the next, but the CPS uses similar topcodes. Differential attrition does not explain the shortage of high-income families either. A surprising feature of the SIPP weights is their uniformity over the income distribution, which implies that families at all income levels are weighted up to offset the missing high-income families. Reweighting the SIPP sample to reproduce the SCF income distribution improves the SIPP wealth distribution only slightly. Responding families may have less income and less wealth than the nonresponding families that they are being reweighted to represent.
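The reweighting described here can be sketched as a simple post-stratification: family weights are scaled within each income class so that the weighted class shares match a target distribution. This is only an illustration under stated assumptions. The class labels, target shares, and function name are hypothetical, and the report's actual reweighting procedure may differ.

```python
import numpy as np

def reweight_to_target(classes, weights, target_shares):
    """Scale weights within each income class to hit target weighted shares.

    classes: income class label for each family
    weights: original survey weight for each family
    target_shares: dict mapping class label to target share (summing to 1)
    """
    classes = np.asarray(classes)
    weights = np.asarray(weights, dtype=float)
    missing = set(classes.tolist()) - set(target_shares)
    if missing:
        raise ValueError(f"no target share for classes: {sorted(missing)}")
    total = weights.sum()
    new_weights = weights.copy()
    for label, share in target_shares.items():
        mask = classes == label
        current = weights[mask].sum()
        if current == 0:
            raise ValueError(f"no sample families in class {label!r}")
        # Every family in the class gets the same adjustment factor.
        new_weights[mask] = weights[mask] * (share * total / current)
    return new_weights

# Made-up example: the sample under-represents the "high" class.
classes = ["low", "low", "mid", "high"]
weights = [100, 100, 100, 50]
targets = {"low": 0.5, "mid": 0.3, "high": 0.2}   # hypothetical SCF shares
print(reweight_to_target(classes, weights, targets))
```

Because every family in a class receives the same factor, this adjustment cannot help if the responding high-income families hold less wealth than the nonrespondents they are weighted up to represent, which is the limitation noted above.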

Coverage and Content

Assets that are measured in the SCF but not the SIPP include: the value and debt associated with vehicles beyond three per family, the balance in defined contribution pension accounts other than 401(k) and thrift accounts (collected once in a separate module, see next section), the cash value of life insurance, and "other" assets, consisting primarily of annuities and trusts. Liabilities measured in the SCF but not the SIPP are more limited: just personal business debt and other secured debt. Collectively, these items account for about 10 percent of the SCF estimate of aggregate net worth. With these items removed, the SIPP estimate of aggregate or mean net worth is 55 percent of the SCF estimate (versus 50 percent when these items are included).

Assets and liabilities that the SIPP measures but with very limited success include: interest earning assets besides those held at financial institutions, all other real estate besides the family's main home, business equity, and mortgage debt on rental property. Collectively, these items account for $9.6 trillion of the SCF estimate of aggregate net worth but only $2.5 trillion of the SIPP estimate of aggregate net worth. If these items are removed from both surveys, the SIPP estimate of aggregate or mean net worth is 72 percent of the SCF estimate.

On the whole, the non-SIPP items that are included in the SCF increase the estimated value of net worth throughout most of the distribution by a greater margin than they increase aggregate net worth. And they add proportionately more net worth to the lower half of the distribution than to the upper half. In contrast, the items that the SIPP measures relatively poorly are concentrated in the upper regions of the net worth distribution and have a much bigger impact on aggregate net worth than on most of the distribution.

Other Pension Data in the SIPP

The annual wealth module in the 1996 SIPP panel captures 401(k) and thrift account holdings but does not capture other pension wealth. Additional data on retirement accounts were collected in wave 7—separately from the annual wealth module. The wave 7 data duplicate the 401(k) and thrift account data collected in the wealth module but also capture defined contribution pension plans. We found that the wave 7 module captured as much pension wealth as the SCF.

Negative and Zero Net Worth

The proportion of families with no assets and no liabilities is 4.3 percent in wave 9 of the 1996 SIPP panel and 2.4 percent in the 1998 SCF. Other MPR [Mathematica Policy Research] research suggests a possible explanation for this difference: respondents lose interest in the survey and provide less and less information, which may culminate in attrition. We find some support for this thesis. One-quarter of families with zero net worth in wave 9 did not respond to the survey a year later, and one-half continued to report no assets or liabilities. Attrition was marginally lower among families with negative or low positive net worth in wave 9, but it was less than half as high among families with higher reported net worth.

About 11 percent of SIPP families and 8 percent of SCF families have negative net worth. The SIPP families often have combinations of assets and liabilities that are rare among SCF families with negative net worth. In particular, the SIPP families are much more likely to have low assets and high liabilities, and they have higher assets and higher liabilities generally. These patterns are consistent with the low correlation between assets and liabilities reported earlier.

Item Nonresponse

Item nonresponse to the SIPP wealth questions is very high, with 20 to 60 percent of the nonzero amounts being imputed. While the most common assets and liabilities have imputation rates that tend toward the low end, more than half of the amounts for stocks and mutual funds—the second largest asset in the SIPP—are imputed. In contrast to the SCF's state-of-the-art imputation methods, the Census Bureau applies the same hot deck procedure that it uses to impute items with much lower nonresponse rates. In the 1996 panel the correlation between assets and liabilities among families with particular combinations of imputed values is weaker than it is among the remaining families. A limited analysis found no evidence of this in the 1992 SIPP panel. Not taking account of reported liabilities when imputing assets, and vice versa, could explain the 1996 panel result. But unless the imputation methodology changed in some critical way between the two panels, the 1992 panel finding contradicts this interpretation.

Response Brackets

Less effective use of range responses could be a factor in the SIPP's generally low estimates of assets. The response brackets used in the SIPP to collect ranges from respondents who could not provide exact amounts generally do not match the distributions very well. The PSID often provides three brackets above the median while the SIPP usually provides only one.

Vehicles

Like the SCF, the SIPP uses an industry "blue book" to assign values to vehicles based on the reported make, model, and year. This is a proven methodology, but the Census Bureau relies on a reference book that extends back only seven years. While there exists a blue book for older cars, the Census Bureau assigned values to older cars in the 1996 panel based entirely on the reported year. Every car with the same model year was assigned the same value, regardless of make and model. The source of these values is not evident, but with decreasing model year (or increasing age) the values are progressively lower than the average blue book values assigned in the SCF. With as many as half of all cars being older than seven years, this method of assigning values has a pronounced negative effect on the quality of the SIPP vehicle data. Imputations were also based solely on model year. If only the model year was reported, the mean value for that model year was assigned. If no year was reported, a single value representing a multi-year average was assigned, even if the make and model were reported. These primitive imputations further weakened the SIPP estimates of a widely held asset.
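The contrast between mean value imputation and an imputation that preserves a distribution can be illustrated with a small sketch. The donor values are made up, the function names are hypothetical, and the random donor draw within a model-year cell is just one possible distribution-preserving method, not the procedure used by the Census Bureau or the SCF.

```python
import random
from statistics import mean

def impute_mean(donors_by_year, year):
    """Assign the mean reported value for the model year (no variation)."""
    return mean(donors_by_year[year])

def impute_hot_deck(donors_by_year, year, rng):
    """Assign the value of a randomly drawn donor from the same model year."""
    return rng.choice(donors_by_year[year])

# Made-up blue book values for vehicles with a known make and model.
donors_by_year = {1990: [1000.0, 2000.0, 3000.0]}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
mean_imputed = [impute_mean(donors_by_year, 1990) for _ in range(5)]
deck_imputed = [impute_hot_deck(donors_by_year, 1990, rng) for _ in range(5)]

print(mean_imputed)  # every imputed vehicle gets the same value
print(deck_imputed)  # imputed values vary across the donor distribution
```

Mean imputation leaves the aggregate roughly intact but compresses the spread of values, which matters for any analysis of the distribution of vehicle wealth.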

Adjusting the SIPP Database for SIPP-SCF Differences in the Level and Distribution of Assets

We applied reweighting based on income and a method of "recoding" based on econometric models to adjust the SIPP distributions of six types of assets so that they more closely resemble the distributions in the SCF. The objective of the recoding was to estimate what outcomes would have been reported had the SIPP families been surveyed in the SCF instead. Recoding addresses differences in survey content and administration but not sample composition.

Recoding

For each of six assets we estimated four equations predicting: (1) the presence of the asset in the SCF, (2) the presence of the asset in the SIPP, (3) the asset value in the SCF, and (4) the asset value in the SIPP. For the SIPP equations we calculated standardized residuals. We then used the equations estimated from the SCF, the observed characteristics of each SIPP family, and the SIPP residuals to generate predictions of the presence and amount of assets. We recoded the observed SIPP values by replacing them with these predicted values, which assume that the SIPP family was observed in the SCF with its SIPP characteristics and residuals.
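The value step of this procedure, equations (3) and (4), can be sketched as follows. The sketch assumes ordinary least squares on a transformed asset value (for example, its logarithm) and the same covariates in both surveys. The summary does not give the functional form, so these choices and all names are assumptions. The presence equations (1) and (2) would need a binary-outcome model and are omitted here.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y ~ intercept + X; return coefficients and residual std. dev."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma = resid.std(ddof=X1.shape[1])  # degrees-of-freedom correction
    return beta, sigma

def recode_values(X_sipp, y_sipp, X_scf, y_scf):
    """Predict what each SIPP family would have reported in the SCF.

    Keeps each family's own characteristics and standardized residual,
    but applies the coefficients and residual scale estimated on the SCF.
    """
    X_sipp = np.asarray(X_sipp, dtype=float)
    y_sipp = np.asarray(y_sipp, dtype=float)
    beta_sipp, sigma_sipp = fit_ols(X_sipp, y_sipp)
    beta_scf, sigma_scf = fit_ols(np.asarray(X_scf, dtype=float),
                                  np.asarray(y_scf, dtype=float))
    X1 = np.column_stack([np.ones(len(y_sipp)), X_sipp])
    z = (y_sipp - X1 @ beta_sipp) / sigma_sipp   # standardized SIPP residual
    return X1 @ beta_scf + z * sigma_scf         # recoded value
```

Carrying the standardized residual across keeps each family's relative position within its group, so a family that reported an unusually high value in the SIPP also receives an unusually high recoded value.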

Retirement Assets

Reweighting the SIPP database reduced the SIPP-SCF gap in total retirement assets by 23 percent. Recoding topcoded values reduced the gap an additional 18 percent. Replacing imputed values with recoded values widened the gap slightly. Recoding all remaining values reduced the gap by another three-fifths, leaving less than 3 percent of the original gap. These findings imply that SIPP-SCF differences in the non-reporting or underreporting of retirement assets are largely due to differences in survey content and administration instead of sample composition. These results are consistent with our findings that most of the difference between SIPP and SCF estimates of retirement assets is due to defined contribution pensions, which are not measured in the SIPP wealth module.

Non-retirement Assets

Reweighting and recoding were much less successful for total non-retirement assets than for retirement assets, leaving more than two-fifths of the original SIPP-SCF gap. The effectiveness of reweighting and recoding varied across major types of non-retirement assets. The comparatively small percentage gap for owner-occupied housing was reduced very little by the adjustments, while the proportionately larger but small dollar gaps for checking and savings accounts and motor vehicles were reduced by one-third and two-thirds, respectively. The large gap for other non-retirement assets was reduced by two-fifths. The remaining gap for total non-retirement assets appears to be due to systematic differences in the characteristics of families in the two surveys—in particular, the substantially better representation of high-wealth families in the SCF.

Recommendations Regarding SIPP Wealth Data

Our recommendations to ORES include strategies for making the most effective use of SIPP wealth data in their present form and improvements and enhancements that ORES should encourage the data producer, the Census Bureau, to pursue.

Making Effective Use of SIPP Wealth Data

To make the most effective use of SIPP wealth data, users need, at the very least, to be aware of the limitations of these data and be willing to consider some adjustments to the data values. Useful steps include:

Making certain that their SIPP files are the latest releases
Excluding wealthy families (for example, $2 million and up) from their analyses
Reweighting the SIPP sample to correct for its under-representation of high-income families
Extracting defined contribution pension data from the pension module and imputing other missing wealth components: primarily life insurance, trusts, and annuities
Using a Pareto distribution or data from the SCF to estimate the mean of topcoded values
Borrowing strength from the SCF or other surveys to adjust the data values using the methodology presented in this report

None of these techniques can substitute for the data improvements recommended below, but as interim tactics they can help to correct for known shortcomings of the SIPP data.
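The Pareto approach to topcoded values listed above can be sketched as follows. The sketch assumes that values above a chosen cutoff follow a Pareto distribution and that topcoded records are right-censored at the topcode. It estimates the shape parameter by maximum likelihood and then applies the Pareto conditional mean, topcode × alpha / (alpha − 1). The function name, the cutoff, and the unweighted treatment are assumptions; a production version would use survey weights.

```python
import math

def pareto_topcode_mean(values, cutoff, topcode):
    """Estimate the mean of values above the topcode under a Pareto tail.

    values: reported amounts, with topcoded records recorded at `topcode`
    cutoff: lower bound of the region assumed to follow a Pareto tail
    Returns (alpha, estimated mean of the topcoded values).
    """
    tail = [v for v in values if v >= cutoff]
    uncensored = [v for v in tail if v < topcode]
    n_censored = len(tail) - len(uncensored)
    if not uncensored:
        raise ValueError("need uncensored observations above the cutoff")
    # Censored-data MLE: uncensored count over total log exposure.
    exposure = sum(math.log(v / cutoff) for v in uncensored)
    exposure += n_censored * math.log(topcode / cutoff)
    if exposure <= 0:
        raise ValueError("observations carry no information about the tail")
    alpha = len(uncensored) / exposure
    if alpha <= 1:
        raise ValueError("estimated tail is too heavy for a finite mean")
    return alpha, topcode * alpha / (alpha - 1)
```

The estimate is sensitive to the choice of cutoff, so it is worth checking several cutoffs or comparing the result with SCF-based means where those are available.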

Improvements in SIPP Data Collection and Processing

We recommend the implementation of several improvements in the collection and processing of SIPP wealth data:

Adding questions to collect the cash value of life insurance as well as annuities and trusts
Moving the pension module to the same wave as the wealth module and integrating the questions on retirement wealth
Revising many of the brackets used to collect range responses when respondents cannot provide exact amounts, and substituting unfolding brackets for fixed brackets
Incorporating debts into the imputation of assets and vice versa and seriously considering model-based imputation of wealth items
Improving the review of imputed values and publishing benchmark tabulations
Improving the valuation of vehicle assets by extending the blue book method to older vehicles and replacing mean value imputation with a method that yields a distribution
Publishing means of topcoded values or assigning these as the topcodes
Establishing a version control system for public releases of SIPP data

We also recommend additional methodological research directed, first, at determining why the quality of the SIPP wealth data declined between the 1993 and 1996 panels, second, at developing a more effective approach to measuring selected components of wealth, and, third, at understanding the reasons for and finding ways to reduce the SIPP's under-representation of high-income families.