Pew Research Center Methods
https://www.pewresearch.org/methods/

Comparing Two Types of Online Survey Samples
https://www.pewresearch.org/methods/2023/09/07/comparing-two-types-of-online-survey-samples/
Thu, 07 Sep 2023 17:53:21 +0000

Pew Research Center conducted a study to compare the accuracy of six online surveys of U.S. adults – three from probability-based panels and three from opt-in sources. On average, the absolute error on opt-in samples was about twice that of probability-based panels.

Opt-in samples are about half as accurate as probability-based panels
How we did this

Pew Research Center designed this study to assess the current state of online survey sampling methods – both probability-based and opt-in – and determine their accuracy on general population estimates for all U.S. adults and on estimates for key demographic subgroups. To do this, we administered a common questionnaire to samples obtained from three probability-based online panels, one of which was the Center’s American Trends Panel (ATP), and three online opt-in sample providers. The surveys were administered between June 14 and July 21, 2021, and included interviews with a total of 29,937 U.S. adults.

The target size for each sample was 5,000 adults to minimize the impact of sampling error on estimates for demographic subgroups that comprise only a small percentage of the total U.S. population. Because the purpose of this study is methodological, the names of the vendors are masked. Instead, we refer to the samples as probability panels 1, 2 and 3 and opt-in samples 1, 2, and 3 throughout this report.

The probability-based panels use traditional probability-based methods for recruiting a random sample of U.S. adults. Specifically, all three panels use address-based sampling (ABS) for panel recruitment. ABS begins with a random sample of households from the U.S. Postal Service's Delivery Sequence File, a near-complete list of all residential addresses in the United States. Individuals in sampled households are contacted via mail and invited to join the panel and continue taking surveys periodically online. Although all three panels use similar methods for recruitment, differences in the timing and design of recruitments, the use of incentives, sampling procedures for individual panel waves, and panel maintenance practices could result in samples that are not altogether comparable to one another.

The three opt-in samples in this study are based on different but common approaches to online opt-in sampling. Opt-in sample 1 comes from a panel aggregator, or marketplace, in which individual respondents are drawn from many opt-in sample sources that have agreed to make their sample available to the aggregator. Opt-in sample 2 is sourced entirely from a single opt-in panel. Opt-in sample 3 is a blend, with about three-fifths sourced from a single opt-in panel and the remainder sourced from three sample aggregators. All three opt-in samples use a common set of quotas on age by gender, race and ethnicity, and educational attainment.

The same weighting scheme was applied to all six samples following Pew Research Center’s standard procedure for weighting the ATP. Complete details of the weighting procedure and the design of the individual samples can be found in the methodology.

Terminology

Probability-based panel: This refers to a national survey panel recruited using random sampling from a database that includes most people in the population. Today, most such panels in the United States recruit by drawing random samples of residential addresses or telephone numbers. Typically, data collection with these panels is done online. However, some of these panels interview a small fraction of respondents (usually about 5% or fewer) using an offline mode such as live telephone. These panels are “probability-based” because the chance that each address or phone number is selected is known. However, the chance that each selected person will join the panel or take surveys after joining is not known.

Online opt-in sample: These samples are recruited using a variety of methods that are sometimes referred to as “convenience sampling.” Respondents are not selected randomly from the population but are recruited from a variety of online sources such as ads on social media or search engines, websites offering rewards in exchange for survey participation, or self-enrollment in an opt-in panel. Some opt-in samples are sourced from a panel (or multiple panels), while others rely on intercept techniques where respondents are invited to take a one-off survey.

Benchmark: These are “ground-truth” measures used to assess the accuracy of survey estimates. For example, survey-based estimates for the share of voters who voted for each candidate in the 2020 presidential election are compared to a benchmark based on an official tally by the Federal Election Commission (FEC). Survey estimates are deemed more accurate the closer they are to the benchmark value. In this study, the benchmarks come from high-quality federal surveys such as the American Community Survey or administrative records like the FEC vote tally. Although these benchmarks come from “gold-standard” data sources, they are not entirely free from error. As such, they are not “true” population values but rather the best available approximations.

Error: This is the difference between an individual survey estimate and its corresponding benchmark value. Error can be either positive or negative depending on whether the survey estimate is higher or lower than the benchmark. For example, the FEC benchmark for the share of voters who voted for Donald Trump in the 2020 presidential election is 47%. If a survey estimated that share to be 42%, the error would be -5 percentage points because it came in 5 points below the benchmark. If the estimate were 49%, the error would be +2 points.

Absolute error: This is the absolute value of the error for a survey estimate. It describes the size of the error irrespective of its direction (positive or negative). For example, two estimates that have errors of +5 and -5 percentage points, respectively, both have an absolute error of 5 points.

A table showing an example of measuring error for a benchmark variable.

Average absolute error (for a benchmark variable): This is a measure that summarizes the average size of errors across all the categories within a single benchmark variable. For example, the smoking status variable has four categories: 1) Smoke every day, 2) Smoke some days, 3) Do not now smoke and 4) Never smoked 100 cigarettes. A survey’s estimates for each category will have different levels of error, both positive and negative. For a given survey, the average absolute error for the smoking status variable is equal to the sum of the absolute errors for the categories divided by the number of categories.

Average absolute error (for a sample): Average absolute error can also be used to summarize the overall level of error across many different benchmarks within a single sample. When used in this context, the average absolute error for a sample is equal to the sum of the average absolute errors for each benchmark variable divided by the total number of benchmark variables.
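
To make these definitions concrete, the short Python sketch below computes error, absolute error and average absolute error at both the variable and sample level. It is an illustration only, not the Center’s analysis code, and all of the estimate and benchmark values in it are hypothetical.

```python
# Illustration of the error measures defined above. All numbers are hypothetical.

def average_absolute_error(estimates, benchmarks):
    """Mean of |estimate - benchmark| across the categories of one variable."""
    errors = [est - bench for est, bench in zip(estimates, benchmarks)]  # signed error
    return sum(abs(e) for e in errors) / len(errors)

# One benchmark variable (e.g., smoking status) with four categories,
# expressed as percentages of U.S. adults.
survey_estimates = [11.0, 4.0, 22.0, 63.0]   # hypothetical survey estimates
benchmark_values = [9.0, 5.0, 25.0, 61.0]    # hypothetical benchmark values
variable_aae = average_absolute_error(survey_estimates, benchmark_values)  # 2.0

# The average absolute error for a sample is the mean of the variable-level
# values across all benchmark variables (28 in this study; 3 shown here).
variable_level_aaes = [variable_aae, 1.5, 4.3]
sample_aae = sum(variable_level_aaes) / len(variable_level_aaes)  # 2.6

print(round(variable_aae, 1), round(sample_aae, 1))
```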

As the field of public opinion research continues its steady movement toward online data collection, probability-based panels and opt-in samples have emerged as the two most common approaches to surveying individuals online. At the same time, the methodologies and industry practices for both kinds of samples are evolving.

To shed light on the current state of online probability-based and opt-in samples, Pew Research Center conducted a study to compare the accuracy of six online surveys of U.S. adults – three from probability-based panels and three from opt-in sources. This is the first such study to include samples from multiple probability-based panels, allowing for their side-by-side comparison.

The surveys in this study were administered between June 14 and July 21, 2021, and included interviews with a total of 29,937 U.S. adults, approximately 5,000 in each sample. Because this is a methodological study, the names of the sample providers are masked.

The study compared each sample’s accuracy on 28 benchmark variables drawn from high-quality government data sources. These benchmarks included a variety of measures on topics such as voting, health, and respondents’ work, family and living situations. (Refer to the appendix for the full list of benchmarks and their sources.)

The study’s key findings include:

On average, error on opt-in samples was about twice that of probability-based panels

For estimates among U.S. adults on 28 benchmark variables, opt-in samples 1, 2 and 3 had average absolute errors of 6.4, 6.1 and 5.0, respectively, for an overall average of 5.8 percentage points. This was about twice that of the probability-based online panels, for which average absolute error was 2.6 points overall (2.3, 3.0 and 2.5 on probability panels 1, 2 and 3, respectively).

Online opt-in samples had especially large errors for 18- to 29-year-olds and Hispanic adults

A dot plot showing larger errors on online opt in samples, especially for 18- to 29-year-olds and Hispanic adults.

On 25 variables for which subgroup-level benchmarks were available, the online opt-in samples averaged 11.2 percentage points of error for 18- to 29-year-olds and 10.8 points for Hispanic adults – each about 5 points higher than for U.S. adults overall (6.4 points on the same 25 variables). By comparison, the average absolute error on the probability-based panels was 3.6 points for both young adults and Hispanic adults, less than 2 points higher than the error for all adults. A similar level of error was seen on the probability-based panels for other traditionally hard-to-survey subgroups such as those with no more than a high school education (3.6 points) and non-Hispanic Black adults (3.8 points).

Error was concentrated in a handful of variables on the probability-based panels but widespread on the opt-in samples

On each of the probability-based panels, the number of benchmarks for which average absolute error was greater than 5 percentage points ranged from two to five out of 28. About half (between 14 and 15 benchmarks) had under 2 points of average absolute error. Large errors were more widespread on the opt-in samples, which had between 11 and 17 benchmarks with error greater than 5 points. Fewer benchmarks (between three and seven) on the opt-in samples had average absolute error below 2 points.

Probability-based panels consistently overestimated 2020 voter turnout

The only benchmark that had consistently high error on all three probability-based panels was voter turnout in the 2020 presidential election, which they each overestimated by +8 or +9 percentage points. This suggests that despite the inclusion of volunteerism and voter registration in weighting adjustments, the overrepresentation of politically and civically engaged individuals remains an important challenge for probability-based panels. By contrast, turnout was one of the most accurate variables on the opt-in samples, two of which came within 1 point of the benchmark while the third exceeded the benchmark by +3 points.

Much of the error on the opt-in samples appears to be due to ‘bogus respondents,’ who usually say ‘Yes’ regardless of the question

In the online opt-in samples, an average of 8% of all adults, 15% of 18- to 29-year-olds and 19% of Hispanic adults answered “Yes” on at least 10 out of 16 Yes/No questions that were asked of every respondent. The corresponding shares on the probability-based panels were between 1% and 2% for each group. Similarly large shares reported having received at least three of four types of government benefits (Social Security, food stamps, unemployment compensation or workers’ compensation) even though such individuals are virtually nonexistent in the true population. It is highly unlikely that the few individuals who do fit that description are massively overrepresented in online opt-in samples. Instead, this pattern suggests that much of the error on the online opt-in samples is due to the presence of “bogus respondents,” who make little or no effort to answer questions truthfully.

What is new about this benchmarking study?

Polling insiders may wonder why we conducted this study. Other research teams (including our own several years ago) have done similar comparisons and arrived at the same result: Probability-based samples tend to be more accurate, even if recent election polls are an exception.

One major reason we conducted this study is because it offers something new. This benchmarking study is the first to estimate the accuracy of multiple online probability-based panels in the United States. This allows us to answer a previously unaddressed question: Do different probability-based panels tend to offer similar data quality or not? The three probability-based panels tested in this study performed about the same. The average absolute error ranged from just 2.3 to 3.0 percentage points – good news for researchers seeking a reliable polling method. It’s worth noting that this study considered fairly general topics (e.g., employment, marital status), and that the results might differ if a survey focused on a more niche topic, like poverty.

The second major reason we conducted this study is that since our 2016 study, we have made a number of major changes to the ATP’s methodology in response to that study’s findings, challenges that arose in the intervening years, and the Center’s evolving research needs:

  • We changed how we recruit, moving from telephone to an address-based approach.
  • We began subsampling the panel rather than routinely interviewing everyone.
  • We changed how we weight the panel, adding in adjustments for volunteerism, religion and other factors.
  • We began retiring panelists from overrepresented segments of the population.

The study reported here is, in part, our effort to measure whether those improvements made a difference, allowing us to determine how the new, improved ATP stacks up against opt-in samples and against other probability-based panels.

Methodology
https://www.pewresearch.org/methods/2023/09/07/benchmarking-methodology/
Thu, 07 Sep 2023 17:53:23 +0000

The data in this report are drawn from six online surveys of U.S. adults conducted between June 14 and July 21, 2021. Three of the samples were sourced from different probability-based online panels, one of which was Pew Research Center’s American Trends Panel (ATP). The remaining three samples came from three different online opt-in sample providers. The study included interviews with a total of 29,937 U.S. adults, approximately 5,000 in each of the samples. Interviews were conducted in both English and Spanish. Because the purpose of this study is methodological, the names of the vendors are masked and the samples are referred to as probability panels 1, 2 and 3 and opt-in samples 1, 2 and 3.

The ATP cases were surveyed using normal procedures. For probability panel 3, the survey was programmed and administered by the vendor that administers that panel. The surveys for the remaining probability-based panel and the three opt-in samples were programmed and administered by a coordinating vendor, which received only the questionnaire in advance; the research aims of the study were not discussed with it. On the probability panels, some questions were not asked if a comparable profile variable from a previous survey could be used instead. These questions are identified in the questionnaire.

Probability panel 1 had a study-specific response rate of 61%. The cumulative response rate to the survey (accounting for nonresponse to recruitment, to the current survey and for panel attrition) was 1.4%.

Probability panel 2 had a study-specific response rate of 90%. The cumulative response rate to the survey (accounting for nonresponse to recruitment, to the current survey and for panel attrition) was 3%.

Probability panel 3 had a study-specific response rate of 71%. The cumulative response rate to the survey (accounting for nonresponse to recruitment, to the current survey and for panel attrition) was 7%.
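
For readers unfamiliar with cumulative response rates, the sketch below shows the general logic: the cumulative rate is roughly the product of the rates at each stage of a panel’s life (recruitment, retention and the current survey). The stage-level values in the example are hypothetical and are not figures reported by any vendor; only the study-specific and cumulative rates quoted above come from this study.

```python
# Generic illustration of a cumulative response rate as the product of
# stage-level rates. The component rates below are hypothetical.

def cumulative_response_rate(stage_rates):
    """Multiply stage-level rates (expressed as proportions)."""
    product = 1.0
    for rate in stage_rates:
        product *= rate
    return product

# Hypothetical stages: recruitment response, panel retention, study-specific response.
stages = [0.05, 0.46, 0.61]
print(f"{cumulative_response_rate(stages):.1%}")  # about 1.4%
```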

Sample design

The probability-based panels used in this study all currently recruit using address-based sampling (ABS), in which a random sample of households is selected from the U.S. Postal Service’s Delivery Sequence File. This Postal Service file has been estimated to cover as much as 98% of the population, although some studies suggest that the coverage could be in the low 90% range.

The ATP cases included in this study are a subset of the respondents to ATP Wave 91. All 11,699 active ATP members were invited to participate in Wave 91, of whom 10,606 completed the survey. To achieve an analytic sample of ATP respondents equivalent in size to the other samples included in this study, we drew a stratified random sample of the 11,699 active panelists following the procedure that would have been used to obtain a target sample size of n=5,000 on Wave 91. The panelists who were both selected for this analytic sample and completed Wave 91 are treated as completes for purposes of inclusion in this study. This is the exact set of respondents who would have been observed if only the subsample of the panel had been invited. The remaining probability-based samples were selected following each vendor’s normal procedure for achieving a target sample size of n=5,000 U.S. adults.
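
The subsampling step described above can be illustrated with a simple stratified draw. The sketch below uses proportional allocation across a single hypothetical stratum variable; the actual strata and allocation used for Wave 91 are not described in this report, so treat this only as a schematic.

```python
import random
from collections import defaultdict

def stratified_subsample(panelists, stratum_key, target_n, seed=1):
    """Draw a stratified random subsample with proportional allocation."""
    random.seed(seed)
    by_stratum = defaultdict(list)
    for person in panelists:
        by_stratum[stratum_key(person)].append(person)

    total = len(panelists)
    sampled = []
    for members in by_stratum.values():
        # Allocate the target sample proportionally to each stratum's size.
        n_stratum = round(target_n * len(members) / total)
        sampled.extend(random.sample(members, min(n_stratum, len(members))))
    return sampled

# Hypothetical panel of 11,699 active members with a made-up stratum label.
panel = [{"id": i, "stratum": "A" if i % 3 else "B"} for i in range(11699)]
subsample = stratified_subsample(panel, lambda p: p["stratum"], target_n=5000)
print(len(subsample))  # approximately 5,000
```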

The three opt-in samples in this study each use a different approach to online opt-in sampling. Opt-in sample 1 comes from a panel aggregator, or marketplace, in which individual respondents are drawn from many opt-in sample sources that have agreed to make their sample available to the aggregator. Opt-in sample 2 is sourced entirely from a single opt-in panel. Opt-in sample 3 is a blend, with about three-fifths sourced from a single opt-in panel and the remainder sourced from three sample aggregators.

All three opt-in samples set quotas for age by gender, race and ethnicity, and education based on estimates from the 2019 American Community Survey.

A table that shows the opt-in sample quotas.
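
As a rough illustration of how quota sampling of this kind operates, the sketch below accepts completes only while a quota cell remains open. The cells and targets shown are hypothetical; the study’s actual quota categories (age by gender, race and ethnicity, and education) and their ACS-based targets are summarized in the table above.

```python
# Hypothetical quota tracker: accept a respondent only while their cell is open.

quota_targets = {("18-29", "women"): 300, ("18-29", "men"): 290}  # hypothetical cells
completes = {cell: 0 for cell in quota_targets}

def accept(respondent):
    """Return True and count the complete if the respondent's quota cell is open."""
    cell = (respondent["age_group"], respondent["gender"])
    if cell in quota_targets and completes[cell] < quota_targets[cell]:
        completes[cell] += 1
        return True
    return False  # cell is full, or the respondent falls outside the defined cells

print(accept({"age_group": "18-29", "gender": "women"}))  # True while the cell is open
```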

Data quality checks

A table showing the weighting dimensions.

No special data quality checks were performed on any of the probability-based panels. For the opt-in samples, the coordinating vendor applied checks for speeding, straightlining and duplicate cases along with other proprietary data quality checks. As a result, a total of 104 cases from opt-in sample 1, 86 from opt-in sample 2, and 69 from opt-in sample 3 were removed for poor data quality.
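
The checks named above can be sketched as simple rules. The thresholds and field names below are hypothetical, and the coordinating vendor’s proprietary checks are not reproduced here.

```python
# Illustrative quality checks: speeding, straightlining and duplicate cases.
# Thresholds and field names are hypothetical.

def flag_low_quality(responses, median_seconds, grid_items):
    """Return the case IDs that fail any of the three illustrative checks."""
    flagged, seen_ids = set(), set()
    for r in responses:
        # Speeding: completion time far below the median (threshold is illustrative).
        if r["duration_seconds"] < 0.3 * median_seconds:
            flagged.add(r["case_id"])
        # Straightlining: identical answers across every item in a grid battery.
        if len({r[item] for item in grid_items}) == 1:
            flagged.add(r["case_id"])
        # Duplicates: the same respondent identifier appearing more than once.
        if r["respondent_id"] in seen_ids:
            flagged.add(r["case_id"])
        seen_ids.add(r["respondent_id"])
    return flagged

example = [
    {"case_id": 1, "respondent_id": "x1", "duration_seconds": 95, "q1": 2, "q2": 3},
    {"case_id": 2, "respondent_id": "x1", "duration_seconds": 20, "q1": 4, "q2": 4},
]
print(flag_low_quality(example, median_seconds=300, grid_items=["q1", "q2"]))  # {2}
```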

Weighting

All six samples were weighted following the standard procedure used on ATP Wave 91. For the probability-based panels, this began with a base weight that accounted for differential probabilities of being invited to join the panel, adjustments for panel attrition, and the probability that each panelist was invited to participate in this specific survey. Base weights for each panel were provided by their vendors. Because opt-in samples do not have any known probabilities of selection, all respondents in the opt-in samples were assigned a base weight of 1. The base weight for each sample was then calibrated to align with population benchmarks listed in the accompanying table.
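
The calibration step described above is commonly done by raking (iterative proportional fitting), in which weights are repeatedly scaled so the weighted sample margins match population targets. The sketch below is a generic raking loop with hypothetical dimensions and targets; it is not the Center’s production weighting code, and the study’s full set of weighting dimensions appears in the accompanying table.

```python
import numpy as np

def rake(base_weights, sample, targets, n_iter=50):
    """Generic raking: scale weights until weighted margins match the targets.

    Assumes every target category is present in the sample. The dimensions and
    targets used here are hypothetical.
    """
    w = np.asarray(base_weights, dtype=float)
    for _ in range(n_iter):
        for var, target_props in targets.items():
            values = np.array([row[var] for row in sample])
            total = w.sum()
            # One adjustment factor per category, computed from pre-adjustment totals.
            factors = {cat: prop / (w[values == cat].sum() / total)
                       for cat, prop in target_props.items()}
            w = w * np.array([factors[v] for v in values])
    return w

# Tiny hypothetical example: two weighting dimensions, four respondents.
sample = [{"educ": "hs", "sex": "f"}, {"educ": "hs", "sex": "m"},
          {"educ": "coll", "sex": "f"}, {"educ": "coll", "sex": "m"}]
targets = {"educ": {"hs": 0.6, "coll": 0.4}, "sex": {"f": 0.51, "m": 0.49}}
weights = rake([1.0, 1.0, 1.0, 1.0], sample, targets)
print(np.round(weights / weights.sum(), 3))  # weighted margins now match the targets
```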

Acknowledgments
https://www.pewresearch.org/methods/2023/09/07/benchmarking-acknowledgments/
Thu, 07 Sep 2023 17:53:23 +0000

This report was made possible by The Pew Charitable Trusts. Pew Research Center is a subsidiary of The Pew Charitable Trusts, its primary funder.

This report is a collaborative effort based on the input and analysis of the following individuals:

Research team

Andrew Mercer, Senior Research Methodologist
Arnold Lau, Research Methodologist

Communications and editorial                  

Rachel Drian, Associate Director, Communications
Nida Asheer, Senior Communications Manager                             
Talia Price, Communications Associate
Anna Jackson, Editorial Assistant

Graphic design and web publishing

Bill Webster, Senior Information Graphics Designer
Janakee Chavda, Assistant Digital Producer
Beshay Sakla, Associate Digital Producer

Methodology

Dorene Asare-Marfo, Panel Manager

In addition, this project benefited greatly from feedback from the following Pew Research Center staff: Courtney Kennedy and Scott Keeter.

1. Assessing the accuracy of estimates for U.S. adults
https://www.pewresearch.org/methods/2023/09/07/assessing-the-accuracy-of-estimates-for-u-s-adults/
Thu, 07 Sep 2023 17:53:22 +0000

To gauge each sample’s accuracy on general population estimates for all U.S. adults, we calculated the weighted percentage of adults belonging to 77 categories across 28 different variables and compared them to corresponding benchmarks derived from high-quality government data sources. The benchmarks covered a variety of topics, including voting, health, and respondents’ work, family and living situations. (Refer to the appendix for the full list of benchmarks and their sources.)

A bar chart showing the average error on online opt-in samples was twice that of probability-based panels.

Because many of the benchmark variables included more than one category, we calculated each variable’s average absolute error – that is, the average of the absolute differences between the survey estimate and a corresponding benchmark value for each category – to compare the relative accuracy of variables that have different numbers of categories. To facilitate more general comparisons between samples overall, we also calculated the average absolute error for each sample as the mean of the average absolute errors across all 28 benchmarks.

In any study of this kind, it is important to note that the performance of any given sample depends on numerous factors such as the variables used in weighting or the specific topics included in the benchmarks. It is possible that the relative accuracy of each sample might differ if we had used a different weighting scheme or chosen a different set of benchmarks for comparison. Furthermore, not even “gold-standard” government statistics are entirely free from error. Consequently, the measures of error discussed in this report should be considered approximate.

Across all 28 benchmarks combined, the probability-based panels had a mean average absolute error of 2.6 percentage points for estimates among all U.S. adults. The error for individual probability-based panels ranged from a low of 2.3 points for probability panel 1 to a high of 3.0 points for probability panel 2. The average absolute error for the opt-in samples combined was about twice as large at 5.8 points. Of these, opt-in sample 3 had the lowest average error at 5.0 points. Opt-in samples 1 and 2 exhibited higher error on average with 6.4 and 6.1 points, respectively.

For ease of explanation, individual benchmark variables whose average absolute error was less than 2 percentage points were classified as having “low” error. Variables with more than 5 points of average absolute error were defined as having “high” error, and the remainder were coded as having “medium” error. These particular groupings were chosen because they each contain about one-third of all benchmarks from all six samples. It is important to note that these designations of low, medium or high error are relative to the specific benchmarks and samples included in this study.
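
The classification rule described above is a simple threshold scheme; a minimal sketch follows, with hypothetical variable-level errors used for the example output.

```python
# The low/medium/high grouping described above: under 2 points is "low,"
# more than 5 points is "high," and everything in between is "medium."

def classify_error(avg_abs_error):
    if avg_abs_error < 2:
        return "low"
    if avg_abs_error > 5:
        return "high"
    return "medium"

# Hypothetical variable-level average absolute errors for one sample.
example_errors = {"smoking status": 1.1, "voter turnout": 8.5, "marital status": 3.2}
print({variable: classify_error(err) for variable, err in example_errors.items()})
```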

A chart showing that opt-in samples had many more benchmarks with large errors than probability-based panels.

Collectively, about half of all benchmarks on the probability-based panels fell into the low error category (44 out of 84) while 11% were classified as high error (9 out of 84). The only variable with consistently high error on the probability-based panels was voter turnout in the 2020 presidential election, for which all three samples overestimated the benchmark by +8 or +9 percentage points. This is consistent with our 2016 study, which found civically engaged adults to be overrepresented on the American Trends Panel, and suggests that the problem afflicts ABS-recruited panels more generally.

A table showing the level of error on benchmark variables.

This overall pattern was reversed for the opt-in samples, on which 51% of benchmarks fell in the high error category, compared with 20% in the low error category. There were 10 variables with consistently high error on the opt-in samples. Four involved the receipt of certain government benefits in the prior year. On average, the opt-in samples overestimated the shares of adults receiving food stamps (+16 points), Social Security (+15 points), unemployment compensation (+10 points) and workers’ compensation (+9 points).

Another three variables with consistently high error related to employment, with the opt-in samples underestimating the share of all adults who worked for pay in the prior week by -12 points on average and the share of adults who worked at a job or business at any time in the prior year by -11 points on average. The opt-in samples also overstated the share of adults who were employed but had been unable to work in the past month because of COVID-19 by an average of +7 points.

A dot plot that shows samples of the same type generally had similar levels of error on individual benchmarks.

Two health-related benchmarks also saw consistently high error on the opt-in samples. Specifically, all of the opt-in samples exceeded the national benchmark for the share of adults with a food allergy (9%) by +6 points on average. They also understated the share of adults who have never tried vaping by -12 points and overstated the share who currently vape some days or every day by +8 and +5 points on average, respectively. Finally, all three opt-in samples overstated the share of adults who live in carless or single-car households by an average of +6 and +15 points, respectively.

At the other end of the spectrum, there were seven variables with consistently low error on all three probability-based panels. These included parental status, number of children in the household, marital status, housing tenure, smoking status, English language proficiency and candidate vote share in the 2020 presidential election. Two of these, English proficiency and 2020 vote share, also had consistently low error on all three opt-in samples. Citizenship status also had consistently low error on the opt-in samples.

2. Assessing the accuracy of estimates among demographic subgroups
https://www.pewresearch.org/methods/2023/09/07/assessing-the-accuracy-of-estimates-among-demographic-subgroups/
Thu, 07 Sep 2023 17:53:23 +0000

The accuracy of general population estimates is only one facet of data quality for online samples. Frequently, survey researchers also want to understand the similarities and differences between subgroups within the population. For probability-based panels recruited using ABS, obtaining a sufficiently large sample of respondents belonging to small subgroups can be particularly costly, and one selling point for the use of online opt-in samples is their ability to obtain a large number of interviews with members of hard-to-reach groups at comparatively low cost.

A bar chart showing large errors for 18- to 29-year-olds and Hispanic adults on opt-in samples.

To evaluate the relative accuracy of benchmark subgroup estimates, average absolute error was calculated for each of the samples among subgroups defined by age, education, and race and ethnicity across 25 variables for which subgroup benchmarks were available.

The groups with the largest errors across the probability-based panels were adults ages 18 to 29, adults with no more than a high school education, Hispanic adults and non-Hispanic Black adults, which averaged between 3.6 and 3.8 percentage points of error across the three panels. This is about 1 to 2 points higher than the errors for the other subgroups, which all averaged between 1.7 and 2.6 points.

By comparison, the opt-in samples had larger error on average for every subgroup, and the differences between the most accurate and least accurate subgroups were much larger. This pattern was most striking for age groups. Here, estimates for adults ages 65 and older on the opt-in samples had an average error of 2.6 points, making them nearly as accurate as the probability-based panels. Their average error was higher for ages 30 to 64, at 7.5 points. Average error was highest for those ages 18 to 29, at 11.2 points – about four times as large as the error for ages 65 and older.

Error for racial and ethnic groups in the opt-in samples showed a similar pattern. Estimates among non-Hispanic White adults were the most accurate, with an overall average absolute error of 5.8 points across the three opt-in samples. Average error among non-Hispanic Black adults was somewhat larger, at 7.2 points, while the average error among Hispanic adults was almost twice as large, at 10.8 points.

Large errors on the opt-in samples were also observed regardless of panelists’ level of education. Average errors ranged from 6.0 points for people with some college education to 6.8 points for those with a high school diploma or less.

Opt-in samples had large errors on receipt of government benefits and other variables among 18- to 29-year-olds and Hispanic adults

A dot plot that shows, on opt-in samples, 18- to 29-year-olds and Hispanic adults show especially large errors on many of the same variables.

What factors explain the particularly large errors for 18- to 29-year-olds and Hispanic adults on the opt-in samples? While estimates for both groups are characterized by larger errors on a greater number of variables than other subgroups, there are a few that stand out. The four benchmarks related to the receipt of government benefits, which had some of the very largest errors for full-sample estimates, had dramatically larger errors for these groups. The opt-in samples overestimated the share of all adults who received food stamps in the past year by an average of +16 percentage points. This overestimation was higher on average among 18- to 29-year-olds (+24) and Hispanic adults (+25). Receipt of Social Security benefits, which had an average error of +15 points for all adults, had errors of +24 for 18- to 29-year-olds and +20 for Hispanic adults. On average, receipt of unemployment compensation was overestimated by +10 points for all adults, +18 points for 18- to 29-year-olds and +21 points for Hispanic adults. Finally, receipt of workers’ compensation had an average error of +9 points for all adults, compared with a much higher +23 points for 18- to 29-year-olds and +22 points for Hispanic adults.

For another seven variables, the average absolute error for both 18- to 29-year-olds and Hispanic adults was between 5 and 10 points higher than the error for all adults. These variables included whether one’s work was affected by the COVID-19 pandemic, having a food allergy, union membership, military service, 1-year migration status, parental status and U.S. citizenship.

Similar differences in the magnitude of error were also seen for 18- to 29-year-olds on the benchmarks for high blood pressure, housing tenure and English language proficiency, and for Hispanic adults on e-cigarette usage.

The concentration of disproportionately large errors on so many variables within two specific subgroups raises the question of whether these are primarily errors of representation or measurement. For example, are 18- to 29-year-olds who received food stamps simply overrepresented because some aspect of the data collection process makes them much more likely to participate in online opt-in surveys than other 18- to 29-year-olds? Or are these respondents reporting that they received food stamps when in truth they did not? While this study cannot definitively rule out the possibility these individuals are answering honestly, many respondents to the opt-in surveys answered combinations of questions in ways that are more plausibly explained by individual misreporting than the overrepresentation of certain groups.

The population benchmarks for receipt of government benefits in the previous year (food stamps, Social Security, unemployment compensation and workers’ compensation) provide one such example. All of these benchmarks come from the 2021 Current Population Survey Annual Social and Economic Supplement (CPS ASEC), which makes it possible to compute benchmarks for not only the share who received each individual benefit but also for the number of different benefits received. Almost two-thirds of all U.S. adults (62%) did not receive any of these benefits, while 38% received either one or two, according to CPS ASEC data. Adults who received three or four of these benefits comprise only 0.1% of the full U.S. adult population and no more than 0.2% of any demographic subgroup included in this analysis.

By comparison, the estimated share of adults who received three or four benefits ranged from 6% to 9% on the three opt-in samples. Among 18- to 29-year-olds, estimated shares varied between 15% and 18%; for Hispanic adults, those shares were between 16% and 19%. On all three probability-based panels, the corresponding estimates were 1% for all adults, between 1% and 2% for 18- to 29-year-olds and between 1% and 3% for Hispanic adults.
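
The derived measure behind this comparison can be sketched in a few lines: count how many of the four benefit questions each respondent answered “Yes” to, then take the weighted share answering “Yes” to three or four. Variable names, codes and the example records below are hypothetical; the population benchmark itself comes from the 2021 CPS ASEC.

```python
# Sketch of the "number of benefits received" measure. Field names and example
# records are hypothetical; "Yes" is assumed to be coded 1 and "No" coded 2.

BENEFIT_ITEMS = ["food_stamps", "social_security", "unemployment_comp", "workers_comp"]

def benefits_received(respondent):
    """Count how many of the four benefit questions were answered 'Yes'."""
    return sum(1 for item in BENEFIT_ITEMS if respondent.get(item) == 1)

def share_receiving_three_or_more(respondents, weights):
    flagged = [w for r, w in zip(respondents, weights) if benefits_received(r) >= 3]
    return sum(flagged) / sum(weights)

example = [
    {"food_stamps": 1, "social_security": 1, "unemployment_comp": 1, "workers_comp": 2},
    {"food_stamps": 2, "social_security": 2, "unemployment_comp": 2, "workers_comp": 2},
]
print(share_receiving_three_or_more(example, weights=[1.0, 1.0]))  # 0.5
```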

It is difficult to see how a group that makes up just a fraction of a percent of the population could come to comprise almost one-in-ten of all respondents, and nearly one-in-five of both 18- to 29-year-old and Hispanic respondents on online opt-in samples. A more straightforward explanation would be a group of respondents who are disproportionately choosing the “Yes” answer rather than answering truthfully. The large errors in estimates among 18- to 29-year-old and Hispanic adults are also consistent with a 2020 Center study that found so-called “bogus respondents” – respondents who make little or no effort to answer survey questions truthfully – disproportionately claimed to be either Hispanic or 18 to 29 years old.

A bar chart that shows some opt-in respondents tend to answer 'Yes' regardless of the question.

Looking at Yes/No questions more broadly, the pattern remains consistent. In addition to the four questions about government benefits, the survey included 12 more Yes/No questions asked of all respondents (not counting the question asking respondents if they identify as Hispanic or Latino), bringing the total number of Yes/No questions to 16. On the probability-based panels, an average of 1% of all adults, 1% of 18- to 29-year-olds and 2% of Hispanic adults answered “Yes” to 10 or more of these questions. On the opt-in samples, the corresponding averages were 8% of all adults, 15% of 18- to 29-year-olds and 19% of Hispanic adults. These results are consistent with the presence of a sizeable group of respondents to the opt-in samples who are systematically more likely to answer Yes/No questions in the affirmative.
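
The Yes-count measure reported above can be computed the same way, this time broken out by subgroup. The sketch below is illustrative only: the question names, group labels and response codes are hypothetical stand-ins for the 16 Yes/No questions asked of all respondents.

```python
from collections import defaultdict

YES_NO_ITEMS = [f"yn_{i}" for i in range(1, 17)]  # stand-ins for the 16 questions

def high_yes_share_by_group(respondents, weights, group_key, threshold=10):
    """Weighted share in each group answering 'Yes' (coded 1) to >= threshold items."""
    totals, flagged = defaultdict(float), defaultdict(float)
    for r, w in zip(respondents, weights):
        group = group_key(r)
        totals[group] += w
        if sum(1 for item in YES_NO_ITEMS if r.get(item) == 1) >= threshold:
            flagged[group] += w
    return {group: flagged[group] / totals[group] for group in totals}

example = [dict({item: 1 for item in YES_NO_ITEMS}, age_group="18-29"),
           dict({item: 2 for item in YES_NO_ITEMS}, age_group="65+")]
print(high_yes_share_by_group(example, [1.0, 1.0], lambda r: r["age_group"]))
```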

It is notable that among adults ages 65 and older on the opt-in samples, those saying “Yes” to 10 or more questions comprised only a fraction of a percent on average. This suggests that an absence of bogus respondents within this age group may be a primary reason its accuracy in the opt-in samples was comparable to that of the probability-based panels. One possible reason for this absence may be that the survey measured age by asking respondents to select their year of birth from a drop-down menu with more recent years at the top. Selecting a year of birth corresponding to ages 65 and older would have required more effort than one corresponding to ages 18 to 29, which were much higher up on the list. It is unclear whether a different answer format would have yielded different results.

These findings should not be taken to mean that people who are truly 18 to 29 years old or Hispanic are more likely to misrepresent themselves in online opt-in surveys. It is more likely that individuals who misreport on questions of substantive interest also do so for demographics such as race and age. Individuals who are simply attempting to earn survey incentives may be strategically selecting the answer choices they believe are most likely to meet any eligibility criteria for survey participation and least likely to result in being screened out. It is possible that many of the Yes/No questions on the survey resemble the kinds of questions that are commonly used to screen for and identify specific subgroups of interest. For example, a bogus respondent seeing a question asking if they have ever vaped may suspect that researchers are conducting a survey of e-cigarette users and that answering “No” would lead to their being screened out. This would be consistent with one recent study that found evidence of widespread falsification intended to get around screening questions in an online opt-in sample. These conclusions are necessarily speculative, as this study was not designed to measure the response strategies of bogus respondents, and this remains an important subject for future research.

However, the fact that such a large portion of the error on the opt-in samples appears attributable to bogus responding that is disproportionately concentrated within specific demographic groups has important implications for practice. The weighting and modeling methods that are most commonly used to adjust for differences between opt-in samples and the larger population are premised on an assumption that the adjustment variables accurately describe the respondents (e.g. that respondents who say they are Hispanic are, in fact, Hispanic) and that what error exists is small and not strongly correlated with substantive variables of interest.

Here, we have seen that error in adjustment variables like age and Hispanic ethnicity appears to be both widespread in opt-in samples and strongly associated with responses to Yes/No questions, and potentially with other kinds of questions where the behavior is not as easily detected. While this study did not include trap questions or attention checks, past studies have found such questions to be unsuccessful in identifying bogus respondents. Under such circumstances, there is little reason to expect these kinds of adjustment methods to be successful in the absence of better methods for detecting bogus respondents.

That these kinds of response behaviors appear to be much less common in probability-based panels is heartening and supports a different set of methodological research priorities, particularly correcting the overrepresentation of the most politically and civically engaged respondents.

How Public Polling Has Changed in the 21st Century
https://www.pewresearch.org/methods/2023/04/19/how-public-polling-has-changed-in-the-21st-century/
Wed, 19 Apr 2023 14:53:44 +0000

A new study found that 61% of national pollsters used different methods in 2022 than in 2016. And last year, 17% of pollsters used multiple methods to sample or interview people – up from 2% in 2016.

61% of national pollsters in the U.S. used methods in 2022 that differed from those of 2016
How we did this
A chart showing the number of active national public pollsters in the study dataset.

This study looks at how national public opinion polling in the United States changed from 2000 to 2022. It focuses on two aspects: the sample source(s) – that is, where the survey takers came from – and the mode(s), or how they were interviewed. The study tracks both the growth in the number of national pollsters and changes to how they conduct their public polls.

The unit of analysis is the polling organization. The dataset accompanying this report lists the 78 organizations included. The Documentation URLs file provides the webpages used to code the methods. The Codebook details what each code means. Center researchers compiled the poll information in several stages, using a variety of sources. The initial coding of sample source and mode of interview was based on information available from sources such as pollster websites, news articles, press releases and the Roper iPoll data archive. After compiling the data, Center staff attempted to contact each organization and asked them to confirm whether the information for that organization was accurate. Most organizations that responded confirmed that the information gathered was accurate. When organizations provided corrections or additions, staff updated the study database with that information. Each of these steps is described in greater detail in the Methodology.

Terminology

Address-based sampling (ABS). ABS refers to recruiting survey takers by selecting a random sample of residential addresses from a master list, such as that maintained by the U.S. Postal Service.

Interactive voice response (IVR). IVR refers to a poll that entails automatic dialing of telephone numbers and a recorded voice asking a series of survey questions. Such polls are sometimes referred to as “robo-polls.”

Mode of interview. This refers to the format in which respondents are presented with and respond to survey questions. Surveys can be administered by an interviewer, or they can be self-administered. The most common interviewer-administered mode in public polling is live telephone. The most common self-administered modes include online, text message and paper.

Method. This study uses the term “method” broadly, referring to the source of the respondents (the “sampling frame”) and how they were interviewed (the “mode”). This study describes a change in either of those as a change in method.

Multiple methods. Sometimes pollsters use multiple sample sources or multiple interview modes within a poll. Other times pollsters use multiple sample sources or multiple interview modes within the same year but in separate polls. The study describes any of the above as a pollster using multiple methods that year.

Survey panel. This is a group of people who have agreed to take surveys on an ongoing basis. The survey panels documented in this study each have thousands (and in some cases tens of thousands) of members.

Probability-based panel. This refers to a national survey panel recruited using random sampling from a database that includes most people in the population. Today, most such panels in the U.S. recruit by drawing random samples of residential addresses or telephone numbers. Typically, data collection with these panels is done online. However, some of these panels interview a small fraction of respondents (usually about 5% or fewer) using an offline mode such as live telephone. These panels are “probability-based” because the chance that each address or phone number was selected is known. However, the chance that each selected person will join the panel or take surveys after joining is not known.

Online opt-in polls. These polls are recruited using a variety of methods that are sometimes referred to as “convenience sampling.” Respondents are not selected randomly from the population but are recruited from a variety of online sources such as ads on social media or search engines, websites offering rewards in exchange for survey participation, or self-enrollment in an opt-in panel. Some opt-in samples are sourced from a panel (or multiple panels), while others rely on intercept techniques where respondents are invited to take a one-off survey.

Registration-based sampling. Some election polls sample from an official list of registered voters. All states are required to maintain a computerized and up-to-date list of voters and to make these lists publicly available for non-commercial purposes such as voter outreach and research. 

Sampling frame. A sampling frame is a list of the population of interest. For a survey of the public, it is typically telephone numbers or residential addresses and ideally includes all members of the population (though in practice there are often gaps and omissions). The survey sample is selected from this list.   

Sponsor. In this report, a survey sponsor is an organization that publicly releases results from a poll conducted on its behalf. Survey sponsors typically conceive of the study and either provide or obtain funding for it. Sponsors and vendors sometimes share the practical tasks involved in conducting a survey, such as when a sponsor drafts the questionnaire, and the vendor creates the sample and collects the data. This report uses “sponsor” and “pollster” interchangeably.   

Text message polling. Text messaging is used by some organizations to contact a sample of cellphone numbers for the purpose of either directing respondents to an online survey or asking them a set of survey questions using a series of text messages and responses.          

Vendor. In this report, the survey vendor is the organization that collects the survey data. The full set of tasks necessary for a survey are often shared between the sponsor and vendor, with the exact mix being determined by the specific expertise of the two parties and other factors. Sometimes vendors are also sponsors, either alone or in partnership with other sponsors.  

The 2016 and 2020 presidential elections left many Americans wondering whether polling was broken and what, if anything, pollsters might do about it. A new Pew Research Center study finds that most national pollsters have changed their approach since 2016, and in some cases dramatically. Most (61%) of the pollsters who conducted and publicly released national surveys in both 2016 and 2022 used methods in 2022 that differed from what they used in 2016. The study also finds the use of multiple methods increasing. Last year 17% of national pollsters used at least three different methods to sample or interview people (sometimes in the same survey), up from 2% in 2016.

Chart: Polling has entered a period of unprecedented diversity in methods

This study captures what changes were made and approximately when. While it does not capture why the changes were made, public commentary by pollsters suggests a mix of factors – with some adjusting their methods in response to the profession’s recent election-related errors and others reacting to separate industry trends. The cost and feasibility of various methods are likely to have influenced decisions.

This study represents a new effort to measure the nature and degree of change in how national public polls are conducted. Rather than leaning on anecdotal accounts, the study tracked the methods used by 78 organizations that sponsor national polls and publicly release the results. The organizations analyzed represent or collaborated with nearly all the country’s best-known national pollsters. In this study, “national poll” refers to a survey reporting on the views of U.S. adults, registered voters or likely voters. It is not restricted to election vote choice (or “horserace”) polling, as the public opinion field is much broader. The analysis stretches back to 2000, making it possible to distinguish between trends emerging before 2016 (e.g., migration to online methods) and those emerging more recently (e.g., reaching respondents by text message). Study details are provided in the Methodology. Other key findings from the study include:

Pollsters made more design changes after 2020 than after 2016. In the wake of the 2016 presidential election, it was unclear whether the polling errors were an anomaly or the start of a longer-lasting problem. The 2020 election provided an answer, as most polls understated GOP support a second time. The study found that after 2020, more than a third of pollsters (37%) changed how they sample people, how they interview them, or both. This compares with about a quarter (26%) who made changes after 2016. As noted above, though, these changes did not necessarily occur because of concerns about election-related errors.

The number of national pollsters relying exclusively on live phone is declining rapidly. Telephone polling with live interviewers dominated the industry in the early 2000s, even as pollsters scrambled to adapt to the rapid growth of cellphone-only households. Since 2012, however, its use has fallen amid declining response rates and increasing costs. Today live phone is not completely dead, but pollsters who use it tend to use other methods as well. Last year 10% of the pollsters examined in the study used live phone as their only method of national public polling, but 32% used live phone alone or in combination with other methods. In some cases, the other methods were used alongside live phone in a single poll, and in other cases the pollster did one poll using live phone and other polls with a different method.

Several key trends, such as growth of online polling, were well underway prior to 2016. While the 2016 and 2020 elections were consequential events for polling, the study illustrates how some of the methodological churn in recent years reflects longer-term trends. For example, the growth of online methods was well underway before 2016. Similarly, some live phone pollsters had already started to sample from registered voter files (instead of using random-digit dialing, or RDD) prior to 2016.

Chart: Polling on probability-based panels is becoming more common

Use of probability-based panels has become more prevalent. A growing number of pollsters have turned to sampling from a list of residential addresses from the U.S. Postal Service database to draw a random sample of Americans, a method known as address-based sampling (ABS). There are two main types of surveys that do this: one-off or standalone polls and polls using survey panels recruited using ABS or telephone (known as probability-based panels). Both are experiencing growth. The number of national pollsters using probability-based panels alone or in combination with other methods tripled from 2016 to 2022 (from seven to 23). The number of national pollsters conducting one-off ABS surveys alone or in combination with other methods during that time rose as well (from one in 2016 to seven in 2022).

Chart: Growth of online opt-in methods in national public polls paused between 2020 and 2022

The growth of online opt-in among national pollsters appears to have paused after 2020. The number of national pollsters using convenience samples of people online (“opt-in sampling”) – whether alone or in combination with other methods – more than quadrupled between 2012 and 2020 (from 10 to 47). In 2022, however, this number held flat, suggesting that the era of explosive growth could be ending.

Whether changes to sample sources and modes translate into greater accuracy in presidential elections remains to be seen. The fact that pollsters are expanding into new and different methods is no guarantee that the underrepresentation of GOP support that occurred in 2016 and 2020 preelection polls has been fixed. Polling accuracy improved in 2022, but this represents only one nonpresidential election.

Notable study limitations

A study of this nature requires difficult decisions about what exactly will be measured and what will not. This study focuses on two key poll features: the sample source(s) – that is, where the respondents came from – and the mode(s), or how they were interviewed. While important, these elements are not exhaustive of the decisions required in designing a poll. The study did not attempt to track other details, such as weighting, where public documentation is often missing. Because the study only measured two out of all possible poll features, estimates from this study likely represent a lower bound of the total amount of change in the polling industry.

Another limitation worth highlighting is the fact that state-level polls are not included. Unfortunately, attempting to find, document and code polling from all 50 states and the District of Columbia would have exceeded the time and staff resources available. A related consideration is that disclosure of methods information tends to be spottier for pollsters who exclusively work at the state level, though there are some exceptions. It is not clear whether analysis at the level of detail presented in this report would be possible for state-only pollsters.

While not necessarily a limitation, the decision to use the polling organization rather than individual polls as the unit of analysis has implications for the findings. The proliferation of organizations using online methods implies but does not prove that online polls grew as well. However, research conducted by the American Association for Public Opinion Research (AAPOR) following the 2016 and 2020 elections reveals an explosion in the share of all polling done using online methods. AAPOR estimated that 56% of national polls conducted shortly before the 2016 election used online methods; the comparable share for 2020 was 84%. More details on the strengths and weaknesses of the study are presented in the Methodology.

Changes in methods are driven by many considerations, including costs and quality

To verify the accuracy of the categorization of polling methodologies, researchers attempted to contact all organizations represented in the database. Several pollsters contacted for this study noted that use of a particular method was not necessarily an endorsement of methodological quality or superiority. Instead, design decisions often reflect a multitude of factors. Survey cost – especially the increasing cost of live phone polling – came up repeatedly. Timing can also be a factor, as a design like address-based sampling can take weeks or even months to field. As noted above, this study does not attempt to address why each organization polled the way it did. It aims only to describe major changes observable within the polling industry. Nor does it evaluate the quality of different methods, as many other studies address that question.

Changes to polling after 2020 differed from those after 2016

The study found a different kind of change within the polling industry after 2020 versus 2016. After 2020, changes were both more common and more complex. More than a third (37%) of pollsters releasing national public polls in both 2020 and 2022 changed their methods during that interval. By contrast, the share changing their methods between 2016 and 2018 was 26%.

Chart: More than a third of national public pollsters changed how they poll after 2020

The nature of the changes also differed. About half of the changes observed from 2016 to 2018 reflected pollsters going online – either by adding online interviewing as one of their methods or fully replacing live phone interviewing. By contrast, the changes observed from 2020 to 2022 were more of a mix. During that period, some added an approach like text messaging (e.g., Change Research, Data for Progress), probability-based panels (Politico, USA Today) or multiple new methods (CNN, Wall Street Journal). About a quarter of the change observed from 2020 to 2022 reflected pollsters who had already moved online dropping live phone as one of their tools (e.g., CBS News, Pew Research Center).

A look at change over the entire recent period – from 2016 to 2022 – finds that more than half of national public pollsters (61%) used methods in 2022 that differed from those they used in 2016. As noted above, if features like weighting protocols were included in the analysis, that rate would be even higher.

A longer view of modern public polling (going back to 2000) shows that methodological churn began in earnest around 2012 to 2014. That was a period when about a third of national pollsters changed their methods. Change during that period was marked by pollsters starting to migrate away from live telephone surveys and toward online surveys.

Pollsters increasingly use multiple methods – sometimes three or more

Chart: Growing share of national pollsters are using multiple methods

Pollsters are not just using different methods; many are now using multiple methods, the study found. Here again there is a discernible difference in how polls changed after 2016 and how they changed after 2020. After 2016, the share of pollsters using multiple methods remained virtually unchanged (30% in both 2016 and 2018). After 2020, however, the share climbed to 39%. Notably, the share of pollsters using three or more different methodologies in their national public polls tripled from 5% in 2020 to 17% in 2022.

In this analysis, “multiple methods” refers to use of multiple sample sources (e.g., registered voter files and random-digit dial) or multiple interview modes (e.g., online, mail, live telephone). In some cases, several methods were used in a single poll. In other cases the pollster did one poll using one method and another poll using another method.

As an example, in 2014 Pew Research Center switched from exclusively using live phone with random-digit-dial sample to also using a probability-based panel. In 2020 the Center added another method: one-off address-based sample surveys offering online or mail response. By 2022, the Center had dropped live phone polling. Pollsters that used at least three different methods in 2022 include CNN, Gallup, NPR, Politico and USA Today.

Text messaging and address-recruited panels see growth after 2020

An overarching theme in the study is the growth of new methods. Analysis earlier in this report aimed to describe trends for the most prominent methods. In the past, pollsters often used just one method (e.g., live phone with random-digit dial). That has changed. Today pollsters tend to use new methods (such as text) as one of several ways that they reach people. To track the trajectory of these newer methods, it helps to consider the number of pollsters using the method by itself or in combination with other methods.

A prime example is text message polling. An extremely small share of pollsters conduct national public polls exclusively by text. A larger share use text alongside another method, such as online opt-in.

Chart: Texting gains some traction in national polling in 2022

How texting is used varies. In some cases respondents receive a text with a web link for an online survey. In other cases, respondents answer the questions via text. Among the pollsters in this study, just one used texting in a national public survey in 2020. In 2022 that number rose to nine, representing 13% of the active national pollsters tracked that year. These figures reflect the number of pollsters using texting alone or in combination with other methods like live phone.

Analysis looking at methods used either alone or in combination with other approaches also suggests a change in the trajectory of online opt-in polling. While online opt-in usage grew tremendously between 2006 and 2020, that growth appears to have slowed if not stopped in 2022 for national polling.

By contrast, the share of national pollsters turning to probability-based panels continues to grow. In 2022 a third (33%) of national pollsters used probability-based panels either alone or in combination with other methods. This is up from roughly 10% during most of the 2010s.

Live phone was once the dominant method of polling but has been in decline since 2016. As of 2022, about a third of national pollsters used live phone alone or in combination (32%), while a much smaller share relied on it as their only method (10%).

The study also tracked the adoption of a specific kind of opt-in sample – members of an online opt-in panel who are matched to a record in a national registered voter file. This study first observed that approach in 2018. In 2018, 2020 and 2022, about 3% to 5% of national public pollsters used online opt-in samples matched to registered voter files, the study found.

The post How Public Polling Has Changed in the 21st Century appeared first on Pew Research Center Methods.

Acknowledgments https://www.pewresearch.org/methods/2023/04/19/polling-landscape-acknowledgments/ Wed, 19 Apr 2023 14:53:47 +0000 https://www.pewresearch.org/methods/?p=2314

The post Acknowledgments appeared first on Pew Research Center Methods.

This report was made possible by The Pew Charitable Trusts. Pew Research Center is a subsidiary of The Pew Charitable Trusts, its primary funder.

This report is a collaborative effort based on the input and analysis of the following individuals:

Research team

Courtney Kennedy, Vice President, Methods and Innovation
Dana Popky,  Associate Panel Manager
Scott Keeter, Senior Survey Advisor

Methodology

Arnold Lau, Research Methodologist

Communications and editorial

Rachel Drian, Associate Director, Communications
Nida Asheer, Communications Manager
Talia Price, Communications Associate
David Kent, Senior Copy Editor

Graphic design and web publishing

Bill Webster, Senior Information Graphics Designer
Sara Atske, Associate Digital Producer

We are grateful to the many colleagues across the field who responded to our request to confirm or correct information we had gathered about their national public polling. Several individuals went above and beyond, offering insights about why various changes occurred. We appreciate everyone who took the time to respond. Within Pew Research Center, we thank Carroll Doherty and Claudia Deane for providing useful feedback on a draft of the report.

We are also grateful to organizations that maintain databases of public polls, which facilitated some of the data collection for this study. These include the Roper iPoll, the Inter-university Consortium for Political and Social Research, the Louis Harris Data Center operated by the University of North Carolina Odom Institute, and FiveThirtyEight.com.

Methodology https://www.pewresearch.org/methods/2023/04/19/polling-landscape-methodology/ Wed, 19 Apr 2023 14:53:47 +0000 https://www.pewresearch.org/methods/?p=2316

The post Methodology appeared first on Pew Research Center Methods.

This study was designed to examine changes in national public opinion polling in the United States from 2000 to 2022. The study focuses on two key features: the sample source(s) (where the respondents came from) and the mode(s) (how they were interviewed).

The study’s unit of analysis is the organization sponsoring the polling. In total, 78 such organizations are included. Pew Research Center staff coded the sample sources and interview modes used in national public polls for each organization during each even-numbered year from 2000 to 2022. Odd-numbered years were excluded on purely practical grounds. Hundreds if not thousands of national public polls are released each year, and processing them for this report required substantial labor. Focusing on even-numbered years cut the manual labor roughly in half. As shown in the report, the even-numbered-year approach was able to successfully track major changes in the industry.

The initial coding of sample source and mode of interview was based on information available from a variety of sources including pollster websites, news articles, and the Roper iPoll opinion poll data archive. After this data was compiled, Center staff attempted to contact each organization and ask them to confirm whether the information for that organization was accurate. Most organizations that responded confirmed that the information gathered was accurate. When organizations provided corrections or additions, the study database was updated with that information. Each of these steps is described in greater detail below.

Inclusion criteria

The study aimed to examine change over time in national public polls. To be included, an organization needed to sponsor at least one national public poll in two or more of the years studied (i.e., the even-numbered years from 2000 to 2022). Organizations that sponsored a national public poll in only one of these years are not included. This criterion helped to reduce the influence of organizations that were not consistently involved in polling.  

The national polls examined in this study are those based on the general public (e.g., U.S. adults ages 18 and older), registered voters or likely voters. Polls that exclusively surveyed a special population (e.g., teachers) were not included, as they often require unique designs. Additionally, polls described as experimental in their methods or research goals are not included.

Who is a ‘pollster’ in this study

One important choice in designing this type of project is deciding whether to study the organization sponsoring a poll or the organization collecting the data (known as the “vendor”). For example, many Pew Research Center polls are fielded by Ipsos. Pew Research Center is the sponsor, and Ipsos is the vendor. This study focuses on sponsors and refers to them as the “pollster.” There are several reasons why sponsors are the focus rather than vendors, including:

  • The sponsor is typically the organization commissioning the poll and attaching their institutional name to its public reporting. Sponsors decide whether to make poll results public or keep them private. Generally speaking, sponsors are the entity most responsible for any public reporting from the poll.
  • Sponsors dictate, in broad terms, the budget available for a poll, and the vendor works within that constraint to find a design that fits within that budget. While the vendor will often decide the exact final price tag, they usually are reacting to information from the sponsor as to whether the budget available is, say, $10,000 or closer to $100,000. In other words, whether a poll uses a very expensive method or very inexpensive method is generally dictated by the sponsor.

One complication is that sometimes the sponsor and vendor are one and the same. Continuing with the Ipsos example, in addition to the polls they conduct on behalf of clients, Ipsos also conducts national polling and releases the results by themselves. Accordingly, Ipsos is among the 78 pollsters in the analysis based on polling that they (co-)sponsor. Other vendors only conduct work on behalf of clients and never put themselves in the role of sponsor. This explains why a few major companies that conduct polling do not necessarily appear in the study dataset. Their work is represented in the rows associated with their clients.

Admittedly, focusing on the sponsor rather than the vendor feels more appropriate in some cases than others. For example, some sponsors are very engaged in decisions about how their polls are conducted, down to the smallest detail. Referring to such sponsors as the “pollster” feels accurate. Other sponsors are much more hands-off, letting the vendor make most or all decisions about how the poll is conducted. In these cases, it is tempting to think of the vendor as the pollster.

Ultimately, to execute a study like this, researchers must rely on information in the public domain. Nuanced records of who made which decisions are simply not available, nor is it possible to gather such information about polls fielded as early as 2000. Of the options available, focusing on the sponsor was the best fit for this study, but we acknowledge that in some cases the sponsors were likely not deeply engaged with decisions about sampling frames and interview mode.

Another complication is that some polls have multiple sponsors. For example, ABC News and The Washington Post have a long-standing partnership for live phone polling. In addition, they sponsor polls either solo or with other partners. In this study, each pollster is credited with using the methods employed for national public polls that they either sponsored or co-sponsored. For example, this study’s records for both ABC News and The Washington Post reflect the use of live phone with random-digit-dial sample for the years in which they jointly sponsored such polls.

A few well-known, recurring national surveys were considered but ultimately excluded from this analysis because they are too different from a typical public opinion poll. The General Social Survey and the American National Election Studies both measure public opinion, but their budgets and timelines are an order of magnitude different from those of a typical public opinion poll. On that basis, we decided it was unhelpful to include them. Similarly, the National Election Pool (NEP) poll and VoteCast are not included because they are designed specifically to cover election results not just nationally but at the state level. Their methodological and budget considerations are quite different from those of ordinary opinion polls. These studies are very valuable to opinion research in the U.S., but they are not comparable to the polling studied in this project.

Which polling organizations were included

The analysis is based on 78 organizations. Each organization sponsored and publicly released national poll results in at least two of the years studied (i.e., the even-numbered years from 2000 to 2022). There is no authoritative list of such organizations, and experts might disagree on whether certain edge cases should be included. Several gray areas required resolution.

Inclusion based on content

In the broadest sense, public opinion exists for many topics – from politics and the economy to pop culture and brand preferences. That said, the national dialogue around “polls” and “polling” is generally understood to be focused more narrowly on public affairs, politics and elections. For instance, a market research survey on public preferences in ice cream flavors is probably not what comes to mind when someone is asked about public opinion polling. At the other extreme, surveys measuring support for candidates running for public office (known as “the horserace”) perhaps represent the prototypical conception of a poll. Many polls fall in between – measuring attitudes about important national issues but not measuring the horserace.

Each of the 78 organizations included in this study has a track record of measuring public attitudes about public affairs, politics and elections. Not all organizations included here specifically measure the horserace, but all of them have asked the public about factors influencing how they might vote in an election.2

Academic research versus public opinion polling

The decision to consider a sponsoring organization as the pollster raised two practical questions when considering academic organizations. Colleges and universities may have multiple individuals or entities independently conducting polls within them, and so one question is whether these separate polling efforts should be considered as different pollsters or not. The second question is whether (or which) national polls conducted by faculty primarily for academic purposes should be included.

Although some academic institutions have more than one branded entity conducting public polls on politics and policy, publicly released surveys typically carried the institutional name. Moreover, there was no practical way to ascertain the degree of independence of the entities within a university. As a result, the decision was made to code all polling that met the study’s content criteria for inclusion as polling by that university.

However, this study does not contain all the surveys conducted by every college and university. Indeed, it would be nearly impossible to locate all such polling, since most of it is made public only through academic papers or conference presentations. We acknowledge that as a limitation of the study. This study does not purport to represent every national survey whose results can be found somewhere in the public domain. The ones that are included tend to have a news media partnership increasing their visibility; maintain a public website updated regularly with the latest polling results; and/or are archived in the Roper iPoll public opinion poll repository. It is worth underscoring that the goal of this study is to describe the nature and degree of changes in national public polling from 2000 to 2022. Research designed primarily for an academic audience (e.g., peer-reviewed journals) is not the type of polling people have in mind when they question whether polling still works after the 2016 and 2020 elections.

Among the news media organizations included and contacted in this study, only one raised the concern that multiple units within the organization might be conducting polls. Most of their polling was coordinated through their political unit, but a few polls over the years were not. Mapping out the decision structures within each organization was outside the scope of this project. In the interest of applying the same standard to each organization, this study includes the methods for any public polls we found during the years studied. In some cases, this yields a track record that does not reflect centralized decision-making. In all cases, though, this approach reflects what the public sees – either a poll was or was not sponsored by “Organization X.”

Data collection

Creating a list of national public pollsters

There is no authoritative list of organizations sponsoring and releasing results from national public opinion polls, so researchers constructed one using a variety of sources. First, researchers compiled the names of organizations releasing national estimates for U.S. presidential approval or horserace estimates for U.S. presidential elections going back to 2000. For this task, researchers used polling databases and summaries from The Polling Report, FiveThirtyEight.com, RealClearPolitics.com and Wikipedia. Researchers then expanded the list with the names of prominent national polling organizations (e.g., Kaiser Family Foundation, the American Enterprise Institute’s Survey Center on American Life, PRRI) that do not necessarily appear in those sources. Finally, additional sponsors were identified through polling partnerships. For example, if researchers saw that Pollster A co-sponsored a few polls with Pollster B, then the researchers investigated all the polling associated with Pollster B. If Pollster B qualified for inclusion, they were then added to the study.  

Determining which methods each pollster used in national public polling for each year

For each pollster in the study, researchers set out to document which sampling frames and interview modes the pollster used for national public polls in each even-numbered year from 2000 to 2022. Unfortunately, this kind of information cannot be found in any one location. Indeed, one of the main motivations for this study was that existing databases are insufficient for understanding key distinctions in modern polling. Existing resources might indicate whether a poll was done by “phone” or “online,” but there is often no information about the sample source. Consequently, Center researchers scoured the internet for more detailed documentation. Researchers executed this work in several steps.

1. Internet search for pollster and year. Researchers conducted a Google search for “[POLLSTER NAME] poll [YEAR]” starting with 2022 and working backwards, doing even-numbered years only. For each year, they limited the search time frame to 01/01/[YEAR] to 12/31/[YEAR]. They then investigated each of the hits on the first page of results. The poll in question could be disregarded if it was not sponsored by the pollster of interest and/or if the poll was fielded in the previous year (an odd-numbered year). Next, researchers conducted a Google search for “[POLLSTER NAME] survey [YEAR]” in the same manner. This was done because some organizations use the term “survey” instead of “poll.” This searching often yielded poll reports, press releases, methodology statements and other useful documentation.

2. Search the pollster’s website for documentation of polls and methodology. Some pollsters had a public webpage where poll results or documentation were posted. For some pollster websites this was productive, but for others it was not. In some cases, a poll was listed with a broken link. In those cases, researchers incorporated the additional information from the pollster’s website into the Google searches described above.

3. Search the Roper iPoll Archive. Researchers entered the pollster and year in the search fields and looked for documentation of polls and methodology.

4. Search the FiveThirtyEight.com pollster database. This resource was helpful to find instances of polls and methodologies that had not been identified through the prior step.

5. Additional internet searches for missing information. In some cases, additional, more specific internet searches were needed to look for missing information. Researchers conducted Google searches for “[POLLSTER NAME] poll [YEAR] [MODE]” to confirm the year that use of a particular method began or ended. For example, “CBS News poll 2010 online” was used to check that the pollster did not do online polling in 2010.

Two researchers performed the steps above independently for each pollster. The team also created a codebook assigning a number to each combination of methods observed from a pollster in a given year. For example, code 1 denotes that the pollster used only live phone with random-digit-dial sample for at least one national public poll that year, while code 25 denotes that the pollster did at least one poll using online opt-in and at least one poll using a probability-based panel. The team conducted a reliability analysis on the two independently gathered sets of data. The Cohen’s kappa was 0.7, which is typically considered an acceptable level of agreement in social science content coding. A senior researcher then resolved any conflicts and produced a single dataset reflecting the best information available from the searching phase.
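
To make the reliability step concrete, an agreement check of this kind can be run with a few lines of Python; this is only a sketch, and the codes below are hypothetical stand-ins for the numeric method codes described above, not the study’s actual data.

```python
# Sketch of an intercoder reliability check using Cohen's kappa.
# The codes are hypothetical examples of the numeric method codes
# described above (e.g., 1 = live phone with RDD sample only).
from sklearn.metrics import cohen_kappa_score

coder_a = [1, 1, 25, 7, 1, 12, 25, 3, 1, 7]   # codes assigned by researcher A
coder_b = [1, 1, 25, 7, 4, 12, 25, 3, 1, 12]  # codes assigned by researcher B

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```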

To record the data, researchers created a spreadsheet in which each column was a year (2000, 2002, … 2022) and each row was an instance of a pollster using a particular method during that year. Researchers archived the URL documenting each instance of a pollster using a given method in a particular year. If the pollster did multiple polls using the same method, only one instance was archived per year.
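
As an illustration of that layout (not the Center’s actual tooling), a small pandas sketch can build the same wide structure; the pollster names, method labels and URLs below are hypothetical.

```python
# Hypothetical sketch of the wide tracking sheet: one column per even-numbered
# year, one row per pollster-method combination, each cell holding the archived
# documentation URL for one observed instance.
import pandas as pd

records = [
    {"pollster": "Pollster A", "method": "live phone (RDD)", "year": 2016, "doc_url": "https://example.com/a16"},
    {"pollster": "Pollster A", "method": "probability panel", "year": 2022, "doc_url": "https://example.com/a22"},
    {"pollster": "Pollster B", "method": "online opt-in", "year": 2022, "doc_url": "https://example.com/b22"},
]

wide = pd.DataFrame(records).pivot_table(
    index=["pollster", "method"], columns="year", values="doc_url", aggfunc="first"
)
print(wide)
```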

Pollster outreach to verify the information gathered

As a final quality check, a senior researcher emailed each pollster to verify the information. These emails explained the study goals and presented the methods information observed from 2000 to 2022 specifically for the pollster. The email asked for any additions or edits. Most organizations (47 of the 78) responded. Pollsters were very generous with their time. Among those responding, most (81%) confirmed that the information recorded looked accurate. Others offered corrections, which were then applied to the study dataset. In some cases, a pollster correction could be corroborated by a URL. In a small number of instances, staff could not find a corroborating URL, but the information provided by the pollster was taken as authoritative. Such instances are recorded in the documentation dataset as “pollster email” instead of a URL.

Assumptions made when information was incomplete

Sometimes the best available documentation of a poll did not clearly describe the sample source and/or interview mode. However, it was often the case that circumstantial evidence supported a reasonable educated guess. The team applied several guidelines in such situations, and each of these guidelines proved well-founded based on the input received during pollster outreach. A rule-based sketch of these fallback guidelines appears after the list.

  • If mode was not specified, but the questionnaire contained interviewer instructions, such as to read or not read certain response options, the poll was coded as live phone.
  • If mode was specified as live phone but the sample source was not, and the poll was described as a sample “of Americans,” the poll was coded as live phone with random-digit-dial sample.
  • If the poll was described as having been conducted “online” but there was no other information, the poll was coded as online opt-in. (Our experience was that any time a poll used a probability-based panel, the pollster always disclosed the name of the panel.)
  • If the poll documentation reported a “credibility interval” or “modeled margin of error” but did not disclose the sample source, the poll was coded as online opt-in.
  • If the methodology was not disclosed but the pollster had a clear track record of consistently using a certain methodology, the poll was coded consistent with the pollster’s known methods.
  • Documentation of live phone polls before roughly 2012 often did not disclose the sample source, such as whether it was random-digit dial or registered voter records. The information that is available suggests that most national public polls conducted from 2000 to 2012 were probably using random-digit-dial sample. If a poll was live phone but the sample source was not specified, the poll was coded as live phone with random-digit-dial sample (unless the pollster had a record of using registration-based sampling).
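
Below is the rule-based sketch referenced above, written as a single Python function; the field names and return labels are hypothetical, and the actual coding relied on researcher judgment rather than a script.

```python
# Rule-based sketch of the fallback coding guidelines; field names and labels
# are hypothetical, and the real process also relied on researcher judgment.
def code_poll(poll: dict) -> str:
    mode = poll.get("mode")            # e.g., "live phone", "online", or None if undisclosed
    frame = poll.get("sample_source")  # e.g., "RDD", "voter file", or None if undisclosed

    if mode is None and poll.get("has_interviewer_instructions"):
        return "live phone"
    if mode == "live phone" and frame is None and poll.get("described_as_sample_of_americans"):
        return "live phone with RDD sample"
    if mode == "online" and frame is None:
        return "online opt-in"  # probability panels were always named in the documentation
    if frame is None and poll.get("reports_credibility_interval"):
        return "online opt-in"
    if mode is None and frame is None and poll.get("pollster_known_method"):
        return poll["pollster_known_method"]  # fall back to the pollster's consistent track record
    if mode == "live phone" and frame is None and not poll.get("uses_registration_based_sampling"):
        return "live phone with RDD sample"  # default for older, sparsely documented live phone polls
    return "undetermined"
```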

Strengths and weaknesses of the research design

Like any study, this one has its strengths and limitations. The strengths include:

  • Offering more insight than common industry characterizations. Existing databases that code polling methods tend to use terms like “online” or “telephone” that gloss over major distinctions, such as whether an online sample was recruited using convenience approaches or random sampling from a high-coverage frame. This study distinguishes between online surveys fielded with opt-in sample and those fielded on probability-based panels. This study also distinguishes between live phone surveys using registered voter records or random-digit-dial procedures.
  • Documenting growth in the number of methods individual pollsters use. This study captures not just changes in methods but the increasing use of multiple methods within and across polls by the same pollster.
  • Offering a timeline long enough to see how trends unfolded. While other reports have provided a snapshot of the methods used in a particular year or election, they generally do not shed light on the trajectory of those methods. This study distinguishes changes emerging in recent years from those with longer arcs.
  • Studying new approaches. This study coded methods information at a level granular enough to see the emergence of new techniques such as using text as a supplemental mode or matching an online opt-in sample to registered voter records.  

The study limitations include:

  • Only national polling was considered. Unfortunately, attempting to find, document and code polling from all 50 states and the District of Columbia would have exceeded the time and staff resources available.
  • Not all methods details were considered. This study focuses on two key poll features: where the respondents came from (the sample source or sources) and how they were interviewed (the mode or modes). While important, they are not exhaustive of the important decisions in designing a poll. The study did not attempt to track other details, such as weighting. Because the study only measured two out of all possible poll features, estimates from this study likely represent a lower bound of the total amount of change in the polling industry.
  • Odd-numbered years were excluded. The study reflects data for even-numbered years only, which was a purely practical decision. Hundreds if not thousands of national public polls are released each year, and processing them for this report required substantial labor. Focusing on even-numbered years cut the manual labor roughly in half.
  • Given the size and fluidity of the industry, the dataset may be missing some pollsters. While the research team went to great lengths to include all the pollsters that were eligible under the study criteria, it is very possible that some were missed. The online public record is only so complete, and the further back in time one goes the more broken links one encounters. Moreover, technology has destroyed the barriers to public polling that once existed, and so the number of potential pollsters feels almost without limit. This study includes many of the most prominent polling organizations, but probably misses some of the less prominent pollsters with smaller digital footprints.
  • The study does not measure the volume of polls conducted with each sample source and mode of interview. The low cost of online opt-in polling has led to a dramatic increase in the number of such polls during the period of study. In practical terms, this means that the growth in the number of organizations conducting online opt-in polls could understate the growth in the share of all polls conducted with this methodology.

Supplemental materials

The dataset accompanying this report lists the 78 organizations included. The Documentation URLs file provides the webpages used to code the methods. The Codebook details what each code means.

How Call-In Options Affect Address-Based Web Surveys https://www.pewresearch.org/methods/2021/08/25/how-call-in-options-affect-address-based-web-surveys/ Wed, 25 Aug 2021 17:50:01 +0000 https://www.pewresearch.org/methods/?p=2079 One method to improve survey representation of the non-internet and less literate population is to allow people to take surveys offline. In March, we fielded a study to test the feasibility and effect of collecting data through respondent-initiated interactive voice response; here’s what we found.

The post How Call-In Options Affect Address-Based Web Surveys appeared first on Pew Research Center Methods.

Automated call-in options may help reach respondents who are less tech savvy, but relatively few choose this option and logistical complications abound 
How we did this

Members of Pew Research Center’s American Trends Panel (ATP) complete all surveys online. Our goal was to determine whether adding the option to complete surveys through inbound interactive voice response (IVR) was feasible and if it would improve representation of less (digitally) literate and non-internet users.

Pew Research Center tested the viability of adding an IVR data collection mode on a sample independent of the ATP. In March 2021, the Center mailed survey invitations to 10,000 residential addresses sampled from the United States Postal Service’s Computerized Delivery Sequence File. The invitation included $2 and asked potential respondents to participate in a short survey in the mode of their choice. They could either log in to a website and complete the survey online or dial a toll-free number to respond via interactive voice response. In the IVR mode, individuals listened to computer-recorded questions and response options and keyed in their answer (e.g., 1 for Yes or 2 for No). All participants who answered via web received the full questionnaire, while those who chose IVR were randomized between the same full questionnaire and an abbreviated version. Web respondents received an additional $15 upon completion, and IVR respondents received $10. In total, 1,332 individuals completed the survey, with 1,250 doing so by web and 82 via IVR.

About 7% of U.S. adults do not use the internet, 16% are not digitally literate, and half cannot read above an eighth-grade level. These attributes can make it difficult for many Americans to participate in a self-administered online survey. Moreover, non-internet and less literate (both in terms of reading level and computer savviness) individuals are not randomly distributed throughout the population. Rather, they are, on average, older, less formally educated and more likely to live in rural areas than their counterparts. Missing them, in other words, harms the representation of online surveys and can introduce bias.

Meanwhile, a growing number of surveys, including those conducted on Pew Research Center’s American Trends Panel (ATP), can only be completed online. Online-only methods underrepresent the non-internet and less literate population, potentially increasing bias and misrepresenting variance in the data. Some web-only surveys attempt to account for the non-internet and less literate populations using weighting adjustments, while others provide internet access (as is done for the ATP) or an alternative mode of data collection to individuals without internet access. But these are imperfect solutions. Weighting relies on assumptions about the similarities between internet and non-internet populations that may be faulty. Providing internet access does not address literacy challenges and may not be successful in recruiting individuals who consciously choose to be offline. And the introduction of some additional modes (such as live telephone interviewing) may introduce interviewer effects or (in the case of mail) be infeasible due to timelines and budgets.

One method to improve representation of the non-internet and less literate population is to allow people to take surveys in an offline mode that does not require reading but is still self-administered. In March 2021, the Center fielded a study to test the feasibility and effect of collecting data through inbound (respondent-initiated) interactive voice response (IVR) in addition to the internet. An invitation was mailed to 10,000 addresses and gave individuals the choice of completing the survey online or dialing a toll-free number to respond via IVR. In the IVR mode, individuals listened to computer-recorded questions and response options and keyed in their answer using their phone. In total, the study yielded 1,332 completed interviews, 1,250 via web and 82 via IVR.

The study yielded three primary findings:

  • Collecting responses through IVR is much more logistically complex than web-based data collection. However, the challenges associated with adding an IVR mode appear to be surmountable with additional experimentation and resources.
  • Individuals who respond via IVR are different than those who answer via web and better represent groups (e.g., conservatives, adults with less formal education) that the ATP and other online panels have historically underrepresented. Unfortunately, the proportion of respondents in this test choosing to respond via IVR (6%) instead of online is too small to meaningfully shift the overall distributions of participants in a panel like the ATP.
  • Data quality from inbound IVR may not be as high as that from online response, but that finding is tentative for three reasons. The IVR estimates are subject to sizable sampling errors because only 82 respondents chose that mode. Also, by design, mode was not experimentally assigned. The fact that IVR attracted a different set of respondents, with less education than web respondents, may account for at least some of the data quality differences between the two modes. Finally, this was the Center’s first time testing inbound IVR, and it is possible that further refinements in the protocols could achieve results more favorable to IVR.
Inbound versus outbound IVR

In 2020, about 12% of national preelection polls used IVR either as the sole response mode or in combination with another mode like online. In almost all instances, these polls used outbound IVR. The study summarized in this report tested a similar but distinct approach called inbound IVR. The terminology flows from the perspective of the person conducting the survey. With outbound IVR, the automated calls are initiated by the researcher and go out to each of the phone numbers sampled for the survey. From the respondent’s perspective, their phone rings (or the call is blocked); they have received a cold call to take an automated survey. By contrast, the inbound IVR process tested here starts with the sampling of home addresses. The sampled addresses were mailed a letter with $2 and a request to take a survey either online or by calling in to take the automated survey. From the respondent’s perspective, they can choose how and when they respond.

The addition of IVR as a method to improve representation of the non-internet and less literate population in online panels is promising, but additional research is needed to streamline the survey deployment process, reduce costs and increase the proportion of IVR respondents. For example, while this study sheds light on the potential biases associated with online-only panels and provides some evidence of the potential for IVR, it does not include a way to determine the proportion of IVR participants who would ultimately agree to empanelment (i.e., participate in repeated surveys), nor does it determine whether these individuals may be recruited to the panel via alternative modes (e.g., mail or live phone) and retained by allowing response to individual panel surveys via IVR. Additionally, the study is limited to English speakers, so it does not address the feasibility of recruiting underrepresented groups within the Spanish-speaking population. We hope that researchers outside of the Center may use the knowledge gained in this study to further develop best practices on how to incorporate IVR.

IVR presents difficult but surmountable challenges in programming and implementation

There are several challenges that make implementing IVR more difficult than some other modes. 

First, some survey platforms (the software used to collect and manage survey data) are not suitable for both web and IVR administration. As a result, the IVR and web instruments have to be programmed twice, using different platforms for each mode. In our case, systems were synced nightly to minimize the risk that individuals completed the survey in both modes. However, the lack of live integration meant duplicate interviews were possible (this happened 14 times), and respondents could not start in one mode and complete in a different mode without having to start over. Having multiple platforms also creates a need for manual intervention to provide data collection monitoring (e.g., the data collection dashboard could not track IVR partial interviews) and doubles both the programming and testing resources required (i.e., cost and time) and the chances of programming errors. 
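
As a hypothetical illustration of the duplicate problem created by the nightly (rather than live) sync, a check along these lines could flag cases that completed in both modes; the case IDs and column names are invented for the example.

```python
# Hypothetical sketch: flag cases that completed the survey in both modes
# because the web and IVR platforms were only synced nightly.
import pandas as pd

web_completes = pd.DataFrame({"case_id": [101, 102, 103], "mode": "web"})
ivr_completes = pd.DataFrame({"case_id": [103, 104], "mode": "ivr"})

combined = pd.concat([web_completes, ivr_completes], ignore_index=True)
duplicates = combined[combined.duplicated("case_id", keep=False)]
print(duplicates)  # case 103 completed in both modes and needs a resolution rule
```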

Second, unique features of the IVR software require special consideration. For example, the software cannot easily randomize question wording differences. Center researchers often field questions in which the response options are shown in a randomized order. One respondent may be asked a question that offers “strongly agree” as the first answer choice and “strongly disagree” as the last, whereas another respondent would receive the choices in reverse order. For IVR, a separate question has to be programmed for every order combination, and the resulting variables have to be recombined into a single variable after data collection. 
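
A minimal sketch of that post-collection recombination step, assuming two order-specific IVR variables were collected; the variable names and values are hypothetical.

```python
# Hypothetical sketch: each respondent answered only one of the order-specific
# IVR variables, so the combined variable takes the first non-missing value.
import pandas as pd

df = pd.DataFrame({
    "q10_agree_first": ["Strongly agree", None, "Somewhat disagree"],
    "q10_disagree_first": [None, "Strongly disagree", None],
})

df["q10"] = df["q10_agree_first"].combine_first(df["q10_disagree_first"])
print(df["q10"].tolist())  # ['Strongly agree', 'Strongly disagree', 'Somewhat disagree']
```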

The software also cannot filter response options based on prior responses without complex programming. For this study, this limitation affected how Hispanic origin was collected. Individuals who self-identified as Hispanic were asked to provide their Hispanic origins (e.g., Cuba, Mexico). If they selected multiple countries, respondents were asked with which one they most closely identify. In the web survey, response options to the follow-up question were limited to those responses chosen in the first question. For IVR, individuals could (though none did) say they most identified with a country that they had not previously listed. 
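
For comparison, the web-style filter described above amounts to something like the sketch below, with hypothetical origin labels; replicating even this simple behavior in the IVR platform would have required complex programming.

```python
# Hypothetical sketch of the web behavior: the follow-up question offers only
# the origins the respondent selected in the first question.
ALL_ORIGINS = ["Mexico", "Puerto Rico", "Cuba", "Another country"]

def followup_options(selected_origins):
    return [origin for origin in ALL_ORIGINS if origin in selected_origins]

print(followup_options(["Cuba", "Mexico"]))  # ['Mexico', 'Cuba']
```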

IVR requires adaptation to question wording and flow compared to web

Third, the software requires multiple types of changes to the ways in which questions are worded. “Check all that apply” questions (a single question in which respondents can select more than one answer) are not feasible in IVR, so edits are made to repeat the stem or start of these types of questions and then ask a yes-no question about each response option. 
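
As a sketch of that adaptation, a single “check all that apply” web item can be expanded into a series of yes-no IVR items that repeat the question stem; the stem and option labels below are hypothetical.

```python
# Hypothetical sketch: expand one "check all that apply" item into a series of
# yes-no questions, repeating the question stem for each response option.
STEM = "Do you get news from each of the following sources?"
OPTIONS = ["Television", "Radio", "Social media", "A print newspaper"]

def expand_check_all(stem, options):
    return [f"{stem} {option}: press 1 for yes or 2 for no." for option in options]

for item in expand_check_all(STEM, OPTIONS):
    print(item)
```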

Another common question format on Center surveys is questions for which the response options are part of the question (e.g., “Do you currently identify as a man, a woman, or in some other way?”). To minimize repetition, the researchers added instructions into the IVR question (e.g., “Do you currently identify as a man? Press one. For a woman, press two, or for some other way, press three”).

Some questions commonly asked at the Center have a long list of response options (e.g., online, respondents are provided 12 response options when asked for their religion). In IVR, respondents are required to listen to all response options before answering in order to minimize order effects and speeding. All of these examples increase respondent burden, which may increase breakoffs or measurement error due to satisficing.

Fourth, open-ended questions (those that respondents answer in their own words) are difficult in IVR. For the feasibility test, some open-ended questions were dropped from the IVR version. For example, if an individual reported being of a race other than those specified, the web version provided a text box for the respondent to type in their answer. Researchers later back-code the open-ended text into preexisting categories. In our study, 5% of web respondents selected “some other race or origin,” of which 28% entered text that was coded back to another race group and changed the assigned race category the Center uses for analysis. By eliminating the open-ended response for the IVR mode, race and similar variables cannot be back-coded, artificially increasing the proportion of respondents identifying as another race. 

Some open-ended questions cannot be eliminated (e.g., name for which to address the incentive check). In these cases, respondents were asked for a verbal response, which was later manually transcribed. Manual transcription was feasible given the limited number of open-ended responses and IVR respondents, but this approach may not be scalable. There was also some concern that manual transcription would result in misspellings that would require checks to be reissued, but no such concerns materialized.

Respondents took more than 80% longer to complete via IVR than web

Fifth, IVR surveys take almost twice as long to complete as web surveys. IVR respondents in our study were randomly divided into two groups – one group received a long form (max of 83 questions) and the other received a short form (max of 44 questions). Web respondents and those assigned to the IVR long form received the entire questionnaire, whereas short form respondents received an abbreviated questionnaire. Whereas web respondents took an average of 11 minutes to complete the entire survey, it took 20 minutes to complete via IVR. This is in part due to the number of additional questions required to collect the same information in IVR (e.g., asking Hispanic origin is one question online but nine questions via IVR). Each IVR question also takes around 1.6 times as long to administer and collect a response as the same question asked online (18 seconds vs. 11 seconds for IVR long form and web, respectively). IVR respondents are required to listen to the entire question and all response options before they can select an answer, whereas web respondents have no time constraints imposed on them.

Sixth, the Center requested that the IVR voice be automated, not recorded by a live human. ATP surveys often require last-minute questionnaire changes, and the person who initially records a question may not be available on a tight timeline to implement the changes in IVR. Additionally, consistency is important across ATP surveys, and the same individual who records one survey may not be available for a later survey. Automated text-to-voice applications do not suffer these limitations. Automation also provides flexibility to control the speed of administration. 

But automated voices come with their own challenges. Emphasis or stress on a particular word is not possible in the IVR platform used for this study (underlining or ALL CAPS are used to provide emphasis online), placing additional cognitive burden on respondents. The use of automation also requires precise placement of punctuation. If punctuation is incorrect, the automated voice does not pause in the appropriate places. In this study, several iterations of the questionnaire were required just to address punctuation placement. 

Logistically, several improvements would be needed before IVR could be reasonably implemented as an additional mode for the ATP. Surveys would need to be moved to a platform that could field both web and IVR surveys. The platform would need to be able to easily randomize question and response option order, implement complex filters and skip logic and be able to adjust emphasis on specific words. Staff would need to be trained to write questions that could be fielded in both web and IVR and trained to format them to allow the computer to properly pause and stress words in a manner consistent with human speech patterns. Additional experimentation would be required to ensure that adaptations in question wording do not break trends and produce reasonably reliable and unbiased data. Each panel is different, but we believe most of these requirements would hold for other online panels as well.

IVR improves recruitment of non-internet and less literate individuals

IVR respondents were half as likely to ‘almost constantly’ use the internet compared with web respondents

As one might expect, IVR is more effective than web surveys at gaining participation from less tech-savvy adults. Nearly half (45%) of all IVR respondents in our study reported infrequent (less than several times per day) or no internet use. This compares to just 7% of web respondents. 

While direct measures of literacy were not collected, IVR respondents are different on several metrics correlated with both internet use and literacy. They are nearly twice as likely to be age 65 or older and more than 2.5 times as likely to have a high school education or less or to make $30,000 or less in household income than their web counterparts. IVR respondents are also more likely to identify as politically conservative than web respondents. While individuals age 65 and older are already well represented on the ATP, the ATP significantly underrepresents individuals with less education and lower incomes. If the number of respondents recruited via IVR were large enough, it could rectify the skew in education and income on the panel.

While IVR was successful in recruiting individuals from many groups traditionally underrepresented on the ATP, the success was not universal. For example, the ATP underrepresents young adults (ages 18 to 29) and less socially engaged individuals (e.g., people who do not volunteer and/or vote). Only 5% of IVR participants are ages 18 to 29, compared with 14% in the general population, and 72% of IVR respondents report voting in the 2020 general election, compared with the actual turnout of 66%. Neither of these findings is surprising. Younger adults are more likely to be online and to opt for web over IVR. IVR, like all self-administered modes, also requires significant initiative from the respondent, an effort that less engaged individuals are less likely to make.

IVR respondents were less educated and had lower incomes than their web counterparts

While the addition of IVR appears successful at recruiting individuals different from those who respond via web, our survey did not recruit enough of them. A total of 1,332 respondents completed the survey, yielding a response rate of 13.7% (AAPOR RR1). Nearly all respondents (94%) opted to complete via web; only 82 completed via IVR.
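
For reference, AAPOR RR1 is the most conservative of the AAPOR response rate formulas: completed interviews divided by all sampled cases, including cases of unknown eligibility. A minimal sketch follows, using hypothetical case dispositions (not the study's actual counts) that land near the 13.7% reported here.

def aapor_rr1(completes, partials, refusals, noncontacts, other, unknown_eligibility):
    """AAPOR Response Rate 1: completed interviews divided by all sampled
    cases (completes, partials, refusals and break-offs, non-contacts,
    other non-interviews, and cases of unknown eligibility)."""
    denominator = (completes + partials + refusals + noncontacts
                   + other + unknown_eligibility)
    return completes / denominator

# Hypothetical dispositions for roughly 9,700 sampled cases.
print(f"{aapor_rr1(1332, 150, 400, 7000, 100, 740):.1%}")  # ~13.7%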

A slightly larger share of invited individuals opted to start IVR (7%), but IVR suffered from a higher breakoff rate. Whereas 95% of individuals who screened into the web survey completed it, only 80% of those who received the IVR long form and 89% of those who received the IVR short form did so. This is not surprising given that the IVR survey (even the short form) took longer, on average, to complete, and response rates are inversely related to survey length (as length increases, response rates decrease). Unfortunately, it suggests that additional improvements are needed to increase the initiation rate (i.e., the proportion of people who start the survey) among the types of people who may lean toward IVR rather than web response. It also suggests that the IVR survey may need to be shorter, or incentives larger, to improve the IVR completion rate.
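
The breakoff comparison amounts to dividing completes by starts within each form. A small sketch with hypothetical case counts (not the study's actual counts), chosen to roughly match the completion rates above:

# Hypothetical starts and completes by form.
starts = {"Web": 1330, "IVR long form": 50, "IVR short form": 45}
completes = {"Web": 1264, "IVR long form": 40, "IVR short form": 40}

for form, n_started in starts.items():
    completion_rate = completes[form] / n_started
    print(f"{form}: completion {completion_rate:.0%}, "
          f"break-off {1 - completion_rate:.0%}")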

A higher proportion of IVR participants stopped the survey prior to completing than those participating via web

One methodological change that would ensure more individuals complete the survey via IVR is eliminating the web option. However, this approach would likely lower the overall response rate, since many people prefer completing surveys online. Moreover, IVR is useful insofar as it recruits individuals who would otherwise not participate. The goal should not simply be to increase the number of responses obtained via IVR, but to increase the number of responses from the types of people who prefer IVR.

Ultimately, while the IVR mode shows promise, more research is needed before it can qualify as a feasible mode for the ATP (and, likely, other online panels). In particular, while adding IVR would recruit individuals currently underrepresented on the ATP, the share of respondents completing via IVR is too small to meaningfully improve representation on the panel. Response rates among individuals who would be more inclined to answer via IVR (as opposed to web) need to be increased. Experimentation to improve the productivity of the IVR mode could include: incentivizing IVR more than web; limiting the IVR questionnaire to a subset of the web questionnaire, including only items for which bias is known or suspected (data from IVR-inclined respondents is most valuable when their answers differ from those of web respondents); reducing survey length to improve completion rates; or testing different data collection protocols, such as recruiting via mail and transitioning to IVR after empanelment.

IVR data quality appears acceptable, but is not as high as in web surveys

Some researchers have raised concerns about poor IVR data quality. Multiple analyses of the study data suggest that inbound IVR data quality may be sufficient, though not as high as that observed in the web survey. However, the findings are not conclusive. The mode of administration was not randomized; individuals could choose the mode in which they wished to respond, which confounds population differences with mode effects. Moreover, the variance of estimates from the IVR sample was large because of the small sample size, limiting the ability to detect true differences. Finally, design choices in both web and IVR were made to maximize data quality. For example, IVR respondents were required to listen to all response options before entering an answer to mitigate satisficing. To the extent that different design choices are made in other surveys, data quality may differ.

Some respondents are prone to order effects in self-administered modes. The order of the response options was randomized (first to last vs. last to first) for seven variables in both web and IVR. Order effects were relatively consistent between modes for four of the seven variables. However, three variables were susceptible to large (over 10 percentage points) order effects in IVR that were not observed among web responses: the frequency with which individuals discuss government and politics, gender, and party identification. As noted before, the IVR case counts are small (approximately 40 per group), so small changes in the distribution (regardless of their significance) can create large percentage-point differences. IVR may also be more susceptible to satisficing later in the survey. Additional testing with larger samples, different IVR questionnaire lengths and different question placements is required before more conclusive inferences can be drawn. While order effects are less than ideal, randomizing the response order adds noise but can eliminate bias in these variables. Ultimately, even if the observed order effects are determined to be real, they can be accounted for and are not insurmountable.
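
To make the order-effect calculation concrete, the sketch below compares answer distributions between the two randomized orders for a single variable; gaps of more than about 10 percentage points are the kind flagged above. The data are invented for illustration only.

import pandas as pd

# Invented respondent-level data: which randomized order each respondent saw
# (forward vs. reversed) and the answer given to one item.
df = pd.DataFrame({
    "order":  ["forward"] * 4 + ["reversed"] * 4,
    "answer": ["Republican", "Democrat", "Democrat", "Independent",
               "Democrat", "Republican", "Republican", "Republican"],
})

# Each answer's share (in percent) within each order group.
shares = (df.groupby("order")["answer"]
            .value_counts(normalize=True)
            .unstack(fill_value=0) * 100)

# Absolute percentage-point gap between the two orders, per answer.
order_effect = (shares.loc["forward"] - shares.loc["reversed"]).abs()
print(order_effect.round(1))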

Order effects appear larger in IVR than web, but conclusions are limited due to small sample sizes

IVR levels of misreporting higher than web, but still low

Given the amount of time it takes to complete an IVR survey (compared with a web survey) and the lack of engagement from a live interviewer, IVR respondents may be more likely to satisfice, that is, to select any reasonable response option rather than the most accurate one. Two questions were included in both the web and IVR modes to identify satisficing. Respondents were asked whether they used the nonexistent social media platforms FizzyPress and Doromojo. All respondents should have selected “no,” but three IVR respondents reported using FizzyPress and one reported using Doromojo. These levels of inaccurate reporting are low but should be monitored with a larger sample.
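
A check like this is straightforward to automate. The sketch below flags any respondent who reports using either bogus platform; the data are invented for illustration.

import pandas as pd

# Invented responses to the two bogus-platform trap questions.
df = pd.DataFrame({
    "mode":            ["IVR", "IVR", "IVR", "Web", "Web", "Web"],
    "uses_fizzypress": ["No", "Yes", "No", "No", "No", "No"],
    "uses_doromojo":   ["No", "No", "No", "No", "No", "No"],
})

# Any "Yes" to a platform that does not exist counts as misreporting.
df["misreported"] = df["uses_fizzypress"].eq("Yes") | df["uses_doromojo"].eq("Yes")
print(df.groupby("mode")["misreported"].mean())  # share of misreporters by mode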

There was also concern that respondents might be more inclined in IVR to skip questions or select an answer at random because they had missed part of the question and did not want to wait for it to repeat. The latter (selecting an answer at random) could not be evaluated. To measure the former, 28 questions fielded to all respondents (both the IVR short and long forms) were evaluated for item nonresponse. Four had an item nonresponse rate of 5% or higher in IVR; none reached that level on the web. A total of 11% of IVR respondents refused to provide their race. Race was a “check all that apply” question on the web that had to be modified for IVR and became cumbersome for respondents. Further refinements (for example, “What is your race or origin? For White, press one. For Black or African American, press two. For Asian or Asian American, press three. For any other race or if you are multiracial, press four”) may reduce the item nonresponse rate. The nonresponse rate for religion (5%) would also likely be reduced by editing the question, specifically by reducing the number of response options from 12. Income suffered from high nonresponse (6%). However, income is typically the most refused item in U.S. surveys, and this nonresponse rate is well below that found elsewhere. Ideology is the only question for which additional investigation may be warranted.
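
Item nonresponse rates of this kind are simply the share of respondents with a missing answer for each question, computed by mode. A minimal sketch with invented data:

import numpy as np
import pandas as pd

# Invented item-level data; NaN marks a refused or skipped item.
df = pd.DataFrame({
    "mode":     ["IVR", "IVR", "IVR", "Web", "Web", "Web"],
    "race":     ["White", np.nan, "Black", "Asian", "White", "White"],
    "religion": ["Protestant", "Catholic", np.nan, "None", "Catholic", "Jewish"],
    "income":   ["<$30k", "$30k-$75k", np.nan, ">$75k", "$30k-$75k", np.nan],
})

items = ["race", "religion", "income"]
# Share of respondents (by mode) who skipped or refused each item, in percent.
nonresponse = df.groupby("mode")[items].agg(lambda s: s.isna().mean() * 100)
print(nonresponse.round(1))  # items at or above ~5% in IVR are flagged above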

In all, some question edits may help improve the data quality among IVR respondents. Additional testing should be conducted on larger samples to provide more precise estimates of order effects, misreporting and item nonresponse. But none of the findings here would prevent IVR from being added to the ATP. 

Item nonresponse appears higher among IVR responses, but still low for most variables

The post How Call-In Options Affect Address-Based Web Surveys appeared first on Pew Research Center Methods.

Acknowledgments https://www.pewresearch.org/methods/2021/08/25/acknowledgments-5/ Wed, 25 Aug 2021 17:50:04 +0000 https://www.pewresearch.org/methods/?p=2085

This report was made possible by The Pew Charitable Trusts. Pew Research Center is a subsidiary of The Pew Charitable Trusts, its primary funder.

This report is a collaborative effort based on the input and analysis of the following individuals:

Research team

Ashley Amaya, Senior Survey Methodologist

Methodology

Courtney Kennedy, Director, Survey Research
Scott Keeter, Senior Survey Advisor
Andrew Mercer, Senior Research Methodologist 
Nick Bertoni, Senior Panel Manager
Dorene Asare-Marfo, Research Methodologist
Nick Hatley, Research Analyst
Arnold Lau, Research Methodologist

Communications and editorial

Rachel Weisel, Senior Communications Manager
Calvin Jordan, Communications Associate
Travis Mitchell, Copy Editor

Graphic design and web publishing

Bill Webster, Senior Information Graphics Designer
Travis Mitchell, Digital Producer

Other colleagues at Pew Research Center provided helpful comments on this study, including Claudia Deane and Michael Dimock.

The post Acknowledgments appeared first on Pew Research Center Methods.
