Erin Hartman and Stephen Jessee
Previous studies have compared experimental findings from convenience samples (e.g., Amazon Mechanical Turk) with those from population experiments (e.g., nationally representative probability samples). While these results provide important information about how well less expensive samples approximate so-called "gold standard" samples, both types of samples ultimately consist of survey respondents, and as such provide no information about generalizability to nonrespondents. In some ways this potential difficulty is more pernicious because it could mean that even experiments fielded to the highest quality samples might produce biased estimates of true population average treatment effects.
We rely on data from more than 50 separate survey experiments conducted by NORC for the NSF-funded Time-Sharing Experiments for the Social Sciences (TESS). The NORC panel is unique in that it recruits respondents through a basic initial contact and then randomly samples a subset of initial nonrespondents for a "non-response follow-up" (NRFU) recruitment, which involves multiple contacts, offers of enhanced compensation, and face-to-face contact (door knocks). We can therefore re-estimate the main treatment effect of interest in each study separately for "initial responders" and for "reluctant responders," who more closely resemble nonrespondents. These data also serve as a unique test bed for evaluating the performance of new methods for estimating population effects from experimental results. In particular, we will test whether these methods, when applied to the initial responders in order to generalize to reluctant responders, can recover experimental benchmarks. In the modern era of single-digit survey response rates, an important question is whether it is reasonable to treat respondents as representative of the general population, even conditional on standard demographic and other factors that are commonly used in sampling and weighting.
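The testing strategy described above can be illustrated with a minimal sketch. The simulated data, variable names, and the choice of inverse-odds transport weighting as the generalization method are all assumptions for illustration, not the paper's actual data or estimators: treatment effects are estimated separately among initial and reluctant responders, and then a weighted estimate computed from initial responders alone is compared against the reluctant-responder benchmark.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def diff_in_means(y, t):
    """Simple difference-in-means treatment effect estimate."""
    return y[t == 1].mean() - y[t == 0].mean()

# --- Simulated respondent pool (illustrative, not TESS data) ---
n = 20_000
x = rng.normal(size=n)                      # covariate driving effect heterogeneity
p_reluctant = 1 / (1 + np.exp(2 * x))       # lower-x units more likely reluctant
reluctant = rng.random(n) < p_reluctant     # True = reluctant (NRFU) responder
t = rng.integers(0, 2, n)                   # randomized treatment assignment
y = x + t * (1 + x) + rng.normal(size=n)    # outcome; true effect is 1 + x

# Separate estimates in each responder group
ate_initial = diff_in_means(y[~reluctant], t[~reluctant])
ate_reluctant = diff_in_means(y[reluctant], t[reluctant])   # benchmark

# Generalize from initial responders to reluctant responders:
# weight initial responders by the odds P(reluctant | x) / P(initial | x),
# which reweights them toward the reluctant-responder covariate distribution.
model = LogisticRegression().fit(x.reshape(-1, 1), reluctant)
p = model.predict_proba(x[~reluctant].reshape(-1, 1))[:, 1]
w = p / (1 - p)
ti, yi = t[~reluctant], y[~reluctant]
ate_weighted = (np.average(yi[ti == 1], weights=w[ti == 1])
                - np.average(yi[ti == 0], weights=w[ti == 0]))
```

Because response reluctance and the treatment effect both depend on `x` here, the unweighted initial-responder estimate is biased for the reluctant-responder effect, while the weighted estimate recovers it; this mirrors the benchmark comparison the design makes possible.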