Sunday, January 31, 2010

Optimizing dual-frame RDD sample designs

Mike Brick taught an interesting short course through ASA/SRMS last year, presenting much-needed work on the optimum allocation of samples between landline and cell phone frames. His examples used made-up numbers for the cost per interview.

Unfortunately, the cost estimates themselves are one of the greatest challenges to optimal allocation. Survey costs are poorly understood and difficult to measure.
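
To make the dependence on costs concrete, here is a minimal sketch of the textbook cost-constrained optimum allocation, which sets each sample size proportional to W·S/√c. It is not Brick's method: it treats the landline and cell frames as if they were non-overlapping strata, and the budget, frame weights W, standard deviations S, and costs c are all invented for illustration. The function names are hypothetical.

```python
import math


def optimum_allocation(budget, weights, sds, costs):
    """Sample sizes minimizing sum(W_h^2 * S_h^2 / n_h) subject to
    sum(c_h * n_h) == budget, i.e. n_h proportional to W_h * S_h / sqrt(c_h).

    Simplification: landline and cell frames are treated as
    non-overlapping strata; a real dual-frame design must also
    handle the overlap (dual-user) domain.
    """
    denom = sum(w * s * math.sqrt(c) for w, s, c in zip(weights, sds, costs))
    return [budget * w * s / (math.sqrt(c) * denom)
            for w, s, c in zip(weights, sds, costs)]


def cell_share(cost_ratio, landline_cost=25.0):
    """Optimal share of cell interviews when W and S are equal across
    frames; cost_ratio = cell cost per interview / landline cost."""
    n_land, n_cell = optimum_allocation(
        budget=100_000.0,      # hypothetical budget
        weights=[0.5, 0.5],    # hypothetical frame weights
        sds=[1.0, 1.0],        # hypothetical standard deviations
        costs=[landline_cost, landline_cost * cost_ratio],
    )
    return n_cell / (n_land + n_cell)


for ratio in (1.0, 2.0, 3.0, 4.0):
    print(f"cost ratio {ratio:.1f}: optimal cell share {cell_share(ratio):.3f}")
```

With equal weights and standard deviations, the cell share reduces to 1/(1 + √(c_cell/c_land)), so an error in the assumed cost ratio, say 2 versus 4, moves the optimal cell share from about 41% to about 33% of interviews.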

To show how complex this can get, here is one example. Many treat the purchase of telephone samples as a standardized commodity, with some "bells and whistles", such as additional types of screening that can be purchased on a per-record basis to eliminate more nonworking numbers. However, sample vendors construct their sampling frames differently, even when they use the same type of list-assisted RDD methodology (e.g., 100-banks with at least one listed number). If so, there are at least two implications:
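
A list-assisted frame of this kind can be sketched as follows: group directory-listed numbers into 100-banks (the first eight of ten digits) and keep every bank that meets a minimum count of listed numbers. Both hypothetical vendors here apply the same rule; the assumption, for illustration only, is that they differ in the listed-number databases they start from. All numbers and function names are made up.

```python
from collections import Counter


def bank_of(number):
    """100-bank ID: area code + exchange + first two digits of the line
    number, i.e. the first 8 digits of a 10-digit number."""
    return number[:8]


def build_frame(listed_numbers, min_listed=1):
    """Set of 100-banks with at least `min_listed` listed numbers.
    All 100 numbers in each retained bank are in the frame."""
    counts = Counter(bank_of(n) for n in listed_numbers)
    return {bank for bank, k in counts.items() if k >= min_listed}


# Hypothetical listed-number databases for two vendors (invented numbers).
listed_a = ["3015550101", "3015551240", "3015559901"]
listed_b = ["3015550123", "3015559977", "3015557710"]

frame_a = build_frame(listed_a, min_listed=1)
frame_b = build_frame(listed_b, min_listed=1)
print("only in A:", sorted(frame_a - frame_b))
print("only in B:", sorted(frame_b - frame_a))
print("in both:  ", sorted(frame_a & frame_b))
print("frame sizes (numbers):", 100 * len(frame_a), 100 * len(frame_b))
```

Under these assumptions, the two frames share two banks but each contains one bank the other lacks, even though the stated methodology is identical.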

1. A dual-frame study should not use a landline sample from one vendor and a cell phone sample from another, because the overlap between the two frames is then unknown, leading to errors in the selection probabilities; and
2. One sample vendor may provide a more efficient sample for landlines but a less efficient one for cell phones. This has two further consequences for study design:
a. The preferred sample vendor depends on the sample design, and
b. The optimum allocation of landline and cell phone sample also depends on the selected sample vendor.
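
For implication 1, here is a minimal sketch of how a number's selection probability depends on which frames it belongs to, assuming independent simple random samples from each frame with hypothetical sampling fractions. If the frames come from different vendors, a bank can end up in both frames (or in neither), and a weight computed under the disjoint-frame assumption is then wrong. The function name is hypothetical.

```python
def inclusion_prob(in_landline_frame, in_cell_frame, f_land, f_cell):
    """Selection probability of one telephone number when independent
    simple random samples are drawn from each frame with sampling
    fractions f_land = n_L / N_L and f_cell = n_C / N_C."""
    p_missed = 1.0
    if in_landline_frame:
        p_missed *= 1.0 - f_land
    if in_cell_frame:
        p_missed *= 1.0 - f_cell
    return 1.0 - p_missed


f_land, f_cell = 0.001, 0.002  # hypothetical sampling fractions

# Assumed by the design: frames are disjoint, so this number can only
# be reached through the landline sample.
assumed = inclusion_prob(True, False, f_land, f_cell)
# With two vendors, its bank may also sit in the other vendor's cell
# frame, giving the number a second chance of selection.
actual = inclusion_prob(True, True, f_land, f_cell)
# Or its bank may be in neither frame: zero chance, so no valid weight.
missed = inclusion_prob(False, False, f_land, f_cell)

print(f"assumed weight {1 / assumed:.1f}, correct weight {1 / actual:.1f}")
print(f"probability for a number in neither frame: {missed}")
```

Here the weight computed under the disjoint-frame assumption is about three times the correct weight, and a number in neither frame has zero probability of selection and cannot be weighted at all.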

We ran an experiment a couple of months ago and found support for exactly this set of expectations. Below is a table of the proportion of nonworking numbers by sample vendor, first for cell phone numbers and then for landline numbers. One vendor had a superior screening methodology for landline numbers, which led to fewer nonworking numbers (although at a greater sample cost), while the other vendor provided cell phone sample with fewer nonworking numbers. The cell phone result was also found in another experiment a few months earlier, so in addition to being statistically significant at the 95% level, it does not appear to be due to chance. Because cell phone numbers do not undergo screening, the key to understanding this result is that there are multiple ways of identifying which banks of numbers should be allocated to the landline frame versus the cell phone frame; the same uncertainty may extend to identifying nonworking banks (exchanges, in this case).
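
The following sketch, under a deliberately simple and hypothetical cost model, shows how working-number rates could feed into the cost per completed interview, and hence into both the choice of vendor and the cost ratio used for allocation. Every price, rate, and yield below is invented; none are the experiment's results, and the names are illustrative only.

```python
# Hypothetical cost model and inputs: NOT the results of the experiment.
DIAL_COST = 1.50  # assumed interviewer cost per record dialed
COMPLETES_PER_WORKING = {"landline": 0.10, "cell": 0.06}  # assumed yields
VENDORS = {
    # frame -> (price per record, working-number rate); all values invented
    "Vendor A": {"landline": (0.12, 0.70), "cell": (0.10, 0.55)},
    "Vendor B": {"landline": (0.08, 0.55), "cell": (0.10, 0.65)},
}


def cost_per_complete(price, working_rate, completes_per_working,
                      dial_cost=DIAL_COST):
    """Every record is bought and dialed, but only working numbers can
    yield completed interviews."""
    records_needed = 1.0 / (working_rate * completes_per_working)
    return records_needed * (price + dial_cost)


def frame_costs(vendor):
    """Cost per complete in each frame when buying from one vendor."""
    return {frame: cost_per_complete(price, rate, COMPLETES_PER_WORKING[frame])
            for frame, (price, rate) in VENDORS[vendor].items()}


costs = {v: frame_costs(v) for v in VENDORS}
for frame in ("landline", "cell"):
    best = min(VENDORS, key=lambda v: costs[v][frame])
    print(f"cheapest {frame} completes: {best}")
for v in VENDORS:
    ratio = costs[v]["cell"] / costs[v]["landline"]
    print(f"{v}: cell/landline cost ratio {ratio:.2f}")
```

In this made-up example, Vendor A is cheaper for landline completes and Vendor B for cell completes. Because implication 1 argues for buying both frames from one vendor, the cell-to-landline cost ratio entering the allocation is about 2.1 with Vendor A but about 1.4 with Vendor B.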

One further note of caution: these are working-number rates, and a more efficient sample is not necessarily a "better" sample. At the extreme, the most efficient sample is also the one with the least coverage, such as a listed sample. To use a cliché, further work is [certainly] needed in this area.