Alternative sources can encompass almost any type of information, which makes this topic intractable without specific examples. At a high level, the promise of "administrative data," for example, can be quite alluring. But for most statistical purposes, these data are not replacements for survey data. Rather, they offer potentially useful ways to inform data collection, augment analyses, improve survey estimates, and integrate with survey data to produce superior, and potentially cheaper and faster, data products.
Here is one example. A national survey is conducted every week to track national gasoline and diesel prices. It is a complex endeavor that requires collecting and processing the data within a single day.
A popular smartphone app uses crowdsourcing to collect gasoline and diesel prices at gas stations. Individuals post and update prices at each gas station to inform all users of the app in real time.
The two sources are not independent, since the national survey makes some use of the app data. Indeed, it is commendable that the survey draws on the app data to some degree. Nonetheless, I was surprised by how closely the two sets of estimates corresponded when I graphed the survey estimates and overlaid them with the app estimates:
[Figure: weekly survey estimates of gasoline and diesel prices overlaid with daily app estimates]
The main giveaway as to which line represents which source is that the app provides daily estimates rather than weekly ones.
One key observation is that the week-to-week changes are almost identical in the two lines, with only minor differences in the price levels themselves. This suggests a possible future step: designing the survey with a different purpose, namely to periodically calibrate the app data. That would be a very different approach from using app data only where survey data are not available. Granted, there are other obstacles, such as a government estimate relying on proprietary data from the private sector. Still, this is an example of how survey data and data from alternative sources could be integrated further, with potential benefits.
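To make "periodically calibrate the app data" a little more concrete, here is a minimal sketch under stated assumptions. It assumes the gap between the two sources is an additive level offset (consistent with the observation that week-to-week changes match while levels differ slightly). On each survey reference date, it computes survey minus app and then applies that offset to every subsequent daily app value until the next survey week. The function name `calibrate_app_series` and the data layout (dictionaries keyed by date) are hypothetical illustrations, not how the survey actually works. A ratio adjustment or a model-based benchmarking method could be used instead.

```python
from bisect import bisect_right
from datetime import date


def calibrate_app_series(
    app_daily: dict[date, float],
    survey_weekly: dict[date, float],
) -> dict[date, float]:
    """Shift daily app prices by the most recent survey-minus-app offset.

    Hypothetical sketch: assumes an additive level difference between the
    two sources. Days before the first usable survey date are dropped
    because no offset is available for them yet.
    """
    # Benchmarks are the survey reference dates that the app also covers.
    benchmark_dates = sorted(d for d in survey_weekly if d in app_daily)
    offsets = [survey_weekly[d] - app_daily[d] for d in benchmark_dates]

    calibrated: dict[date, float] = {}
    for day in sorted(app_daily):
        # Index of the latest benchmark date on or before this day.
        i = bisect_right(benchmark_dates, day) - 1
        if i < 0:
            continue  # no benchmark yet, so there is no offset to apply
        calibrated[day] = app_daily[day] + offsets[i]
    return calibrated
```

Because the offset is refreshed each survey week, the calibrated series keeps the app's daily timeliness while its level stays anchored to the most recent survey estimate.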