More on EpiSurveyor and Data Quality

Joel Selanikio on 03 September 2012

Recently, I wrote about how using EpiSurveyor not only saves time and money but also improves the quality of the collected data (“EpiSurveyor: A Smart Investment in Data Quality”).  My basic point was that collecting data with EpiSurveyor has two quality benefits:

  • enforced data controls – when using an electronic device, data controls can be imposed at the point of data entry, including skip patterns, range checks, etc.
  • rapid view of all data – seeing the data as it is collected lets supervisors identify problems in data collection as they are occurring.

I’ve gotten some feedback from EpiSurveyor users, who have pointed out that the “metadata” gained when data is collected with EpiSurveyor is even more useful than I had indicated:

I think in your “smart data” blog entry there is a third way that ES leads to data quality. ES enters ‘quality control metadata’ that can later be ‘audited’ during the formal data analysis. The example I have used is to do a frequency distribution of the time required to complete a survey (using the survey start and survey stop metadata).  Then I look for outliers.

It also has more subtle metadata such as the order in which the records were collected (which could not be deduced reliably from a stack of paper).
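The audit this reader describes is straightforward to script. Here is a minimal sketch of the idea in Python; the record structure, field names, timestamp format, and the plausibility threshold are all illustrative assumptions, not the actual EpiSurveyor export format:

```python
from collections import Counter
from datetime import datetime

# Hypothetical exported records carrying the start/stop metadata;
# field names and timestamp format are assumptions for illustration.
records = [
    {"collector": "A", "start": "2012-09-01 09:00:00", "stop": "2012-09-01 09:12:00"},
    {"collector": "A", "start": "2012-09-01 09:20:00", "stop": "2012-09-01 09:31:00"},
    {"collector": "B", "start": "2012-09-01 09:05:00", "stop": "2012-09-01 09:06:00"},
    {"collector": "B", "start": "2012-09-01 10:00:00", "stop": "2012-09-01 10:13:00"},
]

FMT = "%Y-%m-%d %H:%M:%S"

def duration_minutes(rec):
    """Minutes elapsed between survey start and survey stop."""
    start = datetime.strptime(rec["start"], FMT)
    stop = datetime.strptime(rec["stop"], FMT)
    return (stop - start).total_seconds() / 60

# Frequency distribution of completion times, binned to the nearest 5 minutes
freq = Counter(5 * round(duration_minutes(r) / 5) for r in records)

# Flag implausibly fast interviews; the cut-off is an assumed value a
# supervisor would set based on how long a real interview takes.
MIN_PLAUSIBLE_MINUTES = 5
suspects = [r for r in records if duration_minutes(r) < MIN_PLAUSIBLE_MINUTES]
# The one-minute record from collector B ends up in `suspects`.
```

A fixed plausibility threshold is used here rather than a statistical outlier test, since with small field samples a single fabricated record can inflate the standard deviation enough to hide itself.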

This is a great point, and one which we’ll be emphasizing with two features of the next version of EpiSurveyor, which enters its public beta testing phase in one month:

  • More date-time stamps – currently EpiSurveyor automatically records the date-time stamp only when a record is created (i.e., when the user starts filling out a new form). The next version will provide similar date-time stamps for:
    • when the record is completed
    • when the record is uploaded
    • when the record was last modified (for cases when the data is edited after the initial activity)

With these additional date-time stamps, it will be possible to know how quickly a form has been filled out, how long the data collector waited to upload it, and other useful information. In fact, we’re building those concepts into the second new feature:
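Those derived quantities fall out of simple timestamp arithmetic. A sketch, assuming hypothetical field names for the four stamps described above:

```python
from datetime import datetime

# One record with the four date-time stamps the next version will record;
# the field names here are assumptions for illustration.
record = {
    "created":   "2012-09-01 09:00:00",  # user started filling out the form
    "completed": "2012-09-01 09:14:00",  # user finished the form
    "uploaded":  "2012-09-01 17:30:00",  # record sent to the server
    "modified":  "2012-09-02 08:10:00",  # last edit to the record
}

FMT = "%Y-%m-%d %H:%M:%S"
ts = {k: datetime.strptime(v, FMT) for k, v in record.items()}

# How quickly the form was filled out
fill_out_minutes = (ts["completed"] - ts["created"]).total_seconds() / 60

# How long the data collector waited before uploading
upload_delay_hours = (ts["uploaded"] - ts["completed"]).total_seconds() / 3600

# Whether the record was changed after the initial upload
edited_after_upload = ts["modified"] > ts["uploaded"]
```

For this record the form took 14 minutes to complete, sat for roughly eight hours before upload, and was edited again the next morning.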

  • Supervisory report — in addition to the built-in basic analysis, we’re adding a “supervisory” analysis, which won’t be about the content of the data set but rather about the metadata, providing an immediate report of the average time between records, the average time spent completing the form, a list showing which data collectors have collected the most data and which the least, etc.

Making this kind of metadata available is incredibly exciting because it automates a time-consuming process that all supervisors would like to do — because it would be so useful in identifying problems — but rarely have the time to do.  After all, few if any supervisors have the time to manually calculate the average length of time to complete a form — certainly not while the data collection is actually occurring.  Now every supervisor can have that useful information as the data collection is occurring, at the click of a button!
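To make the calculation concrete, here is a rough sketch of the aggregations such a report involves; the record fields are the same illustrative assumptions as above, not the product’s actual schema:

```python
from collections import Counter
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical uploaded records; fields mirror the metadata described above.
records = [
    {"collector": "A", "created": "2012-09-01 09:00:00", "completed": "2012-09-01 09:12:00"},
    {"collector": "A", "created": "2012-09-01 09:20:00", "completed": "2012-09-01 09:31:00"},
    {"collector": "A", "created": "2012-09-01 09:45:00", "completed": "2012-09-01 09:58:00"},
    {"collector": "B", "created": "2012-09-01 09:05:00", "completed": "2012-09-01 09:18:00"},
]

def parse(s):
    return datetime.strptime(s, FMT)

# Average minutes spent completing a form, across all records
avg_completion = mean(
    (parse(r["completed"]) - parse(r["created"])).total_seconds() / 60 for r in records
)

def avg_gap_minutes(recs):
    """Average minutes between consecutive record starts for one collector."""
    starts = sorted(parse(r["created"]) for r in recs)
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(starts, starts[1:])]
    return mean(gaps) if gaps else None

by_collector = {}
for r in records:
    by_collector.setdefault(r["collector"], []).append(r)
gaps = {c: avg_gap_minutes(rs) for c, rs in by_collector.items()}

# Which data collectors have collected the most and the least data
counts = Counter(r["collector"] for r in records)
most, least = counts.most_common()[0], counts.most_common()[-1]
```

None of this requires looking at the survey answers themselves; every figure comes from the timestamps and the collector identifier, which is exactly why the report can be generated automatically while collection is still under way.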

Below is a representation of what the new supervisory report will look like. We would love feedback and suggestions for how we might improve it.

[Image: supervisory report draft]