Tracking missing data in longitudinal samples

I’ve been working on a massive longitudinal dataset recently. While I have R’s table function and a variety of other tools at my disposal, I find that I constantly go back to SAS to identify which time points are missing data for each subject. Because the data are so heavily longitudinal, I tend to keep them in “long” format (one column of data, with each subject appearing in multiple rows). The first thing I did was transpose the data (proc transpose) into “wide” format, so that each subject has all three of their visits on one row.
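For readers who want to see the long-to-wide step outside of SAS, here is a minimal Python sketch of the same idea. It is not the proc transpose code from this post; the subject IDs, visit numbers, scores, and the to_wide helper are all made up for illustration, and it assumes exactly three visits per subject.

```python
from collections import defaultdict

# Hypothetical long-format rows: (subject_id, visit, score).
# None marks a value that was recorded as missing.
long_rows = [
    ("S01", 1, 12.5),
    ("S01", 2, None),
    ("S01", 3, 14.1),
    ("S02", 1, 9.8),
    ("S02", 3, 10.2),  # S02 has no row at all for visit 2
]

VISITS = (1, 2, 3)  # assumption: three planned visits per subject


def to_wide(rows, visits=VISITS):
    """Return {subject: [value at each visit]}, one entry per subject.

    A visit with no row in the long data becomes None, the same as a
    row whose value is missing.
    """
    by_subject = defaultdict(dict)
    for subject, visit, value in rows:
        by_subject[subject][visit] = value
    return {
        subject: [seen.get(v) for v in visits]
        for subject, seen in sorted(by_subject.items())
    }


wide = to_wide(long_rows)
```

Note that a visit with no row and a visit with an empty value both end up as None here, which is usually what you want when the goal is simply to find the holes.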

I then used some arrays to loop through those three time points, putting in a $ where the data were present and an underscore where the data were missing. I can then run a quick proc freq on the result to get summary information, and hand off the list of missing data so someone can double-check that the missing data are really missing and not just a mistake in our database.
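The flagging and counting steps above can be sketched in Python as well. Again, this is an illustration under assumptions rather than the SAS array and proc freq code itself: the wide table, the subject IDs, and the missing_pattern helper are hypothetical, and I am assuming the three flags are joined into one pattern string per subject.

```python
from collections import Counter

PRESENT, MISSING = "$", "_"

# Hypothetical wide-format data: one list of three visit values per subject.
wide = {
    "S01": [12.5, None, 14.1],
    "S02": [9.8, None, 10.2],
    "S03": [11.0, 11.4, 11.9],
    "S04": [None, None, 8.7],
}


def missing_pattern(values):
    """Build a pattern such as '$_$': '$' if the visit has data, '_' if not."""
    return "".join(MISSING if v is None else PRESENT for v in values)


# One pattern per subject (the job the arrays do in the SAS version).
patterns = {subject: missing_pattern(vals) for subject, vals in wide.items()}

# Frequency of each pattern (the role proc freq plays).
pattern_counts = Counter(patterns.values())

# Subjects with at least one missing visit: the list to hand off for checking.
to_check = sorted(s for s, p in patterns.items() if MISSING in p)

for pattern, count in pattern_counts.most_common():
    print(pattern, count)
print("Check these subjects:", ", ".join(to_check))
```

The nice thing about a pattern string is that it reads left to right as visit 1, visit 2, visit 3, so a value like $_$ tells you at a glance that only the middle visit is missing.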

Examples of code and output are below. Names and numbers have been changed for obvious reasons.

