I’ve been working on this massive longitudinal dataset recently. And while I have the R tables function and a variety of other tools at my disposal, I find that I constantly go back to SAS to identify which time points are missing data for each subject. Since the data are so heavily longitudinal, I tend to keep them in “long” format (one column of data, with each subject appearing in multiple rows). The first thing I did was transpose the data (proc transpose) into “wide” format, so that each subject has all three of their visits on one row.
I then used some arrays to step through those three time points, putting a $ where the data are present and an underscore where the data are missing. I can then run a quick proc freq on the result to get summary information, and hand off the list of missing data so someone can double-check that the data are really missing and not just a mistake in our database.
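Stripped down to a toy example, the approach looks roughly like this. It is only a minimal sketch: the dataset and variable names (long, wide, flagged, id, visit, score, v1-v3, pattern) and the values are made up for illustration and are not the ones from the real study.

```sas
/* Hypothetical long-format data: one row per subject per visit */
data long;
  input id visit score;
  datalines;
1 1 10
1 2 .
1 3 12
2 1 .
2 2 .
2 3 9
3 1 7
3 2 8
3 3 6
;
run;

/* BY-group processing needs the data sorted by subject */
proc sort data=long;
  by id;
run;

/* Long to wide: one row per subject, columns v1-v3 for the three visits */
proc transpose data=long out=wide(drop=_name_) prefix=v;
  by id;
  id visit;
  var score;
run;

/* Build a 3-character pattern: $ = present, _ = missing */
data flagged;
  set wide;
  array v{3} v1-v3;
  length pattern $3;
  pattern = '';
  do i = 1 to 3;
    substr(pattern, i, 1) = ifc(missing(v{i}), '_', '$');
  end;
  drop i;
run;

/* Summary: how many subjects fall into each missingness pattern */
proc freq data=flagged;
  tables pattern;
run;

/* List to hand off: subjects with at least one missing visit */
proc print data=flagged noobs;
  where index(pattern, '_') > 0;
  var id pattern;
run;
```

With this toy input, subject 1 should come out as `$_$`, subject 2 as `__$`, and subject 3 as `$$$`, so only subjects 1 and 2 land on the hand-off list. Keeping the whole pattern in a single character variable means one proc freq table covers every combination of present and missing visits.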
Examples of code and output are below. Names and numbers have been changed for obvious reasons.