Tracking missing data in longitudinal samples

I've been working with a massive longitudinal dataset recently. And while I have R's table() function and a variety of other tools at my disposal, I find that I constantly go back to SAS for identifying which time points are missing data for each subject. Since the data are longitudinal, I tend to keep them in "long" format (one column of data, with each subject appearing in multiple rows). The first thing I did was transpose (proc transpose) the data into "wide" format, so that each subject has all three of their visits on one row.

I then used some arrays to go through those three time points and put a "$" where data were present and an underscore where data were missing. I can then run a quick proc freq on the result to get summary information, and hand off the list of missing data points so we can double-check that the missing data are really missing and not just a mistake in our database.

Examples of code and output are below.  Names and numbers changed for obvious reasons.
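Since the original screenshots aren't reproduced here, here is a minimal sketch of the kind of SAS code described above. The dataset and variable names (study_long, subj_id, visit, score) are placeholders rather than the real ones, and the three-visit assumption would need to match your own study design.

/* Minimal sketch; study_long, subj_id, visit, and score are placeholder names. */

/* Reshape from long (one row per subject per visit) to wide (one row per subject). */
proc sort data=study_long;
    by subj_id visit;
run;

proc transpose data=study_long out=study_wide prefix=visit;
    by subj_id;
    id visit;
    var score;
run;

/* Flag each of the three visits: '$' when data are present, '_' when missing. */
data missing_flags;
    set study_wide;
    array visits{3} visit1-visit3;
    array flags{3} $ 1 flag1-flag3;
    length pattern $ 3;
    do i = 1 to 3;
        if missing(visits{i}) then flags{i} = '_';
        else flags{i} = '$';
    end;
    pattern = cats(of flag1-flag3);   /* e.g. '$_$' means only visit 2 is missing */
    drop i;
run;

/* Summary of how many subjects show each missing-data pattern. */
proc freq data=missing_flags;
    tables pattern;
run;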

 

The power of doing things in Parallel

There just isn't enough time in the day to get everything done, and the more time I spend on little things, the less time I have for the "big picture": writing papers, chapters, and grants. The ability to do repetitive tasks quickly and automatically is probably the most important skill to develop early in your career. One way to accomplish this is to hire minions. The more sensible way is to learn how to program.

There are a lot of programming languages in the world. Some are very low-level (e.g. C, C++) and require you to specify quite a few details. Others are very high-level (e.g. Python, Ruby) and take care of a lot of the gritty details for you, while sacrificing some of the flexibility of lower-level languages. But one of the most overlooked and easiest-to-develop skills for someone dealing with imaging data is the ability to script in the shell of your UNIX/Linux/Mac. Macs ship with the bash shell by default. Before you run out and buy a book on Bash, remember that Google is your friend and that you can quickly learn Bash using free resources on the web.

Using bash, you can turn something that would require manipulating a lot of files into something quick and easy. For instance, let's say you need to use FSL's BET (Brain Extraction Tool) to skull-strip a hundred brains. In this case the files follow the pattern subj001.nii, subj002.nii, subj003.nii, and so on. If you wanted to run bet on each of these, you could do something along the lines of: bet <input file> <output file> <options>. As you can imagine, doing that a hundred times can become very tedious! Scripting to the rescue.

#!/bin/bash

for aSubject in subj*.nii
do
    base=$(basename "$aSubject" .nii)   # strip the ".nii"; base now contains e.g. subj001
    bet "$base" "${base}_brain"
done

You can save this script as a file or just type it into the bash shell (a Terminal window) and press enter. It will run through all of the image files in the folder that follow our naming pattern and run bet on each of them.

So scripting is immediately useful, because you can now automate something that would have taken you a long time. But automatic and fast aren't always the same thing. Enter GNU Parallel. With GNU Parallel, you can run all of these tasks simultaneously (assuming you have more than one processor in your computer). To do this, we no longer need the for loop, and our command becomes this:

ls subj*.nii | parallel bet {} {.}_brain

This command gets a listing of all the files in the directory that follow our naming scheme and pipes that listing into parallel. Parallel then calls bet on each file piped into it, naming each output after the input file with its extension removed ({.}) plus the suffix _brain. In terms of efficiency, running the shell script above with the for loop took approximately 50 seconds on 9 files; running the same 9 files through parallel took 7 seconds. That's it. No painful cluster setup, no special programming language to learn. Just parallel.

Preprocessing EEG/ERP data

Years ago, I wrote a manual for preprocessing data using EGI's Net Station software. As I was cleaning out old files and burning things I never use to CD, I came across my collection of analysis PDFs. I'll start by uploading the first manual, which covers preprocessing ERP data: filtering, segmentation, artifact detection, bad channel correction, re-referencing to the average reference, baseline correction, and averaging.

In the future I'll upload more documents covering PCA, Source Localization, and making figures for publication. I've also updated some of the upcoming documentation to use R and SAS instead of SPSS. If you have suggestions for additional changes, feel free to drop me a line.

Download ERP Data Analysis Preprocessing PDF

Creating automated snapshots of fMRI activation

There are plenty of times when you want to create snapshot images at a particular threshold so that you can quickly view the activation profiles of all participants in a given study. AFNI offers this functionality through plugout_drive, which lets you tell AFNI to change the overlay, underlay, threshold, and so on, and even save the screenshots to disk. You can then automate the process by wrapping everything in a bash/ruby/python script.

Below is a simple example where we have several NIFTI files in a folder, along with the template image TT_N27.nii.gz (originally TT_N27+tlrc). Call this script with the statistical maps that you would like to open as overlays as its arguments. AFNI will then open each overlay and take a snapshot thresholded at p = 0.01. You can of course modify the p-value in the script, change the spacing of slices, and so on. You could also have it descend into subfolders by using bash to manage your directories however you like.

 

#!/bin/bash
echo "Beginning snapshots";
mkdir snapshots
afni -yesplugouts &
sleep 4
plugout_drive -com "OPEN_WINDOW A.axialimage mont=6x6:3 geom=600x600+800+600" \
    -com "CLOSE_WINDOW A.sagittalimage" \
    -com "SWITCH_UNDERLAY TT_N27.nii.gz" \
    -quit

for aMap in "$@"
do
    outputName=$(basename "$aMap" .nii)
    plugout_drive -com "SET_FUNC_RANGE A.10" \
        -com "SWITCH_OVERLAY $aMap" \
        -com "SET_DICOM_XYZ A 0 16 23" \
        -com "SET_SUBBRICKS A -1 1 1" \
        -com "SET_THRESHNEW A .01 *p" \
        -com "SAVE_JPEG A.axialimage snapshots/${outputName}_snapshot.jpg" \
        -quit
done
plugout_drive -com "QUIT"

EEG Processing Tools

I used to have a MobileMe account where I made available several custom-developed programs for processing EEG/ERP data. I've listed them below with their download links, and I'll also make them available on the main site (www.cogneurostats.com). All programs run on Mac OS X and are written in Objective-C/Cocoa. I wrote them for PPC, but many of them should now work on Intel. I plan to release the source code at a later time.

 

MCAT (My Channel Averaging Tool) – Version 3.0.1

A fairly straightforward problem: you have a high-density array of channels (64, 128, or 256) and you want to average several channels together for any number of reasons. Alternatively, you just have a data matrix and want a quick way to pull channels out and label them. Either way, this is the tool for you. MCAT was developed to facilitate running PCAs with ERP data processed in Net Station, but it should be friendly to most EEG/ERP software. It assumes you have a matrix with channels as columns and time points as rows. MCAT will average channels and, if you like, rotate the matrix to better facilitate a temporal or temporo-spatial PCA. DOWNLOAD MCAT3.

MCAT3

Channel Selector in MCAT3

 

MCAT-Merge – Version 1.0

While MCAT will concatenate multiple files together, if you forgot to do this or want to combine the files in a new order, you can use MCAT-Merge, which is really just a fancy way of using the unix "cat" command. DOWNLOAD MCAT-Merge_1.1

 

Net Station Text Export Tool – Version 3.0

Net Station is a great program for analyzing data, but in order to export the data to text, you need a hasp. Even if you have multiple hasps, there are times when you need to convert a file to text without access to one. In that case, you can export the data to Net Station RAW format (not to be confused with the image format from cameras) using the Net Station File Exporter Droplet, and then use this program to export the data to text. These text files can then be put into MCAT. Version 3 makes use of multiple processors. DOWNLOAD_NS_Text_Export_3.0. You can also download the previous version for Mac OS X 10.3 and 10.4: NS_Text_Export_2.1.

 

Command Line Tools – Version 1.0 beta

Command line tools for taking a Net Station RAW file and converting it to text, or taking a text file (or multiple text files) and converting them into a Net Station RAW format file. Particularly handy if you want to play with synthesized data. For NSExport, you just give it the files you wish to convert. For NSImporter, you need syntax like this: ./NSImporter output.raw c1.txt c2.txt -C condition1 condition2. When in doubt, typing just the command and hitting enter may give you help, but I haven't really supported these in a while. I wrote them for some very early work with synthesized data. Source code to be posted at another time. NSImporter is a PowerPC-only application, though given the source code it wouldn't be hard to modify. Download NSTools_1.0b.

 

SPSS Label Maker

A helpful program for laying out variables in SPSS or SAS when doing a temporo-spatial PCA. Download SPSSLabelMaker_1.5

SPSS Label Maker 1.5