Parallelizing Freesurfer

Today I started running VBM and Freesurfer comparisons on a “new” dataset that everyone in our lab is particularly interested in moving forward.  The VBM was all carried out in SPM8 with the VBM8 toolbox. I particularly like the VBM8 toolbox because it makes use of the SPM8 DARTEL transformations, and it’s relatively easy to queue up a lot of jobs and come back a day later to all of your data segmented (grey matter, white matter) and normalized to MNI space.

But I also wanted to process this data through Freesurfer. You have a few options: 1) open a new terminal window for each and every brain that you wish to process, limited by the total number of processors (or RAM) on your computer, and wait for those to finish before starting a new batch (this requires a lot of remembering to start the next batch); or 2) use a for-loop script to automatically start the next brain, which would take approximately 12 hours (I’m using 12 hours to make the math simple; sometimes it takes longer) multiplied by the total number of subjects (in this case 12 hours * 50 subjects = 600 hours, or 25 days).

# Option 2: a simple serial loop; ${aSubject%.nii.gz} strips the extension to get the subject name
for aSubject in Subject*.nii.gz
do
    recon-all -s ${aSubject%.nii.gz} -i $aSubject -all -qcache
done

Or 3) you could use GNU Parallel to batch process all of the brains with a set number of processors, automatically starting the next subject as each one finishes. I have a chunky Mac Pro on my desk with 12 cores and 40 GB of RAM; I estimate that I can run 12 simultaneous processes and still not run out of RAM (and since I have hyperthreading, I shouldn’t find my computer grinding to a halt). Total processing time is roughly (12 hours * 50 brains) / 12 processors = 50 hours, or just over 2 days. That’s considerably faster than doing one at a time or constantly checking in to start more processing.
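If you want to check how many cores your own machine reports before picking a job count, a quick query does it (sysctl on a Mac, nproc on Linux); keep in mind that RAM, not cores, is often the real ceiling for recon-all:

sysctl -n hw.ncpu    # logical core count on a Mac
nproc                # the Linux equivalent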

If you want to install GNU Parallel on Linux, see this post; if you have a Mac, I highly recommend using Homebrew (requires the free Apple Xcode), followed by a quick “brew install parallel”. Once you have GNU Parallel installed, the plan is to use ls to get a list of all NIFTI files in a directory, pipe that through sed to remove the extension (.nii.gz), and then pipe that list of subjects to GNU Parallel to automatically queue up jobs with a maximum of 12 simultaneous processes. Press enter to begin the process.

ls Subject*.nii.gz | sed 's/\.nii\.gz//' | parallel --jobs 12 recon-all -s {} -i {}.nii.gz -all -qcache

It’s really that easy: as each job finishes, it spits its output to the terminal. Remember that Freesurfer also writes individual log files for the recon-all process into each subject’s folder. A quick look at top or Activity Monitor will show that I have several Freesurfer processes running simultaneously. When these finish, more will automatically start.
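If you’d rather not keep top open, a rough one-liner counts the matching processes (rough because GNU Parallel’s own command line also contains “recon-all”, so treat it as an approximation rather than an exact job tally):

pgrep -f recon-all | wc -l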

[Screenshot: several recon-all processes running simultaneously]


If you want to be daring and have several computers running the same architecture, operating system, and version of Freesurfer, you can use GNU Parallel to distribute jobs across multiple machines, complete with copying the input files over and transferring the results back. Alternatively, if you have shared storage (e.g. NFS mounts), you can just issue the commands to all of the computers that way.
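Here is a minimal sketch of the shared-storage route, assuming two hypothetical hosts named node1 and node2 that are reachable over ssh, run the same Freesurfer version, and mount the data at the same paths:

export SUBJECTS_DIR=`pwd`
# -S spreads jobs across the listed hosts (up to 12 at a time on each);
# --env copies the exported SUBJECTS_DIR to the remote shells;
# --workdir . runs each remote job in this same directory so {}.nii.gz resolves
ls Subject*.nii.gz | sed 's/\.nii\.gz//' | \
  parallel --jobs 12 -S node1,node2 --env SUBJECTS_DIR --workdir . \
  recon-all -s {} -i {}.nii.gz -all -qcache

For the copy-over/copy-back case, GNU Parallel’s --transfer, --return, and --cleanup options are the relevant knobs, though since recon-all writes a whole directory tree per subject, shared storage is the simpler route.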


7 Comments

  1. Markus (March 13, 2014)

    Hi!

    I very much like your blog!

    I just tried your line with GNU Parallel; however, I have a problem with the $SUBJECTS_DIR: apparently it is not adapted every time.

    I have a folder for each subject (s01 s02 etc) in which the T1 is contained.

    The $SUBJECTS_DIR should actually follow the recon-all, shouldn’t it?

    Thanks!
    Markus

  2. pete (March 25, 2014)

    I usually specify the SUBJECTS_DIR each time I open a terminal window. An easy way to do this is to change into the SUBJECTS_DIR and then type:

    export SUBJECTS_DIR=`pwd`

    This will automatically plug the current working directory into the export command. Then any command you run should work within that single window. My general workflow is to set the SUBJECTS_DIR and then cd into another directory that has all of the raw NIFTI files. Then I run the recon-all via:

    ls -d *.nii | parallel recon-all -s {.} -i {} -all -3T -qcache

    The {.} will remove the single extension .nii. If you have .nii.gz files, you could pipe it through sed:

    ls -d *.nii.gz | sed 's/\.nii\.gz//' | parallel recon-all -s {} -i {}.nii.gz -all -3T -qcache

  3. Markus (March 25, 2014)

    Thanks Pete!

    Actually, I realized that the problem mentioned above was more a problem of how to start recon-all. I noticed that having a folder with all the {subject}_T1s in it is the best way to start, and that recon-all will create the {subject} folders itself. So the SUBJECTS_DIR is organized by recon-all itself too.

    Another question concerning parallel:
    I love the possibility to ls the subjects and pipe them to sed and then to recon-all.
    However, is it also possible to run two commands within parallel? In the sense of a subjects loop:
    for subjects in subjectlist ; do
    command 1 on subject
    command 2 on subject
    done

    I imagine something like:

    ls -d ??? | sed what_ever | parallel command 1 ; command 2

    but I am not sure and I do not want to mess up my data!

    What do you think?

  4. pete (March 25, 2014)

    The easiest thing to do is to put your commands into a text file (in this example commands.txt). The basic rule is that commands on the same line will be performed sequentially and commands on different lines will be performed in parallel.

    So something like this:
    echo 1; echo 2;
    echo 3; echo 4;
    echo 5; echo 6;

    might produce:
    3
    4
    1
    2
    5
    6

    To execute this, you can do: parallel < commands.txt

    • Markus (April 11, 2014)

      Thank you Pete,

      Do you know how I would call the {} inside the command.txt?

      A pure

      cd {}

      does not work.

      Example:
      cat subjectlist.txt | parallel --jobs 5 < command.txt
      (with command.txt containing:
      cd {}
      ls -l
      )

      Thanks!
      Markus

      • pete (April 21, 2014)

        I think you’d need to put the subject list into the command file.
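        A minimal sketch of that idea, assuming a subjectlist.txt with one folder name per line (the cd/ls commands are just the ones from the example above):

        # one line per subject: commands on a line run sequentially, lines run in parallel
        while read s; do
            echo "cd $s; ls -l"
        done < subjectlist.txt > command.txt

        parallel --jobs 5 < command.txt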

  5. Markus (April 22, 2014)

    Thanks Pete,

    Just to complete here:

    I actually managed to do it using the “--workdir” option of GNU Parallel:

    cat ../Scripts/do_graphcut.txt | parallel --jobs 4 --workdir $SUBJECTS_DIR/{} mri_gcut -T 0.65 $SUBJECTS_DIR/{}/mri/T1.mgz $SUBJECTS_DIR/{}/mri/brainmask.gcuts.T-0.65.mgz

    Best,
    Markus