Snakemake Workflow¶
A streamlined Snakemake workflow for automated diSPIM data processing with GPU acceleration on SLURM clusters.
Overview¶
The Snakemake workflow provides a complete, automated pipeline for processing diSPIM data from raw TIFF files to deconvolved results. It's designed for high-throughput processing on SLURM clusters with GPU acceleration.
Quick Start¶
# Navigate to the workflow directory
cd ../../../examples/snakemake
# Activate the pyspim environment
conda activate pyspim
# Run locally
snakemake
# Run on SLURM cluster
snakemake --slurm --jobs 10
Workflow Steps¶
The pipeline processes data through these automated steps:
- Load Data - Load raw TIFF files with camera correction
- ROI Detection - Automatically crop to regions of interest
- Deskew - Correct stage scanning angle (GPU-accelerated)
- Deconvolve - Richardson-Lucy deconvolution (GPU-accelerated)
Configuration¶
Edit config.yaml to configure your data:
# Data paths
data_dir: "path/to/your/data"
psf_dir: "path/to/psf/files"
# Acquisitions to process
acquisitions:
- "your_acquisition_name"
# Acquisition parameters
step_size: 0.5 # microns
pixel_size: 0.1625 # microns
theta: 45 # degrees
# Processing parameters
roi_method: "otsu"
decon_iterations: 10
# PSF files
psf_a: "PSFA_demo.npy"
psf_b: "PSFB_demo.npy"
Output Structure¶
results/{acquisition}/
âââ raw/ # Loaded raw data
âââ cropped/ # ROI-detected and cropped data
âââ deskewed/ # Deskewed data (GPU-accelerated)
âââ deconvolved/ # Final deconvolved result (GPU-accelerated)
SLURM Integration¶
The workflow automatically handles SLURM job submission with GPU resource requests:
Resource Management¶
- GPU Jobs: Automatically request
--gres=gpu:1for deskew and deconvolution - Memory: Configurable per rule (4GB for loading, 8GB for ROI, 16GB for deskew, 32GB for deconvolution)
- Runtime: Configurable time limits per processing step
- Parallel Execution: Use
--jobs Nfor concurrent processing
Example SLURM Execution¶
# Submit workflow to SLURM
snakemake --slurm --jobs 10
# Monitor jobs
squeue -u $USER
# Check logs
ls .snakemake/slurm_logs/
Adding New Acquisitions¶
-
Add to config.yaml:
-
Ensure data files exist:
-
Run the workflow:
Customization¶
Modifying Resource Requirements¶
Edit the resources section in Snakefile:
resources:
runtime = 240, # minutes
mem_mb = 32000, # memory in MB
slurm_extra = "--gres=gpu:1" # GPU request
Adding Processing Steps¶
The workflow is modular and easily extensible. Add new rules to Snakefile for additional processing steps.
Different ROI Methods¶
Change the ROI detection method in config.yaml:
Troubleshooting¶
Common Issues¶
GPU Issues:
- Ensure you're on a GPU node
- Check CUDA availability: nvidia-smi
- Verify GPU resource requests in Snakefile
Memory Issues:
- Increase mem_mb in Snakefile rules
- Monitor memory usage: squeue -o "%.18i %.9P %.20j %.8u %.2t %.10M %.6D %R"
Time Limits:
- Increase runtime in Snakefile rules
- Monitor job duration: sacct -j <jobid>
Debug Mode¶
# Verbose output
snakemake --verbose
# Dry run
snakemake -n
# Clean restart
rm -rf results && snakemake
Performance Tips¶
- Parallel Processing: Use
--jobs Nto process multiple acquisitions concurrently - GPU Utilization: Ensure GPU jobs run on appropriate nodes
- Storage: Use fast storage for intermediate files
- Monitoring: Check SLURM logs for performance insights
Example Output¶
After successful execution, you'll find: - Raw, cropped, and deskewed data as TIFF files - Final deconvolved result as TIFF file - ROI coordinates and processing metadata - SLURM job logs for monitoring
Integration with Other Workflows¶
The Snakemake workflow can be integrated into larger bioinformatics pipelines:
Snakemake Integration¶
# In your main Snakefile
include: "path/to/pyspim/examples/snakemake/Snakefile"
rule process_all:
input:
expand("results/{acquisition}/deconvolved/deconvolved.tif",
acquisition=config["acquisitions"])
# ... additional processing
Nextflow Integration¶
Convert the workflow to Nextflow or integrate with existing Nextflow pipelines.
Comparison with Other Methods¶
| Method | Pros | Cons |
|---|---|---|
| Snakemake | Automated, reproducible, SLURM integration | Learning curve, setup required |
| Notebook | Interactive, educational | Manual, not scalable |
| Command-line | Flexible, direct control | Manual job management |
The Snakemake workflow is ideal for production processing of multiple acquisitions with minimal manual intervention.