
OCT data flow and folder layout

High-level architecture

Core components

  • Prefect Server & Scheduler
      • Runs on the primary processing host (Zircon)
      • Manages flow deployments, event triggers, state, and artifacts

  • Processing Workers
      • Zircon: high-I/O, large-memory tasks (tile processing, mosaics)
      • Auxiliary hosts: compute-heavy, low-I/O tasks (e.g., registration)

  • Event Bus (Prefect Events)
      • Coordinates downstream workflows when upstream flows complete

  • Storage Backends
      • Local high-speed SSDs (temporary processing)
      • External drives for raw data archiving
      • DANDI/LINC S3 storage

Data processing flow

Processing is organized hierarchically: Tile → Mosaic → Slice → All-Slices.
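The completeness checks that gate each level of this hierarchy can be sketched in plain Python. This is a minimal illustration. It assumes hypothetical integer ids for slices, mosaics, and tiles, and it assumes the expected counts are known up front. In the pipeline itself, the hand-off between levels is driven by Prefect events rather than by a function like this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TileKey:
    """Hypothetical tile identifier; real ids would come from filename parsing."""
    slice_id: int
    mosaic_id: int
    tile_id: int


def ready_mosaics(done_tiles, expected_tiles):
    """(slice_id, mosaic_id) pairs whose tiles have all finished.

    expected_tiles maps (slice_id, mosaic_id) -> number of tiles in that mosaic.
    """
    finished = defaultdict(set)
    for t in done_tiles:
        finished[(t.slice_id, t.mosaic_id)].add(t.tile_id)
    return {key for key, n in expected_tiles.items() if len(finished[key]) >= n}


def ready_slices(done_mosaics, expected_mosaics):
    """Slice ids whose mosaics have all finished.

    expected_mosaics maps slice_id -> number of mosaics (e.g. 2: normal + tilted).
    """
    finished = defaultdict(set)
    for slice_id, mosaic_id in done_mosaics:
        finished[slice_id].add(mosaic_id)
    return {s for s, n in expected_mosaics.items() if len(finished[s]) >= n}


# Example: mosaic (1, 1) has all 4 tiles, mosaic (1, 2) only one so far.
done = [TileKey(1, 1, t) for t in range(4)] + [TileKey(1, 2, 0)]
print(ready_mosaics(done, {(1, 1): 4, (1, 2): 4}))  # {(1, 1)}
print(ready_slices({(1, 1)}, {1: 2}))               # set()
```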

Tile-level processing

Each tile is the smallest independently processed unit in the data hierarchy and the atomic building block from which mosaics are composed.

Inputs

  • Spectral raw data or complex data
  • File naming convention encodes acquisition metadata (parsed at ingest)
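A minimal sketch of ingest-time metadata parsing follows. The filename pattern below is hypothetical, because the actual acquisition naming convention is not specified here. Treat `TILE_NAME`, `TileMeta`, and the field names as placeholders to adapt.

```python
import os
import re
from dataclasses import dataclass

# HYPOTHETICAL naming pattern for illustration only; replace with the
# acquisition software's real convention.
TILE_NAME = re.compile(
    r"slice-(?P<slice>\d+)_mosaic-(?P<mosaic>\d+)"
    r"_tile-(?P<tile>\d+)_illum-(?P<illum>normal|tilted)\.(?P<ext>\w+)"
)


@dataclass(frozen=True)
class TileMeta:
    slice_id: int
    mosaic_id: int
    tile_id: int
    illumination: str
    ext: str


def parse_tile_name(path: str) -> TileMeta:
    """Extract acquisition metadata from a tile filename.

    Raises ValueError if the basename does not match the pattern.
    """
    name = os.path.basename(path)
    m = TILE_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized tile filename: {name!r}")
    return TileMeta(
        slice_id=int(m["slice"]),
        mosaic_id=int(m["mosaic"]),
        tile_id=int(m["tile"]),
        illumination=m["illum"],
        ext=m["ext"],
    )
```

Failing loudly on unrecognized names keeps a misnamed file from being silently routed to the wrong mosaic.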

Tile processing steps

  1. Metadata Parsing
      • Extract slice, mosaic, tile index, and illumination from the filename
  2. Integrity & Compression
      • Compute SHA-256 checksum
      • Compress raw tile data
      • Ensure single-read semantics where possible
  3. Spectral → Complex Conversion
      • Spectral raw input: converted to complex format in MATLAB
      • Complex input: files are soft-linked into complex/ instead of being converted
      • Output stored in complex/
      • See the MATLAB batch processing design doc for details on batch processing optimization
  4. Complex → 3D Volumes Conversion
      • Convert complex tiles to 3D volumes (dBI, O3D, R3D modalities)
      • Performed in MATLAB
      • See the MATLAB batch processing design doc for details on batch processing optimization
  5. Surface Finding
      • Automatic surface detection from intensity data
      • The surface finding method is configurable (e.g., "find" for automatic detection)
  6. Enface Image Generation
      • Generate 2D enface images from 3D volumes using surface information
      • Multiple enface modalities: AIP (Average Intensity Projection), MIP (Maximum Intensity Projection), orientation, retardance, birefringence
      • Surface maps generated for visualization
  7. Archival Upload
      • Compressed raw data uploaded to DANDI/LINC
      • Uploads are handled by a dedicated, event-triggered flow (see the upload strategy design doc for details)
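The Integrity & Compression step asks for single-read semantics. One way to achieve that is to stream each raw file in chunks and feed every chunk to both the SHA-256 hash and the compressor. The sketch below shows this approach; it is an assumption, not the pipeline's actual code. gzip stands in for whatever codec is really used. Hashing the uncompressed bytes (rather than the compressed output) is also a design choice of this sketch.

```python
import gzip
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time to bound memory use


def checksum_and_compress(src: Path, dst: Path) -> str:
    """Read `src` once, hashing and gzip-compressing each chunk as it is read.

    Returns the SHA-256 hex digest of the uncompressed bytes, so the checksum
    describes the original tile rather than its compressed form.
    """
    sha = hashlib.sha256()
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK_SIZE):
            sha.update(chunk)   # integrity: hash the raw bytes
            fout.write(chunk)   # compression: same bytes, no second read
    return sha.hexdigest()
```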

MATLAB integration

  • MATLAB is invoked from Python via its command-line interface
  • MATLAB functions handle the spectral → complex and complex → processed conversions
  • Data flow: Python → MATLAB → Python (processed tiles)
  • These steps currently run in MATLAB; a planned migration to Python-native implementations will remove the need for batch processing optimization
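The Python → MATLAB hand-off can be as simple as building a `matlab -batch` command line. That flag runs one statement non-interactively and exits with a nonzero status if the statement errors (MATLAB R2019a and later). In this sketch the MATLAB function name `spectral2complex` is hypothetical; only the `-batch` flag is standard MATLAB.

```python
from __future__ import annotations

import subprocess


def matlab_batch_command(func: str, *args: str, matlab: str = "matlab") -> list[str]:
    """Build an argv list running func('arg1', 'arg2', ...) via `matlab -batch`."""
    # MATLAB char literals escape a single quote by doubling it.
    quoted = ", ".join("'" + a.replace("'", "''") + "'" for a in args)
    return [matlab, "-batch", f"{func}({quoted})"]


def run_matlab(func: str, *args: str) -> None:
    """Run the statement; raises subprocess.CalledProcessError on a nonzero exit."""
    subprocess.run(matlab_batch_command(func, *args), check=True)


# Hypothetical usage: run_matlab("spectral2complex", "/ssd/raw/tile.dat", "/ssd/complex/")
```

Passing an argv list (no shell) avoids a second layer of quoting. Only MATLAB's own string escaping has to be handled.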

Tile-level QC

  • Validate that detected surfaces overlap correctly with intensity images
  • Verify processing quality at the tile level
  • Publish QC images as Prefect artifacts and Slack notifications

Mosaic-level processing

Triggered once all tiles in a mosaic complete tile-level processing. A mosaic contains all tiles for a given slice and illumination type (normal or tilted).

First slice processing

For each illumination type of the first slice (mosaic_001 for normal, mosaic_002 for tilted, and possibly other illumination types in the future), the following additional steps are required; subsequent slices skip them:

  1. Tile Coordinate Determination
      • Determine tile positions and alignment for stitching
      • Generate a coordinate template that is reusable for all mosaics of the same illumination type
      • For subsequent slices of the same illumination type, the template from the first slice is reused
      • See the stitching and coordinate determination design doc for detailed algorithms and methods
  2. Focus Finding
      • Determine the optimal focus plane for 3D volume stitching
      • Focus finding requires accurate surface information, so it uses an unfiltered version of the surface data
      • QC validation: verify that the focus-finding result overlaps with intensity images
      • See the focus finding algorithms design doc for details
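The reuse rule can be expressed as an existence check on the per-illumination files named in the project folder layout (`tilemap-{illumination}.j2`, `focus-{illumination}.nii`). Under that rule, the tile map and focus results are generated on the first slice and reused afterwards. Keying off file existence rather than the slice index is an assumption of this sketch. It also means a rerun resumes cleanly if the first slice already finished. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def coordinate_artifacts(root: Path, illumination: str) -> dict[str, Path]:
    """Per-illumination tile map and focus result paths (names from the folder layout)."""
    return {
        "tilemap": root / f"tilemap-{illumination}.j2",
        "focus": root / f"focus-{illumination}.nii",
    }


def needs_first_slice_processing(root: Path, illumination: str) -> bool:
    """True if coordinate determination / focus finding must still run.

    The first slice of each illumination type creates these files; later
    slices of the same illumination find them and reuse them.
    """
    paths = coordinate_artifacts(root, illumination).values()
    return not all(p.exists() for p in paths)
```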

2D enface mosaic stitching

  1. Template Application
      • Apply the coordinate template to the current mosaic's tiles
      • First slice: use the newly generated template
      • Subsequent slices: reuse the template from the first slice of the same illumination type
      • Generate tile information files for each enface modality
  2. Enface Modality Stitching
      • Stitch all 2D enface modalities: AIP, MIP, orientation, retardance, birefringence, surface
      • Each modality is stitched independently using the same coordinate template
      • Generate overlap QC images to verify stitching quality
  3. Mask Generation and Application
      • Generate a mask from the stitched AIP using a threshold-based approach
      • Apply the mask to all stitched enface outputs
      • The mask removes background/noise regions
  4. Output Generation
      • Save stitched enface images in multiple formats (NIfTI, JPEG)
      • Upload stitched 2D mosaics to cloud storage
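The mask step can be sketched as follows. The design doc only says the mask is threshold-based, so the fixed `threshold` argument is an assumption; an automatic method such as Otsu's would slot in the same way. Plain nested lists stand in for the NumPy arrays a real implementation would use (`aip > t` and `np.where`).

```python
from typing import Dict, List

Image = List[List[float]]
Mask = List[List[bool]]


def threshold_mask(aip: Image, threshold: float) -> Mask:
    """Foreground mask: True where the stitched AIP exceeds `threshold`."""
    return [[v > threshold for v in row] for row in aip]


def apply_mask(img: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Replace background pixels (mask False) with `fill`."""
    return [
        [v if keep else fill for v, keep in zip(row, mrow)]
        for row, mrow in zip(img, mask)
    ]


def mask_all(modalities: Dict[str, Image], threshold: float) -> Dict[str, Image]:
    """Build one mask from the AIP and apply it to every stitched modality."""
    mask = threshold_mask(modalities["aip"], threshold)
    return {name: apply_mask(img, mask) for name, img in modalities.items()}
```

Deriving the mask once from the AIP and reusing it keeps every modality's foreground identical, which matters when the outputs are later compared or stacked.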

3D volume mosaic stitching

  1. Focus Plane Application
      • First slice: use the focus plane determined during first-slice processing
      • Subsequent slices: reuse the focus plane from the first slice of the same illumination type
      • Apply the focus plane for optimal 3D volume alignment
  2. 3D Volume Stitching
      • Stitch 3D volume modalities: dBI, O3D, R3D
      • Use the coordinate template from 2D stitching
      • Apply the mask to the stitched volumes
  3. Volume Upload
      • Upload stitched 3D volumes to cloud storage
      • Volumes are stored in a format suited to downstream analysis (e.g., OME-Zarr for large volumes)

Slice-level processing

Triggered once all mosaics in a slice are complete. Each slice currently contains two mosaics, one with normal illumination and one with tilted illumination; more may be added in the future.

Registration process

  1. Thru-plane Registration
      • Register normal and tilted illumination mosaics to combine orientations
      • Uses a MATLAB-based registration algorithm
      • Accounts for the tilt angle (gamma parameter) between illuminations
      • Compute-heavy but low-I/O, so suitable for offloading to auxiliary hosts
  2. 3D Orientation Computation
      • Combines information from both illumination angles
      • Generates 3D axis representations (normalization required)
      • Generates an RGB visualization of the 3D axis orientation
  3. RGB 3D Axis Visualization
      • Provides a visual representation of fiber orientation in 3D space
      • Useful for quality control and visualization
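For the RGB 3D-axis view, a common convention is to normalize the axis vector and map |x|, |y|, |z| to red, green, and blue. This convention is borrowed from direction-encoded color maps in diffusion MRI. Whether this pipeline uses exactly this mapping is an assumption. The sketch mainly shows why normalization is needed before coloring.

```python
import math
from typing import Tuple


def axis_to_rgb(x: float, y: float, z: float) -> Tuple[int, int, int]:
    """Map a 3D fiber axis to an 8-bit RGB triple (|x| -> R, |y| -> G, |z| -> B).

    The vector is normalized to unit length first so color encodes direction
    only, not magnitude. A fiber axis has no sign, so absolute values make
    v and -v render identically.
    """
    n = math.sqrt(x * x + y * y + z * z)
    if n == 0:
        return (0, 0, 0)  # undefined orientation -> black
    r, g, b = (round(255 * abs(c) / n) for c in (x, y, z))
    return (r, g, b)


print(axis_to_rgb(3, 4, 0))   # (153, 204, 0)
print(axis_to_rgb(0, 0, -2))  # (0, 0, 255)
```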

Slice-level outputs

  • Registered slice-level 2D/3D data
  • Thru-plane and in-plane data in .mat
  • 3D axis data in NIfTI
  • Orientation images in JPEG (thru-plane, in-plane, 3D axis)

All-slices processing

Currently performed manually:

  • Stack 2D mosaics across slices
  • Stack 3D volumes across slices

Folder structure

Project directory structure

project/
│  ├─ mosaic-{mosaic_id:02d}/
│  │  ├─ complex/      # complex tiles (symlinked if needed)
│  │  ├─ processed/    # intermediate processed data (SSD)
│  │  ├─ stitched/     # outputs (symlinked to dandiset)
│  │  ├─ state/        # flag files for batch tracking
│  │  │  ├─ batch-001.started
│  │  │  ├─ batch-001.archived
│  │  │  ├─ batch-001.processed
│  │  │  ├─ batch-001.uploaded
│  │  │  └─ ...
│  │  └─ archived/     # raw data (symlinked to dandiset raw)
│  ├─ registered/
│  ├─ state/
│  ├─ focus-normal.nii        # focus finding results (first slice)
│  ├─ focus-tilted.nii        # focus finding results (first slice)
│  ├─ tilemap-normal.j2       # tile coordinate map
│  └─ tilemap-tilted.j2       # tile coordinate map
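The state/ flag files support a simple resume protocol: a batch's next stage is the first one without a flag file. The sketch below assumes the stages run in the order they appear in the tree (started, archived, processed, uploaded). The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Stage order assumed from the flag files listed in the folder layout.
STAGES = ("started", "archived", "processed", "uploaded")


def flag_path(state_dir: Path, batch: int, stage: str) -> Path:
    """Path of the flag file for one batch/stage, e.g. state/batch-001.started."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return state_dir / f"batch-{batch:03d}.{stage}"


def mark(state_dir: Path, batch: int, stage: str) -> None:
    """Record that `stage` finished for `batch`; safe to call more than once."""
    state_dir.mkdir(parents=True, exist_ok=True)
    flag_path(state_dir, batch, stage).touch(exist_ok=True)


def next_stage(state_dir: Path, batch: int) -> str | None:
    """First stage without a flag file, or None when the batch is fully done."""
    for stage in STAGES:
        if not flag_path(state_dir, batch, stage).exists():
            return stage
    return None
```

Empty files are a deliberately low-tech choice. They survive process crashes, need no database, and can be inspected with `ls`.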

DANDI/LINC storage structure

DANDISET/
├─ rawdata/...         # raw compressed tiles
│  └─ sub-{subject_id}/
│     └─ sample-slice-{slice_id:03d}_chunk-{tile_id:04d}_acq-{acq}_OCT.nii.gz
└─ derivative/
   └─ sub-{subject_id}/
      ├─ mosaic_{mosaic_id:03d}_{modality}.nii
      └─ .../*.ome.zarr (for large volumes)
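The storage paths follow directly from the templates in the tree. The helpers below simply apply those format strings. The subject id and `acq` values in the example are made-up placeholders.

```python
from pathlib import PurePosixPath


def raw_tile_path(subject_id: str, slice_id: int, tile_id: int, acq: str) -> str:
    """Path of one compressed raw tile under rawdata/ (template from the tree)."""
    name = f"sample-slice-{slice_id:03d}_chunk-{tile_id:04d}_acq-{acq}_OCT.nii.gz"
    return str(PurePosixPath("rawdata", f"sub-{subject_id}", name))


def mosaic_derivative_path(subject_id: str, mosaic_id: int, modality: str) -> str:
    """Path of one stitched mosaic modality under derivative/."""
    name = f"mosaic_{mosaic_id:03d}_{modality}.nii"
    return str(PurePosixPath("derivative", f"sub-{subject_id}", name))


print(raw_tile_path("demo", 2, 17, "normal"))
# rawdata/sub-demo/sample-slice-002_chunk-0017_acq-normal_OCT.nii.gz
print(mosaic_derivative_path("demo", 1, "aip"))
# derivative/sub-demo/mosaic_001_aip.nii
```

`PurePosixPath` keeps forward slashes on every OS, matching the object-key style of S3-backed storage.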

Symlinks are used extensively to balance I/O performance and long-term storage:

  • Performance: Processing occurs on high-speed SSDs (local paths)
  • Archival: Final outputs symlinked to DANDI/LINC storage for long-term preservation
  • Efficiency: Avoids data duplication while maintaining fast access during processing
  • Transparency: Symlinks make data appear in both locations without copying

Symlinks are created:

  • From processing directories to DANDI/LINC storage for final outputs
  • From DANDI/LINC storage to processing directories for inputs (when needed)
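Creating these links idempotently matters when flows are retried. Below is a minimal sketch using `pathlib`, assuming POSIX symlink semantics. It leaves to the caller which side is the link and which is the target, since the direction differs between final outputs and inputs.

```python
import os
from pathlib import Path


def ensure_symlink(target: Path, link: Path) -> None:
    """Make `link` a symlink pointing at `target`, safely on retries.

    A correct existing link is left alone, a link pointing elsewhere is
    replaced, and a real file or directory at `link` is never overwritten.
    """
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if Path(os.readlink(link)) == target:
            return  # already correct: nothing to do
        link.unlink()  # stale or dangling link: replace it
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target)
```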