Skip to content

OCT MATLAB batch processing, upload, and stitching

MATLAB batch processing (implementation detail)

See also the OCT data flow doc for a high-level overview of batch-level processing. This section provides additional implementation details.

Data hierarchy vs batch processing

Batches are a temporary grouping used only during MATLAB processing for efficiency.

Grid configuration

The grid configuration defines how tiles are organized for MATLAB batch processing:

  • grid_size_x: Number of batches (columns) per mosaic
  • Determines how many batches are needed to process all tiles in a mosaic
  • Each batch contains grid_size_y tiles
  • grid_size_y: Number of tiles per batch (rows)
  • Determines batch size for MATLAB processing
  • Normal and tilted illuminations share the same grid_size_y
  • Total tiles per mosaic: grid_size_x × grid_size_y

Why batches are used

Batch processing is used specifically because MATLAB processing is more efficient when processing multiple tiles together:

  • Reduces MATLAB startup overhead by processing many tiles per session
  • Enables better use of MATLAB's parallel operations
  • Reduces the number of MATLAB processes and improves resource utilization

Batch organization and state tracking

Tiles are organized into batches based on their position in the acquisition grid:

  • Each column (grid_size_x) becomes a batch
  • Each batch contains grid_size_y tiles (rows)
  • Batches are processed in parallel for efficiency

Batch completion is tracked using flag files stored in the state/ directory. See the events and state management doc for comprehensive details on flag file structure, lifecycle, and benefits. The batch-level flag files (batch-{batch_id}.started, .archived, .processed, .uploaded) enable progress tracking, idempotency, and crash recovery for MATLAB batch processing.

Post-MATLAB processing

Once MATLAB batch processing completes, the system operates on individual tiles again for downstream processing (stitching, QC, coordinate determination). The batch grouping is only relevant during MATLAB processing and exists purely as an implementation optimization; tiles remain the atomic data unit in the hierarchy.

Upload strategy

Separate upload flows (event-driven, non-blocking)

Uploads are handled by dedicated, event-triggered flows that run independently of compute pipelines:

  • Event-triggered: Upload flows are triggered by events (e.g., linc.oct.batch.uploaded)
  • Non-blocking: Upload flows don't block compute-intensive processing
  • Isolated: Upload failures don't affect compute pipeline
  • Independent retry: Upload flows can retry independently without blocking upstream processing

Upload queue management

Upload queue management enables controlled, concurrent uploads:

  • Concurrency control: Maximum number of concurrent uploads (e.g., 5)
  • Queue-based: Files are queued for upload, processed by background workers
  • Non-blocking enqueue: Adding files to upload queue returns immediately
  • Background processing: Actual uploads happen in background threads/processes

This prevents uploads from overwhelming network bandwidth or cloud storage APIs.

Cloud storage integration (DANDI/LINC)

  • DANDI archive: Processed data uploaded to DANDI archive for long-term preservation
  • LINC storage: Raw compressed tiles uploaded to LINC storage
  • Symlinks: Final outputs symlinked to DANDI/LINC storage locations
  • Metadata: Upload events include metadata for tracking and verification

Upload retry and error handling

  • Retry logic: Upload failures are retried automatically with exponential backoff
  • Error logging: Upload errors are logged for debugging and monitoring
  • Failure notification: Critical upload failures trigger notifications
  • Resumability: Partial uploads can be resumed using flag files

MATLAB integration

MATLAB invocation

MATLAB is invoked from Python via command-line interface:

  • Batch mode: MATLAB runs in batch mode (non-interactive) for automation
  • Function calls: Python constructs MATLAB function calls with batch of tiles
  • Path management: MATLAB paths are configured to include required toolboxes and functions
  • Output capture: MATLAB output is captured and logged for debugging

Batch processing requirement

MATLAB processes tiles in batches (not individually) for efficiency:

  • Spectral-to-complex: Multiple tiles passed to MATLAB function for batch conversion
  • Complex-to-processed: Multiple tiles passed to MATLAB function for batch processing
  • Reduced overhead: Batching reduces MATLAB startup overhead per tile

This is why batch-level processing exists - it's an optimization for MATLAB efficiency, not a data hierarchy level.

MATLAB functions used

High-level MATLAB functions (not specific implementation details):

  • Spectral-to-complex: Converts spectral raw data to complex format
  • Complex-to-processed: Converts complex data to 3D volumes and enface images
  • Surface finding: Automatic surface detection from intensity data
  • Registration: Thruplane registration for combining normal and tilted illuminations

Data flow between Python and MATLAB

  • Python → MATLAB: Batches of tile file paths passed to MATLAB functions
  • MATLAB processing: MATLAB reads files, processes tiles, writes outputs
  • MATLAB → Python: Processed tile files written to filesystem, Python reads results
  • File-based interface: Communication via filesystem (no in-memory data transfer)

Future migration strategy

When MATLAB steps are migrated to Python-native implementations:

  • Batch processing may no longer be necessary (Python can process tiles individually more efficiently)
  • Data hierarchy remains Tile → Mosaic (no change to fundamental structure)
  • Processing efficiency may improve (no MATLAB startup overhead)
  • System becomes more maintainable (single language codebase)

Coordinate determination and stitching

Fiji-based coordinate determination

Fiji (ImageJ) is used for initial tile alignment and coordinate determination:

  • Tile configuration: Fiji generates TileConfiguration.txt with initial tile positions
  • Overlap-based: Uses tile overlap information to align tiles
  • First slice only: Coordinate determination runs only for first slice of each illumination type
  • Template generation: Coordinates are processed and converted to reusable templates

Template generation and reuse strategy

Templates are generated once per illumination type and reused for all slices:

  • Template generation: Jinja2 templates generated from first slice coordinates
  • Template reuse: Subsequent slices of same illumination type reuse template
  • Efficiency: Avoids redundant coordinate determination for each slice
  • Consistency: Ensures consistent tile positioning across slices

Templates contain:

  • Tile positioning information
  • Scan resolution parameters
  • Base directory paths (parameterized for reuse)

Stitching process

2D enface stitching

  • Template application: Apply coordinate template to current mosaic tiles
  • Modality stitching: Stitch each enface modality independently (AIP, MIP, orientation, retardance, birefringence, surface)
  • Mask generation: Generate mask from stitched AIP using threshold
  • Mask application: Apply mask to all stitched enface outputs
  • Output formats: Save in multiple formats (NIfTI, JPEG, TIFF)

3D volume stitching

  • Focus finding: Determine optimal focus plane (first slice only)
  • Volume stitching: Stitch 3D volume modalities (dBI, O3D, R3D)
  • Template reuse: Use same coordinate template as 2D stitching
  • Mask application: Apply mask to stitched volumes

Mask generation and application

  • Threshold-based: Mask generated from stitched AIP using intensity threshold
  • Background removal: Mask removes background/noise regions
  • Consistent application: Same mask applied to all stitched modalities
  • Quality control: Mask quality validated as part of QC process