About daolite

What is daolite?

daolite (Durham Adaptive Optics Latency Inspection and Timing Estimator) is a Python package for modeling and analyzing the computational latency of Adaptive Optics (AO) real-time control systems. It provides accurate timing models for various AO pipeline components, allowing researchers and engineers to design, optimize, and understand the performance characteristics of complex AO systems before implementation.

How daolite Works

Core Architecture

daolite uses a component-based pipeline architecture where AO systems are modeled as a series of connected processing stages:

Component Definition: Each processing stage (camera, centroider, reconstructor, etc.) is represented as a PipelineComponent with:
- A component type (CAMERA, CALIBRATION, CENTROIDER, RECONSTRUCTION, CONTROL, etc.)
- Assigned compute resources (CPU or GPU specifications)
- A timing estimation function
- Parameters specific to that component
- Dependencies on other components
Dependency Resolution: The pipeline automatically resolves dependencies between components to determine execution order, handling complex workflows with branching and parallel processing.
Timing Estimation: Each component’s timing function estimates processing time based on:
- Hardware specifications (compute capability, memory bandwidth, network speed)
- Algorithm complexity (FLOPs, memory access patterns)
- Data size and structure
- Whether the operation is memory-bound or compute-bound
Packetization: daolite models the realistic, packet-by-packet processing of data as it flows through the pipeline, not just end-to-end latency.
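The dependency-resolution step can be illustrated with a topological sort. The sketch below is not daolite's implementation. It uses Python's standard-library graphlib, and the component names and the dependency dictionary are hypothetical, chosen only to show how an execution order follows from declared dependencies.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical components, each mapped to the components it depends on.
# The entries are deliberately listed out of execution order.
dependencies = {
    "DM Control": ["Reconstructor"],
    "Reconstructor": ["Centroider"],
    "Centroider": ["Calibration"],
    "Calibration": ["WFS Camera"],
    "WFS Camera": [],
}


def execution_order(deps):
    """Return component names ordered so each one follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)

# A circular dependency cannot be ordered and is reported as an error.
try:
    execution_order({"A": ["B"], "B": ["A"]})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Because the order is derived from the dependency graph, the components can be declared in any order.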

Timing Model Principles

The latency estimation in daolite follows these key principles:

Memory vs. Compute Bound Operations

For each operation, daolite determines whether it’s limited by:

Memory bandwidth: Time = (Data size × Access patterns) / Memory bandwidth
Computational throughput: Time = Required FLOPs / Available FLOP/s

The estimate for the operation is the larger of the two times, since the slower resource is the bottleneck.
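As a worked illustration of this rule, the sketch below compares the two times for a matrix-vector multiply. The function name and the hardware figures are assumptions made for the example. They are not daolite's API and do not describe any real device.

```python
def estimate_time_us(flops, bytes_moved, peak_flops_per_s, mem_bandwidth_bytes_per_s):
    """Return (time in microseconds, limiting resource) for one operation."""
    compute_time_s = flops / peak_flops_per_s
    memory_time_s = bytes_moved / mem_bandwidth_bytes_per_s
    if memory_time_s >= compute_time_s:
        return memory_time_s * 1e6, "memory"
    return compute_time_s * 1e6, "compute"


# Matrix-vector multiply from 12800 slopes to 5000 actuators, float32 matrix.
n_slopes, n_acts = 12800, 5000
flops = 2 * n_slopes * n_acts        # one multiply and one add per matrix element
bytes_moved = 4 * n_slopes * n_acts  # matrix read once; the vectors are neglected

# Illustrative hardware figures (assumed): 10 TFLOP/s peak, 500 GB/s memory bandwidth.
time_us, bound = estimate_time_us(flops, bytes_moved, 10e12, 500e9)
print(f"{time_us:.1f} us, {bound}-bound")  # prints "512.0 us, memory-bound"
```

With these numbers the compute time is 12.8 us and the memory time is 512 us, so the operation is memory-bound. This is typical for a large matrix-vector multiply, because each matrix element is used only once.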

Hardware Modeling

daolite models computational resources with realistic parameters:

# CPU model includes:
- Number of cores
- Core frequency
- FLOPs per cycle
- Memory bandwidth (channels, width, frequency)
- Network speed
- Driver overhead

# GPU model includes:
- Peak throughput (FLOP/s)
- Memory bandwidth
- PCIe transfer rates
- Kernel launch overhead
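The CPU parameters listed above can be combined into the two figures that the timing model needs: peak compute throughput and peak memory bandwidth. The sketch below uses the usual textbook products. The CpuSpec class, its field names and the example values are hypothetical, and daolite's own resource classes may compute these quantities differently.

```python
from dataclasses import dataclass


@dataclass
class CpuSpec:
    """Hypothetical container for the CPU parameters listed above."""
    cores: int
    core_frequency_hz: float
    flops_per_cycle: int
    memory_channels: int
    memory_width_bits: int
    memory_transfers_per_s: float

    @property
    def peak_flops_per_s(self):
        # cores x clock rate x floating-point operations per core per cycle
        return self.cores * self.core_frequency_hz * self.flops_per_cycle

    @property
    def memory_bandwidth_bytes_per_s(self):
        # channels x bytes per transfer x transfers per second
        return self.memory_channels * (self.memory_width_bits / 8) * self.memory_transfers_per_s


# Example values are illustrative and do not describe a specific processor.
cpu = CpuSpec(cores=16, core_frequency_hz=3.0e9, flops_per_cycle=16,
              memory_channels=4, memory_width_bits=64, memory_transfers_per_s=3.2e9)
print(f"{cpu.peak_flops_per_s / 1e9:.0f} GFLOP/s")              # prints "768 GFLOP/s"
print(f"{cpu.memory_bandwidth_bytes_per_s / 1e9:.1f} GB/s")     # prints "102.4 GB/s"
```

Both values are theoretical peaks. Sustained throughput on real hardware is normally lower.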

Dependency and Scheduling

Components start when their dependencies complete
Partial dependencies are supported: a component can start as soon as the first data packet arrives
Scheduling and data transfer between compute resources are modeled
PCIe transfers between CPU and GPU are included in the timing
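The effect of a partial dependency can be shown with a small packet-level calculation. The sketch below is a simplified model, not daolite's scheduler. The function name, the packet times and the assumption that a stage handles one packet at a time are all made up for the example.

```python
def packet_end_times(arrival_times_us, process_time_us):
    """End time of each packet in a stage that handles one packet at a time."""
    end_times = []
    stage_free_at = 0.0
    for arrival in arrival_times_us:
        # A packet starts once it has arrived and the stage is idle.
        start = max(arrival, stage_free_at)
        stage_free_at = start + process_time_us
        end_times.append(stage_free_at)
    return end_times


# Assumed camera output: 4 packets, one completed every 100 us.
camera_ends = [100.0, 200.0, 300.0, 400.0]

# Partial dependency: the centroider works on each packet as it arrives.
pipelined = packet_end_times(camera_ends, 30.0)

# Full dependency: the centroider waits for the last packet before starting.
full_frame = packet_end_times([camera_ends[-1]] * 4, 30.0)

print(pipelined[-1], full_frame[-1])  # prints "430.0 520.0"
```

In this example, overlapping the centroider with camera readout finishes the frame 90 us earlier than waiting for the full frame.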

What Can You Model?

AO System Architectures

Single Conjugate AO (SCAO): Single wavefront sensor and deformable mirror
Multi-Conjugate AO (MCAO): Multiple wavefront sensors, multiple DMs at different altitudes
Laser Tomography AO (LTAO): Tomographic wavefront reconstruction
Ground Layer AO (GLAO): Wide field correction
Extreme AO (ExAO): High-order systems with thousands of actuators

Pipeline Components

Sensors and Readout

Camera readout (rolling shutter, global shutter)
Various interfaces (CameraLink, GigE Vision, custom)
Pixel transfer and buffering

Wavefront Sensing

Shack-Hartmann centroiding (CoG, correlation, matched filter)
Pyramid wavefront sensing (slopes, intensity, extended source)
Extended source correlation
Curvature sensing

Processing Stages

Pixel calibration and flat-fielding
Slope calculation and referencing
Wavefront reconstruction (matrix-vector multiply, Fourier methods)
Tomographic reconstruction
Deformable mirror control (integrator, optimal control)

Data Transfer

Network transfers (10GbE, 100GbE, InfiniBand)
PCIe transfers (Gen3, Gen4, Gen5)
Driver and DMA overhead
Switch latency
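A first-order estimate of a transfer time adds a fixed overhead to the time needed to push the payload through the link. The sketch below illustrates that idea only. The function name and the 5 us overhead are assumptions, and protocol overhead and packet framing are ignored.

```python
def transfer_time_us(n_bytes, link_bits_per_s, fixed_overhead_us=0.0):
    """Fixed overhead plus serialization time for a payload of n_bytes."""
    serialization_us = (n_bytes * 8 / link_bits_per_s) * 1e6
    return fixed_overhead_us + serialization_us


# 12800 float32 slopes over a 10 Gbit/s link, with an assumed 5 us of
# combined driver and switch overhead.
payload_bytes = 12800 * 4
t_us = transfer_time_us(payload_bytes, 10e9, fixed_overhead_us=5.0)
print(f"{t_us:.2f} us")  # prints "45.96 us"
```

For small payloads the fixed overhead dominates, which is why driver and switch latency matter even on fast links.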

Hardware Configurations

Predefined Resources

daolite includes models for modern hardware:

CPUs: AMD EPYC 7763, Intel Xeon 8480, and others
GPUs: NVIDIA RTX 4090, H100, AMD MI250X
Custom YAML-based hardware definitions

Mixed Compute

Assign different components to different hardware
Model CPU-GPU hybrid systems
Optimize hardware allocation for minimum latency

Use Cases

System Design

Pre-implementation Analysis: Estimate system performance before building hardware
Hardware Selection: Compare different CPU/GPU combinations
Bottleneck Identification: Find which component limits system performance
Upgrade Planning: Predict performance improvements from hardware upgrades

Algorithm Development

Algorithm Comparison: Compare latency of different algorithms (e.g., CoG vs. correlation centroiding)
Optimization Validation: Verify that algorithm optimizations reduce latency as expected
Scaling Analysis: Understand how performance scales with system size

Real-time Control Design

Frame Rate Estimation: Calculate maximum achievable frame rates
Timing Budget: Allocate time budgets to different processing stages
Latency Requirements: Verify system meets latency requirements for telescope tracking
Multi-system Coordination: Model synchronized operation of multiple subsystems

Key Features

Flexible and Powerful

Any Component Order: Define components in any order; dependencies determine the execution order
Heterogeneous Computing: Mix CPUs and GPUs in the same pipeline
Custom Components: Easily add custom processing stages
YAML Configuration: Define systems in readable configuration files

Accurate and Realistic

Hardware-specific Models: Based on actual CPU/GPU specifications
Packetized Processing: Models realistic data flow, not just end-to-end timing
Memory and Compute: Considers both memory bandwidth and computational limits
Network Overhead: Includes driver delays, switch latency, protocol overhead

Easy to Use

Simple API: Intuitive Python API for defining pipelines
Built-in Components: Pre-built functions for common AO operations
Visualization: Automatic generation of timing diagrams
JSON Runner: Run pipelines from JSON configuration files

Extensible

Custom Hardware: Define custom CPU/GPU specifications
Custom Algorithms: Add your own timing estimation functions
Plugin Architecture: Extend with new component types
Open Source: Modify and adapt to your needs

Example Workflow

A typical daolite workflow looks like this:

from daolite import Pipeline, PipelineComponent, ComponentType
from daolite.compute import hardware
from daolite.simulation.camera import PCOCamLink
from daolite.pipeline.centroider import Centroider
from daolite.pipeline.reconstruction import Reconstruction
from daolite.pipeline.control import FullFrameControl

# Create pipeline
pipeline = Pipeline()

# Add camera component
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CAMERA,
    name="WFS Camera",
    compute=hardware.amd_epyc_7763(),  # CPU
    function=PCOCamLink,
    params={"n_pixels": 1024*1024, "group": 50}
))

# Add centroider on GPU
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CENTROIDER,
    name="Centroider",
    compute=hardware.nvidia_rtx_4090(),  # GPU
    function=Centroider,
    params={"n_valid_subaps": 6400},
    dependencies=["WFS Camera"]
))

# Add reconstructor
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.RECONSTRUCTION,
    name="Reconstructor",
    compute=hardware.nvidia_rtx_4090(),  # GPU
    function=Reconstruction,
    params={"n_slopes": 12800, "n_acts": 5000},
    dependencies=["Centroider"]
))

# Add DM controller on CPU
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CONTROL,
    name="DM Control",
    compute=hardware.amd_epyc_7763(),  # CPU
    function=FullFrameControl,
    params={"n_acts": 5000},
    dependencies=["Reconstructor"]
))

# Run and visualize
results = pipeline.run(debug=True)
pipeline.visualize(title="AO Pipeline Timing")

# Calculate frame rate, assuming each frame finishes before the next one starts
total_time = results["DM Control"][-1, 1]  # in microseconds
max_fps = 1e6 / total_time
print(f"Maximum frame rate: {max_fps:.1f} Hz")

This produces a timing diagram showing when each component executes and the total system latency.

Accuracy and Limitations

Accuracy

daolite timing estimates are typically within 10-30% of real-world performance when:

Hardware specifications are accurate
Algorithm complexity is correctly characterized
System architecture matches the model
Similar types of hardware are compared

The model is most accurate for:

Relative comparisons between different configurations
Bottleneck identification (which component is limiting)
Scaling analysis (how performance changes with problem size)
Hardware selection (comparing CPU vs GPU implementations)

Limitations

Does not account for:
- OS scheduling jitter
- Cache effects (assumes ideal cache behavior)
- Thermal throttling
- Power management scaling
- Contention from other processes
Simplifications:
- Assumes optimal memory access patterns
- Does not model low-level CPU/GPU microarchitecture details
- Network model is simplified (no packet loss, constant latency)
Use for:
- System design and planning
- Algorithm comparison
- Hardware selection
- Bottleneck analysis
Not suitable for:
- Exact real-time scheduling (use real-time OS tools)
- Compliance testing (use actual hardware)
- Safety-critical timing guarantees (verify on hardware)

Getting Started

To start using daolite, see:

Installation - Install daolite and dependencies
Quick Start Guide - Your first pipeline in 5 minutes
Pipeline Architecture - Understanding the pipeline architecture
Examples - Complete example pipelines

Development and Community

daolite is open source and welcomes contributions:

Repository: https://github.com/davetbarr/daolite
Documentation: https://daolite.readthedocs.io
Issues: Report bugs and request features on GitHub
Contributing: See Contributing to daolite for guidelines

License

daolite is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.

Citation

If you use daolite in your research, please cite:

daolite: Durham Adaptive Optics Latency Inspection and Timing Estimator
David Barr
https://github.com/davetbarr/daolite

(BibTeX entry coming soon)