About daolite

What is daolite?

daolite (Durham Adaptive Optics Latency Inspection and Timing Estimator) is a Python package for modeling and analyzing the computational latency of Adaptive Optics (AO) real-time control systems. It provides accurate timing models for various AO pipeline components, allowing researchers and engineers to design, optimize, and understand the performance characteristics of complex AO systems before implementation.

How daolite Works

Core Architecture

daolite uses a component-based pipeline architecture where AO systems are modeled as a series of connected processing stages:

  1. Component Definition: Each processing stage (camera, centroider, reconstructor, etc.) is represented as a PipelineComponent with:

    • A component type (CAMERA, CALIBRATION, CENTROIDER, RECONSTRUCTION, CONTROL, etc.)

    • Assigned compute resources (CPU or GPU specifications)

    • A timing estimation function

    • Parameters specific to that component

    • Dependencies on other components

  2. Dependency Resolution: The pipeline automatically resolves dependencies between components to determine execution order, handling complex workflows with branching and parallel processing.

  3. Timing Estimation: Each component’s timing function estimates processing time based on:

    • Hardware specifications (compute capability, memory bandwidth, network speed)

    • Algorithm complexity (FLOPs, memory access patterns)

    • Data size and structure

    • Whether the operation is memory-bound or compute-bound

  4. Packetization: daolite models the realistic, packet-by-packet processing of data as it flows through the pipeline, not just end-to-end latency.
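
The dependency-resolution step can be illustrated with a topological sort. The sketch below is not daolite's implementation: it uses the standard-library graphlib module, and the dependencies dictionary and execution_order helper are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical component names, each mapped to the components it depends on.
dependencies = {
    "DM Control": ["Reconstructor"],
    "Reconstructor": ["Centroider"],
    "Centroider": ["WFS Camera"],
    "WFS Camera": [],
}


def execution_order(deps):
    """Return component names ordered so that every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # ['WFS Camera', 'Centroider', 'Reconstructor', 'DM Control']
```

The components are listed in reverse on purpose: the order of definition does not matter, only the declared dependencies do.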

Timing Model Principles

The latency estimation in daolite follows these key principles:

Memory vs. Compute Bound Operations

For each operation, daolite determines whether it’s limited by:

  • Memory bandwidth: Time = (Data size × Access patterns) / Memory bandwidth

  • Computational throughput: Time = Required FLOPs / Peak FLOP rate (FLOP/s)

The component's estimate is the larger of the two times, because the slower resource is the bottleneck.
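
A minimal sketch of this rule, assuming a simple roofline-style bound. The function estimate_time_us and the hardware figures are illustrative assumptions, not daolite's API or measured values.

```python
def estimate_time_us(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Return (time in microseconds, limiting resource) for one operation."""
    compute_time_s = flops / peak_flops_per_s
    memory_time_s = bytes_moved / bandwidth_bytes_per_s
    bound = "compute" if compute_time_s >= memory_time_s else "memory"
    return max(compute_time_s, memory_time_s) * 1e6, bound


# Matrix-vector multiply: 12800 slopes by 5000 actuators in float32.
n_slopes, n_acts = 12800, 5000
flops = 2 * n_slopes * n_acts        # one multiply and one add per matrix element
bytes_moved = 4 * n_slopes * n_acts  # the matrix is read once, 4 bytes per element

# Assumed hardware: 10 TFLOP/s peak compute and 500 GB/s memory bandwidth.
time_us, bound = estimate_time_us(flops, bytes_moved, 10e12, 500e9)
print(f"{time_us:.1f} us, {bound}-bound")  # 512.0 us, memory-bound
```

With these numbers the compute time is only 12.8 us, so memory bandwidth sets the estimate.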

Hardware Modeling

daolite models computational resources with realistic parameters:

# CPU model includes:
- Number of cores
- Core frequency
- FLOPs per cycle
- Memory bandwidth (channels, width, frequency)
- Network speed
- Driver overhead

# GPU model includes:
- Peak throughput (FLOP/s)
- Memory bandwidth
- PCIe transfer rates
- Kernel launch overhead
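
As a rough illustration of how the CPU parameters above combine into the two quantities the timing model needs, the sketch below derives peak throughput and memory bandwidth from a specification. The CPUSpec class, its field names, and the example values are hypothetical and do not describe daolite's own classes or any real processor.

```python
from dataclasses import dataclass


@dataclass
class CPUSpec:
    """Hypothetical CPU description used only for this example."""
    cores: int
    core_frequency_hz: float
    flops_per_cycle: int
    memory_channels: int
    memory_width_bits: int
    memory_transfers_per_s: float  # effective memory transfer rate

    @property
    def peak_flops_per_s(self):
        # Every core retires flops_per_cycle operations on each clock cycle.
        return self.cores * self.core_frequency_hz * self.flops_per_cycle

    @property
    def memory_bandwidth_bytes_per_s(self):
        # Each channel moves memory_width_bits / 8 bytes per transfer.
        return (self.memory_channels * (self.memory_width_bits / 8)
                * self.memory_transfers_per_s)


cpu = CPUSpec(cores=16, core_frequency_hz=3.0e9, flops_per_cycle=16,
              memory_channels=2, memory_width_bits=64,
              memory_transfers_per_s=3.2e9)
print(f"{cpu.peak_flops_per_s / 1e9:.0f} GFLOP/s")           # 768 GFLOP/s
print(f"{cpu.memory_bandwidth_bytes_per_s / 1e9:.1f} GB/s")  # 51.2 GB/s
```

Both figures are theoretical peaks; sustained rates on real hardware are lower.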

Dependency and Scheduling

  • Components start when their dependencies complete

  • Supports partial dependencies (start when first data packet arrives)

  • Models realistic scheduling and data transfer between compute resources

  • Accounts for PCIe transfers between CPU and GPU
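
The benefit of partial dependencies can be seen in a small packet-level sketch. It assumes a single downstream stage that handles one packet at a time with a fixed cost per packet; packet_finish_times and all timings are illustrative, not daolite's scheduler.

```python
def packet_finish_times(arrival_times_us, per_packet_us):
    """Finish time of each packet in a stage that handles one packet at a time."""
    finish = []
    stage_free_at = 0.0
    for arrival in arrival_times_us:
        # A packet starts once it has arrived and the stage is idle.
        start = max(arrival, stage_free_at)
        stage_free_at = start + per_packet_us
        finish.append(stage_free_at)
    return finish


# Assumed upstream stage: the camera delivers four packets, one every 100 us.
camera_done_us = [100.0, 200.0, 300.0, 400.0]
per_packet_us = 60.0  # assumed centroiding cost per packet

# Partial dependency: centroiding starts as soon as the first packet arrives.
partial = packet_finish_times(camera_done_us, per_packet_us)

# Full dependency: centroiding waits for the last packet, then does all four.
full_done_us = camera_done_us[-1] + len(camera_done_us) * per_packet_us

print(partial[-1], full_done_us)  # 460.0 640.0
```

Overlapping the two stages hides most of the centroiding time behind the camera readout.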

What Can You Model?

AO System Architectures

  • Single Conjugate AO (SCAO): Single wavefront sensor and deformable mirror

  • Multi-Conjugate AO (MCAO): Multiple wavefront sensors, multiple DMs at different altitudes

  • Laser Tomography AO (LTAO): Tomographic wavefront reconstruction

  • Ground Layer AO (GLAO): Wide field correction

  • Extreme AO (ExAO): High-order systems with thousands of actuators

Pipeline Components

Sensors and Readout

  • Camera readout (rolling shutter, global shutter)

  • Various interfaces (CameraLink, GigE Vision, custom)

  • Pixel transfer and buffering

Wavefront Sensing

  • Shack-Hartmann centroiding (CoG, correlation, matched filter)

  • Pyramid wavefront sensing (slopes, intensity, extended source)

  • Extended source correlation

  • Curvature sensing

Processing Stages

  • Pixel calibration and flat-fielding

  • Slope calculation and referencing

  • Wavefront reconstruction (matrix-vector multiply, Fourier methods)

  • Tomographic reconstruction

  • Deformable mirror control (integrator, optimal control)

Data Transfer

  • Network transfers (10GbE, 100GbE, InfiniBand)

  • PCIe transfers (Gen3, Gen4, Gen5)

  • Driver and DMA overhead

  • Switch latency
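
A simple way to reason about these transfers is a fixed overhead plus payload size divided by link rate. The sketch below assumes exactly that model; transfer_time_us is a hypothetical helper, the 5 us overhead is an arbitrary placeholder, and protocol overhead on the wire is ignored.

```python
def transfer_time_us(n_bytes, link_gbit_per_s, fixed_overhead_us=0.0):
    """Fixed overhead (driver, DMA, switch) plus payload time at the raw link rate."""
    payload_us = n_bytes * 8 / (link_gbit_per_s * 1e9) * 1e6
    return fixed_overhead_us + payload_us


frame_bytes = 1024 * 1024 * 2  # one 1024 x 1024 frame of 16-bit pixels

t_10gbe = transfer_time_us(frame_bytes, 10, fixed_overhead_us=5.0)
t_100gbe = transfer_time_us(frame_bytes, 100, fixed_overhead_us=5.0)
print(f"10GbE: {t_10gbe:.1f} us, 100GbE: {t_100gbe:.1f} us")
```

As the link gets faster the fixed overhead becomes a larger share of the total, which is why driver and switch delays are modeled separately from bandwidth.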

Hardware Configurations

Predefined Resources

daolite includes models for modern hardware:

  • CPUs: AMD EPYC 7763, Intel Xeon 8480, and others

  • GPUs: NVIDIA RTX 4090, H100, AMD MI250X

  • Custom YAML-based hardware definitions

Mixed Compute

  • Assign different components to different hardware

  • Model CPU-GPU hybrid systems

  • Optimize hardware allocation for minimum latency
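
To illustrate what optimizing hardware allocation involves, the sketch below tries every CPU/GPU assignment for a short chain of stages and charges a fixed cost each time data crosses between devices. The stage times, the PCIe cost, and the helper names are all made up for the example and do not come from daolite.

```python
from itertools import product

# Hypothetical per-stage times in microseconds on each device.
stage_times_us = {
    "centroider": {"cpu": 120.0, "gpu": 30.0},
    "reconstructor": {"cpu": 900.0, "gpu": 150.0},
    "control": {"cpu": 10.0, "gpu": 25.0},
}
PCIE_HOP_US = 20.0  # assumed cost each time data moves between CPU and GPU


def chain_latency_us(assignment, start_device="cpu", end_device="cpu"):
    """Sum the stage times, adding a PCIe hop for every change of device."""
    total, current = 0.0, start_device
    for stage, device in assignment:
        if device != current:
            total += PCIE_HOP_US
        total += stage_times_us[stage][device]
        current = device
    if current != end_device:
        total += PCIE_HOP_US  # results must return to the end device
    return total


stages = list(stage_times_us)
candidates = [list(zip(stages, devices))
              for devices in product(("cpu", "gpu"), repeat=len(stages))]
best = min(candidates, key=chain_latency_us)
best_latency_us = chain_latency_us(best)
print(best, best_latency_us)
```

Here the fastest assignment keeps centroiding and reconstruction on the GPU but runs control on the CPU, because the small control step is not worth a GPU launch once the data has to return to the CPU anyway.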

Use Cases

System Design

  • Pre-implementation Analysis: Estimate system performance before building hardware

  • Hardware Selection: Compare different CPU/GPU combinations

  • Bottleneck Identification: Find which component limits system performance

  • Upgrade Planning: Predict performance improvements from hardware upgrades

Algorithm Development

  • Algorithm Comparison: Compare latency of different algorithms (e.g., CoG vs. correlation centroiding)

  • Optimization Validation: Verify that algorithm optimizations reduce latency as expected

  • Scaling Analysis: Understand how performance scales with system size
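
A scaling analysis can be as simple as sweeping the system size through a cost formula. The sketch below assumes a square grid of subapertures, a memory-bound float32 matrix-vector multiply, and 1 TB/s of memory bandwidth; mvm_time_us is a hypothetical helper, not a daolite function.

```python
def mvm_time_us(n_across, bandwidth_bytes_per_s, bytes_per_element=4):
    """Memory-bound time to read the control matrix once, in microseconds."""
    n_slopes = 2 * n_across ** 2   # x and y slope per subaperture (square grid)
    n_acts = (n_across + 1) ** 2   # actuators at the subaperture corners
    matrix_bytes = bytes_per_element * n_slopes * n_acts
    return matrix_bytes / bandwidth_bytes_per_s * 1e6


# Assumed bandwidth of 1 TB/s; sizes are subapertures across the pupil.
times = {n: mvm_time_us(n, 1e12) for n in (20, 40, 80)}
for n, t in times.items():
    print(f"{n} x {n} subapertures: {t:.1f} us")
```

Each doubling of the number of subapertures across the pupil raises the reconstruction time by a factor of roughly 15 to 16, since the matrix grows with the fourth power of the linear size.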

Real-time Control Design

  • Frame Rate Estimation: Calculate maximum achievable frame rates

  • Timing Budget: Allocate time budgets to different processing stages

  • Latency Requirements: Verify that the system meets the latency requirements for telescope tracking

  • Multi-system Coordination: Model synchronized operation of multiple subsystems
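
The frame-rate and timing-budget calculations above reduce to simple arithmetic once per-stage times are known. The sketch below uses invented stage times and a hypothetical budget_report helper; in practice the times would come from a pipeline run.

```python
# Hypothetical per-stage times in microseconds.
stage_times_us = {
    "camera readout": 400.0,
    "centroiding": 60.0,
    "reconstruction": 250.0,
    "control": 20.0,
}


def budget_report(stage_times, budget_us):
    """Return total latency, remaining margin, and each stage's share of the total."""
    total = sum(stage_times.values())
    shares = {name: t / total for name, t in stage_times.items()}
    return total, budget_us - total, shares


total_us, margin_us, shares = budget_report(stage_times_us, budget_us=1000.0)
bottleneck = max(stage_times_us, key=stage_times_us.get)

# If each frame must finish before the next one starts, the sum sets the rate.
serial_rate_hz = 1e6 / total_us
# If stages overlap across frames, the slowest stage sets the rate instead.
pipelined_rate_hz = 1e6 / max(stage_times_us.values())

print(f"total {total_us:.0f} us, margin {margin_us:.0f} us, bottleneck: {bottleneck}")
print(f"serial {serial_rate_hz:.0f} Hz, pipelined {pipelined_rate_hz:.0f} Hz")
```

The gap between the serial and pipelined rates shows why it matters whether stages may overlap when quoting a maximum frame rate.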

Key Features

Flexible and Powerful

  • Any Component Order: Define components in any order; dependencies determine the execution order

  • Heterogeneous Computing: Mix CPUs and GPUs in the same pipeline

  • Custom Components: Easily add custom processing stages

  • YAML Configuration: Define systems in readable configuration files

Accurate and Realistic

  • Hardware-specific Models: Based on actual CPU/GPU specifications

  • Packetized Processing: Models realistic data flow, not just end-to-end timing

  • Memory and Compute: Considers both memory bandwidth and computational limits

  • Network Overhead: Includes driver delays, switch latency, protocol overhead

Easy to Use

  • Simple API: Intuitive Python API for defining pipelines

  • Built-in Components: Pre-built functions for common AO operations

  • Visualization: Automatic generation of timing diagrams

  • JSON Runner: Run pipelines from JSON configuration files

Extensible

  • Custom Hardware: Define custom CPU/GPU specifications

  • Custom Algorithms: Add your own timing estimation functions

  • Plugin Architecture: Extend with new component types

  • Open Source: Modify and adapt to your needs

Example Workflow

A typical daolite workflow looks like this:

from daolite import Pipeline, PipelineComponent, ComponentType
from daolite.compute import hardware
from daolite.simulation.camera import PCOCamLink
from daolite.pipeline.centroider import Centroider
from daolite.pipeline.reconstruction import Reconstruction
from daolite.pipeline.control import FullFrameControl

# Create pipeline
pipeline = Pipeline()

# Add camera component
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CAMERA,
    name="WFS Camera",
    compute=hardware.amd_epyc_7763(),  # CPU
    function=PCOCamLink,
    params={"n_pixels": 1024*1024, "group": 50}
))

# Add centroider on GPU
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CENTROIDER,
    name="Centroider",
    compute=hardware.nvidia_rtx_4090(),  # GPU
    function=Centroider,
    params={"n_valid_subaps": 6400},
    dependencies=["WFS Camera"]
))

# Add reconstructor
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.RECONSTRUCTION,
    name="Reconstructor",
    compute=hardware.nvidia_rtx_4090(),  # GPU
    function=Reconstruction,
    params={"n_slopes": 12800, "n_acts": 5000},
    dependencies=["Centroider"]
))

# Add DM controller on CPU
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CONTROL,
    name="DM Control",
    compute=hardware.amd_epyc_7763(),  # CPU
    function=FullFrameControl,
    params={"n_acts": 5000},
    dependencies=["Reconstructor"]
))

# Run and visualize
results = pipeline.run(debug=True)
pipeline.visualize(title="AO Pipeline Timing")

# Calculate frame rate
total_time = results["DM Control"][-1, 1]  # in microseconds
max_fps = 1e6 / total_time
print(f"Maximum frame rate: {max_fps:.1f} Hz")

This produces a timing diagram showing when each component executes and the total system latency.

Accuracy and Limitations

Accuracy

daolite timing estimates are typically within 10-30% of real-world performance when:

  • Hardware specifications are accurate

  • Algorithm complexity is correctly characterized

  • System architecture matches the model

  • Similar types of hardware are compared

The model is most accurate for:

  • Relative comparisons between different configurations

  • Bottleneck identification (which component is limiting)

  • Scaling analysis (how performance changes with problem size)

  • Hardware selection (comparing CPU vs GPU implementations)

Limitations

  • Does not account for:

    • OS scheduling jitter

    • Cache effects (assumes ideal cache behavior)

    • Thermal throttling

    • Power management scaling

    • Contention from other processes

  • Simplifications:

    • Assumes optimal memory access patterns

    • Does not model low-level CPU/GPU microarchitecture details

    • Network model is simplified (no packet loss, constant latency)

  • Use for:

    • System design and planning

    • Algorithm comparison

    • Hardware selection

    • Bottleneck analysis

  • Not suitable for:

    • Exact real-time scheduling (use real-time OS tools)

    • Compliance testing (use actual hardware)

    • Safety-critical timing guarantees (verify on hardware)

Getting Started

To start using daolite, see the project repository at https://github.com/davetbarr/daolite.

Development and Community

daolite is open source and welcomes contributions.

License

daolite is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.

Citation

If you use daolite in your research, please cite:

daolite: Durham Adaptive Optics Latency Inspection and Timing Estimator
David Barr
https://github.com/davetbarr/daolite

(BibTeX entry coming soon)