About daolite
What is daolite?
daolite (Durham Adaptive Optics Latency Inspection and Timing Estimator) is a Python package for modeling and analyzing the computational latency of Adaptive Optics (AO) real-time control systems. It provides accurate timing models for various AO pipeline components, allowing researchers and engineers to design, optimize, and understand the performance characteristics of complex AO systems before implementation.
How daolite Works
Core Architecture
daolite uses a component-based pipeline architecture where AO systems are modeled as a series of connected processing stages:
Component Definition: Each processing stage (camera, centroider, reconstructor, etc.) is represented as a PipelineComponent with:
A component type (CAMERA, CALIBRATION, CENTROIDER, RECONSTRUCTION, CONTROL, etc.)
Assigned compute resources (CPU or GPU specifications)
A timing estimation function
Parameters specific to that component
Dependencies on other components
Dependency Resolution: The pipeline automatically resolves dependencies between components to determine execution order, handling complex workflows with branching and parallel processing.
Timing Estimation: Each component’s timing function estimates processing time based on:
Hardware specifications (compute capability, memory bandwidth, network speed)
Algorithm complexity (FLOPs, memory access patterns)
Data size and structure
Whether the operation is memory-bound or compute-bound
Packetization: daolite models the realistic, packet-by-packet processing of data as it flows through the pipeline, not just end-to-end latency.
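The dependency-resolution step can be pictured as a topological sort of the component graph. The following is a minimal sketch using Python's standard-library graphlib module (Python 3.9+). The dependency map and the execution_order helper are hypothetical names chosen for illustration, and this is not necessarily how daolite implements resolution internally.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: component name -> names it depends on.
# Entries are deliberately listed out of order to show that definition
# order does not matter, only the declared dependencies.
dependencies = {
    "DM Control": ["Reconstructor"],
    "Centroider": ["WFS Camera"],
    "WFS Camera": [],
    "Reconstructor": ["Centroider"],
}


def execution_order(deps):
    """Return names ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A cyclic dependency would make static_order() raise graphlib.CycleError, which is one simple way a pipeline could reject an invalid configuration.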
Timing Model Principles
The latency estimation in daolite follows these key principles:
Memory vs. Compute Bound Operations
For each operation, daolite determines whether it’s limited by:
Memory bandwidth: Time = (Data size × Access patterns) / Memory bandwidth
Computational throughput: Time = Required FLOPs / Available FLOP/s
The component's estimated time is whichever of the two is larger (the bottleneck).
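As a worked illustration of the bottleneck rule, the sketch below evaluates both limits for a float32 matrix-vector multiply and keeps the larger one. The function name estimate_time_us and the device figures (10 TFLOP/s, 500 GB/s) are assumptions made up for this example. They are not daolite API calls or measured values.

```python
def estimate_time_us(flops, bytes_moved, peak_flops_per_s, mem_bandwidth_bytes_per_s):
    """Return (time in microseconds, limiting resource) for one operation."""
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / mem_bandwidth_bytes_per_s
    if memory_time >= compute_time:
        return memory_time * 1e6, "memory"
    return compute_time * 1e6, "compute"


# Illustrative problem: float32 matrix-vector multiply, 12800 slopes -> 5000 actuators
n_slopes, n_acts = 12800, 5000
flops = 2 * n_slopes * n_acts        # one multiply and one add per matrix element
bytes_moved = 4 * n_slopes * n_acts  # matrix read once; the vectors are neglected

# Assumed (hypothetical) device: 10 TFLOP/s peak, 500 GB/s memory bandwidth
time_us, bound = estimate_time_us(flops, bytes_moved, 10e12, 500e9)
print(f"{time_us:.1f} us, {bound}-bound")
```

With these assumed numbers, the arithmetic would take about 12.8 µs but reading the matrix takes about 512 µs, so the operation is memory-bound.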
Hardware Modeling
daolite models computational resources with realistic parameters:
# CPU model includes:
- Number of cores
- Core frequency
- FLOPs per cycle
- Memory bandwidth (channels, width, frequency)
- Network speed
- Driver overhead
# GPU model includes:
- Total FLOPs
- Memory bandwidth
- PCIe transfer rates
- Kernel launch overhead
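To show how parameters like these combine into the two quantities a timing model needs, here is a minimal sketch of a CPU description. The CpuSpec class and its field names are hypothetical and do not reflect daolite's actual resource classes. The example values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class CpuSpec:
    """Hypothetical CPU description; field names are illustrative only."""
    cores: int
    core_frequency_hz: float
    flops_per_cycle: int
    memory_channels: int
    memory_width_bits: int
    memory_frequency_hz: float  # effective transfers per second

    @property
    def peak_flops(self) -> float:
        """Peak throughput in FLOP/s: cores x frequency x FLOPs per cycle."""
        return self.cores * self.core_frequency_hz * self.flops_per_cycle

    @property
    def memory_bandwidth(self) -> float:
        """Peak bandwidth in bytes/s: channels x bytes per transfer x transfer rate."""
        return self.memory_channels * (self.memory_width_bits / 8) * self.memory_frequency_hz


# Arbitrary example values, not a real processor
cpu = CpuSpec(cores=16, core_frequency_hz=3.0e9, flops_per_cycle=16,
              memory_channels=8, memory_width_bits=64, memory_frequency_hz=3.2e9)
print(f"{cpu.peak_flops / 1e9:.0f} GFLOP/s, {cpu.memory_bandwidth / 1e9:.1f} GB/s")
```

Both figures are theoretical peaks. Sustained values on real hardware are lower.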
Dependency and Scheduling
Components start when their dependencies complete
Supports partial dependencies (start when first data packet arrives)
Models realistic scheduling and data transfer between compute resources
Accounts for PCIe transfers between CPU and GPU
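The benefit of partial dependencies can be seen in a small sketch that compares a downstream stage starting on the first packet with one that waits for the full frame. The downstream_finish function and the packet timings are assumptions for illustration, not daolite's scheduler.

```python
def downstream_finish(arrivals, per_packet_time, partial=True):
    """Time (us) at which a downstream stage finishes processing all packets.

    arrivals: times (us) at which each upstream packet becomes available.
    partial=True  -> process each packet as soon as it arrives.
    partial=False -> wait for the last packet before starting any work.
    """
    t = 0.0 if partial else arrivals[-1]
    for arrival in arrivals:
        start = max(t, arrival)  # cannot start before the data or before the stage is free
        t = start + per_packet_time
    return t


# Made-up example: 10 packets, one every 100 us, each needing 30 us of processing
arrivals = [100.0 * (i + 1) for i in range(10)]
overlapped = downstream_finish(arrivals, per_packet_time=30.0, partial=True)
serial = downstream_finish(arrivals, per_packet_time=30.0, partial=False)
print(overlapped, serial)
```

In this example the overlapped stage finishes 30 µs after the last packet arrives, while the non-overlapped stage needs the full 300 µs of processing after readout ends.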
What Can You Model?
AO System Architectures
Single Conjugate AO (SCAO): Single wavefront sensor and deformable mirror
Multi-Conjugate AO (MCAO): Multiple wavefront sensors, multiple DMs at different altitudes
Laser Tomography AO (LTAO): Tomographic wavefront reconstruction
Ground Layer AO (GLAO): Wide field correction
Extreme AO (ExAO): High-order systems with thousands of actuators
Pipeline Components
Sensors and Readout
Camera readout (rolling shutter, global shutter)
Various interfaces (CameraLink, GigE Vision, custom)
Pixel transfer and buffering
Wavefront Sensing
Shack-Hartmann centroiding (CoG, correlation, matched filter)
Pyramid wavefront sensing (slopes, intensity, extended source)
Extended source correlation
Curvature sensing
Processing Stages
Pixel calibration and flat-fielding
Slope calculation and referencing
Wavefront reconstruction (matrix-vector multiply, Fourier methods)
Tomographic reconstruction
Deformable mirror control (integrator, optimal control)
Data Transfer
Network transfers (10GbE, 100GbE, InfiniBand)
PCIe transfers (Gen3, Gen4, Gen5)
Driver and DMA overhead
Switch latency
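A common first-order model for the transfers listed above is a fixed overhead plus payload size divided by link bandwidth. The sketch below applies that model. The transfer_time_us helper, the overhead values, and the rounded theoretical PCIe x16 bandwidths are assumptions for illustration and are not taken from daolite.

```python
# Approximate theoretical PCIe x16 bandwidths in bytes/s (rounded)
PCIE_X16_BYTES_PER_S = {"Gen3": 15.75e9, "Gen4": 31.5e9, "Gen5": 63.0e9}


def transfer_time_us(n_bytes, bandwidth_bytes_per_s, fixed_overhead_us=0.0):
    """Fixed overhead (driver, DMA setup, switch) plus size divided by bandwidth."""
    return fixed_overhead_us + n_bytes / bandwidth_bytes_per_s * 1e6


# Example payload: 12800 float32 slopes
slopes_bytes = 12800 * 4

# PCIe Gen4 x16 with an assumed 5 us fixed overhead
t_pcie = transfer_time_us(slopes_bytes, PCIE_X16_BYTES_PER_S["Gen4"], fixed_overhead_us=5.0)

# 10GbE (10e9 bits/s = 1.25e9 bytes/s) with an assumed 20 us fixed overhead
t_10gbe = transfer_time_us(slopes_bytes, 10e9 / 8, fixed_overhead_us=20.0)

print(f"PCIe Gen4: {t_pcie:.2f} us, 10GbE: {t_10gbe:.2f} us")
```

For small payloads like this one, the fixed overhead dominates the PCIe transfer, which is why driver and DMA overhead are modeled separately from raw bandwidth.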
Hardware Configurations
Predefined Resources
daolite includes models for modern hardware:
CPUs: AMD EPYC 7763, Intel Xeon 8480, and others
GPUs: NVIDIA RTX 4090, H100, AMD MI250X
Custom YAML-based hardware definitions
Mixed Compute
Assign different components to different hardware
Model CPU-GPU hybrid systems
Optimize hardware allocation for minimum latency
Use Cases
System Design
Pre-implementation Analysis: Estimate system performance before building hardware
Hardware Selection: Compare different CPU/GPU combinations
Bottleneck Identification: Find which component limits system performance
Upgrade Planning: Predict performance improvements from hardware upgrades
Algorithm Development
Algorithm Comparison: Compare latency of different algorithms (e.g., CoG vs. correlation centroiding)
Optimization Validation: Verify that algorithm optimizations reduce latency as expected
Scaling Analysis: Understand how performance scales with system size
Real-time Control Design
Frame Rate Estimation: Calculate maximum achievable frame rates
Timing Budget: Allocate time budgets to different processing stages
Latency Requirements: Verify system meets latency requirements for telescope tracking
Multi-system Coordination: Model synchronized operation of multiple subsystems
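The frame-rate and timing-budget checks above reduce to simple arithmetic once per-stage latencies are known. The sketch below uses made-up latencies and a hypothetical check_budget helper. It sums the stages as if they ran strictly one after another, which is pessimistic for a packetized pipeline where stages overlap.

```python
def check_budget(stage_latencies_us, target_fps):
    """Compare summed stage latencies against the frame period of a target rate."""
    frame_period_us = 1e6 / target_fps
    total_us = sum(stage_latencies_us.values())
    return {
        "total_us": total_us,
        "frame_period_us": frame_period_us,
        "headroom_us": frame_period_us - total_us,  # negative means over budget
        "max_fps": 1e6 / total_us,                  # assumes no overlap between frames
        "largest_stage": max(stage_latencies_us, key=stage_latencies_us.get),
    }


# Made-up per-stage latencies in microseconds
stages = {"camera": 400.0, "centroider": 50.0, "reconstructor": 120.0, "control": 30.0}
report = check_budget(stages, target_fps=1000)
print(report)
```

Here the stages total 600 µs against a 1000 µs frame period, leaving 400 µs of headroom, and the camera readout is the largest contributor.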
Key Features
Flexible and Powerful
Any Component Order: Define components in any order; dependencies determine the execution order
Heterogeneous Computing: Mix CPUs and GPUs in the same pipeline
Custom Components: Easily add custom processing stages
YAML Configuration: Define systems in readable configuration files
Accurate and Realistic
Hardware-specific Models: Based on actual CPU/GPU specifications
Packetized Processing: Models realistic data flow, not just end-to-end timing
Memory and Compute: Considers both memory bandwidth and computational limits
Network Overhead: Includes driver delays, switch latency, protocol overhead
Easy to Use
Simple API: Intuitive Python API for defining pipelines
Built-in Components: Pre-built functions for common AO operations
Visualization: Automatic generation of timing diagrams
JSON Runner: Run pipelines from JSON configuration files
Extensible
Custom Hardware: Define custom CPU/GPU specifications
Custom Algorithms: Add your own timing estimation functions
Plugin Architecture: Extend with new component types
Open Source: Modify and adapt to your needs
Example Workflow
A typical daolite workflow looks like this:
from daolite import Pipeline, PipelineComponent, ComponentType
from daolite.compute import hardware
from daolite.simulation.camera import PCOCamLink
from daolite.pipeline.centroider import Centroider
from daolite.pipeline.reconstruction import Reconstruction
from daolite.pipeline.control import FullFrameControl
# Create pipeline
pipeline = Pipeline()
# Add camera component
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CAMERA,
    name="WFS Camera",
    compute=hardware.amd_epyc_7763(),  # CPU
    function=PCOCamLink,
    params={"n_pixels": 1024 * 1024, "group": 50},
))
# Add centroider on GPU
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CENTROIDER,
    name="Centroider",
    compute=hardware.nvidia_rtx_4090(),  # GPU
    function=Centroider,
    params={"n_valid_subaps": 6400},
    dependencies=["WFS Camera"],
))
# Add reconstructor
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.RECONSTRUCTION,
    name="Reconstructor",
    compute=hardware.nvidia_rtx_4090(),  # GPU
    function=Reconstruction,
    params={"n_slopes": 12800, "n_acts": 5000},
    dependencies=["Centroider"],
))
# Add DM controller on CPU
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CONTROL,
    name="DM Control",
    compute=hardware.amd_epyc_7763(),  # CPU
    function=FullFrameControl,
    params={"n_acts": 5000},
    dependencies=["Reconstructor"],
))
# Run and visualize
results = pipeline.run(debug=True)
pipeline.visualize(title="AO Pipeline Timing")
# Estimate the maximum frame rate from the end-to-end latency
# (assumes each frame finishes before the next one starts)
total_time = results["DM Control"][-1, 1]  # in microseconds
max_fps = 1e6 / total_time
print(f"Maximum frame rate: {max_fps:.1f} Hz")
This produces a timing diagram showing when each component executes and the total system latency.
Accuracy and Limitations
Accuracy
daolite timing estimates are typically within 10-30% of real-world performance when:
Hardware specifications are accurate
Algorithm complexity is correctly characterized
System architecture matches the model
Similar types of hardware are compared
The model is most accurate for:
Relative comparisons between different configurations
Bottleneck identification (which component is limiting)
Scaling analysis (how performance changes with problem size)
Hardware selection (comparing CPU vs GPU implementations)
Limitations
Does not account for:
OS scheduling jitter
Cache effects (assumes ideal cache behavior)
Thermal throttling
Power management scaling
Contention from other processes
Simplifications:
Assumes optimal memory access patterns
Does not model low-level CPU/GPU microarchitecture details
Network model is simplified (no packet loss, constant latency)
Use for:
System design and planning
Algorithm comparison
Hardware selection
Bottleneck analysis
Not suitable for:
Exact real-time scheduling (use real-time OS tools)
Compliance testing (use actual hardware)
Safety-critical timing guarantees (verify on hardware)
Getting Started
To start using daolite, see:
Installation - Install daolite and dependencies
Quick Start Guide - Your first pipeline in 5 minutes
Pipeline Architecture - Understanding the pipeline architecture
Examples - Complete example pipelines
Development and Community
daolite is open source and welcomes contributions:
Repository: https://github.com/davetbarr/daolite
Documentation: https://daolite.readthedocs.io
Issues: Report bugs and request features on GitHub
Contributing: See Contributing to daolite for guidelines
License
daolite is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.
Citation
If you use daolite in your research, please cite:
daolite: Durham Adaptive Optics Latency Inspection and Timing Estimator
David Barr
https://github.com/davetbarr/daolite
(BibTeX entry coming soon)