Compute Resources

Overview

The Compute Resources module in daolite provides detailed models of computing hardware for estimating the performance of adaptive optics (AO) pipeline components. This module allows users to:

Define custom hardware specifications
Use pre-defined hardware profiles
Compare performance across different systems
Model heterogeneous computing (CPU + GPU)

Hardware Model Components

CPU Resources

CPU resources are modeled with the following parameters:

Number of cores: Physical CPU cores available
Core frequency: Clock speed of CPU cores in Hz
FLOPS per cycle: Floating point operations per clock cycle (vectorization capability)
Memory channels: Number of memory channels
Memory width: Width of each memory channel in bits
Memory frequency: Effective memory transfer rate in Hz (for example, 3200e6 for DDR4-3200)
Network speed: Network interface speed in bits/second
Driver time: Overhead for kernel/driver interactions, in microseconds

# Example: Creating custom CPU resources
from daolite.compute import ComputeResources

# Define an AMD EPYC server
cpu = ComputeResources(
    hardware="CPU",
    cores=64,
    core_frequency=2.45e9,  # 2.45 GHz
    flops_per_cycle=32,    # FP32 FLOPs per cycle per core (2 x 256-bit AVX2 FMA units)
    memory_channels=8,
    memory_width=64,       # 64 bits per channel
    memory_frequency=3200e6,  # 3200 MT/s (DDR4-3200)
    network_speed=100e9,   # 100 Gbps
    time_in_driver=5       # 5 μs driver overhead
)

GPU Resources

GPU resources are modeled with a simplified approach focusing on the key performance parameters:

Hardware type: Indicator that this is a GPU resource
Memory bandwidth: Peak memory bandwidth in bytes/second
FLOPS: Peak floating point operations per second
Network speed: PCIe or network interface speed
Driver time: GPU driver and kernel launch overhead, in microseconds

# Example: Creating custom GPU resources
from daolite.compute import ComputeResources

# Define an NVIDIA A100 GPU
gpu = ComputeResources(
    hardware="GPU",
    memory_bandwidth=1.6e12,     # ~1.6 TB/s (A100 40GB, HBM2)
    flops=1.95e13,               # 19.5 TFLOPS (FP32)
    network_speed=25e9,          # PCIe 4.0 x16
    time_in_driver=20            # 20 μs driver overhead
)
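
As a rough illustration of how the network speed and driver time parameters could feed into a latency estimate, the sketch below adds a fixed driver overhead to a bandwidth-limited transfer time. The function name `transfer_time_us` and the additive formula are assumptions made for this example, not daolite's implementation. It also assumes that the link speed is given in bits/second and the driver overhead in microseconds, as in the examples above.

```python
def transfer_time_us(n_bits, network_speed, time_in_driver):
    """Estimate transfer latency in microseconds (illustrative model only).

    n_bits: payload size in bits
    network_speed: link speed in bits/second
    time_in_driver: fixed driver overhead in microseconds
    """
    if network_speed <= 0:
        raise ValueError("network_speed must be positive")
    wire_time_us = n_bits / network_speed * 1e6
    return wire_time_us + time_in_driver


# A 256 x 256 frame of 16-bit pixels, using the link values from the GPU example
frame_bits = 256 * 256 * 16
latency_us = transfer_time_us(frame_bits, network_speed=25e9, time_in_driver=20)
print(f"{latency_us:.2f} us")
```

With these numbers the wire time (about 42 μs) and the driver overhead (20 μs) are of similar size, which is why a fixed per-call overhead matters for small AO frames.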

Pre-defined Hardware Library

daolite includes a library of pre-defined hardware profiles for common CPUs and GPUs:

CPU Profiles

from daolite.compute import hardware

# AMD CPUs
cpu1 = hardware.amd_epyc_7763()    # AMD EPYC 7763 (Milan)
cpu2 = hardware.amd_epyc_9654()    # AMD EPYC 9654 (Genoa)
cpu3 = hardware.amd_ryzen_7950x()  # AMD Ryzen 9 7950X

# Intel CPUs
cpu4 = hardware.intel_xeon_8480()  # Intel Xeon Platinum 8480+
cpu5 = hardware.intel_xeon_8462()  # Intel Xeon Platinum 8462Y+

# Use a pre-defined CPU in a pipeline
pipeline.add_component(PipelineComponent(
    name="Centroider",
    compute=hardware.amd_epyc_7763(),  # pass the profile instance directly
    ...
))

GPU Profiles

from daolite.compute import hardware

# NVIDIA GPUs
gpu1 = hardware.nvidia_a100_80gb()  # NVIDIA A100 80GB
gpu2 = hardware.nvidia_h100_80gb()  # NVIDIA H100 80GB
gpu3 = hardware.nvidia_rtx_4090()   # NVIDIA RTX 4090

# AMD GPUs
gpu4 = hardware.amd_mi300x()        # AMD Instinct MI300X

# Use a pre-defined GPU in a pipeline
pipeline.add_component(PipelineComponent(
    name="Reconstructor",
    compute=hardware.nvidia_rtx_4090(),
    ...
))

Memory Model

The memory model in daolite calculates effective bandwidth based on:

Theoretical peak bandwidth: Base calculation from hardware specs
- For CPUs: memory_channels * memory_width * memory_frequency / 8 (bytes/second; dividing by 8 converts bits to bytes)
- For GPUs: Directly specified memory bandwidth
Access pattern efficiency: Real-world memory access patterns rarely achieve theoretical peak
- Sequential access: ~80-95% efficiency
- Strided access: ~40-60% efficiency
- Random access: ~10-30% efficiency
Cache effects: Optional modeling of cache benefits
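
The bandwidth calculation above can be sketched in a few lines. This is a minimal illustration, not daolite's code: the helper names and the `ACCESS_EFFICIENCY` table are hypothetical, and the efficiency values are simply the midpoints of the ranges listed above.

```python
# Midpoints of the efficiency ranges listed above (illustrative assumption)
ACCESS_EFFICIENCY = {
    "sequential": 0.875,  # ~80-95%
    "strided": 0.50,      # ~40-60%
    "random": 0.20,       # ~10-30%
}


def cpu_peak_bandwidth(memory_channels, memory_width, memory_frequency):
    """Theoretical peak bandwidth in bytes/second (memory_width is in bits)."""
    return memory_channels * memory_width * memory_frequency / 8


def effective_bandwidth(peak_bandwidth, pattern):
    """Scale the peak bandwidth by an access-pattern efficiency factor."""
    return peak_bandwidth * ACCESS_EFFICIENCY[pattern]


# Values from the custom CPU example: 8 channels, 64 bits wide, 3200e6 transfers/s
peak = cpu_peak_bandwidth(memory_channels=8, memory_width=64, memory_frequency=3200e6)
for pattern in ACCESS_EFFICIENCY:
    print(f"{pattern:>10}: {effective_bandwidth(peak, pattern) / 1e9:.1f} GB/s")
```

For the example CPU the theoretical peak is 204.8 GB/s, so a random-access workload would see roughly a fifth of that under these assumed factors.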

Computation Model

The computation model estimates processing time based on:

Theoretical peak FLOPS: Maximum floating point operations per second
- For CPUs: cores * core_frequency * flops_per_cycle
- For GPUs: Directly specified FLOPS
Algorithm efficiency: Real-world efficiency compared to theoretical peak
- Memory-bound algorithms: Typically limited by memory bandwidth
- Compute-bound algorithms: Limited by computational throughput
- Each algorithm has a scaling factor to account for implementation efficiency
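
One common way to combine these quantities is a roofline-style estimate, in which an operation takes as long as the slower of its compute time and its memory time. The sketch below uses that approach as an assumption; daolite's own timing functions may differ, and `estimate_time_us` and its `efficiency` argument are hypothetical names chosen for this example.

```python
def cpu_peak_flops(cores, core_frequency, flops_per_cycle):
    """Theoretical peak FLOPS for a CPU."""
    return cores * core_frequency * flops_per_cycle


def estimate_time_us(n_flops, n_bytes, peak_flops, bandwidth, efficiency=1.0):
    """Roofline-style estimate in microseconds: the slower of compute and memory.

    efficiency is a hypothetical per-algorithm scaling factor in (0, 1].
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    compute_s = n_flops / (peak_flops * efficiency)
    memory_s = n_bytes / bandwidth
    return max(compute_s, memory_s) * 1e6


peak = cpu_peak_flops(cores=64, core_frequency=2.45e9, flops_per_cycle=32)
bandwidth = 204.8e9  # bytes/second, peak value for the custom CPU example

# Matrix-vector multiply with a 5000 x 10000 float32 matrix, 2 FLOPs per element
rows, cols = 5000, 10000
n_flops = 2 * rows * cols
n_bytes = 4 * rows * cols  # matrix traffic dominates; the vectors are ignored
t_us = estimate_time_us(n_flops, n_bytes, peak, bandwidth, efficiency=0.5)
bound = "memory" if n_bytes / bandwidth > n_flops / (peak * 0.5) else "compute"
print(f"{t_us:.1f} us ({bound}-bound)")
```

Even at 50% compute efficiency the matrix-vector multiply is limited by memory traffic here, which matches the note above that such algorithms are typically memory-bound.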

Multiple Resource Types

daolite supports defining multiple resource types for different components:

from daolite.compute import hardware

# Define CPU for camera readout and DM control
cpu_resource = hardware.amd_epyc_7763()

# Define GPU for centroiding and reconstruction
gpu_resource = hardware.nvidia_rtx_4090()

# Use in pipeline components
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CAMERA,
    name="Camera",
    compute=cpu_resource,  # CPU for camera
    # ...other parameters...
))

pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CENTROIDER,
    name="Centroider",
    compute=gpu_resource,  # GPU for centroiding
    # ...other parameters...
))
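
To see why assigning different resources to different components matters, the standalone sketch below totals per-stage latency for a small heterogeneous pipeline. It does not use daolite: the `Resource` tuple, the workload numbers, and the strictly sequential timing model are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for hardware profiles (not daolite objects)
Resource = namedtuple("Resource", "name peak_flops bandwidth time_in_driver")
cpu = Resource("CPU", peak_flops=5.0e12, bandwidth=2.0e11, time_in_driver=5)
gpu = Resource("GPU", peak_flops=1.95e13, bandwidth=1.6e12, time_in_driver=20)


def stage_time_us(n_flops, n_bytes, res):
    """Slower of compute and memory time, plus driver overhead (microseconds)."""
    busy_s = max(n_flops / res.peak_flops, n_bytes / res.bandwidth)
    return busy_s * 1e6 + res.time_in_driver


# (stage name, FLOPs, bytes moved, assigned resource); workloads are made up
stages = [
    ("Camera", 1e5, 2e5, cpu),
    ("Centroider", 5e6, 2e6, gpu),
    ("Reconstructor", 1e8, 2e8, gpu),
]

total_us = 0.0
for name, n_flops, n_bytes, res in stages:
    t = stage_time_us(n_flops, n_bytes, res)
    total_us += t
    print(f"{name:<14}{res.name:<4}{t:8.2f} us")
print(f"{'Total':<18}{total_us:8.2f} us")
```

In this toy model the small centroiding stage is dominated by the GPU's driver overhead, which shows that a faster device does not help every stage.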

Adding Custom Hardware Profiles

Users can extend the hardware library with custom profiles:

from daolite.compute import ComputeResources
from daolite.compute.resources import register_hardware_profile

# Create a custom hardware profile factory function
def my_custom_server():
    return ComputeResources(
        hardware="CPU",
        cores=128,
        core_frequency=3.2e9,
        flops_per_cycle=64,
        memory_channels=16,
        memory_width=64,
        memory_frequency=3600e6,
        network_speed=200e9,
        time_in_driver=2
    )

# Register the profile
register_hardware_profile("my_custom_server", my_custom_server)

# Later use it from the library
from daolite import my_custom_server

resource = my_custom_server()
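
Conceptually, a profile registry of this kind maps names to factory functions. The following sketch shows one way such a registry could work. It is a hypothetical illustration, not daolite's internal implementation: `register_profile`, `get_profile`, and `_PROFILES` are invented names, and a plain dict stands in for a ComputeResources instance.

```python
# Hypothetical registry; daolite's internal implementation may differ
_PROFILES = {}


def register_profile(name, factory):
    """Store a zero-argument factory under a unique profile name."""
    if not callable(factory):
        raise TypeError("factory must be callable")
    if name in _PROFILES:
        raise ValueError(f"profile {name!r} is already registered")
    _PROFILES[name] = factory


def get_profile(name):
    """Build a fresh resource description from the registered factory."""
    if name not in _PROFILES:
        raise KeyError(f"unknown hardware profile: {name!r}")
    return _PROFILES[name]()


def my_custom_server():
    # A plain dict stands in for a ComputeResources instance
    return {"hardware": "CPU", "cores": 128, "core_frequency": 3.2e9}


register_profile("my_custom_server", my_custom_server)
a = get_profile("my_custom_server")
b = get_profile("my_custom_server")
print(a["cores"], a is not b)
```

Storing factories rather than instances means every lookup returns a new object, so one pipeline component cannot accidentally modify the resource used by another.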

Implementation Details

The Compute Resources module uses a factory pattern to create resources with consistent configurations:

ComputeResources: A single class that represents both CPU and GPU resources
Pre-defined hardware profiles: Factory functions that each return a configured ComputeResources instance

This design allows for easy extension and customization while maintaining type safety and consistent interfaces.

API Reference

For complete API details, see the Compute Resources API Reference section.