Compute Resources

Overview

The Compute Resources module in daolite provides detailed models of various computing hardware to accurately estimate the performance of AO pipeline components. This module allows users to:

  1. Define custom hardware specifications

  2. Use pre-defined hardware profiles

  3. Compare performance across different systems

  4. Model heterogeneous computing (CPU + GPU)

Hardware Model Components

CPU Resources

CPU resources are modeled with the following parameters:

  • Number of cores: Physical CPU cores available

  • Core frequency: Clock speed of CPU cores in Hz

  • FLOPS per cycle: Floating point operations per clock cycle (vectorization capability)

  • Memory channels: Number of memory channels

  • Memory width: Width of each memory channel in bits

  • Memory frequency: Memory clock frequency in Hz

  • Network speed: Network interface speed in bits/second

  • Driver time: Overhead for kernel/driver interactions

# Example: Creating custom CPU resources
from daolite.compute import ComputeResources

# Define an AMD EPYC server
cpu = ComputeResources(
    hardware="CPU",
    cores=64,
    core_frequency=2.45e9,  # 2.45 GHz
    flops_per_cycle=32,    # AVX-512 support
    memory_channels=8,
    memory_width=64,       # 64 bits per channel
    memory_frequency=3200e6,  # 3200 MHz
    network_speed=100e9,   # 100 Gbps
    time_in_driver=5       # 5 μs driver overhead
)

GPU Resources

GPU resources are modeled with a simplified approach focusing on the key performance parameters:

  • Hardware type: Indicator that this is a GPU resource

  • Memory bandwidth: Peak memory bandwidth in bytes/second

  • FLOPS: Peak floating point operations per second

  • Network speed: PCIe or network interface speed

  • Driver time: GPU driver and kernel launch overhead

# Example: Creating custom GPU resources
from daolite.compute import ComputeResources

# Define an NVIDIA A100 GPU
gpu = ComputeResources(
    hardware="GPU",
    memory_bandwidth=1.6e12,     # 1.6 TB/s (HBM2e)
    flops=1.9e13,                # 19.5 TFLOPS (FP32)
    network_speed=25e9,          # PCIe 4.0 x16
    time_in_driver=20            # 20 μs driver overhead
)

Pre-defined Hardware Library

daolite includes a comprehensive library of pre-defined hardware profiles for common CPUs and GPUs:

CPU Profiles

from daolite.compute import hardware

# AMD CPUs
cpu1 = hardware.amd_epyc_7763()    # AMD EPYC 7763 (Milan)
cpu2 = hardware.amd_epyc_9654()    # AMD EPYC 9654 (Genoa)
cpu3 = hardware.amd_ryzen_7950x()  # AMD Ryzen 9 7950X

# Intel CPUs
cpu4 = hardware.intel_xeon_8480()  # Intel Xeon Platinum 8480+
cpu5 = hardware.intel_xeon_8462()  # Intel Xeon 8462Y+

# Use a pre-defined CPU in a pipeline
pipeline.add_component(PipelineComponent(
    name="Centroider",
    compute=hardware.amd_epyc_7763(),  # Easy to use!
    ...
))

GPU Profiles

from daolite.compute import hardware

# NVIDIA GPUs
gpu1 = hardware.nvidia_a100_80gb()  # NVIDIA A100 80GB
gpu2 = hardware.nvidia_h100_80gb()  # NVIDIA H100 80GB
gpu3 = hardware.nvidia_rtx_4090()   # NVIDIA RTX 4090

# AMD GPUs
gpu4 = hardware.amd_mi300x()        # AMD Instinct MI300X

# Use a pre-defined GPU in a pipeline
pipeline.add_component(PipelineComponent(
    name="Reconstructor",
    compute=hardware.nvidia_rtx_4090(),
    ...
))

Memory Model

The memory model in daolite calculates effective bandwidth based on:

  1. Theoretical peak bandwidth: Base calculation from hardware specs

    For CPUs: memory_channels * memory_width * memory_frequency / 8

    For GPUs: Directly specified memory bandwidth

  2. Access pattern efficiency: Real-world memory access patterns rarely achieve theoretical peak

    • Sequential access: ~80-95% efficiency

    • Strided access: ~40-60% efficiency

    • Random access: ~10-30% efficiency

  3. Cache effects: Optional modeling of cache benefits

Computation Model

The computation model estimates processing time based on:

  1. Theoretical peak FLOPS: Maximum floating point operations per second

    For CPUs: cores * core_frequency * flops_per_cycle

    For GPUs: Directly specified FLOPS

  2. Algorithm efficiency: Real-world efficiency compared to theoretical peak

    • Memory-bound algorithms: Typically limited by memory bandwidth

    • Compute-bound algorithms: Limited by computational throughput

    • Each algorithm has a scaling factor to account for implementation efficiency

Multiple Resource Types

daolite supports defining multiple resource types for different components:

# Define CPU for camera readout and DM control
cpu_resource = amd_epyc_7763()

# Define GPU for centroiding and reconstruction
gpu_resource = nvidia_rtx_4090()

# Use in pipeline components
pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CAMERA,
    name="Camera",
    compute=cpu_resource,  # CPU for camera
    # ...other parameters...
))

pipeline.add_component(PipelineComponent(
    component_type=ComponentType.CENTROIDER,
    name="Centroider",
    compute=gpu_resource,  # GPU for centroiding
    # ...other parameters...
))

Adding Custom Hardware Profiles

Users can extend the hardware library with custom profiles:

from daolite.compute import ComputeResources
from daolite.compute.resources import register_hardware_profile

# Create a custom hardware profile factory function
def my_custom_server():
    return ComputeResources(
        hardware="CPU",
        cores=128,
        core_frequency=3.2e9,
        flops_per_cycle=64,
        memory_channels=16,
        memory_width=64,
        memory_frequency=3600e6,
        network_speed=200e9,
        time_in_driver=2
    )

# Register the profile
register_hardware_profile("my_custom_server", my_custom_server)

# Later use it from the library
from daolite import my_custom_server

resource = my_custom_server()

Implementation Details

The Compute Resources module uses a factory pattern to create resources with consistent configurations:

  • ComputeResources: Class for both CPU and GPU resources

  • Pre-defined hardware profiles are factory functions

This design allows for easy extension and customization while maintaining type safety and consistent interfaces.

API Reference

For complete API details, see the Compute Resources API Reference section.