Stack

Technologies I use to build GPU-accelerated AI systems.

GPU Computing & HPC

High-Performance Computing and GPU-Accelerated Systems

  • CUDA
    Expert

    Custom kernel development and optimization

  • A100 / H100
    Expert

    NVIDIA data center GPUs (Ampere and Hopper architectures)

  • NCCL
    Expert

    Multi-GPU communication optimization

  • TensorRT
    Advanced

    GPU inference optimization

  • Triton
    Advanced

    Python-based language and compiler for GPU kernels

  • cuDNN
    Advanced

    Deep learning GPU primitives

AI Infrastructure

Distributed AI Systems and Machine Learning Platforms

  • PyTorch
    Expert

    Distributed training and inference

  • Transformers
    Expert

    Large language model architectures

  • Distributed Training
    Expert

    Multi-node, multi-GPU scaling

  • Model Optimization
    Expert

    Quantization, pruning, distillation

  • MLOps
    Advanced

    Production ML pipeline management

  • Inference Serving
    Advanced

    High-throughput model serving

Systems Programming

Low-Level Systems and Performance Optimization

  • Rust
    Expert

    Systems programming and cryptography

  • Python
    Expert

    AI/ML development and automation

  • C++
    Advanced

    Performance-critical applications

  • SPDM
    Advanced

    Hardware attestation via the DMTF Security Protocol and Data Model

  • Cryptography
    Advanced

    ECDSA, secure protocols

Cloud & Infrastructure

Enterprise-Scale Deployment and Orchestration

  • Kubernetes
    Expert

    GPU workload orchestration

  • Docker
    Expert

    Containerized GPU applications

  • AWS
    Advanced

    EC2 P4/P5 GPU instances

  • Slurm
    Advanced

    HPC cluster management

  • Prometheus
    Advanced

    GPU metrics and monitoring