Stack
Technologies I use to build GPU-accelerated AI systems.
GPU Computing & HPC
High-performance computing and GPU-accelerated systems
CUDA
Custom kernel development and optimization
A100 / H100
NVIDIA data center GPUs (Ampere and Hopper architectures)
NCCL
Collective communication across GPUs and nodes
TensorRT
GPU inference optimization
Triton
Python DSL for writing custom GPU kernels (kernel sketch below)
cuDNN
Deep learning GPU primitives
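As a small illustration of the custom-kernel work above, here is a minimal vector-add kernel in Triton's Python DSL. The kernel name, wrapper, block size, and tensor shapes are illustrative assumptions, not taken from any production kernel.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    # Launch one program per BLOCK_SIZE chunk of the input.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

if __name__ == "__main__":
    a = torch.rand(4096, device="cuda")
    b = torch.rand(4096, device="cuda")
    assert torch.allclose(add(a, b), a + b)
```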
AI Infrastructure
Distributed AI systems and machine learning platforms
PyTorch
Distributed training and inference
Transformers
Large language model architectures
Distributed Training
Multi-node, multi-GPU scaling (training-loop sketch below)
Model Optimization
Quantization, pruning, distillation
MLOps
Production ML pipeline management
Inference Serving
High-throughput model serving
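A minimal sketch of the distributed-training pattern above: PyTorch DistributedDataParallel over the NCCL backend, launched with torchrun. The toy model, batch size, and learning rate are placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(1024, 1024).cuda(local_rank),
                device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).pow(2).mean()      # dummy objective
        opt.zero_grad()
        loss.backward()                    # gradients all-reduced via NCCL
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched as, for example, torchrun --nproc_per_node=8 train.py on each node; DDP synchronizes gradients across ranks during backward().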
Systems Programming
Low-level systems and performance optimization
Rust
Systems programming and cryptography
Python
AI/ML development and automation
C++
Performance-critical applications
SPDM
Security Protocol and Data Model for hardware attestation
Cryptography
ECDSA signatures and secure protocols (sketch below)
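A minimal ECDSA sign/verify round trip with Python's cryptography package, as a sketch of the signature work listed above; the P-256 curve and the message bytes are illustrative choices, not a description of any specific attestation flow.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Generate an illustrative P-256 key pair (e.g. a device identity key).
private_key = ec.generate_private_key(ec.SECP256R1())
public_key = private_key.public_key()

message = b"example measurement blob"    # placeholder payload

# Sign with ECDSA over SHA-256.
signature = private_key.sign(message, ec.ECDSA(hashes.SHA256()))

# Verify; raises InvalidSignature if the signature or message was altered.
try:
    public_key.verify(signature, message, ec.ECDSA(hashes.SHA256()))
    print("signature valid")
except InvalidSignature:
    print("signature invalid")
```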
Cloud & Infrastructure
Enterprise-scale deployment and orchestration
Kubernetes
GPU workload orchestration (pod sketch below)
Docker
Containerized GPU applications
AWS
EC2 P4/P5 GPU instances
Slurm
HPC cluster management
Prometheus
GPU metrics and monitoring
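A sketch of GPU workload orchestration with the official Kubernetes Python client: a single pod requesting one GPU through the nvidia.com/gpu resource exposed by the NVIDIA device plugin. The pod name, container image, and command are placeholders.

```python
from kubernetes import client, config

def gpu_pod(name: str = "cuda-job",
            image: str = "nvcr.io/nvidia/pytorch:24.01-py3") -> client.V1Pod:
    container = client.V1Container(
        name=name,
        image=image,
        command=["python", "train.py"],          # placeholder entrypoint
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"}        # one GPU via the device plugin
        ),
    )
    return client.V1Pod(
        api_version="v1",
        kind="Pod",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PodSpec(restart_policy="Never", containers=[container]),
    )

if __name__ == "__main__":
    config.load_kube_config()                     # local kubeconfig credentials
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=gpu_pod())
```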