August 2025
GPU Optimization

Optimizing FP4 Mixed-Precision Inference on AMD GPUs

Learn how we developed Petit, a collection of optimized FP16/BF16 x FP4 mixed-precision GPU kernels for AMD GPUs, achieving 1.74x faster inference and up to 3.7x performance improvements.