Implementation of a GEMM (without fusions) cost model that accounts for...
Implement a cost model for GEMMs (without fusions) that accounts for compute, HBM memory, and L2 cache overheads. Follow-up work will refine the heuristics and add support for cost modeling GEMM fusions. The implementation is currently tuned for NVIDIA H100 GPUs.
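For illustration only, one way such a model could combine the three overheads is a roofline-style bound: estimate a time for compute, for HBM traffic, and for L2 traffic, then take the slowest. The sketch below is an assumption about the general shape of this kind of model, not the code in this change. `GpuSpec`, `estimate_gemm_seconds`, the tiling parameters, and the hardware figures in the usage lines are all hypothetical; the figures are round placeholders, not measured H100 values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical hardware description; all fields are assumptions."""
    peak_flops: float     # peak math throughput, FLOP/s
    hbm_bandwidth: float  # HBM bandwidth, bytes/s
    l2_bandwidth: float   # L2 cache bandwidth, bytes/s


def estimate_gemm_seconds(m, n, k, tile_m, tile_n, bytes_per_element, gpu):
    """Roofline-style lower bound for C[m, n] = A[m, k] @ B[k, n]."""
    # Compute: one multiply and one add per (m, n, k) triple.
    compute_s = 2.0 * m * n * k / gpu.peak_flops

    # HBM: assume ideal reuse, so A, B and C each cross HBM exactly once.
    hbm_bytes = bytes_per_element * (m * k + k * n + m * n)
    hbm_s = hbm_bytes / gpu.hbm_bandwidth

    # L2: assume every output tile streams its own tile_m x k slice of A
    # and k x tile_n slice of B from L2, with no sharing between tiles.
    num_tiles = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    l2_bytes = bytes_per_element * num_tiles * (tile_m * k + k * tile_n)
    l2_s = l2_bytes / gpu.l2_bandwidth

    # The slowest of the three resources bounds the runtime.
    return max(compute_s, hbm_s, l2_s)


# Round placeholder figures, not measured values for any real GPU.
placeholder_gpu = GpuSpec(peak_flops=1e15, hbm_bandwidth=3e12, l2_bandwidth=1e13)
square = estimate_gemm_seconds(4096, 4096, 4096, 128, 128, 2, placeholder_gpu)
skinny = estimate_gemm_seconds(1, 4096, 4096, 128, 128, 2, placeholder_gpu)
```

With these placeholder figures the square GEMM is bounded by the L2 term and the skinny one by the HBM term, which is the kind of regime change a compute-only model would miss.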
PiperOrigin-RevId: 748041075