Krzysztof Drewniak authored
- Add --host-pipeline=runner to rocmlir-driver, which does the [gpu and a bunch of stdlib dialects]->llvm steps to make host code and which compiles gpu kernels to HSA code objects
- Make -c mean "-kernel-pipeline full -host-pipeline runner" instead of "-kernel-pipeline gpu"
- This means that (with the exception of xmir-runner, which needed to do some runtime stuff to pick the target), we no longer have "runners" that do a bunch of non-trivial work on their input. An advantage of this is that benchmarking now doesn't include the time it takes to compile a GPU kernel (which it might have before)
- Adjust the xmir runner pipeline to include the arithmetic ops expansion pass that was needed for host code validation, and rearrange it somewhat to unify it with the old mlir-rocm-runner
- Delete mlir-rocm-runner
- Adjust tests, mainly:
  - Switch all instances of mlir-rocm-runner to mlir-cpu-runner -O2
  - Change the single-target verifier=clone tests to use the runner pipeline because they don't need the xmir functionality
  - Change the convolution harness tests to expect -c to produce a binary blob and not a gpu.module
  - Change various CPU e2e tests to include the relevant --convert-*-to-llvm invocations or to go through the runner pipeline
- Change the CI perf/parameter sweep scripts accordingly
- Change the rocm-run widget to pick up mlir-cpu-runner

568aa0d7
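
As a rough illustration of the new meaning of -c, the sketch below builds (but does not run) the command line that a test would now pipe together. The helper names expand_c_flag and runner_pipeline and the input file name are hypothetical and not part of rocMLIR. Only the flag spellings are taken from the message above. A real mlir-cpu-runner invocation would most likely need further options (for example, runtime libraries and an entry point) that this message does not list.

```python
import shlex

# Expansion of -c as described in the commit message above.
C_EXPANSION = ["-kernel-pipeline", "full", "-host-pipeline", "runner"]


def expand_c_flag(argv):
    """Replace each bare -c with its expansion (hypothetical helper)."""
    out = []
    for arg in argv:
        if arg == "-c":
            out.extend(C_EXPANSION)
        else:
            out.append(arg)
    return out


def runner_pipeline(input_file):
    """Return the two-stage shell pipeline as a string, without executing it.

    Stage 1: rocmlir-driver lowers kernels and host code.
    Stage 2: mlir-cpu-runner -O2 executes the resulting host code.
    Extra runner options are deliberately omitted (assumption: they exist).
    """
    driver = ["rocmlir-driver"] + expand_c_flag(["-c", input_file])
    runner = ["mlir-cpu-runner", "-O2"]
    return " | ".join(shlex.join(cmd) for cmd in (driver, runner))


if __name__ == "__main__":
    # "conv.mlir" is a placeholder file name.
    print(runner_pipeline("conv.mlir"))
```

The point of the sketch is that all GPU kernel compilation now happens in the rocmlir-driver stage, so the runner stage only executes already-lowered host code.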