    Add strides to all input tensors (#8) · 9044fe5e
    Xinya Zhang authored
    The flash attention kernel now accepts input tensors with arbitrary strides. The only remaining restriction is that the last dimension must be contiguous (stride 1), which the CUTLASS implementation also requires.
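To make the layout restriction concrete, here is a minimal sketch in plain Python of what "last dimension contiguous" means in terms of strides. The helper names (`row_major_strides`, `permute`, `last_dim_is_contiguous`) and the example shapes are hypothetical illustrations, not part of this repository's API. Strides are counted in elements, not bytes.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def permute(values, order):
    """Reorder a shape or stride tuple, as a dimension permutation would."""
    return tuple(values[i] for i in order)


def last_dim_is_contiguous(strides):
    """The restriction described above: the innermost stride must be 1."""
    return strides[-1] == 1


# Hypothetical example: data stored as (batch, seq, heads, head_dim) ...
stored_shape = (2, 128, 4, 64)
stored_strides = row_major_strides(stored_shape)

# ... but viewed as (batch, heads, seq, head_dim) without copying.
view_strides = permute(stored_strides, (0, 2, 1, 3))
print(view_strides, last_dim_is_contiguous(view_strides))

# Swapping the last two dimensions breaks the restriction.
bad_strides = permute(stored_strides, (0, 1, 3, 2))
print(bad_strides, last_dim_is_contiguous(bad_strides))
```

The first view is non-contiguous overall but keeps a unit stride along `head_dim`, so it satisfies the restriction. The second view has a non-unit innermost stride and would need a copy into a suitable layout first.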
This project is licensed under the MIT License.