https://hgpu.org/?p=15628
An Efficient Implementation of the Longest Common Subsequence Algorithm with Bit-Parallelism on GPUs