GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures

In proceedings of The 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS '06), pages 45, Apr. 2006
Präsentiert: The 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS '06)
 

Abstract

In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n) / p).

While this makes our approach competitive with common sequential sorting algorithms not only from a theoretical viewpoint, it is also very fast from a practical viewpoint. This is achieved by using efficient linear stream memory accesses (and by combining the optimal time approach with algorithms optimized for small input sequences).

We present an implementation on modern programmable graphics hardware (GPUs). On recent GPUs, our optimal parallel sorting approach has shown to be remarkably faster than sequential sorting on the CPU, and it is also faster than previous non-optimal sorting approaches on the GPU for sufficiently large input sequences. Because of the excellent scalability of our algorithm with the number of stream processor units p (up to n / log2 n or even n / log n units, depending on the stream architecture), our approach profits heavily from the trend of increasing number of fragment processor units on GPUs, so that we can expect further speed improvement with upcoming GPU generations.

Hinweis: Eine erweiterte Fassung dieses Papers ist ebenfalls verfügbar.

Bilder

Paper herunterladen

Paper herunterladen

Zusätzliches Material

Bibtex

@INPROCEEDINGS{gress-2006-gpu-abisort,
      author = {Gre{\ss}, Alexander and Zachmann, Gabriel},
       pages = {45},
       title = {GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures},
   booktitle = {The 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS '06)},
        year = {2006},
       month = apr,
    abstract = {In this paper, we present a novel approach for parallel sorting on stream processing architectures.
                It is based on adaptive bitonic sorting. For sorting $n$ values utilizing $p$ stream processor
                units, this approach achieves the optimal time complexity $O((n log n) / p)$.
                
                While this makes our approach competitive with common sequential sorting algorithms not only from a
                theoretical viewpoint, it is also very fast from a practical viewpoint. This is achieved by using
                efficient linear stream memory accesses (and by combining the optimal time approach with algorithms
                optimized for small input sequences).
                
                We present an implementation on modern programmable graphics hardware (GPUs). On recent GPUs, our
                optimal parallel sorting approach has shown to be remarkably faster than sequential sorting on the
                CPU, and it is also faster than previous non-optimal sorting approaches on the GPU for sufficiently
                large input sequences. Because of the excellent scalability of our algorithm with the number of
                stream processor units $p$ (up to $n / log^2 n$ or even $n / log n$ units, depending on the stream
                architecture), our approach profits heavily from the trend of increasing number of fragment
                processor units on GPUs, so that we can expect further speed improvement with upcoming GPU
                generations.},
  conference = {The 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS '06)}
}