PARALUTION is a library for sparse iterative methods that can run on various parallel devices, including multi-core CPUs and GPUs. The new 0.4.0 version also provides a backend for the Xeon Phi (MIC). Alongside this release, performance benchmarks of vector-vector routines, sparse matrix-vector multiplication (SpMV) and the conjugate gradient (CG) method have been published for different backends (OpenMP, CUDA and OpenCL) and devices (NVIDIA GPUs, AMD GPUs, CPUs and Xeon Phi). More information: http://www.paralution.com/benchmarks/
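The three benchmark categories are closely related, because an unpreconditioned CG solver is built almost entirely from vector-vector routines (dot products and axpy updates) plus one SpMV per iteration. The sketch below illustrates that structure with a matrix stored in CSR format. It is a minimal pure-Python illustration, not PARALUTION's API, and the function names (`csr_matvec`, `dot`, `axpy`, `cg`) are hypothetical.

```python
import math

def csr_matvec(indptr, indices, data, x):
    """SpMV kernel: y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y

def dot(a, b):
    """Vector-vector kernel: inner product of a and b."""
    return sum(ai * bi for ai, bi in zip(a, b))

def axpy(alpha, x, y):
    """Vector-vector kernel: return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def cg(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)          # residual r = b - A x, with initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = csr_matvec(indptr, indices, data, p)  # the one SpMV per iteration
        alpha = rs_old / dot(p, ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, ap, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs_old, p, r)            # p = r + beta * p
        rs_old = rs_new
    return x

# Example: 4x4 tridiagonal matrix [-1, 2, -1] (1D Laplacian) in CSR form.
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
solution = cg(indptr, indices, data, [1.0, 0.0, 0.0, 1.0])
```

For this right-hand side the exact solution is the all-ones vector. On a parallel backend, each of the three kernels is the part that gets mapped to the device, which is why SpMV and vector-vector throughput largely determine CG performance.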
This report describes the advantages of using GPUs for analytical queries. It compares the performance of the Alenka database engine running on a GPU with that of Oracle on a SPARC server. More information on Alenka, including source code: https://github.com/antonmks/Alenka
The SpeedIT team recently benchmarked the SpMV performance of CUSPARSE 4.0, CUSP 0.2.0 and SpeedIT 2.0 on 23 randomly chosen matrices from the University of Florida Sparse Matrix Collection. The comparisons were run on a Tesla C2050 in both single and double precision. The full report is available at http://wp.me/p1ZihD-1.
The Scalable Heterogeneous Computing Benchmark Suite (SHOC) is a collection of benchmark programs testing the performance and stability of systems using computing devices with non-traditional architectures for general-purpose computing, and the software used to program them. Its initial focus is on systems containing Graphics Processing Units (GPUs) and multi-core processors, and on the OpenCL programming standard. It can be used on clusters as well as individual hosts.
(Danalis, A., Marin, G., McCurdy, C., Meredith, J., Roth, P., Spafford, K., Tipparaju, V., Vetter, J. (2010). The Scalable HeterOgeneous Computing (SHOC) Benchmark Suite. Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processors (GPGPU 2010), Mar 2010.)
This white paper from RapidMind and HP compares the performance of BLAS dense linear algebra operations, the FFT, and European option pricing on the GPU against highly tuned CPU implementations on the fastest available CPUs. All of the GPU implementations were made using the RapidMind Development Platform, which allows the use of standard C++ programming to create high-performance parallel applications that run on the GPU. The full source for the samples is available in conjunction with a new beta version of the RapidMind development platform. The results will also be presented as a poster at SC06.
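The white paper summary does not say which pricing model was used. European option pricing is often benchmarked with the closed-form Black-Scholes formula, which is a good fit for GPUs because every option is priced independently. The following is a minimal sketch under that assumption, for a non-dividend-paying asset. The function names are hypothetical and this is not the RapidMind implementation.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(spot, strike, rate, vol, t):
    """Return (call, put) prices of a European option (assumed Black-Scholes model, no dividends)."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * t)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put

def price_batch(options):
    """Price many options; each tuple is an independent, data-parallel work item."""
    return [black_scholes(*opt) for opt in options]

# Example batch: (spot, strike, risk-free rate, volatility, years to expiry).
prices = price_batch([(100.0, 100.0, 0.05, 0.2, 1.0), (100.0, 110.0, 0.05, 0.25, 0.5)])
```

Because no option depends on any other, a GPU version would typically assign one option per thread. The loop in `price_batch` stands in for that parallel map.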