You are here: Home » Research
July 14th, 2008
Abstract:
In a previous publication, we have examined the fundamental difference between computational precision and result accuracy in the context of the iterative solution of linear systems as they typically arise in the Finite Element discretization of Partial Differential Equations (PDEs). In particular, we evaluated mixed- and emulated-precision schemes on commodity graphics processors (GPUs), which at that time only supported computations in single precision. With the advent of graphics cards that natively provide double precision, this report updates our previous results.
We demonstrate that with new co-processor hardware supporting native double precision, such as NVIDIA’s G200 and T10 architectures, the situation does not change qualitatively for PDEs, and the previously introduced mixed precision schemes are still preferable to double precision alone. But the schemes achieve significant quantitative performance improvements with the more powerful hardware. In particular, we demonstrate that a Multigrid scheme can accurately solve a common test problem in Finite Element settings with one million unknowns in less than 0.1 seconds, which is truely outstanding performance. We support these conclusions by exploring the algorithmic design space enlarged by the availability of double precision directly in the hardware.
(Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (Part 2: Double Precision GPUs). Dominik Göddeke and Robert Strzodka. Technical Report, 2008.)
Posted in Research | Tags: Finite Element Methods, mixed-precision arithmetic, Papers | Write a comment
June 6th, 2008
FEAST is a hardware-oriented MPI-based Finite Element solver toolkit. With the extension FEASTGPU the authors have previously demonstrated that significant speed-ups in the solution of the scalar Poisson problem can be achieved by the addition of GPUs as scientific co-processors to a commodity based cluster. In this paper the authors put the more general claim to the test: Applications based on FEAST, that ran only on CPUs so far, can be successfully accelerated on a co-processor enhanced cluster without any code modifications. The chosen solid mechanics code has higher accuracy requirements and a more diverse CPU/co-processor interaction than the Poisson example, and is thus better suited to assess the practicability of the acceleration approach. The paper presents accuracy experiments, a scalability test and acceleration results for different elastic objects under load. In particular, it demonstrates in detail that the single precision execution of the co-processor does not affect the final accuracy. The paper establishes how the local acceleration gains of factors 5.5 to 9.0 translate into 1.6- to 2.6-fold total speed-up. Subsequent analysis reveals which measures will increase these factors further. (Dominik Göddeke, Hilmar Wobker, Robert Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, Stefan Turek. Co-Processor Acceleration of an Unmodified Parallel Solid Mechanics Code with FEASTGPU. International Journal of Computational Science and Engineering (to appear).)
Posted in Research | Tags: Finite Element Methods, mixed-precision arithmetic, Papers | Write a comment
May 25th, 2008
This paper by Lieberman et al. at the University of Maryland describes an application of GPU processing to the similarity join, a common operation in spatial databases. A similarity join takes two sets of points A, B and returns pairs p ∈ A, q ∈ B where the distance D(p,q) ≤ ε. The similarity join is a common spatial database operation with many applications. An algorithm named LSS is presented that executes on a GPU, taking advantage of the GPU’s parallelism and large data throughput. To achieve peak efficiency, LSS relies only on simple primitive operations that execute quickly on the GPU, such as the sorting and searching of arrays. It recasts the similarity join as a sort-and-search problem by mapping its input datasets onto a set of space-filling curves, generated by a parallel sort routine on the GPU. It then searches small intervals of these curves that are guaranteed to contain all pairs of the correct result. LSS offers a balance between time and work efficiencies and is shown to perform well when compared against existing prominent high-dimensional similarity join methods. (M. D. Lieberman, J. Sankaranarayanan, and H. Samet. A fast similarity join algorithm using graphics processing units. In Proceedings of the 24th IEEE International Conference on Data Engineering, pages 1111-1120, Cancun, Mexico, April 2008.)
Posted in Research | Tags: Databases, Papers, Spatial Databases | Write a comment
May 25th, 2008
This paper by Cabido et al. presents a real-time object tracking algorithm, based on the hybridization of particle filtering (PF) and a multi-scale local search (MSLS) algorithm, for both CPU and GPU architectures. The developed system provides successful results in precise tracking of single and multiple targets in monocular video, operating in real-time at 70 frames per second for 640 × 480 video resolutions on the GPU, up to 1100% faster than the CPU version of the algorithm. (Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs. Raul Cabido, Antonio S. Montemayor, Juan Jose Pantrigo, and Bryson R. Payne. Machine Vision and Applications, Springer, 2008.)
Posted in Research | Tags: Computer Vision, Papers | Write a comment
May 25th, 2008
The advent of systems biology requires the simulation of ever-larger biomolecular systems, demanding a commensurate growth in computational power. This paper examines the use of the NVIDIA Tesla C870 graphics card programmed through the CUDA toolkit to accelerate the calculation of cutoff pair potentials, one of the most prevalent computations required by many different molecular modeling applications. The paper presents algorithms to calculate electrostatic potential maps for cutoff pair potentials. Whereas a straightforward approach for decomposing atom data leads to low computational efficiency, a new strategy enables fine-grained spatial decomposition of atom data that maps efficiently to the C870′s memory system while increasing work efficiency of atom data traversal by a factor of 5. The memory addressing flexibility exposed through CUDA’s SPMD programming model is crucial in enabling this new strategy. An implementation of the new algorithm provides a greater than threefold performance improvement over our previously published implementation and runs 12 to 20 times faster than optimized CPU-only code. The lessons learned are generally applicable to algorithms accelerated by uniform grid spatial decomposition. (C. I. Rodrigues, D. J. Hardy, J. E. Stone, K. Schulten, W. W. Hwu., GPU acceleration of cutoff pair potentials for molecular modeling applications. Proceedings of the 2008 Conference On Computing Frontiers, pp.273-282, 2008.) (http://www.ks.uiuc.edu/Research/gpu/)
Posted in Research | Tags: Molecular Dynamics, Papers, Scientific Computing | Write a comment
May 25th, 2008
Abstract: “The graphics processing unit (GPU) has become an integral part of today’s mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory andwidth that substantially outpaces its CPU counterpart. The GPU’s rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications. (J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, J. C. Phillips, “GPU Computing”, Proceedings of the IEEE, vol.96, no.5, pp.879-899, May 2008)
Posted in Research | Tags: Papers, Surveys | Write a comment
May 25th, 2008
A graph is an ordered pair G=(V,E) where V is a set of nodes and E is a set of edges connecting nodes. Graph drawing addresses the problem of creating geometric representations of graphs. Unlike matrices or images, graphs are unstructured and hence graph layout may not seem to be suitable for acceleration on the GPU. These papers present two GPU-accelerated graph drawing algorithms which are able to quickly compute aesthetic layouts of large graphs. One is for the layout of a single graph and the other is for computing stable layouts of a sequence of graphs. Speedups of 5.5x to 17x relative to a CPU implementation are demonstrated. (Yaniv Frishman and Ayellet Tal, Multi-Level Graph Layout on the GPU, IEEE Transactions on Visualization and Computer Graphics (Proceedings Information Visualization 2007), 13(6):1310-1317, 2007)
(Yaniv Frishman and Ayellet Tal, Online Dynamic Graph Drawing, accepted to IEEE Transactions on Visualization and Computer Graphics)
Posted in Research | Tags: Graph Algorithms, Papers | Write a comment
April 2nd, 2008
The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two biological sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the length of two sequences. This paper by Svetlin Manavski and Giorgio Valle describes SmithWaterman-CUDA, an open-source project to perform fast sequence alignment on the GPU. Although the software performs the optimal Smith-Waterman alignment it is faster than heuristics approaches like FASTA and BLAST. The tests on protein data banks show up to 30x speed up related to reference CPU implementations. (Svetlin A. Manavski, Giorgio Valle, CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics 2008, 9(Suppl 2):S10 (26 March 2008))
Posted in Research | Tags: Bioinformatics, Papers | Write a comment
April 2nd, 2008
Abstract: “We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). Taking advantage of GPU features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts. (Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. Relational Joins on Graphics Processors. ACM SIGMOD 2008.)
Posted in Research | Tags: Databases, Papers | Write a comment
April 2nd, 2008
Abstract: Mackey-Glass chaotic time series prediction and nuclear protein classification show the feasibility of evaluating genetic programming populations directly on parallel consumer gaming graphics processing units. Using a Linux KDE computer equipped with an NVIDIA GeForce 8800 GTX graphics processing unit card the C++ SPMD interpretter evolves programs at Giga GP operations per second (895 million GPops). We use the RapidMind general processing on GPU (GPGPU) framework to evaluate an entire population of a quarter of a million individual programs on a non-trivial problem in 4 seconds. An efficient reverse polish notation (RPN) tree based GP is given. (A SIMD interpreter for Genetic Programming on GPU Graphics Cards. W.B. Langdon and W. Banzhaf. In M. Neill, L. Vanneschi, A.I. Esparcia Alcazar, S. Gustafson eds., EuroGP 2008, pp73-85. Springer, LNCS 4971, 26-28 March, Naples.)
Posted in Research | Tags: Genetic Algorithms, Papers | Write a comment