Graphics processing units (GPUs) have become an attractive option for accelerating scientific computations as a result of advances in the performance and flexibility of GPU hardware, and due to the availability of GPU software development tools targeting general purpose and scientific computation. However, effective use of GPUs in clusters presents a number of application development and system integration challenges. We describe strategies for the decomposition and scheduling of computation among CPU cores and GPUs, and techniques for overlapping communication and CPU computation with GPU kernel execution. We report the adaptation of these techniques to NAMD, a widely-used parallel molecular dynamics simulation package, and present performance results for a 64-core 64-GPU cluster. (Adapting a message-driven parallel application to GPU-accelerated clusters. James C. Phillips, John E. Stone, and Klaus Schulten. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing)
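NAMD itself is built on the message-driven Charm++ runtime and CUDA, so the following is not NAMD code. It is a minimal Python sketch of the general overlap pattern the abstract describes: queue GPU work asynchronously, keep the CPU busy with its own share, and block on GPU results only at the end. The functions `gpu_nonbonded_kernel` and `cpu_bonded_work` and the bonded/nonbonded split are hypothetical stand-ins, and a single-worker thread pool plays the role of one GPU executing queued kernels.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def gpu_nonbonded_kernel(patch):
    """Hypothetical stand-in for an asynchronous GPU kernel."""
    time.sleep(0.01)  # pretend the device is busy
    return sum(x * x for x in patch)


def cpu_bonded_work(patch):
    """Hypothetical stand-in for work that stays on the CPU."""
    return sum(patch)


def timestep(patches):
    # One worker mimics a single GPU running queued kernels in order.
    with ThreadPoolExecutor(max_workers=1) as gpu:
        # 1. Launch all GPU work up front without waiting on it.
        pending = [gpu.submit(gpu_nonbonded_kernel, p) for p in patches]
        # 2. Overlap: the CPU computes its share while the "GPU" runs.
        bonded = [cpu_bonded_work(p) for p in patches]
        # 3. Only now block on GPU results and combine them per patch.
        nonbonded = [f.result() for f in pending]
    return [b + n for b, n in zip(bonded, nonbonded)]


print(timestep([[1, 2], [3]]))  # [8, 12]
```

The key design point is ordering: all asynchronous launches happen before any CPU work, and the first blocking call comes last, so the two kinds of work can proceed concurrently.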
Adapting a Message-Driven Parallel Application to GPU-Accelerated Clusters
November 18th, 2008
First GPUCamp to be held in Paris
November 18th, 2008
The first GPUCamp will be held in Paris on December 6th at the well-known La Cantine. This BarCamp aims to bring together the French GPGPU community and to build strong social and technical networks in France around this promising technology.
CAL.NET 1.0 and CUDA.NET 2.0.3 Released
November 18th, 2008
CAL.NET is an effort to create a library that allows existing .NET applications to access ATI/AMD GPU hardware for computational and graphical purposes. Programmers can manage the GPU hardware and execute kernels on it transparently. It is currently supported on Windows and Linux platforms with the latest drivers.
The latest release of CUDA.NET, 2.0.3, addresses issues in the previous release and adds many features, including CUDA runtime API support and Direct3D/OpenGL interoperability. It is now possible to create hybrid applications with Tao and SlimDX, and an issue with copying vector data from device memory on Windows has been fixed.
Floating Textures Source Code available at SourceForge
November 18th, 2008
Source code for the Floating Textures algorithm presented at the Eurographics 2008 conference is now available at SourceForge. Floating Textures (a paper and video are also available) are a novel multi-view, projective texture mapping technique. While many previous multi-view texturing approaches produce blurring and ghosting artifacts if the 3D geometry and/or camera calibration are imprecise, Floating Textures warp (“float”) projected textures at run time to preserve crisp, detailed texture appearance. The GPU implementation achieves interactive to real-time frame rates. The method is broadly applicable and can be combined with many image-based rendering methods or projective texturing applications. By using Floating Textures in conjunction with, e.g., visual hull rendering, light field rendering, or free-viewpoint video, improved rendering results can be obtained from fewer input images, less accurately calibrated cameras, and coarser 3D geometry proxies.
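Floating Textures builds on conventional projective texturing, in which a surface point is projected into each input camera to look up a color, and the per-view colors are then blended. The sketch below shows only that baseline step, assuming 3×4 pinhole camera matrices and a caller-supplied weight per view; the run-time warp that gives Floating Textures its name is not implemented here, and `project` and `blend_views` are hypothetical names.

```python
def project(P, X):
    """Project 3D point X = (x, y, z) through a 3x4 camera matrix P to pixel (u, v)."""
    x, y, z = X
    h = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    if h[2] <= 0:
        raise ValueError("point is behind the camera")
    return h[0] / h[2], h[1] / h[2]


def blend_views(X, views):
    """Weighted average of the colors each input view sees at point X.

    views: list of (P, lookup, weight); lookup(u, v) returns an (r, g, b) tuple.
    """
    total_w = 0.0
    acc = [0.0, 0.0, 0.0]
    for P, lookup, w in views:
        u, v = project(P, X)          # where X lands in this camera's image
        color = lookup(u, v)          # texture fetch from that image
        for c in range(3):
            acc[c] += w * color[c]
        total_w += w
    if total_w == 0:
        raise ValueError("all view weights are zero")
    return tuple(a / total_w for a in acc)


# Hypothetical camera: focal length 100, principal point (50, 50), at the origin.
P = [[100, 0, 50, 0], [0, 100, 50, 0], [0, 0, 1, 0]]
print(project(P, (0.1, 0.2, 1.0)))  # (60.0, 70.0)
```

When geometry or calibration is off, different views return colors from slightly different surface locations, and this plain blend is exactly where the blurring and ghosting described above come from.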
CIGPU 2009 Montreal 8-18 July 2009
November 18th, 2008
Speeding Up Molecular Docking Calculations Using Consumer Graphics Hardware
November 18th, 2008
The computer-aided prediction of protein-ligand complex conformations, i.e. docking a small ligand into the active site of a protein, is an important application in the early stages of the modern drug discovery process. For this problem, a new approach called PLANTS (Protein-Ligand ANT System), based on Ant Colony Optimization (ACO), is presented. Part of the work deals with accelerating this approach by moving its most time-consuming steps, the transformation of the protein and ligand structures and the evaluation of the objective function, to the GPU. The combined CPU-GPU approach achieves an average speedup of 5 when comparing an optimized CPU version (single core of a dual-core Pentium 4, 3 GHz) with the GPU-accelerated version (NVIDIA GeForce 8800 GTX). Virtual screening applications in particular, where the complex conformations of thousands to millions of ligands need to be predicted, can benefit from this speedup.
(Efficient Ant Colony Optimization Algorithms for Structure- and Ligand-Based Drug Design. Oliver Korb, PhD thesis, University of Konstanz, 2008)
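In ant colony optimization, artificial ants build candidate solutions by sampling choices with probabilities proportional to pheromone values; pheromone then evaporates everywhere and is reinforced on good solutions. As a rough illustration only, the sketch below applies a generic MAX-MIN-style ant system to a toy problem. Each variable stands for a hypothetical discretized ligand degree of freedom (for example a torsion-angle bin), and `toy_score` is an invented stand-in for a docking scoring function. This is not the PLANTS algorithm, and it does not model the GPU offload.

```python
import random


def aco_minimize(n_vars, n_vals, objective, n_ants=20, n_iters=50,
                 rho=0.1, tau_min=0.01, tau_max=1.0, seed=0):
    """Generic MAX-MIN-style ant system over discrete variables (illustrative)."""
    rng = random.Random(seed)
    # Pheromone table: one row per variable, one entry per possible value.
    tau = [[tau_max] * n_vals for _ in range(n_vars)]
    best, best_score = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant picks a value per variable, biased by pheromone.
            sol = [rng.choices(range(n_vals), weights=tau[i])[0]
                   for i in range(n_vars)]
            score = objective(sol)  # the expensive step in real docking
            if score < best_score:
                best, best_score = sol, score
        # Evaporate, then reinforce the best-so-far solution,
        # clamping pheromone to [tau_min, tau_max].
        for i in range(n_vars):
            for v in range(n_vals):
                tau[i][v] = max(tau_min, (1 - rho) * tau[i][v])
            tau[i][best[i]] = min(tau_max, tau[i][best[i]] + rho)
    return best, best_score


TARGET = [3, 7, 1]  # hypothetical "ideal" bins for the toy problem


def toy_score(sol):
    """Hypothetical scoring function: squared distance from TARGET."""
    return sum((a - b) ** 2 for a, b in zip(sol, TARGET))


best, score = aco_minimize(3, 12, toy_score)
print(best, score)
```

In a real docking code the objective evaluation dominates the runtime, which is consistent with the thesis moving that step (together with the structure transformation) to the GPU.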
SeismicCity Improves Depth Perception With NVIDIA GPU Computing Technology
November 18th, 2008
From an NVIDIA Press Release:
SANTA CLARA, CA, OCTOBER 29, 2008: Houston-based SeismicCity announced today that it is using NVIDIA® Tesla™ S1070 1U systems for Reverse Time Migration (RTM), one of the most advanced seismic imaging techniques ever used by the oil and gas industry. SeismicCity selected the NVIDIA Tesla S1070 because it offered the fastest and most scalable implementation for running these complex algorithms, enabling faster discovery of new oil and gas reserves.
“Last year, SeismicCity migrated its depth imaging system from a 1,000-core CPU based configuration to a configuration based on NVIDIA Tesla 1U systems,” said Claude Pignol, vice president of technology at SeismicCity. “NVIDIA’s advancements in GPU Computing are a major breakthrough. Transitioning to GPUs has given us a 10-20X performance boost, but more importantly, GPUs allow us to use computationally-intensive algorithms that we simply couldn’t process with CPUs. This is a huge advancement which allows us to use RTM and other more accurate but data-intensive algorithms for larger datasets.”
OpenCL Technical Briefing and Reception at SuperComputing ’08
November 4th, 2008
Date: Monday, November 17th, 2008, 5:30pm to 6:30pm
Location: Rio Grande Mexican Restaurant, right across the street from SC08
OpenCL is a royalty-free, open standard being created by the Khronos Group for parallel programming of heterogeneous systems built from GPUs and CPUs. OpenCL is being driven by industry-leading companies including AMD, Apple, ARM, Codeplay, Ericsson, Freescale, Imagination Technologies, IBM, Intel, Nokia, NVIDIA, Motorola, RapidMind and Texas Instruments. OpenCL enables portable programming of the emerging combination of GPU and multi-core CPU compute capability and is designed to support a wide range of applications, from consumer software all the way to HPC solutions, through a low-level, high-performance, device-independent abstraction. This informal gathering will provide one of the first opportunities for the HPC community to gain insight into the architecture and direction of this exciting development. Tex-Mex appetizers and cold beer will be provided! Please register early as seating is limited. We look forward to seeing you in Austin!
PGI x64+GPU Fortran & C99 Compilers
October 26th, 2008
The PGI 8.0 release from The Portland Group includes a technology preview of the PGI accelerator programming strategy. PGI 8.0 compilers accept new directives that allow users to select compute-intensive regions of Linux x64 Fortran and C99 programs and automatically offload them to an NVIDIA GPU. Until now, HPC developers targeting GPU accelerators have had to rely on libraries or language extensions, and use of GPUs from Fortran has been extremely limited. Using the provisional support in PGI Release 8.0, programmers can accelerate Linux applications on x64+NVIDIA platforms by adding OpenMP-like compiler directives to existing high-level, standard-compliant Fortran and C99 programs. At Supercomputing 2008 you can see the PGI x64+GPU compilers in action and learn about PGI’s accelerator programming model and how you can use it to experiment with and embrace accelerated computing. You can also attend the PGI vendor presentation by Michael Wolfe in room 19A/19B of the Austin Convention Center on Wednesday, November 19, from 10:30 to 11:00 AM. Also, check out “Compilers and More: Programming GPUs Today” on HPCwire.
Gnort: High Performance Network Intrusion Detection Using Graphics Processors
October 26th, 2008
This paper presents an intrusion detection system based on the Snort open-source NIDS that exploits the underutilized computational power of modern graphics cards to offload costly pattern matching operations from the CPU, thereby increasing overall processing throughput. The prototype system, called Gnort, achieved a maximum traffic processing throughput of 2.3 Gbit/s on synthetic network traces, and when monitoring real traffic through a commodity Ethernet interface, it outperformed unmodified Snort by a factor of two. The results suggest that modern graphics cards can be used effectively to speed up intrusion detection systems, as well as other systems that involve pattern matching operations. (Gnort: High Performance Network Intrusion Detection Using Graphics Processors. G. Vasiliadis, S. Antonatos, M. Polychronakis, E. P. Markatos, and S. Ioannidis. In Proceedings of the 11th International Symposium On Recent Advances In Intrusion Detection (RAID), 2008)
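The operation Gnort offloads is multi-pattern string matching against many signatures at once. Snort-style engines commonly use the Aho-Corasick algorithm for this: all patterns are compiled into a single automaton, so each input byte is examined once no matter how many patterns exist. The sequential Python sketch below shows only that automaton (the function names are hypothetical). It does not model how Gnort stores state tables on the GPU or batches packets for parallel matching.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for a list of patterns."""
    goto, fail, out = [{}], [0], [set()]
    # 1. Insert every pattern into a trie rooted at state 0.
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # 2. Compute failure links breadth-first (depth-1 states fail to the root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]  # inherit matches that end here too
    return goto, fail, out


def search(text, automaton):
    """Return sorted (start_index, pattern) pairs for every match in text."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


ac = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", ac))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Because the scan is a table-driven state walk, independent packets can each be matched by a separate thread against the same shared tables, which is the property that makes this kind of matcher a natural fit for GPU offload.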