As the computing power of various platforms intended for games and similar applications is increasing rapidly, they attract the interest of professionals in the HPC community. As an example, modern graphics processing units (GPUs) are often used for HPC in GPGPU. Another example is the Cell Broadband Engine of the Playstation3 (PS3) that has a multicore architecture that lends itself to HPC. These platforms are not conventional HPC platforms; nonetheless they are used for HPC purposes and even clusters of such computing resources are being built with great success. Both the computing power and the low cost compared to conventional HPC resources make them very interesting. The aim of this workshop is to focus on such unconventional resources for HPC. Only imagination sets the limit for the kinds of devices that can be used for HPC and even be combined to form clusters. (UCHPC ’09 Website, Call for Papers)
The Khronos™ Group today announced the ratification and public release of the OpenCL™ 1.0 specification, the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. OpenCL (Open Computing Language) greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories from gaming and entertainment to scientific and medical software. Proposed six months ago as a draft specification by Apple, OpenCL has been developed and ratified by industry-leading companies including 3DLABS, Activision Blizzard, AMD, Apple, ARM, Barco, Broadcom, Codeplay, Electronic Arts, Ericsson, Freescale, HI, IBM, Intel Corporation, Imagination Technologies, Kestrel Institute, Motorola, Movidia, Nokia, NVIDIA, QNX, RapidMind, Samsung, Seaweed, TAKUMI, Texas Instruments and Umeå University. The OpenCL 1.0 specification and more details are available at http://www.khronos.org/opencl/
At Khronos “Developer University” today at SIGGRAPH Asia in Singapore, Khronos members publicly launched OpenCL 1.0 with a presentation of the specification and source code examples.
The new gDEBugger V4.4 adds in-depth analysis of OpenGL memory usage by tracking graphics memory allocated objects, their memory consumption and allocation call stacks. Also new in this version are graphics memory leak detection and the ability to break on them.
Using these new features will enable OpenGL and OpenGL ES developers to optimize their applications’ memory consumption and improve overall application performance.
gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces application activity on top of the OpenGL API and lets programmers see what is happening within the graphics system implementation to find bugs and optimize OpenGL application performance. gDEBugger runs on Windows and Linux operating systems. (Graphic Remedy Website)
At SC08, Aggregate.Org/University of Kentucky demonstrated open source technology for running arbitrary MIMD programs directly on GPUs. There are two environments for MOG (MIMD On GPU): a simulator which interprets the MIMD code, and a “Meta-State Converter” compilation system which does state space transformation of MIMD code into pure (SIMD) native GPU code. Using the current version of either, MIMD C code using shared memory communication can do recursion, etc., while running on a CUDA GPU. Support for both C and Fortran, with both shared memory and MPI for communications, and support of both NVIDIA CUDA and ATI CAL targets, is planned. The work is very new, but detailed publications, performance benchmarks, and code releases are expected to start to appear by early next year. (MOG at SC08)
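To make the idea of interpreting MIMD code on SIMD hardware more concrete, here is a minimal sketch in Python. It is not the MOG simulator: the five-opcode instruction set, the register names and the function `run_mimd_on_simd` are all hypothetical, chosen only for illustration. Each "thread" runs its own program with its own program counter, while the interpreter issues one opcode at a time to every lane whose current instruction matches, which is how a SIMD machine would have to step through divergent code.

```python
# Toy illustration only: a hypothetical instruction set, not MOG's.
OPCODES = ("LOADC", "ADD", "MUL", "LOOP", "HALT")


def run_mimd_on_simd(programs, max_steps=1000):
    """Interpret one program per lane, issuing a single opcode at a time."""
    n = len(programs)
    pc = [0] * n          # per-lane program counter (the MIMD part)
    acc = [0] * n         # per-lane accumulator
    cnt = [0] * n         # per-lane loop counter
    halted = [False] * n

    for _ in range(max_steps):
        if all(halted):
            break
        # Every live lane fetches its own next instruction.
        fetched = [None if halted[i] else programs[i][pc[i]] for i in range(n)]
        # The SIMD part: walk the opcodes, enabling only the matching lanes.
        for op in OPCODES:
            mask = [f is not None and f[0] == op for f in fetched]
            if not any(mask):
                continue
            # Conceptually one SIMD instruction applied under a lane mask.
            for i in range(n):
                if not mask[i]:
                    continue
                arg = fetched[i][1]
                if op == "LOADC":
                    cnt[i] = arg
                    pc[i] += 1
                elif op == "ADD":
                    acc[i] += arg
                    pc[i] += 1
                elif op == "MUL":
                    acc[i] *= arg
                    pc[i] += 1
                elif op == "LOOP":
                    cnt[i] -= 1
                    pc[i] = arg if cnt[i] > 0 else pc[i] + 1
                elif op == "HALT":
                    halted[i] = True
    return acc


# Two different programs running side by side on two lanes.
prog_a = [("ADD", 5), ("MUL", 3), ("HALT", 0)]
prog_b = [("LOADC", 3), ("ADD", 2), ("LOOP", 1), ("HALT", 0)]
print(run_mimd_on_simd([prog_a, prog_b]))  # [15, 6]
```

The cost of this approach is visible in the inner loop: in each step the interpreter pays for every distinct opcode in flight, so lanes that diverge widely run slower than lanes that happen to execute the same instruction.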
This is a GPGPU event a long time in the making. Since the advent of general-purpose APIs and compilers for GPUs it has been predicted that GPUs would one day be used to help boost the performance of Supercomputers. With the latest release of the Top500 Supercomputer list, that prediction has become a reality.
More details from an NVIDIA press release:
NVIDIA Tesla Powers 29th Most Powerful Supercomputer in the World
SC08—AUSTIN, TX—NOVEMBER 17, 2008—The Tokyo Institute of Technology (Tokyo Tech) today announced a collaboration with NVIDIA to use NVIDIA® Tesla™ GPUs to boost the computational horsepower of its TSUBAME supercomputer. Through the addition of 170 Tesla S1070 1U systems, the TSUBAME supercomputer now delivers nearly 170 TFLOPS of theoretical peak performance, as well as 77.48 TFLOPS of measured Linpack performance, placing it, again, amongst the top ranks in the world’s Top 500 Supercomputers.
“Tokyo Tech is constantly investigating future computing platforms and it had become clear to us that to make the next major leap in performance, TSUBAME had to adopt GPU computing technologies,” said Satoshi Matsuoka, division director of the Global Scientific Information and Computing Center at Tokyo Tech. “In testing our key applications, the Tesla GPUs delivered speed-ups that we had never seen before, sometimes even orders of magnitude – a tremendous competitive boost for our scientists and engineers in reducing their time to solution.”
Speaking to the ease of implementation, Matsuoka continued,
“The entire upgrade was carried out in 1 week, and the TSUBAME supercomputer remained live throughout. This is an unprecedented feat in top-level supercomputing.”
A launch event was held Monday night at Austin’s Rio Grande Mexican Restaurant in conjunction with Supercomputing 2008, to celebrate the newly completed OpenCL specification. No live demos of OpenCL applications were shown because the OpenCL spec must first be ratified by all members of the Khronos Group before it can be publicly released. Still, the fact that this group has completed the complex specification in less than six months is nothing less than amazing. Macworld has posted an article discussing the event including interviews with members of the OpenCL working group. More information about OpenCL is available at the Khronos Group Website.
From a press release:
World’s Most Powerful Global Computation Software Now GPU Accelerated
SC08—AUSTIN, TX—NOVEMBER 18, 2008—At SC08, Wolfram Research will demonstrate a new version of Mathematica, the world’s most powerful general computational software, that integrates CUDA®, NVIDIA’s parallel GPU computing architecture. This new version is expected to give Mathematica users an unprecedented performance increase of 10-100X in numerical computing, modeling, simulation and visual computations, without the need to learn or write C code.
“Since its initial release, Mathematica has been adopted by over 3 million professionals across the entire global technical computing community, and it has had a profound effect on how computers are used across many fields,” said Joy Costa, director of global partnerships at Wolfram Research. “The prospect of a hundred fold increase in Mathematica 7 performance is staggering. CUDA enabled Mathematica will revolutionize the world of numerical computation.”
“With Mathematica 7, researchers and scientists can easily tap the enormous parallel processing power of NVIDIA GPUs through a familiar high level interface,” said Andy Keane, general manager of the GPU Computing business at NVIDIA. “This is truly transformative, giving Mathematica users computational horsepower like never before and reducing computation time in some cases from days to a matter of minutes.”
The demonstration of the CUDA-accelerated release of Mathematica coincides with the launch of the NVIDIA® Tesla™ Personal Supercomputer at this year’s SC08. Priced in the range of traditional PC workstations, Tesla Personal Supercomputers are unrivalled in price and performance. Available in configurations of up to 4 Tesla GPUs in a single system, Tesla Personal Supercomputers deliver up to 4 Teraflops of computing performance from up to 960 parallel processing cores.
From a press release:
NVIDIA Tesla Makes Personal SuperComputing A Reality
Tesla GPUs Enable Cluster Class Performance On The Desktop at 1/10th The Power
SC08—AUSTIN, TX—NOVEMBER 18, 2008—Today, scientific research is carried out on supercomputing clusters, a shared resource that consumes hundreds of kilowatts of power and costs millions of dollars to build and maintain. As a result, researchers must fight for time on these resources, slowing their work and delaying results. NVIDIA and its worldwide partners today announced the availability of the GPU-based Tesla™ Personal Supercomputer, which delivers the equivalent computing power of a cluster, at 1/100th of the price and in a form factor of a standard desktop workstation.
From a press release:
ATI Stream is a set of advanced hardware and software technologies that enable AMD graphics processors (GPU), working in concert with the system’s central processor (CPU), to accelerate many applications beyond just graphics. This enables better balanced platforms capable of running demanding computing tasks faster than ever.
November 13 News Summary
- On December 10, AMD plans to release for download a free ATI Catalyst™ driver update that instantly unlocks new ATI Stream acceleration capabilities already built into millions of ATI Radeon™ graphics cards.
- ATI Stream-enabled software titles for entertainment, gaming and productivity are being released or are under development by a growing list of the world’s top independent software vendors (ISVs), including ArcSoft and CyberLink.
Graphics processing units (GPUs) have become an attractive option for accelerating scientific computations as a result of advances in the performance and flexibility of GPU hardware, and due to the availability of GPU software development tools targeting general purpose and scientific computation. However, effective use of GPUs in clusters presents a number of application development and system integration challenges. We describe strategies for the decomposition and scheduling of computation among CPU cores and GPUs, and techniques for overlapping communication and CPU computation with GPU kernel execution. We report the adaptation of these techniques to NAMD, a widely-used parallel molecular dynamics simulation package, and present performance results for a 64-core 64-GPU cluster. (Adapting a message-driven parallel application to GPU-accelerated clusters. James C. Phillips, John E. Stone, and Klaus Schulten. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing. Research web site)
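The overlap of CPU computation with GPU kernel execution mentioned in the abstract can be pictured with a small Python sketch. This is not NAMD's implementation: `split_work`, `fake_gpu_kernel`, `run_step` and the `gpu_fraction` parameter are hypothetical names, and a worker thread merely stands in for an asynchronous GPU kernel launch. It shows only the ordering (launch, do CPU work, then synchronize), not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_work(items, gpu_fraction):
    """Static decomposition: the first share goes to the GPU, the rest to the CPU."""
    cut = int(len(items) * gpu_fraction)
    return items[:cut], items[cut:]


def fake_gpu_kernel(items):
    """Stand-in for a GPU kernel; here it just sums squares on a worker thread."""
    return sum(x * x for x in items)


def run_step(items, gpu_fraction=0.75):
    gpu_part, cpu_part = split_work(items, gpu_fraction)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Launch the "kernel" asynchronously; the call returns immediately.
        pending = pool.submit(fake_gpu_kernel, gpu_part)
        # 2. Do the CPU share of the work while the "kernel" is in flight.
        cpu_total = sum(x * x for x in cpu_part)
        # 3. Synchronize only when the GPU result is actually needed.
        gpu_total = pending.result()
    return gpu_total + cpu_total


print(run_step(list(range(8))))  # 140
```

In a real cluster code the CPU step would also cover sending and receiving messages, and the split between CPU and GPU would be tuned rather than fixed at an assumed 75 percent.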