June 1st, 2010
The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Realtime, and XML/RDF Processing) using various processor architectures (e.g., commodity and specialized Multi-core CPUs, Many-core GPUs, and FPGAs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory), and multicore programming strategies like OpenCL.
More information and the full call can be found here: http://www.adms-conf.org/
Read the rest of this entry »
Posted in Events, Research | Tags: Call for Papers, Cloud Computing, data management, High-Performance Computing | Write a comment
May 31st, 2010
The June 2010 Top500 list of the world’s fastest supercomputers was released this week at ISC 2010. While the US Jaguar supercomputer (located at the Department of Energy’s Oak Ridge Leadership Computing Facility) retained the top spot in Linpack performance, a Chinese cluster called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVIDIA Tesla C2050 GPUs, is now the fastest in theoretical peak performance at 2.98 PFlop/s and ranks No. 2 with a Linpack performance of 1.271 PFlop/s. This is the highest rank a GPU-accelerated system, or a Chinese system, has ever achieved on the Top500 list.
For more information, visit www.TOP500.org.
Posted in Events, Press | Tags: Supercomputing, Top500 | 1 Comment
May 30th, 2010
Abstract:
The main contribution of this thesis is to demonstrate that graphics processors (GPUs) as representatives of emerging many-core architectures are very well-suited for the fast and accurate solution of large sparse linear systems of equations, using parallel multigrid methods on heterogeneous compute clusters. Such systems arise for instance in the discretisation of (elliptic) partial differential equations with finite elements. We report on at least one order of magnitude speedup over highly-tuned conventional CPU implementations, without sacrificing either accuracy or functionality. In more detail, this thesis includes the following contributions:
Read the rest of this entry »
Posted in Research | Tags: Dissertations | 1 Comment
May 30th, 2010
AMD is offering an introductory tutorial to OpenCL™ that will be held alongside the 2010 Symposium on Application Accelerators in High Performance Computing (SAAHPC’10). The tutorial is a “programmer’s introduction” which covers the ideas behind OpenCL™ and their translation to source code. Read the rest of this entry »
Posted in Developer Resources, Events | Tags: AMD, ATI Stream, OpenCL, Tutorials & Courses | 1 Comment
May 30th, 2010
Abstract:
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
(Rajesh Bordawekar and Uday Bondhugula and Ravi Rao: “Believe It or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”. Technical Report RC24982, IBM Thomas J. Watson Research Center, Apr. 2010.)
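To make the O(n^4) figure in the abstract concrete, here is a minimal sketch of a brute-force 2D cross-correlation in Python. It is not the authors' kernel: the abstract does not give the exact formulation, so the function names (`cross_correlate`, `best_shift`), the `max_shift` parameter, and the choice to score each shift with a plain dot product over the overlapping region are all assumptions made for illustration. With n x n images and a shift range that grows with n, there are O(n^2) shifts and each costs O(n^2) multiply-adds, which is one common way such an algorithm reaches O(n^4).

```python
def cross_correlate(image, reference, max_shift):
    """Score every (dy, dx) shift by the dot product of the overlapping pixels.

    Hypothetical helper for illustration; images are lists of equal-length rows.
    """
    rows, cols = len(image), len(image[0])
    scores = {}
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total = 0.0
            # Only visit reference pixels whose shifted partner lies inside the image.
            for y in range(max(0, -dy), min(rows, rows - dy)):
                for x in range(max(0, -dx), min(cols, cols - dx)):
                    total += image[y + dy][x + dx] * reference[y][x]
            scores[(dy, dx)] = total
    return scores


def best_shift(scores):
    """Return the (dy, dx) shift with the highest correlation score."""
    return max(scores, key=scores.get)
```

The two outer loops are independent of each other, which is what makes the task-level parallelization described in the abstract possible, while the inner dot product is the part that maps onto short-vector (SSE/VSX) instructions.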
Posted in Research | Tags: Image Processing, Multicore, NVIDIA CUDA, Papers | 5 Comments
May 20th, 2010
Advanced Micro Devices (AMD) recently released ATI Stream Profiler version 1.3. ATI Stream Profiler is a Microsoft® Visual Studio® integrated runtime profiler that gathers performance data from the GPU as your OpenCL™ application runs. This information can then be used by developers to discover where the bottlenecks are in their OpenCL™ application and find ways to optimize their application’s performance.
Features of the tool include:
- Measure the execution time of an OpenCL kernel
- Query the hardware performance counters on ATI Radeon graphics cards
- Display the memory traffic to and from the GPU
- Compare multiple runs (sessions) of the same or different programs
- Store the profile data for each run in a CSV file
- Display the IL and ISA (hardware disassembly) code of the OpenCL kernel
Posted in Developer Resources | Tags: AMD, OpenCL, Profiling | Write a comment
May 20th, 2010
Abstracts due: 24 September 2010
Papers due: 1 October 2010
Anchorage, home to moose, bears, birds and whales, is strategically located at almost equal flying distance from Europe, Asia and the Eastern USA. Embraced by six mountain ranges, with views of Mount McKinley in Denali National Park, and warmed by a maritime climate, the area offers year-round adventure, recreation, and sporting events. It is a fitting destination for IPDPS to mark a quarter century of tracking developments in computer science. IPDPS serves as a forum for engineers and scientists from around the world to present their latest research findings in the fields of parallel processing and distributed computing. The five-day program will follow the usual format of contributed papers, invited speakers, and panels mid week, framed by workshops held on the first and last days. To celebrate the 25th year of IPDPS, plan to come early and stay late and also enjoy a modern city surrounded by spectacular wilderness. For updates on IPDPS 2011, visit the Web at www.ipdps.org.
Posted in Events, Research | Tags: Call for Papers, Parallel Computing | Write a comment
May 20th, 2010
The GPU Technology Conference (GTC 2010) will be held Sept. 20-23, 2010 in San Jose, Calif. Developers, researchers, scientists and entrepreneurs are invited to submit proposals on GPU-related topics. See www.nvidia.com/gtc.
GPU Developers Summit: Session Topics deadline: June 1, 2010
Emerging Companies Summit: “CEO on Stage” Nominations deadline: August 1, 2010
NVIDIA Research Summit: Posters deadline: August 15, 2010
To submit a proposal, you will be asked to set up a GTC 2010 account so you can track the status of your submission.
Submission guidelines: www.nvidia.com/object/call_for_submissions.html
Join GTC 2010 mailing list: www.nvidia.com/object/email_updates.html
Posted in Events, Research | Tags: Call for Papers, Conferences | Write a comment
May 20th, 2010
Abstract:
Random numbers are extensively used on the GPU. As more computation is ported to the GPU, it can no longer be treated as rendering hardware alone. Random number generators (RNG) are expected to cater to general-purpose and graphics applications alike. Such diversity adds to the expected requirements of an RNG. A good GPU RNG should be able to provide repeatability, random access, multiple independent streams, speed, and random numbers free from detectable statistical bias. A specific application may require some if not all of the above characteristics at one time. In particular, we hypothesize that not all algorithms need the highest-quality random numbers, so a good GPU RNG should provide a speed-quality tradeoff that can be tuned for fast, low-quality or slower, high-quality random numbers.
We propose that the Tiny Encryption Algorithm satisfies all of the requirements of a good GPU Pseudo Random Number Generator. We compare our technique against previous approaches, and present an evaluation using standard randomness test suites as well as Perlin noise and a Monte-Carlo shadow algorithm. We show that the quality of random number generation directly affects the quality of the noise produced; however, good quality noise can still be produced with a lower quality random number generator.
(Fahad Zafar, Aaron Curtis and Marc Olano, “GPU Random Numbers via the Tiny Encryption Algorithm”, HPG 2010: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on High Performance Graphics, Saarbrücken, Germany, June 2010. Link to preprint.)
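As a rough illustration of the idea in the abstract, the sketch below uses the standard Tiny Encryption Algorithm round function as a counter-based generator in Python. It is not the paper's GPU implementation. The function names (`tea_encrypt`, `tea_random`), the way the counter is fed in as the first input word, and the default of 8 rounds are assumptions for illustration. The `rounds` parameter stands in for the speed-quality tradeoff: fewer rounds are cheaper but mix the input less.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9  # TEA's key-schedule constant


def tea_encrypt(v0, v1, key, rounds=32):
    """Encrypt the 64-bit block (v0, v1) with a four-word key using TEA rounds."""
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(rounds):
        total = (total + DELTA) & MASK32
        # Masking once per update keeps all arithmetic modulo 2**32.
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK32
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK32
    return v0, v1


def tea_random(index, key, rounds=8):
    """Return a float in [0, 1) for a given counter value (hypothetical helper).

    The same (index, key, rounds) always yields the same value, so any element
    of the sequence can be computed directly, and each key is a separate stream.
    """
    v0, _ = tea_encrypt(index & MASK32, 0, key, rounds)
    return v0 / 2.0**32
```

Because the output depends only on the counter and the key, each GPU thread could in principle compute its own values from its thread index without shared state, which is how a generator of this kind offers the repeatability, random access, and independent streams the abstract lists.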
Posted in Research | Tags: High-Performance Graphics, NVIDIA CUDA, Papers, Random Number Generation | 1 Comment
May 20th, 2010
HOOMD-blue stands for Highly Optimized Object-oriented Many-particle Dynamics — Blue Edition. It performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to dozens of processor cores on a fast cluster.
HOOMD-blue 0.9.0 is a major new release. Highlights include:
- Support for Fermi generation GPUs
- Performance enhancements
- New pair potentials
- Particle data is now accessible from hoomd scripts
- Binary format dump files for simulation restarts
- Numerous small enhancements to enable easily restartable jobs
- 2D simulations are now possible
- Integration methods can now be applied to specified groups of particles
- All IMD commands issued by VMD are now understood
- … and more
HOOMD-blue 0.9.0 is available for download under an open source license.
Posted in Developer Resources, Research | Tags: High-Performance Computing, Molecular Dynamics, NVIDIA CUDA, Open Source, Tools | 1 Comment