June 1st, 2010
The Australian GPU Users Groups are informal special-interest groups founded to bring together GPU users from all fields and experience levels to learn and share their ideas and creations at friendly meetings. GPU users groups are currently forming in Brisbane, Sydney, and Perth.
The groups will discuss general GPU computing, including GPGPU, CUDA, OpenCL, DirectCompute, DirectX, OpenGL, and related technologies. Meetings will feature short presentations as well as informal discussions on a range of subjects, including core fundamentals, hardware architectures, parallel programming, specific optimisations, and example applications from industry, science, and multimedia.
Sign up today: the meetings will allow you to meet others who share your interest in GPUs.
GPGPU.org is maintaining a list of GPU Users groups. If you have a local GPU users group, please tell us about it!
Posted in Developer Resources, Events, Research | Tags: Australia, User Groups
June 1st, 2010
At the ISC 2010 conference in Hamburg, Germany, this week, NVIDIA announced new programs for the growing CUDA/GPGPU developer community:
- CUDA Certification Program – Driven by demand for qualified GPGPU engineers, this is the first program to certify expertise in massively parallel programming on GPUs.
- CUDA Research Centers – Recognizes institutions that embrace GPU Computing across multiple research fields.
- CUDA Teaching Centers – Recognizes institutions that have integrated GPU Computing techniques into their mainstream computer programming curriculum.
These programs complement the existing CUDA Center of Excellence program, which has recognized 10 premier institutions around the world. More details are available here: http://www.nvidia.com/object/io_1275409333119.html
Posted in Business, Developer Resources, Research | Tags: NVIDIA, NVIDIA CUDA
June 1st, 2010
From a white paper by GE Intelligent Platforms:
This white paper describes how GPGPU technology can allow system designers to fit an unprecedented amount of processing power into a very compact package. For example, it describes four GE Intelligent Platforms 3U VPX boards with a floating-point performance of 766 GFLOPS in less than 0.4 cubic feet. With configuration control and lifecycle management from a leading COTS supplier, these technologies are clearly ready for duty.
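For a rough sense of scale, the figures quoted above can be turned into per-board and per-volume numbers. This is our own back-of-envelope sketch, not a calculation from the white paper: it assumes the 766 GFLOPS is the combined peak of the four boards, that the boards are identical, and that 0.4 cubic feet is an upper bound on their volume. The function names are illustrative only.

```python
# Back-of-envelope figures for the GE example above.
# Assumptions (ours): 766 GFLOPS is the combined peak of four identical
# 3U VPX boards, and 0.4 ft^3 is an upper bound on their total volume.

TOTAL_GFLOPS = 766.0
NUM_BOARDS = 4
MAX_VOLUME_FT3 = 0.4


def per_board_gflops(total_gflops: float, boards: int) -> float:
    """Average peak GFLOPS per board, assuming all boards are identical."""
    return total_gflops / boards


def min_density_gflops_per_ft3(total_gflops: float, max_volume_ft3: float) -> float:
    """Lower bound on compute density, since the real volume is below max_volume_ft3."""
    return total_gflops / max_volume_ft3


if __name__ == "__main__":
    print(f"per board: {per_board_gflops(TOTAL_GFLOPS, NUM_BOARDS):.1f} GFLOPS")
    print(f"density: > {min_density_gflops_per_ft3(TOTAL_GFLOPS, MAX_VOLUME_FT3):.0f} GFLOPS/ft^3")
```

Under those assumptions this works out to about 191.5 GFLOPS per board and more than 1915 GFLOPS per cubic foot.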
Posted in Business | Tags: Clusters, System Infrastructure, systems
June 1st, 2010
The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Realtime, and XML/RDF Processing) using various processor architectures (e.g., commodity and specialized Multi-core CPUs, Many-core GPUs, and FPGAs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory), and multicore programming strategies like OpenCL.
More information and the full call can be found here: http://www.adms-conf.org/
Posted in Events, Research | Tags: Call for Papers, Cloud Computing, data management, High-Performance Computing
May 31st, 2010
The June 2010 Top500 list of the world’s fastest supercomputers was released this week at ISC 2010. While the US Jaguar supercomputer (located at the Department of Energy’s Oak Ridge Leadership Computing Facility) retained the top spot in Linpack performance, a Chinese cluster called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVIDIA Tesla C2050 GPUs, is now the fastest in theoretical peak performance at 2.98 PFlop/s and No. 2 with a Linpack performance of 1.271 PFlop/s. This is the highest rank a GPU-accelerated system, or a Chinese system, has ever achieved on the Top500 list.
For more information, visit www.TOP500.org.
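The gap between Nebulae's two numbers is commonly summarised as Linpack efficiency, the ratio of measured Linpack performance (Rmax) to theoretical peak (Rpeak). A minimal sketch using only the figures quoted in this post; the helper name is our own and not part of any TOP500 tooling:

```python
# Linpack efficiency = Rmax / Rpeak, using the Nebulae figures quoted above.
RMAX_PFLOPS = 1.271   # measured Linpack performance (PFlop/s)
RPEAK_PFLOPS = 2.98   # theoretical peak performance (PFlop/s)


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak actually sustained on Linpack."""
    if rpeak <= 0:
        raise ValueError("rpeak must be positive")
    return rmax / rpeak


print(f"{linpack_efficiency(RMAX_PFLOPS, RPEAK_PFLOPS):.1%}")  # prints 42.7%
```

In other words, Nebulae sustained roughly 43% of its theoretical peak on Linpack.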
Posted in Events, Press | Tags: Supercomputing, Top500
May 30th, 2010
Abstract:
The main contribution of this thesis is to demonstrate that graphics processors (GPUs), as representatives of emerging many-core architectures, are very well suited for the fast and accurate solution of large sparse linear systems of equations, using parallel multigrid methods on heterogeneous compute clusters. Such systems arise, for instance, in the discretisation of (elliptic) partial differential equations with finite elements. We report speedups of at least one order of magnitude over highly tuned conventional CPU implementations, without sacrificing either accuracy or functionality. In more detail, this thesis includes the following contributions:
Read the rest of this entry »
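The thesis's GPU multigrid solvers for finite-element discretisations are far more elaborate than anything that fits here. To show only the basic idea of a multigrid V-cycle, here is a minimal serial sketch under our own assumptions: a 1D Poisson problem -u'' = f on (0, 1) with zero boundary values, a finite-difference (not finite-element) discretisation on 2^k - 1 interior points, weighted Jacobi smoothing, full-weighting restriction, and linear interpolation. None of the function names come from the thesis.

```python
import math


def residual(u, f, h):
    """r = f - A u, where (A u)_i = (-u_{i-1} + 2 u_i - u_{i+1}) / h^2, zero boundaries."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (-left + 2.0 * u[i] - right) / (h * h)
    return r


def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing: cheaply damps the high-frequency part of the error."""
    n = len(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            new[i] = (1 - omega) * u[i] + omega * 0.5 * (left + right + h * h * f[i])
        u = new
    return u


def restrict(r):
    """Full weighting: fine grid of 2m+1 points -> coarse grid of m points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(m)]


def prolong(ec, n_fine):
    """Linear interpolation: coarse grid of m points -> fine grid of 2m+1 points."""
    e = [0.0] * n_fine
    m = len(ec)
    for j in range(m):
        e[2 * j + 1] = ec[j]          # coarse points coincide with odd fine points
    for k in range(m + 1):
        left = ec[k - 1] if k > 0 else 0.0
        right = ec[k] if k < m else 0.0
        e[2 * k] = 0.5 * (left + right)  # even fine points: average of neighbours
    return e


def v_cycle(u, f, h, pre=3, post=3):
    """One V-cycle: smooth, correct from the coarse grid (recursively), smooth again."""
    n = len(u)
    if n == 1:
        # Coarsest grid: solve 2 u / h^2 = f exactly.
        return [f[0] * h * h / 2.0]
    u = jacobi(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2 * h, pre, post)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]
    return jacobi(u, f, h, post)


if __name__ == "__main__":
    n = 63
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]  # exact solution: sin(pi x)
    u = [0.0] * n
    for cycle in range(8):
        u = v_cycle(u, f, h)
        print(cycle, max(abs(r) for r in residual(u, f, h)))
```

Each V-cycle should reduce the residual by a roughly constant factor independent of the grid size, which is what makes multigrid attractive for large systems; the remaining difference from sin(pi x) is discretisation error, not solver error.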
Posted in Research | Tags: Dissertations
May 30th, 2010
AMD is offering an introductory tutorial on OpenCL™ that will be held alongside the 2010 Symposium on Application Accelerators in High Performance Computing (SAAHPC’10). The tutorial is a “programmer’s introduction” that covers the ideas behind OpenCL™ and how they translate into source code.
Posted in Developer Resources, Events | Tags: AMD, ATI Stream, OpenCL, Tutorials & Courses
May 30th, 2010
Abstract:
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
(Rajesh Bordawekar and Uday Bondhugula and Ravi Rao: “Believe It or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”. Technical Report RC24982, IBM Thomas J. Watson Research Center, Apr. 2010.)
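The abstract does not give the report's exact formulation (normalisation, range of shifts), so the following is an illustration under our own assumptions only: the textbook direct 2D cross-correlation evaluates every relative shift of two n x n matrices, and each shift is a dot product over the overlapping region. With (2n-1)^2 shifts and up to n^2 multiply-adds per shift, this is where O(n^4) cost comes from. Python floats are double precision, unlike the single-precision data in the report, and the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def cross_correlate(a: Matrix, b: Matrix) -> Matrix:
    """Full 2D cross-correlation of two n x n matrices by direct summation.

    out[dy + n - 1][dx + n - 1] = sum over valid (i, j) of a[i][j] * b[i + dy][j + dx].
    (2n-1)^2 shifts, each a dot product of up to n^2 terms: O(n^4) overall.
    """
    n = len(a)
    size = 2 * n - 1
    out = [[0.0] * size for _ in range(size)]
    for dy in range(-(n - 1), n):
        for dx in range(-(n - 1), n):
            acc = 0.0
            # Keep i, j such that both a[i][j] and b[i + dy][j + dx] are in range.
            for i in range(max(0, -dy), min(n, n - dy)):
                for j in range(max(0, -dx), min(n, n - dx)):
                    acc += a[i][j] * b[i + dy][j + dx]
            out[dy + n - 1][dx + n - 1] = acc
    return out


if __name__ == "__main__":
    img = [[1.0, 2.0], [3.0, 4.0]]
    # Zero-shift entry is the plain dot product of the two matrices.
    print(cross_correlate(img, img)[1][1])  # 30.0
```

The independent shifts are what both the CUDA and multi-core versions in the report parallelize; how the report maps them to threads, shared memory, and SIMD lanes is not something this sketch attempts to reproduce.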
Posted in Research | Tags: Image Processing, Multicore, NVIDIA CUDA, Papers
May 20th, 2010
Advanced Micro Devices (AMD) recently released ATI Stream Profiler version 1.3. ATI Stream Profiler is a Microsoft® Visual Studio® integrated runtime profiler that gathers performance data from the GPU as your OpenCL™ application runs. Developers can use this information to locate bottlenecks in their OpenCL™ applications and find ways to improve performance.
Features of the tool include:
- Measure the execution time of an OpenCL kernel
- Query the hardware performance counters on ATI Radeon graphics cards
- Display the memory traffic to and from the GPU
- Compare multiple runs (sessions) of the same or different programs
- Store the profile data for each run in a CSV file
- Display the IL and ISA (hardware disassembly) code of the OpenCL kernel
Posted in Developer Resources | Tags: AMD, OpenCL, Profiling
May 20th, 2010
Abstracts due: 24 September 2010
Papers due: 1 October 2010
Anchorage, home to moose, bears, birds and whales, is strategically located at almost equal flying distance from Europe, Asia and the Eastern USA. Embraced by six mountain ranges, with views of Mount McKinley in Denali National Park, and warmed by a maritime climate, the area offers year-round adventure, recreation, and sporting events. It is a fitting destination for IPDPS to mark a quarter century of tracking developments in computer science. IPDPS serves as a forum for engineers and scientists from around the world to present their latest research findings in the fields of parallel processing and distributed computing. The five-day program will follow the usual format of contributed papers, invited speakers, and panels mid week, framed by workshops held on the first and last days. To celebrate the 25th year of IPDPS, plan to come early and stay late and also enjoy a modern city surrounded by spectacular wilderness. For updates on IPDPS 2011, visit the Web at www.ipdps.org.
Posted in Events, Research | Tags: Call for Papers, Parallel Computing