Australia GPU Users Groups

June 1st, 2010

The Australia GPU Users groups are informal special interest groups founded to bring together GPU users from all fields and experience levels to learn and share their ideas and creations at friendly meetings.  There are currently GPU users groups forming in Brisbane, Sydney, and Perth.

The groups will discuss general GPU computing, including GPGPU, CUDA, OpenCL, DirectCompute, DirectX, OpenGL and related technologies. Meetings will feature short presentations and informal discussions on a range of subjects, including core fundamentals, hardware architectures, parallel programming and specific optimisations, along with examples of applications from industry, science and multimedia.

Sign up today: the meetings will allow you to meet others who share your interest in GPUs.

GPGPU.org is maintaining a list of GPU Users groups.  If you have a local GPU users group, please tell us about it!

New NVIDIA Research & Certification Programs for CUDA/GPGPU

June 1st, 2010

At the ISC 2010 conference in Hamburg, Germany, this week, NVIDIA announced new programs for the growing CUDA/GPGPU developer community:

  • CUDA Certification Program – Driven by demand for qualified GPGPU engineers, this is the first program to certify expertise in massively parallel programming on GPUs.
  • CUDA Research Centers – Recognizes institutions that embrace GPU Computing across multiple research fields.
  • CUDA Teaching Centers – Recognizes institutions that have integrated GPU Computing techniques into their mainstream computer programming curriculum.

These programs complement the existing CUDA Center of Excellence program, which has recognized 10 premier institutions around the world. More details are available here: http://www.nvidia.com/object/io_1275409333119.html

White Paper: “Many-Core Processors Report Ready for Duty”

June 1st, 2010

From a white paper by GE Intelligent Platforms:

This white paper describes how GPGPU technology can allow system designers to fit an unprecedented amount of processing power into a very compact package. For example, it describes four GE Intelligent Platforms 3U VPX boards with a floating point performance of 766 GFLOPS in less than 0.4 cubic feet. With configuration control and lifecycle management from a leading COTS supplier, these technologies are clearly ready for duty.


CFP: First International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS’10), Colocated with VLDB 2010

June 1st, 2010

The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Realtime, and XML/RDF Processing) using various processor architectures  (e.g., commodity and specialized Multi-core CPUs, Many-core GPUs, and FPGAs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory), and multicore programming strategies like OpenCL.

More information and the full call can be found here: http://www.adms-conf.org/

GPU Supercomputer #2 in Top500

May 31st, 2010

The June 2010 Top500 list of the world’s fastest supercomputers was released this week at ISC 2010. While the US Jaguar supercomputer (located at the Department of Energy’s Oak Ridge Leadership Computing Facility) retained the top spot in Linpack performance, a Chinese cluster called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVIDIA Tesla C2050 GPUs, is now the fastest in theoretical peak performance at 2.98 PFlop/s and No. 2 in Linpack performance at 1.271 PFlop/s. This is the highest rank a GPU-accelerated system, or a Chinese system, has ever achieved on the Top500 list.

For more information, visit www.TOP500.org.

PhD Thesis: Fast and Accurate Finite-Element Multigrid Solvers for PDE Simulations on GPU Clusters

May 30th, 2010

Abstract:

The main contribution of this thesis is to demonstrate that graphics processors (GPUs) as representatives of emerging many-core architectures are very well-suited for the fast and accurate solution of large sparse linear systems of equations, using parallel multigrid methods on heterogeneous compute clusters. Such systems arise for instance in the discretisation of (elliptic) partial differential equations with finite elements. We report on at least one order of magnitude speedup over highly-tuned conventional CPU implementations, without sacrificing either accuracy or functionality. In more detail, this thesis includes the following contributions:

Introductory Tutorial to OpenCL™ for HPC at SAAHPC’10

May 30th, 2010

AMD is offering an introductory tutorial to OpenCL™ that will be held alongside the 2010 Symposium on Application Accelerators in High Performance Computing (SAAHPC’10). The tutorial is a “programmer’s introduction” which covers the ideas behind OpenCL™ and their translation to source code.

“Believe it or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”

May 30th, 2010

Abstract:

In this work, we evaluate performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on a nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.

(Rajesh Bordawekar and Uday Bondhugula and Ravi Rao: “Believe It or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”. Technical Report RC24982, IBM Thomas J. Watson Research Center, Apr. 2010.)
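
For readers unfamiliar with the kind of computation the abstract refers to, the following is a minimal sketch of a direct 2-D cross-correlation in plain Python. It is not the report's implementation: the abstract does not give the exact formulation, so the function names (`cross_correlate`, `best_shift`), the equal-size square inputs, and the zero-overlap handling at the borders are all assumptions made for illustration. It does show where the O(n^4) cost comes from: roughly (2n-1)^2 shifts, each requiring a dot product over up to n^2 pixels.

```python
def cross_correlate(image, reference):
    """Direct 2-D cross-correlation of two n x n matrices (illustrative only).

    Entry [dy + n - 1][dx + n - 1] of the result is the dot product of
    `image` with `reference` shifted by (dy, dx), taken over their overlap.
    """
    n = len(image)
    size = 2 * n - 1
    result = [[0.0] * size for _ in range(size)]
    for dy in range(-(n - 1), n):          # (2n-1) vertical shifts
        for dx in range(-(n - 1), n):      # (2n-1) horizontal shifts
            acc = 0.0
            # Visit only pixels where the shifted reference overlaps the image.
            for y in range(max(0, dy), min(n, n + dy)):
                for x in range(max(0, dx), min(n, n + dx)):
                    acc += image[y][x] * reference[y - dy][x - dx]
            result[dy + n - 1][dx + n - 1] = acc
    return result


def best_shift(image, reference):
    """Return the (dy, dx) shift with the largest correlation value."""
    n = len(image)
    corr = cross_correlate(image, reference)
    size = 2 * n - 1
    _, i, j = max((corr[i][j], i, j) for i in range(size) for j in range(size))
    return i - (n - 1), j - (n - 1)


if __name__ == "__main__":
    reference = [[0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0]]
    # The same single bright pixel, moved one row down and one column right.
    image = [[0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0],
             [0.0, 0.0, 1.0]]
    print(best_shift(image, reference))
```

Each (dy, dx) shift is independent of the others, which is why a computation of this shape can be spread across threads or vector lanes on both CPUs and GPUs, as the abstract describes.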

ATI Stream Profiler v1.3 Released

May 20th, 2010

Advanced Micro Devices (AMD) recently released ATI Stream Profiler version 1.3. ATI Stream Profiler is a Microsoft® Visual Studio® integrated runtime profiler that gathers performance data from the GPU as your OpenCL™ application runs. This information can then be used by developers to discover where the bottlenecks are in their OpenCL™ application and find ways to optimize their application’s performance.

Features of the tool include:

  • Measure the execution time of an OpenCL kernel
  • Query the hardware performance counters on ATI Radeon graphics cards
  • Display the memory traffic from and to GPU
  • Compare multiple runs (sessions) of the same or different programs
  • Store the profile data for each run in a csv file
  • Display the IL and ISA (hardware disassembly) code of the OpenCL kernel

IPDPS 2011 CALL FOR PARTICIPATION

May 20th, 2010

Abstracts due…24 September 2010
Papers due…1 October 2010

Anchorage, home to moose, bears, birds and whales, is strategically located at almost equal flying distance from Europe, Asia and the Eastern USA. Embraced by six mountain ranges, with views of Mount McKinley in Denali National Park, and warmed by a maritime climate, the area offers year-round adventure, recreation, and sporting events. It is a fitting destination for IPDPS to mark a quarter century of tracking developments in computer science.  IPDPS serves as a forum for engineers and scientists from around the world to present their latest research findings in the fields of parallel processing and distributed computing. The five-day program will follow the usual format of contributed papers, invited speakers, and panels mid week, framed by workshops held on the first and last days.  To celebrate the 25th year of IPDPS, plan to come early and stay late and also enjoy a modern city surrounded by spectacular wilderness. For updates on IPDPS 2011, visit the Web at www.ipdps.org.
