New NVIDIA Research & Certification Programs for CUDA/GPGPU

June 1st, 2010

At the ISC 2010 conference in Hamburg, Germany, this week, NVIDIA announced new programs for the growing CUDA/GPGPU developer community:

  • CUDA Certification Program – Driven by demand for qualified GPGPU engineers, this is the first program to certify expertise in massively parallel programming on GPUs.
  • CUDA Research Centers – Recognizes institutions that embrace GPU Computing across multiple research fields.
  • CUDA Teaching Centers – Recognizes institutions that have integrated GPU Computing techniques into their mainstream computer programming curriculum.

These programs complement the existing CUDA Center of Excellence program, which has recognized 10 premier institutions around the world. More details are available here: http://www.nvidia.com/object/io_1275409333119.html

CFP: First International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS’10), Colocated with VLDB 2010

June 1st, 2010

The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Realtime, and XML/RDF Processing) using various processor architectures (e.g., commodity and specialized Multi-core CPUs, Many-core GPUs, and FPGAs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory), and multicore programming strategies like OpenCL.

More information and the full call can be found here: http://www.adms-conf.org/


PhD Thesis: Fast and Accurate Finite-Element Multigrid Solvers for PDE Simulations on GPU Clusters

May 30th, 2010

Abstract:

The main contribution of this thesis is to demonstrate that graphics processors (GPUs), as representatives of emerging many-core architectures, are very well suited for the fast and accurate solution of large sparse linear systems of equations, using parallel multigrid methods on heterogeneous compute clusters. Such systems arise, for instance, in the discretisation of (elliptic) partial differential equations with finite elements. We report speedups of at least one order of magnitude over highly-tuned conventional CPU implementations, without sacrificing either accuracy or functionality. In more detail, this thesis includes the following contributions:
Read the rest of this entry »
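The abstract does not spell out how a multigrid solver works. As a rough illustration only, the sketch below is a textbook geometric multigrid V-cycle for the 1D Poisson problem -u'' = f with zero boundary values. It uses finite differences rather than the finite elements of the thesis and runs serially on the CPU. The grid size `N` (a power of two), the weighted-Jacobi smoother with omega = 2/3, the three pre- and post-smoothing sweeps, and the function names are all assumptions chosen for brevity; none of it is taken from the thesis.

```c
#include <stdlib.h>
#include <string.h>

/* Geometric multigrid V-cycle for -u'' = f on (0,1) with u(0) = u(1) = 0.
   Arrays hold N + 1 grid values (N intervals, spacing 1/N); entries 0 and N
   are boundary values and stay zero. N must be a power of two >= 2. */

/* Weighted Jacobi (omega = 2/3) smoothing sweeps on the 3-point stencil. */
static void smooth(double *u, const double *f, int N, int sweeps) {
    const double h2 = 1.0 / ((double)N * N), omega = 2.0 / 3.0;
    double *old = malloc((N + 1) * sizeof *old);
    for (int s = 0; s < sweeps; ++s) {
        memcpy(old, u, (N + 1) * sizeof *old);
        for (int i = 1; i < N; ++i)
            u[i] = (1.0 - omega) * old[i]
                 + omega * 0.5 * (old[i - 1] + old[i + 1] + h2 * f[i]);
    }
    free(old);
}

/* r = f - A u, where (A u)_i = (2 u_i - u_{i-1} - u_{i+1}) / h^2. */
static void residual(const double *u, const double *f, double *r, int N) {
    const double inv_h2 = (double)N * N;
    r[0] = r[N] = 0.0;
    for (int i = 1; i < N; ++i)
        r[i] = f[i] - inv_h2 * (2.0 * u[i] - u[i - 1] - u[i + 1]);
}

void vcycle(double *u, const double *f, int N) {
    if (N == 2) {                      /* coarsest grid: one unknown, solve exactly */
        u[1] = 0.5 * f[1] / ((double)N * N);
        return;
    }
    const int Nc = N / 2;
    double *r  = malloc((N + 1) * sizeof *r);
    double *fc = calloc(Nc + 1, sizeof *fc);   /* coarse right-hand side */
    double *ec = calloc(Nc + 1, sizeof *ec);   /* coarse error, zero initial guess */

    smooth(u, f, N, 3);                        /* pre-smoothing */
    residual(u, f, r, N);
    for (int j = 1; j < Nc; ++j)               /* full-weighting restriction */
        fc[j] = 0.25 * (r[2 * j - 1] + 2.0 * r[2 * j] + r[2 * j + 1]);

    vcycle(ec, fc, Nc);                        /* recursive coarse-grid correction */

    for (int j = 0; j < Nc; ++j) {             /* linear-interpolation prolongation */
        u[2 * j]     += ec[j];
        u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1]);
    }
    smooth(u, f, N, 3);                        /* post-smoothing */

    free(r); free(fc); free(ec);
}
```

For example, with `N = 16` and `f` equal to 1 everywhere, repeated calls to `vcycle` drive `u[8]` toward 0.125, the value of the exact solution x(1 - x)/2 at x = 0.5 (the three-point stencil reproduces quadratics exactly). Every smoothing, residual, restriction and prolongation loop is independent per grid point, which is the kind of fine-grained parallelism a GPU implementation can exploit.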

“Believe it or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”

May 30th, 2010

Abstract:

In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an NVIDIA GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the NVIDIA GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.

(Rajesh Bordawekar and Uday Bondhugula and Ravi Rao: “Believe It or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”. Technical Report RC24982, IBM Thomas J. Watson Research Center, Apr. 2010.)
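To make the O(n^4) cost concrete, here is a naive full 2D cross-correlation of two n × n single-precision images in C: one dot product over the overlapping region for each of the (2n - 1)^2 relative shifts, each costing up to n^2 multiply-adds. The output layout and the function name `xcorr2d` are hypothetical; the report's actual kernel, its use of atomics, and its SSE/VSX and CUDA variants are not reproduced here.

```c
/* Full 2D cross-correlation of two n x n single-precision images.
   out has (2n-1) x (2n-1) entries; out[(dy+n-1)*(2n-1) + (dx+n-1)] is the
   dot product of img with ref shifted by (dy, dx) over their overlap. */
void xcorr2d(const float *img, const float *ref, float *out, int n) {
    const int m = 2 * n - 1;
    for (int dy = -(n - 1); dy <= n - 1; ++dy) {
        for (int dx = -(n - 1); dx <= n - 1; ++dx) {
            /* Rows/cols i, j with both (i, j) and (i+dy, j+dx) inside the image. */
            const int i0 = dy < 0 ? -dy : 0, i1 = dy > 0 ? n - dy : n;
            const int j0 = dx < 0 ? -dx : 0, j1 = dx > 0 ? n - dx : n;
            float acc = 0.0f;
            for (int i = i0; i < i1; ++i)
                for (int j = j0; j < j1; ++j)
                    acc += img[i * n + j] * ref[(i + dy) * n + (j + dx)];
            out[(dy + n - 1) * m + (dx + n - 1)] = acc;
        }
    }
}
```

The two outer loops over shifts are independent of each other and would be natural candidates for thread-level (task) parallelism, while the inner dot products are the part that maps onto short-vector SIMD such as SSE or VSX.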

IPDPS 2011 CALL FOR PARTICIPATION

May 20th, 2010

Abstracts due: 24 September 2010
Papers due: 1 October 2010

Anchorage, home to moose, bears, birds and whales, is strategically located at almost equal flying distance from Europe, Asia and the Eastern USA. Embraced by six mountain ranges, with views of Mount McKinley in Denali National Park, and warmed by a maritime climate, the area offers year-round adventure, recreation, and sporting events. It is a fitting destination for IPDPS to mark a quarter century of tracking developments in computer science. IPDPS serves as a forum for engineers and scientists from around the world to present their latest research findings in the fields of parallel processing and distributed computing. The five-day program will follow the usual format of contributed papers, invited speakers, and panels midweek, framed by workshops held on the first and last days. To celebrate the 25th year of IPDPS, plan to come early and stay late and enjoy a modern city surrounded by spectacular wilderness. For updates on IPDPS 2011, visit www.ipdps.org.

Submit GTC 2010 Proposals by June 1

May 20th, 2010

The GPU Technology Conference (GTC 2010) will be held Sept. 20-23, 2010 in San Jose, Calif. Developers, researchers, scientists and entrepreneurs are invited to submit proposals on GPU-related topics. See www.nvidia.com/gtc.

  • GPU Developers Summit, session topics: deadline June 1, 2010
  • Emerging Companies Summit, “CEO on Stage” nominations: deadline August 1, 2010
  • NVIDIA Research Summit, posters: deadline August 15, 2010

To submit a proposal, you will be asked to set up a GTC 2010 account so you can track the status of your submission.

Submission guidelines: www.nvidia.com/object/call_for_submissions.html
Join GTC 2010 mailing list: www.nvidia.com/object/email_updates.html

GPU Random Numbers via the Tiny Encryption Algorithm

May 20th, 2010

Abstract:

Random numbers are used extensively on the GPU. As more computation is ported to the GPU, it can no longer be treated as rendering hardware alone. Random number generators (RNGs) are expected to cater to general-purpose and graphics applications alike. Such diversity adds to the requirements expected of an RNG. A good GPU RNG should be able to provide repeatability, random access, multiple independent streams, speed, and random numbers free from detectable statistical bias. A specific application may require some, if not all, of these characteristics at once. In particular, we hypothesize that not all algorithms need the highest-quality random numbers, so a good GPU RNG should provide a speed/quality tradeoff that can be tuned for fast, low-quality or slower, high-quality random numbers.

We propose that the Tiny Encryption Algorithm satisfies all of the requirements of a good GPU pseudorandom number generator. We compare our technique against previous approaches, and present an evaluation using standard randomness test suites as well as Perlin noise and a Monte-Carlo shadow algorithm. We show that the quality of random number generation directly affects the quality of the noise produced; however, good-quality noise can still be produced with a lower-quality random number generator.

(Fahad Zafar, Aaron Curtis and Marc Olano, “GPU Random Numbers via the Tiny Encryption Algorithm”, HPG 2010: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on High Performance Graphics, Saarbrücken, Germany, June 2010. Link to preprint.)
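A minimal sketch of the idea in C, assuming a counter-based design: a stream index and a per-stream counter form the 64-bit plaintext block, and standard TEA encrypts it under a fixed key. Because the output depends only on (stream, counter), this gives repeatability, random access and independent streams, and since every TEA cycle is invertible, distinct inputs always give distinct 64-bit outputs. The key constants, the function names `tea_rng` and `to_unit_float`, and the choice of inputs are assumptions, not the paper's code. The `cycles` parameter is the speed/quality knob: the full cipher uses 32 cycles, and fewer cycles are faster but statistically weaker.

```c
#include <stdint.h>

/* Assumed 128-bit key: any fixed constants work for RNG use. */
static const uint32_t TEA_KEY[4] = {0xA341316Cu, 0xC8013EA4u, 0xAD90777Du, 0x7E95761Eu};

/* Counter-based RNG: encrypt the block (stream, counter) with `cycles`
   TEA cycles. The full cipher uses 32 cycles; fewer is faster but weaker.
   Unsigned arithmetic wraps mod 2^32, as TEA requires. */
uint64_t tea_rng(uint32_t stream, uint32_t counter, int cycles) {
    const uint32_t delta = 0x9E3779B9u;
    uint32_t v0 = stream, v1 = counter, sum = 0;
    for (int i = 0; i < cycles; ++i) {
        sum += delta;
        v0 += ((v1 << 4) + TEA_KEY[0]) ^ (v1 + sum) ^ ((v1 >> 5) + TEA_KEY[1]);
        v1 += ((v0 << 4) + TEA_KEY[2]) ^ (v0 + sum) ^ ((v0 >> 5) + TEA_KEY[3]);
    }
    return ((uint64_t)v0 << 32) | v1;
}

/* Map the top 24 bits of a 32-bit word to a float in [0, 1). */
float to_unit_float(uint32_t x) {
    return (float)(x >> 8) * (1.0f / 16777216.0f);
}
```

On a GPU, each thread could call something like `tea_rng(pixel_id, frame, 8)` (hypothetical names) to get its own reproducible stream without storing any generator state; which cycle count is adequate for a given application is exactly the tradeoff the paper evaluates.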

HOOMD-blue 0.9.0 released

May 20th, 2010

HOOMD-blue stands for Highly Optimized Object-oriented Many-particle Dynamics — Blue Edition. It performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to dozens of processor cores on a fast cluster.

HOOMD-blue 0.9.0 is a major new release. Highlights include:

  • Support for Fermi generation GPUs
  • Performance enhancements
  • New pair potentials
  • Particle data is now accessible from hoomd scripts
  • Binary format dump files for simulation restarts
  • Numerous small enhancements to enable easily restartable jobs
  • 2D simulations are now possible
  • Integration methods can now be applied to specified groups of particles
  • All IMD commands issued by VMD are now understood
  • and more

HOOMD-blue 0.9.0 is available for download under an open source license.

Simulation and Visualization of the Saint-Venant System using GPUs

May 13th, 2010

Abstract:

We consider three high-resolution schemes for computing shallow-water waves as described by the Saint-Venant system and discuss how to develop highly efficient implementations using graphics processing units (GPUs). The schemes are well-balanced for lake-at-rest problems, handle dry states, and support linear friction models. The first two schemes handle dry states by switching variables in the reconstruction step, so that bilinear reconstructions are computed using physical variables for small water depths and conserved variables elsewhere. In the third scheme, reconstructed slopes are modified in cells containing dry zones to ensure non-negative values at integration points. We discuss how single- and double-precision arithmetic affect accuracy and efficiency, scalability and resource utilization for our implementations, and demonstrate that all three schemes map very well to current GPU hardware. We have also implemented direct and close-to-photo-realistic visualization of simulation results on the GPU, giving visual simulations with interactive speeds for reasonably-sized grids.

(A. R. Brodtkorb, T. R. Hagen, K.-A. Lie and J. R. Natvig: “Simulation and Visualization of the Saint-Venant System using GPUs”. In review, February 2010. Link to PDF preprint, YouTube video)
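The slope modification used for dry zones in the third scheme can be illustrated with a 1D sketch. Below, the water surface elevation `w` is reconstructed with a minmod-limited slope; if the resulting face value would fall below the bottom elevation, the slope is changed so that face sits exactly on the bottom while the cell average is preserved. This follows the style of the Kurganov-Petrova correction; treating it as the paper's exact formulation is an assumption, as are the function names, the face-based bottom array `Bf`, and the zero-slope boundary cells.

```c
#include <math.h>

/* minmod limiter: zero if the slopes disagree in sign, else the smaller one. */
static double minmod(double a, double b) {
    if (a * b <= 0.0) return 0.0;
    return fabs(a) < fabs(b) ? a : b;
}

/* 1D reconstruction for n cells of width dx.
   w[i]  : cell-averaged surface elevation, with w[i] >= (Bf[i] + Bf[i+1]) / 2.
   Bf[k] : bottom elevation at face k (n + 1 entries, linear within a cell).
   Writes the water depth at the left and right faces of cell i to hl[i], hr[i]. */
void reconstruct_depth(const double *w, const double *Bf,
                       double *hl, double *hr, int n, double dx) {
    for (int i = 0; i < n; ++i) {
        double s = 0.0;                          /* boundary cells: zero slope */
        if (i > 0 && i < n - 1)
            s = minmod((w[i] - w[i - 1]) / dx, (w[i + 1] - w[i]) / dx);
        double wl = w[i] - 0.5 * dx * s;
        double wr = w[i] + 0.5 * dx * s;
        /* Dry-zone fix: pin the offending face to the bottom and mirror the
           other face so that (wl + wr) / 2 still equals the cell average. */
        if (wr < Bf[i + 1]) {
            wr = Bf[i + 1];
            wl = 2.0 * w[i] - Bf[i + 1];
        } else if (wl < Bf[i]) {
            wl = Bf[i];
            wr = 2.0 * w[i] - Bf[i];
        }
        hl[i] = wl - Bf[i];
        hr[i] = wr - Bf[i + 1];
    }
}
```

Since the two face values always sum to 2w[i] and w[i] is at least the cell-centre bottom (Bf[i] + Bf[i+1])/2, at most one face can fall below the bottom, and after the correction both face depths are non-negative.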

State-of-the-Art in Heterogeneous Computing

May 13th, 2010

Abstract:

Node-level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.

(A. R. Brodtkorb, C. Dyken, T. R. Hagen, J. M. Hjelmervik and O. O. Storaasli: “State-of-the-Art in Heterogeneous Computing”, Scientific Programming, IOS Press, 18(1) (2010), pp. 1-33. Link to PDF)
