Workshop on Parallel Satisfiability Solving on New Architectures

May 31st, 2009

Over the last decade, the fundamental Satisfiability Problem (SAT) has been studied extensively. Community interest continues to grow because of the problem's conceptual simplicity and its ability to describe a wide range of problems, including hardware verification, planning, and automated reasoning. Consequently, industry demand for high-performance SAT-solving algorithms is increasing. Despite the current trend in processor development from single-core to multicore CPUs, few parallel solving approaches for SAT target shared-memory architectures.

This workshop will focus on SAT and beyond-SAT solving techniques that exploit parallelism in emerging massively multithreaded and multicore architectures. Graphics Processing Units (GPUs) have recently evolved to support general-purpose computation. The workshop will particularly focus on the use of GPUs and coprocessor computing techniques to overcome traditional barriers to parallelization.

The workshop invites papers in this emerging discipline, including, but not limited to, the following areas of interest:

  • Satisfiability Solving Using Shared Memory
  • General-Purpose Computation on GPUs (GPGPU) for SAT
  • Reconfigurable Computing and FPGA for SAT
  • Parallel SAT, MaxSAT, #SAT and QBF pre-processing

For more information visit the workshop website.

MachStudio Pro uses GPU to enable real-time 3D workflows

May 31st, 2009

The new MachStudio Pro is stand-alone visualization and rendering software that uses multi-threaded GPGPU computing to let 3D artists create and manipulate lights, materials, and HDR cameras in a real-time, non-linear workflow environment with film-quality results.

MemtestG80: A Memory and Logic Tester for NVIDIA CUDA-enabled GPUs

May 25th, 2009

MemtestG80 is a software-based tester to test for “soft errors” in GPU memory or logic for NVIDIA CUDA-enabled GPUs. It uses a variety of proven test patterns (some custom and some based on Memtest86) to verify the correct operation of GPU memory and logic. It is a useful tool to ensure that given GPUs do not produce “silent errors” which may corrupt the results of a computation without triggering an overt error.

Precompiled binaries for Windows, Linux and OSX, as well as the source code, are available for download under the LGPL license. MemtestG80 is developed by Imran Haque and Vijay Pande.
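To illustrate what a pattern-based memory test does, here is a minimal host-side sketch of a "moving inversions" style pass, a pattern family known from Memtest86. This is not MemtestG80's code and it does not run on the GPU. The names `moving_inversions` and `StuckBitMemory` are hypothetical, and the stuck-bit fault model is an assumption made only so the example has something to detect.

```python
class StuckBitMemory:
    """Hypothetical faulty memory: one byte has bits that always read as 1."""

    def __init__(self, size, bad_index, stuck_mask):
        self._data = bytearray(size)
        self._bad_index = bad_index
        self._stuck_mask = stuck_mask

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        value = self._data[i]
        # Simulate the fault: the stuck bits are forced to 1 on read.
        return value | self._stuck_mask if i == self._bad_index else value

    def __setitem__(self, i, value):
        self._data[i] = value


def moving_inversions(mem, pattern=0x55):
    """Return the indices where a read-back did not match what was written."""
    inverse = pattern ^ 0xFF
    errors = []
    n = len(mem)

    # Fill the whole buffer with the pattern.
    for i in range(n):
        mem[i] = pattern

    # Ascending pass: verify the pattern, then write its inverse.
    for i in range(n):
        if mem[i] != pattern:
            errors.append(i)
        mem[i] = inverse

    # Descending pass: verify the inverse, then write the pattern back.
    for i in reversed(range(n)):
        if mem[i] != inverse:
            errors.append(i)
        mem[i] = pattern

    return errors


healthy = moving_inversions(bytearray(64))
faulty = moving_inversions(StuckBitMemory(64, bad_index=10, stuck_mask=0x02))
print(healthy, faulty)
```

A real tester would run many patterns over device memory. The point of the sketch is only that each location is written, read back, and compared, so a corrupted value is reported instead of silently propagating into results.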

GPUmat: GPU toolbox for MATLAB

May 25th, 2009

GPUmat, developed by the GP-You Group, allows Matlab code to benefit from the compute power of modern GPUs. It is built on top of NVIDIA CUDA. The acceleration is transparent to the user: only the declaration of variables needs to be changed, using new GPU-specific keywords, and algorithms need not be changed. A wide range of standard Matlab functions has been implemented. GPUmat is available as freeware for Windows and Linux from the GP-You download page.

GPU-accelerated Monte Carlo simulation of the 2D and 3D Ising model

May 12th, 2009

Abstract:

The compute unified device architecture (CUDA) is a programming approach for performing scientific calculations on a graphics processing unit (GPU) as a data-parallel computing device. The programming interface allows to implement algorithms using extensions to standard C language. With continuously increased number of cores in combination with a high memory bandwidth, a recent GPU offers incredible resources for general purpose computing. First, we apply this new technology to Monte Carlo simulations of the two dimensional ferromagnetic square lattice Ising model. By implementing a variant of the checkerboard algorithm, results are obtained up to 60 times faster on the GPU than on a current CPU core. An implementation of the three dimensional ferromagnetic cubic lattice Ising model on a GPU is able to generate results up to 35 times faster than on a current CPU core. As proof of concept we calculate the critical temperature of the 2D and 3D Ising model using finite size scaling techniques. Theoretical results for the 2D Ising model and previous simulation results for the 3D Ising model can be reproduced.

The paper is available, as well as CUDA source code for the 2D Ising model.

[Tobias Preis, Peter Virnau, Wolfgang Paul, and Johannes J. Schneider. "GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model". Journal of Computational Physics 228, 4468-4477 (2009)]
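The checkerboard idea mentioned in the abstract can be sketched on the CPU in plain Python. This is not the authors' CUDA code or their specific variant. It is a minimal serial illustration that assumes a coupling constant J = 1, no external field, periodic boundaries, and an even lattice size. The function name `checkerboard_sweep` is hypothetical.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of an L x L Ising lattice (L even, J = 1).

    Sites are split into two "colors" like a checkerboard. All neighbors of
    a site have the opposite color, so every site of one color could be
    updated at the same time; here the loop is serial.
    """
    size = len(spins)
    for color in (0, 1):
        for i in range(size):
            for j in range(size):
                if (i + j) % 2 != color:
                    continue
                # Sum of the four nearest neighbors with periodic wrap-around.
                neighbors = (
                    spins[(i + 1) % size][j]
                    + spins[(i - 1) % size][j]
                    + spins[i][(j + 1) % size]
                    + spins[i][(j - 1) % size]
                )
                # Energy change if this spin were flipped.
                delta_e = 2 * spins[i][j] * neighbors
                # Metropolis acceptance rule.
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]


rng = random.Random(1)
lattice = [[1] * 8 for _ in range(8)]
for _ in range(5):
    checkerboard_sweep(lattice, beta=10.0, rng=rng)
magnetization = sum(map(sum, lattice)) / 64
print(magnetization)
```

Because same-colored sites never neighbor each other, the two inner loops for one color contain no data dependencies. That independence is what makes the scheme a natural fit for one-thread-per-site execution on a GPU.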

University of Melbourne Workshop: High-Performance GPU Computing with NVIDIA CUDA

May 12th, 2009

A half-day workshop and discussion forum will be held from 8:45-13:00, Wednesday May 27, in Lecture Theatre 3 of the Alan Gilbert Building at The University of Melbourne, Victoria, Australia. A light lunch will be supplied afterwards from 13:00-14:00. With speakers from NVIDIA and Xenon Systems, this workshop is hosted by the ARC Centre of Excellence for Mathematics and Statistics of Complex Systems (MASCOS), and the Department of Mathematics and Statistics at the University of Melbourne.

Due to recent advances in GPU hardware and software, so-called general-purpose GPU computing (GPGPU) is rapidly expanding from niche applications to the mainstream of high-performance computing. For HPC researchers, hardware gains have increased the imperative to learn this new computing paradigm, while high-level programming languages (in particular, CUDA) have lowered the barrier to entry, so that new developers can now rapidly port suitable applications from C/C++ running on CPUs to CUDA running on GPUs. For appropriate applications, GPUs have significant, even dramatic, advantages over CPUs in terms of both Dollars/FLOPS and Watts/FLOPS.

For more information see the workshop announcement.

Barra: A Modular Functional GPU Simulator

May 4th, 2009

Barra, developed by Sylvain Collange, Marc Daumas, David Defour and David Parello from Université de Perpignan, simulates CUDA programs at the assembly language level (NVIDIA PTX ISA). Its ultimate goal is to provide a 100% bit-accurate simulation, offering bug-for-bug compatibility with NVIDIA G80-based GPUs. It works directly with CUDA executables; neither source modification nor recompilation is required. Barra is primarily intended as a tool for research on computer architecture, although it can also be used to debug, profile and optimize CUDA programs at the lowest level. For more details and downloads, see the Barra wiki. A technical report is also available.

Analyzing CUDA Workloads Using a Detailed GPU Simulator

May 4th, 2009

From the abstract:

Modern GPUs provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow’s manycore processors, whether those are GPUs or otherwise. The combination of multiple, multithreaded, SIMD cores makes studying these GPUs useful in understanding tradeoffs among memory, data, and thread level parallelism. While modern GPUs offer orders of magnitude more raw computing power than contemporary CPUs, many important applications, even those with abundant data-level parallelism, do not achieve peak performance. This paper characterizes several non-graphics applications written in NVIDIA’s CUDA programming model by running them on a novel detailed microarchitecture performance simulator that runs NVIDIA’s parallel thread execution (PTX) virtual instruction set. For this study, we selected twelve non-trivial CUDA applications demonstrating varying levels of performance improvement on GPU hardware (versus a CPU-only sequential version of  the application). We study the performance of these applications on our GPU performance simulator with configurations comparable to contemporary high-end graphics cards. We characterize the performance impact of several microarchitecture design choices including choice of interconnect topology, use of caches, design of memory controller, parallel workload distribution mechanisms, and memory request coalescing hardware. Two observations we make are (1) that for the applications we study, performance is more sensitive to interconnect bisection bandwidth rather than latency, and (2) that, for some applications, running fewer threads concurrently than on-chip resources might otherwise allow can improve performance by reducing contention in the memory system.

Ali Bakhoda, George L. Yuan, Wilson W.L. Fung, Henry Wong and Tor M. Aamodt: Analyzing CUDA Workloads Using a Detailed GPU Simulator, 2009 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). (GPGPU-Sim website)

University of Western Australia GPU Computing Workshop

April 29th, 2009

A GPU computing workshop and discussion forum will be held at the UWA University Club on Thursday, May 7th. The workshop aims to provide a detailed introduction to GPU computing with CUDA and NVIDIA Tesla computing solutions, and to present research in GPU and heterogeneous computing being undertaken in Western Australia.

Mark Harris (NVIDIA) will present an introduction to the CUDA architecture, programming model, and the programming environment of C for CUDA, as well as an overview of the Tesla GPU architecture, a live programming demo, and strategies for optimizing CUDA applications for the GPU. To better enable the uptake of this technology, Dragan Dimitrovici from Xenon Systems will provide an overview of CUDA-enabled hardware options. The workshop will also include brief presentations of some of the projects using CUDA within Western Australia, including a presentation from Professor Karen Haines (WASP@UWA) on parallel computing strategies required for optimizing applications for GPU and heterogeneous computing.

Please see the workshop flyer for full details.

Fast and Scalable List Ranking on the GPU

April 28th, 2009

Abstract from the paper by Rehman et al.:

General purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest performance per dollar. While GPUs are usually used to tackle regular problems that can be easily parallelized, we describe two implementations of List Ranking—a traditional irregular algorithm that is difficult to parallelize on such massively multi-threaded hardware. In our best implementation, we introduce a GPU-optimized, recursive version of the Helman-JaJa algorithm. Our implementation can rank a random list of 8 million elements in just over 100 milliseconds, and achieves a speedup of about 8-9 over a CPU implementation as well as a speedup of 3-4 over the best reported implementation on the Cell Broadband Engine. We also discuss some practical issues that come to the fore when working with massively multi-threaded architectures, especially for algorithms with highly irregular memory access patterns. (M. Suhail Rehman, K. Kothapalli, P.J. Narayanan. Fast and Scalable List Ranking on the GPU. 23rd International Conference on Supercomputing (ICS). New York, USA, June 2009. (To Appear))
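For readers unfamiliar with list ranking, the following is a minimal serial sketch of the splitter-based structure used by Helman-JaJa style algorithms. It is not the paper's GPU implementation and it is not recursive: the splitter list is ranked with a plain sequential walk. The function name `list_rank`, the successor-array representation with `-1` as the end marker, and the random choice of splitters are assumptions made for illustration.

```python
import random


def list_rank(succ, head, num_splitters=4, seed=0):
    """Return rank[i] = distance of node i from the head of a linked list.

    succ[i] is the index of the node after i, or -1 at the tail.
    The list is assumed to be non-empty and to contain every index once.
    """
    n = len(succ)
    rng = random.Random(seed)

    # Step 1: choose splitters; the head is always one of them.
    others = [i for i in range(n) if i != head]
    extra = min(max(num_splitters - 1, 0), len(others))
    splitters = [head] + rng.sample(others, extra)
    is_splitter = [False] * n
    for s in splitters:
        is_splitter[s] = True

    # Step 2: walk each sublist independently (one thread per splitter
    # on a parallel machine), recording local ranks and sublist lengths.
    local = [0] * n
    owner = [0] * n
    length = {}
    next_splitter = {}
    for s in splitters:
        node, steps = s, 0
        while True:
            local[node] = steps
            owner[node] = s
            steps += 1
            node = succ[node]
            if node == -1 or is_splitter[node]:
                break
        length[s] = steps
        next_splitter[s] = node  # -1 when this sublist ends the list

    # Step 3: rank the much shorter list of splitters (prefix sum of lengths).
    offset = {}
    s, total = head, 0
    while s != -1:
        offset[s] = total
        total += length[s]
        s = next_splitter[s]

    # Step 4: combine the splitter offset with the local rank.
    return [offset[owner[i]] + local[i] for i in range(n)]


# List order: 3 -> 0 -> 4 -> 1 -> 2
print(list_rank([4, 2, -1, 0, 1], head=3, num_splitters=2))
```

The irregular memory access the abstract refers to shows up in step 2: each walk follows `succ` pointers to arbitrary positions in the array, so neighboring threads rarely touch neighboring memory.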
