Panoptes: A Binary Translation Framework for CUDA

May 22nd, 2012

Traditional CPU-based computing environments offer a variety of binary instrumentation frameworks. Instrumentation and analysis tools for GPU environments to date have been more limited. Panoptes is a binary instrumentation framework for CUDA that targets the GPU. By exploiting the GPU to run modified kernels, computationally intensive programs can be run at the native parallelism of the device during analysis. To demonstrate its instrumentation capabilities, we currently implement a memory addressability and validity checker that targets CUDA programs.

Panoptes traces targeted programs by library interposition at runtime. Interactions with the GPU are intercepted, annotated as necessary, and then sent to the actual CUDA library for execution on the device. This approach gives an analysis tool built on Panoptes a complete view of the state of the GPU without additional developer effort. In contrast, developer-added instrumentation may be incomplete due to errors of omission or may cause maintenance difficulties, particularly for large code bases.
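The interception idea can be pictured with a small Python analogy. This is only a sketch of the general pattern (intercept a call, record it, forward it to the real library), not Panoptes' actual mechanism, which interposes on the native CUDA library. The names `FakeDeviceLibrary` and `Interposer` are hypothetical and exist only for this illustration.

```python
class FakeDeviceLibrary:
    """Hypothetical stand-in for the real GPU runtime library."""

    def __init__(self):
        self.allocations = {}
        self._next_handle = 1

    def malloc(self, size):
        handle = self._next_handle
        self._next_handle += 1
        self.allocations[handle] = size
        return handle

    def free(self, handle):
        del self.allocations[handle]


class Interposer:
    """Forwards every call to the wrapped library and records it on the way."""

    def __init__(self, real_library):
        self._real = real_library
        self.trace = []   # (function name, positional args, result) per call
        self.live = {}    # handle -> size, the tool's own view of device memory

    def __getattr__(self, name):
        # Only reached for names not defined on the interposer itself.
        target = getattr(self._real, name)
        if not callable(target):
            return target

        def wrapper(*args, **kwargs):
            result = target(*args, **kwargs)   # forward to the real library
            self.trace.append((name, args, result))
            if name == "malloc":
                self.live[result] = args[0]
            elif name == "free":
                self.live.pop(args[0], None)
            return result

        return wrapper


lib = Interposer(FakeDeviceLibrary())
a = lib.malloc(256)
b = lib.malloc(64)
lib.free(a)
print(lib.live)   # {2: 64}
```

The calling code is unchanged apart from which object it talks to, which is why this style of tracing needs no instrumentation added by the developer.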

By directing annotated instructions to the GPU for execution rather than relying on the host for emulation, Panoptes is able to analyze programs at scale. The rift in parallel execution capabilities between modern GPUs and CPUs carries into testing and debugging as well. For computationally intensive tasks brought to the GPU explicitly for its parallelism, resorting to host-based emulation may necessitate reduced or simplified inputs for analysis. More details: http://github.com/ckennelly/panoptes

NVIDIA Kepler GK110 Architecture White Paper

May 20th, 2012

NVIDIA Kepler GK110 Die Shot

This white paper describes the new Kepler GK110 architecture from NVIDIA.

Comprising 7.1 billion transistors, Kepler GK110 is not only the fastest, but also the most architecturally complex microprocessor ever built. Adding many new innovative features focused on compute performance, GK110 was designed to be a parallel processing powerhouse for Tesla® and the HPC market.

Kepler GK110 will provide over 1 TFlop of double precision throughput with greater than 80% DGEMM efficiency versus 60-65% on the prior Fermi architecture.

In addition to greatly improved performance, the Kepler architecture offers a huge leap forward in power efficiency, delivering up to 3x the performance per watt of Fermi.

The paper describes features of the Kepler GK110 architecture, including:

  • Dynamic Parallelism;
  • Hyper-Q;
  • Grid Management Unit;
  • NVIDIA GPUDirect™;
  • New SHFL instruction and atomic instruction enhancements;
  • New read-only data cache path, previously accessible only to texture operations;
  • Bindless Textures;
  • and much more.
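As an illustration of what the SHFL instruction in the list above enables, the sketch below simulates a shuffle-down sum reduction across one 32-lane warp in plain Python. It is a host-side model under stated assumptions, not GPU code: `shfl_down` and `warp_reduce_sum` are hypothetical helper names, and the model assumes that a lane reading past the end of the warp gets its own value back.

```python
WARP_SIZE = 32


def shfl_down(values, delta):
    """Model of a shuffle-down: lane i reads the register of lane i + delta.

    Assumption: lanes whose source would fall outside the warp keep their
    own value.
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def warp_reduce_sum(values):
    """Sum one warp's values in log2(WARP_SIZE) shuffle steps.

    After the last step, lane 0 holds the total; no shared memory is modeled.
    """
    if len(values) != WARP_SIZE:
        raise ValueError("expected exactly one warp of values")
    lanes = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(lanes, offset)
        lanes = [own + other for own, other in zip(lanes, shifted)]
        offset //= 2
    return lanes[0]


print(warp_reduce_sum(list(range(32))))   # 496
```

The point of the pattern is that each step exchanges values directly between lanes, so the reduction finishes in five steps for a 32-lane warp.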

Benchmarking Analytical Queries on a GPU

May 20th, 2012

This report describes the advantages of using GPUs for analytical queries. It compares the performance of the Alenka database engine running on a GPU with that of Oracle on a SPARC server. More information on Alenka, including source code: https://github.com/antonmks/Alenka

5th Workshop on UnConventional High Performance Computing 2012

May 19th, 2012

Together with EuroPar-12, the 5th Workshop on UnConventional High Performance Computing 2012 (UCHPC 2012) will take place on August 27/28 on Rhodes Island, Greece. The workshop tries to capture solutions for HPC which are unconventional today but could become conventional and significant tomorrow. While GPGPU is already used a lot in HPC, there are still all kinds of issues around best exploitation and productivity for the programmer. Submission deadline: June 6, 2012. For more details, see http://www.lrr.in.tum.de/~weidendo/uchpc12

CUVILib v1.2 released

May 17th, 2012

TunaCode has released CUVILib v1.2, a library for accelerating imaging and computer vision applications in the medical, industrial and defense domains. It supports both CUDA and OpenCL. Modules include color operations (demosaic, conversions, correction, etc.), linear and non-linear filtering, feature extraction and tracking, motion estimation, image transforms and image statistics.

More information, including a free trial version: http://www.cuvilib.com/

CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform

May 11th, 2012

Abstract:

Motivation: New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed.

Results: We present CUSHAW, a parallelized short read aligner based on the compute unified device architecture (CUDA) parallel programming model. We exploit CUDA-compatible graphics hardware as accelerators to achieve fast speed. Our algorithm employs a quality-aware bounded search approach based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index to reduce the search space and achieve high alignment quality. Performance evaluation, using simulated as well as real short read datasets, reveals that our algorithm running on one or two graphics processing units (GPUs) achieves significant speedups in terms of execution time, while yielding comparable or even better alignment quality for paired-end alignments compared to three popular BWT-based aligners: Bowtie, BWA and SOAP2. CUSHAW also delivers competitive performance in terms of SNP calling for an E. coli test dataset.

Availability: http://cushaw.sourceforge.net.

(Y. Liu, B. Schmidt, D. Maskell: “CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform”, Bioinformatics, 2012. [DOI])
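For readers unfamiliar with the BWT and FM-index mentioned in the abstract, the following minimal Python sketch shows textbook exact-match backward search. It is not CUSHAW's algorithm: the quality-aware bounded search for inexact matches and the GPU parallelization are not reproduced, and the function names are hypothetical. The naive rotation sort is only suitable for tiny inputs.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    text += "$"   # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_string):
    """Return (C, occ) for backward search.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt_string[:i]
    """
    counts = {}
    for ch in bwt_string:
        counts[ch] = counts.get(ch, 0) + 1

    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    occ = {ch: [0] * (len(bwt_string) + 1) for ch in counts}
    for i, ch in enumerate(bwt_string):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Count exact occurrences of pattern by narrowing a suffix-array interval."""
    lo, hi = 0, n   # half-open interval [lo, hi) over sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "GATTACAGATTACA"
transformed = bwt(reference)
C, occ = build_fm_index(transformed)
print(count_occurrences("GATTACA", C, occ, len(transformed)))   # 2
print(count_occurrences("CAT", C, occ, len(transformed)))       # 0
```

Each pattern character costs a constant number of table lookups, so the search time depends on the read length rather than the genome length. That property is what makes the index attractive for mapping many short reads.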

Acceleware CUDA™ Training – Life Science Focus

May 2nd, 2012

Offered in partnership with NVIDIA and Microsoft, this four-day CUDA training course is designed for researchers and programmers in the life science industries who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the many-core processing capabilities of the GPU. It will be held in Boston, MA, on June 4-7, 2012. The course has a life science theme: commonly used algorithms such as Monte Carlo methods, FFT and filtering will be used and profiled in examples. The case study on day 4 focuses on the efficient implementation of a molecular dynamics simulation. More information: http://www.acceleware.com/jun4boston

Facing the Multicore Challenge III

April 27th, 2012

Submissions are cordially invited for MCC-III, to be held in Stuttgart, Germany, September 19-21. This conference is the third in a series that began in 2010 at the Heidelberg Academy of Sciences (HAW) in Heidelberg and continued in 2011 at the Karlsruhe Institute of Technology (KIT) and the Engineering Mathematics and Computing Lab (EMCL). It aims to combine new aspects of multi-/manycore microprocessor technologies, parallel applications, numerical simulation, software development and tools. Contributions are welcome from all participating disciplines. Particular emphasis is placed on the support and advancement of young scientists, in addition to high-quality invited keynote talks and tutorials. More information, including the full call for papers, topics of interest and submission instructions: http://www.multicore-challenge.org

OpenCL SDK for new Intel Core Processors

April 27th, 2012

The Intel® SDK for OpenCL Applications now supports the OpenCL 1.1 full profile on 3rd generation Intel® Core™ processors with Intel® HD Graphics 4000/2500. For the first time, OpenCL developers using Intel® architecture can utilize compute resources across both the Intel® processor and Intel® HD Graphics. More information: http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk

New Libra Platform version released

April 21st, 2012

Libra Platform is a GPGPU heterogeneous compute API and runtime environment available on Windows, Mac and Linux. The Libra Compute API offers performance portability and direct compute access from standard programming environments (C/C++, Java, C# and Matlab) to execute math operations on current and future compute architectures, including the latest GPUs and x86/x64 CPUs. It broadly supports compute devices compatible with low-level APIs: OpenCL, CUDA, OpenGL and standard x86/x64 compute APIs.

Read more in the full announcement.
