CfP: High Performance Computational Systems Biology

February 8th, 2010

The HiBi workshop establishes a forum to link researchers in the areas of parallel computing and computational systems biology. One of the main limitations in managing models of biological systems comes from the fundamental difference between the high parallelism evident in biochemical reactions and the sequential environments employed for the analysis of these reactions. Such limitations affect all varieties of continuous, deterministic, discrete and stochastic models, undermining the applicability of simulation techniques and the analysis of biological models. The goal of HiBi is therefore to bring together researchers in the fields of high performance computing and computational systems biology. Experts from around the world will present their current work and discuss profound challenges, new ideas, results, applications and their experience relating to key aspects of high performance computing in biology.

Topics of interest include, but are not limited to:

  • Parallel stochastic simulation
  • Biological and numerical parallel computing
  • Parallel and distributed architectures
  • Emerging processing architectures: Cell processors, GPUs, mixed CPU-FPGA, etc.
  • Parallel model checking techniques
  • Parallel parameter estimation
  • Parallel algorithms for biological analysis
  • Application of concurrency theory to biology
  • Parallel visualization algorithms
  • Web-services and Internet computing for e-Science
  • Tools and applications

More Information: http://www.cosbi.eu/hibi2010/

CfP: Symposium on chemical computations on GP-GPUs

February 8th, 2010

The symposium will provide technical presentations from the companies advancing the development of GPUs, discussions of the challenges involved in effectively programming GPUs, and presentations on the use of GPUs in a range of chemical applications.

The deadline for submissions is 04/05/2010, and more information can be found at http://illinois.edu/lb/article/2101/33709.

CfP: High Performance Graphics 2010

February 7th, 2010

High-Performance Graphics 2010 continues last year’s success at synthesizing two important and cutting-edge topics in computer graphics, the previous Graphics Hardware and Interactive Ray Tracing conferences. The scope of the conference is the overarching field of performance-oriented graphics systems, covering innovative algorithms, efficient implementations, and hardware architecture. This broader focus offers a common forum bringing together researchers, engineers, and architects to discuss the complex interactions of massively parallel hardware, novel programming models, efficient graphics algorithms, and innovative applications.

The program features three days of paper and industry presentations, with ample time for discussions during breaks, lunches, and the conference banquet. The conference, which will take place on June 25-27, is co-located with the Eurographics Rendering Symposium on the campus of the Max-Planck Institut Informatik, Saarland University, Saarbrücken, Germany.

Original and innovative performance-oriented contributions are invited from all areas of graphics, including hardware architectures, rendering, physics, animation, AI, simulation, and data structures, with topics including (but not limited to):

  • New graphics hardware architectures
  • Rendering architectures and algorithms
  • Parallel computing for graphics (including GPU Computing)
  • Algorithmic foundations
  • Languages and compilation

The conference website with additional information is located at http://www.highperformancegraphics.org.

Triangular matrix inversion on Graphics Processing Unit

February 6th, 2010

Abstract:

Dense matrix inversion is a basic procedure in many linear algebra algorithms. A computationally arduous step in most dense matrix inversion methods is the inversion of triangular matrices as produced by factorization methods such as LU decomposition. In this paper, we demonstrate how triangular matrix inversion (TMI) can be accelerated considerably by using commercial Graphics Processing Units (GPU) in a standard PC. Our implementation is based on a divide and conquer type recursive TMI algorithm, efficiently adapted to the GPU architecture. Our implementation obtains a speedup of 34x versus a CPU-based LAPACK reference routine, and runs at up to 54 gigaflops/s on a GTX 280 in double precision. Limitations of the algorithm are discussed, and strategies to cope with them are introduced. In addition, we show how inversion of an L- and U-matrix can be performed concurrently on a GTX 295 based dual-GPU system at up to 90 gigaflops/s.

(Florian Ries, Tommaso De Marco, Matteo Zivieri and Roberto Guerrieri, Triangular Matrix Inversion on Graphics Processing Units, Supercomputing 2009, DOI 10.1145/1654059.1654069)
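
For readers unfamiliar with the divide-and-conquer idea mentioned in the abstract, the following is a minimal CPU-only sketch in plain Python. It is not the authors' GPU implementation. It only illustrates the standard block recursion for a lower-triangular matrix L = [[A, 0], [C, B]], whose inverse is [[A⁻¹, 0], [-B⁻¹ C A⁻¹, B⁻¹]]. The function names `matmul` and `invert_lower` are hypothetical and chosen for this example.

```python
def matmul(x, y):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def invert_lower(l):
    """Invert a non-singular lower-triangular matrix by block recursion.

    Illustrative sketch only: splits L into blocks [[A, 0], [C, B]],
    inverts A and B recursively, and fills the off-diagonal block
    with -B^-1 * C * A^-1.
    """
    n = len(l)
    if n == 1:
        return [[1.0 / l[0][0]]]
    h = n // 2
    a = [row[:h] for row in l[:h]]   # top-left block (triangular)
    c = [row[:h] for row in l[h:]]   # bottom-left block (dense)
    b = [row[h:] for row in l[h:]]   # bottom-right block (triangular)
    a_inv = invert_lower(a)          # the two recursive calls are independent
    b_inv = invert_lower(b)
    off = matmul(matmul(b_inv, c), a_inv)
    top = [row + [0.0] * (n - h) for row in a_inv]
    bottom = [[-v for v in o] + r for o, r in zip(off, b_inv)]
    return top + bottom


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0],
         [1.0, 4.0, 0.0],
         [3.0, 5.0, 8.0]]
    for row in invert_lower(L):
        print(row)
```

The two recursive inversions do not depend on each other, and the remaining work is matrix multiplication. That structure is what makes this kind of recursion a plausible fit for parallel hardware.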

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

February 2nd, 2010

Abstract:

We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI’s libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI’s SSE backend, and additional 3–4 and 4–16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.

(Danny van Dyk, Markus Geveler, Sven Mallach, Dirk Ribbrock, Dominik Göddeke and Carsten Gutwenger: HONEI: A collection of libraries for numerical computations targeting multiple processor architectures. Computer Physics Communications 180(12), pp. 2534-2543, December 2009. DOI 10.1016/j.cpc.2009.04.018)

FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs

February 2nd, 2010

Abstract:

As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore’s law, the computing industry has switched its route to higher performance through parallel processing. The rise of multi-core systems in all domains of computing has opened the door to heterogeneous multi-processors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for speeding up compute intensive kernels of scientific, imaging and simulation applications. GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels. However, exploiting the parallelism available in these applications is currently not a push-button task. Often the programmer has to expose the application’s fine and coarse grained parallelism by using special APIs. CUDA is such a parallel-computing API that is driven by the GPU industry and is gaining significant popularity. In this work, we adapt the CUDA programming model into a new FPGA design flow called FCUDA, which efficiently maps the coarse and fine grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SPMD CUDA thread blocks into parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multi-core accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.

(Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong and Wen-Mei W. Hwu, FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs, Proceedings of the 7th Symposium on Application Specific Processors, pp.35-42, July 2009. DOI: 10.1109/SASP.2009.5226333)
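
As a rough illustration of what turning SPMD thread blocks into ordinary loop code can mean, here is a small Python sketch. It is not FCUDA's actual output and does not use AutoPilot. It assumes a kernel with no barriers or shared memory, and the names `vec_add_kernel` and `run_as_loops` are hypothetical. The outer loop corresponds to the coarse-grained parallelism across thread blocks, the inner loop to the fine-grained parallelism across threads within a block.

```python
def vec_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Per-thread body of a CUDA-style vector addition."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(out):                        # guard against the ragged last block
        out[i] = a[i] + b[i]


def run_as_loops(kernel, grid_dim, block_dim, *args):
    """Execute an SPMD kernel as explicit nested loops.

    Simplifying assumption: the kernel has no synchronisation points,
    so threads may run one after another in any order.
    """
    for block_idx in range(grid_dim):        # coarse-grained: thread blocks
        for thread_idx in range(block_dim):  # fine-grained: threads in a block
            kernel(block_idx, thread_idx, block_dim, *args)


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [10, 20, 30, 40, 50]
    out = [0] * len(a)
    run_as_loops(vec_add_kernel, 2, 3, a, b, out)
    print(out)
```

Once the thread structure is written as loops like these, a synthesis tool is free to decide how many iterations to unroll or replicate in hardware, which is roughly the kind of freedom the abstract refers to as customized concurrency.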

Molecular Workshop Series at Stanford

February 2nd, 2010

Molecular Workshop Series – Running and Developing MD Algorithms on GPUs with OpenMM and PyOpenMM + Intro to MD and Trajectory Analysis

Simbios is excited to announce its upcoming Molecular Dynamics (MD) Workshop Series, highlighting new capabilities within the recently released OpenMM 1.0 and introducing PyOpenMM for rapid MD code development with high performance:

Day 1: Running and Developing MD Algorithms on GPUs with OpenMM
Day 2: Introduction to MD and Trajectory Analysis with Markov State Models

When: March 1-2, 2010 (sign up for one or two days)
Where: Stanford University

Registration is free but required and spaces are limited. Please visit http://simbios.stanford.edu/MDWorkshops.htm for the workshop agenda and to register.

Programming Massively Parallel Processors: A Hands-on Approach

January 29th, 2010

The first textbook of its kind, Programming Massively Parallel Processors: A Hands-on Approach launches today, authored by Dr. David B. Kirk, NVIDIA Fellow and former chief scientist, and Dr. Wen-mei Hwu, who serves at the University of Illinois at Urbana-Champaign as Chair of Electrical and Computer Engineering in the Coordinated Science Laboratory, co-director of the Universal Parallel Computing Research Center and principal investigator of the CUDA Center of Excellence.

The textbook, which is 256 pages, is the first aimed at teaching advanced students and professionals the basic concepts of parallel programming and GPU architectures. Published by Morgan Kaufmann, it explores various techniques for constructing parallel programs and reviews numerous case studies.

With conventional CPU-based computing no longer scaling in performance and the world’s computational challenges increasing in complexity, the need for massively parallel processing has never been greater. GPUs have hundreds of cores capable of delivering transformative performance increases across a wide range of computational challenges. The rise of these multi-core architectures has raised the need to teach advanced programmers a new and essential skill: how to program massively parallel processors.

Among the book’s key features:
  • First and only text that teaches how to program within a massively parallel environment
  • Portions of the NVIDIA-provided content have been part of the curriculum at 300 universities worldwide
  • Drafts of sections of the book have been tested and taught by Kirk at the University of Illinois
  • Book utilizes OpenCL and CUDA C, the NVIDIA parallel computing language developed specifically for massively parallel environments

The book is available to purchase directly from Elsevier or Amazon.

ATI Stream SDK 2.0 Production Release

January 26th, 2010

From the release notes:

ATI Stream SDK 2.0 is the first production SDK for both AMD GPUs and x86 CPUs. This release supports a wide range of ATI graphics processors, including the new ATI Radeon HD 5970, and provides support for OpenCL ICD (Installable Client Driver), atomic functions for 32-bit integers, a Microsoft Visual Studio 2008-integrated ATI Stream Profiler performance analysis tool, and other robust features. Preview support for upcoming features includes OpenCL and Microsoft DirectX 10 interoperability, and double-precision floating point basic arithmetic in OpenCL C kernels.

Riken Hosting “Accelerated Computing” Workshop This Week in Tokyo

January 24th, 2010

RIKEN, one of the most prestigious research institutes in Japan, is the site of an upcoming computing workshop to be keynoted by NVIDIA CEO Jen-Hsun Huang. RIKEN conducts research across a wide range of fields, including physics, chemistry, medical science, biology, and engineering. The workshop will be held January 28-29, 2010. See https://reg-nvidia.jp/public/seminar/view/3 for full details. In addition to keynote speeches by Jen-Hsun Huang and Professor Takayuki Aoki from Tokyo Institute of Technology, guest speakers at the event include Prof. Lorena Barba from Boston University, Mr. Eiji Fujii from Square ENIX, Dr. Mark Harris from NVIDIA (and GPGPU.org), and Dr. James Phillips from the University of Illinois at Urbana-Champaign.

From the workshop webpage:

“Accelerated Computing” is an old concept that is recently redefined in High-Performance Computing. It was started by dedicated machines like GRAPEs, but a great revolution has been occurring fueled by recent advancement in GPU Computing, both in hardware and in software such as CUDA C and OpenCL. This conference aims to review cutting edge technologies and scientific applications, as well as to discuss the future of the “Accelerator” approach in scientific and industrial HPC. Please join the conference for fruitful discussions on the future of HPC with highly-parallel processors.
