The paper discusses a fast implementation of the conjugate gradient iterative method with an E-field multilevel preconditioner, applied to the real, symmetric, sparse systems obtained with the vector finite element method. To accelerate computations, a graphics processing unit (GPU) was used, and a significant speed-up (2.61-fold) was achieved compared with a central processing unit (CPU) based approach. These results indicate that the performance of electromagnetic simulations can be significantly improved, thereby enabling full-wave optimization of microwave components in more manageable time.
(A. Dziekonski, A. Lamecki and M. Mrozowski: “GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method”, IEEE Microwave and Wireless Components Letters 21(1) pp.1-3, Jan. 2011. [DOI])
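For readers unfamiliar with the method named in the abstract, here is a minimal CPU-only sketch of preconditioned conjugate gradient. It is not the paper's implementation. It assumes a symmetric positive definite matrix, stores it as dense Python lists rather than in a sparse format, and substitutes a simple Jacobi (diagonal) preconditioner for the E-field multilevel preconditioner. The names `pcg`, `matvec`, and `dot` are illustrative.

```python
def matvec(A, x):
    """Dense matrix-vector product (a real solver would use sparse storage)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with Jacobi-preconditioned conjugate gradient.

    Assumption: A is symmetric positive definite with a nonzero diagonal.
    Returns the solution estimate and the number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


solution, iterations = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution, iterations)
```

The matrix-vector product and the vector updates dominate the cost of each iteration, and these are the kinds of operations that map naturally onto a GPU.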
A simple tool for off-line compilation of OpenCL kernel code, called “OpenCLcc”, is now available at
OpenCLcc takes a text file with the OpenCL kernel code as input and calls the OpenCL run-time to compile it, echoing errors to the console.
Proteins, nucleic acids, and small molecules form a dense network of molecular interactions in a cell. The architecture of molecular networks can reveal important principles of cellular organization and function, similarly to the way that protein structure tells us about the function and organization of a protein. Protein complexes are groups of proteins that interact with each other at the same time and place, forming a single multimolecular machine. Functional modules, in contrast, consist of proteins that participate in a particular cellular process while binding each other at a different time and place.
A protein-protein interaction network is represented as a graph in which proteins are nodes and interactions between proteins are edges. Protein complexes and functional modules can be identified as highly interconnected subgraphs, and computational methods are now indispensable for detecting them in protein interaction data. In addition, high-throughput screening techniques such as yeast two-hybrid screening enable the identification of detailed protein-protein interaction maps in multiple species. As the interaction datasets grow, the scale of the interconnected protein networks increases exponentially, and the growing complexity of the networks poses computational challenges for their analysis.
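As an illustration of the graph view described above, the following sketch builds an undirected network from a list of interactions and enumerates maximal cliques with the basic Bron-Kerbosch recursion. Cliques are only one simple notion of a "highly interconnected subgraph"; this is not the method of the post. The protein names and the function names `build_network` and `maximal_cliques` are hypothetical.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map: protein -> set of interaction partners."""
    adj = defaultdict(set)
    for a, b in interactions:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))       # cannot be extended further
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}         # v has been fully explored
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy data: proteins A to E and their pairwise interactions.
interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
network = build_network(interactions)
complexes = [c for c in maximal_cliques(network) if len(c) >= 3]
print(complexes)  # [['A', 'B', 'C']]
```

Clique enumeration has exponential worst-case cost, which is one concrete way the computational challenge mentioned above shows up as networks grow.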
GMAC is a user-level library that implements an Asymmetric Distributed Shared Memory (ADSM) model to be used by CUDA programs. An ADSM model builds a global memory space that allows CPU code to transparently access data hosted in accelerators’ (GPUs’) memories. Moreover, the coherency of the data is automatically handled by the library. This removes the need for manual memory transfers (cudaMemcpy) between the host and GPU memories. Furthermore, GMAC assigns a different “virtual GPU” to each host thread, and the virtual GPUs are evenly mapped to physical GPUs. This is especially useful for multi-GPU programs, since each host thread can access the memory of all GPUs and GPU-to-GPU transfers can be performed with plain memcpy calls.
Press release (submitted to gpgpu.org very late…):
LOS ANGELES, CA – July 26, 2010 – PEER 1 Hosting (TSX:PIX), a global online IT hosting provider, today announced the availability of the industry’s first large-scale, hosted graphics processing unit (GPU) Cloud at the 37th Annual SIGGRAPH International Conference.
The system runs the RealityServer® 3D web application service platform, developed by mental images, a wholly owned subsidiary of NVIDIA. The RealityServer platform is a powerful combination of NVIDIA Tesla GPUs and 3D web services software. It delivers interactive and photorealistic applications over the web using the iray® renderer, which enables animators, product designers, architects and consumers to easily visualize 3D scenes with remarkable realism.
Ocelot 2.0.969 brings CUDA 3.2 and Fermi support to a stable release. Ocelot is a BSD-licensed open source implementation of the CUDA runtime, a PTX emulator, and a mid-level PTX compiler.
Here is a feature list for 2.0.969:
- PTX 2.2 and Fermi device support: Floating point results should be within the ULP limits in the PTX ISA manual. Over 500 unit tests verify that the behaviour matches NVIDIA devices.
- Four target device types: A functional PTX emulator. A PTX to LLVM to x86/ARM JIT. A PTX to CAL JIT for AMD devices (beta). A PTX to PTX JIT for NVIDIA devices.
- A full-featured PTX 2.2 IR: An analysis/optimization pass interface over PTX (Control flow graph, dataflow graph, dominator/postdominator trees, structured control tree). Optimizations can be plugged in as modules.
- Correctness checking tools: A memory checker (detects unaligned and out of bounds accesses). A race detector. An interactive debugger (allows stepping through PTX instructions).
- An instruction trace analyzer interface: Allows user-defined modules to receive callbacks when PTX instructions are executed. Can be used to compute metrics over applications or perform correctness checks.
- A CUDA API frontend: Existing CUDA programs can be directly linked against Ocelot. Device pointers can be shared across host threads. Multiple devices can be controlled from the same host thread (cudaSetDevice can be called multiple times).
Ocelot is available under a BSD license at http://code.google.com/p/gpuocelot.
Submissions are cordially invited for the Workshop on GPU Computing, held with PPAM 2011 — 9th International Conference on Parallel Processing and Applied Mathematics, September 11-14, 2011, Torun, Poland. This workshop is organised by Josep R. Herrero, Enrique S. Quintana-Orti, and Robert Strzodka.
GPU programming is now a much richer environment than it was a few years ago. On top of the two major programming languages, CUDA and OpenCL, libraries (e.g., cufft) and high-level interfaces (e.g., thrust) have been developed that allow fast access to the computing power of GPUs without detailed knowledge or programming of GPU hardware.
Annotation-based programming models (e.g., PGI Accelerator), GPU plug-ins for existing mathematical software (e.g., Jacket in Matlab), GPU script languages (e.g., PyOpenCL), and new data parallel languages (e.g., Copperhead) bring GPU programming to a new level.
We examine the problem of segmenting foreground objects in live video when background scene textures change over time. In particular, we formulate background subtraction as minimizing a penalized instantaneous risk functional, yielding a local on-line discriminative algorithm that can quickly adapt to temporal changes. We analyze the algorithm’s convergence, discuss its robustness to non-stationarity, and provide an efficient non-linear extension via sparse kernels. To accommodate interactions among neighboring pixels, a global algorithm is then derived that explicitly distinguishes objects from background using maximum a posteriori inference in a Markov random field (implemented via graph-cuts). By exploiting the parallel nature of the proposed algorithms, we develop an implementation that can run efficiently on the highly parallel Graphics Processing Unit (GPU). Empirical studies on a wide variety of datasets demonstrate that the proposed approach achieves quality that is comparable to state-of-the-art off-line methods, while still being suitable for real-time video analysis (75 fps on a mid-range GPU).
(Li Cheng, M. Gong, D. Schuurmans, and T. Caelli: “Real-time Discriminative Background Subtraction”. IEEE Transactions on Image Processing, 2011, to appear. [DOI] [Sources & Info])
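To make the idea of a background model that adapts over time concrete, here is a sketch of a much simpler baseline: an exponential running-average background with a fixed per-pixel threshold. It is not the discriminative algorithm of the paper and has no Markov random field stage. The function names, the learning `rate`, and the `threshold` value are assumptions chosen for illustration, and frames are plain lists of grayscale rows.

```python
def foreground_mask(background, frame, threshold=30.0):
    """Mark a pixel as foreground when it differs strongly from the background."""
    return [
        [abs(f - b) > threshold for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def update_background(background, frame, rate=0.1):
    """Blend the new frame into the background so slow scene changes are absorbed."""
    return [
        [(1.0 - rate) * b + rate * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


# Toy 2x2 grayscale frames; one pixel jumps from 10 to 200.
background = [[10.0, 10.0], [10.0, 10.0]]
frame = [[12.0, 200.0], [10.0, 9.0]]
mask = foreground_mask(background, frame)
background = update_background(background, frame, rate=0.5)
print(mask)
print(background)
```

Each pixel is processed independently here, which is the kind of per-pixel parallelism a GPU implementation can exploit.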
The Pan-American Advanced Studies Institute (PASI), “Scientific Computing in the Americas: the challenge of massive parallelism”, was held in Valparaiso, Chile on 3–14 January 2011. The event hosted 14 lecturers and 68 participants, thanks to NSF/DOE funding. Lecture materials are now publicly available: PDFs of the lecture slides on the PASI website, and screencasts (video) via an iTunes U course and on YouTube.
Exploitation of novel computer architectures, such as general purpose GPUs, is allowing researchers to accelerate the realization of frontier models in particle-based simulation, by enabling an increase in the level of realism in the description of the particles and their interactions and increasing both the number of particles and the timescales simulated.
This one-day meeting focuses on the new and exciting area of the exploitation of GPUs and related technology in the area of biomolecular simulations.
In addition to a programme of national and international speakers in the field, there is the opportunity to present a poster on your research.