Call for Papers: AMD 2013 Developer Summit

February 25th, 2013

Calling all software development innovators in general purpose GPU (GPGPU), data parallel and heterogeneous computing. On November 11-14, 2013 AMD will host the AMD 2013 Developer Summit in San Jose California. The AMD Developer Summit conference board has issued a call for presentation proposals, inviting creators of next-generation software to share research and development work through presentations based on the latest technical papers or reports.

The AMD Developer Summit will be a great venue for developers, academics and innovative entrepreneurs to network with others engaged in related work, collectively defining the future course of heterogeneous computing. And delivering a presentation offers you the perfect opportunity to advocate programming paradigms or gain support for industry standards.

The submission deadline is Mar. 15, 2013, and the submission website is available at:

PARALUTION – A fast, user-friendly library for sparse iterative methods on CPUs and GPUs

February 25th, 2013

PARALUTION is a library for sparse iterative methods with special focus on multi-core and accelerator technology such as GPUs. In particular, it incorporates fine-grained parallel preconditioners designed to expolit modern multi-/many-core devices. Based on C++, it provides a generic and flexible design and interface which allow seamless integration with other scientific software packages. The library is open source and released under GPL. Key features are:

  • OpenMP, CUDA and OpenCL support
  • No special hardware/library requirement
  • Portable code and results across all hardware
  • Many sparse matrix formats
  • Various iterative solvers/preconditioners
  • Generic and robust design
  • Plug-in for the finite element package Deal.II
  • Documentation: user manual (pdf), reports, doxygen

More information, including documentation and case studies, is available at

Lab4241 GP-GPU profiler

February 21st, 2013

A free, pre-alpha release of Lab4241′s GPGPU profiler is now available at It provides source-code-line performance profiling for C or C++ code and CUDA kernels in a non-intrusive way. The profiler enables the developer to a seamless evaluation of used GPU resources (execution counts, memory access, branch diversions, etc.) per source-line, along with result evaluation in a simple, intuitive GUI, similar as with known CPU profilers like Quantify or valgrind.

Call for papers: ADBIS workshop on GPUs In Databases (GID 2013)

February 12th, 2013

High performance of modern Graphics Processing Units may be utilized not only for graphics related application but also for general computing. This computing power has been utilized in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains strongly benefit from utilizing the GPUs, databases related applications seem not to get enough attention. The main goal of the GPUs in Databases (GID) workshop is to fill this gap. This event is devoted to sharing the knowledge related to applying GPUs in Database environments and to discuss possible future development of this application domain. The list of topics of the GID workshop includes (but is not limited to):

  • Data compression on GPUs
  • GPUs in databases and data warehouses
  • Data mining using GPUs
  • Stream processing
  • Applications of GPUs in bioinformatics
  • Data oriented GPU primitives

For details please visit

Free online course on parallel programming on Udacity

February 10th, 2013

This class teaches the fundamentals of parallel computing with the GPU and the CUDA programming environment. Examples are based on a series of image processing algorithms, such as those in Photoshop or Instagram. Programming and running assignments on high-end GPUs is possible, even if you don’t own one yourself. The course started Monday 4th Feb 2013 so there is still time to join. More information and enrollment:

Generation of large finite-element matrices on multiple graphics processors

February 7th, 2013


The paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics accelerators concurrently with CPUs performing collection and addition of the matrix fragments using a fast multithreaded procedure. The scheduling of the threads is organized in such a way that the CPU operations do not affect the performance of the process, and the GPUs are idle only when data are being transferred from GPU to CPU. This approach is verified on two workstations: the first consists of two 6-core Intel Xeon X5690 processors with two Fermi GPUs: each GPU is a GeForce GTX 590 with two graphics processors and 1.5 GB of fast RAM; the second workstation is equipped with two Tesla C2075 boards carrying 6 GB of RAM each and two 12-core Opteron 6174s. For the latter setup, we demonstrate the fast generation of sparse finite-element matrices as large as 10 million unknowns, with over 1 billion nonzero entries. Comparing with the single-threaded and multithreaded CPU implementations, the GPU-based version of the algorithm based on the ideas presented in this paper reduces the finite-element matrix-generation time in double precision by factors of 100 and 30, respectively.

(Dziekonski, A., Sypek, P., Lamecki, A. and Mrozowski, M.: “Generation of large finite-element matrices on multiple graphics processors”. International Journal on Numerical Methoths in Engineering, 2012, in press. [DOI])

Amdahl Software announces the general availability of OpenCL CodeBench

February 7th, 2013

From a recent press release:

Amdahl Software, a leading supplier of development tools for multi-core software, after extensive beta testing by evaluators over a dozen countries and numerous end-user application markets, today announced the production release of OpenCL CodeBench. OpenCL CodeBench is an OpenCL Code Creation tool. It simplifies parallel software development, enabling developers to rapidly generate and optimize OpenCL applications. Engineering productivity is increased through the automation of overhead tasks. The tools suite enables engineers to work at higher levels of abstraction, accelerating the code development process. OpenCL CodeBench benefits both expert and novice engineers through a choice of command line or guided, wizard-driven development methodologies. Close cooperation with IP, SOC and platform vendors will enable future releases of OpenCL CodeBench to more tightly optimize software for specific end user platforms and development environments.

OpenCL CodeBench is available for trial or purchase. For additional information, please visit

CfP: Minisymposium on GPU Computing at PPAM (Warsaw, Sep 8-11, 2013)

January 31st, 2013

GPU programming is now a much richer environment that it used to be a few years ago. On top of the two major programming languages, CUDA and OpenCL, libraries (e.g., cufft) and high level interfaces (e.g., thrust) have been developed that allow a fast access to the computing power of GPUs without detailed knowledge or programming of GPU hardware.

Annotation-based programming models (e.g., OpenACC), GPU plug-ins for existing mathematical software (e.g., Jacket in Matlab), GPU script languages (e.g., PyOpenCL), and new data parallel languages (e.g., Copperhead) bring GPU programming to a new level.

A major criticism of programming abstractions is that they look great on small examples but fail on practical problems. Therefore, this symposium invites, in particular, submissions that deal with practical applications that have successfully employed GPU libraries or high level programming tools. The focus may lie both on the development of the libraries or utilization of existing tools. Workshop topics include, but are not limited to:

  • GPU applications coded with high level programming tools
  • GPU library development and application
  • Comparison of different programming abstractions on the same/similar applications
  • Comparison of the same/similar programming abstractions on different applications
  • Performance and coding effort of high level tools against hand-coded approaches on the GPU
  • Performance and coding effort on multi-core CPUs against GPUs utilizing programming abstractions
  • Classification of different programming abstractions with respect to their best application area

The highest quality papers of the minisymposium will receive an invitation to a special issue of the journal “Concurrency and Computation: Practice and Experience”.

Full CFP: Minisymposium on GPU Computing at the 10th International Conference on Parallel Processing and Applied Mathematics (PPAM). Note that PPAM will also host a full-day tutorial on Advanced GPU Programming.

A scalable, numerically stable, high-performance tridiagonal solver using GPUs

January 29th, 2013


In this paper, we present a scalable, numerically stable, high-performance tridiagonal solver. The solver is based on the SPIKE algorithm for partitioning a large matrix into small independent matrices, which can be solved in parallel. For each small matrix, our solver applies a general 1-by-1 or 2-by-2 diagonal pivoting algorithm, which is also known to be numerically stable. Our paper makes two major contributions. First, our solver is the first numerically stable tridiagonal solver for GPUs. Our solver provides comparable quality of stable solutions to Intel MKL and Matlab, at speed comparable to the GPU tridiagonal solvers in existing packages like CUSPARSE. It is also scalable to multiple GPUs and CPUs. Second, we present and analyze two key optimization strategies for our solver: a high-throughput data layout transformation for memory efficiency, and a dynamic tiling approach for reducing the memory access footprint caused by branch divergence.

(Chang, Li-Wen and Stratton, John A. and Kim, Hee-Seok and Hwu, Wen-mei W.: “A scalable, numerically stable, high-performance tridiagonal solver using GPUs”, Supercomputing 2012. [WWW])

Parallel Computing Training Dates from AccelerEyes

January 29th, 2013

AccelerEyes has released dates for their upcoming CUDA and OpenCL training courses.



More information can be found on the courses’ webpages.

Page 12 of 107« First...1011121314...203040...Last »