VexCL 1.0.0 released with CUDA support

November 20th, 2013

VexCL is a modern C++ library created to ease GPGPU development. It strives to reduce the amount of boilerplate code needed to develop GPGPU applications, and provides convenient and intuitive notation for vector arithmetic, reductions, sparse matrix-vector multiplication, and more. The source code is available under the permissive MIT license. As of v1.0.0, VexCL provides two backends, OpenCL and CUDA; users choose between them at compile time with a preprocessor macro definition. More information is available on the GitHub project page and in the release notes.
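A minimal sketch of the notation the library provides (assuming VexCL's documented vex::Context, vex::vector and vex::Reductor interface; the exact backend macro and device filters may differ in your setup):

    #include <vector>
    #include <cmath>
    #include <vexcl/vexcl.hpp>

    int main() {
        // Initialize a context on all GPUs with double-precision support;
        // the OpenCL or CUDA backend is chosen at compile time via a macro.
        vex::Context ctx(vex::Filter::GPU && vex::Filter::DoublePrecision);

        const size_t n = 1 << 20;
        std::vector<double> host(n, 1.0);

        vex::vector<double> x(ctx, host); // copy host data to the device(s)
        vex::vector<double> y(ctx, n);

        // One vector expression compiles to a single generated kernel.
        y = 2 * sin(M_PI * x) + 1;

        // Reduction (sum of all elements of y) across the whole context.
        vex::Reductor<double, vex::SUM> sum(ctx);
        double total = sum(y);

        return total > 0 ? 0 : 1;
    }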

CFP: GPGPU: Seventh Workshop on General Purpose Processing Using GPUs

November 20th, 2013

GPGPU-7 (Seventh Workshop on General Purpose Processing Using GPUs) will be held in conjunction with ASPLOS in Salt Lake City, Utah, on March 1st, 2014. The goal of this workshop is to provide a forum to discuss new and emerging general-purpose programming environments and platforms, as well as to evaluate applications that have been able to harness the horsepower provided by these platforms.
This year's workshop is particularly interested in new heterogeneous GPU platforms.

New Versions of AMD CodeXL, Bolt and AMD APP SDK

November 13th, 2013

AMD CodeXL is a free set of tools for GPU debugging, GPU profiling, static analysis of OpenCL kernels, and CPU profiling, including support for remote servers. For more information and download links, see: http://developer.amd.com/community/blog/2013/11/08/codexl-1-3-released/

Bolt is an STL-compatible C++ template library for creating data-parallel applications (no C++ AMP or OpenCL code required). For more information about the Bolt template library and download links, see: http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/bolt-c-template-library/
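As an illustration of the STL-like interface, here is a minimal sketch based on Bolt's canonical sort example (the header path and default device selection are assumptions; consult the Bolt documentation for the exact API):

    #include <vector>
    #include <algorithm>
    #include <cstdlib>

    #include <bolt/cl/sort.h>   // Bolt's STL-like sort (OpenCL path)

    int main() {
        std::vector<int> a(8192);
        std::generate(a.begin(), a.end(), std::rand);

        // Same call shape as std::sort; the work is dispatched to a device
        // chosen by Bolt's default control object.
        bolt::cl::sort(a.begin(), a.end());

        return a.front() <= a.back() ? 0 : 1;
    }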

AMD APP SDK has everything needed to get started with OpenCL and parallel programming. It includes OpenCL samples that are very easy to compile, as well as Bolt and other libraries. For more information about AMD APP SDK and download links, see: http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/

Allinea DDT with support for NVIDIA CUDA 5.5 and CUDA on ARM

November 13th, 2013

Allinea DDT is part of Allinea Software's unified tools platform, which provides a single powerful and intuitive environment for debugging and profiling parallel and multithreaded applications. It is widely used by computational scientists and scientific programmers to fix software defects in parallel applications running on hybrid GPU clusters and supercomputers. DDT 4.1.1 supports CUDA 5.5, C++11 and the GNU 4.8 compilers. Also introduced with Allinea DDT 4.1.1 is CUDA toolkit debugging support for the ARMv7 architecture. More information: http://www.allinea.com

Libra 3.0 – GPGPU SDK on Mobiles and Tablets

November 13th, 2013

The Libra 3.0 Heterogeneous Cloud Computing SDK has recently been released by GPU Systems. It supports PCs, tablets and mobile devices, and includes a new virtualization capability for cloud compute services across local and remote CPUs and GPUs. C/C++, Java, C# and Matlab are supported. Read the full press release here.

Call for Posters Open for GPU Technology Conference 2014

November 4th, 2013

The Call for Posters for next year's GTC is now open. New at GTC 2014 will be an award for best poster at the show. Don't miss this opportunity to present the innovative work you're doing to other top developers, researchers, and engineers. Start now by reviewing the submission guidelines and criteria, and plan to submit for a chance to take home well-deserved bragging rights.

All the information you need is available at http://www.gputechconf.com/page/call-for-posters.html.

Which is faster: Constant Cache or Read-only Cache? – Part Two

November 4th, 2013

One of the keys to achieving maximum performance in CUDA is taking advantage of the various memory spaces. Part II of Acceleware’s tutorial has now been published. The tutorial uses a simple encryption kernel to test and compare read-only cache, constant cache and global memory. Read the full tutorial…
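For readers who want to experiment with the same comparison, below is a minimal, hypothetical sketch (not the tutorial's actual kernel) showing the three access paths on a Kepler-class GPU: an ordinary global-memory load, a load through the read-only data cache via const __restrict__ and __ldg() (compute capability 3.5+, so compile with -arch=sm_35 or newer), and a load from __constant__ memory. Timing each variant, e.g. with CUDA events, gives the kind of comparison the tutorial makes.

    #include <cuda_runtime.h>
    #include <cstdio>

    #define KEY_BYTES 64
    #define N (1 << 22)

    // Key in constant memory; filled from the host with cudaMemcpyToSymbol.
    __constant__ unsigned char c_key[KEY_BYTES];

    // 1) Plain global-memory loads for both the data and the key.
    __global__ void xor_global(const unsigned char *key, const unsigned char *in,
                               unsigned char *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] ^ key[i % KEY_BYTES];
    }

    // 2) Read-only data cache (compute capability 3.5+): const __restrict__
    //    pointers, or an explicit __ldg(), route loads through the texture path.
    __global__ void xor_readonly(const unsigned char * __restrict__ key,
                                 const unsigned char * __restrict__ in,
                                 unsigned char *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] ^ __ldg(&key[i % KEY_BYTES]);
    }

    // 3) Constant memory: fastest when all threads of a warp read the same
    //    address; here neighbouring threads read different key bytes, so the
    //    constant-cache accesses are serialized.
    __global__ void xor_constant(const unsigned char *in, unsigned char *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] ^ c_key[i % KEY_BYTES];
    }

    int main()
    {
        unsigned char h_key[KEY_BYTES];
        for (int i = 0; i < KEY_BYTES; ++i) h_key[i] = (unsigned char)(i * 37);

        unsigned char *d_key, *d_in, *d_out;
        cudaMalloc(&d_key, KEY_BYTES);
        cudaMalloc(&d_in, N);
        cudaMalloc(&d_out, N);
        cudaMemcpy(d_key, h_key, KEY_BYTES, cudaMemcpyHostToDevice);
        cudaMemcpyToSymbol(c_key, h_key, KEY_BYTES);
        cudaMemset(d_in, 0xAB, N);

        int block = 256, grid = (N + block - 1) / block;
        xor_global<<<grid, block>>>(d_key, d_in, d_out, N);
        xor_readonly<<<grid, block>>>(d_key, d_in, d_out, N);
        xor_constant<<<grid, block>>>(d_in, d_out, N);
        cudaDeviceSynchronize();

        printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
        cudaFree(d_key); cudaFree(d_in); cudaFree(d_out);
        return 0;
    }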

CfP: High Performance Computing in Bioinformatics

November 4th, 2013

2014 International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2014)
7-9 April, 2014. Granada (SPAIN). Special Session: High Performance Computing in Bioinformatics

The goal of this special session is to explore the use of emerging parallel computing architectures as well as High Performance Computing systems (supercomputers, clusters, grids) for the simulation of relevant biological systems and for applications in Bioinformatics, Computational Biology and Computational Chemistry. We welcome papers, not submitted elsewhere for review, on a broad range of topics of interest; see the full call for the complete list.

Webinar: Face-in-the-crowd recognition with GPUs

November 4th, 2013

A free webinar on accelerating face-in-the-crowd recognition with GPU technology will be held on November 5th. It covers how GPUs can be used to accelerate the detection and recognition of faces in a crowd. The presentation will also cover the speakers' use of the ROS, OpenCV, OpenMP, and Armadillo libraries to develop fast, reliable, distributed video-processing code. To register, follow the link: https://www2.gotomeeting.com/register/292953058

A GPU-based Streaming Algorithm for High-Resolution Cloth Simulation

October 19th, 2013

Abstract:

We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all the components of the cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating, to GPU-based kernels and data structures. Our algorithm handles intra-object and inter-object collisions, contacts and friction, and is able to accurately simulate folds and wrinkles. We describe the streaming pipeline and address many issues in terms of obtaining high throughput on many-core GPUs. In practice, our algorithm can perform high-fidelity simulation on a cloth mesh with 2M triangles using 3GB of GPU memory. We highlight the parallel performance of our algorithm on three different generations of GPUs. On a high-end NVIDIA Tesla K20c, we observe up to two orders of magnitude performance improvement as compared to a single-threaded CPU-based algorithm, and about one order of magnitude improvement over a 16-core CPU-based parallel implementation.

(Min Tang, Ruofeng Tong, Rahul Narain, Chang Meng and Dinesh Manocha: “A GPU-based Streaming Algorithm for High-Resolution Cloth Simulation”, in the Proceedings of Pacific Graphics 2013. [WWW])
