GPU.NET v2.0 released

July 29th, 2011

TidePowerd has released Version 2 of their GPU computing solution for the .NET framework, GPU.NET. Their platform allows developers to quickly and easily write GPU-accelerated applications completely in .NET-based languages. Some key benefits include:

  • Stay in C# and treat kernel methods like any regular method
  • “Boilerplate” GPU programming tasks such as memory transfer and GPU scheduling are abstracted from the developer
  • Cross-platform and cross-hardware with a single binary
  • Systems seamlessly adapt to new hardware without rewriting code
  • Speed on par with native code

New version 2 features:

  • Visual Studio Error list and IntelliSense integration
  • On-device random number generation
  • Double precision support

A free 30-day evaluation license is available, along with in-depth examples and tutorials.

Jacket v1.8 and LibJacket v1.1 released

July 24th, 2011

Jacket 1.8 and LibJacket 1.1 have been released by AccelerEyes, enabling GPU support for MATLAB and easier CUDA development with C/C++/Fortran and Python. New features include:

  • Expanded support for the Signal Processing, Image Processing, and Statistics Libraries included with both Jacket and LibJacket
  • Faster linear algebra for special systems (e.g. symmetric, positive definite, triangular, etc.)
  • Enhanced visualizations
  • New and updated examples: FDTD, Mandelbrot fractals, maximum-likelihood neural segmentation, MDS for genomics
  • Built with CUDA 4.0 for peak performance

Visit http://www.accelereyes.com/ for details, downloads, whitepapers and tutorials.

CUVI 0.5 Released

July 24th, 2011

TunaCode is pleased to announce the release of CUVI (CUDA Vision and Imaging Library) version 0.5, which comes with a new API and new features. This release makes it even simpler to add acceleration to existing imaging applications, without any prior technical knowledge of GPUs. CUVI v0.5 is built from the ground up with performance and ease of use in mind.

CUVI version 0.5 can be downloaded from http://cuvilib.com for Windows (Win32, x64); support for Linux and Mac is planned.

Proven Algorithmic Techniques for Manycore Processors Summer School

July 20th, 2011

The Virtual School of Computational Science and Engineering (VSCSE) will offer a hands-on course for graduate students August 15-19:

Proven Algorithmic Techniques for Manycore Processors

This course will be delivered to a number of sites nationwide—including the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign—using high-definition video conferencing technologies. Students at all sites will be able to work with a cohort of fellow computational scientists, have access to local teaching assistants, and interact virtually with course instructors.

Registration for the weeklong course is $100. Please visit www.vscse.org for more information or hub.vscse.org to register.

rCUDA 3.0a released

July 17th, 2011

rCUDA 3.0a, a new alpha version of rCUDA (Remote CUDA), the open source package for issuing CUDA calls to remote GPUs, has been released. Major improvements in this version are:

  • Partially updated API to 4.0
  • Added compatibility support with CUDA 4.0 environment
  • Updated CUBLAS API to 4.0 for the most common CUBLAS routines
  • Fixed some bugs
  • General performance improvements

For further information, please visit the rCUDA webpage.

CUDA Template Generator Released

June 26th, 2011

CUDA Template Generator is a Java application that generates CUDA C source file templates based on user input parameters. Features include:

  • An algorithm for automatic block and thread definition, depending on array size.
  • Automatic memory transfer functions for CPU->GPU->CPU communication.
  • Generated C source code function template to use in your application.

Developed by Pavel Kartashev, as part of his Master’s Degree work.
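
The announcement does not describe how the tool derives its block and thread counts. As a rough illustration only, one common way to pick a one-dimensional launch configuration from an array size is ceiling division. The function name `launch_config` and the default of 256 threads per block are assumptions made for this sketch and are not taken from CUDA Template Generator.

```python
def launch_config(n, max_threads_per_block=256):
    """Return (blocks, threads_per_block) covering n array elements.

    Hypothetical helper: 256 is an assumed default, not a value
    taken from CUDA Template Generator.
    """
    if n <= 0:
        raise ValueError("array size must be positive")
    # Never launch more threads per block than there are elements.
    threads = min(n, max_threads_per_block)
    # Ceiling division so the last, partially filled block is included.
    blocks = (n + threads - 1) // threads
    return blocks, threads


if __name__ == "__main__":
    print(launch_config(1000))
```

Because the last block may be only partly used, a kernel launched this way still needs a bounds check such as `if (idx < n)` before touching the array.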

Microsoft Announces C++ AMP

June 26th, 2011

Microsoft has announced that the next version of Visual Studio will contain technology labeled C++ Accelerated Massive Parallelism (C++ AMP) to enable C++ developers to take advantage of the GPU for computation purposes. More information is available in the MSDN blog posts here and here.

Intel announces a high-performance SPMD compiler for the CPU

June 26th, 2011

Intel has announced ispc, The Intel SPMD Program Compiler, now available in source and binary form from http://ispc.github.com.

ispc is a new compiler for “single program, multiple data” (SPMD) programs, the same model that is used for (GP)GPU programming, but here targeted to CPUs. ispc compiles a C-based SPMD programming language to run on the SIMD units of CPUs; it frequently provides a 3x or greater speedup on CPUs with 4-wide SSE units, without any of the difficulty of writing intrinsics code. There were a few principles and goals behind the design of ispc:

  • To build a small C-like language that would deliver excellent performance to performance-oriented programmers who want to run SPMD programs on the CPU.
  • To provide a thin abstraction layer between the programmer and the hardware—in particular, to have an execution and data model where the programmer can cleanly reason about the mapping of their source program to compiled assembly language and the underlying hardware.
  • To make it possible to harness the computational power of the SIMD vector units without the extremely low-programmer-productivity activity of directly writing intrinsics.
  • To explore opportunities from close coupling between C/C++ application code and SPMD ispc code running on the same processor—to have lightweight function calls between the two languages, to share data directly via pointers without copying or reformatting, and so forth.

ispc is an open source compiler with a BSD license. It uses the LLVM Compiler Infrastructure for back-end code generation and optimization and is hosted on github. It supports Windows, Mac, and Linux, with both x86 and x86-64 targets. It currently supports the SSE2 and SSE4 instruction sets, though support for AVX should be available soon.

CUDA 4.0 Library Performance Overview

June 26th, 2011

The performance of many math functions has improved with the release of the CUDA 4.0 Toolkit. This presentation includes the performance results of many of the key functions. Results include performance measurements for:

  • cuFFT – Fast Fourier Transforms Library
  • cuBLAS – Complete BLAS Library
  • cuSPARSE – Sparse Matrix Library
  • cuRAND – Random Number Generation (RNG) Library
  • NPP – Performance Primitives for Image & Video Processing
  • Thrust – Templated Parallel Algorithms & Data Structures
  • math.h – C99 floating-point Library

Parallel Solution of Sparse Triangular Linear Systems

June 26th, 2011

Abstract:

A novel algorithm for solving in parallel a sparse triangular linear system on a graphical processing unit is proposed. It implements the solution of the triangular system in two phases. First, the analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the solve phase obtains the full solution by iterating sequentially across the constructed levels. The solution elements corresponding to each single level are obtained at once in parallel. The numerical experiments are also presented and it is shown that the incomplete-LU and Cholesky preconditioned iterative methods, using the parallel sparse triangular solve algorithm, can achieve on average more than 2x speedup on graphical processing units (GPUs) over their CPU implementation.

(Maxim Naumov: “Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU”, NVIDIA Technical Report, June 2011. [WWW])
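
The two phases described in the abstract can be sketched sequentially as follows. This is a minimal CPU illustration of level scheduling, not the implementation from the report: it assumes a lower triangular matrix stored in CSR form with a nonzero diagonal entry in every row, and the function names `build_levels` and `solve_lower` are invented for the example.

```python
def build_levels(row_ptr, col_idx):
    """Analysis phase: group rows into levels using only the sparsity pattern.

    Row i depends on row j when L[i][j] is nonzero and j < i. A row's level
    is one more than the deepest level among the rows it depends on.
    """
    n = len(row_ptr) - 1
    depth = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                depth[i] = max(depth[i], depth[j] + 1)
    levels = [[] for _ in range(max(depth, default=-1) + 1)]
    for i, d in enumerate(depth):
        levels[d].append(i)
    return levels


def solve_lower(row_ptr, col_idx, values, b, levels):
    """Solve phase: walk the levels in order and solve every row in a level."""
    x = [0.0] * len(b)
    for level in levels:  # levels must be processed sequentially
        # Rows inside one level are independent of each other; on a GPU
        # each of them could be handled by its own thread.
        for i in level:
            acc = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    acc -= values[k] * x[j]
            x[i] = acc / diag  # assumes the diagonal entry is stored
    return x


if __name__ == "__main__":
    # L = [[2, 0, 0], [0, 4, 0], [1, 1, 1]], b = [2, 4, 3]
    row_ptr = [0, 1, 2, 5]
    col_idx = [0, 1, 0, 1, 2]
    values = [2.0, 4.0, 1.0, 1.0, 1.0]
    levels = build_levels(row_ptr, col_idx)
    print(levels)
    print(solve_lower(row_ptr, col_idx, values, [2.0, 4.0, 3.0], levels))
```

In the small example, rows 0 and 1 have no dependencies and land in the first level, while row 2 needs both of their results and forms the second level. The analysis result depends only on the sparsity pattern, so it can be reused for every solve with the same matrix structure.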
