This webinar, scheduled for Wednesday, August 7 at 10 a.m. PDT, will cover the latest schedule for GPUDirect RDMA, scaling and optimization techniques for maximizing application performance using MVAPICH2, and the latest advancements in CUDA. Join speakers from Ohio State University, NVIDIA and Mellanox Technologies. Register by visiting www.gputechconf.com/gtcexpress.
The rCUDA team is glad to announce that its remote GPU virtualization technology now supports the ARM processor architecture. The new release of rCUDA for this low-power processor has been developed for the Ubuntu 11.04 and Ubuntu 12.04 ARM Linux distributions. With this new rCUDA release, it is also possible to leverage hybrid platforms where the application uses ARM CPUs while requesting acceleration services provided by remote GPUs installed in x86 nodes. The opposite is also possible: an application running on an x86 computer can access remote GPUs attached to ARM systems. Please visit the rCUDA website for more information or to request a free copy of the rCUDA middleware.
Algorithmic trading has become ever more popular in recent years – accounting for approximately half of all European and American stock trades placed in 2012. The trading strategies need to be back-tested regularly using historical market data for calibration and to check the expected return and risk. This is a computationally demanding process that can take hours to complete. However, back-testing the strategies frequently intra-day can significantly increase the profits for the trading institution.
Anatoly Baksheev, OpenCV GPU Module Team Leader at Itseez, will demonstrate how to obtain and build OpenCV, its GPU module, and the sample programs. You will learn how to use the OpenCV GPU module and create your own custom GPU functions for OpenCV. Register for the July 30th webinar: http://goo.gl/5V3eA
The HSA Foundation will be hosting a Birds of a Feather session on heterogeneous computing on July 24 from 1-2 p.m., at the Anaheim Convention Center, Room 202B. For more info: http://slidesha.re/16JSqK7
GPU Technology Conference (GTC) is NVIDIA’s annual developer event and consistently attracts the world’s best and brightest GPU developers, creating opportunities for connection and learning through technical sessions and in-depth tutorials in science, professional graphics, game development, mobile computing, cloud computing and automotive applications, as well as first-hand interactions with peers, luminaries, and emerging and established companies.
If you are doing innovative work using GPUs, please submit a proposal at https://gtc2014.consenseus.com/
The deadline is Friday, September 27.
Acceleware recently announced several courses:
- CUDA for Finance: December 10 – 13, 2013, New York, NY [Details]
- OpenCL: October 22 – 25, 2013, Houston, TX [Details]
- CUDA: September 24-27, [Details]
- C++ AMP: September 10-13, [Details]
While new power-efficient computer architectures exhibit spectacular theoretical peak performance, they require specific conditions to operate efficiently, which makes porting complex algorithms a challenge. Here, we report results of the semi-implicit method for pressure linked equations (SIMPLE) and the pressure implicit with operator splitting (PISO) methods implemented on the graphics processing unit (GPU). We examine the advantages and disadvantages of the full porting over a partial acceleration of these algorithms run on unstructured meshes. We found that the full-port strategy requires adjusting the internal data structures to the new hardware and proposed a convenient format for storing internal data structures on GPUs. Our implementation is validated on standard steady and unsteady problems and its computational efficiency is checked by comparing its results and run times with those of some standard software (OpenFOAM) run on a central processing unit (CPU). The results show that a server-class GPU outperforms a server-class dual-socket multi-core CPU system running essentially the same algorithm by up to a factor of 4.
See also the supplementary materials and the follow-up at http://vratis.com/blog/?p=7.
(Tadeusz Tomczak, Katarzyna Zadarnowska, Zbigniew Koza, Maciej Matyka and Łukasz Mirosław: “Acceleration of iterative Navier-Stokes solvers on graphics processing units”, International Journal of Computational Fluid Dynamics, accepted, July 2013. [DOI])
From a recent press release:
AMD’s APP SDK is an essential resource for developers who wish to leverage the processing power of heterogeneous computing. OpenCL™ is the primary mechanism for achieving this today, but AMD’s goal is to enable developers to accelerate applications with the programming paradigm of their choice. Toward that end, AMD has added support for heterogeneous libraries such as the newly released Bolt open source C++ template library and OpenCV computer vision library which now includes heterogeneous acceleration.
New to APP SDK 2.8.1:
Bolt: With the recent launch of Bolt 1.0, AMD has added several samples to the APP SDK to demonstrate Bolt 1.0 features. These showcase the usage of Bolt APIs such as scan, sort, reduce and transform. Other new samples highlight the ease of porting from STL and the performance benefits achieved over equivalent STL implementations. We’ve also included samples to demonstrate the different fallback options available in Bolt 1.0 when no GPU is available, which ensure your code runs correctly on any platform.
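For readers unfamiliar with the four primitives named above, here is a minimal sketch of the plain-STL starting point that such a port would begin from. It is not Bolt code and does not reproduce any sample shipped in the APP SDK; the struct `PrimitiveResults` and the helper `run_stl_primitives` are hypothetical names chosen for illustration, and everything runs sequentially on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Results of the four primitives, computed with the standard library only.
// (Hypothetical struct, introduced for this illustration.)
struct PrimitiveResults {
    std::vector<int> sorted;       // sort
    std::vector<int> prefix_sums;  // inclusive scan
    std::vector<int> doubled;      // transform
    int total;                     // reduce
};

// Hypothetical helper: applies sort, scan, transform and reduce to `input`.
PrimitiveResults run_stl_primitives(const std::vector<int>& input) {
    assert(!input.empty());
    PrimitiveResults r;

    // sort: ascending order on a copy of the input
    r.sorted = input;
    std::sort(r.sorted.begin(), r.sorted.end());

    // scan: inclusive prefix sum, element i holds input[0] + ... + input[i]
    r.prefix_sums.resize(input.size());
    std::partial_sum(input.begin(), input.end(), r.prefix_sums.begin());

    // transform: apply a unary function to every element
    r.doubled.resize(input.size());
    std::transform(input.begin(), input.end(), r.doubled.begin(),
                   [](int x) { return 2 * x; });

    // reduce: fold all elements into a single value
    r.total = std::accumulate(input.begin(), input.end(), 0);
    return r;
}
```

Because each step is a single algorithm call over an iterator range, a library modeled on the STL can in principle offer a near drop-in replacement for each call, which is presumably what the porting samples set out to show.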
OpenCV: AMD has been working closely with the OpenCV open source community to add heterogeneous acceleration capability to the world’s most popular computer vision library. These changes are already integrated into OpenCV and are readily available for developers who want to improve performance and efficiency of their computer vision applications. AMD has included samples to illustrate these improvements and highlight how simple it is to include them in your app.
GCN: AMD recently launched its new Graphics Core Next (GCN) architecture on several AMD products. GCN is based on a scalar architecture vs. the VLIW vector architecture of prior generations, so hand-tuned vectorization to optimize hardware utilization is no longer needed. We’ve modified several samples in AMD APP SDK 2.8.1 to show the ease of writing scalar code as compared to vectorization.
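The difference between the two coding styles can be sketched in plain C++. This is an illustration only, not one of the SDK samples and not OpenCL: `Float4` is a hypothetical stand-in for a four-wide vector type, and each loop iteration plays the role of one work item. Both functions compute `a * x + y`; the second packs four elements per work item by hand, which is the kind of manual vectorization the release notes say is no longer needed on GCN.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Four floats packed together; a stand-in for a float4 vector type.
struct Float4 { float x, y, z, w; };

// Scalar style: each work item (here, each loop iteration) handles one element.
std::vector<float> saxpy_scalar(float a, const std::vector<float>& x,
                                const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}

// Hand-vectorized style: each work item handles a packed group of four
// elements, so the caller must repack the data into Float4 groups first.
std::vector<Float4> saxpy_packed(float a, const std::vector<Float4>& x,
                                 const std::vector<Float4>& y) {
    assert(x.size() == y.size());
    std::vector<Float4> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = Float4{a * x[i].x + y[i].x, a * x[i].y + y[i].y,
                        a * x[i].z + y[i].z, a * x[i].w + y[i].w};
    }
    return out;
}
```

The scalar version needs no repacking and works for any element count, whereas the packed version requires the element count to be a multiple of four or extra handling for the remainder.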
For more information, see developer.amd.com.
The Thrust team is pleased to announce the release of Thrust v1.7, an open-source C++ library for developing high-performance parallel applications. Modeled after the C++ Standard Template Library, Thrust brings a familiar abstraction layer to the realm of parallel computing.
Thrust 1.7.0 introduces a new interface for controlling algorithm execution as well as several new algorithms and performance improvements. With this new interface, users may directly control how algorithms execute as well as details such as the allocation of temporary storage. Key/value versions of thrust::merge and the set operation algorithms have been added, as well as stencil versions of partitioning algorithms. For 32-bit types, new CUDA merge and set operations provide 2-15x faster performance while a new CUDA comparison sort provides 1.3-4x faster performance.
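To make the "key/value version" of a merge concrete, the following is a sequential reference sketch in standard C++ of what such an operation computes: two sorted key ranges are merged, and each value travels with its key. It is not Thrust's implementation and does not use the Thrust API; the function name `merge_by_key_reference` is hypothetical, and breaking ties in favor of the first range is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference sketch of a key/value merge (hypothetical helper).
// Both key ranges must already be sorted in ascending order.
std::pair<std::vector<int>, std::vector<char>> merge_by_key_reference(
    const std::vector<int>& keys_a, const std::vector<char>& vals_a,
    const std::vector<int>& keys_b, const std::vector<char>& vals_b) {
    assert(keys_a.size() == vals_a.size());
    assert(keys_b.size() == vals_b.size());

    std::vector<int> keys;
    std::vector<char> vals;
    keys.reserve(keys_a.size() + keys_b.size());
    vals.reserve(vals_a.size() + vals_b.size());

    std::size_t i = 0, j = 0;
    while (i < keys_a.size() && j < keys_b.size()) {
        // Take from b only when its key is strictly smaller,
        // so equal keys keep a's element first.
        if (keys_b[j] < keys_a[i]) {
            keys.push_back(keys_b[j]);
            vals.push_back(vals_b[j]);
            ++j;
        } else {
            keys.push_back(keys_a[i]);
            vals.push_back(vals_a[i]);
            ++i;
        }
    }
    // Copy whatever remains in either input.
    for (; i < keys_a.size(); ++i) {
        keys.push_back(keys_a[i]);
        vals.push_back(vals_a[i]);
    }
    for (; j < keys_b.size(); ++j) {
        keys.push_back(keys_b[j]);
        vals.push_back(vals_b[j]);
    }
    return {keys, vals};
}
```

A parallel implementation has to produce the same output without this single sequential walk, which is where the speedups quoted above would come from.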
Thrust is open-source software distributed under the OSI-approved Apache License 2.0.