The deadline for submissions to the “GPUs in Databases” (GID 2011) workshop has been extended to April 12th, 2011. The workshop is devoted to sharing knowledge related to applying GPUs in database environments and to discussing possible future developments of this application domain.
Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency, to satisfy the milliseconds-long hard real-time constraints of each stream, and high throughput, to enable efficient processing of as many streams as possible. High-throughput programmable accelerators such as modern GPUs hold high potential to speed up these computations. However, their use for hard real-time stream processing is complicated by slow communication with the CPU, throughput that varies non-linearly with input size, and the weak consistency of their local memory with respect to CPU accesses. Furthermore, their coarse-grain hardware scheduler renders them unsuitable for unbalanced multi-stream workloads.
We present a general, efficient and practical algorithm for hard real-time stream scheduling in heterogeneous systems. The algorithm assigns incoming streams of different rates and deadlines to CPUs and accelerators. By employing novel stream schedulability criteria for accelerators, the algorithm finds the assignment which simultaneously satisfies the aggregate throughput requirements of all the streams and the deadline constraint of each stream alone.
Using the AES-CBC encryption kernel, we experimented extensively on thousands of streams with realistic rate and deadline distributions. Our framework outperformed the alternative methods by allowing 50% more streams to be processed with provably deadline-compliant execution, even for deadlines as short as tens of milliseconds. Overall, the combined GPU-CPU execution allows for up to a 4-fold throughput increase over highly optimized multi-threaded CPU-only implementations.
(Uri Verner, Assaf Schuster and Mark Silberstein, “Processing data streams with hard real-time constraints on heterogeneous systems”, ICS’11, to appear)
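The paper’s actual schedulability criteria are not reproduced in this summary. The following is a toy sketch, under an entirely assumed cost model, of the kind of deadline-driven CPU/GPU stream assignment the abstract describes: streams whose deadlines can absorb the GPU’s batching latency go to the accelerator, while tight-deadline streams stay on the CPU subject to its throughput budget. All constants and the `assign` policy are hypothetical, not the authors’ algorithm.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    rate: float      # input rate, MB/s
    deadline: float  # per-item deadline, ms

# Illustrative (assumed) cost model: the GPU amortizes a fixed batching
# latency over high throughput; the CPU is low-latency but low-throughput.
GPU_BATCH_LATENCY_MS = 15.0   # transfer + launch overhead (assumed)
CPU_THROUGHPUT = 500.0        # MB/s per core (assumed)

def assign(streams, cpu_capacity=CPU_THROUGHPUT):
    """Greedy sketch: a stream whose deadline exceeds the GPU batching
    latency is schedulable on the GPU; otherwise it must run on the CPU,
    whose aggregate throughput budget must not be exceeded."""
    gpu, cpu = [], []
    cpu_load = 0.0
    for s in sorted(streams, key=lambda s: s.deadline):
        if s.deadline > GPU_BATCH_LATENCY_MS:
            gpu.append(s)
        elif cpu_load + s.rate <= cpu_capacity:
            cpu.append(s)
            cpu_load += s.rate
        else:
            raise RuntimeError("stream set not schedulable under this model")
    return gpu, cpu

# Two tight-deadline streams land on the CPU; the lax one goes to the GPU.
gpu, cpu = assign([Stream(10, 5), Stream(50, 40), Stream(20, 8)])
```

The real algorithm must additionally verify aggregate GPU throughput and the non-linear dependence of GPU throughput on batch size; this sketch only captures the latency side of the trade-off.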
The 4th workshop on UnConventional High Performance Computing 2011 (UCHPC 2011), August 29th, 2011, Bordeaux, France, will be held in conjunction with Euro-Par 2011. This workshop is organized by Anders Hast, Josef Weidendorfer and Jan-Philipp Weiss.
As the word “UnConventional” in the title suggests, the workshop focuses on hardware and platforms used for HPC that were not intended for HPC in the first place. Reasons could be raw computing power, good performance per watt, or low cost in general. Thus, UCHPC tries to capture solutions for HPC which are unconventional today but may become conventional tomorrow. For example, the computing power of gaming platforms has risen rapidly in recent years, which motivated the use of GPUs for computing (GPGPU) and even the building of computational grids from game consoles. The recent trend of integrating GPUs onto processor chips seems very beneficial for using both parts for HPC. Other examples of “unconventional” hardware are embedded low-power processors, upcoming many-core architectures, FPGAs, and DSPs. Thus, interesting devices for research in unconventional HPC are not only standard server or desktop systems, but also devices that are relatively cheap because they are mass-market products, such as smartphones, netbooks, tablets, and small NAS servers. Smartphones, for example, seem to become more performance-hungry every day. Only imagination sets the limit for their use.
The full call for papers including detailed submission instructions is available at http://www.lrr.in.tum.de/~weidendo/uchpc11.
We are pleased to announce High-Performance Graphics 2011. High Performance Graphics is the leading international forum for performance-oriented graphics systems research including innovative algorithms, efficient implementations, and hardware architecture. The conference brings together researchers, engineers, and architects to discuss the complex interactions of massively parallel hardware, novel programming models, efficient graphics algorithms, and innovative applications.
The conference is co-located with ACM SIGGRAPH 2011 (Aug. 5-7) in Vancouver, Canada. More information, including the full call for papers with deadlines and submission instructions, is available at http://www.highperformancegraphics.org.
In this paper, we present the design of a power flow algorithm with enhanced performance on the Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). This work investigates the performance of optimized CPU versions of the Newton-Raphson (polar form) and Gauss-Jacobi power flow algorithms, and highlights the approach used to reduce computation time by performing these studies on massively parallel GPU cores. Simulation results demonstrate significant acceleration of the GPU version compared to its CPU variant, reducing processing time enough to make the algorithms suitable for real-time online dispatching.
(Singh, J. and Aruni, I.: “Accelerating Power Flow studies on Graphics Processing Unit”, Proceedings of the Annual IEEE India Conference 2010 (INDICON), pp 1-5, Dec. 2010. [DOI])
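The Gauss-Jacobi scheme maps naturally onto GPUs because every unknown can be updated independently from the previous iterate. Below is a minimal linear-system sketch of the Jacobi iteration that underlies it; the power flow version applies the same fixed-point idea to the nonlinear bus voltage equations. This is an illustration of the iteration scheme, not the authors’ implementation.

```python
import numpy as np

def jacobi(A, b, tol=1e-10, max_iter=500):
    """Jacobi iteration: x_{k+1} = D^{-1} (b - (A - D) x_k).
    Converges for diagonally dominant A. Each component of x_new is
    computed independently of the others, which is exactly the property
    that lets the update run as thousands of parallel GPU threads."""
    D = np.diag(A)               # diagonal entries of A
    R = A - np.diagflat(D)       # off-diagonal remainder
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D  # all components updated in parallel
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# Small diagonally dominant example: solve A x = b.
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([9.0, 13.0])
x = jacobi(A, b)
```

By contrast, Newton-Raphson power flow requires assembling and solving a Jacobian system each iteration, which is why its GPU mapping is considerably more involved than the embarrassingly parallel update shown here.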
Heterogeneous computing is moving into the mainstream, and a broader range of applications are already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. We’ve been working with industry and academic partners to help advance real-world use of these technologies, and to understand the opportunities that lie ahead. It’s time to share what we’ve learned so far.
With tutorials, hands-on labs, and sessions that span a range of topics from HPC to multimedia, you’ll have the opportunity to expand your view of what heterogeneous computing currently offers and where it is going. You’ll hear from industry innovators and academic pioneers who are exploring different ways of approaching problems, and utilizing new paradigms in computing to help identify solutions. You’ll meet AMD experts with deep knowledge of hardware architectures and the software techniques that best leverage those platforms. And you’ll connect with other software professionals who share your passion for the future of technology.
Learn more at developer.amd.com/afds.
The First International Workshop on Accelerator Architectures for the Masses (WACy 2011) will be held in conjunction with the 25th International Conference on Supercomputing (ICS 2011) on June 4th, 2011. The submission of short papers (approximately 6 pages) is encouraged. The workshop is organized by Arrvindh Shriraman and Tor Aamodt; the submission deadline is April 15th, 11:59pm PST. More information is available at http://wacy.cs.sfu.ca.
Multicore/Multi-GPU Accelerated Simulations of Multiphase Compressible Flows Using Wavelet Adapted Grids (March 29th, 2011)
We present a computational method of coupling average-interpolating wavelets with high-order finite volume schemes, and its implementation on heterogeneous computer architectures for the simulation of multiphase compressible flows. The method is implemented to take advantage of the parallel computing capabilities of emerging heterogeneous multicore/multi-GPU architectures. A highly efficient parallel implementation is achieved by introducing the concept of wavelet blocks, exploiting task-based parallelism for the CPU cores, and managing an array of GPUs asynchronously by means of OpenCL. We investigate the comparative accuracy of the GPU- and CPU-based simulations and analyze their discrepancy for two-dimensional simulations of shock-bubble interaction and the Richtmyer–Meshkov instability. The results indicate that the accuracy of the GPU/CPU heterogeneous solver is competitive with one that uses exclusively the CPU cores. We report the performance improvements obtained by employing up to 12 cores and 6 GPUs, compared to single-core execution. For the simulation of the shock-bubble interaction at Mach 3 with two million grid points, we observe a 100-fold speedup for the heterogeneous part and an overall speedup of 34.
(Rossinelli D., Hejazialhosseini B., Spampinato D., Koumoutsakos P.: “Multicore/Multi-GPU Accelerated Simulations of Multiphase Compressible Flows Using Wavelet Adapted Grids”, SIAM Journal on Scientific Computing 33:512-540, 2011 [DOI])
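The wavelet-block idea decomposes the adapted grid into fixed-size blocks that can be processed as independent tasks. A minimal CPU-side sketch of that task-based pattern, using a thread pool, is shown below; the per-block kernel (a simple averaging stencil) and the block size are placeholders, and the ghost-cell exchange between neighboring blocks that a real solver needs is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

BLOCK = 32  # grid points per block (assumed size, for illustration)

def smooth(block):
    """Stand-in per-block kernel: a 1D averaging stencil. In the paper,
    each wavelet block would instead be advanced by a high-order finite
    volume step, on either a CPU core or a GPU."""
    out = block.copy()
    out[1:-1] = 0.25 * block[:-2] + 0.5 * block[1:-1] + 0.25 * block[2:]
    return out

# Split a 1D grid into independent, uniformly sized wavelet-style blocks.
grid = np.linspace(0.0, 1.0, 4 * BLOCK)
blocks = [grid[i:i + BLOCK].copy() for i in range(0, grid.size, BLOCK)]

# Task-based parallelism over the blocks, analogous to the paper's CPU
# path; the real solver dispatches blocks to an array of GPUs as well,
# asynchronously via OpenCL.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(smooth, blocks))
```

The design point being illustrated is that uniform fixed-size blocks give each task a predictable cost, which keeps both the CPU thread pool and the GPU queues well balanced.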
Expanding the already comprehensive breadth of topics covered at GTC 2010, the GTC Content Committee has added new topic areas for 2011. Below is a partial list; see the GTC website for full details:
- Application Design & Porting Techniques
- Climate & Weather Modeling
- Cluster Management
- Computational Structural Mechanics
- Parallel Programming Languages
GTC is also looking for posters that describe novel or interesting research topics in parallel computing, visual computing, and applications of GPUs, with a particular interest in submissions describing GPU computing and CUDA applications that solve diverse problems in scientific and engineering domains.
From a recent press release:
CUDAfy is a .NET SDK that allows you to write, debug and emulate CUDA GPU applications in any .NET language including C# or Visual Basic. The aim is to bring the power of GPGPU to the large number of .NET developers out there. Features include:
- .NET object-oriented CUDA model (GThread)
- Write .NET code marking methods, structures and constants that should be translated to CUDA (“Cudafying”)
- An add-in for Red Gate’s .NET Reflector tool that translates to CUDA C
- Built-in emulation of GPU kernel functions
- 1D, 2D and 3D array support including access to Array class’s Length, GetLength and Rank members
- Use all standard .NET value types; no new types, even for managing data allocated on the GPU
- Simple .NET wrapper for CUFFT and CUBLAS
During our work with the European Space Agency, Astrium, and NLR, we saw how GPUs could significantly improve the performance of emulating algorithms targeted at FPGAs and ASICs. The SDEs and SDKs produced were .NET based, and CUDAfy is the result of efforts to integrate GPU and CPU code development more tightly. User guides and sample projects are available, and many of the samples from the book CUDA by Example have been ported to .NET. See www.hybriddsp.com for downloads and more information.