<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GPGPU &#187; Tag: NVIDIA CUDA :: GPGPU.org</title>
	<atom:link href="http://gpgpu.org/tag/nvidia-cuda/feed" rel="self" type="application/rss+xml" />
	<link>http://gpgpu.org</link>
	<description>General-Purpose Computation on Graphics Hardware</description>
	<lastBuildDate>Mon, 06 Feb 2012 04:59:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>CUDA 4.1 Released</title>
		<link>http://gpgpu.org/2012/01/26/cuda-4-1</link>
		<comments>http://gpgpu.org/2012/01/26/cuda-4-1#comments</comments>
		<pubDate>Fri, 27 Jan 2012 04:06:55 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Compilers]]></category>
		<category><![CDATA[Debugging]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Profiling]]></category>
		<category><![CDATA[Programming Languages]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4422</guid>
		<description><![CDATA[Today NVIDIA released CUDA 4.1, including a new CUDA Toolkit, SDK, Visual Profiler, Parallel Nsight IDE and NVIDIA device driver. CUDA 4.1 makes it easier to accelerate scientific research with GPUs with key features including a redesigned Visual Profiler with automated performance analysis and expert guidance; a new LLVM-based compiler that generates up to 10% faster [...]]]></description>
			<content:encoded><![CDATA[<p>Today NVIDIA released <a href="http://www.developer.nvidia.com/cuda-toolkit-41" target="_blank">CUDA 4.1</a>, including a new CUDA Toolkit, SDK, Visual Profiler, Parallel Nsight IDE and NVIDIA device driver.</p>
<p>CUDA 4.1 makes it easier to accelerate scientific research with GPUs with key features including</p>
<ul>
<li>a redesigned Visual Profiler with automated performance analysis and expert guidance;</li>
<li>a new LLVM-based compiler that generates up to 10% faster code; and</li>
<li>1000+ new imaging and signal processing functions in the NPP library.</li>
</ul>
<p>The CUSPARSE library included with CUDA 4.1 has a new tridiagonal solver and 2x faster sparse matrix-vector multiplication using the ELL hybrid format, and the CURAND library included with CUDA 4.1 has two new random number generators. <span id="more-4422"></span> The CUDA 4.1 toolkit also brings some great improvements to its debugging and performance analysis tools.</p>
<p>Sign up for a webinar to learn more about all the new features &amp; high performance GPU-accelerated libraries!</p>
<p>CUDA Toolkit 4.1 Feature Overview Webinar</p>
<ul>
<li><a href="https://www2.gotomeeting.com/register/955690146" target="_blank">For Europe and The Americas: 10am (PST), Wednesday, Feb 1</a></li>
<li><a href="https://www2.gotomeeting.com/register/187844386" target="_blank">For Asia-Pacific and India: 10am (IST), Friday, Feb 3</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/01/26/cuda-4-1/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PyCOOL: Python Cosmological Object-Oriented Lattice code</title>
		<link>http://gpgpu.org/2012/01/25/pycool</link>
		<comments>http://gpgpu.org/2012/01/25/pycool#comments</comments>
		<pubDate>Wed, 25 Jan 2012 05:03:45 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Astrophysics]]></category>
		<category><![CDATA[Cosmology]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4411</guid>
		<description><![CDATA[PyCOOL (Cosmological Object-Oriented Lattice code) is a fast GPU-accelerated program that solves the evolution of interacting scalar fields in an expanding universe with symplectic algorithms. The program has been written with the intention to hit a sweet spot of speed, accuracy and user friendliness. This is achieved by using the Python language with the PyCUDA interface [...]]]></description>
			<content:encoded><![CDATA[<p>PyCOOL (Cosmological Object-Oriented Lattice code) is a fast GPU-accelerated program that solves the evolution of interacting scalar fields in an expanding universe with symplectic algorithms. The program has been written with the intention to hit a sweet spot of speed, accuracy and user friendliness. This is achieved by using the Python language with the <a href="http://mathema.tician.de/software/pycuda">PyCUDA</a> interface to make a program that is very easy to adapt to different scalar field models. The program is <a href="https://github.com/jtksai/PyCOOL" target="_blank">publicly available</a> under the GNU General Public License. See the <a href="http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/" target="_blank">PyCOOL website</a> for more information.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/01/25/pycool/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance of SpMV in CUSPARSE, CUSP and SpeedIT</title>
		<link>http://gpgpu.org/2012/01/14/performance-of-spmv-in-cusparse-cusp-and-speedit</link>
		<comments>http://gpgpu.org/2012/01/14/performance-of-spmv-in-cusparse-cusp-and-speedit#comments</comments>
		<pubDate>Sat, 14 Jan 2012 12:43:31 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Sparse Linear Systems]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4384</guid>
		<description><![CDATA[The SpeedIT team recently benchmarked and compared the SpMV performance of CUSPARSE 4.0, CUSP 0.2.0 and SpeedIT 2.0 on 23 randomly chosen matrices from the University of Florida Sparse Matrix Collection. Comparisons were done on a Tesla C2050 in single and double precision. The full report is available at http://wp.me/p1ZihD-1.]]></description>
			<content:encoded><![CDATA[<p>The SpeedIT team recently benchmarked and compared the SpMV performance of CUSPARSE 4.0, CUSP 0.2.0 and SpeedIT 2.0 on 23 randomly chosen matrices from the University of Florida Sparse Matrix Collection. Comparisons were done on a Tesla C2050 in single and double precision. The full report is available at <a title="full benchmarking report" href="http://wp.me/p1ZihD-1" target="_blank">http://wp.me/p1ZihD-1</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/01/14/performance-of-spmv-in-cusparse-cusp-and-speedit/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Acceleware 4 Day CUDA Course</title>
		<link>http://gpgpu.org/2012/01/06/acceleware-4-day-cuda-course</link>
		<comments>http://gpgpu.org/2012/01/06/acceleware-4-day-cuda-course#comments</comments>
		<pubDate>Fri, 06 Jan 2012 12:08:40 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Tutorials & Courses]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4342</guid>
		<description><![CDATA[Offered in partnership with NVIDIA and Microsoft, this four-day course is designed for programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU. Delivered by Acceleware’s developers, who provide real-world experience and examples, the training comprises classroom lectures and hands-on tutorials. Each [...]]]></description>
			<content:encoded><![CDATA[<p>Offered in partnership with NVIDIA and Microsoft, this four-day course is designed for programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU.</p>
<p>Delivered by Acceleware’s developers, who provide real-world experience and examples, the training comprises classroom lectures and hands-on tutorials. Each student will be supplied with a laptop equipped with NVIDIA GPUs for the duration of the course. Small class sizes maximize learning and ensure a personal educational experience.</p>
<p>Register before January 13 and receive $250 off your course fee!<br />
Enter promotional code AXTEB2012</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2012/01/06/acceleware-4-day-cuda-course/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HOOMD-blue 0.10.0 release</title>
		<link>http://gpgpu.org/2011/12/19/hoomd-blue-0-10-0-release</link>
		<comments>http://gpgpu.org/2011/12/19/hoomd-blue-0-10-0-release#comments</comments>
		<pubDate>Mon, 19 Dec 2011 07:44:41 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Molecular Dynamics]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4279</guid>
		<description><![CDATA[HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://codeblue.umich.edu/hoomd-blue/">HOOMD-blue</a> performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and crystallization of metals.</p>
<p>HOOMD-blue 0.10.0 adds many new features. Highlights include:<span id="more-4279"></span></p>
<ul>
<li>Added <strong>pair.dpdlj</strong> which uses the <span class="caps">DPD </span>thermostat and the Lennard-Jones potential. In previous versions, this could be accomplished by using two pair commands but at the cost of reduced performance.</li>
<li>Additional example scripts are now present in the documentation. The example scripts are cross-linked to the commands that are used in them.</li>
<li>Most dump commands now accept the form: <strong>dump.ext(filename="filename.ext")</strong> which immediately writes out filename.ext.</li>
<li>Specify rigid bodies in <span class="caps">XML </span>input files</li>
<li>Support for simulations that contain rigid body constraints applied to groups of particles in <span class="caps">BDNVT, NVE, NVT, </span>and <span class="caps">NPT </span>ensembles.</li>
<li>Energy minimization of rigid bodies (<strong>integrate.mode_minimize_rigid_fire</strong>)</li>
<li>Existing commands are now rigid-body aware</li>
<li><span class="caps">NVT </span>integration using the Berendsen thermostat (<strong>integrate.berendsen</strong>)</li>
<li>Bonds, angles, dihedrals, and impropers can now be created and deleted with the python data access <span class="caps">API.</span></li>
<li>and <a href="http://codeblue.umich.edu/hoomd-blue/">more</a></li>
</ul>
<p>HOOMD-blue 0.10.0 is available for <a href="http://codeblue.umich.edu/hoomd-blue/download.html">download</a> under an open source license. Check out the <a href="http://codeblue.umich.edu/hoomd-blue/doc/page_quick_start.html">quick start tutorial</a> to get started, or check out the <a href="http://codeblue.umich.edu/hoomd-blue/doc/index.html">full documentation</a> to see everything it can do.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/12/19/hoomd-blue-0-10-0-release/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures</title>
		<link>http://gpgpu.org/2011/12/14/acceleration-of-wavefront-applications</link>
		<comments>http://gpgpu.org/2011/12/14/acceleration-of-wavefront-applications#comments</comments>
		<pubDate>Wed, 14 Dec 2011 09:26:00 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Linear Algebra]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4264</guid>
		<description><![CDATA[Abstract: In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract:</p>
<blockquote><p>In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P).</p>
<p>Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed those of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.</p></blockquote>
<p>(Pennycook, S.J., Hammond, S.D., Mudalige, G.R., Wright, S.A. and Jarvis, S.A.: <em>&#8220;On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures&#8221;</em>, The Computer Journal (in press) [<a href="http://dx.doi.org/10.1093/comjnl/bxr073" target="_blank">DOI</a>] [<a href="http://eprints.dcs.warwick.ac.uk/787/" target="_blank">PREPRINT</a>])</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/12/14/acceleration-of-wavefront-applications/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDA 4.1 RC2 Released</title>
		<link>http://gpgpu.org/2011/12/06/cuda4-1rc2-released</link>
		<comments>http://gpgpu.org/2011/12/06/cuda4-1rc2-released#comments</comments>
		<pubDate>Tue, 06 Dec 2011 16:31:45 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4245</guid>
		<description><![CDATA[The NVIDIA CUDA Toolkit 4.1 RC2 is now available for anyone to download. The key features of this release are: a new LLVM-based compiler; over 1000 additional image processing functions in the NPP library; and a Visual Profiler. There is also a new version of Parallel Nsight 2.1 RC2 with support for CUDA 4.1. To [...]]]></description>
			<content:encoded><![CDATA[<p>The NVIDIA CUDA Toolkit 4.1 RC2 is now available for anyone to download. The key features of this release are:</p>
<ul>
<li>A new LLVM-based compiler</li>
<li>Over 1000 additional image processing functions in the NPP library</li>
<li>A Visual Profiler</li>
</ul>
<p>There is also a new version of Parallel Nsight 2.1 RC2 with support for CUDA 4.1. To download it and find out more, follow: <a href="http://bit.ly/sRpQvr" target="_blank">http://bit.ly/sRpQvr</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/12/06/cuda4-1rc2-released/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to Generic Accelerated Computing with Libra SDK</title>
		<link>http://gpgpu.org/2011/11/30/generic-accelerated-computing-libra-sdk</link>
		<comments>http://gpgpu.org/2011/11/30/generic-accelerated-computing-libra-sdk#comments</comments>
		<pubDate>Wed, 30 Nov 2011 07:35:49 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Programming Environments]]></category>
		<category><![CDATA[Scientific Computing]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4230</guid>
		<description><![CDATA[Libra SDK is a sophisticated runtime including an API, sample programs and documentation for massively accelerating software computations. This introductory tutorial provides an overview and usage examples of the powerful Libra API &#38; math libraries executing on x86/x64, OpenCL, OpenGL and CUDA technology. Libra API enables generic and portable CPU/GPU computing within software development without the [...]]]></description>
			<content:encoded><![CDATA[<p>Libra SDK is a sophisticated runtime including an API, sample programs and documentation for massively accelerating software computations. This introductory tutorial provides an overview and usage examples of the powerful Libra API &amp; math libraries executing on x86/x64, OpenCL, OpenGL and CUDA technology. Libra API enables generic and portable CPU/GPU computing within software development without the need to create multiple, specific and optimized code paths to support x86, OpenCL, OpenGL or CUDA devices. Link to PDF: <a href="http://www.gpusystems.com/doc/LibraGenericComputing.pdf" target="_blank">www.gpusystems.com/doc/LibraGenericComputing.pdf</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/11/30/generic-accelerated-computing-libra-sdk/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Alenka &#8211; A GPU database engine including compression</title>
		<link>http://gpgpu.org/2011/11/28/gpu-database-engine</link>
		<comments>http://gpgpu.org/2011/11/28/gpu-database-engine#comments</comments>
		<pubDate>Mon, 28 Nov 2011 10:26:38 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Compression]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4215</guid>
		<description><![CDATA[Support for several types of compression has been added to the GPU-based database engine ålenkå. Supported algorithms include FOR (frame of reference), FOR-DELTA and dictionary compression. All compression algorithms run on the GPU, achieving compression and decompression speeds of gigabytes per second. Compression can significantly reduce or eliminate I/O bottlenecks in [...]]]></description>
			<content:encoded><![CDATA[<p>Support for several types of compression has been added to the GPU-based <a href="https://sourceforge.net/projects/alenka/files" target="_blank">database engine ålenkå</a>. Supported algorithms include FOR (frame of reference), FOR-DELTA and dictionary compression. All compression algorithms run on the GPU, achieving compression and decompression speeds of gigabytes per second. Compression can significantly reduce or eliminate I/O bottlenecks in analytical queries, as shown by ålenkå&#8217;s results in the Star Schema and TPC-H benchmarks.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/11/28/gpu-database-engine/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Integrating CUDA and GNU Autotools</title>
		<link>http://gpgpu.org/2011/11/17/integrating-cuda-and-gnu-autotools</link>
		<comments>http://gpgpu.org/2011/11/17/integrating-cuda-and-gnu-autotools#comments</comments>
		<pubDate>Thu, 17 Nov 2011 10:18:10 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Autotools]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=4173</guid>
		<description><![CDATA[ClusterChimps.org has released a step-by-step guide to integrating CUDA with GNU Autotools. The guide covers building standalone CUDA binaries, static CUDA libraries, and shared CUDA libraries, and comes with an example tarball. For more information, go to http://www.clusterchimps.org/autotools.php]]></description>
			<content:encoded><![CDATA[<p>ClusterChimps.org has released a step-by-step guide to integrating CUDA with GNU Autotools. The guide covers building standalone CUDA binaries, static CUDA libraries, and shared CUDA libraries, and comes with an example tarball. For more information, go to <a href="http://www.clusterchimps.org/autotools.php" target="_blank">http://www.clusterchimps.org/autotools.php</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/11/17/integrating-cuda-and-gnu-autotools/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

