<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GPGPU &#187; Tag: Data-Parallel :: GPGPU.org</title>
	<atom:link href="http://gpgpu.org/tag/data-parallel/feed" rel="self" type="application/rss+xml" />
	<link>http://gpgpu.org</link>
	<description>General-Purpose Computation on Graphics Hardware</description>
	<lastBuildDate>Mon, 06 Feb 2012 04:59:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>CUDPP 2.0: parallel hash tables, tridiagonal solver, parallel reductions, and double precision</title>
		<link>http://gpgpu.org/2011/08/08/cudpp-2-0</link>
		<comments>http://gpgpu.org/2011/08/08/cudpp-2-0#comments</comments>
		<pubDate>Tue, 09 Aug 2011 03:07:13 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=3831</guid>
		<description><![CDATA[CUDPP release 2.0 is a major new release of the CUDA Data-Parallel Primitives Library, with exciting new features. The public interface has undergone a minor redesign to provide thread safety. Parallel reductions (cudppReduce) and a tridiagonal system solver (cudppTridiagonal) have been added, and a new component library, cudpp_hash, provides fast data-parallel hash table functionality. In addition, [...]]]></description>
			<content:encoded><![CDATA[<p>CUDPP release 2.0 is a major new release of the <a href="http://cudpp.googlecode.com" target="_blank">CUDA Data-Parallel Primitives Library</a>, with exciting new features. The public interface has undergone a minor redesign to provide thread safety. Parallel reductions (<a href="http://cudpp.googlecode.com/svn/tags/2.0/doc/html/group__public_interface.html#ga21d9b2b3c74daffbec52ef628f835313" target="_blank">cudppReduce</a>) and a tridiagonal system solver (<a href="http://cudpp.googlecode.com/svn/tags/2.0/doc/html/group__public_interface.html#gabd3c1f97e1d22839756fd2594aaefb56" target="_blank">cudppTridiagonal</a>) have been added, and a new component library, <a href="http://cudpp.googlecode.com/svn/tags/2.0/doc/html/hash_overview.html" target="_blank">cudpp_hash</a>, provides fast data-parallel hash table functionality. In addition, support for 64-bit data types (double as well as long long and unsigned long long) has been added to all CUDPP algorithms, and a variety of bugs have been fixed.  For a complete list of changes, see the <a href="http://cudpp.googlecode.com/svn/tags/2.0/doc/html/changelog.html" rel="nofollow" target="_blank">change log</a>. CUDPP 2.0 is available for <a href="http://code.google.com/p/cudpp/downloads/list">download now</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2011/08/08/cudpp-2-0/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thrust v1.3 release</title>
		<link>http://gpgpu.org/2010/10/07/thrust-v1-3-release</link>
		<comments>http://gpgpu.org/2010/10/07/thrust-v1-3-release#comments</comments>
		<pubDate>Fri, 08 Oct 2010 01:25:16 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Algorithms]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=2840</guid>
		<description><![CDATA[Thrust v1.3, an open-source template library for CUDA applications, has been released. Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing. Version 1.3 adds several new features, including: a state-of-the-art sorting implementation, recently featured on Slashdot. performance improvements to stream compaction and reduction robust [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gpgpu.org/wp/wp-content/uploads/2010/10/thrust_logo-e1286501346306.png"><img class="alignright size-full wp-image-2841" title="thrust_logo" src="http://gpgpu.org/wp/wp-content/uploads/2010/10/thrust_logo-e1286501346306.png" alt="" width="200" height="79" /></a><a href="http://thrust.googlecode.com">Thrust</a> v1.3, an open-source template library for CUDA applications, has been released.  Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing.</p>
<p>Version 1.3 adds several new features, including:</p>
<ul>
<li>a state-of-the-art sorting implementation, recently <a href="http://developers.slashdot.org/story/10/08/30/0133203/Sorting-Algorithm-Breaks-Giga-Sort-Barrier-With-GPUs">featured</a> on Slashdot</li>
<li>performance improvements to stream compaction and reduction</li>
<li>robust error reporting and failure detection</li>
<li>support for CUDA 3.2 and GF104-based GPUs</li>
<li>search algorithms</li>
<li>and <a href="http://code.google.com/p/thrust/source/browse/CHANGELOG?r=2444d6c2eb30fea369b0417940d2306f8d03040c">more</a>!</li>
</ul>
<p>Get started with Thrust today!  First <a href="http://thrust.googlecode.com/files/thrust-v1.3.0.zip">download Thrust v1.3</a> and then follow the online <a href="http://code.google.com/p/thrust/wiki/QuickStartGuide">quick-start guide</a>.  Refer to the <a href="http://code.google.com/p/thrust/wiki/Documentation">online documentation</a> for a complete list of features.  Many <a href="http://thrust.googlecode.com/files/examples-v1.3.zip">concrete examples</a> and a set of <a href="http://code.google.com/p/thrust/downloads/list">introductory slides</a> are also available.<span id="more-2840"></span></p>
<p>Thrust is open-source software distributed under the <a href="http://www.opensource.org/licenses/apache2.0.php">OSI-approved</a> Apache License v2.0.</p>
<p>Acknowledgments</p>
<ul>
<li>Thanks to Duane Merrill for contributing a fast radix sort implementation</li>
<li>Thanks to Erich Elsen for contributing an implementation of find_if</li>
<li>Thanks to Andrew Corrigan for contributing changes which enable OpenMP in the absence of nvcc</li>
<li>Thanks to Andrew Corrigan, Cliff Woolley, David Coeurjolly, Janick Martinez Esturo, John Bowers, Maxim Naumov, Michael Garland, and Ryuta Suzuki for bug reports</li>
<li>Thanks to Cliff Woolley for help with testing</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2010/10/07/thrust-v1-3-release/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDPP 1.1.1</title>
		<link>http://gpgpu.org/2010/04/29/cudpp-1-1-1</link>
		<comments>http://gpgpu.org/2010/04/29/cudpp-1-1-1#comments</comments>
		<pubDate>Thu, 29 Apr 2010 09:36:17 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Sorting]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=2262</guid>
		<description><![CDATA[The CUDA Data Parallel Primitives Library (CUDPP) is a cross-platform, open-source library of data-parallel algorithm primitives such as parallel prefix-sum (&#8220;scan&#8221;), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables. [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://code.google.com/p/cudpp/" target="_blank">CUDA Data Parallel Primitives Library</a> (CUDPP) is a cross-platform, open-source library of data-parallel algorithm primitives such as parallel prefix-sum (&#8220;scan&#8221;), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables. CUDPP runs on processors that support CUDA.</p>
<p><a href="http://cudpp.googlecode.com/files/cudpp_src_1.1.1.zip">CUDPP release 1.1.1</a> is a bugfix release with fixes for scan, segmented scan, stream compaction, and radix sort on the NVIDIA Fermi (sm_20) architecture, including GeForce 400 series and Tesla 20 series GPUs.  It also includes improvements and bugfixes for radix sorts on 64-bit OSes, and fixes for 64-bit builds on MS Windows OSes and Apple OS X 10.6 (Snow Leopard).  <a href="http://cudpp.googlecode.com/svn/tags/1.1.1/cudpp/doc/html/changelog.html" target="_blank">Change Log</a>.</p>
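<p>To make the "building block" claim concrete, here is a minimal sketch of stream compaction built on an exclusive prefix sum. It is sequential, standard C++ rather than CUDPP code, and the names exclusive_scan_ref and compact_ref are made up for this illustration; the flags are assumed to be 0 or 1. On a GPU each loop would become a data-parallel step.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
std::vector&lt;int&gt; exclusive_scan_ref(const std::vector&lt;int&gt;&amp; in)
{
    std::vector&lt;int&gt; out(in.size());
    int running = 0;
    for (std::size_t i = 0; i &lt; in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// Stream compaction: keep the elements whose flag is 1, preserving order.
// (Hypothetical helper; flags are assumed to contain only 0 or 1.)
std::vector&lt;int&gt; compact_ref(const std::vector&lt;int&gt;&amp; values,
                             const std::vector&lt;int&gt;&amp; flags)
{
    assert(values.size() == flags.size());
    // Scanning the flags gives every kept element its output index.
    const std::vector&lt;int&gt; offsets = exclusive_scan_ref(flags);
    const int kept = flags.empty() ? 0 : offsets.back() + flags.back();
    std::vector&lt;int&gt; out(static_cast&lt;std::size_t&gt;(kept));
    for (std::size_t i = 0; i &lt; values.size(); ++i) {
        if (flags[i] == 1) {
            out[static_cast&lt;std::size_t&gt;(offsets[i])] = values[i];
        }
    }
    return out;
}</pre>
<p>For example, compacting the values 5, 6, 7, 8 with the flags 1, 0, 1, 1 scans the flags to the offsets 0, 1, 1, 2 and writes 5, 7, 8.</p>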
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2010/04/29/cudpp-1-1-1/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDPP Users: Please Complete This Survey!</title>
		<link>http://gpgpu.org/2010/02/11/cudpp-survey</link>
		<comments>http://gpgpu.org/2010/02/11/cudpp-survey#comments</comments>
		<pubDate>Fri, 12 Feb 2010 00:56:00 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Parallel Algorithms]]></category>
		<category><![CDATA[Surveys]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=2145</guid>
		<description><![CDATA[The developers of the CUDPP (CUDA Data-Parallel Primitives) Library request that users (past and current) of the CUDPP Library fill out the CUDPP Survey.  This survey will help the CUDPP Team prioritize new development and support for existing and new features.]]></description>
			<content:encoded><![CDATA[<p>The developers of the CUDPP (CUDA Data-Parallel Primitives) Library request that users (past and current) of the CUDPP Library fill out the <a href="http://gd.is/TTJ3" target="_blank">CUDPP Survey</a>.  This survey will help the CUDPP Team prioritize new development and support for existing and new features.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2010/02/11/cudpp-survey/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Some older publications worth reading</title>
		<link>http://gpgpu.org/2010/01/17/news-backlog</link>
		<comments>http://gpgpu.org/2010/01/17/news-backlog#comments</comments>
		<pubDate>Sun, 17 Jan 2010 22:22:06 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Site News]]></category>
		<category><![CDATA[Computer Architecture]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[Molecular Dynamics]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Programming Languages]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1609</guid>
		<description><![CDATA[Occasionally, we receive news submissions pointing us to interesting older papers that somehow slipped by without our notice. This post collects a few of those. If you want your work to be posted on GPGPU.org  in a timely manner, please remember to use the news submission form. Joshua A. Anderson, Chris D. Lorenz and Alex [...]]]></description>
			<content:encoded><![CDATA[<p>Occasionally, we receive news submissions pointing us to interesting older papers that somehow slipped by without our notice. This post collects a few of those. If you want your work to be posted on GPGPU.org  in a timely manner, please remember to use the <a href="http://gpgpu.org/submit-news">news submission form</a>.</p>
<ul>
<li>Joshua A. Anderson, Chris D. Lorenz and Alex Travesset present and discuss molecular dynamics simulations and compare a single GPU against a 36-CPU cluster (<em>General purpose molecular dynamics simulations fully implemented on graphics processing units</em>, Journal of Computational Physics 227(10), May 2008, DOI <a href="http://dx.doi.org/10.1016/j.jcp.2008.01.047" target="_blank">10.1016/j.jcp.2008.01.047</a>).</li>
<li>Wen-mei Hwu et al. derive and discuss goals and concepts of programming models for fine-grained parallel architectures, from the point of view of both a programmer and a hardware/compiler designer, and analyze CUDA as one current representative (<em>Implicitly parallel programming models for thousand-core microprocessors</em>, Proceedings of DAC&#8217;07, June 2007, DOI <a href="http://dx.doi.org/10.1145/1278480.1278669" target="_blank">10.1145/1278480.1278669</a>).</li>
<li>Jeremy Sugerman et al. present GRAMPS, a prototype implementation of future graphics hardware that allows pipelines to be specified as graphs in software (<em>GRAMPS: A Programming Model for Graphics Pipelines</em>, ACM Transactions on Graphics 28(1), January 2009, DOI <a href="http://dx.doi.org/10.1145/1477926.1477930" target="_blank">10.1145/1477926.1477930</a>).</li>
<li>William R. Mark discusses concepts of future graphics architectures in this contribution to the 2008 ACM Queue special issue on GPUs (<em>Future graphics architectures</em>, ACM Queue 6(2), March/April 2008, DOI <a href="http://dx.doi.org/10.1145/1365490.1365501" target="_blank">10.1145/1365490.1365501</a>).</li>
<li>BSGP by Qiming Hou et al. is a new programming language for general purpose GPU computing that achieves the same efficiency as well-tuned CUDA programs but makes code much easier to read, develop and maintain (<em>BSGP: bulk-synchronous GPU programming</em>, ACM Siggraph 2008, August 2008, DOI <a href="http://dx.doi.org/10.1145/1399504.1360618">10.1145/1399504.1360618</a>).</li>
<li>Finally, <a href="http://dx.doi.org/10.1016/j.jpdc.2008.05.014" target="_blank">Che et al.</a> and <a href="http://dx.doi.org/10.1109/MM.2008.57" target="_blank">Garland et al.</a> survey the field of GPU computing and discuss many different application domains. These articles are, in addition to the ones we have <a href="developer/cuda#reading">collected on the developer pages</a>, recommended to GPGPU newcomers.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2010/01/17/news-backlog/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thrust 1.1 Released</title>
		<link>http://gpgpu.org/2009/09/11/thrust-1-1-released</link>
		<comments>http://gpgpu.org/2009/09/11/thrust-1-1-released#comments</comments>
		<pubDate>Fri, 11 Sep 2009 05:51:01 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Algorithms]]></category>
		<category><![CDATA[Sorting]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1856</guid>
		<description><![CDATA[Thrust (v1.1) is  an open-source template library for developing CUDA applications.  Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing. Version 1.1 adds several new features, including: fancy iterators binary search algorithms pair and tuple types segmented scan (experimental) pinned memory support (experimental) and more! [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Thrust" href="http://thrust.googlecode.com/" target="_blank">Thrust</a> (v1.1) is  an open-source template library for developing CUDA applications.  Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing. Version 1.1 adds several new features, including:</p>
<ul>
<li> <a title="fancy iterators" href="http://thrust.googlecode.com/svn/tags/1.1.0/doc/html/group__fancyiterator.html" target="_blank">fancy iterators</a></li>
<li> <a title="binary search algorithms" href="http://thrust.googlecode.com/svn/tags/1.1.0/doc/html/group__binary__search.html" target="_blank">binary search algorithms</a></li>
<li> <a title="pair and tuple types" href="http://thrust.googlecode.com/svn/tags/1.1.0/doc/html/group__utility.html" target="_blank">pair and tuple types</a></li>
<li> <a title="segmented scan (experimental)" href="http://thrust.googlecode.com/svn/tags/1.1.0/doc/html/group__segmentedprefixsums.html" target="_blank">segmented scan (experimental)</a></li>
<li> <a title="pinned memory support (experimental)" href="http://thrust.googlecode.com/svn/tags/1.1.0/doc/html/group__memory__management__classes.html" target="_blank">pinned memory support (experimental)</a></li>
<li> and <a title="more" href="http://code.google.com/p/thrust/source/browse/tags/1.1.0/CHANGELOG" target="_blank">more</a>!</li>
</ul>
<p>To get started with Thrust, first <a title="Download Thrust" href="http://code.google.com/p/thrust/downloads/list" target="_blank">download</a> Thrust and then follow the online <a href="http://code.google.com/p/thrust/wiki/Tutorial" target="_blank">tutorial</a>.  Refer to the <a title="online documentation" href="http://code.google.com/p/thrust/wiki/Documentation" target="_blank">online documentation</a> for a complete list of features.
Many <a href="http://thrust.googlecode.com/files/examples.zip" target="_blank">concrete examples</a> and a set of <a href="http://code.google.com/p/thrust/downloads/list" target="_blank">introductory slides</a> are also available. As the following code example shows, Thrust programs are concise and readable. <span id="more-1856"></span></p>
<pre>#include &lt;thrust/host_vector.h&gt;
#include &lt;thrust/device_vector.h&gt;
#include &lt;thrust/generate.h&gt;
#include &lt;thrust/sort.h&gt;
#include &lt;cstdlib&gt;

int main(void)
{
    // generate twenty random numbers on the host
    thrust::host_vector&lt;int&gt; h_vec(20);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer data to the device
    thrust::device_vector&lt;int&gt; d_vec = h_vec;

    // sort data on the device
    thrust::sort(d_vec.begin(), d_vec.end());

    return 0;
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/09/11/thrust-1-1-released/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel acquires RapidMind</title>
		<link>http://gpgpu.org/2009/08/23/intel-acquires-rapidmind</link>
		<comments>http://gpgpu.org/2009/08/23/intel-acquires-rapidmind#comments</comments>
		<pubDate>Mon, 24 Aug 2009 01:08:26 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[APIs]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[Programming Languages]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1807</guid>
		<description><![CDATA[Intel has acquired RapidMind, the company behind the RapidMind (formerly Sh) programming environment targeting multicore CPUs, AMD and NVIDIA GPUs and the Cell processor. The RapidMind Platform continues to be available, including support. In the medium term RapidMind&#8217;s technology and products will be integrated with Intel&#8217;s data-parallel products, in particular Intel&#8217;s Ct technology. This blog [...]]]></description>
			<content:encoded><![CDATA[<p>Intel has acquired RapidMind, the company behind the RapidMind (formerly Sh) programming environment targeting multicore CPUs, AMD and NVIDIA GPUs and the Cell processor. The <a href="http://rapidmind.com/product.php" target="_blank">RapidMind Platform</a> continues to be available, including support. In the medium term RapidMind&#8217;s technology and products will be integrated with Intel&#8217;s data-parallel products, in particular <a href="http://software.intel.com/en-us/data-parallel/" target="_blank">Intel&#8217;s Ct technology</a>.</p>
<p>This <a href="http://software.intel.com/en-us/blogs/2009/08/19/rapidmind-intel/" target="_blank">blog entry</a> by James Reinders from Intel describes the acquisition and future plans in more detail.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/08/23/intel-acquires-rapidmind/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDPP 1.1 Now Available</title>
		<link>http://gpgpu.org/2009/07/01/cudpp-1-1-release</link>
		<comments>http://gpgpu.org/2009/07/01/cudpp-1-1-release#comments</comments>
		<pubDate>Wed, 01 Jul 2009 08:30:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[CUDPP]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1738</guid>
		<description><![CDATA[Release 1.1 of the CUDA Data-Parallel Primitives Library (CUDPP) is now available for download.  The two major new features in CUDPP 1.1 are a very fast new radix sort implementation with support for sorting key-value pairs (with float or unsigned integer keys); and a new pseudorandom number generator, cudppRand. CUDPP 1.1 also replaces its former custom [...]]]></description>
			<content:encoded><![CDATA[<p>Release 1.1 of the <a href="http://gpgpu.org/developer/cudpp">CUDA Data-Parallel Primitives Library</a> (CUDPP) is now available for download.  The two major new features in CUDPP 1.1 are a very fast new radix sort implementation with support for sorting key-value pairs (with float or unsigned integer keys); and a new pseudorandom number generator, cudppRand. CUDPP 1.1 also replaces its former custom license with the standard BSD license. This greatly simplifies the CUDPP license details, and it also enables CUDPP to move into a public source repository such as Google Code in the near future. For more information, visit the <a href="http://gpgpu.org/developer/cudpp">CUDPP Website</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/07/01/cudpp-1-1-release/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Efficient parallel scan algorithms for GPUs</title>
		<link>http://gpgpu.org/2009/06/24/sengupta-segscan</link>
		<comments>http://gpgpu.org/2009/06/24/sengupta-segscan#comments</comments>
		<pubDate>Thu, 25 Jun 2009 01:19:46 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[CUDPP]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Parallel Algorithms]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1696</guid>
		<description><![CDATA[This NVIDIA technical report by Sengupta, Harris, and Garland describes the design of new parallel algorithms for scan and segmented scan on GPUs.   This paper describes the primitives included in the latest release of the CUDPP library. Abstract: Scan and segmented scan algorithms are crucial building blocks for a great many data-parallel algorithms. Segmented scan [...]]]></description>
			<content:encoded><![CDATA[<p>This <a href="http://mgarland.org/papers.html#segscan-tr" target="_blank">NVIDIA technical report</a> by Sengupta, Harris, and Garland describes the design of new parallel algorithms for scan and segmented scan on GPUs.   This paper describes the primitives included in the latest release of the <a href="http://gpgpu.org/developer/cudpp">CUDPP</a> library.</p>
<p>Abstract:</p>
<blockquote><p>Scan and segmented scan algorithms are crucial building blocks for a great many data-parallel algorithms.  Segmented scan and related primitives also provide the necessary support for the flattening transform, which allows for nested data-parallel programs to be compiled into flat data-parallel languages.  In this paper, we describe the design of efficient scan and segmented scan parallel primitives in CUDA for execution on GPUs.  Our algorithms are designed using a divide-and-conquer approach that builds all scan primitives on top of a set of primitive intra-warp scan routines.  We demonstrate that this design methodology results in routines that are simple, highly efficient, and free of irregular access patterns that lead to memory bank conflicts.  These algorithms form the basis for current and upcoming releases of the widely used CUDPP library.</p></blockquote>
<p>(S. Sengupta, M. Harris, and M. Garland. <a href="http://mgarland.org/papers.html#segscan-tr" target="_blank"><em>Efficient parallel scan algorithms for GPUs</em></a>.     NVIDIA Technical Report NVR-2008-003, December 2008)</p>
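<p>For readers unfamiliar with the primitive, a segmented scan is a scan that restarts at every segment boundary. The sketch below is a sequential reference in standard C++, assuming segments are marked by head flags (1 at the first element of each segment). It only shows what the primitive computes; it is not the report's intra-warp algorithm, and the name segmented_inclusive_scan_ref is made up for this illustration.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Inclusive segmented sum scan (sequential reference, not GPU code).
// head_flags[i] == 1 marks the first element of a segment (assumed encoding).
std::vector&lt;int&gt; segmented_inclusive_scan_ref(const std::vector&lt;int&gt;&amp; values,
                                              const std::vector&lt;int&gt;&amp; head_flags)
{
    assert(values.size() == head_flags.size());
    std::vector&lt;int&gt; out(values.size());
    int running = 0;
    for (std::size_t i = 0; i &lt; values.size(); ++i) {
        if (head_flags[i] == 1) {
            running = 0;  // a new segment starts here
        }
        running += values[i];
        out[i] = running;
    }
    return out;
}</pre>
<p>With the values 1, 2, 3, 4, 5 and the head flags 1, 0, 1, 0, 0, the two segments are scanned independently, giving 1, 3, 3, 7, 12.</p>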
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/06/24/sengupta-segscan/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fast and Scalable List Ranking on the GPU</title>
		<link>http://gpgpu.org/2009/04/28/fast-and-scalable-list-ranking-on-the-gpu</link>
		<comments>http://gpgpu.org/2009/04/28/fast-and-scalable-list-ranking-on-the-gpu#comments</comments>
		<pubDate>Wed, 29 Apr 2009 03:27:38 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Data-Parallel]]></category>
		<category><![CDATA[List Ranking]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Parallel Algorithms]]></category>

		<guid isPermaLink="false">http://gpgpu.org/2009/04/28/fast-and-scalable-list-ranking-on-the-gpu</guid>
		<description><![CDATA[Abstract from the paper by Rehman et al.: General purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest performance per dollar. While GPUs are usually used to tackle regular problems that can be easily parallelized, we describe two implementations [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract from the <a href="http://research.iiit.ac.in/~rehman/Papers/ics152-rehman.pdf" target="_blank">paper by Rehman et al.</a>:</p>
<p>General purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest performance per dollar. While GPUs are usually used to tackle regular problems that can be easily parallelized, we describe two implementations of List Ranking—a traditional irregular algorithm that is difficult to parallelize on such massively multi-threaded hardware. In our best implementation, we introduce a GPU-optimized, recursive version of the Helman-JaJa algorithm. Our implementation can rank a random list of 8 million elements in just over 100 milliseconds, and achieves a speedup of about 8-9 over a CPU implementation as well as a speedup of 3-4 over the best reported implementation on the Cell Broadband Engine. We also discuss some practical issues that come to the fore when working with massively multi-threaded architectures, especially for algorithms with highly irregular memory access patterns. (M. Suhail Rehman, K. Kothapalli, P.J. Narayanan. <a href="http://research.iiit.ac.in/~rehman/Papers/ics152-rehman.pdf" target="_blank">Fast and Scalable List Ranking on the GPU</a>. 23rd International Conference on Supercomputing (ICS). New York, USA, June 2009. (To Appear))</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/04/28/fast-and-scalable-list-ranking-on-the-gpu/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

