This paper describes an implementation of fast deformable image registration for radiation therapy using GPUs and CUDA. On lung and prostate volumetric images, the GPU implementation is 40-66 times faster than a single-threaded CPU implementation and 25-41 times faster than a multithreaded implementation. GPU-based near-real-time deformable image registration opens up a host of clinical applications for medical imaging. (High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy. Sanjiv S. Samant, Junyi Xia, Pınar Muyan-Özçelik, John D. Owens. Medical Physics, 2008.)