Automating CUDA-to-OpenCL Translation

  • If I have seen further it is by standing on the shoulders of giants - Isaac Newton


Release Notes

CU2CL Beta Release 0.8.0b - 2017.03.21

Welcome to the CU2CL (CUDA-to-OpenCL) source-to-source translator project. CU2CL is a work-in-progress academic prototype translator that endeavors to provide translation of many of the most frequently used CUDA features. However, CUDA is a large, moving target and CU2CL may not support all syntax and API elements of the current CUDA specification.

Currently, it targets OpenCL as the output standard. It also aims to produce easily maintained translated source by preserving the formatting and developer comments of the original CUDA, and by injecting searchable comment tags for untranslated structures. (These tags are intended to aid manual post-translation intervention.)

The tool translates both host- and device-side code, generating separate families of output files suitable for a standard C++ compiler (*-cl.cpp/h files), as well as an OpenCL kernel compiler (* files). The majority of supported translations focus on core Runtime API calls for memory, device, thread, stream, and event management, as well as the CUDA C-style kernel invocation and full kernel function translation.

Additionally, the 0.8.0b release has added a range of new features to support cross-translation unit global conversion of large, multi-file codebases. Please see the full release notes for more information.

This bleeding-edge release currently has limited or no support for several features including (but not limited to):

  • CUDA Textures and Surfaces
  • CUDA/OpenGL Interoperability
  • C++ Templates in kernel code
  • CUDA Driver API
  • CUDA Runtime API features newer than 3.2
  • CUDA Kernel features newer than 3.2
  • Precompiled CUDA libraries

Many of these features are supportable in OpenCL but have not been necessitated by the applications of interest that CU2CL has been provisioned for. Pull requests to add or improve support for the one-to-one mappings that exist are always appreciated.

We have endeavored to provide translation-time warnings for as many of these cases as possible, but the tool is undergoing active development and the CUDA specification is a moving target so there are likely cases we've missed. There is also a potential for the translator to produce small mistranslations in the presence of atypical usage of CUDA syntax. There will be bugs, and we'd love to hear about them so that we can improve the tool. Contact us!


CU2CL is currently tested and officially supported only on 64-bit variants of Debian 8.7 "Jessie" and later and CentOS 6.5. (It is likely to work on other 64-bit Debian- or Fedora-based distributions.)

  • make
  • cmake >= 2.8
  • wget
  • CUDA Runtime >=3.2
  • GNU C Library (glibc) >= 2.19

Ensure you have internet access as the install process will download the specific LLVM/Clang revision used to build the CU2CL binary.

Unpack the cu2cl_binary-0.8.0b.tgz tarball, and invoke the Bash script to start the install process. The script uses wget to download the necessary revision of the LLVM/Clang source, unpacks it into the subdirectory “llvm-3.4”, and creates the build directory “llvm-build”. It then configures cmake and builds the LLVM/Clang binaries required by CU2CL. Compiling Clang can take some time, so by default the installation script is configured to use two make jobs for compilation. To change this behavior, edit line 23 of the script (“make -j2”) to reflect the desired number of jobs.

The script then symbolically links a small number of include files generated during the LLVM/Clang build so that they are locatable by the CU2CL build process. It then configures cmake to build CU2CL and performs the build. CU2CL is a single-source tool, so no threaded make is necessary for this step.

After compiling LLVM/Clang and CU2CL, the installation should be complete. The binary cu2cl-tool in directory "cu2cl-build" can now be used to perform translations. Do not remove the "llvm-build" directory as CU2CL needs access to (If you have a local packaged install of this library, you may be able to use that; however, this is untested.) You will also likely need to update your LD_LIBRARY_PATH variable to point to the newly generated llvm-build/lib/ directory.


Once built, the cu2cl_tool binary can be used to translate sets of CUDA source files, along with any local CUDA source files they include. However, it will not provide translation of any included headers specified with the system include "<...>" syntax. After successful translation, CU2CL will output *-cl.cpp, *, and *-cl.h files as appropriate for host, device, and header code.

In order to run the tool the following syntax must be used:

"./cu2cl_tool fooRuntimeMain.cpp -- -DGPU_ON -I ./fooIncludes" 

Which would generate fooRuntimeMain.cpp-cl.cpp,,,, and any associated translated header files.

Additionally, a set of CU2CL utility functions will be generated in cu2cl_util.c/h/cl. cu2cl_util.c must be compiled and linked into the finished executable for the linking to succeed, as it includes requisite initialization, cleanup, and other OpenCL utility functions.

The tool selectively attempts to translate any *.c, *.h, *.cpp, *.hpp, or other included source file types if they contain CUDA syntax (variables, runtime function calls, or special syntax) and are local to the project (i.e. not a system include). It will not attempt to translate any includes specified with the #include <...> syntax reserved for system headers; project headers should use the #include "..." syntax. Finally, it does not support the CUDA SDK Samples' shrUtils or cutils, and will likely emit malformed source code if they are present. Please manually refactor your code to handle these constraints before attempting translation.


I have CUDA syntax in my header files but I don't see any translator output!

CU2CL will not translate headers specified using the angle brace syntax reserved for system header files, only those using the appropriate double quotes syntax for locally-included files.

i.e. NOT

#include <localCUDAheader.cuh>

but instead

#include "localCUDAheader.cuh" 

My code uses NVIDIA's cUtils/shrUtils, but the translated code looks broken.

CU2CL does not support translation of these non-canonical CUDA libraries, and we do not intend to. Please manually refactor your code to not rely on any of these non-canonical functions.

My code uses macro FOO(a, b), why does it look broken in output OpenCL?

CU2CL's macro support is still rather basic. We are working on improving its ability to handle preprocessor directives, but in the meantime, factoring out preprocessor directives as much as possible will dramatically improve the correctness of the output.

Why are sections of my code surrounded by #ifdef or #ifndef not translated?

CU2CL acts like a compiler and will only translate the conditionally compiled regions that are active at translation time. If you wish to translate other regions, preprocessor definitions can be passed to the translator's parser after the "--". Further, to pass these conditional compilation macros down to the OpenCL kernel compiler, the --cl-extra-args CU2CL option should be specified as a double-quote delimited string before the "--".

i.e. "./cu2cl_tool --cl-extra-args="-D SOMEVAR=1" -- -D SOMEVAR=1 -I ./" 
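
To see why only one branch is translated, it may help to look at a conditionally compiled region the way a parser does. The following C sketch is illustrative only: SOMEVAR mirrors the macro in the command above, and gpu_region_compiled is a hypothetical helper that is not part of CU2CL.

```c
/* SOMEVAR stands in for a macro that would normally arrive via
 * "-D SOMEVAR=1" after the "--" on the command line. It is left
 * undefined here on purpose, so it defaults to 0 below. */
#ifndef SOMEVAR
#define SOMEVAR 0
#endif

/* Hypothetical helper: reports which branch the parser actually saw. */
int gpu_region_compiled(void) {
#if SOMEVAR
    /* Parsed (and therefore translatable) only when SOMEVAR is nonzero. */
    return 1;
#else
    /* With SOMEVAR unset or 0, this is the only branch the parser sees. */
    return 0;
#endif
}
```

Compiled as written, the function returns 0 because the first branch never reaches the parser; building with -D SOMEVAR=1 swaps which branch is visible. A translator built on a compiler front end is in the same position, which is why the definition has to be supplied at translation time.
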

My CUDA device variable that's declared in header file "foo.h" isn't translated, even though its initialization in another file is. What's going on?

As of CU2CL 0.8.0b, propagation of cl_mem type rewrites up and down the call graph and across translation unit boundaries is supported if the following conditions are met: 1) All translation units that make up the final binary are specified on the same invocation of CU2CL (i.e. do not run a separate invocation for each *.o file; specify all the sources at once). 2) The variable declaration is not in a banned header file (i.e. a system include). 3) The variable is of a primitive device pointer type (i.e. not a struct, enum, union, typedef, class, etc.; static-size arrays or pointers-to-pointers should be fine). If these conditions are met and the variable is still not translated, please submit a minimal representative example as an issue on our GitHub.

My well-packed 3-member vector type is taking up too much memory in OpenCL!

CU2CL currently targets OpenCL 1.0 compatibility which does not provide a three-member vector type. Therefore these are automatically upconverted to four-member vectors, which may cause issues with data alignment.
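
The size difference can be sketched in plain C. The types vec3f and vec4f below are hypothetical stand-ins for CUDA's float3 and OpenCL's cl_float4 (the real definitions live in the respective headers, and cl_float4 additionally carries 16-byte alignment), and pad3_to_4 is an assumed helper showing one way host data might be repacked by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real vector types. */
typedef struct { float x, y, z; } vec3f;     /* packed like CUDA float3 */
typedef struct { float x, y, z, w; } vec4f;  /* sized like OpenCL cl_float4 */

/* Copy n tightly packed 3-member vectors into 4-member vectors,
 * zeroing the unused fourth lane. */
void pad3_to_4(vec4f *dst, const vec3f *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
        dst[i].x = src[i].x;
        dst[i].y = src[i].y;
        dst[i].z = src[i].z;
        dst[i].w = 0.0f;
    }
}

/* Storage needed for n elements in each layout. */
size_t bytes_as_vec3(size_t n) { return n * sizeof(vec3f); }
size_t bytes_as_vec4(size_t n) { return n * sizeof(vec4f); }
```

On platforms where float is 4 bytes, each element grows from 12 to 16 bytes, so buffers become one third larger and any host code that computes offsets from the 3-member stride needs to be adjusted.
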

The templates/classes in my kernel code aren't translated!

OpenCL does not support C++ in device code, and CU2CL does not support translation of these features to C. Please manually remove all C++ from device code before translation.
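
One possible manual refactoring, sketched here under assumptions, is to replace each template instantiation that the kernels actually use with an explicitly named C function. The template shown in the comment and the names axpy_f and axpy_i are hypothetical; they are not taken from CU2CL or any particular codebase.

```c
/* Hypothetical original CUDA helper (shown for reference, not compiled):
 *
 *   template <typename T>
 *   __device__ T axpy(T a, T x, T y) { return a * x + y; }
 *
 * Plain-C replacement: one function per instantiation in use.
 * Explicit functions are used instead of a generating macro, since
 * heavy macro use can itself hinder translation. */
float axpy_f(float a, float x, float y) { return a * x + y; }
int   axpy_i(int a, int x, int y)       { return a * x + y; }
```

Call sites such as axpy<float>(a, x, y) would then be rewritten by hand to axpy_f(a, x, y) before running the translator.
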

The CUDA library my application relies on doesn't appear to be used anymore.

CU2CL cannot translate precompiled CUDA libraries. Manually convert your OpenCL code to use an equivalent OpenCL library after translation.

My OpenCL code is trying to call a nonexistent kernel function! / Invoking a CUDA function pointer doesn't appear to be working after translation.

We do not currently support recognizing CUDA kernel function pointers. You will need to manually adapt your OpenCL code to emulate this behavior.
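
One way this might be emulated by hand, offered as a sketch rather than a recommended design, is to replace the function pointer with an integer tag and look up the kernel's name on the host; the OpenCL host API creates kernels by name through clCreateKernel. The enum values, kernel names, and kernel_name_for below are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical tags replacing values that used to be kernel
 * function pointers in the CUDA code. */
typedef enum { KERNEL_VEC_ADD = 0, KERNEL_VEC_SCALE, KERNEL_COUNT } kernel_id;

/* Names of the corresponding kernels in the translated device source. */
static const char *const kernel_names[KERNEL_COUNT] = {
    "vec_add",
    "vec_scale"
};

/* Return the kernel name for a tag, or NULL if the tag is out of range.
 * The result is what would be handed to clCreateKernel on the host. */
const char *kernel_name_for(int id) {
    if (id < 0 || id >= KERNEL_COUNT) {
        return NULL;
    }
    return kernel_names[id];
}
```

Code that previously stored or passed a kernel function pointer would store a kernel_id instead and resolve it to a kernel object at launch time.
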

My OpenCL kernel code is trying to include system headers! Why is it doing that?

This kernel code likely resided in a CUDA file which included these system headers. They were copied without modification into the OpenCL device source code file. Please manually remove them.

I'm getting errors about "invalid output constraint '=h' in asm" in CUDA headers from CUDA versions >= 5.0 and can't proceed with translation!

Some versions of Clang 3.2 have difficulty parsing some inline assembly from the CUDA headers (from CUDA versions >= 5.0). A workaround that allows translation to proceed in most cases --- those not requiring these specialized intrinsics for Compute Capability > 3.0 --- is to simply nullify these headers with a #define.

Try adding the "-D __SM_32_INTRINSICS_H__" override to the CU2CL invocation like so:

i.e. "./cu2cl_tool -- -D __SM_32_INTRINSICS_H__ -I ./" 

One warning: if your CUDA code actually makes use of these intrinsics, the translation behavior after this workaround is essentially undefined.
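
The workaround relies on ordinary include-guard behavior: if the guard macro is already defined when the header is reached, the preprocessor skips the header body. The single-file C sketch below simulates this with a hypothetical guard DEMO_INTRINSICS_H and hypothetical contents; it does not reproduce the real CUDA header.

```c
/* Simulates passing "-D DEMO_INTRINSICS_H" on the command line. */
#define DEMO_INTRINSICS_H

/* ---- body of a hypothetical guarded header ---- */
#ifndef DEMO_INTRINSICS_H
#define DEMO_INTRINSICS_H
#define DEMO_HAS_INTRINSICS 1
int demo_intrinsic(int x) { return x << 1; }
#endif
/* ---- end of header body ---- */

/* Reports whether the header body survived preprocessing. */
int intrinsics_visible(void) {
#ifdef DEMO_HAS_INTRINSICS
    return 1;
#else
    return 0;
#endif
}
```

Because the guard was predefined, nothing inside the header is declared, which is also why code that genuinely calls those declarations can no longer be parsed or translated reliably.
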


Xilinx Harris

As a thanks for supporting our work, sponsors receive early access to all major releases. If you are interested in becoming a sponsor, please contact us.

Recent News

CU2CL 0.8.0b Released

03/21/17: We are pleased to announce the release of the 0.8.0b version of CU2CL. The whole program translation architecture debuted in CU2CL 0.7.0b has been expanded to include our first cross-AST translation and type propagations. A binary tarball is available with registration here and the full source is available on GitHub.

CU2CL at SC'16 Emerging Technologies Showcase

09/27/16: Sathre, Gardner and Feng have been selected to demonstrate CU2CL's effectiveness during the Emerging Technologies Showcase at Supercomputing'16 in Salt Lake City.

CU2CL Releases

Source Release 0.8.0b

Latest update (03/21/17)

Binary Release 0.8.0b

Latest update (03/21/17)