UPDATE (March 28, 2017): I have found a way to use at least some of the C++11 features. See the end of the post for the changes.
It’s been a long time since I’ve posted anything. Since my last post I had to update the operating system on my work desktop as Fedora 23 went end-of-life. While it is possible to simply use dnf to upgrade your system, I have the proprietary Nvidia driver installed on my system for two reasons. First, with my graphics cards (GeForce GTX 650’s), the Nouveau driver doesn’t seem to really work. Without fail, after about 30 minutes the screen no longer refreshes, making it seem as though the system is locked up. Second, I do some GPU programming with CUDA, which requires the proprietary driver. With the proprietary driver installed, it’s a bit more difficult to upgrade, so I tend to just back everything up and do a clean install.
Having done that recently, I find myself this morning needing to re-install CUDA as I have a computational problem which could benefit from some massive parallelism. I figured I’d go ahead and post the procedure here for my future reference and in case anyone else might benefit from it.
- Start by installing the Nvidia proprietary driver using one of the following methods:
- Download directly from Nvidia and follow the directions here. This works well for desktops.
- If you have a laptop with Nvidia Optimus (i.e. an integrated Intel card with a discrete Nvidia card), use Bumblebee following the directions here. To check if this is the case for you, look at the output of
$ lspci | egrep "VGA|3D"
If the output is as follows (from my laptop):
$ lspci | egrep "VGA|3D"
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev ff)
then you’ll want to use Bumblebee.
- If you prefer, there are repos that have the Nvidia proprietary driver such as rpmfusion and negativo17. I have no recent experience with either of these repos so I can’t say how well they will work. I will note that negativo17 also has CUDA in their repo, but again I have no experience installing in that way.
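The Optimus check described above can be scripted. What follows is only a minimal sketch: `detect_optimus` is a hypothetical helper name, and it assumes that an Optimus laptop always lists an Intel "VGA compatible controller" next to an NVIDIA "3D controller", as in the output from my laptop. Other machines may label the cards differently, so treat the result as a hint rather than a verdict.

```shell
# Hypothetical helper (not part of any package): reads lspci-style text on
# stdin and prints "optimus" when an Intel VGA controller and an NVIDIA 3D
# controller are both listed, otherwise "not-optimus".
# Assumption: the device labels match the laptop output shown in the post.
detect_optimus() (
    listing=$(cat)
    if printf '%s\n' "$listing" | grep -q "VGA compatible controller: Intel" &&
       printf '%s\n' "$listing" | grep -q "3D controller: NVIDIA"; then
        echo "optimus"
    else
        echo "not-optimus"
    fi
)
```

It would be used as `lspci | detect_optimus`.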
- Once you have installed the driver, rebooted your system, and ensured that everything is working, we can move on to installing CUDA. Download CUDA 8.0 for Fedora 23 (I know, it’s ridiculous that they don’t officially support versions of Fedora that are not end-of-life) from here, making sure to select the “runfile (local)” option.
- Install a few libraries that CUDA would like to have around
$ sudo dnf install libGLU-devel libXi-devel libXmu-devel
$ sudo dnf groupinstall "C Development Tools and Libraries"
- Change directory to where you downloaded CUDA and then run the install script with the --override option, which will allow it to install with an unsupported version of GCC and on Fedora 25 instead of 23
$ sudo sh cuda_8.0.44_linux.run --override
- You will first be presented with the End User Agreement which you should read carefully (i.e. simply press and hold enter until you reach the end), then type accept.
- You will be warned that you are installing on an unsupported configuration and asked if you’d like to continue. Enter y.
- You will be asked if you’d like to install the proprietary driver, which we already did in step 1, so enter n.
- You will be asked if you want to install the CUDA 8.0 Toolkit, which is kind of the whole point of this, so enter y, and then use either the default location or a location of your choosing for the installation.
- You’ll probably want to create the symbolic link when asked, and you’ll definitely want to install the samples, as they will allow us to test that the installation is working and will provide you with lots of example code to look at. You can specify the location in which to install the samples, or use the default of your home directory. After that the install will proceed; ignore the warning about the incomplete installation.
- Now we need to edit our .bash_profile a bit, so open it up in your favorite text editor and add/modify the following lines
PATH=$PATH:<Other locations you may have>:/usr/local/cuda/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<Other locations you may have>:/usr/local/cuda/lib64
CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:$HOME/NVIDIA_CUDA-8.0_Samples/common/inc
export PATH
export LD_LIBRARY_PATH
export CPLUS_INCLUDE_PATH
- Once that is out of the way, you will of course have to log out and back in again or type
$ source $HOME/.bash_profile
for the changes to take effect.
- Okay, now we have to deal with the fact that Nvidia is way behind the times and doesn’t officially support any GCC version past 5. Fedora 25 currently uses 6.3.1, so we need to edit one of the CUDA header files. Open /usr/local/cuda-8.0/include/host_config.h in your editor of choice and change line 117 to something like
#if __GNUC__ > 8
I find the easiest thing to do is to copy the file to your home directory, make the changes and then use sudo to copy it back.
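If you would rather not edit the header by hand, the same change can be sketched with sed. This is a hedged example, not a tested recipe: `patch_gcc_ceiling` is a hypothetical helper name, and it assumes the version check reads exactly `#if __GNUC__ > 5`, which you should confirm by looking at line 117 of your own copy first. It keeps a `.orig` backup next to the file it modifies.

```shell
# Hypothetical helper: raise the GCC major-version ceiling in a copy of
# host_config.h. Assumption: the check is written as "#if __GNUC__ > 5";
# if your header words it differently, this changes nothing.
patch_gcc_ceiling() (
    header=$1          # path to host_config.h (or a copy of it)
    new_max=${2:-8}    # highest GCC major version to allow
    cp "$header" "$header.orig" || exit 1    # keep a backup of the original
    # Rewrite only the line that compares __GNUC__ against 5.
    sed "s/^\(#if __GNUC__ > \)5[[:space:]]*\$/\1$new_max/" \
        "$header.orig" > "$header"
)
```

Following the copy-edit-copy-back approach, you would copy the header to your home directory, run `patch_gcc_ceiling $HOME/host_config.h 8`, check the result, and then use sudo to copy it back.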
- Now that all of that is done, there is one more caveat. Since GCC version 5 defaulted to -std=c++98 when compiling (GCC 6 switched the default to C++14), this is the behavior that CUDA expects. So, when compiling any CUDA code, you’ll need to include the
--compiler-options "-std=c++98"
option. This will of course preclude you from using any C++11, C++14 or C++17 features in your CUDA code (see below for an update on this).
- To test that everything is working, change directory to
$ cd <PATH_TO_SAMPLES>/1_Utilities/deviceQuery
open the makefile and change line 155 to read
CCFLAGS := -std=c++98
save the changes and then in the terminal type
$ make
to build the code, then
$ ./deviceQuery
(with primusrun before it if you use Bumblebee) to run the code. If you see information about the CUDA devices on your system, then everything should be working!
Unfortunately, I have yet to find a way to make all of the samples at once using the makefile in the top samples directory. But you can build them individually in a similar manner to how we made the deviceQuery sample.
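Building the samples one at a time can at least be looped. The sketch below is an untested assumption on my part rather than a verified procedure: `build_all_samples` is a hypothetical helper, it assumes the layout `<samples root>/<category>/<sample>/Makefile`, and it passes `CCFLAGS` on the make command line in place of editing each makefile. A command-line variable overrides the assignments inside a makefile, so any sample that relies on its own additions to `CCFLAGS` may still need a manual edit.

```shell
# Hypothetical helper: run make in every sample directory under a samples
# root, forcing -std=c++98 through CCFLAGS. Each build's output goes to
# build.log in that sample's directory. Returns non-zero if any build fails.
build_all_samples() (
    root=$1
    status=0
    for makefile in "$root"/*/*/Makefile; do
        [ -f "$makefile" ] || continue    # the glob matched nothing
        dir=$(dirname "$makefile")
        if make -C "$dir" CCFLAGS=-std=c++98 > "$dir/build.log" 2>&1; then
            echo "ok      $dir"
        else
            echo "FAILED  $dir (see build.log)"
            status=1
        fi
    done
    exit "$status"
)
```

For the default install location this would be called as `build_all_samples $HOME/NVIDIA_CUDA-8.0_Samples`.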
I hope that this is helpful to others. The procedure above has worked on my laptop with Bumblebee, and works on my desktop. If you try this and it works on your system, please let me know in the comments by providing the output of
$ uname -r
$ lspci | grep "VGA"
or
$ lspci | egrep "VGA|3D"
if you use Bumblebee, and
$ nvcc --version
Also, if you have problems or have to deviate from the above procedures, let us all know. Happy coding!
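To make reporting easier, the commands above can be wrapped in one small function. This is just a convenience sketch: `cuda_install_report` is a hypothetical name, and it uses the combined "VGA|3D" pattern so the same function covers both desktops and Bumblebee laptops.

```shell
# Hypothetical helper: print the kernel version, the graphics devices, and
# the nvcc version in one go, noting any tool that is not installed.
cuda_install_report() (
    echo "kernel: $(uname -r)"
    if command -v lspci >/dev/null 2>&1; then
        lspci | grep -E "VGA|3D" || echo "lspci: no VGA/3D controller listed"
    else
        echo "lspci: not found"
    fi
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version
    else
        echo "nvcc: not found on PATH"
    fi
)
```

Running `cuda_install_report` and pasting the output into a comment gives everything asked for above.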
UPDATE ABOUT C++11:
Today I needed to compile some code to run Markov Chain Monte Carlo (MCMC) where the model is computed on the GPU, which speeds things up from several minutes per model calculation to ~3.5 seconds. The header-only MCMC library that I wrote uses the standard library random number generators and distributions that are part of C++11. This made me revisit the issue caused by NVIDIA being behind the times.
To access C++11 features, you can run compilation commands with:
$ nvcc -std=c++11 -arch=...
where in place of the “…” you fill in the rest of the flags you need. When I did this on my system, it complained about a couple of lines in math_functions.h, so instead of just giving up, I opened the file to dig into what was going on. The offending lines were (I believe) 8897 and 8901, which read
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ int isnan(double x) throw();
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ int isinf(double x) throw();
If we simply modify these lines to be in the same format as the others in that block, so they read
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ constexpr bool isnan(double x);
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ constexpr bool isinf(double x);
then we should no longer have problems using (most of) the C++11 features! However, there is at least one exception that I have found so far thanks to my custom parameter file parsing library. If you use the standard library map, you’ll get a bunch of output like
/usr/include/c++/6.3.1/tuple:605:4: error: parameter packs not expanded with ‘...’:
bool>::type=true>
^
/usr/include/c++/6.3.1/tuple:605:4: note: ‘_Elements’
However, you seem to be able to use vectors and tuples just fine. If you make these changes and run into other things that seem to break, please let me know in the comments. I’ll try to look into this in more detail soon.