
CUDA
Closed, Resolved (Public)

Assigned To
Authored By
theSoenke
Aug 24 2016, 9:51 PM

Description

Name: CUDA
Homepage: https://developer.nvidia.com/cuda-zone
Open Source: no
Download: http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run

Required for a lot of machine learning software, and also useful for GPU support in Blender (T238).
The NVIDIA license seems to allow redistribution, so I guess it does not need to land in third party:
http://docs.nvidia.com/cuda/eula/#redistributable-software


Event Timeline


@DataDrake Have there been any updates on this? I need CUDA for machine learning research, and would love it if I could do this type of work in Solus. What are the bottlenecks keeping it from being implemented?

Perhaps we could ask @ikey to look at this - CUDA is really essential for a lot of people and this topic was opened a year ago. It's also not something that can be replaced by already available packages.

Perhaps some sort of Docker image? As far as I have read, different versions of clang can coexist.

Hi Guys, bumping this one up - I moved to Solus from Ubuntu and it's a bummer to see no CUDA support. As @JGH1000 mentioned, this is essential for a lot of machine learning and other tasks, would really love to see it in the Solus repos.

CUDA 8.x requires an outdated version of gcc/clang.
CUDA 9 is currently a release candidate and will support newer versions of the compilers, as @JGH1000 stated. So I guess this should be reevaluated once CUDA 9 is released.

Unknown Object (User) added a comment. Aug 6 2017, 10:51 PM

@kyrios123 just out of curiosity, how does Arch Linux manage that if they are always on the latest versions? Do they provide the RC? I cannot work on this issue though, as I no longer have an NVIDIA card.

@STiAT CUDA 9 isn't publicly available; only registered users can download it.
I did a quick search, and Arch Linux provides CUDA 8 and some earlier versions through the community repository.

Arch Linux allows users to install different versions of GCC at the same time; that's the trick. Check depends here.

Would there be scope to have multiple GCC versions in Solus? CUDA 9 supports GCC 6.x, but what if Solus moves to GCC 7.x? CUDA is usually used by professionals who need a stable environment for work. As far as I remember, Arch creates a symbolic link in /opt/cuda to an older GCC. It seems pretty simple, but I might be wrong.
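Independently of the symlink approach, nvcc has a -ccbin option (long form --compiler-bindir) that selects the host compiler for a single build, so a second GCC would not have to replace the system one. A minimal sketch that only prints such a command line; the compiler path /opt/gcc-6/bin/gcc and the file hello.cu are hypothetical, and Solus does not ship a second GCC according to this thread:

```shell
#!/bin/sh
# Build (but do not run) an nvcc command line that pins the host compiler.
# $1 is the host compiler to use, the remaining arguments go to nvcc as-is.
nvcc_with_host_compiler() {
    host_cc=$1
    shift
    printf 'nvcc -ccbin %s' "$host_cc"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Hypothetical paths, for illustration only.
nvcc_with_host_compiler /opt/gcc-6/bin/gcc -o hello hello.cu
```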

Unknown Object (User) added a comment. Aug 22 2017, 9:34 PM

@JGH1000 maybe there is, and with the new Snaps support there may be a way to work around the licensing issue as well. But we'd have to bundle the applications that require CUDA in snaps, together with an older GCC, for those where it makes sense. Blender and others are probably candidates for that.

@STiAT Sounds tricky, especially given that a lot of stuff in Python and R uses CUDA. Nearly all scientific computing nowadays depends on it.


As a researcher in machine learning I'd love to switch to Solus full time as my sole distro. CUDA support would enable that.

One of our IRC users, sgfx, found a Docker container that has allowed him to run Iray renders much faster. It allows you to use CUDA, and it comes from NVIDIA's GitHub page.

Install docker

sudo eopkg it docker

Install nvidia-docker and nvidia-docker-plugin

wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1_amd64.tar.xz
sudo tar --strip-components=1 -C /usr/bin -xvf /tmp/nvidia-docker*.tar.xz && rm /tmp/nvidia-docker*.tar.xz

Run nvidia-docker-plugin

sudo -b nohup nvidia-docker-plugin > /tmp/nvidia-docker.log

Test nvidia-smi

nvidia-docker run --rm nvidia/cuda nvidia-smi
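Before the final test, it may help to confirm that the three commands from the steps above actually ended up on PATH. A small sketch; the helper name missing_tools is made up for this example:

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The commands used in the steps above.
missing=$(missing_tools docker nvidia-docker nvidia-docker-plugin)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All tools found"
fi
```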

sgfx here...

If you want the nVidia Docker plugin to run automatically on startup, you can make a systemd service to do that for you.

Create the file /etc/systemd/system/nvidia-docker-plugin.service with the contents:

[Unit]
Description=nVidia Docker plugin service

[Service]
ExecStart=/usr/bin/nvidia-docker-plugin

[Install]
WantedBy=multi-user.target

Start the service:

sudo systemctl start nvidia-docker-plugin

Enable it to autostart:

sudo systemctl enable nvidia-docker-plugin
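The unit file can also be written from a script. The sketch below uses the same contents as above and takes the target directory as a parameter, so it can be tried in a scratch directory first; the function name write_plugin_unit is an assumption of this example.

```shell
#!/bin/sh
# Write the unit file shown above into the directory given as $1.
write_plugin_unit() {
    unit_dir=$1
    cat > "$unit_dir/nvidia-docker-plugin.service" <<'EOF'
[Unit]
Description=nVidia Docker plugin service

[Service]
ExecStart=/usr/bin/nvidia-docker-plugin

[Install]
WantedBy=multi-user.target
EOF
}

# Dry run into a temporary directory and show the result.
scratch=$(mktemp -d)
write_plugin_unit "$scratch"
cat "$scratch/nvidia-docker-plugin.service"
rm -r "$scratch"
```

On a real system the target would be /etc/systemd/system (written as root), and running sudo systemctl daemon-reload afterwards makes systemd pick up the new file before it is started or enabled.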

Docker seems like a good way to go. I'll give it a try and will report back.

In reply to @willgfx's systemd service instructions above (T354#79438):

Using this, are there any further steps I need to take to render with Cycles in Blender?

@kyrios123 mentioned that CUDA 9 was a release candidate. The status has changed, and CUDA 9 can now be downloaded without any registration. Maybe packaging it for Solus is now a possibility?

@Scottapotamas @saitam @Riokei @willgfx @theSoenke can you please all test and see if a fully updated system works with CUDA as you would expect?

Preface: Dell XPS 15 (9550), uses a 960M.

NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.90  Tue Sep 19 19:17:35 PDT 2017
GCC version:  gcc version 6.4.0 (Solus)

I downloaded cuda_9.0.176_384.81.run and went through with the install (but declined when it tried to install the 384.81 driver). It seemed to complete successfully.

I was able to compile a simple DIY hello-world style program with some CUDA malloc, memcpy tests etc., but on running it I got an "error while loading shared libraries: libcudart.so.9.0: cannot open shared object file: ..."
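Which shared libraries the loader cannot find can be confirmed with ldd before changing any configuration. A small sketch that filters ldd output for unresolved entries; the sample input mimics the format ldd prints, and ./hello is a hypothetical binary name:

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the names of unresolved libraries.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Sample input in the format ldd prints. On a real system one would run
#   ldd ./hello | unresolved_libs
printf '\tlibcudart.so.9.0 => not found\n\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)\n' | unresolved_libs
```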

I resolved this with sudo ldconfig /usr/local/cuda-9.0/lib64 and my hello-world runs properly. Moving on to more complicated samples (which I elected to put in /opt/cuda), I'm able to build the examples, but when trying to run simplePrintf I get:

scott@marble /opt/cuda/NVIDIA_CUDA-9.0_Samples/0_Simple/simplePrintf $ ./simplePrintf 
CUDA error at ../../common/inc/helper_cuda.h:1162 code=30(cudaErrorUnknown) "cudaGetDeviceCount(&device_count)"

and for some of the more standard ones (matrixMul and friends), get

scott@marble /opt/cuda/NVIDIA_CUDA-9.0_Samples/0_Simple/matrixMul $ ./matrixMul 
[Matrix Multiply Using CUDA] - Starting...
cudaGetDevice returned error unknown error (code 30), line(390)
cudaGetDeviceProperties returned error unknown error (code 30), line(403)
MatrixA(320,320), MatrixB(640,320)
cudaMalloc d_A returned error unknown error (code 30), line(165)

Should I install the older graphics driver that is included in the cuda package? Is that a bad idea™? I was under the impression I resolved my path correctly.

(I'm not able to compile the more complex graphical examples due to apparently missing deps for libGL, libGLU, libX11, gl.h, glu.h, xlib.h). I know I've got glu.h, but this is probably an unrelated issue.
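The thread never identifies the cause of the code 30 errors. One thing that could be checked, as a guess rather than a diagnosis, is whether the device nodes the NVIDIA driver normally creates (nvidiactl, nvidia0, nvidia-uvm) are present. A sketch, with a made-up helper name:

```shell
#!/bin/sh
# List which NVIDIA device nodes are absent under the directory in $1
# (defaults to /dev). An empty result means all three nodes exist.
missing_nvidia_nodes() {
    dev_dir=${1:-/dev}
    for node in nvidiactl nvidia0 nvidia-uvm; do
        [ -e "$dev_dir/$node" ] || printf '%s\n' "$node"
    done
}

missing_nvidia_nodes
```

If nvidia-uvm is listed as missing, loading the nvidia-uvm kernel module would be the next thing to look at; whether that applies to this report is an assumption.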


EDIT: shortly after testing this and posting, I noticed my fans kick on, and two CPU cores were pegged at 100% load, but I couldn't find which tasks this correlated to (the previous test programs using CUDA had all been killed). About a minute after I noticed this, the system slowed to the point where the cursor wouldn't respond, audio streams cut out, etc. Not saying it's absolutely related, but I'm pretty sure it's related...

@Justin: Yes, the update removed the need for nvidia-docker-plugin in my usage case (IRay renders in Allegorithmic Substance Designer)

@Justin Nope, in my use case for blender I don't see my gpu showing up as a usable cycles device.

Just spotted this Solus G+ CUDA post which explains why @Riokei and I didn't win.

@Justin I'll try my stuff when the CUDA SDK needs testing, I'm probably not experienced enough to help much before that point.

@Justin Full CUDA support requires nvcc together with a few libs and headers, plus the driver part. The driver part was solved via https://dev.solus-project.com/R2209:0a0742a93f8eba0ff9f13a03423b016c4da4618a but the rest is still missing.
Snapping stuff like Blender and FFmpeg (the two most common examples) together with CUDA and GCC 6 would increase their size by a huge margin.

The most convenient solution for users would be having a separate GCC package in the main repo, like Arch Linux does, and distributing CUDA via a snap or a script that downloads and installs it (like in openSUSE).

Is there any new progress here, and has supporting cuDNN already been added to the road map, or do I need to submit a separate request?


I have been able to install CUDA on my Solus system and use it to compile and run some of my own CUDA programs.

Thus far, I have only installed it in my home directory, and used environment variables to make the libraries and include files usable.
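The comment does not say which variables or paths were used. A minimal sketch of one way to do it, assuming the toolkit sits under a prefix with the usual bin, lib64 and include subdirectories; the prefix $HOME/cuda and the function name are hypothetical, and CPATH is the GCC include-path variable, chosen here as an assumption:

```shell
#!/bin/sh
# Prepend a CUDA install under prefix $1 to the search paths of this shell.
use_cuda_prefix() {
    prefix=$1
    PATH="$prefix/bin:$PATH"
    # Only add a ":" separator when the variable already has a value.
    LD_LIBRARY_PATH="$prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    CPATH="$prefix/include${CPATH:+:$CPATH}"
    export PATH LD_LIBRARY_PATH CPATH
}

# "$HOME/cuda" is a hypothetical location; the comment does not name one.
use_cuda_prefix "$HOME/cuda"
echo "$LD_LIBRARY_PATH"
```

The function has to be sourced into the shell that will run nvcc (for example from a shell profile), because exported variables do not propagate back from a child script.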

I largely used the Arch PKGBUILD as a guide for installing, including the "hack" for making glibc 2.26 work by adding a #define to one of the header files.
I did not apply the findgllib.mk patch (it would need to be adapted for Solus), resulting in one of the samples throwing warnings.
I also did not bother to remove many of the files, including duplicates of the samples, jre, etc.

The other thing I did, which may not be acceptable for a finished package, is making it ignore the gcc version check and just simply use system gcc 7.3.0.
I have seen no ill effects of that, though admittedly I have not tested extensively.
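To see whether the system compiler falls inside a supported range before bypassing the check, the major version can be compared against a limit. A sketch; the limit of 6 reflects the "CUDA 9 supports GCC 6.x" statement earlier in this thread, and the function name is made up:

```shell
#!/bin/sh
# Succeed if the major version in $1 (e.g. "7.3.0") is at most $2.
gcc_major_at_most() {
    major=${1%%.*}
    [ "$major" -le "$2" ]
}

# Use the system compiler if there is one; otherwise fall back to the
# version quoted in the comment above so the sketch still runs.
version=$(gcc -dumpversion 2>/dev/null || echo 7.3.0)
if gcc_major_at_most "$version" 6; then
    echo "gcc $version is within the GCC 6.x range mentioned for CUDA 9"
else
    echo "gcc $version is newer than GCC 6.x"
fi
```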

The CUDA included sample programs compiled without error.

I am running NVIDIA-current drivers (390.48) with a GTX 1060 GPU.

CUDA 9.2 was released, with support for glibc 2.26 and GCC 7 (at least on Fedora 27 and Ubuntu 17.10)

https://developer.download.nvidia.com/compute/cuda/9.2/Prod/docs/sidebar/CUDA_Installation_Guide_Linux.pdf

It does, however, require version 396.26 of the proprietary drivers, which are not yet released in the repo.

When CUDA is added, please keep in mind that there needs to be an easy way of handling different versions of CUDA, and that you shouldn't just include the most recent one. Deep learning libraries and many creative applications use widely varying versions of CUDA. Also, please include cuDNN at some point.

@justinkterry How is CUDA in terms of backward compatibility? The problem, as stated earlier in this thread, is that older versions often have (sometimes seemingly arbitrarily imposed) prerequisites on older versions of GCC/glibc. Maintaining the older versions therefore becomes a bit of a pain, as Solus does not by default have the ability to have multiple GCCs and such installed.

As for cuDNN, that would probably be a separate package, as it is a separate download from CUDA itself. (Indeed, it requires a login to even get a download link, at least officially)
Arch does package it, and you can see their PKGBUILD if you search for it. Not sure how/if their maintainer received permission from Nvidia to do so.

I have a package.yml file that worked on my machine (and submitted in D2962), however CUDA cannot be distributed in the repos at this time due to the EULA.

(There is still certainly some cleaning up I could do to it, but I may not bother until/unless it can be distributed more broadly.)

For anyone interested, know that CUDA can be installed and used on Solus.

So the whole CUDA ecosystem is kind of a complex shit show. You have a CUDA version (e.g. 9.1) and a subversion (e.g. 9.1.2) with fixes. Each version requires a specific range of drivers and hardware (though any modern hardware, or hardware something like Solus wants to support, is fine). Each program requires a specific version or range of versions of CUDA, which is almost never the most current version. Most interesting things require cuDNN, as AI is baked into literally everything now and the biggest users of GPU acceleration are ML people (like myself). cuDNN has versions and subversions as well, with builds of each for the most recent versions of CUDA. Mainstream software only sometimes supports the newest version of cuDNN. As much of a shit show as this is, for technical reasons it's justified. The only thing they could do to make things a little better is start including cuDNN by default with CUDA, like they do with cuBLAS etc.

I have no idea how Arch could have gotten permission. I'm almost certain they haven't and don't care.

Also, just a tip: Nvidia's public-facing documentation for a lot of CUDA things is objectively wrong. You have to log in with a developer account (anyone can make one) and get the full manuals, which are some of the worst documentation I've ever read (though technically correct).

Let me know if you have any other questions!

So I spoke to @DataDrake about this in IRC a while back, and he said the reason you hadn't included CUDA was licensing issues, and that you were looking into CUDA in Docker. However, I'm reading the license and I can't find any issue with redistributing it. Programs built to use CUDA (say, Adobe After Effects) are actually forced to include it in the package they distribute.

Not only that, but after reading the official license on CUDA (https://docs.nvidia.com/cuda/eula/index.html#nvidia-cuda-toolkit-license-agreement), I ran into this gem:

"CUDA Licensed Software designed exclusively for use on the Linux or FreeBSD operating systems, or other operating systems derived from the source code to these operating systems, may be copied and redistributed, provided that the object code files thereof are not modified in any way (except for unzipping of compressed files)."

So why can't CUDA and cuDNN versions be added to the third-party software repo?

TL;DR Yay, a new EULA. The answer is still: No.

The section you have called out is intended to give Linux distributions permission to redistribute unmodified versions of CUDA alongside packaged software. It does NOT allow us to redistribute the headers needed to compile software against CUDA.

In the case of Blender for example:

Allowed:

  • Redistribute pre-compiled and unmodified Blender with unmodified Blender-provided CUDA libraries

Not Allowed:

  • Package CUDA and build Blender against it

Further evidence:

2.7. Attachment A
Redistributable Software
In connection with Section 1.2.1.1 of this Agreement, the following CUDA Toolkit files may be distributed with Licensee Applications developed by you, including certain variations of these files that have version number or architecture specific information embedded in the file name - as an example only, for release version 6.0 of the 64-bit Windows software, the file cudart64_60.dll is redistributable.

Note: They explicitly call out the files which are redistributable with Licensee Applications by the developers, not us. Headers are not included.

As for cuDNN, it is under a separate license from CUDA, one that is even more restrictive but has equivalent limitations on redistribution in our situation.

I would also like to call out that this particular section is now a part of the new EULA and seems to indicate legal responsibilities that we have no interest in taking on:

2.5. Audit
During the term of the AGREEMENT and for three (3) years thereafter, you will maintain all usual and proper books and records of account relating to the CUDA Licensed Software provided under the AGREEMENT. During such period and upon written notice to you, NVIDIA or its authorized third party auditors subject to confidentiality obligations will have the right to inspect and audit your Enterprise books and records for the purpose of confirming compliance with the terms of the AGREEMENT. Any such inspection and audit will be conducted during regular business hours and no more frequently than annually unless non-compliance was previously found. If such an inspection and audit reveals a material non-conformance with the terms of the AGREEMENT, then you will pay NVIDIA’s reasonable costs of conducting the inspection and audit. Further, you agree that the party delivering the CUDA Licensed Software to you may collect and disclose to NVIDIA information for NVIDIA to verify your compliance with the terms of the AGREEMENT including (without limitation) information regarding your use of the CUDA Licensed Software.

Fair enough, at least it's still a move in the right direction. Can you please look at my comments on T837 while you're around?

DataDrake claimed this task.

Use nvidia-docker.