The naming scheme of the shared libraries in nvidia-glx-driver-common is not correct.
This causes runtime error when running nvidia-docker.
This patch uses the SONAME reported by objdump to determine the library name.
Details
- Reviewers
DataDrake - Group Reviewers
Triage Team - Commits
- R2210:8b623e0c78f6: libEGL_nvidia.so.1 should really have been libEGL_nvidia.so.0
Tested with nvidia-docker (D4518).
Diff Detail
- Repository
- R2210 nvidia-glx-driver
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
You can't just edit it that way, it needs to be rebased on the latest commit from the repo.
I'm trying to be clear here, but you aren't following. You need to rebase the patch on the latest commit. That isn't fixed with a rebuild. You need to do it with git.
So when you do figure that out, you also need to fix the dozen or so missing libraries. that's not ok.
| package.yml | ||
|---|---|---|
| 134–137 | Why are you doing all this and not just changing the .1 to .0 in the symlink call? | |
The libraries aren't really missing here. What are missing are some symlinks.
I modified the redo_libs function in the package.yml file. So now it creates the symlink based on the reported soname of the libraries. For some libraries, e.g. libnvidia-gtk3.so.410.78, its soname is also libnvidia-gtk3.so.410.78. Thus the function won't create the symlink libnvidia-gtk3.so.1 for it. While with the original redo_libs function this symlink would have been created. There are about 10 libraries whose sonames match their current names, and the symlinks to those are "missing" in the new build. However, they should not impact anything.
Or you could just change line 142 to reflect .0 instead of .1, it would work the same and you wouldn't go breaking my symlinks. :)
| package.yml | ||
|---|---|---|
| 134–137 | The only library needs to be changed is libEGL_nvidia.so. I tried to find a systematic way to determine whether we should call it libEGL_nvidia.so.0 or libEGL_nvidia.so.1 rather than random guessing. This is also the way that other programs (e.g. nvidia-docker determines which library to link to during runtime). That is why I am using objdump here. | |
The libEGL_nvidia.so library is the only one that needs to be called libEGL_nvidia.so.0 instead of libEGL_nvidia.so.1. Most other libraries have a soname in the form of lib().so.1. That is one of the reasons why I didn't simply modify line 142. That would have made the name of the symlink for most other libraries incorrect.
| package.yml | ||
|---|---|---|
| 134–137 | Then why are we changing everything else and not just doing a single symlink? | |
Sooo. if most of the others are .1, then why does you fix remove the their symlinks? Do you not see the deletions?
| package.yml | ||
|---|---|---|
| 134–137 | With the current patch:
I agree that we can add a few lines to fix libEGL_nvidia.so. But I don't know whether NVIDAI will give it another soname so that it might be called libnvidia-glsi.so.1 in the future. At that time, we need to fix the patch again. Someone always needs to check the soname of that library that way. If you believe it is safer to simply change the name for libEGL_nvidia.so.0, I will also be happy to do that. | |
Yeah I saw those deletions.
I might not be clear enough earlier. Let me give a bit more details here.
The NVIDIA driver includes a lot of binary libraries, such as libEGL_nvidia.so.410.78, libglxserver_nvidia.so.410.78, libcuda.so.410.78, etc. The redo_libs function creates symlinks for these libraries.
In the original repo, almost all libraries will have a symlink named lib().so.1, such as libEGL_nvidia.so.1, libglxserver_nvidia.so.1, libcuda.so.1.
When we tested the nvidia-docker package, I found that libEGL_nvida.so.1 should really be named as libEGL_nvidia.so.0 because that is what the nvidia-docer program looks for, and it looks for that name because it is the elf name of libEGL_nvidia.so.410.78.
Now I propose to use objdump to explicitly check the soname of all libraries shipped in NVIDIA driver package.
I found there are three situations.
For libEGL_nvidia.so.410.78, its soname is libEGL_nvidia.so.0, thus I create a symlink named libEGL_nvidia.so.0 for it.
For libcuda.so.410.78 (and many others), its soname is libcuda.so.1, thus a symlink named libcuda.so.1 is appropriate.
For libglxserver_nvidia.so.410.78 (and a few others), the soname is also libglxserver_nvidia.so.410.78, thus no symlink is created for its soname. This causes the apparent "deletions" in the xml file. My thought is that, since its soname matches its current name, there is really no point in creating some symlinnks called lib().so.1 for these libraries. In principle, no programs should look for these libraries by the name lib().so.1 either.
In principle (as long as nvidia hasn't screwed up their libraries, so would need testing), this is a much more sane, logical and clever approach. It should also allow cleanup of the package.yml by removing this section, as it would create the expected links, rather than working around them not being .so.1. Probably should be done with all the nvidia packages to ensure they have the right links and aren't prone to breaking when things change.
# Get weird guys in order first
ln -sv libGLESv2_nvidia.so.${version} $installdir/%libdir%/libGLESv2_nvidia.so.2
ln -sv libGLESv2_nvidia.so.${version} $installdir/usr/lib32/libGLESv2_nvidia.so.2
ln -sv libGLX_nvidia.so.${version} $installdir/%libdir%/libGLX_nvidia.so.0
ln -sv libGLX_nvidia.so.${version} $installdir/usr/lib32/libGLX_nvidia.so.0Yes that is the reason why I am doing it with objdump. Now I removed the following lines in the original package.yml file:
# Get weird guys in order first
ln -sv libGLESv2_nvidia.so.${version} $installdir/%libdir%/libGLESv2_nvidia.so.2
ln -sv libGLESv2_nvidia.so.${version} $installdir/usr/lib32/libGLESv2_nvidia.so.2
ln -sv libGLX_nvidia.so.${version} $installdir/%libdir%/libGLX_nvidia.so.0
ln -sv libGLX_nvidia.so.${version} $installdir/usr/lib32/libGLX_nvidia.so.0and from the xml file we see that these "special" libraries are taken care of correctly by the redo_libs function and objdump. I agree that this is a much more reliable method.
This is necessary and I planned to do it for all nvidia driver packages if this patch get approved.
@DataDrake @JoshStrobl If you are still concerned with the deleted symlinks (which happens when the elf name of the library matches its current name), I can add them back.