Page MenuHomeSolus

libEGL_nvidia.so.1 should really have been libEGL_nvidia.so.0
ClosedPublic

Authored by xulongwu4 on Dec 1 2018, 11:06 PM.

Details

Summary

The naming scheme of the shared libraries in nvidia-glx-driver-common is not correct.
This causes runtime error when running nvidia-docker.
This patch uses the SONAME reported by objdump to determine the library name.

Test Plan

Tested with nvidia-docker (D4518).

Diff Detail

Repository
R2210 nvidia-glx-driver
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
xulongwu4 created this revision.Dec 1 2018, 11:06 PM
xulongwu4 requested review of this revision.Dec 1 2018, 11:06 PM
xulongwu4 edited the test plan for this revision. (Show Details)Dec 1 2018, 11:08 PM
DataDrake accepted this revision.Dec 11 2018, 3:06 AM
DataDrake added a subscriber: DataDrake.

LGTM. Thanks!

This revision is now accepted and ready to land.Dec 11 2018, 3:06 AM
DataDrake requested changes to this revision.Dec 11 2018, 3:12 AM

Needs a rebuild

This revision now requires changes to proceed.Dec 11 2018, 3:12 AM

Is there anything I still need to do for this patch? @DataDrake

the rest is fine, it just needs rebuild because there were new releases

xulongwu4 updated this revision to Diff 11357.Dec 11 2018, 4:37 AM

Fix release number

You can't just edit it that way, it needs to be rebased on the latest commit from the repo.

xulongwu4 updated this revision to Diff 11360.Dec 11 2018, 4:43 AM

Rebuild the package

I'm trying to be clear here, but you aren't following. You need to rebase the patch on the latest commit. That isn't fixed with a rebuild. You need to do it with git.

So when you do figure that out, you also need to fix the dozen or so missing libraries. that's not ok.

xulongwu4 updated this revision to Diff 11361.Dec 11 2018, 4:52 AM

Rebase the patch on latest commit

JoshStrobl added inline comments.
package.yml
134–137

Why are you doing all this and not just changing the .1 to .0 in the symlink call?

So when you do figure that out, you also need to fix the dozen or so missing libraries. that's not ok.

The libraries aren't really missing here. What are missing are some symlinks.

I modified the redo_libs function in the package.yml file. So now it creates the symlink based on the reported soname of the libraries. For some libraries, e.g. libnvidia-gtk3.so.410.78, its soname is also libnvidia-gtk3.so.410.78. Thus the function won't create the symlink libnvidia-gtk3.so.1 for it. While with the original redo_libs function this symlink would have been created. There are about 10 libraries whose sonames match their current names, and the symlinks to those are "missing" in the new build. However, they should not impact anything.

Or you could just change line 142 to reflect .0 instead of .1, it would work the same and you wouldn't go breaking my symlinks. :)

xulongwu4 added inline comments.Dec 11 2018, 5:00 AM
package.yml
134–137

The only library needs to be changed is libEGL_nvidia.so. I tried to find a systematic way to determine whether we should call it libEGL_nvidia.so.0 or libEGL_nvidia.so.1 rather than random guessing. This is also the way that other programs (e.g. nvidia-docker determines which library to link to during runtime). That is why I am using objdump here.

Or you could just change line 142 to reflect .0 instead of .1, it would work the same and you wouldn't go breaking my symlinks. :)

The libEGL_nvidia.so library is the only one that needs to be called libEGL_nvidia.so.0 instead of libEGL_nvidia.so.1. Most other libraries have a soname in the form of lib().so.1. That is one of the reasons why I didn't simply modify line 142. That would have made the name of the symlink for most other libraries incorrect.

JoshStrobl added inline comments.Dec 11 2018, 5:06 AM
package.yml
134–137

Then why are we changing everything else and not just doing a single symlink?

Sooo. if most of the others are .1, then why does you fix remove the their symlinks? Do you not see the deletions?

xulongwu4 added inline comments.Dec 11 2018, 5:16 AM
package.yml
134–137

With the current patch:

  • the libEGL_nvidia.so.0 has the correct name and all the other libraries are not affected (they are still named as lib().so.1 as can be seen from the xml file).
  • some of the symlinks are not created now. This would only happen if the soname reported by objdump is the same as its current name (e.g., the soname of libnvidia-glsi.so.410.78 is just libnvidia-glsi.so.410.78 as reported by objdump). But I feel these symlinks are not needed anyway. Programs look for dynamic libraries based on their soname. If we already have libnvidia-glsi.so and libnvidia-glsi.so.410.78, we should not need libnvidia-glsi.so.1 any more (again because its soname is libnvidia-glsi.so.410.78 rather than libnvidia-glsi.so.1).

I agree that we can add a few lines to fix libEGL_nvidia.so. But I don't know whether NVIDAI will give it another soname so that it might be called libnvidia-glsi.so.1 in the future. At that time, we need to fix the patch again. Someone always needs to check the soname of that library that way.

If you believe it is safer to simply change the name for libEGL_nvidia.so.0, I will also be happy to do that.

xulongwu4 added a comment.EditedDec 11 2018, 5:34 AM

Sooo. if most of the others are .1, then why does you fix remove the their symlinks? Do you not see the deletions?

Yeah I saw those deletions.

I might not be clear enough earlier. Let me give a bit more details here.

The NVIDIA driver includes a lot of binary libraries, such as libEGL_nvidia.so.410.78, libglxserver_nvidia.so.410.78, libcuda.so.410.78, etc. The redo_libs function creates symlinks for these libraries.

In the original repo, almost all libraries will have a symlink named lib().so.1, such as libEGL_nvidia.so.1, libglxserver_nvidia.so.1, libcuda.so.1.

When we tested the nvidia-docker package, I found that libEGL_nvida.so.1 should really be named as libEGL_nvidia.so.0 because that is what the nvidia-docer program looks for, and it looks for that name because it is the elf name of libEGL_nvidia.so.410.78.

Now I propose to use objdump to explicitly check the soname of all libraries shipped in NVIDIA driver package.

I found there are three situations.

For libEGL_nvidia.so.410.78, its soname is libEGL_nvidia.so.0, thus I create a symlink named libEGL_nvidia.so.0 for it.

For libcuda.so.410.78 (and many others), its soname is libcuda.so.1, thus a symlink named libcuda.so.1 is appropriate.

For libglxserver_nvidia.so.410.78 (and a few others), the soname is also libglxserver_nvidia.so.410.78, thus no symlink is created for its soname. This causes the apparent "deletions" in the xml file. My thought is that, since its soname matches its current name, there is really no point in creating some symlinnks called lib().so.1 for these libraries. In principle, no programs should look for these libraries by the name lib().so.1 either.

sunnyflunk added a subscriber: sunnyflunk.EditedDec 11 2018, 5:55 AM

For libglxserver_nvidia.so.410.78 (and a few others), the soname is also libglxserver_nvidia.so.410.78, thus no symlink is created for its soname. This causes the apparent "deletions" in the xml file. My thought is that, since its soname matches its current name, there is really no point in creating some symlinnks called lib().so.1 for these libraries. In principle, no programs should look for these libraries by the name lib().so.1 either.

In principle (as long as nvidia hasn't screwed up their libraries, so would need testing), this is a much more sane, logical and clever approach. It should also allow cleanup of the package.yml by removing this section, as it would create the expected links, rather than working around them not being .so.1. Probably should be done with all the nvidia packages to ensure they have the right links and aren't prone to breaking when things change.

# Get weird guys in order first
ln -sv libGLESv2_nvidia.so.${version} $installdir/%libdir%/libGLESv2_nvidia.so.2
ln -sv libGLESv2_nvidia.so.${version} $installdir/usr/lib32/libGLESv2_nvidia.so.2
ln -sv libGLX_nvidia.so.${version} $installdir/%libdir%/libGLX_nvidia.so.0
ln -sv libGLX_nvidia.so.${version} $installdir/usr/lib32/libGLX_nvidia.so.0
xulongwu4 updated this revision to Diff 11367.Dec 11 2018, 2:43 PM

Remove the # Get weird guys in order first part in package.yml.

In principle (as long as nvidia hasn't screwed up their libraries, so would need testing), this is a much more sane, logical and clever approach. It should also allow cleanup of the package.yml by removing this section, as it would create the expected links, rather than working around them not being .so.1.

Yes that is the reason why I am doing it with objdump. Now I removed the following lines in the original package.yml file:

# Get weird guys in order first
ln -sv libGLESv2_nvidia.so.${version} $installdir/%libdir%/libGLESv2_nvidia.so.2
ln -sv libGLESv2_nvidia.so.${version} $installdir/usr/lib32/libGLESv2_nvidia.so.2
ln -sv libGLX_nvidia.so.${version} $installdir/%libdir%/libGLX_nvidia.so.0
ln -sv libGLX_nvidia.so.${version} $installdir/usr/lib32/libGLX_nvidia.so.0

and from the xml file we see that these "special" libraries are taken care of correctly by the redo_libs function and objdump. I agree that this is a much more reliable method.

Probably should be done with all the nvidia packages to ensure they have the right links and aren't prone to breaking when things change.

This is necessary and I planned to do it for all nvidia driver packages if this patch get approved.

@DataDrake @JoshStrobl If you are still concerned with the deleted symlinks (which happens when the elf name of the library matches its current name), I can add them back.

DataDrake accepted this revision.Dec 17 2018, 12:46 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Dec 17 2018, 12:46 AM
DataDrake requested changes to this revision.Dec 17 2018, 12:48 AM

Needs a rebase. Please wait to update until nvida-glx-driver is rebuilt for 4.19.9.

This revision now requires changes to proceed.Dec 17 2018, 12:48 AM
DataDrake accepted this revision.Dec 23 2018, 2:16 PM

LGTM. Thanks!

This revision is now accepted and ready to land.Dec 23 2018, 2:16 PM
This revision was automatically updated to reflect the committed changes.