
Steam's Proton 5.13-2 does not work, games do not launch
Open, Needs More Info, Public

Description

When Steam Play's Proton version is set to "Proton 5.13-2", games that use it will not launch (basically all non-native Linux games) due to an error with "pressure-vessel".

Possibly related - https://github.com/NixOS/nixpkgs/issues/100655

This impacts Steam both from the repos and installed as a Flatpak. Toggling LSI has no impact. The only thing that makes non-native games function is forcing Proton 5.0-10, which reduces performance and even breaks some games.

When Steam's Proton version is set to "5.13-2", the following errors spam the dmesg log:

[ 9747.296557] traps: pressure-vessel[46474] trap invalid opcode ip:7fbc228bed89 sp:7ffce2834140 error:0 in libc-2.31.so[7fbc228bd000+175000]
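For anyone trying to narrow this down, a minimal sketch of pulling the useful fields out of a trap line like the one above. The regular expression is an assumption based only on this one sample line, and `parse_trap_line` is a hypothetical helper name. The computed offset is relative to the start of the mapping printed in the brackets, which is not necessarily the same as an offset into the library file.

```python
import re

# Pattern derived from the single sample line above (x86 "traps:" format);
# other kernels or architectures may format this message differently.
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<mapping>\S+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap_line(line):
    """Return the process, trap type and faulting offset, or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    start = int(m["start"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "trap": m["trap"],
        "mapping": m["mapping"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - start,
    }
```

For the line above this gives an offset of 0x1d89 into the libc-2.31.so mapping, with trap type "invalid opcode".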

When launching any game with Proton 5.13-2, the following error is shown in Steam's stdout:

pressure-vessel-launch[49688]: Can't connect to peer socket: Could not connect: No such file or directory

Event Timeline

egee-irl created this task. Nov 26 2020, 8:23 PM
serebit added a subscriber: serebit. Edited Nov 26 2020, 10:54 PM

Works on my machine (Proton 5.13-2). 5.13-1 didn't work on my system at first, but that turned out to be an issue with pressure-vessel. Maybe open an issue on the pressure-vessel GitHub repo?

I did some additional testing with GloriousEggroll's custom Proton builds and found:

  • Proton-5.21-GE-1 - Does Not Work (same issue)
  • Proton-5.11-GE-3 - Works without issue
  • Proton-5.9-GE-8 - Works without issue

In addition to:

  • Proton 5.13-2 - Does not work
  • Proton 5.0-10 - Works for some (but not all) games

I suspect the Proton versions built into Steam Play don't like something about Solus but it's mystifying because the Flatpak version of Steam shouldn't be relying on system libs at all, right?

Oh, you're using Flatpak Steam? Maybe try the Solus dist of Steam and see if that changes results.

Oh, you're using Flatpak Steam? Maybe try the Solus dist of Steam and see if that changes results.

I've tried native & Flatpak and both have the same error. I was saying it's weird for the Flatpak version to have the same issue as the native version because it shouldn't rely on native libs but maybe that's my own misunderstanding.

Jacek added a subscriber: Jacek. Nov 28 2020, 12:54 AM

Just a wild guess: what GPU and drivers are you using? That could explain why the newest Proton doesn't work. If you are on Nvidia, maybe try the beta or stable driver instead?

Girtablulu triaged this task as Needs More Info priority. Nov 29 2020, 8:49 AM
Girtablulu edited projects, added Software; removed Lacks Project.
Girtablulu added a subscriber: Girtablulu.

Did you try deactivating LSI?

Did you try deactivating LSI?

I can confirm this issue with our Steam package. Changing settings in LSI (and restarting Steam to make them apply) has no effect.

The NixOS issue linked in the OP leads to some interesting discussions (that mostly go over my head) like this one and this one

Oh boy, looks like some combination with our glibc that needs certain linking? I'd like a comment from core on whether we can fix this on our side or not :)

Is there any update on this issue? It seems that even with more recent versions of 5.13 it does not work on Solus... @Girtablulu

So it seems like this issue is caused by the --with-priv-mode=setuid option in our bubblewrap package.
Running chmod 0755 /usr/bin/bwrap (which clears the setuid bit) fixes it, and games launch as expected with Proton 5.13.
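
To check whether a given system is affected before changing permissions, a minimal sketch along these lines could be used. The helper names are hypothetical; the path /usr/bin/bwrap is the one mentioned above and may differ on other setups.

```python
import os
import stat


def has_setuid_bit(mode):
    """Return True if a st_mode value has the setuid bit set."""
    return bool(mode & stat.S_ISUID)


def is_setuid_root(path):
    """Return True if the file at path is owned by root and has the setuid bit."""
    st = os.stat(path)
    return has_setuid_bit(st.st_mode) and st.st_uid == 0


if __name__ == "__main__":
    bwrap = "/usr/bin/bwrap"  # path taken from the comment above
    if os.path.exists(bwrap):
        print(bwrap, "setuid root:", is_setuid_root(bwrap))
    else:
        print(bwrap, "is not installed")
```

A mode of 4755 reports True and 0755 reports False, which matches the effect of the chmod workaround.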

See this issue on the Steam Runtime bug tracker for details.

The setuid option was added to bubblewrap with D1217. The differential doesn't mention the specific reason for including it, and flatpak also seems to run fine so far without it. Maybe someone who knows about this stuff can check if we actually need that option.

It existed before the referenced differential; that was just the flag being moved.

smcv added a subscriber: smcv. Mar 18 2021, 2:00 PM

I've tried native & Flatpak and both have the same error

They have the same high-level failure mode (game doesn't start) but for entirely different reasons.

The feature being used here is that Steam can start individual games in their own container environment, using a tool called pressure-vessel. For native Linux games, this is an opt-in feature intended to get each game to run in a predictable environment, which is particularly helpful for games that make assumptions that were true in 2015 but are no longer valid (this is described as "Steam Linux Runtime" in the UI). For Windows games that run via a compatibility layer, a newer version of the container runtime is automatically used whenever you have Proton 5.13 or newer as your compatibility layer, because those Proton builds require a newer library stack than the one Steam has traditionally provided.

pressure-vessel uses a system copy of bwrap if available, instead of its own copy, because some kernels require a setuid bwrap (which Steam cannot provide, because it is an unprivileged user process).
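
The preference order described above can be sketched roughly as follows. This is an illustration only, not pressure-vessel's actual code: `choose_bwrap`, its parameters and the example bundled path are all hypothetical.

```python
import shutil


def choose_bwrap(bundled_path, system_names=("bwrap",), which=shutil.which):
    """Prefer a system copy of bwrap found on PATH, else fall back to the bundled one.

    `which` is injectable so the lookup can be exercised without touching
    the real PATH.
    """
    for name in system_names:
        found = which(name)
        if found:
            return found
    return bundled_path


if __name__ == "__main__":
    # The bundled path here is a made-up placeholder.
    print(choose_bwrap("/path/to/bundled/bwrap"))
```

With this ordering, a broken system copy is picked up even when the bundled copy would have worked, which is the situation described in this task.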

For the version of Steam running directly on Solus without a container, according to @Staudey's report in the Steam Runtime bug tracker it works fine if bwrap is not setuid, but something crashes (with no error message printed, and possibly with SIGILL) if bwrap is setuid. The workaround is to not make bwrap setuid. This is meant to work, and on Debian 10 (which genuinely requires a setuid bwrap), it does - but apparently something is sufficiently different on Solus to cause a crash. This is probably a bug in either bwrap or pressure-vessel, but at the moment we can't tell which. If someone can get details of which process is crashing, and ideally also a backtrace, then that would be very helpful.

bwrap does not need to be setuid unless your kernel requires it. The setuid mode of installation for bwrap disables some features for security reasons, and is intended to be used with kernels that do not support unprivileged creation of user namespaces, such as the distro kernels in Debian 10 or older and the non-default linux-hardened kernel in Arch Linux. This is a security trade-off in those distributions: disabling unprivileged creation of user namespaces reduces the kernel's attack surface and prevents exploitation of some kernel bugs (such as CVE-2016-3135); but it makes it necessary for bwrap to be setuid, which means that bugs in bwrap could be used by a local attacker to get root privilege escalation (such as CVE-2020-5291).

As far as I can see from the packaging git repo, Solus' packaged kernels (both -lts and -current) have full support for unprivileged creation of user namespaces, similar to the distro kernels in Fedora, Ubuntu and Debian 11, and Arch Linux's default kernel. For kernels like these, a non-setuid bwrap is preferred: it offers more features than the setuid bwrap, and prevents bwrap bugs like CVE-2020-5291 from being used for root privilege escalation.
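
A rough way to check a running kernel for this is sketched below. It is a heuristic under stated assumptions: it only looks at the `kernel.unprivileged_userns_clone` sysctl (a distro patch, absent on many kernels) and the mainline `user.max_user_namespaces` sysctl, and it will not notice restrictions imposed by other mechanisms. The function names are hypothetical.

```python
from pathlib import Path


def read_int(path):
    """Read an integer from a sysctl file, or return None if absent or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def unprivileged_userns_allowed(read=read_int):
    """Heuristic: False if either known sysctl disables unprivileged user namespaces."""
    # Distro-specific switch; a missing file means the patch is not present.
    if read("/proc/sys/kernel/unprivileged_userns_clone") == 0:
        return False
    # Mainline limit; 0 means no user namespaces can be created at all.
    if read("/proc/sys/user/max_user_namespaces") == 0:
        return False
    return True


if __name__ == "__main__":
    print("unprivileged user namespaces allowed:", unprivileged_userns_allowed())
```

If this prints True, the kernel is in the same category as the Fedora, Ubuntu and Debian 11 kernels mentioned above, and a non-setuid bwrap should be usable.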

Having a kernel that allows unprivileged user namespaces (like Debian 11), combined with a setuid bwrap (like Debian 10), maximises the attack surface: it gives attackers access to more bugs in the kernel and more bugs in bwrap, and in fact there have been several bwrap CVEs that were really only exploitable in this situation. I would not recommend this configuration.
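
The four combinations discussed above can be summarised as a small lookup. This only restates the argument in this comment; the function name and the wording of the returned strings are illustrative.

```python
def assess_bwrap_setup(userns_allowed, bwrap_setuid):
    """Summarise the kernel / bwrap combinations discussed in this comment."""
    if userns_allowed and not bwrap_setuid:
        return "preferred: more bwrap features, no setuid attack surface"
    if not userns_allowed and bwrap_setuid:
        return "necessary trade-off: smaller kernel attack surface, setuid bwrap required"
    if userns_allowed and bwrap_setuid:
        return "not recommended: maximises attack surface in both kernel and bwrap"
    return "non-functional: bwrap cannot create its namespaces"


if __name__ == "__main__":
    for userns in (True, False):
        for setuid in (True, False):
            print(userns, setuid, "->", assess_bwrap_setup(userns, setuid))
```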

(Obviously, the kernel developers and the bwrap developers both fix known vulnerabilities like CVE-2016-3135 and CVE-2020-5291 as soon as they become known - the point here is to defend you against possible bugs similar to those that are not known about yet.)


Meanwhile, the Flatpak app can't launch new containers because it's already in a container. This is known, and not a Solus-specific issue (Steam Runtime bug, Flathub bug). I've been working on a solution for a while, but it requires a new version of Flatpak with several new features, which are currently waiting for review. The only thing Solus needs to do to benefit from this is to keep your packaged version of Flatpak up to date. Because the new features add new API, I would strongly discourage patching them in as a distro-specific change before they have been reviewed and merged upstream.

The new Flatpak features that will make this work with the Flatpak version of Steam will require that Flatpak is using a copy of bwrap that is not setuid. As far as I can tell from Solus' packaging repository, this is already the case - it's using a Flatpak-specific bundled copy, installed as flatpak-bwrap, instead of the system copy.

It existed before the referenced differential; that was just the flag being moved.

Oh yeah, never mind, it was already part of the initial package inclusion.

serebit added a comment. Edited Mar 20 2021, 3:37 PM

Any update on this, @egee-irl / @Staudey / @ekianjo? There have been a few iterations on pressure-vessel and Proton since this issue was created. If the Flatpak version of Steam can't launch Proton 5.13, then that's an upstream issue according to @smcv (who is apparently from Collabora! Neat), but if our native Steam package can't launch Proton 5.13 on some systems then this warrants further investigation. Like smcv said, a backtrace would be helpful.

Edit: Just saw Staudey's issue on steam-runtime. We could disable --with-priv-mode=setuid in bwrap to resolve this, or we could wait for the issue to be resolved upstream.

@serebit I'll have to see if I can get a backtrace or something, but maybe I will need someone to walk me through it in the end :-)

JoshStrobl added a comment. Edited Mar 20 2021, 8:04 PM

@serebit No, instead @ekianjo would rather write an article claiming that we are treating "this as a low priority task" than be helpful, recognize that it was triaged as Needs More Info (not low), and provide any additional information that could be required or useful. Fortunately @smcv was kind enough and gracious with his time to create an account and provide a comprehensive comment, identifying it as a broader issue not limited to Solus. Given this comment occurred during a sync that I actually did on a Thursday and not a Friday so I could get cgroups v2 support in afterwards, including rebuilds of the kernel, I didn't see it -- not that I have my eyes on every issue of the tracker 24/7 (reminder: everyone here is a volunteer; I may have a Patreon but I am still not able to do full-time OSS development). So instead of reviewing @serebit's patch that he was kind enough to provide, I'm having to spend time replying to an article someone felt they should write instead of providing useful information and doing broader investigative work. Because apparently clicks on an article are more important than contributing in a meaningful manner to a project you use.

@serebit I'll be taking a look at your patch momentarily. @smcv, I want to thank you for taking the time out of your day to write this comprehensive comment, as well as being the badass you are to actively contribute to open source software, improving the viability of Linux on the desktop for countless people around the world.

Source of article: https://web.archive.org/web/20210319141820/https://boilingsteam.com/proton-and-solus-an-unstable-mix-in-2021/

ekianjo added a comment. Edited Mar 21 2021, 3:39 AM

@JoshStrobl always love how you run things around here from your pedestal, making it sound like I was being unhelpful when I was not even the originator of the ticket, and yet I am the one who has to provide more information in the first place? Quite amusing.

I was even asking for an update on this issue and my message received no answer for a month.

This ticket has been sitting around since November 2020, has been confirmed by several people, and there was no activity on it until a few days ago and yet you claim that this was NOT treated as "low priority" - well I certainly hope to never have to deal with a bug with low priority in the future because that sounds like it will take years to get it through. It almost seems like none of the Solus devs are doing any serious gaming with Steam otherwise they would have noticed this by themselves.

Because apparently clicks on an article are more important than contributing in a meaningful manner to a project you use.

The article actually worked to make this issue a bit more apparent despite the fact it's been broken and no maintainer took action on this for... 1,2,3+ months. Sorry if that's what it takes to get your attention.

Hey, just chiming in, I've mentioned myself in the thread that I wasn't having the issue when this was created. I still haven't had this issue. Given the activity from smcv and Staudey over the last few days, both here and on the steam-runtime GitHub, it's safe to assume that a workaround would've been pushed by next stable sync even if your article hadn't been published. In addition, you probably shouldn't take too much credit, given that the only reason any of us even saw your article was because someone with far more reach to Linux users than you (who doesn't even *use* Solus) pointed it out to me in another forum, calling it out for how ridiculous and childish it was.

Note in the steam-runtime issue that there are no obvious error messages produced, and the only way we would have been able to deduce that it's an issue with bwrap at all is through collaboration with people who understand how pressure-vessel works. As you can imagine, that's not very many people. smcv made his comment two days ago, and you wrote an article *a day later* mocking Solus for not having worked around the issue yet. You could've just as easily done what I did, looked at the steam-runtime issue, seen the suggested workaround, and submitted a diff for it. You didn't. You chose to incite drama where there was no need for it.

You're in a space where collaboration is everything. Solus isn't a product of just two people, it's a product of tens working many volunteer hours to keep things running, with Josh and Bryan at the helm. I'm in no position to make demands, of course, but I can ask you to please be mindful of that. You already use Solus, you've said it yourself. I'd rather you give us a hand than try to tear us down.

smcv made his comment two days ago, and you wrote an article *a day later* mocking Solus for not having worked around the issue yet

Please point me to the specific part of the article where I am mocking Solus. I'm curious.

You chose to incite drama where there was no need for it.

There's no drama, the whole article is purely factual (linking to everything I pointed out as source, by the way, this is not just some kind of random opinion detached from any kind of reality). Yeah, I did voice some concerns at the very end. And... apparently this is bad?

You already use Solus, you've said it yourself. I'd rather you give us a hand than try to tear us down.

Let me quote the end of the article: "Here’s to hoping a fix will come soon."

Does it sound like I am trying to tear you guys down? I could jump to another distro the next day if I wanted to, but guess what, I choose to continue using Solus, while I happen to also have issues with how things work or do not work.

@ekianjo you're not exactly helping the situation by escalating it to public shaming just because you have a problem with things taking longer than you expect to be fixed.

and yet I am the one to have to provide more information in the first place

Josh didn't ask you to provide more information. He just expected you to recognize that this is still an ongoing issue and that we need more information to find the root-cause. serebit only referenced you because generally people who are running into issues like this have at least tried to find the source of the problem and might have more information to share.

I was even asking for an update on this issue and my message received no answer for a month.

So maybe realize that there are many moving pieces in Solus and less than 20 regular contributors and zero full-time staff, and cut us a little slack. Josh has a business and other projects to worry about. I am in my last year of a PhD. No one else has stepped up to dig into this until now. Stuff happens.

This ticket has been sitting around since November 2020, has been confirmed by several people, and there was no activity on it until a few days ago and yet you claim that this was NOT treated as "low priority" - well I certainly hope to never have to deal with a bug with low priority in the future because that sounds like it will take years to get it through.

Comparatively, having only the latest version of Proton be broken is quite a bit lower priority than most of the issues we deal with on a regular basis. And quite often breakage with Steam is the result of some upstream problem that gets fixed before we even get to it.

It almost seems like none of the Solus devs are doing any serious gaming with Steam otherwise they would have noticed this by themselves.

Apparently not "serious" enough to warrant being on the latest release of Proton, no. But we can and do use Proton frequently. And again, it's not like we always have time for the latest and greatest games that might require that, anyways. I personally rarely get to play anything the year it is released. Such is life.

The article actually worked to make this issue a bit more apparent despite the fact it's been broken and no maintainer took action on this for... 1,2,3+ months. Sorry if that's what it takes to get your attention.

If you don't see this kind of click-bait public shaming as a problem, you are the problem. You didn't even attempt to ask about it again here or to contact anyone through any of our other support channels. Hell, even a tweet asking about it would have been more welcome than this. Instead you tried exactly once, waited until it annoyed you enough, then you turned it into a bigger thing than it is to get your way. If you can't see how childish and entitled you are acting about this, I'm at a loss for words.

This will get fixed, just not on your schedule.

@serebit Could it be that bubblewrap is/was not installed on your machine, but flatpak is? In that case the Steam runtime will use flatpak-bwrap instead, which isn't setuid and therefore works (as I saw in testing when replacing bwrap by a script and forgetting to set chmod +x)

@serebit Could it be that bubblewrap is/was not installed on your machine, but flatpak is? In that case the Steam runtime will use flatpak-bwrap instead, which isn't setuid and therefore works (as I saw in testing when replacing bwrap by a script and forgetting to set chmod +x)

That... makes a lot of sense.

smcv added a comment. Mar 23 2021, 11:26 AM

Note in the steam-runtime issue that there are no obvious error messages produced, and the only way we would have been able to deduce that it's an issue with bwrap at all is through collaboration with people who understand how pressure-vessel works

We'd like to improve on that, but Solus' setuid bwrap was apparently crashing with no error message on stderr, so the only thing we'd be able to get out of it is the exit status. The wrapper shell script that sets this stuff up should have at least logged the exit status, but currently doesn't (I'll try to fix that for a future version).

If the setuid bwrap was completely broken, then we should have detected that and failed with a better error message, because we do a quick check early in the process - bwrap --bind / / true or something like that. Unfortunately, apparently that superficial check does work with Solus' setuid bwrap, and it's only the full command-line that we execve() later on that doesn't.
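
As an illustration of what logging the exit status could look like, here is a minimal sketch that runs a command and reports whether it exited normally or was killed by a signal such as SIGILL. It is not the Steam Runtime's wrapper; the function names are hypothetical, and the bwrap arguments in the usage example are the approximate ones quoted above ("or something like that").

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # subprocess reports death by signal N as a return code of -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Run argv and return (description of exit, captured stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return describe_exit(proc.returncode), proc.stderr


if __name__ == "__main__":
    # Approximation of the quick check described above; requires bwrap on PATH.
    try:
        print(run_and_report(["bwrap", "--bind", "/", "/", "true"]))
    except FileNotFoundError:
        print("bwrap not found on PATH")
```

Even when nothing is printed on stderr, a report such as "killed by signal 4 (SIGILL)" would at least distinguish a crash from an ordinary failure.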

With my bubblewrap contributor hat on, I'd still be interested in seeing a backtrace from the setuid bwrap.

One thing pressure-vessel might do to improve on this in future would be to try to use its bundled/known-good copy of a non-setuid bwrap first, and only go looking for a system copy if the bundled copy fails (as it will on more restrictive kernels, e.g. RHEL 7 and Debian 10). With the current configuration of Solus kernels, that logic would result in using the bundled copy and not the system copy. The main reason I haven't done this is that it'd be very easy for that to regress on systems like Debian 10 where the bundled copy can't be used.
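
The alternative ordering described above could look roughly like this sketch. It is hypothetical and not pressure-vessel code: `first_working` and `probe_runs_true` are made-up names, and the probe arguments are the approximate quick check mentioned earlier in this comment.

```python
import subprocess


def probe_runs_true(bwrap_path):
    """Return True if this copy of bwrap can run a trivial command."""
    try:
        proc = subprocess.run(
            [bwrap_path, "--bind", "/", "/", "true"],
            capture_output=True,
        )
    except OSError:
        # Missing or non-executable binary counts as a failed probe.
        return False
    return proc.returncode == 0


def first_working(candidates, probe=probe_runs_true):
    """Return the first candidate that passes the probe, or None."""
    for path in candidates:
        if path and probe(path):
            return path
    return None
```

Calling `first_working([bundled, system])` tries the bundled copy first and only falls back to the system copy if the probe fails. As noted above, a superficial probe like this one passed with Solus' setuid bwrap, so this ordering would only help because the bundled non-setuid copy is tried before it.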

From https://dev.getsol.us/R3479:a12cb4e23fae3309c8abf7c207ae5c800336e3ee:

This is necessary to work around T9406, which is an upstream issue with Valve's pressure-vessel tool.

I don't think this is really accurate. I do think the change is correct, but the commit message suggests that you might have made the right change for the wrong reasons.

pressure-vessel not working in Flatpak is an upstream issue (with changes to both pressure-vessel and Flatpak required to fix it), but for the failure seen here in a non-Flatpak Steam, I don't think pressure-vessel is doing anything wrong. We run the system copy of bwrap with an argv[] that seems reasonable, and if it isn't setuid then that apparently works fine, but if it's setuid then instead it crashes with SIGILL. I don't know why it does that, but it seems like a bug in either bwrap or Solus' packaging of bwrap.

Again, I would strongly recommend not making the system copy of bwrap setuid if you don't need to. I maintain the bubblewrap package in Debian (where bwrap historically needed to be setuid), and for Debian 11 I've stopped making it setuid, after first arranging with Debian's kernel maintainer to lift the extra restrictions on creating user namespaces so that bubblewrap can still work correctly.

The only additional feature you get from bwrap being setuid is "can run successfully under e.g. Debian 10/RHEL 7 kernels", which would maybe be interesting if Solus is frequently run in a chroot or privileged container on a Debian/RHEL system, but is not interesting if Solus is normally run as a complete OS in its own right. I don't use Solus myself, but I get the impression that running it as a complete desktop OS is its normal use-case?

The "costs" of making bwrap setuid, and the reasons I wanted to stop it from being setuid on Debian, are:

  • it disables some features (which pressure-vessel deliberately avoids relying on, for now, so that we can still run on Debian 10, but other bwrap users including Flatpak do want those features)
  • it's a security risk, exposing Solus to bubblewrap vulnerabilities like CVE-2020-5291 and CVE-2016-8659 that could have been avoided by making it non-setuid (obviously those two have been fixed long ago, but there might be more like them, and making bwrap non-setuid would turn any similar vulnerabilities into a non-issue on Solus)

I think the setup with a non-setuid bwrap, as used in Fedora and Ubuntu - and now Debian 11 and Solus too - is the best one for the future.

I don't think this is really accurate. I do think the change is correct, but the commit message suggests that you might have made the right change for the wrong reasons.

Noted all, thanks for the clarification! If you provide test args for pressure-vessel-wrap I'm sure @Staudey would be willing to provide a backtrace (reminder: run gdb -ex run --args ./a.out arg1 arg2 ... and then enter bt at the gdb prompt after the crash to generate the backtrace)