
Xorg sporadically hangs
Open, Needs More Info, Public

Description

Heyo,

It seems like every so often (once a week or two) my computer freezes up. It's happened a fair few times in total now, and it seems to be a mostly visual issue. Audio still plays, but the screens are completely frozen. I can ssh in (which I have done), and from that, it seems to be an Xorg issue.

Unfortunately, restarting lightdm doesn't seem to work, and neither does systemctl reboot. Even though the ssh session terminates, my screens keep displaying the frozen picture. The only thing that seems to work is a hard power cycle, which isn't good.

╰> ssh tranquillity
Last login: Wed Jan 15 11:09:52 2020 from 130.95.254.203
╭─ tec@tranquillity  ~                                                                                                                                                                    12:10
╰> htop
╭─ tec@tranquillity  ~                                                                                                                                                            37.39  12:11
╰> ps -aux | grep "xorg"
root      1008  1.2  2.4 826124 395948 tty7    Rsl+ Jan11  82:59 /usr/lib64/xorg-server/Xorg :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
tec      29779  0.0  0.0   3388   876 pts/3    S+   12:11   0:00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn xorg
╭─ tec@tranquillity  ~                                                                                                                                                             0.03  12:11
╰> sudo systemctl restart lightdm
Password:


^C
╭─ tec@tranquillity  ~                                                                                                                                                SIGINT(2) ↵  16.20  12:12
╰> sudo systemctl status lightdm
* lightdm.service - Display Manager
   Loaded: loaded (/usr/lib/systemd/system/lightdm.service; disabled; vendor preset: enabled)
   Active: deactivating (stop-sigterm) since Thu 2020-01-16 12:12:24 AWST; 15s ago
 Main PID: 978 (lightdm)
    Tasks: 5 (limit: 4915)
   Memory: 19.5M
   CGroup: /system.slice/lightdm.service
           `-978 /usr/sbin/lightdm

Jan 15 19:03:00 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 15 20:03:00 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 15 21:03:00 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 15 22:03:00 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 15 23:03:00 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 16 00:03:00 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 16 01:03:00 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 16 10:41:23 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 16 11:36:56 tranquillity gnome-keyring-daemon[1162]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
Jan 16 12:12:24 tranquillity systemd[1]: Stopping Display Manager...
╭─ tec@tranquillity  ~                                                                                                                                                         3 ↵  0.27  12:12
╰> sudo systemctl reboot
Connection to 180.150.91.8 closed by remote host.
Connection to 180.150.91.8 closed.
╭─ ~                                                                                                                                                                    SIG(127) ↵  03:07  12:13
╰> ssh tranquillity
ssh: connect to host tranquillity port 8579: Connection refused
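
The `ps` output in the session above shows Xorg with state `Rsl+`. Below is a minimal sketch for pulling just the PID and state out of `ps aux` output while logged in over ssh. It assumes the usual procps column order (PID in column 2, STAT in column 8, command in column 11), and the function name `xorg_state` is made up for illustration.

```shell
# Print "PID STAT" for every Xorg entry in `ps aux`-style output read from stdin.
# Assumes procps column order: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
xorg_state() {
    awk '$11 ~ /Xorg$/ { print $2, $8 }'
}

# Example use on the frozen machine:
#   ps aux | xorg_state
```

In `ps` state codes a leading `R` means running or runnable, `S` is interruptible sleep, and `D` is uninterruptible sleep. A process that stays in `D` across several checks is often waiting inside the kernel, for example on a driver, though that alone does not identify the cause.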

I am running

  • Linux 5.4.8-141 (just updated to
  • AMD 1700x
  • GTX 980ti, with Nvidia driver

Related Objects

Mentioned Here
T2156: MuPDF

Event Timeline

tecosaur created this task. Jan 19 2020, 2:34 AM
JoshStrobl updated the task description. Jan 19 2020, 10:50 PM

I have experienced this issue three times since creating this task. For no apparent reason, every so often my system will completely freeze, with (seemingly) no hope of recovery. For the ~year I had Windows (painful as it was), I did not experience any stability issues.

Something similar has happened twice recently: the system has suddenly died. No frozen image, unable to ssh, it just dies.
This is beginning to go beyond an inconvenience.

tomocafe added a subscriber: tomocafe.

I (think) this happens to me too. I check journalctl after power cycling and never see anything that looks interesting to me.

DataDrake triaged this task as Needs More Info priority. Feb 5 2020, 5:35 AM
DataDrake added a subscriber: DataDrake.

Going to need more to go on. Should see something in your journal.

I just had my system lock up again. Happens maybe 1-3 times a week at irregular times. After scrubbing the journal I found these entries just before the system graphics froze. @DataDrake let me know if there's anything else you need/I can help test.

Feb 06 11:58:54 kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Feb 06 11:58:54 kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 06 11:58:54 kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 06 11:58:54 kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 06 11:58:54 kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Feb 06 11:58:54 kernel: GPU crash dump saved to /sys/class/drm/card0/error
Feb 06 11:58:54 kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 06 11:58:54 kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Feb 06 11:58:54 kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Feb 06 11:58:54 kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Feb 06 11:58:54 kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
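
The kernel messages above ask for the GPU crash dump and say it was saved to `/sys/class/drm/card0/error`. Since that file lives in sysfs and will not survive a power cycle, here is a hedged sketch for copying it somewhere persistent first. The function name `save_gpu_dump` and the default destination file name are hypothetical, and reading the sysfs file will likely need root.

```shell
# Copy a GPU error-state file to a regular file before rebooting.
# $1: source file (defaults to the path printed in the kernel log)
# $2: destination file (default name is made up for this sketch)
save_gpu_dump() {
    src=${1:-/sys/class/drm/card0/error}
    dest=${2:-$HOME/gpu-error-$(date +%Y%m%d-%H%M%S).txt}
    # Bail out early if the source is missing or unreadable.
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    # Read with cat rather than relying on the file size sysfs reports.
    cat "$src" > "$dest" && echo "saved to $dest"
}

# Example use over ssh, from a root shell:
#   save_gpu_dump
```

The saved file is what the kernel message asks to have attached to an upstream report.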

Ah the intel resets. Yeah those are a known issue and are still present in 5.5 afaik. No fix yet.

@DataDrake I'm on an AMD system, could that be affecting me too?

> Ah the intel resets. Yeah those are a known issue and are still present in 5.5 afaik. No fix yet.

After going down the rabbit hole and reading way too much about this, I'm going to post this here for anyone else who comes looking.
https://gitlab.freedesktop.org/drm/intel/issues/673

@alecbcs Yep, hopefully we'll see it backported for 5.5.4 soon.

@JoshStrobl I'll keep an eye out for it! (Well...unless my screen freezes :P)

I just had this happen to me again. Going through sudo journalctl -ra I've picked out the following possibly-relevant-looking lines:

Feb 19 09:39:17 tranquillity gvfsd[1672]: Error calling org.gtk.vfs.MonitorClient.Changed(): Timeout was reached (g-io-error-quark, 24)

Feb 19 09:37:45 tranquillity kernel: snd_hda_intel 0000:09:00.1: can't change power state from D0 to D3hot (config space inaccessible)
Feb 19 09:37:44 tranquillity kernel: NVRM: A GPU crash dump has been created. If possible, please run
                                     NVRM: nvidia-bug-report.sh as root to collect this data before
                                     NVRM: the NVIDIA kernel module is unloaded.
Feb 19 09:37:44 tranquillity kernel: NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.
Feb 19 09:37:44 tranquillity kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=971, GPU has fallen off the bus.
Feb 19 09:37:44 tranquillity kernel: NVRM: GPU at PCI:0000:09:00: GPU-070f07d1-98ee-a9cf-6d0a-91f3de015f55
Feb 19 09:37:40 tranquillity kernel: snd_hda_codec_hdmi hdaudioC0D0: out of range cmd 0:4:707:ffffffbf
Feb 19 09:37:40 tranquillity kernel: pcieport 0000:00:03.1: AER: Device recovery failed
Feb 19 09:37:40 tranquillity kernel: pcieport 0000:00:03.1: AER:   TLP Header: 40000010 090100ff fedf7000 00000000
Feb 19 09:37:40 tranquillity kernel: pcieport 0000:00:03.1: AER:    [21] ACSViol
Feb 19 09:37:40 tranquillity kernel: pcieport 0000:00:03.1: AER:    [20] UnsupReq               (First)
Feb 19 09:37:40 tranquillity kernel: pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00300000/04400000
Feb 19 09:37:40 tranquillity kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Feb 19 09:37:40 tranquillity kernel: pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
Feb 19 09:37:35 tranquillity gsd-media-keys[1434]: gvc_mixer_card_get_index: assertion 'GVC_IS_MIXER_CARD (card)' failed

Feb 19 08:43:00 tranquillity budgie-wm.desktop[1562]: Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message with a timestamp of 0 for 0xa800003
Feb 19 08:43:00 tranquillity budgie-wm.desktop[1562]: Window manager warning: Invalid WM_TRANSIENT_FOR window 0x1f2 specified for 0xa800003.
Feb 19 08:41:04 tranquillity kernel: pcieport 0000:00:03.1: AER:    [12] Timeout
Feb 19 08:41:04 tranquillity kernel: pcieport 0000:00:03.1: AER:    [ 8] Rollover
Feb 19 08:41:04 tranquillity kernel: pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00001100/00006000
Feb 19 08:41:04 tranquillity kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Feb 19 08:41:04 tranquillity kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:00:00.0
...
Feb 19 08:35:34 tranquillity kernel: pcieport 0000:00:03.1: AER:    [12] Timeout
Feb 19 08:35:34 tranquillity kernel: pcieport 0000:00:03.1: AER:    [ 8] Rollover
Feb 19 08:35:34 tranquillity kernel: pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00001100/00006000
Feb 19 08:35:34 tranquillity kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Feb 19 08:35:34 tranquillity kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:00:00.0
...
Feb 19 08:32:25 tranquillity kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:00:00.0
Feb 19 08:32:25 tranquillity kernel: pcieport 0000:00:03.1: AER:    [12] Timeout
Feb 19 08:32:25 tranquillity kernel: pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00001000/00006000
Feb 19 08:32:25 tranquillity kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Feb 19 08:32:25 tranquillity kernel: pcieport 0000:00:03.1: AER: Corrected error received: 0000:00:00.0
Feb 19 08:32:22 tranquillity systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Feb 19 08:32:17 tranquillity kdeconnectd[1726]: kdeconnect.plugin.notification: Destroying NotificationsPlugin
Feb 19 08:32:17 tranquillity kdeconnectd[1726]: kdeconnect.plugin.sendnotification: Destroying NotificationsListener
Feb 19 08:32:17 tranquillity budgie-wm.desktop[1562]: Window manager warning: Overwriting existing binding of keysym ff61 with keysym ff61 (keycode 6b).

Feb 19 08:32:11 tranquillity kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 19 08:32:11 tranquillity gsd-sharing[1431]: Failed to StopUnit service: GDBus.Error:org.freedesktop.systemd1.NoSuchUnit: Unit gnome-remote-desktop.service not loaded.
Feb 19 08:32:11 tranquillity gsd-sharing[1431]: Failed to StopUnit service: GDBus.Error:org.freedesktop.systemd1.NoSuchUnit: Unit gnome-user-share-webdav.service not loaded.
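
Picking these lines out of a long journal by hand is tedious, so here is a rough sketch of a filter for it. The pattern list is an assumption built only from the messages quoted in this task (the i915 `GPU HANG`, the NVRM `Xid` and "fallen off the bus" lines, and the uncorrected AER errors). It is not an exhaustive list of what a graphics freeze can log, and the function name `gpu_lines` is made up.

```shell
# Keep only journal lines resembling the GPU and PCIe errors quoted in this task.
# Reads journal text on stdin; the patterns are illustrative, not exhaustive.
gpu_lines() {
    grep -E 'GPU HANG|NVRM: Xid|fallen off the bus|AER: .*Uncorrected|AER: Device recovery failed'
}

# Example use after a hard power cycle; -b -1 selects the previous boot,
# which only works if the journal is stored persistently:
#   sudo journalctl -k -b -1 --no-pager | gpu_lines
```

Corrected AER errors are left out of the pattern on purpose to keep the output short. They may still be worth reading in context, since in the log above they appear about an hour before the uncorrected ones.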

I'm up to date (Linux version 5.5.6-149) yet I'm still having this happen to me. Black screen, computer just goes dead every so often (every few days, very annoying). Looked in journalctl -ra but couldn't find anything.

Since that comment this has happened a further two times :(

aquasp added a subscriber: aquasp. Mar 6 2020, 3:34 AM

They appear to have made some changes to Intel i915 in kernels 5.5.7 and 5.5.8.

Maybe it's fixed in 5.5.8?

Will that affect AMD systems?

aquasp added a comment. Mar 6 2020, 4:18 AM

@tecosaur

I need to confirm, but maybe

aquasp added a comment. Mar 6 2020, 4:18 AM

i915 is an Intel thing, but it's possible that there is something about AMD in the kernel changelog.

My system is currently crashing every few days. If I could roll back to 6 months ago I would. I feel like I'm using Windows again, and it's really hurting my experience.

Looking in my journal, I'm not sure what to look out for, or which logs I should look at.

@DataDrake do you have any suggestions?