Daily freezing until hard reboot
Closed, Invalid, Public

Assigned To
None
Authored By
tomocafe
May 14 2020, 8:53 PM
Referenced Files
F5939446: journal_kb1.log
May 15 2020, 9:31 PM
F5935937: journal_kb2.log
May 14 2020, 8:53 PM
F5935938: journal_kb1.log
May 14 2020, 8:53 PM
F5935936: journal_kb3.log
May 14 2020, 8:53 PM

Description

I'm not sure if this is the same issue as T8620 or not.

Every day it seems, sometimes multiple times, the system freezes on me. It starts with some buttons not working, then all of a sudden everything becomes frozen. I try Ctrl+Alt+F1 but nothing happens.

How should I troubleshoot this? Please let me know what additional information I can provide! :)

My last 4 boots are hard resets after freezing:

$ last -n5 -x shutdown reboot
reboot   system boot  4.9.219-157.lts  Thu May 14 13:28   still running
reboot   system boot  4.9.219-157.lts  Thu May 14 13:22   still running
reboot   system boot  4.9.219-157.lts  Thu May 14 09:59   still running
reboot   system boot  4.9.219-157.lts  Thu May 14 09:08   still running
shutdown system down  4.9.219-157.lts  Wed May 13 17:34 - 09:08  (15:34)

System information:

$ inxi -Fz
System:
  Kernel: 4.9.219-157.lts x86_64 bits: 64 Desktop: Budgie 10.5.1
  Distro: Solus 4.1
Machine:
  Type: Desktop Mobo: ASUSTeK model: PRIME X370-PRO v: Rev X.0x
  serial: <filter> UEFI: American Megatrends v: 0805 date: 06/20/2017
CPU:
  Topology: 8-Core model: AMD Ryzen 7 1700 bits: 64 type: MT MCP
  L2 cache: 4096 KiB
  Speed: 1550 MHz min/max: 1550/3400 MHz Core speeds (MHz): 1: 1550 2: 1550
  3: 1550 4: 1550 5: 1550 6: 1550 7: 1550 8: 1550 9: 1550 10: 1550 11: 1550
  12: 1550 13: 1550 14: 1550 15: 1550 16: 1550
Graphics:
  Device-1: NVIDIA GP106 [GeForce GTX 1060 6GB] driver: nvidia v: 440.82
  Display: x11 server: X.Org 1.20.7 driver: nvidia
  resolution: 3840x2160~60Hz
  OpenGL: renderer: GeForce GTX 1060 6GB/PCIe/SSE2 v: 4.6.0 NVIDIA 440.82
Audio:
  Device-1: NVIDIA GP106 High Definition Audio driver: snd_hda_intel
  Device-2: AMD Family 17h HD Audio driver: snd_hda_intel
  Device-3: Logitech C922 Pro Stream Webcam type: USB
  driver: snd-usb-audio,uvcvideo
  Device-4: Blue Microphones Yeti Nano type: USB
  driver: hid-generic,snd-usb-audio,usbhid
  Sound Server: ALSA v: k4.9.219-157.lts
Network:
  Device-1: Intel I211 Gigabit Network driver: igb
  IF: enp38s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: docker0 state: down mac: <filter>
Drives:
  Local Storage: total: 465.76 GiB used: 130.38 GiB (28.0%)
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 960 EVO 500GB
  size: 465.76 GiB
Partition:
  ID-1: / size: 453.31 GiB used: 130.38 GiB (28.8%) fs: ext4 dev: /dev/dm-1
Swap:
  Alert: No Swap data was found.
Sensors:
  Message: No sensors data was found. Is sensors configured?
Info:
  Processes: 379 Uptime: 14m Memory: 31.42 GiB used: 1.27 GiB (4.0%)
  Shell: bash inxi: 3.1.00

Kernel logs from my last 3 boots:

sudo journalctl -k -b -1

sudo journalctl -k -b -2

sudo journalctl -k -b -3
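
For anyone collecting the same information, the three commands above can be wrapped in a small loop that writes each boot's kernel log to its own file. This is only a sketch: `save_kernel_logs` is a hypothetical helper name, and the output file names simply mirror the attachments on this task.

```shell
# Save the kernel log of each of the last N boots to its own file.
# save_kernel_logs is a hypothetical helper, not an existing tool.
save_kernel_logs() {
    count="${1:-3}"      # how many previous boots to export
    outdir="${2:-.}"     # directory to write the logs into
    n=1
    while [ "$n" -le "$count" ]; do
        # -k: kernel messages only; -b -N: the Nth boot before the current one
        sudo journalctl -k -b "-$n" --no-pager > "$outdir/journal_kb$n.log"
        n=$((n + 1))
    done
}

# Example: save_kernel_logs 3 "$HOME"
```

Adding `-p err` to the journalctl call should narrow the output to messages of error priority or worse, which can make the last lines before a freeze easier to spot.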

Thank you for looking at this!

Event Timeline

I picked the lts kernel when I first set up my system 2 years ago... I think I was assuming it was more stable? Are you suggesting I use -current instead?

I think it will go away on -current considering your hardware came out long after we moved to newer kernels (4.14 I think?).

Let me give it a try. This says current is the default. Was that the case in the 2017 time frame? I vaguely remember being guided towards picking lts by the installer. But that was a long time ago.

We switched to -current by default as of Solus 3 (August 2017). So it's possible you installed shortly before that.

You have an nvidia GPU and are not using the open source driver, so make sure you install a driver that will work with the -current kernel before you reboot. From what I can tell you're using nvidia-glx-driver which means you would have to install nvidia-glx-driver-current.

In case I'm wrong, to confirm which nvidia driver branch you have installed you can run this command:
eopkg li | grep nvidia | grep -v modaliases

I use a different driver branch, so for me the output is:

nvidia-developer-driver-32bit            - 32-bit libraries for nvidia-developer-driver
nvidia-developer-driver-common           - Shared assets for the NVIDIA Developer Dinary Driver
nvidia-developer-driver-current          - NVIDIA Developer Binary Driver (Current Kernel)

Whatever result you get, make sure a "-current" version of that exact same nvidia driver is installed.
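
The check described above can be sketched as a small shell function that reads the package list on standard input. This is a minimal sketch under two assumptions: that every installed driver branch ships a `-common` package (as in the sample output above), and that the package name is the first column of each line. `check_nvidia_current` is a hypothetical helper name, not part of eopkg.

```shell
# Report, for each nvidia driver branch found on stdin, whether its
# "-current" package is also listed. Exits non-zero if one is missing.
check_nvidia_current() {
    awk '
        { pkgs[$1] = 1 }                      # remember every package name
        $1 ~ /^nvidia-.*-driver-common$/ {    # assumption: one -common per branch
            branch = $1
            sub(/-common$/, "", branch)
            branches[branch] = 1
        }
        END {
            status = 0
            for (b in branches) {
                if ((b "-current") in pkgs) {
                    print "ok: " b "-current is installed"
                } else {
                    print "missing: " b "-current"
                    status = 1
                }
            }
            exit status
        }
    '
}

# Example: eopkg li | grep nvidia | grep -v modaliases | check_nvidia_current
```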

Thanks, Harvey. I am indeed using nvidia-glx-driver with lts and will install nvidia-glx-driver-current. I also checked all the ones listed here: https://getsol.us/articles/troubleshooting/boot-management/en/#installing-an-alternative-kernel

But I don't use any of those besides the Nvidia driver.

Unknown Object (User) added a subscriber: Unknown Object (User). May 15 2020, 7:28 PM

I've had the "daily lockup" for quite some time now, but didn't give it much attention. I am not using NVIDIA, and I am on -current.

There is nothing notable in the journalctl output, which is why I didn't know where to start.

I still hit the issue with the current kernel, but the journal logs do seem to give more promising information.

Unknown Object (User) added a comment. May 15 2020, 11:20 PM

Sounds a bit like this to me:
https://bugzilla.kernel.org/show_bug.cgi?id=196683 stating it's an issue of the kernel ignoring BIOS settings and/or crashing when the CPU goes into low-voltage "standby".

I'll try it with processor.max_cstate=5 and see if that changes anything.
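
To confirm that a parameter such as `processor.max_cstate=5` actually reached the running kernel, one option is to inspect `/proc/cmdline`. The following is a minimal sketch; `kernel_param_value` is a hypothetical helper name. The commented lines about making the setting persistent assume that Solus's clr-boot-manager picks up drop-in files from `/etc/kernel/cmdline.d`, and the file name there is an arbitrary example.

```shell
# Print the value of a kernel command-line parameter.
# $1 = parameter name, $2 = cmdline file (defaults to /proc/cmdline).
# Returns non-zero if the parameter is not present.
kernel_param_value() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" |
        awk -F= -v name="$1" '
            $1 == name { print $2; found = 1 }
            END { exit !found }
        '
}

# Example: kernel_param_value processor.max_cstate
#
# To make the parameter persistent (assumption: clr-boot-manager reads
# drop-in files from /etc/kernel/cmdline.d; the file name is arbitrary):
#   sudo mkdir -p /etc/kernel/cmdline.d
#   echo 'processor.max_cstate=5' | sudo tee /etc/kernel/cmdline.d/40_cstate.conf
#   sudo clr-boot-manager update
```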

DataDrake triaged this task as Needs More Info priority. May 17 2020, 4:46 AM
DataDrake edited projects, added Hardware; removed Lacks Project.

Marking as Needs More Info until we get a better idea what the cause is.

Seems like this is an AMD thing.

I just updated to the latest BIOS version and will see if that helps.

Unknown Object (User) added a comment. May 17 2020, 7:08 PM

I did a completely fresh install yesterday (for an unrelated reason: I wanted Budgie again but realized that GNOME still has issues with uinput, so I'm back on Plasma). My BIOS has been up to date for quite a while now (mine does not seem to receive updates anymore). I didn't have the hard freeze in the last day. I'll wait and try to find some kind of source; freezes that happen daily or only on some days are hard to pin down. So yes, Needs More Info is probably the right status for this one.

Unknown Object (User) added a comment. Jun 17 2020, 8:51 PM

This may sound crazy, but I got those lockups more and more frequently. I realized that my cat had chewed on some of my display cables. Replacing them seems to have remedied the issue. I'm not sure how or why Linux would lock up due to display cables being eaten, but that actually seems to be the case.

Seems putting the fat boy on a diet is what actually got me the issue.

ermo edited subscribers, added: ermo; removed: DataDrake.

Closing this as invalid as the cause (per the above comment) does not appear to be related to Solus per se.