Difference between revisions of "PVE Troubleshooting"

From Da Nerd Mage Wiki
Latest revision as of 12:03, 23 June 2024

For some reason, the denizens of the Internet assume that all difficulties when using PVE stem from screwing up your cluster.  This is kind of odd when you consider that, sometimes, PVE servers run on their own...

Some resources:

Proxmox Locked VM Errors: https://engineerworkshop.com/blog/how-to-unlock-a-proxmox-vm/

qm commands fail hard

Example:

root@proxmox-pve:~# qm list
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused

Nearly every Google hit is discussion about how to get your cluster working again... :|

The actual problem, OTOH, appears to be that if your hostname doesn't match what's in /etc/hosts, qm gets lost...

Take a look at /etc/hostname

In our example, it'll look like:

proxmox-pve

Now, if /etc/hosts contains:

127.0.0.1 localhost.localdomain localhost
192.168.1.2 pve.nerdmage.ca pve

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

poor PVE is gonna be confused.

That second line should be:

192.168.1.2 proxmox-pve.nerdmage.ca proxmox-pve

Once that's fixed, restarting the machine should get things back to a working state.
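
A quick way to spot the mismatch is sketched below. It only checks that the name in the hostname file appears as a whole field on some line of the hosts file, which is the condition described above. The function name check_hosts_entry is made up for this example (it's not a PVE tool), and the two file paths are parameters so it can be tried on copies first.

```shell
# Sketch: does the name in the hostname file show up in the hosts file?
# check_hosts_entry is a hypothetical helper, not part of PVE.
check_hosts_entry() {
    che_hostname_file="${1:-/etc/hostname}"
    che_hosts_file="${2:-/etc/hosts}"
    # First line of the hostname file, with whitespace stripped.
    che_name="$(head -n 1 "$che_hostname_file" | tr -d '[:space:]')"
    if [ -z "$che_name" ]; then
        echo "no hostname found in $che_hostname_file"
        return 2
    fi
    # Strip comments, then look for the name as a whole field after the address.
    if awk -v name="$che_name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$che_hosts_file"; then
        echo "OK: $che_name is listed in $che_hosts_file"
    else
        echo "MISMATCH: $che_name is not listed in $che_hosts_file"
        return 1
    fi
}
```

Running check_hosts_entry with no arguments looks at the real /etc/hostname and /etc/hosts. With the broken example above it would report a mismatch, since "pve" is not the same field as "proxmox-pve".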

"EXT4-fs (dm-xx): write access unavailable, skipping orphan cleanup"

Do Not Panic!

This is not actually an error, even though it shows up repeatedly on the console.

It's some sort of random silliness.

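Forum reference: https://forum.proxmox.com/threads/ext4-fs-dm-10-write-access-unavailable-skipping-orphan-cleanup.46785/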

Far as I've managed to determine, it's something related to LXCs being locked during backup.

"usb 2-1-port2: disabled by hub (EMI?), re-enabling..."

Far too much research indicates that pretty much no-one has a clue what this means...

Tho I did find some reference to radio interference... hhhmmm...

Post Power Failure Boot Problems

Recently, one of my servers has begun failing to start its guests upon bootup after a power failure event.

Activation of logical volume pve/vm-XXXX-disk-X is prohibited while logical volume pve/data_tmeta is active.

WARNING: Device /dev/sdi3 not initialized in udev database even after waiting 10000000 microseconds.

TASK ERROR: activating LV 'pve/data' failed: Activation of logical volume pve/data is prohibited while logical volume pve/data_tdata is active.

[ TIME ] Timed out waiting for device /dev/disk/by-uuid/????????-????-????-????-????????????.

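This thread (https://forum.proxmox.com/threads/task-error-activating-lv-pve-data-failed-activation-of-logical-volume-pve-data-is-prohibited-while-logical-volume-pve-data_tdata-is-active.106225/) has some hints...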

Specifically, the following sometimes gets it running again:

  • lvchange -an pve/data
  • lvconvert --repair pve/data
  • lvchange -ay pve/data
  • reboot
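
The steps above can be strung together as a small script. This is only a sketch: repair_thin_pool and the DRY_RUN switch are names invented for this example, the pool defaults to pve/data as in the list, and it assumes the commands are run as root on the affected host. By default it just prints what it would do, and the reboot is left to you.

```shell
# Sketch: run the thin-pool recovery steps in order.
# repair_thin_pool and DRY_RUN are hypothetical names for this example.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
repair_thin_pool() {
    rtp_pool="${1:-pve/data}"
    for rtp_cmd in "lvchange -an" "lvconvert --repair" "lvchange -ay"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $rtp_cmd $rtp_pool"
        else
            echo "running: $rtp_cmd $rtp_pool"
            # Stop at the first failing step rather than carrying on.
            $rtp_cmd "$rtp_pool" || { echo "failed: $rtp_cmd $rtp_pool" >&2; return 1; }
        fi
    done
    echo "done; reboot when ready"
}
```

Call it as DRY_RUN=0 repair_thin_pool to actually run the commands. Since this only sometimes works, stopping at the first failure at least makes it obvious which step gave up.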

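This thread (https://unix.stackexchange.com/questions/690552/boot-taking-forever-with-a-106-jbod-attached-warning-device-dev-xxx-not-ini) has a suggested config change, but this doesn't seem to have worked.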

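Then, there's this thread (https://forum.proxmox.com/threads/local-lvm-not-available-after-kernel-update-on-pve-7.97406/page-2#post-430860) where people seem to believe it's a bug in Debian...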