= PVE Troubleshooting =
For some reason, the denizens of the Internet assume that all difficulties when using PVE stem from screwing up your cluster.  This is kind of odd when you consider that, sometimes, PVE servers run on their own...
== Some resources: ==
[https://engineerworkshop.com/blog/how-to-unlock-a-proxmox-vm/ Proxmox Locked VM Errors]
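
If you just need the short version of the locked-VM fix: as far as I know it boils down to clearing the stale lock (typically left behind by an interrupted backup or migration). A minimal sketch — the VMID here is made up:
<syntaxhighlight lang="sh">
# Clear a stale lock on a VM or container.
# 100 is a placeholder VMID/CTID; use the one from the actual error message.
qm unlock 100    # QEMU VM
pct unlock 100   # LXC container
</syntaxhighlight>
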
== qm commands fail hard ==
Example:
 
<syntaxhighlight lang="sh">
root@proxmox-pve:~# qm list
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
</syntaxhighlight>
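
As far as I can tell, those connection refusals mean <code>qm</code> can't reach the local cluster filesystem service (pve-cluster / pmxcfs), so checking that service is a reasonable first step:
<syntaxhighlight lang="sh">
# Is the cluster filesystem service even running?
systemctl status pve-cluster
# Recent log lines from this boot often name the real problem.
journalctl -b -u pve-cluster --no-pager | tail -n 20
</syntaxhighlight>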
 
Nearly every Google hit is a discussion about how to get your cluster working again... :|
 
The actual problem, OTOH, appears to be that if your hostname doesn't match what's in <code>/etc/hosts</code>, <code>qm</code> gets lost...
 
Take a look at <code>/etc/hostname</code>
 
In our example, it'll look like:
<syntaxhighlight lang="ini">
proxmox-pve
</syntaxhighlight>
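
A quick way to see whether the hostname actually resolves the way PVE expects — nothing PVE-specific here, just standard tools:
<syntaxhighlight lang="sh">
# Compare the configured hostname with what /etc/hosts resolves it to.
hostname                      # should print proxmox-pve
getent hosts "$(hostname)"    # should print the host's LAN IP; no output means trouble
</syntaxhighlight>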
 
Now, if <code>/etc/hosts</code> contains:
<syntaxhighlight lang="ini">
127.0.0.1 localhost.localdomain localhost
192.168.1.2 pve.nerdmage.ca pve
 
# The following lines are desirable for IPv6 capable hosts
 
::1    ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
</syntaxhighlight>
 
poor PVE is gonna be confused.
 
That second line should be:
<syntaxhighlight lang="ini">
192.168.1.2 proxmox-pve.nerdmage.ca proxmox-pve
</syntaxhighlight>
 
Once that's fixed, restarting the machine should get things back to a working state.
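
A full reboot is the sure thing. If you'd rather not reboot, restarting the PVE services should (in theory — I haven't exhaustively tested this) be enough:
<syntaxhighlight lang="sh">
# Restart the cluster filesystem and the API daemons instead of rebooting.
systemctl restart pve-cluster pvedaemon pveproxy pvestatd
qm list    # should now list VMs instead of spitting "Connection refused"
</syntaxhighlight>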
 
== "EXT4-fs (dm-xx): write access unavailable, skipping orphan cleanup" ==
Do Not Panic!
 
This is not actually an error, even though it shows up repeatedly on the console.
 
It's some sort of random silliness.

[https://forum.proxmox.com/threads/ext4-fs-dm-10-write-access-unavailable-skipping-orphan-cleanup.46785/ Forum reference]
Far as I've managed to determine, it's something related to LXCs being locked during backup.
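
If you want to know which volume the message is actually about, mapping the dm-xx number back to an LVM/LXC volume name is straightforward — a sketch, using dm-10 as in the forum thread above:
<syntaxhighlight lang="sh">
# Map the dm-XX device from the console message back to a volume name.
ls -l /dev/mapper/ | grep dm-10
lsblk -o NAME,KNAME,TYPE,MOUNTPOINT | grep dm-10
</syntaxhighlight>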
