Debugging of suspected ZFS deadlocks

Please first read the following. Please direct your ZFS reports to

The best first step is to capture stack traces of all threads in one of the following ways:

If you see thread(s) with a zio_wait call in their stacks and you also see thread(s) with a zio_done call in their stacks, then this is very likely a true ZFS deadlock. Please report it.

Similarly, if you see thread(s) with a zio_wait call in their stacks and you also see thread(s) with a zio_interrupt call in their stacks, then this is very likely a true ZFS deadlock. Please report it.

If you do not see any threads with a zio_wait call, but you do see threads with the following calls (or similar):

then this is very likely a true ZFS deadlock. Please report.

If none of the above is true, that is, you do see zio_wait but you see neither zio_done nor zio_interrupt, then the problem is most likely with the storage layer:

Consider reporting this problem, but please be realistic about it: do not expect a resolution in ZFS code.
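The triage rules above can be sketched as a small Python helper. This is only an illustration, not a tool that ships with ZFS or FreeBSD: it assumes the stack traces have already been parsed into lists of function names, and the name classify_hang and the extra_deadlock_frames parameter are hypothetical. The list of calls that indicate a deadlock when zio_wait is absent is not reproduced here, so the caller has to supply it.

```python
from typing import Iterable, Sequence


def classify_hang(stacks: Iterable[Sequence[str]],
                  extra_deadlock_frames: Iterable[str] = ()) -> str:
    """Apply the triage rules to per-thread stacks of function names.

    `extra_deadlock_frames` is a caller-supplied (hypothetical) set of
    function names that indicate a deadlock when zio_wait is absent.
    """
    # Collect every function name seen in any thread's stack.
    frames = {frame for stack in stacks for frame in stack}
    has_wait = "zio_wait" in frames

    # zio_wait together with zio_done or zio_interrupt: likely deadlock.
    if has_wait and ("zio_done" in frames or "zio_interrupt" in frames):
        return "likely ZFS deadlock"
    # No zio_wait, but one of the caller-supplied calls is present.
    if not has_wait and frames & set(extra_deadlock_frames):
        return "likely ZFS deadlock"
    # zio_wait alone: the requests are probably stuck below ZFS.
    if has_wait:
        return "likely storage layer problem"
    return "inconclusive"


threads = [["zio_wait", "spa_sync"], ["zio_done", "taskqueue_run"]]
print(classify_hang(threads))
```

Exact name matching is used for simplicity; real traces usually carry offsets and addresses that would need to be stripped first.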

Some notes:

If you are doing deeper debugging, some very interesting and useful information can be found in the vdev_t structure associated with each leaf vdev of a pool. For example, the vdev_queue member may look like this:

vdev_queue = {
        vq_deadline_tree = {avl_root = 0xfffffe0338dbb248, avl_compar = 0xffffffff816855b0 <vdev_queue_deadline_compare>, avl_offset = 584, avl_numnodes = 116, avl_size = 896},
        vq_read_tree = {avl_root = 0xfffffe019d0b65b0, avl_compar = 0xffffffff81685600 <vdev_queue_offset_compare>, avl_offset = 560, avl_numnodes = 8, avl_size = 896},
        vq_write_tree = {avl_root = 0xfffffe03e3d19230, avl_compar = 0xffffffff81685600 <vdev_queue_offset_compare>, avl_offset = 560, avl_numnodes = 108, avl_size = 896},
        vq_pending_tree = {avl_root = 0xfffffe025e32c230, avl_compar = 0xffffffff81685600 <vdev_queue_offset_compare>, avl_offset = 560, avl_numnodes = 10, avl_size = 896},

avl_numnodes gives the number of requests (zios) in a given queue. vq_deadline_tree is the queue of incoming requests; vq_read_tree and vq_write_tree are its sub-queues for read and write requests, respectively. vq_pending_tree is the queue of requests that have been issued to the underlying storage layer and that ZFS is waiting on to complete.
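If you have saved such a dump to a text file, the per-queue counts can be pulled out with a few lines of Python. This is a minimal sketch that assumes the debugger prints the structure in the format shown above; the function name queue_depths and the shortened SAMPLE text are made up for the illustration.

```python
import re

# Matches "<name> = { ... avl_numnodes = <N>" for each vq_*_tree member.
# DOTALL lets the match span debugger line wraps.
_TREE_RE = re.compile(
    r"(vq_\w+_tree)\s*=\s*\{.*?avl_numnodes\s*=\s*(\d+)", re.DOTALL
)


def queue_depths(dump: str) -> dict:
    """Return {tree name: number of queued zios} from a vdev_queue dump."""
    return {name: int(count) for name, count in _TREE_RE.findall(dump)}


# Shortened, hypothetical sample in the same format as the dump above.
SAMPLE = """
vq_read_tree = {avl_root = 0x1, avl_offset = 560, avl_numnodes =
8, avl_size = 896},
vq_pending_tree = {avl_root = 0x2, avl_offset = 560, avl_numnodes =
10, avl_size = 896},
"""

for name, depth in queue_depths(SAMPLE).items():
    print(name, depth)
```

In the dump above, the read and write counts add up to the deadline count (8 + 108 = 116), which is consistent with the description of vq_read_tree and vq_write_tree as sub-queues of vq_deadline_tree.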


AndriyGapon/AvgZfsDeadlockDebug (last edited 2018-03-12T02:55:14+0000 by MarkLinimon)