CephFS currently deadlocks under CTDB's ping_pong POSIX locking test
when run concurrently on multiple nodes.
The deadlock is caused by failed removal of a waiting_locks entry when
the waiting lock is merged with an existing lock, e.g:
Initial MDS state (two clients, same file):
held_locks -- start: 0, length: 1, client: 4116, pid: 7899, type: 2
start: 2, length: 1, client: 4110, pid: 40767, type: 2
waiting_locks -- start: 1, length: 1, client: 4116, pid: 7899, type: 2
Waiting lock entry 4116@1:1 fires:
handle_client_file_setlock: start: 1, length: 1,
client: 4116, pid: 7899, type: 2
MDS state after lock is obtained:
held_locks -- start: 0, length: 2, client: 4116, pid: 7899, type: 2
start: 2, length: 1, client: 4110, pid: 40767, type: 2
waiting_locks -- start: 1, length: 1, client: 4116, pid: 7899, type: 2
Note that the waiting 4116@1:1 lock entry is merged with the existing
4116@0:1 held lock to become a 4116@0:2 held lock. However, the now
handled 4116@1:1 waiting_locks entry remains.
When handling a lock request, the MDS calls adjust_locks() to merge
the new lock with available neighbours. If the new lock is merged,
then the waiting_locks entry is not located in the subsequent
remove_waiting() call because adjust_locks changed the new lock to
include the old locks.
This fix ensures that the waiting_locks entry is removed prior to
modification during merge.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit
476e4902907dfadb3709ba820453299ececf990b)
} else {
//yay, we can insert a shared lock
dout(15) << "inserting shared lock" << dendl;
+ remove_waiting(new_lock);
adjust_locks(self_overlapping_locks, new_lock, neighbor_locks);
held_locks.insert(pair<uint64_t, ceph_filelock>(new_lock.start, new_lock));
ret = true;
}
}
} else { //no overlapping locks except our own
+ remove_waiting(new_lock);
adjust_locks(self_overlapping_locks, new_lock, neighbor_locks);
dout(15) << "no conflicts, inserting " << new_lock << dendl;
held_locks.insert(pair<uint64_t, ceph_filelock>
}
if (ret) {
++client_held_lock_counts[(client_t)new_lock.client];
- remove_waiting(new_lock);
}
else if (wait_on_fail && !replay)
++client_waiting_lock_counts[(client_t)new_lock.client];
old_lock = &(*iter)->second;
old_lock_client = old_lock->client;
dout(15) << "lock to coalesce: " << *old_lock << dendl;
- /* because if it's a neibhoring lock there can't be any self-overlapping
+ /* because if it's a neighboring lock there can't be any self-overlapping
locks that covered it */
if (old_lock->type == new_lock.type) { //merge them
if (0 == new_lock.length) {