Fixes: https://tracker.ceph.com/issues/55583
Use a python threading lock to avoid race conditions where the
config file is being both read and written to at the same time.
Before this change, the content of the config file being parsed could be
'corrupted' by the MetadataManager racing with itself. Along with the
previous two patches, additional logging was added to the mgr code to
produce the simplified version of the mgr log below:
```
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'[GLOBAL]\nversion = 2\ntype = clone\npath = /volumes/Park/babydino2/
c9f773af-5221-49c6-846c-
d65c0920ae3f\nstate = pending\n\n[source]\nvolume = cephfs\ngroup = Park\nsubvolume = Jurrasic\nsnapshot = dinodna0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b''
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'[GLOBAL]\nversion = 2\ntype = clone\npath = /volumes/Park/babydino2/
c9f773af-5221-49c6-846c-
d65c0920ae3f\nstate = pending\n\n[source]\nvolume = cephfs\ngroup = Park\nsubvolume = Jurrasic\nsnapshot = dinodna0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] wrote 203 bytes to config b'/volumes/Park/babydino2/.meta'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'a0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b''
[volumes ERROR volumes.module] Failed _cmd_fs_clone_cancel(clone_name:babydino2, format:json, group_name:Park, prefix:fs clone cancel, vol_name:cephfs) < "":
Traceback (most recent call last):
...
File "/usr/lib64/python3.6/configparser.py", line 1111, in _read
raise e
configparser.ParsingError: Source contains parsing errors: b'/volumes/Park/babydino2/.meta'
[line 13]: 'a0\n'
```
Looking at the above you can see that the log indicates a write to the
config file (of 203 bytes). This happens before the file has finished
reading and thus instead of getting an empty string indicating EOF, it
gets that last four bytes of the new content of the file. The lock
prevents the MetadataManager from both reading and writing the config
file at the same time.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit
2fe83d72d94d1e0939d390554a833cd93b4dd965)
import errno
import logging
import sys
+import threading
if sys.version_info >= (3, 2):
import configparser
log = logging.getLogger(__name__)
+# _lock needs to be shared across all instances of MetadataManager.
+# that is why we have a file level instance
+_lock = threading.Lock()
+
def _conf_reader(fs, fd, offset=0, length=4096):
while True:
fd = None
try:
log.debug("opening config {0}".format(self.config_path))
- fd = self.fs.open(self.config_path, os.O_RDONLY)
- cfg = ''.join(_conf_reader(self.fs, fd))
+ with _lock:
+ fd = self.fs.open(self.config_path, os.O_RDONLY)
+ cfg = ''.join(_conf_reader(self.fs, fd))
self.config.read_string(cfg, source=self.config_path)
except UnicodeDecodeError:
raise MetadataMgrException(-errno.EINVAL,
try:
- fd = self.fs.open(self.config_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, self.mode)
- with _ConfigWriter(self.fs, fd) as cfg_writer:
- self.config.write(cfg_writer)
- cfg_writer.fsync()
+ with _lock:
+ fd = self.fs.open(self.config_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, self.mode)
+ with _ConfigWriter(self.fs, fd) as cfg_writer:
+ self.config.write(cfg_writer)
+ cfg_writer.fsync()
log.info("wrote {0} bytes to config {1}".format(cfg_writer.wrote, self.config_path))
except cephfs.Error as e:
raise MetadataMgrException(-e.args[0], e.args[1])