From: Kefu Chai Date: Wed, 13 May 2026 05:02:16 +0000 (+0800) Subject: crimson/osd: disable ofstream buffering to fix concurrent logging X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=1d09118286aff3737807dc7ef40fd55dbc863608;p=ceph.git crimson/osd: disable ofstream buffering to fix concurrent logging seastar::logger::do_log() writes to the shared static _out pointer from every shard's reactor thread with no lock. std::cerr is safe in this setting because it is unbuffered: each write maps to a single write(2) syscall, which POSIX serializes at the kernel level. A buffered std::ofstream is not safe: multiple shards concurrently advance the filebuf's put pointer (pptr) past the end-of-buffer marker (epptr), causing _M_convert_to_external to compute a length longer than the 8192-byte internal buffer and write past it. The C++ standard does not provide thread-safety guarantees for std::ofstream. [res.on.data.races] (C++23 ยง16.4.6.10) specifies that concurrent non-const access to a standard library object from multiple threads is a data race with undefined behavior. std::basic_filebuf, which std::ofstream owns internally, maintains mutable state (pptr, epptr, _M_buf, and the codec state) that is updated on every write with no synchronization. std::cerr is an explicit exception: [iostream.objects] guarantees that concurrent writes to the standard stream objects are safe, and cerr achieves this by being unbuffered (no pptr/epptr to race on) and writing through a single atomic write(2) per flush. This manifested as a heap-buffer-overflow in basic_filebuf::overflow() via seastar::logger::do_log() during a multi-shard run, reported against the build that included the dangling-pointer fix (6680e02d041). The dangling-pointer bug had masked this latent race by crashing before multiple shards came online. In this change, we fix this by calling pubsetbuf(nullptr, 0) immediately after opening the file. This suppresses _M_allocate_internal_buffer and makes each operator<< call fall through to a single write(2) syscall, matching std::cerr's thread-safety guarantee. This fix is necessary for all production deployments, not just pathological configurations: log_file has a daemon_default of /var/log/ceph/$cluster-$name.log in global.yaml.in ($cluster is always "ceph" in modern Ceph, as customized cluster names have been deprecated). Every crimson-osd process therefore opens a log_file_stream by default, and every multi-shard run is exposed to this race. An alternative would be to follow ScyllaDB's approach: its service file has no StandardOutput or StandardError directives, so systemd connects the process to the journal, and the logger keeps _out pointing at std::cerr. This sidesteps the buffered-ofstream problem entirely. For Crimson to adopt that model it would need to respect log_to_file and log_to_stderr (which it currently ignores, checking only log_file), and a dedicated ceph-crimson-osd@.service unit would be needed so that a StandardError=append:/var/log/ceph/ceph-osd.%i.log directive could be added without affecting the classic OSD. That is a larger refactor; pubsetbuf(nullptr, 0) is the minimal correct fix for now. Fixes: https://tracker.ceph.com/issues/76524 Signed-off-by: Kefu Chai --- diff --git a/src/crimson/osd/main.cc b/src/crimson/osd/main.cc index c549277888de..6f287c9f1f85 100644 --- a/src/crimson/osd/main.cc +++ b/src/crimson/osd/main.cc @@ -165,6 +165,11 @@ int main(int argc, const char* argv[]) } catch (const std::system_error& e) { ceph_abort_msg(fmt::format("unable to open log file: {}", e.what())); } + // seastar::logger::do_log() writes to _out from every shard's thread + // with no lock. std::cerr is safe because it is unbuffered; a buffered + // ofstream is not. Disable buffering so each write() is a single syscall, + // matching cerr's thread-safety guarantee. + log_file_stream.rdbuf()->pubsetbuf(nullptr, 0); logger().set_ostream(log_file_stream); } auto reset_logger = seastar::defer([] {