I've only seen this on one cluster, but let's not issue repops during
scrub on objects whose object_info_t::soid value does not match the
name of the object being scrubbed.
The cluster in question has been through many different non-release
kernels and OSD versions, so the affected objects presumably came about
due to an old XFS or filestore bug. They recently became fatal because
we made filestore crash on ENOENT for setattrs. In the past, the cluster
just silently tolerated them.
http://tracker.ceph.com/issues/18409 is a larger feature to detect these
better and repair them automatically.
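
As a rough illustration of the guard described above, here is a minimal
standalone sketch. ObjectInfo, Store, and objects_safe_for_repop are
hypothetical stand-ins invented for this example, not Ceph types or
functions. It only shows the shape of the check: skip and log any object
whose context is missing or whose stored soid differs from its name.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for object_info_t: only the stored name matters here.
struct ObjectInfo { std::string soid; };

// Hypothetical stand-in for the object store: scrubbed name -> stored oi attr.
using Store = std::map<std::string, ObjectInfo>;

// Returns the scrubbed names for which it is safe to issue a repop.
// Objects with no context, or with a mismatched stored soid, are logged
// to clog and skipped instead of tripping an assert.
std::vector<std::string> objects_safe_for_repop(
    const std::vector<std::string>& scrubbed,
    const Store& store,
    std::ostream& clog) {
  std::vector<std::string> ok;
  for (const auto& name : scrubbed) {
    auto it = store.find(name);
    if (it == store.end()) {
      clog << "cannot get object context for " << name << "\n";
      continue;
    }
    if (it->second.soid != name) {
      clog << "object " << name
           << " has a valid oi attr with a mismatched name, oi.soid: "
           << it->second.soid << "\n";
      continue;
    }
    ok.push_back(name);
  }
  return ok;
}
```

In this sketch an object stored under the name "b" whose oi attr says
"x" is reported and skipped, while a consistent object "a" is returned
as safe.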
Related: http://tracker.ceph.com/issues/18409
Signed-off-by: Samuel Just <sjust@redhat.com>
continue;
dout(10) << __func__ << " recording digests for " << p->first << dendl;
ObjectContextRef obc = get_object_context(p->first, false);
- assert(obc);
+ if (!obc) {
+ osd->clog->error() << info.pgid << " " << mode
+ << " cannot get object context for "
+ << p->first;
+ continue;
+ } else if (obc->obs.oi.soid != p->first) {
+ osd->clog->error() << info.pgid << " " << mode
+ << " object " << p->first
+ << " has a valid oi attr with a mismatched name, "
+ << "obc->obs.oi.soid: " << obc->obs.oi.soid;
+ continue;
+ }
OpContextUPtr ctx = simple_opc_create(obc);
ctx->at_version = get_next_version();
ctx->mtime = utime_t(); // do not update mtime