Xuehan Xu [Fri, 17 Dec 2021 05:20:35 +0000 (13:20 +0800)]
crimson/os/seastore: reset onode in 'SeaStore::repeat_with_onode' before the transaction gets destroyed
Onodes hold references to the onode tree extents. If an onode references the root extent, that root
extent is cached in the onode tree's root_tracker, which caches onode tree roots by transaction address.
That root_tracker entry only gets removed when the onode (or the corresponding "super") is destroyed.
On the other hand, two non-concurrent transactions can occupy the same address. So if an onode gets destroyed
after its transaction is destroyed, there is a chance that another transaction occupying the same
address gets that not-yet-destroyed and possibly outdated onode.
BTW, since we already cache extents in transactions, we might want to drop the onode tree root_tracker later?
Sage Weil [Fri, 17 Dec 2021 04:54:25 +0000 (23:54 -0500)]
Merge PR #44228 into master
* refs/pull/44228/head:
qa/suites/orch/cephadm/osds: test 'ceph cephadm osd activate'
mgr/cephadm/services/osd: skip found osds that already have daemons
mgr/cephadm: allow activation of OSDs that have previously started
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sage Weil [Mon, 6 Dec 2021 15:19:16 +0000 (10:19 -0500)]
mgr/cephadm: allow activation of OSDs that have previously started
When this code was introduced way back in ea987a0e56db106f7c76d11f86b3e602257f365e,
for some reason I was focused only on freshly created OSDs. The
get_osd_uuid_map() helper is used by deploy_osd_daemons_for_existing_osds()
which is called not only by OSD creation but also by 'ceph cephadm
osd activate', which is meant to instantiate daemons for existing OSD
devices (e.g., devices that were reattached to a new server, or whose
/var/lib/ceph/$fsid/osd.$id directory was lost for some other reason).
However, if we ignore OSDs with up_from > 0, then we can't recreate a
daemon instance for such existing OSDs--arguably the most important ones,
since they may hold real data.
Fixes: https://tracker.ceph.com/issues/53491
Signed-off-by: Sage Weil <sage@newdream.net>
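The effect of the up_from filter described in this commit can be illustrated with a minimal sketch. This is not the cephadm code: build_osd_uuid_map, its skip_started flag, and the sample entries are hypothetical, and the entry shape (osd, uuid, up_from keys) is an assumption made for illustration.

```python
from typing import Dict, List


def build_osd_uuid_map(osds: List[dict], skip_started: bool = False) -> Dict[str, str]:
    """Map OSD id -> uuid from a list of OSD map entries (hypothetical helper).

    skip_started=True models the old behaviour described in the commit,
    where OSDs with up_from > 0 were ignored.
    """
    result: Dict[str, str] = {}
    for entry in osds:
        if skip_started and entry.get("up_from", 0) > 0:
            continue  # old behaviour: a previously started OSD is dropped
        result[str(entry["osd"])] = entry.get("uuid", "")
    return result


# Hypothetical sample data: osd.0 was just created, osd.1 has run before.
sample_osds = [
    {"osd": 0, "uuid": "uuid-a", "up_from": 0},
    {"osd": 1, "uuid": "uuid-b", "up_from": 42},
]

old_view = build_osd_uuid_map(sample_osds, skip_started=True)
new_view = build_osd_uuid_map(sample_osds)
```

With the filter enabled, osd.1 is missing from the map, so no daemon could be recreated for it; without the filter, both OSDs are candidates for activation.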
Ronen Friedman [Thu, 16 Dec 2021 10:49:57 +0000 (10:49 +0000)]
crimson/osd: removing an unneeded make_unique()
As the desired lifetime of the object matches the lifetime it would
have if allocated on the stack, and as no ownership is transferred,
there is no point in using a unique_ptr here.
See also Google's guidance (https://abseil.io/tips/187),
under "Common Anti-Pattern: Avoiding &".
Paul Cuzner [Fri, 12 Nov 2021 03:16:59 +0000 (16:16 +1300)]
mgr/cephadm: Add snmp-gateway service support
Add a new snmp-gateway service to provide a bridge between
Prometheus and an SNMP management platform. The gateway
service uses https://github.com/maxwo/snmp_notifier to provide
SNMP v2c and SNMP V3 support.
The SNMP V3 support mandates at least authentication, and also
offers authentication and privacy (encryption).
Fixes: https://tracker.ceph.com/issues/52920
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Igor Fedotov [Tue, 2 Nov 2021 12:03:39 +0000 (15:03 +0300)]
os/bluestore: avoid premature onode release.
This was observed when an onode's removal is followed by a read
and the latter causes the object to be released before the removal is finalized.
The root cause is an improper 'pinned' state assessment in Onode::get().
A more detailed overview:
At some point Onode::get() might face the case when nref == 2 and pinned = true,
which means a parallel, incomplete put is running on the onode: the ref count has
been decremented but the pinned state is still unmodified (and the lock hasn't even
been acquired yet).
This can result in two puts racing over the same onode with nref == 2,
which ends in a premature onode release:
// nref =3, pinned = 1
// Thread 1 Thread 2
// o->put() o->get()
// --nref(n = 2, pinned=1)
// nref++ (n=3, pinned = 1)
// return
// ...
// o->put()
// --nref(n = 2)
// pinned = 0,
// --nref(n = 1)
// ocs->_unpin_and_rm(o) -> o->put()
// ...
// --nref(n = 0)
// release o
// o->c->get_onode_cache()
// FAULT!
//
The suggested fix is to introduce an additional atomic counter that tracks
running put() calls, and to permit onode release only when the regular
nref and put_nref are both equal to zero.
Fixes: https://tracker.ceph.com/issues/53002
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
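The release condition from this commit (nref and put_nref both zero) can be modelled with a small single-threaded sketch. This is not the BlueStore code: RefCounted, put_begin and put_end are hypothetical names, pinning and the cache shard lock are left out, and the interleaving of two threads is simulated by calling the two halves of put() in a chosen order.

```python
class RefCounted:
    """Toy model of an object released only when no put() is still running."""

    def __init__(self, nref: int) -> None:
        self.nref = nref       # regular reference count
        self.put_nref = 0      # number of put() calls currently in flight
        self.released = False

    def put_begin(self) -> None:
        # First half of put(): announce the running put, then drop the ref.
        self.put_nref += 1
        self.nref -= 1

    def put_end(self) -> None:
        # Second half of put(): only the last finishing put may release.
        self.put_nref -= 1
        if self.nref == 0 and self.put_nref == 0:
            self.released = True


node = RefCounted(nref=2)
node.put_begin()  # thread 1 starts put(): nref=1, put_nref=1
node.put_begin()  # thread 2 starts put(): nref=0, put_nref=2
node.put_end()    # thread 2 finishes first: put_nref=1, so no release yet
released_while_put_running = node.released
node.put_end()    # thread 1 finishes: nref=0 and put_nref=0, release
```

In this model the put that finishes first sees nref == 0 but does not release the object, because the other put is still in flight and may still touch it.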
This should prevent omap and xattr extent allocations from clumping near
the onode's hint. Additionally, only generate them past the default
16MB object_data_handler reservation.
Neha Ojha [Tue, 7 Dec 2021 17:47:22 +0000 (17:47 +0000)]
doc/releases/pacific.rst: add core updates for 16.2.7
16.2.7 fixes https://tracker.ceph.com/issues/53062, so remove the
"big scary warning" from the top of the pacific release page. We continue
to warn about this bug under the 16.2.6 section and in
https://docs.ceph.com/en/latest/releases/pacific/#upgrading-from-octopus-or-nautilus.
mon: Omit MANY_OBJECTS_PER_PG warning when autoscaler is on
Add a conditional so that, when the autoscaler is
set to ON, the message about a pool having
many more objects per PG than the cluster average is omitted.
Fixes: https://tracker.ceph.com/issues/53516
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
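The conditional added by this commit can be sketched roughly as follows. The monitor code itself is not shown in this log, so should_warn_many_objects, its parameters, and the "on" mode string are hypothetical stand-ins chosen for illustration.

```python
def should_warn_many_objects(pool_objects_per_pg: float,
                             cluster_avg_objects_per_pg: float,
                             max_skew: float,
                             autoscale_mode: str) -> bool:
    """Decide whether to raise a MANY_OBJECTS_PER_PG-style warning (sketch)."""
    if autoscale_mode == "on":
        # The autoscaler is managing this pool's PG count: omit the warning.
        return False
    if max_skew <= 0 or cluster_avg_objects_per_pg <= 0:
        # Check disabled, or no meaningful average to compare against.
        return False
    return pool_objects_per_pg / cluster_avg_objects_per_pg > max_skew


warn_manual = should_warn_many_objects(5000, 100, 10, "off")
warn_autoscaled = should_warn_many_objects(5000, 100, 10, "on")
```

The same skewed pool triggers the warning when the autoscaler is off and stays silent when it is on.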
Tim Serong [Fri, 10 Dec 2021 07:43:25 +0000 (18:43 +1100)]
ceph.spec.in: fix mgr-cephadm CherryPy requirement for SUSE builds
Commit 78983ad0d0c added cherrypy to ceph-mgr-cephadm's Requires,
but this needs to be split out into distro-specific sections due
to subtle/irritating naming differences.
Fixes: 78983ad0d0cce422da32dc4876ac186f6d32c3f5
Signed-off-by: Tim Serong <tserong@suse.com>
Samuel Just [Fri, 10 Dec 2021 22:31:00 +0000 (14:31 -0800)]
crimson/os/seastore/cache: init extents prior to read
This should ensure that any captured members of extent_init_func are
still valid at the cost of not being able to access the contents of the
extent at invocation time. With this, we should be able to rely on any
logical extents/lba extents in the cache having validly initialized lba
pins.
Fixes: https://tracker.ceph.com/issues/53555
Signed-off-by: Samuel Just <sjust@redhat.com>
John Mulligan [Fri, 10 Dec 2021 13:16:19 +0000 (08:16 -0500)]
python-common: make count & count-per-host >= 1 checks consistent
The previous version of the validate function had an incorrect error
statement that suggested the count must be >1 when it should have
been >=1. This confusion was possibly due to using "n < 1" on
one line and "n <= 0" on another line. Since both values are supposed
to be integers, this change corrects the error message and makes
the comparisons on both lines use "n < 1" (since I find it easier
to see that the check "n < 1" is the inverse of the error text
asserting "n >= 1").
Signed-off-by: John Mulligan <jmulligan@redhat.com>
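A minimal sketch of the consistent check described in this commit is given below. It is not the python-common code: validate_at_least_one and check_placement are hypothetical names, and ValueError stands in for whatever exception type the real validate function raises.

```python
def validate_at_least_one(name: str, value) -> None:
    """Raise ValueError unless value is None or an integer >= 1 (sketch)."""
    if value is None:
        return  # field not set, nothing to check
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError(f"{name} must be an integer, got {value!r}")
    if value < 1:
        # "n < 1" is the direct inverse of the message text "n >= 1".
        raise ValueError(f"{name} must be >= 1, got {value}")


def check_placement(count=None, count_per_host=None) -> None:
    # Both fields go through the same comparison and the same message.
    validate_at_least_one("count", count)
    validate_at_least_one("count-per-host", count_per_host)


def error_for(**kwargs) -> str:
    """Return the validation error text, or an empty string if valid."""
    try:
        check_placement(**kwargs)
    except ValueError as exc:
        return str(exc)
    return ""
```

Routing both fields through one helper keeps the comparison and the error text from drifting apart again.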
John Mulligan [Wed, 8 Dec 2021 20:37:11 +0000 (15:37 -0500)]
python-common: add unit test func for invalid yaml inputs
I didn't find a preexisting test function for this so I added a
new test that is fed yaml snippets and expected error messages.
This verifies some of the recently added validation for
count and count_per_host under the placement spec.
Signed-off-by: John Mulligan <jmulligan@redhat.com>