Yingxin Cheng [Fri, 24 Dec 2021 08:12:13 +0000 (16:12 +0800)]
crimson/osd: --help-seastar no longer belongs to unknown_args
Now that app_template handles --help-seastar and prints all
app-level usage if --help is included in the command-line options, there is
no need to add a dedicated --help-seastar ourselves and translate it
to --help.
Adam Kupczyk [Wed, 15 Dec 2021 09:59:55 +0000 (09:59 +0000)]
os/bluestore/bluefs: Add tracking of bluefs log in noop replay mode
Keep updating the bluefs log when printing the content of the bluefs replay log.
Without this modification we only have the initial content of the log.
The log can be printed with 'ceph-bluestore-tool bluefs-log-dump'.
Adam Kupczyk [Wed, 24 Nov 2021 17:55:05 +0000 (18:55 +0100)]
os/bluestore/bluefs: Sync BlueFS log with its allocation delta
The BlueFS log is the only file that we can append to.
When we append to a file we must take previously committed allocations into consideration,
otherwise the update will be miscalculated.
Adam Kupczyk [Wed, 24 Nov 2021 17:52:35 +0000 (18:52 +0100)]
test/objectstore/bluefs_test: Add test for continuation of previous BlueFS log
Added a test that verifies that in update mode we properly pick up the delta.
The BlueFS log is the only file that can be appended to, but this is done in a very indirect way.
Aashish Sharma [Mon, 8 Nov 2021 07:31:02 +0000 (13:01 +0530)]
mgr/dashboard: Cluster Expansion - Review Section: fixes and improvements
Ensure "Storage capacity" keeps the "Description : Value" approach ("Number of devices: X" and "Raw Capacity: Y" on separate lines).
Correct the issue with the "host by services" host count.
gal salomon [Fri, 7 May 2021 21:29:13 +0000 (00:29 +0300)]
RGW: Implement continuation, progress, stats, end s3select response
RGW/S3select: Implement output-serialization. The user may request different CSV definitions
for output (field delimiter, row delimiter, quote handling).
RGW/S3select: Implement presto-alignments. The presto application sends
queries with a table alias, case-insensitive, and with no semicolon at the
end of the statement.
Xuehan Xu [Fri, 17 Dec 2021 05:20:35 +0000 (13:20 +0800)]
crimson/os/seastore: reset onode in 'SeaStore::repeat_with_onode' before the transaction gets destroyed
Onodes hold references to the onode tree extents. If an onode references the root extent, that root
extent is cached in the onode tree's root_tracker, which caches onode tree roots by transaction address.
That root_tracker entry only gets removed when the onode (or the corresponding "super") is destroyed.
On the other hand, two non-concurrent transactions can occupy the same address. So if an onode gets destroyed
after its transaction is destroyed, there is a chance that another transaction occupying the same
address gets that not-yet-destroyed and possibly outdated onode.
BTW, since we already cache extents in transactions, we might want to drop the onode tree root_tracker later?
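The address-reuse hazard described above can be sketched with a toy cache. This is only an illustration: `RootTracker`, its methods, and the integer `ADDR` are hypothetical stand-ins, not the SeaStore types.

```python
class RootTracker:
    """Toy cache keyed by a transaction's address (a plain int here)."""

    def __init__(self):
        self._roots = {}

    def track(self, txn_addr, root):
        self._roots[txn_addr] = root

    def untrack(self, txn_addr):
        self._roots.pop(txn_addr, None)

    def lookup(self, txn_addr):
        return self._roots.get(txn_addr)


tracker = RootTracker()
ADDR = 0x1000  # hypothetical address reused by two non-concurrent transactions

# Problem case: transaction 1 is gone, but its onode still keeps the entry alive.
tracker.track(ADDR, 'root@t1')
stale = tracker.lookup(ADDR)  # transaction 2 at the same address sees t1's root

# Fixed order: reset the onode (dropping the entry) before the transaction dies.
tracker.untrack(ADDR)
clean = tracker.lookup(ADDR)
```

In this model the second lookup returns nothing only because the entry was dropped before the address could be reused, which is the ordering the commit enforces.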
Sage Weil [Fri, 17 Dec 2021 04:54:25 +0000 (23:54 -0500)]
Merge PR #44228 into master
* refs/pull/44228/head:
qa/suites/orch/cephadm/osds: test 'ceph cephadm osd activate'
mgr/cephadm/services/osd: skip found osds that already have daemons
mgr/cephadm: allow activation of OSDs that have previously started
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sage Weil [Mon, 6 Dec 2021 15:19:16 +0000 (10:19 -0500)]
mgr/cephadm: allow activation of OSDs that have previously started
When this code was introduced way back in ea987a0e56db106f7c76d11f86b3e602257f365e,
for some reason I was focused only on freshly created OSDs. The
get_osd_uuid_map() helper is used by deploy_osd_daemons_for_existing_osds()
which is called not only by OSD creation but also by 'ceph cephadm
osd activate', which is meant to instantiate daemons for existing OSD
devices (e.g., devices that were reattached to a new server, or whose
/var/lib/ceph/$fsid/osd.$id directory was lost for some other reason).
However, if we ignore OSDs with up_from > 0, then we can't recreate a
daemon instance for such existing OSDs--arguably the most important ones,
since they may hold real data.
Fixes: https://tracker.ceph.com/issues/53491
Signed-off-by: Sage Weil <sage@newdream.net>
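A minimal sketch of the filtering issue, assuming a simplified OSD map given as a list of dicts. The function `osd_uuid_map` and its `skip_started` parameter are hypothetical and are not the cephadm helper's real signature.

```python
def osd_uuid_map(osds, skip_started=False):
    """Map OSD id -> uuid, optionally dropping OSDs that have started before."""
    result = {}
    for o in osds:
        # An OSD with up_from > 0 has been up at least once.
        if skip_started and o['up_from'] > 0:
            continue
        result[str(o['osd'])] = o.get('uuid', '')
    return result


osds = [
    {'osd': 0, 'uuid': 'aaaa', 'up_from': 0},   # freshly created
    {'osd': 1, 'uuid': 'bbbb', 'up_from': 42},  # has run before, may hold data
]
fresh_only = osd_uuid_map(osds, skip_started=True)
all_osds = osd_uuid_map(osds)
```

With the filter on, osd.1 is invisible to activation even though it is the one that may hold real data; without it, both OSDs are candidates.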
Sage Weil [Thu, 16 Dec 2021 15:24:46 +0000 (10:24 -0500)]
mon: prevent new sessions during shutdown
From shutdown() we set STATE_SHUTDOWN and then call remove_all_sessions().
ms_handle_accept() is the only caller of add_session, so verifying that
we aren't shutting down (while under the session_map_lock) is sufficient
to prevent any new sessions from being added.
Fixes: https://tracker.ceph.com/issues/39150
Signed-off-by: Sage Weil <sage@newdream.net>
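The idea of checking the shutdown state under the session lock can be sketched as follows. This is a simplified model, not the monitor code: the `SessionMap` class is hypothetical, and it sets the flag and clears the sessions under one lock for brevity.

```python
import threading


class SessionMap:
    """Toy model: one lock guards both the sessions and the shutdown flag."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = []
        self._shutting_down = False

    def add_session(self, name):
        # The flag is checked under the same lock that shutdown() takes,
        # so no session can be added once shutdown has cleared the list.
        with self._lock:
            if self._shutting_down:
                return False
            self._sessions.append(name)
            return True

    def shutdown(self):
        with self._lock:
            self._shutting_down = True
            self._sessions.clear()

    def count(self):
        with self._lock:
            return len(self._sessions)


m = SessionMap()
accepted_before = m.add_session('client.1')
m.shutdown()
accepted_after = m.add_session('client.2')
```

Because there is a single entry point for adding sessions, guarding that one path is enough in this model, mirroring the argument in the commit message.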
Ronen Friedman [Thu, 16 Dec 2021 10:49:57 +0000 (10:49 +0000)]
crimson/osd: removing an unneeded make_unique()
As the desired lifetime of the object matches the lifetime it would have if
allocated on the stack, and as no ownership is transferred,
there is no point in using a unique_ptr here.
See also Google's guidance (https://abseil.io/tips/187),
under "Common Anti-Pattern: Avoiding &".
Modify the bluefs-import command so it can properly initialize allocators.
Without allocators initialized, importing a file to bluefs overwrote some random data,
including the first block on the device.
Paul Cuzner [Fri, 12 Nov 2021 03:16:59 +0000 (16:16 +1300)]
mgr/cephadm: Add snmp-gateway service support
Add a new snmp-gateway service to provide a bridge between
Prometheus and an SNMP management platform. The gateway
service uses https://github.com/maxwo/snmp_notifier to provide
SNMP V2c and SNMP V3 support.
The SNMP V3 support mandates at least authentication, and also
offers authentication and privacy (encryption).
Fixes: https://tracker.ceph.com/issues/52920
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Yaarit Hatuka [Wed, 15 Dec 2021 22:45:57 +0000 (22:45 +0000)]
mgr/telemetry: catch also IndexError in gather_device_report()
When generating the device report, we obfuscate host names, which
exist in a device's 'location' key. Some devices do not have a
'location' key, thus we catch a KeyError in these cases; in other cases
the key exists, but its value is an empty list. Skip these too by
catching an IndexError.
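A minimal sketch of the pattern, assuming a simplified device record in which 'location' is a list of dicts with a 'host' entry. The helper `first_host` is hypothetical and is not the telemetry module's code.

```python
def first_host(device):
    """Return the host name from a device's 'location' list, or None."""
    try:
        return device['location'][0]['host']
    except (KeyError, IndexError):
        # KeyError: the device has no 'location' key.
        # IndexError: 'location' exists but is an empty list.
        return None


devices = [
    {'devid': 'a', 'location': [{'host': 'node1'}]},
    {'devid': 'b'},
    {'devid': 'c', 'location': []},
]
hosts = [first_host(d) for d in devices]
```

Devices 'b' and 'c' are both skipped here, one per exception type.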
Igor Fedotov [Tue, 2 Nov 2021 12:03:39 +0000 (15:03 +0300)]
os/bluestore: avoid premature onode release.
This was observed when an onode's removal is followed by a read,
and the latter causes the object's release before the removal is finalized.
The root cause is an improper 'pinned' state assessment in Onode::get().
A more detailed overview:
At some point Onode::get() might face the case where nref == 2 and pinned = true,
which means a parallel, incomplete put is running on the onode: the ref count has
been decremented but the pinned state is still unmodified (and the lock hasn't even
been acquired yet).
This might result in two puts racing over the same onode with nref == 2,
which in turn results in a premature onode release:
// nref =3, pinned = 1
// Thread 1 Thread 2
// o->put() o->get()
// --nref(n = 2, pinned=1)
// nref++ (n=3, pinned = 1)
// return
// ...
// o->put()
// --nref(n = 2)
// pinned = 0,
// --nref(n = 1)
// ocs->_unpin_and_rm(o) -> o->put()
// ...
// --nref(n = 0)
// release o
// o->c->get_onode_cache()
// FAULT!
//
The suggested fix is to introduce an additional atomic counter that tracks
running put() calls, and to permit onode release only when both the regular
nref and put_nref are equal to zero.
Fixes: https://tracker.ceph.com/issues/53002
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
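The release condition can be illustrated with a single-threaded model in which the two halves of put() are interleaved by hand. This is a sketch under assumptions, not the BlueStore implementation: the `Onode` class below and its `put_begin`/`put_end` split are hypothetical, and plain integers stand in for the atomic counters.

```python
class Onode:
    """Toy onode: nref counts references, put_nref counts running put() calls."""

    def __init__(self, nref):
        self.nref = nref
        self.put_nref = 0
        self.released = False

    def put_begin(self):
        # First half of put(): announce the running put, then drop a reference.
        self.put_nref += 1
        self.nref -= 1

    def put_end(self):
        # Second half: release only when no references remain
        # and no other put() is still in flight.
        self.put_nref -= 1
        if self.nref == 0 and self.put_nref == 0:
            self.released = True


o = Onode(nref=2)
o.put_begin()  # thread 1 enters put(): nref 1, put_nref 1
o.put_begin()  # thread 2 enters put(): nref 0, put_nref 2
o.put_end()    # thread 2 leaves first: put_nref 1, so no release yet
released_early = o.released
o.put_end()    # thread 1 leaves: put_nref 0, the onode is released
```

In this model the put that leaves first cannot release the onode while the other is still running, so the later put never touches a freed object.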