From 6ae82fe2f737b0001a7f4f903c44909a5dc66da1 Mon Sep 17 00:00:00 2001 From: Anthony D'Atri Date: Sun, 21 May 2023 08:39:56 -0400 Subject: [PATCH] doc/dev/crimson: Improve crimson.rst Signed-off-by: Anthony D'Atri (cherry picked from commit 524901d3cfbf73cc86284a2623ada493836c13ec) --- doc/dev/crimson/crimson.rst | 214 ++++++++++++++++++++---------------- 1 file changed, 118 insertions(+), 96 deletions(-) diff --git a/doc/dev/crimson/crimson.rst b/doc/dev/crimson/crimson.rst index 9695019b8099d..d2a6a00e17df5 100644 --- a/doc/dev/crimson/crimson.rst +++ b/doc/dev/crimson/crimson.rst @@ -2,50 +2,50 @@ crimson ======= -Crimson is the code name of crimson-osd, which is the next generation ceph-osd. -It targets fast networking devices, fast storage devices by leveraging state of -the art technologies like DPDK and SPDK, for better performance. And it will -keep the support of HDDs and low-end SSDs via BlueStore. Crimson will try to -be backward compatible with classic OSD. +Crimson is the code name of ``crimson-osd``, which is the next +generation ``ceph-osd``. It improves performance when using fast network +and storage devices, employing state-of-the-art technologies including +DPDK and SPDK. BlueStore continues to support HDDs and slower SSDs. +Crimson aims to be backward compatible with the classic ``ceph-osd``. .. highlight:: console Building Crimson ================ -Crimson is not enabled by default. To enable it:: +Crimson is not enabled by default. Enable it at build time by running:: $ WITH_SEASTAR=true ./install-deps.sh $ mkdir build && cd build $ cmake -DWITH_SEASTAR=ON .. -Please note, `ASan`_ is enabled by default if crimson is built from a source -cloned using git. +Please note, `ASan`_ is enabled by default if Crimson is built from a source +cloned using ``git``. .. _ASan: https://github.com/google/sanitizers/wiki/AddressSanitizer Testing crimson with cephadm =============================== -The Ceph CI/CD pipeline includes ceph container builds with -crimson-osd subsitituted for ceph-osd. +The Ceph CI/CD pipeline builds containers with +``crimson-osd`` subsitituted for ``ceph-osd``. Once a branch at commit has been built and is available in -shaman, you can deploy it using the cephadm instructions outlined +``shaman``, you can deploy it using the cephadm instructions outlined in :ref:`cephadm` with the following adaptations. -First, while performing the initial bootstrap, use the --image flag to -use a crimson build rather than a default build: +First, while performing the initial bootstrap, use the ``--image`` flag to +use a Crimson build: .. prompt:: bash # cephadm --image quay.ceph.io/ceph-ci/ceph:-crimson --allow-mismatched-release bootstrap ... -You'll likely need to include the --allow-mismatched-release flag to +You'll likely need to supply the ``--allow-mismatched-release`` flag to use a non-release branch. -Additionally, prior to deploying the osds, you'll need enable crimson -and default pools to be created as crimson pools (from cephadm shell): +Additionally, prior to deploying OSDs, you'll need enable Crimson to +direct the default pools to be created as Crimson pools. From the cephadm shell run: .. prompt:: bash # @@ -53,39 +53,39 @@ and default pools to be created as crimson pools (from cephadm shell): ceph osd set-allow-crimson --yes-i-really-mean-it ceph config set mon osd_pool_default_crimson true -The first command enables the crimson experimental feature. Crimson -is highly experimental, and malfunctions up to and including crashes +The first command enables the ``crimson`` experimental feature. Crimson +is highly experimental, and malfunctions including crashes and data loss are to be expected. -The second enables the allow_crimson OSDMap flag. The monitor will -not allow crimson-osd to boot without that flag. +The second enables the ``allow_crimson`` OSDMap flag. The monitor will +not allow ``crimson-osd`` to boot without that flag. -The last causes pools to be created by default with the crimson flag. -crimson pools are restricted to operations supported by crimson. -crimson-osd won't instantiate pgs from non-crimson pools. +The last causes pools to be created by default with the ``crimson`` flag. +Crimson pools are restricted to operations supported by Crimson. +``Crimson-osd`` won't instantiate PGs from non-Crimson pools. Running Crimson =============== -As you might expect, crimson is not featurewise on par with its predecessor yet. +As you might expect, Crimson does not yet have as extensive a feature set as does ``ceph-osd``. object store backend -------------------- At the moment, ``crimson-osd`` offers both native and alienized object store -backends. The native object store backends perform IO using seastar reactor. +backends. The native object store backends perform IO using the SeaStar reactor. They are: .. describe:: cyanstore - CyanStore is modeled after memstore in classic OSD. + CyanStore is modeled after memstore in the classic OSD. .. describe:: seastore Seastore is still under active development. -While the alienized object store backends are backed by a thread pool, which -is a proxy of the alien store adaptor running in SeaStar. The proxy issues +The alienized object store backends are backed by a thread pool, which +is a proxy of the alienstore adaptor running in Seastar. The proxy issues requests to object stores running in alien threads, i.e., worker threads not managed by the Seastar framework. They are: @@ -95,36 +95,34 @@ managed by the Seastar framework. They are: .. describe:: bluestore - The object store used by classic OSD by default. + The object store used by the classic ``ceph-osd`` daemonize --------- Unlike ``ceph-osd``, ``crimson-osd`` does not daemonize itself even if the -``daemonize`` option is enabled. Because, to read this option, ``crimson-osd`` +``daemonize`` option is enabled. In order to read this option, ``crimson-osd`` needs to ready its config sharded service, but this sharded service lives -in the seastar reactor. If we fork a child process and exit the parent after +in the Seastar reactor. If we fork a child process and exit the parent after starting the Seastar engine, that will leave us with a single thread which is -the replica of the thread calls `fork()`_. This would unnecessarily complicate -the code, if we would have tackled this problem in crimson. - -Since a lot of GNU/Linux distros are using systemd nowadays, which is able to -daemonize the application, there is no need to daemonize by ourselves. For -those who are using sysvinit, they can use ``start-stop-daemon`` for daemonizing -``crimson-osd``. If this is not acceptable, we can whip up a helper utility -to do the trick. +a replica of the thread that called `fork()`_. Tackling this problem in Crimson +would unnecessarily complicate the code. +Since supported GNU/Linux distributions use ``systemd``, which is able to +daemonize the application, there is no need to daemonize ourselves. +Those using sysvinit can use ``start-stop-daemon`` to daemonize ``crimson-osd``. +If this is does not work out, a helper utility may be devised. .. _fork(): http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html logging ------- -Currently, ``crimson-osd`` uses the logging utility offered by Seastar. see -``src/common/dout.h`` for the mapping between different logging levels to -the severity levels in Seastar. For instance, the messages sent to ``derr`` -will be printed using ``logger::error()``, and the messages with debug level -over ``20`` will be printed using ``logger::trace()``. +``Crimson-osd`` currently uses the logging utility offered by Seastar. See +``src/common/dout.h`` for the mapping between Ceph logging levels to +the severity levels in Seastar. For instance, messages sent to ``derr`` +will be issued using ``logger::error()``, and the messages with a debug level +greater than ``20`` will be issued using ``logger::trace()``. +---------+---------+ | ceph | seastar | @@ -140,18 +138,17 @@ over ``20`` will be printed using ``logger::trace()``. | > 20 | trace | +---------+---------+ -Please note, ``crimson-osd`` -does not send the logging message to specified ``log_file``. It writes -the logging messages to stdout and/or syslog. Again, this behavior can be +Note that ``crimson-osd`` +does not send log messages directly to a specified ``log_file``. It writes +the logging messages to stdout and/or syslog. This behavior can be changed using ``--log-to-stdout`` and ``--log-to-syslog`` command line -options. By default, ``log-to-stdout`` is enabled, and the latter disabled. +options. By default, ``log-to-stdout`` is enabled, and ``--log-to-syslog`` is disabled. vstart.sh --------- -To facilitate the development of crimson, following options would be handy when -using ``vstart.sh``, +The following options aree handy when using ``vstart.sh``, ``--crimson`` start ``crimson-osd`` instead of ``ceph-osd`` @@ -160,71 +157,96 @@ using ``vstart.sh``, do not daemonize the service ``--redirect-output`` - redirect the stdout and stderr of service to ``out/$type.$num.stdout``. + Redirect the ``stdout`` and ``stderr`` to ``out/$type.$num.stdout``. ``--osd-args`` - pass extra command line options to crimson-osd or ceph-osd. It's quite - useful for passing Seastar options to crimson-osd. For instance, you could - use ``--osd-args "--memory 2G"`` to set the memory to use. Please refer - the output of:: + Pass extra command line options to ``crimson-osd`` or ``ceph-osd``. + This is useful for passing Seastar options to ``crimson-osd``. For + example, one can supply ``--osd-args "--memory 2G"`` to set the amount of + memory to use. Please refer to the output of:: crimson-osd --help-seastar - for more Seastar specific command line options. + for additional Seastar-specific command line options. ``--cyanstore`` use CyanStore as the object store backend. ``--bluestore`` - use the alienized BlueStore as the object store backend. This is the default - setting, if not specified otherwise. + Use the alienized BlueStore as the object store backend. This is the default. ``--memstore`` - use the alienized MemStore as the object store backend. + Use the alienized MemStore as the object store backend. + +``--seastore`` + Use SeaStore as the back end object store. + +``--seastore-devs`` + Specify the block device used by SeaStore. + +``--seastore-secondary-devs`` + Optional. SeaStore supports multiple devices. Enable this feature by + passing the block device to this option. + +``--seastore-secondary-devs-type`` + Optional. Specify the type of secondary devices. When the secondary + device is slower than main device passed to ``--seastore-devs``, the cold + data in faster device will be evicted to the slower devices over time. + Valid types include ``HDD``, ``SSD``(default), ``ZNS``, and ``RANDOM_BLOCK_SSD`` + Note secondary devices should not be faster than the main device. ``--seastore`` - use SeaStore as the object store backend. + Use SeaStore as the object store backend. -So, a typical command to start a single-crimson-node cluster is:: +To start a cluster with a single Crimson node, run:: $ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \ --without-dashboard --cyanstore \ --crimson --redirect-output \ --osd-args "--memory 4G" -Where we assign 4 GiB memory, a single thread running on core-0 to crimson-osd. +Here we assign 4 GiB memory and a single thread running on core-0 to ``crimson-osd``. + +Another SeaStore example:: + + $ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \ + --without-dashboard --seastore \ + --crimson --redirect-output \ + --seastore-devs /dev/sda \ + --seastore-secondary-devs /dev/sdb \ + --seastore-secondary-devs-type HDD -You could stop the vstart cluster using:: +Stop this ``vstart`` cluster by running:: $ ../src/stop.sh --crimson Metrics and Tracing =================== -Crimson offers three ways to report the stats and metrics: +Crimson offers three ways to report stats and metrics. pg stats reported to mgr ------------------------ Crimson collects the per-pg, per-pool, and per-osd stats in a `MPGStats` -message, and send it over to mgr, so that the mgr modules can query +message which is sent to the Ceph Managers. Manager modules can query them using the `MgrModule.get()` method. asock command ------------- -an asock command is offered for dumping the metrics:: +An admin socket command is offered for dumping metrics:: $ ceph tell osd.0 dump_metrics $ ceph tell osd.0 dump_metrics reactor_utilization -Where `reactor_utilization` is an optional string allowing us to filter +Here `reactor_utilization` is an optional string allowing us to filter the dumped metrics by prefix. Prometheus text protocol ------------------------ -the listening port and address can be configured using the command line options of +The listening port and address can be configured using the command line options of `--prometheus_port` see `Prometheus`_ for more details. @@ -237,11 +259,11 @@ fio --- ``crimson-store-nbd`` exposes configurable ``FuturizedStore`` internals as an -NBD server for use with fio. +NBD server for use with ``fio``. -To use fio to test ``crimson-store-nbd``, +In order to use ``fio`` to test ``crimson-store-nbd``, perform the below steps. -#. You will need to install ``libnbd``, and compile fio like +#. You will need to install ``libnbd``, and compile it into ``fio`` .. prompt:: bash $ @@ -258,8 +280,8 @@ To use fio to test ``crimson-store-nbd``, cd build ninja crimson-store-nbd -#. Run the ``crimson-store-nbd`` server with a block device. Please specify - the path to the raw device, like ``/dev/nvme1n1`` in place of the created +#. Run the ``crimson-store-nbd`` server with a block device. Specify + the path to the raw device, for example ``/dev/nvme1n1``, in place of the created file for testing with a block device. .. prompt:: bash $ @@ -275,16 +297,16 @@ To use fio to test ``crimson-store-nbd``, --type transaction_manager \ --uds-path ${unix_socket} & - in which, + Below are descriptions of these command line arguments: ``--smp`` - how many CPU cores are used + The number of CPU cores to use (Symmetric MultiProcessor) ``--mkfs`` - initialize the device first + Initialize the device first. ``--type`` - which backend to use. If ``transaction_manager`` is specified, SeaStore's + The back end to use. If ``transaction_manager`` is specified, SeaStore's ``TransactionManager`` and ``BlockSegmentManager`` are used to emulate a block device. Otherwise, this option is used to choose a backend of ``FuturizedStore``, where the whole "device" is divided into multiple @@ -294,7 +316,7 @@ To use fio to test ``crimson-store-nbd``, without the object store semantics, ``transaction_manager`` would be a better choice. -#. Create an fio job file named ``nbd.fio`` +#. Create a ``fio`` job file named ``nbd.fio`` .. code:: ini @@ -311,7 +333,7 @@ To use fio to test ``crimson-store-nbd``, [job0] offset=0 -#. Test the crimson object store using the fio compiled just now +#. Test the Crimson object store, using the custom ``fio`` built just now .. prompt:: bash $ @@ -319,9 +341,9 @@ To use fio to test ``crimson-store-nbd``, CBT --- -We can use `cbt`_ for performing perf tests:: +We can use `cbt`_ for performance tests:: - $ git checkout master + $ git checkout main $ make crimson-osd $ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/baseline ../src/test/crimson/cbt/radosbench_4K_read.yaml $ git checkout yet-another-pr @@ -346,9 +368,9 @@ We can use `cbt`_ for performing perf tests:: 19:48:23 - INFO - cbt - seq/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.0508262/0.0557337 => accepted 19:48:23 - WARNING - cbt - 1 tests failed out of 16 -Where we compile and run the same test against two branches. One is ``master``, another is ``yet-another-pr`` branch. -And then we compare the test results. Along with every test case, a set of rules is defined to check if we have -performance regressions when comparing two set of test results. If a possible regression is found, the rule and +Here we compile and run the same test against two branches: ``main`` and ``yet-another-pr``. +We then compare the results. Along with every test case, a set of rules is defined to check for +performance regressions when comparing the sets of test results. If a possible regression is found, the rule and corresponding test results are highlighted. .. _cbt: https://github.com/ceph/cbt @@ -383,7 +405,7 @@ The `tips`_ for debugging Scylla also apply to Crimson. Human-readable backtraces with addr2line ---------------------------------------- -When a seastar application crashes, it leaves us with a serial of addresses, like:: +When a Seastar application crashes, it leaves us with a backtrace of addresses, like:: Segmentation fault. Backtrace: @@ -402,10 +424,10 @@ When a seastar application crashes, it leaves us with a serial of addresses, lik 0x000000000d833ac9 Segmentation fault -``seastar-addr2line`` offered by Seastar can be used to decipher these -addresses. After running the script, it will be waiting for input from stdin, -so we need to copy and paste the above addresses, then send the EOF by inputting -``control-D`` in the terminal:: +The ``seastar-addr2line`` utility provided by Seastar can be used to map these +addresses to functions. The script expects input on ``stdin``, +so we need to copy and paste the above addresses, then send EOF by inputting +``control-D`` in the terminal. One might use ``echo`` or ``cat`` instead`:: $ ../src/seastar/scripts/seastar-addr2line -e bin/crimson-osd @@ -433,8 +455,8 @@ so we need to copy and paste the above addresses, then send the EOF by inputting seastar::app_template::run_deprecated(int, char**, std::function&&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/app-template.cc:173 (discriminator 5) main at /home/kefu/dev/ceph/build/../src/crimson/osd/main.cc:131 (discriminator 1) -Please note, ``seastar-addr2line`` is able to extract the addresses from -the input, so you can also paste the log messages like:: +Note that ``seastar-addr2line`` is able to extract addresses from +its input, so you can also paste the log messages as below:: 2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:Backtrace: 2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e78dbc @@ -443,11 +465,11 @@ the input, so you can also paste the log messages like:: 2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e985 2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: /lib64/libpthread.so.0+0x0000000000012dbf -Unlike classic OSD, crimson does not print a human-readable backtrace when it -handles fatal signals like `SIGSEGV` or `SIGABRT`. And it is more complicated -when it comes to a stripped binary. So before planting a signal handler for -those signals in crimson, we could to use `script/ceph-debug-docker.sh` to parse -the addresses in the backtrace:: +Unlike the classic ``ceph-osd``, Crimson does not print a human-readable backtrace when it +handles fatal signals like `SIGSEGV` or `SIGABRT`. It is also more complicated +with a stripped binary. So instead of planting a signal handler for +those signals into Crimson, we can use `script/ceph-debug-docker.sh` to map +addresses in the backtrace:: # assuming you are under the source tree of ceph $ ./src/script/ceph-debug-docker.sh --flavor crimson master:27e237c137c330ebb82627166927b7681b20d0aa centos:8 -- 2.39.5