$ ceph pg <pgid> mark_unfound_lost revert
Revert "lost" objects to their prior state: either roll them back to a
previous version or delete them if they were just created.
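For example, for a hypothetical placement group ``2.5`` that is reporting
unfound objects::

  $ ceph pg 2.5 mark_unfound_lost revert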
OSD subsystem
====================================
Kernel client troubleshooting (FS)
====================================
If there is an issue with the cephfs kernel client, the most important thing is
figuring out whether the problem is with the client or the MDS. Generally,
this is easy to work out. If the problem is in the kernel client itself,
there will be output in ``dmesg``. Collect it and any appropriate kernel
state. If
the problem is with the MDS, there will be hung requests that the client
is waiting on. Look in ``/sys/kernel/debug/ceph/*/`` and cat the ``mdsc`` file to
get a listing of requests in progress. If one of them remains there, the
MDS has probably "forgotten" it.
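As a concrete sketch of that check (assuming debugfs is mounted at its
usual location)::

  $ cat /sys/kernel/debug/ceph/*/mdsc

If the same request is still listed on repeated runs, it is most likely
stuck waiting on the MDS.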
We can get hints about what's going on by dumping the MDS cache.
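On recent releases one way to do this is through the MDS admin socket;
the daemon name (``a``) and the output path below are placeholders::

  $ ceph daemon mds.a dump cache /tmp/mds-cache.txt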
osd map epoch by having the monitor set its *up_thru* in the osd
map. This helps peering ignore previous *acting sets* for which
peering never completed after certain sequences of failures, such as
the second interval below:

- *acting set* = [A,B]
- *acting set* = [A]
- *acting set* = [] very shortly after (e.g., simultaneous failure, but staggered detection)
- *acting set* = [B] (B restarts, A does not)
*last epoch clean*
the last epoch at which all nodes in the *acting set* were completely up
to date (both PG logs and object contents).
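The *up_thru* value that the monitor records for each OSD (mentioned
above) can be seen in the osd map; the epochs and address in this
trimmed example are made up::

  $ ceph osd dump | grep '^osd'
  osd.0 up   in  weight 1 up_from 4 up_thru 9 down_at 0 last_clean_interval [1,3) ...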
.. _adjusting-crush:

=========================
 Adjusting the CRUSH map
=========================
There are a few ways to adjust the CRUSH map:
* online, by issuing commands to the monitor
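For example, online adjustments are made with the ``ceph osd crush``
family of commands; the bucket and device names below are placeholders,
and the exact set of subcommands varies somewhat between releases::

  $ ceph osd crush add-bucket rack1 rack
  $ ceph osd crush move rack1 root=default
  $ ceph osd crush reweight osd.123 1.0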
RADOS cluster. This allows one to identify whether any requests are blocked
by a non-responsive ceph-osd. For example, one might see::

  { "ops": [
        { "tid": 1858,
          "pg": "2.d2041a48",
          "osd": 1,
#. Allocate a new OSD id::

     $ ceph osd create
     123
#. Make sure ceph.conf is valid for the new OSD (see the sketch after this list).
#. Initialize osd data directory::

     $ ceph-osd -i 123 --mkfs --mkkey
#. Register the OSD authentication key::

     $ ceph auth add osd.123 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd-data/123/keyring
#. Adjust the CRUSH map to allocate data to the new device (see :ref:`adjusting-crush`).
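What a valid ``ceph.conf`` entry looks like depends on the deployment; as
a minimal sketch of the kind of stanza the ceph.conf step above refers to
(the hostname is made up, and the data path merely mirrors the keyring
path used in the key-registration step)::

  [osd.123]
          host = node24
          osd data = /var/lib/ceph/osd-data/$id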
#. To remove an OSD, first remove it from the CRUSH map::

     $ ceph osd crush remove osd.123
#. Remove it from the osd map::

     $ ceph osd rm 123
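If an authentication key was registered for the OSD (as in the procedure
above), you will usually also want to remove it; on recent releases::

  $ ceph auth del osd.123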
See also :ref:`failures-osd`.