Ramana Raja [Thu, 22 Sep 2022 15:41:50 +0000 (11:41 -0400)]
src/mds: increment directory inode's change attr by one
... whenever the mtime or ctime of the directory inode is modified.
In CephFS subvolume clones exported using NFS-Ganesha, newly created
files using `touch` were not being listed. It was identified that the
create request sent to the Ceph MDS via NFS-Ganesha's libcephfs client
modified the mtime and ctime of the parent directory, but did not modify
the change_attr of the parent directory. Since the NFS client
didn't see a modification of the change attribute in the reply, it
didn't invalidate its readdir cache. The subsequent directory `ls` was
satisfied from the NFS client's stale readdir cache.
Whenever parent directory inode's mtime was modified in
MDCache::predirty_journal_parents(), the parent inode's change_attr
was set to its dirstat->change_attr. The parent inode's
dirstat->change_attr doesn't track *ctime only* changes to the parent
directory, such as setattr, setxattr, etc.
See commit 0d441dcd6af553d11d6be6df56d577c5659904a0 for more
details. This caused the directory inode's change_attr to not be updated
when an operation to change only its ctime was followed by an operation
to change its mtime and ctime.
Fix this by making changes to MDCache::predirty_journal_parents() and
CInode::finish_scatter_gather_update() to increment the directory
inode's change_attr by one instead of setting it to its
dirstat->change_attr.
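
The failure sequence described above can be modelled with a minimal sketch. Everything here is a simplified, hypothetical stand-in (the `DirInode` struct and the helper functions are not the MDS code); it assumes that a ctime-only operation bumps the inode's change_attr but not the dirstat counter, as the message describes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the directory inode fields
// discussed in the commit message. Not the real CInode layout.
struct DirInode {
  uint64_t change_attr = 0;          // value the NFS client compares
  uint64_t dirstat_change_attr = 0;  // advances only on mtime-changing ops
};

// ctime-only operation (e.g. setattr/setxattr on the directory itself).
inline void ctime_only_op(DirInode& in) {
  ++in.change_attr;
}

// Old behaviour: an mtime-changing op copies the dirstat counter.
inline void mtime_op_old(DirInode& in) {
  ++in.dirstat_change_attr;
  in.change_attr = in.dirstat_change_attr;
}

// New behaviour: an mtime-changing op increments change_attr by one.
inline void mtime_op_new(DirInode& in) {
  ++in.dirstat_change_attr;
  ++in.change_attr;
}

// True if a client that cached change_attr after the ctime-only op
// would notice the following mtime-changing op (e.g. a file create).
inline bool create_is_visible(void (*mtime_op)(DirInode&)) {
  DirInode dir;
  ctime_only_op(dir);
  const uint64_t seen_by_client = dir.change_attr;
  mtime_op(dir);
  return dir.change_attr != seen_by_client;
}
```

In this model `create_is_visible(mtime_op_old)` is false, because the copied dirstat value equals the value the client already saw, while `create_is_visible(mtime_op_new)` is true.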
Fixes: https://tracker.ceph.com/issues/57210
Signed-off-by: Ramana Raja <rraja@redhat.com>
Alternate operations that only change directory's ctime
(setattr/setxattr/removexattr on directory) with those that change
directory's mtime and ctime (create/rename/remove a file within
directory). Check that directory's change_attr is updated every time
ctime changes.
Ken Dreyer [Fri, 30 Sep 2022 20:56:35 +0000 (16:56 -0400)]
win32: speed up and simplify deps cloning
Use --depth 1 for all the dependencies we clone to speed up the process.
Use the --branch argument for cloning all dependencies. This simplifies
the process and makes it easier to use other copies in an offline
environment where github.com is inaccessible.
Stefan Chivu [Tue, 4 Oct 2022 14:02:14 +0000 (14:02 +0000)]
rbd: Removed device_name argument from wnbd unmap
Right now, rbd-wnbd doesn't actually use disk path
identifiers such as "/dev/*" or "\\.\PhysicalDrive*".
So instead of accepting two arguments that are basically
handled more or less the same, we're dropping the device_name
argument and sticking to the image spec.
Signed-off-by: Stefan Chivu <schivu@cloudbasesolutions.com>
Kefu Chai [Sat, 1 Oct 2022 10:03:30 +0000 (18:03 +0800)]
crimson/osd: drop redundant code
this change is a cleanup.
we already update `superblock` with the latest `boot_epoch` and
osdmap's epoch in `OSD::handle_osd_map()`, and `committed_osd_maps()` is
called at the end of this function. `shutdown()` is called when we are marked
down and stopped by the monitor in `committed_osd_maps()`. so these
assignment statements are no-ops. instead, we should stop the whole osd
service. let's leave it for another commit.
Kefu Chai [Sat, 1 Oct 2022 09:43:00 +0000 (17:43 +0800)]
crimson/osd: use abort_source in stop_signal
before this change, `stop_signal::wait()` waits until it receives
`SIGTERM` or `SIGINT`, but we also need to stop the service per the
request of monitor or when a serious health condition is detected.
so, an `abort_source` should allow the server to request abort by
itself. also, as the single source of truth for stopping, `stop_signal` will be
able to send the message to its subscribers to abort any "blocking"
calls which might prevent or delay the stop process.
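
A minimal sketch of the idea, written in plain standard C++ rather than against seastar. The class `StopSignalSketch` and its methods are hypothetical and only model the behaviour described above: one object that can be told to stop either by the signal path or by the server itself, and that notifies subscribers exactly once.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, single-threaded model. This is NOT seastar's
// abort_source API and not the vendored stop_signal class.
class StopSignalSketch {
public:
  // Register a callback to run once when a stop is requested.
  // If a stop was already requested, run it immediately.
  void subscribe(std::function<void()> cb) {
    if (stopping_) {
      cb();
      return;
    }
    subscribers_.push_back(std::move(cb));
  }

  // Called by the code path reacting to SIGINT/SIGTERM, or by the
  // server itself (e.g. when the monitor marks it down).
  void request_stop() {
    if (stopping_) {
      return;  // repeated requests are ignored
    }
    stopping_ = true;
    auto pending = std::move(subscribers_);
    subscribers_.clear();
    for (auto& cb : pending) {
      cb();
    }
  }

  bool stopping() const { return stopping_; }

private:
  bool stopping_ = false;
  std::vector<std::function<void()>> subscribers_;
};
```

A caller would subscribe a callback that cancels its own "blocking" operation, and any component holding a reference to the object could call `request_stop()` without sending a signal to the process.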
Kefu Chai [Sat, 1 Oct 2022 08:30:43 +0000 (16:30 +0800)]
crimson/osd: vendor stop_signal.h
stop_signal.h is copied from seastar/apps/lib/stop_signal.hh. this
change copies it into our project so we can customize it in a following
commit. we will need to add an `abort_source` member to it.
the class `stop_signal` works great for us. but under some
circumstances, we also need to stop the crimson server programmatically
from the server itself, for instance, per the request of monitor. in that
case, `stop_signal` cannot fulfill our needs, as the only thing which
can stop it is SIGINT or SIGTERM. we could send SIGINT to ourselves to
unblock `stop_signal::wait()`. but it would be better if we could leverage
`abort_source` for this purpose, for two reasons:
* `abort_source` allows us to cancel a "blocking" op.
we have a couple background tasks in crimson, like `AdminSocket`'s
accept and handle task, which could have been stopped by an
`abort_source`. but now it checks for the `stop_gate` before accepting
an incoming connection. this pattern cannot always be repeated,
because we cannot *abort* a "blocking" task. for instance, we use
a homebrew `tri_mutex` to protect the read/write consistency of
an obc, what if the server would need to stop when a request is
still trying to acquire the lock? with the help of the `abort_source`,
we should be able to subscribe to the `abort_source`,
and set an abort exception for each of the waiters. the same applies
to other "blocking" calls like `shared_promise::get_shared_future()`,
this allows us to abort the waiter before the promise is resolved.
this could be handy if we need to abort an op which
might potentially take long and should be aborted when monitor wants to
kill the osd instance.
* `abort_source` allows us to abort the services programmatically.
this is the use case mentioned in the beginning of this commit
message.
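
The first point, aborting waiters that are queued on a lock, can be sketched as follows. This is a hypothetical callback-based model in standard C++; `AbortableLockSketch` and `WaitResult` are illustrative names and do not reflect the real `tri_mutex` or seastar's `abort_source` interfaces.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

enum class WaitResult { acquired, aborted };

// Hypothetical exclusive lock whose queued waiters can be aborted.
class AbortableLockSketch {
public:
  using Waiter = std::function<void(WaitResult)>;

  // Acquire the lock, or queue the waiter if it is already held.
  void lock(Waiter w) {
    if (aborted_) {
      w(WaitResult::aborted);
      return;
    }
    if (!held_) {
      held_ = true;
      w(WaitResult::acquired);
      return;
    }
    waiters_.push_back(std::move(w));
  }

  // Release the lock; ownership passes directly to the next waiter.
  void unlock() {
    if (waiters_.empty()) {
      held_ = false;
      return;
    }
    Waiter next = std::move(waiters_.front());
    waiters_.pop_front();
    next(WaitResult::acquired);
  }

  // What a stop subscription would call: fail every queued waiter
  // instead of leaving it blocked until the holder unlocks.
  void abort_waiters() {
    aborted_ = true;
    auto pending = std::move(waiters_);
    waiters_.clear();
    for (auto& w : pending) {
      w(WaitResult::aborted);
    }
  }

private:
  bool held_ = false;
  bool aborted_ = false;
  std::deque<Waiter> waiters_;
};
```

In a real server the `abort_waiters()` call would be wired to the stop notification, so a request that is still trying to acquire the lock is completed with an abort error rather than delaying shutdown.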
This PR rewrites the front matter in the "Erasure Code"
section of the RADOS documentation. Previously, the information
in this section was syntactically confused. I have also fleshed
out the distinction between erasure coding and replication.
Josh Durgin [Fri, 11 Mar 2022 20:17:54 +0000 (15:17 -0500)]
doc/governance: update based on review and CLT discussions
Clarified some parts of council that were discussed previously,
specifying the number of members and a staggered term.
Added a bit more about the steering committee - thinking the meetings
could be split between tactical (3/4 weeks) and strategic (monthly),
and still open to anyone to join the discussion, but restricted to
only members voting.
Removed the meeting section since that belongs more in a separate
place, like the ceph.io website.
Josh Durgin [Fri, 15 Oct 2021 15:36:07 +0000 (11:36 -0400)]
doc/governance: add proposed structure
This is Sage's summary of the Ceph leadership team discussions around
this topic. Still many details to be worked out, this is just one
concrete proposal as a basis for further discussion.
crimson/os/seastore: use device_off_t for offset at seastore level
* Replace the rest of seastore_off_t by the extended device_off_t.
* Extend offset from 32-bit to 56-bit signed integer at seastore level.
* res_paddr_t to embed device_off_t.
* blk_paddr_t to use signed device_off_t for consistency.
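
To illustrate what a 56-bit signed offset implies, here is a minimal sketch that packs such an offset into a 64-bit word and sign-extends it on the way out. The layout is an assumption for illustration only (a hypothetical 8-bit tag in the top byte, the offset in the low 56 bits); the names `pack`, `unpack_offset` and `unpack_tag` are not seastore functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for illustration: [ 8-bit tag | 56-bit signed offset ].
constexpr unsigned OFFSET_BITS = 56;
constexpr uint64_t OFFSET_MASK = (uint64_t{1} << OFFSET_BITS) - 1;
constexpr int64_t OFFSET_MAX = (int64_t{1} << (OFFSET_BITS - 1)) - 1;
constexpr int64_t OFFSET_MIN = -(int64_t{1} << (OFFSET_BITS - 1));

// Store the tag in the top byte and the offset's low 56 bits below it.
// The caller is expected to keep off within [OFFSET_MIN, OFFSET_MAX].
constexpr uint64_t pack(uint8_t tag, int64_t off) {
  return (uint64_t{tag} << OFFSET_BITS) |
         (static_cast<uint64_t>(off) & OFFSET_MASK);
}

// Recover the offset, sign-extending from bit 55 to a full int64_t.
constexpr int64_t unpack_offset(uint64_t word) {
  constexpr uint64_t sign = uint64_t{1} << (OFFSET_BITS - 1);
  const uint64_t raw = word & OFFSET_MASK;
  return static_cast<int64_t>((raw ^ sign) - sign);
}

constexpr uint8_t unpack_tag(uint64_t word) {
  return static_cast<uint8_t>(word >> OFFSET_BITS);
}
```

With 56 bits the representable range is -2^55 to 2^55 - 1, so a negative offset such as -1 survives a round trip through `pack` and `unpack_offset` without disturbing the tag byte.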