Previously, errors stuck indelibly to the inode, which
meant that a close call would see an error even if the
user had already dutifully fsync()'d and handled it.
We should emit each error only once per file handle.
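A minimal Python sketch of "once per file handle" error reporting, assuming (hypothetically) that the inode keeps its last error plus a sequence number that is bumped on every new error, and that each handle remembers the sequence it last reported. The Inode and FileHandle classes are illustrative only and are not the Ceph client's actual types.

```python
import errno
from dataclasses import dataclass


@dataclass
class Inode:
    """Hypothetical inode that remembers its most recent writeback error."""
    err: int = 0       # last error code seen on this inode (0 means none)
    err_seq: int = 0   # bumped each time a new error is recorded

    def record_error(self, err: int) -> None:
        self.err = err
        self.err_seq += 1


class FileHandle:
    """Hypothetical open file that reports each inode error at most once."""

    def __init__(self, inode: Inode) -> None:
        self.inode = inode
        # Errors recorded before this handle was opened are not reported to it.
        self.seen_seq = inode.err_seq

    def _consume_error(self) -> int:
        """Return -err (errno style) if a new error arrived since the last check, else 0."""
        if self.inode.err_seq != self.seen_seq:
            self.seen_seq = self.inode.err_seq
            return -self.inode.err
        return 0

    def fsync(self) -> int:
        return self._consume_error()

    def close(self) -> int:
        return self._consume_error()


# Usage: the error surfaces on fsync() and is not repeated on close().
inode = Inode()
fh = FileHandle(inode)
inode.record_error(errno.EIO)
fsync_ret = fh.fsync()   # -errno.EIO
close_ret = fh.close()   # 0, already reported on this handle
```

Because each handle tracks its own sequence number, two handles open on the same inode each see the error once, which is the property the commit asks for.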
ceph-disk: enable directory backed OSD at boot time
Commit https://github.com/ceph/ceph/commit/539385b143feee3905dceaf7a8faaced42f2d3c6
introduced a regression that prevents directory-backed OSDs from starting at
boot time.
For device-backed OSDs, the boot sequence starts with ceph-disk@.service
and proceeds to
systemctl enable --runtime ceph-osd@.service
where --runtime ensures that ceph-osd@12 is removed when the machine
reboots, so that it does not compete with the ceph-disk@/dev/sdb1 unit at
boot time.
However, directory-backed OSDs rely solely on the ceph-osd@.service unit
to start at boot time and will therefore fail to boot.
The --runtime flag is now set only for device-backed OSDs.
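A minimal sketch of the selection logic in Python. The helper name systemctl_enable_args is hypothetical and is not ceph-disk's actual code; only the `systemctl enable [--runtime] ceph-osd@<id>` command shape comes from the message above.

```python
from typing import List


def systemctl_enable_args(osd_id: int, device_backed: bool) -> List[str]:
    """Build the systemctl command that enables ceph-osd@<id>.

    Device-backed OSDs are brought up at boot by their ceph-disk@<device>
    unit, so ceph-osd@<id> is enabled only for the current boot (--runtime).
    Directory-backed OSDs have no ceph-disk@ unit, so ceph-osd@<id> must
    stay enabled across reboots.
    """
    args = ["systemctl", "enable"]
    if device_backed:
        args.append("--runtime")
    args.append(f"ceph-osd@{osd_id}")
    return args
```

Passing the returned list to `subprocess.run(args, check=True)` would execute it; building the argument list separately keeps the device/directory decision easy to test.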
mon/OSDMonitor: transit creating_pgs from pgmap when upgrading
There could be some pg(s) still being created when we upgrade to
luminous, and the pools holding them may not have changed (in the sense of
pg_pool_t::last_change) between the upgrade and the time we scan for
creating pgs. In that case, the existing update_pending_creatings()
will fail to collect the pgs that were being created before the upgrade.
With this change, the creating_pgs in the pgmap are also used to update
the OSDMonitor's creating_pgs if it is updated.
But we should stop updating the pgmap once the upgrade completes, i.e.
stop dispatching MSG_PGSTATS messages to PGMonitor once the quorum and all
osds are luminous.
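The two rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the OSDMonitor code: the (pool id, seed) tuple used for pg ids and both function names are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical pg id representation: (pool id, placement seed).
PgId = Tuple[int, int]


def merge_creating_pgs(osdmon_creating: Set[PgId],
                       pgmap_creating: Set[PgId]) -> Set[PgId]:
    """Return OSDMonitor's creating set topped up with pgs the pgmap still
    lists as creating, i.e. pgs a last_change-based scan would miss because
    their pool has not changed since the upgrade."""
    return set(osdmon_creating) | set(pgmap_creating)


def should_dispatch_pgstats(quorum_luminous: bool, all_osds_luminous: bool) -> bool:
    """Keep routing MSG_PGSTATS to PGMonitor only until the upgrade completes."""
    return not (quorum_luminous and all_osds_luminous)
```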
John Spray [Wed, 8 Mar 2017 12:13:46 +0000 (12:13 +0000)]
mds: shut down finisher before objecter
Some of the finisher contexts would try to call into Objecter.
We are mostly protected from this by mds_lock plus the stopping
flag, but at the Filer level there's no mds_lock, so in the
case of file size probing we have a problem.
Fixes: http://tracker.ceph.com/issues/19204
Signed-off-by: John Spray <john.spray@redhat.com>
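Why the shutdown order matters can be shown with a small Python model. Finisher and Objecter here are hypothetical stand-ins, not Ceph's classes, and the sketch assumes that stopping the finisher drains its queued contexts.

```python
from typing import Callable, List


class Objecter:
    """Hypothetical stand-in: rejects work once shut down."""

    def __init__(self) -> None:
        self.running = True
        self.ops: List[str] = []

    def submit(self, op: str) -> None:
        if not self.running:
            raise RuntimeError("objecter already shut down: " + op)
        self.ops.append(op)

    def shutdown(self) -> None:
        self.running = False


class Finisher:
    """Hypothetical stand-in: queued contexts run when the finisher is stopped."""

    def __init__(self) -> None:
        self.queue: List[Callable[[], None]] = []

    def queue_context(self, ctx: Callable[[], None]) -> None:
        self.queue.append(ctx)

    def stop(self) -> None:
        # Drain remaining contexts in FIFO order.
        while self.queue:
            self.queue.pop(0)()


def shutdown(finisher: Finisher, objecter: Objecter) -> None:
    # Stop the finisher first: its contexts (e.g. a file size probe
    # completion coming through Filer) may still call into the objecter.
    finisher.stop()
    objecter.shutdown()
```

Reversing the two calls in shutdown() makes a queued context hit an objecter that is already down, which is the failure the commit avoids.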
John Spray [Tue, 28 Mar 2017 18:13:33 +0000 (14:13 -0400)]
mds: ignore ENOENT on writing backtrace
We get ENOENT when a pool doesn't exist. This can
happen because we don't prevent people from deleting
former cephfs data pools whose files may not have
had their metadata flushed yet.
http://tracker.ceph.com/issues/19401
Signed-off-by: John Spray <john.spray@redhat.com>
Matt Benjamin [Tue, 11 Apr 2017 10:42:07 +0000 (06:42 -0400)]
rgw_file: don't expire directories being read
If a readdir expire event turns out to be older than last_readdir,
just reschedule it (but actually, we should just discard it, as
another expire event must already be in the queue).
Fixes: http://tracker.ceph.com/issues/19625
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
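A Python sketch of the discard variant that the message calls preferable. It assumes a single cached directory that expires ttl seconds after its last readdir, with each readdir queueing its own expire event; the DirHandle class is hypothetical and is not rgw_file's implementation.

```python
import heapq
from typing import List, Tuple


class DirHandle:
    """Hypothetical cached directory that expires ttl seconds after its last readdir."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.last_readdir = float("-inf")
        self.expired = False
        # Min-heap of (fire_at, issued_at) expire events.
        self.events: List[Tuple[float, float]] = []

    def readdir(self, now: float) -> None:
        self.last_readdir = now
        self.expired = False
        heapq.heappush(self.events, (now + self.ttl, now))

    def run_expiry(self, now: float) -> None:
        """Fire every expire event that is due at time `now`."""
        while self.events and self.events[0][0] <= now:
            _fire_at, issued_at = heapq.heappop(self.events)
            if issued_at < self.last_readdir:
                # Stale event: a later readdir already queued a newer one,
                # so drop this one instead of rescheduling it.
                continue
            self.expired = True
```

With ttl=10, readdirs at t=0 and t=5 leave the directory cached at t=10 (the first event is stale) and expire it at t=15.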
John Spray [Fri, 7 Apr 2017 13:24:01 +0000 (09:24 -0400)]
mon: emit cluster log messages on MDS health changes
Previously, when we got a beacon that updated the health
metrics for an MDS, the user would just see mysterious-looking
cluster log messages indicating a rising fsmap epoch number.
It would be good to do this for health messages in general at
some point, but for now just do it for the MDS ones.
Fixes: http://tracker.ceph.com/issues/19551
Signed-off-by: John Spray <john.spray@redhat.com>
Matt Benjamin [Tue, 11 Apr 2017 09:56:13 +0000 (05:56 -0400)]
rgw_file: chunked readdir
Adjust the readdir callback path for the new nfs-ganesha chunked readdir,
including changes to respect a callback result asking us not to
continue.
Pending the introduction of an offset name hint, our caller will just be
enumerating completely, so it is possible to remove the offset map
and just keep a last offset.
Fixes: http://tracker.ceph.com/issues/19624
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
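One way to picture replacing the offset map with a single last offset, as a Python sketch. It assumes a fully enumerating caller, treats offsets as plain positions in a listing, and assumes the entry handed to a callback that returns False still counts as delivered; readdir_chunk is a hypothetical name, not the rgw_file or nfs-ganesha interface.

```python
from typing import Callable, List


def readdir_chunk(entries: List[str], last_offset: int,
                  cb: Callable[[str, int], bool]) -> int:
    """Deliver entries after last_offset until cb returns False.

    cb receives (name, offset) and returns True to keep going. Returns the
    offset to resume from, so only this one number is kept between chunks.
    """
    off = last_offset
    for idx in range(last_offset, len(entries)):
        off = idx + 1
        if not cb(entries[idx], off):
            break
    return off
```

Because the caller always resumes where the previous chunk stopped, no per-entry offset mapping is needed until a name hint allows resuming from an arbitrary position.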
Sage Weil [Wed, 12 Apr 2017 02:35:32 +0000 (22:35 -0400)]
mon/OSDMonitor: fix initial map when require_luminous_osds not set on mkfs
If we don't set the luminous flag, we should not set the new luminous
fields, or else we'll get a crc mismatch. (Funnily, that happens in the
epoch where the flag is eventually set and the encoded map finally includes
the field we have set in memory.)
Yang Honggang [Thu, 13 Apr 2017 12:09:07 +0000 (20:09 +0800)]
cephfs: fix write_buf's _len overflow problem
After I set about 400 xattr key/value pairs of 64KB each on a file,
the MDS crashed. Every time I try to start the MDS, it crashes again.
The root cause is that write_buf._len overflows in
Journaler::append_entry().
This patch tries to fix this problem through the following changes:
John Spray [Wed, 29 Mar 2017 18:38:37 +0000 (19:38 +0100)]
tools/cephfs: set dir_layout when injecting inodes
When we left this as zero, the MDS would interpret it as HASH_LINUX
rather than the default HASH_RJENKINS. That could potentially
cause problems if there were already dirfrags in
the metadata pool that had been set up using rjenkins. Mainly,
it just seems more appropriate to explicitly set this field
rather than hit the fallback behaviour.
Related: http://tracker.ceph.com/issues/19406
Signed-off-by: John Spray <john.spray@redhat.com>
Jason Dillaman [Tue, 11 Apr 2017 01:09:01 +0000 (21:09 -0400)]
rbd: import-diff should discard any zeroed extents
Sparse (zeroed) extents cannot be safely skipped. Instead, the
zeroed extent should be discarded from the image to ensure
the import remains consistent with the export.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
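A Python model of the import-diff rule, mirroring the data and zero records of an export-diff stream. The record classes and MemImage are hypothetical in-memory stand-ins, not the librbd API, and MemImage assumes that a discarded range reads back as zeroes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass
class DataRecord:
    """Changed extent carrying new data."""
    offset: int
    data: bytes


@dataclass
class ZeroRecord:
    """Extent that is zero (sparse) in the exported image."""
    offset: int
    length: int


class MemImage:
    """In-memory stand-in for an image; not the librbd API."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.discards: List[Tuple[int, int]] = []

    def write(self, offset: int, data: bytes) -> None:
        self.buf[offset:offset + len(data)] = data

    def discard(self, offset: int, length: int) -> None:
        # Assumption of this model: discarded ranges read back as zeroes.
        self.buf[offset:offset + length] = bytes(length)
        self.discards.append((offset, length))


def import_diff(image: MemImage,
                records: Iterable[Union[DataRecord, ZeroRecord]]) -> None:
    for rec in records:
        if isinstance(rec, ZeroRecord):
            # Skipping this would leave the target's old data in the range;
            # discard it so the image matches the export.
            image.discard(rec.offset, rec.length)
        else:
            image.write(rec.offset, rec.data)
```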
Henrik Korkuc [Sun, 19 Feb 2017 09:44:20 +0000 (11:44 +0200)]
client/Client.cc: add feature to reconnect client after MDS reset
Client.cc marks the session as stale instead of reconnecting after it
receives a reset from the MDS. On the MDS side the session is closed, so
the MDS ignores cap renewals. This adds an option to reconnect stale client
sessions instead of just marking them stale.
Fixes: http://tracker.ceph.com/issues/18757
Signed-off-by: Henrik Korkuc <henrik@kirneh.eu>
Jos Collin [Wed, 12 Apr 2017 09:18:43 +0000 (14:48 +0530)]
test: add explicit braces to avoid ambiguous ‘else’ and to silence warnings
The following warning appears during make for several files in the test submodule:
warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wdangling-else]
qa/workunits/ceph-helpers: do not error out if is_clean
Otherwise it would be a race, because we cannot be sure whether the
cluster's pgs are all clean when run_osd() returns, but we can be sure
that they are expected to become active+clean after a while. That is what
wait_for_clean() does.
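A Python rendering of the polling idea behind wait_for_clean(), offered as a sketch only: the real helper is a shell function in qa/workunits/ceph-helpers, and the is_clean callable and the timeout and interval defaults here are assumptions. The clock and sleep functions are injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable


def wait_for_clean(is_clean: Callable[[], bool],
                   timeout: float = 300.0,
                   interval: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep,
                   now: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_clean() until it reports True or `timeout` seconds pass.

    Returns True once the pgs are clean, False on timeout, so a cluster that
    is not clean yet right after run_osd() is not treated as an error.
    """
    deadline = now() + timeout
    while True:
        if is_clean():
            return True
        if now() >= deadline:
            return False
        sleep(interval)
```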
545bc83 removed most of the plumbing for the debug mon features admin
socket commands but failed to remove the register/unregister command
pairs. This means the monitor asserts if an attempt is made to use any
of these commands.