Samuel Just [Fri, 3 Jun 2016 00:13:09 +0000 (17:13 -0700)]
src/: remove all direct comparisons to get_max()
get_max() now returns a special singleton type from which hobject_t's
can be assigned and constructed, but which cannot be directly compared.
This patch also converts all such uses to is_max() instead.
This should prevent some issues like 16113 by preventing us from
checking for max-ness by comparing against a sentinel value. The more
complete fix will be to make all fields of hobject_t private and enforce
a canonical max() representation that way. That patch will be hard to
backport, however, so we'll settle for this for now.
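The idea can be sketched outside of C++ as well. The following is a minimal Python model, not the Ceph code: MaxSentinel, HObject, and from_max() are hypothetical names, and the check happens at run time here, whereas the real patch relies on the C++ type system to reject the comparison at compile time.

```python
from dataclasses import dataclass


class MaxSentinel:
    """Hypothetical stand-in for the special type returned by get_max()."""

    def __eq__(self, other):
        # Refuse every direct comparison; callers must use is_max().
        raise TypeError("use is_max() instead of comparing to get_max()")

    def __ne__(self, other):
        raise TypeError("use is_max() instead of comparing to get_max()")


def get_max() -> MaxSentinel:
    return MaxSentinel()


@dataclass
class HObject:
    # Illustrative fields only; the real hobject_t has more.
    name: str = ""
    max: bool = False

    @classmethod
    def from_max(cls, sentinel: MaxSentinel) -> "HObject":
        # Construction from the sentinel is still allowed.
        return cls(max=True)

    def is_max(self) -> bool:
        return self.max
```

In this model, HObject.from_max(get_max()).is_max() is the supported check, while obj == get_max() raises TypeError instead of silently comparing field by field.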
Samuel Just [Fri, 3 Jun 2016 00:36:21 +0000 (17:36 -0700)]
hobject: compensate for non-canonical hobject_t::get_max() encodings
This closes a loophole that could allow a non-canonical in-memory
hobject_t::get_max() object, which would return true for is_max() but
false for *this == hobject_t::get_max().
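A rough illustration of the mismatch, assuming a simplified object with made-up fields (name, hash_value, pool) and a hypothetical canonicalize() helper; the real encoding and the chosen canonical values are not shown in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HObject:
    # Illustrative fields and defaults, not the real hobject_t layout.
    name: str = ""
    hash_value: int = 0
    pool: int = -1
    max: bool = False

    def is_max(self) -> bool:
        return self.max


# Hypothetical canonical form: the max flag set, every other field default.
CANONICAL_MAX = HObject(max=True)


def canonicalize(obj: HObject) -> HObject:
    """Map any object flagged as max onto the single canonical value."""
    return CANONICAL_MAX if obj.is_max() else obj
```

A decoded object such as HObject(name="stale", hash_value=7, pool=3, max=True) reports is_max() but does not compare equal to CANONICAL_MAX until it has passed through canonicalize().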
Ramana Raja [Wed, 13 Apr 2016 08:33:51 +0000 (14:03 +0530)]
ceph_volume_client: evict client also based on mount path
Evict clients based on not just their auth ID, but also based on the
volume path mounted. This is needed for the Manila use-case, where
the clients using an auth ID are denied further access to a share.
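As a sketch of the selection logic only: the session dictionaries, their keys, and sessions_to_evict() below are assumptions made for illustration and are not the ceph_volume_client API. Paths are matched exactly after normalization, which is also an assumption.

```python
import posixpath


def sessions_to_evict(sessions, auth_id, volume_path=None):
    """Return ids of sessions using auth_id and, if given, mounting volume_path.

    `sessions` is a hypothetical list of dicts with the keys
    "id", "auth_id" and "mount_path".
    """
    wanted = None if volume_path is None else posixpath.normpath(volume_path)
    matched = []
    for session in sessions:
        if session["auth_id"] != auth_id:
            continue
        if wanted is not None and posixpath.normpath(session["mount_path"]) != wanted:
            continue
        matched.append(session["id"])
    return matched
```

Passing only an auth ID selects every session for that ID; adding the volume path narrows the eviction to clients of one share.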
John Spray [Wed, 11 May 2016 12:18:23 +0000 (13:18 +0100)]
client: report root's quota in statfs
When a user has mounted a quota-restricted inode
as the root, report that inode's quota status
as the filesystem statistics in statfs.
This allows us to have a fairly convincing illusion
that someone has a filesystem to themselves, when
they're really mounting a restricted part of
the larger global filesystem.
Fixes: http://tracker.ceph.com/issues/15599
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit b6d2b6d1a51969c210ae75fef93c71ac21f511a6)
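The effect on the reported numbers can be approximated as follows. This is a simplified model, not the client code: statfs_for_root(), its parameters, and the 4096-byte block size are assumptions, and a quota of 0 is taken to mean "no quota set".

```python
def statfs_for_root(quota_max_bytes, used_bytes, cluster_total_bytes,
                    cluster_free_bytes, block_size=4096):
    """Return statfs-like fields for a mount whose root may carry a quota."""
    if quota_max_bytes > 0:
        # Quota-restricted root: present the quota as the whole filesystem.
        used = min(used_bytes, quota_max_bytes)
        total_blocks = quota_max_bytes // block_size
        free_blocks = (quota_max_bytes - used) // block_size
    else:
        # No quota on the root: fall back to the global statistics.
        total_blocks = cluster_total_bytes // block_size
        free_blocks = cluster_free_bytes // block_size
    return {
        "f_bsize": block_size,
        "f_blocks": total_blocks,
        "f_bfree": free_blocks,
        "f_bavail": free_blocks,
    }
```

With a 40960-byte quota and 8192 bytes used, the mount appears as a 10-block filesystem with 8 blocks free, regardless of the size of the cluster behind it.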
Jason Dillaman [Wed, 25 May 2016 18:00:34 +0000 (14:00 -0400)]
rbd-mirror: stop stale replayers before starting new replayers
If the connection details are tweaked for a remote peer, stop
the existing replayer before potentially starting a new replayer
against the same remote.
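The ordering described above might look like the following two-pass reconciliation. Replayer, reconcile(), and the events list are hypothetical names used for illustration; the rbd-mirror daemon itself is C++ and structured differently.

```python
class Replayer:
    """Hypothetical replayer that records its lifecycle in a shared event list."""

    def __init__(self, peer, conn, events):
        self.peer = peer
        self.conn = conn
        self.events = events

    def start(self):
        self.events.append(("start", self.peer, self.conn))

    def stop(self):
        self.events.append(("stop", self.peer, self.conn))


def reconcile(replayers, peers, events):
    """Bring `replayers` (peer -> Replayer) in line with `peers` (peer -> conn)."""
    # Pass 1: stop replayers whose peer vanished or whose details changed.
    for peer in list(replayers):
        if peers.get(peer) != replayers[peer].conn:
            replayers.pop(peer).stop()
    # Pass 2: only now start replayers for peers that have none.
    for peer, conn in peers.items():
        if peer not in replayers:
            replayers[peer] = Replayer(peer, conn, events)
            replayers[peer].start()
```

Because all stale replayers are stopped in the first pass, a changed connection string never results in two replayers running against the same remote.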
Jason Dillaman [Tue, 24 May 2016 02:21:33 +0000 (22:21 -0400)]
journal: eliminate watch delay for object refetches
The randomized write sizes of the modified rbd-mirror stress
test result in a lot of journal objects with few entries.
Immediately fetch objects when performing a refetch check prior
to closing an empty object.
Jason Dillaman [Mon, 23 May 2016 18:57:03 +0000 (14:57 -0400)]
journal: keep active tag to assist with pruning watched objects
It's possible that there might be additional entries to prune in
objects that haven't been prefetched yet. Keep the active tag
to allow these entries to be pruned after they have been loaded.
Jason Dillaman [Mon, 23 May 2016 15:01:05 +0000 (11:01 -0400)]
journal: cleanup watch refetch flag handling
Clear the refetch required flag while scheduling the watch
and remove the stale object after the watch completes if still
empty. Previously, it was possible for the flag to become
out-of-sync with whether or not it was actually refreshed
and pruned.
Fixes: http://tracker.ceph.com/issues/15993
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit ff2cc27ae592646b495bf1b614d35bd50c091a3d)
Ricardo Dias [Fri, 13 May 2016 15:44:53 +0000 (16:44 +0100)]
rbd-mirror: Unregister clients from non-primary images journal
A non-primary image may have registered clients on its journal
(for instance a primary image that was later demoted). We must
unregister the clients when disabling image mirroring with the
force option.
Mykola Golub [Wed, 25 May 2016 18:54:16 +0000 (21:54 +0300)]
test: workaround failure in journal.sh
With the changes to ensure that the commit position of a new
client is initialized to the minimum position of other clients,
the 'journal inspect/export' commands return zero records because
the master client has committed all of its entries.
Work around this by restoring the initial commit position after
writing to the image.
Robin H. Johnson [Fri, 20 May 2016 23:00:33 +0000 (16:00 -0700)]
rgw: fix manager selection when APIs customized
When modifying rgw_enable_apis per RGW instance, such as for staticsites, you
can end up with the RESTManager instance being null in some cases, which returns an
HTTP 405 MethodNotAllowed to all requests.
Example configuration to trigger the bug:
rgw_enable_apis = s3website
Backport: jewel
X-Note: Patch from Yehuda in private IRC discussion, 2016/05/20.
Fixes: http://tracker.ceph.com/issues/15973
Fixes: http://tracker.ceph.com/issues/15974
Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>
(cherry picked from commit 7c7a465b55f7100eab0f140bf54f9420abd1c776)
Kefu Chai [Fri, 13 May 2016 03:26:31 +0000 (11:26 +0800)]
osd/OpRequest: reset connection upon unregister
This helps to free the resources referenced by the connection: among
other things, in the case of MOSDOp, the OSD::Session and OSDMap. Freeing
them earlier allows the osdmaps to be trimmed in time.
Kefu Chai [Thu, 12 May 2016 12:28:11 +0000 (20:28 +0800)]
osd: reset session->osdmap if session is not waiting for a map anymore
We should release the osdmap reference once we are done with it;
otherwise we might wait a very long time before that reference is updated
with a newer osdmap ref. This appears as an OSDMap leak: the map is held
by a quiet OSD::Session forever.
The osdmap is not reset in OSD::session_notify_pg_create(), because its
only caller is wake_pg_waiters(), which will call
dispatch_session_waiting() later. dispatch_session_waiting() will
check the session->osdmap, and will also reset the osdmap if
session->waiting_for_pg.empty().
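A loose model of the release rule, assuming a hypothetical Session class and an is_pg_ready callback; only the names osdmap, waiting_for_pg and dispatch_session_waiting come from the message above, and the body is not the OSD implementation.

```python
class Session:
    """Hypothetical stand-in for OSD::Session."""

    def __init__(self):
        self.osdmap = None         # reference that pins an osdmap epoch
        self.waiting_for_pg = {}   # pgid -> list of queued ops


def dispatch_session_waiting(session, osdmap, is_pg_ready):
    """Retry queued ops against `osdmap`; drop the map reference when idle."""
    session.osdmap = osdmap
    for pgid in list(session.waiting_for_pg):
        if is_pg_ready(pgid):
            # The ops for this PG can proceed, so they stop waiting.
            del session.waiting_for_pg[pgid]
    if not session.waiting_for_pg:
        # Nothing is waiting any more: release the reference so that the
        # old map can be trimmed instead of being held by a quiet session.
        session.osdmap = None
```

A session that still has waiters keeps its map reference; one that has drained its queue holds none, so it cannot pin an old epoch indefinitely.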
Jason Dillaman [Thu, 19 May 2016 00:53:26 +0000 (20:53 -0400)]
rbd-mirror: disable librbd caching for replicated images
Each image has its own cache and each cache uses its own thread. With
a large replicated cluster, this could result in thousands of extra
threads and gigabytes of extra memory.
Fixes: http://tracker.ceph.com/issues/15930
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit ea35f148257282fe3f3ae02fe7a26cf245cda952)
Jason Dillaman [Thu, 19 May 2016 19:50:04 +0000 (15:50 -0400)]
librbd: delay commit of overwritten journal event
With the cache enabled and write-after-write IOs to the same
object extents, it was possible for the overwritten journal event
to be committed before the overwriter journal event was written
to disk. If a client crash occurs before the event is written,
the image will be inconsistent on replay.
Fixes: http://tracker.ceph.com/issues/15938
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit e8bf64cc85ffe3d2dda23eab1834f7a5f104f6fe)
Jason Dillaman [Thu, 19 May 2016 17:48:22 +0000 (13:48 -0400)]
qa/workunits/rbd: record rbd CLI debug messages during mirror stress
The debug messages from 'rbd bench-write' and 'rbd snap create',
in addition to the existing debug messages from rbd-mirror, make
it possible to determine the source of any image inconsistency.
Yehuda Sadeh [Mon, 16 May 2016 21:35:12 +0000 (14:35 -0700)]
rgw: keep track of written_objs correctly
Fixes: http://tracker.ceph.com/issues/15886
Only add a rados object to the written_objs list if the write
was successful. Otherwise, if the write is canceled for some
reason, we'd remove an object that we didn't write to. This was
a problem in a case where there are multiple writes that went to
the same part. The second writer should fail the write, since
we do an exclusive write. However, we added the object's name
to the written_objs list anyway, which was a real problem when
the old processor was disposed (as it was clearing the objects).
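The failure mode and the fix can be reproduced with a small in-memory model. FakeStore and Processor are hypothetical classes written for this sketch; only the written_objs list and the exclusive-write behavior are taken from the description above.

```python
class FakeStore:
    """In-memory stand-in for the object store, with exclusive create."""

    def __init__(self):
        self.objects = {}

    def exclusive_create(self, name, data):
        if name in self.objects:
            raise FileExistsError(name)
        self.objects[name] = data

    def remove(self, name):
        self.objects.pop(name, None)


class Processor:
    """Hypothetical upload processor that cleans up after itself on cancel."""

    def __init__(self, store):
        self.store = store
        self.written_objs = []

    def write(self, name, data):
        try:
            self.store.exclusive_create(name, data)
        except FileExistsError:
            # The write failed, so this processor does not own the object.
            return False
        # Record the object only after the write succeeded.
        self.written_objs.append(name)
        return True

    def cancel(self):
        # Remove only the objects this processor actually wrote.
        for name in self.written_objs:
            self.store.remove(name)
        self.written_objs.clear()
```

If the name were appended before the write was attempted, the losing writer's cancel() would delete the part that the winning writer had just stored.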
Jason Dillaman [Tue, 17 May 2016 01:17:09 +0000 (21:17 -0400)]
qa/workunits/rbd: rbd-mirror daemon stress test
This test repeatedly runs rbd bench-write, kills the process
randomly to create an unclean journal shutdown, and verifies
that the image content replicates correctly.
Jason Dillaman [Sun, 15 May 2016 13:52:41 +0000 (09:52 -0400)]
journal: skip partially complete tag entries during playback
If a journal client does not fully write out its buffered entries
before quitting, replay should skip over all remaining out-of-sequence
entries for the tag.
Fixes: http://tracker.ceph.com/issues/15864
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 9454f7e4c62437b1c288f371009feba1fd374584)
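One way to picture the skip rule, under the assumption that each entry is a (tag, entry id) pair and that ids within a tag start at 0 and increase by one; replay() and that numbering scheme are illustrative, not the journal's on-disk format.

```python
def replay(entries):
    """Return the entries that are safe to replay, in order.

    `entries` is an iterable of (tag, entry_id) pairs. Once a gap is seen
    for a tag, every remaining entry of that tag is skipped.
    """
    next_id = {}     # tag -> next expected entry id
    broken = set()   # tags with a detected gap
    replayed = []
    for tag, entry_id in entries:
        if tag in broken:
            continue
        expected = next_id.get(tag, 0)
        if entry_id != expected:
            # Out of sequence: the writer quit before flushing everything.
            broken.add(tag)
            continue
        replayed.append((tag, entry_id))
        next_id[tag] = expected + 1
    return replayed
```

For the input [(1, 0), (1, 1), (1, 3), (1, 4), (2, 0)], entries 3 and 4 of tag 1 are dropped because entry 2 never reached the journal, while tag 2 replays normally.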
Jason Dillaman [Sat, 14 May 2016 22:58:41 +0000 (18:58 -0400)]
journal: close, advance, and open object set ordering
Flush in-flight appends to open objects before advancing the
active object set. Additionally, don't start recording to the
new objects until after advancing the active set.
Jason Dillaman [Fri, 13 May 2016 20:17:37 +0000 (16:17 -0400)]
journal: ignore flush on closed/overflowed object
The journal would be in the process of transitioning to a new
object recorder in a newer object set. Once the records
re-attach to the new object player they will automatically
flush.
Jason Dillaman [Fri, 13 May 2016 18:49:07 +0000 (14:49 -0400)]
journal: re-fetch active object before advancing set during replay
During a live replay, it's possible that an append and an overflow
into the next object could race with the live playback of the same
object. Re-fetch an "empty" object at least once before advancing
to the next set to ensure all records have been read.
Fixes: http://tracker.ceph.com/issues/15665
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 6056f8c45c99bd37cb18933a37cc238c7e9a7c7d)
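Reduced to its core, the rule is "do not trust a single empty read". The sketch below assumes a hypothetical fetch(object_id) callable that returns the list of records currently visible in an object; drain_before_advance() is an illustrative name, not a journal API.

```python
def drain_before_advance(fetch, object_id):
    """Read an object's records, re-fetching once if it looks empty.

    A concurrent append may land between the first fetch and the decision
    to advance, so an empty result is confirmed with a second fetch.
    """
    records = fetch(object_id)
    if not records:
        # Re-fetch the "empty" object once before giving up on it.
        records = fetch(object_id)
    return records
```

Only when the second fetch also comes back empty would the player move on to the next object set.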