Haomai Wang [Wed, 29 Jan 2014 09:50:10 +0000 (17:50 +0800)]
Add KeyValueStore implementation
KeyValueStore is another ObjectStore implementation alongside FileStore. It
uses a KV store wrapper (StripObjectMap), which inherits from
GenericObjectMap, to implement the ObjectStore APIs.
Each object has a header key in the KV backend, which encapsulates the
object's metadata, such as its size and the status of its keys. An object's
data may be spread across multiple keys. CRUD operations on an object first
read its header key to learn the details, then fetch the actual data keys.
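The header-plus-data-keys layout described above can be sketched roughly as follows. This is only an illustration: the key formats, the `STRIP_SIZE` value, the header fields, and the function names are assumptions, and a plain dict stands in for the KV backend.

```python
# Hypothetical sketch: key names, header fields and the strip size are
# illustrative and are not taken from the KeyValueStore code.
STRIP_SIZE = 4  # tiny strip size so the example is easy to follow


def header_key(oid):
    return f"{oid}/header"


def data_key(oid, index):
    return f"{oid}/data/{index:08d}"


def write_object(kv, oid, data):
    """Split data into fixed-size strips and record the layout in the header."""
    strips = [data[i:i + STRIP_SIZE] for i in range(0, len(data), STRIP_SIZE)]
    for index, strip in enumerate(strips):
        kv[data_key(oid, index)] = strip
    kv[header_key(oid)] = {"size": len(data), "strips": len(strips)}


def read_object(kv, oid):
    """Read the header first, then fetch the data keys it describes."""
    header = kv[header_key(oid)]
    data = b"".join(kv[data_key(oid, i)] for i in range(header["strips"]))
    return data[:header["size"]]


kv = {}  # a dict stands in for the KV backend
write_object(kv, "obj1", b"hello world")
```

Every read goes through the header, so the object's size and strip count are known before any data key is touched.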
Currently the only KV backend for KeyValueStore is LevelDB; more KV backends
(RocksDB, NVM API) will be introduced in the near future.
Haomai Wang [Wed, 29 Jan 2014 09:46:00 +0000 (17:46 +0800)]
Add a new KV wrapper GenericObjectMap
We already have DBObjectMap, which implements ObjectMap and other
interfaces, and ObjectMap.h implies that ObjectMap is used to encapsulate
the FileStore key/value store. The current DBObjectMap implementation has
limitations, such as the lack of "coll_t" in the key, complicated hard-coded
prefixes, and being hard to extend.
So, to provide a more flexible API and a clearer implementation for wrapping
a KV store, I copied the original DBObjectMap and redesigned part of the
implementation. A "coll_t" argument is added to every API and the "prefix"
is exposed to callers. Prefixes are divided into two groups, "INTERN" and
"USER": "INTERN" keys are managed by the wrapper itself and "USER" keys are
managed by callers. In addition, miscellaneous fixes are included, such as
clearer member function names and an extensible header structure.
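As a rough illustration of the idea, the sketch below builds keys that carry both a collection and a prefix. The prefix strings, the separator, the header field, and the `ObjectMapSketch` class are all assumptions made for this example, not the GenericObjectMap API.

```python
# Hypothetical key layout; prefix strings, separator and class name are
# assumptions made for illustration only.
INTERN_PREFIX = "_INTERN_"
USER_PREFIX = "_USER_"
SEP = "!"


def make_key(prefix, coll, oid, name):
    """Compose a full key from prefix, collection, object id and key name."""
    return SEP.join([prefix, coll, oid, name])


class ObjectMapSketch:
    def __init__(self):
        self._kv = {}  # a dict stands in for the KV store

    def set_user_key(self, coll, oid, name, value):
        # The wrapper keeps its own bookkeeping under the INTERN prefix.
        header = self._kv.setdefault(
            make_key(INTERN_PREFIX, coll, oid, "header"), {"num_keys": 0})
        key = make_key(USER_PREFIX, coll, oid, name)
        if key not in self._kv:
            header["num_keys"] += 1
        self._kv[key] = value

    def user_keys(self, coll, oid):
        """List the caller-managed key names of one object in one collection."""
        start = make_key(USER_PREFIX, coll, oid, "")
        return sorted(k[len(start):] for k in self._kv if k.startswith(start))


omap = ObjectMapSketch()
omap.set_user_key("coll_a", "obj1", "color", b"red")
omap.set_user_key("coll_b", "obj1", "shape", b"round")
```

Because the collection is part of the key, the same object name in two collections does not collide.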
There's a window in-between receiving an MOSDPGTemp message from an OSD
and actually handling it that may lead to the pool the pg temps refer to
no longer existing. This may happen if the MOSDPGTemp message is queued
pending dispatching due to an on-going proposal (maybe even the pool
removal).
This patch fixes such behavior in four steps:
1. Check if the pool exists in the osdmap upon preprocessing.
   - If the pool does not exist in the osdmap, then the pool must have been
     removed prior to handling the message, but after the osd sent it.
   - It is safe to ignore the pg update.
2. If all pg updates in the message have been ignored, ignore the whole
   message. Otherwise, let prepare handle the rest.
3. Recheck if the pool exists in the osdmap upon prepare.
   - We may have ignored this pg back in preprocess, but other pgs in the
     message may have led the message to be passed on to prepare; ignore
     the pg update once more.
4. Check if the pool is pending removal and ignore the pg update if so.
We delegate checking the pending value to prepare_pgtemp() because in this
case we should only ignore the update IFF the pending value is in fact
committed. Otherwise we should retry the message. prepare_pgtemp() is
the appropriate place to do so.
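The preprocess/prepare split can be modeled with a small sketch. The data shapes (sets of pool ids, `(pool, pg)` tuples) and the function bodies are assumptions, and the sketch omits the retry path for a pending removal that is not yet committed.

```python
# Hypothetical model of the two-phase check; data shapes are assumptions
# and the retry for an uncommitted pending removal is not modeled.
def preprocess_pgtemp(osdmap_pools, pg_updates):
    """Return True if every pg update refers to a missing pool, in which
    case the whole message can be ignored."""
    return all(pool not in osdmap_pools for pool, _pg in pg_updates)


def prepare_pgtemp(osdmap_pools, pending_removed_pools, pg_updates):
    """Return the pg updates that survive the checks done in prepare."""
    applied = []
    for pool, pg in pg_updates:
        if pool not in osdmap_pools:
            continue  # pool already gone: ignore this update again
        if pool in pending_removed_pools:
            continue  # pool is pending removal: ignore this update
        applied.append((pool, pg))
    return applied


pools = {1, 2}
pending_removed = {2}
updates = [(1, 0), (2, 1), (3, 5)]
```

Here only the update for pool 1 would be applied: pool 3 no longer exists and pool 2 is pending removal.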
Fixes: 7116
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Fri, 24 Jan 2014 19:04:37 +0000 (11:04 -0800)]
OSDMap: fix damaging input osdmap from remove_down_temps
The default copy constructor copies shared_ptrs to vectors that are then
modified by apply_incremental, which means that the const osdmap argument
isn't in fact const. Fix this by doing a deep(ish) copy.
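The same aliasing hazard can be shown with a Python analogy, where a shallow copy shares its inner containers with the original. The `OSDMapSketch` class and `apply_to_copy` helper are hypothetical stand-ins, not the OSDMap code.

```python
import copy


class OSDMapSketch:
    """Hypothetical stand-in: pg_temp plays the role of shared_ptr-held data."""

    def __init__(self, pg_temp):
        self.pg_temp = pg_temp  # dict: pg id -> list of osds


def apply_to_copy(osdmap, pg, copier):
    """Copy the map with the given copier, then modify only the copy."""
    tmp = copier(osdmap)
    tmp.pg_temp.pop(pg, None)
    return tmp


shallow_input = OSDMapSketch({"1.0": [3, 4]})
apply_to_copy(shallow_input, "1.0", copy.copy)      # copy shares pg_temp

deep_input = OSDMapSketch({"1.0": [3, 4]})
apply_to_copy(deep_input, "1.0", copy.deepcopy)     # copy owns its pg_temp
```

With the shallow copy the input map loses its entry even though only the copy was modified; with the deep copy the input is left untouched.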
Fixes: #7060
Signed-off-by: Sage Weil <sage@inktank.com>
Derek Yarnell [Mon, 27 Jan 2014 19:27:51 +0000 (12:27 -0700)]
packaging: apply udev hack rule to RHEL
In the RPM spec file there is a test to deploy the uuid hack udev rules
for operating systems with older udev. This includes CentOS and RHEL, but
the check currently covers only CentOS, causing RHEL clients to get a bogus
osd rules file.
Adjust the conditional to apply to RHEL as well as CentOS. (The %{rhel}
macro is defined in both platforms' redhat-rpm-config package.)
Fixes http://tracker.ceph.com/issues/7245
Signed-off-by: Ken Dreyer <ken.dreyer@inktank.com>
Loic Dachary [Wed, 8 Jan 2014 11:41:06 +0000 (12:41 +0100)]
mon: shell test helpers to run MONs from sources
The intent is to make it more convenient to reproduce a specific mon
behavior and observe the result by grepping the logs. It can be handy
for diagnosing bugs. The test could be included as a unit test to
be run by make check.
The setup function prepares a directory and kills any leftovers from a
previous run. The teardown function cleans up on success. The run
function is expected to be provided by the calling script and can make
use of the run_mon function to mkfs + run a monitor.
Loic Dachary [Sun, 26 Jan 2014 12:32:57 +0000 (13:32 +0100)]
unittests: fail early when low on disk
Scripts from qa that are run as unittests via test/vstart_wrapper.sh may
fail because the partition on which they run is low on space (95% full).
When that happens, the cause of the problem may be unclear because it
is likely to show only as a client not being able to reach the mon.
A test is added in test/vstart_wrapper.sh to verify the disk space usage
using the same method as the mon would and fail with a detailed error if
the partition is too full.
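A fail-early check of this kind might look like the sketch below. It is written in Python for illustration, whereas the actual check lives in a shell script. The function names, the error text, and the use of a used-space percentage are assumptions; the 95% threshold is the figure mentioned above.

```python
import shutil


def disk_used_percent(path="."):
    """Percentage of the partition holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used // usage.total


def check_disk_space(used_percent, threshold=95):
    """Raise a detailed error when the partition is too full to run tests."""
    if used_percent >= threshold:
        raise RuntimeError(
            f"disk is {used_percent}% full (threshold {threshold}%): "
            "the mon may refuse to start, free some space and retry")


def raises(used_percent):
    """Helper for the example: report whether the check fails."""
    try:
        check_disk_space(used_percent)
    except RuntimeError:
        return True
    return False
```

A wrapper script would call `check_disk_space(disk_used_percent("."))` before starting the cluster, so the failure names the real cause instead of showing up as an unreachable mon.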
Yehuda Sadeh [Tue, 14 Jan 2014 22:48:16 +0000 (14:48 -0800)]
rgw: quota thread for full user stats sync
Bring user stats up to date periodically. Add configurables for the sync
periods and for whether idle users are updated.
Make sure radosgw-admin does not start the quota threads.
Yehuda Sadeh [Mon, 13 Jan 2014 22:19:27 +0000 (14:19 -0800)]
rgw, cls_user: fix bucket creation
There's a single op to create and update the user bucket info. However,
the two cases differ a bit, as we only need to guard against ENOENT if
we're updating the info.
Yehuda Sadeh [Thu, 9 Jan 2014 00:39:19 +0000 (16:39 -0800)]
rgw: pass bucket owner all around
User quota operations require that we know which user the operation
actually applies to. Pass that info when creating new objects and when
removing objects.
Sage Weil [Thu, 23 Jan 2014 17:16:54 +0000 (09:16 -0800)]
osd/OSDMap: do not create erasure rule by default
If we do, we will require the v2 feature bit from clients.
We could only include feature bits for rules that are actually referenced
by pools, but for now making the user create the rule is simpler. There is
no need to create this rule ahead of time.
Samuel Just [Sun, 19 Jan 2014 09:17:49 +0000 (01:17 -0800)]
PG: drop messages from down peers
This overlaps with the existing old_peering_msg() mechanism
except in one case: pulls from a replica not in the acting
set. If such a replica gets marked down, we may resend
pulls to another replica without causing a new interval
to start. If we received, but didn't process, a push in
response to such a pull prior to processing the map marking
the peer down, we might process the push after having reset
the pull state for a different pull operation. We can
avoid this by discarding ops from down peers.
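The discard rule itself is simple and can be sketched as below. The message shape (a dict with a `from` field), the set of up OSDs, and the `handle_op` name are assumptions made for illustration, not the PG code.

```python
# Hypothetical sketch: message shape and function name are assumptions.
def handle_op(up_osds, op, handler):
    """Process an op only if its sender is still up in the current map.

    Returns True if the op was processed, False if it was discarded.
    """
    if op["from"] not in up_osds:
        return False  # sender was marked down: drop the stale op
    handler(op)
    return True


processed = []
up_osds = {0, 1}  # osd.2 has been marked down in the current map

stale_push = {"from": 2, "type": "push", "object": "obj1"}
fresh_push = {"from": 1, "type": "push", "object": "obj1"}

results = [
    handle_op(up_osds, stale_push, processed.append),
    handle_op(up_osds, fresh_push, processed.append),
]
```

The push from the down replica never reaches the handler, so it cannot be applied against pull state that has since been reset for a different pull.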
Samuel Just [Thu, 16 Jan 2014 20:04:01 +0000 (12:04 -0800)]
PG::calc_acting: consider newest_update_osd when choosing backfill peers
We must include newest_update_osd->second.log_tail when considering backfill
peers because in GetLog we will request logs back to the min last_update over
our acting_backfill set. This will result in our log being extended as far
backwards as necessary to pick up any peers which can be log recovered by the
union of newest_update_osd's log and that of the chosen primary.