BuildRequires: cryptopp-devel has been replaced by nss-devel. Skip
google-perftools-devel because that package is not available for x86-64.
Add python.
Don't install libcls_rbd.so.1.0.0.debug.
Package crbdnamer and librados-config.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
mkostemps isn't present in older glibc versions, like the ones in CentOS
5.5. We don't really use any of the extra functionality of mkostemps in
this test.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
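The commit message does not say what the test uses instead of mkostemps. A minimal sketch, assuming plain POSIX mkstemp is an acceptable substitute (the helper name make_temp_file is hypothetical):

```cpp
#include <stdlib.h>   // mkstemp
#include <unistd.h>   // close, unlink
#include <string>
#include <vector>

// Hypothetical helper, not taken from the Ceph tree.
// Creates a uniquely named file from "<prefix>XXXXXX" with mkstemp, which
// is available in older glibc versions where mkostemps is missing.
// Returns the open file descriptor, or -1 on failure; on success the
// final path is stored in *path_out.
inline int make_temp_file(const std::string &prefix, std::string *path_out) {
  std::string tmpl = prefix + "XXXXXX";
  // mkstemp rewrites the template in place, so it needs a mutable buffer.
  std::vector<char> buf(tmpl.begin(), tmpl.end());
  buf.push_back('\0');
  int fd = mkstemp(buf.data());
  if (fd >= 0 && path_out)
    *path_out = buf.data();
  return fd;
}
```

Unlike mkostemps, mkstemp takes neither a suffix length nor extra open flags, which matches the remark that the test does not rely on that extra functionality.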
Samuel Just [Tue, 15 Mar 2011 00:25:46 +0000 (17:25 -0700)]
PG,OSD: activate pg during replay
Replay PGs already accept and queue transactions. PGs will now go to
active during replay in order to simplify the state reported to the user
and to allow recovery to begin.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Tommi Virtanen [Mon, 14 Mar 2011 18:52:44 +0000 (11:52 -0700)]
blobhash: Avoid size_t in templatized hash functions.
On S/390, the earlier rjhash<size_t> failed with
"no match for call to '(rjhash<long unsigned int>) (size_t&)'".
It seems the rjhash<size_t> logic was only enabled
on some architectures, and relied on some pretty deep
internals of the bit layout (__LP64__).
Use an explicitly 32-bit type as early as possible, and
convert back to size_t only when really needed. This
should work, and simplifies the code. In theory, we might
have a narrower output (size_t might be 64-bit, max value
we now output is 32-bit), but this doesn't matter as this
is only ever used for picking a slot in an in-memory hash
table (hash(key) modulo num_of_buckets), and there won't be >4G
buckets.
Closes: #837 Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
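The approach described above (hash in an explicit 32-bit type, widen to size_t only when picking a bucket) can be sketched as follows. This is not the rjhash code from the tree: FNV-1a is swapped in as a stand-in hash, and the names hash32 and pick_bucket are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in hash (FNV-1a), computed entirely in an explicit 32-bit type so
// the result does not depend on the width of size_t or on __LP64__.
inline uint32_t hash32(const char *p, std::size_t len) {
  uint32_t h = 2166136261u;            // FNV-1a 32-bit offset basis
  for (std::size_t i = 0; i < len; ++i) {
    h ^= static_cast<unsigned char>(p[i]);
    h *= 16777619u;                    // FNV-1a 32-bit prime
  }
  return h;
}

// Convert to size_t only at the point where a bucket index is needed.
// The hash is at most 32 bits wide, which is enough as long as the table
// has fewer than 2^32 buckets.
inline std::size_t pick_bucket(const char *p, std::size_t len,
                               std::size_t num_buckets) {
  return static_cast<std::size_t>(hash32(p, len)) % num_buckets;
}
```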
Sage Weil [Thu, 17 Mar 2011 18:32:56 +0000 (11:32 -0700)]
msgr: let user explicitly set nonce
There will be problems if two messengers use the same entity_addr_t because
they are on the same ip and choose the same nonce (e.g., because they are
in the same process). Let the caller sort this out in whatever way it
finds most appropriate.
For libceph, librados, and csyn, add N million to the pid.
Fixes: #877 Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
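One reading of the note about libceph, librados, and csyn is that each caller offsets its pid by a fixed multiple of one million to build its nonce. A minimal sketch under that assumption (make_nonce and the per-caller index are hypothetical, not the messenger API):

```cpp
#include <cstdint>

// Hypothetical helper: derive a nonce from the process id plus a
// caller-specific offset of N million, so that two messengers in the same
// process (same ip, same pid) get different nonces as long as they use
// different values of N.
inline uint64_t make_nonce(uint64_t pid, uint64_t caller_index) {
  return pid + caller_index * 1000000ULL;
}
```

For example, make_nonce(1234, 1) gives 1001234 and make_nonce(1234, 2) gives 2001234, so two messengers created from pid 1234 no longer share an entity_addr_t.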
Sage Weil [Wed, 16 Mar 2011 21:39:24 +0000 (14:39 -0700)]
common: disable log_per_instance for non-daemons
Turn off the logging and symlink rotation, not just symlink rotation.
This is a somewhat arbitrary distinction (log per instance only for
daemons), but it's only used by vstart and only really useful for
development/debugging, so who cares.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 16 Mar 2011 21:29:53 +0000 (14:29 -0700)]
Makefile: drop libradosgw_a LDFLAGS
Fixes the warning
src/Makefile.am:299: variable `libradosgw_a_LDFLAGS' is defined but no program or
src/Makefile.am:299: library has `libradosgw_a' as canonic name (possible typo)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 16 Mar 2011 21:25:46 +0000 (14:25 -0700)]
mds: resync fragmentation during cache rejoin
During rejoin we may find that different MDSs have different fragmentation
for directories. When that happens we should refragment as needed on the
replicas to match what's on the primary.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 16 Mar 2011 05:18:45 +0000 (22:18 -0700)]
osd: only update last_epoch_started after all replicas commit peering results
The PG info.history.last_epoch_started is important because it bounds how
far back in time we think we need to look in order to fully recover the
contents of the PG. That's because every replica commits the PG peering
result (the info and pg log) when it activates.
In order for this to work properly, we can only advance last_epoch_started
_after_ the peer results are stable on disk on all replicas. Otherwise a
poorly timed failure (or set of failures) could lose the PG peer results
and we wouldn't go back far enough in time to find them.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
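The ordering rule above (advance last_epoch_started only once every replica has committed the peering result) can be sketched as a small bookkeeping class. All names here are hypothetical and the real PG code is considerably more involved.

```cpp
#include <set>

// Hypothetical tracker, not the PG implementation.
class PeeringCommitTracker {
 public:
  explicit PeeringCommitTracker(unsigned last_epoch_started)
    : last_epoch_started_(last_epoch_started),
      pending_epoch_(last_epoch_started) {}

  // Called on activation: remember the epoch we want to advance to and
  // which replicas still have to commit the peering result.
  void activate(unsigned epoch, const std::set<int> &replicas) {
    pending_epoch_ = epoch;
    waiting_on_ = replicas;
    maybe_advance();
  }

  // Called when a replica reports the peering result is stable on disk.
  void replica_committed(int osd) {
    waiting_on_.erase(osd);
    maybe_advance();
  }

  unsigned last_epoch_started() const { return last_epoch_started_; }

 private:
  // Advance only when no replica is outstanding.
  void maybe_advance() {
    if (waiting_on_.empty() && pending_epoch_ > last_epoch_started_)
      last_epoch_started_ = pending_epoch_;
  }

  unsigned last_epoch_started_;
  unsigned pending_epoch_;
  std::set<int> waiting_on_;
};
```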
Sage Weil [Sat, 20 Nov 2010 22:37:13 +0000 (14:37 -0800)]
filestore: adjust op_queue throttle max during fs commit
The underlying FS (btrfs at least) will block writes for a period while it
is doing a commit. If an OSD workload is write limited, we should raise
the op_queue max (operations that are queued to be applied to disk) during
the commit period.
For example, for a normally journal throughput limited (writeahead mode)
workload:
- journal queue throttle normally limits things.
- sync starts
- journaled items getting moved to op_queue soon fills up op_queue max
- all writes stop
- sync completes
- op_queue drains, new writes come in again
- journal queue throttle fills up, again starts limiting tput
For an fs throughput limited workload (writeahead):
- kernel buffer cache hits dirty limit
- op_queue throttle limits tput
- sync starts
- opq stalls, new writes stall on throttler
- sync completes
- opq drains (quickly: kernel has no dirty pages)
- new writes flood in
- etc.
(Actually this isn't super realistic, because hitting the kernel dirty
limit will do all sorts of other weird things with userland memory
allocations.)
In both cases, the commit phase blocks up the op queue, and raising the
limit temporarily will keep things flowing. This should be ok because the
disks are still busy during this period; they're just flushing dirty
data and metadata. Once the sync completes the opq will quickly dump dirty
data into the kernel page cache and "catch up".
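A minimal single-threaded sketch of the idea, assuming the throttle simply adds a fixed allowance while a commit is in progress (the class and its method names are hypothetical, not the filestore code, and a real throttle would need locking):

```cpp
#include <cstdint>

// Hypothetical op_queue throttle whose limit is raised during fs commit.
class OpQueueThrottle {
 public:
  OpQueueThrottle(uint64_t base_max, uint64_t commit_extra)
    : base_max_(base_max), commit_extra_(commit_extra) {}

  void commit_start() { committing_ = true; }
  void commit_finish() { committing_ = false; }

  // Effective limit: the base max, plus the extra allowance while the
  // underlying fs is committing and applied ops cannot drain.
  uint64_t current_max() const {
    return committing_ ? base_max_ + commit_extra_ : base_max_;
  }

  // Returns false if the caller would have to wait on the throttle.
  bool try_queue() {
    if (queued_ >= current_max())
      return false;
    ++queued_;
    return true;
  }

  // Called when an op has been applied to disk.
  void op_applied() {
    if (queued_ > 0)
      --queued_;
  }

  uint64_t queued() const { return queued_; }

 private:
  uint64_t base_max_;
  uint64_t commit_extra_;
  uint64_t queued_ = 0;
  bool committing_ = false;
};
```

In this sketch, ops queued above the base max during a commit are not rejected afterwards; new ops are simply refused until the queue drains below the base max again.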
Since cfuse usually runs as a nonprivileged user, its defaults must be a
little different from those of the other daemons. Add a flag to
common_init which can be used to set unprivileged daemon defaults.
SimpleMessenger::start() now just takes a boolean telling it whether to
daemonize. It doesn't need to check global variables or other arguments;
it just daemonizes if you tell it to; otherwise not.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Samuel Just [Thu, 10 Mar 2011 23:37:33 +0000 (15:37 -0800)]
ReplicatedPG,OSD: Track which osds we are pulling from
Currently, a PG waiting on a pull from a dead OSD cannot continue
recovery. ReplicatedPG::pull now tracks open pulls by peer in
rec_from_peer (map<int, set<sobject_t> >).
OSD::advance_map now calls check_recovery_op_pulls to allow the PG to
reset pulls from failed peers.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
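The per-peer bookkeeping described above can be sketched as follows. Only the map<int, set<...> > shape comes from the commit message; PullTracker, its methods, and the use of std::string in place of sobject_t are assumptions made for illustration.

```cpp
#include <map>
#include <set>
#include <string>

typedef std::string object_id;  // stand-in for sobject_t

// Hypothetical tracker for open pulls, keyed by the OSD we pull from.
class PullTracker {
 public:
  void start_pull(int peer, const object_id &oid) {
    pulling_from_[peer].insert(oid);
  }

  void finish_pull(int peer, const object_id &oid) {
    std::map<int, std::set<object_id> >::iterator it = pulling_from_.find(peer);
    if (it == pulling_from_.end())
      return;
    it->second.erase(oid);
    if (it->second.empty())
      pulling_from_.erase(it);
  }

  // Called on a map change: drop pulls from peers that are no longer up
  // and return the objects that must be pulled again from another source.
  std::set<object_id> reset_failed(const std::set<int> &up_osds) {
    std::set<object_id> to_retry;
    std::map<int, std::set<object_id> >::iterator it = pulling_from_.begin();
    while (it != pulling_from_.end()) {
      if (up_osds.count(it->first) == 0) {
        to_retry.insert(it->second.begin(), it->second.end());
        pulling_from_.erase(it++);
      } else {
        ++it;
      }
    }
    return to_retry;
  }

  bool has_pulls_from(int peer) const {
    return pulling_from_.count(peer) > 0;
  }

 private:
  std::map<int, std::set<object_id> > pulling_from_;
};
```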
size_t is usually 32-bit on 32-bit architectures and 64-bit on 64-bit ones.
On the other hand, we want our offsets and lengths for librados and
librbd to be 64-bit everywhere. So we need to use uint64_t for offsets
and lengths.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
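A small illustration of why the fixed-width type matters: an offset beyond 4 GiB is representable in uint64_t on every architecture, whereas it may not fit in size_t on a 32-bit build. The helper names below are hypothetical and are not part of the librados or librbd API.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Computes off + len in 64 bits; returns false if the sum would overflow.
inline bool extent_end(uint64_t off, uint64_t len, uint64_t *end) {
  if (len > std::numeric_limits<uint64_t>::max() - off)
    return false;
  *end = off + len;
  return true;
}

// True if a 64-bit value can be stored in this platform's size_t without
// truncation. Always true where size_t is 64-bit; false for large values
// where size_t is 32-bit.
inline bool fits_in_size_t(uint64_t v) {
  return v <= std::numeric_limits<std::size_t>::max();
}
```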