Florian Haas [Wed, 12 Mar 2014 18:31:54 +0000 (19:31 +0100)]
doc: Add "nearest power of two" to PG rule-of-thumb
Following an IRC discussion, it emerged that it would be helpful
to explain the merit of choosing a number of PGs per pool that is
a power of two, to keep PGs at roughly equal sizes in case of
PG splits.
See http://irclogs.ceph.widodh.nl/index.php?date=2014-03-12 for the
original discussion.
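
The rounding step, and the reason it helps, can be sketched in a few lines of Python. This is an illustration only, not Ceph code: `nearest_power_of_two` and `pg_shares` are hypothetical helpers, and `pg_shares` assumes placement folds hash values that fall past `pg_num` back onto a lower PG, in the style of a stable-mod mapping.

```python
from collections import Counter


def nearest_power_of_two(n: int) -> int:
    """Return the power of two closest to n (ties round up)."""
    if n < 1:
        raise ValueError("n must be positive")
    lower = 1 << (n.bit_length() - 1)   # largest power of two <= n
    upper = lower << 1                  # smallest power of two > n
    return lower if (n - lower) < (upper - n) else upper


def pg_shares(pg_num: int) -> Counter:
    """Count how many hash buckets land on each PG.

    Assumption: buckets beyond pg_num fold back onto a lower PG by
    masking with the next smaller power of two.
    """
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    shares = Counter()
    for x in range(bmask + 1):
        pg = x & bmask if (x & bmask) < pg_num else x & (bmask >> 1)
        shares[pg] += 1
    return shares
```

Under that assumption, `pg_shares(12)` gives four PGs twice the share of the other eight, while `pg_shares(16)` gives every PG the same share.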
Yehuda Sadeh [Wed, 12 Mar 2014 01:19:44 +0000 (18:19 -0700)]
rgw: don't overwrite bucket entry data when syncing user stats
Fixes: #7687
When syncing user bucket stats we overwrote the entire entry with the
passed in entry. We should only look at the stats portion, and not
overwrite the rest (which contains bucket creation time).
Sharif Olorin [Tue, 11 Mar 2014 10:01:01 +0000 (21:01 +1100)]
rados_connect not thread-safe when using nss (documentation)
I'm not sure whether rados_connect is expected to be threadsafe or not,
so this is just a documentation patch rather than a fix; I'd appreciate
your opinion on whether this is expected behaviour or not.
The race condition is in the call to ceph::crypto::init when called by
common_init_finish, the issue being that it calls NSS_NoDB_Init (not
threadsafe) without locking. It can be reproduced (probabilistically) by
calling rados_connect on different rados_t objects simultaneously, due
to NSS_NoDB_Init's use of PR_CallOnce in nspr (which keeps global state,
and while PR_CallOnce is intended as a locking function, the locking
itself isn't thread-safe, and can pass PR_Lock a null pointer).
The observed behaviour is a segfault on calling rados_connect.
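
Until the initialisation is made safe, a caller can avoid the race by serialising its own connect calls. A minimal sketch in Python, assuming the application controls every call site; `_connect_lock` and `connect_serialized` are hypothetical names and not part of librados:

```python
import threading

# Hypothetical process-wide lock owned by the application, not by librados.
_connect_lock = threading.Lock()


def connect_serialized(connect):
    """Run connect() while holding a global lock.

    Only one thread at a time can then reach the one-time crypto
    initialisation that connect() triggers.
    """
    with _connect_lock:
        return connect()
```

Each thread would pass in the connect callable of its own cluster handle. The lock only helps if every connect in the process goes through it.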
Backtrace, for reference:
Warren Usui [Fri, 21 Feb 2014 05:11:45 +0000 (21:11 -0800)]
Fix get_status() to find client.rados text inside of ps command results.
Added port (fixed value for right now in teuthology) to hostname.
Fixes: 7374
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Warren Usui <warren.usui@inktank.com>
(cherry picked from commit 8200b8a02511e367370d33cb74c3d45ef85fca31)
Yan, Zheng [Sun, 9 Mar 2014 23:36:14 +0000 (07:36 +0800)]
mds: fix owner check of file lock
flock and posix lock do not use process ID as owner identifier.
The process ID of who holds the lock is just for F_GETLK fcntl(2).
In the Linux kernel, a file lock's owner identifier is the file pointer
through which the lock is requested.
The fix is to ignore the 'pid_namespace' when
checking for conflicting locks. Also rename the 'pid' fields of struct
ceph_mds_request_args and struct ceph_filelock to 'owner', rename
'pid_namespace' fields to 'pid'.
The kclient counterpart of this patch modifies the flock code to
assign the file pointer to the 'owner' field of lock message. It
also sets the most significant bit of the 'owner' field. We can use
that bit to distinguish between old and new clients.
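
The owner-based check and the client marker bit can be illustrated as follows. This is a simplified model, not the MDS code: `FileLock`, `NEW_CLIENT_OWNER_BIT`, the `client` field and the 64-bit owner width are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical name for the flag described above: the top bit of an
# (assumed) 64-bit owner value marks a lock message from a new client.
NEW_CLIENT_OWNER_BIT = 1 << 63


@dataclass(frozen=True)
class FileLock:
    client: int      # client session that requested the lock (assumed field)
    owner: int       # opaque owner token, e.g. derived from the file pointer
    pid: int         # informational only, reported back by F_GETLK
    start: int
    end: int         # inclusive
    exclusive: bool


def from_new_client(lock: FileLock) -> bool:
    """True if the owner field carries the new-client marker bit."""
    return bool(lock.owner & NEW_CLIENT_OWNER_BIT)


def same_owner(a: FileLock, b: FileLock) -> bool:
    # The pid is deliberately not part of the comparison.
    return a.client == b.client and a.owner == b.owner


def conflicts(a: FileLock, b: FileLock) -> bool:
    """Two locks conflict if their ranges overlap, at least one is
    exclusive, and they belong to different owners."""
    overlap = a.start <= b.end and b.start <= a.end
    return overlap and (a.exclusive or b.exclusive) and not same_owner(a, b)
```

In this model two locks with the same owner never conflict, even when their pids differ, and two locks with different owners can conflict even when the pid is the same.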
Ken Dreyer [Mon, 10 Mar 2014 22:06:48 +0000 (16:06 -0600)]
doc: rm duplicate info from release-process
The "just push the new tag" bit is already done in the list of commands
above. Remove this piece, since it's duplicated by the "git push"
command above.
The ceph-deploy and backports-sections were empty. Remove them.
Stephan Renatus [Mon, 10 Mar 2014 14:17:41 +0000 (15:17 +0100)]
rbdmap: bugfix upstart script
It seems like the upstart script is lagging a little behind [the initscript](https://github.com/ceph/ceph/blob/master/src/init-rbdmap#L44-L49); however, this bugfix makes it actually do what it should do.
Before, the bug made the job just ignore all parameters, with the following error in /var/log/upstart/rbdmap.log:
Samuel Just [Fri, 7 Mar 2014 23:54:23 +0000 (15:54 -0800)]
ReplicatedPG::finish_ctx: clear object_info if !obs.exists
Otherwise, we see a different object_info_t depending on whether the
transaction deleting the object clears before another op recreating it appears.
In particular, we use oi.version to set the prior_version on the log entries in
finish_ctx. If the oi is allowed to stick around, the recreation log event will
have the deletion event as its prior version when it should have a prior version
of eversion_t().
Fixes: #7655
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Looking for an entry in olog which matches one of ours might add
extra divergent entries. Instead, do what merge_log does and
walk back through the auth log looking for an entry in olog.
Fixes: 7657
Signed-off-by: Samuel Just <sam.just@inktank.com>
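
The backward walk can be pictured with a toy model in which each log is a list of version tuples ordered oldest to newest. The real logs are richer structures; `last_shared_entry` is a hypothetical helper used only to show the direction of the search.

```python
def last_shared_entry(auth_log, olog):
    """Walk the authoritative log from newest to oldest and return the
    first entry that also appears in olog, or None if nothing is shared."""
    other = set(olog)
    for entry in reversed(auth_log):
        if entry in other:
            return entry
    return None
```

Because the search is driven by the authoritative log, entries that exist only in olog are never picked as the match point.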
Yan, Zheng [Thu, 6 Mar 2014 23:12:39 +0000 (07:12 +0800)]
client: fix Client::getcwd()
A recent commit made the MDS stop including the dentry trace in the LOOKUPPARENT
reply, which broke Client::getcwd(). The fix is to change getcwd() to use
the LOOKUPNAME MDS request.
Yan, Zheng [Thu, 6 Mar 2014 07:24:02 +0000 (15:24 +0800)]
mds: introduce LOOKUPNAME MDS request
The new MDS request is used for connecting a given inode to its
parent inode. It allows the client to implement the get_name() NFS
export callback efficiently.
Sage Weil [Fri, 7 Mar 2014 22:02:26 +0000 (14:02 -0800)]
mon/PGMap: send pg create messages to primary, not acting[0]
For erasure pools, these may not match.
In the case of #7652, this caused pg_create messages to be sent
indefinitely. register_pg() added it to the list for acting_primary, and
when we got the (non-creating) pg stat update we removed it from the list
for acting[0].
Fixes: #7652
Signed-off-by: Sage Weil <sage@inktank.com>
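
The mismatch is easy to reproduce in a toy model. `CreatingPGs` below is a hypothetical stand-in for the monitor's per-OSD list of PGs awaiting a create message; it is not the PGMap code.

```python
from collections import defaultdict


class CreatingPGs:
    """Toy model of the per-OSD list of PGs awaiting a create message."""

    def __init__(self):
        self.by_osd = defaultdict(set)

    def register(self, pgid, osd):
        self.by_osd[osd].add(pgid)

    def unregister(self, pgid, osd):
        # Removing from the wrong OSD's list is silently a no-op.
        self.by_osd[osd].discard(pgid)

    def pending(self):
        return {pg for pgs in self.by_osd.values() for pg in pgs}
```

If a PG with `acting = [3, 1, 2]` and primary `1` is registered under the primary but unregistered under `acting[0]`, it stays in `pending()` and keeps being resent. Using the primary for both calls empties the list.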
Sage Weil [Fri, 7 Mar 2014 21:29:03 +0000 (13:29 -0800)]
mon/OSDMonitor: make osdmap feature checks non-racy
The check for OSD features may race with the boot of an OSD that does not
have the necessary features. Check the pending info too, and if there is
a missing feature, return -EAGAIN. In the callers, wait on -EAGAIN.
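
A rough sketch of the check-then-wait pattern, in Python for brevity. The function names, the bitmask representation of features and the retry limit are assumptions; the sketch also treats a missing feature in committed and pending state the same way, which the real code may not.

```python
import errno


def check_features(required, committed, pending):
    """Return 0 if every OSD, committed or pending, has the required
    feature bits; otherwise return -EAGAIN so the caller can wait."""
    for features in list(committed.values()) + list(pending.values()):
        if features & required != required:
            return -errno.EAGAIN
    return 0


def apply_when_ready(required, committed, pending, wait, max_tries=5):
    """Retry the check, calling wait() after each -EAGAIN."""
    for _ in range(max_tries):
        r = check_features(required, committed, pending)
        if r != -errno.EAGAIN:
            return r
        wait()
    return -errno.EAGAIN
```

Here `wait` stands for whatever mechanism lets the pending state settle before the check is repeated.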
qa: workunits/mon/rbd_snaps_ops.sh: ENOTSUP on snap rm from copied pool
'rados cppool' copies the contents but that doesn't make the destination
pool an unmanaged snaps pool. Therefore, we must get an ENOTSUP when
we try to remove an unmanaged snap from a not-unmanaged pool.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
mon: OSDMonitor: don't remove unmanaged snaps from not-unmanaged pools
Although we should allow creating unmanaged snaps on not-unmanaged pools,
as long as those pools don't have any managed snapshots in them, we cannot
allow removal -- because the pool will not have any unmanaged snapshots.
Fixes: 7210
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Fri, 7 Mar 2014 00:12:30 +0000 (16:12 -0800)]
osd: fix agent thread shutdown
We had an old invariant that agent_queue would have at least 1 entry in
it to simplify some other code paths, but it turns out that it is simpler
not to do that.
In particular, this was triggering a failed assertion on shutdown when we
assert that the queue is empty.
Dump any offending items on shutdown, though, to catch
future bugs.
Fixes: #7637
Signed-off-by: Sage Weil <sage@inktank.com>
Loic Dachary [Thu, 6 Mar 2014 23:07:26 +0000 (00:07 +0100)]
logrotate: copy/paste daemon list from *-all-starter.conf
Each upstart/*-all-starter.conf uses the same script to find the list of
daemons and their ids. Copy it over to the corresponding logrotate.conf
script instead of using a less reliable script based on initctl list
output.
If logrotate fails to run initctl reload on a daemon, it will keep
writing to the rotated log file, even after it is deleted and until it
fills the disk. By using the exact same shell snippet as the upstart
scripts used to start the daemon, all of them will be sent the HUP
signal and reopen the log file that was just rotated.
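
Why a missed reload matters can be shown with a small POSIX-only simulation. `Daemon`, `on_hup` and `demo` are hypothetical names; the sketch models a process that keeps its log file open across a rename, as a daemon does when logrotate moves the file.

```python
import os
import tempfile


class Daemon:
    """Toy daemon that holds its log file open, as a real daemon would."""

    def __init__(self, path):
        self.path = path
        self.log = open(path, "a")

    def write(self, line):
        self.log.write(line + "\n")
        self.log.flush()

    def on_hup(self):
        # Reopen by path, which now refers to a fresh file.
        self.log.close()
        self.log = open(self.path, "a")


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "daemon.log")
        rotated = path + ".1"
        daemon = Daemon(path)
        daemon.write("before rotation")
        os.rename(path, rotated)                  # what rotation does
        daemon.write("after rotation, no HUP")    # still goes to the old file
        missed = not os.path.exists(path)
        daemon.on_hup()
        daemon.write("after HUP")
        daemon.log.close()
        with open(rotated) as f:
            old = f.read().splitlines()
        with open(path) as f:
            new = f.read().splitlines()
        return missed, old, new
```

Until `on_hup()` runs, every write follows the open file descriptor into the rotated file, and would keep doing so even after that file is unlinked.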