Jeff Layton [Mon, 29 Aug 2016 11:16:41 +0000 (07:16 -0400)]
client: add the ability to set the btime
This adds a new set of libcephfs calls: ceph_ll_setattrx and
ceph_setattrx. This allows clients to set the btime in addition to other
values that are typically settable via ceph_setattr calls.
Currently, the setattrx mask uses the same CEPH_SETATTR values that the
ceph_setattr interface uses. I'm not sure this is what we will want
though. Would it be better to rephrase that via STATX_* constants?
Jeff Layton [Mon, 29 Aug 2016 11:16:41 +0000 (07:16 -0400)]
MDS: allow the MDS to accept requests to set the btime
Unfortunately, the only option here is to rev the MClientRequest
version as the ceph_mds_request_head is not currently versioned. Add a
new ceph_mds_request_head, which contains a new ceph_mds_request_args
structure.
The new ceph_mds_request_head is now versioned via a __le16
at the beginning of it, and then the args structure is expanded to hold
the btime. When we get a legacy ceph_mds_request_head, we just set the
new fields to zero. When encoding a reply to a legacy client, we simply
don't encode the version in the head, or the btime in the setattr union
member.
Reluctantly-Signed-off-by: Jeff Layton <jlayton@redhat.com>
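The decode rule described above can be sketched as follows. This is a minimal illustration, not the actual Ceph wire format: the field layouts, the `msg_version >= 4` threshold, and the names `encode_request` and `decode_request` are all assumptions made for the example.

```python
import struct

# Hypothetical layouts for illustration only; these are NOT the real
# ceph_mds_request_head or ceph_mds_request_args fields.
LEGACY_HEAD = struct.Struct("<QI")      # tid, op
VERSIONED_HEAD = struct.Struct("<HQI")  # __le16 version, tid, op
LEGACY_SETATTR = struct.Struct("<IQ")   # mode, mtime
NEW_SETATTR = struct.Struct("<IQQ")     # mode, mtime, btime


def encode_request(tid, op, mode, mtime, btime=0, legacy_peer=False):
    """Encode a setattr request; legacy peers get no version and no btime."""
    if legacy_peer:
        return LEGACY_HEAD.pack(tid, op) + LEGACY_SETATTR.pack(mode, mtime)
    return VERSIONED_HEAD.pack(1, tid, op) + NEW_SETATTR.pack(mode, mtime, btime)


def decode_request(payload, msg_version):
    """Decode either layout; the outer message version selects which one."""
    if msg_version >= 4:  # assumed version that carries the new head
        version, tid, op = VERSIONED_HEAD.unpack_from(payload, 0)
        mode, mtime, btime = NEW_SETATTR.unpack_from(payload, VERSIONED_HEAD.size)
    else:
        tid, op = LEGACY_HEAD.unpack_from(payload, 0)
        mode, mtime = LEGACY_SETATTR.unpack_from(payload, LEGACY_HEAD.size)
        version, btime = 0, 0  # legacy head: the new fields are set to zero
    return {"version": version, "tid": tid, "op": op,
            "mode": mode, "mtime": mtime, "btime": btime}
```

Because the legacy head carries no version field of its own, the receiver has to learn which layout to expect from the enclosing message version, which is why the message version needs to be revved.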
Jeff Layton [Mon, 29 Aug 2016 11:16:40 +0000 (07:16 -0400)]
cephfs: rename ceph_mds_request_head and _args with a _legacy postfix
We're going to need to introduce new versions of these structures in
order to expand the setattr union member. Rename the existing ones so
that it's clear that they are for legacy clients and servers.
Jeff Layton [Mon, 29 Aug 2016 11:16:40 +0000 (07:16 -0400)]
mds: make frag_info_t add_dirty() function take a pointer to touched_mtime
...rather than messing around with references. While we're at it, we
can also make the argument optional, which allows us to drop an unused
stack variable from CDir::split.
Jeff Layton [Mon, 29 Aug 2016 11:16:40 +0000 (07:16 -0400)]
mds/client: bump the change_attr at the appropriate time for files
The semantics for a change_attr are that it should be incremented
whenever there is a change to the ctime in the inode. Add those
increments for the simple case of regular files. Directories however can
be fragmented so we'll need to do something more elaborate there.
Jeff Layton [Mon, 29 Aug 2016 14:33:10 +0000 (10:33 -0400)]
mds: ensure that change_attr reflects metadata changes on clients that hold CAP_FILE_EXCL
Suppose we have two clients. client1 holds FILE_EXCL cap and client2
holds AUTH_EXCL. Both have the change_attr at the same value (call it
1). client1 does 2 writes and its change_attr goes to 3. client1
then queries for the change_attr and gets back 3 from the cache. The MDS
then recalls FILE_EXCL from client1 and now the MDS and client1 have the
same change_attr (3).
client2 then does a chmod on the file, and its change_attr goes to 2.
client1 then does a statx with STX_VERSION|STX_MODE. The MDS recalls the
AUTH_EXCL cap from client2, the change_attr in the MClientCaps is less
than the one in the MDS inode, so it gets discarded. client1 then sees
a new mode but the change_attr value has not changed, which violates the
rules.
Fix this with an extra increment of the MDS copy of the change_attr when
the caps being returned are dirty, and they don't contain exclusive write
caps.
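The scenario above can be replayed with a toy model. This is only a sketch under assumptions: `MdsInode`, `handle_cap_flush` and the `with_fix` switch are hypothetical names, and the merge logic is reduced to the two rules the message describes (discard an older change_attr, then bump when dirty caps come back without FILE_EXCL).

```python
# Hypothetical cap names and merge logic, for illustration only.
FILE_EXCL = "FILE_EXCL"
AUTH_EXCL = "AUTH_EXCL"


class MdsInode:
    def __init__(self, change_attr=1, mode=0o644):
        self.change_attr = change_attr
        self.mode = mode

    def handle_cap_flush(self, client_change_attr, dirty_caps,
                         new_mode=None, with_fix=True):
        """Merge state flushed by a client that is returning caps."""
        # A client value older than the MDS copy is discarded.
        if client_change_attr > self.change_attr:
            self.change_attr = client_change_attr
        if new_mode is not None:
            self.mode = new_mode
        # The fix: dirty caps that do not include exclusive write caps
        # get one extra increment on the MDS copy.
        if with_fix and dirty_caps and FILE_EXCL not in dirty_caps:
            self.change_attr += 1


inode = MdsInode()
inode.handle_cap_flush(3, {FILE_EXCL})                  # client1: two writes
inode.handle_cap_flush(2, {AUTH_EXCL}, new_mode=0o600)  # client2: chmod
print(inode.change_attr, oct(inode.mode))
```

With `with_fix=False` on both calls the change_attr stays at 3 even though the mode changed, which is the violation described above.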
Jeff Layton [Mon, 29 Aug 2016 11:16:39 +0000 (07:16 -0400)]
mds/client: add btime to CapSnap and MClientCaps
Currently we don't have a mechanism to set the btime, but we will need
that eventually. If we want to allow the client to cache that change, we
need to be able to pass it back and forth between client and server.
Jeff Layton [Mon, 29 Aug 2016 11:16:39 +0000 (07:16 -0400)]
libcephfs: add a test for "lazy" statx
Create 2 clients. Create a file in client1, and do a lookup of it in
client2, and then ll_getattrx it from client2. chmod the file from
client1, ll_getattrx it from client2 (this time with AT_NO_ATTR_SYNC) and
ensure that the ctime change is not seen.
Jeff Layton [Mon, 29 Aug 2016 11:16:38 +0000 (07:16 -0400)]
libcephfs: add a ceph_ll_getattrx and ceph_statx
New interfaces for fetching extended (and selective) stat information.
Additionally, applications can specify AT_NO_ATTR_SYNC in the flags to
indicate that they want to do a "lazy" statx that just hands out the
inode info from the cache, or AT_SYMLINK_NOFOLLOW to avoid following
symlinks when walking the path.
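The difference between a normal and a "lazy" statx can be modelled roughly as below. This is a toy sketch, not the libcephfs API: the `Server` and `Client` classes and the flag value are assumptions, and real clients decide what to refetch based on the caps they hold rather than always refetching.

```python
# Hypothetical flag value and classes; a toy model, not the libcephfs API.
AT_NO_ATTR_SYNC = 0x4000


class Server:
    def __init__(self):
        self.attrs = {"mode": 0o644, "ctime": 1}

    def chmod(self, mode):
        # A mode change also updates the ctime.
        self.attrs["mode"] = mode
        self.attrs["ctime"] += 1


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = None

    def statx(self, flags=0):
        lazy = bool(flags & AT_NO_ATTR_SYNC)
        # A lazy call hands out cached inode info if there is any;
        # otherwise the attributes are fetched from the server.
        if self.cache is None or not lazy:
            self.cache = dict(self.server.attrs)
        return dict(self.cache)


server = Server()
client2 = Client(server)
first = client2.statx()
server.chmod(0o600)                     # change made via another client
lazy = client2.statx(AT_NO_ATTR_SYNC)   # still shows the old ctime
fresh = client2.statx()                 # sees the new mode and ctime
```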
Jeff Layton [Mon, 29 Aug 2016 11:16:37 +0000 (07:16 -0400)]
client: pass a mask parameter to path_walk
ll_walk expects to get a set of attributes out of a path_walk. Pass a
caps mask parameter into path_walk, and then apply it when we reach the
last component of the path.
This may prevent us from having to further interact with the server after
the pathwalk, in some cases. If we know that we're going to need certain
caps to do the actual operation we can request them during the lookup
and may have all that we need by the time we go to do the real request.
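The saving can be illustrated by counting round trips in a toy model. Everything here is hypothetical (`ToyServer`, `path_walk`, `stat` and the attribute names); it only shows why applying the mask on the last path component can make a follow-up request unnecessary. It assumes a non-empty path.

```python
class ToyServer:
    """Counts requests; every lookup or getattr is one round trip."""

    def __init__(self):
        self.round_trips = 0

    def lookup(self, path, mask=frozenset()):
        self.round_trips += 1
        # The reply carries whatever attributes the mask asked for.
        return {"path": path, "have": set(mask)}

    def getattr(self, inode, mask):
        self.round_trips += 1
        inode["have"] |= set(mask)


def path_walk(server, path, mask=frozenset()):
    """Walk the path, applying the mask only on the last component."""
    parts = [p for p in path.split("/") if p]
    cur, inode = "", None
    for i, part in enumerate(parts):
        cur += "/" + part
        is_last = i == len(parts) - 1
        inode = server.lookup(cur, mask if is_last else frozenset())
    return inode


def stat(server, path, want, mask_in_walk):
    inode = path_walk(server, path, want if mask_in_walk else frozenset())
    if not want <= inode["have"]:
        server.getattr(inode, want)  # extra request after the walk
    return inode


without_mask, with_mask = ToyServer(), ToyServer()
stat(without_mask, "/a/b/c", {"mode", "size"}, mask_in_walk=False)
stat(with_mask, "/a/b/c", {"mode", "size"}, mask_in_walk=True)
print(without_mask.round_trips, with_mask.round_trips)
```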
Nathan Cutler [Sat, 27 Aug 2016 18:11:04 +0000 (20:11 +0200)]
doc: do not list all major versions in get-packages.rst
The list of major versions is difficult to maintain. This commit drops it and
replaces it with a link to releases.rst plus some general language about how we
recommend that everyone keep their clusters up-to-date.
Sage Weil [Wed, 24 Aug 2016 17:02:07 +0000 (13:02 -0400)]
os/bluestore: ensure block device size is a multiple of the block size
We might have a backing device that is an odd number of 512-byte sectors
but have the block_size configured to 4096. Ensure the reported size
rounds down to avoid confusing other layers of the stack.
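The rounding itself is simple arithmetic. A minimal sketch, where the function name `round_down_to_block` and the example device size are assumptions and not taken from the BlueStore code:

```python
def round_down_to_block(size_bytes, block_size):
    """Return the largest multiple of block_size that is <= size_bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return size_bytes - (size_bytes % block_size)


# A device with an odd number of 512-byte sectors and a 4096-byte block size.
device_bytes = 1001 * 512                            # 512512 bytes
usable = round_down_to_block(device_bytes, 4096)     # 512000 bytes
```

The trailing 512 bytes are simply not reported, so every other layer only ever sees whole blocks.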
Tim Serong [Fri, 19 Aug 2016 11:16:48 +0000 (21:16 +1000)]
ceph.spec.in: don't try to package __pycache__ for SUSE
When building on openSUSE Tumbleweed, nothing seems to create
the various __pycache__ directories (so the build fails because
those files don't exist), and in any case they should be
created automatically at runtime, so shouldn't need to be
packaged. However, the Fedora packaging guidelines suggest
including __pycache__, so I've used a %suse_version guard here.
Fixes: http://tracker.ceph.com/issues/17106
Signed-off-by: Tim Serong <tserong@suse.com>
Loic Dachary [Tue, 23 Aug 2016 10:17:00 +0000 (12:17 +0200)]
tests: populate /dev/disk/by-partuuid for scsi_debug
The scsi_debug SCSI devices do not have a symlink in /dev/disk/by-partuuid
because they are filtered out by 60-persistent-storage.rules. That was
worked around by 60-ceph-partuuid-workaround-rules which has been
removed by 9f76b9ff31525eac01f04450d72559ec99927496.
Add rules targeting this specific case, only for tests, since the
problem does not show up in real use cases.
Casey Bodley [Tue, 23 Aug 2016 19:10:44 +0000 (15:10 -0400)]
rgw: delete region map after upgrade to zonegroup map
convert_regionmap() reads the region map and uses it to initialize the
zonegroup map, but it doesn't remove the region_map afterwards, so
radosgw (and some radosgw-admin commands) will keep doing this on
startup, overwriting any changes made to the period/zonegroup map