From: Abhishek Lekshmanan
Date: Thu, 22 Jun 2017 14:19:13 +0000 (+0200)
Subject: doc: rgw add some basic documentation for sync plugins & ES
X-Git-Tag: v13.0.2~140^2~1
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=4306602862bd22072a7f296ee681cf47524f0328;p=ceph.git

doc: rgw add some basic documentation for sync plugins & ES

Mostly a rst formatted C-c C-v of Yehuda's mail to the ceph-devel lists

Signed-off-by: Abhishek Lekshmanan
---

diff --git a/doc/radosgw/elastic-sync-module.rst b/doc/radosgw/elastic-sync-module.rst
new file mode 100644
index 000000000000..d6108b41022f
--- /dev/null
+++ b/doc/radosgw/elastic-sync-module.rst
@@ -0,0 +1,181 @@
+=========================
+ElasticSearch Sync Module
+=========================
+
+.. versionadded:: Kraken
+
+This sync module writes the metadata from other zones to `ElasticSearch`_. As of
+Luminous, the following JSON shows the data fields currently stored in
+ElasticSearch.
+
+::
+
+  {
+    "_index" : "rgw-gold-ee5863d6",
+    "_type" : "object",
+    "_id" : "34137443-8592-48d9-8ca7-160255d52ade.34137.1:object1:null",
+    "_score" : 1.0,
+    "_source" : {
+      "bucket" : "testbucket123",
+      "name" : "object1",
+      "instance" : "null",
+      "versioned_epoch" : 0,
+      "owner" : {
+        "id" : "user1",
+        "display_name" : "user1"
+      },
+      "permissions" : [
+        "user1"
+      ],
+      "meta" : {
+        "size" : 712354,
+        "mtime" : "2017-05-04T12:54:16.462Z",
+        "etag" : "7ac66c0f148de9519b8bd264312c4d64"
+      }
+    }
+  }
+
+
+
+ElasticSearch tier type configurables
+-------------------------------------
+
+* ``endpoint``
+
+Specifies the Elasticsearch server endpoint to access.
+
+* ``num_shards`` (integer)
+
+The number of shards that Elasticsearch will be configured with on
+data sync initialization. Note that this cannot be changed after init.
+Any change here requires a rebuild of the Elasticsearch index and a
+reinit of the data sync process.
+
+* ``num_replicas`` (integer)
+
+The number of replicas that Elasticsearch will be configured with
+on data sync initialization.
+
+* ``explicit_custom_meta`` (true | false)
+
+Specifies whether all user custom metadata will be indexed, or whether
+the user will need to configure (at the bucket level) which custom
+metadata entries should be indexed. This is false by default.
+
+* ``index_buckets_list`` (comma separated list of strings)
+
+If empty, all buckets will be indexed. Otherwise, only buckets
+specified here will be indexed. It is possible to provide bucket
+prefixes (e.g., foo\*), or bucket suffixes (e.g., \*bar).
+
+* ``approved_owners_list`` (comma separated list of strings)
+
+If empty, buckets of all owners will be indexed (subject to other
+restrictions). Otherwise, only buckets owned by the specified owners
+will be indexed. Suffixes and prefixes can also be provided.
+
+* ``override_index_path`` (string)
+
+If not empty, this string will be used as the Elasticsearch index
+path. Otherwise the index path will be determined and generated on
+sync initialization.
+
+
+End user metadata queries
+-------------------------
+
+.. versionadded:: Luminous
+
+Since the ElasticSearch cluster now stores object metadata, it is important that
+the ElasticSearch endpoint is not exposed to the public and is accessible only
+to the cluster administrators. This makes it hard to expose metadata queries to
+end users: each user should be able to query only their own metadata and not
+that of other users, which would require the ElasticSearch cluster to
+authenticate users the same way RGW does.
+
+As of Luminous, RGW in the metadata master zone can service end user
+requests. This avoids exposing the Elasticsearch endpoint publicly and
+also solves the authentication and authorization problem, since RGW itself can
+authenticate the end user requests.
+For this purpose RGW introduces a new query in the bucket APIs that can
+service Elasticsearch requests. All these requests must be sent to the
+metadata master zone.
+
+Syntax
+~~~~~~
+
+Get an elasticsearch query
+``````````````````````````
+
+::
+
+   GET /{bucket}?query={query-expr}
+
+request params:
+ - max-keys: max number of entries to return
+ - marker: pagination marker
+
+``expression := [(]<arg> <op> <value> [)][<and|or> ...]``
+
+op is one of the following:
+<, <=, ==, >=, >
+
+For example ::
+
+   GET /?query=name==foo
+
+This returns all the indexed keys that the user has read permission to and
+that are named 'foo'.
+
+The output will be a list of keys in XML that is similar to the S3
+list buckets response.
+
+Configure custom metadata fields
+````````````````````````````````
+
+Define which custom metadata entries should be indexed (under the
+specified bucket), and the types of these keys. If explicit
+custom metadata indexing is configured, this is needed so that rgw
+will index the specified custom metadata values. Otherwise it is
+needed in cases where the indexed metadata keys are of a type other
+than string.
+
+::
+
+   POST /{bucket}?mdsearch
+   x-amz-meta-search: <key [; type]> [, ...]
+
+Multiple metadata fields must be comma separated; a type can be forced for a
+field with a ``;``. The currently allowed types are string (default), integer
+and date.
+
+For example, to index the custom object metadata x-amz-meta-year as int,
+x-amz-meta-release-date as date and x-amz-meta-title as string, you would do
+
+::
+
+   POST /mybooks?mdsearch
+   x-amz-meta-search: x-amz-meta-year;int, x-amz-meta-release-date;date, x-amz-meta-title;string
+
+
+Delete custom metadata configuration
+````````````````````````````````````
+
+Delete custom metadata bucket configuration.
+
+::
+
+   DELETE /{bucket}?mdsearch
+
+Get custom metadata configuration
+`````````````````````````````````
+
+Retrieve custom metadata bucket configuration.
+
+::
+
+   GET /{bucket}?mdsearch
+
+
+.. _`Elasticsearch`: https://github.com/elastic/elasticsearch
diff --git a/doc/radosgw/sync-plugins.rst b/doc/radosgw/sync-plugins.rst
new file mode 100644
index 000000000000..4b8917488581
--- /dev/null
+++ b/doc/radosgw/sync-plugins.rst
@@ -0,0 +1,89 @@
+============
+Sync Modules
+============
+
+.. versionadded:: Kraken
+
+The `Multisite`_ functionality of RGW, introduced in Jewel, made it possible to
+create multiple zones and mirror data and metadata between them. ``Sync Modules``
+are built atop the multisite framework and allow for forwarding data and
+metadata to a different external tier. A sync module allows for a set of actions
+to be performed whenever a change in data occurs (metadata ops like bucket or
+user creation etc. are also regarded as changes in data). As the rgw multisite
+changes are eventually consistent at remote sites, changes are propagated
+asynchronously. This unlocks use cases such as backing up the
+object storage to an external cloud cluster or a custom backup solution using
+tape drives, indexing metadata in ElasticSearch etc.
+
+A sync module configuration is local to a zone. The sync module determines
+whether the zone exports data or can only consume data that was modified in
+another zone. As of Luminous the supported sync plugins are `elasticsearch`_;
+``rgw``, the default sync plugin, which synchronises data between the
+zones; and ``log``, a trivial sync plugin that logs the metadata
+operations that happen in the remote zones. The following docs use
+the example of a zone using the `elasticsearch sync module`_; the process is
+similar for configuring any other sync plugin.
+
+.. 
note:: ``rgw`` is the default sync plugin and there is no need to explicitly
+   configure this.
+
+Requirements and Assumptions
+----------------------------
+
+Let us assume a simple multisite configuration as described in the `Multisite`_
+docs, with two zones, ``us-east`` and ``us-west``. Let's add a third zone,
+``us-east-es``, which is a zone that only processes metadata from the other
+sites. This zone can be in the same or a different Ceph cluster as ``us-east``.
+This zone would only consume metadata from other zones, and RGWs in this zone
+will not serve any end user requests directly.
+
+
+Configuring Sync Modules
+------------------------
+
+Create the third zone similar to the `Multisite`_ docs, for example
+
+::
+
+  # radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-es \
+  --access-key={system-key} --secret={secret} --endpoints=http://rgw-es:80
+
+
+
+A sync module can be configured for this zone via the following
+
+::
+
+  # radosgw-admin zone modify --rgw-zone={zone-name} --tier-type={tier-type} --tier-config={set of key=value pairs}
+
+
+For example, for the ``elasticsearch`` sync module
+
+::
+
+  # radosgw-admin zone modify --rgw-zone={zone-name} --tier-type=elasticsearch \
+                              --tier-config=endpoint=http://localhost:9200,num_shards=10,num_replicas=1
+
+
+For the various supported tier-config options, refer to the `elasticsearch sync module`_ docs.
+
+Finally, update the period
+
+
+::
+
+  # radosgw-admin period update --commit
+
+
+Now start the radosgw in the zone
+
+::
+
+  # systemctl start ceph-radosgw@rgw.`hostname -s`
+  # systemctl enable ceph-radosgw@rgw.`hostname -s`
+
+
+
+.. _`Multisite`: ../multisite
+.. _`elasticsearch sync module`: ../elastic-sync-module
+.. _`elasticsearch`: ../elastic-sync-module
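
As a rough illustration of the ``--tier-config`` format used in the ``zone modify`` examples of this patch, the following Python sketch joins a dictionary of options into the comma-separated ``key=value`` string and assembles the argument list. The helper names ``build_tier_config`` and ``zone_modify_command`` are hypothetical and not part of Ceph. The sketch assumes option values contain no commas, since it does no escaping, and it only prints the command rather than running it.

```python
import shlex


def build_tier_config(options):
    """Join options into the comma-separated key=value string for --tier-config.

    Hypothetical helper; assumes values contain no commas (no escaping is done).
    """
    parts = []
    for key, value in options.items():
        # Booleans are rendered as the lowercase true/false used in the docs.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ",".join(parts)


def zone_modify_command(zone, tier_type, options):
    """Return the argv list for the `radosgw-admin zone modify` call (hypothetical helper)."""
    return [
        "radosgw-admin", "zone", "modify",
        f"--rgw-zone={zone}",
        f"--tier-type={tier_type}",
        f"--tier-config={build_tier_config(options)}",
    ]


cmd = zone_modify_command(
    "us-east-es",
    "elasticsearch",
    {"endpoint": "http://localhost:9200", "num_shards": 10, "num_replicas": 1},
)
# Print a shell-quoted form of the command for review before running it.
print(shlex.join(cmd))
```

Building the argument list separately from the string makes it easy to review the exact command before passing it to a process runner, and the period still has to be committed afterwards with ``radosgw-admin period update --commit``.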