From: Cole Mitchell
Date: Sun, 16 Apr 2023 13:13:56 +0000 (-0400)
Subject: doc/radosgw: format part of s3select
X-Git-Tag: v17.2.7~462^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F51105%2Fhead;p=ceph.git

doc/radosgw: format part of s3select

Format the first section of s3select. Nothing else is being fixed.

Signed-off-by: Cole Mitchell
(cherry picked from commit a6a84471a7af154e7ccc93f51df2fc9744dc606c)
---

diff --git a/doc/radosgw/s3select.rst b/doc/radosgw/s3select.rst
index 3e38eb6ca91d..2ddc6ea5117f 100644
--- a/doc/radosgw/s3select.rst
+++ b/doc/radosgw/s3select.rst
@@ -7,18 +7,23 @@
 Overview
 --------
 
- | The purpose of the **s3 select** engine is to create an efficient pipe between user client and storage nodes (the engine should be close as possible to storage).
- | It enables the selection of a restricted subset of (structured) data stored in an S3 object using an SQL-like syntax.
- | It also enables for higher level analytic-applications (such as SPARK-SQL), using that feature to improve their latency and throughput.
-
- | For example, an s3-object of several GB (CSV file), a user needs to extract a single column filtered by another column.
- | As the following query:
- | ``select customer-id from s3Object where age>30 and age<65;``
-
- | Currently the whole s3-object must be retrieved from OSD via RGW before filtering and extracting data.
- | By "pushing down" the query into radosgw, it's possible to save a lot of network and CPU(serialization / deserialization).
-
- | **The bigger the object, and the more accurate the query, the better the performance**.
+The purpose of the **s3 select** engine is to create an efficient pipe between
+user client and storage nodes (the engine should be close as possible to
+storage). It enables the selection of a restricted subset of (structured) data
+stored in an S3 object using an SQL-like syntax. It also enables for higher
+level analytic-applications (such as SPARK-SQL), using that feature to improve
+their latency and throughput.
+
+For example, an s3-object of several GB (CSV file), a user needs to extract a
+single column filtered by another column. As the following query: ``select
+customer-id from s3Object where age>30 and age<65;``
+
+Currently the whole s3-object must be retrieved from OSD via RGW before
+filtering and extracting data. By "pushing down" the query into radosgw, it's
+possible to save a lot of network and CPU(serialization / deserialization).
+
+ **The bigger the object, and the more accurate the query, the better the
+ performance**.
 
 Basic workflow
 --------------