From bdc0681831038710cbb1a9438995c799e054b8e8 Mon Sep 17 00:00:00 2001
From: Cole Mitchell
Date: Sun, 16 Apr 2023 09:13:56 -0400
Subject: [PATCH] doc/radosgw: format part of s3select

Format the first section of s3select. Nothing else is being fixed.

Signed-off-by: Cole Mitchell
(cherry picked from commit a6a84471a7af154e7ccc93f51df2fc9744dc606c)
---
 doc/radosgw/s3select.rst | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/doc/radosgw/s3select.rst b/doc/radosgw/s3select.rst
index 3e38eb6ca91d7..2ddc6ea5117fb 100644
--- a/doc/radosgw/s3select.rst
+++ b/doc/radosgw/s3select.rst
@@ -7,18 +7,23 @@
 Overview
 --------
 
- | The purpose of the **s3 select** engine is to create an efficient pipe between user client and storage nodes (the engine should be close as possible to storage).
- | It enables the selection of a restricted subset of (structured) data stored in an S3 object using an SQL-like syntax.
- | It also enables for higher level analytic-applications (such as SPARK-SQL), using that feature to improve their latency and throughput.
-
- | For example, an s3-object of several GB (CSV file), a user needs to extract a single column filtered by another column.
- | As the following query:
- | ``select customer-id from s3Object where age>30 and age<65;``
-
- | Currently the whole s3-object must be retrieved from OSD via RGW before filtering and extracting data.
- | By "pushing down" the query into radosgw, it's possible to save a lot of network and CPU(serialization / deserialization).
-
- | **The bigger the object, and the more accurate the query, the better the performance**.
+The purpose of the **s3 select** engine is to create an efficient pipe between
+user client and storage nodes (the engine should be close as possible to
+storage). It enables the selection of a restricted subset of (structured) data
+stored in an S3 object using an SQL-like syntax. It also enables for higher
+level analytic-applications (such as SPARK-SQL), using that feature to improve
+their latency and throughput.
+
+For example, an s3-object of several GB (CSV file), a user needs to extract a
+single column filtered by another column. As the following query: ``select
+customer-id from s3Object where age>30 and age<65;``
+
+Currently the whole s3-object must be retrieved from OSD via RGW before
+filtering and extracting data. By "pushing down" the query into radosgw, it's
+possible to save a lot of network and CPU(serialization / deserialization).
+
+ **The bigger the object, and the more accurate the query, the better the
+ performance**.
 
 Basic workflow
 --------------
-- 
2.39.5