doc/radosgw: edit "Basic Workflow" in s3select.rst

author Zac Dover <zac.dover@proton.me>

Thu, 29 Jun 2023 08:48:00 +0000 (18:48 +1000)

committer Zac Dover <zac.dover@proton.me>

Thu, 29 Jun 2023 15:27:52 +0000 (01:27 +1000)
diff --git a/doc/radosgw/s3select.rst b/doc/radosgw/s3select.rst

index 8d177ff27328e4a1e8bf6badef550a662bf91164..0c1046a3883c5691f07dd6ebc273643b394f0d67 100644
--- a/doc/radosgw/s3select.rst
+++ b/doc/radosgw/s3select.rst
@@ -32,30 +32,29 @@ the better the performance of s3select**.
  Basic Workflow
  --------------
      
-S3-select query is sent to RGW via `AWS-CLI
+S3 Select queries are sent to RGW via `AWS-CLI
  <https://docs.aws.amazon.com/cli/latest/reference/s3api/select-object-content.html>`_
  
-It passes the authentication and permission process as an incoming message
-(POST). **RGWSelectObj_ObjStore_S3::send_response_data** is the “entry point”,
-it handles each fetched chunk according to input object-key.
-**send_response_data** is first handling the input query, it extracts the query
-and other CLI parameters.
+S3 Select passes the authentication and permission parameters as an incoming
+message (POST). ``RGWSelectObj_ObjStore_S3::send_response_data`` is the entry
+point and handles each fetched chunk according to the object key that was
+input.  ``send_response_data`` is the first to handle the input query: it
+extracts the query and other CLI parameters.
     
-Per each new fetched chunk (~4m), RGW executes an s3-select query on it. The
-current implementation supports CSV objects and since chunks are randomly
-“cutting” the CSV rows in the middle, those broken-lines (first or last per
-chunk) are skipped while processing the query.   Those “broken” lines are
-stored and later merged with the next broken-line (belong to the next chunk),
-and finally processed.
-   
-Per each processed chunk an output message is formatted according to `AWS
+RGW executes an S3 Select query on each new fetched chunk (up to 4 MB). The
+current implementation supports CSV objects. CSV rows are sometimes "cut" in
+the middle by the limits of the chunks, and those broken-lines (the first or
+last per chunk) are skipped while processing the query. Such broken lines are
+stored and later merged with the next broken line (which belongs to the next
+chunk), and only then processed.
+
+For each processed chunk, an output message is formatted according to `aws
  specification
-<https://docs.aws.amazon.com/AmazonS3/latest/API/archive-RESTObjectSELECTContent.html#archive-RESTObjectSELECTContent-responses>`_
-and sent back to the client.  RGW supports the following response:
+<https://docs.aws.amazon.com/amazons3/latest/api/archive-restobjectselectcontent.html#archive-restobjectselectcontent-responses>`_
+and sent back to the client. RGW supports the following response:
  ``{:event-type,records} {:content-type,application/octet-stream}
-{:message-type,event}``.  For aggregation queries the last chunk should be
-identified as the end of input, following that the s3-select-engine initiates
-end-of-process and produces an aggregated result.  
+{:message-type,event}``. For aggregation queries, the last chunk should be
+identified as the end of input. 
  
          
  Basic Functionalities
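
Note on the chunk handling described in the edited text: the idea that a row cut by a chunk boundary is held back and merged with its continuation from the next chunk can be sketched in Python as follows. This is an illustrative sketch only, not RGW's implementation. The names ``rows_from_chunks`` and ``sum_column`` are hypothetical, rows are assumed to end in ``\n``, and quoted fields containing newlines are not handled.

```python
from typing import Iterable, Iterator


def rows_from_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield complete CSV rows from byte chunks whose boundaries may split a row.

    Hypothetical helper for illustration; assumes newline-terminated rows.
    """
    carry = b""  # partial row left over from the end of the previous chunk
    for chunk in chunks:
        data = carry + chunk  # merge the stored fragment with its continuation
        # Everything before the last newline is complete; the tail may be broken.
        *complete, carry = data.split(b"\n")
        for row in complete:
            yield row.decode("utf-8")
    if carry:
        # End of input: the final fragment is a full row without a trailing newline.
        yield carry.decode("utf-8")


def sum_column(chunks: Iterable[bytes], column: int) -> int:
    """Aggregate over all rows; the result is known only after the last chunk."""
    return sum(
        int(row.split(",")[column])
        for row in rows_from_chunks(chunks)
        if row  # skip empty lines
    )


if __name__ == "__main__":
    # The row "b,20" is split across the two chunks.
    example = [b"a,1\nb,2", b"0\nc,3\n"]
    print(list(rows_from_chunks(example)))
    print(sum_column(example, 1))
```

The aggregate can only be produced once the input is exhausted, which is why the text says the last chunk must be identified as the end of input for aggregation queries.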