From: Zac Dover
Date: Thu, 29 Jun 2023 08:48:00 +0000 (+1000)
Subject: doc/radosgw: edit "Basic Workflow" in s3select.rst
X-Git-Tag: v18.1.3~62^2
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=60af02c6cd9dd53c42c556b125b3769b56c65e50;p=ceph-ci.git

doc/radosgw: edit "Basic Workflow" in s3select.rst

Edit the "Basic Workflow" section in doc/radosgw/s3select.rst.

Co-authored-by: Anthony D'Atri
Signed-off-by: Zac Dover
(cherry picked from commit 4d2c09b683421552cfb4df7f467f2d9a6c9c7c26)
---

diff --git a/doc/radosgw/s3select.rst b/doc/radosgw/s3select.rst
index 8b502cf6f50..cbd424ea265 100644
--- a/doc/radosgw/s3select.rst
+++ b/doc/radosgw/s3select.rst
@@ -28,30 +28,29 @@ possible to save a lot of network and CPU(serialization / deserialization).
 Basic Workflow
 --------------
 
-S3-select query is sent to RGW via `AWS-CLI
+S3 Select queries are sent to RGW via `AWS-CLI
 `_
 
-It passes the authentication and permission process as an incoming message
-(POST). **RGWSelectObj_ObjStore_S3::send_response_data** is the “entry point”,
-it handles each fetched chunk according to input object-key.
-**send_response_data** is first handling the input query, it extracts the query
-and other CLI parameters.
+S3 Select passes the authentication and permission parameters as an incoming
+message (POST). ``RGWSelectObj_ObjStore_S3::send_response_data`` is the entry
+point and handles each fetched chunk according to the object key that was
+input. ``send_response_data`` is the first to handle the input query: it
+extracts the query and other CLI parameters.
 
-Per each new fetched chunk (~4m), RGW executes an s3-select query on it. The
-current implementation supports CSV objects and since chunks are randomly
-“cutting” the CSV rows in the middle, those broken-lines (first or last per
-chunk) are skipped while processing the query. Those “broken” lines are
-stored and later merged with the next broken-line (belong to the next chunk),
-and finally processed.
-
-Per each processed chunk an output message is formatted according to `AWS
+RGW executes an S3 Select query on each new fetched chunk (up to 4 MB). The
+current implementation supports CSV objects. CSV rows are sometimes "cut" in
+the middle by the limits of the chunks, and those broken-lines (the first or
+last per chunk) are skipped while processing the query. Such broken lines are
+stored and later merged with the next broken line (which belongs to the next
+chunk), and only then processed.
+
+For each processed chunk, an output message is formatted according to `aws
 specification
-`_
-and sent back to the client. RGW supports the following response:
+`_
+and sent back to the client. RGW supports the following response:
 ``{:event-type,records} {:content-type,application/octet-stream}
-{:message-type,event}``. For aggregation queries the last chunk should be
-identified as the end of input, following that the s3-select-engine initiates
-end-of-process and produces an aggregated result.
+{:message-type,event}``. For aggregation queries, the last chunk should be
+identified as the end of input.
 
 
 Basic Functionalities
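
Note on the chunk handling described in the edited "Basic Workflow" text: the idea of holding back a row that is cut by a chunk boundary and merging it with the start of the next chunk can be sketched in a few lines of Python. This is only an illustration and not RGW's implementation. The function name `rows_from_chunks` is hypothetical, and the sketch assumes rows end with `\n` and contain no quoted newlines.

```python
def rows_from_chunks(chunks):
    """Yield complete CSV rows from an iterable of byte chunks.

    Hypothetical helper for illustration only; assumes "\n" row
    terminators and no newlines inside quoted fields.
    """
    carry = b""  # broken trailing row held over from the previous chunk
    for chunk in chunks:
        # Merge the stored fragment with the head of the next chunk.
        data = carry + chunk
        lines = data.split(b"\n")
        # The last piece is either empty (chunk ended on a row boundary)
        # or an incomplete row that must wait for the next chunk.
        carry = lines.pop()
        for line in lines:
            yield line.decode("utf-8")
    if carry:
        # End of input: the final row had no trailing newline.
        yield carry.decode("utf-8")


# Rows "b,2" and "d,4" are cut by chunk boundaries in this input.
chunks = [b"a,1\nb,", b"2\nc,3\n", b"d,4"]
rows = list(rows_from_chunks(chunks))
```

Here `rows` ends up as `["a,1", "b,2", "c,3", "d,4"]`: no row is dropped or duplicated, even though two of them straddle chunk boundaries.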