From: Mark Kogan Date: Mon, 20 Jul 2020 10:19:57 +0000 (+0300) Subject: rgw: add PUT and POST req support to data cache X-Git-Tag: v16.1.0~1217^2 X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=6251d2b6c111117828a28a63e1fd98d019d077c8;p=ceph.git rgw: add PUT and POST req support to data cache facilitates the full usage of the Nginx cache endpoint with s3 tools that support AWSv4 like s3cmd,aws-cli, benchmarking tools like hsbench and also hadoop/s3a. Co-authored-by: Or Friedmann Signed-off-by: Mark Kogan --- diff --git a/doc/radosgw/rgw-cache.rst b/doc/radosgw/rgw-cache.rst index d28a73887fb8..9b4c96f1e2b0 100644 --- a/doc/radosgw/rgw-cache.rst +++ b/doc/radosgw/rgw-cache.rst @@ -1,4 +1,4 @@ -========================== +========================== RGW Data caching and CDN ========================== @@ -7,28 +7,32 @@ RGW Data caching and CDN .. contents:: This feature adds to RGW the ability to securely cache objects and offload the workload from the cluster, using Nginx. -After an object is accessed the first time it will be stored in the Nginx directory. +After an object is accessed the first time it will be stored in the Nginx cache directory. When data is already cached, it need not be fetched from RGW. A permission check will be made against RGW to ensure the requesting user has access. -This feature is based on some Nginx modules, ngx_http_auth_request_module, https://github.com/kaltura/nginx-aws-auth-module, Openresty for lua capabilities. -Currently this feature only works for GET requests and it will cache only AWSv4 requests (only s3 requests). +This feature is based on some Nginx modules, ngx_http_auth_request_module, https://github.com/kaltura/nginx-aws-auth-module, Openresty for Lua capabilities. 
+ +Currently, this feature will cache only AWSv4 requests (only s3 requests), caching-in the output of the 1st GET request +and caching-out on subsequent GET requests, passing thru transparently PUT,POST,HEAD,DELETE and COPY requests. + + The feature introduces 2 new APIs: Auth and Cache. New APIs ------------------------- -There are 2 new apis for this feature: +There are 2 new APIs for this feature: -Auth API - The cache uses this to validate that an user can access the cached data +Auth API - The cache uses this to validate that a user can access the cached data Cache API - Adds the ability to override securely Range header, that way Nginx can use it is own smart cache on top of S3: https://www.nginx.com/blog/smart-efficient-byte-range-caching-nginx/ -Using this API gives the ability to read ahead objects when clients asking a specific range from the object. -On subsequent accesses to the cached object, Nginx will satisfy requests for already-cached ranges from cache. Uncached ranges will be read from RGW (and cached). +Using this API gives the ability to read ahead objects when clients asking a specific range from the object. +On subsequent accesses to the cached object, Nginx will satisfy requests for already-cached ranges from the cache. Uncached ranges will be read from RGW (and cached). Auth API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This API Validates a specific authenticated access being made to the cache, using RGW's knowledge of the client credentials and stored access policy. + +This API Validates a specific authenticated access being made to the cache, using RGW's knowledge of the client credentials and stored access policy. Returns success if the encapsulated request would be granted. Cache API @@ -44,10 +48,10 @@ $ radosgw-admin user create --uid= --display-name="cache use This user can send to the RGW the Cache API header ``X-Amz-Cache``, this header contains the headers from the original request(before changing the Range header). 
It means that ``X-Amz-Cache`` built from several headers. -The headers that are building the ``X-Amz-Cache`` header are separated by char with ascii code 177 and the header name and value are separated by char ascii code 178. -The RGW will check that the cache user is an authorized user and if it is a cache user, -if yes it will use the ``X-Amz-Cache`` to revalidate that the user have permissions, using the headers from the X-Amz-Cache. -During this flow the RGW will override the Range header. +The headers that are building the ``X-Amz-Cache`` header are separated by char with ASCII code 177 and the header name and value are separated by char ASCII code 178. +The RGW will check that the cache user is an authorized user and if it is a cache user, +if yes it will use the ``X-Amz-Cache`` to revalidate that the user has permissions, using the headers from the X-Amz-Cache. +During this flow, the RGW will override the Range header. Using Nginx with RGW @@ -59,7 +63,7 @@ Download the source of Openresty: $ wget https://openresty.org/download/openresty-1.15.8.3.tar.gz -git clone the aws auth nginx module: +git clone the AWS auth Nginx module: :: @@ -82,46 +86,65 @@ $ gmake -j $(nproc) $ sudo gmake install $ sudo ln -sf /usr/local/openresty/bin/openresty /usr/bin/nginx -Put in-place your nginx configuration files and edit them according to your environment: +Put in-place your Nginx configuration files and edit them according to your environment: -All nginx conf files are under: https://github.com/ceph/ceph/tree/master/examples/rgw-cache +All Nginx conf files are under: https://github.com/ceph/ceph/tree/master/examples/rgw-cache -nginx.conf should go to /etc/nginx/nginx.conf +`nginx.conf` should go to `/etc/nginx/nginx.conf` -nginx-lua-file.lua should go to /etc/nginx/nginx-lua-file.lua +`nginx-lua-file.lua` should go to `/etc/nginx/nginx-lua-file.lua` -nginx-default.conf should go to /etc/nginx/conf.d/nginx-default.conf +`nginx-default.conf` should go to 
`/etc/nginx/conf.d/nginx-default.conf` -The parameters that are most likely to require adjustment according to the environment are located in the file nginx-default.conf +The parameters that are most likely to require adjustment according to the environment are located in the file `nginx-default.conf` Modify the example values of *proxy_cache_path* and *max_size* at: -`proxy_cache_path /data/cache levels=2:2:2 keys_zone=mycache:999m max_size=20G inactive=1d use_temp_path=off;` +:: + + proxy_cache_path /data/cache levels=2:2:2 keys_zone=mycache:999m max_size=20G inactive=1d use_temp_path=off; -And modify the example *server* values to point to the RGWs URIs: -`server rgw1:8000 max_fails=2 fail_timeout=5s;` +And modify the example *server* values to point to the RGWs URIs: -`server rgw2:8000 max_fails=2 fail_timeout=5s;` +:: -`server rgw3:8000 max_fails=2 fail_timeout=5s;` - -It is important to substitute the access key and secret key located in the nginx.conf with those belong to the user with the amz-cache caps + server rgw1:8000 max_fails=2 fail_timeout=5s; + server rgw2:8000 max_fails=2 fail_timeout=5s; + server rgw3:8000 max_fails=2 fail_timeout=5s; -It is possible to use nginx slicing which is a better method for streaming purposes. +| It is important to substitute the *access key* and *secret key* located in the `nginx.conf` with those belong to the user with the `amz-cache` caps +| for example, create the `cache` user as following: + +:: -For using slice you should use nginx-slicing.conf and not nginx-default.conf + radosgw-admin user create --uid=cacheuser --display-name="cache user" --caps="amz-cache=read" --access-key --secret -Further information about nginx slicing: +It is possible to use Nginx slicing which is a better method for streaming purposes. 
+ +For using slice you should use `nginx-slicing.conf` and not `nginx-default.conf` + +Further information about Nginx slicing: https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/#byte-range-caching -If you do not want to use the prefetch caching, It is possible to replace nginx-default.conf with nginx-noprefetch.conf -Using noprefetch means that if the client is sending range request of 0-4095 and then 0-4096 Nginx will cache those requests separately, So it will need to fetch those requests twice. +If you do not want to use the prefetch caching, It is possible to replace `nginx-default.conf` with `nginx-noprefetch.conf` +Using `noprefetch` means that if the client is sending range request of 0-4095 and then 0-4096 Nginx will cache those requests separately, So it will need to fetch those requests twice. + +Run Nginx(openresty): -Run nginx(openresty): :: $ sudo systemctl restart nginx + +Appendix +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +**A note about performance:** In certain instances like development environment, disabling the authentication by commenting the following line in `nginx-default.conf`: + +:: + + #auth_request /authentication; + +may (depending on the hardware) increases the performance significantly as it forgoes the auth API calls to radosgw. 
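The separator scheme that rgw-cache.rst describes for ``X-Amz-Cache`` can be sketched in Python as below. This is only an illustration under stated assumptions, not the code used by RGW or by nginx-lua-file.lua: the names `build_x_amz_cache` and `parse_x_amz_cache` and the sample header values are hypothetical. The byte assignment follows the Lua line visible in this commit (`string.char(0xB2)` before `"Authorization"`, `string.char(0xB1)` before its value), that is, 178 between entries and 177 between a name and its value. The prose in rgw-cache.rst states the two codes the other way round, so swap the two constants if the prose turns out to be authoritative. Any leading separator byte the real implementation may emit is not modeled.

```python
# Separator bytes, assumed from the Lua line shown in this diff.
ENTRY_SEP = b"\xb2"       # 178: placed between header entries
NAME_VALUE_SEP = b"\xb1"  # 177: placed between a header name and its value


def build_x_amz_cache(signed_headers, authorization):
    """Join (name, value) pairs and append the Authorization header last."""
    pairs = list(signed_headers) + [("Authorization", authorization)]
    return ENTRY_SEP.join(
        name.encode("latin-1") + NAME_VALUE_SEP + value.encode("latin-1")
        for name, value in pairs
    )


def parse_x_amz_cache(blob):
    """Inverse of build_x_amz_cache: recover the (name, value) pairs."""
    pairs = []
    for entry in blob.split(ENTRY_SEP):
        name, _, value = entry.partition(NAME_VALUE_SEP)
        pairs.append((name.decode("latin-1"), value.decode("latin-1")))
    return pairs


# Hypothetical sample values, for illustration only.
headers = [("host", "cacher"), ("x-amz-date", "20200720T101957Z")]
auth = "AWS4-HMAC-SHA256 Credential=AK/20200720/us-east-1/s3/aws4_request"
blob = build_x_amz_cache(headers, auth)
```

Round-tripping `blob` through `parse_x_amz_cache` returns the original pairs with `Authorization` last, which is roughly what the receiving side needs in order to re-run the permission check against the original request headers.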
diff --git a/examples/rgw-cache/nginx-default.conf b/examples/rgw-cache/nginx-default.conf index 37dbb8070102..ddde7053946f 100644 --- a/examples/rgw-cache/nginx-default.conf +++ b/examples/rgw-cache/nginx-default.conf @@ -11,33 +11,53 @@ server { listen 80; server_name cacher; location /authentication { - internal; - limit_except GET { deny all; } + internal; + client_max_body_size 0; proxy_pass http://rgws$request_uri; - proxy_pass_request_body off; + proxy_pass_request_body off; proxy_set_header Host $host; # setting x-rgw-auth allow the RGW the ability to only authorize the request without fetching the obj data proxy_set_header x-rgw-auth "yes"; - proxy_set_header Authorization $http_authorization; - proxy_http_version 1.1; - proxy_method $request_method; + proxy_set_header Authorization $http_authorization; + proxy_http_version 1.1; + proxy_method $request_method; # Do not convert HEAD requests into GET requests - proxy_cache_convert_head off; - error_page 404 = @outage; - proxy_intercept_errors on; - if ($request_uri = "/"){ + proxy_cache_convert_head off; + error_page 404 = @outage; + proxy_intercept_errors on; + if ($request_uri = "/") { return 200; } # URI included with question mark is not being cached - if ($request_uri ~* (\?)){ - return 200; + if ($request_uri ~* (\?)) { + return 200; + } + if ($request_method = "PUT") { + return 200; + } + if ($request_method = "POST") { + return 200; + } + if ($request_method = "HEAD") { + return 200; + } + if ($request_method = "COPY") { + return 200; + } + if ($request_method = "DELETE") { + return 200; + } + if ($http_if_match) { + return 200; + } + if ($http_authorization !~* "aws4_request") { + return 200; } } location @outage{ - return 403; + return 403; } location / { - limit_except GET { deny all; } auth_request /authentication; proxy_pass http://rgws; set $authvar ''; @@ -66,26 +86,51 @@ server { # prevent convertion of head requests to get requests proxy_cache_convert_head off; # Listing all buckets should 
not be cached - if ($request_uri = "/"){ - set $do_not_cache "no"; - set $date $http_x_amz_date; + if ($request_uri = "/") { + set $do_not_cache "no"; + set $date $http_x_amz_date; } # URI including question mark are not supported to prevent bucket listing cache - if ($request_uri ~* (\?)){ + if ($request_uri ~* (\?)) { set $do_not_cache "no"; - set $date $http_x_amz_date; + set $date $http_x_amz_date; } # Only aws4 requests are being cached - As the aws auth module supporting only aws v2 if ($http_authorization !~* "aws4_request") { set $date $http_x_amz_date; } + if ($request_method = "PUT") { + set $date $http_x_amz_date; + } + if ($request_method = "POST") { + set $date $http_x_amz_date; + } + if ($request_method = "HEAD") { + set $do_not_cache "no"; + set $date $http_x_amz_date; + } + if ($request_method = "COPY") { + set $do_not_cache "no"; + set $date $http_x_amz_date; + } + if ($http_if_match) { + #set $do_not_cache "no"; + set $date $http_x_amz_date; + set $myrange $http_range; + } + if ($request_method = "DELETE") { + set $do_not_cache "no"; + set $date $http_x_amz_date; + } + proxy_set_header if_match $http_if_match; + proxy_set_header Range $myrange; # Use the original x-amz-date if the aws auth module didn't create one proxy_set_header x-amz-date $date; proxy_set_header X-Amz-Cache $authvar; proxy_no_cache $do_not_cache; - proxy_set_header Authorization $awsauth; + proxy_set_header Authorization $awsauthfour; # This is on which content the nginx to use for hashing the cache keys - proxy_cache_key "$request_uri$request_method$request_body"; - client_max_body_size 20G; + proxy_cache_key "$request_uri$request_method$request_body$myrange"; + client_max_body_size 0; } } diff --git a/examples/rgw-cache/nginx-lua-file.lua b/examples/rgw-cache/nginx-lua-file.lua index d776cb700f44..efaf42230a58 100644 --- a/examples/rgw-cache/nginx-lua-file.lua +++ b/examples/rgw-cache/nginx-lua-file.lua @@ -20,7 +20,7 @@ end if check ~= nil then local xamzcache = 
concathdrs:sub(2) xamzcache = xamzcache .. string.char(0xB2) .. "Authorization" .. string.char(0xB1) .. check - if xamzcache:find("aws4_request") ~= nil and uri ~= "/" and uri:find("?") == nil then + if xamzcache:find("aws4_request") ~= nil and uri ~= "/" and uri:find("?") == nil and hdrs["if-match"] == nil then ngx.var.authvar = xamzcache end end diff --git a/examples/rgw-cache/nginx-noprefetch.conf b/examples/rgw-cache/nginx-noprefetch.conf index 30661d300332..03e0ebc4a6dd 100644 --- a/examples/rgw-cache/nginx-noprefetch.conf +++ b/examples/rgw-cache/nginx-noprefetch.conf @@ -11,33 +11,53 @@ server { listen 80; server_name cacher; location /authentication { - internal; - limit_except GET { deny all; } + internal; + client_max_body_size 0; proxy_pass http://rgws$request_uri; - proxy_pass_request_body off; + proxy_pass_request_body off; proxy_set_header Host $host; # setting x-rgw-auth allow the RGW the ability to only authorize the request without fetching the obj data proxy_set_header x-rgw-auth "yes"; - proxy_set_header Authorization $http_authorization; - proxy_http_version 1.1; - proxy_method $request_method; + proxy_set_header Authorization $http_authorization; + proxy_http_version 1.1; + proxy_method $request_method; # Do not convert HEAD requests into GET requests - proxy_cache_convert_head off; - error_page 404 = @outage; - proxy_intercept_errors on; - if ($request_uri = "/"){ + proxy_cache_convert_head off; + error_page 404 = @outage; + proxy_intercept_errors on; + if ($request_uri = "/") { return 200; } # URI included with question mark is not being cached - if ($request_uri ~* (\?)){ - return 200; + if ($request_uri ~* (\?)) { + return 200; + } + if ($request_method = "PUT") { + return 200; + } + if ($request_method = "POST") { + return 200; + } + if ($request_method = "HEAD") { + return 200; + } + if ($request_method = "COPY") { + return 200; + } + if ($request_method = "DELETE") { + return 200; + } + if ($http_if_match) { + return 200; + } + if 
($http_authorization !~* "aws4_request") { + return 200; } } location @outage{ - return 403; + return 403; } location / { - limit_except GET { deny all; } auth_request /authentication; proxy_pass http://rgws; # if $do_not_cache is not empty the request would not be cached, this is relevant for list op for example @@ -63,11 +83,11 @@ server { # prevent convertion of head requests to get requests proxy_cache_convert_head off; # Listing all buckets should not be cached - if ($request_uri = "/"){ - set $do_not_cache "no"; + if ($request_uri = "/") { + set $do_not_cache "no"; } # URI including question mark are not supported to prevent bucket listing cache - if ($request_uri ~* (\?)){ + if ($request_uri ~* (\?)) { set $do_not_cache "no"; } # Use the original x-amz-date if the aws auth module didn't create one @@ -76,6 +96,6 @@ server { proxy_set_header Range $http_range; # This is on which content the nginx to use for hashing the cache keys proxy_cache_key "$request_uri$request_method$request_body$http_range"; - client_max_body_size 20G; + client_max_body_size 0; } } diff --git a/examples/rgw-cache/nginx-slicing.conf b/examples/rgw-cache/nginx-slicing.conf index 1d6606d30fd8..d3c8f623b473 100644 --- a/examples/rgw-cache/nginx-slicing.conf +++ b/examples/rgw-cache/nginx-slicing.conf @@ -11,34 +11,54 @@ server { listen 80; server_name cacher; location /authentication { - internal; - limit_except GET { deny all; } + internal; + client_max_body_size 0; proxy_pass http://rgws$request_uri; - proxy_pass_request_body off; + proxy_pass_request_body off; proxy_set_header Host $host; # setting x-rgw-auth allow the RGW the ability to only authorize the request without fetching the obj data proxy_set_header x-rgw-auth "yes"; - proxy_set_header Authorization $http_authorization; - proxy_http_version 1.1; - proxy_method $request_method; + proxy_set_header Authorization $http_authorization; + proxy_http_version 1.1; + proxy_method $request_method; # Do not convert HEAD requests into 
GET requests - proxy_cache_convert_head off; - error_page 404 = @outage; - proxy_intercept_errors on; - if ($request_uri = "/"){ + proxy_cache_convert_head off; + error_page 404 = @outage; + proxy_intercept_errors on; + if ($request_uri = "/") { return 200; } # URI included with question mark is not being cached - if ($request_uri ~* (\?)){ - return 200; + if ($request_uri ~* (\?)) { + return 200; + } + if ($request_method = "PUT") { + return 200; + } + if ($request_method = "POST") { + return 200; + } + if ($request_method = "HEAD") { + return 200; + } + if ($request_method = "COPY") { + return 200; + } + if ($request_method = "DELETE") { + return 200; + } + if ($http_if_match) { + return 200; + } + if ($http_authorization !~* "aws4_request") { + return 200; } } location @outage{ - return 403; + return 403; } location / { slice 1m; - limit_except GET { deny all; } auth_request /authentication; proxy_set_header Range $slice_range; proxy_pass http://rgws; @@ -68,26 +88,50 @@ server { # prevent convertion of head requests to get requests proxy_cache_convert_head off; # Listing all buckets should not be cached - if ($request_uri = "/"){ - set $do_not_cache "no"; - set $date $http_x_amz_date; + if ($request_uri = "/") { + set $do_not_cache "no"; + set $date $http_x_amz_date; } # URI including question mark are not supported to prevent bucket listing cache - if ($request_uri ~* (\?)){ + if ($request_uri ~* (\?)) { set $do_not_cache "no"; - set $date $http_x_amz_date; + set $date $http_x_amz_date; } # Only aws4 requests are being cached - As the aws auth module supporting only aws v2 if ($http_authorization !~* "aws4_request") { set $date $http_x_amz_date; } + if ($request_method = "PUT") { + set $date $http_x_amz_date; + } + if ($request_method = "POST") { + set $date $http_x_amz_date; + } + if ($request_method = "HEAD") { + set $do_not_cache "no"; + set $date $http_x_amz_date; + } + if ($request_method = "COPY") { + set $do_not_cache "no"; + set $date $http_x_amz_date; 
+ } + if ($http_if_match) { + #set $do_not_cache "no"; + set $date $http_x_amz_date; + set $myrange $slice_range; + } + if ($request_method = "DELETE") { + set $do_not_cache "no"; + set $date $http_x_amz_date; + } + proxy_set_header if_match $http_if_match; # Use the original x-amz-date if the aws auth module didn't create one proxy_set_header x-amz-date $date; proxy_set_header X-Amz-Cache $authvar; proxy_no_cache $do_not_cache; - proxy_set_header Authorization $awsauth; + proxy_set_header Authorization $awsauthfour; # This is on which content the nginx to use for hashing the cache keys proxy_cache_key "$request_uri$request_method$request_body$slice_range"; - client_max_body_size 20G; + client_max_body_size 0; } } diff --git a/examples/rgw-cache/nginx.conf b/examples/rgw-cache/nginx.conf index f000597da62e..a478db1dc935 100644 --- a/examples/rgw-cache/nginx.conf +++ b/examples/rgw-cache/nginx.conf @@ -25,6 +25,23 @@ http { default $http_authorization; ~. $aws_token; # Regular expression to match any value } + map $request_uri $awsauthtwo { + "/" $http_authorization; + "~\?" $http_authorization; + default $awsauth; + } + map $request_method $awsauththree { + default $awsauthtwo; + "PUT" $http_authorization; + "HEAD" $http_authorization; + "POST" $http_authorization; + "DELETE" $http_authorization; + "COPY" $http_authorization; + } + map $http_if_match $awsauthfour { + ~. 
$http_authorization; # Regular expression to match any value + default $awsauththree; + } include /etc/nginx/mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' diff --git a/src/rgw/rgw_auth_s3.cc b/src/rgw/rgw_auth_s3.cc index 0eefc19f9887..27ccdf65564f 100644 --- a/src/rgw/rgw_auth_s3.cc +++ b/src/rgw/rgw_auth_s3.cc @@ -571,7 +571,7 @@ get_v4_canonical_headers(const req_info& info, } const char* const t = info.env->get(token_env.c_str()); if (!t) { - dout(10) << "warning env var not available" << dendl; + dout(10) << "warning env var not available " << token_env.c_str() << dendl; continue; }
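The three `map` blocks this commit adds to nginx.conf (`$awsauthtwo`, `$awsauththree`, `$awsauthfour`) chain together to decide which `Authorization` value is forwarded to RGW. The Python sketch below is a simplified model of that chain, assuming upper-case method names; the function name `select_authorization` and its parameter names are hypothetical, and `awsauth` stands in for whatever the pre-existing `$awsauth` map yields (its source variable is not shown in this diff).

```python
import re

# Methods listed in "map $request_method $awsauththree".
PASS_THROUGH_METHODS = {"PUT", "HEAD", "POST", "DELETE", "COPY"}


def select_authorization(request_uri, request_method, if_match,
                         client_authorization, awsauth):
    """Model of the $awsauthtwo -> $awsauththree -> $awsauthfour chain."""
    # map $request_uri $awsauthtwo: "/" and any URI containing "?"
    # keep the client's own Authorization header.
    if request_uri == "/" or re.search(r"\?", request_uri):
        awsauthtwo = client_authorization
    else:
        awsauthtwo = awsauth

    # map $request_method $awsauththree: the listed methods keep the
    # client's header, everything else falls through to $awsauthtwo.
    if request_method in PASS_THROUGH_METHODS:
        awsauththree = client_authorization
    else:
        awsauththree = awsauthtwo

    # map $http_if_match $awsauthfour: "~." matches any non-empty value.
    if if_match:
        return client_authorization
    return awsauththree
```

In this model only a plain GET on an object path, without a query string and without an `If-Match` header, ends up with the `$awsauth` value; every other request keeps the client's original signature, which is consistent with the commit's stated goal of passing PUT, POST, HEAD, DELETE and COPY through transparently.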