-==========================
+==========================
RGW Data caching and CDN
==========================
.. contents::
This feature adds to RGW the ability to securely cache objects and offload the workload from the cluster, using Nginx.
-After an object is accessed the first time it will be stored in the Nginx directory.
+After an object is accessed the first time it will be stored in the Nginx cache directory.
When data is already cached, it need not be fetched from RGW. A permission check will be made against RGW to ensure the requesting user has access.
-This feature is based on some Nginx modules, ngx_http_auth_request_module, https://github.com/kaltura/nginx-aws-auth-module, Openresty for lua capabilities.
-Currently this feature only works for GET requests and it will cache only AWSv4 requests (only s3 requests).
+This feature is based on several Nginx modules: ngx_http_auth_request_module, https://github.com/kaltura/nginx-aws-auth-module, and OpenResty for Lua capabilities.
+
+Currently, this feature caches only AWSv4 requests (S3 requests only): the output of the first GET request is cached
+and served from the cache on subsequent GET requests, while PUT, POST, HEAD, DELETE and COPY requests are passed through transparently.
+
+
The feature introduces 2 new APIs: Auth and Cache.
New APIs
-------------------------
-There are 2 new apis for this feature:
+There are 2 new APIs for this feature:
-Auth API - The cache uses this to validate that an user can access the cached data
+Auth API - The cache uses this to validate that a user can access the cached data
Cache API - Adds the ability to securely override the Range header, so that Nginx can use its own smart cache on top of S3:
https://www.nginx.com/blog/smart-efficient-byte-range-caching-nginx/
-Using this API gives the ability to read ahead objects when clients asking a specific range from the object.
-On subsequent accesses to the cached object, Nginx will satisfy requests for already-cached ranges from cache. Uncached ranges will be read from RGW (and cached).
+Using this API gives the ability to read ahead objects when clients ask for a specific range of the object.
+On subsequent accesses to the cached object, Nginx will satisfy requests for already-cached ranges from the cache. Uncached ranges will be read from RGW (and cached).
Auth API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This API Validates a specific authenticated access being made to the cache, using RGW's knowledge of the client credentials and stored access policy.
+
+This API validates a specific authenticated access being made to the cache, using RGW's knowledge of the client credentials and stored access policy.
Returns success if the encapsulated request would be granted.
Cache API
This user can send the Cache API header ``X-Amz-Cache`` to RGW; the header contains the headers from the original request (before the Range header was changed).
This means that ``X-Amz-Cache`` is built from several headers.
-The headers that are building the ``X-Amz-Cache`` header are separated by char with ascii code 177 and the header name and value are separated by char ascii code 178.
-The RGW will check that the cache user is an authorized user and if it is a cache user,
-if yes it will use the ``X-Amz-Cache`` to revalidate that the user have permissions, using the headers from the X-Amz-Cache.
-During this flow the RGW will override the Range header.
+The headers that make up the ``X-Amz-Cache`` header are separated by the character with code 178 (``0xB2``), and each header name is separated from its value by the character with code 177 (``0xB1``).
+RGW checks that the requesting user is authorized and that it is a cache user;
+if so, it uses the headers from ``X-Amz-Cache`` to revalidate that the user has permission.
+During this flow, RGW overrides the Range header.
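
As a rough illustration of the header layout described above, the following Python sketch joins and splits header pairs using the two separator bytes that appear in the example ``nginx-lua-file.lua`` (``0xB2`` before each appended header, ``0xB1`` between a name and its value). The function names and the sample header values are hypothetical, and this is not the RGW parser; it only mirrors the layout.

```python
# Separator bytes as they appear in the example nginx-lua-file.lua.
HEADER_SEP = chr(0xB2)       # code 178, placed between headers
NAME_VALUE_SEP = chr(0xB1)   # code 177, placed between a name and its value


def build_x_amz_cache(headers):
    """Join (name, value) pairs into a single X-Amz-Cache style string."""
    return HEADER_SEP.join(
        name + NAME_VALUE_SEP + value for name, value in headers
    )


def parse_x_amz_cache(raw):
    """Split an X-Amz-Cache style string back into (name, value) pairs."""
    pairs = []
    for item in raw.split(HEADER_SEP):
        name, _, value = item.partition(NAME_VALUE_SEP)
        pairs.append((name, value))
    return pairs


# Made-up sample headers from an "original" client request.
original = [
    ("host", "cacher"),
    ("range", "bytes=0-4095"),
    ("Authorization", "AWS4-HMAC-SHA256 Credential=<access>/<scope>/aws4_request"),
]
encoded = build_x_amz_cache(original)
decoded = parse_x_amz_cache(encoded)
```

Because the separators lie outside the 7-bit ASCII range, they are unlikely to collide with ordinary header names or values, which is presumably why such bytes were chosen.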
Using Nginx with RGW
$ wget https://openresty.org/download/openresty-1.15.8.3.tar.gz
-git clone the aws auth nginx module:
+git clone the AWS auth Nginx module:
::
$ sudo gmake install
$ sudo ln -sf /usr/local/openresty/bin/openresty /usr/bin/nginx
-Put in-place your nginx configuration files and edit them according to your environment:
+Put your Nginx configuration files in place and edit them according to your environment:
-All nginx conf files are under: https://github.com/ceph/ceph/tree/master/examples/rgw-cache
+All Nginx conf files are under: https://github.com/ceph/ceph/tree/master/examples/rgw-cache
-nginx.conf should go to /etc/nginx/nginx.conf
+`nginx.conf` should go to `/etc/nginx/nginx.conf`
-nginx-lua-file.lua should go to /etc/nginx/nginx-lua-file.lua
+`nginx-lua-file.lua` should go to `/etc/nginx/nginx-lua-file.lua`
-nginx-default.conf should go to /etc/nginx/conf.d/nginx-default.conf
+`nginx-default.conf` should go to `/etc/nginx/conf.d/nginx-default.conf`
-The parameters that are most likely to require adjustment according to the environment are located in the file nginx-default.conf
+The parameters that are most likely to require adjustment according to the environment are located in the file `nginx-default.conf`
Modify the example values of *proxy_cache_path* and *max_size* at:
-`proxy_cache_path /data/cache levels=2:2:2 keys_zone=mycache:999m max_size=20G inactive=1d use_temp_path=off;`
+::
+
+ proxy_cache_path /data/cache levels=2:2:2 keys_zone=mycache:999m max_size=20G inactive=1d use_temp_path=off;
-And modify the example *server* values to point to the RGWs URIs:
-`server rgw1:8000 max_fails=2 fail_timeout=5s;`
+Then modify the example *server* values to point to the RGW URIs:
-`server rgw2:8000 max_fails=2 fail_timeout=5s;`
+::
-`server rgw3:8000 max_fails=2 fail_timeout=5s;`
-
-It is important to substitute the access key and secret key located in the nginx.conf with those belong to the user with the amz-cache caps
+ server rgw1:8000 max_fails=2 fail_timeout=5s;
+ server rgw2:8000 max_fails=2 fail_timeout=5s;
+ server rgw3:8000 max_fails=2 fail_timeout=5s;
-It is possible to use nginx slicing which is a better method for streaming purposes.
+| It is important to replace the *access key* and *secret key* in `nginx.conf` with those belonging to the user that has the `amz-cache` caps.
+| For example, create the `cache` user as follows:
+
+::
-For using slice you should use nginx-slicing.conf and not nginx-default.conf
+ radosgw-admin user create --uid=cacheuser --display-name="cache user" --caps="amz-cache=read" --access-key <access> --secret <secret>
-Further information about nginx slicing:
+It is possible to use Nginx slicing which is a better method for streaming purposes.
+
+To use slicing, use `nginx-slicing.conf` instead of `nginx-default.conf`.
+
+Further information about Nginx slicing:
https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/#byte-range-caching
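
To make the effect of slicing more concrete, here is a minimal Python sketch of how a client byte range maps onto slice-aligned ranges, assuming a 1 MiB slice size (as with a ``slice 1m;`` directive). The function name is hypothetical; the sketch only models the alignment arithmetic, not Nginx itself.

```python
SLICE_SIZE = 1024 * 1024  # assumed slice size, equivalent to "slice 1m;"


def slice_ranges(first_byte, last_byte, slice_size=SLICE_SIZE):
    """Return the slice-aligned Range values covering [first_byte, last_byte]."""
    ranges = []
    # Round the start down to the beginning of the slice that contains it.
    start = (first_byte // slice_size) * slice_size
    while start <= last_byte:
        end = start + slice_size - 1
        ranges.append(f"bytes={start}-{end}")
        start += slice_size
    return ranges


small = slice_ranges(0, 4095)
straddling = slice_ranges(1048000, 1049000)
```

In this model a small request such as bytes 0-4095 maps to a single aligned slice, so a later request for any other bytes in the same slice can be answered from the cache.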
-If you do not want to use the prefetch caching, It is possible to replace nginx-default.conf with nginx-noprefetch.conf
-Using noprefetch means that if the client is sending range request of 0-4095 and then 0-4096 Nginx will cache those requests separately, So it will need to fetch those requests twice.
+If you do not want to use prefetch caching, you can replace `nginx-default.conf` with `nginx-noprefetch.conf`.
+Using `noprefetch` means that if the client sends a range request for 0-4095 and then one for 0-4096, Nginx caches those requests separately, so the overlapping bytes are fetched from RGW twice.
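
The following Python sketch illustrates why the two requests end up as separate cache entries, assuming the no-prefetch configuration builds its cache key from the request URI, method, body and the client's Range header, and relying on Nginx naming each cache file after the MD5 digest of its key. The object path and function names are made up for illustration.

```python
import hashlib


def cache_key(request_uri, request_method, request_body, http_range):
    """Concatenate the fields in the same order as the assumed proxy_cache_key."""
    return request_uri + request_method + request_body + http_range


def cache_entry_name(key):
    """Nginx names a cache file after the MD5 hex digest of its key."""
    return hashlib.md5(key.encode()).hexdigest()


# Two GET requests for the same (hypothetical) object with slightly different ranges.
first = cache_key("/bucket/object", "GET", "", "bytes=0-4095")
second = cache_key("/bucket/object", "GET", "", "bytes=0-4096")

same_entry = cache_entry_name(first) == cache_entry_name(second)
```

Here ``same_entry`` is ``False``: the keys differ only in the Range value, but that is enough to give each request its own cache file.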
+
+Run Nginx (OpenResty):
-Run nginx(openresty):
::
$ sudo systemctl restart nginx
+
+Appendix
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+**A note about performance:** In certain cases, such as a development environment, disabling authentication by commenting out the following line in `nginx-default.conf`:
+
+::
+
+ #auth_request /authentication;
+
+may (depending on the hardware) increase performance significantly, as it forgoes the Auth API calls to radosgw.
listen 80;
server_name cacher;
location /authentication {
- internal;
- limit_except GET { deny all; }
+ internal;
+ client_max_body_size 0;
proxy_pass http://rgws$request_uri;
- proxy_pass_request_body off;
+ proxy_pass_request_body off;
proxy_set_header Host $host;
# setting x-rgw-auth allow the RGW the ability to only authorize the request without fetching the obj data
proxy_set_header x-rgw-auth "yes";
- proxy_set_header Authorization $http_authorization;
- proxy_http_version 1.1;
- proxy_method $request_method;
+ proxy_set_header Authorization $http_authorization;
+ proxy_http_version 1.1;
+ proxy_method $request_method;
# Do not convert HEAD requests into GET requests
- proxy_cache_convert_head off;
- error_page 404 = @outage;
- proxy_intercept_errors on;
- if ($request_uri = "/"){
+ proxy_cache_convert_head off;
+ error_page 404 = @outage;
+ proxy_intercept_errors on;
+ if ($request_uri = "/") {
return 200;
}
# URI included with question mark is not being cached
- if ($request_uri ~* (\?)){
- return 200;
+ if ($request_uri ~* (\?)) {
+ return 200;
+ }
+ if ($request_method = "PUT") {
+ return 200;
+ }
+ if ($request_method = "POST") {
+ return 200;
+ }
+ if ($request_method = "HEAD") {
+ return 200;
+ }
+ if ($request_method = "COPY") {
+ return 200;
+ }
+ if ($request_method = "DELETE") {
+ return 200;
+ }
+ if ($http_if_match) {
+ return 200;
+ }
+ if ($http_authorization !~* "aws4_request") {
+ return 200;
}
}
location @outage{
- return 403;
+ return 403;
}
location / {
- limit_except GET { deny all; }
auth_request /authentication;
proxy_pass http://rgws;
set $authvar '';
# prevent conversion of head requests to get requests
proxy_cache_convert_head off;
# Listing all buckets should not be cached
- if ($request_uri = "/"){
- set $do_not_cache "no";
- set $date $http_x_amz_date;
+ if ($request_uri = "/") {
+ set $do_not_cache "no";
+ set $date $http_x_amz_date;
}
# URI including question mark are not supported to prevent bucket listing cache
- if ($request_uri ~* (\?)){
+ if ($request_uri ~* (\?)) {
set $do_not_cache "no";
- set $date $http_x_amz_date;
+ set $date $http_x_amz_date;
}
# Only aws4 requests are being cached, as the aws auth module supports only aws v4
if ($http_authorization !~* "aws4_request") {
set $date $http_x_amz_date;
}
+ if ($request_method = "PUT") {
+ set $date $http_x_amz_date;
+ }
+ if ($request_method = "POST") {
+ set $date $http_x_amz_date;
+ }
+ if ($request_method = "HEAD") {
+ set $do_not_cache "no";
+ set $date $http_x_amz_date;
+ }
+ if ($request_method = "COPY") {
+ set $do_not_cache "no";
+ set $date $http_x_amz_date;
+ }
+ if ($http_if_match) {
+ #set $do_not_cache "no";
+ set $date $http_x_amz_date;
+ set $myrange $http_range;
+ }
+ if ($request_method = "DELETE") {
+ set $do_not_cache "no";
+ set $date $http_x_amz_date;
+ }
+ proxy_set_header if_match $http_if_match;
+ proxy_set_header Range $myrange;
# Use the original x-amz-date if the aws auth module didn't create one
proxy_set_header x-amz-date $date;
proxy_set_header X-Amz-Cache $authvar;
proxy_no_cache $do_not_cache;
- proxy_set_header Authorization $awsauth;
+ proxy_set_header Authorization $awsauthfour;
# This is on which content the nginx to use for hashing the cache keys
- proxy_cache_key "$request_uri$request_method$request_body";
- client_max_body_size 20G;
+ proxy_cache_key "$request_uri$request_method$request_body$myrange";
+ client_max_body_size 0;
}
}
if check ~= nil then
local xamzcache = concathdrs:sub(2)
xamzcache = xamzcache .. string.char(0xB2) .. "Authorization" .. string.char(0xB1) .. check
- if xamzcache:find("aws4_request") ~= nil and uri ~= "/" and uri:find("?") == nil then
+ if xamzcache:find("aws4_request") ~= nil and uri ~= "/" and uri:find("?") == nil and hdrs["if-match"] == nil then
ngx.var.authvar = xamzcache
end
end
listen 80;
server_name cacher;
location /authentication {
- internal;
- limit_except GET { deny all; }
+ internal;
+ client_max_body_size 0;
proxy_pass http://rgws$request_uri;
- proxy_pass_request_body off;
+ proxy_pass_request_body off;
proxy_set_header Host $host;
# setting x-rgw-auth allow the RGW the ability to only authorize the request without fetching the obj data
proxy_set_header x-rgw-auth "yes";
- proxy_set_header Authorization $http_authorization;
- proxy_http_version 1.1;
- proxy_method $request_method;
+ proxy_set_header Authorization $http_authorization;
+ proxy_http_version 1.1;
+ proxy_method $request_method;
# Do not convert HEAD requests into GET requests
- proxy_cache_convert_head off;
- error_page 404 = @outage;
- proxy_intercept_errors on;
- if ($request_uri = "/"){
+ proxy_cache_convert_head off;
+ error_page 404 = @outage;
+ proxy_intercept_errors on;
+ if ($request_uri = "/") {
return 200;
}
# URI included with question mark is not being cached
- if ($request_uri ~* (\?)){
- return 200;
+ if ($request_uri ~* (\?)) {
+ return 200;
+ }
+ if ($request_method = "PUT") {
+ return 200;
+ }
+ if ($request_method = "POST") {
+ return 200;
+ }
+ if ($request_method = "HEAD") {
+ return 200;
+ }
+ if ($request_method = "COPY") {
+ return 200;
+ }
+ if ($request_method = "DELETE") {
+ return 200;
+ }
+ if ($http_if_match) {
+ return 200;
+ }
+ if ($http_authorization !~* "aws4_request") {
+ return 200;
}
}
location @outage{
- return 403;
+ return 403;
}
location / {
- limit_except GET { deny all; }
auth_request /authentication;
proxy_pass http://rgws;
# if $do_not_cache is not empty the request would not be cached, this is relevant for list op for example
# prevent conversion of head requests to get requests
proxy_cache_convert_head off;
# Listing all buckets should not be cached
- if ($request_uri = "/"){
- set $do_not_cache "no";
+ if ($request_uri = "/") {
+ set $do_not_cache "no";
}
# URI including question mark are not supported to prevent bucket listing cache
- if ($request_uri ~* (\?)){
+ if ($request_uri ~* (\?)) {
set $do_not_cache "no";
}
# Use the original x-amz-date if the aws auth module didn't create one
proxy_set_header Range $http_range;
# This is on which content the nginx to use for hashing the cache keys
proxy_cache_key "$request_uri$request_method$request_body$http_range";
- client_max_body_size 20G;
+ client_max_body_size 0;
}
}
listen 80;
server_name cacher;
location /authentication {
- internal;
- limit_except GET { deny all; }
+ internal;
+ client_max_body_size 0;
proxy_pass http://rgws$request_uri;
- proxy_pass_request_body off;
+ proxy_pass_request_body off;
proxy_set_header Host $host;
# setting x-rgw-auth allow the RGW the ability to only authorize the request without fetching the obj data
proxy_set_header x-rgw-auth "yes";
- proxy_set_header Authorization $http_authorization;
- proxy_http_version 1.1;
- proxy_method $request_method;
+ proxy_set_header Authorization $http_authorization;
+ proxy_http_version 1.1;
+ proxy_method $request_method;
# Do not convert HEAD requests into GET requests
- proxy_cache_convert_head off;
- error_page 404 = @outage;
- proxy_intercept_errors on;
- if ($request_uri = "/"){
+ proxy_cache_convert_head off;
+ error_page 404 = @outage;
+ proxy_intercept_errors on;
+ if ($request_uri = "/") {
return 200;
}
# URI included with question mark is not being cached
- if ($request_uri ~* (\?)){
- return 200;
+ if ($request_uri ~* (\?)) {
+ return 200;
+ }
+ if ($request_method = "PUT") {
+ return 200;
+ }
+ if ($request_method = "POST") {
+ return 200;
+ }
+ if ($request_method = "HEAD") {
+ return 200;
+ }
+ if ($request_method = "COPY") {
+ return 200;
+ }
+ if ($request_method = "DELETE") {
+ return 200;
+ }
+ if ($http_if_match) {
+ return 200;
+ }
+ if ($http_authorization !~* "aws4_request") {
+ return 200;
}
}
location @outage{
- return 403;
+ return 403;
}
location / {
slice 1m;
- limit_except GET { deny all; }
auth_request /authentication;
proxy_set_header Range $slice_range;
proxy_pass http://rgws;
# prevent conversion of head requests to get requests
proxy_cache_convert_head off;
# Listing all buckets should not be cached
- if ($request_uri = "/"){
- set $do_not_cache "no";
- set $date $http_x_amz_date;
+ if ($request_uri = "/") {
+ set $do_not_cache "no";
+ set $date $http_x_amz_date;
}
# URI including question mark are not supported to prevent bucket listing cache
- if ($request_uri ~* (\?)){
+ if ($request_uri ~* (\?)) {
set $do_not_cache "no";
- set $date $http_x_amz_date;
+ set $date $http_x_amz_date;
}
# Only aws4 requests are being cached, as the aws auth module supports only aws v4
if ($http_authorization !~* "aws4_request") {
set $date $http_x_amz_date;
}
+ if ($request_method = "PUT") {
+ set $date $http_x_amz_date;
+ }
+ if ($request_method = "POST") {
+ set $date $http_x_amz_date;
+ }
+ if ($request_method = "HEAD") {
+ set $do_not_cache "no";
+ set $date $http_x_amz_date;
+ }
+ if ($request_method = "COPY") {
+ set $do_not_cache "no";
+ set $date $http_x_amz_date;
+ }
+ if ($http_if_match) {
+ #set $do_not_cache "no";
+ set $date $http_x_amz_date;
+ set $myrange $slice_range;
+ }
+ if ($request_method = "DELETE") {
+ set $do_not_cache "no";
+ set $date $http_x_amz_date;
+ }
+ proxy_set_header if_match $http_if_match;
# Use the original x-amz-date if the aws auth module didn't create one
proxy_set_header x-amz-date $date;
proxy_set_header X-Amz-Cache $authvar;
proxy_no_cache $do_not_cache;
- proxy_set_header Authorization $awsauth;
+ proxy_set_header Authorization $awsauthfour;
# This is on which content the nginx to use for hashing the cache keys
proxy_cache_key "$request_uri$request_method$request_body$slice_range";
- client_max_body_size 20G;
+ client_max_body_size 0;
}
}
default $http_authorization;
~. $aws_token; # Regular expression to match any value
}
+ map $request_uri $awsauthtwo {
+ "/" $http_authorization;
+ "~\?" $http_authorization;
+ default $awsauth;
+ }
+ map $request_method $awsauththree {
+ default $awsauthtwo;
+ "PUT" $http_authorization;
+ "HEAD" $http_authorization;
+ "POST" $http_authorization;
+ "DELETE" $http_authorization;
+ "COPY" $http_authorization;
+ }
+ map $http_if_match $awsauthfour {
+ ~. $http_authorization; # Regular expression to match any value
+ default $awsauththree;
+ }
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
}
const char* const t = info.env->get(token_env.c_str());
if (!t) {
- dout(10) << "warning env var not available" << dendl;
+ dout(10) << "warning env var not available " << token_env.c_str() << dendl;
continue;
}