RGW Error Handling

Summary

Write-up of a recent deep dive into RGW error handling.

It is written from the perspective of a storage backend such as the s3gw filesystem+SQLite backend.

The guiding question: how can storage errors be converted into client errors in a meaningful way?

S3 Errors - What Do They Look Like?

S3 signals errors primarily through the HTTP status code and adds detail in an XML error response.

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>CamelCased error code from the error code list</Code>
  <Message>Message for human consumption</Message>
  <Resource>Bucket or object that caused the error</Resource>
  <RequestId>ID of request associated with the error</RequestId>
</Error>
Example S3 error reply
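
To make the structure of such a reply concrete, here is a minimal client-side sketch that parses an error body with Python's standard library. The sample values (bucket, object, request ID) and the helper name parse_s3_error are made up for illustration; only the element names come from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical error body, shaped like the example reply above.
SAMPLE_ERROR = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Resource>/mybucket/myobject</Resource>
  <RequestId>tx-example-0001</RequestId>
</Error>"""


def parse_s3_error(body: bytes) -> dict:
    """Return the child elements of an S3 <Error> document as a dict."""
    root = ET.fromstring(body)
    if root.tag != "Error":
        raise ValueError(f"not an S3 error document: <{root.tag}>")
    # Missing text (empty elements) is normalized to an empty string.
    return {child.tag: (child.text or "").strip() for child in root}


error = parse_s3_error(SAMPLE_ERROR)
print(error["Code"], "-", error["Message"])
```

A client would normally look at the HTTP status code first and use Code only to distinguish errors that share a status.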

Error Handling in RGW

RGW uses negative return values to signal errors. C++ exceptions are not expected; an uncaught exception generally bubbles up to the top and crashes the application.

Error codes are a mixture of custom (ERR_...) and errno.h (E...) errors. The errno.h errors usually only make sense when viewed from the client's perspective. This is important: RGW gives EINVAL, EPERM, EACCES, ENOENT and EEXIST client-facing meanings, so passing raw filesystem errors further up will result in weird behavior.
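
One possible way for a backend to avoid leaking filesystem errors is to let each call site name the errno values it expects and to turn everything else into an internal error. The following is a sketch of that idea in Python rather than RGW's C++. The function to_rgw_ret and the numeric value of ERR_INTERNAL_ERROR are assumptions made for illustration, not RGW code.

```python
import errno

# Placeholder number for illustration only; RGW defines the real
# ERR_INTERNAL_ERROR value in its own headers.
ERR_INTERNAL_ERROR = 2200


def to_rgw_ret(backend_errno: int, expected: frozenset = frozenset()) -> int:
    """Turn a positive backend errno into the negative value returned to RGW.

    Only errno values the caller explicitly expects keep their client-facing
    meaning. Everything else is reported as an internal error.
    """
    if backend_errno == 0:
        return 0
    if backend_errno in expected:
        return -backend_errno
    return -ERR_INTERNAL_ERROR


# Object lookup: a missing file really means "no such key".
print(to_rgw_ret(errno.ENOENT, expected=frozenset({errno.ENOENT})))
# Opening the backend's own database file: a missing file is a server problem.
print(to_rgw_ret(errno.ENOENT))
```

The same ENOENT is passed through in the first call and hidden in the second, because only in the first case does it describe something the client asked for.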

Standard Errors?

For completeness, HTTP codes are nicely documented here:

Amazon has a list of error codes in their docs:

RGW defines more on top, as do other vendors:

RGW Error Code Implementation

There is a global mapping of type rgw_http_errors from error number (err_no) to an HTTP code and an error code string. Because RGW supports multiple protocols, there are separate mappings for the S3, Swift, STS and IAM protocol frontends.
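
The lookup can be pictured as a dictionary with a fallback entry. This Python sketch models only a handful of S3 rows taken from the error code table in this write-up. The ERR_ numbers are placeholders, and the sign normalization in lookup is an assumption about how negative return values reach the table.

```python
import errno
from typing import NamedTuple


class HttpError(NamedTuple):
    http_ret: int
    err_code: str


# Placeholder numbers for illustration; the real values live in RGW's headers.
ERR_NO_SUCH_BUCKET = 2002
ERR_INTERNAL_ERROR = 2200

# A small excerpt of the S3 mapping: err_no -> (http_ret, err_code).
S3_ERRORS = {
    0: HttpError(200, ""),
    errno.EINVAL: HttpError(400, "InvalidArgument"),
    errno.EACCES: HttpError(403, "AccessDenied"),
    errno.EPERM: HttpError(403, "AccessDenied"),
    errno.ENOENT: HttpError(404, "NoSuchKey"),
    errno.EEXIST: HttpError(409, "BucketAlreadyExists"),
    errno.ENOTEMPTY: HttpError(409, "BucketNotEmpty"),
    errno.ERANGE: HttpError(416, "InvalidRange"),
    ERR_NO_SUCH_BUCKET: HttpError(404, "NoSuchBucket"),
    ERR_INTERNAL_ERROR: HttpError(500, "InternalError"),
}

FALLBACK = HttpError(500, "UnknownError")


def lookup(ret: int) -> HttpError:
    """Map a (usually negative) return value to HTTP code and error string."""
    err_no = -ret if ret < 0 else ret
    return S3_ERRORS.get(err_no, FALLBACK)


print(lookup(-errno.ENOENT))
print(lookup(-errno.EIO))
```

An errno without a row, such as EIO in this excerpt, ends up in the fallback, which is why a backend should translate such errors before returning them.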

RGW operations (descendants of RGWOp; I counted 119 while writing this, for example GetObj and ListBuckets) have an error state as part of their req_state. The request processing converts this state into an HTTP return code and an S3-style error response. The response may contain a message field if req_state.err.message is set.

The SAL layer does not have access to the error state. SAL backends may only return error numbers, no additional message.
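
The effect of the optional message can be sketched as follows. This is an illustrative Python rendering of an S3-style error body, not RGW's actual response code; render_s3_error and its parameters are hypothetical names. When only an error number came back from the backend, there is no message to include.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def render_s3_error(code: str, resource: str, request_id: str,
                    message: Optional[str] = None) -> bytes:
    """Build an S3-style <Error> body; <Message> is emitted only if set."""
    root = ET.Element("Error")
    ET.SubElement(root, "Code").text = code
    if message:
        ET.SubElement(root, "Message").text = message
    ET.SubElement(root, "Resource").text = resource
    ET.SubElement(root, "RequestId").text = request_id
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Error number only, as returned by a storage backend: no message element.
print(render_s3_error("NoSuchKey", "/mybucket/myobject", "tx-example-0001").decode())
# Operation code that also set a message.
print(render_s3_error("InvalidArgument", "/mybucket", "tx-example-0002",
                      message="Invalid value for max-keys").decode())
```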

Source links:

RGW Error Code Table

Based on the lookup table in rgw/rgw_common.cc, but reordered a bit.

Columns: RGW error [err_no], HTTP error code [http_ret], error code string [err_code], notes.
200…
0 200
STATUS_CREATED 201 "Created"
STATUS_ACCEPTED 202 "Accepted"
STATUS_NO_CONTENT 204 "NoContent"
STATUS_PARTIAL_CONTENT 206
300…
ERR_PERMANENT_REDIRECT 301 "PermanentRedirect"
ERR_WEBSITE_REDIRECT 301 "WebsiteRedirect"
STATUS_REDIRECT 303
ERR_NOT_MODIFIED 304 "NotModified"
400…
EINVAL 400 "InvalidArgument"
ERR_INVALID_REQUEST 400 "InvalidRequest"
ERR_INVALID_DIGEST 400 "InvalidDigest"
ERR_BAD_DIGEST 400 "BadDigest"
ERR_INVALID_LOCATION_CONSTRAINT 400 "InvalidLocationConstraint"
ERR_ZONEGROUP_DEFAULT_PLACEMENT_MISCONFIGURATION 400 "ZonegroupDefaultPlacementMisconfiguration"
ERR_INVALID_BUCKET_NAME 400 "InvalidBucketName"
ERR_INVALID_OBJECT_NAME 400 "InvalidObjectName" non-standard?
ERR_UNRESOLVABLE_EMAIL 400 "UnresolvableGrantByEmailAddress"
ERR_MALFORMED_XML 400 "MalformedXML"
ERR_AMZ_CONTENT_SHA256_MISMATCH 400 "XAmzContentSHA256Mismatch"
ERR_MALFORMED_DOC 400 "MalformedPolicyDocument"
ERR_INVALID_TAG 400 "InvalidTag"
ERR_MALFORMED_ACL_ERROR 400 "MalformedACLError"
ERR_INVALID_CORS_RULES_ERROR 400 "InvalidRequest"
ERR_INVALID_WEBSITE_ROUTING_RULES_ERROR 400 "InvalidRequest"
ERR_INVALID_ENCRYPTION_ALGORITHM 400 "InvalidEncryptionAlgorithmError"
ERR_INVALID_RETENTION_PERIOD 400 "InvalidRetentionPeriod"
EACCES 403 "AccessDenied"
EPERM 403 "AccessDenied"
ERR_LENGTH_REQUIRED 411 "MissingContentLength" HTTP header Content-Length
ERR_SIGNATURE_NO_MATCH 403 "SignatureDoesNotMatch"
ERR_INVALID_ACCESS_KEY 403 "InvalidAccessKeyId"
ERR_USER_SUSPENDED 403 "UserSuspended"
ERR_REQUEST_TIME_SKEWED 403 "RequestTimeTooSkewed"
ERR_MFA_REQUIRED 403 "AccessDenied"
ENOENT 404 "NoSuchKey"
ERR_NO_SUCH_WEBSITE_CONFIGURATION 404 "NoSuchWebsiteConfiguration"
ERR_NOT_FOUND 404 "Not Found" unused
ERR_NO_SUCH_LC 404 "NoSuchLifecycleConfiguration"
ERR_NO_SUCH_BUCKET_POLICY 404 "NoSuchBucketPolicy"
ERR_NO_SUCH_USER 404 "NoSuchUser" RGW Admin
ERR_NO_ROLE_FOUND 404 "NoSuchEntity"
ERR_NO_CORS_FOUND 404 "NoSuchCORSConfiguration"
ERR_NO_SUCH_SUBUSER 404 "NoSuchSubUser"
ERR_NO_SUCH_CORS_CONFIGURATION 404 "NoSuchCORSConfiguration"
ERR_NO_SUCH_OBJECT_LOCK_CONFIGURATION 404 "ObjectLockConfigurationNotFoundError"
ERR_METHOD_NOT_ALLOWED 405 "MethodNotAllowed"
ERR_USER_EXIST 409 "UserAlreadyExists"
ERR_EMAIL_EXIST 409 "EmailExists"
ERR_KEY_EXIST 409 "KeyExists"
ERR_TAG_CONFLICT 409 "OperationAborted"
ERR_INVALID_SECRET_KEY 400 "InvalidSecretKey"
ERR_INVALID_KEY_TYPE 400 "InvalidKeyType"
ERR_INVALID_CAP 400 "InvalidCapability"
ERR_INVALID_TENANT_NAME 400 "InvalidTenantName"
ERR_PRECONDITION_FAILED 412 "PreconditionFailed"
ERR_LOCKED 423 "Locked"
ERR_ZERO_IN_URL 400 "InvalidRequest"
ERR_NO_SUCH_TAG_SET 404 "NoSuchTagSet"
ERR_NO_SUCH_BUCKET_ENCRYPTION_CONFIGURATION 404 "ServerSideEncryptionConfigurationNotFoundError"
ERR_LIMIT_EXCEEDED 400 "LimitExceeded" RGW put ACLs
ERR_NO_SUCH_ENTITY 404 "NoSuchEntity"
ERR_POSITION_NOT_EQUAL_TO_LENGTH 409 "PositionNotEqualToLength" append object processor
ERR_OBJECT_NOT_APPENDABLE 409 "ObjectNotAppendable" append object processor
ERR_UNPROCESSABLE_ENTITY 422 "UnprocessableEntity" etag in put obj
ERR_INVALID_OBJECT_STATE 403 "InvalidObjectState" (tiering)
ERR_INVALID_BUCKET_STATE 409 "InvalidBucketState"
Storage related…
ERR_INVALID_PART 400 "InvalidPart" (multipart)
ERR_INVALID_PART_ORDER 400 "InvalidPartOrder" (multipart)
ERR_NO_SUCH_UPLOAD 404 "NoSuchUpload" (multipart)
ERR_TOO_LARGE 400 "EntityTooLarge" > allowed obj size, S3 Limits, rgw config
ERR_TOO_SMALL 400 "EntityTooSmall" < min obj size
ERR_TOO_MANY_BUCKETS 400 "TooManyBuckets" Quota
ERR_NO_SUCH_BUCKET 404 "NoSuchBucket"
EEXIST 409 "BucketAlreadyExists"
ENOTEMPTY 409 "BucketNotEmpty"
ERANGE 416 "InvalidRange" e.g. offset >= object size
ERR_QUOTA_EXCEEDED 403 "QuotaExceeded" ENOSPC, Quota
Misc, Internal, Rate Limit..
ETIMEDOUT 408 "RequestTimeout"
ERR_REQUEST_TIMEOUT 400 "RequestTimeout"
ERR_INTERNAL_ERROR 500 "InternalError"
ERR_NOT_IMPLEMENTED 501 "NotImplemented"
ERR_SERVICE_UNAVAILABLE 503 "ServiceUnavailable"
ERR_RATE_LIMITED 503 "SlowDown"
fallback 500 "UnknownError"

Client Side

Permanent errors have to be handled according to their meaning. Boto3, for example, raises a matching exception.

Temporary errors are a bit more interesting. To quote the S3 docs:

Internal errors are errors that occur within the Amazon S3 environment.

Requests that receive an InternalError response might not have processed. For example, if a PUT request returns InternalError, a subsequent GET might retrieve the old value or the updated value.

If Amazon S3 returns an InternalError response, retry the request.

From Amazon S3 User Guide Error Best Practices

Boto3 Retry Behavior

There is a standard and a legacy mode. As far as I can tell, they differ mostly in the retry count and the list of retryable errors.

Both modes retry errors from a list of retryable errors. The errors are listed in the boto3 retries documentation and consist of transient and throttling / rate limit errors.

Summary: Retry anything that looks like a transient server side error or rate limiting.

Both modes use exponential backoff.
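
The retry idea can be sketched without boto3. The code below is a standard-library illustration of retrying transient errors with exponential backoff, not boto3's implementation. The S3Error class, the retryable sets (a small illustrative subset) and the default attempt count and delays are all assumptions. Random jitter is added here as a common companion to exponential backoff.

```python
import random
import time

# Illustrative subset only; the boto3 retries doc has the authoritative lists.
RETRYABLE_HTTP_STATUS = {500, 502, 503, 504}
RETRYABLE_ERROR_CODES = {"RequestTimeout", "SlowDown"}


class S3Error(Exception):
    """Hypothetical client-side error carrying HTTP status and S3 error code."""

    def __init__(self, status: int, code: str):
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code


def is_retryable(err: S3Error) -> bool:
    return err.status in RETRYABLE_HTTP_STATUS or err.code in RETRYABLE_ERROR_CODES


def call_with_retries(operation, max_attempts=3, base_delay=0.1,
                      max_delay=20.0, sleep=time.sleep):
    """Call operation(), retrying transient errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except S3Error as err:
            if attempt == max_attempts or not is_retryable(err):
                raise
            # Delay ceiling doubles with every attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

A permanent error such as 404 NoSuchKey is raised on the first attempt, while a 503 SlowDown is retried until the attempts are used up.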