RGW Error Handling

Summary

Write-up of a recent deep dive into RGW error handling.

It is written from the perspective of a storage backend, such as the s3gw filesystem+SQLite backend.

The guiding question: how can storage errors be converted into client errors in a meaningful way?

S3 Errors - What do they look like?

S3 uses HTTP return codes first and adds detail via the XML error response.

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>CamelCasedError code from ErrorCodeList </Code>
  <Message>Message for human consumption</Message>
  <Resource>Bucket or object that caused the error</Resource>
  <RequestId>ID of request associated with the error</RequestId>
</Error>
Example S3 error reply
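
As a rough illustration of how a client might read such a reply, here is a minimal sketch using only the Python standard library. The names `S3Error`, `parse_s3_error` and `EXAMPLE_BODY`, as well as the example values (bucket, key, request ID), are made up for this sketch and are not part of any S3 SDK.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class S3Error:
    """Fields of an S3 XML error reply (hypothetical helper type)."""
    code: str
    message: Optional[str]
    resource: Optional[str]
    request_id: Optional[str]


def parse_s3_error(body: bytes) -> S3Error:
    """Parse the XML body of an S3 error reply into an S3Error."""
    root = ET.fromstring(body)
    if root.tag != "Error":
        raise ValueError(f"not an S3 error document: <{root.tag}>")

    def text(tag: str) -> Optional[str]:
        # Return the stripped text of a child element, or None if absent/empty.
        node = root.find(tag)
        if node is None or node.text is None:
            return None
        return node.text.strip()

    code = text("Code")
    if code is None:
        raise ValueError("S3 error document without <Code>")
    return S3Error(code, text("Message"), text("Resource"), text("RequestId"))


# Example reply with made-up values.
EXAMPLE_BODY = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<Error>\n"
    b"  <Code>NoSuchKey</Code>\n"
    b"  <Message>The specified key does not exist.</Message>\n"
    b"  <Resource>/mybucket/myobject</Resource>\n"
    b"  <RequestId>tx-0001</RequestId>\n"
    b"</Error>\n"
)

error = parse_s3_error(EXAMPLE_BODY)
print(error.code, error.resource)
```

The HTTP status code still comes from the response itself; the XML body only adds detail.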

Error Handling in RGW

RGW uses negative return values to signal errors. C++ exceptions are not expected and generally bubble up to the top, crashing the application.

Error codes are a mixture of custom (ERR_...) and errno.h (E...) errors. The errno.h errors usually only make sense when viewed from a client perspective. This is important: RGW itself uses EINVAL, EPERM, EACCES, ENOENT and EEXIST, so passing filesystem errors from a backend further up will result in weird behavior.
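
To make the pitfall concrete, here is a small sketch of how a backend could translate its own filesystem errors before returning them. It is written in Python for brevity although RGW itself is C++. The function name `backend_error_to_rgw`, the `object_lookup` flag and the numeric value of `ERR_INTERNAL_ERROR` are assumptions for illustration only; the real constant is defined in the RGW sources.

```python
import errno

# Hypothetical numeric value, chosen only so it cannot collide with errno.h values.
ERR_INTERNAL_ERROR = 9001


def backend_error_to_rgw(os_errno: int, *, object_lookup: bool = False) -> int:
    """Translate a positive errno from the backend's filesystem or database
    into a negative RGW-style return value.

    Only errors that mean the same thing to the client are passed through.
    Everything else is reported as an internal error, so that, for example,
    a permission problem on a backend file is not shown to the client as
    AccessDenied.
    """
    if os_errno == 0:
        return 0  # success
    if os_errno == errno.ENOENT and object_lookup:
        # The client asked for an object that does not exist: a real NoSuchKey.
        return -errno.ENOENT
    # EACCES on a database file, ENOENT for a missing directory, EIO, ...
    return -ERR_INTERNAL_ERROR


print(backend_error_to_rgw(errno.EACCES))
print(backend_error_to_rgw(errno.ENOENT, object_lookup=True))
```

Which errors are safe to pass through depends on the operation, so a real backend would likely need a per-operation decision rather than a single flag.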

Standard Errors?

For completeness, HTTP codes are nicely documented here:

Amazon has a list of error codes in their docs:

RGW defines more on top, as do other vendors:

RGW Error Code Implementation

There is a global mapping of type rgw_http_errors from error number (err_no) to an HTTP code and an error code string. Because RGW supports multiple protocols, there are separate mappings for the S3, SWIFT, STS and IAM protocol frontends.

RGW operations (descendants of RGWOp such as GetObj or ListBuckets; I counted 119 while writing this) have an error state as part of their req_state. Request processing converts this state into an HTTP return code and an S3-style error response. The response may contain a message field if req_state.err.message is set.

The SAL layer does not have access to the error state. SAL backends may only return error numbers, no additional message.
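
The lookup and conversion described above can be sketched roughly as follows. This is a simplified Python model, not the RGW code: `HTTP_ERRORS`, `lookup` and `render_s3_error` are hypothetical names, the table holds only a handful of entries, and the numeric values of the ERR_* constants are invented.

```python
import errno
from typing import Dict, Optional, Tuple
from xml.sax.saxutils import escape

# Hypothetical numeric values; the real ones are defined in the RGW sources.
ERR_NO_SUCH_BUCKET = 9002
ERR_INTERNAL_ERROR = 9001

# err_no -> (http_ret, err_code), a small excerpt in the spirit of rgw_http_errors.
HTTP_ERRORS: Dict[int, Tuple[int, str]] = {
    0: (200, ""),
    errno.EINVAL: (400, "InvalidArgument"),
    errno.EACCES: (403, "AccessDenied"),
    errno.EPERM: (403, "AccessDenied"),
    errno.ENOENT: (404, "NoSuchKey"),
    errno.EEXIST: (409, "BucketAlreadyExists"),
    ERR_NO_SUCH_BUCKET: (404, "NoSuchBucket"),
    ERR_INTERNAL_ERROR: (500, "InternalError"),
}
FALLBACK = (500, "UnknownError")


def lookup(ret: int) -> Tuple[int, str]:
    """Map a (negative) return value to (http_ret, err_code)."""
    return HTTP_ERRORS.get(abs(ret), FALLBACK)


def render_s3_error(ret: int, resource: str, request_id: str,
                    message: Optional[str] = None) -> Tuple[int, str]:
    """Build the HTTP status and an S3-style XML error body.

    The Message element is only emitted when a message was set, mirroring
    the optional req_state.err.message.
    """
    http_ret, err_code = lookup(ret)
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<Error>",
             f"  <Code>{escape(err_code)}</Code>"]
    if message:
        lines.append(f"  <Message>{escape(message)}</Message>")
    lines.append(f"  <Resource>{escape(resource)}</Resource>")
    lines.append(f"  <RequestId>{escape(request_id)}</RequestId>")
    lines.append("</Error>")
    return http_ret, "\n".join(lines)


status, body = render_s3_error(-errno.ENOENT, "/mybucket/myobject", "tx-0001")
print(status)
print(body)
```

Since a SAL backend can only hand back the number, the `message` argument in this sketch would have to be filled in by the operation, not by the backend.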

Source links:

RGW Error Code Table

Based on the lookup table in rgw/rgw_common.cc, but reordered a bit.

| RGW Error [err_no] | HTTP error code [http_ret] | Error Code [err_code] | Notes |
|---|---|---|---|
| 200… | | | |
| 0 | 200 | | |
| STATUS_CREATED | 201 | "Created" | |
| STATUS_ACCEPTED | 202 | "Accepted" | |
| STATUS_NO_CONTENT | 204 | "NoContent" | |
| STATUS_PARTIAL_CONTENT | 206 | | |
| 300… | | | |
| ERR_PERMANENT_REDIRECT | 301 | "PermanentRedirect" | |
| ERR_WEBSITE_REDIRECT | 301 | "WebsiteRedirect" | |
| STATUS_REDIRECT | 303 | | |
| ERR_NOT_MODIFIED | 304 | "NotModified" | |
| 400… | | | |
| EINVAL | 400 | "InvalidArgument" | |
| ERR_INVALID_REQUEST | 400 | "InvalidRequest" | |
| ERR_INVALID_DIGEST | 400 | "InvalidDigest" | |
| ERR_BAD_DIGEST | 400 | "BadDigest" | |
| ERR_INVALID_LOCATION_CONSTRAINT | 400 | "InvalidLocationConstraint" | |
| ERR_ZONEGROUP_DEFAULT_PLACEMENT_MISCONFIGURATION | 400 | "ZonegroupDefaultPlacementMisconfiguration" | |
| ERR_INVALID_BUCKET_NAME | 400 | "InvalidBucketName" | |
| ERR_INVALID_OBJECT_NAME | 400 | "InvalidObjectName" | non standard? |
| ERR_UNRESOLVABLE_EMAIL | 400 | "UnresolvableGrantByEmailAddress" | |
| ERR_MALFORMED_XML | 400 | "MalformedXML" | |
| ERR_AMZ_CONTENT_SHA256_MISMATCH | 400 | "XAmzContentSHA256Mismatch" | |
| ERR_MALFORMED_DOC | 400 | "MalformedPolicyDocument" | |
| ERR_INVALID_TAG | 400 | "InvalidTag" | |
| ERR_MALFORMED_ACL_ERROR | 400 | "MalformedACLError" | |
| ERR_INVALID_CORS_RULES_ERROR | 400 | "InvalidRequest" | |
| ERR_INVALID_WEBSITE_ROUTING_RULES_ERROR | 400 | "InvalidRequest" | |
| ERR_INVALID_ENCRYPTION_ALGORITHM | 400 | "InvalidEncryptionAlgorithmError" | |
| ERR_INVALID_RETENTION_PERIOD | 400 | "InvalidRetentionPeriod" | |
| EACCES | 403 | "AccessDenied" | |
| EPERM | 403 | "AccessDenied" | |
| ERR_LENGTH_REQUIRED | 411 | "MissingContentLength" | HTTP header Content-Length |
| ERR_SIGNATURE_NO_MATCH | 403 | "SignatureDoesNotMatch" | |
| ERR_INVALID_ACCESS_KEY | 403 | "InvalidAccessKeyId" | |
| ERR_USER_SUSPENDED | 403 | "UserSuspended" | |
| ERR_REQUEST_TIME_SKEWED | 403 | "RequestTimeTooSkewed" | |
| ERR_MFA_REQUIRED | 403 | "AccessDenied" | |
| ENOENT | 404 | "NoSuchKey" | |
| ERR_NO_SUCH_WEBSITE_CONFIGURATION | 404 | "NoSuchWebsiteConfiguration" | |
| ERR_NOT_FOUND | 404 | "Not Found" | unused |
| ERR_NO_SUCH_LC | 404 | "NoSuchLifecycleConfiguration" | |
| ERR_NO_SUCH_BUCKET_POLICY | 404 | "NoSuchBucketPolicy" | |
| ERR_NO_SUCH_USER | 404 | "NoSuchUser" | RGW Admin |
| ERR_NO_ROLE_FOUND | 404 | "NoSuchEntity" | |
| ERR_NO_CORS_FOUND | 404 | "NoSuchCORSConfiguration" | |
| ERR_NO_SUCH_SUBUSER | 404 | "NoSuchSubUser" | |
| ERR_NO_SUCH_CORS_CONFIGURATION | 404 | "NoSuchCORSConfiguration" | |
| ERR_NO_SUCH_OBJECT_LOCK_CONFIGURATION | 404 | "ObjectLockConfigurationNotFoundError" | |
| ERR_METHOD_NOT_ALLOWED | 405 | "MethodNotAllowed" | |
| ERR_USER_EXIST | 409 | "UserAlreadyExists" | |
| ERR_EMAIL_EXIST | 409 | "EmailExists" | |
| ERR_KEY_EXIST | 409 | "KeyExists" | |
| ERR_TAG_CONFLICT | 409 | "OperationAborted" | |
| ERR_INVALID_SECRET_KEY | 400 | "InvalidSecretKey" | |
| ERR_INVALID_KEY_TYPE | 400 | "InvalidKeyType" | |
| ERR_INVALID_CAP | 400 | "InvalidCapability" | |
| ERR_INVALID_TENANT_NAME | 400 | "InvalidTenantName" | |
| ERR_PRECONDITION_FAILED | 412 | "PreconditionFailed" | |
| ERR_LOCKED | 423 | "Locked" | |
| ERR_ZERO_IN_URL | 400 | "InvalidRequest" | |
| ERR_NO_SUCH_TAG_SET | 404 | "NoSuchTagSet" | |
| ERR_NO_SUCH_BUCKET_ENCRYPTION_CONFIGURATION | 404 | "ServerSideEncryptionConfigurationNotFoundError" | |
| ERR_LIMIT_EXCEEDED | 400 | "LimitExceeded" | RGW put ACLs |
| ERR_NO_SUCH_ENTITY | 404 | "NoSuchEntity" | |
| ERR_POSITION_NOT_EQUAL_TO_LENGTH | 409 | "PositionNotEqualToLength" | append object processor |
| ERR_OBJECT_NOT_APPENDABLE | 409 | "ObjectNotAppendable" | append object processor |
| ERR_UNPROCESSABLE_ENTITY | 422 | "UnprocessableEntity" | etag in put obj |
| ERR_INVALID_OBJECT_STATE | 403 | "InvalidObjectState" | (tiering) |
| ERR_INVALID_BUCKET_STATE | 409 | "InvalidBucketState" | |
| Storage related… | | | |
| ERR_INVALID_PART | 400 | "InvalidPart" | (multipart) |
| ERR_INVALID_PART_ORDER | 400 | "InvalidPartOrder" | (multipart) |
| ERR_NO_SUCH_UPLOAD | 404 | "NoSuchUpload" | (multipart) |
| ERR_TOO_LARGE | 400 | "EntityTooLarge" | > allowed obj size, S3 Limits, rgw config |
| ERR_TOO_SMALL | 400 | "EntityTooSmall" | < min obj size |
| ERR_TOO_MANY_BUCKETS | 400 | "TooManyBuckets" | Quota |
| ERR_NO_SUCH_BUCKET | 404 | "NoSuchBucket" | |
| EEXIST | 409 | "BucketAlreadyExists" | |
| ENOTEMPTY | 409 | "BucketNotEmpty" | |
| ERANGE | 416 | "InvalidRange" | e.g. offset >= object size |
| ERR_QUOTA_EXCEEDED | 403 | "QuotaExceeded" | ENOSPC, Quota |
| Misc, Internal, Rate Limit… | | | |
| ETIMEDOUT | 408 | "RequestTimeout" | |
| ERR_REQUEST_TIMEOUT | 400 | "RequestTimeout" | |
| ERR_INTERNAL_ERROR | 500 | "InternalError" | |
| ERR_NOT_IMPLEMENTED | 501 | "NotImplemented" | |
| ERR_SERVICE_UNAVAILABLE | 503 | "ServiceUnavailable" | |
| ERR_RATE_LIMITED | 503 | "SlowDown" | |
| fallback | 500 | "UnknownError" | |

Client Side

Permanent errors have to be handled according to their meaning. Boto3, for example, raises a matching exception.

Temporary errors are a bit more interesting. To quote the S3 docs:

Internal errors are errors that occur within the Amazon S3 environment.

Requests that receive an InternalError response might not have processed. For example, if a PUT request returns InternalError, a subsequent GET might retrieve the old value or the updated value.

If Amazon S3 returns an InternalError response, retry the request.

From Amazon S3 User Guide Error Best Practices

Boto3 Retry Behavior

There is a standard mode and a legacy mode. As far as I can tell, they differ mostly in the retry count and in the list of retryable errors.

Both modes retry errors from a list of retryable errors. The list is given in the boto3 retries doc and consists of transient errors and throttling / rate limit errors.

Summary: Retry anything that looks like a transient server side error or rate limiting.

Both modes use exponential backoff.
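
To illustrate the general idea of retrying transient errors with exponential backoff, here is a standalone sketch. It is not boto3's implementation. `S3RequestError`, `call_with_retries`, the default attempt count and delays, the use of random jitter, and the short `RETRYABLE_CODES` set (picked from the RGW table above) are all assumptions made for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Illustrative subset only; see the boto3 retries doc for the real list.
RETRYABLE_CODES = {"InternalError", "ServiceUnavailable", "SlowDown", "RequestTimeout"}


class S3RequestError(Exception):
    """Hypothetical exception carrying the S3 error code and HTTP status."""

    def __init__(self, code: str, http_status: int) -> None:
        super().__init__(f"{http_status} {code}")
        self.code = code
        self.http_status = http_status


def call_with_retries(request: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 0.1, max_delay: float = 20.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call request(), retrying retryable errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except S3RequestError as err:
            # Permanent errors and the final attempt are passed on to the caller.
            if err.code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            # The delay ceiling doubles with each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from concurrent clients.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

A permanent error such as NoSuchKey is raised immediately, while SlowDown or InternalError is retried until the attempts are used up. The `sleep` parameter exists only so the waiting can be replaced in tests.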