RGW Error Handling
Summary
Write-up of a recent RGW error handling deep dive, from the perspective of a storage backend such as the s3gw filesystem+SQLite backend.
The guiding question: how can storage errors be converted to client errors in a meaningful way?
S3 Errors - What do they look like?
S3 uses HTTP return codes first and adds detail via the XML error response.
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>CamelCasedError code from ErrorCodeList </Code>
<Message>Message for human consumption</Message>
<Resource>Bucket or object that caused the error</Resource>
<RequestId>ID of request associated with the error</RequestId>
</Error>
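To make the layout concrete, here is a minimal sketch of how a client could read such a body with Python's standard library. The sample values (bucket, key, request ID) and the helper name parse_s3_error are made up for illustration and do not come from RGW or any SDK.

```python
import xml.etree.ElementTree as ET

# Sample body following the layout above; all values are invented.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Resource>/mybucket/missing.txt</Resource>
  <RequestId>tx000000000000000000001</RequestId>
</Error>"""


def parse_s3_error(body: bytes) -> dict:
    """Return the child elements of <Error> as a plain dict (hypothetical helper)."""
    root = ET.fromstring(body)
    if root.tag != "Error":
        raise ValueError(f"not an S3 error document: <{root.tag}>")
    # Each child (Code, Message, ...) becomes one dict entry.
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    err = parse_s3_error(SAMPLE)
    print(err["Code"], "-", err["Message"])
```

The HTTP status code travels separately in the response line, so a real client would look at both.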
Error Handling in RGW
RGW uses negative return values to signal errors. C++ exceptions are not expected and generally bubble up to the top, crashing the application.
Error codes are a mixture of custom (ERR_...) and errno.h (E...) errors.
The errno.h errors usually only make sense if you look at them from a client perspective. (This is important! RGW uses EINVAL, EPERM, EACCES, ENOENT, EEXIST. Passing filesystem errors further up will result in weird behavior.)
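A backend can avoid this by translating its own errors before returning them. The sketch below shows the idea in Python rather than C++ to keep it short. The function name and the policy are assumptions: ENOSPC maps to ERR_QUOTA_EXCEEDED because the table's note on that row suggests it, and everything else becomes ERR_INTERNAL_ERROR. Error names are used instead of numbers because the ERR_ values live in rgw/rgw_common.h.

```python
import errno

# Assumed policy: only errors with a clear client meaning are translated.
# ENOSPC -> ERR_QUOTA_EXCEEDED follows the note in the table in this write-up.
BACKEND_TO_RGW = {
    errno.ENOSPC: "ERR_QUOTA_EXCEEDED",
}


def translate_backend_error(exc: OSError) -> str:
    """Map a filesystem-level error to an RGW error name (hypothetical helper).

    A filesystem EACCES means the backend cannot read its own files, which is
    a server problem. Passing it up unchanged would tell the client
    "AccessDenied" (403), so it is turned into an internal error instead.
    """
    return BACKEND_TO_RGW.get(exc.errno, "ERR_INTERNAL_ERROR")


if __name__ == "__main__":
    print(translate_backend_error(OSError(errno.EACCES, "permission denied")))
    print(translate_backend_error(OSError(errno.ENOSPC, "no space left")))
```

A real backend would decide per call site, since the same errno can mean different things depending on what was being accessed.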
Standard Errors?
For completeness, HTTP codes are nicely documented here:
Amazon has a list of error codes in their docs:
RGW defines more on top, as do other vendors:
RGW Error Code Implementation
There is a global mapping, typed rgw_http_errors, from error number (err_no) to an HTTP code and an error code string. (RGW supports multiple protocols, therefore there are mappings for the S3, SWIFT, STS and IAM protocol frontends.)
RGW operations (descendants of RGWOp; I counted 119 while writing this; examples: GetObj, ListBuckets, etc.) have an error state as part of their req_state.
The request processing converts this to HTTP return codes and an S3 style error response.
Responses may contain a message field if req_state.err.message is set.
The SAL layer does not have access to the error state. SAL backends may only return error numbers, no additional message.
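The flow described here can be sketched in a few lines: look up the error number, fall back to a generic 500, and render the S3 style body. This is an illustrative Python model, not RGW's C++ code. The numeric values of the two ERR_ constants are placeholders, the function names are invented, and the sketch assumes the sign of the return value is dropped before the lookup.

```python
import errno
from xml.sax.saxutils import escape

# Placeholder numbers; the real values are defined in rgw/rgw_common.h.
ERR_NO_SUCH_BUCKET = 9001
ERR_INTERNAL_ERROR = 9002

# Small excerpt of an err_no -> (http_ret, err_code) mapping.
HTTP_ERRORS = {
    errno.EINVAL: (400, "InvalidArgument"),
    errno.EACCES: (403, "AccessDenied"),
    errno.ENOENT: (404, "NoSuchKey"),
    ERR_NO_SUCH_BUCKET: (404, "NoSuchBucket"),
    ERR_INTERNAL_ERROR: (500, "InternalError"),
}


def resolve(ret: int) -> tuple:
    """Map a (negative) return value to (http_ret, err_code)."""
    return HTTP_ERRORS.get(abs(ret), (500, "UnknownError"))


def render_error(ret: int, resource: str, request_id: str, message: str = "") -> tuple:
    """Build the HTTP status and XML body; Message is only added when set."""
    http_ret, code = resolve(ret)
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', "<Error>", f"<Code>{code}</Code>"]
    if message:
        parts.append(f"<Message>{escape(message)}</Message>")
    parts.append(f"<Resource>{escape(resource)}</Resource>")
    parts.append(f"<RequestId>{escape(request_id)}</RequestId>")
    parts.append("</Error>")
    return http_ret, "".join(parts)


if __name__ == "__main__":
    status, body = render_error(-ERR_NO_SUCH_BUCKET, "/nope", "req-1")
    print(status, body)
```

Since a backend can only return the number, the Message element stays empty unless the operation itself sets one.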
Source links:
- rgw/rgw_common.cc contains the error number mappings, and the conversion logic
- rgw/rgw_common.h contains the ERR_ constant definitions
RGW Error Code Table
Based on the lookup table in rgw/rgw_common.cc, but reordered a bit.
RGW Error [err_no] | HTTP error code [http_ret] | Error Code [err_code] | Notes |
---|---|---|---|
200… | |||
0 | 200 | ||
STATUS_CREATED | 201 | "Created" | |
STATUS_ACCEPTED | 202 | "Accepted" | |
STATUS_NO_CONTENT | 204 | "NoContent" | |
STATUS_PARTIAL_CONTENT | 206 | ||
300… | |||
ERR_PERMANENT_REDIRECT | 301 | "PermanentRedirect" | |
ERR_WEBSITE_REDIRECT | 301 | "WebsiteRedirect" | |
STATUS_REDIRECT | 303 | ||
ERR_NOT_MODIFIED | 304 | "NotModified" | |
400… | |||
EINVAL | 400 | "InvalidArgument" | |
ERR_INVALID_REQUEST | 400 | "InvalidRequest" | |
ERR_INVALID_DIGEST | 400 | "InvalidDigest" | |
ERR_BAD_DIGEST | 400 | "BadDigest" | |
ERR_INVALID_LOCATION_CONSTRAINT | 400 | "InvalidLocationConstraint" | |
ERR_ZONEGROUP_DEFAULT_PLACEMENT_MISCONFIGURATION | 400 | "ZonegroupDefaultPlacementMisconfiguration" | |
ERR_INVALID_BUCKET_NAME | 400 | "InvalidBucketName" | |
ERR_INVALID_OBJECT_NAME | 400 | "InvalidObjectName" | non standard? |
ERR_UNRESOLVABLE_EMAIL | 400 | "UnresolvableGrantByEmailAddress" | |
ERR_MALFORMED_XML | 400 | "MalformedXML" | |
ERR_AMZ_CONTENT_SHA256_MISMATCH | 400 | "XAmzContentSHA256Mismatch" | |
ERR_MALFORMED_DOC | 400 | "MalformedPolicyDocument" | |
ERR_INVALID_TAG | 400 | "InvalidTag" | |
ERR_MALFORMED_ACL_ERROR | 400 | "MalformedACLError" | |
ERR_INVALID_CORS_RULES_ERROR | 400 | "InvalidRequest" | |
ERR_INVALID_WEBSITE_ROUTING_RULES_ERROR | 400 | "InvalidRequest" | |
ERR_INVALID_ENCRYPTION_ALGORITHM | 400 | "InvalidEncryptionAlgorithmError" | |
ERR_INVALID_RETENTION_PERIOD | 400 | "InvalidRetentionPeriod" | |
EACCES | 403 | "AccessDenied" | |
EPERM | 403 | "AccessDenied" | |
ERR_LENGTH_REQUIRED | 411 | "MissingContentLength" | HTTP header Content-Length |
ERR_SIGNATURE_NO_MATCH | 403 | "SignatureDoesNotMatch" | |
ERR_INVALID_ACCESS_KEY | 403 | "InvalidAccessKeyId" | |
ERR_USER_SUSPENDED | 403 | "UserSuspended" | |
ERR_REQUEST_TIME_SKEWED | 403 | "RequestTimeTooSkewed" | |
ERR_MFA_REQUIRED | 403 | "AccessDenied" | |
ENOENT | 404 | "NoSuchKey" | |
ERR_NO_SUCH_WEBSITE_CONFIGURATION | 404 | "NoSuchWebsiteConfiguration" | |
ERR_NOT_FOUND | 404 | "Not Found" | unused |
ERR_NO_SUCH_LC | 404 | "NoSuchLifecycleConfiguration" | |
ERR_NO_SUCH_BUCKET_POLICY | 404 | "NoSuchBucketPolicy" | |
ERR_NO_SUCH_USER | 404 | "NoSuchUser" | RGW Admin |
ERR_NO_ROLE_FOUND | 404 | "NoSuchEntity" | |
ERR_NO_CORS_FOUND | 404 | "NoSuchCORSConfiguration" | |
ERR_NO_SUCH_SUBUSER | 404 | "NoSuchSubUser" | |
ERR_NO_SUCH_CORS_CONFIGURATION | 404 | "NoSuchCORSConfiguration" | |
ERR_NO_SUCH_OBJECT_LOCK_CONFIGURATION | 404 | "ObjectLockConfigurationNotFoundError" | |
ERR_METHOD_NOT_ALLOWED | 405 | "MethodNotAllowed" | |
ERR_USER_EXIST | 409 | "UserAlreadyExists" | |
ERR_EMAIL_EXIST | 409 | "EmailExists" | |
ERR_KEY_EXIST | 409 | "KeyExists" | |
ERR_TAG_CONFLICT | 409 | "OperationAborted" | |
ERR_INVALID_SECRET_KEY | 400 | "InvalidSecretKey" | |
ERR_INVALID_KEY_TYPE | 400 | "InvalidKeyType" | |
ERR_INVALID_CAP | 400 | "InvalidCapability" | |
ERR_INVALID_TENANT_NAME | 400 | "InvalidTenantName" | |
ERR_PRECONDITION_FAILED | 412 | "PreconditionFailed" | |
ERR_LOCKED | 423 | "Locked" | |
ERR_ZERO_IN_URL | 400 | "InvalidRequest" | |
ERR_NO_SUCH_TAG_SET | 404 | "NoSuchTagSet" | |
ERR_NO_SUCH_BUCKET_ENCRYPTION_CONFIGURATION | 404 | "ServerSideEncryptionConfigurationNotFoundError" | |
ERR_LIMIT_EXCEEDED | 400 | "LimitExceeded" | RGW put ACLs |
ERR_NO_SUCH_ENTITY | 404 | "NoSuchEntity" | |
ERR_POSITION_NOT_EQUAL_TO_LENGTH | 409 | "PositionNotEqualToLength" | append object processor |
ERR_OBJECT_NOT_APPENDABLE | 409 | "ObjectNotAppendable" | append object processor |
ERR_UNPROCESSABLE_ENTITY | 422 | "UnprocessableEntity" | etag in put obj |
ERR_INVALID_OBJECT_STATE | 403 | "InvalidObjectState" | (tiering) |
ERR_INVALID_BUCKET_STATE | 409 | "InvalidBucketState" | |
Storage related… | |||
ERR_INVALID_PART | 400 | "InvalidPart" | (multipart) |
ERR_INVALID_PART_ORDER | 400 | "InvalidPartOrder" | (multipart) |
ERR_NO_SUCH_UPLOAD | 404 | "NoSuchUpload" | (multipart) |
ERR_TOO_LARGE | 400 | "EntityTooLarge" | > allowed obj size, S3 Limits, rgw config |
ERR_TOO_SMALL | 400 | "EntityTooSmall" | < min obj size |
ERR_TOO_MANY_BUCKETS | 400 | "TooManyBuckets" | Quota |
ERR_NO_SUCH_BUCKET | 404 | "NoSuchBucket" | |
EEXIST | 409 | "BucketAlreadyExists" | |
ENOTEMPTY | 409 | "BucketNotEmpty" | |
ERANGE | 416 | "InvalidRange" | e.g. offset >= object size |
ERR_QUOTA_EXCEEDED | 403 | "QuotaExceeded" | ENOSPC, Quota |
Misc, Internal, Rate Limit.. | |||
ETIMEDOUT | 408 | "RequestTimeout" | |
ERR_REQUEST_TIMEOUT | 400 | "RequestTimeout" | |
ERR_INTERNAL_ERROR | 500 | "InternalError" | |
ERR_NOT_IMPLEMENTED | 501 | "NotImplemented" | |
ERR_SERVICE_UNAVAILABLE | 503 | "ServiceUnavailable" | |
ERR_RATE_LIMITED | 503 | "SlowDown" | |
fallback | 500 | "UnknownError" | |
Client Side
Permanent errors have to be handled according to their meaning. Boto3, for example, raises a matching exception.
Temporary errors are a bit more interesting. To quote the S3 docs:
Internal errors are errors that occur within the Amazon S3 environment.
Requests that receive an InternalError response might not have processed. For example, if a PUT request returns InternalError, a subsequent GET might retrieve the old value or the updated value.
If Amazon S3 returns an InternalError response, retry the request.
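On the client, the first step is deciding which errors count as temporary. The sketch below works on plain dicts shaped like the response attribute of botocore's ClientError, so it runs without boto3 installed. The set of retryable codes is an assumption taken from the "Misc, Internal, Rate Limit" rows of the table above. It is not boto3's own list.

```python
# Codes treated as temporary in this sketch (assumption, see lead-in).
RETRYABLE_CODES = {"InternalError", "ServiceUnavailable", "SlowDown", "RequestTimeout"}


def is_retryable(response: dict) -> bool:
    """Return True if the error code suggests a retry may succeed.

    The decision uses the error code rather than the HTTP status: a 501
    NotImplemented is a 5xx response too, but retrying it will not help.
    """
    code = response.get("Error", {}).get("Code", "")
    return code in RETRYABLE_CODES


# Example responses; all values are made up.
internal = {"Error": {"Code": "InternalError", "Message": "try again"},
            "ResponseMetadata": {"HTTPStatusCode": 500}}
not_implemented = {"Error": {"Code": "NotImplemented", "Message": "unsupported"},
                   "ResponseMetadata": {"HTTPStatusCode": 501}}
missing_key = {"Error": {"Code": "NoSuchKey", "Message": "no such key"},
               "ResponseMetadata": {"HTTPStatusCode": 404}}

if __name__ == "__main__":
    for resp in (internal, not_implemented, missing_key):
        print(resp["Error"]["Code"], is_retryable(resp))
```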
Boto3 Retry Behavior
There is a standard and a legacy mode. (As far as I can tell, the difference is mostly in the retry count and the list of retryable errors.)
Both modes retry errors from a list of retryable errors. The errors are listed in the boto3 retries documentation and consist of transient and throttling / rate limit errors.
Summary: Retry anything that looks like a transient server side error or rate limiting.
Both modes use exponential backoff.
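For intuition, here is a generic exponential backoff loop with random jitter. It illustrates the technique only and is not boto3's implementation. TransientError, retry_with_backoff and all default values are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying."""


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call `call()` until it succeeds or max_attempts is reached.

    The upper bound of the wait doubles after every failed attempt
    (base_delay, 2*base_delay, 4*base_delay, ...) and is capped at max_delay.
    The actual wait is drawn uniformly between 0 and that bound.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            bound = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, bound))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientError("temporary failure")
        return "ok"

    print(retry_with_backoff(flaky), "after", state["calls"], "calls")
```

The sleep parameter exists so the waits can be observed or skipped in tests. Anything that is not a TransientError propagates immediately, which matches the idea that permanent errors should not be retried.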