RGW Error Handling
Write-up of a recent RGW error handling deep dive.
From the perspective of a storage backend, like the s3gw filesystem+SQLite backend:
How do we convert storage errors to client errors in a meaningful way?
S3 Errors - What do they look like?
S3 uses HTTP return codes first and adds detail via the XML error response (see the S3 docs and the list of S3 error codes):
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>CamelCased error code from the list of error codes</Code>
<Message>Message for human consumption</Message>
<Resource>Bucket or object that caused the error</Resource>
<RequestId>ID of request associated with the error</RequestId>
</Error>
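On the client side, the four fields above can be pulled out of a response body with a standard XML parser. The following is a minimal sketch, not taken from any S3 client library: the `S3Error` type, the `parse_s3_error` helper, and the example body (including its request ID and resource) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class S3Error:
    code: Optional[str]
    message: Optional[str]
    resource: Optional[str]
    request_id: Optional[str]


def parse_s3_error(body: bytes) -> S3Error:
    """Parse an S3 XML error body into its four documented fields."""
    root = ET.fromstring(body)
    if root.tag != "Error":
        raise ValueError(f"not an S3 error document: <{root.tag}>")

    def text(tag: str) -> Optional[str]:
        # Missing elements yield None instead of raising.
        value = root.findtext(tag)
        return value.strip() if value is not None else None

    return S3Error(text("Code"), text("Message"), text("Resource"), text("RequestId"))


# Made-up example body; the request ID and resource are placeholders.
EXAMPLE_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Resource>/mybucket/missing.txt</Resource>
  <RequestId>tx-example-0001</RequestId>
</Error>"""

error = parse_s3_error(EXAMPLE_BODY)
print(error.code, error.request_id)
```

The HTTP status code is not part of the body, so a real client would keep it alongside the parsed fields.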
Error Handling in RGW
RGW uses negative return values to signal errors. C++ exceptions are not expected and generally bubble up to the top, crashing the application.
Error codes are a mixture of custom (ERR_...) and errno.h (E...) errors.
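The errno.h errors usually only make sense if you look at them from the client's perspective.
This is important! RGW uses EINVAL, EPERM, EACCES, ENOENT, and EEXIST to describe the client's request.
Passing filesystem errors further up unchanged will result in weird behavior: an ENOENT from a missing backend file, for example, reaches the client as a 404 "NoSuchKey".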
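One way to avoid leaking filesystem errors is to translate them at the backend boundary. This sketch uses Python only to illustrate the idea, since RGW itself is C++. The helper `to_rgw_error` and its `client_meaning` parameter are hypothetical, and the numeric value of `ERR_INTERNAL_ERROR` is a placeholder, not copied from rgw/rgw_common.h.

```python
import errno

# Placeholder value for this sketch; the real constant is defined in rgw/rgw_common.h.
ERR_INTERNAL_ERROR = 2200


def to_rgw_error(exc: OSError, client_meaning: frozenset = frozenset()) -> int:
    """Return a negative, RGW-style error number for a backend failure.

    client_meaning lists the errno values that, in the calling context,
    really describe the client's request. Everything else is treated as
    an internal failure of the backend.
    """
    if exc.errno in client_meaning:
        return -exc.errno
    return -ERR_INTERNAL_ERROR


# Reading object data: a missing file means the key does not exist.
missing_object = FileNotFoundError(errno.ENOENT, "No such file or directory")
print(to_rgw_error(missing_object, frozenset({errno.ENOENT})))

# Opening the backend's own metadata store: the same errno is an internal problem.
print(to_rgw_error(missing_object))

# A permission problem on the backend's files is not the client's "AccessDenied".
print(to_rgw_error(PermissionError(errno.EACCES, "Permission denied")))
```

The design choice is that the caller, which knows what the operation means to the client, decides which errno values may pass through.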
Standard Errors?
For completeness: HTTP status codes are nicely documented, Amazon has a list of S3 error codes in their docs, and RGW defines more on top, as do other vendors.
RGW Error Code Implementation
RGW supports multiple protocols, so there are separate mappings for the S3, Swift, STS, and IAM protocol frontends.
There is a global mapping, typed rgw_http_errors, from error number (err_no) to an HTTP code and an error code string.
RGW operations are descendants of RGWOp. I counted 119 while writing this. Examples: GetObj, ListBuckets, etc.
RGW Operations have an error state as part of their req_state.
The request processing converts this to HTTP return codes and an S3 style error response.
Responses may contain a message field, if req_state.err.message is set.
The SAL layer does not have access to the error state. SAL backends may only return error numbers, no additional message.
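The lookup and conversion described above can be modeled in a few lines. This is a simplified Python sketch, not RGW's C++ code: it only includes the errno.h rows of the table below plus the fallback, and the names `HTTP_ERRORS`, `lookup`, and `render_error` are invented for illustration.

```python
import errno
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# err_no -> (http_ret, err_code); only the errno.h rows of the table are modeled.
HTTP_ERRORS = {
    0: (200, ""),
    errno.EINVAL: (400, "InvalidArgument"),
    errno.EACCES: (403, "AccessDenied"),
    errno.EPERM: (403, "AccessDenied"),
    errno.ENOENT: (404, "NoSuchKey"),
    errno.ETIMEDOUT: (408, "RequestTimeout"),
    errno.EEXIST: (409, "BucketAlreadyExists"),
    errno.ENOTEMPTY: (409, "BucketNotEmpty"),
    errno.ERANGE: (416, "InvalidRange"),
}
FALLBACK = (500, "UnknownError")


def lookup(ret: int) -> Tuple[int, str]:
    """Map a (usually negative) return value to an HTTP code and error code."""
    return HTTP_ERRORS.get(abs(ret), FALLBACK)


def render_error(ret: int, resource: str, request_id: str,
                 message: Optional[str] = None) -> Tuple[int, str]:
    """Build the HTTP status and an S3-style XML error body."""
    http_ret, err_code = lookup(ret)
    root = ET.Element("Error")
    ET.SubElement(root, "Code").text = err_code
    if message:
        # Only present if the operation set a message; a backend that
        # returns a bare error number cannot provide one.
        ET.SubElement(root, "Message").text = message
    ET.SubElement(root, "Resource").text = resource
    ET.SubElement(root, "RequestId").text = request_id
    return http_ret, ET.tostring(root, encoding="unicode")


status, body = render_error(-errno.ENOENT, "/mybucket/missing.txt", "tx-example-0001")
print(status, body)
```

An error number that is not in the table, such as a raw filesystem error the backend did not translate, ends up as a 500 "UnknownError".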
Source links:
- rgw/rgw_common.cc contains the error number mappings, and the conversion logic
- rgw/rgw_common.h contains the ERR_ constant definitions
RGW Error Code Table
Based on the lookup table in rgw/rgw_common.cc, but reordered a bit.
| RGW Error [err_no] | HTTP error code [http_ret] | Error Code [err_code] | Notes |
|---|---|---|---|
| 200… | | | |
| 0 | 200 | | |
| STATUS_CREATED | 201 | "Created" | |
| STATUS_ACCEPTED | 202 | "Accepted" | |
| STATUS_NO_CONTENT | 204 | "NoContent" | |
| STATUS_PARTIAL_CONTENT | 206 | | |
| 300… | | | |
| ERR_PERMANENT_REDIRECT | 301 | "PermanentRedirect" | |
| ERR_WEBSITE_REDIRECT | 301 | "WebsiteRedirect" | |
| STATUS_REDIRECT | 303 | | |
| ERR_NOT_MODIFIED | 304 | "NotModified" | |
| 400… | | | |
| EINVAL | 400 | "InvalidArgument" | |
| ERR_INVALID_REQUEST | 400 | "InvalidRequest" | |
| ERR_INVALID_DIGEST | 400 | "InvalidDigest" | |
| ERR_BAD_DIGEST | 400 | "BadDigest" | |
| ERR_INVALID_LOCATION_CONSTRAINT | 400 | "InvalidLocationConstraint" | |
| ERR_ZONEGROUP_DEFAULT_PLACEMENT_MISCONFIGURATION | 400 | "ZonegroupDefaultPlacementMisconfiguration" | |
| ERR_INVALID_BUCKET_NAME | 400 | "InvalidBucketName" | |
| ERR_INVALID_OBJECT_NAME | 400 | "InvalidObjectName" | non standard? |
| ERR_UNRESOLVABLE_EMAIL | 400 | "UnresolvableGrantByEmailAddress" | |
| ERR_MALFORMED_XML | 400 | "MalformedXML" | |
| ERR_AMZ_CONTENT_SHA256_MISMATCH | 400 | "XAmzContentSHA256Mismatch" | |
| ERR_MALFORMED_DOC | 400 | "MalformedPolicyDocument" | |
| ERR_INVALID_TAG | 400 | "InvalidTag" | |
| ERR_MALFORMED_ACL_ERROR | 400 | "MalformedACLError" | |
| ERR_INVALID_CORS_RULES_ERROR | 400 | "InvalidRequest" | |
| ERR_INVALID_WEBSITE_ROUTING_RULES_ERROR | 400 | "InvalidRequest" | |
| ERR_INVALID_ENCRYPTION_ALGORITHM | 400 | "InvalidEncryptionAlgorithmError" | |
| ERR_INVALID_RETENTION_PERIOD | 400 | "InvalidRetentionPeriod" | |
| EACCES | 403 | "AccessDenied" | |
| EPERM | 403 | "AccessDenied" | |
| ERR_LENGTH_REQUIRED | 411 | "MissingContentLength" | HTTP header Content-Length |
| ERR_SIGNATURE_NO_MATCH | 403 | "SignatureDoesNotMatch" | |
| ERR_INVALID_ACCESS_KEY | 403 | "InvalidAccessKeyId" | |
| ERR_USER_SUSPENDED | 403 | "UserSuspended" | |
| ERR_REQUEST_TIME_SKEWED | 403 | "RequestTimeTooSkewed" | |
| ERR_MFA_REQUIRED | 403 | "AccessDenied" | |
| ENOENT | 404 | "NoSuchKey" | |
| ERR_NO_SUCH_WEBSITE_CONFIGURATION | 404 | "NoSuchWebsiteConfiguration" | |
| ERR_NOT_FOUND | 404 | "Not Found" | unused |
| ERR_NO_SUCH_LC | 404 | "NoSuchLifecycleConfiguration" | |
| ERR_NO_SUCH_BUCKET_POLICY | 404 | "NoSuchBucketPolicy" | |
| ERR_NO_SUCH_USER | 404 | "NoSuchUser" | RGW Admin |
| ERR_NO_ROLE_FOUND | 404 | "NoSuchEntity" | |
| ERR_NO_CORS_FOUND | 404 | "NoSuchCORSConfiguration" | |
| ERR_NO_SUCH_SUBUSER | 404 | "NoSuchSubUser" | |
| ERR_NO_SUCH_CORS_CONFIGURATION | 404 | "NoSuchCORSConfiguration" | |
| ERR_NO_SUCH_OBJECT_LOCK_CONFIGURATION | 404 | "ObjectLockConfigurationNotFoundError" | |
| ERR_METHOD_NOT_ALLOWED | 405 | "MethodNotAllowed" | |
| ERR_USER_EXIST | 409 | "UserAlreadyExists" | |
| ERR_EMAIL_EXIST | 409 | "EmailExists" | |
| ERR_KEY_EXIST | 409 | "KeyExists" | |
| ERR_TAG_CONFLICT | 409 | "OperationAborted" | |
| ERR_INVALID_SECRET_KEY | 400 | "InvalidSecretKey" | |
| ERR_INVALID_KEY_TYPE | 400 | "InvalidKeyType" | |
| ERR_INVALID_CAP | 400 | "InvalidCapability" | |
| ERR_INVALID_TENANT_NAME | 400 | "InvalidTenantName" | |
| ERR_PRECONDITION_FAILED | 412 | "PreconditionFailed" | |
| ERR_LOCKED | 423 | "Locked" | |
| ERR_ZERO_IN_URL | 400 | "InvalidRequest" | |
| ERR_NO_SUCH_TAG_SET | 404 | "NoSuchTagSet" | |
| ERR_NO_SUCH_BUCKET_ENCRYPTION_CONFIGURATION | 404 | "ServerSideEncryptionConfigurationNotFoundError" | |
| ERR_LIMIT_EXCEEDED | 400 | "LimitExceeded" | RGW put ACLs |
| ERR_NO_SUCH_ENTITY | 404 | "NoSuchEntity" | |
| ERR_POSITION_NOT_EQUAL_TO_LENGTH | 409 | "PositionNotEqualToLength" | append object processor |
| ERR_OBJECT_NOT_APPENDABLE | 409 | "ObjectNotAppendable" | append object processor |
| ERR_UNPROCESSABLE_ENTITY | 422 | "UnprocessableEntity" | etag in put obj |
| ERR_INVALID_OBJECT_STATE | 403 | "InvalidObjectState" | (tiering) |
| ERR_INVALID_BUCKET_STATE | 409 | "InvalidBucketState" | |
| Storage related… | | | |
| ERR_INVALID_PART | 400 | "InvalidPart" | (multipart) |
| ERR_INVALID_PART_ORDER | 400 | "InvalidPartOrder" | (multipart) |
| ERR_NO_SUCH_UPLOAD | 404 | "NoSuchUpload" | (multipart) |
| ERR_TOO_LARGE | 400 | "EntityTooLarge" | > allowed obj size, S3 Limits, rgw config |
| ERR_TOO_SMALL | 400 | "EntityTooSmall" | < min obj size |
| ERR_TOO_MANY_BUCKETS | 400 | "TooManyBuckets" | Quota |
| ERR_NO_SUCH_BUCKET | 404 | "NoSuchBucket" | |
| EEXIST | 409 | "BucketAlreadyExists" | |
| ENOTEMPTY | 409 | "BucketNotEmpty" | |
| ERANGE | 416 | "InvalidRange" | e.g. offset >= object size |
| ERR_QUOTA_EXCEEDED | 403 | "QuotaExceeded" | ENOSPC, Quota |
| Misc, Internal, Rate Limit… | | | |
| ETIMEDOUT | 408 | "RequestTimeout" | |
| ERR_REQUEST_TIMEOUT | 400 | "RequestTimeout" | |
| ERR_INTERNAL_ERROR | 500 | "InternalError" | |
| ERR_NOT_IMPLEMENTED | 501 | "NotImplemented" | |
| ERR_SERVICE_UNAVAILABLE | 503 | "ServiceUnavailable" | |
| ERR_RATE_LIMITED | 503 | "SlowDown" | |
| fallback | 500 | "UnknownError" | |
Client Side
Permanent errors have to be handled according to their meaning. Boto3, for example, raises a matching exception.
Temporary errors are a bit more interesting. To quote the S3 docs:
Internal errors are errors that occur within the Amazon S3 environment.
Requests that receive an InternalError response might not have processed. For example, if a PUT request returns InternalError, a subsequent GET might retrieve the old value or the updated value.
If Amazon S3 returns an InternalError response, retry the request.
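A client that follows this advice needs to tell temporary errors from permanent ones and retry only the former. The sketch below is one possible shape for that, under stated assumptions: `S3RequestError` is a hypothetical exception type, the set of retryable codes is picked from the error table above rather than from any client library, and the delays use exponential backoff with random jitter.

```python
import random
import time

# Assumed set of retryable codes, picked from the error table above.
RETRYABLE_CODES = frozenset(
    {"InternalError", "ServiceUnavailable", "SlowDown", "RequestTimeout"}
)


class S3RequestError(Exception):
    """Hypothetical exception carrying the <Code> of an S3 error response."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def call_with_retries(request, max_attempts=4, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call request() and retry temporary errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except S3RequestError as err:
            # Permanent errors and the last failed attempt go to the caller.
            if err.code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            # The delay ceiling doubles per attempt; jitter spreads out clients.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Because a failed request might still have been processed, blind retries like this are only appropriate for requests that are safe to repeat.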
Boto3 Retry Behavior
There is a standard and a legacy mode. As far as I can tell, they differ mostly in the retry count and the list of retryable errors.
Both modes retry errors from a list of retryable errors. The errors are listed in the boto3 retries doc and consist of transient and throttling / rate limit errors.
Summary: Retry anything that looks like a transient server side error or rate limiting.
Both modes use exponential backoff.
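The retry mode is selected through the client configuration. The following sketch shows how that might look when pointing boto3 at an RGW endpoint. The helper names `retry_settings` and `make_s3_client` are invented for this example and the endpoint URL is whatever your deployment uses; `Config(retries=...)` with the `mode` and `total_max_attempts` keys is the botocore configuration interface.

```python
VALID_MODES = ("legacy", "standard")


def retry_settings(mode: str = "standard", total_max_attempts: int = 5) -> dict:
    """Build the retries dictionary accepted by botocore's Config."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported retry mode: {mode!r}")
    if total_max_attempts < 1:
        raise ValueError("total_max_attempts must be at least 1")
    # total_max_attempts counts the initial request plus all retries.
    return {"mode": mode, "total_max_attempts": total_max_attempts}


def make_s3_client(endpoint_url: str, mode: str = "standard",
                   total_max_attempts: int = 5):
    """Create an S3 client with the chosen retry behavior."""
    # Imported here so retry_settings() stays usable without boto3 installed.
    import boto3
    from botocore.config import Config

    config = Config(retries=retry_settings(mode, total_max_attempts))
    return boto3.client("s3", endpoint_url=endpoint_url, config=config)
```

With this in place, errors outside the retryable list still surface immediately as exceptions, while transient and rate limit errors are retried by the library before the caller sees them.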