RGW Error Handling
Summary
Write up of a recent RGW error handling deep dive.
It is written from the perspective of a storage backend, such as the s3gw filesystem+SQLite backend.
How can storage errors be converted into meaningful client errors?
S3 Errors - What do they look like?
S3 uses HTTP return codes first and adds detail via the XML error response.
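As a rough illustration of the XML part, the sketch below builds and parses an S3-style error body using only the Python standard library. The element names (Error, Code, Message, Resource, RequestId) follow the usual S3 error document; the helper names `build_error_xml` and `parse_error_xml` are made up for this example and are not part of RGW or any SDK.

```python
import xml.etree.ElementTree as ET


def build_error_xml(code, message, resource="", request_id=""):
    """Serialize an S3-style <Error> document (illustrative helper)."""
    root = ET.Element("Error")
    fields = (
        ("Code", code),
        ("Message", message),
        ("Resource", resource),
        ("RequestId", request_id),
    )
    for tag, text in fields:
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


def parse_error_xml(body):
    """Return the child elements of <Error> as a flat dict of strings."""
    root = ET.fromstring(body)
    return {child.tag: (child.text or "") for child in root}


body = build_error_xml("NoSuchKey", "The specified key does not exist.",
                       resource="/bucket/key", request_id="req-1")
details = parse_error_xml(body)
```

The HTTP status (404 for NoSuchKey) travels in the response status line, while the `Code` and `Message` fields carry the detail.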
Error Handling in RGW
RGW uses negative return values to signal errors. C++ exceptions are not expected and generally bubble up to the top, crashing the application.
Error codes are a mixture of custom (ERR_...) and errno.h (E...) errors.
The errno.h errors usually only make sense if you look at them from a client perspective. This is important: RGW uses EINVAL, EPERM, EACCES, ENOENT and EEXIST. Passing filesystem errors further up will result in weird behavior.
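To make the last point concrete, here is a minimal sketch, in Python rather than RGW's C++, of a backend read path that only lets a client-meaningful error number through. The function `read_object_data` is hypothetical, and the choice of EIO for "anything else" is an assumption; according to the table in this document it would end up in the 500 fallback instead of being reported as, say, AccessDenied.

```python
import errno
from typing import Tuple


def read_object_data(path: str) -> Tuple[int, bytes]:
    """Return (ret, data) using the negative-errno convention.

    Only errors that mean something to the S3 client are passed on
    unchanged; other filesystem errors are collapsed into -EIO.
    """
    try:
        with open(path, "rb") as f:
            return 0, f.read()
    except FileNotFoundError:
        # The object really does not exist: the client should see NoSuchKey.
        return -errno.ENOENT, b""
    except OSError:
        # E.g. EACCES on the backing file is a server-side problem,
        # not an S3 permission error, so do not leak it to the client.
        return -errno.EIO, b""
```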
Standard Errors?
For completeness, HTTP codes are nicely documented here:
Amazon has a list of error codes in their docs:
RGW defines more on top, as do other vendors:
RGW Error Code Implementation
There is a global mapping, typed rgw_http_errors, from error number (err_no) to an HTTP code and an error code string. RGW supports multiple protocols, so there are separate mappings for the S3, SWIFT, STS and IAM protocol frontends.
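The idea of such a mapping can be sketched in a few lines of Python. This is not RGW's implementation (which is C++); it only covers the errno.h entries from the table in this document, and the names `HttpError`, `ERRNO_TO_HTTP` and `lookup` are invented for the example. Taking the absolute value of a negative return code is an assumption about how the lookup key is derived.

```python
import errno
from typing import NamedTuple


class HttpError(NamedTuple):
    http_ret: int
    err_code: str


# Subset of the table below: errno.h values only.
ERRNO_TO_HTTP = {
    errno.EINVAL: HttpError(400, "InvalidArgument"),
    errno.EACCES: HttpError(403, "AccessDenied"),
    errno.EPERM: HttpError(403, "AccessDenied"),
    errno.ENOENT: HttpError(404, "NoSuchKey"),
    errno.ETIMEDOUT: HttpError(408, "RequestTimeout"),
    errno.EEXIST: HttpError(409, "BucketAlreadyExists"),
    errno.ENOTEMPTY: HttpError(409, "BucketNotEmpty"),
    errno.ERANGE: HttpError(416, "InvalidRange"),
}

# Anything unmapped ends up as a generic server error.
FALLBACK = HttpError(500, "UnknownError")


def lookup(ret: int) -> HttpError:
    """Translate a (usually negative) return value into HTTP terms."""
    if ret == 0:
        return HttpError(200, "")
    return ERRNO_TO_HTTP.get(abs(ret), FALLBACK)
```

For example, `lookup(-errno.ENOENT)` yields 404 with "NoSuchKey", while an unmapped value such as `-errno.EIO` yields the 500 fallback.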
RGW operations (descendants of RGWOp; I counted 119 while writing this, for example GetObj and ListBuckets) have an error state as part of their req_state.
The request processing converts this to HTTP return codes and an S3 style error response.
Responses may contain a message field if req_state.err.message is set.
The SAL does not have access to the error state. SAL backends may only return error numbers, without an additional message.
Source links:
- rgw/rgw_common.cc contains the error number mappings, and the conversion logic
- rgw/rgw_common.h contains the ERR_ constant definitions
RGW Error Code Table
Based on the lookup table in rgw/rgw_common.cc, but reordered a bit.
RGW Error [err_no] | HTTP error code [http_ret] | Error Code [err_code] | Notes |
---|---|---|---|
200… | |||
0 | 200 | ||
STATUS_CREATED | 201 | "Created" | |
STATUS_ACCEPTED | 202 | "Accepted" | |
STATUS_NO_CONTENT | 204 | "NoContent" | |
STATUS_PARTIAL_CONTENT | 206 | ||
300… | |||
ERR_PERMANENT_REDIRECT | 301 | "PermanentRedirect" | |
ERR_WEBSITE_REDIRECT | 301 | "WebsiteRedirect" | |
STATUS_REDIRECT | 303 | ||
ERR_NOT_MODIFIED | 304 | "NotModified" | |
400… | |||
EINVAL | 400 | "InvalidArgument" | |
ERR_INVALID_REQUEST | 400 | "InvalidRequest" | |
ERR_INVALID_DIGEST | 400 | "InvalidDigest" | |
ERR_BAD_DIGEST | 400 | "BadDigest" | |
ERR_INVALID_LOCATION_CONSTRAINT | 400 | "InvalidLocationConstraint" | |
ERR_ZONEGROUP_DEFAULT_PLACEMENT_MISCONFIGURATION | 400 | "ZonegroupDefaultPlacementMisconfiguration" | |
ERR_INVALID_BUCKET_NAME | 400 | "InvalidBucketName" | |
ERR_INVALID_OBJECT_NAME | 400 | "InvalidObjectName" | non standard? |
ERR_UNRESOLVABLE_EMAIL | 400 | "UnresolvableGrantByEmailAddress" | |
ERR_MALFORMED_XML | 400 | "MalformedXML" | |
ERR_AMZ_CONTENT_SHA256_MISMATCH | 400 | "XAmzContentSHA256Mismatch" | |
ERR_MALFORMED_DOC | 400 | "MalformedPolicyDocument" | |
ERR_INVALID_TAG | 400 | "InvalidTag" | |
ERR_MALFORMED_ACL_ERROR | 400 | "MalformedACLError" | |
ERR_INVALID_CORS_RULES_ERROR | 400 | "InvalidRequest" | |
ERR_INVALID_WEBSITE_ROUTING_RULES_ERROR | 400 | "InvalidRequest" | |
ERR_INVALID_ENCRYPTION_ALGORITHM | 400 | "InvalidEncryptionAlgorithmError" | |
ERR_INVALID_RETENTION_PERIOD | 400 | "InvalidRetentionPeriod" | |
EACCES | 403 | "AccessDenied" | |
EPERM | 403 | "AccessDenied" | |
ERR_LENGTH_REQUIRED | 411 | "MissingContentLength" | HTTP header Content-Length |
ERR_SIGNATURE_NO_MATCH | 403 | "SignatureDoesNotMatch" | |
ERR_INVALID_ACCESS_KEY | 403 | "InvalidAccessKeyId" | |
ERR_USER_SUSPENDED | 403 | "UserSuspended" | |
ERR_REQUEST_TIME_SKEWED | 403 | "RequestTimeTooSkewed" | |
ERR_MFA_REQUIRED | 403 | "AccessDenied" | |
ENOENT | 404 | "NoSuchKey" | |
ERR_NO_SUCH_WEBSITE_CONFIGURATION | 404 | "NoSuchWebsiteConfiguration" | |
ERR_NOT_FOUND | 404 | "Not Found" | unused |
ERR_NO_SUCH_LC | 404 | "NoSuchLifecycleConfiguration" | |
ERR_NO_SUCH_BUCKET_POLICY | 404 | "NoSuchBucketPolicy" | |
ERR_NO_SUCH_USER | 404 | "NoSuchUser" | RGW Admin |
ERR_NO_ROLE_FOUND | 404 | "NoSuchEntity" | |
ERR_NO_CORS_FOUND | 404 | "NoSuchCORSConfiguration" | |
ERR_NO_SUCH_SUBUSER | 404 | "NoSuchSubUser" | |
ERR_NO_SUCH_CORS_CONFIGURATION | 404 | "NoSuchCORSConfiguration" | |
ERR_NO_SUCH_OBJECT_LOCK_CONFIGURATION | 404 | "ObjectLockConfigurationNotFoundError" | |
ERR_METHOD_NOT_ALLOWED | 405 | "MethodNotAllowed" | |
ERR_USER_EXIST | 409 | "UserAlreadyExists" | |
ERR_EMAIL_EXIST | 409 | "EmailExists" | |
ERR_KEY_EXIST | 409 | "KeyExists" | |
ERR_TAG_CONFLICT | 409 | "OperationAborted" | |
ERR_INVALID_SECRET_KEY | 400 | "InvalidSecretKey" | |
ERR_INVALID_KEY_TYPE | 400 | "InvalidKeyType" | |
ERR_INVALID_CAP | 400 | "InvalidCapability" | |
ERR_INVALID_TENANT_NAME | 400 | "InvalidTenantName" | |
ERR_PRECONDITION_FAILED | 412 | "PreconditionFailed" | |
ERR_LOCKED | 423 | "Locked" | |
ERR_ZERO_IN_URL | 400 | "InvalidRequest" | |
ERR_NO_SUCH_TAG_SET | 404 | "NoSuchTagSet" | |
ERR_NO_SUCH_BUCKET_ENCRYPTION_CONFIGURATION | 404 | "ServerSideEncryptionConfigurationNotFoundError" | |
ERR_LIMIT_EXCEEDED | 400 | "LimitExceeded" | RGW put ACLs |
ERR_NO_SUCH_ENTITY | 404 | "NoSuchEntity" | |
ERR_POSITION_NOT_EQUAL_TO_LENGTH | 409 | "PositionNotEqualToLength" | append object processor |
ERR_OBJECT_NOT_APPENDABLE | 409 | "ObjectNotAppendable" | append object processor |
ERR_UNPROCESSABLE_ENTITY | 422 | "UnprocessableEntity" | etag in put obj |
ERR_INVALID_OBJECT_STATE | 403 | "InvalidObjectState" | (tiering) |
ERR_INVALID_BUCKET_STATE | 409 | "InvalidBucketState" | |
Storage related… | |||
ERR_INVALID_PART | 400 | "InvalidPart" | (multipart) |
ERR_INVALID_PART_ORDER | 400 | "InvalidPartOrder" | (multipart) |
ERR_NO_SUCH_UPLOAD | 404 | "NoSuchUpload" | (multipart) |
ERR_TOO_LARGE | 400 | "EntityTooLarge" | > allowed obj size, S3 Limits, rgw config |
ERR_TOO_SMALL | 400 | "EntityTooSmall" | < min obj size |
ERR_TOO_MANY_BUCKETS | 400 | "TooManyBuckets" | Quota |
ERR_NO_SUCH_BUCKET | 404 | "NoSuchBucket" | |
EEXIST | 409 | "BucketAlreadyExists" | |
ENOTEMPTY | 409 | "BucketNotEmpty" | |
ERANGE | 416 | "InvalidRange" | e.g. offset >= object size |
ERR_QUOTA_EXCEEDED | 403 | "QuotaExceeded" | ENOSPC, Quota |
Misc, Internal, Rate Limit… | |||
ETIMEDOUT | 408 | "RequestTimeout" | |
ERR_REQUEST_TIMEOUT | 400 | "RequestTimeout" | |
ERR_INTERNAL_ERROR | 500 | "InternalError" | |
ERR_NOT_IMPLEMENTED | 501 | "NotImplemented" | |
ERR_SERVICE_UNAVAILABLE | 503 | "ServiceUnavailable" | |
ERR_RATE_LIMITED | 503 | "SlowDown" | |
fallback | 500 | "UnknownError" | |
Client Side
Permanent errors have to be handled according to their meaning. Boto3, for example, raises a matching exception.
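On the Python side, botocore's `ClientError` carries the parsed error in its `response` dict, with the error code under `Error` and the HTTP status under `ResponseMetadata`. The helper below is a small sketch that pulls out the three values discussed in this write-up; `describe_client_error` is a name made up here, and it works on the plain dict so it can be tried without a running S3 endpoint.

```python
from typing import Tuple


def describe_client_error(response: dict) -> Tuple[int, str, str]:
    """Extract (HTTP status, error code, message) from a ClientError.response dict."""
    error = response.get("Error", {})
    metadata = response.get("ResponseMetadata", {})
    return (
        metadata.get("HTTPStatusCode", 0),
        error.get("Code", "Unknown"),
        error.get("Message", ""),
    )


# Typical use with boto3 (not executed here, needs boto3 and an endpoint):
#
#   import boto3
#   from botocore.exceptions import ClientError
#
#   try:
#       boto3.client("s3").get_object(Bucket="bucket", Key="missing")
#   except ClientError as e:
#       status, code, message = describe_client_error(e.response)
```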
Temporary errors are a bit more interesting. To quote the S3 docs:
Boto3 Retry Behavior
There is a standard and a legacy mode. As far as I can tell, they differ mostly in the retry count and the list of retryable errors.
Both modes retry errors from a list of retryable errors. The errors are listed in the boto3 retries doc and consist of transient and throttling / rate limit errors.
Summary: Retry anything that looks like a transient server side error or rate limiting.
Both modes use exponential backoff.
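The retry-with-backoff pattern can be sketched as follows. This is not boto3's code: `S3Error`, `RETRYABLE_CODES` and `call_with_retries` are hypothetical, the set of codes is picked from the error table in this document rather than from boto3's list, and the delay parameters are arbitrary defaults. The delay is a random fraction of an exponentially growing cap (jitter), limited by `max_delay`.

```python
import random
import time

# Illustrative only: transient / rate-limit codes from the table above.
RETRYABLE_CODES = {"SlowDown", "ServiceUnavailable", "InternalError", "RequestTimeout"}


class S3Error(Exception):
    """Stand-in for a client exception that carries an S3 error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def call_with_retries(operation, max_attempts=3, base_delay=0.1, max_delay=20.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(); retry retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except S3Error as e:
            # Permanent errors and the last attempt are passed to the caller.
            if e.code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            delay = min(max_delay, rng() * base_delay * 2 ** (attempt - 1))
            sleep(delay)
```

With boto3 itself there is no need to hand-roll this; the mode and attempt count can be set through `botocore.config.Config(retries={"max_attempts": 10, "mode": "standard"})` passed to the client.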