Check if an S3 object exists without generating an error? #479

Open
wlandau opened this issue Jan 10, 2022 · 2 comments
Labels
question 🧐❓ Further information is requested

Comments

wlandau commented Jan 10, 2022

targets uses paws for S3 storage, and it needs to check whether an object exists before deciding how to proceed. Is it possible to efficiently check for an object's existence without generating an error when the object does not exist? The existing tryCatch(..., http_400 = ...) workaround seems okay, but http_400 could also match errors other than a missing object.

DyfanJones (Member) commented Jan 11, 2022

Hi @wlandau, I don't think this fully answers your question, but you could make the error check more specific. Instead of http_400 you could use http_404. This makes sure you don't accidentally mask 403 Forbidden errors (I believe access-permission failures fall into this category).

So for example:

# Send a HEAD request for the object; errors if the request fails.
aws_s3_head <- function(key, bucket, region = NULL, version = NULL) {
  if (!is.null(region)) {
    withr::local_envvar(.new = list(AWS_REGION = region))
  }
  args <- list(
    Key = key,
    Bucket = bucket
  )
  if (!is.null(version)) {
    args$VersionId <- version
  }
  do.call(what = paws::s3()$head_object, args = args)
}

# Return TRUE when the HEAD request succeeds; otherwise the error propagates.
aws_s3_head_true <- function(key, bucket, region = NULL, version = NULL) {
  aws_s3_head(
    key = key,
    bucket = bucket,
    region = region,
    version = version
  )
  TRUE
}

# Original workaround: the http_400 handler also catches the 403 shown below.
old_aws_s3_exists <- function(key, bucket, region = NULL, version = NULL) {
  tryCatch(
    aws_s3_head_true(
      key = key,
      bucket = bucket,
      region = region,
      version = version
    ),
    http_400 = function(condition) {
      FALSE
    }
  )
}

# Narrower check: only a 404 (object not found) is turned into FALSE.
new_aws_s3_exists <- function(key, bucket, region = NULL, version = NULL) {
  tryCatch(
    aws_s3_head_true(
      key = key,
      bucket = bucket,
      region = region,
      version = version
    ),
    http_404 = function(condition) {
      FALSE
    }
  )
}


# Case 1: S3 bucket that the IAM role does not have permission to access
bucket <- "made-up-bucket-1"
key <- "made-up"

old_aws_s3_exists(key, bucket)
#> [1] FALSE
new_aws_s3_exists(key, bucket)
#> Error: SerializationError (HTTP 403). failed to read from query HTTP response body

# Case 2: the S3 object does not exist
bucket <- "made-up-bucket-2"
key <- "made-up"

old_aws_s3_exists(key, bucket)
#> [1] FALSE
new_aws_s3_exists(key, bucket)
#> [1] FALSE

Created on 2022-01-11 by the reprex package (v2.0.1)

I hope this helps 😄

Reference: boto3.client.s3.head_object
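
For anyone who wants to see the handler difference without touching AWS, here is a minimal offline sketch. Everything in it is hypothetical: `fake_http_error()`, `fake_head()`, `exists_broad()` and `exists_narrow()` are made-up names, and the class layout (a status-specific class such as `http_404` ahead of a family-level `http_400`) is an assumption modeled on the reprex output above, not a statement about paws internals.

```r
# Hypothetical stand-in for a paws error. ASSUMPTION: the class vector holds
# a status-specific class (e.g. "http_404") ahead of a family-level class
# (e.g. "http_400"). This mimics the reprex behavior, not paws internals.
fake_http_error <- function(status) {
  family <- (status %/% 100) * 100
  structure(
    list(message = paste("HTTP", status), call = NULL),
    class = c(
      paste0("http_", status),
      paste0("http_", family),
      "error",
      "condition"
    )
  )
}

# Hypothetical stand-in for head_object: errors on any status >= 400.
fake_head <- function(status) {
  if (status >= 400) stop(fake_http_error(status))
  TRUE
}

# Broad handler: any 4xx error is reported as "does not exist".
exists_broad <- function(status) {
  tryCatch(fake_head(status), http_400 = function(condition) FALSE)
}

# Narrow handler: only a 404 is reported as "does not exist".
exists_narrow <- function(status) {
  tryCatch(fake_head(status), http_404 = function(condition) FALSE)
}

exists_broad(404)
#> [1] FALSE
exists_broad(403)
#> [1] FALSE
exists_narrow(404)
#> [1] FALSE
exists_narrow(200)
#> [1] TRUE

# With the narrow handler, the 403 is not swallowed; capture its message here
# only so the script keeps running.
tryCatch(exists_narrow(403), error = function(e) conditionMessage(e))
#> [1] "HTTP 403"
```

The point of the sketch is only that `tryCatch()` matches handler names against the condition's class vector, so the narrower class name leaves permission errors visible to the caller.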

davidkretch added the question 🧐❓ (Further information is requested) label Feb 27, 2022


tyner commented Mar 5, 2024

Alternatively, you could call s3fs::s3_file_exists(), which returns a logical value (or throws an error if the permissions prohibit access). It is also vectorized!
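
A rough sketch of what that could look like, assuming the s3fs package is installed, AWS credentials are configured, and that paths are passed as `s3://bucket/key` URIs. The helpers `s3_uri()` and `exists_many()` are hypothetical names used only for illustration, and the bucket and keys are the made-up ones from the reprex above.

```r
# Hypothetical helper: build "s3://bucket/key" URIs. paste0() is vectorized,
# so several keys can be passed at once.
s3_uri <- function(bucket, key) {
  paste0("s3://", bucket, "/", key)
}

# Hypothetical wrapper around s3fs::s3_file_exists(). ASSUMPTION: s3fs is
# installed and credentials are available when this function is called.
exists_many <- function(bucket, keys) {
  s3fs::s3_file_exists(s3_uri(bucket, keys))
}

s3_uri("made-up-bucket-2", c("made-up", "also-made-up"))
#> [1] "s3://made-up-bucket-2/made-up"      "s3://made-up-bucket-2/also-made-up"

# Not run here, since it needs network access and credentials:
# exists_many("made-up-bucket-2", c("made-up", "also-made-up"))
```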
