EKS/ECR envbuilder layer cache

I'm trying to set up devcontainer layer caching. I started from the aws-devcontainer starter template, and I have a repo in ECR which I have filled in as the "cache_repo" variable. But when I start the workspace, I see the following:
Failed to find cached image in repository "[aws-acct-id].dkr.ecr.us-west-2.amazonaws.com/envbuilder-cache". It will be rebuilt in the next apply. Error: failed to fetch the envbuilder binary from the builder image: check remote image: check remote image: GET https://[aws-acct-id].dkr.ecr.us-west-2.amazonaws.com/v2/custom-envbuilder/manifests/latest: unexpected status code 401 Unauthorized: Not Authorized
Since this comes from Terraform, it runs in the coder pod, which runs under the "coder" service account. I have a pod identity association that should give this service account access to ECR: full read access, plus write access to the envbuilder-cache repo.

My hypothesis was that the pod identity association is not sufficient to access ECR directly, only to retrieve credentials. So I adjusted the template to add a data "aws_ecr_authorization_token" and to use it to render a docker_config_base64 for the "envbuilder_cached_image". I can see with coder state pull that it is getting an authorization token, yet the 401 error persists. Is there anything I should be checking?
7 Replies
Codercord, 3w ago
<#1357846227366056027>
Category: Help needed
Product: Coder (v2)
Platform: Linux
Logs: Please post any relevant logs/error messages.
David (OP), 3w ago
This is the relevant portion of the template
# Get the ECR authorization token
data "aws_ecr_authorization_token" "token" {
  count = var.cache_repo == "" ? 0 : data.coder_workspace.me.start_count
}

# Check for the presence of a prebuilt image in the cache repo
# that we can use instead.
resource "envbuilder_cached_image" "cached" {
  count         = var.cache_repo == "" ? 0 : data.coder_workspace.me.start_count
  builder_image = local.devcontainer_builder_image
  git_url       = data.coder_parameter.repo_url.value
  cache_repo    = var.cache_repo
  extra_env     = local.envbuilder_env

  # Create a properly formatted Docker config.json with the ECR token
  docker_config_base64 = base64encode(jsonencode({
    "auths" = {
      # Extract the registry URL from the proxy_endpoint (removes https:// prefix)
      trimsuffix(trimprefix(data.aws_ecr_authorization_token.token[0].proxy_endpoint, "https://"), "/") = {
        "auth" = data.aws_ecr_authorization_token.token[0].authorization_token
      }
    }
  }))
}
Phorcys, 3w ago
hey, a similar issue has been reported in the past, let me find it
https://discord.com/channels/747933592273027093/1286376282984026226/1286710187033628763
ah, my bad, it seems that it's pretty much the same as your template
what IAM permissions did you set for the service account?
David (OP), 2w ago
Sort of the same. It is building the credentials the same way. But that example is giving the credentials to envbuilder. I'm trying to give the credentials to resource "envbuilder_cached_image". One guess I had: Maybe the terraform resource isn't using the credentials to "fetch the envbuilder binary from the builder image", but only for accessing the cache repo?
David (OP), 2w ago
IAM (in the deployment terraform, not a workspace template)
David (OP), 2w ago
Yeah, I think that might be it: docker_config_base64 is passed into envbuilder's config, but it's not used when fetching envbuilder from the builder_image. The helper function GetRemoteImage() uses authn.DefaultKeychain, which reads from ~/.docker/config.json et al.
https://github.com/coder/terraform-provider-envbuilder/blob/main/internal/imgutil/imgutil.go#L27
https://github.com/google/go-containerregistry/blob/main/pkg/authn/keychain.go#L87

I realize I haven't said or made clear in my snippets: the builder_image is in a private repository (because I added some files there that we want available during devcontainer builds).

Two workarounds come to mind:
- Use the public image for builder_image in this resource. We don't actually need our modified image just to check the cache.
- Modify the coder deployment to put credentials in an appropriate place to be read by GetRemoteImage. Actually, this isn't a good solution because ECR credentials expire every 12 hours, though I suppose I could complicate it further by adding a process to refresh them.
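For reference, the first workaround could look roughly like the sketch below. It is untested and nobody in this thread confirmed it. It assumes the public envbuilder image is ghcr.io/coder/envbuilder:latest, and the local name cache_probe_builder_image is hypothetical (it is not part of the starter template). Everything else is copied from the snippet earlier in the thread. Whether a cache populated by the custom builder image is still matched when the check runs against a different builder image is also not confirmed here.

```hcl
locals {
  # Hypothetical local, not in the starter template.
  # Assumption: this is the public envbuilder image, which needs no registry
  # credentials when the provider fetches the envbuilder binary from it.
  cache_probe_builder_image = "ghcr.io/coder/envbuilder:latest"
}

# Check for the presence of a prebuilt image in the cache repo,
# fetching the envbuilder binary from the public image instead of
# the private, customized builder image in ECR.
resource "envbuilder_cached_image" "cached" {
  count         = var.cache_repo == "" ? 0 : data.coder_workspace.me.start_count
  builder_image = local.cache_probe_builder_image
  git_url       = data.coder_parameter.repo_url.value
  cache_repo    = var.cache_repo
  extra_env     = local.envbuilder_env

  # Unchanged from the original snippet: credentials for the ECR cache repo.
  docker_config_base64 = base64encode(jsonencode({
    "auths" = {
      trimsuffix(trimprefix(data.aws_ecr_authorization_token.token[0].proxy_endpoint, "https://"), "/") = {
        "auth" = data.aws_ecr_authorization_token.token[0].authorization_token
      }
    }
  }))
}
```

The workspace itself can keep using local.devcontainer_builder_image (the private image) for the actual build, since the envbuilder credentials are supplied separately there.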
Phorcys, 2w ago
i'm not sure, @Atif do you have any ideas?
