Bedrock needs inference profile ARN instead of model name #894

Open
vladionescu opened this issue Mar 6, 2025 · 1 comment

vladionescu commented Mar 6, 2025

When using [Async]AnthropicBedrock, I have to pass the full inference profile ARN, "arn:aws:bedrock:us-east-1:0000000000:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0", as model= instead of just the model ID, anthropic.claude-3-7-sonnet-20250219-v1:0.

If I provide only the model ID, I get:

Error code: 400 - {'message': 'Invocation of model ID anthropic.claude-3-7-sonnet-20250219-v1:0 with on-demand throughput isn’t supported. Retry your request with the ID or ARN of an inference profile that contains this model.'}

httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-7-sonnet-20250219-v1:0/invoke'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

Neither the Anthropic docs nor the AWS docs make this obvious; both suggest passing the model name directly.

Maybe this just needs a docs update?

Repro

from anthropic import AnthropicBedrock

client = AnthropicBedrock(
    aws_profile="bedrock",
    aws_region="us-east-1",
)

# Succeeds (replace 0000000000 with your AWS account ID)
message = client.messages.create(
    model="arn:aws:bedrock:us-east-1:0000000000:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello, world"}],
)
print(message.content)

# Fails with the 400 error above
message = client.messages.create(
    model="anthropic.claude-3-7-sonnet-20250219-v1:0",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello, world"}],
)
print(message.content)

EktorTyp commented Mar 7, 2025

Have you tried using the inference profile ID from cross-region inference directly, without the ARN?

3.5 had the same "issue" -> here
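
To make the suggestion above concrete, here is a minimal sketch that pulls the inference profile ID out of the ARN from the original report and passes it as model=. The helper names inference_profile_id and send_hello are hypothetical and not part of the SDK. Whether the bare profile ID is accepted is what this comment proposes and has not been confirmed in this thread, so treat the call as something to try rather than a verified fix.

```python
def inference_profile_id(arn: str) -> str:
    """Return the profile ID portion of a Bedrock inference profile ARN.

    Hypothetical helper for illustration; it only splits the string.
    """
    _, sep, profile_id = arn.partition(":inference-profile/")
    if not sep or not profile_id:
        raise ValueError(f"not an inference profile ARN: {arn!r}")
    return profile_id


def send_hello(model: str):
    """Send the repro message using the given model identifier."""
    # Imported lazily so the helper above works without the SDK installed.
    from anthropic import AnthropicBedrock

    # Same client settings as the repro in the original report.
    client = AnthropicBedrock(aws_profile="bedrock", aws_region="us-east-1")
    return client.messages.create(
        model=model,
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello, world"}],
    )


# ARN copied from the report; 0000000000 is a placeholder account ID.
ARN = (
    "arn:aws:bedrock:us-east-1:0000000000:"
    "inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
)
PROFILE_ID = inference_profile_id(ARN)
```

Calling send_hello(PROFILE_ID) would then issue the same request as the repro, with the shorter identifier in place of the full ARN. It needs the anthropic package and valid AWS credentials.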
