Replies: 3 comments
-
Mutual TLS support: #2140. See the note on that PR: at present there is no nice way to get the DN in order to set groups from the client certificate, so this is an option only for restricting access to the server as a whole, not for any more fine-grained restrictions.
-
Good callout. This is an important limitation of the current implementation.
-
FYI: I submitted a PR for the OAuth token introspection as described above: #2187
-
It will often be desirable to restrict access to a llama-stack server. Providers may be configured with an API key for some service, where use of that key will incur cost or consume quota. When running a server that includes such a provider, the operator will usually want to restrict who is allowed to incur those costs or consume that quota. Even where an API key is not used, using the server may consume valuable but limited resources (e.g. GPU), and the ability to do so may need to be restricted. It may also be desirable to prevent resources created through the server from being deleted or modified by unauthorised parties.
The purpose of this discussion is to first clarify my understanding of the current state and its limitations and then to determine whether there is consensus for any particular direction or approach to improvements.
Current built-in access control capabilities
Authentication is the process by which the identity of the user or process making a request is verified. Authorization is the process by which it is determined if the user or process making the request is allowed to do so. In the current scheme authentication is considered out of scope; how the user or process making a request gets the token is not a concern for the llama-stack server.
The built-in capabilities allow a llama-stack server to enable authorization, which results in every request having to carry a bearer token in the Authorization header. Authorization essentially consists of two parts: (i) validation of the token, which if successful grants access to the server as a whole, and (ii) determination of the access attributes for that token, which are associated with the request and used to determine whether access will be granted to a particular resource.
Once a token is validated, the request is associated with user attributes determined by the auth provider in use. These are compared with access attributes associated with resources in determining whether to allow access to that specific resource. At present access attributes are limited to four defined sets: roles, teams, projects and namespaces. The motivation for this attribute-based access control (ABAC) scheme is assumed to be increased flexibility over a purely role-based access control (RBAC) scheme.
For dynamically created resources, the access attributes are at present always set to the user attributes associated with the request that creates the resource, and there is no way to modify them subsequently. For configured resources, no access attributes are set at present. There is no access check to determine whether a resource may be created when processing a request.
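The comparison between user attributes and resource access attributes could be sketched roughly as follows. This is not the server's actual code: the function name `is_access_allowed` is hypothetical, and the matching rule (every attribute category set on the resource must share at least one value with the user) is an assumption made for illustration. The only behaviour taken from the description above is that a resource with no access attributes is open to any validated token.

```python
from typing import Dict, List, Optional

# The four attribute sets are roles, teams, projects and namespaces.
Attributes = Dict[str, List[str]]


def is_access_allowed(
    user_attrs: Optional[Attributes],
    resource_attrs: Optional[Attributes],
) -> bool:
    """Return True if a request with user_attrs may access the resource."""
    # A resource without access attributes (e.g. a configured resource)
    # is accessible to anyone whose token was validated.
    if not resource_attrs:
        return True
    if not user_attrs:
        return False
    # ASSUMPTION: for every category set on the resource, the user must
    # hold at least one of the listed values.
    for category, required in resource_attrs.items():
        if not required:
            continue
        if not set(required) & set(user_attrs.get(category, [])):
            return False
    return True
```

Under this assumption a user with `{"teams": ["ml"]}` can access a resource tagged `{"teams": ["ml", "ops"]}`, while a user with only `{"roles": ["bob"]}` cannot.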
There are at present two auth providers defined:
1. custom auth provider
The custom auth provider validates the token by making a POST request to a defined auth endpoint URL. If the request succeeds, the token is considered valid. The response may also return the attributes to be associated with the user represented by the token.
If no access attributes are returned in the response from the auth endpoint, the auth provider uses the token string itself (as encoded in the Authorization header) as the value for the namespaces attribute. Note: this could be problematic if the token has a limited lifetime, because once it is renewed the user will no longer have access to the resources they previously created. It only makes sense where the token never changes for a given user.
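A minimal sketch of that flow, including the fallback to the token string as the namespaces value, might look like this. The request body shape (`{"token": ...}`), the response field name `access_attributes`, and the function names are all assumptions for illustration, not the provider's actual wire format.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional

Attributes = Dict[str, List[str]]


def post_token(endpoint_url: str, token: str) -> Optional[dict]:
    """POST the token to the auth endpoint; None means the request failed."""
    body = json.dumps({"token": token}).encode()  # hypothetical payload shape
    request = urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            parsed = json.loads(response.read() or b"{}")
    except (OSError, ValueError):  # network/HTTP errors, malformed JSON
        return None
    return parsed if isinstance(parsed, dict) else {}


def validate(
    token: str,
    endpoint_url: str,
    post: Callable[[str, str], Optional[dict]] = post_token,
) -> Optional[Attributes]:
    """Return the user attributes for a valid token, or None if invalid."""
    response = post(endpoint_url, token)
    if response is None:
        return None
    attrs = response.get("access_attributes")  # hypothetical field name
    if attrs:
        return attrs
    # Fallback described above: the token itself becomes the namespace,
    # so a renewed token loses access to previously created resources.
    return {"namespaces": [token]}
```

Passing `post` as a parameter keeps the sketch testable without a live endpoint.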
2. kubernetes auth provider
The kubernetes auth provider validates the token by using it to make a call on the kubernetes API server. The token must therefore be a valid token for that API server, representing a user or a service account. If the call to the kubernetes API server succeeds, the auth provider sets the user attributes for the request so that the roles attribute holds the subject from the token payload and the teams attribute holds the groups taken from the token payload.
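The attribute mapping described for the kubernetes auth provider could be sketched as below. The claim names `sub` and `groups` and both function names are assumptions. The payload is decoded without signature verification here only because, in the flow described, validity has already been established by the call to the API server.

```python
import base64
import json
from typing import Dict, List

Attributes = Dict[str, List[str]]


def decode_payload(token: str) -> dict:
    """Decode the payload segment of a JWT-style token (no verification)."""
    payload_b64 = token.split(".")[1]
    # base64url segments in tokens omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def attributes_from_payload(payload: dict) -> Attributes:
    """Map token claims to user attributes: subject to roles, groups to teams."""
    attrs: Attributes = {}
    subject = payload.get("sub")  # assumed claim name
    if subject:
        attrs["roles"] = [subject]
    groups = payload.get("groups")  # assumed claim name
    if groups:
        attrs["teams"] = list(groups)
    return attrs
```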
Alternative access control options
Rather than using the built-in capabilities, someone deploying a llama-stack server could use alternative ways of securing access. These include:
Use a proxy in front of the server that authorizes access. Note: for this to be effective, direct access to the llama-stack server needs to be prevented, e.g. by restricting the interface to which it binds to loopback (see e.g. feat: allow the interface on which the server will listen to be configured #2015).
Use mutual TLS, i.e. add support for defining one or more trusted issuers and only accept requests from clients that present a client certificate signed by one of them. This on its own doesn't help with fine-grained access control for a multi-tenant server. It could be a simple solution where all that is required is controlling access to the server as a whole.
Restrict network access, e.g. through firewall or on Kubernetes using NetworkPolicy. This is often cumbersome however.
Observations
While the built-in authorization can be effectively used to control access to the server, more fine grained access control is not as effective. Configured resources are currently accessible to anyone with a valid token. Dynamically created resources (where supported) can be created by anyone with a valid token.
The custom auth provider requires a custom auth endpoint to be developed to secure the resources. Having to develop such a component just to secure a server is likely to be a barrier in some cases.
The kubernetes auth provider restricts access to any user or process with access to a valid bearer token for the kubernetes API server on a cluster. In practice that likely means any user with access to the cluster or any pod running in it. Without more effective fine-grained access control, it is best suited at present to a single-tenant cluster.
Proposal
I believe it is important to:
Provide a way on kubernetes to either restrict access to a particular user or service account, or restrict access to particular features of the server (e.g. who can use a particular model on a particular inference provider) to a subset of users or service accounts. Without this, a server deployed on a multi-tenant kubernetes cluster is usable by all tenants (unless restricted through network policy).
Provide a way to secure a server not running on a kubernetes cluster without having to develop a custom auth endpoint.
Making ABAC more effective
To make the ABAC scheme effective, there needs to be some way to control the resource access attributes independently and it needs to be possible to prevent particular resources from being dynamically created by particular users.
One way to do so could be to alter the config schema used to define resources to allow access attributes to be specified. For dynamically created resources the schema accepted as input would need to be altered to allow access attributes to be set.
A suggestion for achieving this would be to add to the auth config a set of 'rules' that would allow the desired access attributes for any resource to be determined. Since the resource would not need to exist for this determination, the resulting attributes could be compared with those associated with the request before creating the resource. The rules here would be an ordered list of tuples containing the resource type (model, vectordb etc.), the resource identifier, the provider identifier and the applicable attributes. To determine the desired access attributes for a given resource, the resource type, identifier and provider would be compared to the list in order and the access attributes from the first match would be used. A wildcard would be allowed in defining the rules, e.g. the empty string, which would be considered to match any value. That could be used to simply set up a default, but refine it for specific cases if desired.
An additional access check would be performed prior to creating a new resource, by first retrieving the access attributes it would have if created, and then matching those to the attributes associated with the request, failing to create if there was no match.
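The rule lookup and the pre-creation check could be sketched as follows. This is not the code in the linked commit: the names `Rule`, `access_attributes_for` and `may_create` are hypothetical. Two behaviours are assumptions: a resource that matches no rule is treated as unrestricted, and attributes "match" when every category required by the rule shares at least one value with the request.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence

Attributes = Dict[str, List[str]]
WILDCARD = ""  # the empty string matches any value


class Rule(NamedTuple):
    resource_type: str  # e.g. "model", "vectordb"
    resource_id: str
    provider_id: str
    attributes: Attributes


def _matches(pattern: str, value: str) -> bool:
    return pattern == WILDCARD or pattern == value


def access_attributes_for(
    rules: Sequence[Rule], resource_type: str, resource_id: str, provider_id: str
) -> Optional[Attributes]:
    """Return the attributes of the first matching rule, in list order."""
    for rule in rules:
        if (
            _matches(rule.resource_type, resource_type)
            and _matches(rule.resource_id, resource_id)
            and _matches(rule.provider_id, provider_id)
        ):
            return rule.attributes
    return None


def may_create(
    rules: Sequence[Rule],
    user_attrs: Attributes,
    resource_type: str,
    resource_id: str,
    provider_id: str,
) -> bool:
    """Check the request against the attributes the resource would receive."""
    required = access_attributes_for(rules, resource_type, resource_id, provider_id)
    if required is None:
        return True  # ASSUMPTION: no matching rule means no restriction
    # ASSUMPTION: each required category must overlap with the user's values.
    return all(
        set(values) & set(user_attrs.get(category, []))
        for category, values in required.items()
        if values
    )
```

Because the first match wins, specific rules go before wildcard defaults, for example a rule for one model identifier followed by `Rule("model", "", "", {...})` as the default for all other models.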
A suggested implementation of this approach can be viewed here: grs@93de95c
Minimal standard access attributes
For clarity I also propose that we define a minimal base that can be relied on for the attributes associated with a request by the auth providers. At present the custom auth provider might result in any of the attributes being set, or just namespaces being set. The kubernetes auth provider sets roles and teams (with roles set to the subject of the token and teams to its groups). I would propose that we pick either roles or groups and state that an auth provider MUST set that attribute for every request. If it is unable to give a sensible explicit value, it should fall back to 'default'. The subject of the token could be added as an additional value for that attribute.
This would still allow a more flexible ABAC scheme to be used where supported, but would always allow falling back to a more traditional RBAC style scheme.
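A sketch of how an auth provider's output could be normalised to meet such a guarantee is shown below. The choice of roles as the mandatory attribute is an assumption (the proposal leaves open whether it should be roles or groups), and the function name `normalise_attributes` is hypothetical.

```python
from typing import Dict, List, Optional

Attributes = Dict[str, List[str]]


def normalise_attributes(
    attrs: Optional[Attributes],
    subject: Optional[str] = None,
    required: str = "roles",  # ASSUMPTION: roles is the mandatory attribute
) -> Attributes:
    """Guarantee that the mandatory attribute is always set."""
    result = {key: list(values) for key, values in (attrs or {}).items()}
    # Fall back to 'default' when the provider gave no explicit value.
    values = result.get(required) or ["default"]
    # Optionally add the token subject as an additional value.
    if subject and subject not in values:
        values = values + [subject]
    result[required] = values
    return result
```

With this in place, a rule that grants access to `{"roles": ["default"]}` works regardless of which auth provider validated the token.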
An additional, standards based, auth provider
To make it possible to secure a server without requiring the development of a custom auth endpoint, I would propose exploring a new auth provider that can be configured to work with existing authorization servers through standardized interfaces. Some more investigation on this is required, but OAuth 2 token introspection (RFC 7662) would be a good place to start, and/or working with defined token formats such as JWT.
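As a starting point for that investigation, here is a sketch of the two halves of an RFC 7662 exchange: building the introspection request and interpreting the response. RFC 7662 defines the form-encoded `token` parameter, the optional `token_type_hint`, and the required boolean `active` member of the response. Everything else is an assumption: the function names, the use of HTTP Basic client credentials (one of several ways to authorize the call), and the mapping of `sub` to roles and of a non-standard `groups` member to teams.

```python
import base64
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

Attributes = Dict[str, List[str]]


def build_introspection_request(
    endpoint_url: str, token: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build the form-encoded POST defined by RFC 7662, section 2.1."""
    data = urllib.parse.urlencode(
        {"token": token, "token_type_hint": "access_token"}
    ).encode()
    # ASSUMPTION: the authorization server accepts HTTP Basic client auth.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        endpoint_url,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


def attributes_from_introspection(response: dict) -> Optional[Attributes]:
    """Return user attributes for an active token, or None if inactive."""
    if response.get("active") is not True:
        return None
    attrs: Attributes = {}
    if response.get("sub"):
        attrs["roles"] = [response["sub"]]  # hypothetical mapping
    if response.get("groups"):  # non-standard extension member (assumption)
        attrs["teams"] = list(response["groups"])
    return attrs
```

The request object would be sent with `urllib.request.urlopen` and the JSON body passed to `attributes_from_introspection`.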
A mutual TLS based provider could also be valuable, with groups and subject derived from the client's certificate.
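Deriving attributes from a client certificate could look roughly like the sketch below, assuming the server can reach the verified peer certificate at all. The input is the dictionary form returned by Python's `ssl.SSLSocket.getpeercert()`, where `subject` is a tuple of relative distinguished names. Mapping the common name to the subject and the organization fields to groups is an assumption, as is the function name.

```python
from typing import Dict, List, Optional

Attributes = Dict[str, List[str]]


def attributes_from_peer_cert(cert: dict) -> Optional[Attributes]:
    """Derive user attributes from a verified client certificate."""
    subject = None
    groups: List[str] = []
    # 'subject' is a tuple of RDNs, each a tuple of (name, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                subject = value
            elif key in ("organizationName", "organizationalUnitName"):
                groups.append(value)
    if subject is None:
        return None  # no usable identity in the certificate
    attrs: Attributes = {"roles": [subject]}
    if groups:
        attrs["teams"] = groups
    return attrs
```

This mirrors the kubernetes auth provider's convention of placing the subject in roles and the groups in teams.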