RunPod • 2w ago
jackson hole

How is the architecture set up with serverless? (Please give me a minute to explain myself.)

We have been looking at LLM hosting services and autoscaling functionality to make sure we meet demand, but our main concern is the design of the authentication architecture.

The basic setup, based on my understanding, has the following layers:
1. An application on the user's device sends a request.
2. A dedicated authentication server checks the user's authenticity (via API key, bearer token, etc.) and applies rate limits.
3. Our HTTP server takes that request, processes the data, and sends a request to the LLM server (RunPod serverless).
4. RunPod returns the generated data, and finally the HTTP server post-processes it and sends it back to the user.

---

We want to:
- Make sure no unauthorized device is accessing our API to the LLM.
- Track each user's remaining quota and only let them send a limited number of requests, etc.

👉🏻 As you can see, we have certain authentication-related thoughts, but I need a more granular understanding of the standard practices when deploying LLMs for commercial use by real customers. Please guide. Thank you.
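The four layers above can be sketched as a single request path. This is a minimal illustration under several assumptions: the environment variable names, the `USERS_BY_KEY` table, and the `{"input": {"prompt": ...}}` payload are hypothetical, the URL follows RunPod's synchronous `/v2/{endpoint_id}/runsync` route, and the actual HTTP call is injected as `send` so the logic runs without network access. Authentication happens in-process here rather than on a separate server.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical env var names: the RunPod key is read on the server only.
RUNPOD_API_KEY = os.environ.get("RUNPOD_API_KEY", "")
RUNPOD_ENDPOINT_ID = os.environ.get("RUNPOD_ENDPOINT_ID", "your-endpoint-id")

# Layer 2 stand-in: maps keys *your* service issued to user IDs (hypothetical data).
USERS_BY_KEY: Dict[str, str] = {"demo-key-123": "alice"}


class Unauthorized(Exception):
    """Raised when the caller's credential is missing or unknown."""


@dataclass
class IncomingRequest:
    api_key: str  # credential from the user's device, never the RunPod key
    prompt: str


def authenticate(req: IncomingRequest) -> str:
    """Layer 2: resolve the caller's API key to a user ID or reject it."""
    user_id = USERS_BY_KEY.get(req.api_key)
    if user_id is None:
        raise Unauthorized("invalid API key")
    return user_id


def build_runpod_call(prompt: str) -> dict:
    """Layer 3: only the backend builds the RunPod request and attaches its key."""
    return {
        "url": f"https://api.runpod.ai/v2/{RUNPOD_ENDPOINT_ID}/runsync",
        "headers": {"Authorization": f"Bearer {RUNPOD_API_KEY}"},
        # Assumed input schema; it depends on your worker's handler.
        "json": {"input": {"prompt": prompt}},
    }


def handle(req: IncomingRequest, send: Callable[[dict], dict]) -> dict:
    user_id = authenticate(req)                     # layer 2: who is calling?
    result = send(build_runpod_call(req.prompt))    # layer 3: forward to RunPod
    text = str(result.get("output", "")).strip()    # layer 4: post-process
    return {"user": user_id, "text": text}          # layer 4: reply to the device
```

The client only ever holds a key issued by your own service; the RunPod key is read from the server's environment and attached in `build_runpod_call`. Against a real endpoint, `send` could be something like `lambda call: requests.post(**call, timeout=120).json()`.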
10 Replies
nerdylive • 2w ago
What's the question? That visualization of "the basic setup" seems right and common, yes.
jackson hole (OP) • 2w ago
Damn, the visualization got removed (the attached image, if that's what you meant, was just there to grab attention 😅). The question is more of "a request for guidance" on the standard architecture design when deploying LLMs with authentication. If the basic setup is good enough, then okay; otherwise, please guide me further. Thanks.
nerdylive • 2w ago
Sorry, I meant your explanation. No worries about the image, hahah.
jackson hole (OP) • 2w ago
Alrighty, then I guess I should go ahead with that visualization.
nerdylive • 2w ago
"Make sure no unauthorized device is accessing our API to the LLM": to make sure of this, store the RunPod API key in your backend. Never expose it, and never send requests to RunPod directly from the frontend (e.g. from the user's browser using JavaScript). The flow should be: user's browser -> your server -> RunPod. So yes, your explanation looks good, because your server sits in the middle.

For quota management, you'll have to do this in your app, where you track each user's usage. Here's a response from ChatGPT with some edits:

Quota Enforcement:
- Define both short-term rate limits (e.g., requests per second) and long-term quotas (e.g., monthly usage caps) based on subscription tiers.
- Implement dynamic quotas tailored to individual users or plans. For example, premium users may have higher limits than free-tier users.

Monitoring and Alerts:
- Provide users with real-time dashboards to track their quota usage. This transparency helps them manage their consumption effectively.
- Set up alerts for "soft limits" to warn users before they hit their hard quota limits.

Also, I think you should add some way to monitor your own usage; if RunPod's serverless dashboard seems enough, then that's fine.

Granular Control (extras):
- Divide quotas into sub-quotas for specific endpoints or services. This prevents a single user from exhausting the entire quota across all features.
- Use priority queues or traffic-spreading techniques to optimize API usage during peak times.
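A minimal sketch of the two quota layers described above: a token bucket for the short-term rate limit, a per-plan monthly cap, and a soft-limit warning before the hard cap. The plan names and numbers, the `QuotaManager` class, and the 80% threshold are all hypothetical. State lives in memory here; a deployment with several workers would need shared storage (e.g. Redis or a database), and resetting `used_this_month` at each billing period is left to a scheduled job.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical tiers: refill rate (requests/second), burst size, monthly cap.
PLANS = {
    "free": {"rate_per_sec": 1.0, "burst": 5, "monthly_cap": 100},
    "premium": {"rate_per_sec": 5.0, "burst": 20, "monthly_cap": 10_000},
}
SOFT_LIMIT_RATIO = 0.8  # hypothetical: warn once 80% of the monthly cap is used


class QuotaExceeded(Exception):
    """Raised when a request would break the rate limit or the monthly cap."""


@dataclass
class Usage:
    plan: str
    tokens: float        # token-bucket balance for the short-term limit
    last_refill: float   # clock reading at the last refill
    used_this_month: int = 0


class QuotaManager:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._users: Dict[str, Usage] = {}

    def register(self, user_id: str, plan: str) -> None:
        # New users start with a full bucket.
        self._users[user_id] = Usage(plan, float(PLANS[plan]["burst"]), self._clock())

    def consume(self, user_id: str) -> Optional[str]:
        """Charge one request. Returns a soft-limit warning or None; raises QuotaExceeded."""
        usage = self._users[user_id]
        limits = PLANS[usage.plan]
        # Refill the bucket in proportion to elapsed time, capped at the burst size.
        now = self._clock()
        refill = (now - usage.last_refill) * limits["rate_per_sec"]
        usage.tokens = min(float(limits["burst"]), usage.tokens + refill)
        usage.last_refill = now
        if usage.used_this_month >= limits["monthly_cap"]:
            raise QuotaExceeded("monthly quota exhausted")
        if usage.tokens < 1:
            raise QuotaExceeded("rate limit exceeded, retry shortly")
        usage.tokens -= 1
        usage.used_this_month += 1
        cap = limits["monthly_cap"]
        if usage.used_this_month >= SOFT_LIMIT_RATIO * cap:
            return f"{usage.used_this_month}/{cap} monthly requests used"
        return None
```

The injectable `clock` keeps the refill logic testable. In a FastAPI route, `consume()` would run before the request is forwarded to RunPod, with `QuotaExceeded` mapped to an HTTP 429 response and any returned warning surfaced to the user.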
jackson hole (OP) • 2w ago
Fabulous. Thanks. ✨ One thing... Generally, the security is on our end and we need to decide how we want to proceed with authentication. There are several options, like:
- Basic authentication (sending username and password in a header; least secure)
- Some dynamic token (signed or hashed with SHA and that sort of stuff)
- Creating an API key per user account (just like OpenAI)
etc.

Let's say we have selected one of these techniques. Is there any predefined framework we can use, or do we need to code this logic from scratch? I have heard of "AWS API Gateway" but I'm not sure about its relevance. For context, we are using FastAPI as our HTTP request handler, and it will send the requests to RunPod.

So, the question: should we write the authentication logic ourselves, or are there libraries/services that can do this for us? Thanks mate
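For the third option (an API key per user account), a common pattern is to show the full key to the user once and store only a hash of it. The sketch below assumes that pattern; `ApiKeyStore`, the `sk_live_` prefix, and the key layout are hypothetical, and the in-memory dict stands in for a database table. A plain SHA-256 digest is acceptable here because the keys are long random strings; user-chosen passwords would instead need a slow hash such as bcrypt or Argon2.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple

KEY_PREFIX = "sk_live_"  # hypothetical prefix; makes leaked keys easy to spot


def _digest(secret: str) -> str:
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


class ApiKeyStore:
    """In-memory stand-in for a table of key_id -> (user_id, SHA-256 of secret)."""

    def __init__(self) -> None:
        self._keys: Dict[str, Tuple[str, str]] = {}

    def issue(self, user_id: str) -> str:
        key_id = secrets.token_hex(8)        # public lookup handle
        secret = secrets.token_urlsafe(32)   # 32 random bytes as URL-safe text
        self._keys[key_id] = (user_id, _digest(secret))
        # The full key is shown to the user once; only its hash is stored.
        return f"{KEY_PREFIX}{key_id}.{secret}"

    def verify(self, presented: str) -> Optional[str]:
        """Return the owning user ID, or None if the key is unknown or wrong."""
        if not presented.startswith(KEY_PREFIX):
            return None
        key_id, _, secret = presented[len(KEY_PREFIX):].partition(".")
        record = self._keys.get(key_id)
        if record is None:
            return None
        user_id, stored = record
        # Constant-time comparison avoids leaking how many characters matched.
        return user_id if hmac.compare_digest(_digest(secret), stored) else None

    def revoke(self, key_id: str) -> None:
        self._keys.pop(key_id, None)
```

In FastAPI, `fastapi.security.APIKeyHeader` can pull the key out of a request header, and a dependency can then call `verify()` and turn `None` into a 401 response.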
nerdylive • 2w ago
There are lots of frameworks/libraries, and even applications like API gateways (open-source or hosted), that you can use for this. I've never used AWS API Gateway, but I guess it's a hosted API gateway with the features you need, like auth, rate limits, etc.

As for the auth, depending on what your app is like, we usually use API tokens per user account for APIs.

Here's a ChatGPT response again, since it's quick and accurate:

The relevance of AWS API Gateway, and whether you should rely on it or implement your own authentication logic, depends on your specific use case and architecture. Here's a detailed breakdown:

AWS API Gateway Overview
AWS API Gateway is a fully managed service that helps developers create, publish, and manage APIs. It acts as a "front door" for applications, routing requests to backend services like AWS Lambda or other HTTP endpoints. Key features include:
- Scalability: Automatically scales to handle thousands of requests.
- Security: Supports built-in authentication and authorization mechanisms (e.g., AWS IAM, Amazon Cognito).
- Monitoring: Integrates with Amazon CloudWatch for logging and metrics.

However, it introduces additional latency due to its architecture and has a default integration timeout of about 30 seconds, which can be problematic for long-running operations like Server-Sent Events (SSE) or streaming responses.
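One way to live with a gateway's roughly 30-second limit is to make every client-facing call short: submit the job to RunPod's asynchronous endpoint, return the job ID, and let the client poll for the result. The sketch below assumes RunPod's `/v2/{endpoint_id}/run` and `/v2/{endpoint_id}/status/{job_id}` routes (check the current RunPod docs) and a worker that accepts `{"input": {"prompt": ...}}`; `JobProxy` and the injected `post`/`get` callables are hypothetical. The owner map also makes sure a user can only poll their own jobs.

```python
from typing import Callable, Dict

PostFn = Callable[[str, dict], dict]
GetFn = Callable[[str], dict]


class JobNotFound(Exception):
    """Raised when a user polls a job ID they did not submit."""


class JobProxy:
    """Split one long LLM call into two short client-facing calls."""

    def __init__(self, endpoint_id: str, post: PostFn, get: GetFn) -> None:
        self._base = f"https://api.runpod.ai/v2/{endpoint_id}"
        self._post = post  # assumed to send the server-side RunPod key
        self._get = get
        self._owner: Dict[str, str] = {}  # RunPod job ID -> user ID

    def submit(self, user_id: str, prompt: str) -> str:
        # /run queues the job and returns right away with its ID.
        job = self._post(f"{self._base}/run", {"input": {"prompt": prompt}})
        self._owner[job["id"]] = user_id
        return job["id"]

    def status(self, user_id: str, job_id: str) -> dict:
        # Unknown IDs and other users' IDs get the same error.
        if self._owner.get(job_id) != user_id:
            raise JobNotFound(job_id)
        result = self._get(f"{self._base}/status/{job_id}")
        return {"status": result["status"], "output": result.get("output")}
```

In production, `post` and `get` would wrap an HTTP client that sends the RunPod API key in the `Authorization` header; keeping them injectable lets the logic be tested without network access.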
jackson hole (OP) • 2w ago
I see, that would basically replace our "authentication server" layer. Thanks a lot, looking forward to implementing this soon ✌🏻
nerdylive • 2w ago
Third-Party Authentication Services
You can also integrate third-party services like Auth0 or Firebase Authentication, which handle user management and token generation securely. These services reduce development overhead but may introduce additional costs.

Recommendation
Given your setup with FastAPI as the HTTP handler and RunPod as the backend:
- If you need fine-grained control over authentication or have specific requirements (e.g., SSE support), implement the authentication logic in FastAPI using OAuth2 or JWT.
- If you prefer offloading this responsibility, consider using a third-party service like Amazon Cognito or Auth0.
- Avoid using AWS API Gateway solely for authentication unless you're deeply integrated into the AWS ecosystem and require its other features.

You're welcome! Also, you can use other API gateways, but for simplicity and cost (smaller businesses/teams), it's better to try creating your own first, or to use just an auth service rather than a whole API gateway.
jackson hole (OP) • 2w ago
Absolutely, mate.
