
Using Open LLMs On-Demand via Bedrock
LLMs are still aiming to change the world (whether they succeed remains to be seen, though).
One particularly interesting development is that open LLMs (like Gemma 3) are more accessible than ever. Tools like LM Studio and Ollama, and libraries such as llama.cpp, have made it feasible to run impressive models like those from Google (Gemma) or Alibaba (Qwen) directly on your local machine or on self-hosted servers.

This local approach offers great control and is fantastic for experimentation, local AI workflows, privacy, and offline use.
However, running these models locally or self-hosting them can come with its own set of challenges: significant hardware requirements (lots of VRAM may be required, depending on the model you want to run), complex setup and maintenance (when planning to host and expose open models), and limitations when it comes to scaling your application for broader use or ensuring consistent uptime without dedicated effort.
If your goal is to integrate these models into production applications, or if you need to access various models without managing individual environments, the self-hosted path can quickly become resource-intensive.
This is where a managed, on-demand service like Amazon Bedrock offers a compelling alternative (Groq would be yet another option).
Instead of dealing with installations or server configurations for each model, Bedrock (just like Groq) aims to provide a streamlined way to access and use a variety of LLMs on a pay-as-you-go basis, without managing the underlying infrastructure.
What Exactly is Amazon Bedrock?
Amazon Bedrock is a fully managed service from Amazon Web Services (AWS) that provides access to a selection of high-performing foundation models from leading AI companies (including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Amazon itself, and DeepSeek AI) through a single, unified API.
Here are some of its key features:
- Model Choice without Individual Setups: Bedrock offers a diverse catalog of FMs. You can switch between models from different providers without needing to download, configure, and manage separate environments for each one, unlike local tools where each new model or major update might require a new setup.
- Serverless Operation & Scalability: With Bedrock, you don’t worry about your local machine’s capacity or provisioning servers. AWS handles the infrastructure, which can scale to meet demand—something that’s hard to achieve with a typical local setup.
- Simplified API Access: Bedrock provides a consistent API to interact with various models, simplifying integration into your applications. Switching a model does not require switching to an entirely different API or SDK.
- Pay-As-You-Go (On-Demand): You typically pay only for what you use (e.g., the amount of data processed). This can be more cost-effective than investing in high-end hardware that might sit idle, especially for applications with variable workloads or for developers who need access to powerful models sporadically.
- Customization and Integration: While you can use models directly, Bedrock also supports features like fine-tuning models with your own data (for some models) and integrates with other AWS services for building more comprehensive AI solutions.
- Guardrails and Security: Bedrock includes built-in security features, such as data encryption and compliance with AWS security standards. It also allows you to set up so-called “Guardrails” that can control and filter model input and output.
- Simple RAG: Bedrock supports Retrieval-Augmented Generation (RAG) out of the box via a feature called “Knowledge Bases”, allowing you to combine the power of LLMs with external data sources for more accurate and contextually relevant responses.
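To make the unified-API point above concrete, here is a minimal sketch using Bedrock’s Converse API, which accepts the same message shape regardless of the model provider. The helper function and the second model ID are illustrative assumptions; check the Bedrock “Model catalog” for the IDs actually available in your region.

```python
def build_converse_request(model_id: str, user_text: str) -> dict:
    """Build the provider-agnostic arguments for Bedrock's Converse API."""
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": user_text}]}
        ],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.7},
    }

# Switching providers only changes the model ID, not the request shape:
request_a = build_converse_request("us.deepseek.r1-v1:0", "Hello!")
request_b = build_converse_request("anthropic.claude-3-haiku-20240307-v1:0", "Hello!")

# With AWS credentials configured, you would send either request like this:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-west-2")
# response = client.converse(**request_a)
# print(response["output"]["message"]["content"][0]["text"])
```

Because only the `modelId` changes between the two requests, swapping models in an application becomes a one-line change rather than an SDK migration.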
In essence, Amazon Bedrock aims to lower the barrier to entry for developers looking to leverage the power of generative AI in a scalable and operationally efficient manner.
Why Use Amazon Bedrock?
While open models from innovators like DeepSeek AI are fantastic for their transparency and cutting-edge capabilities, deploying them beyond a personal machine or a single self-managed server presents challenges.
When you need to build applications that are reliable, can handle multiple users, or require access to various models without constant tinkering, the advantages of a managed service become clear.
Amazon Bedrock addresses these by:
- Managed Infrastructure: AWS takes on the responsibility of hosting and managing the model infrastructure, reducing your operational burden compared to maintaining your own Ollama or llama.cpp instances.
- On-Demand Access & Cost Efficiency for Variable Use: The ability to call these models on-demand means you can use their power without your local machine being constantly tied up, or without paying for idle self-hosted servers. This is efficient for tasks that are intermittent or during development phases when you might not need a model running 24/7.
- Scalability and Reliability: Leveraging AWS infrastructure means your model access can scale with your application’s needs and benefits from AWS’s reliability, far beyond what a typical local or single-server self-hosted solution can offer.
- Security Features: Bedrock provides features and integrates with AWS security services to help you build secure generative AI applications. For instance, data processed by Bedrock is not used to train the original base models or shared with third-party model providers.
So, let’s dive into an example.
Using a DeepSeek Model On-Demand with Amazon Bedrock
We’ll walk through an example of accessing the DeepSeek R1 model using the AWS SDK for Python (Boto3). While the Bedrock console offers “Playgrounds” for interactive testing, programmatic access is key for application integration. Of course, AWS also offers SDKs for many other languages (e.g., JavaScript).
Prerequisites:
- An AWS Account (and a basic understanding of what AWS is and “how it works”)
- Python installed on your system & general knowledge of how to write and run Python code
Step 1: Enable Model Access in the AWS Console
Before using a specific foundation model via the API, you typically need to request access to it.
- Log in to the AWS Management Console and navigate to the “Amazon Bedrock” service.
- In the Bedrock service console, find “Model access” in the left-hand navigation pane (at the very bottom).
- Select the model(s) you want to access (e.g., DeepSeek R1) and request access.
Access is often granted quickly, but there might be a brief processing period.
Step 2: (Optional) Experiment in the Bedrock Playground
Once model access is granted, it’s a good idea to test the model interactively.
- In the Bedrock console, go to “Playgrounds” > “Chat / Text”.
- Click “Select model” and select the model you want to test (e.g., DeepSeek R1).
- Enter a prompt and observe the model’s response. This helps you understand its capabilities and how to formulate effective prompts.
Step 3: Programmatic Access with Python
Here’s how to invoke the DeepSeek model with Python:
If you haven’t already, install the AWS SDK for Python (recommended: do that in a virtual environment):
pip install boto3
With that done, inside your Python project / virtual environment, create a new file (e.g., ‘main.py’).
You now need to get the model ID for the model you want to use. You can find this in the Bedrock console under “Model catalog” by selecting your preferred model there.
For example, for DeepSeek R1, the model ID is us.deepseek.r1-v1:0.
You can then use that ID in your Python code to send a prompt to the model (via a request to Amazon Bedrock).
Here’s a basic Python script that sends a prompt to DeepSeek R1 via Amazon Bedrock:
import boto3
import json

# Ensure your AWS credentials and region are configured (e.g., via the AWS CLI)
# For DeepSeek R1, specify a region where it's available, e.g., 'us-west-2' or 'us-east-1'
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')

model_id = 'us.deepseek.r1-v1:0'

prompt = input("Your prompt: ")

payload = {
    "prompt": prompt,
    "max_tokens": 4096,
    "temperature": 0.7
}

body = json.dumps(payload)

try:
    print(f"Invoking DeepSeek model (ID: {model_id})...")
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        contentType='application/json'
    )
    response_body = json.loads(response.get('body').read())

    # The path to the generated text varies by model.
    # You MUST check the AWS Bedrock documentation for the specific
    # response format of your model. Below is the code for DeepSeek R1.
    generated_text = response_body["choices"][0]["text"]

    print("\nDeepSeek Model Response:")
    print(generated_text)
except Exception as e:
    print(f"Error invoking DeepSeek model: {e}")
    print("Please ensure the model ID is correct, you have access, and the request "
          "body format matches the model's requirements as per the AWS Bedrock "
          "documentation.")
You can run this script from your terminal / command line:
python main.py
After entering your prompt, it can take a couple of seconds for the response to come back. Ultimately, the text generated by the model will be printed to your terminal.
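Since the response-parsing step is the part that differs most between models, it can help to isolate it in a small function. The sketch below uses the DeepSeek R1 response shape from the script above; the sample body and its values are made up for illustration and are not real model output.

```python
def extract_deepseek_text(response_body: dict) -> str:
    """Pull the generated text out of a DeepSeek R1 invoke_model response body."""
    # DeepSeek R1 on Bedrock returns a "choices" list; other models use
    # different keys, so a helper like this keeps model specifics in one place.
    return response_body["choices"][0]["text"]

# A made-up body mirroring the DeepSeek R1 response shape used above:
sample_body = {
    "choices": [
        {"text": "Hello! How can I help you today?", "stop_reason": "stop"}
    ]
}

print(extract_deepseek_text(sample_body))  # → Hello! How can I help you today?
```

If you later switch models, only this helper (and the request payload) needs to change, not the surrounding invocation logic.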
Understanding On-Demand Usage
When using invoke_model as shown above, you are typically operating in Bedrock’s “On-Demand” mode.
This means:
- You pay based on the volume of data processed (e.g., number of input and output tokens).
- There’s no need to commit to a minimum usage or pre-allocate capacity.
- It’s suitable for applications with fluctuating workloads, for development and testing, or when starting with generative AI, especially when compared to the upfront or continuous cost of high-end local hardware or dedicated self-hosted servers.
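To get a feel for what token-based, pay-as-you-go billing means in practice, here is a rough estimator. The per-1,000-token prices below are placeholders, not real Bedrock rates; always look up current prices for your model and region on the AWS pricing page.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Rough on-demand cost estimate: (tokens / 1000) * per-1k-token price."""
    return ((input_tokens / 1000) * price_in_per_1k
            + (output_tokens / 1000) * price_out_per_1k)

# Placeholder prices (NOT real Bedrock rates -- check the AWS pricing page):
cost = estimate_cost(input_tokens=2000, output_tokens=500,
                     price_in_per_1k=0.001, price_out_per_1k=0.002)
print(f"Estimated cost: ${cost:.4f}")  # → Estimated cost: $0.0030
```

Running a few such estimates against your expected traffic is a quick way to compare on-demand usage with the fixed cost of dedicated hardware or Provisioned Throughput.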
For applications requiring guaranteed throughput and consistent high performance, Bedrock also offers a “Provisioned Throughput” option, where you purchase dedicated inference capacity for a specific model.