Maximilian Schwarzmüller

Using Open LLMs On-Demand via Bedrock

LLMs are still aiming to change the world (we’ll see whether they succeed, though).

One particularly interesting development is that open LLMs (like Gemma 3) are more accessible than ever. Tools like LM Studio, Ollama, and libraries such as llama.cpp have made it feasible to run impressive models like those from Google (Gemma) or Alibaba (Qwen) directly on your local machine or on self-hosted servers.

Local LLMs via Ollama & LM Studio - The Practical Guide

Learn how to use LM Studio & Ollama to run open large language models like Gemma, Llama or DeepSeek locally to perform AI inference on consumer hardware.

Explore Course

This local approach offers great control and is fantastic for experimentation, local AI workflows, privacy, and offline use.

However, running these models locally or self-hosting them comes with its own set of challenges: significant hardware requirements (lots of VRAM may be needed, depending on the model you want to run), complex setup and maintenance (especially when hosting and exposing open models yourself), and limits when it comes to scaling your application for broader use or ensuring consistent uptime without dedicated effort.

If your goal is to integrate these models into production applications, or if you need to access various models without managing individual environments, the self-hosted path can quickly become resource-intensive.

This is where a managed, on-demand service like Amazon Bedrock offers a compelling alternative (Groq would be yet another option).

Instead of dealing with installations or server configurations for each model, Bedrock (just like Groq) aims to provide a streamlined way to access and use a variety of LLMs on a pay-as-you-go basis, without managing the underlying infrastructure.

What Exactly is Amazon Bedrock?

Amazon Bedrock is a fully managed service from Amazon Web Services (AWS) that provides access to a selection of high-performing foundation models from leading AI companies (including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Amazon itself, and DeepSeek AI) through a single, unified API.
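To get a feel for that unified access, here is a minimal sketch (assuming boto3 is installed and your AWS credentials are configured) that lists the foundation models available in a region; the region and the exact output will depend on your account:

import boto3

# Control-plane Bedrock client (model discovery & management).
# The region is an assumption - adjust it to your setup.
bedrock = boto3.client(service_name='bedrock', region_name='us-west-2')

# List the foundation models offered in this region.
response = bedrock.list_foundation_models()

for model in response['modelSummaries']:
    print(f"{model['providerName']}: {model['modelId']}")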

Here are some of its key features:

  1. Unified API: You invoke models from different providers through a single, consistent interface, which makes it easier to switch between or compare models.
  2. Fully Managed: AWS hosts and operates the models, so there is no infrastructure to provision, patch, or scale yourself.
  3. Pay-As-You-Go: On-demand usage is billed based on what you actually process (e.g., tokens), with no upfront commitment.
  4. Additional Tooling: Capabilities like fine-tuning, knowledge bases, agents, and guardrails help you build complete generative AI applications.

In essence, Amazon Bedrock aims to lower the barrier to entry for developers looking to leverage the power of generative AI in a scalable and operationally efficient manner.

Why Use Amazon Bedrock?

While open models from innovators like DeepSeek AI are fantastic for their transparency and cutting-edge capabilities, deploying them beyond a personal machine or a single self-managed server presents challenges.

When you need to build applications that are reliable, can handle multiple users, or require access to various models without constant tinkering, the advantages of a managed service become clear.

Amazon Bedrock addresses these challenges as follows:

  1. Managed Infrastructure: AWS takes on the responsibility of hosting and managing the model infrastructure, reducing your operational burden compared to maintaining your own Ollama or llama.cpp instances.
  2. On-Demand Access & Cost Efficiency for Variable Use: The ability to call these models on-demand means you can use their power without your local machine being constantly tied up, or without paying for idle self-hosted servers. This is efficient for tasks that are intermittent or during development phases when you might not need a model running 24/7.
  3. Scalability and Reliability: Leveraging AWS infrastructure means your model access can scale with your application’s needs and benefits from AWS’s reliability, far beyond what a typical local or single-server self-hosted solution can offer.
  4. Security Features: Bedrock provides features and integrates with AWS security services to help you build secure generative AI applications. For instance, data processed by Bedrock is not used to train the original base models or shared with third-party model providers.

So, let’s dive into an example.

Using a DeepSeek Model On-Demand with Amazon Bedrock

We’ll walk through an example for accessing the DeepSeek R1 model using the AWS SDK for Python (Boto3). While the Bedrock console offers “Playgrounds” for interactive testing, programmatic access is key for application integration. Of course, AWS also offers SDKs for many other languages (e.g., JavaScript).

Prerequisites:

  1. An AWS account with permission to use Amazon Bedrock.
  2. AWS credentials and a default region configured locally (e.g., via the AWS CLI).
  3. Python installed, since the example below uses the AWS SDK for Python (Boto3).

Step 1: Enable Model Access in the AWS Console

Before using a specific foundation model via the API, you typically need to request access to it.

  1. Log in to the AWS Management Console and navigate to the “Amazon Bedrock” service.
  2. In the Bedrock service console, find “Model access” in the left-hand navigation pane (at the very bottom).
  3. Select the model(s) you want to access (e.g., DeepSeek R1) and request access.

Access is often granted quickly, but there might be a brief processing period.

Step 2: (Optional) Experiment in the Bedrock Playground

Once model access is granted, it’s a good idea to test the model interactively.

  1. In the Bedrock console, go to “Playgrounds” > “Chat / Text”.
  2. Click “Select model” and select the model you want to test (e.g., DeepSeek R1).
  3. Enter a prompt and observe the model’s response. This helps you understand its capabilities and how to formulate effective prompts.

Step 3: Programmatic Access with Python

Here’s how to use the DeepSeek model with Python:

If you haven’t already, install the AWS SDK for Python (recommended: do that in a virtual environment):

pip install boto3

With that done, inside your Python project / virtual environment, create a new file (e.g., ‘main.py’).

You now need to get the model ID for the model you want to use. You can find this in the Bedrock console under “Model catalog” by selecting your preferred model there.

For example, for DeepSeek R1, the model ID is us.deepseek.r1-v1:0.
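If you prefer not to click through the console, you can also look this up programmatically. Here is a small sketch (assuming a reasonably recent boto3 version) that lists the cross-region inference profiles, which is where IDs like us.deepseek.r1-v1:0 come from:

import boto3

# Control-plane Bedrock client - the region is an assumption, adjust as needed.
bedrock = boto3.client(service_name='bedrock', region_name='us-west-2')

# Cross-region inference profiles (IDs like 'us.deepseek.r1-v1:0').
response = bedrock.list_inference_profiles()

for profile in response['inferenceProfileSummaries']:
    print(f"{profile['inferenceProfileId']} - {profile['inferenceProfileName']}")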

You can then use that ID in your Python code to send a prompt to the model (via a request to Amazon Bedrock).

Here’s a basic Python script that sends a prompt to DeepSeek R1 via Amazon Bedrock:

import boto3
import json

# Ensure your AWS credentials and region are configured (e.g., via AWS CLI)
# For DeepSeek-R1, specify a region where it's available, e.g., 'us-west-2' or 'us-east-1'
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')

model_id = 'us.deepseek.r1-v1:0'

prompt = input("Your prompt: ")

payload = {
    "prompt": prompt,
    "max_tokens": 4096,
    "temperature": 0.7
}
body = json.dumps(payload)

try:
    print(f"Invoking DeepSeek model (ID: {model_id})...")
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        contentType='application/json'
    )

    response_body = json.loads(response.get('body').read())

    # The path to the generated text varies by model.
    # You MUST check the AWS Bedrock documentation for the specific response format of your model.
    # Below is the correct path for DeepSeek R1's response format.
    generated_text = response_body["choices"][0]["text"]
    
    print("\nDeepSeek Model Response:")
    print(generated_text)

except Exception as e:
    print(f"Error invoking DeepSeek model: {e}")
    print("Please ensure the model ID is correct, you have access, and the request body format matches the model's requirements as per AWS Bedrock documentation.")

You can run this script from your terminal / command line:

python main.py

After entering your prompt, it can take a couple of seconds for the response to come back. Ultimately, the text generated by the model will be printed to your terminal.
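As a side note: besides invoke_model (where the request and response format depends on the specific model), Bedrock also offers the Converse API, which uses the same message format across supported models. Here is a minimal sketch of the same prompt sent via converse, assuming the chosen model supports the Converse API in your region:

import boto3

bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')

model_id = 'us.deepseek.r1-v1:0'
prompt = input("Your prompt: ")

# The Converse API uses a unified message/response format across supported models.
response = bedrock_runtime.converse(
    modelId=model_id,
    messages=[
        {"role": "user", "content": [{"text": prompt}]}
    ],
    inferenceConfig={"maxTokens": 4096, "temperature": 0.7}
)

# Print the text blocks of the reply (some models, like DeepSeek R1,
# may also return separate reasoning blocks).
for block in response["output"]["message"]["content"]:
    if "text" in block:
        print(block["text"])

The nice thing about this approach is that switching to another model usually only requires changing the model ID, not the payload structure.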

Understanding On-Demand Usage

When using invoke_model as shown above, you are typically operating in Bedrock’s “On-Demand” mode.

This means:

  1. You pay only for what you use: for text models, pricing is based on the number of input and output tokens processed.
  2. There is no upfront commitment and no dedicated capacity to reserve, so nothing sits idle between requests.
  3. Throughput is shared and subject to service quotas, so it is not guaranteed under heavy load.

For applications requiring guaranteed throughput and consistent high performance, Bedrock also offers a “Provisioned Throughput” option, where you purchase dedicated inference capacity for a specific model.