production-stack

Router

The source code for the request router.

Key features

Running the router

The router can be configured using command-line arguments. Below are the available options:

Basic Options

Service Discovery Options

Routing Logic Options

Monitoring Options

Logging Options

Dynamic Config Options

Sentry Options

Build docker image

docker build -t <image_name>:<tag> -f docker/Dockerfile .

Example commands to run the router

You can install the router using the following command:

pip install -e .

If you want to run the router with the semantic cache, you can install the dependencies using the following command:

pip install -e ".[semantic_cache]"

Example 1: running the router locally at port 8000 in front of multiple serving engines:

vllm-router --port 8000 \
    --service-discovery static \
    --static-backends "http://localhost:9001,http://localhost:9002,http://localhost:9003" \
    --static-models "facebook/opt-125m,meta-llama/Llama-3.1-8B-Instruct,facebook/opt-125m" \
    --static-aliases "gpt4:meta-llama/Llama-3.1-8B-Instruct" \
    --static-model-types "chat,chat,chat" \
    --static-backend-health-checks \
    --engine-stats-interval 10 \
    --log-stats \
    --routing-logic roundrobin
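
In Example 1, the comma-separated values passed to --static-backends, --static-models, and --static-model-types each contain three entries. The sketch below assumes the entries are matched by position (first URL with first model, and so on) and checks that the lists line up before the router is launched. The helper name pair_static_backends is hypothetical and is not part of vllm-router.

```python
from typing import Dict, List, Optional


def pair_static_backends(
    backends: str, models: str, model_types: Optional[str] = None
) -> List[Dict[str, Optional[str]]]:
    """Pair comma-separated backend URLs, models, and model types by position.

    Assumption: the router matches the lists positionally, as the example
    command suggests. This is a pre-flight check, not router code.
    """
    urls = [item.strip() for item in backends.split(",") if item.strip()]
    names = [item.strip() for item in models.split(",") if item.strip()]
    if len(urls) != len(names):
        raise ValueError(
            f"{len(urls)} backends but {len(names)} models; counts must match"
        )

    if model_types is None:
        types: List[Optional[str]] = [None] * len(urls)
    else:
        types = [item.strip() for item in model_types.split(",") if item.strip()]
        if len(types) != len(urls):
            raise ValueError(
                f"{len(urls)} backends but {len(types)} model types; counts must match"
            )

    return [
        {"url": url, "model": name, "model_type": kind}
        for url, name, kind in zip(urls, names, types)
    ]


if __name__ == "__main__":
    for entry in pair_static_backends(
        "http://localhost:9001,http://localhost:9002,http://localhost:9003",
        "facebook/opt-125m,meta-llama/Llama-3.1-8B-Instruct,facebook/opt-125m",
        "chat,chat,chat",
    ):
        print(entry)
```

Running a check like this before starting the router can catch a missing or extra entry in one of the lists early.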

Backend health checks

When the --static-backend-health-checks flag is enabled, vllm-router sends a simple request to each of your LLM nodes every minute to verify that it still works. If a node is down, the router outputs a warning and excludes that node from routing.

If you enable this flag, you must also specify --static-model-types, because the router uses a different endpoint for each model type.

Enabling this flag puts some load on your backends every minute, because real requests are sent to the nodes to test their functionality.

Dynamic Router Config

The router can be configured dynamically with a JSON file, passed via the --dynamic-config-json option. The router watches the JSON file for changes (checking every 10 seconds) and updates its configuration accordingly.

Currently, the dynamic config supports the following fields:

Required fields:

Optional fields:

Here is an example dynamic config file:

{
    "service_discovery": "static",
    "routing_logic": "roundrobin",
    "static_backends": "http://localhost:9001,http://localhost:9002,http://localhost:9003",
    "static_models": "facebook/opt-125m,meta-llama/Llama-3.1-8B-Instruct,facebook/opt-125m"
}
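
Because the router re-reads the file periodically, it may help to update the file in a single step so that a half-written file is never observed. The sketch below writes the config to a temporary file in the same directory and then renames it over the target. The function name write_dynamic_config and the file name dynamic_config.json are hypothetical, and whether the router's file watcher needs this precaution is an assumption, not something the router documents.

```python
import json
import os
import tempfile


def write_dynamic_config(path: str, config: dict) -> None:
    """Write `config` as JSON to `path` via a temp file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(config, handle, indent=4)
            handle.flush()
            os.fsync(handle.fileno())
        # Swap the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    write_dynamic_config(
        "dynamic_config.json",  # hypothetical path passed to --dynamic-config-json
        {
            "service_discovery": "static",
            "routing_logic": "roundrobin",
            "static_backends": "http://localhost:9001,http://localhost:9002",
            "static_models": "facebook/opt-125m,facebook/opt-125m",
        },
    )
```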

Get current dynamic config

If the dynamic config is enabled, the router includes the current dynamic config in the response of the /health endpoint.

curl http://<router_host>:<router_port>/health

The response will be a JSON object with the current dynamic config.

{
    "status": "healthy",
    "dynamic_config": <current_dynamic_config (JSON object)>
}
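
The same check can be scripted. The following is a minimal sketch that requests /health and returns the dynamic_config field, assuming the response has the shape shown above. The function names and the http://localhost:8000 address are illustrative, not part of the router.

```python
import json
from typing import Any, Optional
from urllib.request import urlopen


def extract_dynamic_config(body: str) -> Optional[Any]:
    """Return the "dynamic_config" field of a /health response body, or None."""
    payload = json.loads(body)
    return payload.get("dynamic_config")


def fetch_dynamic_config(base_url: str, timeout: float = 5.0) -> Optional[Any]:
    """Request <base_url>/health and return the current dynamic config, if any."""
    url = base_url.rstrip("/") + "/health"
    with urlopen(url, timeout=timeout) as response:
        return extract_dynamic_config(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes a router is listening locally on port 8000, as in Example 1.
    print(fetch_dynamic_config("http://localhost:8000"))
```

A result of None means the response carried no dynamic_config field, which this sketch treats as dynamic config not being enabled.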