spakky-vllm¶
spakky-vllm은 ADR-0009 Agent workflow를 위한 로컬 OpenAI-compatible
IAgentModel 구현체입니다. 이 패키지는 의도적으로 outbound model adapter 역할만
담당합니다. Agent core 계약은 spakky-agent에 남기고, vLLM HTTP 설정, completion
mapping, streaming event, tool-call argument 검증은 이 플러그인이 소유합니다.
검증 전략¶
spakky-vllm 테스트는 실제 vLLM 서버나 로컬 모델을 호출하지 않습니다. CI와 로컬 커밋
시간을 예측 가능하게 유지하기 위해 IVllmChatClient fake로 request mapping,
streaming event 변환, structured output, required tool calling, error mapping을
검증합니다.
spakky.plugins.vllm
¶
vLLM model adapter plugin for Spakky Agent.
PLUGIN_NAME = Plugin(name='spakky-vllm')
module-attribute
¶
Plugin identifier for the vLLM adapter package.
HttpxVllmChatClient
¶
Bases: IVllmChatClient
httpx-backed client for vLLM's OpenAI-compatible API.
complete(payload, config)
async
¶
Send a chat completion request and return the JSON object response.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
stream(payload, config)
async
¶
Stream server-sent event chunks from chat completions.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
IVllmChatClient
¶
VllmConfig()
¶
Bases: BaseSettings
Settings for the OpenAI-compatible vLLM model endpoint.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/config.py
endpoint_url = DEFAULT_VLLM_ENDPOINT_URL
class-attribute
instance-attribute
¶
Base URL for the vLLM OpenAI-compatible API, without a trailing path.
model = DEFAULT_VLLM_MODEL
class-attribute
instance-attribute
¶
Model identifier passed to the vLLM chat completions endpoint.
request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS
class-attribute
instance-attribute
¶
Timeout for non-streaming chat completion requests.
stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS
class-attribute
instance-attribute
¶
Timeout budget reserved for streaming requests.
stream_enabled = True
class-attribute
instance-attribute
¶
Whether callers may request the streaming model surface.
chat_template_kwargs = Field(default_factory=dict)
class-attribute
instance-attribute
¶
vLLM chat template kwargs passed through to chat completion requests.
chat_completions_url
property
¶
Return the normalized chat completions URL.
AbstractVllmError
¶
VllmModelRefusalError
¶
VllmResponseError
¶
VllmStreamingDisabledError
¶
VllmStreamingNotImplementedError
¶
VllmTimeoutError
¶
VllmTransportError
¶
VllmAgentModel(config, client)
¶
Bases: IAgentModel
Spakky Agent model adapter for a local OpenAI-compatible vLLM server.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
complete(request)
async
¶
Return a provider-neutral model response from vLLM chat completions.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
stream(request)
¶
Return provider-neutral stream events from vLLM chat completions.
추가 모듈¶
Configuration for the spakky-vllm plugin.
VllmConfig()
¶
Bases: BaseSettings
Settings for the OpenAI-compatible vLLM model endpoint.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/config.py
endpoint_url = DEFAULT_VLLM_ENDPOINT_URL
class-attribute
instance-attribute
¶
Base URL for the vLLM OpenAI-compatible API, without a trailing path.
model = DEFAULT_VLLM_MODEL
class-attribute
instance-attribute
¶
Model identifier passed to the vLLM chat completions endpoint.
request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS
class-attribute
instance-attribute
¶
Timeout for non-streaming chat completion requests.
stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS
class-attribute
instance-attribute
¶
Timeout budget reserved for streaming requests.
stream_enabled = True
class-attribute
instance-attribute
¶
Whether callers may request the streaming model surface.
chat_template_kwargs = Field(default_factory=dict)
class-attribute
instance-attribute
¶
vLLM chat template kwargs passed through to chat completion requests.
chat_completions_url
property
¶
Return the normalized chat completions URL.
HTTP client boundary for the vLLM OpenAI-compatible endpoint.
IVllmChatClient
¶
HttpxVllmChatClient
¶
Bases: IVllmChatClient
httpx-backed client for vLLM's OpenAI-compatible API.
complete(payload, config)
async
¶
Send a chat completion request and return the JSON object response.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
stream(payload, config)
async
¶
Stream server-sent event chunks from chat completions.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
IAgentModel implementation backed by vLLM.
VllmAgentModel(config, client)
¶
Bases: IAgentModel
Spakky Agent model adapter for a local OpenAI-compatible vLLM server.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
complete(request)
async
¶
Return a provider-neutral model response from vLLM chat completions.
Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
stream(request)
¶
Return provider-neutral stream events from vLLM chat completions.
Plugin initialization for the vLLM model adapter.
initialize(app)
¶
Register vLLM configuration, HTTP client, and IAgentModel adapter.