spakky-vllm¶

spakky-vllm은 ADR-0009 Agent workflow를 위한 로컬 OpenAI-compatible IAgentModel 구현체입니다. 이 패키지는 의도적으로 outbound model adapter 역할만 담당합니다. Agent core 계약은 spakky-agent에 남기고, vLLM HTTP 설정, completion mapping, streaming event, tool-call argument 검증은 이 플러그인이 소유합니다.

검증 전략¶

spakky-vllm 테스트는 실제 vLLM 서버나 로컬 모델을 호출하지 않습니다. CI와 로컬 커밋 시간을 예측 가능하게 유지하기 위해 IVllmChatClient fake로 request mapping, streaming event 변환, structured output, required tool calling, error mapping을 검증합니다.

`spakky.plugins.vllm` ¶

vLLM model adapter plugin for Spakky Agent.

`PLUGIN_NAME = Plugin(name='spakky-vllm')` `module-attribute` ¶

Plugin identifier for the vLLM adapter package.

`HttpxVllmChatClient` ¶

Bases: IVllmChatClient

httpx-backed client for vLLM's OpenAI-compatible API.

`complete(payload, config)` `async` ¶

Send a chat completion request and return the JSON object response.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py

@override
async def complete(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> JsonResponseObject:
    """Send a chat completion request and return the JSON object response."""
    try:
        async with httpx.AsyncClient(
            timeout=config.request_timeout_seconds,
        ) as client:
            response = await client.post(
                config.chat_completions_url,
                json=dict(payload),
            )
        response.raise_for_status()
        decoded: object = response.json()
    except httpx.TimeoutException as e:
        raise VllmTimeoutError from e
    except httpx.HTTPError as e:
        raise VllmTransportError from e
    except JSONDecodeError as e:
        raise VllmResponseError from e

    if not isinstance(decoded, Mapping):
        raise VllmResponseError
    return decoded

`stream(payload, config)` `async` ¶

Stream server-sent event chunks from chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py

@override
async def stream(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> AsyncGenerator[JsonResponseObject, None]:
    """Stream server-sent event chunks from chat completions."""
    try:
        async with httpx.AsyncClient(
            timeout=config.stream_timeout_seconds,
        ) as client:
            async with client.stream(
                "POST",
                config.chat_completions_url,
                json=dict(payload),
            ) as response:
                response.raise_for_status()
                async for line in response.aiter_lines():
                    chunk = self._decode_sse_line(line)
                    if chunk is not None:
                        yield chunk
    except httpx.TimeoutException as e:
        raise VllmTimeoutError from e
    except httpx.HTTPError as e:
        raise VllmTransportError from e
    except JSONDecodeError as e:
        raise VllmResponseError from e

`IVllmChatClient` ¶

Bases: ABC

Boundary used by the model adapter to call vLLM.

`complete(payload, config)` `abstractmethod` `async` ¶

Send a non-streaming chat completion request.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py

@abstractmethod
async def complete(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> JsonResponseObject:
    """Send a non-streaming chat completion request."""
    ...

`stream(payload, config)` `abstractmethod` ¶

Stream OpenAI-compatible chat completion chunks.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py

@abstractmethod
def stream(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> AsyncGenerator[JsonResponseObject, None]:
    """Stream OpenAI-compatible chat completion chunks."""
    ...

`VllmConfig()` ¶

Bases: BaseSettings

Settings for the OpenAI-compatible vLLM model endpoint.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/config.py

def __init__(self) -> None:
    super().__init__()

`endpoint_url = DEFAULT_VLLM_ENDPOINT_URL` `class-attribute` `instance-attribute` ¶

Base URL for the vLLM OpenAI-compatible API, without a trailing path.

`model = DEFAULT_VLLM_MODEL` `class-attribute` `instance-attribute` ¶

Model identifier passed to the vLLM chat completions endpoint.

`request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS` `class-attribute` `instance-attribute` ¶

Timeout for non-streaming chat completion requests.

`stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS` `class-attribute` `instance-attribute` ¶

Timeout budget reserved for streaming requests.

`stream_enabled = True` `class-attribute` `instance-attribute` ¶

Whether callers may request the streaming model surface.

`chat_template_kwargs = Field(default_factory=dict)` `class-attribute` `instance-attribute` ¶

vLLM chat template kwargs passed through to chat completion requests.

`chat_completions_url` `property` ¶

Return the normalized chat completions URL.

`AbstractVllmError` ¶

Bases: AbstractSpakkyFrameworkError, ABC

Base class for vLLM adapter errors.

`VllmModelRefusalError` ¶

Bases: AbstractVllmError

Raised when the model refuses to produce a normal completion.

`VllmResponseError` ¶

Bases: AbstractVllmError

Raised when a vLLM response cannot be mapped to Spakky model contracts.

`VllmStreamingDisabledError` ¶

Bases: AbstractVllmError

Raised when streaming is disabled by plugin configuration.

`VllmStreamingNotImplementedError` ¶

Bases: AbstractVllmError

Backward-compatible alias for pre-streaming adapter failures.

`VllmTimeoutError` ¶

Bases: AbstractVllmError

Raised when the OpenAI-compatible vLLM endpoint times out.

`VllmTransportError` ¶

Bases: AbstractVllmError

Raised when the OpenAI-compatible vLLM endpoint cannot be reached.

`VllmAgentModel(config, client)` ¶

Bases: IAgentModel

Spakky Agent model adapter for a local OpenAI-compatible vLLM server.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py

def __init__(self, config: VllmConfig, client: IVllmChatClient) -> None:
    self.__config = config
    self.__client = client

`complete(request)` `async` ¶

Return a provider-neutral model response from vLLM chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py

@override
async def complete(self, request: ModelRequest) -> ModelResponse:
    """Return a provider-neutral model response from vLLM chat completions."""
    payload = self._to_chat_completion_payload(request, stream=False)
    tool_schema_by_name = self._tool_constraints_by_name(request.tool_calling)
    response = await self.__client.complete(payload, self.__config)
    return self._to_model_response(response, request, tool_schema_by_name)

`stream(request)` ¶

Return provider-neutral stream events from vLLM chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py

@override
def stream(self, request: ModelRequest) -> AsyncGenerator[ModelStreamEvent, None]:
    """Return provider-neutral stream events from vLLM chat completions."""
    return self._stream(request)

추가 모듈¶

Configuration for the spakky-vllm plugin.

`VllmConfig()` ¶

Bases: BaseSettings

Settings for the OpenAI-compatible vLLM model endpoint.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/config.py

def __init__(self) -> None:
    super().__init__()

`endpoint_url = DEFAULT_VLLM_ENDPOINT_URL` `class-attribute` `instance-attribute` ¶

Base URL for the vLLM OpenAI-compatible API, without a trailing path.

`model = DEFAULT_VLLM_MODEL` `class-attribute` `instance-attribute` ¶

Model identifier passed to the vLLM chat completions endpoint.

`request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS` `class-attribute` `instance-attribute` ¶

Timeout for non-streaming chat completion requests.

`stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS` `class-attribute` `instance-attribute` ¶

Timeout budget reserved for streaming requests.

`stream_enabled = True` `class-attribute` `instance-attribute` ¶

Whether callers may request the streaming model surface.

`chat_template_kwargs = Field(default_factory=dict)` `class-attribute` `instance-attribute` ¶

vLLM chat template kwargs passed through to chat completion requests.

`chat_completions_url` `property` ¶

Return the normalized chat completions URL.

HTTP client boundary for the vLLM OpenAI-compatible endpoint.

`IVllmChatClient` ¶

Bases: ABC

Boundary used by the model adapter to call vLLM.

`complete(payload, config)` `abstractmethod` `async` ¶

Send a non-streaming chat completion request.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py

@abstractmethod
async def complete(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> JsonResponseObject:
    """Send a non-streaming chat completion request."""
    ...

`stream(payload, config)` `abstractmethod` ¶

Stream OpenAI-compatible chat completion chunks.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py

@abstractmethod
def stream(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> AsyncGenerator[JsonResponseObject, None]:
    """Stream OpenAI-compatible chat completion chunks."""
    ...

`HttpxVllmChatClient` ¶

Bases: IVllmChatClient

httpx-backed client for vLLM's OpenAI-compatible API.

`complete(payload, config)` `async` ¶

Send a chat completion request and return the JSON object response.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py

@override
async def complete(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> JsonResponseObject:
    """Send a chat completion request and return the JSON object response."""
    try:
        async with httpx.AsyncClient(
            timeout=config.request_timeout_seconds,
        ) as client:
            response = await client.post(
                config.chat_completions_url,
                json=dict(payload),
            )
        response.raise_for_status()
        decoded: object = response.json()
    except httpx.TimeoutException as e:
        raise VllmTimeoutError from e
    except httpx.HTTPError as e:
        raise VllmTransportError from e
    except JSONDecodeError as e:
        raise VllmResponseError from e

    if not isinstance(decoded, Mapping):
        raise VllmResponseError
    return decoded

`stream(payload, config)` `async` ¶

Stream server-sent event chunks from chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py

@override
async def stream(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> AsyncGenerator[JsonResponseObject, None]:
    """Stream server-sent event chunks from chat completions."""
    try:
        async with httpx.AsyncClient(
            timeout=config.stream_timeout_seconds,
        ) as client:
            async with client.stream(
                "POST",
                config.chat_completions_url,
                json=dict(payload),
            ) as response:
                response.raise_for_status()
                async for line in response.aiter_lines():
                    chunk = self._decode_sse_line(line)
                    if chunk is not None:
                        yield chunk
    except httpx.TimeoutException as e:
        raise VllmTimeoutError from e
    except httpx.HTTPError as e:
        raise VllmTransportError from e
    except JSONDecodeError as e:
        raise VllmResponseError from e

IAgentModel implementation backed by vLLM.

`VllmAgentModel(config, client)` ¶

Bases: IAgentModel

Spakky Agent model adapter for a local OpenAI-compatible vLLM server.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py

def __init__(self, config: VllmConfig, client: IVllmChatClient) -> None:
    self.__config = config
    self.__client = client

`complete(request)` `async` ¶

Return a provider-neutral model response from vLLM chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py

@override
async def complete(self, request: ModelRequest) -> ModelResponse:
    """Return a provider-neutral model response from vLLM chat completions."""
    payload = self._to_chat_completion_payload(request, stream=False)
    tool_schema_by_name = self._tool_constraints_by_name(request.tool_calling)
    response = await self.__client.complete(payload, self.__config)
    return self._to_model_response(response, request, tool_schema_by_name)

`stream(request)` ¶

Return provider-neutral stream events from vLLM chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py

@override
def stream(self, request: ModelRequest) -> AsyncGenerator[ModelStreamEvent, None]:
    """Return provider-neutral stream events from vLLM chat completions."""
    return self._stream(request)

Error classes for the spakky-vllm plugin.

`AbstractVllmError` ¶

Bases: AbstractSpakkyFrameworkError, ABC

Base class for vLLM adapter errors.

`VllmTransportError` ¶

Bases: AbstractVllmError

Raised when the OpenAI-compatible vLLM endpoint cannot be reached.

`VllmTimeoutError` ¶

Bases: AbstractVllmError

Raised when the OpenAI-compatible vLLM endpoint times out.

`VllmResponseError` ¶

Bases: AbstractVllmError

Raised when a vLLM response cannot be mapped to Spakky model contracts.

`VllmConstrainedDecodingUnsupportedError` ¶

Bases: AbstractVllmError

Raised when requested tool constraints are not enforced by vLLM.

`VllmStreamingDisabledError` ¶

Bases: AbstractVllmError

Raised when streaming is disabled by plugin configuration.

`VllmModelRefusalError` ¶

Bases: AbstractVllmError

Raised when the model refuses to produce a normal completion.

`VllmStreamingNotImplementedError` ¶

Bases: AbstractVllmError

Backward-compatible alias for pre-streaming adapter failures.

Constants for the spakky-vllm plugin.

Plugin initialization for the vLLM model adapter.

`initialize(app)` ¶

Register vLLM configuration, HTTP client, and IAgentModel adapter.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/main.py

def initialize(app: SpakkyApplication) -> None:
    """Register vLLM configuration, HTTP client, and IAgentModel adapter."""
    app.add(VllmConfig)
    app.add(HttpxVllmChatClient)
    app.add(VllmAgentModel)
    app.container.bind_to_type(IAgentModel, VllmAgentModel)

spakky-vllm¶

검증 전략¶

spakky.plugins.vllm ¶

PLUGIN_NAME = Plugin(name='spakky-vllm') module-attribute ¶

HttpxVllmChatClient ¶

complete(payload, config) async ¶

stream(payload, config) async ¶

IVllmChatClient ¶

complete(payload, config) abstractmethod async ¶

stream(payload, config) abstractmethod ¶

VllmConfig() ¶

endpoint_url = DEFAULT_VLLM_ENDPOINT_URL class-attribute instance-attribute ¶

model = DEFAULT_VLLM_MODEL class-attribute instance-attribute ¶

request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS class-attribute instance-attribute ¶

stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS class-attribute instance-attribute ¶

stream_enabled = True class-attribute instance-attribute ¶

chat_template_kwargs = Field(default_factory=dict) class-attribute instance-attribute ¶

chat_completions_url property ¶

AbstractVllmError ¶

VllmModelRefusalError ¶

VllmResponseError ¶

VllmStreamingDisabledError ¶

VllmStreamingNotImplementedError ¶

VllmTimeoutError ¶

VllmTransportError ¶

VllmAgentModel(config, client) ¶

complete(request) async ¶

stream(request) ¶

추가 모듈¶

VllmConfig() ¶

endpoint_url = DEFAULT_VLLM_ENDPOINT_URL class-attribute instance-attribute ¶

model = DEFAULT_VLLM_MODEL class-attribute instance-attribute ¶

request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS class-attribute instance-attribute ¶

stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS class-attribute instance-attribute ¶

stream_enabled = True class-attribute instance-attribute ¶

chat_template_kwargs = Field(default_factory=dict) class-attribute instance-attribute ¶

chat_completions_url property ¶

IVllmChatClient ¶

complete(payload, config) abstractmethod async ¶

stream(payload, config) abstractmethod ¶

HttpxVllmChatClient ¶

complete(payload, config) async ¶

stream(payload, config) async ¶

VllmAgentModel(config, client) ¶

complete(request) async ¶

stream(request) ¶

AbstractVllmError ¶

VllmTransportError ¶

VllmTimeoutError ¶

VllmResponseError ¶

VllmConstrainedDecodingUnsupportedError ¶

VllmStreamingDisabledError ¶

VllmModelRefusalError ¶

VllmStreamingNotImplementedError ¶

initialize(app) ¶

`spakky.plugins.vllm` ¶

`PLUGIN_NAME = Plugin(name='spakky-vllm')` `module-attribute` ¶

`HttpxVllmChatClient` ¶

`complete(payload, config)` `async` ¶

`stream(payload, config)` `async` ¶

`IVllmChatClient` ¶

`complete(payload, config)` `abstractmethod` `async` ¶

`stream(payload, config)` `abstractmethod` ¶

`VllmConfig()` ¶

`endpoint_url = DEFAULT_VLLM_ENDPOINT_URL` `class-attribute` `instance-attribute` ¶

`model = DEFAULT_VLLM_MODEL` `class-attribute` `instance-attribute` ¶

`request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS` `class-attribute` `instance-attribute` ¶

`stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS` `class-attribute` `instance-attribute` ¶

`stream_enabled = True` `class-attribute` `instance-attribute` ¶

`chat_template_kwargs = Field(default_factory=dict)` `class-attribute` `instance-attribute` ¶

`chat_completions_url` `property` ¶

`AbstractVllmError` ¶

`VllmModelRefusalError` ¶

`VllmResponseError` ¶

`VllmStreamingDisabledError` ¶

`VllmStreamingNotImplementedError` ¶

`VllmTimeoutError` ¶

`VllmTransportError` ¶

`VllmAgentModel(config, client)` ¶

`complete(request)` `async` ¶

`stream(request)` ¶

`VllmConfig()` ¶

`endpoint_url = DEFAULT_VLLM_ENDPOINT_URL` `class-attribute` `instance-attribute` ¶

`model = DEFAULT_VLLM_MODEL` `class-attribute` `instance-attribute` ¶

`request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS` `class-attribute` `instance-attribute` ¶

`stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS` `class-attribute` `instance-attribute` ¶

`stream_enabled = True` `class-attribute` `instance-attribute` ¶

`chat_template_kwargs = Field(default_factory=dict)` `class-attribute` `instance-attribute` ¶

`chat_completions_url` `property` ¶

`IVllmChatClient` ¶

`complete(payload, config)` `abstractmethod` `async` ¶

`stream(payload, config)` `abstractmethod` ¶

`HttpxVllmChatClient` ¶

`complete(payload, config)` `async` ¶

`stream(payload, config)` `async` ¶

`VllmAgentModel(config, client)` ¶

`complete(request)` `async` ¶

`stream(request)` ¶

`AbstractVllmError` ¶

`VllmTransportError` ¶

`VllmTimeoutError` ¶

`VllmResponseError` ¶

`VllmConstrainedDecodingUnsupportedError` ¶

`VllmStreamingDisabledError` ¶

`VllmModelRefusalError` ¶

`VllmStreamingNotImplementedError` ¶

`initialize(app)` ¶