콘텐츠로 이동

spakky-vllm

spakky-vllm은 ADR-0009 Agent workflow를 위한 로컬 OpenAI-compatible IAgentModel 구현체입니다. 이 패키지는 의도적으로 outbound model adapter 역할만 담당합니다. Agent core 계약은 spakky-agent에 남기고, vLLM HTTP 설정, completion mapping, streaming event, tool-call argument 검증은 이 플러그인이 소유합니다.

검증 전략

spakky-vllm 테스트는 실제 vLLM 서버나 로컬 모델을 호출하지 않습니다. CI와 로컬 커밋 시간을 예측 가능하게 유지하기 위해 IVllmChatClient fake로 request mapping, streaming event 변환, structured output, required tool calling, error mapping을 검증합니다.

spakky.plugins.vllm

vLLM model adapter plugin for Spakky Agent.

PLUGIN_NAME = Plugin(name='spakky-vllm') module-attribute

Plugin identifier for the vLLM adapter package.

HttpxVllmChatClient

Bases: IVllmChatClient

httpx-backed client for vLLM's OpenAI-compatible API.

complete(payload, config) async

Send a chat completion request and return the JSON object response.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
@override
async def complete(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> JsonResponseObject:
    """Send a chat completion request and return the JSON object response."""
    try:
        async with httpx.AsyncClient(
            timeout=config.request_timeout_seconds,
        ) as client:
            response = await client.post(
                config.chat_completions_url,
                json=dict(payload),
            )
        response.raise_for_status()
        decoded: object = response.json()
    except httpx.TimeoutException as e:
        raise VllmTimeoutError from e
    except httpx.HTTPError as e:
        raise VllmTransportError from e
    except JSONDecodeError as e:
        raise VllmResponseError from e

    if not isinstance(decoded, Mapping):
        raise VllmResponseError
    return decoded

stream(payload, config) async

Stream server-sent event chunks from chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
@override
async def stream(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> AsyncGenerator[JsonResponseObject, None]:
    """Stream server-sent event chunks from chat completions."""
    try:
        async with httpx.AsyncClient(
            timeout=config.stream_timeout_seconds,
        ) as client:
            async with client.stream(
                "POST",
                config.chat_completions_url,
                json=dict(payload),
            ) as response:
                response.raise_for_status()
                async for line in response.aiter_lines():
                    chunk = self._decode_sse_line(line)
                    if chunk is not None:
                        yield chunk
    except httpx.TimeoutException as e:
        raise VllmTimeoutError from e
    except httpx.HTTPError as e:
        raise VllmTransportError from e
    except JSONDecodeError as e:
        raise VllmResponseError from e

IVllmChatClient

Bases: ABC

Boundary used by the model adapter to call vLLM.

complete(payload, config) abstractmethod async

Send a non-streaming chat completion request.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
@abstractmethod
async def complete(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> JsonResponseObject:
    """Send a non-streaming chat completion request."""
    ...

stream(payload, config) abstractmethod

Stream OpenAI-compatible chat completion chunks.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
@abstractmethod
def stream(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> AsyncGenerator[JsonResponseObject, None]:
    """Stream OpenAI-compatible chat completion chunks."""
    ...

VllmConfig()

Bases: BaseSettings

Settings for the OpenAI-compatible vLLM model endpoint.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/config.py
def __init__(self) -> None:
    super().__init__()

endpoint_url = DEFAULT_VLLM_ENDPOINT_URL class-attribute instance-attribute

Base URL for the vLLM OpenAI-compatible API, without a trailing path.

model = DEFAULT_VLLM_MODEL class-attribute instance-attribute

Model identifier passed to the vLLM chat completions endpoint.

request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS class-attribute instance-attribute

Timeout for non-streaming chat completion requests.

stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS class-attribute instance-attribute

Timeout budget reserved for streaming requests.

stream_enabled = True class-attribute instance-attribute

Whether callers may request the streaming model surface.

chat_template_kwargs = Field(default_factory=dict) class-attribute instance-attribute

vLLM chat template kwargs passed through to chat completion requests.

chat_completions_url property

Return the normalized chat completions URL.

AbstractVllmError

Bases: AbstractSpakkyFrameworkError, ABC

Base class for vLLM adapter errors.

VllmModelRefusalError

Bases: AbstractVllmError

Raised when the model refuses to produce a normal completion.

VllmResponseError

Bases: AbstractVllmError

Raised when a vLLM response cannot be mapped to Spakky model contracts.

VllmStreamingDisabledError

Bases: AbstractVllmError

Raised when streaming is disabled by plugin configuration.

VllmStreamingNotImplementedError

Bases: AbstractVllmError

Backward-compatible alias for pre-streaming adapter failures.

VllmTimeoutError

Bases: AbstractVllmError

Raised when the OpenAI-compatible vLLM endpoint times out.

VllmTransportError

Bases: AbstractVllmError

Raised when the OpenAI-compatible vLLM endpoint cannot be reached.

VllmAgentModel(config, client)

Bases: IAgentModel

Spakky Agent model adapter for a local OpenAI-compatible vLLM server.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
def __init__(self, config: VllmConfig, client: IVllmChatClient) -> None:
    self.__config = config
    self.__client = client

complete(request) async

Return a provider-neutral model response from vLLM chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
@override
async def complete(self, request: ModelRequest) -> ModelResponse:
    """Return a provider-neutral model response from vLLM chat completions."""
    payload = self._to_chat_completion_payload(request, stream=False)
    tool_schema_by_name = self._tool_constraints_by_name(request.tool_calling)
    response = await self.__client.complete(payload, self.__config)
    return self._to_model_response(response, request, tool_schema_by_name)

stream(request)

Return provider-neutral stream events from vLLM chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
@override
def stream(self, request: ModelRequest) -> AsyncGenerator[ModelStreamEvent, None]:
    """Return provider-neutral stream events from vLLM chat completions."""
    return self._stream(request)

추가 모듈

Configuration for the spakky-vllm plugin.

VllmConfig()

Bases: BaseSettings

Settings for the OpenAI-compatible vLLM model endpoint.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/config.py
def __init__(self) -> None:
    super().__init__()

endpoint_url = DEFAULT_VLLM_ENDPOINT_URL class-attribute instance-attribute

Base URL for the vLLM OpenAI-compatible API, without a trailing path.

model = DEFAULT_VLLM_MODEL class-attribute instance-attribute

Model identifier passed to the vLLM chat completions endpoint.

request_timeout_seconds = DEFAULT_VLLM_REQUEST_TIMEOUT_SECONDS class-attribute instance-attribute

Timeout for non-streaming chat completion requests.

stream_timeout_seconds = DEFAULT_VLLM_STREAM_TIMEOUT_SECONDS class-attribute instance-attribute

Timeout budget reserved for streaming requests.

stream_enabled = True class-attribute instance-attribute

Whether callers may request the streaming model surface.

chat_template_kwargs = Field(default_factory=dict) class-attribute instance-attribute

vLLM chat template kwargs passed through to chat completion requests.

chat_completions_url property

Return the normalized chat completions URL.

HTTP client boundary for the vLLM OpenAI-compatible endpoint.

IVllmChatClient

Bases: ABC

Boundary used by the model adapter to call vLLM.

complete(payload, config) abstractmethod async

Send a non-streaming chat completion request.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
@abstractmethod
async def complete(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> JsonResponseObject:
    """Send a non-streaming chat completion request."""
    ...

stream(payload, config) abstractmethod

Stream OpenAI-compatible chat completion chunks.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
@abstractmethod
def stream(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> AsyncGenerator[JsonResponseObject, None]:
    """Stream OpenAI-compatible chat completion chunks."""
    ...

HttpxVllmChatClient

Bases: IVllmChatClient

httpx-backed client for vLLM's OpenAI-compatible API.

complete(payload, config) async

Send a chat completion request and return the JSON object response.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
@override
async def complete(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> JsonResponseObject:
    """Send a chat completion request and return the JSON object response."""
    try:
        async with httpx.AsyncClient(
            timeout=config.request_timeout_seconds,
        ) as client:
            response = await client.post(
                config.chat_completions_url,
                json=dict(payload),
            )
        response.raise_for_status()
        decoded: object = response.json()
    except httpx.TimeoutException as e:
        raise VllmTimeoutError from e
    except httpx.HTTPError as e:
        raise VllmTransportError from e
    except JSONDecodeError as e:
        raise VllmResponseError from e

    if not isinstance(decoded, Mapping):
        raise VllmResponseError
    return decoded

stream(payload, config) async

Stream server-sent event chunks from chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/client.py
@override
async def stream(
    self,
    payload: Mapping[str, object],
    config: VllmConfig,
) -> AsyncGenerator[JsonResponseObject, None]:
    """Stream server-sent event chunks from chat completions."""
    try:
        async with httpx.AsyncClient(
            timeout=config.stream_timeout_seconds,
        ) as client:
            async with client.stream(
                "POST",
                config.chat_completions_url,
                json=dict(payload),
            ) as response:
                response.raise_for_status()
                async for line in response.aiter_lines():
                    chunk = self._decode_sse_line(line)
                    if chunk is not None:
                        yield chunk
    except httpx.TimeoutException as e:
        raise VllmTimeoutError from e
    except httpx.HTTPError as e:
        raise VllmTransportError from e
    except JSONDecodeError as e:
        raise VllmResponseError from e

IAgentModel implementation backed by vLLM.

VllmAgentModel(config, client)

Bases: IAgentModel

Spakky Agent model adapter for a local OpenAI-compatible vLLM server.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
def __init__(self, config: VllmConfig, client: IVllmChatClient) -> None:
    self.__config = config
    self.__client = client

complete(request) async

Return a provider-neutral model response from vLLM chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
@override
async def complete(self, request: ModelRequest) -> ModelResponse:
    """Return a provider-neutral model response from vLLM chat completions."""
    payload = self._to_chat_completion_payload(request, stream=False)
    tool_schema_by_name = self._tool_constraints_by_name(request.tool_calling)
    response = await self.__client.complete(payload, self.__config)
    return self._to_model_response(response, request, tool_schema_by_name)

stream(request)

Return provider-neutral stream events from vLLM chat completions.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/model.py
@override
def stream(self, request: ModelRequest) -> AsyncGenerator[ModelStreamEvent, None]:
    """Return provider-neutral stream events from vLLM chat completions."""
    return self._stream(request)

Error classes for the spakky-vllm plugin.

AbstractVllmError

Bases: AbstractSpakkyFrameworkError, ABC

Base class for vLLM adapter errors.

VllmTransportError

Bases: AbstractVllmError

Raised when the OpenAI-compatible vLLM endpoint cannot be reached.

VllmTimeoutError

Bases: AbstractVllmError

Raised when the OpenAI-compatible vLLM endpoint times out.

VllmResponseError

Bases: AbstractVllmError

Raised when a vLLM response cannot be mapped to Spakky model contracts.

VllmConstrainedDecodingUnsupportedError

Bases: AbstractVllmError

Raised when requested tool constraints are not enforced by vLLM.

VllmStreamingDisabledError

Bases: AbstractVllmError

Raised when streaming is disabled by plugin configuration.

VllmModelRefusalError

Bases: AbstractVllmError

Raised when the model refuses to produce a normal completion.

VllmStreamingNotImplementedError

Bases: AbstractVllmError

Backward-compatible alias for pre-streaming adapter failures.

Constants for the spakky-vllm plugin.

Plugin initialization for the vLLM model adapter.

initialize(app)

Register vLLM configuration, HTTP client, and IAgentModel adapter.

Source code in plugins/spakky-vllm/src/spakky/plugins/vllm/main.py
def initialize(app: SpakkyApplication) -> None:
    """Register vLLM configuration, HTTP client, and IAgentModel adapter."""
    app.add(VllmConfig)
    app.add(HttpxVllmChatClient)
    app.add(VllmAgentModel)
    app.container.bind_to_type(IAgentModel, VllmAgentModel)