iridescent committed · 7dce215
Parent(s): ccb19d3

🦄 refactor: restructure the Dockerfile and app.py, optimize the multi-stage build, harden model loading and the API logic, and update the README to document the new features
Files changed:
- Dockerfile +44 -19
- README.md +104 -6
- app.py +120 -58
- requirements.txt +4 -1
Dockerfile
CHANGED
@@ -1,30 +1,55 @@
 # Set the working directory
-WORKDIR /
-RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
 # Copy the application code
-COPY ./app.py /
 # Expose the container port
-EXPOSE
 # Command to start the application
 # Run the app object from app.py with uvicorn
 # --host 0.0.0.0 makes it accessible from the outside
-# --port
-CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "
+# --- Stage 1: build environment ---
+# Use an image that contains build tools as the "builder"
+FROM python:3.12-slim as builder
+
+# Set an environment variable to avoid interactive frontend prompts
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install the build tools and dependencies required by llama-cpp-python
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    cmake \
+    pkg-config \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install the Python packages into a standalone location so they can be copied later
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV PIP_NO_CACHE_DIR=off
+ENV PIP_DISABLE_PIP_VERSION_CHECK=on
+ENV PIP_DEFAULT_TIMEOUT=100
+ENV POETRY_VIRTUALENVS_CREATE=false
+ENV PATH="/app/bin:$PATH"
+
+WORKDIR /app
+COPY ./requirements.txt /app/requirements.txt
+RUN pip install --no-cache-dir -r /app/requirements.txt
+
+
+# --- Stage 2: final runtime environment ---
+# Use a clean, lightweight image as the final runtime
+FROM python:3.12-slim as final
+
+# Set the Hugging Face cache directory
+ENV HF_HOME=/data
 # Set the working directory
+WORKDIR /app

+# Copy the installed Python dependencies from the builder stage
+COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
+COPY --from=builder /usr/local/bin /usr/local/bin

+# Create the data directory used for model caching and make it writable
+RUN mkdir /data && chmod 777 /data

 # Copy the application code
+COPY ./app.py /app/app.py

 # Expose the container port
+EXPOSE 8080

 # Command to start the application
 # Run the app object from app.py with uvicorn
 # --host 0.0.0.0 makes it accessible from the outside
+# --port 8080 listens on the specified port
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
README.md
CHANGED
@@ -1,7 +1,105 @@
+# Sparkle-Server: a high-performance GGUF LLM API service
+
+This is a high-performance large language model (LLM) inference service built on `FastAPI` and `llama-cpp-python`. It is tuned to make deploying local GGUF-format models as simple as possible, and it exposes an OpenAI-compatible API.
+
+## ✨ Features
+
+- **Fast inference**: powered by `llama.cpp` under the hood, so text generation is fast even on CPUs.
+- **OpenAI compatible**: exposes a `/v1/chat/completions` endpoint that plugs straight into existing OpenAI-ecosystem tooling.
+- **Streaming responses**: supports streaming (Server-Sent Events) output for a much better interactive client experience.
+- **Flexible configuration**: every key parameter (model ID, file name, context length, and so on) can be set through environment variables or a `.env` file.
+- **Lightweight deployment**: a multi-stage Docker build keeps the final image small, secure, and easy to deploy.
+- **Dynamic model loading**: the configured GGUF model is downloaded from the Hugging Face Hub automatically when the service starts.
+
+## 🚀 Quick start
+
+### 1. Prerequisites
+
+- Install [Docker](https://www.docker.com/products/docker-desktop/).
+- Clone this repository.
+
+### 2. Configure the model (optional)
+
+You can create a `.env` file to choose which model to run. If the file does not exist, the default Qwen3-8B model is used.
+
+Create a file named `.env` with the following content:
+
+```env
+# Model repository ID on Hugging Face
+MODEL_ID="Qwen/Qwen3-14B-GGUF"
+
+# GGUF model file to download (make sure it exists in the repository above)
+FILENAME="Qwen3-14B-Q5_K_M.gguf"
+
+# Context window size of the model
+N_CTX=4096
+
+# Number of layers to offload to the GPU (0 = CPU only, -1 = as many as possible)
+N_GPU_LAYERS=0
+```

+### 3. Build and run the Docker container
+
+From the project root, run:
+
+```bash
+docker build -t sparkle-server .
+docker run -it -p 8080:8080 --rm --name sparkle-server sparkle-server
+```
+
+Once the service starts, the model file is downloaded from the Hugging Face Hub and loaded automatically; you will see the loading logs in the terminal.
+
+## 🤖 API usage examples
+
+Once the service is running, open `http://localhost:8080/docs` for the interactive API documentation.
+
+Example calls with `curl`:
+
+### Example 1: standard JSON response
+
+Send a request and wait for the model to generate the complete reply.
+
+```bash
+curl http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      {
+        "role": "system",
+        "content": "You are a helpful AI assistant."
+      },
+      {
+        "role": "user",
+        "content": "Hi! Tell me a joke about the universe."
+      }
+    ],
+    "max_tokens": 128,
+    "temperature": 0.7,
+    "stream": false
+  }'
+```
+
+### Example 2: streaming response
+
+Send a request and the server streams the generated tokens back in real time.
+
+```bash
+curl http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Accept: text/event-stream" \
+  -d '{
+    "messages": [
+      {
+        "role": "user",
+        "content": "Please write a five-character quatrain about autumn."
+      }
+    ],
+    "max_tokens": 100,
+    "stream": true
+  }'
+```
+
+You will see a Server-Sent Events (SSE) stream whose lines start with `data:`.
+
+---
+*Powered by Sparkle-Server*
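For illustration, a minimal Python client sketch against the same endpoint; it assumes the official `openai` package (v1+) is installed and the container above is listening on `localhost:8080`, and the model name and API key below are placeholders the server does not validate:

```python
# Minimal sketch: call the OpenAI-compatible endpoint with the official openai client.
# Assumptions: `pip install openai` (v1+), sparkle-server running at http://localhost:8080.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # key is a placeholder

resp = client.chat.completions.create(
    model="local-gguf",  # placeholder name; the server serves its own configured GGUF model
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Hi! Tell me a joke about the universe."},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(resp.choices[0].message.content)
```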
app.py
CHANGED
@@ -1,69 +1,131 @@
 from llama_cpp import Llama
 from huggingface_hub import hf_hub_download
-from
-from pydantic import BaseModel
-
-# --- Model-loading logic ---
-
-# 1. Configure the model and tokenizer
-# Use the strongest GGUF-optimized 8B model in the Qwen3 series
-MODEL_ID = "unsloth/Qwen3-8B-GGUF"
-# We pick an 8-bit quantization that strikes a good balance between performance and quality
-FILENAME = "Qwen3-8B-Q8_0.gguf"
-
-print(f"Downloading model from the Hub: {MODEL_ID}/{FILENAME}...")
-
-# 2. Download the GGUF model file from the Hub
-model_path = hf_hub_download(repo_id=MODEL_ID, filename=FILENAME)
-
-print("Model downloaded. Loading it into memory...")
-
-# 3. Load the GGUF model with llama-cpp-python
-# n_ctx is the context window size; n_gpu_layers=0 means CPU only
-model = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=0, verbose=True)
-
-print("Model loaded.")
-
-# --- API service logic ---
-
-# 4. Create the FastAPI application instance
-app = FastAPI()
-
-# 5. Define the request-body data model
-class GenerationRequest(BaseModel):
-    prompt: str
-    max_tokens: int = 128

 @app.get("/")
 def read_root():
-    return {"message": "
+import json
+import asyncio
+from typing import List, Optional, Dict, Any, Generator, AsyncGenerator
+from fastapi import FastAPI, HTTPException
+from pydantic import BaseModel, Field
+from pydantic_settings import BaseSettings, SettingsConfigDict
 from llama_cpp import Llama
 from huggingface_hub import hf_hub_download
+from sse_starlette.sse import EventSourceResponse


+# --- 1. Configuration management ---
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(
+        env_file=".env", env_file_encoding="utf-8", extra="ignore"
+    )

+    MODEL_ID: str = Field(
+        "unsloth/Qwen3-8B-GGUF", description="Model repository ID on Hugging Face"
+    )
+    FILENAME: str = Field("Qwen3-8B-Q8_0.gguf", description="GGUF model file to download")
+    N_CTX: int = Field(4096, description="Context window size of the model")
+    N_GPU_LAYERS: int = Field(0, description="Number of layers to offload to the GPU (0 = CPU only)")
+    N_THREADS: Optional[int] = Field(
+        None, description="Number of CPU threads used for inference (None = auto)"
+    )
+    VERBOSE: bool = Field(True, description="Enable verbose llama.cpp logging")
+
+
+settings = Settings()
+
+
+# --- 2. Model loading ---
+def load_model():
+    """Download the GGUF model from the Hugging Face Hub and load it."""
+    print(f"Downloading model from the Hub: {settings.MODEL_ID}/{settings.FILENAME}...")
+    try:
+        model_path = hf_hub_download(
+            repo_id=settings.MODEL_ID, filename=settings.FILENAME
+        )
+    except Exception as e:
+        print(f"Model download failed: {e}")
+        raise RuntimeError(f"Could not download the model from the Hugging Face Hub: {e}")
+
+    print("Model downloaded. Loading it into memory...")
+    try:
+        model = Llama(
+            model_path=model_path,
+            n_ctx=settings.N_CTX,
+            n_gpu_layers=settings.N_GPU_LAYERS,
+            n_threads=settings.N_THREADS,
+            verbose=settings.VERBOSE,
+        )
+        print("Model loaded.")
+        return model
+    except Exception as e:
+        print(f"Model loading failed: {e}")
+        raise RuntimeError(f"Could not load the Llama model: {e}")
+
+
+model = load_model()
+
+# --- 3. API service logic ---
+app = FastAPI(
+    title="Sparkle-Server - GGUF LLM API",
+    description="An OpenAI-compatible, high-performance LLM inference service built on llama-cpp-python and FastAPI.",
+    version="1.0.0",
+)
+
+
+# --- 4. API data models (OpenAI compatible) ---
+class ChatMessage(BaseModel):
+    role: str
+    content: str
+
+
+class ChatCompletionRequest(BaseModel):
+    messages: List[ChatMessage]
+    model: str = settings.MODEL_ID
+    max_tokens: int = 1024
+    temperature: float = 0.7
+    stream: bool = False
+
+
+# --- 5. Streaming response generator ---
+async def stream_generator(
+    chat_iterator: Generator[Dict[str, Any], Any, None],
+) -> AsyncGenerator[str, None]:
+    """Convert the llama-cpp-python output stream into Server-Sent Events (SSE) format."""
+    for chunk in chat_iterator:
+        if "content" in chunk["choices"][0]["delta"]:
+            yield f"data: {json.dumps(chunk)}\n\n"
+            await asyncio.sleep(0)  # let the event loop handle other tasks
+
+
+# --- 6. API endpoints (OpenAI compatible) ---
+@app.post("/v1/chat/completions")
+async def create_chat_completion(request: ChatCompletionRequest):
     """
+    Handle chat-completion requests, supporting both streaming and non-streaming responses.
     """
+    if not request.messages:
+        raise HTTPException(status_code=400, detail="The messages list must not be empty")
+
+    try:
+        if request.stream:
+            # Streaming response
+            chat_iterator = model.create_chat_completion(
+                messages=request.dict()["messages"],
+                max_tokens=request.max_tokens,
+                temperature=request.temperature,
+                stream=True,
+            )
+            return EventSourceResponse(stream_generator(chat_iterator))
+        else:
+            # Non-streaming response
+            result = model.create_chat_completion(
+                messages=request.dict()["messages"],
+                max_tokens=request.max_tokens,
+                temperature=request.temperature,
+                stream=False,
+            )
+            return result
+    except Exception as e:
+        print(f"Error while handling the request: {e}")
+        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")


 @app.get("/")
 def read_root():
+    return {"message": "Sparkle-Server (GGUF edition) is running. Visit /docs for the API documentation."}
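The streaming branch above returns Server-Sent Events; as an illustrative sketch (assuming the `requests` package is installed and the server is running on `localhost:8080`), a client can read the stream line by line and print the raw `data:` events without parsing them:

```python
# Minimal streaming-client sketch (assumptions: `pip install requests`,
# sparkle-server from this commit running at http://localhost:8080).
import requests

payload = {
    "messages": [{"role": "user", "content": "Please write a short poem about autumn."}],
    "max_tokens": 100,
    "stream": True,
}

with requests.post(
    "http://localhost:8080/v1/chat/completions",
    json=payload,
    headers={"Accept": "text/event-stream"},
    stream=True,
) as resp:
    resp.raise_for_status()
    # Each non-empty line is one SSE field (e.g. a `data:` line); print it as it arrives.
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)
```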
requirements.txt
CHANGED
@@ -2,4 +2,7 @@ torch==2.7.1
 llama-cpp-python==0.3.9
 huggingface-hub==0.33.0
 fastapi==0.115.13
-uvicorn[standard]==0.34.3
+uvicorn[standard]==0.34.3
+pydantic-settings==2.10.1
+python-dotenv==1.1.1
+sse-starlette==2.3.6
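The newly pinned `pydantic-settings` dependency is what drives the `.env`-based configuration in `app.py`. A standalone sketch of its precedence rules (environment variable over `.env` over field default); `DemoSettings` and the values below are hypothetical:

```python
# Standalone sketch of the pydantic-settings precedence used by app.py's Settings class.
# DemoSettings is a hypothetical example; requires `pip install pydantic-settings`.
import os
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class DemoSettings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8", extra="ignore")

    MODEL_ID: str = Field("unsloth/Qwen3-8B-GGUF")  # default, used when nothing else sets it
    N_CTX: int = Field(4096)

# An environment variable overrides both the .env file and the field default.
os.environ["N_CTX"] = "8192"
print(DemoSettings().N_CTX)  # -> 8192
```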