No-string POST-based stream fetching of LLM output in JavaScript via a REST API hosted by FastAPI
// This is a custom implementation of POST-based stream fetching of a streamed response from FastAPI.
// https://stackoverflow.com/questions/78826168/how-to-stream-llm-response-from-fastapi-to-react
async function* streamIterator(apiUrl, requestBody, extraHeaders) {
    const response = await fetch(apiUrl, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', ...(extraHeaders || {}) },
        body: JSON.stringify(requestBody),
    });
    if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
    }
    // NOTE: Below you can use a parser API for handling the SSE frames:
    // https://www.npmjs.com/package/eventsource-parser
    // Or see the original repo: https://github.com/msimoni18/so-stream-llm-response-fastapi-react
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        // `stream: true` keeps multi-byte characters intact across chunk boundaries.
        yield decoder.decode(value, { stream: true });
    }
}

async function handleStream() {
    const apiUrl = 'http://localhost:8000/stream-with-post';
    const requestBody = {
        question: "What's the color of the sky?"
    };
    for await (const chunk of streamIterator(apiUrl, requestBody, {})) {
        // Each chunk is a raw SSE frame such as "event: sentence\ndata: \"This\"\n\n",
        // so extract the `data:` payload before JSON-parsing it. This assumes one
        // complete event per chunk; see the frame-aware sketch below for the general case.
        for (const line of chunk.split('\n')) {
            if (line.startsWith('data: ')) {
                console.log(JSON.parse(line.slice('data: '.length)));
            }
        }
    }
}

handleStream();
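
The raw chunks yielded by streamIterator are not guaranteed to align with SSE event boundaries: a single read() can return half a frame or several frames at once. The sketch below is an addition for illustration, not part of the original gist (the sseEvents helper name is made up); it buffers the incoming text and splits on the blank line that terminates each SSE event:

// Minimal frame-aware parsing sketch (an assumption, not from the original gist):
// buffer chunks and split on the "\n\n" separator that ends each SSE event.
async function* sseEvents(apiUrl, requestBody, extraHeaders) {
    let buffer = '';
    for await (const chunk of streamIterator(apiUrl, requestBody, extraHeaders)) {
        buffer += chunk;
        let boundary;
        while ((boundary = buffer.indexOf('\n\n')) !== -1) {
            const frame = buffer.slice(0, boundary);
            buffer = buffer.slice(boundary + 2);
            // A frame may carry several "data:" lines; per the SSE spec they
            // are joined with "\n" before the payload is decoded.
            const data = frame.split('\n')
                .filter((line) => line.startsWith('data: '))
                .map((line) => line.slice('data: '.length))
                .join('\n');
            if (data) yield JSON.parse(data);
        }
    }
}

With it, the loop in handleStream reduces to: for await (const data of sseEvents(apiUrl, requestBody, {})) { console.log(data); }. The eventsource-parser package linked in the comments implements the same buffering (plus event/id bookkeeping) and is the more robust choice for production code.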
# This is a server implementation using the FastAPI framework and pydantic.
# Requires installation of orjson (pip install orjson) for serializing the streamed chunks.
import orjson

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class Question(BaseModel):
    question: str


async def stream_answer(question):
    # YOUR CODE WITH LLM OR OTHER STREAMING SERVICE GOES HERE.
    CATEGORY = "sentence"
    CHUNKS = ["This", "is", "a", "streamed", "response", "that", "would", "be", "received"]
    for chunk in CHUNKS:
        # TEMPLATE OF THE OUTPUT CHUNK: the payload is JSON-serialized so that
        # `\n` and other special characters survive the SSE framing.
        # https://medium.com/@thiagosalvatore/the-line-break-problem-when-using-server-sent-events-sse-1159632d09a0
        yield f"event: {CATEGORY}\ndata: {orjson.dumps(chunk).decode()}\n\n"


@app.post('/stream-with-post')
async def stream_response_from_llm_post(question: Question):
    return StreamingResponse(stream_answer(question=question.question), media_type='text/event-stream')
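
To try the endpoint end to end, save the server file (as, say, server.py; the file name is an assumption, not part of the gist) and start it with uvicorn server:app --port 8000, then run the handleStream() snippet above in a browser console or a Node script pointed at the same host.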
To enable CORS you may register the middleware right after creating the app (the import shown is FastAPI's standard one):

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=['*'],
    allow_credentials=False,
    allow_methods=['*'],
    allow_headers=['*']
)

Keeping allow_credentials=False matters here: the CORS specification does not permit credentials together with the wildcard '*' origin.

Credits to the repository mentioned in the related StackOverflow thread:
https://github.com/msimoni18/so-stream-llm-response-fastapi-react
