Build Ask AI with Sourcey and LangChain
If you want an Ask AI feature over your docs, this is the thin layer.
Sourcey already ships the files a retriever needs:
- search-index.json
- llms-full.txt
- stable page URLs
That means you do not need a hosted index just to make your docs usable from LangChain. Publish the docs site. Point the retriever at it. Done.
Install
Python:
pip install -U langchain-sourcey
JavaScript:
npm install langchain-sourcey @langchain/core
What it reads
Both langchain-sourcey packages work against a published Sourcey docs root.
They use:
- search-index.json to find candidate pages
- llms-full.txt to hydrate full page content
- the page URL as the LangChain citation source
If llms-full.txt is missing, the retriever falls back to the matched page's HTML.
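The three-step flow above can be sketched in plain Python. This is a hypothetical illustration, not the package's implementation. The index entry fields (title, url, content), the full_text mapping of page URL to text (standing in for a parsed llms-full.txt), the fetch_html callback, the naive term-count ranking, and the example.com URLs are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Page:
    # Hypothetical stand-in for a LangChain Document: text plus metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


def score(query: str, entry: dict) -> int:
    # Count occurrences of each query term in the entry's title and text.
    haystack = f"{entry.get('title', '')} {entry.get('content', '')}".lower()
    return sum(haystack.count(term) for term in query.lower().split())


def retrieve(
    query: str,
    index: list,
    full_text: Optional[dict],
    fetch_html: Callable[[str], str],
    top_k: int = 3,
) -> list:
    # 1. Rank index entries by score and keep the top_k that match at all.
    scored = [(score(query, entry), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [entry for s, entry in scored if s > 0][:top_k]

    pages = []
    for entry in candidates:
        url = entry["url"]
        # 2. Hydrate from llms-full.txt content when present, else fetch page HTML.
        if full_text and url in full_text:
            content = full_text[url]
        else:
            content = fetch_html(url)
        # 3. Keep the page URL as the citation source.
        pages.append(Page(content, {"title": entry.get("title", ""), "source": url}))
    return pages


# Hypothetical sample data in the assumed shapes.
index = [
    {"title": "MCP servers", "url": "https://example.com/docs/mcp", "content": "Document MCP servers"},
    {"title": "Themes", "url": "https://example.com/docs/themes", "content": "Colors and fonts"},
]
full = {"https://example.com/docs/mcp": "Full MCP page text"}

for page in retrieve("mcp servers", index, full, fetch_html=lambda url: "<html>...</html>"):
    print(page.metadata["source"], page.page_content)
# prints: https://example.com/docs/mcp Full MCP page text
```

Keeping the HTML fetch behind a callback means the fallback path stays easy to test offline. A real retriever would make an HTTP request at that point.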
Quickstart
Python:
from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(
    site_url="https://sourcey.com/docs",
    top_k=3,
)

docs = retriever.invoke("mcp integration")

for doc in docs:
    print(doc.metadata["title"])
    print(doc.metadata["source"])
    print(doc.page_content[:160])
    print()

site_url should be the root of a published Sourcey build:
- https://sourcey.com/docs
- https://sourcey.com/cheesestore
- https://cheesestore.github.io
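Because site_url is a root that may include a path such as /docs, it helps to see how file locations derive from it. The sketch below assumes search-index.json and llms-full.txt sit directly under that root. The retrieval_urls helper is hypothetical and not part of langchain-sourcey.

```python
from urllib.parse import urljoin, urlsplit


def retrieval_urls(site_url: str) -> dict:
    # Reject relative or scheme-less roots; the page URLs need to be absolute.
    parts = urlsplit(site_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"site_url must be an absolute http(s) URL: {site_url!r}")
    # Strip trailing slashes, then append the file names to the root path.
    root = site_url.rstrip("/")
    return {
        "search_index": f"{root}/search-index.json",
        "llms_full": f"{root}/llms-full.txt",
    }


print(retrieval_urls("https://sourcey.com/docs")["search_index"])
# prints: https://sourcey.com/docs/search-index.json

# Why not urljoin: without a trailing slash it replaces the last path segment.
print(urljoin("https://sourcey.com/docs", "search-index.json"))
# prints: https://sourcey.com/search-index.json
```

The urljoin comparison explains why the sketch uses plain string joining. A root like https://sourcey.com/docs would otherwise lose its /docs segment.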
JavaScript:
import { SourceyRetriever } from "langchain-sourcey";

const retriever = new SourceyRetriever({
  siteUrl: "https://sourcey.com/docs",
  topK: 3,
});

const docs = await retriever.invoke("mcp integration");

for (const doc of docs) {
  console.log(doc.metadata.title);
  console.log(doc.metadata.source);
  console.log(doc.pageContent.slice(0, 160));
  console.log();
}
Implement Ask AI
Install a chat model package. This example uses OpenAI.
Python:
pip install -U langchain-openai

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(site_url="https://sourcey.com/docs", top_k=3)

prompt = ChatPromptTemplate.from_template(
    """Answer the question using the documentation context below.
{context}
Question: {question}"""
)


def format_docs(docs):
    # Join retrieved page contents into one context string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    RunnablePassthrough.assign(
        # Pull the question out of the input dict, retrieve pages, then format them.
        context=(lambda x: x["question"]) | retriever | format_docs
    )
    | prompt
    | ChatOpenAI(model="gpt-4.1-mini")
    | StrOutputParser()
)

print(chain.invoke({"question": "How does Sourcey document MCP servers?"}))
JavaScript:
npm install @langchain/openai

import type { Document } from "@langchain/core/documents";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI } from "@langchain/openai";
import { SourceyRetriever } from "langchain-sourcey";

const retriever = new SourceyRetriever({
  siteUrl: "https://sourcey.com/docs",
  topK: 3,
});

const prompt = ChatPromptTemplate.fromTemplate(
  `Answer the question using the documentation context below.
{context}
Question: {question}`
);

const formatDocs = (docs: Document[]) =>
  docs.map((doc) => doc.pageContent).join("\n\n");

const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocs),
    question: new RunnablePassthrough(),
  },
  prompt,
  new ChatOpenAI({ model: "gpt-4.1-mini" }),
  new StringOutputParser(),
]);

console.log(await chain.invoke("How does Sourcey document MCP servers?"));
The contract
If you want this to work cleanly on your own docs site, keep these stable:
- publish search-index.json
- publish llms-full.txt
- set siteUrl so page URLs are canonical
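One way to keep that contract from drifting is to check the build output before deploying. This sketch is hypothetical and rests on two assumptions. First, the build writes both files to the root of an output directory. Second, search-index.json is a JSON array of objects, each with an absolute url field. Sourcey's actual index schema may differ, so adjust the field name to match your build.

```python
import json
from pathlib import Path


def check_retrieval_surface(build_dir: str, site_url: str) -> list:
    """Return a list of problems; an empty list means the contract looks satisfied."""
    out = Path(build_dir)
    problems = []

    # Both retrieval files must exist at the root of the build output.
    for name in ("search-index.json", "llms-full.txt"):
        if not (out / name).is_file():
            problems.append(f"missing {name}")

    # Assumed schema: a JSON array of entries, each with an absolute "url".
    index_path = out / "search-index.json"
    if index_path.is_file():
        root = site_url.rstrip("/")
        for entry in json.loads(index_path.read_text(encoding="utf-8")):
            url = entry.get("url", "")
            if url != root and not url.startswith(root + "/"):
                problems.append(f"non-canonical URL: {url!r}")
    return problems
```

In CI you might call check_retrieval_surface("dist", "https://sourcey.com/docs") and fail the job if the returned list is non-empty. The "dist" directory name is a placeholder for wherever your build writes its output.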
That is the whole trick. Sourcey emits the retrieval surface as part of the normal docs build, so the LangChain integration stays thin.
Package
- PyPI: langchain-sourcey
- npm: langchain-sourcey
- GitHub: sourcey/langchain-sourcey
