Build Ask AI with Sourcey and LangChain

If you want an Ask AI feature over your docs, langchain-sourcey is the thin layer that connects a published Sourcey site to LangChain.

Sourcey already ships the files a retriever needs:

search-index.json
llms-full.txt
stable page URLs

That means you do not need a hosted index just to make your docs usable from LangChain. Publish the docs site. Point the retriever at it. Done.

Install

Python:

pip install -U langchain-sourcey

JavaScript:

npm install langchain-sourcey @langchain/core

What it reads

Both langchain-sourcey packages work against a published Sourcey docs root.

They use:

search-index.json to find candidate pages
llms-full.txt to hydrate full page content
the page URL as the LangChain citation source

If llms-full.txt is missing, the retriever falls back to the matched page HTML.
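
The lookup order above can be sketched in a few lines. This is only an illustration of the fallback described here, not the package's actual implementation: hydrate_page, full_text_pages, and fetch_html are hypothetical names, and the mapping stands in for whatever the retriever parses out of llms-full.txt.

```python
from typing import Callable, Mapping, Optional


def hydrate_page(
    page_url: str,
    full_text_pages: Optional[Mapping[str, str]],
    fetch_html: Callable[[str], str],
) -> str:
    """Return the content for one page matched via search-index.json.

    full_text_pages is a hypothetical mapping of page URL -> page text,
    standing in for the parsed contents of llms-full.txt. None means the
    file was not published.
    """
    # Preferred path: full page content taken from llms-full.txt.
    if full_text_pages is not None and page_url in full_text_pages:
        return full_text_pages[page_url]
    # Fallback: the matched page's HTML. Falling back when the file exists
    # but lacks this page is an assumption of this sketch.
    return fetch_html(page_url)
```

Passing the fetcher in as a callable keeps the sketch free of network code, so the fallback order can be exercised with plain dictionaries and lambdas.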

Quickstart

Python:

from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(
    site_url="https://sourcey.com/docs",
    top_k=3,
)

docs = retriever.invoke("mcp integration")

for doc in docs:
    print(doc.metadata["title"])
    print(doc.metadata["source"])
    print(doc.page_content[:160])
    print()

site_url (siteUrl in JavaScript) should be the root of a published Sourcey build, for example:

https://sourcey.com/docs
https://sourcey.com/cheesestore
https://cheesestore.github.io

JavaScript:

import { SourceyRetriever } from "langchain-sourcey";

const retriever = new SourceyRetriever({
  siteUrl: "https://sourcey.com/docs",
  topK: 3,
});

const docs = await retriever.invoke("mcp integration");

for (const doc of docs) {
  console.log(doc.metadata.title);
  console.log(doc.metadata.source);
  console.log(doc.pageContent.slice(0, 160));
  console.log();
}

Implement Ask AI

Install a chat model package. This example uses OpenAI.

Python:

pip install -U langchain-openai

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(site_url="https://sourcey.com/docs", top_k=3)

prompt = ChatPromptTemplate.from_template(
    """Answer the question using the documentation context below.

{context}

Question: {question}"""
)

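def format_docs(docs):
    # Join page text so the prompt receives plain text, not Document reprs.
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    RunnablePassthrough.assign(
        # Pull the question out of the input, retrieve pages, then format them.
        context=(lambda x: x["question"]) | retriever | format_docs
    )
    | prompt
    | ChatOpenAI(model="gpt-4.1-mini")
    | StrOutputParser()
)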

print(chain.invoke({"question": "How does Sourcey document MCP servers?"}))
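
For answers that cite pages, the retrieved documents can be formatted together with their source URLs before they reach the prompt. A minimal sketch, assuming each document exposes page_content plus the title and source metadata shown in the quickstart. format_docs_with_sources is a hypothetical helper, not part of langchain-sourcey, and the SimpleNamespace objects only stand in for real retriever results.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def format_docs_with_sources(docs: Iterable[Any]) -> str:
    """Render documents as text blocks, each headed by its title and URL."""
    blocks = []
    for doc in docs:
        # Missing metadata keys get placeholders instead of raising KeyError.
        title = doc.metadata.get("title", "Untitled")
        source = doc.metadata.get("source", "unknown")
        blocks.append(f"Title: {title}\nSource: {source}\n{doc.page_content}")
    return "\n\n".join(blocks)


# Stand-in documents with the same attributes the quickstart prints.
sample_docs = [
    SimpleNamespace(
        page_content="First page text.",
        metadata={"title": "Page one", "source": "https://example.com/docs/one"},
    ),
    SimpleNamespace(page_content="Second page text.", metadata={}),
]

print(format_docs_with_sources(sample_docs))
```

A function like this could take the place of a plain page-content join in the chain, with the prompt extended to ask the model to name the Source lines it relied on.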

JavaScript:

npm install @langchain/openai

import type { Document } from "@langchain/core/documents";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI } from "@langchain/openai";
import { SourceyRetriever } from "langchain-sourcey";

const retriever = new SourceyRetriever({
  siteUrl: "https://sourcey.com/docs",
  topK: 3,
});

const prompt = ChatPromptTemplate.fromTemplate(
  `Answer the question using the documentation context below.

{context}

Question: {question}`
);

const formatDocs = (docs: Document[]) =>
  docs.map((doc) => doc.pageContent).join("\n\n");

const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocs),
    question: new RunnablePassthrough(),
  },
  prompt,
  new ChatOpenAI({ model: "gpt-4.1-mini" }),
  new StringOutputParser(),
]);

console.log(await chain.invoke("How does Sourcey document MCP servers?"));

The contract

If you want this to work cleanly on your own docs site, keep these stable:

publish search-index.json
publish llms-full.txt
set siteUrl so page URLs are canonical

That is the whole trick. Sourcey emits the retrieval surface as part of the normal docs build, so the LangChain integration stays thin.
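
The contract above can be checked before wiring up a retriever. A rough sketch, assuming both files sit directly under the docs root; the exact paths are an assumption here, and surface_urls, url_exists, and missing_files are hypothetical helpers rather than part of either package.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List
from urllib.parse import urljoin

# File names taken from the contract; their location under the root is assumed.
SURFACE_FILES = ("search-index.json", "llms-full.txt")


def surface_urls(site_url: str) -> Dict[str, str]:
    """Map each retrieval file name to its assumed URL under the docs root."""
    # Without a trailing slash, urljoin would replace the last path segment
    # ("/docs") instead of appending to it.
    root = site_url.rstrip("/") + "/"
    return {name: urljoin(root, name) for name in SURFACE_FILES}


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL answers with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, TimeoutError):
        return False


def missing_files(
    site_url: str, exists: Callable[[str], bool] = url_exists
) -> List[str]:
    """List the retrieval files that do not appear to be published."""
    return [name for name, url in surface_urls(site_url).items() if not exists(url)]
```

Calling missing_files("https://sourcey.com/docs") would return an empty list when both files are reachable. The exists parameter makes it possible to swap in a fake check for tests, or a GET-based check for hosts that reject HEAD requests.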
