How to Put LLMs into Discord | The Complete 2026 Guide

Learn exactly how to put LLMs into Discord using Python, Ollama, OpenAI, and ready-made tools like llmcord. Step-by-step guide for beginners and developers alike — no fluff, just working methods.
Discord started as a gaming chat app. Today it’s the digital living room for millions of communities — developers, artists, researchers, gamers, students — all gathered in servers buzzing with conversation. And in 2026, there’s one thing nearly every active Discord community is thinking about: what if the AI could just be part of the conversation?
That’s not a fantasy anymore. Putting a large language model (LLM) directly inside Discord is not only possible, it’s surprisingly approachable. Whether you want a bot that answers server questions, summarizes long threads, helps with coding problems, roleplays as a character, or simply chats intelligently with your members — you can build it.
This guide covers every major method: from zero-code tools to full Python bots, from cloud APIs like OpenAI and Claude to locally-run models via Ollama and LM Studio. By the end, you’ll know exactly which path suits your use case and how to walk it.
Understanding the Architecture: How LLMs Connect to Discord
Before diving into code, it helps to understand what’s actually happening under the hood. When a user types a message in a Discord channel and a bot responds with an AI-generated reply, here’s the complete data flow:
- The user sends a message in Discord.
- Discord’s API routes that message to your bot application.
- Your bot script receives the message via a webhook or WebSocket connection.
- The message text is sent as a prompt to an LLM (either a local model or a cloud API).
- The LLM generates a response and returns it.
- Your bot posts that response back into the Discord channel.
The LLM can live anywhere — on OpenAI’s servers, Anthropic’s infrastructure, Google’s cloud, or right on your own laptop running Ollama. The Discord bot is just the bridge. That single insight makes the whole thing click.
llmcord — The Easiest No-Boilerplate Option
If you want to skip writing most of the bot infrastructure yourself, llmcord (github.com/jakobdylanc/llmcord) is the most popular ready-to-run solution available. It supports any OpenAI-compatible API including Ollama, xAI, Gemini, and OpenRouter.
What Makes llmcord Special
- Hot reloading of config — change settings without restarting
- Caches message data in a size-managed global dictionary to minimize Discord API calls
- Reply-chain conversations that build context naturally
- Per-user, per-role, and per-channel permission controls
- Supports vision models (image attachments) and text file attachments
Setup in 5 Steps
- Create a Discord Bot at discord.com/developers/applications → Bot tab → generate token → enable Message Content Intent
- Clone the repo: `git clone https://github.com/jakobdylanc/llmcord.git`
- Install dependencies: `pip install -r requirements.txt`
- Configure your provider — add base_url and optional api_key. OpenAI, OpenRouter, and Ollama are pre-configured. The first model in your list is the default.
- Run the bot — the invite URL prints to your console automatically.
Conversation Experience
@ the bot to start a conversation and reply to continue. You can branch conversations into threads — just create a thread from any message and @ the bot inside. Back-to-back messages from the same user are automatically chained. In DMs, conversations continue without needing to reply each time.
Add a line such as `User messages are prefixed with their Discord ID as <@ID>` to your system prompt so the model understands the user format and can mention users back properly.
Python + Ollama (Local LLM)
If you want full control and zero API costs, running Ollama locally is your move. Ollama lets you run LLMs like Llama 3, Mistral, and Gemma entirely offline — your data never leaves your machine.
Cloud vs. Local: At a Glance
| Feature | Cloud API (OpenAI, Claude) | Local LLM (Ollama) |
|---|---|---|
| Cost | Pay per token | Free (hardware only) |
| Privacy | Data sent to provider | Fully offline |
| Speed | Fast (dedicated servers) | Depends on GPU/CPU |
| Setup | Minimal | Moderate |
| Model Variety | Limited to provider | Hundreds on HuggingFace |
| Internet Required | Yes | No |
Step-by-Step Setup
Step 1 — Install Ollama and pull a model
```shell
ollama pull llama3
ollama run llama3   # verify it works
# Keep this terminal open — your bot needs it running
```

Step 2 — Set up your Python environment

```shell
mkdir discord-llm-bot && cd discord-llm-bot
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install discord.py python-dotenv requests
```

Step 3 — Create your .env file

```
DISCORD_TOKEN=your_bot_token_here
OLLAMA_MODEL=llama3
```

Step 4 — Write bot.py
```python
import discord
import requests
import os
from dotenv import load_dotenv

load_dotenv()
TOKEN = os.getenv("DISCORD_TOKEN")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3")
OLLAMA_URL = "http://localhost:11434/api/generate"

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)

def ask_ollama(prompt, temperature=0.7):
    payload = {
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "num_predict": 500}
    }
    try:
        response = requests.post(OLLAMA_URL, json=payload, timeout=60)
        if response.status_code == 200:
            return response.json().get("response", "").strip()
        return "Something went wrong with the model."
    except requests.exceptions.ConnectionError:
        return "Error: Ollama is not running! Start it with `ollama serve`."

@client.event
async def on_ready():
    print(f"Logged in as {client.user}")

@client.event
async def on_message(message):
    if message.author == client.user:
        return
    if client.user.mentioned_in(message):
        user_input = message.content.replace(f"<@{client.user.id}>", "").strip()
        async with message.channel.typing():
            reply = ask_ollama(user_input)
        await message.reply(reply)

client.run(TOKEN)
```

Step 5 — Run it: `python bot.py` — look for `Logged in as YourBot#1234` in the console, then @ the bot in your server.
Python + OpenAI API (Cloud LLM)
For the highest-quality responses and zero hardware management, connecting your Discord bot to OpenAI’s API (or any OpenAI-compatible provider like Anthropic or Gemini) is the cleanest cloud option.
```shell
pip install discord.py openai python-dotenv
```

```python
import discord
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client_ai = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
conversation_history = {}  # Per-user memory

intents = discord.Intents.default()
intents.message_content = True
bot = discord.Client(intents=intents)

@bot.event
async def on_message(message):
    if message.author == bot.user:
        return
    if bot.user.mentioned_in(message):
        user_id = str(message.author.id)
        user_input = message.content.replace(f"<@{bot.user.id}>", "").strip()
        if user_id not in conversation_history:
            conversation_history[user_id] = []
        conversation_history[user_id].append({"role": "user", "content": user_input})
        async with message.channel.typing():
            response = client_ai.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant in a Discord server."},
                    *conversation_history[user_id]
                ],
                max_tokens=800
            )
        reply = response.choices[0].message.content
        conversation_history[user_id].append({"role": "assistant", "content": reply})
        await message.reply(reply)

bot.run(os.getenv("DISCORD_TOKEN"))
```

LM Studio + Node.js
If you prefer JavaScript, LM Studio gives you a polished desktop GUI for managing and running local LLMs, and its JavaScript SDK makes integrating with a Discord bot clean and straightforward.
```shell
npm install discord.js @lmstudio/sdk dotenv
```

```javascript
import { LMStudioClient } from '@lmstudio/sdk';
import { Client, GatewayIntentBits } from 'discord.js';
import 'dotenv/config';

const lms = new LMStudioClient();
const discord = new Client({
  intents: [GatewayIntentBits.Guilds, GatewayIntentBits.GuildMessages, GatewayIntentBits.MessageContent]
});

discord.on('ready', () => console.log(`Ready: ${discord.user.tag}`));

discord.on('messageCreate', async (message) => {
  if (message.author.bot) return;
  if (!message.mentions.has(discord.user)) return;
  const prompt = message.content.replace(`<@${discord.user.id}>`, '').trim();
  const model = await lms.llm.get({ path: 'lmstudio-community/gemma-2-2b-it-GGUF' });
  await message.channel.sendTyping();
  const response = await model.respond([{ role: 'user', content: prompt }]);
  await message.reply(response.content);
});

discord.login(process.env.DISCORD_TOKEN);
```

In LM Studio, navigate to the server section, select your model from the dropdown, and start the local API server before running this script. Models like Gemma 2 2B work well on most consumer hardware.
Using Fine-Tuned Models in Discord
If you want your Discord bot to behave like a domain expert — trained on your own data — you can connect fine-tuned LLMs using the same architecture described above. The key difference is in your API call: instead of pointing at gpt-4o or llama3, you point it at your fine-tuned model’s endpoint.
Platforms like Hugging Face Inference Endpoints, Together AI, Fireworks AI, and Replicate all let you host fine-tuned models with an OpenAI-compatible API — meaning the bot code stays identical, only the endpoint URL and model name change.
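As an illustrative sketch of how little changes: the request below targets a hypothetical OpenAI-compatible host (the endpoint URL, API key, and model name are placeholders, not real values). Compare it with the Ollama example above; only the URL, auth header, and model field differ.

```python
import requests

# Placeholder values — substitute whatever your hosting platform gives you.
BASE_URL = "https://your-endpoint.example.com/v1"
API_KEY = "your_api_key_here"
MODEL = "my-fine-tuned-model"

def build_request(prompt):
    """Build an OpenAI-compatible chat request; only BASE_URL and MODEL differ from stock providers."""
    url = f"{BASE_URL}/chat/completions"
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return url, payload

def ask_finetuned(prompt):
    url, payload = build_request(prompt)
    headers = {"Authorization": f"Bearer {API_KEY}"}
    r = requests.post(url, json=payload, headers=headers, timeout=60)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```

Drop `ask_finetuned` into the `on_message` handler from the earlier examples and the rest of the bot is unchanged.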
Scaling to Multi-Server Bots: Real-World Lessons
Once your bot works locally, scaling it reveals hard truths fast. A Gonzaga University CS team discovered this when they built a production-grade LLM Discord bot: the project required TypeScript code, web API calls, asynchronous and synchronous programming models, Docker containers, GPU configuration, and Discord libraries all working together.
Their key finding: running Ollama on consumer laptops was far too slow for active servers. They ultimately moved to a dedicated GPU research server, which enabled near real-time responses comparable to cloud API services.
Recommended Hardware by Use Case
| Use Case | Recommended Setup | Notes |
|---|---|---|
| Personal / hobby server | CPU + Ollama (7B model) | Slow but completely free |
| Small community (<100 active) | Consumer GPU + Ollama | RTX 3060+ works well |
| Mid-size server (100–1,000) | Cloud API or rented GPU | OpenAI / Together AI |
| Large community (1,000+) | Dedicated server + Docker | GPU cluster recommended |
Adding Memory and Context
Out of the box, most simple implementations are stateless — each message is treated independently. For natural conversation, you need persistent memory. Here are the three tiers:
In-Memory Dict
Python dictionary keyed by user ID. Fast to implement, resets on bot restart. Good for testing.
SQLite Database
Save and retrieve conversation history across restarts. Enough for most real-world programs.
Vector DB (FAISS / Chroma)
Store and retrieve semantically relevant past messages. The bot can recall topics from weeks ago.
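The SQLite tier can be sketched in a few lines (the table layout and helper names here are our own, not a standard): every turn is written to disk, and the last few turns are reloaded per user when building the prompt.

```python
import sqlite3

# One table of conversation turns; survives bot restarts unlike an in-memory dict.
conn = sqlite3.connect("memory.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS history ("
    "  user_id TEXT, role TEXT, content TEXT,"
    "  ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
)

def remember(user_id, role, content):
    """Append one turn (role is 'user' or 'assistant') to the user's history."""
    conn.execute(
        "INSERT INTO history (user_id, role, content) VALUES (?, ?, ?)",
        (user_id, role, content),
    )
    conn.commit()

def recall(user_id, limit=10):
    """Return the last `limit` turns for this user, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM history WHERE user_id = ? "
        "ORDER BY ts DESC, rowid DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return [{"role": r, "content": c} for r, c in reversed(rows)]
```

In the OpenAI example above, `recall(user_id)` would replace the `conversation_history[user_id]` lookup, and `remember()` would replace the two `append()` calls.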
Giving Your Discord Bot a Personality
A generic “helpful assistant” bot is boring. A bot with a strong, consistent persona is something your community will actually use. The persona lives entirely in your system prompt:
```
You are Axiom, the no-nonsense tech support bot for this server.
You speak in short, direct sentences. You never apologize unnecessarily.
You're an expert in Python, Linux, and self-hosted tools.
When you don't know something, you say "I don't know" — no guessing.
Today is {date}. Current time: {time}.
```

The temperature parameter controls creativity — 0.7 is the sweet spot for most chatbots. Higher values (0.9+) produce more varied, creative responses; lower values (0.3) produce more focused, deterministic answers.
Troubleshooting Common Issues
Bot Is Online But Not Responding
Check that Message Content Intent is enabled on the Bot tab of the Developer Portal and that `intents.message_content = True` is set in your code. Also confirm the bot has permission to view and send messages in the channel, and that you are actually @mentioning it.
Ollama Returns “Connection Refused”
Ollama must be running before you start the bot. Open a separate terminal and run ollama serve. Keep that window open for the duration of your bot’s operation.
Responses Are Cutting Off
Increase max_tokens or num_predict in your API call. Also: Discord has a 2,000-character message limit. Add logic to detect long responses and split them across multiple messages:
```python
if len(reply) > 1900:
    chunks = [reply[i:i+1900] for i in range(0, len(reply), 1900)]
    for chunk in chunks:
        await message.channel.send(chunk)
else:
    await message.reply(reply)
```

Bot Responds to Other Bots
Add if message.author.bot: return at the very top of your on_message handler to filter out all bot-authored messages, including your own bot’s messages.
All Methods Compared Side by Side
| Method | Skill Level | Cost | Privacy | Best For |
|---|---|---|---|---|
| llmcord | Low | Depends on provider | High w/ Ollama | Quick setup, small–medium servers |
| Python + Ollama | Medium | Free (hardware) | Excellent | Privacy-focused, custom logic |
| Python + OpenAI | Medium | Pay per token | Data sent to OpenAI | High quality, low hardware burden |
| Node.js + LM Studio | Medium | Free | Excellent | JS developers, local inference |
| Fine-tuned endpoint | High | Variable | Depends on host | Domain-specific expert bots |
| Docker + GPU server | High | Hardware / cloud | Excellent | Production, large communities |
Security Best Practices
- Never hardcode tokens or API keys — always use `.env` files and add them to `.gitignore`
- Rate-limit users — a single user can flood your bot with expensive API calls or degrade the experience for everyone
- Set a system prompt with explicit limits — without guardrails, users can manipulate the bot into off-topic or harmful content
- Log interactions (without storing sensitive personal data) so you can audit unusual activity
- Restrict which channels the bot responds in using role-based permissions or channel allowlists in your config
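The rate-limiting point above is a few lines of standard-library code. A minimal per-user cooldown might look like this (the 10-second window is an arbitrary example value, not a recommendation):

```python
import time
from collections import defaultdict

COOLDOWN_SECONDS = 10          # illustrative; tune for your server and API budget
_last_request = defaultdict(float)

def allowed(user_id, now=None):
    """Return True if this user may make a request, enforcing one call per cooldown window."""
    now = time.monotonic() if now is None else now
    if now - _last_request[user_id] < COOLDOWN_SECONDS:
        return False               # still inside the cooldown window
    _last_request[user_id] = now   # record this request and let it through
    return True
```

At the top of your `on_message` handler, add `if not allowed(str(message.author.id)): return` before any LLM call.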
Advanced Features Worth Adding
Slash Commands
Register /ask, /summarize, or /explain slash commands using Discord’s application command system. These show up as autocomplete suggestions for users — much more discoverable than @mentions.
Streaming Responses
Instead of waiting for the full response before posting, stream tokens in real time and edit the bot’s message as text arrives. This feels much more natural and eliminates the awkward silence before long responses appear.
Multimodal Support
Models like GPT-4o, Claude 3.5 Sonnet, and LLaVA can process images. Users can attach a screenshot and ask the bot to explain, debug, or describe it. Enable by adding image processing to your message handler.
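For OpenAI-style vision models, the change is only in how the user message is built: text and image URLs go together into one content list. A sketch (the helper name is ours):

```python
def build_vision_messages(text, image_urls,
                          system="You are a helpful assistant in a Discord server."):
    """Build an OpenAI-style chat payload pairing the user's text with attached image URLs."""
    content = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": content},
    ]
```

In your `on_message` handler, collect the URLs with something like `[a.url for a in message.attachments if a.content_type and a.content_type.startswith("image/")]` and pass the result as `messages=` to the gpt-4o call from the earlier example.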
Thread-Based Conversations
Route each new conversation into its own Discord thread automatically. This keeps main channels clean while preserving full context for each individual chat — ideal for busy servers.
Frequently Asked Questions
Can I use Claude instead of OpenAI?
Yes. The Anthropic Python SDK is straightforward to integrate. Replace the OpenAI client with Anthropic’s, adjust the model name to claude-sonnet-4-5 or claude-opus-4-5, and the rest of your bot code stays largely the same.
Do I need coding experience?
For llmcord, minimal coding is needed — mostly YAML configuration. For a custom bot, basic Python knowledge is required. The barrier is lower than most people expect; the hardest part is usually setting up the Discord Developer Portal correctly.
How much does it cost?
Running local models via Ollama is free (you pay only for electricity and hardware). Cloud API providers charge per token. OpenAI’s GPT-4o pricing is listed on their platform page and varies by input vs. output tokens.
Can the bot talk in voice channels?
Text-only by default. Voice channel support requires an additional TTS (text-to-speech) layer using Discord.py’s voice client and a TTS engine like Coqui TTS or ElevenLabs. It’s possible but significantly more complex to implement.
How do I keep the bot running 24/7?
Deploy to a cloud server (AWS EC2, DigitalOcean, Google Cloud, Railway.app) or use a process manager like systemd or pm2 on a home server. Railway.app even has a one-click Discord bot deployment template for fast cloud hosting.
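As one concrete sketch of the systemd route (the paths, unit name, and user directory below are placeholders for your own setup):

```ini
[Unit]
Description=Discord LLM bot
After=network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=/home/you/discord-llm-bot
ExecStart=/home/you/discord-llm-bot/venv/bin/python bot.py
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Save it as /etc/systemd/system/discord-bot.service, then run `sudo systemctl enable --now discord-bot`; the bot restarts automatically after crashes and reboots.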
Which model is best for a Discord bot?
For local use: Llama 3 8B via Ollama is the best balance of speed and quality on consumer hardware. For cloud: GPT-4o mini is fast and cost-effective for high-volume servers. For premium quality: GPT-4o or Claude Sonnet.
Which Method Should You Use?
Putting an LLM into Discord has never been easier, and the right approach depends on your situation.
If you want to be up and running in under an hour without writing much code, llmcord is your answer. Clone it, configure a YAML file, and you have a production-quality multi-model bot with conversation threading and permissions built in.
If you want full ownership and zero ongoing API costs, the Python + Ollama approach gives you a completely private, locally-run LLM bot. It scales with your hardware and costs nothing after setup.
If you want the highest quality responses and don’t mind paying per token, connect your Python bot to OpenAI, Anthropic Claude, or Google Gemini. The code is minimal and the results are excellent.
Whatever path you choose, the fundamental architecture is the same: Discord receives the message, your bot passes it to an LLM, the response comes back. Once you understand that loop, every enhancement — memory, personas, slash commands, multimodal input — is just one more layer on top.
The AI is already part of your community. Now you know exactly how to make it official.
