Training your chatbot is how you teach it to answer questions about your business, products, and services. SiteSpeakAI supports multiple content sources so you can build a comprehensive knowledge base for your AI agent.

How Training Works

1. Add sources: Connect your content sources like websites, documents, or app integrations.
2. Select content: Choose which pages, files, or data you want your chatbot to learn from.
3. Train: SiteSpeakAI processes your content and builds a searchable knowledge base.
4. Test & refine: Ask your chatbot questions and fine-tune responses as needed.

Supported Source Types

When you click + Add Sources, you can choose from the following source types:

Website

Add a website URL and SiteSpeakAI will crawl it to extract text content. Crawling may take a minute or two, depending on the size of the website. To train on specific pages rather than an entire website, add individual page URLs instead.
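To get a rough idea of what "extracting text content" from a page involves, here is a minimal sketch using only the Python standard library. It is an illustration, not SiteSpeakAI's crawler: the class name TextExtractor, the function extract_text, and the sample HTML are all hypothetical, and the real service may keep or discard different parts of a page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content.

    Hypothetical helper for illustration only.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the text of an HTML document, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up sample page.
sample_html = """
<html><head><title>Shipping</title><style>p { color: red; }</style></head>
<body><h1>Shipping policy</h1><p>Orders ship in 2 days.</p>
<script>console.log("tracking");</script></body></html>
"""
page_text = extract_text(sample_html)
print(page_text)
```

Running something like this on your own pages can show whether the important content is present as plain text in the HTML, or only appears after scripts run.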

Sitemap

Provide a sitemap URL to automatically discover and crawl all pages on your site.
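If you want to preview which pages a sitemap lists before adding it, the sketch below parses a sitemap in the standard sitemaps.org format with Python's standard library. The function name list_sitemap_urls and the example.com URLs are assumptions made for illustration; how SiteSpeakAI itself reads sitemaps is not described here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found under <url> entries."""
    root = ET.fromstring(sitemap_xml.strip())
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sitemap content; in practice you would download your own sitemap.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/docs/getting-started</loc></url>
</urlset>
"""
urls = list_sitemap_urls(sample_sitemap)
for url in urls:
    print(url)
```

This sketch handles a single sitemap file only; a sitemap index that points to other sitemaps would need one more level of lookup.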

Text

Upload plain text files or paste text content directly.

PDF

Upload PDF documents like product manuals, guides, and policies.

Audio

Upload audio files to be transcribed and used for training.

Video

Upload video files to extract and train on the audio content.

Apps

Connect third-party platforms to train on their content. Available integrations:
  • Notion: Connect your Notion workspace
  • BookStack: Wiki and knowledge base content
  • OneNote: Connect your Microsoft OneNote notebooks
  • Google Drive: Connect your Google Drive documents
  • Discord: Select Discord channels to train on
Intercom and Google Docs integrations are coming soon.

Accessing Training Sources

1. Go to Training & Content: In your chatbot dashboard, click Training & Content in the sidebar.
2. Select Sources: Click on Sources to view and manage your training content.
3. Add new sources: Click + Add Sources to connect new content.

Managing Your Sources

Source Status

Each source shows its current status:
Status           Meaning
Trained (green)  Content is processed and ready
Training         Currently being processed
Pending          Queued for training
Error            Something went wrong

Source Information

For each source you can see:
  • Name: The page title or file name
  • URL: Source location (if applicable)
  • Type: The source type (link icon for URLs, etc.)
  • Size: Amount of content (e.g., 3.6 KB, 7.2 KB)
  • Status: Training status (Trained, Training, Pending, Error)
  • Auto: Whether auto-sync is enabled
  • Last Trained: When it was last processed (e.g., 18 hours ago, 4 months ago)
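If you keep your own notes on your sources, the fields above map naturally onto a small record type. The sketch below is hypothetical: the Source class, its field names, the 30-day threshold, and the sample rows are assumptions for illustration and do not describe a SiteSpeakAI export format or API. It flags sources that failed, or that are trained but have auto-sync off and have not been retrained recently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    TRAINED = "Trained"
    TRAINING = "Training"
    PENDING = "Pending"
    ERROR = "Error"


@dataclass
class Source:
    """Hypothetical record mirroring the columns in the sources list."""
    name: str
    url: Optional[str]
    source_type: str
    size_kb: float
    status: Status
    auto_sync: bool
    last_trained: Optional[datetime]


def needs_attention(source: Source, now: datetime,
                    max_age: timedelta = timedelta(days=30)) -> bool:
    """True for errored sources and for stale sources without auto-sync."""
    if source.status is Status.ERROR:
        return True
    if source.status is not Status.TRAINED:
        return False  # Training or Pending: still in progress
    if source.auto_sync:
        return False  # kept fresh automatically
    return source.last_trained is None or now - source.last_trained > max_age


now = datetime(2026, 1, 22)
sources = [
    Source("Pricing", "https://example.com/pricing", "link", 3.6,
           Status.TRAINED, False, datetime(2025, 9, 22)),
    Source("FAQ", "https://example.com/faq", "link", 7.2,
           Status.TRAINED, True, now - timedelta(hours=18)),
    Source("manual.pdf", None, "pdf", 120.0, Status.ERROR, False, None),
    Source("Blog", "https://example.com/blog", "link", 0.0,
           Status.PENDING, False, None),
]
flagged = [s.name for s in sources if needs_attention(s, now)]
print(flagged)
```

The flagged names are the ones you would select in the sources list to retrain or to switch to auto-sync.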

Managing Sources

Select one or more sources using the checkboxes to reveal action buttons:
  • Delete: Remove selected sources from training
  • Retrain: Re-fetch content and retrain selected sources
  • Auto Sync: Enable automatic syncing for selected sources

Best Practices

Quality Over Quantity

  • Focus on accurate, well-written content
  • Remove outdated or duplicate information
  • Organize content clearly with headings
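One way to spot duplicate information before uploading files is to compare a fingerprint of each document's normalized text. This is a minimal sketch under stated assumptions: the function names and sample files are hypothetical, and it only catches copies that are identical after whitespace and case are normalized, not near-duplicates with different wording.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(documents: dict[str, str]) -> dict[str, str]:
    """Map each duplicate document name to the first document it repeats."""
    seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for name, text in documents.items():
        key = fingerprint(text)
        if key in seen:
            duplicates[name] = seen[key]
        else:
            seen[key] = name
    return duplicates


# Made-up documents for illustration.
documents = {
    "returns.txt": "Returns are accepted within 30 days.",
    "returns-copy.txt": "  Returns are accepted\nwithin 30 days. ",
    "shipping.txt": "Orders ship in 2 days.",
}
duplicates = find_duplicates(documents)
print(duplicates)
```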

Keep Content Updated

  • Enable auto-sync for dynamic sources
  • Regularly review and refresh static content
  • Remove sources that are no longer relevant

Test Thoroughly

  • Ask your chatbot common customer questions
  • Check that answers cite the correct sources
  • Use fine-tuning to correct mistakes
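A repeatable checklist makes this kind of testing easier to rerun after each retrain. The sketch below assumes you have some way to send a question to your chatbot and get the answer back as text; that part is represented by the ask parameter, and fake_ask is a stand-in stub, not a SiteSpeakAI API call. The questions and expected keywords are made up for illustration.

```python
from typing import Callable


def run_checks(ask: Callable[[str], str],
               cases: list[tuple[str, list[str]]]) -> list[tuple[str, list[str]]]:
    """Return (question, missing keywords) for every answer that falls short."""
    failures = []
    for question, keywords in cases:
        answer = ask(question).lower()
        missing = [k for k in keywords if k.lower() not in answer]
        if missing:
            failures.append((question, missing))
    return failures


def fake_ask(question: str) -> str:
    """Stub standing in for a real chatbot; replace with your own call."""
    canned = {
        "What is your return policy?": "Returns are accepted within 30 days.",
        "How long does shipping take?": "Orders ship in 2 days.",
    }
    return canned.get(question, "I'm not sure.")


cases = [
    ("What is your return policy?", ["30 days"]),
    ("How long does shipping take?", ["2 days"]),
    ("Do you ship internationally?", ["yes"]),
]
failures = run_checks(fake_ask, cases)
for question, missing in failures:
    print(f"Needs work: {question} (missing: {', '.join(missing)})")
```

Keyword matching is a coarse check, so treat a pass as a prompt to skim the answer rather than as proof that it is correct.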

Last modified on January 22, 2026