Full site crawler

The full site crawler is the simplest and most automatic mode: it analyzes all accessible pages of your website, starting from the homepage and following every internal link.

How to configure a full site crawler

  1. Go to the "Crawler" section of the control panel
  2. Click on "Create new crawler"
  3. Select "Full site" as the crawler type
  4. Enter your site's main URL (e.g. https://mysite.com)
  5. Assign a name to the crawler (e.g. "Main site crawler")
  6. Configure advanced settings if needed (a configuration sketch follows these steps)
  7. Save and start the crawler
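
If it helps to picture what you just set up, the sketch below models the crawler as a small configuration record. The field names and defaults are illustrative assumptions, not the product's actual schema or API.

    # Illustrative only: these field names are assumptions, not the product's real schema.
    from dataclasses import dataclass

    @dataclass
    class CrawlerConfig:
        name: str                  # label shown in the control panel
        crawler_type: str          # "full_site" for this mode
        start_url: str             # your site's main URL
        max_depth: int = 3         # link hops allowed from the start page
        max_pages: int = 200       # page limit for each crawl
        frequency: str = "weekly"  # daily / weekly / monthly / manual

    config = CrawlerConfig(
        name="Main site crawler",
        crawler_type="full_site",
        start_url="https://mysite.com",
    )
    print(config)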

What the crawler does automatically

  • Starts from the homepage: Begins at the URL you specified
  • Follows all internal links: Navigates through menus, footers, and in-text links
  • Respects site structure: Maintains page hierarchy
  • Avoids duplicate content: Doesn't scan the same page multiple times
  • Handles errors: Skips pages that are missing or return errors (see the sketch after this list)
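
To make this concrete, here is a minimal sketch of the logic a full site crawler applies: start at one URL, follow internal links only, keep a set of visited pages to avoid duplicates, and skip pages that return errors. It is an illustration of the idea (using Python and the requests library), not the service's actual implementation.

    # Minimal illustration of full-site crawling: one start URL, internal links only,
    # a "visited" set to avoid duplicates, and error pages skipped.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    import requests

    class LinkExtractor(HTMLParser):
        """Collects the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    def crawl(start_url, max_pages=50):
        domain = urlparse(start_url).netloc
        queue = deque([start_url])
        visited = set()

        while queue and len(visited) < max_pages:
            url = queue.popleft()
            if url in visited:
                continue  # avoid duplicate content: never scan the same page twice
            visited.add(url)

            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
            except requests.RequestException:
                continue  # handle errors: skip missing or failing pages

            parser = LinkExtractor()
            parser.feed(response.text)
            for href in parser.links:
                link = urljoin(url, href).split("#")[0]
                if urlparse(link).netloc == domain:  # follow internal links only
                    queue.append(link)

        return visited

    if __name__ == "__main__":
        print(f"Crawled {len(crawl('https://mysite.com'))} pages")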

Available configurations

Scan depth
  • What it controls: How many link "hops" it can make from the initial page
  • Depth 1: Homepage only
  • Depth 2: Homepage + directly linked pages
  • Depth 3+: Also includes subpages of subpages
  • Recommendation: For most sites, a depth of 3-4 is sufficient (see the sketch below)

Page limits
  • Purpose: Avoids overloading the server and keeps the crawl within your plan limits
  • Typical setting: 100-500 pages for medium-sized sites
  • Priority: The crawler scans the most important pages first
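
Depth and page limits work together: each discovered page inherits its parent's depth plus one, and the crawl stops expanding once either limit is reached. The sketch below shows that bookkeeping on a made-up link graph; the graph and the limits are illustrative.

    # Illustration of how depth and page limits bound a crawl.
    # The link graph is made up; a real crawler discovers links by fetching pages.
    from collections import deque

    LINKS = {
        "/": ["/about", "/products"],
        "/about": ["/team"],
        "/products": ["/products/a", "/products/b"],
        "/team": [], "/products/a": [], "/products/b": [],
    }

    def bounded_crawl(start="/", max_depth=2, max_pages=4):
        queue = deque([(start, 1)])  # depth 1 = the start page itself
        visited = []
        while queue and len(visited) < max_pages:
            page, depth = queue.popleft()
            if page in visited:
                continue
            visited.append(page)
            if depth < max_depth:  # only follow links while another hop is allowed
                for link in LINKS.get(page, []):
                    queue.append((link, depth + 1))
        return visited

    print(bounded_crawl(max_depth=2, max_pages=4))
    # Depth 2 reaches the homepage plus its directly linked pages: ['/', '/about', '/products']
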
Update frequency
  • Daily: For sites with frequently changing content
  • Weekly: For most business sites
  • Monthly: For sites with static content
  • Manual: Only when you decide to update
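
Conceptually, each frequency is just a recurring schedule. As a rough mental model (the cron expressions below are illustrative; the actual scheduling is handled by the service):

    # Rough mental model: each frequency corresponds to a recurring schedule.
    # The cron expressions are illustrative, not the product's real scheduler.
    SCHEDULES = {
        "daily": "0 3 * * *",    # every day at 03:00
        "weekly": "0 3 * * 1",   # every Monday at 03:00
        "monthly": "0 3 1 * *",  # the 1st of each month at 03:00
        "manual": None,          # no schedule: runs only when you start it
    }

    def describe(frequency):
        cron = SCHEDULES.get(frequency)
        return f"cron: {cron}" if cron else "run on demand"

    print(describe("weekly"))  # -> cron: 0 3 * * 1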

Progress monitoring

During scanning you can monitor:

  • Pages processed: How many pages have been analyzed
  • Pages found: How many pages have been discovered
  • Errors: Pages not accessible or with problems
  • Status: Whether the crawler is active, completed, or in an error state
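
These indicators map naturally onto a small progress record, roughly like the sketch below. The field names are assumptions made for illustration, not the product's actual API.

    # Illustrative progress record; field names are assumptions, not the product's API.
    from dataclasses import dataclass, field

    @dataclass
    class CrawlProgress:
        pages_processed: int = 0  # pages already analyzed
        pages_found: int = 0      # pages discovered so far
        errors: list = field(default_factory=list)  # URLs that could not be fetched
        status: str = "active"    # "active", "completed", or "error"

    progress = CrawlProgress(pages_processed=42, pages_found=120)
    print(f"{progress.pages_processed}/{progress.pages_found} pages, "
          f"{len(progress.errors)} errors, status: {progress.status}")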

Full site advantages

  • Simplicity: Minimal configuration required
  • Completeness: You don't miss any important pages
  • Automation: Updates itself periodically
  • Minimal maintenance: Works autonomously

Potential disadvantages

  • Irrelevant content: Might include pages that are not useful (e.g. privacy policy, cookie notice)
  • Resource usage: Consumes more plan quota
  • Processing time: Takes longer for large sites
  • Less control: You can't easily exclude specific sections

Best practices

  • Test first: Start with a low limit to see which pages are found
  • Monitor results: Check that important pages are being found
  • Subsequent exclusions: After the first crawl, exclude pages that are not useful (see the sketch below)
  • Gradual updates: Don't set an overly aggressive update frequency at first
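
For the exclusion step, the usual approach is a short list of URL patterns to skip on the next crawl. The sketch below shows the idea; the patterns are examples, and real exclusions are configured in the control panel rather than in code.

    # Example of filtering out non-useful pages with exclusion patterns.
    # The patterns are examples; actual exclusions are set in the control panel.
    import re

    EXCLUDE_PATTERNS = [
        r"/privacy",
        r"/cookie",
        r"\?page=\d+$",  # paginated archive pages
    ]

    def is_excluded(url):
        return any(re.search(pattern, url) for pattern in EXCLUDE_PATTERNS)

    urls = [
        "https://mysite.com/products/widget",
        "https://mysite.com/privacy-policy",
        "https://mysite.com/blog?page=3",
    ]
    print([u for u in urls if not is_excluded(u)])  # only the product page remains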