Full site crawler

The full site crawler is the simplest and most automatic mode: it analyzes all accessible pages of your website, starting from the homepage and following every internal link.

How to configure a full site crawler

  1. Go to the "Crawler" section of the control panel
  2. Click on "Create new crawler"
  3. Select "Full site" as the crawler type
  4. Enter your site's main URL (e.g. https://mysite.com)
  5. Assign a name to the crawler (e.g. "Main site crawler")
  6. Configure advanced settings if needed (a configuration sketch follows these steps)
  7. Save and start the crawler
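
If it helps to picture what you just set up, the sketch below models the crawler as a small configuration record. The field names and defaults are illustrative assumptions, not the product's actual schema or API.

    # Illustrative only: these field names are assumptions, not the product's real schema.
    from dataclasses import dataclass

    @dataclass
    class CrawlerConfig:
        name: str                  # label shown in the control panel
        crawler_type: str          # "full_site" for this mode
        start_url: str             # your site's main URL
        max_depth: int = 3         # link hops allowed from the start page
        max_pages: int = 200       # page limit for each crawl
        frequency: str = "weekly"  # daily / weekly / monthly / manual

    config = CrawlerConfig(
        name="Main site crawler",
        crawler_type="full_site",
        start_url="https://mysite.com",
    )
    print(config)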

What the crawler does automatically

  • Starts from the homepage: Begins at the URL you specified
  • Follows all internal links: Navigates through menus, footers, and in-text links
  • Respects site structure: Maintains page hierarchy
  • Avoids duplicate content: Doesn't scan the same page multiple times
  • Handles errors: Skips pages that are missing or return errors (see the sketch after this list)
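
To make this concrete, here is a minimal sketch of the logic a full site crawler applies: start at one URL, follow internal links only, keep a set of visited pages to avoid duplicates, and skip pages that return errors. It is an illustration of the idea (using Python and the requests library), not the service's actual implementation.

    # Minimal illustration of full-site crawling: one start URL, internal links only,
    # a "visited" set to avoid duplicates, and error pages skipped.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    import requests

    class LinkExtractor(HTMLParser):
        """Collects the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    def crawl(start_url, max_pages=50):
        domain = urlparse(start_url).netloc
        queue = deque([start_url])
        visited = set()

        while queue and len(visited) < max_pages:
            url = queue.popleft()
            if url in visited:
                continue  # avoid duplicate content: never scan the same page twice
            visited.add(url)

            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
            except requests.RequestException:
                continue  # handle errors: skip missing or failing pages

            parser = LinkExtractor()
            parser.feed(response.text)
            for href in parser.links:
                link = urljoin(url, href).split("#")[0]
                if urlparse(link).netloc == domain:  # follow internal links only
                    queue.append(link)

        return visited

    if __name__ == "__main__":
        print(f"Crawled {len(crawl('https://mysite.com'))} pages")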

Available configurations

Scan depth
  • What it controls: How many link "hops" it can make from the initial page
  • Depth 1: Homepage only
  • Depth 2: Homepage + directly linked pages
  • Depth 3+: Also includes subpages of subpages
  • Recommendation: For most sites, a depth of 3-4 is sufficient (see the sketch below)

Page limits
  • Purpose: Avoids overloading the server and keeps the crawl within your plan limits
  • Typical setting: 100-500 pages for medium-sized sites
  • Priority: The crawler scans the most important pages first
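
Depth and page limits work together: each discovered page inherits its parent's depth plus one, and the crawl stops expanding once either limit is reached. The sketch below shows that bookkeeping on a made-up link graph; the graph and the limits are illustrative.

    # Illustration of how depth and page limits bound a crawl.
    # The link graph is made up; a real crawler discovers links by fetching pages.
    from collections import deque

    LINKS = {
        "/": ["/about", "/products"],
        "/about": ["/team"],
        "/products": ["/products/a", "/products/b"],
        "/team": [], "/products/a": [], "/products/b": [],
    }

    def bounded_crawl(start="/", max_depth=2, max_pages=4):
        queue = deque([(start, 1)])  # depth 1 = the start page itself
        visited = []
        while queue and len(visited) < max_pages:
            page, depth = queue.popleft()
            if page in visited:
                continue
            visited.append(page)
            if depth < max_depth:  # only follow links while another hop is allowed
                for link in LINKS.get(page, []):
                    queue.append((link, depth + 1))
        return visited

    print(bounded_crawl(max_depth=2, max_pages=4))
    # Depth 2 reaches the homepage plus its directly linked pages: ['/', '/about', '/products']
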
Update frequency
  • Daily: For sites with frequently changing content
  • Weekly: For most business sites
  • Monthly: For sites with static content
  • Manual: Only when you decide to update
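
Conceptually, each frequency is just a recurring schedule. As a rough mental model (the cron expressions below are illustrative; the actual scheduling is handled by the service):

    # Rough mental model: each frequency corresponds to a recurring schedule.
    # The cron expressions are illustrative, not the product's real scheduler.
    SCHEDULES = {
        "daily": "0 3 * * *",    # every day at 03:00
        "weekly": "0 3 * * 1",   # every Monday at 03:00
        "monthly": "0 3 1 * *",  # the 1st of each month at 03:00
        "manual": None,          # no schedule: runs only when you start it
    }

    def describe(frequency):
        cron = SCHEDULES.get(frequency)
        return f"cron: {cron}" if cron else "run on demand"

    print(describe("weekly"))  # -> cron: 0 3 * * 1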

Progress monitoring

During scanning you can monitor:

  • Pages processed: How many pages have been analyzed
  • Pages found: How many pages have been discovered
  • Errors: Pages not accessible or with problems
  • Status: Whether the crawler is active, completed, or in an error state
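
These indicators map naturally onto a small progress record, roughly like the sketch below. The field names are assumptions made for illustration, not the product's actual API.

    # Illustrative progress record; field names are assumptions, not the product's API.
    from dataclasses import dataclass, field

    @dataclass
    class CrawlProgress:
        pages_processed: int = 0  # pages already analyzed
        pages_found: int = 0      # pages discovered so far
        errors: list = field(default_factory=list)  # URLs that could not be fetched
        status: str = "active"    # "active", "completed", or "error"

    progress = CrawlProgress(pages_processed=42, pages_found=120)
    print(f"{progress.pages_processed}/{progress.pages_found} pages, "
          f"{len(progress.errors)} errors, status: {progress.status}")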

Full site advantages

  • Simplicity: Minimal configuration required
  • Completeness: You don't miss any important pages
  • Automation: Updates itself periodically
  • Minimal maintenance: Works autonomously

Potential disadvantages

  • Irrelevant content: Might include pages that are not useful (e.g. privacy policy, cookie notice)
  • Resource usage: Consumes more plan quota
  • Processing time: Takes longer for large sites
  • Less control: You can't easily exclude specific sections

Best practices

  • Test first: Start with a low limit to see which pages are found
  • Monitor results: Check that important pages are being found
  • Subsequent exclusions: After the first crawl, exclude pages that are not useful (see the sketch below)
  • Gradual updates: Don't set an overly aggressive update frequency at first
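
For the exclusion step, the usual approach is a short list of URL patterns to skip on the next crawl. The sketch below shows the idea; the patterns are examples, and real exclusions are configured in the control panel rather than in code.

    # Example of filtering out non-useful pages with exclusion patterns.
    # The patterns are examples; actual exclusions are set in the control panel.
    import re

    EXCLUDE_PATTERNS = [
        r"/privacy",
        r"/cookie",
        r"\?page=\d+$",  # paginated archive pages
    ]

    def is_excluded(url):
        return any(re.search(pattern, url) for pattern in EXCLUDE_PATTERNS)

    urls = [
        "https://mysite.com/products/widget",
        "https://mysite.com/privacy-policy",
        "https://mysite.com/blog?page=3",
    ]
    print([u for u in urls if not is_excluded(u)])  # only the product page remains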