The full site crawler is the simplest and most automatic mode: it analyzes every accessible page of your website, starting from the homepage and following all internal links.
How to configure a full site crawler
- Go to the "Crawler" section of the control panel
- Click on "Create new crawler"
- Select "Full site" as type
- Enter your site's main URL (e.g. https://mysite.com)
- Assign a name to the crawler (e.g. "Main site crawler")
- Configure advanced settings if needed
- Save and start the crawler (the sketch after these steps shows the same settings in code form)
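If the platform exposes an API, the same setup could look roughly like the sketch below. The endpoint, field names, and authentication header are assumptions made for illustration, not a documented interface; the control panel flow above is the supported path.

```python
# Hypothetical sketch: creating a full-site crawler via an HTTP API.
# The endpoint, payload fields, and auth header are assumptions.
import requests

API_BASE = "https://api.example.com/v1"   # assumed base URL
API_KEY = "YOUR_API_KEY"                  # assumed authentication scheme

payload = {
    "type": "full_site",                  # crawler type from the steps above
    "start_url": "https://mysite.com",    # your site's main URL
    "name": "Main site crawler",          # display name
    "max_depth": 3,                       # advanced setting: scan depth
    "max_pages": 200,                     # advanced setting: page limit
    "update_frequency": "weekly",         # advanced setting: schedule
}

response = requests.post(
    f"{API_BASE}/crawlers",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print("Created crawler:", response.json())
```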
What the crawler does automatically
- Starts from homepage: Begins from the URL you specified
- Follows all internal links: Navigates through menus, footer, text links
- Respects site structure: Maintains page hierarchy
- Avoids duplicate content: Doesn't scan the same page multiple times
- Handles errors: Skips pages that can't be reached or that return errors (the loop sketched after this list shows these behaviors together)
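Conceptually, the behavior in this list boils down to a breadth-first walk of your internal links. The sketch below is a simplified illustration of that idea, not the product's actual implementation.

```python
# Simplified breadth-first crawl: start at the homepage, follow internal
# links only, never revisit a page, and skip pages that error out.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def crawl(start_url: str, max_pages: int = 50) -> list[str]:
    domain = urlparse(start_url).netloc
    seen = {start_url}               # avoids scanning the same page twice
    queue = deque([start_url])       # starts from the homepage
    crawled = []

    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            continue                 # handles errors: skip and move on
        crawled.append(url)

        soup = BeautifulSoup(resp.text, "html.parser")
        for link in soup.find_all("a", href=True):
            target = urljoin(url, link["href"]).split("#")[0]
            # follows internal links only (same domain), never duplicates
            if urlparse(target).netloc == domain and target not in seen:
                seen.add(target)
                queue.append(target)
    return crawled

# Example: pages = crawl("https://mysite.com", max_pages=20)
```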
Available configurations
Scan depth
- What it controls: How many link "hops" the crawler can make from the starting page
- Depth 1: Homepage only
- Depth 2: Homepage + directly linked pages
- Depth 3+: Also includes subpages of subpages
- Recommendation: For most sites, a depth of 3-4 is sufficient (the toy example below shows how each level expands)
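A toy example makes the depth levels concrete. The miniature link graph below is invented for illustration only; it shows which pages each depth setting would reach.

```python
# Toy illustration of scan depth on a small, made-up site structure.
from collections import deque

links = {
    "/": ["/products", "/about"],         # homepage links
    "/products": ["/products/widget-a"],  # a directly linked page
    "/about": [],
    "/products/widget-a": [],
}

def pages_within_depth(start: str, max_depth: int) -> set[str]:
    reached = {start}
    queue = deque([(start, 1)])           # the start page counts as depth 1
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue                      # no more hops allowed from here
        for target in links.get(page, []):
            if target not in reached:
                reached.add(target)
                queue.append((target, depth + 1))
    return reached

for d in (1, 2, 3):
    print(f"depth {d}: {sorted(pages_within_depth('/', d))}")
# depth 1: ['/']
# depth 2: ['/', '/about', '/products']
# depth 3: ['/', '/about', '/products', '/products/widget-a']
```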
Page limits
- Purpose: Avoids overloading your server and keeps you within your plan's limits
- Typical setting: 100-500 pages for medium-sized sites
- Priority: The crawler scans the most important pages first (see the sketch after this list)
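How "importance" is decided is internal to the product. One plausible heuristic, shown below purely as an illustration and not as the crawler's actual rule, is to prefer pages with shorter URL paths when the limit is reached.

```python
# Sketch of a page limit combined with a simple priority heuristic
# (shorter URL paths first). The heuristic itself is an assumption.
import heapq
from urllib.parse import urlparse

def take_most_important(found_urls: list[str], max_pages: int) -> list[str]:
    # Rank by path depth so top-level pages win when the limit is hit.
    ranked = [(urlparse(u).path.strip("/").count("/"), u) for u in found_urls]
    return [u for _, u in heapq.nsmallest(max_pages, ranked)]

urls = [
    "https://mysite.com/",
    "https://mysite.com/blog/2023/archive/old-post",
    "https://mysite.com/pricing",
]
print(take_most_important(urls, max_pages=2))
# ['https://mysite.com/', 'https://mysite.com/pricing']
```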
Update frequency
- Daily: For sites with frequently changing content
- Weekly: For most business sites
- Monthly: For sites with static content
- Manual: Runs only when you decide to update (the schedule sketch below makes these options concrete)
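To make the options concrete, the mapping below expresses each frequency as a cron-style schedule. The exact run times are assumptions; the platform handles the actual scheduling for you.

```python
# Illustrative mapping of frequency options to cron expressions.
# The specific times are assumed, not the platform's real schedule.
FREQUENCY_TO_CRON = {
    "daily":   "0 3 * * *",    # every day at 03:00
    "weekly":  "0 3 * * 1",    # every Monday at 03:00
    "monthly": "0 3 1 * *",    # the 1st of each month at 03:00
    "manual":  None,           # no schedule: run only on demand
}

def describe(frequency: str) -> str:
    cron = FREQUENCY_TO_CRON[frequency]
    return f"{frequency}: {'on demand only' if cron is None else cron}"

for option in FREQUENCY_TO_CRON:
    print(describe(option))
```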
Progress monitoring
During scanning you can monitor:
- Pages processed: How many pages have been analyzed
- Pages found: How many pages have been discovered
- Errors: Pages that weren't accessible or had problems
- Status: Whether the crawler is active, completed, or has failed (a polling sketch follows this list)
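If the platform offers a status API, monitoring could be automated along the lines of the sketch below. The endpoint, field names, and identifier are assumptions for illustration; the control panel shows the same counters interactively.

```python
# Hypothetical polling sketch: endpoint and field names are assumed.
import time
import requests

API_BASE = "https://api.example.com/v1"   # assumed base URL
API_KEY = "YOUR_API_KEY"                  # assumed authentication scheme
CRAWLER_ID = "crawler-123"                # assumed crawler identifier

def wait_for_completion(poll_seconds: int = 30) -> dict:
    while True:
        resp = requests.get(
            f"{API_BASE}/crawlers/{CRAWLER_ID}/status",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        status = resp.json()
        print(f"processed={status['pages_processed']} "
              f"found={status['pages_found']} errors={status['errors']}")
        if status["status"] in ("completed", "error"):
            return status
        time.sleep(poll_seconds)   # still active: check again later
```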
Full site advantages
- Simplicity: Minimal configuration required
- Completeness: You don't miss any important pages
- Automation: Updates itself periodically
- Minimal maintenance: Works autonomously
Potential disadvantages
- Irrelevant content: May include pages that aren't useful (e.g. privacy policy or cookie pages)
- Resource usage: Consumes more plan quota
- Processing time: Takes longer for large sites
- Less control: You can't easily exclude specific sections
Best practices
- Test first: Start with a low page limit to see which pages are discovered
- Monitor results: Check that important pages are being found
- Subsequent exclusions: After the first crawl, exclude pages that aren't useful (a filtering sketch follows this list)
- Gradual updates: Don't set an overly aggressive update frequency at first
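As a rough illustration of the exclusion step, the snippet below filters a crawled URL list with regular expressions. The patterns and the client-side filtering approach are assumptions; the actual exclusion mechanism depends on your crawler settings.

```python
# Sketch: drop pages that add no value after reviewing the first crawl.
# The patterns here are examples, not the product's exclusion syntax.
import re

EXCLUDE_PATTERNS = [
    r"/privacy-policy",
    r"/cookie",
    r"/terms",
    r"/tag/",          # thin archive pages
]

def keep_url(url: str) -> bool:
    return not any(re.search(pattern, url) for pattern in EXCLUDE_PATTERNS)

crawled = [
    "https://mysite.com/pricing",
    "https://mysite.com/cookie-policy",
    "https://mysite.com/blog/how-it-works",
]
print([u for u in crawled if keep_url(u)])
# ['https://mysite.com/pricing', 'https://mysite.com/blog/how-it-works']
```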