Monitoring Crawled Pages

Detailed monitoring of crawled pages allows you to verify crawling quality and identify potential issues or optimization opportunities.

Accessing monitoring

  1. Go to the "Crawler" section in the control panel
  2. Select the crawler you want to monitor
  3. Click on "View pages" or "Scan details"
  4. Choose between "Processed pages" and "Pages with errors"

Available information for each page

Successfully processed pages

For each successfully scanned page you can see the following (a short record sketch appears after the list):

  • Full URL: The page address
  • Page title: The extracted <title> tag
  • Scan date: When it was last processed
  • Content size: Amount of extracted text
  • HTTP code: Usually 200 (success)
  • Processing time: How long the scan took
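
If you mirror these fields in your own tooling, they map naturally onto a simple record. A minimal sketch in Python; the field names are illustrative and not the product's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CrawledPage:
    """One successfully processed page, mirroring the dashboard fields."""
    url: str                  # full URL of the page
    title: str                # extracted <title> tag
    scanned_at: datetime      # date of the last scan
    content_length: int       # amount of extracted text, in characters
    http_status: int          # usually 200
    processing_ms: int        # how long the scan took, in milliseconds

page = CrawledPage(
    url="https://example.com/docs/getting-started",
    title="Getting Started",
    scanned_at=datetime(2024, 5, 14, 9, 30),
    content_length=4812,
    http_status=200,
    processing_ms=640,
)
print(f"{page.title}: {page.content_length} chars, HTTP {page.http_status}")
```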

Pages with errors

For pages that generated errors, you can see the following (a retry-handling sketch appears after the list):

  • Problematic URL: The address that caused the error
  • Error type: HTTP status code (404, 500, etc.) or failure category
  • Error message: Detailed description of the problem
  • Last attempt date: When the error occurred
  • Number of attempts: How many times the system tried
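
The attempt count is what lets you decide when a failing URL is still worth retrying. A small sketch of one possible retry policy built on these fields; the record type and thresholds are illustrative, not part of the product:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CrawlError:
    """One failed page, mirroring the error-view fields."""
    url: str
    error_type: str        # e.g. "404", "500", "timeout"
    message: str
    last_attempt: datetime
    attempts: int

def should_retry(err: CrawlError, max_attempts: int = 3) -> bool:
    """Retry transient failures; give up on permanent ones or after too many tries."""
    permanent = {"404", "410"}          # the page is gone, retrying will not help
    if err.error_type in permanent:
        return False
    return err.attempts < max_attempts

errors = [
    CrawlError("https://example.com/old-page", "404", "Not Found", datetime(2024, 5, 14), 2),
    CrawlError("https://example.com/api/docs", "timeout", "Read timed out", datetime(2024, 5, 14), 1),
]
for err in errors:
    print(err.url, "->", "retry" if should_retry(err) else "give up")
```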

Filters and search

Available filters
  • By status: Success only, errors only, or all
  • By date: Pages scanned in a specific period
  • By size: Pages with a large or small amount of extracted content
  • By HTTP code: Filter by specific response codes

Text search
  • By URL: Find pages with specific URLs
  • By title: Search pages with particular titles
  • By content: Find pages containing certain words
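
If you export the page list as CSV (see "Reports and export" below), the same filters are easy to reproduce offline. A sketch assuming hypothetical column names (url, title, http_status, content_length, scanned_at):

```python
import csv
from datetime import datetime

def load_pages(path: str) -> list[dict]:
    """Read the exported page list into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def filter_pages(pages, status=None, min_chars=None, since=None, url_contains=None):
    """Reproduce the dashboard filters: by HTTP code, size, date and URL text."""
    out = []
    for p in pages:
        if status is not None and int(p["http_status"]) != status:
            continue
        if min_chars is not None and int(p["content_length"]) < min_chars:
            continue
        if since is not None and datetime.fromisoformat(p["scanned_at"]) < since:
            continue
        if url_contains is not None and url_contains not in p["url"]:
            continue
        out.append(p)
    return out

pages = load_pages("pages.csv")
recent_ok = filter_pages(pages, status=200, min_chars=500, since=datetime(2024, 5, 1))
print(len(recent_ok), "substantial pages scanned since May 1")
```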

Quality analysis

Indicators of useful pages
  • Substantial content: Pages with at least 200-300 words
  • Clear titles: Meaningful and descriptive title tags
  • HTML structure: Proper use of H1, H2, paragraphs
  • Unique content: Text not duplicated from other pages

Indicators of problematic pages
  • Poor content: Fewer than 50 words of text
  • Mostly navigation: Menus and links with little actual content
  • Duplicate content: Identical to pages already scanned
  • Recurring errors: The page fails on every access attempt
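
Most of these indicators can be checked mechanically once you have the extracted text of each page. A minimal sketch using the word-count thresholds above and a simple hash to catch exact duplicates; the classification labels are illustrative:

```python
import hashlib

def classify(text: str, seen_hashes: set[str]) -> str:
    """Rough quality label based on word count and exact-duplicate detection."""
    words = len(text.split())
    digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

    if digest in seen_hashes:
        return "duplicate"          # identical to an already scanned page
    seen_hashes.add(digest)

    if words < 50:
        return "poor"               # mostly navigation or boilerplate
    if words >= 200:
        return "substantial"        # enough content to be useful
    return "borderline"

seen: set[str] = set()
print(classify("word " * 300, seen))        # substantial
print(classify("word " * 300, seen))        # duplicate
print(classify("menu home login", seen))    # poor
```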

Available actions

Single page management
  • Exclusion: Add URLs to exclusion rules
  • Forced re-scan: Trigger fresh processing of the page
  • Content view: See extracted text
  • Direct opening: Visit the original page

Bulk actions
  • Multiple exclusion: Exclude multiple pages simultaneously
  • Group re-scan: Reprocess multiple pages
  • Export: Download URL lists for external analysis
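
Bulk exclusion pairs well with the export: you can build an exclusion list offline and feed it back into the crawler's rules. A sketch, reusing the same hypothetical CSV columns as the filtering example above:

```python
import csv

def pages_to_exclude(path: str, max_chars: int = 200) -> list[str]:
    """Collect URLs whose extracted content is too small to be useful."""
    urls = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if int(row["content_length"]) < max_chars:
                urls.append(row["url"])
    return urls

# Write one URL per line, ready to paste into the exclusion rules
with open("exclude.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(pages_to_exclude("pages.csv")))
```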

Pattern interpretation

Positive patterns
  • Steady growth: Number of processed pages increases over time
  • Stable or decreasing errors: Indicates optimal configuration
  • Quality content: Most pages have substantial content
  • Complete coverage: All important sections are represented

Problematic patterns
  • Many 404 errors: The site has removed or moved many pages
  • Recurring 403/500 errors: Possible site configuration issues
  • Widespread poor content: The crawler is collecting pages with little useful text
  • Stagnation: No new pages found for a long time
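
Several of these patterns can be detected automatically if you record a snapshot of the page and error counts after each crawl. A sketch with illustrative thresholds; the snapshot format is an assumption, not something the product exports in this shape:

```python
from datetime import date

# One snapshot per crawl: (day, pages processed, pages with errors)
history = [
    (date(2024, 4, 1), 820, 12),
    (date(2024, 4, 8), 915, 14),
    (date(2024, 4, 15), 910, 55),
    (date(2024, 4, 22), 912, 61),
]

def detect_patterns(history, stagnation_runs: int = 3, error_ratio: float = 0.05):
    """Flag stagnation and error spikes from a list of crawl snapshots."""
    alerts = []
    processed = [p for _, p, _ in history]
    # Stagnation: processed-page count unchanged over the last N snapshots
    if len(processed) >= stagnation_runs and len(set(processed[-stagnation_runs:])) == 1:
        alerts.append("stagnation: no new pages found recently")
    # Error spike: error share above threshold in the latest snapshot
    _, latest_pages, latest_errors = history[-1]
    if latest_pages and latest_errors / latest_pages > error_ratio:
        alerts.append(f"error rate {latest_errors / latest_pages:.1%} exceeds {error_ratio:.0%}")
    return alerts

print(detect_patterns(history))
```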

Data-based optimization

Rule improvement
  • Targeted exclusions: Exclude URL patterns that generate only poor content
  • Specific inclusions: Add rules to capture quality content
  • Depth balancing: Adjust depth based on results

Priority management
  • Key pages: Ensure most important pages are always scanned
  • Differentiated frequency: Scan more often sections that change frequently
  • Smart limits: Focus crawl resources on the most useful pages
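
Targeted exclusions and priorities usually come down to URL patterns. A sketch of how such rules might be evaluated against candidate URLs; the regex patterns and rule syntax are examples, not the crawler's own configuration format:

```python
import re

# Illustrative exclusion rules: skip tag pages, paginated archives and print views
EXCLUDE_PATTERNS = [
    r"/tag/",
    r"/page/\d+/?$",
    r"\?print=1$",
]

# Illustrative priority rules: keep documentation and pricing pages fresh
PRIORITY_PATTERNS = [
    r"^/docs/",
    r"^/pricing",
]

def classify_url(path: str) -> str:
    """Decide whether a URL path should be excluded, prioritized or crawled normally."""
    if any(re.search(p, path) for p in EXCLUDE_PATTERNS):
        return "exclude"
    if any(re.search(p, path) for p in PRIORITY_PATTERNS):
        return "priority"
    return "normal"

for path in ["/docs/setup", "/blog/tag/news", "/blog/page/7", "/about"]:
    print(path, "->", classify_url(path))
```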

Reports and export

Automatic reports
  • Weekly summary: Statistics of recent scans
  • Error alerts: Notifications when errors exceed thresholds
  • New content: List of newly found pages

Data export
  • Page CSV: Complete list for external analysis
  • Error reports: For technical problem resolution
  • Temporal statistics: Performance trends over time
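
The exported CSVs are enough to rebuild the automatic reports yourself, for example a summary with an error-threshold alert. A sketch, again assuming hypothetical column names:

```python
import csv
from collections import Counter

def summarize(pages_csv: str, errors_csv: str, alert_ratio: float = 0.05) -> None:
    """Print a short summary from the exported page and error lists."""
    with open(pages_csv, newline="", encoding="utf-8") as f:
        pages = list(csv.DictReader(f))
    with open(errors_csv, newline="", encoding="utf-8") as f:
        errors = list(csv.DictReader(f))

    print(f"Processed pages: {len(pages)}")
    print(f"Pages with errors: {len(errors)}")
    print("Errors by type:", Counter(e["error_type"] for e in errors).most_common(5))

    total = len(pages) + len(errors)
    if total and len(errors) / total > alert_ratio:
        print(f"ALERT: error share {len(errors) / total:.1%} exceeds {alert_ratio:.0%}")

summarize("pages.csv", "errors.csv")
```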