Spider Quick Reference
Web crawler for discovering and indexing pages and links
Key Features
- Crawl websites and discover links
- Index page content
- Follow links automatically
- Configurable crawl depth
- URL filtering and exclusions
- Crawl rate limiting
- Queue-based crawling
- Real-time crawl monitoring
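The queue-based, depth-limited crawl described above can be sketched in plain JavaScript. This is a minimal illustration, not the Spider implementation: `fetchLinks` is a hypothetical stand-in for the real page-fetch and link-extraction step, and the option names mirror the ones documented below.

```javascript
// Minimal breadth-first crawl sketch honoring maxDepth and maxPages.
// fetchLinks(url) is a stand-in returning the links found on a page.
function crawl(startUrl, fetchLinks, options) {
    var maxDepth = options.maxDepth || 3;
    var maxPages = options.maxPages || 100;
    var queue = [{ url: startUrl, depth: 0 }]; // FIFO queue => breadth-first
    var visited = {};
    var crawled = [];
    while (queue.length > 0 && crawled.length < maxPages) {
        var item = queue.shift();
        if (visited[item.url]) continue;       // skip URLs already seen
        visited[item.url] = true;
        crawled.push(item.url);
        if (item.depth >= maxDepth) continue;  // at the depth limit: index, but don't follow
        var links = fetchLinks(item.url);
        for (var i = 0; i < links.length; i++) {
            if (!visited[links[i]]) {
                queue.push({ url: links[i], depth: item.depth + 1 });
            }
        }
    }
    return crawled;
}
```

Because the queue is first-in-first-out, all pages at depth N are crawled before any page at depth N+1, which is what makes a depth cutoff meaningful.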
Server-Side Usage:
// Get the Spider API
var Spider = js.getObject("/OpenForum/AddOn/Spider","Spider.sjs");
// Start spider crawl
Spider.crawl("https://example.com", {
    maxDepth: 3,
    maxPages: 100,
    followExternal: false
});
// Stop spider
Spider.stop();
// Get crawl status
var status = Spider.getStatus();
// Get discovered URLs
var urls = Spider.getDiscoveredURLs();
// Configure spider
Spider.configure({
    maxDepth: 5,
    maxPages: 500,
    delay: 1000, // milliseconds between requests
    userAgent: "OpenForum Spider"
});
Client-Side Usage:
// Start spider
JSON.post('/OpenForum/AddOn/Spider/Start', null,
    'url=' + encodeURIComponent('https://example.com') +
    '&maxDepth=3&maxPages=100')
    .onSuccess(function(result) {
        console.log('Spider started:', result);
    }).go();
// Stop spider
JSON.post('/OpenForum/AddOn/Spider/Stop')
    .onSuccess(function(result) {
        console.log('Spider stopped');
    }).go();
// Get status
JSON.get('/OpenForum/AddOn/Spider/Status')
    .onSuccess(function(status) {
        console.log('Crawl progress:', status);
    }).go();
Configuration Options
- maxDepth - Maximum link depth to follow
- maxPages - Maximum number of pages to crawl
- delay - Delay between requests (milliseconds)
- followExternal - Follow external links (true/false)
- userAgent - User agent string
- excludePatterns - URL patterns to exclude
- includePatterns - Crawl only URLs matching these patterns
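A sketch of how excludePatterns and includePatterns might be applied. The actual pattern syntax Spider uses is not documented here; this example assumes regular-expression strings, with excludes taking precedence over includes:

```javascript
// Hypothetical URL filter: a URL passes only if it matches no exclude
// pattern and, when include patterns are given, matches at least one.
function shouldCrawl(url, includePatterns, excludePatterns) {
    var i;
    for (i = 0; i < excludePatterns.length; i++) {
        if (new RegExp(excludePatterns[i]).test(url)) return false;
    }
    if (includePatterns.length === 0) return true; // no includes => allow all
    for (i = 0; i < includePatterns.length; i++) {
        if (new RegExp(includePatterns[i]).test(url)) return true;
    }
    return false;
}
```

For example, `shouldCrawl(url, ["^https://example\\.com"], ["\\.pdf$"])` would restrict the crawl to example.com while skipping PDF files.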
Status Information
- Pages crawled
- Queue size
- Current URL
- Discovered links
- Errors encountered
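The exact shape of the Status response isn't specified here. Assuming fields corresponding to the list above (the key names `pagesCrawled`, `queueSize`, `currentURL`, and `errors` are guesses; check the actual response), a small progress summary could look like:

```javascript
// Hypothetical status shape: { pagesCrawled, queueSize, currentURL, errors }.
// Field names are assumptions, not the documented Status response keys.
function summarizeStatus(status) {
    var total = status.pagesCrawled + status.queueSize;
    var pct = total > 0 ? Math.round((status.pagesCrawled / total) * 100) : 0;
    return status.pagesCrawled + " crawled, " + status.queueSize +
        " queued (" + pct + "% of known URLs), " +
        status.errors.length + " error(s), at " + status.currentURL;
}
```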
Configuration
- Starting URL
- Crawl depth limits
- Rate limiting
- URL filters and exclusions