🕷️ Crawler Inspector
Raw Queries and Responses
1. Shard Calculation
Query:
curl -X POST \
  'http://laksa143.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d "SELECT
        getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html')) AS root_hash,
        root_hash % 200 AS shard
      FORMAT JSONEachRow"
Response:
{"root_hash":"2124610269177942191","shard":191}
Calculated Shard:
191 (from laksa143)
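The shard above is the root hash modulo 200, and the later queries in this report go to laksa191, which suggests that the shard number selects the host. A minimal Python sketch of that step follows. The helper names `parse_shard_response` and `shard_host` are hypothetical, and the `laksa<shard>.int.ahrefs` naming is an assumption that this report only confirms for shard 191.

```python
import json

SHARD_COUNT = 200  # modulus used in the shard query above


def parse_shard_response(line: str) -> tuple:
    """Parse one JSONEachRow line and return (root_hash, shard)."""
    row = json.loads(line)
    # The 64-bit root_hash arrives as a JSON string, so convert it explicitly.
    root_hash = int(row["root_hash"])
    shard = int(row["shard"])
    # Cross-check the server's arithmetic locally.
    if root_hash % SHARD_COUNT != shard:
        raise ValueError(f"shard mismatch: {root_hash} % {SHARD_COUNT} != {shard}")
    return root_hash, shard


def shard_host(shard: int) -> str:
    """Assumed mapping from shard number to host name.

    Only 191 -> laksa191 is visible in this report; zero padding for
    small shard numbers is unknown and not handled here.
    """
    return f"laksa{shard}.int.ahrefs"


root_hash, shard = parse_shard_response(
    '{"root_hash":"2124610269177942191","shard":191}'
)
print(shard, shard_host(shard))
```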
2. Crawled Status Check
Query:
curl -X POST \
  'http://laksa191.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d "SELECT
        getAhrefsURLFromUnparsed(src_unparsed) AS found_url,
        ifNull(toUnixTimestamp(download_stamp), 0) AS crawl_time,
        ifNull(toUnixTimestamp(props_url_first_seen), 0) AS first_indexed_time,
        download_http_code AS http_code,
        src_unparsed AS src_unparsed,
        src_root_hash AS src_root_hash,
        history_drop_reason AS history_drop_reason,
        meta_title AS meta_title,
        meta_descriptions AS meta_descriptions,
        attrs_boilerpipe_text AS attrs_boilerpipe_text,
        attrs_markdown AS attrs_markdown,
        attrs_readable_markdown AS attrs_readable_markdown,
        meta_canonical AS meta_canonical
      FROM crawler3.page_info_local FINAL
      PREWHERE (src_root_hash, src_unparsed) IN ((
        getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html')),
        getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html')))
      FORMAT JSONEachRow"
Response:
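The `Authorization: Basic YXBpOg==` header used by the ClickHouse queries in this report is the Base64 encoding of `api:` (user `api`, empty password). The sketch below shows how the same POST request could be assembled with the Python standard library. `basic_auth_header` and `build_clickhouse_request` are hypothetical helper names, and the block only builds the request without sending it.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str = "") -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_clickhouse_request(
    host: str, sql: str, database: str = "crawler3"
) -> urllib.request.Request:
    """Assemble the POST request; the SQL text travels as the request body."""
    return urllib.request.Request(
        url=f"http://{host}:8124/",
        data=sql.encode("utf-8"),
        headers={
            "Content-Type": "text/plain",
            "X-ClickHouse-Database": database,
            "Authorization": basic_auth_header("api"),
        },
        method="POST",
    )


req = build_clickhouse_request("laksa191.int.ahrefs", "SELECT 1 FORMAT JSONEachRow")
print(req.get_method(), req.full_url)
# To send it from a network that can reach the host:
#   urllib.request.urlopen(req, timeout=5)
```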
3. Robots.txt Check
Query:
curl -sS --get \
  'http://fish014.int.ahrefs:12055/access' \
  --data-urlencode 'max_retries=0' \
  --data-urlencode 'pid=1775492330:2988755:page-crawl-status-tool@yepsand' \
  --data-urlencode 'kind=check' \
  --data-urlencode 'url=https://www.mattresspluscanada.ca/2347832442bedding.html'
Response:
"Allowed"
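With `--get`, curl appends the URL-encoded parameters to the query string, and the response body is a JSON string. A rough Python equivalent of building that URL and decoding the verdict is sketched below. `build_access_check_url` is a hypothetical helper name, and the set of possible verdicts other than `Allowed` is not shown in this report.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_access_check_url(base: str, page_url: str, pid: str) -> str:
    """Build the GET URL with every parameter percent-encoded."""
    params = {
        "max_retries": "0",
        "pid": pid,
        "kind": "check",
        "url": page_url,
    }
    return f"{base}?{urlencode(params)}"


check_url = build_access_check_url(
    "http://fish014.int.ahrefs:12055/access",
    "https://www.mattresspluscanada.ca/2347832442bedding.html",
    "1775492330:2988755:page-crawl-status-tool@yepsand",
)
print(check_url)

# The body shown above is a quoted JSON string, so json.loads unwraps it.
verdict = json.loads('"Allowed"')
print(verdict)
```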
4. Spam/Ban Check
Query:
curl -X POST \
  'http://laksa191.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d "SELECT fh_dont_index, ml_spam_score
      FROM robots.target_settings_local FINAL
      WHERE src_root_hash = getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html'))
        AND startsWith(getAhrefsDropPortFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html')), src_unparsed_prefix)
      ORDER BY length(src_unparsed_prefix) DESC
      LIMIT 1
      FORMAT JSONEachRow"
Response:
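The spam/ban query keeps only the settings row whose `src_unparsed_prefix` is the longest prefix of the URL's unparsed form (`startsWith` plus `ORDER BY length(...) DESC LIMIT 1`). The same selection can be sketched in Python as below. `longest_matching_prefix` is a hypothetical helper, and the sample strings are invented placeholders because the real unparsed format does not appear in this report.

```python
from typing import Optional


def longest_matching_prefix(unparsed: str, prefixes: list) -> Optional[str]:
    """Return the longest prefix that `unparsed` starts with, or None.

    Assumes the candidates were already restricted to the same root hash,
    as the WHERE clause of the query does.
    """
    matches = [p for p in prefixes if unparsed.startswith(p)]
    return max(matches, key=len, default=None)


# Placeholder data for illustration only.
candidates = ["example.com/", "example.com/shop/", "example.com/blog/"]
print(longest_matching_prefix("example.com/shop/bedding.html", candidates))
print(longest_matching_prefix("other.org/page", candidates))
```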
5. Seen Status Check
Query:
curl -X POST \
  'http://laksa191.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d "(SELECT
         getAhrefsURLFromUnparsed(dst_unparsed) AS found_url,
         dst_unparsed AS unparsed,
         dst_root_hash AS root_hash
       FROM crawler3.urls_local FINAL
       PREWHERE (dst_root_hash, dst_unparsed) IN ((
         getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html')),
         getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html'))))
      UNION ALL
      (SELECT
         getAhrefsURLFromUnparsed(src_unparsed) AS found_url,
         src_unparsed AS unparsed,
         src_root_hash AS root_hash
       FROM web_queue.crawl5_local FINAL
       PREWHERE crawl_yyyymm >= toYYYYMM(today() - INTERVAL 2 MONTHS)
         AND (src_root_hash, src_unparsed) IN ((
           getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html')),
           getAhrefsUnparsedNoserviceFromURL('https://www.mattresspluscanada.ca/2347832442bedding.html'))))
      FORMAT JSONEachRow"
Response:
Crawled check error: Failed to connect to laksa191.int.ahrefs port 8124 after 1 ms: Could not connect to server
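A blank response can mean either "the query returned no rows" or "the query never ran", and the connection error above suggests the latter for the laksa191 checks. The sketch below shows one way a caller might keep the two cases apart when reading JSONEachRow output. `parse_json_each_row` and `summarize_check` are hypothetical helper names, and the status strings are illustrative only.

```python
import json
from typing import Optional


def parse_json_each_row(body: str) -> list:
    """Parse JSONEachRow output: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def summarize_check(body: Optional[str], error: Optional[str] = None) -> str:
    """Report a transport failure separately from an empty result set."""
    if error is not None:
        # The query did not run, so absence of rows proves nothing.
        return f"unknown ({error})"
    rows = parse_json_each_row(body or "")
    return f"{len(rows)} row(s)" if rows else "no rows"


print(summarize_check('{"root_hash":"2124610269177942191","shard":191}\n'))
print(summarize_check(""))
print(summarize_check(None, error="Could not connect to server"))
```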