🕷️ Crawler Inspector
Raw Queries and Responses
1. Shard Calculation
Query:
curl -X POST \
  'http://laksa039.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d "SELECT getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578')) AS root_hash, root_hash % 200 AS shard FORMAT JSONEachRow"
Response:
{"root_hash":"11387554747766093965","shard":165}
Calculated Shard:
165 (from laksa039)
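The shard arithmetic in this step can be reproduced offline. A minimal sketch, assuming the response is a single JSONEachRow line as shown above and that `root_hash` arrives as a quoted string. The helper names `parse_shard_response` and `shard_host` are hypothetical, and the `laksaNNN.int.ahrefs` host pattern is an assumption inferred only from the two hostnames visible on this page:

```python
import json

SHARD_COUNT = 200  # taken from `root_hash % 200` in the query


def parse_shard_response(line: str) -> tuple[int, int]:
    """Return (root_hash, shard) from one JSONEachRow line."""
    row = json.loads(line)
    root_hash = int(row["root_hash"])  # quoted in the response shown, so convert explicitly
    shard = int(row["shard"])
    # Recompute locally to catch a mismatch between the two fields.
    if root_hash % SHARD_COUNT != shard:
        raise ValueError(f"shard mismatch: {root_hash} % {SHARD_COUNT} != {shard}")
    return root_hash, shard


def shard_host(shard: int) -> str:
    """Hypothetical shard-to-host mapping; the naming pattern is an assumption."""
    return f"laksa{shard:03d}.int.ahrefs"


root_hash, shard = parse_shard_response('{"root_hash":"11387554747766093965","shard":165}')
print(shard, shard_host(shard))  # prints: 165 laksa165.int.ahrefs
```

Python integers are arbitrary precision, so the 64-bit unsigned hash needs no special handling here.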
2. Crawled Status Check
Query:
curl -X POST \
  'http://laksa165.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d "SELECT
        getAhrefsURLFromUnparsed(src_unparsed) AS found_url,
        ifNull(toUnixTimestamp(download_stamp), 0) AS crawl_time,
        ifNull(toUnixTimestamp(props_url_first_seen), 0) AS first_indexed_time,
        download_http_code AS http_code,
        src_unparsed AS src_unparsed,
        src_root_hash AS src_root_hash,
        history_drop_reason AS history_drop_reason,
        meta_title AS meta_title,
        meta_descriptions AS meta_descriptions,
        attrs_boilerpipe_text AS attrs_boilerpipe_text,
        attrs_markdown AS attrs_markdown,
        attrs_readable_markdown AS attrs_readable_markdown,
        meta_canonical AS meta_canonical
      FROM crawler3.page_info_local FINAL
      PREWHERE (src_root_hash, src_unparsed) IN ((getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578')), getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578')))
      FORMAT JSONEachRow"
Response:
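The crawled-status query embeds the URL twice as a SQL string literal, so a URL containing a single quote or backslash would break the statement unless it is escaped. A minimal sketch of building the lookup predicate safely, assuming ClickHouse's backslash escaping for string literals; `sql_string_literal` and `crawled_lookup_predicate` are hypothetical helper names, not part of the tool:

```python
def sql_string_literal(value: str) -> str:
    """Quote a Python string as a single-quoted ClickHouse string literal."""
    # Escape backslashes first so the escapes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def crawled_lookup_predicate(url: str) -> str:
    """Build the tuple predicate used in the PREWHERE clause of the query above."""
    unparsed = f"getAhrefsUnparsedNoserviceFromURL({sql_string_literal(url)})"
    return (
        "(src_root_hash, src_unparsed) IN "
        f"((getAhrefsRootHashFromUnparsed({unparsed}), {unparsed}))"
    )


# Illustrative URL containing a quote; not taken from the page.
print(crawled_lookup_predicate("https://example.com/it's"))
```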
3. Robots.txt Check
Query:
curl -sS --get \
  'http://fish193.int.ahrefs:12055/access' \
  --data-urlencode 'max_retries=0' \
  --data-urlencode 'pid=1775593456:4187748:page-crawl-status-tool@yepsand' \
  --data-urlencode 'kind=check' \
  --data-urlencode 'url=https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578'
Response:
"Allowed"
4. Spam/Ban Check
Query:
curl -X POST \
  'http://laksa165.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d "SELECT fh_dont_index, ml_spam_score
      FROM robots.target_settings_local FINAL
      WHERE src_root_hash = getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578'))
        AND startsWith(getAhrefsDropPortFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578')), src_unparsed_prefix)
      ORDER BY length(src_unparsed_prefix) DESC
      LIMIT 1
      FORMAT JSONEachRow"
Response:
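The spam/ban query picks the most specific rule for a URL: it keeps rows whose `src_unparsed_prefix` is a prefix of the URL's unparsed form, then takes the longest one (`ORDER BY length(...) DESC LIMIT 1`). A small in-memory sketch of that selection, assuming rows are plain dictionaries; the function name `most_specific_rule` and all sample rows are hypothetical, and the root-hash equality filter from the query is left out for brevity:

```python
from typing import Optional


def most_specific_rule(unparsed: str, rules: list[dict]) -> Optional[dict]:
    """Mirror `startsWith(...) ORDER BY length(prefix) DESC LIMIT 1` in memory."""
    matching = [r for r in rules if unparsed.startswith(r["src_unparsed_prefix"])]
    return max(matching, key=lambda r: len(r["src_unparsed_prefix"]), default=None)


# Hypothetical rows; the real unparsed encoding is internal and not shown on this page.
rules = [
    {"src_unparsed_prefix": "example.com/", "fh_dont_index": 0, "ml_spam_score": 0.1},
    {"src_unparsed_prefix": "example.com/shop/", "fh_dont_index": 1, "ml_spam_score": 0.7},
    {"src_unparsed_prefix": "example.com/blog/", "fh_dont_index": 0, "ml_spam_score": 0.2},
]

rule = most_specific_rule("example.com/shop/item-1", rules)
print(rule)
```

When no prefix matches, the function returns `None`, which corresponds to the query returning no rows.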
5. Seen Status Check
Query:
curl -X POST \
  'http://laksa165.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d "(SELECT getAhrefsURLFromUnparsed(dst_unparsed) AS found_url, dst_unparsed AS unparsed, dst_root_hash AS root_hash
       FROM crawler3.urls_local FINAL
       PREWHERE (dst_root_hash, dst_unparsed) IN ((getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578')), getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578'))))
      UNION ALL
      (SELECT getAhrefsURLFromUnparsed(src_unparsed) AS found_url, src_unparsed AS unparsed, src_root_hash AS root_hash
       FROM web_queue.crawl5_local FINAL
       PREWHERE crawl_yyyymm >= toYYYYMM(today() - INTERVAL 2 MONTHS)
         AND (src_root_hash, src_unparsed) IN ((getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578')), getAhrefsUnparsedNoserviceFromURL('https://www.amazon.com/Memes-Bikini-Bottom-SpongeBob-Squarepants/dp/1546147578'))))
      FORMAT JSONEachRow"
Response:
Crawled check error: Failed to connect to laksa165.int.ahrefs port 8124 after 1 ms: Could not connect to server
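The error above is a refused connection to a shard host. A hedged sketch of how a client could run one check and report that kind of failure in a similar per-check form, assuming the HTTP interface on port 8124 and JSONEachRow output as used on this page. `post_query` and `run_check` are hypothetical names, the `Authorization` header is deliberately omitted, and the sender is injectable so the example runs without network access:

```python
import json
import urllib.error
import urllib.request
from typing import Callable


def post_query(host: str, sql: str, timeout: float = 5.0) -> str:
    """POST a SQL statement to the HTTP interface (port 8124, as on this page)."""
    req = urllib.request.Request(
        f"http://{host}:8124/",
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain", "X-ClickHouse-Database": "crawler3"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def run_check(name: str, host: str, sql: str,
              send: Callable[[str, str], str] = post_query) -> dict:
    """Run one check; return parsed JSONEachRow rows or a labelled error."""
    try:
        body = send(host, sql)
    except (urllib.error.URLError, OSError) as exc:
        return {"check": name, "ok": False, "error": f"{name} check error: {exc}"}
    # JSONEachRow is one JSON object per line; blank lines are skipped.
    rows = [json.loads(line) for line in body.splitlines() if line.strip()]
    return {"check": name, "ok": True, "rows": rows}


def refuse(host: str, sql: str) -> str:
    """Stand-in sender that simulates an unreachable host."""
    raise ConnectionRefusedError(f"could not connect to {host}")


result = run_check("Crawled", "laksa165.int.ahrefs", "SELECT 1 FORMAT JSONEachRow", send=refuse)
print(result["error"])
```

Returning an error record instead of raising lets the remaining checks run even when one shard host is down.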