ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.2 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://www.hiascend.com/document/detail/zh/canncommercial/700/modeldevpt/ptmigr/AImpug_000028.html |
| Last Crawled | 2026-04-11 11:13:30 (5 days ago) |
| First Indexed | 2024-10-08 05:01:26 (1 year ago) |
| HTTP Status Code | 200 |
| Meta Title | 拉起多卡分布式训练-多卡分布式训练-模型训练-模型迁移与训练-PyTorch 网络模型迁移和训练-模型开发(PyTorch)-CANN商用版7.0.0开发文档-昇腾社区 |
| Meta Description | 拉起多卡分布式训练 在单机和多机场景下,有4种方式可拉起分布式训练,分别为shell脚本方式(推荐)、mp.spawn方式、Python方式、torchrun方式。其中torchrun方式仅在PyTorch 1.11.0及以上版本支持使用。以下内容以一个简单模型脚本为样例,展示前3种拉起方式分别需要对脚本代码进行的修改。torchrun方式的代码修改与shell脚本
| Meta Canonical | null |
| Boilerpipe Text | 拉起多卡分布式训练
在单机和多机场景下,有4种方式可拉起分布式训练,分别为shell脚本方式(推荐)、mp.spawn方式、Python方式、torchrun方式。其中torchrun方式仅在PyTorch 1.11.0及以上版本支持使用。以下内容以一个简单模型脚本为样例,展示前3种拉起方式分别需要对脚本代码进行的修改。torchrun方式的代码修改与shell脚本方式完全相同。
集合通信存在如下约束:
数据并行模式中不同device上执行的图相同。
针对Atlas 训练系列产品:allreduce和reduce_scatter仅支持int8、int32、float16和float32数据类型。
针对Atlas A2 训练系列产品:allreduce和reduce_scatter仅支持int8, int32, float16, float32和bfp16数据类型。
分布式训练场景下,HCCL会使用Host服务器的部分端口进行集群信息收集,需要操作系统预留该部分端口。默认情况下,HCCL使用60000-60015端口,若通过环境变量HCCL_IF_BASE_PORT指定了Host网卡起始端口,则需要预留以该端口起始的16个端口。
操作系统端口号预留示例:
sysctl -w net.ipv4.ip_local_reserved_ports=60000-60015
若用户准备进行2卡训练,可将8卡训练脚本进行改写,改为2卡训练脚本。可参见以下修改方法:
若8卡脚本的batchsize是单卡脚本的batchsize的8倍,则将8卡训练时的batch size和learning rate同时除以4,作为2卡训练时的batch size和learning rate。
如果使用for循环启动训练入口脚本,则将for循环的次数改为2次。
world size或者rank size修改为2,并确保训练脚本中dist.init_process_group()中world_size参数为2。
如果有指定device list参数,且取值范围为0-7,则将其改为0-1。
构建简单模型
我们先构建一个简单的神经网络。
# 导入依赖和库
import torch
from torch import nn
import torch_npu
import torch.distributed as dist
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import time
import torch.multiprocessing as mp
import os
torch.manual_seed(0)
# 下载训练数据
training_data = datasets.FashionMNIST(
root="./data",
train=True,
download=True,
transform=ToTensor(),
)
# 下载测试数据
test_data = datasets.FashionMNIST(
root="./data",
train=False,
download=True,
transform=ToTensor(),
)
# 构建模型
class NeuralNetwork(nn.Module):
def __init__(self):
super().__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
获取超参数
在主函数main中获取训练所需的超参数。
shell脚本/torchrun方式
def main(world_size: int, batch_size = 64, total_epochs = 5,):  # 用户可自行设置
ngpus_per_node = world_size
main_worker(args.gpu, ngpus_per_node, args)
mp.spawn方式
def main(world_size: int, batch_size = 64, total_epochs = 5,):  # 用户可自行设置
ngpus_per_node = world_size
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
# mp.spawn方式启动
Python方式
def main(world_size: int, batch_size, args):  # 使用Python拉起命令中设置的超参数
ngpus_per_node = world_size
args.gpu = args.local_rank
# 任务拉起后,local_rank自动获得device号
main_worker(args.gpu, ngpus_per_node, args)
设置地址和端口号
由于昇腾AI处理器初始化进程组时init_method只支持env:// (即环境变量初始化方式),所以在初始化前需要配置MASTER_ADDR、MASTER_PORT等参数。用户需根据自己实际情况配置。
shell脚本方式、mp.spawn拉起方式和torchrun方式的配置代码相同,如下所示:
def ddp_setup(rank, world_size):
"""
Args:
rank: Unique identifier of each process
world_size: Total number of processes
"""
os.environ["MASTER_ADDR"] =
"localhost"
# 用户需根据自己实际情况设置
os.environ["MASTER_PORT"] =
"***" # 用户需根据自己实际情况设置
dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
Python方式需要把配置参数的命令放到拉起训练中。脚本中代码如下所示:
def ddp_setup(rank, world_size):
"""
Args:
rank: Unique identifier of each process
world_size: Total number of processes
"""
dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
添加分布式逻辑
不同的拉起训练方式下,device号的获取方式不同:
shell脚本方式:在shell脚本中循环传入local_rank变量作为指定的device。
mp.spawn方式:mp.spawn多进程拉起main_worker后,第一个参数GPU自动获得device号(0 ~ ngpus_per_node - 1)。
Python方式:任务拉起后,local_rank自动获得device号。
用户需根据自己选择的方式对代码做不同的修改。
shell脚本/torchrun方式
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
args.gpu = int(os.environ['LOCAL_RANK'])
# 在shell脚本中循环传入local_rank变量作为指定的device
ddp_setup(args.gpu, args.world_size)
torch_npu.npu.set_device(args.gpu)
total_batch_size = args.batch_size
total_workers = ngpus_per_node
batch_size = int(total_batch_size / ngpus_per_node)
workers = int((total_workers + ngpus_per_node - 1) / ngpus_per_node)
model = NeuralNetwork()
device = torch.device("npu")
train_sampler = torch.utils.data.distributed.DistributedSampler(training_data)
test_sampler = torch.utils.data.distributed.DistributedSampler(test_data)
train_loader = torch.utils.data.DataLoader(
training_data, batch_size=batch_size, shuffle=(train_sampler is None),
num_workers=workers, pin_memory=False, sampler=train_sampler, drop_last=True)
val_loader = torch.utils.data.DataLoader(
test_data, batch_size=batch_size, shuffle=(test_sampler is None),
num_workers=workers, pin_memory=False, sampler=test_sampler, drop_last=True)
loc = 'npu:{}'.format(args.gpu)
model = model.to(loc)
criterion = nn.CrossEntropyLoss().to(loc)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
for epoch in range(start_epoch, end_epoch):
print("curr epoch: ", epoch)
train_sampler.set_epoch(epoch)
train(train_loader, model, criterion, optimizer, epoch, args.gpu)
def train(train_loader, model, criterion, optimizer, epoch, gpu):
size = len(train_loader.dataset)
model.train()
end = time.time()
for i, (images, target) in enumerate(train_loader):
# measure data loading time
loc = 'npu:{}'.format(gpu)
target = target.to(torch.int32)
images, target = images.to(loc, non_blocking=False), target.to(loc, non_blocking=False)
# compute output
output = model(images)
loss = criterion(output, target)
# compute gradient and do SGD step
optimizer.zero_grad()
loss.backward()
optimizer.step()
end = time.time()
if i % 100 == 0:
loss, current = loss.item(), i * len(target)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
mp.spawn方式
不需要专门设置args.gpu,将shell脚本方式中main_worker里的args.gpu均替换为gpu。
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
ddp_setup(gpu, args.world_size)
torch_npu.npu.set_device(gpu)
total_batch_size = args.batch_size
total_workers = ngpus_per_node
batch_size = int(total_batch_size / ngpus_per_node)
workers = int((total_workers + ngpus_per_node - 1) / ngpus_per_node)
model = NeuralNetwork()
device = torch.device("npu")
train_sampler = torch.utils.data.distributed.DistributedSampler(training_data)
test_sampler = torch.utils.data.distributed.DistributedSampler(test_data)
train_loader = torch.utils.data.DataLoader(
training_data, batch_size=batch_size, shuffle=(train_sampler is None),
num_workers=workers, pin_memory=False, sampler=train_sampler, drop_last=True)
val_loader = torch.utils.data.DataLoader(
test_data, batch_size=batch_size, shuffle=(test_sampler is None),
num_workers=workers, pin_memory=False, sampler=test_sampler, drop_last=True)
loc = 'npu:{}'.format(gpu)
model = model.to(loc)
criterion = nn.CrossEntropyLoss().to(loc)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
for epoch in range(start_epoch, end_epoch):
print("curr epoch: ", epoch)
train_sampler.set_epoch(epoch)
train(train_loader, model, criterion, optimizer, epoch, gpu)
...... # train函数代码同shell脚本方式
Python方式
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
args.gpu = args.local_rank
# 任务拉起后,local_rank自动获得device号
ddp_setup(args.gpu, args.world_size)
...... # 其余代码同shell脚本方式
设置参数
在模型脚本中,根据拉起方式不同,设置不同的参数。
shell脚本/torchrun方式
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size)
mp.spawn方式
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size)
Python方式
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
parser.add_argument("--local_rank", default=-1, type=int) # local_rank用于自动获取device号。使用mp.spawn方式与shell方式启动时需删除此项
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size, args)  # 需将Python拉起命令中设置的参数传入main函数
拉起训练
以下拉起训练的命令为示例,用户可根据实际情况自行更改。
shell脚本方式
export HCCL_WHITELIST_DISABLE=1
RANK_ID_START=0
WORLD_SIZE=8
for((RANK_ID=$RANK_ID_START;RANK_ID<$((WORLD_SIZE+RANK_ID_START));RANK_ID++));
do
echo "Device ID: $RANK_ID"
export LOCAL_RANK=$RANK_ID
python3 ddp_test_shell.py &
done
wait
mp.spawn方式
export HCCL_WHITELIST_DISABLE=1
python3 ddp_test_spwn.py
Python方式
# master_addr和master_port参数需用户根据实际情况设置
export HCCL_WHITELIST_DISABLE=1
python3 -m torch.distributed.launch --nproc_per_node 8 --master_addr localhost --master_port *** ddp_test.py
torchrun方式(PyTorch 1.11.0及以上版本支持)
export HCCL_WHITELIST_DISABLE=1
torchrun --standalone --nnodes=1 --nproc_per_node=8 ddp_test_shell.py
当屏幕打印类似下图中的Loss数值时,说明拉起训练成功。
父主题:
多卡分布式训练 |
| Markdown | # 拉起多卡分布式训练
在单机和多机场景下,有4种方式可拉起分布式训练,分别为shell脚本方式(推荐)、mp.spawn方式、Python方式、torchrun方式。其中torchrun方式仅在PyTorch 1.11.0及以上版本支持使用。以下内容以一个简单模型脚本为样例,展示前3种拉起方式分别需要对脚本代码进行的修改。torchrun方式的代码修改与shell脚本方式完全相同。

1. 集合通信存在如下约束:
- 数据并行模式中不同device上执行的图相同。
- 针对Atlas 训练系列产品:allreduce和reduce\_scatter仅支持int8、int32、float16和float32数据类型。
- 针对Atlas A2 训练系列产品:allreduce和reduce\_scatter仅支持int8, int32, float16, float32和bfp16数据类型。
- 分布式训练场景下,HCCL会使用Host服务器的部分端口进行集群信息收集,需要操作系统预留该部分端口。默认情况下,HCCL使用60000-60015端口,若通过环境变量HCCL\_IF\_BASE\_PORT指定了Host网卡起始端口,则需要预留以该端口起始的16个端口。
操作系统端口号预留示例:
```
sysctl -w net.ipv4.ip_local_reserved_ports=60000-60015
```
2. 若用户准备进行2卡训练,可将8卡训练脚本进行改写,改为2卡训练脚本。可参见以下修改方法:
1. 若8卡脚本的batchsize是单卡脚本的batchsize的8倍,则将8卡训练时的batch size和learning rate同时除以4,作为2卡训练时的batch size和learning rate。
2. 如果使用for循环启动训练入口脚本,则将for循环的次数改为2次。
3. world size或者rank size修改为2,并确保训练脚本中dist.init\_process\_group()中world\_size参数为2。
4. 如果有指定device list参数,且取值范围为0-7,则将其改为0-1。
#### 构建简单模型
我们先构建一个简单的神经网络。
```
# 导入依赖和库
import torch
from torch import nn
import torch_npu
import torch.distributed as dist
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import time
import torch.multiprocessing as mp
import os
torch.manual_seed(0)
# 下载训练数据
training_data = datasets.FashionMNIST(
root="./data",
train=True,
download=True,
transform=ToTensor(),
)
# 下载测试数据
test_data = datasets.FashionMNIST(
root="./data",
train=False,
download=True,
transform=ToTensor(),
)
# 构建模型
class NeuralNetwork(nn.Module):
def __init__(self):
super().__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
```
#### 获取超参数
在主函数main中获取训练所需的超参数。
- shell脚本/torchrun方式
```
def main(world_size: int, batch_size = 64, total_epochs = 5,): # 用户可自行设置
ngpus_per_node = world_size
main_worker(args.gpu, ngpus_per_node, args)
```
- mp.spawn方式
```
def main(world_size: int, batch_size = 64, total_epochs = 5,): # 用户可自行设置
ngpus_per_node = world_size
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) # mp.spawn方式启动
```
- Python方式
```
def main(world_size: int, batch_size, args): # 使用Python拉起命令中设置的超参数
ngpus_per_node = world_size
args.gpu = args.local_rank # 任务拉起后,local_rank自动获得device号
main_worker(args.gpu, ngpus_per_node, args)
```
#### 设置地址和端口号
由于昇腾AI处理器初始化进程组时init\_method只支持env:// (即环境变量初始化方式),所以在初始化前需要配置MASTER\_ADDR、MASTER\_PORT等参数。用户需根据自己实际情况配置。
- shell脚本方式、mp.spawn拉起方式和torchrun方式的配置代码相同,如下所示:
```
def ddp_setup(rank, world_size):
"""
Args:
rank: Unique identifier of each process
world_size: Total number of processes
"""
os.environ["MASTER_ADDR"] = "localhost" # 用户需根据自己实际情况设置
os.environ["MASTER_PORT"] = "***" # 用户需根据自己实际情况设置
dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
```
- Python方式需要把配置参数的命令放到拉起训练中。脚本中代码如下所示:
```
def ddp_setup(rank, world_size):
"""
Args:
rank: Unique identifier of each process
world_size: Total number of processes
"""
dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
```
#### 添加分布式逻辑
不同的拉起训练方式下,device号的获取方式不同:
- shell脚本方式:在shell脚本中循环传入local\_rank变量作为指定的device。
- mp.spawn方式:mp.spawn多进程拉起main\_worker后,第一个参数GPU自动获得device号(0 ~ ngpus\_per\_node - 1)。
- Python方式:任务拉起后,local\_rank自动获得device号。
用户需根据自己选择的方式对代码做不同的修改。
- shell脚本/torchrun方式
```
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
args.gpu = int(os.environ['LOCAL_RANK']) # 在shell脚本中循环传入local_rank变量作为指定的device
ddp_setup(args.gpu, args.world_size)
torch_npu.npu.set_device(args.gpu)
total_batch_size = args.batch_size
total_workers = ngpus_per_node
batch_size = int(total_batch_size / ngpus_per_node)
workers = int((total_workers + ngpus_per_node - 1) / ngpus_per_node)
model = NeuralNetwork()
device = torch.device("npu")
train_sampler = torch.utils.data.distributed.DistributedSampler(training_data)
test_sampler = torch.utils.data.distributed.DistributedSampler(test_data)
train_loader = torch.utils.data.DataLoader(
training_data, batch_size=batch_size, shuffle=(train_sampler is None),
num_workers=workers, pin_memory=False, sampler=train_sampler, drop_last=True)
val_loader = torch.utils.data.DataLoader(
test_data, batch_size=batch_size, shuffle=(test_sampler is None),
num_workers=workers, pin_memory=False, sampler=test_sampler, drop_last=True)
loc = 'npu:{}'.format(args.gpu)
model = model.to(loc)
criterion = nn.CrossEntropyLoss().to(loc)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
for epoch in range(start_epoch, end_epoch):
print("curr epoch: ", epoch)
train_sampler.set_epoch(epoch)
train(train_loader, model, criterion, optimizer, epoch, args.gpu)
def train(train_loader, model, criterion, optimizer, epoch, gpu):
size = len(train_loader.dataset)
model.train()
end = time.time()
for i, (images, target) in enumerate(train_loader):
# measure data loading time
loc = 'npu:{}'.format(gpu)
target = target.to(torch.int32)
images, target = images.to(loc, non_blocking=False), target.to(loc, non_blocking=False)
# compute output
output = model(images)
loss = criterion(output, target)
# compute gradient and do SGD step
optimizer.zero_grad()
loss.backward()
optimizer.step()
end = time.time()
if i % 100 == 0:
loss, current = loss.item(), i * len(target)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
```
- mp.spawn方式
不需要专门设置args.gpu,将shell脚本方式中main\_worker里的args.gpu均替换为gpu。
```
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
ddp_setup(gpu, args.world_size)
torch_npu.npu.set_device(gpu)
total_batch_size = args.batch_size
total_workers = ngpus_per_node
batch_size = int(total_batch_size / ngpus_per_node)
workers = int((total_workers + ngpus_per_node - 1) / ngpus_per_node)
model = NeuralNetwork()
device = torch.device("npu")
train_sampler = torch.utils.data.distributed.DistributedSampler(training_data)
test_sampler = torch.utils.data.distributed.DistributedSampler(test_data)
train_loader = torch.utils.data.DataLoader(
training_data, batch_size=batch_size, shuffle=(train_sampler is None),
num_workers=workers, pin_memory=False, sampler=train_sampler, drop_last=True)
val_loader = torch.utils.data.DataLoader(
test_data, batch_size=batch_size, shuffle=(test_sampler is None),
num_workers=workers, pin_memory=False, sampler=test_sampler, drop_last=True)
loc = 'npu:{}'.format(gpu)
model = model.to(loc)
criterion = nn.CrossEntropyLoss().to(loc)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
for epoch in range(start_epoch, end_epoch):
print("curr epoch: ", epoch)
train_sampler.set_epoch(epoch)
train(train_loader, model, criterion, optimizer, epoch, gpu)
...... # train函数代码同shell脚本方式
```
- Python方式
```
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
args.gpu = args.local_rank # 任务拉起后,local_rank自动获得device号
ddp_setup(args.gpu, args.world_size)
...... # 其余代码同shell脚本方式
```
#### 设置参数
在模型脚本中,根据拉起方式不同,设置不同的参数。
- shell脚本/torchrun方式
```
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size)
```
- mp.spawn方式
```
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size)
```
- Python方式
```
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
parser.add_argument("--local_rank", default=-1, type=int) # local_rank用于自动获取device号。使用mp.spawn方式与shell方式启动时需删除此项
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size, args) # 需将Python拉起命令中设置的参数传入main函数
```
#### 拉起训练
以下拉起训练的命令为示例,用户可根据实际情况自行更改。
- shell脚本方式
```
export HCCL_WHITELIST_DISABLE=1
RANK_ID_START=0
WORLD_SIZE=8
for((RANK_ID=$RANK_ID_START;RANK_ID<$((WORLD_SIZE+RANK_ID_START));RANK_ID++));
do
echo "Device ID: $RANK_ID"
export LOCAL_RANK=$RANK_ID
python3 ddp_test_shell.py &
done
wait
```
- mp.spawn方式
```
export HCCL_WHITELIST_DISABLE=1
python3 ddp_test_spwn.py
```
- Python方式
```
# master_addr和master_port参数需用户根据实际情况设置
export HCCL_WHITELIST_DISABLE=1
python3 -m torch.distributed.launch --nproc_per_node 8 --master_addr localhost --master_port *** ddp_test.py
```
- torchrun方式(PyTorch 1.11.0及以上版本支持)
```
export HCCL_WHITELIST_DISABLE=1
torchrun --standalone --nnodes=1 --nproc_per_node=8 ddp_test_shell.py
```
当屏幕打印类似下图中的Loss数值时,说明拉起训练成功。

**父主题:** [多卡分布式训练](https://www.hiascend.com/document/detail/zh/canncommercial/700/modeldevpt/ptmigr/AImpug_000025.html) |
| Readable Markdown | ## 拉起多卡分布式训练
在单机和多机场景下,有4种方式可拉起分布式训练,分别为shell脚本方式(推荐)、mp.spawn方式、Python方式、torchrun方式。其中torchrun方式仅在PyTorch 1.11.0及以上版本支持使用。以下内容以一个简单模型脚本为样例,展示前3种拉起方式分别需要对脚本代码进行的修改。torchrun方式的代码修改与shell脚本方式完全相同。

1. 集合通信存在如下约束:
- 数据并行模式中不同device上执行的图相同。
- 针对Atlas 训练系列产品:allreduce和reduce\_scatter仅支持int8、int32、float16和float32数据类型。
- 针对Atlas A2 训练系列产品:allreduce和reduce\_scatter仅支持int8, int32, float16, float32和bfp16数据类型。
- 分布式训练场景下,HCCL会使用Host服务器的部分端口进行集群信息收集,需要操作系统预留该部分端口。默认情况下,HCCL使用60000-60015端口,若通过环境变量HCCL\_IF\_BASE\_PORT指定了Host网卡起始端口,则需要预留以该端口起始的16个端口。
操作系统端口号预留示例:
```
sysctl -w net.ipv4.ip_local_reserved_ports=60000-60015
```
2. 若用户准备进行2卡训练,可将8卡训练脚本进行改写,改为2卡训练脚本。可参见以下修改方法:
1. 若8卡脚本的batchsize是单卡脚本的batchsize的8倍,则将8卡训练时的batch size和learning rate同时除以4,作为2卡训练时的batch size和learning rate。
2. 如果使用for循环启动训练入口脚本,则将for循环的次数改为2次。
3. world size或者rank size修改为2,并确保训练脚本中dist.init\_process\_group()中world\_size参数为2。
4. 如果有指定device list参数,且取值范围为0-7,则将其改为0-1(修改示例参见此列表后的示意代码)。
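As a concrete illustration of the four adaptation steps above, the sketch below marks the lines that typically change when the 8-card example on this page is scaled down to 2 cards. It is only a sketch under stated assumptions: the 8-card values (batch_size=512, lr=1e-3) are taken from the example code further down, and the variable names are illustrative rather than part of the original scripts.
```
# Illustrative sketch only: assumes the 8-card defaults used below
# (batch_size=512, lr=1e-3); variable names are placeholders.

WORLD_SIZE = 2                  # step 3: world size / rank size changed from 8 to 2

total_batch_size = 512 // 4     # step 1: 8-card batch size divided by 4 -> 128
learning_rate = 1e-3 / 4        # step 1: 8-card learning rate divided by 4

device_list = [0, 1]            # step 4: device list narrowed from 0-7 to 0-1

# step 3 (continued): init_process_group must also see the new world size, e.g.
#   dist.init_process_group(backend="hccl", rank=rank, world_size=WORLD_SIZE)

# step 2: a shell loop that starts one process per device now iterates only twice
```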
#### 构建简单模型
我们先构建一个简单的神经网络。
```
# 导入依赖和库
import torch
from torch import nn
import torch_npu
import torch.distributed as dist
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import time
import torch.multiprocessing as mp
import os
torch.manual_seed(0)
# 下载训练数据
training_data = datasets.FashionMNIST(
root="./data",
train=True,
download=True,
transform=ToTensor(),
)
# 下载测试数据
test_data = datasets.FashionMNIST(
root="./data",
train=False,
download=True,
transform=ToTensor(),
)
# 构建模型
class NeuralNetwork(nn.Module):
def __init__(self):
super().__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
```
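Before any distributed logic is added, the network above can be sanity-checked with a single forward pass. The snippet below is not part of the original tutorial; it is a minimal check that assumes only the NeuralNetwork class and the torch import from the block above, and it runs on the CPU.
```
# Minimal sanity check (sketch): one forward pass through NeuralNetwork on CPU.
model = NeuralNetwork()
dummy = torch.rand(4, 1, 28, 28)   # FashionMNIST images are 1x28x28
logits = model(dummy)              # nn.Flatten turns each image into a 784-vector
print(logits.shape)                # expected: torch.Size([4, 10])
```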
#### 获取超参数
在主函数main中获取训练所需的超参数。
- shell脚本/torchrun方式
```
def main(world_size: int, batch_size = 64, total_epochs = 5,): # 用户可自行设置
ngpus_per_node = world_size
main_worker(args.gpu, ngpus_per_node, args)
```
- mp.spawn方式
```
def main(world_size: int, batch_size = 64, total_epochs = 5,): # 用户可自行设置
ngpus_per_node = world_size
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) # mp.spawn方式启动
```
- Python方式
```
def main(world_size: int, batch_size, args): # 使用Python拉起命令中设置的超参数
ngpus_per_node = world_size
args.gpu = args.local_rank # 任务拉起后,local_rank自动获得device号
main_worker(args.gpu, ngpus_per_node, args)
```
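A note on the mp.spawn variant above: torch.multiprocessing.spawn(fn, args=..., nprocs=N) starts N processes and calls fn(i, *args) in each of them, with i running from 0 to N-1. That is why main_worker can take the device index as its first parameter even though it never appears in the args tuple. The sketch below only illustrates this calling convention; the worker body is a placeholder, not code from the original page.
```
# Sketch of the mp.spawn calling convention used above (placeholder worker).
import torch.multiprocessing as mp

def main_worker(gpu, ngpus_per_node, args):
    # mp.spawn injects gpu = 0 .. ngpus_per_node-1 as the first argument
    print(f"worker {gpu} of {ngpus_per_node}, extra args: {args}")

if __name__ == "__main__":
    mp.spawn(main_worker, nprocs=2, args=(2, {"batch_size": 64}))
```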
#### 设置地址和端口号
由于昇腾AI处理器初始化进程组时init\_method只支持env:// (即环境变量初始化方式),所以在初始化前需要配置MASTER\_ADDR、MASTER\_PORT等参数。用户需根据自己实际情况配置。
- shell脚本方式、mp.spawn拉起方式和torchrun方式的配置代码相同,如下所示:
```
def ddp_setup(rank, world_size):
"""
Args:
rank: Unique identifier of each process
world_size: Total number of processes
"""
os.environ["MASTER_ADDR"] = "localhost" # 用户需根据自己实际情况设置
os.environ["MASTER_PORT"] = "***" # 用户需根据自己实际情况设置
dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
```
- Python方式需要把配置参数的命令放到拉起训练中。脚本中代码如下所示:
```
def ddp_setup(rank, world_size):
"""
Args:
rank: Unique identifier of each process
world_size: Total number of processes
"""
dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
```
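A practical note on the two ddp_setup variants above: in the Python (torch.distributed.launch) way the launcher itself exports MASTER_ADDR and MASTER_PORT from its --master_addr/--master_port options, which is why that variant does not set them in code, whereas nothing sets them for the shell-loop and mp.spawn launches, so the first variant must. torchrun likewise exports these variables. The sketch below is one hedged way to cover all of these cases with a single helper by using os.environ.setdefault, so launcher-provided values take precedence; the fallback address and port are placeholders for a single-node run, not values from the original page.
```
# Sketch: one ddp_setup that works under torchrun / torch.distributed.launch
# (which already export MASTER_ADDR / MASTER_PORT) as well as the shell-loop
# and mp.spawn launches (which do not). Fallback values are placeholders.
import os
import torch.distributed as dist

def ddp_setup(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # assumption: single node
    os.environ.setdefault("MASTER_PORT", "29500")      # assumption: any free port
    dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
```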
#### 添加分布式逻辑
不同的拉起训练方式下,device号的获取方式不同:
- shell脚本方式:在shell脚本中循环传入local\_rank变量作为指定的device。
- mp.spawn方式:mp.spawn多进程拉起main\_worker后,第一个参数GPU自动获得device号(0 ~ ngpus\_per\_node - 1)。
- Python方式:任务拉起后,local\_rank自动获得device号。
用户需根据自己选择的方式对代码做不同的修改。
- shell脚本/torchrun方式
```
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
args.gpu = int(os.environ['LOCAL_RANK']) # 在shell脚本中循环传入local_rank变量作为指定的device
ddp_setup(args.gpu, args.world_size)
torch_npu.npu.set_device(args.gpu)
total_batch_size = args.batch_size
total_workers = ngpus_per_node
batch_size = int(total_batch_size / ngpus_per_node)
workers = int((total_workers + ngpus_per_node - 1) / ngpus_per_node)
model = NeuralNetwork()
device = torch.device("npu")
train_sampler = torch.utils.data.distributed.DistributedSampler(training_data)
test_sampler = torch.utils.data.distributed.DistributedSampler(test_data)
train_loader = torch.utils.data.DataLoader(
training_data, batch_size=batch_size, shuffle=(train_sampler is None),
num_workers=workers, pin_memory=False, sampler=train_sampler, drop_last=True)
val_loader = torch.utils.data.DataLoader(
test_data, batch_size=batch_size, shuffle=(test_sampler is None),
num_workers=workers, pin_memory=False, sampler=test_sampler, drop_last=True)
loc = 'npu:{}'.format(args.gpu)
model = model.to(loc)
criterion = nn.CrossEntropyLoss().to(loc)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
for epoch in range(start_epoch, end_epoch):
print("curr epoch: ", epoch)
train_sampler.set_epoch(epoch)
train(train_loader, model, criterion, optimizer, epoch, args.gpu)
def train(train_loader, model, criterion, optimizer, epoch, gpu):
size = len(train_loader.dataset)
model.train()
end = time.time()
for i, (images, target) in enumerate(train_loader):
# measure data loading time
loc = 'npu:{}'.format(gpu)
target = target.to(torch.int32)
images, target = images.to(loc, non_blocking=False), target.to(loc, non_blocking=False)
# compute output
output = model(images)
loss = criterion(output, target)
# compute gradient and do SGD step
optimizer.zero_grad()
loss.backward()
optimizer.step()
end = time.time()
if i % 100 == 0:
loss, current = loss.item(), i * len(target)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
```
- mp.spawn方式
不需要专门设置args.gpu,将shell脚本方式中main\_worker里的args.gpu均替换为gpu。
```
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
ddp_setup(gpu, args.world_size)
torch_npu.npu.set_device(gpu)
total_batch_size = args.batch_size
total_workers = ngpus_per_node
batch_size = int(total_batch_size / ngpus_per_node)
workers = int((total_workers + ngpus_per_node - 1) / ngpus_per_node)
model = NeuralNetwork()
device = torch.device("npu")
train_sampler = torch.utils.data.distributed.DistributedSampler(training_data)
test_sampler = torch.utils.data.distributed.DistributedSampler(test_data)
train_loader = torch.utils.data.DataLoader(
training_data, batch_size=batch_size, shuffle=(train_sampler is None),
num_workers=workers, pin_memory=False, sampler=train_sampler, drop_last=True)
val_loader = torch.utils.data.DataLoader(
test_data, batch_size=batch_size, shuffle=(test_sampler is None),
num_workers=workers, pin_memory=False, sampler=test_sampler, drop_last=True)
loc = 'npu:{}'.format(gpu)
model = model.to(loc)
criterion = nn.CrossEntropyLoss().to(loc)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
for epoch in range(start_epoch, end_epoch):
print("curr epoch: ", epoch)
train_sampler.set_epoch(epoch)
train(train_loader, model, criterion, optimizer, epoch, gpu)
...... # train函数代码同shell脚本方式
```
- Python方式
```
def main_worker(gpu, ngpus_per_node, args):
start_epoch = 0
end_epoch = 5
args.gpu = args.local_rank # 任务拉起后,local_rank自动获得device号
ddp_setup(args.gpu, args.world_size)
...... # 其余代码同shell脚本方式
```
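The main_worker variants above build a val_loader, and the model script defines a test() helper, but neither example actually runs evaluation. For reference, with the page's defaults a total batch size of 512 split over 8 devices gives 64 samples per device. The sketch below is one hedged way to add per-epoch validation, following the shell-script variant's naming (args.gpu); it assumes the names defined above (val_loader, criterion, test, loc, train_sampler) and accounts for the fact that test() reads a module-level device variable, which therefore has to be published before the call.
```
# Sketch: per-epoch evaluation inside main_worker (uses the names defined above).
# The test() helper reads a module-level `device`, so point it at this worker's
# NPU first; .to() accepts a device string such as 'npu:0'.
globals()["device"] = loc

for epoch in range(start_epoch, end_epoch):
    train_sampler.set_epoch(epoch)
    train(train_loader, model, criterion, optimizer, epoch, args.gpu)
    if args.gpu == 0:   # report metrics from one device only to avoid duplicate prints
        test(val_loader, model, criterion)
```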
#### 设置参数
在模型脚本中,根据拉起方式不同,设置不同的参数。
- shell脚本/torchrun方式
```
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size)
```
- mp.spawn方式
```
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size)
```
- Python方式
```
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description='simple distributed training job')
parser.add_argument('--batch_size', default=512, type=int, help='Input batch size on each device (default: 32)')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
parser.add_argument("--local_rank", default=-1, type=int) # local_rank用于自动获取device号。使用mp.spawn方式与shell方式启动时需删除此项
args = parser.parse_args()
world_size = torch.npu.device_count()
args.world_size = world_size
main(args.world_size, args.batch_size, args) # 需将Python拉起命令中设置的参数传入main函数
```
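A hedged note on the __main__ blocks above: they derive world_size from torch.npu.device_count(), i.e. every NPU on the node. Launchers such as torchrun and torch.distributed.launch also export WORLD_SIZE (and torchrun exports LOCAL_RANK) into each worker's environment, so a script meant to run under several launch methods can prefer the environment value when it is present. The snippet below sketches that pattern; it is not part of the original page and assumes the torch/torch_npu imports and the args object from the code above.
```
# Sketch: prefer the launcher-provided WORLD_SIZE when present, otherwise fall
# back to the number of NPUs visible on this node.
import os

world_size = int(os.environ.get("WORLD_SIZE", torch.npu.device_count()))
args.world_size = world_size
```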
#### 拉起训练
以下拉起训练的命令为示例,用户可根据实际情况自行更改。
- shell脚本方式
```
export HCCL_WHITELIST_DISABLE=1
RANK_ID_START=0
WORLD_SIZE=8
for((RANK_ID=$RANK_ID_START;RANK_ID<$((WORLD_SIZE+RANK_ID_START));RANK_ID++));
do
echo "Device ID: $RANK_ID"
export LOCAL_RANK=$RANK_ID
python3 ddp_test_shell.py &
done
wait
```
- mp.spawn方式
```
export HCCL_WHITELIST_DISABLE=1
python3 ddp_test_spwn.py
```
- Python方式
```
# master_addr和master_port参数需用户根据实际情况设置
export HCCL_WHITELIST_DISABLE=1
python3 -m torch.distributed.launch --nproc_per_node 8 --master_addr localhost --master_port *** ddp_test.py
```
- torchrun方式(PyTorch 1.11.0及以上版本支持)
```
export HCCL_WHITELIST_DISABLE=1
torchrun --standalone --nnodes=1 --nproc_per_node=8 ddp_test_shell.py
```
当屏幕打印类似下图中的Loss数值时,说明拉起训练成功。

**父主题:** [多卡分布式训练](https://www.hiascend.com/document/detail/zh/canncommercial/700/modeldevpt/ptmigr/AImpug_000025.html) |
| Shard | 171 (laksa) |
| Root Hash | 2628830536891727371 |
| Unparsed URL | com,hiascend!www,/document/detail/zh/canncommercial/700/modeldevpt/ptmigr/AImpug_000028.html s443 |