Crawler Inspector

1. Shard Calculation

Query:

Response:

Calculated Shard: 16 (from laksa017)

2. Crawled Status Check

Query:

# Run in bash, zsh, or ksh: the query is wrapped in ANSI-C quoting ($'...') so that \' yields a literal single quote.
curl -X POST \
  'http://laksa016.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d $'SELECT getAhrefsURLFromUnparsed(src_unparsed) AS found_url, ifNull(toUnixTimestamp(download_stamp), 0) AS crawl_time, ifNull(toUnixTimestamp(props_url_first_seen), 0) AS first_indexed_time, download_http_code AS http_code, src_unparsed AS src_unparsed, src_root_hash AS src_root_hash, history_drop_reason AS history_drop_reason, meta_title AS meta_title, meta_descriptions AS meta_descriptions, meta_canonical AS meta_canonical, ml_categories_json AS ml_categories_json, ml_types_json AS ml_types_json, ml_intent_types_json AS ml_intent_types_json, meta_language AS meta_language, attrs_author AS attrs_author, ifNull(toUnixTimestamp(attrs_publish_time), 0) AS attrs_publish_time, ifNull(toUnixTimestamp(attrs_original_publish_time), 0) AS attrs_original_publish_time, ifNull(attrs_is_republished, 0) AS attrs_is_republished, ifNull(attrs_nr_words, 0) AS attrs_nr_words, ifNull(attrs_boilerpipe_nr_words, 0) AS attrs_boilerpipe_nr_words, ifNull(body_ext_links_number, 0) AS body_ext_links_number, ifNull(body_int_links_number, 0) AS body_int_links_number, ifNull(meta_nofollow, 0) AS meta_nofollow, ifNull(meta_noarchive, 0) AS meta_noarchive, ifNull(props_was_rendered, 0) AS props_was_rendered, ifNull(src_redirect, \'\') AS src_redirect, ifNull(download_time_msec, 0) AS download_time_msec, ifNull(download_ttfb_msec, 0) AS download_ttfb_msec, ifNull(download_size, 0) AS download_size FROM crawler3.page_info_local FINAL PREWHERE int_partition_id = 97 AND (src_root_hash, src_unparsed) IN ((getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL(\'https://www.53ai.com/news/finetuning/2025072471386.html\')), getAhrefsUnparsedNoserviceFromURL(\'https://www.53ai.com/news/finetuning/2025072471386.html\'))) FORMAT JSONEachRow'
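
For scripted lookups, the same request can be assembled without shell quoting. The following is a minimal Python sketch, not the inspector's own code: the endpoint, headers, table, and SQL functions are copied from the query above, while `sql_quote` and `build_request` are hypothetical helper names and the column list is shortened for illustration. The request is only built here, not sent.

```python
import urllib.request

# Values copied from the curl command above.
ENDPOINT = "http://laksa016.int.ahrefs:8124/"
HEADERS = {
    "Content-Type": "text/plain",
    "X-ClickHouse-Database": "crawler3",
    "Authorization": "Basic YXBpOg==",
}


def sql_quote(value: str) -> str:
    """Quote a value as a SQL string literal, backslash-escaping \\ and '."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_request(url: str, partition_id: int) -> urllib.request.Request:
    """Build (but do not send) the POST request for one URL lookup."""
    unparsed = f"getAhrefsUnparsedNoserviceFromURL({sql_quote(url)})"
    # Shortened column list for illustration; the query above selects many more.
    query = (
        "SELECT getAhrefsURLFromUnparsed(src_unparsed) AS found_url, "
        "download_http_code AS http_code "
        "FROM crawler3.page_info_local FINAL "
        f"PREWHERE int_partition_id = {int(partition_id)} "
        "AND (src_root_hash, src_unparsed) IN "
        f"((getAhrefsRootHashFromUnparsed({unparsed}), {unparsed})) "
        "FORMAT JSONEachRow"
    )
    return urllib.request.Request(
        ENDPOINT, data=query.encode("utf-8"), headers=HEADERS, method="POST"
    )


req = build_request("https://www.53ai.com/news/finetuning/2025072471386.html", 97)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out because the host is only reachable from the internal network.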

Response:

{"found_url":"https:\/\/www.53ai.com\/news\/finetuning\/2025072471386.html","crawl_time":1775579868,"first_indexed_time":1753320474,"http_code":200,"src_unparsed":"com,53ai!www,\/news\/finetuning\/2025072471386.html s443","src_root_hash":"336105317631279416","history_drop_reason":null,"meta_title":"150%训练效率提升：感知检测小模型训练优化方法 - 53AI-AI知识库|企业AI知识库|大模型知识库|AIHub","meta_descriptions":["深入探讨 150%训练效率提升的感知检测小模型训练优化方法，聚焦 RAG 技术。文中基于业务实践，总结不同算力卡上的训练之道，为智能驾驶场景提供借鉴。涵盖 maptr、sparsedrive、qcnet、GaussianFormer 等适用于智能驾驶的小模型，详解从选择 dsw 镜像到执行训练命令的全流程。还介绍模型微调方法与技术，助力提升模型性能。点击阅读，了解详情，开启智能驾驶模型训练新征程。"],"meta_canonical":null,"ml_categories_json":"","ml_types_json":"","ml_intent_types_json":"","meta_language":null,"attrs_author":null,"attrs_publish_time":0,"attrs_original_publish_time":1753320474,"attrs_is_republished":0,"attrs_nr_words":"975","attrs_boilerpipe_nr_words":"595","body_ext_links_number":10,"body_int_links_number":118,"meta_nofollow":0,"meta_noarchive":0,"props_was_rendered":0,"src_redirect":"","download_time_msec":1699,"download_ttfb_msec":1696,"download_size":36483}
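
The response is one JSON object per line (`FORMAT JSONEachRow`), with timestamps as Unix seconds and `0` standing in for NULL via `ifNull(..., 0)`. A minimal Python sketch of decoding it follows; the sample line is trimmed to a few fields from the response above, `parse_rows` and `fmt_ts` are hypothetical helper names, and rendering in UTC is an assumption that happens to match the times shown in the property table.

```python
import json
from datetime import datetime, timezone

# One JSONEachRow line, trimmed to a few fields from the response above.
body = r'{"found_url":"https:\/\/www.53ai.com\/news\/finetuning\/2025072471386.html","crawl_time":1775579868,"first_indexed_time":1753320474,"http_code":200,"attrs_publish_time":0,"attrs_nr_words":"975"}'


def parse_rows(text: str) -> list[dict]:
    """Decode a JSONEachRow body: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fmt_ts(ts: int) -> str:
    """Render Unix seconds as UTC; 0 is the query's ifNull() placeholder."""
    if ts == 0:
        return "not set"
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


row = parse_rows(body)[0]
print(fmt_ts(row["crawl_time"]))          # 2026-04-07 16:37:48
print(fmt_ts(row["first_indexed_time"]))  # 2025-07-24 01:27:54
print(fmt_ts(row["attrs_publish_time"]))  # not set
print(int(row["attrs_nr_words"]))         # word counts arrive as strings here
```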

Page Info Filters

Filter	Status	Condition	Details
HTTP status	PASS	`download_http_code = 200`	HTTP 200
Age cutoff	PASS	`download_stamp > now() - 6 MONTH`	1.9 months ago
History drop	PASS	`isNull(history_drop_reason)`	No drop reason
Spam/ban	PASS	`fh_dont_index != 1 AND ml_spam_score = 0`	ml_spam_score=0
Canonical	PASS	`meta_canonical IS NULL OR meta_canonical = '' OR meta_canonical = src_unparsed`	Not set
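
The filter conditions in the table can be sketched as plain predicates over one result row. This is an illustration under stated assumptions, not the inspector's implementation: `page_info_filters` is a hypothetical name, "6 MONTH" is approximated as 183 days, `fh_dont_index = 0` is assumed because the table does not show it, and `now` is an assumed date chosen to be roughly 1.9 months after the crawl.

```python
from datetime import datetime, timedelta, timezone

SIX_MONTHS = timedelta(days=183)  # assumption: approximates ClickHouse's "6 MONTH"


def page_info_filters(row: dict, now: datetime) -> dict[str, bool]:
    """Evaluate each filter from the table; True means PASS."""
    crawled = datetime.fromtimestamp(row["crawl_time"], tz=timezone.utc)
    return {
        "HTTP status": row["http_code"] == 200,
        "Age cutoff": crawled > now - SIX_MONTHS,
        "History drop": row.get("history_drop_reason") is None,
        "Spam/ban": row.get("fh_dont_index") != 1 and row.get("ml_spam_score") == 0,
        "Canonical": row.get("meta_canonical") in (None, "", row["src_unparsed"]),
    }


# Values from the response above; the two spam/ban fields are not part of it.
row = {
    "http_code": 200,
    "crawl_time": 1775579868,
    "history_drop_reason": None,
    "meta_canonical": None,
    "src_unparsed": "com,53ai!www,/news/finetuning/2025072471386.html s443",
    "ml_spam_score": 0,   # from the Details column of the table
    "fh_dont_index": 0,   # assumed; not shown in the table
}
now = datetime(2026, 6, 4, tzinfo=timezone.utc)  # assumed inspection date

for name, passed in page_info_filters(row, now).items():
    print(f"{name}\t{'PASS' if passed else 'FAIL'}")
```

Passing `now` explicitly keeps the age check reproducible; a live check would pass `datetime.now(timezone.utc)` instead.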

Page Details

Property	Value
URL	https://www.53ai.com/news/finetuning/2025072471386.html
Last Crawled	2026-04-07 16:37:48 (1 month ago)
First Indexed	2025-07-24 01:27:54 (10 months ago)
HTTP Status Code	200
Content
Meta Title	150%训练效率提升：感知检测小模型训练优化方法 - 53AI-AI知识库|企业AI知识库|大模型知识库|AIHub
Meta Description	深入探讨 150%训练效率提升的感知检测小模型训练优化方法，聚焦 RAG 技术。文中基于业务实践，总结不同算力卡上的训练之道，为智能驾驶场景提供借鉴。涵盖 maptr、sparsedrive、qcnet、GaussianFormer 等适用于智能驾驶的小模型，详解从选择 dsw 镜像到执行训练命令的全流程。还介绍模型微调方法与技术，助力提升模型性能。点击阅读，了解详情，开启智能驾驶模型训练新征程。
Meta Canonical	null
Boilerpipe Text	heavy column, fetched on demand
Markdown	heavy column, fetched on demand
Readable Markdown	heavy column, fetched on demand
ML Classification
ML Categories	null
ML Page Types	null
ML Intent Types	null
Content Metadata
Language	null
Author	null
Publish Time	not set
Original Publish Time	2025-07-24 01:27:54 (10 months ago)
Republished	No
Word Count (Total)	975
Word Count (Content)	595
Links
External Links	10
Internal Links	118
Technical SEO
Meta Nofollow	No
Meta Noarchive	No
JS Rendered	No
Redirect Target	null
Performance
Download Time (ms)	1,699
TTFB (ms)	1,696
Download Size (bytes)	36,483
Location
Host ID	16 (laksa016)
Partition ID	97
Root Hash	336105317631279416
Unparsed URL	com,53ai!www,/news/finetuning/2025072471386.html s443
