Crawler Inspector

1. Shard Calculation

Query:

Response:

Calculated Shard: 99 (from laksa002)

2. Crawled Status Check

Query:

curl -X POST \
  'http://laksa099.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  --data-binary @- <<'SQL'
SELECT
  getAhrefsURLFromUnparsed(src_unparsed) AS found_url,
  ifNull(toUnixTimestamp(download_stamp), 0) AS crawl_time,
  ifNull(toUnixTimestamp(props_url_first_seen), 0) AS first_indexed_time,
  download_http_code AS http_code,
  src_unparsed AS src_unparsed,
  src_root_hash AS src_root_hash,
  history_drop_reason AS history_drop_reason,
  meta_title AS meta_title,
  meta_descriptions AS meta_descriptions,
  meta_canonical AS meta_canonical,
  ml_categories_json AS ml_categories_json,
  ml_types_json AS ml_types_json,
  ml_intent_types_json AS ml_intent_types_json,
  meta_language AS meta_language,
  attrs_author AS attrs_author,
  ifNull(toUnixTimestamp(attrs_publish_time), 0) AS attrs_publish_time,
  ifNull(toUnixTimestamp(attrs_original_publish_time), 0) AS attrs_original_publish_time,
  ifNull(attrs_is_republished, 0) AS attrs_is_republished,
  ifNull(attrs_nr_words, 0) AS attrs_nr_words,
  ifNull(attrs_boilerpipe_nr_words, 0) AS attrs_boilerpipe_nr_words,
  ifNull(body_ext_links_number, 0) AS body_ext_links_number,
  ifNull(body_int_links_number, 0) AS body_int_links_number,
  ifNull(meta_nofollow, 0) AS meta_nofollow,
  ifNull(meta_noarchive, 0) AS meta_noarchive,
  ifNull(props_was_rendered, 0) AS props_was_rendered,
  ifNull(src_redirect, '') AS src_redirect,
  ifNull(download_time_msec, 0) AS download_time_msec,
  ifNull(download_ttfb_msec, 0) AS download_ttfb_msec,
  ifNull(download_size, 0) AS download_size
FROM crawler3.page_info_local FINAL
PREWHERE int_partition_id = 91
  AND (src_root_hash, src_unparsed) IN ((
    getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL('https://www.dataapplab.com/math-you-need-to-succeed-in-ml-interviews/')),
    getAhrefsUnparsedNoserviceFromURL('https://www.dataapplab.com/math-you-need-to-succeed-in-ml-interviews/')
  ))
FORMAT JSONEachRow
SQL
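
The same lookup can be issued from a script, which avoids shell quoting of the SQL string literals. The following is a minimal sketch, not the inspector's actual implementation: the helper name `build_lookup_request` is hypothetical, the column list is shortened for illustration, and the host, headers, table, and SQL functions are copied from the curl command above. It only builds the request; sending it would require access to the internal host.

```python
import urllib.request

# Host and headers copied from the curl command above.
CLICKHOUSE_URL = "http://laksa099.int.ahrefs:8124/"
HEADERS = {
    "Content-Type": "text/plain",
    "X-ClickHouse-Database": "crawler3",
    "Authorization": "Basic YXBpOg==",
}


def build_lookup_request(url: str, partition_id: int) -> urllib.request.Request:
    """Build (but do not send) a page_info_local lookup for one URL.

    Hypothetical helper: the SELECT list is shortened for illustration.
    """
    # Escape backslashes and single quotes for a ClickHouse string literal.
    quoted = url.replace("\\", "\\\\").replace("'", "\\'")
    unparsed = f"getAhrefsUnparsedNoserviceFromURL('{quoted}')"
    sql = (
        "SELECT getAhrefsURLFromUnparsed(src_unparsed) AS found_url, "
        "download_http_code AS http_code "
        "FROM crawler3.page_info_local FINAL "
        f"PREWHERE int_partition_id = {int(partition_id)} "
        "AND (src_root_hash, src_unparsed) IN "
        f"((getAhrefsRootHashFromUnparsed({unparsed}), {unparsed})) "
        "FORMAT JSONEachRow"
    )
    return urllib.request.Request(
        CLICKHOUSE_URL, data=sql.encode("utf-8"), headers=HEADERS, method="POST"
    )


req = build_lookup_request(
    "https://www.dataapplab.com/math-you-need-to-succeed-in-ml-interviews/", 91
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would execute the query; each line of the response body is then one JSON object.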

Response:

{"found_url":"https:\/\/www.dataapplab.com\/math-you-need-to-succeed-in-ml-interviews\/","crawl_time":1779403254,"first_indexed_time":1635498758,"http_code":200,"src_unparsed":"com,dataapplab!www,\/math-you-need-to-succeed-in-ml-interviews\/ s443","src_root_hash":"3498991255532658299","history_drop_reason":null,"meta_title":"机器学习面试，你必须知道这些数学知识 - Data Application Lab","meta_descriptions":["机器学习是指训练计算机程序，以建立基于数据的统计模型的过程。 机器学习 (ML) 的目标是转换数据，并从数据中识别关键模式或获得关键见解。而数学是机器学习面试中的一大重点。为了练习，我们汇总了机器学习面试问题中的数学相关的函数和问题。这份ML最重要数学的指南会对大家的求职很有帮助。"],"meta_canonical":null,"ml_categories_json":"","ml_types_json":"","ml_intent_types_json":"","meta_language":"en-us","attrs_author":"Zhang Bonnie","attrs_publish_time":1635455220,"attrs_original_publish_time":1635455220,"attrs_is_republished":0,"attrs_nr_words":"433","attrs_boilerpipe_nr_words":"198","body_ext_links_number":5,"body_int_links_number":42,"meta_nofollow":0,"meta_noarchive":0,"props_was_rendered":0,"src_redirect":"","download_time_msec":786,"download_ttfb_msec":722,"download_size":26620}

Page Info Filters

Filter	Status	Condition	Details
HTTP status	PASS	`download_http_code = 200`	HTTP 200
Age cutoff	PASS	`download_stamp > now() - 6 MONTH`	0.4 months ago
History drop	PASS	`isNull(history_drop_reason)`	No drop reason
Spam/ban	PASS	`fh_dont_index != 1 AND ml_spam_score = 0`	ml_spam_score=0
Canonical	PASS	`meta_canonical IS NULL OR meta_canonical = '' OR meta_canonical = src_unparsed`	Not set
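
The five filter conditions above can be re-evaluated against a response row. This is a rough sketch under stated assumptions, not the inspector's own logic: `evaluate_filters` is a hypothetical helper, `6 MONTH` is approximated as 183 days, the reference time `now` is an assumed capture date, and `fh_dont_index` and `ml_spam_score` are supplied by hand because the query shown earlier does not select them.

```python
from datetime import datetime, timedelta, timezone


def evaluate_filters(row: dict, now: datetime) -> dict:
    """Return {filter name: passed?} for the five page-info filters."""
    crawled = datetime.fromtimestamp(row["crawl_time"], tz=timezone.utc)
    canonical = row.get("meta_canonical")
    return {
        "HTTP status": row["http_code"] == 200,
        # Assumption: 6 calendar months approximated as 183 days.
        "Age cutoff": crawled > now - timedelta(days=183),
        "History drop": row.get("history_drop_reason") is None,
        "Spam/ban": row.get("fh_dont_index") != 1 and row.get("ml_spam_score") == 0,
        "Canonical": canonical in (None, "", row["src_unparsed"]),
    }


row = {
    "crawl_time": 1779403254,
    "http_code": 200,
    "history_drop_reason": None,
    "meta_canonical": None,
    "src_unparsed": "com,dataapplab!www,/math-you-need-to-succeed-in-ml-interviews/ s443",
    # Not part of the response shown; values assumed for illustration.
    "fh_dont_index": 0,
    "ml_spam_score": 0,
}
# Assumed capture time, roughly 12 days after the last crawl.
now = datetime(2026, 6, 3, tzinfo=timezone.utc)

results = evaluate_filters(row, now)
for name, passed in results.items():
    print(f"{name}\t{'PASS' if passed else 'FAIL'}")
```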

Page Details

Property	Value
URL	https://www.dataapplab.com/math-you-need-to-succeed-in-ml-interviews/
Last Crawled	2026-05-21 22:40:54 (12 days ago)
First Indexed	2021-10-29 09:12:38 (4 years ago)
HTTP Status Code	200
Content
Meta Title	机器学习面试，你必须知道这些数学知识 - Data Application Lab
Meta Description	机器学习是指训练计算机程序，以建立基于数据的统计模型的过程。机器学习 (ML) 的目标是转换数据，并从数据中识别关键模式或获得关键见解。而数学是机器学习面试中的一大重点。为了练习，我们汇总了机器学习面试问题中的数学相关的函数和问题。这份ML最重要数学的指南会对大家的求职很有帮助。
Meta Canonical	null
Boilerpipe Text	heavy column, fetched on demand
Markdown	heavy column, fetched on demand
Readable Markdown	heavy column, fetched on demand
ML Classification
ML Categories	null
ML Page Types	null
ML Intent Types	null
Content Metadata
Language	en-us
Author	Zhang Bonnie
Publish Time	2021-10-28 21:07:00 (4 years ago)
Original Publish Time	2021-10-28 21:07:00 (4 years ago)
Republished	No
Word Count (Total)	433
Word Count (Content)	198
Links
External Links	5
Internal Links	42
Technical SEO
Meta Nofollow	No
Meta Noarchive	No
JS Rendered	No
Redirect Target	null
Performance
Download Time (ms)	786
TTFB (ms)	722
Download Size (bytes)	26,620
Location
Host ID	99 (laksa099)
Partition ID	91
Root Hash	3498991255532658299
Unparsed URL	com,dataapplab!www,/math-you-need-to-succeed-in-ml-interviews/ s443
