🕷️ Crawler Inspector

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 46 (from laksa174)

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled

📍 LOCATION: Host 46 · Partition 26 · laksa046 · 10938660598884985246
📄 INDEXABLE · CRAWLED 1 month ago
🤖 ROBOTS ALLOWED

Page Info Filters

Filter | Status | Condition | Details
HTTP status | PASS | download_http_code = 200 | HTTP 200
Age cutoff | PASS | download_stamp > now() - 6 MONTH | 1.6 months ago
History drop | PASS | isNull(history_drop_reason) | No drop reason
Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0
Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set
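
The five filter conditions above can be sketched as a small Python check. This is an illustration only, not the inspector's actual implementation: the function name `evaluate_filters`, the dict-based row, the 183-day approximation of `6 MONTH`, the `fh_dont_index = 0` value, the `src_unparsed` value, and the chosen "now" are all assumptions made for the example. The column names come from the Condition column.

```python
from datetime import datetime, timedelta


def evaluate_filters(row: dict, now: datetime) -> dict:
    """Return a PASS/FAIL boolean per filter (hypothetical helper)."""
    # Assumption: "6 MONTH" is approximated as 183 days.
    age_cutoff = now - timedelta(days=183)
    canonical = row.get("meta_canonical")
    return {
        "HTTP status": row["download_http_code"] == 200,
        "Age cutoff": row["download_stamp"] > age_cutoff,
        "History drop": row.get("history_drop_reason") is None,
        "Spam/ban": row.get("fh_dont_index") != 1
        and row.get("ml_spam_score") == 0,
        "Canonical": canonical is None
        or canonical == ""
        or canonical == row["src_unparsed"],
    }


# Values taken from this page where shown; the rest are assumed.
row = {
    "download_http_code": 200,
    "download_stamp": datetime(2026, 4, 17, 19, 36, 45),
    "history_drop_reason": None,
    "fh_dont_index": 0,  # assumed; not shown on the page
    "ml_spam_score": 0,
    "meta_canonical": None,
    "src_unparsed": "com,cnblogs!www,/ysngki/p/17814160.html s443",  # assumed
}
now = datetime(2026, 6, 5)  # assumed "now", roughly 1.6 months after the crawl
results = evaluate_filters(row, now)

for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Returning one boolean per filter, rather than a single combined flag, mirrors the table layout and makes it easy to see which condition fails for a given page.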

Page Details

Property | Value
URL | https://www.cnblogs.com/ysngki/p/17814160.html
Last Crawled | 2026-04-17 19:36:45 (1 month ago)
First Indexed | 2023-11-30 05:29:46 (2 years ago)
HTTP Status Code | 200

Content
Meta Title | A Quick Guide to the Full Fairseq Machine Translation Pipeline (NMT, WMT, translation) - ysngki - 博客园 (cnblogs)
Meta Description | Last edited: August 30, 2024. 1. Abstract: fairseq is a commonly used machine translation project. It is well optimized, but its code is obscure and hard to follow, which limits how we can use it. This article walks through the following workflow: 1) preparing the WMT23 data (all other generation tasks are analogous), 2) training the model, 3) evaluating the model with sacrebleu and COMET-22. Don't want wmt
Meta Canonical | null
Boilerpipe Text | heavy column, fetched on demand
Markdown | heavy column, fetched on demand
Readable Markdown | heavy column, fetched on demand

ML Classification
ML Categories | null
ML Page Types | null
ML Intent Types | null

Content Metadata
Language | zh-cn
Author | null
Publish Time | not set
Original Publish Time | 2023-11-30 05:29:46 (2 years ago)
Republished | No
Word Count (Total) | 1,767
Word Count (Content) | 1,703

Links
External Links | 15
Internal Links | 27

Technical SEO
Meta Nofollow | No
Meta Noarchive | No
JS Rendered | No
Redirect Target | null

Performance
Download Time (ms) | 1,434
TTFB (ms) | 1,317
Download Size (bytes) | 17,516

Location
Host ID | 46 (laksa046)
Partition ID | 26
Root Hash | 10938660598884985246
Unparsed URL | com,cnblogs!www,/ysngki/p/17814160.html s443
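
The Unparsed URL value appears to be a reordered form of the page URL. The sketch below reproduces that one string from the URL shown on this page; it is a guess at the layout, not a documented format. Everything here is an assumption inferred from a single example: the function name `to_unparsed_key`, treating the last two host labels as the registered domain, the `!` and `,` separators, and the `h` flag for plain HTTP (only `s443` is actually observed).

```python
from urllib.parse import urlsplit


def to_unparsed_key(url: str, registered_labels: int = 2) -> str:
    """Build a key shaped like the 'Unparsed URL' field (hypothetical)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")

    # Assumption: the last `registered_labels` labels form the registered
    # domain; anything before them is the subdomain.
    domain = labels[-registered_labels:]
    subdomain = labels[:-registered_labels]

    # Assumption: "s" marks https; "h" for http is invented for symmetry.
    scheme_flag = "s" if parts.scheme == "https" else "h"
    port = parts.port or (443 if parts.scheme == "https" else 80)

    domain_part = ",".join(reversed(domain))
    subdomain_part = ",".join(reversed(subdomain))
    path = parts.path or "/"
    return f"{domain_part}!{subdomain_part},{path} {scheme_flag}{port}"


key = to_unparsed_key("https://www.cnblogs.com/ysngki/p/17814160.html")
print(key)
```

A real implementation would likely need a public suffix list to find the registered domain for hosts such as `example.co.uk`; the fixed two-label split is only adequate for this example.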