🕷️ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 119 (from laksa178)
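A shard assignment like this is typically a hash of the (normalized) URL reduced modulo the shard count. A minimal sketch of the idea — the hash function and shard count here are assumptions, not taken from this tool, so the result will not match the production value of 119:

```python
import hashlib

NUM_SHARDS = 256  # assumption: the real shard count is not shown in this output


def shard_for_url(url: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a URL to a shard by hashing it and reducing modulo the shard count."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for_url("https://spacy.io/"))
```

The key property is determinism: the same URL always maps to the same shard (and therefore the same storage node, e.g. laksa178), so lookups never need a directory service.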

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled

📄 INDEXABLE · CRAWLED · 10 hours ago
🤖 ROBOTS ALLOWED

Page Info Filters

| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |

Page Details

| Property | Value |
|---|---|
| URL | https://spacy.io/ |
| Last Crawled | 2026-04-07 12:19:09 (10 hours ago) |
| First Indexed | 2015-12-03 04:33:17 (10 years ago) |
| HTTP Status Code | 200 |
| Meta Title | spaCy · Industrial-strength Natural Language Processing in Python |
| Meta Description | spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more. |
| Meta Canonical | null |
Boilerpipe Text
Industrial-Strength Natural Language Processing in Python Get things done spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. Blazing fast spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Awesome ecosystem Since its release in 2015, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows. Edit the code & try spaCy spaCy v3.7 · Python 3 · via Binder Features Support for 75+ languages 84 trained pipelines for 25 languages Multi-task learning with pretrained transformers like BERT Pretrained word vectors State-of-the-art speed Production-ready training system Linguistically-motivated tokenization Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification , lemmatization, morphological analysis, entity linking and more Easily extensible with custom components and attributes Support for custom models in PyTorch , TensorFlow and other frameworks Built in visualizers for syntax and NER Easy model packaging , deployment and workflow management Robust, rigorously evaluated accuracy Reproducible training for custom pipelines spaCy v3.0 introduces a comprehensive and extensible system for configuring your training runs . Your configuration file will describe every detail of your training run, with no hidden defaults, making it easy to rerun your experiments and track changes. You can use the quickstart widget or the init config command to get started, or clone a project template for an end-to-end workflow. 
Get started Language Components tagger morphologizer trainable_lemmatizer parser ner spancat textcat Hardware CPU GPU (transformer) Optimize for efficiency accuracy # This is an auto-generated partial config. To use it with 'spacy train' # you can run spacy init fill-config to auto-fill all default settings: # python -m spacy init fill-config ./base_config.cfg ./config.cfg [ paths ] train = null dev = null vectors = null [ system ] gpu_allocator = null [ nlp ] lang = " en " pipeline = [] batch_size = 1000 [ components ] [ corpora ] [ corpora.train ] @readers = " spacy.Corpus.v1 " path = ${paths.train} max_length = 0 [ corpora.dev ] @readers = " spacy.Corpus.v1 " path = ${paths.dev} max_length = 0 [ training ] dev_corpus = " corpora.dev " train_corpus = " corpora.train " [ training.optimizer ] @optimizers = " Adam.v1 " [ training.batcher ] @batchers = " spacy.batch_by_words.v1 " discard_oversize = false tolerance = 0.2 [ training.batcher.size ] @schedules = " compounding.v1 " start = 100 stop = 1000 compound = 1.001 [ initialize ] vectors = ${paths.vectors} End-to-end workflows from prototype to production spaCy's new project system gives you a smooth path from prototype to production. It lets you keep track of all those data transformation , preprocessing and training steps , so you can make sure your project is always ready to hand over for automation. It features source asset download, command execution, checksum verification, and caching with a variety of backends and integrations. Try it out Benchmarks spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy right up to the current state-of-the-art . You can also use a CPU-optimized pipeline, which is less accurate but much cheaper to run. More results Pipeline Parser Tagger NER en_core_web_trf (spaCy v3) 95.1 97.8 89.8 en_core_web_lg (spaCy v3) 92.0 97.4 85.5 en_core_web_lg (spaCy v2) 91.9 97.2 85.5 Full pipeline accuracy on the OntoNotes 5.0 corpus (reported on the development set). 
Named Entity Recognition System OntoNotes CoNLL ‘03 spaCy RoBERTa (2020) 89.8 91.6 Stanza (StanfordNLP) 1 88.8 92.1 Flair 2 89.7 93.1 Named entity recognition accuracy on the OntoNotes 5.0 and CoNLL-2003 corpora. See NLP-progress for more results. Project template: benchmarks/ner_conll03 . 1. Qi et al. (2020) . 2. Akbik et al. (2018) .
Markdown
[spaCy](https://spacy.io/) [💥 **New:** spaCy for PDFs and Word docs](https://github.com/explosion/spacy-layout) - [Usage](https://spacy.io/usage) - [Models](https://spacy.io/models) - [API](https://spacy.io/api) - [Universe](https://spacy.io/universe) Search # Industrial-Strength Natural Language Processing ## in Python ### Get things done spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. [Get started](https://spacy.io/usage/spacy-101) ### Blazing fast spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. [Facts & Figures](https://spacy.io/usage/facts-figures) ### Awesome ecosystem Since its release in 2015, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows. 
[Read more](https://spacy.io/usage/projects) ``` Edit the code & try spaCyspaCy v3.7 · Python 3 · via Binder run ``` ## Features - Support for **75+ languages** - **84 trained pipelines** for 25 languages - Multi-task learning with pretrained **transformers** like BERT - Pretrained **word vectors** - State-of-the-art speed - Production-ready **training system** - Linguistically-motivated **tokenization** - Components for **named entity** recognition, part-of-speech tagging, dependency parsing, sentence segmentation, **text classification**, lemmatization, morphological analysis, entity linking and more - Easily extensible with **custom components** and attributes - Support for custom models in **PyTorch**, **TensorFlow** and other frameworks - Built in **visualizers** for syntax and NER - Easy **model packaging**, deployment and workflow management - Robust, rigorously evaluated accuracy ### NEW [Large Language Models: Integrating LLMs into structured NLP pipelines](https://spacy.io/usage/large-language-models) [The spacy-llm package](https://github.com/explosion/spacy-llm) integrates Large Language Models (LLMs) into spaCy, featuring a modular system for **fast prototyping** and **prompting**, and turning unstructured responses into **robust outputs** for various NLP tasks, **no training data** required. [Learn more](https://spacy.io/usage/large-language-models) ### From the makers of spaCy [Prodigy: Radically efficient machine teaching](https://prodi.gy/) [![Prodigy: Radically efficient machine teaching](https://spacy.io/_next/static/media/prodigy_overview.28855944.jpg)](https://prodi.gy/) Prodigy is an **annotation tool** so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you **train and evaluate** your models faster. 
[Try it out](https://prodi.gy/) ## Reproducible training for custom pipelines spaCy v3.0 introduces a comprehensive and extensible system for **configuring your training runs**. Your configuration file will describe every detail of your training run, with no hidden defaults, making it easy to **rerun your experiments** and track changes. You can use the quickstart widget or the [`init config`](https://spacy.io/api/cli#init-config) command to get started, or clone a project template for an end-to-end workflow. [Get started](https://spacy.io/usage/training) Language Components tagger morphologizer trainable\_lemmatizer parser ner spancat textcat Hardware CPU GPU (transformer) Optimize for efficiency accuracy ``` # This is an auto-generated partial config. To use it with 'spacy train' # you can run spacy init fill-config to auto-fill all default settings: # python -m spacy init fill-config ./base_config.cfg ./config.cfg [paths] train = null dev = null vectors = null [system] gpu_allocator = null [nlp] lang = "en" pipeline = [] batch_size = 1000 [components] [corpora] [corpora.train] @readers = "spacy.Corpus.v1" path = ${paths.train} max_length = 0 [corpora.dev] @readers = "spacy.Corpus.v1" path = ${paths.dev} max_length = 0 [training] dev_corpus = "corpora.dev" train_corpus = "corpora.train" [training.optimizer] @optimizers = "Adam.v1" [training.batcher] @batchers = "spacy.batch_by_words.v1" discard_oversize = false tolerance = 0.2 [training.batcher.size] @schedules = "compounding.v1" start = 100 stop = 1000 compound = 1.001 [initialize] vectors = ${paths.vectors} ``` [![Illustration of project workflow and commands](https://spacy.io/_next/static/media/projects.ff53d6cb.png)](https://spacy.io/usage/projects) #### 🪐Get started: [`pipelines/tagger_parser_ud`](https://github.com/explosion/projects/tree/v3/pipelines/tagger_parser_ud) The easiest way to get started is to clone a project template and run it – for example, this template for training a **part-of-speech 
tagger** and **dependency parser** on a Universal Dependencies treebank. \$ python -m spacy project clone pipelines/tagger\_parser\_ud ## End-to-end workflows from prototype to production spaCy's new project system gives you a smooth path from prototype to production. It lets you keep track of all those **data transformation**, preprocessing and **training steps**, so you can make sure your project is always ready to hand over for automation. It features source asset download, command execution, checksum verification, and caching with a variety of backends and integrations. [Try it out](https://spacy.io/usage/projects) [![spaCy Tailored Pipelines](https://spacy.io/_next/static/media/spacy-tailored-pipelines_wide.40a24484.png)](https://explosion.ai/custom-solutions) **Get a custom spaCy pipeline, tailor-made for your NLP problem by spaCy's core developers.** - **Streamlined.** Nobody knows spaCy better than we do. Send us your pipeline requirements and we'll be ready to start producing your solution in no time at all. - **Production ready.** spaCy pipelines are robust and easy to deploy. You'll get a complete spaCy project folder which is ready to `spacy project run`. - **Predictable.** You'll know exactly what you're going to get and what it's going to cost. We quote fees up-front, let you try before you buy, and don't charge for over-runs at our end — all the risk is on us. - **Maintainable.** spaCy is an industry standard, and we'll deliver your pipeline with full code, data, tests and documentation, so your team can retrain, update and extend the solution as your requirements change. [Learn more](https://explosion.ai/custom-solutions) [![Advanced NLP with spaCy: A free online course](https://spacy.io/_next/static/media/course.6d34fa59.jpg)](https://course.spacy.io/) In this **free and interactive online course** you’ll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches. 
It includes **55 exercises** featuring videos, slide decks, multiple-choice questions and interactive coding practice in the browser. [Start the course](https://course.spacy.io/) ## Benchmarks spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy right up to the current **state-of-the-art**. You can also use a CPU-optimized pipeline, which is less accurate but much cheaper to run. [More results](https://spacy.io/usage/facts-figures#benchmarks) | Pipeline | Parser | Tagger | NER | |---|---|---|---| | [`en_core_web_trf`](https://spacy.io/models/en#en_core_web_trf) (spaCy v3) | 95\.1 | 97\.8 | 89\.8 | | [`en_core_web_lg`](https://spacy.io/models/en#en_core_web_lg) (spaCy v3) | 92\.0 | 97\.4 | 85\.5 | | `en_core_web_lg` (spaCy v2) | 91\.9 | 97\.2 | 85\.5 | **Full pipeline accuracy** on the [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) corpus (reported on the development set). | Named Entity Recognition System | OntoNotes | CoNLL ‘03 | |---|---|---| | spaCy RoBERTa (2020) | 89\.8 | 91\.6 | | Stanza (StanfordNLP)1 | 88\.8 | 92\.1 | | Flair2 | 89\.7 | 93\.1 | **Named entity recognition accuracy** on the [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) and [CoNLL-2003](https://www.aclweb.org/anthology/W03-0419.pdf) corpora. See [NLP-progress](http://nlpprogress.com/english/named_entity_recognition.html) for more results. Project template: [`benchmarks/ner_conll03`](https://github.com/explosion/projects/tree/v3/benchmarks/ner_conll03). **1\.** [Qi et al. (2020)](https://arxiv.org/pdf/2003.07082.pdf). **2\.** [Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/). 
- spaCy - [Usage](https://spacy.io/usage) - [Models](https://spacy.io/models) - [API Reference](https://spacy.io/api) - [Online Course](https://course.spacy.io/) - [Custom Solutions](https://explosion.ai/custom-solutions) - Community - [Universe](https://spacy.io/universe) - [GitHub Discussions](https://github.com/explosion/spaCy/discussions) - [Issue Tracker](https://github.com/explosion/spaCy/issues) - [Stack Overflow](http://stackoverflow.com/questions/tagged/spacy) - [Merchandise](https://explosion.ai/merch) - Connect - [Bluesky](https://bsky.app/profile/explosion-ai.bsky.social) - [GitHub](https://github.com/explosion/spaCy) - [Live Stream](https://www.youtube.com/playlist?list=PLBmcuObd5An5_iAxNYLJa_xWmNzsYce8c) - [YouTube](https://youtube.com/c/ExplosionAI) - [Blog](https://explosion.ai/blog) - Stay in the loop\! - Receive updates about new releases, tutorials and more. © 2016-2025 [Explosion](https://explosion.ai/)[Legal / Imprint](https://explosion.ai/legal)
Readable Markdown
Industrial-Strength Natural Language Processingin Python ### Get things done spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. ### Blazing fast spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. ### Awesome ecosystem Since its release in 2015, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows. ``` Edit the code & try spaCyspaCy v3.7 · Python 3 · via Binder ``` ## Features - Support for **75+ languages** - **84 trained pipelines** for 25 languages - Multi-task learning with pretrained **transformers** like BERT - Pretrained **word vectors** - State-of-the-art speed - Production-ready **training system** - Linguistically-motivated **tokenization** - Components for **named entity** recognition, part-of-speech tagging, dependency parsing, sentence segmentation, **text classification**, lemmatization, morphological analysis, entity linking and more - Easily extensible with **custom components** and attributes - Support for custom models in **PyTorch**, **TensorFlow** and other frameworks - Built in **visualizers** for syntax and NER - Easy **model packaging**, deployment and workflow management - Robust, rigorously evaluated accuracy ## Reproducible training for custom pipelines spaCy v3.0 introduces a comprehensive and extensible system for **configuring your training runs**. Your configuration file will describe every detail of your training run, with no hidden defaults, making it easy to **rerun your experiments** and track changes. 
You can use the quickstart widget or the [`init config`](https://spacy.io/api/cli#init-config) command to get started, or clone a project template for an end-to-end workflow. [Get started](https://spacy.io/usage/training) Language Components taggermorphologizertrainable\_lemmatizerparsernerspancattextcat Hardware CPUGPU (transformer) Optimize for efficiencyaccuracy ``` # This is an auto-generated partial config. To use it with 'spacy train' # you can run spacy init fill-config to auto-fill all default settings: # python -m spacy init fill-config ./base_config.cfg ./config.cfg [paths] train = null dev = null vectors = null [system] gpu_allocator = null [nlp] lang = "en" pipeline = [] batch_size = 1000 [components] [corpora] [corpora.train] @readers = "spacy.Corpus.v1" path = ${paths.train} max_length = 0 [corpora.dev] @readers = "spacy.Corpus.v1" path = ${paths.dev} max_length = 0 [training] dev_corpus = "corpora.dev" train_corpus = "corpora.train" [training.optimizer] @optimizers = "Adam.v1" [training.batcher] @batchers = "spacy.batch_by_words.v1" discard_oversize = false tolerance = 0.2 [training.batcher.size] @schedules = "compounding.v1" start = 100 stop = 1000 compound = 1.001 [initialize] vectors = ${paths.vectors} ``` ## End-to-end workflows from prototype to production spaCy's new project system gives you a smooth path from prototype to production. It lets you keep track of all those **data transformation**, preprocessing and **training steps**, so you can make sure your project is always ready to hand over for automation. It features source asset download, command execution, checksum verification, and caching with a variety of backends and integrations. [Try it out](https://spacy.io/usage/projects) ## Benchmarks spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy right up to the current **state-of-the-art**. You can also use a CPU-optimized pipeline, which is less accurate but much cheaper to run. 
[More results](https://spacy.io/usage/facts-figures#benchmarks) | Pipeline | Parser | Tagger | NER | |---|---|---|---| | [`en_core_web_trf`](https://spacy.io/models/en#en_core_web_trf) (spaCy v3) | 95\.1 | 97\.8 | 89\.8 | | [`en_core_web_lg`](https://spacy.io/models/en#en_core_web_lg) (spaCy v3) | 92\.0 | 97\.4 | 85\.5 | | `en_core_web_lg` (spaCy v2) | 91\.9 | 97\.2 | 85\.5 | **Full pipeline accuracy** on the [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) corpus (reported on the development set). | Named Entity Recognition System | OntoNotes | CoNLL ‘03 | |---|---|---| | spaCy RoBERTa (2020) | 89\.8 | 91\.6 | | Stanza (StanfordNLP)1 | 88\.8 | 92\.1 | | Flair2 | 89\.7 | 93\.1 | **Named entity recognition accuracy** on the [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) and [CoNLL-2003](https://www.aclweb.org/anthology/W03-0419.pdf) corpora. See [NLP-progress](http://nlpprogress.com/english/named_entity_recognition.html) for more results. Project template: [`benchmarks/ner_conll03`](https://github.com/explosion/projects/tree/v3/benchmarks/ner_conll03). **1\.** [Qi et al. (2020)](https://arxiv.org/pdf/2003.07082.pdf). **2\.** [Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/).
Shard: 119 (laksa)
Root Hash: 6839834103567801919
Unparsed URL: io,spacy!/ s443
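The unparsed-URL key `io,spacy!/ s443` looks like an inverted-host storage key, similar in spirit to SURT canonicalization: host labels reversed and comma-joined so that all pages of a domain sort together. A sketch of how such a key could be built — the `!` separator and the trailing scheme/port token (`s` for https plus the port) are inferred from this single example, not from any documented format:

```python
from urllib.parse import urlsplit


def to_unparsed(url: str) -> str:
    """Build an inverted-host key like 'io,spacy!/ s443' from a URL.

    Assumed format: reversed comma-joined host labels, '!' before the path,
    then a space and a scheme+port token ('s443' = https on port 443).
    """
    parts = urlsplit(url)
    host = ",".join(reversed(parts.hostname.split(".")))
    path = parts.path or "/"
    default_port = 443 if parts.scheme == "https" else 80
    scheme_token = ("s" if parts.scheme == "https" else "h") + str(parts.port or default_port)
    return f"{host}!{path} {scheme_token}"


print(to_unparsed("https://spacy.io/"))  # io,spacy!/ s443
```

Inverting the host puts `io,spacy` next to every other `spacy.io` page in lexicographic order, which makes per-domain range scans cheap in a sorted key-value store.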