ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |

| Property | Value |
|---|---|
| URL | https://spacy.io/ |
| Last Crawled | 2026-04-07 12:19:09 (10 hours ago) |
| First Indexed | 2015-12-03 04:33:17 (10 years ago) |
| HTTP Status Code | 200 |
| Meta Title | spaCy · Industrial-strength Natural Language Processing in Python |
| Meta Description | spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more. |
| Meta Canonical | null |
| Boilerpipe Text | Industrial-Strength Natural Language Processing in Python
Get things done
spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive.
Blazing fast
spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using.
Awesome ecosystem
Since its release in 2015, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows.
Edit the code & try spaCy (spaCy v3.7 · Python 3 · via Binder)
Features
Support for 75+ languages
84 trained pipelines for 25 languages
Multi-task learning with pretrained transformers like BERT
Pretrained word vectors
State-of-the-art speed
Production-ready training system
Linguistically-motivated tokenization
Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
Easily extensible with custom components and attributes
Support for custom models in PyTorch, TensorFlow and other frameworks
Built in visualizers for syntax and NER
Easy model packaging, deployment and workflow management
Robust, rigorously evaluated accuracy
Reproducible training for custom pipelines
spaCy v3.0 introduces a comprehensive and extensible system for configuring your training runs. Your configuration file will describe every detail of your training run, with no hidden defaults, making it easy to rerun your experiments and track changes. You can use the quickstart widget or the init config command to get started, or clone a project template for an end-to-end workflow.
Get started
Quickstart widget options: Language; Components (tagger, morphologizer, trainable_lemmatizer, parser, ner, spancat, textcat); Hardware (CPU, GPU transformer); Optimize for (efficiency, accuracy)
# This is an auto-generated partial config. To use it with 'spacy train'
# you can run spacy init fill-config to auto-fill all default settings:
# python -m spacy init fill-config ./base_config.cfg ./config.cfg
[paths]
train = null
dev = null
vectors = null
[system]
gpu_allocator = null
[nlp]
lang = "en"
pipeline = []
batch_size = 1000
[components]
[corpora]
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
[training.optimizer]
@optimizers = "Adam.v1"
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
[initialize]
vectors = ${paths.vectors}
End-to-end workflows from prototype to production
spaCy's new project system gives you a smooth path from prototype to production. It lets you keep track of all those data transformation, preprocessing and training steps, so you can make sure your project is always ready to hand over for automation. It features source asset download, command execution, checksum verification, and caching with a variety of backends and integrations.
Try it out
Benchmarks
spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy right up to the current state-of-the-art. You can also use a CPU-optimized pipeline, which is less accurate but much cheaper to run.
More results
| Pipeline | Parser | Tagger | NER |
|---|---|---|---|
| en_core_web_trf (spaCy v3) | 95.1 | 97.8 | 89.8 |
| en_core_web_lg (spaCy v3) | 92.0 | 97.4 | 85.5 |
| en_core_web_lg (spaCy v2) | 91.9 | 97.2 | 85.5 |
Full pipeline accuracy on the OntoNotes 5.0 corpus (reported on the development set).
| Named Entity Recognition System | OntoNotes | CoNLL ‘03 |
|---|---|---|
| spaCy RoBERTa (2020) | 89.8 | 91.6 |
| Stanza (StanfordNLP)¹ | 88.8 | 92.1 |
| Flair² | 89.7 | 93.1 |
Named entity recognition accuracy on the OntoNotes 5.0 and CoNLL-2003 corpora. See NLP-progress for more results. Project template: benchmarks/ner_conll03.
1. Qi et al. (2020). 2. Akbik et al. (2018). |
| Markdown | [spaCy](https://spacy.io/)
[💥 **New:** spaCy for PDFs and Word docs](https://github.com/explosion/spacy-layout)
- [Usage](https://spacy.io/usage)
- [Models](https://spacy.io/models)
- [API](https://spacy.io/api)
- [Universe](https://spacy.io/universe)
Search
# Industrial-Strength Natural Language Processing
## in Python
### Get things done
spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive.
[Get started](https://spacy.io/usage/spacy-101)
### Blazing fast
spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using.
[Facts & Figures](https://spacy.io/usage/facts-figures)
### Awesome ecosystem
Since its release in 2015, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows.
[Read more](https://spacy.io/usage/projects)
```
Edit the code & try spaCy (spaCy v3.7 · Python 3 · via Binder)
```
## Features
- Support for **75+ languages**
- **84 trained pipelines** for 25 languages
- Multi-task learning with pretrained **transformers** like BERT
- Pretrained **word vectors**
- State-of-the-art speed
- Production-ready **training system**
- Linguistically-motivated **tokenization**
- Components for **named entity** recognition, part-of-speech tagging, dependency parsing, sentence segmentation, **text classification**, lemmatization, morphological analysis, entity linking and more
- Easily extensible with **custom components** and attributes
- Support for custom models in **PyTorch**, **TensorFlow** and other frameworks
- Built in **visualizers** for syntax and NER
- Easy **model packaging**, deployment and workflow management
- Robust, rigorously evaluated accuracy
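The features above all hang off the same small API surface. As a minimal sketch (assuming only that spaCy is installed; `spacy.blank` builds a tokenizer-only pipeline, so no trained model download is needed):

```python
import spacy

# A blank pipeline provides language-specific tokenization
# with no trained components attached.
nlp = spacy.blank("en")
doc = nlp("spaCy excels at large-scale information extraction tasks.")

# Doc is a sequence of Token objects; punctuation is split off.
tokens = [token.text for token in doc]
print(tokens[0], tokens[-1])
```

A trained pipeline such as `en_core_web_sm` would be loaded with `spacy.load(...)` instead, adding the tagger, parser and NER components listed above.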
### NEW [Large Language Models: Integrating LLMs into structured NLP pipelines](https://spacy.io/usage/large-language-models)
[The spacy-llm package](https://github.com/explosion/spacy-llm) integrates Large Language Models (LLMs) into spaCy, featuring a modular system for **fast prototyping** and **prompting**, and turning unstructured responses into **robust outputs** for various NLP tasks, **no training data** required.
[Learn more](https://spacy.io/usage/large-language-models)
### From the makers of spaCy [Prodigy: Radically efficient machine teaching](https://prodi.gy/)
[](https://prodi.gy/)
Prodigy is an **annotation tool** so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you **train and evaluate** your models faster.
[Try it out](https://prodi.gy/)
## Reproducible training for custom pipelines
spaCy v3.0 introduces a comprehensive and extensible system for **configuring your training runs**. Your configuration file will describe every detail of your training run, with no hidden defaults, making it easy to **rerun your experiments** and track changes. You can use the quickstart widget or the [`init config`](https://spacy.io/api/cli#init-config) command to get started, or clone a project template for an end-to-end workflow.
[Get started](https://spacy.io/usage/training)
Quickstart widget options: Language; Components (tagger, morphologizer, trainable_lemmatizer, parser, ner, spancat, textcat); Hardware (CPU, GPU transformer); Optimize for (efficiency, accuracy)
```
# This is an auto-generated partial config. To use it with 'spacy train'
# you can run spacy init fill-config to auto-fill all default settings:
# python -m spacy init fill-config ./base_config.cfg ./config.cfg
[paths]
train = null
dev = null
vectors = null
[system]
gpu_allocator = null
[nlp]
lang = "en"
pipeline = []
batch_size = 1000
[components]
[corpora]
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
[training.optimizer]
@optimizers = "Adam.v1"
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
[initialize]
vectors = ${paths.vectors}
```
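The `[training.batcher.size]` section of the config uses the `compounding.v1` schedule: the batch size starts at `start`, is multiplied by `compound` on each step, and is capped at `stop`. The arithmetic can be sketched in plain Python (an illustration of the schedule's behaviour, not spaCy's implementation):

```python
def compounding(start: float, stop: float, compound: float):
    """Yield an infinite schedule: start, start*compound, ..., capped at stop."""
    value = start
    while True:
        yield min(value, stop)
        value *= compound

# With the values from the config: starts at 100 and grows
# by 0.1% per step toward the cap of 1000.
sched = compounding(start=100, stop=1000, compound=1.001)
sizes = [next(sched) for _ in range(5)]
```

Growing the batch size over training is a common trick: small batches early give noisier, more exploratory updates, while larger batches later stabilize convergence.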
[](https://spacy.io/usage/projects)
#### 🪐Get started: [`pipelines/tagger_parser_ud`](https://github.com/explosion/projects/tree/v3/pipelines/tagger_parser_ud)
The easiest way to get started is to clone a project template and run it – for example, this template for training a **part-of-speech tagger** and **dependency parser** on a Universal Dependencies treebank.
$ python -m spacy project clone pipelines/tagger_parser_ud
## End-to-end workflows from prototype to production
spaCy's new project system gives you a smooth path from prototype to production. It lets you keep track of all those **data transformation**, preprocessing and **training steps**, so you can make sure your project is always ready to hand over for automation. It features source asset download, command execution, checksum verification, and caching with a variety of backends and integrations.
[Try it out](https://spacy.io/usage/projects)
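The checksum-verification step mentioned above can be illustrated in plain Python. This is a sketch of the general idea (hash a downloaded asset and compare it against an expected digest), not spaCy's internal code:

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_asset(path: Path, expected: str) -> bool:
    """True if the file exists and its checksum matches the expected digest."""
    return path.exists() and file_checksum(path) == expected
```

spaCy's project system records such checksums for assets in `project.yml`, so cached downloads can be skipped when a local copy is already present and valid.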
[](https://explosion.ai/custom-solutions)
**Get a custom spaCy pipeline, tailor-made for your NLP problem by spaCy's core developers.**
- **Streamlined.** Nobody knows spaCy better than we do. Send us your pipeline requirements and we'll be ready to start producing your solution in no time at all.
- **Production ready.** spaCy pipelines are robust and easy to deploy. You'll get a complete spaCy project folder which is ready to `spacy project run`.
- **Predictable.** You'll know exactly what you're going to get and what it's going to cost. We quote fees up-front, let you try before you buy, and don't charge for over-runs at our end — all the risk is on us.
- **Maintainable.** spaCy is an industry standard, and we'll deliver your pipeline with full code, data, tests and documentation, so your team can retrain, update and extend the solution as your requirements change.
[Learn more](https://explosion.ai/custom-solutions)
[](https://course.spacy.io/)
In this **free and interactive online course** you’ll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches. It includes **55 exercises** featuring videos, slide decks, multiple-choice questions and interactive coding practice in the browser.
[Start the course](https://course.spacy.io/)
## Benchmarks
spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy right up to the current **state-of-the-art**. You can also use a CPU-optimized pipeline, which is less accurate but much cheaper to run.
[More results](https://spacy.io/usage/facts-figures#benchmarks)
| Pipeline | Parser | Tagger | NER |
|---|---|---|---|
| [`en_core_web_trf`](https://spacy.io/models/en#en_core_web_trf) (spaCy v3) | 95.1 | 97.8 | 89.8 |
| [`en_core_web_lg`](https://spacy.io/models/en#en_core_web_lg) (spaCy v3) | 92.0 | 97.4 | 85.5 |
| `en_core_web_lg` (spaCy v2) | 91.9 | 97.2 | 85.5 |
**Full pipeline accuracy** on the [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) corpus (reported on the development set).
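To make the accuracy/cost trade-off concrete, the table's scores can be compared programmatically. The pipeline names and numbers below are copied directly from the table above; the comparison itself is just an illustration:

```python
# Parser, tagger and NER scores on the OntoNotes 5.0 development set,
# as reported in the benchmark table.
scores = {
    "en_core_web_trf (spaCy v3)": {"parser": 95.1, "tagger": 97.8, "ner": 89.8},
    "en_core_web_lg (spaCy v3)": {"parser": 92.0, "tagger": 97.4, "ner": 85.5},
    "en_core_web_lg (spaCy v2)": {"parser": 91.9, "tagger": 97.2, "ner": 85.5},
}

# The transformer pipeline leads on NER...
best_ner = max(scores, key=lambda name: scores[name]["ner"])
# ...by this margin over the CPU-optimized v3 pipeline.
ner_gap = scores["en_core_web_trf (spaCy v3)"]["ner"] - scores["en_core_web_lg (spaCy v3)"]["ner"]
```

The gap is largest for NER and parsing, which is why the text above frames the CPU-optimized pipeline as "less accurate but much cheaper to run".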
| Named Entity Recognition System | OntoNotes | CoNLL ‘03 |
|---|---|---|
| spaCy RoBERTa (2020) | 89.8 | 91.6 |
| Stanza (StanfordNLP)¹ | 88.8 | 92.1 |
| Flair² | 89.7 | 93.1 |
**Named entity recognition accuracy** on the [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) and [CoNLL-2003](https://www.aclweb.org/anthology/W03-0419.pdf) corpora. See [NLP-progress](http://nlpprogress.com/english/named_entity_recognition.html) for more results. Project template: [`benchmarks/ner_conll03`](https://github.com/explosion/projects/tree/v3/benchmarks/ner_conll03). **1.** [Qi et al. (2020)](https://arxiv.org/pdf/2003.07082.pdf). **2.** [Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/).
- spaCy
- [Usage](https://spacy.io/usage)
- [Models](https://spacy.io/models)
- [API Reference](https://spacy.io/api)
- [Online Course](https://course.spacy.io/)
- [Custom Solutions](https://explosion.ai/custom-solutions)
- Community
- [Universe](https://spacy.io/universe)
- [GitHub Discussions](https://github.com/explosion/spaCy/discussions)
- [Issue Tracker](https://github.com/explosion/spaCy/issues)
- [Stack Overflow](http://stackoverflow.com/questions/tagged/spacy)
- [Merchandise](https://explosion.ai/merch)
- Connect
- [Bluesky](https://bsky.app/profile/explosion-ai.bsky.social)
- [GitHub](https://github.com/explosion/spaCy)
- [Live Stream](https://www.youtube.com/playlist?list=PLBmcuObd5An5_iAxNYLJa_xWmNzsYce8c)
- [YouTube](https://youtube.com/c/ExplosionAI)
- [Blog](https://explosion.ai/blog)
- Stay in the loop!
- Receive updates about new releases, tutorials and more.
© 2016-2025 [Explosion](https://explosion.ai/)[Legal / Imprint](https://explosion.ai/legal) |
| Readable Markdown | Industrial-Strength Natural Language Processing in Python
### Get things done
spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive.
### Blazing fast
spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using.
### Awesome ecosystem
Since its release in 2015, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows.
```
Edit the code & try spaCy (spaCy v3.7 · Python 3 · via Binder)
```
## Features
- Support for **75+ languages**
- **84 trained pipelines** for 25 languages
- Multi-task learning with pretrained **transformers** like BERT
- Pretrained **word vectors**
- State-of-the-art speed
- Production-ready **training system**
- Linguistically-motivated **tokenization**
- Components for **named entity** recognition, part-of-speech tagging, dependency parsing, sentence segmentation, **text classification**, lemmatization, morphological analysis, entity linking and more
- Easily extensible with **custom components** and attributes
- Support for custom models in **PyTorch**, **TensorFlow** and other frameworks
- Built in **visualizers** for syntax and NER
- Easy **model packaging**, deployment and workflow management
- Robust, rigorously evaluated accuracy
## Reproducible training for custom pipelines
spaCy v3.0 introduces a comprehensive and extensible system for **configuring your training runs**. Your configuration file will describe every detail of your training run, with no hidden defaults, making it easy to **rerun your experiments** and track changes. You can use the quickstart widget or the [`init config`](https://spacy.io/api/cli#init-config) command to get started, or clone a project template for an end-to-end workflow.
[Get started](https://spacy.io/usage/training)
Quickstart widget options: Language; Components (tagger, morphologizer, trainable_lemmatizer, parser, ner, spancat, textcat); Hardware (CPU, GPU transformer); Optimize for (efficiency, accuracy)
```
# This is an auto-generated partial config. To use it with 'spacy train'
# you can run spacy init fill-config to auto-fill all default settings:
# python -m spacy init fill-config ./base_config.cfg ./config.cfg
[paths]
train = null
dev = null
vectors = null
[system]
gpu_allocator = null
[nlp]
lang = "en"
pipeline = []
batch_size = 1000
[components]
[corpora]
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
[training.optimizer]
@optimizers = "Adam.v1"
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
[initialize]
vectors = ${paths.vectors}
```
## End-to-end workflows from prototype to production
spaCy's new project system gives you a smooth path from prototype to production. It lets you keep track of all those **data transformation**, preprocessing and **training steps**, so you can make sure your project is always ready to hand over for automation. It features source asset download, command execution, checksum verification, and caching with a variety of backends and integrations.
[Try it out](https://spacy.io/usage/projects)
## Benchmarks
spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy right up to the current **state-of-the-art**. You can also use a CPU-optimized pipeline, which is less accurate but much cheaper to run.
[More results](https://spacy.io/usage/facts-figures#benchmarks)
| Pipeline | Parser | Tagger | NER |
|---|---|---|---|
| [`en_core_web_trf`](https://spacy.io/models/en#en_core_web_trf) (spaCy v3) | 95.1 | 97.8 | 89.8 |
| [`en_core_web_lg`](https://spacy.io/models/en#en_core_web_lg) (spaCy v3) | 92.0 | 97.4 | 85.5 |
| `en_core_web_lg` (spaCy v2) | 91.9 | 97.2 | 85.5 |
**Full pipeline accuracy** on the [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) corpus (reported on the development set).
| Named Entity Recognition System | OntoNotes | CoNLL ‘03 |
|---|---|---|
| spaCy RoBERTa (2020) | 89.8 | 91.6 |
| Stanza (StanfordNLP)¹ | 88.8 | 92.1 |
| Flair² | 89.7 | 93.1 |
**Named entity recognition accuracy** on the [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) and [CoNLL-2003](https://www.aclweb.org/anthology/W03-0419.pdf) corpora. See [NLP-progress](http://nlpprogress.com/english/named_entity_recognition.html) for more results. Project template: [`benchmarks/ner_conll03`](https://github.com/explosion/projects/tree/v3/benchmarks/ner_conll03). **1.** [Qi et al. (2020)](https://arxiv.org/pdf/2003.07082.pdf). **2.** [Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/). |
| Shard | 119 (laksa) |
| Root Hash | 6839834103567801919 |
| Unparsed URL | io,spacy!/ s443 |