ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.2 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://www.usessionbuddy.com/post/how-to-remove-Nan-values-from-data-in-Pandas/ |
| Last Crawled | 2026-04-06 03:51:13 (4 days ago) |
| First Indexed | 2019-11-16 03:32:57 (6 years ago) |
| HTTP Status Code | 200 |
| Meta Title | How to remove Nan values from data in Pandas |
| Meta Description | null |
| Meta Canonical | null |
| Markdown | [UsessionBuddy](https://www.usessionbuddy.com/)
# How to remove NaN values from data in Pandas
Let's import the library first.
```
import pandas as pd
```
I downloaded the movies dataset from Kaggle for this exercise. Let's read the data and look at the first few rows using head(), which returns the first 5 rows by default...
```
df = pd.read_csv("movies_metadata.csv")
df.head()
```
Let's find the names of the columns in the data using df.columns.
```
df.columns
Out[12]:
Index(['adult', 'belongs_to_collection', 'budget', 'genres', 'homepage', 'id',
'imdb_id', 'original_language', 'original_title', 'overview',
'popularity', 'poster_path', 'production_companies',
'production_countries', 'release_date', 'revenue', 'runtime',
'spoken_languages', 'status', 'tagline', 'title', 'video',
'vote_average', 'vote_count'],
dtype='object')
```
Let's see how much data we have...
```
df.size
Out[47]:
1091184
```
Note that df.size counts every cell (rows × columns), not rows. With the 24 columns listed above, the data has 1091184 / 24 = 45466 rows.
Let's do a simple query on the data and find all the rows whose title contains "Toy Story". Here is the query to do that...
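As a quick sanity check of what `size`, `shape`, and `len()` report, here is a minimal sketch on a tiny made-up DataFrame. The name `toy` and its values are illustrative assumptions, not taken from the Kaggle data.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame standing in for the movies data.
toy = pd.DataFrame({
    "title": ["Toy Story", "Heat", "Jumanji"],
    "runtime": [81.0, 170.0, 104.0],
})

n_rows, n_cols = toy.shape   # (number of rows, number of columns)
n_cells = toy.size           # rows * columns
row_count = len(toy)         # number of rows only

print(n_rows, n_cols, n_cells, row_count)
```

For a row count, `len(df)` or `df.shape[0]` is usually the safer choice.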
```
df[df.title.str.contains('Toy Story', case=False)]
```
But I got the following error...
```
ValueError: cannot index with vector containing NA / NaN values
```
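To see where the NaN in the mask comes from, the sketch below runs the same `str.contains` call on a small made-up column with one missing title. The `toy` frame is hypothetical, and `dtype=object` is set explicitly as an assumption, since other string dtypes may handle missing values differently.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second title is missing.
toy = pd.DataFrame(
    {"title": pd.Series(["Toy Story", np.nan, "Heat"], dtype=object)}
)

# For an object column, a missing title yields NaN rather than True/False.
mask = toy["title"].str.contains("Toy Story", case=False)
n_missing = int(mask.isna().sum())

print(mask.tolist())
print(n_missing)
```

A mask with a NaN entry is no longer purely boolean, which is what the indexing step complains about.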
## How To Fix The Error "cannot index with vector containing NA"
The error occurs because str.contains() returns NaN for rows whose title is missing, so the mask used for indexing contains NaN. To fix it, we can either ignore the NaN values when running the query or remove them altogether. Let's try the first idea, ignoring the NaN values. The command is the following...
```
df[df.title.str.contains('Toy Story', case=False) & df.title.notna()]
```
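An alternative that may read more simply is the `na` parameter of `str.contains`, which sets the value used for missing entries. This is a sketch on a hypothetical `toy` frame, not the author's original approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing title.
toy = pd.DataFrame(
    {"title": pd.Series(["Toy Story", np.nan, "Toy Story 2", "Heat"], dtype=object)}
)

# na=False treats a missing title as "no match", so the mask has no NaN.
mask = toy["title"].str.contains("Toy Story", case=False, na=False)
matches = toy[mask]

print(matches)
```

With `na=False` the separate check for missing titles is no longer needed.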
To find out how many records we get, we can call Python's len() on the result; for a DataFrame, len() returns the number of rows.
```
len(df[df.title.str.contains('Toy Story', case=False) & df.title.notna()])
Out[52]:
5
```
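Counting matches can also be done on the mask itself, since `True` counts as 1 when summed. The sketch below compares both ways on a hypothetical `toy` frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing title.
toy = pd.DataFrame(
    {"title": pd.Series(["Toy Story", np.nan, "Toy Story 2", "Heat"], dtype=object)}
)
mask = toy["title"].str.contains("Toy Story", case=False, na=False)

count_via_len = len(toy[mask])    # rows in the filtered DataFrame
count_via_sum = int(mask.sum())   # number of True values in the mask

print(count_via_len, count_via_sum)
```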
We got 5 rows.
The above method only skips the rows whose title is NaN. We can also remove every row that has a NaN value in any column...
## How To Drop NA Values Using Pandas DropNa
```
df1 = df.dropna()
df1.size
Out[46]:
16632
```
As we can see above, dropna() removes every row in which at least one value is NA/NaN. The size has dropped to 16632 cells, which with 24 columns is 16632 / 24 = 693 rows.
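Because a plain `dropna()` discards a row for a NaN in any column, it can help to check which columns hold the missing values and, if only some columns matter, to restrict the check with `subset`. The sketch below uses a hypothetical `toy` frame with made-up taglines.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing title, two missing taglines.
toy = pd.DataFrame({
    "title": ["Toy Story", np.nan, "Heat", "Jumanji"],
    "tagline": [np.nan, "A made-up tagline", np.nan, "Another made-up tagline"],
})

missing_per_column = toy.isna().sum()      # NaN count for each column
all_complete = toy.dropna()                # keep rows with no NaN anywhere
has_title = toy.dropna(subset=["title"])   # only require a non-missing title

print(missing_per_column)
print(len(all_complete), len(has_title))
```

Here `dropna(subset=["title"])` keeps rows that a plain `dropna()` would have thrown away only because the tagline was missing.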
Copyright © All rights reserved Googu Corporation |
| Shard | 195 (laksa) |
| Root Hash | 7666975497231005395 |
| Unparsed URL | com,usessionbuddy!www,/post/how-to-remove-Nan-values-from-data-in-Pandas/ s443 |