Wednesday, 26 July 2023

REMOVE EMPTY ROWS, NULL VALUES, MEAN, MEDIAN, MODE - PYTHON

REMOVE EMPTY ROWS

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())


By default, the dropna() method returns a new DataFrame, and will not change the original.

If you want to change the original DataFrame, use the inplace = True argument

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace = True)

print(df.to_string())

REPLACE NULL VALUES WITH THE NUMBER 130

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)

REPLACE ONLY FOR SPECIFIED COLUMNS

import pandas as pd

df = pd.read_csv('data.csv')

df["Calories"].fillna(130, inplace = True)

REPLACE USING MEAN, MEDIAN, OR MODE


MEAN = THE AVERAGE VALUE (THE SUM OF ALL VALUES DIVIDED BY NUMBER OF VALUES).


import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mean()

df["Calories"].fillna(x, inplace = True)

MEDIAN = THE VALUE IN THE MIDDLE, AFTER YOU HAVE SORTED ALL VALUES ASCENDING.

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].median()

df["Calories"].fillna(x, inplace = True)

MODE = THE VALUE THAT APPEARS MOST FREQUENTLY

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mode()[0]

df["Calories"].fillna(x, inplace = True)

No comments:

Data Sharing

  1. Create Share CREATE SHARE my_share; 2. Grant privileges to share GRANT USAGE ON DATABASE my_db TO SHARE my_share; GRANT USAGE ...