Wednesday, 26 July 2023

REMOVE EMPTY ROWS, NULL VALUES, MEAN, MEDIAN, MODE - PYTHON

REMOVE EMPTY ROWS

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())


The dropna() method removes every row that contains at least one empty (NaN) cell. By default, it returns a new DataFrame and does not change the original.

If you want to change the original DataFrame, use the inplace = True argument:

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace = True)

print(df.to_string())
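
To see the difference between the two calls without needing data.csv, here is a minimal sketch using a small made-up DataFrame. The column names and values are assumptions for illustration only.

```python
import pandas as pd
import numpy as np

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "Duration": [60, 45, np.nan, 30],
    "Calories": [409.1, np.nan, 300.0, 195.2],
})

# dropna() returns a new DataFrame; df itself still has all 4 rows
new_df = df.dropna()
print(len(df), len(new_df))

# inplace=True modifies df directly (the call itself returns None)
df.dropna(inplace=True)
print(len(df))
```

Only the two rows with no empty cells survive in both cases; the difference is whether the original df is modified.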

REPLACE NULL VALUES WITH THE NUMBER 130

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)

REPLACE ONLY FOR SPECIFIED COLUMNS

import pandas as pd

df = pd.read_csv('data.csv')

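# Assign the result back to the column. Calling fillna(inplace = True)
# on a selected column is not guaranteed to update df in recent pandas versions.
df["Calories"] = df["Calories"].fillna(130)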
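
Another way to fill only certain columns is to pass a dictionary to fillna() on the whole DataFrame. The sketch below uses made-up sample data in place of data.csv; the values are assumptions for illustration.

```python
import pandas as pd
import numpy as np

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "Duration": [60, np.nan, 45],
    "Calories": [409.1, 300.0, np.nan],
})

# Passing a dict fills each listed column with its own value;
# columns not listed (here, Duration) are left untouched.
filled = df.fillna({"Calories": 130})
print(filled.to_string())
```

This keeps the empty cell in Duration as it is, so you can decide separately how to handle each column.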

REPLACE USING MEAN, MEDIAN, OR MODE


MEAN = THE AVERAGE VALUE (THE SUM OF ALL VALUES DIVIDED BY NUMBER OF VALUES).


import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mean()

# Assign the result back so the change is applied to df
df["Calories"] = df["Calories"].fillna(x)

MEDIAN = THE VALUE IN THE MIDDLE, AFTER YOU HAVE SORTED ALL VALUES ASCENDING.

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].median()

# Assign the result back so the change is applied to df
df["Calories"] = df["Calories"].fillna(x)
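
The choice between mean and median matters when the column contains an outlier. A small sketch with made-up values (an assumption, not taken from data.csv) shows how far apart the two can be:

```python
import pandas as pd
import numpy as np

# Hypothetical values with one large outlier (1250) and one empty cell
calories = pd.Series([100, 100, 250, 300, 1250, np.nan])

# Both methods skip NaN by default
mean_value = calories.mean()      # (100 + 100 + 250 + 300 + 1250) / 5 = 400.0
median_value = calories.median()  # middle of the 5 sorted values = 250.0

filled_mean = calories.fillna(mean_value)
filled_median = calories.fillna(median_value)

print(mean_value, median_value)
```

The single outlier pulls the mean well above most of the values, while the median stays close to the typical value.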

MODE = THE VALUE THAT APPEARS MOST FREQUENTLY

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mode()[0]

# Assign the result back so the change is applied to df
df["Calories"] = df["Calories"].fillna(x)
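
The [0] after mode() is needed because mode() returns a Series, not a single number: two or more values can be tied for most frequent. A minimal sketch with made-up values (an assumption for illustration):

```python
import pandas as pd
import numpy as np

# Hypothetical values where 100 and 300 both appear twice
calories = pd.Series([300, 100, 300, 100, 250, np.nan])

# mode() returns every tied value, sorted ascending
modes = calories.mode()
print(modes.tolist())  # [100.0, 300.0]

# [0] picks the first (smallest) of the tied values
x = modes[0]
filled = calories.fillna(x)
print(filled.tolist())
```

When there is a tie, taking [0] is just one possible choice; you may prefer a different rule for your data.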

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and ...