[Advanced Machine Learning] Data Analysis Process

Missing Values

Hands-on: handling missing values in the Titanic data

import pandas as pd
titanic_df = pd.read_csv('file_path')  # replace 'file_path' with the path to the Titanic CSV

titanic_df.info()
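
`info()` reports non-null counts per column, but a direct count of missing values is often easier to scan. Below is a minimal sketch of that check. It assumes a small made-up DataFrame (`sample_df`, a hypothetical stand-in, not the real Titanic file), so the numbers are for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few Titanic columns (not the real dataset)
sample_df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, np.nan, 35.0],
    'Cabin': [np.nan, 'C85', np.nan, np.nan, 'C123'],
    'Fare': [7.25, 71.28, 7.92, 8.05, 53.10],
})

# Number of missing values in each column
missing_count = sample_df.isna().sum()

# Share of missing values in each column (0.0 to 1.0)
missing_ratio = sample_df.isna().mean()

print(missing_count)
print(missing_ratio)
```

The same two calls should work unchanged on `titanic_df`.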

# Drop every row that has at least one missing value
titanic_df.dropna(axis=0).info()

# Drop only the rows where Age is missing
cond3 = (titanic_df['Age'].notna())
titanic_df[cond3].info()
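
The boolean-mask approach above can also be written with the `subset` argument of `dropna`. The sketch below compares the two on a small made-up DataFrame (`sample_df` is hypothetical, not the Titanic data) and also shows how aggressive dropping on all columns can be.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every row is missing either Age or Cabin
sample_df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, np.nan],
    'Cabin': [np.nan, 'C85', np.nan, 'E46'],
})

# Keep rows where Age is present, using a boolean mask
by_mask = sample_df[sample_df['Age'].notna()]

# Same filter, expressed with dropna(subset=...)
by_dropna = sample_df.dropna(subset=['Age'])

# Dropping on all columns removes every row that has any missing value
all_dropped = sample_df.dropna(axis=0)

print(by_mask.equals(by_dropna))
print(len(sample_df), len(by_dropna), len(all_dropped))
```

Dropping rows on every column can discard a large part of the data when one sparse column is involved, so restricting the check to the columns that matter is usually the safer default.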

# Imputation with fillna
# Compute the mean of Age (missing values are skipped), rounded to 2 decimals
age_mean = titanic_df['Age'].mean().round(2)
titanic_df['Age_mean'] = titanic_df['Age'].fillna(age_mean)

titanic_df.info()
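
One side effect of mean imputation is worth checking: the column mean stays the same, but the spread shrinks because every filled value sits exactly at the mean. The sketch below illustrates this on a made-up Series (`ages` is hypothetical) and also shows the median as an alternative fill value, which is an addition here and not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries
ages = pd.Series([22.0, np.nan, 26.0, np.nan, 35.0, 54.0], name='Age')

age_mean = ages.mean()      # NaN values are skipped by default
age_median = ages.median()  # less sensitive to extreme values than the mean

filled_mean = ages.fillna(age_mean)
filled_median = ages.fillna(age_median)

# The mean is preserved, but the standard deviation gets smaller
print(ages.mean(), filled_mean.mean())
print(ages.std(), filled_mean.std())
```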

## Imputation with SimpleImputer

from sklearn.impute import SimpleImputer
si = SimpleImputer()
si.fit(titanic_df[['Age']])

# Check the fill value learned by fit (the default strategy is the mean)
si.statistics_

## array([29.69911765])

titanic_df['Age_si_mean'] = si.transform(titanic_df[['Age']])

titanic_df.info()
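
`SimpleImputer` uses the mean by default, but its `strategy` parameter also accepts `'median'`, `'most_frequent'`, and `'constant'`. Below is a minimal sketch of the median and most-frequent strategies. The DataFrame `sample_df` and the new column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one numeric and one categorical column
sample_df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0, np.nan],
    'Embarked': ['S', 'C', np.nan, 'S', 'S'],
})

# Numeric column: fill with the median
median_imputer = SimpleImputer(strategy='median')
age_filled = median_imputer.fit_transform(sample_df[['Age']])
# transform returns a 2D array of shape (n_rows, 1); flatten it to one column
sample_df['Age_si_median'] = age_filled.ravel()

# Categorical column: fill with the most frequent value
mode_imputer = SimpleImputer(strategy='most_frequent')
embarked_filled = mode_imputer.fit_transform(sample_df[['Embarked']])
sample_df['Embarked_si_mode'] = embarked_filled.ravel()

print(median_imputer.statistics_, mode_imputer.statistics_)
print(sample_df)
```

Note that the `fillna` version earlier rounds the mean to two decimals, so its fill value differs slightly from the unrounded `statistics_` value that `SimpleImputer` uses.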


A blog where a lost job seeker tries whatever they can