Housing Price Prediction with TensorFlow
Seoul Apartment Price Prediction Modeling Project
Apartment prices were said to have risen without pause since 2019, but several factors now make them much harder to predict: Korean interest rates are climbing in step with US benchmark rate hikes, the expected population is shrinking, and supply is set to expand after 2025 as the third-phase new towns open for move-in.
So where are apartment prices headed? I took a quick look using Python and TensorFlow.
(1) Import Required Libraries
import pathlib
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
print(tf.__version__)
import os
from datetime import datetime, date
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_colwidth', None)
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import font_manager, rc
font_path = "C:/Windows/Fonts/SeoulNamsanB.ttf"
font = font_manager.FontProperties(fname=font_path).get_name()
rc('font', family=font)
2.1.0
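One small tweak worth adding here (an assumption based on typical matplotlib behaviour with Korean fonts, not something the original notebook sets): with a Korean font active, the Unicode minus sign on axis ticks often renders as a broken glyph, which can be avoided by falling back to the ASCII hyphen.
# Hypothetical extra line: render minus signs as ASCII '-' so negative ticks
# are not drawn as broken boxes under the Korean font.
plt.rcParams['axes.unicode_minus'] = False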
(2) Reading the CSV Files and Preprocessing the Data
#Read the macroeconomic (macro) data
korea_gdp_rate = pd.read_csv("./data/Marco/01_korea_gdp_series.csv",sep=",",encoding="UTF-8")
korea_interest_rate = pd.read_csv("./data/Marco/02_korea_interest_rate.csv",sep=",",encoding="UTF-8")
korea_personal_loan = pd.read_csv("./data/Marco/03_korea_personal_loan.csv",sep=",",encoding="UTF-8")
korea_loan_for_house = pd.read_csv("./data/Marco/04_korea_loan_for_house.csv",sep=",",encoding="UTF-8")
korea_personal_GDP = pd.read_csv("./data/Marco/05_korea_personal_GDP.csv",sep=",",encoding="UTF-8")
seoul_gdp_series = pd.read_csv("./data/Marco/06_seoul_gdp_series.csv",sep=",",encoding="UTF-8")
housing_count_yearly = pd.read_csv("./data/Marco/07_housing_count_yearly.csv",sep=",",encoding="UTF-8")
seoul_population= pd.read_csv("./data/Marco/08_seoul_population.csv",sep=",",encoding="UTF-8")
constructure_confirm= pd.read_csv("./data/Marco/09_constructure_confirm.csv",sep=",",encoding="UTF-8")
#Extract only the rows for the years needed (2011-2020)
korea_gdp_rate = korea_gdp_rate.loc[41:]
korea_interest_rate = korea_interest_rate.loc[12:]
korea_personal_loan = korea_personal_loan.loc[9:]
korea_loan_for_house = korea_loan_for_house.loc[:9]
korea_personal_GDP = korea_personal_GDP[1:]
seoul_gdp_series
housing_count_yearly
seoul_population
constructure_confirm
year 합계 수도권 서울
0 2011 549594 272156 88060
1 2012 586884 269290 86123
2 2013 440116 192610 77621
3 2014 515251 241889 65249
4 2015 765328 408773 101235
5 2016 726048 341162 74739
6 2017 653441 321402 113131
7 2018 554136 280097 65751
8 2019 487975 272226 62272
9 2020 457514 252301 58181
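Note that the `.loc[41:]`-style slicing above selects rows by position, so it silently breaks if a CSV gains or loses rows. A more defensive alternative, shown as a sketch under the assumption that each macro CSV carries a `year` column (the later merge step relies on one), is to filter by the year value itself:
# Sketch: keep only 2011-2020 by value instead of by row position (assumes a 'year' column).
def keep_2011_2020(df):
    years = pd.to_numeric(df["year"], errors="coerce")
    return df[years.between(2011, 2020)].reset_index(drop=True)

korea_gdp_rate = keep_2011_2020(korea_gdp_rate)
korea_interest_rate = keep_2011_2020(korea_interest_rate)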
(File) korea_gdp_series
- Nominal GDP of South Korea
Table: Gross Domestic Product and Economic Growth Rate
Unit: KRW billions; year-on-year %
Source: Bank of Korea, "National Income"
Note: * Gross national product (nominal, at market prices)
- The real GDP / real growth figures (Bank of Korea advance GDP estimate) and the nominal GDP / nominal income growth figures (Bank of Korea preliminary GDP estimate) are released on different schedules, so the nominal GDP and (real) growth-rate series are updated at different times.
(File) korea_interest_rate
- Korean interest rate trend. Table: Market Interest Rate Trend. Unit: period-average rate, %. Source: Bank of Korea Economic Statistics System > 4. Interest Rates. Note: * The call-rate target is an end-of-month figure; the 10-year KTB series starts in November 2000 and the call-rate target series in April 1999.
(File) korea_personal_loan
- Household loan data. Table: Household Credit Trend. Unit: KRW trillions, %. Source: Bank of Korea, "Household Credit Trend"
(File) korea_loan_for_house
- Housing Finance Credit Guarantee Fund. Table: Housing Finance Credit Guarantee Fund Trend. Unit: KRW 100 millions, %. Source: Housing Finance Statistics System
This is the volume of housing loans held by the financial institutions that contribute to the Housing Finance Credit Guarantee Fund under Article 56 of the Korea Housing Finance Corporation Act and Article 3 of its Enforcement Rule; the definition and scope of the contributing housing loans changed with the revision of the Enforcement Rule (Article 3, effective 2007-07-01).
※ Financial institutions contributing to the fund (as of the end of the previous quarter)
- Commercial banks: Kookmin, Shinhan, Woori, KEB Hana, Citibank Korea, Standard Chartered Bank Korea, KakaoBank, K bank
- Regional banks: Kyongnam, Gwangju, Daegu, Busan, Jeonbuk, Jeju
- Specialized banks: Nonghyup, Suhyup, Industrial Bank of Korea, Korea Development Bank
- Foreign bank branches: MUFG Bank (Tokyo-Mitsubishi UFJ), Bank of China, Agricultural Bank of China, China Everbright, ICBC, National Bank of Pakistan, HSBC
(File) seoul_personal_gdp
- Per-capita regional personal income. Table: Per-capita Regional Personal Income by Province. Unit: KRW thousands. Source: Statistics Korea (KOSIS)
(File) seoul_gdp_series
- Gross regional domestic product of Seoul. Table: Seoul Gross Regional Domestic Product. Unit: KRW millions. Source: Seoul Metropolitan Government
(File) housing_count_yearly
- Seoul multi-unit housing stock statistics. Table: Status of Multi-unit Housing in Seoul. Unit: cases. Source: Seoul Metropolitan Government
(File) seoul_population
- Seoul population statistics. Table: Seoul Population Statistics. Unit: persons. Source: Seoul Metropolitan Government
(File) constructure_confirm
- Housing construction permit results. Table: Housing Construction Permits. Unit: housing units. Source: Korea Housing Association
#Load the apartment transaction data
tr_2011 = pd.read_csv("./data/Transaction/transaction_data_2011.csv",sep=",",encoding="UTF-8")
tr_2012 = pd.read_csv("./data/Transaction/transaction_data_2012.csv",sep=",",encoding="UTF-8")
tr_2013 = pd.read_csv("./data/Transaction/transaction_data_2013.csv",sep=",",encoding="UTF-8")
tr_2014 = pd.read_csv("./data/Transaction/transaction_data_2014.csv",sep=",",encoding="UTF-8")
tr_2015 = pd.read_csv("./data/Transaction/transaction_data_2015.csv",sep=",",encoding="UTF-8")
tr_2016 = pd.read_csv("./data/Transaction/transaction_data_2016.csv",sep=",",encoding="UTF-8")
tr_2017 = pd.read_csv("./data/Transaction/transaction_data_2017.csv",sep=",",encoding="UTF-8")
tr_2018 = pd.read_csv("./data/Transaction/transaction_data_2018.csv",sep=",",encoding="UTF-8")
tr_2019 = pd.read_csv("./data/Transaction/transaction_data_2019.csv",sep=",",encoding="UTF-8")
tr_2020 = pd.read_csv("./data/Transaction/transaction_data_2020.csv",sep=",",encoding="UTF-8")
□ The information provided by this service has no legal force and should be used for reference only. □ Reported transactions are added or cancelled in real time, so the number and content of published records may differ depending on when the data is retrieved. □ The data is keyed on the contract date (e.g. a contract signed in July but reported in August is counted as a July transaction). □ Figures may be distorted when used for statistics, so treat them as reference material only; for external publication, use the official statistics compiled by report date.
- For questions about the MOLIT actual-transaction-price disclosure system, contact the call center at 1588-0149. □ Search conditions — contract dates: 20110101 ~ 20211031, transaction type: apartment (sale), address type: lot-number address, province: Seoul, district: all, neighborhood: all, area: all, price: all.
(3) Data Preprocessing
#Concatenate the 2011-2020 apartment transaction data
tr_concat = pd.concat([tr_2011,tr_2012,tr_2013,tr_2014,tr_2015,tr_2016,tr_2017,tr_2018,tr_2019,tr_2020], ignore_index=True)
tr_concat.columns
Index(['시군구', '번지', '본번', '부번', '단지명', '전용면적(㎡)', '계약년월', '계약일', '거래금액(만원)',
'층', '건축년도', '도로명', '해제사유발생일', '거래유형', '중개사소재지'],
dtype='object')
#Build a proper date format
year = tr_concat["계약년월"].astype(str).str[:4]
month = tr_concat["계약년월"].astype(str).str[4:6]
# Left-pad single-digit days with a zero so every date has the same YYYY-MM-DD shape
day = tr_concat["계약일"].astype(str).str.zfill(2)
year_month = year.str.cat(month,sep="-")
year_month_day = year_month.str.cat(day,sep="-")
year_month_day
0 2011-07-09
1 2011-07-28
2 2011-01-19
3 2011-09-02
4 2011-12-17
...
826104 2020-08-07
826105 2020-07-10
826106 2020-12-03
826107 2020-09-28
826108 2020-09-28
Name: 계약년월, Length: 826109, dtype: object
from datetime import datetime
format = "%Y-%m-%d"
dt_datetime=[]
for i in range(len(year_month_day)):
    dt_datetime.append(datetime.strptime(year_month_day[i], format))
dt_datetime
[datetime.datetime(2011, 7, 9, 0, 0),
 datetime.datetime(2011, 7, 28, 0, 0),
 datetime.datetime(2011, 1, 19, 0, 0),
 datetime.datetime(2011, 9, 2, 0, 0),
 datetime.datetime(2011, 12, 17, 0, 0),
 ...]
(output truncated; the full repr lists all 826,109 parsed contract dates)
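As an aside, the per-row strptime loop above can be replaced with a single vectorized call; a minimal sketch (using a new name, dt_parsed, so it does not interfere with the variables used below):
# Sketch: let pandas parse the whole Series at once instead of looping row by row.
dt_parsed = pd.to_datetime(year_month_day, format="%Y-%m-%d")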
year_month_day = pd.DataFrame(dt_datetime, columns=["계약년월일"])
year.name = "년도"
tr_concat = pd.concat([tr_concat, year_month_day, year], axis=1)
#Keep only the columns needed for the analysis
tr_data = tr_concat[["계약년월일","시군구","단지명","전용면적(㎡)","거래금액(만원)","년도"]].copy()  # .copy() so later column assignments don't trigger SettingWithCopyWarning
tr_data["area"] = tr_data.시군구.str.split(" ").str[1]
tr_data["area"]
0 강남구
1 강남구
2 강남구
3 강남구
4 강남구
...
826104 중랑구
826105 중랑구
826106 중랑구
826107 중랑구
826108 중랑구
Name: area, Length: 826109, dtype: object
type(tr_data["거래금액(만원)"])
pandas.core.series.Series
#Convert the price column from string to numeric
tr_data["거래금액(만원)"] = tr_data["거래금액(만원)"].str.replace(',','')
tr_data["거래금액(만원)"] = tr_data["거래금액(만원)"].apply(pd.to_numeric)
#Derive the unit price: 거래금액(만원) divided by 전용면적(㎡) (note: a per-㎡ figure, although the column is named 평당가)
tr_data["평당가"] = tr_data["거래금액(만원)"] / tr_data["전용면적(㎡)"]
tr_data
계약년월일 시군구 단지명 전용면적(㎡) 거래금액(만원) 년도 \
0 2011-07-09 서울특별시 강남구 개포동 개포2차현대아파트(220) 77.75 64000 2011
1 2011-07-28 서울특별시 강남구 개포동 개포2차현대아파트(220) 77.75 65500 2011
2 2011-01-19 서울특별시 강남구 개포동 개포6차우성아파트1동~8동 67.28 70500 2011
3 2011-09-02 서울특별시 강남구 개포동 개포6차우성아파트1동~8동 79.97 85000 2011
4 2011-12-17 서울특별시 강남구 개포동 개포6차우성아파트1동~8동 67.28 68000 2011
... ... ... ... ... ... ...
826104 2020-08-07 서울특별시 중랑구 중화동 한영(104) 67.57 26000 2020
826105 2020-07-10 서울특별시 중랑구 중화동 현대휴앤미 95.94 44000 2020
826106 2020-12-03 서울특별시 중랑구 중화동 현대휴앤미 100.17 54800 2020
826107 2020-09-28 서울특별시 중랑구 중화동 현대휴앤미(102동) 77.71 40000 2020
826108 2020-09-28 서울특별시 중랑구 중화동 현대휴앤미(102동) 77.71 40000 2020
area 평당가
0 강남구 823.151125
1 강남구 842.443730
2 강남구 1047.859691
3 강남구 1062.898587
4 강남구 1010.701546
... ... ...
826104 중랑구 384.786148
826105 중랑구 458.619971
826106 중랑구 547.069981
826107 중랑구 514.734268
826108 중랑구 514.734268
[826109 rows x 8 columns]
#Mean unit price per district per contract date
tr_data_series = tr_data.groupby(['계약년월일','area'])['평당가'].agg(**{"일자별평당가":"mean"}).reset_index()
# Preprocessed unit price by district and by date
tr_data_series
계약년월일 area 일자별평당가
0 2011-01-01 강남구 663.166621
1 2011-01-01 강동구 637.018752
2 2011-01-01 강북구 421.807269
3 2011-01-01 강서구 345.009218
4 2011-01-01 관악구 565.387404
... ... ... ...
85226 2020-12-31 용산구 2056.587814
85227 2020-12-31 은평구 933.400030
85228 2020-12-31 종로구 1791.733836
85229 2020-12-31 중구 2062.139194
85230 2020-12-31 중랑구 928.100564
[85231 rows x 3 columns]
year = tr_data_series["계약년월일"].astype(str).str[:4]
tr_data_series["year"] = year
#Merge the macro data onto the transaction series using the year key
korea_gdp_rate["year"] = korea_gdp_rate["year"].astype(str)
korea_personal_loan["year"] = korea_personal_loan["year"].astype(str)
korea_interest_rate["year"] = korea_interest_rate["year"].astype(str)
korea_loan_for_house["year"] = korea_loan_for_house["year"].astype(str)
korea_personal_GDP["year"] = korea_personal_GDP["year"].astype(str)
seoul_gdp_series["year"] = seoul_gdp_series["year"].astype(str)
housing_count_yearly["year"] = housing_count_yearly["year"].astype(str)
seoul_population["year"] = seoul_population["year"].astype(str)
constructure_confirm["year"] = constructure_confirm["year"].astype(str)
merge1 = pd.merge(left=tr_data_series, right=korea_gdp_rate, how="left", on="year")
merge2 = pd.merge(left=merge1, right=korea_personal_loan, how="left", on="year")
merge3 = pd.merge(left=merge2, right=korea_interest_rate, how="left", on="year")
merge4 = pd.merge(left=merge3, right=korea_loan_for_house, how="left", on="year")
merge5 = pd.merge(left=merge4, right=korea_personal_GDP, how="left", on="year")
merge6 = pd.merge(left=merge5, right=seoul_gdp_series, how="left", on="year")
merge7 = pd.merge(left=merge6, right=housing_count_yearly, how="left", on="year")
merge8 = pd.merge(left=merge7, right=seoul_population, how="left", on="year")
merge9 = pd.merge(left=merge8, right=constructure_confirm, how="left", on="year")
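The nine chained merges can also be written as a single fold over a list of the macro frames; a compact sketch with the same left-join-on-year semantics (the result name merge9_alt is illustrative):
# Sketch: left-join every macro frame onto the daily transaction series by 'year'.
from functools import reduce

macro_frames = [korea_gdp_rate, korea_personal_loan, korea_interest_rate,
                korea_loan_for_house, korea_personal_GDP, seoul_gdp_series,
                housing_count_yearly, seoul_population, constructure_confirm]
merge9_alt = reduce(lambda left, right: pd.merge(left, right, how="left", on="year"),
                    macro_frames, tr_data_series)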
#SET INDEX (use the contract date as the index)
all_df_prepro = tr_data_series.set_index("계약년월일")
all_df = merge9.set_index("계약년월일")
#DataSet 1-1
all_df
area 일자별평당가 year KOREA_GDP GDP_Growth_rate loan 기준금리 \
계약년월일
2011-01-01 강남구 663.166621 2011 1388937 4 916 3
2011-01-01 강동구 637.018752 2011 1388937 4 916 3
2011-01-01 강북구 421.807269 2011 1388937 4 916 3
2011-01-01 강서구 345.009218 2011 1388937 4 916 3
2011-01-01 관악구 565.387404 2011 1388937 4 916 3
... ... ... ... ... ... ... ...
2020-12-31 용산구 2056.587814 2020 1933152 (1) 1726 1
2020-12-31 은평구 933.400030 2020 1933152 (1) 1726 1
2020-12-31 종로구 1791.733836 2020 1933152 (1) 1726 1
2020-12-31 중구 2062.139194 2020 1933152 (1) 1726 1
2020-12-31 중랑구 928.100564 2020 1933152 (1) 1726 1
housing_loan gdp_per_person gdp Viliages Buildings \
계약년월일
2011-01-01 2511440 27901 326415107 4081 19022
2011-01-01 2511440 27901 326415107 4081 19022
2011-01-01 2511440 27901 326415107 4081 19022
2011-01-01 2511440 27901 326415107 4081 19022
2011-01-01 2511440 27901 326415107 4081 19022
... ... ... ... ... ...
2020-12-31 5914223 37568 435102998 4134 20179
2020-12-31 5914223 37568 435102998 4134 20179
2020-12-31 5914223 37568 435102998 4134 20179
2020-12-31 5914223 37568 435102998 4134 20179
2020-12-31 5914223 37568 435102998 4134 20179
House 세대 인구 합계 수도권 서울
계약년월일
2011-01-01 1459112 4192752 10528774 549594 272156 88060
2011-01-01 1459112 4192752 10528774 549594 272156 88060
2011-01-01 1459112 4192752 10528774 549594 272156 88060
2011-01-01 1459112 4192752 10528774 549594 272156 88060
2011-01-01 1459112 4192752 10528774 549594 272156 88060
... ... ... ... ... ... ...
2020-12-31 1544424 4417954 9911088 457514 252301 58181
2020-12-31 1544424 4417954 9911088 457514 252301 58181
2020-12-31 1544424 4417954 9911088 457514 252301 58181
2020-12-31 1544424 4417954 9911088 457514 252301 58181
2020-12-31 1544424 4417954 9911088 457514 252301 58181
[85231 rows x 18 columns]
(4) Examining the Time Series of Seoul Apartment Unit Prices
seoul_int = all_df[["일자별평당가","KOREA_GDP","GDP_Growth_rate","loan","기준금리","housing_loan","gdp_per_person","gdp",'Viliages',"Buildings",'House','세대',"인구","수도권"]].copy()
# "(1)" is the source table's bracket notation for a negative value (2020 real growth was roughly -1%)
seoul_int.loc[(seoul_int.GDP_Growth_rate == "(1)"),"GDP_Growth_rate"] = -1
seoul_int["KOREA_GDP"] = seoul_int["KOREA_GDP"].apply(pd.to_numeric)
seoul_int["GDP_Growth_rate"] = seoul_int["GDP_Growth_rate"].apply(pd.to_numeric)
seoul_int["loan"] = seoul_int["loan"].apply(pd.to_numeric)
seoul_int["기준금리"] = seoul_int["기준금리"].apply(pd.to_numeric)
seoul_int["housing_loan"] = seoul_int["housing_loan"].apply(pd.to_numeric)
seoul_int["gdp_per_person"] = seoul_int["gdp_per_person"].apply(pd.to_numeric)
seoul_int["Viliages"] = seoul_int["Viliages"].apply(pd.to_numeric)
seoul_int["Buildings"] = seoul_int["Buildings"].apply(pd.to_numeric)
seoul_int["House"] = seoul_int["House"].apply(pd.to_numeric)
from sklearn.preprocessing import MinMaxScaler
features_name = seoul_int.columns
scaler = MinMaxScaler()
scaler.fit(seoul_int)
seoul_scaled = scaler.transform(seoul_int)
seoul_df_scaled = pd.DataFrame(data=seoul_scaled, columns=features_name, index=seoul_int.index)
seoul_df_scaled["일자별평당가"].plot(title="서울시 전체 일자별 평당가 추이")
<AxesSubplot:title={'center':'서울시 전체 일자별 평당가 추이'}, xlabel='계약년월일'>
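Since the index is the contract date, the noisy daily series can also be smoothed before plotting; a sketch that resamples the (scaled) series to monthly means, assuming the datetime index set earlier:
# Sketch: monthly mean of the daily price-per-area series for a smoother trend line.
monthly = seoul_df_scaled["일자별평당가"].resample("M").mean()
monthly.plot(title="서울시 월별 평당가 추이 (월평균)")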
(5) Visualizing Seoul Apartment Transaction Prices on a Map
import folium
import json
import re
tr_data_series
계약년월일 area 일자별평당가 year
0 2011-01-01 강남구 663.166621 2011
1 2011-01-01 강동구 637.018752 2011
2 2011-01-01 강북구 421.807269 2011
3 2011-01-01 강서구 345.009218 2011
4 2011-01-01 관악구 565.387404 2011
... ... ... ... ...
85226 2020-12-31 용산구 2056.587814 2020
85227 2020-12-31 은평구 933.400030 2020
85228 2020-12-31 종로구 1791.733836 2020
85229 2020-12-31 중구 2062.139194 2020
85230 2020-12-31 중랑구 928.100564 2020
[85231 rows x 4 columns]
data_for_map = tr_data_series.groupby(['year','area'])['일자별평당가'].agg(**{"연도별평당가":"mean"}).reset_index()
# Check the average price per pyeong by year and district
data_for_map_2011 = data_for_map[data_for_map["year"]=="2011"]
geo_json = "https://raw.githubusercontent.com/southkorea/seoul-maps/master/kostat/2013/json/seoul_municipalities_geo_simple.json"
df2 = data_for_map_2011[["area","연도별평당가"]]
df2.columns=["name","values"]
df2=df2.sort_values(by="name")
df2["name"] = df2["name"].apply(lambda x : re.compile('[가-힣]+').findall(x)[0])
n = folium.Map(
location=[37.566345,126.977893],
tiles ="Stamen Terrain"
)
folium.Choropleth(
geo_data=geo_json,
name='choropleth',
data=df2,
columns=['name','values'],
key_on='feature.properties.name',
fill_color='YlGn',
fill_opacity=0.7,
line_opacity=0.2
).add_to(n)
<folium.features.Choropleth at 0x13d31d9fa88>
n
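The same choropleth can be drawn for every year, not just 2011. Below is a minimal sketch (not part of the original notebook) that reuses data_for_map, geo_json, and re from above and saves one HTML map per year; the output file names are hypothetical.
# Sketch: one choropleth per year, saved to HTML
for yr in data_for_map["year"].unique():
    df_year = data_for_map[data_for_map["year"] == yr][["area", "연도별평당가"]]
    df_year.columns = ["name", "values"]
    df_year["name"] = df_year["name"].apply(lambda x: re.compile('[가-힣]+').findall(x)[0])
    m = folium.Map(location=[37.566345, 126.977893], tiles="Stamen Terrain")
    folium.Choropleth(
        geo_data=geo_json, data=df_year, columns=["name", "values"],
        key_on="feature.properties.name", fill_color="YlGn",
        fill_opacity=0.7, line_opacity=0.2
    ).add_to(m)
    m.save("seoul_price_map_{}.html".format(yr))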
(6) Modeling Attempt 1: Time-Series LSTM
# Attempt time-series modeling restricted to Gangnam-gu
gangnam_df = all_df[all_df["area"]=="강남구"]
gangnam_df.columns
Index(['area', '일자별평당가', 'year', 'KOREA_GDP', 'GDP_Growth_rate', 'loan',
'기준금리', 'housing_loan', 'gdp_per_person', 'gdp', 'Viliages',
'Buildings', 'House', '세대', '인구', '합계', '수도권', '서울'],
dtype='object')
split_date = pd.Timestamp('2018-01-01')
train = gangnam_df.loc[:split_date, ['일자별평당가']]
test = gangnam_df.loc[split_date:, ['일자별평당가']]
ax = train.plot()
test.plot(ax=ax)
plt.legend(['train','test'])
<matplotlib.legend.Legend at 0x13d28a19808>
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(train)
train_sc = scaler.transform(train)
# reuse the scaler fitted on the training set so the test data does not leak into the scaling
test_sc = scaler.transform(test)
train_sc_df = pd.DataFrame(train_sc, columns=["일자별평당가"], index=train.index)
test_sc_df = pd.DataFrame(test_sc, columns=["일자별평당가"], index=test.index)
train_sc_df.head()
일자별평당가
계약년월일
2011-01-01 -2.405885
2011-01-03 -1.610236
2011-01-04 -1.410823
2011-01-05 -0.270497
2011-01-06 -0.258758
# Build sliding-window (lag) features
for s in range(1, 13):
train_sc_df['shift_{}'.format(s)] = train_sc_df['일자별평당가'].shift(s)
test_sc_df['shift_{}'.format(s)] = test_sc_df['일자별평당가'].shift(s)
train_sc_df.head(13)
일자별평당가 shift_1 shift_2 shift_3 shift_4 shift_5 \
계약년월일
2011-01-01 -2.405885 NaN NaN NaN NaN NaN
2011-01-03 -1.610236 -2.405885 NaN NaN NaN NaN
2011-01-04 -1.410823 -1.610236 -2.405885 NaN NaN NaN
2011-01-05 -0.270497 -1.410823 -1.610236 -2.405885 NaN NaN
2011-01-06 -0.258758 -0.270497 -1.410823 -1.610236 -2.405885 NaN
2011-01-07 -0.346819 -0.258758 -0.270497 -1.410823 -1.610236 -2.405885
2011-01-08 -0.805440 -0.346819 -0.258758 -0.270497 -1.410823 -1.610236
2011-01-09 -0.447909 -0.805440 -0.346819 -0.258758 -0.270497 -1.410823
2011-01-10 -0.275092 -0.447909 -0.805440 -0.346819 -0.258758 -0.270497
2011-01-11 -0.406449 -0.275092 -0.447909 -0.805440 -0.346819 -0.258758
2011-01-12 -0.143209 -0.406449 -0.275092 -0.447909 -0.805440 -0.346819
2011-01-13 -0.963023 -0.143209 -0.406449 -0.275092 -0.447909 -0.805440
2011-01-14 -0.844905 -0.963023 -0.143209 -0.406449 -0.275092 -0.447909
shift_6 shift_7 shift_8 shift_9 shift_10 shift_11 \
계약년월일
2011-01-01 NaN NaN NaN NaN NaN NaN
2011-01-03 NaN NaN NaN NaN NaN NaN
2011-01-04 NaN NaN NaN NaN NaN NaN
2011-01-05 NaN NaN NaN NaN NaN NaN
2011-01-06 NaN NaN NaN NaN NaN NaN
2011-01-07 NaN NaN NaN NaN NaN NaN
2011-01-08 -2.405885 NaN NaN NaN NaN NaN
2011-01-09 -1.610236 -2.405885 NaN NaN NaN NaN
2011-01-10 -1.410823 -1.610236 -2.405885 NaN NaN NaN
2011-01-11 -0.270497 -1.410823 -1.610236 -2.405885 NaN NaN
2011-01-12 -0.258758 -0.270497 -1.410823 -1.610236 -2.405885 NaN
2011-01-13 -0.346819 -0.258758 -0.270497 -1.410823 -1.610236 -2.405885
2011-01-14 -0.805440 -0.346819 -0.258758 -0.270497 -1.410823 -1.610236
shift_12
계약년월일
2011-01-01 NaN
2011-01-03 NaN
2011-01-04 NaN
2011-01-05 NaN
2011-01-06 NaN
2011-01-07 NaN
2011-01-08 NaN
2011-01-09 NaN
2011-01-10 NaN
2011-01-11 NaN
2011-01-12 NaN
2011-01-13 NaN
2011-01-14 -2.405885
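For reference, the twelve lag columns above can also be built without a Python loop. The sketch below is an equivalent construction using NumPy's sliding_window_view (assumes NumPy >= 1.20 and the train_sc array from above); it is an alternative, not what the notebook originally ran.
# Sketch: 12-step windows via NumPy instead of repeated .shift()
from numpy.lib.stride_tricks import sliding_window_view

vals = train_sc.ravel()                  # standardized 일자별평당가 as a 1-D array
windows = sliding_window_view(vals, 13)  # each row: 12 lags followed by the current value
X_alt = windows[:, :12][:, ::-1]         # shift_1 .. shift_12 (most recent lag first)
y_alt = windows[:, 12]                   # target value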
#dropNA Train set
X_train = train_sc_df.dropna().drop('일자별평당가', axis=1)
y_train = train_sc_df.dropna()[['일자별평당가']]
#dropNA Test set
X_test = test_sc_df.dropna().drop('일자별평당가', axis=1)
y_test = test_sc_df.dropna()[['일자별평당가']]
X_train.head()
shift_1 shift_2 shift_3 shift_4 shift_5 shift_6 \
계약년월일
2011-01-14 -0.963023 -0.143209 -0.406449 -0.275092 -0.447909 -0.805440
2011-01-15 -0.844905 -0.963023 -0.143209 -0.406449 -0.275092 -0.447909
2011-01-16 -0.530647 -0.844905 -0.963023 -0.143209 -0.406449 -0.275092
2011-01-17 -0.336816 -0.530647 -0.844905 -0.963023 -0.143209 -0.406449
2011-01-18 -1.084970 -0.336816 -0.530647 -0.844905 -0.963023 -0.143209
shift_7 shift_8 shift_9 shift_10 shift_11 shift_12
계약년월일
2011-01-14 -0.346819 -0.258758 -0.270497 -1.410823 -1.610236 -2.405885
2011-01-15 -0.805440 -0.346819 -0.258758 -0.270497 -1.410823 -1.610236
2011-01-16 -0.447909 -0.805440 -0.346819 -0.258758 -0.270497 -1.410823
2011-01-17 -0.275092 -0.447909 -0.805440 -0.346819 -0.258758 -0.270497
2011-01-18 -0.406449 -0.275092 -0.447909 -0.805440 -0.346819 -0.258758
print(type(X_train))
X_train = X_train.values
print(type(X_test))
X_test = X_test.values
y_train = y_train.values
y_test = y_test.values
print(X_train.shape)
print(y_train.shape)
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
(2464, 12)
(2464, 1)
X_train_t = X_train.reshape(X_train.shape[0], 12, 1)
X_test_t = X_test.reshape(X_test.shape[0], 12, 1)
print("최종 DATA SET")
print(X_train_t.shape)
print(X_train_t)
print(y_train)
최종 DATA SET
(2464, 12, 1)
[[[-0.96302271]
[-0.14320865]
[-0.4064492 ]
...
[-1.41082274]
[-1.61023636]
[-2.40588522]]
[[-0.84490503]
[-0.96302271]
[-0.14320865]
...
[-0.27049713]
[-1.41082274]
[-1.61023636]]
[[-0.53064669]
[-0.84490503]
[-0.96302271]
...
[-0.25875777]
[-0.27049713]
[-1.41082274]]
...
[[ 2.24085583]
[ 2.62587908]
[ 2.89125758]
...
[ 2.98609925]
[ 2.24576779]
[ 2.08951722]]
[[ 1.816919 ]
[ 2.24085583]
[ 2.62587908]
...
[ 1.68106014]
[ 2.98609925]
[ 2.24576779]]
[[ 2.74317732]
[ 1.816919 ]
[ 2.24085583]
...
[ 1.66028695]
[ 1.68106014]
[ 2.98609925]]]
[[-0.84490503]
[-0.53064669]
[-0.33681626]
...
[ 1.816919 ]
[ 2.74317732]
[ 3.6066599 ]]
# Time-series LSTM modeling
from keras.layers import LSTM
from keras.models import Sequential
from keras.layers import Dense
import keras.backend as K
from keras.callbacks import EarlyStopping
K.clear_session()
model = Sequential() # Sequeatial Model
model.add(LSTM(20, input_shape=(12, 1))) # (timestep, feature)
model.add(Dense(1)) # output = 1
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 20) 1760
_________________________________________________________________
dense_1 (Dense) (None, 1) 21
=================================================================
Total params: 1,781
Trainable params: 1,781
Non-trainable params: 0
_________________________________________________________________
Using TensorFlow backend.
early_stop = EarlyStopping(monitor='loss', patience=1, verbose=1)
model.fit(X_train_t, y_train, epochs=100,
batch_size=30, verbose=1, callbacks=[early_stop])
Epoch 1/100
2464/2464 [==============================] - 1s 431us/step - loss: 0.5652
Epoch 2/100
2464/2464 [==============================] - 0s 178us/step - loss: 0.4182
Epoch 3/100
2464/2464 [==============================] - 0s 198us/step - loss: 0.4152
Epoch 4/100
2464/2464 [==============================] - 0s 184us/step - loss: 0.4152
Epoch 5/100
2464/2464 [==============================] - 0s 199us/step - loss: 0.4135
Epoch 6/100
2464/2464 [==============================] - 0s 184us/step - loss: 0.4135
Epoch 7/100
2464/2464 [==============================] - 0s 183us/step - loss: 0.4126
Epoch 8/100
2464/2464 [==============================] - 0s 192us/step - loss: 0.4121
Epoch 9/100
2464/2464 [==============================] - 0s 176us/step - loss: 0.4111
Epoch 10/100
2464/2464 [==============================] - 0s 182us/step - loss: 0.4093
Epoch 11/100
2464/2464 [==============================] - 0s 175us/step - loss: 0.4088
Epoch 12/100
2464/2464 [==============================] - 0s 170us/step - loss: 0.4084
Epoch 13/100
2464/2464 [==============================] - 0s 177us/step - loss: 0.4085
Epoch 00013: early stopping
<keras.callbacks.callbacks.History at 0x13d2babe5c8>
score = model.evaluate(X_test_t, y_test, batch_size=30)
1025/1025 [==============================] - 0s 122us/step
print(score)
# The test-set MSE came out at around 0.80
0.8044285868726125
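Since the LSTM was trained on standardized values, its predictions can be mapped back to the original price scale with the StandardScaler fitted on the training data. A short sketch (not in the original notebook), assuming scaler and model as defined above:
# Sketch: bring the standardized LSTM predictions back to the original 평당가 scale
y_pred_sc = model.predict(X_test_t, batch_size=30)
y_pred_won = scaler.inverse_transform(y_pred_sc)   # scaler fitted on the train split above
y_test_won = scaler.inverse_transform(y_test)
print(y_pred_won[:5].ravel(), y_test_won[:5].ravel())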
test_sc_df.describe()
일자별평당가 shift_1 shift_2 shift_3 shift_4 \
count 1.037000e+03 1036.000000 1035.000000 1034.000000 1033.000000
mean -2.740763e-17 -0.001252 -0.001312 -0.001604 -0.003016
std 1.000483e+00 1.000152 1.000634 1.001074 1.000528
min -3.474334e+00 -3.474334 -3.474334 -3.474334 -3.474334
25% -5.708210e-01 -0.571740 -0.572659 -0.573578 -0.574497
50% 2.178348e-03 0.000132 -0.001914 -0.003474 -0.005035
75% 5.915490e-01 0.588946 0.589814 0.590681 0.588078
max 5.024884e+00 5.024884 5.024884 5.024884 5.024884
shift_5 shift_6 shift_7 shift_8 shift_9 \
count 1032.000000 1031.000000 1030.000000 1029.000000 1028.000000
mean -0.004648 -0.004920 -0.005242 -0.006563 -0.007535
std 0.999636 1.000083 1.000516 1.000103 1.000104
min -3.474334 -3.474334 -3.474334 -3.474334 -3.474334
25% -0.574833 -0.575170 -0.575507 -0.575843 -0.576000
50% -0.005327 -0.005620 -0.005881 -0.006142 -0.007253
75% 0.587501 0.587694 0.587886 0.587309 0.580915
max 5.024884 5.024884 5.024884 5.024884 5.024884
shift_10 shift_11 shift_12
count 1027.000000 1026.000000 1025.000000
mean -0.007700 -0.008266 -0.009181
std 1.000577 1.000901 1.000960
min -3.474334 -3.474334 -3.474334
25% -0.576157 -0.576313 -0.576470
50% -0.008364 -0.008375 -0.008386
75% 0.583046 0.585178 0.578784
max 5.024884 5.024884 5.024884
y_pred = model.predict(X_test_t, batch_size=32)
plt.scatter(y_test, y_pred)
plt.xlabel("실제 일자별 평당가: $Y_i$")
plt.ylabel("예측한 일자별 평당가: $\hat{Y}_i$")
plt.title("Prices vs Predicted price Index: $Y_i$ vs $\hat{Y}_i$")
Text(0.5, 1.0, 'Prices vs Predicted price Index: $Y_i$ vs $\\hat{Y}_i$')
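Negative standardized values on the axes can trigger "Glyph 8722 (minus) missing" warnings, because the Korean font configured at the top of the notebook lacks the Unicode minus glyph; Hangul inside mathtext labels ('$...$') causes similar warnings. A common workaround, as a small sketch:
# Sketch: render a plain ASCII hyphen for negative ticks so the Korean font
# does not need the Unicode minus glyph (U+2212)
import matplotlib.pyplot as plt
plt.rcParams['axes.unicode_minus'] = False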
(7) Multiple Regression Modeling, Attempt 1 with TensorFlow 2.0
# Restrict the analysis to Gangnam-gu
gangnam_df = all_df[all_df["area"]=="강남구"]
# copy() so the numeric conversions below do not trigger SettingWithCopyWarning
gangnam_int = gangnam_df[["일자별평당가","KOREA_GDP","GDP_Growth_rate","loan","기준금리","housing_loan","gdp_per_person","gdp",'Viliages',"Buildings",'House','세대',"인구","수도권"]].copy()
gangnam_int.loc[(gangnam_int.GDP_Growth_rate == "(1)"),"GDP_Growth_rate"] = -1
gangnam_int["KOREA_GDP"] = gangnam_int["KOREA_GDP"].apply(pd.to_numeric)
gangnam_int["GDP_Growth_rate"] = gangnam_int["GDP_Growth_rate"].apply(pd.to_numeric)
gangnam_int["loan"] = gangnam_int["loan"].apply(pd.to_numeric)
gangnam_int["기준금리"] = gangnam_int["기준금리"].apply(pd.to_numeric)
gangnam_int["housing_loan"] = gangnam_int["housing_loan"].apply(pd.to_numeric)
gangnam_int["gdp_per_person"] = gangnam_int["gdp_per_person"].apply(pd.to_numeric)
gangnam_int["Viliages"] = gangnam_int["Viliages"].apply(pd.to_numeric)
gangnam_int["Buildings"] = gangnam_int["Buildings"].apply(pd.to_numeric)
gangnam_int["House"] = gangnam_int["House"].apply(pd.to_numeric)
features_name = gangnam_int.columns
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(gangnam_int)
gangnam_scaled = scaler.transform(gangnam_int)
gangnam_df_scaled = pd.DataFrame(data=gangnam_scaled, columns=features_name)
train_dataset = gangnam_df_scaled.sample(frac=0.8, random_state=0)
test_dataset = gangnam_df_scaled.drop(train_dataset.index)
sns.pairplot(train_dataset[["일자별평당가","KOREA_GDP","기준금리","인구","housing_loan"]])
<seaborn.axisgrid.PairGrid at 0x13d2b133088>
Checking correlations between the data columns
corr = gangnam_df_scaled.corr(method='pearson')
corr_df = corr.apply(lambda x : round(x,2))
print(corr_df.iloc[0,:])
일자별평당가 1.00
KOREA_GDP 0.78
GDP_Growth_rate -0.59
loan 0.82
기준금리 -0.65
housing_loan 0.85
gdp_per_person 0.79
gdp 0.82
Viliages 0.01
Buildings 0.53
House 0.42
세대 0.82
인구 -0.82
수도권 -0.02
Name: 일자별평당가, dtype: float64
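The same correlations are easier to scan as a heatmap. A small sketch (not in the original notebook) using the corr frame computed above:
# Sketch: visualize the Pearson correlations as a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation between features (Gangnam-gu)")
plt.show()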
# Split the dataset into features and labels
train_labels = train_dataset.pop("일자별평당가")
test_labels = test_dataset.pop("일자별평당가")
#순차형 모델 설계
def build_model():
model = keras.Sequential([
layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
layers.Dense(64, activation='relu'),
layers.Dense(64, activation='relu'),
layers.Dense(1)
])
optimizer = tf.keras.optimizers.RMSprop(0.001)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae', 'mse'])
return model
model = build_model()
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 64) 896
_________________________________________________________________
dense_1 (Dense) (None, 64) 4160
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dense_3 (Dense) (None, 1) 65
=================================================================
Total params: 9,281
Trainable params: 9,281
Non-trainable params: 0
_________________________________________________________________
example_batch = train_dataset[:10]
example_result = model.predict(example_batch)
example_result
array([[ 0.02163886],
[ 0.5113964 ],
[ 0.48028725],
[-0.36780232],
[ 0.22917318],
[ 0.22917318],
[-0.09433231],
[-0.36780232],
[ 0.38817587],
[ 0.38817587]], dtype=float32)
class PrintDot(keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs):
if epoch % 100 == 0: print('')
print('.', end='')
EPOCHS = 500
history = model.fit(
train_dataset, train_labels,
epochs=EPOCHS, validation_split = 0.2, verbose=0,
callbacks=[PrintDot()])
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
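As in the LSTM attempt, training could also stop once the validation loss stops improving instead of always running the full 500 epochs. A hedged sketch of the same fit with tf.keras's EarlyStopping (the patience value of 10 is an assumption):
# Sketch: same fit, but stop when val_loss has not improved for 10 epochs
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
history = model.fit(
    train_dataset, train_labels,
    epochs=EPOCHS, validation_split=0.2, verbose=0,
    callbacks=[early_stop, PrintDot()])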
def plot_history(history):
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
plt.figure(figsize=(8,12))
plt.subplot(2,1,1)
plt.xlabel('Epoch')
plt.ylabel('Mean Abs Error [일자별평당가]')
plt.plot(hist['epoch'], hist['mae'],
label='Train Error')
plt.plot(hist['epoch'], hist['val_mae'],
label = 'Val Error')
plt.ylim([0.25,0.5])
plt.legend()
plt.subplot(2,1,2)
plt.xlabel('Epoch')
plt.ylabel('Mean Square Error [$일자별평당가^2$]')
plt.plot(hist['epoch'], hist['mse'],
label='Train Error')
plt.plot(hist['epoch'], hist['val_mse'],
label = 'Val Error')
plt.ylim([0.2,0.5])
plt.legend()
plt.show()
plot_history(history)
loss, mae, mse = model.evaluate(test_dataset, test_labels, verbose=2)
print("테스트 세트의 평균 절대 오차 : {:5.2f} 일자별평당가".format(mae))
702/702 - 0s - loss: 0.2435 - mae: 0.3191 - mse: 0.2435
테스트 세트의 평균 절대 오차 : 0.32 일자별평당가
test_predictions = model.predict(test_dataset).flatten()
plt.scatter(test_labels, test_predictions)
plt.xlabel('실제값 [일자별평당가]')
plt.ylabel('예측값 [일자별평당가]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0,plt.xlim()[1]])
plt.ylim([0,plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])
error = test_predictions - test_labels
plt.hist(error, bins = 25)
plt.xlabel("Prediction Error [일자별평당가]")
_ = plt.ylabel("Count")
(8) Multiple Regression Modeling, Attempt 2 with TensorFlow 2.0 (interpolation)
gangnam_df = all_df[all_df["area"]=="강남구"]
# Build a NaN placeholder frame matching the Gangnam date index
date_index = gangnam_df.index
df_test = pd.DataFrame({"test": np.full(len(date_index), np.nan)}, index=date_index)
# Load the macro (macroeconomic) data again
t_korea_gdp_rate = pd.read_csv("./data/Marco/01_korea_gdp_series.csv",sep=",",encoding="UTF-8")
t_korea_interest_rate = pd.read_csv("./data/Marco/02_korea_interest_rate.csv",sep=",",encoding="UTF-8")
t_korea_personal_loan = pd.read_csv("./data/Marco/03_korea_personal_loan.csv",sep=",",encoding="UTF-8")
t_korea_loan_for_house = pd.read_csv("./data/Marco/04_korea_loan_for_house.csv",sep=",",encoding="UTF-8")
t_korea_personal_GDP = pd.read_csv("./data/Marco/05_korea_personal_GDP.csv",sep=",",encoding="UTF-8")
t_seoul_gdp_series = pd.read_csv("./data/Marco/06_seoul_gdp_series.csv",sep=",",encoding="UTF-8")
t_housing_count_yearly = pd.read_csv("./data/Marco/07_housing_count_yearly.csv",sep=",",encoding="UTF-8")
t_seoul_population= pd.read_csv("./data/Marco/08_seoul_population.csv",sep=",",encoding="UTF-8")
t_constructure_confirm= pd.read_csv("./data/Marco/09_constructure_confirm.csv",sep=",",encoding="UTF-8")
# Extract the years needed (2011–2020)
t_korea_gdp_rate = t_korea_gdp_rate.loc[41:]
t_korea_interest_rate = t_korea_interest_rate.loc[12:]
t_korea_personal_loan = t_korea_personal_loan.loc[9:]
t_korea_loan_for_house = t_korea_loan_for_house.loc[:9]
t_korea_personal_GDP = t_korea_personal_GDP[1:]
t_seoul_gdp_series
t_housing_count_yearly
t_seoul_population
t_constructure_confirm
year 합계 수도권 서울
0 2011 549594 272156 88060
1 2012 586884 269290 86123
2 2013 440116 192610 77621
3 2014 515251 241889 65249
4 2015 765328 408773 101235
5 2016 726048 341162 74739
6 2017 653441 321402 113131
7 2018 554136 280097 65751
8 2019 487975 272226 62272
9 2020 457514 252301 58181
# Assign a representative date index to each yearly data point
date_time_index = [datetime(2011,1,1),datetime(2012,12,31),datetime(2013,12,31),datetime(2014,12,31),datetime(2015,12,31),datetime(2016,12,31),datetime(2017,12,31),datetime(2018,12,31),datetime(2019,12,31),datetime(2020,12,31)]
t_korea_gdp_rate["date_time_index"] = date_time_index
t_korea_gdp_rate = t_korea_gdp_rate.set_index("date_time_index")
t_korea_interest_rate["date_time_index"] = date_time_index
t_korea_interest_rate = t_korea_interest_rate.set_index("date_time_index")
t_korea_personal_loan["date_time_index"] = date_time_index
t_korea_personal_loan = t_korea_personal_loan.set_index("date_time_index")
t_korea_loan_for_house["date_time_index"] = date_time_index
t_korea_loan_for_house = t_korea_loan_for_house.set_index("date_time_index")
t_korea_personal_GDP["date_time_index"] = date_time_index
t_korea_personal_GDP = t_korea_personal_GDP.set_index("date_time_index")
t_seoul_gdp_series["date_time_index"] = date_time_index
t_seoul_gdp_series = t_seoul_gdp_series.set_index("date_time_index")
t_housing_count_yearly["date_time_index"] = date_time_index
t_housing_count_yearly = t_housing_count_yearly.set_index("date_time_index")
t_seoul_population["date_time_index"] = date_time_index
t_seoul_population = t_seoul_population.set_index("date_time_index")
t_constructure_confirm["date_time_index"] = date_time_index
t_constructure_confirm = t_constructure_confirm.set_index("date_time_index")
# Check that the date index was applied
t_korea_personal_loan
year loan
date_time_index
2011-01-01 2011 916
2012-12-31 2012 964
2013-12-31 2013 1019
2014-12-31 2014 1085
2015-12-31 2015 1203
2016-12-31 2016 1343
2017-12-31 2017 1451
2018-12-31 2018 1537
2019-12-31 2019 1600
2020-12-31 2020 1726
t_korea_gdp_rate.loc[(t_korea_gdp_rate.GDP_Growth_rate == "(1)"),"GDP_Growth_rate"] = -1
t_korea_gdp_rate["KOREA_GDP"] = t_korea_gdp_rate["KOREA_GDP"].apply(pd.to_numeric)
t_korea_gdp_rate["GDP_Growth_rate"] = t_korea_gdp_rate["GDP_Growth_rate"].apply(pd.to_numeric)
t_korea_personal_loan["loan"] = t_korea_personal_loan["loan"].apply(pd.to_numeric)
t_korea_interest_rate["기준금리"] = t_korea_interest_rate["기준금리"].apply(pd.to_numeric)
t_korea_loan_for_house["housing_loan"] = t_korea_loan_for_house["housing_loan"].apply(pd.to_numeric)
t_korea_personal_GDP["gdp_per_person"] = t_korea_personal_GDP["gdp_per_person"].apply(pd.to_numeric)
t_housing_count_yearly["Viliages"] = t_housing_count_yearly["Viliages"].apply(pd.to_numeric)
t_housing_count_yearly["Buildings"] = t_housing_count_yearly["Buildings"].apply(pd.to_numeric)
t_housing_count_yearly["House"] = t_housing_count_yearly["House"].apply(pd.to_numeric)
t_constructure_confirm["수도권"] = t_constructure_confirm["수도권"].apply(pd.to_numeric)
# Use interpolation to turn the yearly macro series into continuous daily series
interpol1 = pd.merge(left=df_test, right=t_korea_gdp_rate, how="left", left_index=True, right_index=True)
interpol2 = pd.merge(left=df_test, right=t_korea_personal_loan, how="left", left_index=True, right_index=True)
interpol3 = pd.merge(left=df_test, right=t_korea_interest_rate, how="left", left_index=True, right_index=True)
interpol4 = pd.merge(left=df_test, right=t_korea_loan_for_house, how="left", left_index=True, right_index=True)
interpol5 = pd.merge(left=df_test, right=t_korea_personal_GDP, how="left", left_index=True, right_index=True)
interpol6 = pd.merge(left=df_test, right=t_housing_count_yearly, how="left", left_index=True, right_index=True)
interpol7 = pd.merge(left=df_test, right=t_constructure_confirm, how="left", left_index=True, right_index=True)
interpol8 = pd.merge(left=df_test, right=t_seoul_gdp_series, how="left", left_index=True, right_index=True)
interpol9 = pd.merge(left=df_test, right=t_seoul_population, how="left", left_index=True, right_index=True)
interpol1 = interpol1.interpolate()
interpol2 = interpol2.interpolate()
interpol3 = interpol3.interpolate()
interpol4 = interpol4.interpolate()
interpol5 = interpol5.interpolate()
interpol6 = interpol6.interpolate()
interpol7 = interpol7.interpolate()
interpol8 = interpol8.interpolate()
interpol9 = interpol9.interpolate()
all_df_prepro = tr_data_series.set_index("계약년월일")
all_df_prepro = all_df_prepro[all_df_prepro["area"]=="강남구"]
merge1 = pd.merge(left=all_df_prepro, right=interpol1, how="left", left_index=True, right_index=True)
merge2 = pd.merge(left=merge1, right=interpol2, how="left", left_index=True, right_index=True)
merge3 = pd.merge(left=merge2, right=interpol3, how="left", left_index=True, right_index=True)
merge4 = pd.merge(left=merge3, right=interpol4, how="left", left_index=True, right_index=True)
merge5 = pd.merge(left=merge4, right=interpol5, how="left", left_index=True, right_index=True)
merge6 = pd.merge(left=merge5, right=interpol6, how="left", left_index=True, right_index=True)
merge7 = pd.merge(left=merge6, right=interpol7, how="left", left_index=True, right_index=True)
merge8 = pd.merge(left=merge7, right=interpol8, how="left", left_index=True, right_index=True)
merge9 = pd.merge(left=merge8, right=interpol9, how="left", left_index=True, right_index=True)
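The nine pairwise merges above can be collapsed into one left-merge chain. A sketch (not in the original notebook) using functools.reduce over the interpolated frames; the result matches merge9:
# Sketch: same result as merge1..merge9, expressed as one reduce over the interpolated frames
from functools import reduce

interpolated = [interpol1, interpol2, interpol3, interpol4, interpol5,
                interpol6, interpol7, interpol8, interpol9]
merge9_alt = reduce(
    lambda left, right: pd.merge(left, right, how="left",
                                 left_index=True, right_index=True),
    interpolated,
    all_df_prepro,
)
Either way, the "test" placeholder column and the duplicated year columns are discarded when target_df selects only the needed features in the next step.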
target_df = merge9[["일자별평당가","KOREA_GDP","GDP_Growth_rate","loan","기준금리","housing_loan","gdp_per_person","gdp",'Viliages',"Buildings",'House','세대',"인구","수도권"]]
target_df.plot()
<AxesSubplot:xlabel='계약년월일'>
na_df = target_df[target_df["KOREA_GDP"].isna()]
# List the distinct dates that still contain NaN after interpolation
na_df.index.drop_duplicates()
DatetimeIndex([], dtype='datetime64[ns]', name='계약년월일', freq=None)
target_df = target_df.dropna()
target_df
일자별평당가 KOREA_GDP GDP_Growth_rate loan 기준금리 \
계약년월일
2011-01-01 663.166621 1.388937e+06 4.000000 916.000000 3.0
2011-01-03 820.308356 1.389012e+06 3.997085 916.069971 3.0
2011-01-04 859.692819 1.389086e+06 3.994169 916.139942 3.0
2011-01-05 1084.908684 1.389161e+06 3.991254 916.209913 3.0
2011-01-06 1087.227224 1.389235e+06 3.988338 916.279883 3.0
... ... ... ... ... ...
2020-12-27 2557.440399 1.933051e+06 -0.965015 1724.530612 1.0
2020-12-28 2471.905319 1.933076e+06 -0.973761 1724.897959 1.0
2020-12-29 2030.309822 1.933102e+06 -0.982507 1725.265306 1.0
2020-12-30 1938.892652 1.933127e+06 -0.991254 1725.632653 1.0
2020-12-31 2410.882176 1.933152e+06 -1.000000 1726.000000 1.0
housing_loan gdp_per_person gdp Viliages \
계약년월일
2011-01-01 2.511440e+06 27901.000000 3.264151e+08 4081.000000
2011-01-03 2.511656e+06 27902.300292 3.264266e+08 4081.017493
2011-01-04 2.511871e+06 27903.600583 3.264382e+08 4081.034985
2011-01-05 2.512087e+06 27904.900875 3.264497e+08 4081.052478
2011-01-06 2.512303e+06 27906.201166 3.264612e+08 4081.069971
... ... ... ... ...
2020-12-27 5.907140e+06 37564.571429 4.351126e+08 4134.501458
2020-12-28 5.908911e+06 37565.428571 4.351102e+08 4134.376093
2020-12-29 5.910681e+06 37566.285714 4.351078e+08 4134.250729
2020-12-30 5.912452e+06 37567.142857 4.351054e+08 4134.125364
2020-12-31 5.914223e+06 37568.000000 4.351030e+08 4134.000000
Buildings House 세대 인구 \
계약년월일
2011-01-01 19022.000000 1.459112e+06 4.192752e+06 1.052877e+07
2011-01-03 19022.180758 1.459114e+06 4.192730e+06 1.052865e+07
2011-01-04 19022.361516 1.459117e+06 4.192709e+06 1.052852e+07
2011-01-05 19022.542274 1.459119e+06 4.192687e+06 1.052840e+07
2011-01-06 19022.723032 1.459121e+06 4.192666e+06 1.052827e+07
... ... ... ... ...
2020-12-27 20177.845481 1.544251e+06 4.416900e+06 9.912253e+06
2020-12-28 20178.134111 1.544295e+06 4.417164e+06 9.911962e+06
2020-12-29 20178.422741 1.544338e+06 4.417427e+06 9.911670e+06
2020-12-30 20178.711370 1.544381e+06 4.417691e+06 9.911379e+06
2020-12-31 20179.000000 1.544424e+06 4.417954e+06 9.911088e+06
수도권
계약년월일
2011-01-01 272156.000000
2011-01-03 272151.822157
2011-01-04 272147.644315
2011-01-05 272143.466472
2011-01-06 272139.288630
... ...
2020-12-27 252533.361516
2020-12-28 252475.271137
2020-12-29 252417.180758
2020-12-30 252359.090379
2020-12-31 252301.000000
[3512 rows x 14 columns]
features_name = target_df.columns
# Standardize the features
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(target_df)
gangnam_scaled = scaler.transform(target_df)
gangnam_df_scaled = pd.DataFrame(data=gangnam_scaled, columns=features_name)
target_df["일자별평당가"].mean()
1367.6529994162436
target_df["일자별평당가"].std()
443.0142889012346
train_dataset = gangnam_df_scaled.sample(frac=0.8, random_state=0)
test_dataset = gangnam_df_scaled.drop(train_dataset.index)
train_dataset = train_dataset.dropna()
test_dataset = test_dataset.dropna()
sns.pairplot(train_dataset[["일자별평당가","KOREA_GDP","기준금리","인구","housing_loan"]])
<seaborn.axisgrid.PairGrid at 0x13d460e7788>
train_labels = train_dataset.pop("일자별평당가")
test_labels = test_dataset.pop("일자별평당가")
test_dataset[:2]
KOREA_GDP GDP_Growth_rate loan 기준금리 housing_loan \
0 -1.424635 1.60839 -1.279383 1.250434 -1.152464
3 -1.423473 1.59793 -1.278568 1.250434 -1.151826
gdp_per_person gdp Viliages Buildings House 세대 \
0 -1.413319 -1.266843 -1.511219 -1.531661 -1.008147 -0.498564
3 -1.412172 -1.265952 -1.510096 -1.530136 -1.007907 -0.499625
인구 수도권
0 1.487077 -0.283590
3 1.484989 -0.283843
def build_model():
model = keras.Sequential([
layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
layers.Dense(64, activation='relu'),
layers.Dense(64, activation='relu'),
layers.Dense(1)
])
optimizer = tf.keras.optimizers.RMSprop(0.001)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae', 'mse'])
return model
model = build_model()
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_4 (Dense) (None, 64) 896
_________________________________________________________________
dense_5 (Dense) (None, 64) 4160
_________________________________________________________________
dense_6 (Dense) (None, 64) 4160
_________________________________________________________________
dense_7 (Dense) (None, 1) 65
=================================================================
Total params: 9,281
Trainable params: 9,281
Non-trainable params: 0
_________________________________________________________________
example_batch = train_dataset[:10]
example_result = model.predict(example_batch)
example_result
array([[ 0.00797851],
[-0.01199541],
[-0.09142744],
[ 0.05832197],
[ 0.27348617],
[ 0.18634151],
[ 0.04727918],
[ 0.07454206],
[ 0.10136928],
[ 0.0705907 ]], dtype=float32)
class PrintDot(keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs):
if epoch % 100 == 0: print('')
print('.', end='')
EPOCHS = 500
history = model.fit(
train_dataset, train_labels,
epochs=EPOCHS, validation_split = 0.2, verbose=0,
callbacks=[PrintDot()])
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()
loss mae mse val_loss val_mae val_mse epoch
495 0.211812 0.297824 0.211812 0.220288 0.304418 0.220288 495
496 0.210872 0.299693 0.210872 0.217033 0.306330 0.217033 496
497 0.211817 0.299451 0.211817 0.220548 0.306266 0.220548 497
498 0.211599 0.299263 0.211599 0.219617 0.303908 0.219617 498
499 0.212285 0.299121 0.212285 0.218657 0.304192 0.218657 499
def plot_history(history):
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
plt.figure(figsize=(8,12))
plt.subplot(2,1,1)
plt.xlabel('Epoch')
plt.ylabel('Mean Abs Error [일자별평당가]')
plt.plot(hist['epoch'], hist['mae'],
label='Train Error')
plt.plot(hist['epoch'], hist['val_mae'],
label = 'Val Error')
plt.ylim([0,1])
plt.legend()
plt.subplot(2,1,2)
plt.xlabel('Epoch')
plt.ylabel('Mean Square Error [$일자별평당가^2$]')
plt.plot(hist['epoch'], hist['mse'],
label='Train Error')
plt.plot(hist['epoch'], hist['val_mse'],
label = 'Val Error')
plt.ylim([0,1])
plt.legend()
plt.show()
plot_history(history)
loss, mae, mse = model.evaluate(test_dataset, test_labels, verbose=2)
print("테스트 세트의 평균 절대 오차 : {:5.2f} 일자별평당가".format(mae))
702/702 - 0s - loss: 0.2247 - mae: 0.3053 - mse: 0.2247
테스트 세트의 평균 절대 오차 : 0.31 일자별평당가
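Because both features and target were standardized, the reported MAE of about 0.31 is measured in standard deviations. A quick sketch (not in the original notebook) to express it on the original price scale, using the mean and standard deviation printed earlier:
# Sketch: convert the standardized test MAE back to the original 일자별평당가 scale
mae_original = mae * target_df["일자별평당가"].std()
print("Test MAE in original units: {:.0f} (std of 일자별평당가 is about 443)".format(mae_original))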
test_predictions = model.predict(test_dataset).flatten()
plt.scatter(test_labels, test_predictions)
plt.xlabel('실제값 [일자별평당가]')
plt.ylabel('예측값 [일자별평당가]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0,plt.xlim()[1]])
plt.ylim([0,plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])
Checking the regression coefficients
from sklearn.linear_model import LinearRegression
mlr = LinearRegression()
mlr.fit(train_dataset,train_labels)
LinearRegression()
test_predict = mlr.predict(test_dataset)
mlr.intercept_
-0.0018036691811668545
a = mlr.coef_
b = a.tolist()
b
[-1623274064068.07,
481600679659.31866,
-3101418178340.6064,
65030817020.926315,
5234723716722.214,
29619458914.86346,
-2526308842104.1704,
-219004405806.8065,
206194477323.95392,
-87436112647.77512,
-1268401596474.5413,
-3336944064158.5127,
125243097565.24297]
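Coefficient magnitudes on the order of 1e12 are a sign of near-perfect multicollinearity: the interpolated macro features are all smooth yearly trends that move together. One way to quantify this, sketched below with the variance inflation factor (assumes statsmodels is installed; not part of the original notebook):
# Sketch: variance inflation factors for the regression features
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = train_dataset.values
vif = pd.DataFrame({
    "feature": train_dataset.columns,
    "VIF": [variance_inflation_factor(X, i) for i in range(X.shape[1])],
})
print(vif.sort_values("VIF", ascending=False))
Very large VIFs would suggest dropping or combining features, or using a regularized model such as Ridge, before interpreting individual coefficients.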
plt.scatter(test_labels, test_predict, alpha = 0.4)
plt.show()
(9) Prediction Scenarios Using the Model
# Korea GDP (KRW billions), GDP growth rate (%), household loans (KRW trillions), base rate (%), housing finance loans (KRW 100 millions), GDP per capita, Seoul GDP, apartment complexes, number of buildings, number of housing units, number of households, Seoul population, metropolitan-area construction permits
prediction_2020 = [1933152, -1.00, 1726, 1.0, 5914223, 37568, 435102998, 4134.0, 20179.0, 1544424.0, 4417954.0, 9911088.0, 252301.0]
Change the values in the prediction_2023 list below to match the scenario you want to test.
prediction_2023 = [1933152, -1.00, 1500, 1.0, 5914223, 37568, 435102998, 4134.0, 20179.0, 1544424.0, 4417954.0, 9911088.0, 252301.0]
p_t_korea_gdp_rate = t_korea_gdp_rate.iloc[:,1].values.tolist()
p_t_korea_gdp_rate.append(prediction_2023[0])
p_t_korea_gdp_rate_2 = t_korea_gdp_rate.iloc[:,2].values.tolist()
p_t_korea_gdp_rate_2.append(prediction_2023[1])
p_t_korea_personal_loan = t_korea_personal_loan.iloc[:,1].values.tolist()
p_t_korea_personal_loan.append(prediction_2023[2])
p_t_korea_interest_rate = t_korea_interest_rate.iloc[:,1].values.tolist()
p_t_korea_interest_rate.append(prediction_2023[3])
p_t_korea_loan_for_house = t_korea_loan_for_house.iloc[:,1].values.tolist()
p_t_korea_loan_for_house.append(prediction_2023[4])
p_t_korea_personal_GDP = t_korea_personal_GDP.iloc[:,1].values.tolist()
p_t_korea_personal_GDP.append(prediction_2023[5])
p_t_seoul_gdp_series = t_seoul_gdp_series.iloc[:,1].values.tolist()
p_t_seoul_gdp_series.append(prediction_2023[6])
p_t_housing_count_yearly = t_housing_count_yearly.iloc[:,1].values.tolist()
p_t_housing_count_yearly.append(prediction_2023[7])
p_t_housing_count_yearly2 = t_housing_count_yearly.iloc[:,2].values.tolist()
p_t_housing_count_yearly2.append(prediction_2023[8])
p_t_housing_count_yearly3 = t_housing_count_yearly.iloc[:,3].values.tolist()
p_t_housing_count_yearly3.append(prediction_2023[9])
p_t_seoul_population = t_seoul_population.iloc[:,1].values.tolist()
p_t_seoul_population.append(prediction_2023[10])
p_t_seoul_population2 = t_seoul_population.iloc[:,2].values.tolist()
p_t_seoul_population2.append(prediction_2023[11])
p_t_constructure_confirm = t_constructure_confirm.iloc[:,2].values.tolist()
p_t_constructure_confirm.append(prediction_2023[12])
result = {"1":p_t_korea_gdp_rate,
"2":p_t_korea_gdp_rate_2,
"3":p_t_korea_personal_loan,
"4":p_t_korea_interest_rate,
"5":p_t_korea_loan_for_house,
"6":p_t_korea_personal_GDP,
"7":p_t_seoul_gdp_series,
"8":p_t_housing_count_yearly,
"9":p_t_housing_count_yearly2,
"10":p_t_housing_count_yearly3,
"11":p_t_seoul_population,
"12":p_t_seoul_population2,
"13":p_t_constructure_confirm}
result_df = pd.DataFrame(result)
features_name = result_df.columns
scaler = StandardScaler()
scaler.fit(result_df)
result_scaled = scaler.transform(result_df)
result_df_scaled = pd.DataFrame(data=result_scaled, columns=features_name)
prediction_2023 = result_df_scaled.iloc[10,:].tolist()
prediction_2023
[1.1106653210030881,
-2.007387671367415,
0.7327348334733893,
-1.147078669352809,
1.546410679260238,
1.1285861640701929,
1.1917846307057862,
-0.34676527372731875,
1.2473695929018436,
1.334659438424521,
1.8650400734240828,
-1.412205733642385,
-0.5471789150810322]
prediction_2023_np = np.array(prediction_2023)
prediction_2023_np = np.reshape(prediction_2023_np, (13))
prediction_2023_np = np.expand_dims(prediction_2023_np, axis=0)
predictions = model.predict(prediction_2023_np)
p = predictions[0][0]
# Price per pyeong predicted by the model (inverse-transform from the standardized scale, then multiply by 3.3)
(p * target_df["일자별평당가"].std() + target_df["일자별평당가"].mean()) * 3.3
6544.339630303525
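To compare several 2023 scenarios without repeating the list-building code above, the steps can be wrapped in a helper. This is a hedged sketch only: the function name is hypothetical, and it assumes the t_* frames, model, and target_df defined earlier are still in scope.
# Sketch: predict the 평당가 for an arbitrary 13-value scenario (same order as the feature comment above)
from sklearn.preprocessing import StandardScaler

def predict_price_per_pyeong(scenario):
    series = [t_korea_gdp_rate.iloc[:, 1], t_korea_gdp_rate.iloc[:, 2],
              t_korea_personal_loan.iloc[:, 1], t_korea_interest_rate.iloc[:, 1],
              t_korea_loan_for_house.iloc[:, 1], t_korea_personal_GDP.iloc[:, 1],
              t_seoul_gdp_series.iloc[:, 1], t_housing_count_yearly.iloc[:, 1],
              t_housing_count_yearly.iloc[:, 2], t_housing_count_yearly.iloc[:, 3],
              t_seoul_population.iloc[:, 1], t_seoul_population.iloc[:, 2],
              t_constructure_confirm.iloc[:, 2]]
    # append the scenario value to each historical series, then standardize them together
    data = {str(i): s.values.tolist() + [v] for i, (s, v) in enumerate(zip(series, scenario))}
    scaled = StandardScaler().fit_transform(pd.DataFrame(data))
    x = scaled[-1].reshape(1, -1)   # the appended scenario row
    p = model.predict(x)[0][0]      # standardized prediction
    return (p * target_df["일자별평당가"].std() + target_df["일자별평당가"].mean()) * 3.3

# e.g., a 2020-like baseline scenario
base_2023 = [1933152, -1.00, 1726, 1.0, 5914223, 37568, 435102998,
             4134.0, 20179.0, 1544424.0, 4417954.0, 9911088.0, 252301.0]
print(predict_price_per_pyeong(base_2023))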