[Deep Learning PyTorch Textbook (딥러닝 파이토치 교과서)] Chapter 7 Time Series Analysis: Fixing the torchtext Error on Colab

M.랄라 2023. 6. 10. 21:19

The code in the Deep Learning PyTorch Textbook (딥러닝 파이토치 교과서) appears to have been written against torchtext 0.8.0, 0.9.0, or 0.10.0.

Unfortunately, the problem is that Colab does not support any of torchtext 0.8.0, 0.9.0, or 0.10.0 (as of 2023.06.07).
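Before working around the problem, it can help to confirm which versions the current runtime actually has installed. The following is a minimal sketch that uses only the standard library; `installed_version` is a helper name made up for this example, and the output will differ from runtime to runtime.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Print what the current environment reports for the two relevant packages
for name in ("torch", "torchtext"):
    print(name, installed_version(name))
```

If `torchtext` prints as `None`, or as a version other than the three listed above, code written for those versions cannot be expected to run unchanged.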

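It would be a waste to skip the time series exercises just because of this, so I wrote code that can stand in for [Code 7-4] on p. 375 through [Code 7-10] on p. 379. It uses the same IMDB data, so from 7-12 onward the book's code should probably work as written.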
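To make the target format concrete: the replacement code in this post stores each review as a plain dict with a `'text'` list of tokens and a `'label'`. The sketch below applies the same cleaning steps used later in the post to a single review. The review string and the `'pos'` label are made up for illustration, and `clean_tokens` is a hypothetical helper name, not something from the book.

```python
import string


def clean_tokens(tokens):
    """Lowercase, drop '<br' fragments, strip punctuation, and discard empty tokens."""
    tokens = [t.lower() for t in tokens]
    tokens = [t.replace('<br', '') for t in tokens]
    tokens = [''.join(c for c in t if c not in string.punctuation) for t in tokens]
    return [t for t in tokens if t]


# Made-up review text, not taken from the dataset
review = "This movie was GREAT!<br /><br />Loved it."
example = {'text': clean_tokens(review.split(' ')), 'label': 'pos'}
print(example)
# {'text': ['this', 'movie', 'was', 'great', 'loved', 'it'], 'label': 'pos'}
```

Splitting on single spaces leaves fragments such as `/><br` behind; removing `<br` and then stripping punctuation turns those fragments into empty strings, which the last step filters out.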

1. Go to https://www.kaggle.com/datasets/atulanandjha/imdb-50k-movie-reviews-test-your-bert?select=test.csv and download the IMDB train.csv and test.csv.

2. Replacement for [Code 7-4] through [Code 7-9]

import pandas as pd

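train_file_path = 'path/to/train.csv'  # replace with the location of the downloaded train.csv
test_file_path = 'path/to/test.csv'    # replace with the location of the downloaded test.csv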

train_df = pd.read_csv(train_file_path)
train_df = train_df.rename(columns={'sentiment': 'label'})
train_df = train_df.reset_index()

test_df = pd.read_csv(test_file_path)
test_df = test_df.rename(columns={'sentiment': 'label'})
test_df = test_df.reset_index()

train_data = []
test_data = []

# Build train_data: one dict per review, holding the space-split text and its label
for index, line in train_df.iterrows():
    original_dict = {
        'text': [],
        'label': ""
    }

    # Skip rows that have fewer than two fields
    if len(line) < 2:
        continue

    original_dict['text'] = line['text'].split(' ')
    original_dict['label'] = line['label']
    train_data.append(original_dict)

# Build test_data the same way
for index, line in test_df.iterrows():
    original_dict = {
        'text': [],
        'label': ""
    }

    # Skip rows that have fewer than two fields
    if len(line) < 2:
        continue

    original_dict['text'] = line['text'].split(' ')
    original_dict['label'] = line['label']
    test_data.append(original_dict)
    
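import string

# Clean the training examples in place
for example in train_data:
    text = [x.lower() for x in example['text']]                                  # lowercase every token
    text = [x.replace('<br', '') for x in text]                                  # drop leftover '<br' fragments
    text = [''.join(c for c in s if c not in string.punctuation) for s in text]  # strip punctuation
    text = [s for s in text if s]                                                # discard empty tokens
    example['text'] = text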
  
from sklearn.model_selection import train_test_split

# Hold out 20% of the training data for validation; an integer random_state keeps the split reproducible
train_data, valid_data = train_test_split(train_data, random_state=0, test_size=0.2)
print(f'Number of training examples : {len(train_data)}')
print(f'Number of valid_data examples : {len(valid_data)}')
print(f'Number of test examples : {len(test_data)}')

Running the code above produces the same result as the book, as shown below!

Number of training examples : 20000
Number of valid_data examples : 5000
Number of test examples : 25000
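One note on the reproducibility of the split: `random.seed(0)` seeds Python's global generator and returns `None`, so an expression like `random_state=random.seed(0)` hands scikit-learn no seed at all, whereas an integer such as `random_state=0` does. The counts printed above are the same either way, since they depend only on `test_size`. The sketch below shows the `None` return value and a standard-library stand-in for a seeded split. `split_examples` is a hypothetical helper written for this example, not scikit-learn's implementation, and the sample data is made up.

```python
import random

# random.seed() works in place and returns None
seed_result = random.seed(0)
print(seed_result)  # None


def split_examples(examples, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle a copy with a private seeded RNG, then slice it."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * test_size)
    return shuffled[n_valid:], shuffled[:n_valid]


# Made-up examples in the same dict format used in this post
examples = [{'text': [f'token{i}'], 'label': 'pos'} for i in range(10)]
train_part, valid_part = split_examples(examples)
print(len(train_part), len(valid_part))  # 8 2
```

Calling `split_examples(examples)` twice with the same `seed` returns the same partition, which is the property a fixed integer seed is meant to provide.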
