Created
June 21, 2020 05:40
# Import the libraries used below
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Load the CSV file
data = pd.read_csv('winequality_red.csv')

# Plot a count plot for the quality column
sns.countplot(x='quality', data=data)

# Keep the original quality scores (a Series) before they are binarized
quality = data['quality']

# Binarize quality: scores in (2, 6.5] are assigned 0 and scores in (6.5, 8] are assigned 1
data['quality'] = pd.cut(data['quality'], bins=(2, 6.5, 8), labels=[0, 1])

# Change the dtype of data['quality'] from category to int64
data['quality'] = data['quality'].astype('int64')

# Plot the correlation heat map
plt.figure(figsize=(60, 30))
sns.heatmap(data.corr(), annot=True, fmt='.2f')
plt.show()

# Separate the data into features (all columns but the last) and labels (the last column)
x = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Split the data into training and testing sets, using 75% of the rows for training
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.75, random_state=42)
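
The binarization step depends on how `pd.cut` treats bin edges, so a small check on made-up data may help. The sketch below uses hypothetical scores (not values from the CSV file) with the same bins and labels as above. It assumes the default `right=True` behavior, where each bin is open on the left and closed on the right.

```python
import pandas as pd

# Hypothetical quality scores; the real values come from the CSV file.
scores = pd.Series([3, 5, 6, 7, 8], name="quality")

# Same bins and labels as above: (2, 6.5] -> 0, (6.5, 8] -> 1.
binary = pd.cut(scores, bins=(2, 6.5, 8), labels=[0, 1]).astype("int64")
print(binary.tolist())

# A score outside (2, 8] matches no bin and becomes NaN,
# so it should be handled before converting the column to int64.
out_of_range = pd.cut(pd.Series([2, 9]), bins=(2, 6.5, 8), labels=[0, 1])
print(out_of_range.isna().tolist())
```

If the dataset ever contains a score of 2 or lower, or above 8, widening the outer bin edges is one way to avoid missing labels.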