day 12 Scikit-Learn Machine Learning
Machine Learning with Scikit-LearnPermalink
we will be using a machine learning library called Scikit-Learn to implement a machine learning algorithm
ML algorithms can be divided roughly into three categories
-
supervised learning
-
classification
-
categorical
-
regression
-
forecasting
-
-
unsupervised learning
-
clustering
-
dimensionality reduction
-
-
reinforcement learning
- Monte Carlo methods
2.Q-Learning
3.Policy Gradient methods
ML in Scikit-LearnPermalink
-
Classification
7개 SGD Classifier, KNeighborsClassifier, LinearSVC, NaiveBayes, SVC, Kernel approximation, EnsembleClassifiers
-
Regression
7개 SGD Regressor, Lasso, ElasticNet, RidgeRegression, SVR(kernel=’linear’), SVR(kernel=’rbf’), EnsembelRegressor
-
Clustering
-
Dimensionality Reduction
algorithms are chosen by
-
amount of data
-
presense of label (정답의 유무)
-
type of data (quantity or category)
About Scikit-LearnPermalink
DependenciesPermalink
-
Python (>= 3.6)
-
NumPy (>= 1.13.3)
-
SciPy (>= 0.19.1)
-
joblib (>= 0.11)
InstallationPermalink
#!pip install scikit-learn
import sklearn
print(sklearn.__version__)
0.23.2
ModulesPermalink
Commonly used API
Data ExpressionPermalink
Datasets can be expressed using
-
NumPy ndarray
-
Pandas DataFrame
-
SciPy Sparse Matrix
In Scikit-Learn, we express data by Feature Matrix and Target Vector
Feature Matrix is the given data and the Target Vector is the results or answers corresponding to the data
Regression ModelPermalink
We will make a model that predicts outcome by using data and ML
import numpy as np
import matplotlib.pyplot as plt
r = np.random.RandomState(10)
x = 10 * r.rand(100)
y = 2 * x - 3 * r.rand(100)
plt.scatter(x,y)
<matplotlib.collections.PathCollection at 0x272f360aee0>
x.shape
(100,)
y.shape
(100,)
x and y shape is (100,) a 1 dimention vector
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model
LinearRegression()
# ! 에러 발생
model.fit(x, y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-65b198098d37> in <module>
1 # ! 에러 발생
----> 2 model.fit(x, y)
~\anaconda3\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
503
504 n_jobs_ = self.n_jobs
--> 505 X, y = self._validate_data(X, y, accept_sparse=['csr', 'csc', 'coo'],
506 y_numeric=True, multi_output=True)
507
~\anaconda3\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
430 y = check_array(y, **check_y_params)
431 else:
--> 432 X, y = check_X_y(X, y, **check_params)
433 out = X, y
434
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
793 raise ValueError("y cannot be None")
794
--> 795 X = check_array(X, accept_sparse=accept_sparse,
796 accept_large_sparse=accept_large_sparse,
797 dtype=dtype, order=order, copy=copy,
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
617 # If input is 1D raise error
618 if array.ndim == 1:
--> 619 raise ValueError(
620 "Expected 2D array, got 1D array instead:\narray={}.\n"
621 "Reshape your data either using array.reshape(-1, 1) if "
ValueError: Expected 2D array, got 1D array instead:
array=[7.71320643 0.20751949 6.33648235 7.48803883 4.98507012 2.24796646
1.98062865 7.60530712 1.69110837 0.88339814 6.85359818 9.53393346
0.03948266 5.12192263 8.12620962 6.12526067 7.21755317 2.91876068
9.17774123 7.14575783 5.42544368 1.42170048 3.7334076 6.74133615
4.41833174 4.34013993 6.17766978 5.13138243 6.50397182 6.01038953
8.05223197 5.21647152 9.08648881 3.19236089 0.90459349 3.00700057
1.13984362 8.28681326 0.46896319 6.26287148 5.47586156 8.19286996
1.9894754 8.56850302 3.51652639 7.54647692 2.95961707 8.8393648
3.25511638 1.65015898 3.92529244 0.93460375 8.21105658 1.5115202
3.84114449 9.44260712 9.87625475 4.56304547 8.26122844 2.51374134
5.97371648 9.0283176 5.34557949 5.90201363 0.39281767 3.57181759
0.7961309 3.05459918 3.30719312 7.73830296 0.39959209 4.29492178
3.14926872 6.36491143 3.4634715 0.43097356 8.79915175 7.63240587
8.78096643 4.17509144 6.05577564 5.13466627 5.97836648 2.62215661
3.00871309 0.25399782 3.03062561 2.42075875 5.57578189 5.6550702
4.75132247 2.92797976 0.64251061 9.78819146 3.39707844 4.95048631
9.77080726 4.40773825 3.18272805 5.19796986].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
If we use x as it is, w eget an error. We have to change x into a matrix. Since x is a Numpy ndarray type, we can use reshape().
#x는 numpy의 ndarray타입이니 reshape()를 사용하면 좋을 것 같네요.
X = x.reshape(100,1)
model.fit(X,y)
LinearRegression()
x_new = np.linspace(-1, 11, 100)
X_new = x_new.reshape(100,1)
y_new = model.predict(X_new)
In reshape(), input -1 automatically calculates rest of the numbers.
X_ = x_new.reshape(-1,1)
X_.shape
(100, 1)
We can use sklearn.metrics to check if the regression model’s predictions are accurate
from sklearn.metrics import mean_squared_error
error = mean_squared_error(x_new, y_new)
print(error)
25.934871794642014
plt.scatter(x, y, label='input data')
plt.plot(X_new, y_new, color='red', label='regression line')
[<matplotlib.lines.Line2D at 0x272f3d75880>]
The graph looks accurately placed
datasets modulesPermalink
sklearn.datasets provide data for us to use.
-
datasets.load_boston(): 회귀 문제, 미국 보스턴 집값 예측
-
datasets.load_breast_cancer(): 분류 문제, 유방암 판별
-
datasets.load_digits(): 분류 문제, 0 ~ 9 숫자 분류
-
datasets.load_iris(): 분류 문제, iris 품종 분류
-
datasets.load_wine(): 분류 문제, 와인 분류
from sklearn.datasets import load_wine
data = load_wine()
type(data)
sklearn.utils.Bunch
print(data)
{'data': array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
1.065e+03],
[1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
1.050e+03],
[1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
1.185e+03],
...,
[1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
8.350e+02],
[1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
8.400e+02],
[1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
5.600e+02]]), 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2]), 'frame': None, 'target_names': array(['class_0', 'class_1', 'class_2'], dtype='<U7'), 'DESCR': '.. _wine_dataset:\n\nWine recognition dataset\n------------------------\n\n**Data Set Characteristics:**\n\n :Number of Instances: 178 (50 in each of three classes)\n :Number of Attributes: 13 numeric, predictive attributes and the class\n :Attribute Information:\n \t\t- Alcohol\n \t\t- Malic acid\n \t\t- Ash\n\t\t- Alcalinity of ash \n \t\t- Magnesium\n\t\t- Total phenols\n \t\t- Flavanoids\n \t\t- Nonflavanoid phenols\n \t\t- Proanthocyanins\n\t\t- Color intensity\n \t\t- Hue\n \t\t- OD280/OD315 of diluted wines\n \t\t- Proline\n\n - class:\n - class_0\n - class_1\n - class_2\n\t\t\n :Summary Statistics:\n \n ============================= ==== ===== ======= =====\n Min Max Mean SD\n ============================= ==== ===== ======= =====\n Alcohol: 11.0 14.8 13.0 0.8\n Malic Acid: 0.74 5.80 2.34 1.12\n Ash: 1.36 3.23 2.36 0.27\n Alcalinity of Ash: 10.6 30.0 19.5 3.3\n Magnesium: 70.0 162.0 99.7 14.3\n Total Phenols: 0.98 3.88 2.29 0.63\n Flavanoids: 0.34 5.08 2.03 1.00\n Nonflavanoid Phenols: 0.13 0.66 0.36 0.12\n Proanthocyanins: 0.41 3.58 1.59 0.57\n Colour Intensity: 1.3 13.0 5.1 2.3\n Hue: 0.48 1.71 0.96 0.23\n OD280/OD315 of diluted wines: 1.27 4.00 2.61 0.71\n Proline: 278 1680 746 315\n ============================= ==== ===== ======= =====\n\n :Missing Attribute Values: None\n :Class Distribution: class_0 (59), class_1 (71), class_2 (48)\n :Creator: R.A. Fisher\n :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n :Date: July, 1988\n\nThis is a copy of UCI ML Wine recognition datasets.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data\n\nThe data is the results of a chemical analysis of wines grown in the same\nregion in Italy by three different cultivators. There are thirteen different\nmeasurements taken for different constituents found in the three types of\nwine.\n\nOriginal Owners: \n\nForina, M. et al, PARVUS - \nAn Extendible Package for Data Exploration, Classification and Correlation. \nInstitute of Pharmaceutical and Food Analysis and Technologies,\nVia Brigata Salerno, 16147 Genoa, Italy.\n\nCitation:\n\nLichman, M. (2013). UCI Machine Learning Repository\n[https://archive.ics.uci.edu/ml]. Irvine, CA: University of California,\nSchool of Information and Computer Science. \n\n.. topic:: References\n\n (1) S. Aeberhard, D. Coomans and O. de Vel, \n Comparison of Classifiers in High Dimensional Settings, \n Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of \n Mathematics and Statistics, James Cook University of North Queensland. \n (Also submitted to Technometrics). \n\n The data was used with many others for comparing various \n classifiers. The classes are separable, though only RDA \n has achieved 100% correct classification. \n (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) \n (All results using the leave-one-out technique) \n\n (2) S. Aeberhard, D. Coomans and O. de Vel, \n "THE CLASSIFICATION PERFORMANCE OF RDA" \n Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of \n Mathematics and Statistics, James Cook University of North Queensland. \n (Also submitted to Journal of Chemometrics).\n', 'feature_names': ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']}
data.keys()
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names'])
data.data
array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
1.065e+03],
[1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
1.050e+03],
[1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
1.185e+03],
...,
[1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
8.350e+02],
[1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
8.400e+02],
[1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
5.600e+02]])
data.data.shape
(178, 13)
data.data.ndim
2
data.target
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2])
data.target.shape
(178,)
data.feature_names
['alcohol',
'malic_acid',
'ash',
'alcalinity_of_ash',
'magnesium',
'total_phenols',
'flavanoids',
'nonflavanoid_phenols',
'proanthocyanins',
'color_intensity',
'hue',
'od280/od315_of_diluted_wines',
'proline']
len(data.feature_names)
13
data.target_names
array(['class_0', 'class_1', 'class_2'], dtype='<U7')
print(data.DESCR)
.. _wine_dataset:
Wine recognition dataset
------------------------
**Data Set Characteristics:**
:Number of Instances: 178 (50 in each of three classes)
:Number of Attributes: 13 numeric, predictive attributes and the class
:Attribute Information:
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
- class:
- class_0
- class_1
- class_2
:Summary Statistics:
============================= ==== ===== ======= =====
Min Max Mean SD
============================= ==== ===== ======= =====
Alcohol: 11.0 14.8 13.0 0.8
Malic Acid: 0.74 5.80 2.34 1.12
Ash: 1.36 3.23 2.36 0.27
Alcalinity of Ash: 10.6 30.0 19.5 3.3
Magnesium: 70.0 162.0 99.7 14.3
Total Phenols: 0.98 3.88 2.29 0.63
Flavanoids: 0.34 5.08 2.03 1.00
Nonflavanoid Phenols: 0.13 0.66 0.36 0.12
Proanthocyanins: 0.41 3.58 1.59 0.57
Colour Intensity: 1.3 13.0 5.1 2.3
Hue: 0.48 1.71 0.96 0.23
OD280/OD315 of diluted wines: 1.27 4.00 2.61 0.71
Proline: 278 1680 746 315
============================= ==== ===== ======= =====
:Missing Attribute Values: None
:Class Distribution: class_0 (59), class_1 (71), class_2 (48)
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988
This is a copy of UCI ML Wine recognition datasets.
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data
The data is the results of a chemical analysis of wines grown in the same
region in Italy by three different cultivators. There are thirteen different
measurements taken for different constituents found in the three types of
wine.
Original Owners:
Forina, M. et al, PARVUS -
An Extendible Package for Data Exploration, Classification and Correlation.
Institute of Pharmaceutical and Food Analysis and Technologies,
Via Brigata Salerno, 16147 Genoa, Italy.
Citation:
Lichman, M. (2013). UCI Machine Learning Repository
[https://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
School of Information and Computer Science.
.. topic:: References
(1) S. Aeberhard, D. Coomans and O. de Vel,
Comparison of Classifiers in High Dimensional Settings,
Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of
Mathematics and Statistics, James Cook University of North Queensland.
(Also submitted to Technometrics).
The data was used with many others for comparing various
classifiers. The classes are separable, though only RDA
has achieved 100% correct classification.
(RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data))
(All results using the leave-one-out technique)
(2) S. Aeberhard, D. Coomans and O. de Vel,
"THE CLASSIFICATION PERFORMANCE OF RDA"
Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of
Mathematics and Statistics, James Cook University of North Queensland.
(Also submitted to Journal of Chemometrics).
dataset categorizationPermalink
#!pip install pandas
import pandas as pd
pd.DataFrame(data.data, columns=data.feature_names)
alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 |
2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 |
4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
173 | 13.71 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740.0 |
174 | 13.40 | 3.91 | 2.48 | 23.0 | 102.0 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750.0 |
175 | 13.27 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835.0 |
176 | 13.17 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840.0 |
177 | 14.13 | 4.10 | 2.74 | 24.5 | 96.0 | 2.05 | 0.76 | 0.56 | 1.35 | 9.20 | 0.61 | 1.60 | 560.0 |
178 rows × 13 columns
X = data.data
y = data.target
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, y)
RandomForestClassifier()
y_pred = model.predict(X)
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
#타겟 벡터 즉 라벨인 변수명 y와 예측값 y_pred을 각각 인자로 넣습니다.
print(classification_report(y, y_pred))
#정확도를 출력합니다.
print("accuracy = ", accuracy_score(y, y_pred))
precision recall f1-score support
0 1.00 1.00 1.00 59
1 1.00 1.00 1.00 71
2 1.00 1.00 1.00 48
accuracy 1.00 178
macro avg 1.00 1.00 1.00 178
weighted avg 1.00 1.00 1.00 178
accuracy = 1.0
accuracy is 1. This cannot be correct. I will explain in below how to fix this problem.
EstimatorPermalink
We can use the estimator to predict outcome regardless of supervised or unsupervised learning
Split Train and Test DataPermalink
We used the same data set for training and testing. This resulted in accuracy of 100% in our testing
from sklearn.datasets import load_wine
data = load_wine()
print(data.data.shape)
print(data.target.shape)
(178, 13)
(178,)
X_train = data.data[:142]
X_test = data.data[142:]
print(X_train.shape, X_test.shape)
(142, 13) (36, 13)
y_train = data.target[:142]
y_test = data.target[142:]
print(y_train.shape, y_test.shape)
(142,) (36,)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
RandomForestClassifier()
y_pred = model.predict(X_test)
from sklearn.metrics import accuracy_score
print("정답률=", accuracy_score(y_test, y_pred))
정답률= 0.9444444444444444
We can also use the train_test_split function to split the data sets easily
from sklearn.model_selection import train_test_split
result = train_test_split(X, y, test_size=0.2, random_state=42)
print(type(result))
print(len(result))
<class 'list'>
4
result[0].shape
(142, 13)
result[1].shape
(36, 13)
result[2].shape
(142,)
result[3].shape
(36,)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
SummaryPermalink
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# 데이터셋 로드하기
data = load_wine()
# 훈련용 데이터셋 나누기
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=11)
# 훈련하기
model = RandomForestClassifier()
model.fit(X_train, y_train)
# 예측하기
y_pred = model.predict(X_test)
# 정답률 출력하기
print("정답률=", accuracy_score(y_test, y_pred))
정답률= 0.9722222222222222
Leave a comment