How to convert a Scikit-learn dataset to a Pandas dataset
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Book End
--
Chapters
00:00 How To Convert A Scikit-Learn Dataset To A Pandas Dataset
00:18 Accepted Answer Score 208
00:54 Answer 2 Score 127
01:08 Answer 3 Score 88
01:39 Answer 4 Score 19
01:55 Thank you
--
Full question
https://stackoverflow.com/questions/3810...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #scikitlearn #dataset
#avk47
ACCEPTED ANSWER
Score 208
You can build the DataFrame manually with the pd.DataFrame constructor, passing a numpy array (data) and a list of column names (columns).
To have everything in one DataFrame, concatenate the features and the target into a single numpy array with np.c_[...] (note the square brackets):
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
# load the iris dataset; load_iris() returns a dict-like Bunch object
# to check the dataset type use: type(iris)
# to view the list of attributes use: dir(iris)
iris = load_iris()
# np.c_ stacks its arguments column-wise, so the 1-D iris['target']
# becomes an extra column to the right of iris['data']
# for the pandas columns argument: append a one-element list to
# iris['feature_names']; the name is arbitrary
# (the original dataset would probably call it 'Species')
data1 = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                     columns=iris['feature_names'] + ['target'])
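To see what np.c_ does to the shapes, here is a minimal sketch on toy arrays. The values and column names are made up for illustration and stand in for iris['data'] and iris['target']:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for iris['data'] and iris['target'].
features = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]])  # shape (3, 2)
target = np.array([0, 0, 1])                                # shape (3,)

# np.c_ treats the 1-D target as a column and appends it on the right.
combined = np.c_[features, target]

# Illustrative column names; any list of matching length works.
toy_df = pd.DataFrame(combined, columns=["feat_a", "feat_b", "target"])

print(combined.shape)
print(toy_df.dtypes)
```

One side effect worth knowing: a numpy array has a single dtype, so the integer labels are upcast to float to share the array with the features. If you need integer labels, cast the column back afterwards with astype(int).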
ANSWER 2
Score 127
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df.head()
This tutorial may be of interest: http://www.neural.cz/dataset-exploration-boston-house-pricing.html
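Note that the snippet in this answer keeps only the features and leaves out the target. On scikit-learn 0.23 or later, the loaders can also return pandas objects directly through the as_frame parameter. A minimal sketch, assuming such a version is installed (the variable names are illustrative):

```python
from sklearn.datasets import load_iris

# as_frame=True asks the loader to return pandas objects
# (assumes scikit-learn >= 0.23).
iris_bunch = load_iris(as_frame=True)

# .frame holds the feature columns plus a 'target' column in one DataFrame.
iris_df = iris_bunch.frame

print(iris_df.shape)
print(iris_df.columns.tolist())
```

With this option, iris_bunch.data is a DataFrame and iris_bunch.target is a Series, so no manual concatenation is needed.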
ANSWER 3
Score 88
TOMDLt's solution is not generic enough for all the datasets in scikit-learn; for example, it does not work for the Boston housing dataset. I propose a more universal solution, which also does not need numpy.
from sklearn import datasets
import pandas as pd
# note: load_boston was removed in scikit-learn 1.2, so this needs an older version
boston_data = datasets.load_boston()
df_boston = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
df_boston['target'] = pd.Series(boston_data.target)
df_boston.head()
As a general function:
def sklearn_to_df(sklearn_dataset):
    df = pd.DataFrame(sklearn_dataset.data, columns=sklearn_dataset.feature_names)
    df['target'] = pd.Series(sklearn_dataset.target)
    return df
df_boston = sklearn_to_df(datasets.load_boston())
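The general function assumes the dataset object exposes feature_names. As a hedged sketch of a more defensive variant, the hypothetical bunch_to_df below falls back to generic column names when that attribute is missing. The function name, the x0, x1, ... fallback names, and the stand-in dataset object are all assumptions made for illustration, not part of scikit-learn:

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def bunch_to_df(dataset, target_name="target"):
    """Hypothetical variant of sklearn_to_df that tolerates missing feature_names."""
    data = np.asarray(dataset.data)
    names = getattr(dataset, "feature_names", None)
    if names is None:
        # Fall back to generic names x0, x1, ... (an illustrative choice).
        names = [f"x{i}" for i in range(data.shape[1])]
    df = pd.DataFrame(data, columns=list(names))
    df[target_name] = np.asarray(dataset.target)
    return df


# Stand-in object with only .data and .target, mimicking a dataset
# that lacks feature_names.
fake_dataset = SimpleNamespace(data=[[1.0, 2.0], [3.0, 4.0]], target=[0, 1])
fake_df = bunch_to_df(fake_dataset)
print(fake_df)
```

The same function should accept any scikit-learn Bunch with 2-D data and a 1-D target, for example bunch_to_df(datasets.load_iris()).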
ANSWER 4
Score 19
Took me 2 hours to figure this out
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
# iris.keys()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
The last line maps the integer target codes back to the species names and stores them in a categorical column.
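How pd.Categorical.from_codes turns integer codes into labels can be seen on a small example. The codes below are made up; the names mirror iris.target_names:

```python
import pandas as pd

# Hypothetical integer codes, like the values in iris.target.
codes = [0, 2, 1, 0]
# Category names in code order, mirroring iris.target_names.
names = ["setosa", "versicolor", "virginica"]

# Each code is used as an index into names.
species = pd.Categorical.from_codes(codes, names)

print(list(species))
```

Each code selects the name at that position, so the labels only line up if the list of names is in the same order as the codes.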