How to create a DataFrame of random integers with Pandas?
--
Full question
https://stackoverflow.com/questions/3275...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
ACCEPTED ANSWER
Score 292
numpy.random.randint accepts a third argument, size, which specifies the shape of the output array. You can use this to create your DataFrame:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
Here, np.random.randint(0, 100, size=(100, 4)) creates an output array of shape (100, 4) whose elements are random integers in the half-open interval [0, 100), so 100 itself is never drawn.
Demo:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
which produces something like the following (the values are random, so yours will differ):
A B C D
0 45 88 44 92
1 62 34 2 86
2 85 65 11 31
3 74 43 42 56
4 90 38 34 93
5 0 94 45 10
6 58 23 23 60
.. .. .. .. ..
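To confirm the shape and the half-open range described above, you can inspect the resulting frame directly. This is a minimal sketch; the checks are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# size=(100, 4) gives 100 rows and 4 columns
print(df.shape)            # (100, 4)
print(list(df.columns))    # ['A', 'B', 'C', 'D']

# The upper bound is exclusive, so every value lies in 0..99
print(df.values.min() >= 0 and df.values.max() <= 99)  # True
```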
ANSWER 2
Score 25
The recommended way to create random integers with NumPy these days is to use numpy.random.Generator.integers.
import numpy as np
import pandas as pd
rng = np.random.default_rng()
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list('ABCD'))
df
----------------------
A B C D
0 58 96 82 24
1 21 3 35 36
2 67 79 22 78
3 81 65 77 94
4 73 6 70 96
... ... ... ... ...
95 76 32 28 51
96 33 68 54 77
97 76 43 57 43
98 34 64 12 57
99 81 77 32 50
100 rows × 4 columns
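Two Generator features are handy here: default_rng accepts a seed, which makes the frame reproducible, and integers takes endpoint=True to include the upper bound (by default, like randint, it is excluded). A minimal sketch, assuming you want both; the seed 42 and the dice columns are arbitrary illustrations:

```python
import numpy as np
import pandas as pd

# Seeding the generator (42 is arbitrary) makes the frame reproducible
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list('ABCD'))

# A fresh generator with the same seed yields an identical frame
df_again = pd.DataFrame(np.random.default_rng(42).integers(0, 100, size=(100, 4)),
                        columns=list('ABCD'))
print(df.equals(df_again))  # True

# endpoint=True makes the upper bound inclusive: values are drawn from [1, 6]
dice = pd.DataFrame(rng.integers(1, 6, size=(10, 2), endpoint=True),
                    columns=['die1', 'die2'])
print(dice)
```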
ANSWER 3
Score 6
You can also use np.random.Generator.choice.
df = pd.DataFrame(np.random.default_rng().choice(100, size=(100, 4)), columns=['A','B','C','D'])
The advantage of this method over integers is that you can sample from any list or array you want. For example, to generate a random sample from [2, 5, 10]:
df = pd.DataFrame(np.random.default_rng().choice([2,5,10], size=(100, 4)), columns=['A','B','C','D'])
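The population can be any integer array, which is useful when the allowed values are not a contiguous range. A minimal sketch, where the population (multiples of 5 from 0 to 95) is a hypothetical choice for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng()

# Hypothetical population: only the multiples of 5 from 0 to 95
population = np.arange(0, 100, 5)
df = pd.DataFrame(rng.choice(population, size=(100, 4)), columns=list('ABCD'))

# Every cell is one of the population values
print((df.values % 5 == 0).all())  # True
print(df.head())
```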
You can even attach a probability distribution to the entries by passing the p= argument. For example, to choose 2 with probability 0.8 and 5 with probability 0.2:
df = pd.DataFrame(np.random.default_rng().choice([2,5], p=[.8,.2], size=(100, 4)), columns=['A','B','C','D'])
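To check that the p= weights take effect, you can compare the observed share of each value against the requested probabilities. A minimal sketch; the seed 0 is arbitrary and only there to make the run repeatable, and the larger size just makes the frequencies settle closer to p:

```python
import numpy as np
import pandas as pd

# Seeded only so the run is repeatable; 0 is an arbitrary seed
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice([2, 5], p=[.8, .2], size=(100_000, 4)),
                  columns=list('ABCD'))

# Share of each value across all cells; should be close to 0.8 and 0.2
freq = pd.Series(df.values.ravel()).value_counts(normalize=True)
print(freq)
```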
Also, in the author's timings below, Generator.choice was about as fast as Generator.integers and roughly twice as fast as the legacy np.random.randint:
%timeit pd.DataFrame(np.random.default_rng().choice(100, size=(100_000,4)), columns=[*'ABCD'])
# 3.34 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.DataFrame(np.random.default_rng().integers(0, 100, size=(100_000,4)), columns=[*'ABCD'])
# 3.81 ms ± 708 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.DataFrame(np.random.randint(100, size=(100_000,4)), columns=[*'ABCD'])
# 6.78 ms ± 776 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
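The %timeit lines above need IPython or Jupyter. To rerun a similar comparison in a plain Python script, the standard-library timeit module works. This is a sketch under a few assumptions that differ from the original: it reuses one Generator instead of calling default_rng() on every run, and it reports the best of several repeats rather than a mean and standard deviation. Absolute numbers will depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng()  # reused across runs (the original created one per call)
cols = list('ABCD')
shape = (100_000, 4)

candidates = {
    'Generator.choice': lambda: pd.DataFrame(rng.choice(100, size=shape), columns=cols),
    'Generator.integers': lambda: pd.DataFrame(rng.integers(0, 100, size=shape), columns=cols),
    'np.random.randint': lambda: pd.DataFrame(np.random.randint(100, size=shape), columns=cols),
}

number = 10
results = {}
for name, fn in candidates.items():
    # Best of 7 repeats of `number` calls each, converted to ms per call
    best_ms = min(timeit.repeat(fn, repeat=7, number=number)) / number * 1e3
    results[name] = best_ms
    print(f'{name:20s} {best_ms:.2f} ms per call')
```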