The Python Oracle

How to create a density plot in matplotlib?

This video explains
How to create a density plot in matplotlib?

--

Become part of the top 3% of the developers by applying to Toptal
https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Breezy Bay

--

Chapters
00:00 Question
01:09 Accepted answer (Score 140)
02:50 Answer 2 (Score 189)
03:28 Answer 3 (Score 68)
04:07 Answer 4 (Score 51)
04:28 Thank you

--

Full question
https://stackoverflow.com/questions/4150...

Question links:
[the normed=True parameter]: http://matplotlib.sourceforge.net/exampl...

Answer 1 links:
[seaborn]: http://stanford.edu/~mwaskom/software/se.../
[image]: https://i.stack.imgur.com/ZHRs2.png

Answer 2 links:
[image]: https://i.stack.imgur.com/dNjzz.png
[image]: https://i.stack.imgur.com/7vfxl.png

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #r #numpy #matplotlib #scipy



ANSWER 1

Score 202


Five years later, when I Google "how to create a kernel density plot using python", this thread still shows up at the top!

Today, a much easier way to do this is to use seaborn, a package that provides many convenient plotting functions and good style management.

import numpy as np
import seaborn as sns
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
sns.set_style('whitegrid')
sns.kdeplot(np.array(data), bw=0.5)

enter image description here




ACCEPTED ANSWER

Score 149


Sven has shown how to use the class gaussian_kde from Scipy, but you will notice that it doesn't look quite like what you generated with R. This is because gaussian_kde tries to infer the bandwidth automatically. You can play with the bandwidth in a way by changing the function covariance_factor of the gaussian_kde class. First, here is what you get without changing that function:

alt text

However, if I use the following code:

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
density = gaussian_kde(data)
xs = np.linspace(0,8,200)
density.covariance_factor = lambda : .25
density._compute_covariance()
plt.plot(xs,density(xs))
plt.show()

I get

alt text

which is pretty close to what you are getting from R. What have I done? gaussian_kde uses a changable function, covariance_factor to calculate its bandwidth. Before changing the function, the value returned by covariance_factor for this data was about .5. Lowering this lowered the bandwidth. I had to call _compute_covariance after changing that function so that all of the factors would be calculated correctly. It isn't an exact correspondence with the bw parameter from R, but hopefully it helps you get in the right direction.




ANSWER 3

Score 73


Option 1:

Use pandas dataframe plot (built on top of matplotlib):

import pandas as pd
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
pd.DataFrame(data).plot(kind='density') # or pd.Series()

enter image description here

Option 2:

Use distplot of seaborn:

import seaborn as sns
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
sns.distplot(data, hist=False)

enter image description here




ANSWER 4

Score 53


Maybe try something like:

import matplotlib.pyplot as plt
import numpy
from scipy import stats
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
density = stats.kde.gaussian_kde(data)
x = numpy.arange(0., 8, .1)
plt.plot(x, density(x))
plt.show()

You can easily replace gaussian_kde() by a different kernel density estimate.