The Python Oracle

python - how to compute correlation-matrix with nans in data-matrix

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Cool Puzzler LoFi

--

Chapters
00:00 Python - How To Compute Correlation-Matrix With Nans In Data-Matrix
01:46 Accepted Answer Score 14
02:27 Thank you

--

Full question
https://stackoverflow.com/questions/2710...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #scipy #correlation

#avk47



ACCEPTED ANSWER

Score 14


I think the method you are looking for is corr() from pandas. For example, a dataframe as following. You can also refer to this question. How to efficiently get the correlation matrix (with p-values) of a data frame with NaN values?

import pandas as pd
df = pd.DataFrame({'A': [2, None, 1, -4, None, None, 3],
                   'B': [None, 1, None, None, 1, 3, None],
                   'C': [2, 1, None, 2, 2.1, 1, 0],
                   'D': [-2, 1.1, 3.2, 2, None, 1, None]})

df
    A       B       C       D
0   2       NaN     2       -2
1   NaN     1       1       1.1
2   1       NaN     NaN     3.2
3   -4      NaN     2       2
4   NaN     1       2.1     NaN
5   NaN     3       1       1
6   3       NaN     0       NaN
rho = df.corr()
rho
       A          B            C           D
A   1.000000     NaN       -0.609994    -0.441784
B   NaN          1.0       -0.500000    -1.000000
C   -0.609994    -0.5       1.000000    -0.347928
D   0.041204     -1.0       -0.347928    1.000000