pandas

Computational Tools

Find The Correlation Between Columns

Suppose you have a DataFrame of numerical values, for example:

df = pd.DataFrame(np.random.randn(1000, 3), columns=['a', 'b', 'c'])

Then

>>> df.corr()
    a    b    c
a    1.000000    0.018602    0.038098
b    0.018602    1.000000    -0.014245
c    0.038098    -0.014245    1.000000

will find the Pearson correlation between the columns. Note how the diagonal is 1, as each column is (obviously) fully correlated with itself.

pd.DataFrame.correlation takes an optional method parameter, specifying which algorithm to use. The default is pearson. To use Spearman correlation, for example, use

>>> df.corr(method='spearman')
    a    b    c
a    1.000000    0.007744    0.037209
b    0.007744    1.000000    -0.011823
c    0.037209    -0.011823    1.000000

This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0 This website is not affiliated with Stack Overflow