Computational Tools
Find The Correlation Between Columns
Suppose you have a DataFrame of numerical values, for example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 3), columns=['a', 'b', 'c'])
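Calling corr on it then gives the Pearson correlation between each pair of columns:
>>> df.corr()
          a         b         c
a  1.000000  0.018602  0.038098
b  0.018602  1.000000 -0.014245
c  0.038098 -0.014245  1.000000
The matrix is symmetric, and its diagonal is 1 because each column is perfectly correlated with itself. Since the data are random, the off-diagonal values you get will differ from the ones shown here.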
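To make the default method concrete, here is a minimal sketch that rebuilds the Pearson matrix by hand from the textbook formula (the covariance of two columns divided by the product of their standard deviations) and checks it against df.corr(). The seeded generator and the helper pearson are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded data so the check is reproducible (an assumption; the original is unseeded)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=['a', 'b', 'c'])

def pearson(x, y):
    """Hypothetical helper: Pearson r from centred values.

    Sum of products of deviations divided by the square root of the
    product of the sums of squared deviations (the n - 1 factors cancel).
    """
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# Fill the matrix one column pair at a time, keeping the column order of df
manual = pd.DataFrame(
    [[pearson(df[i].to_numpy(), df[j].to_numpy()) for j in df.columns]
     for i in df.columns],
    index=df.columns,
    columns=df.columns,
)

print(np.allclose(manual, df.corr()))  # True: same values up to rounding
print(np.diag(df.corr()))              # diagonal entries, all (numerically) 1
```

Comparing with np.allclose rather than == allows for floating-point rounding differences between the hand-written loop and pandas' own implementation.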
pd.DataFrame.corr takes an optional method parameter specifying which correlation coefficient to compute: 'pearson' (the default), 'kendall', or 'spearman'. To use Spearman rank correlation, for example, use
>>> df.corr(method='spearman')
          a         b         c
a  1.000000  0.007744  0.037209
b  0.007744  1.000000 -0.011823
c  0.037209 -0.011823  1.000000
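Spearman's coefficient is the Pearson correlation computed on the ranks of the values rather than on the values themselves. The following minimal sketch shows that equivalence on hypothetical seeded data: it ranks each column with DataFrame.rank (ties would receive their average rank) and passes the result to the default Pearson method. It also shows that a strictly increasing transform such as np.exp, which preserves ranks, leaves the Spearman matrix unchanged.

```python
import numpy as np
import pandas as pd

# Seeded data for a reproducible check (an assumption, not from the original)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=['a', 'b', 'c'])

# Rank each column independently; tied values would share their average rank
ranked = df.rank(method='average')

# Pearson correlation of the ranks
spearman_via_ranks = ranked.corr(method='pearson')
print(np.allclose(spearman_via_ranks, df.corr(method='spearman')))  # True

# exp is strictly increasing, so every column keeps the same ranks
df_exp = np.exp(df)
print(np.allclose(df_exp.corr(method='spearman'),
                  df.corr(method='spearman')))  # True
```

This rank invariance makes Spearman a better fit than Pearson when the relationship between two columns is monotonic but not linear.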