Computational Tools
Find The Correlation Between Columns
Suppose you have a DataFrame of numerical values, for example:
df = pd.DataFrame(np.random.randn(1000, 3), columns=['a', 'b', 'c'])
Then
>>> df.corr()
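          a         b         c
a  1.000000  0.018602  0.038098
b  0.018602  1.000000 -0.014245
c  0.038098 -0.014245  1.000000
computes the pairwise Pearson correlation between the columns. Note that every entry on the diagonal is 1, since each column is perfectly correlated with itself. The matrix is also symmetric, because the correlation of a with b equals the correlation of b with a.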
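As a rough, reproducible sketch of the call above: the seeded generator and the variable names (`rng`, `corr`, `diagonal_is_one`, `is_symmetric`, `ab`) are assumptions added for illustration and are not part of the original example. Because the data is seeded differently, the numbers will not match the output shown above.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator, so repeated runs give the same matrix.
# The seed value itself is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=['a', 'b', 'c'])

# method defaults to 'pearson'
corr = df.corr()

# Every column is perfectly correlated with itself.
diagonal_is_one = np.allclose(np.diag(corr.to_numpy()), 1.0)

# corr(a, b) equals corr(b, a), so the matrix is symmetric.
is_symmetric = np.allclose(corr.to_numpy(), corr.to_numpy().T)

# A single coefficient can be read out by row and column label.
ab = corr.loc['a', 'b']

print(corr)
print(diagonal_is_one, is_symmetric)
```

The result is an ordinary DataFrame indexed by column name on both axes, so the usual label-based selection applies to it.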
pd.DataFrame.corr takes an optional method parameter that specifies which correlation coefficient to compute. The default is 'pearson'; 'kendall' and 'spearman' are also accepted. To use Spearman rank correlation, for example, use
>>> df.corr(method='spearman')
          a         b         c
a  1.000000  0.007744  0.037209
b  0.007744  1.000000 -0.011823
c  0.037209 -0.011823  1.000000
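To give a feel for what the method parameter changes, here is a minimal sketch under a few assumptions: the seeded data, the `pair` DataFrame, and all variable names are hypothetical and were added for illustration. It relies on the fact that Spearman correlation is the Pearson correlation of the ranked values, and it contrasts the two methods on a relationship that is monotonic but not linear.

```python
import numpy as np
import pandas as pd

# Assumption: seeded random data, as in the earlier example.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=['a', 'b', 'c'])

# Spearman correlation is the Pearson correlation computed on ranks.
spearman = df.corr(method='spearman')
pearson_of_ranks = df.rank().corr(method='pearson')
same = np.allclose(spearman.to_numpy(), pearson_of_ranks.to_numpy())

# Hypothetical data: y increases with x, but not linearly.
x = np.linspace(1, 10, 50)
pair = pd.DataFrame({'x': x, 'y': np.exp(x)})

pearson_xy = pair.corr().loc['x', 'y']
spearman_xy = pair.corr(method='spearman').loc['x', 'y']

print(same)
print(pearson_xy, spearman_xy)
```

In this sketch the Spearman coefficient for `pair` comes out at 1 (up to floating-point error) because the ordering of `y` matches the ordering of `x` exactly, while the Pearson coefficient is lower because the relationship is not a straight line.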