How to analyze a heatmap to find the correlation

+1 vote
Sep 28, 2019 in Machine Learning by Vikas
• 130 points
10,371 views

1 answer to this question.

0 votes

Hi @Vikas, you can analyze the heatmap correlation in a few simple steps:

1. Import data

import pandas as pd

data = pd.read_csv('file_clean.csv')

2. Create the correlation matrix. .corr() computes the pairwise correlation between columns (Pearson by default). It only works on numeric columns: depending on your pandas version, non-numeric columns are either silently skipped or cause an error. So drop the non-numeric columns, or encode them as numbers, before calling it.

corr = data.corr()

3. Create heatmap in seaborn:

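import seaborn as sns
import matplotlib.pyplot as plt

ax = sns.heatmap(
    corr,
    vmin=-1, vmax=1, center=0,  # Fix the colour scale to the full correlation range
    cmap=sns.diverging_palette(20, 220, n=200),
    square=True
)
ax.set_xticklabels(
    ax.get_xticklabels(),
    rotation=45,
    horizontalalignment='right'
)
plt.show()  # Needed to display the figure when running as a script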

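You'll see something like this, where blue indicates a positive correlation and red indicates a negative one. The deeper the colour, the stronger the correlation; pale cells are close to zero.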
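Going back to step 2, here is a minimal sketch of keeping only the numeric columns before calling .corr(). The toy DataFrame and its column names are made up for illustration; they are not from your file.

```python
import pandas as pd

# Hypothetical toy data: two numeric columns and one text column.
toy = pd.DataFrame({
    "horsepower": [111, 154, 102, 115, 110],
    "city_mpg": [21, 19, 24, 18, 19],
    "fuel_type": ["gas", "gas", "diesel", "gas", "gas"],
})

# Keep only the numeric columns, then compute the correlation matrix.
numeric = toy.select_dtypes(include="number")
toy_corr = numeric.corr()  # Pearson correlation by default

print(toy_corr.round(2))
```

The result is a square, symmetric table with 1.0 on the diagonal, and that table is what gets passed to sns.heatmap.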

Now to start analyzing the heatmap correlation, ask yourself this question:

What's the weakest and strongest correlation pair?

I am assuming it's difficult to tell from the colours alone, right?
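If the colours are hard to rank by eye, the same question can also be answered numerically. The following is only a sketch: rank_pairs is a hypothetical helper name and the small matrix holds made-up values. It keeps each pair once (upper triangle, no diagonal) and sorts by absolute value.

```python
import numpy as np
import pandas as pd

def rank_pairs(corr):
    """List each unordered column pair once, strongest correlation first."""
    # True only above the diagonal: drops self-correlations and mirrored duplicates.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    # Rank by magnitude, because -0.7 is a stronger relationship than +0.6.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Hypothetical correlation matrix, for illustration only.
cols = ["a", "b", "c"]
toy_corr = pd.DataFrame(
    [[1.0, 0.6, -0.1],
     [0.6, 1.0, -0.7],
     [-0.1, -0.7, 1.0]],
    index=cols, columns=cols,
)

ranked = rank_pairs(toy_corr)
print("strongest:", ranked.index[0], ranked.iloc[0])
print("weakest:  ", ranked.index[-1], ranked.iloc[-1])
```

With your own data you would pass the matrix from step 2 instead of toy_corr. Note that "strongest" here means largest in absolute value, so a strongly negative pair ranks above a mildly positive one.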

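To make it easier to read, you can redraw the correlation matrix as a scatter plot in which each square's size is proportional to the absolute value of the correlation. Adapt the column list to your own dataset.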

import pandas as pd
import matplotlib.pyplot as plt

def heatmap(x, y, size):
    fig, ax = plt.subplots()

    # Mapping from column names to integer coordinates
    x_labels = sorted(x.unique())
    y_labels = sorted(y.unique())
    x_to_num = {label: i for i, label in enumerate(x_labels)}
    y_to_num = {label: i for i, label in enumerate(y_labels)}

    size_scale = 500
    ax.scatter(
        x=x.map(x_to_num), # Use mapping for x
        y=y.map(y_to_num), # Use mapping for y
        s=size * size_scale, # Vector of square sizes, proportional to size parameter
        marker='s' # Use square as scatterplot marker
    )

    # Show column labels on the axes
    ax.set_xticks([x_to_num[v] for v in x_labels])
    ax.set_xticklabels(x_labels, rotation=45, horizontalalignment='right')
    ax.set_yticks([y_to_num[v] for v in y_labels])
    ax.set_yticklabels(y_labels)

    return ax # Return the axes so the plot can be adjusted afterwards

data = pd.read_csv('https://raw.githubusercontent.com/drazenz/heatmap/master/autos.clean.csv')
columns = ['bore', 'stroke', 'compression-ratio', 'horsepower', 'city-mpg', 'price']
corr = data[columns].corr()
corr = pd.melt(corr.reset_index(), id_vars='index') # Unpivot the dataframe, so we can get pair of arrays for x and y
corr.columns = ['x', 'y', 'value']
ax = heatmap(
    x=corr['x'],
    y=corr['y'],
    size=corr['value'].abs() # Square size reflects the strength, not the sign
)

You'll get something like this:

Make a few modifications so that the squares sit between the grid lines instead of on them:

ax.grid(False, which='major')
ax.grid(True, which='minor')
ax.set_xticks([t + 0.5 for t in ax.get_xticks()], minor=True)
ax.set_yticks([t + 0.5 for t in ax.get_yticks()], minor=True)
plt.show()  # Needed to display the figure when running as a script

And you are good to go!
Have a look at this blog: https://towardsdatascience.com/better-heatmaps-and-correlation-matrix-plots-in-python-41445d0f2bec

answered Sep 30, 2019 by Vishal
I tried the code sample you have given above. Everything seems fine except that neither of the graphs is showing. Any idea? Thanks!
Are you trying to plot both graphs in a single window? If not, please explain your query in a little more detail.
