Error using sklearn and linear regression: shapes (1,16) and (1,1) not aligned: 16 (dim 1) != 1 (dim 0)

0 votes

I wanted to learn about machine learning, stumbled upon Siraj's YouTube channel and his Udacity videos, and wanted to try to pick up a few things.

His video in reference: https://www.youtube.com/watch?v=vOppzHpvTiQ&index=1&list=PL2-dafEMk2A7YdKv4XfKpfbTH5z6rEEj3

In his video, he imported and read a txt file, but when I tried to recreate the txt file, it couldn't be read correctly. Instead, I created a pandas DataFrame with the same data and ran the linear regression fit/predict on it, but then I got the error below.

Found input variables with inconsistent numbers of samples: [1, 16], along with a message about passing 1-D arrays and needing to reshape them.

Then when I tried to reshape them following this post: Sklearn : ValueError: Found input variables with inconsistent numbers of samples: [1, 6]

I get this error....

shapes (1,16) and (1,1) not aligned: 16 (dim 1) != 1 (dim 0)

My code is below. I know it's probably a syntax error; I'm just not familiar with scikit-learn yet and would like some help.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

#DF = pd.read_fwf('BrainBodyWeight.txt')
DF = pd.DataFrame()
DF['Brain'] = [3.385, .480, 1.350, 465.00,36.330, 27.660, 14.830, 1.040, 4.190, 0.425, 0.101, 0.920, 1.000, 0.005, 0.060, 3.500 ]

DF['Body'] = [44.500, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5,58, 6.40, 4, 5.7,6.6, .140,1, 10.8]

try:
    x = DF['Brain']
    y = DF['Body']

    x = x.tolist()
    y = y.tolist()

    x = np.asarray(x)
    y = np.asarray(y)


    body_reg = linear_model.LinearRegression()
    body_reg.fit(x.reshape(-1,1),y.reshape(-1,1))
    plt.scatter(x,y)
    plt.plot(x,body_reg.predict(x))
    plt.show()
except Exception as e:
    print(e)

Can anyone explain why sklearn doesn't like my input????

Apr 11, 2022 in Machine Learning by Dev
• 6,000 points
2,049 views

1 answer to this question.

0 votes

According to the LinearRegression documentation, fit() requires an X array of shape [n_samples, n_features]. That's why you reshape your x array before calling fit: without the reshape, you have an array of shape (16,), which has no n_features dimension and so does not match the required [n_samples, n_features] shape.

z = DF['Brain']
z = z.tolist()
z = np.asarray(z)

# 16 samples, no feature dimension
z.shape
# (16,)

# 16 samples, 1 feature
z.reshape(-1,1).shape
# (16, 1)

LinearRegression's predict function has the same requirement. When calling predict, you need to apply the same reshaping as before (this also keeps the code consistent).

plt.plot(z,body_reg.predict(z.reshape(-1,1)))
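
The "not aligned" message in the question looks like a plain NumPy matrix-product error. Here is a minimal sketch of how it can arise, assuming the 1-D input ends up being treated as a single row of 16 features (some older scikit-learn releases appear to have done this, with a warning) and assuming a hypothetical fitted coefficient of shape (1, 1). The value 0.9 is made up for illustration.

```python
import numpy as np

x = np.arange(16, dtype=float)   # 1-D array, shape (16,)
coef = np.array([[0.9]])         # hypothetical coefficient, shape (1, 1)

# Assumption: the 1-D input is interpreted as 1 sample with 16 features.
as_one_row = x.reshape(1, -1)    # shape (1, 16)
try:
    # (1, 16) times (1, 1): the inner dimensions 16 and 1 do not match.
    np.dot(as_one_row, coef.T)
    error_message = None
except ValueError as e:
    error_message = str(e)
print(error_message)

# One column means 16 samples with 1 feature, so the product is defined.
as_column = x.reshape(-1, 1)     # shape (16, 1)
result = np.dot(as_column, coef.T)
print(result.shape)
```

The second product succeeds because the single feature column lines up with the single coefficient.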

You can also simply reshape the x array once, before calling any functions.
You can access the underlying NumPy array of a column by calling DF['Brain'].values, so you don't need to convert it to a list and then back to a NumPy array. Instead of doing all the conversions, you can just use this:

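# reshape(-1, 1) gives 16 samples with 1 feature; reshape(1, -1) would give 1 sample with 16 features
z = DF['Brain'].values.reshape(-1,1)
y = DF['Body'].values.reshape(-1,1)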

reg = linear_model.LinearRegression()
reg.fit(z, y)
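
Putting the pieces together, here is a sketch of the whole flow with the question's data. It relies on one variation not used above: selecting the column with double brackets (DF[['Brain']]) returns a one-column DataFrame, so its values are already 2-D and no reshape is needed. The names X and predictions are chosen here for illustration, and plotting is left out to keep the example short.

```python
import pandas as pd
from sklearn import linear_model

DF = pd.DataFrame({
    'Brain': [3.385, 0.480, 1.350, 465.00, 36.330, 27.660, 14.830, 1.040,
              4.190, 0.425, 0.101, 0.920, 1.000, 0.005, 0.060, 3.500],
    'Body': [44.500, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5,
             58, 6.40, 4, 5.7, 6.6, 0.140, 1, 10.8],
})

# Double brackets keep the column 2-D: 16 samples, 1 feature.
X = DF[['Brain']].values
# A 1-D target of shape (16,) is accepted by fit.
y = DF['Body'].values

body_reg = linear_model.LinearRegression()
body_reg.fit(X, y)

# predict needs the same 2-D layout that fit received.
predictions = body_reg.predict(X)
print(X.shape, y.shape, predictions.shape)
```

To reproduce the plot from the question, plt.scatter(X, y) followed by plt.plot(X, predictions) should work with these arrays.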

Hope this helps!


answered Apr 13, 2022 by anonymous
