I have a 2-D dataset with two columns, x and y. I would like to get the linear regression coefficient and intercept, updated dynamically as new data comes in. Using scikit-learn, I can fit all the currently available data like this:
import numpy as np
from sklearn.linear_model import LinearRegression
regr = LinearRegression()
x = np.arange(100).reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = np.arange(100) + 10 * np.random.random_sample((100,))
regr.fit(x, y)
print(regr.coef_)
print(regr.intercept_)
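For a single feature with no regularization, the values reported in `coef_` and `intercept_` are the ordinary least squares estimates, which have a closed form in terms of the sample means and centered sums of the data. Because these quantities can be accumulated row by row, a refit on all data is not strictly required:

$$
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
$$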
However, my dataset is quite big (more than 10k rows in total), and I want to recompute the coefficient and intercept as quickly as possible whenever new rows arrive. Currently, fitting 10k rows takes about 600 microseconds, and I would like to speed this up.
Scikit-learn does not seem to offer an online (incremental) update for its linear regression module. Is there a better way to do this?
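A minimal sketch of one possible approach, assuming a single feature and plain ordinary least squares: keep running means and centered co-moments of x and y, update them in O(1) per new row with Welford-style formulas (or merge a whole block of rows using the pairwise combination from Chan et al.), and derive the slope and intercept on demand. The class `OnlineSimpleLinearRegression` and its methods are hypothetical helpers written for illustration, not part of scikit-learn.

```python
import numpy as np


class OnlineSimpleLinearRegression:
    """Hypothetical incremental fit of y = coef * x + intercept (one feature)."""

    def __init__(self):
        self.n = 0          # number of rows seen so far
        self.mean_x = 0.0   # running mean of x
        self.mean_y = 0.0   # running mean of y
        self.sxx = 0.0      # sum of (x - mean_x) ** 2
        self.sxy = 0.0      # sum of (x - mean_x) * (y - mean_y)

    def update(self, x, y):
        """Add one (x, y) row in O(1) using Welford-style updates."""
        self.n += 1
        dx = x - self.mean_x                  # deviation from the OLD mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.sxx += dx * (x - self.mean_x)    # old deviation times new deviation
        self.sxy += dx * (y - self.mean_y)

    def update_batch(self, xs, ys):
        """Merge a block of new rows (pairwise combination of Chan et al.)."""
        xs = np.asarray(xs, dtype=float)
        ys = np.asarray(ys, dtype=float)
        nb = xs.size
        if nb == 0:
            return
        mxb, myb = xs.mean(), ys.mean()
        sxx_b = float(np.sum((xs - mxb) ** 2))
        sxy_b = float(np.sum((xs - mxb) * (ys - myb)))
        n = self.n + nb
        dx, dy = mxb - self.mean_x, myb - self.mean_y
        w = self.n * nb / n                   # weight of the between-block term
        self.mean_x += dx * nb / n
        self.mean_y += dy * nb / n
        self.sxx += sxx_b + dx * dx * w
        self.sxy += sxy_b + dx * dy * w
        self.n = n

    @property
    def coef(self):
        if self.n < 2 or self.sxx == 0:
            raise ValueError("need at least two distinct x values")
        return self.sxy / self.sxx

    @property
    def intercept(self):
        return self.mean_y - self.coef * self.mean_x


# Usage: seed with existing history, then feed rows one at a time.
rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)
y = x + 10 * rng.random(100)

model = OnlineSimpleLinearRegression()
model.update_batch(x[:90], y[:90])
for xi, yi in zip(x[90:], y[90:]):
    model.update(xi, yi)
print(model.coef, model.intercept)
```

Each `update` touches a fixed handful of numbers regardless of how many rows have been seen, so the cost no longer grows with the dataset. Centered co-moments are used instead of raw sums such as the sum of x squared to reduce floating-point cancellation when x values are large.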