Basic Machine Learning - Linear Regression

Originally posted on Apr 26, 2018 10:00:00 AM
Last updated on April 22, 2024
Bart Maertens

Data architect and developer with over 20 years of experience in data engineering and analytics. Founder and lead of the know.bi expert team, Apache Hop co-founder and PMC member.

What size is this?

Suppose you want to predict the length or width of a flower petal.
To do that, we can look for a relation between the two.

For this post we'll be looking at Linear Regression.

Linear regression fits a straight line through the data to see whether two variables, let's say X and Y, are related so that when X changes, Y changes by a proportional amount. Y is therefore treated as dependent on X, and if this relation holds we can use the fitted model to predict Y from X.
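To make the idea concrete, here is a minimal sketch of how a straight line y = slope * x + intercept can be fitted by ordinary least squares. The data points and the function name fit_line are made up for illustration and are not part of the original post, which leaves the fitting to a library model.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # prints: 2.0 1.0
```

With real measurements the points will not sit exactly on a line, and the same formulas then give the line that minimises the sum of squared vertical distances to the points.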

As a dataset we'll be using the iris dataset from sklearn.

We first import the dataset after which we'll take a look at the first few rows.

    from sklearn import datasets as ds
    import pandas as pd

    iris = ds.load_iris()  # the iris dataset
    df = pd.DataFrame.from_records(data=iris.data, columns=iris.feature_names)
    df.head()

[Image: output of df.head()]

We want to find a relation between the width and length of a petal so we'll use them to plot a simple scatter plot.

    import matplotlib.pyplot as plt

    X = df['petal width (cm)']
    y = df['petal length (cm)']
    plt.scatter(X, y)
    plt.show()

[Image: scatter plot of petal width against petal length]

Next we'll split the data, then train a linear model to see if the relation between the width and length is linear.

    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # Split dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Train the model using the training sets
    regr = LinearRegression()
    regr.fit(X_train.values.reshape(-1, 1), y_train.values.reshape(-1, 1))

    # predict length
    y_pred = regr.predict(X_test.values.reshape(-1, 1))
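The reshape(-1, 1) calls are there because scikit-learn estimators expect the features as a two-dimensional array with one row per sample and one column per feature, while a single pandas column is one-dimensional. A small sketch with made-up values (not taken from the dataset) shows the change in shape:

```python
import numpy as np

# Three made-up petal widths as a flat, one-dimensional array
widths = np.array([0.2, 1.3, 2.5])
print(widths.shape)     # (3,)

# -1 lets NumPy infer the number of rows; 1 asks for a single column
widths_2d = widths.reshape(-1, 1)
print(widths_2d.shape)  # (3, 1)
```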

After training our model and predicting petal length from petal width, we can plot both the test data and the predicted results.

    # test data in red, predictions in blue
    plt.scatter(X_test, y_test, color='r')
    plt.scatter(X_test, y_pred, color='b', linewidth=2)
    plt.show()

[Image: scatter plot of the test data (red) and the predicted values (blue)]

Looking at the plot, we can see that the predicted points follow the test data closely and that there are no clear outliers, which suggests that a linear relationship between petal width and length is a reasonable model.
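A visual check can be complemented with a number. The sketch below is not part of the original post: it refits the same kind of model on the same split and reports the fitted line together with the coefficient of determination (R²) on the test set, where values close to 1 mean the line explains most of the variation. The variable names and the use of r2_score are choices made for this illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Column 3 of iris.data is petal width (cm), column 2 is petal length (cm)
X = iris.data[:, [3]]  # indexing with a list keeps the 2-D shape (n_samples, 1)
y = iris.data[:, 2]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

slope = model.coef_[0]
intercept = model.intercept_
r2 = r2_score(y_test, y_pred)
print(f"length = {slope:.2f} * width + {intercept:.2f}, R^2 = {r2:.2f}")
```

An R² well below 1, or a plot where the points curve away from the line, would be a sign that a straight line is not a good description of the data.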

Using linear regression we can identify linear relationships in data; in this example, the relation between a flower petal's width and length.

Get the code here!

data science, machine learning, algorithm, linear regression
