Basic Machine Learning - Linear Regression

What size is this?

Suppose you want to predict the length of a flower petal from its width (or the other way around).
To do this, we can look for a relation between the two measurements.

For this post we'll be looking at Linear Regression.

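Linear regression models the relation between two variables, let's say X and Y, as a straight line: for example, when X increases, Y increases as well (or decreases, if the relation is negative) by a fixed amount per unit of X. Y is therefore the dependent variable and X the independent one, and if this relation holds we can use the fitted model to predict Y from X.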
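For a single input, the straight line is y = b0 + b1 * x, and ordinary least squares picks the slope b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²) and the intercept b0 = ȳ - b1 * x̄. The sketch below computes this by hand with NumPy on a small made-up dataset (the numbers are purely illustrative, not from the iris data); a negative slope would simply mean Y decreases as X increases.

```python
import numpy as np

# Toy data, made up for illustration only (not from the iris dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.0])

def fit_simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_simple_ols(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")  # prints: y = 0.09 + 1.97 * x
```

scikit-learn's `LinearRegression`, used later in this post, fits the same least-squares line, so the hand-written version is only meant to show what happens under the hood.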

As a dataset we'll be using the iris dataset from sklearn.

We first load the dataset and then take a look at the first few rows.

import pandas as pd
import sklearn.datasets as ds

iris = ds.load_iris()  # the iris dataset bundled with scikit-learn
df = pd.DataFrame.from_records(data=iris.data, columns=iris.feature_names)
df.head()  # show the first five rows

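We want to find a relation between the width and length of a petal, so we'll plot one against the other in a simple scatter plot.

import matplotlib.pyplot as plt

X = df['petal width (cm)']
y = df['petal length (cm)']
plt.scatter(X, y)
plt.xlabel('petal width (cm)')
plt.ylabel('petal length (cm)')
plt.show()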
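The scatter plot gives a visual impression; Pearson's correlation coefficient r puts a number on how strongly the two measurements move together linearly (values near +1 or -1 mean a strong linear relation, values near 0 a weak one). A minimal sketch using pandas' `Series.corr`, which computes Pearson's r by default:

```python
import pandas as pd
import sklearn.datasets as ds

iris = ds.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Pearson's r between petal width and petal length (the default method of Series.corr)
r = df['petal width (cm)'].corr(df['petal length (cm)'])
print(f"Pearson r = {r:.3f}")
```

A value close to 1 is a good sign that a straight line is worth fitting, although it says nothing about outliers or curvature on its own.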

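Next we'll split the data into a training set and a test set, then train a linear model on the training set to see whether the relation between width and length is linear.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split dataset: 70% for training, 30% held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Train the model using the training set
# (scikit-learn expects a 2-D feature array, hence the reshape)
regr = LinearRegression()
regr.fit(X_train.values.reshape(-1, 1), y_train)

# Predict petal length for the test widths
y_pred = regr.predict(X_test.values.reshape(-1, 1))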
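To see what the model actually learned, the fitted slope and intercept can be read from the `coef_` and `intercept_` attributes of `LinearRegression`. The sketch below repeats the setup so it runs on its own; instead of reshaping, it passes the width as a one-column DataFrame (`df[['petal width (cm)']]`), which is an equivalent way of giving scikit-learn a 2-D feature array.

```python
import pandas as pd
import sklearn.datasets as ds
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = ds.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
X = df[['petal width (cm)']]  # double brackets keep a 2-D, one-column DataFrame
y = df['petal length (cm)']

# Same split as in the post: 105 training rows, 45 test rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
regr = LinearRegression().fit(X_train, y_train)

slope = regr.coef_[0]        # extra cm of petal length per extra cm of petal width
intercept = regr.intercept_  # predicted petal length at a width of 0 cm
print(f"petal length = {intercept:.2f} + {slope:.2f} * petal width")
```

The slope is the more useful number here: it tells you how much longer a petal is expected to be for each additional centimetre of width. The intercept is an extrapolation, since no petal in the dataset has zero width.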



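After training our model and predicting petal length from width, we can plot both the test data and the predicted results.

plt.scatter(X_test, y_test, color='r', label='test data')
plt.scatter(X_test, y_pred, color='b', linewidth=2, label='predicted')
plt.xlabel('petal width (cm)')
plt.ylabel('petal length (cm)')
plt.legend()
plt.show()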
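Because the predictions of a linear model all lie on one straight line, it can be clearer to draw that line directly instead of scattering the predicted points. A sketch of one way to do this, evaluating the model on an evenly spaced grid of widths; it uses the non-interactive `Agg` backend and writes to `regression_line.png`, a hypothetical file name chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.datasets as ds
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = ds.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
X = df[['petal width (cm)']]
y = df['petal length (cm)']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
regr = LinearRegression().fit(X_train, y_train)

# Evaluate the fitted line on 100 evenly spaced widths covering the observed range
widths = np.linspace(X['petal width (cm)'].min(), X['petal width (cm)'].max(), 100)
grid = pd.DataFrame({'petal width (cm)': widths})  # same column name as during fit
line = regr.predict(grid)

fig, ax = plt.subplots()
ax.scatter(X_test['petal width (cm)'], y_test, color='r', label='test data')
ax.plot(widths, line, color='b', linewidth=2, label='fitted line')
ax.set_xlabel('petal width (cm)')
ax.set_ylabel('petal length (cm)')
ax.legend()
fig.savefig('regression_line.png')  # hypothetical output file name
```

Sorting is not needed here because `np.linspace` already returns the widths in increasing order, so `ax.plot` draws a single clean line.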

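Looking at the plot, we can see that the test data lie close to the predicted values and there are no clear outliers, which suggests that a linear relationship between petal width and length is a reasonable model. A visual check is only a first indication, though, not proof.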
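To back up the visual check with numbers, scikit-learn's `r2_score` and `mean_squared_error` can be computed on the test set. R² is the fraction of the variance in petal length that the model explains (1.0 would be a perfect fit), and the mean squared error is the average squared prediction error in cm². The largest absolute residual is added as a rough outlier indicator. A self-contained sketch using the same split as in the post:

```python
import numpy as np
import pandas as pd
import sklearn.datasets as ds
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

iris = ds.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
X = df[['petal width (cm)']]
y = df['petal length (cm)']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)

r2 = r2_score(y_test, y_pred)             # share of variance explained; 1.0 is perfect
mse = mean_squared_error(y_test, y_pred)  # average squared error, in cm^2
# Residuals (observed minus predicted); unusually large ones hint at outliers
residuals = y_test - y_pred
max_abs_residual = np.abs(residuals).max()

print(f"R^2 on test set: {r2:.3f}")
print(f"MSE on test set: {mse:.3f} cm^2")
print(f"Largest absolute residual: {max_abs_residual:.2f} cm")
```

Computing these on the held-out test set, rather than on the training data, gives a fairer picture of how well the line predicts petals the model has not seen.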

Using linear regression, we can find and model linear relationships in data; in this example, the relation between a flower petal's width and its length.

