Suppose you want to predict the length or width of a flower petal.
For this we can look for a relation between the two.
For this post we'll be looking at Linear Regression.
Linear regression checks whether two variables, let's say X and Y, are related along a straight line, so that when X increases, Y increases by a proportional amount. Y is therefore treated as dependent on X, and if this relation holds we can use the fitted model to predict Y from X.
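To make "fitting a line" concrete, here is a minimal sketch of ordinary least squares in plain Python. The function name `fit_line` and the measurements are made up for illustration; they are not part of sklearn or the iris dataset.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up measurements, for illustration only (not taken from iris)
widths = [1.0, 2.0, 3.0, 4.0]
lengths = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(widths, lengths)
print(slope, intercept)
```

These toy points lie exactly on a line, so the fit recovers it exactly. Real data is noisier, which is why the rest of this post checks the fit against held-out test data.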
As a dataset we'll be using the iris dataset from sklearn.
We first import the dataset after which we'll take a look at the first few rows.
import pandas as pd
from sklearn import datasets as ds
iris = ds.load_iris()  # the iris dataset
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df.head()
We want to find a relation between the width and length of a petal so we'll use them to plot a simple scatter plot.
import matplotlib.pyplot as plt
X = df['petal width (cm)']
y = df['petal length (cm)']
plt.scatter(X, y)
plt.xlabel('petal width (cm)')
plt.ylabel('petal length (cm)')
plt.show()
Next we'll split the data, then train a linear model to see if the relation between the width and length is linear.
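from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# Split the dataset: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Train the model using the training set
regr = LinearRegression()
regr.fit(X_train.values.reshape(-1, 1), y_train.values.reshape(-1, 1))
# Predict petal length for the test set
y_pred = regr.predict(X_test.values.reshape(-1, 1))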
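Once the model is trained, the fitted line itself can be read from its `coef_` and `intercept_` attributes. The following is a self-contained sketch that repeats the split and fit with the same settings; the variable names (`width`, `length`, `model`, and so on) are chosen here for illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Column order in iris.data: sepal length, sepal width, petal length, petal width
width = iris.data[:, 3].reshape(-1, 1)  # feature matrix must be 2D
length = iris.data[:, 2]                # target can stay 1D

w_train, w_test, l_train, l_test = train_test_split(
    width, length, test_size=0.3, random_state=0
)

model = LinearRegression().fit(w_train, l_train)
slope = model.coef_[0]
intercept = model.intercept_
print(f"petal length = {slope:.2f} * petal width + {intercept:.2f}")
```

The slope tells you how many centimetres of length the model adds per centimetre of width, which is often easier to reason about than the plot alone.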
After training our model and predicting petal length from petal width, we can plot both the test data (red) and the predicted results (blue).
plt.scatter(X_test,y_test,color='r')
plt.scatter(X_test,y_pred,color='b', linewidth=2)
plt.show()
Looking at the plot, the predicted values lie close to the test data and there are no clear outliers, which suggests that a linear relationship between petal width and length is a reasonable model.
Using linear regression we can model linear relationships in data; in this example, the relation between a flower petal's width and length.
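A visual check can be backed up with a number. One common choice is the coefficient of determination, R², which is 1 for a perfect fit and drops towards 0 as predictions get no better than always guessing the mean. Below is a minimal sketch in plain Python; the helper name `r_squared` and the sample values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values, for illustration only (not taken from iris)
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

score = r_squared(actual, predicted)
print(round(score, 3))
```

sklearn provides the same measure as `sklearn.metrics.r2_score`, so with the variables from this post you could call `r2_score(y_test, y_pred)` instead of writing the helper yourself.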
Get the code here!