Suppose you want to predict the length or width of a flower petal.
To do that, we can look for a relation between the two.
For this post we'll be looking at Linear Regression.
Linear regression lets us check whether two variables, let's say X and Y, are related so that Y changes by a roughly constant amount as X increases. Y is therefore treated as dependent on X, and if this relation holds we can use a model to predict Y from X.
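To make the idea concrete, here is a minimal sketch of what "fitting a line" means, using the standard closed-form least-squares formulas for a single predictor. The toy numbers are made up for illustration and are not part of the iris data.

```python
import numpy as np

# Toy data (invented for illustration): y is exactly 2 * x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares estimates for the line y = intercept + slope * x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)
```

Because the toy data lie exactly on a line, the estimates recover the slope and intercept used to generate them. Real data will scatter around the line instead.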
As a dataset we'll be using the iris dataset from sklearn.
We first import the dataset and load it into a pandas DataFrame, after which we can take a look at the first few rows.
import pandas as pd
from sklearn import datasets as ds
iris = ds.load_iris() # the iris dataset
df = pd.DataFrame(iris.data, columns=iris.feature_names)
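As a small sketch of the "first few rows" step, something like the following should work. It reloads the data so it can be run on its own.

```python
import pandas as pd
from sklearn import datasets as ds

iris = ds.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Show the first five rows and the overall size of the table
print(df.head())
print(df.shape)
```

The column names printed here are the ones we select from in the next step.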
We want to find a relation between the width and length of a petal, so we'll select those two columns and use them for a simple scatter plot.
X = df['petal width (cm)']
y = df['petal length (cm)']
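The scatter plot itself could be drawn roughly like this. Using matplotlib here is an assumption, since the post doesn't say which plotting library it relies on, and the title text is just a placeholder.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets as ds

iris = ds.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
X = df['petal width (cm)']
y = df['petal length (cm)']

# One point per flower: width on the x-axis, length on the y-axis
fig, ax = plt.subplots()
ax.scatter(X, y)
ax.set_xlabel('petal width (cm)')
ax.set_ylabel('petal length (cm)')
ax.set_title('Iris petal width vs. length')
plt.show()
```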
Next we'll split the data into a training set and a test set, then train a linear model to see if the relation between the width and length is linear.
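from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# Split dataset: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Create the model (regr is assumed to be scikit-learn's LinearRegression)
regr = LinearRegression()
# Train the model using the training sets; reshape turns each Series into a single column
regr.fit(X_train.values.reshape(-1, 1), y_train.values.reshape(-1, 1))
# Predict petal length from petal width for the test set
y_pred = regr.predict(X_test.values.reshape(-1, 1))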
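Once the model is fitted, it can help to look at the line it actually learned. The sketch below rebuilds the data and model so it runs on its own, and assumes `regr` is scikit-learn's `LinearRegression`. It passes `y_train` as a 1-D Series, which makes `coef_` a flat array and `intercept_` a single number.

```python
import pandas as pd
from sklearn import datasets as ds
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = ds.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
X = df['petal width (cm)']
y = df['petal length (cm)']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

regr = LinearRegression()
regr.fit(X_train.values.reshape(-1, 1), y_train)

# The fitted line is: length = intercept + slope * width
slope = regr.coef_[0]
intercept = regr.intercept_
print(f"predicted length = {intercept:.2f} + {slope:.2f} * width")
```

The slope tells us how many centimetres of length the model adds per extra centimetre of width.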
After training our model and predicting length from width, we can plot both the test data and the predicted results.
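A plot of the test data against the predictions might look like the sketch below. Matplotlib, the colours, and the labels are assumptions, and the data and model are rebuilt so the snippet is self-contained.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets as ds
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = ds.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
X = df['petal width (cm)']
y = df['petal length (cm)']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

regr = LinearRegression()
regr.fit(X_train.values.reshape(-1, 1), y_train)
y_pred = regr.predict(X_test.values.reshape(-1, 1))

# Test points as dots, model predictions as a line through them
fig, ax = plt.subplots()
ax.scatter(X_test.values, y_test.values, label='test data')
ax.plot(X_test.values, y_pred, color='red', label='predicted')
ax.set_xlabel('petal width (cm)')
ax.set_ylabel('petal length (cm)')
ax.legend()
plt.show()
```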
Looking at the plot, the test points lie close to the fitted line and there are no clear outliers, which suggests that a linear relationship between petal width and length is a reasonable model.
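A visual check can be backed up with a number. One common option, which the post itself doesn't use, is the R² score on the test set together with the root mean squared error. A rough sketch, again rebuilding the data and model:

```python
import pandas as pd
from sklearn import datasets as ds
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

iris = ds.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
X = df['petal width (cm)']
y = df['petal length (cm)']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

regr = LinearRegression()
regr.fit(X_train.values.reshape(-1, 1), y_train)
y_pred = regr.predict(X_test.values.reshape(-1, 1))

# R^2 close to 1 means the line explains most of the variation in length
r2 = r2_score(y_test, y_pred)
# RMSE is the typical prediction error, in the same unit as length (cm)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print(f"R^2: {r2:.3f}, RMSE: {rmse:.3f} cm")
```

An R² near 1 and a small RMSE would support what the plot shows. A low R² would be a hint that a straight line is not a good description of the data.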
Using linear regression we can model linear relationships in data; in this example, the relation between a flower petal's width and length.