Suppose you want to predict the length of a flower petal from its width.

To do that, we can look for a relation between the two measurements.

In this post we'll be looking at linear regression.

Linear regression checks whether two variables, let's say X and Y, are **related** in a linear way, so that Y changes by a roughly constant amount each time X increases by one unit. Y is therefore treated as **dependent** on X, and if this relation is **valid** we can use the fitted model to **predict** Y from X.
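To make the idea concrete, here is a minimal sketch of fitting a straight line by ordinary least squares in plain Python. The helper `fit_line` and the toy numbers are made up for illustration and are not part of the iris data; scikit-learn does the equivalent work for us later in the post.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data lying exactly on the line y = 2x + 1
widths = [0.0, 1.0, 2.0, 3.0]
lengths = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(widths, lengths)
print(slope, intercept)  # prints: 2.0 1.0
```

With real measurements the points will not sit exactly on a line, and the slope and intercept describe the line that minimizes the squared vertical distances to the points.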

As a dataset we'll be using the iris dataset from sklearn.

We first load the dataset into a DataFrame, after which we'll take a look at the first few rows.

```python
import pandas as pd
from sklearn import datasets as ds

iris = ds.load_iris()  # the iris dataset
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df.head()
```

We want to find a relation between the width and length of a petal so we'll use them to plot a simple scatter plot.

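```python
import matplotlib.pyplot as plt

X = df['petal width (cm)']   # predictor
y = df['petal length (cm)']  # value we want to predict
plt.scatter(X, y)
plt.xlabel('petal width (cm)')
plt.ylabel('petal length (cm)')
plt.show()
```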

Next we'll split the data, then train a linear model to see if the relation between the width and length is linear.

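```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split dataset: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train the model using the training set
# (scikit-learn expects a 2-D feature array, hence the reshape)
regr = LinearRegression()
regr.fit(X_train.values.reshape(-1, 1), y_train.values.reshape(-1, 1))

# Predict petal length for the test set
y_pred = regr.predict(X_test.values.reshape(-1, 1))
```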
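Once the model is fitted, it can be useful to look at the line it actually learned. The following is a self-contained sketch, not necessarily how the original notebook did it: it reloads the data as NumPy arrays (column indices 2 and 3 are petal length and petal width in `iris.feature_names`) and reads the slope and intercept from the `coef_` and `intercept_` attributes of `LinearRegression`.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [3]]  # petal width (cm), kept 2-D as scikit-learn expects
y = iris.data[:, 2]    # petal length (cm)

# Same split settings as in the post
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

regr = LinearRegression().fit(X_train, y_train)

slope = regr.coef_[0]        # change in length per 1 cm of extra width
intercept = regr.intercept_  # predicted length at width 0
print(f"petal length = {slope:.2f} * petal width + {intercept:.2f}")
```

A positive slope here means that wider petals are predicted to be longer, which matches what the scatter plot suggests.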

After training our model and predicting petal length from petal width, we can plot both the test data and the predicted results.

```python
plt.scatter(X_test, y_test, color='r', label='test data')
plt.scatter(X_test, y_pred, color='b', linewidth=2, label='predicted')
plt.legend()
plt.show()
```

Looking at the plot, the predicted points follow the test points closely and there are no clear outliers, which suggests that a linear relationship between petal width and length is a reasonable model for this data.

Using linear regression we can determine linear relationships in data; in this example, the relation between a flower petal's width and length.
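A visual check can be backed up with a number. As a hedged sketch, assuming the same split settings as above, the coefficient of determination (R²) and the mean squared error on the test set can be computed with `r2_score` and `mean_squared_error` from `sklearn.metrics`. An R² close to 1 indicates that the line explains most of the variation in petal length.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [3]]  # petal width (cm)
y = iris.data[:, 2]    # petal length (cm)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)

# R^2: share of the variance in y_test explained by the model (1.0 is perfect)
r2 = r2_score(y_test, y_pred)
# Mean squared error, in squared centimetres
mse = mean_squared_error(y_test, y_pred)

print(f"R^2 on test set: {r2:.3f}")
print(f"MSE on test set: {mse:.3f}")
```

Evaluating on the held-out test set rather than the training set gives a fairer picture of how the model would do on petals it has not seen.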

Get the code here!
