AzureMLStudio_Lab_Demo.md

Microsoft Azure Machine Learning Studio Experiment: Dataset: Automobile Price data

Problem statement

A Chinese automobile company, Geely Auto, aspires to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts. It has contracted an automobile consulting company to understand the factors on which the pricing of cars depends. Specifically, it wants to understand the factors affecting the pricing of cars in the American market, since those may be very different from the Chinese market. The company wants to know:

Which variables are significant in predicting the price of a car
How well those variables describe the price of a car

Based on various market surveys, the consulting firm has gathered a large dataset of different types of cars across the American market.

Business Goal

You are required to model the price of cars with the available independent variables. Management will use the model to understand exactly how prices vary with the independent variables. They can then adjust the design of the cars, the business strategy, etc. to meet certain price levels. Further, the model will be a good way for management to understand the pricing dynamics of a new market.

Aim: To predict the price of a car.

Features:

make
body-style
wheel-base
engine-size
horsepower
peak-rpm
highway-mpg
price

Label: price

Algorithm: Linear Regression
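
As a rough illustration of what a linear regression fits, the sketch below computes an ordinary least-squares line for a single feature in plain Python. The function name `fit_line` and the numbers are made up for illustration and are not taken from the automobile dataset; the Linear Regression module in Studio works with several features at once and has its own options.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Synthetic values chosen to lie exactly on y = 2x + 1 (not real car data).
engine_size = [1.0, 2.0, 3.0, 4.0]
price = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(engine_size, price)
print(slope, intercept)
```

Because the synthetic points lie exactly on a line, the fitted slope and intercept recover that line; with real data the fit would only approximate the prices.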

Step 1 – Go to https://studio.azureml.net/

We can either use our own dataset or one provided by Azure. Here we are going to use the dataset provided by Azure, but you can use your own.

Step 2 – Upload the dataset

Step 3 – Create a new experiment by clicking +NEW at the bottom of the Machine Learning Studio (classic) window. Select EXPERIMENT > Blank Experiment.

Just drag and drop the dataset onto the experiment canvas.

In this dataset, each row represents an automobile, and the variables associated with each automobile appear as columns. We'll predict the price in the far-right column (column 26, titled "price") using the variables for a specific automobile.

Select columns in the dataset
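
Outside Studio, the same column selection could be sketched in plain Python as follows. This is only an illustration: the helper `select_columns` and the example row are hypothetical, and the column names are the ones listed under Features above.

```python
# Column names listed under "Features" above (price is kept as the label).
SELECTED_COLUMNS = [
    "make", "body-style", "wheel-base", "engine-size",
    "horsepower", "peak-rpm", "highway-mpg", "price",
]


def select_columns(rows, columns=SELECTED_COLUMNS):
    """Return new row dicts containing only the requested columns."""
    return [{name: row[name] for name in columns} for row in rows]


# One hypothetical row with a few extra columns, for illustration only.
example_rows = [{
    "make": "example-make", "fuel-type": "gas", "body-style": "sedan",
    "wheel-base": "100.0", "engine-size": "120", "horsepower": "100",
    "peak-rpm": "5000", "highway-mpg": "30", "num-of-doors": "four",
    "price": "10000",
}]

selected = select_columns(example_rows)
print(list(selected[0].keys()))
```

Columns that are not in the list, such as `fuel-type` in the example row, are simply left out of the result.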

Prepare the data

A dataset usually requires some preprocessing before it can be analyzed. You might have noticed the missing values present in the columns of various rows. These missing values need to be cleaned so the model can analyze the data correctly. We'll remove any rows that have missing values. Also, the normalized-losses column has a large proportion of missing values, so we'll exclude that column from the model altogether.
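
The two cleaning steps described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the tiny CSV is hypothetical, and treating `?` or an empty string as a missing value is an assumption about how the file encodes gaps. In Studio itself this work is done with modules on the canvas rather than with code.

```python
import csv
import io

# Tiny hypothetical CSV; "?" marks a missing value (an assumption).
SAMPLE_CSV = """normalized-losses,make,horsepower,price
?,make-a,111,13495
164,make-b,102,13950
?,make-c,?,9000
"""

MISSING = {"", "?"}


def clean_rows(rows, drop_column="normalized-losses"):
    """Drop one column, then drop any row that still has a missing value."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k != drop_column}
        if all(v.strip() not in MISSING for v in kept.values()):
            cleaned.append(kept)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean_rows(rows)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

The order matters: excluding normalized-losses first means a row is not thrown away merely because that one column is empty, which keeps more data for training.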