Geely Auto, a Chinese automobile company, aspires to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts. It has contracted an automobile consulting company to understand the factors on which the pricing of cars depends. Specifically, it wants to understand the factors affecting the pricing of cars in the American market, since those may be very different from the Chinese market. The company wants to know:
- Which variables are significant in predicting the price of a car
- How well those variables describe the price of a car
Based on various market surveys, the consulting firm has gathered a large dataset of different types of cars across the American market.
You are required to model the price of cars with the available independent variables. It will be used by the management to understand how exactly the prices vary with the independent variables. They can accordingly manipulate the design of the cars, the business strategy etc. to meet certain price levels. Further, the model will be a good way for management to understand the pricing dynamics of a new market.
Features:
- make
- body-style
- wheel-base
- engine-size
- horsepower
- peak-rpm
- highway-mpg
- price
Labels: Price
Algorithm: Linear Regression
Step 1 – Go to https://studio.azureml.net/
We can either use our own dataset or one provided by Azure. Here we are going to use the dataset provided by Azure, but you can use your own.
Step 2 – Upload the dataset
Step 3 – Create a new experiment by clicking +NEW at the bottom of the Machine Learning Studio (classic) window. Select EXPERIMENT > Blank Experiment.
Then drag and drop the dataset onto the experiment canvas.
In this dataset, each row represents an automobile, and the variables associated with each automobile appear as columns. We'll predict the price in the far-right column (column 26, titled "price") using the variables for a specific automobile.
Add a Select Columns in Dataset module to control which columns are passed on to the next step.
Prepare the data
A dataset usually requires some preprocessing before it can be analyzed. You might have noticed the missing values present in the columns of various rows. These missing values need to be cleaned so the model can analyze the data correctly. We'll remove any rows that have missing values. Also, the normalized-losses column has a large proportion of missing values, so we'll exclude that column from the model altogether.
Add the Clean Missing Data module and set it to remove every row that contains a missing value.
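Studio does this cleaning for us through its modules, but the same idea can be sketched in plain Python. This is only a minimal sketch under a few assumptions: the helper name `clean_rows`, the use of "?" as a missing-value marker, and the three sample rows are all made up for illustration and are not taken from the real dataset.

```python
# Values treated as "missing". The "?" marker is an assumption about the raw file.
MISSING = {"", "?", None}

def clean_rows(rows, drop_columns=("normalized-losses",)):
    """Drop the given columns, then drop every row that still has a missing value."""
    cleaned = []
    for row in rows:
        # Exclude the unwanted columns first, so their gaps do not cost us rows.
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        if any(v in MISSING for v in kept.values()):
            continue  # remove the entire row
        cleaned.append(kept)
    return cleaned

# Made-up sample rows, for illustration only.
sample = [
    {"make": "audi", "normalized-losses": "?", "horsepower": "102", "price": "13950"},
    {"make": "bmw", "normalized-losses": "192", "horsepower": "?", "price": "16430"},
    {"make": "mazda", "normalized-losses": "?", "horsepower": "68", "price": "5195"},
]
cleaned = clean_rows(sample)
print(len(cleaned), "rows kept")
```

Dropping the normalized-losses column before removing rows matters: if we removed rows first, most of the dataset would disappear because of that one sparse column.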
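Add a second Select Columns in Dataset module and include only the feature columns listed above.
Add the Split Data module to divide the cleaned data into a training set and a testing set.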
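To see what the split amounts to, here is a minimal sketch in plain Python, assuming a randomized split. The helper `split_rows` is hypothetical, and the 0.75 training fraction and the seed are illustrative values rather than settings stated in this walkthrough.

```python
import random

def split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    # A fixed seed makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(20))  # stand-in for 20 cleaned automobile rows
train, test = split_rows(rows)
print(len(train), "training rows,", len(test), "testing rows")
```

The training rows are used to fit the model, and the testing rows are held back so the model can later be evaluated on data it has not seen.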
This is a regression problem, so we choose the Linear Regression model.
Now select the target (label) column, price, in the Train Model module.
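For intuition about what training does, the sketch below fits an ordinary least squares line with a single numeric feature. It is a simplification and not Studio's implementation: the experiment trains on all selected columns, while this hypothetical `fit_simple_linear_regression` helper uses one feature, and the (engine-size, price) pairs are made-up numbers chosen to lie on a straight line.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up pairs lying exactly on price = 1000 + 100 * engine_size.
engine_size = [90, 110, 130, 150]
price = [10000, 12000, 14000, 16000]
intercept, slope = fit_simple_linear_regression(engine_size, price)
print(f"price = {intercept:.1f} + {slope:.1f} * engine_size")
```

The slope is the kind of quantity management cares about here: it says how much the predicted price changes per unit change in a feature.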
Now add the Score Model module to generate predictions from the trained model.
Now evaluate the model on the testing dataset with the Evaluate Model module.
Now click the RUN button.
Check the evaluation results, i.e. the metric scores.
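To help read the metric scores, here is a minimal sketch of three common regression metrics: mean absolute error, root mean squared error, and the coefficient of determination (R²). The helper `regression_metrics` is hypothetical and the actual and predicted prices are made-up values, not results from this experiment.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, RMSE and R^2 for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Made-up values, for illustration only.
actual = [10000, 12000, 14000, 16000]
predicted = [10500, 11500, 14500, 15500]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower error values mean the predictions are closer to the real prices, while an R² closer to 1 means the selected variables describe more of the variation in price.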