Mobile Price Range Classification using AWS SageMaker

Guneet Kohli
5 min read · Feb 11, 2024

Learning machine learning alongside cloud computing can seem daunting. Amazon SageMaker, a cloud-based machine-learning platform, lowers that barrier by simplifying model creation, training, and deployment. It speeds up workflows, helps keep costs down, and scales with the workload.

The goal of this project was to build and deploy a Random Forest multi-class classifier on AWS SageMaker to predict mobile phone price ranges. The first step was to understand the dataset: a collection of mobile phone features along with their corresponding price ranges.

Each row represents a different mobile phone, and the columns contain its attributes and specifications. The label to predict is the phone's price_range.

  • battery_power: Battery capacity.
  • blue: Bluetooth availability (0 or 1).
  • clock_speed: Processor clock speed.
  • dual_sim: Dual SIM card support (0 or 1).
  • fc: Front camera megapixels.
  • four_g: 4G network support (0 or 1).
  • int_memory: Internal memory (GB).
  • m_dep: Mobile depth (thickness).
  • mobile_wt: Mobile phone weight.
  • n_cores: Number of processor cores.
  • pc: Primary camera megapixels.
  • px_height: Pixel resolution height.
  • px_width: Pixel resolution width.
  • ram: RAM capacity.
  • sc_h: Screen height.
  • sc_w: Screen width.
  • talk_time: Talk time (hours).
  • three_g: 3G network support (0 or 1).
  • touch_screen: Touch screen availability (0 or 1).
  • wifi: Wi-Fi availability (0 or 1).
  • price_range: Target variable representing mobile phone price range.

In summary, the dataset includes features that characterize mobile phones, and the goal is to predict the price range from these features. The target variable (price_range) is categorical and takes the following values:

  • Low Price Range (Label 0)
  • Medium Price Range (Label 1)
  • High Price Range (Label 2)
  • Very High Price Range (Label 3)

The dataset is therefore suited to a multi-class classification task in which the model predicts the price range category of a mobile phone.
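The label encoding listed above can be captured in a small lookup table. This is only a minimal sketch; the names `PRICE_RANGE_LABELS` and `describe_price_range` are illustrative and do not come from the original project.

```python
# Illustrative mapping from the integer target to the names listed above.
PRICE_RANGE_LABELS = {
    0: "Low Price Range",
    1: "Medium Price Range",
    2: "High Price Range",
    3: "Very High Price Range",
}


def describe_price_range(label):
    """Return the readable name for a price_range value (hypothetical helper)."""
    return PRICE_RANGE_LABELS[int(label)]


print(describe_price_range(3))
```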

Dataset

Tools Used: VS Code, Anaconda, AWS SageMaker, AWS S3, AWS IAM User, AWS IAM Role.

The project can be divided into three parts: Setup, Training and Deployment.

SETUP

Step 1: Installed the AWS CLI so that AWS could be accessed from the VS Code terminal instead of the Management Console. Make sure the IAM user has Administrator access. Download the access keys, keep them somewhere secure, and do not share them with anyone.

Communicating with AWS CLI from Terminal

Step 2: Created an IAM user with Administrator access so that the local machine can interact with AWS without permission errors.

Step 3: Created a new environment in VS Code, listing all the requirements in a text file. The packages included boto3, sagemaker, scikit-learn, pandas, numpy and ipykernel.
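A quick way to confirm that the environment contains the packages listed above is to check whether each one can be located. The sketch below is an assumption about how one might do this, not part of the original project; `missing_modules` is a hypothetical helper. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["boto3", "sagemaker", "sklearn", "pandas", "numpy", "ipykernel"]


def missing_modules(module_names):
    """Return the top-level modules that cannot be found in the active environment."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```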

Step 4: Set up an S3 bucket to store the train and test files in the cloud.

TRAINING PHASE

The training phase consisted of data ingestion, feature engineering, writing a training script, creating an IAM role, and then running the actual training job.

Data ingestion

Uploaded the train and test files to the S3 bucket.
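The upload step could look roughly like the sketch below. It assumes a boto3 S3 client is passed in; the helper name `upload_splits`, the bucket name, the prefix and the file names are placeholders rather than the project's actual values.

```python
import os


def upload_splits(s3_client, bucket, prefix, local_paths):
    """Upload each local file to s3://<bucket>/<prefix>/<file name> and return the URIs."""
    uris = []
    for path in local_paths:
        key = f"{prefix}/{os.path.basename(path)}"
        # boto3's S3 client exposes upload_file(Filename, Bucket, Key).
        s3_client.upload_file(path, bucket, key)
        uris.append(f"s3://{bucket}/{key}")
    return uris
```

With real credentials this might be called as `upload_splits(boto3.client("s3"), "my-example-bucket", "mobile-price/data", ["train.csv", "test.csv"])`. The returned URIs are what the training job later receives as its input channels.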

Script.py

Wrote a script that used the Random Forest Classifier from sklearn.

The %%writefile script.py cell magic was used to create the script from within a notebook.
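The author's script is not reproduced in the article, so the following is only a sketch of what such an entry point commonly looks like for the SageMaker scikit-learn container. It assumes the hyperparameters arrive as command-line arguments, that the container sets `SM_MODEL_DIR`, `SM_CHANNEL_TRAIN` and `SM_CHANNEL_TEST`, and that the data files are called `train.csv` and `test.csv` (placeholder names). The function names `parse_args`, `train` and `main` are illustrative; `model_fn` is the hook the container calls to load the model for inference.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and the paths that SageMaker provides."""
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as command-line arguments.
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--random_state", type=int, default=0)
    # SageMaker sets these environment variables inside the training container.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "data"))
    # Placeholder file names, not the project's actual ones.
    parser.add_argument("--train-file", default="train.csv")
    parser.add_argument("--test-file", default="test.csv")
    return parser.parse_args(argv)


def model_fn(model_dir):
    """Load the trained model; called by the container at inference time."""
    import joblib

    return joblib.load(os.path.join(model_dir, "model.joblib"))


def train(args):
    """Fit a Random Forest on the train split and report accuracy on the test split."""
    # Imported here so argument parsing can be tried without the ML stack installed.
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    test_df = pd.read_csv(os.path.join(args.test, args.test_file))
    label = "price_range"
    features = [col for col in train_df.columns if col != label]

    model = RandomForestClassifier(
        n_estimators=args.n_estimators, random_state=args.random_state
    )
    model.fit(train_df[features], train_df[label])
    accuracy = accuracy_score(test_df[label], model.predict(test_df[features]))
    print(f"Test accuracy: {accuracy:.4f}")

    os.makedirs(args.model_dir, exist_ok=True)
    joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))


def main(argv=None):
    train(parse_args(argv))


# In script.py, finish with:
# if __name__ == "__main__":
#     main()
```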

Creating an IAM Role

An IAM role (not a user) was created and its ARN was used in the code. Make sure to attach a SageMaker policy to the role to prevent permission errors later on.
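Since a user ARN does not work where a role ARN is expected, a small check can catch the mix-up early. This is a sketch under assumptions: `is_role_arn` and `get_role_arn` are hypothetical helpers, and the IAM client is assumed to be a boto3 client created with `boto3.client("iam")`.

```python
def is_role_arn(arn):
    """Return True when the ARN names an IAM role rather than, say, an IAM user."""
    # IAM ARNs look like arn:<partition>:iam::<account-id>:role/<role-name>
    parts = arn.split(":", 5)
    return len(parts) == 6 and parts[2] == "iam" and parts[5].startswith("role/")


def get_role_arn(iam_client, role_name):
    """Look up a role's ARN with a boto3 IAM client and validate it."""
    arn = iam_client.get_role(RoleName=role_name)["Role"]["Arn"]
    if not is_role_arn(arn):
        raise ValueError(f"Not an IAM role ARN: {arn}")
    return arn
```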

IAM Role

Sagemaker using the Script.py file

The script.py file serves as the entry point for our sklearn model. This is where the ARN of the role comes into play.

# Import the SageMaker SDK's scikit-learn estimator
from sagemaker.sklearn.estimator import SKLearn

FRAMEWORK_VERSION = "0.23-1"

sklearn_estimator = SKLearn(
    entry_point="script.py",
    # ARN of the SageMaker IAM role (the ARN of an IAM user does not work)
    role="arn:aws:iam::905418303768:role/sagemaker-role",
    # number and type of instances used for the training job
    instance_count=1,
    instance_type="ml.m5.large",
    # scikit-learn container version, declared above
    framework_version=FRAMEWORK_VERSION,
    # prefix for the training job name
    base_job_name="RF-custom-sklearn",
    # hyperparameters passed to script.py for the Random Forest classifier
    hyperparameters={
        "n_estimators": 100,
        "random_state": 0,
    },
    # use managed spot instances; max_wait and max_run are in seconds,
    # and max_wait must be at least max_run
    use_spot_instances=True,
    max_wait=7200,
    max_run=3600,
)

Calling .fit on the estimator launches the training job; the model is deployed only after training has completed.
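A minimal sketch of that call is shown below. It assumes the input channels are named `train` and `test` (which the container exposes as `SM_CHANNEL_TRAIN` and `SM_CHANNEL_TEST`) and that both S3 URIs point at the uploaded data; `start_training` is a hypothetical wrapper, not the author's code.

```python
def start_training(estimator, train_uri, test_uri):
    """Launch the training job, block until it finishes, and return the job name."""
    # wait=True keeps the call from returning until the job has ended.
    estimator.fit({"train": train_uri, "test": test_uri}, wait=True)
    return estimator.latest_training_job.name
```

With the estimator defined above, this might be called as `start_training(sklearn_estimator, trainpath, testpath)`, where `trainpath` and `testpath` are placeholder names for the S3 URIs of the two files.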

Training job status

The model's accuracy on the test data is 88.33%.
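For reference, accuracy is simply the fraction of test rows whose predicted label matches the true label. The sketch below uses a made-up toy set of labels purely for illustration; it does not reproduce the project's test data or its 88.33% figure.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Toy labels (illustrative only): 7 of 8 predictions are correct.
toy_true = [0, 1, 2, 3, 3, 1, 0, 2]
toy_pred = [0, 1, 2, 3, 2, 1, 0, 2]
print(f"{accuracy(toy_true, toy_pred):.2%}")
```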

Accuracy

DEPLOYMENT PHASE

A separate location is specified so that a copy of the trained model is created. This is an availability measure: the copy, rather than the original training output, is what gets deployed.
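One common way to do this, which may or may not match the author's code, is to look up the S3 location of the trained artifact and wrap it in an `SKLearnModel`. In the sketch below, `artifact_uri` and `build_model` are hypothetical helpers; the dictionary shape is that of the response returned by the SageMaker `describe_training_job` call.

```python
def artifact_uri(training_job_description):
    """Extract the model.tar.gz location from a describe_training_job response."""
    return training_job_description["ModelArtifacts"]["S3ModelArtifacts"]


def build_model(model_data, role_arn, framework_version="0.23-1"):
    """Create a deployable model object from a trained artifact in S3."""
    # Imported here so the helper above can be used without the SageMaker SDK.
    from sagemaker.sklearn.model import SKLearnModel

    return SKLearnModel(
        model_data=model_data,
        role=role_arn,
        entry_point="script.py",
        framework_version=framework_version,
    )
```

The object returned by `build_model` is what `.deploy()` would then be called on.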

End-point Deployment

This is done by calling model.deploy():

from time import gmtime, strftime

# timestamped endpoint name, so repeated deployments do not collide
endpoint_name = "Custom-sklearn-model-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName={}".format(endpoint_name))

predictor = model.deploy(
    initial_instance_count=1,
    # instance type that hosts the endpoint
    instance_type="ml.m4.xlarge",
    endpoint_name=endpoint_name,
)

Testing the deployment

A sample was used to test the deployment. Because the model takes 20 features, each input is a 20-dimensional vector. For the sample below, the model classified the phone as Very High Price Range.

| Feature         | Value  |
|-----------------|--------|
| Battery Power   | 1454   |
| Bluetooth       | 1.0    |
| Clock Speed     | 0.5    |
| Dual SIM        | 1.0    |
| Front Camera    | 1.0    |
| 4G Support      | 0.0    |
| Internal Memory | 34.0   |
| Depth           | 0.7    |
| Weight          | 83.0   |
| Cores           | 4.0    |
| Primary Camera  | 3.0    |
| Pixel Height    | 250.0  |
| Pixel Width     | 1033.0 |
| RAM             | 3419.0 |
| Screen Height   | 7.0    |
| Screen Width    | 5.0    |
| Talk Time       | 5.0    |
| 3G Support      | 1.0    |
| Touch Screen    | 1.0    |
| WiFi            | 0.0    |

---------------------------------------------------------------------
| Price Range | * MODEL DETERMINED AS 3 => VERY HIGH PRICE RANGE* |
---------------------------------------------------------------------
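The request behind this result can be sketched as follows. The sample values are the ones from the table above; the sketch assumes the model was trained with the columns in the order given in the feature list, and `classify` is a hypothetical helper that expects the `predictor` object returned by `model.deploy()`.

```python
FEATURE_ORDER = [
    "battery_power", "blue", "clock_speed", "dual_sim", "fc", "four_g",
    "int_memory", "m_dep", "mobile_wt", "n_cores", "pc", "px_height",
    "px_width", "ram", "sc_h", "sc_w", "talk_time", "three_g",
    "touch_screen", "wifi",
]

# Sample phone from the table above.
SAMPLE = {
    "battery_power": 1454, "blue": 1.0, "clock_speed": 0.5, "dual_sim": 1.0,
    "fc": 1.0, "four_g": 0.0, "int_memory": 34.0, "m_dep": 0.7,
    "mobile_wt": 83.0, "n_cores": 4.0, "pc": 3.0, "px_height": 250.0,
    "px_width": 1033.0, "ram": 3419.0, "sc_h": 7.0, "sc_w": 5.0,
    "talk_time": 5.0, "three_g": 1.0, "touch_screen": 1.0, "wifi": 0.0,
}

PRICE_RANGE_NAMES = {0: "Low", 1: "Medium", 2: "High", 3: "Very High"}


def classify(predictor, sample):
    """Send one phone to the endpoint and return the readable price range."""
    row = [sample[name] for name in FEATURE_ORDER]
    # predict() takes a batch of rows and returns one label per row.
    prediction = predictor.predict([row])
    return PRICE_RANGE_NAMES[int(prediction[0])]
```

Calling `classify(predictor, SAMPLE)` against the live endpoint would return the readable label for the sample. Once testing is done, `predictor.delete_endpoint()` removes the endpoint so that it stops incurring charges.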


Links to code and files: GitHub
