Simple E-Commerce Dataset Exploration Using Python

Muhamad Ilham
5 min readSep 18, 2023

--

Hi, this is my first post on medium. In this post, I would like to share my experience while exploring the raw E-commerce dataset. This project is one of my learning projects to improve my analytical skills. In this project, I used python programming as my tool with pandas, matplotlib and seaborn as my libraries.

photo source

E-Commerce Dataset Details

This dataset contains eight columns namely InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country. For more details about the dataset here.

Dataset Descriptions

Data Preprocessing

After obtaining the dataset, the next step is data preprocessing which aims to prepare the data before the exploration process. This stage consists of several things, such as removing missing values, changing data types, removing duplicate values and adding columns.

In the raw E-Commerce-dataset.csv there are a total of 136,534 rows missing values.

Missing values in E-Commerce-dataset.csv

Based on the table above, we can see that there are missing values in the Description and CustomerID columns, and there is a problem in the UnitPrice column which has a value of 0 which does not make sense in the data. In addition, the Quantity column has a negative value which should not be true. The next step is to drop the row that has a missing value and the UnitPrice column that has a value of 0. After the data has no missing value, the next step is to filter the data with the Quantity column more than 0, so that the negative value in the Quantity column is gone.

There were a total of 10,001 duplicates, which were subsequently removed.

E-Commerce-dataset after removing missing values and duplicate data

After doing data cleaning the total rows in the dataset changed from 541,909 to 392,732. The next step is to change the data type of several columns such as the InvoiceDate column from object to datetime, InvoiceNo from object to integer and CustomerID from object to integer. Next is to add the Revenue column to facilitate the data exploration process, to add the Revenue column we need to multiply the Quantity column with the UnitPrice column.

E-Commerce-dataset.csv after preprocessing.

After preprocessing the data, the data now contains 9 columns, namely InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country, and Revenue. The total rows of the dataset now total 392,732.

Data Exploration

In this dataset, customers come from 37 different countries. The country with the highest revenue was the United Kingdom(UK) with a total revenue of £7,285,000. Total orders from the UK accounted for 81.97% of all orders from 37 countries. The Netherlands and EIRE as the second and the third countries with the highest revenue with total £285,446.340 and £265,262.46.

5 Countries with the highest revenue

By revenue, the UK as the country with the highest revenue also has the most customers with a total of 3,921 which is 90% of the total number of customers from 37 countries. It makes sense that the UK has the highest revenue by customers.

5 Countries with the most Customers

However, based on average revenue, the Netherlands has the highest average revenue with £120.8 per customer. Followed by Australia and Japan as the second and third countries with the highest average revenue with £116.94 and £116.56 per customer.

Average Revenue Every Country

Based on all the data above, it can be said that the UK is the country with the highest total revenue. This can happen because the total customers from that country make up 90% of all customers in the world, because of that the UK is also the country with the highest total transactions with a percentage of 81.97% of all countries. However, the highest average revenue comes from the Netherlands with a total of £120.8 per customer, as the Netherlands is a country with low total customers compared to other countries but the Netherlands is included in the top 7 countries with high total transactions with 2,363 times and the second highest total revenue after the UK. From the data, we can say that customers from the Netherlands have the highest purchasing power out of 37 countries.

7 Countries with the highest total transactions

Based on the month, it can be said that the total revenue fluctuates every month, with the lowest revenue in February 2011 amounting to £446,084.92, while the highest revenue occurred in November 2011 with a total revenue of £1,156,205.61.

Total Revenue by Year-Month

Based on the time of day, we can see what time the highest total order occurs. The highest total order occurred at 12 noon with a total of 3,130 invoices.

Total Order by Hours

The product with the highest total purchase was Paper Craft, Little Birdie with a total purchase of 80,995 items. In addition, Medium Cermaic Top Storage Jar and World War 2 Gliders ASSTD Designs were the second and third most purchased products with 77,916 items and 54,319 items respectively.

Top 10 Products with highest purchase

Meanwhile, the product with the highest total order value was Paper Craft, Little Birdie with a total of £16,8 M. In addition, Regency Cakes and 3 Tier and White Hanging Heart T-Light Holder came in second and third as the products with the highest order value with a total of £14,2 M and £10,03 M.

Top 10 Products with highest total order value

Conclusion

  1. The country with the highest revenue is the UK, also with the most customers among 37 countries.
  2. The customers with the highest purchasing power are from the Netherlands.
  3. The highest total revenue was in November 2011.
  4. The highest total order at 12pm.
  5. The product with the highest total purchase and the highest total revenue is Paper Craft, Little Birdie.

Source

  1. https://www.kaggle.com/datasets/aliessamali/ecommerce
  2. https://www.kaggle.com/code/aliessamali/recommendation-systems-e-commerce#Recommendations

I will be very happy to discuss and accept any suggestions about the project since I’m still learning and still have a long way to go, please reach me through https://www.linkedin.com/in/muha-ilham/. Thank you!!

--

--