Comparison of Major Supermarkets in Australia — Part 1 (Overview)

Periculosa Libertas
6 min readAug 26, 2021

Since the outbreak of COVID-19, there has been an undeniable shift in consumer behaviour across the world, especially in Australia — where lockdowns has been the go-to measure to “stop the spread”. These lockdowns, coupled with governmental health advice on minimizing social contact, has acted as a strong catalyst for consumers to adopt online shopping — especially when shopping for groceries.

The groceries industry in Australia is dominated by two main incumbents, Woolworths and Coles but their dominance has been challenged in recent times with the entry of Aldi and Costco. As online groceries shopping becomes more widely adopted, pricing will become an even more important factor for consumers when deciding which store to buy their weekly groceries from. This is due to the fact that location (i.e. convenience) would no longer be a factor to influence shoppers.

So, which one of these major incumbents have the most competitive pricing? What are the similarities between these supermarkets vis-à-vis product category segmentation and pricing strategy? What are some other inferences we can draw from these data explorations? These are some of the questions I intend to explore.

According to IBIS World’s May 2020 Industry Report (G4111) — Supermarkets and Grocery Stores in Australia:

“Supermarkets primarily compete on the basis of price. Consumers are price conscious, wanting to be assured they are purchasing value-for-money goods. Heavy discounting by the industry’s major players, in an act to drive consumers instore and maintain market shares, highlights the importance of the price in the industry.”

According to the same report, the market share of the major players are:

  • Woolworths Group — 37.6%
  • Coles Group — 29.1%
  • Aldi — 9.9%
  • Metcash (which owns IGA) — 6.6%
Source: IBIS World Industry Report G4111 — Supermarkets and Grocery Stores in Australia (May 2020)

The online groceries industry has the following market share breakdown:

  • Woolworths Group — 40.9%
  • Coles Group — 29.8%
  • Grocery Delivery E-Services Australia — 11%
  • Other — 18.3%
Source: IBIS World Industry Report OD5527 — Online Grocery Sales in Australia (April 2020)

My intention is to explore the differences in their product category segmentation (i.e. how many products are stocked in each category, what percentage of the total products do these categories account for, what percentage of products are on special etc.). The next thing I want to do is explore differences at the individual products level and conduct a very granular comparison between Woolworths, Coles and IGA. There are many industry reports out there comparing the industry players at a macro level, however, there seems to be no publicly available comparisons of the pricing differences of identical products (i.e. same brand, same product, same packaging size) between the major supermarkets in Australia. I am aware of apps like Frugl and websites such as Price Hipster which attempt to provide consumers with some sort of price intelligence to help them find the cheapest products. However, those services do not provide any analytical insights into the major players’ pricing strategy and if there are any product category segmentation differences. From all the reviews I’ve read on Frugl, it seems they have data integrity issues with their pricing comparisons.

Since, Aldi is still currently still in the process of building their ecommerce business in Australia (https://shop.aldi.com.au), it would be inappropriate to include them in this comparison. Furthermore, the fact that they specialise in private label products and other foreign brands which are not found in Coles, IGA or Woolworths would make it unsuitable to include them in this comparison — it would be like comparing apples to oranges.

All analyses performed use Python in Jupyter Notebook.

Data from Woolworths, Coles and IGA were scraped from their online store websites on 23 April, 2021. Since each supermarket uses their own labelling convention, the datasets required some manipulation and cleaning to ensure product naming conventions and packaging units were consistent across all three.

The following modifications were done to the respective supermarket datasets:

Datasets were read into pandas DataFrames and NaN values replaced with an empty string.

# Woolworths
df_wow = pd.read_csv('Data/2021-04-23 WOW Data.csv', encoding='utf-8-sig').fillna('')
# Coles
df_coles = pd.read_csv('Data/2021-04-23 Coles Data.csv', encoding='utf-8-sig').fillna('')
# IGA
df_iga = pd.read_csv('Data/2021-04-23 IGA Data.csv', encoding='utf-8-sig')

Woolworths

  • Removed commas within the Category names, which makes ‘exploding’ these items easier (since some have multiple categories and I wish to have the product listed multiple times while only having one of the categories listed in that column.
df_wow['Department'] = df_wow['Department'].apply(
lambda x: x.replace('Meat, Seafood & Deli', 'Meat Seafood & Deli').replace('Dairy, Eggs & Fridge', 'Dairy Eggs & Fridge').replace("'", "").strip("[]")
)
  • Filter out products that have ‘Tobacco Product’ listed in their Department column and products which have doesn’t have any value in their price column.
filter = (df_wow['Department'] != 'Tobacco Product') & (df_wow['Price'] != '')df_wow = df_wow.loc[filter]
  • Products which are in multiple categories are “exploded” out. The below pictures demonstrates what is meant by “explode”:
df_wow = df_wow.assign(Department=df_wow.Department.str.split(', ')).explode('Department')
Before — a list of 2 items are in the Department column
After — these multiple departments of the same product has been “exploded” out
  • Products in the ‘Liquor’ category are removed. These are all BWS products being sold on the Woolworths platform since they were both owned by the same parent company (BWS is no owned by Endevour — a Woolworths Group spin off). Products with no entry in the Category column are now labelled ‘NOT LISTED’.
df_wow = df_wow.loc[df_wow['Department'] != 'Liquor']df_wow['Department'] = df_wow['Department'].apply(lambda x: 'NOT LISTED' if x == '' else x)
  • A new column (‘Brand_Product_Size’) is created by concatenating the Product Name and Package Size columns. This convention of having the brand name followed by the product name and the packaging size in one string will be the target column used when matching identical products across all the supermarkets. Columns are re-arranged and relevant columns renamed to reflect which store it belongs to.
df_wow['Brand_Product_Size'] = df_wow['Product Name'] + " " + df_wow['Package Size']df_wow = df_wow[
['SKU', 'Brand_Product_Size', 'Brand', 'Product Name',
'Price', 'Package Size', 'Price per unit', 'Specials',
'Department', 'Online Only', 'New Product', 'Product URL']].reset_index(drop=True)
df_wow = df_wow.rename(columns={
'Price': 'WOW Price',
'Package Size': 'WOW Size',
'Price per unit': 'WOW ppu',
'Specials': 'WOW Specials',
'Department': 'WOW Category'
})

The cleaned dataset is then outputted as a CSV file for the next stage.

df_wow.to_csv('1. Cleaned Data/Woolworths.csv', index=False, encoding='utf-8-sig')

Coles

After reading in the Coles dataset into a DataFrame, the following changes are applied to the dataset:

  • In the ‘Unit Size” column:
    ‘1 each’ was changed to ‘each’,
    ‘l’ was changed to ‘L’
  • In the ‘Category’ column, all ‘-’ were replaced with a space and the first letter of each word was capitalised.
  • Only products which are available and not in the liquor category are filtered to remain in the dataset.
  • The ‘Brand_Product_Size’ is created just like Woolworths.
  • Columns re-arranged, re-named to reflect store, and dataset outputted as CSV for next stage.
df_coles = pd.read_csv('Data/2021-04-23 Coles Data.csv', encoding='utf-8-sig').fillna('')df_coles['Unit Size'] = df_coles['Unit Size'].apply(lambda x: x.replace('1 each', 'each').lower()).apply(lambda x: x.replace('l', 'L'))df_coles['Brand_Product_Size'] = df_coles['Brand'] + ' ' + df_coles['Product Name'] + ' ' + df_coles['Unit Size']df_coles.Category = df_coles.Category.apply(lambda x: x.replace('--', ' ').replace('-', ' ').title())df_coles = df_coles.loc[(df_coles['Availability'] == 'Available') & (df_coles['Category'] != 'Liquor')]df_coles = df_coles[['SKU', 'Brand_Product_Size', 'Brand', 'Product Name', 'Current Price', 
'Price per unit', 'Unit Size', 'On Special', 'Category', 'URL']]
df_coles = df_coles.rename(columns={
'Current Price': 'COL Price',
'Price per unit': 'COL ppu',
'Unit Size': 'COL Size',
'On Special': 'COL Specials',
'Category': 'COL Category',
'URL': 'Product URL'
})
df_coles.to_csv('1. Cleaned Data/Coles.csv', encoding='utf-8-sig', index=False)

IGA

  • Create IGA DataFrame.
  • Using regex to replace ‘gm’ with ‘g’ at the end of all strings in the ‘Product Name’ column.
  • Changed SKU from float to a string and removed decimal points. Re-arrange columns, re-name and replaced ‘-’ with space in the ‘Category’ column. Dataset outputted as CSV for next stage.
df_iga = pd.read_csv('Data/2021-04-23 IGA Data.csv', encoding='utf-8-sig')import redf_iga['Product Name'] = df_iga['Product Name'].apply(lambda x: re.sub(r'gm$', 'g', x))df_iga.SKU = df_iga['SKU'].astype(str).apply(lambda x: x[:-2])df_iga = df_iga[['SKU', 'Product Name', 'Price', 'Price per unit', 'Main Category', 'Product URL']]df_iga = df_iga.rename(columns={
'Product Name': 'Brand_Product_Size',
'Price': 'IGA Price',
'Price per unit': 'IGA ppu',
'Main Category': 'IGA Category'
})
df_iga['IGA Category'] = df_iga['IGA Category'].apply(lambda x: x.replace('-', ' ').title())df_iga.to_csv('1. Cleaned Data/IGA.csv', encoding='utf-8-sig', index=False)

Continue to Part 2 →

--

--