Developed an intelligent car recommendation system.
The application uses the Hugging Face transformers library for natural language understanding and provides personalised car recommendations using zero-shot-classification.
Data Science & ML Ops Leader
Results-driven data scientist with an MSc in Data Science and 7 years of experience leading end-to-end ML projects—from exploratory analysis to deployment and monitoring—at scale (cinch, NBrown Group, Mealish, Arcadis). Proven ability to:
The model is built using Google's TensorFlow.
Initially developed and validated the model on Apple stock; extended the framework to predict daily closing price using technical data across a broader set of US equities.
Engineered a takeaway database consisting of meal data from 150+ restaurants using SQL and dbt. Classified over 350,000 takeaway options into dietary plans such as vegan, gluten-free, and halal using OpenAI’s GPT-3.5 model, Python, FastAPI, GCP Python SDKs (google-cloud-bigquery, google-cloud-storage), Docker, and GitHub Actions, and deployed the API on a GCP Linux VM using a microservices architecture. [Private repo, contact for access!]
Built an LGBM MAPIE model and a FastAPI endpoint that predicts average and maximum refurb costs for retailable vehicles.
Generated a data quality report for days-to-sell data for vehicles. The days-to-sell model predicts the number of days it will take to sell a car at a given selling price. Also performed feature selection and random search hyperparameter optimisation to achieve an RMSE of 0.7 days. [Private repo, contact for access!]
Developed and embedded a model monitoring framework (consisting of long-term model drift monitoring, early-warning monitoring, and alerting) for reporting model drift of four supervised machine learning CLTV models (demand, returns, orders, transaction) using the sklearn and seaborn packages in Python. [Private repo, contact for access!]
Performed big data analysis on tweets using PySpark, Scala, Hive, Impala and MySQL in VMware to monitor unusual activity during football matches.
Built an advanced R package for spatial analysis. It includes functions such as GetElevation(), which calculates the elevation for a given latitude and longitude; FindNearest(), which finds the nearest spatial feature; and ConvertCoordinates(), which converts easting and northing to longitude and latitude and vice versa.
This R package can be used to get historical weather conditions and hourly and daily forecasts for any location in the world. It also includes unit tests.
This repo is a simple walkthrough of how to apply test-driven development in Python.
This repo consists of a database engineered for a hypothetical online mail company, Insight Mail. It also contains the database schema.