Click-Through Rate Prediction
In this project we tackled the 2014 Kaggle Display Advertising Challenge, which asks participants to train a model on 10GB of web traffic data to predict click-through rate (CTR, the percentage of displayed ads that are clicked).
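To make the CTR definition concrete, the sketch below computes the CTR of a small labelled sample and scores a model that predicts that single rate for every impression. It assumes labels are 1 for a click and 0 otherwise, and it assumes log loss as the evaluation metric. The sample data and helper names (`click_through_rate`, `log_loss`) are hypothetical illustrations, not code from this project.

```python
import math


def click_through_rate(labels):
    """Fraction of impressions that were clicked (1 = click, 0 = no click)."""
    return sum(labels) / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Hypothetical toy sample: 1 click out of 4 impressions.
labels = [0, 1, 0, 0]
ctr = click_through_rate(labels)  # 0.25
# Constant predictor: every impression gets the overall CTR.
baseline = log_loss(labels, [ctr] * len(labels))
print(ctr, round(baseline, 4))
```

Among all constant predictions, the overall CTR is the one that minimizes log loss on the data it was measured on. This is why it makes a natural baseline to beat.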
Description
Inspired by the winning team’s approach, we explained the concept of field-aware factorization machines (FFM) and set out to build our own homegrown FFM model with PySpark’s RDD API. The primary goal is to beat a baseline model that simply predicts the overall click-through rate for every ad. A secondary goal is to match the performance of the competition-winning models.
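The core idea of an FFM can be shown with a minimal sketch of its prediction step. Each feature keeps a separate latent vector for every field it might be paired with. For a pair of features j1 (in field f1) and j2 (in field f2), the model takes the dot product of j1's vector for field f2 and j2's vector for field f1. The pairwise terms are summed and passed through a logistic (sigmoid) function to give a click probability. The representation below is an assumption made for illustration: an example is a list of (field, feature, value) triples, and the weights are a plain dict keyed by (feature, field). Training (for example with stochastic gradient updates on log loss), feature hashing, and linear/bias terms are omitted. The field and feature names are hypothetical.

```python
import math


def ffm_score(example, weights):
    """Sum of field-aware pairwise interactions for one impression.

    example: list of (field, feature, value) triples.
    weights: dict mapping (feature, field) -> latent vector (list of floats).
    """
    score = 0.0
    for a in range(len(example)):
        f1, j1, x1 = example[a]
        for b in range(a + 1, len(example)):  # each unordered pair once
            f2, j2, x2 = example[b]
            v1 = weights[(j1, f2)]  # j1's vector for j2's field
            v2 = weights[(j2, f1)]  # j2's vector for j1's field
            score += sum(p * q for p, q in zip(v1, v2)) * x1 * x2
    return score


def ffm_predict(example, weights):
    """Click probability: sigmoid of the FFM interaction score."""
    return 1.0 / (1.0 + math.exp(-ffm_score(example, weights)))


# Hypothetical two-field impression with latent dimension k = 2.
example = [("publisher", "ESPN", 1.0), ("advertiser", "Nike", 1.0)]
weights = {
    ("ESPN", "advertiser"): [0.5, 0.1],
    ("Nike", "publisher"): [0.4, -0.2],
}
print(ffm_score(example, weights), ffm_predict(example, weights))
```

In a plain factorization machine, each feature would have a single vector shared across all pairs. The field-aware variant lets "ESPN" interact with advertisers differently than it does with, say, user attributes, at the cost of more parameters.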
Techniques
- factorization machine
- logistic regression
- massively parallel processing
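The massively parallel part of the project follows a map/reduce pattern. Each partition of the data is summarized independently, and the partial results are merged with an associative, commutative function. The sketch below imitates that pattern in plain Python by computing the overall CTR from labels that are assumed to be already split into partitions. The partition contents are hypothetical.

```python
from functools import reduce


def partition_stats(partition):
    """Map step: (clicks, impressions) for one partition of 0/1 labels."""
    return (sum(partition), len(partition))


def combine(a, b):
    """Reduce step: associative and commutative, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1])


# Hypothetical labels spread across three workers' partitions.
partitions = [[0, 1, 0], [0, 0], [1, 0, 0, 0]]
clicks, impressions = reduce(combine, map(partition_stats, partitions))
overall_ctr = clicks / impressions
print(clicks, impressions, overall_ctr)
```

With PySpark's RDD API, a similar computation could be written as `labels_rdd.map(lambda y: (y, 1)).reduce(combine)`, where `labels_rdd` is a hypothetical RDD of 0/1 labels. Spark applies `combine` within and across partitions, which is why it must be associative and commutative.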
Tools
- Apache Spark
- Parquet
- Docker
- GCP
- Pandas
- Matplotlib
More Information
More information can be found at the following links: