Click-Through Rate Prediction

In this project we attempted the 2014 Kaggle Display Advertising Challenge, which asks participants to train a model on 10 GB of web traffic data to predict click-through rate (CTR, the percentage of ads clicked).

Description

Inspired by the winning team’s approach, we explained the concept of the field-aware factorization machine (FFM) and sought to build our own FFM model with PySpark’s RDD API. The primary goal was to beat a baseline model that uses the overall click-through rate as its prediction. A secondary goal was to match the performance of the competition-winning models.
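
As a rough illustration of the baseline idea, the sketch below predicts the same overall click-through rate for every impression and scores that prediction. It is plain Python rather than the project's PySpark code. The choice of log loss as the metric, the function names and the toy labels are all assumptions made for this example.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def baseline_ctr(labels):
    """Overall click-through rate: fraction of impressions that were clicked."""
    return sum(labels) / len(labels)

# Toy labels, made up for illustration: 1 = clicked, 0 = not clicked.
train_labels = [0, 0, 1, 0, 0, 0, 1, 0]
test_labels = [0, 1, 0, 0]

# The baseline ignores all features and always predicts the training CTR.
ctr = baseline_ctr(train_labels)
baseline_probs = [ctr] * len(test_labels)
baseline_loss = log_loss(test_labels, baseline_probs)
print(f"baseline CTR = {ctr:.2f}, log loss = {baseline_loss:.4f}")
```

Any trained model would need a lower loss than this constant prediction on the same held-out data to count as beating the baseline.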

Techniques

  • factorization machine
  • logistic regression
  • massively parallel processing
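
To make the first two techniques concrete, here is a minimal single-machine sketch of how an FFM could score one example. Each feature keeps a separate latent vector for every field it can interact with, and the pairwise score is passed through the logistic function to give a click probability. The data layout, the names `ffm_score` and `predict_ctr`, and the inclusion of bias and linear terms (which some FFM formulations omit) are assumptions for illustration, not the project's actual implementation.

```python
import math

def ffm_score(features, w0, w, v):
    """Raw FFM score for one example.

    features: list of (field, index, value) tuples for the nonzero features.
    w0:       global bias.
    w:        dict mapping feature index -> linear weight.
    v:        dict mapping (feature index, field) -> latent vector (list of floats).
    """
    score = w0
    for _, j, x in features:
        score += w.get(j, 0.0) * x
    # Pairwise interactions: feature j1 uses the latent vector it keeps for
    # the field of j2, and feature j2 uses the one it keeps for the field of j1.
    for a in range(len(features)):
        f1, j1, x1 = features[a]
        for b in range(a + 1, len(features)):
            f2, j2, x2 = features[b]
            dot = sum(p * q for p, q in zip(v[(j1, f2)], v[(j2, f1)]))
            score += dot * x1 * x2
    return score

def predict_ctr(features, w0, w, v):
    """Map the raw score to a probability with the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-ffm_score(features, w0, w, v)))

# Toy example with two one-hot features in two different fields (values made up).
example = [(0, 0, 1.0), (1, 1, 1.0)]
w0 = 0.5
w = {0: 0.5, 1: 0.5}
v = {(0, 1): [1.0, 2.0], (1, 0): [0.5, -1.0]}

p = predict_ctr(example, w0, w, v)
print(p)
```

In a distributed setting, the per-example scoring and gradient computation would be the part spread across workers, with the weights shared between them.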

Tools

  • Apache Spark
  • Parquet
  • Docker
  • GCP
  • Pandas
  • Matplotlib

More Information

More information can be found at the following links: