Neural Networks

For a class in Neural Networks, my team and I built an artificial neural network (ANN) from scratch and compared it against traditional ML methods for classifying derby winners.

Using data scraped from a weather API, Equibase’s race-history statistics PDFs, and Kaggle’s Big Data Derby Competition, we applied NLP pre-processing to prepare the text fields for modeling. Feature-importance analysis and hyperparameter tuning were used to gain insight into the data and to tune the models.
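A minimal sketch of what the feature-importance and tuning step could look like is below, assuming a pre-processed feature table with a binary `winner` column; the file name, column names, model choice (a scikit-learn random forest with grid search), and parameter grid are illustrative assumptions, not the team's exact setup.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical pre-processed feature table; "winner" is the binary target.
races = pd.read_csv("derby_features.csv")
X = races.drop(columns=["winner"])
y = races["winner"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Tune a baseline model with a small, illustrative parameter grid.
param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

# Rank features by the fitted model's importance scores to see where the signal is.
importances = pd.Series(
    search.best_estimator_.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances.head(10))
```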

Using the Keras functional API, we built an ANN that processes numerical, categorical, and text data to predict the target variable, with a separate input layer for each data type. Each categorical feature was embedded, with the embedding dimension set by the rule of thumb of taking the minimum of fifty and half the number of categories, and the resulting embeddings were concatenated. The text for each observation was tokenized, padded to a fixed length, fed through its own embedding layer, and processed with recurrent layers such as an LSTM. Numerical variables were passed through a dense layer. A concatenation layer combined the outputs of these branches, and additional dense layers added further non-linearity before the output.
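The sketch below shows one way this multi-input architecture could be assembled with the Keras functional API. The feature counts, category cardinalities, vocabulary size, sequence length, layer widths, and the binary output are assumptions for illustration; only the overall structure (per-feature categorical embeddings sized by min(50, n/2), an embedding-plus-LSTM text branch, a dense numerical branch, and a concatenation followed by dense layers) follows the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dimensions; the real model used the scraped Equibase, weather,
# and Kaggle features.
NUM_NUMERIC = 12                                   # e.g., odds, distance, temperature
CAT_CARDINALITIES = {"jockey": 300, "trainer": 250, "track": 40}
VOCAB_SIZE = 5000                                  # text vocabulary after tokenization
MAX_TEXT_LEN = 60                                  # padded sequence length

# Categorical branch: one embedding per feature, sized min(50, n_categories // 2).
cat_inputs, cat_embeddings = [], []
for name, n_categories in CAT_CARDINALITIES.items():
    inp = keras.Input(shape=(1,), name=f"{name}_input")
    dim = min(50, n_categories // 2)
    emb = layers.Embedding(input_dim=n_categories, output_dim=dim)(inp)
    cat_inputs.append(inp)
    cat_embeddings.append(layers.Flatten()(emb))
cat_branch = layers.concatenate(cat_embeddings)

# Text branch: tokenized, padded sequences -> embedding -> LSTM.
text_input = keras.Input(shape=(MAX_TEXT_LEN,), name="text_input")
text_emb = layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64, mask_zero=True)(text_input)
text_branch = layers.LSTM(32)(text_emb)

# Numerical branch: dense layer over the numeric features.
num_input = keras.Input(shape=(NUM_NUMERIC,), name="numeric_input")
num_branch = layers.Dense(32, activation="relu")(num_input)

# Combine all branches, then add dense layers for extra non-linearity.
combined = layers.concatenate([cat_branch, text_branch, num_branch])
x = layers.Dense(64, activation="relu")(combined)
x = layers.Dense(32, activation="relu")(x)
output = layers.Dense(1, activation="sigmoid", name="winner")(x)  # assumed binary target

model = keras.Model(inputs=cat_inputs + [text_input, num_input], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Keeping each data type in its own branch lets the embeddings and the LSTM learn representations suited to categorical and text inputs before everything is merged for the final prediction.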

Code