The Academy: A Machine Learning Framework
One of my last projects at my previous company was to implement a framework for developing and testing various predictive models. Thinking about the users, designing the simplest and most useful interface, adding splashes of color and personality – it was a really enjoyable project.
I’m including a modified version of the project’s README, to give you a flavor of how it was put together:
The Academy
A Trait Prediction Framework
Welcome to The Academy, a trait prediction framework. You’re minutes away from hours of fun developing experimental models for understanding and predicting people’s inner selves.
Before you start pounding those ivories, here’s a bit about how the system works:
Overview
First, all the goodness is located in the following places:
predict_traits.py is the command which makes the actual predictions for our production users. You update this file to change the model we use in production. We’ll come back to this soon.
The models/ directory is where our different models live. To create a model, create a new .py file in this directory. To make it easier to talk about different models, we’re going to name them according to a theme. That theme is philosophers. The first two models are Pirsig and Hofstadter.
Creating a model
You have a lot of freedom in creating your model, provided that you conform to the following interface:
- Your model must inherit from the BasePredictor class, which can be imported from prediction.models.
- On initialization, your model must accept a pandas.DataFrame object as the first argument. It can accept an arbitrary number of keyword arguments, which can serve as the parameters to your model. The parameters are model-dependent – you can make them anything you want, or have none at all.
- Your model must implement a predict() method, which will return a pandas.DataFrame with scores for all of the member’s traits.

pandas is a very powerful and popular library for doing data analysis. You can read more about it in its official documentation.
That’s it! As long as your model exposes the interface described, you can implement it in any way you like, from a support vector machine to randint().
There is a PreProcessor class which will help you prepare the data. Here’s how the model should work:
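A minimal sketch of that interface, under a few assumptions: the BasePredictor below is a local stand-in so the example runs on its own (in the project it would be imported from prediction.models), the Epictetus model and its fallback parameter are hypothetical, and the data layout (one row per member, one column per trait, NaN where a score is unknown) is invented for illustration. The PreProcessor class is left out because its interface isn't described here.

```python
import pandas as pd


class BasePredictor:
    """Local stand-in for prediction.models.BasePredictor (assumed shape)."""

    def __init__(self, data, **params):
        self.data = data
        self.params = params

    def predict(self):
        raise NotImplementedError


class Epictetus(BasePredictor):
    """Hypothetical baseline model: fill unknown scores with the trait mean."""

    def __init__(self, data, fallback=0.0):
        # The DataFrame comes first; everything else is a keyword parameter.
        super().__init__(data, fallback=fallback)
        self.fallback = fallback

    def predict(self):
        # Column means skip NaN by default. A trait with no scores at all
        # has a NaN mean, so it falls back to the `fallback` parameter.
        trait_means = self.data.mean().fillna(self.fallback)
        return self.data.fillna(trait_means)


nan = float("nan")
scores = pd.DataFrame(
    {"openness": [1.0, nan, 3.0], "grit": [nan, nan, nan]},
    index=["ann", "bob", "cy"],
)
predicted = Epictetus(scores, fallback=0.0).predict()
```

The only things the framework is described as relying on are the constructor taking a DataFrame first and predict() returning a DataFrame; what happens in between is up to the model.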
Testing a model
After developing a model, it is important that you test it. Only by measuring each model’s accuracy and tracking it over time will we be able to keep improving our predictions.
To test a model, use the PredictionTest class:
Your model will be tested against the data using a technique called “K-Fold Cross Validation”, and the results will be stored in the question_predictiontest table:
The test will store information about the model and parameters that were used in the test, as well as the accuracy and the runtime of your algorithm. Accuracy is calculated as the “Root Mean Square Error” of the predictions, so lower values are better.
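To make the two techniques concrete, here is a rough standard-library sketch of K-fold cross validation scored with root mean square error. It is not the PredictionTest implementation: the function names, the fit_predict callable, and the toy scores are all assumptions made for illustration.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(scores, fit_predict, k=5, seed=0):
    """Average RMSE over k folds.

    `fit_predict(train, n)` is a hypothetical callable: it sees only the
    training scores and returns n predictions for the held-out rows.
    """
    errors = []
    for held_out in k_fold_indices(len(scores), k, seed):
        held = set(held_out)
        train = [s for i, s in enumerate(scores) if i not in held]
        actual = [scores[i] for i in held_out]
        errors.append(rmse(actual, fit_predict(train, len(actual))))
    return sum(errors) / k


def mean_model(train, n):
    """Baseline: predict the training mean for every held-out row."""
    return [sum(train) / len(train)] * n


scores = [3.0, 4.0, 5.0, 2.0, 4.0, 3.0, 5.0, 1.0, 4.0, 3.0]
average_rmse = cross_validate(scores, mean_model, k=5)
```

Every row is held out exactly once, so the averaged error reflects how the model does on data it has not seen.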
You can test your model with different sets of parameters, to find the best values. You can also save some notes with the test, if you like:
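As an illustration of the idea only, the sketch below sweeps a small grid of parameter sets and keeps a note with each result. PredictionTest’s real signature isn’t shown here, so run_test, the weight and prior parameters, and the toy scores are hypothetical stand-ins.

```python
import math
from itertools import product

train = [3.0, 4.0, 5.0, 2.0, 4.0]
held_out = [3.0, 5.0, 4.0]


def run_test(weight, prior):
    """Hypothetical test run: blend the training mean with a prior guess
    and return the RMSE against the held-out scores."""
    guess = weight * (sum(train) / len(train)) + (1 - weight) * prior
    return math.sqrt(sum((s - guess) ** 2 for s in held_out) / len(held_out))


# Every combination of these values is tried once.
grid = {"weight": [0.0, 0.5, 1.0], "prior": [3.0, 4.5]}

results = []
for values in product(*grid.values()):
    params = dict(zip(grid, values))
    results.append({
        "params": params,
        "rmse": run_test(**params),
        "notes": "coarse sweep over weight and prior",
    })

# Lower RMSE is better, so the best parameter set has the smallest error.
best = min(results, key=lambda r: r["rmse"])
```

Storing the parameters and notes next to each score is what makes it possible to compare runs later.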
You have some options regarding the data your model runs on. By default, the model runs on the entire dataset, and uses the score for any question (agrees/enumeration). You can change this as follows: