From Idea to App in 5 Weeks

Larkist is my Lambda School capstone project. I built it in 5 weeks with an incredible team of 8 other students. You can learn all about our team & try the app yourself here.

What is Larkist?

Larkist is a web application for Twitter power users that allows them to:

  • ❌ Block toxic bullies from their timeline.
  • 💌 Use AI to create lists.
  • ✍️ Tweet with “undo send”.
  • 🎶 Make their timeline sing.

Twitter users often encounter one of two problems: a toxic timeline full of off-putting content, or a timeline feed that lacks relevance.

Larkist tackles timeline toxicity with an NLP-powered timeline “scanner” that shows users how toxic the interactions in their timelines are, and pairs it with an effortless blocking workflow.

For irrelevant timelines, we opted to build our own recommendation engine and make use of a neglected native Twitter feature: lists. Each list a user creates generates its own timeline that the user can view.

Creating lists via Twitter is cumbersome, but our recommendation engine makes it easier than ever to create interesting & insightful Twitter lists.

DS Technologies Used:

  • Pandas
  • NumPy
  • Tweepy + Twitter OAuth
  • NetworkX
  • scikit-learn
  • TensorFlow
  • Google Cloud Platform
  • PostgreSQL

My Contributions:

My contributions to this project fell within these five areas:

  • Data Science - Classification API (Toxic Comments)
  • Data Science - Recommendation API (List Recommender)
  • Data Engineering - Scalable API endpoint hosting & application hosting
  • Product - Larkist Landing Page
  • Leadership - Vision & Voice of the Customer

I’ll touch on the Data Science portion first, then the Data Engineering, then the support roles.

DS - Prediction API (NLP)

For our NLP model we ended up using a fine-tuned BERT model as a multi-class classifier. We used the Jigsaw Toxic Comments dataset from Kaggle/Jigsaw for training & fine-tuning, then validated the transfer learning accuracy by spot testing on a Twitter dataset we built ourselves.

By tracking which accounts users actually choose to block alongside our recommendations, we are able to monitor the health & effectiveness of this model “in the wild”.

Hosting a large NLP model like this can be quite a challenge. We used Kubernetes & Docker to host our Prediction API endpoint on Google Cloud and leveraged Google Cloud Functions for prepping & feeding queries. Together, this wide-feed structure and the Kubernetes backbone make the endpoint robust & scalable.
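One way to picture this kind of monitoring is to compare what the model recommended with what the user actually did. The sketch below is only an illustration under assumptions: the `ScanEvent` record, its field names, and the `block_agreement` helper are hypothetical and are not taken from the Larkist codebase. It computes the precision and recall of block recommendations against real block actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanEvent:
    """One scanned account (hypothetical record, not Larkist's real schema)."""
    account_id: str
    recommended_block: bool  # the model flagged this account as toxic
    user_blocked: bool       # the user actually blocked this account


def block_agreement(events):
    """Return precision and recall of block recommendations vs. user actions."""
    tp = sum(e.recommended_block and e.user_blocked for e in events)
    fp = sum(e.recommended_block and not e.user_blocked for e in events)
    fn = sum(not e.recommended_block and e.user_blocked for e in events)
    # Guard against empty denominators when there is no data yet.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


# Made-up events purely for illustration.
events = [
    ScanEvent("a", recommended_block=True, user_blocked=True),
    ScanEvent("b", recommended_block=True, user_blocked=False),
    ScanEvent("c", recommended_block=False, user_blocked=True),
    ScanEvent("d", recommended_block=False, user_blocked=False),
    ScanEvent("e", recommended_block=True, user_blocked=True),
]
metrics = block_agreement(events)
print(metrics)
```

A falling precision would suggest the model is flagging accounts users do not consider toxic, while a falling recall would suggest users are blocking accounts the model missed.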

DS - Recommendation API (Graph Network)

The recommendation API is built using NetworkX and a modified PageRank algorithm. One of the biggest challenges in working with the Twitter API is getting enough data to generate meaningful relationships. We packaged our Tweepy calls into multithreaded functions and ran them in Google Cloud Functions as well, which lets the API scale with demand and generate a graph network of thousands of interactions between the target users provided to seed the function execution. By evaluating these interactions and incorporating context from “Likes” and “Following”, we are able to rank the targets found in our Tweepy API calls and return a ranked list of recommended “List Members”.

As with the Prediction API, by tracking which of our recommended “List Members” users select & include, we can monitor the health & effectiveness of this model “in the wild”.
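To make the ranking step more concrete, here is a dependency-free sketch of personalized PageRank over a weighted interaction graph. It is not our production code and not the exact modification we made: the `personalized_pagerank` function, the account names, and the edge weights (a follow counted as 2.0, a like as 1.0) are all assumptions chosen for illustration. With NetworkX, a similar result could likely be obtained from `networkx.pagerank` using its `personalization` and `weight` arguments.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.85, iterations=50):
    """Rank accounts in a weighted directed graph, biased toward the seeds.

    edges: iterable of (source, target, weight) interactions.
    seeds: accounts supplied by the user; random-walk restarts land on these.
    """
    seeds = set(seeds)
    if not seeds:
        raise ValueError("at least one seed account is required")

    # Collect outgoing edge weights per account, summing repeated interactions.
    out_edges = defaultdict(dict)
    nodes = set(seeds)
    for src, dst, weight in edges:
        out_edges[src][dst] = out_edges[src].get(dst, 0.0) + weight
        nodes.update((src, dst))

    # Restart distribution: uniform over the seed accounts only.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for node in nodes:
            targets = out_edges.get(node)
            if targets:
                total = sum(targets.values())
                for dst, weight in targets.items():
                    nxt[dst] += alpha * rank[node] * weight / total
            else:
                # Dangling account: return its score to the seeds.
                for n in seeds:
                    nxt[n] += alpha * rank[node] * restart[n]
        rank = nxt
    return rank


# Hypothetical interactions: (who, whom, weight).
seed_accounts = {"seed_a", "seed_b"}
interactions = [
    ("seed_a", "writer_1", 2.0),
    ("seed_a", "writer_2", 1.0),
    ("seed_b", "writer_1", 2.0),
    ("seed_b", "writer_3", 1.0),
    ("writer_1", "seed_a", 1.0),
]
ranks = personalized_pagerank(interactions, seed_accounts)

# Recommend non-seed accounts, highest score first.
recommended = sorted(
    (n for n in ranks if n not in seed_accounts), key=ranks.get, reverse=True
)
print(recommended)
```

Restarting the walk only at the seed accounts keeps the scores anchored to the people the user cares about, rather than rewarding accounts that are simply popular across the whole graph.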

Data Engineering

Did you know that Twitter lists created on the east coast don’t update to the west coast servers for up to 6 days? We didn’t either when we first started this project. This and many other data engineering curveballs landed in my lap to solve during our project.

We initially deployed our backend on Heroku, but once we hit the list update issue I moved our entire deployment over to Google Cloud App Engine & Cloud SQL. This solved our list creation issue and improved the performance of our site.

Product & Leadership

I know from personal experience how important value-driven product features are. By interviewing anyone who would talk with me early on, I was able to gain additional empathy and test ideas generated by our team before we spent hours coding them as features. This additional layer of planning & leadership shaped our team, and helped us keep the “customer” in mind in every design decision.

Final Thoughts & Why Larkist?

Growing up as a small kid and teen, I was on the receiving end of bullying quite a bit, so the idea that I can use Data Science to help people eliminate that negativity through proactive filtering has always appealed to me.

I had been working on Toxic Comment detection NLP for some time as part of a prior project for HackerSalt, and I wanted to take it further by building a product people could use every day. The inspiration for Larkist started as an idea in a Tweet:

When I saw the tweet I knew this was something I wanted to help build. I reached out and advocated for the opportunity to build it. I couldn’t be happier with how it turned out.

If I can make the world a better place using Data Science, you better believe I’m going to try.

Check out our Demo Day Pitch below, or try the app out for yourself at Larkist.com

Demo Pitch