Taking a Machine Learning project to production involves multiple components — Data Engineering, DevOps, and Machine Learning. The intersection of these components is MLOps. MLOps (Machine Learning + DevOps) is the process of taking a machine learning project to production — with the goal of automating and improving the quality of production models, while also focusing on business and regulatory requirements.
All Knowledge is Connected. One of the things I realised early in my journey as a Data Analyst was that new concepts are learnt more easily when mapped to concepts you are already familiar with. So when I started learning about Machine Learning on Graphs, it was only natural to relate the concept of Word Embeddings to Node Embeddings using DeepWalk. I was also recently exploring the arXiv data, and what better way to learn something new than by implementing it.
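The analogy can be sketched in code: DeepWalk treats truncated random walks over a graph as "sentences", so the same skip-gram machinery that learns word embeddings can learn node embeddings. Below is a minimal sketch of the random-walk step on a toy graph (the graph and walk parameters are illustrative, not the arXiv data):

```python
import random

# Toy undirected graph as an adjacency list (illustrative data)
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def random_walk(graph, start, length, seed=None):
    """Generate one truncated random walk -- DeepWalk's 'sentence'."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

# A corpus of walks; in DeepWalk these walks are fed to Word2Vec
# (e.g. gensim's Word2Vec) exactly like tokenised sentences.
walks = [random_walk(graph, node, length=5, seed=i)
         for i, node in enumerate(graph)]
```

Each walk plays the role of a sentence and each node the role of a word, which is what makes the word-embedding intuition carry over.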
The only constant in this world is change, and to survive you need to keep growing and adapting to it. “CHANGE” is how I would describe my journey in Analytics so far. I started as an Analyst in 2017 and today I am a Lead Data Analyst; this journey has been fulfilling, full of challenges, and constant learning has been the mantra. …
Recommendation systems are key building blocks of companies like Amazon, Flipkart, Netflix, Facebook etc. The goal of a recommendation system is to predict how likely a user is to favour an item based on their prior interactions with the system. The two most common types of recommendation system are Content-Based and Collaborative Filtering.
A Content-Based recommender system focuses on the attributes of an item and recommends other items with similar attributes.
Collaborative Filtering focuses on users’ attitudes towards items: it uses the knowledge of the crowd to predict which items a user will like.
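To make the content-based idea concrete, one common approach is to describe each item by an attribute vector and recommend the item with the highest cosine similarity. A minimal sketch, where the items and their genre scores are entirely made up:

```python
import math

# Hypothetical items described by attribute vectors
# (illustrative genre scores: action, romance, sci-fi)
items = {
    "Movie A": [0.9, 0.1, 0.8],
    "Movie B": [0.8, 0.2, 0.9],
    "Movie C": [0.1, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(item, items):
    """Recommend the catalogue item closest to `item` in attribute space."""
    return max((c for c in items if c != item),
               key=lambda c: cosine(items[item], items[c]))
```

Collaborative filtering would instead build these vectors from user–item interactions (ratings, clicks) rather than item attributes, but the similarity machinery is much the same.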
In the world of Natural Language Processing (NLP), the most basic models are based on Bag of Words. But such models fail to capture the syntactic relations between words.
For example, suppose we build a sentiment analyser based only on Bag of Words. Such a model will not be able to capture the difference between “I like you”, where “like” is a verb with a positive sentiment, and “I am like you”, where “like” is a preposition with a neutral sentiment.
So this leaves us with a question — how do we improve on this Bag of Words technique?
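To see the limitation concretely, here is a minimal bag-of-words count for the two sentences above; the representation only records that “like” occurred, not whether it was a verb or a preposition:

```python
from collections import Counter

def bag_of_words(sentence):
    """Lower-case, split on whitespace, and count word occurrences."""
    return Counter(sentence.lower().split())

bow1 = bag_of_words("I like you")
bow2 = bag_of_words("I am like you")

# Both vectors carry the same count for "like", so any model built
# on top of them cannot distinguish the verb from the preposition.
```

Techniques that keep word order or syntactic context (n-grams, part-of-speech tagging, sequence models) are the usual answer to this question.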
Welcome to part two of the predicting taxi fare using machine learning series! This is a unique challenge, wouldn’t you say? We take cab rides on a regular basis (sometimes even daily!), and yet when we’re hitting that ‘Book now’ button, we rely on manual on-the-fly calculations rather than hardcore ML ones. And that’s what I aim to demonstrate here.
In the previous post, we looked at cleaning the data and exploring it to identify relationships between variables, and also to understand various features that will have an impact on the Taxi Fare.
When the Cambridge Analytica-Facebook scandal emerged, articles about the misuse of user data by technology companies were ubiquitous. The privacy issues raised made me want to understand the impact of the scandal on people's views of Facebook and how their perception of the company has changed since.
In this challenge we are given a training set of 55M taxi trips in New York since 2009 and a test set of 9,914 records. The goal is to predict the fare of a taxi trip given the pickup and drop-off locations, the pickup date and time, and the number of passengers travelling.
In any analytics project 80% of the time and effort is spent on data cleaning, exploratory analysis and deriving new features. …
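One feature that can be derived from the pickup and drop-off coordinates is the haversine (great-circle) distance of the trip. A sketch of the standard formula, using illustrative coordinates for a roughly midtown-Manhattan-to-JFK ride (the coordinates are examples, not rows from the dataset):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Illustrative trip: midtown Manhattan to JFK, roughly 21 km
dist = haversine_km(40.7614, -73.9776, 40.6413, -73.7781)
```

Applied row-wise to the 55M trips, this single derived column typically explains a large share of the fare variance, which is why feature derivation dominates the effort.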
In my previous post, I had written about how to scrape search results for a particular query string from Medium. In this post, we will go into the details of analysing the data scraped for the search term “Data Science” to group posts into different levels of popularity based on number of claps and responses, and also to understand what makes these posts popular.
The data scraped from Medium search results was a JSON file with extensive data about each search result. To explore the structure of the JSON file, I used Notepad++ with the JSON plugin. …
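The same structure exploration can be done in Python with the standard `json` module. A minimal sketch, assuming the scraped file is a list of result objects (the field names below are hypothetical, not Medium's actual schema):

```python
import json

# Hypothetical sample mimicking one scraped search result; in
# practice this would come from json.load(open("results.json")).
raw = '[{"title": "Intro to Data Science", "claps": 120, "responses": 4}]'
results = json.loads(raw)

# Listing the top-level keys of a record reveals the schema,
# which is what the Notepad++ JSON plugin shows as a tree view.
schema = sorted(results[0].keys())
```

Once the schema is known, fields like claps and responses can be pulled into a flat table for the popularity analysis.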
I wanted a way to look at what people are writing on Medium about Data Science and here’s how I did it.
Medium is a great tool for posting and discovering content on the latest topics, and being a data enthusiast, I wanted to understand what people are writing about Data Science and what kinds of articles are well-read. So I decided to build a crawler using Scrapy, a Python library.
To build any crawler, it is imperative to understand what requests are made to the server to fetch the data. To get this information, I used the “Network” tab…
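Once the request seen in the Network tab is understood, it can be reproduced in code. A sketch using only Python's standard library; the endpoint, parameters, and headers below are illustrative stand-ins for whatever the Network tab actually reveals, not Medium's real API:

```python
from urllib.parse import urlencode

# Hypothetical search endpoint and query parameters, modelled on
# the kind of request the browser's Network tab would show
base = "https://medium.com/search"
params = {"q": "Data Science", "page": 1}
url = f"{base}?{urlencode(params)}"

headers = {
    # Many sites return JSON only when the request looks like XHR
    "x-requested-with": "XMLHttpRequest",
    "accept": "application/json",
}
# In a Scrapy spider, this URL and these headers would go into
# scrapy.Request(url, headers=headers, callback=self.parse)
```

Replaying the request with the same URL and headers as the browser is usually the difference between getting the JSON payload and getting the HTML page.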