Latest post

Upload your local Spark script to an AWS EMR cluster using a simple Python script

Apache Spark is definitely one of the hottest topics in the data science community at the moment. When we visited PyData Amsterdam 2016 last month, we witnessed a great example of Spark's immense popularity: the talks about Spark drew the largest crowds. Sometimes we see that these popular topics slowly turn into buzzwords that »

A recommendation system for blogs: Content-based similarity (part 2)

In this second post in a series about a content recommendation system for The Marketing Technologist (TMT) website, we elaborate on the concept of content-based recommendation systems. In the first post we described the benefits of recommendation systems and roughly divided them into two types of recommenders: content-based and collaborative filtering. The first »

A recommendation system for blogs: Setting up the prerequisites (part 1)

The goal of data science is typically described as creating value from Big Data. However, data science should also meet a second goal: avoiding information overload. One type of project that meets both goals is the recommendation engine. Online stores such as Amazon, as well as streaming services such as Netflix, suffer from information overload. Customers »

Is there time for coffee? Your execution time is ticking in Python!

Last month I was working on a machine learning project. If you use grid search to find the optimal parameters, it is nice to know how long an iterative process takes, so you do not waste your time. In this blog you'll learn how to: Install the progress bar library in Windows The disadvantage of the »

Slashception with regexp_extract in Hive

As a data scientist I frequently need to work with regular expressions. Though the capabilities and power of regular expressions are enormous, I cannot seem to like them much. When they do not work as expected, they can be a really time-consuming nightmare. In this blog post I will describe the hours I lost last week »

The GAM approach to spend your money more efficiently!

In an earlier blog post we described how Blue Mango Interactive optimizes the media spend of clients using S-curves. S-curves are used to find the S-shaped relationship between a particular media driver and a KPI such as sales. Moreover, once an S-curve is obtained, we can determine the optimal point that prevents under- or overspending. Hence, we spend our money more »