Who is going to win the 2018 Football World Cup?

We aim to predict the winner of the FIFA World Cup based solely on data. The method applied is not fancy at all, but it should do the trick to get some neat results (spoiler alert: Germany wins!). We use three datasets obtained from Kaggle, which contain the outcome of specific pairings between teams, their rank and points, and the weighted point difference with the opponent. Then, we create a model to predict the outcome of each match during the FIFA World Cup 2018. To make the results more appealing, we translate the outcome probabilities into fair odds.
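As a minimal sketch of that last step, assuming "fair odds" means decimal odds with no bookmaker margin, the odds for an outcome are simply the reciprocal of its probability. The function name `fair_odds` and the example probabilities are hypothetical and not taken from the post's model.

```python
def fair_odds(probabilities):
    """Convert outcome probabilities into fair decimal odds (no margin).

    `probabilities` maps an outcome label to its predicted probability.
    The values are renormalised first so that they sum to 1.
    """
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive number")
    odds = {}
    for outcome, p in probabilities.items():
        # An impossible outcome has no finite fair price.
        odds[outcome] = float("inf") if p == 0 else 1.0 / (p / total)
    return odds


# Hypothetical probabilities for a single match (illustration only).
example = {"win": 0.5, "draw": 0.25, "loss": 0.25}
print(fair_odds(example))  # {'win': 2.0, 'draw': 4.0, 'loss': 4.0}
```

With fair odds, a bet on any outcome has an expected return of zero, which is why they are a convenient way to present model probabilities.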

We found the ones responsible for Facebook’s Cambridge Analytica data breach – and it is you

A couple of days ago, when Mark Zuckerberg, the billionaire founder and chief executive of Facebook, faced Congress on Capitol Hill for two days of detailed questioning by more than 100 lawmakers, he didn’t break a sweat. In basketball lingo, it was a serious mismatch in the paint, and a posterizing was inevitable. The CEO of the biggest social network came well prepared, and he spent the two halves of a game that lasted more than 20 hours explaining to longstanding senators how a social network (and the internet) actually works. Mr Zuckerberg completed his job successfully, once again. He protected his empire, which he stated “has no known competitors”, and investors gave him a thumbs up on the stock exchange the next morning. The biggest takeaway in the press was the following: senators don’t understand how Facebook works. And they are not alone. If you have a Facebook profile, chances are you don’t understand it either.

Building an RNN-LSTM completely from scratch (no libraries!)

In this post, we are going to build an RNN-LSTM completely from scratch using only numpy (coding like it’s 1999). LSTMs belong to the family of recurrent neural networks, which are very useful for learning sequential data such as text, time series or video. While traditional feedforward networks consist of an input layer, a hidden layer, an output layer, and the weights, biases, and activation functions between the layers, an RNN incorporates a hidden state layer which is connected to itself (recurrent).
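To make the recurrent connection concrete, here is a minimal numpy sketch of a plain (vanilla) RNN forward pass, in which the hidden state at each step depends on the current input and on the previous hidden state. It deliberately leaves out the LSTM gates. The names `rnn_forward`, `W_xh`, `W_hh` and `b_h`, the tanh activation, and the layer sizes are assumptions for illustration, not the post's actual code.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return all hidden states.

    inputs: array of shape (seq_len, input_size)
    W_xh:   input-to-hidden weights, shape (hidden_size, input_size)
    W_hh:   hidden-to-hidden (recurrent) weights, shape (hidden_size, hidden_size)
    b_h:    hidden bias, shape (hidden_size,)
    """
    h = np.zeros(W_hh.shape[0])  # initial hidden state
    states = []
    for x_t in inputs:
        # The recurrence: the new state depends on the input AND the old state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


# Toy dimensions and random weights (illustrative assumptions).
rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
inputs = rng.normal(size=(seq_len, input_size))

states = rnn_forward(inputs, W_xh, W_hh, b_h)
print(states.shape)  # (5, 4): one hidden state per time step
```

An LSTM replaces the single tanh update with gated updates of a separate cell state, but the loop over time steps has the same shape.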

Forecasting the Bitcoin price using data from Twitter and Reddit

In this blog post, we are going to forecast the Bitcoin price based on text data from Twitter and Reddit. Given that the observed Bitcoin price is formed by some supply and demand function, modeling the demand side while assuming that the supply side is reasonably stable may give us some outstanding forecasting results. Social media data has been used extensively in the financial industry and requires algorithms that can scale. However, social media data is unstructured and noisy. Supervised learning techniques are strongly domain dependent and need a massive amount of labeled training data to generalize well. We are going to tackle this problem by mapping the vectorized text data and sentiment directly to future price movements of Bitcoin. Economic theory claims that the price of an asset is composed of its utility value and its speculation value. In 2017, we observed a crypto-currency market that skyrocketed. In the absence of a blockchain killer application so far, we assume that this rise was driven at least 90% by speculation and at most 10% by utility. This assumption strongly motivates our project.

Alpha from the Box Office?

When I recently watched some trailers of upcoming movies, I was wondering what the number of YouTube views (or other properties, e.g. reviews) can tell us about the success of a movie. For example, the official trailer of the Marvel movie “Black Panther” has about 33 million views at the moment. Black Panther was released on 2018/02/15 in the US and is so far by far the most successful movie of 2018, with more than 1.1 billion USD gross worldwide. By way of comparison, the official upload of the first trailer of the upcoming “Avengers: Infinity War” now counts 156 million views. So, it seems reasonable to expect an even higher turnover for this one (but surely not 4.7 times as big) and thus a pretty large amount of money for Disney. Therefore, it is interesting to investigate (1) what drives the box office of a movie, (2) what methods are available to get a forecast and (3) whether there is an obvious connection to the performance of the producing company.

Can blockchain upgrade the soccer transfer market?

Transfers between the top European football clubs generate extensive media coverage, and not only since the recent mega deals of Neymar (~ 220 million euros), Coutinho (~ 160 million euros) or Dembele (~ 140 million euros). Supposing this market is a closed system, these incomes should be taxed and reinvested. While the former benefits the whole country, the latter benefits smaller clubs, since they can sell young players for higher prices. So high prices and high player salaries should not be a problem, as Zlatan Ibrahimovic stated when joining the PSG squad.

How to win an AI-Hackathon?

Almost a year ago, with my laptop and a sleeping bag in my backpack, I attended an AI-Hackathon in Germany. Right after the kick-off meeting at 9:00 AM I teamed up with two UX Designers and one Business Developer. We immediately started brainstorming to identify a potential project using open data and AI. Our first idea was to find a new particle or new physics in the CERN data. However, we dropped that idea quickly and decided to build a service for visually impaired people. The idea was basically to create an audiobook from any video content. Hackathons usually aren’t long enough to create something entirely from scratch. Nevertheless, as I had been working with Deep Learning models for quite some time, it wouldn’t take too long to recycle a couple of thousand lines of code and wrap them around some video feed. Given my professional experience, my task was to develop the back-end of a minimalistic prototype within 24 hours, while my teammates focused on the user interface, the presentation, and a bulletproof business case.