Who is going to win the football world cup 2018?

Reading Time: 6 minutes We aim to predict the winner of the FIFA World Cup solely based on data. The method applied is not fancy at all, but it should do the trick to get some neat results (spoiler alert: Germany wins!). We use three datasets obtained from Kaggle which contain the outcome of specific pairings between teams, rank, points and the weighted point difference with the opponent. Then, we create a model to predict the outcome of each match during the FIFA World Cup 2018. To make the results more appealing, we translate the outcome probabilities to fair odds.
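
The last step, translating outcome probabilities into fair odds, can be sketched as follows. This is a minimal illustration rather than the code used in the post: it assumes decimal odds without a bookmaker margin (fair odds = 1 / probability), and the function name and the example probabilities are hypothetical.

```python
def fair_decimal_odds(probabilities):
    """Convert outcome probabilities to fair decimal odds (1 / p).

    Assumption: "fair" means no bookmaker margin. The probabilities are
    normalized first so that they sum to one.
    """
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive number")
    odds = {}
    for outcome, p in probabilities.items():
        if p < 0:
            raise ValueError("probabilities must be non-negative")
        # An impossible outcome gets infinite odds.
        odds[outcome] = float("inf") if p == 0 else total / p
    return odds


# Hypothetical model output for a single match.
match_probabilities = {"win": 0.5, "draw": 0.25, "loss": 0.25}
print(fair_decimal_odds(match_probabilities))
```

The less likely an outcome, the higher its fair odds, so a model that strongly favours one team shows up as odds close to 1 for that team.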

We found the ones responsible for Facebook’s Cambridge Analytica data breach – and it is you

Reading Time: 5 minutes A couple of days ago, when Mark Zuckerberg, the billionaire founder and chief executive of Facebook, faced two days of detailed questioning on Capitol Hill by more than 100 lawmakers, he didn’t break a sweat. It was, in basketball lingo, a serious mismatch in the paint, and posterizing was inevitable. The CEO of the biggest social network came well prepared, and he spent the two halves of a game that lasted more than 20 hours explaining to long-standing senators how social networks (and the internet) actually work. Mr Zuckerberg completed his job successfully, once again. He protected his empire, which he stated “has no known competitors”, and investors gave him a thumbs up on the stock exchange the next morning. The biggest takeaway from the press was the following: senators don’t understand how Facebook works. And they are not alone. If you have a Facebook profile, chances are you don’t understand it either.

Building an RNN-LSTM completely from scratch (no libraries!)

Reading Time: 10 minutes In this post, we are going to build an RNN-LSTM completely from scratch using only numpy (coding like it’s 1999). LSTMs belong to the family of recurrent neural networks, which are very useful for learning sequential data such as text, time series or video. While a traditional feedforward network consists of an input layer, a hidden layer, an output layer, and the weights, biases and activation functions between the layers, an RNN incorporates a hidden state layer which is connected to itself (recurrent).
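
To make the recurrent connection concrete, here is a minimal numpy sketch of a plain (vanilla) RNN forward pass. It is not the LSTM built in the post: it has no gates and no cell state, and it only shows how the hidden state feeds back into itself at every time step. All names (`W_xh`, `W_hh`, `b_h`) and sizes are illustrative assumptions.

```python
import numpy as np


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One time step: the new hidden state depends on the current input
    and on the previous hidden state (the recurrent connection)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)


def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Run the step function over a sequence and collect all hidden states."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return np.stack(states)


# Hypothetical sizes: 3 input features, 4 hidden units, 5 time steps.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(4, 3))  # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(4, 4))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(4)                          # hidden bias
xs = rng.normal(size=(5, 3))               # one input sequence
states = rnn_forward(xs, np.zeros(4), W_xh, W_hh, b_h)
print(states.shape)
```

An LSTM replaces the single `tanh` update with gated updates of a separate cell state, but the loop over time steps keeps the same shape.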

Forecasting the Bitcoin price using data from Twitter and Reddit

Reading Time: 3 minutes In this blog post, we are going to forecast the Bitcoin price based on text data from Twitter and Reddit. Given that the observed Bitcoin price is formed by some supply and demand function, modeling the demand side while assuming that the supply side remains reasonably stable may yield some outstanding forecasting results. Social media data has been used extensively in the financial industry and requires algorithms that can scale. However, social media data is unstructured and noisy. Supervised learning techniques are strongly domain dependent and need a massive amount of labeled training data to generalize well. We are going to tackle this problem by mapping the vectorized text data and sentiment directly to future price movements of Bitcoin. Economic theory claims that the price of an asset is composed of its utility value and its speculation value. In 2017, we observed a cryptocurrency market that skyrocketed. Since there is no blockchain killer application so far, it is safe to assume that this rise was driven at least 90% by speculation and at most 10% by utility. This assumption strongly motivates our project.

Measuring Information Consumption via EDGAR Log Files

Reading Time: 2 minutes “EDGAR, the Electronic Data Gathering, Analysis, and Retrieval system, performs automated collection, validation and forwarding of submissions by companies who are required by law to file forms with the U.S. Securities and Exchange Commission (SEC).”

Alpha from the Box Office?

Reading Time: 5 minutes When I recently watched some trailers for upcoming movies, I wondered what the number of YouTube views (or other properties, e.g. reviews) can tell us about the success of a movie. For example, the official trailer of the Marvel movie “Black Panther” has about 33 million views at the moment. Black Panther was released on 2018/02/15 in the US and is so far by far the most successful movie of 2018, with more than 1.1 billion USD gross worldwide. By way of comparison, the official upload of the first trailer of the upcoming “Avengers: Infinity War” now counts 156 million views. So, it seems reasonable to expect an even higher turnover for this one (but surely not 4.7 times as big) and thus a pretty large amount of money for Disney. Therefore, it is interesting to investigate (1) what drives the box office of a movie, (2) what methods are available to get a forecast and (3) whether there is an obvious connection to the performance of the producing company.

Can blockchain upgrade the soccer transfer market?

Reading Time: 3 minutes Transfers between the top European football clubs have generated extensive media coverage, and not only since the recent mega deals of Neymar (~ 220 million euros), Coutinho (~ 160 million euros) or Dembele (~ 140 million euros). Supposing this market is a closed system, these incomes should be taxed and reinvested. While the former is beneficial for the whole country, the latter is beneficial for smaller clubs since they can sell young players for higher prices. So high prices and high player salaries should not be a problem, as Zlatan Ibrahimovic stated when joining the PSG squad.

How to win an AI-Hackathon?

Reading Time: 2 minutes Almost a year ago, with my laptop and a sleeping bag in my backpack, I attended an AI hackathon in Germany. Right after the kick-off meeting at 9:00 AM, I teamed up with two UX designers and one business developer. We immediately started brainstorming to identify a potential project using open data and AI. Our first idea was to find a new particle or new physics in the CERN data. However, we quickly dropped that idea and decided to build a service for visually impaired people. The idea was basically to create an audiobook from any video content. Hackathons usually aren’t long enough to create something entirely from scratch. Nevertheless, as I had been working with deep learning models for quite some time, it wouldn’t take too long to recycle a couple of thousand lines of code and wrap them around some video feed. Given my professional experience, my task was to develop the back-end of a minimalistic prototype within 24 hours while my teammates focused on the user interface, the presentation, and a bulletproof business case.

Elo rating and funds’ performance

Reading Time: 3 minutes The Elo rating system, created by Arpad Elo, is a method for calculating relative skill levels in games such as chess or video games. Although the system has not been officially adopted in many other sports, several websites publish Elo rankings for them (e.g. World Football Elo Ratings).
The Elo rating number is based on pairwise comparisons. A player’s rating is not measured absolutely; after each game it is updated based on the player’s previous rating, the rating of the opponent and the result of the game.
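
The pairwise update can be sketched in a few lines. This is a minimal sketch of the standard Elo formulas (logistic expected score with a scale of 400), not necessarily the variant applied to funds in the post. The K-factor of 32 and the example ratings are assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=32.0):
    """Return the new ratings after one game.

    score_a is 1.0 if A wins, 0.5 for a draw and 0.0 if A loses.
    k is the K-factor (assumed to be 32 here); it controls how strongly
    a single result moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


# Hypothetical example: two equally rated players, A wins.
print(update_ratings(1500.0, 1500.0, score_a=1.0))
```

Because the expected score depends only on the rating difference, beating a much stronger opponent moves a rating far more than beating a weaker one, and the points gained by one player equal the points lost by the other.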

Is diversifying your portfolio by adding volatility as an asset a good idea?

Reading Time: 3 minutes In portfolio theory we teach students that investing in different assets improves the performance of portfolios due to the diversification effect. This effect works best if the correlation between assets is low or even negative.
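
The diversification effect can be illustrated with the textbook formula for the volatility of a two-asset portfolio. The sketch below is only an illustration: the weights, the volatilities and the correlation values are hypothetical and are not taken from the post's data.

```python
import math


def portfolio_volatility(weight_1, sigma_1, sigma_2, rho):
    """Volatility of a two-asset portfolio.

    weight_1 is the share invested in asset 1 (the rest goes to asset 2),
    sigma_1 and sigma_2 are the asset volatilities, rho is their correlation.
    """
    weight_2 = 1.0 - weight_1
    variance = (
        weight_1 ** 2 * sigma_1 ** 2
        + weight_2 ** 2 * sigma_2 ** 2
        + 2.0 * weight_1 * weight_2 * rho * sigma_1 * sigma_2
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical assets, each with 20% volatility, held in equal weights.
for rho in (1.0, 0.5, 0.0, -0.5):
    vol = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.1f}: portfolio volatility {vol:.3f}")
```

With perfect correlation the portfolio is exactly as volatile as its components; the lower the correlation, the lower the portfolio volatility for the same weights.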

Unfortunately, the correlation between assets, especially stocks, is quite high nowadays, making it difficult to benefit from diversification. One asset that is highly negatively correlated with the stock market is volatility. The most famous volatility index is the VIX® Index, which “is a key measure of market expectations of near-term volatility conveyed by S&P 500 stock index option prices.” The correlation between the VIX and the S&P 500 is about -0.74 from the beginning of 2007 until today. So, it might be a good idea to add the VIX to your portfolio for diversification, right?