Portfolio Selection by Full-Scale Optimization and Differential Evolution (I/II)

Reading Time: 4 minutes

This blog entry introduces the theoretical framework for optimizing large portfolios with respect to expected utility by Full-Scale Optimization (FSO) and Differential Evolution (DE) before my next post will present the application and empirical results.

We will use the standard Mean-Variance (MV) optimization approach for portfolio analysis as a benchmark to the FSO framework. MV assumes normally distributed returns and investor preferences that can be well approximated by mean and variance while FSO identifies the optimal portfolio on any set of return distributions based on any description of investor preferences. This implies that no assumptions on the return distributions have to be made. Estimation errors prevail in the MV approach, where means and variances of all assets and the covariances of all asset pairs are to be estimated, as well as in the FSO, where the entire multivariate return distribution is estimated. Considering estimation errors we will need to test our in-sample results out of sample, e.g. by a bootstrapping procedure, which allows us another comparison between both approaches.

There are no limitations on the utility functions, which enables the implication of loss aversion and prospect theory. In FSO the portfolio allocations are then optimized to maximize the specified utility function over non-parameterized return distributions making it a semi-parametric approach. This means that no analytical solution can be achieved thus we need to implement adequate search algorithms to find the optimal portfolio allocation. For the case of one maximum this can be obtained via single-staged gradient-based techniques or a grid search, where all possible portfolio allocation solutions are tested to a specified level of detail. This is problematic for large portfolios as the number of possible solutions (m) drastically increases with the number of assets (n) under consideration and the level of detail (p) for the grid search

(1)   \begin{align*}C(n+1/p-1,1/p)=\frac{(n+(1/p)-1)!}{(1/p)!(n-1)!}=\frac{((1/p)+1)*((1/p)+2)*\ldots*((1/p)+n-1)}{1*2*\ldots*(n-1)}.\end{align*}

As the grid search for a portfolio that consists of many assets becomes too burdensome computationally, we calculate the optimal allocation by Differential Evolution, which is a heuristic search algorithm well suited for complex global non-convex optimization problems. Another reason for using DE is that the method is divisible, so the computation can be shared on parallel machines, which tackles the problem of computation time.

Portfolio selection
Generally, when we‘re selecting our portfolio with respect to utility functions, where utility is dependent on the expected portfolio return distributions, we can express this problem with N assets as

(2)   \begin{align*}\theta^{*}=\underset{\theta\in\Omega}{\mathrm{arg\,max\,}} U(\theta^{'}R)\end{align*}

with the utility function U as the objective function and the vector of our portfolio weights \theta with dimensions (N\times1). R is a matrix (N\times S) of different returns for each asset (n=1,\ldots,N) for S different scenarios. Portfolio returns are then obtained as portfolio allocations multiplied by asset returns \theta^{'}R, which yields a vector of length S and constitutes a probability distribution of portfolio returns of all scenarios considered. Portfolio weights \theta are subject to constraints \Omega, which may include institutional restrictions, e.g. upper and lower bounds on allocations, borrowing restrictions and Value-at-Risk restrictions. The optimal vector of portfolio weights \theta^{*} maximizes the utility function subject to the constraints specified in \Omega.

Full-Scale Optimization
We express the empirical returns over time period t=1,\ldots,T for N assets as a return matrix R (N\times T). Returns are defined as R_{n,t} = (P_{n,t}/P_{n,t-1}) -1 with P_{n,t} being the price of asset n at time t. To identify the scenarios mentioned above, we treat every column R_t of the return matrix as a future scenario with probability T^{-1}. Again, utility is evaluated for each vector of portfolio weights and each scenario in the sample. The vector of portfolio weights with highest average utility across all scenarios is denoted as the optimal allocation combination \theta_{FSO}

(3)   \begin{align*}\theta_{FSO}=\underset{\theta\in\Omega}{\mathrm{arg\,max\,}} \left(T^{-1}\sum_{t=1}^{T}U(\theta^{'}R_t)\right).\end{align*}

We are completely flexible in the formulation of utility functions and we do not make any simplifying assumptions on the return distribution. Literally any utility function can be implemented since the optimization is carried out numerically, allowing for complex investor preferences.

Differential Evolution
DE was first introduced by Storn and Price (1997) and belongs to a family of algorithms that applies self-learning techniques to find optima in large solution surfaces. The algorithm is described in a portfolio setting by Hagströmer and Binner (2009) as follows:

(1) Initialize population: Randomly generate a set of P starting value vectors \theta_{i,1,g} of length N, where i=1,2,\ldots,P and P being the population size, which equals one in the initialization phase.
(2) Mutation: Generate a second set of P vectors of length N by \theta_{i,2,g}=\theta_{j_1,1,g}+(F+z_1)(\theta_{j_2,1,g}-\theta_{j_3,1,g}+z_2) with F as a difference vector scale factor, j_1, j_2 and j_3 as randomly drawn numbers and z_1 and z_2 as stochastic elements (noise).
(3) Offspring: Generate a third set of P vectors \theta_{i,3,g}^{*} of length N with crossover probability \pi equalling \theta_{i,1,g} and (1-\pi) equalling \theta_{i,2,g}. Then adjust these vectors to satisfy the set constraints using a function f_c such that f_c(\theta_{i,3,g}^{*})=\theta_{i,3,g}\in\Omega.
(4) Selection: Generate a fourth set of P vectors of length N consisting of the best solution vectors in sets 1 and 3 using \theta_{i,4,g}={\mathrm{arg\,max\,}}\{\theta_{i,1,g},\theta_{i,3,g}\}(U(\theta^{'}R)).
(5) Iteration: Create a new vector generation by setting \theta_{i,1,g+1}=\theta_{i,4,g} and repeat steps 2-4 until g=G with G as haltering criterion.
(6) Optimum: The optimum is obtained as \theta^{DE}={\mathrm{arg\,max_x\,}}\{\theta_{i,4,G}\}(U(\theta^{'}R)).

As seen above, the number of technical parameters required is low. To prevent random starting values from affecting the DE optimum, the whole process is repeated multiple times. These independent restarts allow us to cut computation time by running it on many machines parallel to each other.

After having introduced the theoretical framework, my following blog post will show a possible implementation in a portfolio setting and discuss results.

Adler, T. and Kritzman, M. (2007). Mean-variance versus full-scale optimisation: In and out of sample. Journal of Asset Management (7), pp. 302-311.
Hagströmer, B. and Binner, J.M. (2009). Stock portfolio selection with full-scale optimization and differential evolution. Applied Financial Economics (19), pp. 1559-1571.
Hagströmer, B. et al. (2008). Mean-variance vs full-scale optimization: Broad evidence for the UK. Federal Reserve Bank of St. Louis, Working Paper No. 2007-016D.
Maringer, D. (2008). Risk preferences and loss aversion in portfolio optimization. Computational Methods in Financial Engineering (Eds) E. Kontoghiorghes, B. Rustem and P. Winkler, Springer, Berlin, Heidelberg, pp. 27-45.
Storn, P. and Price, K. (1997). Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization (11), pp. 341-359.

Print Friendly, PDF & Email