Market Returns

Statistics describes returns through four properties:

  1. Where returns are centered (central tendency)
  2. How far returns deviate from the center (dispersion)
  3. Whether the distribution is symmetric or skewed to one side (skewness)
  4. Whether extreme outcomes occur (kurtosis)

I. Measures of central tendency

1. Mean

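  1. Arithmetic mean
    1. Definition: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, the sum of the observations divided by their number. The arithmetic mean return focuses on average single-period performance.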
    2. Advantage:
      1. Easy to work with mathematically.
      2. Uses all the information about size and magnitude of the observations.
    3. Disadvantage:
      1. Sensitive to extreme values.
    4. There are three options for dealing with extreme values:
      1. Do nothing.
      2. Exclude a stated percentage of the lowest and highest values (the outliers) and compute a trimmed mean.
      3. Replace the outliers with less extreme values (the values at chosen cutoff percentiles) and compute a winsorized mean.
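  2. Weighted mean
    1. Formula: $\bar{X}_w = \sum_{i=1}^{n} w_i X_i$, where the weights $w_i$ sum to 1.
    2. Use: weighted means are mostly used to calculate portfolio returns (the weights are the portfolio weights) or expected values (the weights are the probabilities).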
  3. Geometric mean
    1. Formula: $R_G = \left[\prod_{i=1}^{n}(1 + R_i)\right]^{1/n} - 1$
    2. Use: calculating the average compound rate of return per period on an investment. The geometric mean return focuses on the profitability of an investment over a multi-period horizon.
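  4. Harmonic mean
    1. Formula: $\bar{X}_H = \frac{n}{\sum_{i=1}^{n} (1/X_i)}$, with $X_i > 0$
    2. Use: finding the average cost per share of a stock bought over time by investing a constant dollar amount each period. It is mainly used when building a position: the average cost per share equals the harmonic mean of the purchase prices, which is never higher than their arithmetic mean.
  5. Harmonic mean <= Geometric mean <= Arithmetic mean (for positive observations; the three are equal only when all observations are equal)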
2. Median: “The median is the value of the middle item of a set of items that has been sorted into ascending or descending order. In an odd-numbered sample of n items, the median is the value of the item that occupies the (n + 1)/2 position. In an even-numbered sample, we define the median as the mean of the values of items occupying the n/2 and (n + 2)/2 positions (the two middle items).”
3. Mode: “The mode is the most frequently occurring value in a distribution.”
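The measures of central tendency above can be sketched in Python with only the standard library (Python 3.8+). This is a minimal illustration under stated assumptions, not a reference implementation: the helper names `trimmed_mean`, `winsorized_mean`, `weighted_mean` and `geometric_mean_return` are hypothetical, the sample returns are made up, and the trimming rule (remove floor(n × fraction) observations from each tail) is one common convention assumed here. `statistics.median`, `statistics.multimode`, `statistics.harmonic_mean` and `statistics.geometric_mean` are standard-library functions.

```python
import math
import statistics


def trimmed_mean(xs, trim_frac):
    """Mean after dropping floor(n * trim_frac) observations from each tail (assumed rule)."""
    s = sorted(xs)
    k = math.floor(len(s) * trim_frac)
    if 2 * k >= len(s):
        raise ValueError("trim_frac would remove every observation")
    return statistics.mean(s[k:len(s) - k])


def winsorized_mean(xs, trim_frac):
    """Mean after replacing the k most extreme values in each tail with the
    nearest value that is kept (k = floor(n * trim_frac), assumed rule)."""
    s = sorted(xs)
    n = len(s)
    k = math.floor(n * trim_frac)
    if 2 * k >= n:
        raise ValueError("trim_frac would remove every observation")
    low, high = s[k], s[n - 1 - k]
    return statistics.mean(min(max(x, low), high) for x in s)


def weighted_mean(values, weights):
    """Sum of w_i * X_i; the weights are expected to sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))


def geometric_mean_return(returns):
    """Average compound return per period: [prod(1 + R_i)] ** (1/n) - 1."""
    if any(r <= -1 for r in returns):
        raise ValueError("each return must be greater than -100%")
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Hypothetical returns in percent, with two outliers (-50 and +60).
returns = [2, 3, 1, 4, -50, 2, 3, 5, 1, 60]
print(statistics.mean(returns))        # 3.1
print(trimmed_mean(returns, 0.10))     # 2.625 (drops -50 and 60)
print(winsorized_mean(returns, 0.10))  # 2.7 (-50 becomes 1, 60 becomes 5)
print(statistics.median(returns))      # 2.5 (mean of the 5th and 6th sorted values)
print(statistics.multimode(returns))   # [2, 3, 1]: three values tie as most frequent

# Hypothetical portfolio: weights 50/30/20 on assets returning 10%, 5%, -2%.
print(weighted_mean([10, 5, -2], [0.5, 0.3, 0.2]))  # about 6.1

# +50% then -50%: the arithmetic mean is 0%, but wealth actually shrinks.
print(geometric_mean_return([0.50, -0.50]))  # about -0.134

# Investing a fixed $1,000 at prices of $10 and $20 buys 100 + 50 shares.
avg_cost = 2000 / (1000 / 10 + 1000 / 20)
print(avg_cost, statistics.harmonic_mean([10, 20]))  # both about 13.33, below the arithmetic mean of 15

# Harmonic <= geometric <= arithmetic for positive values.
prices = [10, 20]
print(statistics.harmonic_mean(prices) <= statistics.geometric_mean(prices) <= statistics.mean(prices))
```

The example shows why the choice of mean matters: one outlier pulls the arithmetic mean well above the trimmed and winsorized means, and the geometric mean reveals the loss that the 0% arithmetic average hides.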

II. Quantiles

Quantiles (or fractiles) are often used in investment research, for example to rank performance.

Quantiles:

  1. Definition: A value at or below which a stated fraction of the data lies. For example, given a set of observations, the y-th percentile is the value at or below which y% of the observations lie.

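  2. Types of quantiles:

    1. Quartiles: divide the data into four parts
    2. Quintiles: five parts
    3. Deciles: ten parts
    4. Percentiles: one hundred parts
  3. For n observations sorted in ascending order, the position of the y-th percentile is $L_y = (n + 1)\frac{y}{100}$. When $L_y$ is not a whole number, interpolate linearly between the observations at the two nearest positions.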
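The position formula above can be turned into a small function. This is a sketch under two assumptions not stated in the notes: non-integer positions are interpolated linearly, and positions outside [1, n] are clamped to the smallest or largest observation. The helper name `percentile` and the data are hypothetical; the standard-library `statistics.quantiles(..., method="exclusive")` (Python 3.8+) uses the same (n + 1) positioning, so it serves as a cross-check.

```python
import math
import statistics


def percentile(data, y):
    """y-th percentile via the 1-based position L_y = (n + 1) * y / 100 on sorted data.

    Assumption: non-integer positions are linearly interpolated, and positions
    outside [1, n] are clamped to the smallest / largest observation.
    """
    s = sorted(data)
    n = len(s)
    pos = (n + 1) * y / 100
    if pos <= 1:
        return s[0]
    if pos >= n:
        return s[-1]
    lower = math.floor(pos)  # 1-based position of the observation just below L_y
    frac = pos - lower
    return s[lower - 1] + frac * (s[lower] - s[lower - 1])


data = [15, 20, 35, 40, 50]  # hypothetical observations, n = 5
quartiles = [percentile(data, y) for y in (25, 50, 75)]
print(quartiles)  # [17.5, 35.0, 45.0]
print(statistics.quantiles(data, n=4, method="exclusive"))  # same (n + 1) convention
```

For the first quartile, $L_{25} = 6 \times 25/100 = 1.5$, so the value lies halfway between the 1st (15) and 2nd (20) observations.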

III. Measures of dispersion

  1. Absolute dispersion
    1. Range
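      1. Definition: Range = Maximum value - Minimum value
      2. Advantages and disadvantages: easy to compute, but it uses only two observations and says nothing about how the rest of the data are distributed.
    2. Mean absolute deviation (MAD):
      1. Formula: $MAD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$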
    3. Variance and standard deviation
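      1. Population variance and population standard deviation
        1. Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$
        2. Population standard deviation: $\sigma = \sqrt{\sigma^2}$
      2. Sample variance and sample standard deviation
        1. Sample variance: $s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$
        2. Sample standard deviation: $s = \sqrt{s^2}$
    4. Target downside deviation: a measure of the dispersion of the observations below a target return B.
      1. Meaning: a measure of downside risk.
      2. Formula: $s_{Target} = \sqrt{\frac{\sum_{X_i \le B}(X_i - B)^2}{n - 1}}$, where the sum runs only over observations at or below the target B, while n is the total number of observations.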
  2. Relative dispersion
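    1. Coefficient of variation (CV)
    2. Formula: $CV = \frac{s}{\bar{X}}$, the ratio of the sample standard deviation to the sample mean.
    3. CV measures risk per unit of mean return, so a lower CV is better (the comparison is meaningful when the means are positive).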
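The dispersion measures above can be computed with a few lines of Python. This is a minimal sketch on a hypothetical return series; `data_range`, `mean_absolute_deviation`, `target_downside_deviation` and `coefficient_of_variation` are illustrative helper names, while `statistics.pvariance`/`pstdev` (divide by n) and `statistics.variance`/`stdev` (divide by n - 1) are standard-library functions.

```python
import math
import statistics


def data_range(xs):
    """Maximum value minus minimum value."""
    return max(xs) - min(xs)


def mean_absolute_deviation(xs):
    """Average absolute distance from the arithmetic mean."""
    m = statistics.fmean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def target_downside_deviation(xs, target):
    """sqrt(sum of (X_i - B)^2 over X_i <= B, divided by n - 1), n = full sample size."""
    shortfall = sum((x - target) ** 2 for x in xs if x <= target)
    return math.sqrt(shortfall / (len(xs) - 1))


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(xs) / statistics.fmean(xs)


returns = [5, -3, 8, 2, -2]  # hypothetical returns in percent; mean = 2
print(data_range(returns))               # 11
print(mean_absolute_deviation(returns))  # 3.6
print(statistics.pvariance(returns), statistics.pstdev(returns))  # population: divide by n
print(statistics.variance(returns), statistics.stdev(returns))    # sample: divide by n - 1
print(target_downside_deviation(returns, 0))  # only -3 and -2 fall at or below the 0% target
print(coefficient_of_variation(returns))
```

Note that the downside deviation still divides by n - 1 for the whole sample, even though only two observations contribute to the numerator.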

IV. Skewness

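Skewness measures how far a distribution departs from symmetry around its mean.

  1. Symmetric distribution, e.g., the normal distribution (mean = median = mode)

Symmetrical distribution

  2. Positively skewed (mean > median > mode)

Positively skewed

  • Frequent small losses and a few extreme gains (long, fat right tail)

  3. Negatively skewed (mean < median < mode)

Negatively skewed

  • Frequent small gains and a few extreme losses (long, fat left tail)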
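A short sketch connects the sign of skewness to the mean/median relationship above. It assumes the simple moment-based estimator (mean cubed deviation divided by the cubed population standard deviation); textbooks and software often apply small-sample adjustments, so exact values can differ. The helper name `skewness` and the data sets are hypothetical.

```python
import statistics


def skewness(xs):
    """Moment-based skewness: mean of (X_i - mean)^3 divided by sigma^3.

    Assumption: sigma is the population standard deviation (divide by n);
    sample-adjusted formulas give slightly different values.
    """
    m = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / len(xs) / sigma ** 3


right_tail = [1, 1, 1, 1, 10]  # frequent small values, one extreme gain
left_tail = [-10, 1, 1, 1, 1]  # frequent small values, one extreme loss
print(skewness([1, 2, 3]))  # 0.0 (symmetric)
print(skewness(right_tail), statistics.fmean(right_tail), statistics.median(right_tail))
print(skewness(left_tail), statistics.fmean(left_tail), statistics.median(left_tail))
```

In the right-tailed set the skewness is positive and the mean (2.8) sits above the median (1); the left-tailed set shows the mirror image.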

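V. Kurtosis

Kurtosis is a statistical measure of how peaked a distribution is, and how fat its tails are, compared with the normal distribution. Excess kurtosis = kurtosis - 3, so the normal distribution has an excess kurtosis of 0.

  • A distribution more peaked than the normal (with fatter tails) is leptokurtic: kurtosis > 3, excess kurtosis > 0.
  • A distribution less peaked than the normal (with thinner tails) is platykurtic: kurtosis < 3, excess kurtosis < 0.
  • A distribution with the same kurtosis as the normal is mesokurtic: kurtosis = 3, excess kurtosis = 0.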


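Compared with a normal distribution, a return distribution with positive excess kurtosis produces large deviations from the mean more frequently.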
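Excess kurtosis can be sketched with population moments: the fourth central moment divided by the squared second central moment, minus 3. This is an assumption about the estimator; sample-adjusted versions used by many packages give slightly different numbers. The helper name `excess_kurtosis` and both data sets are hypothetical.

```python
import statistics


def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution).

    Assumption: population moments (divide by n); sample-adjusted formulas differ slightly.
    """
    m = statistics.fmean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3


flat = list(range(1, 11))      # evenly spread values, no extremes
fat_tails = [0] * 8 + [10, -10]  # mostly zero, two large deviations
print(excess_kurtosis(flat))       # about -1.22: platykurtic
print(excess_kurtosis(fat_tails))  # 2.0: leptokurtic
```

The fat-tailed set has the same mean as a flat set centered at zero would, but its occasional large deviations push excess kurtosis above 0.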

VI. Covariance and correlation

  1. Covariance:

    1. Definition: The sample covariance is a measure of how two variables in a sample move together.
    2. Formula: $s_{XY} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
    3. Positive covariance: X and Y tend to increase or decrease together.
    4. Negative covariance: as one variable increases, the other tends to decrease.
    5. Covariance close to 0: there is little or no linear relationship between the two variables. Because the size of the covariance depends on the units of X and Y, it is hard to interpret on its own.
  2. Sample Correlation Coefficient

    1. 定义:“The sample correlation coefficient is a standardized measure of how two variables in a sample move together. The sample correlation coefficient (rXY) is the ratio of the sample covariance to the product of the two variables’ standard deviations”.
    2. Correlation ranges from −1 to +1 for two random variables, X and Y.
    3. A correlation of 0 (uncorrelated variables) indicates an absence of any linear (that is, straight-line) relationship between the variables.
    4. A positive correlation close to +1 indicates a strong positive linear relationship. A correlation of +1 indicates a perfect positive linear relationship.
    5. A negative correlation close to −1 indicates a strong negative (that is, inverse) linear relationship. A correlation of −1 indicates a perfect inverse linear relationship.
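The covariance formula and the correlation ratio above can be sketched directly. The helper names `sample_covariance` and `sample_correlation` and the data are hypothetical; Python 3.10+ also ships `statistics.covariance` and `statistics.correlation`, which can serve as a cross-check.

```python
import statistics


def sample_covariance(xs, ys):
    """s_XY = sum((X_i - X_bar) * (Y_i - Y_bar)) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 observations")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r_XY = s_XY / (s_X * s_Y), using sample (n - 1) standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


x = [1, 2, 3, 4]
print(sample_covariance(x, [2, 4, 6, 8]))   # about 3.33: the variables move together
print(sample_correlation(x, [2, 4, 6, 8]))  # about 1.0: perfect positive linear relationship
print(sample_correlation(x, [8, 6, 4, 2]))  # about -1.0: perfect inverse linear relationship
```

Doubling every Y value would double the covariance but leave the correlation unchanged, which is why correlation is the standardized measure.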

Scatter Plots showing Various Degrees of Correlation

  1. Limitations of correlation
    1. Correlation may be quite sensitive to outliers.
    2. Correlation does not imply causation.
    3. Spurious correlation, which may arise from:
      1. A chance relationship between the variables.
      2. Both variables being related to a third variable.
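The first limitation, sensitivity to outliers, is easy to demonstrate on a hypothetical data set: five points with a perfect inverse relationship, then the same points plus one extreme observation. The helper `sample_correlation` is illustrative (Python 3.10+ also offers `statistics.correlation`).

```python
import statistics


def sample_correlation(xs, ys):
    """r_XY = s_XY / (s_X * s_Y), with n - 1 in every denominator."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
print(sample_correlation(x, y))                # about -1.0: perfect inverse relationship
print(sample_correlation(x + [20], y + [20]))  # positive: one extreme point flips the sign
```

A single outlier is enough to turn a perfect negative correlation into a positive one, so it is worth plotting the data before relying on a correlation coefficient.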