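Statistics examines four properties of return distributions:
- Where returns are centered (central tendency)
- How far returns deviate from the center (dispersion)
- Whether the distribution of returns is symmetric or lopsided (skewness)
- Whether extreme outcomes are likely (kurtosis)
I. Measures of central tendency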
1. Mean
- Arithmetic mean
- Definition: The arithmetic mean return focuses on average single-period performance.
- Advantage:
- Easy to work with mathematically.
- Uses all the information about size and magnitude of the observations.
- Disadvantage:
- Sensitive to extreme values.
- There are three options for dealing with extreme values:
- Do nothing;
- Delete the outliers and calculate a trimmed mean;
- Replace the outliers with another value and calculate a winsorized mean.
- Weighted mean
- Formula: $\bar{X}_w = \sum_{i=1}^{n} w_i X_i$, where the weights $w_i$ sum to 1.
- Use: Weighted means are mostly used to calculate a portfolio's return or an expected value based on probabilities.
- Geometric mean
- Formula: $R_G = \sqrt[n]{(1 + R_1)(1 + R_2)\cdots(1 + R_n)} - 1$
- Use: Used to calculate the average periodic compound rate of return on an investment. The geometric mean return focuses on the profitability of an investment over a multi-period horizon.
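- Harmonic mean
- Formula: $\bar{X}_H = \dfrac{n}{\sum_{i=1}^{n} (1 / X_i)}$, with $X_i > 0$.
- Use: Used to find the average cost per share of stock purchased over time in constant dollar amounts. It is mainly applied when building a position; of the three means it gives the lowest average cost.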
- Harmonic mean ≤ geometric mean ≤ arithmetic mean (for positive observations; the three are equal only when all observations are identical)
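The four means above can be compared on small samples. The following is a minimal sketch using Python's standard library; every return, weight, and price in it is a hypothetical value chosen for illustration, not data from these notes.

```python
import math
import statistics

# Hypothetical single-period returns (illustrative values only).
returns = [0.10, -0.05, 0.20, 0.03]

# Arithmetic mean: average single-period performance.
arithmetic = statistics.mean(returns)

# Geometric mean return: compound growth factors, take the n-th root, subtract 1.
growth_factors = [1 + r for r in returns]
geometric = math.prod(growth_factors) ** (1 / len(returns)) - 1

# Weighted mean: portfolio return as the weight-averaged asset return.
weights = [0.5, 0.3, 0.2]             # hypothetical weights, must sum to 1
asset_returns = [0.08, 0.02, -0.01]   # hypothetical asset returns
portfolio_return = sum(w * r for w, r in zip(weights, asset_returns))

# Harmonic mean: average cost per share when the same dollar amount
# is invested at each of these (hypothetical) prices.
prices = [10.0, 20.0, 40.0]
average_cost = statistics.harmonic_mean(prices)

# The ordering harmonic <= geometric <= arithmetic on the positive price data.
ordering_holds = (
    average_cost
    <= statistics.geometric_mean(prices)
    <= statistics.mean(prices)
)

print(f"arithmetic={arithmetic:.4f} geometric={geometric:.4f}")
print(f"portfolio={portfolio_return:.4f} average cost={average_cost:.4f}")
print(f"ordering holds: {ordering_holds}")
```

Investing 100 at each price buys 10, 5, and 2.5 shares, so the average cost is 300 / 17.5, which is exactly the harmonic mean of the prices.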
2. Median: “The median is the value of the middle item of a set of items that has been sorted into ascending or descending order. In an odd-numbered sample of n items, the median is the value of the item that occupies the (n + 1)/2 position. In an even-numbered sample, we define the median as the mean of the values of items occupying the n/2 and (n + 2)/2 positions (the two middle items).”
3. Mode: “The mode is the most frequently occurring value in a distribution.”
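To see how the median, the mode, and the two outlier treatments listed under the arithmetic mean behave on the same data, here is a small sketch. The helper names `trimmed_mean` and `winsorized_mean`, the parameter `k` (number of observations treated at each end), and the sample are all assumptions made for illustration.

```python
import statistics


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest observations."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k removes every observation")
    return statistics.mean(ordered[k:len(ordered) - k])


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest observations
    with the nearest remaining value."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k replaces every observation")
    low, high = ordered[k], ordered[-k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return statistics.mean(clipped)


# Hypothetical sample with one extreme value.
data = [2, 3, 3, 4, 5, 6, 50]

print("mean      ", statistics.mean(data))      # pulled up by the outlier
print("median    ", statistics.median(data))    # middle item of the sorted data
print("mode      ", statistics.mode(data))      # most frequent value
print("trimmed   ", trimmed_mean(data, 1))
print("winsorized", winsorized_mean(data, 1))
```

For an even-sized sample such as `[1, 2, 3, 4]`, `statistics.median` averages the two middle items, matching the definition quoted above.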
II. Quantiles
Quantiles (or fractiles) are often used to rank performance and in investment research.
Quantiles:
Definition: A value at or below which a stated fraction of the data lies. Given a set of observations, the y-th percentile is the value at or below which y% of the observations lie.
Types of quantiles:
- Quartiles (quarters)
- Quintiles (fifths)
- Deciles (tenths)
- Percentiles (hundredths)
For a data set of n observations sorted in ascending order, the location of the y-th percentile is $L_y = (n + 1)\dfrac{y}{100}$.
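A percentile location is usually not a whole number, so some interpolation rule is needed. The sketch below assumes the common location rule L = (n + 1) · y / 100 with linear interpolation between the two neighbouring observations; the function name `percentile`, the clamping at the extremes, and the sample data are illustrative choices, not something these notes prescribe.

```python
def percentile(values, y):
    """y-th percentile using the location L = (n + 1) * y / 100,
    interpolating linearly between neighbouring sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * y / 100  # 1-based position in the sorted data

    # Assumption: locations outside [1, n] fall back to the extremes.
    if location <= 1:
        return ordered[0]
    if location >= n:
        return ordered[-1]

    lower = int(location)           # whole part of the position
    fraction = location - lower     # distance toward the next observation
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + fraction * (above - below)


# Hypothetical data set of 10 observations.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

first_quartile = percentile(data, 25)   # location 2.75
median = percentile(data, 50)           # location 5.5
print(first_quartile, median)
```

With n = 10, the 25th percentile sits at location 2.75, three quarters of the way from the 2nd observation (20) to the 3rd (30).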
III. Measures of dispersion
- Absolute dispersion
- Range
- Definition: Range = Maximum value − Minimum value
- Advantages and disadvantages: Easy to compute, but it uses only two observations and says nothing about how the rest of the data set is distributed.
- Mean absolute deviation (MAD):
- Formula: $\text{MAD} = \dfrac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$
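- Variance and standard deviation
- Population variance and population standard deviation
- Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
- Population standard deviation: $\sigma = \sqrt{\sigma^2}$
- Sample variance and sample standard deviation
- Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
- Sample standard deviation: $s = \sqrt{s^2}$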
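- Downside deviation: a measure of the dispersion of the observations below a target.
- Meaning: A measure of downside risk.
- Formula: $s_{\text{target}} = \sqrt{\dfrac{\sum_{X_i \le B} (X_i - B)^2}{n - 1}}$, where $B$ is the target and $n$ is the total number of observations.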
- Relative dispersion
- Coefficient of variation (CV)
- Formula: $\text{CV} = s / \bar{X}$
- CV is the ratio of the sample standard deviation to the sample mean.
- It measures risk per unit of mean return, so lower is better.
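The dispersion measures in this section can be computed side by side. The following sketch uses hypothetical returns and a hypothetical target of 0; the downside deviation follows the common target-semideviation convention of dividing by n − 1 over all observations, which is an assumption rather than something stated in these notes.

```python
import math
import statistics

# Hypothetical returns and target (illustrative values only).
returns = [0.10, -0.05, 0.20, 0.03, -0.08]
target = 0.0

n = len(returns)
mean = statistics.mean(returns)

# Absolute dispersion
value_range = max(returns) - min(returns)
mad = sum(abs(r - mean) for r in returns) / n
sample_variance = statistics.variance(returns)        # divides by n - 1
sample_std = statistics.stdev(returns)
population_variance = statistics.pvariance(returns)   # divides by N

# Downside deviation: only observations at or below the target contribute,
# but the divisor still uses the full sample size.
downside_deviation = math.sqrt(
    sum((r - target) ** 2 for r in returns if r <= target) / (n - 1)
)

# Relative dispersion: risk per unit of mean return.
cv = sample_std / mean

print(f"range={value_range:.4f} MAD={mad:.4f}")
print(f"s^2={sample_variance:.5f} s={sample_std:.4f} sigma^2={population_variance:.5f}")
print(f"downside deviation={downside_deviation:.4f} CV={cv:.4f}")
```

The sample variance is always larger than the population variance computed on the same data, because it divides the same sum of squares by n − 1 instead of N.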
IV. Skewness
- Normal distribution (mean = median = mode)
- Positively skewed
- Frequent small losses and a few extreme gains (fatter/longer right tail)
- Negatively skewed
- Frequent small gains and a few extreme losses (fatter/longer left tail)
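V. Kurtosis
Kurtosis is a statistical measure that tells us whether a distribution is more or less peaked than the normal distribution.
- A distribution that is more peaked than the normal distribution is leptokurtic: kurtosis > 3, excess kurtosis > 0.
- A distribution that is less peaked than the normal distribution is platykurtic: kurtosis < 3, excess kurtosis < 0.
- A distribution with the same kurtosis as the normal distribution is mesokurtic: kurtosis = 3, excess kurtosis = 0.
Relative to a normal distribution, a return distribution with positive excess kurtosis produces extreme deviations from the mean more frequently.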
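These notes describe skewness and kurtosis qualitatively and give no formulas, so the sketch below makes its own assumption: it uses the simple moment-based definitions (third and fourth central moments scaled by powers of the second), with no small-sample correction. The function names and all sample data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Third central moment scaled by the 1.5 power of the variance."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Frequent small losses and a few extreme gains: long right tail.
right_tailed = [-1] * 8 + [10, 20]
# Mirror image: frequent small gains and a few extreme losses.
left_tailed = [-v for v in right_tailed]
# Evenly spread, symmetric sample: flatter than the normal distribution.
flat = [-2, -1, 0, 1, 2]
# Mostly at the center with two extreme observations: fat tails.
fat_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print("skew right-tailed:", skewness(right_tailed))
print("skew left-tailed: ", skewness(left_tailed))
print("excess kurtosis flat:      ", excess_kurtosis(flat))
print("excess kurtosis fat-tailed:", excess_kurtosis(fat_tailed))
```

The signs are what matter here: positive skewness for the long right tail, negative for its mirror image, negative excess kurtosis for the flat sample, and positive excess kurtosis for the fat-tailed one.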
VI. Covariance and correlation
Covariance:
- Definition: The sample covariance is a measure of how two variables in a sample move together.
- Formula: $\text{Cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
- Positive covariance: X and Y tend to increase or decrease together.
- Negative covariance: as one variable increases, the other tends to decrease.
- A covariance close to 0 indicates that there is no linear relationship between the two variables; they show no clear correlation.
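The sample covariance formula above translates directly into code. This is a minimal sketch; the function name `sample_covariance` and the data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y_together = [2, 4, 6, 8, 10]   # rises with x
y_opposite = [10, 8, 6, 4, 2]   # falls as x rises

cov_positive = sample_covariance(x, y_together)
cov_negative = sample_covariance(x, y_opposite)
print(cov_positive, cov_negative)
```

The magnitude of a covariance depends on the units of both variables, which is why the standardized correlation coefficient is easier to interpret.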
Sample Correlation Coefficient
- Definition: “The sample correlation coefficient is a standardized measure of how two variables in a sample move together. The sample correlation coefficient (rXY) is the ratio of the sample covariance to the product of the two variables’ standard deviations”, that is, $r_{XY} = s_{XY} / (s_X s_Y)$.
- Correlation ranges from −1 to +1 for two random variables, X and Y.
- A correlation of 0 (uncorrelated variables) indicates an absence of any linear (that is, straight-line) relationship between the variables.
- A positive correlation close to +1 indicates a strong positive linear relationship. A correlation of 1 indicates a perfect linear relationship.
- A negative correlation close to −1 indicates a strong negative (that is, inverse) linear relationship. A correlation of −1 indicates a perfect inverse linear relationship.
[Figure: Scatter plots showing various degrees of correlation]
- Limitations of the correlation coefficient
- Correlation may be quite sensitive to outliers.
- Correlation does not imply causation.
- Spurious correlation
- Chance relationship
- Both variables related to a third variable
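The sensitivity to outliers is easy to demonstrate. The sketch below computes the sample correlation as the sample covariance divided by the product of the sample standard deviations, then adds a single extreme point to a perfectly inverse relationship. The function name and all data are hypothetical.

```python
import math


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


# Hypothetical data with a perfect inverse linear relationship.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
r_clean = sample_correlation(x, y)

# The same data plus one extreme observation at (20, 20).
r_with_outlier = sample_correlation(x + [20], y + [20])

print(f"without outlier: {r_clean:.4f}")
print(f"with outlier:    {r_with_outlier:.4f}")
```

One added point is enough to turn a correlation of −1 into a strongly positive one, so it is worth inspecting a scatter plot before trusting a single coefficient.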