LARS Library Least Angle Regression Stagewise Library

格式：pdf
大小：371.07 KB
文档页数：25

下载文档原格式

/ 25

稀疏优化问题算法研究

稀疏优化问题算法研究作者：李蒙来源：《当代人（下半月）》2018年第03期摘要：稀疏优化问题发展至今，已经广泛应用于压缩感知、图像处理、复杂网络、指数追踪、变量选择等领域，并取得了令人瞩目的成就。

稀疏优化问题的求解算法种类繁多，根据算法设计原理的不同，可将其大致分为三类：贪婪算法、凸松弛方法和阈值类算法。

本文主要介绍稀疏优化问题算法研究进展及各类算法的优缺点。

关键词：稀疏优化问题；压缩感知；信号重构；优化算法随着当代社会信息技术的飞速发展，所获取和需要处理的数据量大幅增多，而基于传统的香农奈奎斯特采样定理中要求采样频率不得低于信号最高频率的两倍才可以精确重构信号。

2006年，Donoho、Candes等人针对稀疏信号或可稀疏表示的信号提出了新的采集和编解码理论，即压缩感知理论。

该理论使得通过少量采样即可准确或近似重构原始信号。

信号重构问题是一个非凸稀疏优化问题（问题）。

该问题是一个NP-hard问题，所以出现了多种不同的处理模型近似地或在一定条件下等价地求解问题。

以下就从压缩感知的信号重构理论出发，分析研究稀疏优化问题的模型和求解算法。

一、压缩感知重构问题在压缩感知理论中，若信号本身是稀疏的，则原始信号和观测信号之间的表达式为：；若信号本身不是稀疏的，但在基底下具有稀疏表示，即，其中是一个稀疏向量，则原始信号和观测信号之间的表达式为：，此时，重构原始信号只需要得到满足该等式约束的稀疏解，便可通过得到原始信号，其中为测量矩阵，为变换矩阵，为感知矩阵。

在压缩感知理论中，感知矩阵是一个很重要的组成部分，感知矩阵的好坏会直接影响原始信号的重构质量。

压缩感知理论中的稀疏信号重构问题旨在通过低维观测数据最大程度恢复出原始的高维信号，其本质为求一个欠定方程组的稀疏解，即（1）压缩感知的重构问题也可通过下述模型来求解：（2）Donoho和Chen证明了当观测矩阵和变换矩阵不相关时，（1）和（2）的解等价，并将（2）称为基追踪（Basis Pursuit，BP）问题。

scikit-learn 回归算法

Scikit-learn是一个流行的Python机器学习库，其中包含许多用于回归分析的算法。

以下是一些常用的scikit-learn回归算法：
1. 线性回归（Linear Regression）
2. 岭回归（Ridge Regression）
3. Lasso回归（Lasso Regression）
4. 弹性网络（ElasticNet Regression）
5. 决策树回归（Decision Tree Regression）
6. 随机森林回归（Random Forest Regression）
7. 套索回归（LassoLars Regression）
8. 最小角回归（Least Angle Regression）
9. SVR（Support Vector Regression）
10. 岭SVR（Ridge SVR）
11. LassoSVR
12. 弹性网络SVR（ElasticNet SVR）
这些算法中，线性回归、岭回归、Lasso回归和弹性网络可以用于解决过拟合和欠拟合问题，而决策树回归、随机森林回归和套索回归可以用于处理非线性关系和特征选择。

SVR是一种用于解决分类问题的回归算法，但它也可以用于回归问题。

lars算法的R语言操作指南

Package‘lars’February20,2015Type PackageVersion1.2Date2013-04-23Title Least Angle Regression,Lasso and Forward StagewiseAuthor Trevor Hastie<hastie@>and Brad Efron<brad@>Maintainer Trevor Hastie<hastie@>Description Efﬁcient procedures forﬁtting an entire lasso sequencewith the cost of a single least squaresﬁt.Least angleregression and inﬁnitesimal forward stagewise regression arerelated to the lasso,as described in the paper below.Depends R(>=2.10)License GPL-2URL /~hastie/Papers/#LARSNeedsCompilation yesRepository CRANDate/Publication2013-04-2409:46:12R topics documented:rs (2)diabetes (3)lars (4)rs (5)rs (7)rs (8)Index101rsrs Computes K-fold cross-validated error curve for larsDescriptionComputes the K-fold cross-validated mean squared prediction error for lars,lasso,or forward stage-wise.Usagers(x,y,K=10,index,trace=FALSE,plot.it=TRUE,se=TRUE,type=c("lasso","lar","forward.stagewise","stepwise"), mode=c("fraction","step"),...)Argumentsx Input to larsy Input to larsK Number of foldsindex Abscissa values at which CV curve should be computed.If mode="fraction"this is the fraction of the saturated|beta|.The default value in this case isindex=seq(from=0,to=1,length=100).If mode="step",thisis the number of steps in lars procedure.The default is complex in this case,anddepends on whether N>p or not.In principal it is index=1:ers can supplytheir own values of index(with care).trace Show computations?plot.it Plot it?se Include standard error bands?type type of larsﬁt,with default"lasso"mode This refers to the index that is used for cross-validation.The default is"fraction"for type="lasso"or type="forward.stagewise".For type="lar"or type="stepwise"the default is"step"...Additional arguments to larsValueInvisibly returns a list with components(which can be plotted using plotCVlars)index As abovecv The CV curve at each value of indexcv.error The standard error of the CV curvemode As abovediabetes3Author(s)Trevor HastieReferencesEfron,Hastie,Johnstone and Tibshirani(2003)"Least Angle Regression"(with discussion)Annals of Statistics;see also /~hastie/Papers/LARS/LeastAngle_2002.pdf.Examplesdata(diabetes)attach(diabetes)rs(x2,y,trace=TRUE,max.steps=80)detach(diabetes)diabetes Blood and other measurements in diabeticsDescriptionThe diabetes data frame has442rows and3columns.These are the data used in the Efron et al "Least Angle Regression"paper.FormatThis data frame contains the following columns:x a matrix with10columnsy a numeric vectorx2a matrix with64columnsDetailsThe x matrix has been standardized to have unit L2norm in each column and zero mean.The matrix x2consists of x plus certain interactions.Source/~hastie/Papers/LARS/LeastAngle_2002.psReferencesEfron,Hastie,Johnstone and Tibshirani(2003)"Least Angle Regression"(with discussion)Annals of Statistics4larslars Fits Least Angle Regression,Lasso and Inﬁnitesimal Forward Stage-wise regression modelsDescriptionThese are all variants of Lasso,and provide the entire sequence of coefﬁcients andﬁts,starting fromzero,to the least squaresﬁt.Usagelars(x,y,type=c("lasso","lar","forward.stagewise","stepwise"),trace=FALSE,normalize=TRUE,intercept=TRUE,Gram,eps=.Machine$double.eps,max.steps,use. Argumentsx matrix of predictorsy responsetype One of"lasso","lar","forward.stagewise"or"stepwise".The names can beabbreviated to any unique substring.Default is"lasso".trace If TRUE,lars prints out its progressnormalize If TRUE,each variable is standardized to have unit L2norm,otherwise it is leftalone.Default is TRUE.intercept if TRUE,an intercept is included in the model(and not penalized),otherwise nointercept is included.Default is TRUE.Gram The X’X matrix;useful for repeated runs(bootstrap)where a large X’X staysthe same.eps An effective zeromax.steps Limit the number of steps taken;the default is8*min(m,n-intercept),with m the number of variables,and n the number of samples.For type="lar"or type="stepwise",the maximum number of steps is min(m,n-intercept).For type="lasso"and especially type="forward.stagewise",there can bemany more terms,because although no more than min(m,n-intercept)vari-ables can be active during any step,variables are frequently droppped and addedas the algorithm proceeds.Although the default usually guarantees that the al-gorithm has proceeded to the saturatedﬁt,users should check.use.Gram When the number m of variables is very large,rger than N,then you maynot want LARS to precompute the Gram matrix.Default is use.Gram=TRUEDetailsLARS is described in detail in Efron,Hastie,Johnstone and Tibshirani(2002).With the"lasso"option,it computes the complete lasso solution simultaneously for ALL values of the shrinkageparameter in the same computational cost as a least squaresﬁt.A"stepwise"option has recentlybeen added to LARS.ValueA"lars"object is returned,for which print,plot,predict,coef and summary methods exist.Author(s)Brad Efron and Trevor HastieReferencesEfron,Hastie,Johnstone and Tibshirani(2003)"Least Angle Regression"(with discussion)Annalsof Statistics;see also /~hastie/Papers/LARS/LeastAngle_2002.pdf.Hastie,Tibshirani and Friedman(2002)Elements of Statistical Learning,Springer,NY.See Alsoprint,plot,summary and predict methods for lars,and rsExamplesdata(diabetes)par(mfrow=c(2,2))attach(diabetes)object<-lars(x,y)plot(object)object2<-lars(x,y,type="lar")plot(object2)object3<-lars(x,y,type="for")#Can use abbreviationsplot(object3)detach(diabetes)rs Plot method for lars objectsDescriptionProduce a plot of a larsﬁt.The default is a complete coefﬁcient path.Usage##S3method for class larsplot(x,xvar=c("norm","df","arc.length","step"),breaks=TRUE,plottype=c("coefficients", "Cp"),omit.zeros=TRUE,eps=1e-10,...)Argumentsx lars objectxvar The type of x variable against which to plot.xvar=norm(default)plots againstthe L1norm of the coefﬁcient vector,as a fraction of the maximal L1norm.xvar=step plots against the step number(which is essentially degrees of free-dom for LAR;not for LASSO or Forward Stagewise).xvar=arc.length plotsagainst the arc.length of theﬁtted vector;this is useful for a LAR object,be-cause the L1norm of its coefﬁcient vector need not be monotone in the steps.xvar=df plots against the estimated df,which is the size of the active set at eachstep.breaks If TRUE,then vertical lines are drawn at each break point in the piecewise linearcoefﬁcient pathsplottype Either coefficients(default)or Cp.The coefﬁcient plot shows the path ofeach coefﬁcient as a function of the norm fraction or Df.The Cp plot shows theCp curve.omit.zeros When the number of variables is much greater than the number of observations,many coefﬁcients will never be nonzero;this logical(default TRUE)avoids plot-ting these zero coefﬁcentseps Deﬁnition of zero above,default is1e-10...Additonal arguments for generic plot.Can be used to set xlims,change colors,line widths,etcDetailsThe default plot uses the fraction of L1norm as the xvar.For forward stagewise and LAR,coefﬁ-cients can pass through zero during a step,which causes a change of slope of L1norm vs arc-length.Since the coefﬁcients are piecewise linear in arc-length between each step,this causes a change in slope of the coefﬁcients.ValueNULLAuthor(s)Trevor HastieReferencesEfron,Hastie,Johnstone and Tibshirani(2003)"Least Angle Regression"(with discussion)Annals of Statistics;see also /~hastie/Papers/LARS/LeastAngle_2002.pdf Yann-Ael Le Borgne(private communication)pointed out the problems in plotting forward stagewise and LAR coefﬁcients against L1norm,and the solution we have implemented.rs7Examplesdata(diabetes)attach(diabetes)object<-lars(x,y)plot(object)detach(diabetes)rs Make predictions or extract coefﬁcients from aﬁtted lars modelDescriptionWhile lars()produces the entire path of solutions,rs allows one to extract a prediction at a particular point along the path.Usage##S3method for class larspredict(object,newx,s,type=c("fit","coefficients"),mode=c("step", "fraction","norm","lambda"),...)##S3method for class larscoef(object,...)Argumentsobject Aﬁtted lars objectnewx If type="ﬁt",then newx should be the x values at which theﬁt is required.If type="coefﬁcients",then newx can be omitted.s a value,or vector of values,indexing the path.Its values depends on the mode= argument.By default(mode="step"),s should take on values between0and p(e.g.,a step of1.3means.3of the way between step1and2.)type If type="ﬁt",predict returns theﬁtted values.If type="coefﬁcients",predict returns the coefﬁcients.Abbreviations allowed.mode Mode="step"means the s=argument indexes the lars step number,and the coef-ﬁcients will be returned corresponding to the values corresponding to step s.Ifmode="fraction",then s should be a number between0and1,and it refers to theratio of the L1norm of the coefﬁcient vector,relative to the norm at the full LSsolution.Mode="norm"means s refers to the L1norm of the coefﬁcient vector.Mode="lambda"uses the lasso regularization parameter for s;for other modelsit is the maximal correlation(does not make sense for lars/stepwise models).Abbreviations allowed....Any arguments for rs should work for rsDetailsLARS is described in detail in Efron,Hastie,Johnstone and Tibshirani(2002).With the"lasso"option,it computes the complete lasso solution simultaneously for ALL values of the shrinkage parameter in the same computational cost as a least squaresﬁt.ValueEither a vector/matrix ofﬁtted values,or a vector/matrix of coefﬁcients.Author(s)Trevor HastieReferencesEfron,Hastie,Johnstone and Tibshirani(2002)"Least Angle Regression"(with discussion)Annals of Statistics;see also /~hastie/Papers/LARS/LeastAngle_2002.pdf.Hastie,Tibshirani and Friedman(2002)Elements of Statistical Learning,Springer,NY.See Alsoprint,plot,lars,rsExamplesdata(diabetes)attach(diabetes)object<-lars(x,y,type="lasso")###make predictions at the values in x,at each of the###steps produced in objectfits<rs(object,x,type="fit")###extract the coefficient vector with L1norm=4.1coef4.1<-coef(object,s=4.1,mode="norm")#orcoef4.1<-predict(object,s=4.1,type="coef",mode="norm")detach(diabetes)rs Summary method for lars objectsDescriptionProduce an anova-type summary for a lars object.Usage##S3method for class larssummary(object,sigma2=NULL,...)Argumentsobject lars objectsigma2optional variance measure(for p>n)...Additional arguments for summary genericDetailsAn anova summary is produced,with Df,RSS and Cp for each step.Df is tricky for some models, such as forward stagewise and stepwise,and is not likely to be accurate.When p>n,the user is responsible for supplying sigma2.ValueAn anova object is returned,with rownames the step number,and with components:Df Estimated degree of freedomRss The Residual sum of SquaresCp The Cp statisticAuthor(s)Brad Efron and Trevor HastieReferencesEfron,Hastie,Johnstone and Tibshirani(2003)"Least Angle Regression"(with discussion)Annals of Statistics;see also /~hastie/Papers/LARS/LeastAngle_2002.pdf.Hastie,Tibshirani and Friedman(2002)Elements of Statistical Learning,Springer,NY.See Alsolars,and print,plot,and predict methods for lars,and rsExamplesdata(diabetes)attach(diabetes)object<-lars(x,y)summary(object)detach(diabetes)Index∗Topic datasetsdiabetes,3∗Topic hplotrs,5∗Topic methodsrs,5rs,7∗Topic regressionrs,2lars,4rs,7rs,8rs(rs),7rs,2diabetes,3lars,4rs,5rs,7rs,810。

类似于lasso的算法

类似于lasso的算法
1. Ridge回归（岭回归），与lasso类似，ridge回归也是一
种用于特征选择和回归分析的方法。

它使用L2正则化来限制模型的
复杂度，可以减少过拟合的风险。

2. Elastic Net，Elastic Net是lasso和ridge回归的结合，它同时使用L1和L2正则化来平衡特征选择和模型复杂度。

这种方
法在处理高维数据和共线性问题时非常有效。

3. Least Angle Regression（最小角回归），LARS是一种用
于高维数据的回归分析方法，它与lasso类似但更加高效。

它可以
快速找到最佳的特征子集，适用于大规模数据集。

4. Forward Stagewise Regression，这是一种渐进式的特征选
择方法，类似于lasso但更加灵活。

它可以逐步添加特征并调整系数，适用于处理大规模数据和高维特征。

以上是一些类似于lasso的常见算法，它们都可以用于特征选
择和回归分析，但在实际应用中需要根据具体情况选择合适的算法。

希望这些信息能够帮助你更好地理解类似于lasso的算法。

Lasso回归

Lasso回归Lasso 是⼀个线性模型，它给出的模型具有稀疏的系数（sparse coefficients）。

它在⼀些场景中是很有⽤的，因为它倾向于使⽤较少参数的情况，能够有效减少给定解决⽅案所依赖变量的个数。

因此，Lasso 及其变体是压缩感知（compressed sensing）领域的基础。

在某些特定条件下，它能够恢复⾮零权重的精确解。

在数学公式表达上，它由⼀个带有l1先验的正则项的线性模型组成。

其最⼩化的⽬标函数是：min w12n samples||Xw−y||22+α||w||1lasso estimator 解决了加上惩罚项α||ω||1的最⼩⼆乘的最⼩化，其中，α是⼀个常数，||ω||1是参数向量l1-norm的范数。

from sklearn.linear_model import Lassolasso = Lasso()lasso.fit([[0, 0], [1, 1]], [0,1])print("coef: {}".format(lasso.coef_))print(lasso.predict([[1, 1]]))coef: [0. 0.][0.5]from sklearn.linear_model import Lassolasso01 = Lasso(alpha=0.1)lasso01.fit([[0, 0], [1, 1]], [0,1])print("coef: {}".format(lasso01.coef_))print(lasso01.predict([[1, 1]]))coef: [0.6 0. ][0.8]在⼈⼯产⽣的被加性噪声污染的稀疏信号上估计Lasso和Elastic-Net回归模型。

估计出的稀疏与真实的稀疏进⾏⽐较。

print(__doc__)import numpy as npimport matplotlib.pyplot as pltfrom sklearn.metrics import r2_score%matplotlib notebook# 产⽣⼀些稀疏数据np.random.seed(42)n_samples, n_features = 50, 200X = np.random.randn(n_samples, n_features) # randn(...)产⽣的是正态分布的数据coef = 3 * np.random.randn(n_features) # 每个特征对应⼀个系数inds = np.arange(n_features)np.random.shuffle(inds)coef[inds[10:]] = 0 # 稀疏化系数--随机地把系数向量1x200的其中190个值变为0y = np.dot(X, coef) # 线性运算--y = X .*w# 添加噪声：零均值，标准差为0.01的⾼斯噪声y += 0.01 * np.random.normal(size=n_samples)# 将数据划分为训练集和测试集n_samples = X.shape[0]X_train, y_train = X[: n_samples // 2], y[: n_samples // 2]X_test, y_test = X[n_samples // 2: ], y[n_samples // 2: ]# 训练 Lasso 模型from sklearn.linear_model import Lassoalpha = 0.1lasso = Lasso(alpha=alpha)y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)r2_score_lasso = r2_score(y_test, y_pred_lasso)print(lasso)print("r^2 on test data:\n{:.2f}".format(r2_score_lasso))# 训练 ElasticNet 模型from sklearn.linear_model import ElasticNetenet = ElasticNet(alpha=alpha, l1_ratio=0.7)y_pred_enet = enet.fit(X_train, y_train).predict(X_test)r2_score_enet = r2_score(y_test, y_pred_enet)print(enet)print("r^2 on test data:\n{:.2f}".format(r2_score_enet))# 画图plt.plot(enet.coef_, color='lightgreen', linewidth=2, label='Elastic net coefficients')plt.plot(lasso.coef_, color='gold', linewidth=2, label='Lasso coefficients')plt.plot(coef, '--', color='navy', label='original coefficient')plt.legend(loc='best')plt.title("Lasso r^2: {:.2f}, ElasticNet r^2: {:.2f}".format(r2_score_lasso, r2_score_enet))Automatically created module for IPython interactive environmentLasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,normalize=False, positive=False, precompute=False, random_state=None,selection='cyclic', tol=0.0001, warm_start=False)r^2 on test data:0.39ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.7,max_iter=1000, normalize=False, positive=False, precompute=False,random_state=None, selection='cyclic', tol=0.0001, warm_start=False)r^2 on test data:0.24<IPython.core.display.Javascript object>Text(0.5,1,'Lasso r^2: 0.39, ElasticNet r^2: 0.24')设置正则化参数alpha 参数控制着估计出的模型的系数的稀疏度使⽤交叉验证scikit-learn 通过交叉验证来公开设置 Lasso alpha 参数的对象：LassoCV 和 LassoLarsCV。

基于最小绝对收缩与选择算子模型稀疏恢复的多目标检测

基于最小绝对收缩与选择算子模型稀疏恢复的多目标检测洪刘根;郑霖;杨超【摘要】针对地面多径环境下运动目标检测,使用最小绝对收缩与选择算子(LASSO)算法在参数估计时会出现伪目标的问题,提出一种基于LASSO模型框架的设计矩阵降维构造方法.首先,信号的多径传播能够带来目标检测的空间分集,信号在不同的多径上有不同的多普勒频移;此外,使用宽带正交频分复用(OFDM)信号能够带来频率分集.由于空间分集和频率分集的引入造成目标的稀疏特性.利用多径的稀疏性和对环境的先验知识,去估计稀疏向量.仿真结果表明,在一定信噪比(SNR,-5 dB)下,基于设计矩阵降维构造方法的改进的LASSO算法比基追踪算法(BP)、DS(Dantzig Selector)、LASSO等传统算法的检测性能有明显提高;在一定虚警率(0.1)条件下,改进的LASSO算法比原LASSO算法检测概率提高了30％.所提算法能够有效去除伪目标,提高雷达目标检测概率.%Focusing on the issue that the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm may introduce some false targets in moving target detection with the presence of multipath reflections,a descending dimension method for designed matrix based on LASSO was proposed.Firstly,the multipath propagation increases the spatial diversity and provides different Doppler shifts over different paths.In addition,the application of broadband OFDM signal provides frequency diversity.The introduction of spatial diversity and frequency diversity to the system causes target spacesparseness.Sparseness of multiple paths and environment knowledge were applied to estimate paths along the receiving target responses.Simulation results show that the improved LASSO algorithm based on the descendingdimension method for designed matrix has better detection performance than the traditional algorithms such as Basis Pursuit (BP),Dantzig Selector (DS) and LASSO at the Signal-to-Noise Ratio (SNR) of-5 dB,and the target detection probability of the improved LASSO algorithm was 30％higher than that of LASSO at the false alarm rate of 0.1.The proposed algorithm can effectively filter the false targets and improve the radar target detection probability.【期刊名称】《计算机应用》【年(卷),期】2017(037)008【总页数】5页(P2184-2188)【关键词】多径效应;稀疏向量恢复;多目标检测;最小绝对收缩与选择算子;正交频分复用信号雷达【作者】洪刘根;郑霖;杨超【作者单位】桂林电子科技大学广西无线宽带通信与信号处理重点实验室,广西桂林541004;桂林电子科技大学信息与通信学院,广西桂林541004;桂林电子科技大学广西无线宽带通信与信号处理重点实验室,广西桂林541004;桂林电子科技大学信息与通信学院,广西桂林541004;桂林电子科技大学广西无线宽带通信与信号处理重点实验室,广西桂林541004;桂林电子科技大学信息与通信学院,广西桂林541004【正文语种】中文【中图分类】TN91地面动目标检测技术在国防和民用中变得越来越重要。

最小角回归算法

最小角回归算法最小角回归（LARS）算法是一种常用于回归分析的算法，特别是当数据集具有高度相关性时，该算法能够在模型选择方面发挥出色的性能。

LARS算法在实践中非常有用，因为它为研究人员提供了一种准确且有效的方式来探测目标变量与其他预测变量之间的关系。

LARS算法的核心思想是一次向前选择，即在每个步骤中，它选择与目标变量最相关的预测变量。

然后根据它们之间的相关性逐步向预测变量添加系数。

除此之外，LARS 算法还能够帮助我们处理高度相关的预测变量并且在预测建模中最小化误差。

LARS算法需要的前提是更多的变量比样本。

这种数据结构很常见，因此，很多实际应用情况中LARS算法都有出色的表现。

一个显著的好处是，LARS算法能够准确地确定影响目标变量的关键预测变量。

这些关键变量可以是单个变量或一组变量，LARS算法可以从关键变量集合中给出精确的预测结果，而不必额外花费时间和精力去分析和测试其它预测变量。

LARS算法的另一个独特优势是，它可以同时从多个预测变量中进行预测，而不必在每次更新时对整个系数向量进行重新计算。

这一过程可以借助于计算简单的辅助变量完成，从而加快模型训练和预测等计算性能。

LARS算法与最小二乘回归不同之处在于，该算法选择一个或多个预测变量来估计目标变量，而最小二乘则处理所有预测变量。

因此，与最小二乘相比，LARS算法在数据集中的离群点和噪声敏感度较低，而且在大规模计算中具有很好的效率和稳定性。

最初，LARS算法应用场景非常有限，但随着大规模数据科学和量化金融的普及，该算法的适用范围得到了大大拓展。

它不仅可以应用于聚类、分类、回归等计算任务，还能够通过模型训练来发现有意义的特定模式和趋势，为实际工程应用开创了新的方法。

总之，LARS算法是一种可靠、高效的算法，常常应用于数据挖掘、建模和分析等方面。

它的特点在于有能力选择最相关的预测变量来估计目标变量，从而使得在数据拥有大量预测变量的情况下也具有出色的性能。

lars特征选择方法

lars特征选择方法
LARS（Least Angle Regression）是一种特征选择方法，它通过最小化角度损失函数和正则化项来寻找最佳特征子集。

LARS是一种逐步的特征选择方法，它通过逐步增加特征集合的大小来构建模型。

在每一步中，LARS选择与当前模型最相关的特征，并将其添加到特征集合中。

同时，LARS也会考虑去除与当前模型不太相关的特征。

LARS的主要优点是它可以找到所有相关的特征，而不仅仅是其中一部分。

此外，由于它是一种逐步的方法，因此可以很容易地理解每个步骤中添加或删除的特征。

在使用LARS进行特征选择时，通常需要设置一个正则化参数来控制模型的复杂度。

较小的正则化参数会导致模型更加复杂，而较大的正则化参数会导致模型更加简单。

在选择正则化参数时，通常需要进行交叉验证以找到最佳的参数值。

总的来说，LARS是一种强大的特征选择方法，它可以有效地找到与预测变量最相关的特征，并构建简单而准确的模型。

Lasso问题与LARS算法

c = XT (y − µA) or c j = ⟨x j, y − µA⟩
(2-20)
为当前各自变量与因变量的相关。若设
C = max{|c j|}
j
(2-21)
为当前最大相关，则根据假定，A中自变量与y的相关都为最大相关，即
⎧ ⎪⎪⎪⎪⎨|c j| = C for j ∈ A
⎪⎪⎪⎪⎩|c j| < C for j A
Figure 2.1中间一幅图给出了这种情况。初始时y与x1的相关度较高，由于每次只逼近一小步，因此需要多次用x1进行逼近，直到x2的相关度变得比x1更大（临界点处y′在x1和x2的角分线上）。之后交替用x2和x1进行逼近，一直达到（或非常接近于）因变量y。
这种算法能够给出最优解，且越小，给出的解就越精确，但算法的复杂度也随的减小而增大。一般情况下，为了保证精度，往往取得很小，因此算法的复杂度往往很高。
yk = ⟨x, φk⟩, k = 1, 2, . . . , n
(2-3)
5
其中x ∈ Rm为待测信号，φk为采样函数（传统时域-频域采样中即为傅里叶级数）， ⟨·, ·⟩表示内积。
如果采样函数是线性的，(2-3)式可写作
yn×1 = Φn×m xm×1
(2-4)
其中Φ ∈ Rn×m为采样矩阵。压缩采样中，我们希望n ≪ m，从而进行大幅度的数据压缩。采样后得到结果y ∈ Rn，我们希望能完好地恢复被测信号x ∈ Rm，即数据还原。
2.4.2 代数定义
前面的几何分析虽然直观，但高维向量“角分线”的定义需要用具体的代数算式给出。此外每次逼近的步长（即何时下一个自变量的相关度增大到与正在计算的自变量相关度相同）也需要计算得到。

lasso最小角回归算法推导

lasso最小角回归算法推导Title: Derivation of the Lasso Least Angle Regression Algorithm The Lasso Least Angle Regression (LARS) algorithm is a powerful tool in statistical learning, combining the principles of both least squares regression and the Lasso regularization technique. This algorithm provides a computationally efficient means to estimate regression coefficients, especially in high-dimensional datasets.拉索最小角回归（LARS）算法是统计学习中的一个强大工具，它结合了最小二乘回归和拉索正则化技术的原理。

该算法提供了一种计算效率高的方法，用于估计回归系数，特别是在高维数据集中。

At its core, the LARS algorithm aims to find a set of coefficients that minimize the residual sum of squares, subject to a constraint on the sum of the absolute values of the coefficients. This constraint introduces sparsity, allowing for the selection of only the most relevant predictors.在核心上，LARS算法旨在找到一组系数，这些系数在最小化残差平方和的同时，还要满足系数绝对值之和的约束。

从理论到应用——浅谈lasso模型

本科生学年论文题目：从理论到应用——浅谈lasso模型指导教师：学院：姓名：学号：班级：从理论到应用——浅谈lasso模型【摘要】回归模型是我们在处理数据中常用的方法。

其中，Lasso模型是一种适用于多重共线性问题，能够在参数估计的同时实现变量的选择的回归方法。

本文从lasso模型的概念谈起，对其起源、思想、与岭回归的比较、通过lar的算法实现等方面进行了探究。

另外还使用R 语言对简单案例进行lasso模型的应用。

最后简述了lasso模型的研究现状。

【abstract】Regression model is our commonly used method in processing data. Lasso model is a kind of regression method for multiple linear problems, which can be used to achieve parameter estimation and variable selection at the same time. This paper starts from the concept of the lasso model, including its origin, ideas, and the comparison of ridge regression, through lar algorithm implementation, etc. In addition, using R language to solve a simple case through lasso. At last, the research status of lasso model is introduced.【关键词】Lasso 岭回归最小角回归R语言【key words】Lasso ridge regression lar R language目录一、定义及基本信息.................................. - 3 -二、起源与原理...................................... - 3 -三、模型的思想...................................... - 3 -四、 Lasso与岭回归 .................................. - 4 -1、岭回归的概念.................................. - 4 -2、 Lasso与岭回归的比较........................... - 4 -五、 Lasso的算法步骤................................. - 5 -1、 lasso算法实现的背景........................... - 5 -2、最小角回归.................................... - 6 -3、用lar实现lasso ............................... - 6 -六、案例分析........................................ - 7 -1、问题描述...................................... - 7 -2、简单线性回归求解.............................. - 8 -3、利用lasso求解............................... - 10 -七、应用与研究现状................................. - 11 -八、参考资料....................................... - 12 -一、定义及基本信息Lasso模型是由Robert Tibshirani在1996年JRSSB上的一篇文章Regression shrinkage and selection via the lasso所提出的一种能够实现指标集合精简的估计方法。

Lasso回归总结

Lasso回归总结Ridge回归由于直接套⽤线性回归可能产⽣过拟合，我们需要加⼊正则化项，如果加⼊的是L2正则化项，就是Ridge回归，有时也翻译为岭回归。

它和⼀般线性回归的区别是在损失函数上增加了⼀个L2正则化的项，和⼀个调节线性回归项和正则化项权重的系数α。

损失函数表达式如下：J(θ)=1/2(Xθ−Y)T(Xθ−Y)+1/2α||θ||22其中α为常数系数，需要进⾏调优。

||θ||2为L2范数。

Ridge回归的解法和⼀般线性回归⼤同⼩异。

如果采⽤梯度下降法，则每⼀轮θ迭代的表达式是：θ=θ−(βX T(Xθ−Y)+αθ)其中β为步长。

如果⽤最⼩⼆乘法，则θ的结果是：θ=(X T X+αE)−1X T Y其中E为单位矩阵。

Ridge回归在不抛弃任何⼀个变量的情况下，缩⼩了回归系数，使得模型相对⽽⾔⽐较的稳定，但这会使得模型的变量特别多，模型解释性差。

有没有折中⼀点的办法呢？即⼜可以防⽌过拟合，同时克服Ridge回归模型变量多的缺点呢？有，这就是下⾯说的Lasso回归。

Lasso回归概述Lasso回归有时也叫做线性回归的L1正则化，和Ridge回归的主要区别就是在正则化项，Ridge回归⽤的是L2正则化，⽽Lasso回归⽤的是L1正则化。

Lasso回归的损失函数表达式如下：J(θ)=1/2n(Xθ−Y)T(Xθ−Y)+α||θ||1其中n为样本个数，α为常数系数，需要进⾏调优。

||θ||1为L1范数。

Lasso回归使得⼀些系数变⼩，甚⾄还是⼀些绝对值较⼩的系数直接变为0，因此特别适⽤于参数数⽬缩减与参数的选择，因⽽⽤来估计稀疏参数的线性模型。

但是Lasso回归有⼀个很⼤的问题，导致我们需要把它单独拎出来讲，就是它的损失函数不是连续可导的，由于L1范数⽤的是绝对值之和，导致损失函数有不可导的点。

也就是说，我们的最⼩⼆乘法，梯度下降法，⽜顿法与拟⽜顿法对它统统失效了。

那我们怎么才能求有这个L1范数的损失函数极⼩值呢？接下来介绍两种全新的求极值解法：坐标轴下降法（coordinate descent）和最⼩⾓回归法（Least Angle Regression， LARS）。

Least angle regression

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efﬁcient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modiﬁcation of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefﬁcients; the LARS modiﬁcation calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modiﬁcation efﬁciently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efﬁcient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

LARS算法简介

LARS算法简介Posted on 2011/04/23 by 郝智恒最近临时抱佛脚，为了讨论班报告Group Regression方面的文章，研究了Efron等人于2004年发表在Annals of Statistics里一篇被讨论的文章LEAST ANGLE REGRESSION。

这篇文章很长，有45页。

加上后面一些模型方面大牛的讨论的文章，一共有93页。

对于这种超长论文，我向来敬畏。

后来因为要报告的文章里很多东西都看不懂，才回过头来研读这篇基石性的文章。

所谓大牛，就是他能提出一种别人从来没有提出过的想法。

大牛们看待问题的角度和常人不同。

比如在回归中常用的逐步回归法。

我们小辈们只知道向前回归，向后回归还有二者结合的一些最基本的想法。

比如向前回归，就是先选择和响应最相关的变量，进行最小二乘回归。

然后在这个模型的基础上，再选择和此时残差相关度最高的（也就是相关度次高）的变量，加入模型重新最小二乘回归。

之后再如法继续，直到在某些度量模型的最优性准则之下达到最优，从而选取一个最优的变量子集进行回归分析，得到的模型是相比原模型更加简便，更易于解释的。

这种方法，牺牲了模型准确性（预测有偏），但是提高了模型的精确度（方差变小）。

大多数本科生对逐步回归的理解也就如此了。

Efron看待这个问题时，比起常人更高了一个层次。

他首先指出，逐步向前回归，有可能在第二步挑选变量的时候去掉和X1相关的，但是也很重要的解释变量。

这是因为它每次找到变量，前进的步伐都太大了，侵略性太强。

因此在这个基础上，Efron提出了Forward stagewise。

也就是先找出和响应最相关的一个变量，找到第一个变量后不急于做最小二乘回归，而是在变量的solution path上一点一点的前进(所谓solution path是指一个方向，逐步回归是在这个方向上进行)，每前进一点，都要计算一下当前的残差和原有的所有变量的相关系数，找出绝对值最大的相关系数对应的变量。

广义线性模型中的参数估计及变量选择方法研究

重庆大学硕士学位论文 1 引言一直都是学者们研究的方向。

本文将提出一种新的岭参数估计方案，并运用在广义线性模型中，通过模拟验证，指出其比目前文献中提出的参数估计方法更加实用。

1.1.2 参数估计的国内外研究现状目前，已经提出了一些方法来解决线性回归模型和广义线性回归模型中参数的估计。

其中较为流行的方法之一是 Hoerl 和 Kennard（1970）提出的岭估计，将∑ 2 最大限度地减少 RSS。

岭估最小二乘估计中加入 L2 惩罚，它运用约束j kβ≤-1计通过定义OLS 估计的压缩，得到一个有偏估计()βˆT TR=X X +kI X y ，k 是常数，I 是单位矩阵，并且它的方差比OLS 估计的要小。

因此，在有一定偏差的均方误差（MSE）下能够获得更好的估计，并且能够整体提高预测能力。

随后，许多文献都对岭参数估计做了相应的研究，比如：Hoerl 和 Kennard(1970)，Hoerl et al.(1975)，Lawless 和 Wang，Masuo (1988), Kibria (2003)，Khalaf 和 Shukur (2013)，Alkhamisi et al.(2006)，Muniz 和 Kibria (2009)，Drugade 和 Kashid (2010)和Khalaf et al.(2013)。

所有的这些和其他替代岭参数估计都主要运用在线性回归模型中。

后来Logistic 回归模型得到了一些重视，同时在Schaeffer et al.(1984)，Schaeffer (1986)，Månsson a 和Shukur (2011)和Kibria et al.(2012)中可以找到一些研究。

并且提出的现阶段比较好的岭参数估计方法。

此外 Liu（1993）提出的 Liu 估计（Liu Estimation）中的参数估计方法也被许多学者所研究。

对最小二乘进行另外的一种约束，从而得到一个新的有偏估计：βL =X +X X +dI βOLS 。

高维回归中的几种变量选择方法

变量选择算法也很多，诸如所有可能子集回归、向前选择、向后消减、逐步回归等，还有日本学者 Akaike（1973 年）[1]提出的 AIC
) b = arg min Y -
X ¢b
2 +l
b
b
2
1
（2）
准则、Mallows（1973 年）提出的 Cp 准则、以及 Schwarz（1978 年）其中，l > 0 是调节参数。
数据微小的变动就会使模型很难解释；其次，面对高维数据和超 Regression），即最小角回归算法来实现，对其进行简单的修正，
高维数据，当变量个数 p 很大时，最优子集方法就不可能对 2p 就可求出 LASSO 估计的系数解路径，在 R 软件中，可利用相关
个模型进行比较。因此，本文主要基于高维数据和超高维数据的软件包实现。
- 28 - 科学技术创新 2019.30
以此确保该系数对应的变量有较大的概率入选模型。Alasso 方法保留了 Lasso 优良性质的同时也有效地减少了参数估计的偏差。同时，Alasso 也是一个凸优化问题，易于求解。
1.3 Group Lasso Group Lasso 方法最早由 Bakin 在 1999 年提出，并给出了相应算法。该方法是 Lasso 方法的自然推广，使用 2 范数作为其惩罚项，Yuan 和 Lin（2006 年）对其做了进一步改进[4]。
2019.30 科学技术创新 - 27 -
高维回归中的几种变量选择方法
胡紫薇
（廊坊银行股份有限公司，河北廊坊 065000）
摘要院高维数据的变量选择是统计学家面临的主要问题之一。随着现代科学与技术的发展，统计分析者面临的数据越来越复杂，数据量也越来越大，海量的高维数据和超高维数据让统计分析工作颇具挑战性，各种各样的污染数据和异常数据也掺杂其

lasso算法

修正的LARS算法和lasso2011/04/25 郝智恒在小弟的上一篇文章中，简单的介绍了LARS算法是怎么回事。

主要参考的是Efron等人的经典文章least angle regression。

在这篇文章中，还提到了一些有趣的看法，比如如何用LARS算法来求解lasso estimate和forward stagewise estimate。

这种看法将我对于模型选择的认识提升了一个层次。

在这个更高的层次下看回归的变量选择过程，似乎能有一些更加创新的想法。

lasso estimate的提出是Tibshirani在1996年RSSB上的一篇文章Regression shrinkage and selection via lasso。

所谓lasso，其全称是least absolute shrinkage and selection operator。

其想法可以用如下的最优化问题来表述：在限制了的情况下，求使得残差平方和达到最小的回归系数的估值。

我们熟悉如何求解限制条件为等号时，回归方程的求解。

也就是用lagrange乘子法求解。

但是对于这种，限制条件是不等号的情况，该如何求解，则有两种想法。

第一种，也是我比较倾向于的方法，是利用计算机程序，对从开始，不断慢慢增加它的值，然后对每个，求限制条件为等号时候的回归系数的估计，从而可以以的值为横轴，作出一系列的回归系数向量的估计值，这一系列的回归系数的估计值就是lasso estimation。

另外一种想法，是借助与最优化问题中的KKT条件，用某个黑箱式的算法，求解。

（本人对于最优化方面的东西实在是不很熟悉，故不在此弄斧，只求抛砖引玉，能有高手给出这种想法的具体介绍。

）lasso estimate具有shrinkage和selection两种功能，shrinkage这个不用多讲，本科期间学过回归分析的同学应该都知道岭估计会有shrinkage的功效，lasso也同样。

rlasso回归的原理 -回复

rlasso回归的原理-回复什么是rlasso回归？rlasso回归是一种基于LASSO（least absolute shrinkage and selection operator）方法的回归技术，它是一种变量选择技术，可以在具有共线性的数据中识别最重要的自变量。

LASSO回归是一种广泛应用于机器学习和统计建模领域的回归方法。

它旨在通过在损失函数中添加一个正则化项来实现变量选择和收缩估计。

LASSO回归通过对不重要的自变量施加惩罚，使其系数为零，从而实现了变量选择。

与传统的LASSO回归方法不同，rlasso回归在惩罚项中引入了自相关结构，用于处理具有时间或空间相关性的数据。

在许多实际问题中，我们经常遇到数据具有时间相关性的情况，例如金融时间序列数据或气象数据。

在这些情况下，传统的LASSO回归方法可能无法很好地处理自相关性，因此rlasso回归则能更好地应对这些问题。

rlasso回归的数学原理可以通过最小化以下目标函数来实现：minimize(1/2n) ∑(yᵢ - β₀- xᵢᵀβ)² + λ ∑ β + ρ ∑ β - βₗ ,其中，yᵢ是因变量的观测值，xᵢ是自变量的观测值，β₀是截距项，β是自变量的系数向量，λ和ρ是正则化参数。

第一项表示普通的最小二乘损失，第二项是L1正则化项，用于变量选择，第三项是rlasso特有的自相关正则化项，用于处理自相关性。

通过调整正则化参数λ和ρ的值，可以控制变量选择和自相关调整的力度。

较大的λ值可以促使更多的自变量系数为零，实现变量选择。

较大的ρ值可以减弱自变量之间的自相关性，适用于处理具有明显自相关性的数据。

与传统的LASSO回归相比，rlasso回归在解决具有自相关的数据时具有一些显著优势。

首先，rlasso回归可以更好地应对自相关性，避免了在自相关的数据上产生虚假变量选择的问题。

其次，rlasso回归可以提供更准确的估计和预测，尤其是在具有明显自相关性的数据中。

基于LASSO_算法的波束空间信道估计

第11期2023年6月无线互联科技Wireless Internet TechnologyNo.11June,2023作者简介:张珍凤(1995 ),女,山西朔州人,助教,硕士研究生;研究方向:无线资源管理,机器学习,无线信道测量与建模㊂基于LASSO 算法的波束空间信道估计张珍凤,张文芳(山西能源学院电气与控制工程系,山西太原030006)摘要:在大规模毫米波(mmWave )天线阵列通信中,多输入多输出(Multiple Input Multiple Output ,MIMO )系统可以使用透镜天线阵列大幅减少射频链的数量,但由于天线的数量远远大于射频链路的数量,信道估计具有挑战性㊂由于波束空间信道具有稀疏的特性,那么用于求解稀疏信号恢复问题的算法,可以作为波束空间信道估计问题的解决方法㊂波束空间信道估计问题建立的模型是基于l 0-范数的非凸性问题,该问题为NP -hard ㊂通常用l 1-范数代替l 0-范数,将该问题转化为凸优化问题㊂该凸优化问题可以用传统的贪心算法方法进行求解㊂然而,这些贪心算法估计精度差㊂而且随着稀疏度的增加,计算复杂度也会增加㊂文章提出了最小角回归(Least Angle Regression ,LARS )算法和改进的最小绝对收缩和选择算子(Least Absolute Shrinkage and Selection Operator ,LASSO )算法,高效地解决了稀疏信号的恢复问题,即波束信道的估计㊂实验仿真结果表明,LARS 算法和所提出的改进LASSO 算法能够获得比传统OMP 算法更高效且具有更好的波束空间信道估计精度㊂关键词:波束空间信道估计;OMP 算法;LARS 算法;LASSO 算法中图分类号:TN929.5㊀㊀文献标志码:A 0㊀引言㊀㊀在5G 及以上频段毫米波(mmWave)大规模的移动通信中,多输入多输出(MIMO)是其中的关键技术之一[1]㊂为了降低大量天线的硬件成本和相关射频链所带来的功耗,透镜天线阵列作为一种高效节能的混合预编码在毫米波大规模MIMO 技术中得到广泛应用[2-3]㊂因为透镜天线阵列可以用不同加权系数的天线实现发射或接收空间不同方向的信号㊂透镜天线阵列线由于具有空间波束赋形能力,它可以将空间信道转换为波束空间信道,对信号进行空域滤波[4]㊂由于在毫米波频率下,只有少数具有较大路径增益的主导传播路径,因此,毫米波大规模MIMO 系统的波束空间信道中很多值为零,只有部分值不为零,所以波束空间信道是稀疏的[5]㊂因此,通过选择增益较大的主要波束连接到数字基带的射频链(Radio Frequency,RF),可以使频链数量大大减少㊂由于信道维度高,而且波束选择需要波束空间中准确的信道状态信息(Channel State Information,CSI)[6],特别是当射频链的数量远远小于天线的数量时[7],如何获得波束空间信道是一个非常具有挑战性的问题㊂1㊀系统模型建立1.1㊀空间波束信道㊀㊀基于毫米波的时分双工(Time Division Duplex,TDD)系统如图1所示㊂基站(Base Station,BS)是一个由N 根天线和N RF 条射频链路组成的透镜天线阵列,此透镜天线阵列为K 个单天线用户服务㊂为得到波束信道估计问题,首先,在传统的毫米波MIMO 信道中,根据广泛使用的Saleh -Valenzuela 信道模型[8],在第k 个用户和M 根天线的基站之间大小为M ˑ1信道矢量h k 表示为:h k =M L k ðLk l =1βk ,l a (θazi k ,l ,θelek ,l )=M L k ðLkl =1c k ,l(1)其中,L k 是可分辨的信道路径数,c k ,l =βk ,l a (θazi k ,l ,θele k ,l )是第l 条路径,其中,θazi k ,l ,θelek ,l ,βk ,l 分别是第l 条路径的水平角㊁俯仰角和复增益㊂a (θazi k ,l ,θele k ,l )是一个Mˑ1维的阵列导向矢量,它和阵列几何结构有关㊂对于均匀线性阵列(Uniform LinearArrays,ULAs),阵列导向矢量与一个角度有关,表示㊀㊀㊀图1㊀　透镜天线阵列多波束空间发射系统如下[10]:a ULA (θ)=1M[e -j 2πd sin(θ)m/λ](2)其中,m =[0,1, ,(M -1)]T㊂空间域信道使用透镜天线阵列转换成波束信道㊂透镜阵列相当于空间离散傅里叶变换矩阵F 大小为M ˑM ㊂对于ULAs,矩阵F 表示为:F =[a ULA (ψ1),a ULA (ψ2), ,a ULA (ψM )]H(3)其中,ψn =1Mm -M +12(),m =1,2, ,M 表示透镜天线阵列的空间方向㊂相似的,a ULA (ψ)表示为:a ULA (θ)=1M [e-j 2πψm](4)最后,波束信道矢量h k 的大小为M ˑ1在第k 个用户和M 根天线的基站能被表示为:h k =Fh k =M L k ðLk l =1ck ,l(5)其中,c k ,l =F c k ,l 是波束信道的第l 条信道分量㊂1.2㊀波速信道估计问题形成㊀㊀为了获得信道状态信息(CSI),通过Q 个时隙,所有用户把已知的导频符号发送给基站(BS)㊂由于TDD 信道的互易性,只考虑上行链路来获得信道估计㊂然后,根据估计的上行通道直接得到下行通道㊂本文采用了广泛使用的正交导频传输策略,其中由于导频的正交性,每个用户的上行信道估计是独立的㊂因此可以逐个估计所有K 个用户和BS 之间的波束空间信道向量㊂一般情况下,以第k 个用户和BS 之间的波束空间信道向量h k 为例来描述信道估计问题㊂在导频传输的第q 时刻,基站侧基带经过波束选择后,N RF ˑ1维的测试信号y k ,q 表示为[9]y k ,q =A k ,q h k s k ,q +n k ,q ,q =1,2, Q(6)其中,A k ,q 是N RF ˑM 波束选择网络,s k ,q 是传送的导频信号,n k ,q =A k ,q n k ,q 是有效的噪声向量,其中n k ,q ~CN (0,σ2m I M )是M ˑ1噪声向量,其中σ2m 为噪声功率㊂在经过Q 时刻的导频传输后,假定s k ,q =1,q =1,2, ,Q ,可以得到N ˑ1(N =QN RF )维的测试信号y k 为y k =y k ,1y k ,2︙y k ,Qéëêêêêêùûúúúúú=A k h k +n k (7)其中,A k =[A T k ,1,A T k ,2, ,A T k ,Q ]T是NˑM 维的选择矩阵,同时,该矩阵要乘以ʃ1N系数㊂n k =[n T k ,1,n T k ,2, ,n T k ,Q ]T是在Q 时段内的Nˑ1维的噪声向量㊂由于导频的正交性,所以信道估计算法对于K 个用户是相同的,问题可以进行简化,表示为:y =Ah+n (8)其中,波束信道h 中的元素在接近于信道路径实际空间角度时有很大的值㊂在毫米波频段,由于散射的限制有很少的传播路径㊂因此,波束信道h 是稀疏的[5]㊂因此,可以基于压缩理论中的稀疏信号恢复算法用很少的导频信号去估计波束信道㊂在压缩采样(Compressive Sampling,CS)中,矩阵A 为感知矩阵㊂波束信道估计问题可以转化为压缩采样信号理论中的稀疏信号的恢复问题[13]㊂min h 0,㊀㊀s.t. y-Ah ɤε(9)其中, h 0是h 中的非零元素,ε是可容忍误差参数㊂由于l0-范数的非凸性,上式中的问题为NP-hard[10]㊂因此,通常用l1范数代替l0范数[11],将该问题转化为凸优化问题㊂目前已有一些传统的贪心算法,如有正交匹配追踪(Orthogonal Matching Pursuit,OMP)[8]算法,它可以从带噪测量信号中获得稀疏的空间波束信道信息,还有基于压OMP改进的缩采样匹配追踪(Compressive Sampling Matching Pursuit,CoSaMP)[11]算法等都可以实现稀疏信号恢复㊂但是,这些贪心算法不能达到令人满意的估计精度㊂特别是随着稀疏度的增加,这些贪心算法的计算复杂度也会增加㊂2㊀LARS算法和改进LASSO算法2.1㊀LARS算法㊀㊀最小绝对收缩和选择算子(LASSO)问题描述的是一类有约束的优化问题[12]㊂具体定义如下: (α,β)=argmin y-Xβ-α l2s.t. β l1ɤε(10)其中,X=[x1, ,x m]ɪR nˑm是自变量矩阵,y= [y1, ,y n]ɪR nˑ1是因变量,取常数α=y-Xβ㊂用x iɪR n对y iɪR n进行线性回归,并且回归系数β的l1范数不超过ε值㊂LASSO问题是在回归系数β的l1范数满足一定约束条件下的线性回归问题㊂目前,在压缩采样(Compressive Sampling,CS)技术中大多采用LASSO问题的方式描述㊂对于线性压缩采样信号恢复问题,可以用下面的方程描述:y nˑ1=Φnˑm x mˑ1(11)其中,ΦɪR nˑm,n≪m为采样矩阵㊂yɪR n为经过压缩采样后得到的数据㊂为了获得原始被测信号xɪR m需要求解下面一个优化问题:min x l1,㊀㊀s.t.y=Φx(12)或存在误差时min x l1,㊀㊀s.t. y-Φx l2ɤε(13)在近似求解方程(11)的所有解中,当x的l1范数最小时,得到的就是最优稀疏解㊂由上可知,LASSO 问题中回归系数β的最优解问题就是稀疏信号恢复优化问题的最优x求解㊂由于波束信道估计问题本质上是一个稀疏信号恢复的问题㊂因此,通常在波束信道估计中,把l0-范数非凸优化问题转化为l1-范数凸优化问题后,波速信道估计优化问题的求解就和信号恢复优化问题对应,只要把原始被测信号x换做h,采样矩阵Φ换做感知矩阵A㊂那么就可以使用求解LASSO问题的方法进行波束信道估计㊂最小角回归(Least Angle Regression,LARS)算法能够高效地求解LASSO问题㊂LARS算法利用高维向量的角分线 ,先找到与因变量相关度最大的自变量,并用其对因变量逼近㊂多次逼近直到残差足够小或已取得了所有自变量,算法结束㊂下面将介绍LARS算法解决LASSO问题(波束信道估计)的具体过程[12]㊂在前向逐步回归算法中,它按顺序构建子空间,每次向子空间添加一个变量㊂在每一步中,它都确定要在活动集中添加最佳变量,然后更新最小二乘拟合以包含所有活动变量㊂LARS采用相似的策略㊂首先,它确定与测量值最相关的活动变量㊂LAR不是完全拟合该活动变量,而是将该活动变量的回归系数连续地沿其最小二乘值方向移动㊂在移动的过程中,其与残差的相关性减弱㊂当有新的自变量在与残差的相关性相同时,这个过程就会停止㊂然后,将第二个新自变量加入活动集,将它们构成的回归系数移动到一起,使它们的相关性保持紧密和残差的相关性递减㊂这一过程一直持续到所有自变量都在子空间中,并在完全最小二乘拟合时结束㊂具体如算法1所示㊂在第k步开始时,设J k为活动变量集,h Ak为系数变量;将会有k-1个非零值,并且刚刚输入的值将为零㊂如果当前的残差为r k=y-A Jkh Jk,则这一步的方向为:δk=(A T JkA Jk)-1A T J k r k(14)则预测参数将变为h Jk(α)=h J k+αδk㊂如果这一步开始时的拟合向量是f^k,那么它就演变为f^k(α)=f^k +α㊃uk,其中,u k=A J kδk是新的拟合方向㊂最小角这个名字来源于对这个过程的几何解释;u k与J k中的每个预测值形成最小(且相等)的角度㊂算法1:最小二乘回归(LARS)(1)将自变量因子A标准化,使之具有零均值和单位范数㊂以残差r=y-y,h1,h2, ,h k=0开始㊂(2)找到与r最相关的自变量a j㊂(3)将h j从0移向其最小二乘系数 a j,r⓪,直到其他自变量a k与当前残差的相关性与a j相同㊂(4)沿着 a j,a k⓪上当前残差的联合最小二乘系数定义的方向移动h j和h k,直到其他自变量a l与当前残差有相同的相关性㊂(5)按照这种方式继续下去,直到输入了所有k 个自变量㊂经过min(N-1,k)步,得到完整的最小二乘解㊂2.2㊀改进LARS算法求解LASSO问题㊀㊀LAR(LASSO)算法是一种高效的算法,因为它与使用k个自变量的单个最小二乘拟合具有相同的计算顺序㊂最小角度回归总是需要k步才能得到完整的最小二乘估计㊂虽然两种算法有相似之处,但LASSO算法需要经过多于k步才能得到最小二乘估计的解㊂为此对算法1进行修正得到算法2㊂算法2是计算任何LASSO问题解的一种有效方法,特别是当k远远大于N的时候㊂Osborne等[12]还发现了计算LASSO问题的分段线性路径,称之为同伦算法㊂算法2:最小二乘回归(LARS) 改进LASSO如果非零参数达到零,则将其自变量从活动变量集中删除,并重新计算当前联合最小二乘方向㊂2.3㊀近似消息传递算法㊀㊀迭代近似消息传递(Approximate Message Passing, AMP)算法收敛速度快,可以恢复计算复杂度低的稀疏信号,特别是对高维稀疏信号㊂在本小节中,将介绍AMP算法如何估计波束空间信道,如算法3所示㊂在算法3中,步骤1中的项b t v t-1和c t v∗t-1被称为Onsager修正,它们被引入AMP算法以加速收敛㊂AMP算法的关键步骤是步骤4,通过软阈值收缩函数ηst获得第t次迭代中估计的h ^t+1㊂收缩函数ηst是非线性元素运算,它考虑了矢量h 的稀疏性,使得h ^t+1的估计值更稀疏㊂对于第i个元素r t,i=|r t,i|e jw t,i(i=1, 2, ,N)的输入向量r t,有[ηst(r t;λt,σ2t)]i=ηst(|r t,i|e jw t;λt,σ2t)=max(|rt,i|-λtσt,0)e jw t,i(15)其中,w t,i为复值元素r t,i的相位,λt为第t次迭代中预先确定的固定参数,σ2t通过估计步骤2中的噪声方差来更新㊂由(15)可以发现,软阈值收缩函数ηst可以将低功率复值输入的幅值缩小到零㊂在步骤5和步骤6中,分别计算了收缩函数ηst在输入向量r及其共轭向量r∗处的逐元素导数,得到b t+1和c t+1㊂AMP算法能很好地处理大规模稀疏信号恢复问题㊂算法3:近似消息传递(AMP)输入:测量矩阵y,感应矩阵A,迭代数目T初始化:v-1=0,b0=0,c0=0,h ^0=0for t=0,1, ,T-1㊀do1㊁v t=y-Ah ^t+b t v t-1+c t v∗t-12㊁σ2t=1M v t 223㊁r t=h t+A T v t4㊁h ^t+1=ηst(r t;λt,σ2t)5㊁b t+1=1MðN i=1∂ηst(r t,i;λt,σ2t)∂r t,i6㊁c t+1=1MðN i=1∂ηst(r t,i;λt,σ2t)∂r∗t,iend for输出:稀疏信号恢复结果:h ^=h ^T3　实验仿真㊀㊀在仿真实验中,笔者认为BS有M=256个透镜天线阵列和N RF=16个射频链㊂单天线用户数设置为K=16㊂测量次数设置为N=128㊂上行信道估计的信噪比定义为1/σ2m㊂然后,根据Saleh-Valenzuela信道模型生成空间信道样本㊂对于(1)中的Saleh-Valenzuela信道模型,路径数为L k=3,βk,l~CN(0, 1),θk,l~U(-π2,π2),l=1,2,3㊂笔者为每个用户k 设置相同的信道参数验证LARS算法和改进LASSO 算法㊂最后,利用归一化均方误差(NMSE)评价LARS算法的性能㊂NMSE=EðK k=1 h^k-h k 22{}EðK k=1 h k 22{}(16)为了找出训练阶段的最佳信噪比设置,笔者对不同信噪比下算法的NMSE性能进行了比较,如图2所示㊂在本小节中,笔者考虑了ULA的Saleh-LASSO 算法和LARS 算法的波束空间信道估计性能比较㊂图2中为在不同信噪比下上述3种算法的NMSE 性能比较㊂如图2所示,随着信噪比增加,波束空间信道估计精度逐渐提高㊂同时,从图1可以得出与传统的OMP 算法相比,所提出的改进LASSO 算法和LARS 算法在所有考虑的信噪比区域都具有更低的估计误差,改进LASSO 算法比LARS 算法具有更好的NMSE 性能㊂图2㊀基于Saleh -Valenzuela 信道模型的ULAs NMSE 性能比较在图3中,笔者进一步比较了OMP 算法㊁改进LASSO 算法和AMP 算法在不同信噪比下的NMSE 性能㊂如图3所示,随着信噪比增加,波束空间信道估计精度逐渐提高㊂与传统的OMP 算法相比,AMP 算法和所提出的改进LASSO 算法在所有考虑的信噪比区域都具有更低的估计误差㊂而且AMP 算法和所提出的改进LASSO 算法具有相似的NMSE 性能㊂仿真结果表明,LARS 算法和所提出的改进LSSSO 算法能够获得比传统OMP 算法有更好的波束空间信道估计精度㊂图3㊀基于Saleh -Valenzuela 信道模型的ULAs NMSE 性能比较4㊀结语㊀㊀本文提出了基于LARS 算法和改进LASSO 算法来解决毫米波大规模MIMO 系统中的波束空间信道估计问题㊂仿真结果表明,与现有的OMP 和其他传统的波束空间信道估计方案相比,考虑基于LARS 算法和改进LASSO 算法的信道估计能够以较低的导频开销获得更好的估计精度㊂在未来的工作中,笔者将遵循所提出的LARS 算法的思想,通过考虑太赫兹信道特性来解决太赫兹通信中的信道估计问题㊂参考文献[1]MUMTAZ S ,RODRIGUEZ J ,DAI L L.mmWavemassive MIMO :A paradigm for 5G [EB /OL ].(2016-11-21)[2023-04-18].https :// /usercenter /paper /show ?paperid =15040gt0cm0m0000y 83v04f05n285236&site =xueshu_se.[2]AI B ,MOLISCH A F ,RUPP M ,et al.5G keytechnologies for smart railways [J ].Proceedings of theIEEE ,2020(6):856-893.[3]YONG Z ,RUI limeter Wave MIMO with lensantenna array :a new path division multiplexing paradigm[J ].IEEE Transactions on Communications ,2016(4):1557-1571.[4]ZENG Y ,ZHANG R ,CHEN Z N.Electromagneticlens -focusingantennaenabledmassiveMIMO :performance improvement and cost reduction [J ].IEEEJournal on Selected Areas in Communications ,2014(6):1194-1206.[5]BRADY J ,BEHDAD N ,SAYEED A M.BeamspaceMIMO for millimeter -wave communications :system architecture ,modeling ,analysis ,and measurements [J ].IEEE Transactions on Antennas &Propagation ,2013(7):3814-3827.[6]SRINIDHI N ,DATTA T ,CHOCKALINGAM A ,etyered tabu search algorithm for large -MIMO detection and a lower bound on ML performance [J ].IEEE Transactions on Communications ,2011(11):2955-2963.[7]AMADORI PV ,MASOUROS C.Low RF -complexity millimeter -wave beamspace -MIMO systemsbybeamselection [J ].IEEE TransactionsonCommunications,2015(6):2212-2223.[8]GAO X Y,DAI L L,HAN S F,et al.Reliable beamspace channel estimation for millimeter-wave massive MIMO systems with lens antenna array[J]. IEEE Transactions on Wireless Communications,2017 (9):6010-6021.[9]NATARAJAN B K.Sparse approximate solutions to linear systems[J].Siam Journal on Computing,1995 (2):227-234.[10]FUCHS J J.On sparse representations in arbitrary redundant bases[J].IEEE Transactions on Information Theory,2004(6):1341-1344.[11]TROPP J A.Greed is good:algorithmic results for sparse approximation[J].IEEE Transactions on Information Theory,2004(10):2231-2242. [12]HASTIE T,TIBSHIRANI R,FRIEDMAN J.The elements of statistical learning:data mining,inference, and prediction[M].Berlin:Springer,2009. [13]WEI X,HU C,DAI L.Deep learning for beamspace channel estimation in millimeter-wave massive MIMO systems[J].IEEE Transactions on Communications, 2021(1):182-193.(编辑㊀王雪芬)Beam space channel estimation based on LASSO algorithmZhang Zhenfeng Zhang WenfangDepartment of Electrical and Control Engineering Shanxi Institute of Energy Taiyuan030006 ChinaAbstract In large-scale millimeter wave mmWave antenna array communication Multiple Input Multiple Output MIMO systems can use lens antenna arrays to significantly reduce the number of RF chains but channel estimation is challenging because the number of antennas is much larger than the number of RF links.Since the beam-space channel is sparse the algorithm used to solve the problem of sparse signal recovery can be used as a solution to the problem of beam-space channel estimation.The model established for this problem is a non-convexity problem based on l0-norm which is NP-hard.The l0-norm is usually replaced by l1-norm to transform the non-convexity into a convex optimization problem.The convex optimization problem can be solved by the traditional greedy algorithm. However these greedy algorithms have poor estimation accuracy.Moreover with the increase of sparsity the computational complexity will also increase.In this paper an algorithm for Least Angle Regression LARS and an improved Least Absolute Shrinkage and Selection Operator LASSO the LASSO algorithm is efficient to solve the sparse signal recovery that is the estimation of the beam channel.Experimental simulation results show that LARS algorithm and the proposed improved LASSO algorithm can obtain more efficient and better accuracy of beam space channel estimation than the traditional OMP algorithm.Key words beam space channel estimation OMP algorithm LARS algorithm LASSO algorithm。

机器学习中的稀疏表示方法

机器学习中的稀疏表示方法随着数据量和特征维度的不断增加，在机器学习中，如何实现高效的特征选择和数据降维成为了重要的研究问题之一。

稀疏表示方法就是在这个背景下应运而生的一种重要技术。

由于其具有高效、可解释性等优秀特性，因此在数据分析、图像处理、信号处理等领域都得到了广泛的应用。

本文将从什么是稀疏表示、稀疏表示的求解算法等方面对机器学习中的稀疏表示方法进行详细介绍。

一、稀疏表示的概念稀疏表示是指用尽可能少的基函数来表示信号，从而实现数据的压缩或降维。

在机器学习中，常用的基函数有Discrete Cosine Transform(DCT)、Karhunen-Loève Transform(KLT)、Wavelet Transform(WT)等。

这些基函数都能实现一种表示方法，即只有很少的系数会被激活，而其他的系数则保持为零。

一个简单的例子，假设我们有一个数据集D，其中每个数据样本为$x \in R^d$，则通常我们可以用以下线性模型去表示这个数据集：$$\min_{w_i} \sum_{i=1}^{d}{\left \| Xw_i - x_i \right \|_2^2} + \lambda\left \| w_i \right \|_1$$其中，$X$是基向量矩阵，$w_i$是用于表示$x_i$的系数向量，$\left \| \cdot \right \|$是$l_1$范数，$\lambda$是控制稀疏度的超参数。

通常，$l_1$范数最小化问题的解具有很强的稀疏性，即只有少数的元素被激活，而其他的元素均为零。

二、稀疏表示的求解算法上述线性模型的求解问题属于优化问题，通常我们可以采用一些求解稀疏表示问题的算法来实现。

1. LARS算法Least Angle Regression(LARS)算法是一种线性模型求解算法，它能够计算出一系列用于表示目标函数的基向量，从而解释数据集的大部分方差。

它可以看做是一种逐步回归算法的改进。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

˜i = Xi − mean(Xi ) • Normalize the columns of X : X std(Xi ) ˜ have zero mean, and since y • Since the columns of X ˜ has zero mean, we have p = 0. We t ˜ t ˜ ˜. All the columns inside X ˜ ˜ ˜ have can thus use the normal equations: X X β = X y ˜ to ﬁnd β a standard deviation of one. This increase numerical stability. ˜ has been computed on the “normalized columns” X ˜i and p = 0. Let’s convert • Currently, β back the model so that it can be applied on the un-normalized X and y : * βi = std(y ) ˜ βi , i = 1, . . . , n std(Xi )
m
Robust Regression: β ∗ is the solution to: min ( Xβ − y 1 ) = min
β β i=1
|X(i) β − yi |
where | · | is the absolute value. We will not discuss here Robust regression algorithms. To ﬁnd the solution of a classical Linear Regression, we have to solve: Error2 (β ∗ ) = min (Xβ − y )2
Let’s assume that we want to perform a simple identiﬁcation task. Let’s assume that you have a set of n + 1 measures (x(1) , x(2) , . . . , x(n) , y ). We want to ﬁnd a function (or a model) that predicts the measure y in function of the measures x(j ) , j = 1, . . . , n. The vector x ∈ n = (x(1) , . . . , x(n) )t has several names: • x is the regressor vector. • x is the input. • x is the vector of independent variables. y has several names: • y is the target. • y is the output. • (y is the dependant variable). The pair (x, y ) has several names: • (x, y ) is an input↔output pair. • (x, y ) is a sample. We want to ﬁnd f such that y = f (x). Let’s also assume that you know that f (x) belongs to the family of linear models. I.e. f (x) = n j =1 x(j ) βj + p =< x, β > +p where < ·, · > is the dot product of two vectors and p is the constant term. You can also write f (x) = x t β + p where β ∈ n and p are describing f (x). We will now assume, without loss of generality, that p = 0. We will see later how to compte p if this is not the case. β is the model that we want to ﬁnd. Let’s now assume that we have many input↔output pairs. i.e. we have (x(1) , . . . , x(n) )(1) = X(1) ↔ y1 X(2) ↔ y2 . . . . . . X(m) ↔ ym
LARS Library: Least Angle Regression Stagewise Library
Frank Vanden Berghen
IRIDIA, Universit´ e Libre de Bruxelles fvandenb@iridia.ulb.ac.be
November 22, 2005
β
(3)
If we deﬁne the total prediction error TPError as
m
Sum of Squared Errors(β ) =
i=1
(X(i) β − yi )2 = (Xβ − y )2 = Xβ − y
2
where · 2 is the L2-norm of a vector, we obtain the Classical Linear Regression or Least Min-Square(LS): Classical Linear Regression: β ∗ is he solution to: min ( Xβ − y 2 ) = min (Xβ − y )2
β β
Such a deﬁnition of the total prediction error TPError can give un-deserved weight to a small number of bad samples. It means that one sample with a large prediction error is enough to change completely β ∗ (this is due to the squaring eﬀect that “ampliﬁes” the contribution of the “outliers”). This is why some researcher use a L1 − norm instead of a L2 − norm:
β
= min (Xβ − y )t (Xβ − y )
β
= min (β t X t − y t )(Xβ − y )
β
= min (β t X t Xβ − y t Xβ − β t X t y + y t y )
β
Since y t Xβ is a scalar, we can take its transpose without changing its value: y t Xβ = (y t Xβ )t = β t (y t X )t = β t X t y . We obtain: Error2 (β ∗ ) = min β t X t Xβ − 2β t X t y
β
Error2 (β ∗ ) is minimum ⇔ ⇔ ⇔
∂Error2 (β ) ∗ (β ) = 0 ∂β 2X t Xβ ∗ − 2X t y = 0 X t Xβ ∗ = X t y (4)
3 The equation 4 is the well-known normal equation. On the left-hand-side of this equation we ﬁnd X t X = C ∈ n×n : the correlation matrix. C is symmetric and semi-positive deﬁnite (all t the eigenvalues of C are ≥ 0). The element Cij = m k=1 Xik Xjk =< Xi , Xj >= Xi Xj is the correlation between column i and column j (where Xi stands for column i of X ). The righthand-side of equation 4 is also interesting: it contains the univariate relation of all the columns of X (the variables) with the target y . We did previously the assumption that the constant term p of the model is zero. p is null when mean(y ) = 0 and mean(Xi ) = 0 where mean(·) is the mean operator and Xi is the ith column of X . Thus, in the general case, when p = 0, the procedure to build a model is: • Compute mean(y ), std(y ), mean(Xi ), std(Xi ) where std(·) is the “standard deviation” operator. • Normalize the target: y ˜= y − mean(y ) std(y )
t β = y . Using matrix notation, β is the solution of: We want to compute β such that ∀i X( i i)