Optimization of Cooperative Spectrum Sensing in Ad-hoc Cognitive Radio Networks
Wenfang Xia, Wei Yuan, Wenqing Cheng, Wei Liu, Shu Wang, Jing Xu
Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
Email: {xiawf,yuanwei,chengwq,liuwei,shuwang,xujing}@
Abstract—Spectrum sensing is an essential functionality of cognitive radio networks (CRN). Among existing spectrum sensing methods, cooperative spectrum sensing is the most promising, because it achieves superior sensing performance by exploiting the spatial diversity of sensing data sources. Such cooperation also introduces additional information exchange, which leads to extra power consumption and reporting delay. In this paper, the optimal sensing performance problem is formulated as a nonlinear binary integer programming problem whose solution selects the cooperative nodes that minimize the average detection Bayesian risk. The binary particle swarm optimization (BPSO) algorithm is adopted to obtain suboptimal selections of cooperative nodes.
Computer simulations show that, across different scenarios, the proposed scheme significantly improves sensing performance compared with the case in which all neighboring nodes participate in sensing indiscriminately.
Index Terms—cooperative spectrum sensing; nonlinear binary integer programming; binary particle swarm optimization.
¹This work was supported in part by the National Natural Science Foundation of China under Grants No. 60602029 and No. 60772088.
I. INTRODUCTION
Frequency resources for wireless communication are scarce, while licensed spectrum bands are poorly utilized. This conflict motivates cognitive radio (CR) technology, which allows secondary networks to share licensed spectrum bands without causing harmful interference to the primary users (PU) [1]. CR systems should therefore perform spectrum sensing continuously to reliably detect weak primary signals of possibly unknown types. An accurate spectrum sensing method is important to maximize the throughput of CR networks (CRN) while keeping the interference to the PU below a given constraint. Meeting this requirement is challenging in wireless communication systems because of factors such as multipath fading, shadowing, interference, and noise [2], [3]. Recent research indicates that cooperative spectrum sensing can lessen the impact of these factors and improve sensing performance by exploiting the spatial diversity of sensing data sources [2].
Most existing work on cooperative spectrum sensing focuses on data fusion algorithms and assumes that all cognitive nodes have the same sensing performance. The sensing capabilities of cognitive nodes in fact differ because of geographical location, antenna gain, or RF amplifier performance [4]. To reduce the impact of unreliable cognitive nodes, the optimal data fusion algorithm [5] and the weighted cooperation method [6] were proposed. However, the extra cost in power and bandwidth resources for
reporting unreliable results cannot be avoided. Taking the sensing capability of cognitive nodes into account, [4] proposed a sensing optimization scheme to improve collaborative sensing performance. However, that scheme assumes a perfect reporting channel, which is not realistic [7], and it ignores the delay of information exchange. The reporting delay grows with the number of cooperative nodes, which shortens the time available for observation and data transmission. The pipelined cooperative spectrum sensing strategy [8] can reduce the reporting delay to the greatest extent, but it requires strict synchronization of all cooperative nodes. This is a great challenge for CRN given the requirement for low hardware complexity, especially in ad-hoc networks, where the fusion center changes continually [9]. To obtain the optimal sensing performance at the least cooperative cost, this paper explores a cooperative node selection scheme based on the individual characteristics of each node.
The optimization of cooperative spectrum sensing is formulated as a nonlinear binary integer programming (NLBIP) problem whose variables select the cooperative nodes. The binary particle swarm optimization (BPSO) method is adopted to solve this problem. Suitable nodes are selected for cooperative sensing according to their individual characteristics, including sensing capability and the reporting delay of their local decisions. The selection aims to obtain the minimum detection risk under a given control-channel bandwidth.
The rest of this paper is organized as follows. Section II describes the motivation of the proposed cooperative node selection scheme. Section III introduces the cooperative sensing scheme. Section IV presents the concept and mathematical formulation of the optimal cooperative sensing performance problem and details the BPSO algorithm used to solve it. Section V provides computer simulations to verify the performance
of the proposed algorithm. Finally, Section VI concludes the paper.
II. MOTIVATION
Adding cooperative nodes helps improve sensing performance [10], but it costs additional bandwidth and power. The sensing capabilities of cognitive nodes may differ from each other [4], and the attenuation of the reporting channels is rarely the same, which strongly influences cooperative sensing performance [7]. Different cognitive nodes therefore contribute differently to cooperative sensing performance [4]. In conventional cooperative sensing schemes, all cognitive nodes participate in sensing indiscriminately. Nodes with low sensing capability then contribute nothing to cooperative sensing performance but still add cooperation cost.
Fig. 1. A typical scene of cooperative sensing
To explain the problem, consider the scenario of Fig. 1. CR0 has three neighboring nodes, CR1, CR2, and CR3; it needs to transmit data and acts as the fusion center. CR0 and CR3 are shadowed over the sensing channel and cannot reliably sense the presence of the PU because of the low SNR of the received signals. CR1 is shadowed over the reporting channel, so its decisions cannot be reported to CR0 reliably [11].
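The BPSK reporting model introduced in Section IV estimates how much a shadowed reporting channel hurts. In that model, a local decision is flipped with probability $P_e = Q(\sqrt{2\rho})$. The Python sketch below is illustrative only. It takes the control-channel SNRs of CR1 (−5 dB) and CR2 (5 dB) from Table I. The local detection probability of 0.9 (`LOCAL_PD`) and the helper names are hypothetical; they are not values or code from the paper.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x), Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bpsk_error_rate(rho_db: float) -> float:
    """BPSK bit error rate on the reporting channel, P_e = Q(sqrt(2 * rho))."""
    rho = 10.0 ** (rho_db / 10.0)  # dB -> linear SNR
    return q_function(math.sqrt(2.0 * rho))


def after_reporting(p_local: float, p_e: float) -> float:
    """Probability that the fusion center receives '1' when the node decides '1'
    with probability p_local and each report is flipped with probability p_e."""
    return p_local * (1.0 - p_e) + (1.0 - p_local) * p_e


# Control-channel SNRs of CR1 and CR2 (Table I); the local P_d of 0.9 is hypothetical.
LOCAL_PD = 0.9
for name, rho_db in [("CR1", -5.0), ("CR2", 5.0)]:
    p_e = bpsk_error_rate(rho_db)
    print(f"{name}: P_e = {p_e:.4f}, P_d seen at CR0 = {after_reporting(LOCAL_PD, p_e):.4f}")
```

With these inputs, about one in five of CR1's reports is corrupted (P_e ≈ 0.21), against about 0.006 for CR2. A node with the same 0 dB sensing SNR therefore becomes a much weaker partner once its reporting channel is shadowed.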
Assume that CR0 needs one cognitive node to cooperate in sensing. Table I lists the SNRs of the sensing and control channels, and Table II lists the simulated average Bayesian risks. Table II shows that the "Optimal" fusion rule outperforms "AND" and "OR". It also shows that CR2 is the best partner for CR0 whichever fusion rule is chosen. With a proper fusion rule, good sensing performance can also be achieved when all three neighboring nodes cooperate. However, the required control-channel bandwidth and power consumption are then far larger than when only CR2 cooperates. Better sensing performance can therefore be obtained at the least cost by selecting proper cooperative nodes, for example CR2 in the scene of Fig. 1. Moreover, the choice of cooperative nodes depends not only on their sensing capability but also on their reporting channels.

TABLE I
ONE EXAMPLE OF FIG. 2
Channel SNR | CR0 | CR1 | CR2 | CR3
Sensing channel | −10 dB | 0 dB | 0 dB | −10 dB
Control channel | – | −5 dB | 5 dB | 5 dB

TABLE II
AVERAGE DETECTION BAYESIAN RISK
Fusion rule | CR0 | CR0, CR1 | CR0, CR2 | CR0, CR3 | CR0, CR1, CR2, CR3
OR | 0.3669 | 0.2763 | 0.1640 | 0.3635 | 0.3643
AND | 0.3669 | 0.3087 | 0.2159 | 0.4908 | 0.4903
Optimal | 0.3669 | 0.2165 | 0.0106 | 0.3635 | 0.0108
(The CR0-only value is common to all rules, since no fusion takes place.)

III. COOPERATIVE SPECTRUM SENSING SCHEME
In cooperative spectrum sensing, each CR node observes the PU signal independently and sends its observation statistics or decisions to the fusion center, which makes the final decision. The fusion center is usually the central control node of the network, such as the AP in a WLAN or the wireless router in a mesh network. An ad-hoc network has no central control node and all nodes are equal. The cognitive node that needs to transmit data therefore acts as the fusion center and collects the local decisions [12]. Fig. 2 shows the cooperative spectrum sensing model in an ad-hoc CRN. CR0 is the node that needs to conduct spectrum sensing, and it acts as the fusion center. $\mathrm{CR}_i$ ($i = 1, \dots, M$, where
$M$ denotes the number of cognitive nodes) are the cooperative nodes. The observed signal samples of $\mathrm{CR}_i$ can be expressed under the following two hypotheses:

$$H_0:\ x_i(k) = n_i(k), \qquad H_1:\ x_i(k) = h_{pi}\, s(k) + n_i(k), \qquad i = 1, 2, \dots, M \tag{1}$$

where $x_i(k)$ and $n_i(k)$ are the received signal sample and the additive noise of $\mathrm{CR}_i$ at time $k$, respectively, $h_{pi}$ is the gain of the sensing channel between the PU and $\mathrm{CR}_i$, and $s(k)$ is the PU signal. To simplify the analysis, $h_{pi}$ is assumed constant within one sensing period. The noise $n_i(k)$ is a zero-mean, unit-variance Gaussian variable, i.e., $n_i(k) \sim \mathcal{CN}(0,1)$.

The cooperative spectrum sensing process shown in Fig. 2 consists of the following steps.
(1) Each cognitive node $\mathrm{CR}_i$ observes the spectrum of interest independently and makes a local decision $D_i$ with some local detection method:

$$D_i = \begin{cases} 1, & u_i > \lambda_i \\ -1, & u_i < \lambda_i \end{cases} \qquad (i = 1, \dots, M) \tag{2}$$

where $\lambda_i$ and $u_i$ are the threshold and the observation statistic of $\mathrm{CR}_i$, respectively.
(2) All cooperative nodes report their decisions to CR0, which makes the final decision $D$ with some data fusion algorithm:

$$D = \varphi(\hat{D}_0, \hat{D}_1, \dots, \hat{D}_M) \tag{3}$$

where $\hat{D}_0, \hat{D}_1, \dots, \hat{D}_M$ denote $D_0, D_1, \dots, D_M$ after passing through the reporting channel.

IV. FORMULATION OF THE OPTIMIZATION PROBLEM
Before discussing the selection of suitable cooperative nodes, we make the same assumption as [10]: the instantaneous channel state information of the sensing and reporting channels is available at the cognitive users.
Fig. 2. The model of cooperative sensing in ad-hoc CRN
A. The Sensing Capability of a Single Node
No prior knowledge about the primary signals is available, so this paper adopts energy detection as the local detection method. The false alarm probability $P_{fi}$ and the detection probability $P_{di}$ of $\mathrm{CR}_i$ are [6]

$$P_{fi} = \Pr(u_i > \lambda_i \mid H_0) = Q\!\left(\frac{\sqrt{N}\,(\lambda_i - \sigma_i^2)}{\sqrt{2}\,\sigma_i^2}\right), \qquad P_{di} = \Pr(u_i > \lambda_i \mid H_1) = Q\!\left(\frac{\sqrt{N}\,\bigl(\lambda_i - (1+\gamma_i)\sigma_i^2\bigr)}{\sqrt{2(1+2\gamma_i)}\,\sigma_i^2}\right) \tag{4}$$

where $Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{+\infty} e^{-t^2/2}\,dt$ is the complementary cumulative distribution function of the standard normal distribution, and $\gamma_i = \frac{E_S |h_{pi}|^2}{N\sigma_i^2}$ is the
signal-to-noise ratio (SNR) of $\mathrm{CR}_i$. Here $E_S$ is the transmitted signal energy over the $N$ samples of each detection interval.
A perfect wireless channel is generally unrealistic, because the channel is usually subject to fading [7]. Denote the SNR of the reporting channel between the fusion center and $\mathrm{CR}_i$ by $\rho_i$. To simplify the analysis, this paper modulates the local decisions with BPSK; other modulation schemes can be applied as well. The channel error rate is then $P_{ei} = Q(\sqrt{2\rho_i})$. The false alarm probability $\tilde{P}_{fi}$ and detection probability $\tilde{P}_{di}$ of $\mathrm{CR}_i$ after the reporting channel are therefore

$$\tilde{P}_{fi} = \Pr(\hat{D}_i = 1 \mid H_0) = P_{fi}(1 - P_{ei}) + (1 - P_{fi})P_{ei} \tag{5a}$$
$$\tilde{P}_{di} = \Pr(\hat{D}_i = 1 \mid H_1) = P_{di}(1 - P_{ei}) + (1 - P_{di})P_{ei} \tag{5b}$$

A received 1 arises either from a local 1 received correctly or from a local −1 flipped by a channel error.
B. The Reporting Delay
Most existing research on cooperative spectrum sensing neglects the delay of transmitting local decisions to the fusion center. We argue that the reporting delay cannot be ignored, especially for a large number of cognitive nodes. The reason is that the bandwidth of the control channel is usually constrained by limited spectrum resources.
Fig. 3 shows the slot transmission model of cooperative sensing. Time is divided into slots (frames) of equal length. Each frame includes two processes: sensing and data transmission. The sensing stage is further divided into two phases: observing and reporting. In the observing phase, all cooperative nodes observe the spectrum band of interest and make local decisions. In the reporting phase, these decisions are reported to the fusion center in a scheduled order. In Fig. 3, $\tau_i$ denotes the reporting slot of $\mathrm{CR}_i$ and $\tau_S$ the observing slot. $T_S$ and $T_R$ denote the sensing and data transmission durations, respectively.
Fig. 3. A sensing slot
We assume that the control-channel bandwidth is given and denote it by $B$. For BPSK modulation, the spectral efficiency is 1 bit/s/Hz. Assuming a control-channel utilization of 80%, each one-bit decision occupies $1/(0.8B)$ seconds, and the total reporting time is

$$\tau_R = \sum_{i=1}^{K} \beta_i \frac{1}{0.8B} = \tau_R(\boldsymbol\beta) \tag{6}$$

where $K$ is the neighboring node
count of CR0 and $\boldsymbol\beta = [\beta_1, \beta_2, \dots, \beta_M]^T$. Each $\beta_i \in \{0,1\}$ indicates whether $\mathrm{CR}_i$ participates in cooperative sensing ($\beta_i = 1$) or not ($\beta_i = 0$). The total number of cooperative nodes is $n = \sum_{i=1}^{K}\beta_i$, and the sensing duration is $T_S = \tau_S + \tau_R$.
When the frame duration $T = T_S + T_R$ is constant, $T_R$ decreases as $T_S$ increases, and a smaller $T_R$ reduces throughput. Let $T_R^{req}$ denote the minimum $T_R$ that satisfies the throughput requirement of the CRN; $T_R$ should not be less than this value. $T_S$ therefore has an upper bound, $T_{S\max} = T - T_R^{req}$. On the other hand, $T_S$ should be as large as possible to minimize interference to the PU and improve sensing performance, so we set $T_S = T_{S\max}$. Substituting (6) into $N = f_S \tau_S$, where $f_S$ is the sampling rate, gives

$$N = f_S \tau_S = f_S\bigl(T_{S\max} - \tau_R(\boldsymbol\beta)\bigr) \tag{7}$$

More cooperative nodes yield a higher spatial diversity gain and help improve sensing performance. However, they also increase $\tau_R$, which reduces $N$ and hence the sensing capability of each single node. There is therefore a tradeoff between single-node sensing capability and the number of cooperative nodes.
C. The Average Bayesian Risk
The average Bayesian risk of cooperative spectrum sensing can be expressed as

$$R = P(H_0)\bigl[P(D=1\mid H_0)C_{10} + P(D=0\mid H_0)C_{00}\bigr] + P(H_1)\bigl[P(D=1\mid H_1)C_{11} + P(D=0\mid H_1)C_{01}\bigr] \tag{8}$$

where $C_{ij}$ ($i, j \in \{0, 1\}$) is the cost of deciding $H_i$ when $H_j$ is true. We consider only two kinds of errors: false alarm, with probability $Q_f$, and missed detection, with probability $Q_m$. Thus $C_{ii} = 0$ for $i = 0, 1$, while $C_{10}$ and $C_{01}$ may differ according to system requirements. The Bayesian risk can then be rewritten as

$$R = P(H_0)Q_f C_{10} + P(H_1)Q_m C_{01} = C_{10}\bigl[Q_f P(H_0) + \alpha Q_m P(H_1)\bigr] \tag{9}$$

where $\alpha = C_{01}/C_{10}$. We take $\alpha \in (1, \infty)$ for conservative systems and $\alpha \in (0, 1)$ for aggressive systems. Eq. (4) relates $P_{fi}$ ($P_{di}$) to the parameters $N$, $\sigma_i^2$, $\lambda_i$, and $\gamma_i$, and $P_{ei}$ is a function of $\rho_i$. According to [13], the optimal threshold
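is

$$\lambda_i^* = \frac{1 + \sqrt{1 + 2\gamma_i + 4A_i}}{2}\,\sigma_i^2 \tag{10}$$

where $A_i = \frac{2(1+2\gamma_i)}{N\gamma_i}\ln\frac{(1+2\gamma_i)P(H_0)}{\alpha P(H_1)}$ and $\ln(\cdot)$ denotes the natural logarithm. When $\sigma_i^2 = 1$, Eq. (9) can be rewritten as

$$R = C_{10}\bigl[Q_f(N, \boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho)P(H_0) + \alpha Q_m(N, \boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho)P(H_1)\bigr] \tag{11}$$

where $\boldsymbol\gamma = [\gamma_1, \dots, \gamma_M]^T$ and $\boldsymbol\rho = [\rho_1, \dots, \rho_M]^T$. Substituting (7) into (11) gives

$$R = C_{10}\Bigl[Q_f\bigl(\boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho, f_S(T_S - \tau_R(\boldsymbol\beta))\bigr)P(H_0) + \alpha Q_m\bigl(\boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho, f_S(T_S - \tau_R(\boldsymbol\beta))\bigr)P(H_1)\Bigr] = C_{10}\bigl[\tilde{Q}_f(\boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho)P(H_0) + \alpha\tilde{Q}_m(\boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho)P(H_1)\bigr] \tag{12}$$

where $\tilde{Q}_f(\boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho) = Q_f\bigl(\boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho, f_S(T_S - \tau_R(\boldsymbol\beta))\bigr)$ and $\tilde{Q}_m(\boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho) = Q_m\bigl(\boldsymbol\gamma, \boldsymbol\beta, \boldsymbol\rho, f_S(T_S - \tau_R(\boldsymbol\beta))\bigr)$.
To obtain the best sensing performance at the least cost, the cognitive nodes must be selected based on $\boldsymbol\gamma$ and $\boldsymbol\rho$ so as to minimize the Bayesian risk of cooperative spectrum sensing. The optimization problem can then be described as

$$\min_{\boldsymbol\beta} R \quad \text{s.t.}\ \boldsymbol\beta \in S \subseteq \{0,1\}^{K} \tag{P1}$$

It is difficult to prove in general that an optimal solution of (P1) exists. However, assume that all cognitive nodes have the same sensing performance and that the control channel is perfect. Under these conditions, it can be proved that $R$ is convex in $n$ when the reporting delay is considered.
D. The Particle Swarm Optimization Algorithm
Problem (P1) is a typical nonlinear binary integer programming (NLBIP) problem. Unlike linear binary integer programming, it has no general method for obtaining the exact optimal solution. Various methods have been used to obtain suboptimal solutions, such as simulated annealing, neural networks, genetic algorithms, and particle swarm optimization (PSO). The study in [14] indicates that PSO is simple and converges to the optimal solution faster than the other algorithms. Considering the agility requirement of CRN, binary PSO (BPSO) [15] is applied to solve (P1).
The following algorithm searches for the optimal $\boldsymbol\beta$ of (P1). Initially, all neighboring nodes are assumed to participate in sensing.
(1) Generate an initial swarm of Num particles whose elements are all 1, denoted $\boldsymbol\beta_j$, $j = 1, 2, \dots, \mathrm{Num}$. Initialize the velocity vectors $\mathbf{v}_j(0)$ randomly.
(2) Calculate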
the new velocity vector of each particle as

$$\mathbf{v}_j(t+1) = \omega(t)\,\mathbf{v}_j(t) + c_1 R_1(t)\bigl(\mathbf{p}_j(t) - \boldsymbol\beta_j(t)\bigr) + c_2 R_2(t)\bigl(\mathbf{p}_g(t) - \boldsymbol\beta_j(t)\bigr)$$

where $c_1$ and $c_2$ are the acceleration parameters, $R_1(t)$ and $R_2(t)$ are random numbers in $[0, 1]$, and $\omega(t)$ is the inertia weight. $\mathbf{p}_j(t)$ and $\mathbf{p}_g(t)$ are the best values of $\boldsymbol\beta$ found up to time $t$ by the $j$-th particle and by the whole swarm, respectively.
(3) Update every $\boldsymbol\beta_j$ as

$$\boldsymbol\beta_j(t+1) = \bigl(R(0,1) < S(\mathbf{v}_j(t+1))\bigr)$$

where $R(0,1)$ is a random vector with elements in $[0, 1]$ and the comparison is elementwise. $S(x) = \frac{1}{1 + \exp(-x)}$ is the sigmoid function, so $\mathbf{v}_j(t+1)$ determines the probability that each element of $\boldsymbol\beta_j(t+1)$ equals 1 or 0.
(4) Calculate the average Bayesian risk of each particle using Eq. (11). Then update $\mathbf{p}_j$ and $\mathbf{p}_g$ according to the following rules:
if $R(\boldsymbol\beta_j(t+1)) < R(\mathbf{p}_j(t))$, then $\mathbf{p}_j(t+1) = \boldsymbol\beta_j(t+1)$;
if $R(\mathbf{p}_j(t+1)) < R(\mathbf{p}_g(t))$, then $\mathbf{p}_g(t+1) = \mathbf{p}_j(t+1)$.
(5) If the termination condition is not satisfied, go to Step (2). Otherwise, stop and output $\mathbf{p}_g(t)$. The final $\mathbf{p}_g(t)$ is taken as the optimal value of $\boldsymbol\beta$ for (P1), denoted $\boldsymbol\beta^*$.
V. ANALYSIS AND SIMULATION
Fig. 4 shows the simulated average Bayesian risk for different numbers of cooperative nodes. The SNR of CR0 is −10 dB. The received SNRs of all neighboring nodes are assumed equal at −8 dB, and their control-channel SNRs equal at 10 dB. $T_S$ is 3.125 ms and the sampling rate is 32 kHz, so without reporting each node collects $N = f_S T_S = 100$ samples. In Fig. 4, CR Num = 1 means that there are no cooperative nodes and CR0 senses alone. When the reporting delay is not considered, the average Bayesian risk decreases monotonically as cooperative nodes are added. When the reporting delay is considered, $R$ first decreases and then increases. This indicates that $R$ is convex in CR Num and has a minimum value, denoted $R_{\min}$. In other words, more cooperative nodes do not necessarily mean better sensing performance when the sensing slot is constrained.
To show how much the proposed scheme improves sensing performance, the average Bayesian risk of the four scenes in Table III is
simulated with the BPSO algorithm. The four scenes are defined as follows.
S1: All neighboring nodes have the same detection capability, and the control channel is nearly perfect, since $P_{ei} \approx 0$ when $\rho_i = 10$ dB.
S2: All neighboring nodes have the same received SNR, but some nodes are shadowed from CR0 and their control-channel SNR is low.
S3: Some cognitive nodes are far from the primary transmitter, and their received SNR is low.
S4: Both the detection capabilities of the neighboring nodes and the SNRs of their control channels differ.
Fig. 4. Average Bayesian risk performance vs. CR Num

TABLE III
THE VALUES OF γ AND ρ IN THE FOUR SCENES
Scene | γ (dB) | ρ (dB)
S1 | −5 × ones(1,10) | 10 × ones(1,10)
S2 | −5 × ones(1,10) | [5, −5, −5, 5, −6, 6, −5, 6, −6, 5]
S3 | [−4, −10, −20, −20, −6, −4, −7, −3, −2, −20] | 10 × ones(1,10)
S4 | [−4, −10, −20, −20, −6, −4, −7, −3, −2, −20] | [5, −5, −5, 5, −6, 6, −5, 6, −6, 5]

With the adaptive threshold parameters proposed in [13], the minimum Bayesian risk when CR0 senses alone is 0.38. The initial value of $\boldsymbol\beta$ is ones(1,10), where ones(1,m) denotes a 1-by-m vector of ones. Table IV presents the results of twenty repeated simulations. Because of the reporting delay, $R_{\min}$ is not achieved when all neighboring nodes cooperate. Selecting a subset of neighboring nodes with the BPSO algorithm clearly reduces the Bayesian risk and improves sensing performance. In each scene, the optimal solution $\boldsymbol\beta^*$ converges to a stable vector, which indicates that the BPSO algorithm is stable.
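The BPSO procedure of Steps (1)–(5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulator. The objective `cooperative_risk` keeps the structure of Eqs. (4)–(7) and (9): σ_i² = 1, BPSK reporting, and a reporting delay that shrinks N. Several of its choices are hypothetical stand-ins, however. The midpoint threshold λ_i = 1 + γ_i/2 replaces the adaptive threshold of [13], and an "at least half of the received decisions equal 1" fusion rule replaces the paper's fusion rules; P(H0) = 0.5 and α = 1 are also assumed. The values T_S = 3.125 ms, f_S = 32 kHz, B = 5 kHz, CR0's −10 dB SNR, and the scene-S4 SNRs of Table III come from Section V. The swarm settings (20 particles, 60 iterations, ω = 0.7, c1 = c2 = 1.5, velocity clamp ±4) are illustrative.

```python
import itertools
import math
import random

# Values from Section V and Table III (scene S4); all other choices are hypothetical.
T_S_MAX = 3.125e-3  # sensing duration T_S (s)
F_S = 32e3          # sampling rate f_S (Hz)
B = 5e3             # control-channel bandwidth B (Hz)
GAMMA0_DB = -10.0   # received SNR of CR0 itself (dB)
GAMMA_DB = [-4, -10, -20, -20, -6, -4, -7, -3, -2, -20]  # neighbours' sensing SNR (dB)
RHO_DB = [5, -5, -5, 5, -6, 6, -5, 6, -6, 5]             # reporting-channel SNR (dB)


def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def samples_per_node(n):
    """Eq. (7): N = f_S * (T_Smax - tau_R), with tau_R = n / (0.8 B) from Eq. (6)."""
    return round(F_S * (T_S_MAX - n / (0.8 * B)))


def node_probs(gamma_db, n_samples, p_e):
    """(P_f, P_d) seen at CR0 for one node: Eq. (4) with sigma^2 = 1 and a
    HYPOTHETICAL midpoint threshold lambda = 1 + gamma / 2, then Eq. (5)."""
    g = 10 ** (gamma_db / 10)
    lam = 1 + g / 2
    pf = q_function(math.sqrt(n_samples) * (lam - 1) / math.sqrt(2))
    pd = q_function(math.sqrt(n_samples) * (lam - 1 - g) / math.sqrt(2 * (1 + 2 * g)))
    return pf * (1 - p_e) + (1 - pf) * p_e, pd * (1 - p_e) + (1 - pd) * p_e


def at_least(probs, k):
    """P(at least k of independent Bernoulli(p) events), by dynamic programming."""
    dist = [1.0]  # dist[c] = P(exactly c ones among the nodes processed so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for c, w in enumerate(dist):
            nxt[c] += w * (1 - p)
            nxt[c + 1] += w * p
        dist = nxt
    return sum(dist[k:])


def cooperative_risk(beta, p_h0=0.5, alpha=1.0):
    """Eq. (9) with C10 = 1 under a HYPOTHETICAL 'at least half vote 1' fusion rule."""
    n_samples = samples_per_node(sum(beta))
    if n_samples < 1:
        return float("inf")  # reporting would use up the whole sensing slot
    nodes = [(GAMMA0_DB, 0.0)] + [  # CR0 has no reporting error
        (GAMMA_DB[i], q_function(math.sqrt(2 * 10 ** (RHO_DB[i] / 10))))
        for i, b in enumerate(beta) if b
    ]
    probs = [node_probs(g_db, n_samples, p_e) for g_db, p_e in nodes]
    k = math.ceil(len(probs) / 2)
    q_f = at_least([pf for pf, _ in probs], k)
    q_d = at_least([pd for _, pd in probs], k)
    return p_h0 * q_f + alpha * (1 - p_h0) * (1 - q_d)


def bpso(cost, dim, num=20, iters=60, w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO after Steps (1)-(5); bests are updated as soon as a particle moves."""
    rng = random.Random(seed)
    pos = [[1] * dim for _ in range(num)]  # Step (1): every neighbour selected
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(num)]
    pbest, pbest_cost = [p[:] for p in pos], [cost(p) for p in pos]
    gbest, gbest_cost = pbest[0][:], pbest_cost[0]
    for _ in range(iters):  # Step (5): a fixed iteration budget ends the search
        for j in range(num):
            r1, r2 = rng.random(), rng.random()
            for d in range(dim):
                v = (w * vel[j][d] + c1 * r1 * (pbest[j][d] - pos[j][d])
                     + c2 * r2 * (gbest[d] - pos[j][d]))  # Step (2)
                vel[j][d] = max(-v_max, min(v_max, v))
                # Step (3): bit d becomes 1 with probability sigmoid(v)
                pos[j][d] = int(rng.random() < 1 / (1 + math.exp(-vel[j][d])))
            c = cost(pos[j])  # Step (4)
            if c < pbest_cost[j]:
                pbest[j], pbest_cost[j] = pos[j][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[j][:], c
    return gbest, gbest_cost


def exhaustive(cost, dim):
    """Brute-force reference over all 2**dim selections (only practical for small K)."""
    best = list(min(itertools.product((0, 1), repeat=dim), key=lambda b: cost(list(b))))
    return best, cost(best)


if __name__ == "__main__":
    K = len(GAMMA_DB)
    print("BPSO:", *bpso(cooperative_risk, K))
    print("all neighbours:", cooperative_risk([1] * K))
    print("exhaustive:", *exhaustive(cooperative_risk, K))
```

Running the script prints three results: the BPSO selection and its risk, the risk when all ten neighbours report, and the brute-force optimum. With only K = 10 neighbours, enumerating all 1024 candidates is a cheap check on BPSO, which becomes the only practical option as K grows. Because the objective is simplified, its numbers are not expected to match Table IV.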
The values of $\boldsymbol\beta^*$ show that nodes with better sensing performance and higher-quality control channels are selected for cooperative sensing. In S1, $n = 6$ held in all twenty repeated simulations, while the choice of which nodes to select was entirely random. This agrees with Fig. 4, where the optimal number of cooperative nodes is six for a perfect control channel with 5 kHz bandwidth.
VI. CONCLUSION
In this paper, we proposed a novel cooperative node selection scheme for ad-hoc CRN that minimizes the average Bayesian risk. We studied how the received SNR and the reporting channel influence cooperative sensing performance. Based on this study, we formulated cooperative node selection as an NLBIP problem and adopted the BPSO algorithm to solve it. Simulation results verified that the average Bayesian risk is significantly reduced compared with the case in which all neighboring cognitive nodes participate in cooperative sensing. The proposed scheme improves sensing performance by selecting suitable cognitive nodes to cooperate in sensing. Furthermore, energy costs and electromagnetic pollution would be reduced, since some of the nodes do not participate in sensing.

TABLE IV
THE BAYESIAN RISK OF TWENTY SIMULATIONS
Scene | Risk, all nodes | Risk, best | Risk, average | Risk, worst | β*
S1 | 0.1385 | 0.0707 | 0.0720 | 0.0731 | Σ_{i=1}^{K} β_i = 6
S2 | 0.1924 | 0.0774 | 0.0779 | 0.0831 | [1,0,0,1,0,1,0,1,0,1]
S3 | 0.1335 | 0.0205 | 0.0222 | 0.0232 | [1,0,0,0,0,1,0,1,1,0]
S4 | 0.1887 | 0.0466 | 0.0485 | 0.0498 | [1,0,0,0,0,1,0,1,0,0]

REFERENCES
[1] S. Haykin, Cognitive radio: Brain-empowered wireless communications. IEEE J. Select. Areas Commun., vol. 23, pp. 201–220, 2005.
[2] A. Ghasemi and E. S. Sousa, Collaborative spectrum sensing for opportunistic access in fading environments. in Proc. IEEE Symp. New Frontiers in Dynamic Spectrum Access Networks, Baltimore, USA, Nov. 2005, pp. 131–136.
[3] W. Saad, Z. Han, M. Debbah, et al., Coalitional games for distributed collaborative spectrum sensing in cognitive
radio networks. in Proc. …COM 2009, Rio de Janeiro, Brazil, Apr. 2009, pp. 2114–2122.
[4] W. Lee and D. H. Cho, Sensing optimization considering sensing capability of cognitive terminal in cognitive radio system. in Proc. IEEE CrownCom 2008, Singapore, May 2008, pp. 1–6.
[5] L. Chen, W. Jun, and S. Li, Cooperative spectrum sensing with multi-bits local sensing decisions in cognitive radio context. in Proc. IEEE …, Las Vegas, USA, Mar. 2008, pp. 570–575.
[6] Q. Zhi, S. Cui, and A. H. Sayed, Optimal linear cooperation for spectrum sensing in cognitive radio networks. IEEE J. Select. Topics Signal Processing, vol. 2, pp. 28–40, 2008.
[7] C. Sun, W. Zhang, and K. B. Letaief, Cooperative spectrum sensing for cognitive radios under bandwidth constraints. in Proc. IEEE WCNC 2007, Hong Kong, Mar. 2007, pp. 733–736.
[8] F. Gao, W. Yuan, W. Liu, et al., Pipelined cooperative spectrum sensing in cognitive radio networks. in Proc. IEEE WCNC 2009, Budapest, Hungary, Apr. 2009, pp. 1–5.
[9] Z. Qing, L. Tong, A. Swami, et al., Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework. IEEE J. Select. Areas Commun., vol. 25, pp. 589–600, 2007.
[10] C. Sun, W. Zhang, and K. B. Letaief, Cluster-based cooperative spectrum sensing in cognitive radio systems. in Proc. IEEE ICC, Glasgow, Scotland, Jun. 2007, pp. 2511–2515.
[11] K. B. Letaief and W. Zhang, Cooperative communications for cognitive radio networks. Proceedings of the IEEE, vol. 97, pp. 878–893, 2009.
[12] R. Chen, J. Park, and K. Bian, Robust distributed spectrum sensing in cognitive radio networks. in Proc. IEEE INFOCOM, Phoenix, Arizona, USA, Apr. 2009, pp. 1876–1884.
[13] W. Xia, S. Wang, W. Liu, et al., Fast adaptive threshold for cooperative spectrum sensing. J. Elec. & Inform. Tech., in press, 2009.
[14] T. Matsui, M. Sakawa, and K. Kato, Particle swarm optimization for nonlinear 0-1 programming problems. in Proc. IEEE International Conference on Systems, Man and Cybernetics (SMC), Singapore, Jan. 2008, pp. 236–243.
[15] J. Kennedy and R. C. Eberhart, A discrete binary version of the particle swarm algorithm. in Proc. IEEE International Conference on Systems, Man, and
Cybernetics (SMC), Orlando, FL, USA, Jan. 1997, pp. 4101–4108.
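No. 3  WANG Yin et al.: Path planning of manipulators based on deep reinforcement learning and the screw method  517
and its variants, PRM (probabilistic road maps) [8] and its variants, and the annealing algorithm [9] are all sampling-based. When the environment changes, they must resample to obtain a feasible path, which makes them hard to apply to the changeable working scenarios of manipulators. Improving the generalization of manipulators in complex environments has therefore become a difficult problem in manipulator path planning.
Deep reinforcement learning (DRL) [10–11] combines the perception capability of deep learning with the decision-making capability of reinforcement learning. In recent years it has been widely applied in many fields, including manipulator path planning. Ref. [12] combined DRL with robot inverse kinematics to achieve collision-free path planning. Ref. [13] proposed a path planning algorithm based on SAC (soft actor-critic) for multi-arm robots with periodically moving obstacles. Ref. [14] adopted a DRL-based hybrid control algorithm and validated it on a real manipulator. Ref. [15] trained with a model-free DRL algorithm. Ref. [16] used DRL to let robots autonomously acquire complex assembly skills, showing that DRL models can raise a robot's level of intelligence. Ref. [17] proposed a fast, robust, DRL-based collision-free path planning method for complex orchard environments. Ref. [18] used DRL to learn a manipulator controller, training on feedback data from the controller to guide the manipulator to the target location. Ref. [19] addressed the poor performance of path-planning DRL algorithms in environments with obstacles through reward function design. Ref. [20] proposed an improved DRL algorithm for the motion planning of a six-degree-of-freedom (6-DOF) manipulator. It improved training efficiency and model performance by adding a success experience pool and a collision experience pool, improving the reward function, and training the first three axes with priority. These studies apply DRL to manipulator path planning and adapt it to manipulators through improvements to the experience pool, the reward function, and so on. However, the cost of acquiring samples remains clearly too high. This cost grows exponentially as the number of manipulator axes and the complexity of the environment increase. Improving sample utilization therefore plays a role that cannot be ignored in applying DRL to manipulators.
The common way to obtain samples when training manipulators is to train on the natural trajectories continuously generated during learning, used as real examples. However, these samples are expensive to acquire and are often poorly utilized. Ref. [21] proposed experience replay, which stores the samples obtained from interaction with the environment in a replay buffer so that each sample can be used for training multiple times. Data augmentation in deep learning expands an existing sample set. RL samples, however, are obtained on the fly during exploration, so data augmentation does not apply to RL directly, although its idea can still be used to expand samples effectively. For example, the HER (hindsight experience replay) algorithm [22] takes states actually reached by the agent as new goals for the trajectory, thereby generating valid artificial trajectories. HER variants and symmetry-based methods also expand samples, but they do so, and thus improve sample utilization, only by changing the agent's states.
Based on the idea of data augmentation, this paper proposes a general off-policy RL algorithm for manipulators: the fusion of DRL and the screw method (screw method in DRL, SMIDRL). The specific contributions are as follows:
1) The screw method [23] expands the natural trajectories obtained from the manipulator's interaction with the environment, which improves sample utilization more efficiently; Fig. 1 shows the overall flow. SMIDRL must expand the trajectories collected on the fly before storing them in the replay buffer, so it is limited to off-policy DRL algorithms;
2) While expanding trajectories, SMIDRL also changes the interaction environment synchronously by changing the poses of objects such as obstacles, which improves the agent's ability to cope with dynamic, complex environments;
3) SMIDRL can be combined with sample-expansion algorithms such as HER, which improves sample utilization and training efficiency even more significantly.
Fig. 1  Fusion algorithm of screw method and off-policy DRL
The rest of this paper is organized as follows. Section 1 reviews research on DRL for manipulator path planning and explains why sample utilization must be improved. Section 2 describes the application of DRL to manipulator path planning. Section 3 presents the SMIDRL theory. Section 4 analyzes experiments that use the DDPG (deep deterministic policy gradient) algorithm [26] in simulated environments of the Fetch manipulator [25] and the UR5 manipulator on the MuJoCo platform of OpenAI Gym [24]. The experiments show that the algorithm improves both model training efficiency and the manipulator's generalization in complex dynamic environments. Section 5 concludes the paper.
2 Application of DRL in manipulator path planning
2.1 Markov decision process
The Markov decision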
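process (MDP) [27] is the foundation of reinforcement learning. It applies to environments whose system states have the Markov property, and it models the stochastic policies an agent can realize together with their returns. A stochastic process has the Markov property if, given its current state and all past states, the conditional probability distribution of future states depends only on the current state. This paper models the continuous state and action spaces as a multi-goal MDP $\{S, A, G, T, R, p, \gamma\}$. Here $S$ is the continuous state space and $A$ is a continuous action space. $G$ is a set of goals, and $T: S \times A \times S \to [0, 1]$ is the unknown transition function describing the environment dynamics. $R(s, a, s', g)$ is the immediate reward obtained when the agent, pursuing goal $g \in G$, executes action $a \in A$ in state $s \in S$ and reaches state $s' \in S$. Finally, $\gamma \in [0, 1]$ is the discount factor. In this framework, the aim of manipulator learning is to obtain a policy $\pi: S \times G \to A$ that maximizes the expected sum of discounted rewards for any given goal.
518  Control Theory & Applications  Vol. 40
2.2 DDPG and manipulator path planning
Fig. 2 shows the DDPG-based manipulator path planning method. The current actor network updates the manipulator's action-selection policy, and the current critic network updates the policy for evaluating the actions the manipulator executes. The two target networks borrow the dual-network structure of DDQN (double deep Q-learning) [28]. The target networks evaluate the current networks, which addresses the slow convergence and instability of a single network. Experience replay stores the transitions $(s, a, r, s')$ obtained from the manipulator's interaction with the environment in an experience pool, from which transitions are sampled at random during training. Storing the collected samples as separate fragments in this way breaks the correlation between samples and also improves sample utilization.
Fig. 2  Path planning method of manipulator based on DDPG [26]
3 Fusion of DRL and the screw method
The screw method is a Lie-group-based method for solving manipulator kinematics. Fig. 1 shows the SMIDRL algorithm. It uses the screw method to copy and filter the natural trajectories collected during the manipulator's exploration, obtaining cheap, feasible artificial trajectories. It thus gains more training samples without additional exploration, which improves sample utilization. Every artificial trajectory must be generated from a natural trajectory; otherwise the training results may be strongly biased.
3.1 Screw method
A screw operation requires two parameters, $\xi$ and $\theta$. Here $\theta$ is the rotation angle and $\xi = [\nu\ \omega]^T \in se(3) \subset \mathbb{R}^6$ is the twist (screw coordinates), with $\nu = -\omega \times q$. The vector $\omega = [\omega_1\ \omega_2\ \omega_3]^T \in so(3) \subset \mathbb{R}^3$ is the rotation axis; for example, $\omega = [0\ 0\ 1]^T$ when the z axis is the rotation axis. The vector $q \in \mathbb{R}^3$ is perpendicular from the rotating body to the rotation axis. The screw matrix $g(\xi, \theta)$ corresponding to $\xi$ and $\theta$ is

$$g(\xi, \theta) = \begin{bmatrix} e^{\hat\omega\theta} & (I - e^{\hat\omega\theta})(\omega \times \nu) \\ 0 & 1 \end{bmatrix} \in SE(3) \tag{1}$$

where

$$\hat\omega = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix} \tag{2}$$

$$e^{\hat\omega\theta} = I + \hat\omega\sin\theta + \hat\omega^2(1 - \cos\theta) \in SO(3) \tag{3}$$

Here $SO(n) \subset \mathbb{R}^{n \times n}$ is the $n$-dimensional special orthogonal group, and $SE(n) \subset \mathbb{R}^{(n+1) \times (n+1)}$ is the product space of the configuration space $\mathbb{R}^n$ and $SO(n)$. $so(n)$ and $se(n)$ are the Lie algebras of the Lie groups $SO(n)$ and $SE(n)$.
3.2 Manipulator parameters
The parameters of the manipulator simulation environments used in this paper are as follows:
1) State space $s = (s_f, s_{ob}, s_{ob2g}, s_{obs})$. Here $s_f$ contains the gripper position $(x_f, y_f, z_f)$, its linear velocity $(x'_f, y'_f, z'_f)$, the relative finger distance $d_f$, and the relative finger velocity $d'_f$. $s_{ob}$ contains the pose of the movable object $(x_{ob}, y_{ob}, z_{ob}, \alpha_{ob}, \beta_{ob}, \gamma_{ob})$ and its linear and angular velocities $(x'_{ob}, y'_{ob}, z'_{ob}, \alpha'_{ob}, \beta'_{ob}, \gamma'_{ob})$. $s_{ob2g}$ is the position difference between the movable object and the goal, $(x_{ob2g}, y_{ob2g}, z_{ob2g})$. $s_{obs}$ is the obstacle pose $(x_{obs}, y_{obs}, z_{obs}, \alpha_{obs}, \beta_{obs}, \gamma_{obs})$, and it is empty if the environment has no obstacle;
2) Action space $a = (x_a, y_a, z_a, d_f)$, where $d_f$ is the relative distance of the gripper fingers and $(x_a, y_a, z_a)$ is the position the manipulator moves to at the next step;
3) Goal $g = (x_g, y_g, z_g)$, the target position;
4) Reward function $R$, given by Eq. (4): a penalty of −1 is given if the task is not completed, and 0 otherwise:

$$R = \begin{cases} -1, & d(ob, g) > \epsilon \\ 0, & d(ob, g) \le \epsilon \end{cases} \tag{4}$$

Here $d(ob, g)$ is the scalar distance between the movable object and the goal, and $\epsilon$ is the distance threshold; the task is judged complete when $d(ob, g) \le \epsilon$.
3.3 Mapping between natural and artificial trajectories
Consider a trajectory of length $h$, $\tau = \{g, (s_0, a_1, r_1, s_1, a_2, r_2, s_2, \dots, s_h)\}$, with $g \in G$, $s_0 \in S$, and, for all $i$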
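$= 1, 2, \dots, h$, $s_i \in S$, $a_i \in A$, and $r_i = R(s_{i-1}, a_i, s_i, g)$. Assume that every trajectory of a real manipulator has length at most $H \in \mathbb{N}$, i.e., every operation has finite length. The set of all trajectories is then $\mathcal{L} = \bigcup_{h=1}^{H} G \times S \times (A \times \mathbb{R} \times S)^h$, and the set of all feasible trajectories is $L \subseteq \mathcal{L}$. A trajectory $\tau \in \mathcal{L}$ belongs to $L$ if $T(s_{i-1}, a_i, s_i) > 0$ for every $i$. A trajectory $\tau \in L$ is judged successful if $R(\tau) > R_{\min}$ or $r_i > R_{\min}$, where $R(\tau) = \sum_{i=1}^{h} \gamma^{i-1} R(s_{i-1}, a_i, s_i, g)$ and $R_{\min}$ is a threshold. In this paper the success condition is $r_i > R_{\min}$, with $R_{\min} = -1$. The set of successful trajectories is $L^+ \subseteq L$.
Define a mapping of feasible trajectories $f: L \to L$ such that $f(L, \theta) = L$, where $f(L, \theta) = \{\tau \in \mathcal{L} \mid \exists \tau' \in L,\ f(\tau', -\theta) = \tau\}$; below, $f(L, \theta)$ is written simply as $f(L)$. When $f$ is applied to a trajectory $\tau \in L$, its elements are mapped as well, by $f_G: G \to G$, $f_S: S \to S$, and $f_A: A \to A$. In other words, states, actions, and rewards can each be mapped separately, so that for any $\tau \in L$ there is $f(\tau) = \tau' \in L$. Let $\tau = \{g, (s_0, a_1, r_1, s_1, a_2, r_2, s_2, \dots, s_h)\}$ be a natural trajectory and $\tau' = \{g', (s'_0, a'_1, r'_1, s'_1, a'_2, r'_2, s'_2, \dots, s'_h)\}$ the resulting artificial trajectory. Their elements are related by

$$g' = f_G(g), \quad s'_0 = f_S(s_0), \quad a'_i = f_A(a_i), \quad s'_i = f_S(s_i), \quad r'_i = R\bigl(f_S(s_{i-1}), f_A(a_i), f_S(s_i), f_G(g)\bigr), \qquad \forall i = 1, 2, \dots, h.$$

In SMIDRL, the mapping $f$ leaves unchanged the relative distances between the elements of the artificial trajectory $\tau' \in L$ obtained from any $\tau \in L$. For example, distances relative to the goal are preserved, $d(ob, g) = d(f_S(ob), f_G(g))$. The reward is therefore unchanged when $\tau$ is mapped to $\tau'$, i.e., $r'_i = r_i$. The mapping defined for the feasible trajectories $L$ applies equally to all successful trajectories $L^+ \subseteq L$.
3.4 SMIDRL algorithm
The screw method is essentially a rotation. Mapping a trajectory with the screw method (screw mapping for short) may produce invalid trajectories outside the workspace, so a parameter $\theta_{\max}$ is set to improve the validity of the generated trajectories. SMIDRL uses the screw method to expand the natural trajectories obtained from the manipulator's interaction with the environment into artificial trajectories. After filtering, it stores them in the experience replay buffer together with the natural trajectories. The trajectory elements that must be screw-mapped are the states, actions, and goals introduced in Section 3.2. These include position vectors, linear velocity vectors, Euler angles, angular velocities, and the relative distance and relative velocity of the gripper fingers. The relative finger distance and velocity are internal to the rigid body and are not changed by screw mapping, so they are not mapped. In the simulation environment, the manipulator base frame does not coincide with the world frame, so the base frame is first transformed into the world frame. The following description assumes that this transformation has been done.
Each natural trajectory can be mapped several times. The parameter $n_{tw}$ sets the number of screw mappings per natural trajectory:

$$\Theta = \{\theta_j \mid \theta_j \sim (0, \theta_{\max}],\ j = 1, 2, \dots, n_{tw}\} \tag{5}$$

where $\Theta$ is the set of angles obtained by sampling $n_{tw}$ times from $(0, \theta_{\max}]$. Successive screw mappings differ only in the twist $\xi$ and the angle $\theta$, so only a single screw mapping is described next. For convenience, all positions and linear velocities below are written $(x, y, z)$, and all angles (in radians) and angular velocities are written $(\alpha, \beta, \gamma)$.
First, each corresponding pair $(x, y, z)$ and $(\alpha, \beta, \gamma)$ is converted into the following SE(3) matrix form. For example, the position and orientation of an obstacle form one such pair, and so do the linear and angular velocities of the movable object.

$$g_0 = \begin{bmatrix} R_0 & p_0 \\ 0 & 1 \end{bmatrix} \tag{6}$$

where $R_0 = \mathrm{Eul}(\alpha, \beta, \gamma)$ and $p_0 = [x\ y\ z]^T$. The screw matrix $g(\xi, \theta)$ for the trajectory's parameters $\xi$ and $\theta$ is then obtained from Eq. (1). The SE(3) matrix of $(x', y', z')$ and $(\alpha', \beta', \gamma')$ follows as

$$g = g(\xi, \theta)\, g_0 = \begin{bmatrix} R' & p' \\ 0 & 1 \end{bmatrix} \tag{7}$$

from which the mapped $(x', y', z')$ and $(\alpha', \beta', \gamma')$ are obtained:

$$[x', y', z'] = (p')^T \tag{8}$$
$$[\alpha', \beta', \gamma'] = \mathrm{Eul}^{-1}(R') \tag{9}$$

Here $\mathrm{Eul}(\cdot)$ maps Euler angles to a rotation matrix in $SO(3)$, and $\mathrm{Eul}^{-1}(\cdot)$ is its inverse. This single screw mapping is denoted $Sc$ below; for example, mapping a state $s$ to $s'$ is written $s' \leftarrow Sc(s)$. After screw mapping, conditions such as whether the generated trajectory lies within the workspace determine whether it is valid. Only valid trajectories are stored in the experience pool. Algorithm 1 gives the pseudocode of SMIDRL. The SMIDRL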
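algorithm copies and expands the natural trajectories already obtained. It gains more cheap, valid artificial trajectories without additional exploration, which improves sample utilization and learning efficiency. While copying trajectories, it also changes environment elements such as obstacles synchronously, which improves the manipulator's generalization to random environments.
4 Experiments and validation
4.1 Experimental environment
The experimental setup is as follows:
- Simulation environments: a simulated 7-DOF Fetch manipulator with a two-finger parallel gripper and a 6-DOF UR5 manipulator, both on the MuJoCo platform;
- DRL algorithm: DDPG;
- Neural network framework: PyTorch;
- Graphics card: GeForce RTX 2080 Ti;
- CPU: Intel(R) Core(TM) i9-9900X CPU @ 3.50 GHz;
- Operating system: Ubuntu 16.04;
- GPU: 32 GB.

Algorithm 1: Fusion algorithm of DRL and the screw method (SMIDRL)
Input: off-policy DRL algorithm D; experience replay buffer R with capacity N; number of episodes M; steps per episode T; minibatch size m; maximum angle θ_max; twist ξ; number of screw mappings n_tw.
Initialize the off-policy DRL algorithm D and the replay buffer R
for episode = 1, 2, …, M do
    Obtain the goal g and the initial state s_1
    for t = 1, 2, …, T do
        Obtain action a_t from D and execute it
        Reach the next state s_{t+1} and compute the reward r_t
        Store the transition (s_t, a_t, r_t, s_{t+1}) in R
    end
    for j = 1, 2, …, n_tw do
        Sample an angle θ_j ∼ (0, θ_max]
        Map g and s_1 using θ_j and ξ: g′ ← Sc(g), s′_1 ← Sc(s_1)
        for t = 1, 2, …, T do
            Map a_t and s_{t+1} using θ_j and ξ: a′_t ← Sc(a_t), s′_{t+1} ← Sc(s_{t+1})
            if the artificial trajectory is valid then
                Store (s′_t, a′_t, r_t, s′_{t+1}) in R
            end
        end
    end
    for t = 1, 2, …, T do
        Randomly sample m transitions (s, a, r, s′) from R
        Optimize D with the sampled transitions
    end
end

The simulation environments comprise the Fetch environments shown in Figs. 3–5 and the UR5 environment shown in Fig. 6. The Fetch environments include push, slide, and pick-and-place tasks; the UR5 environment is a pick-and-place task. In all tasks, the movable object and the obstacles are randomly initialized on the table in every episode. In Figs. 3–6, the movable object is the black cube, the preset goal is the red sphere, and obstacles are yellow cubes. States, actions, the reward function, and so on are as defined in Section 3.2.
1) Push: the manipulator end pushes the object to a preset position on the table.
2) Pick and place: the gripper at the manipulator end first picks up the object and then places it at a preset position in the workspace.
3) Slide: the manipulator end applies a force to the object so that it slides, against friction, to a preset position on the table; the goal lies outside the manipulator's workspace.
Fig. 3  Environment of the pushing task: (a) task without obstacles; (b) task with obstacles
Fig. 4  Environment of the pick-and-placement task: (a) task without obstacles; (b) task with obstacles
Fig. 5  Environment of the sliding task: (a) task without obstacles; (b) task with obstacles
Fig. 6  Pick-and-placement task environment of the UR5 manipulator
4.2 Neural network hyperparameter settings
Table 1 lists the network hyperparameters of the algorithms used in the experiments. Unless otherwise stated, these values are not changed.
4.3 Evaluation criterion
In reinforcement learning, an epoch consists of a fixed number of consecutive episodes. The success rate of an epoch (i.e., one iteration) can therefore be computed from the number of successful episodes in it. On this basis, after each update the current trained model controls the manipulator to perform the task 10 times. The resulting average success rate is the criterion for judging the quality of the trained model at that moment.

Table 1  Hyperparameters of the neural network framework
Symbol | Hyperparameter | Value
α | Policy (actor) network learning rate | 0.001
β | Evaluation (critic) network learning rate | 0.001
τ | Target network soft-update parameter | 0.1
γ | Discount factor | 0.98
m | Minibatch size | 128
N | Replay buffer capacity | 1×10^6
M | Number of episodes | 200
T | Steps per episode | 50

4.4 Experiments and analysis
In the experiments, the push and pick-and-place tasks are run with 8× acceleration. The sliding task is harder and is run with 24× acceleration. As mentioned above, combining SMIDRL with the HER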
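algorithm raises sample utilization further, so the following experiments use HER's best-performing "future" strategy, with HER applied 4 times. To show the performance of SMIDRL, the experiments compare three settings: with and without HER in obstacle-free environments, different numbers of screw mappings, and environments with obstacles. To demonstrate the generality of SMIDRL, a UR5 model exported from SolidWorks, which preserves the UR5's physical parameters, also performs the pick-and-place experiment in an environment with obstacles. All plots are taken from TensorBoard and smoothed with TensorBoard's built-in tool for easier comparison. The faded curves show the raw data and the solid curves the smoothed data.
4.4.1 Performance with and without HER in obstacle-free environments
This experiment compares four configurations: DRL, SMIDRL, DRL+HER, and SMIDRL+HER. It verifies that SMIDRL improves performance and that combining SMIDRL with HER improves it further. The number of screw mappings is n_tw = 16 for the push and pick-and-place tasks and n_tw = 24 for the slide task. Fig. 7 shows the results, which are analyzed as follows:
1) In the push task, the success rate of DRL trends slightly downward. The success rate of SMIDRL starts to rise at epoch 160 and peaks at 30%. The success rate of HER+DRL starts to rise at epoch 60. HER+SMIDRL already exceeds 90% at epoch 60, peaks at around epoch 80, and then stays roughly stable;
2) In the pick-and-place task, DRL performs poorly: its success rate shows no upward trend and stays below 10%. The success rate of SMIDRL starts to rise at epoch 100. The success rate of HER+DRL grows, but relatively slowly, and remains below 90% even at epoch 200. HER+SMIDRL reaches its peak at around epoch 120 and stays roughly stable;
3) The slide task is harder than the other two, and HER+SMIDRL gains only slightly over HER+DRL. Without HER, however, SMIDRL has a clear advantage over DRL: the success rate of DRL shows no upward trend, while that of SMIDRL starts to rise after epoch 130.
Fig. 7  Performance of SMIDRL without obstacle: (a) pick-and-place task; (b) sliding task; (c) pushing task
In summary, in obstacle-free environments SMIDRL is clearly more efficient than DRL, and the improvement is even larger when HER is incorporated.
4.4.2 Effect of the number of screw mappings on performance
This experiment compares the effect of different numbers of screw mappings on algorithm performance without obstacles. Starting from HER+SMIDRL, it compares 0, 1, 8, 16, and 32 screw mappings. Fig. 8 shows the results, which are analyzed as follows:
1) In the push task, a single screw mapping already improves training efficiency substantially over no screw mapping. The differences between n_tw = 1 and n_tw = 32 are small. Early in training (before epoch 60), the success rate rises faster with more screw mappings, and it peaks at around epoch 60;
2) The pick-and-place task behaves like the push task: n_tw = 1 clearly outperforms n_tw = 0. The success rate for n_tw = 0 stays below 90%, whereas for n_tw = 1 it peaks at around epoch 140. Increasing n_tw from 1 to 32 still improves training efficiency, but the gain shrinks as the number of screw mappings grows;
3) The slide task is more complex; more screw mappings still improve training efficiency, but by progressively smaller amounts.
Fig. 8  Comparison of screw-mapping with different times in barrier-free environments: (a) pick-and-place task; (b) sliding task; (c) pushing task
In all three tasks, within a certain range, the success rate rises as the number of screw mappings increases, but the gain gradually diminishes.
4.4.3 Performance comparison with obstacles
This experiment adds obstacles to the simulation environment to verify that SMIDRL is also feasible and effective with obstacles. For ease of comparison, HER is incorporated: HER+DRL is compared with HER+SMIDRL. Fig. 9 shows the results. The obstacles make the environment more complex, so the success rate of the HER+DRL model stays low throughout. The results are analyzed as follows:
1) In the push task, the success rate of HER+DRL stays below 10%. The success rate of HER+SMIDRL rises steadily and stays above 90% from epoch 90 onward;
2) In the pick-and-place task, the success rate of HER+DRL rises slowly and shows a clear upward trend only from epoch 130; the success rate of the HER+SMIDRL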
begins to rise steadily at around epoch 40 and stays at about 90% after epoch 120.
3) The sliding task is harder than the other two tasks, so only a small performance gain is obtained.
Fig. 9 Performance of SMIDRL in tasks with obstacles: (a) pick and place; (b) sliding; (c) pushing
In summary, SMIDRL also improves sample efficiency and the generalization of the trained model in unstructured environments with obstacles.
4.4.4 Performance verification on the UR5 manipulator
The first three experiments verified the feasibility of SMIDRL in the Fetch manipulator environments. In this experiment the UR5 performs pick-and-place in an environment with obstacles, to verify the generality of SMIDRL and its feasibility for real-world settings. The number of screw mappings n_tw is set to 16. For ease of comparison, HER is included, i.e., HER+DRL is compared with HER+SMIDRL.
The results are shown in Fig. 10. The UR5 has six degrees of freedom and its motion is constrained in the obstacle environment, so its success rates are lower. The success rate of HER+DRL stays below 10%, whereas HER+SMIDRL stabilizes at about 42% from around epoch 130.
Fig. 10 Performance verification of SMIDRL on the UR5 manipulator
5 Conclusion
Deep reinforcement learning for manipulator path planning still requires many samples, and those samples are costly to acquire. To address this problem, this paper proposed the SMIDRL algorithm and validated it in simulation experiments with the Fetch and UR5 manipulators in unstructured environments. SMIDRL fuses the screw method with DRL, so that for the same amount of exploration it replicates more valid trajectories for training, which improves training efficiency. While augmenting trajectories, it also synchronously replicates environment elements such as goals and obstacles, thereby varying the interaction environment. This improves the manipulator's adaptability to complex randomized environments and the generalization of the trained model. The results show that without HER, the screw method qualitatively improves the trained model. Combined with HER, it accelerates learning in the pushing and pick-and-place tasks and substantially raises the manipulator's success rate. It also yields some improvement in the sliding task, and the gain from SMIDRL is more pronounced in complex environments with obstacles.
About the authors: WANG Yin, master's student; research interests: deep reinforcement learning, serial manipulators, path planning. WANG Yonghua, associate professor, Ph.D.; research interests: signal processing, machine learning, intelligent measurement and control. YIN Zezhong, master's student; research interests: mobile manipulators, deep reinforcement learning, motion planning. WAN Pin, professor, Ph.D.; research interests: intelligent measurement and control technology, signal processing, Internet of Things.
Control Theory & Applications, Vol. 40, No. 3, Mar. 2023
Stability analysis of networked control system based on two-sided looped functionals
ZENG Hong-bing, YAN Jun-jie, XIAO Hui-qin†
(School of Electrical and Information Engineering, Hunan University of Technology, Zhuzhou Hunan 412007, China)
DOI: 10.7641/CTA.2022.11008
Abstract: The stability of a class of networked sampled-data control systems with data transmission delay is studied.
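First, based on the input-delay method, a model of networked control systems with periodic sampling is established. On this basis, a new stability criterion for uncertain transmission delay is obtained using the two-sided looped-functional method and the free-matrix-based integral inequality technique. Furthermore, the relationship between the uncertain data transmission delay and the sampling period in networked sampled-data control systems is investigated. Simulation results show that reducing the sampling period can improve the robustness of networked control systems to network communication delay.
Key words: networked control system; two-sided looped functionals; free-matrix-based integral inequality; uncertain delay
Citation: ZENG Hongbing, YAN Junjie, XIAO Huiqin. Stability analysis of networked control system based on two-sided looped functionals. Control Theory & Applications, 2023, 40(3): 525–530
1 Introduction
With the rapid development of computer and network communication technologies, networked control systems (NCSs) have received extensive attention [1]. In an NCS, sampling is the front end of information processing and plays an important role in network communication and data transmission. Increasing the sampling period reduces the amount of transmitted information and thus effectively saves network communication resources. Moreover, the execution speed of system components and the network bandwidth are limited, so congestion inevitably occurs during signal acquisition and transmission, causing data transmission delays. In a network environment with such delays, it is therefore useful for the design of practical NCSs to study the relationship between the uncertain delay and the maximum sampling period that still guarantees stability.
Sampled-data control systems have been studied extensively and many important results have been obtained [2–6]. Three main approaches are used for their stability analysis:
1) the discrete-time system approach [3], used mainly for fixed sampling intervals, which models the sampled-data system as a discrete-time system and analyzes it with discrete-time system theory;
2) the impulsive system approach [4], which models the sampled-data system as an impulsive system;
3) the input-delay approach [5–6], which converts the sampled-data system into a continuous-time system with an input delay and analyzes it with continuous-time system theory.
In 2012, for the stability problem of sampled-data control systems, reference
Received: 2021-10-21; accepted: 2022-07-27.
† Corresponding author. Tel.: +86 731-22183270.
Associate editor in charge: SHI Yang.
Supported by the National Natural Science Foundation of China (62173136) and the Natural Science Foundation of Hunan Province (2020JJ2013, 2021JJ50047).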
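The input-delay approach listed in the introduction can be written out as follows. This is the standard formulation, and the symbols are assumptions for illustration rather than necessarily the paper's notation: a linear plant with matrices $A$, $B$, a state-feedback gain $K$, sampling instants $t_k$ with $t_{k+1} - t_k \le h$, and transmission delays $\eta_k \in [\eta_m, \eta_M]$. The control computed from $x(t_k)$ arrives at $t_k + \eta_k$ and is held until the next update arrives.

$$
\begin{aligned}
\dot{x}(t) &= A x(t) + B K x(t_k), && t \in [t_k + \eta_k,\ t_{k+1} + \eta_{k+1}),\\
\tau(t) &:= t - t_k \in [\eta_m,\ h + \eta_M),\\
\dot{x}(t) &= A x(t) + B K x\bigl(t - \tau(t)\bigr).
\end{aligned}
$$

Between updates, $\tau(t)$ grows linearly, and it resets when a new control value arrives. Its admissible range $[\eta_m, h + \eta_M)$ is what links the sampling period $h$ to the delay bounds in the resulting delay-dependent stability conditions.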
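Abstract
As jamming types become increasingly diverse and complex, single-mode guidance is less and less able to resist jamming. Multi-mode compound guidance has therefore been widely adopted to counter jamming effectively.
Multi-mode compound guidance must fuse the data from all sensors into a single output. Determining whether any sensor is being jammed, and which one, therefore directly affects the seeker's tracking results.
This thesis studies cooperative anti-jamming for a radar/infrared/laser tri-mode compound seeker.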
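First, the thesis briefly analyzes the jamming that radar, infrared, and laser sensors may face and the mechanisms of that jamming. Based on this analysis and on the seeker's common-aperture structure, it designs a distributed, feature-level tri-mode information-fusion anti-jamming scheme. The scheme consists of the sensor front ends and an information-fusion processing unit. The processing unit comprises five stages: track formation, spatio-temporal registration, track association, feature fusion, and track fusion. The anti-jamming strategy is implemented in the track-association stage.
Following this scheme, the thesis first presents the data-association algorithms, track management, and the α-β filter used in track formation. The data-association algorithms covered are nearest-neighbor data association and an improved empirical joint probabilistic data association algorithm. Simulations verify the conditions under which each of the two association algorithms is suitable, as well as the performance of the filter.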
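The α-β filter used for track formation is a fixed-gain tracker. It predicts with a constant-velocity model and corrects position and velocity by fixed fractions α and β of the innovation. The abstract gives neither the thesis's gains nor its state dimensions, so the minimal one-dimensional Python sketch below assumes a constant-velocity target, a sampling interval `dt`, and hand-picked gains. The class name `AlphaBetaFilter` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlphaBetaFilter:
    """Minimal 1-D alpha-beta tracker (illustrative; gains are hand-picked)."""
    alpha: float      # position gain, stable for 0 < alpha < 2
    beta: float       # velocity gain, stable for 0 < beta < 4 - 2*alpha
    dt: float         # sampling interval
    x: float = 0.0    # position estimate
    v: float = 0.0    # velocity estimate

    def update(self, z: float) -> float:
        # Predict one step ahead with a constant-velocity model.
        x_pred = self.x + self.v * self.dt
        # Innovation: difference between the measurement and the prediction.
        r = z - x_pred
        # Correct position and velocity with the fixed gains.
        self.x = x_pred + self.alpha * r
        self.v = self.v + (self.beta / self.dt) * r
        return self.x


if __name__ == "__main__":
    f = AlphaBetaFilter(alpha=0.5, beta=0.1, dt=1.0)
    for k in range(1, 200):
        f.update(2.0 * k)   # noise-free target moving 2 units per step
    print(f.x, f.v)
```

With a noise-free constant-velocity target, the estimates converge to the true position and velocity. Larger gains track manoeuvres faster but pass more measurement noise through.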
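Next, the thesis presents spatio-temporal registration, the track-association algorithms (the MK-NN algorithm and an improved MK-NN algorithm), and track fusion. The improved MK-NN algorithm uses a real-time sliding window and introduces a reference track, which lets it identify which sensor is being jammed. Simulations show that it resists jamming better than the original MK-NN algorithm and can effectively improve the seeker's anti-jamming capability.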
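The abstract describes the improved MK-NN association only at a high level: a real-time sliding window plus a reference track. The Python sketch below illustrates that idea with a generic M-of-K sliding-window test and is not the thesis's algorithm. Two time-aligned tracks count as associated at a step if their distance fell inside a gate in at least `m` of the last `k` steps. A sensor whose track no longer associates with the reference track is flagged as possibly jammed. The function names, the Euclidean gate, and the parameter values are all assumptions.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def sliding_window_associated(track_a: Sequence[Point], track_b: Sequence[Point],
                              gate: float, m: int, k: int) -> List[bool]:
    """Per-step M-of-K association decisions for two time-aligned tracks."""
    window = deque(maxlen=k)   # results of the last k gate tests
    decisions = []
    for (xa, ya), (xb, yb) in zip(track_a, track_b):
        window.append(math.hypot(xa - xb, ya - yb) <= gate)
        # Decide only once the window is full.
        decisions.append(len(window) == k and sum(window) >= m)
    return decisions


def flag_jammed(sensor_tracks: Dict[str, Sequence[Point]], reference: Sequence[Point],
                gate: float, m: int, k: int) -> List[str]:
    """Sensors whose latest decision is 'not associated' with the reference track."""
    flagged = []
    for name, track in sensor_tracks.items():
        decisions = sliding_window_associated(track, reference, gate, m, k)
        if decisions and not decisions[-1]:
            flagged.append(name)
    return sorted(flagged)
```

For example, if an infrared track is pulled 5 units away from the reference by a decoy while the radar track stays on it, then with `gate=1.0, m=3, k=4` only the infrared sensor is flagged once the window has filled with failed gate tests.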
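Finally, to implement cooperative anti-jamming through information fusion in engineering practice, the thesis builds an information-fusion system from the algorithms above. It describes the system's overall workflow and the tri-mode track-association strategy. It then gives a corresponding track-association cooperative anti-jamming scheme for each guidance phase of the seeker: single-mode, dual-mode, and tri-mode.
To verify that the system's functions and performance meet the requirements, the thesis also develops a digital simulation test system for information fusion based on computer modeling and simulation. This test system is used to test the fusion system's functions, its fusion-processing latency, and its anti-jamming metrics under 9 modes, covering single-jamming modes and several combined-jamming modes. The results show that the tri-mode compound fusion system can effectively counter the jamming in all 9 modes, whereas no single mode can fully resist it. This demonstrates the effectiveness of the cooperative anti-jamming technique.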
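Subjective Quality Evaluation Experiment for Color Fusion Images Based on Subspace Identification
ZHANG Xiaodong; GAO Shaoshu; WANG Yuxuan; WU Yu
[Abstract] To explore how individual image-quality indicators affect the target detectability of fused images, images of typical scenes were collected and a subjective evaluation experiment on color fusion images was conducted. The quality ratings of adjacent images influence the rating of the current image, so a subspace identification method is proposed to build a high-order prediction model of the overall quality of fused images based on target detection. Analysis and processing of the experimental results confirm the feasibility and effectiveness of the method. The experiment produced good results and good outcomes for laboratory teaching, and it provides an experimental basis and theoretical foundation for objective quality evaluation of dual-band color fusion images based on target detection.
[Journal] Research and Exploration in Laboratory
[Year (Volume), Issue] 2018 (037), 007
[Pages] 5 (pp. 136–139, 188)
[Keywords] color fusion image quality; subspace identification; subjective evaluation
[Affiliations] College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong 266580; College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong 266580; College of Electrical and Information, Northeast Agricultural University, Harbin 150030; College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong 266580
[Language] Chinese
[CLC number] G642
0 Introduction
Color (night-vision) fusion uses digital image processing to combine grayscale source images of the same scene, taken in visible (low-light) and infrared bands, into a single color image suited to human viewing. It helps observers detect targets faster and more accurately and has broad application prospects in military reconnaissance, security surveillance, maritime rescue, and other fields.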
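Researchers can measure how detectable targets are in color fusion images from the target brightness and from the brightness and hue differences between target and background [1].
The target detectability of fused images can also be evaluated by the detection rate, the false-alarm rate, and the color distance between target and background [2-3].
However, there is still no generally accepted theory or method for evaluating the quality of color (night-vision) fusion images, which directly hinders the evaluation of color night-vision imaging systems.
Evaluating the overall quality of fused images in terms of target detectability has therefore become a pressing problem.
The subjective quality evaluation experiment for color (night-vision) fusion images requires organizing students to rate image quality. The aim is to obtain reliable rating data that provide a common benchmark for validating and comparing objective image-quality evaluation models.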
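Introduction to Post-Fusion Algorithms
----------------------------------------------------------------------
A post-fusion (late-fusion) algorithm integrates and fuses the outputs of multiple classifiers or models. Combining the predictions of different models in this way improves overall performance and accuracy.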
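Post-fusion algorithms can usually be divided into the following types:
1. Voting algorithms: voting is a simple and intuitive post-fusion method.
It follows the majority-vote principle: the predictions of the individual models are tallied, and the class that receives the most votes is taken as the final prediction.
2. Weighted averaging: the predictions are averaged with weights that reflect each model's performance and reliability.
Better models are usually given higher weights and therefore have more influence on the final result.
3. Rule-based methods: predefined rules are used to process and combine the outputs of the models.
These rules can take into account inter-model agreement, confidence, bias, and similar factors to determine the best fusion strategy.
4. Bayesian methods: a probabilistic model describes the relationships among the models, and the fused probability distribution is computed with Bayes' rule.
By combining prior knowledge about the models with the posterior probabilities given the data, Bayesian methods can fuse predictions more accurately.
5. Ensemble learning methods: several models are trained and their predictions are combined.
Examples include random forests, AdaBoost, and bagging, which improve classifier performance and generalization.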
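The voting, weighted-averaging, and Bayesian rules above can be sketched in a few lines of Python with NumPy. This minimal illustration assumes that each model outputs either a class label (for voting) or a vector of class probabilities (for the other two rules). The Bayesian rule shown is the simple product rule, which treats the models as conditionally independent given the class. It is only one of many possible Bayesian fusion schemes, and all function names are hypothetical.

```python
from collections import Counter

import numpy as np


def majority_vote(labels):
    """Label predicted by most models; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(probs, weights):
    """Weighted mean of per-model class-probability rows.

    probs: (n_models, n_classes); weights: (n_models,), e.g. validation accuracy.
    """
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * probs).sum(axis=0) / w.sum()


def bayes_product(probs, prior):
    """Product-rule fusion under conditional independence of the models:

    p(c | x_1..x_n) is proportional to prod_i p(c | x_i) / p(c)^(n - 1).
    """
    probs = np.asarray(probs, dtype=float)
    prior = np.asarray(prior, dtype=float)
    n = probs.shape[0]
    log_post = np.log(probs).sum(axis=0) - (n - 1) * np.log(prior)
    post = np.exp(log_post - log_post.max())   # subtract max for numerical stability
    return post / post.sum()
```

Working in log space keeps the product rule stable when many models, or very small probabilities, are multiplied together.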
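The choice of post-fusion algorithm depends on the application scenario, the type and number of available models, the characteristics of the dataset, and other factors.
Different algorithms and methods may perform differently in different situations.
Note also that a post-fusion algorithm must be designed and tuned to guard against overfitting and to improve stability, so that it does not introduce additional errors.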
MTI-3-8A7G6-DK
Features
▪ Full-featured AHRS on 12.1 x 12.1 mm module
▪ Roll/pitch accuracy (dynamic): 1.0 deg
Figure 1: MTi 1-series
Table of Contents
1 General information
1.1 Ordering information
1.2 Block diagram
1.3 Typical application
1.4 Pin configuration
1.5 Pin map
1.6 Pin descriptions
1.7 Peripheral interface selection
1.7.1 I2C
1.7.2 SPI
1.7.3 UART half duplex
1.7.4 UART full duplex with RTS/CTS flow control
1.8 Recommended external components
2 MTi 1-series architecture
2.1 MTi 1-series configurations
2.1.1 MTi-1 IMU
2.1.2 MTi-2 VRU
2.1.3 MTi-3 AHRS
2.2 Signal processing pipeline
2.2.1 Strapdown integration
2.2.2 XKF3™ sensor fusion algorithm
2.2.3 Frames of reference used in MTi 1-series
3 3D orientation and performance specifications
3.1 3D orientation specifications
3.2 Sensors specifications
4 Sensor calibration
5 System and electrical specifications
5.1 Interface specifications
5.2 System specifications
5.3 Electrical specifications
5.4 Absolute maximum ratings
6 MTi 1-series settings and outputs
6.1 Message structure
6.2 Output settings
6.3 MTData2
6.4 Synchronization and timing
7 Magnetic interference
7.1 Magnetic field mapping
7.2 Active Heading Stabilization (AHS)
8 Package and handling
8.1 Package drawing
8.2 Packaging
8.3 Reflow specification
9 Trademarks and revisions
9.1 Trademarks
9.2 Revisions
Figure 2: MTi 1-series module block diagram
Figure 3: Typical application
Figure 4: Pin assignment
Figure 8: External components (I2C interface)
Figure 9: External components (UART interface)
2 MTi 1-series architecture
This section discusses the MTi 1-series architecture, including the various configurations and the signal processing pipeline.
2.1 MTi 1-series configurations
The MTi 1-series is a fully tested, self-contained module that can output 3D orientation data (Euler angles (roll,
pitch, yaw), rotation matrix (DCM) and quaternions), orientation and velocity increments (∆q and ∆v), and sensor data (acceleration, rate of turn, magnetic field). The MTi 1-series module is available as an Inertial Measurement Unit (IMU), a Vertical Reference Unit (VRU), and an Attitude and Heading Reference System (AHRS). Depending on the product, output options may be limited to sensor data and/or unreferenced yaw.
All MTi modules feature a 3D accelerometer/gyroscope combo-sensor, a magnetometer, a high-accuracy crystal, and a low-power MCU. The MCU coordinates the synchronization and timing of the various sensors, applies calibration models (e.g. temperature models) and output settings, and runs the sensor fusion algorithm. The MCU also generates output messages according to the proprietary XBus communication protocol. The messages and the data output are fully configurable, so the MTi 1-series limits the load, and thus the power consumption, on the application processor.
2.1.1 MTi-1 IMU
The MTi-1 module is an Inertial Measurement Unit (IMU) that outputs 3D rate of turn, 3D acceleration, and 3D magnetic field. The MTi-1 also outputs coning- and sculling-compensated orientation increments and velocity increments (∆q and ∆v) from its AttitudeEngine™. Its advantages over a gyroscope-accelerometer combo-sensor are synchronized magnetic field data, on-board signal processing, and an easy-to-use communication protocol. Moreover, the testing and calibration performed by Xsens result in a robust and reliable sensor module that can be integrated within a short time frame. The signal processing pipeline and the suite of output options give access to the highest possible accuracy at any bandwidth while limiting the load on the application processor.
2.1.2 MTi-2 VRU
The MTi-2 is a 3D vertical reference unit (VRU). Its orientation algorithm (XKF3™) outputs 3D orientation data with respect to a gravity-referenced frame: drift-free roll, pitch, and unreferenced yaw.
In addition, it outputs calibrated sensor data: 3D acceleration, 3D rate of turn, and 3D earth-magnetic field data. All modules of the MTi 1-series can also output data generated by the strapdown integration algorithm (the AttitudeEngine™, which outputs orientation and velocity increments ∆q and ∆v). The 3D acceleration is also available as so-called free acceleration, which has gravity subtracted. Although the yaw is unreferenced, it is still superior to plain gyroscope integration. With the Active Heading Stabilization feature (AHS, see section 7.2), the drift in unreferenced yaw can be limited to 1 deg after 60 minutes, even in magnetically disturbed environments.
2.1.3 MTi-3 AHRS
The MTi-3 supports all features of the MTi-1 and MTi-2 and is, in addition, a full gyro-enhanced Attitude and Heading Reference System (AHRS). It outputs drift-free roll, pitch, and true/magnetic North referenced yaw. It also outputs sensor data: 3D acceleration, 3D rate of turn, 3D orientation and velocity increments (∆q and ∆v), and 3D earth-magnetic field data. Free acceleration is also available for the MTi-3 AHRS.
2.2 Signal processing pipeline
The MTi 1-series is a self-contained module, so all calculations and processes run on board. These include sampling, coning and sculling compensation, and the Xsens XKF3™ sensor fusion algorithm.
2.2.1 Strapdown integration
The Xsens optimized strapdown algorithm (AttitudeEngine™) performs high-speed dead-reckoning calculations at 1 kHz, allowing accurate capture of high-frequency motions. This approach ensures a high bandwidth. Orientation and velocity increments are calculated with full coning and sculling compensation. At output data rates of up to 100 Hz no information is lost, yet the output data rate can be configured low enough for systems with limited communication bandwidth. These orientation and velocity increments are suitable for any 3D motion tracking algorithm.
Increments are internally time-synchronized with the magnetometer data.
2.2.2 XKF3™ sensor fusion algorithm
XKF3 is a sensor fusion algorithm based on an Extended Kalman Filter framework. It uses 3D inertial sensor data (orientation and velocity increments) and 3D magnetometer data, together also known as '9D', to optimally estimate 3D orientation with respect to an Earth-fixed frame. XKF3 fuses the orientation and velocity increments with the magnetic field updates to produce a stable orientation (roll, pitch and yaw) with respect to the Earth-fixed frame. The XKF3 sensor fusion algorithm is run with filter profiles. These filter profiles contain predefined filter parameter settings suited to different user application scenarios.
The following filter profiles are available:
∙ General: suitable for most applications. Supported by the MTi-3 module.
∙ Dynamic: assumes that the motion is highly dynamic. Supported by the MTi-3 module.
∙ High_mag_dep: heading corrections rely on the measured magnetic field. To be used when the magnetic field is homogeneous. Supported by the MTi-3 module.
∙ Low_mag_dep: heading corrections are less dependent on the measured magnetic field. Heading is still based on the magnetic field, but more distortions are expected, so less trust is placed on magnetic measurements. Supported by the MTi-3 module.
∙ VRU_general: roll and pitch are referenced to the vertical (gravity), and yaw is determined by stabilized dead-reckoning. This is referred to as Active Heading Stabilization (AHS), which significantly reduces heading drift (see also section 7.2). Consider using VRU_general in environments with a heavily disturbed magnetic field.
The VRU_general filter profile is the only filter profile available for the MTi-2 VRU, and it is also supported by the MTi-3 module.
Figure 10: Default sensor-fixed coordinate system for the MTi 1-series module
It is straightforward to apply a rotation matrix to the MTi so that the velocity and orientation increments, free acceleration, and orientation are output in that coordinate frame. The default reference coordinate system is East-North-Up (ENU), and the MTi 1-series has predefined output options for North-East-Down (NED) and North-West-Up (NWU). Any arbitrary alignment can be entered. These orientation resets affect all outputs that are by default expressed in the ENU reference coordinate system.
4 Sensor calibration
Each MTi is individually calibrated and tested over its temperature range. The (simplified) sensor model of the gyroscopes, accelerometers, and magnetometers can be represented as follows:
$$s = K_T^{-1}\,(u - b_T)$$
where
$s$ = sensor data of the gyroscopes, accelerometers, and magnetometers in rad/s, m/s², or a.u., respectively
$K_T^{-1}$ = gain and misalignment matrix (temperature compensated)
$u$ = sensor value before calibration (unsigned 16-bit integers from the sensor)
$b_T$ = bias (temperature compensated)
The Xsens calibration procedure calibrates for many parameters, including bias (offset), the alignment of the sensors with respect to the module PCB and to each other, and gain (scale factor). All calibration values are temperature dependent and temperature calibrated. The calibration values are stored in non-volatile memory in the MTi.
7 Magnetic interference
Magnetic interference can be a major source of error for the heading accuracy of any Attitude and Heading Reference System (AHRS). An AHRS uses the magnetic field to reference the dead-reckoned orientation in the horizontal plane with respect to (magnetic) North. A severe and prolonged distortion in that magnetic field will therefore make the magnetic reference inaccurate.
The MTi 1-series module has several ways to cope with these distortions and minimize their effect on the estimated orientation.
7.1 Magnetic Field Mapping
When the distortion is deterministic, i.e. when it moves with the MTi, the MTi can be calibrated for it. Errors of this type are usually referred to as soft-iron and hard-iron distortions, and the Magnetic Field Mapping procedure compensates for both.
In short, the magnetic field mapping (calibration) is performed by moving the MTi together with the object or platform that causes the distortion. The results are processed on an external computer (Windows or Linux), and the updated magnetic field calibration values are written to the non-volatile memory of the MTi 1-series module. The magnetic field mapping procedure is documented in detail in the Magnetic Field Mapper User Manual (MT0202P), available in the MT Software Suite.
7.2 Active Heading Stabilization (AHS)
It is often not possible or desirable to connect the MTi 1-series module to a high-level processor or host system, in which case the Magnetic Field Mapping procedure is not an option. Also, when the distortion is non-deterministic, the Magnetic Field Mapping procedure does not yield the desired result. For all these situations, the on-board XKF3 sensor fusion algorithm integrates an algorithm called Active Heading Stabilization (AHS).
The AHS algorithm delivers excellent heading tracking accuracy. Heading tracking drift in the MTi 1-series can be as low as 1 deg per hour, while being fully immune to magnetic distortions.
AHS is only available in the VRU_general filter profile. This filter profile is the only filter profile in the MTi-2 VRU and one of the 5 available filter profiles in the MTi-3 AHRS.
8 Package and handling
Note that this is a mechanical shock (g) sensitive device. Proper handling is required to prevent damage to the part. Note that this is an ESD-sensitive device.
Proper handling is required to prevent damage to the part.
8.1 Package drawing
The MTi 1-series module is compatible with JEDEC PLCC28 IC sockets.
Figure 11: General tolerances are +/- 0.1 mm
Figure 12: Recommended MTi 1-series module footprint
8.2 Packaging
The MTi 1-series module is shipped in trays. Trays are available with a MOQ of 20 modules. A full tray contains 152 modules.
Figure 13: A tray containing 20 MTi 1-series modules
8.3 Reflow specification
The moisture sensitivity level of the MTi 1-series modules corresponds to JEDEC MSL Level 3. See also:
∙ IPC/JEDEC J-STD-020E "Joint Industry Standard: Moisture/Reflow Sensitivity Classification for Non-hermetic Solid State Surface Mount Devices"
∙ IPC/JEDEC J-STD-033C "Joint Industry Standard: Handling, Packing, Shipping and Use of Moisture/Reflow Sensitive Surface Mount Devices".
The sensor fulfils the lead-free soldering requirements of the above-mentioned IPC/JEDEC standard, i.e. reflow soldering with a peak temperature of up to 260 °C. The recommended preheat area (t_s) is 80–100 s. The height of the solder after reflow shall be at least 50 µm. This is required for good mechanical decoupling between the MTi 1-series module and the printed circuit board (PCB) it is mounted on. Assembled PCBs may NOT be cleaned with ultrasonic cleaning.
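The simplified sensor model in Section 4, $s = K_T^{-1}(u - b_T)$, can be applied as in the Python/NumPy sketch below. The datasheet does not describe how Xsens stores or evaluates the temperature-dependent parameters. The linear interpolation between calibration temperatures is therefore an assumption for illustration, and both function names are hypothetical. `np.linalg.solve` is used instead of forming the inverse explicitly.

```python
import numpy as np


def interpolate_params(T, temps, K_table, b_table):
    """Hypothetical temperature compensation: linearly interpolate K and b.

    temps: increasing calibration temperatures (n,), K_table: (n, 3, 3),
    b_table: (n, 3). Temperatures outside the table are clamped to its ends.
    """
    temps = np.asarray(temps, dtype=float)
    i = int(np.clip(np.searchsorted(temps, T) - 1, 0, len(temps) - 2))
    w = float(np.clip((T - temps[i]) / (temps[i + 1] - temps[i]), 0.0, 1.0))
    K = (1 - w) * np.asarray(K_table[i], dtype=float) + w * np.asarray(K_table[i + 1], dtype=float)
    b = (1 - w) * np.asarray(b_table[i], dtype=float) + w * np.asarray(b_table[i + 1], dtype=float)
    return K, b


def apply_calibration(u, K, b):
    """Sensor model s = K^{-1} (u - b), solved without an explicit inverse."""
    u = np.asarray(u, dtype=float)
    return np.linalg.solve(K, u - b)
```

In this sketch, `interpolate_params` would be called with the current temperature reading before each `apply_calibration` call.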
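Journal of South China Agricultural University, 2024, 45(2): 293-303
DOI: 10.7671/j.issn.1001-411X.202212020
LIU Mingyang, CUI Kai, GONG Jinliang, et al. Apple fruit recognition at different maturity stages based on image fusion[J]. Journal of South China Agricultural University, 2024, 45(2): 293-303.
Apple fruit recognition at different maturity stages based on image fusion
LIU Mingyang¹, CUI Kai², GONG Jinliang³, ZHANG Yanfei¹
(1 College of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo, Shandong 255000; 2 Nanjing Qingting Smart Agriculture Research Institute Co., Ltd., Nanjing, Jiangsu 210019; 3 School of Mechanical Engineering, Shandong University of Technology, Zibo, Shandong 255000)
Abstract: [Objective] Apples at different maturity stages are difficult to recognize in complex agricultural environments. To address this, an apple recognition algorithm based on image fusion is studied.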
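[Method] The image is preprocessed with mean-shift filtering, which preserves edges well, to remove minor background noise.
The R−G component feature image is extracted from the RGB color space and the I component feature image from the YIQ color space. A pixel-level image fusion algorithm then fuses the information of the two feature images to highlight the fruit regions.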
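The feature extraction and pixel-level fusion step can be sketched in Python with NumPy. The R−G map comes directly from the RGB channels, and the I map uses the standard NTSC RGB-to-YIQ coefficients. The abstract does not specify the fusion rule, so this sketch assumes a weighted average of the two min-max-normalized maps with a hypothetical weight `w`.

```python
import numpy as np


def normalize(img: np.ndarray) -> np.ndarray:
    """Min-max normalize to [0, 1]; a constant image maps to zeros."""
    img = img.astype(float)
    lo, hi = img.min(), img.max()
    return np.zeros_like(img) if hi == lo else (img - lo) / (hi - lo)


def fruit_feature_map(rgb: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Fuse the R-G and YIQ I-component feature images of an H x W x 3 RGB array."""
    rgb = rgb.astype(float)                  # avoid uint8 wrap-around in R - G
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                               # red-minus-green feature
    i = 0.596 * r - 0.274 * g - 0.322 * b    # NTSC YIQ in-phase (I) component
    # Assumed pixel-level rule: weighted average of the normalized maps.
    return w * normalize(rg) + (1 - w) * normalize(i)
```

Both features are large for reddish fruit and small for green foliage, so the fused map is brightest on ripe-fruit regions. It is then suitable for rescaling to 8 bits and thresholding.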
Otsu's adaptive thresholding algorithm provides the optimal threshold, which is used to segment the apples from the background.
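Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram, $\sigma_B^2(t) = (\mu_T\,\omega(t) - \mu(t))^2 / [\omega(t)(1 - \omega(t))]$. The self-contained NumPy sketch below works on 8-bit images. Applying it to the fused feature map assumes that the map is first rescaled to 0–255, which is an assumption about how the paper feeds it to Otsu.

```python
import numpy as np


def otsu_threshold(img: np.ndarray) -> int:
    """Return the threshold t in [0, 255] that maximizes between-class variance.

    Pixels with value > t are treated as foreground. `img` must be uint8.
    """
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                    # gray-level probabilities
    omega = np.cumsum(p)                     # probability of the class "<= t"
    mu = np.cumsum(p * np.arange(256))       # cumulative mean up to t
    mu_total = mu[-1]                        # global mean
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / denom
    sigma_b2[~np.isfinite(sigma_b2)] = 0.0   # empty classes contribute nothing
    return int(np.argmax(sigma_b2))
```

The segmentation mask is then simply `img > otsu_threshold(img)`.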
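To recognize the apples, a Hough-transform circle detection algorithm based on an improved gradient field is proposed. Morphological reconstruction is introduced to remove small residual regions in the background, which improves detection efficiency. In addition, a false-circle rejection algorithm that uses the segmented binary apple image as its criterion is constructed to prevent false targets from being detected.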
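The abstract says only that the false-circle rejection uses the segmented binary apple image as its criterion. The NumPy sketch below assumes one simple way to do this, not taken from the paper: keep a Hough candidate only if a sufficient fraction of the pixels inside the circle are foreground in the binary mask. The Hough detection itself is not shown, and the function names and the threshold `min_support` are hypothetical.

```python
from typing import List, Tuple

import numpy as np

Circle = Tuple[float, float, float]   # (cx, cy, r) in pixel coordinates


def circle_support(mask: np.ndarray, cx: float, cy: float, r: float) -> float:
    """Fraction of pixels inside the circle that are foreground in the boolean mask."""
    h, w = mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
    if not inside.any():
        return 0.0
    return float(mask[inside].mean())


def reject_false_circles(mask: np.ndarray, circles: List[Circle],
                         min_support: float = 0.6) -> List[Circle]:
    """Keep only candidate circles that are mostly covered by segmented fruit."""
    return [c for c in circles if circle_support(mask, *c) >= min_support]
```

A circle fitted to background texture overlaps little of the segmented fruit, so its support falls below the threshold and it is discarded.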