Unifying statistical texture classification frameworks
- Format: PDF
- Size: 784.89 KB
- Pages: 17
MULTI-SCALE STRUCTURAL SIMILARITY FOR IMAGE QUALITY ASSESSMENT

Zhou Wang (1), Eero P. Simoncelli (1) and Alan C. Bovik (2) (Invited Paper)
(1) Center for Neural Sci. and Courant Inst. of Math. Sci., New York Univ., New York, NY 10003
(2) Dept. of Electrical and Computer Engineering, Univ. of Texas at Austin, Austin, TX 78712
Email: zhouwang@, eero.simoncelli@, bovik@

ABSTRACT
The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multi-scale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.

1. INTRODUCTION
Objective image quality assessment research aims to design quality measures that can automatically predict perceived image quality. These quality measures play important roles in a broad range of applications such as image acquisition, compression, communication, restoration, enhancement, analysis, display, printing and watermarking. The most widely used full-reference image quality and distortion assessment algorithms are peak signal-to-noise ratio (PSNR) and mean squared error (MSE), which do not correlate well with perceived quality (e.g., [1]-[6]).

Traditional perceptual image quality assessment methods are based on a bottom-up approach which attempts to simulate the functionality of the relevant early human visual system (HVS) components. These methods usually involve 1) a preprocessing process that may include image alignment, point-wise nonlinear transform, low-pass filtering that simulates eye optics, and color space transformation, 2) a channel decomposition process that transforms the image signals into different spatial frequency as well as orientation selective subbands, 3) an error normalization process that weights the error signal in each subband by incorporating the variation of visual sensitivity in different subbands, and the variation of visual error sensitivity caused by intra- or inter-channel neighboring transform coefficients, and 4) an error pooling process that combines the error signals in different subbands into a single quality/distortion value. While these bottom-up approaches can conveniently make use of many known psychophysical features of the HVS, it is important to recognize their limitations. In particular, the HVS is a complex and highly non-linear system, and the complexity of natural images is also very significant, but most models of early vision are based on linear or quasi-linear operators that have been characterized using restricted and simplistic stimuli. Thus, these approaches must rely on a number of strong assumptions and generalizations [4], [5]. Furthermore, as the number of HVS features has increased, the resulting quality assessment systems have become too complicated to work with in real-world applications, especially for algorithm optimization purposes.

Structural similarity provides an alternative and complementary approach to the problem of image quality assessment [3]-[6]. It is based on a top-down assumption that the HVS is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity should be a good approximation of perceived image quality. It has been shown that a simple implementation of this methodology, namely the structural similarity (SSIM) index [5], can outperform state-of-the-art perceptual image quality metrics. However, the SSIM index algorithm introduced in [5] is a single-scale approach. We consider this a drawback of the method because the right scale depends on viewing conditions (e.g., display resolution and viewing distance).
In this paper, we propose a multi-scale structural similarity method and introduce a novel image synthesis-based approach to calibrate the parameters that weight the relative importance between different scales.

2. SINGLE-SCALE STRUCTURAL SIMILARITY
Let x = {x_i | i = 1, 2, ..., N} and y = {y_i | i = 1, 2, ..., N} be two discrete non-negative signals that have been aligned with each other (e.g., two image patches extracted from the same spatial location from two images being compared, respectively), and let µ_x, σ_x^2 and σ_xy be the mean of x, the variance of x, and the covariance of x and y, respectively. Approximately, µ_x and σ_x can be viewed as estimates of the luminance and contrast of x, and σ_xy measures the tendency of x and y to vary together, thus an indication of structural similarity. In [5], the luminance, contrast and structure comparison measures were given as follows:

l(x,y) = (2 µ_x µ_y + C1) / (µ_x^2 + µ_y^2 + C1),   (1)
c(x,y) = (2 σ_x σ_y + C2) / (σ_x^2 + σ_y^2 + C2),   (2)
s(x,y) = (σ_xy + C3) / (σ_x σ_y + C3),              (3)

where C1, C2 and C3 are small constants given by

C1 = (K1 L)^2,  C2 = (K2 L)^2  and  C3 = C2 / 2,    (4)

respectively. L is the dynamic range of the pixel values (L = 255 for 8 bits/pixel gray scale images), and K1 << 1 and K2 << 1 are two scalar constants.

Fig. 1. Multi-scale structural similarity measurement system. L: low-pass filtering; 2↓: downsampling by 2.

The general form of the Structural SIMilarity (SSIM) index between signals x and y is defined as:

SSIM(x,y) = [l(x,y)]^α · [c(x,y)]^β · [s(x,y)]^γ,   (5)

where α, β and γ are parameters that define the relative importance of the three components. Specifically, we set α = β = γ = 1, and the resulting SSIM index is given by

SSIM(x,y) = (2 µ_x µ_y + C1)(2 σ_xy + C2) / [(µ_x^2 + µ_y^2 + C1)(σ_x^2 + σ_y^2 + C2)],   (6)

which satisfies the following conditions:
1. symmetry: SSIM(x,y) = SSIM(y,x);
2. boundedness: SSIM(x,y) <= 1;
3. unique maximum: SSIM(x,y) = 1 if and only if x = y.

The universal image quality index proposed in [3] corresponds to the case C1 = C2 = 0, and is therefore a special case of (6). The drawback of such a parameter setting is that when the denominator of Eq. (6) is close to 0, the resulting measurement becomes unstable. This problem was solved successfully in [5] by adding the two small constants C1 and C2 (calculated by setting K1 = 0.01 and K2 = 0.03, respectively, in Eq. (4)).

We apply the SSIM indexing algorithm for image quality assessment using a sliding window approach. The window moves pixel-by-pixel across the whole image space. At each step, the SSIM index is calculated within the local window. If one of the images being compared is considered to have perfect quality, then the resulting SSIM index map can be viewed as the quality map of the other (distorted) image. Instead of using an 8x8 square window as in [3], a smooth windowing approach is used for local statistics to avoid "blocking artifacts" in the quality map [5]. Finally, a mean SSIM index of the quality map is used to evaluate the overall image quality.

3. MULTI-SCALE STRUCTURAL SIMILARITY
3.1. Multi-scale SSIM index
The perceivability of image details depends on the sampling density of the image signal, the distance from the image plane to the observer, and the perceptual capability of the observer's visual system. In practice, the subjective evaluation of a given image varies when these factors vary. A single-scale method as described in the previous section may be appropriate only for specific settings. A multi-scale method is a convenient way to incorporate image details at different resolutions.

We propose a multi-scale SSIM method for image quality assessment whose system diagram is illustrated in Fig. 1. Taking the reference and distorted image signals as the input, the system iteratively applies a low-pass filter and downsamples the filtered image by a factor of 2. We index the original image as Scale 1, and the highest scale as Scale M, which is obtained after M-1 iterations. At the j-th scale, the contrast comparison (2) and the structure comparison (3) are calculated and denoted as c_j(x,y) and s_j(x,y), respectively. The luminance comparison (1) is computed only at Scale M and is denoted as l_M(x,y). The overall SSIM evaluation is obtained by combining the measurements at different scales using

SSIM(x,y) = [l_M(x,y)]^{α_M} · ∏_{j=1}^{M} [c_j(x,y)]^{β_j} [s_j(x,y)]^{γ_j}.   (7)

Similar to (5), the exponents α_M, β_j and γ_j are used to adjust the relative importance of the different components. This multi-scale SSIM index definition satisfies the three conditions given in the last section. It also includes the single-scale method as a special case. In particular, a single-scale implementation for Scale M applies the iterative filtering and downsampling procedure up to Scale M, and only the exponents α_M, β_M and γ_M are given non-zero values.

To simplify parameter selection, we let α_j = β_j = γ_j for all j. In addition, we normalize the cross-scale settings such that Σ_{j=1}^{M} γ_j = 1. This makes different parameter settings (including all single-scale and multi-scale settings) comparable. The remaining job is to determine the relative values across different scales.
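The pipeline of Eqs. (1)-(7) can be sketched in NumPy as follows. This is a minimal illustration, not the paper's exact implementation: it uses global patch statistics instead of a smooth sliding window, and a 2x2 average filter as the low-pass stage; the default weights are the calibrated exponents reported in Sec. 3.2.

```python
import numpy as np

def ssim_components(x, y, K1=0.01, K2=0.03, L=255):
    """Luminance, contrast and structure comparisons of Eqs. (1)-(3)
    for two aligned patches x and y (global statistics for brevity)."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    mu_x, mu_y = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mu_x) * (y - mu_y)).mean()
    l = (2 * mu_x * mu_y + C1) / (mu_x**2 + mu_y**2 + C1)
    c = (2 * sx * sy + C2) / (sx**2 + sy**2 + C2)
    s = (sxy + C3) / (sx * sy + C3)
    return l, c, s

def ms_ssim(x, y, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)):
    """Multi-scale SSIM of Eq. (7): contrast/structure at every scale,
    luminance only at the coarsest scale, exponents given by `weights`."""
    val = 1.0
    for j, w in enumerate(weights):
        l, c, s = ssim_components(x, y)
        if j == len(weights) - 1:      # Scale M: include the luminance term
            val *= l ** w
        val *= (c * s) ** w
        # low-pass filter (2x2 average), then downsample by 2
        x = 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])
        y = 0.25 * (y[0::2, 0::2] + y[1::2, 0::2] + y[0::2, 1::2] + y[1::2, 1::2])
    return val
```

With five weights, an input of at least 64x64 pixels keeps every scale well defined; identical images score 1, and distortions at any scale pull the product below 1.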
Conceptually, this should be related to the contrast sensitivity function (CSF) of the HVS [7], which states that human visual sensitivity peaks at middle frequencies (around 4 cycles per degree of visual angle) and decreases along both the high- and low-frequency directions. However, the CSF cannot be directly used to derive the parameters in our system because it is typically measured at the visibility threshold level using simplified stimuli (sinusoids), whereas our purpose is to compare the quality of complex structured images at visible distortion levels.

3.2. Cross-scale calibration
We use an image synthesis approach to calibrate the relative importance of different scales. In previous work, the idea of synthesizing images for subjective testing has been employed by the "synthesis-by-analysis" methods of assessing statistical texture models, in which the model is used to generate a texture with statistics matching an original texture, and a human subject then judges the similarity of the two textures [8]-[11]. A similar approach has also been qualitatively used in demonstrating quality metrics in [5], [12], though quantitative subjective tests were not conducted. These synthesis methods provide a powerful and efficient means of testing a model, and have the added benefit that the resulting images suggest improvements that might be made to the model [11].

Fig. 2. Demonstration of the image synthesis approach for cross-scale calibration. Images in the same row have the same MSE. Images in the same column have distortions only in one specific scale. Each subject was asked to select a set of images (one from each scale) having equal quality. As an example, one subject chose the marked images.

For a given original 8 bits/pixel gray scale test image, we synthesize a table of distorted images (as exemplified by Fig. 2), where each entry in the table is an image that is associated with a specific distortion level (defined by MSE) and a specific scale. Each of the distorted images is created using an iterative procedure, where the initial image is generated by randomly adding white Gaussian noise to the original image, and the iterative process employs a constrained gradient descent algorithm to search for the worst images in terms of the SSIM measure while constraining the MSE to be fixed and restricting the distortions to occur only in the specified scale. We use 5 scales and 12 distortion levels (ranging from 2^3 to 2^14) in our experiment, resulting in a total of 60 images, as demonstrated in Fig. 2. Although the images in each row have the same MSE with respect to the original image, their visual quality is significantly different. Thus the distortions at different scales are of very different importance in terms of perceived image quality. We employ 10 original 64x64 images with different types of content (human faces, natural scenes, plants, man-made objects, etc.) in our experiment to create 10 sets of distorted images (a total of 600 distorted images).

We gathered data for 8 subjects, including one of the authors. The other subjects had general knowledge of human vision but did not know the detailed purpose of the study. Each subject was shown the 10 sets of test images, one set at a time. The viewing distance was fixed to 32 pixels per degree of visual angle. The subject was asked to compare the quality of the images across scales and select one image from each of the five scales (shown as columns in Fig. 2) that the subject believed to have the same quality. For example, one subject chose the images marked in Fig. 2 as having equal quality. The positions of the selected images in each scale were recorded and averaged over all test images and all subjects. In general, the subjects agreed with each other on each image more than they agreed with themselves across different images. These test results were normalized (to sum to one) and used to calculate the exponents in Eq. (7). The resulting parameters we obtained are β1 = γ1 = 0.0448, β2 = γ2 = 0.2856, β3 = γ3 = 0.3001, β4 = γ4 = 0.2363, and α5 = β5 = γ5 = 0.1333, respectively.

4. TEST RESULTS
We test a number of image quality assessment algorithms using the LIVE database (available at [13]), which includes 344 JPEG and JPEG2000 compressed images (typically of size 768x512 or similar). The bit rate ranges from 0.028 to 3.150 bits/pixel, which allows the test images to cover a wide quality range, from indistinguishable from the original image to highly distorted. The mean opinion score (MOS) of each image is obtained by averaging 13 to 25 subjective scores given by a group of human observers. Eight image quality assessment models are compared, including PSNR, the Sarnoff model (JNDmetrix 8.0 [14]), the single-scale SSIM index with M equal to 1 through 5, and the proposed multi-scale SSIM index approach.

The scatter plots of MOS versus model predictions are shown in Fig. 3, where each point represents one test image, with its vertical and horizontal axes representing its MOS and the given objective quality score, respectively. To provide quantitative performance evaluation, we use the logistic function adopted in the video quality experts group (VQEG) Phase I FR-TV test [15] to provide a non-linear mapping between the objective and subjective scores. After the non-linear mapping, the linear correlation coefficient (CC), the mean absolute error (MAE), and the root mean squared error (RMS) between the subjective and objective scores are calculated as measures of prediction accuracy. The prediction consistency is quantified using the outlier ratio (OR), defined as the percentage of predictions falling outside ±2 standard deviations. Finally, the prediction monotonicity is measured using the Spearman rank-order correlation coefficient (ROCC). Readers can refer to [15] for more detailed descriptions of these measures. The evaluation results for all the models being compared are given in Table 1.

Table 1. Performance comparison of image quality assessment models on the LIVE JPEG/JPEG2000 database [13]. SS-SSIM: single-scale SSIM; MS-SSIM: multi-scale SSIM; CC: non-linear regression correlation coefficient; ROCC: Spearman rank-order correlation coefficient; MAE: mean absolute error; RMS: root mean squared error; OR: outlier ratio.

Model            CC      ROCC    MAE    RMS    OR (%)
PSNR             0.905   0.901   6.53   8.45   15.7
Sarnoff          0.956   0.947   4.66   5.81   3.20
SS-SSIM (M=1)    0.949   0.945   4.96   6.25   6.98
SS-SSIM (M=2)    0.963   0.959   4.21   5.38   2.62
SS-SSIM (M=3)    0.958   0.956   4.53   5.67   2.91
SS-SSIM (M=4)    0.948   0.946   4.99   6.31   5.81
SS-SSIM (M=5)    0.938   0.936   5.55   6.88   7.85
MS-SSIM          0.969   0.966   3.86   4.91   1.16

From both the scatter plots and the quantitative evaluation results, we see that the performance of the single-scale SSIM model varies with scale, and the best performance is given by the case M = 2. It can also be observed that the single-scale model tends to supply higher scores as the scale increases. This is not surprising because image coding techniques such as JPEG and JPEG2000 usually compress fine-scale details to a much higher degree than coarse-scale structures, and thus the distorted image "looks" more similar to the original image if evaluated at larger scales. Finally, for every one of the objective evaluation criteria, the multi-scale SSIM model outperforms all the other models, including the best single-scale SSIM model, suggesting a meaningful balance between scales.

5. DISCUSSION
We propose a multi-scale structural similarity approach for image quality assessment, which provides more flexibility than single-scale approaches in incorporating the variations of image resolution and viewing conditions. Experiments show that with appropriate parameter settings, the multi-scale method outperforms the best single-scale SSIM model as well as state-of-the-art image quality metrics.

In the development of top-down image quality models (such as structural similarity based algorithms), one of the most challenging problems is to calibrate the model parameters, which are rather "abstract" and cannot be directly derived from simple-stimulus subjective experiments as in the bottom-up models. In this paper, we used an image synthesis approach to calibrate the parameters that define the relative importance between scales. The improvement from single-scale to multi-scale methods observed in our tests suggests the usefulness of this novel approach. However, this approach is still rather crude. We are working on developing it into a more systematic approach that can potentially be employed in a much broader range of applications.

6. REFERENCES
[1] A. M. Eskicioglu and P. S. Fisher, "Image quality measures and their performance," IEEE Trans. Communications, vol. 43, pp. 2959-2965, Dec. 1995.
[2] T. N. Pappas and R. J. Safranek, "Perceptual criteria for image quality evaluation," in Handbook of Image and Video Proc. (A. Bovik, ed.), Academic Press, 2000.
[3] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, pp. 81-84, Mar. 2002.
[4] Z. Wang, H. R. Sheikh, and A. C. Bovik, "Objective video quality assessment," in The Handbook of Video Databases: Design and Applications (B. Furht and O. Marques, eds.), pp. 1041-1078, CRC Press, Sept. 2003.
[5] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error measurement to structural similarity," IEEE Trans. Image Processing, vol. 13, Jan. 2004.
[6] Z. Wang, L. Lu, and A. C. Bovik, "Video quality assessment based on structural distortion measurement," Signal Processing: Image Communication, special issue on objective video quality metrics, vol. 19, Jan. 2004.
[7] B. A. Wandell, Foundations of Vision. Sinauer Associates, Inc., 1995.
[8] O. D. Faugeras and W. K. Pratt, "Decorrelation methods of texture feature extraction," IEEE Trans. Pat. Anal. Mach. Intell., vol. 2, no. 4, pp. 323-332, 1980.
[9] A. Gagalowicz, "A new method for texture fields synthesis: Some applications to the study of human vision," IEEE Trans. Pat. Anal. Mach. Intell., vol. 3, no. 5, pp. 520-533, 1981.
[10] D. Heeger and J. Bergen, "Pyramid-based texture analysis/synthesis," in Proc. ACM SIGGRAPH, pp. 229-238, Association for Computing Machinery, Aug. 1995.
[11] J. Portilla and E. P. Simoncelli, "A parametric texture model based on joint statistics of complex wavelet coefficients," Int'l J. Computer Vision, vol. 40, pp. 49-71, Dec. 2000.
[12] P. C. Teo and D. J. Heeger, "Perceptual image distortion," in Proc. SPIE, vol. 2179, pp. 127-141, 1994.
[13] H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, "Image and video quality assessment research at LIVE," /research/quality/.
[14] Sarnoff Corporation, "JNDmetrix Technology," http:///products_services/video_vision/jndmetrix/.
[15] VQEG, "Final report from the video quality experts group on the validation of objective models of video quality assessment," Mar. 2000. /.

Fig. 3. Scatter plots of MOS versus model predictions. Each sample point represents one test image in the LIVE JPEG/JPEG2000 image database [13]. (a) PSNR; (b) Sarnoff model; (c)-(g) single-scale SSIM method for M = 1, 2, 3, 4 and 5, respectively; (h) multi-scale SSIM method.
Modeling and Statistical Inference for High-Dimensional Spatiotemporal Data

In the realm of data science, the modeling and statistical inference of high-dimensional spatiotemporal data present unique challenges and opportunities. This type of data, which encapsulates information across multiple dimensions and over time, offers a rich source of insights but also poses computational and analytical complexities. The key lies in developing effective techniques that can capture the intricate relationships and patterns inherent in these data, while also accounting for their inherent noise and uncertainty.
To address these challenges, a multifaceted approach is necessary. On the modeling front, techniques such as dimensionality reduction and sparse modeling can help identify the most relevant features and reduce the computational burden. Machine learning algorithms, especially those designed for handling high-dimensional data, can also be leveraged to capture complex relationships and patterns.
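As a concrete instance of the dimensionality-reduction idea mentioned above, the following sketch applies PCA to a toy spatiotemporal array with rows as time steps and columns as spatial locations. All sizes and the SVD-based formulation are illustrative assumptions, not taken from the text.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (time steps) onto the top-k principal
    components of the spatial dimension (columns = locations)."""
    Xc = X - X.mean(axis=0)                  # center each location's series
    # SVD of the centered data: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                   # k-dim representation per time step
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()
    return scores, explained
```

For data dominated by a few spatial modes, a small k already explains most of the variance, which is exactly the computational saving the paragraph above alludes to.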
The Diffusion UNet Model

Introduction

Diffusion UNet is a deep-learning model for image segmentation that combines UNet with the concept of a diffusion process. This article describes the principle, structure, and applications of the Diffusion UNet model in detail.
Background

Image segmentation is an important task in computer vision: it partitions an image into a number of distinct regions so that the different objects in the image can be recognized and analyzed. Traditional segmentation methods are usually based on hand-crafted features and rules, but with the development of deep learning, neural-network-based methods have made remarkable progress on segmentation tasks.

UNet is a widely used neural network model for image segmentation. Its encoder-decoder structure effectively captures features at different scales and produces fine-grained segmentation results. However, UNet still has limitations when dealing with blurred boundaries and small objects.

To address these problems, researchers proposed the Diffusion UNet model. Building on UNet, it introduces a diffusion process that propagates feature information through repeated iterations to improve segmentation accuracy.
Principle of the Diffusion UNet Model

The core idea of Diffusion UNet is to strengthen feature propagation through a diffusion process. Concretely, the feature maps are updated step by step by applying a diffusion operation multiple times. The diffusion operation can be viewed as a form of information propagation: it averages the feature information around each pixel and fuses the result with the original feature map. The benefit is that boundary information gradually propagates across the whole image, which improves the accuracy of the segmentation result.
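The neighborhood-averaging-plus-fusion step described above can be sketched as follows. This is a hypothetical minimal version: the 3x3 kernel, the number of iterations, and the fusion weight alpha are illustrative assumptions, not the model's actual settings.

```python
import numpy as np

def diffusion_step(feat, alpha=0.5):
    """One diffusion iteration: average each pixel's 3x3 neighborhood
    (edge-padded), then fuse the result with the original feature map."""
    padded = np.pad(feat, 1, mode="edge")
    h, w = feat.shape
    neighborhood_mean = sum(
        padded[i:i + h, j:j + w] for i in range(3) for j in range(3)
    ) / 9.0
    # convex combination of the original and diffused features
    return (1 - alpha) * feat + alpha * neighborhood_mean

def diffuse(feat, iterations=4, alpha=0.5):
    """Apply the diffusion operation repeatedly so that boundary
    information spreads across the feature map."""
    for _ in range(iterations):
        feat = diffusion_step(feat, alpha)
    return feat
```

Each iteration widens the region influenced by any given pixel, which is how edge information reaches more of the image.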
The overall structure of Diffusion UNet is similar to UNet, with an encoder and a decoder. The encoder extracts multi-scale features from the image; the decoder recombines these features and produces the final segmentation result. In the decoder, a diffusion operation is added alongside the usual upsampling and convolution operations to strengthen feature propagation.
Structure of the Diffusion UNet Model

The Diffusion UNet model consists of three parts: the encoder, the decoder, and the diffusion operation.

Encoder

The encoder consists of several downsampling modules, each made up of two convolutional layers followed by a max-pooling layer. These modules gradually reduce the spatial size of the feature maps while increasing the number of channels, so as to capture features at different scales.
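The two-convolutions-plus-max-pooling module just described can be sketched shape-wise in NumPy. This toy version with random weights only demonstrates how the spatial size halves while the channel count grows; a real model would use trained framework layers.

```python
import numpy as np

def conv3x3(x, weights):
    """'Same'-padded 3x3 convolution with ReLU.
    x: (H, W, C_in), weights: (3, 3, C_in, C_out)."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, weights.shape[-1]))
    for i in range(3):
        for j in range(3):
            out += padded[i:i + h, j:j + w] @ weights[i, j]
    return np.maximum(out, 0.0)

def maxpool2(x):
    """2x2 max pooling with stride 2."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def down_block(x, c_out, rng):
    """One encoder module: two 3x3 convs, then 2x2 max pooling,
    halving H and W while raising the channel count to c_out."""
    w1 = rng.normal(0, 0.1, (3, 3, x.shape[-1], c_out))
    w2 = rng.normal(0, 0.1, (3, 3, c_out, c_out))
    return maxpool2(conv3x3(conv3x3(x, w1), w2))
```

Stacking such blocks turns, say, a (64, 64, 3) input into (32, 32, 32), then (16, 16, 64), and so on: exactly the "smaller but deeper" progression described above.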
A Res-UNet Semantic Segmentation Model with CBAM for High-Resolution Remote Sensing Imagery
SUN Linghui; ZHAO Like; LI Chen; CHENG Ziyi

Journal: Geospatial Information
Year (volume), issue: 2024, 22(2)

Abstract: To address the poor recognition of fine details and the information loss of existing semantic segmentation methods on complex remote sensing imagery, a semantic segmentation network model incorporating an attention mechanism is proposed. The backbone adopts the encoder-decoder U-Net architecture; residual structures are embedded in the backbone to alleviate gradient and network degradation problems, and channel and spatial attention modules are fused in to balance fine image detail against model robustness. Validation on the ISPRS Potsdam dataset shows that the Res-UNet model with the CBAM module achieves better segmentation accuracy than traditional network models in removing "noise", smoothing object edges, keeping narrow objects continuous, and segmenting small objects.

Pages: 3 (68-70)
Authors: SUN Linghui; ZHAO Like; LI Chen; CHENG Ziyi
Affiliation: School of Information Science and Engineering, Henan University of Technology
Language: Chinese
CLC number: TP3
A Multi-Scale Convolutional Remote Sensing Scene Classification Network Driven by a Sparse Second-Order Attention Mechanism
NI Kang; ZHAO Yuqing; CHEN Zhi

Journal: Acta Photonica Sinica
Year (volume), issue: 2022, 51(6)

Abstract: CNN-based scene classification algorithms perform poorly because ground objects vary in scale and scene images have complex spatial layouts and rich texture information. To address these problems, a multi-scale convolutional neural network driven by a sparse second-order attention mechanism is proposed from the perspective of deep feature learning. First, a multi-scale convolutional layer is added after the backbone to obtain feature representations of ground objects at different scales, with group convolution embedded in the multi-scale layer to reduce computational complexity. Second, after analyzing the respective advantages of attention mechanisms based on first- and second-order statistics, a sparse second-order attention mechanism is proposed to enhance the discriminability of the channel information of the multi-scale convolutional features; its sparsity effectively reduces the feature dimensionality of the second-order statistics while preserving classification performance. Finally, the multi-scale convolutional layer and the sparse second-order attention mechanism are embedded in end-to-end network training. Experiments on the AID and NWPU45 datasets show that the proposed network improves scene classification accuracy, and comparisons of heat maps together with ablation experiments verify the effectiveness of the sparse second-order attention mechanism and of each proposed network layer.

Pages: 12 (386-397)
Authors: NI Kang; ZHAO Yuqing; CHEN Zhi
Affiliation: School of Computer Science, Nanjing University of Posts and Telecommunications; Jiangsu Key Laboratory of Big Data Security and Intelligent Processing; School of Management Engineering, Capital University of Economics and Business
Language: Chinese
CLC number: TP751
Intelligent Computer and Applications, Vol. 13, No. 8, Aug. 2023
Article number: 2095-2163(2023)08-0205-05  CLC number: TP391  Document code: A

Research on a Track Recognition Algorithm Based on the UNet Model
XU Wei, YU Haibin, YU Yinxiang, GONG Rongfen
(School of Electronics and Information Engineering, Liaoning University of Science and Technology, Anshan 114051, Liaoning, China)

Abstract: With the continuous upgrading of track recognition problems, the accuracy of track recognition needs to be greatly improved, and traditional track recognition methods are significantly affected by environmental changes. In recent years, deep learning methods have also achieved good results in track recognition. This article proposes a track recognition method based on a Keras UNet model: the track image is segmented according to the trained UNet model, and a thinning algorithm is used to extract the center line, from which the car's next movement is determined. Comparative experiments show that the new method has average recognition timeliness and high recognition accuracy.

Keywords: UNet model; track recognition; thinning algorithm; deep learning

Funding: College Students' Innovation and Entrepreneurship Training Program (S202210146037). Corresponding author: GONG Rongfen. Received: 2023-01-11.
About the authors: XU Wei (2002-), male, undergraduate, research interest: machine vision; YU Haibin (2002-), male, undergraduate, research interest: embedded systems; YU Yinxiang (2002-), male, undergraduate, research interests: embedded software and machine vision; GONG Rongfen (1979-), female, Ph.D., associate professor, research interests: pattern recognition and machine learning.

0 Introduction
In recent years, intelligent vehicles have developed rapidly, drawing on engineering control, information and communication, pattern recognition, sensing technology, electrical engineering, computer science, and other disciplines. Intelligent vehicle technology has broad application prospects in transportation and intelligent driving [1]. The core components of an intelligent vehicle are track element recognition and direction and speed control; an accurate track element recognition algorithm is the basis and precondition of direction and speed control, and recognizing the track under complex road conditions is the difficult part of the system design [2]. Compared with real cars, smart model cars have their own characteristics and uses: simple and light mechanical structures, safe low-speed driving, and vehicles that are easy to modify, so they can be used to develop the core control systems of intelligent vehicles [3]. Recently, deep learning methods have achieved good results in track recognition; they learn features automatically and therefore recognize the track more accurately. For the track recognition problem of smart cars, this paper proposes a recognition method based on a Keras UNet model: images are processed frame by frame and a set of center-line points is extracted to judge the road situation, thereby achieving accurate recognition of track elements.

1 Traditional track recognition
Track recognition means accurately identifying the track from image or video data; it is fundamental to autonomous driving, machine vision, and computer vision. Track recognition research is now fairly mature, and traditional methods include color segmentation, feature extraction, and edge detection. In recent years, deep learning methods have also achieved good results: they learn features automatically and thus recognize the track more accurately. To recognize the track more accurately and quickly, deep learning and computer vision techniques need to be combined and improved.

2 Track recognition approach
2.1 A UNet-based track recognition algorithm
The UNet-based track recognition algorithm has two main parts. The first part uses the trained network to predict a segmentation of the input image into background and track. The second part applies a thinning algorithm to the track region to obtain a set of center points, from which the car's future path can be predicted.

2.2 UNet model
UNet is a convolutional neural network model for image segmentation, proposed by Ronneberger et al. in 2015 and widely applied to medical image segmentation. Its characteristic is a symmetric U-shaped structure [4] containing a downsampling path and an upsampling path, which allows effective semantic segmentation while preserving spatial resolution. Training usually uses a cross-entropy loss, and data augmentation can be used to increase data diversity. Besides medical applications such as tumor and organ segmentation, UNet has also produced notable results in fields such as natural image segmentation and road segmentation.

2.2.1 Backbone feature extraction
The backbone feature extraction network is VGG16, which has 16 layers in total: 13 convolutional layers and 3 fully connected layers. The input first passes through two convolutions with 64 kernels followed by one pooling, then two convolutions with 128 kernels followed by pooling; the block of three convolutions with 512 kernels followed by pooling is then repeated twice, and finally three fully connected layers are applied [5].

[Fig. 1. Structure of VGG16: layer sizes from 224x224x3 down to 1x1x1000 through Conv, Maxpool, and FC stages.]

Using the convolutional and max-pooling layers extracted from VGG16, five preliminary effective feature layers f1-f5 are obtained after the convolution and pooling operations of Fig. 2.

[Fig. 2. Schematic of the effective feature layers: an input of (512, 512, 3) passes through stacked Conv2d/Maxpooling stages of 64, 128, 256, 512, and 512 filters, yielding feature maps f1-f5 of sizes (512,512,64), (256,256,128), (128,128,256), (64,64,512), and (32,32,512).]

2.2.2 Enhanced feature extraction
After the backbone, the five preliminary effective feature layers are fused in the enhanced feature extraction network [6]: feature layers are upsampled and then concatenated. For ease of construction and better generality, 2x upsampling is applied directly before each fusion, so the final feature layer has the same height and width as the input image.

2.2.3 Loss
Loss analysis examines the error loss values computed during training; it helps one understand training behavior and the direction of optimization, so that model parameters can be better adjusted. The loss used here has two parts, a cross-entropy loss and a Dice loss. The cross-entropy loss is used when classifying the pixels [7]. The Dice loss turns the semantic segmentation evaluation metric into a loss: the Dice coefficient measures the similarity of two samples, is commonly used in matching tasks, takes values in [0, 1], and is computed as

s = 2 |x ∩ y| / (|x| + |y|).   (1)

2.2.4 Prediction
Once the feature layers are obtained, the input image features can be used to predict the result, with a 1x1 convolution used for channel adjustment.

2.3 Thinning algorithm
The thinning algorithm is an image processing algorithm widely used in digital image processing. It thins the lines or contours in an image to remove noise and reduce data redundancy, and it plays an important role in applications such as medical image processing, fingerprint recognition, and face recognition. The thinning algorithm is iterative; each iteration consists of two passes over the pixels, with the conditions

2 <= N(P1) <= 6,  S(P1) = 1,  P2*P4*P6 = 0,  P4*P6*P8 = 0,   (2)
2 <= N(P1) <= 6,  S(P1) = 1,  P2*P4*P8 = 0,  P2*P6*P8 = 0,   (3)

where N(P1) is the number of foreground pixels among the 8 neighbors of P1, and S(P1) is the number of 0-to-1 transitions around P1 [7] (0 denoting background and 1 foreground). A pixel satisfying all the conditions of a pass is processed, and the two passes are iterated until the result no longer changes; the final result is the thinned skeleton.

3 Dataset
3.1 Dataset construction
UNet in effect classifies every pixel of the image, predicting per-pixel class probabilities. The training dataset uses the VOC format and has two parts: the original RGB images [height, width, 3] and the labels, which are grayscale images [height, width]. The dataset contains 239 original images with labels, as shown in Fig. 3.

[Fig. 3. Dataset images (originals and labels); thumbnails not reproduced.]

3.2 Data augmentation
To enlarge the sample size and improve the model's accuracy and generalization in complex environments, the dataset images were modified by brightness changes, sharpening, and blurring, which simulate the track under different illumination. Brightness is changed by randomly adding or subtracting a value in (-40, 40) at every pixel. Sharpening enhances edge contrast: the image is first high-pass filtered to bring out feature edges [8], which are then strengthened by a random factor in (1.1, 1.3). Blurring randomly adds Gaussian noise with mean 0 and variance 1 [9]. Examples are shown in Fig. 4.

[Fig. 4. Dataset augmentation: (a) original, (b) brightness change, (c) sharpening, (d) blurring.]

4 Experiments and results
4.1 Platform
Windows 10, CUDA v10.0, cuDNN 7.5.0, Python 3.6.4, the Keras 2.6.0 framework on TensorFlow 2.6.2, run under VS Code 2021.

4.2 Recognition results
The UNet-based track recognition algorithm was applied to every frame of the input video; one selected frame is shown in Fig. 5: (a) original, (b) semantic segmentation, (c) thinning, (d) visualization.

4.3 Timeliness
The algorithm was tested on a 30-frame video, recording the frame rate at each frame to characterize its timeliness.

[Fig. 6. Timeliness analysis: fps per frame over 30 frames, ranging roughly between 2.0 and 6.5 fps.]

4.4 Accuracy
To verify the accuracy of UNet-based track recognition, its results were compared with a traditional track recognition algorithm using color segmentation and edge detection. Both methods recognized the same track video, and their output center-point sets were compared with the correct center-point sets. For rigor, eight track images under each of three conditions (low, normal, and high illumination) were recognized, and the error of each output center-point set against the true set was computed with the edit distance:

f(i,j) = max(i,j)                                        if min(i,j) = 0,
f(i,j) = min{ f(i-1,j)+1, f(i,j-1)+1, f(i-1,j-1)+[a_i ≠ b_j] }   otherwise,   (4)

where a and b are the two sequences and f(i,j) is the edit distance between the first i elements of a and the first j elements of b.

Table 1. Comparison of recognition deviation (average deviation, in pixels):

                  Low light    Normal light    High light
Traditional       36           5.8             30.9
New method        17.7         5.4             14.6

5 Conclusion
For the track recognition problem, this paper proposed a UNet-based track recognition method: the mask image produced by UNet segmentation is used to extract the track's edge information, and a thinning algorithm then extracts the center-point set. The paper briefly described the framework and principle of the UNet model. The timeliness analysis revealed a slow initial run, while the accuracy experiments showed that, compared with traditional track recognition methods, the new method improves recognition greatly under different environments. Future work will improve the structure and algorithms to further raise the timeliness and accuracy of the new method.

References
[1] 黄俊嘉,余志贤,陈锐,等. 基于特征分类的智能汽车赛道元素识别算法[J]. 计算机产品与流通, 2018(03):127,129.
[2] 吴绪辉,潘璇峰,邓伟杰. 基于图像分割匹配的赛道元素识别算法[J]. 物联网技术, 2019,9(11):25-27.
[3] 俞洋,李峰,缪奕扬. 基于机器视觉的全元素赛道智能小车实验系统设计与应用[J]. 中南民族大学学报(自然科学版), 2022,41(06):689-696.
[4] 柴志忠. 基于深度卷积神经网络的病理影像研究[D]. 厦门:厦门大学, 2019.
[5] 王宁. 基于改进YOLO网络与极限学习机的绝缘子故障检测[D]. 大庆:东北石油大学, 2021.
[6] 韩俊文. 基于OpenCV3的焊缝轨迹识别系统[J]. 电子技术与软件工程, 2018(18):43-44.
[7] 徐奎奎. 基于深度学习的合金组织缺陷检测及应用研究[D]. 石家庄:河北经贸大学, 2022.
[8] 姜萍萍,颜国正,丁国清,等. 一种便携式工业视频内窥镜的开发[J]. 光学精密工程, 2002(04):407-410.
[9] 汪欢. 无源毫米波图像序列超分辨重建算法研究[D]. 成都:电子科技大学, 2014.
[10] 李易. 多类型车道线的检测与识别方法研究[D]. 西安:西安石油大学, 2021.
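The edit-distance comparison of Eq. (4) in Sec. 4.4 above can be sketched as a standard dynamic-programming Levenshtein distance over generic sequences (center-point sets would be passed in as sequences):

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b, following the
    recurrence of Eq. (4): f(i,j) = max(i,j) when min(i,j) = 0, else the
    minimum over deletion, insertion, and (mis)match."""
    m, n = len(a), len(b)
    # f[i][j] = distance between a[:i] and b[:j]
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        f[i][0] = i                       # min(i, 0) = 0, so f = max(i, 0) = i
    for j in range(n + 1):
        f[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            f[i][j] = min(f[i - 1][j] + 1,         # delete a[i-1]
                          f[i][j - 1] + 1,         # insert b[j-1]
                          f[i - 1][j - 1] + cost)  # substitute or match
    return f[m][n]
```

A smaller distance between the predicted and ground-truth center-point sequences corresponds to a smaller recognition deviation in Table 1.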
Swin-Unet Structure

Swin-Unet: An Innovative Image Segmentation Network Structure

Introduction: In computer vision, image segmentation is a key task whose goal is to assign every pixel in an image to a class. In recent years, the rapid development of deep learning has brought revolutionary progress to image segmentation. This article introduces an innovative segmentation network structure, Swin-Unet, which combines the strengths of the Swin Transformer and UNet and achieves excellent segmentation results.
1. Introduction
Swin-Unet extends and improves on the Swin Transformer. The Swin Transformer, proposed in 2021, is a self-attention model with a hierarchical attention mechanism that can capture both local and global context. UNet is a classic image segmentation network whose encoder-decoder structure effectively extracts image features and performs pixel-level classification. By combining the Swin Transformer with UNet, Swin-Unet achieves a significant performance improvement on image segmentation tasks.
II. The Structure of Swin-Unet

The overall structure of Swin-Unet is as follows.

1. Encoder: Swin-Unet uses a Swin Transformer as its encoder, built from multiple Swin blocks. Each Swin block contains a local-attention layer and a global-attention layer, which capture features at different scales. The local-attention layer computes self-attention within local windows, while the global-attention layer computes self-attention across windows; the two complement each other, fusing global and local contextual information.
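The window-based attention just described rests on a partitioning step: the feature map is split into non-overlapping windows, self-attention is computed within each window, and the windows are stitched back together. A minimal NumPy sketch of that partition/reverse pair (shapes and window size are illustrative; the attention computation itself and Swin's window shifting are omitted):

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.
    Returns shape (num_windows, ws*ws, C): each group of ws*ws tokens is the
    set over which self-attention would be computed independently."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

# Example: an 8x8 feature map with 4 channels split into 2x2 = 4 windows of 4x4.
feat = np.arange(8 * 8 * 4, dtype=np.float32).reshape(8, 8, 4)
wins = window_partition(feat, 4)
print(wins.shape)                                         # (4, 16, 4)
print(np.allclose(window_reverse(wins, 4, 8, 8), feat))   # True
```

Restricting attention to these windows is what keeps the cost linear in image size rather than quadratic, which is the practical motivation for the hierarchical design described above.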
2. Decoder: The decoder of Swin-Unet follows the UNet design, consisting of multiple upsampling modules and skip-connection modules. The upsampling modules restore the encoder's feature maps to the original resolution through upsampling operations and fuse them with the corresponding skip-connection features. The skip-connection modules supply contextual information from low-level features, which helps recover detail and edges.
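The upsample-and-fuse step of the decoder can be sketched as 2x upsampling followed by channel-wise concatenation with the encoder's skip feature. A hedged NumPy illustration (shapes are invented for the example, and plain nearest-neighbour upsampling stands in for the learned upsampling layers a real Swin-Unet decoder would use):

```python
import numpy as np

def upsample_and_fuse(x, skip):
    """2x nearest-neighbour upsample of x (H, W, C1), then concatenate the
    encoder skip feature (2H, 2W, C2) along channels -> (2H, 2W, C1 + C2)."""
    up = x.repeat(2, axis=0).repeat(2, axis=1)   # duplicate rows and columns
    assert up.shape[:2] == skip.shape[:2], "spatial sizes must match"
    return np.concatenate([up, skip], axis=-1)

# Example: a 4x4 decoder feature fused with its 8x8 encoder skip connection.
dec = np.ones((4, 4, 8), dtype=np.float32)
enc_skip = np.zeros((8, 8, 4), dtype=np.float32)
fused = upsample_and_fuse(dec, enc_skip)
print(fused.shape)   # (8, 8, 12)
```

The concatenation is what lets the decoder see both the semantically rich (but spatially coarse) decoder feature and the spatially precise low-level encoder feature at the same position.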
3. Loss function: Swin-Unet is trained with the cross-entropy loss, which effectively measures the discrepancy between the segmentation result and the ground-truth labels. In addition, other losses, such as the Dice loss, can be combined with it to further improve performance.
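The cross-entropy and Dice terms just mentioned are commonly combined as a weighted sum. A minimal NumPy sketch for binary segmentation (the 0.5 weighting is an illustrative choice, not a value from the text):

```python
import numpy as np

def cross_entropy(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy; pred holds probabilities in (0, 1)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient: penalises poor overlap between mask and label."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def combined_loss(pred, target, w=0.5):
    # Weighted sum of the two terms, as suggested in the text.
    return w * cross_entropy(pred, target) + (1 - w) * dice_loss(pred, target)

mask = np.array([[1.0, 0.0], [0.0, 1.0]])
# A near-perfect prediction drives both terms towards zero.
print(combined_loss(mask * 0.999 + 0.0005, mask))
```

The Dice term complements cross-entropy on class-imbalanced masks (e.g. thin track edges), because it scores region overlap directly rather than averaging per-pixel errors.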
Unifying Statistical Texture Classification Frameworks

Manik Varma and Andrew Zisserman
Dept. of Engineering Science, University of Oxford, Oxford, OX1 3PJ, UK
{manik,az}@robots.ox.ac.uk
Abstract

The objective of this paper is to examine statistical approaches to the classification of textured materials from a single image obtained under unknown viewpoint and illumination. The approaches investigated here are based on the joint probability distribution of filter responses.

We review previous work based on this formulation and make two observations. First, we show that there is a correspondence between the two common representations of filter outputs - textons and binned histograms. Second, we show that two classification methodologies, nearest neighbour matching and Bayesian classification, are equivalent for particular choices of the distance measure. We describe the pros and cons of these alternative representations and distance measures, and illustrate the discussion by classifying all the materials in the Columbia-Utrecht (CUReT) texture database.

These equivalences allow us to perform direct comparisons between the texton frequency matching framework, best exemplified by the classifiers of Leung and Malik [IJCV 2001], Cula and Dana [CVPR 2001], and Varma and Zisserman [ECCV 2002], and the Bayesian framework most closely represented by the work of Konishi and Yuille [CVPR 2000].

Keywords: Texture, Classification, PDF representation, Textons
1 Introduction

In this paper, we examine two of the most successful frameworks in which the problem of texture classification has been attempted - the texton frequency comparison framework, as exemplified by the classifiers of [3,8,14], and the Bayesian framework, most closely represented by [7]. While the frameworks appear unrelated, we draw out the similarities between them, and show that the two can be made equivalent under certain choices of representation and distance measure.

Preprint submitted to Elsevier Science, 22 November 2004

The classification problem being tackled is the following: given a single image of a textured material obtained under unknown viewing and illumination conditions, classify it into one of a set of pre-learnt classes. Classifying textures from a single image under such general conditions and without any a priori information is a very demanding task.
Fig. 1. The change in imaged appearance of the same texture (Plaster B, texture #30 in the Columbia-Utrecht database [5]) with variation in imaging conditions. Top row: constant viewing angle and varying illumination. Bottom row: constant illumination and varying viewing angle. There is a considerable difference in the appearance across images.

What makes the problem so hard is that, unlike other forms of classification where the objects being categorised have a definite structure which can be captured and represented, most textures have large stochastic variations which make them difficult to model. Furthermore, textured materials often undergo a sea change in their imaged appearance with variations in illumination and camera pose (see figure 1). Dealing with this successfully is one of the main tasks of any classification algorithm. Another factor which comes into play is that, quite often, two materials photographed under very different imaging conditions can appear quite similar, as illustrated by figure 2. It is the combination of all these factors which makes the texture classification problem so challenging.
Nevertheless, weak classification algorithms which exploit the statistical nature of textures by characterising them as distributions of filter responses have shown much promise of late. The two types of algorithm in this category that have been particularly successful are (a) the Bayesian classifier based on the joint probability distribution of filter responses represented by a binned histogram [7], and (b) the nearest neighbour χ² distribution comparison classifier based on a texton frequency representation [3,8,14]. In this paper, we draw an equivalence between these two schemes which then allows a direct comparison of the performance of the two classification methodologies.
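Scheme (b), nearest-neighbour matching under the χ² statistic, can be sketched as follows: each learnt texture model is a normalised texton-frequency histogram, and a novel image is assigned the class of the closest model. A minimal NumPy illustration (the three-bin histograms and class names are invented for the example; real texton histograms have hundreds of bins):

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two normalised histograms:
    0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nearest_neighbour_class(query, models, labels):
    """Assign the query histogram the label of the chi-square-nearest model."""
    dists = [chi_square(query, m) for m in models]
    return labels[int(np.argmin(dists))]

# Illustrative texton-frequency histograms for two learnt texture classes.
plaster = np.array([0.7, 0.2, 0.1])
fabric  = np.array([0.1, 0.3, 0.6])
query   = np.array([0.6, 0.3, 0.1])
print(nearest_neighbour_class(query, [plaster, fabric], ["plaster", "fabric"]))
# -> plaster
```

The per-bin normalisation by (h1_i + h2_i) is what distinguishes χ² from a plain Euclidean comparison: differences in sparsely populated bins are weighted more heavily.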
The success of Bayesian classification applied to filter responses was convincingly demonstrated by Konishi and Yuille [7]. Working on the Sowerby and San Francisco outdoor datasets, their aim was to classify image pixels into one of six texture classes. The joint PDF of the empirical class-conditional probability of six filter responses for each texture was learnt from training images. It was represented as a histogram by quantising the filter responses into bins. Novel image pixels were then

Fig. 2. Small interclass variations between textures present in the Columbia-Utrecht database. In the top row, the first and the fourth image are of the same texture while all the other images, even though they look similar, belong to different classes. Similarly, in the bottom row, the images appear similar and yet there are three different texture classes present.

classified by computing their filter responses and using Bayes' decision rule. However, for image regions, Konishi and Yuille only reported the Chernoff information (a measure of the asymptotic classification error rate) and did not perform any actual classification. In this paper, we take the final step and use Bayes' decision rule to classify entire images, making the same assumption as Schmid [11]: that a region is a collection of statistically independent pixels, whose probability is therefore the product of the individual pixel probabilities.
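Under this independence assumption, the log-probability of a region given a class is the sum of per-pixel log-probabilities read off that class's binned histogram, and Bayes' decision rule (with equal priors) picks the class with the highest total. A minimal NumPy sketch (the bin indices and class-conditional histograms are invented for the example):

```python
import numpy as np

def classify_region(pixel_bins, class_histograms, eps=1e-10):
    """Bayes' decision rule with statistically independent pixels:
    score(c) = sum_p log P(bin(p) | c); return the argmax class.
    pixel_bins:       array of quantised filter-response bin indices, one per pixel.
    class_histograms: dict mapping class name -> normalised histogram over bins."""
    best, best_score = None, -np.inf
    for cls, hist in class_histograms.items():
        # Product of pixel probabilities, computed as a sum of logs for stability.
        score = np.sum(np.log(hist[pixel_bins] + eps))
        if score > best_score:
            best, best_score = cls, score
    return best

# Two learnt class-conditional histograms over 4 filter-response bins.
hists = {
    "grass": np.array([0.1, 0.2, 0.3, 0.4]),
    "road":  np.array([0.4, 0.3, 0.2, 0.1]),
}
region = np.array([3, 3, 2, 3, 1])  # quantised responses of a novel region
print(classify_region(region, hists))  # -> grass
```

Summing logs rather than multiplying raw probabilities avoids numerical underflow when a region contains thousands of pixels, which is essential for classifying entire images this way.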