Qin CVPR 2013 - Query Adaptive Similarity for Large Scale Object Retrieval
Selective Two-Stage Neural Network Ensemble Based on a Stochastic Gradient Method
施彦;黄聪明;侯朝桢
【Journal】Computer Engineering (《计算机工程》)
【Year (Volume), Issue】2004, 30(16)
【Abstract】Selective neural network ensembles built with greedy search, genetic algorithms and similar methods suffer from local minima and overfitting. To address these problems, a class of selective two-stage neural network ensemble methods based on a stochastic gradient method is proposed. Theoretical analysis and experiments show that, compared with the selective ensemble methods above, the proposed method is easier to implement and clearly more effective.
【Pages】4 pages (P133-135, 159)
【Authors】施彦;黄聪明;侯朝桢
【Affiliations】School of Chemical Engineering and Environment, Beijing Institute of Technology, Beijing 100081; School of Chemical Engineering and Environment, Beijing Institute of Technology, Beijing 100081; School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081
【Language】Chinese
【CLC Number】TP301.6
【Related Literature】
1. Intelligent circuit board fault diagnosis based on a selective neural network ensemble [J], 于敏;马丽华;卢朝梁
2. Application of selective two-stage neural network ensembles to near-infrared analysis of propellants [J], 施彦;黄聪明
3. Wind turbine gearbox bearing fault diagnosis using a selective neural network ensemble based on the bee colony algorithm [J], 朱俊;刘天羽;王致杰;黄麒元;孟畅;江秀臣;盛戈皞
4. A ground-based cloud type recognition algorithm based on a selective neural network ensemble [J], 鲁高宇;李涛
5. A selective ensemble algorithm based on stochastic greedy selection [J], 江峰;张友强;杜军威;刘国柱;眭跃飞
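The two-stage idea (first train many networks, then select a subset by running stochastic gradient descent on the ensemble combination weights) can be sketched as follows. The toy data, learning rate and selection threshold are illustrative assumptions, not the paper's settings:

```python
import random

def selective_ensemble_weights(preds, targets, lr=0.05, epochs=200, seed=0):
    """Learn per-member combination weights by stochastic gradient descent
    on the validation squared error; members with large weights are kept."""
    random.seed(seed)
    m = len(preds)            # number of ensemble members
    n = len(targets)          # number of validation samples
    w = [1.0 / m] * m         # start from uniform averaging
    for _ in range(epochs):
        i = random.randrange(n)              # stochastic: one sample per step
        y_hat = sum(w[k] * preds[k][i] for k in range(m))
        err = y_hat - targets[i]
        for k in range(m):                   # gradient step on each weight
            w[k] -= lr * 2 * err * preds[k][i]
    return w

# Toy ensemble: members 0 and 1 are informative, member 2 is noise.
preds = [[0.9, 0.1, 0.8], [1.1, -0.1, 1.2], [0.0, 1.0, 0.0]]
targets = [1.0, 0.0, 1.0]
w = selective_ensemble_weights(preds, targets)
selected = [k for k, wk in enumerate(w) if wk > 0.1]
```

Members whose learned weight stays below the threshold are dropped from the ensemble, which is the "selective" part of the method.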
Collaborative Filtering Algorithm Based on Dual-Threshold Neighbor Search
李颖;李永丽;蔡观洋
【Journal】Journal of Jilin University (Information Science Edition)
【Year (Volume), Issue】2013, 031(006)
【Abstract】To improve the recommendation accuracy of collaborative filtering, this paper starts from the selection of the neighbor user/item group and proposes a collaborative filtering algorithm based on dual-threshold neighbor search. The algorithm makes full use of the sparse user-item rating matrix to find candidate users that are strongly correlated with the target user and can actually participate in the rating prediction. Experimental results show that, compared with traditional collaborative filtering and several improved variants, the algorithm raises recommendation accuracy and offers a useful reference for practical applications.
【Pages】7 pages (P647-653)
【Authors】李颖;李永丽;蔡观洋
【Affiliations】College of Computer, Jilin Normal University, Siping 136000, China; School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China
【Language】Chinese
【CLC Number】TP301.6
【Related Literature】
1. Collaborative filtering based on positively and negatively correlated nearest neighbors [J], 徐怡;唐一民;王冉
2. Optimizing K-nearest-neighbor collaborative filtering with rating features [J], 韩林峄;吴晟
3. Research on similarity computation for neighbor-based collaborative filtering [J], 王博生;何先波;朱广林;郭军平;陶卫国;李丽
4. Research on collaborative filtering based on BiasSVD and clustered user nearest neighbors [J], 李佳;张牧
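The dual-threshold neighbor search described in the abstract can be sketched roughly as follows. The concrete thresholds, the Pearson similarity and the co-rated-item requirement are illustrative assumptions; the paper's exact criteria may differ:

```python
def pearson_sim(ra, rb):
    """Pearson correlation over the items co-rated by users a and b."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    da = sum((ra[i] - ma) ** 2 for i in common) ** 0.5
    db = sum((rb[i] - mb) ** 2 for i in common) ** 0.5
    return num / (da * db) if da and db else 0.0

def candidate_neighbors(ratings, target_user, target_item,
                        sim_th=0.3, co_rated_th=2):
    """Dual threshold: keep users whose similarity to the target user
    exceeds sim_th AND who can participate in predicting target_item
    (they rated it and share at least co_rated_th items)."""
    out = []
    for u, r in ratings.items():
        if u == target_user or target_item not in r:
            continue
        if len(set(r) & set(ratings[target_user])) < co_rated_th:
            continue
        s = pearson_sim(ratings[target_user], r)
        if s > sim_th:
            out.append((u, s))
    return out

ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 5, "i2": 3, "i3": 4, "i4": 4},   # similar, rated i4
    "carol": {"i1": 1, "i2": 5, "i3": 2, "i4": 5},   # dissimilar
    "dave":  {"i4": 3},                              # too few co-rated items
}
neighbors = candidate_neighbors(ratings, "alice", "i4")
```

Only users passing both thresholds enter the prediction step, which is what keeps weakly correlated or non-participating users out of the neighbor group.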
Software Guide (软件导刊), Vol. 22 No. 8, Aug. 2023
Three-stage Network Image Inpainting Based on an Efficient Attention Module
周遵富 1,2, 张乾 1,2, 李伟 1,2, 李筱玉 1,2
(1. School of Data Science and Information Engineering, Guizhou Minzu University; 2. Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou, Guiyang 550025, China)
Abstract: Under large-scale missing regions or high resolution, existing face image inpainting methods synthesize images with incoherent texture structure and inconsistent contextual semantics, and cannot accurately reconstruct clear image structures such as eyes and eyebrows. To solve this problem, a three-stage network (RLGNet) face image inpainting method based on a Normalization-based Attention Module (NAM) is proposed, which accelerates model convergence and reduces computational cost. A coarse inpainting network first performs an initial repair of the damaged image; a local inpainting network then refines local regions in detail; finally, a NAM-based global refinement network repairs the image as a whole to improve semantic coherence and the consistency of the texture structure. Experiments on the CelebA-HQ dataset show that, at a mask ratio of 20%~30%, the method reaches a PSNR of 30.35 dB, an SSIM of 0.926 9 and an FID of 2.55, and synthesizes face images with natural transitions between adjacent pixels and plausible texture structure.
Key words: face image inpainting; normalization-based attention module; three-stage inpainting network; activation function
DOI: 10.11907/rjdk.231474  CLC Number: TP391.41  Document code: A  Article ID: 1672-7800(2023)008-0196-07
0 Introduction
Image inpainting refers to completing missing regions of varying proportions in an image.
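A minimal sketch of the normalization-based attention idea referenced above: channels whose batch-norm scale factor is larger are treated as more informative and receive proportionally larger attention weights. The shapes, the sigmoid gating and the toy values are assumptions for illustration, not the paper's implementation:

```python
import math

def nam_channel_attention(x, gamma):
    """Normalization-based attention (sketch): derive per-channel weights
    from the magnitudes of batch-norm scale factors gamma, then gate each
    activation with a sigmoid of its weighted value."""
    total = sum(abs(g) for g in gamma)
    weights = [abs(g) / total for g in gamma]          # per-channel importance
    out = []
    for c, channel in enumerate(x):                    # x: [C][H*W] features
        w = weights[c]
        out.append([v * (1 / (1 + math.exp(-w * v))) for v in channel])
    return out, weights

# Two channels with identical activations; the second has a larger BN scale.
x = [[1.0, 2.0], [1.0, 2.0]]
out, w = nam_channel_attention(x, gamma=[0.5, 1.5])
```

Because the weights come from parameters the network already trains (the batch-norm scales), this kind of attention adds almost no extra computation, which matches the stated goal of reducing cost.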
Feature Selection Based on Adaptive Graph Learning and Neighborhood Embedding
As data grow in scale and dimensionality, feature selection becomes increasingly important in machine learning and data mining.
Feature selection improves the efficiency and accuracy of learning algorithms and makes model results easier to interpret.
However, traditional feature selection methods often cannot handle high-dimensional, complex data, so new methods are needed.
In this paper, we propose a feature selection method based on adaptive graph learning and neighborhood embedding.
We first introduce adaptive graph learning.
Adaptive graph learning is an unsupervised method that captures the similarity between samples by building a graph structure in the data space.
It handles nonlinear and non-Gaussian data effectively and is therefore promising for feature selection.
We then introduce neighborhood embedding.
Neighborhood embedding is an unsupervised dimensionality reduction method that maps high-dimensional data into a low-dimensional space while preserving the neighborhood relations between samples.
It helps reveal the latent structure of the data and the correlations between features.
Building on these two techniques, we propose a feature selection method.
First, adaptive graph learning constructs a data graph in which each sample is a node and edge weights are determined by similarity.
Then, neighborhood embedding maps the data into a low-dimensional space, where the importance of each feature is computed.
By comparing feature importance before and after the mapping, we determine which features matter more for the discriminative structure of the data.
Finally, we perform feature selection according to these importance scores, keeping the features with the highest importance.
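The pipeline above (build a similarity graph over samples, then score each feature by how consistent it is with the graph structure) can be sketched as follows. The kNN graph stands in for adaptive graph learning, and a Laplacian-score-style criterion stands in for the embedding-based importance; both are simplifying assumptions:

```python
def knn_graph(X, k=2):
    """Adaptive-graph stand-in: connect each sample to its k nearest
    neighbours, weighting edges by inverse squared distance."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = sorted(((sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
                    for j in range(n) if j != i))
        for dist, j in d[:k]:
            w = 1.0 / (1.0 + dist)
            W[i][j] = W[j][i] = w
    return W

def feature_scores(X, W):
    """Graph-based importance: a feature is good when strongly connected
    samples (large W_ij) take similar values, relative to the feature's
    overall variance. Lower score = more important."""
    n, m = len(X), len(X[0])
    scores = []
    for f in range(m):
        col = [row[f] for row in X]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) or 1e-12
        smooth = sum(W[i][j] * (col[i] - col[j]) ** 2
                     for i in range(n) for j in range(n))
        scores.append(smooth / var)
    return scores

# Feature 0 follows the two sample clusters; feature 1 is random noise.
X = [[0.0, 0.9], [0.1, 0.1], [1.0, 0.8], [1.1, 0.2]]
W = knn_graph(X)
scores = feature_scores(X, W)
selected = min(range(2), key=lambda f: scores[f])
```

The cluster-aligned feature scores lower (smoother over the graph) and is therefore selected, which is the intuition behind graph-based feature selection.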
To validate the proposed method, we ran experiments on several public datasets.
The results show that our method achieves high accuracy and stability in feature selection.
Compared with traditional feature selection methods, it better captures the latent structure of the data and the correlations between features, and yields better feature subsets.
In addition, its computational complexity is low, so feature selection completes in a short time.
In summary, this paper proposed a feature selection method based on adaptive graph learning and neighborhood embedding.
By combining the two techniques, the method handles high-dimensional, complex data better and produces better feature subsets.
In future work, we will further improve the method and apply it to a wider range of domains.
Beijing Institute of Technology Research Achievement: A Real-Time Visual Tracking System Based on Adaptive Sampling
Overview
Visual tracking is a prominent research topic in the field of computer vision.
It is widely applied in intelligent transportation, human-computer interaction, video surveillance, military applications and other areas.
In practical applications, visual tracking must cope with complex target motion such as abrupt motion changes and drastic changes in geometric appearance, which traditional tracking techniques cannot handle.
We propose a method based on adaptive MCMC sampling that solves this difficulty well and supports the tracking of targets with complex motion in real applications.
Project origin: independently developed.
Technical field: computer applications, artificial intelligence and pattern recognition.
Scope of application: real-time tracking of people, vehicles and other targets in video surveillance of traffic, squares, stations, venues and similar settings.
Current status: the system's tracking accuracy exceeds 95%; on a current commodity personal computer it tracks in real time at 30 frames per second, and embedding the tracking algorithm in dedicated hardware enables real-time tracking of targets with complex motion.
Stage: the tracking algorithm has been trialled in several application systems with good tracking results.
Intellectual property: independently owned.
Transfer mode: technical cooperation.
Market and benefit analysis: video surveillance infrastructure is already deployed in all kinds of public places, and the national "Skynet" program implemented during the 12th Five-Year Plan will further expand it, so this project requires little additional infrastructure investment.
Given its broad application prospects, the market benefits are considerable.
Figure: pedestrian tracking across a shot/camera switch, where traditional tracking methods fail.
AI-Based Adaptive Anti-Jamming Optimization Algorithm for Multimode Radar
许诚;程强;赵鹏;程玮清
【Journal】Modern Electronics Technique (《现代电子技术》)
【Year (Volume), Issue】2024, 47(7)
【Abstract】Multimode radar systems are susceptible to environmental interference such as weather conditions and electromagnetic jamming, which can degrade the accuracy and stability of multimode radar data.
Since anti-jamming performance determines a radar's measurement accuracy, an AI-based adaptive anti-jamming optimization algorithm for multimode radar is proposed to improve this capability.
Starting from the multimode radar signal model, the algorithm analyzes the principles of range-velocity synchronized deception jamming and smeared-spectrum jamming, and computes the total echo signal received by the radar under deception jamming.
The computed echo signal is then fed into a YOLOv5s deep learning model; through training and mapping, adaptive anti-jamming optimization is completed and deceptive jamming of the radar is suppressed.
Test results show that the algorithm achieves a jamming cancellation ratio above 0.935 and a jamming output power below 0.017, reliably suppressing both multiple and single jamming sources and realizing adaptive anti-jamming optimization for multimode radar.
【Pages】4 pages (P73-76)
【Authors】许诚;程强;赵鹏;程玮清
【Affiliations】Air Force Early Warning Academy
【Language】Chinese
【CLC Number】TN95-34; TN911.1; TP391
【Related Literature】
1. Anti-jamming technology based on adaptive beamforming for phased-array radar
2. AA-based adaptive anti-jamming method for multichannel radar
3. A multimodal multi-objective optimization algorithm based on adaptive search
4. Pipeline leak detection and localization based on whale-algorithm-optimized variational mode decomposition and improved adaptive weighted fusion
Research on Adaptive Filtering Algorithms Based on Deep Reinforcement Learning
1. Introduction
Adaptive filtering designs a filter suited to the current signal according to the signal's statistical characteristics.
The technique is used for signal denoising, signal feature extraction, signal recovery and related tasks.
Adaptive filtering algorithms based on deep reinforcement learning have recently attracted wide attention and are applied in audio processing, image processing, control systems and other areas.
This article surveys the state of research on such algorithms and their directions of development.
2. Principles and classification of adaptive filtering
Adaptive filtering adjusts the filter response according to the properties of the input signal.
Its basic principle is to use statistics of the input signal, such as peaks, means and variances, to adapt the filter's response characteristics so that it better matches the current input.
Common adaptive filtering algorithms include the least mean squares algorithm (LMS), the normalized LMS algorithm (NLMS) and the recursive least squares algorithm (RLS).
By filter structure, adaptive filtering divides into linear and nonlinear adaptive filtering.
A linear adaptive filter uses a linear filter structure: the output is the convolution of the input signal with the filter coefficients.
A nonlinear adaptive filter is not restricted to a linear structure and can use any filter design required, such as fuzzy filters or wavelet filters.
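The LMS algorithm mentioned above can be sketched in a few lines: the output is the convolution of the input with the current coefficients, and the coefficients follow a stochastic-gradient update toward the desired signal. The step size, tap count and toy system below are illustrative:

```python
import random

def lms_filter(x, d, taps=4, mu=0.05):
    """Least-mean-squares adaptive filter: update coefficients w by
    stochastic gradient descent on the instantaneous squared error."""
    w = [0.0] * taps
    y_out, e_out = [], []
    for n in range(len(x)):
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))        # filter output
        e = d[n] - y                                    # instantaneous error
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]  # LMS update
        y_out.append(y)
        e_out.append(e)
    return w, y_out, e_out

# Identify an unknown FIR system h = [0.5, -0.3] from input/output data.
random.seed(1)
x = [random.uniform(-1, 1) for _ in range(2000)]
d = [0.5 * x[n] + (-0.3 * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
w, _, e = lms_filter(x, d, taps=2, mu=0.1)
```

After convergence the coefficients approximate the unknown system and the error tends to zero, which is the behaviour NLMS and RLS refine with normalized or recursive updates.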
3. Deep reinforcement learning and its application to adaptive filtering
Deep reinforcement learning is an adaptive learning method that combines deep learning with reinforcement learning.
In deep reinforcement learning, an agent interacts with its environment and learns how to maximize the expected long-term return on a given task.
Deep reinforcement learning is widely used in speech recognition, image processing, game AI, intelligent robotics and other areas.
Its application to adaptive filtering mainly builds on convolutional neural network (CNN) and recurrent neural network (RNN) architectures.
Such networks use unsupervised learning to learn the filter's response characteristics and coefficients autonomously from large amounts of data.
Because they adaptively extract the features of a signal, they can remove noise more accurately and thus improve filtering performance.
In practice, deep reinforcement learning has been widely applied to image denoising, speech denoising, control systems and related areas.
One advantage of deep reinforcement learning is that it can replace traditional adaptive algorithms: a traditional adaptive filter must compute an estimated signal at every time step, whereas a learned filter operates directly on the input signal, skipping the estimation step and greatly increasing the filter's throughput.
自适应分割的视频点云多模式帧间编码方法陈 建 1, 2廖燕俊 1王 适 2郑明魁 1, 2苏立超3摘 要 基于视频的点云压缩(Video based point cloud compression, V-PCC)为压缩动态点云提供了高效的解决方案, 但V-PCC 从三维到二维的投影使得三维帧间运动的相关性被破坏, 降低了帧间编码性能. 针对这一问题, 提出一种基于V-PCC 改进的自适应分割的视频点云多模式帧间编码方法, 并依此设计了一种新型动态点云帧间编码框架. 首先, 为实现更精准的块预测, 提出区域自适应分割的块匹配方法以寻找最佳匹配块; 其次, 为进一步提高帧间编码性能, 提出基于联合属性率失真优化(Rate distortion optimization, RDO)的多模式帧间编码方法, 以更好地提高预测精度和降低码率消耗. 实验结果表明, 提出的改进算法相较于V-PCC 实现了−22.57%的BD-BR (Bjontegaard delta bit rate)增益. 该算法特别适用于视频监控和视频会议等帧间变化不大的动态点云场景.关键词 点云压缩, 基于视频的点云压缩, 三维帧间编码, 点云分割, 率失真优化引用格式 陈建, 廖燕俊, 王适, 郑明魁, 苏立超. 自适应分割的视频点云多模式帧间编码方法. 自动化学报, 2023, 49(8):1707−1722DOI 10.16383/j.aas.c220549An Adaptive Segmentation Based Multi-mode Inter-frameCoding Method for Video Point CloudCHEN Jian 1, 2 LIAO Yan-Jun 1 WANG Kuo 2 ZHENG Ming-Kui 1, 2 SU Li-Chao 3Abstract Video based point cloud compression (V-PCC) provides an efficient solution for compressing dynamic point clouds, but the projection of V-PCC from 3D to 2D destroys the correlation of 3D inter-frame motion and re-duces the performance of inter-frame coding. To solve this problem, we proposes an adaptive segmentation based multi-mode inter-frame coding method for video point cloud to improve V-PCC, and designs a new dynamic point cloud inter-frame encoding framework. Firstly, in order to achieve more accurate block prediction, a block match-ing method based on adaptive regional segmentation is proposed to find the best matching block; Secondly, in or-der to further improve the performance of inter coding, a multi-mode inter-frame coding method based on joint at-tribute rate distortion optimization (RDO) is proposed to increase the prediction accuracy and reduce the bit rate consumption. Experimental results show that the improved algorithm proposed in this paper achieves −22.57%Bjontegaard delta bit rate (BD-BR) gain compared with V-PCC. 
The algorithm is especially suitable for dynamic point cloud scenes with little change between frames, such as video surveillance and video conference.Key words Point cloud compression, video-based point cloud compresion (V-PCC), 3D inter-frame coding, point cloud segmentation, rate distortion optimization (RDO)Citation Chen Jian, Liao Yan-Jun, Wang Kuo, Zheng Ming-Kui, Su Li-Chao. An adaptive segmentation based multi-mode inter-frame coding method for video point cloud. Acta Automatica Sinica , 2023, 49(8): 1707−1722点云由三维空间中一组具有几何和属性信息的点集构成, 通常依据点的疏密可划分为稀疏点云和密集点云[1]. 通过相机矩阵或高精度激光雷达采集的密集点云结合VR 头盔可在三维空间将对象或场景进行6自由度场景还原, 相较于全景视频拥有更真实的视觉体验, 在虚拟现实、增强现实和三维物体捕获领域被广泛应用[2−3]. 通过激光雷达反射光束经光电处理后收集得到的稀疏点云可生成环境地收稿日期 2022-07-05 录用日期 2022-11-29Manuscript received July 5, 2022; accepted November 29, 2022国家自然科学基金(62001117, 61902071), 福建省自然科学基金(2020J01466), 中国福建光电信息科学与技术创新实验室(闽都创新实验室) (2021ZR151), 超低延时视频编码芯片及其产业化(2020年福建省教育厅产学研专项)资助Supported by National Natural Science Foundation of China (62001117, 61902071), Fujian Natural Science Foundation (2020J01466), Fujian Science & Technology Innovation Laborat-ory for Optoelectronic Information of China (2021ZR151), and Ultra-low Latency Video Coding Chip and its Industrialization (2020 Special Project of Fujian Provincial Education Depart-ment for Industry-University Research)本文责任编委 刘成林Recommended by Associate Editor LIU Cheng-Lin1. 福州大学先进制造学院 泉州 3622512. 福州大学物理与信息工程学院 福州 3501163. 福州大学计算机与大数据学院/软件学院 福州 3501161. School of Advanced Manufacturing, Fuzhou University, Quan-zhou 3622512. College of Physics and Information Engineer-ing, Fuzhou University, Fuzhou 3501163. College of Com-puter and Data Science/College of Software, Fuzhou University,Fuzhou 350116第 49 卷 第 8 期自 动 化 学 报Vol. 49, No. 82023 年 8 月ACTA AUTOMATICA SINICAAugust, 2023图, 以实现空间定位与目标检测等功能, 业已应用于自动驾驶、无人机以及智能机器人等场景[4−7]. 
但相较于二维图像, 点云在存储与传输中的比特消耗显著增加[8], 以经典的8i 动态点云数据集[9]为例, 在每秒30帧时的传输码率高达180 MB/s, 因此动态点云压缩是对点云进行高效传输和处理的前提.N ×N ×N 3×3×3为了实现高效的动态点云压缩, 近年来, 一些工作首先在三维上进行帧间运动估计与补偿, 以充分利用不同帧之间的时间相关性. 其中, Kammerl 等[10]首先提出通过构建八叉树对相邻帧进行帧间差异编码, 实现了相较于八叉树帧内编码方法的性能提升; Thanou 等[11]则提出将点云帧经过八叉树划分后, 利用谱图小波变换将三维上的帧间运动估计转换为连续图之间的特征匹配问题. 然而, 上述方法对帧间像素的运动矢量估计不够准确. 为了实现更精确的运动矢量估计, Queiroz 等[12]提出一种基于运动补偿的动态点云编码器, 将点云体素化后进行块划分, 依据块相关性确定帧内与帧间编码模式, 对帧间编码块使用提出的平移运动模型改善预测误差; Mekuria 等[13]则提出将点云均匀分割为 的块, 之后将帧间对应块使用迭代最近点(Iterative closest point, ICP)[14]进行运动估计,以进一步提高帧间预测精度; Santos 等[15]提出使用类似于2D 视频编码器的N 步搜索算法(N-step search, NSS), 在 的三维块区域中迭代寻找帧间对应块, 而后通过配准实现帧间编码. 然而,上述方法实现的块分割破坏了块间运动相关性, 帧间压缩性能没有显著提升.为了进一步提高动态点云压缩性能, 一些工作通过将三维点云投影到二维平面后组成二维视频序列, 而后利用二维视频编码器中成熟的运动预测与补偿算法, 实现三维点云帧间预测. 其中, Lasserre 等[16]提出基于八叉树的方法将三维点云投影至二维平面, 之后用二维视频编码器进行帧间编码; Bud-agavi 等[17]则通过对三维上的点进行二维平面上的排序, 组成二维视频序列后利用高效视频编码器(High efficiency video coding, HEVC)进行编码.上述方法在三维到二维投影的过程中破坏了三维点间联系, 重构质量并不理想. 为改善投影后的点间联系, Schwarz 等[18]通过法线将点映射于圆柱体上确保点间联系, 对圆柱面展开图使用二维视频编码以提高性能. 但在圆柱上的投影使得部分点因遮挡丢失, 影响重构精度. 为尽可能保留投影点数, Mam-mou 等[19]根据点云法线方向与点间距离的位置关系, 将点云划分为若干Patch, 通过对Patch 进行二维平面的排列以减少点数损失, 进一步提高了重构质量.基于Patch 投影后使用2D 视频编码器进行编码, 以实现二维上的帧间运动预测与补偿的思路取得了最优的性能, 被运动图像专家组(Moving pic-ture experts group, MPEG)正在进行的基于视频的点云压缩(Video-based point cloud compres-sion, V-PCC)标准[20]所采纳, 但将Patch 从三维到二维的投影导致三维运动信息无法被有效利用, 使得帧间压缩性能提升受到限制. 针对这一问题, 一些工作尝试在V-PCC 基础上实现三维帧间预测,其中, Li 等[21]提出了一种三维到二维的运动模型,利用V-PCC 中的几何与辅助信息推导二维运动矢量以实现帧间压缩性能改善, 但通过二维推导得到的三维运动信息并不完整, 导致运动估计不够准确.Kim 等[22]提出通过点云帧间差值确定帧内帧与预测帧, 帧内帧用V-PCC 进行帧内编码, 预测帧依据前帧点云进行运动估计后对残差进行编码以实现运动补偿, 但残差编码依旧消耗大量比特. 上述方法均在V-PCC 基础上实现了三维点云的帧间预测,但无论是基于二维的三维运动推导还是帧间残差的编码, 性能改善都比较有限.在本文的工作中, 首先, 为了改善三维上实现运动估计与补偿中, 块分割可能导致的运动相关性被破坏的问题, 本文引入了KD 树(K-dimension tree,KD Tree)思想, 通过迭代进行逐层深入的匹配块分割, 并定义分割块匹配度函数以自适应确定分割的迭代截止深度, 进而实现了更精准的运动块搜索;另外, 针对V-PCC 中二维投影导致三维运动信息无法被有效利用的问题, 本文提出在三维上通过匹配块的几何与颜色两种属性进行相似性判别, 并设计率失真优化(Rate distortion optimization, RDO)模型对匹配块分类后进行多模式的帧间编码, 实现了帧间预测性能的进一步改善. 
实验表明, 本文提出的自适应分割的视频点云多模式帧间编码方法在与最新的V-PCC 测试软件和相关文献的方法对比中均取得了BD-BR (Bjontegaard delta bit rate)的负增益. 本文的主要贡献如下:1)提出了针对动态点云的新型三维帧间编码框架, 通过自动编码模式判定、区域自适应分割、联合属性率失真优化的多模式帧间编码、结合V-PCC 实现了帧间编码性能的提升;2)提出了一种区域自适应分割的块匹配方法,以寻找帧间预测的最佳匹配块, 从而改善了均匀分割和传统分割算法导致运动相关性被破坏的问题;3)提出了一种基于联合属性率失真优化模型的多模式帧间编码方法, 在改善预测精度的同时显著减少了帧间编码比特.1 基于视频的点云压缩及其问题分析本文所提出的算法主要在V-PCC 基础上进行1708自 动 化 学 报49 卷三维帧间预测改进, 因此本节对V-PCC 的主要技术做简要介绍, 并分析其不足之处. 其中, V-PCC 编码框架如图1所示.图 1 V-PCC 编码器框架Fig. 1 V-PCC encoder diagram首先, V-PCC 计算3D 点云中每个点的法线以确定最适合的投影面, 进而将点云分割为多个Patch [23].接着, 依据对应Patch 的位置信息, 将其在二维平面上进行紧凑排列以完成对Patch 的打包. 之后,依据打包结果在二维上生成对应的图像, 并使用了几何图、属性图和占用图分别表示各点的坐标、颜色及占用信息. 鉴于Patch 在二维的排列不可避免地存在空像素点, 因此需要占用图表示像素点的占用与否[24]; 由于三维到二维的投影会丢失一个维度坐标信息, 因此使用几何图将该信息用深度形式进行表示; 为了实现动态点云的可视化, 还需要一个属性图用于表示投影点的颜色属性信息. 最后, 为了提高视频编码器的压缩性能, 对属性图和几何图的空像素进行了填充和平滑处理以减少高频分量; 同时, 为了缓解重构点云在Patch 边界可能存在的重叠或伪影, 对重构点云进行几何和属性上的平滑滤波处理[25]. 通过上述步骤得到二维视频序列后, 引入二维视频编码器(如HEVC)对视频序列进行编码.V-PCC 将动态点云帧进行二维投影后, 利用成熟的二维视频编码技术实现了动态点云压缩性能的提升. 但是, V-PCC 投影过程将连续的三维物体分割为多个二维子块, 丢失了三维上的运动信息,使得三维动态点云中存在的时间冗余无法被有效利用. 为了直观展示投影过程导致的运动信息丢失,图2以Longdress 数据集为例, 展示了第1 053和第1 054两相邻帧使用V-PCC 投影得到的属性图.观察图2可以发现, 部分在三维上高度相似的区域,如图中标记位置1、2与3所对应Patch, 经二维投影后呈现出完全不同的分布, 该结果使得二维视频编码器中帧间预测效果受到限制, 不利于压缩性能的进一步提升.2 改进的动态点云三维帧间编码为了在V-PCC 基础上进一步降低动态点云的时间冗余性, 在三维上进行帧间预测和补偿以最小化帧间误差, 本文提出了一个在V-PCC 基础上改进的针对动态点云的三维帧间编码框架, 如图3所示. 下面对该框架基本流程进行介绍.首先, 在编码端, 我们将输入的点云序列通过模块(a)进行编码模式判定, 以划分帧内帧与预测帧. 其思想与二维视频编码器类似, 将动态点云划分为多组具有运动相似性的图像组(Group of pic-tures, GOP)以分别进行编码. 其中图像组中的第一帧为帧内帧, 后续帧均为预测帧, 帧内帧直接通过V-PCC 进行帧内编码; 预测帧则通过帧间预测方式进行编码. 合理的GOP 划分表明当前图像组内各相邻帧均具有较高运动相关性, 因此可最优化匹配块预测效果以减少直接编码比特消耗, 进而提高整体帧间编码性能. 受文献[22]启发, 本文通过对当前帧与上一帧参考点云进行几何相似度判定,以确定当前帧的编码方式进行灵活的图像组划分.如式(1)所示.Longdress 第 1 053 帧三维示例Longdress 第 1 054 帧三维示例Longdress 第 1 053 帧 V-PCC投影属性图Longdress 第 1 054 帧 V-PCC投影属性图11223图 2 V-PCC 从三维到二维投影(属性图)Fig. 2 V-PCC projection from 3D to2D (Attribute map)8 期陈建等: 自适应分割的视频点云多模式帧间编码方法1709cur ref E Gcur,ref Ωmode mode E O R 其中, 为当前帧点云, 为前帧参考点云, 表示两相邻帧点云的几何偏差, 为编码模式判定阈值. 
当 值为1时表示当前帧差异较大, 应当进行帧内模式编码; 当 值为0时则表示两帧具有较大相似性, 应当进行帧间模式编码. 另外, 在动态点云重构误差 的计算中, 使用原始点云 中各点与重构点云 在几何和属性上的误差均值表示, 即式(2)所示.N O O (i )R (i ′)i i ′E O,R O R 其中, 为原始点云点数, 和 分别表示原始点云第 点与对应重构点云 点的几何或属性值, 即为原始点云 与重构点云 间误差值.N ×N ×N K 接着, 在进行帧间编码模式判断后, 通过模块(b)进行预测帧的区域自适应块分割. 块分割的目的在于寻找具有帧间运动一致性的匹配块以进行运动预测和补偿. 不同于 等分或 均值聚类, 所提出的基于KD 树思想的区域自适应块匹配从点云质心、包围盒和点数三个角度, 判断分割块的帧间运动程度以进行分割深度的自适应判定,最终实现最佳匹配块搜索.之后, 对于分割得到的匹配块, 通过模块(c)进行基于联合属性率失真优化的帧间预测. 在该模块中, 我们通过帧间块的几何与颜色属性联合差异度,结合率失真优化模型对匹配块进行分类, 分为几乎无差异的完全近似块(Absolute similar block, ASB)、差异较少的相对近似块(Relative similar block,RSB)以及存在较大差异的非近似块(Non similar block, NSB). 完全近似块认为帧间误差可忽略不计, 仅需记录参考块的位置信息; 而相对近似块则表示存在一定帧间误差, 但可通过ICP 配准和属性补偿来改善几何与属性预测误差, 因此除了块位置信息, 还需记录预测与补偿信息; 而对于非近似块,则认为无法实现有效的帧间预测, 因此通过融合后使用帧内编码器进行编码.最后, 在完成帧间模式分类后, 为了在编码端进行当前帧的重构以作为下一帧匹配块搜索的参考帧, 通过模块(d)对相对近似块进行几何预测与属性补偿, 而后将几何预测与属性补偿后的相对近似块、完全近似块、非近似块进行融合得到重构帧. 为了在解码端实现帧间重构, 首先需要组合预测帧中的所有非近似块, 经由模块(e)的V-PCC 编码器进行帧内编码, 并且, 还需要对完全近似块的位置信息、相对近似块的位置与预测补偿信息通过模块(f)进行熵编码以实现完整的帧间编码流程.至此, 整体框架流程介绍完毕, 在接下来的第3节与第4节中, 我们将对本文提出的区域自适应分割的块匹配算法与联合属性率失真优化的多模式帧间编码方法进行更为详细的介绍, 并在第5节通过实验分析进行算法性能测试.3 区域自适应分割的块匹配N B j cur j ref j ∆E cur j ,ref j 相较于二维视频序列, 动态点云存在大量空像素区域, 帧间点数也往往不同. 因此, 对一定区域内的点集进行帧间运动估计时, 如何准确找到匹配的邻帧点集是一个难点. 假设对当前帧进行帧间预测时共分割为 个子点云块, 第 块子点云 与其对应参考帧匹配块 间存在几何与属性综合误差 . 由于重构的预测帧实质上是通过组合相应的参考帧匹配块而估计得到的, 因此精准的帧间块匹配尝试最小化每个分割块的估计误差,以提高预测帧整体预测精度, 如式(3)所示:图 3 改进的三维帧间编码框架Fig. 3 Improved 3D inter-frame coding framework1710自 动 化 学 报49 卷K K N N ×N ×N 为了充分利用帧间相关性以降低时间冗余, 一些工作尝试对点云进行分割后寻找最佳匹配块以实现帧间预测. Mekuria 等[13]将动态点云划分为若干个大小相同的宏块, 依据帧间块点数和颜色进行相似性判断, 对相似块使用迭代最近点算法计算刚性变换矩阵以实现帧间预测. 然而, 当区域分割得到的对应匹配块间存在较大偏差时, 预测效果不佳.为了减少匹配块误差以提高预测精度, Xu 等[26]提出使用 均值聚类将点云分为多个簇, 在几何上通过ICP 实现运动预测, 在属性上则使用基于图傅里叶变换的模型进行运动矢量估计. 但基于 均值聚类的点云簇分割仅在预测帧中进行, 没有考虑帧间块运动相关性, 匹配精度提升受到限制. 为了进一步提高匹配精度, Santos 等[15]受到二维视频编码器中 步搜索算法的启发, 提出了一种3D-NSS 方法实现三维上的匹配块搜索, 将点云分割为 的宏块后进行3D-NSS 以搜索最优匹配块, 而后通过ICP 进行帧间预测.K 上述分割方法均实现了有效的块匹配, 但是,基于宏块的均匀块分割与基于传统 均值聚类的块划分均没有考虑分割块间可能存在的运动连续性, 在分割上不够灵活. 
具体表现为分割块过大无法保证块间匹配性, 过小又往往导致已经具有运动连续性的预测块被过度细化, 出现相同运动预测信息的冗余编码. 为了避免上述问题, 本文引入KD 树思想, 提出了一种区域自适应分割算法, 该算法通过迭代进行逐层深入的二分类划分, 对各分割深度下块的运动性质与匹配程度进行分析, 确定是否需要继续分割以实现精准运动块匹配. 算法基本思想如图4所示, 若满足分割条件则继续进行二分类划分, 否则停止分割.Ψ(l,n )其中, 准确判断当前分割区域是否满足运动连续性条件下的帧间运动, 是避免过度分割以实现精准的运动块搜索的关键, 本文通过定义分割块匹配函数来确定截止深度, 如式(4)所示:ρ(n )=max [sign (n −N D ),0]n N D ρ(n )=1ξ(l )l 其中, 为点数判定函数,当点数 大于最小分割块点数阈值 时, ,表示满足深入分割的最小点数要求, 否则强制截止; 为当前深度 下的块运动偏移度, 通过衡量匹配块间的运动变化分析是否需要进一步分割.ξξw ξu 提出的 函数分别通过帧间质心偏移度 估计匹配块间运动幅度, 帧间包围盒偏移度 进行匹ξn ξw ξu ξn T l ξ(l )配块间几何运动一致性判定, 点数偏移度 进行点云分布密度验证, 最后通过 、 与 累加值与分割截止阈值 的比值来整体衡量当前块的运动程度与一致性. 即对于当前分割深度 , 可进一步细化为式(5):其中,w cur w ref u cur u ref n cur n ref l P Max P Min 并且, 、 、 、 、与分别表示当前分割深度下该区域与其前帧对应区域的质心、包围盒与点数,和分别为当前块对角线对应点.ρ(n )=1ξ(l)lξξξξ在的前提下,值反映当前KD 树分割深度下该区域点云的帧间运动情况.值越大帧间运动越显著,当值大于1时,需对运动块进行帧间运动补偿,如果继续分割将导致块的运动一致性被破坏或帧间对应块无法实现有效匹配,从而导致帧间预测失败;值越小说明当前区域点云整体运动变化越小,当值小于1时,需进一步分割寻找可能存在的运动区域.l +1d 对于需要进一步分割的点云块,为了尽可能均匀分割以避免分割后匹配块间误差过大, 将待分割匹配块质心均值作为分割点, 通过以包围盒最长边作为分割面来确定 深度下的分割轴 , 分割轴l = 0l = 1l = 2l = m l = m + 1条件满足, 继续分割条件不满足, 停止分割图 4 区域自适应分割块匹配方法示意图Fig. 4 Schematic diagram of region adaptive segmentation based block matching method8 期陈建等: 自适应分割的视频点云多模式帧间编码方法1711如式(6)所示:Edge d,max Edge d,min d 其中, 和 分别为待分割块在 维度的最大值和最小值.总结上文所述, 我们将提出的区域自适应分割的块匹配算法归纳为算法1. 算法 1. 区域自适应分割的块匹配cur ref 输入. 当前帧点云 与前帧参考点云 输出. 当前帧与参考帧对应匹配块j =1N B 1) For to Do l =02) 初始化分割深度 ;3) Docur j ref j 4) 选取待分割块 和对应待匹配块 ;w u n 5) 计算质心 、包围盒 与块点数 ;ξ(l )6) 根据式(5)计算运动块偏移度 ;ρ(n )7) 根据函数 判定当前分割块点数;Ψ(l,n )8) 根据式(4)计算分割块匹配函数 ;Ψ(l,n )9) If 满足匹配块分割条件:d 10) 根据式(6)确定分割轴 ;cur j ref j 11) 对 与 进行分割;12) 保存分割结果;l +113) 分割深度 ;Ψ(l,n )14) Else 不满足匹配块分割条件:15) 块分割截止;16) 保存匹配块;17) End of if18) While 所有块均满足截止条件;19) End of for图5展示了本文提出的区域自适应分割的块匹配算法对帧Longdress_0536和其参考帧Longdress_0535进行分割后的块匹配结果. 在该序列当前帧下, 人物进行上半身的侧身动作. 观察图5可发现,在运动变化较大的人物上半身, 算法在寻找到较大的对应匹配块后即不再分割; 而人物下半身运动平缓, 算法自适应提高分割深度以实现帧间匹配块的精确搜索, 因而下半身的分块数目大于上半身.4 联合属性率失真优化的多模式帧间编码P Q在动态点云的帧间编码中, 常对相邻帧进行块分割或聚类后依据匹配块相似性实现帧间预测, 并利用补偿算法减少预测块误差以改善帧间编码质量. 
其中迭代最近点算法常用于帧间运动估计中,其通过迭代更新待配准点云 相较于目标点云 S t E (S,t )间的旋转矩阵 和平移向量 , 进而实现误差 最小化, 如式(7)所示:N p p i P i q i ′Q p i 其中 为待配准点云点数, 为待配准点云 的第 个点, 为目标点云 中与 相对应的点.但是, 完全依据ICP 配准进行动态点云的三维帧间预测存在两个问题: 首先, ICP 仅在预测块上逼近几何误差的最小化而没考虑到颜色属性偏差引起的匹配块差异, 影响了整体预测精度; 其次, 从率失真角度分析, 对运动变化极小的匹配块进行ICP 配准实现的运动估计是非必要的, 该操作很难改善失真且会增加帧间编码比特消耗.为改善上述问题, 提出了联合属性率失真优化的多模式帧间编码方法. 提出的方法首先在确保几何预测精度的同时, 充分考虑了可能的属性变化导致的预测精度下降问题, 而后通过率失真优化模型,对块依据率失真代价函数得到的最优解进行分类后, 应用不同的编码策略以优化帧间编码方案, 旨在有限的码率约束下最小化编码失真, 即式(8)所示:R j D j j N B R C λ其中, 和 分别表示第 个点云块的编码码率和对应的失真; 是当前帧编码块总数; 表示总码率预算.引入拉格朗日乘子 ,式(8)所示的带约束优化问题可以转换为无约束的最优化问题, 即式(9)所示:当前帧分割可视化当前帧分割效果参考帧分割效果图 5 区域自适应分割的块匹配方法分割示例Fig. 5 Example of block matching method based onadaptive regional segmentation1712自 动 化 学 报49 卷。
Patent title: A Remote Sensing Image Registration Method Based on a Bidirectional Neighborhood Filtering Strategy
Patent type: invention patent
Inventors: 赵明,安博文,吴泳澎,许晓彦,陈元林
Application number: CN201310077992.3
Filing date: 2013-03-12
Publication number: CN103116891A
Publication date: 2013-05-22
Abstract: The invention discloses a remote sensing image registration method based on a bidirectional neighborhood filtering strategy. It combines grayscale features with the spatial structure of feature points: matches obtained from grayscale features serve as the initial matching point pairs, the neighborhood structure of the feature points acts as a constraint, point pairs with identical bidirectional neighborhood structure are obtained iteratively from the differences between bidirectional neighborhoods, and the bidirectional neighborhood filtering strategy recovers candidate matches wrongly rejected during the iteration.
The method targets cases where the images to be registered differ by a large affine transformation, come from different sources, or contain similar patterns in the scene, and improves registration accuracy without human intervention.
Applicant: Shanghai Maritime University
Address: 1550 Haigang Avenue, Lingang New City, Pudong New Area, Shanghai 201306
Country: CN
Agency: Shanghai Xinhao Patent Agency (General Partnership)
Agent: 张静洁
Research on Domain Adaptation Methods in Transfer Learning
Introduction
With the rapid development of artificial intelligence, transfer learning has drawn growing attention in computer science.
In the real world we often face insufficient or imbalanced data, so traditional machine learning methods may not apply directly.
Transfer learning solves this problem by applying knowledge already learned in a source domain to a target domain.
Domain adaptation is an important method within transfer learning; it focuses on the distribution gap between the source and target domains so that models perform better on the target domain.
1. Overview of domain adaptation
Domain adaptation refers to the process of making a model perform better on the target domain, through a series of methods and techniques, when the source and target distributions differ.
Traditional machine learning usually assumes that samples are independent and identically distributed, an assumption that rarely holds in practice.
Domain adaptation methods provide an effective remedy.
2. Classification of domain adaptation methods
Domain adaptation methods can be divided into instance-based and feature-based approaches.
Instance-based methods mainly minimize the distribution gap between source and target through instance selection, re-calibration and instance generation.
Feature-based methods unify the features of the two domains through feature selection, feature mapping and feature transformation, reducing the distribution gap.
There are also hybrid methods that integrate both families to further improve domain adaptation performance.
3. Instance-based domain adaptation
Instance-based methods mainly comprise instance selection, re-calibration and instance generation.
Instance selection trains on instances that are similar across the source and target domains to reduce the distribution gap.
Re-calibration adaptively maps source samples from the source space into the target space.
Instance generation synthesizes virtual target-domain samples to enlarge the target training set and improve model performance on the target domain.
4. Feature-based domain adaptation
Feature-based methods unify features across domains via feature selection, feature mapping and feature transformation to reduce the distribution gap.
Feature selection trains on features useful for the target domain to improve target performance.
Feature mapping finds a mapping between the source and target domains and maps source features into the target feature space.
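A minimal illustration of feature mapping between domains: re-standardize each source feature so its mean and standard deviation match the target domain's statistics. This feature-wise alignment is a deliberately simple stand-in for the mapping methods discussed above, not any specific published algorithm:

```python
def map_source_to_target(Xs, Xt):
    """Feature-wise linear mapping (sketch): shift/scale each source
    feature so that its mean and std match the target domain's."""
    def stats(X, f):
        col = [row[f] for row in X]
        m = sum(col) / len(col)
        v = sum((c - m) ** 2 for c in col) / len(col)
        return m, (v ** 0.5) or 1e-12
    m_feats = len(Xs[0])
    mapped = []
    for row in Xs:
        new = []
        for f in range(m_feats):
            ms, ss = stats(Xs, f)
            mt, st = stats(Xt, f)
            new.append((row[f] - ms) / ss * st + mt)   # align 1st/2nd moments
        mapped.append(new)
    return mapped

Xs = [[0.0], [2.0]]          # source: mean 1, std 1
Xt = [[10.0], [14.0]]        # target: mean 12, std 2
Xs_mapped = map_source_to_target(Xs, Xt)
```

After mapping, a classifier trained on the transformed source data sees inputs whose marginal statistics resemble the target domain's; richer methods additionally align correlations or learn the mapping end to end.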
Adaptive Multi-Scale Feature Fusion: Overview and Explanation
1. Introduction
1.1 Overview
The overview introduces the research background and significance of this work as well as its research goals, and can be developed along the following lines.
First, briefly describe trends and challenges in the field of computer vision. With its rapid development, image processing and analysis techniques are widely applied across many domains.
In practical applications, however, the multi-scale nature of image data causes difficulties, such as changes in target size, viewpoint and illumination conditions.
These factors pose substantial challenges to image processing and analysis tasks.
Next, introduce the concept of adaptive multi-scale feature fusion.
Adaptive multi-scale feature fusion improves the performance of image processing and analysis tasks by fusing image features from different scales.
Obtaining feature information from multiple scales leads to a better understanding of image content and improves the accuracy and robustness of these tasks.
Then, emphasize the research significance and application value of adaptive multi-scale feature fusion.
It addresses the multi-scale problems in image processing and analysis and plays an important role in tasks such as object detection, image classification and image generation.
With well-designed fusion strategies and algorithms, the information present at different scales in an image can be fully exploited, improving performance and robustness and further advancing computer vision technology.
Finally, state the research goals of this paper and the arrangement of its content.
This paper studies the application of adaptive multi-scale feature fusion to image processing and analysis tasks and explores effective fusion strategies and algorithms.
Specifically, the paper covers feature extraction at different scales, the design of fusion strategies, and experimental validation.
Experimental evaluation and comparative analysis verify the effectiveness and performance of adaptive multi-scale feature fusion.
In summary, this chapter details the research background, significance and goals of adaptive multi-scale feature fusion and briefly previews the subsequent chapters.
This research is expected to provide effective methods and ideas for solving the multi-scale problems of image processing and analysis and to promote the further development of computer vision technology.
1.2 Article structure
The structure section is the backbone of the whole article; it helps readers follow the article's organization and logic.
This paper is organized as follows:
The first part is the introduction, which gives an overview of adaptive multi-scale feature fusion and introduces its applications in image processing and computer vision.
Orthogonal-Projection-Based BiLSTM-CNN Sentiment Feature Extraction
魏苏波;张顺香;朱广丽;孙争艳;李健
【Journal】Journal of Nanjing Normal University (Natural Science Edition)
【Year (Volume), Issue】2023, 46(1)
【Abstract】The orthogonal-projection-based BiLSTM-CNN sentiment feature extraction method aims to obtain weighted neutral word vectors from text and thereby sentiment features with higher discriminability, providing solid technical support for text sentiment classification. Traditional deep learning models overlook words with special meaning in key local context, so the extracted sentiment features are not rich enough. To address this, we propose an orthogonal-projection-based BiLSTM-CNN sentiment feature extraction method. First, neutral word vectors are projected into the orthogonal space of the sentiment-polarity words to obtain weighted neutral word vectors, while a CNN deep learning model extracts the key semantics of the text; then a BiLSTM-Attention model, together with the weighted neutral word vectors, learns semantic features that strengthen sentence sentiment from the extracted key semantics, making texts more discriminable in sentiment classification. Experimental results show that the proposed method captures more complete sentiment features and thus significantly improves the accuracy of text sentiment classification.
【Pages】10 pages (P139-148)
【Authors】魏苏波;张顺香;朱广丽;孙争艳;李健
【Affiliations】School of Computer Science and Engineering, Anhui University of Science and Technology; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
【Language】Chinese
【CLC Number】TP391.1
【Related Literature】
1. A Chinese-text sentiment feature extraction method based on multiple dictionaries
2. A feature extraction method based on orthogonal projection
3. Text sentiment analysis based on a serial hybrid BiLSTM-CNN model
4. A fine-grained sentiment feature extraction method for microblog short texts
5. Research on rice question similarity matching based on BiLSTM-CNN
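The orthogonal projection at the core of the method (removing a word vector's component along the sentiment-polarity direction) is a one-line computation. The toy vectors below are illustrative, not trained embeddings:

```python
def project_out(v, s):
    """Project v onto the orthogonal complement of sentiment direction s:
    v' = v - (v.s / s.s) * s, so v' has no component along the polarity
    direction and only the "neutral" part of the vector remains."""
    dot_vs = sum(a * b for a, b in zip(v, s))
    dot_ss = sum(a * a for a in s)
    coef = dot_vs / dot_ss
    return [a - coef * b for a, b in zip(v, s)]

v = [3.0, 4.0]      # a word vector (toy)
s = [1.0, 0.0]      # sentiment polarity direction (toy)
v_orth = project_out(v, s)
```

The result is exactly orthogonal to the polarity direction, which is what lets the downstream model weight the neutral component separately from the sentiment component.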
Image Denoising with an Adaptive Wavelet Threshold and Bilateral Filtering
吴成茂;胡伟;王辉
【Journal】Journal of Xi'an Institute of Posts and Telecommunications (《西安邮电学院学报》)
【Year (Volume), Issue】2013, 018(004)
【Abstract】The VisuShrink threshold for denoising considers only the noise standard deviation and the signal length. To overcome this limitation, an image denoising method combining an adaptive wavelet threshold with bilateral filtering is proposed. Starting from the statistical characteristics of the wavelet subband coefficients, an adaptive threshold that couples the wavelet decomposition level with the subband is selected, realizing adaptive denoising of the image's wavelet decomposition; bilateral filtering is then combined to obtain a denoised image that preserves rich detail. Tests on multiple images with noise of different strengths show the new method is more effective than traditional soft and hard threshold denoising, especially for images with strong Gaussian noise, yielding better peak signal-to-noise ratio and visual quality.
【Pages】4 pages (P5-8)
【Authors】吴成茂;胡伟;王辉
【Affiliations】School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; School of Automation, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
【Language】Chinese
【CLC Number】TP391.41
【Related Literature】
1. Agricultural image denoising based on an adaptive improved wavelet threshold model [J], 潘玫玫
2. Image denoising with an adaptive wavelet threshold and bilateral filtering [J], 吴成茂;胡伟;王辉
3. Image noise reduction combining an adaptive wavelet threshold with bilateral filtering [J], 尤波;张宸枫
4. SAR image denoising based on an adaptive wavelet threshold and the curvelet transform [J], 杨哲;邵哲平
5. Research on brain CT image denoising based on an adaptive wavelet threshold [J], 张爱桃;程思齐;肖雨;周旭;李连捷
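The VisuShrink universal threshold criticized above, combined with soft thresholding, can be sketched as follows. The median-based noise estimate is the usual proxy for the finest-scale subband and the coefficients are toy values; the paper's subband-adaptive threshold and the bilateral filtering step are not reproduced here:

```python
import math

def universal_threshold(coeffs, n):
    """VisuShrink's universal threshold sigma * sqrt(2 ln n), with sigma
    estimated from the median absolute coefficient (MAD / 0.6745)."""
    mad = sorted(abs(c) for c in coeffs)[len(coeffs) // 2]
    sigma = mad / 0.6745
    return sigma * math.sqrt(2 * math.log(n))

def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero: sign(c) * max(|c| - t, 0)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

# Mostly-noise coefficients plus two strong "signal" coefficients.
coeffs = [0.1, -0.2, 0.15, -0.1, 5.0, 0.05, -4.0, 0.12]
t = universal_threshold(coeffs, n=len(coeffs))
den = soft_threshold(coeffs, t)
```

Small coefficients (assumed noise) are zeroed while large ones survive slightly shrunk; adaptive schemes like the paper's pick a different threshold per decomposition level and subband instead of this single global value.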
An Improved Dynamic Target Tracking Method for an Edge Computing Framework
刘浩;卿粼波;宗江琴;陈虹君
【Journal】Journal of Sichuan University (Natural Science Edition)
【Year (Volume), Issue】2022, 59(6)
【Abstract】With the rapid development of 5G communication and IoT big data technology, the traditional cloud computing model increasingly fails to keep up with data growth; edge computing, as a new computing paradigm, shows strong capability for handling big data and high-speed computation. Within the "intelligent transportation simulation system" project, this paper proposes an edge computing framework suited to video image processing and makes two improvements to the traditional moving-target tracking algorithm: (1) a Raspberry Pi serves as the front-end video processor; it is small, cheap and computationally capable, and therefore well suited to edge computing; (2) to address the heavy sampling and computation cost of the tracking stage, a step-wise image sampling method using a smaller stride for the sliding window improves the original compressive tracking algorithm, reducing the amount of computation. Computer simulation results confirm that the improved algorithm runs faster with essentially no loss of tracking accuracy.
【Pages】7 pages (P93-99)
【Authors】刘浩;卿粼波;宗江琴;陈虹君
【Affiliations】School of Electronic Information, Chengdu Jincheng College; School of Electronics and Information Engineering, Sichuan University; Jiangxi Information Technology School; Sichuan Provincial Expert Workstation, Chengdu Jincheng College
【Language】Chinese
【CLC Number】TP751.1
【Related Literature】
1. An improved edge-direction estimation algorithm in anisotropic high-pass filtering
2. An improved TDMA dynamic slot allocation algorithm for Ad Hoc networks
3. MRNN: a new WSN dynamic modeling method based on an improved recurrent neural network, applied to fault detection
4. An algorithm invocation method for a distributed target-tracking simulation system
5. An improved Canny image edge detection method
Software Guide (软件导刊), Vol. 23 No. 1, Jan. 2024
Noise Correction Domain Adaptation Learning Based on Multi-Classifier Discrepancy
郑潍雯 1, 汪云云 2
(1. School of Computer Science, Nanjing University of Posts and Telecommunications; 2. Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing 210023, China)
Abstract: Unsupervised domain adaptation (UDA) aims to transfer knowledge from a related, label-rich source domain to a label-scarce target domain. Common domain adaptation methods usually assume that the source data is correctly labeled; in practice, however, noisy environments corrupt both the labels and the features of source samples. To solve the problem of a noisy source domain, noise correction domain adaptation based on classifier discrepancy (NCDA) is proposed. First, using the output discrepancy among multiple classifiers in the model, a finer classification standard divides the source data into feature-noise samples, label-noise samples and clean samples; second, different correction methods are applied to each noise type, and the corrected samples are fed back into model training; finally, the idea of stochastic classifiers is used to optimize the model. Compared with existing algorithms on the Office-31, Office-Home and Bing-Caltech datasets, NCDA's classification accuracy is 0.2% to 1.6% higher than the second-best method, and the experimental results demonstrate its effectiveness and robustness.
Key words: unsupervised domain adaptation; noise detection; noise correction; machine learning
DOI: 10.11907/rjdk.231109  CLC Number: TP391  Document code: A  Article ID: 1672-7800(2024)001-0042-08
0 Introduction
In recent years, deep neural networks have achieved remarkable results in many applications such as image recognition, semantic segmentation and natural language processing.
An Adaptive Non-Uniformity Correction Algorithm
1. Introduction
- Introduce the concept of non-uniformity correction
- Observe non-uniformity problems in real applications
- Introduce the adaptive non-uniformity correction algorithm studied in this paper
2. Related work
- Survey common non-uniformity correction algorithms
- Compare the strengths and weaknesses of these algorithms
- Introduce the techniques used for adaptive non-uniformity correction
3. The adaptive non-uniformity correction algorithm
- Explain the principle of the proposed algorithm
- Give the detailed algorithm flow
- Describe how the algorithm is implemented
4. Experimental results
- Compare this algorithm with others on different datasets and application scenarios
- Analyze its performance and accuracy in detail
- Compare the influence of different parameters on performance
5. Conclusion and future work
- Summarize the strengths and limitations of the proposed adaptive algorithm
- Suggest directions for improving it
- Discuss future research directions for non-uniformity correction
Appendix: references, acknowledgements and other sections as required.
Chapter 1: Introduction
Non-uniformity is a common problem in computer vision and image processing: physical limitations of capture devices and factors such as ambient illumination cause variations in brightness, color and contrast across an image.
Non-uniformity degrades the accuracy and robustness of computer vision applications, so non-uniformity correction algorithms are needed to solve this problem.
Traditional correction algorithms usually apply global or local image transforms, such as histogram equalization or polynomial transforms, whose accuracy and performance vary greatly across application scenarios.
Recently, with rapid progress in computer vision, machine learning and artificial intelligence, more adaptive non-uniformity correction algorithms have been proposed and widely applied in a variety of settings.
This paper studies an adaptive non-uniformity correction algorithm that can automatically identify and adjust the non-uniformity in an image.
The contributions of this paper are:
- An adaptive-technique-based non-uniformity correction algorithm that performs dynamic adjustment in different image regions to achieve more accurate and reliable correction.
- Carefully designed experiments demonstrating the effectiveness and superiority of the algorithm, including strong adaptability to irregularly shaped and high-dynamic-range image data.
This paper is organized as follows: Chapter 1 is the introduction, briefly covering the background and research significance of non-uniformity correction.
Chapter 2 introduces related techniques and research directions, including traditional non-uniformity correction algorithms and applications of adaptive techniques.
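As a concrete baseline for the traditional methods mentioned in Chapter 1, the classic two-point (gain/offset) non-uniformity correction can be sketched as follows. The reference frames and target levels are illustrative assumptions, and this is the classical calibration scheme, not the adaptive algorithm proposed in the paper:

```python
def two_point_nuc(raw, dark, bright, target_dark=0.0, target_bright=1.0):
    """Classic two-point non-uniformity correction: compute a per-pixel
    gain and offset from a dark and a bright reference frame so that every
    pixel maps the two references to the same target levels."""
    corrected = []
    for r_row, d_row, b_row in zip(raw, dark, bright):
        row = []
        for r, d, b in zip(r_row, d_row, b_row):
            gain = (target_bright - target_dark) / (b - d)
            row.append(target_dark + gain * (r - d))
        corrected.append(row)
    return corrected

# 2x2 sensor with per-pixel offset/gain differences, imaging a flat 0.5 scene.
dark   = [[0.10, 0.00], [0.05, 0.20]]
bright = [[1.10, 0.80], [1.05, 1.40]]
raw    = [[0.60, 0.40], [0.55, 0.80]]   # flat field seen through non-uniformity
out = two_point_nuc(raw, dark, bright)
```

The corrected frame becomes flat because each pixel's individual response is linearized against the two references; adaptive methods aim for the same effect without requiring calibration frames.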
Query Adaptive Similarity for Large Scale Object Retrieval
Danfeng Qin, Christian Wengert, Luc van Gool
ETH Zürich, Switzerland
{qind,wengert,vangool}@vision.ee.ethz.ch

Abstract
Many recent object retrieval systems rely on local features for describing an image. The similarity between a pair of images is measured by aggregating the similarity between their corresponding local features. In this paper we present a probabilistic framework for modeling the feature to feature similarity measure. We then derive a query adaptive distance which is appropriate for global similarity evaluation. Furthermore, we propose a function to score the individual contributions into an image to image similarity within the probabilistic framework. Experimental results show that our method improves the retrieval accuracy significantly and consistently. Moreover, our result compares favorably to the state-of-the-art.

1. Introduction
We consider the problem of content-based image retrieval for applications such as object recognition or similar image retrieval. This problem has applications in web image retrieval, location recognition, mobile visual search, and tagging of photos.
Most of the recent state-of-the-art large scale image retrieval systems rely on local features, in particular the SIFT descriptor [14] and its variants. Moreover, these descriptors are typically used jointly with a bag-of-words (BOW) approach, reducing considerably the computational burden and memory requirements in large scale scenarios.
The similarity between two images is usually expressed by aggregating the similarities between corresponding local features. However, to the best of our knowledge, few attempts have been made to systematically analyze how to model the employed similarity measures.
In this paper we present a probabilistic view of the feature to feature similarity. We then derive a measure that is adaptive to the query feature. We show, both on simulated and real data, that the Euclidean distance density distribution is highly query dependent and that our model adapts the original distance accordingly. While it is difficult to know the distribution of true correspondences, it is actually quite easy to estimate the distribution of the distance of non-corresponding features. The expected distance to the non-corresponding features can be used to adapt the original distance and can be efficiently estimated by introducing a small set of random features as negative examples. Furthermore, we derive a global similarity function that scores the feature to feature similarities. Based on simulated data, this function approximates the analytical result.
Moreover, in contrast to some existing methods, our method does not require any parameter tuning to achieve its best performance on different datasets. Despite its simplicity, experimental results on standard benchmarks show that our method improves the retrieval accuracy consistently and significantly and compares favorably to the state-of-the-art. Furthermore, all recently presented post-processing steps can still be applied on top of our method and yield an additional performance gain.
The rest of this paper is organized as follows. Section 2 gives an overview of related research. Section 3 describes our method in more detail. The experiments for evaluating our approach are described in Section 4. Results in a large scale image retrieval system are presented in Section 5 and compared with the state-of-the-art.

2. Related Work
Most of the recent works addressing the image similarity problem in image retrieval can be roughly grouped into three categories.
Feature-feature similarity: The first group mainly works on establishing local feature correspondence. The most famous work in this group is the bag-of-words (BOW) approach [24]. Two features are considered to be similar if they are assigned to the same visual word. Despite the efficiency of the BOW model, the hard visual word assignment significantly reduces the discriminative power of the local features. In order to
reduce quantization artifacts,[20]pro-posed to assign each feature to multiple visual words.In contrast,[8]rely on using smaller codebooks but in con-junction with short binary codes for each local feature,re-fining the feature matching within the same V oronoi cell. Additionally,product quantization[12]was used to esti-2013 IEEE Conference on Computer Vision and Pattern Recognitionmate the pairwise Euclidean distance between features,and the top k nearest neighbors of a query feature is considered as matches.Recently,several researchers have addressed the problem of the Euclidean distance not being the optimal similarity measure in most situations.For instance in[16], a probabilistic relationship between visual words is learned from a large collection of corresponding feature tracks.Al-ternatively,in[21],they learn a projection from the original feature space to a new space,such that Euclidean metric in this new space can appropriately model feature similarity. Intra-image similarity The second group focuses on effec-tively weighting the similarity of a feature pair considering its relationship to other matched pairs.Several authors exploit the property that the local fea-tures inside the same image are not independent.As a consequence,a direct accumulation of local feature sim-ilarities can lead to inferior performance.This problem was addressed in[4]by down-weighting the contribution of non-incidentally co-occurring features.In[9]this prob-lem was approached by re-weighting features according to their burstiness measurement.As the BOW approach discards spatial information,a scoring step can be introduced which exploits the property that the true matched feature pairs should follow a consis-tent spatial transformation.The authors of[19]proposed to use RANSAC to estimate the homography between im-ages,and only count the contribution of feature pairs con-sistent with this model.[26]and[23]propose to quantize the image transformation parameter space in a Hough 
voting manner, and let each matching feature pair vote for its corresponding parameter cell. A feature pair is considered valid if it supports the cell with the maximum number of votes.

Inter-image similarity Finally, the third group addresses the problem of how to improve the retrieval performance by exploiting additional information contained in other database images that depict the same object as the query image. [5] relies on query expansion: after retrieving a set of spatially verified database images, this new set is used to query the system again to increase recall. In [22], a set of relevant images is constructed using k-reciprocal nearest neighbors, and the similarity score is evaluated based on how similar a database image is to this set.

Our work belongs to the first group. By formulating the feature-feature matching problem in a probabilistic framework, we propose an adaptive similarity for each query feature, and a similarity function to approximate the quantitative result. Although the idea of adapting similarity by dissimilarity has already been exploited in [11, 17], we propose to measure dissimilarity by the mean distance of the query to a set of random features, whereas they use k nearest neighbors (kNN). Since, in a realistic dataset, different objects may have different numbers of relevant images, it is actually quite hard for a kNN based method to find a k that generalizes to all queries. Moreover, as kNN is an order statistic, it can be sensitive to outliers and cannot be used reliably as an estimator in realistic scenarios. In contrast, in our work the set of random features can be considered a clean set of negative examples, and the mean operator is actually quite robust, as shown later.

Considering the large amount of data in a typical large scale image retrieval system, it is impractical to compute the pairwise distances between the high-dimensional original feature vectors. However, several approaches exist to relieve that burden using efficient
approximations such as [12, 13, 3, 6]. For simplicity, we adopt the method proposed in [12] to estimate the distance between features.

3. Our Approach

In this section, we present a theoretical framework for modeling the visual similarity between a pair of features, given a pairwise measurement. We then derive an analytical model for computing the accuracy of the similarity estimation in order to compare different similarity measures. Following the theoretical analysis, we continue the discussion on simulated data. Since the distribution of the Euclidean distance varies enormously from one query feature to another, we propose to normalize the distance locally to obtain a similar degree of measurement across queries. Furthermore, using the adaptive measure, we quantitatively analyze the similarity function on the simulated data and propose a function to approximate the quantitative result. Finally, we discuss how to integrate our findings into a retrieval system.

3.1. A probabilistic view of similarity estimation

We are interested in modeling the visual similarity between features based on a pairwise measurement. Let us denote as x_i the local feature vectors from a query image and as Y = {y_1, ..., y_j, ..., y_n} a set of local features from a collection of database images. Furthermore, let m(x_i, y_j) denote a pairwise measurement between x_i and y_j. Finally, T(x_i) represents the set of features which are visually similar to x_i, and F(x_i) the set of features which are dissimilar to x_i.

Instead of considering whether y_j is similar to x_i and how similar they look, we want to evaluate how likely it is that y_j belongs to T(x_i) given a measurement m. This can be modeled as follows:

    f(x_i, y_j) = p(y_j ∈ T(x_i) | m(x_i, y_j))    (1)

For simplicity, we denote m_j = m(x_i, y_j), T_i = T(x_i), and F_i = F(x_i). As y_j belongs to either T_i or F_i, we have

    p(y_j ∈ T_i | m_j) + p(y_j ∈ F_i | m_j) = 1    (2)

Furthermore, according to Bayes' theorem,

    p(y_j ∈ T_i | m_j) = p(m_j | y_j ∈ T_i) × p(y_j ∈ T_i) / p(m_j)    (3)

and

    p(y_j ∈ F_i | m_j) = p(m_j | y_j ∈ F_i) × p(y_j ∈ F_i) / p(m_j)    (4)

Finally, by combining Equations 2, 3 and 4 we get

    p(y_j ∈ T_i | m_j) = { 1 + [p(m_j | y_j ∈ F_i) / p(m_j | y_j ∈ T_i)] × [p(y_j ∈ F_i) / p(y_j ∈ T_i)] }^{−1}    (5)

For large datasets, the quantity p(y_j ∈ T_i) can be modeled by the occurrence frequency of x_i. Therefore, p(y_j ∈ T_i) and p(y_j ∈ F_i) only depend on the query feature x_i. In contrast, p(m_j | y_j ∈ T_i) and p(m_j | y_j ∈ F_i) are the probability density functions of the distribution of m_j for {y_j | y_j ∈ T_i} and {y_j | y_j ∈ F_i}. We will show in Section 3.3 how to generate simulated data for estimating these distributions. In Section 3.5 we will further exploit these distributions in our framework.

3.2. Estimation accuracy

Since the pairwise measurement between features is the only observation for our model, it is essential to estimate its reliability. Intuitively, an optimal measurement should be able to perfectly separate the true correspondences from the false ones. In other words, the better the measurement distinguishes the true correspondences from the false ones, the more accurately the feature similarity based on it can be estimated. Therefore, the measurement accuracy can be modeled as the expected pureness. Let T be the collection of all matched pairs of features, i.e.,

    T = {(x, y) | y ∈ T(x)}    (6)

The probability that a pair of features is a true match given the measurement value z can be expressed as

    p(T | z) = p((x, y) ∈ T | m(x, y) = z)    (7)

Furthermore, the probability of observing a measurement value z given a corresponding feature pair is

    p(z | T) = p(m(x, y) = z | (x, y) ∈ T)    (8)

Then, the accuracy of the similarity estimation is

    Acc(m) = ∫_{−∞}^{∞} p(T | z) × p(z | T) dz    (9)

with m some pairwise measurement and Acc(m) the accuracy of the model based on m. Since

    p(T | z) ≤ 1  and  ∫_{−∞}^{∞} p(z | T) dz = 1    (10)

the accuracy of a measure m is

    Acc(m) ≤ 1    (11)

and

    Acc(m) = 1 ⇔ p(T | z) = 1, ∀ p(z | T) > 0    (12)

This measure allows us to compare the accuracy of different distance measurements, as will be shown in the next section.

3.3. Ground truth data generation

In order to model the property of
T(x_i), we simulate corresponding features using the following method: First, regions r_{i,0} are detected on a random set of images by the Hessian Affine detector [15]. Then, we apply numerous random affine warpings (using the affine model proposed by ASIFT [25]) to r_{i,0}, and generate a set of related regions. Finally, SIFT features are computed on all regions, resulting in {x_{i,1}, x_{i,2}, ..., x_{i,n}} as a subset of T(x_{i,0}). The parameters for the simulated affine transformation are selected randomly, and some random jitter is added to model the detection errors occurring in a practical setting. The non-corresponding features F(x_i) are simply generated by selecting 500K random patches extracted from a different and unrelated dataset. In this way, we also generate a dataset D containing 100K matched pairs of features from different images, and 1M non-matched pairs. Figure 1 depicts two corresponding image patches randomly selected from the simulated data.

Figure 1. Corresponding image patches for two randomly selected points of the simulated data.

3.4. Query adaptive distance

It has been observed that the Euclidean distance is not an appropriate measurement for similarity [21, 16, 11]. We argue that the Euclidean distance is a robust estimator when normalized locally. As an example, Figure 2 depicts the distributions of the Euclidean distance of the corresponding and non-corresponding features for the two different interest points shown in Figure 1. For each sample point x_i, we collected a set of 500 corresponding features T(x_i) using the procedure from Section 3.3 and a set of 500K random non-corresponding features F(x_i). It can be seen that the Euclidean distance separates the matching from the non-matching features quite well in the local neighborhood of a given query feature x_i. However, by averaging the distributions of T(x_i) and F(x_i) respectively over all queries x_i, the Euclidean distance loses its discriminative power. This explains why the Euclidean distance has inferior
performance in estimating visual similarity from a global point of view. A local adaptation is therefore necessary to recover the discriminability of the Euclidean distance.

Figure 2. Distribution of the Euclidean distance for two points from the simulated data. The solid lines show the distributions for corresponding features T(x_i), whereas the dotted lines depict non-corresponding ones F(x_i).

Another property can also be observed in Figure 2: if a feature has a large distance to its correspondences, it also has a large distance to the non-matching features. By exploiting this property, a normalization of the distance can be derived for each query feature:

    d_n(x_i, y_j) = d(x_i, y_j) / N_d(x_i)    (13)

where d_n(·,·) represents the normalized distance, d(·,·) represents the original Euclidean distance, and N_d(x_i) represents the expected distance of x_i to its non-matching features. It is intractable to estimate the distance distribution between all features and their correspondences, but it is simple to estimate the expected distance to non-corresponding features. Since the non-corresponding features are independent of the query, a set of randomly sampled, thus unrelated, features can be used to represent the set of non-corresponding features for each query. Moreover, if we assume the distance distribution of the non-corresponding set to follow a normal distribution N(μ, σ), then the estimation error of its mean based on a subset follows another normal distribution N(0, σ/N), with N the size of the subset. Therefore, N_d(x_i) can be estimated sufficiently well and very efficiently from even a small set of random, i.e. non-corresponding, features.

The probability that an unknown feature matches the query one when observing their distance z can be modeled as

    p(T | z) = N_T × p(z | T) / [N_T × p(z | T) + N_F × p(z | F)] = { 1 + (N_F / N_T) × p(z | F) / p(z | T) }^{−1}    (14)

with N_T and N_F the number of corresponding and
non-corresponding pairs, respectively. In practical settings, N_F is usually many orders of magnitude larger than N_T. Therefore, once p(z | F) starts getting bigger than 0, p(T | z) rapidly decreases, and the corresponding features quickly get confused with the non-corresponding ones. Figure 3 illustrates how the adaptive distance recovers more correct matches compared to the Euclidean distance. Moreover, by assuming that N_F / N_T ≈ 1000, the measurement accuracy following Equation 9 can be computed. For the Euclidean distance, the estimation accuracy is 0.7291, and for the adaptive distance, the accuracy is 0.7748. Our proposed distance thus significantly outperforms the Euclidean distance.

3.5. Similarity function

In this section, we show how to derive a globally appropriate feature similarity in a quantitative manner. After having established the distance distribution of the query adaptive distance in the previous section, the only unknown in Equation 5 remains p(y_j ∈ F_i) / p(y_j ∈ T_i). As discussed in Section 3.1, this quantity is inversely proportional to the occurrence frequency of x_i, and it is generally a very large term. Assuming c = p(y_j ∈ F_i) / p(y_j ∈ T_i) to be between 10 and 100000, the full similarity function can be estimated and is depicted in Figure 4. The resulting curves follow an inverse sigmoid form, such that the similarity is 1 for d_n → 0 and 0 for d_n → 1. They all have roughly the same shape and differ approximately only by an offset. It is to be noted that they show a very sharp transition, making it very difficult to correctly estimate the transition point and thus to achieve a good separation between true and false matches.

In order to reduce the estimation error due to such sharp transitions, a smoother curve would be desirable. Since the distance distributions are all long-tailed, we have fitted different kinds of exponential functions to those curves. However, we observe similar results. For the reason of simplicity, we choose to approximate the similarity
function as

    f(x_i, y_j) = exp(−α × d_n(x_i, y_j)^4)    (15)

As can be seen in Figure 4, this curve is flatter and covers approximately the full range of possible values for c. In Equation 15, α can be used to tune the shape of the final function and roughly steers its slope; we achieved the best results with α = 9 and keep this value throughout all experiments. In the next section, the robustness of this function in a real image retrieval system will be evaluated.

Figure 3. Comparison of our adaptive distance to the Euclidean distance on dataset D. The solid lines are the distance distributions of the matched pairs, and the dotted lines are the distance distributions of the non-matched pairs. The green dashed lines denote where the probability of the non-matching distance exceeds 0.1%, i.e., where the non-matching features are very likely to dominate our observation. A comparison of the right tails of both distributions is shown in (c).

Figure 4. Feature similarity evaluated on dataset D. Red lines are the visual similarity for different c evaluated on the simulated data. The blue line is our final similarity function with α = 9.

3.6. Overall method

In this section we will integrate the query adaptive distance measurement and the similarity function presented before into an image retrieval system. Let the visual similarity between the query image q = {x_1, ..., x_m} and a database image d = {y_1, ..., y_n} be

    sim(q, d) = Σ_{i=1}^{m} Σ_{j=1}^{n} f(x_i, y_j)    (16)

with f(x_i, y_j) the pairwise feature similarity as in Equation 15. As mentioned
before, d_n(x_i, y_j) and N_d(x_i) are estimated using the random set of features.

For retrieval, we use a standard bag-of-words inverted file. However, in order to have an estimation of the pairwise distance d(x_i, y_j) between query and database features, we add a product quantization scheme as in [12] and select the same parameters as the original authors. The feature space is first partitioned into N_c = 20 000 Voronoi cells according to a coarse quantization codebook K_c. All features located in the same Voronoi cell are grouped into the same inverted list. Each feature is further quantized with respect to its coarse quantization centroid. That is, the residual between the feature and its closest centroid is split equally into m = 8 parts, and each part is separately quantized according to a product quantization codebook K_p with N_p = 256 centroids. Then, each feature is encoded using its related image identifier and a set of quantization codes, and is stored in its corresponding inverted list.

We select random features from Flickr and add 100 of them to each inverted list. For performance reasons, we make sure that the random features are added to the inverted lists before adding the database vectors. At query time, all inverted lists whose related coarse quantization centers are in the k nearest neighborhood of the query vector are scanned. With our indexing scheme, the distances to the non-matching features are always computed first, their mean value being directly N_d(x_i). Then, the query adaptive distance d_n(x_i, y_j) to each database vector can be computed directly as in Equation 13. In order to reduce unnecessary computation even more, a threshold β is used to quickly drop features whose Euclidean distance is larger than β × N_d(x_i). This parameter has little influence on the retrieval performance, but reduces the computational load significantly. Its influence is evaluated in Section 4.

As pointed out by [9], local features of an image tend to occur in bursts. In order to
avoid multiple counting of statistically correlated features, we incorporate both "intra-burstiness" and "inter-burstiness" normalization [9] to re-weight the contributions of every pair of features. The similarity function thus changes to

    sim(q, d) = Σ_{i=1}^{m} Σ_{j=1}^{n} w(x_i, y_j) f(x_i, y_j)    (17)

with w(x_i, y_j) the burstiness weighting.

4. Experiments

In this part, we first introduce the evaluation protocol. Then we give some implementation details of our algorithm. Furthermore, we discuss the influence of each parameter and experimentally select the best ones. Finally, we evaluate each part of our method separately.

4.1. Datasets and performance evaluation protocol

We evaluated our method on the Oxford5k [19], Paris [20], Holidays [8] and Oxford105k datasets. Oxford105k consists of Oxford5k and 100 285 distractor images. The distractor images are a set of random images that we downloaded from Flickr, having the same resolution of 1024×768 as the original Oxford5k dataset. We follow the same evaluation measurement method as proposed in the original publications; that is, the mean average precision (mAP) is calculated as the overall performance of the retrieval system.

4.2. Implementation details

Preprocessing For all experiments, all images are resized such that their maximum resolution is 1024×768. In each image, interest points are detected using the Hessian Affine detector, and a SIFT descriptor is computed around each point. As in [2], a square root scaling is applied to each SIFT vector, yielding a significantly better retrieval performance when using the Euclidean metric.

Codebook training The vocabularies were trained on an independent dataset of images randomly downloaded from Flickr in order to prevent overfitting to the evaluation datasets.

Random feature dataset preparation Random images from Flickr (different, however, from the codebook training dataset) are used to generate the random feature dataset.
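As a concrete illustration of the pipeline described above, the following sketch combines the square root (RootSIFT-style) descriptor scaling with the query adaptive distance of Equation 13, the cut-off threshold β, and the similarity function of Equation 15. This is a minimal brute-force sketch for intuition only, not the authors' implementation: the function names are our own, exact Euclidean distances stand in for the product quantization estimates used in the real system, and α = 9, β = 0.85 follow the values selected in the paper.

```python
import numpy as np

# Illustrative sketch (not the authors' code). Assumptions: dense brute-force
# distances instead of an inverted file with product quantization.
ALPHA = 9.0   # slope of the similarity function (Eq. 15)
BETA = 0.85   # cut-off: drop pairs with d(x, y) > BETA * N_d(x)

def root_sift(descs):
    """Square-root scaling of L1-normalized SIFT descriptors (as in [2])."""
    descs = np.asarray(descs, dtype=float)
    descs = descs / np.maximum(descs.sum(axis=1, keepdims=True), 1e-12)
    return np.sqrt(descs)

def similarity(query_feats, db_feats, random_feats):
    """sim(q, d) of Eq. 16 using the query adaptive distance of Eq. 13."""
    score = 0.0
    for x in query_feats:
        # N_d(x): expected distance of x to non-corresponding features,
        # estimated as the mean distance to a small random feature set.
        nd = np.linalg.norm(random_feats - x, axis=1).mean()
        d = np.linalg.norm(db_feats - x, axis=1)
        # Apply the cut-off, then normalize: d_n(x, y) = d(x, y) / N_d(x).
        dn = d[d < BETA * nd] / nd
        # f(x, y) = exp(-alpha * d_n^4)  (Eq. 15)
        score += np.exp(-ALPHA * dn ** 4).sum()
    return score
```

In the actual system, N_d(x_i) comes directly from the 100 random features stored in each inverted list, and the burstiness weighting of Equation 17 would additionally re-weight each term of the sum.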
4.3. Parameter selection

In this section, we evaluate the retrieval performance of our approach on the Oxford5k dataset for different parameter settings. There are two parameters in our method: the number of random features in each inverted list, and the cut-off threshold β for filtering out features whose contribution is negligible.

The influence of the number of random features Table 1 shows the retrieval performance when varying the number of random features per inverted list. The performance remains almost constant over a very large range. This supports the assumption that the mean distance of a query feature to the dissimilar features can be robustly estimated even with a small number of random features. We select 100 random features per inverted list throughout the rest of this paper.

Length | 50    | 100   | 500   | 1000  | 10000
mAP    | 0.739 | 0.739 | 0.739 | 0.739 | 0.738

Table 1. Influence of the size of the random feature set for each inverted list on Oxford5k

The influence of the cut-off threshold β Table 2 shows that features with a distance larger than β × N_d(x_i), with β ∈ [0.8, 0.95], have almost no contribution to the retrieval performance. In order to reduce the number of updates of the scoring table, we select β = 0.85 for all experiments.

β                  | 0.80  | 0.85  | 0.90  | 0.95
similarity score   | 0.025 | 0.009 | 0.003 | 0.001
#selected features | 13    | 43    | 124   | 292
mAP                | 0.733 | 0.739 | 0.740 | 0.739

Table 2. Influence of the cut-off value β on Oxford5k

4.4. Effectiveness of our method

Local adaptive distance In order to compare the adaptive distance function to the Euclidean distance, we use a threshold for separating matching and non-matching features. Figure 5 shows the retrieval performance for a varying threshold, both for the Euclidean distance and for the adaptive distance. Overall, the best mAP using the adaptive distance is 3% better than with the Euclidean distance. Furthermore, the adaptive distance is less sensitive to the selection of a non-optimal threshold. It is to be noted that in the final setup, our method does not require any
thresholding.

Figure 5. Comparison of our adaptive distance with the Euclidean distance on the Oxford5k dataset

Contributions of other steps In order to justify the contribution of the other steps contained in our method, we evaluate the performance of our method by taking them out of the pipeline. For the experiment on Oxford5k, we find that without the feature scaling, mAP drops from 0.739 to 0.707, while without burstiness weighting, mAP drops to 0.692. With multi-assignment only on the query side, mAP increases from 0.739 to 0.773 for MA = 5, and to 0.780 for MA = 10. MA denotes the number of inverted lists that are traversed per query feature.

5. Results

Throughout all experiments, the set of parameters was fixed to the values obtained in the previous section, and vocabularies were always trained on independent datasets. Table 3 shows the retrieval performance on all typical benchmarks, both with single assignment (SA) and multi-assignment (MA = 10). As expected, multi-assignment (scanning several inverted lists) reduces the quantization artifacts and improves the performance consistently, however in exchange for more computational load.

Furthermore, we applied an image level post-processing step on top of our method. We choose to use reciprocal nearest neighbors (RNN) [22], for the reason that it can easily be integrated on top of a retrieval system, independently from the image similarity function. We adopt the publicly available code [1] provided by the original authors and the default settings. RNN significantly improves the results on the Oxford5k and Paris datasets, but slightly lowers the result on Holidays. Considering that RNN tries to exploit additional information contained in other relevant database images, which are scarce in Holidays (on average only 2 to 3 relevant database images per query), it is difficult for query expansion methods to perform much better.

Dataset    | SA    | MA    | MA+RNN
Oxford5k   | 0.739 | 0.780 | 0.850
Oxford105k | 0.678 | 0.728 | 0.816
Paris      | 0.703 | 0.736 | 0.855
Holidays   | 0.814 | 0.821 | 0.801

Table 3. Performance of our method on public datasets.

5.1. Comparison with state-of-the-art

Figure 6. Retrieval performance when using the top k nearest neighbors as similar features [12]

We first compare the performance of our method to [12], which relies on using the top k nearest neighbors of the Euclidean distance for selecting the similar features of a query. This work is closest to ours, both in memory overhead and computational complexity. It can be seen in Figure 6 that no single k maximizes the performance for all datasets, showing that this parameter is very sensitive to the data. Moreover, our method outperforms the peak results of [12] consistently by roughly 10 points of mAP.

Table 4 shows the comparison to several other methods without applying any image-level post-processing step. As pointed out by [10], training a vocabulary on independent data rather than on the evaluated dataset itself better represents the search performance in a very large dataset. We therefore only compare to state-of-the-art methods using codebooks trained on independent datasets. We achieve the best performance on Oxford5k, Oxford105k, and Holidays, and fall only slightly behind [16] on Paris.

Dataset    | Ours  | [16]    | [7]   | [18]
Oxford5k   | 0.780 | 0.742   | 0.704 | 0.725
Oxford105k | 0.728 | 0.675*  | -     | 0.652
Paris      | 0.736 | 0.749   | -     | -
Holidays   | 0.821 | 0.749** | 0.817 | 0.769/0.818**

Table 4. Comparisons with state-of-the-art methods without applying image level post-processing. * indicates the score when merging Oxford5k, Paris, and 100K distractor images. ** denotes the result obtained by manually rotating all images in the Holidays dataset to be upright.

Furthermore, Table 5 gives a comparison of the results when additional image-level post-processing steps are applied. We argue that any post-processing step can directly benefit from our method, and illustrate with RNN as an example that the best
performance can be achieved.

Dataset    | Ours+RNN | [16]    | [18]  | [2]
Oxford5k   | 0.850    | 0.849   | 0.822 | 0.809
Oxford105k | 0.816    | 0.795   | 0.772 | 0.722
Paris      | 0.855    | 0.824   | -     | 0.765
Holidays   | 0.801    | 0.758** | 0.78  | -

Table 5. Comparisons with state-of-the-art methods with post-processing at the image level. ** denotes the result obtained by manually rotating all images in the Holidays dataset to be upright.

In all of the previous experiments, each feature costs 12 bytes of memory. Specifically, 4 bytes are used for the image identifier and 8 bytes for the quantization codes. As [11] mainly shows results using more bytes for feature encoding, we also compare our method to theirs with more bytes per feature. As shown in Table 6, using more bytes further improves the retrieval results. Even with fewer bytes than [11], better performance is achieved on all datasets. In all experiments, we compare favorably to the state-of-the-art by exploiting a simple similarity function without any parameter tuning for each dataset. The good results