Removing Camera Shake from a Single Photograph (deblur_fergus)
Chinese Journal of Liquid Crystals and Displays, Vol. 37, No. 4, Apr. 2022. doi: 10.37188/CJLCD.2021-0255. Article ID: 1007-2780(2022)04-0494-07.

Method of the image de-occlusion by using focal plane camera

YANG Mo-xuan, ZHAO Yuan-meng*, ZHU Feng-xia, ZHANG Hong-fei, ZHANG Cun-lin
(Key Laboratory of Terahertz Optoelectronics, Ministry of Education; Beijing Key Laboratory for Terahertz Spectroscopy and Imaging; Beijing Advanced Innovation Center for Imaging Theory and Technology; Department of Physics, Capital Normal University, Beijing 100048, China)

Abstract: To address the problem of foreground occluders in the light path interfering with the acquisition of the information of interest, this paper experimentally studies a de-occlusion algorithm built on a camera array. An array-type light field camera collects 4D light field data, and digital refocusing is then performed at different depths to bring out the detailed features of the target. Image reconstruction is used to synthesize a sub-image array, and minimum-error threshold segmentation marks the occluded regions and recovers the detail of the original image. The experimental results support the feasibility of removing occluders with an array light field camera, as well as its ability to improve image quality, recover the occluded regions, improve image readability, and reduce the influence of noise. Measured by no-reference image quality metrics, the algorithm improves the SNR and PSNR of the reconstructed image by 17.3% and 77.6%, respectively.

Key words: light field acquisition; digital refocusing; image reconstruction; occlusion

Received 2021-10-09; revised 2021-11-21. Supported by the National Natural Science Foundation of China (No. 61875140), the Classified Development of Capital Normal University: Construction of Academic Sites, Project Approval of Graduate Education and High-Level Academic Innovation Project of Graduate Students (No. 008-2155089), and Scientific and Technological Innovation Service Capacity Building: Basic Scientific Research Services (No. 20530290044). *Corresponding author, E-mail: *********************.cn

1 Introduction
In computer vision, both an occluder appearing in front of the target and aliasing of the target interfere with data acquisition.
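To make the digital refocusing step concrete, the following is a minimal sketch of shift-and-add synthetic-aperture refocusing over a rectangular camera array, the operation the de-occlusion pipeline above builds on: each sub-aperture view is shifted in proportion to its offset from the array centre and a chosen focus slope, then averaged, so that objects off the focal plane (such as a foreground occluder) blur away. The uniform-grid geometry, the slope parameter, and the function name are assumptions of this sketch, not details taken from the paper.

#include <opencv2/opencv.hpp>
#include <vector>

// views holds rows*cols sub-aperture images in row-major order.
cv::Mat refocus(const std::vector<cv::Mat>& views, int rows, int cols,
                float slope /* pixel shift per unit of grid offset */) {
    cv::Mat acc = cv::Mat::zeros(views[0].size(), CV_32FC3);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            cv::Mat vf, shifted;
            views[r * cols + c].convertTo(vf, CV_32FC3, 1.0 / 255.0);
            // Shift each view toward the array centre; changing 'slope'
            // refocuses the synthetic aperture to a different depth.
            float dx = slope * (c - (cols - 1) / 2.0f);
            float dy = slope * (r - (rows - 1) / 2.0f);
            cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, dx, 0, 1, dy);
            cv::warpAffine(vf, shifted, M, vf.size());
            acc += shifted;
        }
    }
    return acc / static_cast<double>(views.size()); // synthetic-aperture average
}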
Remove-fog algorithm

Overview
The remove-fog algorithm is an image dehazing algorithm: its aim is to extract a clear scene from a hazy image. It can be used to improve photographs and videos taken in poor weather or in unfavorable shooting conditions.

Principle
The algorithm rests on the following observation: in a hazy image, the visible distance between an object and the camera is reduced by the haze, so the haze can be removed by estimating that visible distance. The algorithm uses a model called the Atmospheric Scattering Model to describe how light scatters as it propagates through the atmosphere. The model assumes the atmosphere contains a certain concentration of small particles that scatter light in transit: when light collides with these particles it is scattered into the surroundings and attenuates progressively with propagation distance. As a result, distant objects in a hazy image look blurrier than nearby ones.
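For reference, the standard form of the atmospheric scattering model (a well-known formulation; the text describes it but does not write it out) expresses the observed hazy image as a blend of scene radiance and airlight:

I(x) = J(x) * t(x) + A * (1 - t(x))

where J is the haze-free scene radiance, t the transmittance along the line of sight, and A the atmospheric light. The recovery formula in step 3 below is exactly this model solved for J(x).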
Implementation steps

1. Estimate the global atmospheric light A. First determine which regions of the image are hazy. A common approach is to compute the brightness of every pixel and split the image into foreground and background by a threshold, then take the brightest foreground pixel as the global atmospheric light.

2. Estimate the transmittance t. For each pixel we need the visible distance from that point to the camera, which is obtained by estimating the transmittance: the fraction of light that survives absorption and scattering along the propagation path. In a hazy image the transmittance decreases with distance and can be computed as

t = e^(-beta * d)

where t is the transmittance, beta is a constant that scales the haze strength, and d is the distance between the object and the camera.

3. Remove the haze. The final step applies the estimated transmittance to the original image:

J(x) = (I(x) - A) / max(t(x), t0) + A

where J(x) is the dehazed image, I(x) is the color of the original image at that position, A is the global atmospheric light, t(x) is the transmittance estimated at that position, and t0 is a constant that avoids division by zero.
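Putting the three steps together, here is a minimal OpenCV/C++ sketch. Because the text leaves the per-pixel depth d unestimated, the transmission map below uses a dark-channel-style minimum filter (t = 1 - omega * dark(x), a common surrogate that assumes A is close to white) instead of e^(-beta * d); omega, t0, the 15x15 window, and all names are illustrative choices, not values from the text.

#include <opencv2/opencv.hpp>
#include <vector>

int main(int argc, char** argv) {
    cv::Mat I8 = cv::imread(argv[1]);          // hazy input image
    cv::Mat I;
    I8.convertTo(I, CV_32FC3, 1.0 / 255.0);

    // Dark channel: per-pixel minimum over colors, then a 15x15 minimum filter.
    std::vector<cv::Mat> ch;
    cv::split(I, ch);
    cv::Mat dark = cv::min(ch[0], cv::min(ch[1], ch[2]));
    cv::erode(dark, dark,
              cv::getStructuringElement(cv::MORPH_RECT, cv::Size(15, 15)));

    // Step 1: global atmospheric light A, the color of the brightest dark-channel pixel.
    cv::Point maxLoc;
    cv::minMaxLoc(dark, 0, 0, 0, &maxLoc);
    cv::Vec3f A = I.at<cv::Vec3f>(maxLoc);

    // Step 2: transmission estimate t(x) = 1 - omega * dark(x), clamped below by t0.
    const float omega = 0.95f, t0 = 0.1f;
    cv::Mat t = 1.0f - omega * dark;
    t = cv::max(t, t0);                        // max(t(x), t0)

    // Step 3: recover J(x) = (I(x) - A) / max(t(x), t0) + A, channel by channel.
    for (int c = 0; c < 3; ++c)
        ch[c] = (ch[c] - A[c]) / t + A[c];
    cv::Mat J;
    cv::merge(ch, J);
    J.convertTo(J, CV_8UC3, 255.0);
    cv::imwrite("dehazed.png", J);
    return 0;
}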
Camera calibration with OpenCV (and importing the parameters into Matlab)

The mathematics of the calibration follows Zhang's method. Some practical notes on choosing the chessboard target: Zhang's paper recommends a chessboard grid larger than 7x7. OpenCV cannot detect a square chessboard target during calibration; a rectangular target is required. Zhang's paper also recommends that the target fill more than 50% of the frame in each shot, but that advice is for ordinary cameras without much distortion and does not apply to spherical (fisheye) cameras; for those, a smaller target coverage works better (for boards whose squares are not very dense). The reason is that the farther apart the chessboard corners, the more potentially distorted points lie between them, so if the board fills too much of the frame the distortion cannot be expressed within the grid. Choose the board according to actual needs: work that demands precise recognition needs a high-accuracy (and correspondingly more expensive) board, while tasks with modest accuracy requirements (such as grasping) do not need a high-precision target, which saves cost.

The implementation here is in OpenCV, so the process is described through OpenCV code. Note that programs port poorly between OpenCV versions; the code below was written and run under OpenCV 2.4.3. Stereo calibration consists of two stages: monocular calibration (step 2 below) and binocular calibration. Monocular calibration yields the focal lengths along the x and y axes, the position of the principal point, the rotation and translation between the world coordinate system and the image plane, and five distortion coefficients. Binocular calibration yields the rotation and translation between the two cameras' image planes.

Note: plug in the camera before running, otherwise the program may fail to run.

2. Monocular calibration

(1) Acquire the chessboard images:

for (int i = 1; i <= 19; i++) {                 // load the left calibration-board images
    std::stringstream str;                      // declare the I/O string stream
    str << "./left" << i << ".jpg";             // stream the image name
    std::cout << str.str() << std::endl;        // (.str("") clears the content; .clear() resets the flags)
    leftFileList.push_back(str.str());          // push_back appends to the file list
    leftBoardImage = cv::imread(str.str(), 0);  // load for immediate display
    cv::namedWindow("left chessboard image");
    cv::imshow("left chessboard image", leftBoardImage);
    cv::waitKey(10);
}

(2) Define the number of inner corners on the board:

cv::Size boardSize(14, 10);

(3) Locate and extract the corners. This establishes the relation between the ideal imaging plane (3D, with the third coordinate 0, in units of squares) and the image coordinate system (2D, in pixels).

(a) First declare the two coordinate containers:

std::vector<cv::Point2f> imageCorners;   // 2D image points
std::vector<cv::Point3f> objectCorners;  // 3D board points

(b) Initialize the board corners, placing them at (x, y, z) = (i, j, 0), one board square per coordinate unit:

for (int i = 0; i < boardSize.height; i++) {
    for (int j = 0; j < boardSize.width; j++) {
        objectCorners.push_back(cv::Point3f(i, j, 0.0f));
    }
}

(c) Find the 2D corner coordinates directly with the built-in OpenCV function, establishing the pixel-level correspondence between board squares and real coordinates. (When the board is rectangular this function finds the corners, but on a square board it does not; the reason is still unknown.)

cv::findChessboardCorners(image, boardSize, imageCorners);

(d) Pixel accuracy is often not enough, so refine the corners to sub-pixel accuracy:

cv::cornerSubPix(image, imageCorners,    // input/output corner list
    cv::Size(5, 5),                      // half-size of the search window around each corner
    cv::Size(-1, -1),                    // dead zone (none)
    cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS,
                     30,                 // maximum number of iterations
                     0.01));             // minimum accuracy

Note: TermCriteria is the template class that replaced the older CvTermCriteria; it expresses the termination condition of an iterative algorithm. The reference manual describes it only briefly.
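These notes stop at corner extraction; for reference, here is a minimal sketch of the step that follows. Once imageCorners/objectCorners have been collected for every view (steps (a)-(d) above), one cv::calibrateCamera call recovers the monocular parameters listed earlier: the focal lengths and principal point in cameraMatrix, the five distortion coefficients, and the per-view rotations and translations. The wrapper function and variable names are illustrative; the call itself is the standard OpenCV API.

#include <opencv2/opencv.hpp>
#include <vector>
#include <iostream>

// objectPoints/imagePoints hold one vector per successfully processed image,
// filled exactly as in steps (a)-(d) above.
double calibrateFromCorners(
        const std::vector<std::vector<cv::Point3f> >& objectPoints,
        const std::vector<std::vector<cv::Point2f> >& imagePoints,
        cv::Size imageSize) {
    cv::Mat cameraMatrix = cv::Mat::eye(3, 3, CV_64F);   // fx, fy, cx, cy
    cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F);   // k1, k2, p1, p2, k3
    std::vector<cv::Mat> rvecs, tvecs;                   // per-view extrinsics
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);
    std::cout << "RMS reprojection error: " << rms << std::endl;
    std::cout << "camera matrix:\n" << cameraMatrix << std::endl;
    std::cout << "distortion coefficients: " << distCoeffs.t() << std::endl;
    return rms;
}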
Several camera calibration methods

Camera calibration is a key technique in computer vision and machine vision: it determines the camera's intrinsic and extrinsic matrices, which in turn enable accurate image measurement and 3D reconstruction. This document surveys several common calibration methods, including the direct linear transform (DLT), Zhang's method, Tsai's method, and the radial distortion model.

1. Direct linear transform (DLT). The DLT is one of the most basic calibration methods. Several calibration objects of known geometry are placed on the object plane, their image coordinates and true coordinates are measured, and the camera's projection matrix is solved by least squares. DLT is simple and direct, but it is sensitive to noise and prone to error; a sketch of the least-squares step follows.
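To make the least-squares step concrete, here is a minimal sketch of the homogeneous DLT solve under the usual formulation: each 3D-2D correspondence contributes two rows to a 2n x 12 system A p = 0, and the 3x4 projection matrix is the right singular vector of A with the smallest singular value (n >= 6 correspondences, not all coplanar). The helper name is illustrative; normalization of the input coordinates, which DLT needs in practice for numerical stability, is omitted for brevity.

#include <opencv2/opencv.hpp>
#include <vector>

// X: known 3D points; x: their measured image projections.
cv::Mat dltProjectionMatrix(const std::vector<cv::Point3f>& X,
                            const std::vector<cv::Point2f>& x) {
    const int n = static_cast<int>(X.size());
    cv::Mat A = cv::Mat::zeros(2 * n, 12, CV_64F);
    for (int i = 0; i < n; ++i) {
        double Xw[4] = { X[i].x, X[i].y, X[i].z, 1.0 };   // homogeneous 3D point
        for (int k = 0; k < 4; ++k) {
            A.at<double>(2 * i, k) = Xw[k];               // u-row: [X, 0, -u*X]
            A.at<double>(2 * i, 8 + k) = -x[i].x * Xw[k];
            A.at<double>(2 * i + 1, 4 + k) = Xw[k];       // v-row: [0, X, -v*X]
            A.at<double>(2 * i + 1, 8 + k) = -x[i].y * Xw[k];
        }
    }
    cv::SVD svd(A, cv::SVD::FULL_UV);                     // solve A p = 0, |p| = 1
    return svd.vt.row(11).reshape(0, 3).clone();          // 3x4 projection matrix
}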
2. Zhang's method. Zhang's method is a widely used calibration method: a planar calibration board is captured at a series of positions and orientations, and from the image coordinates and physical coordinates of the board in the different poses, the camera's intrinsic and extrinsic matrices are solved by least squares. Zhang's method improves the accuracy and stability of calibration, but it requires the board's position and orientation to vary substantially across the views.

3. Tsai's method. Tsai's method is based on a projection model of the camera: using the camera's rotation and translation matrices together with parameters for curvature and radial distortion, the mapping between image coordinates and physical coordinates is derived and solved mathematically. Tsai's method can correct for distortion, improving the accuracy of image measurement.

4. The Kalibr toolkit. Kalibr is an open-source toolkit for camera calibration and multi-sensor calibration that combines several calibration methods, such as DLT, Tsai's, and Zhang's. It can calibrate not only monocular cameras but also stereo and multi-camera systems, estimating and refining intrinsics, extrinsics, and distortion parameters, and it also supports hand-eye calibration and joint camera-IMU calibration.

5. Di Zhang's self-calibration method. Di Zhang proposed a self-calibration method based on relative boundary points: specific object boundaries are extracted from the image, and by detecting and analyzing the positions of these boundary points, the camera's intrinsics and extrinsics are solved. This method needs no calibration board or other external target; any object boundary the camera can see suffices.

6. Radial distortion model. Radial distortion is a common form of distortion in camera imaging, showing up mainly as straight object edges appearing curved.
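For reference, the usual polynomial form of the radial distortion model (a standard formulation; the text above breaks off before writing it out): a normalized image point (x, y) at radius r^2 = x^2 + y^2 maps to

x_d = x * (1 + k1*r^2 + k2*r^4 + k3*r^6)
y_d = y * (1 + k1*r^2 + k2*r^4 + k3*r^6)

where k1, k2, k3 are the radial distortion coefficients estimated during calibration (three of the five distortion coefficients mentioned in the calibration notes above); the sign of k1 distinguishes barrel from pincushion distortion.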
Fast Motion Deblurring

Abstract
This paper presents a fast deblurring method that produces a deblurred result from a single static image of moderate size in a few seconds. By introducing a novel prediction step and working with image derivatives rather than individual pixel values, we accelerate both the latent (sharp) image estimation and the kernel estimation in the iterative deblurring process. In the prediction step, simple image processing techniques are used to infer strong edges from the current estimate of the latent image; these edges alone are then used for kernel estimation. With this approach, a computationally efficient Gaussian prior becomes sufficient for the deconvolution that estimates the latent image, since small deconvolution artifacts are suppressed in the prediction. For kernel estimation, we express the optimization function in terms of image derivatives, which reduces the number of Fourier transforms needed by the conjugate gradient method, improves the conditioning of the numerical system compared with working on individual pixel values, and so converges faster. Experimental results show that our method runs much faster than previous work while achieving comparable deblurring quality. A GPU (Graphics Processing Unit) implementation provides a further speed-up, making the method fast enough for practical use.

CR Categories: I.4.3 [Image Processing and Computer Vision]: Enhancement: Sharpening and deblurring

Keywords: motion blur, deblurring, image restoration

1 Introduction
Motion blur is a common artifact that degrades images, with an unavoidable loss of information. It is usually caused by the nature of the image sensor, which accumulates incoming light over a period of time to form the image. If the camera's image sensor moves during the exposure, the image becomes motion blurred. If the motion blur is shift-invariant, it can be modeled as the convolution of a latent (sharp) image with a motion blur kernel, where the kernel describes the trace of the sensor. Removing the motion blur from an image then becomes a deconvolution problem.
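In symbols, the shift-invariant model just described (the same standard formulation used by the Fergus et al. paper reproduced later in this document) is

B = K ⊗ L + N

where B is the observed blurred image, L the latent sharp image, K the motion blur kernel tracing the sensor path, ⊗ convolution, and N noise. Non-blind deconvolution solves for L with K known; blind deconvolution must estimate both.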
In non-blind deconvolution, the motion blur kernel is known, and the problem is to recover the latent image from its blurred observation. In blind deconvolution, the kernel is unknown, and recovering the latent image becomes far more challenging. In this paper we address blind deconvolution of a single static image: both the blur kernel and the latent image are estimated from the blurred input image. Blind deconvolution from a single image is an ill-posed problem, because the number of unknowns exceeds the number of observations. Early methods imposed constraints on the motion blur kernel, using parameterized forms [Chen et al. 1996; Chan and Wong 1998; Yitzhaky et al. 1998; Rav-Acha and Peleg 2005].
OIS stabilization algorithm

The OIS (optical image stabilization) algorithm is an image stabilization technique used mainly to reduce image shake caused by hand tremor or motion. A gyroscope sensor detects the shaking and movement of the device, and the algorithm counteracts that motion by finely adjusting the position of the camera module. As the device makes small movements during shooting, the optical elements of the lens are shifted to cancel the motion of the hand or the device, keeping the captured image stable and reducing the effects of blur and judder.

The strength of OIS is its real-time response: the camera can be adjusted within a very short time, effectively reducing image shake. Compared with digital image stabilization (DIS), OIS better preserves image sharpness and detail.

Note that OIS stabilization only suits static or gently moving shots; fast or violent motion may require other techniques or algorithms for more advanced stabilization.
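As a toy illustration of the compensation step: to first order, a small camera rotation of omega * dt radians shifts the image by about f * omega * dt pixels (f in pixel units), so the controller commands an equal-and-opposite lens shift. Real OIS loops run inside camera-module firmware at kilohertz rates with proprietary interfaces; the structure, sampling rate, and numbers below are purely illustrative assumptions.

#include <cstdio>

struct GyroSample { double wx, wy; };  // angular velocity about x/y, rad/s

int main() {
    const double f_px = 3000.0;        // focal length in pixels (assumed)
    const double dt = 0.001;           // 1 kHz control loop (assumed)
    GyroSample g = { 0.02, -0.01 };    // one synthetic gyro reading
    // Image shift predicted from the rotation over one control period...
    double dx = f_px * g.wy * dt;      // yaw moves the image horizontally
    double dy = f_px * g.wx * dt;      // pitch moves it vertically
    // ...and the equal-and-opposite lens command that cancels it.
    std::printf("lens shift command: (%.4f, %.4f) px\n", -dx, -dy);
    return 0;
}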
Image restoration

1. Background
Image restoration is an important topic in image processing. Also called image recovery, it is a technique of image processing whose main purpose is to improve the quality of a given image. Given an image that has been degraded or contaminated by noise, the basic restoration process uses some prior knowledge of the degradation phenomenon to rebuild or recover the original image. Possible degradations include diffraction in the optical system, nonlinear distortion of the sensor, aberrations of the optical system, the nonlinearity of photographic film, the disturbance of atmospheric turbulence, blur caused by image motion, geometric distortion, and so on. Noise can come from the sensor of an electronic imaging system, from signal transmission, or from film grain. Restoring any of these degraded images reduces to one process: model the degradation, then process the image with the inverse of that model to recover the original image. This document covers the causes of image degradation, histogram equalization, several common filtering-based restoration techniques, and how to implement image restoration in MATLAB.
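The "model the degradation, then invert it" view has a standard mathematical form (a textbook formulation, added here for reference): the observed image g is

g(x, y) = h(x, y) * f(x, y) + n(x, y)

where f is the original image, h the degradation (point-spread) function, * denotes spatial convolution, and n additive noise. Restoration seeks an estimate of f from g, given knowledge or an estimate of h and the noise statistics.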
2. Tools

2.1 Tool
MATLAB R2016a.

2.2 About the tool
MATLAB's syntax closely resembles that of C-family languages such as C++, while being simpler and closer to the way scientists and engineers write mathematical expressions, which makes it easier for people outside computing to use. The language is also portable and highly extensible. MATLAB offers convenient data visualization: vectors and matrices can be displayed graphically, and plots can be annotated and printed. High-level graphics cover 2D and 3D visualization, image processing, animation, and expression plotting. Recent versions have greatly improved and extended the graphics system: beyond the features common to general data-visualization software (such as plotting and handling 2D curves and 3D surfaces), MATLAB also performs well at capabilities many other packages lack, such as lighting and color processing of graphics and the presentation of four-dimensional data. For special visualization needs, such as interactive graphics, MATLAB likewise provides corresponding functions, serving users at different levels of demand.

3. Image restoration methods

3.1 Definition
Image restoration, also called image recovery, is a large family of techniques in image processing. It refers to removing or reducing the degradations in image quality that occur while a digital image is being acquired. These degradations include blur caused by the optical system and by motion, as well as noise originating in the electronics and in photometric factors.
Review questions

1. (True/False) Backpropagation is a common method for training artificial neural networks. (A)
   A. TRUE  B. FALSE
2. (Multiple choice) Which processing steps does image digitization include? (BD)
   A. Binarization  B. Quantization  C. Gray-level transformation  D. Sampling
3. (Single choice) How many levels of brightness can a 4-bit image distinguish? (B)
   A. 8  B. 16  C. 128  D. 256
4. (Single choice) Modifying the H component of the HSV color space changes which property of an image? (A)
   A. Hue  B. Brightness  C. Saturation  D. Contrast
5. (True/False) Background subtraction subtracts a background reference model from the current frame of an image sequence, making it possible to detect moving objects in the image. (A)
   A. TRUE  B. FALSE
6. (True/False) Image filtering is also known as a template (mask) operation; common template operations include template convolution and template ranking. (A)
   A. TRUE  B. FALSE
7. (Single choice) What is the workflow of ModelArts automatic learning? (C)
   A. Deployment -> model training -> data labeling
   B. Model training -> data labeling -> deployment
   C. Data labeling -> model training -> deployment
   D. Data labeling -> deployment -> model training
8. (Multiple choice) An image object detection algorithm must produce which of the following? (ABC)
   A. The object's location  B. The object's class  C. A confidence score  D. The object's edges
9. (True/False) Image feature extraction embodies the idea of dimensionality reduction and can effectively reduce the volume of image data. (A)
   A. TRUE  B. FALSE
10. (Single choice) Convolving a three-channel color image with two 3x3 convolution kernels produces a feature map with how many channels? (B)
    A. 1  B. 2  C. 3  D. 4
11. (Multiple choice) Speech synthesis methods include which of the following? (ABCD)
    A. Formant synthesizers  B. Cascade formant synthesizers  C. Parallel formant synthesizers  D. The PSOLA method
12. (True/False) A Gaussian model quantifies things exactly with Gaussian density functions, decomposing a thing into several models based on the Bernoulli distribution. (B)
    A. TRUE  B. FALSE
13. (Multiple choice) Which algorithm engines do ModelArts training jobs support? (ABD)
    A. PyTorch  B. MXNet  C. Spark MLlib  D. TensorFlow
14. (Multiple choice) Named entity recognition identifies entities with specific meaning in text, mainly including which of the following? (ABCDE)
    A. Person names  B. Place names  C. Organization names  D. Times  E. Dates
15. (True/False) TF-IDF is a statistics-based method commonly used to evaluate how important a word is to a document within a document collection.
    A. TRUE  B. FALSE
Removing Camera Shake from a Single Photograph

Rob Fergus (1), Barun Singh (1), Aaron Hertzmann (2), Sam T. Roweis (2), William T. Freeman (1)
(1) MIT CSAIL  (2) University of Toronto

Figure 1: Left: An image spoiled by camera shake. Middle: result from Photoshop "unsharp mask". Right: result from our algorithm.

Abstract
Camera shake during exposure leads to objectionable image blur and ruins many photographs. Conventional blind deconvolution methods typically assume frequency-domain constraints on images, or overly simplified parametric forms for the motion path during camera shake. Real camera motions can follow convoluted paths, and a spatial domain prior can better maintain visually salient image characteristics. We introduce a method to remove the effects of camera shake from seriously blurred images. The method assumes a uniform camera blur over the image and negligible in-plane camera rotation. In order to estimate the blur from the camera shake, the user must specify an image region without saturation effects. We show results for a variety of digital photographs taken from personal photo collections.

CR Categories: I.4.3 [Image Processing and Computer Vision]: Enhancement, G.3 [Artificial Intelligence]: Learning
Keywords: camera shake, blind image deconvolution, variational learning, natural image statistics

1 Introduction
Camera shake, in which an unsteady camera causes blurry photographs, is a chronic problem for photographers. The explosion of consumer digital photography has made camera shake very prominent, particularly with the popularity of small, high-resolution cameras whose light weight can make them difficult to hold sufficiently steady. Many photographs capture ephemeral moments that cannot be recaptured under controlled conditions or repeated with different camera settings; if camera shake occurs in the image for any reason, then that moment is "lost".

Shake can be mitigated by using faster exposures, but that can lead to other problems such as sensor noise or a smaller-than-desired depth-of-field. A tripod, or other specialized hardware, can eliminate camera shake, but these are bulky and most consumer photographs are taken with a conventional, handheld camera. Users may avoid the use of flash due to the unnatural tonescales that result. In our experience, many of the otherwise favorite photographs of amateur photographers are spoiled by camera shake. A method to remove that motion blur from a captured photograph would be an important asset for digital photography.

Camera shake can be modeled as a blur kernel, describing the camera motion during exposure, convolved with the image intensities. Removing the unknown camera shake is thus a form of blind image deconvolution, which is a problem with a long history in the image and signal processing literature. In the most basic formulation, the problem is underconstrained: there are simply more unknowns (the original image and the blur kernel) than measurements (the observed image). Hence, all practical solutions must make strong prior assumptions about the blur kernel, about the image to be recovered, or both. Traditional signal processing formulations of the problem usually make only very general assumptions in the form of frequency-domain power laws; the resulting algorithms can typically handle only very small blurs and not the complicated blur kernels often associated with camera shake. Furthermore, algorithms exploiting image priors specified in the frequency domain may not preserve important spatial-domain structures such as edges.

This paper introduces a new technique for removing the effects of unknown camera shake from an image. This advance results from two key improvements over previous work. First, we exploit recent research in natural image statistics, which shows that photographs of natural scenes typically obey very specific distributions of image gradients. Second, we build on work by Miskin and MacKay [2000], adopting a Bayesian approach that takes into account uncertainties in the unknowns, allowing us to find the blur kernel implied by a distribution of probable images. Given this kernel, the image is then reconstructed using a standard deconvolution algorithm, although we believe there is room for substantial improvement in this reconstruction phase.

We assume that all image blur can be described as a single convolution; i.e., there is no significant parallax, any image-plane rotation of the camera is small, and no parts of the scene are moving relative to one another during the exposure. Our approach currently requires a small amount of user input.

Our reconstructions do contain artifacts, particularly when the above assumptions are violated; however, they may be acceptable to consumers in some cases, and a professional designer could touch up the results. In contrast, the original images are typically unusable beyond touching-up; in effect our method can help "rescue" shots that would have otherwise been completely lost.

2 Related Work
The task of deblurring an image is image deconvolution; if the blur kernel is not known, then the problem is said to be "blind". For a survey on the extensive literature in this area, see [Kundur and Hatzinakos 1996]. Existing blind deconvolution methods typically assume that the blur kernel has a simple parametric form, such as a Gaussian or low-frequency Fourier components. However, as illustrated by our examples, the blur kernels induced during camera shake do not have simple forms, and often contain very sharp edges. Similar low-frequency assumptions are typically made for the input image, e.g., applying a quadratic regularization. Such assumptions can prevent high frequencies (such as edges) from appearing in the reconstruction. Caron et al. [2002] assume a power-law distribution on the image frequencies; power-laws are a simple form of natural image statistics that do not preserve local structure. Some methods [Jalobeanu et al. 2002; Neelamani et al. 2004] combine power-laws with wavelet domain constraints but do not work for the complex blur kernels in our examples.

Deconvolution methods have been developed for astronomical images [Gull 1998; Richardson 1972; Tsumuraya et al. 1994; Zarowin 1994], which have statistics quite different from the natural scenes we address in this paper. Performing blind deconvolution in this domain is usually straightforward, as the blurry image of an isolated star reveals the point-spread-function.

Another approach is to assume that there are multiple images available of the same scene [Bascle et al. 1996; Rav-Acha and Peleg 2005]. Hardware approaches include: optically stabilized lenses [Canon Inc. 2006], specially designed CMOS sensors [Liu and Gamal 2001], and hybrid imaging systems [Ben-Ezra and Nayar 2004]. Since we would like our method to work with existing cameras and imagery and to work for as many situations as possible, we do not assume that any such hardware or extra imagery is available.

Recent work in computer vision has shown the usefulness of heavy-tailed natural image priors in a variety of applications, including denoising [Roth and Black 2005], superresolution [Tappen et al. 2003], intrinsic images [Weiss 2001], video matting [Apostoloff and Fitzgibbon 2005], inpainting [Levin et al. 2003], and separating reflections [Levin and Weiss 2004]. Each of these methods is effectively "non-blind", in that the image formation process (e.g., the blur kernel in superresolution) is assumed to be known in advance.

Miskin and MacKay [2000] perform blind deconvolution on line art images using a prior on raw pixel intensities. Results are shown for small amounts of synthesized image blur. We apply a similar variational scheme for natural images using image gradients in place of intensities and augment the algorithm to achieve results for photographic images with significant blur.

3 Image model
Our algorithm takes as input a blurred input image B, which is assumed to have been generated by convolution of a blur kernel K with a latent image L plus noise:

B = K ⊗ L + N    (1)

where ⊗ denotes discrete image convolution (with non-periodic boundary conditions), and N denotes sensor noise at each pixel. We assume that the pixel values of the image are linearly related to the sensor irradiance. The latent image L represents the image we would have captured if the camera had remained perfectly still; our goal is to recover L from B without specific knowledge of K.

Figure 2: Left: A natural scene. Right: The distribution of gradient magnitudes within the scene is shown in red. The y-axis has a logarithmic scale to show the heavy tails of the distribution. The mixture of Gaussians approximation used in our experiments is shown in green.

In order to estimate the latent image from such limited measurements, it is essential to have some notion of which images are a-priori more likely. Fortunately, recent research in natural image statistics has shown that, although images of real-world scenes vary greatly in their absolute color distributions, they obey heavy-tailed distributions in their gradients [Field 1994]: the distribution of gradients has most of its mass on small values but gives significantly more probability to large values than a Gaussian distribution. This corresponds to the intuition that images often contain large sections of constant intensity or gentle intensity gradient interrupted by occasional large changes at edges or occlusion boundaries. For example, Figure 2 shows a natural image and a histogram of its gradient magnitudes. The distribution shows that the image contains primarily small or zero gradients, but a few gradients have large magnitudes. Recent image processing methods based on heavy-tailed distributions give state-of-the-art results in image denoising [Roth and Black 2005; Simoncelli 2005] and superresolution [Tappen et al. 2003]. In contrast, methods based on Gaussian prior distributions (including methods that use quadratic regularizers) produce overly smooth images.

We represent the distribution over gradient magnitudes with a zero-mean mixture-of-Gaussians model, as illustrated in Figure 2. This representation was chosen because it can provide a good approximation to the empirical distribution, while allowing a tractable estimation procedure for our algorithm.

4 Algorithm
There are two main steps to our approach. First, the blur kernel is estimated from the input image. The estimation process is performed in a coarse-to-fine fashion in order to avoid local minima. Second, using the estimated kernel, we apply a standard deconvolution algorithm to estimate the latent (unblurred) image.

The user supplies four inputs to the algorithm: the blurred image B, a rectangular patch within the blurred image, an upper bound on the size of the blur kernel (in pixels), and an initial guess as to orientation of the blur kernel (horizontal or vertical). Details of how to specify these parameters are given in Section 4.1.2.

Additionally, we require input image B to have been converted to a linear color space before processing. In our experiments, we applied inverse gamma-correction (footnote 1) with γ = 2.2. In order to estimate the expected blur kernel, we combine all the color channels of the original image within the user specified patch to produce a grayscale blurred patch P.

Footnote 1: Pixel value = (CCD sensor value)^(1/γ).

4.1 Estimating the blur kernel
Given the grayscale blurred patch P, we estimate K and the latent patch image L_p by finding the values with highest probability, guided by a prior on the statistics of L. Since these statistics are based on the image gradients rather than the intensities, we perform the optimization in the gradient domain, using ∇L_p and ∇P, the gradients of L_p and P. Because convolution is a linear operation, the patch gradients ∇P should be equal to the convolution of the latent gradients and the kernel: ∇P = ∇L_p ⊗ K, plus noise. We assume that this noise is Gaussian with variance σ².

As discussed in the previous section, the prior p(∇L_p) on the latent image gradients is a mixture of C zero-mean Gaussians (with variance v_c and weight π_c for the c-th Gaussian). We use a sparsity prior p(K) for the kernel that encourages zero values in the kernel, and requires all entries to be positive. Specifically, the prior on kernel values is a mixture of D exponential distributions (with scale factors λ_d and weights π_d for the d-th component).

Given the measured image gradients ∇P, we can write the posterior distribution over the unknowns with Bayes' Rule:

p(K, ∇L_p | ∇P) ∝ p(∇P | K, ∇L_p) p(∇L_p) p(K)    (2)
= ∏_i N(∇P(i) | (K ⊗ ∇L_p)(i), σ²) ∏_i Σ_{c=1..C} π_c N(∇L_p(i) | 0, v_c) ∏_j Σ_{d=1..D} π_d E(K_j | λ_d)    (3)

where i indexes over image pixels and j indexes over blur kernel elements. N and E denote Gaussian and Exponential distributions respectively. For tractability, we assume that the gradients in ∇P are independent of each other, as are the elements in ∇L_p and K.

A straightforward approach to deconvolution is to solve for the maximum a-posteriori (MAP) solution, which finds the kernel K and latent image gradients ∇L that maximize p(K, ∇L_p | ∇P). This is equivalent to solving a regularized-least squares problem that attempts to fit the data while also minimizing small gradients. We tried this (using conjugate gradient search) but found that the algorithm failed. One interpretation is that the MAP objective function attempts to minimize all gradients (even large ones), whereas we expect natural images to have some large gradients. Consequently, the algorithm yields a two-tone image, since virtually all the gradients are zero. If we reduce the noise variance (thus increasing the weight on the data-fitting term), then the algorithm yields a delta-function for K, which exactly fits the blurred image, but without any deblurring. Additionally, we find the MAP objective function to be very susceptible to poor local minima.

Instead, our approach is to approximate the full posterior distribution p(K, ∇L_p | ∇P), and then compute the kernel K with maximum marginal probability. This method selects a kernel that is most likely with respect to the distribution of possible latent images, thus avoiding the overfitting that can occur when selecting a single "best" estimate of the image.

In order to compute this approximation efficiently, we adopt a variational Bayesian approach [Jordan et al. 1999] which computes a distribution q(K, ∇L_p) that approximates the posterior p(K, ∇L_p | ∇P). In particular, our approach is based on Miskin and MacKay's algorithm [2000] for blind deconvolution of cartoon images. A factored representation is used: q(K, ∇L_p) = q(K) q(∇L_p). For the latent image gradients, this approximation is a Gaussian density, while for the non-negative blur kernel elements, it is a rectified Gaussian. The distributions for each latent gradient and blur kernel element are represented by their mean and variance, stored in an array.

Following Miskin and MacKay [2000], we also treat the noise variance σ² as an unknown during the estimation process, thus freeing the user from tuning this parameter. This allows the noise variance to vary during estimation: the data-fitting constraint is loose early in the process, becoming tighter as better, low-noise solutions are found. We place a prior on σ², in the form of a Gamma distribution on the inverse variance, having hyper-parameters a, b: p(σ² | a, b) = Γ(σ⁻² | a, b). The variational posterior of σ² is q(σ⁻²), another Gamma distribution.

The variational algorithm minimizes a cost function representing the distance between the approximating distribution and the true posterior, measured as KL(q(K, ∇L_p, σ⁻²) || p(K, ∇L_p | ∇P)). The independence assumptions in the variational posterior allow the cost function C_KL to be factored:

C_KL = <log q(∇L_p)/p(∇L_p)>_{q(∇L_p)} + <log q(K)/p(K)>_{q(K)} + <log q(σ⁻²)/p(σ⁻²)>_{q(σ⁻²)}    (4)

where <·>_{q(θ)} denotes the expectation with respect to q(θ) (footnote 2). For brevity, the dependence on ∇P is omitted from this equation.

Footnote 2: For example, <σ⁻²>_{q(σ⁻²)} = ∫ σ⁻² Γ(σ⁻² | a, b) dσ⁻² = b/a.

The cost function is then minimized as follows. The means of the distributions q(K) and q(∇L_p) are set to the initial values of K and ∇L_p and the variance of the distributions set high, reflecting the lack of certainty in the initial estimate. The parameters of the distributions are then updated alternately by coordinate descent; one is updated by marginalizing out over the other whilst incorporating the model priors. Updates are performed by computing closed-form optimal parameter updates, and performing line-search in the direction of these updated values (see Appendix A for details). The updates are repeated until the change in C_KL becomes negligible. The mean of the marginal distribution <K>_{q(K)} is then taken as the final value for K. Our implementation adapts the source code provided online by Miskin and MacKay [2000a].

In the formulation outlined above, we have neglected the possibility of saturated pixels in the image, an awkward non-linearity which violates our model. Since dealing with them explicitly is complicated, we prefer to simply mask out saturated regions of the image during the inference procedure, so that no use is made of them.

For the variational framework, C = D = 4 components were used in the priors on K and ∇L_p. The parameters of the prior on the latent image gradients π_c, v_c were estimated from a single street scene image, shown in Figure 2, using EM. Since the image statistics vary across scale, each scale level had its own set of prior parameters. This prior was used for all experiments. The parameters for the prior on the blur kernel elements were estimated from a small set of low-noise kernels inferred from real images.

4.1.1 Multi-scale approach
The algorithm described in the previous section is subject to local minima, particularly for large blur kernels. Hence, we perform estimation by varying image resolution in a coarse-to-fine manner. At the coarsest level, K is a 3×3 kernel. To ensure a correct start to the algorithm, we manually specify the initial 3×3 blur kernel to one of two simple patterns (see Section 4.1.2). The initial estimate for the latent gradient image is then produced by running the inference scheme, while holding K fixed. We then work back up the pyramid running the inference at each level; the converged values of K and ∇L_p being upsampled to act as an initialization for inference at the next scale up. At the finest scale, the inference converges to the full resolution kernel K.

Figure 3: The multi-scale inference scheme operating on the fountain image in Figure 1. 1st & 3rd rows: The estimated blur kernel at each scale level. 2nd & 4th rows: Estimated image patch at each scale. The intensity image was reconstructed from the gradients used in the inference using Poisson image reconstruction. The Poisson reconstructions are shown for reference only; the final reconstruction is found using the Richardson-Lucy algorithm with the final estimated blur kernel.

4.1.2 User supervision
Although it would seem more natural to run the multi-scale inference scheme using the full gradient image ∇L, in practice we found the algorithm performed better if a smaller patch, rich in edge structure, was manually selected. The manual selection allows the user to avoid large areas of saturation or uniformity, which can be disruptive or uninformative to the algorithm. Examples of user-selected patches are shown in Section 5. Additionally, the algorithm runs much faster on a small patch than on the entire image.

An additional parameter is that of the maximum size of the blur kernel. The size of the blur encountered in images varies widely, from a few pixels up to hundreds. Small blurs are hard to resolve if the algorithm is initialized with a very large kernel. Conversely, large blurs will be cropped if too small a kernel is used. Hence, for operation under all conditions, the approximate size of the kernel is a required input from the user. By examining any blur artifact in the image, the size of the kernel is easily deduced.

Finally, we also require the user to select between one of two initial estimates of the blur kernel: a horizontal line or a vertical line. Although the algorithm can often be initialized in either state and still produce the correct high resolution kernel, this ensures the algorithm starts searching in the correct direction. The appropriate initialization is easily determined by looking at any blur kernel artifact in the image.

4.2 Image Reconstruction
The multi-scale inference procedure outputs an estimate of the blur kernel K, marginalized over all possible image reconstructions. To recover the deblurred image given this estimate of the kernel, we experimented with a variety of non-blind deconvolution methods, including those of Geman [1992], Neelamani [2004] and van Cittert [Zarowin 1994]. While many of these methods perform well in synthetic test examples, our real images exhibit a range of non-linearities not present in synthetic cases, such as non-Gaussian noise, saturated pixels, residual non-linearities in tonescale and estimation errors in the kernel. Disappointingly, when run on our images, most methods produced unacceptable levels of artifacts.

We also used our variational inference scheme on the gradients of the whole image ∇B, while holding K fixed. The intensity image was then formed via Poisson image reconstruction [Weiss 2001]. Aside from being slow, the inability to model the non-linearities mentioned above resulted in reconstructions no better than other approaches.

As L typically is large, speed considerations make simple methods attractive. Consequently, we reconstruct the latent color image L with the Richardson-Lucy (RL) algorithm [Richardson 1972; Lucy 1974]. While RL performed comparably to the other methods evaluated, it has the advantage of taking only a few minutes, even on large images (other, more complex methods took hours or days). RL is a non-blind deconvolution algorithm that iteratively maximizes the likelihood function of a Poisson statistics image noise model. One benefit of this over more direct methods is that it gives only non-negative output values. We use Matlab's implementation of the algorithm to estimate L, given K, treating each color channel independently. We used 10 RL iterations, although for large blur kernels, more may be needed. Before running RL, we clean up K by applying a dynamic threshold, based on the maximum intensity value within the kernel, which sets all elements below a certain value to zero, so reducing the kernel noise. The output of RL was then gamma-corrected using γ = 2.2 and its intensity histogram matched to that of B (using Matlab's histeq function), resulting in L. See pseudo-code in Appendix A for details.
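The paper uses Matlab's built-in Richardson-Lucy routine; for reference, the following OpenCV/C++ sketch of the RL update is an illustrative stand-in for one color channel, not the authors' code. The initialization, the small guard constant, and the function name are assumptions of the sketch; the kernel thresholding and histogram matching described above are omitted.

#include <opencv2/opencv.hpp>

// One-channel RL deconvolution: L <- L .* (K corr (B ./ (K conv L))).
// cv::filter2D computes correlation, so convolution with K uses the flipped kernel.
cv::Mat richardsonLucy(const cv::Mat& B, const cv::Mat& K, int iters) {
    CV_Assert(B.type() == CV_32F && K.type() == CV_32F);
    cv::Mat Kflip;
    cv::flip(K, Kflip, -1);            // mirror the kernel about both axes
    cv::Mat L = B.clone();             // start the estimate at the blurry image
    for (int i = 0; i < iters; ++i) {
        cv::Mat pred, ratio, corr;
        cv::filter2D(L, pred, -1, Kflip, cv::Point(-1, -1), 0,
                     cv::BORDER_REPLICATE);   // forward model: K convolved with L
        pred += 1e-6f;                 // guard against division by zero
        cv::divide(B, pred, ratio);    // observed / predicted
        cv::filter2D(ratio, corr, -1, K, cv::Point(-1, -1), 0,
                     cv::BORDER_REPLICATE);   // adjoint step: correlate ratio with K
        L = L.mul(corr);               // multiplicative update keeps L non-negative
    }
    return L;
}

Running 10 iterations per channel, as the paper does, reproduces the basic behaviour; the dynamic kernel threshold and the gamma/histogram post-processing described above would wrap around this call.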
5 Experiments
We performed an experiment to check that blurry images are mainly due to camera translation as opposed to other motions, such as in-plane rotation. To this end, we asked 8 people to photograph a whiteboard (footnote 3) which had small black dots placed in each corner whilst using a shutter speed of 1 second. Figure 4 shows dots extracted from a random sampling of images taken by different people. The dots in each corner reveal the blur kernel local to that portion of the image. The blur patterns are very similar, showing that our assumptions of spatially invariant blur with little in-plane rotation are valid.

Footnote 3: Camera-to-whiteboard distance was ≈ 5 m. Lens focal length was 50 mm mounted on a 0.6x DSLR sensor.

Figure 4: Left: The whiteboard test scene with dots in each corner. Right: Dots from the corners of images taken by different people. Within each image, the dot trajectories are very similar, suggesting that image blur is well modeled as a spatially invariant convolution.

We apply our algorithm to a number of real images with varying degrees of blur and saturation. All the photos came from personal photo collections, with the exception of the fountain and cafe images which were taken with a high-end DSLR using long exposures (> 1/2 second). For each we show the blurry image, followed by the output of our algorithm along with the estimated kernel.

The running time of the algorithm is dependent on the size of the patch selected by the user. With the minimum practical size of 128×128 it currently takes 10 minutes in our Matlab implementation. For a patch of N pixels, the run-time is O(N log N) owing to our use of FFTs to perform the convolution operations. Hence larger patches will still run in a reasonable time. Compiled and optimized versions of our algorithm could be expected to run considerably faster.

Small blurs. Figures 5 and 6 show two real images degraded by small blurs that are significantly sharpened by our algorithm. The gray rectangles show the patch used to infer the blur kernel, chosen to have many image details but few saturated pixels. The inferred kernels are shown in the corner of the deblurred images.

Figure 5: Top: A scene with a small blur. The patch selected by the user is indicated by the gray rectangle. Bottom: Output of our algorithm and the inferred blur kernel. Note the crisp text.

Figure 6: Top: A scene with complex motions. While the motion of the camera is small, the child is both translating and, in the case of the arm, rotating. Bottom: Output of our algorithm. The face and shirt are sharp but the arm remains blurred, its motion not modeled by our algorithm.

Large blurs. Unlike existing blind deconvolution methods, our algorithm can handle large, complex blurs. Figures 7 and 9 show our algorithm successfully inferring large blur kernels. Figure 1 shows an image with a complex tri-lobed blur, 30 pixels in size (shown in Figure 10), being deblurred.

As demonstrated in Figure 8, the true blur kernel is occasionally revealed in the image by the trajectory of a point light source transformed by the blur. This gives us an opportunity to compare the inferred blur kernel with the true one. Figure 10 shows four such image structures, along with the inferred kernels from the respective images.

We also compared our algorithm against existing blind deconvolution algorithms, running Matlab's deconvblind routine, which provides implementations of the methods of Biggs and Andrews [1997] and Jansson [1997]. Based on the iterative Richardson-Lucy scheme, these methods also estimate the blur kernel, alternating between holding the blur constant and updating the image and vice-versa. The results of this algorithm, applied to the fountain and cafe scenes, are shown in Figure 11 and are poor compared to the output of our algorithm, shown in Figures 1 and 13.

Images with significant saturation. Figures 12 and 13 contain large areas where the true intensities are not observed, owing to the dynamic range limitations of the camera. The user-selected patch used for kernel analysis must avoid the large saturated regions. While the deblurred image does have some artifacts near saturated regions, the unsaturated regions can still be extracted.

Figure 7: Top: A scene with a large blur. Bottom: Output of our algorithm. See Figure 8 for a closeup view.

Figure 8: Top row: Closeup of the man's eye in Figure 7. The original image (on left) shows a specularity distorted by the camera motion. In the deblurred image (on right) the specularity is condensed to a point. The color noise artifacts due to low light exposure can be removed by median filtering the chrominance channels. Bottom row: Closeup of child from another image of the family (different from Figure 7). In the deblurred image, the text on his jersey is now legible.

Figure 9: Top: A blurry photograph of three brothers. Bottom: Output of our algorithm. The fine detail of the wallpaper is now visible.

6 Discussion
We have introduced a method for removing camera shake effects from photographs. This problem appears highly underconstrained at first. However, we have shown that by applying natural image priors and advanced statistical techniques, plausible results can nonetheless be obtained. Such an approach may prove useful in other computational photography problems.

Most of our effort has focused on kernel estimation, and, visually, the kernels we estimate seem to match the image camera motion. The results of our method often contain artifacts; most prominently, ringing artifacts occur near saturated regions and regions of significant object motion. We suspect that these artifacts can be blamed primarily on the non-blind deconvolution step. We believe that there is significant room for improvement by applying modern statistical methods to the non-blind deconvolution problem.

There are a number of common photographic effects that we do not explicitly model, including saturation, object motion, and compression artifacts. Incorporating these factors into our model should improve robustness. Currently we assume images to have a linear tonescale, once the gamma correction has been removed. However, cameras typically have a slight sigmoidal shape to their tone response curve, so as to expand their dynamic range. Ideally, this non-linearity would be removed, perhaps by estimating it during inference, or by measuring the curve from a series of bracketed exposures.