Multilevel image reconstruction with natural pixels
- Format: PDF
- Size: 907.16 KB
- Pages: 27
Two-tier image annotation model based on a multi-label classifier and fuzzy-knowledge representation scheme

Marina Ivasic-Kos (a,*), Miran Pobar (a), Slobodan Ribaric (b)
a Department of Informatics, University of Rijeka, Rijeka, Croatia
b Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia

Article history: Received 31 October 2014; received in revised form 31 August 2015; accepted 22 October 2015; available online 6 November 2015.

Keywords: Image annotation; Knowledge representation; Inference algorithms; Fuzzy Petri Net; Multi-label image classification

Abstract: Automatic image annotation involves automatically assigning useful keywords to an unlabelled image. The major goal is to bridge the so-called semantic gap between the available image features and the keywords that people might use to annotate images. Although different people will most likely use different words to annotate the same image, most people can use object or scene labels when searching for images. We propose a two-tier annotation model where the first tier corresponds to object-level and the second tier to scene-level annotation. In the first tier, images are annotated with labels of objects present in them, using multi-label classification methods on low-level features extracted from images. Scene-level annotation is performed in the second tier, using the originally developed inference-based algorithms for annotation refinement and for scene recognition. These algorithms use a fuzzy knowledge representation scheme based on Fuzzy Petri Net, KRFPNs, that is defined to enable reasoning with concepts useful for image annotation. To define the elements of the KRFPNs scheme, novel data-driven algorithms for acquisition of fuzzy knowledge are proposed. The proposed image annotation model is evaluated separately on the first and on the second tier using a dataset of outdoor images. The results outperform the published results obtained on the same image collection, both on object-level and on scene-level annotation. Different subsets of features composed of dominant colours, image moments, and GIST descriptors, as well as different classification methods (RAKEL, ML-kNN and Naïve Bayes), were tested in the first tier. The results of scene-level annotation in the second tier are also compared with a common classification method (Naïve Bayes) and have shown superior performance. The proposed model enables the expanding of image annotation with new concepts regardless of their level of abstraction. © 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Image retrieval, search and organisation have become a problem on account of the huge number of images produced daily. In order to simplify these tasks, different approaches for image retrieval have been proposed that can be roughly divided into those that compare visual content (content-based image retrieval) and those that use text descriptions of images (text-based image retrieval) [27,7]. Image retrieval based on text has appeared to be easier, more natural and more suitable for people in most everyday cases. This is because it is much simpler to write a keyword-based query than to provide image examples, and it is likely that the user does not have an example image of the query. Besides, images corresponding to the same keywords can be very diverse. For example, if a person has an image of a town but wants a different view, content-based retrieval would not be the best choice, because most of the results would be too similar to the image he already has. On the other hand, very diverse images can be retrieved with a keyword
query, e.g. for the keyword Dubrovnik, different views of the town can be retrieved (Fig. 1). To be able to retrieve images using text, they must be labelled or described in the surrounding text. However, most images are not. Since manually providing image annotation is a tedious and expensive task, especially when dealing with a large number of images, automatic image annotation has appeared as a solution.

Automatic annotation methods deal with visual features that can be extracted from the raw image data, such as colour, texture and structure, and can automatically assign metadata in the form of keywords from a controlled vocabulary to an unlabelled image. The major goal is to bridge the so-called semantic gap [12] between the available features and the keywords that could be useful to people. In the paper [15] an image representation model is given to reflect the semantic levels of words used in image annotation. This problem is challenging because different people will most likely annotate the same image with different words that reflect their knowledge of the context of the image, their experience and cultural background. However, most people when searching for images use two types of labels: object and scene labels. Object labels correspond to objects that can be recognised in an image, like sky, trees, tracks and train for the image in Fig. 2a. Scene labels represent the context of the whole image, like SceneTrain or the more general Transportation for Fig. 2a. Scene labels can be either directly obtained as a result of the global classification of image features [23] or inferred from object labels, as proposed in our approach. Since object and scene labels are most commonly used, in this paper we focus on automatic image annotation at scene and object levels.

We propose a two-tier annotation model for automatic image annotation, where the first tier corresponds to object annotation and the second tier to scene-level annotation. An overview of the proposed model is given after the sections with the related work, in Section 3. The first assumption is that there can be many objects in any image, but an image can be classified into one scene. The second assumption is that there are typical objects of which scenes are composed. Since many object labels can be assigned to an image, the object-level annotation was treated as a multi-label problem and appropriate multi-label classification methods RAKEL and ML-kNN were used. The details of object-level annotation are given in Section 4. The scene-level annotation relies on originally developed inference-based algorithms defined for a fuzzy knowledge representation scheme based on Fuzzy Petri Net, KRFPNs. The KRFPNs scheme and data-driven algorithms for knowledge acquisition about domain images are defined in Section 5. The inference approach for annotation refinement at the object level is given in Section 6. The conclusions about scenes are drawn using inference-based algorithms on facts about the image domain and object labels obtained in the first tier, as detailed in Section 7.

The major contributions of this paper can be summarised as follows:
1. The adaptive two-tier annotation model in which each tier can be independently used and modified.
2. The first tier uses multi-label classification to suit image annotation at object level.
3. The second tier uses inference-based algorithms to handle the recognition of scenes and higher-level concepts that do not have specific low-level features.
4. The definition of the knowledge representation scheme based on Fuzzy Petri Nets, to represent knowledge about domain images that is often incomplete, uncertain and ambiguous.
5. Novel data-driven algorithms for acquisition of fuzzy knowledge about relations between objects and about scenes.
6. A novel inference-based algorithm for annotation refinement, to reduce error propagation through the hierarchical structure of concepts during the inference of scenes.
7. A novel inference-based algorithm for automatic scene recognition.
8. A comparison of inference-based scene classification with an ordinary classification approach.

The performance of the proposed two-tier automatic annotation system was evaluated on outdoor images with regard to different feature subsets (dominant colours, moments, GIST descriptors [23]) and compared to the published results obtained on the same image database, as detailed in Section 8. The inference-based scene classification is also compared with an ordinary classification approach and has shown better performance. The paper ends with a conclusion and directions for future work, Section 8.

Fig. 1. Part of the image search results for the keyword "Dubrovnik".
Fig. 2. Examples of images and their annotation at object and scene levels. Object labels: tracks, train, cloud, sky, trees / snow, polar bear. Scene labels: SceneTrain, Transportation / ScenePolarbear, WildLife, Arctic.

2. Related work

Automatic image annotation (AIA) has been an active research topic in recent years due to its potential impact on image retrieval, search, image interpretation and description. The AIA approaches proposed so far can be divided in various ways, e.g. according to the theory they are most related to [9] or according to the semantic level of concepts that are used for annotation [30]. AIA approaches are most closely related to statistical theory and machine learning or to logical reasoning and artificial intelligence, while semantic levels can be organised into flat or structured vocabularies. Classical AIA approaches belonging to the field of machine learning look for a mapping between image features and concepts at object or scene levels. Classification and probabilistic modelling have been extensively used for this purpose. Methods based on classification, such as the one described in [17], treat each of the semantic keywords or concepts as an independent class and assign each of them to one classifier. Methods based on the translation model [10] and methods which use latent semantic analysis [22] fall into the category of probabilistic methods that aim to learn a relevance model to represent correlations between images and keywords. A recent survey of research in this field can be found in [35,7].

Lately, graph-based methods have been intensively investigated to apply logical reasoning to images, and many graph-based image analysis algorithms have proven to be successful. Conceptual graphs which encode the interpretation of the image are used for image annotation. Pan et al. [24] have proposed a graph-based method for image annotation in which images, annotations and regions are considered as three types of nodes of a mixed media graph. In [19], automatic image annotation is performed using two graph-based learning processes. In the first graph, the nodes are images and the
edges are relations between the images, and in the second graph the nodes are words and the edges are relations between the words. In [8], the authors aim to illustrate each of the concepts from the WordNet ontology with 500-1000 images in order to create a public image ontology called ImageNet. Within the aceMedia project, in [21] an ontology is combined with fuzzy logic to generate concepts from the beach domain. In [1], the same group of authors has used a combination of different classifiers for learning concepts and fuzzy spatial relationships. In [13], a framework based on Fuzzy Petri Nets is proposed for image annotation at object level. In the fuzzy-knowledge base, nodes are features and objects. Co-occurrence relations are defined between objects, and attribute relations are defined between objects and features.

The idea of introducing a fuzzy knowledge representation scheme into an image annotation system, proposed in our previous research [14,16], is further developed in this paper. New inference-based algorithms for scene classification and annotation refinement, as well as algorithms for data-driven knowledge acquisition, are formally defined and enhanced. As opposed to the previous work, where object-level classification was performed using classification of segmented images, object-level annotation is treated here as a multi-label classification problem on the whole image.

Binder et al. [3] have proposed a method called Output Kernel Multi-Task Learning (MTL) to improve ranking performance. Zhang et al. [36] have defined a graph-based representation for loosely annotated images where each node is defined as a collection of discriminative image patches annotated with object category labels. The edge linking two nodes models the co-occurrence relationship among different objects in the same image.

3. Overview of the two-tier image annotation model

Images of outdoor scenes commonly contain one or more objects of interest, for instance person, boat, dog, bridge, and different kinds of background objects such as sky, grass, water, etc. However, people often think about these images as a whole, interpreting them as scenes, for example tennis match instead of person, court, racket, net, and ball. To make the image annotation more useful for organising and for the retrieval of images, it should contain both object and scene labels. Object labels correspond to classes whose instances can be recognised in an image. Scene labels are used to represent the context or semantics of the whole image, according to common sense and expert knowledge. To deal with different levels of annotation with object and scene labels, a two-tier automatic image annotation model using a multi-label classifier and a fuzzy-knowledge representation scheme is proposed.

Fig. 3. Framework of a two-tier automatic image annotation system.

In the first tier, multi-label classification is proposed since it can provide reliable recognition of specific objects without the need to include knowledge. However, higher-level concepts usually do not correspond to specific low-level features, and inference based on knowledge about the domain is therefore more appropriate. The relationships between objects in a scene can be easily identified and are generally unchanging, so they can be used to infer scenes and more abstract concepts. Still, knowledge about objects and scenes is usually imprecise, somewhat ambiguous and incomplete. Therefore, fuzzy-knowledge representation is used in the second tier. The architecture of the proposed
system is depicted in Fig. 3. The input to the system is an unlabelled image and the results of automatic annotation are object and scene labels. First, from each image, low-level features are extracted which represent the geometric and photometric properties of the image. Each image is then represented by the m-component feature vector x = (x_1, x_2, ..., x_m)^T. The obtained features are used for object classification. The assumption is that more than one object can be relevant for image annotation, so multi-label classification methods are used. The result of the image classification in the first tier is object labels from the set C of all object labels.

In the second tier, the object labels are used for scene-level classification. Scene-level classification is supported by the image-domain knowledge base, where each scene is defined as a composition of typical object classes. Other concepts at different levels of abstraction can be inferred using the defined hierarchical relations among concepts. Relations among objects and scenes can also be used to check the consistency of object labels and to discard those that do not fit the likely context. In this way, the propagation of errors through the hierarchical structure of concepts is reduced during the inference of scenes or more abstract concepts.

To represent fuzzy knowledge about the domain of interest, a knowledge-representation scheme based on the Fuzzy Petri Net formalism [26] is defined. The proposed scheme has the ability to cope with uncertain, imprecise, and fuzzy knowledge about concepts and relations, and enables reasoning about them. The knowledge-representation scheme related to the image domain and the inference-based algorithms for annotation refinement and for scene recognition are implemented as Java classes. The elements of the scheme are generated utilising the proposed data-driven algorithms for acquisition of fuzzy knowledge, which are implemented in Matlab.

4. Multi-label classification in the first tier

In the general case, an image can contain multiple objects in both the foreground and the background. All objects appearing in the image are potentially useful and hold important information for automatic image interpretation. To preserve as much information about the image as possible, many object labels are usually assigned to an image. Therefore, image annotation at the object level is an inherently multi-label classification problem. Here we have used different feature sets with appropriate multi-label classification methods.

4.1. Feature sets

The variety of perceptual and semantic information about objects in an outdoor image could be contained in global low-level features such as dominant colour, spatial structure, colour histogram and texture. For classification in the first tier, we used a feature set made up of dominant colours of the whole image (global dominant colours) and of parts of the image (region-based dominant colours), colour moments and the GIST descriptor. The used set of features is constructed to represent the varied aspects of image representation, including colour and structure information.

The dominant colours were selected from the colour histogram of the image. The colour histogram was calculated for each of the RGB colour channels of the whole image. Next, histogram bins with the highest values for each channel were selected as dominant colours. After experimenting with different numbers of colours (3-36) per channel, we chose to use 12 of them in each image as features (referred to as DC) for our classification tasks. The information about the colour layout of an image was preserved using 5 local RGB histograms, from which dominant colours were extracted in the same manner as for the whole image. The local dominant colours are referred to as DC1 to DC5. To calculate the DC1, DC2 and DC3 local features, a histogram was computed for each cell of a 3x1 grid applied to each image. The DC4 feature was computed in the central part of the image, presumably containing the main image object, and the DC5 feature in the surrounding part that would probably contain the background, as shown in Fig. 4. The size of the central part was 1/4 of the diagonal size of the whole image, and of the same proportions.

Fig. 4. Original images and the arrangement of a 3x1 image grid and central and background regions from which the dominant colour features were computed.

There are 12 dominant colours with 3 channels, giving a 36-dimensional feature vector of global DC. The size of the local dominant colours vector (DC1 to DC5) is 180 (12 dominant colours x 3 channels x 5 image parts). Additionally, we computed the colour moments (CM) for each RGB channel: mean, standard deviation, skewness and kurtosis. The size of the CM feature vector is 12 (4 colour moments for 3 channels).

The GIST image descriptor [23], which was proven to be efficient for scene recognition, was also used as a region-based feature. It is a structure-based image descriptor that refers to the dominant spatial structure of the image, characterized by the properties of its boundaries (e.g., the size, degree of openness, perspective) and its content (e.g., naturalness, roughness). The spatial properties are estimated using global features computed as a weighted combination of Gabor-like multi-scale oriented filters. In our case, we used 8x8 encoding samples in the GIST descriptor within 8 orientations per 8 scales of image components, so the GIST feature vector has 512 components.

We performed the classification tasks using all the extracted features (DC, DC1..DC5, CM, GIST). The size of the feature vector used for classification was 740. Since the size of the feature vector is large in proportion to the number of images, we also tested the classification performance using five subsets.
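Neither the DC nor the CM feature is exotic, so a minimal numpy sketch of the global variants may help fix the definitions; it assumes 8-bit RGB input as an HxWx3 array. The choice of 256 histogram bins and the bin-centre encoding of a dominant colour are assumptions, since the paper only states that the highest bins are selected.

```python
import numpy as np

def dominant_colours(img, n_colours=12, bins=256):
    """Global dominant colours (DC): for each RGB channel, take the centres
    of the n_colours highest histogram bins (36-dim for n_colours=12)."""
    feats = []
    for ch in range(3):
        hist, edges = np.histogram(img[..., ch], bins=bins, range=(0, 256))
        top = np.argsort(hist)[-n_colours:]          # indices of highest bins
        centres = (edges[top] + edges[top + 1]) / 2  # bin centres as colours
        feats.append(np.sort(centres))
    return np.concatenate(feats)

def colour_moments(img):
    """Colour moments (CM): mean, standard deviation, skewness and kurtosis
    per RGB channel (12-dim), using plain-numpy moment estimates."""
    feats = []
    for ch in range(3):
        x = img[..., ch].astype(np.float64).ravel()
        m, s = x.mean(), x.std()
        skew = ((x - m) ** 3).mean() / (s ** 3 + 1e-12)
        kurt = ((x - m) ** 4).mean() / (s ** 4 + 1e-12)
        feats.extend([m, s, skew, kurt])
    return np.array(feats)
```

The local features DC1-DC5 would be obtained by calling dominant_colours on the three grid cells and on the central and background regions described above.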
4.2. Image annotation at object level

We attempt to label images with both foreground and background object labels, assuming that they are all useful for image annotation. Since an image may be associated with multiple object labels, the annotation at the object level in the first tier is treated as a multi-label classification problem. Multi-label image classification can be formally expressed as φ: E → P(C), where E is a set of image examples, P(C) is the power set of the set of class labels C, and there is at least one example e_j ∈ E that is mapped into two or more classes, i.e. ∃ e_j ∈ E: |φ(e_j)| ≥ 2.

The methods most commonly used to tackle a multi-label classification problem can be divided into two different approaches [29]. In the first, the multi-label classification problem is transformed into several single-label classification problems [20], known as the data transformation approach. The aim is to transform the data so that any classification method designed for single-label classification can be applied. However, data transformation approaches usually do not exploit the patterns of labels that naturally occur in a multi-label setting. For example, dolphin usually appears together with sea, while observing the labels cheetah and fish together is much less common. On the other hand, algorithm adaptation methods extend specific learning algorithms in order to handle multi-label data directly and can sometimes utilise the inherent co-occurrence of labels. Another issue in multi-label classification is the fact that typically some labels occur more rarely in the data, and it may be difficult to train individual classifiers for those labels.

For the multi-label classification task, we used the Multi-label k-Nearest Neighbour (ML-kNN) [34], a lazy learning algorithm derived from the traditional kNN algorithm, and RAKEL (RAndom k-labELsets) [28], which is an example of data adaptation methods. The kNN classifier is a nonlinear and adaptive algorithm that is rather well suited for rarely occurring labels. The RAKEL algorithm considers a small random subset of labels and uses a single-label classifier for the prediction of each element in the power set of this subset. In this way, the algorithm aims to take into account label correlations using single-label classifiers. In our case, the kNN classifier and the C4.5 classification tree are used as base single-label classifiers for RAKEL. The base classifiers are applied to subtasks with a manageable number of labels. It was experimentally shown that for our task the best results are obtained using RAKEL with C4.5 as the base classifier.

We also used the Naïve Bayes (NB) classifier along with data transformation. The data were transformed so that each instance with multiple labels was replaced with elements of the binary relation ρ ⊆ E × C between a set of samples E and a set of class labels C. An ordered pair (e, c) ∈ E × C can be interpreted as "e is classified into c" and is often written as e ρ c. For example, if an image example e_15 ∈ E was annotated with labels φ(e_15) = {lion, sky, grass}, where lion, sky, grass ∈ C, it was transformed into three single-label instances e_15 ρ lion, e_15 ρ sky, e_15 ρ grass. With NB, a model is learned for each label separately, and the result is the union of all labels that are obtained as a result of binary classification in one-vs-all fashion, while with RAKEL models are learned for small subsets of labels. Unlike NB, with RAKEL and ML-kNN a model is learned for small subsets of labels, and therefore only such subsets can be obtained as a result of the classification. In both cases, incorrect labels can also be assigned to an image along with correct ones: if they are in the assigned subset in the RAKEL and kNN case, or if they are a result of binary misclassification for NB. For the one-vs-all case, the inherent correlations among labels are not used, but an annotation refinement post-processing step is proposed here that incorporates label correlations and excludes from the results labels that do not normally occur together with other labels in images.

The object labels and the corresponding confidence obtained as a result of the multi-label classification in the first tier of the image annotation system are used as features for the recognition of scenes in the second tier. In the case when the Naïve Bayes classifier is used, the confidence of each object is set according to the posterior probability of that object class. Otherwise, when the classifier used in the first tier cannot provide information on the confidence of the classification, e.g. when the ML-kNN classifier is used, the confidence value is set to 1. Object labels from the annotation set φ(e) ⊆ C and the corresponding confidence values make the set Φ(e) ⊆ φ(e) × [0,1] that is input to the inference-based algorithm for scene recognition in the second tier.
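As an illustration of the data-transformation route, the following scikit-learn sketch implements one-vs-all Naïve Bayes over binary label indicators; the binary-relevance decomposition plays the role of the e ρ c transformation above. It is an illustration only, not the authors' code, and the toy data are invented.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data: feature vectors (e.g. DC+CM+GIST) and multi-label annotations phi(e).
X = np.random.rand(6, 740)
Y = [["lion", "sky", "grass"], ["sky"], ["train", "tracks"],
     ["sky", "grass"], ["lion", "grass"], ["train"]]

mlb = MultiLabelBinarizer()              # e rho c -> binary indicator matrix
Y_bin = mlb.fit_transform(Y)

clf = OneVsRestClassifier(GaussianNB())  # one NB model per label (one-vs-all)
clf.fit(X, Y_bin)

labels = clf.predict(X[:1])              # union of positively classified labels
probs = clf.predict_proba(X[:1])         # posteriors serve as the confidences
print(mlb.inverse_transform(labels), probs)
```

The per-label posteriors returned here correspond to the confidence values that form Φ(e) when NB is used in the first tier.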
5. Knowledge representation in the second tier

The assumption is that there are typical objects of which scenes are composed, so each scene is treated as a composition of objects that are selected as typical, based on the used training dataset. To enable making inferences about scenes, relations between objects and scenes and relations between objects in an image are modelled and stored in a knowledge base. Facts about scenes and objects are collected automatically from the training dataset and organised in a knowledge representation scheme. This is a faster and more effective way of collecting specific knowledge than manually building the knowledge base, since it is a broad, uncertain domain of knowledge and the required expertise is not clearly defined. An expert only defines facts related to general knowledge, like hierarchical relationships between concepts, so the need for manual definition of facts is reduced drastically. However, an expert can still review the generated facts and modify them if necessary. Due to the inductive nature of generating facts, the acquired knowledge is often incomplete and unreliable, so the ability to draw conclusions about scenes from imprecise, vague knowledge becomes necessary. Additionally, the inference of an unknown scene relies on facts about objects that are obtained as classification results in the first tier and that are subject to errors. Apart from using a knowledge representation scheme that allows the representation of fuzzy knowledge and can handle uncertain reasoning, an algorithm for object annotation refinement is proposed to detect and discard those objects that do not fit the context and so prevent error propagation.

For the purpose of knowledge representation, a scheme based on the KRFPN formalism [26], which is in turn based on the Fuzzy Petri Net, is defined to represent knowledge about outdoor images as well as to support subsequent reasoning about scenes in the second tier of the system. The definition of the fuzzy knowledge representation scheme adapted for automatic image annotation, named KRFPNs, and the novel data-driven algorithms for automatic acquisition of knowledge about objects and scenes, and relations between them, are detailed below.

5.1. Definition of the KRFPNs scheme adapted for image annotation

The elements of the knowledge base used for the annotation of outdoor images, particularly fuzzy knowledge about scenes and relationships among objects, are presented using the KRFPNs scheme. The KRFPNs scheme concatenates elements of the Fuzzy Petri Net (FPN) with a semantic interpretation and is defined as:

KRFPNs = (FPN, α̂, β̂, D, Σ).   (1)

FPN = (P, T, I, O, M, Ω, f, c) is a Fuzzy Petri Net, where P = {p_1, p_2, ..., p_n}, n ∈ N, is a finite set of places; T = {t_1, t_2, ..., t_m}, m ∈ N, is a finite set of transitions; I: T → P(P)\∅ is the input function and O: T → P(P)\∅ is the output function, where P(P) is the power set of places. The sets of places and transitions are disjoint, P ∩ T = ∅, and the link between places and transitions is given by the input and output functions. The elements P, T, I, O are parts of an ordinary Petri Net (PN). M = {m_1, m_2, ..., m_r}, r ≥ 0, is a set of tokens used to define the state of a PN in discrete steps. The state of the net in step w is defined by the distribution of tokens Ω_w: P → P(M), where P(M) is the power set of M. The initial distribution of tokens is denoted as Ω_0; in our case, in the initial distribution
a place can have no token or at most one, so |Ω_0(p)| ∈ {0, 1}. A place p is marked in step w if Ω_w(p) ≠ ∅. Marked places are important for the firing of transitions and the execution of the net. Firing of the transitions changes the distribution of tokens and causes the change of the net state. Fuzziness in the scheme is implemented with the functions c: M → [0,1] and f: T → [0,1], which associate tokens and transitions with truth values in the range [0,1], where zero means "not true" and one "always true".

Semantic interpretation of places and transitions is implemented in the scheme with the functions α̂ and β̂, in order to link elements of the scheme with concepts related to image annotation. The function α̂: P → D × [0,1] maps a place p_i ∈ P to a concept and the corresponding confidence value (d_j, c(d_j)), where d_j ∈ D and c(d_j) = max c(m_l), m_l ∈ Ω(p_i). If there are no tokens in the place, the token value is zero. An alternative function that differs in the codomain is defined, α: P → D. The function α is a bijective function, so its inverse exists, α^(-1): D → P. These functions are useful for mapping a place p_i ∈ P in the scheme to a concept d_j and vice versa, when no information about the confidence of a concept is needed.

The set of concepts D contains the object and scene labels used for annotation of outdoor images, D = C ∪ SC. The set C contains object labels such as {Airplane, Train, Shuttle, Ground, Cloud, Sky, Coral, Dolphin, Bird, Lion, Mountain} ⊆ C, and the set SC contains scene labels at different levels of abstraction, e.g. SC = {SceneAirplane, SceneBear, ..., Seaside, Space, Undersea}.

The function β̂: T → Σ × [0,1] ⊆ D × D × [0,1] maps a transition t_j ∈ T to a pair (d_pk σ_i d_pl, c_j), where σ_i ∈ Σ; d_pk, d_pl ∈ D; α^(-1)(d_pk) = p_k; α^(-1)(d_pl) = p_l; p_k ∈ I(t_j); p_l ∈ O(t_j); c_j = f(t_j). The first element is the relation between the concepts linked with the input and output places of the transition t_j, and the second is its confidence value. The function β: T → Σ, which differs in the codomain, is also defined and is useful for checking whether there is an association between a transition and a relation.

The set Σ contains relations and is defined as Σ = {occurs_with, is_part_of, is_a}. The relation σ_1 = occurs_with is a pseudo-spatial relation that is defined between objects, and it models the joint occurrence of objects in images. The compositional relation σ_2 = is_part_of is defined between scenes and objects, reflecting the assumption that objects are parts of a scene. The hierarchical relation σ_3 = is_a is defined between scenes, to model a taxonomic hierarchy between scenes. The confidence values of the occurs_with and is_part_of relations are defined according to the used training dataset, while in the case of is_a the confidence is defined by an expert.
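To make the definitions concrete, here is a minimal sketch of the KRFPNs elements as plain data structures. This is an illustration only; the authors implemented the scheme as Java classes, and the example facts and confidence values below are invented.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    """beta-hat: a transition carries a relation d_k sigma d_l with confidence f(t)."""
    src: str           # concept alpha(p_k), p_k in I(t)
    relation: str      # sigma in {"occurs_with", "is_part_of", "is_a"}
    dst: str           # concept alpha(p_l), p_l in O(t)
    confidence: float  # f(t) in [0, 1]

@dataclass
class KRFPNs:
    concepts: set = field(default_factory=set)       # D = C ∪ SC (places, via the bijection alpha)
    transitions: list = field(default_factory=list)  # T with the beta-hat interpretation
    tokens: dict = field(default_factory=dict)       # Omega_0: concept -> c(m), at most one token per place

    def add_fact(self, src, rel, dst, conf):
        self.concepts.update([src, dst])
        self.transitions.append(Transition(src, rel, dst, conf))

kb = KRFPNs()
kb.add_fact("Train", "is_part_of", "SceneTrain", 0.9)       # confidences here are invented
kb.add_fact("Tracks", "occurs_with", "Train", 0.8)
kb.add_fact("SceneTrain", "is_a", "Transportation", 1.0)
kb.tokens = {"Train": 0.7, "Sky": 0.9}  # first-tier labels with confidences as the initial marking
```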
5.2. Data-driven knowledge acquisition and automatic definition of the KRFPNs scheme

The process of knowledge acquisition related to annotation refinement and scene recognition is supported with algorithms that enable automatic extraction of knowledge from existing data in the training set and the arrangement of the acquired knowledge in the knowledge representation scheme. Automated knowledge acquisition is a fast and economic way of collecting knowledge about the domain, and it can be used to estimate the confidence values of acquired facts, especially when the domain knowledge is by nature broad, as is the case with image annotation. In that case, experts are not able to provide a comprehensive overview of the possible relationships between objects in different scenes or to objectively evaluate their reliability. Automatically accumulated knowledge is very dependent on the examples in the dataset, and to make it more general the estimated confidences of facts are further adjusted. In addition, to improve the consistency, completeness and accuracy of the knowledge base, and to enhance the inference process, an expert can review or modify the existing facts and rules or add new ones.
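This excerpt does not spell out the estimator used for the relation confidences, only that they are derived from the training set. One plausible data-driven choice, sketched below, is to use conditional co-occurrence frequencies over the training annotations; the normalisations shown are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations

def relation_confidences(annotations):
    """annotations: list of (object_labels, scene_label) pairs, one per training image.
    Returns frequency-based confidences for occurs_with and is_part_of."""
    obj_count, pair_count = Counter(), Counter()
    part_count, scene_count = Counter(), Counter()
    for objects, scene in annotations:
        scene_count[scene] += 1
        for o in objects:
            obj_count[o] += 1
            part_count[(o, scene)] += 1
        for a, b in permutations(objects, 2):
            pair_count[(a, b)] += 1
    # P(b occurs | a present) and P(object present | scene) as confidences.
    occurs_with = {p: n / obj_count[p[0]] for p, n in pair_count.items()}
    is_part_of = {p: n / scene_count[p[1]] for p, n in part_count.items()}
    return occurs_with, is_part_of

train = [({"train", "tracks", "sky"}, "SceneTrain"),
         ({"train", "sky", "trees"}, "SceneTrain")]
occ, part = relation_confidences(train)
print(occ[("tracks", "train")], part[("train", "SceneTrain")])
```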
Fine-to-Coarse Reconstruction Algorithm: Overview and Explanation

1. Introduction

1.1 Overview: In the field of computer vision, image reconstruction is an important task whose goal is to generate a high-quality, high-resolution image from a low-resolution input image.

The Fine-to-Coarse Reconstruction algorithm is a commonly used image reconstruction algorithm. By gradually increasing the resolution level of the image, it reconstructs the image progressively from coarse to fine, yielding a sharper image with richer detail.

The Fine-to-Coarse Reconstruction algorithm is widely used in image processing and computer vision, and it can effectively improve image quality and the degree to which detail information is recovered.

This article introduces the principle, applications and advantages of the Fine-to-Coarse Reconstruction algorithm in detail, in the hope of offering readers guidance for understanding and applying the algorithm in depth.
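The text stays at a high level, so as one concrete reading: a generic multi-resolution loop that starts coarse and refines towards full resolution can be sketched as follows. The bicubic upsampling, the x2 factor per level and the pluggable per-level refine step are illustrative assumptions, not details given in the text.

```python
import numpy as np
import cv2  # OpenCV, used here only for resizing

def coarse_to_fine_reconstruct(low_res, n_levels=3, refine=None):
    """Generic multi-resolution reconstruction loop: start from the
    low-resolution input and double the resolution at each level,
    applying an optional refinement step (e.g. deblurring or a learned
    prior) after each upsampling."""
    img = low_res.astype(np.float32)
    for level in range(n_levels):
        h, w = img.shape[:2]
        img = cv2.resize(img, (2 * w, 2 * h), interpolation=cv2.INTER_CUBIC)
        if refine is not None:
            img = refine(img, level)  # per-level detail restoration
    return np.clip(img, 0, 255).astype(np.uint8)
```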
1.2 Structure of the article: This article is divided into three parts: introduction, main body and conclusion.

In the introduction, we give an overview of the Fine-to-Coarse Reconstruction algorithm and describe the structure and purpose of the article.

In the main body, we describe the principle of the Fine-to-Coarse Reconstruction algorithm in detail, together with its performance in practical applications.

We focus on the algorithm's applications in fields such as image processing and computer vision, and discuss its advantages and limitations.

Finally, in the conclusion, we summarise the whole article, look ahead to future directions for the development of the Fine-to-Coarse Reconstruction algorithm, and close with some reflections and concluding remarks.

The overall structure of the article is clear and well organised, and it should help readers gain a comprehensive understanding of the importance and value of the Fine-to-Coarse Reconstruction algorithm.

1.3 Purpose: The purpose of the Fine-to-Coarse Reconstruction algorithm is to achieve efficient reconstruction of an image or model through a stepwise reconstruction process that proceeds from details to the whole.

Through stepwise iteration, the algorithm can increase the speed and accuracy of reconstruction while preserving detail.

This article aims to explore the principle, applications and advantages of the Fine-to-Coarse Reconstruction algorithm in depth, in the hope of providing further inspiration and help for related research and applications.
(19) China National Intellectual Property Administration
(12) Invention Patent Application
(10) Publication No.: CN 110009568 A
(43) Publication Date: 2019.07.12
(21) Application No.: 201910286781.8
(22) Filing Date: 2019.04.10
(71) Applicant: Dalian Minzu University, No. 18 Liaohe West Road, Economic and Technological Development Zone, Dalian, Liaoning 116600
(72) Inventors: Wu Baochun, Zheng Ruirui, Bi Jiajing, He Jianjun, Xin Shouyu
(74) Patent Agency: Dalian Zhigao Patent Office (Special General Partnership), 21235; Agent: Liu Bin
(51) Int. Cl.: G06T 3/40 (2006.01)
(54) Title: Generator construction method for super-resolution reconstruction of Manchu-script images
(57) Abstract: A generator construction method for super-resolution reconstruction of Manchu-script images, in the field of computer image processing. To solve the problem of constructing a generator for super-resolution reconstruction of low-resolution Manchu images, the generator is built from 5 structurally identical residual blocks and 2 sub-pixel convolution layers. It can learn the mapping between low-resolution and high-resolution Manchu images and thereby perform super-resolution reconstruction of low-resolution Manchu images.
Claims: 1 page; Description: 7 pages; Figures: 1 page

Claims (1/1)
1. A generator construction method for super-resolution reconstruction of Manchu-script images, characterised in that the generator is constructed from 5 structurally identical residual blocks and 2 sub-pixel convolution layers.
2. The generator construction method of claim 1, characterised in that the generator structure is as follows: operation 1 is the Input layer, whose input is a low-resolution RGB three-channel image from the training data; operation 2 is the G-Conv-1 convolution layer, with a 9 pixel x 9 pixel kernel, stride of 1 pixel, and 64 filters; operation 3 is a PReLU layer, which applies a nonlinear transformation to the signal from the G-Conv-1 layer; operations 4-8 are residual blocks, five blocks of identical structure, used to extract feature information from the low-resolution image; operation 9 comprises the G-Conv-2 convolution layer (3 pixel x 3 pixel kernel, stride 1, 64 filters), a BN (batch normalisation) operation, and a Sum operation that sums the outputs; operation 10 comprises the G-Conv-3 convolution layer (3 pixel x 3 pixel kernel, stride 1, 256 filters), a Sub-Pixel CN stage of two sub-pixel convolution layers, which reassembles the extracted low-resolution image features into a high-resolution image, and a PReLU layer that applies a nonlinear transformation to the output of the previous layer; operation 11 is the G-Conv-4 convolution layer, with a 9 pixel x 9 pixel kernel, stride of 1 pixel, and 3 filters; operation 12 is the Output layer.
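The claimed layout matches an SRGAN-style generator, so a PyTorch reading of claims 1-2 might look as below. Where the claims are silent (the internal layout of a residual block, how the two sub-pixel layers consume the 256 channels), SRGAN-style choices are assumed; this is a sketch of the claims, not the patent's own code.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One of the 5 identical residual blocks (internal layout assumed:
    conv3x3-BN-PReLU-conv3x3-BN plus identity skip, as in SRGAN)."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, 9, 1, 4), nn.PReLU())  # G-Conv-1 + PReLU
        self.res = nn.Sequential(*[ResidualBlock() for _ in range(5)])    # operations 4-8
        self.mid = nn.Sequential(nn.Conv2d(64, 64, 3, 1, 1),
                                 nn.BatchNorm2d(64))                      # G-Conv-2 + BN
        self.up = nn.Sequential(nn.Conv2d(64, 256, 3, 1, 1),              # G-Conv-3
                                nn.PixelShuffle(2), nn.PixelShuffle(2),   # 2 sub-pixel layers
                                nn.PReLU())
        self.tail = nn.Conv2d(16, 3, 9, 1, 4)                             # G-Conv-4

    def forward(self, x):
        h = self.head(x)
        y = self.mid(self.res(h)) + h  # the "Sum" skip connection of operation 9
        return self.tail(self.up(y))

g = Generator()
print(g(torch.randn(1, 3, 24, 24)).shape)  # -> torch.Size([1, 3, 96, 96]), i.e. 4x upscaling
```

Two PixelShuffle(2) stages on 256 channels leave 16 channels at 4x the spatial size, which is why the final 9x9 convolution maps 16 channels down to the 3 output channels.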
Reflection Detection in Image Sequences

Mohamed Abdelaziz Ahmed, Francois Pitie, Anil Kokaram
Sigmedia, Electronic and Electrical Engineering Department, Trinity College Dublin {/People}

Abstract

Reflections in image sequences consist of several layers superimposed over each other. This phenomenon causes many image processing techniques to fail, as they assume the presence of only one layer at each examined site, e.g. motion estimation and object recognition. This work presents an automated technique for detecting reflections in image sequences by analyzing motion trajectories of feature points. It models reflection as regions containing two different layers moving over each other. We present a strong detector based on combining a set of weak detectors. We use novel priors, generate sparse and dense detection maps, and our results show a high detection rate with rejection of pathological motion and occlusion.

1. Introduction

Reflections are often the result of superimposing different layers over each other (see Fig. 1, 2, 4, 5). They mainly occur due to photographing objects situated behind a semi-reflective medium (e.g. a glass window). As a result, the captured image is a mixture between the reflecting surface (background layer) and the reflected image (foreground). When viewed from a moving camera, two different layers moving over each other in different directions are observed. This phenomenon violates many of the existing models for video sequences and hence causes many consumer video applications to fail, e.g. slow-motion effects, motion-based sports summarization and so on. This calls for an automated technique that detects reflections and assigns a different treatment to them.

Detecting reflections requires analyzing data for specific reflection characteristics. However, as reflections can arise by mixing any two images, they come in many shapes and colors (Fig. 1, 2, 4, 5). This makes extracting characteristics specific to reflections a difficult task. Furthermore, one should be careful when using motion information of reflections, as there is a high probability of motion estimation failure. For these reasons the problem of reflection detection is hard and was not examined before.

Reflection can be detected by examining the possibility of decomposing an image into two different layers. A lot of work exists on separating mixtures of semi-transparent layers [17,11,12,7,4,1,13,3,2]. Nevertheless, most of the still-image techniques [11,4,1,3,2] require two mixtures of the same layers under two different mixing conditions, while video techniques [17,12,13] assume a simple rigid motion for the background [17,13] or a repetitive one [12]. These assumptions are hardly valid for reflections in moving image sequences.

This paper presents an automated technique for detecting reflections in image sequences. It is based on analyzing spatio-temporal profiles of feature point trajectories. This work focuses on analyzing three main features of reflections: 1) the ability to decompose an image into two independent layers, 2) image sharpness, 3) the temporal behavior of image patches. Several weak detectors based on analyzing these features through different measures are proposed. A final strong detector is generated by combining the weak detectors. The problem is formulated within a Bayesian framework and priors are defined in a way that rejects false alarms. Several sequences are processed and results show a high detection rate with rejection of complicated motion patterns, e.g. blur, occlusion, fast motion.

Aspects of novelty in this paper include: 1) A
technique for decomposing a color still image containing reflection into two images containing the structures of the source layers. We do not claim that this technique could be used to fully remove reflections from videos. What we claim is that the extracted layers can be useful for reflection detection, since on a block basis reflection is reduced. This technique cannot compete with state-of-the-art separation techniques. However, we use this technique because it works on single frames and thus does not require motion, which is not the case with any existing separation technique. 2) Diagnostic tools for reflection detection based on analyzing feature point trajectories. 3) A scheme for combining weak detectors into one strong reflection detector using Adaboost. 4) Incorporating priors which reject spatially and temporally impulsive detections. 5) The generation of dense detection maps from sparse detections, and the use of thresholding by hysteresis to avoid selecting particular thresholds for the system parameters. 6) Using the generated maps to perform better frame rate conversion in regions of reflection. Frame rate conversion is a computer vision application that is widely used in the post-production industry.

Fig. 1. Examples of different reflections (shown in green). Reflection is the result of superimposing different layers over each other. As a result they have a wide range of colors and shapes.

In the next section we present a review of the relevant techniques for layer separation. In Section 3 we propose our layer separation technique. We then propose our Bayesian framework, followed by the results section.

2. Review of Layer Separation Techniques

A mixed image M is modeled as a linear combination between the source layers L1 and L2 according to the mixing parameters (a, b) as follows:

M = a L1 + b L2.   (1)

Layer separation techniques attempt to decompose the reflection M into two independent layers. They do so by exchanging information between the source layers (L1 and L2) until their mutual independence is maximized. This, however, requires the presence of two mixtures of the same layers under two different mixing proportions [11,4,1,3,2]. Different separation techniques use different forms of expressing the mutual layer independence. Current forms used include minimizing the number of corners in the separated layers [7] and minimizing the grayscale correlation between the layers [11].

Other techniques [17,12,13] avoid the requirement of having two mixtures of the same layers by using temporal information. However, they often require either a static background throughout the whole image sequence [17], constrain both layers to be of non-varying content through time [13], or require the presence of repetitive dynamic motion in one of the layers [12]. Yair Weiss [17] developed a technique which estimates the intrinsic image (static background) of an image sequence. Gradients of the intrinsic layer are calculated by temporally filtering the gradient field of the sequence. Filtering is performed in horizontal and vertical directions, and the generated gradients are used to reconstruct the rest of the background image.

3. Layer Separation Using Color Independence

The source layers of a reflection M are usually color independent. We noticed that the red and blue channels of M are the two most uncorrelated RGB channels. Each of these channels is usually dominated by one layer. Hence the source layers (L1, L2) can be estimated by exchanging information between the red and blue channels until the mutual independence between both channels is maximized. Information exchange for layer separation was first
introduced by Sarel et al. [12], and it is reformulated for our problem as follows:

L1 = M_R − α M_B
L2 = M_B − β M_R   (2)

Here (M_R, M_B) are the red and blue channels of the mixture M, while (α, β) are separation parameters to be calculated. An exhaustive search for (α, β) is performed. Motivated by Levin et al.'s work on layer separation [7], the best separated layer is selected as the one with the lowest cornerness value. The Harris cornerness operator is used here. A minimum texture is imposed on the separated layers by discarding layers with a variance less than T_x. For an 8-bit image, T_x is set to 2. The removal of this constraint can generate empty, meaningless layers. The novelty in this layer separation technique is that, unlike previous techniques [11,4,1,3,2], it only requires one image.

Fig. 2 shows separation results generated by the proposed technique for different images. Results show that our technique reduces reflections and shadows. Results are only displayed to illustrate a preprocessing step that is used for one of our reflection measures, and not to illustrate full reflection removal. Blocky artifacts are due to processing images in 50x50 blocks. These artifacts are irrelevant to reflection detection.

Fig. 2. Reducing reflections/shadows using the proposed layer separation technique. Color images are the original images with reflections/shadows (shown in green). The uncolored images represent one source layer (calculated by our technique) with reflections/shadows reduced. In (e) reflection still remains apparent; however, the person in the car is fully removed.
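A compact sketch of this search follows, assuming scikit-image's Harris response for the cornerness score. The grid resolution for the exhaustive (α, β) search and the aggregation of cornerness by summing the response over both candidate layers are assumptions, not details from the paper.

```python
import numpy as np
from skimage.feature import corner_harris

def separate_layers(img, t_x=2.0, grid=np.linspace(0.0, 1.5, 31)):
    """Eq. (2): exchange information between the red and blue channels,
    keeping the (alpha, beta) whose layers have the lowest Harris
    cornerness while retaining minimum texture (variance >= t_x)."""
    m_r = img[..., 0].astype(np.float64)
    m_b = img[..., 2].astype(np.float64)
    best, best_score = None, np.inf
    for alpha in grid:
        for beta in grid:
            l1, l2 = m_r - alpha * m_b, m_b - beta * m_r
            if l1.var() < t_x or l2.var() < t_x:
                continue  # discard empty, textureless layers
            score = corner_harris(l1).sum() + corner_harris(l2).sum()
            if score < best_score:
                best, best_score = (l1, l2), score
    return best
```

In the detector this would run per 50x50 block, which is the source of the blocky artifacts the paper mentions.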
4. Bayesian Inference for Reflection Detection (BIRD)

The goal of the algorithm is to find regions in image sequences containing reflections. This is achieved by analyzing trajectories of feature points. Trajectories are generated using the KLT feature point tracker [9,14]. Denote P_n^i as the feature point of the i-th track in frame n and F_n^i as the 50x50 image patch centered on P_n^i. Trajectories are analyzed by examining all feature points along tracks of length more than 4 frames. For each point, the analysis is carried out over the three image patches (F_{n−1}^i, F_n^i, F_{n+1}^i). Based on the analysis outcome, a binary label field l_n^i is assigned to each F_n^i; l_n^i is set to 1 for reflection and 0 otherwise.

4.1. Bayesian Framework

The system derives an estimate for l_n^i from the posterior P(l|F) (where (i, n) are dropped for clarity). The posterior is factorized in a Bayesian fashion as follows:

P(l|F) = P(F|l) P(l|l_N).   (3)

The likelihood term P(F|l) consists of 9 detectors, D1-D9, each performing a different analysis on F and operating at thresholds T1-T9 (see Sec. 4.5.1). The prior P(l|l_N) enforces various smoothness constraints in space and time, to reject spatially and temporally impulsive detections and to generate dense detection masks. Here N denotes the spatio-temporal neighborhood of the examined site.

4.2. Layer Separation Likelihood

This likelihood measures the ability to decompose an image patch F_n^i into two independent layers. Three detectors are proposed. Two of them attempt to perform layer separation before analyzing the data, while the third measures the possibility of layer separation by measuring the color channel independence.

Layer Separation via Color Independence, D1: Our technique (presented in Sec. 3) is used to decompose the image patch F_n^i into two layers L1_n^i and L2_n^i. This is applied for every point along every track. Reflection is detected by comparing the temporal behavior of the observed image patches F with the temporal behavior of the extracted layers. Patches containing reflection are defined as ones with higher temporal discontinuity before separation than after separation. Temporal discontinuity is measured using the structure similarity index SSIM [16] as follows:

D1_n^i = max(SS(G_n^i, G_{n−1}^i), SS(G_n^i, G_{n+1}^i)) − max(SS(L_n^i, L_{n−1}^i), SS(L_n^i, L_{n+1}^i))

SS(L_n^i, L_{n−1}^i) = max(SS(L1_n^i, L1_{n−1}^i), SS(L2_n^i, L2_{n−1}^i))
SS(L_n^i, L_{n+1}^i) = max(SS(L1_n^i, L1_{n+1}^i), SS(L2_n^i, L2_{n+1}^i))

Here G = 0.1 F_R + 0.7 F_G + 0.2 F_B, where (F_R, F_G, F_B) are the red, green and blue components of F respectively. SS(G_n^i, G_{n−1}^i) denotes the structure similarity between the two images G_n^i and G_{n−1}^i. We only compare the structures of (G_n^i, G_{n−1}^i) by turning off the luminance component of SSIM [16]. SS(·,·) returns a value between 0 and 1, where 1 denotes identical similarity. Reflection is detected if D1_n^i is less than T1.

Intrinsic Layer Extraction, D2: Let INTR^i denote the intrinsic (reflectance) image extracted by processing the 50x50 i-th track using Yair Weiss's technique [17]. In the case of reflection, the structure similarity between the observed mixture F_n^i and INTR^i should be low. Therefore, F_n^i is flagged as containing reflection if SS(F_n^i, INTR^i) is less than T2.

Color Channel Independence, D3: This approach measures the Generalized Normalized Cross Correlation (GNGC) [11] between the red and blue channels of the examined patch F_n^i to infer whether the patch is a mixture between two different layers or not. GNGC takes values between 0 and 1, where 1 denotes a perfect match between the red and blue channels (M_R and M_B respectively). This analysis is applied to every image patch F_n^i, and reflection is detected if GNGC(M_R, M_B) < T3.
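The SSIM comparisons in D1 map directly onto scikit-image's structural_similarity. The sketch below uses the full SSIM for simplicity, whereas the paper turns the luminance component off; that simplification and the fixed data_range are assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity

def ss(a, b):
    """Structure comparison between two grayscale patches (full SSIM here;
    the paper disables SSIM's luminance term)."""
    return structural_similarity(a, b, data_range=255.0)

def d1(G_prev, G_cur, G_next, L_prev, L_cur, L_next):
    """D1: temporal discontinuity of the observed patches minus that of the
    separated layers. G_* are grayscale patches; L_* are (L1, L2) pairs from
    the Sec. 3 separation. Reflection is flagged when d1 < T1."""
    obs = max(ss(G_cur, G_prev), ss(G_cur, G_next))
    lay = max(
        max(ss(L_cur[0], L_prev[0]), ss(L_cur[1], L_prev[1])),
        max(ss(L_cur[0], L_next[0]), ss(L_cur[1], L_next[1])))
    return obs - lay
```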
4.3. Image Sharpness Likelihood: D4, D5

Two approaches for analyzing image sharpness are used. The first, D4, estimates the first-order derivatives of the examined patch F_n^i and flags it as containing reflection if the mean of the gradient magnitude within the examined patch is smaller than a threshold T4. The second approach, D5, uses the sharpness metric of Ferzli et al. [5] and flags a patch as reflection if its sharpness value is less than T5.

4.4. Temporal Discontinuity Likelihood

SIFT Temporal Profile, D6: This detector flags the examined patch F_n^i as reflection if its SIFT features [8] are undergoing a high temporal mismatch. A vector p = [x s g] is assigned to every interest point in F_n^i. The vector contains the position of the point, x = (x, y); the scale and dominant orientation from the SIFT descriptor, s = (δ, o); and the 128-point SIFT descriptor g. Interest points are matched with neighboring frames using [8]. F_n^i is flagged as reflection if the average distance between the matched vectors p is larger than T6.

Color Temporal Profile, D7: This detector flags the image patch F_n^i as reflection if its grayscale profile does not change smoothly through time. The temporal change in color is defined as follows:

D7_n^i = min(|C_n^i − C_{n−1}^i|, |C_n^i − C_{n+1}^i|)   (4)

Here C_n^i is the mean value of G_n^i, the grayscale representation of F_n^i. F_n^i is flagged as reflection if D7_n^i > T7.

Autocorrelation Temporal Profile, D8: This detector flags the image patch F_n^i as reflection if its autocorrelation is undergoing a large temporal change. The temporal change in the autocorrelation is defined as follows:

D8_n^i = min((1/N)‖A_n^i − A_{n−1}^i‖², (1/N)‖A_n^i − A_{n+1}^i‖²)   (5)

A_n^i is a vector containing the autocorrelation of G_n^i, while N is the number of pels in A_n^i. F_n^i is flagged as reflection if D8_n^i is bigger than T8.

Motion Field Divergence, D9: D9 for the examined patch F_n^i is defined as follows:

D9_n^i = DFD · (‖div(d(n))‖ + ‖div(d(n+1))‖)/2   (6)

DFD and div(d(n)) are the Displaced Frame Difference and motion field divergence for F_n^i. d(n) is the 2D motion vector calculated using block matching. The DFD is set to the minimum of the forward and backward DFDs, and div(d(n)) is set to the minimum of the forward and backward divergences. The divergence is averaged over blocks of two frames to reduce the effect of possible motion blur generated by unsteady camera motion. F_n^i is flagged as reflection if D9 > T9.

4.5. Solving for l_n^i

4.5.1. Maximum Likelihood (ML) Solution

The likelihood is factorized as follows:

P(F|l) = P(l|D1) P(l|D2−8) P(l|D9).   (7)

The first and last terms are solved using D1 < T1 and D9 > T9 respectively. D2-D8 are used to form one strong detector D_s, and P(l|D2−8) is solved by D_s > T_s. We found that not including (D1, D9) in D_s generates better detection results than when they are included. The feature analysis of each detector is averaged over a block of three frames to generate temporally consistent detections. T9 is fixed to 10 in all experiments. In Sec. 4.5.2 we avoid selecting particular thresholds for (T1, T_s) by imposing spatial and temporal priors on the generated maps.

Calculating D_s: The strong detector D_s is expressed as a linear combination of weak detectors operating at different thresholds T as follows:

P(l|D2−8) = Σ_{k=1}^{M} W(V(k), T) P(D_{V(k)}|T)   (8)
extended to all image patches (F n )overlapping spatially with the examined patch.This generates a denser representation of the detection masks.We call this step ML-Denser.Hysteresis:We can avoid selecting particular thresholds [T 1,T s ]for BIRD by applying Hysteresis using a set of dif-ferent thresholds.Let T H =[−0.4,5]and T L =[0,3]de-note a high and low configuration for [T 1,T s ].Detection starts by examining ML-Denser at high thresholds.High thresholds generate detected points P h with high confi-dence.Points within a small geodesic distance (<D geo )and small euclidean distance (<D euc )to each other are grouped together.Here we use (D geo ,D euc )=(0.0025,4)and resize the examined frames as mentioned previously.The centroids of each group is then calculated.Thresholds are lowered and a new detection point is added to an exist-ing group if it is within D geo and D euc to the centroid of this group.This is the hysteresis idea.If however the examined point has a large euclidean distance (>D euc )but a small geodesic distance (<D geo )to the centroid of all existing groups,a new group is formed.Points at which distances >D geo and >D euc are regarded as outliers and discarded.Group centroids are updated and the whole process is re-peated iteratively till the examined threshold reaches T L .The detection map generated at T L is made more dense by performing Spatio-Temporal Dilation above.Spatio-Temporal ‘Opening’:False alarms of the previ-ous step are removed by propagating the patches detected in the first frame to the rest of the sequence along the fea-ture point trajectories.A detection sample at fame n is kept if it agrees with the propagated detections from the previous frame.Correct detections missed from this step are recovered by running Spatio-Temporal Dilation on the ‘temporally eroded’solution.This does mean that trajecto-ries which do not start in the first frame are not likely to be considered,however this does not affect the performance in our real examples shown here.The selection of an optimal frame from which to perform this opening operation is the subject of future work.=Figure 4.From Top:ML (calculated at (T 1,T s )=(−0.13,3.15)),Hysteresis and Spatio-Temporal ‘Opening’for three consecutive frames from the SelimH sequence.Reflection is shown in red and detected reflection using our technique is shown in green.Spatio-Temporal ‘Opening’rejects false alarms generated by ML and by Hysteresis (shown in yellow and blue respectively).5.Results5.1.Reflection Detection15sequences containing 932frames of size 576×720are processed with BIRD.Full sequences with reflection de-tection can be found in /Misc/CVPR2011.Fig.4compares the ML,Hysteresis and Spatio-Temporal ‘Opening’for three consecutive frames from the SelimH se-quence.This sequence contains occlusion,motion blur and strong edges in the reflection (shown in red).The ML so-lution (first line)generates good sparse reflection detection (shown in green),however it generates some errors (shown in yellow).Hysteresis rejects these errors and generates dense masks with some false alarm (shown in blue).These false alarms are rejected by Spatio-Temporal ‘Opening’.Fig.5shows the result of processing four sequences us-ing BIRD.In the first two sequences,BIRD detected regions of reflections correctly and discarded regions of occlusion (shown in purple)and motion blur (shown in blue).In Girl-Ref most of the sequence is correctly classified as reflection.In SelimK1the portrait on the right is correctly classified as containing 
reflection even in the presence of motion blur (shown in blue).Nevertheless,BIRD failed in detecting the reflection on the left portrait as it does not contain strong distinctive feature points.Fig.6shows the ROC plot for 50frames from SelimH .Here we compare our technique BIRD against DFD and Im-age Sharpness[5].DFD,flags a region as reflection if it has high displaced frame difference.Image Sharpness flags a region as reflection if it has low sharpness.Frames are pro-cessed on 50×50blocks.Ground truth reflection masks are generated manually and detection rates are calculated on pel basis.The ROC shows that BIRD outperforms the other techniques by achieving a very high correct detection rate of 0.9for a false detection rate of 0.1.This is a major improvement over a correct detection rate of 0.2and 0.1for DFD and Sharpness respectively.5.2.Frame Rate Conversion:An applicationOne application for reflection detection is improving frame rate conversion in regions of reflection.Frame rate conversion is the process of creating new frames from ex-isting ones.This is done by using motion vectors to inter-polate objects in the new frames.This process usually fails in regions of reflection due to motion estimation failure.Fig.7illustrates the generation of a slow motion effect for the person’s leg in GirlRef (see Fig.5,third line).This is done by doubling the frame rate using the Foundry’s Kro-nos plugin [6].Kronos has an input which defines the den-sity of the motion vector field.The larger the density theFigure 5.Detection results of BIRD (shown in green)on,From top:BuilOnWind [10,35,49],PHouse 9-11,GirlRef [45,55,65],SelimK132-35.Reflections are shown in red.Good detections are generated despite occlusion (shown in purple)and motion blur (shown in blue).For GirlRef we replace Hysteresis and Spatio-Temporal ‘Opening’with a manual parameter configuration of (T 1,T s )=(−0.01,3.15)followed by a Spatio-Temporal Dilation step.This setting generates good detections for all examined sequences with static backgrounds.more detailed the vector and hence the better the interpo-lation.However,using highly detailed vectors generate ar-tifacts in regions of reflections as shown in Fig.7(second line).We reduce these artifacts by lowering the motion vec-tor density in regions of reflection indicated by BIRD (see Fig.7,third line).Image sequence results and more exam-ples are available in /Misc/CVPR2011.6.ConclusionThis paper has presented a technique for detecting reflec-tions in image sequences.This problem was not addressed before.Our technique performs several analysis on feature point trajectories and generates a strong detector by com-bining these analysis.Results show major improvement over techniques which measure image sharpness and tem-poral discontinuity.Our technique generates high correct detection rate with rejection to regions containing compli-cated motion eg.motion blur,occlusion.The technique was fully automated in generating most results.As an ap-plication,we showed how the generated detections can be used to improve frame rate conversion.A limiting factor of our technique is that it requires source layers with strong distinctive feature points.This could lead to incomplete de-tections.Acknowledgment:This work is funded by the Irish Re-serach Council for Science,Engineering and TechnologyFigure 7.Slow motion effect for the person’s leg of GirlRef (see Fig:5third line).Top:Original frames 59-61;Middle:generated frames using the Foundry’s plugin Kronos [6]with one motion vector calculated for 
every 4pels;Bottom;with one motion vector calculated for every 64pels in regions of reflection.False Alarm RateC o r r e c tD e t e c t i o n R a t eFigure 6.ROC plots for our technique BIRD,DFD and Sharpness for SelimH .Our technique BIRD outperforms DFD and Sharp-ness with a massive increase in the Correct Detection Rate.(IRCSET)and Science Foundation Ireland (SFI).References[1] A.M.Bronstein,M.M.Bronstein,M.Zibulevsky,and Y .Y .Zeevi.Sparse ICA for blind separation of transmitted and reflected images.International Journal of Imaging Systems and Technology ,15(1):84–91,2005.1,2[2]N.Chen and P.De Leon.Blind image separation throughkurtosis maximization.In Asilomar Conference on Signals,Systems and Computers ,volume 1,pages 318–322,2001.1,2[3]K.Diamantaras and T.Papadimitriou.Blind separation ofreflections using the image mixtures ratio.In ICIP ,pages 1034–1037,2005.1,2[4]H.Farid and E.Adelson.Separating reflections from imagesby use of independent components analysis.Journal of the Optical Society of America ,16(9):2136–2145,1999.1,2[5]R.Ferzli and L.J.Karam.A no-reference objective imagesharpness metric based on the notion of just noticeable blur (jnb).IEEE Trans.on Img.Proc.(TIPS),18(4):717–728,2009.4,6[6]T.Foundry.Nuke,furnace .6,8[7] A.Levin,A.Zomet,and Y .Weiss.Separating reflectionsfrom a single image using local features.In IEEE Conference on Computer Vision and Pattern Recognition (CVPR),pages 306–313,2004.1,2[8] D.G.Lowe.Distinctive image features from scale-invariantput.Vision ,60(2):91–110,2004.4[9] B.D.Lucas and T.Kanade.An iterative image registra-tion technique with an application to stereo vision (darpa).In DARPA Image Understanding Workshop ,pages 121–130,1981.3[10] D.Ring and F.Pitie.Feature-assisted sparse to dense motionestimation using geodesic distances.In International Ma-chine Vision and Image Processing Conference ,pages 7–12,2009.5[11] B.Sarel and M.Irani.Separating transparent layers throughlayer information exchange.In European Conference on Computer Vision (ECCV),pages 328–341,2004.1,2,4[12] B.Sarel and M.Irani.Separating transparent layers of repet-itive dynamic behaviors.In ICCV ,pages 26–32,2005.1,2[13]R.Szeliski,S.Avidan,and yer extrac-tion from multiple images containing reflections and trans-parency.In CVPR ,volume 1,pages 246–253,2000.1,2[14] C.T.Takeo and T.Kanade.Detection and tracking ofpoint features.Carnegie Mellon University Technical Report CMU-CS-91-132,1991.3[15]P.Viola and M.Jones.Robust real-time object detection.InInternational Journal of Computer Vision ,2001.5[16]Z.Wang,A.Bovik,H.Sheikh,and E.Simoncelli.Imagequality assessment:from error visibility to structural simi-larity.TIPS ,13(4):600–612,April 2004.4[17]Y .Weiss.Deriving intrinsic images from image sequences.In ICCV ,pages 68–75,2001.1,2,4。
翻译词汇,日积月累计划单列市city specifically designated in the state plan农机具补贴subsidy for agricultural machinery and tools附加税VAT(value-added tax)/extra duties/surtax/super tax/tax surcharge三资企业three kinds of foreign-invested enterprise or ventures 中外合资企业Sino-foreign joint ventures外商独资企业exclusively foreign-owned company in china中外合作企业cooperative business经济萧条economic slump/economic recession/economic depression business depression发展才是硬道理development as a task of overriding importance driven by consumption, investment ,import and export统筹城乡发展coordinate development in rural and urban regions 拓宽农村增收渠道seek new ways to increase farmer's incomes区域协调互动发展机制 a mechanism for promoting balance and interactive development among regions粗放型增长方式the extensive/inefficient mode /model of growth 密集的增长模式the intensive mode of growth国民经济性支柱产业pillar industry of national economy英汉词汇homecoming 同学会agent 代理crowdsourcing 众包lion's share 大部分ecological civilization 生态文明surrealism 超现实主义Central Ballet Troupe 中央芭蕾舞歌舞团myth of China's peaceful rise 中国的和平崛起的神话currency manipulator 货币操纵国sample survey 抽样检查core competiveness 核心竞争力intellectual property infringement 知识产权侵权multi-polar world 多极世界unabridged dictionary 足本词典Chic lit(文学)年轻女性小说缩略语交货前付款CBD(Cash Before Delivery)泛太平洋伙伴关系TPP(trans-pacific partnership)万国邮政联盟UPU(Universal Postal Union)订金CBD(Cash Before Delivery)有线电视CATV(Cable Television)汉英词汇转变经济增长方式the transformation of the economic growth mode弥补外需缺口compensate for weak external demand重点产业调整振兴规划the plan for restructuring and reinvigorating key industries县级基本财力保障机制 a mechanism for ensuring basic funding for county-level governments对外工程承包和劳务合同营业额the total turnover of contracted overseas construction and labor contracts激发经济内在活力stimulate the internal vitality of the economy 消除经济不利影响overcome the adverse effects of imported and structural inflation进出口许可证管理the import and export licensing administration 货物实际进出境口岸the port of the actual entry and exit of the goods进口环节增值税和消费税import value added tax and consumption tax劳动报酬在初次分配中的比重the ratio of worker's incomes in the primary distribution of national income区域发展总体战略the master strategy of regional development 登月舱lunar module服务舱service module指令舱command module英汉词汇reentry module 返回舱propelling module 推进舱orbital module 轨道舱geosynchronous satellite 同步轨道卫星satellite in sun geosynchronous 太阳同步卫星weather satellite 气象卫星recoverable satellite 返回卫星communication satellite 通讯卫星remote sensing satellite 遥感卫星The long marchⅡrocket launcher 长征2号运载火箭low earth orbit 近地轨道orbit the earth 绕地球飞行fine-tune orbit 调整轨道personnel type 人才类型talent competitiveness 人才竞争力缩略语特别行政区SAR(Special Administrative Region)联合国旅游组织WTO(World Tourism Organization)军事委员会CMA(Committee of Military Affairs)仅供参考FRI(for your information)仅供参考FYR(for your reference)汉英词汇人才流失brain drain海外人才overseas talents人才库talent pool人才管理talent management人才政策personnel policy人才发展talent development尊重人才value talent/ talented people党管人才human resources under the Party leadership 党政人才Party and government officials青年英才young talents高素质教育人才high-quality educators海外高层人才high-quality overseas professionals全民健康卫生人才national health professionals高技能人才the highly skilled/highly skilled workers企业经营管理人才enterprise management talents英汉词汇military talents 高素质军事人才professional medical personnel 专业医药人才innovative skilled sci-tech workers 创新性科技人才急需紧缺专门人才professionals in short supplybuilding of talent team 人才队伍建设talent management 人才工作管理体制major projects for talent development 重大人才工程HR= human resources 人力资源overseas Chinese students' home 全球留学人才之家cramming method of teaching 填鸭式教学be released from regular work for study 
脱产学习locally-granted student loan 生源地助学贷款pseudo-science 伪科学academic credit system 学分制talent pool 人才贮备汉英词汇保障性住房indemnificatory housing包工头labor contractor首期按揭down payment筒子楼tube-shaped apartment危房dilapidated building违章建筑unapproved construction project样板房model unit活动板房movable plank house/ movable house违章借贷abusive lending包装风险房贷packaging of risky mortgages住房公积金贷款housing accumulation fund/ housing provident fund保障性住房low-income housing/government subsidized housing 小产权房house/apartment with limited/incomplete property rights棚改房housing run-down areas that will undergo renovation一定让人民安居乐业we must ensure peace and security for the people出让、划拨和转让grant, allocation and transfer of right to use land英汉词汇relief fund 救济金under-the-counter deals 开后门pet phrase 口头禅flash mob 快闪族Malthusian Theory of Population 马尔萨斯人口论the Matthew Effect 马太效应rule of thumb 拇指规则door-to-door service 上门服务probationary period 试用期human trafficking 人口贩卖sugar-coated bullet 糖衣炮弹job-hopping 跳槽ostrich policy/ostrichism 鸵鸟政策die-hard fans 铁杆粉丝sister cities 友好城市缩略语首次公开募股IPOs(initial public offerings)中国民航CAAC(Civil Aviation Authority of China)域名服务器DNS(Domain Name Server)亚洲石油公司APC(Asian Petroleum Company)最惠国待遇原则MFNS(Most-Favored-Nation Treatment)汉英词汇邮政储蓄postal savings回扣sales commission/kickback潘多拉魔盒Pandora's box霹雳舞break dancing诺亚方舟Noah‘s Ark软肋soft spot/Achilles’ heel软着陆soft landing汇演joint performance花边新闻tidbit西气东输pipeline for transmitting natural gas from the west to the east政府保障力the government’s capacity for safeguard people’s livelihood开辟了中国特色社会主义道路blaze a trial of socialism with Chinese characteristics生产安全事故workplace accident ; work-related accident/work accident/accident at work/accident due to lack of work safety 行政事业性收费administrative charges/fees/public service charges/fees一次性补偿lump-sum compensation/once-and-for-all compensation/one-off compensation/flat compensation英汉词汇property tax 物业税vacant property 闲置地产complete apartment 现房first/second stage 一期/二期monthly installment payment 月供added-value fees 增值地价depreciation allowances 折旧费policy-related house 政策性住房key zones for development 重点开发区residential property 住宅地产owner-occupied homes /houses 自住型住房消费run-down areas 棚户区commercial housing with price ceilings 限价商品房change hand 房屋转手(易主)jelly-built projects 豆腐渣工程缩略语中国银监会CBRC(China Banking Regulatory Commission)多边贸易谈判MTN(multilateral trade negotiations)国内生产总值GDP(Gross Domestic Product)石油输出国家组织OPEC(Organization of Petroleum Exporting Countries)东南亚国家联盟(东盟)ASEAN(Association of Southeast Asian Nations)汉英词汇观望态度wait-and-watch attitude旧区改造reconstruction of old area楼层建筑面积floor space普通购房者private homebuyer期房forward delivery housing容积率capacity rate商品房commercial /residential building商业地产commercial property商住综合楼commercial/ residential complex社区community首付down payment售后回租(租回已租出财产)leaseback塔楼tower building停车位parking space停车场parking lot英汉词汇land reserves 囤地second-hand house 二手房property ownership certificate 房产证real estate agent 房产中介property bubble 房地产泡沫real estate/property market 房地产市场overheated property sector 房地产市场过热property price/housing price 房价payment by installment 分期付款tenement 分租合住的经济公寓property deed tax 购房契税housing bubble 房地产泡沫land use certificate 土地使用证allowances for repairs and maintenance 维修费speculative property transaction 投机性房产交易缩略语科学技术开发应用咨询委员会ACASTD(Advisory Committee on the Application of Science and Technology Development)联合国可持续发展委员会CSD(Commission on Sustainable Development)粮食及农业组织FAO(Food and Agriculture organization)联合国大会GA(general assembly)国际法院ICJ(International Court of Justice)汉英词汇花钱炫富spend money thoughtlessly and flaunt one's wealth 土豪newly rich and 
powerful people南水北调工程South-to-North water diversion project政绩考核assess the performance of local authorities生态脆弱fragile eco-system妥善处理敏感问题和分歧properly handle sensitive questions and disputes安居工程housing project for low-income urban residents住房空置率housing vacancy rate地王property king /land king/top bidder廉租房low-rent housing经济实用型住房affordable housing限价房price-capped housing公租房public rental housing按揭购房buy a house on mortgage板楼slap-type apartment building英汉词汇civil service exam 公务员考试interdisciplinary talent 复合型人才college/university entrance exam 高考divergent thinking 发散思维top student 高材生teaching to the test/exam-oriented education 应试教育quality-oriented education 素质教育senior cadre/high-ranking official 高干enrollment expansion 扩招real estate speculator 炒房者unpaid mortgage balance 抵押贷款欠额location classification 地段等级nail household/nail house residents 钉子户planned enrollment 计划内招生correspondence university 函授大学缩略语独联体CIS(Commonwealth of Independence States)东盟自由贸易区AFTA(Asian Free Trade Area)联合国教科文组织UNESCO(United Nations Educational,Scientific,and Cultural Organization)全国人民代表大会NPC(National People's Congress)中华人民共和国PRC(People's Republic of China)汉英词汇各类型,多层次人才队伍 a skilled, diversified and multilevel workforce高级人才high-end professionals/top talents国家中长期人才发展规划纲要national program for medium and long-term talent development中国国际人才交流协会Conference on InternationalExchange of Professionals科教兴国战略和人才强国战略the strategy of reinvigorating China through science, education and human resources文化渗透cultural infiltration玛雅文化Mayan civilization国家非物质文化遗产national intangible cultural heritage以人为本的城镇化human-centered urbanization打破城乡二元经济结构break up the city-country dualistic economic structure释放巨大的内需潜力unleash huge potential in domestic demand 公共文化服务体系public service system of culture国家文化软实力the country’s soft power in cultural fields全民族文明素质the Chinese people‘s civil education对外文化宣传international cultural publicity英汉词汇Trojan Horse 特洛伊木马cultural industry 文化产业cultural undertaking 文化事业hard-disk drive 硬盘驱动器organic nanomaterial 有机(生化)纳米材料lunar rover 月球车live broadcast 直播chief editor 主编candid camera 抓拍镜头special dispatch 专电sulfur dioxide emissions 二氧化硫排放wind break 防风林charm of oriental culture 东方神韵the Oriental Hawaii 东方夏威夷holiday resort 度假村缩略语转基因动物GMA(Genetically Modified Animal)转基因食品GMF(Genetically Modified Food)国家环境保护总局SEPA(State Environmental Protection Administration)联合国气候变化框架公约UNFCCC(The United Nations Framework Convention on Climate Change)中国人民政治协商会议CPPCC(Chinese People's Political Consultative Conference)汉英词汇两岸包机the cross-Straits charter flight历史遗留问题problems left over by history领土管辖权territorial jurisdiction落实共识implement the consensus民间外交people-to people diplomacy白色污染white pollution悬浮颗粒物suspended particles烟尘排放soot emission野生动植物wild animals and plant;wild fauna and flora有毒废料toxic waste加强文化建设加强文化建设strengthen cultural development efforts高雅艺术high art培养青少年思想道德建设cultivate ideals and ethics among young people满足人民群众不断增长的精神文化需求satisfy the people'sever-growing demand for cultural products加强公民道德建设工程carry out the program for improving civic morality英汉词汇joint venture 合资企业gold reserve 黄金储备reciprocal trade agreement 互惠贸易协定exchange risks 汇率风险demand deposit 活期存款judge by default 缺席判决rights of the person 人身权力determine facts 认定事实petition for appeal 上诉状cases involving foreign interests 涉外案件boarding school 寄宿学校education in the home 家庭教育education in home economics 家政教育contingent of teachers 教师队伍innovation in pedagogy 教学方法创新缩略语合格境内机构投资者QDII(Qualified Domestic Institutional Investor)合格境外机构投资者QFII(Qualified Foreign Institutional Investors)计算机辅助教学CAI(computer-aided 
instruction或computer-assisted instruction)计算机辅助设计CAD(Computer - Aided Design)计算机辅助翻译CAT(Computer Aided Translation)汉英词汇"爱国者"导弹PatriotmissileAA制Dutchtreatment;goDutch百闻不如一见Seeingisbelieving.百慕大三角BurmudaTriangle闭关政策closed-doorpolicy闭卷closed-bookexam冰雕icesculpture本垒打circuitclout,four-master,roundtrip本票cashier'scheque本本主义bookishness吃皇粮"receivesalaries,subsidies,orothersupportedfromthegovernmen t"充值卡rechargeablecard出入平安Safetripwhereveryougo敞蓬轿车open-toppedlimousine畅通工程SmoothTrafficProject英汉词汇deadaccount 呆帐packingcredit(loan) 打包贷款sun-top 吊带衫revokelicense 吊销执照firstinning 第一发球权countdown 倒计时globalvillage 地球村24-hour(service) 全天候all-weatehraircraft 全天候飞机performanceofenterprises 企业效益listingofacompany 企业上市braindrain人才流失competitionfortalentedpeople 人才战administrativeprogram 施政纲领double-edgedsword 双刃剑缩略语有线电视CATV(Cable Television)首席信息主管CIO(Chief Information Officer)注册会计师CPA(Certified Public Accountant)情商EQ(Emotional Quotient)超文本传输协议HTTP(Hyper Text Transfer Protocol)汉英词汇女汉子tough girl, manly woman殡葬改革funeral and interment reform生态安葬an ecological way of burial多功能生态保护实验区multifunction ecological experimentation zone谨言慎行be cautious in words and deeds包二奶have a concubine (originally a Cantonese expression) 变相涨价disguised inflation保证金margins, collateral保证金帐户margin account背投屏幕rear projection screen背黑锅to become a scapegoat不夜城"sleepless city, ever-bright city"不败记录clean record, spotless record存款准备金制度reserve against deposit system粗放式管理extensive management英汉词汇target hitting activities 达标活动grand slam 大满贯paid holiday 带薪假期penalty kick 点球loan loss provision, provisions of risk 风险准备金quota-free products 非配额产品burglarproof door; antitheft door 防盗门real estate evaluator 房产估价师real estate management 房管Time and tide wait for no man 时不我待break a butterfly on the wheel 杀鸡用牛刀be disengaged from work; divorce oneself from one's work 脱产talk show 脱口秀inter-bank borrowing 同业拆借geostationary satellilte 同步卫星缩略语强迫性神经官能症OCD(obsessive-compulsive disorder)艾滋病病毒HIV(human immunodeficiency virus)疯牛病BSE(bovine spongiform encephalopathy)非处方药OTC(over the counter)联合国儿童基金会UNICEF(United Nations International Children'sEmergency Fund)。
***大学毕业设计(英文翻译)原文题目:Automatic Panoramic Image Stitching using Invariant Features 译文题目:使用不变特征的全景图像自动拼接学院:电子与信息工程学院专业: ********姓名: ******学号:**********使用不变特征的全景图像自动拼接马修·布朗和戴维•洛{mbrown|lowe}@cs.ubc.ca计算机科学系英国哥伦比亚大学加拿大温哥华摘要本文研究全自动全景图像的拼接问题,尽管一维问题(单一旋转轴)很好研究,但二维或多行拼接却比较困难。
以前的方法使用人工输入或限制图像序列,以建立匹配的图像,在这篇文章中,我们假定拼接是一个多图像匹配问题,并使用不变的局部特征来找到所有图像的匹配特征。
由于以上这些,该方法对输入图像的顺序、方向、尺度和亮度变化都不敏感;它也对不属于全景图一部分的噪声图像不敏感,并可以在一个无序的图像数据集中识别多个全景图。
此外,为了提供更多有关的细节,本文通过引入增益补偿和自动校直步骤延伸了我们以前在该领域的工作。
1. 简介全景图像拼接已经有了大量的研究文献和一些商业应用。
这个问题的基本几何学很好理解,对于每个图像由一个估计的3×3的摄像机矩阵或对应矩阵组成。
估计处理通常由用户输入近似的校直图像或者一个固定的图像序列来初始化,例如,佳能数码相机内的图像拼接软件需要水平或垂直扫描,或图像的方阵。
在自动定位进行前,第4版的REALVIZ拼接软件有一个用户界面,用鼠标在图像大致定位,而我们的研究是有新意的,因为不需要提供这样的初始化。
根据研究文献,图像自动对齐和拼接的方法大致可分为两类——直接的和基于特征的。
直接的方法有这样的优点,它们使用所有可利用的图像数据,因此可以提供非常准确的定位,但是需要一个只有细微差别的初始化处理。
基于特征的配准不需要初始化,但是缺少不变性的传统的特征匹配方法(例如,Harris角点图像修补的相关性)需要实现任意全景图像序列的可靠匹配。
Algebraic operation 代数运算;一种图像处理运算,包括两幅图像对应像素的和、差、积、商。
Aliasing 走样(混叠);当图像像素间距和图像细节相比太大时产生的一种人工痕迹。
Arc 弧;图的一部分;表示一曲线一段的相连的像素集合。
Binary image 二值图像;只有两级灰度的数字图像(通常为0和1,黑和白)Blur 模糊;由于散焦、低通滤波、摄像机运动等引起的图像清晰度的下降。
Border 边框;一副图像的首、末行或列。
Boundary chain code 边界链码;定义一个物体边界的方向序列。
Boundary pixel 边界像素;至少和一个背景像素相邻接的内部像素(比较:外部像素、内部像素)Boundary tracking 边界跟踪;一种图像分割技术,通过沿弧从一个像素顺序探索到下一个像素将弧检测出。
Brightness 亮度;和图像一个点相关的值,表示从该点的物体发射或放射的光的量。
Change detection 变化检测;通过相减等操作将两幅匹准图像的像素加以比较从而检测出其中物体差别的技术。
Class 类;见模或类Closed curve 封闭曲线;一条首尾点处于同一位置的曲线。
Cluster 聚类、集群;在空间(如在特征空间)中位置接近的点的集合。
Cluster analysis 聚类分析;在空间中对聚类的检测,度量和描述。
Concave 凹的;物体是凹的是指至少存在两个物体内部的点,其连线不能完全包含在物体内部(反义词为凸)Connected 连通的Contour encoding 轮廓编码;对具有均匀灰度的区域,只将其边界进行编码的一种图像压缩技术。
Contrast 对比度;物体平均亮度(或灰度)与其周围背景的差别程度Contrast stretch 对比度扩展;一种线性的灰度变换Convex 凸的;物体是凸的是指连接物体内部任意两点的直线均落在物体内部。
COLMAP已知相机内外参数重建稀疏稠密模型COLMAP已知相机内外参数重建稀疏/稠密模型Reconstruct sparse/dense model from known camera poses1. ⼿动指定所有相机内外参在⽬录下⼿动新建cameras.txt, images.txt, 和 points3D.txt三个⽂本⽂件,⽬录⽰例:+── %ProjectPath%/created/sparse│ +── cameras.txt│ +── images.txt│ +── points3D.txt将内参(camera intrinsics) 放⼊cameras.txt,外参(camera extrinsics)放⼊ images.txt , points3D.txt 为空。
具体格式如下:cameras.txt ⽰例:# Camera list with one line of data per camera:# CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[fx,fy,cx,cy]# Number of cameras: 21 PINHOLE 1280 720 771.904 771.896 639.993 360.0012 PINHOLE 1280 720 771.899 771.898 639.999 360.001COLMAP 设定了SIMPLE_PINHOLE,PINHOLE,SIMPLE_RADIAL,RADIAL,OPENCV,FULL_OPENCV ,SIMPLE_RADIAL_FISHEYE,RADIAL_FISHEYE,OPENCV_FISHEYE,FOV,THIN_PRISM_FISHEYE 共11种相机模型,其中最常⽤的为PINHOLE,即⼩孔相机模型。
⼀般正常拍摄的照⽚,不考虑畸变即为该模型。
⼿动参数的情况下官⽅推荐OPENCV相机模型,相⽐PINHOLE,考虑了xy轴畸变。
图像分类中混淆矩阵精度验证法中的几个指标说明选择主菜单->Classification->Post Classification->Confusion Matrix->Using Ground Truth ROIs,可以得到如下的分类精度验证的混淆矩阵。
要看懂这个精度验证结果,需要了解几个混淆矩阵中的几项评价指标:总体分类精度(Overall Accuracy)等于被正确分类的像元总和除以总像元数。
被正确分类的像元数目沿着混淆矩阵的对角线分布,总像元数等于所有真实参考源的像元总数,如本次精度分类精度表中的Overall Accuracy = (110230/125843) 87.5933%。
●Kappa系数(Kappa Coefficient)它是通过把所有真实参考的像元总数(N)乘以混淆矩阵对角线(XKK)的和,再减去某一类中真实参考像元数与该类中被分类像元总数之积之后,再除以像元总数的平方减去某一类中真实参考像元总数与该类中被分类像元总数之积对所有类别求和的结果。
Kappa计算公式●错分误差(Commission)指被分为用户感兴趣的类,而实际属于另一类的像元,它显示在混淆矩阵里面。
本例中,总共划分为林地有19210个像元,其中正确分类16825,2385个是其他类别错分为林地(混淆矩阵中林地一行其他类的总和),那么其错分误差为2385/19210= 12.42%。
●漏分误差(Omission)指本身属于地表真实分类,当没有被分类器分到相应类别中的像元数。
如在本例中的林地类,有真实参考像元16885个,其中16825个正确分类,其余60个被错分为其余类(混淆矩阵中耕地类中一列里其他类的总和),漏分误差60/16885=0.36%●制图精度(Prod.Acc)是指分类器将整个影像的像元正确分为A类的像元数(对角线值)与A类真实参考总数(混淆矩阵中A类列的总和)的比率。
如本例中林地有16885个真实参考像元,其中16825个正确分类,因此林地的制图精度是16825/16885=99.64%。
常州华森医疗器械有限公司成立于2002年,是中国最具活力和创新能力的外科手术医疗器械制造及相关服务供应商之一。
公司下设骨科部、吻合器部、胸外科部、手术工具部和新品开发部等五大产业部,现有江苏国家级常州高新区、江苏国家级武进高新区和常州科教城三大研发、生产基地。
华森骨科部位于江苏国家级武进高新区,占地面积100000m2, 拥有标准厂房50000m2, 净化车间3000m2, 研发中心3000m2;中高级以上职称的生产、研发、管理专业人员占员工总人数的40%以上。
依托北京协和医院、北京军区总医院、南方医科大学、华南理工大学、苏州大学、常州科教城等院校的技术支持和紧密合作,华森公司具备了国内一流的研发力量和加工生产能力。
公司严格执行ISO和CMD 标准运营,产品已全面通过TUV欧盟CE认证,部分产品已率先通过FDA认证。
凭借可靠质量、周到服务,在国内外市场已取得了行业内的广泛认同。
企业理念:敬业创新,发展共享。
企业使命和目标:以科学严谨的态度、不断创新的理念以及对临床医学的深刻理解,研发生产贴合临床、简单易用的高质量医疗产品,成为国内外客户值得信赖的民族品牌企业。
Changzhou Waston Medical Appliance Co., Ltd, one of China’s most vigorous and innovative medical device manufacturers and related service providers,was founded in 2002. It consists of five divisions, including the Orthopedics Department, Surgical Sta-pler Department, Thoracic Surgery Department, Surgical Instruments Department and New Product Development Department. Our company has three R&D and production bases, located in National Changzhou Hi-tech Industry Zone of Jiangsu, National Wujin Hi-tech Industry Zone of Jiangsu, and Science and Technology Center of Changzhou separately.Waston’s Orthopedics Department locates at Wujin National Hi-Tech Industry zone of Jiangsu. It covers an area of 100,000 m2, with a standard factory 50,000 m2, a purification plant 3,000 m2 and a R & D center 3,000m2. The staff in production, R&D, and management with senior titles account for over 40%. Waston also owns a full set of international advanced equipment such as 5-axis machining centers, slitting machine tools, CNC machine tools and 3D printers. Collabrating with Beijing Union Medical College Hospital, Beijing Military General Hospital, Southern Medical University, South China University of Technology, Suzhou University, Changzhou Science & Education Town and other institutions of technical support and close cooperation, Watson owns the first-class R&D and manufacturing capacity. In strict implementation of ISO and CMD standards, all products have passed European Union CE certificates of TUV and some have been approved by FDA. With reliable quality, attentive service, Waston is widely recognized in medical industry both at home and abroad.Corporate philosophy:Create by heart,Share of development.Corporate mission and objectives:with rigorous scientific attitude, the concept of continuous innovation and a deep understand-ing of clinical medicine to research and develop high-quality medical products which fit the clinical and is easy-to-use. It is Waston’s greatest honor to be a trusted Chinese brand for customers both at home and abroad.07脊柱内固定系列Spinal System19锁定接骨板系列LOC Plate Series36空心钉系列Cannulated Screws Series37子母钉内固定系统Fixation Screws Series38接骨板系列Plates System49鹅头钉接骨板系列DHS/DCS Plates System50微型接骨板系列Mini Plates System56接骨螺钉系列Screws System57髓内钉系列Intramedullary Nails System64WHC-I型髋臼后柱导向内固定系统Guidance system for posterior column of acetabulum 65一次性使用负压引流套装Vaccum Sealing Drainge (i-VSD) Suit 66专利纯钛肋骨接骨板系列Patented Rib Plates System67胸肋骨固定器Sternal Fixation System68捆扎钢绳及钢扣系列Binding Wires System 69脊柱器械包系列Spinal Instruments Set80锁定接骨板器械包系列LOC Plate Instruments Set87空心钉器械包系列Cannulated Screw Instruments Set91子母钉器械包系列Fixation Screws Instruments Set92界面螺钉(ACL)器械包系列ACL Interference Screw Instruments Set94接骨板器械包系列Plate Instruments Set101微型接骨板器械包系列Mini Plates System104伽玛髓内钉器械包Ⅱ型(PFNA)(PFNA) GAMMA Intramedullary Nails Instruments (II) 115髓内钉器械包系列Intramedullary Nail Instruments Set124断钉取出专用器械包Screw Extractor Instruments Set124枪式复位钳Collinear Reduction Clamp125植骨漏斗Bone Funnel126股骨头髓芯减压专用器械Femoral Head Core Decompression InstrumentsTable of Contents脊柱内固定系列6 Holes11101 颈椎前路钢板I型Anterior cervical plate I手术适应症Indications1. 创伤引起的颈椎不稳Instability caused by trauma2. 颈椎前后凸畸形矫正所致的颈椎不稳Instability associated with correction of cervical lordosis and kyphosis deformity3. 颈椎融合手术失败后假关节形成所致的颈椎不稳Instability associated with pseudoarthosis as a result of previously failed cervical spine surgery4. 
颈椎原发性或转移性恶性肿瘤时巨大的重建手术所致的颈椎不稳Instability associated with major reconstructive surgery for primary tumors or metastatic malignant tumors of the cervical spine.5. 进行性退行性颈椎间盘疾病、颈椎管狭窄及颈脊髓压迫症单椎体或多椎体切除后 所致的颈椎不稳Instability associated with single or multiple level corpectomy in advanced degenerative disc disease, spinal canal stenosis and cervical myelopathyNEULEN 颈椎椎板成型系统NEULEN cervical laminoplasty systemand self-drilling)手术适应症IndicationsNEULEN Cervical Laminoplasty System is to be used when there is:1. 脊髓型颈椎病,累及3个或更多节段。
基于提升算法的二维5/3和9/7小波变换的MATLAB仿真与DSP 实现王靖琰,刘蒙中国科学院上海应用物理研究所,上海 (201800)E-mail :wjycas@摘 要:本文讨论了基于提升算法的二维5/3和9/7小波的原理,对算法进行了MATLAB 仿真,并在浮点型DSP TMS320C6713B 上实现了图像的二维5/3、9/7小波提升变换和逆变换。
实验结果证明了方法的有效性。
关键词:小波提升,二维9/7、5/3小波,MATLAB ,TMS320C6713B1.引言随着人们对多媒体信息需求的日益增长,数码相机、移动电话、MP4 等多媒体信息处理系统蓬勃发展。
基于通用DSP 处理器的此类系统设计以灵活性强、扩展性好、可升级和易维护的优点成为系统开发的首选方案 [1]。
由于良好的时频局部特性和多分辨分析特性,小波已广泛应用于图像处理领域,并且被吸收进新的一些国际标准中成为了标准算法。
文中在MATLAB 平台上对基于小波提升的二维离散5/3和9/7小波变换算法进行了仿真,并在浮点型DSP TMS320C6713B 上实现了算法,该程序运算速度快,可充分利用硬件资源,特别适用于嵌入式系统的需求。
2.小波变换提升算法基本原理1994年Sweldens 提出了小波的提升算法,有效地解决传统的基于Mallat 的塔式分解小波变换算法计算量大、对存储空间的要求高的问题,从算法方面提高了小波变换的实现效率[2]。
2.1 5/3小波提升格式小波提升算法的基本思想是通过由基本小波(lazy wavelet)逐步构建出一个具有更加良好性质的新小波,其实现步骤有3个:分解(split)、预测(predict)和更新(update)。
分解是将数据分为偶数序列和奇数序列2个部分,预测是用分解的偶数序列预测奇数序列,得到的预测误差为变换的高频分量,更新是由预测误差来更新偶数序列,得到变换的低频分量。
在J PEG2000中,5/3提升小波变换的算法为[3]:(2)(22)(21)(21)(1)2(21)(21)2(2)(2)(2)4x n x n c n x n c n c n d n x n ++⎡⎤+=+−⎢⎥⎣⎦−+++⎡⎤=+⎢⎥⎣⎦由其正变换的反置即可得到逆变换的算法为 c(2n-1) + c(2n+1)+2x (2n) = d (2n) - (3)4x(2n)+x(2n+2)x(2n+1)=c(2n)+(4)2⎡⎤⎢⎥⎣⎦⎡⎤⎢⎥⎣⎦ 从算式可以得出,提升算法是原位计算,即进行小波变换时在原位计算各个系数,计算的系数可以直接替代原始数据而不需要附加数据存储空间。
温世锋简历广州市第一人民医院脊柱外科主任医师,医学博士,硕士生导师,医院办公室副主任。
【基本资料】姓名:温世锋性别:男技术职称:主任医师专业:骨科、脊柱外科出生年月:1973年7月籍贯:江西赣州毕业院校:南方医科大学学历:博士【学历和工作经历】1993年9月至1998年7月,江西省赣南医学院临床医学系学习,本科学历,取得学士学位。
1998年8月至2000年7月,江西省赣南医学院生物化学与分子生物学教研室任教,助教。
因当时工作需要,提前承担本科、专科生物化学与分子生物学理论大课与实验操作教学工作。
2000年8月至2003年7月,广州医学院研究生学习,导师俆中和教授,专业骨外科,研究方向活骨移植。
研究课题为广东省科技厅立项课题:长段同种异体骨与带血管自体腓骨复合骨移植的实验研究。
研究生毕业,取得医学硕士学位。
2004年2月至2004年8月,香港中文大学骨科系、威尔士亲王医院骨科,访问学者。
2003年7月至2006年8月,广州市第一人民医院骨外科工作,骨科住院医师。
2006年9月至2010年12月,广州市第一人民医院骨外科工作,骨科主治医师。
2005年-2012年,连续7年受聘于广州国际女子网球公开赛首席赛事医师。
2009年8月至2010年8月,香港大学公共卫生学院研究生学习(full time),专业:Epidemiology and Clinical Effectiveness。
主要学习内容为临床试验研究。
毕业论文为:Effectiveness of three surgical decompressionstrategies for treatment of multilevel cervical myelopathy: a retrospective study。
研究生毕业,取得公共卫生学硕士学位。
2010年9月至2013年7月,南方医科大学博士研究生学习,导师尹庆水教授,专业骨外科,研究方向:脊髓型颈椎病的治疗。
图像处理中的全局优化技术(Global optimization techniques in image processing and computer vision) (一)2013-05-29 14:26 1659人阅读评论(1) 收藏举报算法图像处理计算机视觉imagevisionMulinB按:最近打算好好学习一下几种图像处理和计算机视觉中常用的global optimization (或energy minimization) 方法,这里总结一下学习心得。
分为以下几篇:1. Discrete Optimization: Graph Cuts and Belief Propagation (本篇)2. Quadratic Optimization : Poisson Equation and Laplacian Matrix3. Variational Methods for Optical Flow Estimation4. TODO: Likelihood Maximization (e.g., Blind Deconvolution)1. Discrete Optimization: Graph Cuts and Belief Propagation很多图像处理和视觉的问题可以看成是pixel-labeling问题,用energy minimization framework可以formulate成以下能量方程:其中第一项叫data term (或叫unary term),是将label l_p赋给像素p时的cost,第二项叫smoothness term (或叫pairwise term),是每两个相邻pixel的labeling不同时产生的cost (w_pq是spatial varying weight,比如跟p和q的difference相关)。
传统的smoothness term 一般只考虑两两(pairwise)相邻的pixel,最近几年的CVPR和ICCV上开始出现很多higher-order MRF的研究,比如这位大牛的paper,这是题外话。
温世锋简历广州市第一人民医院脊柱外科主任医师,医学博士,硕士生导师,医院办公室副主任。
【基本资料】姓名:温世锋性别:男技术职称:主任医师专业:骨科、脊柱外科出生年月:1973年7月籍贯:江西赣州毕业院校:南方医科大学学历:博士【学历和工作经历】1993年9月至1998年7月,江西省赣南医学院临床医学系学习,本科学历,取得学士学位。
1998年8月至2000年7月,江西省赣南医学院生物化学与分子生物学教研室任教,助教。
因当时工作需要,提前承担本科、专科生物化学与分子生物学理论大课与实验操作教学工作。
2000年8月至2003年7月,广州医学院研究生学习,导师俆中和教授,专业骨外科,研究方向活骨移植。
研究课题为广东省科技厅立项课题:长段同种异体骨与带血管自体腓骨复合骨移植的实验研究。
研究生毕业,取得医学硕士学位。
2004年2月至2004年8月,香港中文大学骨科系、威尔士亲王医院骨科,访问学者。
2003年7月至2006年8月,广州市第一人民医院骨外科工作,骨科住院医师。
2006年9月至2010年12月,广州市第一人民医院骨外科工作,骨科主治医师。
2005年-2012年,连续7年受聘于广州国际女子网球公开赛首席赛事医师。
2009年8月至2010年8月,香港大学公共卫生学院研究生学习(full time),专业:Epidemiology and Clinical Effectiveness。
主要学习内容为临床试验研究。
毕业论文为:Effectiveness of three surgical decompressionstrategies for treatment of multilevel cervical myelopathy: a retrospective study。
研究生毕业,取得公共卫生学硕士学位。
2010年9月至2013年7月,南方医科大学博士研究生学习,导师尹庆水教授,专业骨外科,研究方向:脊髓型颈椎病的治疗。
第49卷第6期2022年6月Vol.49,No.6Jun.2022湖南大学学报(自然科学版)Journal of Hunan University(Natural Sciences)基于深度多级小波U-Net的车牌雾图去雾算法陈炳权†,朱熙,汪政阳,梁寅聪(吉首大学信息科学与工程学院,湖南吉首416000)摘要:为了解决雾天拍摄的车牌图像边缘模糊、色彩失真的问题,提出了端到端的基于深度多级小波U-Net的车牌雾图去雾算法.以MWCNN为去雾网络的主体框架,利用“SOS”增强策略和编解码器之间的跨层连接整合小波域中的特征信息,采用离散小波变换的像素-通道联合注意力块降低去雾车牌图像中的雾度残留.此外,利用跨尺度聚合增强块补充小波域图像中缺失的空间域图像信息,进一步提高了去雾车牌图像质量.仿真实验表明,该网络在结构相似度和峰值信噪比上具有明显的优势,在处理合成车牌雾图和实际拍摄的车牌雾图上,去雾效果表现良好.关键词:车牌雾图去雾;MWCNN;“SOS”增强策略;跨层连接;注意力;跨尺度聚合中图分类号:TP391.41文献标志码:ADehazing Algorithm of License Plate Fog Image Basedon Deep Multi-level Wavelet U-NetCHEN Bingquan†,ZHU Xi,WANG Zhengyang,LIANG Yincong(College of Information Science and Engineering,Jishou University,Jishou416000,China)Abstract:To solve the problem of edge blurring and color distortion of license plate images taken in foggy weather,an end-to-end depth multilevel wavelet U-Net based algorithm for license plate fog image removal is pre⁃sented.Taking MWCNN as the main frame work of the defogging network,the feature information in the wavelet do⁃main is integrated using the“SOS”enhancement strategy and the cross-layer connection between the codec.The pixel-channel joint attention block of the discrete wavelet transform is used to reduce the fog residue in the defrosted license plate image.In addition,the cross-scale aggregation enhancement blocks are used to supplement the missing spatial domain image information in the wavelet domain image,which further improves the quality of the defogging li⁃cense plate image.The simulation results show that the network has obvious advantages in structural similarity and peak signal-to-noise ratio,and it performs well in dealing with the composite plate fog image and the actual photo⁃graphed plate fog image.Key words:license plate fog image defogging;MWCNN;“SOS”enhancement strategy;cross-layer connection;attention mechanism;cross-scale aggregation∗收稿日期:2021-11-01基金项目:国家自然科学基金资助项目(No.62141601),National Natural Science Foundation of China(No.62141601);湖南省教育厅重点资助项目(21A0326),The SRF of Hunan Provincial Education Department(No.21A0326)作者简介:陈炳权(1972-),男,湖南常德人,吉首大学副教授,硕士生导师,博士†通信联系人,E-mail:****************文章编号:1674-2974(2022)06-0124-11DOI:10.16339/ki.hdxbzkb.2022293第6期陈炳权等:基于深度多级小波U-Net的车牌雾图去雾算法在大雾天气下使用光学成像器件(如相机、监控摄像头等)对目标场景或物体进行拍摄时,往往会使得图像对比度低,边缘、字符等细节信息模糊.图像去雾是图像处理中典型的不适定问题,旨在从雾天图像中复原出相应的无雾图像,作为一种提升图像质量的方法,已被广泛应用于图像分类、识别、交通监控等领域.近年来,针对不同场景(室内家居场景、室外自然场景、交通道路场景、夜间雾霾场景等)下均匀雾度或非均匀雾度图像的去雾技术引起了广泛关注与研究,但由于实际雾霾对图像影响的复杂多变性,从真实的雾天图像中复原无雾图像仍具有诸多挑战性.图像去雾技术发展至今,主要分为以下三类:基于数学模型的去雾技术,如直方图均衡[1]、小波变换[2]、色彩恒常性理论[3](Retinex)等;基于大气散射模型(ASM)和相关统计先验的去雾技术,如暗通道先验[4-5](DCP)、色衰减先验[6](CAP)、非局部先验[7](NLP)等;基于深度学习的去雾技术,如Deha⁃zeNet[8]、DCPDN[9]、AODNet[10]等.近年来,深度卷积神经网络在计算机视觉中应用广泛,2019年Liu 等[11]认为传统的卷积神经网络(CNN)在采用池化或空洞滤波器来增大感受野时势必会造成信息的丢失或网格效应,于是将多级小波变换嵌入到CNN中,在感受野大小和计算效率之间取得了良好的折中,因而首次提出了多级小波卷积神经网络(MWCNN)模型,并证实了其在诸如图像去噪、单图像超分辨率、图像分类等任务中的有效性.同年,Yang等[12]也认为离散小波变换及其逆变化可以很好地替代U-Net 
中的下采样和上采样操作,因而提出了一种用于单幅图像去雾的小波U-Net网络,该网络与MWCNN结构非常相似.2020年,Yang等[13]将多级小波与通道注意力相结合设计了一种小波通道注意力模块,据此构建了单幅图像去雨网络模型.同年,Peng等[14]则将残差块与MWCNN相结合提出了一种用于图像去噪的多级小波残差网络(MWRN).2021年,陈书贞等[15]在已有的MWCNN结构上加入多尺度稠密块以提取图像的多尺度信息,并在空间域对重构图像进行进一步细化处理,以弥补小波域和空间域对图像信息表示存在的差异性,从而实现了图像的盲去模糊.为了解决大雾天气下车牌图像对比度低和边缘、字符等信息模糊不清的问题,很多研究人员开始将已有的图像去雾技术应用于车牌识别的预处理中.但大多数只是对已有图像去雾算法进行了简单改进,如对Retinex或DCP等去雾算法进行改进,直接应用于车牌检测识别中.虽然取得了一定的去雾效果,但其并没有很好地复原出车牌图像的特征,且很难应对中等雾和浓雾下的车牌图像.2020年王巧月等[16]则有意识地针对车牌图像的颜色和字符信息进行车牌图像的去雾,提高了车牌图像的质量.受上述研究的启发,本文提出一种基于深度多级小波U-Net的车牌雾图去雾算法,以端到端的方式来实现不同雾度下不同车牌类型的去雾.首先,提出了一种结合DWT、通道注意力(CA)和像素注意力(PA)的三分支结构,该结构可以对编码器每层输出特征的通道和像素进行加权处理,从而让去雾网络聚焦于车牌雾图中的有雾区域;其次,在解码器中引入“SOS”增强模块(“SOS”Block)来对解码器和下层输入的特征进行进一步融合和增强,提高去雾图像的峰值信噪比,并在U-Net编解码结构之间进行了层与层之间的连接,以此来充分利用不同网络层和尺度上的特征信息;最后,为弥补小波域和空间域之间网络表达图像信息的差异,提出了一种结合跨尺度聚合(CSA)的多尺度残差增强模块(CSAE Block),在空间域上进一步丰富网络对于图像细节信息的表达,有效地提高去雾图像的质量.1去雾网络结构本文去雾网络结构如图1所示.该网络主要分为A与B这2大模块,前者在小波域中实现对车牌雾图x LPHaze的去雾,后者在空间域上对模块A输出的无雾图像进行进一步优化,模块A 的网络结构参数见表1.整个网络结构的输出为:y LPDhaze=y B(y A(x LPHaze;θA);θB)(1)式中:y B(⋅)和y A(⋅)分别为模块A和B的输出,θA 和θB分别表示模块A和B的可学习参数.1.1小波U-Net二维离散小波变换(2D-DWT)可以实现对给定的图像I的小波域分解,分解过程可视为将图像I与4个滤波器进行卷积,即1个低通滤波器f LL和3个高通滤波器(f LH、f HL和f HH),这4个滤波器(即卷积核)分别由低通滤波器f L和高通滤波器f H构成.以Haar小波为例,该卷积核表示为:f L=12[]1,1T,f H=12[]1,-1Tf LL=LL T,f HL=HL T,f LH=LH T,f HH=HH T(2)125湖南大学学报(自然科学版)2022年图1去雾网络结构Fig.1Defogging network structure 表1模块A 网络结构参数Tab.1Network structure parameters of module A网络层层1层2层3层4类型注意力块卷积层残差组注意力块(下、上采样)卷积层残差组注意力块(下、上采样)卷积层残差组注意力块(下、上采样)卷积层残差组卷积层编码器(卷积核大小f ×f ,输出通道c )éëêêùûúú()2×2,1()1×1,4and ()1×1,16(3×3,16)éëêêùûúú()3×3,16()3×3,16×3éëêêùûúú()2×2,1()1×1,64(3×3,64)éëêêùûúú()3×3,64()3×3,64×3éëêêùûúú()2×2,1()1×1,256(3×3,256)éëêêùûúú()3×3,256()3×3,256×3éëêêùûúú()2×2,1()1×1,1024(3×3,1024)éëêêùûúú()3×3,1024()3×3,1024×3(3×3,1024)输出大小(宽W ,高H )(64,32)(64,32)(64,32)(32,16)(32,16)(32,16)(16,8)(16,8)(16,8)(8,4)(8,4)(8,4)(8,4)解码器(卷积核大小f ×f ,输出通道c )—éëêêùûúú()3×3,16()3×3,12éëêêùûúú()3×3,16()3×3,16×3—(3×3,64)éëêêùûúú()3×3,64()3×3,64×3—(3×3,256)éëêêùûúú()3×3,256()3×3,256×3————输出大小(宽W ,高H )—(64,32)(64,32)—(32,16)(32,16)—(16,8)(16,8)————层2层4层1层3层1层2层3层4离散小波变换DWT 卷积层ConvLayer 注意力块Attention Block “SOS ”增强块“SOS ”Block 残差组ResGroup 离散小波逆变换IDWT 跨尺度聚合增强块CSAE BlockTanh 层模块B模块A 层间多尺度聚合126第6期陈炳权等:基于深度多级小波U-Net 的车牌雾图去雾算法因此,2D-DWT 可以通过将输入图像I 与4个滤波器进行卷积和下采样来实现,从而获得4个子带图像I LL 、I LH 、I HL 和I HH .其操作定义如下:I LL =()f LL ∗I ↓2I LH =()f LH ∗I ↓2I H L=()f HL∗I ↓2I HH=()f HH∗I ↓2(3)其中:∗表示卷积操作;↓2表示尺度因子为2的标准下采样操作.低通滤波器用于捕获图像中光滑的平面和纹理信息,其它3个高通滤波器则提取图像中存在的水平、垂直和对角线方向上的边缘信息.同时,由于2D-DWT 的双正交性质,可以通过二维逆DWT 的精确重建出原始图像.2D-DWT 及其逆变换的分解和重建示意图如图2所示.本文将2D-DWT 及其逆变换嵌入到U-Net 网络结构中,改善原始U-Net 网络的结构.首先,对输入的3通道车牌雾图进行离散小波变换处理,输出图像的通道数变为原来的4倍,图像大小变为原来的12;然后,使用一个单独的卷积层(“3×3卷积+Lea⁃kyReLU ”)将输入图像扩展为16通道的图像;最后,在U-Net 的每层中迭代使用卷积层和离散小波变换用于提取多尺度边缘特征.1.2基于2D-DWT 的通道-像素联合注意力块(DCPA Block )在去雾网络中平等对待不同的通道和像素特征,对于处理非均匀雾度图像是不利的.为了能灵活处理不同类型的特征信息,Qin 等[17]和Wu 等[18]均采用CA 和PA ,前者主要用于对不同通道特征进行加权,而后者则是对具有不同雾度的图像像素进行加权,从而使网络更关注雾度浓厚的像素和高频区域.引入Hao-Hsiang Yang 等[13]的小波通道注意力块,本文提出了一种基于二维DWT 的通道-像素联合注意力模块,将DWT 、CA 和PA 构建并行的三分支结构,如图3所示.2D-DWT卷积3×3平均池化卷积Leaky ReluLeaky Relu 输入逐元素相乘残差组输出逐元素相加Leaky Relu 1×1卷积2×2Sigmoid 激活函数注意力块图3基于二维DWT 的特征融合注意力模块结构Fig.3Attention module structure of feature fusion basedon two-dimensional DWT其中,两分支结构的注意力块结合了PA (上分支)和CA (下分支)的特点,将具有通道数为C ,高和宽分别为H 、W 的特征图x ∈R C ×H ×W 分别输入到CA 和PA 中,前者通过平均池化将C×H×W 的空间信息转换为C×1×1的通道权重信息,而后者则通过卷积来将C×H×W 的图像转换成1×H×W.CA 由一个平均池化层、一个卷积层和一个LeakyReLU 层构成,其输出为:y CA =LeakyReLU 0.2(Conv 11(AvgPool (x)))(4)式中:y CA ∈R 1×H ×W ;Conv j i (⋅)表示卷积核大小为i ×i 、步长为j 的卷积操作;LeakyReLU 0.2(⋅)表示参数为0.2的LeakyReLU 激活函数;AvgPool 
(⋅)表示平均池化操作.类似地,PA 有一个卷积层和一个LeakyReLU层,但没有平均池化层,其输出为:y PA =LeakyReLU 0.2(Conv 22(x))(5)式中:y PA ∈R C ×1×1.CA 和PA 通过逐像素相加,并共用一个Sigmoid 激活函数来分别为输入图像通道和像素配置相应的权重参数,其输出为:y A =Sigmoid (y PA ⊕y CA)(6)式中:y A ∈R C ×H ×W ;⊕表示逐元素相加;Sigmoid (⋅)表示Sigmoid 激活函数.最后,和经离散小波变换、卷列行LLLI LII LH I HL HH22222222列I HL I LL I HH222HL H I H2L行HII LH 图22D-DWT 及其逆变换Fig.22D-DWT and its inverse transformation127湖南大学学报(自然科学版)2022年积层和残差组处理后的特征图进行逐元素相乘,从而实现对特征图的加权,其最终输出为:y DCPA =ResGroup (LeakyReLU 0.2(Conv 13(DWT (x))))⊗yA(7)式中:y DCPA ∈R C ×H ×W ;DWT (⋅)为离散小波变换;⊗表示逐元素相乘;ResGroup (⋅)表示残差组函数. 1.3层间多尺度聚合(LMSA )受到Park 等[19]在图像去雾任务中采用多级连接来复原图像细节信息的启示,将U-Net 编码器中前3层中DCPA Block 的输出进行跨层和跨尺度的特征聚合,使网络能充分利用图像潜在的特征信息,其结构如图4所示.假设编码器的第l 层输出的聚合特征为y lconcat ,输入解码器的第l 层特征为D l in ,其中l =1,2,3,则y lconcat =Cat((Cat i =13F l i (y l DCPA )),F up (D l +1out ))(8)D l in =LeakyReLU 0.2(Conv 11(LeakyReLU 0.2(y l SEBlock (y l concat))))(9)式中:Cat (⋅)表示级联操作;F l i (⋅)表示从第i 层到第l 层的采样操作;F up (⋅)表示上采样操作;D i +1out为解码器的第i +1层的输出.将每层聚合后的特征图x l concat 输入到SEBlock 中,自适应地调节各个通道特征,其输出为:y lSEBlock (x l concat )=Sigmoid (FC (ReLu (FC (AvgPool (xlconcat)))))(10)式中:FC (⋅)表示全连接层函数;ReLU (⋅)为ReLU 非线性激活函数.SEBlock 的结构如图5所示.平均池化全连接层全连接层ReLuSigmoid 激活函数S图5SEBlock 结构Fig.5SEBlock structure通过“LeakyRelu-Conv-LeakyRelu ”操作减少每层输出特征的通道数,输入到U-Net 解码器中前3层的“SOS ”Block 中,提升重构图像的峰值信噪比.U-Net 网络中的第4层则聚合前2层的DCPABlock 输出特征和第3层的DWT 输出特征,同样经过SEBlock 和“LeakyRelu-Conv-LeakyRelu ”操作后作为第4层的输入特征进行后续操作.其数学表达式为:y 4concat =Cat((Cat i =12F 4i (y 4DCPA )),E 3out )(11)D 4in =LeakyReLU 0.2(Conv 11(LeakyReLU 0.2(y 4SEBlock (y 4concat))))(12)其中,E 3out 表示解码器第3层的输出.1.4“SOS ”增强模块(“SOS ”Block )从Romano 等[20]和Dong 等[21]对“Strengthen-Operate-Subtract ”增强策略(“SOS ”)的开发和利用中可知,该增强算法能对输入的图像特征进行细化处理,可以提高输出图像的峰值信噪比.因此,在本文的车牌图像去雾网络解码器结构中,直接将Dong 等[21]采用的“SOS ”增强算法嵌入到U-Net 结构中,提升车牌去雾图像的质量.Dong 等[21]所采用的“SOS ”增强算法的近似数学表达式如下:J n +1=g (I +J n )-J n(13)层1DCPA Block输出层2DCPA Block输出层3DCPA Block输出编码器CCC层1“SOS ”Block层2“SOS ”Block层3“SOS ”Block解码器LeakyReLU-1×1Conv-LeakyReLU SEBlocky 2CSACaty 1coocat y 3CSACat y 2CSAEBlocky 1CSAEBlock y 3CSAEBlockD 2inD 1inD 3in图4层间多尺度聚合结构Fig.4Multi-scale aggregation structure128第6期陈炳权等:基于深度多级小波U-Net 的车牌雾图去雾算法其中:I 为输入雾图像;J n 为第n 层估计的去雾图像;I +J n 表示使用输入雾图进行增强的图像;g (⋅)表示去雾方法或者优化方法.在U-Net 中编解码器之间的嵌入方式如下:将U-Net 编解码器之间的连接方式改为了逐元素相加,即编码器第i 层输出的聚合特征D i in 加上对应的解码器的第i 层输入特征D i +1out(经上采样操作得到与D i in 相同的图片大小);将逐元素相加后的输出接入到优化单元(即残差组)中进行进一步的特征细化,随后减去解码器的第i 层输入D i +1out .通过上述嵌入方式,模块输出为:y i sos =g i (D i in +(D i +1out )↑2)-(D i +1out )↑2(14)其中:↑2表示尺度因子为2的上采样操作.其与U-Net 相结合的结构示意图如图6所示.解码器的第i +1层输出编码器的第i 层输出逐元素相减优化单元g (·)逐元素相加2D DWT2D IDWT图6“SOS ”深度增强模块Fig.6"SOS"depth enhancement module1.5跨尺度聚合增强模块(CSAE Block )为了弥补小波域中去雾网络所忽略的精细空间域图像特征信息,本文提出了一种基于残差组的跨尺度聚合增强模块(CSAE Block ),对小波域网络(模块A )最后输出的重构图像进行空间域上的图像细节特征补充.CSAE Block 结构如图7所示.CSAE Block 主要由卷积层(即“3×3卷积-Lea⁃kyReLU ”)、残差组、平均池化、CSA 和级联操作构成.首先,卷积层和残差组负责对模块A 输出的空间域图像y 模块A 进行特征提取,平均池化将输入特征分解为4个不同尺度(S 1=14,S 2=18,S 3=116和S 4=132)的输出,即:y S 1,y S 2,y S 3,y S 4=AvgPool (ResGroup (Conv 13(y 模块A)))(15)然后,CSA 实现对输入的不同尺度、空间分辨率的特征信息进行聚合,从而达到在所有尺度级别上有用信息的融合,并在每个分辨率级别上生成精细特征;最后,通过“LeakyRelu-Conv-LeakyRelu ”操作来对输入特征的通道数进行调整.该模块可以通过聚合不同尺度之间、不同分辨率之间的特征来使去雾网络获得较强的上下文信息处理能力.该聚合操作可表示如下:y SjCSA =⊕S i∈{}1,2,3,4F S j S i(y Si)(16)式中:y S jCSA 表示第j 个尺度S j 的CSA 输出特征,j =1,2,3,4;F S j Si(⋅)表示将尺度为S i 的特征图采样到尺度为S j 的特征图.同时,在该模块中引入短连接以改y 模块A1/161/321/81/4卷积层残差组平均池化跨尺度聚合(CSA )y ResGroupy S 2CSAy S 3CSAy S 4CSAy S 1CSA逐元素相加LeakyReLU-1×1Conv-LeakyReLU-1×1Conv-LeakyReLU-UpSample 3×3Conv-y CSACat级联Cy CSAEBlock图7CSAE Block 结构Fig.7CSAE block129湖南大学学报(自然科学版)2022年善其梯度流.综上所述,CSAE Block 总的输出表达式为:ìíîïïïïy 
CSACat =Cat j =14()F up ()LeakyReLU 0.2()Conv 11()y S jCSAy CSAE Block =Conv 13()Cat ()y CSACat ,y ResGroup (17)其中y CSACat 表示对上采样为统一大小的输出特征进行级联操作.1.6损失函数为了获得更好的去雾效果,本文使用3个损失函数(有图像去雾损失L rh 、边缘损失L edge 以及对比度增强损失L ce )作为网络训练的总损失L total ,即:L total =αL rh +γL edge -λL ce(18)其中:α,γ和λ均为任意非负常数.1)图像去雾损失函数.本文将L 1损失和L 2损失进行简单结合,构成车牌图像去雾损失函数:L rh =1N ∑i =1N ()I i gt -F Net ()I i haze 1+I i gt -F Net ()I ihaze 2(19)式中:N 表示输入图像像素数;I gt 为干净图像;I haze 为车牌雾图;F Net (⋅)表示车牌去雾网络函数.一方面通过L 1损失函数来获得较高的PSNR 和SSIM 值,另一方面则通过L 2损失函数来尽可能提高去雾图像的保真度.2)边缘损失函数.为了加强输出去雾图像的边缘轮廓信息,本文利用Sobel 边缘检测算法来获得车牌去雾图像和干净图像的边缘轮廓图,分别为E FNet()I haze和E I gt,计算两者的L 1范数,获得边缘损失函数:L edge=E FNet()I haze-E I gt1(20)3)对比度增强损失.为了提高车牌去雾图像的颜色对比度,本文最大限度地提升每个单独颜色通道的方差,即最大化如下表达式:L ce=(21)式中:x 表示图像的像素索引;FˉNet (I haze )为去雾网络输出图像F Net (I haze )的平均像素值.值得注意的是,所期望的输出去雾图像应该增强其对比度,所以L ce 需要最大化,因此在总损失L total 中应减去该项.2训练与测试2.1车牌雾图数据集(LPHaze Dataset )的制作为了解决车牌去雾网络训练过程中缺失的车牌雾图数据集问题,受RESIDE 数据集制作方法的启示,本文采用成熟的ASM 理论来进行车牌雾图数据集的制作.车牌图像数据主要来源于OpenITS 提供的OpenData V3.1-SYSU 功能性能车牌图像数据库,并以中科大开源数据集CCPD [22]作为补充,具体制作方法如下:1)预处理.从OpenITS 和CCPD 数据集中随机选取2291张清晰图像,并对这些清晰车牌图像的车牌区域进行截取;2)配置大气散射模型参数值.参照RESIDE 数据集所选取的参数值范围,随机选取如下一组大气光值A =[0.6,0.7,0.8,0.9,1.0]和一组大气散射系数值β=[0.4,0.6,0.8,1.0,1.2,1.4,1.6],并将场景深度d (x )置为1;3)合成车牌有雾-无雾图像对.采取一张清晰车牌图像对应多张车牌雾图的方法来合成图像对,即根据大气散射模型,结合步骤2中选定的参数值,以一张车牌无雾图像对应35张有雾图像的方法来合成数据集.合成车牌雾图示例如图8所示.(a )原图(b )(A =0.6,β=0.4)(c )(A =0.7,β=0.8)(d )(A =0.8,β=1.2)(e )(A =1.0,β=1.6)图8合成车牌雾图示例Fig.8Example of fog map of composite license plate4)划分训练集和验证集.训练集中干净车牌图像共1697张,对应的车牌雾图共59395张;验证集中干净图像共594张,对应车牌雾图共20790张.2.2实验设置本文采用自制的车牌雾图数据集(LPHaze Data⁃130第6期陈炳权等:基于深度多级小波U-Net 的车牌雾图去雾算法set )作为车牌图像去雾网络的训练和验证数据,其中所有图像像素大小均设置为64×128×3,batch size 为64.并对训练数据进行数据增强操作:随机水平翻转和随机垂直翻转(翻转概率随机取值0或1),以提升网络的鲁棒性.此外,在训练过程中,使用Adam 优化器来优化网络,其参数均采用默认值(β1=0.9和β2=0.999),并通过梯度裁剪策略加速网络收敛,训练800个epoch ,初始学习率为1e -4,总损失函数中α=1、γ=0.1和λ=0.01.采用Pytorch 包进行车牌雾图去雾网络结构代码的编写和训练,整个训练过程均在NVIDIA Tesla T4的显卡上运行.实验主要包括两个部分:其一,测试本文提出的车牌雾图去雾网络模型,其二,对其进行消融实验.上述实验在合成车牌雾图和自然拍摄的车牌雾图上进行测试,所有测试图像均随机选自OpenITS 提供的车牌图像数据库(与LPHaze Dataset 训练集中的数据不重合),并从测试结果中选取如下5种组合类型进行定性和定量分析,分别为(A =0.6,β=0.8)、(A =0.7,β=1.0)、(A =0.8,β=1.2)、(A =0.9,β=1.4)和(A =1.0,β=1.6),同时对其依次编号为组合A 到E.3结果与分析3.1测试实验1)合成车牌雾图去雾结果为了进一步评估算法性能,将本文算法与最近出现的经典去雾算法(基于引导滤波器的暗通道先验算法[4](GFDCP )、基于深度学习的端到端的去雾网络[8](DehazeNet )、端到端的一体化去雾网络(AODNet )、端到端门控上下文聚合去雾网络[23](GCANet )和端到端特征融合注意力去雾网络[17](FFANet ))进行比较.以上算法统一在LPHaze Data⁃set 的验证集上进行测试,并选取其中5类合成车牌雾图的测试结果进行实验分析,其结果见表2.由表2可以看出,在5类不同大气光和散射系数的合成车牌雾图上,与GCANet 得到的结果相比,在PSNR 均值上分别提高了5.64、6.74、8.84、10.52、11.88dB ,SSIM 均值上则分别提高了0.0368、0.0599、0.0991、0.1496、0.2225.同时,在图9中的PSNR 和24222018161412108P S N R (d B )组合A组合B组合C组合D组合EGFDCPDehazeNet AODNet GCANet FFANet 本文算法组合类型(a )6种算法在5类组合上的PSNR 均值曲线GFDCP DehazeNet AODNet GCANet FFANet 本文算法1.00.90.80.70.60.5S S I M组合A组合B组合C组合D组合E组合类型(b )6种算法在5类组合上的SSIM 均值曲线图96种算法在5类合成车牌雾图上的PSNR 和SSIM 均值曲线Fig.9PSNR and SSIM mean curves of 6algorithms on5types of composite license plate fog map表2合成车牌雾图去雾图像的PSNR (dB )/SSIM 均值Tab.2PSNR (dB )/SSIM mean of defogging image of composite license plate fog image组合类型(A =0.6,β=0.8)(A =0.7,β=1.0)(A =0.8,β=1.2)(A =0.9,β=1.4)(A =1.0,β=1.6)GFDCP20.75/0.946119.23/0.924817.85/0.900715.63/0.861612.70/0.8035DehazeNet19.31/0.895216.92/0.846014.20/0.793611.71/0.74509.57/0.6882AODNet13.79/0.775715.12/0.801314.60/0.745211.64/0.64818.01/0.5349GCANet18.86/0.925516.31/0.890613.82/0.845911.92/0.791310.14/0.7091FFANet18.09/0.894718.65/0.878419.31/0.851212.76/0.71678.61/0.5407本文算法24.50/0.962323.05/0.950522.66/0.945022.44/0.940922.02/0.9316131湖南大学学报(自然科学版)2022年SSIM 
均值曲线图中亦可知,在重构图像质量方面,本文算法在处理不同雾度的合成车牌雾图上明显优于上述5类经典算法.最后,从合成车牌雾图的去雾图像中选取部分图片进行效果展示,如图10所示.从去雾效果中可以直观感受到,本文算法相较于其它算法而言,具有较少的雾度残留以及颜色失真.2)自然车牌雾图去雾结果本文还对实际拍摄的自然车牌雾图进行测试,并与上述5种经典算法的去雾效果进行视觉比较.该测试数据选自OpenITS 的车牌图像数据库,共915张实际拍摄的车牌雾图,视觉对比结果如图11所示.从图11可知:在处理常见的蓝底车牌雾图时,本文算法很少出现过度曝光和图像整体偏暗的问题,且雾度残留也很少;对于其它底色的车牌雾图(如图11中的黄底和蓝底车牌),本文算法在去雾效果上相较于上述5种经典算法仍能保持自身优势,并且在颜色、字符等图像信息上也能得到较好的恢复.(a )(A =0.6,β=0.8)(b )(A =0.7,β=1.0)(c )(A =0.8,β=1.2)(d )(A =0.9,β=1.4)(e )(A =1.0,β=1.6)合成雾图GFDCPDehazeNetAODNetGCANetFFANet本文算法干净图像图10合成车牌雾图去雾效果展示Fig.10Fog removal effect display of composite license plate图11实际拍摄的车牌雾图去雾效果展示比较Fig.11Comparison of defogging effect of actual license plate fog map自然车牌雾图GFDCP DehazeNet AODNet GCANet FFANet 本文算法132第6期陈炳权等:基于深度多级小波U-Net的车牌雾图去雾算法3.2不同模块对网络性能的影响为了分析其中各个模块的重要性,本文在LP⁃Haze Dataset数据集上进行消融研究分析,以基于ResGroup和“SOS”Block的MWCNN去雾网络作为基准网络模块,对于其他不同的网络模块,其简化形式及说明如下,R1:基于ResGroup和“SOS”Block的MWCNN作为基准网络,该网络不包含DCPA Block、LMSA和CSAE Block;R2:具有DCPA Block的基准网络;R3:具有DCPA Block、LMSA和CSAE Block的基准网络,即最终的车牌雾图去雾网络.上述网络模块均只训练150个epoch,且初始学习率均为1e-4,并在LPHaze Dataset的验证集上进行测试,其测试结果如表3所示.表3不同网络模块在LPHaze Dataset的验证集上测试结果的均值Tab.3Mean value of test results of different networkmodules on the verification set of LPHaze Dataset网络R1 R2 R3“SOS”Block√√√DCPABlock√√LMSA√CSAEBlock√PSNR/dB22.4722.4323.27SSIM0.94210.94320.9513由表3可得,在不加入DCPA Block、LMSA和CSAE Block的情形下,PSNR和SSIM的均值分别可以达到22.47dB和0.9421,而在加入三者后,PSNR 和SSIM均值则分别提升了0.8dB和0.0092,从而使网络能重建出高质量的去雾图像.3.3不同损失函数对网络性能的影响为了分析损失函数的有效性,本文算法分别采用L1、L2、L rh(即L1和L2损失的简单结合)和L total这四类损失函数来进行网络模型的训练,训练150个ep⁃och,且初始学习率为1e-4.分别在LPHaze Dataset的验证集上进行PSNR和SSIM的指标测试,其实验结果如表4所示.从表4可知,只使用L rh损失函数时,表4不同损失函数下车牌去雾网络测试结果Tab.4Test results of license plate defogging networkunder different loss functions损失函数L1L2L rhL total PSNR/dB22.7422.1923.0623.27SSIM0.94170.93710.94710.9513平均PSNR和SSIM可达到23.06dB和0.9471,且相较于单独使用L1或L2时均有着明显提升.而使用总损失函数L total时,平均PSNR和SSIM分别提升了0.21dB和0.0042,从而使网络性能得到较大的改善.4结论本文提出了一种基于深度多级小波U-Net的车牌雾图去雾算法,该算法以MWCNN作为去雾网络主体框架.首先,为了在小波域和空间域中尽可能地整合不同层级和尺度的图像特征,引入了“SOS”增强策略,并在MWCNN中间进行跨层连接,以此对图像特征进行整合、完善和优化;其次,本文将像素注意力、通道注意力和离散小波变换进行有效融合,从而尽可能去除车牌雾图中的雾度;最后,通过跨尺度聚合增强模块来弥补小波域和空间域之间存在的图像信息差异,进一步提高了重构图像质量.自然车牌雾图和LPHaze Dataset的验证集上的实验结果表明,在处理具有不同大气光照和雾度的车牌雾图上,本文算法具有较好的去雾表现,并且在处理具有不同底色的车牌雾图时具有一定的优势.参考文献[1]YADAV G,MAHESHWARI S,AGARWAL A.Foggy image en⁃hancement using contrast limited adaptive histogram equalizationof digitally filtered image:performance improvement[C]//2014In⁃ternational Conference on Advances in Computing,Communica⁃tions and Informatics(ICACCI).September24-27,2014,Delhi,India.IEEE,2014:2225-2231.[2]RUSSO F.An image enhancement technique combining sharpen⁃ing and noise reduction[J].IEEE Transactions on Instrumenta⁃tion and Measurement,2002,51(4):824-828.[3]GALDRAN A,BRIA A,ALVAREZ-GILA A,et al.On the dual⁃ity between retinex and image dehazing[C]//2018IEEE/CVF Con⁃ference on Computer Vision and Pattern Recognition.June18-23,2018,Salt Lake City,UT,USA.IEEE,2018:8212-8221.[4]HE K,SUN J,TANG X.Guided image filtering[C]//DANIILIDISK,MARAGOS P,PARAGIOS puter Vision-ECCV2010,11th European Conference on Computer Vision,Herak⁃lion,Crete,Greece,September5-11,2010,Proceedings,Part I.Springer,2010,6311:1–14.[5]HE K M,SUN J,TANG X O.Single image haze removal usingdark channel prior[C]//IEEE Transactions on Pattern Analysisand Machine Intelligence.IEEE,:2341-2353.[6]ZHU Q S,MAI J M,SHAO L.A fast single image haze removal133。