Journal of Literature and Art Studies, July 2023, Vol. 13, No. 7, 511-513
doi: 10.17265/2159-5836/2023.07.006

Analysis of Animal Image in Art Spiegelman's Maus

XUE Si-qi
Wuhan University of Technology, Wuhan, China
(XUE Si-qi, postgraduate, School of Foreign Languages, Wuhan University of Technology.)

The visual metaphors of cats for Germans, mice for Jews, pigs for Poles, and dogs for Americans make the story compelling without any extraneous description. In this paper, I explore the animal images in Maus: the mice, the cats, and the pigs. By examining the choice of the mouse image, the symbolic meaning of the mice, and the function of the "masks" the mice put on to pass as others, the paper interprets the widely acknowledged mouse metaphor in Maus and, through it, the miserable history of the Nazi Holocaust.

Keywords: animal image, metaphor, Maus

Animal images in literary works are constructed from an animal's external manifestations and internal essence, including its appearance, posture, and facial features as well as its character traits and spiritual temperament. People's cognition of animal images and the formation of animal conceptions are inevitably influenced by the social and cultural context. A descendant of Jews brought up in America, Art Spiegelman inherits both Jewish and American culture. His work Maus is an autobiographical graphic novel that narrates the history of the Holocaust. Built on a two-line structure, Maus tells readers what happens between "I" and my father in the present, while showing readers the dark and cruel living conditions of the Jews during World War II.

Mice—Jews as Victims

In the graphic narration of Maus, Spiegelman imposes the mouse as a metaphor on all Jews. Mice are regarded by human beings as vermin that people try by every means to exterminate. During World War II the Jews lived in an abyss of misery, just like mice in a dark ditch, despised and disgusted. They were either piled up and killed in mass shootings by the German Nazis, or lived in fear and agitation, hiding all the time to avoid arrest and torture. To get food to survive, they had to risk their lives trading on the black market. Some Jews would dig a hole in the ground as a hiding place and live just like mice to avoid being captured by the German Nazis. In the world Maus portrays, the adult mice are barricaded in the Auschwitz concentration camp to do hard labor, while the weak and the sick are sent to the gas chambers or crematoriums, where their lives end. Even the crying little mice are killed directly, without hesitation. The image of the mice here is not only a symbol of the Jews as victims during World War II, but also a reflection of their living conditions and status. The Jews can only live in dark, filthy cages awaiting slaughter, with no dignity, no personality, and no rights. Facing the ferocious and brutal Nazis, they are like mice before cats, to be teased, beaten, and even killed.

Mice—Jewish Ethnic Group

In his visual treatment, the author weakens the features of each character, so that every mouse presents a similar facial image. This processing generalizes the image while retaining its main features. The specific individual faces of the persecuted Jews are replaced by a unified image of mice; all we can see are mouse faces depicted with dots and lines.
The rough lines and expressionless faces of the mice are a record of the dazed, miserable, helpless, and tragic situation of the Jewish collective.

With regard to the creation of Maus, Art has noted:

All kinds of elisions and ellipses and compressions are a part of any shaped work, and my goal was to not betray what I could find out or what I had heard or what I knew but to give a shape to it. But giving shape also involves, by definition, the risk of distorting the underlying reality. Perhaps the only honest way to present such material is to say: "Here are all the documents I used, and here's like thousands of hours of tape recording, and here's a bunch of photographs to look at. Now, go make yourself a Maus!"

The homogeneous visualization of the group identities of the animal species is a satire, achieved through Spiegelman's sophisticated visual narrative strategy, of Hitler's categorization of human beings by race and nationality. Spiegelman intentionally effaces individual differences in a fashion similar to what Hitler did to the Jews. Visually, the obscuring of individual identity makes the group differences distinct, implying that the Nazi party, as a collective image, committed the atrocity of the Holocaust. In this way, Maus is more than the story of an individual; it stands for a part of Jewish history.

Mice—Covered in Masks

In people's cognition, the cat is often the nemesis of the mouse. This highlights the antagonistic relationship between the Jews and the German Nazis: if the Jews are the "victims", then the Nazis are the "oppressors". The German Nazis arrested and massacred Jews during World War II, eager to eradicate all the Jews in the world. In Maus, the cats are depicted as dangerous and fierce. Both the German soldiers in steel helmets and the German civilians are vicious: open-mouthed, with pointed teeth, matted hair, and whiskers, as if ready to pounce on their prey at any moment. Spiegelman replaces the individual images of the German Nazis with a unified, evil cat image. By doing so, the collective identity and characteristics of the Germans become clearer: they are the executioners who organize and plan to exterminate the entire Jewish nation, driven by Hitler's categorization of peoples.

The image of the pig constructs the identity of another ethnic group, the Polish people. Pigs often appear lazy, stupid, and greedy, and Art uses the pig image to show the characteristics of the Poles to a certain extent. The Polish tutor turns the Vladek family away although they used to be very close. The Polish escapee deceives and betrays Vladek and Anja, leading to their capture by the German Nazis and their imprisonment in Auschwitz. In Auschwitz, the Polish prison chiefs persecute the Jews even more cruelly than the German Nazis do. This group was, on the one hand, oppressed and persecuted by the German Nazis; on the other hand, it became an accomplice in the crime. Its existence made the fate of the Jews even more tragic and helpless, adding a darker stroke to that dark history.

Within the narrative of Maus, the Jews actively attempt to cover up their identity: they are presented as characters wearing pig masks to pass for non-Jewish Poles, a graphic representation of their struggle to stand apart from their own identity. The Jews eagerly seek a safe persona: that of the privileged Pole, not the oppressed Jew.
While the disguise in Maus usually includes a form of dress, it is through the metaphorical mask device alone that readers recognize a character in disguise. In the scene where Vladek, passing as a Pole, meets a disguised Jew in Sosnowiec, the panels are framed in such a way that the other character's true identity is at first concealed from readers. The left-hand panel borders crop the "pig's" face so that, by not allowing a full view of the character's face, readers might not see the mask device. Only when he identifies himself as a Jew are we allowed to see the mask in place; only then do we as readers learn that the other man, like Vladek, is in the process of standing outside his true self. By drawing all of the characters as masked humans, Spiegelman makes a move to somehow universalize, or at least compartmentalize, his experience, yet that very act of compartmentalization is itself inauthentic.

Conclusion

In Maus, Spiegelman gives Vladek a mouse identity and imposes that metaphor on all Jews, even on Art himself as a character within the main narrative. However, it seems that Spiegelman also wants to ask us about the authenticity of the metaphors. The very title of this chapter is significant in this respect. The previous chapter is entitled "Mauschwitz", playfully punning on the mouse metaphor. In the narrative, the characters use the real name of the place, Auschwitz. Here, where the weight of the past comes crashing into the present, the metaphor's limits are exploded, and linguistic puns no longer seem fitting to the tone. "Mauschwitz" becomes "Auschwitz": there is no mistaking the metaphors: Jews are Jews, not mice, and the camp is Auschwitz, not a cleverly titled stand-in. When the mask becomes clearly a metaphor, the truth behind it shines more clearly.
Quantum Walks on the Hypercube

Cristopher Moore
Computer Science Department, University of New Mexico, Albuquerque
and the Santa Fe Institute, Santa Fe, New Mexico
moore@

Alexander Russell
Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut
acr@

November 12, 2001

Abstract

Recently, it has been shown that one-dimensional quantum walks can mix more quickly than classical random walks, suggesting that quantum Monte Carlo algorithms can outperform their classical counterparts. We study two quantum walks on the n-dimensional hypercube, one in discrete time and one in continuous time. In both cases we show that the instantaneous mixing time is (π/4)n steps, faster than the Θ(n log n) steps required by the classical walk. In the continuous-time case, the probability distribution is exactly uniform at this time. On the other hand, we show that the average mixing time as defined by Aharonov et al. [AAKV01] is Ω(n^{3/2}) in the discrete-time case, slower than the classical walk, and nonexistent in the continuous-time case. This suggests that the instantaneous mixing time is a more relevant notion than the average mixing time for quantum walks on large, well-connected graphs. Our analysis treats interference between terms of different phase more carefully than is necessary for the walk on the cycle; previous general bounds predict an exponential average mixing time when applied to the hypercube.

1 Introduction

Random walks form one of the cornerstones of theoretical computer science. As algorithmic tools, they have been applied to a variety of central problems, such as estimating the volume of a convex body [DFK91, LK99], approximating the permanent [JS89, JSV00], and finding satisfying assignments for Boolean formulae [Sch99]. Furthermore, the basic technical phenomena appearing in the study of random walks (e.g., spectral decomposition, couplings, and Fourier analysis) also support several other important areas such as pseudorandomness and derandomization (see, e.g., [AS92], Chapters 9 and 15).

The development of efficient quantum algorithms for problems believed to be intractable for classical randomized computation, like integer factoring and discrete logarithm [Sho97], has prompted the investigation of quantum walks. This is a natural generalization of the traditional notion discussed above where, roughly, the process evolves in a unitary rather than stochastic fashion.

The notion of "mixing time," the first time when the distribution induced by a random walk is sufficiently close to the stationary distribution, plays a central role in the theory of classical random walks. For a given graph, then, it is natural to ask if a quantum walk can mix more quickly than its classical counterpart. (Since a unitary process cannot be mixing, we define a stochastic process from a quantum one by performing a measurement at a given time or a distribution of times.) Several recent articles [AAKV01, ABN01, NV00] have answered this question in the affirmative, showing, for example, that a quantum walk on the n-cycle mixes in time O(n log n), a substantial improvement over the classical random walk, which requires Θ(n^2) steps to mix. Quantum walks were also defined in [Wat01], and used to show that undirected graph connectivity is contained in a version of quantum LOGSPACE. These articles raise the exciting possibility that quantum Monte Carlo algorithms could form a new family of quantum algorithms that work more quickly than their classical counterparts.

Two types of quantum walks exist in the literature. The first, introduced by [AAKV01, ABN01, NV00], studies the behavior of a "directed particle" on the
graph;we refer to these as discrete-time quantumwalks.The second,introduced in[FG98,CFG01],defines the dynamics by treating the adjacency matrixof the graph as a Hamiltonian;we refer to these as continuous-time quantum walks.The landscape isfurther complicated by the existence of two distinct notions of mixing time.The“instantaneous”notion[ABN01,NV00]focuses on particular times at which measurement induces a desired distribution,whilethe“average”notion[AAKV01],another natural way to convert a quantum process into a stochastic one,focuses on measurement times selected randomly from some interval.In this article,we analyze both the continuous-time and a discrete-time quantum walk on the hypercube.In both cases,the walk is shown to have an instantaneous mixing time atπ4n.(Of course,in the discretewalk all times are integers.)Recall that the classical walk on the hypercube mixes in timeΘn log n,sothat the quantum walk is faster by a logarithmic factor.Moreover,in the discrete-time case the walk mixesin time less than the diameter of the graph,sinceπ41;astonishingly,in the continuous-time case theprobability distribution at tπ4n is exactly uniform.Both of these things happen due to a marvelousconspiracy of destructive interference between terms of different phase.These walks show i.)a similarity between the two notions of quantum walks,and ii.)a disparity between the two notions of quantum mixing times.As mentioned above,both walks have an instantaneous mixing time at timeπ4n.On the other hand,we show that the average mixing time of the discrete-time walk isΩn32,slower than the classical walk,and that for the continuous-time walk there is no time at which the time-averaged probability distribution is close to uniform in the sense of[AAKV01].Our results suggest that for large graphs(large compared to their mixing time)the instantaneous notion of mixing time is more appropriate than the average one,since the probability distribution is close to uniform only in a narrow window of time.The analysis of the hypercubic quantum walk exhibits a number of features markedly different fromthose appearing in previously studied walks.In particular,the dimension of the relevant Hilbert space is,for the hypercube,exponential in the length of the desired walk,while in the cycle these quantities are roughly equal.This requires that interference be handled in a more delicate way than is required for the walk on the cycle;in particular,the general bound of[AAKV01]yields an exponential upper bound on the mixing time for the discrete-time walk.We begin by defining quantum walks and discussing various notions of mixing time.We then analyze the two quantum walks on the hypercube in Sections2and3.(Most of the technical details for the discrete-time walk are relegated to an appendix.)1.1Quantum walks and mixing timesAny graph G V E gives rise to a familiar Markov chain by assigning probability1d to all edges leaving each vertex v of degree d.Let P t u v be the probability of visiting a vertex v at step t of the random walk on G starting at u.If G is undirected,connected,and not bipartite,then lim t∞P t u exists1and is independent of u.A variety of well-developed techniques exist for establishing bounds on the rate at which P t u achieves this limit(e.g.,[Vaz92]);if G happens to be the Cayley graph of a group(as are,for example,the cycle and the hypercube),then techniques from Fourier analysis can be applied[Dia88].Below we will use some aspects of this approach,especially the Diaconis-Shahshahani bound on the total 
variation distance[DS81].For simplicity,we restrict our discussion to quantum walks on Cayley graphs;more general treatments of quantum walks appear in[AAKV01,FG98].Before describing the quantum walk models we set down some notation.For a group G and a set of generatorsΓsuch thatΓΓ1,let X GΓdenote the undirected Cayley graph of G with respect toΓ.For a finite set S,we let L S f:S denote the collection of-valued functions on S with∑s S f s2 1. This is a Hilbert space under the natural inner product f g∑s S f s g s.For a Hilbert space V,a linear operator U:V V is unitary if for all v w V,v w U v U w;if U is represented as a matrix,this is equivalent to the condition that U†U1where†denotes the Hermitian conjugate.There are two natural quantum walks that one can define for such graphs,which we now describe.T HE DISCRETE-TIME WALK:This model,introduced by[AAKV01,ABN01,NV00],augments the graph with a direction space,each basis vector of which corresponds one of the generators inΓ.A step of the walk then consists of the composition of two unitary transformations;a shift operator which leaves the direction unchanged while moving the particle in its current direction,and a local transformation which op-erates on the direction while leaving the position unchanged.To be precise,the quantum walk on X GΓis defined on the space L GΓL G LΓ.LetδγγΓbe the natural basis for LΓ,andδg g G the natural basis for L G.Then the shift operator is S:δgδγδgγδγ,and the local transformation isˇD1D where D is defined on LΓalone and1is the identity on L G.Then one“step”of the walk corresponds to the operator UˇDV.If we measure the position of the particle,but not its direction,at time t,we observe a vertex v with probability P t v∑γΓU tψ0δvδγ2whereψ0L GΓis the initial state.T HE CONTINUOUS-TIME WALK:This model,introduced in[FG98],works directly on L G.The walk evolves by treating the adjacency matrix of the graph as a Hamiltonian and using the Schr¨o dinger equation. Specifically,if H is the adjacency matrix of X GΓ,the evolution of the system at time t is given by U t, where U t eq e iHt(here we use the matrix exponential,and U t is unitary since H is real and symmetric). 
Then if we measure the position of the particle at time t,we observe a vertex v with probability P t vU tψ0δv2whereψ0is the initial state.Note the analogy to classical Poisson processes:since U t e iHt 1In fact,this limit exists under more general circumstances;see e.g.[MR95].1iHt iHt22,the amplitude of making s steps is the coefficient it s s!of H s,which up to normalization is Poisson-distributed with mean t.Remark.In[CFG01],the authors point out that defining quantum walks in continuous time allows unitarity without having to extend the graph with a direction space and a chosen local operation.On the other hand,it is harder to see how to carry out such a walk in a generically programmable way using only local information about the graph,for instance in a model where we query a graph tofind out who our neighbors are.Instead,continuous-time walks might correspond to special-purpose analog computers,where we build in interactions corresponding to the desired Hamiltonian and allow the system to evolve in continuous time.In both cases we start with an initial wave function concentrated at a single vertex corresponding to the identity u of the group.For the continuous-time walk,this corresponds to a wave functionψ0v ψ0δv(so thatψ0u1andψ0v0for all v u).For the discrete-time walk,we start with a uniform superposition over all possible directions,ψ0vγψ0δvδγ1Γif v u 0otherwise.For the hypercube,u000.In order to define a discrete-time quantum walk,one must select a local operator D on the directionspace.In principle,this introduces some arbitrariness into the definition.However,if we wish D to respectthe permutation symmetry of the n-cube,and if we wish to maximize the operator distance between D andthe identity,we show in Appendix A that we are forced to choose Grover’s diffusion operator[Gro96],which we recall below.We call the resulting walk the“symmetric discrete-time quantum walk”on then-cube.(Watrous[Wat01]also used Grover’s operator to define quantum walks on undirected graphs.)Since for large n Grover’s operator is close to the identity matrix,one might imagine that it would takeΩn12steps to even change direction,giving the quantum walk a mixing time of n32,slower than the classical random walk.However,like many intuitions about quantum mechanics,this is simply wrong.Since the evolution of the quantum walk is governed by a unitary operator rather than a stochastic one,unless P t is constant for all t,there can be no“stationary distribution”lim t∞P t.In particular,for anyε0,there are infinitely many(positive,integer)times t for which U t1εso that U tψuψuεand P t is close to the initial distribution.However,there may be particular stopping times t which induce distributions close to,say,the uniform distribution,and we call these instantaneous mixing times:Definition1We say that t is anε-instantaneous mixing time for a quantum walk if P t Uε,where A B12∑v A v B v denotes total variation distance and U denotes the uniform distribution.For these walks we show:Theorem1For the symmetric discrete-time quantum walk on the n-cube,t kπ4n is anε-instantaneousmixing time withεO n76for all odd k.and,even more surprisingly,Theorem2For the continuous-time quantum walk on the n-cube,t kπ4n is a0-instantaneous mixingtime for all odd k.Thus in both cases the mixing time isΘn,as opposed toΘn log n as it is in the classical case.Aharonov et al.[AAKV01]define another natural notion of mixing time for quantum walks,in whichthe stopping time t is selected uniformly from the set0T1.They show that the 
time-averageddistributions¯P T1T∑T1t0P t do converge as T∞and study the rate at which this occurs.For a continuous-time random walk,we analogously define the distribution¯P T v1T T0P t v d t.Then we call a time at which¯P T is close to uniform an average mixing time:Definition2We say that T is anε-average mixing time for a quantum walk if¯P T Uε.In this paper we also calculate theε-average mixing times for the hypercube.For the discrete-time walk,it is even longer than the mixing time of the classical random walk:Theorem3For the discrete-time quantum walk on the n-cube,theε-average mixing time isΩn32ε. This is surprising given that the instantaneous mixing time is only linear in n.However,the probability distribution is close to uniform only in narrow windows around the odd multiples ofπ4n,so¯P T is far from uniform for significantly longer times.We also observe that the general bound given in[AAKV01]yields an exponential upper bound on the average mixing time,showing that it is necessary to handle interference for walks on the hypercube more carefully than for those on the cycle.For the continuous-time walk the situation is even worse:while it possesses0-instantaneous mixing times at all odd multiples ofπ4n,the limiting distribution lim T∞¯P T is not uniform,and we show the following:Theorem4For the continuous-time quantum walk on the n-cube,there existsε0such that no time is an ε-average mixing time.Our results suggest that in both the discrete and continuous-time case,the instantaneous mixing time is a more relevant notion than the average mixing time for large,well-connected graphs.2The symmetric discrete-time walkIn this section we prove Theorem1.We treat the n-cube as the Cayley graph of n2with the regular basis vectors e i010with the1appearing in the i th place.Then the discrete-time walk takes place in the Hilbert space L n2n where n1n.Here thefirst component represents the position x of the particle in the hypercube,and the second component represents the particle’s current“direction”;if this is i,the shift operator willflip the i th bit of x.As in[AAKV01,NV00],we will not impose a group structure on the direction space,and will Fourier transform only over the position space.For this reason,we will express the wave functionψL n2n as a functionΨ:n2n,where the i th coordinate ofΨx is the projection ofψintoδxδi,i.e.the complex amplitude of the particle being at position x with direction i.The Fourier transform of such an elementΨis˜Ψ:n2n,where˜Ψk∑x1k xΨx.Then the shift operator for the hypercube is S:Ψx∑n i1πiΨx e i where e i is the i th basis vector in the n-cube,andπi is the projection operator for the i th direction.The reason for considering the Fourier transform above is that the shift operator isdiagonal in the Fourier basis:specifically it maps˜Ψk Sk˜Ψk whereS k 1k101k2...01k nFor the local transformation,we use Grover’s diffusion operator on n states,D i j2nδi j.The advantage of Grover’s operator is that,like the n-cube itself,it is permutation symmetric.We use thissymmetry to rearrange UkSkD to put the negated rows on the bottom,UkSkD2n12n2n2n12n......2n12n2n2n2n1......where the top and bottom blocks have n k and k rows respectively;here k is the Hamming weight of k.The eigenvalues of Uk then depend only on k.Specifically,Ukhas the eigenvalues1and1withmultiplicity k1and n k1respectively,plus the eigenvaluesλλwhereλ12kn2ink n k e iωkandωk0πis described bycosωk12knsinωk2nk n kIts eigenvectors with eigenvalue1span the k1-dimensional subspace consisting of vectors with support 
on the k“flipped”directions that sum to zero,and similarly the eigenvectors with eigenvalue1span the n k1-dimensional subspace of vectors on the n k other directions that sum to zero.We call these the trivial eigenvectors.The eigenvectors ofλλe iωk arev k v k1in k 1 kWe call these the non-trivial eigenvectors for a given k.Over the space of positions and directions these eigenvectors are multiplied by the Fourier coefficient1k x,so as a function of x and direction1j n the two non-trivial eigenstates of the entire system,for a given k,arev k x j1k x2n21if kj1i n k if k j0with eigenvalue e iωk,and its conjugate vkwith eigenvalue e iωk.We take for our initial wave function a particle at the origin000in an equal superposition of directions.Since its position is aδ-function in real space it is uniform in Fourier space as well as over thedirection space,giving˜Ψ0k2n2n 11This is perpendicular to all the trivial eigenvectors,so theiramplitudes are all zero.The amplitude of its component along the non-trivial eigenvector vkisa k Ψ0vk2n2kni1kn(1)and the amplitude of vk is ak.Note that ak22n2,so a particle is equally likely to appear in eithernon-trivial eigenstate with any given wave vector.At this point,we note that there are an exponential number of eigenvectors in which the initial state has a non-zero amplitude.In Section2.1,we observe that for this reason the general bound of Aharonov et al. [AAKV01]yields an exponential(upper)bound on the mixing time.In general,this bound performs poorly whenever the number of important eigenvalues is greater than the mixing time.Instead,we will use the Diaconis-Shahshahani bound on the total variation distance in terms of the Fourier coefficients of the probability[Dia88].If P t x is the probability of the particle being observed at position x at time t,and U is the uniform distribution,then the total variation distance is bounded byP t U214∑k00k11˜Pt k214n1∑k1nk˜Pt k2(2)Here we exclude both the constant term and the parity term k 11;since our walk changes position at every step,we only visit vertices with odd or even parity at odd or even times respectively.Thus U here means the uniform distribution with probability 2n 1on the vertices of appropriate parity.To find ˜Pt k ,we first need ˜Ψt k .As Nayak and Vishwanath [NV00]did for the walk on the line,we start by calculating the t th matrix power of U k .This isU t k a 1t aa a1t c ......b 1t bc bb 1t ......wherea cos ωk t 1t n k b cos ωk t 1t k and c sin ωk t Starting with the uniform initial state,the wave function after t steps is˜Ψt k 2n 2n cos ωk t n k sin ωk t n kcos ωk t n kksin ωk t k (3)In the next two sections we will use this diagonalization to calculate the average and instantaneous mixing times,which are Ωn 32and Θn respectively.2.1Bounds on the average mixing time of the discrete-time walkIn this section,we prove Theorem 3.To do this,it’s sufficient to calculate the amplitude at the origin.Fouriertransforming Equation 3back to real space at x 00gives ψt 02n 2∑k ˜Ψt k 2n n n ∑k 0n k cos ωk t cos ωk tnThe probability the particle is observed at the origin after t steps is thenP t 0ψt 022nn ∑k 0n k cos ωk t 2Let k 1x n 2.For small x ,k is near the peak of the binomial distribution,and ωk cos 1x π2x O x 3so the angle θbetween ωk for successive values of k is roughly constant,θ2n O x 2leading to constructive interference if θt 2π.Specifically,let t m be the even integer closest to πmn forinteger m .Then cos ωk t mcos 2πkm O x 3mn 1O x 6m 2n 2.By a standard Chernoff bound,2n ∑k 1x n 2n k o 1so long as 
x ωn 12.Let x νn n 12where νn is a function that goes to infinity slowly as a function of n .We then write P t 0o 12n1x n 2∑k 1x n 2n k 1O x 6m 2n 221O νn 6m 2nwhich is 1o 1as long as m o n 12νn 3,in which case t mo n 32νn 3.For a functionψ:n2n withψ21and a set S n2,we say thatψis c-supported on S if the probability x S is at least c,i.e.∑x S d nψx d2c.The discussion above shows thatψt m is 1o1-supported on0for appropriate t mπmn.Note that ifψis c-supported on0then,as U is local,U kψmust be c1c2c1-supported on W k,the set of vertices of weight k.(The factor of1c is due to potential cancellation with portions of the wave function supported outside0.)Inparticular,at times t m k,for k n2n,ψt m k is1o1-supported on W n2x.If x x nωn,then W12δn2n o1and,evidently,the average1T∑T i P i has total variation distance1o1from the uniform distribution if T o n32.Thus we see that in the sense of[AAKV01],the discrete-time quantum walk is actually slower than the classical walk.In the next section,however,we show that its instantaneous mixing time is only linear in n.We now observe that the general bound of[AAKV01]predicts an average mixing time for the n-cube which is exponential in n.In that article it is shown that the variation distance between¯P T and the uniform distribution(or more generally,the limiting distribution lim T∞¯P T)is bounded by a sum over distinct pairs of eigenvalues,¯PT U 2T∑i j s tλiλja i2λiλj(4)where a iψ0v i is the component of the initial state along the eigenvector v i.(Since this bound includes eigenvaluesλj for which a j0,we note that it also holds when we replace a i2with a i a j,using the same reasoning as in[AAKV01].)For the quantum walk on the cycle of length n,this bound gives an average mixing time of O n log n. For the n-cube,however,there are exponentially many pairs of eigenvectors with distinct eigenvalues,all ofwhich have a non-zero component in the initial state.Specifically,for each Hamming weight k there are nk non-trivial eigenvectors each with eigenvalue e iωk and e iωk.These complex conjugates are distinct from each other for0k n,and eigenvalues with distinct k are also distinct.The number of distinct pairs is thenn1∑k1nk24n∑k k0nknkΩ4nTaking a k2n2from Equation1and the fact thatλiλj2since theλi are on the unit circle, we see that Equation4gives an upper bound on theε-average mixing time of sizeΩ2nε.In general,this bound will give a mixing time of O Mεwhenever the initial state is distributed roughly equally over M eigenvectors,and when these are roughly equally distributed overω1distinct eigenvalues.2.2The instantaneous mixing time of the discrete-time walkTo prove Theorem1we could calculateΨt x by Fourier transforming˜P t k back to real space for all x. 
However,this calculation turns out to be significantly more awkward than calculating the Fourier trans-form of the probability distribution,˜P t k,which we need to apply the Diaconis-Shahshahani bound.Since P t xΨt xΨt x,and since multiplications in real space are convolutions in Fourier space,we perform a convolution over n2:˜Pt k∑k ˜Ψt k˜Ψt k kwhere the inner product is defined on the direction space,u v∑n i1u i v i.We write this as a sum over j, the number of bits of overlap between k and k,and l,the number of bits of k outside the bits of k(and so overlapping with k k).Thus k has weight j l,and k k has weight k j l.Calculating the dot product˜Ψt k˜Ψt k k explicitly from Equation3as a function of these weights and overlaps gives˜P t k 12k∑j0n k∑l0kjn klcosωj l t cosωk j l t A sinωj l t sinωk j l t(5)whereA cosωk cosωj l cosωk j l sinωj l sinωk j lThe reader can check that this gives˜P t01for the trivial Fourier component where k0,and˜P t n 1t for the parity term where k n.Using the identities cos a cos b12cos a b cos a b and sin a sin b12cos a b cos a b we can re-write Equation5as˜P t k 12k∑j0n k∑l0kjn kl1A2cosωt1A2cosωt12k∑j0n k∑l0kjn klY(6)whereωωj lωk j l.The terms cosωt in Y are rapidly oscillating with a frequency that increases with t.Thus,unlike the walk on the cycle,the phase is rapidly oscillating everywhere,as a function of either l or j.This will make the dominant contribution to˜P t k exponentially small when t nπ4,giving us a small variation distance when we sum over all k.To give some intuition for the remainder of the proof,we pause here to note that if Equation6were an integral rather than a sum,we could immediately approximate the rate of oscillation of Y tofirst order at the peaks of the binomials,where j k2and l n k 2.One can check that dωk d k2n and hence dωd l dωd j4n.Since A1,we would then write˜Pt k O 12k∑j0n k∑l0kjn kle4i jt n e4ilt nwhich,using the binomial theorem,would give˜Pt k O 1e4it n2k1e4it n2n kcos k2tncos n k2tn(7)In this case the Diaconis-Shahshahani bound and the binomial theorem giveP t U214∑0k nnkcos k2tncos n k2tn2122cos22tnn1cos22tnn1If we could take t to be the non-integer valueπ4n,these cosines would be zero.This will,in fact,turn out to be the right answer.But since Equation6is a sum,not an integral,we have to be wary of resonances where the oscillations are such that the phase changes by a multiple of2πbetween adjacent terms,in which case these terms will interfere constructively rather than destructively.Thus to show that thefirst-order oscillation indeed dominates,we have a significant amount of work left to do.The details of managing these resonances can be found in Appendix B.The process can be summarized as follows:i.)we compute the Fourier transform of the quantity Y in Equation6,since the sum of Equation6 can be calculated for a single Fourier basis function using the binomial theorem;ii.)the Fourier transform0.20.40.60.810.20.40.60.81(a)Variation distance at time t as a function of t n .(b)Log 2Probability as a function of Hamming weight.Figure 1:Graph (a)plots an exact calculation of the total variation distance after t steps of the quantum walk for hypercubes of dimension 50,100,and 200,as a function of t n .At t n π4the variation distance is small even though the walk has not had time to cross the entire graph.This happens because the distribution is roughly uniform across the equator of the n -cube where the vast majority of the points are located.Note that the window in which the variation distance is small gets narrower as n 
increases.Graph (b)shows the log 2probability distribution on the 200-dimensional hypercube as a function of Hamming distance from the starting point after 157π4n steps.The probability distribution has a plateau of 2199at the equator,matching the uniform distribution up to parity.of Y can be asymptotically bounded by the method of stationary phase.The dominant stationary point corresponds to the first-order oscillation,but there are also lower-order stationary points corresponding to faster oscillations;so iii.)we use an entropy bound to show that the contribution of the other stationary points is exponentially small.To illustrate our result,we have calculated the probability distribution,and the total variation distance from the uniform distribution (up to parity),as a function of time for hypercubes of dimension 50,100,and 200.In order to do this exactly,we use the walk’s permutation symmetry to collapse its dynamics to a function only of Hamming distance.In Figure 1(a)we see that the total variation distance becomes small when t n π4,and in Figure 1(b)we see how the probability distribution is close to uniform on a “plateau”across the hypercube’s equator.Since this is where the vast majority of the points are located,the total variation distance is small even though the walk has not yet had time to cross the entire graph.3The continuous-time walkIn the case of the hypercube,the continuous-time walk turns out to be particularly easy to analyze.Theadjacency matrix,normalized by the degree,is H x y1n if x and y are adjacent,and 0otherwise.Interpreting H as the Hamiltonian treats it as the energy operator,and of course increasing the energy makes the system run faster;we normalize by the degree n in order to keep the maximum energy of the system,and so the rate at which transitions occur,constant as a function of n .The eigenvectors of H and U t are simply the Fourier basis functions:if v k x1k x then Hv k 12k n v k and U t v k e it 12k n v k where we again use k to denote the Hamming weight of k .If our initial wave vector has a particle at 0,then its initial Fourier spectrum is uniform,and at time t we have˜Ψt k 2n 2e it 12k n Again writing the probability P as the convolution of Ψwith Ψin Fourier space,。
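The exact-uniformity statement of Theorem 2 is straightforward to check numerically. The following is a minimal sketch (an illustration added here, not part of the paper) that builds the degree-normalized adjacency matrix H of the n-cube, evolves the walk from a single vertex under exp(-iHt), and reports the total variation distance from uniform at t = (π/4)n. It assumes NumPy and a small n so that the 2^n-dimensional state fits in memory; the function names are illustrative only.

```python
import numpy as np

def hypercube_hamiltonian(n):
    """Degree-normalized adjacency matrix of the n-cube: H[x, y] = 1/n iff x, y differ in one bit."""
    N = 2 ** n
    H = np.zeros((N, N))
    for x in range(N):
        for i in range(n):
            H[x, x ^ (1 << i)] = 1.0 / n
    return H

def continuous_walk_distribution(n, t):
    """Probability distribution over vertices after evolving |0...0> for time t under exp(-iHt)."""
    H = hypercube_hamiltonian(n)
    w, V = np.linalg.eigh(H)               # H is real symmetric, so this diagonalizes it exactly
    psi0 = np.zeros(2 ** n, dtype=complex)
    psi0[0] = 1.0                          # the walk starts at the vertex 00...0
    psi_t = V @ (np.exp(-1j * w * t) * (V.conj().T @ psi0))
    return np.abs(psi_t) ** 2

def total_variation(p, q):
    """Total variation distance (1/2) * sum_v |p(v) - q(v)|."""
    return 0.5 * np.abs(p - q).sum()

if __name__ == "__main__":
    n = 8
    t = np.pi * n / 4.0
    p = continuous_walk_distribution(n, t)
    u = np.full(2 ** n, 2.0 ** -n)
    # Should print a value on the order of floating-point error, reflecting the exact uniformity claim.
    print(f"n = {n}, t = pi*n/4: TV distance from uniform = {total_variation(p, u):.2e}")
```

Running the same script at nearby times shows a distribution far from uniform, consistent with the observation above that the walk is close to uniform only in a narrow window around odd multiples of (π/4)n.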
A PTAS for the Multiple Knapsack ProblemChandra Chekuri Sanjeev KhannaAbstractThe Multiple Knapsack problem(MKP)is a natural and well knowngeneralization of the single knapsack problem and is defined asfollows.We are given a set of items and bins(knapsacks)suchthat each item has a profit and a size,and each bin hasa capacity.The goal is tofind a subset of items of maximumprofit such that they have a feasible packing in the bins.MKP is aspecial case of the Generalized Assignment problem(GAP)wherethe profit and the size of an item can vary based on the specific binthat it is assigned to.GAP is APX-hard and a-approximation for itis implicit in the work of Shmoys and Tardos[26],and thus far,thiswas also the best known approximation for MKP.The main resultof this paper is a polynomial time approximation scheme for MKP.Apart from its inherent theoretical interest as a common gen-eralization of the well-studied knapsack and bin packing problems,it appears to be the strongest special case of GAP that is not APX-hard.We substantiate this by showing that slight generalizationsof MKP are APX-hard.Thus our results help demarcate the bound-ary at which instances of GAP become APX-hard.An interestingand novel aspect of our approach is an approximation preservingreduction from an arbitrary instance of MKP to an instance withdistinct sizes and profits.1IntroductionWe study the following natural generalization of the classicalknapsack problem:Multiple Knapsack Problem(MKP)I NSTANCE:A pair where is a set of bins(knapsacks)and is a set of items.Each bin hasa capacity,and each item has a size and a profit.O BJECTIVE:Find a subset of maximum profit suchthat has a feasible packing in.The decision version of MKP is a generalization of thedecision versions of both the knapsack and bin packingproblems and is strongly NP-Complete.Moreover,it isan important special case of the generalized assignmentproblem where both the size and the profit of an item area function of the bin:1GAP has also been defined in the literature as a(closely related)minimization problem(see[26]).In this paper,following[24],we referto the maximization version of the problem as GAP and refer to theminimization version as Min GAP.item sizes and profits as well as bin capacities may take anyarbitrary values.Establishing a PTAS would show a very fine separation between cases that are APX-hard and thosethat have a PTAS.Until now,the best known approximationratio for MKP was a factor of derived from the approxima-tion for GAP.Our main result:In this paper we resolve the approxima-bility of MKP by obtaining a PTAS for it.It can be easilyshown via a reduction from the Partition problem that MKPdoes not admit a FPTAS even if.A special case of MKP is when all bin capacities are equal.It is relativelystraightforward to obtain a PTAS for this case using ideas from approximation schemes for knapsack and bin packing[11,3,15].However the general case with different bincapacities is a non-trivial and challenging problem.Our paper contains two new technical ideas.Ourfirst idea con-cerns the set of items to be packed in a knapsack instance.We show how to guess,in polynomial time,almost all the items that are to be packed in a given knapsack instance.Inother words we can identify an item set that has a feasible packing and profit at least OPT.This is in contrastto earlier schemes for variants of knapsack[11,1,5]whereonly the most profitable items are guessed.An easy corollary of our strategy is a PTAS for the identical bincapacity case,the details of which we 
point out later.Thestrengthened guessing plays a crucial role in the general case in the following way.As a byproduct of our guessingscheme we show how to round the sizes of items to at most distinct sizes.An immediate consequence of this is a quasi-polynomial time algorithm to pack items intobins using standard dynamic programming.Our second set of ideas shows that we can exploit the restricted sizesto pack,in polynomial time,a subset of the item set thathas at least a fraction of the profit.Approximation schemes for number problems are usually based on rounding instances to have afixed number of distinct values.In contrast,MKP appears to require a logarithmic number of values.We believe that our ideas to handle logarithmic number of distinct values willfind other applications.Figure 1summarizes the approximability of various restrictions of GAP.Related work:MKP is closely related to knapsack,bin packing,and GAP.A very efficient FPTAS exists for the knapsack problem;Lawler[17],based on ideas from[11], achieves a running time of for a approximation.An asymptotic FPTAS is known for bin packing[3,15].Recently Kellerer[16]has independently developed a PTAS for the special case of the MKP where all bins have identical capacity.As mentioned earlier, this case is much simpler than the general case and falls out as a consequence of ourfirst idea.The generalized assignment problem,as we phrased it,seeks to maximize profit of items packed.This is natural when viewed as a knapsack problem(see[24]).The minimization version of the problem,referred to as Min GAP and also as the cost assignment problem,seeks to assign all the items while minimizing the sum of the costs of assigning items to bins. In this version,item incurs a cost if assigned to bin instead of a obtaining a profit.Since the feasibility of assigning all items is itself an NP-Complete problem we need to relax the bin capacity constraints.Anbi-criteria approximation algorithm for Min GAP is one that gives a solution with cost at most and with bin capacities violated by a factor of at most where is the cost of an optimal solution that does not violate any capacity constraints.Work of Lin and Vitter[20]yields a approximation for Min GAP.Shmoys and Tardos[26],building on the work of Lenstra,Shmoys, and Tardos[19],give an improved approximation. Implicit in this approximation is also a-approximation for the profit maximization version which we sketch later. Lenstra et al.[19]also show that it is NP-hard to obtain a ratio for any.The hardness relies on a NP-Completeness reduction from3-Dimensional Matching. Our APX-hardness for the maximization version,mentioned earlier,is based on a similar reduction but instead relies on APX-hardness of the optimization version of3-Dimensional matching[14].MKP is also related to two variants of variable size bin packing.In thefirst variant we are given a set of items and set of bin capacities.The objective is tofind a feasible packing of items using bins with capacities restricted to be from so as to minimize the sum of the capacities of the bins used.A PTAS for this problem was provided by Murgolo [25].The second variant is based on a connection to multi-processor scheduling on uniformly related machines[18]. 
The objective is to assign a set of independent jobs with given processing times to machines with different speeds to minimize the makespan of the schedule.Hochbaum and Shymoys[8]gave a PTAS for this problem using a dual based approach where they convert the scheduling problem into the following bin packing problem.Given items of different sizes and bins of different capacities,find a packing of all the items into the bins such that maximum violation of the capacity of any bin is minimized. Bi-criteria approximations,where both capacity and profit can be approximated simultaneously,have been studied for several problems(Min GAP being an example mentioned above)and it is usually the case that relaxing both makes the task of approximation somewhat easier.In particular, relaxing the capacity constraints allows rounding of item sizes into a small number of distinct size values.In MKP we are neither allowed to exceed the capacity constraints nor the number of bins.This makes the problem harder and our result interesting.Knapsack GAPMultiple identical capacity binsNo FPTAS even with 2 binsItem size varies with binsvary with bins.Both size and profitMultiple non-identical capacity binsPTAS (special case of Theorem 2.1)2-approximable (Proposition 3.2, [26])if all sizes are identical (Proposition 3.1)However, it is polynomial time solvable only 2 distinct profits (Theorem 3.2)MKPPTAS (Theorem 2.1)(Theorem 3.1)APX-hard even when each item takes Item profit varies with binsFPTAS[10,15]sizes and all profits are identical.each item takes only 2 distinct APX-hard even whenFigure 1:Complexity of Various Restrictions of GAPOrganization:Section 2describes our PTAS for MKP.In Section 3,we show that GAP is APX-hard on very restricted classes of instances.We also indicate here a -approximation for GAP.In Section 4,we discuss a natural greedy algorithmfor MKP and show that it gives a-approximation even when item sizes vary with bins.2A PTAS for the Multiple Knapsack Problem We denote by OPT the value of an optimal solution to thegiven instance.Given a setof items,we use to denote .The set of integers is denoted by .Our problem is related to both the knapsack problem and the bin packing problem and some ideas used in approximation schemes for those problems will be useful to us.Our approximation scheme conceptually has the following two steps.1.G UESSING I TEMS :Identify a set of itemssuch that OPT and has a feasible packing in .2.P ACKING I TEMS :Given a setof items that has a feasible packing in ,find a feasible packing for a set such that .The overall scheme is more involved since there is interaction between the two steps.The guessed items have some additional properties that are exploited in the packing step.We observe that both of the above steps require new ideas.For the single knapsack problem no previous algorithm identifies the full set of items to pack.Moreover testing the feasibility of packing a given set of items is trivial.However in MKP the packing step is itself quite complex and it seems necessary to decompose the problem as we did.Before we proceed with the details we show how the first step of guessing the items immediately gives a PTAS for the identical bin capacity case.2.1MKP with Identical Bin CapacitiesSuppose we can guess an item set as in our first step above.We show that the packing step is very simple if bin capacities are identical.There are two cases to consider depending on whether ,the number of bins,is less than or not.In the former case the number of bins can be treated as a 
constant and a PTAS for this case exists even for instances of GAP (implicit in earlier work [5]).Now suppose .We use the any of the known PTASs for bin packing andpack all the guessed items using at mostbins.We find a feasible solution by simply picking the largest profit bins and discarding the rest along with their items.Here weuse the fact that and that the bins are identical.It is easily seen that we get aapproximation.We note that a different PTAS,without using our guessing step,can be obtained for this case by directly adapting the ideas used in approximation schemes for bin packing.The trick of using extra bins does not have a simple analogue when bin capacities are different and we need more ideas.2.2Guessing ItemsConsider the case when items have the same profit(assume it is).Thus the objective is to pack as many items as possible.For this case it is easily seen that OPT is an integer in.Further,given a guess for OPT,we can always pick the smallest(in size)OPT items to pack.Therefore there are only a polynomial number of guesses for the set of items to pack. This idea does not have a direct extension to non-uniform profits.However the useful insight is that when profits are identical we can pick the items in order of their sizes.In the rest of the paper we assume,for simplicity of notation,that and are integers.For the general case thefirst step in guessing items involves massaging the given instance into a more structured one that has few distinct profits.This is accomplished as follows.1.Guess a value such that OPT OPTand discard all items where.2.Scale all profits by such that they are in the range.3.Round down the profits of items to the nearest power of.Thefirst two steps are similar to those in the FPTAS for the single knapsack problem.It is easily seen that at most an fraction of the optimal profit is lost by our transformation.Summarizing:L EMMA2.1.Given an instance with items and a value such that OPT OPT we can obtain in polynomial time another instancesuch that.For every,for some.OPTitems.We discard the items in and for each we increase the size of every item in to the size of the smallest item in.Since is ordered by size no item in is larger than the smallest item in for each.It is easy to see that if has a feasible packing then the modified instance also has a feasible packing.We discard at most an fraction of the profit and the modified sizes have at most distinct values.Applying this to each profit class we obtain an instance with distinct size values.L EMMA2.3.Given an instance with items we can obtain in polynomial time instances such thatFor,and has items only from distinct profit values.For,and has items only from distinct size values.There exists an index such that has a feasible packing in and OPT.We will assume for the next section that we have guessed the correct set of items and that they are partitioned into sets with each set containing items of the same size.We denote by the items of the th size value and by the quantity.2.3Packing ItemsFrom Lemma2.3we obtain a restricted set of instances in terms of item profits and sizes.We also need some structure in the bins and we start by describing the necessary transformations.2.3.1Structuring the BinsAssume without loss of generality that the smallest bin capacity is.We order the bins in increasing order of their capacity and partition them into blocks such that block consists of all bins with.Let denote the number of bins in block.D EFINITION2.1.(Small rge Blocks)A block of bins is called a small bin block 
if;it is called large otherwise.Let be the set of indices such that is small.Define to be the set of largest indices in the set.Let and be the set of all bins in the blocks specified by the index sets and respectively.The following lemma makes use of the property of geometrically increasing bin capacities.L EMMA2.4.Let be a set of items that can be packed in the bins.There exists a set such that can be packed into the bins,and.Proof.Fix some packing of in the bins.Consider the largest bins in.One of these bins has a profit less than.Without loss of generality assume its capacity is.We will remove the items packed in this bin and use it to pack items from smaller bins.Let be the block containing this bin.Bins in block where andhave a capacity less than.It is easy to verify that the total capacity of bins in small bin blocks with indices less than equal to is at most.Since each of the first bins could be in different blocks the upper bound of blocks follows.Therefore we can retain the small bin blocks of large capacity and discard the rest.Therefore we as-sume that the given instance is modified to satisfy.Then it follows that. When the number of bins isfixed a PTAS is known(implicit in earlier work)even for the GAP.We will use ideas from that algorithm.For large bin blocks the advantage is that we can exceed the number of bins used by an fraction,as we shall see below.The main task is to integrate the allocation and packing of items between the different sets of bins.For the rest of the section we assume that we have a set of items that have a feasible packing and we will implicitly refer to somefixed feasible packing as the optimal solution.2.3.2Packing Profitable Items into Small Bin Blocks We guess here for each bin in,the most profitable items that are packed in it in the optimal solution.Therefore the number of guesses needed is.2.3.3Packing Large Items into Large Bin BlocksThe second step is to select items and pack them in large bin blocks.We say that an item is packed as a large item if its size is at least times the capacity of the bin in which it is packed.Since the capacities of the blocks are increasing geometrically,an item can be packed as large in at mostCombining for all sizes results in guesses over all.Here is where we take advantage of the fact that our items come from only different size classes.Suppose we have correctly assigned all large items to their respective bin blocks.We describe now a procedure for finding a feasible packing of these items.Here we ignore the potential interaction between items that are packed as large and those packed as small.We will show later that we can do so with only a slight loss in the approximation factor.We can focus on a specific block since the large items are now partitioned between the blocks.The abstract problem we have is the following.Given a collection of bins with capacities in the range,and a set of items with sizes in the range,decide if there is a feasible packing for them.It is easily seen that this problem is NP-hard.We obtain a relaxation by allowing use of extra bins to pack the items.However,we restrict the capacity of the extra bins to be.The following algorithm decides that either the given instance is infeasible or gives a packing with at most an additional bins of capacity.Let be the set of items of size greater than.Order items of in non-decreasing sizes and pack each item into the smallest bin available that can accommodate it.Disregard all the bins used up in the previous step since no other large item canfit 
into them.To pack the remaining items use a PTAS for bin pack-ing,potentially using an extra bins of capacity.We omit the full details of the algorithm but summarize the result in the following lemma.L EMMA2.5.Given bins of capacities in the range and items of sizes in the range,there is an algorithm that runs in time,and provided the items have a feasible packing in the given bins,returns a feasible packing using at most extra bins of capacity.We eliminate the extra bins later by picking the most profitable among them and discarding the items packed in the rest.The restriction on the size and number of extra bins is motivated by the elimination procedure.In order to use extra bins the quantity needs to be at least.This is the reason to distinguish between small and large bin blocks.For a large bin block let be the extra bins used in packing the large items.We note that.2.3.4Packing the Remaining ItemsThe third and last step of the algorithm is to pack the remaining items which we denote by.At this stage we have a packing of the most profitable items in each of the bins in(bins in small bin blocks)and a feasiblepacking of the large items in the rest of the bins(including the extra bins).For each bin let denote the set of items already packed into in thefirst two steps.The itemset is packed via an LP approach.In particular we use the generalized assignment formulation with the following constraints.1.Each remaining item must be assigned to some bin.2.An item can can be assigned to a bin in a large binblock only if.In other wordsshould be small for all bins in.3.An item can be assigned to a bin in a small binblock only ifassigned to the bins satisfy the constraints specified by, that is if.The integral solution to the LP also defines an allocation of items to each block.Let be the total profit associated with all items assigned to bins in block.Then clearly,.We however have an infeasible solution.Extra bins are used for large bin blocks and bin capacities are violated in the rounded solution.We modify this solution to create a feasible solution such that in each block we obtain a profit of at least .Large Bin Blocks:Let be a large bin block and without loss of generality assume that bin capacities in are in the range.By constraint2on the assignment,the size of any violating item in is less than and there are at most of them.For all we conclude that at most extra bins of capacity each are sufficient to pack all the violating items of.Recall that we may have used extra bins in packing the large items as well.Thus the total number of extra bins of capacity,denoted by,is at most .Thus all items assigned to bins in have a feasible integral assignment in the set.Now clearly the most profitable bins in the collection must have a total associated profit of at least.Moreover,it is easy to verify that all the items in these bins can be packed in the bins of itself.Small Bin Blocks:Consider now a small bin block. By constraint3on the assignment,we know that the profit associated with the violating item in any bin of is at most.For each element of we have an item in and similarly for and for.We also have an additional items whereand for any additional item and a bin. 
Fix a positive constant.For an item and bin we set if and otherwise.The profits of items and are set similarly.The sizes of items, and are all set to1each.It is now easy to verify that if instance has a matching of size,there exists a solution to of value.Otherwise,every solution to has value at most.As above,the APX-hardness now follows from the fact that.Notice that Theorem3.2is not a symmetric analogue of Theorem3.1.In particular,we use items of two different sizes in Theorem3.2.This is necessary as the special case of GAP where all item sizes are identical across the bins (but the profits can vary from bin to bin),is equivalent to minimum cost bipartite matching.P ROPOSITION3.1.There is a polynomial time algorithm to solve GAP instances where all items have identical sizes across the bins.3.2A-approximation for GAPShmoys and Tardos[26]give a bi-criteria approxima-tion for Min GAP.A paraphrased statement of their precise result is as follows.T HEOREM3.3.(S HMOYS AND T ARDOS[26])Given a feasible instance for the cost assignment problem,there is a polynomial time algorithm that produces an integral assignment such thatcost of solution is no more than OPT,each item assigned to a bin satisfies, andif a bin’s capacity is violated then there exists a single item that is assigned to the bin whose removal ensures feasibility.We now indicate how the above theorem implies a-approximation for GAP.The idea is to simply convert the maximization problem to a minimization problem by turning profits into costs by setting whereis a large enough number to make all costs positive.To create a feasible instance we have an additional bin of capacity and for all items we set and(in other words).We then use the algorithm for cost assignment and obtain a solution with the guarantees provided in Theorem3.3.It is easily seen that the profit obtained by the assignment is at least the optimal profit.Nowwe show how to obtain a feasible solution of at least half the profit.Let be any bin whose capacity is violated by the assignment and let be the item guaranteed in Theorem3.3.If is at least half the profit of bin then we retain and leave out the rest of the items in.In the other casewe leave out.This results in a feasible solution of at least half the profit given by the LP solution.We get the following result:P ROPOSITION3.2.There is a-approximation for GAP.R EMARK3.1.The algorithm in[26]is based on rounding an LP relaxation.For MKP an optimal solution to the linear program can be easily constructed in time byfirst sorting items by their profit to size ratio and then greedily filling them in the bins.Rounding takes time. 
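To make Remark 3.1 concrete, the following is a minimal sketch (not the authors' code) of the greedy construction of an optimal fractional solution to the MKP linear program: items are sorted by profit-to-size ratio and poured into the bins in that order, splitting an item across bins when a bin fills up. The function name and the list-of-pairs interface are illustrative choices, not anything prescribed by the paper.

```python
def greedy_fractional_mkp(items, capacities):
    """Optimal fractional (LP) solution for the multiple knapsack problem.

    items: list of (profit, size) pairs with size > 0.
    capacities: list of bin capacities.
    Returns (lp_value, assignment), where assignment[b] is a list of
    (item_index, fraction) pairs packed into bin b.
    """
    # Sort item indices by profit-to-size ratio, largest ratio first.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    assignment = [[] for _ in capacities]
    remaining = list(capacities)
    lp_value = 0.0
    b = 0
    for i in order:
        profit, size = items[i]
        left = size  # portion of item i not yet placed
        while left > 1e-12 and b < len(capacities):
            if remaining[b] <= 1e-12:
                b += 1          # current bin is full, move to the next one
                continue
            placed = min(left, remaining[b])
            frac = placed / size
            assignment[b].append((i, frac))
            lp_value += profit * frac
            remaining[b] -= placed
            left -= placed
        if b >= len(capacities):
            break               # all bins are full
    return lp_value, assignment
```

Since the bins are interchangeable in the relaxation, filling them in ratio order until the total capacity is exhausted attains the LP optimum, and the running time is dominated by the sort.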
We also note that the integrality gap for the LP relaxation of GAP is a factor of even for instances of MKP with identical bin capacities.4A Greedy AlgorithmWe now analyse a natural greedy strategy:pack bins oneat a time,by applying the FPTAS for the single knapsack problem on the remaining items.Greedy(refers to this algorithm with parameterizing the error tolerance used in the knapsack FPTAS.C LAIM4.1.For instances of MKP with bins of identical capacity Greedy()gives aR EMARK4.1.Claim4.2is valid even if the item sizes(but not profits)are a function of the bins,an important specialcase of GAP that is already APX-hard.The running time of Greedy()is using the algorithm of Lawler[17]for the knapsack problem.Claim4.2has beenindependently observed in[2].A Tight Example:We show an instance on which Greedy’s performance is no better than.There are two items with sizes and and each has a profit of.There are two bins with capacities and each.Greedy packs the smaller item in the big bin and obtains a profit of while OPT.This also shows that ordering bins in non-increasing capacities does not help improve the performance of Greedy.5ConclusionsAn interesting aspect of our guessing strategy is that it is completely independent of the number of bins and their capacities.This might prove to be useful in other variants of the knapsack problem.One recent application is in obtaining a PTAS for the stochastic knapsack problem with Bernoulli variables[7].The Min GAP problem has a bi-criteria ap-proximation and it is NP-hard to obtain a-approximation.In contrast GAP has a-approximation but the known hardness of approximation is for a very small butfixed.Closing this gap is an interesting open problem.Another interesting problem is to obtain a PTAS for MKP with an improved running time.Though an FPTAS is ruled out even for the case of two identical bins,a PTAS with a running time of the form poly might be achievable.The identical bin capacities case might be more tractable than the general case.Extending our ideas to achieve the above mentioned running time appears to be non-trivial.References[1] A.K.Chandra,D.S.Hirschberg,and C.K.Wong.Approx-imate algorithms for some generalized knapsack problems.Theoretical Computer Science,3(3):293–304,Dec1976. [2]M.W.Dawande,J.R.Kalagnanam,P.Keskinocak,R.Ravi,and F.S.Salman.Approximation algorithms for the multiple knapsack problem with assignment restrictions.Technical Report,RC21331,IBM T.J.Watson Research Center,1998.[3]W.Fernandez de la Vega and G.S.Lueker.Bin packing canbe solved within in linear binatorica,1:349–355,1981.[4] C.E.Ferreira, A.Martin,and R.Weismantel.Solvingmultiple knapsack problems by cutting planes.SIAM Journal on Optimization,6(3):858–77,1996.[5] A.M.Frieze and M.R.B.Clarke.Approximation algorithmsfor the-dimensional-knapsack problem:worst-case and probabilistic analyses.European Journal of Operational Research,15(1):100–9,1984.[6]M.R.Garey and puters and Intractabil-ity:A Guide to the Theory of NP-Completeness.Freeman, 1979.[7] A.Goel and P.Indyk.Stochastic load balancing and relatedproblems.In Proceedings of the40th Annual Symposium on Foundations of Computer Science,pages579–86,1999. 
[8] D.S. Hochbaum and D.B. Shmoys. A polynomial approximation scheme for scheduling on uniform processors: using the dual approximation approach. SIAM Journal on Computing, 17:539–551, 1988.
[9] M.S. Hung and J.C. Fisk. An algorithm for 0-1 multiple knapsack problems. Naval Research Logistics Quarterly, 24:571–579, 1978.
[10] M.S. Hung and J.C. Fisk. A heuristic routine for solving large loading problems. Naval Research Logistics Quarterly, 26(4):643–50, 1979.
[11] O.H. Ibarra and C.E. Kim. Fast approximation algorithms for the knapsack and sum of subset problems. Journal of the ACM, 22(4):463–8, 1975.
[12] G. Ingargiola and J.F. Korsh. An algorithm for the solution of 0-1 loading problems. Operations Research, 23(6):110–119, 1975.
[13] J.R. Kalagnanam, M.W. Dawande, M. Trubmo, and H.S. Lee. Inventory problems in steel industry. Technical Report RC21171, IBM T.J. Watson Research Center, 1998.
[14] V. Kann. Maximum bounded 3-dimensional matching is MAX SNP-complete. Information Processing Letters, 37:27–35, 1991.
[15] N. Karmarkar and R. Karp. An efficient approximation scheme for the one-dimensional bin-packing problem. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science, pages 312–320, 1982.
[16] H. Kellerer. A Polynomial Time Approximation Scheme for the Multiple Knapsack Problem. In Proceedings of APPROX'99, Springer.
[17] E.L. Lawler. Fast approximation algorithms for knapsack problems. Mathematics of Operations Research, 4(4):339–56, 1979.
[18] E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys. Sequencing and scheduling: algorithms and complexity. In S.C. Graves et al., editor, Handbooks in OR & MS, volume 4, pages 445–522. Elsevier Science Publishers, 1993.
[19] J.K. Lenstra, D.B. Shmoys, and E. Tardos. Approximation algorithms for scheduling unrelated parallel machines. Mathematical Programming, 46:259–271, 1990. Preliminary version appeared in Proceedings of the 28th Annual IEEE Symposium on Foundations of Computer Science, 217–24, 1987.
[20] J. Lin and J.S. Vitter. ε-Approximations with minimum packing constraint. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, 771–782, 1992.
[21] S. Martello and P. Toth. Solution of the zero-one multiple knapsack problem. European J. of Operations Research, 4:322–329, 1980.
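Referring back to the greedy strategy analysed in Section 4 (pack the bins one at a time, each time running a single-knapsack routine on the items that remain), the following is a small illustrative sketch. It substitutes a textbook dynamic-programming 0/1 knapsack solver for the FPTAS of Lawler [17], so it assumes integer item sizes and bin capacities; all function and variable names are hypothetical.

```python
def knapsack_01(items, capacity):
    """Exact 0/1 knapsack by DP over integer capacities.

    items: list of (index, profit, size) with integer size.
    Returns (best_profit, chosen_indices)."""
    best = [(0, [])] * (capacity + 1)
    for idx, profit, size in items:
        if size > capacity:
            continue
        for c in range(capacity, size - 1, -1):   # iterate downward for 0/1 semantics
            cand = best[c - size][0] + profit
            if cand > best[c][0]:
                best[c] = (cand, best[c - size][1] + [idx])
    return best[capacity]

def greedy_mkp(profits, sizes, capacities):
    """Greedy for MKP: fill one bin at a time with a single-knapsack solution
    computed over the items that have not been packed yet."""
    remaining = set(range(len(profits)))
    packing, total = [], 0
    for cap in capacities:
        items = [(i, profits[i], sizes[i]) for i in remaining]
        value, chosen = knapsack_01(items, cap)
        packing.append(chosen)
        total += value
        remaining -= set(chosen)
    return total, packing
```

Replacing `knapsack_01` by an FPTAS for the single knapsack recovers Greedy(ε) as described in Section 4; the exact DP is used here only to keep the sketch self-contained.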
Physica A 374(2007)483–490Identification of overlapping community structure in complexnetworks using fuzzy c -means clusteringShihua Zhang a,Ã,Rui-Sheng Wang b ,Xiang-Sun Zhang aa Academy of Mathematics &Systems Science,Chinese Academy of Science,Beijing 100080,Chinab School of Information,Renmin University of China,Beijing 100872,ChinaReceived 28June 2006Available online 7August 2006AbstractIdentification of (overlapping)communities/clusters in a complex network is a general problem in data mining of network data sets.In this paper,we devise a novel algorithm to identify overlapping communities in complex networks by the combination of a new modularity function based on generalizing NG’s Q function,an approximation mapping of network nodes into Euclidean space and fuzzy c -means clustering.Experimental results indicate that the new algorithm is efficient at detecting both good clusterings and the appropriate number of clusters.r 2006Elsevier B.V.All rights reserved.Keywords:Overlapping community structure;Modular function;Spectral mapping;Fuzzy c -means clustering;Complex network1.IntroductionLarge complex networks representing relationships among set of entities have been one of the focuses of interest of scientists in many fields in the recent years.Various complex network examples include social network,worldwide web network,telecommunication network and biological network.One of the key problems in the field is ‘How to describe/explain its community structure’.Generally,a community in a network is a subgraph whose nodes are densely connected within itself but sparsely connected with the rest of the network.Many studies have verified the community/modularity structure of various complex networks such as protein-protein interaction network,worldwide web network and co-author network.Clearly,the ability to detect community structure in a network has important practical applications and can help us understand the network system.Although the notion of community structure is straightforward,construction of an efficient algorithm for identification of the community structure in a complex network is highly nontrivial.A number of algorithms for detecting the communities have been developed in various fields (for a recent review see Ref.[1]and a recent comparison paper see Ref.[2]).There are two main difficulties in detecting community structure.The first is that we don’t know how many communities there are in a given network.The usual drawback in many /locate/physa0378-4371/$-see front matter r 2006Elsevier B.V.All rights reserved.doi:10.1016/j.physa.2006.07.023ÃCorresponding author.E-mail addresses:zsh@ (S.Zhang),wrs@ (R.-S.Wang),zxs@ (X.-S.Zhang).algorithms is that they cannot give a valid criterion for measuring the community structure.Secondly,it is a common case that some nodes in a network can belong to more than one community.This means the overlapping community structure in complex networks.Overlapping nodes may play a special role in a complex network system.Most known algorithms such as divisive algorithm [3–5]cannot detect them.Only a few community-detecting methods [6,7]can uncover the overlapping community structure.Taking into account the first difficulty,Newman and Girvan [8]has developed a new approach.They introduced a modularity function Q for measuring community structure.In order to write the context properly,we refer to a similar formulation in Ref.[5].In detail,given an undirected graph/network G ðV ;E ;W Þconsisting of the node set V ,the edge set E and a symmetric 
weight matrix W ¼½w ij n Ân ,where w ij X 0and n is the size of the network,the modularity function Q is defined asQ ðP k Þ¼X k c ¼1L ðV c ;V c ÞL ðV ;V ÞÀL ðV c ;V ÞL ðV ;V Þ 2"#,(1)where P k is a partition of the nodes into k groups and L ðV 0;V 00Þ¼P i 2V 0;j 2V 00w ði ;j Þ.The Q function measuresthe quality of a given community structure organization of a network and can be used to automatically select the optimal number of communities k according to the maximum Q value [8,5].The measure has been used for developing new detection algorithms such as Refs.[5,9,4].White and Smyth [5]showed that optimizing the Q function can be reformulated as a spectral relaxation problem and proposed two spectral clustering algorithms that seek to maximize Q .In this study,we develop an algorithm for detecting overlapping community structure.The algorithm combines the idea of modularity function Q [8],spectral relaxation [5]and fuzzy c -means clustering method[10]which is inspired by the general concept of fuzzy geometric clustering.The fuzzy clustering methods don’t employ hard assignment,while only assign a membership degree u ij to every node v i with respect to the cluster C j .2.MethodSimulation across a wide variety of simulated and real world networks showed that large Q values are correlated with better network clusterings [8].Then maximizing the Q function can obtain final ‘optimal’community structure.It is noted that in many complex networks,some nodes may belong to more than one community.The divisive algorithms based on maximizing the Q function fail to detect such case.Fig.1shows an example of a simple network which visually suggests three clusters and classifying node 5(or node 9)intoFig.1.An example of network showing Q and e Qvalues for different number k of clusters using the same spectral mapping but different cluster methods,i.e.k -means and fuzzy c -means,respectively.For the latter,it shows every node’s soft assignment and membership of final clusters with l ¼0:15.S.Zhang et al./Physica A 374(2007)483–490484two clusters at the same time may be more appropriate intuitively.So we introduce the concept of fuzzy membership degree to the network clustering problem in the following subsection.2.1.A new modular functionIf there are k communities in total,we define a corresponding n Âk ‘soft assignment’matrix U k ¼½u 1;...;u k with 0p u ic p 1for each c ¼1;...;k and P kc ¼1u ic ¼1for each i ¼1;...;n .With this we define the membership of each community as ¯V c ¼f i j u ic 4l ;i 2V g ,where l is a threshold that can convert a soft assignment into final clustering.We define a new modularity function e Q as e Q ðU k Þ¼X k c ¼1A ð¯V c ;¯V c ÞA ðV ;V ÞÀA ð¯V c ;V ÞA ðV ;V Þ 2"#,(2)where U k is a fuzzy partition of the vertices into k groups and A ð¯V c ;¯V c Þ¼P i 2¯V c ;j 2¯V c ððu ic þu jc Þ=2Þw ði ;j Þ,A ð¯V c ;V Þ¼A ð¯V c ;¯V c ÞþP i 2¯V c ;j 2V n ¯V c ððu ic þð1Àu jc ÞÞ=2Þw ði ;j Þand A ðV ;V Þ¼P i 2V ;j 2V w ði ;j Þ.This of coursecan be thought as a generalization of the Newman’s Q function.Our objective is to compute a soft assignment matrix by maximizing the new Q function with appropriate k .How could we do?2.2.Spectral mappingWhite and Smyth [5]showed that the problem of maximizing the modularity function Q can be reformulated as an eigenvector problem and devised two spectral clustering algorithms.Their algorithms are similar in spirit to a class of spectral clustering methods which map data points into Euclidean space by eigendecomposing a related matrix and then grouping them by 
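As a concrete illustration of the objective J_m in Eq. (3), the following is a minimal fuzzy c-means sketch using the standard alternating updates of memberships and centroids (this is generic FCM, not the authors' implementation). It assumes Euclidean distance and returns the soft-assignment matrix U that the method above feeds into the modularity computation; names and defaults are illustrative.

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Alternating optimization of the FCM objective J_m.

    X: (n, d) array of embedded points.
    Returns (U, centers), where U has shape (n, k) and rows sum to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)              # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]          # membership-weighted centroids
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                              # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # u_ij = 1 / sum_l (d_ij / d_il)^(2/(m-1))
        if np.abs(U_new - U).max() < tol:
            return U_new, centers
        U = U_new
    return U, centers
```

The fuzzifier m controls how soft the assignment is; m close to 1 approaches hard k-means, while larger m spreads membership across clusters.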
general clustering methods such as k -means and hierarchical clustering [5,9].Given a network and its adjacent matrix A ¼ða ij Þn Ân and a diagonal matrix D ¼ðd ii Þ,d ii ¼P k a ik ,two matrices D À1=2AD À1=2and D À1A are often used.A recent modification [11]uses the top K eigenvectors of the generalized eigensystem Ax ¼tDx instead of the K eigenvectors of the two matrices mentioned above to form a matrix whose rows correspond to original data points.The authors show that after normalizing the rows using Euclidean norm,their eigenvectors are mathematically identical and emphasize that this is a numerically more stable method.Although their result is designed to cluster real-valued points[11,12],it is also appropriate for network clustering.So in this study,we compute the top k À1eigenvectors of the eigensystem to form a ðk À1Þ-dimensional embedding of the graph into Euclidean space and use ‘soft-assignment’geometric clustering on this embedding to generate a clustering U k (k is the expected number of clusters).2.3.Fuzzy c-meansHere,in order to realize our ‘soft assignment’,we introduce fuzzy c -means (FCM)clustering method [10,13]to cluster these points and maximize the e Qfunction.Fuzzy c -means is a method of clustering which allows one piece of data to belong to two or more clusters.This method (developed by Dunn in 1973[10]and improved by Bezdek in 1981[13])is frequently used in pattern recognition.It is based on minimization of the following objective functionJ m ¼Xn i ¼1X k j ¼1u m ij k x i Àc j k 2,(3)over variables u ij and c with P j u ij ¼1.m 2½1;1Þis a weight exponent controlling the degree of fuzzification.u ij is the membership degree of x i in the cluster j .x i is the i th d -dimensional measured data point.c j is the d -dimensional center of the cluster j ,and k Ãk is any norm expressing the similarity between any measured data and the center.Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above,with the update of membership degree u ij and the cluster centers c j .This procedure converges to a local minimum or a saddle point of J m .S.Zhang et al./Physica A 374(2007)483–4904852.4.The flow of the algorithmGiven an upper bound K of the number of clusters and the adjacent matrix A ¼ða ij Þn Ân of a network.The detailed algorithm is stated straightforward for a given l as follows:Spectral mapping:(i)Compute the diagonal matrix D ¼ðd ii Þ,where d ii ¼P k a ik .(ii)Form the eigenvector matrix E K ¼½e 1;e 2;...;e K by computing the top K eigenvectors of thegeneralized eigensystem Ax ¼tDx .Fuzzy c -means:for each value of k ,2p k p K :(i)Form the matrix E k ¼½e 2;e 3;...;e k from the matrix E K .(ii)Normalize the rows of E k to unit length using Euclidean distance norm.(iii)Cluster the row vectors of E k using fuzzy c -means or any other fuzzy clustering method to obtain a softassignment matrix U k .Maximizing the modular function:Pick the k and the corresponding fuzzy partition U k that maximizes e QðU k Þ.In the algorithm above,we initialize FCM such that the starting centroids are chosen to be as orthogonal as possible which is suggested for k -means clustering method in Ref.[12].The initialization does not change the time complexity,and also can improve the quality of the clusterings,thus at the same time reduces the need for restarting the random initialization process.The framework of our algorithm is similar to several spectral clustering methods in previous studies[5,9,12,11].We also map data points (work nodes in our 
study)into Euclidean space by computing the top K eigenvectors of a generalized eigen system and then cluster the embedding using a fuzzy clustering method just as others using geometric clustering algorithm or general hierarchical clustering algorithm.Here,we emphasize two key points different from those earlier studies:We introduce a generalized modular function e Q employing fuzzy concept,which is devised for evaluating the goodness of overlapping community structure. In combination with the novel e Qfunction,we introduce fuzzy clustering method into network clustering instead of general hard clustering methods.This means that our algorithm can uncover overlapping clusters,whereas general framework:‘‘Objective function such as Q function and Normalized cut function+Spectral mapping+general geometric clustering/hierarchical clustering’’cannot achieve this.3.Experimental resultsWe have implemented the proposed algorithm by Matlab.And the fuzzy clustering toolbox [14]is used for our analysis.In order to make an intuitive comparison,we also compute the hard clustering based on the original Q -function,spectral mapping (same as we used)and k -means clustering.We illustrate the fuzzy concept and the difference of our method with traditional divisive algorithms by a simple example shown in Fig.1.Just as mentioned above,the network visually suggests three clusters.But classifying node 5(or node 9)simultaneously into two clusters may be more reasonable.We can see from Fig.1that our method did uncover the overlapping communities for this simple network,while the traditional method can only make one node belong to a single cluster.We also present the analysis of two real networks,i.e.the Zachary’s karate club network and the American college football team network for better understanding the differences between our method and traditional methods.S.Zhang et al./Physica A 374(2007)483–490486S.Zhang et al./Physica A374(2007)483–490487 3.1.Zachary’s karate clubThe famous karate club network analyzed by Zachary[15]is widely used as a test example for methods of detecting communities in complex networks[1,8,16,3,4,17,9,18,19].The network consists of34members of a karate club as nodes and78edges representing friendship between members of the club which was observed over a period of two years.Due to a disagreement between the club’s administrator and the club’s instructor, the club split into two smaller ones.The question we concern is that if we can uncover the potential behavior of the network,detect the two communities or multiple groups,and particularly identify which community a node belongs to.The network is presented in Fig.2,where the squares and the circles label the members of the two groups.The results of k-means and our analysis are illustrated in Fig.3.The k-means combined with Q function divides the network into three parts(see in Fig.3A),but we can see that some nodes in one cluster are also connected densely with another cluster such as node9and31in cluster 1densely connecting with cluster2,and node1in cluster2with cluster3.Fig.3B shows the results of our method,from which we can see that node1,9,10,31belong to two clusters at the same time.These nodes in the network link evenly with two clusters.Another thing is that the two methods both uncover three communities but not two.There is a small community included in the instructor’s faction,since the set of nodes5,6,7,11,17only connects with node1in the instructor’s faction.Note that our method also classifies node1into the small 
community,while k-means does not.work of American college football teamsThe second network we have investigated is the college football network which represents the game schedule of the2000season of Division I of the US college football league.The nodes in the network represent the115 teams,while the links represent613games played in the course of the year.The teams are divided into conferences of8–12teams each and generally games are more frequent between members of the same conference than between teams of different conferences.The natural community structure in the network makes it a commonly used workbench for community-detecting algorithm testing[3,5,7].Fig.4shows how the modularity Q and e Q vary with k with respect to k-means and our method,respectively. The peak for k-means is at k¼12,Q¼0:5398,while for our algorithm at k¼10,e Q¼0:4673with l¼0:10. Both methods identify ten communities which contain ten conferences almost exactly.Only teams labeled as Sunbelt are not recognized as belonging to a same community for both methods.This group is classified as well in the results of Refs.[3,19].This happens because the Sunbelt teams played nearly as many games against Western Athletic teams as they played in their own conference,and they also played quite a number of gamesagainst Mid-American team.Our method identified11nodes(teams)which belong to at least twoFig.2.Zachary’s karate club network.Square nodes and circle nodes represent the instructor’s faction and the administrator’s faction, respectively.Thisfigure is from Newman and Girvan[8].communities (see Fig.5,11red nodes).These nodes generally connect evenly with more than one community,so we cannot classify them into one specific community correctly.These nodes represent ‘fuzzy’points which cannot be classified correctly by employing current link information.Maybe such points play a ‘bridge’role in two or more communities in complex network of other types.4.Conclusion and discussionIn this paper,we present a new method to identify the community structure in complex networks with a fuzzy concept.The method combines a generalized modularity function,spectral mapping,and fuzzy clustering technique.The nodes of the network are projected into d -dimensional Euclidean space which is obtained by computing the top d nontrivial eigenvectors of the generalized eigensystem Ax ¼tDx .Then the fuzzy c -means clustering method is introduced into the d -dimensional space based on general Euclidean distance to cluster the data points.By maximizing the generalized modular function e QðU d Þfor varying d ,we obtain the appropriate number of clusters.The final soft assignment matrix determines the final clusters’membership with a designated threshold l .Fig.3.The results of both k -means and our method applied to karate club network.A:The different colors represent three different communities obtained by k -means and the right table shows values of NG’Q versus different k .B:Four red nodes represent the overlap of two adjacent communities obtained by our method and the right table shows values of new Q versus different k with l ¼0:25.3.t y 0510152000.10.20.30.40.50.6k-meansK N G ' Q 0510152000.10.20.30.40.5fuzzy c-means K N e w Q Fig.4.Q and e Qvalues versus k with respect to k -means and fuzzy c -means clustering methods for the network of American college football team.S.Zhang et al./Physica A 374(2007)483–490488Although spectral mapping has been comprehensively used before to detect communities in complex networks (even in clustering the 
real-valued points),we believe that our method represents a step forward in this field.A fuzzy method is introduced naturally with the generalized modular function and fuzzy c -means clustering technique.As our tests have suggested,it is very natural that some nodes should belong to more than one community.These nodes may play a special role in a complex network system.For example,in a biological network such as protein interaction network,one node (protein or gene)belonging to two functional modules may act as a bridge between them which transfers biological information or acts as multiple functional units [6].One thing should be noted is that when this method is applied to large complex networks,computational complexity is a key problem.Fortunately,some fast techniques for solving eigensystem have been developed[20]and several methods of FCM acceleration can also be found in the literature [21].For instance,if we adopt the implicitly restarted Lanczos method (IRLM)[20]to compute the K À1eigenvectors and the efficient implementation of the FCM algorithm in Ref.[21],we can have the worse-case complexity of O ðmKh þnK 2h þK 3h Þand O ðnK 2Þ,respectively,where m is the number of edges in the network and h is the number of iteration required until convergence.For large sparse networks where m $n ,and K 5n ,the algorithms will scale roughly linearly as a function of the number of nodes n .Nonetheless,the eigenvector computation is still the most computationally expensive step of the method.We expect that this new method will be employed with promising results in the detection of communities in complex networks.AcknowledgmentsThis work is partly supported by Important Research Direction Project of CAS ‘‘Some Important Problem in Bioinformatics’’,National Natural Science Foundation of China under Grant No.10471141.The authors thank Professor M.E.J.Newman for providing the data of karate club network and the college football team network.Fig.5.Fuzzy communities of American college football team network (k ¼10and e Q¼0:4673)with given l ¼0:10(best viewed in color).S.Zhang et al./Physica A 374(2007)483–490489References[1]M.E.J.Newman,Detecting community structure in networks,Eur.Phys.J.B 38(2004)321–330.[2]L.Danon,J.Duch,A.Diaz-Guilera,A.Arenas,Comparing community structure identification,J.Stat.Mech.P09008(2005).[3]M.Girvan,M.E.J.Newman,Community structure in social and biological networks,A 99(12)(2002)7821–7826.[4]J.Duch,A.Arenas,Community detection in complex networks using extremal optimization,Phys.Rev.E 72(2005)027104.[5]S.White,P.Smyth,A spectral clustering approach to finding communities in graphs,SIAM International Conference on DataMining,2005.[6]G.Palla,I.Derenyi,I.Farkas,T.Vicsek,Uncovering the overlapping community structure of complex networks in nature and society,Nature 435(2005)814–818.[7]J.Reichardt,S.Bornholdt,Detecting fuzzy community structures in complex networks with a Potts model,Phys.Rev.Lett.93(2004)218701.[8]M.E.J.Newman,M.Girvan,Finding and evaluating community structure in networks,Phys.Rev.E 69(2004)026113.[9]L.Donetti,M.A.Mun oz,Detecting network communities:a new systematic and efficient algorithm,J.Stat.Mech.P10012(2004).[10]J.C.Dunn,A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters,J.Cybernet.3(1973)32–57.[11]D.Verma,M.Meila,A comparison of spectral clustering algorithms.Technical Report,2003,UW CSE Technical Report 03-05-01.[12]A.Ng,M.Jordan,Y.Weiss,On spectral clustering:analysis and an algorithm,Adv.Neural 
Inf.Process.Systems 14(2002)849–856.[13]J.C.Bezdek,Pattern Recognition with Fuzzy Objective Function Algorithms,Plenum Press,New York,1981.[14]Fuzzy Clustering Toolbox-h http://www.fmt.vein.hu/softcomp/fclusttoolbox/i .[15]W.W.Zachary,An information flow model for conflict and fission in small groups,J.Anthropol.Res.33(1977)452–473.[16]M.E.J.Newman,Fast algorithm for detecting community structure in networks,Phys.Rev.E 69(2004)066133.[17]F.Radicchi,C.Castellano,F.Cecconi,V.Loreto,D.Parisi,Defining and identifying communities in networks,Proc.Natl.Acad.A 101(9)(2004)2658–2663.[18]F.Wu,B.A.Huberman,Finding communities in linear time:a physics approach,Eur.Phys.J.B 38(2004)331–338.[19]S.Fortunato,tora,M.Marchiori,A method to find community structures based on information centrality,Phys.Rev.E 70(2004)056104.[20]Z.Bai,J.Demmel,J.Dongarra,A.Ruhe,H.Vorst (Eds.),Templates for the Solution of Algebraic Eigenvalue Problems:A PracticalGuide,SIAM,Philadelphia,PA,2000.[21]J.F.Kelen,T.Hutcheson,Reducing the time complexity of the fuzzy c -means algorithm,IEEE Trans.Fuzzy Systems 10(2)(2002)263–267.S.Zhang et al./Physica A 374(2007)483–490490。
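To tie the pieces of Section 2 together, here is a compact, hedged sketch of the overall flow: spectral embedding from the generalized eigensystem Ax = tDx, fuzzy clustering of the embedded nodes, and selection of k by the generalized modularity of Eq. (2). It reuses the `fuzzy_c_means` sketch given earlier, relies on SciPy's symmetric generalized eigensolver (so it assumes a symmetric weight matrix with no isolated nodes), and all parameter names are illustrative rather than the authors'.

```python
import numpy as np
from scipy.linalg import eigh

def soft_modularity(W, U, lam):
    """Generalized modularity of Eq. (2) for weight matrix W, soft assignment U, threshold lam."""
    total = W.sum()
    q = 0.0
    for c in range(U.shape[1]):
        members = U[:, c] > lam                      # crisp support of fuzzy community c
        if not members.any():
            continue
        u = U[:, c]
        inside = 0.5 * (u[members][:, None] + u[members][None, :]) * W[np.ix_(members, members)]
        a_cc = inside.sum()
        out = ~members
        cross = 0.5 * (u[members][:, None] + (1.0 - u[out])[None, :]) * W[np.ix_(members, out)]
        a_cv = a_cc + cross.sum()
        q += a_cc / total - (a_cv / total) ** 2
    return q

def detect_overlapping_communities(W, k_max, lam=0.15):
    """Spectral embedding + FCM + generalized-modularity model selection."""
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(W, D)                  # generalized eigensystem A x = t D x
    vecs = vecs[:, ::-1]                     # order eigenvectors by decreasing eigenvalue
    best = (-np.inf, None, None)
    for k in range(2, k_max + 1):
        E = vecs[:, 1:k]                     # top k-1 nontrivial eigenvectors
        E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
        U, _ = fuzzy_c_means(E, k)
        q = soft_modularity(W, U, lam)
        if q > best[0]:
            best = (q, k, U)
    return best   # (best generalized modularity, chosen k, soft assignment matrix)
```

Thresholding U at lam, as in the definition of the fuzzy communities, is what allows a node to end up in more than one community.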
Changjun High School, Changsha, Hunan: 2024 Grade 12 Third Mock Examination in English (second semester)

Part One: Listening comprehension (multiple choice)

1. Which film does Mary want to see?
A. Ordinary Angels.  B. Bob Marley: One Love.  C. Kung Fu Panda 4.
2. Where does the conversation probably take place?
A. In an apartment.  B. In a restaurant.  C. In a shop.
3. Who is the woman probably talking to?
A. Her friend.  B. A travel agent.  C. A hotel receptionist.
4. What is the weather like now?
A. Cloudy.  B. Sunny.  C. Rainy.
5. What happens to Sarah?
A. She eats too much.  B. She has a toothache.  C. She needs an operation.

Listen to the following longer conversation and answer the questions below.
6. What does the woman plan to do next?
A. Drive home.  B. Pick Jack up.  C. See her husband.
7. What is Jack doing?
A. Watching TV.  B. Practicing football.  C. Walking with Tim.

Listen to the following longer conversation and answer the questions below.
8. Why does Alice want to meet David?
A. To seek advice.  B. To borrow some books.  C. To invite him to a game.
9. How does Ethan sound in the end?
A. Humble.  B. Proud.  C. Satisfied.

Listen to the following longer conversation and answer the questions below.
模糊划分及其模糊粗糙近似算子姚卫; 陈晓庆【期刊名称】《《聊城大学学报(自然科学版)》》【年(卷),期】2020(033)001【总页数】4页(P1-4)【关键词】含幺序半群; 模糊划分; 模糊等价关系; 模糊粗糙近似算子【作者】姚卫; 陈晓庆【作者单位】河北科技大学理学院河北石家庄050018【正文语种】中文【中图分类】O1590 引言与预备知识粗糙集理论由Pawlak于1982年提出[1,2],是基于不可分辨关系的一种聚类方法.经过几十年的发展,粗糙集理论与方法已成功地应用到了过程控制、社会经济、医疗诊断、生物化学、环境科学、心理学和冲突分析等领域中.最初的粗糙集的基本结构是等价关系,然而这并不能描述信息系统中的一些粒化问题,于是基于广义关系的粗糙集模型得到了快速发展.除关系型粗糙集外,覆盖型和邻域型(包括邻域系统和邻域算子)粗糙集也应运而生.邻域型粗糙集可以看做是覆盖型粗糙集的一种特殊情形,而覆盖型粗糙集可以看做是关系型粗糙集的扩展,二者都具有明显的粒化思想.粗糙集和模糊集的交叉结合也是粗糙集理论的一个重要组成部分.在关系型模糊粗糙集模型方面,从模糊二元关系和赋值格的扩展两方面涌现了大量的研究论文,如:基于单位区间的模糊粗糙集[3-12],基于剩余格的模糊粗糙集[13-16]; 在覆盖型和邻域型模糊粗糙集模型方面,由于模糊覆盖诱导的不同类型的邻域系统及其上下近似算子,因此也就诱导了很多不同的粗糙近似算子模型[17-20].此外,文[21]提出了具有分析学背景的度量型粗糙集,研究了这种模型在模糊聚类中的应用.由于等价关系和划分是两个相互等价的概念,因此二者在研究粗糙集时是等价的.对于模糊情形,通常的模糊等价关系是利用格值上的三角模对经典等价关系的逻辑扩展,其定义方式较为固定.是否存在与模糊等价关系一一对应的模糊划分的概念,一直是模糊数学界关注的问题.2004,Belohavek基于完备剩余格、利用模糊等同价系引入了一种模糊划分的概念[22],并证明了模糊等价关系和模糊划分之间一一对应性.在此之前的模糊粗糙集的相关结构都是建立在模糊覆盖的基础上的,模糊覆盖虽然是划分的一种弱化后的模糊扩展,但是无论如何它始终无法与模糊等价关系相对应.本文将以含幺序半群(不必交换)为取值域,引入一种模糊划分的定义,推广Belohavek的相关定义,并证明它与模糊等价关系的一一对应性,最后以交换单位quantale为取值格研究模糊划分诱导的模糊粗糙近似算子的基本性质.下面给出本文所需要的预备知识.定义1 设L是一个偏序集,*是L上的一个半群运算,e是L中关于运算*的单位元.如果运算*与偏序相互协调,即a≤b,c≤d蕴含a*c≤b*d(∀a,b,c,d∈L),则称(L,*,e)是一个含幺序半群.定义2[23] 设(L,*,e)是一个交换的含幺序半群,其中L是完备格,如果运算*对任意并分配,即a*或等价地,存在蕴含算子→:L×L→L使得a*b≤c⟺a≤b→c(∀a,b,c∈L),则称(L,*,e)是一个交换的单位quantale.例1 (1) ([0,+∞),×,1])和([0,+∞)op,+,1])都是交换的含幺序半群.(2) ([0,1],×,1)和([0,1],min,1)是交换的单位quantale.(3) 设L={0,a,b,1}是一个菱形格,即0<a,b<1,a‖b,规定0*x=x*0=0,a*x=x*a=x(∀x∈L),b*b=b*1=1*b=b,1*1=1,则(L,*,a)是一个交换单位的quantale(还是幂等的且严格双侧的).用LX表示集合X上的L-模糊子集(即从X到L的映射)的全体[24].1 模糊划分与模糊等价关系的一一对应性在本节中,我们假定(L,*,e)是一个交换的含幺序半群.定义3 设X是一个非空集,映射R:X×X→L称为X上的一个模糊等价关系,如果(R1) 自反性:∀x∈X,R(x,x)≥e;(R2) 对称性:∀x,y∈X,R(x,y)=R(y,x);(R3) 传递性:∀x,y,z∈X,R(x,y)*R(y,z)≤R(x,z).定义4 非空集X的模糊子集族Φ⊆LX称为X上的一个模糊划分,如果(P1) 对于任意的C∈Φ,存在x∈X使得C(x)≥e;(P2) 对于任意的x∈X,存在C∈Φ使得C(x)≥e;(P3) 对于任意的C1,C2∈Φ和x1,x2∈X,有C1(x1)*C2(x1)*C1(x2)≤C2(x2).注1 设Φ是非空集X上的一个模糊划分,则(P3′) 对于任意的C1,C2∈Φ和x1,x2∈X,C1(x1)*C2(x1)*C2(x2)≤C1(x2).证明我们只需要交换C1和C2就完成了(P3)和(P3′)的相互转化.我们称 (P3)和(P3′)为“三换一”规则.命题1 设Φ是非空集X上的一个模糊划分,则对任意的x∈X都存在唯一的Cx∈Φ使得Cx(x)≥e.证明对任意的x∈X,设有C1,C2∈Φ使得C2(x)≥e和C1(x)≥e.对任意的y∈X,由三换一规则,C1(y)=e*e*C1(y)≤C1(x)*C1(x)*C2(y)≤C2(y).则C1≤C2.同理,C2≤C1.因此C1=C2.在下文中,我们假设R是非空集X上的一个模糊等价关系.对于任意的x∈X,定义映射[x]R:X→L为[x]R(y)=R(x,y),称为x在R下的模糊等价类.命题2 设R是非空集X上的一个模糊等价关系,则ΦR={[x]R|x∈X}是一个模糊划分.证明显然,[x]R(x)=R(x,x)≥e,则(P1)和(P2)成立.由R的对称性和传递性,对于任意的a,b,x,y∈X有,[x]R(a)*[y]R(a)*[x]R(b)=R(x,a)*R(y,a)*R(x,b)≤R(y,b)=[y]R(b),从而(P3)成立.因此ΦR={[x]R|∀x∈X}是一个模糊划分.命题3 设Φ是非空集X上的一个模糊划分,则对于任意的x,y∈X都有Cx(y)=Cy(x). 证明由三换一规则,Cx(y)=Cx(y)*e*e≤Cx(y)*Cy(y)*Cx(x)≤Cy(x).同理,Cx(y)≥Cy(x)因此,Cx(y)=Cy(x).设Φ⊆LX是一个模糊划分,定义RΦ:X×X→L为RΦ(x,y)=Cx(y)(∀x,y∈X).命题4 设Φ是非空集X上的一个模糊划分,则RΦ是一个模糊等价关系.证明 (R1) 由命题1易得.(R2) 由命题3易得.(R3) 对于任意的x,y,z∈X,RΦ(x,y)*RΦ(y,z)=Cy(x)*Cy(z)=Cy(x)*e*Cy(z)≤Cy(x)*Cx(a)*Cy(z)≤Cx(z)=RΦ(x,z ).因此,RΦ是一个模糊等价关系.引理1 设ΦR是非空集X上的一个模糊划分,有Cx=[x]R,其中Cx为命题1中模糊划分对应的模糊子集.证明对于任意x∈X,有[x]R(x)=R(x,x)≥e,由命题1中的唯一性可知,Cx=[x]R.引理2 设ΦRΦ是非空集X上的一个模糊划分,有Cx=[x]RΦ其中Cx为命题1中模糊划分对应的模糊子集.证明对于任意的y∈X,有[x]RΦ(y)=RΦ(x,y)=Cx(y).因此,[x]RΦ=Cx.定理1 设R是非空集X上的一个模糊等价关系,Φ是非空集X上的一个模糊划分,则(1) RΦR=R;(2) ΦRΦ=Φ.因此,模糊等价关系和模糊划分之间存在一一对应性. 
证明 (1) 对于任意的x,y∈X,在ΦR中,由于[x]R(x)=R(x,x)≥e,由命题1中的唯一性知故因此,CΦR=R.(2) 首先,ΦRΦ={[x]RΦ|x∈X}.由引理2,对于任意的x∈X,[x]RΦ=Cx.则ΦRΦ⊆Φ.其次,对于任意的C∈Φ,存在x∈X使得C(x)≥e,我们有C=Cx=[x]RΦ.从而ΦRΦ⊇Φ.因此,ΦRΦ=Φ.2 模糊划分诱导的模糊粗糙近似算子在本节中,我们研究由模糊划分诱导的粗糙近似算子的性质.虽然由模糊等价关系和模糊划分的等价性可以知道,模糊划分诱导粗糙近似算子在本质上和模糊等价关系诱导的粗糙近似算子在本质上没有差别,但是由模糊划分诱导粗糙近似算子有它们独特的性质,这些性质可以应用到覆盖型模糊粗糙集理论的研究中去.我们假设L是一个交换的单位quantale.设C,A∈LX,令其取值描述为C是A的子集的程度;令其取值描述为C和A有非空的交的程度.定义5 设X是一个非空集,Φ是X上的一个模糊划分.定义两个映射分别为称为由模糊划分Φ诱导的上、下粗糙近似算子.有意思地是,这两个粗糙近似算子还有如下描述方式.定理2 设Φ是非空集X上的一个模糊划分,则对于任意的A∈LX和x∈X,有证明 (O3) 首先,对于任意的C1,C2∈Φ和y∈X,由三换一规则有从而因此,其次,对于任意的C∈Φ,y∈X,由Cx(x)≥e得因此,(O4) 首先,对于任意的C1,C2∈Φ和y∈X,由三换一规则有从而故其次,对于任意的C∈Φ,y∈X,由Cx(x)≥e得,因此,注2 公式(O1,O3)解释为:x∈aprΦ(A)当且仅当[x]⊆A,当且仅当对于任意的C∈Φ,x∈C蕴含C⊆A,当且仅当存在C∈Φ使得x∈C且C⊆A(注意C其实就是[x]).公式(O2,O4)解释为:当且仅当[x]∩A≠∅,当且仅当存在C∈Φ使得x∈C且含C∩A≠∅,当且仅当对于任意的C∈Φ,x∈C蕴含C∩A≠∅(注意C其实就是[x]). 参考文献【相关文献】[1] Pawlak Z.Rough sets[J].International Journal of Computer and Information Sciences,1982,11:341-356.[2] Pawlak Z.Rough Set:Theoretical Aspects of Reasoning about Data[M].Dordrecht:Kluwer Academic Publishers,1991.[3] Liu G L.The axiomatization of the rough set upper approximationoperations[J].Fundamenta Informaticae,2006,69:331-342.[4] Mi J S,Zhang W X.An axiomatic characterization of a fuzzy generalization of rough sets[J].Information Sciences,2004,160:235-249.[5] Thiele H.On axiomatic characterization of fuzzy approximation operatorsI[C].∥Proceedings of the 2nd International Confer ence Rough Sets and Current Trends in Computing,Canada,2000,pp.239-247.[6] Thiele H.On axiomatic characterization of fuzzy approximation operatorsII[C].∥Proceedings of the 31st IEEE International Symposium on Multiple-Valued Logic,2001.[7] Wu W Z,Mi J S,Zhang W X.Generalized fuzzy rough sets[J].InformationSciences,2003,151:263-282.[8] Wu W Z,Li T J,Gu S ing one axiom to characterize fuzzy rough approximation operators determined by a fuzzy implication operator [J].FundamentaInformaticae,2015,142:87-104.[9] Wu W Z,Zhang W X.Constructive and axiomatic approaches of fuzzy approximation operators[J].Information Sciences,2004,159:233-254.[10] Wu W Z,Leung Y,Mi J S.On characterizations of (I,T)-fuzzy rough approximation operators[J].Fuzzy Sets and Systems,2005,154:76-102.[11] Wu W Z,Xu Y H,Shao M W,et al.Axiomatic characterizations of (S,T)-fuzzy rough approximation operators[J].Information Sciences,2016,334/335:17-43.[12] Morsi N N,Yakout M M.Axiomatics for fuzzy rough sets[J].Fuzzy Sets and Systems,1998,100:327-342.[13] Radzikowska A M.On lattice-based fuzzy rough sets[C].∥35 Years of Fuzzy Set Theory,Springer,2010.[14] Radzikowska A M,Kerre E E.Fuzzy rough sets based on residuated lattices[J].Lecture Notes in Computer Science,2004,3135:278-296.[15] Wang C Y,Hu B Q.Fuzzy rough sets based on generalized residuatedlattices[J].Information Sciences,2013,248:31-49.[16] She Y H,Wang G J.An axiomatic approach of fuzzy rough sets based on residuated lattices[J].Computers and Mathematics with Applications,2009,58:189-201.[17] Deng T Q,Chen Y M,Xu W L,et al.A novel approach to fuzzy rough sets based on a fuzzy covering[J].Information Sciences,2007,177:2308-2326.[18] Ma L W.Two fuzzy covering rough set models and their generalizations over fuzzy lattices[J].Fuzzy Sets and Systems,2015,294:1-17.[19] Feng T,Zhang S P,Mi J S.The reduction and fusion of fuzzy covering systems based on the evidence theory[J].International Journal of Approximate Reasoning,2012,53:87-103. 
[20] Li T J, Leung Y, Zhang W X. Generalized fuzzy rough approximation operators based on fuzzy coverings[J]. International Journal of Approximate Reasoning, 2008, 48: 836-856.
[21] Yao W, She Y H, Lu L X. Metric-based L-fuzzy rough sets: Approximation operators and definable sets[J]. Knowledge-Based Systems, 2019, 163: 91-102.
[22] Belohlavek R. Fuzzy Relational Systems: Foundations and Principles[M]. New York: Kluwer Academic Publishers, 2002.
[23] Rosenthal K I. Quantales and Their Applications[M]. Harlow: Longman House, 1990.
[24] Yao W, Zhao B. A duality between Omega-categories and algebraic Omega-categories[J]. Electronic Notes in Theoretical Computer Science, 2014, 301: 153-168.
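Several symbols in the scanned source above are lost, so the following is only a hedged reconstruction of the partition-based approximation operators of Definition 5 in the most common special case: the truth-value structure is the unit interval with the product t-norm and its residuum (Goguen implication), the fuzzy partition is represented through the corresponding fuzzy equivalence relation R (so that the class of x is C_x(y) = R(x, y)), and the universe is finite. Function and variable names, and the numbers in the example, are my own.

```python
import numpy as np

def goguen_implication(a, b):
    """Residuum of the product t-norm on [0, 1]: a -> b = 1 if a <= b, else b / a."""
    return np.where(a <= b, 1.0, b / np.maximum(a, 1e-12))

def lower_approximation(R, A):
    """Lower approximation: apr(A)(x) = min over y of (R(x, y) -> A(y))."""
    return goguen_implication(R, A[None, :]).min(axis=1)

def upper_approximation(R, A):
    """Upper approximation: Apr(A)(x) = max over y of (R(x, y) * A(y))."""
    return (R * A[None, :]).max(axis=1)

# Tiny usage example on a 3-element universe (all numbers are made up).
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])   # a fuzzy equivalence relation: reflexive, symmetric, *-transitive
A = np.array([0.9, 0.4, 0.7])     # a fuzzy subset of the universe
print(lower_approximation(R, A))  # degree to which each node's class is contained in A
print(upper_approximation(R, A))  # degree to which each node's class overlaps A
```

With a different quantale one would swap the t-norm and its residuum accordingly; the structure of the two operators stays the same.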
journal of computer and system sciences 55,119 139(1997)A Decision-Theoretic Generalization of On-Line Learningand an Application to Boosting *Yoav Freund and Robert E.Schapire -AT 6T Labs ,180Park Avenue ,Florham Park ,New Jersey 07932Received December 19,1996In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework.The model we study can be interpreted as a broad,abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting.We show that the multiplicative weight-update Littlestone Warmuth rule can be adapted to this model,yielding bounds that are slightly weaker in some cases,but applicable to a con-siderably more general class of learning problems.We show how the resulting learning algorithm can be applied to a variety of problems,including gambling,multiple-outcome prediction,repeated games,and prediction of points in R n .In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algo-rithm.This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm.We also study generalizations of the new boosting algorithm to the problem of learning functions whose range,rather than being binary,is an arbitrary finite set or a bounded segment of the real line.]1997Academic Press1.INTRODUCTIONA gambler,frustrated by persistent horse-racing losses and envious of his friends'winnings,decides to allow a group of his fellow gamblers to make bets on his behalf.He decides he will wager a fixed sum of money in every race,but that he will apportion his money among his friends based on how well they are doing.Certainly,if he knew psychically ahead of time which of his friends would win the most,he would naturally have that friend handle all his cking such clairvoyance,however,he attempts to allocate each race's wager in such a way that his total winnings for the season will be reasonably close to what he would have won had he bet everything with the luckiest of his friends.In this paper,we describe a simple algorithm for solving such dynamic allocation problems,and we show that our solution can be applied to a great assortment of learning problems.Perhaps the most surprising of these applications is the derivation of a new algorithm for ``boosting,''i.e.,forconverting a ``weak''PAC learning algorithm that performs just slightly better than random guessing into one with arbitrarily high accuracy.We formalize our on-line allocation model as follows.The allocation agent A has N options or strategies to choose from;we number these using the integers 1,...,N .At each time step t =1,2,...,T ,the allocator A decides on a distribu-tion p t over the strategies;that is p t i 0is the amountallocated to strategy i ,and N i =1p ti =1.Each strategy i then suffers some loss l t i which is determined by the (possibly adversarial)``environment.''The loss suffered by A is thenn i =1p t i l t i =p t }l t,i.e.,the average loss of the strategies with respect to A 's chosen allocation rule.We call this loss func-tion the mixture loss .In this paper,we always assume that the loss suffered by any strategy is bounded so that,without loss of generality,l t i #[0,1].Besides this condition,we make no assumptions about the form of the loss vectors l t ,or about the manner in which they are generated;indeed,the adversary's choice for l t may even depend on the allocator's chosen mixture p t .The goal of 
the algorithm A is to minimize its cumulative loss relative to the loss suffered by the best strategy.That is,A attempts to minimize its net lossL A &min iL iwhereL A =:Tt =1p t }l tis the total cumulative loss suffered by algorithm A on the first T trials,andL i =:Tt =1l t iis strategy i 's cumulative loss.In Section 2,we show that Littlestone and Warmuth's [20]``weighted majority''algorithm can be generalized toarticle no.SS9715041190022-0000Â97 25.00Copyright 1997by Academic PressAll rights of reproduction in any form reserved.*An extended abstract of this work appeared in the ``Proceedings of the Second European Conference on Computational Learning Theory,Barcelona,March,1995.''-E-mail:[yoav,schapire ]Ä.handle this problem,and we prove a number of bounds on the net loss.For instance,one of our results shows that the net loss of our algorithm can be bounded by O(-T ln N) or,put another way,that the average per trial net loss is decreasing at the rate O(-(ln N)ÂT).Thus,as T increases, this difference decreases to zero.Our results for the on-line allocation model can be applied to a wide variety of learning problems,as we describe in Section3.In particular,we generalize the results of Littlestone and Warmuth[20]and Cesa-Bianchi et al.[4]for the problem of predicting a binary sequence using the advice of a team of``experts.''Whereas these authors proved worst-case bounds for making on-line randomized decisions over a binary decision and outcome space with a[0,1]-valued discrete loss,we prove(slightly weaker) bounds that are applicable to any bounded loss function over any decision and outcome spaces.Our bounds express explicitly the rate at which the loss of the learning algorithm approaches that of the best expert.Related generalizations of the expert prediction model were studied by Vovk[25],Kivinen and Warmuth[19], and Haussler et al.[15].Like us,these authors focused primarily on multiplicative weight-update algorithms.Chung [5]also presented a generalization,giving the problem a game-theoretic treatment.BoostingReturning to the horse-racing story,suppose now that the gambler grows weary of choosing among the experts and instead wishes to create a computer program that will accurately predict the winner of a horse race based on the usual information(number of races recently won by each horse,betting odds for each horse,etc.).To create such a program,he asks his favorite expert to explain his betting strategy.Not surprisingly,the expert is unable to articulate a grand set of rules for selecting a horse.On the other hand, when presented with the data for a specific set of races,the expert has no trouble coming up with a``rule-of-thumb''for that set of races(such as,``Bet on the horse that has recently won the most races''or``Bet on the horse with the most favored odds'').Although such a rule-of-thumb,by itself,is obviously very rough and inaccurate,it is not unreasonable to expect it to provide predictions that are at least a little bit better than random guessing.Furthermore,by repeatedly asking the expert's opinion on different collections of races, the gambler is able to extract many rules-of-thumb.In order to use these rules-of-thumb to maximum advan-tage,there are two problems faced by the gambler:First, how should he choose the collections of races presented to the expert so as to extract rules-of-thumb from the expert that will be the most useful?Second,once he has collected many rules-of-thumb,how can they be combined into a single,highly accurate prediction rule?Boosting refers 
to this general problem of producing a very accurate prediction rule by combining rough andmoderately inaccurate rules-of-thumb.In the second part of the paper,we present and analyze a new boosting algorithm inspired by the methods we used for solving the on-line allocation problem.Formally,boosting proceeds as follows:The booster is provided with a set of labelled training examples(x1,y1), ...,(x N,y N),where y i is the label associated with instance x i; for instance,in the horse-racing example,x i might be the observable data associated with a particular horse race,and y i the outcome(winning horse)of that race.On each round t=1,...,T,the booster devises a distribution D t over the set of examples,and requests(from an unspecified oracle)a weak hypothesis(or rule-of-thumb)h t with low error=t with respect to D t(that is,=t=Pr i t Dt[h t(x i){y i]).Thus,dis-tribution D t specifies the relative importance of each example for the current round.After T rounds,the booster must combine the weak hypotheses into a single prediction rule. Unlike the previous boosting algorithms of Freund [10,11]and Schapire[22],the new algorithm needs no prior knowledge of the accuracies of the weak hypotheses. Rather,it adapts to these accuracies and generates a weighted majority hypothesis in which the weight of each weak hypothesis is a function of its accuracy.For binary prediction problems,we prove in Section4that the error of this final hypothesis(with respect to the given set ofexamples)is bounded by exp(&2 Tt=1#2t)where=t= 1Â2&#t is the error of the t th weak hypothesis.Since a hypothesis that makes entirely random guesses has error 1Â2,#t measures the accuracy of the t th weak hypothesis relative to random guessing.Thus,this bound shows that if we can consistently find weak hypotheses that are slightly better than random guessing,then the error of the final hypothesis drops exponentially fast.Note that the bound on the accuracy of the final hypothesis improves when any of the weak hypotheses is improved.This is in contrast with previous boosting algo-rithms whose performance bound depended only on the accuracy of the least accurate weak hypothesis.At the same time,if the weak hypotheses all have the same accuracy,the performance of the new algorithm is very close to that achieved by the best of the known boosting algorithms.In Section5,we give two extensions of our boosting algo-rithm to multi-class prediction problems in which each example belongs to one of several possible classes(rather than just two).We also give an extension to regression problems in which the goal is to estimate a real-valued function.2.THE ON-LINE ALLOCATION ALGORITHMAND ITS ANALYSISIn this section,we present our algorithm,called Hedge(;),for the on-line allocation problem.The algorithm120FREUND AND SCHAPIREand its analysis are direct generalizations of Littlestone and Warmuth's weighted majority algorithm[20].The pseudo-code for Hedge(;)is shown in Fig.1.The algorithm maintains a weight vector whose value at time tis denoted w t=(w t1,...,w tN).At all times,all weights willbe nonnegative.All of the weights of the initial weight vector w1must be nonnegative and sum to one,so thatNi=1w1i=1.Besides these conditions,the initial weight vec-tor may be arbitrary,and may be viewed as a``prior''over the set of strategies.Since our bounds are strongest for those strategies receiving the greatest initial weight,we will want to choose the initial weights so as to give the most weight to those strategies which we expect are most likely to 
perform the best. Naturally, if we have no reason to favor any of the strategies, we can set all of the initial weights equally so that $w^1_i = 1/N$. Note that the weights on future trials need not sum to one.

Our algorithm allocates among the strategies using the current weight vector, after normalizing. That is, at time $t$, Hedge($\beta$) chooses the distribution vector

$$p^t = \frac{w^t}{\sum_{i=1}^N w^t_i}. \qquad (1)$$

After the loss vector $\ell^t$ has been received, the weight vector $w^t$ is updated using the multiplicative rule

$$w^{t+1}_i = w^t_i \cdot \beta^{\ell^t_i}. \qquad (2)$$

More generally, it can be shown that our analysis is applicable with only minor modification to an alternative update rule of the form $w^{t+1}_i = w^t_i \cdot U_\beta(\ell^t_i)$, where $U_\beta : [0,1] \to [0,1]$ is any function, parameterized by $\beta \in [0,1]$, satisfying

$$\beta^r \le U_\beta(r) \le 1 - (1-\beta) r$$

for all $r \in [0,1]$.

2.1. Analysis

The analysis of Hedge($\beta$) mimics directly that given by Littlestone and Warmuth [20]. The main idea is to derive upper and lower bounds on $\sum_{i=1}^N w^{T+1}_i$ which, together, imply an upper bound on the loss of the algorithm. We begin with an upper bound.

Lemma 1. For any sequence of loss vectors $\ell^1, \ldots, \ell^T$,

$$\ln\Big(\sum_{i=1}^N w^{T+1}_i\Big) \le -(1-\beta)\, L_{\mathrm{Hedge}(\beta)}.$$

Algorithm Hedge($\beta$)
Parameters: $\beta \in [0,1]$; initial weight vector $w^1 \in [0,1]^N$ with $\sum_{i=1}^N w^1_i = 1$; number of trials $T$.
Do for $t = 1, 2, \ldots, T$:
1. Choose allocation $p^t = w^t / \sum_{i=1}^N w^t_i$.
2. Receive loss vector $\ell^t \in [0,1]^N$ from environment.
3. Suffer loss $p^t \cdot \ell^t$.
4. Set the new weights vector to be $w^{t+1}_i = w^t_i \beta^{\ell^t_i}$.
FIG. 1. The on-line allocation algorithm.

Proof. By a convexity argument, it can be shown that

$$\alpha^r \le 1 - (1-\alpha) r \qquad (3)$$

for $\alpha \ge 0$ and $r \in [0,1]$. Combined with Eqs. (1) and (2), this implies

$$\sum_{i=1}^N w^{t+1}_i = \sum_{i=1}^N w^t_i \beta^{\ell^t_i} \le \sum_{i=1}^N w^t_i \big(1 - (1-\beta)\ell^t_i\big) = \Big(\sum_{i=1}^N w^t_i\Big)\big(1 - (1-\beta)\, p^t \cdot \ell^t\big). \qquad (4)$$

Applying repeatedly for $t = 1, \ldots, T$ yields

$$\sum_{i=1}^N w^{T+1}_i \le \prod_{t=1}^T \big(1 - (1-\beta)\, p^t \cdot \ell^t\big) \le \exp\Big(-(1-\beta)\sum_{t=1}^T p^t \cdot \ell^t\Big)$$

since $1 + x \le e^x$ for all $x$. The lemma follows immediately. ∎

Thus,

$$L_{\mathrm{Hedge}(\beta)} \le \frac{-\ln\big(\sum_{i=1}^N w^{T+1}_i\big)}{1-\beta}. \qquad (5)$$

Note that, from Eq. (2),

$$w^{T+1}_i = w^1_i \prod_{t=1}^T \beta^{\ell^t_i} = w^1_i \, \beta^{L_i}. \qquad (6)$$

This is all that is needed to complete our analysis.
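The procedure of Fig. 1 is short enough to state directly in code. The following is a minimal sketch in Python with NumPy; the driver interface and the choice of β in the usage lines (set as in Section 2.2 below with the crude bound L̃ = T and R̃ = ln N) are illustrative framing, not code from the paper.

```python
import numpy as np

def hedge(losses, beta, w_init=None):
    """Run Hedge(beta) against a T x N array of per-trial strategy losses in [0, 1].

    Returns the cumulative mixture loss sum_t p^t . l^t and the final weights."""
    losses = np.asarray(losses, dtype=float)
    T, N = losses.shape
    w = np.full(N, 1.0 / N) if w_init is None else np.asarray(w_init, dtype=float)
    total = 0.0
    for t in range(T):
        p = w / w.sum()               # Eq. (1): normalized allocation p^t
        total += p @ losses[t]        # mixture loss suffered on trial t
        w = w * beta ** losses[t]     # Eq. (2): multiplicative weight update
    return total, w

# Usage: beta = g(L~ / ln N) with g(z) = 1 / (1 + sqrt(2 / z)) and L~ = T.
rng = np.random.default_rng(0)
T, N = 1000, 10
losses = rng.random((T, N))
beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(N) / T))
total, _ = hedge(losses, beta)
print(total, losses.sum(axis=0).min())   # Hedge's cumulative loss vs. the best single strategy
```

The per-trial cost is O(N), and the only state carried between trials is the weight vector, which matches the description of the algorithm above.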
The bound given in Eq.(9)can be written asL Hedge(;) c miniL i+a ln N,(10)where c=ln(1Â;)Â(1&;)and a=1Â(1&;).Vovk[24] analyzes prediction algorithms that have performance bounds of this form,and proves tight upper and lower bounds for the achievable values of c and ing Vovk's results,we can show that the constants a and c achieved by Hedge(;)are optimal.Theorem3.Let B be an algorithm for the on-line alloca-tion problem with an arbitrary number of strategies.Suppose that there exists positive real numbers a and c such that for any number of strategies N and for any sequence of loss vectors l1,...,l TL B c miniL i+a ln N.Then for all;#(0,1),eithercln(1Â;)1&;or a1(1&;).The proof is given in the appendix.2.2.How to Choose;So far,we have analyzed Hedge(;)for a given choice of ;,and we have proved reasonable bounds for any choice of ;.In practice,we will often want to choose;so as to maxi-mally exploit any prior knowledge we may have about the specific problem at hand.The following lemma will be helpful for choosing;using the bounds derived above.Lemma4.Suppose0 L L and0<R R.Let;= g(LÂR)where g(z)=1Â(1+-2Âz).Then&L ln;+R1&;L+-2L R+R.Proof.(Sketch)It can be shown that&ln; (1&;2)Â(2;)for;#(0,1].Applying this approximation and the given choice of;yields the result.KLemma4can be applied to any of the bounds above since all of these bounds have the form given in the lemma.For example,suppose we have N strategies,and we also know a prior bound L on the loss of the best strategy.Then, combining Eq.(9)and Lemma4,we haveL Hedge(;) miniL i+-2L ln N+ln N(11)for;=g(LÂln N).In general,if we know ahead of time the number of trials T,then we can use L=T as an upper bound on the cumulative loss of each strategy i.Dividing both sides of Eq.(11)by T,we obtain an explicit bound on the rate at which the average per-trial loss of Hedge(;)approaches the average loss for the best strategy:L Hedge(;)TminiL iT+-2L ln NT+ln NT.(12)122FREUND AND SCHAPIRESince L T,this gives a worst case rate of convergence of O(-(ln N)ÂT).However,if L is close to zero,then the rate of convergence will be much faster,roughly,O((ln N)ÂT). Lemma4can also be applied to the other bounds given in Theorem2to obtain analogous results.The bound given in Eq.(11)can be improved in special cases in which the loss is a function of a prediction and an outcome and this function is of a special form(see Example4 below).However,for the general case,one cannot improve the square-root term-2L ln N,by more than a constant factor.This is a corollary of the lower bound given by Cesa-Bianchi et al.([4],Theorem7)who analyze an on-line prediction problem that can be seen as a special case of the on-line allocation model.3.APPLICATIONSThe framework described up to this point is quite general and can be applied in a wide variety of learning problems. 
Consider the following set-up used by Chung[5].We are given a decision space2,a space of outcomes0,and a bounded loss function*:2_0Ä[0,1].(Actually,our results require only that*be bounded,but,by rescaling,we can assume that its range is[0,1].)At every time step t,the learning algorithm selects a decision$t#2,receives an out-come|t#0,and suffers loss*($t,|t).More generally,we may allow the learner to select a distribution D t over the space of decisions,in which case it suffers the expected loss of a decision randomly selected according to D t;that is,its expected loss is4(D t,|t)where4(D,|)=E$t D[*($,|)].To decide on distribution D t,we assume that the learner has access to a set of N experts.At every time step t,experti produces its own distribution E ti on2,and suffers loss4(E ti ,|t).The goal of the learner is to combine the distributions produced by the experts so as to suffer expected loss``not much worse''than that of the best expert.The results of Section2provide a method for solving this problem.Specifically,we run algorithm Hedge(;),treating each expert as a strategy.At every time step,Hedge(;) produces a distribution p t on the set of experts which is used to construct the mixture distributionD t=:Ni=1p tiE ti.For any outcome|t,the loss suffered by Hedge(;)will then be4(D t,|t)=:Ni=1p ti4(E ti,|t).Thus,if we define l ti=4(E ti,|t)then the loss suffered by thelearner is p t}l t,i.e.,exactly the mixture loss that wasanalyzed in Section2.Hence,the bounds of Section2can be applied to ourcurrent framework.For instance,applying Eq.(11),weobtain the following:Theorem5.For any loss function*,for any set ofexperts,and for any sequence of outcomes,the expected lossof Hedge(;)if used as described above is at most:Tt=14(D t,|t) mini:Tt=14(E ti,|t)+-2L ln N+ln Nwhere L T is an assumed bound on the expected loss of thebest expert,and;=g(LÂln N).Example1.In the k-ary prediction problem,2=0=[1,2,...,k],and*($,|)is1if${|and0otherwise.Inother words,the problem is to predict a sequence of lettersover an alphabet of size k.The loss function*is1if amistake was made,and0otherwise.Thus,4(D,|)is theprobability(with respect to D)of a prediction that disagreeswith|.The cumulative loss of the learner,or of any expert,is therefore the expected number of mistakes on the entiresequence.So,in this case,Theorem2states that the expec-ted number of mistakes of the learning algorithm will exceedthe expected number of mistakes of the best expert by atmost O(-T ln N),or possibly much less if the loss of thebest expert can be bounded ahead of time.Bounds of this type were previously proved in the binarycase(k=2)by Littlestone and Warmuth[20]using thesame algorithm.Their algorithm was later improved byVovk[25]and Cesa-Bianchi et al.[4].The main result ofthis section is a proof that such bounds can be shown tohold for any bounded loss function.Example2.The loss function*may represent anarbitrary matrix game,such as``rock,paper,scissors.''Here,2=0=[R,P,S],and the loss function is defined by thematrix:|R P SR1210$P0121S1012The decision$represents the learner's play,and the out-come|is the adversary's play;then*($,|),the learner'sloss,is1if the learner loses the round,0if it wins the round,and1Â2if the round is tied.(For instance,*(S,P)=0since``scissors cut paper.'')So the cumulative loss of the learner123A DECISION THEORETIC GENERALIZATION(or an expert)is the expected number of losses in a series of rounds of game play(counting ties as half a loss).Our results show then that,in repeated play,the expected number of 
rounds lost by our algorithm will converge quickly to the expected number that would have been lost by the best of the experts(for the particular sequence of moves that were actually played by the adversary).Example3.Suppose that2and0are finite,and that* represents a game matrix as in the last example.Suppose further that we create one expert for each decision$#2and that expert always recommends playing$.In game-theoretic terminology such experts would be identified with pure strategies.Von Neumann's classical min-max theorem states that for any fixed game matrix there exists a distribu-tion over the actions,also called a mixed strategy,which achieves the min-max optimal value of the expected loss against any adversarial strategy.This min-max value is also called the value of the game.Suppose that we use algorithm Hedge(;)to choose dis-tributions over the actions when playing a matrix game repeatedly.In this case,Theorem2implies that the gap between the learner's average per-round loss can never be much larger than that of the best pure strategy,and that the maximal gap decreases to zero at the rate O(1-T log|2|). However,the expected loss of the optimal mixed strategy is a fixed convex combination of the losses of the pure strategies,thus it can never be smaller than the loss of the best pure strategy for a particular sequence of events. We conclude that the expected per-trial loss of Hedge(;) is upper bounded by the value of the game plus O(1Â-T log|2|).In other words,the algorithm can never perform much worse that an algorithm that uses the optimal mixed strategy for the game,and it can be better if the adversary does not play optimally.Moreover,this holds true even if the learner knows nothing at all about the game that is being played(so that*is unknown to the learner), and even if the adversarial opponent has complete knowledge both of the game that is being played and the algorithm that is being used by the learner.Algorithms with similar properties(but weaker convergence bounds)were first devised by Blackwell[2]and Hannan[14].For more details see our related paper[13].Example4.Suppose that2=0is the unit ball in R n, and that*($,|)=&$&|&.Thus,the problem here is to predict the location of a point|,and the loss suffered is the Euclidean distance between the predicted point$and the actual outcome|.Theorem2can be applied if probabilistic predictions are allowed.However,in this setting it is more natural to require that the learner and each expert predict a single point(rather than a measure on the space of possible points).Essentially,this is the problem of``tracking''a sequence of points|1,...,|T where the loss function measures the distance to the predicted point.To see how to handle the problem of finding deterministic predictions,notice that the loss function*($,|)is convex with respect to$:&(a$1+(1&a)$2)&|& a&$1&|&+(1&a)&$2&|&(13) for any a#[0,1]and any|#0.Thus we can do as follows. 
At time t,the learner predicts with the weighted average of the experts'predictions:$t= Ni=1p ti=tiwhere=ti#R n is the prediction of the i th expert at time t.Regardless of the outcome|t,Eq.(13)implies that&$t&|t& :Ni=1p ti&=ti&|t&.Since Theorem2provides an upper bound on the right hand side of this inequality,we also obtain upper bounds for the left hand side.Thus,our results in this case give explicit bounds on the total error(i.e.,distance between predicted and observed points)for the learner relative to the best of a team of experts.In the one-dimensional case(n=1),this case was previously analyzed by Littlestone and Warmuth[20],and later improved upon by Kivinen and Warmuth[19].This result depends only on the convexity and the bounded range of the loss function*($,|)with respect to$. Thus,it can also be applied,for example,to the squared-distance loss function*($,|)=&$&|&2,as well as the log loss function*($,|)=&ln($}|)used by Cover[6]for the design of``universal''investment portfolios.(In this last case,2is the set of probability vectors on n points,and 0=[1ÂB,B]n for some constant B>1.)In many of the cases listed above,superior algorithms or analyses are known.Although weaker in specific cases,it should be emphasized that our results are far more general, and can be applied in settings that exhibit considerably less structure,such as the horse-racing example described in the introduction.4.BOOSTINGIn this section we show how the algorithm presented in Section2for the on-line allocation problem can be modified to boost the performance of weak learning algorithms.We very briefly review the PAC learning model(see,for instance,Kearns and Vazirani[18]for a more detailed description).Let X be a set called the domain.A concept is a Boolean function c:XÄ[0,1].A concept class C is a collection of concepts.The learner has access to an oracle which provides labelled examples of the form(x,c(x)) where x is chosen randomly according to some fixed but124FREUND AND SCHAPIREunknown and arbitrary distribution D on the domain X, and c#C is the target concept.After some amount of time, the learner must output a hypothesis h:XÄ[0,1].The value h(x)can be interpreted as a randomized prediction of the label of x that is1with probability h(x)and0with prob-ability1&h(x).(Although we assume here that we have direct access to the bias of this prediction,our results can be extended to the case that h is instead a random mapping into[0,1].)The error of the hypothesis h is the expected value E x t D(|h(x)&c(x)|)where x is chosen according to D.If h(x)is interpreted as a stochastic prediction,then this is simply the probability of an incorrect prediction.A strong PAC-learning algorithm is an algorithm that, given=,$>0and access to random examples,outputs with probability1&$a hypothesis with error at most=.Further, the running time must be polynomial in1Â=,1Â$and other relevant parameters(namely,the``size''of the examples received,and the``size''or``complexity''of the target concept).A weak PAC-learning algorithm satisfies the same conditions but only for= 1Â2&#where#>0is either a constant,or decreases as1Âp where p is a polynomial in the relevant parameters.We use WeakLearn to denote a generic weak learning algorithm.Schapire[22]showed that any weak learning algorithm can be efficiently transformed or``boosted''into a strong learning ter,Freund[10,11]presented the ``boost-by-majority''algorithm that is considerably more efficient than Schapire's.Both algorithms work by calling a given weak learning algorithm 
WeakLearn multiple times, each time presenting it with a different distribution over the domain X,and finally combining all of the generated hypotheses into a single hypothesis.The intuitive idea is to alter the distribution over the domain X in a way that increases the probability of the``harder''parts of the space, thus forcing the weak learner to generate new hypotheses that make less mistakes on these parts.An important,practical deficiency of the boost-by-majority algorithm is the requirement that the bias#of the weak learning algorithm WeakLearn be known ahead of time.Not only is this worst-case bias usually unknown in practice,but the bias that can be achieved by WeakLearn will typically vary considerably from one distribution to the next.Unfortunately,the boost-by-majority algorithm can-not take advantage of hypotheses computed by WeakLearn with error significantly smaller than the presumed worst-case bias of1Â2&#.In this section,we present a new boosting algorithm which was derived from the on-line allocation algorithm of Section2.This new algorithm is very nearly as efficient as boost-by-majority.However,unlike boost-by-majority, the accuracy of the final hypothesis produced by the new algorithm depends on the accuracy of all the hypotheses returned by WeakLearn,and so is able to more fully exploit the power of the weak learning algorithm.Also,this new algorithm gives a clean method for handling real-valued hypotheses which often are produced by neural networks and other learning algorithms.4.1.The New Boosting AlgorithmAlthough boosting has its roots in the PAC model,for the remainder of the paper,we adopt a more general learning framework in which the learner receives examples(x i,y i) chosen randomly according to some fixed but unknown distribution P on X_Y,where Y is a set of possible labels. As usual,the goal is to learn to predict the label y given an instance x.We start by describing our new boosting algorithm in the simplest case that the label set Y consists of just two possible labels,Y=[0,1].In later sections,we give extensions of the algorithm for more general label sets.Freund[11]describes two frameworks in which boosting can be applied:boosting by filtering and boosting by sampling.In this paper,we use the boosting by sampling framework,which is the natural framework for analyzing ``batch''learning,i.e.,learning using a fixed training set which is stored in the computer's memory.We assume that a sequence of N training examples (labelled instances)(x1,y1),...,(x N,y N)is drawn randomly from X_Y according to distribution P.We use boosting to find a hypothesis h f which is consistent with most of the sample(i.e.,h f(x i)=y i for most1 i N).In general,a hypothesis which is accurate on the training set might not be accurate on examples outside the training set;this problem is sometimes referred to as``over-fitting.''Often, however,overfitting can be avoided by restricting the hypothesis to be simple.We will come back to this problem in Section4.3.The new boosting algorithm is described in Fig.2.The goal of the algorithm is to find a final hypothesis with low error relative to a given distribution D over the training examples.Unlike the distribution P which is over X_Y and is set by``nature,''the distribution D is only over the instances in the training set and is controlled by the learner. 
Ordinarily,this distribution will be set to be uniform so that D(i)=1ÂN.The algorithm maintains a set of weights w t over the training examples.On iteration t a distribution p t is computed by normalizing these weights.This distribution is fed to the weak learner WeakLearn which generates a hypothesis h t that(we hope)has small error with respect to the distribution.1Using the new hypothesis h t,the boosting125A DECISION THEORETIC GENERALIZATION1Some learning algorithms can be generalized to use a given distributiondirectly.For instance,gradient based algorithms can use the probabilityassociated with each example to scale the update step size which is basedon the example.If the algorithm cannot be generalized in this way,thetraining sample can be re-sampled to generate a new set of training exam-ples that is distributed according to the given distribution.The computa-tion required to generate each re-sampled example takes O(log N)time.。
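The description above stops short of the weight update and the final combined hypothesis, which live in Fig. 2 (not reproduced in this excerpt). The sketch below fills those steps with the standard AdaBoost rules (beta_t = eps_t/(1 - eps_t), multiplicative reweighting, weighted-vote output), so it should be read as a plausible reconstruction to be checked against Fig. 2 rather than a transcription; the interface of `weak_learn` is an assumption.

```python
import numpy as np

def adaboost(X, y, weak_learn, T, D=None):
    """Sketch of the boosting-by-sampling procedure described above.

    weak_learn(X, y, p) is assumed to return a hypothesis h with h(x) in [0, 1]
    whose error on distribution p is below 1/2.  Weight update and final vote
    follow the standard AdaBoost formulation.
    """
    N = len(X)
    w = np.ones(N) / N if D is None else np.asarray(D, dtype=float)
    hyps, betas = [], []
    for _ in range(T):
        p = w / w.sum()                                  # normalize weights into p_t
        h = weak_learn(X, y, p)                          # weak hypothesis for p_t
        err = np.abs(np.array([h(x) for x in X]) - y)    # |h(x_i) - y_i|
        eps = float(p @ err)                             # weighted error of h_t
        if eps >= 0.5:                                   # weak-learning assumption violated
            break
        beta = max(eps, 1e-12) / (1.0 - eps)
        w = w * beta ** (1.0 - err)                      # shrink weights of well-predicted examples
        hyps.append(h)
        betas.append(beta)

    def h_final(x):
        votes = sum(np.log(1.0 / b) * h(x) for h, b in zip(hyps, betas))
        return 1 if votes >= 0.5 * sum(np.log(1.0 / b) for b in betas) else 0

    return h_final
```

A decision stump chosen to minimize the p-weighted error is a typical choice of `weak_learn` in this setting; the point of the algorithm is that the reweighting concentrates later stumps on the examples earlier ones got wrong.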
a rX iv:mat h /67441v1[mat h.DG ]18J ul26A generalization of Reifenberg’s theorem in R 3G.David,T.De Pauw T.Toro ∗Abstract In 1960Reifenberg proved the topological disc property.He showed that a subset of R n which is well approximated by m -dimensional affine spaces at each point and at each (small)scale is locally a bi-H¨o lder image of the unit ball in R m .In this paper we prove that a subset of R 3which is well approximated by a minimal cone at each point and at each (small)scale is locally a bi-H¨o lder deformation of a minimal cone.We also prove an analogous result for more general cones in R n .1Introduction In 1960,thanks to the development of algebraic topology,Reifenberg was able to formulate the Plateau problem for m -dimensional surfaces of varying topological type in R k (see [R1]).He proved that given a set Γ⊂R k homeomorphic to S m −1there exists a set Σ0with ∂Σ0=Γwhich minimizes the H m Hausdorffmeasure among all competitors in the appropriate class.Furthermore he showed that for almost every x ∈Σ0there exists a neighborhood of x which is a topological disk of dimension m .A remarkable result in [R1]is the Topological Disk Theorem.In general terms it says that if a set is close to an m -plane in the Hausdorffdistance sense at all points and at all (small enough)scales,then it is locally biH¨o lder equivalent to a ball of R m .Using a monotonicity formula for the density,Reifenberg proved that some open subset of full measure of Σ0satisfies this condition.In 1964he proved an Epiperimetric inequality for solutions to the Plateau problem described above.This allowed him to show that the minimizer Σ0is locally real analytic (see [R2],and [R3]).Although the Topological Disk Theorem has never again been used as a tool to study the regularity of minimal surfaces it has played a role in understanding their singularities as well as the singular set of energy minimizing harmonic maps (see [HL]).Reifenberg’s proof has been adapted to producebiLipschitz,quasi-symmetric and biH¨o lder parameterizations both for subsets of Euclidean space and general metric spaces (under the appropriate flatness assumptions).See [To],[DT],and [CC].In 1976J.Taylor [Ta]classified the tangent cones for Almgren almost-minimal sets in R 3.She showed that there are three types of nonempty minimal cones of dimension 2in R 3:the planes,sets that are obtained by taking the product of a Y in a plane with a linein the orthogonal direction,and sets composed of six angular sectors bounded by four half lines that start from the same point and make(maximal)equal angles.See the more precise Definitions2.2and2.3.By lack of better names,we shall call these minimal cones sets of type1,2,and3respectively.In this paper we generalize Reifenberg’s Topological Disk Theorem to the case when the set is close in the Hausdorffdistance sense to a set of type1,2or3at every point and every(small enough)ly,if E is a closed set in R3which is sufficiently close to a two-dimensional minimal cone at all scales and locations,then there is,at least locally,a bi-H¨o lder parameterization of E by a minimal cone.Reifenberg’s theorem corresponds to the case of approximation by planes.Let usfirst state the main result and comment later. 
Theorem1.1For eachε∈(0,10−15),we canfindα∈(0,1)such the following holds.Let E⊂R3be a compact set that contains the origin,and assume that for each x∈E∩B(0,2) and each radius r>0such that B(x,r)⊂B(0,2),there is a minimal cone Z(x,r)that contains x,such that(1.1)D x,r(E,Z(x,r))≤ε,where we use the more convenient variant of Hausdorffdistance D x,r defined by1D x,r(E,F)=Notice that by(1.4)and(1.5)the restriction of f to Z∩B(0,3/2)provides a biH¨o lderparameterization a piece of E that contains E∩B(0,1).This may be enough information, but in some cases it may also be good to know that this parameterization comes from abiH¨o lder local homeomorphism of R3,as in the statement,because this yields informationon the position of E in space.For instance,we can use Theorem1.1to construct local retractions of space near E onto E.Remark1.1We shall see that when Z(0,2)is a plane,we can take Z to be a plane;whenZ(0,2)is a set of type2centered at the origin,we can take Z to be a set of type2centered at the origin;when Z(0,2)is a set of type3centered at the origin,we can take Z to be aset of type3centered at the origin.In addition,our proof will yield that(1.7)B(0,17/10)⊂f(B(0,18/10))⊂B(0,2),(1.8)E∩B(0,18/10)⊂f(Z∩B(0,19/10))⊂E∩B(0,2),and that(1.5)and(1.6)holds for x,y∈B(0,18/10).In fact,we shall omit the case of the plane(too easy and well known),and concentrateon the two other special cases.The general case will essentially follow.Remark1.2It would be a little too optimistic to think that f is quasisymmetric.This is already false when Z(0,2)is a plane,because E could be the product of R with a snowflakein R2.Remark1.3Theorem1.1can probably be generalized to a number of situations,where approximation by minimal sets of types1,2,and3is replaced with approximation by varioustypes of sets with a hierarchical simplex structure.We did notfind the optimal way to statethis,and probably this would make the proof a little heavier.What we shall do instead is state a slightly more general result in Theorem2.2,prove Theorem1.1essentially as itis,and add a few comments from place to place to explain how to generalize the proof forTheorem2.2.A more general statement,if ever needed,is left out for future investigation.Our proof will use the hierarchical structure of E.We shall see that E∩B(0,2)splitsnaturally into three disjoint subsets E1,E2,and E3,where E j is the set of points where Elooks like a set of type j in small balls centered at x(more precise definitions will be given in Section4).If Z=Z(0,2)is a plane,we do not need sets of type2or3in smaller balls,and we are in the situation of Reifenberg’s theorem.Two other main cases will remain,as in Remark1.2.The case when Z is set of type2,with its spine passing through the origin(see Definition2.2below),and the case when Z is set of type3centered at the origin.In thefirst case,we shall see that E3∩B(0,199/100)is empty and E2is locally a Reifenberg-flat one-dimensional set;in the second case,E3∩B(0,3/2)is just composed of one point near the origin,and away from this point E2is locally a Reifenberg-flat.See the end of Section4for details3The sets E1and E2will play a special role in the proof.Even though we shall in fact construct f directly as an infinite composition of homeomorphisms in space(that move points at smaller and smaller dyadic scales),we shall pay much more attention to the definition of f on E2,and then E,just as if we were defining ffirst from the spine of Z to E2,then from Z to E,then on the rest of B(0,3/2).The construction yields that the restriction of 
f to Z to each of the three or six faces of Z is of class C1if the approximation at small scales is sufficiently good(for instance if we can takeε=Crβon balls of radius r),see Section10.Theorem1.1will be used in[D1]and[D2]to give a slightly different proof of J.Taylor’s regularity result for minimal sets from[Ta].The plan for the rest of this paper is as follows.In Section2we define sets of type2 and3and state a uniform version of our main theorem.We also discuss the moreflexible version of Theorem1.1mentioned in Remark1.3.In Section3we record some of the simple geometrical facts about minimal sets of types1,2,3,and in particular lower bounds on their relative Hausdorffdistances,that will be used in the proof.In Section4we show that E∩B(0,2)is the disjoint union of E1,E2and E3,that E1is locally Reifenberg-flat,and that E3is discrete.In Section5we define the partitions of unity which we use in the construction of the biH¨o lder parameterization.In Section6we construct a parameterization of E2when E3is empty.In Section7we extend this to a parameterization of E.In Section8we explain how to modify the construction when there is a tetrahedral point in E3.In Section9we extend the parameterization to the whole ball.Finally,in Section10,we prove that our parameterization of E is C1on each face of Z when the numbers D x,r(E,Z(x,r))tend to0 sufficiently fast,uniformly in x,as r tends to0.Acknowledgments:Part of this work was completed in Autumn,2005.T.Toro was vis-iting the Newton Institute of Mathematical Sciences in Cambridge,U.K.and the University College London in London.She would like to thank these institutions for their hospitality. 2Two other statementsWe start with a uniform version of Theorem1.1.Definition2.1A set E⊂R3isε-Reifenbergflat(of dimension2)if for each compact set K⊂R3there exists r K>0such that for x∈E∩K and r<r K,D x,r(E,L)≤ε,(2.1)infL∋xwhere the infimimum is taken over all planes L containing x.Note that this definition is only meaningful forεsmall.Reifenberg’s Topological Disk Theorem gives local parameterizations ofε-Reifenbergflat sets.We want to extend this to4theε-minimal sets defined below,butfirst we give a full description of sets of type2and3. 
Recall that sets of type1are just planes.Definition2.2Define Prop⊂R2byProp={(x1,x2):x1≥0,x2=0}(x1,x2):x1≤0,x2=−√3x1 .Define Y0⊂R3by Y0=Prop×R.The spine of Y0is the line L0={x1=x2=0}.A set of type2is a set Y=R(Y0),where R is the composition of a translation and a rotation.The spine of Y is the line R(L0).Definition2.3Set A1=(1,0,0),A2= −123,−√3,√3 ,and A4= −126))⊂B(x,2r),23r(2.6)E∩B(x,r)⊂f(Z∩B(0,Thus f is a local homeomorphism of the ambient space which sends a minimal cone into theǫ-minimal set.Theorem2.1is an immediate consequence of Theorem1.1.Also,if we can takeεto tend to0in(2.3)(uniformly in x∈E∩K)when r tends to0,then(2.4)holds with constants that tend to0when r tends to0.Let us now try to generalize Theorem1.1.We want to extend the notion of sets of type 1,2,or3.Let usfix integers d≥2(the dimension of our sets)and n≥d+1(the ambient dimension).Sets of type G1are just d-planes in R n.A generalized propeller in R2will be any union P of three co-planar half lines with the same origin,and that meet with angles larger thanπ/2at this point.Incidentally,π/2was chosen a little at random here,and we shall not try to see whether it can be replaced by smaller angles.A set of type G2is a set Y=R(P×R d−1),where P is a generalized propeller in R2 and R is an isometry of R n.When n>d+1,we abused notation slightly,we should have written Y=R(P×R d−1×{0})to account for the last n−d−1coordinates.The spine of R(P×R d−1×{0})is R({0}×R d−1×{0}).A two-dimensional set of type G3in R m is a set T that we can construct as follows.We pick distinct points A j,1≤j≤k,in the unit sphere of R m.Then we choose a graph,as follows.We pick G⊂{1,···,k}2which is symmetric(i.e.,(i,j)∈G when(j,i)∈G),does not meet the diagonal,and is such that for each i∈{1,···,k},there are exactly three j=i such that(i,j)∈G.We callΓthe union of the segments[A i,A j],(i,j)∈G,and then set T={tγ;t≥0andγ∈Γ}(T is the cone overΓ).Equivalently,for each(i,j)∈G,we set F i,j={tz;t≥0and z∈[A i,A j]},and T is the union of the3k/2faces F i,j.In particular, k is even.We add a few regularity constraints.First,the interior of a face F i,j never meets another face,so the faces only meet by sets of three along half lines L j=[0,A j).For each j, exactly three faces F i,j meet L j,and we require that they make angles larger thanπ/2along L j.Thus the union of three half planes that contains these three faces is a two-dimensional set of type G2.We also require that(2.7)|A i−A j|≥τ0for i=jand(2.8)dist(F i,j∩∂B(0,1),F i′,j′∩∂B(0,1))≥τ0when F i,j∩F i′,j′=∅,for some small positive constantτ0that we pick in advance.Another way to state the constraints is to considerΓ′=T∩∂B(0,1):we wantΓ′to be composed of not too small arcs of circles that only meet at their ends,by sets of three,and with angles greater thanπ/2.In addition,we only allow two arcs to be very close when they share at least an end.Let us also allow the case when more than one arc may connect a given pair of points A j(this would imply changing our notation a tiny bit,and taking a graph G that is not6necessarily contained in{1,···,k}2),but demand that k≥4(to avoid the case of a set of type G2).A set of type G3(of dimension d in R n)is a set Z=R(T×R d−2),where R is an isometry of R n and T is a two-dimensional set of type G3in R n−d+2.The spine of Z is the union L of the half lines L i when Z=T is two-dimensional,and R(L×R d−2)when Z=R(T×R d−2).Denote by T Gi the collection of sets of of type Gi(of dimension d in R n),and by T G the union of the three classes T Gi.We claim that in Theorem1.1,we can replace 
the classof minimal sets with the class T G,as follows.Theorem2.2Let n and d be as above.Let E⊂R n be a compact set that contains the origin,and suppose that for each x∈E∩B(0,2)and r>0such that B(x,r)⊂B(0,2), we canfind Z(x,r)∈T G that contains x,such that D x,r(E,Z(x,r))≤ε.Ifε>0is smallenough,depending only on n,d,andτ0,there is a set Z∈T G such that(1.3)–(1.6)hold. Here againαdepends only onε,n,d,andτ0,and we can takeα=C(n,d,τ0)εforεsmall.The reader may be surprised that we do not require any coherence between the various sets Z(x,r),but this coherence will follow automatically from the fact that D x,r(E,Z(x,r))≤εfor many x and r.For instance,if d=2and Z(0,2)is of type G3,with a center at the origin,our proof will show that for r small,there is a point x r∈E,close to0,such that Z(x r,r)is of type G3,and the number k of half-lines in the spine of Z(x r,r)is the same asfor Z(0,2).The point is that angles between the faces may vary little by little when x and r vary,but things like the type of Z(x,r)and the number k cannot jump randomly.There is an analogue of Theorem2.1in the context of Theorem2.2,which the readermay easily state.The proof of Theorem2.2will almost be the same as for Theorem1.1;what we shall do is proceed with the proof of Theorem1.1,and indicate from time to time which modifications are required for Theorem2.2.3Simple geometrical factsWe shall record in this section a few geometrical properties of the minimal cones that will be used in the construction.The statements will come with various numbers,like1/3or 1/4in the next lemma;these numbers are hopefully correct,but their precise values do not matter,in the sense that it is easy to adapt the value of later constants to make the proof work anyway.In particular,this is what would happen with the proof of Theorem2.2.This means that the reader may skip the proofs if she believes that the results of this section are true with different constants.The geometrical facts below will be used extensively in Section 4,to establish the hierarchical structure of E.Ourfirst lemma says that the Hausdorffdistances between sets of type1,2,and3,are not too small.7Lemma3.1Let Z be a minimal cone of type3centered at x.Then(3.1)D x,r(Z,Y)≥1/3for r>0whenever Y is a set of type1or2.Similarly,(3.2)D x,r(Y,P)>1/3for r>0when Y is a set of type2centered at x and P is a plane,and(3.3)D x,r(Y,P)>1/4for r>0when Y is a set of type2or3whose spine contains x and P is a plane.Proof.We start with the proof of(3.1).By translation,dilation,and rotation invariance,we may assume that x=0,r=1,and Z is the set T0of Definition2.3.It will be good to know that if P is any plane through the origin andπdenotes the orthogonal projection ontoP,then(3.4)B(0,1/3)∩P⊂π(T0∩B(0,1)).Let Q denote the solid tetrahedron with vertices A j,1≤j≤4,(i.e.,the convex hull of the A j),where the A j are as in Definition2.3.It is clear that B(0,1/3)lies on the right ofthe left face of Q(where thefirst coordinate is−1/3).By symmetry,B(0,1/3)lies in Q, because it lies on the right side of each face.Also observe that T0∩Q separates the(open) faces of Q from each other inside Q.For instance,the component in Q\T0of the lower faceis a pyramidal domain bounded by three faces of T0.Let z∈B(0,1/3)∩P be given,and letℓdenote the line through z perpendicular to P.Thenℓmeets Q because B(0,1/3)⊂Q.First suppose that it does not touch any edge of Q.Then it enters Q through one(open)face and leaves it through another one,so it meets T0∩Q.Ifℓtouches a edge of Q,it meets T0∩Q trivially(because the edges are 
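For reference in the distance estimates that follow: the display defining D_{x,r} in (1.2) did not survive extraction above. The standard normalized local Hausdorff distance used in this line of work, which is presumably what (1.2) states, is the following (a sketch, to be checked against the original):

\[
D_{x,r}(E,F) \;=\; \frac{1}{r}\,\max\Big\{ \sup_{y \in E \cap B(x,r)} \operatorname{dist}(y,F),\ \sup_{y \in F \cap B(x,r)} \operatorname{dist}(y,E) \Big\}.
\]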
contained in T0).Thusℓmeets T0∩B(0,1)(except perhaps ifℓcontains one of the A j),and hence z∈π(T0∩B(0,1)).The remaining case whenℓcontains some A j is easily obtained by density(and anyway we don’t need it);(3.4)follows.Return to(3.1).If Y is of type2,choose P orthogonal to the spine of Y;thenπ(Y) is a propeller(a set like Prop in(2.2)),and we canfind z∈∂B(0,1/3)∩P such that dist(z,π(Y))≥1/3.Set z n=(1−2−n)z for n>1.By(3.4)we canfind y n∈T0∩B(0,1) such thatπ(y n)=z n.Then dist(y n,Y)=dist(z n,π(Y))≥1/3−2−n,and(3.1)follows from the definition(1.2)The case when Y is a plane is even easier;we choose P perpendicular to Y,so thatπ(Y)is a line.So(3.1)holds in all cases.Let us now prove(3.2).We may assume that x=0and Y is the set Y0=Prop×R in(2.2).Let P be a plane,suppose that D x,r(Y,P)≤1/3,and let usfind a contradiction. First we want to show that P is almost horizontal.Call P0the horizontal plane R2×{0}, and denote by b1,b2,and b3the three points of∂B(0,9)∩ Prop×{0} .108Each b j lies in Y∩B(0,1),so we canfind b′j in P∩B(b j,1/3).Call d the smallest possible distance from a p j to the line through the other two p l.Then d corresponds to the case when,for instance,p j lies at the extreme right of the left-most disk D j and the two other p l lie at the extreme left of the D l.We get that d=92−2·160.Recall that P goes through the b′j.The fact that d>0(or that the p j are not aligned) already implies that P is a graph over P0.We want to show that it is a1-Lipschitz graph, or equivalently that(3.5)|v3|≤|π(v)|when v is a unit vector in the(direction of)the plane P,and v3denotes its last coordinate. Since the triangle with vertices b′j is nondegenerate,we can write v=α(b′j−c j),where α∈R\{0},1≤j≤3,and c j lies in the opposite side[b′k,b′l].Since c j is a convex combination of b′k and b′l,the size of its last coordinate is at most13.On the other hand,|π(b′j−c j)|≥d=413<4110)and b5=(0,0,95−215, while|π(v)|≤21034=173·2054|π(v)|instead of(3.5).Now we take b4=0and b5=(0,0,99100−1100,and |π(v)|≤1100|π(v)|>40Notice that even for sets of type T Gi,Lemma3.1is very easy to prove by compactness, when we allow very small constants instead of1/3and1/4.The next lemmas will make it easier to apply Lemma3.1.Lemma3.2Let T be a minimal cone of type3,and let B be a ball such that54times the radius)does not contain the center of T.Then thereis a minimal cone Y of type1or2such that Y∩B=T∩B.Proof.Again this would be very easy,even for sets of type T Gi,if we allowed a verylarge constant instead of54replaced with the betterconstantα−1,whereα=(2/3)1/2(observe that5Let P denote the plane through0,A2,and A3,and set x t=(−t,0,0)for t>0;we claim√3z=0(just check that0,that dist(x t,P)=αt.Indeed,an equation of P is2√12=αt,as announced. 
A2,and A3satisfy the equation).Then dist(x t,P)=(2Let x=0be given,and set B x=B(x,α|x|).We just checked that when x=x t,B x does not meet the three faces on the left of T0.For a general x,B x cannot meet the three faces at the same time,because the optimal position for this to happen would precisely be when x lies on the negativefirst axis.By symmetry,B x never meets the three faces of T that bound a given connected component of R3\T.So the worse that it can do is meet the three faces of T that share a single edge[0,A j)(if B x meets two opposite faces,it also meets the three faces that bound some connected component).Supposefirst that B x meets three such faces,and let Y denote the set of type2that contains these faces.We claim that(3.6)T∩B x=Y∩B x.The direct inclusion is clear,because B x only meets the three faces of T that are contained in Y.For the converse inclusion,we just need to check that F∩B x⊂F′∩B x when F is a face of Y and F′is the face of T that is contained in F.Let y∈F∩B x be given.By assumption,B x meets F′,so we can pick z∈F′∩B x.Since B x is open,we can even pick z in the interior of F′.Notice that the segment[y,z]lies in F∩B x by convexity.In addition, (y,z)does not meet the spine of Y(because z lies in the interior of F).Now B x does not meet the other edges of T(those that are not contained in the spine of Y),because if it meets an edge e,it also meets the three faces of T that touch e.So(y,z)does not meet the boundary of F′,and y∈F′.Our claim follows.Next suppose that B x only meets two faces of T.Still denote by Y the set of type2 that contains them.As before,T∩B x⊂Y∩B x trivially.To prove the converse inclusion, consider a face F of Y,andfirst assume that the face F′of T that it contains meets B x. Pick z∈F′∩B x.For each y∈F∩B x,the segment[x,y]is contained in B x,hence it does not meet the boundary of F′(because if B x meets any edge of T,it meets at least three faces),so it is contained in F′,and y∈F′.We just proved that F∩B x⊂F′∩B x.Call F1,F2,and F3the three face of Y,and denote by F′j the face of T which is contained in F j.Exactly two of the F′j meet B x,and the third one does not;let us assume that this last one is F′3.We just need to check that F3does not meet B x either.Suppose it does.Let y j be a point of F j∩B x,j∈{1,2,3}.Call y′j the orthogonal projection of y j onto the plane P through the center of B x and perpendicular to the spine L of Y;then y′j∈F j∩B x as well. Call p the point of L∩P;by convexity of B x and because p is a convex combination of the three y′j,B x contains p.We reached the desired contradiction,because p lies on an edge of P and B x does not meet edges in the present case.So we proved(3.6)in this second case.Finally assume that B x only meets one face F′of T.Then it does not meet any edge of T,so if Y denotes the plane that contains F′,T∩B x contains Y∩B x(as before,B x does not meet the boundary of F′in Y).In this case also we have(3.6).We just proved that for x=0,we canfind a set Y of type1or2such that(3.6)holds. 
Now let B be as in the statement,and call x its center and r its radius.We know that10B(x,5rLemma3.3Let Z be a minimal cone of type3centered at z,and let T be a minimal cone.If D x,r(T,Z)<1/3,then T is of type3and its center lies in B(z,5r/3).Proof.Let Z and T be as in the statement.If T is of type1or2,we can apply Lemma3.1 directly to get a contradiction.So T is of type3,and let t denote its center.By Lemma3.2, we canfind a set Y of type1or2that coincides with T in B=B(z,4r),Y coincides with T in B(z,43Lemma3.4Let Z be a set of type2or3.Suppose that the ball B(x,r)meets at least two faces of Z,or that Z does not coincide with any plane in B(x,r).Then the distance from x to the spine of Z is at most6r/5.Proof.For sets of type G2or G3(and with6/5replaced with a large constant),this is a simple consequence of(2.8).We now return to the standard case.Let L denote the spine of Z.We can assume that B=B(x,r)does not meet L.If Z does not coincide with any plane in B,it meets at least one face F;call P the plane that contains F.Since B does not meet L,it does not meet the boundary of F in P,so P∩B⊂Z.Then B meets some otherface of F′of Z.Call P′the plane that contains F′,and set D=P∩P′and d=dist(x,D).√Then r≥d cos(30◦)=d3/2≥5/6.So we can assume that Z is of type3.If B(x,d)meets L,then dist(x,L)≤d≤6r/5as needed. Otherwise,P∩B(x,d)⊂F,because B(x,d)meets F and does not meet the boundary of F in P.But P∩B(x,d)goes all the way to the point of D that minimizes the distance to x,so this point lies in L and dist(x,L)≤d.Lemma3.4follows.and(4.3)c(x,r)=inf D x,r(E,T);T is a set of type3centered at x .It is not always true that either a(x,r),b(x,r),or c(x,r)is small(when x∈E and B(x,r)⊂B(0,2)),because,for instance Z(x,r)may be a minimal cone of type3centered anywhere in B(x,r).The pairs where either a(x,r),b(x,r),or c(x,r)is small are interesting to study. Definition4.1Let x∈E∩B(0,2)be given.We say that:•x is of type1when b(x,r)≤1500εfor all r small,•x is of type32as it is,and the value of a8=7follows from the geometry.Then a9=10−3just needsto be small,depending on a5and a6.Also,a10=11just needs to be a little larger than a7. 
The constant a11=15in Corollary4.1just needs to be larger than2a8.The constants a12=10−3,a13=10−3,and a14=132in Lemma4.3are just geometric constants that come from Lemma3.3.Ourfirst constraint on a3comes from Lemma4.3and is that a3≥a14.No constraint comes from Lemma4.4;the constant in(4.12)is just the same as a11in Corollary4.1.In Lemma4.5,we just need a15=25needs to be large enough(it needs to be larger than 10in our proof,because we want the pair(z1,r/10)to satisfy the assumptions in the lemma, a little above(4.19));other constraints will come later.The value of a16=600follows by geometric computations,and so does a17=150in(4.15).We need to pick a3≥a17in Definition4.1.Thefirst constraint on a2in Definition4.1is that a2be larger than the constant in(4.20).12The constant a18=17in Lemma4.6is just a geometric constant.In(4.24)and the few lines above,10−3needs to be replaced with a9from Lemma4.2.A few lines later,we only get that b(x,r)≤a19ε,instead of500ε;this gives a new constraint on a2,namely that a2≥a19(so that we can deduce that x∈E2).The rest of the proof of(4.23)goes smoothly.In Lemma4.7,a20=24can be replaced with2|z−e|≤εr.Obviously,e∈B(x,r),so we canfind p∈P0such that|p−e|≤10−3r;then |p−z|≤εr+10−3r<ρ/3.Similarly,if p∈P0∩B(y,ρ)we canfind e∈E such that |p−e|≤10−3r.Then e∈B(x,r),so we canfind z∈Z such that|z−e|≤εr;altogether |p−z|≤εr+10−3r<ρ/3,and D y,ρ(Z,P0)<1/3.This is impossible,by Lemma3.1.Next suppose that Z is a set of type3,whose center z∈B(x,99r/100),but whose spine meets B(x,98r/100)at some point y.As before,D y,ρ(Z,P0)<1/3forρ=r/200.Since |z−y|≥r/100,Lemma3.2says that Z coincides with a set Y of type2in B(y,4r/500)= B(y,8ρ/5).Then D y,ρ(Y,P0)<1/3too.But the spine of Y goes through y,so Lemma3.1 says that this is impossible.The same argument excludes the case when Z is a set of type 2whose spine meets B(x,98r/100).We are left with the case when Z is of type1,or else its spine does not meet B(x,98r/100). Let F denote the face of Z that contains x,and P the plane that contains F.The boundary of F is contained in the spine of Z,so it does not meet B(x,98r/100);hence(4.6)F∩B(x,98r/100)=P∩B(x,98r/100).Every point of P∩B(x,98r/100)lies in Z by(4.6),so it isεr-close to E,and then(ε+10−3)r-close to P0.This forces every point of P0∩B(x,r)to be2·10−3r-close to P.This stays true (with a constant larger than2)when we deal with sets of type Gi and approximations with d-planes in R n.If(4.5)fails,(4.6)says that there is another face F1of Z that meets B(x,97r/100)at some point y.We know that y isεr-close to E,hence(ε+10−3)r-close to P0.Hence, dist(y,P)≤(ε+3·10−3)r.[In all these estimates,we use the fact that y∈B(x,99r/100), so the successive points that we implicitly use never lie out of B(x,r).]On the other hand,y lies in some other face of Z,and the angles between faces of Z are not too small,so the distance from y to the spine of Z is less than10−2r.For sets of type Gi,we deduce this from(2.8).This is impossible,because the spine of Z does not meet B(x,98r/100).So(4.5)holds.We now return to the lemma,and let y∈B(x,2r/3)be as in the statement.Let us first estimate a(y,t)for t∈(r/10,4r/10).First observe that B(y,t)⊂B(x,967r/1000). By(1.1)and(4.5),we canfind q y∈P such that|y−q y|≤εr.Set P′=P+(y−q y). 
We can use P′to compute a(y,t),because it goes through y,so a(y,t)≤D y,t(E,P′).If e∈E∩B(y,t),dist(e,P′)≤dist(e,P)+|y−q y|≤dist(e,P)+εr≤2εr,by(1.1) and(4.5).If p′∈P′∩B(y,t),p=p′−(y−q y)lies in Z∩B(x,97r/100)by(4.5),and dist(p′,E)≤dist(p,E)+εr≤2εr by(1.1).So a(y,t)≤D y,t(E,P′)≤2εr/t≤20ε.The pair(y,t)satisfies the hypothesis of the lemma,namely,a(y,t)≤10−3,so we can iterate the previous argument(this time,keeping y at the center).This yields that a(y,s)≤20εfor r/100≤s≤4r/10,and(after many iterations of the argument)for every s<4r/10.Finally,for each s<4r/10there is a set Z(y,s)as in(1.1),and since a(y,s)≤20εthe proof of(4.5)shows that Z(y,s)coincides with a plane P on B(y,97s/100).We can use P in the definition of a(y,96s/100),and we get that a(y,96s/100)≤100ε/96≤2ε.Since this holds for s<4r/10,Lemma4.1follows.。
a-RST: a generalization of rough set theory

Mohamed Quafafou
IRIN, University of Nantes, 2 rue de la Houssiniere, BP 92208-44322, Nantes Cedex 03, France

Information Sciences 124 (2000) 301-316
Received 16 September 1997; received in revised form 13 July 1999; accepted 4 September 1999

Abstract
The paper presents a transition from the crisp rough set theory to a fuzzy one, called Alpha Rough Set Theory or, in short, a-RST. All basic concepts of rough set theory are extended, i.e., information system, indiscernibility, dependency, reduction, core, definability, approximations and boundary. The resulting theory takes into account fuzzy data and allows the approximation of fuzzy concepts. Besides, the control of knowledge granularity is natural in a-RST, which is based on a parameterized indiscernibility relation. a-RST is developed to recognize non-deterministic relationships using notions such as a-dependency, a-reduct and so forth. On the other hand, we introduce a notion of relative dependency as an alternative to the absolute definability presented in rough set theory. The extension a-RST leads naturally to the new concept of alpha rough sets, which represent sets with fuzzy non-empty boundaries. © 2000 Elsevier Science Inc. All rights reserved.

Keywords: Rough sets; Fuzzy sets; Attributes dependency; Concept approximation

1. Introduction

Rough sets are a suitable mathematical model of vague concepts, i.e., concepts without sharp boundaries. Rough set theory is emerging as a powerful theory dealing with imperfect data [1-3]. It is an expanding research area which stimulates explorations on both real-world applications and on the theory itself, i.e., decision analysis, machine learning, knowledge discovery, market research, conflict analysis, and so forth. Recent theoretical developments on this theory and its applications are collected in [4].

Rough set theory assumes that information systems contain only crisp data and that any feature (attribute) of any object (example) has a precise and unique value. However, real-world data are generally imprecise, tend to be noisy, contaminated by errors, and values for attributes are often missing. To overcome the problem of uncertain data, Slowinski et al. discuss in Ref. [5] four sources of uncertainty: the discretization process [11], imprecise data, missing values and multiple descriptors, and they propose a generalization of the classical indiscernibility relation to handle uncertainty caused by multiple descriptors.
On the other hand,Ziarko proposed an extension of rough sets,called variable precision model of rough set theory(VPRS),to take into account non-deter-ministic relationships and to identify strong rules[6,7].Other extensions to standard rough set theory have been suggested in Refs.[8±10].In this paper,we present a new extension of rough set theory,called a-RST, where all basic concepts of rough set theory,i.e.,information system,indis-cernibility relation,dependency,and so forth,are generalized.Fuzzy sets are used to discretize continue attributes but a-RST is developed to approximate fuzzy sets.On the other hand,the control of knowledge granularity is natural in a-RST because it is based on a parameterized indiscernibility relation.An-other feature of a-RST is the notion of a-dependency,i.e.,a set of attributes depends on another with a given degree in the range[0,1].This notion may be seen as a partial dependency between attributes.We have also introduced notions as a-reduction,a-definability and approximations of fuzzy sets that are also fuzzy.Finally,a-RST is to allow the control of the universe partitioning and the approximation of concepts.In this paper we present extensions of the basic concepts of rough set theory. Section2presents a generalized information systems.Section3introduces the aggregation operator and states the new description of an object of the uni-verse.A parameterized indiscernibility relation based on a similarity threshold is de®ned in Section4.The notion of a-core,using extensions of dependency and reduction concepts,is de®ned in Section5.Section6analyses the notion of de®nability and proposes a new de®nition of de®nability,called a-de®nability, which may be seen as a relative de®nability in contrast with the absolute de-®nability in rough set theory.Section7is dedicated to approximations of fuzzy concepts and their properties.At the end of this section,the new concept of Alpha rough sets immerges,i.e.,sets with a fuzzy non-empty boundary.In Section8,we discuss some important issues through related work and we conclude in Section9.2.A generalized information systemLet U be a®nite set of objects or examples,U f x1Y x2Y F F F Y x n g.These objects of the universe U are characterized by a®nite set of attributes,denotedas Q f q 1Y q 2Y F F F Y q m g .An attribute q i may be qualitative or quantitative.A qualitative attribute q i has a domain v qi which determines the set of possible values for attribute q i .For instance,the attribute Age may be seen as a qualitative attribute (nominal or categorical)which is de®ned by its discrete domain (a ®nite discrete set)as {young,adult,middle_age,old,too_old}.Equivalently,each value of Age is described by its membership function.The membership function of middle_age is depicted in Fig.1(a).The attribute Age may also be seen as a quantitative (cardinal)one,which is de®ned by its continuous domain (interval)as a Y b .A preprocessing discretization phase is generally necessary for a numerical attribute to divide its domain into intervals which correspond to qualitative terms.This translation process may be per-formed automatically [discretization]or by an expert [12].Fig.1(b)shows a membership function which represents the value middle_age of attribute Age.Modeling a value of attribute using membership depicted in Fig.1(a)and (b)assumes that concepts are crisp,i.e.,the age of a person is middle_age or is not.This approach is based on 2-valued logic ±black or white.Consider next that middle_age is de®ned by fuzzy property ``around 
45years''.Qualitative attri-butes may be interpreted as linguistic variables,according to the fuzzy set theory [13,14].In this case,there is not a unique membership function for middle_age and the expert has to determine it according to the nature of the application domain.Fig.1(c)shows the membership function whichrepresentFig.1.Possible membership functions to represent midle_age.M.Quafafou /Information Sciences 124(2000)301±316303the value middle_age;its values measure degrees to which examples satisfy the imprecisely de®ned property around 45yr.Each attribute value is represented by a membership function,which de®nes a crisp or a fuzzy set.The set V q is a domain of the attribute q ,V q f v q 1Y v q 2Y F F F Y v qr g and V is equal to q P Q V q .We note as M qi the set of membership functions associated to the value v qi of attributes qi ,M q f l q 1Y l q 2Y F F F Y l qr g and M q P Q M q .So far,we have seen that each value of any attribute is represented with a membership function.This modeling preprocess may in¯uence the results of systems using rough set theory [15,16].We de®ne the notion of generalized information system in order to take into account uncertainty inherent to both the data and the preprocessing.We de®ne the generalized information function ~q q Âd ,where:·q is called nominal information function q :U ÂQ 3V Y x Y q 3v qi Y ·d is called cardinal information function d :U ÂQ 3 0Y 1 Y x Y q 3l qi q x XEach object x of the universe U is described by a vector of pairs q x Y q ,d x Y q for all q in Q .Thus,q x Y q represents the nominal (linguistic)value of the attribute q for the object x ,whereas d x Y q is the cardinal value of q ,i.e.,degree of possibility that the attribute q has value q x Y q for the object x .A generalized information system represents a function,called a alpha informa-tion function ,denoted as ~q,which maps U ÂQ into V  0Y 1 .~q X U ÂQ 3V  0Y 1 Y x Y q 3 q x Y q Y d x Y q XConsequently,we de®ne a generalized information system as a classical in-formation system with nominal and cardinal information functions h U Y V Y Q Y ~qi .3.Aggregation operator and fuzzy subsetsWe introduce an aggregation operator ,denoted /R ,which associates to each object x a degree of possibility of its realization according to a set of attributes R .This degree results from the aggregation of d x Y q for all attributes R .In what follows we will de®ne /R as Yager's parameterized t -norm,with 13I :/R U 3 0Y 1 Y x 3min f d x Y q V q P R g XConsequently,we associate to a given subset of attributes R a region of the universe U ,denoted ~UR or,~U which is de®ned as the graph of the function /R ,304M.Quafafou /Information Sciences 124(2000)301±316i.e.,~U graph /R f x Y /R x X x P U g .~U is a fuzzy subset of the universe U ,indeed,any object x of U belongs to ~U to some degree / x P 0Y 1 and the membership function is not black or white.In what follows,members of ~U will be noted ~x ,with ~x x Y /R x .Proposition 1.vet T Y i q i su set of D so graph /R I i graph /qi .The previous proposition holds because V x P U R Y /R x 6/qi x for all q i P T and /R x min q i P R /qi x .In this context,the description of an object x P U in terms of values of attributes from R Q ,denoted a -Des R x becomesa -Des R x f q Y m Y p X ~qx Y q m Y p V q P R g X Let be a subset of attributes containing k attributes denoted as q 1Y q 2Y F F F Y q k .The value of an attribute q i in a generalized information system is a couple m i Y p i ,where m i is a nominal value,i.e.,m i q x Y q i ,and p i is the 
degree of possibility for x to have value m_i for attribute q_i, i.e., p_i = δ(x, q_i). Consequently, an object x is characterized by two vectors, denoted m_x and p_x. The first one, m_x = (m_1, m_2, ..., m_k), contains nominal values, whereas the second one, p_x = (p_1, p_2, ..., p_k), contains possibility degrees and is a vector in I^k = [0,1]^k. The first vector is the classical description of objects in rough set theory, whereas the second vector is a point in the hypercube I^k, which is the set of all fuzzy subsets. Indeed, a fuzzy set is any point in the cube I^k [17]. The second vector is a fuzzy set.

In the classical rough set theory, an object or example is represented by a vector in a multi-dimensional feature space. This description is known in machine learning as the attribute-value description. The indiscernibility relation in rough set theory is defined on the basis of the equality of attribute values. In contrast, an example is described in a-RST by two vectors resulting from both the cardinal and the nominal functions. In the next section, we generalize the indiscernibility notion to take into account both the symbolic values and the possibility degrees.

4. Generalization of indiscernibility relation

The basic operations in rough set theory are approximations, which are defined according to the indiscernibility notion. In fact, objects of the universe may be grouped according to the values of a given set of attributes R. Each group contains objects with the same values of the attributes in R. This means that the information given by the attributes in R is not sufficient to distinguish objects in the same group. We say that objects of the same group are indiscernible. This equivalence relation, denoted IND(R), is defined on the equality of values of attributes in R. The quotient set U/IND(R) contains equivalence classes which are granules of knowledge representation. These groups, called R-elementary sets, form the basic granules of knowledge about the universe and they are used to compute approximations of concepts.

The previous indiscernibility definition is not sufficient when we consider a generalized information system because it does not take into account the possibility degrees associated with the values of the attributes in R. In order to cope with this problem, we introduce a parameterized relation, denoted IND(R, α) or, in short, R_α. We consider that two elements are indiscernible if and only if they have the same values for all attributes and if their possibility degree is greater than a given similarity threshold, denoted α:

∀x, y ∈ U: x R_α y ⟺ x R̃ y and μ_S̃(p_x, p_y) ≥ α.

The relation R_α is defined on both an equivalence relation R̃ and a similarity relation S̃. The relation R̃ is similar to the classical indiscernibility relation R; its definition is based on the equality of attribute values (i.e., x R̃ y ⟺ v_x = v_y).
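Before the similarity relation S̃ is pinned down axiomatically below, here is a small, self-contained Python sketch of the objects defined so far: a toy generalized information system storing the pairs (ρ(x,q), δ(x,q)), the two description vectors m_x and p_x, the aggregation operator φ_R, and the pairwise test for R_α. The table, the attribute names and the concrete μ_S are assumptions made for illustration only; the paper leaves S̃ abstract.

```python
import numpy as np

# Toy generalized information system: each cell is (nominal value, possibility degree).
table = {
    "x1": {"Age": ("middle_age", 0.9), "Income": ("high", 0.7)},
    "x2": {"Age": ("middle_age", 0.8), "Income": ("high", 0.6)},
    "x3": {"Age": ("middle_age", 0.3), "Income": ("high", 0.7)},
    "x4": {"Age": ("old",        1.0), "Income": ("low",  0.9)},
}

def m_vec(x, R):
    """Nominal description m_x of object x over the attributes in R."""
    return tuple(table[x][q][0] for q in R)

def p_vec(x, R):
    """Possibility-degree description p_x, a point of the hypercube [0,1]^k."""
    return np.array([table[x][q][1] for q in R])

def phi(x, R):
    """Aggregation operator phi_R(x) = min over q in R of delta(x, q)."""
    return float(p_vec(x, R).min())

def mu_S(px, py):
    """Placeholder similarity on possibility vectors (an assumption); the paper
    only requires S~ to be reflexive, symmetric and max-*-transitive, which this
    simple choice does not guarantee."""
    return float(np.min(1.0 - np.abs(px - py)))

def alpha_indiscernible(x, y, R, alpha):
    """x R_alpha y  iff  same nominal values on R and mu_S(p_x, p_y) >= alpha."""
    return m_vec(x, R) == m_vec(y, R) and mu_S(p_vec(x, R), p_vec(y, R)) >= alpha

R = ["Age", "Income"]
print(phi("x1", R))                                   # 0.7
print(alpha_indiscernible("x1", "x2", R, alpha=0.8))  # True
print(alpha_indiscernible("x1", "x3", R, alpha=0.8))  # False: possibility degrees differ too much
```

With a μ_S̃ that genuinely satisfies the max-* transitivity required below, thresholding at α yields an equivalence relation (this is Proposition 2), and the R_α-classes of U can then be read off directly; the simple μ_S above is only a stand-in for illustration.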
On the other hand, the relation S̃ must verify the three following properties [18]:
1. Reflexivity: ∀x ∈ U, μ_S̃(p_x, p_x) = 1,
2. Symmetry: ∀x, y ∈ U, μ_S̃(p_x, p_y) = μ_S̃(p_y, p_x),
3. Max-* transitivity: ∀x, y, z ∈ U, μ_S̃(p_x, p_z) ≥ μ_S̃(p_x, p_y) * μ_S̃(p_y, p_z),
where * is a binary operation defined on [0,1] such that 0*0 = 1*0 = 0*1 = 0 and 1*1 = 1. The operator * is usually chosen such that ∀a, b ∈ [0,1], a*b ≤ min(a, b). The operator * is assumed to be "min" in what follows. The pioneering works on * are described in Refs. [19-21]; see also [22] for more information on similarity relations.

Proposition 2. If R̃ is an equivalence relation, S̃ is a similarity relation and the operator * is the "min", then R_α is an equivalence relation.

Proof. Let R̃ be an equivalence relation and S̃ be a similarity relation.
1. Reflexivity: x R̃ x and μ_S̃(p_x, p_x) = 1 ≥ α.
2. Symmetry: x R_α y ⟹ μ_S̃(p_x, p_y) ≥ α, so μ_S̃(p_y, p_x) ≥ α ⟹ y R_α x.
3. Transitivity: x R_α y and y R_α z ⟹ μ_S̃(p_x, p_y) ≥ α and μ_S̃(p_y, p_z) ≥ α; hence, μ_S̃(p_x, p_z) ≥ min(μ_S̃(p_x, p_y), μ_S̃(p_y, p_z)) ≥ α.

An equivalence class of U/R_α determined by an element x ∈ U is denoted [x]_α^R. These classes result from the partition of the whole universe. So far, we have associated a fuzzy subset Ũ_R = {(x, φ_R(x)): x ∈ U} with each subset of attributes. The results of the partition of this subset are fuzzy subsets (or classes):

∀x̃, ỹ ∈ Ũ_R: x̃ R_α ỹ ⟺ x̃ R̃ ỹ and μ_S̃(p_x̃, p_ỹ) ≥ α.

The family of all equivalence classes of the relation R_α on Ũ_R is denoted by Ũ/R_α. The following proposition states the relation between U/R_α and Ũ/R_α.

The indiscernibility relation defined in rough set theory is black or white, i.e., two elements are indiscernible or they are not. The alpha indiscernibility, denoted a-IND, generalizes the notion of indiscernibility by computing an indiscernibility degree associated with any two objects. The degree of indiscernibility of two objects x̃ and ỹ, denoted a-IND(x̃, ỹ), corresponds to the degree of non-similarity of the two objects, i.e., a-IND(x̃, ỹ) = 1 − μ_S̃(x̃, ỹ). So, elements are more indiscernible when they are more similar.

5. Dependency and reduction

In rough set theory, we say that a set of attributes R depends on a set of attributes P, denoted P → R, iff all elementary sets of the indiscernibility relation associated with P are subsets of some elementary sets defined by R. A set of attributes may be reduced in such a way that the resulting set of attributes, which is smaller, provides the same quality of classification as the original set of attributes. This means that the elementary sets generated by the reduced set A are identical to those generated by the original set of attributes P, i.e., IND(A) = IND(P). The smallest reduced set of P is called a reduct. In the case where an information system has more than one reduct, the intersection of these reducts is computed. It is called the core and represents the most significant attributes in the system. The following proposition states the relation between the dependency and reduction notions.

Proposition 3. Let P and R be two subsets of attributes, R ⊆ P ⊆ Q.
is redu t of iff P3R nd R3P .We extend this notion of dependency to consider generalized information systems.Thus,the dependency relation is not a black or white relation.We say that a set of attributes depends on a set i each elementary set of in-discernibility relation associated with has a non-empty intersection with at least an elementary set de®ned by ,and the inclusion degree of in is greater than a dependency parameter,noted as b.We call this property elph dependen y of attributes.De®nition1(a-Dependency).Let P and R be two subsets of attributes, R P Q and a P 0Y1 .R alpha-depends on P i W b P 0Y1 such that308M.Quafafou/Information Sciences124(2000)301±316P3b R@A V B P U a IND P Y a Y W b H P U a IND R Y adegree B b H P b XThe previous de®nition introduces the notion of lph dependen y which can be seen as a partial dependency between attributes.Consequently,the values of attributes in are partially determined by values of attributes in .We say that partially explains and there is only a partial functional dependency be-tween values of and .Proposition4.vet nd e two su sets of ttri utes,ab P HD I su h th t a6b,we h ve P3b R A P3a R .Now,we introduce the notion of a-reduct by generalizing Proposition4.De®nition2(a-Reduct).Let P and R be two subsets of attributes,such that R P Q.R is an alpha-reduct of P,i.e.,R a-reduct( ),i W b P 0Y1 such that(i)P3b R,R3b P and(ii)R is minimal.is minimal means that there is no subset of attributes T P such that T3a H R and a H P a.As a generalized information system may have more than one Alpha reduct,we generalize the notion of the core of an attribute by in-troducing the notion of Alpha core.The a-core is then de®ned as the inter-section of all a-reducts.In this section,we have generalized two key concepts of the rough set the-ory,which are dependency and reduction.The notion of the core is then generalized based on the de®nition of alpha reduct.These concepts are gen-erally used in the rough set analysis to construct minimal subsets of attributes (reducts)which have the same indiscernibility power as the whole set of at-tributes.This analysis leads to the construction of deterministic reductions and a deterministic core,i.e.,the decision depends on reduction and the core is an essential part of the whole set of attributes.Alpha rough set allows the user to explore the information using di erent thresholds related to the reduction or the core.This analysis may lead to the construction of strong reduction and cores,which are only consistent with a part of the data in the information system and we may regard the remaining inconsistency information as noisy data.6.On the de®nibility of setsThe notion of de®nability is important because it is at the basis of the rough set de®nition.Indeed,Pawlak de®nes a rough set as a set with a non-empty boundary region,which means that its lower approximation is not equal to itsupper approximation [1].Hence,a subset X of the universe U is de®nable if and only if RX "RX ,otherwise X is unde®nable.This de®nition of de®nability is suitable when all data in the information system are crisp.We have extended it to consider imprecision inherent to real world data,so we can say that a set,or more precisely a fuzzy set,X is R -de®nable with a given degree a .For instance,X can be R -de®nable (a 1),strongly R -de®nable (e.g.,a 0X 9),slightly R -de®nable (e.g.,a 0X 25),and so forth.We call this property the a -de®nability of a subset,and the degree of de®nability of a set X will be denoted as a -def(X ).As a rough set is 
6. On the definability of sets

The notion of definability is important because it is at the basis of the rough set definition. Indeed, Pawlak defines a rough set as a set with a non-empty boundary region, which means that its lower approximation is not equal to its upper approximation [1]. Hence, a subset X of the universe U is definable if and only if $\underline{R}X = \overline{R}X$; otherwise X is undefinable. This definition of definability is suitable when all data in the information system are crisp. We have extended it to take into account the imprecision inherent in real-world data, so that we can say that a set, or more precisely a fuzzy set, X is R-definable with a given degree a. For instance, X can be R-definable ($a = 1$), strongly R-definable (e.g., $a = 0.9$), slightly R-definable (e.g., $a = 0.25$), and so forth. We call this property the a-definability of a subset, and the degree of definability of a set X is denoted a-def(X). As a rough set is characterized by its lower and upper approximations, it is natural to consider that the degree of definability of a set depends on its lower and upper approximations. The alpha definability notion is formalized as follows:

a-def(X) $= f(\underline{R}_a X, \overline{R}_a X)$, where
$f(\underline{R}_a X, \overline{R}_a X) = 0$ if $\underline{R}_a X = \emptyset$,
$f(\underline{R}_a X, \overline{R}_a X) = 1$ if $\underline{R}_a X = \overline{R}_a X$,
$f(\underline{R}_a X, \overline{R}_a X) \in (0, 1)$ otherwise.

Any function respecting the previous constraints may be used to compute definability degrees of sets. In what follows we will consider

$f(\underline{R}X, \overline{R}X) = \dfrac{(1 + h)\,h}{2}$, with $h = \mathrm{degree}(\overline{R}X, \underline{R}X)$.

If a set X is definable, this means that $\underline{R}X = \overline{R}X$, so a-def(X) = 1. In the contrary case, i.e., $\underline{R}X \neq \overline{R}X$, the definability degree is less than one: a-def(X) < 1. Different methods may be used to compute the degree of inclusion of one set in another. For instance, let M(X) be the cardinality of the fuzzy set X, defined as

$M(X) = \sum_{e_i \in X} \mu_X(e_i)$.

The degree to which X is a subset of Y is then

$\mathrm{degree}(X, Y) = \dfrac{M(X \cap Y)}{M(X)}$.

Obviously, any subset X which is the union of elementary sets is 1-definable, i.e., its definability degree is equal to one. Such sets are called definable sets. When a set is not definable we can compute its definability degree, and it is said to be undefinable if the computed degree is equal to zero. On the other hand, we define the notion of roughness of a set as a notion symmetric to definability. Indeed, the roughness of a set X which is totally definable, i.e., a-def(X) = 1, is equal to zero. Conversely, this roughness degree is equal to 1 when X is totally undefinable, i.e., a-def(X) = 0. The roughness of a set X is computed from its definability degree using the following formula: a-rough(X) = 1 - a-def(X). The more undefinable sets are, the rougher they are.

In rough set theory, a set (concept) can be definable or undefinable, and the following four classes of undefinable sets are introduced for ranking sets according to their definability property:

B1: if $\underline{R}X \neq \emptyset$ and $\overline{R}X \neq U$, X is called roughly definable.
B2: if $\underline{R}X \neq \emptyset$ and $\overline{R}X = U$, X is called externally undefinable.
B3: if $\underline{R}X = \emptyset$ and $\overline{R}X \neq U$, X is called internally undefinable.
B4: if $\underline{R}X = \emptyset$ and $\overline{R}X = U$, X is called totally undefinable.

Consequently, if a set X is undefinable and not roughly definable, then it is internally, externally, or totally undefinable. This is a discrete approach to classifying sets according to their definability; however, it does not allow us to rank sets which belong to the same class, for instance those of the class of definable sets or those of class B1 (the roughly definable sets). In the a-RST framework, we can compare two sets according to their definability degrees, and a concept C1 is said to be more definable than a concept C2 iff a-def(C1) is greater than a-def(C2). In conclusion, we note that rough set theory defines an absolute definability of a set X by comparing its lower and upper approximations to the empty set and to the whole universe, whereas alpha rough set theory introduces a relative definability by comparing its lower approximation to its upper approximation. Any two sets may be ranked according to their definability degrees. Finally, alpha definability is a property of both crisp and fuzzy sets.
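The definability and roughness degrees can be computed directly from the formulas above. The sketch below assumes fuzzy sets are represented as Python dictionaries of membership degrees and that the lower and upper approximations are already available; the names and example degrees are illustrative only.

```python
def cardinality(mu):
    """Fuzzy cardinality M(X): sum of membership degrees (dict: object -> degree)."""
    return sum(mu.values())

def inclusion_degree(mu_x, mu_y):
    """degree(X, Y) = M(X ∩ Y) / M(X), with min as the fuzzy intersection."""
    inter = sum(min(mu_x.get(e, 0.0), mu_y.get(e, 0.0)) for e in mu_x)
    m_x = cardinality(mu_x)
    return inter / m_x if m_x > 0 else 0.0

def alpha_def(lower, upper):
    """a-def(X) = (1 + h) * h / 2 with h = degree(upper, lower)."""
    h = inclusion_degree(upper, lower)
    return (1.0 + h) * h / 2.0

def alpha_rough(lower, upper):
    """a-rough(X) = 1 - a-def(X)."""
    return 1.0 - alpha_def(lower, upper)

# Illustrative fuzzy approximations of a concept X over objects o1..o3.
lower = {"o1": 0.6, "o2": 0.0, "o3": 0.4}
upper = {"o1": 0.9, "o2": 0.5, "o3": 0.6}
print(round(alpha_def(lower, upper), 3))    # h = 1.0 / 2.0 = 0.5, so a-def = 0.375
print(round(alpha_rough(lower, upper), 3))  # 0.625
```

As required by the constraints on f, the function returns 1 when the lower and upper approximations coincide and 0 when the lower approximation is empty.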
7. Approximation of fuzzy concepts

The most important ideas of rough set theory are the lower and upper approximations. These approximations are computed for every concept, i.e., the set of all examples for which a value of the decision is fixed. The lower and upper approximations are useful when the information system contains examples which are described by the same values for all condition attributes and yet belong to different classes, i.e., when the system is inconsistent. The classical definition of approximation does not hold when we consider generalized information systems; indeed, $[\tilde x]_{R_a}$ and X are both fuzzy sets. The approximations are therefore extended in order to define approximations of fuzzy sets, which are themselves fuzzy. The basic idea behind the computation of the degrees is simple. Let X be a fuzzy subset of U and $x \in U$. The object x belongs to the lower approximation of X iff $[\tilde x]_{R_a}$ is included in X, which implies that x belongs to $X \cap [\tilde x]_{R_a}$. Thus, the degree to which x belongs to the lower approximation is equal to the minimum of the degree $a_1$ to which it belongs to X and the degree $a_2$ to which it belongs to $[\tilde x]_{R_a}$. On the other hand, if x belongs to the upper approximation, then x must belong to $[\tilde x]_{R_a}$ or to X; thus it belongs to the union of X and $[\tilde x]_{R_a}$. In this case the degree of x is equal to the maximum of $a_1$ and $a_2$. Consequently, the generalized approximations are defined as follows:

$\underline{R}_a X = \{(x, D(x)) \in U \times [0,1] \mid [\tilde x]_{R_a} \subseteq X \text{ and } D(x) = \min(\mu_{[\tilde x]_{R_a}}(\tilde x), \mu_X(\tilde x))\}$,

$\overline{R}_a X = \{(x, D(x)) \in U \times [0,1] \mid [\tilde x]_{R_a} \cap X \neq \emptyset \text{ and } D(x) = \max(\mu_{[\tilde x]_{R_a}}(\tilde x), \mu_X(\tilde x))\}$.

Let X and Y be two fuzzy subsets of the universe U, R a subset of attributes, $a \in [0,1]$ a similarity threshold and $\tilde U$ the fuzzy subset defined by the graph of the aggregation operator $\phi_R$. The lower and upper approximations of fuzzy sets have the following properties:

- $\underline{R}_a\emptyset = \emptyset$ and $\overline{R}_a\emptyset = \emptyset$
- $\underline{R}_a U \neq U$ and $\overline{R}_a U \neq U$
- $\underline{R}_a\tilde U = \tilde U = \overline{R}_a\tilde U$
- $\underline{R}_a X \subseteq X \subseteq \overline{R}_a X$
- $\underline{R}_a X \cup \underline{R}_a Y \subseteq \underline{R}_a(X \cup Y)$
- $\overline{R}_a X \cup \overline{R}_a Y \subseteq \overline{R}_a(X \cup Y)$
- $\overline{R}_a(X \cap Y) \subseteq \overline{R}_a X \cap \overline{R}_a Y$
- $\underline{R}_a(X - Y) \subseteq \underline{R}_a X - \underline{R}_a Y$
- $\overline{R}_a X - \overline{R}_a Y \subseteq \overline{R}_a(X - Y)$
- $\underline{R}_a(\underline{R}_a X) \subseteq \underline{R}_a X$
- $\underline{R}_a X \subseteq \overline{R}_a(\underline{R}_a X)$
- $\underline{R}_a(\overline{R}_a X) \subseteq \overline{R}_a X$
- $\overline{R}_a X \subseteq \overline{R}_a(\overline{R}_a X)$

One can see that the universe U is not definable, whereas the fuzzy subset $\tilde U$ is definable. Not all properties of the classical approximations carry over to approximations of fuzzy sets, because the intersection of a fuzzy set and its complement may be non-empty, and the union of a fuzzy set and its complement is not equal to the universe U. The lower and upper approximations are at the basis of the definition of the boundary of a set. Let R be a subset of attributes, X a fuzzy subset and $a \in [0,1]$ a similarity threshold. The lower approximation of X, i.e., $\underline{R}_a X$, and the upper approximation of X, i.e., $\overline{R}_a X$, are also fuzzy sets. We define the boundary of X, denoted $B^a_R(X)$, as in rough set theory:

$B^a_R(X) = \overline{R}_a X - \underline{R}_a X$.

For instance, the concept of a beautiful sight is vague, i.e., a concept without a sharp boundary; indeed, there are some sights for which we cannot decide whether they are beautiful or not. The classical rough set approach is governed by a logic that permits a vague concept to take only one of two values, true or false. In alpha rough set theory, the boundary of a fuzzy set is also a fuzzy set, and an example has a degree of membership in the boundary set. A set with a non-empty boundary is not only rough but also fuzzy. The fuzziness of the boundary results from the fact that concepts are not simply true or false, but may be true to any degree.
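A rough sketch of the generalized approximations follows, under the simplifying assumption that the classes $[\tilde x]_{R_a}$ are given by their supports (crisp blocks) together with each object's membership degree in its own class; the object names, dictionaries and degrees are illustrative, not taken from the paper.

```python
def lower_upper(classes, mu_class, mu_x):
    """Fuzzy lower/upper approximation degrees following the definitions above.

    classes  : dict mapping each object to the set of objects in its class [x]_{R_a}
    mu_class : dict mapping each object to its membership degree in its own class
    mu_x     : dict mapping each object to its membership degree in the concept X
    (inclusion and non-empty intersection are tested on supports, as a simplification)
    """
    support_x = {e for e, m in mu_x.items() if m > 0}
    lower, upper = {}, {}
    for obj, block in classes.items():
        if block <= support_x:               # [x]_{R_a} is included in X
            lower[obj] = min(mu_class[obj], mu_x.get(obj, 0.0))
        if block & support_x:                # [x]_{R_a} intersects X
            upper[obj] = max(mu_class[obj], mu_x.get(obj, 0.0))
    return lower, upper

# Three objects, two classes {o1, o2} and {o3}; the concept X covers o1 and o2 only.
classes  = {"o1": {"o1", "o2"}, "o2": {"o1", "o2"}, "o3": {"o3"}}
mu_class = {"o1": 0.9, "o2": 0.7, "o3": 1.0}
mu_x     = {"o1": 0.8, "o2": 0.6, "o3": 0.0}
lo, up = lower_upper(classes, mu_class, mu_x)
print(lo)   # {'o1': 0.8, 'o2': 0.6}
print(up)   # {'o1': 0.9, 'o2': 0.7}
```

The boundary $B^a_R(X)$ discussed above can then be derived from these two fuzzy sets, for instance with a fuzzy set difference, yielding the fuzzy boundary of the concept.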
8. Discussion through related work

The goal of this first presentation of a-RST is to introduce the main concepts which generalize the basic concepts of rough set theory. The generalization proposed here results from our experience and the lessons learnt from the development of a rule induction system based on both fuzzy and rough sets [23,24]. In what follows we sketch some important issues through related work.

Information systems and indiscernibility relation. Slowinski et al. have proposed a generalized system in [25], considering the case where the value of an attribute may be a set of qualitative terms. They generalize the indiscernibility notion in order to deal with a relative degree of possibility of sub-objects. This approach does not consider the approximation of fuzzy sets, even though numerical values are replaced by fuzzy intervals in the discretization of numerical attributes. Another work [26] by Slowinski et al. transforms the classical indiscernibility relation used in rough set theory into a more general similarity relation. Dubois and Prade have introduced the concept of twofold fuzzy sets in Ref. [27], and they have also proposed to couple fuzzy sets (vagueness) and rough sets (coarseness) to obtain a more accurate account of imperfect data [28,29].

Control of knowledge granularity. In rough set theory, any subset of attributes R generates an indiscernibility relation IND(R). Equivalence classes correspond to granules of knowledge representation, which are at the basis of the definitions of the key concepts, i.e., approximation, dependency and reduction. Ziarko et al. have proposed to control the degree of granularity using the DataLogic system [30]. The control of granularity is now possible using only extensions of the basic concepts of rough set theory (the a-RST framework); indeed, we have defined a parameterized indiscernibility relation $R_a$, where the similarity threshold a is a parameter in the range $[0,1]$. The user can control the partitioning of the universe by varying a from zero (coarsest partitioning) to one (finest partitioning).

Partial dependency and reduction of attributes. Dependency and reduction are two important issues in rough sets; indeed, discovering dependencies among attributes leads to reductions, which are minimal subsets that have the same quality of classification as the original set of attributes. Discovering dependencies is of primary importance in the rough set approach [31] to knowledge analysis, data exploration, learning and, more generally, reasoning about data. However, real world data may be noisy, contaminated by errors, and/or it may