
Compressed Representations of Sequences and Full-Text Indexes

Paolo Ferragina1, Giovanni Manzini2, Veli Mäkinen3, and Gonzalo Navarro4

1 Dipartimento di Informatica, Università di Pisa, Italy.

2 Dipartimento di Informatica, Università del Piemonte Orientale, Italy.

3 Technische Fakultät, Universität Bielefeld, Germany.

4 Center for Web Research, Department of Computer Science, University of Chile, Chile.

Abstract. Given a sequence S = s_1 s_2 ... s_n of integers smaller than r = O(polylog(n)), we show how S can be represented using nH_0(S) + o(n) bits, so that we can know any s_q, as well as answer rank and select queries on S, in constant time. H_0(S) is the zero-order empirical entropy of S, and nH_0(S) provides an Information Theoretic lower bound to the bit storage of any sequence S via a fixed encoding of its symbols. This extends previous results on binary sequences, and improves previous results on general sequences where those queries are answered in O(log r) time. For larger r, we can still represent S in nH_0(S) + o(n log r) bits and answer queries in O(log r / log log n) time.

Another contribution of this paper is to show how to combine our compressed representation of integer sequences with an existing compression boosting technique to design compressed full-text indexes that scale well with the size of the input alphabet Σ. Namely, we design a variant of the FM-index that indexes a string T[1, n] within nH_k(T) + o(n) bits of storage, where H_k(T) is the k-th order empirical entropy of T. This space bound holds simultaneously for all k ≤ α log_|Σ| n, constant 0 < α < 1, and |Σ| = O(polylog(n)). This index counts the occurrences of an arbitrary pattern P[1, p] as a substring of T in O(p) time; it locates each pattern occurrence in O(log^{1+ε} n) time, for any constant 0 < ε < 1; and it reports a text substring of length ℓ in O(ℓ + log^{1+ε} n) time.

Compared to all previous works, our index is the first one that removes the alphabet-size dependence from all query times; in particular, counting time is linear in the pattern length. Still, our index uses essentially the same space as the k-th order entropy of the text T, which is the best space obtained in previous work. We can also handle larger alphabets of size |Σ| = O(n^β), for any 0 < β < 1, by paying o(n log |Σ|) extra space and by multiplying all query times by O(log |Σ| / log log n).

Compared to all previous works,our index is the?rst one that removes the alphabet-size dependance

from all query times,in particular counting time is linear in the pattern length.Still,our index uses

essentially the same space of the k-th order entropy of the text T,which is the best space obtained in

previous work.We can also handle larger alphabets of size|Σ|=O(nβ),for any0<β<1,by paying

o(n log|Σ|)extra space and by multiplying all query times by O(log|Σ|/log log n).

1 Introduction

Recent years have witnessed an increasing interest in succinct data structures. Their aim is to represent the data using as little space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [24,31,3,35,36], trees [33,13], graphs [33], permutations [32], and texts [19,7,37,34,17], to name a few.

One of the most basic structures, which lie at the heart of the representation of more complex ones, are binary sequences, with rank and select queries. Given a binary sequence S = s_1 s_2 ... s_n, we denote by Rank_c(S, q) the number of times the bit c appears in S[1, q] = s_1 s_2 ... s_q, and by Select_c(S, q) the position in S of the q-th occurrence of bit c. The best current results [35,36] answer those queries in constant time, retrieve any s_q in constant time, and occupy nH_0(S) + o(n) bits of storage, where H_0(S) is the zero-order empirical entropy of S. This space bound includes that for representing S itself, so the binary sequence is being represented in compressed form yet allowing those queries to be answered optimally.
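To fix intuition, here is a naive linear-time rendition of the rank and select semantics just defined. The function names and the scanning implementation are ours for illustration only; the structures discussed in this paper answer the same queries in constant time within compressed space.

```python
def rank(S, c, q):
    """Number of occurrences of symbol c in S[1..q] (positions are 1-based, as in the paper)."""
    return S[:q].count(c)

def select(S, c, q):
    """Position (1-based) of the q-th occurrence of c in S, or None if there are fewer than q."""
    seen = 0
    for i, s in enumerate(S, start=1):
        if s == c:
            seen += 1
            if seen == q:
                return i
    return None

S = [1, 0, 0, 1, 1, 0, 1]
assert rank(S, 1, 5) == 3        # three 1s among s_1 ... s_5
assert select(S, 0, 2) == 3      # the second 0 sits at position 3
assert rank(S, 1, select(S, 1, 4)) == 4   # rank and select are mutually inverse
```

Note the duality exercised in the last assertion: Rank_c(S, Select_c(S, q)) = q whenever the q-th occurrence exists.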

For the general case of sequences over an arbitrary alphabet of size r, the only known result is the one in [17], which still achieves nH_0(S) + o(n) space occupancy. The data structure in [17] is the elegant wavelet tree; it takes O(log r) time to answer Rank_c(S, q) and Select_c(S, q) queries, and to retrieve any character s_q.

Our first contribution is a new compressed representation for general sequences, which uses nH_0(S) + o(n) bits of space and answers the above queries in constant time. This generalizes previous results on binary sequences [36] and improves the existing result on general sequences [17]. Our result holds when the alphabet size is polylogarithmic in the sequence length, that is, r = O(polylog(n)). For larger values of r, we can represent S using nH_0(S) + o(n log r) bits of space and answer all previous queries in O(log r / log log n) time.

General sequences can be regarded, of course, as texts over an alphabet Σ. The difference lies in the types of queries that are of interest on texts. A full-text index is a data structure built over a text string T[1, n] that supports the efficient search for arbitrary patterns as substrings of the indexed text. A full-text index is called compressed if its space occupancy is bounded by λnH_k(T) + o(n) bits for some k ≥ 0, where λ is a constant and H_k(T) is the k-th order entropy of T (see Section 4.1). If such an index also encapsulates the text, without requiring its explicit storage, then it is called a compressed self-index. Note that a self-index must, in addition to the search functionality, permit the display of any text substring, as the text is not separately available.

Recently, there has been a good deal of activity around compressed full-text indexes, because of their obvious applications in text databases. The most succinct self-index to date [17] occupies nH_k(T) + O(n log log n / log_|Σ| n) bits, for a fixed k ≤ α log_|Σ| n and constant 0 < α < 1. It can count the number of occurrences of a pattern P[1, p] in O(p log |Σ| + polylog(n)) time, and can locate each such occurrence in O(log |Σ| log² n / log log n) time. To display a text substring of length ℓ it takes the locate time plus O(ℓ log |Σ|).

Our second contribution (which builds on the first) is a new compressed self-index that uses nH_k(T) + o(n) bits for any k ≤ α log_|Σ| n and for alphabets of size |Σ| = O(polylog(n)). Our index improves over the one in [17] by removing from the query times the dependence on the alphabet size and the polylogarithmic terms. More precisely: counting takes O(p) time, locating an occurrence takes O(log^{1+ε} n) time for any constant ε > 0, and displaying a length-ℓ text substring takes the time for locating plus O(ℓ). For alphabets larger than O(polylog(n)) (and up to O(n^β) for 0 < β < 1), our space requirement becomes nH_k(T) + o(n log |Σ|) bits and our query times grow by a multiplicative factor O(log |Σ| / log log n).

The rest of the paper is organized as follows. Section 2 explains our contributions in more detail and relates them to the current literature. Section 3 presents our new representation for general sequences, while Section 4 deploys such a sequence representation to design our new compressed self-index. Finally, Section 5 discusses conclusions and future directions of interesting research.

2 Our Contribution in Context

2.1 Succinct Sequence Representations

The first results on binary sequences [24,31,3] achieved constant time on rank and select queries by using n + o(n) bits. In those schemes, n bits are used by S itself, and o(n) additional bits are needed by the auxiliary data structures that support the Rank_c and Select_c queries. Further refinements [35,36] achieved constant time on the same queries by using a compressed representation of S requiring nH_0(S) + o(n) bits overall (i.e., for S and the auxiliary data structures). These solutions also support the constant-time retrieval of any bit s_q given q.

The case of general sequences, whose symbols are drawn from the range [1, r], has received less attention. The only existing proposal is the wavelet tree [17], a powerful and elegant data structure that allows reducing rank and select operations over general sequences to rank and select operations over binary (compressed) sequences. By using the results in [35,36] for binary sequence representations, the wavelet tree represents S in nH_0(S) + o(n log r) bits and answers s_q, Rank_c(S, q), and Select_c(S, q) queries in O(log r) time.
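As a concrete illustration of the reduction just described, the following is a minimal, uncompressed wavelet tree sketch (our own toy code, not the paper's structure): each node's bitmap routes symbols to the left or right half of its alphabet range, and rank/access descend one level per step. Replacing the naive bitmap scans below with the compressed binary rank/select structures of [35,36] is what yields the stated O(log r) time and nH_0(S) + o(n log r) space.

```python
class WaveletTree:
    """Binary wavelet tree over the integer alphabet [lo, hi); bitmap rank is naive."""
    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = 0, max(seq) + 1
        self.lo, self.hi = lo, hi
        if hi - lo > 1 and seq:
            mid = (lo + hi) // 2
            # bit = 0: symbol belongs to the left half; bit = 1: right half
            self.bits = [0 if s < mid else 1 for s in seq]
            self.left = WaveletTree([s for s in seq if s < mid], lo, mid)
            self.right = WaveletTree([s for s in seq if s >= mid], mid, hi)
        else:
            self.bits = None                 # leaf (or empty subtree)

    def rank(self, c, q):
        """Occurrences of symbol c among the first q symbols."""
        if self.bits is None:
            return q
        mid = (self.lo + self.hi) // 2
        ones = sum(self.bits[:q])            # naive bitmap rank
        return self.left.rank(c, q - ones) if c < mid else self.right.rank(c, ones)

    def access(self, q):
        """The symbol at 0-based position q."""
        if self.bits is None:
            return self.lo
        ones = sum(self.bits[:q + 1])
        return self.right.access(ones - 1) if self.bits[q] else self.left.access(q - ones)

wt = WaveletTree([3, 1, 0, 2, 1, 3, 0])
assert wt.rank(1, 5) == 2        # two 1s among the first five symbols
assert wt.access(3) == 2
```

The bitmaps over all nodes of one level total n bits, which is where the n log r raw-space figure (and, after compression, the entropy bound) comes from.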

In this paper we generalize the result on binary sequences [35,36] to sequences of integers in the range [1, r], and obtain an improved result. The main challenge in this generalization is to generate short descriptions for pieces of the sequence, which can be computed in constant time and used to index into tables containing partial precomputed queries. This is significantly more complex than for binary sequences. We obtain a compressed sequence representation using nH_0(S) + O((rn log log n)/log_r n) bits which answers the queries s_q, Rank_c(S, q), and Select_c(S, q) in constant time.

This first result is interesting only for small r = o(log n / log log n), as otherwise the term O((rn log log n)/log_r n) of the space complexity is Ω(n log r). We then use that sequence representation as a basic block within a generalized wavelet tree to obtain better results. In fact, the original wavelet tree [17] is a binary tree whose nodes contain binary sequences representing the original (general) sequence restricted to a subset of its symbols. In this paper we consider an h-ary generalization of the wavelet tree that, in turn, requires the efficient storage in its nodes of integer sequences over the range [1, h]. By setting h = O(log^δ n) with δ < 1, and using our previous sequence representation to store the wavelet-tree nodes, we obtain a representation for S that uses nH_0(S) + o(n) bits and (optimal) constant query time for any r = O(polylog(n)). That is, the queries answered in logarithmic time with binary wavelet trees [17] are now answered in constant time, within the same asymptotic space occupancy. For larger alphabets, we can still obtain o(n log r) extra space over the nH_0(S) term and O(log r / log log n) query time, again improving upon binary wavelet trees by a sublogarithmic factor.

Note that there are even better results for binary sequences [36], which we have not generalized. For example, it is possible to represent a sequence with m bits set taking nH_0(S) + o(m) + O(log log n) bits of space, with constant query time. This is better than the result we have focused on when m is much smaller than n.

2.2 Compressed Full-Text Indexes

The classical full-text indexes, namely suffix trees and suffix arrays [4,29,15], are neither succinct nor self-indexes: they both occupy Θ(n log n) bits, plus O(n log |Σ|) bits to represent the indexed text.

The FM-index [7] was the first self-index in the literature to achieve a space occupancy proportional to the k-th order entropy of T[1, n]. Precisely, the FM-index occupies at most 5nH_k(T) + o(n) bits of storage, and allows one to count the number of occurrences of a pattern P[1, p] within T in O(p) time. Each such occurrence can be located in O(log^{1+ε} n) time, where ε > 0 is an arbitrary constant chosen when the index is built. The FM-index can display any text substring of length ℓ in O(ℓ + log^{1+ε} n) time. The design of the FM-index is based upon the relationship between the Burrows-Wheeler compression algorithm [1] and the suffix array data structure [29,15]. It is therefore a sort of compressed suffix array that takes advantage of the compressibility of the indexed text in order to achieve space occupancy related to the Information Theoretic minimum. Indeed, the design of the FM-index does not depend on the parameter k, whereas its space bound holds simultaneously for any fixed k. These remarkable theoretical properties have been validated by experimental results [8,9,22] and applications [20,39].
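The relationship between the Burrows-Wheeler transform and the suffix array is easy to illustrate. The toy code below (our own, using a sentinel '$' and naive suffix sorting, not the paper's construction) exploits the fact that the BWT character at position i is the character preceding the i-th smallest suffix.

```python
def suffix_array(t):
    """0-based suffix array by naive sorting (fine for a toy example)."""
    return sorted(range(len(t)), key=lambda i: t[i:])

def bwt(t):
    t = t + "$"                      # sentinel, lexicographically smaller than any letter
    sa = suffix_array(t)
    # character preceding each sorted suffix (t[-1] = '$' for the full text)
    return "".join(t[i - 1] for i in sa)

assert bwt("banana") == "annb$aa"
print(bwt("mississippi"))            # -> ipssm$pissii
```

Grouping characters by the context that follows them is what makes the transformed text highly compressible, which is the property the FM-index exploits.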

The above time and space bounds for the FM-index have been obtained by assuming that the size of the input alphabet is a constant. Hidden in the big-O notation there is an exponential dependence on the alphabet size in the space bound, and a linear dependence on the alphabet size in the time bounds. More specifically, the time to locate an occurrence is O(|Σ| log^{1+ε} n), and the time to display a text substring is O((ℓ + log^{1+ε} n)|Σ|). In practical implementations of the FM-index [8,9] these dependencies have been removed in exchange for a penalty factor (usually O(log n)) that multiplies all query times, including that for counting.

We point out that the FM-index concept is more general than the implementation associated with its initial proposals [7–9]. For the sake of presentation, let us denote now by T^bwt the permutation of the text T given by the Burrows-Wheeler transform (see [1] and Section 4.2). Assume we implement the computation of Rank_c(T^bwt, q) and the retrieval of T^bwt[q] in time t_rnk and t_ret, respectively, using O(N) space (where the parameter N is discussed below). The general FM-index concept immediately gives us a succinct self-index that requires O(N + n/log^ε n) space, counts the pattern occurrences in O(p t_rnk) time, locates any such occurrence in O((t_rnk + t_ret) log^{1+ε} n) time, and displays any text substring of length ℓ in O((t_rnk + t_ret)(ℓ + log^{1+ε} n)) time (here and in the rest of this section ε is a positive parameter chosen when the index is built). To date, several implementations of the FM-index concept exist:

1. The original one [7], with t_rnk = O(1) and t_ret = O(|Σ|), where N = 5nH_k(T) + o(n) contains an exponential dependence on |Σ| in the o(n) term.

2. The one obtained by using the binary wavelet tree [17] to represent T^bwt and to implement the required functions. This yields t_rnk = t_ret = O(log |Σ|), so counting time deteriorates but locating and displaying times improve. The space occupancy becomes N = nH_0(T) + o(n log |Σ|), which depends only mildly on |Σ| but has a main term significantly worse than in the original implementation (since here H_0(T) occurs in place of H_k(T)). Some variants providing t_rnk = t_ret = O(H_0(T)) on average, and O(log |Σ|) in the worst case, have also been proposed [26].

3. A variant of the Compressed Suffix Array (CSA) introduced by Sadakane [38]. Although the CSA was originally conceived as completely different from the FM-index, this variant is actually an implementation of the FM-index concept. It represents T^bwt via |Σ| binary sequences T^bwt_c, each indicating the occurrences of a different character in T^bwt. Those sequences are represented with existing techniques [35], so that Rank_c(T^bwt, q) = Rank_1(T^bwt_c, q) can be answered in constant time. This yields t_rnk = O(1) and t_ret = O(|Σ|) as in the original implementation. The space occupancy is N = nH_0(T) + O(n). The scheme works for |Σ| = O(polylog(n)).

4. The one obtained by Huffman-compressing T^bwt and then representing the resulting binary sequence with the techniques in [31]. This yields t_rnk = t_ret = O(H_0(T)) on average and O(log |Σ|) in the worst case. The space occupancy is N = 2nH_0(T) + O(n) [16].

5. The one obtained by compressing the sequence T^bwt via run-length encoding [26,27]. This approach obtains either t_rnk = O(1) and t_ret = O(|Σ|) if |Σ| = O(polylog(n)), or t_rnk = t_ret = O(log |Σ|) if |Σ| = o(n/log n). The space occupancy is N = 2n(H_k(T) log |Σ| + 1 + o(1)) bits. Notice that this is the only alternative to the original FM-index with space occupancy related to the k-th order entropy (although it shows a linear dependence on n but a milder dependence on |Σ|).
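All the implementations above share the backward-search counting mechanism, which uses only an array C of cumulative character counts and Rank_c queries on T^bwt. The sketch below (our own illustration; rank is computed by scanning, i.e., t_rnk = O(n)) makes the O(p · t_rnk) counting bound concrete.

```python
def bwt_and_counts(t):
    """Return T^bwt of t+'$' and C[c] = number of characters of t+'$' smaller than c."""
    t += "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    L = "".join(t[i - 1] for i in sa)
    C = {c: sum(ch < c for ch in t) for c in set(t)}
    return L, C

def count(L, C, p):
    """Occurrences of pattern p in the original text, via backward search."""
    sp, ep = 0, len(L) - 1               # current range [sp, ep] of sorted suffixes
    for c in reversed(p):
        if c not in C:
            return 0
        sp = C[c] + L[:sp].count(c)      # C[c] + Rank_c(T^bwt, sp)
        ep = C[c] + L[:ep + 1].count(c) - 1
        if sp > ep:
            return 0                     # range emptied: no occurrence
    return ep - sp + 1

L, C = bwt_and_counts("mississippi")
assert count(L, C, "ssi") == 2
assert count(L, C, "iss") == 2
```

Each of the p steps performs a constant number of Rank_c calls, so the listed implementations differ precisely in how they realize Rank_c on T^bwt in compressed space.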


We note that, by directly applying our new sequence representation to T^bwt, we immediately obtain t_rnk = t_ret = O(1) time and N = nH_0(T) + o(n), which supersedes all the alternatives whose space requirement is proportional to the zero-order entropy of T (that is, items 2, 3, and 4 above), as long as |Σ| = O(polylog(n)). Yet, in this paper we do better than this by making a further step that turns H_0(T) into H_k(T) in the previous space bound. Precisely, our main result is an implementation of the FM-index concept with t_rnk = t_ret = O(1) time and N = nH_k(T) + o(n), for a reasonable range of values of k (see below). Except for the limitation |Σ| = O(polylog(n)), this implementation supersedes all the existing implementations of the FM-index concept, so it can be regarded as the ultimate implementation of the FM-index. To obtain this result, we combine our sequence representation with the compression boosting technique introduced in [6,11]. Compression boosting partitions the Burrows-Wheeler transformed text into contiguous areas in order to maximize the overall compression achievable with zero-order compressors used over each area. Then we use our new sequence representation in each area. The resulting structure is thus simple. It indexes a string T[1, n] drawn from an alphabet Σ, with |Σ| = O(polylog(n)), using nH_k(T) + O(n/log^ε n) bits. The data structure does not depend on the parameter k, and the space bound holds simultaneously for all k ≤ α log_|Σ| n and constant 0 < α < 1. With our index, counting the occurrences of an arbitrary pattern P[1, p] as a substring of T takes O(p) time (i.e., no alphabet dependence). Locating each pattern occurrence takes O(log^{1+ε} n) time. Displaying a text substring of length ℓ takes O(ℓ + log^{1+ε} n) time.

If the size of the alphabet is larger than polylogarithmic, that is, |Σ| = O(n^β) with β < 1, our data structure uses nH_k(T) + o(n log |Σ|) space and the query times are multiplied by a factor O(log |Σ| / log log n). Note that, although the space occupancy can now be ω(n) bits, it is still less than the size of the uncompressed text. Moreover, in terms of query time our index is faster than most FM-index implementations. As an alternative, if space is more important than query speed, we can use a simplified version of our index which uses binary wavelet trees instead of our new integer sequence representation. For any alphabet size, the space occupancy of this latter index is bounded by nH_k(T) + O(log |Σ| (n log log n / log n)) bits for any k ≤ α log_|Σ| n and 0 < α < 1. The index takes O(p log |Σ|) time to count the pattern occurrences, and O(log |Σ| (ℓ + log² n / log log n)) time to locate and display a substring of length ℓ.

There exist several other compressed full-text indexes not based on the FM-index concept [10,19,37,34,17,18,28,25]. Among them, the data structure with the smallest space occupancy is described in [17] (Theorems 4.2 and 5.2) and uses nH_k(T) + O(log |Σ| (n log log n / log n)) bits of storage for a fixed k ≤ α log_|Σ| n with 0 < α < 1 (the parameter k must be chosen when the index is built). The index in [17] takes O(p log |Σ| + polylog(n)) time to count the pattern occurrences and O(log |Σ| (ℓ + log² n / log log n)) time to locate and display a substring of length ℓ. Note that for alphabets of polylogarithmic size our index is faster in both queries at the cost of a small increase in the big-Oh terms of the space occupancy. Note also that for any alphabet size our simplified data structure takes the same space as the index in [17], but it is faster in counting the occurrences and takes the same time in reporting and displaying the occurrences.

Finally, we point out that the structure in [17] uses binary wavelet trees to compactly represent sequences. By replacing binary wavelet trees with the sequence representation described in this paper, we can remove the O(log |Σ|) factors from their query times.

To summarize, our index has asymptotically the smallest known space occupancy and processes all queries faster than the data structure in [17], which is the only other compressed index known to date with essentially nH_k(T) space occupancy. Table 1 summarizes our contribution.


Table 1. Summary of our contribution: for each index, its counting time, space occupancy in bits, and largest supported alphabet size. (Most cells of the table were lost in extraction; the recoverable entries are the counting times O(p) for [7,9] and [27], O(p log |Σ|) for [16], and O(p³ log |Σ| + p log n) for [34], and, for this paper, counting time O(p) within nH_k(T) + o(n log |Σ|) bits for alphabets of size up to O(n^β), β < 1.)
3 Compressed Representation of Sequences

In this section we develop our compressed representation of a sequence S = s_1 s_2 ... s_n of integers in the range [1, r], with r ≤ n, but the data structure is interesting only for r = o(log n / log log n), since otherwise the space occupancy will exceed the space Θ(n log r) used by the standard uncompressed representation. We first focus on S[q] and Rank_c(S, q) queries, and address the Select_c(S, q) query later.

Structure. We divide S into blocks of size u = ⌊(1/2) log_r n⌋, and let S_i denote the i-th block, for i = 1, ..., ⌈n/u⌉. For each block S_i we define:

– N^c_i, the number of occurrences of symbol c in S_i; the tuple (N^1_i, ..., N^r_i) is the symbol composition of S_i, and C(u; N^1_i, ..., N^r_i) = u!/(N^1_i! ··· N^r_i!) denotes the number of distinct blocks having that composition.

– I_i, the identifier of S_i among all the blocks of its symbol composition. I_i consists of L_i = ⌈log C(u; N^1_i, ..., N^r_i)⌉ bits.

– R_i, the identifier of the symbol composition (N^1_i, ..., N^r_i) among all possible tuples of r numbers that add up to u. R_i consists of ⌈log C(u+r−1, r−1)⌉ ≤ ⌈r log(u+1)⌉ bits, where C(u+r−1, r−1) is a binomial coefficient.

Our compressed representation of S consists of the storage of the following information:

– A bit sequence I obtained by concatenating all the variable-length identifiers I_i.

– A bit sequence R obtained by concatenating all the fixed-length identifiers R_i.

– A table E = E_{N^1,...,N^r} for every possible symbol composition (N^1, ..., N^r) summing up to u. Each entry of E corresponds to a different u-length block G (with the proper symbol composition (N^1, ..., N^r)) and stores the answers to all Rank_c(G, q) queries, where 1 ≤ q ≤ u and c ∈ [1, r]. Indexes I_i are such that E[I_i] stores the Rank_c-information for the block S_i. Tables E do not depend on S, but just on u.

– A table F whose entries are indexed by all possible symbol compositions (N^1, ..., N^r) summing up to u, and point to the corresponding tables E_{N^1,...,N^r}. Indexes R_i are such that F[R_i] points to E_{N^1_i,...,N^r_i}. Also table F does not depend on S, but just on u.

– Information to answer partial sum queries on the L_i, that is, to compute Σ_{j=1}^{i} L_j in constant time for any block i.

– Information to answer partial sum queries on the N^c_i, that is, to compute Σ_{j=1}^{i} N^c_j in constant time for any block i and symbol c.

Solving queries. To answer queries about position q we first compute the block number i = ⌈q/u⌉ to which q belongs and the offset ℓ = q − (i−1)u inside that block. Then we compute E = F[R_i], the table of entries corresponding to block i, and G = E[I_i], the entry of E corresponding to block i. Note that, since the I_i values use a variable number of bits, we need to know the starting and ending positions of the representation of I_i inside the sequence I. These are 1 + Σ_{j=1}^{i−1} L_j and Σ_{j=1}^{i} L_j, respectively, which are known in constant time because we have partial sum information on the L_i. Now, to answer Rank_c(S, q) we evaluate (in constant time) the partial sum Σ_{j=1}^{i−1} N^c_j and add Rank_c(G, ℓ). To answer S[q] we simply give G[ℓ]. Both queries take constant time. Figure 1 graphically illustrates the rank computation.
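The following toy code (ours; it stores the blocks explicitly instead of the compressed pairs (R_i, I_i), and indexes the shared table E by block contents) mirrors the query procedure just described: a prefix-sum lookup covering the complete blocks, plus one precomputed in-block rank answer.

```python
def build(S, u, r):
    """Cut S (symbols in [1, r]) into blocks of length u and precompute the tables."""
    blocks = [tuple(S[i:i + u]) for i in range(0, len(S), u)]
    # psum[i][c] = number of occurrences of c in the blocks preceding block i
    psum = [[0] * (r + 1)]
    for b in blocks:
        prev = psum[-1]
        psum.append([prev[c] + b.count(c) for c in range(r + 1)])
    # shared table E: for every distinct block, all in-block rank answers
    E = {b: {c: [b[:q].count(c) for q in range(len(b) + 1)]
             for c in range(1, r + 1)} for b in set(blocks)}
    return blocks, psum, E

def rank(blocks, psum, E, u, c, q):
    """Rank_c(S, q): prefix count over whole blocks plus one table lookup."""
    i = (q - 1) // u                     # 0-based block index
    off = q - i * u                      # 1-based offset inside the block
    return psum[i][c] + E[blocks[i]][c][off]

S = [2, 1, 1, 3, 2, 2, 1, 3, 1]
blocks, psum, E = build(S, 3, 3)
assert rank(blocks, psum, E, 3, 1, 7) == 3
assert rank(blocks, psum, E, 3, 2, 5) == 2
```

In the actual structure, E depends only on u (not on S) and is indexed through R_i and I_i, which is what keeps the precomputed tables sublinear in size.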

Space usage. First notice that uH_0(S_i) = log C(u; N^1_i, ..., N^r_i), where C(u; N^1_i, ..., N^r_i) = u!/(N^1_i! ··· N^r_i!) is the multinomial coefficient, and thus

Σ_{i=1}^{⌈n/u⌉} uH_0(S_i) = Σ_{i=1}^{⌈n/u⌉} log C(u; N^1_i, ..., N^r_i) = log Π_{i=1}^{⌈n/u⌉} C(u; N^1_i, ..., N^r_i) ≤ log C(n; n_1, ..., n_r) = nH_0(S),   (1)

where n_c is the total number of occurrences of character c in S. The inequality holds because distributing the N^c_i symbols over the blocks S_i is just one possible way to distribute n_c symbols over S [35]. This result permits us to bound the length of the sequence I as

Σ_{i=1}^{⌈n/u⌉} L_i = Σ_{i=1}^{⌈n/u⌉} ⌈log C(u; N^1_i, ..., N^r_i)⌉ ≤ Σ_{i=1}^{⌈n/u⌉} uH_0(S_i) + ⌈n/u⌉ ≤ nH_0(S) + O(n/log_r n).


Fig. 1. A graphical description of the algorithm solving rank queries on sequences. Precisely, we illustrate the case Rank_c(S, q), where q = i·u + ℓ. The final “+” is the answer.

Let us now consider the sequence R. The number of different tuples (N^1, ..., N^r) that add up to u is the binomial coefficient C(u+r−1, r−1) ≤ (u+1)^r. Hence it is enough to use ⌈r log(u+1)⌉ bits for each R_i (which actually is enough to describe any tuple of r numbers in [0, u]). Accumulated over the ⌈n/u⌉ blocks, this requires O(rn log log n / log_r n) bits.

We consider now the structures to answer partial sum queries [35], namely Σ_{j=1}^{i} N^c_j and Σ_{j=1}^{i} L_j. Both structures are similar, so we concentrate on the L_j's, whose upper bounds are larger: indeed L_i ≤ ⌈u log r⌉, since C(u; N^1_i, ..., N^r_i) ≤ r^u, whereas a partial sum over any N^c is trivially bounded by u. Recall that we need to answer partial sum queries over the sequence of integers L = L_1, L_2, ..., L_t, where t = ⌈n/u⌉. Since L_i ≤ ⌈u log r⌉, each partial sum over L does not exceed n⌈log r⌉ and can be represented in ⌈log(n⌈log r⌉)⌉ bits. Divide L into blocks of that length, ⌈log(n⌈log r⌉)⌉, and store the full partial sums at each block beginning. This requires exactly t = O(n/log_r n) bits. Inside each block, store the partial sums relative to the block beginning. These latter partial sums cannot exceed ⌈u log r⌉⌈log(n⌈log r⌉)⌉ because of the upper bound on the L_i's and the length of the L-blocks. Hence we need O(log u + log log r + log log n) = O(log log n) bits for each partial sum within each block of L, and thus O(|L| log log n) = O(t log log n) = O(n log log n / log_r n) bits overall for the partial sum information on L. A partial sum query on L_i is answered in constant time by adding the partial sum of the block of L that contains L_i and the relative partial sum of L_i inside that block. The same technique can be applied to the sequences N^c, whose values are in the range [0, u], to obtain O(rn log log n / log_r n) bits of space because there are r sequences to index.
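The two-level partial-sum scheme can be sketched as follows (our own simplified code, with an arbitrary superblock length in place of the ⌈log(n⌈log r⌉)⌉-value blocking above): full prefix sums at superblock boundaries plus small relative sums inside, combined with two lookups.

```python
class PartialSums:
    """Two-level prefix sums: full sums at superblock starts, small relative sums inside."""
    def __init__(self, values, sb=4):
        self.sb = sb                     # superblock length
        self.full = []                   # full prefix sum at each superblock start
        self.rel = []                    # running sum relative to the superblock start
        total = base = 0
        for j, v in enumerate(values):
            if j % sb == 0:
                self.full.append(total)
                base = total
            total += v
            self.rel.append(total - base)

    def prefix_sum(self, i):
        """Sum of the first i values, answered with two table lookups."""
        if i == 0:
            return 0
        j = i - 1
        return self.full[j // self.sb] + self.rel[j]

L = [5, 3, 7, 2, 6, 1, 4]
ps = PartialSums(L, sb=3)
assert ps.prefix_sum(4) == 17            # 5 + 3 + 7 + 2
assert ps.prefix_sum(7) == sum(L)
```

The space saving comes from the relative sums being bounded by the superblock span, so each fits in O(log log n) bits in the setting analyzed above.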

Finally, let us consider tables E and F. The total number of entries over all E_{N^1,...,N^r} tables is clearly r^u, since each sequence of u symbols over [1, r] belongs to exactly one symbol composition. For each such entry G we explicitly store the sequence itself plus the answers to all Rank_c(G, q) queries, c ∈ [1, r], 1 ≤ q ≤ u. This storage requires O(u log r + ru log u) bits. Added over all the entries of all the E tables, we have O(r^u (u log r + ru log u)) = o(n) bits, since r^u = r^{(1/2) log_r n} = √n by our choice of u.

Theorem 1. Let S[1, n] be a sequence of integers in the range [1, r], with r ≤ n. There exists a data structure using nH_0(S) + O(r(n log log n)/log_r n) bits of space that supports the queries Rank_c(S, q) and Select_c(S, q), and retrieves S[q], all in constant time. □

The theorem is a generalization of the result in [35,36], which uses nH_0(S) + O((n log log n)/log n) bits of space to represent a binary sequence S (r = 2) so as to answer Rank_c(S, q) and Select_c(S, q) queries in constant time. Note that for binary sequences queries S[q] can be easily answered in constant time by finding c ∈ {0, 1} such that Rank_c(S, q) − Rank_c(S, q−1) = 1, whereas we had to provide direct access to G[ℓ]. Moreover, with binary sequences one can use R_i = i, while we needed explicit pointers to an intermediate table F.

As we have already observed, the above result is interesting only for alphabets of size r = o(log n / log log n), since otherwise the space occupancy of the data structure is Ω(n log r). In the next section we show how to extend this result to alphabets of polylogarithmic size.

3.2 Generalized Wavelet Trees

In this section we use the representation of sequences developed in the previous section to build an improved sequence representation, which adapts better to the range of different symbols represented. Albeit we solve exactly the same problem, we change the notation a little for clarity. This time our sequence S[1, n] will be a sequence of symbols over an alphabet Σ = [1, |Σ|], so that r ≤ |Σ| will be reserved for applications of Theorem 1. Actually, for r = |Σ| the data structure we are going to introduce will be essentially the data structure of Theorem 1.


Let us recall the basic ideas underlying the (binary) wavelet tree [17]. Consider a balanced binary tree T whose leaves contain the characters of the alphabet Σ. T has height O(log |Σ|). Each node u of T is associated with a string S_u that represents the subsequence of S containing only the characters that are stored in leaves descending from u. The root is thus associated with the entire S. The node u of the wavelet tree indeed stores a binary image of the sequence S_u, denoted by B_u, such that B_u[i] = 1 if the character S_u[i] descends from the right child of u, and B_u[i] = 0 otherwise. By representing every binary sequence B_u with the data structure of [36], we get a data structure supporting Rank_c(S, q), Select_c(S, q), and S[q] queries in O(log |Σ|) time using nH_0(S) + o(n) space.

The r-ary Wavelet Tree. Let us consider an r-ary balanced tree whose leaves are associated with the symbols of the alphabet Σ. This r-ary tree has height at most 1 + log_r |Σ|. As in a (binary) wavelet tree, each node v is associated with a subsequence S_v of S[1, n] formed just by the symbols descending from v. Unlike a (binary) wavelet tree, the subsequence S_v is stored in v as a sequence of integers in the range [1, r]. Precisely, let v be a node with children v_1 ... v_r, and let Σ_v be the set of symbols descending from v. Because of the balancedness of the tree, Σ_v is actually split into r equally-sized subsets Σ_{v_1}, ..., Σ_{v_r}, which are integral ranges of size |Σ_{v_i}| ≈ |Σ_v|/r. Therefore, the sequence stored at v is the sequence of n_v = |S_v| integers in the range [1, r] whose q-th entry is j whenever S_v[q] ∈ Σ_{v_j}. The data structure of Theorem 1 is finally built over this sequence and stored at node v, so as to answer the queries S_v[q], Rank_j(S_v, q), and Select_j(S_v, q) in constant time.

A technical detail is that we concatenate all the sequences S_v lying at tree level h, and store them into one unique long sequence S^h. All these level-wise sequences have the same length as S, namely n. As we go down/up the tree, it is easy to maintain in constant time the index q′ + 1 where the current node sequence S_v starts inside the level sequence S^h. To achieve this, each node v also maintains a vector C_v[1, r] such that C_v[j] is the number of occurrences in S_v of symbols in [1, j−1]. Now assume that we are at node v and its sequence S_v starts at index q′ + 1 of S^h; then the sequence S_{v_j} of the j-th child of v starts at q′ + C_v[j] + 1 in S^{h+1}. Conversely, assume that we are at node v_j and its sequence S_{v_j} starts at index q′ + 1 of S^{h+1}; then the sequence S_v of the parent v of v_j starts at q′ − C_v[j] + 1 in S^h. Notice that we need to store pointers (with negligible extra space) to find the C vectors of children or parents, or we can take advantage of the tree being almost perfect to avoid such pointers. We also need, for the bottom-up traversal required to implement select (see next), |Σ| pointers to the leaves of the r-ary wavelet tree.

Solving queries. To compute Rank_c(S, q), we start at the root node v and determine in constant time, by a simple algebraic calculation, the subset Σ_{v_j} to which c belongs. We then compute the position corresponding to q in S_{v_j}, namely q_{v_j} = Rank_j(S_v, q). We then recursively continue with q = q_{v_j} at node v_j. We eventually reach a tree leaf v_l (corresponding to the subset {c} ⊆ Σ), for which we have the answer to our original query, Rank_c(S, q) = q_{v_l}. On the other hand, to determine S[q], we start at the root node v and obtain j = S_v[q], so that S[q] ∈ Σ_{v_j}. Then we continue recursively with node v_j and q = q_{v_j} = Rank_j(S_v, q) as before, until we reach a leaf v_l, where Σ_{v_l} = {S[q]} is finally determined. Both queries take O(log_r |Σ|) time.

To compute Select_c(S, q), instead, we proceed bottom-up. We identify the leaf v_l corresponding to the subset {c} and then proceed upwards. At leaf v_l (not actually represented in the tree), we initialize q_{v_l} = q. This is the position we want to track upwards in the tree. Now, let v be the parent of v_j; then q_v = Select_j(S_v, q_{v_j}) is the position of S_{v_j}[q_{v_j}] in S_v. We eventually reach the root, with q_root = Select_c(S, q), in O(log_r |Σ|) time.


It goes without saying that, since we do not represent the sequences S_v but the level-wise sequences S^h, in the calculations above we need to take care of the latter. Assume that our current sequence S_v starts at position q′ + 1 in S^h. Then, queries over S_v are translated into queries over S^h as follows: S_v[q] = S^h[q′ + q], Rank_j(S_v, q) = Rank_j(S^h, q′ + q) − Rank_j(S^h, q′), and Select_j(S_v, q) = Select_j(S^h, Rank_j(S^h, q′) + q).
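A toy rendition of the r-ary wavelet tree may help fix the traversal just described (our own code: per-node sequences are stored directly rather than concatenated level-wise, and per-node rank is a naive scan; with the structure of Theorem 1 at each node, each of the O(log_r |Σ|) levels would cost O(1)).

```python
class RaryWaveletTree:
    """r-ary wavelet tree sketch; per-node rank is naive instead of Theorem 1's O(1)."""
    def __init__(self, seq, lo, hi, r):
        self.lo, self.hi, self.r = lo, hi, r
        if hi - lo <= 1:
            self.digits = None           # leaf: a single symbol of the alphabet
            return
        step = -(-(hi - lo) // r)        # ceil: size of each child's symbol range
        self.digits = [(s - lo) // step for s in seq]
        self.children = [
            RaryWaveletTree([s for s in seq if (s - lo) // step == j],
                            lo + j * step, min(lo + (j + 1) * step, hi), r)
            for j in range(-(-(hi - lo) // step))]

    def rank(self, c, q):
        """Occurrences of symbol c among the first q symbols."""
        if self.digits is None:
            return q
        step = -(-(self.hi - self.lo) // self.r)
        j = (c - self.lo) // step        # child holding c, by algebraic calculation
        qj = self.digits[:q].count(j)    # naive Rank_j at this node
        return self.children[j].rank(c, qj)

S = [5, 2, 7, 2, 5, 0, 7, 5]
wt = RaryWaveletTree(S, 0, 8, r=4)
assert wt.rank(5, 8) == 3
assert wt.rank(2, 4) == 2
assert wt.rank(6, 8) == 0
```

Each descent step maps a position in the parent sequence to a position in one child, exactly the q = q_{v_j} = Rank_j(S_v, q) recurrence above.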

Space usage. An immediate advantage of having all level sequences S^h[1, n] over the same alphabet [1, r] is that all tables E and F are the same for all levels, so they take o((rn log log n)/log_r n) bits overall. All the other O((rn log log n)/log_r n)-bit structures used to prove Theorem 1 totalize O(log |Σ| (rn log log n)/log n) bits of space by adding up the O(log_r |Σ|) levels. The structures C_v need O(r log n) bits each, and there is one C_v array per non-leaf node v; since an r-ary tree with |Σ| leaves has O(|Σ|/r) non-leaf nodes, this totalizes O(|Σ| log n) bits. As for the entropy-related part of the space, consider any node v and the size-u blocks into which its sequence is cut (recall that the blocks are aligned to S^h and that n_v = |S_v|). The sum of the local zero-order entropies uH_0(S^h_i) over the blocks fully contained in the sequence of v is a lower bound to n_v H_0 of that sequence (recall Eq. (1)); for the at most two boundary blocks per node, we simply assume that they take the maximum u⌈log r⌉ = O(log n) bits, adding O(|Σ| log n) bits overall. Finally, summing the node entropies over all nodes adds up to nH_0(S), by the same entropy decomposition that underlies binary wavelet trees [17]. Overall we obtain the following result.

Theorem 2. Let S[1, n] be a string over an alphabet Σ, and let 2 ≤ r ≤ |Σ| with r = O(polylog(n)). The r-ary wavelet tree built on S uses nH_0(S) + O(|Σ| log n) + O(log |Σ| (rn log log n)/log n) bits of storage and supports in O(log_r |Σ|) time the queries S[q], Rank_c(S, q), and Select_c(S, q), for any c ∈ Σ and 1 ≤ q ≤ n.

Moreover, if |Σ| = O(polylog(n)), then r can be chosen so that the resulting r-ary wavelet tree supports all queries in constant time and takes nH_0(S) + O(n/log^ε n) bits of space, for any constant 0 < ε < 1.

Proof. The first part of the theorem, for a general r, is a consequence of the development in this section. For the last sentence, note that by choosing r = |Σ|^{1/κ}, for constant κ > 0, we can support the query operations in constant time O(κ). Now, if |Σ| = O(polylog(n)) = O((log n)^d), then we can choose any κ > d to obtain O(dn(log log n)²/(log n)^{1−d/κ}) space overhead. For any constant 0 < ε < 1, we choose κ large enough that 1 − d/κ > ε, so that the overhead is O(n/log^ε n). □

The theorem is a generalization of the (binary) wavelet tree data structure [17], which takes nH_0(S) + O(log |Σ| (n log log n)/log n) space and answers the same queries in O(log |Σ|) time. The last part shows that, when |Σ| = O(polylog(n)), the generalization obtains essentially the same space (up to lower-order terms) and reduces query times to a constant. The case of larger alphabets deserves a separate corollary.

Corollary 1. Let S[1,n] be a string over an alphabet Σ. If we choose r = O(log n/(log log n)^2), the r-ary wavelet tree built on S uses nH_0(S) + O(|Σ| log n) + o(n log|Σ|) bits and supports in O(log|Σ|/log log n) time the queries S[q], Rank_c(S,q) and Select_c(S,q), for any c ∈ Σ and 1 ≤ q ≤ n. Note that if |Σ| = O(n^β), β < 1, the space occupancy simplifies to nH_0(S) + o(n log|Σ|). ⊓⊔

4 Compressed Representation of Full-Text Indexes

We will now apply the results of the previous section to build a new implementation of the FM-index concept. We first need to explain the FM-index concept in full detail, that is, the Burrows-Wheeler transform and the backward search mechanism, highlighting the time dependence on the alphabet size |Σ|. Also, the compression boosting technique [11] will be central to our solution.

Hereafter we assume that T[1,n] is the text we wish to index, compress and query. T is drawn from an alphabet Σ of size |Σ|. By T[i] we denote the i-th character of T, T[i,n] denotes the i-th text suffix, and T[1,i] denotes the i-th text prefix. A substring of T is any T[i,j]. We write |w| to denote the length of string w.

Our final goal is to answer substring queries using a compressed representation of T. That is, we wish to find out whether a given pattern P[1,p] occurs as a substring of T (existence query), how many times it occurs (counting query), and at which positions (locating query). Also, as T is compressed, we need to support the retrieval of any substring of T (context query). If the compressed representation of T supports all these queries, we say that the representation is a compressed full-text self-index.

4.1 The k-th Order Empirical Entropy

Following a well-established practice in Information Theory, we lower bound the space needed to store a string T by using the notion of empirical entropy. The empirical entropy is similar to the entropy defined in the probabilistic setting, with the difference that it is defined in terms of the character frequencies observed in T rather than in terms of character probabilities. The key property of empirical entropy is that it is defined pointwise for any string T and can be used to measure the performance of compression algorithms as a function of the string structure, thus without any assumption on the input source. In a sense, compression bounds produced in terms of empirical entropy are worst-case measures.

Just as defined for sequences, the zero-order empirical entropy of T is defined as H_0(T) = −Σ_c (n_c/n) log(n_c/n), where n_c is the number of occurrences of alphabet character c in T, and n = Σ_c n_c = |T|. To introduce the concept of k-th order empirical entropy we need to define what


is a context. A length-k context w in T is one of its substrings of length k. Given w, we denote by w_T the string formed by concatenating all the symbols following the occurrences of w in T, taken from left to right. For example, if T = mississippi then s_T = sisi and si_T = sp. The k-th order empirical entropy of T is defined as:

H_k(T) = (1/n) Σ_{|w|=k} |w_T| H_0(w_T).

Fig. 2. Example of the Burrows-Wheeler transform for the string T = mississippi. The matrix on the right has the rows sorted in lexicographic order. The output of the BWT is the last column; in this example the string ipssm#pissii.
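Both entropy notions can be computed directly from their definitions. The sketch below is a naive illustration (not the paper's data structures): H_0 from character frequencies, and H_k by collecting, for every length-k context w, the string w_T of symbols following its occurrences in T:

```python
# Direct computation of the zero-order and k-th order empirical entropies.
from collections import Counter, defaultdict
from math import log2

def H0(T):
    """H0(T) = -sum_c (n_c/n) * log2(n_c/n)."""
    n = len(T)
    return -sum((nc / n) * log2(nc / n) for nc in Counter(T).values())

def Hk(T, k):
    """H_k(T) = (1/n) * sum over length-k contexts w of |w_T| * H0(w_T)."""
    follow = defaultdict(list)
    for i in range(len(T) - k):
        follow[T[i:i + k]].append(T[i + k])  # symbol following context T[i:i+k]
    return sum(len(wT) * H0("".join(wT)) for wT in follow.values()) / len(T)
```

For T = mississippi the collected followers reproduce the example contexts of the text: the symbols following 's' spell sisi, and the symbols following 'si' spell sp. Note also that for a string of n distinct characters every context is unique, so H_k is 0 for all k ≥ 1.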

– Let C[·] denote the array of length |Σ| such that C[c] contains the total number of text characters which are alphabetically smaller than c.

– Let Occ(c,q) denote the number of occurrences of character c in the prefix T^bwt[1,q]. In our sequence terminology, Occ(c,q) = Rank_c(T^bwt, q).

– Let LF(i) = C[T^bwt[i]] + Occ(T^bwt[i], i).

LF(·) stands for Last-to-First column mapping, since the character T^bwt[i], in the last column of M_T, is located in the first column F at position LF(i). For example, in Fig. 2 we have LF(10) = C[s] + Occ(s,10) = 12; and in fact T^bwt[10] and F[LF(10)] = F[12] both correspond to the first s in the string mississippi.

The LF(·) mapping allows us to scan the text T backward. Namely, if T[k] = T^bwt[i] then T[k−1] = T^bwt[LF(i)]. For example, in Fig. 2 we have that T[3] = s is the 10th character of T^bwt, and we correctly have T[2] = T^bwt[LF(10)] = T^bwt[12] = i (see [7] for details).
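These definitions can be exercised on a toy scale. The following sketch builds the BWT matrix explicitly (quadratic work, for illustration only), computes C and Occ naively, and uses LF(·) to scan T backward, which is exactly how the BWT is inverted; as in Fig. 2, '#' is assumed to be the end-marker and the smallest character:

```python
# LF-mapping on the BWT: C[c] counts text characters smaller than c,
# Occ(c, q) = Rank_c(Tbwt, q), and LF(i) = C[Tbwt[i]] + Occ(Tbwt[i], i).
# Indices are 1-based as in the paper.

def bwt(T):
    """Naive BWT: sort all rotations, output the last column."""
    n = len(T)
    rows = sorted(T[i:] + T[:i] for i in range(n))
    return "".join(row[-1] for row in rows)

def make_C(Tbwt):
    return {c: sum(1 for x in Tbwt if x < c) for c in set(Tbwt)}

def occ(Tbwt, c, q):
    """Occurrences of c in Tbwt[1..q]."""
    return Tbwt[:q].count(c)

def LF(Tbwt, C, i):
    c = Tbwt[i - 1]
    return C[c] + occ(Tbwt, c, i)

def invert_bwt(Tbwt):
    """Recover T by scanning it backward with LF(), starting from the
    row whose last character is the end-marker '#' (i.e. T itself)."""
    C = make_C(Tbwt)
    i = Tbwt.index("#") + 1
    out = []
    for _ in range(len(Tbwt)):
        out.append(Tbwt[i - 1])
        i = LF(Tbwt, C, i)
    return "".join(reversed(out))
```

On T = mississippi#, the code reproduces the example of the text: the output of the BWT is ipssm#pissii and LF(10) = 12.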

4.3 The FM-Index

The FM-index is a self-index that allows one to efficiently search for the occurrences of an arbitrary pattern P[1,p] as a substring of the text T[1,n]. Pattern P is provided on-line, whereas the text T is given to be preprocessed in advance. The number of pattern occurrences in T is hereafter indicated with occ.

The FM-index consists, in essence, of the data structures required to compute C[·], Occ(·), and LF(·). The first is directly stored in |Σ| log n bits. To compute Occ(c,q) in constant time, the FM-index stores a compressed representation of T^bwt together with some auxiliary information. This also gives the tools to compute LF(i), provided we have access to T^bwt[i]. This is obtained in O(|Σ|) time by linearly searching for the only c ∈ Σ such that Occ(c,q) ≠ Occ(c,q−1). The two key procedures to operate on the FM-index are: the counting of the number of pattern occurrences (shortly count), and the location of their positions in the text T (shortly locate). Note that the counting process returns the value occ, whereas the location process returns occ distinct integers in the range [1,n].

Fig. 3. Algorithm count for finding the set of rows prefixed by P[1,p], and thus for counting the pattern occurrences occ = Last − First + 1. Recall that C[c] is the number of text characters which are alphabetically smaller than c, and that Occ(c,q) denotes the number of occurrences of character c in T^bwt[1,q].

Fig. 3 sketches the pseudocode of the counting operation, which works "backwards" in p phases, hence numbered from p to 1. The i-th phase preserves the following invariant: the parameter First points to the first row of the BWT matrix M_T prefixed by P[i,p], and the parameter Last


points to the last row of M_T prefixed by P[i,p]. After the final phase, P prefixes the rows between First and Last and thus, according to the properties of matrix M_T (see Section 4.2), we have occ = Last − First + 1. It is easy to see that the running time of count is dominated by the cost of the 2p computations of the values Occ(·).
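The backward-search loop just described can be sketched as follows, with a naive Occ standing in for the FM-index's constant-time structures and 1-based indices as in the paper:

```python
# Backward search (the count procedure): maintains [First, Last] as the
# range of BWT-matrix rows prefixed by P[i..p], processing P right to left.

def bwt(T):
    rows = sorted(T[i:] + T[:i] for i in range(len(T)))
    return "".join(r[-1] for r in rows)

def count(Tbwt, P):
    C = {c: sum(1 for x in Tbwt if x < c) for c in set(Tbwt)}
    occ = lambda c, q: Tbwt[:q].count(c)   # naive Occ(c, q)
    First, Last = 1, len(Tbwt)
    for c in reversed(P):
        if c not in C:                     # symbol absent from the text
            return 0
        First = C[c] + occ(c, First - 1) + 1
        Last = C[c] + occ(c, Last)
        if First > Last:                   # empty range: no occurrence
            return 0
    return Last - First + 1
```

Each of the p phases performs the two Occ computations mentioned above.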

Fig. 4. Algorithm locate for the computation of Pos(i).

Given the range (First,Last), we now consider the problem of locating the positions in T of these pattern occurrences. We notice that every row in M_T is prefixed by some suffix of T. For example, in Fig. 2 the fourth row of M_T is prefixed by the text suffix T[5,11] = issippi. Then, for i = First, First+1, ..., Last we use procedure locate(i) to find the starting position in T of the suffix that prefixes the i-th row M_T[i]. Such a position is denoted hereafter by Pos(i), and the pseudocode of locate is given in Fig. 4. The intuition underlying its functioning is simple. We scan backward the text T using the LF(·) mapping (see Section 4.2) until a marked position is met. If we mark one text position every Θ(log^{1+ε} n), for some constant ε > 0, the while loop is executed O(log^{1+ε} n) times. Since the computation of LF(i) is done via at most |Σ| computations of Occ(·), we have that locate takes O(|Σ| log^{1+ε} n) time. The space required by the marked positions is Θ(n/log^ε n) bits. Combining the observations on locate with the ones for count, we get [7]:

Theorem 3. For any string T[1,n] drawn from a constant-sized alphabet Σ, the FM-index counts the occurrences of any pattern P[1,p] within T taking O(p) time. The location of each pattern occurrence takes O(|Σ| log^{1+ε} n) time, for any constant ε > 0. The size of the FM-index is bounded by 5nH_k(T) + o(n) bits, for any fixed k. ⊓⊔

In order to retrieve the content of T[l_1,l_2], we must first find the row in M_T that corresponds

to l_2, and then issue ℓ = l_2 − l_1 + 1 backward steps in T, using the LF(·) mapping. Starting at the lowest marked text position that follows l_2, we perform O(log^{1+ε} n) steps until reaching l_2. Then, we perform ℓ additional LF-steps to collect the text characters. The resulting complexity is O((ℓ + log^{1+ε} n)|Σ|).
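Both locate and the display procedure rest on the same marked-position walk. A naive sketch, marking every s-th text position (s = 3 here instead of Θ(log^{1+ε} n)) and rebuilding the BWT matrix explicitly for illustration:

```python
# locate via marked positions: every s-th text position stores its BWT row;
# locate(i) walks LF() until a marked row is met, then adds the number of
# backward steps taken.  1-based rows and text positions.

def build(T, s=3):
    n = len(T)
    rows = sorted(range(n), key=lambda i: T[i:] + T[:i])  # row -> start (0-based)
    Tbwt = "".join(T[(i - 1) % n] for i in rows)
    C = {c: sum(1 for x in Tbwt if x < c) for c in set(Tbwt)}
    # mark text positions 1, 1+s, 1+2s, ... (position 1 marked => termination)
    marked = {r + 1: rows[r] + 1 for r in range(n) if rows[r] % s == 0}
    return Tbwt, C, marked

def LF(Tbwt, C, i):
    c = Tbwt[i - 1]
    return C[c] + Tbwt[:i].count(c)

def locate(Tbwt, C, marked, i):
    t = 0
    while i not in marked:       # each LF step moves one position back in T
        i = LF(Tbwt, C, i)
        t += 1
    return marked[i] + t
```

The same backward LF-walk, continued for ℓ further steps while collecting the characters Tbwt[i], yields the display of a length-ℓ substring.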

As we mentioned in the Introduction, the main drawback of the FM-index is that, hidden in the o(n) term of the space bound, there are constants which depend exponentially on the alphabet size |Σ|. In Section 4.5 we describe our new implementation of the FM-index concept, which takes nH_k(T) + o(n) bits and allows the computation of Occ(c,q) and T^bwt[i] in O(1) time for a reasonable range of alphabet sizes, i.e. |Σ| = O(polylog(n)).

4.4 Compression Boosting

The concept of compression boosting has been recently introduced in [6,11,14], opening the door to a new approach to data compression. The key idea is that one can take an algorithm whose performance can be bounded in terms of the zero-order entropy and obtain, via the booster, a new compressor whose performance can be bounded in terms of the k-th order entropy, simultaneously for all k. Putting it another way, one can take a compression algorithm that uses no context information at all and, via the boosting process, obtain an algorithm that automatically uses the "best possible" contexts.

For technical reasons we need a boosting theorem which is slightly different from the one in [6,11]. However, the proof of Theorem 4 is obtained by a straightforward modification of the proof of Lemma 3.5 in [11].

Theorem 4. Let A be an algorithm which compresses any string S in less than |S|H_0(S) + f(|S|) bits, where f(·) is a non-decreasing concave function. Given T[1,n] there is an O(n) time procedure that computes a partition S_1, S_2, ..., S_z of T^bwt such that, for any k ≥ 0, we have

Σ_{i=1}^{z} |A(S_i)| ≤ Σ_{i=1}^{z} (|S_i|H_0(S_i) + f(|S_i|)) ≤ nH_k(T) + |Σ|^k f(n/|Σ|^k).

Proof. For any k ≥ 0 let Ŝ_1, Ŝ_2, ..., Ŝ_m denote the partition of T^bwt such that

Σ_{i=1}^{m} |Ŝ_i|H_0(Ŝ_i) = nH_k(T).    (3)

Each Ŝ_i is a permutation of one of the strings w_T defined in Sec. 4.1 with w ∈ Σ^k (see [11, Sec. 2.2] for details). Repeating verbatim the proof of Lemma 3.5 in [11] we get that the partition S_1, ..., S_z of T^bwt produced by the boosting algorithm is such that

Σ_{i=1}^{z} (|S_i|H_0(S_i) + f(|S_i|)) ≤ Σ_{i=1}^{m} (|Ŝ_i|H_0(Ŝ_i) + f(|Ŝ_i|)).    (4)

Since by hypothesis |A(S_i)| ≤ |S_i|H_0(S_i) + f(|S_i|), from (4) and (3) we get

Σ_{i=1}^{z} (|S_i|H_0(S_i) + f(|S_i|)) ≤ Σ_{i=1}^{m} |Ŝ_i|H_0(Ŝ_i) + Σ_{i=1}^{m} f(|Ŝ_i|) = nH_k(T) + Σ_{i=1}^{m} f(|Ŝ_i|) ≤ nH_k(T) + |Σ|^k f(n/|Σ|^k),

where the last inequality follows from the concavity of f(·) and the fact that m ≤ |Σ|^k. ⊓⊔

To understand the relevance of this result, suppose that we want to compress T[1,n] and that we wish to exploit the zero-order compressor A. Using the boosting technique we can first compute the partition S_1, S_2, ..., S_z of T^bwt, and then compress each S_i using A. By the above theorem, the overall space occupancy would be bounded by Σ_i |A(S_i)| ≤ nH_k(T) + |Σ|^k f(n/|Σ|^k). Note that the process is reversible, because the decompression of each S_i retrieves T^bwt, and from T^bwt we can retrieve T using the inverse BWT. Summing up, the booster allows us to compress T up to its k-th order entropy using only the zero-order compressor A. Note that the parameter k is known neither to A nor to the booster; it comes into play only in the space complexity analysis. This means that the space bound in Theorem 4 holds simultaneously for all k ≥ 0. The only information that is required by the booster is the function f(·) such that |S|H_0(S) + f(|S|) is an upper bound on the size of the output produced by A on input S.
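Eq. (3), on which the analysis rests, can be checked numerically on a toy text: partitioning T^bwt by the length-k context that prefixes each row of the BWT matrix yields pieces whose zero-order entropies add up to nH_k(T). The sketch below rebuilds the matrix explicitly and is for illustration only:

```python
# Numerical check of Eq. (3): context-based partition of Tbwt vs. n*H_k(T).
from collections import Counter, defaultdict
from math import log2

def H0(S):
    n = len(S)
    return -sum((c / n) * log2(c / n) for c in Counter(S).values()) if S else 0.0

def Hk(T, k):
    follow = defaultdict(list)
    for i in range(len(T) - k):
        follow[T[i:i + k]].append(T[i + k])
    return sum(len(w) * H0("".join(w)) for w in follow.values()) / len(T)

def context_partition(T, k):
    """Group the BWT characters by the length-k prefix of their row."""
    n = len(T)
    rows = sorted(range(n), key=lambda i: T[i:] + T[:i])
    pieces = defaultdict(list)
    for r in rows:  # BWT char of row starting at r is T[r-1], grouped by k-prefix
        pieces[(T[r:] + T[:r])[:k]].append(T[(r - 1) % n])
    return ["".join(p) for p in pieces.values()]
```

On T = mississippi# with k = 1, the five pieces (one per distinct character) satisfy Σ_i |S_i|H_0(S_i) = nH_1(T).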


4.5 An Improved FM-Index Implementation

We now show how to combine the tools described in the previous sections to obtain an FM-index implementation with query time independent of the alphabet size. The crucial observation is the following. To build the FM-index we need to solve two problems: (a) compress T^bwt up to H_k(T), and (b) support the efficient computation of Occ(c,q) and T^bwt[q]. We use the boosting technique to transform problem (a) into the problem of compressing the strings S_1, S_2, ..., S_z up to their zero-order entropy, and use the generalized wavelet tree to create a compressed (up to H_0) and indexable representation of each S_i, thus solving problems (a) and (b) simultaneously.

Fig. 5. Construction of our improved FM-index.

The details of the construction are given in Fig. 5; some comments follow. To compute T^bwt[q], we first determine the substring S_y containing the q-th character of T^bwt by computing y = Rank_1(B,q). Then we exploit the generalized wavelet tree T_y to determine T^bwt[q]. By Theorem 1 the former step takes O(1) time, and by Theorem 2 the latter step takes O(1) time as well.

To compute Occ(c,q), we initially determine the substring S_y of T^bwt where the matrix row q occurs in, y = Rank_1(B,q). Then we find the relative position of q within S_y by calculating q′ = q − Σ_{j=1}^{y−1} |S_j| = q − Select_1(B,y). Finally, we exploit the generalized wavelet tree T_y and use the array C_y[c] to compute Occ(c,q) = Occ_{S_y}(c,q′) + C_y[c] = Rank_c(S_y,q′) + C_y[c]. Again, by Theorems 1 and 2 this computation takes overall O(1) time. Using this technique inside algorithms count and locate of Section 4.3, we immediately obtain O(p) time to count the occurrences of a pattern P[1,p] and O(log^{1+ε} n) time to retrieve the position of each occurrence. The time to display a substring of length ℓ is O(ℓ) in addition to the locate time.
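The mapping from a global position q in T^bwt to a piece S_y and a relative offset q′ can be sketched with a naive bitvector B (a Python list here, with 1s marking the first position of each piece; under this convention the offset comes out as q − Select_1(B,y) + 1, while the paper's formula assumes a slightly different placement of the 1s; the real structure answers Rank/Select in constant time):

```python
# Global-to-local position mapping over the concatenation of pieces
# S_1 S_2 ... S_z, driven by a bitvector B marking piece boundaries.

def build_B(pieces):
    B = []
    for S in pieces:
        B.extend([1] + [0] * (len(S) - 1))  # 1 marks the start of each piece
    return B

def rank1(B, q):
    """Number of 1s in B[1..q] (1-based)."""
    return sum(B[:q])

def select1(B, y):
    """Position (1-based) of the y-th 1 in B."""
    seen = 0
    for i, b in enumerate(B, start=1):
        seen += b
        if seen == y:
            return i
    raise ValueError("fewer than y ones in B")

def piece_of(B, q):
    """Return (y, q') such that global position q is position q' of S_y."""
    y = rank1(B, q)
    qp = q - select1(B, y) + 1
    return y, qp
```

The pair (y, q′) is then fed to the wavelet tree T_y and the array C_y, as described above.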

We now analyze the space occupancy of our data structure. Let us call a sequence S_i long if |Σ_i| = O(polylog(|S_i|)), where Σ_i is the alphabet of S_i. Otherwise S_i is short.

Let us first assume that all sequences are long. This assumption and Theorem 2 allow us to conclude that a generalized wavelet tree T_i built on S_i uses |S_i|H_0(S_i) + O(|S_i|/log^ε|S_i|) bits, for any 0 < ε < 1. By Theorem 1, the storage of B takes log(n choose z) + O((n log log n)/log n) ≤ z log n + O((n log log n)/log n) bits. Each array C_i takes |Σ| log n bits. Consequently, under the hypothesis that |Σ_i| = O(polylog(|S_i|)) for all sequences S_i, the total space occupancy is bounded by

Σ_{i=1}^{z} (|S_i|H_0(S_i) + K|S_i|/log^ε|S_i| + (1+|Σ|) log n) + O((n log log n)/log n).

Note that the function f(t) used at Step 1 of our construction (see Fig. 5) matches exactly the overhead with respect to H_0 that we have for each S_i. From Theorem 4 we get that for any k ≥ 0 we can bound the above summation by

nH_k(T) + O(n/log^ε(n/|Σ|^k)) + O(|Σ|^{k+1} log n) + O((n log log n)/log n).    (5)

Recall that we are interested in bounding the space occupancy in terms of H_k(T) only for k ≤ α log_|Σ| n for some α < 1. In this case we have |Σ|^k ≤ n^α. By observing that O((n log log n)/log n) = O(n/log^ε n) we turn (5) into

nH_k(T) + O(n/log^ε n).    (6)

We complete the analysis of the space occupancy by considering the case of short sequences, that is, where |Σ_i| = ω(polylog(|S_i|)) for some sequences S_i. The first part of Theorem 2 implies that we can always choose r = |Σ_i|^{1/κ} such that all the query times are the constant O(log_r|Σ_i|) = O(κ). The extra space, however, can only be bounded by o(|Σ_i||S_i|). Since ω(polylog(|S_i|)) = |Σ_i| ≤ |Σ| = O(polylog(n)), we must have |S_i| = o(n^β) for any β > 0. Thus, the extra space for a short sequence S_i is o(|Σ||S_i|) = o(n^β polylog(n)).

Recalling again that we are interested in bounding the space occupancy in terms of H_k(T) only for k ≤ α log_|Σ| n and α < 1, we have that the overall space overhead because of the (at most |Σ|^k ≤ n^α) short sequences S_i is o(n^{α+β} polylog(n)) bits for any β > 0. If we take β < 1−α, the space bound becomes O(n/log^ε n) for any desired 0 < ε < 1. Therefore, we can correctly apply Theorem 4, since the extra space to represent short sequences can be considered to be the one corresponding to a long sequence plus smaller terms that are left in the sublinear term that accompanies nH_k(T).

We note that we have inherited from (6) a sublinear space cost of the form O(n/log^ε n), for any 0 < ε < 1. Also, from Theorem 3, we carry another term of the form O(n/log^ε n), for any ε > 0. We then achieve the following result:

Theorem 5. Let T[1,n] be a string over an alphabet Σ, where |Σ| = O(polylog(n)). The data structure of Fig. 5 indexes T[1,n] within nH_k(T) + O(n/log^ε n) bits, for any k ≤ α log_|Σ| n and 0 < α, ε < 1. It can count the number of occurrences of any string P[1,p] in T in O(p) time, locate each occurrence in O(log^{1+ε} n) time, and display any text substring of length ℓ in O(ℓ + log^{1+ε} n) time. ⊓⊔

Large Alphabets. Suppose now that the alphabet is larger than O(polylog(n)), in particular |Σ| = O(n^β) with β < 1. Corollary 1 shows that we can obtain O(log|Σ|/log log n) query time on the sequences S_i, using space |S_i|H_0(S_i) + O(|Σ| log|S_i|) + o(|S_i| log|Σ|) (more precisely, the latter term is O(|S_i| log|Σ|/log log|S_i|) in Corollary 1). It is easy to repeat the analysis and obtain that, instead of (5), the bound on the size of our index becomes:

nH_k(T) + O(n log|Σ|/log log(n/|Σ|^k)) + O(|Σ|^{k+1} log n) + O((n log log n)/log n).

Since, for any k ≤ α(log_|Σ| n) − 1, with 0 < α < 1, the above bound is nH_k(T) + o(n log|Σ|), we can state the following theorem.


Theorem 6. Let T[1,n] be a string over an alphabet Σ, with |Σ| = O(n^β) and β < 1. It is possible to index T[1,n] within nH_k(T) + o(n log|Σ|) bits, for any k ≤ α(log_|Σ| n) − 1 and 0 < α < 1. This index can count the number of occurrences of any string P[1,p] in T in O(p log|Σ|/log log n) time, locate each occurrence in O(log|Σ| log^{1+ε} n/log log n) time, and display any text substring of length ℓ in O((ℓ + log^{1+ε} n) log|Σ|/log log n) time. ⊓⊔

As an alternative to Theorem 6, we can handle large alphabets by decreasing the arity r of

our generalized wavelet tree. This reduces the space occupancy of our index at the cost of an increased query time. Here we only discuss the extreme case r = 2, in which we use the traditional binary wavelet tree instead of our sequence representation. Using binary wavelet trees we can represent each S_i in |S_i|H_0(S_i) + O(log|Σ| (|S_i| log log|S_i|)/log|S_i|) bits of space. Combining this representation with the compression booster we get an index of size bounded by nH_k(T) + O(log|Σ| (n log log n/log n)) for any k ≤ α log_|Σ| n and α < 1. Notice that this holds for any alphabet size such that |Σ| = o(n/log n). Since querying a binary wavelet tree takes O(log|Σ|) time, our simplified index can count the number of occurrences in O(p log|Σ|) time, locate each occurrence in O(log|Σ| (log^2 n/log log n)) time, and display any text substring of length ℓ in O(log|Σ| (ℓ + log^2 n)/log log n) time. The details of the analysis of the above simplified index can be found in [12].

5 Conclusions

The contribution of this paper is twofold. First, we have presented a compressed representation of sequences S[1,n] that requires nH_0(S) + o(n) bits of space and is able to retrieve individual symbols S[q] and answer rank and select queries in constant time. The technique works whenever the alphabet size of the sequence satisfies the condition |Σ| = O(polylog(n)). This is a non-trivial generalization of previous results on binary sequences [36] and an improvement of previous results on general sequences [17].

Secondly, we have combined the above result with an existing compression boosting technique to obtain a compressed full-text index for a text T[1,n] which uses nH_k(T) + o(n) bits whenever the alphabet size is O(polylog(n)). Our index has the smallest known space occupancy, is a self-index, is faster than other indexes of the same size, and is the first compressed index with query time independent of the alphabet size. We have also shown that on larger alphabets we can still improve the best existing results.

There are several future challenges on compressed full-text indexes: (i) obtaining better results when the alphabet size is not O(polylog(n)); (ii) removing the limit on the maximum entropy order k that can be achieved, which is currently k ≤ α log_|Σ| n for any 0 < α < 1; (iii) achieving the optimal query times within nH_k(T) + o(n) space, that is, O(p/log_|Σ| n) for counting and O(1) for locating, as opposed to our O(p) and O(log^{1+ε} n) (this has been partially achieved [19,10] in some cases); (iv) coping with updates to the text and secondary memory issues (see e.g. [7,2,21]); (v) handling more sophisticated searching, such as approximate and regular expression searching (see e.g. [23]).

It should be clear, however, that some limits cannot be surpassed. For example, one cannot achieve a nH_k(T) + o(n) space bound without any restriction on |Σ| or k. To see this, consider the extreme case in which |Σ| = n, that is, the input string consists of a permutation of n distinct characters. In this case we have H_k(T) = 0 for all k ≥ 1, and any representation of such a string must require Ω(n log n) bits. Hence a self-index of size nH_k(T) + o(n) bits cannot exist. Understanding

which are the limits and the space-time tradeoffs that can be achieved on compressible strings (cfr. [5]) is an extremely interesting open problem.

References

1. M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
2. H. Chan, W. Hon, and T. Lam. Compressed index for a dynamic collection of texts. In Proc. of the 15th Symposium on Combinatorial Pattern Matching (CPM '04), pages 445–456. Springer-Verlag LNCS n. 3109, 2004.
3. D. Clark. Compact Pat Trees. PhD thesis, University of Waterloo, 1996.
4. M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, 1994.
5. E. D. Demaine and A. López-Ortiz. A linear lower bound on index size for text retrieval. Journal of Algorithms, 48(1):2–15, 2003.
6. P. Ferragina, R. Giancarlo, G. Manzini, and M. Sciortino. Boosting textual compression in optimal linear time. Journal of the ACM. To appear.
7. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proc. of the 41st IEEE Symposium on Foundations of Computer Science, pages 390–398, 2000.
8. P. Ferragina and G. Manzini. An experimental study of a compressed index. Information Sciences: special issue on "Dictionary Based Compression", 135:13–28, 2001.
9. P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Proc. 12th ACM-SIAM Symposium on Discrete Algorithms, pages 269–278, 2001.
10. P. Ferragina and G. Manzini. On compressing and indexing data. Technical Report TR-02-01, Dipartimento di Informatica, University of Pisa, 2002.
11. P. Ferragina and G. Manzini. Compression boosting in optimal linear time using the Burrows-Wheeler transform. In Proc. 15th ACM-SIAM Symposium on Discrete Algorithms (SODA '04), pages 648–656, 2004.
12. P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro. An alphabet-friendly FM-index. In Proc. 11th International Symposium on String Processing and Information Retrieval (SPIRE '04), pages 150–160. Springer-Verlag LNCS n. 3246, 2004.
13. R. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. In Proc. of the 15th Symposium on Combinatorial Pattern Matching (CPM '04), pages 159–172. Springer-Verlag LNCS n. 3109, 2004.
14. R. Giancarlo and M. Sciortino. Optimal partitions of strings: A new class of Burrows-Wheeler compression algorithms. In Combinatorial Pattern Matching Conference (CPM '03), pages 129–143, 2003.
15. G. H. Gonnet, R. A. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In B. Frakes and R. A. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 5, pages 66–82. Prentice-Hall, 1992.
16. Sz. Grabowski, V. Mäkinen, and G. Navarro. First Huffman, then Burrows-Wheeler: an alphabet-independent FM-index. In Proc. 11th International Symposium on String Processing and Information Retrieval (SPIRE '04), pages 210–211. Springer-Verlag LNCS n. 3246, 2004.
17. R. Grossi, A. Gupta, and J. Vitter. High-order entropy-compressed text indexes. In Proc. 14th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA '03), pages 841–850, 2003.
18. R. Grossi, A. Gupta, and J. Vitter. When indexing equals compression: Experiments on compressing suffix arrays and applications. In Proc. 15th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA '04), pages 636–645, 2004.
19. R. Grossi and J. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In Proc. of the 32nd ACM Symposium on Theory of Computing, pages 397–406, 2000.
20. J. Healy, E. E. Thomas, J. T. Schwartz, and M. Wigler. Annotating large genomes with exact word matches. Genome Research, 13:2306–2315, 2003.
21. W. Hon, T. Lam, K. Sadakane, W. Sung, and S. Yiu. Compressed index for dynamic text. In DCC: Data Compression Conference, pages 102–111. IEEE Computer Society TCC, 2004.
22. W. Hon, T. Lam, W. Sung, W. Tse, C. Wong, and S. Yiu. Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences. In Proc. 6th Workshop on Algorithm Engineering and Experiments (ALENEX '04), pages 31–38. SIAM Press, 2004.

立春节气的特点是什么_二十四节气立春特点

立春节气的特点是什么_二十四节气立春特点 二十四节气,每一个节气都代表着季节气候的转变。立春是春天的头一个节气,那么大家知道立春节气会有哪些特征变化吗?下面是小编给大家整理的有关立春节气的特点,欢迎大家来参阅。 立春的节气特点 每年2月4日前后,太阳到达黄经315度时交立春节气。“立春”是预示季节转换的节气。这个节气,早在战国末期(约公元前239年)就确定了。 “立春”是二十四节气中的第一个节气。在古时,古人认为,春季开始叫“立春”,夏季开始叫“立夏”,秋季开始叫“立秋”,冬季开始叫“立冬”。其农事意义是“春种、夏长、秋收、冬藏”。 春夏秋冬,以春为首。春季六节气,“立春”居先,标志着春天的到来、四季的开始,这是古时的说法。 立春物候 立春第一候是“东风解冻”,东风送暖,化解了大地冰封已久的寒冻。第二候是“蛰虫始振”,“蛰”是隐藏的意思,指潜伏在地下的小虫都自冬眠中苏醒过来。第三候是“鱼陟负冰”,陟是上升的意思,鱼儿囚为水温渐暖,所以竞相浮游到水面,但水中仍有未溶解的碎冰。在岸上观看,就如同鱼儿背负着冰块在水中游动。 立春气候

从现代气候学标准看,以“立春”作为春季的开始,不能和当时全国各地的自然物 候现象(即春季应有的景象)吻合。如2月初“立春”后,我国华南已经花红柳绿了,然而 华北大地仍会大雪纷飞。现在比较科学的划分,是把候(5天为一候)平均气温在10摄 氏度以上的始日,作为春季开始。 相连的广州起步北上,开始旅程,2月中旬先是到了昆明、下旬则越过云贵高原 进人四川盆地,3月中旬到达武汉三镇、下旬越过郑州和济南,4月上旬抵达京津地区、中旬跨过山海关来到塞外沈阳城、下旬经由长春到达哈尔滨,5月20日前后,才能光 临我国最北的地方—漠河。另外,由于我国幅员辽阔,地势复杂,不但春临大地的时 间有早有迟,春姑娘在各地停留的时间也有长短之别,如西北地区土壤干燥,春季升 温迅速,整个春季仅40多天,可谓来去匆匆;而云南昆明等地则四季如春,被誉为“春城”。 虽然“立春”节气后,不能说春天已到,但确实预示着春天就要来了,农业生产大 忙季节即将来临。这不,一些农谚即是很好的佐证。如:“立春一日,水热三分”、“春到 三分暖”……以安徽省为例,“立春”节气以后,虽处于冬季之尾,但有时气温仍较低, 还会发生低温冻害。如安徽省极端最低气温的极值,就出现在1969年2月6日的固 镇县,为一24.3‘C。因此,仍应加强冬作物,尤其是温室大棚的保温防寒工作。不过,有时冬季偏暖年份,南方湿热空气势力强于北方冷空气,在“立春” 立春时节 立春一到,人们明显地感觉到白昼长了,太阳暖了,气温增高了,日照延长了, 降雨也开始了。农谚提醒人们“立春雨水到,早起晚睡觉”。立春时竹北方备耕也开始了。 “一年之计在干春”,在中国是流传最为广泛的格言式谚语之一。在《增广贤文》中,它一共是四句:“一年之计在于春,一日之计在于晨,一家之计在于和,一生之计 在于勤。”其中前两句,强调的都是时间的紧迫性和重要性。 阳春之季,万象更新,对于农事耕作,这是个关键时刻。对于一个人来说,青年 时期也是人生的春天。青春的创造力是无穷的,爱惜青春,珍视青春,就能创造出知识,创造出财富;反之,浪费青春,虚度年华,除了惭愧之外,将一无所得。“一年之 计在于春”,强调的是要在一年开始时多做并做好工作,为全年的工作打下坚实的基础。 节气特征现象

二十四节气解读

二十四节气解读 立春: 立春,二十四节气[1]之一,“立”是“开始”的意思,立春就是春季的开始,每年2月4日或5日太阳到达黄经315度时为立春。《月令七十二候集解》:“正月节,立,建始也……立夏秋冬同。”古代“四立”,指春、夏、秋、冬四季开始,其农业意义为“春种、夏长、秋收、冬藏”,概括了黄河中下游农业生产与气候关系的全过程。 雨水: 雨水,表示降水开始,雨量逐步增多。每年2月18日前后,太阳到达黄经330度,为“雨水”节气。雨水,表示两层意思,一是天气回暖,降水量逐渐增多了,二是在降水形式上,雪渐少了,雨渐多了。《月令七十二候集解》中说:“正月中,天一生水。春始属木,然生木者必水也,故立春后继之雨水。且东风既解冻,则散而为雨矣。惊蛰: 惊蛰——春雷乍动,惊醒了蛰伏在土壤中冬眠的动物。这时气温回升较快,渐有春雷萌动。每年公历的3月6日左右为惊蛰。二十四节气之一。蛰是藏的意思。“惊蛰”是指钻到泥土里越冬的小动物被雷震苏醒出来活动。“惊蛰”节气日,地球已经达到太阳黄经345度,一般在每年3月4日~7日。 春分: 春分,古时又称为“日中”、“日夜分”、“仲春之月”,在每年的3月21日前后(20日~22日)交节,农历日期不固定,这时太阳到达黄经0°。据《月令七十二候集解》:“二月中,分者半也,此当九十日之半,故谓之分。”另《春秋繁露·阴阳出入上下篇》说:“春分者,阴阳相半也,故昼夜均而寒暑平。”有《明史·历一》说:“分者,黄赤相交之点,太阳行至此,乃昼夜平分。”所以,春分的意义,一是指一天时间白天黑夜平分,各为12小时;二是古时以立春至立夏为春季,春分正当春季三个月之中,平分了春季。 清明: 清明的意思是清淡明智。“清明”是夏历二十四节气之一,中国广大地区有在清明之日进行祭祖、扫墓、踏青的习俗,逐渐演变为华人以扫墓、祭拜等形式纪念祖先的一个中国传统节日。另外还有很多以“清明”为题的诗歌,其中最著名的是唐代诗人杜牧的七绝《清明》。 谷雨: 谷雨是二十四节气之一。谷雨指雨水增多,大大有利于谷类农作物的生长。每年4月19日~21日视太阳到达黄经30°时为谷雨。

二十四节气之立春养生篇

如对您有帮助,可购买打赏,谢谢 生活常识分享二十四节气之立春养生篇 导语:二月四日是立春。立春是一年中的第一个节气,“立”开始之意,立春揭开了春天的序幕,表示万物复苏的春季的开始。此刻“嫩如金色软如丝” 二月四日是立春。立春是一年中的第一个节气,“立”开始之意,立春揭开了春天的序幕,表示万物复苏的春季的开始。此刻“嫩如金色软如丝”的垂柳芽苞,泥土中跃跃而试的小草,正等待着“春风吹又生”,而“律回岁晚冰霜少,春到人间草木知”,形象地反映出立春时节的自然特色。随着立春的到来,人们明显地感觉到白天渐长,太阳也暖和多了,气温、日照、降水也趋于上升和增多。人们按旧历习俗开始“迎春”,我国的台湾还将立春这一天定为“农民节”这是冬三月农闲后的最后一天休息。农谚说得好:立春雨水到,早起晚睡觉。农事活动由此开始,这时人们也走出门户踏青寻春,体会那最细微的最神妙的春意。春季养生要顺应春天阳气生发,万物始生的特点,注意保护阳气,着眼于一个“生”字。按自然界属性,春属木,与肝相应。(这是五行学说,以五行特性来说明五脏的生理活动特点,如肝喜调达,有疏泄的功能,木有生发的特性,故以肝属“木”)肝的生理特点主疏泄,在志为怒,恶抑郁而喜调达。在春季精神养生方面,要力戒暴怒,更忌情怀忧郁,做到心胸开阔,乐观向上,保持心境恬愉的好心态。同时要充分利用、珍惜春季大自然“发陈”之时,借阳气上升,万物萌生,人体新陈代谢旺盛之机,通过适当的调摄,使春阳之气得以宣达,代谢机能得以正常运行。春季气候变化较大,天气乍寒乍暖,由于人体腠理开始变得疏松,对寒邪的抵抗能力有所减弱,所以,初春时节特别是生活在北方地区的人不宜顿去棉服,年老体弱者换装尤宜审慎,不可骤减。《千金要方》主张春时衣着宜“下厚上薄”,《老老恒言》亦云:“春冻半泮,下

二十四节气——立春

立春 简介 立春,二十四节气之一,又名立春节、正月节、岁节、岁旦等。立,是“开始”之意;春,代表着温暖、生长。上古“斗柄指向”法,以北斗星斗柄指向寅位时为立春。现行的“定气法”划分节气,以太阳到达黄经315°时为立春。干支纪元,以立春为岁首,立春意味着新的一个轮回已开启,乃万物起始、一切更生之义也。秦汉以前,礼俗所重的不是阴历一月初一,而是立春日。重大的拜神祭祖、纳福祈年、驱邪攘灾、除旧布新、迎春和农耕庆典等均安排在立春日及其前后几天举行,这一系列的节庆活动不仅构成了后世岁首节庆的框架,而且它的民俗功能也一直遗存至今。 立春作为“二十四节气”之一,与立夏、立秋、立冬一样,反映着一年四季的更替。立春,意味着万物闭藏的冬季已过去,开始进入风和日暖、万物生长的春季。由于我国幅员辽阔,各地的气候相差悬殊,“立”的具体气候意义并不适用于全国各地,所对应的地域是位于北回归线上的岭南地区。冬春的分界线,在广西桂林到江西赣州一线,当立春时(斗指寅,太阳黄经达315°),那一线以南地区,已有春的气息了;但我国93%的陆地面积上都还是冬,到黑龙江,往往是在谷雨或立夏时才入春。“立”对于很多地区来讲只是一种参考意义。 天象规律 立春

黄赤交角及其附近一带的气候、物候亦在渐变,因此成为上古时代人们判断季节、节气变化的依据,即所谓“斗柄指东,天下皆春;斗柄指南,天下皆夏;斗柄指西,天下皆秋;斗柄指北,天下皆冬”的星象规律。古老天文学称北斗七星斗柄所指为“建”,“建”代表斗柄顶端的指向。“十二月建”和“二十四节气”是干支历的基本内容,干支历法将一“岁”(摄提)划分为十二辰(“十二月建”),斗柄旋转依次指向“十二辰”(即十二地支“寅卯辰……子丑”)。以北斗星斗柄所指方位作为确定月份的标准,称为斗建(亦称“月建”),分别为:正月建寅、二月建卯……十一月建子、十二月建丑。斗柄从正东偏北(寅位,后天八卦艮位)开始,绕东、南、西、北旋转一圈,岁末十二月指丑方,次岁正月又复还寅位,斗柄旋转一圈为一周期,谓之一“岁”,循环往复。干支纪元以天象“斗柄回寅”为立春(岁首),亦即“春正”(正月),从立春到下一个立春为一“岁”。立春,意味着新的一个轮回已开启。 地球公转和二十四节气示意图 现行的“定气法”节气也是依照的是精密的天文计算得出。“定气法”划分的“二十四节气”(自1645年起沿用至今),每一个节气分别相应于太阳在黄道上每运行15°所到达的一定位置,反映了太阳对地球产生的影响。“定气法”是根据太阳在回归黄道上的位置确定节气的方法,即在一个为360度圆周的“黄道”(一年当中太阳在天球上的视路径)上,划分为24等份,每15°为1等份,每1等份为一个节气,合起来正好是廿四节气。其是以春分点为0度起点(排序上仍把立春列为首位),按黄经度数编排,也就是视太阳从黄经0度出发(此刻太阳垂直照射在赤道上),每前进15度为一个节气,运行一周又回到春分点,为一回归年。黄道圆周360度,太阳在黄道上每运行15度为一个“节气”,每“节气”的“度数”均等、“时间”不均等。廿四个节气是24个时间点,“点”具体落在哪天,是

二十四节气之首立春日记7篇

立春与立夏、立秋、立冬一样,反映着季节的更替,立春标示着万物闭藏的冬季已过去,开始进入风和日暖、万物生长的春季。那么关于二十四节气之首立春日记要怎么写呢?了解相关精彩内容请参考为大家精心准备的文章:二十四节气之首立春日记 二十四节气之首立春日记1 立春了,可天气还是很冷,我就问妈妈:”立春了,为什么天气还是这么冷呀?“妈妈温和地说:”立春就是春天刚刚来到的意思。“”哦,我明白了,那咱们一起去寻找春天的足迹吧!“ 我们来到小河边,发现冰渐渐融化了,河水也开始流动了。瞧!小金鱼也开始在水中游来游去,好像在迎接春天的到来呢!我们来到草地上,一片枯黄的草丛中,冒出了一缕缕青草,微风吹过,小草对我点头,仿佛对我说:”小朋友们我又回来了。“ 我高兴地大喊:”春姑娘正在慢慢向我们走来,春天来了,春姑娘来了!“ 二十四节气之首立春日记2 今天是2月4日(农历是二月廿一),姥姥告诉我今天立春啦! 俗话说:”春打六九头“,明天就是六九第一天,立春标志着寒冷的冬天已经结束,阳光明媚的春天就要来到啦!一天之计在于晨,一年之计在于春,春天是四季的开始,也是我新的一年学习计划的开端,我一定在新的学年有个好的开端,以报答我的爸爸、妈妈期望和老师的辛勤培育! 我喜欢春天! 二十四节气之首立春日记3 立春了!一年之计在于春,不是吗?当然是,这是不争的事实。 今天立春啦,吃春饼了吗?我是没吃上。家里没有人烙啊!我?必然不会啊! 今年有两个春,只是听说,其实我并不明白是什么意思。跟着知道就好了,估计跟我也没什么本质联系。 立春了嘛,所以天气真得该回暖了。 立春了嘛,所以我该打算打算今后的过活了。 立春了嘛,所以…… ”春打六九头“,什么意思?不懂耶!好高深! 哦哦哦,刚刚在前一秒知道了:立春是六九的第一天耶!真长知识! 二十四节气之首立春日记4

与立春有关的谚语_二十四节气立春谚语大全

与立春有关的谚语_二十四节气立春谚语大全 立春是中国民间重要的传统节日之一。“立”是“开始”的意思,自秦代以来,中国就一直以立春作为孟春时节的开始。关于立春的谚语有哪些呢?以下是小编为大家准备的二十四节气立春谚语,欢迎大家前来参阅。 二十四节气立春谚语大全 立春一日,百草回芽。 立春一日。水暖三分。 立春夭气晴,百物好收成。 立春一年端。种地早盘算。 立春晴,一春晴。 立春下,一春下。 立春阴,花倒春。 立春晴,雨水多。 立春雨到清明,一日落雨一日晴。 立春不晴,还要冷一月零。 立春下雨是反春(指春后有冷雨)。立眷无雨是丰年。 人误地一天,地误人一年。 打春冻人不冻水。

吃了立春饭,一天暖一天。 (1)立春落雨至清明。 寓意:立春那一天如果下雨,预示直到清明前都会多雨。 (2)立春打雷,十处猪栏九处空。 离意:立春开始响雷,六畜不安。 (3)正月展春流。 窝意:立春以后,潮汐的海流会加大。 (4)春天后母面。 窝意:入春以后,天气就像后母的脸色,阴晴冷暖无常。 (5)春雾曝死鬼,夏雾做大水。 寓意:春天降雾说明天会放晴,而夏天降雾则会雨涝成灾。 (6)立春赶春气。 寓意:立春之后万象回春,稻田、池塘等水面开始蒸发,明示世人春天已降临。 (7)初一落初二散,初三落月半。 寓意:初一如果下雨,初二就会放晴;初三如果下雨则可能会一直下到十五。 (8)雨浇上元灯,日晒清明日。 离意:上元日(阴历正月初一)下雨,清明定会是晴天。 (9)早春晚播田。

寓意:立春日如在上年十二月内意为早春,如果要播种不要过早也不要过迟,要按季节行事。 (10)立春天气晴,百事好收成。 寓意:如果立春这一天天气晴朗,一定会有一个好收成。 (11)立春晴,雨水均。 寓意:如果立春这一天天气晴朗,以后便会风调雨顺。 (12)立春晴一日,耕田不费力。 离意:立春是睛天,说明以后的天气风雨相宜,适合耕田。 (13)立春之日雨淋淋,阴阴湿湿到清明。 寓意:如果立春这一天下雨,很容易断断续续下到清明。 (14)雨琳春牛头,七七四十九天愁。 寓意:立春之日下雨,将持续很多天,影响种庄稼。 (15)吃了立春饭,一天暖一天。 寓意:立春以后,天气会逐渐援和起来。 (16)立春打了霜,当春会烂秧。 离意:如果立春降霜,这一年的整个春天农作物都长不好。 (17}雷打立春节,惊蛰雨不歇。 寓意:立春开始打雷,惊蛰时会连续下雨。 (18)腊月立春春水早,正月立春春水迟。 寓意:如果在腊月里立春,雨水会来得早;如果是在正月里立春,雨

二十四节气立春的优秀作文5篇

二十四节气立春的优秀作文5篇 二十四节气立春的优秀作文1 春季是一年中的第一个季节,也是人们最喜爱的季节。今天x月x日是立春。 春天的天气是阳光明媚的,是晴空万里的,是万物复苏的季节。在春姑娘来临之时,我们出去走走是很有利于身体健康的。我们步入公园,呼吸春姑娘的气息,踏着春姑娘的脚步走进大自然去欣赏春天的美景。 冬爷爷拖着沉重的脚步缓缓地走了,春姑娘迈着轻快的步伐来了。小溪里的冰块在溶化,草地上的雪也都消失的无影无踪。花瓣上的雪花溶化在了花蕊里。屋顶上、树枝上、就连屋檐边上挂着的晶莹剔透发出彩虹色彩的冰柱也在滴水……许多小动物们也都睡醒了,开始寻找自己的食物了。 春天的花朵美极了!花儿们都争奇斗艳地开放着。在春天里漫天花香、花瓣飘飞,就连春风也是慢慢得温柔起来了,太阳从东方跳了出来。春姑娘躺在青藏高原绿油油的绿草地上,绿草地就像是一条硕大的绿毯子,春姑娘望着蓝晶晶的天空看着叽叽喳喳的小鸟,晒着太阳,享受着日光浴,别提有多快乐了!

春季的天空是很晴朗的,在碧蓝的天空上有一朵朵雪白雪白的云,飘来飘去。在那雪白雪白的云旁边一只只金黄金黄的小鸟欢快的飞翔。 温暖的春风吹得大地化了残雪,吹绿了树枝,吹蓝了天空,吹得大地泛起了绿意。阳光灿烂的春天,到处是一片欣欣向荣、生机勃勃的景色,真是桃红柳绿,万紫千红。 立春,是一个节气,是一个号令。从这天开始,会有杨柳春风婉转,会有春雨湿了燕子的羽毛,会有桃花红杏花粉梨花白艳丽了天边的流云,会有文字掉落在溪流里化成了蝌蚪,会有蜜蜂在梦里嘤嘤嗡嗡,香甜了一个又一个夜晚······ 春姑娘是春天的象征。我觉得春天非常美丽。 二十四节气立春的优秀作文2 立春是一个节气,是一个号令,是一个转折,从这天开始,风柔了,天也暖了,隐隐约约,春的味道,甚浓了。 我来到屋外,试着去发现春的气息,春的足迹。 看,园内的青草儿从大地黝黑的胸膛上探出了脑袋,懵懂无知地张望蓝天,经受着还略带寒意的东风,摇动着……它们将弱不禁风的野花苞簇拥着,在风中,小花骨朵儿争先恐后地睁大好奇的眼睛,你们这是在和我一起探寻春的气息吗?

二十四节气之立春教学设计

立春 一、教材分析 立春,农历二十四节气中的第一个节气。立春是从天文上来划分的,即太阳到达黄经315°时。立春是中国民间重要的传统节日之一。“立”是“开始”的意思,自秦代以来,中国就一直以立春作为孟春时节的开始。所谓“一年之计在于春”,春是温暖,鸟语花香;春是生长,耕耘播种。立春之日迎春已有三千多年历史,中国自官方到民间都极为重视。立春时,天子亲率三公九卿、诸侯大夫去东郊迎春,祈求丰收。回来之后,要赏赐群臣,布德令以施惠兆民。这种活动影响到庶民,使之成为后来世世代代的全民的迎春活动。教材通过听关于“咬春”的故事使学生了解民间立春的习俗;通过赏析美文《春》,引领学生走进立春时节的美景,感受初春的美好气息;通过品味农谚和相关古诗词《立春偶得》《春雪》带领学生感受百姓和文人眼中的立春节气,并增加了学生的文学积累;玩“打春牛”游戏更是让孩子们颇感节气的趣味,倍增对节气的兴趣和喜爱情怀,剪纸刻春牛更是融入了我国特有的民间文化;教材最后“不同角度写景色”练笔则从大作家朱自清的《春》讲起,引领学生感受作者通过看、听、闻、尝、触摸等五个角度将景物写具体,写细致,告诉学生“面对大自然,我们要用眼睛去看,用耳朵去听,用鼻子去闻,用舌头去尝,用身体去触摸”这样就可以把景色写得更生动。 二、教学目标 1.通过学习了解“立春”的由来、相关农谚、诗歌及各地习俗等知识。 2.通过“做春饼”等实践活动,感受立春的饮食习俗,快乐体验。 3.通过各种活动形式激发学生对民族文化的喜爱之情,传承并发扬民族精神。 三、教学准备 教师准备:多媒体课件;做春饼的材料(和好的面团) 学生准备:小小组分工准备制作春饼的材料豆芽、萝卜、韭菜、菠菜、生菜、豆子、鸡蛋、土豆丝等。 四、教学过程 (一)立春我知道 1.分享感知,畅谈立春 自主交流:观察日历,说一说所了解的“立春”。

The Twenty-Four Solar Terms: Lichun

The Twenty-Four Solar Terms: Lichun. I. Teaching objectives: 1. Learn the customs connected with Lichun and feel the cultural interest Lichun brings us. 2. Learn how to make the "folk sacred fruit" and experience the joy of enjoying the fruits of one's own labor. Teaching focus: by collecting material on Lichun, carry on folk culture and build a deep attachment to one's hometown. Teaching difficulty: guide pupils to explore on their own the historical origins and distinctive charm of traditional festivals. Preparation. 1. Teacher: learn about Lichun customs in various regions; pictures of Lichun customs and written introductions to Lichun celebrations. 2. Pupils: ask the elderly about folk proverbs and customs connected with Lichun. Teaching design. Part I: First gain a basic knowledge of spring climate change and its effect on daily life. A brief introduction to Lichun (courseware; pictures of spring): Lichun is one of the twenty-four solar terms, also called "Dachun" ("Striking Spring"). "Li" means "to begin"; China takes Lichun as the start of spring, which falls each year on February 4 or 5, when the sun reaches celestial longitude 315°.

The ancients made Lichun the first of the 24 solar terms. On the night of Lichun, looking up at the sky at seven o'clock, one sees the handle of the Big Dipper pointing northeast, at an azimuth of 45 degrees, the direction the ancients called gen (one of the Eight Trigrams). II. The teacher introduces the origin of the Lichun festival. The "li" of Lichun means "begin" and "chun" denotes the season, so Lichun means that the spring term has begun. Farm proverbs say "spring strikes at the head of the sixth nine" and "however early the frost and winter come, spring strikes after forty-five days": counting from the winter solstice, the "nines" through the fifth nine total forty-five days, so Lichun falls exactly at the start of the sixth nine. Lichun existed as a seasonal marker as early as the Spring and Autumn period, when the year had eight markers: Lichun, Lixia, Liqiu, Lidong, the two equinoxes and the two solstices. Only in the "Yueling" chapter of the Liji (Book of Rites) and in the "Tianwenxun" of the Huainanzi by Liu An of the Western Han do records of 24 solar terms appear. Before the Han the calendar was revised many times, and at one point Lichun day was designated the Spring Festival, meaning that spring began on that day. This usage lasted more than two thousand years, until 1913, when the government of the day issued a formal document fixing the first day of the first lunar month as the Spring Festival. Since then Lichun day has survived only as one of the 24 solar terms. III. Introducing the customs of Lichun. Lichun is also called "Dachun" ("striking spring"), "yaochun" ("biting spring"), and "baochun" ("announcing spring"). Like many seasonal markers it carries many folk customs: celebratory rites and processions to welcome spring, the "striking of the ox" of dachun, and the yaochun customs of eating spring pancakes and spring platters and biting radishes. The historical record shows that the spring-welcoming ceremonies of every dynasty were solemn and grand. By then Lichun had outgrown its role of merely marking season and climate among the 24 terms and had become a festival in its own right, and history evolved many activities and customs around it, such as "spring dolls," "spring whips," "spring rolls," "striking the spring," "spring wine" and the "spring ox." Yaochun means eating spring platters, spring pancakes and spring rolls and chewing radishes on Lichun day; the single word "bite" captures the term's many food customs. Spring platters and pancakes are plates of vegetables, fruit and pastries given to relatives and friends or eaten at home, called the spring platter. IV. Extension:

The Twenty-Four Solar Terms: Lichun

Lichun (Beginning of Spring). In 2017, Lichun fell at 23:34:01 on February 3. Time: February 3–5 each year. The three pentads: "The east wind thaws the ice" — the east wind brings warmth and the earth begins to thaw. "Hibernating insects begin to stir" — creatures wintering in their burrows slowly awaken. "Fish rise bearing ice" — the river ice begins to melt and fish swim up toward the surface, where shards of unmelted ice float as if carried on the fishes' backs. Introduction: Lichun is one of the twenty-four solar terms, also called "Dachun" ("Striking Spring"). "Li" means "to begin"; China takes Lichun as the start of spring, which falls each year on February 4 or 5, when the sun reaches celestial longitude 315°.

The ancients made Lichun the first of the 24 solar terms. On the night of Lichun, looking up at the sky at seven o'clock, one sees the handle of the Big Dipper pointing northeast, at an azimuth of 45 degrees, the direction the ancients called gen (one of the Eight Trigrams). The "li" of Lichun means "begin" and "chun" denotes the season, so Lichun means that the spring term has begun. Farm proverbs say "spring strikes at the head of the sixth nine" and "however early the frost and winter come, spring strikes after forty-five days": counting from the winter solstice, the "nines" through the fifth nine total forty-five days, so Lichun falls exactly at the start of the sixth nine. Origin: Lichun existed as a seasonal marker as early as the Spring and Autumn period, when the year had eight markers: Lichun, Lixia, Liqiu, Lidong, the two equinoxes and the two solstices. Only in the "Yueling" chapter of the Liji and in the "Tianwenxun" of Liu An's Huainanzi (Western Han) do records of 24 solar terms appear. Before the Han the calendar was revised many times, and at one point Lichun day was designated the Spring Festival, meaning that spring began on that day. The usage lasted more than two thousand years, until 1913, when the government of the day issued a formal document fixing the first day of the first lunar month as the Spring Festival. Since then Lichun day has survived only as one of the 24 solar terms. Customs: Lichun is also called "Dachun," "yaochun" ("biting spring"), and "baochun" ("announcing spring"). Like many seasonal markers it carries many folk customs,

including celebratory rites and processions to welcome spring, the "striking of the ox" of dachun, and the yaochun customs of eating spring pancakes and spring platters and biting radishes. The historical record shows that the spring-welcoming ceremonies of every dynasty were solemn and grand. By then Lichun had outgrown its role of merely marking season and climate among the 24 terms and was treated as a festival in its own right, and history evolved many activities and customs around it, such as "spring dolls," "spring whips," "spring rolls," "striking the spring," "spring wine," the "spring ox," and so on. Yaochun means eating spring platters, spring pancakes and spring rolls and chewing radishes on Lichun day; the single word "bite" captures the term's many food customs. Spring platters and pancakes are plates of vegetables, fruit and pastries given to relatives and friends or eaten at home, called the spring platter. Health preservation: In diet one should allow for the first stirring of yang qi in spring: favor pungent, sweet, dispersing foods and avoid sour, astringent flavors. The Suwen ("Zangqi Fashi Lun") says: "The liver governs spring... when the liver suffers tension, quickly eat sweet things to relax it... when the liver wants to disperse, quickly eat pungent things to disperse it; use pungency to tonify it and sourness to drain it." Once Lichun has passed, spring is at hand: all things sprout and grow, full of vitality. Spring regimen should follow the season's character of rising yang qi and new growth, protect the yang, and center on the word "growth." In the correspondences of nature, spring belongs to wood and corresponds to the liver. The liver's physiological role is to govern the free flow of qi; its emotion is anger; it abhors depression and delights in ease. Choose with purpose herbs and foods that soften and nourish the liver and soothe and regulate its qi: herbs such as goji berry, yujin (turmeric), danshen and yanhusuo, and pungent, warming, dispersing foods such as jujubes, fermented soybeans, scallions, coriander and peanuts, flexibly combined into one's meals. For the spirit in spring, strictly avoid rage and, still more, melancholy; keep the mind open and optimistic and maintain a cheerful state of mind.

Proverbs of the Twenty-Four Solar Terms: Lichun

Proverbs of the Twenty-Four Solar Terms: Lichun. Lichun, one of the twenty-four solar terms, is also called the Lichun Festival, the First-Month Term, the Year Term, or the Year's Dawn. "Li" means "to begin"; "chun" stands for warmth and growth. Under the ancient "dipper-handle" method, Lichun is the moment the handle of the Big Dipper points to the yin position; under the current "dingqi" division of the terms, it is the moment the sun reaches celestial longitude 315°. In the sexagenary (ganzhi) calendar the year begins at Lichun: Lichun means a new cycle has opened, the start of all things and the renewal of everything. Before the Qin and Han, the day the rites honored was not the first of the first lunar month but Lichun day. The great observances — worship of gods and ancestors, prayers for blessing and harvest, exorcising evil and averting disaster, casting out the old and bringing in the new, welcoming spring and celebrating the plowing — were all held on Lichun day and the days around it. This series of festivities not only framed the later New Year celebrations; its folk functions survive to this day. ● Spring contends for days, summer for hours; the year's great work brooks no delay. ● The boat waits for no passenger; the season spares no one. ● Keep the land waiting a day, and the land sets you back a year. ● Of a thousand measures to raise the yield, not missing the farming season matters most. ● When Lichun and Yushui come, rise early and sleep late. ● For good crops, start early in all four seasons. ● Each spring wind answers a coming autumn rain. ● Sow a spring wind, hope for summer rain. ● A cold spring brings a muggy, rainy summer; a cold autumn a dry, windy winter. ● A chill spring means drizzle on drizzle; a chill summer cuts the rain short. ● A cold spring with rain means a clear cold summer.

● If clouds veil the moon on the fifteenth of the eighth month, snow will lash the lanterns on the fifteenth of the first. ● Whether the cotton crop comes in, watch the three eights of the first month. ● If the Lichun snowmelt runs ten feet deep, the wheat will overflow the barns. ● An overly warm Lichun turns cold with swirling snow. ● Two springs flanking one winter: warm even without a quilt. ● A short "spring neck" brings an early warm spell, and often a late cold snap. ● A long "spring neck" means a late warming, and seldom a late cold snap. ● Once spring is struck, the cold bites people but not the water. ● One day past Lichun, every grass sprouts anew. ● The year's plan is made in spring, the day's at dawn. ● Lichun starts the year; plan the fields early. ● One head holds no plan; three heads can stage an opera. ● One day past Lichun, the water warms three parts. ● A clear sky on Lichun day, a good harvest in every way. ● Clear at Lichun, a clear spring through. ● Wet at Lichun, a wet spring through. ● Overcast at Lichun, a spring of turned-back blossoms. ● Clear at Lichun, plentiful rain at Yushui. ● Rain from Lichun to Qingming: one day of rain, one day clear. ● If Lichun is not clear, the cold lasts another month and more.

What Does the Solar Term Lichun Mean? A Brief Introduction to the Lichun Solar Term

What does the solar term Lichun mean? A brief introduction to the Lichun solar term. Lichun, one of China's twenty-four solar terms, signifies the beginning of spring; it falls between February 4 and 5, when the sun stands at celestial longitude 315°. The whole period from this day until Lixia (Beginning of Summer) is called spring. For readers unfamiliar with the term, here is a short introduction. Among the solar terms, "li" means "to begin," so Lichun means that spring has begun. The Yueling Qishier Hou Jijie says: "Lichun, the first-month term; li means to establish and begin; here the departing qi of the Five Phases is succeeded by the arriving; and here the wood-qi of spring first arrives, hence it is called li; the same holds for Lixia, Liqiu and Lidong." China is vast and its regional climates differ enormously, so the climatic sense of "li" does not apply everywhere; it matches only South China, with the dividing line running from Guilin in Guangxi to Ganzhou in Jiangxi. At Lichun (spring counting as begun once the daily mean temperature holds above 10°C for five consecutive days), areas south of that line already carry a breath of spring, but on 93% of China's land area it is still winter; in Heilongjiang spring often arrives only around Guyu or Lixia. For many regions "li" is only a point of reference. The old compendium Qunfangpu explains Lichun thus: "Li means to establish; the qi of spring begins and is established." And a poem titled "Lichun" runs: "The east wind with its rain chases out the west wind; a gentle warmth of yang rises over the earth. All things wake as hills and waters stir; at the year's head the farm household plans its plowing again" (Zuo Heshui). In climatology, spring is the period when the pentad (five-day) mean temperature lies between 10°C and 22°C. Around the Lichun term, temperature, sunshine and rainfall begin to rise and increase; yang warmth rouses the dormant, and all things take on spring. People set great store by "spring" and regard a "double-spring" year as highly auspicious. Lichun is one of the twenty-four solar terms, also called "Dachun"; "li" means "begin," China takes Lichun as the start of spring, and it falls on February 4 or 5, when the sun reaches celestial longitude 315°. The ancients made it the first of the 24 terms: on Lichun night at seven o'clock, the handle of the Big Dipper points northeast, at an azimuth of 45 degrees, the direction the ancients called gen (one of the Eight Trigrams). By old custom Lichun day was also a folk festival, the "Lichun Festival." China has been an agricultural nation since antiquity: sowing in spring and reaping in autumn, everything hinges on spring, whence the folk saying "the whole year's plan is made in spring." Old-style Lichun was thus both an ancient solar term and a major festival. The Shiwu Jiyuan records: "The Duke of Zhou first instituted the Lichun earthen ox, displaying it to show the time for early or late plowing." Later dynasties all performed the rite of whipping the spring ox on this day to encourage farming and boost production. Lichun existed as a seasonal marker as early as the Spring and Autumn period, when the year had eight markers: Lichun, Lixia, Liqiu, Lidong, the two equinoxes and the two solstices; records of 24 solar terms first appear in the "Yueling" of the Liji and the "Tianwenxun" of Liu An's Huainanzi of the Western Han.

Background Knowledge on the Solar Term Lichun

Background knowledge on the solar term Lichun. Lichun is one of the twenty-four solar terms, also called "Dachun"; "li" means "begin," China takes Lichun as the start of spring, and it falls on February 4 or 5, when the sun reaches celestial longitude 315°. The ancients made it the first of the 24 terms: on Lichun night at seven o'clock, the handle of the Big Dipper points northeast, at an azimuth of 45 degrees, the direction the ancients called gen (one of the Eight Trigrams). The "li" of Lichun means "begin" and "chun" denotes the season, so Lichun means that the spring term has begun. Farm proverbs say "spring strikes at the head of the sixth nine" and "however early the frost and winter come, spring strikes after forty-five days": from the winter solstice, the "nines" through the fifth nine total forty-five days, so Lichun falls exactly at the start of the sixth nine. So when exactly is Lichun? It generally falls on February 3–5, but not on a fixed day; the moment shifts from year to year — in 2013, for example, Lichun fell at 00:13:25 on February 4. Six principles of Lichun health care. First principle — no sour food. The spring diet should "cut the sour and add the sweet": in spring the liver yang already tends to rise, and eating sour foods makes liver qi over-exuberant, and an over-strong liver easily injures the spleen and stomach. So in spring avoid sour-forming foods such as mutton, dog meat, sea fish, shrimp, quail and crab.

For spring health care, eat sweet, warming foods that tonify the spleen, such as yam, spinach, jujubes, spring bamboo shoots and Chinese chives. Second principle — do not stay still. In spring, nature's yang qi begins to rise, and the body should make use of this trait: the key to nourishing yang lies in movement, never in stillness. Exercise actively outdoors: spring air is rich in negative ions, which improve the working efficiency of the cerebral cortex and heart-lung function and help prevent arteriosclerosis. But do not train too early: the morning cold and heavy mist invite colds, asthma and chronic bronchitis, so go out to exercise only after the sun has risen. Third principle — no anger. Spring regimen should follow the season's rising yang and new growth, protect the yang qi, and center on the word "growth." In nature's correspondences, spring belongs to wood and corresponds to the liver. When liver yang runs high in spring, tempers flare easily; keep the mind open and body and spirit in harmony. A cheerful mood nourishes the liver; depression stagnates liver qi and impairs its free flow, deranges the neuroendocrine system, lowers immunity, and easily triggers mental illness, liver disease, and cardio-cerebrovascular disease. Hence: no anger in spring. Fourth principle — no excess of desire. The elderly are relatively short of yang qi, and spring is the prime time to nourish yang; letting desire run unchecked and indulging frequently in sex consumes qi and injures essence, further damaging the yang, so older people should moderate sexual activity in spring. Fifth principle — do not shed clothing too soon. Spring weather swings sharply between cold and warm, and as the body's pores begin to loosen, its resistance to cold pathogens weakens; so in early spring do not cast off padded clothes all at once, and the old and frail above all should change dress cautiously, never abruptly. The Qianjin Yaofang advises that spring dress be "thick below and thin above," and the Laolao Hengyan likewise says: "when the spring thaw is but half done, the lower body had best stay over-warm, while the upper body may shed a little,"

The Origin and Customs of the Lichun Solar Term

The origin and customs of the Lichun solar term. Lichun is not merely the first of the twenty-four solar terms of the traditional calendar, defined astronomically; in nature, and in people's hearts, spring means mild winds and warm sun, birdsong and flowers; spring means growth, and sowing for the farm household. The old compendium Qunfangpu explains Lichun: "Li means to establish; the qi of spring begins and is established." During the Lichun period, temperature, sunshine and rainfall begin to rise and increase, yet for most of the country all this is only spring's prelude. The "Twelve Monthly Records" of the Lüshi Chunqiu, compiled late in the Warring States period, already name eight terms: Lichun, Chunfen, Lixia, Xiazhi, Liqiu, Qiufen, Lidong, Dongzhi. These eight are the most important of the twenty-four: they mark the turning of the seasons and divide the year cleanly into four — Lichun to the eve of Lixia is spring, Lixia to the eve of Liqiu summer, Liqiu to the eve of Lidong autumn, Lidong to the eve of Lichun winter. Lichun is one of the important traditional festivals of the Han people. "Li" means "to begin"; since the Qin dynasty China has taken Lichun as the start of spring. Lichun is defined astronomically, but spring is warmth, birdsong and flowers; spring is growth, plowing and sowing. The whole period from Lichun day to the eve of Lixia is called spring. The origin of the Lichun term: Lichun is not only the first of the twenty-four terms but also an important festival; in the astronomical sense it marks the start of spring. This year the Lichun term runs from February 4 to February 18. The Qunfangpu explains it: "Li means to establish; the qi of spring begins and is established." During Lichun, temperature, sunshine and rainfall begin to rise and increase, but for most of the country this is only the prelude; the curtain of spring has not truly risen. Since the Qin, China has taken Lichun as the start of spring. In climatology, spring is the period when the pentad (five-day) mean temperature lies between 10°C and 22°C. By Lichun people clearly feel the days lengthening and the sun warming; temperature, sunshine and rainfall often stand at a turning point of the year, tending upward. Early-spring crops quicken their growth; rape at bolting and wheat at jointing use more water, so irrigate and top-dress promptly to push growth on. As the farm proverb reminds us, "When Lichun and Yushui come, rise early and sleep late," and preparation for the main spring planting begins as well. Yet though spring has been "established," most of South China is still cold, a scene of "the white snow, begrudging spring its lateness, drifts through the courtyard trees as flowers." All these climatic traits should be weighed in planning farm work. After Lichun, with yang qi rising, all things reviving, the ground thawing and temperatures climbing, the night air gives off a sweet, fresh scent that replaces the dust-and-dead-leaves smell of autumn and winter. The customs of the Lichun term. The god Goumang: Goumang is the spring god, god of plants and of life. He has a human face and a bird's body, carries compass and square, and presides over the affairs of spring. Already under the Zhou an eastern hall was set up to welcome spring, showing that the sacrifice to Goumang is of long standing. In Zhejiang the spring-welcoming took place on the day before Lichun: Goumang was carried out of the city and up the hills, and sacrifice was also made to Taisui. Taisui is the deity of the year, who sits over the year's fortunes and misfortunes, so the people commonly sacrificed to him too. The welcoming procession featured drum bands, raised platforms (taige), local opera, yangge dancing, ox-striking and the like; as it was carried from the countryside into the city, crowds lined the streets to watch and vied to toss the five grains, which was called "watching the spring-welcome." In Shandong, when Goumang was honored at the spring-welcome, his costume foretold the year's weather: a hat meant a warm spring, a bare head a cold one; shoes meant a rainy spring, bare feet a dry one. Other regions pasted up New Year pictures such as "Spring Wind Brings Success." Around Guangzhou, in the days before and after Lichun, drums were beaten to drive off pestilence and pray for peace. Welcoming spring (yingchun): Yingchun is the central activity of Lichun. It had to be prepared and rehearsed in advance, which was called "rehearsing the spring" (yanchun); only then could spring be formally welcomed on Lichun day. The welcome was held on the day before Lichun, its purpose being to bring spring and the god Goumang back. A Spring Officer was appointed for the occasion — a post filled by a beggar, or by a courtesan — who announced the hour of Lichun. The imperial almanacs of the past all carried pictures of the Mang god and the spring ox, and the late-Qing Dianshizhai Pictorial scenes "Guizi Announces Spring" and "Bronze Drums Expel Pestilence" show important Lichun festivities of that time.
