当前位置:文档之家› c ○ World Scientific Publishing Company CHINESE WORD SEARCHING IN IMAGED DOCUMENTS

c ○ World Scientific Publishing Company CHINESE WORD SEARCHING IN IMAGED DOCUMENTS

c ○ World Scientific Publishing Company CHINESE WORD SEARCHING IN IMAGED DOCUMENTS
c ○ World Scientific Publishing Company CHINESE WORD SEARCHING IN IMAGED DOCUMENTS

International Journal of Pattern Recognition

and Arti?cial Intelligence

Vol.18,No.2(2004)229–246

c Worl

d Scienti?c Publishing Company

CHINESE WORD SEARCHING IN IMAGED DOCUMENTS

YUE LU?and CHEW LIM TAN?

Department of Computer Science,School of Computing,

National University of Singapore,Kent Ridge,Singapore117543

?luy@https://www.doczj.com/doc/e618684757.html,.sg

?tancl@https://www.doczj.com/doc/e618684757.html,.sg

An approach to searching for user-speci?ed words in imaged Chinese documents,with-

out the requirements of layout analysis and OCR processing of the entire documents,is

proposed in this paper.A small number of Chinese characters that cannot be success-

fully bounded using connected component analysis due to larger gaps between elements

within the characters are blacklisted.A suitable character that is not included in the

blacklist is chosen from the user-speci?ed word as the initial character to search for a

matching candidate in the document.Once a matched candidate is found,the adja-

cent characters in the horizontal and vertical directions are examined for matching with

other corresponding characters in the user-speci?ed word,subject to the constraints of

alignment(either horizontal or vertical direction)and size similarity.A weighted

Hausdor?distance is proposed for the character matching.Experimental results show

that the present method can e?ectively search the user-speci?ed Chinese words from

the document images with the format of either horizontal or vertical text lines,or both

appearing on the same image.

Keywords:Chinese document image;word searching;character segmentation;character

matching;weighted Hausdor?distance.

1.Introduction

With the advent and rapid development of computer technology,digital documents have become popular for storage and transmission instead of the traditional paper documents.The most widespread format of these digital documents is text in which the characters of the documents are represented by machine-readable codes(for ex-ample,ASCII codes).Undoubtedly,the text format facilitates not only the storage and transmission of documents,but also document information retrieval.There has been active research on Web content extraction using text-based techniques,on which the text retrieval community has made signi?cant progress.20,21,28 Although most of the newly generated documents are in text format,billions of volumes of traditional newspapers,books,magazines,etc.are in the paper format, and need to be transferred to the digital domain.In general,these paper docu-ments are easily scanned and converted to digital images by the digitization equip-ment.The technology of Optical Character Recognition(OCR)may be utilized to

229

230Y.Lu&C.L.Tan

automatically transfer the digital images of these paper documents to their machine-readable text format.However,OCR has its inherent weaknesses in the recognition ability,especially for the recognition of Chinese characters.Although many com-mercial products of machine printed Chinese character recognition systems have appeared on the market in recent years,5the process still carries a signi?cant cost. An obvious reason is the need for manual correction/proo?ng of OCR results,which is typically not cost e?ective for transferring a huge amount of paper documents to their text format.

It is generally acknowledged that the OCR accuracy requirements for in-formation retrieval are considerably lower than for many document processing applications.23Motivated by this observation,the method with the ability of tol-erating recognition errors of OCR has been researched recently.10,19Additionally, some methods are reported to improve the retrieval performance by using the OCR candidates.12,13In these OCR-based methods,the layout analysis and character segmentation are unavoidable.However,the research on layout analysis and under-standing is still immature,especially for documents with complex layout.

Nowadays,many documents are stored in the image format due to various rea-sons.It is of signi?cant meaning to study the strategies of retrieving information from these document images directly.An image-based approach to searching and locating a user speci?ed word in the document images has its practical value for re-trieving information from the document repositories,such as digital library archives or the Internet.For example,by using this technique the user can detect a speci-?ed word in the document images without the requirement of OCRing the entire document.

In the case where there are a large number of image documents in the Internet or digital libraries,the user has to download each one to see its contents before knowing whether the document is relevant to his interest.The image-based word searching technology is capable of notifying the user whether a document image contains words of interest to him,and indicating which documents are worth downloading, prior to transmitting the document images through the Internet.

The image-based word searching method is somewhat di?erent from the method of OCRing the document images followed by analysis of the resulting text.The former concentrates only on the presence of the user speci?ed word while paying no attention to the unrelated words.The latter must assign each character in the document to a de?nite class in the OCR procedure no matter whether the character is relevant to the user’s interest or not.Of course,comparing with the image-based approach,OCR has an advantage of producing machine readable text for subsequent use.However,if the goal is only to retrieve a handful of documents from a huge collection of document images,it makes sense not to OCR the entire collection which will never be used again.

In recent years,a number of attempts have also been made to avoid the use of OCR for various text processing applications,1,6but have been mainly on English text documents.Chinese language is quite di?erent from the western languages.

Chinese Word Searching in Imaged Documents231 The major di?culties of word searching in Chinese document images arise due to the complicated nature of Chinese documents as follows:

(a)The number of commonly-used characters in Chinese language is very large.It

was observed a long time ago that Chinese character recognition is probably the most di?cult problem for character recognition.2

(b)The layout of Chinese documents is very complex.The text direction can be

either horizonal or vertical.Moreover,mixed text directions even appear on one same document page sometimes.

(c)A Chinese word is composed of two or more characters,and there is no obvious

and distinct separator between two words.In other words,a Chinese word has the same form as a phrase in the western languages.

Some approaches to searching English word(i.e.keyword spotting)have been reported in the past years,3,4,14,15,18,24but little research has been done on Chinese document images.One related piece of work on Chinese documents is image-based keyword recognition presented by Zhu et al.30The performance of their method ev-idently depends on the prede?ned dictionary,so it is not capable of dealing with an arbitrary word that is not included in the dictionary.Chang2described a method for retrieving information from Chinese document images with the ability of tolerating the OCR errors.He et al.8presented a content-based indexing and retrieval method for Chinese document images.Obviously,the layout analysis and understanding are required in these methods prior to the word searching procedure.

In this paper,we propose an image-based approach to searching user-speci?ed words in Chinese document images,without the requirements of layout analysis and OCRing the entire document images.Connected components are detected and are merged to bound characters or elements?rst.A majority of Chinese characters can be successfully segmented in this way,except for some that have a signi?cant spacing between their separable and disjunct elements.A blacklist is constructed to recode the characters that are prone to be unsuccessfully bounded.From the sequence of characters in the user-speci?ed word,we select a suitable character which is not included in the blacklist as the initial character,to search for a match-ing candidate in the document.Once a matched candidate is found,the adjacent characters in the horizontal and vertical directions are examined for matching with other corresponding characters in the user-speci?ed word,subject to the constraints of size similarity and text line alignment.

The character matching process is divided into two stages.The?rst stage, which is called coarse-matching,is carried out based on the stroke density fea-tures to reduce the matched character set to a smaller set.The second stage based on our proposed weighted Hausdor?distance(WHD)aims to choose the best matched characters.Both of the two stages are implemented by the low-level character matching strategy.As such,they are capable of dealing with either simpli-?ed Chinese characters or traditional Chinese characters.In fact the idea of WHD is applicable to western languages,as reported in our previous work.16Comparing

232Y.Lu&C.L.Tan

with other approaches to English text,1,25the WHD is simpler,and more resistant to noise.

Experiments are conducted on the document images collected from Chinese newspapers with diverse layout.The experimental results show that the proposed strategy can e?ectively search the user-speci?ed Chinese words from document images in the format of either horizontal or vertical text line,or both.

The remainder of this paper is organized as follows.The next section dis-cusses the method for bounding Chinese characters in document images.Sections3 presents the approach of Chinese character matching.Section4describes the out-line of the proposed method for searching user-speci?ed Chinese word.Section5 demonstrates some experimental results of the present method,and?nally Sec.6 gives the conclusions and future works.

2.Bounding Chinese Characters and Elements in

Document Images

The strategy used for segmenting characters in document images can be divided into two categories,viz.top-down and bottom-up.26Each has its own advantages and disadvantages.The top-down approach breaks down the document images into di?erent blocks such as headlines,text lines,graphics,tables,etc.the characters are then extracted from the headlines and text lines.The knowledge of the document layout,therefore,is much important during the processing.This approach is simple and fast.It is e?ective for processing the documents with a speci?c format.

On the other hand,in the bottom-up approach,all of the connected components are detected?rst.The connected components are then merged according to their relative locations and sizes.This approach is possible to develop algorithms which are applicable to a variety of documents,but it is time-consuming.

In the literature,the top-down approach has been mainly employed for seg-menting the machine printed Chinese characters,based on the use of multilevel X–Y projection.However,the layout of Chinese documents is more complex than that of western documents.For example,the text can be in either horizontal lines or vertical lines.Sometimes,the horizontal and vertical text lines even appear within one document.Moreover,the embedded titles,diverse character sizes make the layout analysis and understanding of Chinese document image quite di?cult.

To deal with the Chinese document images with complex layout,a bottom-up algorithm is utilized in this present approach to bound the Chinese characters.First, a connected component detecting algorithm is applied to get all of the connected components in the document image.Then,the connected components with small areas are eliminated as noises.The connected components larger than a threshold, which may be graph regions,are ignored too.Finally,the connected components are merged to generate the bounding boxes of Chinese characters,according to their relative positions and sizes.

Chinese Word Searching in Imaged Documents233 The topological relation of any two components belongs to one of the three types, viz.upper-lower,left-right and inside-outside.17If one component is encompassed by another one,they are merged straight away.While two components are in the position of upper-lower or left-right,they can be merged only when they are quite close to each other compared to their sizes.The merging processing is an iterative procedure.

As a result,a majority of machine printed Chinese characters(above95% generally)can be successfully bounded using the above processing,except for some Chinese characters with a large spacing gap between their separable and disjunct elements,as showed in Fig.1.The remainders consist of the elements of characters that cannot be easily merged based on only the characters’general structure knowl-edge.One reason is that the width and height of some left-right disjunct elements in Chinese characters as shown in Fig.2(a)are very similar to that of numeral characters and English characters.Figure2(b)shows some examples of Chinese characters with upper-lower separable and disjunct elements,which are di?cult to be merged.

To avoid the excessive-merger of numeral and English characters appearing on the same document image,a few Chinese characters can only be bounded at the

element level.We can make a blacklist to remember all of the characters which

are

character with upper-lower disjunct elements

character with left-right disjunct elements numeral characters are prone to be bounded as Chinese character

Fig.1.Bounding boxes

of characters

and elements.

(a)(b)

Fig.2.Some Chinese characters with disjunct elements that cannot be merged:(a)Characters with left-right disjunct elements,(b)characters with upper-lower disjunct elements.

234Y.Lu&C.L.Tan

prone to be bounded at the element level rather than at the character level.In the word searching procedure,the characters included in the blacklist will not be chosen as the initial one to be searched.

In our system,122Chinese characters with a large spacing gap between their elements are listed in the https://www.doczj.com/doc/e618684757.html,pared to the commonly-used Chinese characters(e.g.6763),the blacklist is rather small.Therefore,the possibility is really quite low that two or above characters in a user-speci?ed word simultane-ously occur in the blacklist.In general,there exists at least one character in the user-speci?ed word,which does not belong to the blacklist.

3.Chinese Character Matching

To detect a speci?c character in the document image,a corresponding template image is generated from its standard bitmap image.The template image will be compared to each bounded character image in the document.The feature extrac-tion and classi?cation for machine printed Chinese character recognition have been widely researched in the past years.5,29To meet the requirement of fast processing, a hierarchical system is constructed to speed up the character matching process in this paper.The?rst stage,which is a coarse-matching procedure,is simple and fast to execute,but is not powerful enough to distinguish between similar patterns.In the second stage,a modi?ed Hausdor?distance is employed to match the candidate images with the template image.

3.1.Coarse-matching

The character image of I×J pixels is divided into M×N grids,as illustrated in Fig.3.Here M=N=4in our experiments.The similarity between the document image and the template image are evaluated by Euclidean distance based on the features of the stroke density of each grid.To avoid the e?ect of stroke width,the features are extracted from the edge of stroke instead of the stroke pixels directly.

At the same time,the features of stroke density are normalized with respect to their grid area.The stroke density f(m,n)(0≤m

M×N

f(m,n)=

Chinese Word Searching in Imaged Documents235

(a)(b)

Fig.3.Grid division:(a)Overlaid region for grid division and fuzzy weight,(b)noise e?ect on bounding box and feature extraction.

by the noise close to the border of bounding box,some strokes,especially the horizontal and vertical strokes,will be misplaced from their corresponding grids to the neighbors,as showed in Fig.3(b).

To reduce this e?ect,an approach using overlaid region and fuzzy weight is applied for grid division instead of abrupt changes on the region borders.11In other words,the region of each grid is extended to its neighbors,and the border of each grid is overlaid.The weight of each pixels is fuzzed as well,as showed in Fig.3(a).

Accordingly the stroke density f(m,n)of each grid is calculated as:

1

f(m,n)=

236Y.Lu&C.L.Tan

For the purpose of word image matching,we proposed a weighted Hausdor?distance(WHD)in our previous work,16to improve the MHD by setting the di?er-ent parts of the word image with di?erent contributions to calculate the directed distance in the Hausdor?distance.In this paper,we further apply the WHD to Chinese character matching.

The distance(e.g.Euclidean distance)between two points a and b is de?ned as d(a,b)= a?b ,and the distance between a point a and a?nite point set B={b1,...,b N

b

}is commonly de?ned as

d(a,B)=min

b∈B

a?b .(5)

Given two?nite point sets A={a1,...,a N a}and B={b1,...,b N

b

},the Haus-dor?distance is de?ned as:

H(A,B)=max(h(A,B),h(B,A))(6) where h(A,B)and h(B,A)represent the directed distances between two sets A and B.The directed distance h(A,B)is traditionally de?ned as

h(A,B)=max

a∈A d(a,B)=max

a∈A

min

b∈B

d(a,b)=max

a∈A

min

b∈B

a?b .(7)

The function h(A,B)identi?es the point a∈A that is farthest from any point of B and measures the distance from a to its nearest neighbor in B.The Haus-dor?distance H(A,B)measures the degree of mismatch between two point sets A and B.

Dubussion and Jain7presented the modi?ed Hausdor?distance(MHD)measure by employing the summation operator over all distances,rather than the maximum operator:

h MHD(A,B)=

1

Chinese Word Searching in Imaged Documents237

Fig.4.Di?erent parts in a Chinese character(c:the corner parts,h:the heart part,p:the peripheral parts).

Fig.5.Chinese characters selected from real document image.

of di?erent parts of the character image to the Hausdor?distance is not the same. The directed distance of WHD is computed as

1

h WHD(A,B)=

238Y.Lu&C.L.Tan

https://www.doczj.com/doc/e618684757.html,parison of distance measure by MHD and WHD.

(guo)(ming)(po)(cun) Word

Images MHD WHD MHD WHD MHD WHD MHD WHD

Fig.6.Diagram for word searching.

4.Word Searching

Figure6illustrates the diagram of the present system for searching a user speci?ed word in a Chinese document image.

In general,one Chinese word is composed of several single characters.At the ?rst step,we can select one character from the user-speci?ed word to make sure

Chinese Word Searching in Imaged Documents239 that it does NOT belong to the blacklist described in Sec.2.The selected character will act as the initial one to be matched with each of the bounded character images in the document.In most cases,the?rst character of the user-speci?ed word is not included in the blacklist.In such cases,the?rst character can be chosen as the initial character to be detected.However,in case the?rst character in the use-speci?ed word happens to be in the blacklist,we will select another character from the ensuing characters to ensure that the chosen character is not in the blacklist.Because the number of characters included in the blacklist is much fewer in comparison with the total number of Chinese characters,we can select a suitable initial character from the user-speci?ed word generally.

In Fig.7,the?rst character“zhong”is chosen as the initial character to be matched because its character image is basically composed of only one connected

(a)

(b)

Fig.7.The?rst character is chosen:(a)The initial character selected from user-speci?ed word, (b)matching sequence in document image.

(a)

(b)

Fig.8.The second character is chosen:(a)The initial character selected from user-speci?ed word,(b)matching sequence in document image.

240Y.Lu&C.L.Tan

component and it is not recorded in the blacklist.On the other hand,the character “tai”in Fig.8is composed of upper-lower disjunct elements which cannot be merged in some cases.This character is thus recorded in the blacklist.As a result,in Fig.8, we choose the second character from the user-speci?ed word to act as the initial character to be matched.

A template image is generated from the selected initial character,and then matched with each bounded character image in the document based on the charac-ter matching method described in the above section.If the coarse-matching stage results in a distance greater than a prede?ned threshold,the corresponding bounded character is de?nitely not the selected initial character,so no further processing is necessary.If the coarse-matching stage achieves a result showing the similarity be-tween the bounded character image and the template image,the second matching stage will be launched to decide whether the bounded image is really matched with the template image.

When a bounded character image in the document is matched with the selected initial character,the other bounded images in both horizontal and vertical directions are tested in succession to see if they can match the corresponding characters in the user-speci?ed word,subject to the constraint of size similarity and text line align-ment.In the procedure of detecting the ensuing character,the positional relation between character images and the characters’shape features including image size and aspect ratio,are used to merge the separate elements to one character image if possible.When the ensuing neighbor is a space,the detection will be heuristically extended to the?rst character of the possible neighboring text line.

In the case that the selected initial character is not actually the?rst character in the word(i.e.the?rst character belongs to the blacklist,as in Fig.8),the detection of the candidate images for the preceding characters is similar to that for the ensuing character.

5.Experimental Results

To verify the validity of the approach proposed in this paper for searching user-speci?ed Chinese word in Chinese document images,the experiments have been conducted on132scanned document images that are selected from recent Chinese newspapers in the region,some in traditional Chinese and others in simpli?ed Chi-nese.The articles address diversely di?erent topics.The resolution of digitization is300DPI.The main fonts in the document images are commonly-used song,and black.

The standard bitmap of all Chinese characters is stored in a?le,from which the template image of each character in the user-speci?ed word is generated with the size of16×16pixels.In the matching procedure,the bounded character ex-tracted from the document image is normalized to the same size of the template image.The searching strategy depicted in the above section is utilized to detect the user-speci?ed word from the document images,where the thresholds in the coarse

Chinese Word Searching in Imaged Documents241

(a)

(b)

Fig.9.Experimental results.

242Y.Lu&C.L.Tan

(c)

(d)

Fig.9.(Continued)

Chinese Word Searching in Imaged Documents243 Table2.System performance comparison for words with di?erent numbers of characters.

Recall0.97620.94740.86110.83330.76920.8774

Precision0.66140.72000.8378 1.00 1.000.8438

244Y.Lu&C.L.Tan

The application of the present method we have in mind is to deal with the docu-ment images collected from the past issues of newspapers dated about one hundred years ago.Those images are noisier,and the characters are more blurred.The qual-ity of the image,therefore,is much poorer.Our future work,will concentrate on how to cope with the noise and the blurred characters included in the Chinese document images,including adaptive setting of the thresholds during character matching.

Acknowledgment

This project is supported by the Agency for Science,Technology and Research and Ministry of Education of Singapore under research grant R252-000-071-112/303.

References

1. D.S.Bloomberg and F.R.Chen,Document image summarization without OCR,

Proc.Int.Conf.Image Processing,Vol.2,Lausanne,Switzerland,1996,pp.229–232.

2. F.Chang,Retrieving information from document images:problem and solutions,Int.

J.Docum.Anal.Recogn.4(2001)46–55.

3. F.R.Chen,L.D.Wilcox and D.S.Bloomberg,Word spotting in scanned images

using hidden Markov models,Proc.Int.Conf.Acoustics,Speech and Signal Processing, Vol.5,1993,pp.1–4.

4.J.DeCurtins and E.Chen,Keyword spotting via word shape recognition,Proc.SPIE,

Document Recognition II,Vol.2422,San Jose,CA,1995,pp.270–277.

5.X.Q.Ding,Machine printed Chinese character recognition,Handbook of Character

Recogniton and Document Image Analysis,eds.H.Bunke and P.S.P.Wang(World Scienti?c,1997),pp.305–329.

6. D.Doermann,The indexing and retrieval of document images:a survey,Comput.Vis.

Imag.Underst.70(1998)287–298.

7.M.P.Dubuisson and A.K.Jain,A modi?ed Hausdor?distance for object matching,

Proc.12th Int.Conf.Pattern Recognition,1994,pp.566–568.

8.Y.He,Z.Jiang,B.Liu and H.Zhao,Content-based indexing and retrieval method of

Chinese document images,Proc.Fifth Int.Conf.Document Analysis and Recognition, 1999,pp.685–688.

9. D.P.Huttenlocher,G.A.Klanderman and W.J.Rucklidge,Comparing images using

the Hausdor?distance,IEEE Trans.Patt.Anal.Mach.Intell.15(1993)850–863. 10.Y.Ishitani,Model-based information extraction method tolerant of OCR errors for

document images,Proc.Sixth Int.Conf.Document Analysis and Recognition,2001, pp.908–915.

11.H.Kai and H.Yan,O?-line signature veri?cation based on geometric feature extrac-

tion and neural network classi?cation,Patt.Recogn.30(1997)9–17.

12.T.Kameshiro,T.Hirano,Y.Okada and F.Yoda,A document image retrieval method

tolerating recognition and segmentation errors of OCR using shape-feature and mul-tiple candidates,Proc.Fifth Int.Conf.Document Analysis and Recognition,1999, pp.681–684.

13.K.Katsuyama,H.Takebe,K.Kurokawa et al.,Highly accurate retrieval of Japanese

document images through a combination of morphological analysis and OCR,Proc.

SPIE,Document Recognition and Retrieval,Vol.4670,2002,pp.57–67.

14. A.Kolcz,J.Alspector,M.Augusteijn,R.Carlson and G.V.Popescu,A line-oriented

Chinese Word Searching in Imaged Documents245 approach to word spotting in handwritten documents,Patt.Anal.Appl.3(2000) 153–168.

15.S.Kuo and O.F.Agazzi,Keyword spotting in poorly printed documents using pseudo

2-D hidden Markov models,IEEE Trans.Patt.Anal.Mach.Intell.16(1994)842–848.

16.Y.Lu,C.L.Tan,W.Huang and L.Fan,An approach to word image matching

based on weighted Hausdor?distance,Proc.Sixth Int.Conf.Document Analysis and Recognition,Seattle,USA,2001,pp.921–925.

17.Y.Lu,C.L.Tan,P.Shi and K.Zhang,Segmentation of handwritten Chinese char-

acters from destination addresses of mail pieces,Int.J.Pattern Recognition and Ar-ti?cial Intelligence16(2002)85–96.

18.R.Manmatha,C.Han and E.M.Riseman,Word spotting:a new approach to index-

ing handwriting,https://www.doczj.com/doc/e618684757.html,puter Vision and Pattern Recognition,1996, pp.631–637.

19.M.Ohtam,A.Takasu and J.Adachi,Retrieval methods for English text with misrec-

ognized OCR characters,Proc.Fourth Int.Conf.Document Analysis and Recognition, 1997,pp.950–956.

20.G.Salton,Developments in automatic text retrieval,Science253(1991)974–980.

21.G.Salton,J.Allan,C.Buckley and A.Singhal,Automatic analysis,theme generation,

and summarization of machine-readable text,Science264(1994)1421–1426.

22. D.G.Sim,O.K.Kwon and R.H.Park,Object matching algorithm using robust

Hausdor?distance measures,IEEE Trans.Imag.Process.8(1999)425–429.

23. A.L.Spitz,Shape-based word recognition,Int.J.Docum.Anal.Recogn.1(1999)

178–190.

24.T.Syeda-Mahmood,Indexing of handwritten document images,Proc.Workshop on

Document Image Analysis,San Juan,Puerto Rico,1997,pp.66–73.

25. C.L.Tan,W.Huang,Z.Yu and Y.Xu,Imaged document text retrieval without

OCR,IEEE Trans.Patt.Anal.Mach.Intell.24(2002)838–844.

26.Y.Y.Tang,S.W.Lee and C.Y.Suen,Automatic document processing:a survey,

Patt.Recogn.29(1996)1931–1952.

27.X.Wu and P.Shi,Unconstrained handwritten numeral recognition using Hausdor?

distance and multi-layer neural network classi?er,Proc.Fifth Int.Conf.Document Analysis and Recognition,1999,pp.249–252.

28.Y.Yang and X.Liu,A re-examination of text categorization methods,Proc.22nd

Annual Int.ACM SIGIR Conf.Research and Development in Information Retrieval, 1999,pp.42–49.

29.P.C.Yuen,G.C.Feng and Y.Y.Tang Printed Chinese character recognition silimar-

ity measurement using ring projection and distance transform,Advance in Oriental Document Analysis and Recogniton Techniques,eds.S.W.Lee,Y.Y.Tang and P.S.P.

Wang(World Scienti?c,1998),pp.209–221.

30.J.Zhu,T.Hong and J.J.Hull,Image-based keyword recognition in oriental language

document images,Patt.Recogn.30(1997)1293–1300.

246Y.Lu&C.L.

Tan

Yue Lu is a research fellow with the De-partment of Computer Science,National Uni-versity of Singapore. He received his B.E. and M.E.degrees in telecommunications and electronic engineering from Zhejiang Univer-

sity,China in1990,1993respectively,and his Ph.D in pattern recognition and intelligence systems from Shanghai Jiao Tong University, China in2000.

His research interests include document image processing,document retrieval,char-acter recognition and intelligence

systems.

Chew Lim Tan is

an Associate Profes-

sor in the Department

of Computer Science,

School of Computing,

National University of

Singapore.He received

his B.Sc.(Hons)degree

in physics in1971from

the University of Singa-pore,the M.Sc.degree in radiation stud-ies in1973from the University of Sur-rey,UK,and the Ph.D.in computer sci-ence in1986from the University of Vir-ginia,USA.He is an associate editor of Pattern Recognition and has served on the program committees of Int.Conf.Pattern Recognition(ICPR)2002,Graphics Recog-nition Workshop(GREC)2001and2003, Web Document Analysis Workshop(WDA) 2001and2003,Document Image Analysis and Retrieval Workshop(DIAR)2003,Int. Conf.Document Analysis and Recognition (ICDAR)2005.

His research interests include document image and text processing,neural networks and genetic programming.He has published more than170research publications in these areas.

c和c语言的文件操作全高效与简洁修订稿

c和c语言的文件操作 全高效与简洁 SANY标准化小组 #QS8QHH-HHGX8Q8-GNHHJ8-HHMHGN#

例一 #include "" int main() {FILE *fp,*f; int a,b,c; fp=fopen("","r"); f=fopen("","w"); fscanf(fp,"%d%d%d",&a,&b,&c); a=5; b=3; fprintf(f,"%d%d",a+b+c,b); fclose(fp); fclose(f); return 0;} 例二、新建一个名为的文件,里面按如图1存储6个数据,然后在同一目录下建立一文件,按图2格式输出这六个数据。 #include "" int main() {FILE *fp,*fpp; int a,b,c,d,e,f; fp=fopen("","r"); fpp=fopen("","w"); fscanf(fp,"%d%d%d%d%d%d",&a,&b,&c,&d,&e,&f); fprintf(fpp,"%d%d%d%d%d%d",a,b,c,d,e,f); fclose(fp); fclose(fpp); return 0;}

c++常用: #include <> ifstream filein(""); // 定义一个文件输入流 ofstream fileout(""); //cout<< --> fileout<< () //文件到末尾,返回非零值 表示输入的数据文件 本地测试的话本来输入的数据就要在这个文件里面测试了 建一个本地的文本,可以用记事本的方式打开 注意:文件输入的话,以后的cin>>都要改成filein>>, cout<<都要改成fileout<< c语言常用: freopen("","r",stdin); //重定向所有标准的输入为文件输入 freopen("","w",stdout);//重定向所有标准的输出为文件输出 fclose(stdout);//输出结束 freopen("","r",stdin); //重定向所有标准的输入为文件输入 freopen("","w",stdout);//重定向所有标准的输出为文件输出 fclose(stdout);//输出结束 第一句的意思就是文件输入,以"读状态",去替换标准的输入 以上如果只是规定用文件输入输出的某一种,那么就只用其中的一种 方法一:最简单的 main() { freopen("","r",stdin);//从中读取数据 freopen("","w",stdout);//输出到文件 } 方法二:速度比第一种快 main() { FILE *in; FILE *out; in=fopen("","r"); //指针指向输入文件

C语言输入输出函数格式详解

1、输入和输出: 输入:输入也叫读,数据由内核流向用户程序 输出:输出也称写、打印,数据由用户程序流向内核 以下介绍一些输入输出函数,尽管都是一些有缺陷的函数,但比较适合初学者使用 2、printf用法(其缺陷在于带缓存) printf输出时必须加上\n(刷新缓存) 解释:第一幅图没有加'\n',不会刷新缓存区,则不会打印出来;第二幅图是因为主函数结束时刷新了缓存区,但由于没有换行符,所以没有换行便显示了后面的内容;第三幅图时正常打印。 变量定义的是什么类型,在printf打印时就需要选择什么格式符,否则会造成数据的精度丢失(隐式强转),甚至会出现错误

(1)格式输出函数的一般形式 函数原型:intprintf(char * format[,argument,…]); 函数功能:按规定格式向输出设备(一般为显示器)输出数据,并返回实际输出的字符数,若出错,则返回负数。 A、它使用的一般形式为:printf("格式控制字符串",输出项列表); B、语句中"输出项列表"列出要输出的表达式(如常量、变量、运算符表达式、函数返回值等),它可以是0个、一个或多个,每个输出项之间用逗号(,)分隔;输出的数据可以是整数、实数、字符和字符串。 C、"格式控制字符串"必须用英文的双引号括起来,它的作用是控制输出项的格式和输出一些提示信息,例如:

inti=97; printf("i=%d,%c\n",i,i);输出结果为:i=97,a 语句printf("i=%d,%c\n",i,i);中的两个输出项都是变量i,但却以不同的格式输出,一个输出整型数97,另一个输出的却是字符a,其格式分别由"%d"与"%c"来控制。 语句printf("i=%d,%c\n",i,i);的格式控制字符串中"i="是普通字符,他将照原样输出;"%d"与"%c"是格式控制符;"\n"是转义字符,它的作用是换行。 (2)格式控制 格式控制由格式控制字符串实现,格式控制字符串由3部分组成:普通字符、转义字符、输出项格式说明。 A、普通字符。普通字符在输出时,按原样输出,主要用于输出提示信息。(空格属于普通字符) B、转义字符。转义字符指明特定的操作,如"\n"表示换行,"\t"表示水平制表等。 \n 换行 \f 清屏并换页 \r 回车 \tTab符 \xhh表示一个ASCII码用16进表示,其中hh是1到2个16进制数 C、格式说明部分由"%"和"格式字符串"组成,他表示按规定的格式输出数据。格式说明的形式为:% [flags] [width] [.prec] [F|N|h|l] type||%[标志][输出最少宽度][.精度][长度]类型 各部分说明如下: a、[]表示该项为可选项,即可有可无,如printf("%d",100); b、%:表示格式说明的起始符号,不可缺少。 c、flags为可选择的标志字符,常用的标志字符有: - ——左对齐输出,默认为右对齐输出 + ——正数输出加号(+),负数输出减号(-),如省略正数的+默认不显示 0 ——输出数值时指定左面不使用的空位置自动填0,如省略表示指定空位不填 # ——对c、s、d、u类无影响;对o类,在输出时加前缀0(数字0,八进制表示符);对x类,在输出时加前缀0x(字母为小写);对X类,在输出时加前缀0X(字母为大写);对e,g,f类当结果有小数时才给出小数点。 d、width为可选择的宽度指示符。 用十进制正整数表示设置输出值得最少字符个数。不足则补空格,多出则按实际输出,默认按实际输出,例如: printf("%8d\n",100); (前面空五格)100 printf("%08d\n",100); (前面5个0)100 printf("%6d\n",100); (前面空三格)100 printf("%-8d\n",100); 100(后面空五格) printf("%+8\n",100); (前面空四格)+100 e、[.prec]为可选的精度指示符 用“小数点”加“十进制正整数”表示,对“整数”、“实数”和“字符串”的输出有如下

C语言读写文件操作

C语言读写文件操作 #include #include #include FILE *stream;//, *stream2; FILE *stream2; void main( void ) { int numclosed; char *list; list="这个程序由czw编写"; //试图打开文件data.txt,如果该文件不存在,则自动创建 if( (stream= fopen( "data.txt", "r" )) == NULL ) { printf( "试图打开'data.txt'\n" ); printf( "'data.txt'不存在\n" ); printf( "'data.txt'被创建\n" ); } else printf( "'data.txt'被打开\n" ); //以写入方式打开 if( (stream2 = fopen( "data.txt", "w+" )) == NULL ) printf( "'data.txt'不存在\n" ); else { printf( "'data.txt'成功被打开\n" ); fwrite(list,strlen(list),30,stream2); printf("写入数据成功\n"); } //如果文件data.txt存在,就会打开成功,则stream!=NULL,这时就关闭stream if (stream!=NULL) if( fclose( stream) ) printf( "文件流 stream 被关闭\n" ); //关闭所有打开的文件流,返回关闭的文件流个数 numclosed = _fcloseall( );

C语言编程常用头文件

C语言编程常用头文件 C语言常用头文件总结 序号库类别头文件 1 字符处理ctype.h 2 地区化local.h 3 数学函数math.h 4 信号处理signal.h 5 输入输出stdio.h 6 实用工具程序stdlib.h 7 字符串处理string.h 字符处理函数 本类别函数用于对单个字符进行处理,包括字符的类别测试和字符的大小写转换头文件ctype.h 函数列表<> 函数类别函数用途详细说明 字符测试是否字母和数字isalnum 是否字母isalpha 是否控制字符iscntrl 是否数字isdigit 是否可显示字符(除空格外)isgraph 是否可显示字符(包括空格)isprint 是否既不是空格,又不是字母和数字的可显示字符ispunct 是否空格isspace 是否大写字母isupper 是否16进制数字(0-9,A-F)字符isxdigit 字符大小写转换函数转换为大写字母toupper 转换为小写字母tolower 地区化 本类别的函数用于处理不同国家的语言差异。

头文件local.h 函数列表 函数类别函数用途详细说明 地区控制地区设置setlocale 数字格式约定查询国家的货币、日期、时间等的格式转换localeconv 数学函数 本分类给出了各种数学计算函数,必须提醒的是ANSI C标准中的数据格式并不符合IEEE754标准,一些C语言编译器却遵循IEEE754(例如frinklin C51) 头文件math.h 函数列表 函数类别函数用途详细说明 错误条件处理定义域错误(函数的输入参数值不在规定的范围内) 值域错误(函数的返回值不在规定的范围内) 三角函数反余弦acos 反正弦asin 反正切atan 反正切2 atan2 余弦cos 正弦sin 正切tan 双曲函数双曲余弦cosh 双曲正弦sinh 双曲正切tanh 指数和对数指数函数exp 指数分解函数frexp 乘积指数函数fdexp 自然对数log 以10为底的对数log10 浮点数分解函数modf 幂函数幂函数pow 平方根函数sqrt 整数截断,绝对值和求余数函数求下限接近整数ceil 绝对值fabs 求上限接近整数floor 求余数fmod 本分类函数用于实现在不同底函数之间直接跳转代码。头文件setjmp.h io.h

C语言 文件操作

C语言中的文件操作 12.1请编写一个程序,把一个文件的内容复制到另一个文件中。 程序如下: #include main() { char ch; FILE *fp1; FILE *fp2; if((fp1=fopen("C:\\Users\\acer\\Documents\\1.txt","r"))==NULL) { printf("The file 1.txt can not open!"); exit(0); } if((fp2=fopen("C:\\Users\\acer\\Documents\\2.txt","w"))==NULL) { printf("The file 2.txt can not open!"); exit(0); } ch=fgetc(fp1); while(!feof(fp1)) { fputc(ch,fp2); ch=fgetc(fp1); } fclose(fp1); fclose(fp2); } 运行结果: 12.3请编写一个程序,比较两个文件,如果相等则返回0;否则返回1。

程序如下:#include main() { FILE *f1,*f2; char a,b,c; int x; printf("input strings for file1\n"); f1=fopen("file1","w"); while((c=getchar())!= EOF) putc(c,f1); fclose(f1); printf("output strings for file1\n"); f1=fopen("file1","r"); while((c=getc(f1))!= EOF) printf("%c",c); fclose(f1); printf("\n\ninput strings for file2\n"); f2=fopen("file2","w"); while((c=getchar())!= EOF) putc(c,f2); fclose(f2); printf("\noutput strings for file2\n"); f1=fopen("file2","r"); while((c=getc(f2))!= EOF) printf("%c",c); fclose(f2); f2=fopen("file2","r"); getch(); }

C语言输入输出函数printf与scanf的用法格式

C 语言输入输出函数printf 与scanf 的用法格式 printf()函数用来向标准输出设备(屏幕)写数据; scanf() 函数用来从标准输入设备(键盘)上读数据。下面详细介绍这两个函数的用法。 一、printf()函数 printf()函数是格式化输出函数, 一般用于向标准输出设备按规定格式输出信息。在编写程序时经常会用到此函数。printf()函数的调用格式为: printf("<格式化字符串>", <参量表>); 其中格式化字符串包括两部分内容: 一部分是正常字符, 这些字符将按原样输出; 另一部分是格式控制字符, 以"%"开始, 后跟一个或几个控制字符,用来确定输出内容格式。 参量表是需要输出的一系列参数,可以是常量、变量或表达式,其个数必须与格式化字符串所说明的输出参数个数一样多, 各参数之间用","分开, 且顺序一一对应, 否则将会出现意想不到的错误。 例如: printf("a=%d b=%d",a,b); 1. 格式控制符Turbo C 2.0提供的格式化规定符如下: 格式控制字符 参量表 正常字符

━━━━━━━━━━━━━━━━━━━━━━━━━━ 符号作用 ────────────────────────── %d 十进制有符号整数 %u 十进制无符号整数 %f 浮点数 %s 字符串 %c 单个字符 %p 指针的值 %e,%E 指数形式的浮点数 %x, %X 无符号以十六进制表示的整数 %o 无符号以八进制表示的整数 %g,%G 自动选择合适的表示法 ━━━━━━━━━━━━━━━━━━━━━━━━━━ printf的附加格式说明字符 字符说明 l 用于长整型数或双精度实型,可加在格式 符d、o、x、u和f前面 m(代表一个正整数据最小输出显示宽度

C语言中关于文件操作

C语言中关于文件操作 C语言中的文件 C语言把文件看作一个字节的序列 C语言对文件的存取是以字节为单位的文本文件(ASCII文件) 按数据的ASCII形式存储 二进制文件 按数据在内存中的二进制形式存储 文本文件和二进制文件 缓冲文件系统 文件类型指针 FILE类型 保存被使用的文件的有关信息 所有的文件操作都需要FILE类型的指针 FILE是库文件中定义的结构体的别名注意不要写成struct FILE 举例 FILE *fp;

FILE类型 typedef struct { short level; /*缓冲区满空程度*/ unsigned flags; /*文件状态标志*/ char fd; /*文件描述符*/ unsigned char hold; /*无缓冲则不读取字符*/ short bsize; /*缓冲区大小*/ unsigned char *buffer; /*数据缓冲区*/ unsigned char *curp; /*当前位置指针*/ unsigned istemp; /*临时文件指示器*/ short token; /*用于有效性检查*/ } FILE; 文件的打开 (fopen函数) 函数原型 FILE *fopen(char *filename, char *mode); 参数说明 filename: 要打开的文件路径 mode: 打开模式 返回值 若成功,返回指向被打开文件的指针 若出错,返回空指针NULL(0) 打开模式描述 r 只读,打开已有文件,不能写 w 只写,创建或打开,覆盖已有文件 a 追加,创建或打开,在已有文件末尾追加 r+ 读写,打开已有文件 w+ 读写,创建或打开,覆盖已有文件 a+ 读写,创建或打开,在已有文件末尾追加 t 按文本方式打开 (缺省) b 按二进制方式打开 文件的打开模式 文件的关闭 (fclose函数) 函数原型 int fclose(FILE *fp); 参数说明 fp:要关闭的文件指针 返回值 若成功,返回0 若出错,返回EOF(-1) 不用的文件应关闭,防止数据破坏丢失

C语言数据的输入与输出

C语言数据的输入与输出 一.Printf函数 函数原型在头文件stido.h中(使用时可以不包括) printf函数的返回值等于成功输入得数据的个数 1.printf函数得一般调用形式 printf(格式控制字符串,输出项列表) 格式控制字符串包括: (1)格式指示符 格式:%[标志][宽度][.精度][[h|l]<类型>] (2)转义字符 如:'\n','\0' (3)普通字符 如:printf("max=%d,min=%d\n",a,b); “max=”和“,min=”是普通字符;“%d”是格式指示符;“\n”是转义字符; a和b是输出类表中的输出项(可以是常量、变量、或表达式)。 2.print函数中常用得格式控制说明 (1)数据类型控制字符 格式字符说明 %c输出一个字符 %d或%i以十进制形式输出一个带符号得整数(正数不输出符号) %u以十进制形式输出无符号整数。若有符号则自动将符号位转化为数值位,%o 和%x也具有类似得功能 %o以八进制形式输出无符号整型数(不带前导0) %x或%X以十六进制形式输出无符号整型数(不带前导0x或0X)。对于十六进制数中的字符abcdef,用%x时输出得是小写字母,%X时输出的是大写字母 %f以小数形式输出单精度或双精度实数 %e或%E以指数形式输出单精度或双精度实数 %g或%G有系统决定是采用%f还是%e格式,以使输出结果的总宽度最小,并且不输出没意义的0 %s依次输出字符串中得各个字符,知道遇到'\0'是结束 (2)数据类型修饰符 数据类型修饰符在%和数据类型控制符之间 如:长整型"%ld",短整型"%hd" (3)输出数据所占得宽度与精度

c语言中目录及文件操作

1. 错误处理与错误号 cat /usr/include/asm-generic/errno-base.h #define EPERM 1 /* Operation not per mitted */ #define ENOENT 2 /* No suc h file or director y */ #define ESRCH 3 /* No suc h proc ess */ #define EINTR 4 /* Interrupted s y stem call */ #define EIO 5 /* I/O error */ #define ENXIO 6 /* No such dev ice or address */ #define E2BIG 7 /* Argument list too long */ #define ENOEXEC 8 /* Ex ec format error */ #define EBADF 9 /* Bad file number */ #define ECHILD 10 /* No c hild process es */ #define EAGAIN 11 /* Try again */ #define ENOMEM 12 /* Out of memor y */ #define EACCES 13 /* Per mission denied */ #define EFAULT 14 /* Bad address */ #define ENOTBLK 15 /* Bl oc k dev ice required */ #define EBUSY 16 /* Dev ice or resource bus y */ #define EEXIST 17 /* File ex ists */ #define EXDEV 18 /* Cross-dev ice link */ #define ENODEV 19 /* N o suc h dev ice */ #define ENOTDIR 20 /* Not a direc tor y */ #define EISDIR 21 /* Is a director y */ #define EINVAL 22 /* Inv alid argument */ #define ENFILE 23 /* File table ov erflow */ #define EMFILE 24 /* Too many open files */ #define ENOTTY 25 /* Not a ty pewriter */ #define ETXTBSY 26 /* Text file bus y */ #define EFBIG 27 /* File too large */ #define ENOSPC 28 /* N o space l eft on dev ic e */ #define ESPIPE 29 /* Illegal seek */ #define EROFS 30 /* Read-onl y file s y s tem */ #define EMLINK 31 /* T oo many link s */ #define EPIPE 32 /* Brok en pipe */ #define EDOM 33 /* Math argument out of domain of func */ #define ERANGE 34 /* M ath r esult not repres entabl e */ 1.1 用错误常数显示错误信息 函数strerror()可以把一个错误常数转换成一个错误提示语句。char *strerror(int errnum); #include #include #include int main()

一风水学入门讲义

风水学入门讲义(1) 2009-03-17 19:12:54来自: 圆圆的圆(去云游吧~) 编者按:人类一直在追求和平的环境和健康的体魄,然而万事万物都是普遍联系的,一些看似不相干的事,却往往联系得如此紧密,大大超出了人类现有的认识水平。对于许多民族和国家的人来说,不会认为自身的健康和自己的住宅以及过世家人陵墓的方位、地势等问题有太大的关系。但在中国,这个问题已经历了数千年的验证。也许用全息论的观点,可以做出一点勉强的解释。 前言 中国传统文化浩如烟海,内容繁多,其中包含的“风水学”更显得博大精深,为中国炎黄子孙几千年来的经济、文化以及科技的发达,起着十分重要的作用。它像一颗灿烂的明珠,永放光芒。自古以来,人们用风水知识进行趋吉避凶、解除病苦和贫困,乞求子孙的繁荣、福禄的旺盛、仕途的顺昌、爵位的升迁等方面,均依赖选择、改变风水来实现。因此,古人总结为:风水之道“夺神功,改天命。”风水知识浩如烟海,笔者只能以所学之微不足道与各位同仁交流。 一、风水学的基本含义 二、谈风水学的起源及其历史发展 三、浅谈风水知识的基本内容 (一)形势 (二)理气 四、罗盘的发明和应用 五、风水对人体生命的影响 六、学习风水的现实意义 一、风水学的基本含义风水这一名字起始于中国晋朝托名为郭璞(公元276——324年)所著的《葬书》。

书曰:“葬者,乘生气也。”经曰:“气乘风则散,界水则止,古人聚之使不散,行之便有止。故谓之…风水?”。风水之法,“得水为上,藏风次之”。又曰:“浅深得乘,风水自成。夫阴阳之气,噫而为风,升而为云,降而为雨,行乎地中,而为生气。夫土者之体,有土斯有气,气者水之母,有气斯有水。”在这以前人们把风水文化称为堪舆学。《玄灵修真辞典》中说:“堪舆,属传统文化方术之一,堪舆风水之术,在我国民间流传较广,①天地的代称,《淮南子》许慎注:“堪,天道也。舆,地道也。”②即风水,指住宅基地或坟地形势,也指相宅、相墓之法。还可理解为:堪者仰观天象,舆者俯察地气。《辞源》说:“风水,指宅地或坟地的地势,方向等。”《辞海》说:“风水。也叫堪舆。……也指相宅、相墓之法。” 风水的核心内容,是人们对居住环境进行选择和处理的一种学问。风水理论的核心是人,它追求的目标是人与自然环境的和谐与统一。在天、地、人三者之间,既强调人的主体地位,又不偏废自然环境和地理环境,注重天时、地理、人和之间的协和自然美。故有人又称风水学是“地球磁场与人类关系的学问”,也有人称它为“美学”,还有人称它为“环境学”、“元素学”、“软科学”等等。其范围包括住宅、宫室、寺观、陵墓、村落、城市诸方面,其中涉及陵墓的称为阴宅,涉及人居住方面的称为阳宅。风水对居住环境的影响主要表现在:一是对基址的选择,即追求一种在生理和心理上都能满足的地形条件;二是对居住形态的处理,包括自然环境的利用与改造,房屋的朝向、位置、高低大小、出入口选择,道路、供水、排水等因素的安排;三是在上述基础上添加某种符号(即理气),以满足人们避凶就吉的心理需求(摘《风水探源序》)。风水学有时还叫地理风水或简称地理学,很早以前还有称其为青乌、青乌术的等等。 二、谈风水学的起源及其历史发展中国人对地理风水的意识产生很早,“上古之时,人民少而禽兽众,人民不胜禽兽虫蛇。”在那种危机四伏的自然条件下,人们先以树木为巢舍,后来在了解自然和改造自然的实践中,首先对居住环境进行了改造。大约在六七千年前的原始村落——半坡村遗址,“选址多位于发育较好的马兰阶地上,特别是河流交汇处……离河较远的,则多在泉近旁。”西安半坡遗址就座落于渭河的支流浐河阶地上方,地势高而平缓,土壤肥沃,适宜生活和开垦。由此可知,那时的部落对居住环境已有所选择,并且这种选择是建立在对地理环境和风水知识有所认识的基础之上,懂得居住方位与日照和风寒的关系。如半坡中间的房子朝东,围绕大房子氏族成员的房子多朝南,即表现出“喜东南,厌西北”的特点,遗址四周有防御性豪沟;沟北、南有公共墓地,居住与墓地分开(《风水探源》)。 到了殷周时期,已有卜宅之文。如周朝公刘率众由邰迁豳,他亲自勘察宅茔,“既景乃冈,相其阴阳,观其流泉。”(《诗经?公刘》)

c语言程序中文件的操作

文件操作函数 C语言 (FILE fputc fgetc fputs fgets fscanf fprintf) 在ANSI C中,对文件的操作分为两种方式,即流式文件操作和I/O文件操作,下面就分别介绍之。 一、流式文件操作 这种方式的文件操作有一个重要的结构FILE,FILE在中定义如下: typedef struct { int level; /* fill/empty level of buffer */ unsigned flags; /* File status flags */ char fd; /* File descriptor */ unsigned char hold; /* Ungetc char if no buffer */ int bsize; /* Buffer size */ unsigned char _FAR *buffer; /* Data transfer buffer */ unsigned char _FAR *curp; /* Current active pointer */ unsigned istemp; /* Temporary file indicator */ short token; /* Used for validity checking */ } FILE; /* This is the FILE object */ FILE这个结构包含了文件操作的基本属性,对文件的操作都要通过这个结构的指针来进行,此种文件操作常用的函数见下表函数功能 fopen() 打开流 fclose() 关闭流 fputc() 写一个字符到流中

C语言程序设计第一章作业

一、单选题(每小题10分,共100分,得分70 分) 1、1.关于C程序的构成描述,_________是不正确的。 A、一个源程序至少且仅包含一个main函数,也可包含一个main函数和若干个其他函数。 B、函数由函数首部和函数体两部分组成,二者缺一不可。 C、函数首部通常是函数的第1行,包括:函数属性、函数类型、函数名、函数参数等,不管有无函数参数,都必须用一对圆括号括起来。 D、函数体通常在函数首部下面,用一对花括号将声明部分和执行部分括起来,但不能为空。 你的回答:D (√) 参考答案:D 2、2.C程序中,不管是数据声明还是语句,都必须有一个结束符,它是C语句的必要组成部分,该符号是_________。 A、逗号“,” B、句号“。” C、分号“;” D、单撇号“’” 你的回答:C (√) 参考答案:C 3、3.下列关于C程序的运行流程描述,______是正确的。 A、编辑目标程序、编译目标程序、连接源程序、运行可执行程序。 B、编译源程序、编辑源程序、连接目标程序、运行可执行程序。 C、编辑源程序、编译源程序、连接目标程序、运行可执行程序。 D、编辑目标程序、编译源程序、连接目标程序、运行可执行程序。 你的回答:C (√) 参考答案:C 4、5.描述或表示算法有多种方法,______不是常用的表示方法。 A、自然语句 B、流程图或N-S图 C、伪代码 D、效果图 你的回答:D (√) 参考答案:D

5、7.C语言是一种结构化的程序设计语言,任何程序都可以将模块通过3种基本的控制结构进行组合来实现,这三种基本的控制结构是指______。 A、分支结构、循环结构、函数结构 B、顺序结构、选择结构、函数结构 C、顺序结构、分支结构、循环结构 D、以上描述都不正确 你的回答:D (×) 参考答案:C 6、下列关于算法特性的描述,______是不正确的。 A、有穷性:指一个算法应该包含有限的操作步骤,而不能是无限的。 B、确定性:指算法的每一个步骤都应当是确定的,不应该是含糊的、模棱两可的。 C、有效性:指算法中的每一个步骤都应当能有效地执行,并得到确定的结果。 D、输入/输出性:指算法中可以有输入/输出操作,也可以没有输入/输出操作。 你的回答:D (√) 参考答案:D 7、关于运行一个C程序的描述,______是正确的。 A、程序总是从main()函数处开始运行,当main()函数执行结束时,程序也就执行结束。 B、程序总是从main()函数处开始运行,当调用其它函数时,也可在其它函数中执行结束。 C、当程序中无main()函数时,可以设置一个主控函数来代替main()函数,从而达到运行程序的目的。 D、以上描述都不正确。 你的回答:B (×) 参考答案:A 8、下列关于C程序中复合语句的描述,______是正确的。 A、用一对圆括号“( )”将若干语句顺序组合起来就形成一个复合语句。 B、用一对大括号“{ }”将若干语句顺序组合起来就形成一个复合语句。 C、用一对大括号“[ ]”将若干语句顺序组合起来就形成一个复合语句。 D、以上描述都不正确。 你的回答:B (√) 参考答案:B 9、一个C源程序文件的扩展名是______。

C语言中文件_数据的输入输出_读写

C语言中文件,数据的输入输出,读写. 文件是数据的集合体,对文件的处理过程就是对文件的读写过程,或输入输出过程。 所谓文件是指内存以外的媒体上以某种形式组织的一组相关数据的有序集合。文件分类: 顺序文件,随机文件。 文本文件和二进制文件。 文本文件也称为ASCII文件,在磁盘中存放时每个字符对应一个字节,用于存放对应的ASCII码。 文本文件可以在屏幕上按字符显示,源程序文件就是文本文件,由于是按字符显示,所以能读懂文件内容。 二进制文件是按二进制编码方式来存放的。这里主要讨论文件的打开,关闭,读,写,定位等操作。 文件的存取方法 C程序的输入输出(I/O)有两种方式:一种称为标准I/O或流式I/O,另一种称为低级I/O。流式I/O是依靠标准库函数中的输入输出函数实现的。低级I/O利用操作系统提供的接口函数(称为底层接口或系统调用)实现输入输出,低级I/O 主要提供系统软件使用。 在C语言中用一个FILE类型的指针变量指向一个文件,(FILE类型是系统在stdio.h中定义的描述已打开文件的一种结构类型),这个指针称为文件指针。FILE *指针变量标识符; 如 FILE *fp; 文件的打开与关闭 所谓打开文件,指的是建立文件的各种有关信息,并使文件指针指向该文件,以便对它进行操作。 关闭文件则是断开指针与文件之间的联系,也就禁止再对该文件进行操作。 1、fopen 函数原型:FILE *fopen(const char *filename,const char *mode); Fopen函数用来打开一个文件,前一部分用来说明文件路径及文件名,后一部分mode指出了打开文件的存取方式;返回值是被打开文件的FILE型指针,若打开失败,则返回NULL。打开文件的语法格式如下: 文件指针名=fopen(文件名,使用文件方式); 文件指针名必须被说明为FILE类型的指针变量。 FILE *fp; fp=fopen(“C:\\Windowss\\abc.txt”,”r”); 注意用两个反斜杠\\来表示目录间的间隔符。 存取文件的模式是一个字符串,可以由字母r,w,a,t,b及+组合而成,各字符的含

C语言格式输入函数SCANF()详解

scanf函数称为格式输入函数,即按用户指定的格式从键盘上把数据输入到指定的变量之中。 scanf函数的一般形式 scanf函数是一个标准库函数,它的函数原型在头文件“stdio.h”中。与printf函数相同,C语言也允许在使用scanf函数之前不必包含stdio.h文件。scanf函数的一般形式为: scanf(“格式控制字符串”,地址表列); 其中,格式控制字符串的作用与printf函数相同,但不能显示非格式字符串,也就是不能显示提示字符串。地址表列中给出各变量的地址。地址是由地址运算符“&”后跟变量名组成的。 例如:&a、&b分别表示变量a和变量b的地址。 这个地址就是编译系统在内存中给a、b变量分配的地址。在C 语言中,使用了地址这个概念,这是与其它语言不同的。应该把变量的值和变量的地址这两个不同的概念区别开来。变量的地址是C编译系统分配的,用户不必关心具体的地址是多少。 变量的地址和变量值的关系 在赋值表达式中给变量赋值,如: a=567; 则,a为变量名,567是变量的值,&a是变量a的地址。 但在赋值号左边是变量名,不能写地址,而scanf函数在本质上也是给变量赋值,但要求写变量的地址,如&a。这两者在形式上是不同的。&是一个取地址运算符,&a是一个表达式,其功能是求变量的

地址。 【例4-7】 #include int main(void){ int a,b,c; printf("input a,b,c\n"); scanf("%d%d%d",&a,&b,&c); printf("a=%d,b=%d,c=%d",a,b,c); return0; } 在本例中,由于scanf函数本身不能显示提示串,故先用printf 语句在屏幕上输出提示,请用户输入a、b、c的值。执行scanf语句,等待用户输入。在scanf语句的格式串中由于没有非格式字符在 “%d%d%d”之间作输入时的间隔,因此在输入时要用一个以上的空格或回车键作为每两个输入数之间的间隔。如: 789 或 7 8 9 格式字符串 格式字符串的一般形式为:

C语言文件操作

文件 教学目标:了解文件的概念;熟练掌握缓冲文件系统和非缓冲文件系统的概念及其区别;熟练掌握文件类型指针的概念;熟练掌握打开文件和关闭文件的方法;熟练掌握利用标准I/O提供的四种读写文件的方法对文件进行顺序读写和随机读写;了解文件操作的出错检测方法。 教学重点:缓冲文件系统与非缓冲文件;文件类型指针;文件的打开与关闭;利用标准I/O提供的四种读写文件的方法对文件进行顺序读写和随机读写。 教学难点:缓冲文件系统与非缓冲文件;利用标准I/O提供的四种读写文件的方法对文件进行顺序读写操作和随机读写操作。 §11.1 文件概述 11.1.1 文件的概念 所谓文件是指记录在外部存储介质上的数据集合。例如,用EDLN编辑好的一个源程序就是一个文件,把它存储到磁盘上就是一个磁盘文件。从计算机上输出一个源文件到打印机,这也是一个文件。广义上说,所有输入输出设备都是文件。例如,键盘、显示器、打印机都是文件。计算机以这些设备为对象进行输入输出,对这些设备的处理方法统一按文件处理。 计算机中的文件可以从不同角度进行分类: (1)按文件介质:磁带文件、磁盘文件和卡片文件等。 (2)按文件内容:源程序文件、目标文件、可执行文件和数据文件等。 (3)按文件中数据的组织形式:二进制文件和文本文件。 文本文件是指文件的内容是由一个一个的字符组成,每一个字符一般用该字符对应的ASCII码表示。例如,一个实数136.56占6个字符。二进制文件是以数据在内存中的存储形式原样输出到磁盘上去。例如,实数136.56在内存中以浮点形式存储,占4个字符,而不是6个字节。若以二进制形式输出此数,就将该4个字节按原来在内存中的存储形式送到磁盘上去。不管一个实数有多大,都占4个字节。一般来说,文本文件用于文档资料的保存,方便用户阅读理解;二进制文件节省存储空间而且输入输出的速度比较快。因为在输出时不需要把数据由二进制形式转换为字符代码,在输入时也不需要把字符代码先换成二进制形式然后存入内存。如果存入磁盘中的数据只是暂存的中间结果数据,以后还要调入继续处理,一般用二进制文件以节省时间和空间。如果输出的数据是准备作为文档供给人们阅读的,一般用字符代码文件,它们通过显示器或打印机转换成字符输出,比较直观。 11.1.2 缓冲文件系统和非缓冲文件系统 目前C语言所使用的磁盘文件系统有两大类:一类称为缓冲文件系统,又称为标准文件系统或高层文件系统;另一类称非缓冲文件系统,又称为低层文件系统。 (1)缓冲文件系统的特点 对程序中的每一个文件都在内存中开辟一个“缓冲区”。从磁盘文件输入的数据先送到“输入缓冲区”,然后再从缓冲区依次将数据送给接收变量。在向磁盘文件输出数据时,先将程序数据区中变量或表达式的值送到“输出缓冲区”中,然后

风水入门基础知识

风水入门基础知识 包头市传统文化发展促进会 一、罗盘的发明和应用 罗盘的最重要部分是指南针,指南针为中国四大发明之一。约四千余年前,黄帝战蚩尤,蚩尤放大雾以迷,黄帝无法辨别方向,幸得身边的风后将他多年研制成的指南针装入兵车之中,名为指南车,黄帝借此一举战败了蚩尤。意大利航海家哥伦布借助中国的指南针而发现了新大陆。 科学家发现,地球上每个空间或每个物体都存在不同质量的电磁波,它们有吉凶之别,对人体存在一定影响,若一个地方的电磁波与人的生命磁向或屋的磁波十分配合,为吉方;相反,两者发生排斥,即为凶方。古贤借助指南针测定地方的磁向,再配上时间的推算,便推出吉凶的方位。这样根据需要发明了罗盘。 罗盘一般为圆形,圆心装入指南针,红头一端指南,另一端指北。即用于子午线定位。然后将二十四字按八方位置平均刻入,即罗盘之地盘。用于立穴定向、格龙。后来发展为将先天八卦、后天八卦、六十四卦、六十龙透地、九星、黄泉煞、二十八宿、一百二十分金等内容装入,风水家用它来勘察阴阳二宅,已成为风水师唯一使用的工具。是经天纬地的仪器。罗盘目前主要有三合盘(即分正针、中针、缝针),还有三元卦盘,又叫挨星盘,装入先天六十四卦、三百八十四爻分度。罗盘已担负着风水师趋吉避凶的全部使命。 二、风水对人体生命的影响 择风水,是人们生活中必不可少的一项活动。只要是人,不管是活着的人,还是故去之人,均离不开风水的影响,它直接左右着人的寿数、健康、人事关系等等。不管你是在意还是无意,是相信还是不信,它都在对你起着作用。在卜卦预测中有一个基本原则,那就是:“心诚则灵”、“信则有,不信则无。”而风水则不一样,它不管你是心也好,心不诚也好,信亦好,不信亦好,它都客观存在着,只要你一住进房子,你就在这栋房子的保护下生存。阳宅的房子也有“生命”,它的生命强弱与吉凶,取决于周围环境、形态、坐向、通风、采光以及其“出生

c语言输入输出格式集合

1.转换说明符 %a(%A)浮点数、十六进制数字和p-(P-)记数法(C99) %c 字符 %d 有符号十进制整数 %f 浮点数(包括float和doulbe) %e(%E)浮点数指数输出[e-(E-)记数法] %g(%G)浮点数不显无意义的零"0" %i 有符号十进制整数(与%d相同) %u 无符号十进制整数 %o 八进制整数 e.g. 0123 %x(%X)十六进制整数0f(0F) e.g. 0x1234 %p 指针 %s 字符串 %% "%" 2.标志 左对齐:"-" e.g. "%-20s" 右对齐:"+" e.g. "%+20s" 空格:若符号为正,则显示空格,负则显示"-" e.g. "% 6.2f" #:对c,s,d,u类无影响;对o类,在输出时加前缀o;对x类,在输出时加前缀0x; 对e,g,f 类当结果有小数时才给出小数点。 3.格式字符串(格式) 〔标志〕〔输出最少宽度〕〔。精度〕〔长度〕类型 "%-md" :左对齐,若m比实际少时,按实际输出。

"%m.ns":输出m位,取字符串(左起)n位,左补空格,当n>m or m省略时m=n e.g. "%7.2s" 输入CHINA 输出" CH" "%m.nf":输出浮点数,m为宽度,n为小数点右边数位 e.g. "%3.1f" 输入3852.99 输出3853.0 长度:为h短整形量,l为长整形量 printf的格式控制的完整格式: % - 0 m.n l或h 格式字符 下面对组成格式说明的各项加以说明: ①%:表示格式说明的起始符号,不可缺少。 ②-:有-表示左对齐输出,如省略表示右对齐输出。 ③0:有0表示指定空位填0,如省略表示指定空位不填。 ④m.n:m指域宽,即对应的输出项在输出设备上所占的字符数。N指精度。用于说明输出的实型数的小数位数。为指定n时,隐含的精度为n=6位。 ⑤l或h:l对整型指long型,对实型指double型。h用于将整型的格式字符修正为short型。 --------------------------------------- 格式字符 格式字符用以指定输出项的数据类型和输出格式。 ①d格式:用来输出十进制整数。有以下几种用法: %d:按整型数据的实际长度输出。 %md:m为指定的输出字段的宽度。如果数据的位数小于m,则左端补以空格,若大于m,则按实际位数输出。

相关主题
文本预览
相关文档 最新文档