当前位置:文档之家› 2003 Special issue On the quality of ART1 text clustering

2003 Special issue On the quality of ART1 text clustering

2003 Special issue On the quality of ART1 text clustering
2003 Special issue On the quality of ART1 text clustering

2003Special issue

On the quality of ART1text clustering

Louis Massey

Royal Military College of Canada,Kingston,Ont.,Canada K7K 7B4

Abstract

There is a large and continually growing quantity of electronic text available,which contain essential human and organization knowledge.An important research endeavor is to study and develop better ways to access this knowledge.Text clustering is a popular approach to automatically organize textual document collections by topics to help users ?nd the information they need.Adaptive Resonance Theory (ART)neural networks possess several interesting properties that make them appealing in the area of text clustering.Although ART has been used in several research works as a text clustering tool,the level of quality of the resulting document clusters has not been clearly established yet.In this paper,we present experimental results with binary ART that address this issue by determining how close clustering quality is to an upper bound on clustering quality.

Crown Copyright q 2003Published by Elsevier Science Ltd.All rights reserved.

Keywords:Adaptive resonance theory;Text clustering;Text categorization

1.Introduction

We consider the application of clustering to the self-organization of a textual document collection.Clustering is the operation by which similar objects are grouped together in an unsupervised manner (Jain,Murty,&Flynn,1999;Kaufman &Rousseeuw,1990).Hence,when clustering textual documents,one is hoping to form sets of documents with similar content.Instead of exploring the whole collection of documents,a user can then browse the resulting clusters to identify and retrieve relevant docu-ments.As such,clustering provides a summarized view of the information space by grouping documents by topics.Clustering is often the only viable solution to organize large text collections into topics.The advantage of clustering is realized when a training set and classes de?nitions are unavailable,or when creating them is either cost prohibitive due to the collection shear size or unrealistic due to the rapidly changing nature of the collection.

We speci?cally study text clustering with Adaptive Resonance Theory (ART)(Carpenter &Grossberg,1995;Grossberg,1976)neural networks.ART neural networks are known for their ability to perform on-line and incremental clustering of dynamic datasets.Contrary to most other types of arti?cial neural networks such as the popular Back-propagation Multi-Layer Perceptron (MLP)(Rumelhart,

Hinton,&Williams,1986),ART is unsupervised and allows for plastic yet stable learning.ART detects similarities among data objects,typically data points in an N-dimensional metric space.When novelty is detected,ART adaptively and autonomously creates a new category.Another advantageous and distinguishing feature of ART is its ability to discover patterns at various levels of generality.This is achieved by setting the value of a parameter known as vigilance and denoted by r ;r [e0;1 :ART stability and plasticity properties as well as its ability to process dynamic data ef?ciently make it an attractive candidate for clustering large,rapidly changing text collections in real-life environments.Although ART has been investigated previously as a means of clustering text data,due to numerous variations in ART implementations,experimental data sets and quality evaluation method-ologies,it is not clear whether ART performs well in this type of application.Since ART seems to be a logical and appealing solution to the rapidly growing amount of textual electronic information processed by organizations,it would be important to eliminate any confusion surrounding the quality of the text clusters it produces.In this paper,we present experimental results with a binary ART neural network (ART1)that address this issue by determining how close clustering quality achieved with ART is to an expected upper bound on clustering quality.We will consider other versions of ART in future work.

0893-6080/03/$-see front matter Crown Copyright q 2003Published by Elsevier Science Ltd.All rights reserved.

doi:10.1016/S0893-6080(03)00088-1

E-mail address:massey-l@rmc.ca (L.Massey).

2.Related work

We consider one of the many applications of text clustering in the?eld of Information Retrieval(IR) (VanRijsbergen,1979),namely clustering that aims at self-organizing textual document collections.This appli-cation of text clustering can be seen as a form of classi?cation by topics,hence making it the unsupervised counterpart to Text Categorization(TC)(Sebastiani,2002). Text self-organization has become increasingly popular due to the availability of large document collections that change rapidly and that are quasi-impossible to organize manually.

A typical example is the organization of web documents according to some topics hierarchy like the Yahoo e hierarchy(Heuser&Rosenstiel,2000).Even TC becomes unsuitable in such environments because supervised learn-ing of classi?ers is not plastic and thus requires retraining upon detection of novelty.Representative work on text clustering includes,among many others,(Cutting,Karger, Pedersen,&Tukey,1992;Kohonen et al.,2000;Steinbach, Karypis,&Kumar,2000).There has also been interesting work done on incremental and on-line text clustering with non-neural approaches(see for instance(Can(1993)and Wong and Fu(2000)).

MacLeod and Robertson(1991)were as far as we know the?rst researchers to consider ART for text clustering. They used a modi?ed version of ART1in which the similarity computation and weight updates involved domain and task speci?c knowledge.The inclusion of this knowl-edge makes the ART implementation more complex but may advantage it over a basic form of ART1.We intend to test this type of ART network in future work,but for now we are interested in establishing the baseline quality that can be achieved with a more basic implementation.The relatively small Keen and Cran?eld text collections(800and1400 documents,respectively),were used to test MacLeod’s algorithm.ART clustering quality was evaluated with the F measure computed on the results of provided queries.This approach to cluster evaluation does not allow for a comprehensive evaluation of the resulting cluster structure, since it only considers the set of queries.The clustering quality results were not very good with F1?0:25and0.15 (minimum quality value is0and maximum1),but were comparable with other non-neural clustering results pub-lished on the same data sets.Merkl(1995)compared ART with Self-Organizing Maps(SOM)(Kohonen,2001)for the clustering of a small collection of documents and concluded that SOM forms better clusters based on a visual qualitative evaluation.We want to avoid such a subjective evaluation. Moreover,we think that SOM has several weaknesses compared to ART that make it unsuitable for document clustering in a real-life environment characterized by high volume of dynamic data.Indeed,it is unstable under growth while ART provides inherent stability and plasticity. Furthermore,the multiple iterations required to attain convergence with SOM are incompatible with a real-time environment.

On the other hand,some research concluded that ART text clustering resulted in good quality clusters.Vlajic and Card(1998)used a modi?ed ART2network to create a hierarchical clustering of a small number of web pages. They report that clustering was‘appropriate in all cases, when compared to human performance[…]’,but provide no quantitative result.In previous work,we also considered hierarchical clustering,but with ART1and with a small database of document titles(Massey,2002a).Clustering quality was deemed adequate based on the percentage of overlap between clusters and expected classi?cation.Our evaluation measure and text collection were non-standard. Rajaraman and Tan(2001)apply fuzzy ART to the task of Topics Detection and Tracking,in which the aspect of topic detection is performed by ART text clustering.They use a small set of1468news articles and attempt to measure the ability of the ART network to detect novelty.However, once again only qualitative results are presented.Finally, Kondadadi and Kozma(2002)compare KMART,their soft clustering version of ART,to Fuzzy-ART and k-means.The text data consists of2000documents downloaded from the web as well as another2000newsgroup documents.Quality evaluation is based on a one-to-one match of the documents in the clusters with the documents in the speci?ed category. K-MART and Fuzzy-ART results are encouraging with above50%matching for100–500document subsets of the original text collections,while k-means stays in the range of 22–35%matching.However,this evaluation method is optimistic since it does not account for false positives(i.e. documents that are present in a cluster but do not match documents in the corresponding desired topic).

Our interest in this paper is the study of unsupervised text organization,hence supervised versions of ART(Carpenter Grossberg,&Reynolds,1991a)applied to text categoriz-ation(Petridis,Kaburlasos,Fragkou,&Kehagias,2001) will not be considered.

3.Experimental settings

We selected two well-established cluster quality evaluation measures:Jaccard(JAC)(Downton&Brennan, 1980)and Fowlkes–Mallows(FM)(Fowlkes&Mallows, 1983):

JAC?a=eatbtcTe1TFM?a=eeatbTeatcTT1=2e2Twhere

a is the pair-wise number of true positives,i.e.the total

number of document pairs grouped together in the expected solution and that are indeed clustered together by the clustering algorithm;

L.Massey/Neural Networks16(2003)771–778 772

b is the pair-wise number of false positives,i.e.the

number of document pairs not expected to be grouped together but that are clustered together by the clustering algorithm;

c is the pair-wise number of false negatives,i.e.the

number of document pairs expected to be grouped together but that are not clustered together by the clustering algorithm.

We also use a measure that computes the F1clustering quality value.It uses the same underlying pair-wise counting procedure as Jaccard and Fowlkes–Mallow to establish a count of false negatives and false positives,but combines those values following the F-measure(VanRijsbergen, 1979)formulae:

F b?eb2t1Tpr=?b2ptr e3Twhere p?a=eatbTis known as the precision and r?a=eatcTas recall.b is set to1to give equal weighting to precision and recall.We must note that other text clustering work,such as(Larsen and Aone(1999)and Wong and Fu (2000)),have evaluated clustering quality with F1computed on the best cluster-class match.Our initial experiments with this approach indicated that it might unfairly in?ate quality. We have not conducted an indepth analysis of this divergence at this point in time,but we will as part of future work.

The‘ModApte’split(Apte,Fred Damerau,&Weiss, 1994)of the Reuter-21578Distribution1.01data set is used for our experiments.This data sets is known to be challenging because of skewed class distribution,multiple overlapping categories,and its real-life origin(Reuter newswires during the year1987,in chronological order). We evaluate clustering results against the desired solution originally speci?ed by Reuter’s human classi?ers.We only use the desired solution information to evaluate clustering results,i.e.after the clusters have been formed. Reuter is a benchmark data set for https://www.doczj.com/doc/4c12130509.html,ing this data set speci?c split and the F1quality measure makes compari-son with published TC results(Yang&Liu,1999)on the same split possible.This is an important and we believe innovative aspect of our experimental approach:TC F1 quality results are used as an upper bound for cluster quality since learning in a supervised framework with labeled data provides the best possible automated text classi?cation(with current technology).Thus,clustering can be expected to approach this level of quality but not exceed it since it relies solely on the information present in the data itself.This way of evaluating clustering quality allows one to clearly establish the level of quality obtained by a clustering algorithm as a percentage of the upper bound quality.

We use the k-means(MacQueen,1967)clustering algorithm to establish a lower bound for quality.Our rationale is that since k-means represents one of the simplest

possible approaches to clustering,one would expect that any

slightly more advanced algorithm would exceed its cluster-

ing quality.The parameter k is set to the number of topics

(93)speci?ed by the domain experts who manually

organized the Reuter text collection.K-means initial cluster

centroids are determined randomly and clustering results are

averaged over10trials to smooth out extreme values

obtained from good and bad random initialization.Our hope

is that ART clusters would exceed signi?cantly the quality

obtained with k-means and approach the quality of

supervised TC.

In this set of experiments,we use the simplest form of

ART,binary ART1in fast learning mode,to establish

what should be the baseline level of quality attainable by

ART neural networks.ART1networks consist of two

layers of neurons:N input neurons and M output

neurons,where N is the input size and M the number

of clusters.Neurons are fully connected with both feed-

forward and feedback weighted links.The feed-forward

links connecting to the output neuron j are represented

by the real vector W j while the feedback links from that

same neuron are represented by the binary vector T j:The latter stores the prototype representing cluster j:We

speci?cally use Moore’s ART1implementation(Moore,

1988),as follows:

1Initialize network weights and provide parameter

values:

0,r#1(the vigilance parameter)and L.1

W j?1=e1tNTfor all forward connection weights

T j?1for all feedback connection weights

2Set the output neurons activation u j?0for j?1…M

and present a document X k to the network

3Compute output activations:u j?X k·W j for j?1…M

and where·is the inner product.

4Competition between output units:select the most

similar category represented by output neuron j p

with maximal activation.

5Vigilance test:determine if j p is close enough to

X k:

k X

k

^T j p k=k X k k$r

where‘^’is the logical AND operation

If true,go to step6(resonance mode);otherwise,

go to step8(search mode).

6Update weights for winning node:

T0j p?T^j p X k

W0j p?LeT j p^X kT=eL21tk T j p^X k kT

7Return to step2with a new document.

8u j p?21(remove category j p from current search)

and return to step4.

The vigilance parameter r[e0;1 determines the level of abstraction at which ART discovers clusters.Moreover,the minimal number of clusters present in the data can be determined by minimal vigilance(Massey,2002b),computed

1Available from https://www.doczj.com/doc/4c12130509.html,/resources/testcollections/

reuters21578/

L.Massey/Neural Networks16(2003)771–778773

as r min ,1=N where N is the number of features (words)used to represent a document.We chose a value of r min ?0:0005as the initial vigilance parameter and we increment it until we ?nd the best clustering quality.We stop increasing vigilance when more than 200clusters are obtained because such a large number of clusters would simply result in information overload for a user and therefore not achieve the intended objective of text clustering.

A binary vector-space (Salton &Lesk,1968)represen-tation was created for the Reuter ModApte

′test set.Only the test set was clustered for compatibility reasons with TC results and also because in unsupervised learning,one must assume that a training set is unavailable.A standard stop word list was used to remove frequent words and a simple feature reduction by term selection based on term frequency was applied to reduce the dimensionality of the original documents feature space.This approach was judged very effective for TC by Yang and Pedersen (1997)).

4.Experimental results

We eliminated words that appear in 10,20,40and 60or less documents.In the ?rst case,a total of 2282term features were retained while in the last only 466were.Our experiments indicated that less radical feature selection not only increased the number of features and consequently processing time,but also resulted in lower quality clusters in some cases (Fig.1).Best quality is achieved at vigilance value of 0.05,with 106clusters,a number close to the expected number of topics speci?ed by the domain experts who labeled the data (93).Vigilance levels past 0.1result in over 250clusters,which is not desirable for users as explained previously.

The results shown in Fig.1were obtained with a single pass in the data.In reality,ART converges to a stable representation after at most N 21presentations of the data (Georgiopoulos,Heileman,&Huang,1990).By stable,it is meant that if the same document is presented several times to the network,it should be assigned to the same category,and presenting the same inputs over and over should not change the cluster prototype values.Unstable clusters are

problematic since an identical document submitted at different times may end up in different clusters.Further-more,we show in Fig.2a that cluster quality increases after ART has stabilized.Only four iterations were required to attain a stable representation,which is much less than the theoretical upper bound of N 21:However,in real world,high-volume operations this could still be a problem as little idle time in the system operation may be available to stabilize topics representation.

We have processed the documents in their natural order,i.e.the chronological order in which they have been created and thus the order in which they would be submitted to a classi?cation system.This simulates the real-world environ-ment where there is no control over the order in which documents are created.As with any on-line clustering algorithm,ART gives different results depending on the order of presentation of the data.This is expected since clustering decisions are taken for each sequentially submitted document,compared to batch clustering that considers all data at once.We submitted the data set in 15different random orders to ART and averaged clustering quality for each order.Fig.2b shows that other orders of presentation are much worse than the natural,chronological order of the documents in Reuter.This is encouraging because if quality was higher for other orders of presen-tation,one would face the problem of ?nding the best order among the very large number of possibilities or design a way to combine results from different orders.It is possible that other text collections face this problem.Maybe it just happens that Reuter natural order is simply compatible with the desired solution,while in other cases this may not happen.After all,there are many ways to organize large text collections.Some versions of ART use a similarity measure claimed to make it less susceptible to order variations (Sadananda &Sudhakara Rao,1995).Our initial exper-iments with Sadananda and Sudhakara Rao’s similarity measures are inconclusive at this point in time.However,their proposed similarity measure will in some circum-stances not allow the vigilance test (step 5of the algorithm)to pass even for a newly created category node.Hence,novelty integration becomes impossible unless the

vigilance

Fig.1.(a)More radical term selection (removing words appearing in 60documents or less)results in better clustering quality in some cases,(at vigilance 0.05),compared to removing terms appearing in only 20documents or less.(b)More radical feature selection also results in much smaller data set dimentionality which in turn allows for more rapid processing.(c)Vigilance 0.05?nds a number of cluster close to the expected number of 93.Vigilance of 0.1creates too many clusters for users.

L.Massey /Neural Networks 16(2003)771–778

774

test is not performed for new category nodes.This amounts to a simple modi?cation to the algorithm listed in this paper.Fig.2c .shows that ART1cluster quality clearly exceeds the lower bound established by K -means.However,clustering quality achieved by ART with random orders of presentation is comparable to K -means.This implies that if the natural order of presentation does not correspond to the chosen organization of the data,ART1will not do better than K -means.We now compare ART1clustering quality to the upper bound expected for cluster quality:the best TC results obtained with Support Vector Machines (SVM)and k-Nearest Neighbors (kNN)published in Yang and Liu (1999)(Fig.3).ART1achieves 51.2%of the TC https://www.doczj.com/doc/4c12130509.html,paring to 16.3%level of quality for k -means,the lower bound,ART1does much better but is still only about half-way to the optimal expected quality.There is also a potential for lower quality with other orders of presentation.We must however point out that the solution we use to evaluate clusters is merely one among many other useful ways to organize the text collection.Hence,we merely evaluate the ability of ART to recover the speci?ed solution.A users study may better validate the structure discovered by ART,but such studies are costly and also subjective.We also built a small program that simulates ART cluster prototype updating behavior and attempts to assign each document to its desired topic.We found that 2.2%of documents could not be assigned to their designated topic.Thus,even in the best conditions,perfect clustering is not possible with ART1with this data set in natural order since right from the start about 2%quality is lost to the order of

presentation.This can be explained as follows:a document is assigned to a topic if it has a suf?ciently number of overlapping features,as determined by the vigilance par-ameter.Furthermore,over time as documents are submitted to the network,the number of active features in prototype vectors decreases,which makes matching a document with its desired topic more unlikely.This is caused by the prototype updating mechanism that intersects documents assigned to a topic with that topic prototype.So,in our simulation,if no feature of a document overlap with the prototype for the document desired topic,it means that the expected solution cannot be satis?ed with ART1and with this order of presentation.Better prototype updating may be required to improve quality,such as (MacLeod &Robertson,1991),which used the union of a document and its prototype features.Finally,while conducting our experiments,we noticed an interesting phenomenon:clusters do not necessarily form at the speci?ed level of abstraction.In other words,ART sometimes discovers topics generalizing or specializing desired topics.A generalization is a cluster that includes two or more classes (a class being a group of documents representing a desired topic).A specialization on the other hand is a class that includes two or more clusters.In a sense,such behavior can be expected from clustering algorithms since they rely solely on similarity among data items rather than on directions by a domain expert.Fig.4shows a portion of the confusion table built from the matches between clusters and classes to illustrate generalizations and specializations.A match is the number of documents a class and a cluster have in common.We note from Fig.4that:?cluster 0is dominated by 169documents from class 25(topic earnings)out of a total of 1087(16%)documents expected to belong to that

class;

Fig.2.All results shown for vigilance 0.05.(a)Stabilization improves ART clustering quality.(b)Random orders of presentation even when stabilized give much worse clusters than the natural order of the documents in Reuter.(c)ART clusters (in natural order,stabilized)are of better quality than k -means ek ?93T

:

Fig.3.ART1clustering F 1with stabilization at vigilance 0.05for data in its natural order (ART1Nat).For both TC methods (SVM and k -NN),the micro-averaged F1values is used for compatibility with our F1pair

measure.

?cluster 1is dominated by 741documents from class 25(earnings)out of a total possible of 1087(68%);

?cluster 3is dominated by 598documents from class 16(acquisitions)out of a total possible of 719(83%),but there is also a strong presence from class 0(trade),class 1(grain),class 2(crude),class 11(shipping)and class 18(interest rate).Hence,one can consider that clusters 0and 1actually correspond to class 25(earnings)with 910of the 1087documents,so topic earnings is specialized.Cluster 3corresponds to classes 16(acquisitions),class 0(trade),class 1(grain),class 2(crude),class 11(shipping),and class 18(interest rate)for a total of 1106of 1395documents.So all these classes are generalized by cluster 3.Other class-cluster matches may be deemed to simply lower generalization and specialization quality.We designed a quality evaluation methodology that,contrary to existing clustering evaluation measures,does not penalize generalizations and specializations.For each class,it looks for all clusters that match the class.This allows for the discovery of specializations.Then,for each of the matching cluster,it also looks for all matching class,which accounts for generalizations.Extraneous documents in the latter case are the clustering errors and a F 1value is computed based on these errors.Fig.5shows quality evaluation with this measure,which we call Sub-Graph Dispersion (SGD)because a sub-graph is created for each class when looking for matches.If one considers generalizations and specializations of the expected solution as acceptable,higher quality can be computed at lower vigilance,but the quality of generalizations and specialization decreases as vigilance increases.This method of evaluation needs to be re?ned before ?nal conclusions on the true impact of learning at different levels of abstraction on clustering quality can be drawn.For instance,all matches are currently considered but some negatively affect quality and should not be included as being part of generalizations or specializations.

5.Conclusions and future work

Text clustering work conducted with ART up to now has used many different forms of ART-based architec-tures,as well as different and non-comparable text collections and evaluation methods.This situation resulted in confusion as to the level of clustering quality achievable with ART.As a ?rst step towards resolving this situation,we have tested a simple ART1network implementation and evaluated its text clustering quality on the benchmark Reuter data set and with the standard F 1measure.K -means clustering quality was used as the lower bound on quality while published results with supervised TC were used as an upper bound on quality.Our experiments have demonstrated that text clusters formed by ART1achieve 51%of TC upper bound and exceed the lower bound considerably.Consequently,about half of the evidence needed to recover the expected document organization solution is available directly from the data under the form of inter-document similarity rather than from costly and time consuming handcrafting of a large labeled training data set.Whether this level of quality is suf?cient is a task speci?c question and ultimately a matter of cost/quality trade-off:it is a choice between higher quality supervised document categoriz-ation obtained at high development and maintenance cost versus lower quality clusters obtained at basically no cost.At least we provide here a clear picture of the quality aspect by establishing the baseline quality to be expected with ART.Although ART clusters were of medium quality,ART has the unique advantage of proceeding entirely without human intervention,plus offers the interesting properties of plasticity and stability.Should novelty be detected by the network,a new topic would automatically be created as part of normal system operation.This contrasts with supervised TC,which would require downtime for re-training and related human intervention to prepare a new training set.Therefore,despite lower quality,there may be some situations where 
ART-based text clustering is a necessity.Furthermore,clustering quality can be increased if one considers discovery of topics at other levels of abstraction as acceptable.Hence,an important area of future research is to explore evaluation measures that do not penalize specializations and generalizations.We are also looking into better feature selection that may also help improve cluster quality.For instance,preliminary experiments with TF-IDF,a well known Information Retrieval measure of term importance,are encouraging in that respect.More-over,more advanced ART architectures with nonbinary representation (such as ART2(Carpenter,Grossberg,&Rosen,1991b ),fuzzy ART (Carpenter,Grossberg,&Rosen,1991c ),MacLeod’s ART (MacLeod &Robertson,1991)and FOSART (Baraldi &Alpaydin,2002))may further improve cluster quality.We are currently inves-tigating these avenues.As well,in the Reuter

collection,

Fig.5.Increased quality is computed by SGD at lower vigilance by not penalizing generalizations and specializations.Stabilized results shown.

L.Massey /Neural Networks 16(2003)771–778

776

topics are not mutually exclusive,while ART1clustering is.ART based soft clustering such as with KMART (Kondadadi&Kozma,2002)will thus be explored as yet another possible way to improve clustering.

ART stability and plasticity properties as well as its ability to process dynamic data ef?ciently make it an attractive candidate for clustering large,rapidly changing text collections in real-life environments.However,in this paper we have only evaluated the static clustering case with some glimpses at the issues that may arise in a more realistic,dynamic environment.For instance,there is a requirement for idle time to allow for stabilization in a text clustering system.We are currently conducting a full assessment of ART’s text clustering performance in a simulated realistic environment.

Comparison with other clustering methods has not been our objective.We rather focused on establishing the level of quality achieved by ART within the range de?ned by a lower and an upper bound.We believe this gives a better appreciation of the level of quality by setting it in a wider framework.Some investigators have evaluated clustering quality with other algorithms on Reuter-21578and with the F1measure,but have used non-standard splits(Larsen& Aone,1999;Steinbach et al.,2000).So our results cannot be compared directly with theirs.We plan to eventually evaluate other clustering methodologies—particularly incremental clustering algorithms—to compare their clus-tering quality and ability to function in a dynamic environment to ART’s.As well,testing on other text collections is needed to verify if quality results apply to document sets displaying various characteristics. References

Apte,C.,Fred Damerau,F.,&Weiss,S.M.(1994).Automated learning of decision rules for text categorization.ACM Transactions on Infor-mation Systems,12(2),233–251.

Baraldi,A.,&Alpaydin,E.(2002).Constructive feedforward art clustering networks—Part II.IEEE Transactions on Neural Networks,13,3. Can,F.(1993).Incremental clustering for dynamic information,proces-sing.ACM Transactions on Information Systems,11(2),143–164. Carpenter,G.A.,&Grossberg,S(1995).Adaptive Resonance Theory (ART).In Handbook of brain theory and neural networks,Arbib,M.A.: MIT Press.

Carpenter,G.A.,Grossberg,S.,&Reynolds,J.H.(1991a).Artmap: supervised real-time learning and classi?cation of nonstationary data by

a self-organizing neural network.Neural Networks,4,565–588. Carpenter,G.A.,Grossberg,S.,&Rosen,D.B.(1991b).Art2-a:an adaptive resonance algorithm for rapid category learning and recognition.Neural Networks,4,493–504.

Carpenter,G.A.,Grossberg,S.,&Rosen,D.B.(1991c).Fuzzy art:fast stable learning and categorization of analog patterns by an adaptive resonance system.Neural Networks,4,759–771.

Cutting, D., Karger, D., Pedersen, J., & Tukey, J. (1992). Scatter/gather: a cluster-based approach to browsing large document collections. In Proceedings of SIGIR'92.

Downton, M., & Brennan, T. (1980). Comparing classifications: an evaluation of several coefficients of partition agreement. In Proceedings of the meeting of the Classification Society, Boulder, CO.

Fowlkes, E., & Mallows, C. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78, 553–569.

Georgiopoulos, M., Heileman, G. L., & Huang, J. (1990). Convergence properties of learning in ART1. Neural Computation, 2(4), 502–509.

Grossberg, S. (1976). Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121–134.

Heuser, U., & Rosenstiel, W. (2000). Automatic construction of local internet directories using hierarchical radius-based competitive learning. In Proceedings of the fourth world multiconference on systemics, cybernetics and informatics (SCI 2000), Orlando, FL, Vol. IV (Communications Systems and Networks) (pp. 436–441).

Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys, 31(3).

Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: an introduction to cluster analysis. New York: Wiley/Interscience.

Kohonen, T. (2001). Self-organizing maps. Springer Series in Information Sciences. Berlin: Springer.

Kohonen, T., Lagus, K., Salojärvi, J., Honkela, J., Paatero, V., & Saarela, A. (2000). Self organization of a document collection. IEEE Transactions on Neural Networks, 11(3).

Kondadadi, R., & Kozma, R. (2002). A modified fuzzy ART for soft document clustering. In Proceedings of the international joint conference on neural networks, Honolulu, HI.

Larsen, B., & Aone, C. (1999). Fast and effective text mining using linear-time document clustering. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 16–22).

MacLeod, K. J., & Robertson, W. (1991). A neural algorithm for document clustering. Information Processing and Management, 27(4), 337–346.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1, Statistics.

Massey, L. (2002a). Structure discovery in text collections. In Proceedings of the sixth international conference on knowledge-based intelligent information & engineering systems (KES'2002), Italy.

Massey, L. (2002b). Determination of clustering tendency with ART neural networks. In Proceedings of recent advances in soft-computing (RASC02), Nottingham, UK.

Merkl, D. (1995). Content-based software classification by self-organization. In Proceedings of the IEEE international conference on neural networks (ICNN'95), Perth, Australia (pp. 1086–1091).

Moore, B. (1988). ART and pattern clustering. In Proceedings of the 1988 connectionist models summer school (pp. 174–183).

Petridis, V., Kaburlasos, V. G., Fragkou, P., & Kehagias, A. (2001). Text classification using the sigma-FLNMAP neural network. In Proceedings of the international joint conference on neural networks, Washington, DC.

Rajaraman, K., & Tan, A.-H. (2001). Topic detection, tracking and trend analysis using self-organizing neural networks. In Proceedings of the fifth Pacific-Asia conference on knowledge discovery and data mining (PAKDD'01), Hong Kong (pp. 102–107).

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, & J. L. McClelland (Eds.), Parallel distributed processing: explorations in the microstructure of cognition, Vol. 1: Foundations (pp. 318–364). Cambridge, MA: MIT Press.

Sadananda, R., & Sudhakara Rao, G. R. M. (1995). ART1: model algorithm characterization and alternative similarity metric for the novelty detector. In Proceedings of the IEEE international conference on neural networks, Vol. 5 (pp. 2421–2425).

Salton, G., & Lesk, M. E. (1968). Computer evaluation of indexing and text processing. Journal of the ACM, 15(1), 8–36.

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1–47.

Steinbach, M., Karypis, G., & Kumar, V. (2000). A comparison of document clustering techniques. In Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, Boston, MA.

L. Massey / Neural Networks 16 (2003) 771–778

Van Rijsbergen, C. J. (1979). Information retrieval. London: Butterworths.

Vlajic, N., & Card, H.-C. (1998). Categorizing web pages using modified ART. In Proceedings of the IEEE 1998 Canadian conference on electrical and computer engineering.

Wong, W.-C., & Fu, A. (2000). Incremental document clustering for web page classification. In Proceedings of the IEEE 2000 international conference on information society in the 21st century: emerging technologies and new challenges, Japan.

Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the international ACM conference on research and development in information retrieval (SIGIR-99) (pp. 42–49).

Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In Proceedings of ICML-97, 14th international conference on machine learning, Nashville, USA (pp. 412–420).
