TREC 2003 QA at BBN Answering Definitional Questions
Multinomial Randomness Models for Retrieval with Document Fields

Vassilis Plachouras (1) and Iadh Ounis (2)
(1) Yahoo! Research, Barcelona, Spain
(2) University of Glasgow, Glasgow, UK
vassilis@, ounis@

Abstract. Document fields, such as the title or the headings of a document, offer a way to consider the structure of documents for retrieval. Most of the proposed approaches in the literature employ either a linear combination of scores assigned to different fields, or a linear combination of frequencies in the term frequency normalisation component. In the context of the Divergence From Randomness framework, we have a sound opportunity to integrate document fields in the probabilistic randomness model. This paper introduces novel probabilistic models for incorporating fields in the retrieval process using a multinomial randomness model and its information theoretic approximation. The evaluation results from experiments conducted with a standard TREC Web test collection show that the proposed models perform as well as a state-of-the-art field-based weighting model, while at the same time, they are theoretically founded and more extensible than current field-based models.

1 Introduction

Document fields provide a way to incorporate the structure of a document in Information Retrieval (IR) models. In the context of HTML documents, the document fields may correspond to the contents of particular HTML tags, such as the title, or the heading tags. The anchor text of the incoming hyperlinks can also be seen as a document field.
In the case of email documents, the fields may correspond to the contents of the email's subject, date, or to the email address of the sender [9]. It has been shown that using document fields for Web retrieval improves the retrieval effectiveness [17,7].

The text and the distribution of terms in a particular field depend on the function of that field. For example, the title field provides a concise and short description for the whole document, and terms are likely to appear once or twice in a given title [6]. The anchor text field also provides a concise description of the document, but the number of terms depends on the number of incoming hyperlinks of the document. In addition, anchor texts are not always written by the author of a document, and hence, they may enrich the document representation with alternative terms.

G. Amati, C. Carpineto, and G. Romano (Eds.): ECIR 2007, LNCS 4425, pp. 28–39, 2007. © Springer-Verlag Berlin Heidelberg 2007

The combination of evidence from the different fields in a retrieval model requires special attention. Robertson et al. [14] pointed out that the linear combination of scores, which has been the approach mostly used for the combination of fields, is difficult to interpret due to the non-linear relation between the assigned scores and the term frequencies in each of the fields. Hawking et al. [5] showed that the term frequency normalisation applied to each field depends on the nature of the corresponding field. Zaragoza et al. [17] introduced a field-based version of BM25, called BM25F, which applies term frequency normalisation and weighting of the fields independently. Macdonald et al. [7] also introduced normalisation 2F in the Divergence From Randomness (DFR) framework [1] for performing independent term frequency normalisation and weighting of fields. In both cases of BM25F and the DFR models that employ normalisation 2F, there is the assumption that the occurrences of terms in the fields follow the same
distribution, because the combination of fields takes place in the term frequency normalisation component, and not in the probabilistic weighting model.

In this work, we introduce weighting models, where the combination of evidence from the different fields does not take place in the term frequency normalisation part of the model, but instead, it constitutes an integral part of the probabilistic randomness model. We propose two DFR weighting models that combine the evidence from the different fields using a multinomial distribution, and its information theoretic approximation. We evaluate the performance of the introduced weighting models using the standard .Gov TREC Web test collection. We show that the models perform as well as the state-of-the-art field-based model PL2F, while at the same time, they employ a theoretically founded and more extensible combination of evidence from fields.

The remainder of this paper is structured as follows. Section 2 provides a description of the DFR framework, as well as the related field-based weighting models. Section 3 introduces the proposed multinomial DFR weighting models. Section 4 presents the evaluation of the proposed weighting models with a standard Web test collection. Sections 5 and 6 close the paper with a discussion related to the proposed models and the obtained results, and some concluding remarks drawn from this work, respectively.

2 Divergence from Randomness Framework and Document Fields

The Divergence From Randomness (DFR) framework [1] generates a family of probabilistic weighting models for IR. It provides a great extent of flexibility in the sense that the generated models are modular, allowing for the evaluation of new assumptions in a principled way. The remainder of this section provides a description of the DFR framework (Section 2.1), as well as a brief description of the combination of evidence from different document fields in the context of the DFR framework (Section 2.2).

2.1 DFR Models

The weighting models of the Divergence From
Randomness framework are based on combinations of three components: a randomness model RM; an information gain model GM; and a term frequency normalisation model.

Given a collection $D$ of documents, the randomness model RM estimates the probability $P_{RM}(t \in d \mid D)$ of having $tf$ occurrences of a term $t$ in a document $d$, and the importance of $t$ in $d$ corresponds to the informative content $-\log_2(P_{RM}(t \in d \mid D))$. Assuming that the sampling of terms corresponds to a sequence of independent Bernoulli trials, the randomness model RM is the binomial distribution:

\[ P_B(t \in d \mid D) = \binom{TF}{tf} p^{tf} (1-p)^{TF-tf} \tag{1} \]

where $TF$ is the frequency of $t$ in the collection $D$, $p = \frac{1}{N}$ is a uniform prior probability that the term $t$ appears in the document $d$, and $N$ is the number of documents in the collection $D$. A limiting form of the binomial distribution is the Poisson distribution $P$:

\[ P_B(t \in d \mid D) \approx P_P(t \in d \mid D) = \frac{\lambda^{tf}}{tf!} e^{-\lambda} \quad \text{where } \lambda = TF \cdot p = \frac{TF}{N} \tag{2} \]

The information gain model GM estimates the informative content $1 - P_{risk}$ of the probability $P_{risk}$ that a term $t$ is a good descriptor for a document. When a term $t$ appears many times in a document, then there is very low risk in assuming that $t$ describes the document. The information gain, however, from any future occurrences of $t$ in $d$ is lower. For example, the term 'evaluation' is likely to have a high frequency in a document about the evaluation of IR systems. After the first few occurrences of the term, however, each additional occurrence of the term 'evaluation' provides a diminishing additional amount of information. One model to compute the probability $P_{risk}$ is the Laplace after-effect model:

\[ P_{risk} = \frac{tf}{tf + 1} \tag{3} \]

$P_{risk}$ estimates the probability of having one more occurrence of a term in a document, after having seen $tf$ occurrences already.

The third component of the DFR framework is the term frequency normalisation model, which adjusts the frequency $tf$ of the term $t$ in $d$, given the length $l$ of $d$ and the average document length $\bar{l}$ in $D$. Normalisation 2 assumes a decreasing density function of the normalised term frequency with respect to the document length $l$. The normalised term frequency $tfn$ is given as follows:

\[ tfn = tf \cdot \log_2\left(1 + c \cdot \frac{\bar{l}}{l}\right) \tag{4} \]

where $c$ is a hyperparameter, i.e. a tunable parameter. Normalisation 2 is employed in the framework by replacing $tf$ in Equations (2) and (3) with $tfn$.

The relevance score $w_{d,q}$ of a document $d$ for a query $q$ is given by:

\[ w_{d,q} = \sum_{t \in q} qtw \cdot w_{d,t} \quad \text{where } w_{d,t} = (1 - P_{risk}) \cdot (-\log_2 P_{RM}) \tag{5} \]

where $w_{d,t}$ is the weight of the term $t$ in document $d$, $qtw = \frac{qtf}{qtf_{max}}$, $qtf$ is the frequency of $t$ in the query $q$, and $qtf_{max}$ is the maximum $qtf$ in $q$. If $P_{RM}$ is estimated using the Poisson randomness model, $P_{risk}$ is estimated using the Laplace after-effect model, and $tfn$ is computed according to normalisation 2, then the resulting weighting model is denoted by PL2. The factorial is approximated using Stirling's formula: $tf! \approx \sqrt{2\pi} \cdot tf^{\,tf + 0.5} e^{-tf}$.

The DFR framework generates a wide range of weighting models by using different randomness models, information gain models, or term frequency normalisation models. For example, the next section describes how normalisation 2 is extended to handle the normalisation and weighting of term frequencies for different document fields.

2.2 DFR Models for Document Fields

The DFR framework has been extended to handle multiple document fields, and to apply per-field term frequency normalisation and weighting. This is achieved by extending normalisation 2, and introducing normalisation 2F [7], which is explained below.

Suppose that a document has $k$ fields. Each occurrence of a term can be assigned to exactly one field. The frequency $tf_i$ of term $t$ in the $i$-th field is normalised and weighted independently of the other fields. Then, the normalised and weighted term frequencies are combined into one pseudo-frequency $tfn_{2F}$:

\[ tfn_{2F} = \sum_{i=1}^{k} w_i \cdot tf_i \log_2\left(1 + c_i \cdot \frac{\bar{l}_i}{l_i}\right) \tag{6} \]

where $w_i$ is the relative importance or weight of the $i$-th field, $tf_i$ is the frequency of $t$ in the $i$-th field of document $d$, $l_i$ is the length of the $i$-th field in $d$, $\bar{l}_i$ is the average length of the $i$-th field in the collection $D$, and $c_i$ is a hyperparameter for the $i$-th field. The above formula corresponds to normalisation 2F. The weighting model PL2F corresponds to PL2 using $tfn_{2F}$ as given in Equation (6). The well-known BM25 weighting model has also been extended in a similar way to BM25F [17].

3 Multinomial Randomness Models

This section introduces DFR models which, instead of extending the term frequency normalisation component, as described in the previous section, use document fields as part of the randomness model. While the weighting model PL2F has been shown to perform particularly well [7,8], the document fields are not an integral part of the randomness weighting model. Indeed, the combination of evidence from the different fields takes place as a linear combination of normalised frequencies in the term frequency normalisation component. This implies that the term frequencies are drawn from the same distribution, even though the nature of each field may be different.

We propose two weighting models, which, instead of assuming that term frequencies in fields are drawn from the same distribution, use multinomial distributions to incorporate document fields in a theoretically driven way. The first one is based on the multinomial distribution (Section 3.1), and the second one is based on an information theoretic approximation of the multinomial distribution (Section 3.2).

3.1 Multinomial Distribution

We employ the multinomial distribution to compute the probability that a term appears a given number of times in each of the fields of a document. The formula of the weighting model is derived as follows. Suppose that a document $d$ has $k$ fields. The probability that a term occurs $tf_i$ times in the $i$-th field $f_i$, is given as follows:

\[ P_M(t \in d \mid D) = \binom{TF}{tf_1 \; tf_2 \; \ldots \; tf_k \; tf'} \, p_1^{tf_1} p_2^{tf_2} \ldots p_k^{tf_k} \, p'^{\,tf'} \tag{7} \]

In the above equation, $TF$ is the frequency of term $t$ in the
collection, $p_i = \frac{1}{k \cdot N}$ is the prior probability that a term occurs in a particular field of document $d$, and $N$ is the number of documents in the collection $D$. The frequency $tf' = TF - \sum_{i=1}^{k} tf_i$ corresponds to the number of occurrences of $t$ in other documents than $d$. The probability $p' = 1 - k \frac{1}{k \cdot N} = \frac{N-1}{N}$ corresponds to the probability that $t$ does not appear in any of the fields of $d$.

The DFR weighting model is generated using the multinomial distribution from Equation (7) as a randomness model, the Laplace after-effect from Equation (3), and replacing $tf_i$ with the normalised term frequency $tfn_i$, obtained by applying normalisation 2 from Equation (4). The relevance score of a document $d$ for a query $q$ is computed as follows:

\[
\begin{aligned}
w_{d,q} &= \sum_{t \in q} qtw \cdot w_{d,t} = \sum_{t \in q} qtw \cdot (1 - P_{risk}) \cdot \left(-\log_2 P_M(t \in d \mid D)\right) \\
&= \sum_{t \in q} \frac{qtw}{\sum_{i=1}^{k} tfn_i + 1} \cdot \Big( -\log_2(TF!) + \sum_{i=1}^{k} \big( \log_2(tfn_i!) - tfn_i \log_2(p_i) \big) + \log_2(tfn'!) - tfn' \log_2(p') \Big)
\end{aligned}
\tag{8}
\]

where $qtw$ is the weight of a term $t$ in query $q$, $tfn' = TF - \sum_{i=1}^{k} tfn_i$, $tfn_i = tf_i \cdot \log_2\left(1 + c_i \cdot \frac{\bar{l}_i}{l_i}\right)$ for the $i$-th field, and $c_i$ is the hyperparameter of normalisation 2 for the $i$-th field. The weighting model introduced in the above equation is denoted by ML2, where M stands for the multinomial randomness model, L stands for the Laplace after-effect model, and 2 stands for normalisation 2.

Before continuing, it is interesting to note two issues related to the introduced weighting model ML2, namely setting the relative importance, or weight, of fields in the document representation, and the computation of factorials.

Weights of fields. In Equation (8), there are two different ways to incorporate weights for the fields of documents. The first one is to multiply each of the normalised term frequencies $tfn_i$ with a constant $w_i$, in a similar way to normalisation 2F (see Equation (6)): $tfn_i := w_i \cdot tfn_i$. The second way is to adjust the prior probabilities $p_i$ of fields, in order to increase the scores assigned to terms occurring in fields with low prior probabilities: $p_i := \frac{p_i}{w_i}$. Indeed, the assigned score to a query term occurring in a field with low probability is high, due to the factor $-tfn_i \log_2(p_i)$ in Equation (8).

Computing factorials. As mentioned in Section 2.1, the factorial in the weighting model PL2 is approximated using Stirling's formula. A different method to approximate the factorial is to use the approximation of Lanczos to the $\Gamma$ function [12, p. 213], which has a lower approximation error than Stirling's formula. Indeed, preliminary experimentation with ML2 has shown that using Stirling's formula affects the performance of the weighting model, due to the accumulation of the approximation error from computing the factorial $k + 2$ times ($k$ is the number of fields). This is not the case for the Poisson-based weighting models PL2 and PL2F, where there is only one factorial computation for each query term (see Equation (2)). Hence, the computation of factorials in Equation (8) is performed using the approximation of Lanczos to the $\Gamma$ function.

3.2 Approximation to the Multinomial Distribution

The DFR framework generates different models by replacing the binomial randomness model with its limiting forms, such as the Poisson randomness model. In this section, we introduce a new weighting model by replacing the multinomial randomness model in ML2 with the following information theoretic approximation [13]:

\[ \frac{TF!}{tf_1! \, tf_2! \cdots tf_k! \, tf'!} \, p_1^{tf_1} p_2^{tf_2} \cdots p_k^{tf_k} \, p'^{\,tf'} \approx \frac{1}{\sqrt{2 \pi TF}^{\,k}} \cdot \frac{2^{-TF \cdot D\left(\frac{tf_i}{TF},\, p_i\right)}}{\sqrt{p_{t1} \, p_{t2} \cdots p_{tk} \, p'_t}} \tag{9} \]

$D\left(\frac{tf_i}{TF}, p_i\right)$ corresponds to the information theoretic divergence of the probability $p_{ti} = \frac{tf_i}{TF}$ that a term occurs in a field, from the prior probability $p_i$ of the field:

\[ D\left(\frac{tf_i}{TF}, p_i\right) = \sum_{i=1}^{k} \frac{tf_i}{TF} \log_2 \frac{tf_i}{TF \cdot p_i} + \frac{tf'}{TF} \log_2 \frac{tf'}{TF \cdot p'} \tag{10} \]

where $tf' = TF - \sum_{i=1}^{k} tf_i$. Hence, the multinomial randomness model M in the weighting model ML2 can be replaced by its approximation from Equation (9):

\[ w_{d,q} = \sum_{t \in q} \frac{qtw}{\sum_{i=1}^{k} tfn_i + 1} \cdot \left( \frac{k}{2} \log_2(2 \pi TF) + \sum_{i=1}^{k} \left( tfn_i \log_2 \frac{tfn_i / TF}{p_i} + \frac{1}{2} \log_2 \frac{tfn_i}{TF} \right) + tfn' \log_2 \frac{tfn' / TF}{p'} + \frac{1}{2} \log_2 \frac{tfn'}{TF} \right) \tag{11} \]

The above
model is denoted by $M_D$L2. The definitions of the variables involved in the above equation have been introduced in Section 3.1.

It should be noted that the information theoretic divergence $D\left(\frac{tf_i}{TF}, p_i\right)$ is defined only when $tf_i > 0$ for $1 \le i \le k$. In other words, $D\left(\frac{tf_i}{TF}, p_i\right)$ is defined only when there is at least one occurrence of a query term in all the fields. This is not always the case, because a Web document may contain all the query terms in its body, but it may contain only some of the query terms in its title. To overcome this issue, the weight of a query term $t$ in a document is computed by considering only the fields in which the term $t$ appears.

The weights of different fields can be defined in the same way as in the case of the weighting model ML2, as described in Section 3.1. In more detail, the weighting of fields can be achieved by either multiplying the frequency of a term in a field by a constant, or by adjusting the prior probability of the corresponding field.

An advantage of the weighting model $M_D$L2 is that, because it approximates the multinomial distribution, there is no need to compute factorials. Hence, it is likely to provide a sufficiently accurate approximation to the multinomial distribution, and it may lead to improved retrieval effectiveness compared to ML2, due to the lower accumulated numerical errors. The experimental results in Section 4.2 will indeed confirm this advantage of $M_D$L2.

4 Experimental Evaluation

In this section, we evaluate the proposed multinomial DFR models ML2 and $M_D$L2, and compare their performance to that of PL2F, which has been shown to be particularly effective [7,8]. A comparison of the retrieval effectiveness of PL2F and BM25F has shown that the two models perform equally well on various search tasks and test collections [11], including those employed in this work. Hence, we experiment only with the multinomial models and PL2F. Section 4.1 describes the experimental setting, and Section 4.2 presents the evaluation results.

4.1 Experimental Setting

The evaluation of the proposed models is conducted with the .Gov TREC Web test collection, a crawl of approximately 1.25 million documents from the .gov domain. The .Gov collection has been used in the TREC Web tracks between 2002 and 2004 [2,3,4]. In this work, we employ the tasks from the Web tracks of TREC 2003 and 2004, because they include both informational tasks, such as the topic distillation (td2003 and td2004, respectively), as well as navigational tasks, such as named page finding (np2003 and np2004, respectively) and home page finding (hp2003 and hp2004, respectively). More specifically, we train and test for each type of task independently, in order to get insight on the performance of the proposed models [15]. We employ each of the tasks from the TREC 2003 Web track for training the hyperparameters of the proposed models. Then, we evaluate the models on the corresponding tasks from the TREC 2004 Web track.

In the reported set of experiments, we employ k = 3 document fields: the contents of the <BODY> tag of Web documents (b), the anchor text associated with incoming hyperlinks (a), and the contents of the <TITLE> tag (t). More fields can be defined, such as the contents of the heading tags, <H1> for example.
It has been shown, however, that the body, title and anchor text fields are particularly effective for the considered search tasks [11]. The collection of documents is indexed after removing stopwords and applying Porter's stemming algorithm. We perform the experiments in this work using the Terrier IR platform [10].

The proposed models ML2 and $M_D$L2, as well as PL2F, have a range of hyperparameters, the setting of which can affect the retrieval effectiveness. More specifically, all three weighting models have two hyperparameters for each employed document field: one related to the term frequency normalisation, and a second one related to the weight of that field. As described in Sections 3.1 and 3.2, there are two ways to define the weights of fields for the weighting models ML2 and $M_D$L2: (i) multiplying the normalised frequency of a term in a field; (ii) adjusting the prior probability $p_i$ of the $i$-th field. The field weights in the case of PL2F are only defined in terms of multiplying the normalised term frequency by a constant $w_i$, as shown in Equation (6).

In this work, we consider only the term frequency normalisation hyperparameters, and we set all the weights of fields to 1, in order to avoid having one extra parameter in the discussion of the performance of the weighting models. We set the involved hyperparameters $c_b$, $c_a$, and $c_t$, for the body, anchor text, and title fields, respectively, by directly optimising mean average precision (MAP) on the training tasks from the Web track of TREC 2003. We perform a 3-dimensional optimisation to set the values of the hyperparameters. The optimisation process is the following. Initially, we apply a simulated annealing algorithm, and then, we use the resulting hyperparameter values as a starting point for a second optimisation algorithm [16], to increase the likelihood of detecting a global maximum. For each of the three training tasks, we apply the above optimisation process three times, and we select the
hyperparameter values that result in the highest MAP. We employ the above optimisation process to increase the likelihood that the hyperparameter values result in a global maximum for MAP. Figure 1 shows the MAP obtained by ML2 on the TREC 2003 home page finding topics, for each iteration of the optimisation process. Table 1 reports the hyperparameter values that resulted in the highest MAP for each of the training tasks, and that are used for the experiments in this work.

Fig. 1. The MAP obtained by ML2 on the TREC 2003 home page finding topics, during the optimisation of the term frequency normalisation hyperparameters (x-axis: iteration; y-axis: MAP)

The evaluation results from the Web tracks of TREC 2003 [3] and 2004 [4] have shown that employing evidence from the URLs of Web documents results in important improvements in retrieval effectiveness for the topic distillation and home page finding tasks, where relevant documents are home pages of relevant Web sites. In order to provide a more complete evaluation of the proposed models for these two types of Web search tasks, we also employ the length in characters of the URL path, denoted by $URLpathlen$, using the following formula to transform it to a relevance score [17]:

\[ w_{d,q} := w_{d,q} + \omega \cdot \frac{\kappa}{\kappa + URLpathlen} \tag{12} \]

where $w_{d,q}$ is the relevance score of a document. The parameters $\omega$ and $\kappa$ are set by performing a 2-dimensional optimisation as described for the case of the hyperparameters $c_i$. The resulting values for $\omega$ and $\kappa$ are shown in Table 2.

4.2 Evaluation Results

After setting the hyperparameter values of the proposed models, we evaluate the models with the search tasks from TREC 2004 Web track [4]. We report the official TREC evaluation measures for each search task: mean average precision (MAP) for the topic distillation task (td2004), and mean reciprocal rank (MRR) of the first correct answer for both named page finding (np2004) and home page finding (hp2004) tasks.

Table 1. The values of the hyperparameters c_b, c_a, and c_t, for the body, anchor text and title fields, respectively, which resulted in the highest MAP on the training tasks of TREC 2003 Web track

Model    Task      c_b       c_a         c_t
ML2      td2003    0.0738      4.3268     10.8220
ML2      np2003    0.1802      4.7057      8.4074
ML2      hp2003    0.1926    310.3289    624.3673
M_DL2    td2003    0.2562     10.0383     24.6762
M_DL2    np2003    1.0216      9.2321     21.3330
M_DL2    hp2003    0.4093    355.2554    966.3637
PL2F     td2003    0.1400      5.0527      4.3749
PL2F     np2003    1.0153     11.9652      9.1145
PL2F     hp2003    0.2785    406.1059    414.7778

Table 2. The values of the hyperparameters ω and κ, which resulted in the highest MAP on the training topic distillation (td2003) and home page finding (hp2003) tasks of TREC 2003 Web track

Model    Task      ω          κ
ML2      td2003     8.8095    14.8852
ML2      hp2003    10.6684     9.8822
M_DL2    td2003     7.6974    12.4616
M_DL2    hp2003    27.0678    67.3153
PL2F     td2003     7.3638     8.2178
PL2F     hp2003    13.3476    28.3669

Table 3 presents the evaluation results for the proposed models ML2, $M_D$L2, and the weighting model PL2F, as well as their combination with evidence from the URLs of documents (denoted by appending U to the weighting model's name). When only the document fields are employed, the multinomial weighting models have similar performance compared to the weighting model PL2F. The weighting models PL2F and $M_D$L2 outperform ML2 for both topic distillation and home page finding tasks. For the named page finding task, ML2 results in higher MRR than $M_D$L2 and PL2F.

Using the Wilcoxon signed rank test, we tested the significance of the differences in MAP and MRR between the proposed new multinomial models and PL2F. In the case of the topic distillation task td2004, PL2F and $M_D$L2 were found to perform statistically significantly better than ML2, with p < 0.001 in both cases. There was no statistically significant difference between PL2F and $M_D$L2. Regarding the named page finding task np2004, there is no statistically significant difference between any of the three proposed models. For the home page finding task hp2004, only the difference between ML2 and PL2F was found to be statistically significant (p = 0.020).

Regarding the combination of the weighting
models with the evidence from the URLs of Web documents, Table 3 shows that PL2FU and $M_D$L2U outperform ML2U for td2004. The differences in performance are statistically significant, with p = 0.002 and p = 0.012, respectively, but there is no significant difference in the retrieval effectiveness between PL2FU and $M_D$L2U. When considering hp2004, we can see that PL2F outperforms the multinomial weighting models. The only statistically significant difference in MRR was found between PL2FU and $M_D$L2U (p = 0.012).

Table 3. Evaluation results for the weighting models ML2, M_DL2, and PL2F on the TREC 2004 Web track topic distillation (td2004), named page finding (np2004), and home page finding (hp2004) tasks. ML2U, M_DL2U, and PL2FU correspond to the combination of each weighting model with evidence from the URL of documents. The table reports mean average precision (MAP) for the topic distillation task, and mean reciprocal rank (MRR) of the first correct answer for the named page finding and home page finding tasks. ML2U, M_DL2U and PL2FU are evaluated only for td2004 and hp2004, where the relevant documents are home pages (see Section 4.1).

Measure   Task      ML2       M_DL2     PL2F
MAP       td2004    0.1241    0.1391    0.1390
MRR       np2004    0.6986    0.6856    0.6878
MRR       hp2004    0.6075    0.6213    0.6270

Measure   Task      ML2U      M_DL2U    PL2FU
MAP       td2004    0.1916    0.2012    0.2045
MRR       hp2004    0.6364    0.6220    0.6464

A comparison of the evaluation results with the best performing runs submitted to the Web track of TREC 2004 [4] shows that the combination of the proposed models with the evidence from the URLs performs better than the best performing run of the topic distillation task in TREC 2004, which achieved MAP 0.179. The performance of the proposed models is comparable to that of the most effective method for the named page finding task (MRR 0.731). Regarding the home page finding task, the difference is greater between the performance of the proposed models with evidence from the URLs, and the best performing methods in the same track (MRR 0.749). This can
be explained in two ways. First, the over-fitting of the parameters ω and κ on the training task may result in lower performance for the test task. Second, using field weights may be more effective for the home page finding task, which is a high precision task, where the correct answers to the queries are documents of a very specific type.

From the results in Table 3, it can be seen that the model $M_D$L2, which employs the information theoretic approximation to the multinomial distribution, significantly outperforms the model ML2, which employs the multinomial distribution, for the topic distillation task. As discussed in Section 3.2, this may suggest that approximating the multinomial distribution is more effective than directly computing it, because of the number of computations involved, and the accumulated small approximation errors from the computation of the factorial. The difference in performance may be greater if more document fields are considered.

Overall, the evaluation results show that the proposed multinomial models ML2 and $M_D$L2 have a very similar performance to that of PL2F for the tested search tasks. None of the models outperforms the others consistently for all three tested tasks, and the weighting models $M_D$L2 and PL2F achieve similar levels of retrieval effectiveness. The next section discusses some points related to the new multinomial models.
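
The term weights of PL2F (Sections 2.1 and 2.2) and ML2 (Section 3.1) described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' or Terrier's implementation: all function and parameter names are hypothetical, `math.lgamma` is used in place of the Lanczos approximation to the Γ function, all ML2 field weights are fixed to 1 as in the reported experiments, and the collection is assumed to hold more than one document.

```python
import math


def log2_factorial(x: float) -> float:
    """Return log2(x!) for real x >= 0 via the log-gamma function.

    Assumption: math.lgamma stands in for the Lanczos approximation
    mentioned in the text; both evaluate the Gamma function.
    """
    return math.lgamma(x + 1.0) / math.log(2.0)


def normalisation2(tf: float, length: float, avg_length: float, c: float) -> float:
    """Normalisation 2: tfn = tf * log2(1 + c * avg_length / length)."""
    if tf == 0:
        return 0.0  # the term does not occur, so there is nothing to normalise
    return tf * math.log2(1.0 + c * avg_length / length)


def pl2_weight(tfn: float, TF: float, N: int) -> float:
    """PL2 term weight: Poisson randomness model with Laplace after-effect."""
    lam = TF / N
    # -log2(lam^tfn * e^-lam / tfn!)
    information = (-tfn * math.log2(lam)
                   + lam * math.log2(math.e)
                   + log2_factorial(tfn))
    return information / (tfn + 1.0)  # (1 - P_risk) = 1 / (tfn + 1)


def pl2f_weight(tfs, lengths, avg_lengths, cs, ws, TF, N):
    """PL2F term weight: PL2 applied to the per-field pseudo-frequency."""
    tfn2f = sum(
        w * normalisation2(tf, l, avg_l, c)
        for tf, l, avg_l, c, w in zip(tfs, lengths, avg_lengths, cs, ws)
    )
    return pl2_weight(tfn2f, TF, N)


def ml2_weight(tfs, lengths, avg_lengths, cs, TF, N):
    """ML2 term weight: multinomial randomness model, field weights set to 1."""
    k = len(tfs)
    p_field = 1.0 / (k * N)   # prior of each field of the document
    p_rest = (N - 1.0) / N    # prior of the term occurring elsewhere
    tfns = [normalisation2(tf, l, avg_l, c)
            for tf, l, avg_l, c in zip(tfs, lengths, avg_lengths, cs)]
    tfn_rest = TF - sum(tfns)
    if tfn_rest < 0:
        # Assumption: this sketch simply rejects the case instead of guessing.
        raise ValueError("normalised frequencies exceed TF")
    information = (
        -log2_factorial(TF)
        + sum(log2_factorial(x) - x * math.log2(p_field) for x in tfns)
        + log2_factorial(tfn_rest)
        - tfn_rest * math.log2(p_rest)
    )
    return information / (sum(tfns) + 1.0)
```

A quick sanity check on the design: with a single field, c = 1 and a field length equal to the average length, the normalised frequency equals the raw frequency, so `ml2_weight` reduces to the binomial model of Equation (1) combined with the Laplace after-effect, and `pl2f_weight` reduces to plain PL2.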
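It's a Trap

"It's a trap" is one of the internet's newer catchphrases for sarcastic commentary. It comes from Admiral Ackbar in the Star Wars film Return of the Jedi.

Admiral Ackbar: Take evasive action! Green group, stick close to holding section MV-7!
Controller: Admiral! We have enemy ships in sector 47!
Admiral Ackbar: It's a trap!
(Star Wars: Return of the Jedi)

Meaning

As Star Wars grew in popularity, the line became a way for many people to play with the meme and pay tribute to the films.
Later, on 4chan, it became popular to post attractive pictures of cross-dressing men as bait and only afterwards reveal who was actually pictured.
In response, some users began posting Admiral Ackbar image macros in such threads as a warning, and over time "It's a trap" also came to refer to cross-dressing men.
Nowadays, however, fans of the genre often find that all the more exciting.

Usage
• When someone posts flame-bait, only to find that their fame has risen along with the flame war...

Tributes and references
• The line also appears in Solo: A Star Wars Story and in The Force Awakens.
• Many American TV shows, such as The Daily Show, The Colbert Report, The Big Bang Theory and Family Guy, have referenced or paid tribute to the meme.
• In the 2008 Star Wars video game Star Wars: The Force Unleashed, Ackbar is a hidden character; the cheat code that unlocks him is "ITSATWAP", which is spelled much like "It's a trap". [1]
• In StarCraft II, if a Terran battlecruiser (popularly nicknamed the "Yamato ship") takes even slight damage while travelling to the destination of a move command, its captain shouts in great agitation, "It's a trap!" Some even shout, "Abandon ship!"

Notes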
Contact: Patrick Louden
Tel: (561) 796-6793
Web Site:

Pratt & Whitney Space Propulsion – Mk-72

The Mk-72 booster is the latest addition to the STANDARD Missile Family, providing the SM-2 Block IV (AEGIS ER) the maximum capability in altitude and range. The first anti-air warfare missile system for fleet defense was initiated over 40 years ago. It has evolved from its origin as a wing-controlled beam rider to its present state as the most advanced supersonic tail-controlled semi-active RF homing missile. Integrated with the Mk-41 Vertical Launch System (VLS), the Block IV provides wide area coverage for the new generation of AEGIS missile cruisers. The Mk-72 provides not only the initial boost, but also complete steering for the missile including pitch, yaw and roll control. The four-nozzle thrust vector control (TVC) is unique for tactical missiles.

Pratt & Whitney Space Propulsion's San Jose, Calif., site developed the Mk-72 under contract to Raytheon Company in cooperation with a team which included the U.S. Navy and Hughes. The motor is the largest booster developed for launch from the Mk-41 Vertical Launching System (VLS). The booster is currently in production under contract to Raytheon Missile Systems Company, which is now responsible for all STANDARD Missile production.

The Mk-72 is also the first-stage booster for the Navy's Ballistic Missile Defense systems. The Mk-72 provides the Navy a high performance, extremely agile booster as the building block for all new missile initiatives, ensuring off-the-shelf compatibility with the VLS.

Characteristics
Diameter: 21 inches
Length: 68 inches
Total weight: 1,500 pounds
TVC system: 4 movable nozzles
Actuation: Electromechanical
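· Original Article ·

The value of glucocorticoids in the comprehensive treatment of acute respiratory distress syndrome caused by severe community-acquired pneumonia

SONG Zhi-fang, GUO Xiao-hong, WANG Shu-yun, XIE Wei, YIN Na, ZHANG Yue, SHAN Hui-min, LI Wen-hua (Medical ICU, Xinhua Hospital, Shanghai Second Medical University, Shanghai 200092)

Abstract: Objective: To explore the value of glucocorticoids (GC) in the comprehensive treatment of acute respiratory distress syndrome (ARDS) caused by severe community-acquired pneumonia (SCAP).
Methods: Clinical data were collected for all patients admitted to the ICU with SCAP-induced ARDS between May 2000 and February 2003. Age, sex, Acute Physiology and Chronic Health Evaluation II (APACHE II) score, oxygenation index (PaO2/FiO2), intrapulmonary shunt (Qs/Qt), severity of pulmonary infection, mechanical ventilation, positive end-expiratory pressure (PEEP) level, length of ICU stay, pneumonia resolution index, rate of oxygenation improvement, mortality and direct cause of death were analysed, and the effect of GC on the correction of hypoxia and shock and on prognosis was evaluated.
Results: Of the 24 patients, 7 did not receive GC; 5 of them recovered (71.4%) and 2 died (28.6%), the direct causes of death being ARDS (in a patient who refused mechanical ventilation) and multiple organ dysfunction syndrome (MODS), respectively. Seventeen patients received GC; only 5 recovered (29.4%) and 12 died (70.6%). The main cause of death was MODS (6 cases, 75.0%), with a few deaths due to ARDS and shock (1 case each, 12.5%).
Clinical parameters such as the severity of pulmonary infection did not differ significantly between patients who recovered and those who died (P>0.05). After treatment, however, in addition to marked improvement in PaO2/FiO2 and Qs/Qt and marked correction of shock, the response rate of the pulmonary infection was higher in the patients who recovered (P<0.001).
Conclusion: GC can assist conventional treatment such as mechanical ventilation in correcting the refractory hypoxia and shock of pulmonary ARDS caused by SCAP and similar conditions, buying time for treatment of the primary disease.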
Evaluation of glucocorticoid in treatment for patients with acute respiratory distress syndrome as a result of serious community acquired pneumonia

SONG Zhi-fang, GUO Xiao-hong, WANG Shu-yun, XIE Wei, YIN Na, ZHANG Yue, SHAN Hui-min, LI Wen-hua. Medical Intensive Care Unit, Xinhua Hospital, The Second Medical University of Shanghai, Shanghai 200092, China

Abstract: Objective: To evaluate the usage of glucocorticoid (GC) in treatment for patients with acute respiratory distress syndrome (ARDS) resulting from serious community acquired pneumonia (SCAP). Methods: The clinical data from all patients with ARDS resulting from SCAP in the medical ICU (MICU) from May 2000 to Feb. 2003 were collected. Their age, sex, acute physiology and chronic health evaluation (APACHE II) score, PaO2/FiO2 and Qs/Qt, the severity of SCAP, mechanical ventilation (MV) and the level of positive end expiratory pressure (PEEP), time of stay in ICU, improvement of SCAP and oxygenation, as well as mortality and reasons of death were analyzed. The influence of administration of GC on hypoxemia, septic shock, and their prognosis was analyzed as well. Results: There were 24 cases in total. Among them, 7 patients had not taken GC; 5 of these patients were cured (71.4 percent) and the other 2 died (28.6 percent). Their direct causes of death were multiple organ dysfunction syndrome (MODS) and ARDS, respectively. In 17 cases GC was given because hypoxemia and septic shock could not be alleviated with ordinary therapy, including MV. Among them only 5 patients (29.4 percent) were cured, and all others (12 cases, 70.6 percent) died; the major direct cause of death was MODS (6 cases, 75.0 percent). A few of them died of ARDS and septic shock (1 case each, 12.5 percent). The severity of SCAP, as well as other clinical data of the survivors, showed no significant difference compared with the nonsurvivors (P>0.05). Apart from the improvement in their PaO2/FiO2, Qs/Qt and shock, however, the pulmonary infection of the survivors was better controlled than that of the patients who died (P<0.001). Conclusion: Refractory hypoxemia and septic shock of patients with pulmonary ARDS might be alleviated by GC when they are treated with routine methods, including MV, thus winning time for other effective treatments.

Key words: glucocorticoid; serious community acquired pneumonia; acute respiratory distress syndrome; acute physiology and chronic health evaluation II score; mechanical ventilation
CLC number: R563.1; R365; R969   Document code: A   Article ID: 1003-0603(2003)11-0669-06

Funding: Supported by the Science and Technology Development Fund of the Shanghai Municipal Health Bureau (00409).
About the author: SONG Zhi-fang (born 1952), female (Han), native of Hefei, Anhui Province; Ph.D., professor, chief physician. Main research interest: critical care and emergency medicine. Chief editor of the monograph "Modern Ventilator Therapy: Mechanical Ventilation and Critical Illness"; author of more than 30 published papers.
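Terminator 2018: English Script Dialogue Analysis.txt
Every life is a parabola: talent determines how wide it opens, while its highest point depends on later effort.
Without the falling leaves of autumn, how could there be the bright green buds of spring? Only by understanding loss can one possess again.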
Learning English through Film: Terminator Salvation
[Scene: Longview State Correctional Facility, 2003]
-Dr. Kogan: Marcus, how are you?
-Marcus: Ask me in an hour.
-Dr. Kogan: I thought I'd try... one last time.
-Marcus: You should have stayed in San Francisco, Dr. Kogan.
Notes: Dr. = Doctor
-Dr. Kogan: By signing this consent form, you'll be donating your body to a noble cause. You'd... have a second chance... through my research to live again.
Notes: consent: agreement, permission; form: a document with blanks to fill in; donate: to give; noble: lofty, honorable; cause: an undertaking; research: study
-Marcus: You know what I did. My brother and two cops are dead because of me. I'm not lookin'
Notes: cop: police officer; lookin' = looking: seeking
2003 Text 1
Wild Bill Donovan would have loved the Internet. The American spymaster who built the Office of Strategic Services in World War II and later laid the roots for the CIA was fascinated with information. Donovan believed in using whatever tools came to hand in the “great game” of espionage: spying as a “profession.” These days the Net, which has already re-made such everyday pastimes as buying books and sending mail, is reshaping Donovan’s vocation as well.
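Note: "espionage" and "spying" mean the same thing here; "spying" is given as an explanation of "espionage." "Espionage" is used mainly of governments, the military, companies, and organizations and is the more formal word (in Chinese it can be rendered as 谍报); "spying" is the everyday word, used mainly of companies or individuals (it can be rendered as 间谍 or 密探).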
The latest revolution isn’t simply a matter of gentlemen reading other gentlemen’s e-mail. That kind of electronic spying has been going on for decades. In the past three or four years, the World Wide Web has given birth to a whole industry of point-and-click spying. The spooks call it “open-source intelligence,” and as the Net grows, it is becoming increasingly influential. In 1995 the CIA held a contest to see who could compile the most data about Burundi. The winner, by a large margin, was a tiny Virginia company called Open Source Solutions, whose clear advantage was its mastery of the electronic world.
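National Higher Education Self-Study Examination, April 2003: English Reading (I)
Course code: 00595
Answer all questions in English and write your answers in the corresponding places on the answer sheet; otherwise no credit will be given.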
PART ONE
Ⅰ. TEXT COMPREHENSION
The following comprehension questions are based on the texts you have learned, and each of them is provided with 4 choices marked [A], [B], [C] and [D]. Choose the best answer to each question and write it on the ANSWER SHEET. (20 points, 1 point each)
1. In Gifts of the Magi, both "gift" and "Magus" are in plural, because O. Henry wants to tell the reader that ______.
[A] people are kind to Mr. and Mrs. Young
[B] Mr. Young loves Mrs. Young
[C] Mrs. Young loves Mr. Young
[D] Mr. and Mrs. Young love each other
2. "I am not sure what I am rebelling against, but I really don't see a need for marriage. That isn't a statement about my feelings about the relationship, because there is no less strength of commitment." The underlined clause means ______.
[A] the married couples have more responsibility for each other
[B] the cohabiting couples have more responsibility for each other
[C] the married couples and the cohabiting ones show no responsibility for each other
[D] both the married couples and cohabiting ones should be equally responsible for each other
3. "Having come to a very remote and deserted spot, they realized their chance had come: catching Lorenzo off guard, they killed him." The underlined phrase means ______.
[A] Lorenzo was caught unawares
[B] Lorenzo was caught off duty
[C] Lorenzo was handed over to them by their guards
[D] Lorenzo was caught when his guard was away
4. In The Necklace, when Mme. Loisel took back the necklace, how did Mme. Forrester react?
[A] She opened the box and examined the jewel carefully.
[B] She said coldly that Mme. Loisel shouldn't have returned it so late.
[C] She complained that the necklace had been substituted.
[D] She was only too pleased to see her old friend again.
5. The Fisherman and His Wife is of ______.
[A] fable
[B] myth
[C] fairy story
[D] fairy-tale-romance
6. Mark Twain is NOT the author of ______.
[A] The Adventures of Tom Sawyer
[B] The Adventures of Huckleberry Finn
[C] The Celebrated Jumping Frog of Calaveras County
[D] The Old Man and the Sea
7.
In his fable about a proud crow and a hungry fox, Aesop intends to tell the reader that ______.
[A] the fox is never trustworthy
[B] the fox is always honey-tongued
[C] it is harmful to believe big talkers
[D] it is harmful to listen to excessive flattery
8. According to Bringing up Children, if one stage of child development has been left out, or not sufficiently experienced, ______.
[A] the child may go back and recapture the experience of it
[B] the parents may provide the child with the opportunity to play with toys
[C] the parents must be consistent in their attitude to their children
[D] the child should be sent to a child clinic for a psychological treatment
9. The theme of the story A Day's Wait is that ______.
[A] misunderstandings can even occur between father and son
[B] misunderstandings can sometimes lead to an odd experience
[C] to be calm and controlled in the face of death is a mark of courage
[D] death is something beyond a child's comprehension
10. In A Day's Wait, the hunting scene, at first glance, may seem to have little to do with the plot. However, the author has his own justification for describing it. Which of the following is NOT a reason for such description?
[A] It diverts the reader so that the boy's real thoughts will be a greater surprise when they are revealed.
[B] It creates a sense of time passing so that we know it is close to evening by the time the father gets home.
[C] It gives the author an opportunity to show that he is able to write very complex sentences though he usually writes very short, simple ones.
[D] It brings out a contrast between the father's robust activities outside and the boy's terrible tension inside.
11. In Art for Heart's Sake, Dr. Caswell gave Ellsworth a suggestion that he ______.
[A] take more medicine
[B] listen to the radio or watch TV
[C] take more automobile rides
[D] take up art
12.
In How to Live like a Millionaire, the self-made rich develop clear goals for ______.
[A] accumulating income till the age of 50
[B] having a dollar figure in mind and working for it
[C] leaving an estate to their children
[D] retiring early
13. The short story as a genre in American literature probably began with Irving's The Sketch Book, a collection of essays, sketches and tales, among which the most famous and frequently anthologized are Rip Van Winkle and ______.
[A] The Wild Honeysuckle
[B] The Legend of Sleepy Hollow
[C] The Scarlet Letter
[D] The Pioneers
14. "Not even the great Nicholas Vedder himself was safe from the tongue of this daring woman, who blamed him for much of her husband's idleness." The word tongue in this quotation probably refers to ______.
[A] extremely intelligent and lively words
[B] offensive or insulting remarks
[C] a movable organ in the mouth
[D] the tone or manner of speaking
15. According to The Story of the Bible, the Jews were the first among all people to recognize that ______.
[A] different gods made different things in nature
[B] one single God created this world
[C] one god was devoted to the making of water
[D] different gods were responsible for the making of the land
16. According to Otto Jespersen, the ideal international language was the one that ______.
[A] was the easiest to learn for people all over the world
[B] was familiar to scientists all over the world
[C] was based on Latin and Greek roots
[D] derived the basic structure from non-Indo-European languages
17. In Bricks from the Tower of Babel, the writer provides a detailed explanation for which of the following?
[A] The construction of the tower.
[B] The structure and sound system of Esperanto.
[C] The internationalization of some natural languages.
[D] The Indo-European language family.
18.
In The Girls in Their Summer Dresses, Michael's state of mind suggests that ______.
[A] he has adjusted himself to married life
[B] he is often absent-minded and confused
[C] he starts to resent Frances now
[D] he takes for granted what he is doing
19. In The Girls in Their Summer Dresses, Frances said, "You're going to make a move." She said so to mean that Michael would ______.
[A] move away to some other location
[B] attract and move some girls
[C] arouse deep emotions in girls
[D] take action and leave her some day
20. According to Universities and Polytechnics, Oxford and Cambridge are attractive to both the resident students and visitors for their ______.
[A] advanced academic learning
[B] excellent constituent colleges
[C] organizational structures
[D] buildings of historical significance
Ⅱ. READING COMPREHENSION
In this part there are 4 reading passages followed by 20 questions or unfinished statements. For each of them there are 4 choices marked [A], [B], [C] and [D]. You should decide on the best answer and write it on the ANSWER SHEET. (40 points, 2 points each)
Passage 1
Failure is probably the most fatiguing experience a person ever has. There is nothing more exhausting than not succeeding: being blocked, not moving ahead. It is an evil circle. Failure breeds fatigue, and fatigue makes it harder to get to work, which adds to the fatigue.
We experience this tiredness in two main ways, as start-up fatigue and performance fatigue. In the former case, we keep putting off a task that we are forced to take up. Either because it is too tedious or because it is too difficult, we avoid it. And the longer we postpone it, the more tired we feel.
Such start-up fatigue is very real, even if not actually physical, not something in our muscles and bones. The remedy is obvious, though perhaps not easy to apply: willpower exercise. The moment I find myself turning away from a job, or putting it under a pile of other things I have to do, I clear my desk of everything else and attack the objectionable item first.
To prevent start-up fatigue, always treat the most difficult job first.
Performance fatigue is more difficult to handle. Here we are willing to get started, but we cannot seem to do the job right. Its difficulties appear to be insurmountable and however hard we work, we fail again and again. The mounting experience of failure carries with it an ever-increasing burden of mental fatigue. In such a situation, I work as hard as I can, then let the unconscious take over.
21. Which of the following can be called an evil circle?
[A] Success – zeal – success – zeal.
[B] Failure – tiredness – failure – tiredness.
[C] Failure – zeal – failure – tiredness.
[D] Success – exhaustion – success – exhaustion.
22. According to the passage, when keeping putting off a task, we can experience ______.
[A] tiredness
[B] performance fatigue
[C] start-up fatigue
[D] unconsciousness
23. To overcome start-up fatigue, we need ______.
[A] toughness
[B] prevention
[C] muscles
[D] strong willpower
24. The word insurmountable in the last paragraph probably means ______.
[A] unable to be solved
[B] unlikely to be understood
[C] unable to be imagined
[D] unlikely to be rejected
25. According to the passage, which of the following statements is NOT true?
[A] It is easier to overcome start-up fatigue.
[B] Performance fatigue occurs when the job we are willing to take gets blocked.
[C] One will finally succeed after experiencing the evil circle.
[D] Fatigue often accompanies failure.
Passage 2
On days when there is work, I talk to the other guys. Some of them tell me that the harvest season is coming in northern California, and they say that one can earn good money there. Things haven't gone so badly in the car wash, but one afternoon I give the manager my thanks for having hired and promoted me, and with a little suitcase that night I board a Greyhound headed north. My ticket is made out for San Francisco, but I don't plan to go that far.
I plan to ride until I find a place where people are harvesting, and to get off the bus there.
I sleep on the bus for a few hours that night, and in the morning, when I awake, I don't know where we are. I get up from my seat and walk down the bus aisle, looking for a Mexican or Chicano to tell me our location, but oddly enough, I don't see any among the passengers, who are all white-skinned. I pay attention to the road signs we pass, but they are not of much help. I can read the town names, but I don't know where the towns lie. A map would help me, and I decide to buy one at our next stop. Lots of things are for sale at the bus stop's gift shop, but there are no maps. I direct myself towards the shop's operator, but I run into the language barrier. The operator is an Anglo, and when I speak to him in Spanish, he says that he doesn't understand. I try to practice my very precarious (unreliable) English with him, but it's of no use. I have a rough idea of the sound of the words that I want to say, but I can't pronounce them right. I make signs, signaling a big piece of paper and say "form California," but he turns into a question mark, with eyes wide open, arms raised and hands extended. "Map," I say, but I don't pronounce the word very well. "Freeways, streets," I add, but he still doesn't understand. He points out chewing gum, candies, pieces of cake, sandwiches, soft drinks, and cigarettes, trying to guess what I'm asking for. But he doesn't show me any maps. Finally, I back out of the store, and as I leave I hear him say, "I'm sorry."
A little before the bus leaves, I run into a Mexican-American in a hallway and I immediately ask him to help me find a map of California. We go back to the store. The Chicano asks for a map. "Ahh! Ahaaa!" the operator exclaims. Then he goes to a corner of his shelves and takes out what I've been asking for. While I am paying him, he talks to the Chicano in a joyful tone.
With the map in my hands, I give the Chicano my thanks, and he explains that the store-keeper thinks that I am asking if he needs anybody to clean the floor or "mop."
26. The writer decided to leave his job and go to northern California because ______.
[A] his boss didn't like him
[B] things were going badly in the car wash
[C] he thought he could earn more money
[D] there wasn't always work
27. The writer wanted a map in order to ______.
[A] find the way to San Francisco
[B] help him with the road signs
[C] know where he was in relation to the entire trip
[D] find his way back to his workplace
28. From the passage, we can infer that ______.
[A] the owner of the shop did not want to sell the writer a map
[B] the writer was fired from the car wash
[C] the writer was a migrant farm worker
[D] the writer was traveling with a friend who could speak English
29. The writer tries to make himself understood by all the following EXCEPT ______.
[A] gestures
[B] words or phrases
[C] pronunciations
[D] spelling the word
30. We can learn from the story that ______.
[A] incorrect pronunciations may result in misunderstanding
[B] immigrants usually have a hard time in the foreign countries
[C] a foreign language can be learned through conversations
[D] traveling alone brings unexpected troubles and problems
Passage 3
Exceptional children are different in some significant ways from others of the same age. For these children to develop to their full adult potential, their education must be adapted to those differences.
Although we focus on the needs of exceptional children, we find ourselves describing their environment as well. While the leading actor on the stage captures our attention, we are aware of the importance of the supporting players and the scenery of the play itself. Both the family and the society in which exceptional children live are often the key to their growth and development.
And it is in the public schools that we find the full expression of society's understanding: the knowledge, hopes, and fears that are passed on to the next generation.
Education in any society is a mirror of that society. In that mirror we can see the strengths, the weaknesses, the hopes, the prejudices, and the central values of the culture itself. The great interest in exceptional children shown in public education over the past three decades indicates the strong feeling in our society that all citizens, whatever their special conditions, deserve the opportunity to fully develop their capabilities.
"All men are created equal." We've heard it many times, but it still has important meaning for education in America. Although the phrase was used by this country's founders to denote equality before the law, it has also been interpreted to mean equality of opportunity. That concept implies educational opportunity for all children, the right of each child to receive help in learning to the limits of his or her capacity, whether that capacity be small or great. Recent court decisions have confirmed the right of all children, disabled or not, to an appropriate education, and have ordered that public schools take the necessary steps to provide that education.
In response, schools are modifying their programs, adapting instruction to children who are exceptional, to those who cannot profit substantially from regular programs.
31. In Paragraph 2, the author cites the example of the leading actor on the stage to show that ______.
[A] the growth of exceptional children has much to do with their families and the society
[B] exceptional children are more influenced by their families than normal children are
[C] exceptional children are the key interest of the family and society
[D] the needs of the society weigh much heavier than the needs of the exceptional children
32. The reason why exceptional children receive so much concern in education is that ______.
[A] they are expected to be leaders of the society
[B] they might become a burden of the society
[C] they should fully develop their potentials
[D] disabled children deserve special consideration
33. This passage mainly deals with ______.
[A] the differences of children in their learning capabilities
[B] the definition of exceptional children in modern society
[C] special educational programs for exceptional children
[D] the necessity of adapting education to exceptional children
34. From this passage we learn that the educational concern for exceptional children ______.
[A] is now enjoying legal support
[B] disagrees with the tradition of the country
[C] was clearly stated by the country's founders
[D] will exert great influence over court decisions
35. Which of the following is TRUE according to the passage?
[A] Exceptional children refer to those with mental or physical problems.
[B] The author uses "All men are created equal" to counter the school program for exceptional children.
[C] Recent court decisions confirm the rights of exceptional children to learn with regular children.
[D] Regular school programs fail to meet the requirements to develop the potential of exceptional children.
Passage 4
Life is a series of problems. Do we want to moan about them or solve them?
Do we want to teach our children to solve them?
Discipline is the basic set of tools we require to solve life's problems. Without discipline we can solve nothing. With only some discipline we can solve only some problems. With total discipline we can solve all problems.
What makes life difficult is that the process of confronting and solving problems is a painful one. Problems, depending upon their nature, evoke in us frustration or grief or sadness or loneliness or guilt or regret or anger or fear or anxiety or anguish or despair. These are uncomfortable feelings, often very uncomfortable, often as painful as any kind of physical pain, sometimes equaling the very worst kind of physical pain. Indeed, it is because of the pain that events or conflicts engender in us all that we call them problems. And since life poses an endless series of problems, life is always difficult and is full of pain as well as joy.
Yet it is in this whole process of meeting and solving problems that life has its meaning. Problems are the cutting edge that distinguishes between success and failure. Problems call forth our courage and our wisdom; indeed, they create our courage and our wisdom. It is only because of problems that we grow mentally and spiritually. When we desire to encourage the growth of the human spirit, we challenge and encourage the human capacity to solve problems, just as in school we deliberately set problems for our children to solve. It is through the pain of confronting and resolving problems that we learn. As Benjamin Franklin said, "Those things that hurt, instruct." It is for this reason that wise people learn not to dread but actually to welcome problems and actually to welcome the pain of problems.
I have stated that discipline is the basic set of tools we require to solve life's problems.
It will become clear that these tools are techniques of suffering, means by which we experience the pain of problems in such a way as to work them through and solve them successfully, learning and growing in the process. When we teach ourselves and our children discipline, we are teaching them and ourselves how to suffer and also how to grow.
What are these tools, these techniques of suffering, these means of experiencing the pain of problems constructively that I call discipline? There are four: delaying of gratification (satisfaction), acceptance of responsibility, dedication to truth, and balancing. As will be evident, these are not complex tools whose application demands extensive training. To the contrary, they are simple tools, and almost all children are adept in their use by the age of ten. Yet presidents and kings will often forget to use them, to their own downfall. The problem lies not in the complexity of these tools but in the will to use them. For they are tools with which pain is confronted rather than avoided, and if one seeks to avoid legitimate suffering, then one will avoid the use of these tools.
36. The main point of this passage is that ______.
[A] without discipline we can solve nothing
[B] problems evoke in us frustration or grief
[C] dealing with one's problems gives life meaning
[D] the tendency to avoid problems results in mental illness
37. People who use a little discipline ______.
[A] can solve all of their problems
[B] can solve some of their problems
[C] can solve nothing
[D] have total discipline
38.
According to the author, which of the following makes life difficult?
[A] Physical pain.
[B] Frustration and guilt.
[C] Solving problems.
[D] Conflicts.
39. Problems give our life meaning by all of the following means EXCEPT ______.
[A] showing us the difference between success and failure
[B] giving us courage
[C] challenging us to grow
[D] teaching us to avoid problems
40. According to the author, which of the following is TRUE?
[A] Successful leaders avoid their problems.
[B] The tools for solving problems are hard to learn.
[C] We need to confront emotional pain.
[D] The tools of discipline are complicated.
Ⅲ. SKIMMING AND SCANNING
In this part there are 3 reading passages followed by 10 questions or unfinished statements. For each of them there are 4 answers marked [A], [B], [C] and [D]. Skim or scan the passages, then decide on the best answer and write it on the ANSWER SHEET. (10 points, 1 point each)
Passage 1
Oct. 30, 1996
Dear Sirs:
We are pleased to make you an offer regarding our 'Swinger' dresses and trouser suits in the sizes you require. All the models can be supplied by the middle of December 1996, subject to our receiving your firm order by 15th November. Our C.I.F. prices are understood to be for sea/land transport to Chicago. If you would prefer the goods to be sent by air freight, this will be charged at extra cost.
Trouser Suits: sizes 8 – 16 in white, yellow, red, turquoise, black, pink; per 100: $2650.00
Swinger Dresses: sizes 8 – 16 in white, yellow, red, turquoise, black; per 100: $1845.00
Prices: valid until 31st December, 1996
Delivery: C.I.F. Chicago
Transport: sea/land freight
Payment: by irrevocable letter of credit, or cheque with order
You will be receiving cuttings of our materials and a colour chart. These were airmailed to you this morning.
We hope you agree that our prices are very competitive for these good quality clothes, and look forward to receiving your initial order.
Yours faithfully,
Robert Morgan
41. Judging from the message given in the letter, the writer is a ______.
[A] seller
[B] buyer
[C] government official
[D] lawyer
42. The price quoted for each Swinger Dress is ______.
[A] $2650
[B] $1845
[C] $26.5
[D] $18.45
43. The goods under discussion can be delivered by ______.
[A] Oct. 30, 1996
[B] the middle of Dec. 1996
[C] Nov. 15, 1996
[D] Dec. 31, 1996
Passage 2
When the CEO of Lotus, manufacturer of computer software, interviews job candidates, he looks for people who can laugh out loud. At the headquarters of ice-cream maker Ben & Jerry's, the "Minister of Joy" supervises the "Joy Gang", which has the job of spending $100,100 a year planning and implementing workplace fun. Odetics, maker of video security systems and other recording equipment, considered it an honor when Industry Week called it "the funniest place to work in the U.S."
In corporate America today, humor is a serious business. Workers have been downsized, re-engineered, restructured, and overworked for so long they have forgotten how to smile and laugh. To remind them, companies are posting amusing notes and cartoons on bulletin boards, building libraries of humorous books for workers to read, sponsoring "fun at work" days, "laughter" committees, and even hiring specialists.
As a result, the corporate humor business has taken off. A "humor services" group, called Humor Project, reports that it receives about twenty requests each day from companies looking for humor consultants. The Laughter Remedy, an organization that teaches the benefits of humor, helps employees build "humor skills" through a program that includes such steps as "developing the ability to play with language" and "finding humor in everyday life." Humor consultant Paul McGhee gives audiences "remedial belly laughing" lessons.
He tells them to smile, raise their eyebrows, lower their jaws, tighten their stomach muscles, and laugh. Speakers from Lighten Up Limited, a humor consulting firm, urge workers to tell jokes and take humor breaks. In their search for comic relief, organizations are spending thousands of dollars. Humor consultant Matt Weinstein, for example, receives $7500 for a ninety-minute talk.
Why all the fuss and expense over an activity that seems contrary to the work ethic? One recent study reports that the most productive workplaces have at least ten minutes of laughter every hour. And corporations that have added humor to the workplace report an increase not only in productivity but also in employee loyalty, creativity, and morale, as well as improved teamwork and employee health.
44. The corporate laughter business is booming because ______.
[A] such an activity seems contrary to the work ethic
[B] the humor business has proved profitable
[C] the workers overwork, so much so that they intend to get their work re-engineered and restructured
[D] few corporations consider humor a serious business and an incentive to productivity
45. According to the passage, the Laughter Remedy helps employees ______.
[A] take humor breaks and relax themselves
[B] develop their abilities to use language
[C] build "humor skills" through a designed program
[D] free themselves from the overwork
46. It may be inferred from the passage that ______.
[A] the character of Americans seems to require that they should be humorous
[B] wherever there is demand, a market will be created
[C] humor is the most popular leisure pursuit in the western world
[D] humor is the only source of revenue for the "laughter" specialists
Passage 3
This Valentine's Day, 35-year-old Peter Henig had no trouble finding a date.
He had been elected one of the 10 most wanted bachelors of the Internet by Women.com.
Since then, Henig gets some 100 emails a day from women all over the world asking him for a date.
Henig is good-looking enough to be considered one of the most suitable bachelors in cyberspace. As a senior editor at Red Herring, the bimonthly magazine of the tech world, he's certainly smart and successful.
Forget the yuppies of the 1980s; the hottest bachelors these days, dot-com crisis or not, are the Silicon boys.
"I didn't need a date that badly," said Henig. But when he was contacted by Women.com to be included in their "Top 10 Men of the Internet" contest, he eagerly accepted.
"I don't look at it as a dating machine. I just thought it could be fun," he said.
In Silicon Valley, often dubbed (called) the "valley of guys" for its high percentage of unmarried men, the venture capital gold rush may be over, but the dating industry is booming.
According to a recent report, Silicon Valley should be the place for single women looking for love. For every 318 single men in the city of San Jose, the heart of Silicon Valley, there are 288 single women.
Known for their lack of social skills, computer geeks are showing that they too can have a life. This is especially true during the economic downturn for tech industries, when there's no real need to spend all that time in front of their computers.
According to Katherine Winter, who met her husband on Match.com, an online dating service, the end of the gold rush may not be bad news for the Silicon boys. She said, "Silicon Valley is definitely the place to be for single women, because of the quality and the number of men."
47. According to the passage, Henig has been elected as one of the most wanted single men because he is ______.
[A] a handsome young man
[B] a computer expert
[C] one of the hottest bachelors
[D] good-looking, smart and successful
48. According to Katherine Winter, Silicon Valley is the ideal place for single women to find
Part II Reading Comprehension
Passage 1
Most people think of a camel as an obedient beast of burden, because it is best known for its ability to carry heavy loads across vast stretches of desert without requiring water. In reality, the camel is considerably more than just the Arabian equivalent of the mule. It also possesses a great amount of intelligence and sensitivity.
The Arabs assert that camels are so acutely aware of injustice and ill treatment that a camel owner who punishes one of the beasts too harshly finds it difficult to escape the camel's vengeance. Apparently, the animal will remember an injury and wait for an opportunity to get revenge.
In order to protect themselves from the vengeful beasts, Arabian camel drivers have learned to trick their camels into believing they have achieved revenge. When an Arab realizes that he has excited a camel's rage, he places his own garments on the ground in the animal's path. He arranges the clothing so that it appears to cover a man's body. When the camel recognizes its master's clothing on the ground, it seizes the pile with its teeth, shakes the garments violently and tramples on them in a rage. Eventually, after its anger has died away, the camel departs, assuming its revenge is complete. Only then does the owner of the garments come out of hiding, safe for the time being, thanks to this clever ruse.
21. Which of the following is mentioned in this passage?
A. The camel never drinks water.
B. The camel is always violent.
C. The camel is very sensitive.
D. The camel is rarely used anymore.
22. It is implied in the passage that ______.
A. the mule is a stupid and insensitive animal
B. the mule is as intelligent as the camel
C. the mule is an animal widely used in the desert
D. the mule is a vengeful animal
23. From this passage we can conclude that ______.
A. camels are generally vicious towards their owners
B. camels usually treat their owners well
C. camels don't see very well
D.
camels try to punish people who abuse them

24. The writer makes the camel's vengeful behavior clearer to the reader by presenting ______.
A. a well-planned argument B. a large variety of examples C. some eyewitness accounts D. a typical incident

25. The main idea of the passage is ______.
A. camels can be as intelligent as their drivers B. camels are sensitive to injustice and will seek revenge on those who harm them C. camel drivers are often the targets of camels' revenge D. camels are sensitive creatures that are aware of injustice

Passage 2

Although April did not bring us the rains we all hoped for, and although the Central Valley doesn't generally experience the atmospheric sound and lightning that can accompany those rains, it is still important for parents to be able to answer the youthful questions about thunder and lightning.

The reason these two wonders of nature are so difficult for many adults to explain to children is that they are not very well understood by adults themselves. For example, did you know that the lightning we see flashing down to the earth from a cloud is actually flashing up to a cloud from the earth? Our eyes trick us into thinking we see a downward motion when it's actually the other way around. But then, if we believed only what we think we see, we'd still insist that the sun rises in the morning and sets at night.

Most lightning flashes take place inside a cloud, and only relatively few can be seen jumping between two clouds or between earth and a cloud. But, with about 2,000 thunderstorms taking place above the earth every minute of the day and night, there's enough activity to produce about 100 lightning strikes on earth every second.

Parents can use thunder and lightning to help their children learn more about the world around them.
When children understand that the light of the lightning flash reaches their eyes almost at the same moment, but the sound of the thunder takes about 5 seconds to travel just one mile, they can begin to time the interval between the flash and the crash to learn how close they were to the actual spark.

26. According to the author, in the area of the Central Valley, ______.
A. rains usually come without thunder and lightning B. it is usually dry in April C. children pay no attention to natural phenomena D. parents are not interested in thunder and lightning

27. We believe that lightning is a downward motion because ______.
A. we were taught so by our parents from our childhood B. we are deceived by our sense of vision C. it is a common natural phenomenon D. it is a truth proved by science

28. What is TRUE about lightning according to the passage?
A. Only a small number of lightning flashes occur on earth. B. Lightning travels 5 times faster than thunder. C. Lightning flashes usually jump from one cloud to another. D. There are far more lightning strikes occurring on earth than we can imagine.

29. The word "activity" (Para. 3, Line 3) is most closely related to the word(s) ______.
A. "cloud" B. "lightning strikes" C. "lightning flashes" D. "thunderstorms"

30. It can be concluded from the passage that ______.
A. we should not believe what we see or hear B. things moving downward are more noticeable C. people often have wrong concepts about ordinary phenomena D. adults are not as good as children in observing certain natural phenomena

Passage 3

In what now seem like the prehistoric times of computer history, the early postwar era, there was quite a widespread concern that computers would take over the world from man one day. Already today, as computers are relieving us of more and more of the routine tasks in business and in our personal lives, we are faced with a less dramatic but also less foreseen problem.
People tend to be over-trusting of computers and are reluctant to challenge their authority. Indeed, they behave as if they were hardly aware that wrong buttons may be pushed, or that a computer may simply malfunction.Obviously, there would be no point in investing in a computer if you had to check all its answers, but people should also rely on their own internal computers and check the machine when they have the feeling that something has gone wrong.Questioning and routine double-checks must continue to be as much a part of good business as they were in pre-computer days. Maybe each computer should come with the warning: for all the help this computer may provide, it should not be seen as a substitute for fundamental thinking and reasoning skills.31. What is the main purpose of this passage?A. To look back to the early days of computers.B. To explain what technical problems may occur with computers.C. To discourage unnecessary investment in computers.D. To warn against a mentally lazy attitude towards computers.32. According to the passage, the initial concern about computers was that they might _________.A. change our personal livesB. take control of the worldC. create unforeseen problemsD. affect our business33. The passage recommends those dealing with computers to _____.A. be reasonably doubtful about themB. check all their answersC. substitute them for basic thinkingD. use them for business purposes only34. The passage suggests that the present-day problem with regard to computers is _____________.A. challengingB. psychologicalC. dramaticD. fundamental35. It can be inferred from the passage that the author would disapprove of ___________.A. investment in computersB. the use of one’s internal computerC. double-check on computersD. 
complete dependence on computers for decision-making

Passage 4

With page after page of bulging biceps and masculine jaws, robust hairlines and silken skin, Men's Health is advertising a standard of male beauty as stereotyped and unrealistic as the female version sold by those large-eyed, very young girls seen on the covers of Glamour and Elle. It is well on its way to making the male species as insane, insecure, and irrational about physical appearance as does any women's magazine. The days when men scrubbed their faces with regular soap and viewed gray hair and wrinkles as a badge of honor are fading. In the U.S.A., an increasing number of men are using toiletries and various skin treatments, which are traditionally for females, to create an illusion of a youthful appearance.

Magazines such as Men's Health are affordable, efficient delivery vehicles for the message that physical imperfection, age, and an underdeveloped fashion sense are potentially crippling disabilities. Moreover, advertising a physical makeover or a trip to a weight reduction clinic as a smart way to help one in his career seems to help men rationalize their image obsession. "Whatever a man's cosmetic shortcoming, it's apt to be a career liability," noted Alan Farnham in a recent issue of Fortune.

36. According to the passage, men ______.
A. have always had a strong obsession about their image B. used to feel proud of their gray hair and wrinkles C. used to pay no attention to their health D. use female products to create the illusion of a youthful appearance

37. The magazines such as Men's Health suggest that ______.
A. Men should do regular exercises to be muscular and masculine. B. Men become crazy about their physical appearance. C. Keeping a youthful appearance is one part of men's career. D. Lack of knowledge about how to keep young and attractive physically is a defect for a modern man.

38.
Men’s magazines try to convince its readers that to have plastic surgery is a wise thing for men to do because it can __________.A. help them in their careerB. help them more popular among womenC. make them look young and handsome.D. help solve the problem of aging39. What does the word “liability” mean (in the last but one line of Para.3)?A. the quality of being liableB. responsibilityC. mistakeD. drawback40. The author’s attitude towards men’s magazines is _______.A. positiveB. criticalC. neutralD. sympatheticPart III Vocabulary and Structure41. Y ou can’t rely on his promise. It sounds ____ even in his own ears.A. unbearableB. unfoundedC. naïveD. hollow42. If you have any questions ____ any of our services, please feel free to call me.A. in the course ofB. consideringC. regardingD. in view of43. ________ my colleagues and myself I’d like to give a warm welcome to you all.A. For the sake ofB. In name ofC. On behalf ofD. In honor of44. Gambling is _______ on by some church authorities.A. imposedB. stompedC. frownedD. preached45. They were tired, but not any less enthusiastic _________.A. on that accountB. in that wayC. by that meansD. in that case46. If I tell the police I was with you that day, will you ______ my story?A. back upB. back offC. back ofD. back down47. What she suggested in her lecture _______ the existing ideas about the causes of heart disease.A. pursuedB. exploitedC. overcameD. exploded48. Britain’s mineral _______ include oil, coal and gas deposits.A. assetsB. resourcesC. sourcesD. origins49. ________ continued protection and conservation, the country-side will be used and enjoyed by our children and grandchildren.A. AllowedB. ProvidedC. GrantedD. Given50. Each of them _______ the point that the present system was unfair, but nobody said nothing.A. consentedB. concededC. confusedD. concealed51. The beautiful sunset and the peaceful atmosphere left him feeling very _______.A. contemplativeB. contemporaryC. contemptD. consistent52. 
I think my husband is the most handsome man in the world, but I realize my judgment is rather _______.A. reasonableB. subjectiveC. objectiveD. ridiculous53. The young girl violinist ________ all the other competitors.A. ourgrewB. outragedC. outlastedD. outshone54. The company issued guidelines to prevent any kind of ______ in the workplace.A. harassmentB. commitmentC. advancementD. judgment55. I know you’re very keen to go to the Middle East, but I’m afraid I still have _________.A. observationsB. conservationsC. reservationsD. preservations56. Select an honest and _______ dealer who will supply you with high quality goods.A. defenselessB. reliableC. stubbornD. sophisticated57. Nuclear weapons should be used only as a last ________.A. resortB. approachC. appealD. solution58. In Sweden employers have taken the ______ in promoting health insurance schemes.A. initialB. initiativeC. initialsD. initiatives59. The government has fallen ______ on the subjects of tax cuts after all its promises at the last election.A. silentlyB. silentC. to be silentD. to being silent60. Economists believe that the jobless total will ______ to 3.5 million by the spring.A. roarB. stretchC. soarD. sum61. The drug will ______ the tiger harmless for up to two hours.A. renderB. tendC. obligeD. blend62. She again ______ that she had seen him just before the accident.A. conformedB. confirmedC. reformedD. affirmed63. The ______ of the teacher’s making the students work hard helps them learn a lot.A. consistencyB. inconsistencyC. contradictionD. contraction64. His fear of dogs _______ a bad experience as a child.A. results inB. springs fromC. gives rise toD. attributes to65. Scientists are now on the _______ of a better understanding of how the human brain works.A. basisB. groundC. thresholdD. side66. Well, I suppose she can stay here _______, while she’s looking for an apartment.A. permanentlyB. prevalentlyC. reluctantlyD. temporarily67. 
I consider David’s article on July 19 to be inaccurate, ______ and ridiculous.A. offensiveB. persuasiveC. comprehensiveD. instructive68. We’ll be obliged to _______ you here while we continue the investigation.A. departB. detainC. retainD. contain69. The conference was seen as an ideal _______ for increased cooperation between the member states.A. vehicleB. accessC. techniqueD. claim70. Prices of fruits and vegetable _______ according to the season.A. frustrateB. convertC. expoundD. fluctuatePart IV Cloze Directions: For each blank in the following passage, choose the best answer from the choices given.People differ in their ability to learn. They differ in the _71_ of ability they have, ___72__ in the kind of ability they have. Some students can __73_ get passing marks in the high school, while others earn all A’s. Certain students are successful ___74_ mathematics and science __75___ do poorly in literature and history. Other students do well in literature and history and poorly in science and mathematics.It is ___76__ what causes these differences. Are they due to the person’s ___77___, to the kind of home or neighborhood in which he ___78__ and kind of exp eriences he had? Or is a person’s ability to learn ___79__ him from his ancestors, ___80__ such traits as the texture of his hair and the color of his eyes?Most studies of these questions seem to __81__ the fact that both the conditions ___82___ a person grows up and the traits passed on to him from his ancestors ___83__ how well the person learns.It is true that some families have an unusually large ___84__ of gifted members. But these families may live in homes __85___ there are books and other opportunities for learning. The parents may be ___86___ their children’s success in school. In other families there are questions of people with very modest abilities. In these families, interest ___87__ learning may not be encouraged.Differences ___88___ boys and girls in ability to learn have also been reported. 
Generally, however, the differences have been small. Girls __89__ to have slightly better grades in elementary school, and they are slightly better in word skills than boys. Boys, __90__, get higher scores on mathematical learning tasks and in science and mechanics.

71. A. amount B. greatness C. number D. room
72. A. the same like B. and as well C. as well D. as well as
73. A. always B. hard C. hardly D. almost
74. A. on B. about C. at D. in
75. A. thereby B. but C. therefore D. thus
76. A. hard to find out B. difficult to define C. easy to decide D. useless to discover
77. A. surrounding B. environment C. atmosphere D. background
78. A. educated B. brought up C. grew up D. experienced
79. A. passed on to B. past on to C. passed up D. passed by
80. A. together B. along C. along by D. along with
81. A. point up B. point to C. point down D. point off
82. A. by which B. with which C. under which D. on which
83. A. measure B. make C. determine D. demonstrate
84. A. amount B. scope C. sum D. number
85. A. in which B. by which C. with which D. under which
86. A. linked to B. worried with C. concerned with D. connected with
87. A. about B. from C. in D. at
88. A. among B. in C. from D. between
89. A. hope B. tend C. expect D. intend
90. A. on the contrast B. on the condition C. in the situation D. on the other hand

Part V Writing
Dynamic Transcriptome Landscape of Maize Embryo and Endosperm Development [1][W][OPEN]

Jian Chen [2], Biao Zeng [2], Mei Zhang, Shaojun Xie, Gaokui Wang, Andrew Hauck, and Jinsheng Lai*

State Key Laboratory of Agro-biotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People's Republic of China

Maize (Zea mays) is an excellent cereal model for research on seed development because of its relatively large size for both embryo and endosperm. Despite the importance of seed in agriculture, the genome-wide transcriptome pattern throughout seed development has not been well characterized. Using high-throughput RNA sequencing, we developed a spatiotemporal transcriptome atlas of B73 maize seed development based on 53 samples from fertilization to maturity for embryo, endosperm, and whole seed tissues. A total of 26,105 genes were found to be involved in programming seed development, including 1,614 transcription factors. Global comparisons of gene expression highlighted the fundamental transcriptomic reprogramming and the phases of development. Coexpression analysis provided further insight into the dynamic reprogramming of the transcriptome by revealing functional transitions during development. Combined with the published nonseed high-throughput RNA sequencing data, we identified 91 transcription factors and 1,167 other seed-specific genes, which should help elucidate key mechanisms and regulatory networks that underlie seed development. In addition, correlation of gene expression with the pattern of DNA methylation revealed that hypomethylation of the gene body region may be an important factor for the expressional activation of seed-specific genes, especially for extremely highly expressed genes such as zeins. This study provides a valuable resource for understanding the genetic control of seed development of monocotyledon plants.

Maize (Zea mays) is one of the most important crops and provides resources for food, feed, and biofuel (Godfray et al., 2010). It has
also been used as a model system to study diverse biological phenomena, such as transposons, heterosis, imprinting, and genetic diversity (Bennetzen and Hake, 2009). The seed is a key organ of maize that consists of the embryo, endosperm, and seed coat. Maize seed development initiates from a double fertilization event in which two pollen sperm fuse with the egg and central cells of the female gametophyte to produce the progenitors of the embryo and endosperm, respectively (Dumas and Mogensen, 1993; Chaudhury et al., 2001). The mature embryo inherits the genetic information for the next plant generation (Scanlon and Takacs, 2009), whereas the endosperm, which is storage tissue for the embryo, persists throughout seed development and functions as the site of starch and protein synthesis (Sabelli and Larkins, 2009). Elucidation of the genetic regulatory mechanisms involved in maize seed development will facilitate the design of strategies to improve yield and quality, and provide insight that is applicable to other monocotyledon plants.

A key means to explore the mechanisms of seed development is to identify gene activities and functions.
Genetic studies have uncovered a number of genes that play major roles in governing embryogenesis and accumulation of endosperm storage compounds, such as Viviparous1, KNOTTED1, Indeterminate gametophyte1, Shrunken1 (Sh1), Opaque2 (O2), and Defective kernel1 (Chourey and Nelson, 1976; McCarty et al., 1991; Smith et al., 1995; Vicente-Carbajosa et al., 1997; Lid et al., 2002; Evans, 2007). Furthermore, the activity of some genes has also been extensively studied. Typical examples are zein genes that encode primary storage proteins in endosperm. Woo et al. (2001) examined zein gene expression and showed that they were the most highly expressed genes in endosperm based on EST data, whereas their dynamic expression patterns were revealed in a later study (Feng et al., 2009). Nevertheless, information on the global gene expression network throughout seed development is still very limited.

The transcriptome is the overall set of transcripts, which varies based on cell or tissue type, developmental stage, and physiological condition. Analysis of transcriptome dynamics helps to infer the function of unannotated genes, identify genes that act as critical network hubs, and interpret the cellular processes associated with development. In Arabidopsis (Arabidopsis thaliana), the genes expressed in developing seed and its subregions at several development stages have been analyzed with Affymetrix GeneChips (Le et al., 2010; Belmonte et al., 2013). In maize, microarray-based atlases of global transcription have provided insight into the programs controlling development of different organ systems (Sekhon et al., 2011). Compared with microarray, high-throughput RNA sequencing (RNA-seq) is a powerful tool to comprehensively investigate the transcriptome at a much lower cost, but with higher sensitivity and accuracy (Wang et al., 2009b). Several studies have taken advantage of the RNA-seq strategy to interpret the dynamic reprogramming of the transcriptome during leaf, shoot apical meristem, and embryonic leaf development in maize (Li et al., 2010; Takacs et al., 2012; Liu et al., 2013).

To date, only two studies have focused on identifying important regulators and processes required for embryo, endosperm, and/or whole seed development in maize based on a genome-wide transcriptional profile produced by RNA-seq (Liu et al., 2008; Teoh et al., 2013). However, these studies were limited by the low number of samples used and they did not provide an extensive, global view of transcriptome dynamics over the majority of seed development stages. Here, we present a comprehensive transcriptome study of maize embryo, endosperm, and whole seed tissue from fertilization to maturity using RNA-seq, which serves as a valuable resource for analyzing gene function on a global scale and elucidating the developmental processes of maize seed.

[1] This work was supported by the National High Technology Research and Development Program of China (863 Project, grant no. 2012AA10A305 to J.L.) and the National Natural Science Foundation of China (grant no. 31225020 to J.L.).
[2] These authors contributed equally to the article.
* Address correspondence to jlai@.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors () is: Jinsheng Lai (jlai@).
[W] The online version of this article contains Web-only data.
[OPEN] Articles can be viewed online without a subscription.
/cgi/doi/10.1104/pp.114.240689

RESULTS

Generation and Analysis of the RNA-seq Data Set

To systematically investigate the
dynamics of the maize seed transcriptome over development, we generated RNA-seq libraries of B73 seed tissues from different developmental stages, including 15 embryo, 17 endosperm, and 21 whole seed samples (Fig. 1). Using paired-end Illumina sequencing technology, we generated around 1.9 billion high-quality reads, 80.2% of which could be uniquely mapped to the B73 reference genome (Schnable et al., 2009; Supplemental Table S1). The genic distribution of reads was 66.6% exonic, 25.6% splice junction, and 2.6% intronic, leaving about 5% from unannotated genomic regions, demonstrating that most of the detected genes have been annotated.

Uniquely mapped reads were used to estimate the normalized transcription level as reads per kilobase per million (RPKM). To reduce the influence of transcription noise, genes from the B73 filtered gene set (FGS) were included for analysis only if their RPKM values were ≥ 1. Considering that our purpose was not to identify minor differential expression of genes between two time points of development, but to provide an atlas of gene expression profiles across tissues using time series biological samples, we randomly selected only 12 samples to have a biological replicate (Supplemental Fig. S1) to assess our data quality. Comparisons of biological replicates showed that their expression values were highly correlated (average R² = 0.96). For the samples with biological replicates, we took the average RPKM as the expression quantity.

To further evaluate the quality of our expression data, we compared the transcript abundance patterns of a number of selected genes with previously measured expression profiles (Supplemental Fig. S2). For example, LEAFY COTYLEDON1 (LEC1), which functions in embryogenesis, was mainly expressed in the early stage of the embryo (Lotan et al., 1998). Globulin2 (Glb2) had high expression late in embryogenesis, in accordance with its function as an important storage protein in the embryo (Kriz, 1989). Similarly, O2, a transcription factor (TF) that regulates zein synthesis
(Vicente-Carbajosa et al., 1997), and Fertilization independent endosperm1 (Fie1), a repressor of endosperm development in the absence of fertilization (Danilevskaya et al., 2003), were almost exclusively expressed in the endosperm. Expression patterns of these selected genes were identified exclusively in their known tissue of activity, indicating that the embryo and endosperm samples were processed well.

In total, we detected 26,105 genes expressed in at least 1 of the 53 samples (Supplemental Data Set S1). The distribution of these genes is revealed by a Venn diagram (Fig. 2A), which shows that 20,360 genes were common among all three tissue types. The number of genes detected in endosperm tissue during the developmental stages was lower and much more variable compared with embryo or whole seed tissue, and a greater number of genes were expressed in the tissue during the early and late phases. Several thousand fewer expressed genes were detected 14 d after pollination (DAP) in the endosperm compared with 6 or 8 DAP (Fig. 2B; Supplemental Data Set S2). In addition, the median expression level in the embryo was roughly 2-fold greater than that of the endosperm from 10 to 30 DAP (Fig. 2C). Of the 1,506 genes unique to the whole seed samples, 451 were present in RNA-seq data of 14 and/or 25 DAP pericarp tissue (Morohashi et al., 2012). Considering that the maternal tissue is the vast majority of the content of the early seed (Márton et al., 2005; Pennington et al., 2008), we inferred that most of these genes might be expressed exclusively in maternal tissue such as the pericarp or nucellus. Moreover, 1,062 genes in the embryo and endosperm with low expression were not detected in whole seed samples (Fig. 2C).

Figure 1. Overview of the time series maize seed samples used for RNA-seq analysis. The photographs show the changes in maize embryo, endosperm, and whole seed during development. The 53 samples shown here were used to generate RNA-seq libraries. Bar = 5 mm.

Transcriptome Dynamics of Maize Seed

Division of Development Phases by Global Gene Expression Patterns

To gain insight into the relationships among the different transcriptomes, we performed principal component analysis (PCA) on the complete data set, which can graphically display the transcriptional signatures and developmental similarity. The first component (40.7% variance explained) separated samples based on tissue identity and clearly distinguished embryo from endosperm samples, with whole seed samples located in between (Fig. 3A). The second component (24.4% variance explained) discriminated early, middle, and late stages of development for all three tissues (Fig. 3A). The wider area occupied by endosperm samples than embryo samples demonstrates stronger transcriptome reprogramming in developing endosperm, which is mainly attributable to drastic changes in the early and late stages. Moreover, whole seed samples of 0 to 8 DAP and 30 to 38 DAP clustered closely to the embryo, but 10 to 28 DAP samples were close to the endosperm.

Cluster analysis of the time series data for the tissues grouped samples well along the axis of developmental time (Fig. 3, B–D). Embryo samples from 10 to 20 DAP and 22 to 38 DAP were the primary clusters, which correspond to morphogenesis and maturation phases of development (Fig. 3B). This is consistent with the embryo undergoing active DNA synthesis, cell division, and differentiation, and then switching to synthesis of storage reserves and desiccation (Vernoud et al., 2005). Expression differences in endosperm samples resulted in three primary clusters, which correspond to early, middle, and late phases of development (Fig. 3C). The earliest time points (6 and 8 DAP) fall in an active period of cell division and cell elongation that terminates at about 20 to 25 DAP (Duvick, 1961). The samples of 10 to 24 DAP formed one subgroup and 26 to 34 DAP formed another subgroup, suggesting that they mark the period forming the main cell types and maturation of endosperm, respectively (Fig. 3C). The two subgroups form a larger cluster for active accumulation of storage compounds during 10 to 34 DAP. The distinct cluster of 36 and 38 DAP is in accordance with the end of storage compound accumulation in the endosperm and the activation of biological processes involved in dormancy and dehydration. In whole seed tissue, a primary cluster was formed from the earliest time points with 0 to 4 DAP and 6 to 8 DAP samples as subgroups, separating the nucellus degradation as well as endosperm syncytial and cellularization phases from the rapid expansion of the endosperm and development of embryonic tissues (Fig. 3D). After 10 DAP, the embryo and endosperm dominate the formation of seed, as shown in Figure 1 and by morphological observation (Pennington et al., 2008). Affected by both embryo and endosperm, 10 to 28 DAP whole seed samples clustered together and 30 to 38 DAP formed another group (Fig. 3D). These results confirm that the expression data successfully captured the characteristic seed development phases and should therefore contain valuable insights about corresponding changes in the transcriptome.

Figure 2. Analysis of global gene expression among different samples. A, Venn diagram of the 26,105 genes detected among embryo, endosperm, and whole seed. B, Number of genes expressed in each of the samples. C, Comparison of expression levels of genes detected in embryo and endosperm tissues.

Chen et al.

Integration of Gene Activity and Cellular Function across Development Phases

The PCA and hierarchical clustering analysis graphically display the relationship among different samples, but do not indicate the detailed cellular functions. Using the k-means clustering algorithm, we classified the detected genes into 16, 14, and 10 coexpression modules for embryo, endosperm, and whole seed, respectively, each of which contains genes that harbor similar expression patterns (Fig. 4). We then used MapMan annotation to assign genes to functional categories (Supplemental Fig. S3). Thus, we can aggregate genes over continuous time points and obtain a view of functional transitions along seed development.

Figure 3. Global transcriptome relationships among different stages and tissues. A, PCA of the RNA-seq data for the 53 seed samples shows five distinct groups: I for embryo (light red), II for endosperm (light blue), and III to V for early (III), middle (IV), and late (V) whole seed (light purple). B to D, Cluster dendrogram showing global transcriptome relationships among time series samples of embryo (B), endosperm (C), and whole seed (D). The y axis measures the degree of variance (see the "Materials and Methods"). The bottom row indicates the developmental phases according to the cluster dendrogram of the time series data. au, Approximately unbiased.

According to the cluster analysis results, most modules of the embryo can be divided into middle (10–20 DAP) and late (22–38 DAP) stages (Fig. 4A). The middle stage, best represented by modules C1 to C7, is typified by the overrepresentation of glycolysis, tricarboxylic acid cycle, mitochondrial electron transport, redox, RNA regulation, DNA and protein synthesis, cell organization, and division-related genes. This is consistent with the high requirement of energy during embryo formation. The late stage represented by C8 to C12 exhibited up-regulation of the cell wall, hormone metabolism (ethylene and jasmonate), stress, storage proteins, and transport-related genes, which coincides with the maturation of the embryo. The modules C13 to C16 included genes that were broadly expressed across the time points sampled and were related to hormone metabolism (brassinosteroid), cold stress, RNA processing and regulation, amino acid activation, and protein targeting.

All of the 14 coexpression modules of endosperm can be roughly divided into early (6–8 DAP), middle (10–34 DAP), and late (36–38 DAP) stages (Fig. 4B). The early stage (represented by modules C1 to C4) is exemplified by high expression of hormone metabolism (gibberellin), cell wall, cell organization and cycle, amino acid metabolism, DNA, and protein synthesis-related
genes, which is consistent with differentiation, mitosis, and endoreduplication. Genes in the tricarboxylic acid cycle and mitochondrial electron transport are also overrepresented and related to energy demands at that time. The middle stage (best represented by C5 to C8, in which different modules have distinct profiles) is the active storage accumulation phase and exhibits high expression of carbohydrate metabolism genes, as expected. Increased expression of protein degradation-related genes around 26 to 34 DAP in C7 and C8 coincides with the process of endosperm maturation. Genes involved in protein degradation, secondary metabolism, oxidative pentose phosphate, receptor kinase signaling, and transport were up-regulated in the late stage in modules C9 to C14 during the concluding phase of endosperm maturation.

Figure 4. Coexpression modules. A to C, Expression patterns of coexpression modules of embryo (A), endosperm (B), and whole seed (C), ordered according to the sample time points of their peak expression. For each gene, the RPKM value normalized by the maximum value of all RPKM values of the gene over all time points is shown.

Ten DAP and later time points of whole seed samples reflect the additive combination of embryo and endosperm expression. Genes that are active early in development (0–8 DAP) in clusters C1 to C4 are expected to be related to maternal tissue, which is the bulk of the seed at that time. A group of genes are highly expressed in C1 at 0 DAP, but their expression drops rapidly by 2 DAP, suggesting that they have functional roles that precede pollination. This group includes photosynthesis light reaction members and some TFs involved in RNA regulation. Genes related to cell wall and protein degradation, signaling, nucleotide metabolism, DNA synthesis, cell organization, and mitochondrial electron transport are overrepresented in C2 to C4, which show increased expression after pollination, in accordance with the degradation of nucellus tissue and development of embryonic
tissues. The expression patterns and functional categories of the 1,506 genes detected only in whole seed samples are shown in Supplemental Figure S4. Because these genes tend to be expressed at high levels mainly before 8 DAP, they are presumed to have functions in early seed development.

Together, these data show that the transition of major biochemical processes along the developmental time axis of the seed is produced partly by highly coordinated transcript dynamics.

TF Expression during Seed Development

Of the 2,297 identified maize TFs (Zhang et al., 2011), 1,614 (70%) are included in our analysis (Supplemental Data Set S3), which accounts for 6.18% of the total number of genes detected in seed tissue. The number of TFs detected in the different samples is shown in Supplemental Figure S5A. Their proportion to the total genes expressed in each tissue time point was always greater in embryo than endosperm samples (Supplemental Fig. S5B). Shannon entropy has been used to determine the specificity of gene expression, with lower values indicating a more time-specific profile (Makarevitch et al., 2013). The Shannon entropy of TFs was significantly lower than all other genes in both embryo (P = 2.3 × 10^-10) and endosperm (P < 1.5 × 10^-3), indicating that TFs tended to be expressed more time specifically than other genes (Supplemental Fig. S5, C and D).

The number of TFs from each family used in the seed development program, along with the proportion of members present in the coexpression modules relative to the total members of the family expressed in the tissue, is shown in Supplemental Figure S6. Enrichment of these TF families in the coexpression modules based on observed numbers was evaluated with Fisher’s exact test. Significant TF family enrichment was identified for specific coexpression modules.
For example, 12 auxin-response factor TFs (38.7%) were expressed in embryo module C2 during morphogenesis and one-half of the detected members of the WRKY family were active in endosperm module C12 late in endosperm development. The WRKY family has been reported to be mainly involved in the physiological programs of pathogen defense and senescence (Eulgem et al., 2000; Pandey and Somssich, 2009). Twenty-one MIKC family TFs (56.8%) were present in whole seed module C2, implying an important role in regulating genes involved in response to fertilization. The developmental specificity of the detected TFs makes them excellent candidates for reverse genetics approaches to investigate their role in grain production.

Tissue-Specific Genes of Seed

Identification of uncharacterized tissue-specific genes can help to explain their function and understand the underlying control of tissue or organ identity. To generate a comprehensive catalog of seed-specific genes, results from this study were compared with 25 published nonseed RNA-seq data sets (Jia et al., 2009; Wang et al., 2009a; Li et al., 2010; Davidson et al., 2011; Bolduc et al., 2012), including root, shoot, shoot apical meristem, leaf, cob, tassel, and immature ear (Supplemental Table S2). In total, we identified 1,258 seed-specific genes, including 91 TFs from a variety of families (Supplemental Data Set S4). To gain further insight into the spatial expression trend in the developing seed, we divided these genes into four groups: embryo specific, endosperm specific, expressed in both embryo and endosperm, and other as only expressed in whole seed (Table I). The dynamic expression patterns of these genes reflect their roles in corresponding development stages (Supplemental Fig. S7). The largest numbers of seed-specific genes were observed in the endosperm, consistent with a study in maize using microarrays (Sekhon et al., 2011), perhaps reflecting the specific function of endosperm.

We compared the distribution of tissue-specific genes and TFs in embryo and
endosperm coexpression modules to identify important phases in the underlying transcription network. Coexpression modules with an enrichment of tissue-specific genes or TFs may provide insight about uncharacterized genes and preparation for subsequent developmental processes. A feature of gene activity is shown in Figure 5. Fisher’s exact test (P < 0.05) was used to determine modules with significant enrichment of tissue-specific genes and TFs in the embryo and endosperm. In the embryo, TFs and tissue-specific genes were significantly enriched in the late phase, suggesting a specific process during maturation. In the endosperm, we observed that TFs and tissue-specific genes were overrepresented in the middle phase, which conforms to the role of endosperm in storage compound accumulation and to specific progress at this phase.

Table I. Total number of detected seed-specific genes and TFs
Data are presented as n.

Tissue Type             Specific Genes   Specific TFs
Embryo                  249              23
Endosperm               742              59
Embryo and endosperm    219              6
Other (a)               48               3
Total                   1,258            91

(a) These genes only detected expression in whole seeds.

To gain further insight into the functional significance of tissue-specific genes, overrepresented gene ontology (GO) terms were examined using the WEGO online tool (Ye et al., 2006; Supplemental Fig. S8). All overrepresented GO terms were observed for the middle embryo development phase, including the biosynthetic process, cellular metabolic process, macromolecule, and nitrogen compound metabolic process. Similarly, overrepresented GO terms for the endosperm were mostly observed in the early phase, including the macromolecule, nitrogen compound metabolic process, DNA binding, and transcription regulator. Genes

Figure 5. Distribution and enrichment of genes, tissue-specific genes, TFs, and tissue-specific TFs in coexpression modules of embryo and endosperm. A and B, Bars indicate the percentage of all detected genes (green), tissue-specific genes (blue), TFs (red), and tissue-specific TFs
(purple) observed in a coexpression module (C) or in the development phase (Total) relative to the total number of each group detected across samples for embryo (A) and endosperm (B). The number of genes represented by the percentage is shown on the right y axis. Enrichment for tissue-specific genes and TFs was evaluated with Fisher’s exact test based on the number of genes observed in each coexpression module, whereas enrichment for tissue-specific TFs was evaluated based on the number of TFs observed in each coexpression module. Asterisks represent significant enrichment at a false discovery rate ≤ 0.05.

involved in the nutrient reservoir class were enriched in the middle phase. The oxidoreductase class was overrepresented in late phase, and is known to be involved in maturation (Zhu and Scandalios, 1994).

The Expression of Zein Genes in Endosperm

Zeins are the most important storage proteins in maize endosperm and are an important factor in seed quality. According to Xu and Messing (2008), there are 41 α, 1 β, 3 γ, and 2 δ zein genes. In order to explore their expression pattern, we first confirmed the gene models by mapping publicly available full-length complementary DNAs of zein subfamily genes to B73 bacterial artificial chromosomes, and then mapped these back to the reference genome. Because some of these zein genes were not assembled in the current B73 reference genome or were only annotated in the working gene set, we confirmed a final set of 35 zein genes in the FGS of the B73 annotation, including 30 α, 1 β, 3 γ, and 1 δ zein genes (Supplemental Table S3). About three-quarters (26) of these were in the list of the 100 most highly expressed genes in the endosperm, based on mean expression across all endosperm samples (Supplemental Table S4). The distribution of these most highly expressed genes clearly showed that the 26 zeins and 4 starch synthesis genes were actively expressed in the middle phase of endosperm development, characteristic of storage compound accumulation
(Fig. 6A).

Previous research has shown that the zein genes constitute approximately 40% to 50% of the total transcripts in the endosperm (Marks et al., 1985; Woo et al., 2001), but these results are based on EST data from a single tissue or pooled tissues and only a few zein genes were assessed. Thus, we reevaluated the transcriptomic contribution of zein genes across endosperm development using our RNA-seq data, which is able to overcome the high structural similarity among them, especially in the α family (Xu and Messing, 2008). Zein genes stably accounted for about 65% of transcripts from 10 to 34 DAP, with 19-kD α zeins (approximately 42%), 22-kD α zeins (approximately 8%), and γ zeins (approximately 10%) representing the most abundant transcripts (Fig. 6B). The expression of different members within a given

Figure 6. Analysis of highly expressed genes in the endosperm. A, The distribution of the 100 most highly expressed genes in the endosperm ordered by mean expression in different modules. B, The dynamic transcript levels of different zein gene family members in the endosperm as reflected by their percentage among all detected gene transcript levels. C, Heat map showing RPKM values of 35 zein genes in the different development stages of the endosperm. +, Having intact coding regions; −, with premature_stop; N, no; Y, yes.
TREC2003 QA at BBN: Answering Definitional Questions

Jinxi Xu, Ana Licuanan and Ralph Weischedel
BBN Technologies
50 Moulton Street
Cambridge, MA 02138

1 INTRODUCTION

In TREC 2003, we focused on definitional questions. For factoid and list questions, we simply re-used our TREC 2002 system with some modifications.

For definitional QA, we adopted a hybrid approach that combines several complementary technology components. Information retrieval (IR) was used to retrieve the relevant documents for each question from the corpus. Various linguistic and extraction tools were used to analyze the retrieved texts and to extract the kernel facts from which the answer to the question is generated. These tools include name finding, parsing, co-reference resolution, proposition extraction, relation extraction and extraction of structured patterns. All text analysis functions except structured pattern extraction were carried out by Serif, a state-of-the-art information extraction engine from BBN (Ramshaw et al., 2001).

Section 2 summarizes our submission for factoid and list question answering (QA). Section 3, the bulk of the paper, covers definitional questions. Section 4 concludes this work.

2 FACTOID AND LIST QUESTIONS

The factoid system is the same as our system for TREC 2002 (Xu et al., 2003), except for two modifications. The first is to boost the score of answers that occur multiple times in the corpus. This is similar to previous studies (e.g., Clarke et al., 2001) that employed redundancy information to improve QA performance. Assuming the occurrences of an answer a are a_1, a_2, …, a_n, ranked by IR score, the final score for a is

$$ score(a) = score(a_1) + c \sum_{1 < i \le n} score(a_i) $$

We set c = 0.001.

The other modification is to use additional information to validate answers. Specifically,

• Validation based on question type: For questions of the form “What X is ...”, the system uses WordNet to verify that the question type is a hypernym of the answer.
(For example, “Tagalog” is a valid answer for “What language ...” because “language” is a hypernym of “Tagalog”.)
• Validation of answers for questions looking for a date: We imposed constraints on date answers based on the form of the question. For example, “When (was|did) ...” questions are likely to refer to a specific date in the past, and a good answer candidate should contain a year. “When is ...” questions may refer to either a specific date in the past (in which case the answer should contain a year), or a relative date, such as “the day after X”. Answers to “What month ...” questions should contain a month, etc.
• Validation of answers for questions looking for a measurement: We wrote patterns to detect four common kinds of measurement question: dimension, duration, speed and temperature. Valid candidate answers are expected to have both a quantity and a unit of the appropriate type.
• Validation of answers for questions looking for an author: Specific patterns were written to extract “who-wrote” answers from the text.
• Validation of answers for questions looking for an inventor: Specific patterns were written to extract “who-invented” answers from the text.
• Validation based on verb-argument: We used WordNet to match verbs (for example, “Who killed X?” = “Y shot X”). In addition, we refined the scoring of slot matching to take into account the number of “filler” words that appear in between the question words.

List questions were processed like factoid questions, except that answers were ranked and the list of answers was truncated where the score of an answer dropped below 90% of that of the best one.

We submitted three runs, BBN2003A, BBN2003B and BBN2003C. The differences are:

• BBN2003A. The Web was not used in answer finding.
• BBN2003B. For factoid questions, answers were found from both the TREC corpus and the Web. For list questions, it is the same as BBN2003A, except that the constant c in the score formula was increased to 0.1.
• BBN2003C.
For factoid questions, it is the same as BBN2003B, except for one difference: if an answer was found in both the TREC corpus and the Web, its score was boosted. For list questions, it is the same as BBN2003B.

Our technique for using the Web for QA is documented in our TREC 2002 work (Xu et al., 2003). Table 1 shows the NIST scores for the three runs. Two observations can be made. First, using the Web improved factoid QA. This is not surprising given previous TREC results by our group as well as by other groups. Compared with our TREC 2002 results, however, the impact of using the Web is much smaller: accuracy improved from 0.177 to 0.208, a 3% absolute improvement, whereas for TREC 2002 using the Web produced a 10% absolute improvement in accuracy. More work is needed to determine whether the reduced benefit of using the Web is due to changes in the characteristics of the questions or to the modifications we made to last year’s system. Second, using a larger c (0.1 rather than 0.001) improved the performance on list questions from 0.087 to 0.097. This indicates that taking advantage of answer redundancy is more crucial for list questions than for factoid questions.

            BBN2003A   BBN2003B   BBN2003C
Factoid     0.177      0.208      0.206
List        0.087      0.097      0.097

Table 1: Scores for factoid and list questions

3 DEFINITIONAL QUESTIONS

3.1 System Overview

Our system processed a definitional question in a number of steps. First, question classification identified the question type, i.e., whether a question is a who or a what question. The distinction is necessary because some subsequent processing treats the two types of questions differently. Also in this step, the question target was extracted from the question text by stripping off “What is”, “Who is”, etc.

Second, information retrieval pulled documents about the question target from the TREC corpus. This was achieved by treating the question target as an IR query.
BBN’s HMM IR system (Miller et al., 1999) was used for this purpose. For each question, the top 1,000 documents were retrieved.

Third, heuristics were applied to the sentences in the retrieved documents to determine whether they mention the question target. Sentences that do not mention the question target were dropped.

Fourth, kernel facts that mention the question target were extracted from sentences by a variety of linguistic processing and information extraction tools. A kernel fact is usually a phrase extracted from a sentence. The purpose of using kernel facts is twofold: to minimize irrelevant material in the answer and to facilitate redundancy detection.

Fifth, all kernel facts were ranked by their type and their similarity to the profile of the question. The question profile is a word centroid that models the importance of different words in answering the question.

Finally, heuristics were applied to detect redundant kernel facts. Up to a cap on the total answer length, facts that survived redundancy detection were output as the answer to the question.

3.2 Checking if a Sentence Mentions a Question Target

First, we check whether a document mentions the question target at all. If it does not, we drop the whole document from consideration. For who questions, we require a document to contain a word sequence “F…L” (F and L are the first and last names of the question target) in which the distance between F and L is less than 3. The purpose is to match “George Bush” with “George Walker Bush”. We assume the first and last names are the first and last words of the question target respectively. For what questions, we require the document to contain the exact string of the question target, except that plural forms are converted to singular before string comparison.

If a sentence contains a noun phrase that matches the question target either directly (via string comparison) or indirectly (through co-reference), we consider it to mention the question target.
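The document-level check described above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the function names, the tokenizer, the reading of "distance" as a difference in token positions, and the naive plural stripping are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mentions_person(doc_text, target, max_gap=3):
    """Who questions: look for first name ... last name within max_gap positions.

    Assumption: "distance less than 3" is read as a token-position
    difference below 3, so "George Walker Bush" matches "George Bush".
    """
    name = tokenize(target)
    tokens = tokenize(doc_text)
    if not name:
        return False
    if len(name) == 1:
        return name[0] in tokens
    first, last = name[0], name[-1]
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Candidate positions for the last name: i+1 .. i+max_gap-1.
        for j in range(i + 1, min(i + max_gap, len(tokens))):
            if tokens[j] == last:
                return True
    return False


def singular(token):
    """Very naive plural stripping; a stand-in for real morphology."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def mentions_term(doc_text, target):
    """What questions: exact token-sequence match after singularization."""
    wanted = [singular(t) for t in tokenize(target)]
    tokens = [singular(t) for t in tokenize(doc_text)]
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )
```

With these assumptions, "President George Walker Bush spoke today." passes the who-question check for the target "George Bush", while a document in which "George" and "Bush" are far apart does not.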
We used Serif (Ramshaw et al., 2001) for co-reference resolution and parsing. For who questions, only the last name was used in string comparison.

3.3 Extraction of Kernel Facts

3.3.1 Appositives and Copula Constructions

An example appositive is the phrase “George Bush, the US President”. An example copula is the sentence “George Bush is the US President.” In both cases, the phrase “the US President” is a definition for “George Bush”. Appositives and copulas were extracted from the parse trees of the sentences based on simple rules using Serif.

3.3.2 Propositions

Propositions represent an approximation of predicate-argument structures and take the form: predicate(role_1: arg_1, …, role_n: arg_n). In the context of this work, the predicate is typically a verb. Arguments can be either an entity or another proposition. The most common roles include logical subject, logical object, and object of a prepositional phrase modifying the predicate. For example, “Smith went to Spain” is represented as went(logical subject: Smith, PP-to: Spain). Propositions were extracted from parse trees using Serif.

We classified propositions into special propositions and ordinary ones. We manually created a list of predicate-argument structures that we thought were particularly important in defining an entity. For example, “<PERSON> was born on <DATE>” is one of the predicate-argument structures for persons. Propositions that matched one of these pre-defined structures were classified as special, while others were classified as ordinary.

3.3.3 Structured Patterns

We handcrafted over 40 rules to extract structured patterns that are typically used in defining a term. Similar techniques were also used by Columbia University (Blair-Goldensohn et al., 2004) and Language Computer Corporation (Harabagiu et al., 2004) in their TREC 2003 work. For example, one such rule is “<TERM> ,? (is|was)? also? <RB>? called|named|known+as <NP>”.
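As a rough illustration only, a rule of this shape can be approximated over plain text with a regular expression. The sketch below is not one of the handcrafted rules: it has no parser behind it, so it approximates <RB> as any word ending in "-ly" and <NP> as the text up to the next comma, semicolon or period. Both simplifications, and the function name, are assumptions made for the example.

```python
import re


def also_known_as(term, sentence):
    """Return the phrase following 'called', 'named' or 'known as', if any.

    Plain-text approximation of the rule
    <TERM> ,? (is|was)? also? <RB>? called|named|known+as <NP>
    """
    pattern = re.compile(
        re.escape(term)
        + r"\s*,?\s*"                           # optional comma
        + r"(?:(?:is|was)\s+)?"                 # optional "is" / "was"
        + r"(?:also\s+)?"                       # optional "also"
        + r"(?:\w+ly\s+)?"                      # optional adverb (assumed: ends in -ly)
        + r"(?:called|named|known\s+as)\s+"     # the defining verb
        + r"([^,.;]+)",                         # assumed NP: text up to punctuation
        re.IGNORECASE,
    )
    match = pattern.search(sentence)
    return match.group(1).strip() if match else None
```

A real implementation would run over parse trees, so that the captured phrase is an actual noun phrase rather than whatever text precedes the next punctuation mark.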
Applied to a parsed sentence, the rule will match the question target (<TERM>), optionally followed by a comma, optionally followed by “is” or “was”, optionally followed by “also”, optionally followed by an adverb (<RB>), followed by “called”, “named” or “known as”, and followed by a noun phrase (<NP>). In the pattern, “?” denotes an optional element, “+” concatenation, and “|” an alternative. If the question is “What are tsunamis?”, the pattern will extract the phrase “Tsunamis, also known as tidal waves” from the sentence “Tsunamis, also known as tidal waves, are caused by earthquakes.”

3.3.4 Relations

As discussed in Section 3.3.2, propositions simply consist of lexical predicates. Since different lexical predicates can represent the same underlying relation, it is desirable for definitional QA to normalize these propositions, where possible, into relations that are commonly found in an ontology.

In this work, relations were extracted by Serif. Serif can extract the 24 binary relations defined in the ACE guidelines (Linguistic Data Consortium, 2002). Using lexicalized patterns, Serif extracts those relations from the propositions. For example, the relation role/general-staff(“Gunter Blobel”, “Rockefeller University”) will be extracted from the sentence “Dr. Gunter Blobel of The Rockefeller University won the Nobel Prize for medicine today for protein research that shed new light on diseases including cystic fibrosis and early development of kidney stones.”

The QA guidelines require the answer to a definitional question to be a list of textual strings rather than relations. We mapped a relation extracted by Serif to a phrase by finding the smallest phrase in the parse tree that contains a mention of the question target and the other argument of the relation. For the above example, the extracted phrase would be “Dr.
Gunter Blobel of The Rockefeller University”.

3.3.5 Sentences

In addition to the above types of kernel facts, we used full sentences as fallback facts in order to deal with sentences from which none of the above-mentioned types of kernel facts can be extracted.

3.4 Ranking the Kernel Facts

The ranking order of the kernel facts is based on two factors: their type and their similarity to the profile of the question. Appositives and copulas were ranked at the top, then structured patterns, then special propositions, then relations, and finally ordinary propositions and sentences. Within each type, kernel facts were ranked by their similarity to the question profile. The similarity is the tf.idf score, where both the kernel fact and the question profile are treated as a bag of words. We used the tf.idf function described by Allan et al. (2000).

The question profile was created in one of three ways. First, we searched for existing definitions of the question target in a number of sources. The sources include WordNet glossaries, the Merriam-Webster dictionary, the Columbia Encyclopedia (online), Wikipedia, an online biography dictionary and Google. To search for biographies on Google, we used the person name and the word “biography” as a query (e.g., “George Bush, biography”). A simple rule-based classifier was used to weed out false hits. If definitions were found in these sources, the centroid (i.e., the vector of words and frequencies) of the retrieved definitions was used as the question profile.

If no definitions were found in the above sources, we considered two options. If the question is a who question, we used the centroid of a collection of 17,000 short biographies as the question profile. Our hope is that, with a large number of human-created biographies, we can predict which words are important for biography generation. If the question is a what question, we used the centroid of all kernel facts about the question target as the question profile.
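A centroid in the sense used here (a vector of words and their frequencies) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the tokenizer and the small stopword list are hypothetical, and the tf.idf similarity of Allan et al. (2000) used for the actual ranking is not reproduced.

```python
import re
from collections import Counter

# Hypothetical stopword list, used only to keep the example readable.
STOPWORDS = {"a", "an", "the", "of", "by", "as", "also", "is", "are", "was",
             "and", "in", "to"}


def centroid(texts):
    """Build a bag-of-words profile: word -> total frequency over all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Usage: profile built from all kernel facts about a what-question target.
kernel_facts = [
    "Tsunamis, also known as tidal waves",
    "tidal waves caused by earthquakes",
]
profile = centroid(kernel_facts)
```

The same function would serve for the other two cases, with retrieved definitions or the biography collection passed in place of the kernel facts.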
The assumption here is that the words that co-occur most frequently with the question target are the most informative words for answering the question. Similar techniques have been used in definitional QA (Blair-Goldensohn et al., 2003) and summarization (Radev et al., 2000).

3.5 Redundancy Removal

The goal of redundancy removal is to determine whether the information in a kernel fact f is covered by a set of kernel facts S that are already in the answer. Three methods were used to decide if f is redundant with respect to S:

• If f is a proposition, we check whether one of the facts in S is equivalent to f. Two propositions are considered equivalent if they share the same predicate (e.g., verb) and the same head noun for each of the arguments. If such a fact is found, f is considered redundant.
• If f is a structured pattern, we check how many facts in S were extracted using the same underlying rule. If two or more facts were found, we consider f redundant.
• Otherwise, we check the percentage of content words in f that have appeared at least once in the facts in S. If the percentage is very high (i.e., >0.7), we consider f redundant.

3.6 Results and Discussion

The algorithm to generate an answer for a definitional question is:

1. Set the answer set S = {}.
2. Rank all kernel facts by their similarity to the question profile, regardless of their type. Iterate over all facts: in each iteration, discard a fact if it is redundant with respect to S; otherwise, add the fact to S. Go to the next step if S has m facts.
3. Rank all remaining facts by type (the primary field) and then by similarity (the secondary field).
Iterate over the ranked facts: in each iteration, add a fact to S if it is not redundant. Go to step 5 if the size (i.e., the number of non-space characters) of S is greater than max_answer_size or the number of sentences and ordinary propositions in S is greater than n.
4. If S is empty, rank all sentences in the top 1,000 retrieved documents by the number of words shared between a sentence and the question target. Add the top 20 sentences to S.
5. Return S as the answer to the question.

We submitted three official runs, BBN2003A, BBN2003B and BBN2003C. For all three runs, max_answer_size = 4,000 bytes. For BBN2003A, m = 0 and n = 5. For BBN2003B, m = 0 and n = 20. For BBN2003C, m = 5 and n = 10. The parameters were set empirically on a set of about 79 development questions. Table 2 shows the results of the three runs. Overall, our results are satisfactory, given the median and best scores of all runs provided by NIST. In fact, BBN2003C achieved the highest score for definitional QA at TREC 2003.

Shortly after submitting the official runs, and after discussion with NIST, we submitted a baseline run. The goal of the baseline was to give every group (including us) a chance to calibrate the results of their official runs. For each question, the baseline sequentially selected from the top 1,000 documents the sentences that mention the question target. The same heuristic as in Section 3.2 was used to check whether a sentence mentions the question target. As a fallback, if no sentences were found to mention the question target, all sentences in the top 1,000 documents were selected and ranked by the number of words shared between the question target and the sentences. For answer generation, we iterated over the selected sentences. For each sentence, we checked the percentage of words in the sentence that had occurred in previous sentences in the output answer. If the percentage was greater than 70%, the sentence was considered redundant and was skipped.
Otherwise, the sentence was appended to the answer. The iteration continued until all sentences had been considered or the answer length (i.e., the number of non-space characters in the answer) was greater than 4,000. Note that we applied a large length threshold because the F-score favors recall over precision. The baseline run was assessed by NIST in the same way as the official runs.

As shown in Table 2, the baseline performed surprisingly well, with an F-score of 0.49. In fact, it outperformed all runs NIST received except BBN2003A, B and C. Our official runs (BBN2003A, B and C) score higher than the baseline, but the improvements are modest. One possible explanation for the unexpectedly good baseline is that the current state of the art of definitional QA is immature. The other is that with β = 5 the F-score is overly recall-oriented and as such was “fooled” by the long answers produced by the baseline.

BBN2003A   BBN2003B   BBN2003C   Baseline
0.521      0.520      0.555      0.49

Table 2: Results for Definitional QA

Table 3 shows the scores for who and what questions for BBN2003C. The average score for who questions is somewhat better than that for what questions, but due to the relatively small number of questions, it is hard to determine whether the difference is statistically meaningful.

TYPE             Number of questions   NIST F score
Who questions    30                    0.577
What questions   20                    0.522
Total            50                    0.555

Table 3: BBN2003C score breakdown based on types of definitional questions

Figure 1 shows the score distribution over the 50 definitional questions sorted by F score. Quite a few questions (10) get a score of zero or close to zero. An initial analysis shows that a major source of failures is faulty assumptions we made in interpreting the question target. One example is “What is Ph in biology?”. Our system assumed the literal string “Ph in biology” is the question target and tried to find it in text. Understandably, it failed. Another example is “Who is Akbar the Great?”. Our system assumed the last name is “Great”.
These problems can be fixed. Another major source of errors is erroneous redundancy removal. For example, for the question “Who is Ari Fleischer?”, the inclusion of the kernel fact “Ari Fleischer, Dole’s former spokesman who now works for Bush” in the answer masks the fact “Ari Fleischer, a Bush spokesman”. The latter was considered redundant because all the words in it appeared in the former. We hope better redundancy detection strategies will overcome such problems.

Figure 1: Score distribution over 50 definitional questions for BBN2003C

A question-by-question analysis shows that when a question obtains a bad F score, it is usually due to recall rather than precision. In fact, for BBN2003C, assuming perfect precision for all questions would merely increase the average F score from 0.555 to 0.614, whereas assuming perfect recall for all questions would increase it to 0.797. This imbalance is understandable because the F-metric used for TREC 2003 QA emphasizes recall by a factor of five over precision.

4 CONCLUSIONS

In TREC 2003 QA, we focused on definitional questions. Our approach combines a number of complementary technologies, including information retrieval and various linguistic and extraction tools (e.g., parsing, proposition recognition, pattern matching and relation extraction) for analyzing text. Our results for definitional questions are excellent compared with the results of other groups. However, much work remains, as our results are only modestly better than a baseline that did little more than sentence selection using IR.

References

Allan, J., Callan, J., Feng, F., and Malin, D., 2000. “INQUERY at TREC8.” In TREC8 Proceedings, Special publication by NIST, 2000.
Blair-Goldensohn, S., McKeown, K., and Schlaikjer, A., 2003. “Answering Definitional Questions: A Hybrid Approach.” To appear in Maybury, M., editor, New Directions in Question Answering, AAAI Press.
Chapter 13, 2004.
Blair-Goldensohn, S., McKeown, K., and Schlaikjer, A., 2004. “A Hybrid Approach for QA Track Definitional Questions.” To appear in TREC 2003 Proceedings, Special publication by NIST, 2004.
Clarke, C., Cormack, G., and Lynam, T., 2001. “Exploiting redundancy in question answering.” In Proceedings of SIGIR, 2001.
Harabagiu, S., Moldovan, D., Clark, C., Bowden, M., Williams, J., and Bensley, J., 2004. “Answer Mining by Combining Extraction Techniques with Abductive Reasoning.” To appear in TREC 2003 Proceedings, Special publication by NIST, 2004.
Linguistic Data Consortium, 2002. “ACE Phase 2: Information for LDC Annotators”, /Projects/ACE2/.
Miller, D., Leek, T., and Schwartz, R., 1999. “A hidden Markov model information retrieval system.” In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.
Radev, D., Jing, H., and Budzikowska, M., 2000. “Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies.” In ANLP/NAACL Workshop on Summarization, Seattle, WA, 2000.
Ramshaw, L., Boschee, E., Bratus, S., Miller, S., Stone, R., Weischedel, R., and Zamanian, A., 2001. “Experiments in Multi-Modal Automatic Content Extraction.” In Proceedings of Human Language Technology Conference, San Diego, CA, March 18-21, 2001.
Xu, J., Licuanan, A., May, J., Miller, S., and Weischedel, R., 2003. “TREC2002 QA at BBN: Answer Selection and Confidence Estimation.” In TREC 2002 Proceedings, Special publication by NIST, 2003.