Implicit Comparison in English Rhetoric

In the realm of English rhetoric, implicit comparison is a subtle yet powerful device that enhances the expressiveness of language. It allows writers and speakers to convey complex ideas and emotional nuances without stating them explicitly, adding depth and richness to communication. The art of implicit comparison lies in the ability to draw parallels between seemingly unrelated concepts, creating a connection that is both insightful and engaging.

One of the most striking examples of implicit comparison in English is the analogy. Analogies work by comparing two different things or concepts to illustrate a similarity between them. In doing so, they help readers or listeners understand a complex concept by relating it to something more familiar. For instance, in describing the vastness of the universe, an author might compare it to an ocean, noting that both are vast, unending, and full of mysteries waiting to be discovered.

Another common form of implicit comparison is hyperbole, which exaggerates a statement for emphasis or to evoke a strong emotional response. By pushing the boundaries of realism, hyperbole conveys a sense of urgency or importance that a literal statement might not achieve. For example, someone who says, "I'm starving to death!" is not literally on the brink of death but is emphasizing extreme hunger through exaggeration.

The power of implicit comparison lies in its ability to engage the reader or listener on a deeper level. By making connections between seemingly unrelated ideas, it encourages the audience to think beyond the literal meaning of words and consider the underlying connections and meanings. This type of rhetorical device is particularly effective in persuasive writing, as it allows the author to subtly guide the reader's thinking without overtly stating the argument.

Implicit comparison also adds a layer of aesthetic pleasure to language. By comparing disparate elements in a creative and unexpected way, it generates a sense of surprise and delight that makes language more enjoyable to read or hear. This is particularly evident in poetry, where poets often use implicit comparison to create images and evoke emotions that are both beautiful and profound.

In conclusion, implicit comparison is a vital tool in the English rhetorician's toolbox. It allows writers and speakers to convey complex ideas and emotional nuances with precision and elegance, engaging their audience on a deeper level. By drawing parallels between seemingly unrelated concepts and using creative language to evoke strong emotional responses, implicit comparison adds a unique and powerful dimension to English rhetoric.
Increased Proportions of Bifidobacterium and the Lactobacillus Group and Loss of Butyrate-Producing Bacteria in Inflammatory Bowel Disease

Wei Wang,a Liping Chen,a Rui Zhou,a,b Xiaobing Wang,a Lu Song,a Sha Huang,a Ge Wang,a Bing Xia a,b

a Department of Gastroenterology/Hepatology, Zhongnan Hospital of Wuhan University, Wuhan, People's Republic of China; b Hubei Clinical Center & Key Laboratory of Intestinal & Colorectal Diseases, Wuhan, People's Republic of China

Dysbiosis in the intestinal microbiota of persons with inflammatory bowel disease (IBD) has been described, but there are still varied reports on changes in the abundance of Bifidobacterium and Lactobacillus organisms in patients with IBD. The aim of this investigation was to compare the compositions of mucosa-associated and fecal bacteria in patients with IBD and in healthy controls (HCs). Fecal and biopsy samples from 21 HCs, 21 and 15 Crohn's disease (CD) patients, and 34 and 29 ulcerative colitis (UC) patients, respectively, were analyzed by quantitative real-time PCR targeting the 16S rRNA gene. The bacterial numbers were transformed into relative percentages for statistical analysis. The proportions of bacteria were uniformly distributed along the colon regardless of the disease state. Bifidobacterium was significantly increased in the biopsy specimens of active UC patients compared to those in the HCs (4.6% versus 2.1%, P = 0.001), and the proportion of Bifidobacterium was significantly higher in the biopsy specimens than in the fecal samples in active CD patients (2.7% versus 2.0%, P = 0.012). The Lactobacillus group was significantly increased in the biopsy specimens of active CD patients compared to those in the HCs (3.4% versus 2.3%, P = 0.036). Compared to the HCs, Faecalibacterium prausnitzii was sharply decreased in both the fecal and biopsy specimens of the active CD patients (0.3% versus 14.0%, P < 0.0001 for fecal samples; 0.8% versus 11.4%, P < 0.0001 for biopsy specimens) and the active UC patients (4.3% versus 14.0%, P = 0.001 for fecal samples; 2.8% versus 11.4%, P < 0.0001 for biopsy specimens). In conclusion, Bifidobacterium and the Lactobacillus group were increased in active IBD patients and should be used more cautiously as probiotics during the active phase of IBD. Butyrate-producing bacteria might be important to gut homeostasis.

Crohn's disease (CD) and ulcerative colitis (UC) are two forms of inflammatory bowel disease (IBD), a condition driven by an abnormal immune response to the intestinal microbiota in genetically susceptible hosts (1-3). Dysbiosis of the intestinal microbiota is common in IBD. Evidence from antibiotic treatment of IBD, fecal stream diversion in CD, and experimental models of colitis has shown that the microbiota plays an important role in the pathogenesis of IBD, and the improvement of dysbiosis in the intestinal microbiota has been propounded as a new strategy for IBD treatment (4).

Probiotics are live microorganisms that have health benefits for the host when consumed in adequate amounts, and clinical studies indicate that the quantity of Bifidobacterium and Lactobacillus organisms decreases in the intestinal microbiotas of IBD patients (4). Several clinical trials have demonstrated the efficacy of VSL#3, a mixture of eight different probiotics, for the treatment of UC patients (5, 6), and single-species probiotic treatment, such as one with Escherichia coli Nissle 1917, Bifidobacterium, or Lactobacillus rhamnosus GG, also displays efficacy in the management of patients with UC (7-9). Meanwhile, experimental studies in colitis mouse models have demonstrated the potential protective mechanisms of these probiotics, through their reinforcement of the epithelial barrier (10, 11), inhibition of proinflammatory cytokine secretion (12, 13), and modulation of immune responses (14, 15). Few studies have evaluated the effectiveness of probiotics in CD patients. One study suggested that Faecalibacterium prausnitzii prevents 2,4,6-trinitrobenzenesulfonic acid (TNBS)-induced colitis (16). However, studies have shown that the diversity of the genus Bifidobacterium is not decreased in the feces of patients with active CD (17) and that the numbers of Bifidobacterium organisms do not decrease in active CD patients (18). A twin study even found an increased abundance of Bifidobacterium and F. prausnitzii organisms in the mucosal samples of colonic CD patients, as well as an elevated abundance of Lactobacillus organisms in the mucosal samples of ileal CD patients (19). These reports seem to be in conflict with previous data.

To investigate the changes caused by common probiotics in IBD patients, we used real-time PCR to quantify bacteria in mucosal biopsy specimens and fecal samples of patients with IBD. Furthermore, we also determined the proportional differences of the dominant commensal bacteria between paired fecal and mucosal samples.

Received 12 June 2013; returned for modification 26 July 2013; accepted 7 November 2013. Published ahead of print 13 November 2013. Editor: B. A. Forbes. Address correspondence to Bing Xia, bingxiawh@. W.W. and L.C. contributed equally to this article. Supplemental material for this article may be found at /10.1128/JCM.01500-13. Copyright © 2014, American Society for Microbiology. All Rights Reserved. doi:10.1128/JCM.01500-13. Journal of Clinical Microbiology, February 2014, Volume 52, Number 2, p. 398-406.

MATERIALS AND METHODS

Patients and samples. Chinese patients of Han ethnicity with UC and CD were consecutively recruited from among the outpatients and inpatients in the Department of Gastroenterology at Zhongnan Hospital of Wuhan University, Wuhan, China. Patients diagnosed with IBD based on data from clinics, radiology, endoscopy, and histology were included in the study. The protocol was approved by the ethics commission of Zhongnan Hospital. The subjects were asked to complete a questionnaire regarding environmental exposure, dietary habits, and antibiotic, probiotic, and drug use. The subjects were required to be adults with an unrestricted diet. Subjects with positive stool cultures of pathogens who were taking
antibiotic or probiotic treatments or colon-cleansing products in the 3 months before sampling were excluded. Next, the subjects were invited to participate in the study and provided informed consent. They were asked to expel stool onto a sterile petri dish directly before bowel preparation, and a fresh stool sample was collected on-site, immediately transferred to the laboratory in an ice box within 1 h, and stored at −80°C for further analysis. Subsequently, a magnesium sulfate solution and water were used for bowel preparation, colonoscopy was followed by video endoscopy, and biopsy specimens were taken from different gut locations. The collection procedure for the fecal and biopsy specimens was accomplished within 24 h.

The fecal and biopsy specimens were collected from 76 and 63 subjects, respectively (Table 1). Active CD and active UC were defined as a CD activity index of >150 and a UC activity index of >3 (20, 21), respectively. Meanwhile, 21 healthy controls were matched for stool samples and biopsied tissues, and there were also 8 patients with active CD, 3 patients with CD in remission, 16 patients with active UC, and 4 patients with UC in remission.

TABLE 1 Numbers of specimens by patient group, disease status, and specimen type

  Patient group  Disease status  Biopsy specimens (by location)   Fecal specimens  Matched biopsy/fecal pairs
  CD             Active          Ileum 9, Colon 12, Rectum 12     15               8
  CD             Quiescent       Ileum 2, Colon 3, Rectum 3       6                3
  UC             Active          Colon 22, Rectum 22              29               16
  UC             Quiescent       Colon 5, Rectum 5                5                4
  HC             Control         Ileum 21, Colon 21, Rectum 21    21               21

DNA extraction from biopsy and fecal specimen materials. DNA was extracted from 200 mg of feces. Briefly, 200 mg of stool was added to a 2-ml microcentrifuge tube prefilled with 300 mg of 0.1-mm glass beads (Sigma, USA) and incubated on ice until the addition of 1.4 ml stool lysis (ASL) buffer from the QIAamp DNA stool minikit (Qiagen, Germany). The samples were immediately subjected to bead beating (45 s; speed, 6.5 m/s) twice using a FastPrep-24 machine (MP Biomedicals, USA) before heat and chemical lysis at 95°C for 5 min. The subsequent steps of DNA extraction were performed according to the QIAamp kit protocol for pathogen detection. The biopsy specimen DNA was extracted using the QIAamp DNA minikit (Qiagen, Germany) according to the manufacturer's instructions, with an additional bead-beating step (45 s; speed, 6.5 m/s, performed twice) using a FastPrep-24 at the beginning of the protocol. The extracted DNA was stored at −80°C for further analysis.

Amplification by conventional PCR to check primer specificity. A Bio-Rad PCR machine (Bio-Rad, USA) was used for conventional PCR to check primer specificity. The primers (Table 2) were purchased from ShengGong BioTech (ShengGong, China). PCR consisted of 35 cycles, with an initial DNA denaturation step at 95°C (30 s), followed by gradient annealing (30 s) and elongation at 72°C (45 s). The procedure was completed with a final elongation step at 72°C (10 min). The determinations of optimum temperature were performed using a MyCycler gradient PCR machine, which was adjusted for various temperature ranges (Bio-Rad, USA).

TABLE 2 Group- and species-specific 16S rRNA primers used

  Target                     Primer sequences (5' to 3')      Annealing Tm (°C)  Product size (bp)  Reference
  All bacteria               F: ACTCCTACGGGAGGCAGCAGT         61                 200                44
                             R: GTATTACCGCGGCTGCTGGCAC
  Bacteroides                F: GTCAGTTGTGAAAGTTTGC           61.5               127                45
                             R: CAATCGGGAGTTCTTCGTG
  Bifidobacterium            F: AGGGTTCGATTCTGCTCAG           62                 156                45
                             R: CATCCGGCATTACCACCC
  C. coccoides group (XIVa)  F: AAATGACGGTACCTGACTAA          60.7               440                46
                             R: CTTTGAGTTTCATTCTTGCGAA
  C. leptum group (IV)       F: GTTGACAAAACGGAGGAAGG          60                 245                38
                             R: GACGGGCGGTGTGTACAA
  F. prausnitzii             F: AGATGGCCTCGCGTCCGA            61.5               199                34
                             R: CCGAAGACCTTCTTCCTCC
  Lactobacillus group (b)    F: GCAGCAGTAGGGAATCTTCCA         61.5               340                47
                             R: GCATTYCACCGCTACACATG
  E. coli                    F: GTTAATACCTTTGCTCATTGA         61                 340                46
                             R: ACCAGGGTATCTAATCCTGTT
  β-Globin gene              F: CAACTTCATCCACGTTCACC          (a)                268                28
                             R: GAAGAGCCAAGGACAGGTAC

  (a) Based on the detected bacterial Tm. (b) The Lactobacillus group primers amplify bacteria of the Lactobacillus, Pediococcus, Leuconostoc, and Weissella groups of lactic acid bacteria (LAB).

Real-time PCR. Bacterial 16S rRNA gene copies were quantified in mucosal tissue and feces using an iCycler real-time PCR detection system (Bio-Rad, USA). Briefly, standard curves were constructed with a 10-fold dilution series of amplified bacterial 16S rRNA genes from the reference strains. To account for the influence of biopsy specimen size on mucosal tissue, human cell numbers were quantified using primers specific for the β-globin gene to determine the total number of mucosa-associated bacteria in the biopsy specimens. To reduce the quantitative error of the detected bacteria and to characterize the changes in bacterial copies, the abundance of 16S rRNA gene copies was calculated from the standard curves, and specific bacterial groups were expressed as a percentage of the total bacteria determined by the universal primers. Each reaction was performed in duplicate and repeated three times. The amplifications were performed in a final reaction volume of 20 μl containing 2× SYBR mix (GeneCopoeia, USA), 0.4 μl of each primer at a final concentration of 0.2 μM, 0.4 μl of ROX (5-carboxy-X-rhodamine) reference dye, 2 μl of bacterial DNA, and ultrapure water to 20 μl. The amplification protocol consisted of one cycle of 95°C for 10 min, followed by 40 cycles of 95°C for 10 s, the annealing temperature for 30 s, and 72°C elongation for 30 s. The fluorescent products were detected at the last step of each cycle. Melting curve analysis was performed from the annealing temperature to 95°C at an increase of 0.5°C per 10 s after amplification to monitor target PCR product specificity and fidelity.

Statistical analysis. Data analysis was conducted using SPSS 17.0.
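As a minimal sketch of the quantification scheme described above — a standard curve fitted to a 10-fold dilution series, then group-specific 16S rRNA gene copies expressed as a percentage of the total-bacteria signal — the following uses hypothetical Ct values, not data from this study:

```python
def fit_standard_curve(log_copies, ct_values):
    """Least-squares line Ct = slope * log10(copies) + intercept,
    fitted to a dilution series of a reference strain."""
    n = len(log_copies)
    mx, my = sum(log_copies) / n, sum(ct_values) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_copies, ct_values))
    sxx = sum((x - mx) ** 2 for x in log_copies)
    slope = sxy / sxx
    return slope, my - slope * mx

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate 16S rRNA gene copies."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical 10-fold dilution series: 10^3 to 10^7 template copies.
log_copies = [3, 4, 5, 6, 7]
ct = [33.1, 29.8, 26.4, 23.1, 19.7]          # illustrative Ct values
slope, intercept = fit_standard_curve(log_copies, ct)

# Relative abundance = group-specific copies / universal-primer copies.
total_copies = copies_from_ct(22.0, slope, intercept)   # universal primers
bifido_copies = copies_from_ct(28.5, slope, intercept)  # group primers
percent_bifido = 100 * bifido_copies / total_copies
```

The slope is negative (higher template, earlier amplification), and the percentage form mirrors how the tables below report each group as a share of total bacteria.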
Comparisons were made using Student's t test or a one-way analysis of variance for variables with normal distributions. For nonnormal distributions, the Mann-Whitney U test was used for comparisons between groups, and the Kruskal-Wallis method was used to compare more than two groups. P values of <0.05 were considered statistically significant. The total bacterial counts (CFU/g) of each bacterium in the fecal samples were log transformed (log10 CFU) for statistical analysis. Specific bacterial counts were expressed as a percentage of the total bacterial counts of each sample.

RESULTS

Clinical characteristics. The demographic and clinical characteristics of the IBD patients are shown in Tables S1 and S2 in the supplemental material.

Percent variation of bacteria in feces. The average bacterial quantifications of feces in each group are summarized in Table 3. The comparisons of the fecal bacteria in all groups are shown in Fig. 1a and b. The total numbers of bacteria in the fecal samples were similar between the healthy control (HC), CD, and UC patients, and no significant differences were observed.

Interestingly, we unexpectedly observed an increase of Bifidobacterium and the Lactobacillus group in both the active CD (A-CD) and active UC (A-UC) patients, but neither of these populations was significantly different from those in the HCs. However, the proportion of Bifidobacterium was higher in A-UC patients than in A-CD patients. The proportions of Bifidobacterium and the Lactobacillus group were decreased in quiescent-IBD patients compared to active-IBD patients.

We also observed a trend of increased Bacteroides organisms in A-CD and A-UC patients compared to healthy controls, but no significant differences were observed. Furthermore, the proportion of Bacteroides was lower in quiescent-IBD patients than in active-IBD patients. The Clostridium coccoides group decreased significantly in the feces of both A-CD (P = 0.004) and A-UC patients (P = 0.015). The Clostridium leptum group, another main group of the Firmicutes phylum, was decreased in A-CD (P < 0.0001) and A-UC (P < 0.0001) patients and decreased in R-CD patients (P = 0.036) compared to in the HCs. We found that the decrease in the proportion of C. leptum was greater in A-CD patients than in A-UC patients (P = 0.014). Although the proportions of C. coccoides and C. leptum in feces showed a rising trend in patients with quiescent IBD, there was no significant difference between quiescent-IBD and active-IBD patients. F. prausnitzii, a representative bacterium of the C. leptum group, was decreased both in patients with A-CD (P < 0.0001) and in those with A-UC (P = 0.001). The decrease in the proportion of F. prausnitzii in patients with A-CD was significant compared with that in A-UC patients (P = 0.01). F. prausnitzii was increased in quiescent-IBD patients, but no significant differences were observed compared with patients with active IBD. E. coli, the most abundant bacterium in the Gammaproteobacteria, was increased in both CD and UC patients. The proportion of E. coli increased in active-CD (P = 0.005) and quiescent-CD (P = 0.026) patients compared to that in the HCs. Additionally, the proportion of E. coli increased in active-UC patients (P = 0.001) compared to HCs, and the proportion decreased in quiescent-UC patients (P = 0.05) compared with active-UC patients. Moreover, we found that the increase in the proportion of E. coli was more striking in the active-UC than in the active-CD patients (P = 0.027).

TABLE 3 Quantification of bacteria in fecal microbiota (% of total bacteria, mean ± SD)

  Group  Bacteroides      C. coccoides     C. leptum        F. prausnitzii   Bifidobacterium  Lactobacillus  E. coli
  HC     14.566 ± 12.161  29.048 ± 12.750  19.618 ± 10.558  14.023 ± 10.593  1.244 ± 2.059    2.260 ± 3.588  1.597 ± 4.483
  A-CD   28.444 ± 22.850  15.593 ± 12.977  1.703 ± 2.164    0.260 ± 0.575    1.986 ± 3.442    4.268 ± 7.073  6.344 ± 6.505
  R-CD   23.957 ± 19.389  17.738 ± 10.466  5.843 ± 7.541    4.266 ± 6.078    1.575 ± 1.673    2.324 ± 2.537  5.676 ± 5.687
  A-UC   26.958 ± 22.101  19.583 ± 14.767  5.466 ± 5.106    2.248 ± 2.860    2.943 ± 7.410    3.315 ± 3.431  14.742 ± 17.474
  R-UC   28.892 ± 13.472  22.617 ± 8.247   11.784 ± 11.357  7.600 ± 3.795    2.819 ± 3.326    2.615 ± 2.630  2.310 ± 4.607

FIG 1 (a) Quantification of total bacteria in feces; (b) quantification of dominant bacteria in feces. HC, healthy control; ACD, active Crohn's disease; RCD, Crohn's disease in remission; AUC, active ulcerative colitis; RUC, ulcerative colitis in remission. *, P < 0.05; **, P < 0.0001.

Percent variation of bacteria in different gut locations. To determine whether the percentages of commensals varied significantly in the different gut locations, we compared the bacterial proportions among the three biopsied locations (Fig. 2). The total number of mucosa-associated bacteria in the healthy controls was consistent across the different biopsied locations. The percentages of detected bacteria were almost uniformly distributed along the colon in the healthy controls. The percentages of detected bacteria were also consistent across the different biopsied locations in patients with A-CD. Interestingly, the same results were
observed in patients with A-UC and UC in remission (R-UC), in whom the bacteria were almost uniformly distributed along the colon, regardless of whether the area was inflamed.

FIG 2 Ratios of bacteria in different gut locations and feces. The upper left graph shows the total number of mucosa-associated bacteria at different biopsied locations in the different groups. The other five graphs show the dominant probiotic ratios in the feces and the different gut locations.

Percent variation of bacteria in mucosal biopsy specimens. The average bacterial quantifications of the biopsy specimens in each group are summarized in Table 4. The results were also compared to those for the HCs. In the present study, we observed a decreasing trend in total mucosa-associated bacteria in patients with CD and UC compared to in the HCs, but no significant difference was observed. Because the sample size of the biopsied CD-in-remission (R-CD) group was limited, we did not compare it with that of the healthy controls. A comparison of the bacteria found in the biopsy specimens from all groups is shown in Fig. 3a and b.

Bifidobacterium was increased in patients with A-UC (P = 0.001) compared to in the HCs, and the increase in the proportion of Bifidobacterium in the biopsy specimens was greater in A-UC than in A-CD patients (P = 0.032). Again, the Lactobacillus group unexpectedly presented a significant increase in patients with A-CD (P = 0.036) compared to in the HCs, and although the increase in the proportion of the Lactobacillus group was greater in patients with A-CD than in A-UC, no significant difference was observed. We also observed a rising trend in patients with A-UC, but this trend was not significant. In contrast, the percentages of Bifidobacterium and the Lactobacillus group presented a decreasing trend in patients with quiescent UC, but no significant differences were observed.

We observed a trend of increased Bacteroides in the biopsy specimens from patients with A-CD and A-UC compared to in healthy controls, but no significant difference was observed. The proportion of the C. coccoides group in biopsy specimens was decreased in A-CD patients (P < 0.0001) compared to in the HCs, while no significant decrease was found in patients with A-UC. The decrease in the proportion of the C. coccoides group was more striking in patients with A-CD compared to A-UC (P = 0.003). The C. leptum group was decreased in patients with A-CD (P < 0.0001) and A-UC (P < 0.0001) compared to HCs, and the decrease in its proportion was greater in A-CD than in A-UC patients, although no significant difference was observed. We observed a significant decrease in the C. leptum group in patients with R-UC (P = 0.016) compared to in the HCs. F. prausnitzii was also decreased in patients with A-CD (P < 0.0001) and A-UC (P < 0.0001) compared to in the HCs, and the decrease in the proportion of F. prausnitzii was significantly greater in patients with A-CD than in patients with A-UC (P = 0.006). Both the C. coccoides group and F. prausnitzii exhibited a rising trend in patients with quiescent UC compared to those with active UC, but no significant difference was observed. Additionally, E. coli was significantly increased in the biopsy specimens of IBD patients. The proportion of E. coli was at a high level in patients with active CD (P = 0.018) compared to in the HCs. Moreover, E. coli was also increased in active UC patients (P = 0.016) compared to in the HCs. Although the proportion of E. coli was higher in active CD than in active UC patients, no significant difference was observed.

Comparison of the ratio between fecal and biopsy specimens. As the detected bacteria in the intestinal mucosal biopsy specimens showed similar proportions regardless of the biopsied location, we determined whether the proportions differed between biopsy and fecal specimens (Fig. 4). The proportion of E. coli was significantly higher in the biopsy specimens (P = 0.002) than in the fecal samples in 21 healthy controls, but no significant differences were observed in the other comparisons. In eight paired A-CD cases, the proportion of Bifidobacterium was increased in the biopsy specimens of the active CD patients (P = 0.012) compared to in the fecal samples. The C. coccoides group showed a decrease in the biopsy specimens of A-CD patients (P = 0.003) compared to the fecal samples, but this result was not found in the UC patients. Conversely, the C. leptum group and its representative bacterium F. prausnitzii were decreased in the fecal samples of A-CD patients compared to in the biopsy specimens, but no significant difference was observed; this finding was partly due to the small number of paired cases. However, the C. leptum group showed a decrease in the fecal samples of patients with A-UC (P = 0.001) compared to the biopsy specimens, but not in R-UC patients.
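The nonparametric between-group comparisons reported above can be sketched in plain Python. This is a minimal Mann-Whitney U test using the normal approximation (reasonable at these sample sizes); the relative-abundance values are illustrative, not the study's data:

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test, normal approximation, no tie
    correction (adequate when values are mostly distinct)."""
    data = a + b
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    i = 0
    while i < len(data):                      # assign midranks to ties
        j = i
        while j + 1 < len(data) and data[order[j + 1]] == data[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(a), len(b)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # U statistic for sample a
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided P value
    return u, p

# Hypothetical F. prausnitzii relative abundances (% of total bacteria).
hc = [14.0, 12.5, 18.2, 9.7, 16.1, 11.3]     # healthy controls
acd = [0.3, 0.1, 0.9, 0.5, 0.2, 1.2]         # active CD patients

u, p = mann_whitney_u(hc, acd)
```

With every A-CD value below every HC value, as in the example, U reaches its maximum (n1 × n2) and the two-sided P falls below 0.05. In practice a library routine (e.g., scipy.stats.mannwhitneyu) with exact small-sample handling would be preferred.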
DISCUSSIONIn the present study,we investigated mucosa-associated com-mensal bacteria,as they adhere strictly to the epithelium and can provide access to the mucosa-associated microbiota of the subjects,which may play a more critical role than fecal mi-crobes in IBD pathogenesis(22).In our study,we found that the proportions of detected mucosa-associated bacteria in healthy gastrointestinal tracts were uniformly distributed along the colon,which was in accordance with thefindings from a previous study(23,24).The total bacterial counts and detected bacteria were similar across the different gut locations in the colon,regardless of the disease state,which was in line with some previous data(24,25),although reports with con-flicts data have also been published(26–30).TABLE4Quantification of bacteria in mucosal microbiotaDisease group %(meanϮSD)of the indicated bacterial species/group:Bacteroides C.coccoides C.leptum F.prausnitzii Bifidobacterium Lactobacillus E.coliHC19.030Ϯ6.59926.182ϮA.98021.957Ϯ8.08911.415Ϯ6.085 2.147Ϯ1.514 2.262Ϯ2.887 4.872Ϯ8.83 A-CD32.263Ϯ22.400 6.286Ϯ3.5148.578Ϯ7.6040.817Ϯ0.976 2.793Ϯ2.600 3.420Ϯ2.16911.666Ϯ8.796 A-UC28.393Ϯ15.35619.045Ϯ14.10613.326Ϯ6.679 2.844Ϯ2.243 4.653Ϯ2.889 3.267Ϯ2.5909.831Ϯ10.984 R-UC31.477Ϯ22.29619.542Ϯ14.44412.754Ϯ7.027 3.849Ϯ4.238 3.527Ϯ1.981 2.349Ϯ2.0080.875Ϯ0.459 FIG3(a)Total mucosa-associated bacteria in different groups.(b)Quantification of dominant bacteria in biopsy specimens.*,PϽ0.05;**,PϽ0.0001. Wang et al. 
Journal of Clinical MicrobiologyAs common probiotics,Bifidobacterium and Lactobacillus have received considerable attention.Surprisingly,the proportion of Bifidobacterium was found to be increased in patients with active IBD.These data were partly in agreement with previous data (17),although conflicting data have also been published (31).Compar-atively,the proportion of Bifidobacterium was reduced in quies-cent CD and UC patients.However,the quantitative PCR (qPCR)results had good agreement only with 454pyrosequencing in the fecal samples.Moran et al.(32)reported that germ-free interleu-kin-10-deficient (IL-10Ϫ/Ϫ)mice administered Bifidobacterium animalis had marked duodenal and mild colonic inflammation and immune responses.Moreover,Medina et al.(33)showed that B.longum diverted immune responses toward a proinflammatory or regulatory profile,consequently producing different effects.In contrast,another study demonstrated that oral Bifidobacterium administration prevented intestinal inflammation through the in-duction of intestinal IL-10-producing Tr1cells and ameliorated colitis in immunocompromised mice (35).In the current study,the Lactobacillus group PCR primers used to amplify bacteria belong to the Lactobacillus ,Pediococcus ,Leuconostoc ,and Weissella groups of lactic acid bacteria (LAB)(25).Unexpectedly,we observed that the Lactobacillus group pre-sented marked increases in patients with active IBD,despite no significant differences in those with active UC.However,in pa-tients with quiescent IBD,the proportion of the Lactobacillus group was similar to that of the HCs in both the fecal and biopsy samples.Because it was difficult to design genus-specific primers to definitively discriminate Lactobacillus ,Pediococcus ,Leuconos-toc ,and Weissella group organisms,we quantified the Lactobacillus group with the genus primer,and the species of the Lactobacillus genus are phylogenetically diverse,with Ͼ100species docu-mented to date (36).This result may suggest 
that other species of the Lactobacillus genus or LAB-producing bacteria were also in-creased in active-IBD patients.A previous study showed that Lac-tobacillus can secrete lactocepin and exert anti-inflammatory ef-fects by selectively degrading proinflammatory chemokines (12).Mileti et al.(37)found that Lactobacillus paracasei displayed a delay in the development of colitis and a decreased severity of disease but that L.plantarum and L.rhamnosus GG exacerbated the development of dextran sodium sulfate (DSS)-induced colitis.In contrast,Tsilingiri et al.(39)found that L.plantarum induced an inflammatory response in the healthy tissue cultured ex vivo at the end of incubation that resembled the response induced by Salmonella .Moreover,L.paracasei ,L.plantarum ,and L.rhamno-sus GG were detrimental in the inflamed tissue derived from IBD patients cultured ex vivo ,whereas the supernatant from the cul-ture system of L.paracasei directly acted on the tissue and down-regulated the proinflammatory activities of the existing leukocytes (39).It remains to be determined which species of Lactobacillus group is increased in patients during the active phase of IBD.Thus,the effects of Bifidobacterium and Lactobacillus in the gut lumen of active IBD patients are of importance and should be determined.Although the bacteria of the Firmicutes phylum presented a varied degree of decline,the decrease in proportion was greater in patients with A-CD than in patients with A-UC.Moreover,weFIG 4Comparison of the ratios in paired fecal and biopsy samples.*,P Ͻ0.05;**,P Ͻ0.0001.Bacteria in Inflammatory Bowel DiseaseFebruary 2014Volume 52Number 2 403found that the C.coccoides group,which comprises Clostridiumcluster XIVa,including members of other genera,such as Copro-coccus,Eubacterium,Lachnospira,and Ruminococcus(38),wasmore deficient in the biopsy specimens of the A-CD patients thanin the fecal samples,and that the reduced proportion was higherthan that of C.leptum in the biopsy specimens.In 
contrast,previ-ous studies reported that F.prausnitzii within the C.leptum groupwas strikingly low in mucosa-associated microbiotas(40,41).Based on these results,it is tantalizing to hypothesize that the C.coccoides group was more effective in adhering to the mucosalsurface and that the decrease in the C.coccoides group in both thefecal and biopsy specimens of active CD patients,especially with astrikingly decreased proportion in the biopsy specimens,was spe-cific to CD in genetically susceptible individuals.In our study,we found that the representative bacterium ofthe C.leptum group,F.prausnitzii,nearly disappeared in bothdifferent gut locations and in feces but increased in patientswith quiescent IBD.Previous reports showed that F.prausnitziiproduces formate and butyrate and that its fermented product D-lactate provides energy for colonic epithelial cells and plays an important role in epithelial barrier integrity and immunemodulation(41,42).Additionally,Sokol et al.(16)demon-strated that F.prausnitzii exhibits a butyrate-independent anti-inflammatory effect in IBD models.Interestingly,however,Hansen et al.(43)found that F.prausnitzii was increased inpediatric CD patients at the onset of disease,but not in patientswith UC,suggesting a more dynamic role for this organism inthe development of IBD.Moreover,Willing et al.(19)reportedan increase in F.prausnitzii in colonic CD in twins with inflam-matory bowel disease but a decrease in F.prausnitzii in ilealCD.The biopsy specimens in the study by Hansen et al.weretaken from a single site:from the distal colon in controls,orfrom the most distal inflamed site in IBD.The biggest differ-ence in their data was the inclusion of subjects regardless ofwhether they accepted the conventional IBD treatment.There-fore,pharmacological treatment may be a potential con-founder in the microbial study of IBD.Previous data showedthat the abundance of F.prausnitzii decreased strikingly in pa-tients with ileal CD(28,40),and Sokol et 
al. (16) also found that F. prausnitzii presented a reduction in resected ileal Crohn mucosa and was associated with endoscopic recurrence at 6 months. However, our data show that F. prausnitzii was consistent at different gut locations in patients with CD. This may be caused by various lifestyle and dietary habits. Our study was focused on the populations of central China, most of whom prefer a high-fiber diet, according to the results of our questionnaire. Additionally, F. prausnitzii represented a higher average proportion (11.4%) in the biopsy specimens of the HCs, and organisms with such high proportions may display varied functions in different mucosal sites. This remains an interesting pursuit for further research.

This study design was based on the analysis of bacterial 16S rRNA genes and reflected the gene copy number rather than true cell counts. Also, the rRNA gene analysis did not reflect the functional changes in gastrointestinal tract microbes, such as enhanced virulence, mucosal adherence, and invasion, which do not influence the relative proportions of species in the microbiota. Therefore, further studies should be conducted on the functions of commensal bacteria.

We identified specific commensal bacteria that were significantly increased or decreased in individuals with CD and UC. The butyrate-producing bacteria of Clostridium clusters IV and XIVa were found to be decreased; in particular, F. prausnitzii was decreased in IBD patients. However, Bifidobacterium and the Lactobacillus group were increased in patients with active IBD.
Thus, more attention should be paid to butyrate-producing bacteria, and Bifidobacterium and Lactobacillus could then be used more cautiously as probiotics in patients during the acute phase of IBD.

ACKNOWLEDGMENTS

We thank all the subjects who volunteered to participate in this study. This study was supported by the Hubei Science & Technology Bureau (grant no. 303131796), the Fundamental Research Funds of the Central University of Ministry of Education of China (grant nos. 2012303020201 and 201130302020004), and the National Support Project of the Ministry of Science & Technology of China (grant no. 2012BAI06B03). We declare no conflicts of interest.

REFERENCES
1. Chassaing B, Darfeuille-Michaud A. 2011. The commensal microbiota and enteropathogens in the pathogenesis of inflammatory bowel diseases. Gastroenterology 140:1720–1728. doi:10.1053/j.gastro.2011.01.054.
2. Sartor RB. 2006. Mechanisms of disease: pathogenesis of Crohn's disease and ulcerative colitis. Nature Clin. Pract. Gastroenterol. Hepatol. 3:390–407. doi:10.1038/ncpgasthep0528.
3. Sartor RB. 2008. Microbial influences in inflammatory bowel diseases. Gastroenterology 134:577–594. doi:10.1053/j.gastro.2007.11.059.
4. Neish AS. 2009. Microbes in gastrointestinal health and disease. Gastroenterology 136:65–80. doi:10.1053/j.gastro.2008.10.080.
5. Miele E, Pascarella F, Giannetti E, Quaglietta L, Baldassano RN, Staiano A. 2009. Effect of a probiotic preparation (VSL#3) on induction and maintenance of remission in children with ulcerative colitis. Am. J. Gastroenterol. 104:437–443. doi:10.1038/ajg.2008.118.
6. Tursi A, Brandimarte G, Papa A, Giglio A, Elisei W, Giorgetti GM, Forti G, Morini S, Hassan C, Pistoia MA, Modeo ME, Rodino' S, D'Amico T, Sebkova L, Sacca' N, Di Giulio E, Luzza F, Imeneo M, Larussa T, Di Rosa S, Annese V, Danese S, Gasbarrini A. 2010. Treatment of relapsing mild-to-moderate ulcerative colitis with the probiotic VSL#3 as adjunctive to a standard pharmaceutical treatment: a double-blind, randomized, placebo-controlled study. Am. J. Gastroenterol. 105:2218–2227. doi:10.1038/ajg.2010.218.
7. Kruis W, Fric P, Pokrotnieks J, Lukás M, Fixa B, Kascák M, Kamm MA, Weismueller J, Beglinger C, Stolte M, Wolff C, Schulze J. 2004. Maintaining remission of ulcerative colitis with the probiotic Escherichia coli Nissle 1917 is as effective as with standard mesalazine. Gut 53:1617–1623. doi:10.1136/gut.2003.037747.
8. Kato K, Mizuno S, Umesaki Y, Ishii Y, Sugitani M, Imaoka A, Otsuka M, Hasunuma O, Kurihara R, Iwasaki A, Arakawa Y. 2004. Randomized placebo-controlled trial assessing the effect of bifidobacteria-fermented milk on active ulcerative colitis. Aliment. Pharmacol. Ther. 20:1133–1141. doi:10.1111/j.1365-2036.2004.02268.x.
9. Zocco MA, dal Verme LZ, Cremonini F, Piscaglia AC, Nista EC, Candelli M, Novi M, Rigante D, Cazzato IA, Ojetti V, Armuzzi A, Gasbarrini G, Gasbarrini A. 2006. Efficacy of Lactobacillus GG in maintaining remission of ulcerative colitis. Aliment. Pharmacol. Ther. 23:1567–1574. doi:10.1111/j.1365-2036.2006.02927.x.
10. Zakostelska Z, Kverka M, Klimesova K, Rossmann P, Mrazek J, Kopecny J, Hornova M, Srutkova D, Hudcovic T, Ridl J, Tlaskalova-Hogenova H. 2011. Lysate of probiotic Lactobacillus casei DN-114001 ameliorates colitis by strengthening the gut barrier function and changing the gut microenvironment. PLoS One 6:e27961. doi:10.1371/journal.pone.0027961.
11. Patel RM, Myers LS, Kurundkar AR, Maheshwari A, Nusrat A, Lin PW. 2012. Probiotic bacteria induce maturation of intestinal claudin 3 expression and barrier function. Am. J. Pathol. 180:626–635. doi:10.1016/j.ajpath.2011.10.025.
12. von Schillde MA, Hörmannsperger G, Weiher M, Alpert CA, Hahne H, Bäuerl C, van Huynegem K, Steidler L, Hrncir T, Pérez-Martínez G, …

Wang et al. Journal of Clinical Microbiology
An Empirical Comparison between Direct and Indirect Test Result Checking Approaches ∗†‡§

Peifeng Hu, The University of Hong Kong, Pokfulam, Hong Kong. pfhu@cs.hku.hk
Zhenyu Zhang, The University of Hong Kong, Pokfulam, Hong Kong. zyzhang@cs.hku.hk
W.K. Chan, City University of Hong Kong, Tat Chee Avenue, Hong Kong. wkchan@.hk
T.H. Tse, The University of Hong Kong, Pokfulam, Hong Kong. thtse@cs.hku.hk

ABSTRACT
An oracle in software testing is a mechanism for checking whether the system under test has behaved correctly on any execution. In some situations, oracles are unavailable or too expensive to apply. This is known as the oracle problem. It is crucial to develop techniques to address it, and metamorphic testing (MT) is one such proposal. This paper conducts a controlled experiment to investigate the cost effectiveness of using MT by 38 testers on three open-source programs. The fault detection capability and time cost of MT are compared with the popular assertion checking method. Our results show that MT is cost-efficient and has potential for detecting more faults than the assertion checking method.

∗ © ACM, 2006. This is the authors' version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 3rd International Workshop on Software Quality Assurance (SOQUA 2006) (in conjunction with the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT 2006/FSE-14)), pages 6–13. ACM Press, New York, 2006. doi:10.1145/1188895.1188901.
† This research is supported in part by a grant of the Research Grants Council of Hong Kong (project no. HKU 7145/04E), a grant of City University of Hong Kong, and a grant of The University of Hong Kong.
‡ All correspondence should be addressed to Prof. T.H. Tse at Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong. Tel: (+852) 2859 2183. Fax: (+852) 2557 8447. Email: thtse@cs.hku.hk.
§ Part of the work was done when Chan was with The Hong Kong University of
Science and Technology, Clear Water Bay, Hong Kong.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SOQUA'06, November 6, 2006, Portland, OR, USA. Copyright 2006 ACM 1-59593-584-3/06/0011...$5.00.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging—Testing tools; D.2.8 [Software Engineering]: Metrics—Product metrics

General Terms
Experimentation

Keywords
Metamorphic testing, test oracle, controlled experiment, empirical evaluation

1. INTRODUCTION
Software testing assures programs by executing test cases over the programs with the intent to reveal failures [3]. To do so, software testers need to evaluate test results through an oracle, which is a mechanism for checking whether a program has behaved correctly [35]. In many situations, however, oracles are unavailable or too expensive to apply. This is known as the oracle problem [35].
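The oracle problem can be made concrete with a small numerical sketch: without an independent source of expected outputs, a tester can still mechanically check a necessary property of the function under test. (This illustration is ours, not the paper's; `my_sin` and the tolerance are hypothetical.)

```python
import math

def my_sin(x, terms=20):
    """Hypothetical implementation under test: a Taylor-series sine.
    The tester has no trusted reference for its expected outputs."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def identity_check(x, tol=1e-9):
    """Partial oracle: the identity sin(x) = sin(pi - x) must hold, so a
    violation reveals a failure even though the true value of sin(x) is
    never computed independently."""
    return abs(my_sin(x) - my_sin(math.pi - x)) <= tol

print(all(identity_check(0.1 * k) for k in range(1, 30)))  # prints True
```

Weyuker's identity-relation idea reviewed in Section 2 is essentially this kind of check; metamorphic testing (Section 3) generalizes it to relations across multiple executions.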
Usually, the main purpose of implementing a specific program is to compute unknown results. If the expected results could easily be computed through other automatic means, then there would not be a need to implement the program in the first place. On the other hand, manual checking of program outputs is slow, ineffective, and costly, especially for a large number of test cases. Assessing the correctness of program outcomes has, therefore, been recognized as "one of the most difficult tasks in software testing" [27].

As we shall review in Section 2, assertion checking [4] and metamorphic testing (MT) [9, 17, 18, 11] are techniques to alleviate the oracle problem. Assertion checking verifies the result or intermediate program states of a test case. It directly confirms the execution behavior of a program in terms of a checking condition. MT takes another direction, which verifies follow-up test cases based on existing test cases. It cross-checks the test results of existing test cases and their follow-up test cases. In other words, MT indirectly verifies the behaviors of multiple program executions in terms of a checking condition. It would be interesting to compare the two approaches on their effectiveness in identifying failures.

HKU CS Tech Report TR-2006-13

There have been various case studies in applying metamorphic testing to different types of programs, ranging from conventional programs and object-oriented programs to pervasive programs and web services. Chen et al. [16] reported on the testing of programs for solving partial differential equations. They further investigated the integration of metamorphic testing with fault-based testing and global symbolic evaluation [18]. Gotlieb and Botella [22] developed an automated framework to check against a restricted class of metamorphic relations. Tse and others applied the metamorphic approach to the unit testing [33] and integration testing [10] of context-sensitive middleware-based applications.
Chan et al. [13, 14] developed a metamorphic approach to the online testing of service-oriented software applications. Throughout these studies, both the testing and the evaluation of experimental results were conducted by the researchers themselves. The programs under test were from academic sources and relatively small. There is a need for systematic empirical research on how well MT can be applied in practical situations and how effective it is compared with other testing strategies.¹

Like other comparisons of testing strategies, such as between control flow and data flow criteria [21] and among different data flow criteria [25], controlled experimental evaluations are essential. They should answer the following research questions: (a) Can testers be trained to apply MT properly? (b) How does the fault detection effectiveness of MT compare with other effective strategies? (c) What is the effort for applying MT?

This paper reports and discusses the results of such a controlled experiment. We restricted the scope to object-oriented testing at the class level [4]. The subjects were 38 postgraduate students enrolled in an advanced software testing course. Before doing the experiment, they were taught the concepts of MT and a reference strategy—assertion checking [4]—to alleviate the oracle problem. The training sessions for either concept were similar in duration. Three open-source programs were selected as target programs. The subjects were required to apply both MT and assertion checking strategies to test these programs independently.
We ran their test cases over faulty versions of the target programs to assess the capability of these two testing strategies in detecting faults [1]. Results were analyzed to compare the costs and effectiveness of MT and assertion checking.

The main contributions of this paper are four-fold: (i) It is the first controlled experiment to study the above questions. (ii) The experiment shows that metamorphic testing is more effective than assertion checking for identifying faults in object-oriented programs. (iii) It confirms the belief that the subjects can formulate metamorphic relations and implement MT without much difficulty. In fact, the experiment shows that all subjects managed to propose metamorphic relations after a brief introduction, and identical or very similar metamorphic relations were proposed by different subjects. (iv) It also indicates that metamorphic testing is worth applying in terms of time cost.

The paper is organized as follows: Section 2 discusses the related literature. Section 3 introduces the fundamental notions and procedures of metamorphic testing. Section 4 describes the experiment, and the results are presented and discussed in Section 5. Finally, Section 6 concludes the paper.

¹ Other researchers have evaluated the selection of metamorphic relations. However, their work is not yet publicly accessible at the time of submission of this paper. Thus, we shall exclude them from our discussions.

2. RELATED WORK
Many approaches have been proposed to alleviate the test oracle problem. Instead of checking the output directly, these approaches generate various types of oracle to verify the correctness of a program. Chapman [15] suggested that a previous version of a program could be used to verify the correctness of the current version. Weyuker [35] suggested checking whether some identity relations would be preserved by the program under test. Blum and others [6, 2] proposed a program checker, which is an algorithm for checking the output of computation for numerical programs.
Their theory was subsequently extended into the theory of self-testing/correcting [5]. Xie and Memon [36] studied different types of oracle for graphical user interface (GUI) testing. Binder [4] discussed four categories and eighteen oracle patterns in object-oriented program testing.

Assertion checking [32] is another method to verify the execution results of programs. An assertion, which is embedded directly in the source code, is a Boolean expression that verifies whether the execution of a test case satisfies some necessary properties for correct implementation. Assertions are supported by many programming languages and are easy to implement. Assertion checking has been widely used in object-oriented testing. For example, state invariants [4, 23], represented by assertions, can be used to check the state-based behaviors of a system. Briand et al. [8] investigated the effectiveness of using state-invariant assertions as oracles and compared it with the results using precise oracles for object-oriented programs. It was shown that state-invariant assertions were effective in detecting state-related errors. Since our target programs are also object-oriented programs, we have chosen assertion checking as the alternative testing strategy in our experimental comparison.

Some researchers have proposed to prepare test specifications, either manually or automatically, to alleviate the test oracle problem. Memon et al. [28] assumed that a test specification of internal object interactions was available and used it to identify non-conformance of the execution traces. This type of approach is common in conformance testing for telecommunication protocols. Sun et al. [31] proposed a similar approach to test the harnesses of […]. Last and others [24, 34] trained pattern classifiers to learn the causal input-output relationships of a legacy system.
They then used the classifiers as test oracles. Podgurski and others [30, 20] classified failure reports into categories via classifiers, and then refined the classification by further means. Bowring et al. [7] used a progressive approach to train a classifier to help regression testing. Chan et al. [12] used classifiers to identify different types of behaviors related to the synchronization failures of objects in a multimedia application.

3. PRELIMINARIES OF METAMORPHIC RELATIONS AND TESTING
This section introduces metamorphic testing. As we have briefed in Section 1, metamorphic testing relies on a checking condition that relates multiple test cases and their results in order to verify whether any failures are revealed. Such a checking condition is known as a metamorphic relation. We shall first revisit metamorphic relations and then discuss how they are used in the metamorphic approach to software testing.

3.1 Metamorphic Relations
A metamorphic relation (MR) is an existing or expected relation over a set of distinct inputs and their corresponding outputs for multiple executions of the target function [17]. Consider, for instance, the sine function. For any inputs x1 and x2 such that x1 + x2 = π, we must have sin x1 = sin x2.

Definition 1 (Metamorphic Relation) [11]. Let x1, x2, ..., xk be a series of inputs to a function f, where k ≥ 1, and let ⟨f(x1), f(x2), ..., f(xk)⟩ be the corresponding series of results. Suppose ⟨f(xi1), f(xi2), ..., f(xim)⟩ is a subseries, possibly an empty subseries, of ⟨f(x1), f(x2), ..., f(xk)⟩. Let xk+1, xk+2, ..., xn be another series of inputs to f, where n ≥ k + 1, and let ⟨f(xk+1), f(xk+2), ..., f(xn)⟩ be the corresponding series of results. Suppose, further, that there exist relations

r(x1, x2, ..., xk, f(xi1), f(xi2), ..., f(xim), xk+1, xk+2, ..., xn)

and

r′(x1, x2, ..., xn, f(x1), f(x2), ..., f(xn))

such that r′ must be true whenever r is satisfied. We say that

MR = {(x1, x2, ..., xn, f(x1), f(x2), ..., f(xn)) | r(x1, x2, ..., xk, f(xi1), f(xi2), ..., f(xim), xk+1, xk+2, ..., xn) → r′(x1, x2, ..., xn, f(x1), f(x2), ..., f(xn))}

is a
metamorphic relation. When there is no ambiguity, we simply write the metamorphic relation as

MR: If r(x1, x2, ..., xk, f(xi1), f(xi2), ..., f(xim), xk+1, xk+2, ..., xn), then r′(x1, x2, ..., xn, f(x1), f(x2), ..., f(xn)).

Furthermore, x1, x2, ..., xk are known as source test cases and xk+1, xk+2, ..., xn are known as follow-up test cases.

Similar to assertions in the mathematical sense, metamorphic relations are also necessary properties of the function to be implemented. They can, therefore, be used to detect inconsistencies in a program. They can be any relations involving the inputs and outputs of two or more executions of the target program. They may include inequalities, periodicity properties, convergence properties, subsumption relationships, and so on.

Intuitively, human testers are needed to study the problem domain related to a target program and formulate metamorphic relations accordingly. This is akin to requirements engineering, in which humans instead of automatic requirements engines are necessary for formulating systems requirements. Is there a systematic methodology guiding testers to formulate metamorphic relations, like the methodologies that guide systems analysts to specify requirements? This remains an open question. We shall investigate further along this line in the future. We observe that other researchers are also beginning to formulate important properties in the form of specifications to facilitate the verification of system behaviors [19].

3.2 Metamorphic Testing
In practice, if the program is written by a competent programmer, most test cases are "successful test cases" that do not reveal any failure. These successful test cases have been considered useless in conventional testing. Metamorphic testing (MT) uses information from such successful test cases, which will be referred to as source test cases.

Consider a program p for a target function f in the input domain D. A set of source test cases T = {t1, t2, ..., tk} can be selected according to any test case selection
strategy. Executing the program p on T produces outputs p(t1), p(t2), ..., p(tk). When there is an oracle, the test results can be verified against f(t1), f(t2), ..., f(tk). If these results reveal any failure, testing stops. On the other hand, when there is no oracle or when no failure is revealed, the metamorphic testing approach can continue to be applied to automatically generate follow-up test cases T′ = {tk+1, tk+2, ..., tn} based on the source test cases T, so that the program can be verified against metamorphic relations. For example, given a source test case x1 for a program that implements the sine function, we can construct a follow-up test case x2 based on the metamorphic relation x1 + x2 = π.

Definition 2 (Metamorphic Testing) [11]. Let P be an implementation of a target function f. The metamorphic testing of the metamorphic relation

MR: If r(x1, x2, ..., xk, f(xi1), f(xi2), ..., f(xim), xk+1, xk+2, ..., xn), then r′(x1, x2, ..., xn, f(x1), f(x2), ..., f(xn))

involves the following steps: (1) Given a series of source test cases ⟨x1, x2, ..., xk⟩ and their respective results ⟨P(x1), P(x2), ..., P(xk)⟩, generate a series of follow-up test cases ⟨xk+1, xk+2, ..., xn⟩ according to the relation r(x1, x2, ..., xk, P(xi1), P(xi2), ..., P(xim), xk+1, xk+2, ..., xn) over the implementation P. (2) Check the relation r′(x1, x2, ..., xn, P(x1), P(x2), ..., P(xn)) over P. If r′ is false, then the metamorphic testing of MR reveals a failure.

3.3 Metamorphic Testing Procedure
Gotlieb and Botella [22] developed an automated framework for a subclass of metamorphic relations. The framework translates a specification into a constraint logic programming language program. Test cases can be automatically generated according to metamorphic testing. Their framework only works on a restricted subset of the C language and is not applicable to test cases involving objects. Since we want to apply MT to test real-world object-oriented programs, we adopt the original procedure [9] as follows:

Firstly, testers identify and formulate metamorphic relations
MR1, MR2, ..., MRn from the target function f. For each metamorphic relation MRi, testers construct a function geni to generate follow-up test cases from the source test cases. Next, for each metamorphic relation MRi, testers construct a function veri, which will be used to verify whether multiple inputs and the corresponding outputs satisfy MRi. After that, testers generate a set of source test cases T according to a preferred test case selection strategy. Finally, for every test case in T, the test driver invokes the function geni to generate follow-up test cases and applies the function veri to check whether the test cases satisfy the given metamorphic relation MRi. If a metamorphic relation MRi is violated by any test case, veri reports that an error has been found in the program under test.

4. EXPERIMENT
This section describes the setup of the controlled experiment. It first formulates the research questions to be investigated and then describes the experimental design and experimental procedure.
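The four-step procedure above can be sketched as a small driver, using the paper's running sine example. (A minimal sketch: `program_sin`, `gen`, `ver`, and `mt_driver` are illustrative names, not artifacts of the experiment.)

```python
import math

def program_sin(x):
    # Implementation under test (here simply delegating to math.sin).
    return math.sin(x)

# Step 2: gen -- build the follow-up test case for the MR x1 + x2 = pi.
def gen(x1):
    return math.pi - x1

# Step 3: ver -- check the output relation sin(x1) = sin(x2),
# with a tolerance for floating-point rounding.
def ver(y1, y2, tol=1e-9):
    return abs(y1 - y2) <= tol

# Step 4: the test driver applies gen and ver to every source test case
# and reports the source inputs whose executions violate the MR.
def mt_driver(program, source_tests):
    failures = []
    for x1 in source_tests:
        x2 = gen(x1)
        if not ver(program(x1), program(x2)):
            failures.append(x1)
    return failures

print(mt_driver(program_sin, [0.1 * k for k in range(32)]))  # prints []
```

A faulty implementation, say one that adds a small input-dependent bias, violates the relation and is reported by the same driver without any reference to the true sine values.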
4.1 Research Questions
The research questions to be investigated are summarized as follows:

(a) Can the subjects properly apply MT after training? Can the subjects identify correct and useful metamorphic relations from target programs?

(b) Is MT an effective testing method? Does MT have a comparative advantage over other testing strategies, such as assertion checking, in terms of the number of mutants detected? To address this question, we shall use the standard statistical technique of null hypothesis testing.

Null Hypothesis H0: There is no significant difference between MT and assertion checking in terms of the number of mutants detected.

Alternative Hypothesis H1: There is a significant difference between MT and assertion checking in terms of the number of mutants detected.

We aim at applying the Mann-Whitney test to find out whether the null hypothesis should be rejected, with a view to showing that the difference between the two strategies is statistically significant rather than due to chance.

(c) What is the effort, in terms of time cost, of applying MT?

4.2 Experimental Design
Our experiment identifies four independent variables: the testing strategies, the subjects, the target programs, and the faulty versions of the target programs. The dependent variables are the effort in the development of metamorphic relations/assertions and the effectiveness in terms of mutation detection ratio. For the testing strategies, we incorporate MT and assertion checking. In the rest of this section, we describe the other independent variables. Section 5 will analyze the results for the dependent variables.

Subjects: All the 38 subjects were postgraduate students in computer science who attended the course Software Engineering: Software Testing at The University of Hong Kong. These students had at least a bachelor's degree in computer science, computer engineering, or a related discipline. The majority of them were part-time students with industrial experience; the rest were MPhil and PhD students.

We controlled that the training sessions of either approach were comparable in duration and in content. Since differences in software engineering background might affect the students' capability to apply metamorphic testing or assertion checking, we conducted a brief survey prior to the experimentation. It showed that most of them had real-life or academic experience in object-oriented design, Java programming, software testing, and assertion checking. Figure 1 lists the survey result. As most of the subjects were knowledgeable about object-oriented design and Java programming, they were deemed to be competent in the experimental tasks. On the other hand, we
found a few students having rather limited experience in software testing and assertion checking. Since they did not have prior concepts of metamorphic testing either, the experiment did not specifically favor the metamorphic approach.

Target Programs: We used three open-source programs as target programs. All of them were Java programs selected from real-world software systems. The first target program, Boyer, is a program using the Boyer-Moore algorithm to support the applications of Canadian Mind Products, an online commercial software company.² The program returns the index of the first occurrence of a specified pattern within a given text. The second target program, BooleanExpression, evaluates Boolean expressions and returns the resulting Boolean value. For example, the evaluation result of "!(true && false) || true" is "true". The program is a core part of a popular open-source project, jboolexpr,³ in SourceForge,⁴ the largest open-source project website. The third target program is TxnTableSorter. It is taken from a popular open-source project, Eurobudget,⁵ on the SourceForge website. Eurobudget is an office application written in Java, similar to Microsoft Money or Quicken.

² URL /products1.html.

Table 1 specifies the statistics of the three target programs. The sizes of these programs are in line with the sizes of the target programs used in typical software testing research such as [1] or the famous Siemens test suites. The first program is a piece of commercial software. The second program is a core part of a standard library. The third one is selected from real office software with hundreds of classes and more than 100,000 lines of code in total. All of them are open source.

Faulty Versions of Target Programs: To investigate the relative effectiveness of metamorphic testing and assertion checking, we used mutation operators to seed faults into the programs. A previous study [1] showed that well-defined mutation operators were valid for testing
experiments.⁶ In our experiment, mutants were seeded using the tool muJava [26]. The tool uses two types of mutation operator: class level and method level. Class level mutation operators are operators specific to generating faults in object-oriented programs at the class level. Method level mutation operators defined in [29] are operators specific to statement faults. We only seeded method level mutation operators into the programs under study, because our experiment concentrated on unit testing and because this set of operators had been studied extensively [29, 1]. Table 2 lists all the mutation operators used in the controlled experiment. A total of 151 mutants were generated by muJava for the class Boyer, 145 for the class BooleanExpression, and 378 for TxnTableSorter. Note that faults were only seeded into the methods supposedly covered by the test cases for unit testing. Table 3 lists the number of mutants under each category of operators. We used all of them in the controlled experiment.

³ Available at /projects/jboolexpr.
⁴ URL .
⁵ Available at .
⁶ We also attempted to use publicly accessible real faults of these programs to conduct the experiments. However, descriptions of these faults in the source repositories were either too vague or not available.

Table 2: Categories of Mutation Operators
AOD: Delete Arithmetic Operators
AOI: Insert Arithmetic Operators
AOR: Replace Arithmetic Operators
ROR: Replace Relational Operators
COR: Replace Conditional Operators
COI: Insert Conditional Operators
COD: Delete Conditional Operators
SOR: Replace Shift Operators
LOR: Replace Logical Operators
LOI: Insert Logical Operators
LOD: Delete Logical Operators
ASR: Replace Assignment Operators

4.3 Experimental Procedure
Before the experiment, the subjects were given a six-hour training to use MT and assertion checking. The target programs and the tasks to be performed were also presented to the subjects. The subjects were briefed about the main functionality of each target program and the algorithm used, thus simulating the
process in real life in which a tester acquires the background knowledge of the program under test. They were blind to the use of mutants in the controlled experiment. For each program, the subjects were required to apply MT strictly following the procedure in Section 3.3, as well as to add assertions to the source code for checking. We did not restrict the number of metamorphic relations and assertions. The subjects were told to develop metamorphic relations and assertions as they saw fit, with a view to thoroughly testing each target program.

We did not mandate the use of a particular test case generation strategy, such as the all-def-use criterion, for MT or assertion checking. The subjects were simply asked to provide adequate test cases for testing the target programs. This avoided the possibility that some particular test case selection strategy, when applied on a large scale, might favor either MT or assertion checking.

We asked the students to submit metamorphic relations, functions to generate follow-up test cases, functions to verify metamorphic relations, test cases for metamorphic testing, source code with inserted assertions, and test cases for assertion checking.
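To make the two strategies concrete for a Boyer-like unit, the sketch below contrasts one in-code assertion with one metamorphic relation for a first-occurrence search routine. (This is our illustration, using Python's `str.find` as a stand-in for the Boyer program; the subjects' actual relations are listed in Table 5.)

```python
def search(text, pattern):
    """Stand-in for the Boyer unit: index of the first occurrence of
    pattern in text, or -1 if the pattern is absent."""
    return text.find(pattern)

def search_with_assertion(text, pattern):
    # Assertion checking (direct): a single execution must satisfy the
    # property that a reported index really locates the pattern.
    i = search(text, pattern)
    assert i == -1 or text[i:i + len(pattern)] == pattern
    return i

def mr_append_suffix(text, pattern, suffix):
    # Metamorphic relation (indirect): once the pattern is found at
    # index i, appending extra characters to the end of the text
    # cannot change the index of the first occurrence.
    i = search(text, pattern)
    if i != -1:
        assert search(text + suffix, pattern) == i
    return i

print(mr_append_suffix("metamorphic testing", "morp", "xyz"))  # prints 4
```

The assertion inspects one execution from the inside; the metamorphic relation cross-checks two executions from the outside, which is exactly the direct/indirect distinction drawn in Section 1.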
They were also asked to report the time costs of applying metamorphic testing and assertion checking. Before testing the faulty versions with these functions, assertions, and test cases, we checked the student submissions carefully to ensure that there was no implementation error.

4.4 Addressing the Threats to Validity
We briefly describe the threats to validity in this section before we present our main results in the next section.

Internal Validity: Internal validity refers to whether the observed effects depend only on the intended experimental variables. For this experiment, we provided the subjects with all the background materials and confirmed with them that they had sufficient time to perform all the tasks. On the other hand, we appreciate that students might be interrupted by minor Internet activities when they performed their tasks. Hence, the time costs reported by the subjects should be conservative. Furthermore, the subjects did not know the nature and details of the faults seeded. This measure ensured that their "designed" metamorphic relations and assertions were unbiased with respect to the seeded faults.

External Validity: External validity is the degree to which the results are generalizable to the testing of real-world systems. The programs used in our experiment were from real-life applications.
For example, Eurobudget is widely used and has been downloaded more than 10,000 times from SourceForge. On the other hand, some real-world programs can be much larger and less well documented than the open-source programs studied. More future studies may be in order for the testing of large complex systems using the MT method.

5. EXPERIMENTAL RESULTS
This section presents the experimental results of applying metamorphic testing and assertion checking. They are structured according to the dependent variables presented in the last section.

5.1 Metamorphic Relations and Assertions
A critical and difficult step in applying MT and assertion checking is to develop metamorphic relations and assertions for the target programs. Table 4 reports the number of metamorphic relations and assertions identified by the subjects for the three target programs. The mean numbers of metamorphic relations developed by the subjects for the respective programs were 2.79, 2.68, and 5.00. The total numbers of different metamorphic relations identified by all subjects for the respective programs were 18, 39, and 25. The mean numbers of assertions for the respective programs were 6.96, 11.35, and 10.97. For the sake of brevity, we list in Table 5 only the metamorphic relations identified by the subjects for the Boyer program.

The results show that all the subjects could properly apply metamorphic testing and assertion checking after training. In general, they could identify a larger number of assertions than metamorphic relations. Furthermore, their abilities to identify metamorphic relations varied. In particular, we observe that all 38 subjects managed to propose metamorphic relations after some training for each of the three open-source programs. This confirms the belief by the originators of MT that testers can formulate metamorphic relations effectively.

5.2 Comparison of Fault Detection Capabilities
We used the subjects' metamorphic relations, assertions, and source and follow-up test cases to test the faulty versions of the target
programs. The mutation detection ratio [1] is used to compare the fault detection capabilities of the MT and assertion checking strategies. The mutation detection ratio of a test set is defined as the number of mutants detected by the test set over the total number of mutants. For metamorphic testing, a mutant is detected if a source test case and its follow-up test cases, when executed against the mutant, violate a metamorphic relation.
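To make the two measures concrete, the sketch below applies a metamorphic relation and computes a mutation detection ratio for a toy numerical program. The sine relation sin(x) = sin(π − x), the tolerance, and the seeded fault are illustrative assumptions, not artifacts from the study:

```python
import math

def relation_holds(f, x, tol=1e-9):
    # Metamorphic relation for sine: sin(x) = sin(pi - x).
    # x is the source test case; pi - x is the follow-up test case.
    return abs(f(x) - f(math.pi - x)) <= tol

def mutation_detection_ratio(mutants, test_inputs):
    # A mutant is detected if at least one source/follow-up pair
    # violates the metamorphic relation; the ratio is the number of
    # detected mutants over the total number of mutants.
    detected = sum(
        1 for m in mutants
        if any(not relation_holds(m, x) for x in test_inputs)
    )
    return detected / len(mutants)

# A "mutant" of sin with a seeded off-by-0.1 fault in its argument.
faulty_sine = lambda x: math.sin(x + 0.1)

ratio = mutation_detection_ratio([math.sin, faulty_sine], [0.3, 1.2])
# faulty_sine violates the relation while math.sin does not, so ratio is 0.5
```

The same driver would report a ratio for assertion checking by swapping the relation check for an assertion evaluated on each test case.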
Second Language Acquisition Midterm Exam Review

1. acquisition & learning
➢ The term "acquisition" is used to refer to picking up a second language through exposure, whereas the term "learning" is used to refer to the conscious study of a second language. Most researchers now use the two terms interchangeably, irrespective of whether conscious or unconscious processes are involved.

2. incidental learning & intentional learning
➢ If, while reading for pleasure, a reader does not bother to look up a new word in a dictionary but a few pages later realizes what that word means, incidental learning is said to have taken place.
➢ If a student is instructed to read a text and find out the meanings of unknown words, it becomes an intentional learning activity.

3. language
➢ Language is a system of arbitrary vocal symbols used for human communication. That is to say, language is systematic (rule-governed), symbolic, and social.

4. Language Acquisition Device
➢ The capacity to acquire one's first language, when this capacity is pictured as a sort of mechanism or apparatus.

5. contrastive analysis
❖ Under the influence of behaviorism, researchers of language teaching developed the method of contrastive analysis (CA) to study learner errors. Its original aim was to serve foreign language teaching.

6. error analysis
❖ Error analysis aims to 1) find out how well the learner knows a second language, 2) find out how the learner learns a second language, 3) obtain information on common difficulties in second language learning, and 4) serve as an aid in teaching or in the preparation and compilation of teaching materials (Corder, 1981). It is a methodology for describing second language learners' language systems.

7. interlanguage
❖ It refers to the language that the L2 learner produces.
❖ The language produced by the learner is a system in its own right.
❖ The language is a dynamic system, evolving over time.

8. Krashen and his Monitor Model
❖ 1. The Acquisition-Learning Hypothesis
❖ 2. The Monitor Hypothesis
❖ 3. The Natural Order Hypothesis
❖ 4. The Input Hypothesis
❖ 5. The Affective Filter Hypothesis

9. input hypothesis
❖ Its claim: the learner improves and progresses along the "natural order" when s/he receives second language "input" that is one step beyond his or her current stage of linguistic competence. For example, if a learner is at a stage "i", then acquisition takes place when s/he is exposed to "comprehensible input" that belongs to level "i+1".

10. affective filter hypothesis
❖ The hypothesis is based on the theory of an affective filter, which states that successful second language acquisition depends on the learner's feelings. Negative attitudes (including a lack of motivation or self-confidence, and anxiety) are said to act as a filter, preventing the learner from making use of input and thus hindering success in language learning.

11. Schumann's Acculturation Model
❖ This model of second language acquisition was formulated by John H. Schumann (1978) and applies to the natural context of second language acquisition, where a second language is acquired without any instruction in the environment. Schumann defines acculturation as the process of becoming adapted to a new culture or, rather, the social and psychological integration of the learner with the target language group.

12. Universal Grammar
⏹ The language faculty built into the human mind, consisting of principles and parameters.
⏹ This is the universal grammar theory associated with Noam Chomsky.
⏹ Universal Grammar sees the knowledge of grammar in the mind as having two components: "principles" that all languages have in common and "parameters" on which they vary.

13. McLaughlin's information processing model
☐ SLA is the acquisition of a complex cognitive skill that must progress from controlled processing to automatic processing.

14. Anderson's ACT
☐ This is another general theory of cognitive learning that has been applied to SLA.
☐ It also emphasizes the automatization process.
☐ It conceptualizes three types of memory: 1. working memory; 2. declarative long-term memory; 3. procedural long-term memory.

15. fossilization
☐ It refers to the phenomenon in which second language learners often stop learning even though they might be far short of native-like competence. The term is also used for specific linguistic structures that remain incorrect for lengthy periods of time in spite of plentiful input.

16. communication strategies
⏹ Communication strategies, known as CSs, consist of attempts to deal with problems of communication that have arisen in interaction. They are characterized by the negotiation of an agreement on meaning between the two parties.

1. What is it that needs to be learnt in language acquisition?
➢ Phonetics and phonology ➢ Syntax ➢ Morphology ➢ Semantics ➢ Pragmatics

2. How do experts study children's acquisition?
➢ Observe young children learning to talk. ➢ Record the speech of their children. ➢ Create a database. ➢ Have a single hypothesis.

3. What are learning strategies? Give examples.
➢ Intentional behaviour and thoughts that learners make use of during learning in order to better help them understand, learn, or remember new information.
➢ Learning strategies are classified into: 1. meta-cognitive strategies; 2. cognitive strategies; 3. socio-affective strategies.

4. What are the factors influencing the success of SLA?
● Cognitive factors: 1. intelligence; 2. language aptitude; 3. language learning strategies.
● Affective factors: 1. language attitudes; 2. motivation.

5. What are the differences between the behaviorist learning model and that of the mentalist?
➢ The behaviorist learning model claims that children acquire the L1 by trying to imitate utterances produced by people around them and by receiving negative or positive reinforcement of their attempts to do so. Language acquisition, therefore, was considered to be environmentally determined. The mentalist model, in contrast, holds that children are born with an innate capacity for language, so that acquisition is internally driven rather than shaped by the environment alone.

6. What are the beneficial views obtained from the studies on children's L1 acquisition?
1. Children's language acquisition goes through several stages.
2. These stages are very similar across children for a given language, although the rate at which individual children progress through them is highly variable.
3. These stages are similar across languages.
4. Child language is rule-governed and systematic, and the rules created by the child do not necessarily correspond to adult ones.
5. Children are resistant to correction.
6. Children's mental capacity limits the number of rules they can apply at any one time, and they will revert to earlier hypotheses when two or more rules compete.

7. What are the differences of error analysis from contrastive analysis?
Contrastive analysis stresses the interfering effects of a first language on second language learning and claims that most errors come from interference of the first language (Corder, 1967). However, such a narrow view of interference ignores the intralingual effects of language learning, among other factors. Error analysis is the method to deal with intralingual factors in learners' language (Corder, 1981); it is a methodology for describing second language learners' language systems. Error analysis is a type of bilingual comparison between learners' interlanguage and a target language, while contrastive analysis is a comparison between languages (native language and target language).

8. What are UG principles and parameters?
➢ The universal principle is the principle of structure-dependency, which states that language is organized in such a way that it crucially depends on the structural relationships between elements in a sentence.
➢ Parameters are principles that differ in the way they work or function from language to language. That is to say, there are certain linguistic features that vary across languages.

9. What role does UG play in SLA?
➢ Three possibilities:
1. UG operates in the same way for L2 as it does for L1.
2. The learner's core grammar is fixed and UG is no longer available to the L2 learner, particularly not to the adult learner.
3. UG is partly available, but it is only one factor in the acquisition of L2. There are other factors, and they may interfere with the UG influence.

10. What are the classifications of communication strategies?
Faerch and Kasper characterize CSs in the light of learners' attempts at governing two different behaviors; their taxonomies are achievement and reduction strategies, and they are based on psycholinguistics.
➢ Achievement strategies:
⏹ Paraphrase ⏹ Approximation ⏹ Word coinage ⏹ Circumlocution ⏹ Conscious transfer ⏹ Literal translation ⏹ Language switch (borrowing) ⏹ Mime (use body language and gestures to keep communication open) ⏹ Appeal for assistance
➢ Reduction strategies:
⏹ Message abandonment (topic shift): ask a student to answer the question "How old are you?" She must utter two or three sentences to answer the question, but must not tell her age.
⏹ Topic avoidance (silence).
Literature Comparison Methods

Literature comparison is a crucial aspect of literary analysis, as it allows readers to gain a deeper understanding of the texts they are studying. There are several methods that can be used to compare literature, each with its own strengths and weaknesses. In this response, I will explore some of the most common literature comparison methods, including close reading, intertextuality, and historical context, and discuss their implications for literary analysis.

Close reading is a method of literary analysis that involves examining a text in great detail, paying close attention to language, structure, and form. This method allows readers to uncover the nuances and complexities of a text, and can reveal important themes, symbols, and motifs. By closely reading two or more texts side by side, readers can identify similarities and differences in the way that they use language and structure to convey meaning. This can provide valuable insights into the texts and the ways in which they relate to each other.

Intertextuality is another important method of literature comparison, which focuses on the ways in which texts are interconnected and refer to each other. This method involves identifying and analyzing the ways in which one text influences or is influenced by another, whether through direct references, allusions, or shared themes and motifs. By examining the intertextual connections between two or more texts, readers can gain a deeper understanding of the ways in which they are related and the ways in which they contribute to a larger literary tradition.

Historical context is also a crucial aspect of literature comparison, as it allows readers to situate texts within their cultural, social, and political environments. By considering the historical circumstances in which a text was written, readers can gain insights into the ways in which it reflects and responds to the concerns of its time.
When comparing two or more texts, it is important to consider the historical context of each, as this can provide important insights into the ways in which they are similar or different, and the ways in which they contribute to larger literary and cultural conversations.

In addition to these methods, there are several other approaches to literature comparison that can be valuable for literary analysis. For example, readers can compare texts based on their genre, style, or thematic content, in order to gain insights into the ways in which they are similar or different. They can also consider the ways in which texts have been received and interpreted by different audiences over time, in order to gain insights into the ways in which they have been understood and valued.

Overall, literature comparison is a complex and multifaceted process that requires careful attention to detail and a willingness to explore texts from multiple perspectives. By utilizing a variety of methods, including close reading, intertextuality, and historical context, readers can gain a deeper understanding of the texts they are studying and the ways in which they relate to each other. This can enrich their appreciation of literature and provide valuable insights for further analysis and interpretation.
The PANDA framework for comparing patterns ☆

Ilaria Bartolini a, Paolo Ciaccia a, Irene Ntoutsi b, Marco Patella a,*, Yannis Theodoridis b
a DEIS, University of Bologna, viale Risorgimento 2, 40136 Bologna, Italy
b Department of Informatics, University of Piraeus, Greece and Research Academic Computer Technology Institute, Athens, Greece

Article history: Received 10 July 2008; Received in revised form 3 October 2008; Accepted 4 October 2008; Available online 25 October 2008

Keywords: Pattern comparison; Pattern base management systems; Data models; Knowledge discovery

Abstract: Data mining techniques are commonly used to extract patterns, like association rules and decision trees, from huge volumes of data. The comparison of patterns is a fundamental issue, which can be exploited, among others, to synthetically measure dissimilarities in evolving or different datasets and to compare the output produced by different data mining algorithms on the same dataset. In this paper, we present the PANDA framework for computing the dissimilarity of both simple and complex patterns, defined upon raw data and other patterns, respectively. In PANDA the problem of comparing complex patterns is decomposed into simpler sub-problems on the component (simple or complex) patterns, and the so-obtained partial solutions are then smartly aggregated into an overall dissimilarity score. This intrinsically recursive approach grants PANDA a high flexibility and allows it to easily handle patterns with highly complex structures. PANDA is built upon a few basic concepts so as to be generic and clear to the end user. We demonstrate the generality and flexibility of PANDA by showing how it can be easily applied to a variety of pattern types, including sets of itemsets and clusterings. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

A huge amount of heterogeneous data is collected nowadays from a variety of data sources (e.g., business, health care, telecommunication, science). The storage rate of these data
collections is growing at a phenomenal rate (over 1 exabyte per year, according to a recent survey [1]). Due to their quantity and complexity, it is impossible for humans to thoroughly investigate these data collections through a manual process. Knowledge discovery in data (KDD) tries to solve this problem by discovering hidden information using data mining (DM) techniques. DM results, called patterns, constitute compact representations of raw data that are rich in semantics [2]. Well-known examples of patterns are decision trees, clusterings, and frequent itemsets. Patterns reduce the complexity and size of data collections, while preserving most of the information of the original raw data; the degree of preservation, however, strongly depends on the parameters of the DM algorithms used for their extraction. The wide spreading of DM technology makes the problem of efficiently managing patterns an important research issue. Ideally, patterns should be treated by pattern management systems as "first-class citizens", in the same fashion that raw data are treated by traditional database management systems. Along this line of research some interesting results, mainly concentrated on representation and querying issues, have been obtained [2,3]. In this paper, we address the relevant issue of pattern comparison, i.e., how to establish whether two patterns are similar or not. Pattern comparison is valuable in

0169-023X/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.datak.2008.10.004
☆ This work was partially supported by the IST-2001-33058 Thematic Network PANDA "PAtterns for Next-generation DAtabase systems".
* Corresponding author. E-mail addresses: i.bartolini@unibo.it (I. Bartolini), paolo.ciaccia@unibo.it (P. Ciaccia), ntoutsi@unipi.gr (I. Ntoutsi), marco.patella@unibo.it (M. Patella), ytheod@unipi.gr (Y. Theodoridis).
Data & Knowledge Engineering 68 (2009) 244–260. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/datak

monitoring and detecting changes in patterns describing evolving data (e.g., the purchasing behavior of customers over time), as well as in a number of other scenarios, some of which are sketched in Section 1.1. A principled approach to pattern comparison needs to address several problems. First, there is a large amount of heterogeneous patterns for which a dissimilarity operator should be defined: since each of these pattern types could have its own specific requirements on how the dissimilarity should be assessed, it seems almost impossible (and possibly meaningless) to define a "universal" dissimilarity measure. Second, besides patterns defined over raw data (hereafter called simple patterns), there also exist patterns defined upon other patterns, e.g., a cluster of frequent itemsets, an association rule of clusters, a forest of decision trees, etc. For these patterns, hereafter called complex patterns, dissimilarity operators should also be defined: how these are related to the corresponding ones defined for component patterns needs to be addressed. Third, one should consider that two patterns can be more or less similar both in the data they represent and in the way they represent such data. For instance, two clusters might differ either because of their "shape" or because of the amount of raw data they summarize (or because of both). Given the above, we can state a series of high-level methodological requirements that a framework for dissimilarity assessment should satisfy:

General applicability: The framework should be applicable to arbitrary types of patterns.
Flexibility: The framework should allow for the definition of alternative dissimilarity functions, even for the same pattern type. Indeed, the end user should be able to easily adjust the dissimilarity criterion to her specific needs.
Simplicity: The framework should be built upon a few basic concepts, so as to be understandable to the end user.
Efficiency: It should be
possible to define the dissimilarity between patterns without the need of accessing the underlying raw data. This requirement also encompasses privacy issues, e.g., when raw data are not publicly available.

The framework we propose, called PANDA,1 addresses the above requirements as follows. Generality is achieved by considering that patterns can be (recursively) defined by means of a set of type constructors. To gain the necessary flexibility in defining dissimilarity operators, PANDA adopts a modular approach. In particular, the problem of comparing complex patterns is reduced to the one of comparing the corresponding sets (or lists, etc.) of component (simpler) patterns. Component patterns are first paired (using a specific matching type) and their scores are then aggregated (through some aggregation function) so as to obtain the overall dissimilarity score. This recursive definition of dissimilarity allows highly complex patterns to be easily handled and, due to modularity, any component to be changed with an alternative one. To address the requirement of simplicity, PANDA adopts a consistent approach to model patterns, which are viewed as entities composed of two parts: the structure component identifies "interesting" regions in the attribute space, e.g., the head and the body of an association rule, whereas the measure component describes how the pattern is related to the underlying raw data, e.g., the support and the confidence of the rule. When comparing two simple patterns, the dissimilarity of their structure components (hereafter, structure dissimilarity) and the dissimilarity of their measure components (hereafter, measure dissimilarity) are combined (through some combining function) in order to derive the total dissimilarity score. Finally, considering the efficiency issue, PANDA only works in "pattern space", i.e., raw data need not be accessed to evaluate patterns' dissimilarity.

It has to be remarked that determining the "best" measure for every comparison problem is not within the PANDA scope. Indeed, PANDA represents a conceptual environment within which specific, user- and/or application-dependent, dissimilarity measures can be framed. Obviously, PANDA is also amenable to act as a software framework, in which case further advantages are favoring the reusability of components and the easy development of user-defined building blocks into ready-to-use libraries.

1.1. Motivating examples

In this section, we provide some illustrative examples which demonstrate the usefulness of a pattern comparison operation. A first application is as an alternative to the comparison of raw data collections, e.g., the monthly sales of a supermarket. Approaches which use pattern sets in order to compare the original raw datasets already exist in the literature, e.g. [4,5]: such approaches are based on the intuition that, since patterns condense the information existing in the raw data, their dissimilarity is a (either lossless or lossy) representation of the dissimilarity of the originating data [6]. Defining such a mapping between dissimilarity in the raw data space and that in the corresponding pattern space is really useful: if the comparison between patterns does not show substantial differences, it is possible to avoid a thorough (and costly) analysis of the raw datasets. In the same direction, pattern comparison might be helpful in the distributed database domain to analyze, for example, differences of data characteristics across distributed datasets (e.g., customer transactions in branches of a supermarket or human reactions to chemical/biological substances). Other applications include pattern base synchronization (i.e., keeping patterns up to date with respect to the original raw data), versioning support in a pattern management system (getting a differential backup of the new version or comparing versions of the pattern base so as to discover changes and outliers), the discovery of unexpected or outlier patterns (by comparing them to a target pattern), the evaluation of DM

1 PANDA stands for
PAtterns for Next-generation DAtabase systems, an acronym used for the IST-2001-33058 project of the European Union, which proposed and studied the PBMS (Pattern Base Management Systems) concept.

algorithms (through the comparison of their outcomes), or secure DM where, due to privacy considerations, only patterns (and not the underlying raw data) are available; in this latter case, the comparison should involve only pattern space characteristics, since the connection to raw data is lost. We conclude this section by describing a few scenarios where similarity between patterns plays an important role.

Example 1. Consider a telecommunication company providing a package of new generation services with respect to different customer profiles. Let a decision maker of the company request a monthly report depicting the aggregated usage information of this package as extracted from a data warehouse. Such a report would be far more translatable by the decision maker, e.g., for target marketing, if it was accompanied by the monthly comparison of the classification of the customer profiles using such services, as these are portrayed, say, via decision tree models.

Example 2. A spatial DM application analyzes how much the density of population in a town correlates with the number of car accidents. For privacy reasons, raw data are not available; rather, only the distributions of population and car accidents in the areas of the town can be used. Such distributions cannot be compared only on a per-area basis, because a high correlation is only detected when the distributions of neighboring areas are compared. The definition of a similarity operator between distributions should be flexible enough to take such correlation into account.

Example 3. A copy detection system developer has to experiment with different techniques for comparing multimedia documents, in order to select the most effective one. She is given a feature-based representation of the
documents (e.g., list of keywords with weights for the text, distribution of color for the images), and needs to set up a set of methods that take into account all such features and return a score assessing how similar two documents are.

The rest of the paper is organized as follows: in Section 2, we describe the pattern model underlying the framework and introduce two running examples. Section 3 is devoted to explaining the basic concepts and mechanisms of the PANDA framework, whereas Section 4 demonstrates how several comparison measures proposed in the literature can be modeled within the framework. Further examples are included in Appendix A, together with actual experimental results as obtained from a prototype software implementation described in Section 5. Related work is discussed in Section 6, while Section 7 concludes.2

2. Pattern representation

Our approach to pattern representation builds upon the logical pattern base model proposed in [9]; in the sequel we describe only the parts of the model relevant to our purposes (for a detailed presentation, please refer to [9]). The model assumes a set of base types (e.g., Int, Real, Boolean, and String) and a set of type constructors, including list (<...>), set ({...}), array ([...]), and tuple ((...)). Let us call T the set of types including all the base types and all the types that can be derived from them through repeated application of the type constructors. Types to which a (unique) name is assigned are called named types. Some examples of types are:

{Int} — set of integers
XYPair = (x: Int, y: Int) — named tuple type with attributes x and y
<XYPair> — list of XYPairs

Definition 1 (Pattern type). A pattern type is a named pair, PT = (SS, MS), where SS is the structure schema and MS is the measure schema. Both SS and MS are types in T. A pattern type PT is called complex if its structure schema SS includes another pattern type; otherwise PT is called simple.

The structure schema SS defines the pattern space by describing the structure of the patterns which are instances of the
particular pattern type. The complexity of the pattern space depends on the expressiveness of the typing system T. The measure schema MS describes measures that relate the pattern to the underlying raw data or, more in general, provides quantitative information about the pattern itself. It is clear that the measure complexity also depends exclusively on T. A pattern is an instance of a pattern type; thus it instantiates both the structure and the measure schemas. Assuming that each base type B is associated with a set of values dom(B), it is immediate to define values for any type in T.

Definition 2 (Pattern). Let PT = (SS, MS) be a pattern type. A pattern p, instance of PT, is defined as p = (s, m), where p is the pattern identifier, s (the structure of p, also denoted as p.s) is a value for type SS, p.s ∈ dom(SS), and m (the measure of p, also denoted as p.m) is a value for type MS, p.m ∈ dom(MS).

Before describing the main concepts of the PANDA framework, we introduce here two running examples that will be used throughout the paper to show the applicability of our framework to real cases, namely the comparison of clusterings and of collections of documents. In particular, in the first example, each clustering (set of clusters) represents an image in a region-based image retrieval (RBIR) system, where images are retrieved according to their similarity to a provided query image. Experiments on both running examples are detailed in Appendix A.

2 This paper extends the concepts introduced in [7,8] by providing a more formal presentation, along with a significant number of new experiments and examples of application of the framework.

Example 4 (Clusterings (images)). We illustrate here the case of the WINDSURF image retrieval system [10], which applies a clustering algorithm on visual characteristics of images so as to divide each image into regions of homogeneous pixels (clusters), but the behavior of other RBIR systems can be modeled in a similar way. In detail, WINDSURF applies a Discrete Wavelet Transform to each image and
the k-means algorithm is used to cluster together pixels sharing common visual characteristics, like color and texture. Each region is then represented as a cluster using the centroid and the corresponding covariance matrix for each color channel and wavelet sub-band (details can be found in [10]), while the cluster support (i.e., the fraction of image pixels contained in the region) is used as the pattern measure. In terms of the PANDA model, each region (simple pattern) is modeled as

Region = (SS: (bands: [(center: [Real]_3, cov: [[Real]]_{3x3})]_4), MS: (supp: Real)).

Images are then defined as sets of regions (clusters) with no measure:

Image = (SS: {Region}, MS: ⊥),

where ⊥ denotes the null type.

Example 5 (Collections of documents). The problem of comparing collections of documents is quite common in web mining where, for example, it is used to find sites selling similar products. The problem, in its basic form, assumes a collection (set) of textual documents, where each document consists of a set of keywords. Each keyword k in a document is associated to its (normalized) weight in the document itself (e.g., representing its frequency using tf/idf measures), and can therefore be modeled as a simple pattern:

Keyword = (SS: (term: String), MS: (weight: Real)).

A possible instance of this type is

p407 = ((term = database), (weight = 0.5)).

Consequently, documents and collections are represented respectively as

Document = (SS: {Keyword}, MS: ⊥),
Collection = (SS: {Document}, MS: ⊥).

3. The PANDA framework

In this section, we provide a framework for assessing the dissimilarity of two patterns, p1 and p2, of the same type PT. From Section 2, it is evident that the complexity of PT can widely vary and is only restricted by the adopted typing system T. Our framework is built upon two basic principles:

1. The dissimilarity between two patterns should be evaluated by taking into account both the dissimilarity of their structures and the dissimilarity of their measures.
2. The dissimilarity between two complex patterns should (recursively) depend on the
dissimilarity of their component patterns.

The first principle is a direct consequence of having allowed for arbitrarily complex structures in patterns. Since the structure of a complex pattern might include measures of its component patterns, neglecting the structure dissimilarity could easily result in misleading results. For instance, comparing two Images, as defined in Example 4, obviously needs to take into account the structure component, since the measure one is empty. Another motivation underlying this principle arises from the need of building an efficient framework, which does not force accessing the underlying dataset(s) in order to determine the dissimilarity of two patterns, e.g., in terms of their common instances. To this end, we use all pieces of information that are available in the pattern space, namely the structural description of the patterns and their quantitative measures with respect to the underlying raw data.

The second principle provides the necessary flexibility to the PANDA framework. Although, for the case of complex patterns, one could devise arbitrary models for their comparison, it is useful and, at the same time, sufficient for practical purposes, to consider solutions that decompose the "difficult" problem of comparing complex patterns into simpler sub-problems, like those of comparing simple patterns, and then "smartly" aggregate the so-obtained partial solutions into an overall score.

Besides the above principles, it is also sometimes convenient, in order to offer a better and more intuitive interpretation of the results, to assume that the dissimilarity between two patterns yields a score value, normalized in the [0,1] range (the higher the score, the higher the dissimilarity). Unless otherwise stated, we will implicitly make this assumption throughout the paper.

We start by describing how the first principle is applied to the basic case of simple patterns; after that, we show how to generalize the framework
to the case of complex patterns.

3.1. Dissimilarity between simple patterns

The dissimilarity between two patterns, p1 and p2, of a simple pattern type PT is based on three key ingredients:

- a structure dissimilarity function, dis_struct, that evaluates the dissimilarity of the structure components of the two patterns, p1.s and p2.s;
- a measure dissimilarity function, dis_meas, used to assess the dissimilarity of the corresponding measure components, p1.m and p2.m; and
- a combining function, Comb, also called the combiner, yielding an overall score from the structure and measure dissimilarity scores.

The dissimilarity of two patterns is consequently determined as (see also Fig. 1):

dis(p1, p2) = Comb(dis_struct(p1.s, p2.s), dis_meas(p1.m, p2.m))    (1)

If p1 and p2 share the same structure, then dis_struct(p1.s, p2.s) = 0. In the general case, in which the patterns have different structures, two alternatives exist:

1. The structural components are somewhat "compatible", in which case we interpret dis_struct(p1.s, p2.s) as the "additional dissimilarity" one wants to charge with respect to the case of identical structures.
2. Structures are completely unrelated (in a sense that depends on the case at hand), i.e., dis_struct(p1.s, p2.s) = 1. In this case, regardless of the measure dissimilarity, we also require the overall dissimilarity to be maximum, i.e., dis(p1, p2) = 1. This restriction is enforced to prevent cases where two completely different patterns might be considered somehow similar due to low differences in their measures.

Example 6. Continuing Example 5, consider two keywords k1 = (t1, w1) and k2 = (t2, w2) to be compared. For the structure dissimilarity function, if the two terms are the same, then dis_struct(t1, t2) = 0. When t1 ≠ t2, if some information about the semantics of the terms is available, such as a thesaurus or a hierarchical hypernymy/hyponymy ontology like WordNet [11], then one could set dis_struct(t1, t2) < 1 to reflect the "semantic distance" between t1 and t2 [12]; on the other hand, if no such information is available, then dis_struct(t1, t2) = 1. A possible choice for the measure dissimilarity function is the absolute difference of measures, i.e., dis_meas(w1, w2) = |w1 - w2|. Finally, a possible combiner for this example is, say, the algebraic disjunction of the two dissimilarities:

dis(k1, k2) = dis_struct(t1, t2) + dis_meas(w1, w2) - dis_struct(t1, t2) · dis_meas(w1, w2)    (2)

which correctly yields dis(k1, k2) = 1 when dis_struct(t1, t2) = 1, and dis(k1, k2) = dis_meas(w1, w2) when dis_struct(t1, t2) = 0.

Example 7. Continuing our other running example (Example 4), WINDSURF uses the Bhattacharyya distance [10] to compare regions, i.e., cluster structures:

dis_struct(p1.s, p2.s)^2 = Σ_{b=1..4} [ (1/2) · ln( det((Σ_{1,b} + Σ_{2,b})/2) / sqrt(det(Σ_{1,b}) · det(Σ_{2,b})) )
                                        + (1/8) · (c_{1,b} - c_{2,b})^T · ((Σ_{1,b} + Σ_{2,b})/2)^{-1} · (c_{1,b} - c_{2,b}) ]

where Σ_{i,b} = pi.bands[b].cov, c_{i,b} = pi.bands[b].c, and det(·) denotes the determinant of a matrix. The measure dissimilarity is defined as dis_meas(p1.m, p2.m) = |p1.supp - p2.supp|. Finally, the combining function simply averages the two distances.

It has to be observed that, in several cases, patterns have no measure at all; for instance, sets of strings have type Set = (SS: {String}, MS: ⊥). In this case the assessed dissimilarity will depend only on how much the structural components of the patterns differ, i.e., dis(p1, p2) = dis_struct(p1.s, p2.s).

It has to be remarked that our framework does not preclude the possibility of defining different dis_struct, dis_meas and Comb functions for each pattern type of interest. Rather, the functions best suited to the case at hand should be chosen, possibly depending on specific user's needs, e.g., to focus the comparison only on some patterns' properties, or to trade off accuracy for
computational costs, etc. (see also Section 5).

3.2. Dissimilarity between complex patterns

Although in principle one could define simple patterns with arbitrarily complicated structural components, this would necessarily force dissimilarity functions to be complex as well, and hardly reusable. Among the requirements stated in the introduction, this "monolithic" approach would only comply with that of efficiency, failing, however, to address any of the other ones. In PANDA, we pursue a modular approach that, by definition, is better suited to guarantee flexibility, simplicity, and reusability. Moreover, as will be discussed later, this does not rule out the possibility of efficient implementations.

Coherently with the second principle inspiring our approach, the dissimilarity of complex patterns is evaluated starting from the dissimilarities of the corresponding component patterns. In particular, the structure of complex patterns plays here a major role, since it is where pattern composition occurs. Without loss of generality, in what follows it is assumed that the component patterns, p1, p2, ..., pN, of a complex pattern cp completely describe the structure of cp (no additional information is present in cp.s) and that they form a set, i.e., cp.s = {p1, p2, ..., pN}. At the end of this section, we describe how complex patterns built using other type constructors (lists, vectors, and tuples) can be dealt with.

The structure dissimilarity of complex patterns cp1.s = {p_1^1, p_1^2, ..., p_1^N1} and cp2.s = {p_2^1, p_2^2, ..., p_2^N2} depends on two fundamental abstractions, namely:

- the matching type, which is used to establish how the component patterns of cp1 and cp2 can be matched, and
- the aggregation logic, which is used to combine the dissimilarity scores of the matched component patterns into a single value representing the total dissimilarity between the structures of the complex patterns.

3.2.1. Matching type

A matching between the complex patterns cp1.s = {p_1^1, ..., p_1^N1} and cp2.s = {p_2^1, ..., p_2^N2} is a matrix X_{N1×N2} = (x_{i,j}), where each element x_{i,j} ∈ [0, 1] (i = 1, ..., N1; j = 1, ..., N2) represents the (amount of) matching between the i-th component pattern of cp1 and the j-th component pattern of cp2, i.e., between p_1^i and p_2^j. A matching type is a set of constraints on the x_{i,j} coefficients, so that only some matchings are valid. Relevant cases of matching types include:

1-1 matching: each component pattern of cp1 (resp., cp2) might be matched to at most one component pattern of cp2 (resp., cp1). Partial matching occurs if N1 ≠ N2. The 1-1 matching type corresponds to the following set of constraints:

Σ_{i=1..N1} x_{i,j} ≤ 1 ∀j;   Σ_{j=1..N2} x_{i,j} ≤ 1 ∀i;   Σ_{i=1..N1} Σ_{j=1..N2} x_{i,j} = min{N1, N2};   x_{i,j} ∈ {0, 1} ∀i,j

N-M (complete) matching: each component pattern of cp1 (resp., cp2) is matched to every component pattern of cp2, and vice versa, i.e., x_{i,j} = 1 ∀i,j.

EMD matching: this matching type, introduced for defining the earth mover's distance (EMD) [13,14], differs from the previous ones in that each x_{i,j} might be real-valued, and represents the amount of p_1^i "mass" that is matched with p_2^j. The corresponding constraints on the matching matrix are:

Σ_{i=1..N1} x_{i,j} ≤ w_2^j ∀j;   Σ_{j=1..N2} x_{i,j} ≤ w_1^i ∀i;   Σ_{i=1..N1} Σ_{j=1..N2} x_{i,j} = min{ Σ_{i=1..N1} w_1^i, Σ_{j=1..N2} w_2^j };   x_{i,j} ∈ [0, 1] ∀i,j

where w_1^i (resp., w_2^j) is the weight (mass amount) associated to each component pattern p_1^i (resp., p_2^j) of cp1 (resp., cp2).

Finally, note that dissimilarity functions rely, either explicitly or implicitly, on a specific matching type. For instance, the N-M complete matching is used by the complete linkage algorithm to compare clusters. Variations of the matching types described above are also common, such as the one used by the dynamic time warping (DTW) distance [15], as well as related distances for time series (see also Section 4.3).
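The combiner of Eq. (2) and the 1-1 matching type above can be sketched in a few lines of Python. This is an illustrative sketch only, not PANDA's implementation: keywords are assumed to be (term, weight) pairs as in Example 6, `dis_struct_term` assumes no thesaurus is available, and the brute-force search with score averaging stands in for one possible aggregation logic.

```python
from itertools import permutations

def dis_struct_term(t1, t2):
    # structure dissimilarity of Example 6: 0 for identical terms,
    # 1 for unrelated ones (no thesaurus/ontology assumed)
    return 0.0 if t1 == t2 else 1.0

def dis_meas_weight(w1, w2):
    # measure dissimilarity: absolute difference of the weights
    return abs(w1 - w2)

def dis_keyword(k1, k2):
    # Eq. (2): algebraic disjunction of structure and measure scores;
    # yields 1 for unrelated structures, dis_meas for identical ones
    ds = dis_struct_term(k1[0], k2[0])
    dm = dis_meas_weight(k1[1], k2[1])
    return ds + dm - ds * dm

def dis_complex_1to1(cp1, cp2, dis=dis_keyword):
    # brute-force 1-1 matching between two sets of component patterns:
    # each component of the smaller set is matched to exactly one
    # component of the larger one (partial matching when sizes differ),
    # and the matched scores are averaged (one possible aggregation)
    small, large = sorted((cp1, cp2), key=len)
    return min(
        sum(dis(a, b) for a, b in zip(small, perm)) / len(small)
        for perm in permutations(large, len(small))
    )
```

For instance, `dis_keyword(("apple", 0.9), ("pear", 0.9))` is 1.0, since unrelated terms force maximum dissimilarity regardless of the weights. The brute-force search is exponential in the number of components; a real implementation would solve the underlying assignment problem instead.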
How to Write a Comparative Essay in English

English answer:

A comparative essay is a type of academic writing that compares and contrasts two or more subjects. It is a critical analysis of the similarities and differences between the subjects, which can be anything from literary works to historical events to scientific theories.

The first step in writing a comparative essay is to choose your subjects. The subjects should be similar enough to make a meaningful comparison, but different enough to make the comparison interesting. Once you have chosen your subjects, you need to develop a thesis statement. The thesis statement is a one-sentence summary of your argument, and it should state the main similarities and differences between your subjects.

The next step is to write the body of your essay. The body should be divided into paragraphs, each of which focuses on a different similarity or difference between your subjects. Each paragraph should begin with a topic sentence that states its main point, and it should provide evidence to support that topic sentence. The evidence can come from your own research, from the sources you have read, or from your own experiences.

The conclusion of your essay should restate your thesis statement and summarize the main points of your essay. It should also provide a final thought or reflection on the comparison.
A Comparison of Algorithms for the Optimization of Fermentation Processes

Rui Mendes, Isabel Rocha, Eugénio C. Ferreira, Miguel Rocha

Abstract - The optimization of biotechnological processes is a complex problem that has been intensively studied in the past few years due to the economic impact of the products obtained from fermentations. In fed-batch processes, the goal is to find the optimal feeding trajectory that maximizes the final productivity. Several methods, including Evolutionary Algorithms (EAs), have been applied to this task in a number of different fermentation processes. This paper performs an experimental comparison between Particle Swarm Optimization, Differential Evolution and a real-valued EA in three distinct case studies, taken from previous work by the authors and from the literature, all considering the optimization of fed-batch fermentation processes.

I. INTRODUCTION

A number of valuable products such as recombinant proteins, antibiotics and amino-acids are produced using fermentation techniques. Additionally, biotechnology has been replacing traditional manufacturing processes in many areas, like the production of bulk chemicals, due to its relatively low requirements regarding energy and environmental costs. Consequently, there is an enormous economic incentive to develop engineering techniques that can increase the productivity of such processes. However, these are typically very complex, involving different transport phenomena, microbial components and biochemical reactions. Furthermore, the nonlinear behavior and time-varying properties, together with the lack of reliable sensors capable of providing direct and on-line measurements of the biological state variables, limit the application of traditional control and optimization techniques to bioreactors.

Under this context, there is the need to consider quantitative mathematical models, capable of describing the process dynamics and the interrelation among relevant variables. Additionally, robust global optimization techniques must
deal with the model's complexity, the environment constraints and the inherent noise of the experimental process [3].

In fed-batch fermentations, process optimization usually encompasses finding a given nutrient feeding trajectory that maximizes productivity. Several optimization methods have been applied to this task. It has been shown that, for simple bioreactor systems, the problem can be solved analytically [24].

Rui Mendes and Miguel Rocha are with the Department of Informatics and the Centro de Ciências e Tecnologias da Computação, Universidade do Minho, Braga, Portugal (email: azuki@di.uminho.pt, mrocha@di.uminho.pt). Isabel Rocha and Eugénio Ferreira are with the Centro de Engenharia Biológica da Universidade do Minho (email: irocha@deb.uminho.pt, ecferreira@deb.uminho.pt).

Numerical methods take a distinct approach to this dynamic optimization problem. Gradient algorithms are used to adjust the control trajectories in order to iteratively improve the objective function [4]. In contrast, dynamic programming methods discretize both time and control variables to a predefined number of values; a systematic backward search method, in combination with the simulation of the system model equations, is then used to find the optimal path through the defined grid. However, in order to achieve a global optimum, the computational burden is very high [23].

An alternative approach comes from the use of algorithms from the Evolutionary Computation (EC) field, which have been used in the past to optimize nonlinear problems with a large number of variables. These techniques have been applied with success to the optimization of feeding or temperature trajectories [14][1], and, when compared with traditional methods, usually perform better [20][6].

In this work, the performance of different algorithms belonging to three main groups - Evolutionary Algorithms (EA), Particle Swarm Optimization (PSO) and Differential Evolution (DE) - was compared, when applied to the task of optimizing the feeding trajectory of fed-batch fermentation
processes. Three test cases taken from the literature and from previous work by the authors were used. The algorithms were allowed to run for a given number of function evaluations that was deemed to be enough to achieve acceptable results. The comparison among the algorithms was based on their final result and on the convergence speed.

The paper is organized as follows: firstly, the fed-batch fermentation case studies are presented; next, PSO, DE and a real-valued EA are described; then, the results of the application of the different algorithms to the case studies are presented; finally, the paper presents a discussion of the results, conclusions and further work.

II. CASE STUDIES: FED-BATCH FERMENTATION PROCESSES

In fed-batch fermentations there is an addition of certain nutrients along the process, in order to prevent the accumulation of toxic products, allowing the achievement of higher product concentrations. During this process the system states change considerably, from a low initial to a very high biomass and product concentration. This dynamic behavior motivates the development of optimization methods to find the optimal input feeding trajectories in order to improve the process. The typical input in this process is the substrate inflow rate time profile.

0-7803-9487-9/06/$20.00 © 2006 IEEE. 2006 IEEE Congress on Evolutionary Computation, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006.

For the proper optimization of the process, a white box mathematical model is typically developed, based on differential equations that represent the mass balances of the relevant state variables.

A. Case study I

In previous work by the authors, a fed-batch recombinant Escherichia coli fermentation process was optimized by EAs [17][18]. This was considered as the first case study in this work and will be briefly described next. During the aerobic growth of the bacterium, with glucose as the only added substrate, the microorganism can follow three main different metabolic pathways:

- Oxidative growth on glucose:
  k1 S + k5 O --µ1--> X + k8 C    (1)

- Fermentative growth on glucose:
  k2 S + k6 O --µ2--> X + k9 C + k3 A    (2)

- Oxidative growth on acetic acid:
  k4 A + k7 O --µ3--> X + k10 C    (3)

where S, O, X, C, A represent glucose, dissolved oxygen, biomass, dissolved carbon dioxide and acetate components, respectively. In the sequel, the same symbols are used to represent the state variables' concentrations (in g/kg); µ1 to µ3 are time-variant specific growth rates that nonlinearly depend on the state variables, and the ki are constant yield coefficients. The associated dynamical model, written as the mass balances corresponding to pathways (1)-(3), is:

dX/dt = (µ1 + µ2 + µ3) X - D X    (4)
dS/dt = (-k1 µ1 - k2 µ2) X + F_in,S S_in / W - D S    (5)
dA/dt = (k3 µ2 - k4 µ3) X - D A    (6)
dO/dt = (-k5 µ1 - k6 µ2 - k7 µ3) X + OTR - D O    (7)
dC/dt = (k8 µ1 + k9 µ2 + k10 µ3) X - CTR - D C    (8)
dW/dt = F_in,S    (9)

where W is the fermentation weight and D the dilution rate. The performance index (PI) to maximize is the final biomass productivity:

PI = X(Tf) W(Tf) / Tf    (10)

The relevant state variables are initialized with the following values: X(0) = 5, S(0) = 0, A(0) = 0, W(0) = 3. Due to limitations in the feeding pump capacity, the value of F_in,S(t) must be in the range [0.0; 0.4]. Furthermore, the following constraint is defined over the value of W: W(t) ≤ 5. The final time (Tf) is set to 25 hours.

B. Case study II

This system is a fed-batch bioreactor for the production of ethanol by Saccharomyces cerevisiae, firstly studied by Chen and Hwang [5]. The aim is to find the substrate feed rate profile that maximizes the yield of ethanol. The model equations are the following:

dx1/dt = g1 x1 - u x1/x4    (11)
dx2/dt = -10 g1 x1 + u (150 - x2)/x4    (12)
dx3/dt = g2 x1 - u x3/x4    (13)
dx4/dt = u    (14)
g1 = (0.408 x2) / ((1 + x3/16)(0.22 + x2))    (15)
g2 = x2 / ((1 + x3/71.5)(0.44 + x2))

where x1, x2 and x3 are the cell mass, substrate and ethanol concentrations, and x4 is the broth volume. The PI to maximize is the final amount of ethanol, PI = x3(Tf) x4(Tf), with the final time set to Tf = 54 (hours).

C. Case study III

This case considers the optimal production of a secreted protein in a fed-batch reactor, as formulated by Park and Ramirez [15]. The model equations are:

dx1/dt = g1 (x2 - x1) - u x1/x5    (16)
dx2/dt = g2 x3 - u x2/x5    (17)
dx3/dt = g3 x3 - u x3/x5    (18)
dx4/dt = -7.3 g3 x3 + u (20 - x4)/x5    (19)
dx5/dt = u    (20)
g1 = 4.75 g3 / (0.12 + g3)    (21)
g2 = x4 e^(-5 x4) / (0.1 + x4)    (22)
g3 = 21.87 x4 / ((x4 + 0.4)(x4 + 62.5))    (23)

where x1 and x2 are the secreted and total protein levels, x3 the culture cell density, x4 the substrate concentration and x5 the holdup volume. The aim of the optimization is to find the feeding profile (u) that maximizes the following PI:

PI = x1(Tf) x5(Tf)    (24)

The final time is set to Tf = 15 (hours) and the initial values for the relevant state variables are the following: x1(0) = 0, x2(0) = 0, x3(0) = 1, x4(0) = 5 and x5(0) = 1. The feed rate is constrained to the range u(t) ∈ [0.0; 3.0].

III. THE ALGORITHMS

The optimization task is to find the feeding trajectory, represented as an array of real-valued variables, that yields the best performance index. Each variable will encode the amount of substrate to be introduced into the bioreactor in a given time unit, and the
solution will be given by the temporal sequence of such values. In this case, the size of the genome is determined from the final time of the process (Tf) and the discretization step (d) considered in the numerical simulation of the model, and is given by the expression:

Tf / (d · I) + 1    (25)

where I stands for the number of points within each interpolation interval. The value of d used in the experiments was d = 0.005 for case studies I, II and III.

The evaluation process, for each individual in the population, is achieved by running a numerical simulation of the defined model, given as input the feeding values in the genome. The numerical simulation is performed using ODEToJava, a package of ordinary differential equation solvers, using a linearly implicit implicit/explicit (IMEX) Runge-Kutta scheme suited for stiff problems [2]. The fitness value is then calculated from the final values of the state variables, according to the PI defined for each case.

A. Particle Swarm Optimization

A particle swarm optimizer uses a population of particles that evolve over time by flying through space. The particles imitate their most successful neighbors by modifying their velocity component to follow the direction of the most successful position of their neighbors. Each particle is defined by:

P(i)_t = (x_t, v_t, p_t, e_t)

where x_t ∈ R^d is the current position in the search space; p_t ∈ R^d is the position visited by the particle in the past that had the best function evaluation; v_t ∈ R^d is a vector that represents the direction in which the particle is moving, called the 'velocity'; and e_t is the evaluation of p_t under the function being optimized, i.e., e_t = f(p_t).

Particles are connected to others in the population via a predefined topology. This can be represented by the adjacency matrix of a directed graph M = (m_ij), where m_ij = 1 if there is an edge from particle i to particle j and m_ij = 0 otherwise.

At each iteration, a new population is produced by allowing each particle to move stochastically toward its previous best position and,
at the same time, toward the best of the previous best positions of all other particles to which it is connected. The following is an outline of a generic PSO:

1) Set the iteration counter, t = 0.
2) Initialize each x(i) and v(i) randomly. Set p(i) = x(i)_0.
3) Evaluate each particle and set e(i) = f(p(i)_0).
4) Let t = t + 1 and generate a new population, where each particle i is moved to a new position in the search space according to a velocity update rule, v(i)_t = velocity_update(v(i)_{t-1}), in which each neighbor j ∈ N(i) contributes an attraction term weighted by a random coefficient r drawn in proportion to the acceleration constants c1 and c2.

(i.e., small perturbations will be preferred over larger ones), where [min_i; max_i] is the range of values allowed for gene i. In both cases, an innovation is introduced: the mutation operators are applied to a variable number of genes (a value that is randomly set between 1 and 10 in each application).

The p-value codes used in the statistical comparison tables are: 3: p < 0.001; 2: 0.001 ≤ p < 0.01; 1: 0.01 ≤ p < 0.05; N: p ≥ 0.05.

Algorithm   PI 40k NFEs       PI 100k NFEs      PI 200k NFEs
CanPso      2.5154 ± 0.7123   2.5563 ± 0.7091   2.5641 ± 0.7168
DE          9.3693 ± 0.0570   9.4738 ± 0.0052   9.4770 ± 0.0028
DEBest      2.7077 ± 0.1921   2.7419 ± 0.2115   2.7936 ± 0.2176
DETourn     9.1044 ± 0.1983   9.2913 ± 0.1240   9.3596 ± 0.1114
EA          7.9371 ± 0.1355   8.5161 ± 0.0883   8.8121 ± 0.0673
Fips        9.1804 ± 0.1642   9.4280 ± 0.0551   9.4528 ± 0.0538

Algorithm   PI 40k NFEs       PI 100k NFEs      PI 200k NFEs
CanPso      7.1461 ± 1.1152   7.1461 ± 1.1152   7.1461 ± 1.1152
DE          9.4351 ± 0.0000   9.4351 ± 0.0000   9.4351 ± 0.0000
DEBest      7.6932 ± 0.8321   7.6937 ± 0.8321   7.6937 ± 0.8321
DETourn     9.4099 ± 0.0551   9.4099 ± 0.0551   9.4099 ± 0.0551
EA          8.7647 ± 0.1441   9.0137 ± 0.1421   9.1324 ± 0.1320
Fips        9.4351 ± 0.0000   9.4351 ± 0.0000   9.4351 ± 0.0000

            CanPso   DE      DEBest   DETourn   EA
DE          N-N-N            3-3-1
DETourn     3-3-1    3-3-2   3-3-1              3-3-1
Fips

Algorithm   PI 40k NFEs        PI 100k NFEs       PI 200k NFEs
CanPso      19385.2 ± 284.3    19386.4 ± 284.3    19406.8 ± 272.5
DE          20379.4 ± 11.6     20397.2 ± 13.9     20406.9 ± 14.5
DEBest      19418.1 ± 290.0    19421.0 ± 290.4    19430.5 ± 293.5
DETourn     20362.7 ± 52.4     20380.4 ± 42.7     20394.3 ± 32.8
EA          20151.8 ± 69.7     20335.1 ± 54.1     20394.7 ± 23.1
Fips        19818.0 ± 160.7    19818.9 ± 161.1    19818.9 ± 161.1

TABLE IV. Results for case II for I = 100 (109 variables), I = 200 (55 variables) and I = 540 (21 variables), respectively.

Table V shows the comparison of the algorithms. As can be seen, CanPso continues to be the worst contender, but DEBest is not a very bad choice when the number of variables is small. EA is still beaten by DE and DETourn in most cases. Figure 2 presents the convergence curve of the algorithms: DE and DETourn converge fast (around 40,000 NFEs); Fips gets stuck in a plateau that is higher than the one of DEBest and CanPso; EA converges slowly but is steadily improving. It seems that, given enough time, EA finds similar solutions to either DE or DETourn.

3-3-1   DEBest   3-3-1   N-N-N   3-3-N   EA   N-N-N   3-3-N   N-N-N   3-3-N   3-3-N

TABLE V. Pairwise t-test with the Holm p-value adjustment for the algorithms of case II. The p-value codes correspond to I = 100, I = 200 and I = 540, respectively.

Fig. 2. Convergence of the algorithms for case II for I = 200.

E. Results for case III

Table VI presents the results of the algorithms on case III. This case seems to be quite simple and most algorithms find similar results. DE, Fips and EA are the best algorithms in this problem because of their reliability: they have narrow confidence intervals. DETourn seems to be a little less reliable, but its confidence intervals are still small enough. Table VII shows the comparison of the algorithms for this problem. In this case, most algorithms are not statistically different. This is also the case when we turn to the reliability of the algorithms to draw conclusions. As we stated before, most algorithms find similar solutions, which indicates that this case is probably not a good benchmark to compare algorithms. Figure 3 presents the convergence curve of the algorithms for I = 100. In this case DE, DETourn and Fips converge very fast; EA has a slower convergence rate; CanPso and DEBest get stuck in local optima.

V. CONCLUSIONS AND FURTHER WORK

This paper compares canonical particle swarm (CanPso), fully informed particle swarm (Fips), a real-valued EA (EA) and three schemes of differential evolution (DE, DEBest and DETourn) in three test cases of optimizing the feeding trajectory in fed-batch fermentation. Each of
these problems was tackled with different numbers of points (i.e., different values for I) to interpolate the feeding trajectory. This is a trade-off: the more variables we have, the more precise the curve is, but the harder it is to optimize.

Algorithm   PI 40k NFEs       PI 100k NFEs      PI 200k NFEs
CanPso      27.069 ± 1.751    27.370 ± 1.836    27.579 ± 1.681
DE          32.641 ± 0.029    32.674 ± 0.002    32.680 ± 0.001
DEBest      30.774 ± 1.004    30.775 ± 1.004    30.775 ± 1.004
DETourn     32.624 ± 0.057    32.629 ± 0.056    32.631 ± 0.056
EA          32.526 ± 0.025    32.633 ± 0.013    32.670 ± 0.008
Fips        32.625 ± 0.100    32.629 ± 0.099    32.630 ± 0.099

Algorithm   PI 40k NFEs       PI 100k NFEs      PI 200k NFEs
CanPso      31.914 ± 0.662    31.914 ± 0.662    31.914 ± 0.662
DE          32.444 ± 0.000    32.444 ± 0.000    32.444 ± 0.000
DEBest      31.913 ± 0.700    31.914 ± 0.700    31.914 ± 0.700
DETourn     32.441 ± 0.005    32.441 ± 0.005    32.441 ± 0.005
EA          32.413 ± 0.012    32.439 ± 0.003    32.443 ± 0.001
Fips        32.444 ± 0.000    32.444 ± 0.000    32.444 ± 0.000

            CanPso   DE      DEBest   DETourn   EA
DE          1-N-N            1-N-N
DETourn     2-N-N    N-3-N   1-N-N    N-3-N
Fips

Fig. 3. Convergence of the algorithms for case III when I = 100.

to choose DE instead. Previous work by the authors [19] developed a new representation in EAs in order to allow the optimization of a time trajectory with automatic interpolation. It would be interesting to develop a similar approach within DE or Fips. Another area of future research is the consideration of on-line adaptation, where the model of the process is updated during the fermentation. In this case, the good computational performance of DE is a benefit if there is the need to re-optimize the feeding given a new model, while values for the state variables are measured on-line.

ACKNOWLEDGMENTS

This work was supported in part by the Portuguese Foundation for Science and Technology under project POSC/EIA/59899/2004. The authors wish to thank Project SeARCH (Services and Advanced Research Computing with HTC/HPC clusters), funded by FCT under contract CONC-REEQ/443/2001, for the computational resources made available.

REFERENCES

[1] P. Angelov and R. Guthke. A Genetic-Algorithm-based Approach to Optimization of Bioprocesses Described by Fuzzy Rules. Bioprocess Engineering, 16:299-303, 1997.
[2] U. Ascher, S. Ruuth, and R. Spiteri. Implicit-explicit Runge-Kutta methods for time-dependent partial differential equations. Applied Numerical Mathematics, 25:151-167, 1997.
[3] J. R. Banga, C. Moles, and A. Alonso. Global Optimization of Bioprocesses using Stochastic and Hybrid Methods. In C. A. Floudas and P. M. Pardalos, editors, Frontiers in Global Optimization - Nonconvex Optimization and its Applications, volume 74, pages 45-70. Kluwer Academic Publishers, 2003.
[4] A. E. Bryson and Y. C. Ho. Applied Optimal Control - Optimization, Estimation and Control. Hemisphere Publication Company, New York, 1975.
[5] C. T. Chen and C. Hwang. Optimal Control Computation for Differential-algebraic Process Systems with General Constraints. Chemical Engineering Communications, 97:9-26, 1990.
[6] J. P. Chiou and F. S. Wang. Hybrid Method of Evolutionary Algorithms for Static and Dynamic Optimization Problems with Application to a Fed-batch Fermentation Process. Computers & Chemical Engineering, 23:1277-1291, 1999.
[7] M. Clerc and J. Kennedy. The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6(1):58-73, 2002.
[8] G. E. P. Box, W. G. Hunter, and J. S. Hunter. Statistics for Experimenters: An Introduction to Design and Data Analysis. John Wiley, NY, 1978.
[9] C. H. Goulden. Methods of Statistical Analysis, 2nd ed. John Wiley & Sons Ltd., 1956.
[10] S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65-70, 1979.
[11] J. Kennedy and R. Mendes. Topological structure and particle swarm performance. In D. B. Fogel, X. Yao, G. Greenwood, H. Iba, P. Marrow, and M. Shackleton, editors, Proceedings of the Fourth Congress on Evolutionary Computation (CEC-2002), Honolulu, Hawaii, May 2002. IEEE Computer Society.
[12] R. Mendes, J. Kennedy, and J. Neves. The fully informed particle swarm: Simpler, maybe better. IEEE Transactions on Evolutionary Computation, 8(3):204-210, 2004.
[13] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, USA, third edition, 1996.
[14] H. Moriyama and K. Shimizu. On-line Optimization of Culture Temperature for Ethanol Fermentation Using a Genetic Algorithm. Journal of Chemical Technology and Biotechnology, 66:217-222, 1996.
[15] S. Park and W. F. Ramirez. Optimal Production of Secreted Protein in Fed-batch Reactors. AIChE Journal, 34(9):1550-1558, 1988.
[16] I. Rocha. Model-based strategies for computer-aided operation of recombinant E. coli fermentation. PhD thesis, Universidade do Minho, 2003.
[17] I. Rocha and E. C. Ferreira. On-line Simultaneous Monitoring of Glucose and Acetate with FIA During High Cell Density Fermentation of Recombinant E. coli. Analytica Chimica Acta, 462(2):293-304, 2002.
[18] M. Rocha, J. Neves, I. Rocha, and E. Ferreira. Evolutionary algorithms for optimal control in fed-batch fermentation processes. In G. Raidl et al., editors, Proceedings of the Workshop on Evolutionary Bioinformatics - EvoWorkshops 2004, LNCS 3005, pages 84-93. Springer, 2004.
[19] M. Rocha, I. Rocha, and E. Ferreira. A new representation in evolutionary algorithms for the optimization of bioprocesses. In Proceedings of the IEEE Congress on Evolutionary Computation, pages 484-490. IEEE Press, 2005.
[20] J. A. Roubos, G. van Straten, and A. J. van Boxtel. An Evolutionary Strategy for Fed-batch Bioreactor Optimization: Concepts and Performance. Journal of Biotechnology, 67:173-187, 1999.
[21] R. Storn. On the usage of differential evolution for function optimization. In 1996 Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS 1996), pages 519-523. IEEE, 1996.
[22] R. Storn and K. Price. Minimizing the real functions of the ICEC'96 contest by differential evolution. In IEEE International Conference on Evolutionary Computation, pages 842-844. IEEE, May 1996.
[23] A. Tholudur and W. F. Ramirez. Optimization of Fed-batch Bioreactors Using Neural Network Parameters. Biotechnology Progress, 12:302-309, 1996.
[24] V. van Breusegem and G. Bastin. Optimal Control of Biomass Growth in a Mixed Culture. Biotechnology and Bioengineering, 35:349-355, 1990.
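DE and its variants converge fastest in the paper's comparison, so a compact DE/rand/1/bin scheme makes a natural companion sketch for the trajectory-optimization task described above. Everything here is illustrative: `simulate` is a deliberately crude fed-batch surrogate (Monod-like uptake, dilution by feeding), not any of the three case-study models, and the population size, F, CR and iteration budget are arbitrary choices; only the feed bound [0.0; 0.4] is borrowed from case study I.

```python
import random

def simulate(feed):
    # toy fed-batch surrogate (NOT one of the paper's models): biomass X
    # grows on substrate S, while feeding slowly increases broth weight W;
    # returns a performance index (biomass per unit broth weight)
    X, S, W = 1.0, 0.0, 1.0
    for u in feed:                 # u = substrate fed in one time unit
        S += u
        growth = S / (0.5 + S)     # Monod-like uptake rate
        X += growth * X * 0.1
        S = max(0.0, S - growth * X * 0.05)
        W += u * 0.01
    return X / W

def de_optimize(n_vars=20, pop_size=30, iters=200, F=0.5, CR=0.9, seed=0):
    # DE/rand/1/bin with per-gene clamping to the pump bounds
    rng = random.Random(seed)
    lo, hi = 0.0, 0.4              # pump capacity bounds from case study I
    pop = [[rng.uniform(lo, hi) for _ in range(n_vars)]
           for _ in range(pop_size)]
    fit = [simulate(ind) for ind in pop]
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(n_vars)  # guarantees one mutated gene
            trial = [
                min(hi, max(lo, pop[a][j] + F * (pop[b][j] - pop[c][j])))
                if (rng.random() < CR or j == jrand) else pop[i][j]
                for j in range(n_vars)
            ]
            f = simulate(trial)
            if f >= fit[i]:        # maximization: keep the better vector
                pop[i], fit[i] = trial, f
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

A real experiment would replace `simulate` with a stiff ODE integration of the chosen model (as the paper does with ODEToJava's IMEX Runge-Kutta solver) and budget the run in function evaluations rather than iterations.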
A Comparison of the Cultural Connotation of English and Chinese Words

Abstract: The relationship between language and culture is very close: they are interdependent and cannot be separated. Some scholars think that the relationship between language and culture is one of part and whole, the language of a society being only one aspect of its culture. But more and more scholars hold that language reflects culture and culture is reflected in language; the relationship between them is complex and permeable, with culture permeating language deeply. Against different linguistic and cultural backgrounds, each word has its denotation and a rich cultural connotation; that is to say, each word has its supplementary associative meaning, comparative meaning, symbolic meaning, commendatory or derogatory sense, and so on. The close relationship between language and culture is most readily seen in words. The system of meaning shaped in a particular cultural context is unique: each language represents a system of meaning that is different from any other such system, and represents it in ways particular to its own. This is not to deny the similarities between systems of meaning, or between the ways in which these systems are represented; but as a whole, each language represents a unique system of meaning in unique ways. So in cross-cultural studies of language, an investigation of words and meaning is indispensable.

Key words: words, cultural connotation, comparison
IDENTIFICATION OF GENOMIC SIGNATURES FOR THE DESIGN OF ASSAYS FOR THE DETECTION AND MONITORING OF ANTHRAX THREATS

SORIN DRAGHICI 1,†, PURVESH KHATRI 1,2,†, YANHONG LIU 4, KITTY J. CHASE 3, ELIZABETH A. BODE 3, DAVID A. KULESH 3, LEONARD P. WASIELOSKI 3, DAVID A. NORWOOD 3, JAQUES REIFMAN 2

1 Dept. of Computer Science, Wayne State University, Detroit, MI 48202
2 Bioinformatics Cell, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Ft. Detrick, MD 21701
3 Diagnostic Systems Division, US Army Medical Research Institute of Infectious Diseases, Ft. Detrick, MD 21701
4 US Dept. of Agriculture, Agricultural Research Service, Eastern Regional Research Center, Wyndmoor, PA 19038

Sequences that are present in a given species or strain while absent from, or different in, any other organisms can be used to distinguish the target organism from other related or unrelated species. Such DNA signatures are particularly important for the identification of the genetic source of drug resistance of a strain, or for the detection of organisms that can be used as biological agents in warfare or terrorism. Most approaches used to find DNA signatures are laboratory based, require a great deal of effort, and can only distinguish between two organisms at a time. We propose a more efficient and cost-effective bioinformatics approach that allows identification of genomic fingerprints for a target organism. We validated our approach with a custom microarray, using sequences identified as DNA fingerprints of Bacillus anthracis. Hybridization results showed that the sequences found using our algorithm were truly unique to B. anthracis and were able to distinguish B. anthracis from its close relatives B. cereus and B. thuringiensis.

1. Introduction

The area of organism identification using DNA sequences has many applications in various life science areas. However, there are also many challenges. For instance, sheep pox and goat pox viruses are so closely related that they cannot be distinguished using clinical
signs, pathogenesis, or sero-reactivity [30].
(† These authors should be considered joint first authors.)
Furthermore, both cross-infectivity and cross-resistance have been reported [38], to the point that the two were thought to be caused by a single viral species. However, genetic analysis demonstrated that sheep pox and goat pox are actually caused by two related, but genetically distinct, viruses. Furthermore, the identification of a few base-pair differences in the sequence coding for the P32 protein allowed the design of a polymerase chain reaction (PCR) restriction fragment length polymorphism (PCR-RFLP) assay able to distinguish between the two species. This assay involves a PCR amplification with a common primer, followed by a digestion with a Hinf I restriction enzyme that produces fragments of different sizes, allowing the identification of the two species.

The issue of distinguishing between different species is somewhat academic if the two species exhibit both cross-infectivity and, most importantly, allow passive cross-protection, as sheep pox and goat pox do [37]. However, this is not always the case. Genes that are present in certain isolates of a given bacterial species and are substantially different or absent from others can determine important strain-specific traits such as drug resistance [13] and virulence [51]. As an example, B. anthracis, B. cereus, and B.
thuringiensis are genetically so close that it has been proposed to consider them a single species [27]. At the same time, these bacteria are very different on a phenotypic level. B. cereus is a frequent food contaminant but only a mild opportunistic human pathogen [16, 28]; B. thuringiensis is actually a useful bacterium, being used as a pesticide [46]; while B. anthracis is a virulent pathogen for mammals that has been used as a bio-terror and biological warfare agent [12, 53]. In such cases, the identification of an organism-specific DNA sequence gains an increased importance.

Even if such sequences are not functionally active, they can still be extremely useful if used as genetic fingerprints. DNA sequences that are present in a given species while absent from any other organism can be used to distinguish the target organism from other related or unrelated species. If such genetic fingerprints were available for organisms that can potentially be used as biological or terrorist weapons, the task of rapid threat identification, characterization, and selection of appropriate medical countermeasures could be immensely facilitated. Genetic fingerprints can also aid identification of the genetic source of a strain's drug resistance [17], which can be useful to drug developers in pharmacogenomics.

2. Existing work

The existing work in the area of organism identification using DNA signatures can be divided into two categories. One approach uses a laboratory assay to identify the organism. Techniques used include amplified fragment length polymorphism (AFLP) [44, 45], suppression subtractive hybridization (SSH) [3], and custom DNA microarrays [36]. A second approach uses a purely bioinformatics analysis of the characteristics of the genomes of various species and extracts those features that are characteristic of individual species.

The laboratory-based approach does not necessarily require information about the entire genomes involved and is better suited for the development of assays for monitoring and
identification of biological threats. For instance, SSH, a PCR-based DNA subtraction method, allows identification of genomic sequence differences in a "tester" DNA relative to a "driver" DNA. AFLP relies on the analysis of a fluorescence-based signal proportional to the size of various DNA fragments [49]. SSH and AFLP have been successfully used to identify genomic sequence differences between various strains or species of bacteria [4, 5, 10, 31, 44]. The major drawback of this approach is that it permits identification of genomic differences only between two organisms. For instance, in order to differentiate two species, one needs to use an SSH assay to compare each strain of one species with each strain of the other species [44]. Clearly, this approach cannot be used to provide a genomic signature that would differentiate a given organism from all others.

The in silico approach to identifying genomic signatures is usually based on an analysis of the entire genomes involved and aims at extracting features such as species-specific codon usage [1, 2, 23, 32-34, 52]. While this type of genomic signature can be informative about the given organisms and the relationships among them, it may not be directly usable for detection and monitoring purposes.

Comparative sequence analysis has also been useful in detecting intronic and intergenic regions [25, 40], as well as in uncovering novel repeated structures [18, 26]. Several genome-scale alignment tools are available: MUMmer [14, 15, 39], AVID [11], MGA [29], WABA [35], and GLASS [7], among others. TaxPlot [22] provides a visual representation of protein homologs in microbial and eukaryotic genomes. Most of these pair-wise alignment tools (MGA is a multiple alignment tool, but the alignment is still computed pair-wise) assume that the input genomes are closely related. Therefore, there will be a mapping of large subsequences between the two input genomes. In turn, they assume that these large subsequences, appearing in the same order in the closely related genomes, are very likely to be part of the final
alignment. These regions are used as anchors for the alignment of the input genomes.

In general, anchor-based genome alignment programs first create a suffix tree from the two input genomes. A suffix tree is a compact representation of all suffixes in the input string [41, 54]. A suffix of a string is a substring starting at any position in the string and extending to the end of the string. Next, the suffix tree is searched for sequences that appear in both input genomes. These exactly matching subsequences are known as maximal exact matches (MEMs). The anchors are chosen from these MEMs. Different programs apply different criteria for the selection of anchors. For instance, MUMmer uses the longest increasing subsequence (LIS) [24] for the selection of anchors [14]. MUMmer allows the selection of overlapping anchors, whereas AVID and MGA select only non-overlapping anchors. Since MGA allows alignment of more than two genomes, it selects only MEMs that are present in all of the input genomes. AVID first finds the length of the longest MEM and discards all MEMs that are less than half the length of the longest MEM. After selecting the anchors, MUMmer employs a variant of the Smith-Waterman algorithm [47] to close the gaps between the anchors. MGA and AVID close the gaps by recursively creating suffix trees for the non-anchored parts of the input genomes, gradually reducing the gap sizes. Once the gaps are smaller than a threshold, MGA and AVID close them using the ClustalW [48] and Needleman-Wunsch [42] algorithms, respectively.

This large number of tools is all geared towards finding large-scale similarities between two or more genomes. Our focus here is different. While these algorithms were developed to find sequence similarities, our goal is to find sequence dissimilarities. These two problems are related but not reciprocal. Simply put, one cannot just take the complement of the sequences found in a similarity search and use them as genomic signatures.
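To make the notion of a maximal exact match concrete, a seed-and-extend version can be sketched in a few lines of Python. This is an illustrative sketch only, not any tool's implementation: real programs such as MUMmer find MEMs via a suffix tree in essentially linear time, whereas the k-mer index below is quadratic in the worst case; the function name and the `min_len` parameter are our own.

```python
from collections import defaultdict

def maximal_exact_matches(a, b, min_len):
    """Return (start_a, start_b, length) triples for exact matches of at
    least min_len characters that cannot be extended on either side."""
    k = min_len
    index = defaultdict(list)          # seed index: k-mer -> positions in a
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seen = set()                       # (i, j) seed pairs already inside a MEM
    mems = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            if (i, j) in seen:
                continue
            # extend left while the characters before the seed agree
            li, lj = i, j
            while li > 0 and lj > 0 and a[li - 1] == b[lj - 1]:
                li, lj = li - 1, lj - 1
            # extend right while the characters after the seed agree
            ri, rj = i + k, j + k
            while ri < len(a) and rj < len(b) and a[ri] == b[rj]:
                ri, rj = ri + 1, rj + 1
            length = ri - li
            # mark every seed position covered by this match as handled
            for off in range(length - k + 1):
                seen.add((li + off, lj + off))
            mems.append((li, lj, length))
    return sorted(set(mems))
```

For example, `maximal_exact_matches("GATTACA", "TTAC", 3)` returns the single maximal match `(2, 0, 4)`, i.e. the substring "TTAC". A genuine MUM, as used by MUMmer for anchoring, would additionally require the match to be unique in both genomes.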
The main reason is that a search aiming to find similarity will sometimes discard entire blocks after only a summary inspection because they are not sufficiently similar to the target sequence. A search aiming to find dissimilarities, i.e., unique signatures, on the other hand, has to focus on exactly those areas that are discarded without extensive analysis during the similarity search.

Here, we propose an algorithm for finding genomic fingerprints that distinguish an organism from all other organisms with known genomes. As the number of sequenced organisms increases, this approach has the potential to replace existing laboratory-based approaches such as AFLP and SSH. In this paper, we used this approach to find a genetic signature for B. anthracis. Identification of genomic regions unique to B. anthracis can provide clues to its genetic relationship to other highly similar organisms. Related work on the detection of B. anthracis used plasmid-encoded toxin genes for rapid DNA-based assays [8]. However, these failed to detect non-plasmid-containing strains of B. anthracis isolated from the environment [50]. There have also been efforts to design real-time PCR assays; however, these assays targeted only a single locus and yielded false-positive results with some strains of B. cereus [20, 43].

3. Analysis methods

Our goal is to find unique DNA subsequences for a given target genome across all available known genomes. An obvious approach is to compare (i.e., align) the genome of our interest against all available known genomes.
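As a toy illustration of extracting unique subsequences by comparison and screening, the two stages (mask everything shared with a near relative, then discard candidates that still hit another organism in a database search) might be written as below. This is a sketch under stated assumptions, not the authors' implementation: `candidate_signatures` and `unique_queries` are hypothetical names, the second function assumes standard tabular BLAST output (query and subject IDs in the first two columns, E-value in column 11), and the name-prefix test for "self" hits is a deliberate simplification of what a real analysis would do with taxonomy identifiers.

```python
def candidate_signatures(target, matched_intervals, min_len=50):
    """Stage 1: mask regions of the target that also occur in the close
    relative; keep the unmatched segments of at least min_len bases."""
    covered = [False] * len(target)
    for start, length in matched_intervals:
        for pos in range(start, min(start + length, len(target))):
            covered[pos] = True
    segments = []
    i = 0
    while i < len(target):
        if covered[i]:
            i += 1
            continue
        j = i
        while j < len(target) and not covered[j]:
            j += 1                      # walk to the end of the uncovered run
        if j - i >= min_len:
            segments.append((i, target[i:j]))
        i = j
    return segments

def unique_queries(blast_tab_lines, e_threshold=0.01, self_prefix="B_anthracis"):
    """Stage 2: keep only query sequences with no hit below e_threshold
    against a subject outside the target organism."""
    queries, disqualified = set(), set()
    for line in blast_tab_lines:
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        queries.add(query)
        if not subject.startswith(self_prefix) and evalue < e_threshold:
            disqualified.add(query)
    return sorted(queries - disqualified)
```

For instance, with the toy target "AAACCCGGGTT" and matched intervals (0, 3) and (6, 3), stage 1 with `min_len=3` yields the single candidate `(3, "CCC")`; the two-base leftover "TT" is discarded as too short to be worth searching.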
These alignments will reveal the parts of the target genome that do not align with any other genome (i.e., are unique to the target genome). However, this seemingly simple approach is computationally very expensive. The GenBank database at NCBI contains nucleotide sequences from more than 140,000 organisms [9]. The lengths of these genomes vary from a few thousand base pairs to a few billion base pairs. Aligning the input genome with each of these genomes is computationally unfeasible.

The amount of computation can be considerably reduced by using the phylogenetic background of the target. Today, biologists agree that various organisms have evolved from common ancestors. During evolution, functional genomic elements are conserved. Hence, two closely related genomes are expected to have many matching subsequences. If a subsequence that distinguishes the target from all organisms exists, this subsequence will also distinguish the target from its closest relative. Hence, a good initial set of potential genomic signatures can be obtained by comparing the target only with its closest relative and retaining only those sequences that are different. Subsequently, each of these potential signatures is compared with all other known genomes. This approach drastically reduces both the number of comparisons required and the length of the sequences to be compared (from a few million to a few thousand base pairs, at most).

In order to find the exact matching sequences between the target and its closest relative, we start by using their concatenated sequences to create a suffix tree. We then use a suffix tree search algorithm such as the one employed in MUMmer to find the exact matching sequences in both genomes. Since our goal is to determine a set of relatively short sequences to be used on a microarray-type assay, we have to search both the forward and the reverse strands. Any sequences that match between the two organisms are removed from further consideration. The result is a set of short segments of the
target genome that can be considered potential signatures. These are then compared with all sequences in the blast-nt database [21] from NCBI [6]. We consider a sequence unique to the target genome if it does not align to any sequence from any other organism with an expected value (E-value) less than a threshold of 0.01. Fig. 1 provides an overview of this approach.

Figure 1. The genomic fingerprinting approach. Two genomes are searched for matching subsequences (MEMs). The MEMs are removed from the target genome, and the remaining segments of the target genome (A1, A2, ..., An) are searched against the nt database. If the length of a segment is less than the user-specified length, it is discarded and not searched in the nt database. As shown, if a sequence does not align with any sequence from another organism with an E-value less than the specified threshold, it is considered a sequence unique to the target genome.

4. Results and discussion

In order to validate our approach, we designed a custom microarray using sequences identified as genomic fingerprints for B. anthracis. This array was then hybridized with B. anthracis and B. cereus.

In order to find a genomic signature for B. anthracis, we proceeded as follows. We searched the B. anthracis str. Ames genome (GenBank contig accession number NC_003997) for subsequences of 30 base pairs or more matching anywhere (direct and reverse strand) with sequences from the genome of B. cereus ATCC 14579 (GenBank contig accession number NC_004722). We chose the B. cereus ATCC 14579 genome as a closely related genome because it is considered to be a good representative of the B. cereus family [19]. Then, we removed all of the matching sequences from the B. anthracis genome. This step produced over 6,000 sequences of length 50 or more. These sequences were then searched against the nt database using blastn. The sequences in the BLAST output that were not found in any other organism with an E-value less than 0.01 were retrieved and considered part of the genomic fingerprints of B. anthracis. There were 140 such
sequences. Note that this analysis stage also removed sequences that matched the genomes of other close relatives of B. anthracis, such as B. thuringiensis, without ever directly comparing them. These 140 target sequences were provided to CombiMatrix (Mukilteo, WA) for the design of a custom microarray. CombiMatrix designed 2 probes for 80 target sequences and 1 probe for 22 target sequences (for a total of 182 probes for 102 target sequences), with melting temperatures in the range of 70°C to 75°C and a length of 35 base pairs or more. Probes of the required length and melting temperatures could not be identified for the remaining 38 target sequences. The microarray was designed with three replicates of each of the 182 probes.

The custom microarray was then hybridized with samples of B. anthracis and B. cereus. The hybridization results showed that 18 probes hybridized only to the B. anthracis sequences, indicating that they were true genomic fingerprints of B. anthracis. Table 1 provides the positions of the sequences on the B. anthracis genome that were found to be unique in the microarray experiment.

Surprisingly, many of the initial 182 probes also hybridized with B.
cereus. We further searched these cross-hybridizing probes against the blast-nt database. For the probes that hybridized to B. cereus, the results of this comparison showed that, although the target sequences of those probes are present only in B. anthracis, the part of the target sequence on which the probes were designed was not unique to B. anthracis and is present in other genomes. This shows that the probe design stage lost some specificity due to its unique added requirements: melting temperatures in a very narrow range, limited lengths, etc. In all cases, although the initial, longer sequence was unique across the blast-nt database, by selecting a shorter subsequence the probe became unspecific. Hence, another BLAST search is recommended before printing the assay, to check whether the subsequences selected as probes continue to be good signatures for the target organism.

Table 1. The following 18 probes identify 17 unique sequences of B. anthracis (Ames). The first and second columns indicate the start and end, respectively, of the target sequences from B. anthracis. The third and fourth columns are the start and end positions, respectively, on the corresponding target sequences for which probes were designed.

Sequence start  Sequence end  Probe start  Probe end
175,231         175,455       6            44
175,567         175,677       36           71
488,976         489,620       130          166
945,569         946,596       151          190
1,629,522       1,630,538     489          523
1,629,522       1,630,538     529          568
1,845,001       1,845,363     111          145
2,021,535       2,022,919     491          529
2,098,619       2,099,274     591          625
2,783,190       2,783,405     17           54
2,918,788       2,920,251     977          1013
3,037,856       3,038,113     115          152
3,524,649       3,524,731     17           55
3,808,069       3,809,046     797          834
3,821,617       3,822,163     449          483
4,374,364       4,375,478     227          311
4,375,581       4,376,123     149          186
4,933,405       4,933,482     9            43

5. Conclusion

DNA sequences that are present in a given species or strain while absent from any other organism can be used to distinguish the target organism from other related or unrelated species. The identification of such DNA signatures is particularly important for organisms that may potentially be used as biological warfare agents or
terrorism threats. Most approaches used to identify DNA signatures are laboratory based and require significant effort and time. A bioinformatics approach can provide results faster and more efficiently. However, most tools built for genome comparisons only allow alignment of two genomes at a time. Using this approach to find unique DNA signatures across all known organisms is unfeasible. In addition, all existing tools are limited to finding the similarity between two genomes. In contrast, looking for DNA signatures requires the development of tools that identify sequence dissimilarities. In this paper, we describe an approach to find the DNA fingerprints of an organism. We used this approach to find a set of unique sequences for B. anthracis, which were then used to design probes for a DNA microarray. The hybridization results revealed that a subset of these probes were truly unique to B. anthracis and were able to distinguish between B. anthracis and B. cereus, which is a close genetic relative.

Acknowledgements

This work was supported by the research area directorates of the US Army Medical Research and Materiel Command and the Defense Threat Reduction Agency. The first two authors are also supported by: NSF DBI-0234806, NIH 1S10 RR017857-01, MLSC MEDC-538 and MEDC GR-352, NIH 1R21 CA10074001, 1R21 EB00990-01, and 1R01 NS045207-01.
References

1. T. Abe, S. Kanaya, M. Kinouchi, Y. Ichiba, T. Kozuki, and T. Ikemura. A novel bioinformatic strategy for unveiling hidden genome signatures of eukaryotes: Self-organizing map of oligonucleotide frequency. Genome Informatics, (13):12-20, 2002.
2. T. Abe, S. Kanaya, M. Kinouchi, Y. Ichiba, T. Kozuki, and T. Ikemura. Informatics for unveiling hidden genome signatures. Genome Research, 13(4):693-702, 2003.
3. P. G. Agron, M. Macht, L. Radnedge, E. W. Skowronski, W. Miller, and G. L. Andersen. Use of subtractive hybridization for comprehensive surveys of prokaryotic genome differences. FEMS Microbiology Letters, 211(2):175-182, Jun 2002.
4. I. Ahmed, G. Manning, T. Wassenaar, S. Cawthraw, and D. Newell. Identification of genetic differences between two Campylobacter jejuni strains with different colonization potentials. Microbiology, 148:1203-1212, 2002.
5. N. Akopyants, A. Fradkov, L. Diatchenko, et al. PCR-based subtractive hybridization and differences in gene content among strains of Helicobacter pylori. Proc. Natl. Acad. Sci., 95:13108-13113, 1998.
6. S. Altschul, T. Madden, A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402, Sept 1997.
7. S. Batzoglu, L. Pachter, J. Mesirov, B. Berger, and E. Lander. Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Research, 10:950-958, 2000.
8. C. Bell, J. Uhl, T. Hadfield, J. David, R. Meyer, T. Smith, and F. Cockerill III. Detection of Bacillus anthracis DNA by LightCycler PCR. J. Clin. Microbiol., 40:2897-2902, 2002.
9. D. Benson, I. Karsch-Mizrachi, D. Lipman, J. Ostel, and D. Wheeler. GenBank: update. Nucleic Acids Research, 32(1):D23-D26, January 2004.
10. M. Bogush, T. Velikodvorskaya, Y. Lebedev, et al. Identification and localization of differences between Escherichia coli and Salmonella typhimurium genomes by suppressive subtractive hybridization. Mol Gen Genet, 262:721-729, 1999.
11. N. Bray, I. Dubchak, and L. Pachter. AVID: A global alignment program. Genome Research, 13(1):97-102, January 2003.
12.
E. Check. Bioshield defence programme set to fund anthrax vaccine. Nature, 429(6987):4, May 2004.
13. J. Davies. Inactivation of antibiotics and the dissemination of resistance genes. Science, 264(5157):375-382, Apr 1994.
14. A. Delcher, S. Kasif, R. Fleischmann, J. Peterson, O. White, and S. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27(11):2369-2376, 1999.
15. A. Delcher, A. Phillippy, J. Carlton, and S. Salzberg. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Research, 30(11):2478-2483, 2002.
16. F. Drobniewski. Bacillus cereus and related species. Clin. Microbiol. Rev., 6:324-338, 1993.
17. S. Drăghici and B. Potter. Predicting HIV drug resistance with neural networks. Bioinformatics, 19(1):98-107, January 2003.
18. I. Dunham, N. Shimuzu, B. Roe, S. Chissoe, et al. The DNA sequence of human chromosome 22. Nature, 402:489-495, 1999.
19. K. Dwyer, J. Lamonica, J. Schumacher, L. Williams, J. Bishara, A. Lewandowski, R. Redkar, G. Patra, and V. G. DelVecchio. Identification of Bacillus anthracis specific chromosomal sequences by suppressive subtractive hybridization. BMC Genomics, 5(1):15, Feb 2004.
20. H. Ellerbrok, H. Nattermann, M. Ozel, L. Beutin, B. Appel, and G. Pauli. Rapid and sensitive identification of pathogenic and apathogenic Bacillus anthracis by real-time PCR. FEMS Microbiol. Lett., 214:51-59, 2002.
21. National Center for Biotechnology Information. BLAST nucleotide database. ftp:///blast/db/.
22. National Center for Biotechnology Information. TaxPlot. /sutils/taxik2.cgi?
23. R. Grantham, C. Gautier, M. Gouy, R. Mercier, and A. Pave. Codon catalog usage and the genome hypothesis. Nucleic Acids Research, 8:r49-r62, 1980.
24. D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, 1997.
25. R. Hardison, J. Oeltjen, and W. Miller. Long human-mouse sequence alignments reveal regulatory elements: a reason to sequence the mouse genome. Genome Research, 7:959-966, 1997.
26. M. Hattori, A. Fujiyama, T. Taylor, H. Watanabe, et al. The DNA sequence of human chromosome 21. Nature, 405:311-319, 2000.
27.
E. Helgason, D. Caugant, I. Olsen, and A. Kolstø. Genetic structure of populations of Bacillus cereus and B. thuringiensis isolates associated with periodontitis and other human infections. J. Clin. Microbiol., 38:1615-1622, 2000.
28. E. Helgason, D. A. Caugant, I. Olsen, and A. B. Kolstø. Genetic structure of populations of Bacillus cereus and B. thuringiensis isolates associated with periodontitis and other human infections. Journal of Clinical Microbiology, 38(4):1615-1622, Apr 2000.
29. M. Hohl, S. Kurtz, and E. Ohlebusch. Efficient multiple genome alignment. Bioinformatics, 18(Suppl. 1):S312-S320, 2002.
30. M. Hosamani, B. Mondal, P. A. Tembhurne, S. K. Bandyopadhyay, R. K. Singh, and T. J. Rasool. Differentiation of sheep pox and goat pox viruses by sequence analysis and PCR-RFLP of P32 gene. Virus Genes, 29(1):73-80, Aug 2004.
31. B. Janke, U. Dobrindt, J. Hacker, and G. Blum-Oehler. A subtractive hybridization analysis of genomic differences between the uropathogenic E. coli strain 536 and the E. coli K-12 strain MG1655. FEMS Microbiol Lett., 199:61-66, 2001.
32. S. Kanaya, M. Kinouchi, T. Abe, Y. Kudo, Y. Yamada, T. Nishi, H. Mori, and T. Ikemura. Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): Characterization of horizontally transferred genes with emphasis on the E. coli O157 genome. Gene, 276:89-99, 2001.
33. S. Kanaya, Y. Kudo, T. Abe, T. Okazaki, D. Carlos, and T. Ikemura. Gene classification by self-organizing mapping of codon usage in bacteria with completely sequenced genomes. Genome Informatics, 9:369-371, 1998.
34. S. Kanaya, Y. Kudo, Y. Nakamura, and T. Ikemura. Detection of genes in Escherichia coli sequences determined by genome projects and prediction of protein production levels, based on multivariate diversity in codon usage. CABIOS, 12:213-225, 1996.
35. W. Kent and A. Zahler. Conservation, regulation, synteny and introns in large-scale C. briggsae-C. elegans genomic alignment. Genome Research, 10:1115-1125, 2000.
36. M. Kingsley, T. Straub, D. Call, D. Daly, S. Wunschel, and D. Chandler. Fingerprinting closely related Xanthomonas pathovars with random nonamer oligonucleotide
microarrays. Appl. Environ. Microbiol., 68:6361-6370, 2002.
37. R. P. Kitching. Passive protection of sheep against capripoxvirus. Res Vet Sci, 41(2):247-250, Sep 1986.
38. R. P. Kitching and W. P. Taylor. Clinical and antigenic relationship between isolates of sheep and goat pox viruses. Trop Anim Health Prod, 17(2):64-74, May 1985.
39. S. Kurtz, A. Phillippy, A. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S. Salzberg. Versatile and open software for comparing large genomes. Genome Biology, 5:R12, 2004.
40. G. Loots, R. Locksley, C. Blankespoor, Z. Wang, W. Miller, E. Rubin, and K. Frazer. Identification of a coordinate regulator of interleukins 4, 13 and 5 by cross-species sequence comparisons. Science, 288:136-140, 2000.
41. E. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2):262-272, 1976.
42. S. Needleman and C. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48(3):443-453, 1970.
43. Y. Qi, G. Patra, X. Liang, L. Williams, S. Rose, R. Redkar, and V. DelVecchio. Utilization of the rpoB gene as a specific chromosomal marker for real-time PCR detection of Bacillus anthracis. Appl. Environ. Microbiol., 67:3720-3727, 2001.
44. L. Radnedge, P. G. Agron, K. Hill, P. Jackson, L. Ticknor, P. Keim, and G. L. Andersen. Genome differences that distinguish Bacillus anthracis from Bacillus cereus and Bacillus thuringiensis. Applied and Environmental Microbiology, 69(5):2755-2764, May 2003.
45. L. Radnedge, S. Gamez-Chin, P. McCready, P. Worsham, and G. Andersen. Identification of nucleotide sequences for the specific and rapid detection of Yersinia pestis. Applied and Environmental Microbiology, 67(8):3759-3762, Aug 2001.
46.
E. Schnepf, N. Crickmore, J. Van Rie, D. Lereclus, J. Baum, J. Feitelson, D. R. Zeigler, and D. H. Dean. Bacillus thuringiensis and its pesticidal crystal proteins. Microbiology and Molecular Biology Reviews, 62(3):775-806, Sep 1998.
47. T. Smith and M. Waterman. Identification of common molecular subsequences. J. Molecular Biology, 147(1):195-197, 1981.
48. J. Thompson, D. Higgins, and T. Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680, 1994.
49. L. Ticknor, A. Kolstø, K. Hill, P. Keim, M. Laker, M. Tonks, and P. Jackson. Fluorescent amplified fragment length polymorphism analysis of Norwegian Bacillus cereus and Bacillus thuringiensis soil isolates. Appl Environ Microbiol., 67(10):4863-4873, 2001.
50. P. Turnbull, R. Hutson, M. Ward, M. Jones, C. Quinn, N. Finnie, C. Duggleby, J. Kramer, and J. Melling. Bacillus anthracis but not always anthrax. J. Appl. Bacteriol., 72:21-28, 1992.
51. M. K. Waldor and J. J. Mekalanos. Lysogenic conversion by a filamentous phage encoding cholera toxin. Science, 272(5270):1910-1914, Jun 1996.
52. H. Wang, J. Badger, P. Kearney, and M. Li. Analysis of codon usage patterns of bacterial genomes using the self-organizing map. Molecular Biology and Evolution, 18:792-800, 2001.
53. G. Webb. A silent bomb: the risk of anthrax as a weapon of mass destruction. Proceedings of the National Academy of Sciences USA, 100(7):4346-4351, 2003.
54. P. Weiner. Linear pattern matching algorithms. In Proc. 14th IEEE Symp. Switching & Automata Theory, pages 1-11, 1973.
Some Common Figures of Speech (1)

Group A: Figures of resemblance or relationship

1 Simile
It is a comparison between two distinctly different things, indicated by words such as "as" or "like."
(1) "Mama," Wangero said, sweet as a bird.

2 Metaphor
It is the use of a word which originally denotes one thing to refer to another with a similar quality. It is an implied comparison.
(1) He flew to marry a cheap city girl from a family of ignorant flashy people.
(2) Railroads began drying up the demand for steamboat pilots.

3 Personification
Personification treats a thing or an idea as if it were human or had human qualities.
(1) Youth is wild, and Age is tame.
(2) Dusk came stealthily.

4 Metonymy
Metonymy substitutes the name of one thing for that of another with which it is closely associated. When metonymy is well used, brevity and vividness may be achieved.
(1) Sword and cross in hand, the European conquerors fell upon the Americas.
(2) The instant riches of a mining strike would not be his in the reporting trade, but for making money, his pen would prove mightier than his pickax.
[In (1) and (2), you have the tools (sword or cross; pen or pickax) as a name for the activity in which they are used (military or religious control).]
(3) He is too fond of the bottle. [Here you have the container (bottle) as a name for the thing it contains.]
(4) She used to love reading Shakespeare. [Here you have the writer as a name for the thing he wrote.]
(5) He was driving a Ford at that time. [Here you have the trademark or brand as a name for the product of the brand.]
(6) The Washington Post, in an editorial captioned "Keep Your Old Webster's" [Here you have the brand as a name for the dictionary of the brand.]
(7) In the evening, she wears soft rich colours.
[Here you have the colors as a name for the clothes of the colors.]
More: the crown ~ a king; the White House ~ the American government; the bar ~ the legal profession

5 Synecdoche
It is a figure of speech by which a part is put for the whole, the whole for the part, the species for the genus, the genus for the species, or the name of the material for the thing made of it.
(1) He has two mouths to feed in his family. [The naming of a part for the whole]
(2) Italy beat Spain in the football match. [The naming of the whole for a part]
(3) Alas, the spring should vanish with the rose. [The naming of the species for the genus]
(4) He is a poor creature. [The naming of the genus for the species]
(5) Have you any copper? [The naming of the material for the thing made of it, i.e., coins]

6 Euphemism
It is the substitution of a mild or vague expression for a harsh or unpleasant one.
(1) I want my fill of beauty before I go.
More: senior citizens ~ old people; emotionally disturbed ~ mad; military action ~ invasion; sanitation worker ~ dustman

Group B: Figures of emphasis or understatement

1 Hyperbole/overstatement
In this figure, the diction exaggerates the subject and is not intended to be understood literally.
(1) On hearing that he had been admitted to that famous university, he whispered to himself, "I'm the luckiest man in the world."
(2) Most Americans remember Mark Twain as the father of Huck Finn's idyllic cruise through eternal boyhood and Tom Sawyer's endless summer of freedom and adventure.

2 Understatement
The diction plays down the magnitude or value of the subject. Both overstatement and understatement aim at making the statement or description impressive or interesting.
(1) It took a few dollars to build such an indoor swimming pool. [It means that actually it took a great deal.]
(2) This is not at all unpleasant.
[It means that it is quite pleasant.]

3 Antithesis
In this figure, two words, phrases, clauses, or sentences that are contrasted or opposed in meaning are juxtaposed in grammatically parallel structures. The effect is to give emphasis to the contrasting ideas.
(1) To err is human, to forgive divine.
(2) Does one like islands because one unconsciously appropriates them, a small manageable domain in a large unmanageable world?

4 Oxymoron
An oxymoron combines terms that are apparently contradictory to one another so as to produce a special effect.
(1) The coach had to be cruel to be kind to his trainees.
(2) She read that long-awaited letter with a tearful smile.

5 Rhetorical question
It is a question neither requiring nor intended to produce a reply, but asked for emphasis. The assumption is that only one answer is possible.
(1) Was I not at the scene of the crime? (Unit 2: Hiroshima)
(2) Is justice merely a word?

6 Irony
It is the use of words which are clearly opposite to what is meant, in order to achieve a special effect.
(1) "What fine weather for an outing!" [spoken when it is raining heavily]
(2) If a barbarous act is called civilized, irony is used.

Group C: Figures of sound

1 Alliteration
It is the repetition of the initial letter (generally a consonant) or first sound of several words, marking the stressed syllables in a line of poetry or prose.
It helps to create a mood or to unify a passage.
(1) The Russian danger is therefore our danger, and the danger of the United States, just as the cause of any Russian fighting for his hearth and home is the cause of free men and free peoples in every quarter of the globe.
(2) It was a splendid population, for all the slow, sleepy, sluggish-brained sloths stayed at home...

2 Onomatopoeia
An onomatopoeia uses words to imitate natural sounds.
(1) The bees are humming.
(2) A whizzing arrow is flying through the air.

Group D: Verbal games

1 Transferred epithet
An epithet is an adjective or descriptive phrase that serves to characterize someone or something. A transferred epithet, however, is one that is shifted from the noun it logically modifies to a word associated with that noun.
(1) I have had a busy day. [I was busy the whole day.]
(2) The old man put a reassuring hand on my shoulder. [The old man put a hand on my shoulder, reassuring me.]
Dear Dr. XXX,Thank you for arranging a timely review for our manuscript. We are pleased to know that our study is of general interest for the readers of NUTRITION. We have carefully evaluated the reviewers’ critical comments and thoughtful suggestions, r esponded to these suggestions point-by-point, and revised the manuscript accordingly. All changes made to the text are in red so that they may be easily identified. With regard to the reviewers’ comments and suggestions, we wish to reply as follows:Enclosures:(1)Correspondences to your reviewers;(2)One copy of the revised manuscript;(3)A floppy disk containing the revised manuscript.(4)Copyright assignmentTo reviewer#11.The author should add a few review articles on ghrelin for readers in theIntroduction.We added two reviews in our revised manuscript.2.The increase in ghrelin levels do not necessary indicate that weight loss in diseaseis well compensated (Introduction and Discussion). This may be interpreted to be insufficient to recover to the previous body weight.There is possibility that the increase in ghrelin levels may result from the insufficient to recover to the previous body weight, but it is more likely that the increase in ghrelin level indicate that weight loss in disease is well compensated.Shimizu et al1 reported that baseline plasma ghrelin level was significantly higher in cachectic patients with lung cancer than in noncachectic patients and control subjects. As weight loss is a chronic process and ghrelin levels may change more rapid than weight loss, the increase in ghrelin in those chronic diseases is unlikely result from the insufficient to recover to the previous body weight. 
Moreover, this author also reported that follow-up plasma ghrelin level increased in the presence of anorexia after chemotherapy, which further suggests that the increase ghrelin level may represent a compensatory mechanism under catabolic–anabolic imbalance in cachectic patients with lung cancer1.3.The authors should refer to the original report that IL-1b decrease plasma ghrelinlevels(Gastroentelorogy 120:337-345,2001)We referred this article as the reviewer suggested. In fact, this is a mistake of us. Many thanks for the reviewer’s suggestion.4.Ref. 13 dose not include data on ghrelin.We are so sorry to make this mistake for citing the Ref.13. We replaced the reference in the paper.5.There is no report that desacyl ghrelin stimulates food intake. It is the consensusat present acyl ghrelin is involved in feeding response to starvation. Therefore, the authors should be careful about their interpretation described in the last paragraph in page 10.We made it clear in the paper that ghrelin has two isoforms (“active”and “inactive”). Only the “active”isoform is involved in feeding response tostarvation. But the “inactive”isoform has other activities like anti-proliferative activity on tumor cell lines as described in the manuscript.To reviewer#2Major comments1.Earlier studies have shown that circulating ghrelin level is increased inunderweight patients with CHF, lung cancer, and liver cirrhosis. In the present study, however, plasma ghrelin level was decreased despite a significant weight loss in COPD. In addition, earlier studies have reported that circulating ghrelin correlated positively with BMI in patients with CHF and lung cancer. However, the present study demonstrated that plasma ghrelin level correlated positively with BMI in COPD patients. Thus, there are considerable discrepancies between the present study and earlier studies. These discrepancies should be discussed in detail. 
The author also stated the regulation of ghrelin secretion was disturbed in COPD patients. However, they did not clarify this mechanism.We stated that the role of ghrelin in patients with COPD may be different from its role in CHF, cancer and liver cirrhosis and discussed this difference in the last paragraph of page 9.Following the reviewer’s suggestion, we added that “plasma ghrelin correlated positively with percent predicted residual volume and residual volume/total lung capacity ratio”as the evidence for further supporting that respiratory abnormalities may take part in the regulation of plasma ghrelin levels.2.The authors demonstrated that plasma ghrelin level correlated negatively withplasma TND-a and CRP in COPD patients. However, Nagaya et al. have shown that plasma ghrelin level correlates positively with plasma TNF-a level in patients with CHF. This discrepancy should be discussed.According to the reviewer indicated, we discussed this discrepancy in the second paragraph of page 9.3.The author stated that respiratory abnormalities may take part in the regulation ofplasma ghrelin level in COPD. The authors should describle the relationship between plasma ghrelin level and pulmonary function in COPD.There are evidences that respiratory abnormalities may take part in the regulation of plasma ghrelin level in lung diseases with respiratory abnormalities2,3. As our study was designed to investigate whether the plasma ghrelin levels are increased or decreased in COPD and whether the plasma ghrelin levels relates to the increased systemic inflammation in those patients, so we didn’t analysis the relationship between plasma ghrelin level and pulmonary function.Minor comments1.Circulating ghrelin level exhibits a circadian rhythm. 
Therefore, the authorsshould describle the limitation of their measurement of ghrelin in single samples.It’s true that circulating ghrelin level exhibits a circadian rhythm and to monitor the ghrelin levels in different time points is better than just measured a single sample. However, we collected the samples at the fasting state (from 9:00 p.m. on the previous night.) by venipuncture at 7:00 a.m. as most studies did2,4. Soour results can exclude the possibility that the difference between groups was result from the circadian rhythm of ghrelin and are well compared with other studies.2.In the Results section, plasma ghrelin level in healthy controls was different withthat in 0.25+0.22ng/ml, whereas, in Figure 1A, it was approximately 1.8ng/ml.We fixed this in our revised manuscript. We are so sorry to make this mistake.To reviewer#31.About the paper of Itoh et al in AJRCC.As the reviewer said, the study by Itoh et al was not published when the current manuscript were submitted. We discussed the difference between the findings of their study and our study in revised manuscript.2.AbstractConclusion: “plasma ghrelin decreased in COPD”. This sounds like the authors have followed subjects for a long time and that the diagnosis COPD was conformed, the plasma ghrelin decreased. This was however not the aim nor the case-a reformulation is necessary.We fixed this as the reviewer suggested in our revised manuscript.3.Introduction(1)Page 2. Ref.1. is a letter to the editor in Br J Nutr and is a commentconcering an earlier published paper. It is not a reference that support the statement. Several other references exist in the literature to be used instead.Thanks for the reviewer’s suggestion. We replaced this reference by other one.(2)Page 2, line 5. “To understand weight loss mechanisms in this disease may behelpful to improve quality of life in these patients”. 
Do you really think that if we researchers understand the mechanisms that automatically would make the patients happier?We replaced this sentence with “To understand weight loss mechanisms in this disease may be helpful to combat weight loss in these patients”4.Methods(1)Patients: How were the patient and control subjects selected?The authors state that none of the control subjects was taking and medications-was that also the case for the patients?That was also the case for the patients. In fact, most of the COPD patients in China do not take any medications when the disease is clinically stable because of economic reason.Page 4, line 2. A short description of ATS criteria would be helpful for readers who are not familiar with those criteria.As those criteria are widely used by researcher and physicians, we did not describe them in our paper as some paper did. If you think it is necessary to do so, we may add a short description.Page4, line3, what do you mean by “other diseases”? COPD patients mostoften have a lot of other diseases.We are so sorry to mis-express this - we just means that those patients did not have the disease that known to affect the plasma ghrelin level. We fixed it in our revised manuscript.Page 4, line 5. If I understand it correctly, none of the COPD patients were smokers or ex-smokers, i.e. another reason exists for their COPD. Cigarette smoking is the main cause of COPD, but here you have studied patients having other reasons for the disease. What dose this mean regarding the representativity of the study group?Could it affect the results in some way?Smoking increases the plasma ghrelin level5. It is difficult for us to define “ex-smokers” because there is no study about that whether the ex-smoking will affect the plasma ghrelin level or not. This may lead to the representativity problem. However, those patients in our study still lost the weight and had system inflammation as most COPD patients did. 
Further study should be designed to investigate the effect of ex-smoking on plasma ghrelin level.Page 4, line 6.Why do the authors refer to Whatmore et al? That study investigated ghrelin in healthy adolescents and has nothing to do with factor known to affect serum ghrelin level.We are sorry to make this mistake. We replaced this reference.(2)Body compositionPage 4, last line – page 5, line1. The deuterium dilution study performed by Baarends et al was using arm – to – foot bioelectrical impedance spectroscopy. In the current manuscript the foot – to – foot bioelectrical impedance assessment is used. The readers are lead to believe that the foot – to – foot BIA is also validated with deuterium dilution in COPD patients, which I think is not the case.Thanks for the carefulness of the reviewer. However, there are still evidences that our method is well correlated with DEXA6and arm –to –foot bioelectrical impedance7, so it is appropriate to use this method in our study. However, because those sentences will lead to the confusion, we deleted them in revised manuscript according to suggestion of the reviewer.Page 5, line 4. The %fat was calculated by the machine. It should be stated on which material these calculations are based on – healthy subject? –young or old? – How many.According to the instruction of the manufactory, we selected the standard model for this calculation (the other model was athletic). We stated this in the revised manuscript.(3)StatisticalA reference by Scols et al is used to strengthen the use of values below the detection limit and the use of log. Other reasons need to be provided. What if Schols et al did a statistical error using values that were below the detection limit? There do exist statistical reasonsfor log the values –do they exist in this manuscript?It’s very important to select a suitable statistical method for process the data. There are 6 data below the detection limit in ghrelin and 1 data in leptin. 
Ifthese data were discarded, it may increase the possibility of type two error as lower ghrelin levels were exclude. However, if the data were analyzed originally, it may increase the possibility of type one error as they below the detection limit.So it is reasonable to adopt the method used by Schols et al.As to log transformation, we added the necessary information in the text according to the opinion of the reviewer.5.DiscussionPage 8. line 2-3. COPD patients had lower ghrelin levels compared to the control subjects. Did the control subjects have “normal” ghrelin values?We selected seventeen age-matched healthy males as control subjects.Those subjects were healthy. So we can take their ghrelin levels as “normal”ghrelin values. However, we think true “normal ghrelin values” should be based on large population study.Page9. line 18. Following “CHF, cancer and liver cirrhosis” a reference is needed here.We added references as the reviewer suggested.Page9. last line.ghrelin instead of gherlin.We fixed it.Page 11. Delete the summary, it is the same as the conclusion in the abstract.We wrote the summary according to the guideline for author of the journal. If you think the summary should be cut, we may delete it.6.ReferenceAs mentioned above, some of the references are not appropriate. They should be replaced by more appropriate and explanatory references.Many thanks for the reviewer’s suggestion. We replaced those references in the revised manuscript.References:1. Shimizu, Y., Nagaya, N., Isobe, T., et al. Increased plasma ghrelin level in lung cancer cachexia. Clin Cancer Res 2003; 9: 7742. Itoh, T., Nagaya, N., Yoshikawa, M., et al. Elevated Plasma Ghrelin Level in Underweight Patients with Chronic Obstructive Pulmonary Disease. Am J Respir Crit Care Med 2004;3. Haqq, A. M., Stadler, D. D., Jackson, R. H., et al. 
Effects of growth hormone on pulmonary function, sleep quality, behavior, cognition, growth velocity, body composition, and resting energy expenditure in Prader-Willi syndrome. J Clin Endocrinol Metab 2003; 88: 22064. Nagaya, N., Uematsu, M., Kojima, M., et al. Elevated circulating level of ghrelin in cachexia associated with chronic heart failure: relationships between ghrelin and anabolic/catabolic factors. Circulation 2001; 104: 20345. Fagerberg, B., Hulten, L. M.,Hulthe, J. Plasma ghrelin, body fat, insulin resistance, and smoking in clinically healthy men: the atherosclerosis and insulin resistance study. Metabolism 2003; 52: 14606. Tyrrell, V. J., Richards, G., Hofman, P., et al. Foot-to-foot bioelectrical impedance analysis: a valuable tool for the measurement of body composition in children. Int J Obes Relat Metab Disord2001; 25: 2737. Nunez, C., Gallagher, D., Visser, M., et al. Bioimpedance analysis: evaluation of leg-to-leg system based on pressure contact footpad electrodes. Med Sci Sports Exerc 1997; 29: 524一篇稿子从酝酿到成型历经艰辛,投出去之后又是漫长的等待,好容易收到编辑的回信,得到的往往又是审稿人不留情面的一顿狂批。
高中英语学术论文研究方法单选题40题1. In an academic paper, which of the following is NOT a common research method?A. Quantitative analysisB. Qualitative researchC. Hypothesis testingD. Random guessing答案:D。
本题主要考查对常见学术研究方法的理解。
选项A“Quantitative analysis”( 定量分析)、选项B“Qualitative research”( 定性研究)和选项C“Hypothesis testing” 假设检验)都是常见的研究方法。
而选项D“Random guessing”( 随机猜测)并非一种科学的研究方法。
2. When conducting research for an academic paper, which of the following is a classification of research methods based on data collection?A. Historical researchB. Experimental studyC. Descriptive analysisD. Documentary research答案:D。
本题考查研究方法基于数据收集的分类。
选项A“Historical research” 历史研究)侧重于对过去事件的研究;选项B“Experimental study”( 实验研究)是通过控制变量来探究因果关系;选项C“Descriptive analysis”( 描述性分析)是对现象的描述。
而选项D“Documentary research” 文献研究)是基于已有的文献资料进行收集和分析,属于基于数据收集的分类。
3. In an academic paper, which research method mainly focuses on understanding the meaning and experience of individuals?A. Empirical studyB. Grounded theoryC. Content analysisD. Case study答案:B。
对比法的英语作文In today's society, the use of comparison is a common method of analysis in various fields. Whether it be in literature, science, or everyday decision-making, comparing and contrasting different elements helps us to understand and evaluate their differences and similarities. This essay will explore the use of the comparative method in English literature, science, and everyday decision-making, and discuss its benefits and limitations.In English literature, the comparative method is often used to analyze and understand different literary works. By comparing and contrasting the themes, characters, and writing styles of different authors, readers can gain a deeper understanding of the texts and the messages they convey. For example, by comparing the works of William Shakespeare and Jane Austen, readers can gain insights into the different social and cultural contexts in which these authors wrote, as wellas the different perspectives they bring to their writing. This comparative analysis can enrich readers' understanding and appreciation of literature.In the field of science, the comparative method is used to analyze andevaluate different theories, experiments, and data. Scientists often compare and contrast the results of different studies to identify patterns, trends, and discrepancies. This comparative analysis helps scientists to refine their hypotheses, develop new theories, and make informed decisions about their research. For example, in the field of medicine, comparative studies of different treatments for a particular disease can help doctors and researchers to identify the most effective and safe treatment options for patients.In everyday decision-making, the comparative method is used to evaluate different options and make informed choices. Whether it be choosing between different products, services, or courses of action, people often compare and contrast the features, benefits, and drawbacks of different options to make the best decision. 
For example, when buying a new car, consumers may compare theprices, fuel efficiency, safety ratings, and customer reviews of different models to choose the best option for their needs and budget.While the comparative method has many benefits in English literature, science, and everyday decision-making, it also has its limitations. In literature, for example, the comparative method can sometimes oversimplify complex texts and overlook the unique qualities of each work. By focusing too much on thesimilarities and differences between texts, readers may miss out on the individual nuances and artistic merits of each work. Similarly, in science, the comparative method can lead to biased interpretations and conclusions if not used carefully. Scientists must be cautious of confounding variables and other factors that may influence the results of their comparative analysis.In everyday decision-making, the comparative method can also be limited by personal biases, preferences, and limited information. People may unconsciously favor certain options over others, leading to biased decision-making. Additionally, the comparative method relies on the availability and accuracy of informationabout different options, which may not always be reliable or complete. As a result, people may make decisions based on incomplete or inaccurate comparative analysis.In conclusion, the comparative method is a valuable tool for analyzing and evaluating different elements in English literature, science, and everydaydecision-making. It helps us to understand and appreciate the differences and similarities between different texts, theories, and options. However, it is important to be mindful of the limitations of the comparative method and use it with caution to avoid oversimplification, bias, and incomplete analysis. Byutilizing the comparative method thoughtfully and critically, we can make informed and thoughtful decisions in various aspects of our lives.。
探讨分析生长抑素、凝血酶联合奥美拉唑治疗 上消化道溃疡出血患者的临床效果及安全性唐慧敏【摘要】 目的 分析生长抑素、凝血酶联合奥美拉唑治疗上消化道溃疡出血的临床效果及安全性。
方法 88例上消化道溃疡出血患者, 采取随机抽签的方式分为观察组和对照组, 各44例。
对照组仅给予奥美拉唑治疗, 观察组给予生长抑素+凝血酶+奥美拉唑联合治疗, 比较两组患者的临床效果、出血量、止血时间、住院时间及不良反应发生情况。
结果 观察组总有效率显著高于对照组, 差异有统计学意义(P<0.05)。
观察组患者出血量为(395.55±80.12)ml, 止血时间为(7.01±2.19)h, 住院时间为(10.27±2.44)d, 均分别优于对照组的(513.74±89.63)ml、(9.58±3.04)h、(12.88±3.28)d, 差异有统计学意义(P<0.05)。
观察组不良反应发生率为6.8%(3/44), 对照组不良反应发生率为9.1%(4/44), 两组比较差异无统计学意义(P>0.05)。
结论 生长抑素、凝血酶联合奥美拉唑治疗上消化道溃疡出血具有显著疗效, 且不良反应少, 安全性高, 适宜在临床上推广。
【关键词】 上消化道溃疡出血;生长抑素;凝血酶;奥美拉唑DOI :10.14164/11-5581/r.2017.24.046Investigation and analysis on clinical effect and safety of somatostatin and thrombin combined with omeprazole in patients with upper gastrointestinal ulcer bleeding TANG Hui-min. Department of Gastroenterology, Liaoyang Central Hospital, Liaoyang 111000, China【Abstract 】 Objective To analyze the clinical effect and safety of somatostatin and thrombin combined with omeprazole in patients with upper gastrointestinal ulcer bleeding. Methods A total of 88 patients with upper gastrointestinal ulcer bleeding were divided by random lottery method into observation group and control group, with 44 cases in each group. The control group was treated with omeprazole, and the observation group was treated with somatostatin + thrombin + omeprazole. Comparison were made on clinical effect, bleeding volume, hemostatic time, hospitalization time and adverse reactions between two groups. Results The observation group had obviously higher total effective rate than the control group, and the difference was statistically significant (P<0.05). The observation group had bleeding volume as (395.55±80.12) ml, hemostatic time as (7.01±2.19) h, hospitalization time as (10.27±2.44) d, which were all better than (513.74±89.63) ml, (9.58±3.04) h, (12.88±3.28) d in the control group, and their difference was statistically significant (P<0.05). The observation group had incidence of adverse reactions as 6.8% (3/44), while the control group had incidence of adverse reactions as 9.1% (4/44), and their difference was not statistically significant (P>0.05). Conclusion Somatostatin and thrombin combined with omeprazole shows excellent efficacy in treating upper gastrointestinal ulcer bleeding with less adverse reactions and high safety. 
It is suitable to be popularized in clinic.【Key words 】 Upper gastrointestinal ulcer bleeding; Somatostatin; Thrombin; Omeprazole 作者单位:111000 辽阳市中心医院消化内科上消化道出血是一类常见的消化系统疾病, 多由消化道溃疡引起, 上消化道溃疡出血具有起病急、进展快、危害大的特点, 会严重影响到患者生存质量和身心健康[1]。
A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph

Nathalie Villa (1) and Fabrice Rossi (2)
(1) Institut de Mathématiques de Toulouse, Université Toulouse III, 118 route de Narbonne, 31062 Toulouse cedex 9, France
(2) Projet AxIS, INRIA Rocquencourt, Domaine de Voluceau, Rocquencourt, B.P. 105, 78153 Le Chesnay cedex, France
email: (1) nathalie.villa@math.ups-tlse.fr, (2) fabrice.rossi@inria.fr

Keywords: kernel SOM, dissimilarity, graph

Abstract — Flexible and efficient variants of the Self-Organizing Map algorithm have been proposed for non-vector data, including, for example, the dissimilarity SOM (also called the Median SOM) and several kernelized versions of SOM. While the first is a generalization of the batch version of the SOM algorithm to data described by a dissimilarity measure, the various versions of the second are stochastic SOMs. We introduce here a batch version of the kernel SOM and show how it is related to the dissimilarity SOM. Finally, an application to the classification of the vertices of a graph is proposed, and the algorithms are tested and compared on a simulated data set.

1 Introduction

Despite all its qualities, the original Self-Organizing Map (SOM, [13]) is restricted to vector data and cannot therefore be applied to dissimilarity data, for which only pairwise dissimilarity measures are known, a much more general setting than the vector one. This motivated the introduction of modified versions of the batch SOM adapted to such data. Two closely related dissimilarity SOMs were proposed in 1996 [12, 1], both based on the generalization of the definition of the mean or median to any dissimilarity measure (hence the alternative name Median SOM). Further variations and improvements of this model are proposed in [11, 6, 8].

Another way to build a SOM on non-vector data is to use a kernelized version of the algorithm. Kernel methods have become very popular in the past few years and numerous learning algorithms (especially supervised ones) have been "kernelized": the original data are mapped into a high-dimensional feature space by way of a nonlinear feature map. Both the high-dimensional space and the feature map are obtained implicitly from a kernel function. The idea is that difficult problems can become linear ones when mapped nonlinearly into high-dimensional spaces. Classical (often linear) algorithms are then applied in the feature space, and the chosen kernel is used to compute the usual operations such as dot products or norms; this kernelization extends usual linear statistical tools into nonlinear ones. This is the case, among others, for the Support Vector Machine (SVM, [20]), which corresponds to linear discrimination, and for Kernel PCA ([17]), which is built on Principal Component Analysis. More recently, kernelized versions of the SOM have been studied: [10] first proposes a kernelized version of SOM that aims at optimizing the topographic mapping. Then, [2, 16] present kernel SOMs that apply to the images of the original data under a mapping function; they obtain improvements in the classification performances of the algorithm. [15, 23] also studied these algorithms: the first gives a comparison of various kernel SOMs on several data sets for classification purposes, and the second proves the equivalence between kernel SOM and the self-organizing mixture density network.

In this work, we present a batch kernel SOM algorithm and show how it can be seen as a particular version of the dissimilarity SOM (section 2). We target specifically non-vector data, more precisely vertices of a graph, for which kernels can be used to define global proximities based on the graph structure itself (section 3.1). Kernel SOM provides in this context an unsupervised classification algorithm that is able to cluster the vertices of a graph into homogeneous proximity groups. This application is of great interest, as graphs arise naturally in many settings, especially in studies of social networks such as the World
Wide Web, scientific networks, P2P networks ([3]), or medieval peasant communities ([4, 22]). We finally propose to explore and compare the efficiency of these algorithms on this kind of problem on a simulated example (section 3.2).

2 A link between kernel SOM and dissimilarity SOM

In the following, we consider $n$ input data, $x_1,\dots,x_n$, from an arbitrary input space $\mathcal{G}$. In this section, we present self-organizing algorithms using kernels, i.e. functions $k : \mathcal{G}\times\mathcal{G} \to \mathbb{R}$ that are symmetric ($k(x,x') = k(x',x)$) and positive (for all $m \in \mathbb{N}$, all $x_1,\dots,x_m \in \mathcal{G}$ and all $\alpha_1,\dots,\alpha_m \in \mathbb{R}$, $\sum_{i,j=1}^m \alpha_i \alpha_j k(x_i,x_j) \geq 0$).

These functions are dot products associated with a mapping function $\phi$, which is often nonlinear. More precisely, there exists a Hilbert space $(\mathcal{H}, \langle\cdot,\cdot\rangle)$, called a Reproducing Kernel Hilbert Space (RKHS), and a mapping function $\phi : \mathcal{G}\to\mathcal{H}$ such that $k(x,x') = \langle \phi(x), \phi(x')\rangle$. Then, algorithms that use the input data only through norms or dot products are easily kernelized using the images by $\phi$ of the original data set: $\phi$ and $\mathcal{H}$ need not be known explicitly, as the operations are defined through the kernel function.

2.1 On-line kernel SOM

A kernel SOM based on the k-means algorithm was first proposed by [16]. The input data of this algorithm are the images by $\phi$ of $x_1,\dots,x_n$ and, as in the original SOM, they are mapped onto a low-dimensional grid made of $M$ neurons, $\{1,\dots,M\}$, which are related to each other by a neighborhood relationship $h$. Each neuron $j$ is represented by a prototype $p_j$ in the feature space $\mathcal{H}$, which is a linear combination of $\{\phi(x_1),\dots,\phi(x_n)\}$: $p_j = \sum_{i=1}^n \gamma_{ji}\,\phi(x_i)$. This leads to the following algorithm:

Algorithm 1: On-line kernel SOM
(1) For all $j = 1,\dots,M$ and all $i = 1,\dots,n$, initialize $\gamma^0_{ji}$ randomly in $\mathbb{R}$;
(2) For $l = 1,\dots,L$, do
(3) assignment step: $x_l$ is assigned to the neuron $f^l(x_l)$ which has the closest prototype:
$f^l(x_l) = \arg\min_{j=1,\dots,M} \|\phi(x_l) - p^{l-1}_j\|$;
(4) representation step: for all $j = 1,\dots,M$, the prototype $p_j$ is recomputed: for all $i = 1,\dots,n$,
$\gamma^l_{ji} = \gamma^{l-1}_{ji} + \alpha(l)\, h(f^l(x_l), j)\,\bigl(\delta_{il} - \gamma^{l-1}_{ji}\bigr)$;
End for.

where step (3) leads to the minimization of $\sum_{i,i'=1}^n \gamma^{l-1}_{ji}\gamma^{l-1}_{ji'}\, k(x_i,x_{i'}) - 2\sum_{i=1}^n \gamma^{l-1}_{ji}\, k(x_i, x_l)$.

As shown in [23], the kernel SOM can be seen as the result of stochastically minimizing the energy $E = \sum_{j=1}^M h(f(x), j)\,\|\phi(x) - p_j\|^2$. Another version of the kernel SOM is proposed by [2]: it uses prototypes chosen in the original data set and runs the algorithm on the images by $\phi$ of these prototypes. It comes from the minimization of the energy $E = \sum_{j=1}^M h(f(x), j)\,\|\phi(x) - \phi(p_j)\|^2$.

2.2 Dissimilarity SOM with dissimilarities based on kernels

The dissimilarity SOM ([11, 6, 8]) is a generalization of the batch version of SOM to data described by a dissimilarity measure. We assume given, for all $i, i' = 1,\dots,n$, a measure $\delta(x_i, x_{i'})$ that is symmetric ($\delta(x,x') = \delta(x',x)$), positive ($\delta(x,x') \geq 0$) and such that $\delta(x,x) = 0$. The dissimilarity SOM proceeds as follows:

Algorithm 2: Dissimilarity SOM
(1) For all $j = 1,\dots,M$, initialize $p^0_j$ randomly to one of the elements of the data set $\{x_1,\dots,x_n\}$;
(2) For $l = 1,\dots,L$, do
(3) assignment step: for all $i = 1,\dots,n$, $x_i$ is assigned to the neuron $f^l(x_i)$ which has the closest prototype:
$f^l(x_i) = \arg\min_{j=1,\dots,M} \delta(x_i, p^{l-1}_j)$;
(4) representation step: for all $j = 1,\dots,M$, the prototype $p_j$ is recomputed:
$p^l_j = \arg\min_{x\in\{x_i\}_{i=1,\dots,n}} \sum_{i=1}^n h(f^l(x_i), j)\,\delta(x_i, x)$;
End for.

As shown in step (4), the purpose is to choose prototypes in the data set that minimize the generalized energy $E = \sum_{j=1}^M \sum_{i=1}^n h(f(x_i), j)\,\delta(x_i, p_j)$. [6, 8] propose variants of this algorithm: the first allows the use of several prototypes for a single neuron and the second describes a faster version of the algorithm.

In [22], a dissimilarity based on a kernel is described: it is designed for the clustering of the vertices of a graph. To construct their dissimilarity, the authors take advantage of the fact that the kernel can be interpreted as a norm; computing the distance induced by this norm then leads to the definition of a dissimilarity measure
on $\{x_1,\dots,x_n\}$:

$\delta_{med}(x,x') = \|\phi(x) - \phi(x')\| = \sqrt{k(x,x) + k(x',x') - 2k(x,x')}$.   (1)

We can also define a variant of this dissimilarity measure by, for all $x, x'$ in $\mathcal{G}$,

$\delta_{mean}(x,x') = \|\phi(x) - \phi(x')\|^2 = k(x,x) + k(x',x') - 2k(x,x')$.   (2)

We now show that the dissimilarity SOM based on this last measure can be seen as a particular case of a batch version of the kernel SOM.

2.3 Batch kernel SOM

Replacing the value of the dissimilarity (2) in the representation step of the dissimilarity SOM algorithm leads to the following equation:

$p^l_j = \arg\min_{x\in\{x_i\}_{i=1,\dots,n}} \sum_{i=1}^n h(f^l(x_i), j)\,\|\phi(x_i) - \phi(x)\|^2$.

In this equation, the prototypes are the images by $\phi$ of some vertices; if we now allow the prototypes to be linear combinations of $\{\phi(x_i)\}_{i=1,\dots,n}$, as in the kernel SOM (section 2.1), the previous equation becomes $p^l_j = \sum_{i=1}^n \gamma^l_{ji}\,\phi(x_i)$, where

$\gamma^l_j = \arg\min_{\gamma\in\mathbb{R}^n} \sum_{i=1}^n h(f^l(x_i), j)\,\bigl\|\phi(x_i) - \sum_{i'=1}^n \gamma_{i'}\,\phi(x_{i'})\bigr\|^2$.   (3)

Equation (3) has a simple solution:

$p_j = \dfrac{\sum_{i=1}^n h(f^l(x_i), j)\,\phi(x_i)}{\sum_{i=1}^n h(f^l(x_i), j)}$,   (4)

which is the weighted mean of the $(\phi(x_i))_i$. As a consequence, equation (3) is the representation step of a batch SOM computed in the feature space $\mathcal{H}$. We will call this algorithm the batch kernel SOM:

Algorithm 3: Batch kernel SOM
(1) For all $j = 1,\dots,M$ and all $i = 1,\dots,n$, initialize $\gamma^0_{ji}$ randomly in $\mathbb{R}$;
(2) For $l = 1,\dots,L$, do
(3) assignment step: for all $i = 1,\dots,n$, $x_i$ is assigned to the neuron $f^l(x_i)$ which has the closest prototype:
$f^l(x_i) = \arg\min_{j=1,\dots,M} \|\phi(x_i) - p^{l-1}_j\|$;
(4) representation step: for all $j = 1,\dots,M$, the prototype $p_j$ is recomputed:
$\gamma^l_j = \arg\min_{\gamma\in\mathbb{R}^n} \sum_{i=1}^n h(f^l(x_i), j)\,\|\phi(x_i) - \sum_{i'=1}^n \gamma_{i'}\,\phi(x_{i'})\|^2$;
End for.

where, as shown in equation (4), the representation step simply reduces to

$\gamma^l_{ji} = \dfrac{h(f^l(x_i), j)}{\sum_{i'=1}^n h(f^l(x_{i'}), j)}$.

As in the on-line version, the assignment is run by directly using the kernel, without explicitly defining $\phi$ and $\mathcal{H}$: in fact, for all $x \in \{x_1,\dots,x_n\}$, it leads to the minimization over $j \in \{1,\dots,M\}$ of $\sum_{i,i'=1}^n \gamma_{ji}\gamma_{ji'}\, k(x_i,x_{i'}) - 2\sum_{i=1}^n \gamma_{ji}\, k(x, x_i)$. The batch kernel SOM is then simply a batch SOM performed in a relevant feature space, so it shares its consistency properties [7].

Finally, we conclude that the dissimilarity SOM described in section 2.2 can be seen as the restriction of a batch kernel SOM to the case where the prototypes are elements of the original data set. Formally, the dissimilarity SOM is the batch kernel SOM for which the feature space is not Hilbertian but discrete.

3 Application to graphs

The fact that the prototypes are defined in the feature space $\mathcal{H}$ from the original data $\{x_1,\dots,x_n\}$ allows the algorithms described in section 2 to be applied to a wide variety of data, as long as a kernel can be defined on the original set $\mathcal{G}$ (for which no vector structure is needed). In particular, these algorithms can be used to cluster the vertices of a weighted graph into homogeneous proximity groups using the graph structure only, without any assumption on the vertex set. The problem of clustering the vertices of a graph is of great interest, for instance as a tool for understanding the organization of social networks ([3]). This approach has already been tested for the dissimilarity SOM on a graph extracted from a medieval database ([22]).

We use the following notations in the rest of the paper. The data set $\{x_1,\dots,x_n\}$ consists of the vertices of a graph $\mathcal{G}$, with a set of edges $E$. Each edge $(x_i, x_{i'})$ has a positive weight $w_{i,i'}$ (with $w_{i,i'} = 0 \Leftrightarrow (x_i,x_{i'}) \notin E$). Weights are assumed to be symmetric ($w_{i,i'} = w_{i',i}$). We call $d_i$ the degree of the vertex $x_i$, given by $d_i = \sum_{i'=1}^n w_{i,i'}$.

3.1 The Laplacian and related kernels

In [19], the authors investigate a family of kernels based on regularization of the Laplacian matrix. The Laplacian of the graph is the positive semi-definite matrix $L = (L_{i,i'})_{i,i'=1,\dots,n}$ such that

$L_{i,i'} = \begin{cases} -w_{i,i'} & \text{if } i \neq i' \\ d_i & \text{if } i = i' \end{cases}$

(see [5, 14] for a comprehensive review of the properties of this matrix). In particular, [19] shows how this discrete Laplacian can be derived from the usual Laplacian defined on continuous spaces. Applying regularization
Applying regularization functions to the Laplacian, we obtain a family of matrices including:
• the regularized Laplacian: for β > 0, K_β = (I_n + βL)^{−1};
• the diffusion matrix: for β > 0, K_β = e^{−βL}.

These matrices are easy to compute for graphs having a few hundred vertices via an eigenvalue decomposition: their eigensystem is deduced by applying the regularizing functions to the eigenvalues of the Laplacian (the eigenvectors are the same). Moreover, these matrices can be interpreted as regularizing matrices because the norm they induce penalizes more strongly the vectors that vary a lot over close vertices. The way this penalization is taken into account depends on the regularizing function applied to the Laplacian. Using these regularizing matrices, we can define the associated kernels by k_β(x_i, x_{i'}) = (K_β)_{i,i'}. Moreover, the diffusion kernel (see also [14]) can be interpreted as the quantity of energy accumulated after a given time in a vertex if energy is injected at time 0 in the other vertex and if the diffusion is done along the edges. It has thus become very popular to summarize both the global structure and the local proximities of a graph (see [21, 18] for applications in computational biology). In [9], the authors investigate the distances induced by kernels of this type in order to rank the vertices of a weighted graph; they compare them to each other and show their good performances compared to standard methods.

3.2 Simulations

In order to test the 3 algorithms presented in section 2 for clustering the vertices of a graph, we simulated 50 graphs having a structure close to the ones described in [4, 22]. We made them as follows:
• we built 5 complete sub-graphs (C_i)_{i=1,...,5} (cliques) having (n_i)_{i=1,...,5} vertices, where the n_i are generated from a Poisson distribution with parameter 50;
• for all i = 1, ..., 5, we generated l_i random links between C_i and the vertices of the other cliques; l_i is generated from a uniform distribution on the set {1, ..., 100 n_i}.

Finally, multiple links are removed; thus the simulated graphs are non-weighted (i.e., w_{i,i'} ∈ {0, 1}).
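The generation protocol above can be sketched as follows; the generator below is our reconstruction (parameter names and the seed handling are assumptions), not the authors' simulation code:

```python
import numpy as np

def simulate_clique_graph(n_cliques=5, poisson_mean=50, link_factor=100, seed=0):
    """Cliques with Poisson-distributed sizes, plus uniformly many random
    inter-clique links; multiple links collapse to single edges."""
    rng = np.random.default_rng(seed)
    sizes = 1 + rng.poisson(poisson_mean, size=n_cliques)   # avoid empty cliques
    labels = np.repeat(np.arange(n_cliques), sizes)         # clique label per vertex
    W = (labels[:, None] == labels[None, :]).astype(float)  # complete cliques
    np.fill_diagonal(W, 0.0)
    for c in range(n_cliques):
        inside = np.flatnonzero(labels == c)
        outside = np.flatnonzero(labels != c)
        l_c = rng.integers(1, link_factor * sizes[c] + 1)   # l_i ~ U{1,...,100 n_i}
        src = rng.choice(inside, size=l_c)                  # endpoints drawn at random
        dst = rng.choice(outside, size=l_c)
        W[src, dst] = 1.0
        W[dst, src] = 1.0                                   # keep W symmetric
    return W, labels
```

Because edges are written into a 0/1 adjacency matrix, drawing the same pair twice has no effect, which matches the removal of multiple links.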
A simplified version of these types of graphs is shown in Figure 1: we restricted the mean of n_i to 5 and generated l_i from a uniform distribution on {1, ..., 10 n_i} in order to make the visualization possible.

Figure 1: Example of a simulated graph: the vertices of the 5 cliques are represented by different labels (+ * x o)

Algorithms 1 to 3 were tested on these graphs by using the diffusion kernel and the dissimilarities (equations (1) and (2)) built from it. The grid chosen is a 3×3 rectangular grid for which the central neuron has a neighborhood of size 2, as illustrated in Figure 2.

Figure 2: SOM grid used (dark gray is the 1-neighborhood and light gray the 2-neighborhood of the central neuron)

We ran all the algorithms until stabilization was obtained, which leads to:
• 500 iterations for the on-line kernel SOM (algorithm 1);
• 20 iterations for the dissimilarity SOM with both dissimilarities (algorithm 2);
• 10 iterations for the batch kernel SOM (algorithm 3).

Then, to avoid the influence of the initialization step, the algorithms were initialized randomly 10 times. For each graph and each algorithm, the best classification, which minimizes the energy of the final grid, is kept. The computational burden of the algorithms is summarized in Table 1: it gives the total running time for the 50 graphs and 10 initializations per graph.

Algorithm     on-line k-SOM   d-SOM   batch k-SOM
Time (min)         260          80         20

Table 1: Computation times

The batch kernel SOM is the fastest, whereas the on-line kernel SOM is very slow because it needs a high number of iterations to stabilize. For the batch kernel SOM, we initialize the prototypes with random elements of the data set (as in the dissimilarity SOM) in order to obtain a good convergence of the algorithm.

Finally, we tested three parameters for the diffusion kernel (β = 0.1, β = 0.05 and β = 0.01). Higher parameters were not tested because we had numerical instabilities in the computation of the dissimilarities for some of the 50 graphs.
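As recalled in section 3.1, these kernels are obtained by applying the regularizing function to the eigenvalues of the Laplacian; a sketch of both kernels (function names are ours):

```python
import numpy as np

def diffusion_kernel(W, beta):
    """Diffusion matrix K_beta = exp(-beta * L), computed spectrally."""
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian
    eigval, eigvec = np.linalg.eigh(L)             # L is symmetric PSD
    return (eigvec * np.exp(-beta * eigval)) @ eigvec.T

def regularized_laplacian_kernel(W, beta):
    """Regularized Laplacian K_beta = (I_n + beta * L)^{-1}, computed spectrally."""
    L = np.diag(W.sum(axis=1)) - W
    eigval, eigvec = np.linalg.eigh(L)
    return (eigvec / (1.0 + beta * eigval)) @ eigvec.T
```

The kernel value k_β(x_i, x_{i'}) is then simply the (i, i') entry of the returned matrix.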
In order to compare the classifications obtained by the different algorithms, we computed the following criteria:
• the mean energy (except for the dissimilarity SOM with dissimilarity (1), which does not have a comparable energy). Note also that the energies computed for different values of β cannot be compared;
• the standard deviation of the energy;
• the mean number of classes found by the algorithm;
• after associating each neuron of the grid to one of the cliques by a majority vote, the mean percentage of correctly classified vertices (vertices assigned to a neuron associated to their clique);
• the number of links divided by the number of possible links between two vertices assigned to two neurons at distance 1 (or 2, 3, 4) from each other on the grid.

The results are summarized in Tables 2 to 5, and an example of a classification obtained by the batch kernel SOM for the graph represented in Figure 1 is given in Figure 3. First of all, we see that the quality of the classification heavily depends on the choice of β. For this application, performances decrease with β, with very bad performances for all the algorithms with the parameter β = 0.01.

                               β=0.1   β=0.05   β=0.01
Mean energy                     0.10     3.21      196
Sd of energy                    0.40      4.7       39
Mean nb of classes              8.04     8.92        9
Mean % of good classif.        79.84    78.28    39.72
% of links, 1-neighborhood      54.5     58.4     51.3
% of links, 2-neighborhood      39.1     40.0     48.0
% of links, 3-neighborhood      34.9     33.1     45.6
% of links, 4-neighborhood      24.2     28.9     43.8

Table 2: Performance criteria for the on-line kernel SOM

                               β=0.1   β=0.05   β=0.01
Mean energy                      NA       NA       NA
Sd of energy                     NA       NA       NA
Mean nb of classes                9        9        9
Mean % of good classif.       77.34    40.49    29.56
% of links, 1-neighborhood     48.6     45.4     52.5
% of links, 2-neighborhood     42.0     45.5     55.8
% of links, 3-neighborhood     38.0     48.1     57.1
% of links, 4-neighborhood     34.8     51.5     57.0

Table 3: Performance criteria for the dissimilarity SOM (dissimilarity (1))

Then, we can also remark that the performances highly depend on the graph: the standard deviation of the energy is high compared to its mean. In fact, 5 graphs always obtained very bad
performances, and removing them divides the standard deviation by 20.

                               β=0.1   β=0.05   β=0.01
Mean energy                     0.13     3.99      300
Sd of energy                    0.55      7.3       65
Mean nb of classes                 9        9        9
Mean % of good classif.        77.89    40.69    29.77
% of links, 1-neighborhood      49.6     45.0     52.3
% of links, 2-neighborhood      41.7     45.7     55.8
% of links, 3-neighborhood      36.9     48.0     56.8
% of links, 4-neighborhood      34.0     51.0     58.8

Table 4: Performance criteria for the dissimilarity SOM (dissimilarity (2))

                               β=0.1   β=0.05   β=0.01
Mean energy                     0.10     3.00      172
Sd of energy                    0.38     4.45       35
Mean nb of classes              6.56     7.56        9
Mean % of good classif.        94.34    92.72    32.81
% of links, 1-neighborhood      44.9     48.0     47.6
% of links, 2-neighborhood      37.8     37.6     46.6
% of links, 3-neighborhood      29.1     32.3     48.8
% of links, 4-neighborhood      28.6     28.5     46.4

Table 5: Performance criteria for the batch kernel SOM

Comparing the algorithms to each other, we see that the batch kernel SOM seems to find the best clustering of the vertices of the graph: it is the one for which the mean number of classes found by the algorithm is the closest to the number of cliques (5). It also has the best percentage of correctly classified vertices and the smallest number of links for all neighborhoods, proving that vertices classified in the same cluster are also frequently in the same clique. Then comes the on-line kernel SOM, which suffers from a long computational time, and finally the dissimilarity SOM, with slightly better performances for dissimilarity (2). Comparing the first three tables, we can say that the performance gain of the on-line kernel SOM is really poor compared to the computational time differences between the algorithms. Moreover, interpretation of the cluster meanings can benefit from the fact that the prototypes are elements of the data set. On the contrary, the dissimilarity SOM totally fails in decreasing the number of relevant classes (the mean number of clusters in the final classification is always the biggest possible, 9); this leads to a bigger number of links between two distinct neurons than in both versions of the kernel SOM.

4 Conclusions

We
show in this paper that the dissimilarity SOM used with a kernel-based dissimilarity is a particular case of a batch kernel SOM. This leads us to the definition of a batch unsupervised algorithm for clustering the vertices of a graph. The simulations made on randomly generated graphs show that this batch version of the kernel SOM is both efficient and fast. The dissimilarity SOM, which is more restricted, is less efficient but still has good performances and produces prototypes that are more easily interpretable. We also emphasized the importance of a good choice of the parameter of the kernel, a problem for which an automatic solution would be very useful in practice.

Figure 3: Example of a classification obtained by the batch kernel SOM on the graph represented in Figure 1

Acknowledgements

This project is supported by "ANR Non Thématique 2005: Graphes-Comp". The authors also want to thank the anonymous referees for their helpful comments.

References

[1] C. Ambroise and G. Govaert. Analyzing dissimilarity matrices via Kohonen maps. In Proceedings of the 5th Conference of the International Federation of Classification Societies (IFCS 1996), volume 2, pages 96–99, Kobe (Japan), March 1996.
[2] P. Andras. Kernel-Kohonen networks. International Journal of Neural Systems, 12:117–135, 2002.
[3] S. Bornholdt and H.G. Schuster. Handbook of Graphs and Networks – From the Genome to the Internet. Wiley-VCH, Berlin, 2002.
[4] R. Boulet and B. Jouve. Partitionnement d'un réseau de sociabilité à fort coefficient de clustering. In 7èmes Journées Francophones "Extraction et Gestion des Connaissances", pages 569–574, 2007.
[5] F. Chung. Spectral Graph Theory. Number 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.
[6] B. Conan-Guez, F. Rossi, and A. El Golli. Fast algorithm and implementation of dissimilarity self-organizing maps. Neural Networks, 19(6-7):855–863, 2006.
[7] M. Cottrell, B. Hammer, A. Hasenfuss, and T. Villmann. Batch and median neural gas. Neural Networks, 19:762–771, 2006.
[8] A. El Golli, F. Rossi, B. Conan-Guez, and Y. Lechevallier. Une adaptation des cartes auto-organisatrices pour des données décrites par un tableau de dissimilarités. Revue de Statistique Appliquée, LIV(3):33–64, 2006.
[9] F. Fouss, L. Yen, A. Pirotte, and M. Saerens. An experimental investigation of graph kernels on a collaborative recommendation task. In IEEE International Conference on Data Mining (ICDM), pages 863–868, 2006.
[10] T. Graepel, M. Burger, and K. Obermayer. Self-organizing maps: generalizations and new optimization techniques. Neurocomputing, 21:173–190, 1998.
[11] T. Kohonen and P.J. Somervuo. Self-organizing maps of symbol strings. Neurocomputing, 21:19–30, 1998.
[12] T. Kohonen. Self-organizing maps of symbol strings. Technical Report A42, Laboratory of Computer and Information Science, Helsinki University of Technology, Finland, 1996.
[13] T. Kohonen. Self-Organizing Maps, 3rd Edition. Springer Series in Information Sciences, volume 30. Springer, Berlin, Heidelberg, New York, 2001.
[14] R.I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th International Conference on Machine Learning, pages 315–322, 2002.
[15] K.W. Lau, H. Yin, and S. Hubbard. Kernel self-organising maps for classification. Neurocomputing, 69:2033–2040, 2006.
[16] D. Mac Donald and C. Fyfe. The kernel self organising map. In Proceedings of the 4th International Conference on Knowledge-Based Intelligent Engineering Systems and Applied Technologies, pages 317–320, 2000.
[17] B. Schölkopf, A. Smola, and K.R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
[18] B. Schölkopf, K. Tsuda, and J.P. Vert. Kernel Methods in Computational Biology. MIT Press, London, 2004.
[19] A.J. Smola and R. Kondor. Kernels and regularization on graphs. In M. Warmuth and B. Schölkopf, editors, Proceedings of the Conference on Learning Theory (COLT) and Kernel Workshop, 2003.
[20] V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.
[21] J.P. Vert and M. Kanehisa. Extracting active pathways from gene expression data. Bioinformatics, 19:ii238–ii244, 2003.
[22] N. Villa and R. Boulet. Clustering a medieval social network by SOM using a kernel based distance measure. In M. Verleysen, editor, Proceedings of ESANN 2007, pages 31–36, Bruges, Belgium, 2007.
[23] H. Yin. On the equivalence between kernel self-organising maps and self-organising map mixture density networks. Neural Networks, 19:780–784, 2006.