DOI 10.1515/jisys-2013-0030 · Journal of Intelligent Systems 2014; 23(1): 75–82

Cagatay Catal*

A Comparison of Semi-Supervised Classification Approaches for Software Defect Prediction

Abstract: Predicting the defect-prone modules when the previous defect labels of modules are limited is a challenging problem encountered in the software industry. Supervised classification approaches cannot build high-performance prediction models with few defect data, leading to the need for new methods, techniques, and tools. One solution is to combine labeled data points with unlabeled data points during the learning phase. Semi-supervised classification methods use not only labeled data points but also unlabeled ones to improve the generalization capability. In this study, we evaluated four semi-supervised classification methods for semi-supervised defect prediction. Low-density separation (LDS), support vector machine (SVM), expectation-maximization (EM-SEMI), and class mass normalization (CMN) methods have been investigated on NASA data sets, namely CM1, KC1, KC2, and PC1. Experimental results showed that the SVM and LDS algorithms outperform the CMN and EM-SEMI algorithms. In addition, the LDS algorithm performs much better than SVM when the data set is large. Based on these results, the LDS-based prediction approach is suggested for software defect prediction when there are limited fault data.

Keywords: Defect prediction, expectation-maximization, low-density separation, quality estimation, semi-supervised classification, support vector machines.

*Corresponding author: Cagatay Catal, Department of Computer Engineering, Istanbul Kultur University, Istanbul 34156, Turkey, e-mail: c.catal@iku.edu.tr

1 Introduction

The activities of modern society highly depend on software-intensive systems, and therefore, the quality of software systems should be measured and improved. There are many definitions of software quality, but unfortunately, the general perception is that software quality cannot be measured, or that it need not be measured as long as the software works. However, this assumption is not true, and there are many software quality assessment models, such as SQALE, that investigate and evaluate several internal quality characteristics of software. Software quality can be defined differently based on different views of quality [19]:

–Transcendental view: quality can be described with abstract terms instead of measurable characteristics and users can recognize quality if it exists in a software product.

–Value-based view: quality is defined with respect to the value it provides and customers decide to pay for the software if the perceived value is desirable.

–User view: quality is the satisfaction level of the user in terms of his/her needs.

–Manufacturing view: process standards improve the product quality and these standards must be applied.

–Product view: This view of quality focuses on internal quality characteristics.

In this study, we use the product view definition of software quality, and internal quality characteristics are represented with metrics values. Objective assessment of software quality is performed based on measurement data collected during quality engineering processes. Quality assessment models are quantitative analytical models and are more reliable compared with qualitative models based on personal judgment.

They can be classified into two categories: generalized and product-specific models [19]:

1. Generalized models: Project or product-specific data are not used in generalized models, and industrial averages help to estimate the product quality roughly. They are grouped into three subcategories:

–Overall models: a rough estimate of quality is provided.


–Segmented models: different quality estimates are applied for different product segments.

–Dynamic models: dynamic model graphs depict the quality trend over time for all products.

2. Product-specific models: Product-specific data are used in these models, in contrast to the generalized models.

–Semi-customized models: These models apply historical data from previous releases of the same product instead of a profile for all products.

–Observation-based models: These models only use data from the current project, and quality is estimated based on observations from the current project.

–Measurement-driven predictive models: These models assume that there is a predictive relation between software measurements and quality.

Software defect prediction models belong to the measurement-driven predictive models group, and the objective of our research is to evaluate several semi-supervised classification methods for software defect prediction when there are limited defect data. Defect prediction models use software metrics and previous defect data. In these models, the metrics are regarded as independent variables and the defect label is regarded as the dependent variable. The aim is to predict the defect labels (defect-prone or non-defect-prone) of the modules for the next release of the software. Machine-learning algorithms or statistical techniques are trained with previous software metrics and defect data. Then, these algorithms are used to predict the labels of new modules. The benefits of this quality assurance activity are impressive. Using such models, one can identify refactoring candidate modules, improve the software testing process, select the best design from design alternatives with class-level metrics, and reach a dependable software system.
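As a concrete illustration of this workflow, the following minimal sketch trains a classifier on the metrics and defect labels of a previous release and predicts labels for the modules of the next release. It uses scikit-learn; the file names and the "defective" column are hypothetical, and the classifier choice is only illustrative, not one of the methods evaluated in this study.

```python
# Minimal sketch of supervised defect prediction, assuming module metrics are
# available as CSV files with one row per module (hypothetical file/column names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

modules = pd.read_csv("previous_release_metrics.csv")   # hypothetical file
X = modules.drop(columns=["defective"]).values           # static code metrics
y = modules["defective"].values                          # 1 = defect-prone, 0 = not

# Train on the previous release; hold out part of it to check the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
model = GaussianNB().fit(X_train, y_train)
print("AUC on held-out modules:",
      roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Modules of the next release (metrics only, no labels yet) get predicted labels.
new_release = pd.read_csv("next_release_metrics.csv")    # hypothetical file
predicted_labels = model.predict(new_release.values)
```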

Software defect prediction models have been studied since the 1990s, and defect-prone modules can be identified prior to system tests using these models. According to recent studies, the probability of detection (PD) of defect prediction models (71%) may be higher than the PD of software reviews (60%) if a robust model is built [15]. A panel at the IEEE Metrics 2002 conference did not support the claim [6] that inspections can find 95% of defects before testing; this detection ratio was around 60% [15]. One member of a review team may inspect 8–20 lines of code per minute, and this process is performed by all the members of a team consisting of 4–6 members [15]. Therefore, software defect prediction approaches are much more cost-effective in detecting software defects compared with software reviews. Readers who want more information about software defect prediction may read our literature review [2] and systematic review [4] papers.

Most of the studies in the literature assume that there are enough defect data to build the prediction models. However, there are cases when previous defect data are not available, such as when a company deals with a new project type or when defect labels in previous releases have not been collected. In these cases, we need new models to predict software defects. This research problem is called software defect prediction of unlabeled program modules.

Supervised classification algorithms in machine learning can be used to build the prediction model with previous software metrics and previous defect labels. However, sometimes we do not have enough defect data to build accurate models. For example, some project partners may not collect defect data for some project components, or the execution cost of metrics collection tools on the whole system may be extremely expensive. In addition, during decentralized software development, some companies may not collect defect data for their components. In these situations, we need powerful classifiers that can build accurate classification models with very limited defect data, or powerful semi-supervised classification algorithms that can benefit from unlabeled data together with labeled data. This research problem is called software defect prediction with limited defect data. This study is an attempt to identify a semi-supervised classification algorithm for the limited defect data problem.
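The limited defect data setting can be simulated by keeping the labels of only a small fraction of modules and treating the rest as unlabeled. The short sketch below does this with NumPy, using -1 as the unlabeled marker (the convention of scikit-learn's semi-supervised utilities); the fraction and the demo label vector are arbitrary.

```python
# Sketch: hide all but a small fraction of defect labels to obtain the
# "limited defect data" setting; -1 marks an unlabeled module.
import numpy as np

rng = np.random.default_rng(0)

def mask_labels(y, labeled_fraction=0.05):
    """Return a label vector where all but `labeled_fraction` of the entries
    are replaced by -1 (unlabeled), plus the indices that stay labeled."""
    y_semi = np.full_like(y, -1)
    n_labeled = max(1, int(labeled_fraction * len(y)))
    labeled_idx = rng.choice(len(y), size=n_labeled, replace=False)
    y_semi[labeled_idx] = y[labeled_idx]
    return y_semi, labeled_idx

# Tiny demo with an arbitrary label vector (1 = defect-prone).
y_demo = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0] * 10)
y_semi, labeled_idx = mask_labels(y_demo, labeled_fraction=0.05)
```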

We applied low-density separation (LDS), support vector machine (SVM), expectation-maximization (EM-SEMI), and class mass normalization (CMN) methods to investigate their performance on NASA data sets, which are CM1, KC1, KC2, and PC1. Although previous research on semi-supervised defect prediction reported that the EM-SEMI-based semi-supervised approach is very effective [16], we showed that the LDS and SVM algorithms provide better performance than the EM-based prediction approach. Seliya and Khoshgoftaar [16] stated that their future work is to compare the EM-based approach with other semi-supervised learning approaches such as SVM or co-training. Therefore, the scope of our study is similar to that of their article, but we extend their study with new algorithms and evaluate several methods to find the best one. The details of the algorithms used in this study are explained in Section 3. In addition, recent studies on the use of semi-supervised learning methods for defect prediction are presented in detail in Section 2.

This article is organized as follows: Section 2 presents the related work in the software defect prediction area. Section 3 explains the semi-supervised classification methods, whereas Section 4 shows the experimental results. Finally, Section 5 presents the conclusions and future work.

2 Related Work

Seliya and Khoshgoftaar [16] applied the EM algorithm for software defect prediction with limited defect data, and they reported that the EM-based approach provides remarkable results compared with classification algorithms. In a research project started in April 2008, we showed that the naïve Bayes algorithm is the best choice to build a supervised defect prediction model for small data sets, and the yet another two-stage idea (YATSI) algorithm may improve the performance of naïve Bayes for large data sets [3]. Researchers are generally optimistic about applying unlabeled data together with labeled data when there are not enough labeled data for classification. However, we showed some evidence that this optimistic assumption does not hold for all classifiers on all data sets [3]. Naïve Bayes performance degraded when unlabeled data were used for the YATSI algorithm on the KC2, CM1, and JM1 data sets [3]. Recently, Li et al. [11] proposed two semi-supervised machine-learning methods (CoForest and ACoForest) for defect prediction. They showed that CoForest and ACoForest can achieve better performance than traditional machine-learning techniques such as naïve Bayes. Lu et al. [13] developed an iterative semi-supervised approach for software fault prediction and varied the proportion of labeled modules from 2% to 50%. They showed that this approach is better than the corresponding supervised learning approach with random forest as the base learner, although it improves the performance only if the proportion of labeled modules exceeds 5%. Lu et al. [14] investigated a variant of self-training called fitting-the-confident-fits (FTcF) as a semi-supervised learning approach for fault prediction. In addition to this algorithm, they used a preprocessing step called multidimensional scaling (MDS). They showed that this model, based on FTcF and MDS, is better than the best supervised learning algorithms such as random forest. Jiang et al. [7] proposed a semi-supervised learning approach called ROCUS for software defect prediction. They showed that this approach is better than a semi-supervised learning approach that ignores the class-imbalance nature of the task. In these articles, researchers did not investigate the methods we used in this article. The drawback of these approaches that use semi-supervised learning methods is that they need at least some labeled data points [10].

3 Methods

In this section, the semi-supervised classification methods investigated in this study are explained.

3.1 Low-Density Separation

Chapelle and Zien [5] developed the LDS algorithm in 2005. They showed that LDS provides better performance than SVM, manifold, transductive support vector machine (TSVM), graph, and gradient TSVM. LDS first calculates the graph-based distances and then applies the gradient TSVM algorithm. The LDS algorithm combines two algorithms [5]: graph (training an SVM on a graph-distance-derived kernel) and ∇TSVM (training a TSVM by gradient descent).


SVM was first introduced in 1992 [1], and it is now widely used in machine-learning research. SVM is a supervised learning method that is based on statistical learning theory and the Vapnik-Chervonenkis dimension. Whereas standard SVM learns the separating hyperplane on labeled data points, TSVM learns the separating hyperplane on both labeled and unlabeled data points. Although TSVM appears to be a very powerful semi-supervised classification method, its objective function is nonconvex; therefore, it is difficult to minimize, and sometimes bad results may result from optimization heuristics such as SVMlight [5]. Because LDS includes two algorithms that are based on SVM and TSVM, LDS can be thought of as an extended version of SVM algorithms for semi-supervised classification. We used the MATLAB (MathWorks, Natick, MA, USA) implementation of the LDS algorithm, which can be accessed from www.kyb.tuebingen.mpg.de/bs/people/chapelle/lds. The motivation and theory behind this algorithm are discussed in a paper published by Chapelle and Zien [5]. They have shown that the combination of graph and ∇TSVM yields a superior semi-supervised classification algorithm.
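To make the first of these two components concrete, the sketch below derives graph-based distances over labeled and unlabeled modules, converts them into a kernel, and trains an SVM on that kernel. This is a simplified illustration written with scikit-learn and SciPy, not the authors' MATLAB implementation; the ∇TSVM stage of LDS is omitted, and the neighborhood size and kernel width are arbitrary assumptions.

```python
# Simplified sketch of the "graph" component of LDS: graph-based distances
# over all modules, turned into a kernel, with an SVM trained on that kernel.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph
from sklearn.svm import SVC

def graph_kernel_svm(X, y_semi, n_neighbors=10, sigma=1.0):
    # k-nearest-neighbour graph over labeled and unlabeled modules.
    graph = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance")
    dist = shortest_path(graph, method="D", directed=False)    # graph distances
    finite = np.isfinite(dist)
    dist[~finite] = dist[finite].max()                          # disconnected pairs
    K = np.exp(-dist ** 2 / (2.0 * sigma ** 2))                 # distance -> kernel

    labeled = y_semi != -1
    svm = SVC(kernel="precomputed", C=1.0)
    svm.fit(K[np.ix_(labeled, labeled)], y_semi[labeled])
    # Predict labels for the unlabeled modules from their kernel values
    # against the labeled ones.
    return svm.predict(K[np.ix_(~labeled, labeled)])
```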

3.2 Support Vector Machine

We used the SVMlight [8] software for the experiments. SVMlight is an implementation of Vapnik's SVM [20] in the C programming language for pattern recognition and regression. There are several optimization algorithms [9] in the SVMlight software.
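For illustration only, the sketch below reproduces the role SVM plays in the comparison: train on the small labeled fraction and score the remaining modules. scikit-learn's SVC is used here as a stand-in for SVMlight, and the kernel choice and its parameters are assumptions.

```python
# SVM baseline sketch: train on the labeled fraction only, score the rest.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def svm_baseline(X, y_semi):
    labeled = y_semi != -1
    scaler = StandardScaler().fit(X[labeled])
    svm = SVC(kernel="rbf", probability=True, random_state=0)
    svm.fit(scaler.transform(X[labeled]), y_semi[labeled])
    # Defect-proneness scores for the unlabeled modules (usable for AUC).
    return svm.predict_proba(scaler.transform(X[~labeled]))[:, 1]
```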

3.3 Expectation-Maximization

EM is an iterative algorithm used for maximum likelihood estimation with incomplete data. EM replaces missing data with estimated values, estimates model parameters, and then re-estimates the missing data [17]. In this study, the class label is treated as the missing value. The theory and foundations of EM are discussed by Little and Rubin [12]. The EM algorithm has been evaluated for semi-supervised classification in the literature [17], and we included it to compare the performance of LDS with that of EM. We noted that the LDS algorithm outperforms the EM algorithm.
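The sketch below illustrates this idea in a simplified form: a Gaussian mixture is fitted by EM over all modules (labels treated as missing), and each component is then mapped to the defect class that dominates it among the few labeled modules. It is a stand-in for the general approach, not the EM-SEMI implementation of [17]; the number of components is an assumption.

```python
# Simplified EM-style semi-supervised labeling with a Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

def em_semi_sketch(X, y_semi, n_components=2, random_state=0):
    labeled = y_semi != -1
    gmm = GaussianMixture(n_components=n_components, random_state=random_state)
    components = gmm.fit_predict(X)               # EM over labeled + unlabeled modules

    # Majority vote of the labeled modules inside each mixture component.
    component_to_class = {}
    for c in range(n_components):
        labels_in_c = y_semi[labeled & (components == c)]
        component_to_class[c] = int(np.round(labels_in_c.mean())) if len(labels_in_c) else 0

    # Predicted defect labels for the unlabeled modules.
    return np.array([component_to_class[c] for c in components[~labeled]])
```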

3.4 Class Mass Normalization

Zhu et al. [21] developed an approach that is based on a Gaussian random field model defined on a weighted graph. On the weighted graph, vertices are labeled and unlabeled data points. The weights are given in terms of a similarity function between instances. Labeled data points are shown as boundary points, and unlabeled points appear as interior points. The CMN method was adopted to adjust the class distributions to match the prior knowledge [21]. Zhu et al. [21] chose Gaussian fields over a continuous state space instead of random fields over the discrete label set. They showed that their approach was promising on several data sets, and they used only the mean of the Gaussian random field, which is characterized in terms of harmonic functions and spectral graph theory [21]. In this article, we used this approach, which uses the CMN method, for our benchmarking analysis.
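A compact sketch of this approach, assuming binary 0/1 labels and an RBF similarity graph, is given below: the harmonic solution is computed on the unlabeled block of the graph Laplacian and then adjusted by class mass normalization so that the predicted class proportions match the priors observed in the labeled modules. The RBF width is an assumed hyperparameter, and this is an illustration rather than the authors' implementation.

```python
# Sketch of the harmonic function solution with class mass normalization (CMN).
import numpy as np
from scipy.spatial.distance import cdist

def harmonic_cmn(X, y_semi, sigma=1.0):
    labeled = y_semi != -1
    unlabeled = ~labeled

    # RBF similarity graph over all modules.
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W

    # Harmonic solution: f_u = (D_uu - W_uu)^{-1} W_ul y_l  (binary 0/1 labels).
    f_u = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                          W[np.ix_(unlabeled, labeled)] @ y_semi[labeled])
    f_u = np.clip(f_u, 0.0, 1.0)

    # Class mass normalization: rescale the two class "masses" so that the
    # predicted class proportions match the priors seen in the labeled data.
    q1 = y_semi[labeled].mean()                   # prior of the defect-prone class
    q0 = 1.0 - q1
    mass1 = q1 * f_u / f_u.sum()
    mass0 = q0 * (1.0 - f_u) / (1.0 - f_u).sum()
    return (mass1 > mass0).astype(int)            # labels for the unlabeled modules
```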

4 Empirical Case Studies

This section presents the data sets and performance evaluation metrics that were used in this study and the results of the experimental studies.


4.1 System Description

NASA projects located in the PROMISE repository were used for our experiments. We used four public data sets, which are KC1, PC1, KC2, and CM1. The KC1 data set, which uses the C++ programming language, belongs to a storage management project for receiving/processing ground data. It has 2109 software modules, 15% of which are defective. PC1 belongs to a flight software project for an Earth-orbiting satellite. It has 1109 modules, 7% of which are defective, and uses the C implementation language. The KC2 data set belongs to a data-processing project, and it has 523 modules. The C++ language was used for the KC2 implementation, and 21% of its modules have defects. CM1 belongs to a NASA spacecraft instrument project and was developed in the C programming language. CM1 has 498 modules, of which 10% are defective.
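For reference, these data sets can be loaded as sketched below; the local file name ("cm1.arff") and the label column ("defects" with byte values b'true'/b'false') are assumptions that may differ between copies of the PROMISE repository.

```python
# Loading one NASA/PROMISE data set (assumed ARFF file and label column).
import pandas as pd
from scipy.io import arff

data, meta = arff.loadarff("cm1.arff")                 # hypothetical local path
cm1 = pd.DataFrame(data)
y = (cm1["defects"] == b"true").astype(int).values     # 1 = defective module
X = cm1.drop(columns=["defects"]).astype(float).values # static code metrics
print(X.shape, "modules,", y.mean().round(2), "defective ratio")
```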

4.2 Performance Evaluation Parameter

The performance of the semi-supervised classification algorithms was compared using the area under the receiver operating characteristic (ROC) curve (AUC). ROC curves and AUC are widely used in machine learning for performance evaluation. The ROC curve was first used in signal detection theory to evaluate how well a receiver distinguishes a signal from noise, and it is still used in medical diagnostic tests. The ROC curve plots the probability of false alarm (PF) on the x-axis and the PD on the y-axis. A ROC curve is shown in Figure 1. ROC curves pass through the points (0, 0) and (1, 1) by definition. Because the accuracy and precision parameters are deprecated for unbalanced data sets, we used AUC values for benchmarking. Equations (1) and (2) were used to calculate the PD and PF values, respectively. Each pair (PD, PF) represents the classification performance of the threshold value that produces this pair. The pair that produces the best classification performance is the best threshold value to use in practice [18].

PD = TP / (TP + FN)    (1)

PF = FP / (FP + TN)    (2)
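These quantities can be computed directly from a confusion matrix; the short sketch below does so with scikit-learn on an arbitrary toy example and also computes the AUC used for benchmarking.

```python
# Computing PD (Eq. 1), PF (Eq. 2), and AUC for a set of predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def pd_pf(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    pd_ = tp / (tp + fn)   # probability of detection, Eq. (1)
    pf_ = fp / (fp + tn)   # probability of false alarm, Eq. (2)
    return pd_, pf_

# Arbitrary toy labels and classifier scores for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.8, 0.3, 0.2, 0.9])
print("PD, PF:", pd_pf(y_true, (scores > 0.5).astype(int)))
print("AUC:", roc_auc_score(y_true, scores))
```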

Figure 1. ROC curve [15]. The curve plots the probability of detection (PD) on the y-axis against the probability of false alarm (PF) on the x-axis; the figure marks the no-information line PF = PD, a risk-adverse region, a cost-adverse region, a preferred curve, and a negative curve.


Table 1. Performance Results on CM1.

Algorithm   5%     10%    20%
LDS         0.66   0.70   0.73
SVM         0.76   0.76   0.73
CMN         0.65   0.67   0.74
EM-SEMI     0.47   0.47   0.53

Table 2. Performance Results on KC2.

Algorithm   5%     10%    20%
LDS         0.80   0.81   0.82
SVM         0.84   0.85   0.85
CMN         0.69   0.69   0.69
EM-SEMI     0.47   0.56   0.55

Table 3. Performance Results on KC1.

Algorithm   5%     10%    20%
LDS         0.77   0.78   0.79
SVM         0.76   0.75   0.70
CMN         0.73   0.76   0.76
EM-SEMI     0.49   0.48   0.50

Table 4. Performance Results on PC1.

Algorithm   5%     10%    20%
LDS         0.73   0.75   0.75
SVM         0.70   0.73   0.70
CMN         0.63   0.61   0.80
EM-SEMI     0.60   0.58   0.55

4.3 Results

We simulated the small labeled/large unlabeled data problem with labeled rates of 5%, 10%, and 20% and evaluated the performance of each semi-supervised classification algorithm under these circumstances. This means that we trained the LDS, SVM, CMN, and EM-SEMI algorithms on only 5%, 10%, or 20% of the data set and tested on the rest. Tables 1–4 show the performance results of each algorithm on the different data sets with respect to these rates, and we compared the classifiers using AUC values. According to the experimental results in Tables 1 and 2, the best classifier for CM1 and KC2 is SVM because SVM has the highest AUC values for these data sets. According to the experimental results in Tables 3 and 4, the best classifier for KC1 and PC1 is LDS because LDS has the highest AUC values for these data sets. For all the data sets, SVM and LDS, which builds on SVM, provide better performance than the EM-SEMI algorithm. When we looked at the size of the data sets, we noted that LDS provides better performance than SVM on the larger data sets. Based on these experimental results, we can state that applying the LDS algorithm is an ideal solution for large data sets when there are not enough labeled data.
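This evaluation protocol can be sketched as follows: for each labeled rate, hide the remaining labels, run a semi-supervised method, and compute the AUC on the modules whose labels were hidden. The helper below assumes a method with the signature of the earlier sketches (taking X and a partially labeled y and returning scores or labels for the unlabeled rows); the name `harmonic_cmn` refers to the sketch in Section 3.4.

```python
# Sketch of the evaluation loop over labeled fractions of 5%, 10%, and 20%.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(X, y, method, fractions=(0.05, 0.10, 0.20), seed=0):
    rng = np.random.default_rng(seed)
    results = {}
    for frac in fractions:
        y_semi = np.full_like(y, -1)                         # hide most labels
        labeled_idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
        y_semi[labeled_idx] = y[labeled_idx]
        unlabeled = y_semi == -1
        predictions = method(X, y_semi)                      # scores for unlabeled rows
        results[frac] = roc_auc_score(y[unlabeled], predictions)
    return results

# Example usage: evaluate(X, y, harmonic_cmn) on one of the NASA data sets.
```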


We compared the results of this study with the performance results we achieved in our previous research [3], and we showed that SVM or LDS provides much better performance than the algorithms we used in our previous research [3]. This research also indicates that although EM algorithm has been suggested for semi-supervised defect prediction in literature [17], it is very hard to get acceptable results with this algorithm.

All the classifiers reached their highest AUC values for the KC2 data set. This empirical observation may indicate that the KC2 data set has very few or no noisy modules, an observation we also made in our previous research [3]. Although the CM1 and KC2 data sets have roughly the same number of modules, the classifiers' performances are higher for KC2 than for CM1. Furthermore, KC2 has approximately twice as many lines of code as CM1, even though they have roughly the same number of modules. According to these observations, we can conclude that the SVM performance variations for the KC2 and CM1 data sets may be due to differences in lines of code per method.

5 Conclusions and Future Works

Software defect prediction using only limited defect data is a challenging research area in which only very few studies have been conducted. Previous work was based on EM-based approaches, and in this study, we showed that the LDS-based defect prediction approach works much better than EM-based defect prediction [3]. The main contribution of our article is twofold. First, this study compares several semi-supervised algorithms for software defect prediction with limited data and suggests a classifier for further research. Second, we empirically show that LDS is usually the best semi-supervised classification algorithm for large data sets when there are limited defect data. In addition, the SVM algorithm worked well on smaller data sets. However, the difference between the performance of SVM and LDS for small data sets is small, and quality managers may directly use the LDS algorithm for semi-supervised defect prediction without considering the size of the data set. Because LDS provided high performance in our experiments, there is a potential research opportunity to investigate LDS further for better semi-supervised software fault prediction models. We did not encounter any article in the literature that used LDS for this problem. In the future, we are planning to compare the performance of LDS with the recent approaches explained in Section 2 and to correspond with their respective authors. We are also planning to use genetic algorithms to optimize the parameters of the LDS algorithm and propose a new algorithm that can achieve higher performance. In addition, case studies of other systems, such as open-source projects (e.g., Eclipse), may provide further validation of our study.

Received May 8, 2013; previously published online August 31, 2013.

Bibliography

[1] B. E. Boser, I. Guyon and V. Vapnik, A training algorithm for optimal margin classifiers, COLT 1992 (1992), 144–152.

[2] C. Catal, Software fault prediction: a literature review and current trends, Expert Syst. Appl. 38 (2011), 4626–4636.

[3] C. Catal and B. Diri, Unlabelled extra data do not always mean extra performance for semi-supervised fault prediction, Expert Syst. J. Knowledge Eng. 26 (2009), 458–471.

[4] C. Catal and B. Diri, A systematic review of software fault prediction studies, Expert Syst. Appl. 36 (2009), 7346–7354.

[5] O. Chapelle and A. Zien, Semi-supervised classification by low density separation, in: Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pp. 57–64, Society for Artificial Intelligence and Statistics, 2005.

[6] M. Fagan, Advances in software inspections, IEEE Trans. Software Eng. 12 (1986), 744–751.

[7] Y. Jiang, M. Li and Z. Zhou, Software defect detection with ROCUS, J. Comput. Sci. Technol. 26 (2011), 328–342.

[8] T. Joachims, Making large-scale SVM learning practical, in: Advances in Kernel Methods – Support Vector Learning, MIT Press, Cambridge, MA, 1999.

[9] T. Joachims, Learning to classify text using support vector machines, dissertation, 2002.

[10] E. Kocaguneli, B. Cukic and H. Lu, Predicting more from less: synergies of learning, in: 2nd International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013), pp. 42–48, West Virginia University, San Francisco, CA, 2013.

[11] M. Li, H. Zhang, R. Wu and Z. Zhou, Sample-based software defect prediction with active and semi-supervised learning, Automated Software Eng. 19 (2012), 201–230.

[12] R. J. A. Little and D. B. Rubin, Statistical analysis with missing data, Wiley, New York, 1987.

[13] H. Lu, B. Cukic and M. Culp, An iterative semi-supervised approach to software fault prediction, in: Proceedings of the 7th International Conference on Predictive Models in Software Engineering (PROMISE '11), Article 15, 10 pp., ACM, New York, NY, 2011.

[14] H. Lu, B. Cukic and M. Culp, Software defect prediction using semi-supervised learning with dimension reduction, in: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), pp. 314–317, ACM, New York, NY, 2012.

[15] T. Menzies, J. Greenwald and A. Frank, Data mining static code attributes to learn defect predictors, IEEE Trans. Software Eng. 33 (2007), 2–13.

[16] N. Seliya and T. M. Khoshgoftaar, Software quality estimation with limited fault data: a semi-supervised learning perspective, Software Qual. Contr. 15 (2007), 327–344.

[17] N. Seliya, T. M. Khoshgoftaar and S. Zhong, Semi-supervised learning for software quality estimation, in: Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '04), pp. 183–190, IEEE Computer Society, Washington, DC, 2004.

[18] R. Shatnawi, W. Li, J. Swain and T. Newman, Finding software metrics threshold values using ROC curves, J. Software Maintenance Evol. Res. Pract. 22 (2010), 1–16.

[19] J. Tian, Software quality engineering: testing, quality assurance, and quantifiable improvement, John Wiley & Sons, Hoboken, NJ, 2005.

[20] V. Vapnik, The nature of statistical learning theory, Springer, New York, NY, 1995.

[21] X. Zhu, Z. Ghahramani and J. D. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, ICML 2003 (2003), 912–919.
