

A Comprehensive Simulation Study on Optimal Scheduling for Software Projects

Frank Padberg

Fakultät für Informatik

Universität Karlsruhe

Germany

padberg@…a.de

preliminary version

1. Introduction

Problem and goal. The planning of software projects takes place under considerable uncertainty. For various reasons, the time needed to complete a development activity in a software project is hard to estimate. Feedback between activities in the process causes rework and delays. These circumstances make project scheduling a notoriously difficult task for the managers of software projects.

The goal of this paper is to begin a systematic study of how the choice of good (or even optimal) scheduling decisions in a software project depends on certain characteristics of the project or product, such as the strength of the coupling between the components or the degree of specialization of the teams on different tasks. The ultimate goal of this research is to develop guidelines for managers on how to schedule their software projects in the best possible way.

Software process model. This paper uses a scheduling model for software projects which we have presented earlier in detail [SPIP 2002, APSEC 2002, APSEC 2001]. In the model, the software is developed by teams who work in parallel on the software’s components. At any time during the project, each team works on at most one component. The assignment of the components to the teams may change during the project, that is, scheduling takes place.

The process model has a unique way to model feedback between different activities. From time to time, some team will detect a defect, an inconsistency, or some other problem in the software’s high-level design. Since the components are coupled, for example, through common interfaces, the necessary redesign will force a number of components to be reworked. Therefore, the teams do not work independently; the progress of one task not only depends on the productivity of the corresponding team, but also on the progress of the other tasks.

MDP model. Technically, the software process is modelled as a Markov decision process (MDP). The software process moves between states over time. In each state, a scheduling action is chosen by the manager. Once an action has been chosen, the transition to the next state is governed by the transition probabilities. Each transition bears some cost. Hence, the software process appears as a stochastic system.

For the software process MDP, the project state carries information about the net progress (excluding rework) that has been achieved for each component up to the current point in time. In addition, the latest task assignment and the amount of rework yet to be worked off for each component are included in the state. A transition to the next state takes place if a redesign occurs, or, if some team finishes its current component.
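The state description above can be sketched as a small data structure. The field names and encodings below are our own illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative sketch of a project state for the scheduling MDP
# (field names and encodings are assumptions, not the study's code).
@dataclass(frozen=True)
class ProjectState:
    net_progress: Tuple[int, ...]  # net work units completed per component
    rework: Tuple[int, ...]        # rework units still to be worked off
    assignment: Tuple[int, ...]    # component index each team works on

    def is_finished(self, sizes: Tuple[int, ...]) -> bool:
        # done once every component is complete and carries no rework
        return all(p >= s and r == 0
                   for p, s, r in zip(self.net_progress, sizes, self.rework))

state = ProjectState(net_progress=(3, 1, 0, 0),
                     rework=(0, 2, 0, 0),
                     assignment=(1, 3))          # team One on B, team Two on D
print(state.is_finished(sizes=(3, 4, 5, 6)))    # False: B, C, D are unfinished
```

A transition would then either advance the net progress of a component, or, on a redesign, add rework units to the coupled components.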

In order to compute the transition probabilities for the software process or to simulate a project path, some input data are required: the base probabilities and the dependency degrees. The base probabilities are a measure for the pace at which the teams have made progress in past projects. The dependency degrees are a probabilistic measure for the strength of the coupling between the components. We give examples for the input data in the next section.
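The shape of the two kinds of input data can be illustrated as follows; all concrete numbers here are hypothetical placeholders, not the values used in the study:

```python
# Illustrative input data for the model (all numbers hypothetical).
components = ["A", "B", "C", "D"]

# Base probabilities: a per-team measure of the pace observed in past
# projects (here collapsed to a single per-step success probability).
base_probability = {"One": 0.6, "Two": 0.5}

# Dependency degrees: probability that a design problem originating in
# component x forces rework in component y. A single constant for every
# ordered pair corresponds to the "uniform" coupling setting.
dependency_degree = {(x, y): 0.2
                     for x in components for y in components if x != y}

print(len(dependency_degree))   # 12 ordered pairs for 4 components
```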

Optimal policies. A rule or table which specifies for each possible state which scheduling action to take in that state is called a strategy or policy. Since the process model is stochastic, the best one can achieve is a policy which minimizes the expected project cost. It is well-known in Operations Research how to compute such an optimal scheduling policy for an MDP using stochastic dynamic programming. The computational effort grows exponentially with the number of states of the MDP, though; hence, exact optimization is feasible only if the instance under study is not too large, that is, if the instance does not have too many states and transitions in each state (branching factor).

List policies. A list policy uses a fixed priority list for the components to prescribe an order in which the components must be developed. When a team finishes its current component, it is allocated to the next unprocessed component in the list. Hence, list policies keep all teams busy all the time. In a stochastic setting, the task completion times are not known in advance. Thus, the actual schedule selected by a list policy depends on the order in which the teams finish their tasks, which is subject to chance. List policies are not very flexible, but common in practice.
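As a minimal sketch (our own illustration, not code from the study), the dispatch rule of a list policy can be written as:

```python
# When a team becomes free, a list policy allocates it to the first
# component in the priority list that no team has started yet.
def next_component(priority_list, started):
    for comp in priority_list:
        if comp not in started:
            return comp
    return None   # every component has been started; the team stays idle

# Under list policy ABCD, with A and B already taken, a newly free team
# gets C, regardless of which team finished first:
print(next_component(["A", "B", "C", "D"], started={"A", "B"}))   # C
```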

Approach. We study the impact of certain project and product factors, such as the strength of the coupling between the components, on the performance of various scheduling policies in a small (hypothetical) sample project. Cost is defined to be the project duration. We fix a set of two teams (with varying productivity) and four components (of varying complexity and risk-level), and then systematically vary:

• the strength of the coupling between the components (minimal, uniform, maximal, asymmetric);

• the degree of specialization of the teams on certain components (see below);

• the scheduling strategy (all possible list policies and the optimal policy).

This procedure yields a set of more than 80 different sample projects, plus 25 different scheduling policies. The optimal policy for some project setting is computed using a particular dynamic programming algorithm, called value iteration. To evaluate the performance of the various policies, we simulate a large number of project paths (trajectories) for each possible combination of sample project and scheduling policy. We have already seen in [SPIP 2002] that list policies are a good starting point for analyzing scheduling decisions in our model. In this paper, we proceed along similar lines, but in addition analyze the behavior of the optimal policy.

Role of simulation. Process simulation is a key technique in this study. We use simulation extensively to evaluate the scheduling policies under different project settings. For each policy (including the optimal policy) and setting, we simulate 10,000 full project trajectories. We observe the project cost for each trajectory and use the corresponding frequency table and mean value as an approximation of the true cost distribution and expected cost of the policy. For a given setting, we use the simulation results to determine the best list policy. We then compare the performance of the best list policy against the optimal policy.
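The evaluation loop can be sketched as follows; `simulate_project` is a hypothetical stand-in that, in the real study, would run the process model under the given policy:

```python
import random
from collections import Counter

# Sketch of the evaluation loop. simulate_project is a placeholder: it
# should run the process model under the given policy and return the
# observed project cost; here it just draws a toy duration.
def simulate_project(policy, rng):
    return 8 + rng.randrange(5)

def evaluate(policy, runs=10_000, seed=0):
    rng = random.Random(seed)
    costs = [simulate_project(policy, rng) for _ in range(runs)]
    freq = Counter(costs)              # approximates the cost distribution
    mean_cost = sum(costs) / runs      # approximates the expected cost
    return freq, mean_cost

freq, mean_cost = evaluate("ABCD")
print(sum(freq.values()))   # 10000 trajectories in the frequency table
```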

In future studies, process simulation will be indispensable for analyzing the behavior of policies in detail, most notably, the scheduling decisions made by an optimal policy. Even our small examples have up to several hundred thousand states, and the optimal policy stores an optimal action for each state. Hence, it is hard to see from the state-action table how the policy actually behaves. In such cases, simulation quickly provides valuable summary information about a policy since we can easily collect data such as the actual task assignments and cost for each component while simulating a project trajectory.

Results. For first results, please refer to section 4 (analysis of simulation results) and section 5 (conclusions).

Previous PROSIM work. This paper is a continuation of work presented at PROSIM 2000 and PROSIM 2003. Compared to our previous work, we now provide a whole set of related sample projects, compute the optimal policy for each example, compare the performance of the best list policy against the optimal policy, and study the impact of important project factors on the project cost and scheduling decisions.

2. Input Data for the Sample Project

Components. The sample project has 4 components (A, B, C, and D) of varying complexity and risk-level. Differences between the components with respect to their complexity and risk-level are modelled in the following way:

• The probability distributions for the net development times of the components have different means and variances. The expected net effort for A and B is smaller than for C and D. The distributions for C and D have a higher variance than the distributions for A and B.

• The probability that design problems will originate from a component varies from component to component. The risk is lowest for component A and highest for component D.

According to their complexity and risk-level, the components are ranked A < B < C < D. Note that for a project with 4 components there are 4! = 24 different list policies, denoted as ABCD, ABDC, …, DCBA.
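Enumerating the 24 list policies is straightforward, for example:

```python
from itertools import permutations

# All list policies for the four components are simply the
# permutations of the string "ABCD":
policies = ["".join(p) for p in permutations("ABCD")]
print(len(policies))               # 24
print(policies[0], policies[-1])   # ABCD DCBA
```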

Teams. The project has 2 teams (One and Two). The teams work simultaneously on the project. Each team may be specialized on certain components. In this simulation study, specialization is expressed by input base distributions which have a smaller mean than those of non-specialized teams; see below.

Base Probabilities. The base probabilities for component A consist of two parts: the probabilities P(D_i^A(t)) specify how likely it is that team i will finish component A after t time units, and are taken from a binomial distribution.
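Assuming a binomial shape for the base distributions as stated above, the completion-time probabilities follow the binomial probability mass function; the parameters n and p below are hypothetical, not the study's actual inputs:

```python
from math import comb

# Binomial completion-time probabilities (n and p are hypothetical;
# the study's actual base distributions are not reproduced here).
def binom_pmf(t, n, p):
    # probability of finishing after exactly t of n possible time units
    return comb(n, t) * p**t * (1 - p)**(n - t)

probs = [binom_pmf(t, 10, 0.5) for t in range(11)]
print(max(range(11), key=probs.__getitem__))   # 5: the most likely finish time
print(abs(sum(probs) - 1.0) < 1e-12)           # True: the pmf sums to one
```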


3. Optimization and Simulations

The simulations in this study are based on a re-implementation of the stochastic process model presented in [PROSIM 2003]. The new implementation is written in C# under .NET instead of the ModL language that comes with the simulation environment EXTEND. The C# code is much faster, of course.

In addition, we have implemented an algorithm to compute an optimal scheduling policy in our model for each given set of input data. The algorithm is exact up to the precision of the floating-point arithmetic. The algorithm implements value iteration [Bertsekas 1995, Ross 1983] but also takes advantage of the fact that our software process MDP has no cycles. This yields a faster algorithm which combines value iteration with depth-first search for accessible states.
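On an acyclic MDP, value iteration combined with depth-first search amounts to a memoized backward induction. The tiny MDP instance below is hypothetical and only illustrates the recursion, not the study's implementation:

```python
from functools import lru_cache

# Memoized backward induction: because the MDP has no cycles, the optimal
# expected cost-to-go of a state can be computed by a single depth-first
# pass over the accessible states.
def make_cost_function(transitions, terminal):
    @lru_cache(maxsize=None)
    def cost(state):
        if state in terminal:
            return 0.0
        # choose the action minimizing the expected cost-to-go; each
        # action maps to a list of (probability, step cost, successor)
        return min(
            sum(p * (c + cost(succ)) for p, c, succ in outcomes)
            for outcomes in transitions[state].values()
        )
    return cost

# Tiny hypothetical instance: action "a" terminates at cost 1, action "b"
# is free but risks a costly detour through "mid".
transitions = {
    "start": {"a": [(1.0, 1.0, "done")],
              "b": [(0.5, 0.0, "done"), (0.5, 3.0, "mid")]},
    "mid":   {"a": [(1.0, 1.0, "done")]},
}
cost = make_cost_function(transitions, terminal=frozenset({"done"}))
print(cost("start"))   # 1.0: "a" beats "b"'s expected 0.5*0 + 0.5*(3+1) = 2.0
```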

For each set of dependency degrees (minimal, uniform, maximal, and asymmetric) and each team specialization coding (0000 through 2221), we carried out the following steps:

• compute the optimal policy;

• simulate all 24 possible list policies;

• simulate the optimal policy.

During optimization, we recorded the CPU time required for computing the optimal policy, the number of different states of the MDP, and the expected project cost under the optimal policy; this is also called the optimal cost.

In the simulations, we ran 10,000 full project trajectories for each policy and recorded the simulated cost distribution and the corresponding mean cost. A comparison of the optimal cost against the simulated mean cost for the optimal policy showed that 10,000 simulation runs were enough to get a reliable picture for our sample project.
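The adequacy of 10,000 runs can also be seen from a standard-error argument; the cost samples below are toy numbers, not the study's data:

```python
import random
from statistics import mean, stdev

# With n runs, the standard error of the simulated mean cost is
# sd / sqrt(n); for costs spread over a few time units, 10,000 runs pin
# the mean down to a few hundredths of a time unit.
rng = random.Random(42)
costs = [8 + 4 * rng.random() for _ in range(10_000)]   # hypothetical samples
standard_error = stdev(costs) / len(costs) ** 0.5
print(standard_error < 0.05)   # True: far below the differences in the tables
```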

In general, the effort for computing an optimal strategy grows exponentially with the number of different states of an MDP. Hence, exact optimization is feasible only if the instance under study is not too large. The following table summarizes, for our sample project with different sets of dependency degrees and different team specializations, the number of MDP states and the CPU time required for computing the optimal policy:

coupling               minimal    uniform       maximal    asymmetric
min. num. of states    2,685      293,037       87,809     228,183
max. num. of states    2,833      336,615       103,477    254,651
max. CPU time          1 sec      8 min 19 sec  59 sec     6 min 13 sec

The computations were carried out on a Pentium III processor with a 700 MHz clock rate and 256 MB of main memory.

4. Analysis of Simulation Results

Optimal cost. Recall that the optimal cost is defined as the expected cost of the optimal policy. Figure 1 shows the optimal cost for all combinations of team specialization and coupling strength that we studied for the sample project. The stronger the coupling between the components, the higher the optimal cost. This is as expected, because the risk of change propagation and rework increases with the strength of the coupling.

The optimal cost is highest if none of the teams are specialized; see the bars for specialization coding 0000. The optimal cost in general is lowest if for each component there is a team which is specialized on that component; see, for example, the bars for specialization coding 1212. Again, this is as expected, because specialized teams have a shorter expected net development time for the components than non-specialized teams.

Figure 2 shows the relative performance advantage of the optimal policy over the best list policy for various project settings. For the optimal policy, we used the exact expected cost computed during optimization; for the best list policy, we used the simulated mean cost. We analyze the figure in the following subsections.

The remainder of this section is organized along the dimension “degree of specialization.” We shall call the components A and B “small” because they have a shorter expected net development time than the “large” components C and D, independent of which team is assigned to work on them.


The height of the bars in Figure 2 shows that the stronger the coupling between the components, the smaller is the improvement which dynamic scheduling can achieve over the best list policy. A possible explanation is that, due to the increased process feedback and rework that come with stronger coupling, the future behavior of the process becomes more and more unpredictable; thus, it gets harder for the dynamic strategies to make better decisions than the list policies.

The following table shows that the optimal cost is smallest if one of the teams is specialized on one large and one small component, whereas the other team is specialized on the remaining large and small component (cases 1221 and 1212). The optimal cost grows if the specialization is unbalanced, that is, if one of the teams is specialized on more components than the other team (see, for example, case 2212). The optimal cost also grows if one team is specialized on the large components and the other team on the small components (see, for example, case 2211). This observation is independent of the coupling strength.

coupling                           minimal        uniform        maximal        asymmetric
optimal cost (cases 1212 / 1221)   7.52 / 7.50    8.18 / …       8.74 / …       8.22 / 7.97
optimal cost (cases 2212 / 2221)   8.20 / 8.11    9.05 / 9.04    9.86 / 9.92    8.95 / 8.93
optimal cost (cases 1222 / 2122)   8.82 / 8.81    9.59 / …       10.29 / 10.33  9.68 / 9.70
optimal cost (case 2211)           8.81           9.74           10.51          9.83
optimal cost (case 1111)           9.10           9.98           10.84          9.79

Specialized teams for two components only. These settings fall into three groups: the first group contains the cases 0011 and 0012, where the teams are specialized on the two large components; the second group holds the cases 0101, 0102, 0110, 0120, 1001, 1002, 1010, and 1020, where the teams are specialized on one large and one small component; and the third group contains the cases 1100 and 1200, where the teams are specialized on the two small components.

In all three groups, the best list policy always is close to optimal; see Figure 2. We are currently analyzing the details, but it looks as if the best list policy can take advantage of the specialization in the same way the optimal policy does.

The following table shows the optimal cost in each group. The specializations complement each other if one team is specialized on one component and the other team on the other component, as opposed to one and the same team being specialized on both components. In all three groups, complementary specialization shows a lower optimal cost than non-complementary specialization (see, for example, setting 0101 versus 0102). The difference increases with the strength of the coupling.

Suppose that the strength of the coupling is fixed. If we restrict the analysis to settings with complementary specialization, the optimal cost is smallest in the first group and largest in the third group (see, for example, setting 0012 versus 1200). In addition, the optimal cost shows little variation within the second group; the corresponding median lies between the optimal cost of the first and third group. Similar statements hold if we restrict the analysis to settings with non-complementary specialization (see, for example, setting 0011 versus 1100). Given our particular choice of the base probabilities, the expected net development time for both large and small components is about 50 percent shorter with a specialized team than with a non-specialized team. Therefore, it is better to have specialized teams for the large, expensive components than for the small ones.

coupling                           minimal        uniform        maximal        asymmetric
optimal cost of 0011 versus 0012   9.50 / 9.39    10.23 / 10.08  10.87 / 10.70  10.29 / 9.92
optimal cost of 0101 versus 0102   10.23 / 9.96   11.22 / 10.73  12.27 / 11.37  11.07 / 10.56
optimal cost of 0110 versus 0120   10.21 / 10.06  11.21 / 10.81  12.31 / 11.43  11.14 / 10.38
optimal cost of 1001 versus 1002   10.32 / 9.95   11.29 / 10.72  12.33 / 11.34  11.26 / 10.72
optimal cost of 1010 versus 1020   10.28 / 10.03  11.29 / 10.77  12.36 / 11.39  11.13 / 10.59
optimal cost of 1100 versus 1200   11.11 / 10.45  12.04 / 11.10  12.86 / 11.69  11.93 / 10.96

5. Conclusions

In the analysis presented in this paper, we have focused on:

• the impact of the strength of the coupling on the optimal cost;

• the performance difference between the best list policy and the optimal policy;

• the impact of the degree of specialization of the teams;

• the impact of the relative “size” of the components.

We have tried to take a representative sample from the input space along the dimensions “specialization” and “coupling.” Although the effort for optimizing an MDP in general grows exponentially with the size of the problem, we found that small examples can be optimized and simulated in reasonable computing time. Based on the study of our small sample projects, we reach the following preliminary conclusions:

• The best list policy in general is not optimal.

• The more teams are specialized on certain components, the larger is the performance advantage of the optimal policy over the best list policy.

• The stronger the coupling between the components, the smaller is the improvement which dynamic scheduling can achieve over the best list policy.

In our sample projects, the advantage of the optimal policy over the best list policy ranged between 2 and 5.5 percent of the expected project cost. We argue that the improvement will grow much larger with the size of the project, since then the dynamic strategies will have more opportunities to take advantage of their knowledge of the component sizes, team productivity, coupling, and current project state. This is subject to future studies.

Acknowledgments

The simulation and optimization program was written by Max Horstmann as part of his diploma thesis. This research is supported by the Deutsche Forschungsgemeinschaft DFG under the project title OASE.

References

[SPIP 2002] Padberg: A discrete simulation model for assessing software project scheduling policies. Int. Journal on Software Process Improvement and Practice 7 (2002) 127-139

[ICSE 2003] Padberg: A software process scheduling simulator. Int. Conference on Software Engineering 25 (2003) 816-817

[PROSIM 2003] Padberg: Applying process simulation to software project scheduling. Int. Software Process Simulation Modeling Workshop 2003

[APSEC 2002] Padberg: Using process simulation to compare scheduling strategies for software projects. Asia-Pacific Software Engineering Conference 9 (2002) 581-590

[APSEC 2001] Padberg: Scheduling software projects to minimize the development time and cost with a given staff. Asia-Pacific Software Engineering Conference 8 (2001) 187-194

[PROSIM 2000] Padberg: Towards optimizing the schedule of software projects with respect to development time and cost. Int. Software Process Simulation Modeling Workshop 2000

[Bertsekas 1995] Bertsekas: Dynamic Programming and Optimal Control I, II. Athena Scientific, 1995

[Ross 1983] Ross: Introduction to Stochastic Dynamic Programming. Academic Press, 1983
