
Parallel Algorithms for Forward and Back Substitution in Direct Solution of Sparse Linear Systems

Anshul Gupta
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
anshul@watson.ibm.com

Vipin Kumar
Department of Computer Science
University of Minnesota
Minneapolis, MN 55455
kumar@cs.umn.edu

Abstract

A few parallel algorithms for solving triangular systems resulting from parallel factorization of sparse linear systems have been proposed and implemented recently. We present a detailed analysis of parallel complexity and scalability of the best of these algorithms and the results of its implementation on up to 256 processors of the Cray T3D parallel computer. It has been a common belief that parallel sparse triangular solvers are quite unscalable due to a high communication to computation ratio. Our analysis and experiments show that, although not as scalable as the best parallel sparse Cholesky factorization algorithms, parallel sparse triangular solvers can yield reasonable speedups in runtime on hundreds of processors. We also show that for a wide class of problems, the sparse triangular solvers described in this paper are optimal and are asymptotically as scalable as a dense triangular solver.

1 Introduction

The process of obtaining a direct solution to a sparse system of linear equations usually consists of four phases: reordering, symbolic factorization, numerical factorization, and solving the lower- and upper-triangular systems resulting from factorization. Since numerical factorization is computationally the most expensive phase, a significant research effort has been directed towards developing efficient and scalable parallel sparse factorization algorithms. We have recently proposed [4] a parallel sparse Cholesky factorization algorithm that is optimally scalable for a wide class of problems. Experiments have shown that this algorithm can easily speed up Cholesky factorization by a factor of at least a few hundred on up to 1024 processors. With such speedups in numerical factorization, it is imperative that the remaining phases of the solution process be parallelized effectively in order to scale the performance of the overall solver. Furthermore, without an overall parallel solver, the size of the sparse systems that can be solved may be severely restricted by the amount of memory available on a uniprocessor system.

In this paper, we address the problem of performing the final phase of forward and backward substitution in parallel on a distributed memory multiprocessor. We present a detailed analysis of the parallel complexity and scalability of the parallel algorithm described briefly in [5] to obtain a solution to the system of sparse linear equations of the forms LY = B and UX = Y, where L is a lower triangular matrix and U is an upper triangular matrix. Here L and U are obtained from the numerical factorization of a sparse coefficient matrix A of the original system AX = B to be solved. If A, L, and U are N × N matrices, then X, Y, and B are N × m matrices, where m is the number of right-hand side vectors for which the solution to the sparse linear system with A as the coefficient matrix is desired. Our analysis and experiments show that, although not as scalable as the best parallel sparse Cholesky factorization algorithms, parallel sparse triangular solvers can yield reasonable speedups in runtime on hundreds of processors. We also show that for a wide class of problems, the sparse triangular solvers described in this paper are optimal and are asymptotically as scalable as a dense triangular solver.
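For reference, the sequential computation that these parallel algorithms distribute is ordinary forward and back substitution. The following minimal numpy sketch (our own illustration, not the paper's code, and ignoring sparsity) shows both solves for a single right-hand side:

import numpy as np

def forward_substitution(L, b):
    # Solve L y = b, where L is a dense lower triangular matrix.
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    # Solve U x = y, where U is a dense upper triangular matrix.
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

With m right-hand sides, b, y, and x become n × m arrays and the inner products become matrix products, exactly as noted above.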

For a single right-hand side (m = 1), our experiments show a 256-processor performance of up to 435 MFLOPS on a Cray T3D, on which the single-processor performance for the same problem is ≈ 8.6 MFLOPS. With m = 30, the maximum single-processor and 256-processor performance observed in our experiments is ≈ 30 MFLOPS and ≈ 3050 MFLOPS, respectively. To the best of our knowledge, this is the highest performance and speedup for this problem reported on any massively parallel computer.

In addition to the performance and scalability analysis of parallel sparse triangular solvers, we discuss the redistribution of the triangular factor matrix among the processors between numerical factorization and triangular solution, and its impact on performance. In [4], we describe an optimal data-distribution scheme for Cholesky factorization of sparse matrices. This distribution leaves groups of consecutive columns of L with an identical pattern of non-zeros (henceforth called supernodes) with a two-dimensional partitioning among groups of processors. However, this distribution is not suitable for the triangular solvers, which are scalable only with a one-dimensional partitioning of the supernodal blocks of L. We show that if the supernodes are distributed in a subtree-to-subcube manner [2], then the cost of converting the two-dimensional distribution to a one-dimensional distribution is only a constant times the cost of solving the triangular systems. From our experiments, we observed that this constant is fairly small on the Cray T3D: at most 0.9 for a single right-hand side vector among the test cases used in our experiments. Of course, if more than one system needs to be solved with the same coefficient matrix, then the one-time redistribution cost is amortized.

2 Algorithm Description

In this section, we describe parallel algorithms for sparse forward elimination and backward substitution, which have been discussed briefly in [5]. The description in this section assumes a single right-hand side vector; however, the algorithm can easily be generalized to multiple right-hand sides by replacing all vector operations by the corresponding matrix operations.

Figure 1: A symmetric sparse matrix and the associated elimination tree with subtree-to-subcube mapping onto 8 processors. The nonzeros in the original matrix are denoted by the symbol "×" and fill-ins are denoted by the symbol "○".

2.1 Forward elimination

The basic approach to forward elimination is very similar to that of multifrontal numerical factorization [12] guided by an elimination tree [13, 8], with the distribution of computation determined by a subtree-to-subcube mapping [2]. A symmetric sparse matrix, its lower triangular Cholesky factor, and the corresponding elimination tree with subtree-to-subcube mapping onto 8 processors are shown in Figure 1. The computation in forward elimination starts with the leaf supernodes of the elimination tree and progresses upwards to terminate at the root supernode. A supernode is a set of columns i_1, i_2, ..., i_t of the sparse matrix such that all of them have non-zeros in identical locations and i_{j+1} is the parent of i_j in the elimination tree for 1 ≤ j < t.

As in the case of multifrontal numerical factorization [12], the computation in forward and backward triangular solvers can also be organized in terms of dense matrix operations. In forward elimination (see Figure 2), before the computation starts at a supernode, the elements of the right-hand side vector with the same indices as the nodes of the supernode are collected in the first t contiguous locations in a vector of length n. The remaining n − t entries of this vector are filled with zeros. The computation corresponding to a trapezoidal supernode, which starts at the leaves, consists of two parts. The first computation step is to solve the dense triangular system at the top of the trapezoid (above the dotted line in Figure 2). The second step is to subtract the product of the vector of length t (above the dotted line) with the (n − t) × t submatrix of L (below the dotted line) from the vector of length n − t (below the dotted line). After these two computation steps, the entries in the lower part of the vector of length n − t are subtracted from the corresponding (i.e., with the same index in the original matrix) entries of the vector accompanying the parent supernode. The computation at any supernode in the tree can commence after the contributions from all its children have been collected. The algorithm terminates after the computation at the triangular supernode at the root of the elimination tree.

Figure 2: Pictorial representation of forward elimination along three levels of an elimination tree. The color of an RHS box is determined by the color(s) of the box(es) at the next lower level that contribute to its value.
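A hedged sequential sketch of the two computation steps at a trapezoidal supernode (the function and variable names are ours, not the paper's; S holds the dense n × t supernode as a full array):

import numpy as np
from scipy.linalg import solve_triangular

def supernode_forward(S, rhs):
    # S: dense n-by-t trapezoidal supernode of L (t-by-t lower
    #    triangle on top, (n-t)-by-t rectangle below).
    # rhs: vector of length n gathered for this supernode; the first
    #      t entries correspond to the supernode's columns.
    n, t = S.shape
    # Step 1: solve the dense triangular system at the top.
    rhs[:t] = solve_triangular(S[:t, :], rhs[:t], lower=True)
    # Step 2: subtract the product of the (n-t)-by-t submatrix with
    # the upper part of the vector from the lower n-t entries.
    rhs[t:] -= S[t:, :] @ rhs[:t]
    # The lower n-t entries are later subtracted from the matching
    # entries of the vector accompanying the parent supernode.
    return rhs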

In a parallel implementation on p processors, a supernode at level l (see Figure 1) from the top is distributed among p/2^l processors. The computation at a level greater than or equal to log p is performed sequentially on the single processor assigned to that subtree. However, the computation steps mentioned above must be performed in parallel on p/2^l processors for a supernode at a level l with 0 ≤ l < log p.

In [6], Heath and Romine describe efficient pipelined or wavefront algorithms for solving dense triangular systems with block-cyclic row-wise and column-wise partitioning of the triangular matrices. We use variations of the same algorithms on the dense trapezoidal supernodes at each of the parallel levels of the elimination tree. The number of processors among which a supernode is partitioned varies with its level in the tree, but the same basic parallel algorithm is used for each supernode. Figure 3(a) shows hypothetical forward elimination on a supernode with an unlimited number of processors on an EREW-PRAM. From this figure, it is clear that, due to data dependencies, at a time only max(t, n/2) processors can remain busy. Since the computation proceeds along a diagonal wave from the upper-left to the lower-right corner of the supernode, at any given time, only one block per row and one element per column is active. From this observation, it can be shown that an efficient parallel algorithm (an algorithm capable of delivering a speedup of Θ(p) using p processors) for forward elimination must employ a one-dimensional row-wise or column-wise partitioning of the supernode so that all processors can be busy at all times (or most of the time). From a practical perspective, we chose a row-wise block-cyclic partitioning because n ≥ t and a more uniform partitioning with reasonable block sizes can be obtained if the rows are partitioned. Figures 3(b) and (c) illustrate two variations of the pipelined forward elimination with block-cyclic row-wise partitioning of the supernode. Each box in the figure can be regarded as a b × b square block of the supernode (note that the diagonal boxes represent lower triangular blocks). In the column-priority algorithm, the computation along a column of the supernode is finished before a new column is started. In the row-priority algorithm, the computation along a row is finished before a new row is started.

Figure 3: Progression of computation consistent with data dependencies in parallel pipelined forward elimination in a hypothetical supernode of the lower-triangular factor matrix L: (a) pipelined computation on an EREW-PRAM with an unlimited number of processors; (b) row-priority pipelined computation with cyclic mapping of rows onto four processors; (c) column-priority pipelined computation with cyclic mapping of rows onto four processors. The number in each box of L represents the time step in which the corresponding element of L is used in the computation. Communication delays are ignored in this figure and the computation time for each box is assumed to be identical. In parts (b) and (c), the supernode is partitioned among the processors using a cyclic mapping. A block-cyclic mapping can be visualized by regarding each box as a b × b block (the diagonal boxes will represent triangular blocks).
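To make the row-wise block-cyclic partitioning concrete, here is a hedged one-line sketch (ours, not the paper's code) giving the owner of each row when a supernode is shared by q processors with block size b:

def row_owner(i, b, q):
    # Rows are grouped into blocks of b consecutive rows, and the
    # blocks are dealt out to the q processors in round-robin order.
    return (i // b) % q

For example, with b = 2 and q = 4, rows 0 and 1 belong to processor 0, rows 2 and 3 to processor 1, and rows 8 and 9 wrap around to processor 0 again.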

2.2 Backward substitution

The algorithm for parallel backward substitution is very similar. Since an upper triangular system is being solved, the supernodes are organized as dense trapezoidal matrices of height t and width n (n ≥ t), and a column-wise block-cyclic partitioning is used at the top log p levels of the elimination tree. In backward substitution, the computation starts at the root of the elimination tree and progresses down to the leaves. First, the entries from the right-hand side vector with the same indices as the nodes of a supernode are collected in the first t contiguous locations of a vector of length n. The remaining n − t entries of this vector are copied from the entries with the same indices in the vector accompanying the parent supernode. This step is not performed for the root supernode, which does not have a parent and for which n = t. The computation at a supernode consists of two steps and can proceed only after the computation at its parent supernode is finished. The first computation step is to subtract the product of the t × (n − t) rectangular portion of the supernode with the lower part of the vector of size n − t from the upper part of the vector of size t. The second step is to solve the triangular system formed by the t × t triangle of the trapezoidal supernode and the upper part of the vector of size t. Just like forward elimination, these steps are carried out serially for supernodes at levels greater than or equal to log p in the elimination tree. For the supernodes at levels 0 through log p − 1, the computation is performed using a pipelined parallel algorithm. Figure 4 illustrates the pipelined algorithm on four processors with column-wise cyclic mapping. The algorithm with a block-cyclic mapping can be visualized by regarding each box in Figure 4 as a square block (the blocks along the diagonal of the trapezoid are triangular) of size b × b.
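The corresponding hedged sequential sketch of the per-supernode computation in backward substitution (names ours; S holds the dense t × n supernode):

import numpy as np
from scipy.linalg import solve_triangular

def supernode_backward(S, rhs):
    # S: dense t-by-n trapezoidal supernode of U (t-by-t upper
    #    triangle on the left, t-by-(n-t) rectangle on the right).
    # rhs: vector of length n; rhs[t:] was copied from the parent.
    t, n = S.shape
    # Step 1: subtract the product of the rectangular part with the
    # lower n-t entries of the vector from the upper t entries.
    rhs[:t] -= S[:, t:] @ rhs[t:]
    # Step 2: solve the t-by-t upper triangular system.
    rhs[:t] = solve_triangular(S[:, :t], rhs[:t], lower=False)
    return rhs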

Figure 4: Column-priority pipelined backward substitution on a hypothetical supernode distributed among 4 processors using column-wise cyclic mapping.

In both the forward and backward triangular solvers described in this section, if the system needs to be solved with respect to more than one, say m, right-hand sides, then the vectors of length n are replaced by rectangular n × m matrices. The overall algorithms remain identical except that all vector operations are replaced by the corresponding matrix operations, the size of each matrix being the length of the vector times the number of vectors.

3 Analysis

In this section we derive expressions for the communication overheads and analyze the scalability of the sparse supernodal multifrontal triangular solvers described in Section 2. We present the analysis for the forward elimination phase only; however, the reader can verify that the expressions for the communication overhead are identical for backward substitution.

3.1 Communication overheads

It is difficult to derive analytical expressions for general sparse matrices because the location and amount of fill-in, and hence the distribution and number of non-zeros in L, is a function of the number and position of nonzeros in the original matrix. Therefore, we will focus on problems in which the original matrix is the adjacency matrix of a two- or three-dimensional neighborhood graph [14]. These classes of matrices include the coefficient matrices generated in all two- and three-dimensional finite element and finite difference problems. We also assume that a nested-dissection based fill-reducing ordering is used, which results in an almost balanced elimination tree. The subtree-to-subcube assignment of the elimination tree to the processors relies heavily on a balanced tree. Although there are bound to be overheads due to unequal distribution of work, it is not possible to model such overheads analytically because the extent of such overheads is data-dependent. From our experience with actual implementations of parallel triangular solvers as well as parallel factorization codes [4], we have observed that such overheads are usually not excessive. Moreover, the overhead due to load imbalance in most practical cases tends to saturate at 32 to 64 processors and does not continue to increase as the number of processors is increased. In the remainder of this section, we will concentrate on overheads due to inter-processor communication only.

Consider the column-priority pipelined algorithm for forward elimination shown in Figure 3(c). Let b be the block size in the block-cyclic mapping. A piece of the vector of size b is transferred from a processor to its neighbor in each step of the algorithm until the computation moves below the upper triangular part of the trapezoidal supernode. If a supernode is distributed among q processors, then during the entire computation at a supernode, q + t/b − 1 such communication steps are performed; q − 1 steps are required for the computation to reach the last processor in the pipeline and t/b steps to pass the entire data (of length t) through the pipeline. Thus, the total communication time is proportional to b(q − 1) + t, which is O(q) + O(t), assuming that b is a constant.
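As a concrete illustration (numbers ours): with block size b = 8, a supernode of width t = 512 distributed among q = 64 processors requires 64 + 512/8 − 1 = 127 communication steps, and the communication time is proportional to b(q − 1) + t = 8 × 63 + 512 = 1016, i.e., dominated by t once t ≫ bq.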

Besides the communication involved in the pipelined processing over a supernode, there is some more communication involved in collecting the contributions of the vectors associated with the children of a supernode into the vector associated with the parent supernode. If the two child supernodes are each distributed among q processors, then this communication is equivalent to an all-to-all personalized communication [8] among 2q processors with a data size of roughly t/q on each processor. This communication can be accomplished in time proportional to t/q, which is asymptotically smaller than the O(q) + O(t) time spent during the pipelined computation phase at the child supernodes. Therefore, in the remainder of this section, we will ignore the communication required to transfer the contributions of the vector across the supernodes at different levels of the elimination tree.

So far we have established that a time proportional to b(q − 1) + t (or roughly, bq + t) is spent while processing an n × t trapezoidal supernode on q processors with a block-cyclic mapping that uses blocks of size b. We can now derive an expression for the overall communication time for the entire parallel forward elimination process by substituting for q and t in the expression bq + t for a supernode at level l and summing the resulting expression over all levels.

Let us first consider a sparse linear system of N equations resulting from a two-dimensional finite element problem being solved on p processors. As a result of using the subtree-to-subcube mapping, q at level l is p/2^l. If a nested-dissection based ordering scheme is used to number the nodes of the graph corresponding to the coefficient matrix, then the number of nodes t in a supernode at level l is α√(N/2^l), where α is a small constant. Summing the per-supernode communication time bq + t over the top log p levels of the elimination tree, the overall communication time is proportional to

Σ_{l=0}^{log p − 1} (bp/2^l + α√(N/2^l)) = O(p) + O(√N),  (1)

since both sums are geometric series bounded by a constant times their largest terms. Similarly, for a sparse linear system resulting from a three-dimensional finite element problem, the number of nodes t in a supernode at level l is α(N/2^l)^{2/3}, and the overall communication time is proportional to

Σ_{l=0}^{log p − 1} (bp/2^l + α(N/2^l)^{2/3}) = O(N^{2/3}) + O(p).  (2)

If more than one (say m) right-hand side vectors are present in the system, then each term in Equations 1 and 2 is multiplied by m.

3.2 Scalability analysis

The scalability of a parallel algorithm on a parallel architecture refers to the capacity of the algorithm-architecture combination to effectively utilize an increasing number of processors. In this section we use the isoefficiency metric [8, 9, 3] to characterize the scalability of the algorithm described in Section 2. The isoefficiency function relates the problem size to the number of processors necessary to maintain a fixed efficiency or to deliver speedups increasing proportionally with the number of processors.

Let W be the size of a problem in terms of the total number of basic operations required to solve it on a serial computer. For example, W = O(N^2) for multiplying a dense N × N matrix with an N-vector. The serial run time of a problem of size W is given by T_S = t_c W, where t_c is the time to perform a single basic computation step. If T_P is the parallel run time of the same problem on p processors, then we define an overhead function T_o as pT_P − T_S. Both T_P and T_o are functions of W and p, and we often write them as T_P(W, p) and T_o(W, p), respectively. The efficiency of a parallel system with p processors is given by E = T_S/(T_S + T_o(W, p)). If a parallel system is used to solve a problem instance of a fixed size W, then the efficiency decreases as p increases, because the total overhead T_o(W, p) increases with p. For many parallel systems, for a fixed p, if the problem size W is increased, then the efficiency increases because T_o(W, p) grows slower than O(W). For these parallel systems, the efficiency can be maintained at a desired value (between 0 and 1) with increasing p, provided W is also increased. We call such systems scalable parallel systems. Note that for different parallel algorithms, W may have to increase at different rates with respect to p in order to maintain a fixed efficiency. The smaller the growth rate of problem size required to maintain a fixed efficiency as the number of processors is increased, the more scalable the parallel system is.
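As a small numerical illustration (numbers ours): if t_c W = 100 time units and T_o(W, p) = 25, then E = 100/125 = 0.8. To hold E at 0.8 while p (and hence T_o) grows, W must grow quickly enough that the ratio T_o(W, p)/(t_c W) remains 1/4.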

Given that E = 1/(1 + T_o(W, p)/(t_c W)), in order to maintain a fixed efficiency, W should be proportional to T_o(W, p). In other words, the following relation must be satisfied in order to maintain a fixed efficiency: W = e·T_o(W, p), where e is a constant that depends on the efficiency to be maintained.

For the triangular systems arising from the factorization of a sparse matrix associated with a two-dimensional neighborhood graph, the number of non-zeros in L, and hence the number of basic operations in the solver, is

W = O(N log N).  (3)

The total overhead T_o is p times the communication time given by Equation 1; that is,

T_o = O(p^2) + O(p√N).  (4)

Balancing W against the first term in the expression for T_o yields the following (see Appendix A for details):

W ∝ p^2,  (5)

and balancing it against the second term in the expression for T_o yields

W ∝ p^2/log p.  (6)

The first term dominates, so the overall isoefficiency function of the triangular solvers for two-dimensional problems is O(p^2). Similarly, for the triangular systems associated with three-dimensional neighborhood graphs,

W = O(N^{4/3})  (7)

and

T_o = O(p^2) + O(pN^{2/3}),  (8)

and balancing W against either term of T_o yields (see Appendix A)

W ∝ p^2.  (9)

The isoefficiency function of a parallel dense triangular solver with a one-dimensional partitioning is also O(p^2); hence, the sparse forward and backward triangular solvers are asymptotically as scalable as their dense counterparts. From this observation, it can be argued that the sparse algorithms, at least in the case of matrices associated with three-dimensional neighborhood graphs, are optimal. The topmost supernode in such a matrix is an N^{2/3} × N^{2/3} dense triangle. Solving a triangular system corresponding to this supernode involves asymptotically a computation of the same complexity as solving the entire sparse triangular system. Thus, the overall scalability cannot be better than that of solving the topmost N^{2/3} × N^{2/3} dense triangular system in parallel, which is O(p^2).

4 Data Distribution for Efficient Triangular Solution

In Section 2 and in [8], we discuss that in order to implement the steps of dense triangular solution efficiently, the matrix must be partitioned among the processors along the rows or along the columns. However, as we have shown in [4], the dense supernodes must be partitioned along both dimensions for the numerical factorization phase to be efficient. The table in Figure 5 shows the communication overheads and the isoefficiency functions for parallel dense and sparse factorization and triangular solution using one- and two-dimensional partitioning schemes. The most efficient scheme in each category is denoted by a shaded box in the table. The last column of the table shows the overall isoefficiency function of the combination of factorization and triangular solvers. Note that the triangular solvers are unscalable by themselves if the dense supernodal blocks of the triangular factor are partitioned in two dimensions. However, the asymptotic communication overhead of this unscalable formulation of the triangular solvers does not exceed the communication overhead of the factorization process. As a result, the overall isoefficiency function is dominated by that of factorization. Hence, for solving a system with a single right-hand side vector (or a small constant number of them), the unscalability of the triangular solvers should not be of much concern. However, if solutions with respect to a number of right-hand side vectors are required, then for both the factorization and triangular solution to be efficient together, each supernode must be redistributed among the processors that share it. This redistribution must convert the original two-dimensional block-cyclic partitioning into a one-dimensional block-cyclic partitioning. In this section we show that the time spent in this redistribution is not asymptotically higher than the parallel run time of the triangular solvers.

Figure 5: A table of communication overheads and isoefficiency functions for sparse factorization and triangular solution with different partitioning schemes.

Consider an n × t dense supernode mapped onto a √q × √q logical grid of processors using a two-dimensional partitioning. As shown in Figure 6, the redistribution is equivalent to a transposition of each (n/√q) × t horizontal strip of the supernode among the √q processors on which it is horizontally partitioned. This is an all-to-all personalized communication operation [8] among √q processors, and its cost is proportional to the amount of the supernode's data that each processor holds. Summed over all the supernodes, the redistribution time is therefore of the same order as the parallel run time of the triangular solvers themselves.

Figure 6: Converting the two-dimensional partitioning of a supernode into a one-dimensional partitioning.
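A hedged sketch of the block-level bookkeeping behind this conversion (the mapping conventions, such as row-major numbering of the processor grid, are our assumptions, not the paper's): for every b × b block of the supernode it pairs the block's owner under the two-dimensional block-cyclic distribution with its owner under the one-dimensional row-wise distribution, which makes the all-to-all personalized pattern noted above explicit.

import math

def owner_2d(bi, bj, sq):
    # Two-dimensional block-cyclic owner of block (bi, bj) on an
    # sq-by-sq processor grid, numbered in row-major order.
    return (bi % sq) * sq + (bj % sq)

def owner_1d(bi, q):
    # One-dimensional row-wise block-cyclic owner of block row bi.
    return bi % q

def transfer_pairs(nb_rows, nb_cols, q):
    # (source, destination) processor pair for every block of the
    # supernode; q is assumed to be a perfect square.
    sq = math.isqrt(q)
    return [(owner_2d(bi, bj, sq), owner_1d(bi, q))
            for bi in range(nb_rows) for bj in range(nb_cols)]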

5 Experimental Results

We implemented the parallel triangular solvers described in Section 2 on a Cray T3D, using factor matrices produced by the sparse Cholesky factorization algorithm of [4]. Figures 7 and 8 show the performance of the parallel triangular solvers on the Cray T3D.

In the table in Figure 7, we show the time in seconds and the performance in MFLOPS on a selected number of processors for five test matrices, with the number of right-hand side vectors varying from 1 to 30. To facilitate a comparison of the times for various phases of the solution process, the table also contains the factorization run time and MFLOPS, as well as the time to redistribute the factor matrix to convert the supernodes from a two-dimensional to a one-dimensional partitioning among the processors. As shown in Figure 7, for a single right-hand side vector, the highest performance achieved on a 256-processor Cray T3D is approximately 435 MFLOPS, which increases to over 3 GFLOPS if a solution with 30 right-hand side vectors is computed. Compared with the single-processor performance for BCSSTK15, this represents roughly 50- and 100-fold enhancements in performance on 256 processors for 1 and 30 right-hand side vectors, respectively. There are two other important observations to be made from the table in Figure 7. First, despite a highly scalable implementation of sparse Cholesky factorization, parallelization of the relatively less scalable triangular solvers can speed them up enough that their runtime is still a small fraction of the factorization time. Second, although efficient implementations of factorization and triangular solvers use different data partitioning schemes, the redistribution of the data, on average, takes much less time than the triangular solvers for a single right-hand side vector on the T3D.

Figure 7: A partial table of experimental results for sparse forward and backward substitution on a Cray T3D for five test matrices: BCSSTK15 (N = 3948; factorization opcount = 85.5 million; nonzeros in factor = 0.49 million), BCSSTK31 (N = 35588; factorization opcount = 2791 million; nonzeros in factor = 6.64 million), HSCT21954 (N = 21954; factorization opcount = 2822 million; nonzeros in factor = 5.84 million), COPTER2 (N = 55476; factorization opcount = 8905 million; nonzeros in factor = 12.77 million), and CUBE35 (N = 42875; factorization opcount = 7912 million; nonzeros in factor = 9.95 million). In the table, "NRHS" denotes the number of right-hand side vectors, "FBsolve time" denotes the total time spent in both the forward and the backward solvers, and "FBsolve MFLOPS" denotes the average performance of the solvers in million floating point operations per second.

Figure 8: Performance versus number of processors for parallel sparse triangular solutions with different numbers of right-hand side vectors (NRHS = 1, 2, 5, 10, 20, and 30) for BCSSTK15, BCSSTK31, CUBE35, and COPTER2.

Figure 8 shows the plots of MFLOPS versus the number of processors of the Cray T3D for triangular solutions with different numbers of right-hand side vectors. The curves for these four test matrices show that both the overall performance and the speedups are much higher if a block of right-hand side vectors is available for solution. The use of multiple right-hand side vectors enhances the single-processor performance due to effective use of BLAS-3 routines. It also improves speedups because the cost of certain index computations required in the parallel implementation can be amortized over all the right-hand side vectors.
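The BLAS-3 effect is easy to reproduce sequentially (a hedged illustration, not the paper's code): a single triangular solve with an n × m block of right-hand sides replaces m separate vector solves with matrix-matrix operations and typically runs considerably faster.

import numpy as np
from scipy.linalg import solve_triangular

n, m = 2000, 30
L = np.tril(np.random.rand(n, n)) + n * np.eye(n)  # well-conditioned lower triangle
B = np.random.rand(n, m)

Y_block = solve_triangular(L, B, lower=True)  # one block (BLAS-3 style) solve
Y_loop = np.column_stack([solve_triangular(L, B[:, j], lower=True)
                          for j in range(m)])  # m vector (BLAS-2 style) solves
assert np.allclose(Y_block, Y_loop)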

6 Concluding Remarks

Despite more inherent parallelism than dense linear systems, it has been a challenge to develop scalable parallel direct solvers for sparse linear systems. The process of obtaining a direct solution to a sparse system of linear equations usually consists of four phases: reordering, symbolic factorization, numerical factorization, and forward elimination and backward substitution. A scalable parallel solver for sparse linear systems must implement all these phases effectively in parallel. In [4], we introduced a highly scalable parallel algorithm for sparse Cholesky factorization, which is the most time-consuming phase of solving a sparse linear system with a symmetric positive definite (SPD) matrix of coefficients. In [7], Karypis and Kumar present an efficient parallel algorithm for a nested-dissection based fill-reducing ordering for such sparse matrices. The results of this paper bring us another step closer to a complete scalable direct solver for sparse SPD systems. In this paper, we have shown that although less scalable than numerical factorization, the forward and backward substitution steps can obtain sufficient speedup on hundreds of processors so that numerical factorization still dominates the overall time taken to solve the system in parallel. In addition, we show that, although efficient implementations of factorization and triangular solvers use different data partitioning schemes, the time spent in redistributing the data to change the partitioning schemes is not a bottleneck when compared to the time spent in factorization and triangular solutions.

References

[1] A. George and J. W.-H. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, Englewood Cliffs, NJ, 1981.

[2] A. George, J. W.-H. Liu, and E. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, pages 576–586. SIAM, Philadelphia, PA, 1987.

[3] Ananth Grama, Anshul Gupta, and Vipin Kumar. Isoefficiency: Measuring the scalability of parallel algorithms and architectures. IEEE Parallel and Distributed Technology, 1(3):12–21, August 1993. Also available as Technical Report TR93-24, Department of Computer Science, University of Minnesota, Minneapolis, MN.

[4] Anshul Gupta, George Karypis, and Vipin Kumar. Highly scalable parallel algorithms for sparse matrix factorization. Technical Report 94-63, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1994. Submitted for publication in IEEE Transactions on Parallel and Distributed Computing.

[5] M. T. Heath and Padma Raghavan. Distributed solution of sparse linear systems. Technical Report 93-1793, Department of Computer Science, University of Illinois, Urbana, IL, 1993.

[6] M. T. Heath and C. H. Romine. Parallel solution of triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 9(3):558–588, 1988.

[7] G. Karypis and V. Kumar. Parallel multilevel graph partitioning. Technical Report TR95-036, Department of Computer Science, University of Minnesota, 1995.

[8] Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings, Redwood City, CA, 1994.

[9] Vipin Kumar and Anshul Gupta. Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 22(3):379–391, 1994. Also available as Technical Report TR91-18, Department of Computer Science, University of Minnesota, Minneapolis, MN.

[10] R. J. Lipton, D. J. Rose, and R. E. Tarjan. Generalized nested dissection. SIAM Journal on Numerical Analysis, 16:346–358, 1979.

[11] R. J. Lipton and R. E. Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36:177–189, 1979.

[12] J. W.-H. Liu. The multifrontal method for sparse matrix solution: Theory and practice. Technical Report CS-90-04, York University, Ontario, Canada, 1990. Also appears in SIAM Review, 34:82–109, 1992.

[13] J. W.-H. Liu. The role of elimination trees in sparse factorization. SIAM Journal on Matrix Analysis and Applications, 11:134–172, 1990.

[14] Gary L. Miller, Shang-Hua Teng, and Stephen A. Vavasis. A unified geometric approach to graph separators. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science, pages 538–547, 1991.

Appendix A

Derivation of the isoefficiency function for parallel triangular solvers

Consider solving the triangular systems resulting from the factorization of a sparse matrix associated with a two-dimensional neighborhood graph. From Equation 3, W = O(N log N), and from Equation 4, T_o = O(p^2) + O(p√N). If W ∝ T_o, then W ∝ p^2 (the first term of T_o) and

W ∝ p√N,
N log N ∝ p√N,
√N log N ∝ p,
√N ∝ p/log p,
N ∝ (p/log p)^2,
N log N ∝ p^2/log p.  (15)

Thus, we have derived Equation 6. Similarly, we can derive the isoefficiency function for the triangular systems resulting from the factorization of sparse matrices associated with three-dimensional neighborhood graphs. Recall from Section 3.2 that for such systems, W = O(N^{4/3}) and T_o = O(p^2) + O(pN^{2/3}). If W ∝ T_o, then W ∝ p^2 (the first term of T_o) and W ∝ pN^{2/3}. The second condition yields

N^{4/3} ∝ pN^{2/3},
N^{2/3} ∝ p,
N^{4/3} ∝ p^2,
W ∝ p^2,  (16)

which is the same as Equation 9.
