Multi-Party Computation from any Linear Secret Sharing Scheme Secure against Adaptive Adver
- 格式:pdf
- 大小:180.49 KB
- 文档页数:13
Cut-Elimination in a Sequents-of-Relations Calculus for G¨o del Logic Matthias Baaz,Agata Ciabattoni Christian G.Ferm¨u llerTechnische Universit¨a t WienA-1040Vienna,Austriabaaz,agata,chrisf@logic.atAbstractIn[5]the analytic calculus for G¨o del logic hasbeen introduced.operates on“sequents of rela-tions”.We show constructively how to eliminate cuts from-derivations.The version of the cut rule we con-sider allows to derive other forms of cut as well as a rulecorresponding to the“communication rule”of Avron’s hy-persequent calculus for.Moreover,we give an ex-plicit description of all the axioms of and prove theircompleteness.1.IntroductionG¨o del logic—also called(G¨o del-)Dummett logic,since Dummett[6]presented thefirst axiomatization match-ing G¨o del’s matrix characterization—is one of the mostimportant many-valued logic.It naturally turns up in a num-ber of different contexts.Already in the1930s G¨o del[9]used it in investigations of intuitionistic logic;later,Dunnand Meyer[7]pointed out its relevance for relevance logic;Visser[13]employed it in investigations of the provabilitylogic of Heyting arithmetic;and eventually,it was recog-nized as one of the most useful species of fuzzy logic(see[10]).In contrast to other fuzzy logics,convincing analyticproof systems have been presented for G¨o del logic.In par-ticular,we here investigate the calculus,introducedin[5],which is based on so-called“sequents of relations”.In all rules are local,have at most two premises,introduce at most one connective at a time and are invert-ible.These properties render this calculus particularly aptfor(human and automated)proof search.Alternative ana-lytic systems for can be found,e.g.,in[11,1,2,3,8,4].In particular,the axioms(basic hypersequents)introducedin[4]are closely related to the axioms of.for the derivation of analytic calculi for certain types of many-valued logics(projective logics).This framework isbased on“sequents of relations”.In a sequent is afinite set of components of the form or for arbitrary formulas.Remark3.1In[5]sequents were defined as sequences of components.However,it is easy to see that it suffices to con-sider sets instead of sequences(or multi-sets).This allows to drop the external rules of permutation and contraction from.Sequent calculi of relations are closely related to hyper-sequent calculi(see,e.g.,[2,3]).We denote sequents(of relations)aswhere the sign()is either or and plays a role similar to the sequent arrow in traditional sequent cal-culi.A sequent is called atomic if all,are atomic for-mulas.The separation sign“”is interpreted as disjunction(at the meta-level).More formally,a component is satisfied by an interpretation if(for ).A sequent is satisfied by if satisfies atleast one of its components.is valid if it is satisfied by all interpretations.The logical rules of—i.e.,the rules for intro-ducing connectives at any place of a sequent—are easily computed given the semantics of,as described in[5]. For convience,we restate the complete set of rules.For disjunction and conjunction we have:where stands for either or,uniformly in each rule. The rules,,and for implication are,respectively:The indicated compound formula in the lower sequent of each rule is called principal formula.We also need(external)weakening:cutand are called cut-formulas;and the indicated compo-nents are referred to as cut-components.So far we have not stated any axioms for.In fact, the computation of a complete set of axioms,for which cuts can be eliminated,is not trivial.In[5]the following was stated(without proof):Axioms of are all sequents that contain a sequentfor,where for all,but for at least one.In addition,all sequents that are obtained from the above ones by deleting components of formorare axioms.We present a more explicit description of the set of ax-ioms of,that corresponds to the original set,up to external weakening.(a),whereand the case is defined as, (b),wherethe case is defined as,(c),where the caseis defined as,(d),where the caseis defined as.We call sequents of type(a),(b),(c),and(d),cycles,-chains,-chains,and--chains,respectively.It is easy to check that all of the above axioms are valid in.However,to guarantee completeness we also have to show the converse:namely,that all valid atomic sequents are obtained from these axioms using external weakening only.For this purpose it is better to consider the dual form of the axioms.I.e.,we make use of the fact thatiff,and thus may consider conjunctions of components instead of disjunctions.Definition3.2A set of components is called dual to axioms if it does not contain any subset of one of the following forms:(a)(anti-cycle),where and the case is defined as,(b)(anti--chain),where the case is defined as, (c)(anti--chain),where the case is defined as, (d)(anti---chain),where the case is defined as.It suffices to prove the following:Theorem3.3Let be afinite set of components, ,where and are either propositional vari-ables or truth constants.If is dual to axioms then is satisfiable;i.e.,there exists an interpretation that satisfies all components of.To prove Theorem3.3we extend any that is dual to axioms to a“maximal”set that is still dual to axioms. Let us write.It will follow from Propostion3.4and Lemma3.5,below,that this is an equivalence relation and that the set of equivalence classesfollows from the following: Lemma3.5If is dual to axioms then eitheror is dual to axioms,too.Proof:The proof proceeds by case distinctions:(1)contains an anti-cycle.Then either al-ready contains an anti-cycle or.From this it follows that is dual to axioms iff is dual to axioms.(2)contains an anti-cycle.Then either al-ready contains an anti-cycle or.From this it follows thatis dual to axioms iff is dual to axioms.(3)Neither nor contains ananti-cycle.(3.1)contains an anti--chain W.l.o.g.,the anti--chain is not already contained in.Therefore(a):.(3.1.1)contains an anti--chainthat is not already contained in.There-fore and.The latter subset can be combined with(a)to an anti--chain in.(3.1.2)contains an anti--chainthat is not already contained in.There-fore and.Thelatter subset can be combined with(a)to ananti---chain in.(3.1.3)contains an anti---chainthat is not already contained in.There-fore and.Thelatter subset can be combined with(a)to ananti---chain in.(3.2)contains an anti--chain thatis not already contained in.Therefore(b1):and(b2):.(3.2.1)contains an anti--chain.Thiscase was already settled in(3.1.1).(3.2.2)contains an anti--chain thatis not already contained in.Then.This subset canbe combined with(b1)to an anti---chainin.(3.3)Neither nor containan anti--chain.(3.3.1)contains an anti--chain that isnot already contained in.Then(c).(3.3.1.1)contains an anti--chain that is not already contained in.Thereforeand.Thefirst subset can be com-bined with(c)to an anti--chain in.(3.3.1.2)contains an anti---chain that is not already contained in.Thereforeand.Thefirst subset can be com-bined with(c)to an anti---chain in.Finally observe that if contains an anti---chain then this anti---chain is already contained in.It is easy to check that this settles all remaining cases.Remark3.6To obtain a calculus for-valued G¨o del logic one only has to add to the axiomwhere for at least,with. Remark3.7As pointed out already in[5],different forms of cuts are admissible in.Focusing on cut ismotivated by the fact that it allows to simulate other forms of cut straightforwardly.E.g.,the following transitivity-cutcomm.It can be derived from a4-component-cycle by applying (cut)twice in the following way:A different type of admissible rule,related to cut,is the so-called Takeuti-Titani rule(see[12]),which expresses the density of the set of truth values:Let then we have tofind a derivation of and a derivation of were,for.Let be the last inference in.Three possibilities arise:(1)is a logical inference.(1.1)The indicated occurrence of is the prin-cipal formula of.Then and are obtainedas the two immediate sub-derivations of.(1.2)The principal formula of is not the indicatedoccurrence of.Suppose,e.g.,thatand ends in a rule as followscutthen we apply the Inversion Lemma to obtain derivations,,,and,where,.These can be joint to the required derivation of as follows:-c c--,,or--where andThe entry-in the-column of the-row says that if is a-chain and a-chain then contains a--chain.E.g.,letIf and then we can cut upon the underlined components.The conclusion con-tains the sequentwhich is a--chain.The entry“”in the last column of the last row asserts that there can be no cut between two--chains. Proof of Theorem4.1Let.The transformation of into a cut-free deriva-tion from atomic axioms proceeds in4steps:1.Apply Lemma4.2to obtain,where all axiomsin are atomic.2.Apply the Reduction Lemma(Lemma4.5)to a sub-derivation of that ends with a cut of maximal com-plexity.Repeat this step until all cuts are atomic.3.Apply Lemma4.6to obtain a derivation in whichcuts are only applied to atomic sequents.4.Observe that and cut are the only infer-ence rules that can occur in a sub-derivation of that ends in a cut.Let be such a sub-derivation of maxi-mal length.Lemma4.7implies that the last sequent of contains an axiom.We therefore can replace by this axiom,possibly followed by applications of.This is repeated until all cuts have been removed.References[1]A.Avellone,M.Ferrari,and P.Miglioli.Duplication-free tableau calculi together with cut-free and contrac-tion free sequent calculi for the interpolable propo-sitional intermediate logics.Logic J.of the IGPL, 7(4):447–480,1999.[2]A.Avron.Hypersequents,logical consequence andintermediate logics for concurrency.Annals of Math-ematics and Artificial Intelligence,4:225–248,1991.[3]A.Avron.The method of hypersequents in the prooftheory of propositional nonclassical logics.In Logic: from Foundations to Applications,European Logic Colloquium,pages1–32.Oxford Science Publica-tions.Clarendon Press.Oxford,1996.[4]A.Avron.A tableau system for G¨o del-Dummettlogic based on a hypersequential calculus.In Auto-mated Reasoning with Tableaux and Related Methods (Tableaux’2000),number1847in Lectures Notes in Artificial Intelligence,pages98–112,2000.[5]M.Baaz and C.Ferm¨u ller.Analytic calculi for pro-jective logics.In Automated Reasoning with Tableaux and Related Methods(Tableaux’99),Lectures Notes in Artificial Intelligence,vol.1617,pages36–51, 1999.[6]M.Dummett.A propositional logic with denumerablematrix.J.of Symbolic Logic,24:96–107,1959. [7]J.M.Dunn and R.K.Meyer.Algebraic completenessresults for dummett’s and its extensions.Z.Math.Logik Grundlagen Math,17:225–230,1971.[8]R.Dyckhoff.A deterministic terminating sequent cal-culus for G¨o del Dummett logic.Logic Journal of the IGPL,7:319–326,1999.[9]K.G¨o del.Zum intuitionistischen aussagenkalk¨u l.Anz.Akad.Wiss.Wien,69:65–66,1932.[10]P.H´a jek.Metamathematics of Fuzzy Logic.Kluwer,1998.[11]O.Sonobe.A Gentzen-type formulation of some in-termediate propositional logics.J.of Tsuda College, 7:7–14,1975.[12]G.Takeuti and T.Titani.Intuitionistic fuzzy logic andintuitionistic fuzzy set theory.J.of Symbolic Logic, 49:851–866,1984.[13]A.Visser.On the completeness principle:a study ofprovability in heyting’s arithmetic.Annals of Math.Logic,22:263–295,1982.。
Universally Composable Efficient Multiparty Computation from Threshold HomomorphicEncryptionIvan Damg˚ard and Jesper Buus NielsenBRICS Department of Computer ScienceUniversity of AarhusNy MunkegadeDK-8000Arhus C,DenmarkAbstract.We present a new general multiparty computation protocolfor the cryptographic scenario which is universally composable—in par-ticular,it is secure against an active and adaptive adversary,corruptingany minority of the parties.The protocol is as efficient as the best knownstatically secure solutions,in particular the number of bits broadcast(which dominates the complexity)isΩ(nk|C|),where n is the numberof parties,k is a security parameter,and|C|is the size of a circuit doingthe desired computation.Unlike previous adaptively secure protocols forthe cryptographic model,our protocol does not use non-committing en-cryption,instead it is based on homomorphic threshold encryption,inparticular the Paillier cryptosystem.1IntroductionThe problem of multiparty computation(MPC)dates back to the papers by Yao[14]and Goldreich et al.[11].What was proved there was basically that a collection of n parties can efficiently compute the value of an n-input function, s.t.everyone learns the correct result,but no other new information.More pre-cisely,these protocols can be proved secure against a polynomial time bounded adversary who can corrupt a set of less than n/2parties initially,and then make them behave as he likes,we say that the adversary is active.Even so,the adver-sary should not be able to prevent the correct result from being computed and should learn nothing more than the result and the inputs of corrupted parties. Because the set of corrupted parties isfixed from the start,such an adversary is called static or non-adaptive.Proving security of the protocol from[11]requires a complexity assumption, such as existence of trapdoor one-way permutations.This is because the model of communication considered there is s.t.the adversary may see every message sent {ivan,buus}@brics.dk.Basic Research in Computer Science,Centre of the Danish National Research Foundation.248I.Damg˚ard,J.B.Nielsenbetween parties,this is sometimes known as the cryptographic ter,un-conditionally secure MPC protocols were proposed by Ben-Or et al.and Chaum et al.[2,6],in the model where private channels are assumed between every pair of parties.These protocols are secure,even against an adaptive adversary who may decide dynamically during the protocol who to corrupt.Over the years,several protocols have been proposed which,under specific computational assumptions,improve the efficiency of general statically secure MPC,see for instance[10].In particular,Cramer,Damg˚ard and Nielsen[7]pro-posed a general MPC protocol that is secure against a static adversary corrupting any minority of the parties.The protocol assumes that keys for a threshold ho-momorphic cryptosystem have been set up,and has communication(broadcast) complexityΩ(nk|C|),where n is the number of parties,k is a security parame-ter,and|C|is the size of a circuit computing the desired function.This is the so far the most efficient protocol known for the cryptographic model,tolerating a dishonest minority(which is the best possible if we want to guarantee termina-tion).The homomorphic threshold cryptosystem can be the one of Paillier[13], or can be built from the quadratic residuosity assumption.It is possible to get adaptive security,also in the cryptographic model:we can start from an adaptively secure protocol for the private channels model([2, 6]),and then,instead of assuming perfect channels,we implement them using public-key encryption.While this will in general reduce adaptive security to static,using non-committing encryption,adaptive security carries over to the cryptographic model[4].In[5],it is shown that non-committing encryption plus some additional techniques can provide MPC that is universally composable,an even stronger notion of security defined by Canetti[3].All these protocols are, however,much less efficient than the best statically secure ones.An alternative approach that also gives adaptive security is to assume that parties can be trusted to securely erase certain critical data[1].Our Results In this paper,we present a new general MPC protocol for the cryptographic model,based on Paillier encryption.The protocol is universally composable,in particular,it is secure against an active adaptive adversary cor-rupting any minority of the parties.Up to a constant factor,it is as efficient as the(statically secure)protocol from[7].It is therefore thefirst general MPC solution for the cryptographic model where going to adaptive security does not cause a major loss in efficiency(or costs an extra assumption,such as secure era-sures).It is also thefirst that does not use non-committing encryption,which may be of separate interest from a technical point of view.(The protocol in[5] is not for the private channels model,but it uses non-committing encryption for building adaptively secure oblivious transfer.)Instead of non-committing en-cryption we use some ideas from the universally composable commitments of[9], and combine this with the protocol from[7].Thus we work on the same principle that parties supply encrypted input,and the protocol produces encryptions of the outputs,that the parties can then cooperate to decrypt.To make this se-cure,we introduce a new technique for randomizing encryptions before they are decrypted.This means that the adversary has no control over the encryptionsUniversally Composable Efficient Multiparty Computation249 that are decrypted,and this turns out to be essential for our proof of security to go through.We prove our protocols secure in the framework for universally composable (UC)security by Canetti[3].This framework allows to define the security of general reactive tasks,rather than just evaluation of functions.This allows us to prove that our protocol does not just provide secure function evaluation, but is in fact equivalent to what we call an arithmetic black box(ABB).An ABB can be thought of as a secure general-purpose computer.Every party can in private specify inputs to the ABB,and any majority of parties can ask it to perform any feasible computational task and make the result(and only the result)public.Moreover the ABB can be invoked several times and keeps an internal state between invocations.This point of view allows for easier and more modular proofs,and also makes it easier to use our protocols as tools in other constructions.2An Informal DescriptionIn this section,we give a completely informal introduction to some main ideas. All the concepts introduced here will be treated more formally later in the paper. We will assume that from the start,the following scenario has been established: we have a semantically secure threshold public-key system given,i.e.,there is a public encryption key pk known by all parties,while the matching private decryption key has been shared among the parties,s.t.each party holds a share of it.We will use a threshold version of the Paillier cryptosystem,so the message space is Z N for some RSA modulus N.For a plaintext a∈Z N,we let a denote an encryption of a.We have certain homomorphic properties:from encryptions a,b,anyone can easily compute(deterministically)an encryption of a+b,which we denote a b.We also require that from an encryption a and a constant α∈Z N,it is easy to compute a random encryption ofαa,which we denote α a.This immediately gives us an algorithm for subtracting.We can then sketch how a computation was performed securely in the(stati-cally secure)protocol from[7]:we assume the desired computation is specified as a circuit doing additions and multiplications in Z N—This allows us to simulate a Boolean circuit in a straightforward way using0/1values in R.The protocol then starts by having each party publish encryptions of his input values and give zero-knowledge proofs that he knows these values.Then any operation involving addition or multiplication by constants can be performed with no interaction:if all parties know encryptions a,b of input values to an addition gate,all parties can immediately compute an encryption of the output a+b.This leaves only the problem of multiplications:Given encryptions a,b(where it may be the case that no parties knows a nor b),compute securely an encryption of c=ab.This can be done by the following protocol(which is a slightly optimized version of the protocol from[7]):250I.Damg ˚ard,J.B.Nielsen1.Each party P i chooses random d i ∈Z N and broadcasts encryptions d i and d i b .2.All parties prove in zero-knowledge that they know their respective values of d i ,and that d i b encrypts the correct value.Let S be the subset of the parties succeeding with both proofs.3.All parties can now compute a ( i ∈S d i ).This ciphertext is decrypted using threshold decryption,so all parties learn a + i ∈S d i .4.All parties set c =(a + i ∈S d i ) b ( i ∈S d i b ).At the final stage we know encryptions of the output values,which we just decrypt using threshold decryption.Intuitively this is secure if the encryption is secure because,other than the outputs,only random values or values that should be known to the adversary are ever decrypted.This intuition was proved for a static adversary in [7].But in the UC framework,where the adversary is adaptive,it is well known that there are several additional problems:Loosely speaking,a proof of security requires building a simulator that sim-ulates in an indistinguishable way the adversary’s view of a real attack,while having access only to data the adversary is supposed to know at any given time .This means that for instance in the input stage the simulator needs to show the adversary encryptions that are claimed to contain the inputs of honest parties.At this time the simulator does not know these inputs,so it must encrypt some arbitrary values.This is fine for the time being,but if one of the honest par-ties are later corrupted,the simulator learns the real inputs of this party and must reveal them to the adversary along with a simulation of all internal data of the party.The simulator is now stuck,since the real inputs most likely are not consistent with the arbitrary values encrypted earlier.We handle this problem using a combination of two tricks:first,we include in the public key an encryption K =1.Then,we redefine the encryption method,and fix the rule that to encrypt a value a ,one computes a K .Under normal circumstances,this will be an encryption of a .The point,however,is that in the simulation,the simulator can decide what K should be,and will set K =0.Then all encryptions used in the simulation will contain 0,in fact —using the algebraic properties of the encryption scheme —the simulator can compute C as an encryption of 0,store the random coins used for this and later make it seem as if C was computed as C =a K for any a it desires.However,the simulator must also be able to find the input values the ad-versary supplies,immediately as the encryptions are made public.This is not possible if it only sees encryptions of 0.We therefore redefine the way inputs are supplied:for each input value x of party P i ,P i uses the UC commitment scheme of [9]to make a commitment commit (x )to x ,and also sends an encryp-tion C =x K .Finally he proves in zero-knowledge that commit (x )contains the same value as C .This can be done efficiently because [9]is also based on Paillier encryption,and the UC property of the commitments precisely allows the simulator to extract inputs of corrupted parties and fake it on behalf of honest parties.Universally Composable Efficient Multiparty Computation251 Afinal problem we face is that the simulator will not be able to“cheat”in the threshold decryption protocol.The key setup for this protocolfixes the shares of the private key even in the simulation,so a ciphertext can only be decrypted to the value it actually contains.Of course,when decrypting the outputs,the correct results should be produced both in simulation and real life,and so we havea problem since we just said that all ciphertexts in the simulation really contain0.We solve this by randomizing all ciphertexts before they are decrypted:we include anotherfixed encryption R=0in the public key.Then,given ciphertext C,the parties cooperate to create an encryption r R,where r is a(secret) randomly chosen value that depends on input from all parties.Then we compute C (r R)and decrypt this ciphertext.Under normal circumstances,it will of course contain the same plaintext as C.But in the simulation,the simulator will set R=1,and“cheat”in the process where r is chosen,s.t.r=a,where a is the value the simulator wants the decryption to return.This works,since in the simulation any C actually encrypts0.3PreliminariesNotation Throughout the paper k will denote the security parameter.For a probabilistic polynomial time(PPT)algorithm A we will use a←A(x;r)to de-note running A on input x and uniformly random bits r∈{0,1}p(|x|)producing output a;Here p(·)is a polynomial upper bounding the running time of A. Non-erasureΣ-protocols Consider a binary relation R⊆{0,1}∗×{0,1}∗where (x,w)∈R can be checked in PPT.By L(R)we denote the set{x∈{0,1}∗|∃w∈{0,1}∗((x,w)∈R)}.A non-erasureΣ-protocol for relation R is six PPT algorithms(A,l,Z,B,hvs, rbs,xtr)(and an integer l)specifying a three move,public randomness,honest verifier zero-knowledge protocol with special soundness.The prover has input (x,w)∈R and the verifier has input x.The proverfirst computes a message a←A(x,w;r A)and sends a to V.Then V returns a l-bit challenge e.The prover then computes a response to the challenge z←Z(x,w,r a,e)and sends z to the verifier.The verifier then computes b←B(x,a,e,z),where b∈{0,1} indicates whether to believe that the prover knows a valid witness w or not.The algorithm hvs is called the honest verifier simulator and takes as input x∈L(R), e∈{0,1}l and a uniformly random bit-string r hvs and produces as output(a,z) which is supposed to be distributed as the(a,z)produced by a honest prover with instance x receiving challenge e—this is defined formally below.The algorithm rbs is called the random bits simulator.It takes as input(x,w)∈R,a challenge e∈{0,1}l and bits r hvs,which we think of as the random bits used by hvs in a run(a,z)←hvs(x,e;r hvs),and it produces as output a bit-string r A s.t.a=A(x,w;r A)and z=Z(x,w,r A,e).I.e.if(a,z)is the messages simulated using hvs given just x∈L(R),then if w s.t.(x,w)∈R later becomes known it is possible to construct random bits r A s.t.it looks like(a,z)was generated as a←A(x,w;r A)and z←Z(x,w,r A,e).Finally xtr is a knowledge extractor,252I.Damg˚ard,J.B.Nielsenwhich given two correct conversations with the samefirst message can compute a witness.We now formalize these requirements along with completeness. Completeness:For all(x,w)∈R,r A and e∈{0,1}l we have that B(x,A(x,w;r a),e,Z(x,w,r a,e))=1.Special non-erasure honest verifier zero-knowledge:The following two random variables are identically distributed for all(x,w)∈R and e∈{0,1}l:EXEC(x,w,e)=[a←A(x,w;r A);z←Z(x,w,r A,e):(x,w,a,r A,e,z)] SIM(x,w,e)=[(a,z)←hvs(x,e;r hvs);r A←rbs(x,w,e,r hvs):(x,w,a,r A,e,z)] Special soundness:For all x∈S and(a,e,z)and(a,e ,z )where e=e ,B(x,a,e,z)= 1and B(x,a,e ,z )=1,we have that(x,xtr(x,a,e,z,e ,z ))∈R.4Non-Erasure Concurrent Zero-Knowledge Proofs of KnowledgeIn[8]a concurrent zero-knowledge proofs of knowledge is presented based on anyΣ-protocol.The protocol assumes that the prover P and verifier V agree on a key K for a trapdoor commitment scheme.The prover has input(x,w)∈R and the verifier knows x.The protocol proceeds as follows.1.The prover generates a←A(x,w;r A),commits c←commit K(a;r c)andsends c to the verifier.2.The verifier sends uniformly random e∈{0,1}l to the prover.3.The prover computes z←Z(x,w,r A,e)and sends(a,r c,z)to the verifier.4.The verifier outputs1iffc=commit K(a,r c)and B(x,a,e,z)=1.To simulate the protocol without w one assumes that the simulator knows the trapdoor t of K.The simulator can then let c be a fake commitment and when it receives e compute(a,z)←hvs(x,e;r hvs),use t to compute r c s.t.c= commit K(a;r c)and send(a,r c,z).As observed in[9],if theΣ-protocol is non-erasure the above protocol is adaptively secure.Assume namely that P is cor-rupted and the simulator learns a witness.Then compute r A←rbs(x,w,e,r hvs) and by special non-erasure honest verifier zero-knowledge the simulation is per-fect.In the following we will call this to patch the state of the proof of knowledge.In[8]a knowledge extractor xtr is given working as follows.Assume K is a random commitment key so that commit K is computational binding and assume a corrupt prover succeeds in giving an acceptable proof with probability p say. Then by rewinding and giving new random challenges e until a proof is accepted again we get,e.w.n.p.,values a,e,z,a,e ,z s.t.B(x,a,e,z)=B(x,a,e ,z )= 1and e=e and can compute a witness w for x.The expected number of rechallenges needed is1pso if we run a proof and extract if a correct proof isgiven the expected number of rechallenges used in the extraction is p1p where pis the probability that the proof is accepted in thefirst place.This generalizes to several proofs run concurrently.Each time a proof is accepted invoke xtr and get a witness.If the context in which the proofs are run is PPT,then theUniversally Composable Efficient Multiparty Computation253 running time,and the number of proofs,is bounded by a polynomial P,and so each invocation of xtr has expected running time less than P which with a total of at most P invocations,by the linearity of expectation,gives an expected polynomial running time of at most P2.Note that for the zero-knowledge simulator it is enough that K is a trapdoor commitment key,whereas for knowledge soundness it was needed that the key was random to ensure that commit K is computational binding.We can therefore let V pick the key K and send it to P.All we need to ensure is that is the simulator can get its hands on the trapdoor of K.We will use the protocol and simulator exactly this way later.5Universally Composable SecurityIn this section we give a sketch of the notion of universally composable security of synchronous protocols for the authenticated link model.Except for some minor technical differences it is the synchronous model described in[3].A protocolπconsists of n parties P1,...,P n,all PPT interactive Turing machines(ITMs).The execution of a protocol takes place in the presence of an environment Z,also a PPT ITM,which supplies inputs to and receives outputs from the parties.Z also models the adversary of the protocol,and so schedules the activation of the parties and corrupts parties.In each round r each party P i sends a message m i,j,r to each party P j;The message m i,i,r is the state of P i after round r,and m i,i,0=(k,r i)will be the security parameter and the random bits used by P i.We model open channels by showing the messages {m i,j,r}j∈[n]\{i}to Z,where[n]={1,...,n}.In each round Z inputs a value x i,r to P i and receives and output y i,r from P i.We write the r’th activation of P i as({m i,j,r}j∈[n],y i,r)=P i({m j,i,r−1}j∈[n],x i,r).We model that the parties cannot reliably erase their state by giving r i to Z when P i is corrupted.In the following C will denote the set of corrupted parties and H=[n]\C.In detail the real-life execution proceeds as follows.Init:The input is k,random bits r=(r1,...,r n)∈({0,1}∗)n and an auxiliary input z∈{0,1}∗.Set r=0and C=∅.For i,j∈[n]let m i,j,0= for i=j and m i,i,0=(k,r i).Then input k and z to Z and activate Z.Environment activation:When activated Z outputs one of the following commands: (activate i,x i,r,{m j,i,r−1}j∈C)for i∈H;(corrupt i)for i∈H;(end round);or(guess b)for b∈{0,1}.Commands are handled as described below and the environment is then activated again.We require that all honest parties are acti-vated between two(end round)commands.When a(guess b)command is given the execution stops.Party activation:{m j,i,r−1}j∈H were defined in the previous round;Add these to {m j,i,r−1}j∈C from the environment and compute({m i,j,r}j∈[n],y i,r)= P i({m j,i,r−1}j∈[n],x i,r).Then give{m i,j,r}j∈[n]\{i}to Z.Corrupt:Give r i to Z.Set C=C∪{i}.End round:Give the value{y i,r}i∈H defined in Party activation to Z and set r=r+1.254I.Damg˚ard,J.B.NielsenThe result of the execution is the bit b output by Z.We denote this bit by REALπ,Z(k,r,z).This defines a Boolean distribution ensemble REALπ,Z= {REALπ,Z(k,z)}k∈N,z∈{0,1}∗,where we take r to be uniformly random.To define the security of a protocol an ideal functionality F is specified.The ideal functionality is a PPT ITM with n input tapes and n output tapes which we think of as being connected to n parties.The input-output behavior of the ideal functionality defines the desired input-output behavior of the protocol.To be able to specify protocols which are allowed to leak certain information F has a special output tape(SOT)on which it writes this information.The ideal functionality also has the special input tape(SIT).Each time F is given an input for P i it writes some value on the SOT modeling the part of the input which is not required to be kept secret.When F receives the input(activate v)a round is over,in response to which it writes a value on the output tape for each party. The value v models the inputs from the corrupted parties.When a party P i is corrupted the ideal functionality receives the input(corrupt i)on the SIT.We then say that a protocolπrealizes an ideal functionality F if there exists an interface,also called simulator,S which given access to F can simulate a run ofπwith the same input-output behavior.In doing this S is given the inputs of the corrupted parties,and the information leaked on the SOT of F,and can specify the inputs of corrupted parties.In detail the ideal process,proceeds as follows.Init:The input is k,random bits r=(r F,r S)∈({0,1}∗)2and an auxiliary input z∈{0,1}∗.Set r=0and C=∅.Provide S with r S,provide F with r F and give k and z to Z and activate Z.Environment activation:Z is defined exactly as in the real-word,but now com-mands are handled by S,as described below.Party activation:Give{m j,i,r−1}i∈C to S and give x i,r to F on the input tape for P i and run F to get some value v F on the SOT.This value is given to S which is then required to compute some value{m i,j,r}j∈[n]\{i}which is given to Z. Corrupt:When Z corrupts P i,S is given the values x i,0,y i,0,x i,1,...exchanged be-tween Z and F for P i.Furthermore(corrupt i)is input on the SIT of F and F writes a value on the SOT;This value is given to S.Then S outputs some value r i which is given to Z.Set C=C∪{i}.End round:S is activated and produces a value v.Then(activate v)is input to F which produces output{y i,r}i∈[n].Then{y i,r}i∈C is given to S and{y i,r}i∈H is given to Z.Set r=r+1.The result of the ideal-process is the bit b output by Z.We denote this bit by IDEAL F,S,Z(k,r,z).This defines a Boolean distribution ensemble IDEAL F,S,Z= {IDEAL F,S,Z(k,z)}k∈N,z∈{0,1}∗.Definition1.We say thatπt-realizes F if there exists an interface S s.t.for all environments Z corrupting at most t parties it holds that IDEAL F,S,Z c≈REALπ,Z.We then define the hybrid models.The G-hybrid model is basically the real-life model where the parties in addition to the communication lines also haveUniversally Composable Efficient Multiparty Computation255 access to an ideal functionality G.In each round all parties give an input to G and receive the output from G in the following round.The inputs to the SIT of G are given by Z and Z receives the outputs on the SOT.When defining security of a protocol in a hybrid model the interface S must in addition to the communication and internal state of parties also simulate the values exchanged between G and Z.We call this a hybrid interface.Definition2.We say thatπt-realizes F in the G-hybrid model if there exists a hybrid interface S s.t.for all environments Z corrupting at most t parties it holds that IDEAL F,S,Z c≈HYB Gπ,Z.A universal composability theorem can be proven for this framework.I.e.if a protocol realizes a given functionality it can be replaced for the functionality in all contexts.The above description easily generalizes to the case of several ideal functionalities in a hybrid model.In particular we will in the protocols following assume that the parties have access to an ideal functionality F BA for doing Byzantine Agreement.It expects a bit b i as input from all parties.If all honest parties agree on a value v,then F BA outputs v to all parties.Otherwise the environment is allowed to determine the output through the SIT.We also assume a broadcast channel,which can easily be model with an ideal functionality.6The Paillier Cryptosystem and Some ToolsIn this section we describe the Paillier cryptosystem[13].The public key is a k-bit RSA modulus pk=N=pq,where p an q are chosen s.t.p=2p +1,q=2q +1 for primes p ,q ,and both p and q have k/2bits.The plaintext space is Z N andthe ciphertext space is Z∗N2.To encrypt a∈Z N,one chooses r∈Z∗N uniformlyat random and computes the ciphertext as E pk(a;r)=g a r N mod N2,wherethe element g=N+1has order N in Z∗N2.The encryption function is homo-morphic in the following sense E pk(a1;r1)E pk(a2;r2)mod N2=E pk(a1a2mod N;r1r2mod N)and E pk(a;r)b=E pk(ab mod N;r b mod N).The private key is e.g.sk=φ(N)(φ(N)−1mod N),and it is straightforward to verify that ((N+1)a r N)sk mod N2=Na+1from which a mod N can be computed.Given a one can then compute r N and r=(r N)N−1modφ(N)mod N2.Under an appro-priate complexity assumption—the DCRA—this system is semantic secure.The DCRA states that random elements in Z∗N2are computationally indistin-guishable from random elements of form r N mod N2.For any ciphertext K=E pk(a;r K)we can consider the function E pk,K(m;r)= K m r N mod N2=E pk(am;s m r mod N).If r is uniformly random in Z∗N,then s m r mod N is uniformly random in Z∗N and so E pk,K(m;r)is a uniformly ran-dom encryption of am mod N.Notice that if a∈Z∗N,then from K and c= E pk,K(m;r)and sk we can efficiently compute m.Therefore E pk,K(m;r)is again a semantically secure encryption function.If on the other hand a=0,then E pk,K(m;r)is a uniformly random encryption of0.So,E pk,K(m;r)can be seen as a perfect hiding commitment scheme commit pk,K(m;r)=E pk,K(m;r).It is256I.Damg˚ard,J.B.Nielsentrivial to see that commit pk,K(m;r)is computationally binding.If K∈E pk(1), then commit pk,K(m;r)would be perfect binding,so no algorithm canfind two different openings.An algorithmfinding two different openings when K=E pk(0) would therefore distinguish encryptions of1from encryptions K of0.Notice furthermore that if K=E pk(0;r K)and r K is known,then given ar-bitrary m,m ∈Z N and r∈Z∗N we can compute r =s m−m r mod N,so that E pk,K(m;r)=E pk,K(m ;r ).All in all we have argued that if K=E pk(0;r K)is a random encryption,then commit K(m;r)is a perfect hiding computationally binding trapdoor commitment scheme with trapdoor t K.When considering val-ues K∈Z∗N2as keys we will call keys of the form K=E pk(0;t K)trapdoor keysand we will call keys of the form K=E pk(a;s)for a∈Z∗n encryption keys.We now describe a commitment scheme from[9].A UC Commitment Scheme First we have to introduce a so-called double-trapdoor commitment scheme.Consider a trapdoor commitment scheme with key K=(K1,K2),where K1and K2are both encryptions of0.We commit ascommit K1,K2(m)=(E pk,K1(m1),E pk,K2(m2)),where m1and m2are uniformlyrandom values for which m=m1+m2mod N.To make a fake commitment just commit to random values m1and m2,call the commitment(c1,c2).To trapdoor open such a commitment it is enough to know the trapdoor of one of K1and K2.To open c=(c1,c2)to m we can open c1to m−m2or we can open c2 to m−m1.We note for later use that the distribution of the random bits is independent of which trapdoor is used.Note that the domain of commit K can be extended by committing block-wise.Here is the protocol from[9].Setup:We assume that commitment sender S and receiver R agree on two inde-pendently chosen Paillier public-keys N and N and a commitment key K= (K1,K2)=(E N (0;t1),E N (0,t1).If R is honest at the beginning of the proto-col we furthermore assume that t1and t2are uniformly random. Commitment:S commits to s∈Z N as follows.1.S generate uniformly random L S∈Z∗N2,commits c L=commit N ,K(L S;r L)and sends c L to R.2.R generates uniformly random L R∈Z N2and sends L R to R.3.S computes L=L S L R mod N2,commits to s under L as c s=commit L(s;r s)for uniformly random s∈Z∗N and sends c s to R along with(L S,r L),and outputs(L,c s,r s).4.R checks that c L=commit N ,K(L S;r L)and if so computes L=L S L R mod N2and outputs(L,c s).In[9]a simulator S commit for the UC framework is given which has the prop-erty that given any so-called hitting commitment(L,c s)it can simulate a run of the protocol which results in R outputting(L,c s).All the simulator needs to do this is the trapdoor(s)of K when R is corrupted and an opening(s,r s)of(L,c s) when S is corrupted.The simulator has some properties given in Theorem1. Theorem1.Consider a simulation with hitting commitment(L,c s)and output commitment(L ,c s)from R.。
维数约简技术在网络安全中的应用一、维数约简技术概述维数约简技术是一种在数据科学和机器学习领域中,用于处理高维数据集的方法。
它通过降低数据的维度,同时尽量保留原始数据中的重要信息,以提高数据处理的效率和准确性。
在网络安全领域,维数约简技术的应用日益广泛,它可以帮助分析人员从海量的网络数据中提取有价值的信息,以识别和防御潜在的网络威胁。
1.1 维数约简技术的核心概念维数约简技术的核心在于识别数据中的冗余或无关特征,并将其去除或合并,从而降低数据的复杂性。
这包括但不限于主成分分析(PCA)、线性判别分析(LDA)、奇异值分解(SVD)等方法。
这些方法各有优势,适用于不同的数据类型和场景。
1.2 维数约简技术在网络安全中的应用场景网络安全领域面临的挑战之一是数据的海量和复杂性。
维数约简技术可以应用于以下几个网络安全的关键领域:- 异常检测:通过降低网络流量数据的维度,快速识别出不符合正常模式的行为或事件。
- 入侵检测系统(IDS):优化IDS的性能,减少误报和漏报,提高对新型攻击的识别能力。
- 恶意软件分析:对恶意软件的特征进行降维处理,便于快速识别和分类恶意软件。
- 网络流量分析:对网络流量数据进行维数约简,帮助网络管理员监控和理解网络行为模式。
二、维数约简技术在网络安全中的应用分析2.1 维数约简技术在异常检测中的应用在网络安全中,异常检测是识别潜在威胁的关键步骤。
维数约简技术可以帮助分析人员从常规的网络行为中提取出关键特征,构建正常行为的模型,从而更有效地识别出异常行为。
例如,通过PCA可以减少网络流量数据的维度,同时保留数据的主要变化趋势,为异常检测算法提供更清晰的数据视图。
2.2 维数约简技术在入侵检测系统中的应用入侵检测系统是网络安全的重要组成部分,它通过监控网络流量来检测可能的入侵行为。
维数约简技术可以提高IDS的检测速度和准确性。
通过LDA等方法,可以从网络数据中提取出最能区分正常和异常行为的特征,从而提高IDS 的分类能力。
安全多方计算发展历程安全多方计算(Secure Multi-Party Computation,SMPC)是一种重要的密码学协议,旨在保护参与方的隐私。
它允许多个参与方在不泄露私密数据的情况下进行计算,是一种难以逆袭的技术发展。
SMPC的发展历程可以追溯到20世纪80年代,当时,密钥同态密码系统的概念首次被提出。
密钥同态密码系统是一种协议,可以允许两个参与方执行某些计算,并且只有他们可以获得计算结果。
然而,这种系统的实现方式是基于特定的密码学算法,且只适用于特定的计算任务。
随着计算机技术的迅速发展和互联网的普及,人们对隐私和数据安全的关注度日益增加。
因此,研究人员开始探索如何在多方参与的计算中保护数据隐私,于是SMPC的概念逐渐成型。
1998年,美国密歇根大学的盖尔·埃文斯和美国加州大学伯克利分校的宾利·博纳尔德提出了一个重要的隐私保护模型,称为Diffie-Hellman-based secure function evaluation (DH-SFE)。
这个模型使用了Diffie-Hellman密钥交换算法,可以实现在多方计算中保护隐私数据。
2004年,伊利诺伊大学厄巴纳-香槟分校的奥德利·戴维和以色列理工学院的约维特·尼萨尔提出了一个重要的SMPC算法,称为Yao协议。
这个算法使用了加密技术和随机掩码技术,可以实现在多方计算中保护数据隐私。
从此以后,随着各种密码学技术的不断发展,SMPC的研究也取得了突破性进展。
研究人员提出了许多新的SMPC协议和算法,如Garbled Circuits、Secure Multiparty Linear Regression 等。
这些新的技术不仅可以提供更高的安全性,还可以在更广泛的应用场景中使用。
如今,SMPC已经得到了广泛的应用,包括金融、医疗、社交网络等领域。
在金融领域,SMPC可以用于多方信用评级、资产组合优化等任务。
multi-institutional team 中文译法-概述说明以及解释1.引言1.1 概述在科学研究和工程领域,多机构团队(multi-institutional team)越来越受到重视,这种团队模式是由不同机构或组织合作共同完成一个项目或任务。
这一模式的出现,打破了传统的学术合作限制,使得研究者和工程师能够跨越地域和机构的界限,共同解决复杂的问题和挑战。
本文将探讨multi-institutional team的概念、优势和挑战,并总结其对科研和工程领域的重要性。
的内容1.2 文章结构文章结构部分将主要介绍本文的组织架构和内容安排。
其中,我们将从引言开始,引入读者对multi-institutional team的概念和重要性。
接着,我们将详细探讨multi-institutional team的定义、优势和挑战,帮助读者更全面地了解这一概念。
最后,在结论部分,我们将总结multi-institutional team的重要性,并展望未来发展趋势,以及给出一些结束语,概括全文内容。
通过这样的结构安排,读者可以系统性地了解并理解multi-institutional team的相关知识和观点。
1.3 目的文章的目的是探讨multi-institutional team在当前社会和经济环境下的重要性和影响。
我们将分析multi-institutional team的定义、优势和挑战,以及其在不同领域中的运用情况。
通过深入研究和分析,我们希望能够为读者提供对multi-institutional team的全面和深入的了解,同时为相关领域的专业人士和决策者提供有益的参考和启示。
最终目的是促进不同机构之间的合作与交流,推动协同创新,推动社会与经济的可持续发展。
2.正文2.1 什么是multi-institutional teamMulti-institutional team是指由来自不同机构或组织的成员组成的团队。
(0,2) 插值||(0,2) interpolation0#||zero-sharp; 读作零井或零开。
0+||zero-dagger; 读作零正。
1-因子||1-factor3-流形||3-manifold; 又称“三维流形”。
AIC准则||AIC criterion, Akaike information criterionAp 权||Ap-weightA稳定性||A-stability, absolute stabilityA最优设计||A-optimal designBCH 码||BCH code, Bose-Chaudhuri-Hocquenghem codeBIC准则||BIC criterion, Bayesian modification of the AICBMOA函数||analytic function of bounded mean oscillation; 全称“有界平均振动解析函数”。
BMO鞅||BMO martingaleBSD猜想||Birch and Swinnerton-Dyer conjecture; 全称“伯奇与斯温纳顿-戴尔猜想”。
B样条||B-splineC*代数||C*-algebra; 读作“C星代数”。
C0 类函数||function of class C0; 又称“连续函数类”。
CA T准则||CAT criterion, criterion for autoregressiveCM域||CM fieldCN 群||CN-groupCW 复形的同调||homology of CW complexCW复形||CW complexCW复形的同伦群||homotopy group of CW complexesCW剖分||CW decompositionCn 类函数||function of class Cn; 又称“n次连续可微函数类”。
Cp统计量||Cp-statisticC。
2022年厦门理工学院数据科学与大数据技术专业《计算机系统结构》科目期末试卷B(有答案)一、选择题1、利用时间重叠概念实现并行处理的是( )。
A.流水处理机B.多处理机C.并行(阵列)处理机D.相联处理机2、计算机组成设计不考虑( )。
A.专用部件设置B.功能部件的集成度C.控制机构的组成D.缓冲技术3、“启动I/O”指令是主要的输入输出指令,是属于()。
A.目态指令B.管态指令C.目态、管态都能用的指令D.编译程序只能用的指令4、从计算机系统结构上讲,机器语言程序员所看到的机器属性是()A.计算机软件所要完成的功能B.计算机硬件的全部组成C.编程要用到的硬件组织D.计算机各部件的硬件实现。
5、开发并行的途径有(),资源重复和资源共享。
A.多计算机系统B.多道分时C.分布式处理系统D.时间重叠6、计算机系统的层次结构按照由高到低的顺序分别为()。
A.高级语言机器级,汇编语言机器级,传统机器语言机器级,微程序机器级B.高级语言机器级,应用语言机器级,汇编语言机器级,微程序机器级C.应用语言机器级,传统机器语言机器级,汇编语言机器级,操作系统机器级D.应用语言机器级,操作系统机器级,微程序机器级,传统机器语言机器级7、在操作系统机器级,一般用()程序()作业控制语句。
A.汇编程序,翻译B.汇编程序,解释C.机器语言,解释D.机器语言,翻译8、1TFLOPS计算机能力,1TBYTE/S的I/O带宽和()称为计算机系统的3T性能目标。
A,1TBYTE 硬盘容量B.1TBYTE 软盘容量C.1TBYTE 主存容量D.A和B9、对机器语言程序员透明的是( )。
A.中断字B.主存地址寄存器C.通用寄存器D.条件码10、浮点数尾数基值rm=8,尾数数值部分长6位,可表示的规格化最小正尾数为( )A.0.5B.0.25C.0.125D.1/64二、填空题11、评价地址码个数不同的4种指令的优缺点的主要标准是________和________12、输入输出设备的异步性、实时性、与________三个特点是现代计算机系统必须具备的共同特性。
A very brief guide to using MXMMichail Tsagris,Vincenzo Lagani,Ioannis Tsamardinos1IntroductionMXM is an R package which contains functions for feature selection,cross-validation and Bayesian Networks.The main functionalities focus on feature selection for different types of data.We highlight the option for parallel computing and the fact that some of the functions have been either partially or fully implemented in C++.As for the other ones,we always try to make them faster.2Feature selection related functionsMXM offers many feature selection algorithms,namely MMPC,SES,MMMB,FBED,forward and backward regression.The target set of variables to be selected,ideally what we want to discover, is called Markov Blanket and it consists of the parents,children and parents of children(spouses) of the variable of interest assuming a Bayesian Network for all variables.MMPC stands for Max-Min Parents and Children.The idea is to use the Max-Min heuristic when choosing variables to put in the selected variables set and proceed in this way.Parents and Children comes from the fact that the algorithm will identify the parents and children of the variable of interest assuming a Bayesian Network.What it will not recover is the spouses of the children of the variable of interest.For more information the reader is addressed to[23].MMMB(Max-Min Markov Blanket)extends the MMPC to discovering the spouses of the variable of interest[19].SES(Statistically Equivalent Signatures)on the other hand extends MMPC to discovering statistically equivalent sets of the selected variables[18,9].Forward and Backward selection are the two classical procedures.The functionalities or the flexibility offered by all these algorithms is their ability to handle many types of dependent variables,such as continuous,survival,categorical(ordinal,nominal, binary),longitudinal.Let us now see all of them one by one.The relevant functions are1.MMPC and SES.SES uses MMPC to return multiple statistically equivalent sets of vari-ables.MMPC returns only one set of variables.In all cases,the log-likelihood ratio test is used to assess the significance of a variable.These algorithms accept categorical only, continuous only or mixed data in the predictor variables side.2.wald.mmpc and wald.ses.SES uses MMPC using the Wald test.These two algorithmsaccept continuous predictor variables only.3.perm.mmpc and perm.ses.SES uses MMPC where the p-value is obtained using per-mutations.Similarly to the Wald versions,these two algorithms accept continuous predictor variables only.4.ma.mmpc and ma.ses.MMPC and SES for multiple datasets measuring the same variables(dependent and predictors).5.MMPC.temporal and SES.temporal.Both of these algorithms are the usual SES andMMPC modified for correlated data,such as clustered or longitudinal.The predictor vari-ables can only be continuous.6.fbed.reg.The FBED feature selection method[2].The log-likelihood ratio test or the eBIC(BIC is a special case)can be used.7.fbed.glmm.reg.FBED with generalised linear mixed models for repeated measures orclustered data.8.fbed.ge.reg.FBED with GEE for repeated measures or clustered data.9.ebic.bsreg.Backward selection method using the eBIC.10.fs.reg.Forward regression method for all types of predictor variables and for most of theavailable tests below.11.glm.fsreg Forward regression method for logistic and Poisson regression in specific.Theuser can call this directly if he knows his data.12.lm.fsreg.Forward regression method for normal linear regression.The user can call thisdirectly if he knows his data.13.bic.fsreg.Forward regression using BIC only to add a new variable.No statistical test isperformed.14.bic.glm.fsreg.The same as before but for linear,logistic and Poisson regression(GLMs).15.bs.reg.Backward regression method for all types of predictor variables and for most of theavailable tests below.16.glm.bsreg.Backward regression method for linear,logistic and Poisson regression(GLMs).17.iamb.The IAMB algorithm[20]which stands for Incremental Association Markov Blanket.The algorithm performs a forward regression at first,followed by a backward regression offering two options.Either the usual backward regression is performed or a faster variation, but perhaps less correct variation.In the usual backward regression,at every step the least significant variable is removed.In the IAMB original version all non significant variables are removed at every step.18.mmmb.This algorithm works for continuous or categorical data only.After applying theMMPC algorithm one can go to the selected variables and perform MMPC on each of them.A list with the available options for this argument is given below.Make sure you include the test name within””when you supply it.Most of these tests come in their Wald and perm (permutation based)versions.In their Wald or perm versions,they may have slightly different acronyms,for example waldBinary or WaldOrdinal denote the logistic and ordinal regression respectively.1.testIndFisher.This is a standard test of independence when both the target and the setof predictor variables are continuous(continuous-continuous).2.testIndSpearman.This is a non-parametric alternative to testIndFisher test[6].3.testIndReg.In the case of target-predictors being continuous-mixed or continuous-categorical,the suggested test is via the standard linear regression.If the robust option is selected,M estimators[11]are used.If the target variable consists of proportions or percentages(within the(0,1)interval),the logit transformation is applied beforehand.4.testIndRQ.Another robust alternative to testIndReg for the case of continuous-mixed(or continuous-continuous)variables is the testIndRQ.If the target variable consists of proportions or percentages(within the(0,1)interval),the logit transformation is applied beforehand.5.testIndBeta.When the target is proportion(or percentage,i.e.,between0and1,notinclusive)the user can fit a regression model assuming a beta distribution[5].The predictor variables can be either continuous,categorical or mixed.6.testIndPois.When the target is discrete,and in specific count data,the default test isvia the Poisson regression.The predictor variables can be either continuous,categorical or mixed.7.testIndNB.As an alternative to the Poisson regression,we have included the Negativebinomial regression to capture cases of overdispersion[8].The predictor variables can be either continuous,categorical or mixed.8.testIndZIP.When the number of zeros is more than expected under a Poisson model,thezero inflated poisson regression is to be employed[10].The predictor variables can be either continuous,categorical or mixed.9.testIndLogistic.When the target is categorical with only two outcomes,success or failurefor example,then a binary logistic regression is to be used.Whether regression or classifi-cation is the task of interest,this method is applicable.The advantage of this over a linear or quadratic discriminant analysis is that it allows for categorical predictor variables as well and for mixed types of predictors.10.testIndMultinom.If the target has more than two outcomes,but it is of nominal type(political party,nationality,preferred basketball team),there is no ordering of the outcomes,multinomial logistic regression will be employed.Again,this regression is suitable for clas-sification purposes as well and it to allows for categorical predictor variables.The predictor variables can be either continuous,categorical or mixed.11.testIndOrdinal.This is a special case of multinomial regression,in which case the outcomeshave an ordering,such as not satisfied,neutral,satisfied.The appropriate method is ordinal logistic regression.The predictor variables can be either continuous,categorical or mixed.12.testIndTobit(Tobit regression for left censored data).Suppose you have measurements forwhich values below some value were not recorded.These are left censored values and by using a normal distribution we can by pass this difficulty.The predictor variables can be either continuous,categorical or mixed.13.testIndBinom.When the target variable is a matrix of two columns,where the first one isthe number of successes and the second one is the number of trials,binomial regression is to be used.The predictor variables can be either continuous,categorical or mixed.14.gSquare.If all variables,both the target and predictors are categorical the default test isthe G2test of independence.An alternative to the gSquare test is the testIndLogistic.With the latter,depending on the nature of the target,binary,un-ordered multinomial or ordered multinomial the appropriate regression model is fitted.The predictor variables can be either continuous,categorical or mixed.15.censIndCR.For the case of time-to-event data,a Cox regression model[4]is employed.Thepredictor variables can be either continuous,categorical or mixed.16.censIndWR.A second model for the case of time-to-event data,a Weibull regression modelis employed[14,13].Unlike the semi-parametric Cox model,the Weibull model is fully parametric.The predictor variables can be either continuous,categorical or mixed.17.censIndER.A third model for the case of time-to-event data,an exponential regressionmodel is employed.The predictor variables can be either continuous,categorical or mixed.This is a special case of the Weibull model.18.testIndIGreg.When you have non negative data,i.e.the target variable takes positivevalues(including0),a suggested regression is based on the the inverse Gaussian distribution.The link function is not the inverse of the square root as expected,but the logarithm.This is to ensure that the fitted values will be always be non negative.An alternative model is the Weibull regression(censIndWR).The predictor variables can be either continuous, categorical or mixed.19.testIndGamma(Gamma regression).Gamma distribution is designed for strictly positivedata(greater than zero).It is used in reliability analysis,as an alternative to the Weibull regression.This test however does not accept censored data,just the usual numeric data.The predictor variables can be either continuous,categorical or mixed.20.testIndNormLog(Gaussian regression with a log link).Gaussian regression using the loglink(instead of the identity)allows non negative data to be handled naturally.Unlike the gamma or the inverse gaussian regression zeros are allowed.The predictor variables can be either continuous,categorical or mixed.21.testIndClogit.When the data come from a case-control study,the suitable test is via con-ditional logistic regression[7].The predictor variables can be either continuous,categorical or mixed.22.testIndMVReg.In the case of multivariate continuous target,the suggested test is viaa multivariate linear regression.The target variable can be compositional data as well[1].These are positive data,whose vectors sum to1.They can sum to any constant,as long as it the same,but for convenience reasons we assume that they are normalised to sum to1.In this case the additive log-ratio transformation(multivariate logit transformation)is applied beforehand.The predictor variables can be either continuous,categorical or mixed.23.testIndGLMMReg.In the case of a longitudinal or clustered target(continuous,propor-tions within0and1(not inclusive)),the suggested test is via a(generalised)linear mixed model[12].The predictor variables can only be continuous.This test is only applicable in SES.temporal and MMPC.temporal.24.testIndGLMMPois.In the case of a longitudinal or clustered target(counts),the suggestedtest is via a(generalised)linear mixed model[12].The predictor variables can only be continuous.This test is only applicable in SES.temporal and MMPC.temporal.25.testIndGLMMLogistic.In the case of a longitudinal or clustered target(binary),thesuggested test is via a(generalised)linear mixed model[12].The predictor variables can only be continuous.This test is only applicable in SES.temporal and MMPC.temporal.To avoid any mistakes or wrongly selected test by the algorithms you are advised to select the test you want to use.All of these tests can be used with SES and MMPC,forward and backward regression methods.MMMB accepts only testIndFisher,testIndSpearman and gSquare.The reason for this is that MMMB was designed for variables(dependent and predictors)of the same type.For more info the user should see the help page of each function.2.1A more detailed look at some arguments of the feature selection algorithmsSES,MMPC,MMMB,forward and backward regression offer the option for robust tests(the argument robust).This is currently supported for the case of Pearson correlation coefficient and linear regression at the moment.We plan to extend this option to binary logistic and Poisson regression as well.These algorithms have an argument user test.In the case that the user wants to use his own test,for example,mytest,he can supply it in this argument as is,without””. For all previously mentioned regression based conditional independence tests,the argument works as test=”testIndFisher”.In the case of the user test it works as user test=mytest.The max kargument must always be at least1for SES,MMPC and MMMB,otherwise it is a simple filtering of the variables.The argument ncores offers the option for parallel implementation of the first step of the algorithms.The filtering step,where the significance of each predictor is assessed.If you have a few thousands of variables,maybe this option will do no significant improvement.But, if you have more and a”difficult”regression test,such as quantile regression(testIndRQ),then with4cores this could reduce the computational time of the first step up to nearly50%.For the Poisson,logistic and normal linear regression we have included C++codes to speed up this process,without the use of parallel.The FBED(Forward Backward Early Dropping)is a variant of the Forward selection is per-formed in the first phase followed by the usual backward regression.In some,the variation is that every non significant variable is dropped until no mre significant variables are found or there is no variable left.The forward and backward regression methods have a few different arguments.For example stopping which can be either”BIC”or”adjrsq”,with the latter being used only in the linear regression case.Every time a variable is significant it is added in the selected variables set.But, it may be the case,that it is actually not necessary and for this reason we also calculate the BIC of the relevant model at each step.If the difference BIC is less than the tol(argument)threshold value the variable does not enter the set and the algorithm stops.The forward and backward regression methods can proceed via the BIC as well.At every step of the algorithm,the BIC of the relevant model is calculated and if the BIC of the model including a candidate variable is reduced by more that the tol(argument)threshold value that variable is added.Otherwise the variable is not included and the algorithm stops.2.2Other relevant functionsOnce SES or MMPC are finished,the user might want to see the model produced.For this reason the functions ses.model and mmpc.model can be used.If the user wants to get some summarised results with MMPC for many combinations of max k and treshold values he can use the mmpc.path function.Ridge regression(ridge.reg and ridge.cv)have been implemented. Note that ridge regression is currently offered only for linear regression with continuous predictor variables.As for some miscellaneous,we have implemented the zero inflated Poisson and beta regression models,should the user want to use them.2.3Cross-validationcv.ses and cv.mmpc perform a K-fold cross validation for most of the aforementioned regression models.There are many metric functions to be used,appropriate for each case.The folds can be generated in a stratified fashion when the dependent variable is categorical.3NetworksCurrently three algorithms for constructing Bayesian Networks(or their skeleton)are offered,plus modifications.MMHC(Max-Min Hill-Climbing)[23],(mmhc.skel)which constructs the skeleton of the Bayesian Network(BN).This has the option of running SES[18]instead.MMHC(Max-Min Hill-Climbing)[23],(local.mmhc.skel)which constructs the skeleton around a selected node.It identifies the Parents and Children of that node and then finds their Parents and Children.MMPC followed by the PC rules.This is the command mmpc.or.PC algorithm[15](pc.skel for which the orientation rules(pc.or)have been implemented as well.Both of these algorithms accept continuous only,categorical data only or a mix of continuous,multinomial and ordinal.The skeleton of the PC algorithm has the option for permutation based conditional independence tests[21].The functions ci.mm and ci.fast perform a symmetric test with mixed data(continuous, ordinal and binary data)[17].This is employed by the PC algorithm as well.Bootstrap of the PC algorithm to estimate the confidence of the edges(pc.skel.boot).PC skeleton with repeated measures(glmm.pc.skel).This uses the symetric test proposed by[17]with generalised linear models.Skeleton of a network with continuous data using forward selection.The command work does a similar to MMHC task.It goes to every variable and instead applying the MMPC algorithm it applies the forward selection regression.All data must be continuous,since the Pearson correlation is used.The algorithm is fast,since the forward regression with the Pearson correlation is very fast.We also have utility functions,such as1.rdag and rdag2.Data simulation assuming a BN[3].2.findDescendants and findAncestors.Descendants and ancestors of a node(variable)ina given Bayesian Network.3.dag2eg.Transforming a DAG into an essential(mixed)graph,its class of equivalent DAGs.4.equivdags.Checking whether two DAGs are equivalent.5.is.dag.In fact this checks whether cycles are present by trying to topologically sort theedges.BNs do not allow for cycles.6.mb.The Markov Blanket of a node(variable)given a Bayesian Network.7.nei.The neighbours of a node(variable)given an undirected graph.8.undir.path.All paths between two nodes in an undirected graph.9.transitiveClosure.The transitive closure of an adjacency matrix,with and without arrow-heads.10.bn.skel.utils.Estimation of false discovery rate[22],plus AUC and ROC curves based onthe p-values.11.bn.skel.utils2.Estimation of the confidence of the edges[16],plus AUC and ROC curvesbased on the confidences.12.plotnetwork.Interactive plot of a graph.4AcknowledgmentsThe research leading to these results has received funding from the European Research Coun-cil under the European Union’s Seventh Framework Programme(FP/2007-2013)/ERC Grant Agreement n.617393.References[1]John Aitchison.The statistical analysis of compositional data.Chapman and Hall London,1986.[2]Giorgos Borboudakis and Ioannis Tsamardinos.Forward-Backward Selection with Early Drop-ping,2017.[3]Diego Colombo and Marloes H Maathuis.Order-independent constraint-based causal structurelearning.Journal of Machine Learning Research,15(1):3741–3782,2014.[4]David Henry Cox.Regression Models and Life-Tables.Journal of the Royal Statistical Society,34(2):187–220,1972.[5]Silvia Ferrari and Francisco Cribari-Neto.Beta regression for modelling rates and proportions.Journal of Applied Statistics,31(7):799–815,2004.[6]Edgar C Fieller and Egon S Pearson.Tests for rank correlation coefficients:II.Biometrika,48:29–40,1961.[7]Mitchell H Gail,Jay H Lubin,and Lawrence V Rubinstein.Likelihood calculations for matchedcase-control studies and survival studies with tied death times.Biometrika,68(3):703–707, 1981.[8]Joseph M Hilbe.Negative binomial regression.Cambridge University Press,2011.[9]Vincenzo Lagani,Giorgos Athineou,Alessio Farcomeni,Michail Tsagris,and IoannisTsamardinos.Feature Selection with the R Package MXM:Discovering Statistically-Equivalent Feature Subsets.Journal of Statistical Software,80(7),2017.[10]Diane Lambert.Zero-inflated Poisson regression,with an application to defects in manufac-turing.Technometrics,34(1):1–14,1992.[11]RARD Maronna,Douglas Martin,and Victor Yohai.Robust statistics.John Wiley&Sons,Chichester.ISBN,2006.[12]Jose Pinheiro and Douglas Bates.Mixed-effects models in S and S-PLUS.Springer Science&Business Media,2006.[13]FW Scholz.Maximum likelihood estimation for type I censored Weibull data including co-variates,1996.[14]Richard L Smith.Weibull regression models for reliability data.Reliability Engineering&System Safety,34(1):55–76,1991.[15]Peter Spirtes,Clark Glymour,and Richard Scheines.Causation,Prediction,and Search.TheMIT Press,second edi edition,12001.[16]Sofia Triantafillou,Ioannis Tsamardinos,and Anna Roumpelaki.Learning neighborhoods ofhigh confidence in constraint-based causal discovery.In European Workshop on Probabilistic Graphical Models,pages487–502.Springer,2014.[17]Michail Tsagris,Giorgos Borboudakis,Vincenzo Lagani,and Ioannis Tsamardinos.Constraint-based Causal Discovery with Mixed Data.In The2017ACM SIGKDD Work-shop on Causal Discovery,14/8/2017,Halifax,Nova Scotia,Canada,2017.[18]I.Tsamardinos,gani,and D.Pappas.Discovering multiple,equivalent biomarker sig-natures.In In Proceedings of the7th conference of the Hellenic Society for Computational Biology&Bioinformatics,Heraklion,Crete,Greece,2012.[19]Ioannis Tsamardinos,Constantin F Aliferis,and Alexander Statnikov.Time and sampleefficient discovery of Markov blankets and direct causal relations.In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining,pages673–678.ACM,2003.[20]Ioannis Tsamardinos,Constantin F Aliferis,Alexander R Statnikov,and Er Statnikov.Al-gorithms for Large Scale Markov Blanket Discovery.In FLAIRS conference,volume2,pages 376–380,2003.[21]Ioannis Tsamardinos and Giorgos Borboudakis.Permutation testing improves Bayesian net-work learning.In ECML PKDD’10Proceedings of the2010European conference on Machine learning and knowledge discovery in databases,pages322–337.Springer-Verlag,2010.[22]Ioannis Tsamardinos and Laura E Brown.Bounding the False Discovery Rate in LocalBayesian Network Learning.In AAAI,pages1100–1105,2008.[23]Ioannis Tsamardinos,Laura E.Brown,and Constantin F.Aliferis.The Max-Min Hill-ClimbingBayesian Network Structure Learning Algorithm.Machine Learning,65(1):31–78,2006.。
武汉大学2007—2008学年度高校教师研修班《生物信息学》试卷(B)及答案一、翻译下列名词并解释。
(每题5分,共25分)1. HGP2. SRS3. Markov Chain4. ANN5. CDS二、填空(每空2分,共20分)1、生物信息学主要研究的两种信息载体是和。
2、目前国际上主要的核酸数据库是由建立和维护的、由维护的,和日本遗传研究所建立和维护的。
每个机构负责收集来自不同地理分布的数据, 3 个数据库所有信息并向世界开放,3、在进行序列两两比对时,有两方面问题直接影响相似性分值:和。
三、解释说明:请按要求对下列GenBank文件作解释说明。
(每小题4分,共20分)1、LOCUS行中的第3项mRNA linear表示,这里是。
2、DEFINITION行在GenBank记录中用以3 ACCESSION 是,是从数据库中检索一个记录的主要。
4. FEATURES后面部分是,直接表达了记录的生物背景知识,5 CDS 30…533 表示。
四、问答。
(共35分)1、DNA测序有哪些方法?其基本原理是什么?(10分)2、简述蛋白质结构预测的基本思想和方法。
(10分)3、试述人类基因组计划与生物信息学的关系。
(15分)《生物信息学》试卷(B)答案一、翻译下列名词并解释。
(每题5分,共25分)1. HGP Human genome project人类基因组计划2. SRS Sequence Retrieval System 序列检索系统3. 马尔科夫链(Markov Chain),对于生物分子序列分析,马尔科夫链是一个很好的数学统计模型,因为马尔科夫链本身就是相继发生事件的序列,其特征是对于事件序列中的任何一个事件都有一个发生概率,而这个概率依赖于该事件之前的若干个事件。
4. ANN Artificial Neural Network, 人工神经网络5. CDS指的是编码序列,从起始密码子到终止密码子二、填空(每空2分,共20分)1、生物信息学主要研究的两种信息载体是DNA分子和蛋白质分子2、目前国际上主要的核酸数据库是由美国国立生物技术信息中心建立和维护的Genbank库、由欧洲生物信息学研究所(EBI)维护的EMBL-Bank,和日本遗传研究所建立和维护的日本DNA 数据仓库(DDBJ)。