Game Theory Lecture 11 (course slides)
Lecture 11: Bargaining

Readings
- Watson, Strategy: An Introduction to Game Theory, Ch. 19.2; Exercise 3.

Outline
- The ultimatum game
- A finite horizon game with alternating offers and impatient players
- An infinite horizon game with alternating offers and impatient players

The ultimatum game
- Players: the two players.
- Timing: player 1 proposes a division (x1, x2) of a pie, where x1 + x2 = 1. If player 2 accepts this division, she receives x2 and player 1 receives x1; if she rejects it, neither player receives any pie.
- Preferences: each person's preferences are represented by payoffs equal to the share of the pie she receives. (Game tree: 1 proposes (x1, x2); 2 plays Y, yielding payoffs (x1, x2), or N, yielding (0, 0).)

Backward induction
- First consider the optimal strategy of player 2. Player 2 either accepts all offers (including x2 = 0), or accepts all offers with x2 > 0 and rejects the offer x2 = 0.
- Now consider the optimal strategy of player 1. If player 2 accepts all offers (including 0), then player 1's optimal offer is 0. If player 2 accepts all offers except zero, then no offer of player 1 is optimal: against any offer x2 > 0, some smaller positive offer would also be accepted and would give player 1 more.

Conclusion
- The only equilibrium of the game is the strategy pair in which player 1 offers 0 and player 2 accepts all offers.
- In this equilibrium, player 1's payoff is all the pie and player 2's payoff is zero.

Extension of the ultimatum game
(Game tree: 1 proposes (x1, x2); 2 plays Y, yielding (x1, x2), or N, after which 2 proposes (y1, y2); 1 then plays Y, yielding (y1, y2), or N, yielding (0, 0).)

Backward induction
- In this game, player 1 is powerless; her proposal at the start of the game is irrelevant.
- Every subgame following player 2's rejection of a proposal of player 1 is a variant of the ultimatum game in which player 2 moves first.
- Thus every such subgame has a unique equilibrium, in which player 2 offers nothing to player 1, and player 1 accepts all proposals.
- Hence in every equilibrium player 2 obtains all the pie.

Conclusion
- In the extension of this game in which the players alternate offers over many periods, a similar result holds: in every equilibrium, the player who makes the offer in the last period obtains all the pie.

A finite horizon game with alternating offers and impatient players

Two-period deadline
(Game tree: 1 proposes (x1, x2); 2 plays Y, yielding (x1, x2), or N, after which 2 proposes (y1, y2); 1 then plays Y, yielding the discounted payoffs (δ1·y1, δ2·y2), or N, yielding (0, 0). Backward induction gives y1 = 0 and x2 = δ2.)

Backward induction
The game has a unique equilibrium, in which:
- Player 1's initial proposal is (1 − δ2, δ2).
- Player 2 accepts all proposals in which she receives at least δ2 and rejects all proposals in which she receives less than δ2.
- Player 2 proposes (0, 1) after any history in which she rejects a proposal of player 1.
- Player 1 accepts all proposals of player 2 at the end of the game (after a history in which player 2 rejects player 1's opening proposal).

Conclusion
- The outcome of this equilibrium is that player 1 proposes (1 − δ2, δ2), which player 2 accepts.
- Player 1's payoff is 1 − δ2 and player 2's payoff is δ2.

Third-period deadline
(Game tree as above, extended by one round: if player 1 rejects player 2's counterproposal (y1, y2), player 1 proposes (x1', x2'), yielding the twice-discounted payoffs (δ1²·x1', δ2²·x2') if accepted and (0, 0) otherwise. Backward induction gives x2' = 0, then y1 = δ1, then x1 = 1 − δ2(1 − δ1).)
This game has a unique equilibrium, with outcome (1 − δ2(1 − δ1), δ2(1 − δ1)).

An infinite horizon game with alternating offers and impatient players

Definition: A bargaining game of alternating offers is the following extensive game with perfect information.
- Players: two negotiators, say 1 and 2.
- Terminal histories: every sequence of the form (x^1, N, x^2, N, ..., x^t, Y) for t ≥ 1, and every (infinite) sequence of the form (x^1, N, x^2, N, ...), where each x^r is a division of the pie (a pair of numbers that sums to 1).
- Player function: P(∅) = 1 (player 1 makes the first offer), and P(x^1, N, ..., x^t, N) = 2 if t is odd and 1 if t is even.
- Preferences: for i = 1, 2, player i's payoff to the terminal history (x^1, N, ..., x^t, Y) is δi^(t−1)·xi^t, where 0 < δi < 1, and her payoff to every (infinite) terminal history (x^1, N, x^2, N, ...) is 0.

Stationarity
- The structure of the game is stationary.
- The stationary structure of the game makes it reasonable to guess that the game has an equilibrium in which each player always makes the same proposal and always accepts the same set of proposals; that is, each player's strategy is stationary.
- A stationary strategy pair: player 1 always proposes x* = (x1*, x2*) and accepts a proposal y if and only if y1 ≥ y1*; player 2 always proposes y* = (y1*, y2*) and accepts a proposal x if and only if x2 ≥ x2*.
(Game tree: 1 proposes (x1*, x2*); if 2 rejects, 2 proposes (y1*, y2*), yielding discounted payoffs (δ1·y1*, δ2·y2*) if accepted; and so on.)

Properties of subgame perfect equilibrium
- Efficiency: player 2 accepts player 1's first offer, so that agreement is reached immediately; no resources are wasted in delay.
- Effect of changes in patience: for a given value of δ2, the value of x1*, the equilibrium payoff of player 1, increases as δ1 increases to 1. That is, fixing the patience of player 2, player 1's share increases as she becomes more patient. Further, as player 1 becomes extremely patient (δ1 close to 1), her share approaches 1. Symmetrically, fixing the patience of player 1, player 2's share increases to 1 as she becomes more patient.
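The deadline results above, and their agreement with the stationary infinite-horizon guess, can be checked numerically. A minimal sketch (the function names are my own; `rubinstein_split` uses the standard closed form x1* = (1 − δ2)/(1 − δ1·δ2) for the first proposer's stationary-SPE share, which is not derived explicitly in the slides):

```python
def finite_horizon_split(T, d1, d2):
    """Backward induction for a T-period alternating-offers game.

    Player 1 proposes in odd periods (1, 3, ...), player 2 in even ones.
    In the final period the proposer keeps the whole pie (T = 1 is the
    ultimatum game).  Working backwards, a responder must be offered the
    discounted value of her next-period share as proposer.
    Returns (player 1 share, player 2 share) agreed in period 1.
    """
    delta = {1: d1, 2: d2}
    proposer = 1 if T % 2 == 1 else 2   # proposer in the final period T
    share = 1.0                         # that proposer's share
    for t in range(T - 1, 0, -1):
        responder = proposer            # next period's proposer responds at t
        proposer = 1 if t % 2 == 1 else 2
        share = 1.0 - delta[responder] * share
    return (share, 1.0 - share) if proposer == 1 else (1.0 - share, share)

def rubinstein_split(d1, d2):
    """Stationary-SPE shares when player 1 proposes first."""
    x1 = (1 - d2) / (1 - d1 * d2)
    return (x1, 1 - x1)
```

With T = 1 this reproduces the ultimatum game, T = 2 gives (1 − δ2, δ2), T = 3 gives (1 − δ2(1 − δ1), δ2(1 − δ1)), and for a long horizon the period-1 split approaches the stationary shares.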
Week 11: Game Theory
Required Reading: Schotter pp. 229-260

Lecture Plan
1. Static Game Theory
   - Normal Form Games
   - Solution Techniques for Solving Static Games: Dominant Strategy, Nash Equilibrium
2. Prisoners' Dilemma
3. Decision Analysis
   - Maximin Criterion
   - Minimax Criterion
4. Dynamic One-Off Games
   - Extensive Form Games
   - The Subgame Perfect Nash Equilibrium

1. Static Game Theory
Static games: the players make their moves in isolation, without knowing what the other players have done.

1.1 Normal Form Games
In game theory there are two ways in which a game can be represented:
1st) the normal form (or strategic form) game;
2nd) the extensive form game.
A normal form game is any game where we can identify the following three things:
1. the players;
2. the strategies available to each player;
3. the payoffs. A payoff is what a player will receive at the end of the game, contingent upon the actions of all players in the game.
Suppose that two people (A and B) are playing a simple game. A will write one of two words on a piece of paper, "Top" or "Bottom". At the same time, B will independently write "Left" or "Right" on a piece of paper. After they do this, the papers will be examined and they will get the payoffs depicted in Table 1.
Table 1
If A says Top and B says Left, then we examine the top-left corner of the matrix. In this matrix, the payoff to A (B) is the first (second) entry in the box. For example, if A writes "Top" and B writes "Left", the payoff to A is 1 and the payoff to B is 2.
What is/are the equilibrium outcome(s) of this game?

1.2 The Nash Equilibrium Approach to Solving Static Games
The Nash equilibrium was first defined by John Nash in 1951, building on the work of Cournot (1838).
A pair of strategies is a Nash equilibrium if A's choice is optimal given B's choice, and B's choice is optimal given A's choice.
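This definition can be checked mechanically by computing best responses. A minimal sketch (the payoff matrix below is hypothetical, standing in for the handout's Table 2, whose entries did not survive extraction; it is chosen to match the walkthrough later in the text):

```python
# Hypothetical payoff matrix: entry (a, b) maps to (payoff to A, payoff to B).
PAYOFFS = {
    ("Top", "Left"): (2, 1), ("Top", "Right"): (0, 0),
    ("Bottom", "Left"): (0, 2), ("Bottom", "Right"): (1, 0),
}
A_MOVES, B_MOVES = ("Top", "Bottom"), ("Left", "Right")

def pure_nash_equilibria(payoffs):
    """Return all cells where each player's move is a best response to
    the other's -- the two-stage 'underlining' procedure in the text."""
    equilibria = []
    for a in A_MOVES:
        for b in B_MOVES:
            ua, ub = payoffs[(a, b)]
            a_best = all(payoffs[(a2, b)][0] <= ua for a2 in A_MOVES)
            b_best = all(payoffs[(a, b2)][1] <= ub for b2 in B_MOVES)
            if a_best and b_best:
                equilibria.append((a, b))
    return equilibria

print(pure_nash_equilibria(PAYOFFS))  # -> [('Top', 'Left')]
```

The nested `all(...)` checks are exactly stage 1 of the procedure (find each player's best response cell by cell), and collecting the cells where both checks pass is stage 2.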
When this equilibrium outcome is reached, neither individual wants to change his behaviour.
Finding the Nash equilibrium for any game involves two stages.
1) Identify each player's optimal strategy in response to what the other players might do:
   Given B chooses Left, the optimal strategy for A is ...
   Given B chooses Right, the optimal strategy for A is ...
   Given A chooses Top, the optimal strategy for B is ...
   Given A chooses Bottom, the optimal strategy for B is ...
   We show this by underlining the payoff element for each case.
2) A Nash equilibrium is identified when all players are playing their optimal strategies simultaneously.
In the case of Table 2: if A chooses Top, then the best thing for B to do is to choose Left, since the payoff to B from choosing Left is 1 and the payoff from choosing Right is 0. If B chooses Left, then the best thing for A to do is to choose Top, as A will get a payoff of 2 rather than 0. Thus if A chooses Top, B chooses Left, and if B chooses Left, A chooses Top. Therefore we have a Nash equilibrium: each person is making the optimal choice, given the other person's choice.
Table 2
If the payoffs are changed as shown in Table 3, then what is the Nash equilibrium?
Table 3

2. The Prisoners' Dilemma
Pareto efficiency: an allocation is Pareto efficient if goods cannot be reallocated to make someone better off without making someone else worse off.
Two prisoners who were partners in a crime are questioned in separate rooms. Each prisoner can either confess to the crime (implicating the other) or deny it. If only one confesses, he goes free and his partner spends 6 months in prison. If both deny, each spends 1 month in prison. If both confess, each is held for 3 months. The payoff matrix for this game is depicted in Table 4.
Table 4
The equilibrium outcome: ...

3. Decision Analysis
Let N = 1 to 4 be a set of possible states of nature, and let S = 1 to 4 be a set of strategies available to you.
Now you have to decide which strategy to choose, given the following payoff matrix.
Table 5 (S = you, N = opponent)
In this case you do not care about the payoff of your opponent, i.e. nature.

3.1 The Maximin Decision Rule (Wald criterion)
We look for the minimum payoff of each choice and then maximise over these minimum payoffs. Let us highlight the minimum payoff for each strategy.

3.2 The Minimax Decision Rule (Savage criterion)
For this rule we need to compute the losses, or regret matrix, from the payoff matrix. The losses are defined as the difference between the actual payoff and what that payoff would have been had the correct strategy been chosen:
   Regret/loss = (maximum payoff in the column) − (actual payoff).
For example, if N = 1 occurs and S = 1 is chosen, the actual gain is 2 from Table 5. The best action given N = 1 is also to choose S = 1, which gives the best gain of 2; so for (N = 1, S = 1) the regret is 0.
If N = 1 occurs but S = 2 is chosen, the actual gain is 1. The best action given N = 1 is to choose S = 1, which gives the best gain of 2; so for (N = 1, S = 2) the regret is 2 − 1 = 1.
Following the same analysis, we can compute the losses for each N and S, and so obtain the regret matrix.
Table 6: Regret Matrix
After computing the regret matrix, we look for the maximum regret of each strategy and then take the minimum of these. Minimax regret is still very cautious, but less so than the maximin rule.

4. Dynamic One-Off Games
A game can be dynamic for two reasons. First, players may be able to observe the actions of other players before deciding upon their optimal response.
Second, a one-off game may be repeated a number of times.

4.1 Extensive Form Games
Dynamic games cannot be represented by payoff matrices; we have to use a decision tree (the extensive form) to represent a dynamic game.
Start with the concept of a dynamic one-off game: the game can be played only once, but players can condition their optimal actions on what other players have done in the past.
Suppose that there are two firms (A and B) that are considering whether or not to enter a new market. If both firms enter the market, then each makes a loss of $10 mil. If only one firm enters the market, it earns a profit of $50 mil. Suppose also that Firm B observes whether Firm A has entered the market before it decides what to do.
Any extensive form game has the following elements in common:
- Nodes: a position where a player must take a decision. The first position is called the initial node, and each node is labelled so as to identify who is making the decision.
- Branches: these represent the alternative choices that the player faces, and so correspond to the available actions.
- Payoff vectors: these represent the payoffs for each player, with the payoffs listed in the order of the players. When we reach a payoff vector, the game ends.
In period 1, Firm A makes its decision. This is observed by Firm B, which decides to enter or stay out of the market in period 2. In this extensive form game, each of Firm B's decision nodes begins a subgame: Firm B observes Firm A's action before making its own decision.

4.2 Subgame Perfect Nash Equilibrium
The subgame perfect Nash equilibrium is the predicted solution to a dynamic one-off game.
From the extensive form of this game, we can observe that there are two subgames, one starting from each of Firm B's decision nodes.
How can we identify the equilibrium outcome? We start with the last period first and work backwards through successive nodes until we reach the beginning of the game.
In the last period of the game we have two nodes. At each node, Firm B decides whether or not to enter the market, based on what Firm A has already done.
At the node "Firm A enters", Firm B will either make a loss of $10 mil (if it enters) or receive a payoff of 0 (if it stays out); these are shown by the payoff vectors (-10, -10) and (50, 0). If Firm B is rational, it will stay out, so the node "Firm A enters" can be replaced by the vector (50, 0).
At the second node, "Firm A stays out", Firm A has not entered the market. Firm B will either make a profit of $50 mil (if it enters) or receive a payoff of 0 (if it stays out); these are shown by the payoff vectors (0, 50) and (0, 0). If Firm B is rational, it will enter; thus we can rule out the possibility that both firms stay out.
We can now move back to the initial node, where Firm A has to decide whether or not to enter. If Firm B is rational, Firm A knows that the game will never reach the previously "crossed-out" vectors. Firm A therefore knows that if it enters, the game will end at (A enters, B stays out), where A gets 50 and B gets 0. On the other hand, if Firm A stays out, the game will end at (A stays out, B enters), where A gets 0 and B gets 50. Hence Firm A should enter the market at the first stage. The eventual outcome is (A enters, B stays out).
How to find a subgame perfect equilibrium of a dynamic one-off game:
1. Start with the last period of the game and cross out the irrelevant payoff vectors.
2. Replace the preceding nodes by the uncrossed payoff vectors until you reach the initial node.
3. The only uncrossed payoff vector(s) give the subgame perfect Nash equilibrium.
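The three-step recipe above can be implemented directly. A minimal sketch (the tree encoding and names are my own; payoffs follow the entry-game example in the text):

```python
# The entry game as a small tree: each decision node maps moves either to
# another node or to a terminal payoff vector (payoff to A, payoff to B).
TREE = {
    "A": {"enter": "B_after_enter", "stay out": "B_after_stay"},
    "B_after_enter": {"enter": (-10, -10), "stay out": (50, 0)},
    "B_after_stay": {"enter": (0, 50), "stay out": (0, 0)},
}
WHO = {"A": 0, "B_after_enter": 1, "B_after_stay": 1}  # index into payoff vector

def backward_induction(node):
    """Return (payoff vector, path of moves) under rational play from
    `node`, working from the last period back -- the 'crossing out'
    recipe: dominated payoff vectors are simply never returned."""
    if isinstance(node, tuple):          # terminal payoff vector reached
        return node, []
    player = WHO[node]
    best = None
    for move, successor in TREE[node].items():
        payoffs, path = backward_induction(successor)
        if best is None or payoffs[player] > best[0][player]:
            best = (payoffs, [move] + path)
    return best

outcome, path = backward_induction("A")
print(path, outcome)   # ['enter', 'stay out'] (50, 0)
```

The recursion replaces each node by the payoff vector its rational mover selects, so the value returned at the initial node is the subgame perfect outcome (A enters, B stays out).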
Game Theory (original Harvard course notes)

Lecture XIII: Repeated Games
Markus M. Möbius
April 19, 2004

Readings:
- Gibbons, chapters 2.3.B, 2.3.C
- Osborne, chapter 14
- Osborne and Rubinstein, sections 8.3-8.5

1 Introduction
So far one might get a somewhat misleading impression about SPE. When we first introduced dynamic games we noted that they often have a large number of (unreasonable) Nash equilibria. In the models we've looked at so far, SPE has 'solved' this problem and given us a unique NE. In fact, this is not really the norm. We'll see today that many dynamic games still have a very large number of SPE.

2 Credible Threats
We introduced SPE to rule out non-credible threats. In many finite horizon games, though, credible threats are common and cause a multiplicity of SPE. Consider a two-stage game in which both players observe the first-period actions before choosing the second-period actions (the stage-game payoff matrix did not survive extraction; it has three Nash equilibrium profiles).
One way to get a SPE is to play any of the three profiles above followed by another of them (or the same one). We can also, however, use credible threats to get other actions played in period 1, such as: play (B, R) in period 1; if player 1 plays B in period 1, play (T, L) in period 2; otherwise play (M, C) in period 2.
It is easy to see that no single-period deviation helps here. In period 2 a NE is played, so deviating obviously doesn't help. In period 1, player 1 gets 4 + 3 if he follows the strategy and at most 5 + 1 if he doesn't. Player 2 gets 4 + 1 if he follows and at most 2 + 1 if he doesn't. Therefore switching to the (M, C) equilibrium in period 2 is a credible threat.

3 Repeated Prisoner's Dilemma
Note that the PD doesn't have multiple NE, so in a finite horizon we don't have the same easy threats to use. Therefore, the finitely repeated PD has a unique SPE in which every player defects in each period. With an infinite horizon, other types of threats are credible.
Proposition 1. In the infinitely repeated PD with δ ≥ 1/2 there exists a SPE in which the outcome is that both players cooperate in every period.
Proof: Consider the following symmetric profile:
   s_i(h^t) = C if both players have played C in every period along the path leading to h^t;
   s_i(h^t) = D if either player has ever played D.
To see that there is no profitable single deviation, note that at any h^t such that s_i(h^t) = D, player i gets
   0 + δ·0 + δ²·0 + ...
if he follows his strategy, and
   −1 + δ·0 + δ²·0 + ...
if he plays C instead and then follows s_i. At any h^t such that s_i(h^t) = C, player i gets
   1 + δ·1 + δ²·1 + ... = 1/(1 − δ)
if he follows his strategy, and
   2 + δ·0 + δ²·0 + ... = 2
if he plays D instead and then follows s_i. Neither of these deviations is worthwhile if δ ≥ 1/2. QED

Remark 1. While people sometimes tend to think of this as showing that people will cooperate if they interact repeatedly, it does not show this. All it shows is that there is one SPE in which they do. The correct moral to draw is that there are many possible outcomes.

3.1 Other SPE of the repeated PD
1. For any δ it is a SPE to play D every period.
2. For δ ≥ 1/2 there is a SPE where the players play D in the first period and then C in all future periods.
3. For δ > 1/√2 there is a SPE where the players play D in every even period and C in every odd period.
4. For δ ≥ 1/2 there is a SPE where the players play (C, D) in every even period and (D, C) in every odd period.

3.2 Recipe for Checking for SPE
Whenever you are supposed to check that a strategy profile is an SPE, you should first try to classify all histories (i.e. all information sets) on and off the equilibrium path. Then you have to apply the SPDP to each class separately.
- Assume you want to check that cooperation with grim trigger punishment is SPE. There are two types of histories you have to check. Along the equilibrium path there is just one history: everybody has cooperated so far. Off the equilibrium path there is again only one class: some person has defected.
- Assume you want to check that cooperating in even periods and defecting in odd periods, plus grim trigger punishment in case of deviation by any player from the above pattern, is SPE. There are three types of histories: even and odd periods along the equilibrium path, and off-the-equilibrium-path histories.
- Assume you want to check that TFT ('Tit for Tat') is SPE (which it isn't; see next lecture). Then you have to check four histories: only the play of both players in the last period matters for future play, so there are four relevant histories, such as: players 1 and 2 both cooperated in the last period; player 1 defected and player 2 cooperated; etc.

Sometimes the following result comes in handy.
Lemma 1. If players play Nash equilibria of the stage game in each period, in such a way that the particular equilibrium being played in a period is a function of time only and does not depend on previous play, then this strategy profile is a SPE.
The proof is immediate: we check the SPDP. Assume that there is a profitable deviation. Such a deviation will not affect future play, by assumption: if the stage game has two NE, for example, and NE1 is played in even periods and NE2 in odd periods, then a deviation will not affect future play. (If a deviation triggered a switch to playing only NE1, this statement would no longer be true.) Therefore the deviation would have to be profitable in the current stage game; but since a NE is being played, no such profitable deviation can exist.
Corollary 1. A strategy profile which has players play the same NE in each period is always SPE. In particular, the grim trigger strategy is SPE if the punishment strategy in each stage game is a NE (as is the case in the PD).

4 Folk Theorem
The examples in 3.1 suggest that the repeated PD has a tremendous number of equilibria when δ is large. Essentially, this means that game theory tells us we can't really tell what is going to happen. This turns out to be an accurate description of most infinitely repeated games.
Let G be a simultaneous move game with action sets A_1, A_2, .., A_I, mixed strategy spaces Σ_1, Σ_2, .., Σ_I and payoff functions ũ_i.
Definition 1. A payoff vector v = (v_1, v_2, .., v_I) ∈ R^I is feasible if there exist action profiles a^1, a^2, .., a^k ∈ A and non-negative weights ω_1, .., ω_k summing to 1 such that
   v_i = ω_1·ũ_i(a^1) + ω_2·ũ_i(a^2) + .. + ω_k·ũ_i(a^k).
Definition 2. A payoff vector v is strictly individually rational if
   v_i > v̲_i = min_{σ_{-i} ∈ Σ_{-i}} max_{σ_i ∈ Σ_i} u_i(σ_i, σ_{-i}).   (1)
We can think of v̲_i as the lowest payoff a rational player could ever get in equilibrium if he anticipates his opponents' (possibly non-rational) play. Intuitively, the minmax payoff v̲_i is the payoff player i can guarantee to herself even if the other players try to punish her as badly as they can. The minmax payoff is a measure of the punishment the other players can inflict.
Theorem 1 (Folk Theorem). Suppose that the set of feasible payoffs of G is I-dimensional. Then for any feasible and strictly individually rational payoff vector v there exists δ̲ < 1 such that for all δ > δ̲ there exists a SPE s* of G^∞ whose average payoff is v, i.e.
   u_i(s*) = v_i / (1 − δ).
The normalized (or average) payoff is defined as P = (1 − δ)·u_i(s*). It is the payoff which a stage game would have to generate in each period such that we are indifferent between that payoff stream and the one generated by s*:
   P + δP + δ²P + ... = u_i(s*).

4.1 Example: Prisoner's Dilemma
The feasible payoff set is the diamond bounded by (0, 0), (2, −1), (−1, 2) and (1, 1). Every point inside can be generated as a convex combination of these payoff vectors. The minmax payoff for each player is 0, as you can easily check: the other player can at most punish his rival by defecting, and the best reply to defection (defect) yields 0. Note that the equilibria shown before generate payoffs inside this area.

4.2 Example: BoS
Consider the Battle of the Sexes stage game (the payoff matrix did not survive extraction). Here each player can guarantee herself at least the payoff 2/3, which is the payoff from playing the mixed strategy Nash equilibrium. You can check that whenever player 2 mixes with different probabilities, player 1 can guarantee herself more than this payoff by playing either F or O all the time.

4.3 Idea behind the Proof
1. Have the players on the equilibrium path play an action with payoff v (or alternate if necessary to generate this payoff; for example, in the BoS it is not possible to generate (3/2, 3/2) in the stage game even with mixing, but if the players alternate between (O, F) and (F, O) they can get arbitrarily close for large δ).
2. If some player deviates, punish him by having the other players choose σ_{-i} for T periods such that player i gets v̲_i.
3. After the end of the punishment phase, reward all players other than i for having carried out the punishment by switching to an action profile where each punishing player j receives a small bonus ε on top of v_j.
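Both the minmax computation in Example 4.1 and the one-shot-deviation check behind Proposition 1 are mechanical. A sketch (function names are my own; pure strategies only, using the lecture's PD payoffs):

```python
# Stage-game payoffs from the lecture's PD, as (row payoff, column payoff):
PD = {("C", "C"): (1, 1), ("C", "D"): (-1, 2),
      ("D", "C"): (2, -1), ("D", "D"): (0, 0)}

def minmax_payoff(payoffs, player):
    """Pure-strategy minmax: the opponent minimises, over her actions,
    the best-response payoff of `player` (0 = row, 1 = column)."""
    acts = ("C", "D")
    if player == 0:
        return min(max(payoffs[(a, b)][0] for a in acts) for b in acts)
    return min(max(payoffs[(a, b)][1] for b in acts) for a in acts)

def grim_trigger_is_spe(delta):
    """The two one-shot deviations from the proof of Proposition 1.
    On path: cooperating forever is worth 1/(1-delta); defecting once
    yields 2 today and 0 forever after.  Off path: defecting forever is
    worth 0; cooperating once yields -1 today and 0 forever after."""
    on_path_ok = 1 / (1 - delta) >= 2
    off_path_ok = 0 >= -1
    return on_path_ok and off_path_ok
```

This confirms the minmax payoff of 0 for each player and the δ ≥ 1/2 threshold in Proposition 1.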
Lecture XII: Analysis of Infinite Games
Markus M. Möbius
April 7, 2004

Readings:
- Gibbons, chapters 2.1.A, 2.1.B, 2.2.A
- Osborne, sections 14.1-14.4, 16
- Osborne and Rubinstein, sections 6.5, 8.1 and 8.2

1 Introduction: Critique of SPE
The SPE concept eliminates non-credible threats, but it's worth stepping back for a minute and asking whether we think SPE is reasonable, or whether in throwing out threats we have been overzealous. Practically, for this course the answer will be that SPE restrictions are OK and we'll always use them in extensive form games. However, it's worth looking at situations where SPE has been criticized. Some of the worst anomalies disappear in infinite horizon games, which we study next.

1.1 Rationality off the Equilibrium Path
Is it reasonable to play NE off the equilibrium path? After all, if a player does not follow the equilibrium he is probably as stupid as a broomstick. Why should we trust him to play NE in the subgame? Let's look at the following game to illustrate that concern (the game tree did not survive extraction). Here (L, A, x) is the unique SPE. However, player 2 has to put a lot of trust in player 1's rationality in order to play x. He must believe that player 1 is smart enough to figure out that A is a dominant strategy in the subgame following R. However, player 2 might have serious doubts about player 1's marbles after the guy has just foregone 5 utils by not playing L. (After all, any strategy in which L is played strictly dominates any strategy in which R is played in the normal form.)

1.2 Multi-Stage Games
Lemma 1. The unique SPE of the finitely repeated Prisoner's Dilemma game, in which players get the sum of their payoffs from each stage game, has every player defecting at each information set.
The proof proceeds by analyzing the last stage game, where we would see defection for sure. But then we would see defection in the second-to-last stage game, etc. In the same way we can show that the finitely repeated Bertrand game results in pricing at marginal cost all the time.
Remark 1. The main reason for the breakdown of cooperation in the finitely repeated Prisoner's Dilemma is not so much SPE by itself but the fact that there is a final period in which agents would certainly defect.
This raises the question of whether an infinitely repeated PD game would allow us to cooperate. Essentially, we could cooperate as long as the other player does, and if there is defection, we defect from then on. This still looks like a SPE: in any subgame in which I have defected before, I might as well defect forever. If I haven't defected yet, I can jeopardize cooperation by defecting, and therefore should not do so as long as I care about the future sufficiently.
These results should make us suspicious. Axelrod's experiments (see future lecture) showed that in the finitely repeated Prisoner's Dilemma people tend to cooperate until the last few periods, when the 'endgame effect' kicks in. Similarly, there are indications that rival firms can learn to collude if they interact repeatedly and set prices above marginal cost. This criticism of SPE is reminiscent of our criticism of IDSDS. In both cases we use an iterative procedure to find equilibrium. We might have doubts whether real-world subjects are able (and inclined) to do this calculation. The unique SPE of the centipede-style game discussed here is to drop out immediately (play L) at each stage; however, in experiments people typically continue until almost the last period before they drop out.

2 Infinite Horizon Games
Of the criticisms of SPE, the one we will take most seriously is that long finite horizon models do not give reasonable answers. Recall that the problem was that the backward induction procedure tended to unravel 'reasonably' looking strategies from the end. It turns out that many of the anomalies go away once we model these games as infinite games, because there is no endgame to be played. The prototypical model is what Fudenberg and Tirole call an infinite horizon multistage game with observed actions:
- At times t = 0, 1, 2, .. some subset of the set of players simultaneously chooses actions.
- All players observe the period t actions before choosing the period t + 1 actions.
- Players' payoffs may be any function of the infinite sequence of actions (play no longer necessarily ends in terminal nodes).

2.1 Infinite Games with Discounting
Often we assume that player i's payoffs are of the form
   u_i(s_i, s_{-i}) = u_i0(s_i, s_{-i}) + δ·u_i1(s_i, s_{-i}) + δ²·u_i2(s_i, s_{-i}) + ..   (1)
where u_it(s_i, s_{-i}) is the payoff received at t when the strategies are followed.

2.1.1 Interpretation of δ
1. Interest rate: δ = 1/(1 + r). Having two dollars today or two dollars tomorrow makes a difference to you: your two dollars today are worth more, because you can take them to the bank and get 2(1 + r) dollars tomorrow, where r is the interest rate. By discounting future payoffs with δ = 1/(1 + r) we correct for the fact that future payoffs are worth less to us than present payoffs.
2. Probabilistic end of game: suppose the game is really finite, but the end of the game is not deterministic. Instead, given that stage t is reached, there is probability δ that the game continues and probability 1 − δ that the game ends after this stage. Note that the expected number of periods is then 1/(1 − δ), which is finite. However, we can't apply backward induction directly because we can never be sure that any round is the last one. The probabilistic interpretation is particularly attractive for interpreting bargaining games with many rounds.

2.2 Example I: Repeated Games
Let G be a simultaneous move game with finite action spaces A_1, .., A_I. The infinitely repeated game G^∞ is the game where in periods t = 0, 1, 2, .. the players simultaneously choose actions (a_1^t, .., a_I^t) after observing all previous actions. We define payoffs in this game by
   u_i(s_i, s_{-i}) = Σ_{t=0}^∞ δ^t · ũ_i(a_1^t, .., a_I^t)   (2)
where (a_1^t, .., a_I^t) is the action profile taken in period t when players follow strategies s_1, .., s_I, and the ũ_i are the utilities of the players in each stage game.
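Definition (2) and the geometric-series algebra used throughout the lecture (1 + δ + δ² + ... = 1/(1 − δ)) can be sanity-checked numerically. A small sketch (function names are my own):

```python
def discounted_payoff(stage_payoffs, delta):
    """Finite truncation of definition (2): sum of delta^t * u_t."""
    return sum((delta ** t) * u for t, u in enumerate(stage_payoffs))

def constant_stream_value(u, delta):
    """Exact value of receiving stage payoff u every period forever."""
    return u / (1 - delta)

# Cooperating forever in the PD (stage payoff 1), truncated at 200
# periods, is already numerically indistinguishable from 1/(1 - delta):
approx = discounted_payoff([1] * 200, 0.9)
exact = constant_stream_value(1, 0.9)
```

The fast convergence of the truncation is exactly the "continuity at infinity" property discussed in the next section: the tail beyond period T contributes at most δ^T·M/(1 − δ).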
For example, in the infinitely repeated Prisoner's Dilemma game the ũ_i are simply the payoffs in the 'boxes' of the normal form representation.

2.3 Example II: Bargaining
Suppose there is one dollar to be divided up between two players. The following alternating offer procedure is used:
I. In periods 0, 2, 4, ... player 1 offers the division (x_1, 1 − x_1). Player 2 then accepts, and the game ends, or he rejects, and play continues.
II. In periods 1, 3, 5, ... player 2 offers the division (1 − x_2, x_2). Player 1 then accepts or rejects.
Assume that if the division (y, 1 − y) is agreed to in period t, then the payoffs are δ^t·y and δ^t·(1 − y). (Note that this is not a repeated game. First of all, the stage games are not identical, since alternate players make offers. Second, there is no per-period payoff; instead, players only get payoffs when one of them has agreed to an offer. Waiting to divide the pie is costly.)
Remark 2. There are finite versions of this game in which play ends after period T. One has to make some assumption about what happens if there is no agreement at time T; typically, one assumes that the pie simply disappears. If T = 1 then we get the simple ultimatum game.

3 Continuity at Infinity
None of the tools we've discussed so far are easy to apply to infinite games.
First,backward induction isn’t feasible because there is no end to work back-ward from.Second,using the definition of SPE alone isn’t very easy.There are infinitely many subgames and uncountably many strategies that might do better.We will discuss a theorem which makes the analysis quite tractable in most infinite horizon games.To do so,we mustfirst discuss what continuity at infinity means.Definition1An infinite extensive form game G is continuous at∞iflim T→∞supi,σ,σ s.th.σ(h)=σ (h)for all h in peri-ods t≤T|u i(σ)−u i(σ )|=0In words:compare the payoffs of two strategies which are identical for all information sets up to time T and might differ thereafter.As T becomes large the maximal difference between any two such strategies becomes arbitrarily small.Essentially,this means that distant future events have a very small payoffeffect.3.1Example I:Repeated GamesIfσandσ agree in thefirst T periods then:|u i(σ)−u i(σ )|=∞t=Tδt˜u ia t−˜u i(at)≤∞t=Tδt maxi,a,a|˜u i(a)−˜u i(a )| 6Forfinite stage games we know that M=max i,a,a |˜u i(a)−˜u i(a )|isfinite. 
This implies that
   lim_{T→∞} |u_i(σ) − u_i(σ')| ≤ lim_{T→∞} Σ_{t=T}^∞ δ^t M = lim_{T→∞} δ^T M / (1 − δ) = 0.

3.2 Example II: Bargaining
It's easy to check that the bargaining game is also continuous at ∞.

3.3 Example III: Non-discounted war of attrition
This is an example of an infinite game which is NOT continuous at infinity. Players 1 and 2 choose a_i^t ∈ {Out, Fight} at times t = 0, 1, 2, ... The game ends whenever one player quits, the other being the 'winner'. Assume the payoffs are
   u_i(s_i, s_{-i}) = 1 − ct if player i 'wins' in period t, and −ct if player i quits in period t.
Note that players in this game can win a prize of 1 by staying in the game longer than the other player. However, staying in the game is costly for both players. Each player wants the game to finish as quickly as possible, but also wants the other player to drop out first. This game is not continuous at ∞. Let
   σ_T = both fight for T periods and then 1 quits;
   σ'_T = both fight for T periods and then 2 quits.
Then we have |u_i(σ_T) − u_i(σ'_T)| = 1. This expression does not go to zero as T → ∞.

4 The Single-Period Deviation Principle
The next theorem makes the analysis of infinite games which are continuous at ∞ possible.
Theorem 1. Let G be an infinite horizon multistage game with observed actions which is continuous at ∞. A strategy profile (σ_1, .., σ_I) is a SPE if and only if there is no player i and strategy σ̂_i that agrees with σ_i except at a single information set h_i^t and which gives a higher payoff to player i conditional on h_i^t being reached.
We write u_i(σ_i, σ_{-i} | x) for the payoff conditional on x being reached. For example, in the entry game below we have u_2(Accommodate, Out | node 1 is reached) = 1. Recall that we can condition on nodes which are not on the equilibrium path because the strategy of each player defines play at each node.

4.1 Proof of the SPDP for Finite Games
I start by proving the result for finite-horizon games with observed actions.
Step I: By the definition of SPE, there cannot be a profitable deviation for any player at any information set in games with observed actions. (We have to be very careful at this point. We have defined SPE as NE in every subgame, and subgames can only originate at nodes, not at information sets. However, in games with observed actions all players play simultaneous move games in each round t. Therefore any deviation by a player at a round-t information set which is not a singleton is on the equilibrium path of some subgame at round t.)
Step II: The reverse is a bit harder to show. We want to show that u_i(σ̂_i, σ_{-i} | x_t) ≤ u_i(σ_i, σ_{-i} | x_t) for all initial nodes x_t of a subgame (a subgame at some round t). We prove this by induction on T, the number of periods in which σ_i and σ̂_i differ.
T = 1: In this case the result is clear. Suppose σ_i and σ̂_i differ only at an information set in period t'. If t > t', it is clear that u_i(σ̂_i, σ_{-i} | x_t) = u_i(σ_i, σ_{-i} | x_t), because the two strategies are identical at all the relevant information sets. If t ≤ t', then
   u_i(σ̂_i, σ_{-i} | x_t) = Σ_{h_i^{t'}} u_i(σ̂_i, σ_{-i} | h_i^{t'}) · Prob{h_i^{t'} | σ̂_i, σ_{-i}, x_t}
      ≤ Σ_{h_i^{t'}} u_i(σ_i, σ_{-i} | h_i^{t'}) · Prob{h_i^{t'} | σ_i, σ_{-i}, x_t}
      = u_i(σ_i, σ_{-i} | x_t),
where the inequality follows from the one-stage deviation criterion, and the probabilities coincide because σ_i and σ̂_i prescribe the same play between t and t'.
T → T + 1: Assuming that the result holds for T, let σ̂_i be any strategy differing from σ_i in T + 1 periods. Let t' be the last period at which they differ and define σ̃_i by
   σ̃_i(h_i^t) = σ̂_i(h_i^t) if t < t', and σ_i(h_i^t) if t ≥ t'.
In other words, σ̃_i differs from σ_i in only T periods. Therefore, by the inductive hypothesis, for any x_t we have u_i(σ̃_i, σ_{-i} | x_t) ≤ u_i(σ_i, σ_{-i} | x_t), since we assumed that the claim holds for T. However, we also know that σ̃_i and σ̂_i differ only in a single deviation at round t'. Therefore we can use exactly the same argument as in the previous step to show that u_i(σ̂_i, σ_{-i} | x_t) ≤ u_i(σ̃_i, σ_{-i} | x_t) for any x_t. Combining both inequalities we get the desired result:
   u_i(σ̂_i, σ_{-i} | x_t) ≤ u_i(σ_i, σ_{-i} | x_t).
This proves the result for finite games with observed actions. Note that it is important that we have defined σ̃_i as differing from σ̂_i only in the last-period deviation.
Note that after time t′ the strategy σ̃_i follows σ_i; this is exactly what allowed us to apply the single-period deviation criterion in the last step.

4.2 Proof for infinite-horizon games

Note that the proof for finite-horizon games also establishes that, for a profile σ which satisfies the SPDP, player i cannot improve on σ_i in any subgame x^t with a new strategy σ̂_i that has finitely many deviations from σ_i. However, it is still possible that deviations at infinitely many periods might be an improvement for player i.

Assume this were the case for some σ̂_i, and denote the gain from using this strategy by

    ε := u_i(σ̂_i, σ_{−i} | x^t) − u_i(σ_i, σ_{−i} | x^t) > 0.

Because the game is continuous at ∞, if we choose T large enough we can define a strategy σ̃_i which agrees with σ̂_i up to period T and then follows σ_i, such that

    |u_i(σ̂_i, σ_{−i} | x^t) − u_i(σ̃_i, σ_{−i} | x^t)| < ε/2.

This implies that the new strategy σ̃_i gives player i strictly more utility than σ_i. However, σ̃_i differs from σ_i in at most T periods, which is a contradiction, as we have shown above. QED

Remark 3 Games which are continuous at ∞ are in some sense 'the next best thing' to finite games.

5 Analysis of Rubinstein's Bargaining Game

To illustrate the use of the single-period deviation principle, and to show the power of SPE in one interesting model, we now return to the Rubinstein bargaining game introduced before.

First, note that the game has many Nash equilibria. For example, player 2 can implement any division by adopting a strategy in which he only accepts, and proposes, a share x_2 and rejects anything else.

Proposition 1 The bargaining game has a unique SPE. In each period of the SPE the player i who proposes picks x_i = 1/(1+δ), and the other player accepts any division giving him at least δ/(1+δ) and rejects any offer giving him less.

Several observations are in order:

1. We get a roughly even split for δ close to 1 (little discounting). The proposer can increase his share the more impatient the other player is.

2. Agreement is immediate, and bargaining is therefore efficient. Players can perfectly predict play in the next period and will therefore choose a division which makes the other player just indifferent between accepting and making her own proposal. There is no reason to delay agreement, because delay just shrinks the pie. Immediate agreement is in fact not observed in most experiments: last year, in a two-stage bargaining game in class, we observed that only 2 out of 14 bargains ended in agreement after the first period. There are extensions of the Rubinstein model which do not give immediate agreement. (Footnote 5: For example, when there is uncertainty about a player's discount factor, the proposer might start with a low offer in order to weed out player-2 types with a low discount factor. Players with a high δ will reject low offers, and therefore agreement is not immediate. To analyze these extensions we first have to develop the notion of incomplete-information games.)

3. The division becomes less equal in finite bargaining games. Essentially, the last proposer, at period T, can take everything for himself. Therefore he will tend to get the greatest share of the pie in period 1 as well; otherwise he would simply keep rejecting and take everything in the last period. In our two-period experiments we have indeed observed greater payoffs to the last proposer (a ratio of 2 to 1). However, half of all bargains resulted in disagreement after the second period, and hence zero payoffs for everyone. Apparently people care about fairness as well as payoffs, which makes one wonder whether monetary payoffs are the right way to describe the utility of players in this game.

5.1 Useful Shortcuts

The hard part is showing that the Rubinstein game has a unique SPE. Once we know that, it is much easier to calculate the actual strategies.

5.1.1 Backward induction in the infinite game

You can solve the game by assuming that some exogenous payoff is realized after rejection of the offer at period T. The game then becomes a finite game of perfect information, which can be solved through backward induction. It turns out that as T → ∞ the backward solution converges to the Rubinstein solution. This technique also provides an alternative proof of uniqueness.

5.1.2 Using the recursive structure of the game

The game has a recursive structure: each player faces essentially the same game, just with interchanged roles. Therefore, in the unique SPE player 1 should propose some division (x, 1 − x) and player 2 should propose (1 − x, x). Player 1's proposal should make player 2 indifferent between accepting immediately and waiting to make her own offer (otherwise player 1 would bid higher or lower). This implies

    1 − x  =  δx,

where the left-hand side is 2's payoff at period 1 and the right-hand side is 2's discounted payoff from proposing at period 2. Hence

    x = 1/(1+δ).

5.2 Proof of Rubinstein's Solution

5.2.1 Existence

We show that there is no profitable single-history deviation.

Proposer: If he conforms at period t, the continuation payoff is δ^t · 1/(1+δ). If he deviates and asks for more, the offer is rejected and he gets δ^{t+1} · δ/(1+δ). If he deviates and asks for less, he gets less. Either way he loses.

Recipient: Look at the payoffs conditional on share y being proposed to him in period t.
– If he rejects, he gets δ^{t+1} · 1/(1+δ), since he proposes next period.
– If he accepts, he gets δ^t · y.
– If y ≥ δ/(1+δ), the strategy says accept, and accepting is indeed better than rejecting.
– If y < δ/(1+δ), the strategy says reject, and accepting is not a profitable deviation.

5.2.2 Uniqueness

This proof illustrates a number of techniques which enable us to prove properties of an equilibrium without actually constructing it. Let v̄ and v be the highest and lowest payoffs received by a proposer in any SPE. We first observe that

    1 − v̄ ≥ δv.            (3)

If this inequality did not hold, no proposer could actually obtain v̄: by rejecting, the recipient can always secure more in the continuation subgame, where she proposes and gets at least v, worth δv today.

We also find

    v ≥ 1 − δ v̄.            (4)

If not, no proposer would settle for v: the recipient must accept any share above δ v̄, since rejecting gives her at most δ v̄, so the proposer can always secure at least 1 − δ v̄.

We can use both inequalities to derive bounds on v and v̄:

    v ≥ 1 − δ v̄ ≥ 1 − δ(1 − δv),            (5)
    (1 − δ²) v ≥ 1 − δ,                      (6)
    v ≥ 1/(1+δ).                             (7)

Similarly, we find

    1 − v̄ ≥ δv ≥ δ(1 − δ v̄),                (8)
    v̄ ≤ 1/(1+δ).                             (9)

Hence v = v̄ = 1/(1+δ). Clearly, no other strategies can generate this payoff in every subgame. (Footnote 6: While this is a nice result, it does not necessarily survive changes to the game. For example, if both players make simultaneous proposals, then any division is a SPE. It also no longer holds when there are several players.)
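The backward-induction shortcut and the uniqueness bounds can both be checked numerically. The sketch below (plain Python; the function name is mine) iterates the finite-horizon recursion v ← 1 − δv, starting from v = 1 because the last proposer takes the whole pie, and confirms convergence to the Rubinstein share 1/(1+δ).

```python
def proposer_share(T, delta):
    """Backward-induct a T-period alternating-offer bargaining game.
    With one period left the proposer takes everything (v = 1); with k
    periods left the proposer offers the responder exactly her discounted
    continuation value, keeping v = 1 - delta * v for himself."""
    v = 1.0
    for _ in range(T - 1):
        v = 1.0 - delta * v
    return v

if __name__ == "__main__":
    for delta in (0.5, 0.9, 0.99):
        print(delta, proposer_share(500, delta), 1.0 / (1.0 + delta))
```

The iterates oscillate around the fixed point and are squeezed toward it at rate δ per step, mirroring how the bounds on v and v̄ above pinch the proposer's payoff to 1/(1+δ).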
Lecture VIII: Learning
Markus M. Möbius
March 10, 2004

Learning and evolution are the second set of topics which are not discussed in the two main texts. I have tried to make the lecture notes self-contained.

• Fudenberg and Levine (1998), The Theory of Learning in Games, Chapters 1 and 2

1 Introduction

What are the problems with Nash equilibrium? It has been argued that Nash equilibrium is a reasonable minimum requirement for how people should play games (although this is debatable, as some of our experiments have shown). It has been suggested that players should be able to figure out Nash equilibria starting from the assumption that the rules of the game, the players' rationality, and the payoff functions are all common knowledge. (Footnote 1: We haven't discussed the connection between knowledge and Nash equilibrium. Assume that there is a Nash equilibrium σ* in a two-player game and that each player's best response is unique. In this case player 1 knows that player 2 will play σ*_2 in response to σ*_1, player 2 knows that player 1 knows this, and so on. Common knowledge is important for the same reason that it matters in the coordinated-attack game we discussed earlier: each player might be unwilling to play her prescribed strategy if she is not absolutely certain that the other player will do the same.) As Fudenberg and Levine (1998) have pointed out, there are some important conceptual and empirical problems associated with this line of reasoning:

1. If there are multiple equilibria, it is not clear how agents can coordinate their beliefs about each other's play by pure introspection.

2. Common knowledge of rationality and of the game itself can be difficult to establish.

3. Equilibrium theory does a poor job explaining play in the early rounds of most experiments, although it does much better in later rounds. This shift from non-equilibrium to equilibrium play is difficult to reconcile with a purely introspective theory.

1.1 Learning or Evolution?

There are two main ways to model the processes by which players change the strategies they use to play a game. A learning model is any model that specifies the learning rules used by individual players and examines their interaction when the game (or games) is played repeatedly. These types of models are the subject of today's lecture.

Learning models quickly become very complex when there are many players involved. Evolutionary models do not explicitly model the learning process at the individual level. The basic assumption there is that some unspecified process at the individual level leads the population as a whole to adopt strategies that yield improved payoffs. These types of models will be the subject of the next few lectures.

1.2 Population Size and Matching

The natural starting point for any learning (or evolutionary) model is the case of fixed players. Typically we will only look at 2-by-2 games which are played repeatedly between two fixed players. Each player faces the task of inferring future play from the past behavior of her opponent.

There is a serious drawback to working with fixed agents. Due to the repeated interaction, players might have an incentive to influence the future play of their opponent. For example, in most learning models players will defect in a Prisoner's Dilemma, because cooperation is strictly dominated for any beliefs I might hold about my opponent. However, if I interact frequently with the same opponent, I might try to cooperate in order to 'teach' the opponent that I am a cooperator. We will see in a future lecture that such behavior can indeed be a Nash equilibrium of a repeated game.

There are several ways in which repeated-play considerations can be assumed away:

1. We can imagine that players are locked into their actions for quite a while (they invest infrequently, can't build a new factory overnight, etc.) and that their discount factor (the weight they place on the future) is small compared to that lock-in length. It then makes sense to treat agents as approximately myopic when making their decisions.

2. An alternative is to dispense with the fixed-player assumption and instead assume that agents are drawn from a large population and are randomly matched against each other to play games. In this case it is very unlikely that I will encounter a recent opponent again in the near future. This breaks the strategic links between rounds and again allows us to treat agents as approximately myopic (i.e., they maximize their short-term payoffs).

2 Cournot Adjustment

In the Cournot adjustment model two fixed players repeatedly choose a best response to the play of their opponent in the last period.

The model was originally developed to explain learning in the Cournot duopoly. Firms start from some initial output combination (q_1^0, q_2^0). In the first round both firms adapt their output to be a best response to the opponent's previous output: they play (BR_1(q_2^0), BR_2(q_1^0)). This process is repeated, and it is easily seen that in the case of linear demand and constant marginal costs the process converges to the unique Nash equilibrium. If there are several Nash equilibria, the initial conditions determine which equilibrium is selected.

2.1 Problems with Cournot Learning

There are two main problems:

• Firms are pretty dim-witted: they adjust their strategies today as if they expected their rivals to repeat yesterday's strategy.

• In each period, play can actually change quite a lot. Intelligent firms should anticipate their opponents' future play and react accordingly; intuitively, this should speed up the adjustment process.

Cournot adjustment can be made more realistic by assuming that firms are 'locked in' for some time and move alternately: firm 1 moves in periods 1, 3, 5, ..., and firm 2 moves in periods 2, 4, 6, ... Starting from some initial play (q_1^0, q_2^0), the firms will play (q_1^1, q_2^0) in round 1 and (q_1^1, q_2^2) in round 2. Clearly, the Cournot dynamics with alternating moves has the same long-run behavior as the Cournot dynamics with simultaneous moves. Cournot adjustment will be approximately optimal for firms if the lock-in period is long relative to the firms' discount rate (the weight on next period's profits): the less locked-in firms are, the smaller the discount rate has to be. Of course, the problem with the lock-in interpretation is that it is not really a model of learning anymore: learning is irrelevant because firms choose their optimal action in each period.

3 Fictitious Play

In the process of fictitious play, players assume that their opponents' strategies are drawn from some stationary but unknown distribution. As in the Cournot adjustment model, we restrict attention to a fixed two-player setting. We also assume that the strategy sets of both players are finite.

In fictitious play, players choose a best response to their assessment of their opponent's strategy. Each player i has an exogenously given weighting function κ_i^0 : S_{−i} → R_+. After each period the weights are updated by adding 1 to the weight of the opponent strategy that has just been played:

    κ_i^t(s_{−i}) = κ_i^{t−1}(s_{−i})      if s_{−i} ≠ s_{−i}^{t−1},
                    κ_i^{t−1}(s_{−i}) + 1  if s_{−i} = s_{−i}^{t−1}.

Player i assigns probability γ_i^t(s_{−i}) to strategy profile s_{−i}:

    γ_i^t(s_{−i}) = κ_i^t(s_{−i}) / Σ_{s̃_{−i} ∈ S_{−i}} κ_i^t(s̃_{−i}).

The player then chooses a pure strategy which is a best response to this assessment of the other player's play. Note that there is not necessarily a unique best response to every assessment; hence fictitious play is not always unique.

We also define the empirical distribution d_i^t(s_i) of each player's strategies as

    d_i^t(s_i) = (1/t) Σ_{t̃=0}^{t} I_{t̃}(s_i),

where the indicator function I_{t̃}(s_i) is 1 if strategy s_i was played in period t̃ and 0 otherwise. Note that as t → ∞ the empirical distribution d_j^t of player j's strategies approximates player i's (normalized) weighting function κ_i^t (since in a two-player game we have j = −i).

Remark 1 The updating of the weighting function looks intuitive but also somewhat arbitrary. It can be made more rigorous in the following way.
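A quick numerical sketch of this Bayesian interpretation (the function names and the particular prior weights below are mine, chosen for illustration): with a Dirichlet prior over the opponent's mixing probabilities, the posterior predictive probability of a strategy is its updated weight divided by the total weight, which is exactly the fictitious-play assessment γ_i^t.

```python
def fp_assessment(weights):
    """Fictitious-play assessment: normalize the weighting function kappa."""
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def dirichlet_predictive(alpha, counts):
    """Posterior predictive of a Dirichlet(alpha) prior after observing
    `counts` plays of each opponent strategy:
    P(next = s) = (alpha[s] + counts[s]) / (sum of alphas + total count)."""
    total = sum(alpha.values()) + sum(counts.values())
    return {s: (alpha[s] + counts[s]) / total for s in alpha}

if __name__ == "__main__":
    alpha = {"L": 1.0, "R": 2.0}   # illustrative initial weights kappa^0
    counts = {"L": 3, "R": 1}      # observed opponent play so far
    updated = {s: alpha[s] + counts[s] for s in alpha}  # kappa^t
    print(fp_assessment(updated))
    print(dirichlet_predictive(alpha, counts))
```

The two functions return the same distribution: the count-adding update rule is Bayesian updating of a Dirichlet prior whose parameters are the initial weights.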
Assume that there are n strategy profiles in S_{−i} and that each profile is played by player i's opponents with probability p(s_{−i}). Agent i has a prior belief over these probabilities. This prior is a Dirichlet distribution whose parameters depend on the weighting function. After each round the agent updates the prior: it can be shown that the posterior belief is again Dirichlet, and the parameters of the posterior now depend on the updated weighting function.

3.1 Asymptotic Behavior

Will fictitious play converge to a Nash equilibrium? The next proposition gives a partial answer.

Proposition 1 If s is a strict Nash equilibrium, and s is played at date t in the process of fictitious play, then s is played at all subsequent dates. That is, strict Nash equilibria are absorbing for the process of fictitious play. Furthermore, any pure-strategy steady state of fictitious play must be a Nash equilibrium.

Proof: Assume that s = (s_i, s_j) is played at time t. This implies that s_i is a best response to player i's assessment at time t. But his next-period assessment only puts higher relative weight on strategy s_j. Because s_i is a best response both to s_j and to the old assessment, it is also a best response to the updated assessment. Conversely, if fictitious play gets stuck in a pure steady state, then players' assessments converge to the empirical distribution. If the steady state were not Nash, players would eventually deviate.

A corollary of the above result is that fictitious play cannot converge to a pure steady state in a game which has only mixed Nash equilibria, such as matching pennies. In matching pennies, however, the empirical distribution over each player's strategies converges to (1/2, 1/2): this is precisely the unique mixed Nash equilibrium. This observation leads to a general result.

Proposition 2 Under fictitious play, if the empirical distributions over each player's choices converge, then the strategy profile corresponding to the product of these distributions is a Nash equilibrium.

Proof: Assume there were a profitable deviation. Then in the limit at least one player should deviate, but this contradicts the assumption that the distributions converge.

These results do not tell us when fictitious play converges. The next theorem does precisely that.

Theorem 1 Under fictitious play, the empirical distributions converge if the stage game has generic payoffs and is 2×2, or is zero-sum, or is solvable by iterated strict dominance.

We won't prove this theorem in this lecture. However, it is intuitively clear why fictitious play respects IDSDS: a strictly dominated strategy can never be a best response, so in the limit fictitious play puts zero relative weight on it. But then all strategies deleted in the second round of iteration can never be best responses and must receive zero weight as well, and so on.

3.2 Non-Convergence is Possible

Fictitious play does not have to converge at all. An example is due to Shapley: in a 3×3 game with row strategies T, M, D and column strategies L, M, R, play cycles through the off-diagonal profiles ..., (M,R), (M,L), (D,L), (D,M), (T,M), ... One can show that the number of times each profile is played grows fast enough that play never converges. Also note that the diagonal entries are never played.

3.3 Pathological Convergence

Convergence of the empirical distributions of strategy choices can be misleading even when the process converges to a Nash equilibrium. Take the following game, in which both players want to miscoordinate, say:

              A        B
        A   0, 0     1, 1
        B   1, 1     0, 0

Suppose each player's initial weights are (1, √2): each believes the opponent is more likely to play B, so both play A in the first period. In the second period the weights are (2, √2) and both play B; the outcome is the alternating sequence (B,B), (A,A), (B,B), and so on. The empirical frequencies of each player's choices converge to (1/2, 1/2), which is the Nash equilibrium. The realized play is always on the diagonal, however, and both players receive payoff 0 in each period. Another way of putting this is that the empirical joint distribution over pairs of actions does not equal the product of the two marginal distributions, so the empirical joint distribution corresponds to correlated rather than independent play. This type of correlation is not very appealing: in particular, the agents do not seem smart enough to recognize cycles which they could exploit. Hence the attractive property of convergence to a Nash equilibrium can be misleading if the equilibrium is mixed.
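The miscoordination cycle is easy to reproduce. The sketch below (plain Python; names are mine) runs fictitious play assuming, as in the example, that matching yields 0 and mismatching yields 1 to both players, with initial weights (1, √2), which generate the (2, √2) weights mentioned above.

```python
import math

def simulate(rounds=1000):
    """Fictitious play in the mismatching game: payoff 1 if actions differ,
    0 if they coincide. weights[i] holds player i's weights over the
    OPPONENT's actions, in the order (A, B), initialized to (1, sqrt(2))."""
    weights = [[1.0, math.sqrt(2.0)], [1.0, math.sqrt(2.0)]]
    history, payoffs = [], []
    for _ in range(rounds):
        actions = []
        for i in (0, 1):
            prob_a = weights[i][0] / (weights[i][0] + weights[i][1])
            # best response: mismatch the action the opponent is believed
            # more likely to take (play B iff A looks more likely)
            actions.append("B" if prob_a > 0.5 else "A")
        payoffs.append(1 if actions[0] != actions[1] else 0)
        history.append(tuple(actions))
        for i in (0, 1):
            # add 1 to the weight of the opponent action actually observed
            weights[i][actions[1 - i] == "B"] += 1.0
    return history, payoffs

if __name__ == "__main__":
    history, payoffs = simulate()
    freq_a = sum(a == "A" for a, _ in history) / len(history)
    print("frequency of A:", freq_a, "total realized payoff:", sum(payoffs))
```

The marginal frequency of each action converges to 1/2, the mixed equilibrium, yet the realized payoff is 0 in every single round: the players' choices are perfectly correlated along the cycle, exactly the pathology described above.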