ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.2 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |

| Property | Value |
|---|---|
| URL | https://mpaldridge.github.io/math2750/S10-stationary-distributions.html |
| Last Crawled | 2026-04-12 08:34:48 (4 days ago) |
| First Indexed | 2021-12-23 06:04:53 (4 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Section 10 Stationary distributions \| MATH2750 Introduction to Markov Processes |
| Meta Description | Lecture notes for the course MATH2750 Introduction to Markov Process at the University of Leeds, 2020–2021 |
| Meta Canonical | null |
| Boilerpipe Text | (extracted article reproduced below) |

# Section 10 Stationary distributions

- Stationary distributions and how to find them
- Conditions for existence and uniqueness of the stationary distribution

## 10.1 Definition of stationary distribution

Consider [the two-state “broken printer” Markov chain from Lecture 5](https://mpaldridge.github.io/math2750/S05-markov-chains.html#S05-example).

![Transition diagram for the two-state broken printer chain](https://mpaldridge.github.io/math2750/S10-stationary-distributions_files/figure-html/unnamed-chunk-40-1.svg)

Figure 10.1: Transition diagram for the two-state broken printer chain.
Suppose we start the chain from the initial distribution
$$\lambda_0 = P(X_0 = 0) = \frac{\beta}{\alpha + \beta}, \qquad \lambda_1 = P(X_0 = 1) = \frac{\alpha}{\alpha + \beta}.$$
(You may recognise this from [Question 3 on Problem Sheet 3 and the associated video](https://mpaldridge.github.io/math2750/P03.html#P03).) What’s the distribution after step 1? By conditioning on the initial state, we have
$$\begin{aligned} P(X_1 = 0) &= \lambda_0 p_{00} + \lambda_1 p_{10} = \frac{\beta}{\alpha+\beta}(1 - \alpha) + \frac{\alpha}{\alpha+\beta}\,\beta = \frac{\beta}{\alpha+\beta}, \\ P(X_1 = 1) &= \lambda_0 p_{01} + \lambda_1 p_{11} = \frac{\beta}{\alpha+\beta}\,\alpha + \frac{\alpha}{\alpha+\beta}(1 - \beta) = \frac{\alpha}{\alpha+\beta}. \end{aligned}$$
So we’re still in the same distribution we started in. By repeating the same calculation, we’re still going to be in this distribution after step 2, and step 3, and forever.
More generally, if we start from a state given by a distribution $\pi = (\pi_i)$, then after step 1 the probability we’re in state $j$ is $\sum_i \pi_i p_{ij}$. So if $\pi_j = \sum_i \pi_i p_{ij}$, we stay in this distribution forever. We call such a distribution a stationary distribution. We again recognise this formula as a matrix–vector multiplication, so this is $\pi = \pi P$, where $\pi$ is a *row* vector.
**Definition 10.1** Let $(X_n)$ be a Markov chain on a state space $S$ with transition matrix $P$. Let $\pi = (\pi_i)$ be a distribution on $S$, in that $\pi_i \geq 0$ for all $i \in S$ and $\sum_{i \in S} \pi_i = 1$. We call $\pi$ a **stationary distribution** if
$$\pi_j = \sum_{i \in S} \pi_i p_{ij} \quad \text{for all } j \in S,$$
or, equivalently, if $\pi = \pi P$.
Note that we’re saying the *distribution* $P(X_n = i)$ stays the same; the Markov chain $(X_n)$ itself will keep moving. One way to think of it is that if we started off a thousand Markov chains, choosing each starting position to be $i$ with probability $\pi_i$, then (roughly) $1000\pi_j$ of them would be in state $j$ at any time in the future – but not necessarily the same ones each time.
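This many-chains picture can be simulated directly. Below is a minimal sketch (not from the notes): the broken printer chain needs concrete parameter values, so we assume $\alpha = 0.3$, $\beta = 0.6$ purely for illustration, start many independent copies from $\pi = \left(\frac{\beta}{\alpha+\beta}, \frac{\alpha}{\alpha+\beta}\right)$, and check that the fraction of chains in each state stays put.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters for the broken printer chain (assumed, not from the notes)
alpha, beta = 0.3, 0.6
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
pi = np.array([beta, alpha]) / (alpha + beta)  # candidate stationary distribution

n_chains, n_steps = 100_000, 10
# Start each chain in state i with probability pi_i ...
states = rng.choice(2, size=n_chains, p=pi)
for _ in range(n_steps):
    # ... then advance every chain one step: jump to state 1 with the
    # probability given by the current state's row of P
    states = np.where(rng.random(n_chains) < P[states, 1], 1, 0)

frac_state0 = np.mean(states == 0)
print(frac_state0)  # stays close to pi[0] = beta/(alpha+beta) = 2/3
```

Individual chains keep jumping between the two states; only the aggregate proportions are (approximately) constant.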
## 10.2 Finding a stationary distribution
Let’s try an example. Consider [the no-claims discount Markov chain from Lecture 6](https://mpaldridge.github.io/math2750/S06-examples.html#S06-example1) with state space $S = \{1, 2, 3\}$ and transition matrix
$$P = \begin{pmatrix} \frac14 & \frac34 & 0 \\ \frac14 & 0 & \frac34 \\ 0 & \frac14 & \frac34 \end{pmatrix}.$$
We want to find a stationary distribution $\pi$, which must solve the equation $\pi = \pi P$, which is
$$\begin{pmatrix} \pi_1 & \pi_2 & \pi_3 \end{pmatrix} = \begin{pmatrix} \pi_1 & \pi_2 & \pi_3 \end{pmatrix} \begin{pmatrix} \frac14 & \frac34 & 0 \\ \frac14 & 0 & \frac34 \\ 0 & \frac14 & \frac34 \end{pmatrix}.$$
Writing out the equations one coordinate at a time, we have
$$\begin{aligned} \pi_1 &= \tfrac14 \pi_1 + \tfrac14 \pi_2, \\ \pi_2 &= \tfrac34 \pi_1 + \tfrac14 \pi_3, \\ \pi_3 &= \tfrac34 \pi_2 + \tfrac34 \pi_3. \end{aligned}$$
Since $\pi$ must be a distribution, we also have the “normalising condition”
$$\pi_1 + \pi_2 + \pi_3 = 1.$$
The way to solve these equations is first to solve for all the variables $\pi_i$ in terms of a convenient $\pi_j$ (called the “working variable”) and then substitute all of these expressions into the normalising condition to find a value for $\pi_j$.
Let’s choose $\pi_2$ as our working variable. It turns out that $\pi = \pi P$ always gives one more equation than we actually need, so we can discard one of them for free. Let’s get rid of the second equation, and then solve the first and third equations in terms of our working variable $\pi_2$, to get
$$\pi_1 = \tfrac13 \pi_2, \qquad \pi_3 = 3\pi_2. \tag{10.1}$$
Now let’s turn to the normalising condition. That gives
$$\pi_1 + \pi_2 + \pi_3 = \tfrac13 \pi_2 + \pi_2 + 3\pi_2 = \tfrac{13}{3}\pi_2 = 1.$$
So the working variable is solved to be $\pi_2 = \tfrac{3}{13}$. Substituting this back into (10.1), we have $\pi_1 = \tfrac13 \pi_2 = \tfrac{1}{13}$ and $\pi_3 = 3\pi_2 = \tfrac{9}{13}$. So the full solution is
$$\pi = (\pi_1, \pi_2, \pi_3) = \left( \tfrac{1}{13}, \tfrac{3}{13}, \tfrac{9}{13} \right).$$
The method we used here can be summarised as follows:

1. Write out $\pi = \pi P$ coordinate by coordinate. Discard one of the equations.
2. Select one of the $\pi_i$ as a working variable and treat it as a parameter. Solve the equations in terms of the working variable.
3. Substitute the solution into the normalising condition to find the working variable, and hence the full solution.

It can be good practice to use the equation discarded earlier to check that the calculated solution is indeed correct.
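For chains with more than a handful of states, these steps amount to solving a linear system, which can be delegated to a computer. A sketch in Python/NumPy (purely illustrative – the module’s computing worksheets use R): replace one redundant equation of $\pi(P - I) = 0$ with the normalising condition and solve.

```python
import numpy as np

# No-claims discount chain from the worked example above
P = np.array([[1/4, 3/4, 0],
              [1/4, 0,   3/4],
              [0,   1/4, 3/4]])

# pi P = pi  is  (P - I)^T pi^T = 0.  One of these three equations is
# redundant, so overwrite the last one with pi_1 + pi_2 + pi_3 = 1.
A = (P - np.eye(3)).T
A[-1, :] = 1.0
b = np.array([0.0, 0.0, 1.0])

pi = np.linalg.solve(A, b)
print(pi)  # approximately [1/13, 3/13, 9/13]
```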
One extra example for further practice and to show how you should present your solutions to such problems:
**Example 10.1** *Consider a Markov chain on state space $S = \{1, 2, 3\}$ with transition matrix*
$$P = \begin{pmatrix} \frac12 & \frac14 & \frac14 \\ \frac14 & \frac12 & \frac14 \\ 0 & \frac14 & \frac34 \end{pmatrix}.$$
*Find a stationary distribution for this Markov chain.*
*Step 1.* Writing out $\pi = \pi P$ coordinate-wise, we have
$$\begin{aligned} \pi_1 &= \tfrac12 \pi_1 + \tfrac14 \pi_2 \\ \pi_2 &= \tfrac14 \pi_1 + \tfrac12 \pi_2 + \tfrac14 \pi_3 \\ \pi_3 &= \tfrac14 \pi_1 + \tfrac14 \pi_2 + \tfrac34 \pi_3. \end{aligned}$$
We choose to discard the third equation.
*Step 2.* We choose $\pi_1$ as our working variable. From the first equation we get $\pi_2 = 2\pi_1$. From the second equation we get $\pi_3 = 2\pi_2 - \pi_1$, and substituting the previous $\pi_2 = 2\pi_1$ into this, we get $\pi_3 = 3\pi_1$.
*Step 3.* The normalising condition is
$$\pi_1 + \pi_2 + \pi_3 = \pi_1 + 2\pi_1 + 3\pi_1 = 6\pi_1 = 1.$$
Therefore $\pi_1 = \tfrac16$. Substituting this into our previous expressions, we get $\pi_2 = 2\pi_1 = \tfrac13$ and $\pi_3 = 3\pi_1 = \tfrac12$. Thus the solution is
$$\pi = \left( \tfrac16, \tfrac13, \tfrac12 \right).$$
We can check our answer with the discarded third equation, just to make sure we didn’t make any mistakes. We get
$$\pi_3 = \tfrac14 \pi_1 + \tfrac14 \pi_2 + \tfrac34 \pi_3 = \tfrac14 \cdot \tfrac16 + \tfrac14 \cdot \tfrac13 + \tfrac34 \cdot \tfrac12 = \tfrac{1}{24} + \tfrac{2}{24} + \tfrac{9}{24} = \tfrac12,$$
which is as it should be.
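The checking step is also a one-liner numerically. A sketch (illustrative, not part of the notes) confirming the answer to Example 10.1:

```python
import numpy as np

P = np.array([[1/2, 1/4, 1/4],
              [1/4, 1/2, 1/4],
              [0,   1/4, 3/4]])
pi = np.array([1/6, 1/3, 1/2])

print(np.allclose(pi @ P, pi))    # True: pi solves pi = pi P
print(np.isclose(pi.sum(), 1.0))  # True: pi is a distribution
```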
## 10.3 Existence and uniqueness
Given a Markov chain, it’s natural to ask:

- Does a stationary distribution exist?
- If a stationary distribution does exist, is there only one, or are there many stationary distributions?

The answer is given by the following very important theorem.
**Theorem 10.1** Consider an irreducible Markov chain.

- If the Markov chain is positive recurrent, then a stationary distribution $\pi$ exists, is unique, and is given by $\pi_i = 1/\mu_i$, where $\mu_i$ is the expected return time to state $i$.
- If the Markov chain is null recurrent or transient, then no stationary distribution exists.

We give an optional and nonexaminable proof of the first part below.
In our no-claims discount example, the chain is irreducible and, like all finite-state irreducible chains, it is positive recurrent. Thus the stationary distribution $\pi = \left( \tfrac{1}{13}, \tfrac{3}{13}, \tfrac{9}{13} \right)$ we found is the unique stationary distribution for that chain.

Once we have the stationary distribution $\pi$, we get the expected return times $\mu_i = 1/\pi_i$ for free: the expected return times are $\mu_1 = 13$, $\mu_2 = \tfrac{13}{3} \approx 4.33$, and $\mu_3 = \tfrac{13}{9} \approx 1.44$.
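The relation $\mu_i = 1/\pi_i$ can also be checked by Monte Carlo. A sketch (the seed and trial count are arbitrary choices, not from the notes): run the no-claims chain from each state until it first returns, and average the return times.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[1/4, 3/4, 0],
              [1/4, 0,   3/4],
              [0,   1/4, 3/4]])

def mean_return_time(k, trials=10_000):
    """Monte Carlo estimate of the expected return time mu_k."""
    total = 0
    for _ in range(trials):
        state, steps = k, 0
        while True:
            state = rng.choice(3, p=P[state])  # one step of the chain
            steps += 1
            if state == k:
                break
        total += steps
    return total / trials

mu = [mean_return_time(k) for k in range(3)]
print(mu)  # roughly [13, 4.33, 1.44]
```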
Note the condition in Theorem 10.1 that the Markov chain is irreducible. What if the Markov chain is not irreducible, so has more than one communicating class? We can work out what must happen from the theorem:

- If none of the classes are positive recurrent, then no stationary distribution exists.
- If exactly one of the classes is positive recurrent (and therefore closed), then there exists a unique stationary distribution, supported only on that closed class.
- If more than one of the classes is positive recurrent, then many stationary distributions will exist.
**Example 10.2** Consider the simple random walk with $p \neq 0, 1$. This Markov chain is irreducible, and is null recurrent for $p = \tfrac12$ and transient for $p \neq \tfrac12$. Either way, the theorem tells us that no stationary distribution exists.
**Example 10.3** Consider the Markov chain with transition matrix
$$P = \begin{pmatrix} \frac12 & \frac12 & 0 & 0 \\ \frac12 & \frac12 & 0 & 0 \\ 0 & 0 & \frac14 & \frac34 \\ 0 & 0 & \frac12 & \frac12 \end{pmatrix}.$$
This chain has two closed positive recurrent classes, $\{1, 2\}$ and $\{3, 4\}$.
Solving $\pi = \pi P$ gives
$$\begin{aligned} \pi_1 &= \tfrac12 \pi_1 + \tfrac12 \pi_2 &\quad&\Rightarrow\quad \pi_1 = \pi_2, \\ \pi_2 &= \tfrac12 \pi_1 + \tfrac12 \pi_2 &&\Rightarrow\quad \pi_1 = \pi_2, \\ \pi_3 &= \tfrac14 \pi_3 + \tfrac12 \pi_4 &&\Rightarrow\quad 3\pi_3 = 2\pi_4, \\ \pi_4 &= \tfrac34 \pi_3 + \tfrac12 \pi_4 &&\Rightarrow\quad 3\pi_3 = 2\pi_4, \end{aligned}$$
giving us the same two constraints twice each. We also have the normalising condition $\pi_1 + \pi_2 + \pi_3 + \pi_4 = 1$. If we let $\pi_1 + \pi_2 = \alpha$ and $\pi_3 + \pi_4 = 1 - \alpha$, we see that
$$\pi = \begin{pmatrix} \tfrac12\alpha & \tfrac12\alpha & \tfrac25(1 - \alpha) & \tfrac35(1 - \alpha) \end{pmatrix}$$
is a stationary distribution for any $0 \leq \alpha \leq 1$, so we have infinitely many stationary distributions.
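That there really is a whole family of stationary distributions is easy to verify mechanically. A sketch (not from the notes) checking the candidate $\pi(\alpha)$ above for several values of $\alpha$:

```python
import numpy as np

P = np.array([[1/2, 1/2, 0,   0],
              [1/2, 1/2, 0,   0],
              [0,   0,   1/4, 3/4],
              [0,   0,   1/2, 1/2]])

def pi_alpha(a):
    """Mix the stationary distributions of the two closed classes."""
    return np.array([a/2, a/2, 2*(1 - a)/5, 3*(1 - a)/5])

checks = [bool(np.allclose(pi_alpha(a) @ P, pi_alpha(a)))
          for a in (0.0, 0.3, 0.5, 1.0)]
print(checks)  # [True, True, True, True]
```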
## 10.4 Proof of existence and uniqueness
This subsection is optional and nonexaminable.
It’s very important to be able to find the stationary distribution(s) of a Markov chain – you can reasonably expect a question on this to turn up on the exam. You should also know the conditions for existence and uniqueness of the stationary distribution. Being able to *prove* existence and uniqueness is less important, although for completeness we will do so here.
Theorem 10.1 had two points. The more important point was that irreducible, positive recurrent Markov chains have a stationary distribution, that it is unique, and that it is given by $\pi_i = 1/\mu_i$. We give a proof of that below, doing the existence and uniqueness parts separately.

The less important point was that null recurrent and transient Markov chains do not have a stationary distribution, and this is more fiddly. You can find a proof (usually in multiple parts) in books such as Norris, *Markov Chains*, Section 1.7.
**Existence:** Every positive recurrent Markov chain has a stationary distribution.

Before we start, one last definition. Let us call a vector $\nu$ a **stationary vector** if $\nu P = \nu$. This is exactly like a stationary distribution, except without the normalisation condition that it has to sum to 1.
*Proof.* Suppose that $(X_n)$ is recurrent (either positive or null, for the moment).

Our first task will be to find a stationary vector. Fix an initial state $k$, and let $\nu_i$ be the expected number of visits to $i$ before we return back to $k$. That is,
$$\nu_i = E\big(\#\text{ visits to } i \text{ before returning to } k \,\big|\, X_0 = k\big) = E\left( \sum_{n=1}^{M_k} \mathbb{1}\{X_n = i\} \,\middle|\, X_0 = k \right) = \sum_{n=1}^{\infty} P(X_n = i \text{ and } n \leq M_k \mid X_0 = k),$$
where $M_k$ is the return time, as in Section 8.
Let us note for later use that, under this definition, $\nu_k = 1$, because the only visit to $k$ counted is the return to $k$ itself.

Since $\nu$ is counting the number of visits to different states in a certain (random) time, it seems plausible that $\nu$, suitably normalised, could be a stationary distribution, meaning that $\nu$ itself could be a stationary vector. Let’s check.
We want to show that $\sum_i \nu_i p_{ij} = \nu_j$. Let’s see what we have:
$$\begin{aligned} \sum_{i \in S} \nu_i p_{ij} &= \sum_{i \in S} \sum_{n=1}^{\infty} P(X_n = i \text{ and } n \leq M_k \mid X_0 = k)\, p_{ij} \\ &= \sum_{n=1}^{\infty} \sum_{i \in S} P(X_n = i \text{ and } X_{n+1} = j \text{ and } n \leq M_k \mid X_0 = k) \\ &= \sum_{n=1}^{\infty} P(X_{n+1} = j \text{ and } n \leq M_k \mid X_0 = k). \end{aligned}$$
(Exchanging the order of the sums is legitimate, because recurrence of the chain means that $M_k$ is finite with probability 1.)
We can now do a cheeky bit of monkeying around with the index $n$, by swapping out the visit to $k$ at time $M_k$ with the visit to $k$ at time $0$. This means that instead of counting the visits from $1$ to $M_k$, we can count the visits from $0$ to $M_k - 1$. Shuffling the index about, we get
$$\begin{aligned} \sum_{i \in S} \nu_i p_{ij} &= \sum_{n=0}^{\infty} P(X_{n+1} = j \text{ and } n \leq M_k - 1 \mid X_0 = k) \\ &= \sum_{n+1=1}^{\infty} P(X_{n+1} = j \text{ and } n + 1 \leq M_k \mid X_0 = k) \\ &= \sum_{n=1}^{\infty} P(X_n = j \text{ and } n \leq M_k \mid X_0 = k) \\ &= \nu_j. \end{aligned}$$
So $\nu$ is indeed a stationary vector.
We now want to normalise $\nu$ into a stationary distribution by dividing through by $\sum_i \nu_i$. We can do this if $\sum_i \nu_i$ is finite. But $\sum_i \nu_i$ is the expected total number of visits to all states before the return to $k$, which is precisely the expected return time $\mu_k$. Now we use the assumption that $(X_n)$ is *positive* recurrent. This means that $\mu_k$ is finite, so $\pi = (1/\mu_k)\,\nu$ is a stationary distribution.
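The vector $\nu$ in this proof is concrete enough to estimate by simulation. A sketch (seed and trial count assumed, not from the notes): for the no-claims chain with $k = 1$ we expect $\nu = \mu_1 \pi = 13 \cdot \left( \tfrac{1}{13}, \tfrac{3}{13}, \tfrac{9}{13} \right) = (1, 3, 9)$.

```python
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[1/4, 3/4, 0],
              [1/4, 0,   3/4],
              [0,   1/4, 3/4]])

k, trials = 0, 10_000  # k = 0 is state 1 in the notes' numbering
visits = np.zeros(3)
for _ in range(trials):
    state = k
    while True:
        # count visits at times n = 1, ..., M_k; each excursion ends with
        # exactly one counted visit to k, so nu_k = 1 exactly
        state = rng.choice(3, p=P[state])
        visits[state] += 1
        if state == k:
            break

nu = visits / trials
print(nu)  # roughly [1, 3, 9]
```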
**Uniqueness:** For an irreducible, positive recurrent Markov chain, the stationary distribution is unique and is given by $\pi_i = 1/\mu_i$.

I read the following proof in Stirzaker, *Elementary Probability*, Section 9.5.
*Proof.* Suppose the Markov chain is irreducible and positive recurrent, and suppose $\pi$ is a stationary distribution. We want to show that $\pi_i = 1/\mu_i$ for all $i$.
The only equation we have for $\mu_k$ is this one from Section 8:
$$\mu_k = 1 + \sum_j p_{kj}\, \eta_{jk}. \tag{10.2}$$
Since that involves the expected hitting times $\eta_{ik}$, let’s write down the equation for them too:
$$\eta_{ik} = 1 + \sum_j p_{ij}\, \eta_{jk} \quad \text{for all } i \neq k. \tag{10.3}$$
In order to apply the fact that $\pi$ is a stationary distribution, we’d like to get these into an equation with $\sum_i \pi_i p_{ij}$ in it. Here’s a way we can do that:
Take (10.3), multiply it by $\pi_i$ and sum over all $i \neq k$, to get
$$\sum_i \pi_i \eta_{ik} = \sum_{i \neq k} \pi_i + \sum_j \sum_{i \neq k} \pi_i p_{ij}\, \eta_{jk}. \tag{10.4}$$
(The sum on the left can be over all $i$, since $\eta_{kk} = 0$.)
Also, take (10.2) and multiply it by $\pi_k$ to get
$$\pi_k \mu_k = \pi_k + \sum_j \pi_k p_{kj}\, \eta_{jk}. \tag{10.5}$$
Now add (10.4) and (10.5) together to get
$$\sum_i \pi_i \eta_{ik} + \pi_k \mu_k = \sum_i \pi_i + \sum_j \sum_i \pi_i p_{ij}\, \eta_{jk}.$$
We can now use $\sum_i \pi_i p_{ij} = \pi_j$, along with $\sum_i \pi_i = 1$, to get
$$\sum_i \pi_i \eta_{ik} + \pi_k \mu_k = 1 + \sum_j \pi_j \eta_{jk}.$$
But the first term on the left and the last term on the right are equal, and because the Markov chain is irreducible and positive recurrent, they are finite. (That was our lemma in the previous section.) Thus we’re allowed to subtract them, and we get $\pi_k \mu_k = 1$, which is indeed $\pi_k = 1/\mu_k$. We can repeat the argument for every choice of $k$.
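Equations (10.2) and (10.3) form a finite linear system, so the identity $\pi_k \mu_k = 1$ can be checked directly for the no-claims chain. A sketch (illustrative, with $k$ taken to be state 1):

```python
import numpy as np

P = np.array([[1/4, 3/4, 0],
              [1/4, 0,   3/4],
              [0,   1/4, 3/4]])
k = 0  # state 1 in the notes' numbering

# (10.3): for i != k, eta_i = 1 + sum_{j != k} p_ij eta_j (using eta_kk = 0),
# i.e. (I - Q) eta = 1 with Q the sub-matrix of P on the other states.
others = [1, 2]
Q = P[np.ix_(others, others)]
eta = np.linalg.solve(np.eye(2) - Q, np.ones(2))  # approximately [16, 20]

# (10.2): mu_k = 1 + sum_j p_kj eta_jk
mu_k = 1 + P[k, others] @ eta
pi_k = 1/13  # from the worked example earlier in the section
print(mu_k, pi_k * mu_k)  # approximately 13 and 1
```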
In the next section, we see how the stationary distribution tells us very important things about the long-term behaviour of a Markov chain.
# [MATH2750 Introduction to Markov Processes](https://mpaldridge.github.io/math2750/)
# Section 10 Stationary distributions
- Stationary distributions and how to find them
- Conditions for existence and uniqueness of the stationary distribution
## 10\.1 Definition of stationary distribution
Consider [the two-state “broken printer” Markov chain from Lecture 5](https://mpaldridge.github.io/math2750/S05-markov-chains.html#S05-example).

Figure 10.1: Transition diagram for the two-state broken printer chain.
Suppose we start the chain from the initial distribution λ0\=P(X0\=0)\=βα\+βλ1\=P(X0\=1)\=αα\+β. λ 0 \= P ( X 0 \= 0 ) \= β α \+ β λ 1 \= P ( X 0 \= 1 ) \= α α \+ β . (You may recognise this from [Question 3 on Problem Sheet 3 and the associated video](https://mpaldridge.github.io/math2750/P03.html#P03).) What’s the distribution after step 1? By conditioning on the initial state, we have P(X1\=0)\=λ0p00\+λ1p10\=βα\+β(1−α)\+αα\+ββ\=βα\+β,P(X1\=1)\=λ0p01\+λ1p11\=βα\+βα\+αα\+β(1−β)\=αα\+β. P ( X 1 \= 0 ) \= λ 0 p 00 \+ λ 1 p 10 \= β α \+ β ( 1 − α ) \+ α α \+ β β \= β α \+ β , P ( X 1 \= 1 ) \= λ 0 p 01 \+ λ 1 p 11 \= β α \+ β α \+ α α \+ β ( 1 − β ) \= α α \+ β . So we’re still in the same distribution we started in. By repeating the same calculation, we’re still going to be in this distribution after step 2, and step 3, and forever.
More generally, if we start from a state given by a distribution π\=(πi) π \= ( π i ), then after step 1 the probability we’re in state j j is ∑iπipij ∑ i π i p i j. So if πj\=∑iπipij π j \= ∑ i π i p i j, we stay in this distribution forever. We call such a distribution a stationary distribution. We again recodnise this formula as a matrix–vector multiplication, so this is π\=πP π \= π P, where π π is a *row* vector.
**Definition 10.1** Let (Xn) ( X n ) be a Markov chain on a state space S S with transition matrix P P. Let π\=(πi) π \= ( π i ) be a distribution on S S, in that πi≥0 π i ≥ 0 for all i∈S i ∈ S and ∑i∈Sπi\=1 ∑ i ∈ S π i \= 1. We call π π a **stationary distribution** if πj\=∑i∈Sπipijfor all j∈S, π j \= ∑ i ∈ S π i p i j for all j ∈ S , or, equivalently, if π\=πP π \= π P.
Note that we’re saying the *distribution* P(Xn\=i) P ( X n \= i ) stays the same; the Markov chain (Xn) ( X n ) itself will keep moving. One way to think is that if we started off a thousand Markov chains, choosing each starting position to be i i with probability πi π i, then (roughly) 1000πj 1000 π j of them would be in state j j at any time in the future – but not necessarily the same ones each time.
## 10\.2 Finding a stationary distribution
Let’s try an example. Consider [the no-claims discount Markov chain from Lecture 6](https://mpaldridge.github.io/math2750/S06-examples.html#S06-example1) with state space S\={1,2,3} S \= { 1 , 2 , 3 } and transition matrix P\=⎛⎜ ⎜ ⎜⎝143401403401434⎞⎟ ⎟ ⎟⎠. P \= ( 1 4 3 4 0 1 4 0 3 4 0 1 4 3 4 ) .
We want to find a stationary distribution π π, which must solve the equation π\=πP π \= π P, which is (π1π2π3)\=(π1π2π3)⎛⎜ ⎜ ⎜⎝143401403401434⎞⎟ ⎟ ⎟⎠. ( π 1 π 2 π 3 ) \= ( π 1 π 2 π 3 ) ( 1 4 3 4 0 1 4 0 3 4 0 1 4 3 4 ) .
Writing out the equations coordinate at a time, we have π1\=14π1\+14π2,π2\=34π1\+14π3,π3\=34π2\+34π3. π 1 \= 1 4 π 1 \+ 1 4 π 2 , π 2 \= 3 4 π 1 \+ 1 4 π 3 , π 3 \= 3 4 π 2 \+ 3 4 π 3 . Since π π must be a distribution, we also have the “normalising condition” π1\+π2\+π3\=1. π 1 \+ π 2 \+ π 3 \= 1 .
The way to solve these equations is first to solve for all the variables πi π i in terms of a convenient πj π j (called the “working variable”) and then substitute all of these expressions into the normalising condition to find a value for πj π j.
Let’s choose π2 π 2 as our working variable. It turns out that π\=πP π \= π P always gives one more equation than we actually need, so we can discard one of them for free. Let’s get rid of the second equation, and the solve the first and third equations in terms of our working variable π2 π 2, to get π1\=13π2π3\=3π2.(10.1) (10.1) π 1 \= 1 3 π 2 π 3 \= 3 π 2 .
Now let’s turn to the normalising condition. That gives π1\+π2\+π3\=13π2\+π2\+3π2\=133π2\=1. π 1 \+ π 2 \+ π 3 \= 1 3 π 2 \+ π 2 \+ 3 π 2 \= 13 3 π 2 \= 1 . So the working variable is solved to be π2\=313 π 2 \= 3 13. Substituting this back into [(10.1)](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#eq:statt), we have π1\=13π2\=113 π 1 \= 1 3 π 2 \= 1 13 and π3\=3π2\=913 π 3 \= 3 π 2 \= 9 13. So the full solution is π\=(π1,π2,π3)\=(113,313,913). π \= ( π 1 , π 2 , π 3 ) \= ( 1 13 , 3 13 , 9 13 ) .
The method we used here can be summarised as follows:
1. Write out $\pi = \pi P$ coordinate by coordinate. Discard one of the equations.
2. Select one of the $\pi_i$ as a working variable and treat it as a parameter. Solve the equations in terms of the working variable.
3. Substitute the solution into the normalising condition to find the working variable, and hence the full solution.
It can be good practice to use the equation discarded earlier to check that the calculated solution is indeed correct.
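The three-step recipe can also be carried out by a computer: one of the equations from $\pi = \pi P$ is redundant, so we can replace it by the normalising condition and solve the resulting linear system. A minimal sketch in Python with NumPy (the function name and interface are my own, not from the notes):

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1.

    Mirrors the recipe in the notes: the system pi (P - I) = 0 has one
    redundant equation, so we overwrite one of them (the last) with the
    normalising condition and solve the square system.
    """
    n = P.shape[0]
    A = P.T - np.eye(n)     # row j of A encodes pi_j = sum_i pi_i p_ij
    A[-1, :] = np.ones(n)   # replace one equation by sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

pi = stationary_distribution(np.array([[1/4, 3/4, 0],
                                       [1/4, 0,   3/4],
                                       [0,   1/4, 3/4]]))
print(pi)  # [0.0769..., 0.2307..., 0.6923...] = (1/13, 3/13, 9/13)
```

For an irreducible chain the modified system is nonsingular, so `np.linalg.solve` returns the unique stationary distribution.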
Here is one extra example, for further practice and to show how you might present your solutions to such problems:

**Example 10.1** *Consider a Markov chain on state space $S = \{1, 2, 3\}$ with transition matrix*
$$P = \begin{pmatrix} \tfrac12 & \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac12 & \tfrac14 \\ 0 & \tfrac14 & \tfrac34 \end{pmatrix}.$$
*Find a stationary distribution for this Markov chain.*

*Step 1.* Writing out $\pi = \pi P$ coordinate by coordinate, we have
$$\begin{aligned} \pi_1 &= \tfrac12 \pi_1 + \tfrac14 \pi_2, \\ \pi_2 &= \tfrac14 \pi_1 + \tfrac12 \pi_2 + \tfrac14 \pi_3, \\ \pi_3 &= \tfrac14 \pi_1 + \tfrac14 \pi_2 + \tfrac34 \pi_3. \end{aligned}$$
We choose to discard the third equation.

*Step 2.* We choose $\pi_1$ as our working variable. From the first equation we get $\pi_2 = 2\pi_1$. From the second equation we get $\pi_3 = 2\pi_2 - \pi_1$, and substituting the previous $\pi_2 = 2\pi_1$ into this, we get $\pi_3 = 3\pi_1$.

*Step 3.* The normalising condition is
$$\pi_1 + \pi_2 + \pi_3 = \pi_1 + 2\pi_1 + 3\pi_1 = 6\pi_1 = 1.$$
Therefore $\pi_1 = \tfrac16$. Substituting this into our previous expressions, we get $\pi_2 = 2\pi_1 = \tfrac13$ and $\pi_3 = 3\pi_1 = \tfrac12$. Thus the solution is $\pi = \left(\tfrac16, \tfrac13, \tfrac12\right)$.

We can check our answer with the discarded third equation, just to make sure we didn’t make any mistakes. We get
$$\pi_3 = \tfrac14 \pi_1 + \tfrac14 \pi_2 + \tfrac34 \pi_3 = \tfrac14 \cdot \tfrac16 + \tfrac14 \cdot \tfrac13 + \tfrac34 \cdot \tfrac12 = \tfrac{1}{24} + \tfrac{2}{24} + \tfrac{9}{24} = \tfrac12,$$
which is as it should be.
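Another way to check an answer like this (not used in the notes): a stationary distribution is a left eigenvector of $P$ with eigenvalue 1, so we can recover it from an eigendecomposition of $P^{\mathsf T}$. A sketch for Example 10.1:

```python
import numpy as np

P = np.array([[1/2, 1/4, 1/4],
              [1/4, 1/2, 1/4],
              [0,   1/4, 3/4]])

# Left eigenvectors of P are right eigenvectors of P^T.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1))  # index of the eigenvalue closest to 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()               # normalise to a distribution (fixes sign too)
print(pi)                        # close to (1/6, 1/3, 1/2)
```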
## 10.3 Existence and uniqueness

Given a Markov chain, it’s natural to ask:

1. Does a stationary distribution exist?
2. If a stationary distribution does exist, is there only one, or are there many stationary distributions?
The answer is given by the following very important theorem.
**Theorem 10.1** Consider an irreducible Markov chain.
- If the Markov chain is positive recurrent, then a stationary distribution $\pi$ exists, is unique, and is given by $\pi_i = 1/\mu_i$, where $\mu_i$ is the expected return time to state $i$.
- If the Markov chain is null recurrent or transient, then no stationary distribution exists.
We give [an optional and nonexaminable proof of the first part below](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#stat-proof).
In our no-claims discount example, the chain is irreducible and, like all finite-state irreducible chains, it is positive recurrent. Thus the stationary distribution $\pi = \left(\tfrac{1}{13}, \tfrac{3}{13}, \tfrac{9}{13}\right)$ we found is the unique stationary distribution for that chain. Once we have the stationary distribution $\pi$, we get the expected return times $\mu_i = 1/\pi_i$ for free: they are $\mu_1 = 13$, $\mu_2 = \tfrac{13}{3} \approx 4.33$, and $\mu_3 = \tfrac{13}{9} \approx 1.44$.
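The relation $\mu_i = 1/\pi_i$ can be checked by simulation. The sketch below estimates the expected return time to state 1 of the no-claims chain by Monte Carlo; the zero-based state labels, trial count, and seed are arbitrary choices of mine:

```python
import random

random.seed(1)

# No-claims discount chain, states relabelled 0, 1, 2.
# trans[i] lists the (destination, probability) pairs out of state i.
trans = {0: [(0, 1/4), (1, 3/4)],
         1: [(0, 1/4), (2, 3/4)],
         2: [(1, 1/4), (2, 3/4)]}

def step(i):
    """Take one random step of the chain from state i."""
    u, total = random.random(), 0.0
    for j, p in trans[i]:
        total += p
        if u < total:
            return j
    return trans[i][-1][0]

def return_time(k):
    """Number of steps to come back to state k, started from k."""
    n, i = 1, step(k)
    while i != k:
        n, i = n + 1, step(i)
    return n

trials = 20000
est = sum(return_time(0) for _ in range(trials)) / trials
print(est)  # should be close to mu_1 = 13
```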
Note the condition in Theorem [10\.1](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#thm:statex) that the Markov chain is irreducible. What if the Markov chain is not irreducible, so has more than one communicating class? We can work out what must happen from the theorem:
- If none of the classes are positive recurrent, then no stationary distribution exists.
- If exactly one of the classes is positive recurrent (and therefore closed), then there exists a unique stationary distribution, supported only on that closed class.
- If more than one of the classes is positive recurrent, then many stationary distributions will exist.
**Example 10.2** Consider the simple random walk with $p \neq 0, 1$. This Markov chain is irreducible, and is null recurrent for $p = \tfrac12$ and transient for $p \neq \tfrac12$. Either way, the theorem tells us that no stationary distribution exists.
**Example 10.3** Consider the Markov chain with transition matrix
$$P = \begin{pmatrix} \tfrac12 & \tfrac12 & 0 & 0 \\ \tfrac12 & \tfrac12 & 0 & 0 \\ 0 & 0 & \tfrac14 & \tfrac34 \\ 0 & 0 & \tfrac12 & \tfrac12 \end{pmatrix}.$$
This chain has two closed positive recurrent classes, $\{1, 2\}$ and $\{3, 4\}$.

Solving $\pi = \pi P$ gives
$$\begin{aligned} \pi_1 &= \tfrac12 \pi_1 + \tfrac12 \pi_2 & &\Rightarrow & \pi_1 &= \pi_2, \\ \pi_2 &= \tfrac12 \pi_1 + \tfrac12 \pi_2 & &\Rightarrow & \pi_1 &= \pi_2, \\ \pi_3 &= \tfrac14 \pi_3 + \tfrac12 \pi_4 & &\Rightarrow & 3\pi_3 &= 2\pi_4, \\ \pi_4 &= \tfrac34 \pi_3 + \tfrac12 \pi_4 & &\Rightarrow & 3\pi_3 &= 2\pi_4, \end{aligned}$$
giving us the same two constraints twice each. We also have the normalising condition $\pi_1 + \pi_2 + \pi_3 + \pi_4 = 1$. If we let $\pi_1 + \pi_2 = \alpha$ and $\pi_3 + \pi_4 = 1 - \alpha$, we see that
$$\pi = \left(\tfrac12 \alpha,\ \tfrac12 \alpha,\ \tfrac25 (1 - \alpha),\ \tfrac35 (1 - \alpha)\right)$$
is a stationary distribution for any $0 \leq \alpha \leq 1$, so we have infinitely many stationary distributions.
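We can confirm numerically (an illustration of mine, not from the notes) that every member of this one-parameter family is stationary:

```python
import numpy as np

P = np.array([[1/2, 1/2, 0,   0],
              [1/2, 1/2, 0,   0],
              [0,   0,   1/4, 3/4],
              [0,   0,   1/2, 1/2]])

def pi_alpha(a):
    """The candidate stationary distribution with total weight a on class {1, 2}."""
    return np.array([a/2, a/2, 2*(1 - a)/5, 3*(1 - a)/5])

for a in (0.0, 0.3, 1.0):
    pi = pi_alpha(a)
    print(a, np.allclose(pi @ P, pi), np.isclose(pi.sum(), 1.0))  # True True each time
```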
## 10.4 Proof of existence and uniqueness
*This subsection is optional and nonexaminable.*
It’s very important to be able to find the stationary distribution(s) of a Markov chain – you can reasonably expect a question on this to turn up on the exam. You should also know the conditions for existence and uniqueness of the stationary distribution. Being able to *prove* existence and uniqueness is less important, although for completeness we will do so here.
Theorem [10.1](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#thm:statex) had two points. The more important point was that irreducible, positive recurrent Markov chains have a stationary distribution, that it is unique, and that it is given by $\pi_i = 1/\mu_i$. We give a proof of that below, doing the existence and uniqueness parts separately. The less important point was that null recurrent and transient Markov chains do not have a stationary distribution; this is more fiddly. You can find a proof (usually in multiple parts) in books such as Norris, [*Markov Chains*](https://www.statslab.cam.ac.uk/~james/Markov/), Section 1.7.
**Existence:** *Every positive recurrent Markov chain has a stationary distribution.*
Before we start, one last definition. Let us call a vector $\nu$ a **stationary vector** if $\nu P = \nu$. This is exactly like a stationary distribution, except without the normalisation condition that it has to sum to 1.

*Proof*. Suppose that $(X_n)$ is recurrent (either positive or null, for the moment).
Our first task will be to find a stationary vector. Fix an initial state $k$, and let $\nu_i$ be the expected number of visits to $i$ before we return to $k$. That is,
$$\nu_i = \mathbb{E}(\#\text{ visits to } i \text{ before returning to } k \mid X_0 = k) = \mathbb{E}\left(\sum_{n=1}^{M_k} \mathbb{1}\{X_n = i\} \,\middle|\, X_0 = k\right) = \sum_{n=1}^{\infty} \mathbb{P}(X_n = i \text{ and } n \leq M_k \mid X_0 = k),$$
where $M_k$ is [the return time, as in Section 8](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#S08-return-times). Let us note for later use that, under this definition, $\nu_k = 1$, because the only visit to $k$ counted is the return to $k$ itself.
Since $\nu$ is counting the number of visits to different states in a certain (random) time, it seems plausible that $\nu$, suitably normalised, could be a stationary distribution, meaning that $\nu$ itself could be a stationary vector. Let’s check.
We want to show that $\sum_i \nu_i p_{ij} = \nu_j$. Let’s see what we have:
$$\begin{aligned} \sum_{i \in S} \nu_i p_{ij} &= \sum_{i \in S} \sum_{n=1}^{\infty} \mathbb{P}(X_n = i \text{ and } n \leq M_k \mid X_0 = k)\, p_{ij} \\ &= \sum_{n=1}^{\infty} \sum_{i \in S} \mathbb{P}(X_n = i \text{ and } X_{n+1} = j \text{ and } n \leq M_k \mid X_0 = k) \\ &= \sum_{n=1}^{\infty} \mathbb{P}(X_{n+1} = j \text{ and } n \leq M_k \mid X_0 = k). \end{aligned}$$
(Exchanging the order of the sums is legitimate, because recurrence of the chain means that $M_k$ is finite with probability 1.) We can now do a cheeky bit of monkeying around with the index $n$, by swapping out the visit to $k$ at time $M_k$ with the visit to $k$ at time 0. This means that instead of counting the visits from 1 to $M_k$, we can count the visits from 0 to $M_k - 1$. Shuffling the index about, we get
$$\begin{aligned} \sum_{i \in S} \nu_i p_{ij} &= \sum_{n=0}^{\infty} \mathbb{P}(X_{n+1} = j \text{ and } n \leq M_k - 1 \mid X_0 = k) \\ &= \sum_{n+1=1}^{\infty} \mathbb{P}(X_{n+1} = j \text{ and } n + 1 \leq M_k \mid X_0 = k) \\ &= \sum_{n=1}^{\infty} \mathbb{P}(X_n = j \text{ and } n \leq M_k \mid X_0 = k) = \nu_j. \end{aligned}$$
So $\nu$ is indeed a stationary vector.
We now want to normalise $\nu$ into a stationary distribution by dividing through by $\sum_i \nu_i$. We can do this if $\sum_i \nu_i$ is finite. But $\sum_i \nu_i$ is the expected total number of visits to all states before the return to $k$, which is precisely the expected return time $\mu_k$. Now we use the assumption that $(X_n)$ is *positive* recurrent. This means that $\mu_k$ is finite, so $\pi = (1/\mu_k)\nu$ is a stationary distribution.
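The construction in this proof is easy to illustrate by simulation. The sketch below (my own, with arbitrary trial count and seed) estimates $\nu_i$, the expected number of visits to $i$ during an excursion from $k$ back to $k$, for the no-claims chain with $k = 1$ (index 0 here), and checks that $\nu / \mu_k$ recovers the stationary distribution $(1/13, 3/13, 9/13)$:

```python
import random

random.seed(2)

# No-claims discount chain, states relabelled 0, 1, 2; rows of P.
rows = [[1/4, 3/4, 0], [1/4, 0, 3/4], [0, 1/4, 3/4]]

def step(i):
    return random.choices([0, 1, 2], weights=rows[i])[0]

def excursion_visits(k):
    """Visits to each state during one excursion from k back to k.

    As in the proof, the visits counted are those at times 1, ..., M_k,
    so the counts for one excursion sum to the return time M_k and the
    count for k itself is always exactly 1.
    """
    visits = [0, 0, 0]
    i = step(k)
    while i != k:
        visits[i] += 1
        i = step(i)
    visits[k] += 1  # the return to k at time M_k
    return visits

trials = 20000
totals = [0, 0, 0]
for _ in range(trials):
    for state, count in enumerate(excursion_visits(0)):
        totals[state] += count

nu = [t / trials for t in totals]  # estimate of nu_i (nu_k = 1 exactly)
mu_k = sum(nu)                     # estimate of the expected return time
pi = [x / mu_k for x in nu]        # normalised: the stationary distribution
print(nu[0], pi)                   # 1.0, then values near (1/13, 3/13, 9/13)
```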
**Uniqueness:** *For an irreducible, positive recurrent Markov chain, the stationary distribution is unique and is given by $\pi_i = 1/\mu_i$.*
I read the following proof in Stirzaker, [*Elementary Probability*](https://leeds.primo.exlibrisgroup.com/permalink/44LEE_INST/13rlbcs/alma991013131349705181), Section 9.5.
*Proof*. Suppose the Markov chain is irreducible and positive recurrent, and suppose $\pi$ is a stationary distribution. We want to show that $\pi_i = 1/\mu_i$ for all $i$.
The only equation we have for $\mu_k$ is [this one from Section 8](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#S08-return-times):
$$\mu_k = 1 + \sum_j p_{kj} \eta_{jk}. \tag{10.2}$$
Since that involves the expected hitting times $\eta_{ik}$, let’s write down [the equation for them](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#S08-return-times) too:
$$\eta_{ik} = 1 + \sum_j p_{ij} \eta_{jk} \quad \text{for all } i \neq k. \tag{10.3}$$

In order to apply the fact that $\pi$ is a stationary distribution, we’d like to get these into an equation with $\sum_i \pi_i p_{ij}$ in it. Here’s a way we can do that. Take [(10.3)](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#eq:pf1), multiply it by $\pi_i$ and sum over all $i \neq k$, to get
$$\sum_i \pi_i \eta_{ik} = \sum_{i \neq k} \pi_i + \sum_j \sum_{i \neq k} \pi_i p_{ij} \eta_{jk}. \tag{10.4}$$
(The sum on the left can be over all $i$, since $\eta_{kk} = 0$.) Also, take [(10.2)](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#eq:pf2) and multiply it by $\pi_k$ to get
$$\pi_k \mu_k = \pi_k + \sum_j \pi_k p_{kj} \eta_{jk}. \tag{10.5}$$
Now add [(10.4)](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#eq:pf3) and [(10.5)](https://mpaldridge.github.io/math2750/S10-stationary-distributions.html#eq:pf4) together to get
$$\sum_i \pi_i \eta_{ik} + \pi_k \mu_k = \sum_i \pi_i + \sum_j \sum_i \pi_i p_{ij} \eta_{jk}.$$

We can now use $\sum_i \pi_i p_{ij} = \pi_j$, along with $\sum_i \pi_i = 1$, to get
$$\sum_i \pi_i \eta_{ik} + \pi_k \mu_k = 1 + \sum_j \pi_j \eta_{jk}.$$

But the first term on the left and the last term on the right are equal, and because the Markov chain is irreducible and positive recurrent, they are finite. (That was [our lemma in the previous section](https://mpaldridge.github.io/math2750/S09-recurrence-transience.html#S09-lemma).) Thus we’re allowed to subtract them, and we get $\pi_k \mu_k = 1$, which is indeed $\pi_k = 1/\mu_k$. We can repeat the argument for every choice of $k$.
**In the next section**, we see how the stationary distribution tells us very important things about the long-term behaviour of a Markov chain.