
# Binary symmetric channel


A binary symmetric channel (or BSC) is a common communications channel model used in coding theory and information theory . In this model, a transmitter wishes to send a bit (a zero or a one), and the receiver receives a bit. It is assumed that the bit is usually transmitted correctly, but that it will be “flipped” with a small probability (the “crossover probability”). This channel is used frequently in information theory because it is one of the simplest channels to analyze.

## Contents

• 1 Description
• 2 Definition
• 2.1 Capacity of BSCp
• 3 Shannon’s channel capacity theorem for BSCp
• 3.1 Noisy coding theorem for BSCp
• 4 Converse of Shannon’s capacity theorem
• 5 Codes for BSCp
• 6 Forney’s code for BSCp
• 6.1 Decoding error probability for C*
• 7 See also
• 8 Notes
• 9 References

## Description

The BSC is a binary channel; that is, it can transmit only one of two symbols (usually called 0 and 1). (A non-binary channel would be capable of transmitting more than 2 symbols, possibly even an infinite number of choices.) The transmission is not perfect, and occasionally the receiver gets the wrong bit.

This channel is often used by theorists because it is one of the simplest noisy channels to analyze. Many problems in communication theory can be reduced to a BSC. Conversely, being able to transmit effectively over the BSC can give rise to solutions for more complicated channels.

## Definition

A binary symmetric channel with crossover probability $p$, denoted by BSC$_p$, is a channel with binary input and binary output and probability of error $p$; that is, if $X$ is the transmitted random variable and $Y$ the received variable, then the channel is characterized by the conditional probabilities:

$$\begin{aligned}
\operatorname{Pr}[Y=0\mid X=0] &= 1-p \\
\operatorname{Pr}[Y=0\mid X=1] &= p \\
\operatorname{Pr}[Y=1\mid X=0] &= p \\
\operatorname{Pr}[Y=1\mid X=1] &= 1-p
\end{aligned}$$

It is assumed that $0\leq p\leq 1/2$. If $p>1/2$, then the receiver can swap the output (interpret 1 when it sees 0, and vice versa) and obtain an equivalent channel with crossover probability $1-p\leq 1/2$.
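The channel's behavior is easy to simulate directly. The following is a minimal sketch (the function name `bsc` and the parameters are ours, not from any standard library): each bit is flipped independently with probability $p$.

```python
import random

def bsc(bits, p, rng=random):
    """Pass a bit sequence through a BSC_p: each bit is flipped
    independently with crossover probability p."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(0)
sent = [0, 1] * 5000
received = bsc(sent, p=0.1, rng=rng)
flips = sum(s != r for s, r in zip(sent, received))
# the empirical flip rate should be close to the crossover probability 0.1
```

Over many transmitted bits the fraction of flips concentrates around $p$, which is what the capacity analysis below exploits.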

### Capacity of BSCp

The channel capacity of the binary symmetric channel is

$$C_\text{BSC}=1-\operatorname{H}_\text{b}(p),$$

where $\operatorname{H}_\text{b}(p)$ is the binary entropy function.

Proof: The capacity is defined as the maximum mutual information between input and output over all possible input distributions $p_X(x)$:

$$C=\max_{p_X(x)} I(X;Y).$$

The mutual information can be reformulated as

$$\begin{aligned}
I(X;Y) &= H(Y)-H(Y\mid X)\\
&= H(Y)-\sum_{x\in\{0,1\}} p_X(x)\,H(Y\mid X=x)\\
&= H(Y)-\sum_{x\in\{0,1\}} p_X(x)\,\operatorname{H}_\text{b}(p)\\
&= H(Y)-\operatorname{H}_\text{b}(p),
\end{aligned}$$

where the first and second steps follow from the definition of mutual information and of conditional entropy, respectively. The entropy at the output for a given, fixed input symbol, $H(Y\mid X=x)$, equals the binary entropy function, which leads to the third line; this can be further simplified.

In the last line, only the first term $H(Y)$ depends on the input distribution $p_X(x)$. The entropy of a binary variable is at most one, and it attains this maximum only when its distribution is uniform. Because the channel is symmetric, a uniform distribution of $Y$ is achieved exactly when the input $X$ is uniform. So one finally gets

$$C_\text{BSC}=1-\operatorname{H}_\text{b}(p).$$ [1]
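As a quick numerical check of this formula, the capacity can be computed directly from the binary entropy function. This is a minimal sketch; the function names are ours:

```python
from math import log2

def binary_entropy(p):
    """The binary entropy function H_b(p), with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """C_BSC = 1 - H_b(p): one bit per channel use at p = 0,
    zero capacity at p = 1/2."""
    return 1.0 - binary_entropy(p)
```

A noiseless channel ($p=0$) carries one bit per use; at $p=1/2$ the output is independent of the input and the capacity vanishes.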

## Shannon’s channel capacity theorem for BSCp

Shannon’s noisy coding theorem is general for all kinds of channels. We consider a special case of this theorem for a binary symmetric channel with an error probability p.

### Noisy coding theorem for BSCp

The noise $e$ that characterizes $BSC_p$ is a random variable consisting of $n$ independent random bits ($n$ is defined below), where each bit is $1$ with probability $p$ and $0$ with probability $1-p$. We indicate this by writing "$e\in BSC_p$".

Theorem 1. For all $p<\tfrac{1}{2}$, all $0<\epsilon<\tfrac{1}{2}-p$, all sufficiently large $n$ (depending on $p$ and $\epsilon$), and all $k\leq\lfloor(1-H(p+\epsilon))n\rfloor$, there exists a pair of encoding and decoding functions $E:\{0,1\}^k\to\{0,1\}^n$ and $D:\{0,1\}^n\to\{0,1\}^k$ respectively, such that every message $m\in\{0,1\}^k$ has the following property:

$$\Pr_{e\in BSC_p}[D(E(m)+e)\neq m]\leq 2^{-\delta n}$$

for some constant $\delta>0$.

What this theorem actually implies is that if a message is picked from $\{0,1\}^k$, encoded with a random encoding function $E$, and sent across a noisy $BSC_p$, then there is a very high probability of recovering the original message by decoding, provided $k$, and in effect the rate of the code, is bounded by the quantity stated in the theorem. The decoding error probability is exponentially small.

Proof of Theorem 1. First we describe the encoding and decoding functions used in the theorem. We will use the probabilistic method to prove this theorem. Shannon’s theorem was one of the earliest applications of this method.

Encoding function: Consider an encoding function $E:\{0,1\}^k\to\{0,1\}^n$ that is selected at random. This means that for each message $m\in\{0,1\}^k$, the value $E(m)\in\{0,1\}^n$ is selected uniformly at random.

Decoding function: For a given encoding function $E$, the decoding function $D:\{0,1\}^n\to\{0,1\}^k$ is specified as follows: given any received word $y\in\{0,1\}^n$, we find the message $m\in\{0,1\}^k$ such that the Hamming distance $\Delta(y,E(m))$ is as small as possible (with ties broken arbitrarily). This kind of decoding function is called a maximum likelihood decoding (MLD) function.
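The random encoder and the MLD rule above can be sketched directly. This is a toy illustration with deliberately small parameters; all names (`random_code`, `mld`, `hamming`) are ours:

```python
import itertools
import random

def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def random_code(k, n, rng):
    """A random encoding E: {0,1}^k -> {0,1}^n, each codeword
    chosen uniformly and independently at random."""
    return {m: tuple(rng.randint(0, 1) for _ in range(n))
            for m in itertools.product((0, 1), repeat=k)}

def mld(code, y):
    """Maximum likelihood decoding for the BSC (p < 1/2): return the
    message whose codeword is closest to y, ties broken arbitrarily."""
    return min(code, key=lambda m: hamming(code[m], y))

rng = random.Random(1)
code = random_code(k=2, n=20, rng=rng)
m = (0, 1)
e = tuple(int(rng.random() < 0.05) for _ in range(20))   # e in BSC_0.05
y = tuple(c ^ b for c, b in zip(code[m], e))             # y = E(m) + e
decoded = mld(code, y)
```

Note that this brute-force decoder examines all $2^k$ codewords, which is exactly why the theorem below is an existence result rather than an efficient construction.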

Ultimately, we will show (by averaging over the probabilities) that at least one choice of $(E,D)$ satisfies the conclusion of the theorem; that is what is meant by the probabilistic method.

The proof runs as follows. Suppose $p$ and $\epsilon$ are fixed. First we show that, for a fixed $m\in\{0,1\}^k$ and $E$ chosen randomly, the probability of failure over $BSC_p$ noise is exponentially small in $n$. At this point, the proof works for a fixed message $m$. Next we extend this result to work for all $m$. We achieve this by eliminating half of the codewords from the code, with the argument that the bound on the decoding error probability holds for at least half of the codewords. The latter method is called expurgation. This gives the total process the name random coding with expurgation.

A high-level proof: Fix $p$ and $\epsilon$. Given a fixed message $m\in\{0,1\}^k$, we need to estimate the expected probability that the received word, corrupted by the noise, does not decode back to $m$. That is, we need to estimate:

$$\mathbb{E}_E\left[\Pr_{e\in BSC_p}[D(E(m)+e)\neq m]\right].$$

Let $y$ be the received word. In order for the decoded word $D(y)$ not to be equal to the message $m$, one of the following events must occur:

• $y$ does not lie within the Hamming ball of radius $(p+\epsilon)n$ centered at $E(m)$. This condition is mainly used to make the calculations easier.

• There is another message $m'\in\{0,1\}^k$ such that $\Delta(y,E(m'))\leqslant\Delta(y,E(m))$. In other words, the errors due to noise take the transmitted codeword closer to another encoded message.

We can apply the Chernoff bound to control the probability of the first event:

$$\Pr_{e\in BSC_p}[\Delta(y,E(m))>(p+\epsilon)n]\leqslant 2^{-\epsilon^2 n}.$$

This is exponentially small for large $n$ (recall that $\epsilon$ is fixed).

As for the second event, we note that the probability that $E(m')\in B(y,(p+\epsilon)n)$ is $\text{Vol}(B(y,(p+\epsilon)n))/2^n$, where $B(x,r)$ is the Hamming ball of radius $r$ centered at the vector $x$ and $\text{Vol}(B(x,r))$ is its volume. Using the entropy approximation to estimate the number of words in the Hamming ball, we have $\text{Vol}(B(y,(p+\epsilon)n))\approx 2^{H(p)n}$. Hence the above probability amounts to $2^{H(p)n}/2^n=2^{H(p)n-n}$. Now using the union bound, we can upper bound the probability that such an $m'\in\{0,1\}^k$ exists by $2^{k+H(p)n-n}$, which is $2^{-\Omega(n)}$, as desired by the choice of $k$.

A detailed proof: From the above analysis, we calculate the probability of the event that the decoded word, after the channel noise is applied, is not the same as the original message sent. We shall introduce some notation here. Let $p(y\mid E(m))$ denote the probability of receiving the word $y$ given that the codeword $E(m)$ was sent. Let $B_0$ denote $B(E(m),(p+\epsilon)n)$. Then:

$$\begin{aligned}
\Pr_{e\in BSC_p}[D(E(m)+e)\neq m] &= \sum_{y\in\{0,1\}^n} p(y\mid E(m))\cdot \mathbf{1}_{D(y)\neq m}\\
&= \sum_{y\notin B_0} p(y\mid E(m))\cdot \mathbf{1}_{D(y)\neq m}+\sum_{y\in B_0} p(y\mid E(m))\cdot \mathbf{1}_{D(y)\neq m}\\
&\leqslant 2^{-\epsilon^2 n}+\sum_{y\in B_0} p(y\mid E(m))\cdot \mathbf{1}_{D(y)\neq m}
\end{aligned}$$

We get the last inequality from the Chernoff-bound analysis above. Now taking the expectation on both sides, we have

$$\begin{aligned}
\mathbb{E}_E\left[\Pr_{e\in BSC_p}[D(E(m)+e)\neq m]\right]
&\leqslant 2^{-\epsilon^2 n}+\sum_{y\in B_0} p(y\mid E(m))\,\mathbb{E}\left[\mathbf{1}_{D(y)\neq m}\right]\\
&\leqslant 2^{-\epsilon^2 n}+\sum_{y\in B_0}\mathbb{E}\left[\mathbf{1}_{D(y)\neq m}\right] && \left(\textstyle\sum_{y\in B_0}p(y\mid E(m))\leqslant 1\right)\\
&\leqslant 2^{-\epsilon^2 n}+2^{k+H(p+\epsilon)n-n} && \left(\mathbb{E}\left[\mathbf{1}_{D(y)\neq m}\right]\leqslant 2^{k+H(p+\epsilon)n-n}\text{; see above}\right)\\
&\leqslant 2^{-\delta n}
\end{aligned}$$

by appropriately choosing the value of $\delta$. Since the above bound holds for each message, we have


$$\mathbb{E}_m\left[\mathbb{E}_E\left[\Pr_{e\in BSC_p}\left[D(E(m)+e)\neq m\right]\right]\right]\leqslant 2^{-\delta n}.$$

Now we can exchange the order of the two expectations, over the message and over the choice of the encoding function $E$. Hence:

$$\mathbb{E}_E\left[\mathbb{E}_m\left[\Pr_{e\in BSC_p}\left[D(E(m)+e)\neq m\right]\right]\right]\leqslant 2^{-\delta n}.$$

Hence, in conclusion, by the probabilistic method, we have some encoding function $E^*$ and a corresponding decoding function $D^*$ such that

$$\mathbb{E}_m\left[\Pr_{e\in BSC_p}\left[D^*(E^*(m)+e)\neq m\right]\right]\leqslant 2^{-\delta n}.$$

At this point, the proof works for a fixed message $m$. But we need to make sure that the above bound holds for all messages $m$ simultaneously. For that, let us sort the $2^k$ messages by their decoding error probabilities. By applying Markov’s inequality, the decoding error probability for the first $2^{k-1}$ messages in this order is at most $2\cdot 2^{-\delta n}$. Thus, in order for the above bound to hold for every message $m$, we can just trim off the last $2^{k-1}$ messages from the sorted order. This essentially gives us another encoding function $E'$ with a corresponding decoding function $D'$, with a decoding error probability of at most $2^{-\delta n+1}$ and essentially the same rate. Taking $\delta'$ to be equal to $\delta-\tfrac{1}{n}$, we bound the decoding error probability by $2^{-\delta' n}$. This expurgation process completes the proof of Theorem 1.

## Converse of Shannon’s capacity theorem

The converse of the capacity theorem essentially states that $1-H(p)$ is the best rate one can achieve over a binary symmetric channel. Formally the theorem states:

Theorem 2. If $k\geq\lceil(1-H(p+\epsilon))n\rceil$, then for every pair of encoding and decoding functions $E:\{0,1\}^k\to\{0,1\}^n$ and $D:\{0,1\}^n\to\{0,1\}^k$ respectively, there is some message $m\in\{0,1\}^k$ with

$$\Pr_{e\in BSC_p}[D(E(m)+e)\neq m]\geq\tfrac{1}{2}.$$

For a detailed proof of this theorem, the reader is referred to the bibliography. The intuition behind the proof is that the number of errors grows rapidly as the rate grows beyond the channel capacity. The idea is that the sender generates messages of dimension $k$, while the channel $BSC_p$ introduces transmission errors. On a channel $BSC_p$, the number of typical error patterns is about $2^{H(p+\epsilon)n}$ for a code of block length $n$. The maximum number of messages is $2^k$. The output of the channel, on the other hand, has $2^n$ possible values. If there is any confusion between any two messages, it is likely that $2^k 2^{H(p+\epsilon)n}\geq 2^n$. Hence we would have $k\geq\lceil(1-H(p+\epsilon))n\rceil$, a case we would like to avoid in order to keep the decoding error probability exponentially small.
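The counting argument can be checked numerically by comparing exponents rather than the huge integers themselves. This is a sketch; the helper names are ours:

```python
from math import ceil, floor, log2

def H(p):
    """Binary entropy function."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def balls_fit(k, n, p, eps):
    """True when 2^k * 2^{H(p+eps) n} <= 2^n, i.e. the 2^k typical-error
    balls can be packed into the 2^n possible outputs. We compare the
    exponents, k + H(p+eps)*n <= n, to avoid computing huge integers."""
    return k + H(p + eps) * n <= n

n, p, eps = 1000, 0.1, 0.01
k_below = floor((1 - H(p + eps)) * n)      # rate admitted by Theorem 1
k_above = ceil((1 - H(p + eps)) * n) + 50  # rate beyond the converse bound
```

Below the bound the typical-error balls fit disjointly into the output space; above it they must overlap, which is the source of the confusion between messages.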

## Codes for BSCp

A great deal of work has been done, and continues to be done, on designing explicit error-correcting codes that achieve the capacities of several standard communication channels. The motivation behind designing such codes is to relate the rate of the code to the fraction of errors it can correct.

The approach behind the design of codes which meet the channel capacities of the $BSC$ and the $BEC$ has been to correct a smaller number of errors with high probability while achieving the highest possible rate. Shannon’s theorem gives us the best rate achievable over a $BSC_p$, but it does not give us any explicit codes that achieve that rate. In fact, such codes are typically constructed to correct only a small fraction of errors with high probability, while achieving a very good rate. The first such code was due to George D. Forney in 1966. The code is a concatenated code formed by composing two different kinds of codes. We shall briefly discuss the construction of Forney’s code for the binary symmetric channel and analyze its rate and decoding error probability. Various explicit codes for achieving the capacity of the binary erasure channel have also come up recently.

## Forney’s code for BSCp

Forney constructed a concatenated code $C^*=C_\text{out}\circ C_\text{in}$ to achieve the capacity bound of Theorem 1 for $BSC_p$. In his code,

• The outer code $C_\text{out}$ is a code of block length $N$ and rate $1-\frac{\epsilon}{2}$ over the field $F_{2^k}$, and $k=O(\log N)$. Additionally, we have a decoding algorithm $D_\text{out}$ for $C_\text{out}$ which can correct up to a $\gamma$ fraction of worst-case errors and runs in $t_\text{out}(N)$ time.

• The inner code $C_\text{in}$ is a code of block length $n$, dimension $k$, and rate $1-H(p)-\frac{\epsilon}{2}$. Additionally, we have a decoding algorithm $D_\text{in}$ for $C_\text{in}$ with a decoding error probability of at most $\frac{\gamma}{2}$ over $BSC_p$ that runs in $t_\text{in}(N)$ time.

For the outer code $C_\text{out}$, a Reed-Solomon code would have been the first code to come to mind. However, we will see that the construction of such a code cannot be done in polynomial time. This is why a binary linear code is used for $C_\text{out}$.

For the inner code $C_\text{in}$, we find a linear code by exhaustively searching over linear codes of block length $n$ and dimension $k$ whose rate meets the capacity of $BSC_p$, by Theorem 1.

The rate of the concatenated code satisfies

$$R(C^*)=R(C_\text{in})\times R(C_\text{out})=\left(1-\frac{\epsilon}{2}\right)\left(1-H(p)-\frac{\epsilon}{2}\right)\geq 1-H(p)-\epsilon,$$

which almost meets the $BSC_p$ capacity. We further note that the encoding and decoding of $C^*$ can be done in polynomial time with respect to $N$. As a matter of fact, encoding $C^*$ takes time $O(N^2)+O(Nk^2)=O(N^2)$. Further, the decoding algorithm described below takes time $Nt_\text{in}(k)+t_\text{out}(N)=N^{O(1)}$ as long as $t_\text{out}(N)=N^{O(1)}$ and $t_\text{in}(k)=2^{O(k)}$.
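The rate calculation above can be verified numerically. This is a sketch; the helper names are ours:

```python
from math import log2

def binary_entropy(p):
    """Binary entropy function H(p)."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def concatenated_rate(p, eps):
    """R(C*) = R(C_out) * R(C_in) = (1 - eps/2)(1 - H(p) - eps/2)."""
    return (1 - eps / 2) * (1 - binary_entropy(p) - eps / 2)

p, eps = 0.05, 0.02
rate = concatenated_rate(p, eps)
capacity = 1 - binary_entropy(p)
# the concatenated rate stays within eps of the BSC_p capacity
```

Multiplying out the two factors shows the loss relative to capacity is $\tfrac{\epsilon}{2}(2-H(p))-\tfrac{\epsilon^2}{4}\leq\epsilon$, which is where the inequality $R(C^*)\geq 1-H(p)-\epsilon$ comes from.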

### Decoding error probability for C*

A natural decoding algorithm for $C^*$ is to:

• Assume $y_i^\prime = D_\text{in}(y_i),\quad i\in(0,N)$
• Execute $D_\text{out}$ on $y^\prime=(y_1^\prime\ldots y_N^\prime)$

Note that each block of the code for $C_\text{in}$ is considered a symbol for $C_\text{out}$. Now since the probability of error at any index $i$ for $D_\text{in}$ is at most $\tfrac{\gamma}{2}$ and the errors in $BSC_p$ are independent, the expected number of errors for $D_\text{in}$ is at most $\tfrac{\gamma N}{2}$ by linearity of expectation. Applying the Chernoff bound, the probability of more than $\gamma N$ errors occurring is at most $e^{-\frac{\gamma N}{6}}$. Since the outer code $C_\text{out}$ can correct at most $\gamma N$ errors, this is the decoding error probability of $C^*$. Expressed in asymptotic terms, this gives an error probability of $2^{-\Omega(\gamma N)}$. Thus the achieved decoding error probability of $C^*$ is exponentially small, as in Theorem 1.

We have given a general technique to construct $C^*$. For more detailed descriptions of $C_\text{in}$ and $C_\text{out}$, please read the following references. Recently several other capacity-achieving codes have also been constructed; LDPC codes have been considered for this purpose because of their faster decoding time. [2]

## See also

• Z channel

## Notes

1. ^ Thomas M. Cover, Joy A. Thomas. Elements of Information Theory, 2nd Edition. New York: Wiley-Interscience, 2006. ISBN 978-0-471-24195-9.

2. ^ Richardson and Urbanke

## References

• David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1.
• Thomas M. Cover, Joy A. Thomas. Elements of Information Theory, 1st Edition. New York: Wiley-Interscience, 1991. ISBN 0-471-06259-6.
• Atri Rudra’s course on Error Correcting Codes: Combinatorics, Algorithms, and Applications (Fall 2007), Lectures 9, 10, 29, and 30.
• Madhu Sudan’s course on Algorithmic Introduction to Coding Theory (Fall 2001), Lectures 1 and 2.
• G. David Forney. Concatenated Codes. MIT Press, Cambridge, MA, 1966.
• Venkatesan Guruswami’s course on Error-Correcting Codes: Constructions and Algorithms, Autumn 2006.
• C. E. Shannon. A Mathematical Theory of Communication. ACM SIGMOBILE Mobile Computing and Communications Review.
• Tom Richardson and Rudiger Urbanke. Modern Coding Theory. Cambridge University Press.

