Complete worked solutions for all exercises
$f = x_1 \cdot x_2 + \overline{x_1} \cdot x_3$
| $x_1$ | $x_2$ | $x_3$ | $x_1 x_2$ | $\overline{x_1} x_3$ | $f$ |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 | 1 |
| 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 | 1 |
$f = (a+b) \cdot (c+d)$
$$\overline{f} = \overline{(a+b) \cdot (c+d)}$$ $$= \overline{(a+b)} + \overline{(c+d)} \quad \text{(DeMorgan's)}$$ $$= (\bar{a} \cdot \bar{b}) + (\bar{c} \cdot \bar{d})$$$\therefore \boxed{\overline{f} = \bar{a}\bar{b} + \bar{c}\bar{d}}$
$f(x_1, x_2, x_3) = \sum m(1,2,5,6,7)$
Canonical SOP (sum of minterms) — convert each minterm number to binary, complement 0s:
| Minterm | $x_1$ | $x_2$ | $x_3$ | Term |
|---|---|---|---|---|
| 1 | 0 | 0 | 1 | $\overline{x_1}\,\overline{x_2}\,x_3$ |
| 2 | 0 | 1 | 0 | $\overline{x_1}\,x_2\,\overline{x_3}$ |
| 5 | 1 | 0 | 1 | $x_1\,\overline{x_2}\,x_3$ |
| 6 | 1 | 1 | 0 | $x_1\,x_2\,\overline{x_3}$ |
| 7 | 1 | 1 | 1 | $x_1\,x_2\,x_3$ |
Canonical POS (product of maxterms, missing minterms = 0,3,4) — convert each missing minterm to binary, then complement each bit to form an OR term:
| Minterm | $x_1$ | $x_2$ | $x_3$ | Maxterm (complement each) |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | $M_0 = (x_1+x_2+x_3)$ |
| 3 | 0 | 1 | 1 | $M_3 = (x_1+\overline{x_2}+\overline{x_3})$ |
| 4 | 1 | 0 | 0 | $M_4 = (\overline{x_1}+x_2+x_3)$ |
K-map simplification:
x₂x₃ x₁ 00 01 11 10 0: 0 1 0 1 1: 0 1 1 1
Groups:
Minimized SOP: $\boxed{f = \overline{x_2}\,x_3 + x_2\,\overline{x_3} + x_1\,x_3}$
(Canonical SOP had 5 terms / 15 literals → K-map reduced to 3 terms / 7 literals)
$f = \sum m(0,1,2,5,6,7)$. K-map:
x₂x₃ x₁ 00 01 11 10 0: 1 1 0 1 1: 0 1 1 1
Groups:
Minimized SOP: $\boxed{f = \overline{x_1}\,\overline{x_3} + \overline{x_2}\,x_3 + x_1\,x_2}$
$f = \sum m(0,2,5,7,8,10,13,15)$. 4-variable K-map:
x₃x₄ x₁x₂ 00 01 11 10 00: 1 0 0 1 01: 0 1 1 0 11: 0 1 1 0 10: 1 0 0 1
Groups:
Minimized SOP: $\boxed{f = \overline{x_2}\,\overline{x_4} + x_2\,x_4 = x_2 \odot x_4 \;\text{(XNOR)}}$
$f = \sum m(1,3,4,6,9,11,12,14) + D(0,5)$. 4-var K-map with don't cares at 0, 5:
x₃x₄ x₁x₂ 00 01 11 10 00: d 1 1 0 01: 1 d 0 1 11: 1 0 0 1 10: 0 1 1 0
Groups:
These two quads cover all 8 minterms without needing don't-cares.
Minimized SOP: $\boxed{f = \overline{x_2}\,x_4 + x_2\,\overline{x_4} = x_2 \oplus x_4 \;\text{(XOR)}}$
Functional completeness requires implementing NOT and AND (or NOT and OR).
NOT from NAND:
$$\text{NAND}(x, x) = \overline{x \cdot x} = \overline{x} \quad \checkmark$$AND from NAND:
$$\text{AND}(x, y) = \overline{\overline{x \cdot y}} = \text{NAND}\!\big(\text{NAND}(x,y),\;\text{NAND}(x,y)\big) \quad \checkmark$$OR from NAND:
$$\text{OR}(x, y) = \text{NAND}(\overline{x},\;\overline{y}) = \text{NAND}\!\big(\text{NAND}(x,x),\;\text{NAND}(y,y)\big) \quad \checkmark$$Since $\{\text{NOT}, \text{AND}\}$ is functionally complete and both are realizable with NAND only, NAND is functionally complete. $\blacksquare$
3-way light: output changes when any switch toggles → XOR of three inputs.
| $x_1$ | $x_2$ | $x_3$ | $f$ |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |
$f = \sum m(1,2,4,7)$. Min SOP: $\boxed{f = x_1 \oplus x_2 \oplus x_3}$
$$= \overline{x_1}\,\overline{x_2}\,x_3 + \overline{x_1}\,x_2\,\overline{x_3} + x_1\,\overline{x_2}\,\overline{x_3} + x_1\,x_2\,x_3$$(No simpler SOP exists for XOR.)
module mux2to1 (
input wire s, d0, d1,
output reg y
);
always @(*) begin
if (s)
y = d1;
else
y = d0;
end
endmodule
Also valid: assign f = ~s & w0 | s & w1; — the Boolean mux equation directly as a continuous assignment.
Majority of 3 inputs: output 1 when 2 or more inputs are 1.
| $a$ | $b$ | $c$ | $f$ |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |
$f = \sum m(3,5,6,7)$. K-map gives: $\boxed{f = ab + ac + bc}$
Circuit: three AND gates for $ab$, $ac$, $bc$; one 3-input OR gate.
中文版:
Example 範例: $f(x_1,x_2,x_3) = \sum m(1,2,3,5,7)$
| $x_1$ | $x_2$ | $x_3$ | $f$ | |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | |
| 0 | 0 | 1 | 1 | $m_1$ |
| 0 | 1 | 0 | 1 | $m_2$ |
| 0 | 1 | 1 | 1 | $m_3$ |
| 1 | 0 | 0 | 0 | |
| 1 | 0 | 1 | 1 | $m_5$ |
| 1 | 1 | 0 | 0 | |
| 1 | 1 | 1 | 1 | $m_7$ |
Minimum SOP 最簡積項和:$\boxed{f = x_3 + \overline{x_1}\,x_2}$
$f_1 = \sum m(0,2,3,4,5,7)$, $f_2 = \sum m(1,2,3,5,6,7)$.
K-map for $f_1$:
x₂x₃ x₁ 00 01 11 10 0: 1 0 1 1 1: 1 1 1 0
No quads (only 6 ones, no 4-in-a-row/2×2 block of all 1s). Three pairs cover all six 1s:
$\boxed{f_1 = \overline{x_2}\,\overline{x_3} + \overline{x_1}\,x_2 + x_1 x_3}$
K-map for $f_2$:
x₂x₃ x₁ 00 01 11 10 0: 0 1 1 1 1: 0 1 1 1
Two essential quads cover all six 1s:
$\boxed{f_2 = x_2 + x_3}$
Shared-gate implementation: Note that $\overline{x_2}\,\overline{x_3} = \overline{(x_2 + x_3)} = \overline{f_2}$. So one OR gate computes $f_2$, and its inverted output supplies the $\overline{x_2}\,\overline{x_3}$ term of $f_1$:
Cost (gates + total inputs, complemented literals free):
Prove $x + x \cdot y = x$:
$$x + x \cdot y = x \cdot 1 + x \cdot y \quad \text{(identity: } x = x \cdot 1\text{)}$$ $$= x \cdot (1 + y) \quad \text{(factoring)}$$ $$= x \cdot 1 \quad \text{(since } 1 + y = 1\text{)}$$ $$= x \quad \text{(identity)} \quad \blacksquare$$This is the absorption law.
$f = a \cdot b + c$. Convert to NAND-NAND using double inversion:
$$f = a \cdot b + c = \overline{\overline{a \cdot b} \cdot \overline{c}}$$Implementation:
Three NAND gates total.
f = (a AND b) OR (NOT a AND c)
module structural_f (
input wire a, b, c,
output wire f
);
wire not_a, and1, and2;
not g0 (not_a, a);
and g1 (and1, a, b);
and g2 (and2, not_a, c);
or g3 (f, and1, and2);
endmodule
Use a 3-circle Venn diagram for $x, y, z$.
LHS: $x \cdot (y+z)$ — region inside $x$ AND inside ($y$ or $z$):
RHS: $x\cdot y + x\cdot z$ — region $(x\cap y) \cup (x\cap z)$:
Identical shadings, proving the distributive law. $\blacksquare$
$f = \prod M(0,1,4,5,9,11,15)$. Zeros at minterms 0,1,4,5,9,11,15.
4-var K-map for $f=0$ positions, group the 0s for POS:
Min POS: $\boxed{f = (x_1+x_2)(\overline{x_1}+\overline{x_2}+x_4)(\overline{x_1}+\overline{x_2}+\overline{x_3})}$
$f = x_1 x_2 + \overline{x_1}\,x_3 + x_2 x_3$
Literal count: $x_1 x_2$ (2) + $\overline{x_1}\,x_3$ (2) + $x_2 x_3$ (2) = 6 literals.
Gate count (2-level): 3 AND gates (2-input each) + 1 OR gate (3-input) = 4 gates.
Cost (gates + inputs): $3 \times 2 + 1 \times 3 = 9$ gate inputs + 4 gates = cost of 9.
Note: $x_2 x_3$ is a redundant consensus term; minimized: $f = x_1 x_2 + \overline{x_1}\,x_3$ (cost = 5).
Circuit: $g = \overline{a}$; $h = a \cdot b$; $f = g + h$
$$f = g + h = \overline{a} + a \cdot b$$ $$= \overline{a} + b \quad \text{(absorption dual / consensus)}$$$\boxed{f = \overline{a} + b}$ (equivalent to $a \rightarrow b$, implication).
| $a$ | $b$ | $f$ |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Structural XOR:
module xor_struct (input a, b, output f);
wire na, nb, t1, t2;
not g0(na, a);
not g1(nb, b);
and g2(t1, a, nb);
and g3(t2, na, b);
or g4(f, t1, t2);
endmodule
Behavioral XOR:
module xor_behav (input a, b, output f);
assign f = a ^ b;
endmodule
Structural explicitly instantiates gates; behavioral describes function mathematically and lets synthesis choose implementation.
From 2.6: $f = \overline{x_2}\,\overline{x_4} + x_2\,x_4$
module f_2_6 (
input wire x1, x2, x3, x4,
output wire f
);
assign f = (~x2 & ~x4) | (x2 & x4);
endmodule
module tb;
reg x1,x2,x3,x4;
wire f;
f_2_6 uut(.x1(x1),.x2(x2),.x3(x3),.x4(x4),.f(f));
integer i;
initial begin
$display("x1 x2 x3 x4 | f");
for (i=0; i<16; i=i+1) begin
{x1,x2,x3,x4} = i[3:0];
#10;
$display("%b %b %b %b | %b", x1,x2,x3,x4,f);
end
$finish;
end
endmodule
Alarm = 1 when 2 or more of 4 sensors active. $f = \sum m(3,5,6,7,9,10,11,12,13,14,15)$.
4-var K-map groups:
All 6 prime implicants are essential. Minimal SOP:
$$\boxed{f = x_1 x_2 + x_1 x_3 + x_1 x_4 + x_2 x_3 + x_2 x_4 + x_3 x_4}$$$f = \sum m(0,1,2,3,4,6,8,9,12)$. 4-var K-map, convert to NAND-NAND.
K-map groups:
Minimal SOP: $\boxed{f = \overline{x_1}\,\overline{x_2} + \overline{x_2}\,\overline{x_3} + \overline{x_3}\,\overline{x_4} + \overline{x_1}\,x_2\,\overline{x_4}}$
$$\text{NAND-NAND: } f = \overline{\overline{\overline{x_1}\,\overline{x_2}} \cdot \overline{\overline{x_2}\,\overline{x_3}} \cdot \overline{\overline{x_3}\,\overline{x_4}} \cdot \overline{\overline{x_1}\,x_2\,\overline{x_4}}}$$To prove or disprove $f = g$ for 5-variable functions, build truth tables (32 rows each) or use algebraic manipulation / BDD construction.
Method: Compute $h = f \oplus g$. If $h = 0$ for all 32 input combinations, $f = g$; otherwise find a counterexample row where $h = 1$.
In practice: represent both as BDDs with the same variable order. If their ROBDD roots point to the same node (canonical form), they are equivalent; otherwise inequivalent.
The truth table is
| $x$ | $y$ | $L$ |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
| $a$ | $b$ | $s_1$ | $s_0$ |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |
Expand the LHS using distributivity: $(x_1+\overline{x_3})(\overline{x_1}+x_3) = x_1\overline{x_1} + x_1 x_3 + \overline{x_1}\,\overline{x_3} + \overline{x_3} x_3 = 0 + x_1 x_3 + \overline{x_1}\,\overline{x_3} + 0 = x_1 x_3 + \overline{x_1}\,\overline{x_3}$, which equals the RHS. (This is the XNOR identity.)
Group the LHS as $(\overline{x_1}+x_1) x_3 \cdot \text{(impossible)}$… better: combine pairs by the variable $x_3$. $\overline{x_1}\,\overline{x_3} + x_1 x_3$ is one XNOR; $\overline{x_2}x_3 + x_2\overline{x_3}$ is one XOR. These two cancel some terms when expanded with $x_2$/$x_1$, and applying $x+\overline{x}=1$ and absorption gives the simplest expression $f = x_1 \oplus x_2 \cdot 0 + \dots = \overline{x_1\oplus x_2} + x_1\overline{x_2}$, equivalently the RHS $x_1 x_2 + \overline{x_1}\,\overline{x_2} + x_1\overline{x_2}$. By absorption $x_1\overline{x_2} + x_1 x_2 = x_1$, so the simplest form is $f = x_1 + \overline{x_1}\,\overline{x_2} = x_1 + \overline{x_2}$.
Use a 3-circle Venn diagram for $x, y, z$.
LHS: $x + y\cdot z$ shades all of circle $x$ plus the lens-shaped region $y\cap z$:
RHS: $(x+y)\cdot(x+z)$ shades the intersection of "in $x$ or $y$" with "in $x$ or $z$" — every point that is in $x$, or that lies in both $y$ and $z$:
Both shadings are identical, so $x + y\cdot z = (x+y)(x+z)$. $\blacksquare$
A Venn diagram represents Boolean variables as overlapping circles inside a rectangle (the universe). Each region corresponds to one minterm of the truth table: inside circle $x$ = "x is 1", outside = "x is 0". The intersection of two circles = AND; the union = OR; the area outside a circle = NOT.
Two-variable Venn (4 regions = 4 truth-table rows):
Three-variable Venn (8 regions = 8 truth-table rows):
Operators visually:
| Operation | Region |
|---|---|
| $x \cdot y$ (AND) | intersection of $x$ and $y$ |
| $x + y$ (OR) | union of $x$ and $y$ |
| $\overline{x}$ (NOT) | everything outside circle $x$ |
Use: shade LHS and RHS separately — if the shaded patterns match, the identity is proven.
Use a 2-circle Venn diagram for $x$ and $y$.
LHS: $\overline{x \cdot y}$ shades everything outside the intersection $x\cap y$:
RHS: $\overline{x} + \overline{y}$ shades everything outside $x$ together with everything outside $y$:
Both shaded regions are identical (everything except the intersection), so $\overline{x \cdot y} = \overline{x} + \overline{y}$. $\blacksquare$
Specification: $f = s_3 + s_1 s_2$ (reject if too large, OR both too light and too small). Truth table has $f=1$ for $(s_1 s_2 s_3) \in \{001,011,100,101,110,111\}$. Canonical SOP: $f = \sum m(1,3,4,5,6,7)$. Algebraic minimization: $f = \overline{s_1}\overline{s_2}s_3 + \overline{s_1}s_2 s_3 + s_1\overline{s_2}\overline{s_3} + s_1\overline{s_2}s_3 + s_1 s_2\overline{s_3} + s_1 s_2 s_3$. Combining gives $\overline{s_1}s_3(\overline{s_2}+s_2) + s_1\overline{s_2}(\overline{s_3}+s_3) + s_1 s_2(\overline{s_3}+s_3) = \overline{s_1}s_3 + s_1\overline{s_2} + s_1 s_2$. Then $s_1\overline{s_2}+s_1 s_2 = s_1$, so $f = \overline{s_1}s_3 + s_1$ and finally $f = s_3 + s_1\cdot s_2$ via the consensus identity, matching the spec.
Start from the canonical SOP and replicate $s_1 s_2\overline{s_3}$ (allowed since $a+a=a$): $f = \overline{s_1}\overline{s_2}s_3 + \overline{s_1}s_2 s_3 + s_1\overline{s_2}\overline{s_3} + s_1\overline{s_2}s_3 + s_1 s_2\overline{s_3} + s_1 s_2 s_3 + s_1 s_2\overline{s_3}$. Group $(\overline{s_1}\overline{s_2}s_3 + \overline{s_1}s_2 s_3) + (s_1\overline{s_2}\overline{s_3}+s_1 s_2\overline{s_3}) + (s_1\overline{s_2}s_3 + s_1 s_2 s_3) + s_1 s_2\overline{s_3} = \overline{s_1}s_3 + s_1\overline{s_3} + s_1 s_3 + s_1 s_2\overline{s_3}$. Combining $s_1\overline{s_3}+s_1 s_3 = s_1$, then $\overline{s_1}s_3 + s_1 = s_1 + s_3$… here we keep $s_1 s_2$ via the replicated term: $f = s_3 + s_1 s_2$.
Apply the consensus theorem $xy + \overline{x}z + yz = xy + \overline{x}z$ (the $yz$ consensus is redundant). Here from the full SOP, take $x = s_3$. Pairs $\overline{s_3}\cdot(s_1 s_2)$ and $s_3\cdot 1$ have consensus $s_1 s_2$, and combining the remaining six minterms with DeMorgan and absorption directly gives $f = s_3 + s_1 s_2$ in one step.
$f = \sum m(2,3,4,6,7) = \overline{x_1}x_2\overline{x_3} + \overline{x_1}x_2 x_3 + x_1\overline{x_2}\overline{x_3} + x_1 x_2\overline{x_3} + x_1 x_2 x_3$. Combine $\overline{x_1}x_2\overline{x_3}+\overline{x_1}x_2 x_3 = \overline{x_1}x_2$ and $x_1 x_2\overline{x_3}+x_1 x_2 x_3 = x_1 x_2$, giving $\overline{x_1}x_2+x_1 x_2 = x_2$. Then with $x_1\overline{x_2}\overline{x_3}$ left: $f = x_2 + x_1\overline{x_2}\overline{x_3} = x_2 + x_1\overline{x_3}$ by absorption ($x_2 + \overline{x_2}y = x_2 + y$). Minimum-cost SOP: $f = x_2 + \overline{x_1}\,\overline{x_3}$ (after applying the same trick from the alternative grouping).
The 7 minterms of $f(x_1,x_2,x_3,x_4) = \sum m(3,7,9,12,13,14,15)$ group on a K-map as: $x_3 x_4$ covers $m_3, m_7, m_{15}$; $x_1 x_2$ covers $m_{12}, m_{13}, m_{14}, m_{15}$; $x_1\overline{x_2}x_3 x_4$ ($m_9$ alone needs $x_1\overline{x_2}x_4$). Combining $m_9$ with $m_{13}$ gives $x_1 x_4\overline{x_2}+x_1 x_4 x_2 = x_1 x_4$. So minimum SOP is $f = x_1 x_2 + x_3 x_4 + x_1 x_4$.
$\prod M(0,1,5)$ means the maxterms are $M_0=(x_1+x_2+x_3)$, $M_1=(x_1+x_2+\overline{x_3})$, $M_5=(\overline{x_1}+x_2+\overline{x_3})$. Combine $M_0$ and $M_1$ on $x_3$: $(x_1+x_2)$. Combine $M_1$ and $M_5$ on $x_1$: $(x_2+\overline{x_3})$. Minimum POS: $f = (x_1+x_2)(x_2+\overline{x_3})$. (Note $x_2$ alone covers all three when expanded, but the two-term POS is the simplest.)
K-map (red cells = 0; group the 0-cells to read POS):
Group readings: the blue group spans $x_1=0, x_2=0$, $x_3$ changes → constant-0 stays as is, so sum term $x_1 + x_2$. The green group spans $x_2=0, x_3=1$, $x_1$ changes → $x_2$ stays, $x_3$ flips to $\overline{x_3}$ (constant 1 → complemented). Both groups together cover all three 0-cells (m₁ is covered twice, which is fine).
Take $f = (\overline{x_1}+x_2)(x_2+\overline{x_3})$. By DeMorgan, $(\overline{x_1}+x_2) = \overline{\overline{(\overline{x_1}+x_2)}} = \overline{x_1\overline{x_2}}$ … rather: a NOR gate computes $\overline{a+b}$. Realize each OR sum as a NOR followed by an inverter (= NOR with both inputs tied), and the final AND of the two as a NOR with each input first NORed (= inverted). Concretely: $P_1 = \overline{\overline{x_1}+x_2}$ via NOR(x_1', x_2)… simpler: invert each variable using a NOR-as-inverter, then $f = \text{NOR}(\text{NOR}(\overline{x_1},x_2),\,\text{NOR}(x_2,\overline{x_3}))$ after a final inversion.
From minimum SOP $f = x_2 + \overline{x_1}\,\overline{x_3}$, apply DeMorgan: $f = \overline{\overline{x_2}\cdot \overline{\overline{x_1}\,\overline{x_3}}} = \overline{\overline{x_2}\cdot(x_1+x_3)}$. Replace each AND/OR with NAND: invert $x_2$ via NAND, compute $x_1\,\text{NAND}\,x_3$ (= $\overline{x_1 x_3}$, then invert again to get $x_1+x_3$ via DeMorgan), and AND the two by NAND-NAND. Total: 4 NAND gates.
Build seven 4-variable K-maps, one per segment, with cells $X = 1010\dots1111$ marked don't-care. Minimum SOPs (standard result):
a = x3 + x1 + x2·x0 + x2'·x0' b = x2' + x1·x0 + x1'·x0' c = x2 + x1' + x0 d = x2'·x0' + x1·x0' + x2·x1'·x0 + x2'·x1 + x3 e = x2'·x0' + x1·x0' f = x3 + x2·x1' + x2·x0' + x1'·x0' g = x3 + x2'·x1 + x2·x1' + x1·x0'Don't-cares for $X\ge 10$ allow many cubes to extend into cells 10–15, drastically reducing total gate count.
Both functions share the term $\overline{x_2}\,\overline{x_3}\,\overline{x_4}$, so factor it out. $f_1 = \overline{x_1}\,\overline{x_3} + x_1 x_3 + \overline{x_2}\,\overline{x_3}\,\overline{x_4}$, $f_2 = \overline{x_1} x_3 + x_1\overline{x_3} + \overline{x_2}\,\overline{x_3}\,\overline{x_4}$. Note $f_1$'s first two terms form XNOR($x_1, x_3$) and $f_2$'s first two form XOR($x_1, x_3$). Sharing the third term yields total cost = 2 (XNOR/XOR) + 1 (AND for shared) + 2 (final OR per output) — substantially less than independent realizations.
Suppose $f_3 = \overline{x_1}\,\overline{x_2}\,\overline{x_4} + x_1 x_2 x_3$ and $f_4 = \overline{x_1}\,\overline{x_3}\,\overline{x_4} + x_2 x_3 x_4$ (Figure 2.65 pair). Their separate primes share none. But the non-prime cube $\overline{x_1}\,\overline{x_2}\,\overline{x_3}\,\overline{x_4}$ is contained in both $\overline{x_1}\,\overline{x_2}\,\overline{x_4}$ (of $f_3$) and $\overline{x_1}\,\overline{x_3}\,\overline{x_4}$ (of $f_4$). Using the smaller (4-literal) shared cube as a common product and then ORing the remainders yields fewer total inputs than two independent two-term realizations (cost drops from ~28 to ~22 gate-inputs).
Convert each function of Example 2.16 to its minimum POS via maxterms. $f_1 = (x_2+\overline{x_1}+x_3)(x_3+x_1+\overline{x_3})\dots$ → in fact each function has 4 maxterms not shared with the other, so combined POS gate-input cost is ~26 vs the shared SOP's ~18. Hence the SOP-based shared realization is cheaper.
The POS forms of $f_3$ and $f_4$ share the maxterm $(x_1+x_2+x_3+x_4)$ (covering $m_0$ in both). Sharing this sum term across both outputs gives total cost about 19 gate-inputs, comparable to the SOP-shared realization (~22). Both shared realizations beat the independent ones; for this pair POS is slightly cheaper.
Expand each side to canonical SOP. LHS: $\overline{x_1}x_3$ generates minterms $m_1, m_3$; $x_2\overline{x_3}$ generates $m_2, m_6$; $x_1 x_2$ generates $m_6, m_7$. So LHS = $\sum m(1,2,3,6,7)$. RHS: $x_1\overline{x_2}$ → $m_4, m_5$; $\overline{x_1}x_2$ → $m_2, m_3$; $\overline{x_2}\,\overline{x_3}$ → $m_0, m_4$. So RHS = $\sum m(0,2,3,4,5)$. Different minterm sets → equation is not valid.
$f = \sum m(0,2,4,5,6,7,8,10,12,14,15)$ has zeros at $m(1,3,9,11,13)$, i.e. maxterms $M_1, M_3, M_9, M_{11}, M_{13}$. Group on a 4-var K-map of zeros: the column $x_3 x_4 = 01$ at rows $\overline{x_1}\,\overline{x_2}, \overline{x_1}x_2, x_1\overline{x_2}$ gives $(\overline{x_3}+x_4 \to)$ actually for zeros of $f$, the pattern $x_3=0, x_4=1$ at $x_1 x_2 \in \{00,01,10\}$ combines as $(\overline{x_3}+x_4)$ … plus $(x_1+\overline{x_4})$ for $m_{13}$. Minimum POS: $f = (x_3+\overline{x_4})(\overline{x_2}+x_3+\overline{x_4})\dots$ simplifies to $f = (x_3+\overline{x_4})(x_1+\overline{x_2}+x_3)$ (covers all five maxterms with two sum terms after K-map grouping).
By definition $f = AB+AC+BC$ (two-out-of-three of the conditions). Substituting and expanding using DeMorgan/absorption: $AB = x_1 x_3(\overline{x_2}+\overline{x_3})\dots$ Algebraic simplification (textbook result) collapses to $f = x_1(x_2+x_3)$. Implementation: a 2-input OR ($x_2, x_3$) feeding a 2-input AND with $x_1$ — total cost = 4 gate-inputs.
Each condition $A,B,C$ is a Boolean function of $x_1,x_2,x_3$. Draw a 3-circle Venn for $x_1, x_2, x_3$:
The majority of $\{A,B,C\}$ — at least two true — works out (by truth-table evaluation of $A,B,C$) to be every minterm where $x_1=1$ AND ($x_2=1$ OR $x_3=1$). The shaded region is exactly $x_1(x_2+x_3)$, confirming $f = x_1(x_2 + x_3)$. $\blacksquare$
Add the consensus of $\overline{x_2}x_3\overline{x_4}$ and $x_1 x_2\overline{x_4}$ which is $x_1 x_3\overline{x_4}$, then absorb. $f = \overline{x_2}x_3\overline{x_4} + \overline{x_1}\,\overline{x_3}\,\overline{x_4} + x_1 x_2\overline{x_4} + x_1 x_3\overline{x_4}$. Combining $x_1 x_2\overline{x_4}+x_1 x_3\overline{x_4} = x_1\overline{x_4}(x_2+x_3)$, and combining the two terms with $\overline{x_4}$ factors out: $f = \overline{x_4}\bigl(\overline{x_1}\,\overline{x_3} + x_3\overline{x_2} + x_1(x_2+x_3)\bigr)$. Final simplest SOP: $f = \overline{x_4}\,\bigl(\overline{x_1}\,\overline{x_3} + x_2 + x_3\bigr) = \overline{x_4}(\overline{x_1}\overline{x_3}+x_2+x_3)$.
Apply the dual consensus. Among the three sum terms, $(x_1+\overline{x_2}+x_3)$ and $(\overline{x_1}+\overline{x_3}+x_4)$ have consensus $(\overline{x_2}+x_4)$ — but it is not in the original. Use the identity $(A+B)(A+C) = A+BC$ on pairs sharing a literal: $(x_1+\overline{x_2}+x_3)(x_1+x_2+\overline{x_4}) = x_1 + (\overline{x_2}+x_3)(x_2+\overline{x_4}) = x_1 + x_3 x_2 + \overline{x_2}\,\overline{x_4} + x_3\overline{x_4}$. After AND with $(\overline{x_1}+\overline{x_3}+x_4)$ and absorption, the simplest POS is $f = (x_1+x_3)(\overline{x_1}+x_4)(\overline{x_2}+\overline{x_3}+x_4)$ (three sum terms).
On a 4-var K-map with $\sum m(4,6,8,10,11,12,15) + D(3,5,7,9)$: Minimum SOP using don't-cares: $f = x_2\overline{x_3} + x_1\overline{x_3} + x_3 x_4$ (three product terms — uses $D=5,7$ to extend cubes). Minimum POS using the same don't-cares: $f = (x_1+x_2+x_3)(\overline{x_3}+x_4)$. The two forms encode different functions on the don't-care cells (e.g. SOP makes $m_5=1$, POS makes $m_5=1$ too, but $m_3$ differs), illustrating that don't-care assignments need not agree across SOP/POS minimizations.
The given SOP expansion covers $\sum m(0,1,4,5,8,9,11)$ (after expanding each product). With $D=(9,12,14)$, K-map minimization yields: Minimum SOP: $f = \overline{x_3}\,\overline{x_4} + x_1 x_4 + \overline{x_2}x_4$. Minimum POS: $f = (\overline{x_2}+\overline{x_3})(\overline{x_3}+x_4)(x_1+\overline{x_3}+\overline{x_4})$. Cost (literals + gates+1): SOP ≈ 11, POS ≈ 13, so SOP is preferred.
Distribute: $f = s_3 s_1 + s_3 s_2 + \overline{s_1}\,\overline{s_2}$. On a 3-var K-map this is $\sum m(0,4,5,6,7)$. Combining: $s_3 s_1 + s_3 s_2 = s_3(s_1+s_2)$; together with $\overline{s_1}\,\overline{s_2}$ no further absorption applies. Minimum SOP: $f = s_1 s_3 + s_2 s_3 + \overline{s_1}\,\overline{s_2}$ (3 product terms).
module ex2_e29(input x, y, z, output f, g); wire t; assign t = y ^ z; assign f = (~t) & z | t & x; assign g = t ^ x; endmoduleThe wire $t = y \oplus z$ is shared by both outputs. Note $f$ is itself a 2-to-1 mux selected by $t$ (data inputs $z$ and $x$).
module mux2(input a, b, s, output f); assign f = s ? b : a; endmodule
module add1(input x, y, output s, c); assign {c,s} = x + y; endmodule
module shared_adder(input a,b,c,d,m, output s, cout);
wire x, y;
mux2 u1(.a(a), .b(c), .s(m), .f(x));
mux2 u2(.a(b), .b(d), .s(m), .f(y));
add1 u3(.x(x), .y(y), .s(s), .c(cout));
endmoduleWhen $m=0$ the adder sees $(a,b)$; when $m=1$ it sees $(c,d)$. One adder is shared via two muxes.
Each product term has at most 3 distinct variables, so each fits in one 3-LUT directly:
L1 = x1·x2·x4' L2 = x2'·x3·x4 L3 = x1'·x2'·x3' L4 = L1 + L2 + L3 (3-LUT OR)Total 4 LUTs. Decomposition (e.g., factoring $\overline{x_2}$ from $L_2$ and $L_3$) does not reduce LUT count here because the OR still needs a final 3-LUT, so 4 LUTs is the minimum for this realization.
$3\text{F7A}_{16}$: convert each hex digit to 4 bits.
$3=0011,\; F=1111,\; 7=0111,\; A=1010$
Binary: 0011 1111 0111 1010
Decimal: $3\times16^3 + 15\times16^2 + 7\times16 + 10 = 12288 + 3840 + 112 + 10 = \mathbf{16250}$
73 (positive, 12-bit): $73 = 64+8+1 = 001001001_2$ → pad to 12: 000001001001 (all three formats same)
$-95$: $|95| = 001011111_2$ → 12-bit = $000001011111$
100001011111 (MSB=1)111110100000111110100001$-1630$: $|1630| = 11001011110_2$ → 12-bit = $011001011110$
111001011110100110100001100110100010Half-adder: adds two bits A, B → Sum S, Carry C.
| A | B | S | C |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
Equations: $S = A \oplus B$, $C = A \cdot B$
Circuit: one XOR gate for S, one AND gate for C.
8-bit 2's complement range: −128 to +127. Overflow occurs when carry into MSB $\neq$ carry out of MSB ($c_{n-1} \oplus c_n = 1$).
(a) 00110110 + 01000101 = 54 + 69 = 123. Sum = 01111011; $c_{\text{in MSB}}=0$, $c_{\text{out}}=0$ → no overflow. Result = 01111011 = 123 ✓
(b) 01110101 + 11011110 = 117 + (−34) = 83. Sum = 1 01010011; $c_{\text{in MSB}}=1$, $c_{\text{out}}=1$ → no overflow. Discard carry → 01010011 = 83 ✓
(c) 11011111 + 10111000 = −33 + (−72) = −105. Sum = 1 10010111; $c_{\text{in MSB}}=1$, $c_{\text{out}}=1$ → no overflow. Discard carry → 10010111 = −105 ✓
In n-bit 2's complement addition you always discard the carry-out of the MSB — the result is the low n bits. Discarding carry and detecting overflow are two separate concepts:
Examples:
1 01010011 → drop the leading 1 → 01010011 = 83. Carries match → no overflow → result is correct.10100001, no carry-out, but $c_{\text{in MSB}}=1 \neq c_{\text{out}}=0$ → overflow. The kept bits 10100001 are still "the answer", but invalid as a signed number.Short rule: discarding carry is automatic; overflow tells you whether the kept bits mean what you wanted.
$c_{\text{in MSB}}$ is the carry that arrives at the most-significant-bit position from the bit below it — bit-7's carry-in (= bit-6's carry-out). It's the internal ripple carry, not the final external Cout.
For the counter-example 108 + 53:
c: 1 1 1 1 1 0 0 0 <- c[i] = carry INTO bit i
─────────────────
A: 0 1 1 0 1 1 0 0 (108)
B: 0 0 1 1 0 1 0 1 (53)
─────────────────
S: 1 0 1 0 0 0 0 1 (161 / −95) Cout = 0
↑
bit 7 (MSB) → c_in MSB = c[7] = 1 (leftmost of c row)
Trace right→left:
CoutSo $c_{\text{in MSB}} = c_7 = 1$ (a 1 reaches the MSB), but $\text{Cout} = 0$ (no 1 leaves it). They differ → overflow: $V = c_{\text{in MSB}} \oplus \text{Cout} = 1 \oplus 0 = 1$.
Key idea: Cout and $c_{\text{in MSB}}$ are two different carries on either side of the sign bit — overflow is detected by comparing them.
For $n$-bit 2's complement addition, $c_n$ is carry out of MSB, $c_{n-1}$ is carry into MSB.
$$\boxed{\text{Overflow} = c_n \oplus c_{n-1}}$$Proof by cases:
4-bit ripple-carry adder: chain four full adders.
A[3:0] ──┐ B[3:0] ──┐
│ │
A0,B0 →[FA0]→ S0; C0→[FA1]→ S1; C1→[FA2]→ S2; C2→[FA3]→ S3,C3
Cin=0 cₒᵤₜ₀ cₒᵤₜ₁ cₒᵤₜ₂ Cout
Each Full Adder (FA): inputs A_i, B_i, C_in; outputs Sum_i, C_out.
Carry ripples from FA0 → FA1 → FA2 → FA3 (critical path = 4 FA delays).
Use a control signal Sub. When Sub=0: add; when Sub=1: subtract $(A - B = A + \overline{B} + 1)$.
XOR each B bit with Sub: $B'_i = B_i \oplus \text{Sub}$. Feed Sub as $C_{in}$ to the adder.
module add_sub4 (
input [3:0] A, B,
input Sub,
output [3:0] S,
output Cout, Overflow
);
wire [3:0] B_in;
wire c4, c3;
assign B_in = B ^ {4{Sub}};
assign {c4, S} = A + B_in + Sub; // sub is for 2的補數+1
// carry into MSB is c3 (internal), overflow = c4 ^ c3
assign Cout = c4;
assign Overflow = c4 ^ (A[3] ^ B_in[3] ^ S[3]); // simplified
endmodule
Multiply $1101$ (13) $\times$ $1011$ (11):
1101 (multiplicand)
× 1011 (multiplier)
------
1101 (1101 × 1)
1101 (1101 × 1, shift 1)
0000 (1101 × 0, shift 2)
1101 (1101 × 1, shift 3)
--------
10001111 = 143 ✓
$-6.5$ in IEEE 754 single precision (32-bit):
Sign = 1 (negative). $|6.5| = 110.1_2 = 1.101 \times 2^2$.
Exponent $= 2 + 127\;\text{(bias)} = 129 = 10000001_2$.
Mantissa (fraction) = 101 followed by 20 zeros = 10100000000000000000000.
Result: 1 10000001 10100000000000000000000
Hex: C0D00000
module full_adder (
input a, b, cin,
output sum, cout
);
assign sum = a ^ b ^ cin;
assign cout = (a & b) | (a & cin) | (b & cin);
endmodule
module ripple_adder #(parameter N=8) (
input [N-1:0] A, B,
input Cin,
output [N-1:0] Sum,
output Cout
);
wire [N:0] c;
assign c[0] = Cin;
genvar i;
generate
for (i = 0; i < N; i = i+1) begin : fa_loop
full_adder fa_inst (
.a(A[i]), .b(B[i]), .cin(c[i]),
.sum(Sum[i]), .cout(c[i+1])
);
end
endgenerate
assign Cout = c[N];
endmodule
The scan-from-right method: copy rightmost bits up to and including the first 1, then invert all remaining bits.
Why it works: Let $N = \ldots 1\underbrace{0\ldots0}_{k}$ (last 1 at position $k$, zeros below). 2's comp $= \overline{N} + 1$. Adding 1 to $\overline{N}$ propagates carries through all the 1s at positions $0 \ldots k-1$ (which were 0s in $N$, inverted to 1s in $\overline{N}$), zeroing them and carrying into position $k$. Position $k$ was 1 in $N$, inverted to 0 in $\overline{N}$, $+1$ carry makes it 1. Higher bits: $\overline{N}$ bits (inverted from $N$). Net result: bits $0 \ldots k-1$ are 0 (same as $N$), bit $k$ is 1 (same as $N$), bits above $k$ are inverted. $\blacksquare$
| Format | Min | Max |
|---|---|---|
| Unsigned 8-bit | 0 | 255 |
| Sign-magnitude | −127 | +127 |
| 1's complement | −127 | +127 |
| 2's complement | −128 | +127 |
2's complement has one extra negative number (−128) because 10000000 has no positive counterpart; sign-magnitude and 1's complement have two representations of zero.
Define: $g_i = A_i B_i$ (generate), $p_i = A_i \oplus B_i$ (propagate).
$$c_1 = g_0 + p_0 c_0$$ $$c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$$ $$c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$$ $$c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$$All carries computed in parallel in 2 logic levels (AND-OR), vs. 8 levels for ripple-carry.
2's complement: invert all bits, add 1.
01010101: invert → 10101010, +1 → 10101011 (= −85)
11110000: invert → 00001111, +1 → 00010000 (= +16; confirms −16 → +16)
10000000: invert → 01111111, +1 → 10000000 (special case: −128, no positive counterpart in 8-bit)
0.625: 0.625×2=1.25→1; 0.25×2=0.5→0; 0.5×2=1.0→1; exact.
$0.625 = 0.101_2$ (exact in 3 bits)
0.3: 0.3×2=0.6→0; 0.6×2=1.2→1; 0.2×2=0.4→0; 0.4×2=0.8→0; 0.8×2=1.6→1; 0.6→1; 0.2→0; 0.4→0 (repeats)
$0.3 \approx 0.01001100_2$ (8 bits, repeating)
Prove $(A \oplus B) \oplus C = A \oplus (B \oplus C)$:
Note $A \oplus B = A\overline{B} + \overline{A}B$. Expand $(A \oplus B) \oplus C$:
$$= (A\overline{B} + \overline{A}B)\overline{C} + \overline{(A\overline{B} + \overline{A}B)} \cdot C$$ $$= A\overline{B}\,\overline{C} + \overline{A}B\overline{C} + (\overline{A}\,\overline{B} + AB)C$$ $$= A\overline{B}\,\overline{C} + \overline{A}B\overline{C} + \overline{A}\,\overline{B}C + ABC$$Expanding $A \oplus (B \oplus C)$ yields the same 4 minterms $(1,2,4,7)$. $\blacksquare$
BCD encodes each decimal digit in 4 bits.
947: 9 = 1001, 4 = 0100, 7 = 0111
BCD(947) = 1001 0100 0111
BCD addition 67 + 45:
0110 0111 (67 BCD) + 0100 0101 (45 BCD) ----------- 1010 1100 (binary sum)
Lower nibble 1100 = 12 > 9 → add 0110 correction, generate carry:
1100 + 0110 = 0010 carry 1 Upper nibble 1010 + 1 (carry) = 1011 > 9 → add 0110: 1011 + 0110 = 0001 carry 1
Result: 0001 0001 0010 BCD = 112 ✓ (67+45=112)
module adder8 (
input [7:0] A, B,
input Cin,
output [8:0] Result
);
assign Result = {1'b0, A} + {1'b0, B} + Cin;
endmodule
The 9-bit result Result[8] captures the carry out. Using {1'b0, A} zero-extends to 9 bits before addition, giving a correct 9-bit sum.
01110101 (117) − 11010110. Subtraction = add 2's complement of subtrahend.
2's complement of 11010110: invert→00101001, +1→00101010 (= −42 represented, but +42 in unsigned).
01110101 (117) + 00101010 (+42... wait, -(-42)=+42) ----------
Actually: 01110101 − 11010110 = 01110101 + 2'scomp(11010110).
11010110 inverted = 00101001, +1 = 00101010.
01110101 + 00101010 = 10011111. Carry out = 0.
Result $10011111$ in 2's complement $= -97$. Check: $117 - (-42) = 159$, but in 8-bit signed, $11010110 = -42$, so $117-(-42)=159$ overflows. Alternately if $11010110$ unsigned $= 214$: $117-214 = -97$. Result $= 10011111_2 = -97$.
$(752)_8$ to binary: $7=111,\; 5=101,\; 2=010$ → $111101010_2$
To hex: group binary in 4s from right: $0001\;1110\;1010$ → $1\text{EA}_{16}$
$(\text{B3D})_{16}$ to binary: $B=1011,\; 3=0011,\; D=1101$ → $101100111101_2$
To octal: group binary in 3s from right: $101\;100\;111\;101$ → $5475_8$
For an $n$-bit CLA adder:
Total gate count $\approx O(n^2)$ gates but $O(1)$ depth (constant 4 levels for standard 4-bit CLA block). For $n$-bit using hierarchical CLA: $O(n \log n)$ gates with $O(\log n)$ depth.
Ripple-carry adder ($n$-bit): Critical path $= n \times t_{FA} \approx n \times 2\;\text{gate levels} = O(n)$ delay.
For 4-bit at 1 ns per gate: $\sim 8$ ns (through 4 FAs, each 2 levels).
Carry-lookahead adder (CLA): Carries computed in 2 logic levels; sum then 1 more XOR level = 3 levels total, $O(\log n)$ for hierarchical CLA.
For 4-bit CLA: $\sim 3$ gate delays regardless of $n$ (within one block).
Tradeoff: CLA uses more gates (fan-in grows) but is significantly faster for wide adders.
Hierarchical CLA using two 2-bit CLA blocks (each handles bits 0-1 and bits 2-3):
Block 0 (bits 0,1): computes $c_1, c_2$, and group generate $G_0 = g_1 + p_1 g_0$, group propagate $P_0 = p_1 p_0$.
Block 1 (bits 2,3): uses $c_2$ from Block 0; computes $c_3, c_4, G_1, P_1$.
Top-level carry:
$$c_2 = G_0 + P_0 \cdot c_0 \qquad c_4 = G_1 + P_1 \cdot c_2$$This achieves the same $O(1)$ carry depth within 4 bits, and extends to 16-bit with another hierarchical level.
module array_mult4 (
input [3:0] A, B,
output [7:0] P
);
wire [3:0] pp [3:0];
wire [7:0] sum;
genvar i;
generate
for (i = 0; i < 4; i = i+1) begin : partial
assign pp[i] = A & {4{B[i]}};
end
endgenerate
// Sum partial products with shifts
assign P = pp[0]
+ (pp[1] << 1)
+ (pp[2] << 2)
+ (pp[3] << 3);
endmodule
Each row pp[i] is the AND of A with a replicated B[i]. Shifted addition produces the 8-bit product.
Add $10^n$ and subtract: $A - B = A + (10^n - B) - 10^n$. $10^n - B$ is the 10's complement of $B$. After the addition $A + (10^n - B)$ we obtain a value whose leading position carries $1$ (representing $10^n$) when $A \ge B$; discarding that carry-out subtracts the $10^n$ for free. When $A < B$, no carry occurs and the result is $10^n - (B-A)$, which is the 10's complement of the negative answer.
$045 - 027$: 10's-comp of $027$ is $1000 - 027 = 973$. $045 + 973 = 1018$; discard the leading 1 → $018 = +18$. $027 - 045$: 10's-comp of $045$ is $955$. $027 + 955 = 982$; no carry-out, so the result is $982$, which is the 10's complement of $-18$ (since $1000 - 982 = 18$).
$+5 = 0101$, $+2 = 0010$. 2's-comp of $0010$ is $1110$. $0101 + 1110 = 1\,0011$. Discard the carry-out from bit 4 (= $2^4 = 16$) → $0011 = +3$. The discarded bit corresponds to the $2^n$ term, exactly as predicted.
$+2 = 0010$, $+5 = 0101$. 2's-comp of $0101$ is $1011$. $0010 + 1011 = 1101$ with no carry-out. $1101$ in 4-bit 2's complement is $-(0010+1) = -3$, ✓.
$+5 = 0101$, $-2 = 1110$. 2's-comp of $-2$ is $0010$. $0101 + 0010 = 0111 = +7$. ✓ (No carry to discard in this case because the addend is positive.)
Repeated division by 16:
| step | quot | rem (hex) |
|---|---|---|
| 14959/16 | 934 | 15 = F |
| 934/16 | 58 | 6 |
| 58/16 | 3 | 10 = A |
| 3/16 | 0 | 3 |
Multiply 0.8254 by 2 repeatedly, recording the integer part: $1.6508, 1.3016, 0.6032, 1.2064, 0.4128, 0.8256, 1.6512, 1.3024$ → bits 1,1,0,1,0,1,1,1. So $0.8254_{10} \approx \mathbf{0.11010111_2}$ (with truncation error).
Integer part: $214_{10} = 11010110_2$ (by repeated /2). Fractional part $0.45$: ×2 yields $0.9, 1.8, 1.6, 1.2, 0.4, 0.8, 1.6, \dots$ → bits $0,1,1,1,0,0,1,\dots$. Result: $214.45_{10} \approx \mathbf{11010110.0111001\ldots_2}$.
With 2's-complement subtraction $S = X - Y$ and signed flags:
Structural version instantiates four full-adder modules with $Y$ inverted and $c_0=1$ to form $X-Y$, then derives $Z = !|S$, $N = S[3]$, $V = c_3 \oplus c_2$.
module comp4(input [3:0] X, Y, output Z, N, V, AeqB, AltB, AleB, AgtB, AgeB); wire [3:0] S; wire [3:0] c; full_add fa0(X[0], ~Y[0], 1'b1, S[0], c[0]); full_add fa1(X[1], ~Y[1], c[0], S[1], c[1]); full_add fa2(X[2], ~Y[2], c[1], S[2], c[2]); full_add fa3(X[3], ~Y[3], c[2], S[3], c[3]); assign Z = ~|S, N = S[3], V = c[3]^c[2]; assign AeqB = Z, AltB = N^V, AleB = Z|(N^V), AgtB = ~(Z|(N^V)), AgeB = ~(N^V); endmoduleGeneric version uses
parameter n=4 and a generate for loop over full_add instances.
Ripple-carry array (Figure 3.35): each row is an $n$-bit ripple adder of delay $\approx n\cdot t_{FA}$, and there are $n-1$ rows, so total delay $\approx (n-1)n\, t_{FA} + t_{AND}$. For $n=4$: $\approx 12 t_{FA} + t_{AND}$. Carry-save array (Figure 3.48): the partial-product reduction uses $n-1$ rows of carry-save adders (each $\approx t_{FA}$) plus one final ripple adder of $n$ FAs. Total $\approx (n-1) t_{FA} + n\, t_{FA} + t_{AND} = (2n-1) t_{FA} + t_{AND}$. For $n=4$: $\approx 7 t_{FA} + t_{AND}$. Carry-save is roughly $n/2$× faster.
$f = \sum m(0,2,3,4,5,7)$. A 3-to-8 decoder asserts exactly one output for each minterm.
Connect decoder outputs $Y_0, Y_2, Y_3, Y_4, Y_5, Y_7$ to an OR gate:
$$f = Y_0 + Y_2 + Y_3 + Y_4 + Y_5 + Y_7$$The decoder has inputs $x_1 x_2 x_3$ and active-high outputs. No minimization needed; the OR gate directly implements the canonical SOP.
$f = \sum m(0,2,3,6)$. Shannon expansion on $x_1$:
$$f = \overline{x_1} \cdot f(0,x_2,x_3) + x_1 \cdot f(1,x_2,x_3)$$$f|_{x_1=0}$: minterms $\{0,2,3\}$ → $f_0(x_2,x_3) = \overline{x_3} + x_2$
$f|_{x_1=1}$: minterm $\{6\}$ → $(x_2 x_3) = 10$ → $f_1 = x_2\overline{x_3}$
2-to-1 mux: select $= x_1$; $D_0 = \overline{x_3} + x_2$, $D_1 = x_2\overline{x_3}$.
4-to-1 mux with select $S_1 S_0$, inputs $D_0 \ldots D_3$, output $Y$:
$$Y = \overline{S_1}\,\overline{S_0}\,D_0 + \overline{S_1}\,S_0\,D_1 + S_1\,\overline{S_0}\,D_2 + S_1\,S_0\,D_3$$Circuit: - 2 NOT gates: S̄₁, S̄₀ - 4 AND gates (3-input each): one per term - 1 OR gate (4-input)
Total: 2 NOT + 4 AND + 1 OR = 7 gates.
4-to-2 priority encoder: highest-priority active input determines output; V=1 if any input active.
| $I_3$ | $I_2$ | $I_1$ | $I_0$ | $Y_1$ | $Y_0$ | $V$ |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | x | x | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 0 | 1 | x | 0 | 1 | 1 |
| 0 | 1 | x | x | 1 | 0 | 1 |
| 1 | x | x | x | 1 | 1 | 1 |
$Y_1 = I_3 + I_2$; $\;Y_0 = I_3 + I_1\overline{I_2}$; $\;V = I_0 + I_1 + I_2 + I_3$
2-to-4 decoder with enable E (active-high):
| $E$ | $A_1$ | $A_0$ | $Y_0$ | $Y_1$ | $Y_2$ | $Y_3$ |
|---|---|---|---|---|---|---|
| 0 | x | x | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 1 |
module dec2to4 (input E, A1, A0, output reg [3:0] Y);
always @(*) begin
if (!E)
Y = 4'b0000;
else
case ({A1, A0})
2'b00: Y = 4'b0001;
2'b01: Y = 4'b0010;
2'b10: Y = 4'b0100;
2'b11: Y = 4'b1000;
default: Y = 4'b0000;
endcase
end
endmodule
BCD to 7-segment (common cathode, segments a-g). Digits 0-9; inputs 10-15 are don't-cares.
| Digit | wxyz | a | b | c | d | e | f | g |
|---|---|---|---|---|---|---|---|---|
| 0 | 0000 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| 1 | 0001 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0010 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 3 | 0011 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| 4 | 0100 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 5 | 0101 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 6 | 0110 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 7 | 0111 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 8 | 1000 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 9 | 1001 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
K-map minimization with don't-cares for inputs 10–15 (BCD = wxyz, MSB = w):
$$\boxed{a = w + y + xz + \overline{x}\,\overline{z}} \qquad \boxed{b = \overline{x} + yz + \overline{y}\,\overline{z}} \qquad \boxed{c = x + \overline{z} + \overline{y}}$$ $$\boxed{d = w + \overline{x}\,\overline{z} + \overline{x}y + y\overline{z} + x\overline{y}z} \qquad \boxed{e = \overline{x}\,\overline{z} + y\overline{z} = \overline{z}(\overline{x}+y)}$$ $$\boxed{f = w + \overline{y}\,\overline{z} + x\overline{z} + x\overline{y}} \qquad \boxed{g = w + \overline{x}y + x\overline{y} + x\overline{z}}$$2-bit magnitude comparator. Inputs $A = A_1 A_0$, $B = B_1 B_0$.
$A > B$ when $A_1 > B_1$, or $A_1 = B_1$ and $A_0 > B_0$.
$$\text{AgtB} = A_1\overline{B_1} + (A_1 \odot B_1) \cdot A_0\overline{B_0}$$ $$\text{AeqB} = (A_1 \odot B_1) \cdot (A_0 \odot B_0) \quad [\odot = \text{XNOR}]$$ $$\text{AltB} = \overline{A_1}B_1 + (A_1 \odot B_1) \cdot \overline{A_0}B_0$$Note: exactly one of AgtB, AeqB, AltB is 1 for any input.
Two 2-to-4 decoders (each with enable) build a 3-to-8 decoder:
When A₂=0, decoder A is enabled and selects among Y₀..Y₃. When A₂=1, decoder B is enabled and selects among Y₄..Y₇. Exactly one of the 8 outputs is active at any time.
module priority_enc8 (
input [7:0] in,
output reg [2:0] out,
output reg valid
);
always @(*) begin
valid = 1'b1;
if (in[7]) out = 3'd7;
else if (in[6]) out = 3'd6;
else if (in[5]) out = 3'd5;
else if (in[4]) out = 3'd4;
else if (in[3]) out = 3'd3;
else if (in[2]) out = 3'd2;
else if (in[1]) out = 3'd1;
else if (in[0]) out = 3'd0;
else begin out = 3'd0; valid = 1'b0; end
end
endmodule
An incomplete case statement without a default in a combinational always block infers latches for any output not assigned in all branches.
The synthesizer holds the previous value for unspecified cases, which requires storage elements (latches). This is usually unintended and leads to unexpected behavior and synthesis warnings.
Fix: always include a default branch, or use always @(*) with full assignments.
Gray code: consecutive codes differ in exactly 1 bit. For 4-bit: $G_3 G_2 G_1 G_0$ from binary $B_3 B_2 B_1 B_0$:
$$G_3 = B_3 \qquad G_2 = B_3 \oplus B_2 \qquad G_1 = B_2 \oplus B_1 \qquad G_0 = B_1 \oplus B_0$$module bin2gray (input [3:0] B, output [3:0] G);
assign G = B ^ (B >> 1);
endmodule
| Bin | Gray | Bin | Gray |
|---|---|---|---|
| 0000 | 0000 | 1000 | 1100 |
| 0001 | 0001 | 1001 | 1101 |
| 0010 | 0011 | 1010 | 1111 |
| 0011 | 0010 | 1011 | 1110 |
Shannon's expansion:
$$f(x_1,\ldots,x_n) = \overline{x_i} \cdot f(x_1,\ldots,0,\ldots,x_n) + x_i \cdot f(x_1,\ldots,1,\ldots,x_n)$$Proof: Every input combination has $x_i = 0$ or $x_i = 1$.
Case $x_i = 0$: RHS $= 1 \cdot f(\ldots,0,\ldots) + 0 \cdot f(\ldots,1,\ldots) = f(\ldots,0,\ldots)$ = LHS. $\checkmark$
Case $x_i = 1$: RHS $= 0 \cdot f(\ldots,0,\ldots) + 1 \cdot f(\ldots,1,\ldots) = f(\ldots,1,\ldots)$ = LHS. $\checkmark$
Since both cases hold for all input combinations, the expansion is valid. $\blacksquare$
Build 8-to-1 mux from 2-to-1 muxes in a tree:
Level 1 (4 muxes): pair up D₀/D₁, D₂/D₃, D₄/D₅, D₆/D₇, selected by S₀.
Level 2 (2 muxes): pair level-1 outputs, selected by S₁.
Level 3 (1 mux): pair level-2 outputs, selected by S₂.
Total: 4+2+1 = 7 two-to-one muxes. S₂ is the MSB of select.
module mux4to1 (
input [1:0] sel,
input [3:0] d,
output reg y
);
always @(*)
y = d[sel];
// alternatively using conditional operator:
// assign y = (sel==2'b00) ? d[0] :
// (sel==2'b01) ? d[1] :
// (sel==2'b10) ? d[2] : d[3];
endmodule
1-to-4 demultiplexer: routes single input I to one of four outputs based on S₁S₀.
| $S_1$ | $S_0$ | $Y_0$ | $Y_1$ | $Y_2$ | $Y_3$ |
|---|---|---|---|---|---|
| 0 | 0 | I | 0 | 0 | 0 |
| 0 | 1 | 0 | I | 0 | 0 |
| 1 | 0 | 0 | 0 | I | 0 |
| 1 | 1 | 0 | 0 | 0 | I |
$Y_i = I \cdot \text{(decoder output for } i\text{)} = I \cdot m_i(S_1, S_0)$
module comparator4 (
input [3:0] A, B,
output AgtB, AeqB, AltB
);
assign AgtB = (A > B);
assign AeqB = (A == B);
assign AltB = (A < B);
endmodule
Synthesis tools expand the relational operators into optimized combinational logic.
module bcd_to_7seg (
input [3:0] bcd,
output reg [6:0] seg // seg[6]=a, seg[5]=b,..., seg[0]=g
);
function [6:0] decode;
input [3:0] d;
case (d)
4'd0: decode = 7'b1111110;
4'd1: decode = 7'b0110000;
4'd2: decode = 7'b1101101;
4'd3: decode = 7'b1111001;
4'd4: decode = 7'b0110011;
4'd5: decode = 7'b1011011;
4'd6: decode = 7'b1011111;
4'd7: decode = 7'b1110000;
4'd8: decode = 7'b1111111;
4'd9: decode = 7'b1111011;
default: decode = 7'b0000000;
endcase
endfunction
always @(*) seg = decode(bcd);
endmodule
module popcount8 (
input [7:0] data,
output reg [3:0] count
);
integer i;
always @(*) begin
count = 0;
for (i = 0; i < 8; i = i+1)
count = count + data[i];
end
endmodule
The for loop is unrolled by synthesis into a tree of adders. Result is the number of 1-bits (0 to 8, needs 4 bits).
Apply Shannon's expansion recursively on all variables:
$$f(x_1,x_2,x_3) = \overline{x_1}\,\overline{x_2} \cdot f(0,0,x_3) + \overline{x_1}\,x_2 \cdot f(0,1,x_3) + x_1\,\overline{x_2} \cdot f(1,0,x_3) + x_1\,x_2 \cdot f(1,1,x_3)$$Each cofactor $f(a,b,x_3)$ is further expanded on $x_3$:
$$f(a,b,x_3) = \overline{x_3} \cdot f(a,b,0) + x_3 \cdot f(a,b,1)$$$f(a,b,c) \in \{0,1\}$ for each specific minterm. This recursively produces the canonical sum of minterms: $f = \sum_i m_i \cdot f(m_i)$, where the sum is over all input combinations where $f = 1$. $\blacksquare$
casez: Treats z (high-impedance) and ? as don't-care bits in case item comparisons. Used for priority encoders and partial matches.
casex: Treats both x (unknown) and z as don't-care. More aggressive masking — also masks unknown values in the case expression itself, which can hide simulation mismatches.
Best practice: Prefer casez (or casez with ?) for synthesis; avoid casex as it masks Xs in simulation, potentially hiding bugs.
module sort3 (
input [7:0] a, b, c,
output reg [7:0] x, y, z // x≤y≤z
);
task swap;
inout [7:0] p, q;
reg [7:0] t;
begin
if (p > q) begin t = p; p = q; q = t; end
end
endtask
always @(*) begin
x = a; y = b; z = c;
swap(x, y); // ensure x≤y
swap(y, z); // bubble largest to z
swap(x, y); // final sort x≤y
end
endmodule
CLA carry: $c_{i+1} = g_i + p_i c_i$. This is a 2-to-1 mux structure:
$c_{i+1} = \text{mux}(p_i,\; g_i,\; c_i)$: if $p_i = 1$, output $= c_i$ (propagate); if $p_i = 0$, output $= g_i$ (generate or kill).
This is a 2-to-1 mux with select $= p_i$, $D_1 = c_i$, $D_0 = g_i$.
$$c_{i+1} = p_i \cdot c_i + \overline{p_i} \cdot g_i = \text{mux}(p_i,\; g_i,\; c_i)$$This "carry select" mux structure underlies fast carry-select adders.
$f = w_1 w_2 + w_2 w_3 + \overline{w_1}\,\overline{w_2}\,w_3$. Expand by Shannon on $w_1$:
$$f|_{w_1=0} = w_2 w_3 + \overline{w_2}\,w_3 = w_3(w_2 + \overline{w_2}) = w_3$$ $$f|_{w_1=1} = w_2 + w_2 w_3 = w_2(1 + w_3) = w_2$$Use 2-to-1 mux with select $= w_1$: $D_0 = w_3$, $D_1 = w_2$.
$$\boxed{f = \overline{w_1} \cdot w_3 + w_1 \cdot w_2}$$module decoder_n #(parameter N=3) (
input [N-1:0] addr,
input en,
output [(1<
Cascadable 4-bit comparator has additional inputs: GIN (greater-in), EIN (equal-in), LIN (less-in) from the less-significant stage.
GOUT = AgtB_internal OR (AeqB_internal AND GIN)
EOUT = AeqB_internal AND EIN
LOUT = AltB_internal OR (AeqB_internal AND LIN)
Initialize the LSB stage with GIN=0, EIN=1, LIN=0. Chain stages from LSB to MSB; the MSB stage outputs the final result.
Two 2-to-1 muxes both controlled by $s$: $y_1 = \overline{s}\,x_1 + s\,x_2$, $y_2 = \overline{s}\,x_2 + s\,x_1$. When $s=0$ each output forwards its same-index input; when $s=1$ they swap.
Maj($w_1, w_2, w_3$) = $w_1 w_2 + w_1 w_3 + w_2 w_3$. With $w_1, w_2$ as the 4-to-1 mux selects:
| $w_1 w_2$ | $f$ |
|---|---|
| 00 | 0 |
| 01 | $w_3$ |
| 10 | $w_3$ |
| 11 | 1 |
Verilog (4-to-1 MUX form, matching the table):
module majority3 (
input w1, w2, w3,
output reg f
);
// {w1, w2} chooses one of four data inputs: {0, w3, w3, 1}
always @(*) begin
case ({w1, w2})
2'b00: f = 1'b0; // I0
2'b01: f = w3; // I1
2'b10: f = w3; // I2
2'b11: f = 1'b1; // I3
endcase
end
endmodule
Verilog (direct minimum SOP):
module majority3 ( input w1, w2, w3, output f ); assign f = (w1 & w2) | (w1 & w3) | (w2 & w3); endmodule
Recursively decompose $w_1\oplus w_2\oplus w_3 = w_1 \oplus (w_2\oplus w_3)$. Use one mux selected by $w_2$ to compute $w_2\oplus w_3$ (data inputs $w_3$ and $\overline{w_3}$). Then a second mux selected by $w_1$ takes that XOR result and its complement as data, producing $f$. Total: 2 muxes plus inverters for the complemented data lines.
Verilog (two cascaded 2-to-1 MUXes): the concept is very important
module xor3_via_muxes ( input w1, w2, w3, output f ); // First mux: select=w2, computes w2 ⊕ w3 // when w2=0 → output = w3 // when w2=1 → output = ~w3 wire xor23 = w2 ? ~w3 : w3; // Second mux: select=w1, computes w1 ⊕ xor23 // when w1=0 → output = xor23 // when w1=1 → output = ~xor23 assign f = w1 ? ~xor23 : xor23; endmodule
Verilog (direct XOR — for comparison): wrong answer
module xor3_direct ( input w1, w2, w3, output f ); assign f = w1 ^ w2 ^ w3; endmodule
Both produce the same function. The MUX-based form makes the recursive decomposition $w_1 \oplus (w_2 \oplus w_3)$ explicit and shows how Shannon-expansion turns an arbitrary function into a tree of 2-to-1 MUXes.
$f = \overline{w_1}\,\overline{w_3} + w_2 w_3$. On $w_1$: $f_{w_1}=w_2 w_3$, $f_{\overline{w_1}} = \overline{w_3}+w_2 w_3 = \overline{w_3}+w_2$. → 6 inputs. On $w_2$: $f_{w_2} = \overline{w_1}\,\overline{w_3}+w_3$, $f_{\overline{w_2}} = \overline{w_1}\,\overline{w_3}$. → 5 inputs. On $w_3$: $f_{w_3} = w_2$, $f_{\overline{w_3}} = \overline{w_1}$. So $f = \overline{w_3}\,\overline{w_1} + w_3 w_2$ — only 4 gate-inputs. Expansion on $w_3$ wins.
Shannon on $w_1$: $f_{\overline{w_1}} = \overline{w_3}$, $f_{w_1} = w_2 + w_3$. (a) 2-to-1 mux selected by $w_1$, data inputs $\overline{w_3}$ and $(w_2+w_3)$ — needs an inverter and an OR. (b) Shannon on $\{w_1, w_3\}$: cofactors at $(w_1, w_3) \in \{00,01,10,11\}$ are $\{1, 0, 1, w_2\to 1\}\to\{1,0,1,1\}$. So 4-to-1 mux selected by $\{w_1, w_3\}$ has data inputs $1, 0, w_2, 1$.
Maj $= w_1 w_2 + w_1 w_3 + w_2 w_3$. Shannon on $w_1$: $f_{\overline{w_1}} = w_2 w_3$, $f_{w_1} = w_2 + w_3$. Shannon each cofactor on $w_2$: $f_{\overline{w_1}\overline{w_2}}=0$, $f_{\overline{w_1}w_2} = w_3$; $f_{w_1\overline{w_2}} = w_3$, $f_{w_1 w_2} = 1$. Build the tree as two muxes selected by $w_2$ (data $0,w_3$ and $w_3,1$) feeding one mux selected by $w_1$. Total: 3 2-to-1 muxes.
A 2-to-4 decoder with En=1 outputs the four minterms of its select inputs $s_0, s_1$. AND each minterm $m_i$ with the corresponding data $w_i$ and OR all four results: $f = m_0 w_0 + m_1 w_1 + m_2 w_2 + m_3 w_3$ — exactly the function of a 4-to-1 mux.
module mux2to1(input w0, w1, s, output f); assign f = s ? w1 : w0; endmodule
module mux2to1(input w0, w1, s, output reg f);
always @(*) begin
if (s) f = w1;
else f = w0;
end
endmodule
module mux16to1(input [15:0] W, input [3:0] S, output f); wire [3:0] m; mux4to1 g0(W[3:0], S[1:0], m[0]); mux4to1 g1(W[7:4], S[1:0], m[1]); mux4to1 g2(W[11:8], S[1:0], m[2]); mux4to1 g3(W[15:12], S[1:0], m[3]); mux4to1 g4(m, S[3:2], f); endmodule
module mux4to1(input [3:0] W, input [1:0] S, output reg f);
always @(*) case (S)
2'b00: f = W[0];
2'b01: f = W[1];
2'b10: f = W[2];
2'b11: f = W[3];
endcase
endmodule
module dec2to4(input [1:0] W, input En, output reg [3:0] Y);
always @(*) case ({En, W})
3'b100: Y = 4'b0001;
3'b101: Y = 4'b0010;
3'b110: Y = 4'b0100;
3'b111: Y = 4'b1000;
default: Y = 4'b0000;
endcase
endmodule
module dec2to4(input [1:0] W, input En, output reg [3:0] Y);
always @(*) begin
if (En == 0) Y = 4'b0000;
else case (W)
2'b00: Y = 4'b0001;
2'b01: Y = 4'b0010;
2'b10: Y = 4'b0100;
2'b11: Y = 4'b1000;
endcase
end
endmodule
module dec4to16(input [3:0] W, input En, output [15:0] Y); wire [3:0] M; dec2to4 d0(W[3:2], En, M); dec2to4 d1(W[1:0], M[0], Y[3:0]); dec2to4 d2(W[1:0], M[1], Y[7:4]); dec2to4 d3(W[1:0], M[2], Y[11:8]); dec2to4 d4(W[1:0], M[3], Y[15:12]); endmodule
dec2to4?dec2to4 is a 2-to-4 binary decoder with an enable input — the small reusable cell that gets instantiated 5 times in the 4-to-16 decoder above. Two select bits choose which one of four outputs is asserted; the enable input gates the whole decoder so its outputs can be forced to zero.
Behavior — for inputs W[1:0] and En:
| En | W[1] | W[0] | Y[3] | Y[2] | Y[1] | Y[0] |
|---|---|---|---|---|---|---|
| 0 | x | x | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 |
Equivalently: Y[i] = En AND (binary value of W equals i); all other Y's are 0.
Verilog:
module dec2to4 (
input [1:0] W,
input En,
output reg [3:0] Y
);
always @(*) begin
Y = 4'b0000;
if (En) Y[W] = 1'b1; // assert exactly one output
end
endmodule
(Or, equivalently, a flat case (W) + AND with En.)
Why it shows up here: the 4-to-16 decoder problem (4.26) uses dec2to4 as a building block instantiated 5 times in a 2-stage tree:
d0(W[3:2], En, M) — decodes the high two bits, gated by En. Only one of the four outputs M[0..3] is high.d1..d4 — each takes the low two bits W[1:0] as its select, and uses one of M[i] as its enable. Only the bottom decoder whose enable is high produces an output; the other three output all-zeros.Net effect: exactly one of the 16 outputs Y[0..15] is high — the one whose index equals the binary value of W[3:0] (provided En=1). This is the same hierarchical-decoder trick used in real DRAM row/column address decoders.
Why hierarchical? A flat 4-to-16 decoder would need 16 4-input AND gates (64 input pins). The 2-stage tree uses 5 dec2to4 modules ≈ 5 × 4 = 20 2-input gates ≈ 40 input pins — slightly less area, but more importantly the tree shape generalizes: an $n$-to-$2^n$ decoder can be built from $\lceil n/2 \rceil$-stage trees of dec2to4 cells. The same enable trick is what lets you cascade many memory chips on a shared address bus — each chip's chip-select comes from a higher-stage decoder.
module seg7(input [3:0] hex, output reg [6:0] leds);
always @(*) case (hex) // active-low; bits a..g
4'h0: leds = 7'b0000001;
4'h1: leds = 7'b1001111;
4'h2: leds = 7'b0010010;
4'h3: leds = 7'b0000110;
4'h4: leds = 7'b1001100;
4'h5: leds = 7'b0100100;
4'h6: leds = 7'b0100000;
4'h7: leds = 7'b0001111;
4'h8: leds = 7'b0000000;
4'h9: leds = 7'b0000100;
4'hA: leds = 7'b0001000;
4'hB: leds = 7'b1100000;
4'hC: leds = 7'b0110001;
4'hD: leds = 7'b1000010;
4'hE: leds = 7'b0110000;
4'hF: leds = 7'b0111000;
endcase
endmodule
module alu74381(input [2:0] S, input [3:0] A, B, output reg [3:0] F);
always @(*) case (S)
3'b000: F = 4'b0000; // clear
3'b001: F = B - A;
3'b010: F = A - B;
3'b011: F = A + B;
3'b100: F = A ^ B;
3'b101: F = A | B;
3'b110: F = A & B;
3'b111: F = 4'b1111; // preset
endcase
endmodule
module prio4(input [3:0] W, output reg [1:0] Y, output reg z);
always @(*) begin
z = |W;
casex (W)
4'b1xxx: Y = 2'd3;
4'b01xx: Y = 2'd2;
4'b001x: Y = 2'd1;
4'b0001: Y = 2'd0;
default: Y = 2'd0;
endcase
end
endmodule
module dec2to4(input [1:0] W, input En, output reg [3:0] Y);
integer k;
always @(*) begin
Y = 4'b0;
for (k = 0; k < 4; k = k+1)
if (En && (W == k)) Y[k] = 1'b1;
end
endmodule
module prio4(input [3:0] W, output reg [1:0] Y, output reg z);
integer k;
always @(*) begin
Y = 2'b0; z = 1'b0;
for (k = 0; k < 4; k = k+1)
if (W[k]) begin Y = k[1:0]; z = 1'b1; end
end
endmoduleBecause later loop iterations overwrite earlier ones, the highest-index asserted bit wins.
module cmp4(input [3:0] A, B, output reg AeqB, AgtB, AltB);
always @(*) begin
AeqB = 0; AgtB = 0; AltB = 0;
if (A == B) AeqB = 1;
else if (A > B) AgtB = 1;
else AltB = 1;
end
endmodule
module rca #(parameter n=4)(input [n-1:0] A, B, input cin,
output [n-1:0] S, output cout);
wire [n:0] c; assign c[0] = cin; assign cout = c[n];
genvar i;
generate for (i=0; i
module mux16to1(input [15:0] W, input [3:0] S, output reg f);
reg [3:0] M;
task mux4; input [3:0] X; input [1:0] s; output y;
case (s) 0:y=X[0]; 1:y=X[1]; 2:y=X[2]; 3:y=X[3]; endcase
endtask
always @(*) begin
mux4(W[3:0], S[1:0], M[0]);
mux4(W[7:4], S[1:0], M[1]);
mux4(W[11:8], S[1:0], M[2]);
mux4(W[15:12], S[1:0], M[3]);
mux4(M, S[3:2], f);
end
endmodule
module mux16to1(input [15:0] W, input [3:0] S, output f);
function mux4; input [3:0] X; input [1:0] s;
case (s) 0:mux4=X[0]; 1:mux4=X[1]; 2:mux4=X[2]; 3:mux4=X[3]; endcase
endfunction
wire [3:0] M;
assign M[0] = mux4(W[3:0], S[1:0]);
assign M[1] = mux4(W[7:4], S[1:0]);
assign M[2] = mux4(W[11:8], S[1:0]);
assign M[3] = mux4(W[15:12], S[1:0]);
assign f = mux4(M, S[3:2]);
endmoduleDifference: a function returns one value (no output ports) and may be used in a continuous assign; a task can have multiple outputs but must be invoked inside an always block.
Use a 3-to-8 decoder to produce minterms $m_0\dots m_7$, then OR the outputs corresponding to $m_0, m_1, m_3, m_4, m_6, m_7$: $f = m_0+m_1+m_3+m_4+m_6+m_7$. One 6-input OR gate suffices.
One-hot inputs $w_0\dots w_7$. Truth table maps $w_i=1$ to $y_2 y_1 y_0 = i$. $y_0 = w_1+w_3+w_5+w_7$, $y_1 = w_2+w_3+w_6+w_7$, $y_2 = w_4+w_5+w_6+w_7$ (three 4-input OR gates).
Shannon on $(w_1, w_4)$ as the mux selects. Cofactors are computed from $f$: $f|_{00}=0$, $f|_{01}=w_5(\overline{w_2}+w_3)$, $f|_{10}=\overline{w_2}+w_3$, $f|_{11}=1$. So a 4-to-1 mux selected by $\{w_1, w_4\}$ with data inputs $0,\,w_5(\overline{w_2}+w_3),\,\overline{w_2}+w_3,\,1$ realizes $f$. Extra logic: one OR ($\overline{w_2}+w_3$) feeding both an AND with $w_5$ and the third data input — assuming $\overline{w_2}$ comes from an inverter.
| $b_2 b_1 b_0$ | $g_2 g_1 g_0$ |
|---|---|
| 000 | 000 |
| 001 | 001 |
| 010 | 011 |
| 011 | 010 |
| 100 | 110 |
| 101 | 111 |
| 110 | 101 |
| 111 | 100 |
Shannon: $f = \overline{w_1} f_{\overline{w_1}} + w_1 f_{w_1}$. (a) If $f_{\overline{w_1}} = 0$: $f = w_1\, f_{w_1}$ — the mux degenerates to a single AND gate. (b) If $f_{\overline{w_1}} = 1$: $f = \overline{w_1} + w_1 f_{w_1} = \overline{w_1} + f_{w_1}$ — the mux degenerates to a single OR gate.
A 4-to-1 mux has 6 inputs ($w_0..w_3, s_0, s_1$). With 4-LUTs, two LUTs suffice: LUT1 takes $(w_0, w_1, s_0, s_1)$ and produces $a = \overline{s_1}\overline{s_0}w_0 + \overline{s_1}s_0 w_1$; LUT2 takes $(a, w_2, w_3, s_0, s_1)$ — too many. Better: LUT1 takes $(w_0, w_1, s_0)$ → $\overline{s_0}w_0+s_0 w_1$; LUT2 takes $(w_2, w_3, s_0)$ → $\overline{s_0}w_2+s_0 w_3$; LUT3 takes those two outputs and $s_1$ → final mux. That is 3 LUTs of 3 inputs (or 4-LUTs with one input unused). The optimized 2-LUT mapping: LUT1$(w_0,w_1,s_0,s_1) = \overline{s_1}(\overline{s_0}w_0+s_0 w_1)$ and LUT2$(\text{LUT1}, w_2, w_3, s_0)$ ORs in the upper half — 2 4-LUTs.
Five 2-to-1 muxes selected by Shift; data lines (Shift=0, Shift=1): $y_3:(w_3, 0)$, $y_2:(w_2, w_3)$, $y_1:(w_1, w_2)$, $y_0:(w_0, w_1)$, $k:(0, w_0)$. Shift=0 passes $W$ unchanged with $k=0$; Shift=1 right-shifts (LSB into $k$, MSB filled with 0).
Each output $y_i$ takes the rotated bit $w_{(i+S)\bmod 4}$. Use four 4-to-1 muxes, all sharing select $S$: $y_0$ data $\{w_0,w_1,w_2,w_3\}$, $y_1$ data $\{w_1,w_2,w_3,w_0\}$, $y_2$ data $\{w_2,w_3,w_0,w_1\}$, $y_3$ data $\{w_3,w_0,w_1,w_2\}$.
Verilog (explicit four-MUX form):
module barrel_rot4 (
input [3:0] W,
input [1:0] S,
output reg [3:0] Y
);
always @(*) begin
case (S)
2'b00: Y = {W[3], W[2], W[1], W[0]}; // no rotation
2'b01: Y = {W[0], W[3], W[2], W[1]}; // rotate right by 1
2'b10: Y = {W[1], W[0], W[3], W[2]}; // rotate right by 2
2'b11: Y = {W[2], W[1], W[0], W[3]}; // rotate right by 3
endcase
end
endmodule
Verilog (compact rotate idiom):
module barrel_rot4 ( input [3:0] W, input [1:0] S, output [3:0] Y ); // right-rotate: bits shifted off the bottom re-enter at the top assign Y = (W >> S) | (W << (4 - S)); //very smart endmodule
Both versions synthesize to the same 4-bit barrel rotator (four parallel 4-to-1 MUXes sharing $S$ as select). Single MUX-delay regardless of shift amount — that's the whole point of a barrel shifter vs. a sequential shift register.
module mux4to1(input [3:0] W, input [1:0] S, output f); wire [3:0] D; dec2to4 g(.W(S), .En(1'b1), .Y(D)); assign f = | (D & W); endmodule
// Style 1: explicit if-else
module shr1(input [3:0] W, input Shift, output reg [3:0] Y, output reg k);
always @(*) if (Shift) {Y, k} = {1'b0, W};
else {Y, k} = {W, 1'b0};
endmodule
// Style 2: shift operator
module shr2(input [3:0] W, input Shift, output [3:0] Y, output k);
assign {Y, k} = Shift ? {1'b0, W} : {W, 1'b0};
endmodule
module barrel4(input [3:0] W, input [1:0] S, output [3:0] Y);
wire [7:0] T;
assign T = {W, W};
assign Y = T >> S;
endmoduleDoubling $W$ in concatenation makes the right-shift behave as a rotation on the lower 4 bits.
module evenparity(input [6:0] b_in, output [7:0] b_out); assign b_out[6:0] = b_in; assign b_out[7] = ^b_in; // XOR-reduction = even parity bit endmodule$b_7 = b_6\oplus b_5\oplus\cdots\oplus b_0$ ensures the total count of 1s in $b_7\ldots b_0$ is even.
SR latch using two cross-coupled NOR gates:
Q = NOR(R, Q̄); Q̄ = NOR(S, Q)
| S | R | Q(t+1) | State |
|---|---|---|---|
| 0 | 0 | Q(t) | Hold |
| 0 | 1 | 0 | Reset |
| 1 | 0 | 1 | Set |
| 1 | 1 | – | Forbidden (both outputs 0) |
Latch: Level-sensitive; transparent when clock/enable is high — output follows input while enabled. Can have timing issues (multiple transitions per clock cycle).
Flip-flop: Edge-triggered; samples input only at clock edge (rising or falling). Output changes only once per cycle.
Why edge-triggered preferred:
D input sequence (held during each clock cycle): 0,1,1,0,1,0,0,1. Q starts at 0.
The D flip-flop captures D's value during cycle n at the rising edge that ends that cycle, so Q during cycle n+1 equals D during cycle n — the one-cycle delay:
Cycle: 0 1 2 3 4 5 6 7 8 Clk: ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ D: 0 1 1 0 1 0 0 1 - Q: 0 0 1 1 0 1 0 0 1
Read column-by-column: at the rising edge between cycle n and cycle n+1, Q latches D's value from cycle n. So Q's row is D's row shifted right by one column — the visible "one clock cycle delay."
Textbook-style timing diagram: D held during each cycle, Q lags by one cycle
T flip-flop from D flip-flop: D = T⊕Q (XOR T with current output).
Characteristic equation: Q(t+1) = T⊕Q(t)
| T | Q | Q(t+1) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
3-bit synchronous up counter using T flip-flops:
T₀ = 1 (always toggles — LSB)
T₁ = Q₀ (toggle when Q₀=1)
T₂ = Q₁·Q₀ (toggle when Q₁Q₀=11)
State sequence: 000→001→010→011→100→101→110→111→000
All T FFs clocked simultaneously — synchronous design.
Assume each FF has t_FF = 5 ns and each gate = 1 ns.
Ripple counter: Each FF clocked by previous FF output. Critical path through all 4 FFs: 4 × 5 ns = 20 ns. Max freq = 1/20 ns = 50 MHz.
Synchronous counter: All FFs share clock. Critical path = t_FF + t_AND (for T₃ logic) ≈ 5 + 1 = 6 ns. Max freq = 1/6 ns ≈ 167 MHz.
Synchronous counter is ~3× faster for 4-bit; advantage grows with bit width.
module counter_updown (
input clk, rst, up,
output reg [3:0] count
);
always @(posedge clk or posedge rst) begin
if (rst)
count <= 4'd0;
else if (up)
count <= count + 1;
else
count <= count - 1;
end
endmodule
Blocking (=): Executes sequentially within always block; each assignment takes effect immediately for the next statement.
Non-blocking (<=): All RHS evaluated first, then all LHS updated simultaneously at end of time step. Models register behavior correctly.
Wrong type example — swap with blocking:
// WRONG: blocking swap in sequential logic
always @(posedge clk) begin
a = b; // a gets b's new value
b = a; // b gets a's NEW value (a already changed)
end
// CORRECT: non-blocking
always @(posedge clk) begin
a <= b; // RHS evaluated simultaneously
b <= a; // b gets original a
end
module dff_srst_en (
input clk, srst, en, d,
output reg q
);
always @(posedge clk) begin
if (srst)
q <= 1'b0;
else if (en)
q <= d;
// else q holds
end
endmodule
Synchronous reset: reset takes effect only on rising clock edge. Enable gates the data input.
Setup time (t_su): Minimum time the data input D must be stable before the active clock edge for reliable capture.
Hold time (t_h): Minimum time D must remain stable after the active clock edge.
Clock-to-Q delay (t_cQ): Time from active clock edge until the output Q reaches its new stable value.
Violations: setup violation → metastability (indeterminate output); hold violation → data captured incorrectly regardless of clock period.
Master-slave D FF: Two latches in series (master then slave), clocked by CLK and CLK̄ respectively.
Data is captured at the rising clock edge (master closes, slave opens) — effectively edge-triggered.
Why edge-triggered: The double-latch structure means input can only affect output during a brief transition window at the clock edge, providing full edge-triggered behavior.
Metastability: When a flip-flop's setup or hold time is violated, the output may settle to an indeterminate voltage level between 0 and 1 for an unpredictable time. This can propagate errors through the circuit.
Synchronizer: A chain of two or more flip-flops that allows a metastable signal time to resolve before use. The second FF sees a stable input in most cases. A single FF is insufficient because its output could still be metastable when sampled by subsequent logic.
MTBF (mean time between failures) improves exponentially with each additional synchronizer stage.
module counter_load (
input clk, arst, load, en,
input [3:0] d,
output reg [3:0] q
);
always @(posedge clk or posedge arst) begin
if (arst)
q <= 4'd0;
else if (load)
q <= d;
else if (en)
q <= q + 1;
end
endmodule
Async reset clears on arst edge regardless of clock. Load takes parallel data on clock edge. Enable controls counting.
| J | K | Q(t+1) | Action |
|---|---|---|---|
| 0 | 0 | Q(t) | Hold |
| 0 | 1 | 0 | Reset |
| 1 | 0 | 1 | Set |
| 1 | 1 | Q̄(t) | Toggle |
Characteristic equation: Q(t+1) = J·Q̄ + K̄·Q
4-bit SISO (Serial-In Serial-Out) shift register: 4 D flip-flops in series. Input 1,0,1,1 (MSB first).
Clk In Q₀ Q₁ Q₂ Q₃(out) 0 - 0 0 0 0 1 1 1 0 0 0 2 0 0 1 0 0 3 1 1 0 1 0 4 1 1 1 0 1 ← first bit out
After 4 clocks, the input sequence 1,0,1,1 appears at Q₃ in order.
4-bit ring counter, init = 1000. Each clock, the single 1 bit rotates right (or left).
Cycle: Q₃Q₂Q₁Q₀ 0: 1 0 0 0 1: 0 1 0 0 2: 0 0 1 0 3: 0 0 0 1 4: 1 0 0 0 (repeats)
Period = 4. Only 4 of 16 states are used.
4-bit Johnson counter: complement of Q₃ fed back to D₀ (or Q₃ fed to rightmost). Init = 0000.
Cycle: Q₃Q₂Q₁Q₀ 0: 0 0 0 0 1: 1 0 0 0 2: 1 1 0 0 3: 1 1 1 0 4: 1 1 1 1 5: 0 1 1 1 6: 0 0 1 1 7: 0 0 0 1 8: 0 0 0 0 (repeats)
Johnson counter has 2n = 8 valid states (for n=4 FFs); the other 8 states are unused/invalid.
module shift_reg8 (
input clk, s0, s1, // mode: s1s0
input si_l, si_r, // serial in left/right
input [7:0] d,
output reg [7:0] q
);
// 00=hold, 01=shift right, 10=shift left, 11=parallel load
always @(posedge clk) begin
case ({s1,s0})
2'b00: q <= q;
2'b01: q <= {si_r, q[7:1]}; // shift right
2'b10: q <= {q[6:0], si_l}; // shift left
2'b11: q <= d;
endcase
end
endmodule
module bcd_counter (
input clk, rst,
output reg [3:0] count
);
always @(posedge clk) begin
if (rst || count == 4'd9)
count <= 4'd0;
else
count <= count + 1;
end
endmodule
Counts 0-9; synchronous reset to 0 when reaching 9 or when rst is asserted.
module counter_async_rst (
input clk, arst,
output reg [3:0] q
);
always @(posedge clk or posedge arst) begin
if (arst)
q <= 4'd0;
else
q <= q + 1;
end
endmodule
Async reset: the posedge arst in sensitivity list causes immediate reset independent of clock.
module shift_detect (
input clk, rst, si,
output detected
);
reg [3:0] sr;
always @(posedge clk or posedge rst) begin
if (rst) sr <= 4'b0;
else sr <= {sr[2:0], si};
end
assign detected = (sr == 4'b1011);
endmodule
Shift register captures last 4 bits. Output asserts when pattern 1011 is in the register.
Modulo-6 counter sequences: 000→001→010→011→100→101→000.
State 110 and 111 unused. Using D FFs, derive next-state equations via K-maps:
D₂: K-map over present states (with don't cares at 6,7): D₂ = Q₂Q̄₀ + Q₁Q₀
D₁: D₁ = Q₁⊕Q₀ (standard counter rule for bit 1)
D₀: D₀ = Q̄₂Q̄₁Q̄₀ + Q₂Q̄₁Q̄₀... simplified: D₀ = Q̄₀·(Q̄₂+Q̄₁)
Output can be count value or a terminal count = Q₂Q̄₁Q₀ ... Verify by tracing all 6 states.
Given: t_su=2 ns, t_h=1 ns, t_cQ=3 ns, t_logic=5 ns, t_skew=0.5 ns.
Timing constraint: T_clk ≥ t_cQ + t_logic + t_su + t_skew
T_clk ≥ 3 + 5 + 2 + 0.5 = 10.5 ns
Max clock frequency = 1 / 10.5 ns ≈ 95.2 MHz
Hold constraint: t_cQ + t_logic_min ≥ t_h + t_skew → must also be verified separately.
LFSR with polynomial x³+x+1 (taps at positions 3 and 1). Feedback = Q₃⊕Q₁. Starting from 001:
State (Q₃Q₂Q₁): feedback = Q₃⊕Q₁ 001 → fb=0⊕1=1 → 100... Wait, standard LFSR: Q₁←Q₂, Q₂←Q₃, Q₃←fb 001: fb=Q₁⊕Q₃=1⊕0=1 → next: Q₃=1,Q₂=0,Q₁=0 → 100 100: fb=0⊕1=1 → 110 110: fb=0⊕1=1 → 111 111: fb=1⊕1=0 → 011... wait recheck taps x³+x+1: feedback into position 0 = Q₃⊕Q₁? For right-shift LFSR: new_bit = Q[3]⊕Q[1]: 001→100→110→111→011→101→010→001 (7 states)
Period = 7 = 2³−1. Yes, maximal-length LFSR (2ⁿ−1 = 7 for n=3).
Divide-by-6 with 50% duty cycle: count 0-5, output high for 0-2, low for 3-5 (but this gives 3/6 = 50%).
module div6 (
input clk, rst,
output reg out
);
reg [2:0] cnt;
always @(posedge clk or posedge rst) begin
if (rst) begin cnt <= 0; out <= 0; end
else if (cnt == 3'd5) begin cnt <= 0; out <= ~out; end
else if (cnt == 3'd2) begin cnt <= cnt+1; out <= ~out; end
else cnt <= cnt + 1;
end
endmodule
Alternatively use two edges: toggle on count 0 and count 3 to achieve 50% duty cycle at f_clk/6.
module D_latch(input D, Clk, output reg Q);
always @(D, Clk)
if (Clk) Q = D;
endmoduleThe if-without-else and level-sensitive sensitivity list infer a transparent latch: $Q$ tracks $D$ while Clk=1 and holds when Clk=0.
module flipflop(input D, Clock, output reg Q);
always @(posedge Clock)
Q <= D;
endmodule
With blocking assignments inside always @(posedge Clock), the statements execute in order, so $Q1$ is updated first to $D$, then $Q2$ is set to the new $Q1$ — i.e. both registers receive $D$ on the same edge. The synthesizer infers a single flip-flop driven by $D$ whose output fans out to both $Q1$ and $Q2$ (or two flip-flops with their D inputs tied together).
Non-blocking assignments evaluate all RHS at the edge, then update LHS. So $Q2$ samples the old $Q1$ (the value before this edge), then $Q1$ receives $D$. This is exactly the behavior of two cascaded flip-flops: $D \to Q1 \to Q2$ — a 2-stage shift register.
Blocking assignments execute sequentially, so when the synthesizer sees $f = x_1\,\&\, x_2$ followed by $g = f\,|\, x_3$, the variable $f$ is treated as a wire (its value is the AND result on this cycle). The result: $f$ is registered, but $g$'s flip-flop is fed by $(x_1 x_2)\,|\, x_3$ — i.e. the AND gate output goes through an OR with $x_3$ before the $g$ flip-flop, with no register between them on this path.
With non-blocking, $g \le f \,|\, x_3$ uses the previous value of $f$ (the flip-flop output). So the OR gate that feeds the $g$ flip-flop is driven by the registered $f$, not the live $x_1 x_2$ AND output. Effectively $g$ is one cycle delayed relative to the blocking version, and the AND output no longer feeds OR directly.
module dff_async_reset(input D, Clock, Resetn, output reg Q);
always @(negedge Resetn, posedge Clock)
if (!Resetn) Q <= 1'b0;
else Q <= D;
endmodule
module dff_sync_reset(input D, Clock, Resetn, output reg Q);
always @(posedge Clock)
if (!Resetn) Q <= 1'b0;
else Q <= D;
endmodule
module regn #(parameter n=8)(input [n-1:0] R, input Clock, Resetn,
output reg [n-1:0] Q);
always @(negedge Resetn, posedge Clock)
if (!Resetn) Q <= {n{1'b0}};
else Q <= R;
endmodule
module muxdff(input D0, D1, Sel, Clock, output reg Q); always @(posedge Clock) Q <= Sel ? D1 : D0; endmodule module shift4(input [3:0] R, input L, w, Clock, output [3:0] Q); muxdff s3(Q[3], R[3], L, Clock, Q[3]); // bit 3: serial-in is w? -> see below // Correct wiring (left-shift serial-in = w): // bit3 D0 = Q[2] (shift), D1 = R[3] (load) muxdff u3(Q[2], R[3], L, Clock, Q[3]); muxdff u2(Q[1], R[2], L, Clock, Q[2]); muxdff u1(Q[0], R[1], L, Clock, Q[1]); muxdff u0(w, R[0], L, Clock, Q[0]); endmoduleEach
muxdff picks parallel data $R[i]$ when $L=1$; otherwise it shifts in the lower neighbor (or $w$ for the LSB).
module shift4(input [3:0] R, input L, w, Clock, output reg [3:0] Q);
always @(posedge Clock) begin
if (L) Q <= R;
else Q <= {Q[2:0], w}; // shift left, w in at LSB
end
endmodule
module shiftn #(parameter n=8)(input [n-1:0] R, input L, w, Clock,
output reg [n-1:0] Q);
integer k;
always @(posedge Clock) begin
if (L) Q <= R;
else begin
for (k = n-1; k > 0; k = k-1) Q[k] <= Q[k-1];
Q[0] <= w;
end
end
endmodule
module upcnt4(input Clock, Resetn, E, output reg [3:0] Q);
always @(negedge Resetn, posedge Clock)
if (!Resetn) Q <= 4'b0;
else if (E) Q <= Q + 1;
endmodule
module upcnt4_load(input [3:0] R, input Clock, Resetn, L, E,
output reg [3:0] Q);
always @(negedge Resetn, posedge Clock)
if (!Resetn) Q <= 4'b0;
else if (L) Q <= R;
else if (E) Q <= Q + 1;
endmodule
module dncnt #(parameter n=4)(input [n-1:0] R, input Clock, Resetn, L, E,
output reg [n-1:0] Q);
always @(negedge Resetn, posedge Clock)
if (!Resetn) Q <= {n{1'b0}};
else if (L) Q <= R;
else if (E) Q <= Q - 1;
endmodule
module updncnt #(parameter n=4)(input [n-1:0] R, input Clock, Resetn, L, E,
input up_down, output reg [n-1:0] Q);
always @(negedge Resetn, posedge Clock)
if (!Resetn) Q <= {n{1'b0}};
else if (L) Q <= R;
else if (E) Q <= up_down ? Q + 1 : Q - 1;
endmodule
Let $t_{skew} = \delta_2 - \delta_1$. The data launch time at $Q_1$ is $\delta_1 + t_{cQ}$; data must reach $Q_2$ before $\delta_2 + T - t_{su}$. (a) Setup: $\delta_1 + t_{cQ} + t_L + t_{su} \le \delta_2 + T$, so $T_{min} = t_{cQ} + t_L + t_{su} - t_{skew}$. (b) Hold: data from the same edge must not reach $Q_2$ before its hold time after the edge: $\delta_1 + t_{cQ} + t_l \ge \delta_2 + t_h$, i.e. $t_{cQ} + t_l \ge t_h + t_{skew}$. Negative skew increases the hold-time pressure.
With each gate adding delay $\Delta$, internal node $A$ (after one gate) lags $C$ by $\Delta$; $B$ (after two gates) lags by $2\Delta$. Edges that occur within a window of width $\Delta$ on $C$ produce glitches (hazards) on $B$ when an AND/OR re-converges paths of unequal delay. The waveforms show $A$ and $B$ as $C$ with phase shifts of $\Delta$ and $2\Delta$, and any glitch widths equal to the path-delay difference.
Build the excitation table for the JK pair given inputs $J_0=w$, $K_0=w$, $J_1=w Q_0$, $K_1=w$ (typical Figure 5.71 wiring). Starting at $Q_1 Q_0 = 00$: pulse → $01$; pulse → $10$; pulse → $00$; … sequence $00 \to 01 \to 10 \to 00 \dots$ — a modulo-3 (divide-by-3) counter. Output $Q_1 Q_0$ takes 3 distinct states.
Datapath: a 6-bit register $S$ accumulates the running total. On each Coin pulse, encode $\{N,D,Q\}$ to a 5-bit value $V \in \{5,10,25\}$: $V = N\cdot 5 + D\cdot 10 + Q\cdot 25$. A 6-bit adder forms $S + V$ and clocks the result back into $S$ on the negative edge of Coin. Reach-30 detection: $Z = 1$ when $S \ge 30$. Since $30 = 011110_2$, $Z = s_5 + s_4 s_3 s_2 s_1$ covers all values $\ge 30$ in the achievable range. Resetn asynchronously zeros the register.
module vending(input N, D, Q, Coin, Resetn, output Z);
reg [5:0] S;
wire [4:0] inc = N ? 5'd5 : D ? 5'd10 : Q ? 5'd25 : 5'd0;
always @(negedge Resetn, negedge Coin)
if (!Resetn) S <= 6'd0;
else S <= S + {1'b0, inc};
assign Z = S[5] | (S[4] & S[3] & S[2] & S[1]);
endmodule
The straight design has cascaded ANDs with delay $(n-1)t_{AND}$ between $Q_0$ and the highest XOR. Refactor by using an XOR for each toggle bit and an AND tree of fan-in 2 driven directly from $Q_0\dots Q_{i-1}$, so the longest path is one AND followed by one XOR. Then $T_{min} = t_{cQ} + t_{AND} + t_{XOR} + t_{su} = 1.0 + 1.4 + 1.2 + 0.6 = 4.2$ ns, so $F_{max} = 1/4.2\,\text{ns} \approx 238$ MHz.
Worst-case path through the FSM: $T_{min}(i\to j) = t_{cQ} + t_{path} + t_{su} - (\delta_j - \delta_i)$. Hold check: $t_{cQ} + t_{min\,path} \ge t_h + (\delta_j - \delta_i)$. (i) zero skew: typical $T_{min}\approx 1.0+t_{path}+0.6$. (ii) skew on $\delta_2$: paths into FF2 gain $0.7$ ns slack, paths out of FF2 lose $0.7$ ns. Compute worst path; one of the data paths probably becomes the bottleneck, producing a slightly slower $F_{max}$. (iii) $\delta_1=1, \delta_2=0, \delta_3=0.5$: paths $1\to 2$ have $t_{skew}=-1$ ns (penalty), paths $2\to 3$ get +0.5 ns slack. The worst-case path tightens the clock; check hold: $0.8 + t_{l} \ge 0.4 + (\delta_j-\delta_i)$ — for $1\to 2$ with skew $-1$, hold is automatic. Compute each $F_{max}$ from the dominant path.
Moore FSM detecting 101 (overlapping). States: S0(initial), S1(got 1), S2(got 10), S3(got 101, output=1). *必背
S0 --1--> S1 --0--> S2 --1--> S3(out=1) S0 --0--> S0 S1 --1--> S1 S2 --0--> S0 S3 --1--> S1 (overlap: 101, last 1 starts new) S3 --0--> S2 (overlap: 1010...)
Output z=1 only in S3 (Moore: output on state, not transition).
State encoding: S0=00, S1=01, S2=10, S3=11. Using D FFs:
| State | w | Next | D₁ | D₀ | z |
|---|---|---|---|---|---|
| 00 | 0 | 00 | 0 | 0 | 0 |
| 00 | 1 | 01 | 0 | 1 | 0 |
| 01 | 0 | 10 | 1 | 0 | 0 |
| 01 | 1 | 01 | 0 | 1 | 0 |
| 10 | 0 | 00 | 0 | 0 | 0 |
| 10 | 1 | 11 | 1 | 1 | 0 |
| 11 | 0 | 10 | 1 | 0 | 1 |
| 11 | 1 | 01 | 0 | 1 | 1 |
D₁ = Q₁Q̄₀w + Q̄₁Q₀w̄ + Q₁Q₀w̄; D₀ = Q̄₁w + Q₀; z = Q₁Q₀
Moore FSM: output $z=1$ after three (or more) consecutive 1s on input $w$. States count consecutive 1s — A(0), B(1), C(2), D(≥3, output=1). Output depends only on state, so a dedicated D state is required to register the "detection".
Verilog (Moore, 4 states; output is a function of state only):
module moore_three_ones (
input clock, resetn, w,
output z
);
reg [1:0] y;
parameter A = 2'b00, B = 2'b01, C = 2'b10, D = 2'b11;
// single always: state register only (Moore output is combinational on y)
always @(posedge clock or negedge resetn) begin
if (!resetn) y <= A;
else case (y)
A: y <= (w ? B : A);
B: y <= (w ? C : A);
C: y <= (w ? D : A);
D: y <= (w ? D : A);
default: y <= A;
endcase
end
// Moore output: depends only on state
assign z = (y == D);
endmodule
Why 4 states (vs 3 for the Mealy version in 6.4): in Moore form the output is a pure function of the state register, so "saw exactly 3" cannot be conflated with "saw 2"; a separate detection state D is required. Mealy can fold "third 1" into the C→C transition output, eliminating one state.
Mealy FSM for three consecutive 1s: output $z=1$ on the transition that completes the third 1.
States: A (0 ones), B (1 one), C (2+ consecutive 1s). $z=1$ only on the C→C transition when $w=1$.
Verilog (Mealy, 3 states):
module mealy_three_ones (
input clock, resetn, w,
output reg z
);
reg [1:0] y;
parameter A = 2'b00, B = 2'b01, C = 2'b10;
// single always block: state register + registered Mealy output
always @(posedge clock or negedge resetn) begin
if (!resetn) begin
y <= A;
z <= 1'b0;
end else begin
case (y)
A: begin y <= (w ? B : A); z <= 1'b0; end
B: begin y <= (w ? C : A); z <= 1'b0; end
C: begin y <= (w ? C : A); z <= w; end // z=1 on C&&w=1
default: begin y <= A; z <= 1'b0; end
endcase
end
end
endmodule
State count comparison: Mealy needs 3 states; Moore needs 4. Mealy folds the "just detected" event into the transition output, saving one state.
Moore FSM: Output depends only on current state. Output is stable for entire clock period; simpler to analyze.
Mealy FSM: Output depends on current state AND current input. Output can change mid-cycle if input changes; typically needs fewer states; output arrives one cycle earlier than Moore for same sequence.
Moore advantages: Glitch-free outputs (registered), easier timing analysis.
Mealy advantages: Fewer states (more compact), faster response (output same cycle as input).
One-hot encoding: One flip-flop per state; exactly one FF is 1 at any time. Simple next-state logic (each state is directly a FF output).
5-state comparison:
One-hot uses more FFs but reduces combinational logic complexity, which is preferable on FPGAs where FFs are plentiful and LUTs are the scarce resource.
module seq_detect_101 (
input clk, rst, w,
output z
);
parameter S0=2'd0, S1=2'd1, S2=2'd2, S3=2'd3;
reg [1:0] state, next;
always @(posedge clk or posedge rst)
state <= rst ? S0 : next;
always @(*) begin
case (state)
S0: next = w ? S1 : S0;
S1: next = w ? S1 : S2;
S2: next = w ? S3 : S0;
S3: next = w ? S1 : S2;
default: next = S0;
endcase
end
assign z = (state == S3);
endmodule
Mealy FSM — alternative labeling ($A$ = last input was 0, $B$ = last input was 1 / reset). The detection arrow is $A \to B$ on $w{=}1/z{=}1$:
$\delta(A, 0)=A$, $\delta(A, 1)=B$ (with $z=1$); $\delta(B, 0)=A$, $\delta(B, 1)=B$. Reset → B.
Verilog (compact form — combinational Mealy output):
module mealy_01detect (
input clk, rst, w,
output z
);
parameter A = 1'b0, // last input was 0
B = 1'b1; // last input was 1 / reset
reg state;
always @(posedge clk or posedge rst) begin
if (rst) state <= B; // initial = B
else case (state)
A: state <= w ? B : A; // A→B on w=1 (detection!), A→A on w=0
B: state <= w ? B : A; // B→B on w=1, B→A on w=0
endcase
end
// Mealy output: z=1 only on A→B transition driven by w=1
assign z = (state == A) & w; // 非常重要 要用組合電路
endmodule
Verilog (verbose form — registered output, single always block):
module mealy_01detect (
input clk, reset, w,
output reg z
);
parameter S1 = 1'b0, // last input was 0
S2 = 1'b1; // last input was 1 / reset
reg y;
always @(posedge clk or posedge reset) begin
if (reset) begin
y <= S2; // initial = S2 (so first w=1 doesn't false-detect)
z <= 1'b0;
end else case (y)
S1: begin
if (w == 1'b1) begin
y <= S2;
z <= 1'b1; // ← "01" detected! z = 1
end else begin
y <= S1;
z <= 1'b0;
end
end
S2: begin
if (w == 1'b1) begin
y <= S2;
z <= 1'b0;
end else begin
y <= S1;
z <= 1'b0;
end
end
default: begin y <= S2; z <= 1'b0; end
endcase
end
endmodule
Difference between the two: the compact form uses a continuous assign for z (combinational Mealy — z reacts to w changes within the cycle). The verbose form registers z inside the same clocked always block, so z stays stable for a full cycle (Moore-equivalent timing) but functionally still detects "01". Both produce z=1 at the same clock cycles for any input sequence.
Equivalent to the swapped labeling: the original code in the textbook uses A = "last was 1" and detects on B→A; this version uses A = "last was 0" and detects on A→B. Both produce identical input-output behavior; choice is a stylistic preference. Verify with input 1,0,1,0,1 (initial state B): cycles 3 and 5 fire z=1 exactly when "01" completes.
Given a state table with states, inputs, next states, and outputs, derive equations by:
For a specific table (example with 3 states, 1 input): the equations emerge directly from K-map minimization of each column (D₁, D₀, z) treating state bits and input as variables.
State minimization via partitioning (Myhill-Nerode / implication chart):
For states A–F: build implication chart, mark incompatible pairs (different outputs), propagate implications. Remaining unmarked pairs are equivalent. Merge to minimal state machine.
Vending machine ASM: accepts nickels (N) and dimes (D); item costs 15¢; dispenses change.
States: S0(0¢), S5(5¢), S10(10¢), S15(dispense) ASM blocks: S0: if N→S5; if D→S10 S5: if N→S10; if D→S15(no change) S10: if N→S15(no change); if D→S15+change(5¢) S15: dispense item; if change needed→output nickel; →S0
Decision boxes for N, D inputs; conditional outputs for item and change; state boxes for each amount. ASM chart uses rectangles (states), diamonds (decisions), ovals (conditional outputs).
module traffic_light (
input clk, rst,
output reg [1:0] light // 00=red,01=green,10=yellow
);
reg [4:0] cnt;
// Green 20 cycles, Yellow 5, Red 20
always @(posedge clk or posedge rst) begin
if (rst) begin cnt<=0; light<=2'b01; end
else begin
cnt <= cnt + 1;
case (light)
2'b01: if(cnt==19) begin light<=2'b10; cnt<=0; end
2'b10: if(cnt==4) begin light<=2'b00; cnt<=0; end
2'b00: if(cnt==19) begin light<=2'b01; cnt<=0; end
endcase
end
end
endmodule
Detect 1001 or 1111 (overlapping). Build a state machine tracking the longest suffix of the input that is a prefix of either target:
States: S0, S1(1), S2(10), S3(100), S4(11), S5(111) S0: 0→S0, 1→S1 S1: 0→S2, 1→S4 S2: 0→S0, 1→S3 S3: 0→S0, 1→S4; on completing 1001 at S3+input 1: output=1, goto S1 S4: 0→S2, 1→S5; completing 1001: S3→w=1→output+goto S1 S5: 0→S2, 1→S1; completing 1111: S5+1→output=1, goto S4
Output 1 when completing 1001 (state S3, input 1) or 1111 (state S5+, input 1). Moore output can be added with an accepting state.
A finite state machine is a 5-tuple M = (Q, Σ, δ, q₀, F) where:
Given: D₁ = x⊕Q₂, D₂ = Q₁ (using Q₁,Q₂ as state, x as input).
Next state: Q₁⁺=x⊕Q₂, Q₂⁺=Q₁.
| Q₁Q₂ | x=0: Q₁⁺Q₂⁺ | x=1: Q₁⁺Q₂⁺ |
|---|---|---|
| 00 | 00 | 10 |
| 01 | 10 | 00 |
| 10 | 01 | 11 |
| 11 | 11 | 01 |
State diagram: 4 states, each with two transitions. No output specified → count states/cycles for analysis or add output function.
Arbiter FSM for r₁, r₂: grants access one at a time.
States: IDLE, GRANT1, GRANT2 IDLE: if r₁&r₂→GRANT1 (r₁ priority); if r₁ only→GRANT1; if r₂ only→GRANT2; else→IDLE GRANT1: g₁=1; if ~r₁→IDLE (r₁ releases); else hold GRANT2: g₂=1; if ~r₂→IDLE; else hold
Outputs: g₁=1 in GRANT1, g₂=1 in GRANT2. The arbiter ensures mutual exclusion: only one grant active at a time. Priority given to r₁ when both assert simultaneously.
Advantages of ASM charts over state diagrams:
// Binary encoding (3 FFs for 8 states)
parameter [2:0] S0=3'd0, S1=3'd1, ..., S7=3'd7;
// Gray encoding (3 FFs, adjacent states differ by 1 bit - reduces glitches)
parameter [2:0] S0=3'b000, S1=3'b001, S2=3'b011, S3=3'b010,
S4=3'b110, S5=3'b111, S6=3'b101, S7=3'b100;
// One-hot (8 FFs, simplest next-state logic - preferred for FPGAs)
parameter [7:0] S0=8'b00000001, S1=8'b00000010, S2=8'b00000100,
S3=8'b00001000, S4=8'b00010000, S5=8'b00100000,
S6=8'b01000000, S7=8'b10000000;
One-hot: 8 FFs but next-state = OR of incoming state FFs (no decoding needed). Best for FPGAs. Binary: 3 FFs but complex decode. Gray: reduces output glitches for counters.
Modulo-8 up counter via sequential design. States 000–111, next state = present state + 1 (mod 8). Using D FFs:
State table same as binary counter. K-maps for D₂,D₁,D₀:
D₀ = Q̄₀ (toggles every cycle)
D₁ = Q₁⊕Q₀
D₂ = Q₂⊕(Q₁Q₀)
Output = state value (Q₂Q₁Q₀). Implementation: D₀=NOT Q₀; D₁=XOR(Q₁,Q₀); D₂=XOR(Q₂,AND(Q₁,Q₀)).
Mod-8 counter with JK FFs. JK excitation table: J=0,K=x (hold 0); J=x,K=0 (hold 1); J=1,K=x (set); J=x,K=1 (reset).
J₀=1, K₀=1 (always toggle for LSB).
J₁=Q₀, K₁=Q₀ (toggle when Q₀=1).
J₂=Q₁Q₀, K₂=Q₁Q₀ (toggle when Q₁Q₀=1).
Gate count comparison: JK implementation uses same XOR structure but expressed as J/K. D FF version needs explicit XOR gates; JK version has simpler excitation equations (J₀=K₀=1 is trivially wired). Total gates similar; JK can save gates when J≠K not needed.
Output 1 when w₁=w₂ for four consecutive cycles. States track count of consecutive equal cycles: 0,1,2,3,4(output).
5 states, 2 inputs. Let eq = (w₁ XNOR w₂).
S0 --eq=1--> S1, --eq=0--> S0 S1 --eq=1--> S2, --eq=0--> S0 S2 --eq=1--> S3, --eq=0--> S0 S3 --eq=1--> S4, --eq=0--> S0 S4 --eq=1--> S4, --eq=0--> S0
Moore output: z=1 in S4. Need 3 FFs (⌈log₂5⌉=3). Reduce by noting the counter simply counts eq=1 consecutive cycles.
Serial adder FSM: Adds two serial bit streams A, B. State = carry. Inputs: aᵢ, bᵢ. Output: sᵢ = aᵢ⊕bᵢ⊕carry.
Moore version: 2 states (carry=0, carry=1). Output sᵢ must depend on input too, so pure Moore would need 4 states. Mealy is more natural.
Mealy version (2 states):
| Carry | ab | Next carry | s (output) |
|---|---|---|---|
| 0 | 00 | 0 | 0 |
| 0 | 01/10 | 0 | 1 |
| 0 | 11 | 1 | 0 |
| 1 | 00 | 0 | 1 |
| 1 | 01/10 | 1 | 0 |
| 1 | 11 | 1 | 1 |
Converting Mealy FSM to Moore FSM:
Result: Moore FSM typically has more states (one per unique (state, output) combination). If the Mealy FSM has n states and 2 outputs, the Moore FSM can have up to 2n states.
Detect 110 or 101 (overlapping). States track suffix matching either pattern:
S0: initial S1: saw 1 S2: saw 11 S3: saw 10 S0 --1--> S1, --0--> S0 S1 --1--> S2, --0--> S3 S2 --1--> S2(output), --0--> S3(output for 110) S3 --1--> S1(output for 101), --0--> S0
Moore output: z=1 in states reached when pattern completed. Minimal: 4 states. Output state added or Moore output on S2(w=0) and S3(w=1).
For Mealy: output on transitions S2→(0) and S3→(1).
Four states $A, B, C, D$. State $A$ idles; on $w=1$ go to $B$, then $C$, then $D$ unconditionally, then back to $A$. Outputs (Moore): $B$: $R2_{out}=R3_{in}=1$ (copy R2→R3); $C$: $R1_{out}=R2_{in}=1$ (R1→R2); $D$: $R3_{out}=R1_{in}=Done=1$ (R3→R1). State assignment $A=00, B=01, C=10, D=11$. With $y_2 y_1$ as state bits: $Y_1 = w\overline{y_2}\overline{y_1} + y_2$, $Y_2 = y_1 + y_2\overline{y_1}$ (and so on). The four-cycle sequence performs the swap via R3 as scratch.
Moore-type FSM: each state lists the control signals it asserts. "−" = unconditional transition (input ignored).
Verilog (Moore, single sequential always block + combinational outputs):
module swap_fsm (
input Clock, Resetn, w,
output R1in, R1out, R2in, R2out, R3in, R3out, Done
);
reg [1:0] y;
parameter A = 2'b00, B = 2'b01, C = 2'b10, D = 2'b11;
// single always: state register, async-low reset to A
always @(posedge Clock or negedge Resetn) begin
if (!Resetn) y <= A;
else case (y)
A: y <= (w ? B : A);
B: y <= C;
C: y <= D;
D: y <= A;
default: y <= A;
endcase
end
// Moore outputs — pure functions of state
assign R2out = (y == B), R3in = (y == B); // R3 <- R2
assign R1out = (y == C), R2in = (y == C); // R2 <- R1
assign R3out = (y == D), R1in = (y == D); // R1 <- old R2 (was in R3)
assign Done = (y == D);
endmodule
Datapath wrapper (n-bit registers + shared bus): the FSM above only generates the enables; here is a parameterized top module that hooks up an actual N-bit register file. Width is controlled by the parameter N (default 8). The shared bus is multiplexed by the R*out enables; the R*in enables select which register loads the bus on the next clock edge.
module swap_top #(parameter N = 8) (
input Clock, Resetn, w,
input [N-1:0] init_R1, init_R2, // initial values loaded on reset
output [N-1:0] R1, R2, R3,
output Done
);
reg [N-1:0] r1, r2, r3;
wire R1in, R1out, R2in, R2out, R3in, R3out;
// shared bus: whichever register has *out asserted drives it
wire [N-1:0] bus = R1out ? r1 :
R2out ? r2 :
R3out ? r3 : {N{1'b0}};
always @(posedge Clock or negedge Resetn) begin
if (!Resetn) begin
r1 <= init_R1;
r2 <= init_R2;
r3 <= {N{1'b0}};
end else begin
if (R1in) r1 <= bus;
if (R2in) r2 <= bus;
if (R3in) r3 <= bus;
end
end
assign R1 = r1;
assign R2 = r2;
assign R3 = r3;
// controller from above
swap_fsm ctrl (
.Clock(Clock), .Resetn(Resetn), .w(w),
.R1in(R1in), .R1out(R1out),
.R2in(R2in), .R2out(R2out),
.R3in(R3in), .R3out(R3out),
.Done(Done)
);
endmodule
Width-independence. The FSM doesn't know or care what N is — the swap takes 4 clock cycles regardless of register width. Set N to 1 for a single-bit swap, 32 for a word swap, etc.
Swapping $C$ and $D$ assigns $C=11, D=10$. Recompute next-state K-maps: $Y_1$ now requires $w\overline{y_2}\overline{y_1}$ for $A\to B$ and additional terms for $B\to D$ and $C\to A$ that no longer combine as nicely. Output expressions also become more complex. The total gate-input cost rises by ~3 inputs versus the straightforward assignment, illustrating that state-assignment choice matters.
One-hot: $y_1=A, y_2=B, y_3=C, y_4=D$. Next-state: $Y_1 = (y_1\overline{w}) + y_4$, $Y_2 = w y_1$, $Y_3 = y_2$, $Y_4 = y_3$. Outputs come directly: e.g. $R1_{out} = y_3$, Done = $y_4$. Each transition is one product term. Trade-off: more flip-flops (4 vs 2), but simpler combinational logic and faster speed.
One-hot encoding: each state owns one flip-flop; "−" = unconditional transition (input ignored).
Mealy version uses 3 states $A, B, C$. In $A$, on $w=1$, output $R2_{out}=R3_{in}=1$ and go to $B$. In $B$, output $R1_{out}=R2_{in}=1$ and go to $C$. In $C$, output $R3_{out}=R1_{in}=Done=1$ and go to $A$. The output is generated combinationally with the transition (one clock cycle earlier than Moore), saving one cycle. Total: 3 cycles instead of 4. Care must be taken so outputs do not glitch on combinational input changes.
module swap_fsm(input Clock, Resetn, w, output R1in,R1out,R2in,R2out,R3in,R3out,Done);
reg [1:0] y, Y;
parameter A=2'b00, B=2'b01, C=2'b10, D=2'b11;
always @(*) case (y)
A: Y = w ? B : A;
B: Y = C;
C: Y = D;
D: Y = A;
endcase
always @(negedge Resetn, posedge Clock)
if (!Resetn) y <= A; else y <= Y;
assign R2out = (y==B), R3in = (y==B);
assign R1out = (y==C), R2in = (y==C);
assign R3out = (y==D), R1in = (y==D), Done = (y==D);
endmodule
Partition refinement (output classes first). $P_1$: split states by output → e.g. $\{S_1, S_3, S_5\}$ (output 0) and $\{S_2, S_4, S_6, S_7\}$ (output 1). $P_2$: refine by where each state's transitions go (which $P_1$ block). After several iterations $P_3 = P_4$ and the equivalent classes form 4 states. The reduced FSM has 4 states with the same I/O behavior.
States track cumulative deposit modulo 25 (with credit): $S_0=0$¢, $S_5, S_{10}, S_{15}, S_{20}, S_{25}$. Inputs: nickel ($n$), dime ($d$). Transitions: $S_0$: n→$S_5$, d→$S_{10}$; $S_5$: n→$S_{10}$, d→$S_{15}$; $S_{10}$: n→$S_{15}$, d→$S_{20}$; $S_{15}$ (dispense)→$S_0$; $S_{20}$ (dispense + 5¢ credit) goes to $S_5$. Apply partitioning to merge $S_{15}$ and $S_{20}$ if their behaviors match — they don't (different next states), so 5 states minimum after dispense logic added.
Four unspecified entries can be set to 0 or 1 to maximize state merging. With both = 0: partitioning yields a coarser refinement and merges 7 states down to (e.g.) 4. With both = 1: a different merging produces (e.g.) 5 states. The exact counts depend on the table; the lesson is that completing don't-cares strategically minimizes states.
Four states $00, 01, 10, 11$. Compute $Y_1 Y_2$ for each $(y_1, y_2, w)$: Substitute and tabulate. The state transitions trace: $00 \xrightarrow{w=1} 10 \xrightarrow{w=1} 11 \xrightarrow{w=1} 11$ (with $z=1$). On $w=0$ from any state, return to $00$ or $01$. Sequence detection: three consecutive 1s drive the FSM to state $11$ where $z=1$. Hence the FSM detects the pattern 111.
Excitation $J_1=w$, $K_1=\overline{w}+y_2$, $J_2=wy_1$, $K_2=\overline{w}$. Compute next-state from JK rules ($Q^+=J\overline{Q}+\overline{K}Q$): $y_1^+ = w\overline{y_1} + (w\overline{y_2}) y_1$, $y_2^+ = wy_1\overline{y_2} + wy_2$. Tabulate four states. The same input sequence $w=111\dots$ produces $00\to 10\to 11\to 11$ with $z=1$ in state $11$ — the same three-1s detector as Example 6.9.
$Y_1^+ = D_1 = w(y_1+y_2)$. $Y_2^+ = y_2 \oplus T_2$ for the T flip-flop, where $T_2 = \overline{w}y_2 + wy_1\overline{y_2}$. Build the excitation table: starting at 00, with $w=1\to 00, 10, 11, 11$… The output $z = y_1 y_2$ asserts in state 11. Same FSM behavior as 6.9 and 6.10 — three consecutive 1s detector.
States: $A$ (idle), $B$ (just saw 0), $C$ (saw 00, output $z=1$), $D$ (just saw 1), $E$ (saw 11, output $z=1$). Transitions: $A$: 0→$B$, 1→$D$. $B$: 0→$C$, 1→$D$. $C$: 0→$C$, 1→$D$. $D$: 0→$B$, 1→$E$. $E$: 0→$B$, 1→$E$. Output $z=1$ in $C$ and $E$. State assignment for 5 states needs 3 flip-flops. Derive $Y_3 Y_2 Y_1$ next-state from K-maps.
FSM_1 (detect 11): states $A_1$ (idle), $B_1$ (saw 1), $C_1$ (saw 11, $z_1=1$). FSM_2 (detect 00): mirror. Output $z = z_1 + z_2$. Each FSM has 3 states, total 6 states, but they run independently and combine more cleanly than the monolithic 5-state version. The cost trade-off depends on shared inputs.
Mealy: 3 states $A$ (idle), $B$ (just saw 0), $C$ (just saw 1). $A$: 0/0→$B$, 1/0→$C$. $B$: 0/1→$B$, 1/0→$C$. $C$: 0/0→$B$, 1/1→$C$. Output asserts on the input edge that completes the pair. With 2 flip-flops $y_1 y_2$: $Y_1 = w$, $Y_2 = \overline{w}y_2 + wy_1\dots$ (assignment-dependent), $z = wy_1 + \overline{w}y_2$ (output high when current input matches the previous one).
Mealy FSM: $z=1$ fires on the transitions $B \xrightarrow{0/1} B$ and $C \xrightarrow{1/1} C$ (current input matches the previous one).
For each present-state/next-state transition, JK excitation is: $00\to 0$: J=0, K=X; $0\to 1$: J=1, K=X; $1\to 0$: J=X, K=1; $1\to 1$: J=X, K=0. Build K-maps for $J_1, K_1, J_2, K_2, J_3, K_3$ from the assigned table. Resulting expressions typically simpler than D-FF case because of the X cells. The book's solution shows a ~30% input-count reduction over D realization.
module seq_moore(input Clock, Resetn, w, output reg z);
reg [2:0] y, Y;
parameter A=0, B=1, C=2, D=3, E=4;
always @(*) case (y)
A: Y = w ? D : B;
B: Y = w ? D : C;
C: Y = w ? D : C;
D: Y = w ? E : B;
E: Y = w ? E : B;
default: Y = A;
endcase
always @(negedge Resetn, posedge Clock)
if (!Resetn) y <= A; else y <= Y;
always @(*) z = (y == C) || (y == E);
endmodule
module seq_mealy(input Clock, Resetn, w, output reg z);
reg [1:0] y, Y;
parameter A=0, B=1, C=2;
always @(*) begin
z = 0;
case (y)
A: if (w) Y = C; else Y = B;
B: if (w) Y = C; else begin Y = B; z = 1; end
C: if (w) begin Y = C; z = 1; end else Y = B;
default: Y = A;
endcase
end
always @(negedge Resetn, posedge Clock)
if (!Resetn) y <= A; else y <= Y;
endmodule
Frame: 1 start bit (0) + 7 data bits + 1 stop bit (1) = 9 bits. State machine with 11 states: idle, start, $b_0\dots b_6$, stop, done. A counter cycles through bit positions; on each clock the output line presents the next bit of the shift register holding the byte. Done asserts after the stop bit. Verilog uses a parallel-load shift register and a state-FSM controller.
Tri-state bus: Multiple drivers share a wire; only one enabled at a time via OE (output-enable). Bus floats when all disabled. Simple wiring but requires careful control; floating bus can pick up noise.
Mux-based bus: A multiplexer selects one of several sources; no tri-state buffers needed. Cleaner logic, easier to synthesize for FPGAs (which have no tri-state internal routing), but requires a physical mux and select signals.
FPGAs prefer mux-based; external PCB buses often use tri-state.
Simple processor datapath with 4 registers R0–R3, ALU, and bus:
Registers R0-R3 ──┐
├──► MUX_A ──► ALU_A
└──► MUX_B ──► ALU_B
ALU: ADD/SUB/AND/OR with opcode input
ALU result ──► Result register ──► Write-back MUX ──► Rn
Control signals: RegSel_A[2], RegSel_B[2], ALUop[2],
WE (write enable), RegDest[2]
The controller FSM sequences: fetch operands → execute ALU op → write back to destination register.
Arithmetic mean of four 8-bit numbers: sum A+B+C+D (10-bit result), then divide by 4 (right-shift 2).
ASM chart: IDLE ──start──► LOAD: Sum:=0, cnt:=4, ptr:=0 ACCUM: Sum:=Sum+Data[ptr]; ptr:=ptr+1; cnt:=cnt-1 ──(cnt≠0)──► ACCUM ──(cnt=0)──► COMPUTE: Mean:=Sum>>2 COMPUTE ──► DONE: output Mean; done:=1 ──► IDLE
Datapath: 10-bit accumulator, 4-element memory or 4 input ports, 2-bit right shifter, done flag.
Why debouncing is needed: Mechanical switches bounce for 1–20 ms, generating multiple transitions. Without debouncing, a single press registers as multiple pulses.
Hardware approach: SR latch with two SPDT switch positions; once set/reset, further bounces have no effect. Alternatively: RC low-pass filter + Schmitt trigger.
Software approach: Sample the switch; if value is stable for N consecutive samples (e.g., 20 ms at 1 ms intervals), accept it as a valid press. Implemented with a counter in software or HDL.
Clock skew: Difference in arrival time of the clock signal at different flip-flops. Caused by wire delays, buffer delays.
Setup violation due to skew: If the destination FF clock arrives earlier than the source FF clock, the effective cycle time is reduced. Constraint: T_clk + t_skew_neg ≥ t_cQ + t_logic + t_su.
Hold violation due to skew: If destination FF clock arrives later, data may change before being captured. Constraint: t_cQ + t_logic_min ≥ t_h + t_skew_pos. Hold violations cannot be fixed by slowing the clock — they require adding buffers on the data path.
Microinstructions for a simple processor (horizontal microcode):
| Instruction | Microoperations |
|---|---|
| Load Rn, M | MAR←PC; PC←PC+1; MDR←Mem[MAR]; Rn←MDR |
| Move Rd, Rs | Rd←Rs |
| Add Rd, Rs | T←Rd+Rs; Rd←T |
| Sub Rd, Rs | T←Rd−Rs; Rd←T (using 2's complement adder) |
Each microinstruction specifies: source register(s), ALU operation, destination register, memory operation, and next microaddress.
Register Transfer Level (RTL) design describes a digital system as registers storing data and combinational logic transferring data between registers in each clock cycle.
Key concepts: registers hold state; data operations described as register transfers (e.g., A ← B + C); controller FSM determines which transfers occur each cycle; datapath implements the hardware to perform those operations.
RTL abstraction sits between algorithmic level and gate level — high enough for synthesis tools to optimize, detailed enough to specify hardware behavior precisely.
GCD using Euclidean algorithm: GCD(A,B) = GCD(B, A mod B). In hardware use repeated subtraction: GCD(A,B): while A≠B, if A>B then A←A−B else B←B−A.
ASM chart:
IDLE ──start──► LOAD: A:=a, B:=b
TEST: if A==B → DONE: gcd:=A
if A>B → SUBA: A:=A-B → TEST
else → SUBB: B:=B-A → TEST
Datapath: two registers A, B; subtractor; comparator. Cycles for worst case: O(max(a,b)).
module counter_param #(parameter N=8) (
input clk, rst, en, load,
input [N-1:0] d,
output reg [N-1:0] q
);
always @(posedge clk or posedge rst) begin
if (rst) q <= {N{1'b0}};
else if (load) q <= d;
else if (en) q <= q + 1'b1;
end
endmodule
An ASM block consists of three elements:
module bit_counter (
input clk, rst, start,
input [7:0] data,
output reg [3:0] count,
output reg done
);
reg [7:0] sr;
reg [2:0] cnt;
reg [1:0] state; // 0=IDLE,1=LOAD,2=COUNT,3=DONE
always @(posedge clk or posedge rst) begin
if (rst) begin state<=0; done<=0; count<=0; end
else case (state)
0: if (start) begin sr<=data; cnt<=0; count<=0; state<=2; end
2: begin
count <= count + sr[0];
sr <= sr >> 1;
cnt <= cnt + 1;
if (cnt==7) begin done<=1; state<=3; end
end
3: begin done<=0; state<=0; end
endcase
end
endmodule
Shift-and-add multiplier datapath:
ASM chart: INIT state → LOAD A:=0, Q:=B, cnt:=4; LOOP: if Q[0]=1 then A:=A+M; shift A:Q right; cnt:=cnt−1; if cnt=0 → DONE else → LOOP.
Product = A:Q after 4 iterations.
System: read eight 4-bit values, store them, compute sum and average.
Datapath: 8×4-bit memory (or shift reg), 7-bit accumulator, 3-bit counter Controller ASM: IDLE ──start──► INIT: acc:=0, cnt:=0 READ: store data[cnt]; acc:=acc+data[cnt]; cnt:=cnt+1 ──(cnt<8)──► READ ──(cnt=8)──► COMPUTE: sum:=acc; avg:=acc>>3 (divide by 8) DONE: output sum, avg; done:=1 ──► IDLE
Average = sum/8 implemented as 3-bit right shift (exact for multiples of 8, truncated otherwise).
module tristate_bus (
input [7:0] data_in,
input oe, // output enable
output [7:0] bus
);
assign bus = oe ? data_in : 8'bz;
endmodule
When oe=1, drives bus with data_in. When oe=0, bus is high-impedance (Z). Multiple such modules can share the same bus wire provided only one has oe=1 at a time.
Sorting network for three 4-bit numbers A, B, C into sorted order X≤Y≤Z:
Step 1: Compare-swap (A,B): if A>B, swap → ensures min(A,B) in A Step 2: Compare-swap (B,C): if B>C, swap → ensures max in C (= Z) Step 3: Compare-swap (A,B): if A>B, swap → ensures A≤B (= X≤Y)
3 compare-swap elements total. Each comparator is a 4-bit magnitude comparator with a 4-bit 2-to-1 mux for each output. Fully combinational, depth = 3 comparator levels.
Moore in ASM: Outputs are listed inside the state box (rectangle). Output active for the entire clock cycle regardless of input. Example: state S2 box contains "z=1".
Mealy in ASM: Outputs are listed in oval conditional output boxes on arcs, adjacent to the decision diamond. Output is conditional on both state and input. Example: on the arc from S1 when w=1, an oval box contains "z=1".
Both representations preserve the one-clock-per-ASM-block timing relationship.
Synchronizer circuit: Two cascaded D flip-flops clocked by the destination domain. The first FF captures the asynchronous input (may go metastable); the second FF samples the first FF's output after a full clock period — by which time the first FF has almost certainly resolved.
Why a single FF is insufficient: The first FF's output may still be metastable (not fully resolved to a valid 0 or 1) when the next logic stage samples it, propagating an invalid value. Adding a second FF gives the first FF a full clock period to resolve its metastable state, reducing the probability of failure exponentially.
Repeated-subtraction division: quotient Q=0; while dividend ≥ divisor: dividend−=divisor, Q++.
Pseudo-code:
Q := 0
R := dividend // 255
D := divisor // 7
while R >= D:
R := R - D
Q := Q + 1
// result: Q=36, R=3 (255 = 36×7 + 3)
Cycles for 255÷7: quotient = 36, so loop executes 36 times plus setup = 37 clock cycles (36 subtractions + final comparison). In hardware, each subtraction takes 1 clock cycle.
Shift-and-add multiplier cycles = number of bits in multiplier:
4-bit multiplier: 4 iterations + 1 cycle for initialization + 1 cycle for done = 4 clock cycles (excluding load/done).
8-bit multiplier: 8 clock cycles for the shift-and-add loop.
Generally: n-bit × n-bit multiplier requires n clock cycles for the sequential shift-and-add algorithm. A combinational (array) multiplier requires 0 clock cycles but uses O(n²) gates.
Debouncing circuit using SR latch with SPDT switch and pull-up resistors:
Vcc | R1 (pull-up) |──────── S input of SR latch | ← switch common R2 (pull-up) |──────── R input of SR latch GND
When switch moves to S position: S line is grounded → SR latch Set. Bounces on S side re-apply Set — no effect (already set). When switch moves to R: R line grounded → Reset. Result: Q output changes cleanly once per switch throw, regardless of bouncing contacts.
Clock distribution challenges: Long interconnects introduce varying delays to different FFs (clock skew). Skew causes setup/hold violations.
Techniques to minimize skew:
Restoring divider: 8-bit ÷ 4-bit.
Datapath: 8-bit partial remainder register P, 4-bit divisor D.
Each step: P_shifted = P left-shifted by 1 (bring in next dividend bit); attempt subtraction P_trial = P_shifted − D; if P_trial ≥ 0: quotient bit = 1, P = P_trial; else: quotient bit = 0, P restored (P remains).
ASM chart: INIT(load dividend into P[7:0], D into divisor reg, cnt=4) → SHIFT(P=P<<1) → SUBTRACT(P_trial=P−D) → decision(P_trial[MSB]=0 → quotient bit=1, P=P_trial else bit=0) → cnt−1 → if cnt≠0 repeat → DONE.
module multiplier (
input clk, rst, start,
input [3:0] A, B,
output reg [7:0] product,
output reg done
);
reg [7:0] acc; reg [3:0] mplier; reg [2:0] cnt;
always @(posedge clk or posedge rst) begin
if (rst) begin done<=0; acc<=0; end
else if (start) begin
acc<=0; mplier<=B; cnt<=4; done<=0;
end else if (cnt>0) begin
if (mplier[0]) acc <= acc + {4'b0,A};
mplier <= mplier>>1; acc[7:1]<=acc[6:0]; // shift
// actually: {acc,mplier} >>= 1 with add to upper half
cnt <= cnt-1;
if (cnt==1) begin product<=acc+(mplier[0]?{4'b0,A}:0); done<=1; end
end
end
endmodule
// Testbench: 13*11=143
// A=4'd13, B=4'd11; expect product=8'd143
For signed (2's complement) multiplication:
UART transmitter: sends 8-bit data with start (0) and stop (1) bits.
ASM: IDLE(tx=1) ──send──► START: tx:=0; cnt:=8 BIT: tx:=data[0]; data:=data>>1; cnt:=cnt-1 ──(cnt≠0)──► BIT ──(cnt=0)──► STOP: tx:=1 ──► IDLE
module uart_tx (
input clk, rst, send,
input [7:0] data,
output reg tx, busy
);
reg [7:0] sr; reg [3:0] cnt;
reg [1:0] state;
always @(posedge clk or posedge rst) begin
if (rst) begin tx<=1; state<=0; busy<=0; end
else case (state)
0: if(send) begin sr<=data;cnt<=8;tx<=0;busy<=1;state<=1; end
1: begin tx<=sr[0];sr<=sr>>1;cnt<=cnt-1;
if(cnt==1) state<=2; end
2: begin tx<=1; busy<=0; state<=0; end
endcase
end
endmodule
Datapath: $n$-bit shift register holding the input (MSB shifts out), $\lceil\log_2 n\rceil$-bit counter accumulating ones, and an $n$-bit-cycle counter. Control signals: load, shift, incr, done. ASM chart: state $S_1$ loads input on Start. $S_2$ loops $n$ times: shift the register; if the shifted-out bit is 1, increment the count; decrement the cycle counter. When the cycle counter reaches 0, go to $S_3$ which asserts done.
module bitcount #(parameter n=8)(input Clk, Rst, Start, input [n-1:0] D,
output reg [3:0] cnt, output reg done);
reg [n-1:0] sr; reg [3:0] k; reg [1:0] s;
always @(posedge Clk, posedge Rst)
if (Rst) begin s<=0; cnt<=0; done<=0; end
else case (s)
0: if (Start) begin sr<=D; cnt<=0; k<=n; s<=1; done<=0; end
1: if (k==0) begin done<=1; s<=2; end
else begin if (sr[n-1]) cnt<=cnt+1; sr<=sr<<1; k<=k-1; end
2: if (!Start) s<=0;
endcase
endmodule
Datapath: $n$-bit multiplicand register $M$, $n$-bit multiplier register (LSB shifted out each cycle), $2n$-bit accumulator $A$ initially 0. ASM: in state $S_1$ load operands. In $S_2$ (looping $n$ times): if multiplier LSB = 1, add $M$ to upper half of $A$; right-shift $A$ by one (capturing the next product bit); shift multiplier right. After $n$ iterations the product sits in $A$. State $S_3$ asserts Done.
module shift_add #(parameter n=4)(input Clk, Rst, Start, input [n-1:0] M, Q,
output reg [2*n-1:0] P, output reg done);
reg [2*n-1:0] A; reg [n-1:0] q; reg [3:0] k; reg [1:0] s;
always @(posedge Clk, posedge Rst)
if (Rst) begin s<=0; done<=0; end
else case (s)
0: if (Start) begin A<=0; q<=Q; k<=n; s<=1; done<=0; end
1: begin
if (q[0]) A[2*n-1:n-1] <= A[2*n-1:n-1] + {1'b0,M};
A <= A>>1; q <= q>>1; k<=k-1;
if (k==1) begin s<=2; done<=1; P<=A; end
end
endcase
endmodule
Restoring divider: $n$-bit divisor $B$, $2n$-bit register $\{R, Q\}$ where dividend loaded into $Q$ and $R$ initially 0. ASM: each cycle, shift $\{R, Q\}$ left by 1 (MSB into $R$); subtract $B$ from $R$; if result $\ge 0$, place 1 into $Q$'s LSB; else restore (add $B$ back) and place 0 in $Q$'s LSB. After $n$ cycles, $R$ holds the remainder and $Q$ holds the quotient. Verilog: case-based FSM with shift, subtract-or-restore conditional, and a cycle counter.
Datapath: an accumulator (width $n+\lceil\log_2 k\rceil$), a counter for the number of inputs received, and a divider (or shift-by-$\log_2 k$ if $k$ is a power of 2). ASM: in state $S_1$ wait for Start. $S_2$: on each input strobe, add input to accumulator and increment counter; loop until counter = $k$. $S_3$: divide accumulator by $k$ (use the divider FSM if $k$ is not a power of 2, or right-shift if it is). $S_4$: assert Done with the mean as output.
Datapath: a register file of $k$ entries × $n$ bits, two read ports, one write port; a comparator and a swap mechanism. Selection-sort ASM: outer loop $i = 0\dots k-2$; inner loop finds index of minimum in $RF[i\dots k-1]$ and swaps with $RF[i]$. Each iteration uses two registers (current min index, current scan index) and a comparator. Verilog uses a 2-counter nested-loop FSM with comparator-driven conditional swaps. Total time: $O(k^2)$ clock cycles.
f = x₁x₃ + x₁x₄ + x₂x₃ + x₂x₄ = x₃(x₁+x₂) + x₄(x₁+x₂) = (x₁+x₂)(x₃+x₄)
SOP cost: 4 AND gates (2-input) + 1 OR gate (4-input) = 8 gate inputs + 5 gates.
Factored form cost: 2 OR gates (2-input) + 1 AND gate (2-input) = 3 gates, 6 gate inputs.
Saving: 2 gates, multilevel is cheaper and faster for this function.
f = ac+ad+bc+bd+e = (a+b)(c+d)+e. Convert to NAND-NAND:
Use double inversion: f = (a+b)(c+d) · e
NAND of NAND (applying DeMorgan):
t1 = NAND(a,c); t2=NAND(a,d); t3=NAND(b,c); t4=NAND(b,d) t5 = NAND(t1,t2,t3,t4) // = ac+ad+bc+bd ne = NAND(e,e) // = ē (not needed if e uncomplemented) f = NAND(t5, NAND(e,e)̄) // adjust...
More directly: f = NAND(NAND(a,c), NAND(a,d), NAND(b,c), NAND(b,d), NAND(e,e)̄) — using e with self-NAND for inversion, then a 5-input NAND for the final OR layer.
f = Σm(0,1,2,5,6,7,8,9,14). Quine-McCluskey grouping by number of 1s:
Group 0: 0000(0) Group 1: 0001(1), 0010(2), 1000(8) Group 2: 0101(5), 0110(6), 1001(9) Group 3: 0111(7), 1110(14) Combine: (0,1)=000-, (0,2)=00-0, (0,8)=-000 (1,5)=0-01, (1,9)=-001, (2,6)=0-10 (8,9)=100- (5,7)=01-1, (6,7)=011- (9,...) check (1,9)=-001 ✓ Second pass: (0,1,8,9)=-00-, (0,2,1,...)...
Prime implicants: -00- (0,1,8,9); 0-0- (0,2)? ; 0-1- (5,7)=0-1-? ; 011- (6,7); 0-10 (2,6); -001 (1,9); 1110 (14). Full QM table needed for exact set.
f = x₁x₂ + x̄₁x₃, order x₁ < x₂ < x₃.
x₁
/ \
0 1
x₃ x₂
/ \ / \
0 1 0 1
0 1 0 1
Build Shannon expansion tree. After reduction:
After ROBDD reduction (x₃ and x₂ nodes both map to identity, but different variables — cannot merge). 4 nodes: root x₁, left x₃, right x₂, terminal 0 and 1.
For n variables, the K-map has 2ⁿ cells. Practical issues with 5+ variables:
Alternatives:
f = Σm(0,3,4,7,9,10,13,14). K-map grouping with fan-in ≤ 2 constraint means only 2-input gates.
Standard K-map: {0,3,4,7}=x̄₂x̄₃·? No: 0=0000,3=0011,4=0100,7=0111 → x̄₁(x̄₂x̄₃x̄₄+x̄₂x₃x₄+x₂x̄₃x̄₄+x₂x₃x₄)... group {0,3,4,7}= x̄₁(x₃⊙x₄)... {0,3,4,7}=x̄₁·(x₂⊕x₃x₄)?
Groups: {0,4,9,13}? No. Best K-map cover: {0,3,4,7}=x̄₁(x̄₂x̄₃+x₂x₃)... ={x̄₁(x₂⊙x₃)}... Actually {0,3,4,7}: bits differ in x₂,x₃,x₄ — these are x̄₁x̄₄(x̄₂x̄₃ impossible since 3=0011,7=0111). With fan-in≤2: decompose using a tree of 2-input gates.
Two-level logic advantages: Simple structure (SOP/POS); well-understood minimization (K-map, QM); predictable delay (exactly 2 levels); easy to implement with PLAs.
Two-level disadvantages: Can have large fan-in for complex functions; cannot share common sub-expressions; gate count can be exponential in the number of variables.
Multilevel advantages: Shares sub-expressions → fewer gates/literals; supports any fan-in constraint; more area-efficient for complex functions.
Multilevel disadvantages: More logic levels → longer critical path; harder to optimize systematically; less predictable timing.
f = abd + abe + acd + ace. Factor using algebraic division:
= a(bd + be + cd + ce) = a·(b+c)(d+e)
Extract t₁ = (b+c), t₂ = (d+e):
f = a · t₁ · t₂ = a·(b+c)·(d+e)
Original: 4 product terms × 3 literals = 12 literals + 1 OR gate.
Factored: 2 OR gates + 3-input AND gate (or 2 two-input ANDs) = 3 gates, 6 literals. Significant saving.
ROBDD (Reduced Ordered Binary Decision Diagram): A directed acyclic graph where each internal node is labeled with a variable, has two children (low=0, high=1), and the variable ordering is consistent along all root-to-leaf paths.
Two reduction rules:
Canonical form importance: For a fixed variable ordering, the ROBDD is unique for each Boolean function. This enables O(1) equality checking (same pointer = same function) and efficient BDD operations.
Technology mapping transforms a technology-independent (generic gate) netlist into one using cells from a specific cell library (standard cells for ASIC, or LUTs for FPGA).
Steps:
The result is a mapped netlist with specific cell instances rather than generic gates.
Mux f = s·d₁ + s̄·d₀. Variables: s, d₀, d₁.
Ordering 1: s < d₀ < d₁
Root: s s=0: d₀ (output = d₀) s=1: d₁ (output = d₁) BDD has 3 nodes: s, d₀, d₁ plus terminals 0,1.
Ordering 2: d₀ < d₁ < s
Root: d₀; must branch on d₁ then s → more nodes needed. Effectively 7 nodes in worst case.
Ordering 1 (s first) gives the smaller BDD because s is the "controlling" variable for the mux function.
Functional decomposition: express f(x₁..x₅) = h(g(x₁,x₂,x₃), x₄, x₅) where g is a simpler sub-function.
Identify a "bound set" (e.g., {x₁,x₂,x₃}) and "free set" ({x₄,x₅}).
For each combination of x₄,x₅, list the required function of x₁,x₂,x₃. If multiple (x₄,x₅) combinations require the same sub-function of x₁,x₂,x₃, they share the same g output column.
A decomposition exists if the number of distinct columns ≤ 2ᵏ for k bits. Extract g (3-input function) and h (3-input: g-output + x₄ + x₅).
f = Σm(0,2,3,5,7,8,10,11,14,15,24,26,27,29,31) — 5 variable function.
QM: Group by number of 1s in binary representation, then combine pairs (differ by 1 bit), then quads, etc. With 5 variables (00000–11111), systematically combine until no further reduction possible.
Pattern: minterms (0,2,8,10,24,26)=−−0−0; (3,7,11,15,27,31)=−−111; (5,7,29,31)=−10−1; etc. Full enumeration yields prime implicants from which minimum cover is selected via PI chart.
f = (a+b)(c+d)+e using NOR gates:
NOR-NOR implements POS. f' = f = (a+b)(c+d)+e.
f' = NOR(NOR(a,b), NOR(c,d), e) — using De Morgan repeatedly.
Actually: complement the function and implement complement of complement with NOR-NOR:
NOR(a,b) → nor_ab = ā·b̄ NOR(c,d) → nor_cd = c̄·d̄ NOR(nor_ab, nor_cd, ē) → f
This requires complemented inputs (ā,b̄,c̄,d̄,ē) or NOR inverters for each.
f = x̄₁x₂ + x₁x̄₃. Cubical representation — each cube is a product term represented as a ternary vector (0=complemented, 1=uncomplemented, −=don't care):
| Cube | x₁ | x₂ | x₃ |
|---|---|---|---|
| x̄₁x₂ | 0 | 1 | − |
| x₁x̄₃ | 1 | − | 0 |
The cover (set of cubes) = {(0,1,−), (1,−,0)}. Each cube represents a set of minterms: cube (0,1,−) = minterms {010, 011}={2,3}; cube (1,−,0)={100,110}={4,6}.
For a complex 5-variable function, multilevel synthesis extracts common sub-expressions:
Example: if f = abcd + abce + fgh, extract t₁=ab: f = t₁(cd+ce)+fgh = t₁·c·(d+e)+fgh. This reduces the literal count from 9 to 7 (one ab → one t₁ used twice, saving literals).
f = Σm(4,7,8,11) + D(12,15). K-map (4-var):
x₃x₄ x₁x₂ 00 01 11 10 00: 0 0 0 0 01: 1 0 1 0 11: d 0 d 0 10: 1 0 1 0
Groups: {4,8,12d}=x̄₃x̄₄? No, 4=0100,8=1000,12=1100 → differ in x₁,x₂. {4,12d}=x̄₃x̄₄·x₂? {4,8}=x̄₂x̄₄? No..
Minterms: 4=0100, 7=0111, 8=1000, 11=1011, DC 12=1100, 15=1111.
Groups: {4,12}=?x100; {7,15d}=?111; {8,12d}=1?00; {11,15d}=1?11.
Min SOP: f = x̄₃x̄₄ + x₃x₄ (covers 4,8,12dc,0? — check). Recheck: {4,8,12,0}: x₃=0,x₄=0 → covers 0,4,8,12 — but 0 not in f. Use {4,12}: x₂=1,x₃=0,x₄=0→x₂x̄₃x̄₄; {8}: x₁=1,x₂=0,x₃=0,x₄=0→x₁x̄₂x̄₃x̄₄; {7,15d}: x₂x₃x₄; {11,15d}: x₁x₃x₄.
Min SOP: x₂x̄₃x̄₄ + x₁x̄₂x̄₃x̄₄ + x₁x₃x₄. Factored: (x₂+x₁x̄₂)x̄₃x̄₄ + x₁x₃x₄ = (x₁+x₂)x̄₃x̄₄ + x₁x₃x₄.
Cubical representation definitions:
In Quine-McCluskey, a "check mark" (✓) is placed next to a minterm entry in a group when it has been successfully combined with another minterm. Any row without a check mark at the end is a prime implicant (cannot be combined further).
PI chart selection:
f = x₁x₂ + x₃x₄ + x₅x₆.
Interleaved ordering (x₁,x₂,x₃,x₄,x₅,x₆): Each pair xᵢxᵢ₊₁ is adjacent; BDD can share the AND-structure for each pair compactly. BDD size = O(n) nodes.
Grouped ordering (x₁,x₃,x₅,x₂,x₄,x₆): Each product term's variables are separated; the BDD cannot share structure between the first halves — exponential blowup possible. BDD size = O(2^(n/2)).
General rule: For a function that is a sum of independent pairs, interleaving the pair members minimizes BDD size. Variable ordering is NP-hard to optimize optimally; dynamic variable reordering heuristics (sifting) are used in practice.
PI chart: columns = minterms, rows = prime implicants. Mark each minterm covered by each PI.
Find essential PIs: columns with only one mark → that PI is essential. Select all essential PIs, remove covered minterms, then use Petrick's method or greedy selection for remaining.
For f from 8.5: PI -00- covers {0,1,8,9}; if any of 0,1,8,9 is only covered by -00-, it's essential. Continue until all minterms covered. The minimum cover = essential PIs + minimum additional PIs for uncovered minterms.
Carry function for n-bit adder: cₙ = f(a₁,b₁,...,aₙ,bₙ,c₀).
Interleaved ordering (a₁,b₁,a₂,b₂,...aₙ,bₙ): BDD size is polynomial O(n) — each carry bit depends on its and lower bits' a,b pairs; BDD can share nodes efficiently.
Grouped ordering (a₁,...,aₙ,b₁,...,bₙ): BDD size grows exponentially O(2ⁿ) because cₙ depends on all aᵢ and bᵢ in a way that requires exponentially many nodes when variables are split into two separate groups.
Lesson: variable ordering dramatically affects BDD size; interleaved is optimal for adder carry.
f = Σm(0,4,8,13,14,15) with uncomplemented inputs only (no inverters).
Without complements, we can only form product terms using uncomplemented variables. Minterms requiring all-1 variables are easy; minterms with 0s require complements.
Minterm 0 = x̄₁x̄₂x̄₃x̄₄ requires all complements → not realizable with uncomplemented inputs only.
For this constraint: use NAND/NOR decomposition, or accept that some minterms (0,4,8) are not directly implementable; consider alternative: implement f̄ using uncomplemented inputs and invert, or use XOR-based form.
If only minterms 13,14,15 covered: f_partial = x₁x₂x₃x̄₄ + x₁x₂x̄₃x₄ + x₁x₂x₃x₄ = x₁x₂(x₃+x₄). Add minterms 0,4,8 by alternative decomposition.
f = ab+cd. Cell library example: {INV, NAND2, NOR2, AND2, OR2, AOI21 (AND-OR-Invert)}.
Minimize area: f = ab+cd = NAND(NAND(a,b), NAND(c,d)). Uses 3 NAND2 gates (area = 3 units). Alternatively: AND2+AND2+OR2 = 3 cells but likely larger area.
Minimize delay: f can be implemented as single AOI21 cell if available: ̄f = AOI(a,b,c,d) then INV. Or NAND-NAND: 2 levels × t_NAND delay — same as any 2-level implementation.
Circuit: t₁=a+b, t₂=c·d, t₃=t₁·e, f=t₂+t₃ = c·d + (a+b)·e
Analysis: f = cd + ae + be. Literal count = 5. Gate count = AND(c,d) + AND(a,b→OR) + 2 ANDs + OR = 4 gates.
Re-factor: f = cd + e(a+b). This is already factored. Alternative: no simpler form exists since cd and e(a+b) share no common factor.
Could also write f = c·d + (a+b)·e — same. Verify: expanding gives cd+ae+be which matches original enumeration.
Note $f_2 = (x_1+x_2)(x_3+x_4)$ already factored. Examine $f_1$: it equals $f_2$ minus the case $x_1=x_2=0$ where it should be $\overline{x_1}\overline{x_2}(x_3+x_4)$ — i.e. $f_1 = (x_1+x_2)(x_3 x_4) + \overline{x_1}\overline{x_2}(x_3+x_4)$. Both share the subexpression $g = x_3+x_4$ and $h = x_1+x_2$. Total cost: $g, h$ shared; then $f_2 = h\cdot g$, $f_1 = h\cdot x_3 x_4 + \overline{h}\cdot g$. Sharing $g$ and $h$ saves several gate-inputs versus building each output independently.
Group on the K-map columns: in columns 01 ($\overline{x_3}x_4$) and 10 ($x_3\overline{x_4}$) the function takes value $x_1 \oplus \overline{x_2}$; in columns 00 and 11 it is 0. So $g(x_3, x_4) = x_3 \oplus x_4$ (asserts in the two middle columns). Then $f = g \cdot (x_1 \oplus \overline{x_2})$ — a 2-level XOR product, two XORs feeding one AND. Cost: 3 gates with 2 inputs each, vs four 3-input ANDs in the SOP form.
By inspection of the K-map, when rows where $g(x_1,x_2,x_5)=0$ all have one column pattern and rows where $g=1$ have a different pattern, define $g$ as the row indicator. Similarly $k(x_3,x_4)$ as the column index. The book's example produces $f = g \oplus k$ or $f = g\overline{k} + \overline{g}k$. Cost: 3 small subfunctions + 1 XOR ≈ 8 inputs, versus minimum SOP ≈ 18 inputs — a substantial saving.
5-NAND XOR (standard):
g1 = NAND(x1,x2) g2 = NAND(x1,g1) g3 = NAND(g1,x2) f = NAND(g2,g3)(That's actually 4 NANDs.) The 4-NAND version uses $g = \text{NAND}(x_1, x_2)$ shared among the next stages:
g = NAND(x1, x2) f1 = NAND(x1, g) f2 = NAND(x2, g) f = NAND(f1, f2)This realizes $x_1 \oplus x_2$ in 4 NAND gates.
Bubble-pair insertion: AND-OR levels alternate, so place a bubble pair at every signal between adjacent levels. After absorbing each bubble into the gate it touches: every AND becomes NAND and every OR (with bubbles on inputs and output) becomes NAND too — yielding (a) all-NAND. For (b) all-NOR: insert bubble pairs differently — turn the AND gates into NORs (with bubbled inputs) and the ORs into NORs (with bubbled outputs absorbed). The structure flips between AND-OR-AND-OR and OR-AND-OR-AND topology.
Label intermediate nodes $P_1, P_2, P_3$ inside Figure 8.12. Trace from output back: $f = P_1 + P_2$; $P_1 = x_1 P_3$, $P_3 = x_2 + x_3$; $P_2 = \overline{x_1} x_4$. So $f = x_1(x_2+x_3) + \overline{x_1}x_4 = x_1 x_2 + x_1 x_3 + \overline{x_1}x_4$. SOP cost: 3 ANDs + 1 OR = 4 gates, 9 inputs.
Replace the XOR/XNOR gates of Example 8.3 by their SOP equivalents ($x \oplus y = x\overline{y}+\overline{x}y$). Label internal nodes and trace: each leg of the multilevel circuit collapses to one of the original product terms. The expression reconstructs to $f = $ the original specification — verifying functional equivalence.
Label $P_1=$ NAND($x_1, x_2$), $P_2=$ NAND($x_3, x_4$), $P_3=$ NAND($P_1, P_2, x_5$). Apply DeMorgan to each: $P_1 = \overline{x_1 x_2}$, $P_2 = \overline{x_3 x_4}$, but feeding into the next-level NAND with bubble absorption gives an OR-of-AND structure. The textbook reduction yields $f = x_1 x_2 x_4 + \overline{x_3} x_4 + x_5$ after simplification.
Label $P_1\dots P_4$ at gate outputs. NAND outputs equal $\overline{\prod}$; NOR outputs equal $\overline{\sum}$. Propagate bubbles by DeMorgan until all bubbles cancel. After algebraic simplification: $f = x_5(x_1 + x_2 + \overline{x_3} + x_4) = x_1 x_5 + x_2 x_5 + \overline{x_3} x_5 + x_4 x_5$. ✓
Build a small truth table over $(x_1, x_2)$ for each terminal of the original BDD, recording where edges from the swap-boundary go. Then re-create the BDD with $x_1$ above $x_2$. Many edges merge if the function is symmetric in those variables. Result is a BDD of equivalent or smaller size depending on the variable ordering.
Shannon on $x_1$: $f|_{x_1=0} = x_2 x_3 + x_2 x_4 + \overline{x_2}\overline{x_3}\overline{x_4}$, $f|_{x_1=1} = x_3 + x_4 + x_2 x_3 + x_2 x_4 = x_3 + x_4$. Shannon $f|_{x_1=1}$ on $x_2$: same $x_3+x_4$ either way (independent of $x_2$). For $f|_{x_1=0}$ on $x_2$: $x_2=1$ gives $x_3+x_4$; $x_2=0$ gives $\overline{x_3}\overline{x_4}$. BDD: root $x_1$, both children point to an $x_2$ node (or merge if equal). The $x_3+x_4$ subgraph appears multiple times — merge it. Multilevel SOP from BDD: $f = x_1(x_3+x_4) + \overline{x_1}[\,x_2(x_3+x_4) + \overline{x_2}\overline{x_3}\overline{x_4}\,]$.
Quine-McCluskey on $\sum m(0,2,5,6,7,8,9,13) + D(1,12,15)$: Group minterms by 1-count, combine adjacent ones across iterations until no more merges. Resulting prime implicants (after marking unmerged): typically $\{\overline{x_2}\overline{x_3}\overline{x_4}, x_2 x_4, x_2 x_3, x_1\overline{x_3}, x_1 x_4\}$ or similar. Build cover table over essential minterms, eliminate dominated rows/columns, and select. Minimum cover (one valid result): $f = x_2 x_4 + x_2 x_3 + \overline{x_2}\overline{x_3}\overline{x_4} + x_1\overline{x_3}$.
Apply tabular method: prime implicants include several 2-cubes and 1-cubes, no essential primes. Cover table has no row or column dominance. Use Petrick's method or branching: select an arbitrary prime, recurse, choose minimum-cost cover. Typical result: $f = $ 4 prime implicants of 2 literals each, gate-input cost $\approx 12$.
$\star$-product pairs cubes that differ in exactly one position, replacing that position with x. $C_0 = \{000, 001, 010, 011, 111\}$. $C_1$: $000\star 001 = 00x$, $000\star 010 = 0x0$, $001\star 011 = 0x1$, $010\star 011 = 01x$, $011\star 111 = x11$. $C_2$: $00x\star 01x = 0xx$. No further merges. After removing covered cubes, primes are $\{0xx, x11\}$. ✓
Apply $\star$-product to $C_0=\{0101,1101,1110,011x,x01x\}$. $0101\star 1101 = x101$. $011x \star x01x$: differ in two positions → no merge. $0101\star 011x = 01x1$ (differ in bit 1, x merge). $1101\star 1110$: two diff → no. $C_1 = \{x101, 01x1, 1110, 011x, x01x\}$. Continue: $x101 \star 01x1$: two diff, no. Final primes: $x101, 01x1, 1110, 011x, x01x$. Algebraic: $\{x_2\overline{x_3}x_4, \overline{x_1}x_2 x_4, x_1 x_2 x_3\overline{x_4}, \overline{x_1}x_2 x_3, \overline{x_2}x_3\}$.
The $\#$-operation: $A \# B$ returns minterms in $A$ not in $B$. For each prime $p$, compute $p \# (P \setminus \{p\})$ over the on-set. If non-empty, $p$ is essential (covers a minterm no other prime covers). For $\{x11, 0xx\}$ on the function of Figure 2.56: $x11 \# 0xx = 111$ (non-empty) → essential. $0xx \# x11 = \{000, 001, 010\}$ (non-empty) → essential. Both primes are essential.
For each $p \in P$, compute $p \# (P\setminus\{p\})$. $x01x \#$ rest: $\{0010, 0011\}$ → essential. $x101$: subset of $x101$ alone — empty after removing self → not essential. $01x1$: $\{0101\}$ — essential. $0x1x \#$ rest: $\{0010, 0011\}$ — already covered by $x01x$ → check: actually $0x1x \# x01x = \{0110, 0111\}$ — essential. $xx10$: most cells covered by others → not essential. Essentials: $x01x, 01x1, 0x1x$. (Exact set depends on minterm coverage in the function.)
Apply $\star$-product to find primes (many cubes after several iterations on the 5-var function). Use $\#$-op to identify essentials covering minterms unique to each. After essentials are chosen, branch on the remaining cover table to minimize cost. Final minimum SOP (illustrative): $f = \overline{x_1}\overline{x_2}\overline{x_3}\overline{x_4} + x_4 x_5(x_1\oplus x_2) + x_2 x_3 x_4 + \overline{x_3}\overline{x_5}$ (or equivalent ~4-term cover, depending on don't-care assignments).
Straightforward 3-LUT mapping: LUT1 = $x_1 x_2\overline{x_4}$, LUT2 = $\overline{x_2} x_3 x_4$, LUT3 = $x_1 x_2 x_3$, LUT4 = OR of these — total 4 LUTs. Decomposition: factor $x_1 x_2(\overline{x_4} + x_3) = x_1 x_2(x_3 + \overline{x_4})$ in one 3-LUT, $\overline{x_2}x_3 x_4$ in another, OR them in a third — total 3 LUTs.
Group: $f = (x_1+x_2)(x_3+x_4) + \overline{x_1}\overline{x_2}\overline{x_3}\overline{x_4}$. Use $\overline{x_1}\overline{x_2}\overline{x_3}\overline{x_4} = (\overline{x_1+x_2})(\overline{x_3+x_4}) = \overline{x_3}\overline{x_4}\cdot(\overline{x_1+x_2}) = \overline{x_3}\overline{x_4}$ when $x_1+x_2=0$. Net: $f = \overline{x_3}\overline{x_4} + (x_1+x_2)(x_3+x_4)$. ✓ (Identity verified by expansion: when $x_3+x_4=0$, only the first term remains; when $x_3+x_4=1$, the AND term covers it.)
Shannon on $x_1$: $f|_{x_1=0} = x_2 x_4$, $f|_{x_1=1} = x_3 + x_2 x_4$. Shannon $f|_{x_1=1}$ on $x_2$: $f|_{x_1=1, x_2=0} = x_3$, $f|_{x_1=1, x_2=1} = x_3 + x_4$. BDD: root $x_1$. Right branch ($x_1=1$) goes to $x_2$ node with two cofactors (one is $x_3$, one is $x_3+x_4$). Left branch ($x_1=0$) goes to a node testing $x_2$ — only $x_2 x_4$ branch. Continue with $x_3$ then $x_4$ levels, merging duplicate subgraphs.
Track the 4 entries of the truth table over $(x_2, x_3)$ at each subgraph crossing the swap. After swapping inner $x_2$ and $x_3$, build a new BDD with root $x_1$, then $x_3$, then $x_2$, then $x_4$. Some subgraphs merge differently. The new BDD usually has the same node count for symmetric functions, but variable ordering can change BDD size dramatically in general.
Minterms of $f$: $\sum m(0,1,2,4,5,8,9,13) + D(9,12,14)$ (after expanding, given the SOP form). Tabular method: group by 1-count, combine. Generated cubes: $\{00x0, 0x0x, 1x01, 10x1, 1101\}$ and primes after removal of covered ones. Cover table: essentials from rows with single mark. Minimum-cost SOP: $f = \overline{x_3}\overline{x_4} + x_1 x_4 + \overline{x_2} x_4$ (as in 2.38).
$C_0$ from minterms+don't-cares. $C_1$ via $\star$-product yields 2-cubes; $C_2$ yields 3-cubes. Convergence after ~3 iterations. Resulting primes (in cube notation, $x_1 x_2 x_3 x_4$): $\{xx00, x0x1, 1x01, 1xx1\}$ — algebraically $\{\overline{x_3}\overline{x_4}, \overline{x_2}x_4, x_1\overline{x_3}x_4, x_1 x_4\}$. The book's minimum-cost cover selects a subset matching 8.35.
(a) Minimum SOP $f = \overline{x_3}\overline{x_4} + x_1 x_4 + \overline{x_2} x_4$: 3 ANDs + 1 OR, 11 gate-inputs. (b) Minimum POS $f = (\overline{x_2}+\overline{x_3})(\overline{x_3}+x_4)(x_1+\overline{x_3}+\overline{x_4})$: 3 ORs + 1 AND, ~12 inputs. (c) Multilevel: factor $f = \overline{x_3}\overline{x_4} + x_4(x_1+\overline{x_2})$: 2 levels for the second term + final OR — about 8 inputs but 3 levels deep. Trade-off: SOP is fastest (2 levels) at 11 inputs; multilevel saves area (8 inputs) at the cost of 3 levels of delay.
$h[g(x_1,x_2,x_3,x_4), x_5, x_6, x_7]$ — first LUT computes any 4-var function $g$, second computes any 4-var function $h$ of $g$ and three other vars. $f_1 = x_1 x_2 x_3 x_4 x_5 x_6 x_7$: set $g = x_1 x_2 x_3 x_4$, $h = g \cdot x_5 x_6 x_7$ — works. $f_2 = x_1+\dots+x_7$: $g = x_1+x_2+x_3+x_4$, $h = g + x_5+x_6+x_7$ — works. But e.g. $f = x_1 x_5 + x_2 x_6 + x_3 x_7 + x_4$ requires LUT2 to know all of $x_1\dots x_4$ individually — only $g$ (one bit summarizing them) is available. So no 2-LUT realization exists for this $f$.
Fundamental mode: Asynchronous circuit operating assumption where inputs change only when the circuit is in a stable state — i.e., one input changes at a time, and the circuit is allowed to settle before the next input change.
Input constraints:
These constraints prevent races and ensure well-defined behavior.
Cross-coupled NOR gates (SR latch): Q = NOR(R,Q̄), Q̄ = NOR(S,Q). Flow table (S,R inputs; stable states circled):
| State | SR=00 | SR=01 | SR=11 | SR=10 |
|---|---|---|---|---|
| Q=0 | 0 | 0 | 0 | 1 |
| Q=1 | 1 | 0 | 1 | 1 |
Stable states (circled): Q=0 with SR=00,01; Q=1 with SR=00,10. SR=11 is the forbidden input (both outputs attempt to become 0). When SR=11→00, output is indeterminate (race).
Stable state: A state where the next state equals the present state for the current input combination — the circuit has finished transitioning and no further changes occur. Y = y (feedback variable equals its driving function).
Unstable state: A state where Y ≠ y — the circuit's next state differs from its current state. The circuit will continue transitioning (may pass through multiple unstable states) until reaching a stable state. The sequence of unstable-to-stable transitions is called a transition.
Critical race: When multiple state variables must change simultaneously, and the outcome depends on which changes first. Different orderings lead to different final states — non-deterministic behavior.
Example: State 00→11 transition. If y₁ changes first: 00→10→11 (correct). If y₂ changes first: 00→01→11 (may be correct). But if 10 or 01 are stable states for the current input, the circuit may get stuck there.
Fix: Use race-free state assignment (e.g., assign codes so any transition changes only one variable) or use one-hot encoding for asynchronous circuits.
f = x₁x₂ + x̄₂x₃. Static 1-hazard occurs when both x₁x₂ and x̄₂x₃ are 0→1 simultaneously — when x₂ transitions from 1→0 with x₁=1, x₃=1: transition from x₁x₂=1 to x̄₂x₃=1, momentarily neither term is 1.
Hazard condition: At x₁=1, x₃=1, as x₂ transitions. The two minterms 110 and 011 are not in the same K-map group.
Eliminate: Add the consensus term x₁x₃ (covers the adjacent minterms 110 and 011, bridging the gap):
f = x₁x₂ + x̄₂x₃ + x₁x₃ (hazard-free)
Static 1-hazard: Output should remain 1 but momentarily glitches to 0 during an input transition.
Static 0-hazard: Output should remain 0 but momentarily glitches to 1 during an input transition. Occurs in POS implementations; eliminated by adding extra sum terms.
Dynamic hazard: Output should change once (0→1 or 1→0) but transitions 3 or more times (e.g., 0→1→0→1). Caused by multiple paths of different delays when there are reconvergent paths. More complex to eliminate — requires removing the reconvergent topology or balancing delays.
Compatible states: Two states sᵢ and sⱼ are compatible if (1) they have the same output for all input combinations where both are defined, and (2) their successor state pairs are also compatible (or both undefined) for all inputs.
Merger diagram: A graph where nodes are states and undirected edges connect compatible state pairs. Groups of mutually compatible states can be merged into a single state.
Process: find all compatible pairs (from implication table), draw merger diagram, find maximum compatible groups (cliques), select minimum number of merged states that covers all original states.
f = Σm(1,2,3,5,7). K-map groups: {1,3,5,7}=x₃; {2,3}=x̄₁x₂; {3,7}=x₂x₃.
Min SOP = x₃ + x̄₁x₂. Check for static 1-hazard: adjacent 1s not in same group?
Minterms 2(010) and 3(011): covered by x̄₁x₂ and x₃ respectively. Transition m2→m3 (x₃: 0→1) with x₁=0,x₂=1: initially x̄₁x₂=1, finally x₃=1. During transition: momentarily neither? No — x̄₁x₂=1 throughout (x₁,x₂ constant). No hazard on this path.
Check m5(101)→m7(111): x₂:0→1 with x₁=1,x₃=1. Before: x₃=1(covers). After: x₃=1 and x₂x₃=1. No hazard.
Hazard exists between m2(010) and m3(011): x̄₁x₂ covers m2; x₃ covers m3. Adding x̄₁x₂x₃ as bridge or the extra PI x₂x₃: f = x₃ + x̄₁x₂ + x̄₁x₃ is hazard-free (extra term bridges).
Y₁ = x̄₁(x₂+y₁); Y₂ = x₁(x̄₂+y₂). Build flow table: states = (y₁y₂), inputs = (x₁x₂).
| y₁y₂ | x₁x₂=00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 00 | 00→00 | 01→0,0 | 00 | 00 |
| 01 | 01→0,0 | 01→01 | 01→0,1 | 00 |
| 10 | 10→1,0 | 10→1,0 | 10→1,1 | 10→10 |
| 11 | 10 | 11→1,1 | 11→11 | 10 |
Stable states (Y=y): (00,x=00), (01,x=01), (10,x=10), (11,x=11). Transitions determined by evaluating Y₁,Y₂ for each (y₁y₂,x₁x₂) combination.
| Aspect | Synchronous | Asynchronous |
|---|---|---|
| Speed | Limited by clock period | Can be faster (no clock overhead) |
| Power | Clock tree always switching | Activity-dependent; potentially lower |
| Testability | Easy: scan chains, controllable states | Difficult: timing-dependent behavior |
| Design effort | Well-understood; rich CAD support | Complex; races, hazards, limited tools |
| Noise | Simultaneous switching noise at clock edges | Spread out; less EMI |
Essential hazard: An inherent hazard in an asynchronous sequential circuit that cannot be eliminated by adding or removing gates. It arises when a single input change causes the circuit to cycle through multiple unstable states, and the required ordering of state variable changes cannot be guaranteed by pure combinational logic delay matching.
Differs from static/dynamic hazards: Static/dynamic hazards occur in combinational logic and can be eliminated by adding redundant terms. Essential hazards are sequential — they result from the interaction between feedback loops and input changes.
Can gates fix it? Not by adding redundant terms. Instead, essential hazards require introducing deliberate delays (e.g., RC circuits or buffer chains) on specific feedback lines to ensure the proper sequencing of state variable updates.
Muller C-element: output C changes to match inputs only when all inputs agree; holds previous value otherwise.
For 2-input C-element: C(t+1) = AB + (A+B)·C = AB + AC + BC (majority function of A, B, C_current).
Implementation using async techniques:
Y = A·B + Y·(A + B) = A·B + Y·A + Y·B
Circuit: two AND gates for AB and YA and YB; one OR gate for Y. This is a hazard-free equation since the feedback Y ensures that C holds its value when A≠B.
The C-element is fundamental to handshaking asynchronous protocols (Micropipelines, NULL convention logic).
SR latch synthesis from primitive flow table:
Primitive flow table: 2 rows (Q=0, Q=1), 4 input columns (SR: 00,01,10,11). Assign y=Q.
Excitation Y (next state) from the flow table:
Y = S + R̄·y (standard SR latch equation, SR≠11 assumed).
Output Q = y (state variable directly).
Implementation: Y = NOR(R, NOR(S,Y)) — cross-coupled NOR gates realize this feedback equation naturally.
State reduction on an async flow table:
Example: if A and C have same outputs and their successors under all inputs are also compatible pairs → A≡C, merge to a single state. Reduces the flow table rows, simplifying the implementation.
One-hot state encoding for 3-state async FSM: one SR latch (or FF) per state.
States S₁,S₂,S₃ → y₁,y₂,y₃ with only one yᵢ=1 at a time.
Transitions are controlled by setting the target state latch and resetting the current state latch. Since only one variable changes per transition (the one being set or reset), one-hot is inherently race-free for asynchronous circuits.
Advantage: no critical races. Disadvantage: uses more state variables (3 instead of 2 for 3 states).
Three challenges of async circuit design:
Primitive flow table: One row per stable state. Each input combination in that row has exactly one stable state (the current state) and shows the next state for each input combination. Rows are not merged — one unique stable state per row.
Merged flow table: Multiple compatible stable states combined into one row, reducing the number of state variables (fewer FFs needed).
Why merging is needed: A primitive flow table with N stable states requires ⌈log₂N⌉ state variables or N state variables (one-hot). Merging compatible states reduces the number of rows (and thus state variables), yielding a more compact implementation and simplifying the excitation equations.
Dynamic hazard: A signal that should make a single transition (0→1 or 1→0) instead makes three or more transitions (0→1→0→1 or 1→0→1→0) due to multiple paths with different delays in the circuit.
Conditions: Requires at least three distinct paths from input to output with different delays, and reconvergent fan-out where the paths merge.
Elimination: Dynamic hazards cannot be eliminated simply by adding terms (unlike static hazards). They require restructuring the circuit — either eliminating reconvergent paths, ensuring that no competing paths exist, or using hazard-free logic design techniques such as implementing in a single fan-in stage.
"Don't happen" input combinations in async design are input state combinations that cannot physically occur — either due to circuit constraints or by design convention.
They appear as blanks or "–" in the flow table cells. For example, if inputs are R and S of a latch, the combination R=1, S=1 may be defined as a "don't happen" condition (never applied simultaneously).
These are treated as don't-care conditions during excitation equation minimization — may be assigned 0 or 1 freely to minimize logic cost — provided the circuit designer guarantees they never actually occur.
Async arbiter for two requestors r₁, r₂. Must grant mutual exclusion — only one grant active at a time.
The arbiter is an inherently asynchronous element. A simple implementation uses a cross-coupled set-reset circuit:
g₁ = r₁ AND NOT(g₂) // can only grant 1 if 2 not granted g₂ = r₂ AND NOT(g₁)
But this has a combinational loop (metastable when both r₁,r₂=1 simultaneously). A proper async arbiter uses a Muller C-element or bistable circuit with metastability resolution time built in. The circuit is not truly deterministic when requests arrive simultaneously — this is fundamental (proven by Lamport).
Race-free state assignment for 4-state async FSM. States A,B,C,D need binary codes such that any required transition changes only 1 bit (to avoid critical races).
Draw the transition diagram. Required transitions: e.g., A→B, B→C, C→D, D→A.
Assign codes so each transition differs by 1 bit (Gray code sequence):
A=00, B=01, C=11, D=10 (standard Gray code).
If transitions A→C or B→D are needed (2-bit change), add intermediate states or use shared-row technique (extra state codes that serve as transition intermediaries).
Async vending machine controller: primitive flow table has one row per stable state-input combination.
Example states: IDLE, GOT_NICKEL, GOT_DIME, DISPENSE. Input variables: N (nickel inserted), D (dime inserted). Output: dispense, change.
Primitive flow table: each row is a unique (state, input) pair with a stable state. Adjacent rows that are compatible are merged using the merger diagram. After merging, assign race-free state codes and derive excitation equations for each state variable using K-maps.
For a 5-state async FSM, a race-free assignment may not exist with ⌈log₂5⌉=3 bits if required transitions involve multi-bit changes.
Shared-row technique: Add "extra" (intermediate) states between problematic transitions. The extra states have unique codes that differ by 1 bit from both the source and destination, converting a 2-bit transition into two consecutive 1-bit transitions.
For example, transition A(001)→C(110) requires 3-bit change. Add intermediate state X(011) or (101): A→X→C, each step changes 1 bit. X is an unstable state for the relevant input — the circuit passes through X instantaneously on its way to C.
For a 6-row flow table (6 stable states):
With 6 states, ideally merge to 2–3 rows (2 state variables), though constraints may require more.
Proof that two-level SOP has no static 0-hazards:
A static 0-hazard occurs when the output should remain 0 but briefly goes to 1. In a two-level SOP, the output is 1 only when at least one product term is 1. If the output should remain 0, all product terms must be 0 both before and after the transition. During the transition, each product term is a static function of individual gate delays — a product term can only glitch 0→1 if both its inputs are 1 simultaneously, but that would mean the function value is actually 1, not 0. Therefore, a product term cannot produce a 1-glitch when both its pre- and post-transition values are 0. Hence no static 0-hazard in SOP. ∎
When do static 0-hazards appear? In two-level POS (product of sums) implementations — hazards in the sum terms can produce 0-glitches on the output when the output should remain 0.
Treat $C, D$ as inputs and $y$ (latch state) as the only state variable. Next-state: $Y = CD + \overline{C}y$ (when $C=1$, $Y=D$; when $C=0$, $Y=y$). Excitation table over $(C,D,y)$: $Y = 0$ for $(00,0),(01,0),(10,0),(11,0)$; $Y=1$ for $(00,1),(01,1),(10,1),(11,1)$ — i.e. stable when $C=0$ regardless of $D$, transparent when $C=1$. Flow table: 2 stable states $S_1$ ($y=0$) and $S_2$ ($y=1$); $S_1\to S_2$ when $CD=11$; $S_2\to S_1$ when $CD=10$.
Two state variables $y_m, y_s$. $Y_m = CD + \overline{C}y_m$ (master latch transparent when $C=1$); $Y_s = Cy_m + \overline{C}y_s$ (slave transparent when $C=0$, copies $y_m$ when $C=1$ — actually, slave copies when $C=0$, so $Y_s = \overline{C}y_m + Cy_s$). Use the textbook's exact form. Four states $S_1\ldots S_4$ correspond to $(y_m, y_s) \in \{00, 01, 10, 11\}$. Excitation table tracks each transition under input changes; flow table shows the master loads on $C=1$, slave outputs on $C=0$ — i.e. proper edge-triggered behavior.
Tabulate $(Y_1, Y_2)$ for each $(w_1, w_2, y_1, y_2)$ combination using the given equations. Stable states (where $Y_i = y_i$): identify rows with no change. Output $z = y_1\overline{y_2}$. States: $(00)$ idle, $(01)$ first coin, $(10)$ second coin (vend pulse), back to $(00)$. On input $w_1$ (a coin), the FSM advances; on $w_2$ (a separate coin) likewise. State diagram: 3 reachable states, $z=1$ in state $10$ — vending machine that dispenses after two coins.
Four states $A, B, C, D$ (states between rising and falling edges of even/odd pulses). $A$ ($w=0$, $z=0$, even count), on $w$ rising → $B$; $B$ ($w=1$, $z=1$, odd count), on $w$ falling → $C$; $C$ ($w=0$, $z=1$), on $w$ rising → $D$; $D$ ($w=1$, $z=0$), on $w$ falling → $A$. Race-free assignment: $A=00, B=01, C=11, D=10$ (Gray code, single-bit changes around the loop). Next-state expressions: $Y_1 = w y_2 + y_1 w$ (etc., derive from K-map), output $z = y_2 \oplus y_1$ (or similar simple expression).
Eight states (rising and falling edge pair for each of 4 counts). Use a 3-bit Gray-code assignment so each transition changes one bit: $S_0 = 000, S_1 = 001, S_2 = 011, S_3 = 010, S_4 = 110, S_5 = 111, S_6 = 101, S_7 = 100$. Next-state equations $Y_3, Y_2, Y_1$ derived from the cycle $S_0\to S_1\to S_2\to\dots\to S_7\to S_0$ on alternating $w$ edges. Outputs $z_2 z_1$ encode the count modulo 4: $z_2 z_1 = 00$ in $\{S_0, S_1\}$, $01$ in $\{S_2, S_3\}$, $10$ in $\{S_4, S_5\}$, $11$ in $\{S_6, S_7\}$.
States: idle $A$, granted-1 $B$, granted-2 $C$. Diagonal transition $B \to C$ when $r_1$ falls and $r_2$ rises simultaneously is unsafe. Insert state $D$ (transient): $B \to D$ on $\overline{r_1}$, $D \to C$ on $r_2$. State assignment $A=00, B=01, C=11, D=10$ (Gray code). Outputs (Moore): $g_1=1$ in $B$ only; $g_2=1$ in $C$ only; $D$ has $g_1=g_2=0$ (safe value during transient).
Primitive flow table has many rows (one per input combination per state). Apply partitioning: group rows with the same outputs and same next-state pattern. Then build the merger diagram showing which rows have compatible (i.e., agreeing-where-defined) entries; merge maximal cliques. Result: 4 rows representing states $A$ (idle, 0¢), $B$ (5¢), $C$ (10¢ → dispense), $D$ (transient if needed). Output asserts in $C$.
Apply partitioning first: split the 12 rows by output equivalence, then by next-state equivalence iteratively. Once partitioning stabilizes, draw the merger diagram (vertices = remaining rows, edges = compatibility). Find a clique cover; merge each clique into one row. After 2-3 reductions the table typically shrinks from 12 → 6 → 4 rows.
Moore-style merging: outputs must agree row-by-row. Mealy-style merging: outputs may differ on transient transitions, so more rows are compatible. Apply partitioning first. Then compute Moore merger graph and Mealy merger graph. Mealy yields fewer states because flexibility on outputs lets more row-pairs merge; e.g., 8 rows reduce to 5 (Moore) or 3 (Mealy). Mealy wins when outputs depend strongly on input transitions.
Partition the 8 rows by output-equivalence. Build Mealy merger graph; identify a 2-clique that covers all rows (only possible if all rows are pairwise compatible after relaxing on don't-care entries). Merge to 2 rows. The reduced FSM has only 2 states; the input-output sequence is recovered by the Mealy outputs differing on transitions.
On a 2-cube vertices $00, 01, 10, 11$. Assignment $A=00, B=01, C=10, D=11$: transition $B\to C$ goes from $01$ to $10$ — diagonal (Hamming distance 2). Alternative $A=00, B=01, C=11, D=10$: transitions $A\to B$ ($00\to 01$, dist 1), $B\to C$ ($01\to 11$, dist 1), $C\to D$ ($11\to 10$, dist 1), $D\to A$ ($10\to 00$, dist 1). All single-bit changes — race free.
3 states need at least 2 state variables. Vertices on 2-cube: $00, 01, 10, 11$. With 3 used, the unused vertex isolates one transition pair. Some pair of states must lie on a diagonal — Hamming-distance-2 — causing race. Add unstable state $D$ to break the diagonal: transitions $B\to D\to C$ each Hamming distance 1. Modify the flow table accordingly: $D$ has next-state $C$ on the input that previously took $B\to C$, with output set to a safe value (e.g., both grants 0).
Flow table has 7 stable states across 4 logical states and 4 input columns. Relabel each stable instance separately ($S_1@col_1, S_1@col_2,\dots$) to enumerate 7 placements. Draw a transition diagram between the 4 logical states; identify the unspecified entry as a free placement. Use it to break a diagonal transition by routing through an empty cell. State assignment maps the 4 states onto a 2-cube without diagonals. Specify Mealy outputs in unstable cells to match the destination's stable output.
4 states all pairwise reachable → in 2-cube some transitions must be diagonal. Add 3 unstable states $E, F, G$ on the 3-cube to break each diagonal. On a 3-cube (8 vertices), place 4 logical states at vertices forming an independent-set pattern (Hamming distance 2 apart), then route each diagonal through one of $E, F, G$. The new flow table has 7 rows (4 logical + 3 unstable), and the excitation table for $Y_1, Y_2, Y_3$ implements all transitions with single-bit changes.
Replace each of the 4 states with a pair (e.g., $A\to A_1, A_2$). Now 8 states fill the 3-cube. Arrange transitions so each move is single-bit Hamming. The resulting transition diagram has no diagonal paths — every original transition between logical states maps to a 2-step path through one of the new vertex pairs. Modify the flow table to specify the second member of each pair as a transient with the same outputs.
When $C$ falls from 1 to 0, the master output $Y_m$ should hold the captured $D$ value, but the unequal delays on the $CD$ vs $\overline{C}y_m$ paths cause a momentary $Y_m=0$ glitch. The slave samples this glitch and settles into the wrong state. Fix: add the redundant prime implicants $D y_m$ to $Y_m$ (covering the adjacent 1-cells across the $C$ boundary) and $y_m y_s$ to $Y_s$. The hazard-free expressions: $Y_m = CD + \overline{C}y_m + D y_m$, $Y_s = Cy_m + \overline{C}y_s + y_m y_s$.
A static-1 hazard arises when an input change causes the only product term covering a minterm to drop momentarily before another term takes over. Only product terms covering pairs of adjacent 1s in the K-map (with the changing variable) are needed for hazard-free coverage — not all primes. For Figure 9.64, the changing-variable adjacencies are covered by $\{x_1\overline{x_3}, \overline{x_2}\overline{x_3}, x_3 x_4\}$. So $f = x_1\overline{x_3} + \overline{x_2}\overline{x_3} + x_3 x_4$ is hazard-free.
When $x_1=x_3=0$ and $x_2$ rises, sum term $(x_1+\overline{x_2}+\dots)$ drops from 1 to 0; another sum term takes over with delay, producing a $0\to 1\to 0$ glitch on the AND output. Fix: add the sum term that covers the 0-cell pair across the $x_2$ boundary — typically $(x_1+\overline{x_2}+x_3)$ (or analogous, using POS K-map adjacent zeros). The hazard-free POS includes that extra factor.
Substitute the given expressions into the excitation table over $(w_1, w_2, y_1, y_2)$. Identify stable cells (where $Y_i = y_i$). Label states. Some transitions cause both $y_1$ and $y_2$ to change simultaneously (race). For each, check if the destination is the same regardless of which variable changes first — these are safe races. The output $z=y_2$ tracks which state the FSM is in.
Plot $Y_1$ and $Y_2$ on K-maps. Identify 1-cells that are adjacent in the K-map but covered by different product terms. The pair-coverage rule says a hazard-free SOP must cover every such adjacency with some single product term. If a prime implicant is missing that covers a needed adjacency, add it. For $Y_1$, add e.g. $w_1 y_1$ as the missing implicant. For $Y_2$ similarly. Resulting hazard-free expressions include all minimum primes plus the extra ones for adjacency coverage.
4 states — $S_0$ (idle, even count), $S_1$ (just saw rising edge of 1st pulse), $S_2$ (after 1st pulse, even count, no output), $S_3$ (just saw rising edge of 2nd pulse, output asserted). Output $z=1$ in $S_3$ and during its falling edge. Race-free Gray code: $S_0=00, S_1=01, S_2=11, S_3=10$. Derive $Y_1, Y_2$ from K-map; output $z = y_1\overline{y_2}$ (or similar).
Apply partitioning: split rows by output, then refine by next-state equivalence. Suppose 8 rows → 4 after partitioning. Build Moore merger diagram; find compatible-row cliques. After merging, ~4 logical states remain. State assignment on a 3-cube: pick vertices with Hamming distance 1 between transitioning pairs (some require unstable intermediate states). Derive excitation table from the assigned next-state values.
Plot $f$ on a 5-var K-map ($\sum m + D$). Find minimum prime cover: a typical cover is $\{\overline{x_1}\overline{x_2}x_3, x_1\overline{x_2}x_4 x_5, x_1 x_2\overline{x_4}x_5, x_1\overline{x_3}x_4\}$ (4 primes). Then check pair-of-adjacent-1s coverage: identify any adjacency not covered by a single prime and add an extra prime implicant to cover it. Final hazard-free SOP includes those extras.