## VU2 185.324

Compilation Techniques for VLIW Architectures

Dietmar Ebner ebner@complang.tuwien.ac.at Florian Brandner brandner@complang.tuwien.ac.at

http://complang.tuwien.ac.at/cd/vliw

Ebner, Brandner | Compilation Techniques for VLIWs | SS08 Slide #1

## Last Lecture

- Product life cycle
  - Product volume ("long tail" vs. "head")
- Constraints
  - Performance/power/size
  - Production costs/Development costs
  - Market/Time to market











| Example: MIPS 74K | | |
|---|---|---|
| <pre>void vec_sum(int *a, int *b, int *c, int n) {<br/>  for (int i = 0; i &lt; n; i++)<br/>    (*c++) = (*a++) + (*b++);<br/>}</pre> | | |
| (0) .BB1_2: | IEU | IEU/AGEN |
| <pre>(1) lw \$10,0(\$4)<br/>(2) addiu \$3,\$3,1<br/>(3) lw \$9,0(\$5)<br/>(4) addiu \$4,\$4,4<br/>(5) addiu \$5,\$5,4<br/>(6) addu \$9,\$9,\$10</pre> | <pre>(2) addiu \$3,\$3,1<br/>(4) addiu \$4,\$4,4<br/>(5) addiu \$5,\$5,4<br/>(9) addiu \$6,\$6,4</pre> | <pre>(1) lw \$10,0(\$4)<br/>(3) lw \$9,0(\$5)<br/>(6) addu \$9,\$9,\$10<br/>(7) sw \$9,0(\$6)</pre> |

| Example: ST231 | |
|---|---|
| <ul> <li>Compiler generates 2 loops <ul> <li>First captures initial (n &amp; 3) iterations</li> <li>Second is 4 times unrolled</li> </ul> </li> </ul> | <pre>(0) L?_0_9:<br/>(1) ldw \$r9 = 0[\$r17]<br/>(2) add \$r16 = \$r16,4<br/>(3) add \$r17 = \$r17,4<br/>(4) add \$r20 = \$r20,-1;;<br/>(5) ldw \$r10 = -4[\$r16]<br/>(6) convib \$b1 = \$r20<br/>(7) add \$r18 = \$r18,4<br/>(8) add \$r15 = \$r15,1;;<br/>(9) add \$r9 = \$r9,\$r10;;<br/>(10) stw -4[\$r18] = \$r9<br/>(11) br \$b1, L?_0_9;;</pre> |
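The two-loop structure described for the ST231 can be sketched at source level. This is only an illustration of the idea, not the compiler's actual output: the function name `vec_sum_unrolled` and the index-based form are assumptions made for readability.

```c
/* Hypothetical source-level view of the two loops: a short loop for the
 * initial (n & 3) iterations, then a loop unrolled 4 times. */
void vec_sum_unrolled(const int *a, const int *b, int *c, int n) {
    if (n <= 0)
        return;                    /* nothing to do; also keeps n & 3 meaningful */

    int i = 0;
    int head = n & 3;              /* n mod 4 for non-negative n */

    /* First loop: initial (n & 3) iterations, one element each. */
    for (; i < head; i++)
        c[i] = a[i] + b[i];

    /* Second loop: the remaining count is a multiple of 4.
     * The four statements are independent of each other, which gives
     * the scheduler operations to pack into the same instruction. */
    for (; i < n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
}
```

Running the leftover iterations first means the unrolled loop never needs a bounds check inside its body.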

## Architecture classification

| | Sequential Architectures | Dependence Architectures | Independence Architectures |
|---|---|---|---|
| | Superscalar | Dataflow | VLIW |
| Dependence information in the program | Implicit via register names | Exact description of all dependences | Explicit description of some independent operations |
| How are dependent operations typically exposed | By the hardware's control unit | By the compiler (and they are embedded into the program) | By the compiler (and they are implicit in the program) |
| How are independent operations typically exposed | By the hardware's control unit | By the hardware's control unit | By the compiler (and they are implicit in the program) |
| Where is the final operation scheduling typically done | In the hardware's control unit | In the hardware's control unit | In the compiler |
| Role of the compiler | Rearrange code to make parallelism more evident and accessible to the hardware | Replace some of the analysis hardware found in superscalars | Replace virtually all hardware dedicated to parallelism exposure and scheduling |













## Predication

- Conditionally nullify the effect of operations
- Full predication
  - All (or almost all) operations can be predicated
- Partial predication
  - Only a few instructions can be predicated
  - Conditional move (cmov)
  - Select
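Partial predication with a select operation can be sketched in C as follows. The helper `select_int` and the clamp example are hypothetical and only model the semantics: both candidate values are available, and a predicate decides which one takes effect. Whether a given C compiler actually emits a select or cmov for this code is not guaranteed.

```c
/* Models a select operation: returns if_true when pred is non-zero,
 * otherwise if_false. Hypothetical helper for illustration only. */
static inline int select_int(int pred, int if_true, int if_false) {
    return pred ? if_true : if_false;
}

/* Branching version: two conditional branches in the control flow. */
int clamp_branch(int x, int lo, int hi) {
    if (x < lo)
        x = lo;
    if (x > hi)
        x = hi;
    return x;
}

/* If-converted version: the comparisons produce predicates, and each
 * assignment becomes a select whose effect is nullified when the
 * predicate is false (x simply keeps its old value). */
int clamp_select(int x, int lo, int hi) {
    int p_lo = x < lo;
    x = select_int(p_lo, lo, x);
    int p_hi = x > hi;
    x = select_int(p_hi, hi, x);
    return x;
}
```

The if-converted form has no branches, so the whole body is a single straight-line region for the scheduler.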

3/28/08 Ebner, Brandner | Compilation Techniques for VLIWs | SS08 Slide #21



- Undo incorrect calculations













## Terminology

- Instruction/group
  - Independent operations that can be executed in parallel
- Bundle
  - Group of operations that are encoded in the same VLIW
  - Not necessarily independent


## Terminology (2)

- Operation
  - Basic operation of the execution pipeline
  - Similar to RISC operations/instructions
- Syllable
  - Basic unit for the instruction encoding
  - Fixed bit width
  - Typically encodes one single operation
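To make the terms concrete, here is a minimal sketch of how a decoder could split a stream of syllables into instructions (groups of parallel operations). The encoding is an assumption made for illustration: each syllable is 32 bits wide and its top bit acts as a stop bit marking the last operation of a group. `STOP_BIT`, `group_length`, and `count_groups` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: bit 31 of a syllable ends the current group. */
#define STOP_BIT 0x80000000u

/* Number of syllables in the group starting at code[0], or 0 if no
 * stop bit is found within the first len syllables. */
size_t group_length(const uint32_t *code, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (code[i] & STOP_BIT)
            return i + 1;
    }
    return 0;
}

/* Counts the complete groups in a syllable stream. */
size_t count_groups(const uint32_t *code, size_t len) {
    size_t groups = 0;
    size_t pos = 0;
    while (pos < len) {
        size_t n = group_length(code + pos, len - pos);
        if (n == 0)
            break;                 /* trailing syllables without a stop bit */
        pos += n;
        groups++;
    }
    return groups;
}
```

In the ST231 listing, the `;;` markers play a similar role at the assembly level: the eleven operations there form four groups of 4, 4, 1, and 2 operations.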


























