Using perfctr and perfex on the PPC 7450 family

Perfctr is a Linux patch for using the performance monitoring counters of a number of IA32, AMD64, and PPC CPUs. Perfex is a tool that comes with Perfctr. This page is mainly about using Perfex on the PPC 7450 family (which includes the 744x and the 745x CPUs, used in various Apple G4 computers running at more than 500MHz). It refers to perfctr-2.6.11.

The first thing to do (after installation) is to run perfex -i. I get

PerfCtr Info:
abi_version             0x05010000
driver_version          2.6.11
cpu_type                5 (PowerPC 7450)
cpu_features            0x7 (rdpmc,rdtsc,pcint)
cpu_khz                 589824
tsc_to_cpu_mult         32
cpu_nrctrs              6
cpus                    [0], total: 1
cpus_forbidden          [], total: 0
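The two values needed later (cpu_khz and tsc_to_cpu_mult) can be pulled out of this output mechanically. The following is only a sketch: the function name parse_perfex_info is made up for illustration, and it assumes the output keeps the "key, whitespace, value" layout shown above.

```python
def parse_perfex_info(text):
    """Parse the 'key   value' lines printed by `perfex -i` into a dict of strings."""
    info = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "PerfCtr Info:" header line.
        if not stripped or stripped == "PerfCtr Info:":
            continue
        parts = stripped.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            info[parts[0]] = parts[1].strip()
    return info


# Sample text copied from the output shown above.
SAMPLE = """\
PerfCtr Info:
abi_version             0x05010000
driver_version          2.6.11
cpu_type                5 (PowerPC 7450)
cpu_features            0x7 (rdpmc,rdtsc,pcint)
cpu_khz                 589824
tsc_to_cpu_mult         32
cpu_nrctrs              6
cpus                    [0], total: 1
cpus_forbidden          [], total: 0
"""

info = parse_perfex_info(SAMPLE)
cpu_khz = int(info["cpu_khz"])
tsc_to_cpu_mult = int(info["tsc_to_cpu_mult"])
print("cpu_khz =", cpu_khz, "tsc_to_cpu_mult =", tsc_to_cpu_mult)
```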

When you run perfex, you get the number of cycles by multiplying the reported tsc value by tsc_to_cpu_mult*real_khz/cpu_khz, where real_khz is the actual CPU speed. Ideally real_khz=cpu_khz, but on my iBook, real_khz=1066666 and cpu_khz=589824. Another way to get cycles is to use event 1 on any counter:

[b6:/tmp:283] perfex -e 1 true
tsc                                   28054
event 0x00000001                    1620946
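As a sanity check, the conversion formula can be applied to this run and compared with the event 1 count. This is a sketch only: the function name tsc_to_cycles is made up, and it assumes the run above was taken on the iBook whose real_khz and cpu_khz are quoted in the text.

```python
def tsc_to_cycles(tsc, tsc_to_cpu_mult, cpu_khz, real_khz):
    """Convert a perfex tsc value to CPU cycles: tsc * tsc_to_cpu_mult * real_khz / cpu_khz."""
    return tsc * tsc_to_cpu_mult * real_khz / cpu_khz


# Values taken from the perfex -i output and the sample run in the text.
estimate = tsc_to_cycles(tsc=28054, tsc_to_cpu_mult=32,
                         cpu_khz=589824, real_khz=1066666)
counted = 1620946  # event 1 (processor cycles) from the same run

print("estimated cycles: %.0f" % estimate)
print("counted cycles:   %d" % counted)
print("relative difference: %.2f%%" % (100 * abs(estimate - counted) / counted))
```

With these numbers the estimate and the counted cycles differ by less than 0.2%, which suggests the two methods agree.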

The event numbers used by perfex are those given in Section 11.5 of the MPC7450 RISC Microprocessor Family User's Manual. Note that perfex expects event numbers in hex, whereas the manual gives them in decimal and binary.
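Converting the manual's notation into what perfex expects is a plain base conversion. A small sketch follows; the helper names are invented for illustration, and the accepted binary spellings (optional spaces or underscores between digit groups) are an assumption about how one might type the number in, not a statement about the manual's typography.

```python
def perfex_event(decimal_event):
    """Format a decimal event number as the hex string to pass to perfex -e."""
    return format(decimal_event, "x")


def perfex_event_from_binary(bits):
    """Same, but starting from a string of binary digits."""
    digits = bits.replace("_", "").replace(" ", "")
    return format(int(digits, 2), "x")


# Decimal 26 is the event that perfex calls 1a (load instructions on PMC2).
print(perfex_event(26))
print(perfex_event_from_binary("001 1010"))
```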

The PPC 7450 has six counters. Each of them can count different events, and usually the same event number means different things for different counters (and it's not very systematic, either). Perfex assigns the first event number given to PMC1SEL, the second to PMC2SEL, and so on.

As an example

perfex -e 14 -e 1a -e e -e 2 -e 1 latex bench
counts (in this order) the completed stores, loads, FP instructions, and total instructions, and the number of cycles while running latex bench.
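Because the counters are assigned by position, a command line for an event on a higher-numbered counter needs something in the earlier slots. The sketch below builds such a command line from a PMC-to-event mapping. The function name perfex_command is made up, and padding unused counters with event 0 ("Nothing" on all six counters) is an assumption about a reasonable filler, not documented perfex behaviour.

```python
def perfex_command(events_by_pmc, program):
    """Build a perfex argv from {PMC number: hex event string}.

    perfex assigns the first -e to PMC1, the second to PMC2, and so on,
    so unused lower-numbered counters are padded with event 0 ("Nothing").
    """
    for pmc in events_by_pmc:
        if not 1 <= pmc <= 6:
            raise ValueError("the PPC 7450 has counters 1..6, got %r" % pmc)
    argv = ["perfex"]
    for pmc in range(1, max(events_by_pmc) + 1):
        argv += ["-e", events_by_pmc.get(pmc, "0")]
    return argv + list(program)


# Reproduces the example command from the text.
cmd = perfex_command({1: "14", 2: "1a", 3: "e", 4: "2", 5: "1"},
                     ["latex", "bench"])
print(" ".join(cmd))

# An event on PMC6 only: counters 1..5 get the "Nothing" event.
print(" ".join(perfex_command({6: "1d"}, ["true"])))
```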

Here are the events counted by the PPC7450, sorted by name; look in the manual for more detailed descriptions of the events (P=PMC#; ev=event number in hex (for perfex)):

P ev
1  6   -- External performance monitor
1 40   AltiVec load instructions completed
2 3d   between Privileged and User
6 1c   BORDQ full
2 39   BPU Stall on LR dependency
3 1a   Branch flushes
2 3b   Branch link stack correctly resolved
3 1f   Branch Link Stack Mispredicted
1 1b   Branch link stack predicted
1 19   Branch unit stall
3 1d   Branch unit stall on CTR dependency
2 3a   BTIC miss
6 2c   Bus reads not retried
6 2e   Bus reads/writes not retried
6 1a   Bus retry
6 31   Bus retry due to collision
6 32   Bus retry due to intervention ordering
6 2f   Bus retry due to L1 retry
6 30   Bus retry due to previous adjacent
6 2a   Bus TAs for reads
6 2b   Bus TAs for writes
6 2d   Bus writes not retried
1 34   Cache-inhibited stores
2 34   Cacheable store merge to 32 bytes
3 13   Cancelled L1 instruction cache misses
1 22   Completed branch instructions
1 21   Completed IU2 instructions
4  e   Completing 3 instructions
2 21   Completing one instruction
3  8   Completing two instructions
2 20   Completion queue entries over MMCR0[THRESHOLD] value
1 1f   Counts AltiVec issue queue entries over MMCR0[THRESHOLD]
1 1e   Counts instruction queue entries over MMCR0[THRESHOLD]
1  f   Cycles a VFPU instruction
2  f   Cycles a VFPU instruction in the reservation station is waiting for operand
1 10   Cycles a VIU1 instruction
2 10   Cycles a VIU1 instruction in the reservation station is waiting for operand
2 11   Cycles a VIU2 instruction in the reservation station is waiting for operand
1  e   Cycles a VPU instruction
2  e   Cycles a VPU instruction in the reservation station is waiting for an operand
1 11   Cycles an instruction in VIU2 reservation station waits for operand
1 18   Cycles first speculation buffer active
2 24   Cycles waiting from L1 instruction cache miss
1 1d   Cycles where 3 instructions are dispatched
1 20   Cycles where no instructions completed
2 35   Data breakpoint matches
2 18   Dispatches to FPR issue queue
1 1c   Dispatches to GPR issue queue
3  a   Dispatches to VR issue queue
4  f   Dispatching 0 instructions
1 3c   dss instructions completed
4 13   dssall instructions
1 39   dst instructions dispatched
2 2a   dst stream 1 cache line fetches
4 1a   dst stream 3 cache line fetches
4 17   DTLB hardware table search cycles
1 28   DTLB hardware table search cycles over MMCR0[THRESHOLD] value
3 12   DTLB misses
6 19   DTQ full cycles
1 23   eieio instructions completed
6 16   External interventions
2  7   External performance monitor signal
3  7   External performance monitor signal
4  7   External performance monitor signal
6 17   External pushes
6 18   External snoop retry
2 36   Fall-through branches processed
3 1e   Fast BTIC hit
2 37   First speculative branch buffer resolved correctly
1 5b   Floating-point 1/2 FPSCR renames busy
1 5a   Floating-point 1/4 FPSCR renames busy
1 5c   Floating-point 3/4 FPSCR renames busy
1 5d   Floating-point all FPSCR renames busy
1 43   Floating-point denormalization
1 5e   Floating-point denormalized result
1 51   Floating-point load double completed in LSU
1 4f   Floating-point load instruction completed in LSU
1 50   Floating-point load single instruction completed in LSU
1 42   Floating-point renormalization
1 44   Floating-Point store causes stall in LSU
4 1e   Floating-point store double completes in LSU
1 41   Floating-point store instructions completed in LSU
4 1d   Folded branches
3  d   FPR issue queue entries
2 3c   FPR issue stalled
3  e   FPU instructions
4 10   GPR issue queue entries over threshold
4 11   GPR issue queue stalled
3  c   GPR rename buffer entries over MMCR0[THRESHOLD]
1 2a   Instruction breakpoint matches
1  2   Instructions completed
2  2   Instructions completed
3  2   Instructions completed
4  2   Instructions completed
4  9   Instructions completed in VFPU
4  8   Instructions completed in VPU
1  4   Instructions dispatched
2  4   Instructions dispatched
3  4   Instructions dispatched
4  4   Instructions dispatched
5 12   Intervention
6 12   Intervention
1 27   ITLB hardware table search cycles
3 11   ITLB hardware table search cycles over threshold
2 23   ITLB non-speculative misses
4 12   IU1 instructions
2 32   L1 data cache castouts to L2
1 2b   L1 data cache load miss cycles over MMCR0[THRESHOLD] value
3 14   L1 data cache operation hit
2 31   L1 data cache reloads
2 29   L1 data cycles used
2 25   L1 data load access miss
1 35   L1 data load hit
3 15   L1 data load miss cycles
3 16   L1 data Pushes
1 31   L1 data snoop hit castout
1 30   L1 data snoop hit in L1 castout queue
1 2c   L1 data snoop hit on modified
1 32   L1 data snoop hits
1 16   L1 data snoops
2 16   L1 data snoops
1 37   L1 data store hit
2 27   L1 data store miss
1 38   L1 data total hits
3 17   L1 data total miss
2 17   L1 data total misses
1 36   L1 data touch hit
2 26   L1 data touch miss
2 28   L1 data touch miss cycles
6 13   L1 external Interventions
1 29   L1 instruction cache accesses
1 15   L1 instruction cache misses
2 15   L1 instruction cache misses
2 30   L1 instruction cache reloads
6  8   L2 cache castouts
5  2   L2 cache hits
6  2   L2 cache hits
5 13   L2 cache misses
6 1d   L2 cache misses
5  6   L2 data cache misses
6  6   L2 data cache misses
6 14   L2 external Interventions
5  4   L2 instruction cache misses
6  4   L2 instruction cache misses
5  8   L2 load hits
5  9   L2 store hits
5  d   L2 touch hits
6  d   L2 touch hits
6 1b   L2 valid request
6  a   L2SQ full cycles
5  3   L3 cache hits
6  3   L3 cache hits
6 1f   L3 cache hits
5 14   L3 cache misses
6 1e   L3 cache misses
6 20   L3 cache misses
6  9   L3 castouts
5  7   L3 data cache misses
6  7   L3 data cache misses
6 22   L3 data cache misses
6 15   L3 external Interventions
5  5   L3 instruction cache misses
6  5   L3 instruction cache misses
6 21   L3 instruction cache misses
5  a   L3 load hits
6 23   L3 load hits
5  b   L3 store hits
6 24   L3 store hits
5  e   L3 touch hits
6  e   L3 touch hits
6 25   L3 touch hits
6  b   L3SQ full cycles
2 1a   Load instructions
1 2d   Load miss alias
1 2e   Load miss alias on touch
1 26   Load string and load multiple instructions completed
3 10   Load string and multiple instruction pieces
1 46   Load/store true alias stall
1 49   LSU alias versus CSQ
1 48   LSU alias versus FSQ/WB0/WB1
2 3e   LSU completes floating-point store single
1 56   LSU CSQ forwarding
1 47   LSU indexed alias stall
2 19   LSU instructions completed
1 4e   LSU LMQ full stall
1 54   LSU LMQ index alias
1 53   LSU load versus store queue alias stall
1 4a   LSU load-hit line alias versus CSQ0
1 4b   LSU load-miss line alias versus CSQ0
1 57   LSU misalign load finish
1 59   LSU misalign stall
1 58   LSU misalign store complete
1 52   LSU RA latch stall
1 55   LSU store queue index alias
1 4d   LSU touch alias versus CSQ
1 4c   LSU touch alias versus FSQ/WB0/WB1
2 1d   lwarx instructions completed
2 1e   mfspr instructions completed
1 12   mfvscr synchronization
2 12   mfvscr synchronization
4 1c   Mispredicted branches
1 24   mtspr instructions completed
1  d   mtvrsave instructions completed
2  d   mtvrsave instructions completed
4  d   mtvrsave Instructions completed
1  c   mtvscr instructions completed
2  c   mtvscr instructions completed
4  c   mtvscr Instructions completed
1  0   Nothing
2  0   Nothing
3  0   Nothing
4  0   Nothing
5  0   Nothing
6  0   Nothing
3  9   One instruction dispatched
6 37   Prefetch engine collision vs. instruction fetch
6 35   Prefetch engine collision vs. load
6 38   Prefetch engine collision vs. load/store/instruction fetch
6 36   Prefetch engine collision vs. store
6 39   Prefetch engine full
6 34   Prefetch engine request
1  1   Processor cycles
2  1   Processor cycles
3  1   Processor cycles
4  1   Processor cycles
5  1   Processor cycles
6  1   Processor cycles
1  5   Processor performance monitor exception
2  5   Processor performance monitor exception
3  5   Processor performance monitor exception
4  5   Processor performance monitor exception
6 10   RAQ full cycles
2 1f   Refetch serialization
1 3a   Refreshed dsts
1 25   sc instructions completed
2 38   Second speculation buffer active
3 1b   Second speculative branch buffer resolved correctly
1  7   External performance monitor signal
5 10   Snoop modified
6 33   Snoop requests
4 18   Snoop retries
5  f   Snoop retries
6  f   Snoop retries
5 11   Snoop valid
1 14   Store instructions
2 14   Store instructions
2 33   Store merge/gather
4 16   Store string and multiple instruction pieces
2 1b   Store string and store multiple instructions
1 3d   dst stream 0 cache line fetches
3  f   stwcx. instructions
1 3b   Successful dst, dstt, dstst, and dststt table search operations
4 19   Successful stwcx.
4 15   sync instructions
3 19   Taken branches that are processed
1  3   TBL bit transitions
2  3   TBL bit transitions
3  3   TBL bit transitions
4  3   TBL bit transitions
3 1c   Third speculation buffer active
4 1b   Third speculative branch buffer resolved correctly
2 1c   tlbie instructions completed
2 2f   TLBIE snoops
4 14   tlbsync instructions
1 2f   Touch alias
1 1a   True branch target instruction hits
2 22   Two instructions dispatched
1 17   Unresolved branches
1  9   VFPU instructions completed
2  9   VFPU instructions completed
1  a   VIU1 instructions completed
2  a   VIU1 instructions completed
4  a   VIU1 instructions completed
1  b   VIU2 instructions completed
2  b   VIU2 instructions completed
4  b   VIU2 Instructions completed
1  8   VPU instructions completed
2  8   VPU instructions completed
3  b   VR Stalls
1 13   VSCR[SAT] set
2 13   VSCR[SAT] set
3 18   VT2 fetches
2 2e   VTQ line fetch
1 3f   VTQ line fetch hit
2 2d   VTQ line fetch miss
2 2c   VTQ resumes due to change of context
2 2b   VTQ stream cancelled prematurely
1 3e   VTQ suspends due to change of context
6 11   WAQ full cycles
1 33   Write-through stores
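Since the same event number means different things on different counters, it can help to load the table into a lookup keyed by (PMC, event). The following sketch assumes rows in the "P ev name" layout used above; the function name parse_event_table is made up, and only a few rows are embedded here to keep the example short.

```python
def parse_event_table(text):
    """Parse 'P ev name' rows into a dict keyed by (PMC number, event number)."""
    table = {}
    for line in text.splitlines():
        fields = line.split(None, 2)  # PMC, hex event, then the whole name
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header line, blank line, or anything else
        try:
            event = int(fields[1], 16)  # the ev column is hex, as for perfex
        except ValueError:
            continue
        table[(int(fields[0]), event)] = fields[2].strip()
    return table


# A few rows copied from the table above.
ROWS = """\
P ev
1 14   Store instructions
2 1a   Load instructions
3  e   FPU instructions
4  2   Instructions completed
5  1   Processor cycles
"""

events = parse_event_table(ROWS)

# Decode the arguments of: perfex -e 14 -e 1a -e e -e 2 -e 1
for pmc, ev in enumerate(["14", "1a", "e", "2", "1"], start=1):
    print("PMC%d event %s: %s" % (pmc, ev, events[(pmc, int(ev, 16))]))
```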

Anton Ertl