FFmpeg & evaluating performance on the PowerPC Architecture HOWTO

(c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>



I - Introduction

The PowerPC architecture and its SIMD extension AltiVec offer some
interesting tools to evaluate performance and improve the code.
This document tries to explain how to use those tools with FFmpeg.

The architecture itself offers two ways to evaluate the performance of
a given piece of code:

1) The Time Base Registers (TBL)
2) The Performance Monitor Counter Registers (PMC)

The first ones are always available and always active, but they're not
very accurate: the registers increment by one every four *bus* cycles.
On my 667 MHz tiBook (ppc7450), this means once every twenty *processor*
cycles. So we won't use them.
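
To see where the figure of twenty comes from, here is a small C sketch
of the arithmetic. It assumes a 133 MHz system bus for the 667 MHz
tiBook (the bus speed is not stated above), and both helper names are
made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: processor cycles elapsed per time base tick,
 * given the core clock, the bus clock, and the number of bus cycles
 * per time base increment (four, as described above). */
static double cpu_cycles_per_tb_tick(double cpu_mhz, double bus_mhz,
                                     int bus_cycles_per_tick)
{
    assert(bus_mhz > 0.0 && bus_cycles_per_tick > 0);
    return (cpu_mhz / bus_mhz) * bus_cycles_per_tick;
}

/* Rough number of time base ticks spanned by a routine of `cycles`
 * processor cycles. */
static double tb_ticks_for(double cycles, double cycles_per_tick)
{
    return cycles / cycles_per_tick;
}
```

With these assumptions, cpu_cycles_per_tb_tick(667, 133, 4) is about
20.06, so a routine of a couple of hundred cycles spans only about ten
ticks, which is far too coarse for this kind of measurement.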

The PMC are much more useful: not only can they report cycle-accurate
timing, but they can also be used to monitor many other parameters,
such as the number of AltiVec stalls for every kind of instruction,
or instruction cache misses. The downside is that not all processors
support the PMC (all G3, all G4 and the 970 do support them), and
they're inactive by default: you need to activate them with a
dedicated tool. Also, the number of available PMC depends on the
processor: the various 604 have 2, the various 75x (aka G3) have 4,
and the various 74xx (aka G4) have 6.
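
The per-family PMC counts above can be kept in a small lookup table,
for example to sanity-check a configuration before enabling counters.
This is a hypothetical sketch, not part of FFmpeg: the family strings
are made up, and families this text gives no count for (including the
970) return 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of Performance Monitor Counters per processor family,
 * as listed in this document. */
struct pmc_family {
    const char *family;
    int num_pmc;
};

static const struct pmc_family pmc_families[] = {
    { "604",  2 },  /* the various 604 */
    { "75x",  4 },  /* G3 */
    { "74xx", 6 },  /* G4 */
};

/* Returns the PMC count for a family, or 0 if it is not in the table. */
static int pmc_count_for_family(const char *family)
{
    assert(family != NULL);
    for (size_t i = 0; i < sizeof(pmc_families) / sizeof(pmc_families[0]); i++)
        if (strcmp(pmc_families[i].family, family) == 0)
            return pmc_families[i].num_pmc;
    return 0;
}
```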

*WARNING*: The PowerPC 970 is not very well documented, and its PMC
registers are 64 bits wide. To make the code aware of this, you *must*
tune for the 970 (using --tune=970), or the code will assume 32-bit
registers.
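
One way to picture why the tuning matters: the counter type has to
follow the register width, and a 64-bit reading stored in a 32-bit
variable silently loses its upper half. The generic C sketch below is
not FFmpeg's code; TUNE_FOR_970 and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TUNE_FOR_970 is a hypothetical stand-in for "configured with
 * --tune=970"; it is not an actual FFmpeg macro. */
#ifdef TUNE_FOR_970
typedef uint64_t pmc_value_t;   /* 970: 64-bit PMC registers */
#else
typedef uint32_t pmc_value_t;   /* other PMC-capable CPUs: 32-bit */
#endif

/* What happens when a 64-bit PMC reading is stored in a 32-bit
 * variable: the upper 32 bits are discarded. */
static uint32_t pmc_reading_as_32bit(uint64_t raw)
{
    return (uint32_t)raw;
}
```

For instance, a 64-bit reading of 2^32 + 5 would come out as 5.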


II - Enabling FFmpeg PowerPC performance support

This needs to be done by hand. First, you need to configure FFmpeg as
usual, but add the "--powerpc-perf-enable" option. For instance:

#####
./configure --prefix=/usr/local/ffmpeg-svn --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
#####

This will configure FFmpeg to install inside /usr/local/ffmpeg-svn,
compiling with gcc-3.3 (you should use this version or a newer one),
and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of
thumb, those running at 550 MHz and faster). It will also enable the
PMC.

You may also edit the file "config.h" and uncomment the following line:

#####
// #define ALTIVEC_USE_REFERENCE_C_CODE 1
#####

If you enable this line, the code will not make use of AltiVec, but
will use the reference C code instead. This is useful to compare the
performance of the two versions of the code.
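
The effect of such a switch can be pictured with the C sketch below: a
compile-time test decides whether the AltiVec routine or the plain C
routine gets installed. The names add_bytes_c, add_bytes_altivec and
select_add_bytes are hypothetical, and this is not FFmpeg's actual
dispatch code. The AltiVec path is only compiled when the compiler
defines __ALTIVEC__ (e.g. with -maltivec) and assumes 16-byte aligned
buffers.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ALTIVEC__) && !defined(ALTIVEC_USE_REFERENCE_C_CODE)
#include <altivec.h>
#define USE_ALTIVEC_PATH 1
#endif

typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, int n);

/* Reference C version: dst[i] += src[i], wrapping modulo 256. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i];
}

#ifdef USE_ALTIVEC_PATH
/* AltiVec version, 16 bytes per iteration. vec_ld/vec_st ignore the
 * low four address bits, so dst and src must be 16-byte aligned. */
static void add_bytes_altivec(uint8_t *dst, const uint8_t *src, int n)
{
    int i = 0;
    for (; i + 16 <= n; i += 16) {
        vector unsigned char a = vec_ld(i, dst);
        vector unsigned char b = vec_ld(i, src);
        vec_st(vec_add(a, b), i, dst);   /* modular add, like the C code */
    }
    for (; i < n; i++)                   /* scalar tail */
        dst[i] += src[i];
}
#endif

/* Defining ALTIVEC_USE_REFERENCE_C_CODE forces the reference C code
 * even on an AltiVec-capable build. */
static add_bytes_fn select_add_bytes(void)
{
#ifdef USE_ALTIVEC_PATH
    return add_bytes_altivec;
#else
    return add_bytes_c;
#endif
}
```

Building once with and once without the macro defined gives the two
variants whose performance reports can then be compared.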

Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h":

#####
#define POWERPC_NUM_PMC_ENABLED 4
#####

If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more
PMC than available on your CPU!
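
A compile-time check is one way to make that mistake impossible. The
sketch below is hypothetical: TARGET_CPU_NUM_PMC is a made-up macro you
would set to 2, 4 or 6 according to the processor families listed in
the introduction, and the default of 4 for POWERPC_NUM_PMC_ENABLED only
mirrors the header excerpt above.

```c
#include <assert.h>

#ifndef POWERPC_NUM_PMC_ENABLED
#define POWERPC_NUM_PMC_ENABLED 4   /* value from dsputil_ppc.h above */
#endif

/* Hypothetical: number of PMC on the CPU you build for (6 on a G4). */
#ifndef TARGET_CPU_NUM_PMC
#define TARGET_CPU_NUM_PMC 6
#endif

/* C11: refuse to compile if more counters are enabled than exist. */
_Static_assert(POWERPC_NUM_PMC_ENABLED >= 1 &&
               POWERPC_NUM_PMC_ENABLED <= TARGET_CPU_NUM_PMC,
               "POWERPC_NUM_PMC_ENABLED exceeds the PMC count of this CPU");

/* One accumulator slot per enabled counter. */
static unsigned long pmc_totals[POWERPC_NUM_PMC_ENABLED];

static int num_pmc_enabled(void)
{
    return (int)(sizeof(pmc_totals) / sizeof(pmc_totals[0]));
}
```

In this sketch, compiling with -DTARGET_CPU_NUM_PMC=4
-DPOWERPC_NUM_PMC_ENABLED=6 stops the build with the message above.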

Then, simply compile FFmpeg as usual (make && make install).



III - Using FFmpeg PowerPC performance support

This build of FFmpeg can be used exactly as usual, but before exiting,
it will dump a per-function report that looks like this:

#####
PowerPC performance report
 Values are from the PMC registers, and represent whatever the
 registers are set to record.
 Function "gmc1_altivec" (pmc1):
        min: 231
        max: 1339867
        avg: 558.25 (255302)
 Function "gmc1_altivec" (pmc2):
        min: 93
        max: 2164
        avg: 267.31 (255302)
 Function "gmc1_altivec" (pmc3):
        min: 72
        max: 1987
        avg: 276.20 (255302)
(...)
#####

In this example, PMC1 was set to record CPU cycles, PMC2 was set to
record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec
Issue Stalls.

The function "gmc1_altivec" was monitored 255302 times (the number in
parentheses on each "avg" line), and the minimum execution time was 231
processor cycles. The max and average aren't of much use, as it's very
likely the OS interrupted execution for reasons of its own :-(
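
The figures in the report are running aggregates per function and per
counter. The C sketch below (hypothetical names, not FFmpeg's
implementation) shows how a single outlier, such as a call interrupted
by the OS, inflates the max and the average while leaving the min
untouched, which is why the min is the figure to compare.

```c
#include <assert.h>
#include <stdint.h>

/* Running min/max/average for one function and one PMC. */
typedef struct {
    uint64_t min, max, sum, count;
} pmc_stats;

static void pmc_stats_init(pmc_stats *s)
{
    s->min = UINT64_MAX;
    s->max = 0;
    s->sum = 0;
    s->count = 0;
}

/* Record one measurement (the counter delta for a single call). */
static void pmc_stats_add(pmc_stats *s, uint64_t value)
{
    if (value < s->min) s->min = value;
    if (value > s->max) s->max = value;
    s->sum += value;
    s->count++;
}

static double pmc_stats_avg(const pmc_stats *s)
{
    assert(s->count > 0);
    return (double)s->sum / (double)s->count;
}
```

Feeding it the made-up samples 231 and 240 plus one interrupted call of
1339867 cycles leaves the min at 231 but pushes the average above
446000.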

With the exact same settings and source file, but using the reference C
code, we get:

#####
PowerPC performance report
 Values are from the PMC registers, and represent whatever the
 registers are set to record.
 Function "gmc1_altivec" (pmc1):
        min: 592
        max: 2532235
        avg: 962.88 (255302)
 Function "gmc1_altivec" (pmc2):
        min: 0
        max: 33
        avg: 0.00 (255302)
 Function "gmc1_altivec" (pmc3):
        min: 0
        max: 350
        avg: 0.03 (255302)
(...)
#####

The minimum is now 592 cycles, so the fastest AltiVec execution is about
2.5x faster than the fastest C execution in this example. It's not
perfect, but it's not bad (well, I wrote this function, so I can't say
otherwise :-).
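
The speedup figure is simply the ratio of the two minimum PMC1 (cycle)
values. A trivial C helper, hypothetical and only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Speedup of an optimized routine over its reference version, based on
 * the minimum cycle counts taken from two performance reports. */
static double min_cycle_speedup(uint64_t ref_min_cycles, uint64_t opt_min_cycles)
{
    assert(opt_min_cycles > 0);
    return (double)ref_min_cycles / (double)opt_min_cycles;
}
```

With the numbers above, min_cycle_speedup(592, 231) is about 2.56.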

Once you have that kind of report, you can try to improve things by
finding what goes wrong and fixing it; in the example above, one
should try to reduce the number of AltiVec stalls, as this *may*
improve performance.



IV - Enabling the PMC in Mac OS X

This is easy. Use "MONster" and "monster". Those tools come from
Apple's CHUD package, and can be found hidden in the developer web
site & FTP site. "MONster" is the graphical application: use it to
generate a config file specifying what each register should
monitor. Then use the command-line application "monster" to use that
config file, and enjoy the results.

Note that "MONster" can be used for many other things, but those are
documented by Apple and are not my subject.

If you are using CHUD 4.4.2 or later, you'll notice that MONster is
no longer available. It has been superseded by Shark, where
configuration of PMCs is available as a plugin.



V - Enabling the PMC on Linux

On Linux you may use OProfile from http://oprofile.sf.net. Depending on
the version and the CPU, you may need to apply a patch[1] to access a
set of the possible counters from the userspace application. You can
always define them using the kernel interface /dev/oprofile/* .

[1] http://dev.gentoo.org/~lu_zero/development/oprofile-g4-20060423.patch

--
Romain Dolbeau <romain@dolbeau.org>
Luca Barbato <lu_zero@gentoo.org>