summaryrefslogtreecommitdiff
path: root/Source/DirectFB/gfxdrivers/sh772x/README.sh7722
blob: 45bbf874ebced44297a48ceb30b2c62033bbb38e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
Renesas SH7722 graphics driver
==============================

This driver supports the SH7722 SoC from Renesas Solutions Corp. controlling
- LCDC (LCD Controller) for display
- BEU (Blit Engine Unit) for blending of planes
- TDG (2D Graphics) for accelerated operations

It's using a kernel device called sh7722gfx which mainly does interrupt handling.

The 2D Graphics unit supports display lists in RAM and reads them via DMA. These
lists consist of double word entries, first word designates a register, second
word contains the data to write. Once a list has been completely processed, the
hardware generates an interrupt to start the next list.

The kernel module allocates a ring buffer for usage as a display list. The user
space prepares a block of commands and puts it into the ring buffer. If the hardware
is idle, it's started directly from user space. When a DMA completion interrupt
is received, the next block of commands is started from kernel space. If the
hardware is still running the previous block, new commands are appended to the
next one. The driver is designed to run without any locking or system calls. Only
a few interrupts happen over time depending on the operations. The hardware is not
getting idle, while commands are being sent to keep it busy. There's just a minimal
gap which is the interrupt handler setting the new start address and kicking the
hardware again.

To build the kernel module use "make -f Makefile.kernel". You might want to set
the variables KERNEL_SOURCE, KERNEL_BUILD (if != KERNEL_SOURCE), KERNEL_VERSION
and DESTDIR.

To run the driver you need the DevMem system module using the directfbrc.sh7722
file (renamed to directfbrc in $prefix/etc).


Performance (as of 2007-09-21, multi app, 127.79 BogoMIPS)
----------------------------------------------------------

Only 14% CPU load with df_andi running 800x480 at 28.1 fps :)

Benchmarking with 256x256 in 16bit mode... (16bit)
                                                                                CPU load
Anti-aliased Text                              3.020 secs (   41.721 KChars/sec) [100%]
Anti-aliased Text (blend)                      3.328 secs (   10.817 KChars/sec) [100%]
Fill Rectangle                                 5.549 secs (*  69.681 MPixel/sec) [  3%]
Fill Rectangle (blend)                        11.873 secs (*  22.079 MPixel/sec) [  1%]
Fill Rectangles [10]                           9.384 secs (*  69.838 MPixel/sec) [  0%]
Fill Rectangles [10] (blend)                  14.836 secs (*  22.086 MPixel/sec) [  0%]
Fill Triangles                                 3.024 secs (+  50.929 MPixel/sec) [ 40%]
Fill Triangles (blend)                         3.064 secs (+  20.319 MPixel/sec) [  8%]
Draw Rectangle                                 3.284 secs (*   6.942 KRects/sec) [ 26%]
Draw Rectangle (blend)                         3.302 secs (*   6.268 KRects/sec) [ 25%]
Draw Lines [10]                                3.238 secs (*  28.103 KLines/sec) [ 20%]
Draw Lines [10] (blend)                        3.198 secs (*  27.829 KLines/sec) [ 19%]
Fill Spans                                     3.092 secs (*  61.466 MPixel/sec) [ 33%]
Fill Spans (blend)                             3.094 secs (*  21.181 MPixel/sec) [ 11%]
Blit                                          10.436 secs (*  30.143 MPixel/sec) [  2%]
Blit colorkeyed                                9.333 secs (*  32.301 MPixel/sec) [  2%]
Blit destination colorkeyed                    3.763 secs (    6.966 MPixel/sec) [ 99%]
Blit with format conversion                   13.369 secs (*  22.549 MPixel/sec) [  1%]
Blit with colorizing                           4.419 secs (    2.966 MPixel/sec) [100%]
Blit from 32bit (blend)                       21.973 secs (*  13.123 MPixel/sec) [  1%]
Blit from 32bit (blend) with colorizing        5.129 secs (    1.277 MPixel/sec) [100%]
Stretch Blit                                  10.271 secs (*  33.463 MPixel/sec) [  3%]
Stretch Blit colorkeyed                        7.895 secs (*  35.159 MPixel/sec) [  3%]

(*) SH7722/BLT: 940 starts, 940 done, 940 interrupts, 43 wait_idle, 780 wait_next, 89 idle
(*) SH7722/BLT: 24700744 words, 26277 words/start, 277536 words/idle, 10 starts/idle

* = accelerated
+ = half way accelerated


Performance (as of 2007-09-25, multi app, 127.79 BogoMIPS)
----------------------------------------------------------

Only 13% CPU load with df_andi running 800x480 at 28.8 fps :)
Only 46% CPU load with ClanBomber2 running 800x600 at 48 fps :)

Benchmarking with 256x256 in 16bit mode... (16bit)
                                                                                CPU load
Anti-aliased Text                              3.057 secs (*  98.920 KChars/sec) [ 47%] !
Anti-aliased Text (blend)                      3.298 secs (   10.915 KChars/sec) [100%] 
Fill Rectangle                                 5.732 secs (*  69.743 MPixel/sec) [  3%] 
Fill Rectangle (blend)                        11.571 secs (*  22.088 MPixel/sec) [  1%] 
Fill Rectangles [10]                           9.384 secs (*  69.838 MPixel/sec) [  0%] 
Fill Rectangles [10] (blend)                  14.836 secs (*  22.086 MPixel/sec) [  0%] 
Fill Triangles                                 4.176 secs (*  61.989 MPixel/sec) [  6%] !
Fill Triangles (blend)                         8.132 secs (*  21.759 MPixel/sec) [  2%] !
Draw Rectangle                                 3.216 secs (*   6.965 KRects/sec) [ 26%]  
Draw Rectangle (blend)                         3.290 secs (*   6.322 KRects/sec) [ 22%]  
Draw Lines [10]                                3.216 secs (*  28.296 KLines/sec) [ 14%]  
Draw Lines [10] (blend)                        3.196 secs (*  28.160 KLines/sec) [ 14%]  
Fill Spans                                     3.086 secs (*  61.586 MPixel/sec) [ 25%]  
Fill Spans (blend)                             3.092 secs (*  21.195 MPixel/sec) [  7%]  
Blit                                           8.692 secs (*  30.159 MPixel/sec) [  2%]  
Blit 180                                       4.783 secs (*  30.144 MPixel/sec) [  2%] !
Blit colorkeyed                               11.965 secs (*  32.316 MPixel/sec) [  2%]  
Blit destination colorkeyed                    3.795 secs (    6.907 MPixel/sec) [ 99%]   
Blit with format conversion                    9.039 secs (*  22.476 MPixel/sec) [  1%]  
Blit with colorizing                           4.414 secs (    2.969 MPixel/sec) [100%]  
Blit from 32bit (blend)                       23.375 secs (*  13.177 MPixel/sec) [  1%]  
Blit from 32bit (blend) with colorizing        5.137 secs (    1.275 MPixel/sec) [100%]  
Stretch Blit                                   8.976 secs (*  33.495 MPixel/sec) [  2%]  
Stretch Blit colorkeyed                        9.728 secs (*  35.226 MPixel/sec) [  2%]  

(*) SH7722/BLT: 521 starts, 521 done, 521 interrupts, 45 wait_idle, 363 wait_next, 90 idle
(*) SH7722/BLT: 11511104 words, 22094 words/start, 127901 words/idle, 5 starts/idle

* = accelerated
! = updated


Statistics
----------

The statistics at the end are more valuable when looking at one case at a time:

Fill Rectangle                                 5.834 secs (*  69.647 MPixel/sec) [  4%]

(*) SH7722/BLT: 16 starts, 16 done, 16 interrupts, 4 wait_idle, 2 wait_next, 11 idle
(*) SH7722/BLT: 74840 words, 4677 words/start, 6803 words/idle, 1 starts/idle

This means that while the FillRectangle() benchmark was running, the hardware
didn't get idle, which is obvious when running the benchmark for just 10 ms:

Fill Rectangle                                 0.191 secs (*  68.624 MPixel/sec) [ 10%]

(*) SH7722/BLT: 13 starts, 13 done, 13 interrupts, 4 wait_idle, 0 wait_next, 11 idle
(*) SH7722/BLT: 2840 words, 218 words/start, 258 words/idle, 1 starts/idle

See? The same number of times becoming idle, but a few less interrupts. Don't
worry about the 191 ms the benchmark needed to complete, after 10 ms of stuffing
the display list, we need to wait until the hardware is done before measuring
the time it took and calculating the result.

Here's FillSpans() which as opposed to FillRectangle() does a lot of small commands:

Fill Spans                                     3.028 secs (*  61.467 MPixel/sec) [ 34%]

(*) SH7722/BLT: 245 starts, 245 done, 245 interrupts, 3 wait_idle, 185 wait_next, 22 idle
(*) SH7722/BLT: 5828128 words, 23788 words/start, 264914 words/idle, 11 starts/idle


Example kernel log (debug mode)
-------------------------------

0.549.014 - sh7722_reset     : Resetting hardware...
0.549.046 - sh7722_reset     : Initializing shared area...
0.549.748 - sh7722_reset     : Clearing interrupts...
0.549.770 - sh7722_reset     : Ready        (   idle, hw     0-    0, next     0-    0, invalid, HC 0000000, INT 000000)
0.568.700 - sh7722_wait      : Waiting..... (running, hw     0-   54, next    56-   56, invalid, HC 1010111, INT 000000)
0.573.339 - sh7722_tdg_irq   : -Interrupt   (running, hw     0-   54, next    56-   56, invalid, HC 0000000, INT 100100)
0.573.397 - sh7722_tdg_irq   :  '-> Idle.   (running, hw     0-   54, next    56-   56, invalid, HC 0000000, INT 000000)
0.573.480 - sh7722_wait      : ........done (   idle, hw     0-   54, next    56-   56, invalid, HC 0000000, INT 000000)
0.583.575 - sh7722_wait      : Waiting..... (running, hw    56-   78, next    80-   80, invalid, HC 1010111, INT 000000)
0.588.414 - sh7722_tdg_irq   : -Interrupt   (running, hw    56-   78, next    80-   80, invalid, HC 0000000, INT 100100)
0.588.470 - sh7722_tdg_irq   :  '-> Idle.   (running, hw    56-   78, next    80-   80, invalid, HC 0000000, INT 000000)
0.588.544 - sh7722_wait      : ........done (   idle, hw    56-   78, next    80-   80, invalid, HC 0000000, INT 000000)
0.601.336 - sh7722_tdg_irq   : -Interrupt   (running, hw    80-  102, next   104-  104, invalid, HC 0000000, INT 100100)
0.601.420 - sh7722_tdg_irq   :  '-> Idle.   (running, hw    80-  102, next   104-  104, invalid, HC 0000000, INT 000000)
0.700.117 - sh7722_tdg_irq   : -Interrupt   (running, hw   104-  124, next   128-  128, invalid, HC 0000000, INT 100100)
0.700.205 - sh7722_tdg_irq   :  '-> Idle.   (running, hw   104-  124, next   128-  128, invalid, HC 0000000, INT 000000)
3.115.419 - sh7722_tdg_irq   : -Interrupt   (running, hw   128-  220, next   224-  224, invalid, HC 0000000, INT 100100)
3.115.506 - sh7722_tdg_irq   :  '-> Idle.   (running, hw   128-  220, next   224-  224, invalid, HC 0000000, INT 000000)
3.151.700 - sh7722_tdg_irq   : -Interrupt   (running, hw   224-  324, next   328-  328, invalid, HC 0000000, INT 100100)
3.151.788 - sh7722_tdg_irq   :  '-> Idle.   (running, hw   224-  324, next   328-  328, invalid, HC 0000000, INT 000000)
3.159.160 - sh7722_wait      : Waiting..... (running, hw   328-  444, next   448-12994,   valid, HC 1010111, INT 000100)
3.161.783 - sh7722_tdg_irq   : -Interrupt   (running, hw   328-  444, next   448-12994,   valid, HC 0000000, INT 100100)
3.161.839 - sh7722_tdg_irq   :  '-> Start!  (running, hw   448-12994, next 12996-12996, invalid, HC 0000000, INT 000000)
4.316.367 - sh7722_tdg_irq   : -Interrupt   (running, hw   448-12994, next 12996-12996, invalid, HC 0000000, INT 100100)
4.316.434 - sh7722_tdg_irq   :  '-> Idle.   (running, hw   448-12994, next 12996-12996, invalid, HC 0000000, INT 000000)
4.316.505 - sh7722_wait      : ........done (   idle, hw   448-12994, next 12996-12996, invalid, HC 0000000, INT 000000)