Difference between revisions of "Performance"
From Gw-qcd-wiki
Revision as of 13:37, 14 July 2010
Carver
- Tester: Ben Gamari
- Test date: 14 Jul 2010
- Commit: e3e4ffafd158abd004c483694a27f4f6bc7d2185
- Hardware:
- CUDA version 3.0

Kernel | Configuration | Bandwidth | FLOPs |
---|---|---|---|
Dslash_cuda | Dslash (24^4) | 73 GB/s | 32 GFLOP/s |
 | hopping (24^4) | 74 GB/s | 34 GFLOP/s |
Dslash_multi_gpu (double) | 1 node, 24^4 Dslash | 79 GB/s | 35 GFLOP/s |
 | 2 nodes, 24^4 Dslash | 145 GB/s | 64 GFLOP/s |
 | 4 nodes, 24^4 Dslash | 256 GB/s | 114 GFLOP/s |
Dslash_multi_gpu (double) | 1 node, 24^4 Dslash | 79 GB/s | 76 GFLOP/s |
 | 2 nodes, 24^4 Dslash | 156 GB/s | 140 GFLOP/s |
 | 4 nodes, 24^4 Dslash | 283 GB/s | 252 GFLOP/s |
Vector utilities | Addition | 82 GB/s | 3.4 GFLOP/s |
 | Dot product | 88 GB/s | N/A |
 | Copy | 84 GB/s | N/A |
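
The multi-node rows above can be turned into a rough scaling figure. The sketch below is not part of the benchmark code: it copies the numbers from the two Dslash_multi_gpu blocks in the table and assumes the usual definition of parallel efficiency, rate on N nodes divided by N times the 1-node rate. The names `first_block`, `second_block`, `parallel_efficiency` and `flops_per_byte` are made up for illustration.

```python
# Figures copied from the table: (nodes, bandwidth in GB/s, rate in GFLOP/s).
# The table lists two Dslash_multi_gpu blocks; they are kept separate here.
first_block = [(1, 79, 35), (2, 145, 64), (4, 256, 114)]
second_block = [(1, 79, 76), (2, 156, 140), (4, 283, 252)]


def parallel_efficiency(rows):
    """Map node count -> efficiency relative to the first (1-node) row.

    Assumed definition: efficiency = rate_N / (N * rate_1).
    """
    base_nodes, _, base_rate = rows[0]
    return {
        nodes: rate / ((nodes / base_nodes) * base_rate)
        for nodes, _, rate in rows
    }


def flops_per_byte(rows):
    """Map node count -> reported GFLOP/s divided by reported GB/s."""
    return {nodes: rate / bandwidth for nodes, bandwidth, rate in rows}


if __name__ == "__main__":
    for label, rows in (("first block", first_block),
                        ("second block", second_block)):
        print(label)
        efficiency = parallel_efficiency(rows)
        intensity = flops_per_byte(rows)
        for nodes in sorted(efficiency):
            print(f"  {nodes} node(s): efficiency {efficiency[nodes]:.2f}, "
                  f"{intensity[nodes]:.2f} FLOP/byte")
```

The FLOP/byte ratio is only the quotient of the two reported columns, so it reflects whatever byte and FLOP counting conventions the benchmark itself uses.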