To speed up processing on the Collatz Conjecture project, locate the
collatz_sieve_1.21_windows_intelx86__opencl_amd_gpu.config (32 bit OS) or
collatz_sieve_1.21_windows_x86_64__opencl_amd_gpu.config (64 bit OS) file in the
C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz project directory.
This will initially be an empty file. Various settings can be entered into this file to control the processing of the Collatz workunits. (With an empty configuration file, the workunits take approximately twice as long to process.)
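If you want to script the lookup, here is a minimal Python sketch that picks the file for a 32 bit or 64 bit OS. The directory and file names are the ones given above (including the 1.21 version number, which will differ for other app versions). The function name config_path and its is_64bit_os parameter are my own and not part of BOINC or Collatz.

```python
import ntpath  # Windows path rules, usable on any host OS

# Directory and file names as given in the post; adjust if your install differs.
PROJECT_DIR = r"C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz"
CONFIG_32 = "collatz_sieve_1.21_windows_intelx86__opencl_amd_gpu.config"
CONFIG_64 = "collatz_sieve_1.21_windows_x86_64__opencl_amd_gpu.config"


def config_path(is_64bit_os: bool) -> str:
    """Return the full path of the config file for a 32 or 64 bit OS (hypothetical helper)."""
    name = CONFIG_64 if is_64bit_os else CONFIG_32
    return ntpath.join(PROJECT_DIR, name)


if __name__ == "__main__":
    print(config_path(True))
    print(config_path(False))
```

The caller passes the OS bitness in explicitly, so the sketch does not try to detect it.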
(name) (values)
verbose 1/0
If enabled, more data appears in the output - no effect on crunching
kernels/reduction 1 - 64
Number of kernels run before doing a reduction. Higher numbers speed up processing; too high a value crashes the video driver. 8 - 48 was originally recommended, although higher values work well with more powerful cards.
threads 6 - 11
Number of GPU registers used, as a power of 2 (2^6 = 64, 2^11 = 2048). Bigger numbers allow more threads to run in parallel; too big and slow external RAM has to be used instead of the fast GPU registers. AMD cards work best at 6 - 8, NVIDIA cards at 8 - 9. Use bigger values for more modern cards.
lut_size 2 - 31
Size of the lookup table as a power of 2; each entry takes 8 bytes (17 = 2^17 = 131,072 entries = 1,048,576 bytes or 1 MB). Larger is better, but >20 will probably crash the GPU driver. Scale to fit in the GPU L1/L2 cache: 4 MB cache = 512K entries = 2^19 entries -> 19.
sieve_size 25 - 32
Size of the sieve used (2^25 to 2^32) as well as the number of items per kernel (26 = roughly 1 million items per kernel, 27 = roughly 2 million items per kernel). Higher is better; too high crashes the GPU driver.
sleep >0
Number of milliseconds for the CPU to sleep while waiting for the kernel to be processed. Bigger values give less CPU usage and better response but slow down the workunit processing.
cache_sieve 1/0
Cache the local sieve table between workunits. If 0, it's re-created for each workunit. The table is stored on disc, not on the card, so GPU memory size is immaterial.
reduce_cpu 1/0
If enabled, CPU utilisation is higher (weirdly opposite to what the item name would suggest) but the graphics system is more responsive.
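The lut_size arithmetic is easy to get wrong, so here is a small Python sketch of it. It assumes only what is stated above: 2^lut_size entries at 8 bytes each. The helper names lut_bytes and largest_lut_size are mine, and the cache size you feed in has to come from your own card's specifications.

```python
def lut_bytes(lut_size: int) -> int:
    """Bytes used by a lookup table of 2**lut_size entries at 8 bytes per entry."""
    return (2 ** lut_size) * 8


def largest_lut_size(cache_bytes: int) -> int:
    """Largest lut_size (within the 2 - 31 range) whose table fits in cache_bytes.

    Hypothetical helper; returns the minimum of 2 even if the cache is tiny.
    """
    size = 2
    while size < 31 and lut_bytes(size + 1) <= cache_bytes:
        size += 1
    return size


if __name__ == "__main__":
    for n in (16, 17, 18, 19):
        print(f"lut_size {n}: {2 ** n} entries, {lut_bytes(n)} bytes")
    # A 4 MiB cache (4 * 1024 * 1024 bytes) as an example input.
    print("fits in 4 MiB:", largest_lut_size(4 * 1024 * 1024))
```

Treat the result as an upper bound to test from, since the driver may still crash at values that fit on paper.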
For maximum credits per workunit:
threads should be scaled to the number of GPU registers present on your card.
lut_size should be scaled to fit in the GPU level 1/level 2 cache.
sleep should be set to 1.
cache_sieve should be set to 1.
reduce_cpu should be set to 0.
kernels/reduction and sieve_size are the ones to experiment with. (Note: not many cards list the number of GPU registers present, so threads is usually also one to experiment with.)
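When experimenting, a typo or an out-of-range value is easy to miss, so here is a rough Python sketch that checks a set of values against the ranges listed above and prints them in the "name value" layout the config file uses. The RANGES table is copied from this post. The render_config function is my own helper, and the numbers for kernels/reduction, threads, lut_size and sieve_size are only placeholders to start experimenting from.

```python
# Value ranges as listed in the post; an upper bound of None means no limit is stated.
RANGES = {
    "verbose": (0, 1),
    "kernels/reduction": (1, 64),
    "threads": (6, 11),
    "lut_size": (2, 31),
    "sieve_size": (25, 32),
    "sleep": (1, None),
    "cache_sieve": (0, 1),
    "reduce_cpu": (0, 1),
}


def render_config(settings: dict) -> str:
    """Validate settings against RANGES and return the config file text (hypothetical helper)."""
    lines = []
    for name, value in settings.items():
        if name not in RANGES:
            raise ValueError(f"unknown setting: {name}")
        low, high = RANGES[name]
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value} is outside the listed range")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    starting_point = {
        "verbose": 1,
        "kernels/reduction": 48,  # placeholder, experiment with this
        "threads": 8,             # placeholder, experiment with this
        "lut_size": 17,           # placeholder, scale to your cache
        "sieve_size": 28,         # placeholder, experiment with this
        "sleep": 1,
        "cache_sieve": 1,
        "reduce_cpu": 0,
    }
    print(render_config(starting_point), end="")
```

Paste the printed text into the config file. Passing the range check does not mean a value is safe for your card; too-high values can still crash the driver.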
Sample configurations grabbed from the top hosts:
AMD 480 (~13 minutes)
verbose 1
kernels/reduction 50
threads 8
lut_size 17
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
AMD R9-280 (~12:15 minutes)
verbose 1
kernels/reduction 48
threads 8
lut_size 17
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
AMD R9-290 (~10 minutes)
verbose 1
kernels/reduction 64
threads 8
lut_size 16
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
AMD R9-390 (~11 minutes)
verbose 1
kernels/reduction 50
threads 8
lut_size 17
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
GTX Titan X (~24:30 minutes)
verbose 1
kernels/reduction 48
threads 9
lut_size 16
sieve_size 26
sleep 1
cache_sieve 1
reduce_cpu 1
**** With reduce_cpu set to 1, CPU usage is increased and so processing times are longer ****
GTX 780 Ti (~19:15 minutes)
verbose 1
kernels/reduction 48
threads 9
lut_size 18
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
GTX 980Ti (~7 minutes)
verbose 1
kernels/reduction 64
threads 8
lut_size 18
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
GTX 1060 (~10 minutes)
verbose 1
kernels/reduction 48
threads 8
lut_size 17
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
GTX 1070 (~9:30 minutes)
verbose 1
kernels/reduction 56
threads 9
lut_size 17
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
GTX 1080 (~6 minutes)
verbose 1
kernels/reduction 48
threads 8
lut_size 17
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
Although pure CPU crunching is very inefficient on Collatz, the following setup gives the best results:
verbose 1
lut_size 18
sieve_size 30
cache_sieve 1