Tweaking app xml files
Each project will normally just crunch tasks based on each PC's capabilities.
However, there are times, especially now with CPU and GPU processing power improving in leaps and bounds, when your PC could actually do more.
And likewise, there could be times when you want to specifically limit certain projects from doing too much (as it could slow your PC down).
This is partly where the app_config.xml and app_info.xml files come into use, as you can subtly tweak the project so that it performs based on your willingness to share your computer's resources (or not, as the case may be).
I'll list some project-specific codes that might help members in further posts to this thread.
Specific to the Collatz Conjecture Boinc project
For speeding up processing times on the Collatz Conjecture project, locate the collatz_sieve_1.21_windows_intelx86__opencl_amd_gpu.config (32 bit OS) or collatz_sieve_1.21_windows_x86_64__opencl_amd_gpu.config (64 bit OS) file in the C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz project directory.
This will initially be an empty file. There are various settings that can be entered into this file, one per line in the form name=value, to control the processing of the Collatz workunits (with an empty configuration file, the workunits take approximately twice as long to process).
(name) (values)
verbose 1/0
If enabled, more data appears in the output - no effect on crunching
kernels/reduction (written as kernels_per_reduction in the .config file) 1 - 64
Number of kernels done before reducing. Higher numbers speed up processing; too high crashes the video driver. 8 - 48 was originally recommended, although higher values work well with more powerful cards.
threads 6 - 11
Amount of GPU registers used (2^6 = 64, 2^11 = 2048). Bigger numbers allow more threads to be run in parallel; too big and slow external RAM needs to be used instead of the fast GPU registers. AMD cards work best at 6 - 8, NVIDIA cards at 8 - 9. Bigger values for more modern cards.
lut_size 2 - 31
Size of the lookup table in powers of 2, each entry takes 8 bytes (17 = 2^17 = 131,072 entries = 1,048,576 bytes or 1 MB). Larger is better, but >20 will probably crash the GPU driver. Scale to fit in the GPU L1/L2 cache: 4 MB cache = 512K entries = 2^19 entries -> 19
sieve_size 25 - 32
Size of the sieve used (2^25 to 2^32) as well as the number of items per kernel (26 = roughly 1 million items per kernel, 27 = roughly 2 million items per kernel). Higher is better; too high crashes the GPU driver.
sleep >0
Number of milliseconds for the CPU to sleep while waiting for the kernel to be processed. Bigger values give less CPU usage and better response but slow down the workunit processing.
cache_sieve 1/0
Cache the local sieve table between workunits. If 0, it's re-created for each workunit. Stored on disc not on the card, GPU memory size immaterial.
reduce_cpu 1/0
If enabled, more CPU utilisation (weirdly opposite to what the item name would suggest) but a more responsive graphics system.
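To sanity-check the power-of-two settings in the list above, here is a minimal sketch in Python. The helper names are made up for illustration and are not part of the Collatz application; the only facts assumed are the ones stated above (threads and lut_size are exponents, and each lookup table entry takes 8 bytes).

```python
# Hypothetical helpers for illustration only; not part of BOINC or the Collatz app.
LUT_ENTRY_BYTES = 8  # each lookup table entry takes 8 bytes (from the list above)


def threads_count(threads_setting: int) -> int:
    """threads is an exponent: 6 -> 64, 11 -> 2048."""
    return 2 ** threads_setting


def lut_bytes(lut_size: int) -> int:
    """Size of the lookup table in bytes for a given lut_size exponent."""
    return (2 ** lut_size) * LUT_ENTRY_BYTES


def largest_lut_size_for(cache_bytes: int) -> int:
    """Largest lut_size (2 - 31) whose table still fits in cache_bytes.

    Returns 2, the smallest allowed value, if nothing fits.
    """
    best = 2
    for size in range(2, 32):
        if lut_bytes(size) <= cache_bytes:
            best = size
    return best


if __name__ == "__main__":
    print(threads_count(9))                       # 512 threads
    print(lut_bytes(17))                          # 1048576 bytes, i.e. 1 MB
    print(largest_lut_size_for(4 * 1024 * 1024))  # 19 for a 4 MB cache
```

This only does the arithmetic; whether a given value is stable on a particular card still has to be found by experiment.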
For maximum credits per workunit:
threads should be scaled to the number of GPU registers present on your card.
lut_size should be scaled to fit in the GPU level 1/level 2 cache.
sleep should be set to 1.
cache_sieve should be set to 1.
reduce_cpu should be set to 0.
kernels/reduction and sieve_size are the ones to experiment with. (Note: not many cards list the number of GPU registers present so threads is usually also one for experimenting with)
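The advice above could be turned into a starting .config file with something like the following sketch. It is only an illustration: build_config is a made-up helper, the allowed ranges are copied from the settings list, and the output uses the name=value form with kernels_per_reduction that later posts in this thread show the application accepting. verbose=1 is included because all the sample configurations use it.

```python
# Hypothetical helper for illustration; ranges are taken from the settings list above.
RANGES = {
    "kernels_per_reduction": (1, 64),
    "threads": (6, 11),
    "lut_size": (2, 31),
    "sieve_size": (25, 32),
}


def build_config(kernels_per_reduction: int, threads: int,
                 lut_size: int, sieve_size: int) -> str:
    """Return .config text with the fixed 'maximum credit' values filled in."""
    chosen = {
        "kernels_per_reduction": kernels_per_reduction,
        "threads": threads,
        "lut_size": lut_size,
        "sieve_size": sieve_size,
    }
    # Reject values outside the documented ranges before writing anything.
    for name, value in chosen.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} - {high}")
    lines = ["verbose=1"]
    lines += [f"{name}={value}" for name, value in chosen.items()]
    # Fixed values recommended above for maximum credits per workunit.
    lines += ["sleep=1", "cache_sieve=1", "reduce_cpu=0"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_config(48, 8, 17, 30), end="")
```

Start with conservative values for kernels_per_reduction and sieve_size and raise them one at a time, since values that are too high crash the driver.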
Sample configurations grabbed from the top hosts (shown as they appear in the stderr output, with the approximate run time per workunit in brackets). The settings for each card are in the code blocks, in this order:
AMD 480 (~13 minutes)
AMD R9-280 (~12:15 minutes)
AMD R9-290 (~10 minutes)
AMD R9-390 (~11 minutes)
GTX Titan X (~24:30 minutes) **** This one has CPU usage increased and so longer processing times ****
GTX 780 Ti (~19:15 minutes)
GTX 980Ti (~7 minutes)
GTX 1060 (~10 minutes)
GTX 1070 (~9:30 minutes)
GTX 1080 (~6 minutes)
Although pure CPU crunching is very inefficient on Collatz, the setup in the last code block gives the best results for it.
AMD 480 (~13 minutes)
Code: Select all
verbose 1
kernels/reduction 50
threads 8
lut_size 17
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
AMD R9-280 (~12:15 minutes)
Code: Select all
verbose 1
kernels/reduction 48
threads 8
lut_size 17
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
AMD R9-290 (~10 minutes)
Code: Select all
verbose 1
kernels/reduction 64
threads 8
lut_size 16
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
AMD R9-390 (~11 minutes)
Code: Select all
verbose 1
kernels/reduction 50
threads 8
lut_size 17
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
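GTX Titan X (~24:30 minutes)
**** Does have CPU usage increased and so longer processing times ****
Code: Select all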
verbose 1
kernels/reduction 48
threads 9
lut_size 16
sieve_size 26
sleep 1
cache_sieve 1
reduce_cpu 1
GTX 780 Ti (~19:15 minutes)
Code: Select all
verbose 1
kernels/reduction 48
threads 9
lut_size 18
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
GTX 980Ti (~7 minutes)
Code: Select all
verbose 1
kernels/reduction 64
threads 8
lut_size 18
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
GTX 1060 (~10 minutes)
Code: Select all
verbose 1
kernels/reduction 48
threads 8
lut_size 17
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
GTX 1070 (~9:30 minutes)
Code: Select all
verbose 1
kernels/reduction 56
threads 9
lut_size 17
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
GTX 1080 (~6 minutes)
Code: Select all
verbose 1
kernels/reduction 48
threads 8
lut_size 17
sieve_size 30
sleep 1
cache_sieve 1
reduce_cpu 0
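CPU only (pure CPU crunching is very inefficient on Collatz, but this setup gives the best results)
Code: Select all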
verbose 1
lut_size 18
sieve_size 30
cache_sieve 1
Last edited by Woodles on Sun Nov 05, 2017 4:39 pm, edited 3 times in total.
Re: Tweaking app xml files
Hi Nick,
Best I could find were:
GTX 750 Ti (45 minutes)
Code: Select all
verbose 1
kernels/reduction 32
threads 8
lut_size 17
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
AMD 200 series Pitcairn (~30 minutes) - Which I assume is a 270x or very similar, as AMD GPUs don't identify themselves very well in the results
Code: Select all
verbose 1
kernels/reduction 48
threads 6
lut_size 14
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
Mark
Re: Tweaking app xml files
Thanks for looking. Seems like that did not shave any time off for the 750s.


Re: Tweaking app xml files
Oh well, worth a try.
Mark
ETA. Are you sure you put the .config files in the right place as none of your recent tasks seem to be using them?
They should be plain text files called collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config and collatz_sieve_1.21_windows_intelx86__opencl_nvidia_gpu.config in the ....\projects\boinc.thesonntags.com_collatz directory with just the text from the code brackets above.
If you check a valid task it should have something like the following in it.
Code: Select all
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
Collatz Conjecture Sieve 1.21 Windows x86_64 for OpenCL
Written by Slicker (Jon Sonntag) of team SETI.USA
Based on the AMD Brook+ kernels by Gipsel of team Planet 3DNow!
Sieve code and OpenCL optimization provided by Sosiris of team BOINC@Taiwan
Collatz Config Settings:
verbose 1 (yes)
kernels/reduction 56
threads 2^9 (512)
lut_size 17 (1048576 bytes)
sieve_size 2^28 (14098344 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)
Platform NVIDIA
Device 00000000002F4990
Max Dimensions 3
Max Work Items 1024 1024 64
Max Work Groups 1024
Max Kernel Threads 1024
Device Vendor NVIDIA Corporation
Name GeForce GTX 1070
Driver Version 376.09
OpenCL Version OpenCL 1.2 CUDA
actual threads 512
Start 3121413499751415939072
Stop 3121413552527974072320
Best 3121413542220481506543
Highest steps 2137
Total steps 405115045471362
Average steps 584
CPU time 0.405603 seconds
Elapsed time 556.361seconds
22:25:03 (1164): called boinc_finish
</stderr_txt>
]]>
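If you'd rather not read through the stderr by eye, the settings block can be pulled out with a few lines of Python. This is only a sketch: parse_collatz_settings is a made-up name, and it assumes the layout shown in the sample above (a "Collatz Config Settings:" header, one setting per line, and the block ending at the "Platform" line).

```python
# Hypothetical helper for illustration; assumes the stderr layout shown above.
def parse_collatz_settings(stderr_text: str) -> dict:
    """Return the lines after 'Collatz Config Settings:' as name -> value text."""
    settings = {}
    in_block = False
    for raw in stderr_text.splitlines():
        line = raw.strip()
        if line == "Collatz Config Settings:":
            in_block = True
            continue
        if not in_block or not line:
            continue
        if line.startswith("Platform"):
            break  # end of the settings block in the sample output
        parts = line.split(None, 1)  # setting name, then the rest of the line
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings


if __name__ == "__main__":
    sample = """Collatz Config Settings:
verbose 1 (yes)
kernels/reduction 56
threads 2^9 (512)
Platform NVIDIA
"""
    for name, value in parse_collatz_settings(sample).items():
        print(name, "->", value)
```

If the dictionary comes back empty, or the values don't match what is in the .config file, the file is probably not being picked up.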
Re: Tweaking app xml files
collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config:
verbose 1
kernels/reduction 32
threads 8
lut_size 17
sieve_size 28
sleep 1
cache_sieve 1
reduce_cpu 0
Do I need to put it in the 32 bit one as well?


Re: Tweaking app xml files
Sorry Nick, I didn't describe the file contents very well to begin with.
You need a collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config file in the sonntags directory containing only the values above BUT you need equals signs between the variables and the values:
Code: Select all
verbose=1
kernels_per_reduction=32
threads=8
lut_size=17
sieve_size=28
sleep=1
cache_sieve=1
reduce_cpu=0
I don't know if spaces matter, I don't put them in and it works fine.
You only need the 32 bit config file if you're going to be running the 32 bit applications. I include it just in case the project runs out of 64 bit workunits (not happened yet).
Regards,
Mark
Re: Tweaking app xml files
I was trying to do this just now but found that the directory was empty.
I then spotted the notice about having to remove and re-add the project.
Did that and the files appeared in the directory, so I've added the tweaks. I'm not sure how to tell if it's making a difference though?
And, strangely, the notice has disappeared? Or is that because I've done what it asked?
I'm confused!
Update - after doing the config files I was getting computation errors after a few seconds of running.
I've gone back and cleared out the files and all seems well again. I've obviously done something wrong, I'll take another look later, it's the wrong time of day to be messing about, I think.....

Re: Tweaking app xml files
Hi Jeff,
If you check on your returned tasks, the times should be much reduced. Also, looking in the stderr output, you should find the settings reflected there, i.e.
Code: Select all
Collatz Config Settings:
verbose 1 (yes)
kernels/reduction 32
threads 2^9 (512)
lut_size 17 (1048576 bytes)
sieve_size 2^28 (14098344 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)
And yes, the notice has disappeared because you've done what it asked.

Jeffers wrote (Sun Oct 14, 2018 5:45 am):
Update - after doing the config files I was getting computation errors after a few seconds of running.
I've gone back and cleared out the files and all seems well again. I've obviously done something wrong, I'll take another look later, it's the wrong time of day to be messing about, I think.....

From your results, it looks like you've put "kernels/reduction=56" in the config file rather than "kernels_per_reduction=56".
stderr states "Unrecognized command: kernels/reduction=56"
Also try dropping threads down to 8
Mark
Re: Tweaking app xml files
OK, but I'm sure I just cut and pasted from what was posted earlier. Just edited to add the "="
I'll have a go tomorrow, just back from the pub so definitely not the time to try........

Re: Tweaking app xml files
Ah, but some idiot (http://www.ukboincteam.org.uk/newforum/ ... file&u=380) didn't describe the contents properly. What was posted is the output from the stderr, NOT what goes in the .config file.
.config file (input)
Code: Select all
verbose=1
kernels_per_reduction=32
threads=9
lut_size=17
sieve_size=28
sleep=1
cache_sieve=1
reduce_cpu=0
stderr (output)
Code: Select all
Collatz Config Settings:
verbose 1 (yes)
kernels/reduction 32
threads 2^9 (512)
lut_size 17 (1048576 bytes)
sieve_size 2^28 (14098344 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)
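Since the stderr output and the .config input differ only in a few predictable ways, the conversion can be sketched in Python. This is an illustration, not a tool from the project: stderr_to_config is a made-up name, and the only renames it knows about are the two visible in this thread (kernels/reduction and reducecpu).

```python
# Hypothetical helper for illustration; the renames are the two seen in this thread.
RENAMES = {
    "kernels/reduction": "kernels_per_reduction",
    "reducecpu": "reduce_cpu",
}


def stderr_to_config(stderr_settings: str) -> str:
    """Turn stderr-style lines ('threads 2^9 (512)') into .config lines ('threads=9')."""
    out = []
    for raw in stderr_settings.splitlines():
        parts = raw.split()
        # Skip blank lines and the 'Collatz Config Settings:' header.
        if len(parts) < 2 or raw.strip().endswith(":"):
            continue
        name, value = parts[0], parts[1]
        if value.startswith("2^"):
            value = value[2:]  # stderr prints exponents as 2^N; the file wants N
        out.append(f"{RENAMES.get(name, name)}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = """Collatz Config Settings:
verbose 1 (yes)
kernels/reduction 32
threads 2^9 (512)
reducecpu 0 (no)
"""
    print(stderr_to_config(sample), end="")
```

Anything in brackets after the value, such as (yes) or (512), is dropped, because it is only the application echoing back how it interpreted the setting.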
Mark
Re: Tweaking app xml files
There's no shame in being an idiot, been there myself a few times.....
Anyway, I think I've got it sorted. Run times for Collatz tasks now look to be around 9 minutes instead of 14.
I did change that threads option to 8 rather than 9 as suggested in your message earlier....
