milkyway3 v0.02 source released
Author | Message |
---|---|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
As I said before, we're moving over to a new application. Here's a preliminary version that we'll be using to test its validator and assimilator. It's going to have to be updated with a new fitness function which uses a different model of the Milky Way galaxy, but I'm still waiting for John Vickers to give me the go-ahead to start using that code -- should be later this week. In the meantime, we can still start testing this code as it uses the same output format that John's new code will use. This new application should really improve server performance, as it doesn't use search parameter files to send out new workunits; it will take the parameters from the command line instead. Additionally, it will report the fitness in stderr, which gets moved into the result's XML. What this means is the server will be creating one fewer file for every workunit made (it currently creates 1 file), and receiving one fewer file for every result sent (it currently receives 1 file). This means sending workunits out and getting results in won't be hammering the filesystem nearly as hard. The recent server crashes have been happening because the file deletion daemon just can't handle the sheer mass of WU files that are being generated, and it brings the server to a screeching halt, which then crashes the other daemons. This new application should fix that problem, so we really want to get everyone swapped over to it ASAP. Contained in this news post (if you go to the forums) is the download location for the new source, and directions on how to compile and test it. --Travis |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
You can get the source here: http://milkyway.cs.rpi.edu/milkyway/download/mw3_v0.02.zip and http://milkyway.cs.rpi.edu/milkyway/download/mw3_v0.02.tar The makefile in milkyway/bin should work for both Linux and OS X now. To be able to compile, the BOINC software should sit next to milkyway in the same parent directory, i.e. if you have a software directory: /software /software/boinc/... <---- boinc code here /software/milkyway/ <---- milkyway code here To test the code, move your binary into milkyway/bin/test_files/. There's a test_small.sh and a test_large.sh. If you run test_small.sh, it should output the results for the different stripes provided in stderr.txt, and it should look something like the following. You can also run the tests individually (with test_application_on_ stripe 11: 21:21:38 (12732): Can't open init data file - running in standalone mode 21:22:20 (12732): called boinc_finish stripe 12: shmget in attach_shmem: Invalid argument 21:22:22 (12736): Can't set up shared mem: -1. Will run in standalone mode. 21:23:05 (12736): called boinc_finish stripe 20: shmget in attach_shmem: Invalid argument 21:23:07 (12741): Can't set up shared mem: -1. Will run in standalone mode. 21:23:16 (12741): called boinc_finish stripe 21: shmget in attach_shmem: Invalid argument 21:23:18 (12745): Can't set up shared mem: -1. Will run in standalone mode. 21:23:24 (12745): called boinc_finish stripe 79: shmget in attach_shmem: Invalid argument 21:23:26 (12748): Can't set up shared mem: -1. Will run in standalone mode. 21:23:32 (12748): called boinc_finish shmget in attach_shmem: Invalid argument stripe 82: 21:23:34 (12751): Can't set up shared mem: -1. Will run in standalone mode. 21:23:41 (12751): called boinc_finish stripe 86: shmget in attach_shmem: Invalid argument 21:23:43 (12754): Can't set up shared mem: -1. Will run in standalone mode.
21:23:58 (12754): called boinc_finish You want to make sure that the background_integral, stream_integral, and search_likelihood fields are all within 10e-11 (and probably at least 10e-12 to be really safe) of the ones given here to ensure they're validated correctly. If you're getting the results on a CPU, they should probably match exactly or close to exactly -- between my different machines they do. I'm currently running test_large.sh. This uses workunits that the server is currently sending out to people, and I should have those values to compare against shortly. It's very important that these match up to the same number of decimal places, because this is what we're actually going to be using, and the added complexity of these WUs could cause some differences that won't show up in test_small.sh -- which is only for testing purposes. When I have these results (hopefully tomorrow), I'll edit this post with them as well. --Travis |
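The tolerance check described in the post above can be sketched roughly as follows. This is only an illustration, not the project's validator: the function name `fields_out_of_tolerance` is hypothetical, the default tolerance of 1e-11 is one reading of the "10e-11" figure in the post, and the two likelihood values are the stripe 11 large-test numbers reported later in this thread.

```python
def fields_out_of_tolerance(reference, candidate, tol=1e-11):
    """Return names of fields that are missing or differ by more than tol.

    Hypothetical helper for illustration; the real validator may differ.
    """
    mismatches = []
    for name, ref_value in reference.items():
        if name not in candidate or abs(candidate[name] - ref_value) > tol:
            mismatches.append(name)
    return mismatches


# Stripe 11 (large test) likelihoods quoted later in the thread.
stock = {"search_likelihood": -3.125086817277920}
ati = {"search_likelihood": -3.125086817279327}

print(fields_out_of_tolerance(stock, ati))             # passes at 1e-11
print(fields_out_of_tolerance(stock, ati, tol=1e-12))  # fails at the stricter 1e-12
```

The same function would be called once per field group (background integral, stream integrals, likelihood) if all three were available for both applications.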
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Here's test_large.sh for stripe 11: stripe 12: stripe 20: stripe 21: stripe 79: stripe 82: stripe 86: |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I just noticed that the integral sizes for stripes 20, 21, 79, 82 and 86 (the large versions) were not large enough to reflect the current workunit sizes. To be sure we're getting the right values I've updated the link to the source code files. For stripes 20, 21, 79, and 82, the integrals should be: 32 r[min,max,steps]: 16.0, 22.5, 1400 33 mu[min,max,steps]: 133, 249, 1600 34 nu[min,max,steps]: -1.25, 1.25, 640 and for 86 (which does two integrals), they should be: 22 r[min,max,steps]: 16.000000, 22.500000, 1400 23 mu[min,max,steps]: 310.000000, 420.000000, 1600 24 nu[min,max,steps]: -1.250000, 1.250000, 640 25 number_cuts: 1 26 r_cut[min,max,steps][3]: 16.0000000000, 22.5000000000, 700 27 mu_cut[min,max,steps][3]: 21.6000000000, 22.3000000000, 800 28 nu_cut[min,max,steps][3]: -1.2500000000, 1.2500000000, 320 |
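To get a feel for how large these integrals are, the step counts above can be multiplied out. The sketch below assumes each integral is evaluated on a regular grid with `steps` cells per axis, which is an interpretation of the `[min,max,steps]` notation rather than something stated in the post; the `Axis` class and `total_cells` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """One integration axis given as [min, max, steps] (hypothetical container)."""
    minimum: float
    maximum: float
    steps: int

    @property
    def step_size(self):
        # Width of one cell, assuming a uniform grid.
        return (self.maximum - self.minimum) / self.steps


def total_cells(*axes):
    """Number of grid cells: the product of the step counts."""
    count = 1
    for axis in axes:
        count *= axis.steps
    return count


# Stripes 20, 21, 79, 82: r, mu, nu as listed in the post.
main = (Axis(16.0, 22.5, 1400), Axis(133, 249, 1600), Axis(-1.25, 1.25, 640))
# Stripe 86: the additional cut integral.
cut = (Axis(16.0, 22.5, 700), Axis(21.6, 22.3, 800), Axis(-1.25, 1.25, 320))

print(total_cells(*main))  # 1433600000
print(total_cells(*cut))   # 179200000
```

Under that assumption, halving the nu step count from 640 to 320 halves the work of the main integral, which is consistent with a wrong nu value changing both the runtime and the result.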
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
I have just built some kind of hybrid between MW2 and the new MW3. It takes the values from the search parameter file if it exists, otherwise from the command line. It should work for "sub projects" (an out file is written if the search parameter file was there). I just have to check all the different test units. It's a bit late here already, so for today just the stripe 11 WUs. Bear in mind that double precision values have only slightly less than 16 decimal digits of precision. That's why I output only those 16 significant digits. The binary encoding of the number is unambiguously determined by those digits; anything more is simply redundant. stripe 11-small: <background_integral>0.0003444046777666962</background_integral> stripe 11-large: <background_integral>0.0003443992660766585</background_integral> The difference in the final likelihood between the stock OSX and the ATI app is about 10^-12 for the large (production size) WU. I think that is roughly the same range as the difference between the stock CPU and the CUDA application. I've not tested it yet (takes too long ;), but I think my CPU versions will return virtually the same (maybe with a 10^-15 difference) likelihood values as the ATI GPUs. PS: The small WU took 2 seconds or so, the large one about 8.5 minutes on a HD3870. PPS @Travis: I'm quite sure that the stock CPU and CUDA versions would get quite a bit closer to my results if they used Kahan summation for everything outside of the convolution loops, as I'm doing (integration as well as likelihood calculation). You told me the results can be anywhere in the stderr.txt and additional output doesn't hurt. If you want to test it, here is the complete output of the run: Can't set up shared mem: -1 Will run in standalone mode. Running Milkyway@home ATI GPU application version 0.24 (Win64, CAL 1.3) by Gipsel Parsing search parameters from command line.
CPU: AMD Phenom(tm) 9750 Quad-Core Processor (4 cores/threads) 2.44807 GHz (294ms) CAL Runtime: 1.3.145 Found 3 CAL devices Device 0: ATI Radeon HD2350/2400/3200/4200 (RV610/RV620) 64 MB local RAM (remote 28 MB cached + 4 MB uncached) GPU core clock: 837 MHz, memory clock: 397 MHz 40 shader units organized in 1 SIMDs with 8 VLIW units (5-issue), wavefront size 32 threads not supporting double precision Device 1: ATI Radeon HD3800 (RV670) 512 MB local RAM (remote 28 MB cached + 0 MB uncached) GPU core clock: 837 MHz, memory clock: 397 MHz 320 shader units organized in 4 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads supporting double precision Device 2: ATI Radeon HD3800 (RV670) 512 MB local RAM (remote 28 MB cached + 0 MB uncached) GPU core clock: 837 MHz, memory clock: 397 MHz 320 shader units organized in 4 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads supporting double precision 0 WUs already running on GPU 1 0 WUs already running on GPU 2 Starting WU on GPU 1 main integral, 640 iterations (1600x1400), 3 streams predicted runtime per iteration is 649 ms (33.3333 ms are allowed), dividing each iteration in 20 parts borders of the domains at 0 80 160 240 320 400 480 560 640 720 800 880 960 1040 1120 1200 1280 1360 1440 1520 1600 Calculated about 3.28897e+013 floatingpoint ops on GPU, 2.47165e+008 on FPU. Approximate GPU time 502.078 seconds. <background_integral>0.0003443992660766585</background_integral> <stream_integral>0 34.16147717899386</stream_integral> <stream_integral>1 667.2161728244141</stream_integral> <stream_integral>2 336.15696917356</stream_integral> probability calculation (97434 stars) Calculated about 3.23676e+009 floatingpoint ops on FPU. <search_likelihood>-3.125086817279327</search_likelihood> <search_application>Gipsel_GPU_CAL_x64: 0.24 double</search_application> WU completed. 
CPU time: 5.875 seconds, GPU time: 502.078 seconds, wall clock time: 504.791 seconds, CPU frequency: 2.44809 GHz When the exact same application is used with a search parameter file, it ignores the command line arguments (and suppresses the warnings seen with the current version). Unfortunately the results are not identical to the 0.23 version, as 0.24 uses the updated sgrToGal coordinate conversion (no hardcoded rotation matrix anymore) in "atSurveyGeometry.c". Running Milkyway@home ATI GPU application version 0.24 (Win64, CAL 1.3) by Gipsel Search parameter file found. Ignoring search parameters on command line. CPU: AMD Phenom(tm) 9750 Quad-Core Processor (4 cores/threads) 2.44808 GHz (347ms) [..] |
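Since the result fields can appear anywhere in stderr.txt among other output, pulling them out is a matter of scanning for the tags. The following is a minimal sketch based only on the tag names and layout visible in the output above; the function name `parse_result` is hypothetical and this is not the server's actual parsing code.

```python
import re

# Tags as they appear in the stderr output quoted above.
TAG_PATTERN = re.compile(
    r"<(background_integral|stream_integral|search_likelihood)>(.*?)</\1>"
)


def parse_result(stderr_text):
    """Extract the integral and likelihood fields from stderr text."""
    result = {"stream_integral": {}}
    for name, body in TAG_PATTERN.findall(stderr_text):
        if name == "stream_integral":
            # Stream integrals are written as "<index> <value>".
            index, value = body.split()
            result["stream_integral"][int(index)] = float(value)
        else:
            result[name] = float(body)
    return result


sample = (
    "Approximate GPU time 502.078 seconds. "
    "<background_integral>0.0003443992660766585</background_integral> "
    "<stream_integral>0 34.16147717899386</stream_integral> "
    "<stream_integral>1 667.2161728244141</stream_integral> "
    "<stream_integral>2 336.15696917356</stream_integral> "
    "probability calculation (97434 stars) "
    "<search_likelihood>-3.125086817279327</search_likelihood>"
)
print(parse_result(sample))
```

Surrounding diagnostic lines are ignored, which matches the remark that additional output doesn't hurt.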
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I've updated the info with stripes 79 and 82, just one more to go. My poor old laptop has been crunching 24/7 trying to get the values out. How does your application compare on the other stripes? There are some different settings which might affect the end result (especially between stripes 11,12 and 20,21 and 79,82 and 86). I'm going to have to talk to Anthony about Kahan summation. I know I implemented it but I'm not quite sure if he's using it. Other than that the results look really good :) Having things not be identical to 0.23 doesn't really matter because we'll be using it for milkyway3 (and not the current running application). So it should match what we've got. I'm just finishing up some work generation changes and should hopefully start putting out WUs for the new application tomorrow. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
I've updated the info with stripes 79 and 82, just one more to go. My poor old laptop has been crunching 24/7 trying to get the values out. I just modified the batch files and set the integral sizes in astronomy parameters to the values you mentioned above. It will take half an hour or so, and then I'll have the values for all stripes. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Small tests (stock app is the one for OSX with the values Travis posted above): stripe 11: stock app: -3.125097883803210 ATI3 0.24: -3.125097883803223 stripe 12: stock app: -3.220830076676231 ATI3 0.24: -3.220830076676198 stripe 20: stock app: -2.985361292761437 ATI3 0.24: -2.985361292761449 stripe 21: stock app: -2.889404057127142 ATI3 0.24: -2.889404057127163 stripe 79: stock app: -2.946733795900296 ATI3 0.24: -2.946733795900271 stripe 82: stock app: -2.985617635736142 ATI3 0.24: -2.985617635736161 stripe 86: stock app: -3.027973787718258 ATI3 0.24: -3.027973787718241 ==================================== Large Test: stripe 11: stock app: -3.125086817277920 ATI3 0.24: -3.125086817279327 stripe 12: stock app: -3.220820423936994 ATI3 0.24: -3.220820423937265 stripe 20: stock app: -2.985310834912030 ATI3 0.24: -2.985310834396576 stripe 21: stock app: -2.889340482223516 ATI3 0.24: -2.889340481681885 stripe 79: stock app: -2.946681292501331 ATI3 0.24: -2.946681291653775 stripe 82: stock app: -2.985567797346584 ATI3 0.24: -2.985567796521971 stripe 86: stock app: not available yet ATI3 0.24: -3.027907173993795 ================================== All results are given with 16 significant decimal digits. The results of the large WUs for stripes 20, 21, 79, 82, and 86 deviate significantly. That coincides with the wrong integral sizes defined in the astronomy parameter file for those stripes as mentioned by Travis above. I've run the calculations for the corrected values given in the linked post. I can only assume the values of the stock app are for the original (too small) values. Can you say something about it Travis? |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I just updated with stripe 86 (which seems to match yours), but I'm sure the other values (20, 21, 79, 82) are for the right integral sizes, because I fixed them all before running them, and they all took 10-12 hours on my laptop. Smaller sizes would have been much quicker. *edit* D'oh, I see the problem: 20, 21, 79, and 82 should be 640 for the nu value (not 320). That most likely explains the discrepancy. Going to fix the post about the sizes. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Large test updated with fixed integral sizes: stripe 11: stock app: -3.125086817277920 ATI3 0.24: -3.125086817279327 stripe 12: stock app: -3.220820423936994 ATI3 0.24: -3.220820423937265 stripe 20: stock app: -2.985310834912030 ATI3 0.24: -2.985310834913758 stripe 21: stock app: -2.889340482223516 ATI3 0.24: -2.889340482226973 stripe 79: stock app: -2.946681292501331 ATI3 0.24: -2.946681292507772 stripe 82: stock app: -2.985567797346584 ATI3 0.24: -2.985567797359872 stripe 86: stock app: -3.027907173971970 ATI3 0.24: -3.027907173993795 ==================================== The differences are generally in the 10^-12 range, but for stripe 82 and 86 10^-11. Maybe one should really look into reducing that a bit. |
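The "10^-12 versus 10^-11" observation can be checked directly from the numbers in the post above. The sketch below uses Python's `decimal` module so the posted digits are subtracted exactly rather than rounded through binary floating point; the table and function names are only for illustration.

```python
from decimal import Decimal

# stripe: (stock app, ATI3 0.24), copied from the large test above.
LARGE_TEST = {
    11: ("-3.125086817277920", "-3.125086817279327"),
    12: ("-3.220820423936994", "-3.220820423937265"),
    20: ("-2.985310834912030", "-2.985310834913758"),
    21: ("-2.889340482223516", "-2.889340482226973"),
    79: ("-2.946681292501331", "-2.946681292507772"),
    82: ("-2.985567797346584", "-2.985567797359872"),
    86: ("-3.027907173971970", "-3.027907173993795"),
}


def abs_differences(results):
    """Absolute difference between the two posted values for each stripe."""
    return {
        stripe: abs(Decimal(stock) - Decimal(ati))
        for stripe, (stock, ati) in results.items()
    }


for stripe, diff in abs_differences(LARGE_TEST).items():
    print(f"stripe {stripe}: {diff:.3E}")
```

Run on these values, stripes 11 through 79 come out below 1e-11 while stripes 82 and 86 come out above it, in line with the summary in the post.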
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
The differences are generally in the 10^-12 range, but for stripe 82 and 86 10^-11. Maybe one should really look into reducing that a bit. I wonder if using Kahan summations on the CPU would bring things closer together at all. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
The differences are generally in the 10^-12 range, but for stripe 82 and 86 10^-11. Maybe one should really look into reducing that a bit. It definitely worked here ;) You see it when comparing the results of the ATI apps and my optimized CPU apps (starting with the 0.20 versions). They are extremely close (I've taken extra care to do some things as similarly as possible in the ATI and CPU versions). That is simply the effect of the Kahan summation and the "scale shift trick" in calculate_likelihood. The tests I did back then in September last year with some WUs (1600x700x320, 120 convolution steps) yielded really identical results for all my CPU and GPU versions. Did you read my PM? Should be easy to put those things in and check what it does. |
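For readers who have not met it, Kahan (compensated) summation carries a running correction term that recovers the low-order bits lost in each addition. The sketch below is the standard textbook form of the algorithm, not code taken from any of the Milkyway@home applications; the naive loop is written out explicitly for comparison, and `math.fsum` serves as the reference.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Textbook Kahan summation with one compensation term."""
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for x in values:
        y = x - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum
print("naive error:", abs(naive_sum(values) - reference))
print("kahan error:", abs(kahan_sum(values) - reference))
```

Note that aggressive compiler options (for example fast-math style floating-point reordering in C) can optimize the compensation away, so a port of this to the real applications would need to be checked for that.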
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Did you read my PM? Should be easy to put those things in and check what it does. I did and I'm gonna give it a shot. I'm just wondering if doing Kahan summation and the scale shift trick will bring the CPUs closer to the GPU or farther away. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Did you read my PM? Should be easy to put those things in and check what it does. You mean when comparing a CPU version with the changes to a CUDA app without them? When it is implemented in both, it will bring them closer together (as can be seen from my versions starting with 0.20; both the CPU and GPU versions have it). When the results were visible in the task details for a short time, one could see that this still holds for current production WUs (just look at the results of the 0.21 GPU and the 64bit 0.20 SSE3 CPU application, they are identical). When implemented only in the CPU version, it may not help that much. It also depends on how you do the reduction (summing the results of all threads) in the CUDA version. The ATI version did a tree-like summation before, which was already better than what is currently done in the stock CPU application. The Kahan summation during the integration was therefore a less marked improvement than the changes to the likelihood calculation. But today's larger WUs may shift that, as the summation method gets relatively more important with larger integrals. Since the deviation between the stock CPU version and my version implementing those changes grows with the integral sizes, one may conclude that a better summation outweighs the positive effect of the scale shift. The scale shift is extremely easy to do. You just need to change two lines of code or so, and it should work exactly the same for the CUDA version. It simply buys you roughly one more digit of precision for that step (if the integral values are not already too "noisy" from accumulated numerical imprecision). Maybe you should start with that for both versions and see what happens. After that you can look into the better summation methods, as this may be a bit more tricky for the CUDA version than the CPU version.
ATI cards do some combination of a Kahan summation and a tree-like Kahan sum (where 2 values with 2 correction terms are added together, resulting in 1 value with 1 correction term) as the very last step (only once per WU). As said, it didn't change too much compared to the earlier (simpler) approach, but this really depends on how it is done in the CUDA version now (I guess I have to look into that). From my experience, it moved the CPU results more than the ATI results, because the ATI version used a slightly better method (pairwise/tree-like sum) from the beginning. |
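The pairwise (tree-like) sum mentioned above can be illustrated with a short recursive sketch. This is the generic textbook technique, assumed here for illustration only; it is not the ATI application's reduction code and it does not include the Kahan correction terms described in the post.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total


def pairwise_sum(values, lo=0, hi=None):
    """Sum values[lo:hi] by recursively splitting the range in half.

    Partial sums of similar magnitude get added together, so rounding
    error grows roughly with log(n) instead of n.
    """
    if hi is None:
        hi = len(values)
    count = hi - lo
    if count == 0:
        return 0.0
    if count == 1:
        return values[lo]
    mid = lo + count // 2
    return pairwise_sum(values, lo, mid) + pairwise_sum(values, mid, hi)


values = [0.1] * 100_000
reference = math.fsum(values)  # correctly rounded sum
print("naive error:   ", abs(naive_sum(values) - reference))
print("pairwise error:", abs(pairwise_sum(values) - reference))
```

A tree reduction like this maps naturally onto GPU threads, which may be why a GPU version would start out with better summation behaviour than a simple CPU loop.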
Send message Joined: 16 Jun 09 Posts: 85 Credit: 172,476 RAC: 0 |
Large test: stripe 11: CUDA new: -3.125086817279229 stock app: -3.125086817277920 ATI3 0.24: -3.125086817279327 stripe 12: CUDA new: -3.220820423937230 stock app: -3.220820423936994 ATI3 0.24: -3.220820423937265 stripe 20: CUDA new: -2.985310834913899 stock app: -2.985310834912030 ATI3 0.24: -2.985310834913758 stripe 21: CUDA new: -2.889340482227060 stock app: -2.889340482223516 ATI3 0.24: -2.889340482226973 stripe 79: CUDA new: -2.946681292507908 stock app: -2.946681292501331 ATI3 0.24: -2.946681292507772 stripe 82: CUDA new: -2.985567797360019 stock app: -2.985567797346584 ATI3 0.24: -2.985567797359872 stripe 86: CUDA new: -3.027907173993949 stock app: -3.027907173971970 ATI3 0.24: -3.027907173993795 The CUDA version has the added Kahan summation. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
That's interesting. Looks like the two GPU apps are closer to each other than the CPU one. Going to have to try the Kahan summation in the CPU app to see if it brings it closer to them. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
[snipping the results] Looks great. The differences between CUDA and ATI are now down to the 10^-13 range. Have you also tried that "scale shift" thing? It improves the accuracy of the likelihood calculation (are you using Kahan sums also there?), if that is the limit now. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
That's interesting. Looks like the two GPU apps are closer to each other than the CPU one. Going to have to try the Kahan summation in the CPU app to see if it brings it closer to them. Any news on this? |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
That's interesting. Looks like the two GPU apps are closer to each other than the CPU one. Going to have to try the Kahan summation in the CPU app to see if it brings it closer to them. Getting there :) Had a couple things come up. Should have some results this weekend. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Some results for testing the Kahan summation method with the CPU application (stock ksm, stock app is the old version): stripe 11: stock app: -3.125097883803210 stock ksm: -3.1250978838032299478300047 ATI3 0.24: -3.125097883803223 stripe 12: stock app: -3.220830076676231 stock ksm: -3.2208300766762043565449858 ATI3 0.24: -3.220830076676198 stripe 20: stock app: -2.985361292761437 stock ksm: -2.9853612927614427974276623 ATI3 0.24: -2.985361292761449 stripe 21: stock app: -2.889404057127142 stock ksm: -2.8894040571271575323919478 ATI3 0.24: -2.889404057127163 stripe 79: stock app: -2.946733795900296 stock ksm: -2.9467337959002648517525813 ATI3 0.24: -2.946733795900271 stripe 82: stock app: -2.985617635736142 stock ksm: -2.9856176357361552398117510 ATI3 0.24: -2.985617635736161 stripe 86: stock app: -3.027973787718258 stock ksm: -3.0279737877182353322780273 ATI3 0.24: -3.027973787718241 I'm gonna start crunching the large sized workunits to get some values for those. I should be releasing updated code later today. At least for the smaller WUs, it looks like it brought the values closer to the ATI application, sometimes overshooting :P which is kind of interesting. Generally it looks like we gained at least a significant digit of accuracy from it. It will be interesting to see the difference in larger sized workunits. |
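The "closer, sometimes overshooting" remark can be checked against the posted small-test numbers. The sketch below treats a result as overshooting when the Kahan-summed CPU value ends up on the opposite side of the ATI value from the old stock value; that definition and the names used are assumptions made for illustration. `Decimal` is used so the posted digits are compared exactly.

```python
from decimal import Decimal

# stripe: (stock app, stock ksm, ATI3 0.24), copied from the post above.
SMALL_TEST = {
    11: ("-3.125097883803210", "-3.1250978838032299478300047", "-3.125097883803223"),
    12: ("-3.220830076676231", "-3.2208300766762043565449858", "-3.220830076676198"),
    20: ("-2.985361292761437", "-2.9853612927614427974276623", "-2.985361292761449"),
    21: ("-2.889404057127142", "-2.8894040571271575323919478", "-2.889404057127163"),
    79: ("-2.946733795900296", "-2.9467337959002648517525813", "-2.946733795900271"),
    82: ("-2.985617635736142", "-2.9856176357361552398117510", "-2.985617635736161"),
    86: ("-3.027973787718258", "-3.0279737877182353322780273", "-3.027973787718241"),
}


def classify(stock, ksm, ati):
    """Return (closer, overshoot) for one stripe."""
    stock, ksm, ati = Decimal(stock), Decimal(ksm), Decimal(ati)
    closer = abs(ksm - ati) < abs(stock - ati)
    # Opposite signs mean ksm moved past the ATI value.
    overshoot = (ksm - ati) * (stock - ati) < 0
    return closer, overshoot


for stripe, values in SMALL_TEST.items():
    closer, overshoot = classify(*values)
    print(f"stripe {stripe}: closer={closer} overshoot={overshoot}")
```

By this measure every stripe moves closer to the ATI value, and stripes 11, 79, and 86 are the ones that overshoot.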
©2025 Astroinformatics Group