New Nbody version
Author | Message |
---|---|
Send message Joined: 19 May 14 Posts: 73 Credit: 356,131 RAC: 0 |
Hey all, I put out a new version of Nbody, version 1.44. This should fix some library issues people were having. If any issues arise, let me know. -Sidd |
Send message Joined: 18 Aug 09 Posts: 123 Credit: 21,238,136 RAC: 5,642 |
9/27/2014 9:59:52 PM | Milkyway@Home | Task ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited with zero status but no 'finished' file
9/27/2014 9:59:52 PM | Milkyway@Home | If this happens repeatedly you may need to reset the project.
9/27/2014 9:59:52 PM | Milkyway@Home | [task] task_state=UNINITIALIZED for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 from handle_premature_exit
9/27/2014 9:59:52 PM | Milkyway@Home | [task] task_state=EXECUTING for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 from start
9/27/2014 10:00:59 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:02:04 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:03:08 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:04:12 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:05:17 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:06:22 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:07:26 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:08:30 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:09:35 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:10:40 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:11:45 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:12:49 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:13:55 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:14:59 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:16:04 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:17:08 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:18:12 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:19:16 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:20:21 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:21:25 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:22:24 PM | Milkyway@Home | [task] Process for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited, exit code 0, task state 1
9/27/2014 10:22:24 PM | Milkyway@Home | Task ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited with zero status but no 'finished' file
9/27/2014 10:22:24 PM | Milkyway@Home | If this happens repeatedly you may need to reset the project.
(Task aborted after this cycle repeats 3 or 4 times) |
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
Is this an isolated instance or does this consistently happen? |
Send message Joined: 1 Jan 14 Posts: 24 Credit: 4,277,349 RAC: 0 |
Task 839721663
Name: ps_nbody_08_05_orphan_sim_3_1411504411_79004_0
Workunit: 624947649
Created: 30 Sep 2014, 17:06:39 UTC
Sent: 2 Oct 2014, 21:06:27 UTC
Report deadline: 14 Oct 2014, 21:06:27 UTC
Received: 3 Oct 2014, 17:30:20 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID: 554457
Run time: 4 hours 26 min 21 sec
CPU time: 1 days 5 hours 0 min 48 sec
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 28.32 GFLOPS
Application version: MilkyWay@Home N-Body Simulation v1.44 (mt)
Stderr output:
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
<search_application> milkyway_nbody 1.44 Windows x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 8 max threads on a system with 8 processors
RHO MAX IS 3192.19230 3192.19230
Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors
</stderr_txt>
]]>
Recent error I got. First one in a while. |
Send message Joined: 1 Jan 14 Posts: 24 Credit: 4,277,349 RAC: 0 |
Name: ps_nbody_08_05_orphan_sim_3_1411504411_79004_0
Workunit: 624947649
Created: 30 Sep 2014, 17:06:39 UTC
Sent: 2 Oct 2014, 21:06:27 UTC
Report deadline: 14 Oct 2014, 21:06:27 UTC
Received: 3 Oct 2014, 17:30:20 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID: 554457
Run time: 4 hours 26 min 21 sec
CPU time: 1 days 5 hours 0 min 48 sec
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 28.32 GFLOPS
Application version: MilkyWay@Home N-Body Simulation v1.44 (mt)
Another one for you |
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
This appears to be an issue with that specific work unit. RHO_MAX = 3000 is absurd. If it's just an isolated instance, I wouldn't worry. Jake |
Send message Joined: 13 Jan 08 Posts: 19 Credit: 820,482 RAC: 0 |
All appear to be going well except this one. Task 843900336 mycal · log out Name de_nbody_08_05_orphan_sim_3_1411504411_49264_4 Workunit 624873849 Created 6 Oct 2014, 4:47:51 UTC Sent 7 Oct 2014, 0:06:01 UTC Report deadline 19 Oct 2014, 0:06:01 UTC Received 9 Oct 2014, 3:18:18 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 572628 Run time 1 days 12 hours 46 min 25 sec CPU time 5 days 3 hours 15 min 14 sec Validate state Checked, but no consensus yet Credit 0.00 Device peak FLOPS 9.41 GFLOPS Application version MilkyWay@Home N-Body Simulation v1.44 (mt) Stderr output <core_client_version>7.2.42</core_client_version> <![CDATA[ <stderr_txt> <search_application> milkyway_nbody 1.44 Windows x86 double OpenMP, Crlibm </search_application> Using OpenMP 4 max threads on a system with 4 processors RHO MAX IS 994.77569 994.77569Using OpenMP 4 max threads on a system with 4 processors Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' 
(317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Using OpenMP 4 max threads on a system with 4 processors Using OpenMP 4 max threads on a system with 4 processors <search_likelihood>-481.668042842351160</search_likelihood> 03:17:15 (6108): called boinc_finish </stderr_txt> ]]> |
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
This succeeded. I see some checkpointing issues, but nothing critical. Am I missing something? Jake |
Send message Joined: 18 Aug 09 Posts: 123 Credit: 21,238,136 RAC: 5,642 |
There are some serious bugs in this task: http://milkyway.cs.rpi.edu/milkyway/results.php?userid=57700 Only 2 users have completed this task successfully, and their validation was inconclusive. 3 have had errors while computing and the task was terminated. 1 user (me) aborted the task after 1,007,779.00 seconds. Another user has just received the task. The 3 with computation errors are using Linux with Intel(R) Core(TM)2 Duo CPU E8500, Intel(R) Xeon(R) CPU E5410, and Intel(R) Xeon(R) CPU E5504. I use Win8.1 64 Pro with an AMD FX(tm)-6350 Six-Core Processor (this task hogged all 6 of my CPU cores). The two with success have an Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (Linux machine) and an Intel(R) Core(TM) i7-4770 CPU (Win 7 64 Pro SP1). The new person is using an Intel(R) Core(TM) i3-2350M CPU with Win 7 64 Pro SP1. I have been busy all week, so I did not have time to look more closely at the logs for my system. But today I aborted it (all I seem to do is abort tasks from you guys) because it got caught in the exit 0 loop (what is this?) and kept restarting. Maybe you guys should add some code that checks for exit 0 errors and, if a task gets more than 3 or 5 of them, forces Boincmgr to terminate the task. |
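A check of the kind suggested in this post could be sketched roughly as follows. This is not BOINC client code. It is a minimal, hypothetical Python sketch that scans event-log lines in the format quoted earlier in this thread, counts the "exited with zero status but no 'finished' file" messages per task, and reports any task that reaches a threshold. The function name `tasks_to_abort` and the default threshold of 3 are assumptions made for illustration only.

```python
import re
from collections import Counter

# Matches the client message quoted earlier in the thread, e.g.
# "Task <name> exited with zero status but no 'finished' file".
PREMATURE_EXIT = re.compile(
    r"Task (\S+) exited with zero status but no 'finished' file"
)


def tasks_to_abort(log_lines, max_restarts=3):
    """Return the names of tasks whose premature-exit count reaches max_restarts.

    max_restarts=3 is an illustrative assumption, not a real BOINC setting.
    """
    counts = Counter()
    for line in log_lines:
        match = PREMATURE_EXIT.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(name for name, n in counts.items() if n >= max_restarts)


if __name__ == "__main__":
    # Hypothetical log excerpt: one task restarts three times, another once.
    sample = [
        "10:00 PM | Milkyway@Home | Task wu_a exited with zero status but no 'finished' file",
        "10:20 PM | Milkyway@Home | [checkpoint] result wu_a checkpointed",
        "10:22 PM | Milkyway@Home | Task wu_a exited with zero status but no 'finished' file",
        "10:44 PM | Milkyway@Home | Task wu_a exited with zero status but no 'finished' file",
        "10:45 PM | Milkyway@Home | Task wu_b exited with zero status but no 'finished' file",
    ]
    print(tasks_to_abort(sample))
```

Counting per task name rather than globally keeps one misbehaving work unit from causing healthy ones to be flagged. Actually aborting a task would still have to go through the BOINC client, which this sketch does not attempt.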
Send message Joined: 1 Jan 14 Posts: 24 Credit: 4,277,349 RAC: 0 |
Name: de_nbody_08_05_orphan_sim_1_1411504411_153_4
Workunit: 621565038
Created: 10 Oct 2014, 5:17:34 UTC
Sent: 10 Oct 2014, 6:12:16 UTC
Report deadline: 22 Oct 2014, 6:12:16 UTC
Received: 11 Oct 2014, 1:54:46 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID: 554457
Run time: 1 hours 53 min 55 sec
CPU time: 12 hours 30 min 15 sec
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 28.34 GFLOPS
Application version: MilkyWay@Home N-Body Simulation v1.44 (mt)
Here I have another |
Send message Joined: 18 Feb 10 Posts: 3 Credit: 1,265,618 RAC: 0 |
I only have a single task at the moment de_nbody_08_05_orphan_sim_2_ etc etc It has been running for 28+ hours and has over 448 hours estimated remaining. Is this the new normal?? |
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
This is normal. N-body simulations are proving very difficult to estimate completion times for. Apologies. It should not actually take that long (probably no more than 15 hours). Jake |
Send message Joined: 18 Feb 10 Posts: 3 Credit: 1,265,618 RAC: 0 |
Hi Jake. Thanks. In addition to the extraordinary length of time being estimated, my other "concern" is that I have no other tasks being queued. Maybe this is a function of the time estimate (?) |
Send message Joined: 18 Jul 09 Posts: 300 Credit: 303,583,740 RAC: 668 |
Hi Jake. Thanks. There is no maybe about it. I always abort n-body WUs that take longer than 24 hours. |
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
Yes. This is absolutely true. If I recall, the algorithm that did this was changed recently, and we had to significantly revise this calculation. The issue with time estimates in n-body simulations is that the time estimate is potentially dependent on any parameters you use to describe your system, the number of bodies (resolution), and whatever n-body solver you are using. By time, I of course am referring to CPU time and not clock time. Actually, this issue has piqued my interest in this topic. I might take another stab at it to see if we can get a better time estimate. Jake |
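To illustrate why the estimate depends so strongly on the number of bodies and the solver, here is a rough cost model in Python. It assumes only the textbook operation counts: direct summation evaluates every pair of bodies each step (about N²/2 interactions), while a tree code such as Barnes-Hut needs on the order of N log N. The function `relative_cost` is hypothetical and says nothing about how the project actually computes its estimates. The number of timesteps is passed in directly, although in practice it also depends on the model parameters.

```python
import math


def relative_cost(n_bodies, n_steps, solver="tree"):
    """Rough count of force interactions for a whole run (arbitrary units).

    Assumed scalings (textbook values, not taken from the project code):
      direct: N * (N - 1) / 2 pair interactions per step
      tree:   about N * log2(N) interactions per step
    """
    if n_bodies < 2 or n_steps < 1:
        raise ValueError("need at least 2 bodies and 1 step")
    if solver == "direct":
        per_step = n_bodies * (n_bodies - 1) / 2
    elif solver == "tree":
        per_step = n_bodies * math.log2(n_bodies)
    else:
        raise ValueError(f"unknown solver: {solver!r}")
    return n_steps * per_step


if __name__ == "__main__":
    # Effect of doubling the resolution at a fixed number of steps.
    for solver in ("direct", "tree"):
        ratio = relative_cost(2000, 100, solver) / relative_cost(1000, 100, solver)
        print(f"{solver}: doubling the bodies multiplies the cost by {ratio:.2f}")
```

Under these assumptions, doubling the number of bodies roughly quadruples the cost of a direct solver but only slightly more than doubles the cost of a tree solver, so a single fixed estimate cannot fit both.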
Send message Joined: 19 Oct 14 Posts: 1 Credit: 27,506 RAC: 0 |
Hi, i just joined mw@home and i'm having issues with these nbody wus. I don't mind that they use all 4 cpus and take a long time, but i don't like the number of "validation inconclusive"s i'm getting for the amount of time i'm spending crunching these wus. And it's not just me; the inconclusive wus are inconclusive on other computers too, not just mine. Some computers can't even run them without errors. These are the culprits: ps_nbody_09_10_orphan_real_1_1413455402_15397 de_nbody_09_10_orphan_real_1_1413455402_19277 ps_nbody_09_10_orphan_real_2_1413455402_17734 de_nbody_08_05_orphan_sim_0_1413455402_1910 de_83_DR8_rev_8_4_00001_1413455402_1542781 de_nbody_08_05_orphan_sim_0_1413455402_14328 I guess one of those isn't an nbody, but it is inconclusive for me and another computer also. I was going to stop crunching these nbody wus but i thought i'd post here first and see what you guys had to say about it. Will i get credit for these sometime or will they stay inconclusive? How many computers have to successfully run the wu before a consensus is reached (or is it not like that)? LOL. I just went back to my "validation inconclusive" page and noticed one was missing since i made the above list. ps_nbody_09_10_orphan_real_1_1413455402_15397: that wu must have been validated while i was typing. So i guess i answered my own questions; yes, i will get credit, and it takes 3 successful runs to reach a consensus. LOL (instead of erasing this post i'll leave it in case someone else has the same issue/questions i had) |
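The idea of a consensus among several runs can be sketched in a few lines of Python. This is not the project's validator. The quorum of 3 is taken from the observation in this post, while the function name `has_consensus` and the tolerance are hypothetical. The sketch simply asks whether enough reported likelihood values agree with one another.

```python
def has_consensus(likelihoods, quorum=3, tolerance=1e-9):
    """Return True if at least `quorum` results agree within `tolerance`.

    quorum=3 follows the poster's observation; tolerance is an assumption.
    """
    for candidate in likelihoods:
        agreeing = sum(
            1 for value in likelihoods if abs(value - candidate) <= tolerance
        )
        if agreeing >= quorum:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical results reported by different computers for one work unit.
    two_results = [-481.668042842351, -481.668042842351]
    three_results = two_results + [-481.668042842351]
    print(has_consensus(two_results))    # still inconclusive
    print(has_consensus(three_results))  # enough matching results
```

With a scheme like this, a work unit stays "inconclusive" until enough matching results have come back, which would be consistent with a result validating some time after it was reported.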
Send message Joined: 25 Dec 10 Posts: 1 Credit: 3,757,042 RAC: 0 |
Hi, still getting a problem where the task starts at ~6 hours of processing and then over time increases to 30, 50 and up to 90 hours of processing remaining. I have had to kill several tasks that have been running for days with less than 10% complete. |
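For comparison with figures like these, a naive remaining-time estimate can be computed from the elapsed time and the fraction done. The Python sketch below assumes progress is linear in time, which other posts in this thread suggest is not true for these work units, so treat it as a sanity check rather than a prediction. The function name `remaining_hours` is hypothetical.

```python
def remaining_hours(elapsed_hours, fraction_done):
    """Linear extrapolation: remaining = elapsed * (1 - f) / f.

    Assumes the task progresses at a constant rate (a simplification).
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    if elapsed_hours < 0:
        raise ValueError("elapsed_hours must be non-negative")
    return elapsed_hours * (1.0 - fraction_done) / fraction_done


if __name__ == "__main__":
    # Hypothetical example: two days of running with 10% complete.
    print(f"{remaining_hours(48.0, 0.10):.0f} hours remaining")
```

If the reported percentage is accurate, a task that has run for two days and shows 10% done would need roughly 18 more days at the same rate, so aborting it is a reasonable call.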
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
I almost have a solution. I'm just testing now. Abort these if you wish, but they shouldn't actually take that long. |
Send message Joined: 1 Apr 08 Posts: 30 Credit: 84,658,304 RAC: 0 |
Hi, I just realized I have a MilkyWay@Home N-Body Simulation 1.44 (mt) task that has been running on my iMac (i7, late 2009) for more than a day. Name: ps_nbody_08_05_orphan_sim_3_1413455402_39907_0 The % is moving slowly, but it's moving. The WU is eating on average 600-650% CPU on the Mac (out of 800% max; Mac OS X measures 100% per core), so it's working. The estimate says more than another day of calculation, so it will take over 2 days in total if it finishes according to this estimate. Is this OK? Should I let it run? |
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
2 days is absurd and you shouldn't need this long to compute any of these simulations running on 6 cores. I would recommend aborting if it looks like it is going to take this long. |