N-Body 1.18
Author | Message |
---|---|
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
I am releasing what will hopefully be the final update to the N-body code. It will be released as 1.18. You will see 1.14 and 1.16, and these will be outdated by tonight. I apologize for everything that has been happening with all of these updates, but things needed to be fixed. We will try to address multithreading issues with the following update. Jake |
Send message Joined: 18 Nov 10 Posts: 19 Credit: 180,247,643 RAC: 23,614 |
I have been wondering what happened to the multithreading operation on my machine. I thought my machine was configured incorrectly. Are there a lot of issues, and has someone summarized them somewhere? If you have time, I would be interested in knowing what the problems are. Thanks |
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
I cannot explain what the cause of this problem is as I am still investigating, but many (so yes, it is widespread) users are having issues where N-body either: A) does not multithread at all B) does not multithread as much as it should Since most of our development team is composed of students, many are not here right now. I am doing my best to address these concerns and will keep everyone updated. Jake |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
Jake, are you aware of the private message conversation which I had with Jeff Thompson on 2nd April this year? If not, please read my side of the conversation, which I re-posted in public at message 58339 a couple of weeks ago. It was clear from the conversation that Jeff had been misdirected by his supervisors/superiors as to the nature of the BOINC 'plan_class' mechanism under which the application had been deployed on the MilkyWay BOINC server. Please read my explanation, think about it, and read it again. It's clear that no-body (pun intended) at MilkyWay understands how to integrate N-Body into the BOINC infrastructure. While composing this post, I re-activated host 479865 (Windows 7/64, running the current BOINC client v7.0.64). It has already returned two v1.18 tasks, showing the expected (by me) message: "Using OpenMP 1 max threads on a system with 4 processors". This is because you have not defined the app_version on the server to be a member of a plan_class with 'mt' in the name, and are not passing an appropriate --nthreads <cmdline> tag. If you would care to refer back to the Nobdy Release 1.02 thread (which you yourself closed with the words 'Don't worry! N-Body is getting a lot of attention.'), you will see that I posted details of the missing multi-threading commands last November, over 6 months ago. I despair, I really do. What else, and how much else, can I do to point you in the right direction? |
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
I think I see the issue with the plan classes. They were supposed to use the built-in __mt plan class. The binaries have this, but not the apps themselves. So we pulled the GPU plan classes to avoid those issues, but we need to add the __mt directories. Working on it now. |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
Bed-time in the UK. I look forward to taking it for a whirl in the morning. |
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
I have added the mt classes and they are showing in the application lists now. |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
I tried allowing new work again, and got a nice variety of work - de_ and ps_, dark and nodark - but all of it was assigned as single-CPU, none from the new MT plan class. I'm not sure how the scheduler chooses which version to send when both are available - you may need to deprecate the non-MT versions to force a test, or there may be something in project configuration. I'll take a look. |
Send message Joined: 15 Sep 11 Posts: 1 Credit: 2,047,878 RAC: 0 |
I'm not sure if this is a problem of N-Body or Boinc Tasks but anyway. In the WU "time left" it says 354 days but if you count real time left from progress and elapsed time the WU should be completed in 6 hours. |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
"I'm not sure if this is a problem of N-Body or Boinc Tasks but anyway." It's a two-part problem. 1) There seems to be a very wide variation in the 'size' of the n-body tasks prepared by the server, and little (or no) correlation with the eventual running time. Here are the figures for the six tasks received by my test machine this morning: Name: de_nbody_06_05_nodark_2_1369931835_143443 Size: 1,841,130,000,000 Time: 10368; Name: ps_nbody_06_06_dark_1370577207_50707 Size: 651,009,000,000 Time: 676; Name: de_nbody_06_06_nodark_3_1370577207_63977 Size: 2,929,460,000,000 Time: 502; Name: ps_nbody_06_06_nodark_3_1370577207_64060 Size: 2,034,840,000,000 Time: 678; Name: de_nbody_06_06_dark_1370577207_17850 Size: 59,430,100,000,000 Time: (18000+ - still running, 70% complete); Name: de_nbody_100k_chisq_alt_40913_1366886102_615378 Size: 75,392,100,000,000,000 Time: (estimated 18,119:46:12 - slightly over two YEARS). "Size" is the <rsc_fpops_est> value set by the server for the task, which is converted into a runtime estimate by your BOINC client (using its understanding - accurate or not - of your computer's speed). "Time" is the final CPU time (in seconds) for completed tasks, or as noted. 2) The 'remaining time' estimation for a running task is calculated by the BOINC core client. (Boinc Tasks simply displays the values it's given - don't shoot the messenger.) For some reason I didn't quite catch at the time, the 'remaining' estimate was changed in v7.0.64: it is now weighted far more strongly by the initial estimate, and only gives a very small weight to the actual running time until the task nears completion. You'll see that 'time left' drop like a stone when the task passes 90%, 95%, 99% done. |
Send message Joined: 18 Nov 10 Posts: 19 Credit: 180,247,643 RAC: 23,614 |
Thanks, Richard and Jeffery (I think it was you two who put mt back in working order). Richard, it looks like I inadvertently played your "straight man". It appears that your information has put "mt" back in play. It was not automatic, but I am now running (it appears) mt MilkyWay workloads with multiple CPUs. UPDATING caused the MW mt workload to think it was running in mt mode, but it only used one CPU. A DETACH and ATTACH seemed to fix it for me. I am not suggesting that it be the general solution for everyone; I leave the general solution to those who know what is going on. |
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
It was all Richard; he did hand us everything we needed to get it done. I messed it up in the 1.10 release. It was a small mistake and easy to fix, but we had other bugs in the software to sort through at the same time. With those sorted, I was able to look at my mistake on its own, instead of alongside all the other bits, and understand it. I have updated our documentation, spelling out how to use the default classes in our release instructions, so that as people come and go in an academic environment we don't lose the steps in transitions between teams as we did last time. I do believe there will be some smaller issues to shake out as we progress with this, but I think the major misconfigurations are corrected. If there are other things happening, let us know. Jeff |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
Thanks. I doubt I'll be able to look at the MT solution tonight - my "two year" task is still plodding away on a single CPU - but as soon as BOINC will let me fetch new work, I'll grab a bundle and see what I can make of it. As you say, there are always some smaller issues left behind, but now we're on the same page it should be easier to get them sorted. That two-year task, by the way: it's just reaching 40% at 6 hours, so it should finish in a total of 15 hours or so. But BOINC still thinks it will take a further 15,275 hours (that's BOINC's problem, not mine or yours). |
Send message Joined: 18 Nov 10 Posts: 19 Credit: 180,247,643 RAC: 23,614 |
I am still running but ran into what appears to be a scheduler problem. I have an 8-core i7 Sandy Bridge and an EVGA GTX 650 Ti Nvidia GPU. The CPU mt task started up "running 8 cpus". The mt GPU task started up and takes 0.417 cpu. I see a pattern: only one of the two will run under normal scheduling. They ping-pong back and forth, with the 8-CPU mt version only running during the reload of the next GPU task and for a short time following. If I suspend all GPU, the 8cpu mt starts. If I resume GPU, both run for a short period of time and then the 8cpu mt job suspends. It appears that the CPU mt version wants ALL the CPUs, but the GPU task starts and wants a CPU FRACTION, so the total of the two is greater than the number of CPUs available. 6/8/2013 3:07:48 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:07:50 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks 6/8/2013 3:07:50 PM | Milkyway@Home | Project has no tasks available 6/8/2013 3:15:28 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_2_1370577207_678480_2 finished 6/8/2013 3:15:28 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062567_0 using milkyway version 102 (opencl_nvidia) in slot 10 6/8/2013 3:15:31 PM | Milkyway@Home | Sending scheduler request: To fetch work. 6/8/2013 3:15:31 PM | Milkyway@Home | Reporting 1 completed tasks 6/8/2013 3:15:31 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:15:33 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks 6/8/2013 3:20:29 PM | Milkyway@Home | Restarting task de_nbody_06_06_dark_1370577207_85348_0 using milkyway_nbody version 118 (mt) in slot 9 6/8/2013 3:24:19 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_1062567_0 finished 6/8/2013 3:24:19 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062563_0 using milkyway version 102 (opencl_nvidia) in slot 10 6/8/2013 3:24:24 PM | Milkyway@Home | Sending scheduler request: To fetch work. 
6/8/2013 3:24:24 PM | Milkyway@Home | Reporting 1 completed tasks 6/8/2013 3:24:24 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:24:26 PM | Milkyway@Home | Scheduler request completed: got 2 new tasks 6/8/2013 3:25:31 PM | Milkyway@Home | Sending scheduler request: To fetch work. 6/8/2013 3:25:31 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:25:33 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks 6/8/2013 3:25:33 PM | Milkyway@Home | Project has no tasks available 6/8/2013 3:33:12 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_1062563_0 finished 6/8/2013 3:33:12 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_677538_2 using milkyway version 102 (opencl_nvidia) in slot 10 6/8/2013 3:33:14 PM | Milkyway@Home | Sending scheduler request: To fetch work. 6/8/2013 3:33:14 PM | Milkyway@Home | Reporting 1 completed tasks 6/8/2013 3:33:14 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:33:16 PM | Milkyway@Home | Scheduler request completed: got 1 new tasks 6/8/2013 3:34:22 PM | Milkyway@Home | Sending scheduler request: To fetch work. 
6/8/2013 3:34:22 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:34:24 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks 6/8/2013 3:34:24 PM | Milkyway@Home | Project has no tasks available 6/8/2013 3:41:13 PM | | Reading preferences override file 6/8/2013 3:41:13 PM | | Preferences: 6/8/2013 3:41:13 PM | | max memory usage when active: 8183.22MB 6/8/2013 3:41:13 PM | | max memory usage when idle: 14729.80MB 6/8/2013 3:41:13 PM | | max disk usage: 100.00GB 6/8/2013 3:41:13 PM | | don't use GPU while active 6/8/2013 3:41:13 PM | | suspend work if non-BOINC CPU load exceeds 25 % 6/8/2013 3:41:13 PM | | (to change preferences, visit a project web site or select Preferences in the Manager) 6/8/2013 3:41:28 PM | | Suspending GPU computation - user request 6/8/2013 3:41:28 PM | Milkyway@Home | Restarting task de_nbody_06_06_dark_1370577207_85348_0 using milkyway_nbody version 118 (mt) in slot 9 6/8/2013 3:41:37 PM | | Resuming GPU computation 6/8/2013 3:41:37 PM | Milkyway@Home | Restarting task de_separation_79_DR8_rev_3_1370577207_677538_2 using milkyway version 102 (opencl_nvidia) in slot 10 6/8/2013 3:42:24 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_677538_2 finished 6/8/2013 3:42:24 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062568_0 using milkyway version 102 (opencl_nvidia) in slot 10 6/8/2013 3:42:26 PM | Milkyway@Home | Sending scheduler request: To fetch work. 6/8/2013 3:42:26 PM | Milkyway@Home | Reporting 1 completed tasks 6/8/2013 3:42:26 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:42:29 PM | Milkyway@Home | Scheduler request completed: got 1 new tasks 6/8/2013 3:43:34 PM | Milkyway@Home | Sending scheduler request: To fetch work. 
6/8/2013 3:43:34 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:43:37 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks 6/8/2013 3:43:37 PM | Milkyway@Home | Project has no tasks available 6/8/2013 3:49:42 PM | Milkyway@Home | Sending scheduler request: To fetch work. 6/8/2013 3:49:42 PM | Milkyway@Home | Requesting new tasks for NVIDIA 6/8/2013 3:49:44 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks 6/8/2013 3:49:44 PM | Milkyway@Home | Project has no tasks available |
Send message Joined: 18 Nov 10 Posts: 19 Credit: 180,247,643 RAC: 23,614 |
I was installing a new compiler and EVERYTHING was operating normally, I think. Please ignore my last post. |
Send message Joined: 26 May 11 Posts: 32 Credit: 43,959,896 RAC: 0 |
FYI, with reference to mt & nbody: I have a 12-core computer and the nodark tasks use all 12 cores just fine. However, I have yet to observe the mt dark tasks utilize more than 1 equivalent core, even though the task has taken control of all 12 cores. I am basing these comments on viewing the resource monitor. Dark never gets above 10% CPU utilization, and this is after viewing many tasks. Many of the "dark" CPU cores are marked "parked" on the resource monitor and not utilized. Time to eat my words somewhat: I finally got a dark task that is utilizing all cores... |
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0 |
This is a problem. Are you positive it is only the dark work units? If so, I will take the search down. Jake |
Send message Joined: 26 May 11 Posts: 32 Credit: 43,959,896 RAC: 0 |
Continued Observations: So far AS I HAVE OBSERVED only 1 dark mt job has utilized all 12 cores. All of the short jobs - estimated at less than 10 minutes - have had many cores "parked". The one dark mt job that used all cores had an estimated time in the 0'000 hours, and took say 45 minutes to run... |
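
Richard's point earlier in the thread about the missing --nthreads argument can be illustrated with a short sketch. This is not the N-Body source code: the function name `resolve_thread_count` and the fall-back-to-one-thread behaviour are assumptions, chosen only to match the log line he quotes ("Using OpenMP 1 max threads on a system with 4 processors"). The idea is that an application which only goes multithreaded when told to will stay on a single thread whenever the server's app_version does not pass that argument.

```python
import os


def resolve_thread_count(argv, ncpus=None):
    """Return the thread count a hypothetical app would use.

    Assumption (not taken from the N-Body source): without an explicit
    --nthreads argument the app falls back to a single thread.
    """
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    requested = 1  # default when the server passes no --nthreads
    for i, arg in enumerate(argv):
        if arg == "--nthreads" and i + 1 < len(argv):
            requested = int(argv[i + 1])
    # Clamp to the range 1..ncpus so the app never oversubscribes the host.
    return max(1, min(requested, ncpus))


if __name__ == "__main__":
    for argv in ([], ["--nthreads", "4"]):
        n = resolve_thread_count(argv, ncpus=4)
        print(f"Using {n} max threads on a system with 4 processors")
```

Under these assumptions, a task started with an empty command line reports one thread on a four-processor host, which is the symptom described above; the same task started with `--nthreads 4` would use all four.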
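
The two runtime estimates discussed in the thread can be compared with a small calculation. This is a sketch, not BOINC's code: the function names are hypothetical, and the 1 GFLOPS host speed is an assumed figure used only for illustration. The first helper turns an <rsc_fpops_est> value into an initial runtime estimate by dividing it by the host's speed; the second projects the total runtime from elapsed time and fraction done, which is the arithmetic behind Richard's "40% at 6 hours, so about 15 hours in total".

```python
def initial_estimate_seconds(rsc_fpops_est, host_flops):
    """Initial runtime estimate: estimated operations / host speed."""
    return rsc_fpops_est / host_flops


def projected_total_hours(elapsed_hours, fraction_done):
    """Total runtime projected from the progress made so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# (rsc_fpops_est, final CPU seconds) for the four completed tasks
# listed in Richard's post.
completed = [
    (1_841_130_000_000, 10368),
    (651_009_000_000, 676),
    (2_929_460_000_000, 502),
    (2_034_840_000_000, 678),
]

ASSUMED_HOST_FLOPS = 1e9  # hypothetical speed, for illustration only

if __name__ == "__main__":
    for fpops, actual in completed:
        est = initial_estimate_seconds(fpops, ASSUMED_HOST_FLOPS)
        print(f"estimated {est:8.0f} s  actual {actual:6d} s  "
              f"ratio {est / actual:5.2f}")
    # Richard's long-running task: 40% done after 6 hours.
    print(f"projected total: {projected_total_hours(6, 0.40):.0f} hours")
```

Whatever host speed is assumed, the ratio of estimated to actual time for these four tasks differs by more than a factor of 30 between the best and worst case, which is the "little (or no) correlation" described in the post.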
©2024 Astroinformatics Group