Message boards :
News :
Scheduled Server Maintenance Concluded
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
Hello everyone, I just finished up server maintenance. Expect errors from any runs not labelled:
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_2
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_3
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_4
If you experience errors from these runs specifically, please post below so I can help troubleshoot them. Thanks for your patience and support, Jake W.
Joined: 1 Apr 10 Posts: 49 Credit: 171,863,025 RAC: 0
You said that there would be a credit update as well. Not seeing that! Regards, John
Joined: 24 Apr 17 Posts: 8 Credit: 77,149,813 RAC: 0
It's a massacre. Hundreds of WUs end with an error after 1 second of crunching on an R9 290. All worked fine till this update. Do we need some special app config? Regards
Joined: 24 Apr 17 Posts: 8 Credit: 77,149,813 RAC: 0
WUs like this work well: de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3. The ones you mentioned, like de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1_1493824438_35417_1, all end with an error or a BSOD. Why don't you delete this WU batch?
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
GIPICS, Can you double-check and make sure you have the names right? The Sim19 runs should be running fine and the others should be erroring. Nothing should need to be changed on your end unless you are using a custom-built binary. If you are running a custom binary, you will need to rebuild it from master on our GitHub page. I see you have several hosts attached to your account. Is this a common problem you see across all of your hosts? Jake
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
John, Looking into the credit issue now. It might be a couple of hours before you see it updated on your end as workunits run through the queue. Jake
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
Giving the server a quick reboot while I update the server binaries to recognize the new credit calculation. Don't be alarmed. Jake
Joined: 24 Apr 17 Posts: 8 Credit: 77,149,813 RAC: 0
Here I am. So yes, I had the opposite situation: what shouldn't have worked worked very well, and the WUs with good labels ended in a massacre. Anyway, on the R9 290 all the WUs like de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_x ended after one second with a computing error. I would like to remind you that until today, to get the 290 crunching, I (and I think most crunchers) needed an app_info.xml. I stopped BOINC, deleted that app_info.xml, and started the BOINC client again, and magic! All the WUs de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_x work very well. Yeah!
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
GIPICS, Looks like you were running with a custom application, or your old config file was preventing the client application from updating. Glad it all works now. I just put up updated runs with the corrected credits. A 5 work unit bundle should now give 221 credits, I believe. This should reflect the increased crunching time required. Jake
Joined: 24 Apr 17 Posts: 8 Credit: 77,149,813 RAC: 0
But this is not all. On the 280X and 280, all the WUs like de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3 end with an error, and there is no app_info.xml here. Is there any way to avoid getting those WUs? You were talking about a way to solve this issue...
Joined: 30 Mar 09 Posts: 63 Credit: 621,582,726 RAC: 0
GIPICS, Just had the first 'new credit': 227.23 credits.
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
GIPICS, I have already cancelled making more of those workunits; I'm just letting them clear out of the queue. Jake
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0
I see the remaining time has been fixed on the 5-unit bundles. Thanks for that.
Joined: 16 Nov 14 Posts: 16 Credit: 335,683,507 RAC: 0
I don't know if this means anything, but many of my compute errors like the ones listed in this thread say Nvidia in the work unit details. This is a dual R9 280X system. https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=723824&offset=0&show_names=0&state=6&appid=
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
Hey iwajabitw, The Nvidia and AMD binaries are actually the same: we use OpenCL for our GPU code, which is cross-platform. Regardless, I see it saying it's trying to run on the AMD card. The errors are simply due to updating the parameter files. This means runs started before 1.46 use parameter files that are incompatible with the new application. These errors are completely normal and expected. All of the new runs with the new parameter files seem to be running just fine. Thanks for the update, Jake
Joined: 16 Nov 14 Posts: 16 Credit: 335,683,507 RAC: 0
Thanks, Jake
Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0
I am still getting tens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks. They all error out immediately, deferring communication with the server. In turn, when there are plenty of such errored tasks, communication gets deferred for hours and the queue empties. The only solution is to force a manual update or to abort such tasks before they error out, but that's possible only for attended machines. Left alone, when there are plenty of such errors, communication gets deferred further and further.
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0
I'm getting the same: they stop after 1 or 2 seconds. I don't know how long I've had them for, though, as I run 4 projects.
Joined: 24 Apr 17 Posts: 8 Credit: 77,149,813 RAC: 0
"I am still getting tens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks... the communication gets deferred further and further." It's the same over here. Why don't they delete that useless WU batch that brings only total mess?
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0
Half of my 140 units are working fine. It's no big deal if they fail after 2 seconds; it's only wasted 2 seconds of my computer's time.
©2024 Astroinformatics Group