Welcome to MilkyWay@home

Scheduled Server Maintenance Concluded

Message boards : News : Scheduled Server Maintenance Concluded

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66327 - Posted: 3 May 2017, 15:51:46 UTC

Hello Everyone,

I just finished up server maintenance. Expect errors from any runs not labelled:

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_2
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_3
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_4

If you experience errors from these runs specifically, please post below so I can help troubleshoot them.
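A volunteer checking a task list against the runs above could match each task name against the four run names. The sketch below is hypothetical: the helper `is_new_run` is not project tooling, and it assumes a task name is the run name followed by `_` and further fields, as in the task names quoted later in this thread (e.g. `..._ModfitConstraintsWithDisk_1_1493824438_35417_1`).

```python
# Hypothetical helper (not part of MilkyWay@home): checks whether a task name
# belongs to one of the four new runs listed in the announcement.
# Assumption: task names are "<run name>_<extra fields>".
NEW_RUNS = (
    "de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1",
    "de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_2",
    "de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_3",
    "de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_4",
)


def is_new_run(task_name: str) -> bool:
    """Return True if task_name is a new run name or starts with one plus '_'."""
    # Requiring the '_' separator stops a hypothetical "..._10" run
    # from matching the "..._1" prefix.
    return any(task_name == run or task_name.startswith(run + "_") for run in NEW_RUNS)


if __name__ == "__main__":
    for name in (
        "de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1_1493824438_35417_1",
        "de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3",
    ):
        status = "new run" if is_new_run(name) else "older run, errors expected"
        print(f"{name}: {status}")
```

Per the announcement, anything this check reports as an older run is expected to error out under the updated application.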

Thanks for your patience and support,
Jake W.
John G

Joined: 1 Apr 10
Posts: 49
Credit: 171,863,025
RAC: 0
Message 66338 - Posted: 3 May 2017, 17:50:55 UTC

You said that there would be a credit update as well. I'm not seeing that!

Regards

John
GIPICS

Joined: 24 Apr 17
Posts: 8
Credit: 77,149,813
RAC: 0
Message 66339 - Posted: 3 May 2017, 18:02:12 UTC

It's a massacre. Hundreds of WUs end with an error after 1 second of crunching on an R9 290.

Everything worked fine until this update. Do we need a special app config?

Regards
GIPICS

Joined: 24 Apr 17
Posts: 8
Credit: 77,149,813
RAC: 0
Message 66341 - Posted: 3 May 2017, 18:48:12 UTC

WUs like this one work well:

de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3

but the ones you mentioned, like

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1_1493824438_35417_1

all end in an error or a BSOD.

Why don't you delete this batch of WUs?
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66342 - Posted: 3 May 2017, 19:13:17 UTC

GIPICS,

Can you double-check that you have the names right? The Sim19 runs should be running fine and the others should be erroring. Nothing should need to change on your end unless you are using a custom-built binary. If you are running a custom binary, you will need to rebuild it from master on our GitHub page.

I see you have several hosts attached to your account. Is this a common problem you see across all of your hosts?

Jake
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66343 - Posted: 3 May 2017, 19:15:41 UTC

John,

Looking into the credit issue now. It might be a couple of hours before you see it updated on your end, as workunits run through the queue.

Jake
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Send message
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66344 - Posted: 3 May 2017, 19:29:03 UTC

Giving the server a quick reboot while I update the server binaries to recognize the new credit calculation. Don't be alarmed.

Jake
GIPICS

Joined: 24 Apr 17
Posts: 8
Credit: 77,149,813
RAC: 0
Message 66345 - Posted: 3 May 2017, 19:29:06 UTC

Here I am.

So yes, I had the opposite situation: what shouldn't work worked very well, and the WUs with the good labels ended in a massacre.

Anyway, on the R9 290 all the WUs like this one

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_x

ended after one second with a computation error.

I would like to point out that, until today, I needed an app_info.xml to let the 290 crunch (and I think most crunchers did too).

I stopped BOINC, deleted that app_info.xml, started the BOINC client again and... magic!

All the WUs

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_x

work very well.

yeahhhhhhhhhhhhhhhhhh
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66346 - Posted: 3 May 2017, 19:33:36 UTC

GIPICS,

It looks like you were running a custom application, or your old config file was preventing the client application from updating. Glad it all works now.

I just put up updated runs with the corrected credits. A 5-workunit bundle should now give 221 credits, I believe. This should reflect the increased crunching time required.

Jake
GIPICS

Joined: 24 Apr 17
Posts: 8
Credit: 77,149,813
RAC: 0
Message 66348 - Posted: 3 May 2017, 19:46:52 UTC

But that's not all. On the 280X and 280, all the WUs like

de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3

end with an error.

There's no app_info.xml here.

Is there any way to avoid getting those WUs? You were talking about a way to solve this issue...
aad

Joined: 30 Mar 09
Posts: 63
Credit: 621,582,726
RAC: 0
Message 66352 - Posted: 3 May 2017, 20:58:14 UTC - in response to Message 66346.  

> GIPICS,
>
> I just put up updated runs with the corrected credits. A 5-workunit bundle should now give 221 credits, I believe. This should reflect the increased crunching time required.
>
> Jake

Just had the first 'new credit' result: 227.23 credits.
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66353 - Posted: 3 May 2017, 21:09:56 UTC
Last modified: 3 May 2017, 21:10:03 UTC

GIPICS,

I have already stopped generating more of those workunits. I'm just letting the remaining ones clear out of the queue.

Jake
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 66354 - Posted: 3 May 2017, 21:18:31 UTC

I see the remaining-time estimate has been fixed on the 5-workunit bundles. Thanks for that.
iwajabitw

Joined: 16 Nov 14
Posts: 16
Credit: 335,683,507
RAC: 0
Message 66355 - Posted: 3 May 2017, 23:59:39 UTC
Last modified: 4 May 2017, 0:00:05 UTC

I don't know if this means anything, but many of my compute errors, like the ones listed in this thread, say Nvidia in the workunit details. This is a dual R9 280X system.

https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=723824&offset=0&show_names=0&state=6&appid=
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Send message
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66356 - Posted: 4 May 2017, 0:21:20 UTC

Hey iwajabitw,

The Nvidia and AMD binaries are actually the same. We use OpenCL for our GPU code, which is cross-platform. Regardless, I see that it's trying to run on the AMD card. The errors are simply due to the updated parameter files: runs started before 1.46 use parameter files that are incompatible with the new application. These errors are completely normal and expected. All of the new runs with the new parameter files seem to be running just fine.

Thanks for the update,
Jake
iwajabitw

Joined: 16 Nov 14
Posts: 16
Credit: 335,683,507
RAC: 0
Message 66357 - Posted: 4 May 2017, 0:49:05 UTC - in response to Message 66356.  

Thanks, Jake.
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 66358 - Posted: 4 May 2017, 6:33:21 UTC

I am still getting tens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks. They all error out immediately, which defers communication with the server. When there are many such errored tasks, communication gets deferred for hours and the queue runs empty. The only workaround is to force a manual update or to abort these tasks before they error out, but that's possible only on attended machines. Left alone with many such errors, a machine defers communication further and further.
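The "further and further" deferral described above can be pictured as an exponential backoff, where each consecutive failure lengthens the wait before the next server contact. The sketch below is a generic illustration only: the 60-second base delay, the doubling factor, and the 24-hour cap are assumed values, and this is not BOINC's actual client or scheduler code.

```python
# Generic exponential-backoff illustration. All parameters are assumptions;
# this is NOT BOINC's real deferral logic.
BASE_DELAY_S = 60          # assumed initial deferral: 1 minute
MAX_DELAY_S = 24 * 3600    # assumed cap: 24 hours


def deferral_seconds(consecutive_failures: int) -> int:
    """Deferral after the given number of consecutive failures (0 means none)."""
    if consecutive_failures <= 0:
        return 0
    # 60, 120, 240, ... seconds, never exceeding the cap.
    return min(BASE_DELAY_S * 2 ** (consecutive_failures - 1), MAX_DELAY_S)


if __name__ == "__main__":
    for n in range(1, 13):
        hours = deferral_seconds(n) / 3600
        print(f"after {n:2d} failures: defer {hours:.2f} h")
```

With these assumed numbers the wait first exceeds an hour at the seventh consecutive failure and hits the 24-hour cap at the twelfth, which is why an unattended machine can sit idle for long stretches unless someone forces an update.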
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 66361 - Posted: 4 May 2017, 10:36:18 UTC - in response to Message 66358.  

I'm getting the same; they stop after 1 or 2 seconds. I don't know how long I've had them, though, as I run 4 projects.
GIPICS

Joined: 24 Apr 17
Posts: 8
Credit: 77,149,813
RAC: 0
Message 66362 - Posted: 4 May 2017, 11:29:31 UTC - in response to Message 66358.  

> I am still getting tens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks ... the communication gets deferred further and further.

It's the same over here.

Why don't they delete this useless batch of WUs? It only makes a total mess.
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 66363 - Posted: 4 May 2017, 11:35:43 UTC - in response to Message 66362.  

Half of my 140 units are working fine. It's no big deal if they fail after 2 seconds; it only wastes 2 seconds of my computer's time.

©2024 Astroinformatics Group