The Last Couple Days
Author | Message |
---|---|
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Hi Everyone, As I'm sure you're all aware by now, the server went through a bit of maintenance the last couple days and now seems to be working normally. We had to clear out a lot of backlog workunits due to technical problems on our side. The bad news is that if you crunched workunits in the last two or three days, those workunits may have been deleted before they were validated, meaning that you might not be getting credit for them. The good news is that the server is working again and that we have taken precautions so that this won't happen again in the future. Apologies for any inconvenience this may have caused. This caught us by surprise just as it caught all of you by surprise, and know that we will always try our best to communicate with you all when these sorts of things happen. Best, Tom |
Joined: 2 Apr 11 Posts: 2 Credit: 4,048,921 RAC: 0 |
Thank you Tom! I have a quick question. You mentioned that you have taken precautions. Are they related to the INT -> BIGINT transition you mentioned before? Or are they something else? Back to crunching as soon as my client downloads the new units. Proud Gridcoin Team member! |
Joined: 9 Dec 11 Posts: 38 Credit: 1,497,896,956 RAC: 0 |
Communication, in my most humble of opinions, is vitally important. Not only does it keep us donors up to date, but it also shows that those running the project have an interest not only in the goal of the project but also in those who donate cycles to the cause. Thank you greatly for continuing to keep us as up to date as possible :) |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Hi Artstein, Thanks for the question. We are going through the code that generated the faulty workunit that jammed things up and fixing the logic in that code. Also, we are no longer going to leave converged runs up after they have converged, and will instead refresh the runs before they cause these issues. I will probably make the INT --> BIGINT change at some convenient point in the future. We did not make that change yet because we were more concerned about getting the server back to functional and less concerned about future proofing. Tom |
Joined: 9 Dec 11 Posts: 38 Credit: 1,497,896,956 RAC: 0 |
So in watching how things are operating I came to a question. Was it intentional to greatly reduce the number of WUs a user gets per request? Before the outage I was getting 900 (at most) WUs per request, now I'm getting 22. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
That may be a temporary side effect of flushing out the server. Hopefully as things run, workunits will fill up (the "tasks ready to send" are steadily climbing) and you will be able to pull more per request. Regardless, I will look into this problem. Please keep me posted on whether it continues to be an issue. |
Joined: 18 Feb 10 Posts: 57 Credit: 222,646,444 RAC: 5,838 |
Got over 300 WUs on my 2 PCs on the first request after the outage. |
Joined: 19 Aug 19 Posts: 3 Credit: 7,529,662 RAC: 0 |
Hello Tom, you wrote: "We had to clear out a lot of backlog workunits due to technical problems on our side. The bad news is that if you crunched workunits in the last two or three days, those workunits may have been deleted before they were validated, meaning that you might not be getting credit for them." I found three unconfirmed WUs from Jan. 25 and 30. After clicking on the work package links, I got no information on these WUs. The server couldn't find them. Is that what you meant? |
Joined: 9 Dec 11 Posts: 38 Credit: 1,497,896,956 RAC: 0 |
Ok, time to turn in my "guy card" and give y'all a laugh at my expense. Like many of you, I participate in more than one project. For me it's this one, PrimeGrid, and Einstein. My Nvidia GPUs are running PG and my AMD cards are here. I hit up Einstein when this project went through that rough spot the last couple days. Now this all relates to the thread, I promise. I noticed that Einstein seemed to be slow to give me work as well, but there is chatter over there about WUs that allowed me to believe it could be me. Now this project is back to normal and, as I said, I was getting few WUs while others have said they're getting work no prob. So I got to thinking "gee Hold, what would cause all your projects to get limited work?" I strolled over to PG to see if I'd found any primes, it being Tour de Prime and all, and it hit me. You idiot, you set your WU queue to 0 days. So, ever hopeful, I set up an alternate venue that allowed a several-day queue and plopped my primary MW cruncher in that venue. Told MW to get work and POW...900 WUs. Sorry if I caused you any worry Tom, looks like all is actually well with the servers, I'm just a scrub. Ok, y'all can stop laughing now lol. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Don't worry Holdolin, after 3 days sweating over a database I'm not in a position to make fun of anyone for dumb tech mistakes. Glad to hear that everything is working as it should be! |
Joined: 2 Apr 11 Posts: 2 Credit: 4,048,921 RAC: 0 |
Confirmed it's working, BOINC downloaded 200 tasks and GPUs are currently crunching! |
Joined: 6 Apr 20 Posts: 31 Credit: 1,251,105,570 RAC: 1,867 |
Thank you for keeping us informed! |
Joined: 13 Dec 17 Posts: 46 Credit: 2,421,362,376 RAC: 0 |
I somehow end up with a substantial amount of inconclusive tasks (more than 50% actually). Is anyone else seeing something like that? |
Joined: 9 Dec 11 Posts: 38 Credit: 1,497,896,956 RAC: 0 |
Ya, that happens. What's going on is older stats don't delete immediately, so many of those inconclusives are from older WUs. Same thing with all the invalids from stripes 84 and 85. Was kinda shocked at seeing so many till I stopped to look and saw it's older data. Out of 380 invalids, only 2 are recent and were server cancellations. All the rest were older. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"I somehow end up with a substantial amount of inconclusive tasks (more than 50% actually). Is anyone else seeing something like that?" You have your PCs hidden, but Windows is rolling out some updates and you could have gotten caught by them. They LOOK like the official drivers but don't have all the crunching stuff in them. If you are using Windows, try reloading your drivers right over the top of the old ones and see if it helps. |
Joined: 4 Nov 07 Posts: 3 Credit: 108,870,725 RAC: 12 |
Yes. A very large percentage of new tasks processed in the last 24 hours are getting "Validation Inconclusive" for me. |
Joined: 13 Dec 17 Posts: 46 Credit: 2,421,362,376 RAC: 0 |
Yeah, I'm aware of that and have disabled driver updates. Otherwise Windows keeps shoving in crap drivers. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Yes. A very large percentage of new tasks processed in the last 24 hours are getting "Validation Inconclusive" for me." Validation inconclusive is Milkyway's way of saying 'waiting for a wingman'. Other projects do it other ways, but this is Milkyway and they like it this way. |
Joined: 10 Mar 11 Posts: 9 Credit: 16,497,101 RAC: 0 |
Running any project brings its own level of stress. I'm certain everyone who participates in this project thanks you for what you've created and supported and empathetically cheers you on as you deal with the issues related to this endeavor. Cheers Tom! ~johnnymc Life's short; make fun of it! |
Joined: 26 Feb 15 Posts: 6 Credit: 9,821,682 RAC: 0 |
I thank you for the good communication. Just a very small request. There are people in many, if not most, of the time zones. When you give a time, could you also give the time zone? Thanks for your work! |
©2024 Astroinformatics Group