New Separation Runs 6/9/2021
Author | Message |
---|---|
Joined: 26 Apr 08 Posts: 87 Credit: 64,801,496 RAC: 0 |
Same problem here. 50 invalids over the past three or four days. Plus SETI Classic = 21,082 WUs |
Joined: 2 Nov 10 Posts: 25 Credit: 1,894,269,109 RAC: 0 |
Well, if you wanted Invalids on the gapfix_bgset3 you got them by their hundreds. Personally, I don't like them. They are mainly occurring on CPU tasks but some show up on GPU tasks. |
Joined: 30 Dec 14 Posts: 34 Credit: 910,198,756 RAC: 1,091 |
I currently have 452 invalid tasks on three different computers. The invalid tasks are on AMD GPUs and on CPU tasks on a Dell T-420 server with two Intel Xeon processors. I've given up trying to figure out whether I have hardware problems. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Ugh, it looks like the invalid problems may not be fixed after all. I'll have to look into things in the next couple days. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
These validation problems are happening because different types of machines & setups are returning (slightly) different values for the likelihood of each workunit. The likelihoods are only off by a few parts in a million, but luckily we don't actually need the results to be that precise. I am looking into changing the validator for the project instead of taking down the runs so that these stripes can get crunched. That may take a couple days though, so thanks for your patience in the meantime. |
Joined: 24 Jan 11 Posts: 716 Credit: 558,805,369 RAC: 31,577 |
Will relaxing the validator limits hurt the science results for future task sets? |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Will relaxing the validator limits hurt the science results for future task sets?" If so, maybe they can match up the PCs doing the crunching so that, i.e., Linux PCs only validate against each other, and the same with Windows PCs. With Windows being MUCH more represented in the stats page it really shouldn't matter very much. |
Joined: 24 Jan 11 Posts: 716 Credit: 558,805,369 RAC: 31,577 |
Not sure where in the BOINC server code you can configure the scheduler for that change. Don't think it's possible. Hope a dev will chime in and tell me I'm wrong. |
Joined: 16 Mar 10 Posts: 213 Credit: 109,633,250 RAC: 1,174 |
"Will relaxing the validator limits hurt the science results for future task sets?" That might help quite a lot with CPU-to-CPU comparisons as it would [hopefully] eliminate rounding inconsistencies due to different compilers doing different optimizations and ordering of operations within complicated expressions. However, it might not help as much when a CPU job is matched against a GPU job, especially if the OpenCL is using fused multiply-add... Different GPUs have different attitudes to rounding and how close one can get to zero before a value is treated as zero - and that can be a "feature" of hardware versions from the same supplier! And I've seen tasks where a Windows CPU result wouldn't validate with a Windows GPU job, so splitting out by Operating System alone may not be enough... I suspect that if Tom relaxes the validation constraints slightly we might still see some issues, but probably nowhere near as many. Without knowing the current required precision one can only speculate - so good luck to Tom! Cheers - Al. |
Joined: 24 Jan 11 Posts: 716 Credit: 558,805,369 RAC: 31,577 |
Well, for an example of what is possible: I have not a single invalid or inconclusive task at TN-Grid, where there are 3 separate CPU applications in play on the same datasets: SSE2, AVX and FMA. I have validated against every other type of application run by my wingmen. So it is possible to get the same results, or close enough to be considered valid, if you set up the validator with relaxed enough parameters. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
There is a way to set up BOINC so that machines will only validate against other machines of the same type, but it seems like a pain and I believe it would force us to reconfigure a large portion of the project. Definitely not something I'm trying to do. When I was digging through workunit validations, it looked like the failed validations were all working but were just off by something like +/-0.00001, probably due to inconsistencies in CPU vs GPU or CPU compiler options, etc. I am planning on turning down (up?) the tolerance so that those slightly different results are accepted. I was looking through the server code today to try to find where to do that, but didn't see it. I will keep working on this so that hopefully things will be fixed soon. I'm not sure if a lowered validation threshold could potentially cause problems with the project science in the future... I'm not immediately sure why it would, as long as the threshold isn't set to be too lenient. It could just as easily be that we are already using too stringent a validation threshold right now! Anyways, if it causes problems with the science, hopefully it would just be a matter of changing the tolerance back to the value it was before. |
Joined: 11 Feb 14 Posts: 4 Credit: 321,527 RAC: 0 |
Hi! FYI, I have had validate errors for tasks 231095611 and 231095541! Nigel. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
The good news is that I've found the tolerance setting. The bad news is that in order to change it, I have to rebuild a couple binaries. I'm going to be working on this for a little while, but I think that it may require a server restart (or at least a temporary outage of a few services). I'll keep everyone up to date with progress as I figure more out. The validation errors seem to be holding steady at <8% for the time being, which isn't ideal, but it's not a lot and it doesn't seem to be increasing very quickly. |
Joined: 2 Nov 10 Posts: 25 Credit: 1,894,269,109 RAC: 0 |
I am seeing 11.44% Invalids and it is increasing daily. I have seen reports where four computers (2 Windows 10 and 2 Linux) tried to evaluate 4 completed tasks. 2 tasks were judged to be Invalid, 1 Windows 10 and 1 Linux. Makes me think that the problem doesn't reside in an operating system. Keep smilin'. |
Joined: 30 Apr 18 Posts: 4 Credit: 26,044,476 RAC: 0 |
Maybe it would be better to eventually assign credits for workunits that have validation errors? The point is that our PCs spend their time calculating results, so they deserve credits anyway. As far as I know, some projects support this feature. |
Joined: 26 Apr 08 Posts: 87 Credit: 64,801,496 RAC: 0 |
My RAC has decreased from around 33,500 to under 25,000. I'm running about 20% invalid. I hope something can be done and that this is not going to be the "cost of doing business". Nevertheless, I enjoy running the project but I would like to see all of my work count. Plus SETI Classic = 21,082 WUs |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
I have made changes to the validator and it is currently being run on the test server. I apologize for the high rate of invalids in the meantime, but a fix is coming. I would rather make sure that this new validator works than put a broken validator up on the server. |
Joined: 7 Mar 20 Posts: 22 Credit: 107,093,252 RAC: 9,877 |
Seeing about a 20% invalid rate on my running hosts as well (1 linux, 1 win10, both running CPU tasks only). Good to see in the forum here the issue's being worked. I'll take another look when I hear the change hits production. Thanks! |
Joined: 23 Aug 11 Posts: 35 Credit: 13,094,693 RAC: 20,710 |
Sure got worried when I saw my RAC drop by some 40% and a whole bunch of validate errors. But maybe it's just this issue and not my computer... |
Joined: 1 Jul 08 Posts: 88 Credit: 25,079,058 RAC: 0 |
"Maybe, better if you eventually assign credits for workunits having validation errors? The point is that our PCs spend their time to calculate results so they deserve credits anyway. As far as I know, some projects support this feature." Hi AB, I'm beginning to agree with you in that our PCs do the work without errors just to have the validation system err for whatever reason and we should be compensated for the work done. I don't remember the project I was on some time ago, but they had the system set to credit us for the work done on tasks that had validation errors. My RAC here was finally back up to 121K and now it has dropped to just over 102K, soon to drop under that. :-( Have a great day! :) Siran CAPT Siran d'Vel'nahr XO - L L & P _\\// USS Vre'kasht NCC-33187 Winders 10 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |
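
For anyone following the tolerance discussion in this thread, here is a minimal Python sketch of the idea. It is not the project's validator or any BOINC code: the function name `likelihoods_agree`, the sample likelihood values, and both tolerance values are made up for illustration. It shows two things mentioned above: that merely reordering floating-point operations can change a result, and that a relative-tolerance comparison accepts results that differ by a few parts in a million while a much stricter one rejects them.

```python
import math


def likelihoods_agree(a: float, b: float, rel_tol: float = 1e-5) -> bool:
    """Hypothetical check: do two reported likelihoods match within rel_tol?"""
    return math.isclose(a, b, rel_tol=rel_tol)


# 1) Operation ordering alone changes a floating-point sum.
terms = [0.1, 0.2, 0.3]
left_to_right = (terms[0] + terms[1]) + terms[2]
right_to_left = terms[0] + (terms[1] + terms[2])
print(left_to_right == right_to_left)  # the two orderings are not bit-identical

# 2) Two made-up likelihoods that differ by roughly two parts in a million.
host_a = -2.987654  # illustrative value, e.g. from a CPU host
host_b = -2.987660  # illustrative value, e.g. from a GPU host

print(likelihoods_agree(host_a, host_b, rel_tol=1e-5))  # relaxed tolerance
print(likelihoods_agree(host_a, host_b, rel_tol=1e-7))  # strict tolerance
```

With the relaxed tolerance the two hosts would count as agreeing, and with the strict one they would not, which is roughly the trade-off being weighed in the posts above. What tolerance is scientifically safe is a separate question that this sketch does not answer.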
©2025 Astroinformatics Group