Message boards : News : Server Trouble
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0

I didn't realise there was extra work from doing that; I guess there's some processing to be done to recreate the data from parity, which means it must be RAID and not just a mirror. The servers I've run have never been ones that do heavy processing and data storage on the same system, so I never had this problem. I'm guessing you don't have the cash to have several servers!

> Was it a mirror/RAID, and having one less drive lowered the read speed sufficiently to cause this problem?

> The server is in RAID, although I don't remember the actual RAID setup. Losing this drive meant that the server was constantly rebuilding the data from the missing drive off of the working disks, which ate up a lot of memory overhead.

You mention memory overhead; maybe a memory upgrade is also in order. Changing from RAID 5 to RAID 6, for example, means that two drives can fail before armageddon. However, you can also have a "spare" drive that sits there not even powered up; if one fails, the server automatically powers it up and uses it, as though you'd plugged it in as a replacement.

> Why wasn't there a spare? Drives are not that expensive.

> I guess this drive was the spare, since the server is in RAID. We are now considering purchasing more drives and running at a higher RAID level (thanks to recent donations!)

Glad to hear the donations worked well. That really should be on the home page; I never knew it existed. You can't interrogate the hardware that way? If they're not SSDs, no wonder things are slow. How big are they? SSDs are pretty cheap now, around 20 cents a GB for server-grade SSDs or NVMe drives.

> And are they SSDs or HDDs?

> I believe they are HDDs. However, I could definitely be wrong about that. I didn't build the server, so I'm not super aware of what hardware it has besides what I can gather from the command line.
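For illustration, the RAID 5 vs RAID 6 trade-off mentioned above works out like this; the drive count and drive size here are assumed for the example, not taken from the actual MilkyWay server:

```shell
# Illustrative RAID arithmetic - drive count and size are assumptions.
n=8          # number of drives in the array
size_tb=2    # capacity of each drive, in TB

# RAID 5 spends one drive's worth of space on parity, RAID 6 spends two,
# which is what buys the extra tolerated failure.
raid5_usable=$(( (n - 1) * size_tb ))
raid6_usable=$(( (n - 2) * size_tb ))

echo "RAID 5: ${raid5_usable} TB usable, survives 1 drive failure"
echo "RAID 6: ${raid6_usable} TB usable, survives 2 drive failures"
```

So moving the same drives from RAID 5 to RAID 6 costs one drive of capacity in exchange for surviving a second failure during a rebuild.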
Joined: 17 Feb 09 · Posts: 24 · Credit: 3,503,203 · RAC: 36

The server page is all green and the feeder is green, but two of my machines still get:

18/03/2022 19:24:18 | Milkyway@Home | Server error: feeder not running

Is it sit-and-wait time?
Joined: 10 Apr 19 · Posts: 408 · Credit: 120,203,200 · RAC: 0

I plan on putting it on the home page at some point; I have no idea why the donation page isn't there in the first place! Also, I did end up finding a way to identify HDD vs SSD without installing additional tools, and the server drives are HDDs.
Joined: 10 Apr 19 · Posts: 408 · Credit: 120,203,200 · RAC: 0

Also, yes, it is sit-and-wait time. The new drive is plugged in, but the server will have to recreate the faulty drive's contents from parity, and that will take a little time.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0

> I plan on putting it on the home page at some point; I have no idea why the donation page isn't there in the first place!

If the donations are enough, I would suggest that at some point more memory and SSDs would cheer it up. It looks like it was only just managing before the disk failed?
Joined: 7 Mar 20 · Posts: 22 · Credit: 106,255,096 · RAC: 11,014

Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0

> Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.

Clearly with this server the HDDs were only just managing, if one dying made it this bad. The life span may be less on SSDs, but it's more measurable: they tell you exactly when they're going to wear out. It all depends on the capacity required, the speed required, and the funding available. Universe@Home and SIDock@home recently changed over to SSDs. Rosetta has had SSDs for a while - 72 of them!!!
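That "they tell you exactly when they're going to wear out" is SMART data. For example, with smartmontools installed, NVMe drives report a "Percentage Used" wear figure via `smartctl -A`. A sketch that pulls that number out of smartctl-style output; the sample line is made up, and on a real box you would feed in the output of `smartctl -A /dev/nvme0` (device name assumed):

```shell
# Pull the NVMe "Percentage Used" wear figure out of smartctl -A style text.
pct_used() {
  printf '%s\n' "$1" | awk -F: '/Percentage Used/ { gsub(/[ %]/, "", $2); print $2 }'
}

# Made-up sample line in smartctl's NVMe output format:
sample='Percentage Used:                    12%'
pct_used "$sample"
```

SATA SSDs expose the same idea through vendor attributes (e.g. Wear_Leveling_Count) in the `smartctl -A` table instead.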
Joined: 29 Dec 21 · Posts: 7 · Credit: 8,995,805 · RAC: 0

> Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.

I agree with you... Maybe some SSDs for cache drives (depending on whether you're using a SAN or not), but I would devote the storage to HDDs.
Joined: 11 Mar 22 · Posts: 42 · Credit: 21,902,543 · RAC: 0

Just received another bunch of WUs, and the completed ones were uploaded. We're making progress here! Thanks, Tom!
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0

> You have to if you have a lot of access. Mechanical stuff don't cut it no more.

> Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.

Rosetta: Primary file server: 72 x 1TB SSD via LSI SAS 9207-8i
Joined: 24 Jan 11 · Posts: 715 · Credit: 556,874,049 · RAC: 43,183

Still waiting for all the work today to start validating.
Joined: 28 Feb 22 · Posts: 16 · Credit: 2,400,538 · RAC: 0

> Independant of what?

The projects listed as independent on the "Choosing BOINC projects" page do not have a university or corporate sponsor. While it may seem that supporting non-sponsored projects is "helping the underdog", corporations and universities have standards for both the research and the experts employed to do it. This mostly guarantees that your time is not being wasted on a project that has (1) bad code or analysis, (2) code that could be sped up over 50 times (the Collatz project, for example), or both. Also, sponsored projects have a means of using or publishing their results, whereas the results of independent projects may be gone when the project's website is deleted some years later.
Joined: 23 Aug 11 · Posts: 33 · Credit: 11,367,307 · RAC: 14,293

Woop, I could report completed tasks at least. Thank you for the effort put into it! Still not receiving new work yet, though: it says GPU work is available, but I'm not getting anything for the CPU.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0

Ah, all work gone back and new stuff come out, and half a million fewer queued on the server to validate - looks like it's catching up well.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0

> Independant of what?

I've heard about Collatz, yet nobody has produced this "better alternative" to the calculations.
Joined: 13 Apr 17 · Posts: 256 · Credit: 604,411,638 · RAC: 0

> Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.

> Clearly with this server the HDDs were only just managing if one dying made it this bad.

Reading newer articles on "SSD vs. HDD" life spans is quite interesting; things have changed in the past years. Replacing a failed HDD is expensive and (usually) takes time - and you lose the data on it, which is "ok" if you have a good RAID system. But I would say that either way (SSD or HDD), the key is to have enough(!) spares for an excellently designed RAID setup. Then you just have to pull the failed disk and replace it, or let the RAID switch by itself to an "online" spare (if present - and please, more than one). The recovery will be done by the RAID system. This has all been said in previous posts; it is just a matter of cash. So let's donate...
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0

And "time" to repair the drives, as Tom mentioned, which can't be donated. Unless the "online spare" can keep itself up to date automatically, it still takes time to bring it into the fold and make it ready for use, no matter how fast or big it is. Now, yes, a different server setup - one machine for tasks being sent out and one for tasks being returned - should speed up the project, and that CAN be done through donations!! BUT (and yes, there is always one) it depends on the space etc. allotted for the MilkyWay servers; remember, SETI had size limitations because they were relegated to a "small closet" before they shut down.
Joined: 13 Dec 17 · Posts: 46 · Credit: 2,421,362,376 · RAC: 0

Still no regular supply of WUs for me, despite the server status being all green and plenty of 'available' WUs to send. Very intermittent supply of a few tens of WUs to a couple of hosts. I was hoping that by now the server would have been sorted. It's getting to the point of being really annoying, and I'm thinking about moving to a different project altogether.
Joined: 13 Apr 17 · Posts: 256 · Credit: 604,411,638 · RAC: 0

Max: I solved it successfully (you have probably already tried this without success) by:

1. setting the project to "no new tasks"
2. suspending the project
3. exiting BOINC
4. waiting at least 3 minutes
5. restarting BOINC
6. setting "new tasks allowed"
7. resuming the project

Maybe it will help in your case...
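That same sequence can be driven from the command line with boinccmd, BOINC's client-control tool. A dry-run sketch - the project URL is an assumption, and with RUN left at 0 it only prints what it would do instead of executing:

```shell
# Dry-run of the reset sequence via boinccmd. Set RUN=1 to really execute.
PROJECT_URL="https://milkyway.cs.rpi.edu/milkyway/"   # assumed project URL
RUN=${RUN:-0}

step() {
  if [ "$RUN" = "1" ]; then "$@"; else echo "would run: $*"; fi
}

step boinccmd --project "$PROJECT_URL" nomorework     # no new tasks
step boinccmd --project "$PROJECT_URL" suspend        # suspend the project
step boinccmd --quit                                  # exit the client
# ...wait at least 3 minutes, then restart the BOINC client...
step boinccmd --project "$PROJECT_URL" allowmorework  # new tasks allowed
step boinccmd --project "$PROJECT_URL" resume         # resume the project
```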
Joined: 13 Apr 17 · Posts: 256 · Credit: 604,411,638 · RAC: 0

mikey: time is a stretchable "thing". Sometimes it can mean several minutes, or only an hour, or maybe days or even weeks. I wonder how long it took to repair this HDD?
©2024 Astroinformatics Group