May 16 Maintenance Complete, Follow-up Begins

The May 16 maintenance outage was completed last week, with some drama related to Samba services (smb-cluster) and the aggr14 filesystem. All services were restored by late last week, and we've collected a set of follow-up tasks that you can see here:

ITDEV-3229

It took about 4 hours just to get things cleanly shut down for maintenance. The primary goal, enabling DMAPI on the GPFS filesystems, was accomplished relatively soon after that. The secondary goals regarding the home-app cluster were completed next, which will make future maintenance on home-app servers much easier. The unexpected work came afterward: Samba services did not return properly, and repairing the aggr14 filesystem corruption took some time. Regarding aggr14, it turns out that some of the reported filesystem corruption was in fact a false positive caused by a bug that has been fixed in the next version of GPFS. The number of files that were actually corrupted was small, and the data was recovered after the filesystem came back online, so in the end no data was lost. Ironically, the aggr14 data in question is scheduled for deletion.

As usual, we thank you for your patience during maintenance outages. We know that the interruption can be frustrating.