26 August 2016, 14:10 - 15:12
Present: Balazs, Mattias, Jon, Aleksandr, Anders (last 2 min), Oxana
Apologies: David
= Urgent issues
Some unexplained crashes at NSC (segfaults) and LUNARC (huge infoprovider logs), probably due to some corrupted files. Still internal to NeIC; no reports filed yet.
= Bugs
- 3468 arex excessive logging when infoproviders timeout expires - no progress yet
- 3210 CPU time isn't measured correctly for some jobs (e.g. ALICE) - Ake Sandgren had an idea how to address it, some time ago
- 3470 Watchdog did not restart arched after segfault - no reason found yet
- 3163 Infosystem showing incorrect info on multicore jobs with condor backend - testbed is set up, but no tests run yet
- 2036 infosys not scalable for ~100k jobs - requires a serious rewrite
- 3384 Support for per-queue authorisation configuration and publishing - a dramatic change, triggers a major release
- 3433 Publish authorised VOs per queue - related to 3384 above
- 3486 External helper log file location is hardcoded to controldir/job.helper.errors - Aleksandr can easily fix it
- 3432 bdii-update.log fills up with complaints about dn suffix (REOPENED) - Mattias still to look into it, not easy to reproduce
- 3457 Accounting problem with PBS/torque for multi-core jobs (REOPENED) - no progress yet
- 3503 PBS scan not parse node information - probably related to 3457 above, to be clarified
- 3505 ACIX produces not only acix-cache.log, but also twistd.log - another specimen for the log zoo
- 3506 PBS scan does not handle job IDs without suffix - patch exists, some disagreements on style
- 3504 openldap 2.4.40 crashed after few minutes with ARC 5.x (MAJOR) - probably not our problem
- 3497 Skip heavily loaded delivery servers - on David's todo list
- 3499 SGE LRMS infoprovider should properly detect GLUE2 OSName,OSVersion,OSFamily - a minor feature request
- 3500 LL LRMS infoprovider should properly detect GLUE2 OSName,OSVersion,OSFamily - twin brother of the above
- 3502 bulk arcls - on David's todo list
= Release status
A bugfix release will be needed. To be fixed: 3470 (watchdog; would be nice to have), but it is not clear when the fix will be ready.
Jon will send a warning that a bugfix release is being planned.
Next major release would require a meeting.
= Coming meetings
September 10: back-ends FTF in Copenhagen
September 29: NeIC NT1 all-hands in Ljubljana
NordForsk project kick-off some time in October
= A.O.B.
Anders has some issues with lintian warnings (e.g. man pages for old arcproxy and canl++); something needs to be done.