Technical Coordination Weekly



Balazs Konya (Lunds universitet)
Technical coordination group members and invited persons
26 August 2016, 14:10 - 15:12

Present: Balazs, Mattias, Jon, Aleksandr, Anders (last 2 min), Oxana
Apologies: David

= Urgent issues

Some unexplained crashes at NSC (segfaults) and LUNARC (huge infoprovider logs), probably due to some corrupted files. Still internal to NeIC, no reports filed yet.

= Bugs
  • 3468    arex excessive logging when infoproviders timeout expires - no progress yet
  • 3210    CPU time isn't measured correctly for some jobs (e.g. ALICE); Ake Sandgren had an idea how to address it some time ago
  • 3470    Watchdog did not restart arched after segfault - no reason found yet
  • 3163    Infosystem showing incorrect info on multicore jobs with condor backend - testbed is set up, but no tests run yet
  • 2036    infosys not scalable for ~100k jobs - requires a serious re-writing
  • 3384    Support for per-queue authorisation configuration and publishing - a dramatic change, triggers a major release
  • 3433    Publish authorised VOs per queue - related to the 3384 above
  • 3486    External helper log file location is hardcoded to controldir/job.helper.errors - Aleksandr can easily fix it
  • 3432    bdii-update.log fills up with complaints about dn suffix (REOPENED) - Mattias still to look into it, not easy to reproduce
  • 3457    Accounting problem with PBS/torque for multi-core jobs (REOPENED) - no progress yet
  • 3503    PBS scan does not parse node information - probably related to 3457 above, to be clarified
  • 3505    ACIX produces not only acix-cache.log, but also twistd.log - another specimen for the log zoo
  • 3506    PBS scan does not handle job IDs without suffix - patch exists, some disagreements on style
  • 3504    openldap 2.4.40 crashed after few minutes with ARC 5.x (MAJOR) - probably not our problem
  • 3497    Skip heavily loaded delivery servers - David's todo list
  • 3499    SGE LRMS infoprovider should properly detect GLUE2 OSName,OSVersion,OSFamily - a minor feature request
  • 3500    LL LRMS infoprovider should properly detect GLUE2 OSName,OSVersion,OSFamily - twin brother of the above
  • 3502    bulk arcls - David's todo list

= Release status

A bugfix release will be needed. To be fixed: 3470 (watchdog, would be nice); it is not clear when the fix will be ready.

Jon will send a warning that a bugfix release is being planned.

Next major release would require a meeting.

= Coming meetings

September 10: back-ends FTF in Copenhagen
September 29: NeIC NT1 all-hands in Ljubljana
NordForsk project kick-off some time in October

= A.O.B.

Anders has some issues with lintian warnings (e.g. man pages for old arcproxy and canl++); something needs to be done.
