1 July 2015, 14:02 - 14:53
Present: Mattias, Jon, Balazs
Apologies: Oxana, David
=News
No news from Mattias.
Nothing from Aleksandr.
Jon: first package submitted to EPEL. Balazs asked whether the SLURM comparison testing (Python vs. old script) has started yet; the answer is no.
Balazs: reports strange GGUS behaviour and work around VO-views.
=Cutting through the "Mess Everywhere"
Motivation: the new major release at the end of the year will allow us to do a proper cleanup of the messy ARC areas. At this meeting we started to identify the messiest topics in ARC. Here is the initial list:
- server side logging: the "state of the art" is almost collected, Jon got lost wrt the infosys logs
- interfaces: any change will be more difficult for compatibility reasons
- treatment of VOs: publishing, authorizing, discovering and accounting of VOs are all done differently
- arc.conf: non-intuitive, naming inconsistency all over the place
- naming of ARC components, modules, functionality
=Release status
Jon: We are celebrating a new record for producing a minor release: 23 h 50 min.
Dyma deployed 5.0.2 right after the problem with 5.0.1 was discovered.
No plan yet for new minor release :)
Jon proposed both 5.0.1 and 5.0.2 for the UMD inclusion process ;)
Long discussion about the EPEL testing phase. No conclusion.
=Bugs
-3468 arex excessive logging when infoproviders timeout expires:
Balazs and Florido might propose a radical solution: remove a large part of the code that causes the problem. More info to be posted later
-3210 CPU time isn't measured correctly for some jobs (e.g. ALICE)
no progress
-3470 Watchdog did not restart arched after segfault
Aleksandr and Jon think more investigation is needed. The problem is unclear
-3163 Infosystem showing incorrect info on multicore jobs with condor backend
no progress
-2036 infosys not scalable for ~100k jobs
no progress
-3384 Support for per-queue authorisation configuration and publishing
no progress
-3486 External helper log file location is hardcoded to controldir/job.helper.errors
Aleksandr agrees with the proposed change
-3432 bdii-update.log fills up with complaints about dn suffix (REOPENED)
Mattias promises to look at it next week
-3457 Accounting problem with PBS/torque for multi-core jobs (REOPENED)
no progress
-3482 ARC cache service failed to stage data for job submitted via EMI-ES due to proxy issues (REOPENED)
cache-service welcome back :)
=AOB
David posts the following towards the end of the meeting:
this page was mentioned today, not sure if you have seen it https://twiki.cern.ch/twiki/bin/view/LCG/BatchSystemComparison
The vast majority of European WLCG sites are using Torque
I would like someone to comment on the APEL support row