Bodrogi Kuria Code Camp Summary Report
In the third week of January 2016, the ARC developer community gathered for a week-long face-to-face jamboree at the Bodrogi Kuria, near Budapest. The event was organized under the umbrella of the "ARC4eInfrastructures" NordForsk project and was attended by thirteen participants, including almost all of the core ARC developers (see the event page [1] for the names). The venue provided a perfect atmosphere for efficient work: full board with on-site catering, unlimited access to the conference room, and a relaxing, distraction-free environment [2]. The event started with a late-afternoon session on Monday and ended with a Friday morning session followed by lunch.
The goal of the developer week was to make real progress on topics that are difficult to discuss via email. Prior to the meeting, the following discussion areas were identified: ARC configuration, controldir and scalability, VOs in ARC, authorization in ARC, and overall simplifications.
After a short round-table introduction and some logistical information, Monday afternoon was spent estimating the complexity of the proposed discussion areas and evaluating whether realistic progress could be made on them during the week. We also reviewed the ARC documentation published on the NorduGrid website and on the NorduGrid wiki [3]. The quick review showed that the ARC documentation is rather broad and is slowly becoming obsolete; its sheer volume unfortunately makes it difficult to maintain. Given our limited resources, we will focus on two documents and make sure that those are always up to date: the ARC CE sysadmin guide [4] and the arc.conf.reference [5]. The latter should be better advertised on the documentation webpage. The review of the NorduGrid wiki revealed the obvious: wikis usually accumulate lots of old garbage, and the NorduGrid wiki was no exception. As a quick action, Jon and Balazs were given the task of cleaning up and restructuring the wiki on Monday evening after dinner. The task was successfully completed.
We started Tuesday morning by going through the various server-side ARC log files, checking their default locations, purpose and content. This was originally meant as a warm-up exercise of at most one hour, but it lasted almost the entire morning. The outcome and the resulting actions are recorded on the related wiki page [6].
The next topic was performance metrics. Working in several iterations and in smaller groups, we came up with a set of server-side performance numbers (metrics) that we want to measure on an ARC CE. The subsystem-specific metrics and the proposed log file syntax are recorded on the dedicated wiki page [7]. In the afternoon we switched to the VO handling area: how ARC handles VO assignment, authorization and publication. We converged on supporting the basic WLCG use case: the VO assignment of a job is decided by the VO information available in the proxy. This job property is then to be used consistently in every ARC subsystem, such as authorization (done via the arc.conf authgroup blocks), accounting and infosys publication, including queue-relations. See the details on the dedicated wiki page [8].
Working in parallel with the log files/VOs/metrics discussions, a smaller group of coders addressed the integration of ARC with the ATLAS data system. Cedric, Vincent and David made progress on two fronts. The first is the integration of Rucio traces: aCT sends reports to Rucio about the files that are used as job input, and these reports are used to measure the popularity of ATLAS data. The second is the integration of ARC caches with Rucio: the content of ARC CE caches can now be propagated to Rucio to be used for ATLAS job brokering.
On Wednesday we jumped into the arc.conf review. First the overall structure of arc.conf was discussed (e.g. case sensitivity, partial order dependence, and so on), then the tedious job of checking and discussing every config parameter one by one started. It turned out to be a hell of a job! Wednesday evening was the most difficult part, when some people hinted that the arc.conf re-engineering task was just another waste of effort and that there was no way we could ever finish it. Balazs tried to keep up the momentum and to convince people that this was our last and only chance to address configuration changes. It sort of worked, since we did not give up and decided to continue the arc.conf review on Thursday as well. The config review turned out to be a very useful way of cross-checking ARC functionality and of better understanding the various ARC components. As a result of the review, some new config blocks will be introduced, several parameters will be renamed or clarified, and many obsolete options will be removed or hardcoded. All the details are recorded in the working document of the arc.conf.reference, attached to the Indico event page [1].
Thursday started with a new topic: controldir scalability. Here Balazs reminded people of an earlier decision: we will not embark on a grandiose rewrite of ARC internals by introducing some database replacement for the controldir. In the short to medium term we want to improve the current system, e.g. by restructuring the files in the controldir, merging some of them, and changing the way the infosys, the LRMS backends or A-REX process and scan those files. Before making any changes, however, we want to better understand the bottlenecks. During the discussion some initial improvement ideas were brought up, for instance that the infosys could skip scanning the controldir files of deleted/finished jobs. See the details on the dedicated wiki task page [9]. After the controldir brainstorming we moved back to the arc.conf review and restructuring, and despite all the bad feelings we had on Wednesday evening, we managed to finish the review by Thursday evening (!): the accounting subsystem (jura), the infosys blocks, and the gridftp and grid-manager blocks were all scrutinized. We hope that the new arc.conf structure, to be released with the next major release of ARC, will help new deployments and also simplify the management of existing sites. On Thursday a smaller group also worked on various backend bugs. As a social program, after dinner we went to Budapest, where we climbed Gellert Hill, walked some 5 kilometers and visited some of the ruin pubs [10].
Despite the late return from Budapest (around 2 am), the Friday morning session started sharp at 9 am. First we discussed how we are going to collect the performance metrics from the various ARC subsystems; see the "How" block on the Logging_of_CE_performance_numbers wiki page [7]. Then, as the last topic of the developer week, we agreed on the concrete steps for the arc.conf restructuring [11]. The code camp closed with lunch.
[1] ARC code camp @ Bodrogi Kuria. Event page, including the participant list, program and uploaded materials: https://indico.lucas.lu.se/conferenceDisplay.py?ovw=True&confId=294
[2] The only non-coding activities available were walking to the nearby village (we did it once) and checking out the hotel's mangalica pigs.
[3] Official ARC documentation page: http://www.nordugrid.org/documents/; NorduGrid wiki: http://wiki.nordugrid.org
[4] ARC CE sysadmin guide, the main server-side ARC manual: http://www.nordugrid.org/documents/arc-ce-sysadm-guide.pdf
[5] arc.conf.reference, the configuration reference document: http://svn.nordugrid.org/trac/nordugrid/browser/arc1/trunk/src/doc/arc.conf.reference
[6] Server-side logging wiki area: https://wiki.nordugrid.org/wiki/ARC_server-side_logging
[7] Logging of CE performance numbers wiki page: https://wiki.nordugrid.org/wiki/Logging_of_CE_performance_numbers
[8] VO handling in ARC task page: https://wiki.nordugrid.org/VO_handling_in_ARC
[9] Controldir task page: https://wiki.nordugrid.org/Controldir
[10] Ruin pubs of Budapest: http://ruinpubs.com
[11] Dedicated wiki page for the arc.conf changes: http://wiki.nordugrid.org/wiki/Arc.conf_review