8 September 2016, 10:30 - 11:45

Present: Florido, Caterina, Jonas, Oxana; additionally for the storage meeting: Anders F., Szymon Gadomski, Luis March

* MoU: Jonas suggests that the requirements towards LUNARC should be aligned with those applicable to the rest of the SNIC/LUNARC users, since it is not possible to offer an additional application expert to ATLAS (Florido is the only one). In particular, this concerns the regular reports to users foreseen by the MoU, which are not provided to other users. LUNARC holds regular internal meetings, and Caterina suggests publishing their minutes; however, these meetings may discuss sensitive matters (e.g. security-related ones) and are therefore not meant to be public. LUNARC is committed to communicating major outages promptly and to deploying requested software within a couple of days. This response time, however, cannot be applied to the much more complex ATLAS software, which is a very special case. The standard channel for submitting user requests is the SNIC RT issue tracking system. The MoU can be simplified by listing the tasks in a table.

* CentOS7 vs SL6: since CERN will use SL6 until the end of Run2, two options are considered: using Singularity, or installing SL6/CentOS6 on the HEP nodes. Singularity is seen as a quick and simple solution. Nobody in Lund has had time to test it yet, but it is currently being tested by the US LHC teams, and their report is expected on Wednesday. Installing CentOS6 on the worker nodes is more time-consuming, since it implies configuring such a sub-cluster from scratch. If a Singularity environment is set up at LUNARC, some ATLAS students could perhaps be assigned to try it. Jonas will assign Marcos to do the setup, and will ask Tore about the possibility of deploying CentOS6 on the HEP nodes.

* Storage: Florido prepared a proposal (https://docs.google.com/presentation/d/1agBLlMrMe3Pu1RGou5ut5LE0dgzeXFGztKu4Gjn_QBE/edit?usp=sharing), which was briefly discussed before the meeting with the Geneva experts. During the meeting, Szymon explained that in Geneva they deployed 17 disk servers hosting 700 TB of data (an order of magnitude more than us), and that the data from the Grid Storage Element were not synchronised or staged to local storage. Instead, user jobs worked in a "Grid mode", either pre-fetching the needed data to a temporary space or reading the data directly via xrootd. We can do the same, but the LDC network routing is likely to introduce notable latency (packets from the storage to the nodes might have to travel via Stockholm). Szymon is very concerned about heavy I/O load on the storage servers and advises using Hadoop, which reduces the load by replicating all data three times, offers a common namespace, and can make use of the existing disks in the worker nodes. However, Hadoop as set up at RH is still transient storage. Moreover, deploying Hadoop takes considerable time and amounts to a project of its own. Jonas points out that Aurora already has Lustre, and all agree that Hadoop would duplicate it. Caterina estimates the storage needs at 100 TB for xAODs (not critical, as they can be found elsewhere) and 50 TB for skimmed ntuples; if the xAODs are not kept in Lund, the skimming will be done on the Grid, which means that the ntuples will end up on the Grid storage as well. The bottom line is that the fastest immediate solution is to keep the existing dCache storage and extend it with more disks. The data can be accessed via e.g. xrootd or by copying, and any routing problems will need to be solved with LDC.


Action points:
* Jonas asks Marcos to set up a Singularity environment
* Jonas asks Tore about the possibility to deploy CentOS6 on HEP nodes
* Florido adds the task table to the MoU
* Florido makes the storage proposal