Previous Meeting Minutes from XSEDE SP Forum

Meeting: July 21, 2022 @1PM Pacific/4 pm Eastern

  1. Welcome

  2. ACCESS Coordinating Office - John Towns and Dina Meek

  3. XSEDE and SP Coordination News

  4. Next meeting:  August 18, 2022

Meeting: July 7, 2022 @1PM Pacific/4 pm Eastern

  1. Welcome

  2. ACCESS Track 4 (Monitoring & Measurement Services): Tom Furlani and team

  3. XSEDE and SP Coordination News

  4. Next meeting:  July 21, 2022

Meeting:  June 23, 2022 @1PM Pacific/4 pm Eastern

  1. Welcome

  2. Introducing ACCESS's operations and user services - The CONECT project:  Amy Schuele, Tim Boerner, Winona Snapp-Childs, Kathy Benninger and Alex Withers

  3. XSEDE and SP Coordination News

  4. Next meeting: July 7, 2022

Meeting:  June 9, 2022 @1PM Pacific/4 pm Eastern

  1. Welcome

  2. Introducing ACCESS's approach to user services - The MATCH project:  Alan Chalker

  3. XSEDE and SP Coordination News

  4. Next meeting: June 23, 2022

Meeting:  May 25, 2022 @1PM Pacific/4 pm Eastern

  1. Welcome

  2. Introducing ACCESS's approach to allocation services - The RAMPS project:  Dave Hart

  3. XSEDE and SP Coordination News

  4. Next meeting: June 9, 2022

Meeting:  May 12, 2022 @1PM Pacific/4 pm Eastern

  1. Welcome

  2. Open Storage Network current activities and future plans: John Goodhue

  3. XSEDE and SP Coordination News

  4. Next meeting:  May 25, 2022

Meeting:  March 31, 2022 @1PM Pacific/4 pm Eastern

Attendees: 21

Agenda:

  1. Welcome

    1. Davide Del Vento is leaving NCAR, welcome Ben Kirk as NCAR's SP Forum representative

  2. Introducing ACES, a new NSF-funded resource (Honggao Liu, Texas A&M)

    1. New ACES system (NSF ACSS program, grant #2112356)

      • $5M + $1.25M for 5 years; program has been archived

      • 2-3 awards/year over 5 years

    2. User Portal — uses Slurm and Liqid software + Kubernetes

    3. Composable resources

      • 11,500+ CPU cores; dual Intel Sapphire Rapids 48-core processors

      • Liqid composable infrastructure; 120 nodes; supports up to 10 PCIe cards (GPU, FPGA, VE)

    4. Multiple accelerator technologies: GPU, FPGA, VE, Intel Optane memory

    5. Deployment: test phase: Sept 2022; early users: Q4 2022; allocations: Q1/Q2 2023

    6. https://hprc.tamu.edu/aces

  3. Roundtable

  4. Next meeting:  April 14, 2022 - SPs' Training activities and approaches

Meeting:  March 17, 2022 @1PM Pacific/4 pm Eastern

Attendees:  Tim Boerner, Kevin Colby, Davide Del Vento, Craig Earley, Jeremy Fischer, Jim Griffioen, David Hancock, Doug Jennewein, Ruth Marinshaw, Kenton McHenry, Tim Middelkoop, Sudhakar Pamidighantam, Tabitha Samuel, Sergiu Sanielevici, Eva Siegmann, Bob Sinkovits, Dan Stanzione, Mary Thomas

Agenda:

  1. Welcome

  2. Discussion of Campus Champions SP allocations post-XSEDE, with Champions program leadership team member Tim Middelkoop and SP Forum member Doug Jennewein (also part of the CC leadership team)

    1.  Campus Champion program was introduced in 2009, and it has grown steadily over the years.  Post-XSEDE, the CC program will continue and the leadership team has been crafting a sustainability plan

    2.  Champions find value in the CC allocations (about 40 CCs out of 600 have active allocations) from Service Providers as well as significant value from the XSEDE allocations reports

    3. Champions would like clarity regarding whether the individual SPs will continue to provide allocations post-XSEDE.

    4. SPs noted that the ACCESS awardees should be known soon, but the SPs are committed to continuing to provide allocations in some form.  For example, TACC noted that it would be happy to continue allocations on their systems using local accounts if necessary.  There is hope that the allocations infrastructure and process will continue in ACCESS, rather than each SP having to manage parallel allocations.  In general, CCs' usage of the allocations they have received has been quite modest.  SPs persist across XSEDE and ACCESS, and CCs will too.

    5. Discussion of how best to manage allocations if the number of champions per campus/institution grows significantly, as there IS overhead for the SPs in terms of account management. One recommendation was to create one shared project per institution. 

    6. Another suggestion was to perhaps separate the types of CC allocations into (a) kick the tires – low CPU utilization, mostly debugging – and (b) testing codes and running science, which are more intensive

  3. XSEDE program news:  Tim reported that the XSEDE Quarterly meetings had been held earlier in March, in a hybrid mode.  A future meeting topic for the SP Forum will be:  SP experiences over XSEDE and Lessons Learned, to help inform XSEDE's wrap up reports and insights to the NSF and community.

  4. Next meeting:  March 31, 2022

Meeting:  February 17, 2022 @1PM Pacific/4 pm Eastern

Attendees: Jaime Combariza, Davide Del Vento, Craig Earley, Jeremy Fischer, Vikram Gazula, Jim Griffioen, Robert Harrison, John Huffman, Doug Jennewein, Honggao Liu, Ruth Marinshaw, Linh Ngo, Sudhakar Pamidighantam, Sheri Sanders, Sergiu Sanielevici, Joe Schoonover, Anita Schwartz, Eva Siegmann, Preston Smith, Carol Song, Dan Stanzione, Mary Thomas, Grigori Yourganov

Agenda:

  1. Welcome

  2. SP Coordination Updates (Tabitha Samuel) - Ookami will soon be an allocable resource through the XRAC process

  3. Ookami presentation and discussion (Robert Harrison and Eva Siegmann)

    • Ookami is a testbed system, up for 2 years, 1 year operating as a testbed; 5-year project, + 1 additional year of operations

    • mid-2022 move into production, 

    • Move to L2 SP (Oct’22) — 90% allocations dedicated to XSEDE

     

    Qs:

    • SVE compilers are getting better; performance is not what was expected

    • TACC: still waiting for the Fujitsu compiler; how does it compare to the ARM compiler?

      • Fujitsu and CRAY are generating good SVE code

      • Suggested by Robert: the Fujitsu ARM compiler is more standard, so it is a better place to start

      • Lack of vectorized math is a factor of ~40x difference

      • e.g. memory bound code not looking great

      • ARM compiler getting better for TACC: everything compiles now; they are working on optimization

    • Which apps work best?

      • thread scaling is good for large memory BW

      • issue: short vectors add to instruction latency and power consumption

      • good with apps operating on long vectors

    • how are legacy codes doing (in particular chemistry)?

      • most "out of the box" codes are working fine

      • for well-vectorized codes, they have reasonably competitive code

      • non-vectorized scalar code is not doing that well

    • How is NWChem working?

      • very well; Fujitsu compiler is giving good performance

      • expect different perf between mat-mat-mult and vectorization of FP operations

    • training

      • webinars on ARM vectorization and profiling tools

    • Data and Testbed?

      • each node has SSD (512GB)

      • high-bandwidth/local storage (talking with NSF (Bob Chadduck) to expand to multi-terabytes)

    • Modern Fortran (2008) support/what features will Fujitsu support

      • ARM compiler (C/C++) is good

      • Cray is best (C++17 support)

      • Fujitsu — challenging

    • Are commercial vendors working on compiling software for the system?

      • they have 1:  Parallelware Analyzer; testing on Ookami

      • VAPS; working with the XDMoD team to get the “top 10 or 15”

    • Are they expecting a lot of startup allocation requests?

      • Yes. They expect a lot of exploratory applications

  4. Roundtable

    • Joe Schoonover noted that Fluid Numerics is working with AMD to have a co-located "lunch & learn" on HIP and OpenMP at PEARC22

  5. Next meeting: March 3, 2022 – seeking volunteers to present.

  6. March 17, 2022 meeting - Mary Thomas is organizing a panel or discussion around training

 

Meeting:  January 6, 2022 @1PM Pacific/4 pm Eastern

Agenda: 

  1. Welcome

  2. SP Forum leadership for remainder of XSEDE project

  3. Roundtable discussion

  4. Next meeting:  January 20, 2022

Meeting:  December 8, 2021 @1PM Pacific/4 pm Eastern

  1. Welcome

  2. XSEDE Evaluation Team Activities - Julie Wernert

  3. Roundtable

  4. Next meeting:  January 6, 2022

Meeting:  October 29, 2021 @1PM Pacific/4 pm Eastern

 

  1. Welcome

  2. XSEDE's YouTube channel (Susan Mehringer)

  3. Featured L3:  Arizona State (Doug Jennewein)

  4. XSEDE Program Updates (John Towns/Tim B)

  5. ROI Survey Reminder

  6. Next meeting:  November 11, 2021

Meeting:  July 8, 2021  @1PM Pacific/4 pm Eastern

Agenda: 

  1. Welcome

  2. Dana Brunson, Internet2:  New Center of Excellence

  3. John or Tim: XSEDE annual review recap

  4. Roundtable (all)

  5. Next meeting:  August 5:  Terminology Training

Meeting:  June 10, 2021  @1PM Pacific/4 pm Eastern

  1. Welcome

  2. David Wheeler, XSEDE Data Transfer Service (DTS) Engagement

  3. Ruth Marinshaw: Fluid Numerics' SP Forum membership application update

  4. Newest SPs:  updates on onboarding and allocation processes

  5. John or Tim: XSEDE annual review recap

  6. Roundtable (all)

Meeting:  May 27, 2021  @1PM Pacific/4 pm Eastern

  1. Welcome

  2. Honggao Liu: Texas A&M's FASTER system and HPC activities

  3. Ruth Marinshaw: Fluid Numerics' SP Forum membership application

  4. Tim: XSEDE program updates

Meeting:  April 15, 2021  @1PM Pacific/4 pm Eastern

Attendees: 20

  1. Welcome

  2. CC* CyberTeam from Utah, Colorado and Colorado State (RMACC): “Creating a Community of Regional Data and Workflow Cyberinfrastructure Facilitators”. (Brett Milash, Andrew Monaghan, Mara Sedlins)

  3. SP Coordination: working towards a single source of SP info (Tabitha Samuel)

  4. XSEDE News (Tim Boerner)

  5. Reminder to sign up to review XSEDE's plans for the coming year (https://docs.google.com/document/d/1R--DEdoo0_A7a-d2n4fYiqpa9Gp7dQpe7AZ8MnfMmBM/edit )

  6. Roundtable (all)

Meeting:  April 1, 2021  @1PM Pacific/4 pm Eastern

Attendees: 30

  1. Welcome

  2. XSEDE Terminology Task Force - Linda Akli & Susan Mehringer

    1. Status:  list, docs, outreach (Slides at events)

      • Subtasks: 

        • intake/review/add new terms;

        •  publicity/post list; 

        • training/orientation; internal communications; tracking metrics/records

      • Next: XSEDE rollout

    2. Each event has pre-slides to show

    3. Terminology Statement is now linked on bottom of home site

    4. Replacement terms:

      • master branch —> main branch

      • white paper —> position paper, publications, etc.

      • other words: disabled, brown bag, sanity check

      • NIPS project —> changed acronym —> NaIP

  3. PY11 Planning - Tim Boerner

    1. Annual project planning exercise (Sept 1 - Aug 31), start 6 months before; in time for June NSF meetings

    2. Goals: what would people do/not do with a ±5% change in budget?

    3. looking for input from L3/L2 program areas… what should XSEDE do next year for the SPs?

    4. Leslie Froeschl will put out a link to a google doc describing what they are seeking

  4. XSEDE transition and spin down - John Towns

    1. NSF ACCESS does not cover all XSEDE activities, and some existing activities will receive reduced support

    2. creates risks for XSEDE and SPs in final deliverables

    3. May be a slightly higher risk near the end of XSEDE. 

    4. how to prepare for transition?

      • L1 SPs are having discussions with program officers about what to do after XSEDE services disappear (Sergiu)

      • Concern for areas without an obvious continuation 

      • XSEDE may be understaffed — staff may move on to other positions

        • could SPs possibly find staff who can step in and finish project tasks?

        • will impact deliverables and support

    5. How are L1 SPs planning for end of XSEDE?

    6. what are the gaps in service — can we identify them?

      • support

      • training —> what will replace this?

      • user portal

      • allocation systems

      • ECSS —> end of ECSS will be a big gap in funding

      • Campus Champions

      • PEARC

    7. what types of mitigation plans should be put in place?

  5. Next Meeting agenda:  SP data cleanup and RMACC CC* CyberTeam presentation

  6. Roundtable (all)

Meeting:  March 18, 2021 @1PM Pacific/4 pm Eastern

  1. Rockfish - a resource coming to XSEDE mid-2021 (Jaime Combariza, JHU)

  2. XSEDE News (Tim B or John T)

  3. Roundtable (all)

 

(To add: interim meetings)

Meeting:  January 7, 2021 @1PM Pacific/4 pm Eastern

  1. Welcome new SP Forum members

  2. Site Update:  Stanford

  3. XSEDE Project news (Tim B)

  4. SP Coordination news (Tabitha Samuel)

  5. SP Forum elections upcoming (Ruth)

  6. Roundtable (all)

Meeting:  December 17, 2020 @1PM Pacific/4 pm Eastern

Attendees:  Jon Anderson, Tim Boerner, Jaime Combariza, Keith Crabb, Matt Deaton, David Hancock, Andrew Keen, Ruth Marinshaw, Kenton McHenry, Sudhakar Pamidighantam, Tabitha Samuel, Carol Song, Sergiu Sanielevici, Dan Stanzione, Mary Thomas, John Towns, Jim Wilgenbusch, Paul Williams

Agenda

  1. Announcements (Ruth Marinshaw)

    • Chet Langin is retiring from SIU;  Matt Deaton and/or Jerry Richards will represent SIU as an L3 SP Forum member.

  2. L3 Update:  The Minnesota Supercomputing Institute (Jim Wilgenbusch, Director of Research Computing)

    • MSI’s Research Computing team provides access to diverse data & computing services; training and support via tutorials and other means across 2 campuses and 6 colleges; dedicated staff expertise and support written into grants; and deployment and testing of novel services.  In addition, they provide seed funding for informatics research projects as well as a graduate assistantship program for Informatics students.

    • Key service area of focus is U-Spatial (https://thespatialuniversity.umn.edu) with expertise in GIS, remote sensing and spatial computing

    • HPC resources include 2 HPC clusters, storage (6 PB high performance, 6 PB second tier, 30 PB tape library), interactive computing (Citrix, DCS Nice, Open Stack-based secure cloud), Galaxy, Jupyter Hub and custom web interfaces, databases and applications

    • 75% of compute resources are used by Engineering, Chemistry and Physics; 50% of the storage is used by Biology, Genetics & Health Sciences.

    • New major systems are purchased every 5 years. Next system coming spring 2021. Also have a condo-like option to buy in with a one time fee that covers equipment purchase as well as staffing to support those systems for 5 years.

    • Secure computing environment has 42TB VM block storage, 1 PB NFS storage, and nodes have 2TB SSD per node for job-allocated local scratch.

    • MSI has 45 FT staff; UM’s Informatics Institute (UMII) has 8 FTE; U-Spatial has 10.

    • Jim reports to the VPR; the MSI, UMII and U-Spatial programs report to him. There is a substantial governance structure associated with CI services. 

  3. XSEDE News (Tim Boerner)

    • The XSEDE Quarterly Staff meetings were held last week

    • Working on diagrams to illustrate the interconnections between the various XSEDE program areas

  4. Coming in January: SP Forum Elections 

  5. Next meeting:  Thursday, January 7, 2021.

Meeting:  October 29, 2020 @1PM Pacific/4 pm Eastern

Attendees:  Jon Anderson, Tim Boerner, Jaime Combariza, Alan Chalker, Kevin Colby, Keith Crabb, Jeremy Fischer, David Hancock, Ron Hawkins, Dave Hudak, John Huffman, Andrew Keen, Chet Langin, Dan Lapine, Lee Liming, Ruth Marinshaw, JP Navarro, Sudhakar Pamidighantam, Tabitha Samuel, Scott Sakai, Sergiu Sanielevici, Anita Schwartz, Bob Settlage, Shava Smallen, Preston Smith, Dan Stanzione, John Towns, Brian Voss

 

Agenda

The primary agenda topic was to return to the discussion of ways to deliver Open OnDemand (OOD) for XSEDE SPs.  Since the SP Forum discussed this earlier in 2020, the OOD development team and the XSEDE Requirements Analysis & Capability Delivery (RACD) team have been meeting to discuss possibilities and options. Their recommendation was as follows:

“Our recommendation for serving OOD to XSEDE users is to create a service provider specific OOD portal that served researchers from that SP but served as a portal to BOTH the local resources (simply PSU for instance) AND the XSEDE resources (local and remote, but moderated by what the user had access to), ie the user would navigate to ood.psu.edu and be able to see and hit any resource they have access to including local and XSEDE.  We do think there should be some consistency across SPs.

 

For completeness, the options we discussed included:

 

  1. create SP-specific OOD portals managed by the site to serve their XSEDE resources only, ie if a VT investigator had access to PSU XSEDE resources, they would go to ood.xsede.psu.edu

  2. create a master XSEDE OOD portal that contained the connectors to the various SP resources, ie a VT investigator would go to ood.xsede.org and be able to submit to the resource they were allocated on, ie xsede.psu

  3. create an SP-specific OOD portal that served researchers from that SP but served as a portal to BOTH the local resources (simply PSU for instance) AND the XSEDE resources (local and remote, but moderated by what the user had access to), ie the user would navigate to ood.psu.edu and be able to see and hit any resource they have access to including local and XSEDE.”

Draft use cases were reviewed and discussed based on this document: https://docs.google.com/document/d/1Vs87wUkRP9gPLP3ZC4JdE6YEdXhKo5RUGWZLqlQaEdY/edit

 

Key questions explored were: what would OOD do for users?  Should XSEDE do this and, if so, how?  It was noted that several SPs have already adopted OOD.  Among the compelling use cases is training - participants don't have to download and configure a tool like PuTTY, etc.  SDSC noted that it has configured systems to use federated authentication.  OOD is also a powerful way to provide access to Jupyter notebooks and to applications like MATLAB and Ansys.  The OOD developers are discussing integration with Globus.

A vigorous discussion was held, with Forum members ultimately supportive of the recommendation from the OOD and RACD teams.

The forum will not meet again until December, given Supercomputing and Thanksgiving.

Meeting:  October 15, 2020 @1PM Pacific/4 pm Eastern

Attendees:  Jon Anderson, Jaime Combariza, Kevin Colby, Jeremy Fischer, Dave Hart, Dave Hancock, John Huffman, Andrew Keen, Chet Langin, Rob Light, John Lowe, Ruth Marinshaw, Tabitha Samuel, Sergiu Sanielevici, Anita Schwartz, Carol Song, Mary Thomas, Nathan Tolbert, 14 others

Agenda

  1. XSEDE's RAS team discusses an updated AMIE environment for usage data and pushing project info out to SPs (Dave Hart, Rob  Light)

    • AMIE is XSEDE's Account Management Information Exchange.  Work is in progress to overhaul its infrastructure and database and to improve analytics

    • current: data transport using XML via SSH or rabbitMQ

    • new features – target date is the end of this program year

    • new: REST API

      • transactions are mostly the same:  

        • e.g. request_project_create

      • notify_project_usage now has its own API

      • some SPs are starting to use it

    • legacy AMIE required SPs to maintain a local DB

      • the new version does not need a local DB,

      • use new API to query

      • jobs sent to XDCDB individually via AMIE, very slow (transmission and processing —> bottleneck)

        • e.g.: 100,000 tiny jobs in 1 day; and this impacted all jobs

    • new usage API, restful; different URL; but same AMIE auth system

      • non-blocking transactions; uses AWS lambda so easy to scale functions so can handle different loads

      • currently processes 1 million jobs/month (5400 jobs/hour)

      • new API can process this in 2 hours 

    • new: POST /usage API

      • post many jobs at once

      • queues all jobs for async. processing

        • jobs that fail return with error msg

      • POSTing jobs already in the database replaces them automatically (no longer manual)

    • GET /usage/status

      • from and to dates; count of job records; sum of charges for each SP

      • provides list of records that did not load —> error log table

      • if job fails, SP can repost again

    • Usage API — available soon

      • GET usage_summary, ….

      • different job attributes (CPU, GPU, storage, …)

    • AMIE client library posted on GitHub

    • Other tasks in redesign:

      • redesign entire system

      • simplifying: remove use of/mention of the Grid term

      • schema design changes to make action processing simpler; 

        • keep balance for each allocation, not per person (like a bank account); job storage —> more flexible and faster

        • schemas for people and organizations—> right now they are embedded in accounting

      • changes will mostly be invisible to SPs

    • RDR redesign: resource description repository

      • simpler resource entry; better schema for describing resources

    Overall, everyone was very pleased with the new system.  The SP Forum gave the presenters the closest thing to a standing ovation that can be given via Zoom.
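    The batched usage flow described above can be sketched roughly as follows. This is a hypothetical illustration only: the minutes specify just the POST /usage (batch submit, asynchronous processing) and GET /usage/status endpoints and reuse of the existing AMIE authentication; the base URL, record field names, and bearer-token header here are illustrative assumptions, not the actual API.

```python
import json
import urllib.request

# Hypothetical sketch of the new AMIE usage API's batched flow. Only
# "POST /usage" and "GET /usage/status" come from the minutes; the base
# URL, record fields, and auth header are illustrative assumptions.

API_BASE = "https://usage.example.org"  # hypothetical endpoint

def build_usage_batch(job_records):
    """Pack many job records into one POST body, instead of sending
    jobs individually (the old per-job AMIE packets were a bottleneck)."""
    return json.dumps({"usage": job_records}).encode("utf-8")

def post_usage(job_records, token):
    """Submit a batch; the server queues records for async processing
    and returns any records that failed, with error messages."""
    req = urllib.request.Request(
        API_BASE + "/usage",
        data=build_usage_batch(job_records),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def get_usage_status(token, date_from, date_to):
    """Report records loaded and summed charges between two dates;
    records that did not load land in an error log table."""
    url = f"{API_BASE}/usage/status?from={date_from}&to={date_to}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build (but do not send) a small example batch:
batch = json.loads(build_usage_batch([
    {"local_job_id": "12345", "charge": 3.5},
    {"local_job_id": "12346", "charge": 0.1},
]))
```

    Because failed records simply come back with error messages, an SP can re-POST them; per the minutes, re-POSTed records that already exist in the database are replaced automatically rather than requiring manual cleanup.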

  2. SP Coordination (Tabitha Samuel) - Working on SP checklist updates as well as a software/services baseline.  There is also a Slack channel now for SP sys admins.

Meeting:  September 17, 2020 @1PM Pacific/4 pm Eastern

Attendees:  Jon Anderson, Tim Boerner, Jeremy Fischer, Ron Hawkins, John Huffman, Kyle Hutson, Ruth Marinshaw, Addis O’Connor, Sudhakar Pamidighantam, Tabitha Samuel, Anita Schwartz, Carol Song, Sergiu Sanielevici, Derek Simmel, Preston Smith, Dan Stanzione, Barr von Oehsen

Agenda

1. Cybersecurity Updates (Derek Simmel)

  1. No XSEDE systems were broken into but a couple of user accounts were compromised

  2. Cooperation with/info sharing with European HPC community

  3. Use Zeek security monitoring (the Zeek Network Security Monitor); XSEDE monitors and coordinates communication between allocated SPs

  4. L3s are, though, welcome to participate in XSEDE’s security community and meetings; contact derek (dsimmel@psc.edu) to be added to the meetings and email list

  5. Question about gateway users and potential vulnerabilities/concerns; vigilance in all aspects of information security was recommended

  6. Discussion of persistent intruder threats

Derek gave a brief overview of REN-ISAC (https://www.ren-isac.net/) for those who weren’t familiar and strongly encouraged sites to be sure their campus participates in some way 

  2. XSEDE Update (Tim Boerner)

    1. Relatively quiet right now