Previous Meeting Minutes from XSEDE SP Forum
Meeting: July 21, 2022 @1PM Pacific/4 pm Eastern
Welcome
ACCESS Coordinating Office - John Towns and Dina Meek
XSEDE and SP Coordination News
Next meeting: August 18, 2022
Meeting: July 7, 2022 @1PM Pacific/4 pm Eastern
Welcome
ACCESS Track 4 (Monitoring & Measurement Services): Tom Furlani and team
XSEDE and SP Coordination News
Next meeting: July 21, 2022
Meeting: June 23, 2022 @1PM Pacific/4 pm Eastern
Welcome
Introducing ACCESS's operations and user services - The CONECT project: Amy Schuele, Tim Boerner, Winona Snapp-Childs, Kathy Benninger and Alex Withers
XSEDE and SP Coordination News
Next meeting: July 7, 2022
Meeting: June 9, 2022 @1PM Pacific/4 pm Eastern
Welcome
Introducing ACCESS's approach to user services - The MATCH project: Alan Chalker
XSEDE and SP Coordination News
Next meeting: June 23, 2022
Meeting: May 25, 2022 @1PM Pacific/4 pm Eastern
Welcome
Introducing ACCESS's approach to allocation services - The RAMPS project: Dave Hart
XSEDE and SP Coordination News
Next meeting: June 9, 2022
Meeting: May 12, 2022 @1PM Pacific/4 pm Eastern
Welcome
Open Storage Network current activities and future plans: John Goodhue
XSEDE and SP Coordination News
Next meeting: May 25, 2022
Meeting: March 31, 2022 @1PM Pacific/4 pm Eastern
Attendees: 21
Agenda:
Welcome
Davide Del Vento is leaving NCAR; welcome Ben Kirk as NCAR's SP Forum representative
Introducing ACES, a new NSF-funded resource (Honggao Liu, Texas A&M)
New system ACES program (NSF ACSS grant #2112356; $5M + $1.25M for 5 years); program has been archived
2-3 awards/year over 5 years
User Portal: uses Slurm and Liqid software + Kubernetes
Composable resources
11,500+ CPU cores; dual Intel Sapphire Rapids 48-core processors
Liqid composable infrastructure; 120 nodes; supports up to 10 PCIe cards (GPU, FPGA, VE)
Multiple accelerator technologies: GPU, FPGA, VE, Intel Optane memory
Deployment: test phase: Sept 2022; early user: Q4 2022; allocations: Q1/Q2 2023
Roundtable
Next meeting: April 14, 2022 - SPs' Training activities and approaches
Meeting: March 17, 2022 @1PM Pacific/4 pm Eastern
Attendees: Tim Boerner, Kevin Colby, Davide Del Vento, Craig Earley, Jeremy Fischer, Jim Griffioen, David Hancock, Doug Jennewein, Ruth Marinshaw, Kenton McHenry, Tim Middelkoop, Sudhakar Pamidighantam, Tabitha Samuel, Sergiu Sanielevici, Eva Siegmann, Bob Sinkovits, Dan Stanzione, Mary Thomas
Agenda:
Welcome
Discussion of Campus Champions SP allocations post-XSEDE, with Champions program leadership team member Tim Middelkoop and SP Forum member Doug Jennewein (also part of the CC leadership team)
Campus Champion program was introduced in 2009, and it has grown steadily over the years. Post-XSEDE, the CC program will continue and the leadership team has been crafting a sustainability plan
Champions find value in the CC allocations (about 40 CCs out of 600 have active allocations) from Service Providers as well as significant value from the XSEDE allocations reports
Champions would like clarity regarding whether the individual SPs will continue to provide allocations post-XSEDE.
SPs noted that the ACCESS awardees should be known soon, but the SPs are committed to continuing to provide allocations in some form. For example, TACC noted that it would be happy to keep allocations going on their systems using local accounts if necessary. There is hope that the allocations infrastructure and process will continue in ACCESS, rather than each SP having to manage parallel allocations. In general, CCs' usage of the allocations they have received has been quite modest. SPs persist across XSEDE and ACCESS, and CCs will too.
Discussion of how best to manage allocations if the number of champions per campus/institution grows significantly, as there IS overhead for the SPs in terms of account management. One recommendation was to create one shared project per institution.
Another suggestion was to separate the types of CC allocations into (a) kick the tires – low CPU utilization, debugging mostly and (b) testing codes and running science, which are more intensive
XSEDE program news: Tim reported that the XSEDE Quarterly meetings had been held earlier in March, in a hybrid mode. A future meeting topic for the SP Forum will be: SP experiences over XSEDE and Lessons Learned, to help inform XSEDE's wrap up reports and insights to the NSF and community.
Next meeting: March 31, 2022
Meeting: February 17, 2022 @1PM Pacific/4 pm Eastern
Attendees: Jaime Combariza, Davide Del Vento, Craig Earley, Jeremy Fischer, Vikram Gazula, Jim Griffioen, Robert Harrison, John Huffman, Doug Jennewein, Honggao Liu, Ruth Marinshaw, Linh Ngo, Sudhakar Pamidighantam, Sheri Sanders, Sergiu Sanielevici, Joe Schoonover, Anita Schwartz, Eva Siegmann, Preston Smith, Carol Song, Dan Stanzione, Mary Thomas, Grigori Yourganov
Agenda:
Welcome
SP Coordination Updates (Tabitha Samuel) - Ookami will soon be an allocable resource through the XRAC process
Ookami presentation and discussion (Robert Harrison and Eva Siegmann)
Ookami is a testbed system, up for 2 years, 1 year operating as a testbed (5-year project, + 1 additional year of ops)
mid-2022 move into production
Move to L2 SP (Oct’22) — 90% allocations dedicated to XSEDE
slides on Ookami (Japanese for wolf); motivated by Fugaku
located at Stony Brook, NSF funded
1.5 million node-hrs/year; 32GB@1TB/sec
211 users (90% US, 93% academic). List of the 71 current projects @ https://www.stonybrook.edu//ookami/projtects
support:
website https://www.stonybrook.edu//ookami/ ;
email
ticketing system (https://iacs.supportsystem.com);
biweekly office hours (Tues 10am, Thur 2pm);
slack
Qs:
SVE compilers are getting better; performance is not what was expected
TACC: still waiting for the Fujitsu compiler; how does it compare to the ARM compiler?
Fujitsu and Cray are generating good SVE code
Suggested by Robert: Fujitsu ARM compiler is more standard, so better place to start
Lack of vectorized math makes a factor of 40x difference
e.g. memory bound code not looking great
ARM compiler getting better for TACC: everything compiles now; they are working on optimization
apps: which work best?
thread scaling is good for large memory BW
issue: short vectors add to instruction latency and power consumption
good with apps operating on long vectors
how are legacy codes doing (in particular chemistry)?
most "out of the box" codes are working fine
for well-vectorized codes, they have reasonably competitive code
vectorized scalar not doing that well.
How is NWChem working?
very well; Fujitsu compiler is giving good performance
expect different perf between mat-mat-mult and vectorization of FP operations
training
webinars on ARM vectorization and profiling tools
Data and Testbed?
each node has SSD (512GB)
high-bandwidth/local storage (talking with NSF (Bob Chadduck) to expand to multi-terabytes)
Modern (2008) Fortran support / what features will Fujitsu support?
ARM compiler (C/C++) is good
Cray is best (C++17 support)
Fujitsu — challenging
Are commercial vendors working on compiling software for the system?
they have 1: Parallelware Analyzer; testing on Ookami
VAPS, working with XDMoD team to get “top 10 or 15”
Are they expecting a lot of startup allocation requests?
Yes. They expect a lot of exploratory applications
Roundtable
Joe Schoonover noted that Fluid Numerics is working with AMD to have a co-located "lunch & learn" on HIP and OpenMP at PEARC22
Next meeting: March 3, 2022 – seeking volunteers to present.
March 17, 2022 meeting - Mary Thomas is organizing a panel or discussion around training
Meeting: January 6, 2022 @1PM Pacific/4 pm Eastern
Agenda:
Welcome
SP Forum leadership for remainder of XSEDE project
Roundtable discussion
Next meeting: January 20, 2022
Meeting: December 8, 2021 @1PM Pacific/4 pm Eastern
Welcome
XSEDE Evaluation Team Activities - Julie Wernert
Roundtable
Next meeting: January 6, 2022
Meeting: October 29 @1PM Pacific/4 pm Eastern
Welcome
XSEDE's YouTube channel (Susan Mehringer)
Featured L3: Arizona State (Doug Jennewein)
XSEDE Program Updates (John Towns/Tim B)
ROI Survey Reminder
Next meeting: November 11, 2021
Meeting: July 8, 2021 @1PM Pacific/4 pm Eastern
Agenda:
Welcome
Dana Brunson, Internet2: New Center of Excellence
John or Tim: XSEDE annual review recap
Roundtable (all)
Next meeting: August 5: Terminology Training
Meeting: June 10, 2021 @1PM Pacific/4 pm Eastern
Welcome
David Wheeler, XSEDE Data Transfer Service (DTS) Engagement
Ruth Marinshaw: Fluid Numerics' SP Forum membership application update
Newest SPs: updates on onboarding and allocation processes
John or Tim: XSEDE annual review recap
Roundtable (all)
Meeting: May 27, 2021 @1PM Pacific/4 pm Eastern
Welcome
Honggao Liu: Texas A&M's FASTER system and HPC activities
Ruth Marinshaw: Fluid Numerics' SP Forum membership application
Tim: XSEDE program updates
Meeting: April 15, 2021 @1PM Pacific/4 pm Eastern
Attendees: 20
Welcome
CC* CyberTeam from Utah, Colorado and Colorado State (RMACC): “Creating a Community of Regional Data and Workflow Cyberinfrastructure Facilitators”. (Brett Milash, Andrew Monaghan, Mara Sedlins)
SP Coordination: working towards a single source of SP info (Tabitha Samuel)
XSEDE News (Tim Boerner)
Reminder to sign up to review XSEDE's plans for the coming year (https://docs.google.com/document/d/1R--DEdoo0_A7a-d2n4fYiqpa9Gp7dQpe7AZ8MnfMmBM/edit )
Roundtable (all)
Meeting: April 1, 2021 @1PM Pacific/4 pm Eastern
Attendees: 30
Welcome
XSEDE Terminology Task Force - Linda Akli & Susan Mehringer
Status: list, docs, outreach (Slides at events)
Subtasks:
intake/review/add new terms;
publicity/post list;
training/orientation; internal communications; tracking metrics/records
Next: XSEDE rollout
Each event has pre-slides to show
Terminology Statement is now linked on bottom of home site
Replacement terms:
master branch —> main branch
white paper —> position paper, publications, etc.
other words: disabled, brown bag, sanity check
NIPS project —> changed acronym —> NaIP
PY11 Planning - Tim Boerner
Annual project planning exercise (Sept 1 - Aug 31), start 6 months before; in time for June NSF meetings
Goals: what would people do/not do with +- 5% change in budget?
looking for input from L3/L2 program areas… what should XSEDE do next year for the SP?
Leslie Froeschl will put out a link to a google doc describing what they are seeking
XSEDE transition and spin down - John Towns
NSF ACCESS does not cover all XSEDE activities, and provides reduced support for some existing activities
creates risks for XSEDE and SPs in final deliverables
May be a slightly higher risk near the end of XSEDE.
how to prepare for transition?
L1 SPs: having discussions with program officers about what to do after XSEDE services disappear (Sergiu)
Concern for areas without an obvious continuation
XSEDE may be understaffed — staff may move on to other positions
could SPs possibly find staff who can step in and finish project tasks?
will impact deliverables and support
How are L1 SPs planning for end of XSEDE?
what are the gaps in service — can we identify them?
support
training —> what will replace this?
user portal
allocation systems
ECSS —> end of ECSS will be a big gap in funding
Campus Champions
PEARC
what types of mitigation plans should be put in place?
Next Meeting agenda: SP data cleanup and RMACC CC* CyberTeam presentation
Roundtable (all)
Meeting: March 18, 2021 @1PM Pacific/4 pm Eastern
Rockfish - a resource coming to XSEDE mid-2021 (Jaime Combariza, JHU)
XSEDE News (Tim B or John T)
Roundtable (all)
(To add: interim meetings)
Meeting: January 7, 2021 @1PM Pacific/4 pm Eastern
Welcome new SP Forum members
Site Update: Stanford
XSEDE Project news (Tim B)
SP Coordination news (Tabitha Samuel)
SP Forum elections upcoming (Ruth)
Roundtable (all)
Meeting: December 17, 2020 @1PM Pacific/4 pm Eastern
Attendees: Jon Anderson, Tim Boerner, Jaime Combariza, Keith Crabb, Matt Deaton, David Hancock, Andrew Keen, Ruth Marinshaw, Kenton McHenry, Sudhakar Pamidighantam, Tabitha Samuel, Carol Song, Sergiu Sanielevici, Dan Stanzione, Mary Thomas, John Towns, Jim Wilgenbusch, Paul Williams
Agenda
Announcements (Ruth Marinshaw)
Chet Langin is retiring from SIU; Matt Deaton and/or Jerry Richards will represent SIU as an L3 SP Forum member.
L3 Update: The Minnesota Supercomputing Institute (Jim Wilgenbusch, Director of Research Computing)
MSI’s Research Computing team provides access to diverse data & computing services; training and support via tutorials and other means across 2 campuses and 6 colleges; dedicated staff expertise and support written into grants; and deployment and testing of novel services. In addition, they provide seed funding for informatics research projects as well as a graduate assistantship program for Informatics students.
Key service area of focus is U-Spatial (https://thespatialuniversity.umn.edu) with expertise in GIS, remote sensing and spatial computing
HPC resources include 2 HPC clusters, storage (6 PB high performance, 6 PB second tier, 30 PB tape library), interactive computing (Citrix, NICE DCV, OpenStack-based secure cloud), Galaxy, JupyterHub and custom web interfaces, databases and applications
75% of compute resources are used by Engr, Chem and Physics; 50% of the storage is used by Biology, Genetics & Health Sciences.
New major systems are purchased every 5 years. Next system coming spring 2021. Also have a condo-like option to buy in with a one time fee that covers equipment purchase as well as staffing to support those systems for 5 years.
Secure computing environment has 42TB VM block storage, 1 PB NFS storage, and nodes have 2TB SSD per node for job-allocated local scratch.
MSI has 45 FT staff; UM’s Informatics Institute (UMII) has 8 FTE; U-Spatial has 10.
Jim reports to the VPR; the MSI, UMII and U-Spatial programs report to him. There is a substantial governance structure associated with CI services.
XSEDE News (Tim Boerner)
The XSEDE Quarterly Staff meetings were held last week
Working on diagrams to illustrate the interconnections between the various XSEDE program areas
Coming in January: SP Forum Elections
Next meeting: Thursday, January 7, 2021.
Meeting: October 29@1PM Pacific/4 pm Eastern
Attendees: Jon Anderson, Tim Boerner, Jaime Combariza, Alan Chalker, Kevin Colby, Keith Crabb, Jeremy Fischer, David Hancock, Ron Hawkins, Dave Hudak, John Huffman, Andrew Keen, Chet Langin, Dan Lapine, Lee Liming, Ruth Marinshaw, JP Navarro, Sudhakar Pamidighantam, Tabitha Samuel, Scott Sakai, Sergiu Sanielevici, Anita Schwartz, Bob Settlage, Shava Smallen, Preston Smith, Dan Stanzione, John Towns, Brian Voss
Agenda
The primary agenda topic was to return to the discussion of ways to deliver Open On Demand for XSEDE SPs. Since discussed by the SP Forum earlier in 2020, the OOD development team and the XSEDE Requirements Analysis & Capability Delivery (RACD) team have been meeting to discuss possibilities and options. Their recommendation was as follows:
“Our recommendation for serving OOD to XSEDE users is to create a service provider specific OOD portal that served researchers from that SP but served as a portal to BOTH the local resources (simply PSU for instance) AND the XSEDE resources (local and remote, but moderated by what the user had access to), ie the user would navigate to ood.psu.edu and be able to see and hit any resource they have access to including local and XSEDE. We do think there should be some consistency across SPs.
For completeness, the options we discussed included:
create SP specific OOD portals managed by the site to serve their XSEDE resources only, ie if a VT investigator had access to PSU XSEDE resources, they would go to
create a master XSEDE OOD portal that contained the connectors to the various SP resources, ie a VT investigator would go to ood.xsede.org and be able to submit to the resource they were allocated on, ie xsede.psu
create a SP specific OOD portal that served researchers from that SP but served as a portal to BOTH the local resources (simply PSU for instance) AND the XSEDE resources (local and remote, but moderated by what the user had access to), ie the user would navigate to ood.psu.edu and be able to see and hit any resource they have access to including local and XSEDE.”
Draft use cases were reviewed and discussed based on this document: https://docs.google.com/document/d/1Vs87wUkRP9gPLP3ZC4JdE6YEdXhKo5RUGWZLqlQaEdY/edit
Key questions explored were: what would OOD do for users? Should XSEDE do this and, if so, how? It was noted that several SPs have already adopted OOD. Among the use cases that are compelling is that of training - participants don't have to download and configure a tool like PuTTY, etc. SDSC noted that it has configured systems to use federated authentication. OOD is also a powerful way to provide access to Jupyter notebooks and to applications like Matlab and Ansys. The OOD developers are discussing integration with Globus.
A vigorous discussion was held, with Forum members ultimately supportive of the recommendation from OOD and RACD.
The forum will not meet again until December, given Supercomputing and Thanksgiving.
Meeting: October 15@1PM Pacific/4 pm Eastern
Attendees: Jon Anderson, Jaime Combariza, Kevin Colby, Jeremy Fischer, Dave Hart, Dave Hancock, John Huffman, Andrew Keen, Chet Langin, Rob Light, John Lowe, Ruth Marinshaw, Tabitha Samuel, Sergiu Sanielevici, Anita Schwartz, Carol Song, Mary Thomas, Nathan Tolbert, 14 others
Agenda
XSEDE's RAS team discusses an updated AMIE environment for usage data and pushing project info out to SPs (Dave Hart, Rob Light)
AMIE: XSEDE's Account Management Information Exchange. Work is in progress to overhaul its infrastructure and database and to improve analytics
current: data transport using XML via SSH or RabbitMQ
new features – target date is the end of this program year
new: REST API
transactions are mostly the same:
e.g. request_project_create
notify_project_usage now has its own API
some SPs are starting to use
legacy AMIE required SPs to maintain a local DB
new version does not need a local DB;
use new API to query
jobs sent to XDCDB individually via AMIE, very slow (transmission and processing —> bottleneck)
e.g.: 100,000 tiny jobs in 1 day; and this impacted all jobs
new usage API, RESTful; different URL, but same AMIE auth system
non-blocking transactions; uses AWS Lambda, so functions are easy to scale and can handle different loads
currently process 1 million jobs / month (5400 jobs/hour)
new API can process this in 2 hours
new: POST /usage API
post many jobs at once
queues all jobs for async. processing
jobs that fail return with error msg
POSTing jobs already in the database will replace them automatically (no longer manual)
GET /usage/status
From and To dates; count of job records; sum of charges for each SP
provides list of records that did not load —> error log table
if job fails, SP can repost again
Usage API — avail soon
GET usage_summary, ….
different job attributes (CPU, GPU, storage, …)
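The batch-posting behavior described in these notes (post many jobs at once, report the ones that fail, replace re-posted jobs automatically, summarize by date range) can be illustrated with a small in-memory sketch. This is not the AMIE API: the class name, method names, and record fields (`job_id`, `charge`, `end_date`) are all hypothetical, and the sketch processes jobs synchronously, whereas the notes say the real service queues them for asynchronous processing.

```python
from dataclasses import dataclass, field


@dataclass
class UsageStore:
    """In-memory stand-in for a usage database (hypothetical, not the AMIE schema)."""

    records: dict = field(default_factory=dict)  # job_id -> job record
    errors: list = field(default_factory=list)   # (job_id, message) error log

    def post_usage(self, jobs):
        """Accept many job records at once and return the ones that failed."""
        failed = []
        for job in jobs:
            job_id = job.get("job_id")
            charge = job.get("charge")
            valid = (
                job_id is not None
                and isinstance(charge, (int, float))
                and charge >= 0
                and isinstance(job.get("end_date"), str)
            )
            if not valid:
                message = "missing job_id/end_date or invalid charge"
                self.errors.append((job_id, message))
                failed.append({"job": job, "error": message})
                continue
            # Re-posting a job that is already stored replaces the old record.
            self.records[job_id] = dict(job)
        return failed

    def usage_status(self, start, end):
        """Summarize stored records whose ISO end_date falls in [start, end]."""
        selected = [r for r in self.records.values() if start <= r["end_date"] <= end]
        return {
            "count": len(selected),
            "total_charge": sum(r["charge"] for r in selected),
            "errors": list(self.errors),
        }
```

In this sketch an SP would correct a rejected record and post it again; because records are keyed by job ID, a repeat post overwrites rather than duplicates.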
AMIE Client library posted on GitHub:
Other tasks in redesign:
redesign entire system
simplifying: remove use of/mention of the Grid term
schema design changes to make action processing simpler;
keep a balance for each allocation, not each person (like a bank account); job storage -> more flexible and faster
schemas for people and organizations—> right now they are embedded in accounting
changes will mostly be invisible to SPs
RDR redesign: resource description repository
simpler resource entry; better schema for describing resources
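The "balance per allocation, like a bank account" idea from the redesign notes can be sketched as follows. This is only an illustration under assumed names (`Allocation`, `charge`, the `awarded` and `used` fields); the notes do not describe the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """Hypothetical allocation record that keeps one running balance."""

    project: str
    awarded: float
    used: float = 0.0
    transactions: list = field(default_factory=list)  # (user, amount) history

    @property
    def balance(self):
        """Remaining units on the allocation as a whole."""
        return self.awarded - self.used

    def charge(self, user, amount):
        """Debit the allocation; the user is recorded but holds no balance."""
        if amount < 0:
            raise ValueError("charge must be non-negative")
        self.used += amount
        self.transactions.append((user, amount))
        return self.balance
```

With a single balance per allocation, each job charge is one update to one record, rather than a per-person tally that has to be re-aggregated.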
Overall, everyone was very pleased with the new system. The SP Forum gave the presenters the closest thing to a standing ovation that could be done via Zoom.
SP Coordination (Tabitha Samuel) - Working on SP checklist updates as well as a software/services baseline. There is also a Slack channel now for SP sys admins.
Meeting: September 17@1PM Pacific/4 pm Eastern
Attendees: Jon Anderson, Tim Boerner, Jeremy Fischer, Ron Hawkins, John Huffman, Kyle Hutson, Ruth Marinshaw, Addis O’Connor, Sudhakar Pamidighantam, Tabitha Samuel, Anita Schwartz, Carol Song, Sergiu Sanielevici, Derek Simmel, Preston Smith, Dan Stanzione, Barr von Oehsen
Agenda
Cybersecurity Updates (Derek Simmel)
No XSEDE systems were broken into but a couple of user accounts were compromised
Cooperation with/info sharing with European HPC community
Use Zeek security monitoring (the Zeek Network Security Monitor); XSEDE monitors and coordinates communication between allocated SPs
L3s are, though, welcome to participate in XSEDE’s security community and meetings; contact Derek (dsimmel@psc.edu) to be added to the meetings and email list
Question about gateway users and potential vulnerabilities/concerns; vigilance in all aspects of information security was recommended
Discussion of persistent intruder threats
Derek gave a brief overview of REN-ISAC (https://www.ren-isac.net/) for those who weren’t familiar and strongly encouraged sites to be sure their campus participates in some way
XSEDE Update (Tim Boerner)
Relatively quiet right now
Supplemental year extension for XSEDE was approved
The XAB (XSEDE Advisory Board) discussed a new phase of the COVID-19 HPC Consortium to rebalance the portfolio. Discussions are ongoing with the NSF.
The XSEDE quarterly staff meetings were held September 1 and 2.
Next meeting: Thursday, October 15, 2020.
Meeting: August 20@1PM Pacific/4 pm Eastern
Agenda:
Open OnDemand (Alan Chalker and Bob Settlage)
XSEDE program news and updates (John Towns/Tim Boerner)
Roundtable (all)
Meeting: July 23 @1PM Pacific/4 pm Eastern
Agenda:
Research Computing/Cyberinfrastructure Services at Rutgers (Barr von Oehsen)
XSEDE program news and updates (John Towns/Tim Boerner)
Roundtable (all)
The next scheduled meeting will be Thursday, August 20.
Meeting: July 9@1PM Pacific/4 pm Eastern
Agenda:
Overview of Delta, recently awarded to NCSA (Tim Boerner)
Discussion of Panel report from annual NSF review of XSEDE (Tim Boerner)
SP Coordination and operations updates (Victor Hazelwood)
Roundtable (all)
The next scheduled meeting will be Thursday, July 23.
Meeting: June 11@1PM Pacific/4 pm Eastern
Agenda:
Overviews of 5 recently awarded new and exciting CI systems
NeoCortex - AI Supercomputer, PSC (Paola Buitrago)
Jetstream 2 - Indiana University Pervasive Technology Institute (Dave Hancock)
Anvil - Purdue University ITaP Research Computing (Carol Song)
Voyager - SDSC (Amit Majumdar)
Annual NSF review of XSEDE (Tim Boerner)
Meeting: May 14 @1PM Pacific/4 pm Eastern
Agenda:
Welcome
Tim Boerner - SP Forum discussion of proposed XSEDE PY10 Program plans. See https://docs.google.com/document/d/1-frL9CH6-jS_GYSzOtZdzZ7ts_kHunIC_oERVFdWo2s/edit - look for the yellow highlighted sections and please comment in advance.
Victor Hazelwood - XSEDE Operations and SP Coordination Updates
Open topics - suggestion that all SPs be vigilant in terms of monitoring their systems in light of misuse of many European HPC sites.
Meeting: April 30 @1PM Pacific/4 pm Eastern
Attendees: Amit Amritkar, Jon Anderson, Tim Boerner, Jaime Combariza, Tom Doak, Jeremy Fischer, Dave Hart, John Huffman, Andy Keen, Chet Langin, Julia Levites, Ruth Marinshaw, Kenton McHenry, Susan Mehringer, Tim Middelkoop, Andi Moore, Sudhakar Pamidighantam, Sergiu Sanielevici, Anita Schwartz, Carol Song, Mary Thomas, Barr von Oehsen, Cindy Wong
Agenda:
Welcome
Mary Thomas - Pivoting from in-person to online events: SDSC's experience with hosting an NVIDIA-sponsored remote GPU Hackathon. With the participation of two partners from NVIDIA, Mary led a discussion of how SDSC approached turning what was intended to be an in-person GPU hackathon into a remote-only event. The program included 2 mini-days, with 7 teams formed from 49 people participating from 12 different institutions. At least 20 volunteer mentors were also involved. Other SP Forum attendees noted that hosting technical sessions like this requires a high ratio of mentors to participants, similar to what SDSC has experienced. Tom Doak shared experiences using Canvas as a platform to host a remote R for Biologists seminar. Lots of discussion and lessons learned.
Chet Langin - Pros and cons of moving from an SP L3 to an L2 for a shared multi-campus HPC resource
Victor Hazelwood (if available) - XSEDE Operations and SP Coordination Updates
Meeting: April 16 @1PM Pacific/4 pm Eastern
Attendees: Amit Amritkar, Jon Anderson, Tim Boerner, Tom Doak, Jeremy Fischer, Dave Hancock, Victor Hazelwood, John Huffman, Matt Jones, Andrew Keen, Chet Langin, Ruth Marinshaw, Kenton McHenry, Tim Middelkoop, Sudhakar Pamidighantam, Jeff Pummill, Sergiu Sanielevici, Anita Schwartz, Carol Song, Dan Stanzione, Mary Thomas
Welcome
Matt Jones (University at Buffalo's Center for Computational Research) - Update on Ookami, the NSF-funded ARM-based system
Ookami (ARM + SVE); NSF testbed program; PI Robert Harrison from Stony Brook; A64FX processor
Anticipate installation in Q3 2020
48 cores per node, 176 nodes, 0.5 PFlops; Post-K system;
not using Tofu interconnect, so need to use IB interconnect. Tofu interconnect not being offered outside Japan until 2021.
based on profiling done earlier, believe roughly 86% of XSEDE jobs would 'fit' in terms of footprint
high CPU to memory BW, so looking at applications needing higher memory BW
a lot of chemistry applications
integrate with XSEDE Fall 2020 (maybe)
Testbed (~2year)
Green500 / 150th in Top500 (SC19); 160/170 Watts/node
Fujitsu investing in Spack; on track for shipping;
paper referenced: "A Workload Analysis of NSF's Innovative HPC Resources Using XDMoD", N.A. Simakov et al., arXiv
Q3 2020 initial deployment;
XSEDE integration may be sooner than originally scheduled; startup allocations in 2021 (maybe?)
In terms of support for users, talking with the team at MolSSI (https://molssi.org/)
Tim Boerner - XSEDE program updates, XSEDE response to COVID-19, etc.
Tim is XSEDE deputy proj. director as of early/mid-March
project year 10 planning; annual review will be remote in June, versus the typical F2F review at NSF
working on program plan for next year; needs SPForum input;
NSF invited XSEDE to work on a proposal to extend XSEDE 2.0 into a project year #6
Covid-19 updates: HPC Consortium — https://covid19-hpc-consortium.org/
XSEDE/XRAS coordinating this effort. quick response/quick allocation; John Towns matching
65-70% success rate for applications; everything picked has been matched: 3-4 on big cloud, DOE 3-4, NSF has 6-10; SDSC has 4; Pittsburgh has 3, 2 @ LLNL; 10-12 providers
using portal to manage Covid processes and applications
Dan Stanzione noted that TACC is hosting a number of other COVID-19 related projects outside those allocated through the HPC Consortium
operations are ’normal’ across all XSEDE systems.
Victor Hazelwood - SP continuity planning in times of COVID-19.
XSEDE pandemic planning questionnaire to allocated SPs —> all sites are fully operational; staff working from home; only OSG had some impact
From COVID point of view XSEDE is helping the national infrastructure
Campus bridging: NCGAS (https://ncgas.org/about.php ) computational tools
Has the COVID work affected any other allocated projects?
34 applications overall working on COVID-19
Epidemiologists required the most handholding - new to HPC
20% of TACC cycles went to COVID-19; 300K hours on TACC
Those COVID-related jobs have top priority
Frontera can be used for PHI-containing projects; talk to Dan.
Tom Doak/David Hancock had 400 people take an online Biology course – lots of interest in genomics and genome analysis as people pivot to do COVID-19 research.
Open topics
Meeting: March 19 @1PM Pacific/4 pm Eastern
Welcome
Dana Brunson - Internet2 community engagement and program overview
Nick Nystrom - Bridges-2
XSEDE program updates, XSEDE response to COVID-19, XSEDE Staff Quarterly meeting
Meeting: February 20 @1PM Pacific
Welcome
Gerard Lemson, JHU Institute for Data Intensive Engineering and Science - Gerard will talk with us about SciServer, a collaborative environment for server-side analysis with extremely large datasets
John Towns if available - XSEDE program updates, XSEDE response to OAC Blueprint, news, words of wisdom, etc.
Open topics
Meeting: February 6 @1PM Pacific/4 pm Eastern
Attendees: Dan Andresen, Jon Anderson, Jaime Combariza, Tom Doak, Jeremy Fischer, Dave Hancock, Victor Hazelwood, John Huffman, Ryan Johnson, Andrew Keen, Rich Knepper, Chet Langin, Ruth Marinshaw, Rick McMullen, Susan Mehringer, Tim Middelkoop, Frank Mohn, JP Navarro, Sudhakar Pamidighantam, Sergiu Sanielevici, Anita Schwartz, Shava Smallen, Carol Song, Dan Stanzione, Mary Thomas
Agenda:
Welcome - Sudhakar will be representing the Science Gateways Community Institute this semester.
Rick Mohr: XSEDE DTS team consultation services for optimizing and troubleshooting data workflows (slides available at https://confluence.xsede.org/download/attachments/5735148/DTS-SP-Forum-Presentation%282%29.pptx?version=1&modificationDate=1582221553979&api=v2)
Data Transfer Services: perfSONAR, GridFTP logging, Internet2 metrics
new tech: Zettar
consultation services: new program by DTS
work with DTS, like ECSS, 3-4 months
network performance, configuration, etc.
beta project: U Col data transfer problems;
found DTN bottlenecks and fixed them
send requests to
Subject: DTS Consultation
Questions:
Can users and SPs submit a ticket to XSEDE consult?
Yes, that should work, RTS ticket also
Can (Jon Anderson) talk about experience working with DTS
work done by others; DTS did alleviate the bottlenecks, now looking at new node
Q about best practices outcomes? based on summary/outcomes of engagements
Some of the best practice materials on faster data transfer are available at fasterdata.es.net, provided by DOE's Energy Sciences Network (ESnet)
Will DTS help anyone in the XSEDE ecosystem?
yes. services mostly limited by staff time - large time commitment
Have you published any of this work?
not yet, just focussed on the pilot project
suggestion to do a PEARC paper or a BOF
Shava Smallen, Rich Knepper, JP Navarro: XCI (XSEDE Cyberinfrastructure Integration) activities. Slides are available at https://docs.google.com/presentation/d/1ykwim3U1BwOzlNyWrRS3elubxWb0jxWB7mbhQjcll5U
UREP (User Requirements Evaluation and Prioritization): Requirement, analysis, and capabilities group
1/3 SP Forum; 1/3 Campus Champions; rest from other areas of XSEDE
11/19 UREP Results
10 use cases. Top ones:
documentation for containers by SP sites, provide a 1-stop page
Vulnerability consulting and resolution assistance
Enhancing SW and data sharing using CVMFS (enables sharing of read only data), from OSG.
remote login;
security (CI Silver CAs; CoManage for groups);
CI discovery and information access: cloud images and container discovery; training discovery; scheduler job info, etc.
data capabilities: data analytics, metadata manage, globus tools
XSEDE XACS (XSEDE Acronym Cheat Sheet): https://confluence.xsede.org/pages/viewpage.action?pageId=1671279
Recent needs gathering efforts:
face-to-face sessions with XSEDE community: 7/15/19 SP Forum @ Stanford; PEARC 19 BOF; 12/4/19 quarterly mtg
https://software.xsede.org/node/add/need — users can submit and issues will be tracked
ECSS and XCI
https://software.xsede.org/expanding-xsede-software-through-collaboration
phase 1: contact 4 domestic LOS projects (Whole Tale, Pegasus, NCGAS, MolSSI) - projects funded to produce SW that should be of interest to XSEDE community
phase 2: reach out to CSSI awards, campus bridging efforts, and L3 SPs
CRI (CI Resource Integration):
best practices; SP input
SP Coordination: facilitate new members; 6 new SPs this year; 4 have never been part of XSEDE; new resources Ookami, Open Storage Network
Campus outreach: visit, provide containerized applications/repository on Docker Hub; HPC and OpenStack deployment help for campus clusters/clouds
Questions:
what is the process to get needs: they are forwarded to UREP;
Rich: How would a campus or L3 SP request campus bridging help? Contact rich.knepper@cornell.edu and use "CRI help requested"
Dan Stanzione: Frontera Update
TACC/Frontera allocations:
Leadership Resource Allocation (LRAC) (used to be PRAC, but deadline passed) (1/17/20)
Pathways: small allocations, independent of XSEDE request (3/8)
Large-Scale Community Partnerships (LSCP):
up to 3 years; software institutes, large community code providers or gateways; merit review
Discretionary (rolling)
Fellowships (2/7/20)
questions:
How can we know everyone who got allocations? Last year's 37 PRAC awards were NSF awards, so they are listed by NSF; newer awards will be published on the TACC website (an estimated 25-30 large awards).
Training: online training dev with Cornell; also doing some webinars, etc.
Rick McMullen: ROI data requests from L2 and L3 Service Providers
Reminder to read and ponder XSEDE's response to the NSF CI Blueprint (attached to this week's meeting reminder)
Meeting: January 23, 2020 @1PM Pacific/4 pm Eastern
Attendees: Jon Anderson, Jaime Combariza, Jeremy Fischer, Nathan Gregg, Dave Hart, Victor Hazlewood, Ryan Johnson, Andrew Keen, Chet Langin, Ruth Marinshaw, Sergiu Sanielevici, Anita Schwartz, Carol Song, Dan Stanzione, Mary Thomas, and a few more
Welcome - welcome to Ryan Johnson, the new SP Forum representative for the University of South Dakota.
SSO Hub Grace period: Dave Hart, in his role as XSEDE's RAS Director, sought input from SP Forum members regarding a proposed 90 day grace period for SSO Hub login access post-expiry of an allocation. This action was prioritized by the XSEDE UREP (User committee). Most SPs offer a grace period at the end of an allocation period to allow users to archive and clean up their work. The SSO Hub currently does not have such a grace period; rather, when an allocation ends, so does SSO Hub login access. His proposal is for RAS to modify the query currently used to enable SSO Hub logins to allow for a 90 day grace period. Forum members agreed that this is a reasonable policy change.
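The proposed change to the login check can be sketched as follows. This is a minimal illustration only: the minutes do not show the actual RAS query, so the function name, its parameters, and the date-based comparison are assumptions; only the 90-day figure comes from the proposal.

```python
from datetime import date, timedelta

# 90 days is the grace period proposed in the discussion above.
GRACE_PERIOD = timedelta(days=90)


def sso_login_allowed(allocation_end: date, today: date,
                      grace: timedelta = GRACE_PERIOD) -> bool:
    """Hypothetical check: allow SSO Hub login while the allocation is
    active or has expired no more than `grace` ago."""
    return today <= allocation_end + grace
```

With a zero grace period the same function reproduces the current behavior, in which login access ends on the day the allocation does.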
Fields of Science: Dave Hart led a discussion of the need to update XSEDE's current Field of Science (FOS) list. The current list is limited to domains funded by NSF, even though XSEDE allows research supported by any agency. The current FOS list has 163 possible options. He noted that some represent NSF programs that no longer exist, and many of the 163 entries aren't fields of science at all. There is a 'standard' FOS hierarchy that is maintained by the OECD; it is used by European projects and perhaps others. XSEDE's proposed updated FOS list would expand only on Natural Sciences; the complete list would then have ~90 options (versus today's 163). Forum members discussed the hierarchy, implications of having more granularity in some areas and less in others (for example, there are currently 5 flavors of Astronomical Sciences; with the new taxonomy, there would be one.) SP members noted that being able to produce metrics based on FOS is essential to them. There was some discussion of the timing of the change; if effort is substantial, is it worthwhile given the relatively short remaining time for the XSEDE project's funding. In general, members were very supportive of the proposed change.
Post-SU Approach to Sizing Allocations: Dave Hart began a discussion of the need to size allocations differently as a number of resources that are allocated (or will be) are not traditional compute resources. And the Normalized Unit (NU) doesn't work for GPU-based systems, storage, or memory-based units. Given time constraints, he will come back to the SP Forum to continue what was shaping up to be a very interesting discussion.
Elections for Calendar Year 2020 SP Forum Leadership: SP Forum Leaders were elected for calendar year 2020. They are: L3 representative - Chet Langin; L2 representative - Carol Song; XAB representatives - Jon Anderson and Dave Hancock; Vice Chair - Mary Thomas; Chair - Ruth Marinshaw.
Discussion of proposed changes to SP Forum and XSEDE Federation membership document - The SP Forum's discussion centered around proposed modifications to the L1 and L2 categories of membership. The proposed change with the highest impact is to treat a resource that is allocated through XSEDE in any way - partially or fully - as an L1 resource. The group voted to accept the proposed changes to the Requesting Membership in the XSEDE Federation document, with a suggestion that consideration later be given to streamlining further, perhaps combining L2 and L3. Ruth will share this with the XSEDE SMT; the SMT had discussed the proposed changes at some length in the fall of 2018.
Roundtable: updates from any/all SPs: Reminder to complete SP Survey by 2/13/2020 (email each SP member would have received from Indiana University); contact Julie Wernert (jwernert@iu.edu) if you have questions or cannot find yours.
Next meeting: February 6, 2020, same meeting coordinates as always.
Meeting: December 12, 2019 @1PM Pacific/4 pm Eastern
Attendees: Linda Akli, Jaime Combariza, Lizanne DeStefano, Tom Doak, Jeremy Fischer, Victor Hazlewood, Andrew Keen, Chet Langin, Ruth Marinshaw, Tim Middlekoop, Marlon Pierce, Sergiu Sanielevici, Mike Showerman, Preston Smith, Dan Stanzione, John Towns
Welcome
Code of Conduct: Lizanne DeStefano and others
XSEDE program, operations and SP coordination updates: John Towns and Victor Hazlewood
Nominations for SP Forum leadership positions for 2020
Roundtable: updates from any/all SPs
Meeting: November 14, 2019
Attendees:
Linda Akli, Jay Alameda, Jon Anderson, Jaime Combariza, Jeremy Fischer, Dave Hart, Victor Hazlewood, Tom Maiden, Ruth Marinshaw, Kenton McHenry, Susan Mehringer, Tim Middlekoop, Henry Neeman, Marlon Pierce, Sergiu Sanielevici, Mike Showerman, Dan Stanzione, John Towns, Lan Zhao.
Agenda:
Welcome - Ruth Marinshaw
Open Storage Network - Kenton McHenry. Kenton described the Open Storage Network (OSN) as a pilot NSF effort to establish a missing piece of the distributed national CI ecosystem – distributed storage. This pilot was funded to help address unmet storage needs around active datasets, in close proximity to computational resources, and with an eye towards better leveraging previous NSF CI investments (such as Track 1 and 2 systems, Gateways, Big Data Hubs, etc.). He anticipates that many of the active datasets to be hosted in OSN will be those from small projects. He and the other OSN PIs are working with XSEDE staff on integrations needed for OSN storage to be allocated through the XRAC process. Hardware has been delivered to most sites. SP Forum members and other attendees asked about ECSS training, sustainability model for the PODS (building blocks), user community training, and more. The anticipated timeline for 'go live' is Summer 2020.
XSEDE program, operations and SP coordination updates
SC19 - recap of 'known' XSEDE-related presentations and events at SC19, including Discover More with XSEDE, Champions meeting, and Breakfast with ECSS and Champions. In addition, as part of the XSEDE ROI effort, there will be an L3 feedback session at the IU booth (SPs received an email about that earlier on November 14.)
Roundtable: updates from any/all SPs - We will not meet on November 28 as it coincides with Thanksgiving. Ruth, Jon, Tom D, Mary and Dave H will, as they can find time, start working on draft modifications to the SP Forum's charter and other operating documents.
Next meeting: December 12, 2019
Meeting: October 17, 2019
Agenda:
Welcome and introduction of new SP Forum member
Preston Smith, Executive Director of IT Research Services and Support, will provide the SP Forum with an overview and update on Purdue's cyberinfrastructure systems, services and activities. Purdue is an L2 member of the SP Forum.
Victor H. will have an XSEDE Operations and SP coordination update.
Follow-up from October 3 meeting: SP Forum interest in communicating to NSF Program Officer about XSEDE follow-on
Review and initial discussion of the SP Forum Charter, especially the membership levels. Please take time, in advance of the meeting, to read through the current charter; you can find it at https://www.ideals.illinois.edu/handle/2142/49980 . Please also review https://confluence.xsede.org/download/attachments/1671600/Requesting%20Membership%20in%20the%20XSEDE%20Federation-v3.00.docx?version=1&modificationDate=1544499345000&api=v2 .
Meeting: October 3, 2019
Attendees:
Jon Anderson, Jaime Combariza, Jeremy Fischer, Dave Hancock, Victor Hazlewood, Andrew Keen, Chet Langin, Tom Maiden, Ruth Marinshaw, Susan Mehringer, Marlon Pierce, Sergiu Sanielevici, Anita Schwartz, Mike Showerman, Carol Song, Dan Stanzione, John Towns.
Agenda:
Susan Mehringer, Cornell University and XSEDE Training lead - Susan provided an overview of the breadth of XSEDE's training programs, see https://portal.xsede.org/training/overview as a starting point. Training offerings are provided through multiple delivery methods and locations. The course catalog details in-person and multicast offerings. In-person trainings are offered mostly monthly as multicast webinars coordinated through PSC with up to 25 participating sites for each training for up to 500 total participants. There are also training videos available on YouTube through the XSEDE channel. She noted that XSEDE offers many other training opportunities through ECSS, broadening participation activities, and many SPs also provide training. Advancements with Big Data is relatively new and there's lots of interest in Python. Participants discussed some of the advantages of in-person training versus standard webcasting of trainings. Technical challenges include issues with slides across an entire day; data also show that from a content retention perspective, in-person training with an on-site mentor available to assist is advantageous. For those who cannot host or attend in person, we can point them to the videos and exercises. There was discussion around the management of remote site registrations, with general consensus that overbooking is a reasonable strategy. Forum members suggested it would be useful to have a training course focused on SLURM.
John Towns - XSEDE is now in the early part of year 4 of its 5 year program; then what? There is assurance from NSF that there will be some type of follow on but no details are available yet. Several people anticipate that the next solicitation may actually be multiple solicitations; pros and cons of that were discussed. The Forum will discuss whether to author a letter of concern to NSF Program Officer Bob Chadduck regarding the urgency, from a community perspective, of issuing the solicitations now. John also would like to come back to the SP Forum soon for a more detailed discussion of new collaborations between XSEDE, PRACE and Japan; soon there will be a release of a call for researchers who have collaborations with those sites to request start-up allocations on SPs.
Next meeting: October 17, 2019
Meeting: September 5, 2019
Attendees: Amit Amritkar, Jon Anderson, Jaime Combariza, Tom Doak, Jeremy Fischer, Dave Hancock, Victor Hazlewood, Doug Jennewein, Andrew Keen, Chet Langin, Ruth Marinshaw, Rick McMullen, Tim Middlekoop, Marlon Pierce, Sergiu Sanielevici, Anita Schwartz, Mike Showerman, Shawn Strande, Mary Thomas
Agenda:
Welcome, introduction of new SP Forum L3 member University of Houston Data Sciences Institute - Represented on the SP Forum by Amit Amritkar.
Service Provider Update: SDSC - Shawn Strande provided an overview of the recently awarded SDSC compute resource, EXPANSE, that will be allocated through XSEDE. EXPANSE is an evolution of COMET with a number of enhancements and new features. Among those are cloud integration (SLURM plus in-house enhancements plus Terraform). SDSC also has a new CC* award, Triton Stratus.
XSEDE Operations and SP coordination update - Victor noted that with the recent NSF compute awards, there will be new types of resources as Service Providers, specifically the ARM-based system. He and others from XSEDE are in discussions with a group that may want to be an SP to allocate and share datasets, rather than allocate compute cycles. This would represent yet another new resource type. Jaime noted that JHU also has a resource that 'serves' datasets. NICS/UTK was the recipient of an MRI award for secure packet capturing; perhaps a "data serving" SP could be used by that project, for example, to publish and share data.
Scheduling Service Provider updates for remainder of the year - we will continue a series of presentations from various SPs, with Purdue scheduled for 10/17 and NICS perhaps 10/3 or later. Ruth will reach out to Nick at PSC to request an update on Bridges2.
Open Topics - Jon inquired about hosting remote workshops for XSEDE and who the POC was; he was pointed to Tom Maiden. There were also suggestions to have ECSS talk about reaching out to researchers and how to break down barriers. The Forum also had updates from several members regarding this morning's NSF webinar on NSF 19-534.
Next meeting: October 3, 2019
Meeting: August 8, 2019
minutes to be added
Victor Hazlewood - continuation of discussion of standardizing SP ticket handling procedures for allocated resources
Jon Anderson - How are SPs handling/experiencing these things?
IGTF certs for use with XSEDE and OSG
Alternatives to GSI-SSH for federated authentication
Alternatives to Globus under consideration? Fact or fiction?
Recap of SP Forum lunch discussion at PEARC19
Meeting: July 25, 2019
Agenda below; minutes coming soon
XSEDE program updates (John Towns)
Gauging SP interest in participating in XSEDE External Relations Scavenger Hunt at SC19 (Ruth Marinshaw on behalf of Hannah Remmert and Leslie Froeschl)
PEARC19 lunch gathering (Ruth Marinshaw)
Next subsequent meeting: Thursday, August 8, 2019
Meeting: July 11, 2019 @ 1PM Pacific/4 pm Eastern
Attendees: Jon Anderson, Jaime Combariza, Chet Langin, Tom Maiden, Ruth Marinshaw, Rick McMullen, Sergiu Sanielevici, Anita Schwartz, Mike Showerman, Dan Stanzione, Mary Thomas
Agenda:
Victor Hazlewood: Continued Discussion of SP ticket handling standards. The group had a vigorous discussion of possible approaches to standardizing how tickets are triaged and closed across the various allocated SPs. Tickets come in to the XSEDE XOC; the XOC answers any tickets that they can. If they cannot answer, then the XOC "tags" each ticket for the appropriate SP and moves the ticket to the right bucket (sort of like a ticket queue) in the RT ticket system. However, not all SPs run their own RT system. It was noted that in some cases, the "tagged" tickets get duplicated in the assigned SP's own local ticket system. Tickets may get resolved there, but sometimes folks forget to go back and close the original ticket in the XSEDE XOC's RT system. A number of questions were asked, including the feasibility of different proposed approaches to automatically closing the XOC's RT ticket when a local ticket is resolved. Victor will investigate and come back to the group after PEARC to share what he has learned.
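One way to picture the reconciliation step behind the auto-close idea is the sketch below. It is illustrative only: it does not call the RT API or any SP's ticket system, and the ticket IDs, the "resolved" status string, and the mapping from XOC ticket ID to local status are all hypothetical inputs.

```python
from typing import Dict, Iterable, List


def xoc_tickets_to_close(open_xoc_ids: Iterable[str],
                         local_status_by_xoc_id: Dict[str, str]) -> List[str]:
    """Return the XOC ticket IDs that are still open even though the
    duplicate in the SP's local system is marked resolved.

    `local_status_by_xoc_id` maps an XOC ticket ID to the status of its
    local duplicate; tickets with no local duplicate are left alone.
    """
    return sorted(
        xoc_id
        for xoc_id in open_xoc_ids
        if local_status_by_xoc_id.get(xoc_id) == "resolved"
    )
```

The output is a list of candidates for closure; whether to close them automatically or to prompt a person is the policy question the group left open.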
Sergiu Sanielevici: XSEDE Novel and Innovative Projects (NIP) effort: Sergiu provided an overview of XSEDE's NIP projects, noting that NIP provides advice and lightweight consulting. He shared a set of informative slides that will be uploaded to the SP Forum's wiki. He specifically asked SP Forum members to reach out to him if they encounter projects that could benefit from NIP.
Dan Stanzione: Update on TACC GP/GPU Cluster: In the remaining five minutes of the meeting, Dan provided a lightning fast overview of the GPU makeup of Frontera. He noted that the original proposal for Frontera had specified use of desktop GPUs. However, between the time of proposal submission and award, NVIDIA widely promulgated a EULA in which the terms and conditions precluded wide usage of desktop GPUs in data centers. Recently, NVIDIA developed a program to offer Quadro RTX GPUs at competitive prices. Frontera will have RTX5000s (my notes say 96+360 as of 9/1/19 but I will confirm with Dan). In addition, TACC's Longhorn system (IBM P9s, similar to Sierra) will have 448 V100 GPUs. Part of Longhorn is intended to be available to NSF projects.
Reminder of XCI request for in-depth discussions of user needs with each SP and also the opportunity to talk with the XCI team at PEARC19.
Meeting Frequency discussion was deferred to the next meeting.
Next subsequent meeting: Thursday, July 25
Meeting: June 27, 2019 @ 1PM Pacific/4 pm Eastern
Attendees:
Jaime Combariza, Tom Doak, Dave Hancock, Andrew Keen, Chet Langin, Tom Maiden, Ruth Marinshaw, Henry Neeman, Marlon Pierce, Hannah Remmert, Sergiu Sanielevici, Anita Schwartz, Mike Showerman, Carol Song, Dan Stanzione, Craig Stewart, Mary Thomas
Craig Stewart: SP ROI survey reminder: Please complete the survey. If you have more than 1 person who represents your SP, the survey was probably sent to ONE of you. Please get together across your site, provide answers and submit. As a follow-up, Craig will also resend his ROI paper on utility in cloud computing to the SP Forum list.
Hannah Remmert - XSEDE External Relations: Discussion of XSEDE communications and resources. Hannah provided an excellent overview of the following to SP Forum members:
Informational materials about XSEDE that SPs can request for their centers
Standard XSEDE logos and information about the project which can be used on their websites
Resources which can be used to communicate about the project, such as our new “What is XSEDE?” video
Contributing to our monthly newsletters by emailing er@xsede.org.
Of specific interest to the forum were her tips that logos matter (if you are going to cite or reference XSEDE, use the right logo!) and that personal relationships make a huge difference in promoting activities. She is available as a resource to the SPs regarding all XSEDE communications and resources.
All: Initial discussion with SPs regarding ticket standards for SPs along these lines (examples only):
a. How old is too old for an unresolved ticket in any state, such that it should be changed to a non-open state?
b. What is a reasonable amount of time for a ticket to remain in user wait before it is changed to a state of abandoned?
c. What is a reasonable amount of time for a ticket to remain in internal wait before it is changed to a state of abandoned?
d. If a ticket is waiting for a 3rd party interaction (a separate submission about a problem to a vendor, for example), should that be an internal wait state or some other state, and how long should it wait?
Various Service Providers contributed a number of perspectives on this topic:
Tom Maiden at PSC: They don't specifically use XSEDE for their ticketing system. Their internal goal is to respond (not necessarily resolve) within one business hour. How long is too long for a ticket to be open? Agrees we should standardize across SPs. For example, set an expectation with quarterly boundaries. If something is not resolved/solved in a given quarter, close it and then re-open as solutions are presented or discovered.
Dave Hancock and Jetstream: started out using their own RT instance, then moved to XSEDE's ticketing system.
SDSC: currently using XSEDE's system.
Carol Song and Purdue: now using Purdue's central IT solution.
NCSA: leverage a heavier-handed management approach over tickets: weekly review of open tickets. Asked if the current XSEDE ticketing system differentiates by ticket type?
PSC: What do we consider a "stale" ticket? Perhaps "if 7 days elapse with no communication from the submitter"? If an answer is proposed, should we then close the ticket?
PSC (Tom) again: should we use a basic "open-close" system or one with more states? Perhaps there are ticket assignment status categories that are underutilized.
Discussion around the value of a "sleep" state for tickets but general thinking that that is not an option.
Johns Hopkins noted that it is difficult to allocate SP human resources to ticket resolution. Do you charge at your site for problem solving beyond a certain level? Stanford and Oklahoma said that at their sites, this isn't practical. IU has a "for fee" option if a ticket request will take >2 weeks to resolve and the reported issue is not a system-specific problem. Purdue uses a CRM to track engagements like this for more extensive support/help requests.
Discussion to continue at the next SP Forum call.
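The kind of per-state time limit being discussed could be expressed as in the sketch below. It is a sketch under stated assumptions, not an agreed standard: only the 7-day figure for user wait comes from the discussion (PSC's suggestion); the other thresholds, the state names, and the Ticket fields are placeholders chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Per-state limits. Only "user_wait" reflects a number raised in the
# discussion (7 days without word from the submitter); the others are
# hypothetical placeholders.
STALE_AFTER = {
    "user_wait": timedelta(days=7),
    "internal_wait": timedelta(days=30),     # hypothetical
    "third_party_wait": timedelta(days=90),  # hypothetical
}


@dataclass
class Ticket:
    ticket_id: str
    state: str               # e.g. "open", "user_wait", "internal_wait"
    last_activity: datetime  # time of the most recent communication


def is_stale(ticket: Ticket, now: datetime) -> bool:
    """Return True when a ticket has sat in a waiting state for longer
    than that state's limit. States without a limit are never stale."""
    limit = STALE_AFTER.get(ticket.state)
    if limit is None:
        return False
    return now - ticket.last_activity > limit
```

Keeping the limits in one table means questions a through d above reduce to agreeing on the entries, and each SP could apply the same table to whichever ticket system it runs.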
Observations and brief report out from NSF annual review of XSEDE: Ruth, as SP Forum chair, was invited to attend and participate in the June NSF panel review of XSEDE. She reported briefly on the comprehensive nature of the XSEDE program, the extent to which XSEDE clearly demonstrated large scale collaboration and success over many participating sites and programs, and the panel's seeming consensus around the value of the XSEDE program and its value to the national CI community. More to come from John when he is next available.
XCI request: Ruth described (and then followed up with an email to the SP Forum) a request from the XSEDE XCI team for help identifying team members whom XCI could interview to understand and collect user needs information. XCI is making this request as an explicit follow up to a recommendation from the SP Forum. XCI will also have a BOF on this topic at PEARC19 (Monday, 7/29, 5:15 pm). All SPs are asked to participate in the interview and are encouraged to participate in the BOF as available.
Next Meeting: Thursday, July 11, 2019. Topics to include XSEDE NIP (Novel and Innovative Projects) request for input from the SP Forum and continued discussion of ticket standards. SPs should identify an operations team member to attend to address the latter if the official SP Forum site rep is less familiar with their site's approach and needs.
Meeting: May 16, 2019 @ 1PM Pacific/4 pm Eastern
Attendees:
Jaime Combariza, Tom Doak, Dave Hancock, Doug Jennewein, Andrew Keen, Chet Langin, Ruth Marinshaw, Tim Middlekoop, JP Navarro, Sergiu Sanielevici (for Robin Scibek), Mike Showerman, Shava Smallen, Carol Song, Dan Stanzione, Craig Stewart, Mary Thomas, Nancy Wilkins
Craig Stewart - Craig is working with others on an expansion of earlier work (shared with SP Forum after the meeting) to perform an ROI analysis for XSEDE. He requests information from each SP and intends to send a survey out to each SP representative by the time of the XSEDE annual review in June. Past information showed a trend for L3s to report an increased value of XSEDE. Participation rates in the last survey were high - 100% Level 1 SPs and 2/3 of L2s and L3s. Craig described some of the methodology previously used to calculate value, for example, from ECSS and training videos. He asked SPs to think about the question "If XSEDE services did not exist, how many additional FTE would you need to add to provide similar services to your users?" Discussion focused on how to untangle, in the context of a survey, what aspects of resources, programs and value derive specifically from things delivered by XSEDE versus those that are provided by SPs themselves or other sources. Often these are related and overlap. A goal is to tease out the specific value of XSEDE to SP operations. Craig will make some refinements to the survey, based on the discussion, prior to sending it out. SP representatives will have approximately one month to complete it.
J.P. Navarro and Shava Smallen: The various aspects of XCI (XSEDE Cyberinfrastructure Integration) were described by JP and Shava. XCI hopes to improve tools and would like to get additional feedback from SPs regarding what would be most helpful. That information might be gathered through breakouts at quarterly meetings and/or at other venues where many SPs are represented, such as PEARC. SP Forum representatives were asked to check with their operational staff to solicit input on what is needed that XCI might deliver. An example of a value-added service provided through XCI is the SSO Hub. XCI also is interested in identifying SPs that would like to be early testers of work XCI is doing with OpenStack.
Updates to https://www.xsede.org/web/site/about/governance/spf
Next meeting: Thursday, May 30 (Ruth will not be able to attend) if there are agenda items; otherwise, we will next meet again in late June (not June 13, as that is the wrap-up day for the XSEDE annual review.)
Meeting: May 2, 2019 @ 1PM Pacific/4 pm Eastern
Attendees:
Jon Anderson, Dan Andreesen, Alan Chalker, Jaime Cobariza, Nathan Gregg, Dave Hancock, Victor Hazlewood, Christopher Irving, Andrew Keen, Chet Langin, Ruth Marinshaw, Henry Neeman, Anita Schwartz, Robin Scibek, Carol Song, Dan Stanzione, Craig Stewart, Mary Thomas, John Towns, Shemeema Oottikkal
Victor Hazlewood - XSEDE engineering updates: The process of adding non-allocated SP member information to the RDR is almost complete; he noted that having an entry in the RDR is the only requirement for L3 SP Forum members. He is going through the checklist with each allocated SP now to ensure that everything is up to date. Many (40-50) XSEDE Enterprise Services have been migrated to AWS. Also, software.xsede.org has been updated – note that this is a great resource for service providers / campuses beyond XSEDE, as it is a repository of tools that XSEDE provides for the community.
Alan Chalker and others - Open OnDemand: Open OnDemand provides a plugin-free, graphical desktop environment as an interface to HPC resources. Originally funded by a 3 year NSF SI2 award, the project team recently was awarded a 5 year CSSI to continue development. Their roadmap includes integration with OpenXDMod and additional templates. SP Forum members asked many questions and had suggestions around features that would be of use. Alan requests feedback from the SP Forum on prioritization and features; this is something that Ruth and Alan will follow up on. Craig Stewart noted that he will connect with Alan to discuss ways that the XSEDE common software repo tools might be "templated" for use with this.
John Towns - Quarterly XAB meeting recap, with concern about what next summer's allocable resources will look like (summer of 2020), as some SP awards may be ending. New resources for XSEDE may be online by October 2020. Will there be a resource gap between those? John thanked the SP Forum for its comments to NSF regarding the follow-on to XSEDE. XSEDE's annual review will be June 11-13.
Other news from SP Forum members
Next meeting: May 16, 2019: Craig Stewart: ROI analysis for XSEDE (info and requests coming to SPs); Dave Lifka to discuss XCI (XSEDE Cyberinfrastructure Integration) activities and plans (https://www.xsede.org/ecosystem/ci-integration)
Meeting: April 18, 2019 @ 1PM Pacific/4 pm Eastern
Attendees: Jaime Combariza, Dan Andreesen, Trevor Cooper, Steve Brandt, Ester Soriano, Doug Jennewein, Chet Langin, Burt Cubbison, Mike Showerman, M. Shapiro, Alina Banerjee, Jon Anderson, Dave Hancock, Dave Hart, N. Wolter, S. Peckins, Anita Schwartz, Carol Song, Dan Stanzione, Ruth Marinshaw
XSEDE Accounting system input gathering: Dave Hart, Ester Soriano, Burt Cubbison, S. Peckins and Alina attended the SP Forum meeting to discuss a potential redesign of the XSEDE accounting system and solicit input from all service providers about that. The RAS team is considering revamping the accounting system in the upcoming year; are there specific needs? What is working and what isn't? Steve Brandt (LSU) noted that he had found it difficult to use, but once it was running, it mostly just worked. Perhaps it would make sense to make the old (current) accounting system easier to use rather than rewriting. Suggestions were made that the team consider creating docker images, packaging the tools as a container so there is no ambiguity. Jonathon Anderson (Colorado) noted some possible areas for improvement to the AMIE system, in that it feels too stateful. He would like to see a REST endpoint to query rather than getting XML off/on a message queue. This would make the overhead of polling go away. Jon went on to note that it is in XSEDE's interest to make onboarding to and use of AMIE easy, and a REST interface would facilitate that. Colorado also uses AMIE for accounts only but has other tooling for allocating resources. There was an inquiry about integration with/interaction with ColdFront. Dave Hart described the importance of the system beyond accounts; for resource allocation tracking, the data must be exact, and SPs must be willing to send their data in a timely fashion. The different responsibilities of RAS versus SPs around accounting and allocation charges were discussed. A suggestion was made that SPs (perhaps mostly L3s) would be interested in sending their own system data to XSEDE if the data could then be mined with XDMod rather than the L3s having to run their own instances of XDMod. The RAS team would like to continue to receive input around this topic, so Dave Hart will coordinate with Ruth to set up a shared Google Doc as a collaboration mechanism.
Once set up, SPs are encouraged to have their "boots on the ground" accounting/allocation staff provide input.
Jetstream overview and update: Dave Hancock, the PI for Jetstream and Director of Advanced Cyberinfrastructure at IU, and Jeremy Fischer updated the SP Forum on Jetstream, which is close to ending its 3rd year of operations. An NSF-funded cloud environment, it is intended to function at the boundary of traditional HPC and cloud, surfacing on-demand interactive computing as well as persistent services like gateways. He noted that Jetstream has had significant use for teaching. Unlike traditional HPC clusters, Jetstream has no shared filesystem and no high-end interconnect. GPUs will be added in the coming year. A key use for the OpenStack-based Jetstream is as a prototype environment – you can build your environment with your own tooling or use "featured" resources that have been curated by IU. In terms of information security, IDS clusters scan the system heavily (network-based scans, not scanning the state of the image as it runs). The range of disciplines supported on Jetstream is broad, including 28 social science projects and 19 neuroscience. As of March 2019, Jetstream had 415 active XSEDE projects. Of note is that use by various engineering communities has grown as MATLAB and Simulink can be run on Jetstream without the users having to have their own licenses. Also, Johns Hopkins is partnering with them to provide access to Galaxy. Other Jetstream gateways include OpenMRS and ChemCompute. Jetstream access has been extended to November 2020. When asked about multiyear allocations, Dave noted that such allocations are a function of XSEDE policies, not local policies.
Next meeting: May 2, 2019
Meeting: April 4, 2019 1 pm Pacific/4 pm Eastern
Attendees:Jaime Combariza, Tom Doak, Dan Andreesen, Henry Neeman, Robin S, Dan Stanzione, Anita, Mike Showerman, Jon Anderson, Nancy Wilkins, Ruth Marinshaw
Tom Doak - NCGAS description and discussion: NCGAS is an NSF-funded program focused on service delivery for genomics, typically for biologists and bioinformaticists. It was funded through a DBI award in collaboration with PSC. Large NCGAS jobs are run on Bridges. PSC is the lead on the microbiome work and a partner on gene pattern hosting. Four FTE for NCGAS are organizationally located within IU's Research Technologies department; physical co-location with them provides for many opportunities for internal collaboration. Tom noted that genomic analyses are often computationally demanding and the community is growing rapidly. NCGAS provides a curated set of applications for that community and access to large-scale HPC. Any NSF-funded researcher can use NCGAS's services but writing an allocation request to NCGAS; most users are biologists. Galaxy runs are primarily done on Jetstream, larger HPC runs on Bridges. NCGAS has community allocations on both of those. A key value provided by NCGAS is that carefully curated collection of applications, with a focus on ensuring that those applications are interoperable between the HPC resources at IU and at PSC. NCGAS is also funded to be domain champions for bioinformatics. To learn more, go to ncgas.org ; at that site, there is an opportunity to submit an allocation request. In terms of authentication, users get affiliate IDs at IU.
Ruth Marinshaw - XSEDE engineering updates sent by Victor H: There are no new items deployed in the XSEDE engineering process to announce. There are two currently in the works: new Globus OpenSSH and Globus Connect version 5. Victor is currently working with the two new Level 3 SPs to get them an entry in RDR.
All - what's on your mind? Roundtable discussion/opportunity to share: no news was, well, no news.
April 18 agenda topic reminders: Dave Hart/Ester Soriano from the XSEDE staff to solicit input about planned modifications to the XSEDE accounting system. Also, Dave Hancock and Jeremy Fischer will provide a Jetstream update.
Meeting: March 7, 2019 @ 1PM Pacific/4 pm Eastern
Minutes to be added by Vice Chair; meeting agenda is below
Dave Hart, XSEDE’s Resource Allocations Service (RAS) Director - ORCID integration with RAS: description, demo and discussion of what SPs need to do to participate.
Victor Hazlewood - how the XSEDE engineering process works (which groups are involved) and how activities from that process end up where they can be deployed to SPs
John Towns (if able to attend) - XSEDE updates
Future agenda topics, housekeeping/cleanup of lists, documents, etc.
Meeting: February 7, 2019 @ 1PM PDT
Minutes to be added by Vice Chair; meeting agenda is below
Operations/SP Coordination update (Victor Hazlewood)
Current NSF solicitations (Track2 and mid-range CI) & discussion of impact, timing vis-a-vis current services (Dan Stanzione)
Reminder: Provide input NOW to NSF regarding the XSEDE follow-on (https://docs.google.com/document/d/1Vzu7wsuEzudnXlm9bN-NH1m4BRF4wBmINSsndawM17I/edit )
Roundtable - anything you want to share with or ask your fellow SP Forum members? (All)
Next meeting: February 21, 2019: Among other topics, Dana Brunson will talk about her new role as Internet2's Executive Director for Research Engagement
Meeting: January 24, 2019
Attendees: Mary Thomas, Ruth Marinshaw, Jon Anderson, Thomas Doak, Shawn Strande, John Towns, Dan Stanzione, Jaime Combariza, ....
John Towns, XSEDE Program news: little to update us on, shared information regarding the (minimal) impact of the government shutdown on XSEDE operations thus far. There is a Forum email thread on this topic. If the shutdown extends for months, the situation may change.
Shawn Strande : As immediate past SP Forum chair, Shawn had offered to collect and then summarize SP Forum input to the NSF as the agency considers what the XSEDE follow-on might look like. Ruth will share the Google Doc input link with the Forum again after today's call. He'll collect input until mid-February. Among the possible questions Forum members might consider providing input on:
what are the key XSEDE program elements offered today that should continue?
what, if anything, should be changed/improved in the allocation process?
what value does XSEDE bring you today?
All meeting participants committed to providing input to Shawn either via the shared Google doc or in direct emails to him.
"Open Mic":
Brainstorming around future agenda topics of interest
Review of the purpose of the XAB and SP Forum representatives' roles on it
New members: New SP Forum members Delaware and Johns Hopkins shared background on their programs and their perspectives on the value of the forum to them and their institutions.
Next meeting: Thursday, February 7, 2019, 1 pm Pacific
Meeting: January 10, 2019
Attendees: Andrew Keen (Michigan State); Ruth Marinshaw (Stanford);
XSEDE update (John Towns)
Notice of service availability during government shutdown
SP Forum leadership nominations
Chair: Ruth Marinshaw (Stanford)
Vice Chair: Mary Thomas (SDSC)
L2: Thomas Doak (Indiana U/NCGAS). In tandem IU is submitting a request to move NCGAS from L3 to L2 SP
L3: ** None tendered **
XAB Representatives: David Hancock (Indiana U), Jonathon Anderson (CU Boulder)
Dana Brunson
new position at i2
research engagement role
Providing input to NSF on the XSEDE follow-on
https://docs.google.com/document/d/1Vzu7wsuEzudnXlm9bN-NH1m4BRF4wBmINSsndawM17I
Or send info to Shawn Strande
Shawn will try to synthesize group comments and present to the forum if there are competing viewpoints
Next meeting: January 24, 2019
Meeting: December 13, 2018
XSEDE update (John Towns)
Operations/SP Coordination update (Victor)
updates / changes pending for rdr
SP Forum leadership nominations
Hoping to have nominations in place before 2019
Providing input to NSF on the XSEDE follow-on.
Shawn will create a google doc to accept comments from forum
Discussion of proposed changes to Requesting Membership in the XSEDE Federation document.
Changes include redefining L2 SPs, as well as other minor updates.
Update PEARC requirement to be participation in an XSEDE event at PEARC
Around the table
Next meeting: January 10, 2019
Meeting: November 29, 2018
Attendees: Jonathon Anderson (CU Boulder); John Towns (NCSA); Shawn Strande (SDSC); Robin Scibek (PSC); Andrew Keen (Michigan State); Bob Stock (PSC); Carol Song (Purdue); Chet Langin (SIU); Dana Brunson; Daniel Andresen; David Hancock; Doug Jennewein; Mike Showerman (NCSA); Nancy Wilkins-Diehr; Ruth Marinshaw; Steve Brandt (LSU); Timothy Middelkoop; Tom Doak; Victor Hazlewood (UTK)
XSEDE update (John Towns)
Submitted most recent interim project report, accepted by NSF
NSF calls
Cloud access
"track 2" solicitation
Many others
Plans to provide a webpage to support proposals that plan to use XSEDE resources
Available in a few weeks
Opportunity to request additional resources to be added to the list specified on the website
Plans for follow-up to XSEDE-2
Operations/SP Coordination update (Victor)
Looking at RDR
Cleaning up
SP Forum nominations for 2019 (Shawn)
XSEDE and XRAC Quarterly meetings
SC '18: impressions, useful and cool things you learned
IO500 list for network performance
PEARC event
Around the table
Problems between OpenOnDemand and latest SLURM versions, requiring OOD patch
Meeting: November 1, 2018
No XSEDE update (John Towns not available)
Operations/SP Coordination update (Victor)
Dan Stanzione from the Texas Advanced Computing Center (TACC) will give a presentation on the Frontera Leadership Class Computing Facility.
Dana Brunson from the Oklahoma State University will give an update on the Campus Champions program
Reminder: SP Forum meeting for Nov 14 is cancelled due to SC'18.
Time to start thinking about elections of new officers for 2019
Program and operations updates as people are available.
Meeting: October 18 2018
Attendees: Peter Uthuppuru (IU); David Hancock (IU); Charles McClary (IU); Steve Simms (IU); Jonathon Anderson (CU Boulder); Glenn Lockwood (NERSC); Robin Scibek (PSC); Shawn Strande (SDSC); Bob Stock (PSC); Carol Song (Purdue); Chet Langin (SIU); Dana Brunson (OSU); Daniel Andresen (KSU); Jeremy Fischer (IU); Mike Showerman (NCSA); Nancy Wilkins-Diehr (SDSC/SGCI); Ronald Hawkins (SPSC); Ruth Marinshaw (Stanford); Stefan Thiell (Stanford); Tom Doak (NCGAS); Stewart Howard (IU)
Featured speaker: Glenn Lockwood from NERSC's Advanced Technology Group will give a presentation, "Planning for the Future of Data, Storage, and I/O at NERSC"
Discussed options for XSEDE community to access NERSC systems; desire expressed to participate more through PEARC
Discussion about performance benefits of object store over POSIX FS for HPC scratch
https://www.nextplatform.com/2017/09/11/whats-bad-posix-io/
Discussion about viability / longevity of pure SSD in HPC scratch
XSEDE updates (John Towns, no new updates)
Operations/SP Coordination update (Victor, not available)
Update on revisions to Requesting Membership in the XSEDE Federation to accommodate changes to the L1 definition and associated requirements for integration.
Slurm bug and interest in helping get it resolved (Jonathon)
Reminder for folks to add their SC'18 activities to the shared doc. The XSEDE Communications team is interested in this as well.
Program and operations updates as people are available.
Meeting: October 4 2018
Attendees: Daniel Andresen (KSU); Henry Neeman (OU); Shawn Strande (SDSC); Victor Hazlewood (UTK); Mike Showerman (NCSA); Doug Jennewein (USD); Jeremy Fischer (IU); Bob Stock (PSC); Chet Langin (SIU); Andrew Keen (Michigan State); Robin Scibek (PSC); Steve Brandt (LSU); Nancy Wilkins-Diehr (SDSC/SGCI); John Towns (NCSA);
XSEDE update: Nothing noteworthy to report. (John)
Operations/SP Coordination update: Victor reminded folks that he is working with sysadmins and PIs (or their delegates) on the annual update of the SP Forum checklist (Victor)
Followup on SMT feedback on redefining an L1 as a Service Provider that is in whole or part allocable via XRAC. This topic consumed most of the meeting. There was general agreement (or at least no disagreement) about redefining L1 SPs this way. Shawn will provide a first draft edit of "Requesting Membership in the XSEDE Federation" (https://www.ideals.illinois.edu/handle/2142/49981), which is the governing document for this.
Reminder for folks to add their SC'18 activities to this shared doc: https://docs.google.com/spreadsheets/d/15VbE3BAAsA8EN5gtw9T4jw2QegNXS09HLFjdF8bJqbk/edit#gid=0 The XSEDE Communications team is interested in this as well.
Meeting: September 20 2018
Attendees: Jonathon Anderson (CU Boulder); Henry Neeman (OU); Shawn Strande (SDSC); Victor Hazlewood (UTK); Dan Stanzione (TACC); Mike Showerman (NCSA); Daniel Andresen (KSU); Doug Jennewein (USD); Emre Brookes (UT); Jeremy Fischer (IU); John Towns (NCSA); Preston Smith (Purdue); Robin Scibek (PSC); Tom Doak (NCGAS)
Overview of the recent U of Oklahoma OURRstore award (Henry Neeman)
https://www.hpcwire.com/off-the-wire/university-of-oklahoma-team-receives-nsf-instrumentation-grant-for-data-archiving/
xsedespforum2018_talk_ourrstore_neeman_20180920.pdf
Grant funds initial infrastructure
Institution funds power/cooling and extended warranties
Researchers fund incremental media costs and shipping
Incremental cost of tape is lowest of any storage medium
Discussion on Federated Logon (Jonathon Anderson)
Maybe eduroam would be a good example?
TACC always distributes passwd files from LDAP to distribute load
Recommended continued offline investigation
Discussion and move towards a vote to update the SP Forum charter to redefine SP Levels
per an earlier proposal such that any Service Provider which is in whole or part allocable via XRAC is an L1.
This would mean changes to Stanford XSTREAM, OSG, and SuperMIC.
In concert with this change, there would be some changes to the requirements for integration with XSEDE based on the experience of Stanford, et al. https://confluence.xsede.org/display/XT/XSEDE+Federation?preview=/1671600/9077678/SP%20Levels.pdf
Concerns raised that these SPs aren't prepared with operational budget / staff to meet expectations for L1 (particularly for MRI systems)
Concerns expressed about fractured user experience for allocated systems between L1 and L2 resources
We take up the promotion of NCGAS from an L3 to an L2 since, by definition, NCGAS makes resources available to XSEDE.
Shawn will review bylaws for process or move to vote
alternative may be to decom the L3 and apply as a new L2
Small-scale, rapid access node hosting options for a science gateway (Emre Brookes)
UoI might be able to support this (John Towns)
TACC may be able to support this (Dan Stanzione)
Meeting: September 6, 2018
Attendees: Jonathon Anderson; John-Paul Navarro; Ruth Marinshaw; Shawn Strande; Tom Doak; unknown call-in participant; Mike Showerman (NCSA); Andrew R Keen; Bob Stock; Chet Langin, SIU; Dana Brunson; David Hancock; Jeremy Fischer; Nancy Wilkins-Diehr; Robin Scibek; sbrandt; Timothy Middelkoop
XSEDE Update (John Towns)
Operations Update (Victor Hazlewood)
Draft recommendations for SP Forum members about using RDR and IPF (JP Navarro)
https://docs.google.com/presentation/d/1pj4cVwhUcjTPJvUgbkuLBlBrcTEt74vmlm-YPgo09A8
"SP will provide and maintain digital service/resource info in XSEDE Information Services"
RDR, resource description repository
manual entry for static information
all L1-3 SP
IPF, information publishing framework
automatic publication for dynamic
all L1-2 allocated SP
New recommendation for which specific information:
modules - future required for all
services - future required for all
batch config and state - future required for all
batch job events - future required for L1-L2, optional for L3
concerns expressed about batch system and job event requirements
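Illustrative sketch (not part of JP's presentation): the draft recommendation above can be read as a small requirement table keyed by information type and SP level. The names below (IPF_RECOMMENDATION, required_items) are hypothetical and exist only for this example; the values simply restate the bullets above, and "required" here means "future required" as in the draft.

```python
# Hypothetical encoding of the draft IPF recommendation from the notes above.
# "required" stands for "future required"; nothing here is an XSEDE API.
REQUIRED = "required"
OPTIONAL = "optional"

IPF_RECOMMENDATION = {
    "modules": {"L1": REQUIRED, "L2": REQUIRED, "L3": REQUIRED},
    "services": {"L1": REQUIRED, "L2": REQUIRED, "L3": REQUIRED},
    "batch config and state": {"L1": REQUIRED, "L2": REQUIRED, "L3": REQUIRED},
    "batch job events": {"L1": REQUIRED, "L2": REQUIRED, "L3": OPTIONAL},
}


def required_items(level):
    """Return the information types an SP at the given level would have to publish."""
    return [
        item
        for item, by_level in IPF_RECOMMENDATION.items()
        if by_level[level] == REQUIRED
    ]


if __name__ == "__main__":
    for level in ("L1", "L2", "L3"):
        print(level, required_items(level))
```

Read this way, the only difference between levels in the draft is that batch job events would be optional for L3 SPs.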
SP Forum member review and Charter discussion (Shawn Strande)
https://docs.google.com/spreadsheets/d/1Qep_1_ujjcMpa3Cb77dv5vRVkekIW2B_cCkjMbdlilc
SC '18 XSEDE PR and SP Forum member awareness (Shawn)
Resource selector data is out-of-date (needs to be updated)
SP definition alternatives presented
Meeting: August 9, 2018
Attendees: Jonathon Anderson; Ron Payne; Shawn Strande; Tom Doak; Andrew Keen; Bob Stock; Carol Song; ChrisH; Dan Stanzione /TACC; Dana Brunson; David Hancock; Doug Jennewein; Mike Showerman (NCSA); sbrandt; Suranga; Victor; Furlani; rideleon
XSEDE Update (John Towns)
Operations Update
Tom Furlani ColdFront presentation (presentation link)
SP Forum membership and charter discussion (charter link)
Topic 1
Tom Furlani will give an overview of ColdFront. Here’s a brief description of ColdFront and the types of functions it provides (the presentation is available from the agenda link above if you would like to review it in advance):
ColdFront is an open source resource allocation management tool for HPC centers that allows the management of Center resources and User allocations (or subscriptions). Functionality:
HPC Center Staff: Approve allocations based on PI provided information; Collect external funding and publication data; Generate reports on research, publications and grant funding
System Administration Staff: Manage faculty group accounts and access to resources
Principal Investigators: Ability to easily add and delete users; Ability to easily report research, publication and grant data
Topic 2
The application of BioBurst as an L3 spurred some discussion in the XSEDE Senior Management Team about the pros/cons of an L3 being a resource vs. an organization. I’ll provide some background on the call, and hopefully John will be able to join as well since he has a lot of the SP Forum history. If you have some time, I’d suggest taking a read of the SP Forum Charter (Section E in particular), which is also linked at the agenda URL above. Noteworthy from that section: It is expected that most Level 3 SPs are organizations rather than tied to specific systems, and hence less likely to become ineligible as a Level 3 SP.
ColdFront - Furlani - An open-source HPC center allocation management and reporting system from University at Buffalo (need to get slides to share)
Full release September 2018
Only supports Slurm today, but other schedulers should be possible by developing new plugins
xdmod integration
no AMIE integration / compatibility
XSEDE update
Membership issues in the SP Forum
Meeting: July 25, 2018 (in-person lunch meeting at PEARC '18)
(Meeting notes courtesy of Victor Hazlewood)
Attendees: Shawn Strande, SDSC; Victor Hazlewood, UT; Timothy Middelkoop, Missouri; Carol Song, Purdue; Thomas Hauser, U Colorado; Chet Langin, SIU; David Hancock, IU; Jeremy Fischer, IU; Nathan Gregg, West Virginia; Andy Cain, MSU; Steve Brandt, LSU; David Swanson, Nebraska; Doug Jennewein, South Dakota; Dan Stanzione, TACC; Henry Neeman, OU
Discussion of CU Boulder's use of XSEDE services to support their non-CSU, non-UColorado users and their integration into the RMACC-Summit regional compute resource.
Nathan Gregg asked about Export Control
Carol brought up an NSF-funded CICI project that involved some sensitive information processing. There was a workshop at PEARC on that topic
NSF Cybersecurity summit workshop is going to have a presentation on compliance
Discussion of a best practices set of docs
Shawn brought up CASC thread on campus compute and storage policies
Been some discussion on computing resources and storage policies
Email from Ralph Roskies which compiled responses from 22 universities
Dan, Henry, and Thomas gave their perspective on the allocation of university resources
Henry mentioned the Sustainable Computing workshops (Lifka lead author on paper) Website at http://www.cac.cornell.edu/srcc/
It was mentioned that CASC is management (high level), the SP Forum is similar, and Campus Champions are operational, with some overlap among them
Another call: the XSEDE Operations Data Transfer Services group is looking for volunteers for a pilot project to improve science data flows. The announcement is repeated below:
The XSEDE Data Transfer Services (DTS) group is exploring the possibility of offering ECSS-like services to level 2 & 3 SP sites. The idea would be to conduct a hands-on, limited-time engagement with a site to provide design and/or implementation assistance for projects critical to XSEDE users’ science workflows. Some examples might include:
Network optimization and troubleshooting
Parallel file system design and deployment
Data transfer node configuration and optimization
To assess the feasibility of offering such a service, the DTS group is looking for one or two volunteers to participate in a pilot project. During the pilot project, DTS would work with the selected site to identify the scope of the project and allocate appropriate XSEDE staff resources. Once the pilot project is completed, a short review would be conducted to assess the effectiveness of the project.
Anyone interested in participating should send an email to Rick Mohr <rmohr@utk.edu> (and CC Tim Boerner <tboerner@illinois.edu> and Tabitha Samuel <tsamuel@utk.edu>) with a subject line that includes the string “XSEDE DTS Pilot Project”. The following information should be included:
Name of your organization
Name of project contact (who will help coordinate activities between XSEDE and the organization)
Name of technical contact (who will help with technical work on the project, if different from project contact)
Short description of the type of help the organization would like to receive from XSEDE
Applications should be submitted by June 30, 2018. Sites will be selected and notified by July 15, 2018.
Meeting: July 12, 2018
Attendees: 20 (names not captured)
Call for agenda items and logistics at PEARC SP Forum F2F meeting (Shawn)
Face-to-Face meeting will be on Wednesday, July 25 from 12-1 in the Brigade room.
Discussion of compute and storage allocations at different sites (continuing CASC email discussion)
XSEDE communications campaign wrap up. See draft document here: Resources and Documents.
Existing collection of comments presented and approved for delivery
Operations update; Information Publishing Framework 1.4 and SP call to action (Victor Hazlewood, UTK)
Available to use
Going to send invitation to L3
info.xsede.org and xsede.org website
XCRI update https://www.xsede.org/ecosystem/xcri-mission (Rich Knepper, Cornell)
XCRI helps sites move to or stand up OpenHPC-based deployments
Should be marketed via LCI
Tentative Future agenda items:
8/9: Tom Furlani, XDMoD and/or FreeRAS
Meeting: June 28, 2018
Attendees: Dana Brunson; Tom Doak; Kristin Williamson; Victor; Bob Stock; Steve Duensing; Rob at PSC; Ken Chiacchia; Robin Scibek; Dave Hart; David Hancock; Mike Showerman; Alex (Dave entourage); Dan Andresen; Tim Middelkoop
XSEDE update (John Towns)
XSEDE communications campaign concept discussion (Kristin Williamson, UIUC)
Send feedback about the Communications campaign to Dana or Shawn.
Slides that outline the concept are here: https://docs.google.com/presentation/d/1arG-5iQ6yhg7wvQH4_lIAjfnBemRWQLF6A_IOErPHJI/edit?usp=sharing
Nov 1 target date
Comments:
"Really looking forward to a well-made 'What is XSEDE'"
"really likes 'Discovered with XSEDE'"
"ER looked and thought it was well-planned"
RDR Resource Selector (Dave Hart, NCAR)
SP controls what it says.
If pointing to un-allocated resources they may want to include a spot that explains who is eligible to use them. See docs.
Dave will send out link to docs again.
It will end up working with orcid.
Everybody needs to update their entries and add logo (200x200)
if an SP Forum representative can't access the RDR to edit their resources, please contact Victor and he'll get that updated.
If in doubt contact Victor vhazlewo@utk.edu
The beta Resource Selector is "live" at https://portal.xsede.org/test-resource
The RDR is at https://rdr.xsede.org/.
And here is the Google Doc I've started with instructions on how to update the fields in the RDR that the Resource Selector is currently displaying. https://docs.google.com/document/d/1Yu7CfN9DxcBDyxqT84xzKSGRaqFCCdkxgZAG-qV5o28/edit?usp=sharing
Operations update, Information Publishing Framework (Victor Hazlewood, UTK)
Tentative Future agenda items:
TBD: XDMoD overview + study of NSF innovation systems (Tom Furlani, U of Buffalo)
Meeting: June 14, 2018
Attendees: Jonathon Anderson (CU Boulder); Shawn Strande (SDSC); Ruth Marinshaw (Stanford); Andrew Keen (Michigan State University); Bob Stock (PSC); Chet Langin (SIU); Dan Stanzione (TACC); Dana Brunson (Oklahoma State University); David Swanson (University of Nebraska); Henry Neeman (University of Oklahoma); John Towns (NCSA); JP Navarro (Argonne National Lab); Robin Scibek (PSC); Sergiu Sanielevici (PSC); Thomas Doak (Indiana University); Thomas Hauser (CU Boulder); Victor Hazlewood (UTK); Frank Wuerthwein (UCSD); Rick Mohr (UTK)
XSEDE update (John Towns)
Debrief on NSF review
The review went "extremely" well
Panel pleased with our progress
More detailed report with recommendations to be received next week; our response to follow
Introducing Chet Langin from Southern Illinois University (SIU)
Carbondale, IL, active with campus champions for 2.5 years
40 node cluster, ~800 haswell cpu, 2 himem nodes, 1 gpu node
about 30 user departments on campus, including physics, engineering, cs, biology
Open Science Grid - the next phase (Frank Wuerthwein, UCSD)
See slide presentation, OSG-SPForum(14June18).pdf
Q: What would be the first thing to do to contribute? A: Send an email to fkw@ucsd.edu and become a hosted CE.
Providing ECSS-like consulting for L2 and L3's for Data Transfer Services (Rick Mohr, UTK).
Opportunity to work closely with people familiar with XSEDE services
Looking for interested pilot sites with thoughts on what kinds of consulting they could benefit from
Network and data services group has combined within XSEDE, trying to extend service and support from L1 to L2 and L3
Questions about tracking network performance and utilization data; might be a topic for a future call or offline work
Novel and Innovative Projects support for Machine Learning projects (Sergiu Sanielevici, PSC)
Want to make sure we form a community of practice with similar projects that might come to the attention of L2 and L3 SP
Call to action for SPs: if you come across projects relevant to NIP, contact sergiu@psc.edu
Operations update, Information Publishing Framework (Victor Hazlewood, UTK)
Changes to resource description repository
Almost all SPs added to RDR
Push for PY8 is to get all sites publishing at least one item (e.g., their cluster)
Tentative future agenda items:
TBD: XDMoD overview + study of NSF innovation systems (Tom Furlani, U of Buffalo)
RDR Resource Selector (Dave Hart, NCAR)
XSEDE communications campaign concept discussion (Kristin Williamson, UIUC)
Meeting: May 17, 2018
Attendees: Jonathon Anderson (CU Boulder); Dan Stanzione (TACC); John Towns (NCSA); Shawn Strande (SDSC); Andrew R Keen (Michigan State U); Bob Stock (PSC); Chris Hempel (TACC); Carol Song (Purdue); Dana Brunson (Oklahoma State U); Daniel Andresen (?); Doug Jennewein (U South Dakota); Jeremy Fischer (IU); Nancy Wilkins-Diehr (SDSC); Robin Scibek (PSC); Suranga Edirisinghe (Georgia State U); Thomas Doak (Indiana U); Timothy Middelkoop (University of Missouri)
XSEDE update (John Towns)
working on completing and submitting annual NSF report; good progress, thanks to those who provided input
quarterly meeting will be focused on reviewing content that will be presented from the report
Membership update
University of Missouri, Research Computing Support Services (Timothy Middelkoop, Ph.D.; Director of Research Computing Support Services, Division of IT; Assistant Teaching Professor, Industrial Engineering (IMSE))
Southern Illinois University, Office of Information Technology (Chet Langin, PhD; Research Coordinator, Office of Information Technology)
UCSD BioBurst: Application sent back for further clarification
XSEDE Quarterly is June 6-7
Opportunity for a slot to present. Let Shawn Strande know and he’ll work with Ron Payne to get a slot.
Victor OOO, and reports that there is no operations update.
Straw poll on SPF meeting frequency (Monthly: 4; Every 2 weeks: 2; Every 2-weeks and cancel as needed: 6)
Will go with every two weeks and cancel as needed.
With no objection from attendees, the May 31 meeting is cancelled due to ~10 of the regular attendees being unavailable due to a conflicting workshop.
PEARC'18 SPF F2F meeting
13 SPF members confirmed so far
please let Shawn know if you’re attending asap, so he can reserve a room for the meeting
Perspectives on the rise of for-fee models of support for previously free software (e.g. Globus, Singularity, HDF5, etc). Would be interested in hearing what SP Forum perspectives are, how they're approaching, if there's anything we can/should do as a group to engage with these vendors.
How/are licenses negotiated for use by individual XSEDE sites, as opposed to just XSEDE central services? Can XSEDE negotiate license fees on behalf of the group for access by the individual sites? Maybe sites could join an existing XSEDE license without having to pay for multiple site licenses? How does this play into NSF priorities for “sustainability” in service funding? XSEDE serving as a single-point on licenses could be beneficial in reflecting site priorities to service upstream service providers (e.g., Globus).
What are other examples of software or services in use at sites that might inform how this could be addressed? Example of Matlab license changes in terms of physical presence vs system access
Tentative Future agenda items:
June 14: NIP/ECSS and machine learning (Sergiu Sanielevici, PSC)
June 28: XDMoD overview + study of NSF innovation systems (Tom Furlani, U of Buffalo)
might need a volunteer to run the meeting on this day
Two-factor solution on Stampede, looking at a startup “Evo” that has an open platform with commercial support. Looking for a few volunteer beta sites. Contact Dan Stanzione with interest
Introductions from Timothy Middelkoop (new member). HPC center. Rebuilt infrastructure over last 4 years
PEARC will be opening nominations for new steering committee members
4-5 seats available. Nominations open in June, announcement forthcoming. Committee directs PEARC conference
New alternate contact for PSC, Robin Scibek
Meeting: May 3, 2018
John Towns - XSEDE update
Developing annual report and program plan.
Still waiting for some information from SP forum (Update: This has been provided)
Closure on PY8 forward-looking presentations
SP-forum statement of activities
Welcome the new SP Forum Vice Chair, Jonathon Anderson, CU Boulder
Tech. Team Lead for Research Computing
L3 and regional resource RMACC Summit, a partnership between CU Boulder and CSU
Debrief on L2/L3 past/present Chair discussion
Update mailing lists
Program Year 8 Plan Review by SP Forum: Reminder to send your written summaries to Shawn (Completed)
Participation: https://docs.google.com/spreadsheets/d/11lOsZveUCiYxEZbW2iDtMO4AI0R1t
If you have notes, please send them in (To Shawn Strande by email if nothing else)
SP Forum meeting logistics
Building the agenda
L2 and L3 chairs should feel free to contribute agenda items
campus champion representative? / formal agenda item from campus champions
Meeting frequency
suggestions to go to 1/month
improve distribution of minutes to alleviate problems with missing less-frequent meetings
might put out an informal poll
SP Coordination update - Victor Hazlewood
Read / be aware of note from JP regarding Globus TK succession plan
https://software.xsede.org/news/xsede-response-globus-toolkit-end-support-announcement
Access to SSO hub for campus resources available
Discussion about the SP Forum Charter. There are multiple versions out there. The one linked from the main SP Forum page (https://www.xsede.org/about/governance/spf), https://www.xsede.org/documents/10157/281380/SPF_Definition_v10.2_130716.pdf/3d16bfb5-a742-4cea-ada2-0c0ddb06e1ac , is an older version than this one: https://www.ideals.illinois.edu/handle/2142/49980
A question raised: the older version notes that Campus Champions are L3's by default. There's no such mention in the later version. The later version also has no details on what constitutes each level of an SP. Where is this information now located?
Blue Waters webinar take-aways - Dan Stanzione
https://www.youtube.com/watch?list=PLO8UWE9gZTlAMRvvVfS7-6q3x1DrXKmkR
Monthly/quarterly outreach event
30 people live on the call
Slides coming
Informational
Cleanup of SP Forum e-mail list
L3 Member application in review
Upcoming events
PEARC18, www.pearc18.org, Pittsburgh, July 22-26, 2018
Gateways call for participation extended to May 16: https://sciencegateways.org/web/gateways2018/program/cfp
Meeting: April 5, 2018
John Towns - XSEDE news
SPF administration
Program Year 8 Plan Review by SP Forum (Shawn collecting written feedback from participants)
XRAC Reviewers Guide Draft (Complete)
Discussion with Dave Hart
By April 5 (~2 weeks),
"how well (the reviewers guide) accurately reflects current policies, procedures and practices."
ongoing discussion topic of review practices
Provide feedback to Shawn before our next meeting on April 5
Project Improvement Fund (PIF) Review due March 29 (Complete)
PY8-01: ER Market Analysis - https://goo.gl/forms/iwUzdDtXZG6TaTlw2
PY8-02: Update Applications of Parallel Computing online course - https://goo.gl/forms/PSwGWM4GXK7A7uJ03
PY8-03: ORCID Membership Fee - https://goo.gl/forms/bKloTW1vRWJNYEFv2
SP Coordination update - Victor
Blue Waters webinar (Complete: details here: https://bluewaters.ncsa.illinois.edu/webinars/workforce/xsedespforum and here: )
Dan to handle
Schedule?
UREP Prioritization (Complete)
IU Knowledge Base discussion - final call for comments; please submit by next meeting
XSEDE Advisory Board (XAB) Meeting (Complete)
April 17-18 / Chicago
First workshop on Machine Learning for Computing Systems (MLCS) (Paper deadline has passed)
Call for Participation open; papers due April 9
Colocated with ACM HPDC 2018
June 12, 2018 in Tempe, AZ
PEARC18
Pittsburgh, July 22-26, 2018
Meeting: March 22, 2018
John Towns - XSEDE news
Program Year 8 Plan Review by SP Forum
Sign up to participate at: https://docs.google.com/spreadsheets/d/11lOsZveUCiYxEZbW2iDtMO4AI0R1t
XRAC Reviewers Guide Draft
Discussion with Dave Hart
By April 5 (~2 weeks),
"how well (the reviewers guide) accurately reflects current policies, procedures and practices."
ongoing discussion topic of review practices
Section G.1. Review Criteria/Efficient Use of Resources
Provide feedback to Shawn before our next meeting on April 5
Project Improvement Fund (PIF) Review due March 29
PY8-01: ER Market Analysis - https://goo.gl/forms/iwUzdDtXZG6TaTlw2
PY8-02: Update Applications of Parallel Computing online course - https://goo.gl/forms/PSwGWM4GXK7A7uJ03
PY8-03: ORCID Membership Fee - https://goo.gl/forms/bKloTW1vRWJNYEFv2
SP Coordination update - Victor
Blue Waters webinar
Request by Maxim Belkin for SPF talk
Wednesdays in April?
IU Knowledge Base discussion - on hold
UREP Prioritization - on hold
SP Forum web presence - ongoing
Upcoming events
XSEDE Advisory Board (XAB) Meeting
April 17-18 / Chicago
First workshop on Machine Learning for Computing Systems (MLCS)
Call for Participation open; papers due April 9
Colocated with ACM HPDC 2018
June 12, 2018 in Tempe, AZ
PEARC18
Pittsburgh, July 22-26, 2018
Attendees
J Ray (PSC), Carol Song (Purdue), Dana Brunson (Oklahoma State U), Thomas Doak (Indiana U), Doug Jennewein (U South Dakota), Nancy Wilkins-Diehr (SDSC), Mike Showerman (NCSA), Stefan Thiell (Stanford), Andrew Keen (Michigan State U), Bob Stock (PSC), Chris Hempel (TACC), Dan Stanzione (TACC), Steve Brandt (LSU), Suranga Edirisinghe (Georgia State U), Jonathon Anderson (CU Boulder), John Towns (NCSA)
Topic 1: XSEDE News (John Towns)
Meeting in Tampa a couple of weeks back; primary focus was PY8 planning, with presentations from L2 Directors, asking specifically for deltas: what's new, what's discontinued, or what's significantly changed. Discussed the ORCID. John stressed the importance of the SPF to XSEDE planning and encouraged participation in the PY8 review (next item).
Topic 2: PY8 Review
Reviewed the PY8 review spreadsheet (link above) and got commitments from SP members to review the sections. Karla Gendler sent out Doodle polls to each team of section reviewers for meetings in the coming days.
Topic 3: XRAC Reviewers Guide
Dave Hart is anxious to get the current document reviewed within the scope of current practices. Need feedback within 2 weeks. Please send comments to Shawn who will roll these into a single doc and get them off to Dave by April 5. Will also come to a follow-up call to discuss changes to the current process that are out of scope of the document.
Topic 4: Project Improvement Fund
Review due March 29. Please review at least one of them and get input back to J Ray and/or the SPF group e-mail. Committee will review, with SPF casting one of the votes.
Topic 5: SP Coordination update. No discussion.
Topic 6: Blue Waters webinar.
Dan Stanzione offered to give the talk on behalf of SPF.
Topics 7, 8, 9: Nothing to report.
Topic 10: See above.
Meeting: March 8, 2018
John Towns - XSEDE news
updates from quarterly
IU Knowledge Base discussion
ongoing discussion topic of review practices
Section G.1. Review Criteria/Efficient Use of Resources
Blue Waters webinar
Request by Maxim Belkin for SPF talk
Wednesdays in April?
Top Priority: SPI-05: Active account information
An XSEDE staff member needs to obtain a list of all resources and services on which a specific XSEDE user identity has an active account.
SP Coordination update - Victor
SP Forum web presence
Membership lists - plural; this page, below vs https://www.xsede.org/ecosystem/service-providers
Upcoming events
First workshop on Machine Learning for Computing Systems (MLCS)
Call for Participation open; papers due April 9
Colocated with ACM HPDC 2018
June 12, 2018 in Tempe, AZ
PEARC18
Pittsburgh, July 22-26, 2018
Attendees
Steve Brandt (LSU), Dana Brunson (Oklahoma State University), Thomas Doak (IU), Jeremy Fischer (IU), Chris Hempel (TACC), Ruth Marinshaw (Stanford), Victor Hazlewood, Ron Payne (NCSA), J Ray Scott (PSC), Mike Showerman (NCSA), Carol Song (Purdue), Bob Stock (PSC), Dan Stanzione (TACC), Shawn Strande (SDSC)
Topic 1: Updates from Quarterly & XSEDE News. John Towns (not present)
Updates from Dana, Victor, et al. who were there. Victor: several suggestions are being addressed, e.g., regarding system monitoring. Dana: look at connecting points between SPF and Campus Champions. Opportunities for better awareness of what each is doing and how to leverage the work of each.
Topic 2: IU Knowledge Base discussion. Craig Stewart (not present)
Should the KB be updated to reflect SPs (especially L2 and L3) via PIF, or should sites go their own way? Jeremy: There are a handful of general usage questions (e.g., getting a supplement) that don’t belong to one SP, for which a generally available document would be useful. Ruth: Based on limited experience from Stanford POV, it would have been easier to maintain a local doc and link to the main XSEDE website. Carol: Does XSEDE have a general Q&A page for this sort of info? Bob Stock: SPs ought to own their content, but cross-site would be useful. Ruth: What about a researcher who wants to run a particular application, like Abaqus? Victor: Software Information provides info on each site. There is a software search (link??). Dana: What would the benefit be for L2s and L3s? Could be a benefit to Campus Champions, but not clear where it’s of general benefit. Action: J Ray will draft SPF position paper and get back to Craig asap to close this out.
Topic 3: XRAC Reviewers Guide draft
Draft available at link above. Suggestion that SP Forum be involved in reviewing. Dave Hart and Ken Hackworth are driving the process. J Ray: Are we as SPs providing the information we should to reviewers? E.g., Section G.1 Efficient Use of Resources. Dan: Issue with efficiency at times in conflict with users, particularly as we look at non-traditional/new users. Jeremy: Often reviewers become arbiters of what’s appropriate for the resource. Much of it comes down to ensuring that reviewers judge on merit of information in proposal. Bob Stock: Is “effective” or “appropriate” a better word than “efficient”? Dan: Should we have a meeting with allocated resources on alternate possible models for allocation, say in line with standard NSF review? Action: Formalize feedback in document and set up follow-on meeting to dig down.
Topic 4: Blue Waters webinar
Maxim Belkin communication to J Ray asking about SP Forum presentation. Action: Mike Showerman will reach out to J Ray to give suggestions on what sorts of things to cover.
Topic 5: UREP Prioritization
Table for future meeting but need to work on it prior to next meeting. Action: Review and provide input prior to next meeting for J Ray.
Topic 6: XSEDE AUP
In force now. If there are any comments, get in touch with J Ray. Action: Retire this agenda item.
Topic 7: SP Coordination update – Victor
No updates since last meeting.
Topic 8: SP Forum web presence
Membership lists - plural; this page, below vs https://www.xsede.org/ecosystem/service-providers
Preference to update Confluence site and link it to the XSEDE site. Victor says the XSEDE site comes out of RDR and he’d prefer not to remove it since he just put it in place. No specific action at this time.
Topic 9: Upcoming events
See Agenda for details.
Meeting: February 22, 2018
Agenda:
John Towns - XSEDE news
IU Knowledge Base update
Meltdown/Spectre presentation - Trevor Cooper/SDSC
SP Coordination update - Victor
SP Forum web presence
Upcoming events
UCAR SEA: April 2-6, 2018 https://sea.ucar.edu/conference/2018
More information on the Resources and Documents page
Attendees: Stefan Thiell (Stanford), Trevor Cooper (SDSC), J Ray Scott (PSC), Dana Brunson (Oklahoma State University), Craig Stewart (IU), Doug Jennewein (University of South Dakota), Carol Song (Purdue), Mike Showerman (NCSA), Chris Hempel (TACC), David Swanson (University of Nebraska), Andrew Keen (Michigan State University), Nancy Wilkins-Diehr (Science Gateways), Steve Brandt (LSU), Dan Stanzione (TACC), Victor Hazlewood (NICS)
Topic 1: XSEDE update/John Towns. Postponed.
Topic 2: IU Knowledge Base update/Craig Stewart.
Circulated document (link); SP members add, then reviewed by IU. Particularly beneficial for L2 and L3s. REST interface can be customized to support search for any site. Forum feedback? If Forum supports, there is a reasonable chance of funding. Carol: Has XSEDE site been evaluated to see how many people have used it? Do most users go straight to the user guides? Craig: only have # of hits, no other info. Continue as active discussion. Will ask for feedback in time for next meeting.
Topic 3: UREP Prioritization. Postponed. Will revisit and provide feedback to Shava/XSEDE.
Topic 4: XSEDE AUP (Acceptable Use Policy) approved at last SMT meeting. Here: https://www.xsede.org/ecosystem/operations/usagepolicy
Topic 5: Meltdown/Spectre presentation - Trevor Cooper/SDSC
Topic 6: SP Coordination update – Victor Hazlewood
Victor reports good progress moving through SP checklist, two sites left to do, see software.xsede.org for details.
Topic 7: SP Forum web presence
General discussion about plans. J Ray reminded members to provide input on layout, content, etc.
Topic 8: Upcoming events
UCAR SEA: April 2-6, 2018 https://sea.ucar.edu/conference/2018. More information on the Resources and Documents page
Meeting: February 8, 2018
Attendees
Chris Hempel (TACC), J. Ray Scott (PSC), Bob Stock (PSC), Craig Stewart (IU), Nathan Gregg (West Virginia U.), Shawn Strande (SDSC), Jeremy Fischer (IU), Greg Peterson (NICS), Nancy Wilkins-Diehr (SDSC), Dana Brunson (Oklahoma State University), Doug Jennewein (University of South Dakota), Thomas Doak (IU), Carol Song (Purdue), Victor Hazlewood (NICS)
Agenda/minutes
Topic 1: Proposed "PIF" for updating the IU Knowledge Base. Craig Stewart: https://kb.iu.edu/index.html
Summary from Craig: Would like SP Forum input on applying for Project Improvement Funds (PIF) from XSEDE. The IU Knowledge Base started in 1996 and has evolved since then; structure is Q&A. Funding covered content generation (no software, etc.). Since then, KB technology has been changing. Recently the team has added functionality to allow anyone to add content, which is then curated by KB staff before becoming publicly available. Would like to see XSEDE funds; model would be for SPs to enter content, then have an XSEDE staff person curate. Thinking 20 hours/week, reporting to Craig, initially funded by a PIF. Timeline: unless we have a plan to update content by July, we reach a point of diminishing returns and it is better to discontinue. Question about whether or not there’s been any evaluation of the impact in reducing helpdesk tickets. Nancy: sees value in questions like, “which XSEDE systems support reservations”, i.e., questions you would have to visit each SP site to get an answer to. Bob Stock: When can the info there now get updated, regardless of PIF funding? Action: Craig will discuss further with various SP Forum members, start a Google doc that summarizes the opportunity, and come back at a future SP Forum meeting for presentation.
Topic 2: John Towns update on XAB meeting: XAB 2018 February Call; and SPF Membership
John’s written update:
“XAB met yesterday (2/7). The topic of discussion was debrief on the Jan 11-12 review. The review was very positive and we are processing the panel report and preparing our responses now. It was received last week on Wednesday. The XAB provided a lot of guidance on review process, how we present various topics, and how to handle some difficulties we experienced in process.
There are elements of the review that will be feeding into our planning process for the next project year that are relevant for SPs. We will be coming back to you over the next month or so to discuss these. I know this is not much detail, but more to come.
Regarding memberships: I know Dana exchanged some email with a potential new L3 SP, but I don't believe the request for membership has come in as yet.”
Topic 3: Discussion of the UREP (see: https://software.xsede.org/node/1783/rating)
J. Ray: Would like to give a summary SP Forum response to Shava/UREP; 37 members from various areas (gateways, CCs, large users); will discuss in future meeting. Victor clarified that any SP member can submit ideas for use cases into the UREP. SP Forum has a slot in the UREP – J. Ray will pick this up.
Topic 4: Operations updates (Victor Hazlewood)
SP Coordination: Annual checklist review of all SPs (except Stampede2, which has been done). Cycling through SPs and collecting information; process is going well. On software.xsede.org under SP Forum see: https://software.xsede.org/sp-resource-integration-status
Topic 5: SP Forum wiki and portal pages updates
Moving to Confluence. Please review and provide any input on structure, content, etc. J. Ray and Shawn have licenses for editing; if you have an interest in adding content, we might be able to get an extra license or two.
Topics for future meeting:
Meltdown/Spectre
Dave Hart updating the Acceptable Use Policy.
Blue Waters Outreach webinar about SP Forum. J. Ray has responded affirmatively, probably for the March webinar. Would be good to have L2 and L3 representation.
Meeting: Jan 25, 2018
Attendees
J. Ray Scott (PSC), Dana Brunson (Oklahoma State University), Carol Song (Purdue), Shawn Strande (SDSC), Jeremy Fischer (Indiana University), Ruth Marinshaw (Stanford), Greg Peterson (NICS), Bob Stock (PSC), Mike Showerman (NCSA), Dave Hancock (Indiana University), Nathan Gregg (West Virginia University), Chris Hempel (TACC), Dan Stanzione (TACC), John Towns (NCSA), Gary Miksik (Indiana University), Andrew Keen (Michigan State University)
Agenda/minutes
Topic 1: XSEDE Review (John Towns' observation)
Briefed the SP Forum on the mid-year review. Sense is the program is in good shape, though won’t have details until the panel report is issued. Will be a full annual review in June.
Topic 2: Meltdown/Spectre
Dan Stanzione/TACC: Have done some at-scale testing of kernel patches; have concluded that the BIOS patches are not trustworthy, but kernel patches are ok. Testing 12K MPI tasks on Skylake and KNL – seeing about 1% performance impact on Skylake; about 4% on KNL. Deployed kernel everywhere, but holding off on firmware changes until stability is better.
Trevor Cooper/SDSC: Have deployed firmware and kernel to login nodes, which looks good; seeing large impact to I/O migration to virtual front-end hosting nodes and seeing 6-40% on these. No updated firmware from vendors yet.
J Ray Scott/PSC: Upgraded Red Hat w/firmware during a planned outage. So far things look okay. Seeing 1% for general MPI codes; one researcher running large memory Trinity job (small I/O) and saw 50% performance hit for one of their runs.
Dave Hancock/IU: Have patched most of Jetstream (excluding firmware). Anecdotal feedback that there are hits to I/O performance. Assessment continues.
Andrew Keen: Provided this Red Hat link, useful for verifying whether the mitigations are active and how to disable them at runtime: https://access.redhat.com/articles/3311301
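For sites that want to script the "are the mitigations active" check across nodes, the following is a minimal sketch and not the method described in the Red Hat article. It assumes a Linux kernel that exposes per-vulnerability status files under /sys/devices/system/cpu/vulnerabilities; some kernels from early 2018 may not have this directory, in which case the function returns an empty result. The function names are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Assumption: the kernel exposes one status file per known CPU vulnerability
# in this sysfs directory (e.g. "meltdown", "spectre_v1", "spectre_v2").
SYSFS_VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(base_dir=SYSFS_VULN_DIR):
    """Return {vulnerability_name: status_line} for each file in base_dir.

    Returns an empty dict when the directory does not exist, which is
    what an older kernel without this interface would look like.
    """
    base = Path(base_dir)
    if not base.is_dir():
        return {}
    status = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                status[entry.name] = "unreadable"
    return status


def unmitigated(status):
    """Names whose status line begins with 'Vulnerable' (no mitigation active)."""
    return sorted(name for name, line in status.items()
                  if line.startswith("Vulnerable"))


if __name__ == "__main__":
    current = read_mitigation_status()
    if not current:
        print("No vulnerability status files found on this kernel.")
    for name, line in current.items():
        print(f"{name}: {line}")
    for name in unmitigated(current):
        print(f"WARNING: {name} is reported as unmitigated")
```

Running this on login and compute nodes after a patch window gives a quick per-node record that can be compared before and after kernel or firmware updates.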
Topic 3: Ops update
Victor Hazlewood (NICS): Time for annual audit of SP checklist; will reach out to contacts; next Software Call is first Monday of the month.