What Data Does Everyone Have to Drive Data-Based Decisions

Data-Driven Decision Making

Session Lead: Shelley Knuth, Support

Slides: fall2025_quarterly_data_driven_decision_making.pptx

This is an interactive session.

 

What data are we collecting? Raw Notes

Support

  • OOD (Open OnDemand) through XDMoD

  • ACCESS Pegasus

    • Usage

    • Findings/issues

    • Resources running on

    • Institutions

  • Focus Groups 2023 website review

    • User experience

  • Webinar (NAIRR/ACCESS)

    • Names

    • Institutions

    • ACCESS ID

    • Email

    • How they found us

  • NAIRR UEWG surveys on underutilization

    • November 2024

    • What resource? 

    • Why? 

    • What can we do to help?

  • NAIRR allocations underutilized by 2000 SUs

    • Currently ongoing 2025

    • 1:1 meetings

    • Resource?

    • Why?

    • What can we do?

  • Survey data (CU) from Workshops

    • Pre and post workshop surveys

    • AI centered

    • Institution

    • Why they want to attend

    • Questions about their experience 

CCEP awards (Community Grant)

  • Who applied

  • Institution and $$ awarded

  • Where they went with the funds

  • Announcements

    • Who is making announcements (RPs)

  • Support Digest 

    • How many are clicking links

    • Open rates

  • CSSN Roles/Interests

  • Events and training (ACCESS/NAIRR)

    • Registrations for events

    • What types of events/trainings

  • Ticketing Information (ACCESS System only)

    • Everything!

    • Generate Tags

    • Generating new data sets

  • SDS

    • We know what they search for

    • All the software collected from CIDeR/IPFTool

  • Chatbot

    • Questions

    • Answers

    • Ratings

  • MATCH Services

    • Who is requesting MATCH

  • Office Hours Tracking

    • Who attends

    • Concerns they raise

  • Website data/weblogs

    • Institution

    • Who is logged in / logged out

    • Carnegie classification

    • Geographic location

    • Session clicks/origination website

    • Affinity Group tracking

    • How they connect to ACCESS (e.g., Campus Champions)

Metrics

  • Slides

  • XDMoD Data Sources

    • From the ACCESS allocations database & XRAS:

      • Jobs submitted and run.

      • Queues and system accounts.

      • Users, PIs, and organizations.

      • Projects and allocations.

      • Fields of science.

      • Science Gateway users.

    • From CiDeR:

      • Resource specifications.

    • From NSF.gov:

      • NSF awards.

    • Direct from RPs:

      • Slurm (and other resource manager) accounting logs.

      • Compute node-level performance data.

      • OpenStack logs.

      • Open OnDemand logs.

      • Network logs (stored in NetSage).

      • CloudBank data.

  • From NetSage

    • Data for every flow and Globus record, source and destination:

    • Flow/task size and rate (and unique ID)

    • Organization, ASN

    • Country, Lat, Long

    • Subnet, Port

    • ScienceRegistry Project Name, Discipline

    • Resource name

    • Community membership

Allocations

Operations

Data we have now:

  • System & service logs – security data we retain in case we need to review it.

  • Monitoring that security does on different services

  • Qualys data (is this host vulnerable?)

  • Resource information data

    • Logging, APIs, software, hardware, RP people/contacts

  • Resource news (outages etc.)

  • Nagios monitoring data

  • Service index about our services

  • perfSONAR log info

  • ACCESS identity information / COmanage info

  • CILogon/ACCESS authentication logs

  • Ticketing statistics, time to resolution, tags

  • Web stats/hits to web pages (Google Analytics)

  • STEP application institution demographics

  • STEP survey data (sensitive/not to be shared)

  • Historical data on volumes of network traffic sent across the network

Data we need that requires input from across all ACCESS teams:

  • Time/effort required to guide a new resource through integration process

  • Time/effort required to support/maintain an existing RP

Data we receive from others:

  • Flow data from Internet2

  • NetSage data

  • Eval team: community, staff, RP survey data

ACO

ACO – What data do we collect

  • Meeting Notes / Attendance – RAC

  • Meeting Notes / Attendance – EAB

    • ideas generated

  • Meeting Notes / Attendance – EC

  • Software Working Groups and Standing Committees Notes

Comms

  • Stories (HPCwire – articles & awards)

  • Newsletters - Internal and External

    • statistics

  • Reach / Social Media

  • Website stats – visits, click-through

  • Publications – some are not in the website Publications list

Surveys

  • EAB Survey

  • QM Survey

  • Staff Survey

  • RP Survey

  • Community Surveys

  • SC booth, PEARC booth info – names of people who have visited the booth

Reports

  • Financial Reports

  • Meeting Reports – ACO

  • Tools we are using: UDO & VIVO

    • Risk Register

    • Jira used to track NSF, EC, and ACO meeting tasks

    • PEP milestones

    • Confluence

    • Adoption

    • Engagement Tracker

Quarterly Reports

  • EAB:

    • Number of new allocations

    • Turnaround time

  • Publications

  • New Publications

  • Turnaround time

  • EC decisions

  • Lessons learned

  • Financials

  • Tasks generated and completed


AI-generated reference list:

Collated Data Sets & Cross-Reference Schema

Below is a structured breakdown of the data sets your teams collect, organized into eight key categories. We then identify the common identifiers across those categories and present a cross-reference matrix showing where each identifier appears. This will help you design joins, spot overlaps, and build a unified analytics warehouse.

 

  1. Data Categories & Sources

  • User & Institution Info
    Names; Institutions; ACCESS ID; Email; Carnegie classification; Geographic location; How they found us; Affinity group tracking.

  • Usage & Allocations
    XDMoD jobs/queues/accounts; XRAS allocations & credit exchanges; SDS search logs; Software inventory from CiDeR/IPFTool.

  • Support Interactions
    ACCESS ticketing; Support Digest (open rates, clicks); Office Hours attendance & concerns; Chatbot Q&A logs & ratings.

  • Surveys & Feedback
    Focus Groups 2023 site review; User-experience surveys; NAIRR UEWG underutilization (Nov 2024, ongoing 2025); 1:1 meetings; Pre/post workshop surveys; STEP surveys; EAB/QM/Staff/RP/community surveys.

  • Events & Training
    Webinar registrations (NAIRR/ACCESS); CSSN roles/interests; MATCH requests; CCEP award applications; SC/PEARC booth logs; Workshop attendance.

  • Communications & Outreach
    Announcements (RPs); Internal/external newsletters; Website stats (visits, click-through); Social-media reach; Article mentions (HPCwire).

  • Resource & Infrastructure Metrics
    OnDemand logs; ACCESS Pegasus usage; NetSage flows; CloudBank billing; Nagios/perfSONAR/security logs; SDS search behavior.

  • Performance & KPIs
    Democratization Index; Ecosystem access time; RP satisfaction; XRAS uptime; Feedback response times; Annual Report metrics.

 

  2. Common Identifiers (Join Keys)

  • Person: Name, Email, ACCESS ID

  • Institution: University/Company, Carnegie classification, Geographic region

  • Resource: Resource name/ID (e.g., queue, system account), Software package

  • Interaction: Survey response ID, Ticket ID, Session or event ID

  • Session: Web session ID, click events
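
To illustrate how these shared keys support joins across data sets, here is a minimal sketch linking a workshop-survey extract to an XDMoD usage extract on ACCESS ID. The file names and column layouts are illustrative assumptions, not actual exports.

```python
# Minimal join sketch on the shared person-level key (ACCESS ID).
# File names and column layouts are hypothetical.
import pandas as pd

# Hypothetical exports: workshop survey responses and XDMoD usage summaries
surveys = pd.read_csv("workshop_surveys.csv")  # access_id, institution, pre_score, post_score
usage = pd.read_csv("xdmod_usage.csv")         # access_id, resource, su_used

# Inner join on the common identifier to relate training to usage
joined = surveys.merge(usage, on="access_id", how="inner")

# Example cross-analysis: average SUs used by workshop attendees, per institution
print(joined.groupby("institution")["su_used"].mean())
```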

 

  3. Cross-Reference Matrix

 

Category – Key Data Sets

  • User & Institution Info – Names; Institutions; ACCESS ID; Email; Carnegie classification; Geographic location; Affinity group tracking

  • Usage & Allocations – XDMoD; XRAS credits & allocations; SDS searches; CiDeR software

  • Support Interactions – Ticketing; Support Digest; Office Hours; Chatbot

  • Surveys & Feedback – Focus Groups 2023 site review; NAIRR UEWG underutilization; NAIRR underutilized allocations; 1:1 meetings; Pre/post workshop surveys; STEP application & survey; EAB/QM/Staff/RP/Community surveys

  • Events & Training – Webinars; Registrations; MATCH requests; CCEP awards; Booth logs

  • Communications & Outreach – Announcements; Newsletters; Website visits; Social media reach

  • Resource & Infrastructure – OnDemand logs; Pegasus usage; NetSage flows; CloudBank billing; Nagios/perfSONAR logs

  • Performance & KPIs – Democratization Index; Ecosystem access time; RP satisfaction; XRAS uptime; Feedback response times

  4. Next Steps

  1. Design a star schema with a central User dimension (Person ID + Institution ID) and separate fact tables for Allocations/Usage, Support, Events, Surveys, Communications, and System Logs (see the schema sketch after this list).

  2. Align naming conventions so that “Institution” in surveys, ticketing, and allocations all reference the same institution dimension.

  3. Implement a master registry of resource IDs (including queue names, software packages, OnDemand instances) to unify Usage and Infrastructure facts.

  4. Develop ETL pipelines that tag each row with the proper Session ID and Survey ID to enable cross-analysis (e.g., linking support tickets to resource usage spikes); see the tagging sketch below.
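
As a starting point for step 1, here is a minimal sketch of the star schema using SQLite from the Python standard library. All table and column names are illustrative assumptions to be refined against the actual source systems.

```python
# Star-schema sketch: shared dimensions plus one fact table per activity
# stream. Names are hypothetical placeholders, not an agreed design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions shared by every fact table
CREATE TABLE dim_institution (
    institution_id INTEGER PRIMARY KEY,
    name           TEXT,
    carnegie_class TEXT,
    geo_region     TEXT
);
CREATE TABLE dim_person (
    person_id      INTEGER PRIMARY KEY,
    access_id      TEXT UNIQUE,
    name           TEXT,
    email          TEXT,
    institution_id INTEGER REFERENCES dim_institution(institution_id)
);
CREATE TABLE dim_resource (
    resource_id INTEGER PRIMARY KEY,
    name        TEXT,  -- queue name, system account, or OnDemand instance
    software    TEXT
);

-- Fact tables keyed to the same dimensions
CREATE TABLE fact_usage (
    person_id   INTEGER REFERENCES dim_person(person_id),
    resource_id INTEGER REFERENCES dim_resource(resource_id),
    su_used     REAL,
    job_date    TEXT
);
CREATE TABLE fact_support (
    person_id   INTEGER REFERENCES dim_person(person_id),
    ticket_id   TEXT,
    opened_at   TEXT,
    resolved_at TEXT
);
""")
conn.close()
```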
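
For step 4, here is a minimal ETL tagging sketch: each incoming ticket row is tagged with the user's most recent web session ID before loading, so support activity can later be cross-analyzed against usage behavior. The source files, columns, and the most-recent-session rule are assumptions.

```python
# ETL tagging sketch: attach a session_id to each ticket row.
# File names and columns are hypothetical.
import pandas as pd

tickets = pd.read_csv("tickets.csv")        # ticket_id, access_id, opened_at
sessions = pd.read_csv("web_sessions.csv")  # session_id, access_id, started_at

# Most recent session per user (a stand-in for a real matching rule)
latest = (sessions.sort_values("started_at")
                  .groupby("access_id", as_index=False)
                  .last()[["access_id", "session_id"]])

# Left join keeps tickets from users with no recorded web session
tagged = tickets.merge(latest, on="access_id", how="left")
tagged.to_csv("tickets_tagged.csv", index=False)
```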

 

Beyond this schema, you might consider:

  • Building real-time dashboards for alerts (e.g., sudden drop in XRAS uptime vs. surge in support tickets); a minimal alert rule is sketched after this list.

  • Implementing role-based data access so teams see only their slice yet can share global snapshots.

  • Establishing a data governance council to refine definitions (e.g., what counts as “active” in each domain).
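
To make the dashboard idea above concrete, here is a minimal alert-rule sketch. The metric names, thresholds, and the rule itself are illustrative assumptions, not agreed definitions.

```python
# Illustrative alert check: flag windows where XRAS uptime dips while
# support-ticket volume surges. Thresholds are placeholders to be tuned.
def should_alert(uptime_pct: float, tickets_today: int, tickets_daily_avg: float,
                 uptime_floor: float = 99.0, surge_factor: float = 2.0) -> bool:
    """True when uptime falls below the floor while tickets surge."""
    return uptime_pct < uptime_floor and tickets_today > surge_factor * tickets_daily_avg

# Example: 97.5% uptime with 40 tickets against a 12-ticket daily average
print(should_alert(97.5, 40, 12.0))  # True -> surface on the dashboard
```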