HPC Operational Management, On Demand or Consulting Services
We enhance High Performance Computing (HPC) with focused operational expertise, collaboration tools and accountability. We bridge the gap between the scientists using HPC and the technologists providing the solution to accelerate success. We use proven methodologies to deliver peace of mind for projects, support for your team, or the assumption of HPC daily operations.
HPC Operational Management
Managing your cluster or storage systems and infrastructure
HPC Consulting
Architecting, troubleshooting, guidance and industry perspectives at your disposal
On Demand Services
Affordable block rates to work on short projects and efforts which you need done now
HPC Operational Management System (HOMS)
At DST, we assume operational management of your HPC environment while enhancing key strategic areas: day-to-day cluster operations and maintenance; domain-specific user community support; collaboration and accountability within the user base; and increased cluster usage within the user community through education and tuning. Here is an overview of what we provide to our customers.
Day to Day Management
We deliver consistent, focused expertise that the institution no longer has to manage on a day-to-day basis. This gives our clients the ability to never miss a step when a key employee leaves the business, goes to training, or takes vacation.
Customized Support System
A customized support ticketing system provides an audit trail of every interaction with the user community, the institution's HPC team, and the vendor community. HOMS delivers reports, real-time dashboards and weekly operational reviews with your team to keep communication and expectations in full view. Finally, DST closes every ticket with a “How are we doing?” customer satisfaction survey so that you have data on your user community's experience with cluster services and DST support.
Inclusive and Ongoing Collaboration
With HOMS, DST creates a customized wiki to help educate and empower the user community. Topics including cluster operations, policies, workflows, component functions, FAQs and how-tos are published during the on-boarding process and enhanced on a weekly basis. Further, we offer wiki FAQs customized to the site, as well as community forums for sharing.
Our service philosophy, that we are an extension of your team, demands a collaborative approach to every effort. DST recognizes the different needs of varied stakeholders in each work effort and has established a collaboration methodology and tool set designed to communicate effectively.
Therefore, we have ongoing collaboration initiatives with staff, scientists and executives.
Vendor Management and Advocacy
We work with vendors and partners to open tickets, manage them to resolution through the provider’s system, and raise a flag when vendors are not responsive or comprehensive in their solution, which gives you a high level of accountability. Because we manage many environments, the DST team is highly regarded by vendors and able to work closely with them to reach a speedy and complete resolution, with significant input in the process.
Supported Ecosystem
Data in Science Technologies offers agnostic hardware/software services to the research compute clusters. We support all parallel file systems but specialize in GPFS and Lustre. We support all major compute providers, all network providers and all storage providers used in research computing. Further, we have experience supporting all major schedulers and all major cluster management tools as well as OpenStack. DST can provide a team of senior level engineers to help your HPC management team or that team can assume management of the cluster. Our remote sys admin service is agnostic to the components of the cluster and we provide an entire support team less expensively than an FTE in most cases.
DST works with you to define the parameters of the existing environment that you would like to have us manage. Our services can offer operational management for compute, scripting, cluster and pipeline management, file systems and operational hardware support. The customized support plan is defined by your individual needs.
Delivery System
DST developed a systematic integration methodology to work as an extension of the existing HPC management team. Our methodology is designed to on-board HOMS, offer daily operational support, improve accountability, and increase collaboration. No aspect of the client’s environment is unaccounted for. If you would like our detailed Step by Step Guide, please contact us.
Operations Focus
The goal of HOMS is to assume the operational management of the HPC cluster as defined by need — daily, quarterly, single use, alert relevancy. We ensure a predictable, consistent, reliable HPC environment.
DST takes responsibility for HPC management and can customize the service to fit your individual requirements. We can assume accountability for operational management in compute, script support, cluster management, file system, and hardware.
Education Focus
Policy Focus
DST works with the HPC team and the scientists to agree on important internal policies that protect the integrity of the HPC cluster and its data. Key policy considerations include:
- Usage
- Sensitivity categorization
- Publishing
- Queue Usage
- Resource Reservation
- Software load/compiling
- Support
- Documentation
HOMS Tools
We use a browser-based tool to visualize the distribution of filesystem capacity utilization. Its age-based heat map lets users identify asset reclamation opportunities.
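As a rough illustration of the kind of data an age-based heat map is built from, the Python sketch below walks a directory tree and sums file sizes by top-level directory and file age. It is not the HOMS tool: the bucket boundaries, the `capacity_by_age` function, and the use of modification time as the measure of age are all assumptions made for this example.

```python
import os
import time
from collections import defaultdict

# Hypothetical age buckets (upper bound in days, label); not taken from the HOMS tool.
AGE_BUCKETS = [(30, "0-30d"), (90, "31-90d"), (365, "91-365d"), (float("inf"), ">365d")]


def bucket_for(age_days):
    """Return the label of the first bucket whose upper bound covers age_days."""
    for limit, label in AGE_BUCKETS:
        if age_days <= limit:
            return label
    return AGE_BUCKETS[-1][1]


def capacity_by_age(root, now=None):
    """Sum file sizes under root, grouped by top-level directory and age bucket."""
    now = time.time() if now is None else now
    usage = defaultdict(lambda: defaultdict(int))
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Age is measured from last modification time (an assumption).
            age_days = max(0.0, (now - st.st_mtime) / 86400)
            usage[top][bucket_for(age_days)] += st.st_size
    return usage
```

Each cell of the resulting table (directory by age bucket, value in bytes) corresponds to one tile of a heat map; large totals in the oldest bucket are the natural candidates for reclamation. On a large parallel file system a full tree walk is slow, so a production tool would more likely draw on the file system's own scanning or policy facilities.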
Some of the HOMS tools are customized for your unique environment to report on the key attributes of the cluster, complete with a ticketing system. DST actively optimizes the heart of your HPC including:
- Shared storage health
- Parallel file system health
- Compute node – Operating system updates
- Proactive problem mitigation
- Capacity planning/reporting
- Cluster networking
- Scheduler (optimizations)
HOMS delivers a Cluster dashboard customized for your environment.
HPC Consulting
DST provides a team of focused senior experts who give you an outsider’s insight into the challenges you are facing and unbiased solutions to address them. With proven results, our consulting services provide:
- Lower risk in moving forward, based on industry-expert, best-practice, and well-documented plans supporting your strategic roadmap.
- Customer advocate response teams providing expert opinions void of a political agenda.
- Goal-oriented team dedicated to efforts without distraction, accelerating the completion and documentation time of the effort.
- Meticulous, unbiased blueprinting to lower costs by avoiding painful oversight.
Appraisals and Blueprints
Appraisals and Blueprints provide you with an expert estimate of the current and future state of the environment. DST creates a blueprint to address your vision based on industry best practices coupled with end user and budget requirements.
Each blueprint consists of detailed architecture documentation, workflow and gap analysis based on DST’s experience and industry best practices. Incorporating the gap analysis creates a plan to move from where you are as defined in the appraisal to where you want to be as defined in the blueprint.
Appraisals and Blueprints are available for the following areas:
- Performance
- Workflow
- Archival and Data Retention
- Data Protection
- Storage Architecture
Strategic Planning
A focused team of industry experts goes through a systematic process that provides you with a strategic 5-year roadmap for your HPC services and environment. This exhaustive process starts with an intensive discovery phase of key stakeholder and end user interviews, coupled with discovery and documentation of the HPC ecosystem. DST documents each aspect of the current state including:
- Hardware and networking
- Software
- Workflow
- Applications and application dependencies
- Filesystems
- Data archiving and data retention
Once the current state is understood, the DST consulting team begins a process of extensive interviews with key stakeholders in the user community. The purpose is to understand their requirements today and the workloads they project moving forward, so that we can understand the requirements of the future. We also meet with executive management so that our strategic plan includes their vision and objectives for the years ahead. The visionary design combines current reality with executive objectives and end user requirements (current and projected). The end product is a detailed document that defines:
- The current environment, with recommendations for eliminating or moving bottlenecks to provide optimal performance for existing HPC services
- Design goals based on executive and end user requirements using a modular and elastic architecture that does not limit adoption of new technology
- Meticulously detailed architecture of future state with a protocol and platform agnostic environment for easy adoption of new technologies.
- High level project plan outlining migration from current state to future state
The deliverable includes a phased approach and high level project plan for moving to the future state. Establishing a roadmap for how to move forward can be a daunting goal.
The final 5-year strategic plan you receive includes a phased high level project plan that serves as a “how to” for executing the strategic plan.
This phased project plan includes:
- Detail of the phase tasks
- A high level sequence of steps involved in each phase
- Resource requirements for each phase
- Projected time frame for completion
System Design and Deployment
Our team of experts stays focused on providing helping hands, physically or digitally, so your team can deliver on time and under budget. Clients leverage DST to provide one or all of the following services:
- System Design: Create detailed architecture and documentation of new cluster environments.
- System Integration: Procure, assemble and test hardware and software.
- System Installation: Provide basic rack and stack, power, ping and pipe.
- System Implementation: Implement cluster components including the load of operating system and file systems, setting up file permissions, scheduler deployment and cluster node image creation.
- System Deployment: Production-ready cluster deployment, from loading and testing of applications, to comprehensive file system tuning, to scheduler optimization.
How Are Consulting Services Delivered?
We have a step-by-step methodology designed to provide focused expertise as an extension of your team during the life of the engagement. The DST Consulting methodology is designed to provide a high level of collaboration and accountability which includes:
- Develop detailed Statement of Work (SOW) outlining the effort
- Establish contract based on SOW
- Design a detailed project plan
- On-site kickoff meeting: establish roles, review project plan, create collaboration framework, agree on timeline
- Deploy collaboration and project management tools among the team
- Begin weekly project status meetings, each followed by a project status report disseminated to the team
- Establish on-site meetings as required
- Complete the project and its Blueprint documentation
- Present findings in the form of a detailed Blueprint
We deliver accountability to you with a custom solution acting as an extension of your team.
On Demand
DST offers senior level engineers, architects, and developers to help you rapidly and predictably launch and manage your HPC vision. Our unique teaming model fully engages you in the process and acts as an extension of your team, working with your HPC staff members and the HPC vendor community.
On Demand Services offer our clients the ability to scale out senior level expertise focused on the effort at hand, becoming an extension of your team. Having an agreement in place with Data in Science for a number of hours gives you easy access to on-demand service for fulfilling organizational needs.
What Are On Demand Services?
Typical use cases for HPC on-demand services include:
- Project-based Assistance – Our team of senior HPC engineers works with your team to accelerate the deployment of targeted efforts and projects, including file system migrations, RStudio deployment, enabling MATLAB, additional cluster builds, or establishing a SwiftStack archive repository.
- Internal Skill Augmentation – On-demand hours are often employed to augment existing staff. Organizations often have only a single team member who possesses advanced understanding of parallel file systems, Linux administration, or cluster management. Our ability to augment the team when a senior-level resource takes vacation, leaves for training, or transitions to a new job helps organizations ensure a deeper bench of critical skill sets.
- Lifeline – On-demand services can be a lifeline for your existing HPC team. Our highly skilled, HPC-centric team acts as an extra set of eyes in HPC support to optimize the cluster or rapidly resolve a particularly difficult issue, one that would otherwise take days or weeks just to research.
Why On Demand Services
DST services encompass best practices to deploy, manage, troubleshoot or tune premium products such as Bright Cluster Manager, IBM Spectrum Scale, Lustre, BeeGFS, Quobyte, ZFS, DDN, Penguin, Mellanox, Seagate ClusterStor, RAID Inc, Cray and others.
On Demand Skills Matrix
The following is a non-exhaustive list of skills that the DST team can provide in an On Demand engagement:
- File System: GPFS, Lustre, XtreemFS, Quobyte, ZFS
- R/RStudio/Parallel R
- LSF, PBS, SGE and UGE Schedulers
- Bright Computing Cluster Manager
- Python and Perl Scripting
- Java Development
- MATLAB and associated MATLAB Modules
- Linux Engineering
- Network Engineering
- KVM, oVirt and OpenStack Virtualization
- DataDogHQ and Nagios
- Cluster and File System Architects, Engineers and Administrators
How Does On Demand Work?
Data in Science’s field-proven On Demand methodology provides focused expertise as an extension of your team, with a high level of accountability in HPC support, including real-time collaboration and detailed weekly reports on all activities for your review.
On Demand Methodology:
- Contract established with hours to be consumed on demand
- Project-based work SOW
- Detailed project plan
- Collaboration tools and constant communication
- Weekly reports detailing all activities
- Documentation of the effort upon completion
The DST team delivers accountability to your organization with a custom solution that acts as an extension of your team. The following illustrates a weekly report designed to inform the client of the hours used and the detailed tasks performed during those hours.
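As a rough sketch of how such a weekly report could be assembled, the Python example below totals the hours logged in a week, subtracts them from a contracted block, and lists the tasks performed. The `TaskEntry` fields, the `weekly_report` function, and all sample values are hypothetical; they are not taken from DST's actual reporting tool.

```python
from dataclasses import dataclass


@dataclass
class TaskEntry:
    """One logged piece of work (hypothetical record layout)."""
    day: str        # ISO date string, so that string sorting is chronological
    engineer: str
    task: str
    hours: float


def weekly_report(entries, contracted_hours, hours_used_before=0.0):
    """Build a plain-text summary of hours used and tasks performed this week."""
    used = sum(e.hours for e in entries)
    remaining = contracted_hours - hours_used_before - used
    lines = [
        f"Hours used this week: {used:.1f}",
        f"Hours remaining in block: {remaining:.1f}",
        "Tasks:",
    ]
    for e in sorted(entries, key=lambda e: e.day):
        lines.append(f"  {e.day}  {e.engineer:<10} {e.hours:>5.1f}h  {e.task}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative data only.
    sample = [
        TaskEntry("2024-01-09", "engineer-b", "Scheduler queue tuning", 4.0),
        TaskEntry("2024-01-08", "engineer-a", "File system migration dry run", 2.5),
    ]
    print(weekly_report(sample, contracted_hours=40, hours_used_before=10))
```

Keeping each entry at the level of an individual task is what lets the client see both the running balance of the hour block and exactly what the hours were spent on.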