aws more_solution hpc high_performance_computing

Data Management and Transfer

  • Direct Connect (DX): Move data to the cloud via private network
  • Snowball and Snowmobile: Move PB data to cloud
  • Data Sync: Move data between on-premise S3, EFS, FSx for Windows.

Compute & Networking

  • EC2 Instance: Optimize CPU, GPU, Spot Instance for cost saving + Auto Scaling
  • EC2 Placement Group: EC2 Server in the same Rack and same AZ for low latency
  • EC2 Enhanced Networking: Higher bandwidth
    • ENA (Elastic Network Adapter)
    • Intel 82599 VF
  • EFA (Elastic Fabric Adapter)
    • Improved ENA but only for Linux
    • Great for inter-nodes communication, tightly coupled workload like Cluster Placement Group, (Same Rack Same AZ) or AWS ParallelCluster

Storage

  • Instance attached storage: EBS (io1/io2), Instance Store
  • Network Storage: S3, EFS, FSx for Lustre (for Linux and Cluster)

Automation and Orchestration

  • AWS Batch: Run multi-node paralell jobs, spread via multiple EC2 instances
  • AWS ParallelCluster
    • Open source for cluster management tool to deploy HPC on AWS
    • Enable EFA to improve network performance