createTrainingPlan method
Creates a new training plan in SageMaker to reserve compute capacity.
Amazon SageMaker Training Plan is a capability within SageMaker that allows customers to reserve and manage GPU capacity for large-scale AI model training. It provides a way to secure predictable access to computational resources within specific timelines and budgets, without the need to manage underlying infrastructure.
How it works
A plan can be created for a specific resource type, such as SageMaker Training Jobs or SageMaker HyperPod clusters. SageMaker then automatically provisions resources, sets up infrastructure, executes workloads, and handles infrastructure failures.
Plan creation workflow
- Users search for available plan offerings based on their requirements (e.g., instance type, count, start time, duration) using the SearchTrainingPlanOfferings API operation.
- They create a plan that best matches their needs using the ID of the plan offering they want to use.
- After successful upfront payment, the plan's status becomes Scheduled.
- The plan can be used to:
  - Queue training jobs.
  - Allocate to an instance group of a SageMaker HyperPod cluster.
- When the plan start date arrives, it becomes Active. Based on available reserved capacity:
  - Training jobs are launched.
  - Instance groups are provisioned.
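The status transitions in the workflow above can be sketched as a small local model. This is an illustration only: `PlanModel`, `PlanStatus`, and the `awaitingPayment` state are hypothetical names invented for this example and are not types from the SageMaker client. Only the Scheduled and Active statuses come from the description above, and the sketch does not model what happens after a plan ends.

```dart
/// Statuses named in the workflow above, plus a hypothetical initial state
/// (`awaitingPayment`) that is an assumption of this sketch.
enum PlanStatus { awaitingPayment, scheduled, active }

/// Hypothetical local model of a plan; not a type from the SageMaker client.
class PlanModel {
  PlanModel({required this.offeringId, required this.startTime});

  final String offeringId;
  final DateTime startTime;
  bool upfrontPaymentSucceeded = false;

  /// Derives the status at [now] from the rules in the workflow list.
  PlanStatus statusAt(DateTime now) {
    if (!upfrontPaymentSucceeded) return PlanStatus.awaitingPayment;
    // Scheduled until the start date arrives, then Active.
    return now.isBefore(startTime) ? PlanStatus.scheduled : PlanStatus.active;
  }
}

void main() {
  final start = DateTime.utc(2025, 1, 10);
  // 'example-offering-id' is a placeholder, not a real offering ID.
  final plan = PlanModel(offeringId: 'example-offering-id', startTime: start);

  print(plan.statusAt(DateTime.utc(2025, 1, 1))); // PlanStatus.awaitingPayment
  plan.upfrontPaymentSucceeded = true;
  print(plan.statusAt(DateTime.utc(2025, 1, 1))); // PlanStatus.scheduled
  print(plan.statusAt(start)); // PlanStatus.active
}
```

In practice the status is reported by the service rather than derived locally; the model only makes the ordering of the steps explicit.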
A plan can consist of one or more Reserved Capacities, each defined by a specific instance type, quantity, Availability Zone, duration, and start and end times. For more information about Reserved Capacity, see ReservedCapacitySummary.
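As a rough illustration of that structure, the sketch below models a plan as a list of capacities. The `ReservedCapacity` class, its field names, and the sample values are assumptions made for this example. They mirror the attributes listed above but are not the fields of `ReservedCapacitySummary`.

```dart
/// Hypothetical local model of one Reserved Capacity; field names are
/// assumptions and do not mirror ReservedCapacitySummary exactly.
class ReservedCapacity {
  const ReservedCapacity({
    required this.instanceType,
    required this.instanceCount,
    required this.availabilityZone,
    required this.startTime,
    required this.endTime,
  });

  final String instanceType;
  final int instanceCount;
  final String availabilityZone;
  final DateTime startTime;
  final DateTime endTime;

  /// Duration is derived from the start and end times.
  Duration get duration => endTime.difference(startTime);
}

/// Sums the instance counts across all capacities in a plan.
int totalInstances(List<ReservedCapacity> capacities) =>
    capacities.fold<int>(0, (sum, c) => sum + c.instanceCount);

void main() {
  // Placeholder values for illustration only.
  final capacities = [
    ReservedCapacity(
      instanceType: 'ml.p5.48xlarge',
      instanceCount: 4,
      availabilityZone: 'us-west-2a',
      startTime: DateTime.utc(2025, 1, 10),
      endTime: DateTime.utc(2025, 1, 17),
    ),
    ReservedCapacity(
      instanceType: 'ml.p5.48xlarge',
      instanceCount: 2,
      availabilityZone: 'us-west-2a',
      startTime: DateTime.utc(2025, 1, 17),
      endTime: DateTime.utc(2025, 1, 20),
    ),
  ];

  print(totalInstances(capacities)); // 6
  print(capacities.first.duration.inDays); // 7
}
```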
May throw ResourceInUse.
May throw ResourceLimitExceeded.
May throw ResourceNotFound.
Parameter trainingPlanName :
The name of the training plan to create.
Parameter trainingPlanOfferingId :
The unique identifier of the training plan offering to use for creating
this plan.
Parameter spareInstanceCountPerUltraServer :
Number of spare instances to reserve per UltraServer for enhanced
resiliency. Default is 1.
Parameter tags :
An array of key-value pairs to apply to this training plan.
Implementation
Future<CreateTrainingPlanResponse> createTrainingPlan({
  required String trainingPlanName,
  required String trainingPlanOfferingId,
  int? spareInstanceCountPerUltraServer,
  List<Tag>? tags,
}) async {
  // Check the optional spare-instance count against the allowed range.
  _s.validateNumRange(
    'spareInstanceCountPerUltraServer',
    spareInstanceCountPerUltraServer,
    0,
    1152921504606846976,
  );
  // The X-Amz-Target header selects the CreateTrainingPlan operation.
  final headers = <String, String>{
    'Content-Type': 'application/x-amz-json-1.1',
    'X-Amz-Target': 'SageMaker.CreateTrainingPlan'
  };
  final jsonResponse = await _protocol.send(
    method: 'POST',
    requestUri: '/',
    exceptionFnMap: _exceptionFns,
    // TODO queryParams
    headers: headers,
    payload: {
      'TrainingPlanName': trainingPlanName,
      'TrainingPlanOfferingId': trainingPlanOfferingId,
      // Optional fields are included only when a value was supplied.
      if (spareInstanceCountPerUltraServer != null)
        'SpareInstanceCountPerUltraServer': spareInstanceCountPerUltraServer,
      if (tags != null) 'Tags': tags,
    },
  );
  return CreateTrainingPlanResponse.fromJson(jsonResponse.body);
}
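The request-building part of the implementation above can be tried in isolation with the standalone sketch below. It is not the library's code: `checkRange` is a hypothetical stand-in for `_s.validateNumRange` (assuming that a null value is skipped), `buildCreateTrainingPlanPayload` is a name invented for this example, and tags are represented as plain maps instead of `Tag` objects.

```dart
import 'dart:convert';

/// Hypothetical stand-in for the generated client's range check.
/// Assumption: a null (omitted) value is not validated.
void checkRange(String name, int? value, int min, int max) {
  if (value == null) return;
  if (value < min || value > max) {
    throw ArgumentError.value(value, name, 'must be between $min and $max');
  }
}

/// Builds the JSON payload with the same keys as the implementation above.
/// Optional keys are omitted entirely when their argument is null.
Map<String, dynamic> buildCreateTrainingPlanPayload({
  required String trainingPlanName,
  required String trainingPlanOfferingId,
  int? spareInstanceCountPerUltraServer,
  List<Map<String, String>>? tags, // plain maps stand in for Tag objects
}) {
  checkRange('spareInstanceCountPerUltraServer',
      spareInstanceCountPerUltraServer, 0, 1152921504606846976);
  return {
    'TrainingPlanName': trainingPlanName,
    'TrainingPlanOfferingId': trainingPlanOfferingId,
    if (spareInstanceCountPerUltraServer != null)
      'SpareInstanceCountPerUltraServer': spareInstanceCountPerUltraServer,
    if (tags != null) 'Tags': tags,
  };
}

void main() {
  // Placeholder name and offering ID for illustration only.
  final minimal = buildCreateTrainingPlanPayload(
    trainingPlanName: 'my-plan',
    trainingPlanOfferingId: 'example-offering-id',
  );
  print(jsonEncode(minimal));
  // {"TrainingPlanName":"my-plan","TrainingPlanOfferingId":"example-offering-id"}

  try {
    buildCreateTrainingPlanPayload(
      trainingPlanName: 'my-plan',
      trainingPlanOfferingId: 'example-offering-id',
      spareInstanceCountPerUltraServer: -1,
    );
  } on ArgumentError catch (e) {
    print('rejected before any request is sent: ${e.name}');
  }
}
```

Using collection `if` keeps absent optional parameters out of the payload altogether, so the request carries no explicit null values for them.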