startClusterHealthCheck method
Starts deep health checks for a SageMaker HyperPod cluster. You can use the DescribeClusterNode API to track the progress of the deep health checks. Unhealthy nodes are automatically rebooted or replaced. For details, see "Resilience-related Kubernetes labels by SageMaker HyperPod".
May throw ResourceNotFound.
Parameter clusterName :
The string name or the Amazon Resource Name (ARN) of the SageMaker
HyperPod cluster.
Parameter deepHealthCheckConfigurations :
A list of configurations containing instance group names, EC2 instance
IDs, and deep health checks to perform.
Implementation
Future<StartClusterHealthCheckResponse> startClusterHealthCheck({
  required String clusterName,
  required List<InstanceGroupHealthCheckConfiguration>
      deepHealthCheckConfigurations,
}) async {
  final headers = <String, String>{
    'Content-Type': 'application/x-amz-json-1.1',
    'X-Amz-Target': 'SageMaker.StartClusterHealthCheck'
  };
  final jsonResponse = await _protocol.send(
    method: 'POST',
    requestUri: '/',
    exceptionFnMap: _exceptionFns,
    // TODO queryParams
    headers: headers,
    payload: {
      'ClusterName': clusterName,
      'DeepHealthCheckConfigurations': deepHealthCheckConfigurations,
    },
  );
  return StartClusterHealthCheckResponse.fromJson(jsonResponse.body);
}
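The payload built in the implementation can be illustrated with a small, self-contained sketch of the JSON body that would be sent. This is not the library's code: HealthCheckConfigSketch is a hypothetical stand-in for InstanceGroupHealthCheckConfiguration, its JSON keys (InstanceGroupName, DeepHealthChecks, InstanceIds) are assumed from the parameter description above, and the cluster name, group name, and the 'InstanceStress' check value are placeholders chosen for illustration.

```dart
import 'dart:convert';

// Hypothetical stand-in for the generated InstanceGroupHealthCheckConfiguration
// model. The real class, its field names, and its JSON keys may differ.
class HealthCheckConfigSketch {
  final String instanceGroupName;
  final List<String> deepHealthChecks;
  final List<String>? instanceIds;

  const HealthCheckConfigSketch({
    required this.instanceGroupName,
    required this.deepHealthChecks,
    this.instanceIds,
  });

  // jsonEncode calls toJson() on objects it cannot encode directly.
  Map<String, dynamic> toJson() => {
        'InstanceGroupName': instanceGroupName,
        'DeepHealthChecks': deepHealthChecks,
        // Assumed optional: only emitted when specific instances are targeted.
        if (instanceIds != null) 'InstanceIds': instanceIds,
      };
}

// Builds a request body laid out like the payload map in the implementation.
String buildStartClusterHealthCheckBody(
  String clusterName,
  List<HealthCheckConfigSketch> configurations,
) {
  return jsonEncode({
    'ClusterName': clusterName,
    'DeepHealthCheckConfigurations': configurations,
  });
}

void main() {
  final body = buildStartClusterHealthCheckBody('my-cluster', [
    const HealthCheckConfigSketch(
      instanceGroupName: 'worker-group', // placeholder group name
      deepHealthChecks: ['InstanceStress'], // assumed check type value
    ),
  ]);
  print(body);
}
```

In real use the generated client performs this serialization and the HTTP call itself, so callers only pass clusterName and deepHealthCheckConfigurations and await the returned Future.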