startMLDataProcessingJob method
- required String inputDataS3Location,
- required String processedDataS3Location,
- String? configFileName,
- String? id,
- String? modelType,
- String? neptuneIamRoleArn,
- String? previousDataProcessingJobId,
- String? processingInstanceType,
- int? processingInstanceVolumeSizeInGB,
- int? processingTimeOutInSeconds,
- String? s3OutputEncryptionKMSKey,
- String? sagemakerIamRoleArn,
- List<String>? securityGroupIds,
- List<String>? subnets,
- String? volumeEncryptionKMSKey,
Creates a new Neptune ML data processing job that processes graph data
exported from Neptune so that it can be used for training. See "The
dataprocessing command" in the Neptune User Guide.
When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:StartMLModelDataProcessingJob IAM action in that cluster.
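As an illustration of the IAM requirement above, the following is a minimal sketch that builds an identity-based policy document allowing only this action. The helper name buildStartDataProcessingPolicy is hypothetical, and the region, account ID, and cluster resource ID in the usage are placeholders. The neptune-db resource ARN is scoped by the cluster's resource ID, not its name.

```dart
import 'dart:convert';

/// Hypothetical helper: returns a JSON IAM policy that allows
/// neptune-db:StartMLModelDataProcessingJob on a single Neptune cluster.
String buildStartDataProcessingPolicy({
  required String region,
  required String accountId,
  required String clusterResourceId, // e.g. "cluster-EXAMPLE" (placeholder)
}) {
  final policy = {
    'Version': '2012-10-17',
    'Statement': [
      {
        'Effect': 'Allow',
        'Action': ['neptune-db:StartMLModelDataProcessingJob'],
        // Data-access ARNs use the cluster *resource ID*, followed by /*.
        'Resource': [
          'arn:aws:neptune-db:$region:$accountId:$clusterResourceId/*',
        ],
      },
    ],
  };
  return const JsonEncoder.withIndent('  ').convert(policy);
}
```

Attach the resulting document to the IAM user or role that calls startMLDataProcessingJob; in practice you would likely combine it with the other Neptune ML actions your workflow needs.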
May throw BadRequestException.
May throw ClientTimeoutException.
May throw ConstraintViolationException.
May throw IllegalArgumentException.
May throw InvalidArgumentException.
May throw InvalidParameterException.
May throw MissingParameterException.
May throw MLResourceNotFoundException.
May throw PreconditionsFailedException.
May throw TooManyRequestsException.
May throw UnsupportedOperationException.
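Of the exceptions listed above, TooManyRequestsException indicates throttling, which callers commonly handle by retrying with exponential backoff. The sketch below is a generic helper under that assumption: ThrottledError is a hypothetical stand-in for the client's TooManyRequestsException, and the 200 ms base delay, 10 s cap, and 5 attempts are arbitrary illustrative values, not Neptune limits. It deliberately retries only throttling errors.

```dart
import 'dart:math';

/// Hypothetical stand-in for TooManyRequestsException. In real code, match on
/// the exception type your client library actually throws.
class ThrottledError implements Exception {}

/// Exponential backoff: base * 2^attempt for a zero-based [attempt],
/// capped at [maxDelay].
Duration backoffDelay(
  int attempt, {
  Duration base = const Duration(milliseconds: 200),
  Duration maxDelay = const Duration(seconds: 10),
}) {
  // Clamp the exponent so the multiplication cannot overflow.
  final ms = base.inMilliseconds * pow(2, min(attempt, 30));
  return Duration(milliseconds: min(ms, maxDelay.inMilliseconds).toInt());
}

bool _isThrottle(Object error) => error is ThrottledError;

/// Calls [call], retrying with backoff while [isRetryable] accepts the error,
/// for at most [maxAttempts] attempts in total. Other errors are rethrown.
Future<T> retryOnThrottle<T>(
  Future<T> Function() call, {
  int maxAttempts = 5,
  bool Function(Object error) isRetryable = _isThrottle,
  Duration base = const Duration(milliseconds: 200),
}) async {
  var attempt = 0;
  while (true) {
    try {
      return await call();
    } catch (error) {
      attempt++;
      if (!isRetryable(error) || attempt >= maxAttempts) rethrow;
      await Future<void>.delayed(backoffDelay(attempt - 1, base: base));
    }
  }
}
```

You would wrap the request as retryOnThrottle(() => client.startMLDataProcessingJob(...)), with an isRetryable predicate that recognizes your client's throttling exception.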
Parameter inputDataS3Location :
The URI of the Amazon S3 location where you want SageMaker to download the
data needed to run the data processing job.
Parameter processedDataS3Location :
The URI of the Amazon S3 location where you want SageMaker to save the
results of a data processing job.
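Both S3 location parameters are URIs, conventionally written as s3://bucket/prefix. Before sending a request, you might run a cheap client-side sanity check like the sketch below. The function isS3Uri is hypothetical, and it only checks the shape of the string: it does not verify that the bucket exists or that SageMaker can access it.

```dart
/// Returns true if [location] has the form s3://<bucket>[/<prefix>].
/// Shape check only; no network access or permission checks.
bool isS3Uri(String location) {
  final uri = Uri.tryParse(location);
  return uri != null && uri.scheme == 's3' && uri.host.isNotEmpty;
}
```

For example, isS3Uri('s3://my-bucket/neptune-export/') returns true, while an HTTPS object URL or a bare "bucket/path" string returns false.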
Parameter configFileName :
A data specification file that describes how to load the exported graph
data for training. The file is automatically generated by the Neptune
export toolkit. The default is
training-data-configuration.json.
Parameter id :
A unique identifier for the new job. The default is an autogenerated UUID.
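If you prefer to know the job ID before the request is sent (for example, to log it or to pass it to later calls), you can generate a UUID yourself and pass it as id. The following is a minimal sketch of a random version 4 UUID generator using only dart:math. The function name randomUuidV4 is hypothetical; a maintained UUID package would serve equally well.

```dart
import 'dart:math';

/// Generates a random RFC 4122 version 4 UUID string.
String randomUuidV4([Random? rng]) {
  final r = rng ?? Random.secure();
  final bytes = List<int>.generate(16, (_) => r.nextInt(256));
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // set version nibble to 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set RFC 4122 variant bits (10xx)
  final hex = bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join();
  return '${hex.substring(0, 8)}-${hex.substring(8, 12)}-'
      '${hex.substring(12, 16)}-${hex.substring(16, 20)}-${hex.substring(20)}';
}
```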
Parameter modelType :
One of the two model types that Neptune ML currently supports:
heterogeneous graph models (heterogeneous) and knowledge graph models
(kge). The default is none; if not specified, Neptune ML chooses the model
type automatically based on the data.
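Because modelType is a free-form String? in the signature, a typo such as "kg" is only caught by the service. A small enum, sketched below, keeps call sites limited to the two documented values and maps "not specified" to null so the parameter is omitted. NeptuneModelType and modelTypeParam are hypothetical names, not part of the client API.

```dart
/// The two documented model types and their wire values.
enum NeptuneModelType {
  heterogeneous('heterogeneous'),
  kge('kge');

  const NeptuneModelType(this.wireValue);
  final String wireValue;
}

/// Converts an optional enum value into the String? the API expects.
/// null means "let Neptune ML choose the model type automatically".
String? modelTypeParam(NeptuneModelType? type) => type?.wireValue;
```

A call would then pass modelType: modelTypeParam(NeptuneModelType.kge), or modelTypeParam(null) to keep the automatic choice.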
Parameter neptuneIamRoleArn :
The Amazon Resource Name (ARN) of an IAM role that SageMaker can assume to
perform tasks on your behalf. This must be listed in your DB cluster
parameter group or an error will occur.
Parameter previousDataProcessingJobId :
The job ID of a completed data processing job run on an earlier version of
the data.
Parameter processingInstanceType :
The type of ML instance used during data processing. Its memory should be
large enough to hold the processed dataset. The default is the smallest
ml.r5 type whose memory is ten times larger than the size of the exported
graph data on disk.
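The default sizing rule for processingInstanceType can be reproduced client-side, for example to estimate cost before starting a job. The sketch below reads "ten times larger" as "at least ten times the exported data size". The memory table lists the standard r5 sizes (8 GiB per vCPU); it is an assumption that all of them are offered for SageMaker processing in your Region, so check the SageMaker instance documentation. defaultProcessingInstanceType is a hypothetical name.

```dart
/// Memory (GiB) of ml.r5 instance sizes, smallest first.
/// Assumption: standard r5 sizes; verify availability in your Region.
const r5MemoryGiB = <String, int>{
  'ml.r5.large': 16,
  'ml.r5.xlarge': 32,
  'ml.r5.2xlarge': 64,
  'ml.r5.4xlarge': 128,
  'ml.r5.8xlarge': 256,
  'ml.r5.12xlarge': 384,
  'ml.r5.16xlarge': 512,
  'ml.r5.24xlarge': 768,
};

/// Smallest ml.r5 type with memory >= 10x the exported data size,
/// or null if even the largest size is too small.
String? defaultProcessingInstanceType(double exportedDataGiB) {
  final neededGiB = exportedDataGiB * 10;
  for (final entry in r5MemoryGiB.entries) {
    if (entry.value >= neededGiB) return entry.key;
  }
  return null;
}
```

For 5 GiB of exported data, 50 GiB of memory is needed, so the sketch picks ml.r5.2xlarge (64 GiB).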
Parameter processingInstanceVolumeSizeInGB :
The disk volume size of the processing instance. Both input data and
processed data are stored on disk, so the volume size must be large enough
to hold both data sets. The default is 0. If not specified or 0, Neptune
ML chooses the volume size automatically based on the data size.
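If you set processingInstanceVolumeSizeInGB explicitly, the volume has to hold the input and the processed data at the same time. The sketch below adds the two sizes and applies a safety margin. The function name and the 1.5x headroom factor are illustrative assumptions, not Neptune ML rules.

```dart
/// Hypothetical sizing helper: input + processed data, times a headroom
/// factor, rounded up to whole gigabytes.
int estimateVolumeSizeInGB(double inputGB, double processedGB,
    {double headroom = 1.5}) {
  if (inputGB < 0 || processedGB < 0 || headroom < 1) {
    throw ArgumentError('sizes must be non-negative and headroom >= 1');
  }
  return ((inputGB + processedGB) * headroom).ceil();
}
```

For 10 GB of input and an expected 20 GB of processed output, this returns 45. If you cannot estimate the processed size, leaving the parameter unset (or 0) lets Neptune ML choose.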
Parameter processingTimeOutInSeconds :
Timeout in seconds for the data processing job. The default is 86,400 (1
day).
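The default timeout is simply one day expressed in seconds: 24 × 60 × 60 = 86,400. Using Dart's Duration keeps that arithmetic out of call sites; timeoutSeconds below is a hypothetical helper that rejects non-positive values and truncates any sub-second remainder.

```dart
/// Converts a Duration to a processingTimeOutInSeconds value.
int timeoutSeconds(Duration d) {
  if (d <= Duration.zero) {
    throw ArgumentError.value(d, 'd', 'timeout must be positive');
  }
  return d.inSeconds; // truncates sub-second parts
}
```

For example, processingTimeOutInSeconds: timeoutSeconds(const Duration(hours: 6)) sends 21600.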
Parameter s3OutputEncryptionKMSKey :
The Amazon Key Management Service (Amazon KMS) key that SageMaker uses to
encrypt the output of the processing job. The default is none.
Parameter sagemakerIamRoleArn :
The ARN of an IAM role for SageMaker execution. This must be listed in
your DB cluster parameter group or an error will occur.
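Both neptuneIamRoleArn and sagemakerIamRoleArn take IAM role ARNs of the form arn:aws:iam::<12-digit account>:role/<path/name>. The sketch below is a shape check only, under the assumption that the partition is aws or a variant such as aws-cn or aws-us-gov. Passing it does not mean the role is listed in the DB cluster parameter group, which the service still requires. looksLikeIamRoleArn is a hypothetical name.

```dart
/// Format check for IAM role ARNs (partition, 12-digit account, role path).
final _roleArn = RegExp(r'^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$');

bool looksLikeIamRoleArn(String arn) => _roleArn.hasMatch(arn);
```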
Parameter securityGroupIds :
The VPC security group IDs. The default is none.
Parameter subnets :
The IDs of the subnets in the Neptune VPC. The default is none.
Parameter volumeEncryptionKMSKey :
The Amazon Key Management Service (Amazon KMS) key that SageMaker uses to
encrypt data on the storage volume attached to the ML compute instances
that run the data processing job. The default is none.
Implementation
Future<StartMLDataProcessingJobOutput> startMLDataProcessingJob({
  required String inputDataS3Location,
  required String processedDataS3Location,
  String? configFileName,
  String? id,
  String? modelType,
  String? neptuneIamRoleArn,
  String? previousDataProcessingJobId,
  String? processingInstanceType,
  int? processingInstanceVolumeSizeInGB,
  int? processingTimeOutInSeconds,
  String? s3OutputEncryptionKMSKey,
  String? sagemakerIamRoleArn,
  List<String>? securityGroupIds,
  List<String>? subnets,
  String? volumeEncryptionKMSKey,
}) async {
  final $payload = <String, dynamic>{
    'inputDataS3Location': inputDataS3Location,
    'processedDataS3Location': processedDataS3Location,
    if (configFileName != null) 'configFileName': configFileName,
    if (id != null) 'id': id,
    if (modelType != null) 'modelType': modelType,
    if (neptuneIamRoleArn != null) 'neptuneIamRoleArn': neptuneIamRoleArn,
    if (previousDataProcessingJobId != null)
      'previousDataProcessingJobId': previousDataProcessingJobId,
    if (processingInstanceType != null)
      'processingInstanceType': processingInstanceType,
    if (processingInstanceVolumeSizeInGB != null)
      'processingInstanceVolumeSizeInGB': processingInstanceVolumeSizeInGB,
    if (processingTimeOutInSeconds != null)
      'processingTimeOutInSeconds': processingTimeOutInSeconds,
    if (s3OutputEncryptionKMSKey != null)
      's3OutputEncryptionKMSKey': s3OutputEncryptionKMSKey,
    if (sagemakerIamRoleArn != null)
      'sagemakerIamRoleArn': sagemakerIamRoleArn,
    if (securityGroupIds != null) 'securityGroupIds': securityGroupIds,
    if (subnets != null) 'subnets': subnets,
    if (volumeEncryptionKMSKey != null)
      'volumeEncryptionKMSKey': volumeEncryptionKMSKey,
  };
  final response = await _protocol.send(
    payload: $payload,
    method: 'POST',
    requestUri: '/ml/dataprocessing',
    exceptionFnMap: _exceptionFns,
  );
  return StartMLDataProcessingJobOutput.fromJson(response);
}
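The implementation builds the POST body for /ml/dataprocessing with Dart's collection-if: the two required fields are always present, and each optional field is added only when it is non-null, so unset parameters never appear in the JSON and the service falls back to the defaults described above. The sketch below is a reduced, hypothetical copy of that step (buildPayload is not part of the client) that makes the effect visible with jsonEncode.

```dart
import 'dart:convert';

/// Reduced copy of the payload-building step: optional fields are included
/// only when non-null (collection-if), so omitted parameters are not sent.
Map<String, dynamic> buildPayload({
  required String inputDataS3Location,
  required String processedDataS3Location,
  String? modelType,
  int? processingTimeOutInSeconds,
}) =>
    <String, dynamic>{
      'inputDataS3Location': inputDataS3Location,
      'processedDataS3Location': processedDataS3Location,
      if (modelType != null) 'modelType': modelType,
      if (processingTimeOutInSeconds != null)
        'processingTimeOutInSeconds': processingTimeOutInSeconds,
    };
```

Calling jsonEncode(buildPayload(inputDataS3Location: 's3://b/in', processedDataS3Location: 's3://b/out')) yields {"inputDataS3Location":"s3://b/in","processedDataS3Location":"s3://b/out"}, with no modelType or timeout keys at all rather than explicit nulls.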