dataproc/v1 library
Cloud Dataproc API - v1
Manages Hadoop-based clusters and jobs on Google Cloud Platform.
For more information, see cloud.google.com/dataproc/
Create an instance of DataprocApi to access these resources.
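A minimal sketch of that flow, assuming the standard package:googleapis_auth setup with Application Default Credentials; the project ID and region below are placeholder values:

```dart
import 'package:googleapis/dataproc/v1.dart';
import 'package:googleapis_auth/auth_io.dart';

Future<void> main() async {
  // Assumes Application Default Credentials are available in the environment.
  final client = await clientViaApplicationDefaultCredentials(
    scopes: [DataprocApi.cloudPlatformScope],
  );
  try {
    final api = DataprocApi(client);
    // List the clusters in a project and region (placeholder values).
    final response =
        await api.projects.regions.clusters.list('my-project', 'us-central1');
    for (final cluster in response.clusters ?? <Cluster>[]) {
      print('${cluster.clusterName}: ${cluster.status?.state}');
    }
  } finally {
    client.close();
  }
}
```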
Classes
- AcceleratorConfig
- Specifies the type and number of accelerator cards attached to the instances of an instance group.
- AccessSessionSparkApplicationEnvironmentInfoResponse
- Environment details of a Spark Application.
- AccessSessionSparkApplicationJobResponse
- Details of a particular job associated with a Spark Application.
- AccessSessionSparkApplicationResponse
- A summary of a Spark Application.
- AccessSessionSparkApplicationSqlQueryResponse
- Details of a query for a Spark Application.
- AccessSessionSparkApplicationSqlSparkPlanGraphResponse
- SparkPlanGraph for a Spark Application execution, limited to a maximum of 10,000 clusters.
- AccessSessionSparkApplicationStageAttemptResponse
- A Stage Attempt for a Stage of a Spark Application.
- AccessSessionSparkApplicationStageRddOperationGraphResponse
- RDD operation graph for a Spark Application Stage, limited to a maximum of 10,000 clusters.
- AccessSparkApplicationEnvironmentInfoResponse
- Environment details of a Spark Application.
- AccessSparkApplicationJobResponse
- Details of a particular job associated with a Spark Application.
- AccessSparkApplicationResponse
- A summary of a Spark Application.
- AccessSparkApplicationSqlQueryResponse
- Details of a query for a Spark Application.
- AccessSparkApplicationSqlSparkPlanGraphResponse
- SparkPlanGraph for a Spark Application execution, limited to a maximum of 10,000 clusters.
- AccessSparkApplicationStageAttemptResponse
- A Stage Attempt for a Stage of a Spark Application.
- AccessSparkApplicationStageRddOperationGraphResponse
- RDD operation graph for a Spark Application Stage, limited to a maximum of 10,000 clusters.
- AccumulableInfo
- AnalyzeBatchRequest
- A request to analyze a batch workload.
- ApplicationAttemptInfo
- Specific attempt of an application.
- ApplicationEnvironmentInfo
- Details about the Environment that the application is running in.
- ApplicationInfo
- High level information corresponding to an application.
- AppSummary
- AutoscalingConfig
- Autoscaling Policy config associated with the cluster.
- AutoscalingPolicy
- Describes an autoscaling policy for Dataproc cluster autoscaler.
- AutotuningConfig
- Autotuning configuration of the workload.
- AuxiliaryNodeGroup
- Node group identification and configuration information.
- AuxiliaryServicesConfig
- Auxiliary services configuration for a Cluster.
- BasicAutoscalingAlgorithm
- Basic algorithm for autoscaling.
- BasicYarnAutoscalingConfig
- Basic autoscaling configurations for YARN.
- Batch
- A representation of a batch workload in the service.
- Binding
- Associates members, or principals, with a role.
- BuildInfo
- Native Build Info
- Cluster
- Describes the identifying information, config, and status of a Dataproc cluster.
- ClusterConfig
- The cluster config.
- ClusterMetrics
- Contains cluster daemon metrics, such as HDFS and YARN stats. Beta Feature: This report is available for testing purposes only.
- ClusterSelector
- A selector that chooses target cluster for jobs based on metadata.
- ClusterStatus
- The status of a cluster and its instances.
- ClusterToRepair
- Cluster to be repaired.
- ConsolidatedExecutorSummary
- Consolidated summary about executors used by the application.
- DataprocApi
- Manages Hadoop-based clusters and jobs on Google Cloud Platform.
- DataprocMetricConfig
- Dataproc metric config.
- DiagnoseClusterRequest
- A request to collect cluster diagnostic information.
- DiskConfig
- Specifies the config of disk options for a group of VM instances.
- DriverSchedulingConfig
- Driver scheduling configuration.
- EncryptionConfig
- Encryption settings for the cluster.
- EndpointConfig
- Endpoint config for this cluster.
- EnvironmentConfig
- Environment configuration for a workload.
- ExecutionConfig
- Execution configuration for a workload.
- ExecutorMetrics
- ExecutorMetricsDistributions
- ExecutorPeakMetricsDistributions
- ExecutorResourceRequest
- Resources used per executor used by the application.
- ExecutorStageSummary
- Executor resources consumed by a stage.
- ExecutorSummary
- Details about executors used by the application.
- FallbackReason
- Native SQL Execution Data
- FlinkJob
- A Dataproc job for running Apache Flink applications on YARN.
- GceClusterConfig
- Common config settings for resources of Compute Engine cluster instances, applicable to all instances in the cluster.
- GetIamPolicyRequest
- Request message for GetIamPolicy method.
- GkeClusterConfig
- The cluster's GKE config.
- GkeNodeConfig
- Parameters that describe cluster nodes.
- GkeNodePoolAcceleratorConfig
- A GkeNodeConfigAcceleratorConfig represents a Hardware Accelerator request for a node pool.
- GkeNodePoolAutoscalingConfig
- GkeNodePoolAutoscaling contains information the cluster autoscaler needs to adjust the size of the node pool to the current cluster usage.
- GkeNodePoolConfig
- The configuration of a GKE node pool used by a Dataproc-on-GKE cluster (https://cloud.google.com/dataproc/docs/concepts/jobs/dataproc-gke#create-a-dataproc-on-gke-cluster).
- GkeNodePoolTarget
- GKE node pools that Dataproc workloads run on.
- GoogleCloudDataprocV1WorkflowTemplateEncryptionConfig
- Encryption settings for encrypting workflow template job arguments.
- HadoopJob
- A Dataproc job for running Apache Hadoop MapReduce (https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html) jobs on Apache Hadoop YARN (https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html).
- HiveJob
- A Dataproc job for running Apache Hive (https://hive.apache.org/) queries on YARN.
- IdentityConfig
- Identity-related configuration, including service account based secure multi-tenancy user mappings.
- InjectCredentialsRequest
- A request to inject credentials into a cluster.
- InputQuantileMetrics
- InstanceFlexibilityPolicy
- Instance flexibility policy allowing a mixture of VM shapes and provisioning models.
- InstanceGroupAutoscalingPolicyConfig
- Configuration for the size bounds of an instance group, including its proportional size to other groups.
- InstanceGroupConfig
- The config settings for Compute Engine resources in an instance group, such as a master or worker group.
- InstanceReference
- A reference to a Compute Engine instance.
- InstanceSelection
- Defines machine types and a rank to which the machine types belong.
- InstanceSelectionResult
- Defines a mapping from machine types to the number of VMs that are created with each machine type.
- InstantiateWorkflowTemplateRequest
- A request to instantiate a workflow template.
- Interval
- Represents a time interval, encoded as a Timestamp start (inclusive) and a Timestamp end (exclusive). The start must be less than or equal to the end.
- Job
- A Dataproc job resource.
- JobData
- Data corresponding to a spark job.
- JobPlacement
- Dataproc job config.
- JobReference
- Encapsulates the full scoping used to reference a job.
- JobScheduling
- Job scheduling options.
- JobsSummary
- Data related to the Jobs page summary.
- JobStatus
- Dataproc job status.
- JupyterConfig
- Jupyter configuration for an interactive session.
- KerberosConfig
- Specifies Kerberos related configuration.
- KubernetesClusterConfig
- The configuration for running the Dataproc cluster on Kubernetes.
- KubernetesSoftwareConfig
- The software configuration for this Dataproc cluster running on Kubernetes.
- LifecycleConfig
- Specifies the cluster auto-delete schedule configuration.
- ListAutoscalingPoliciesResponse
- A response to a request to list autoscaling policies in a project.
- ListBatchesResponse
- A list of batch workloads.
- ListClustersResponse
- The list of all clusters in a project.
- ListJobsResponse
- A list of jobs in a project.
- ListOperationsResponse
- The response message for Operations.ListOperations.
- ListSessionsResponse
- A list of interactive sessions.
- ListSessionTemplatesResponse
- A list of session templates.
- ListWorkflowTemplatesResponse
- A response to a request to list workflow templates in a project.
- LoggingConfig
- The runtime logging config of the job.
- ManagedCluster
- Cluster that is managed by the workflow.
- ManagedGroupConfig
- Specifies the resources used to actively manage an instance group.
- MemoryMetrics
- MetastoreConfig
- Specifies a Metastore configuration.
- Metric
- A Dataproc custom metric.
- NamespacedGkeDeploymentTarget
- Used only for the deprecated beta.
- NativeBuildInfoUiData
- NativeSqlExecutionUiData
- Native SQL Execution Data
- NodeGroup
- Dataproc Node Group.
- NodeGroupAffinity
- Node Group Affinity for clusters using sole-tenant node groups.
- NodeInitializationAction
- Specifies an executable to run on a fully configured node and a timeout period for executable completion.
- NodePool
- Indicates a list of workers of the same type.
- Operation
- This resource represents a long-running operation that is the result of a network API call.
- OrderedJob
- A job executed by the workflow.
- OutputQuantileMetrics
- ParameterValidation
- Configuration for parameter validation.
- PeripheralsConfig
- Auxiliary services configuration for a workload.
- PigJob
- A Dataproc job for running Apache Pig (https://pig.apache.org/) queries on YARN.
- Policy
- An Identity and Access Management (IAM) policy, which specifies access controls for Google Cloud resources. A Policy is a collection of bindings.
- PoolData
- Pool Data
- PrestoJob
- A Dataproc job for running Presto (https://prestosql.io/) queries.
- ProcessSummary
- Process Summary
- ProjectsLocationsAutoscalingPoliciesResource
- ProjectsLocationsBatchesResource
- ProjectsLocationsBatchesSparkApplicationsResource
- ProjectsLocationsOperationsResource
- ProjectsLocationsResource
- ProjectsLocationsSessionsResource
- ProjectsLocationsSessionsSparkApplicationsResource
- ProjectsLocationsSessionTemplatesResource
- ProjectsLocationsWorkflowTemplatesResource
- ProjectsRegionsAutoscalingPoliciesResource
- ProjectsRegionsClustersNodeGroupsResource
- ProjectsRegionsClustersResource
- ProjectsRegionsJobsResource
- ProjectsRegionsOperationsResource
- ProjectsRegionsResource
- ProjectsRegionsWorkflowTemplatesResource
- ProjectsResource
- ProvisioningModelMix
- Defines how Dataproc should create VMs with a mixture of provisioning models.
- PyPiRepositoryConfig
- Configuration for a PyPi repository.
- PySparkBatch
- A configuration for running an Apache PySpark (https://spark.apache.org/docs/latest/api/python/getting_started/quickstart.html) batch workload.
- PySparkJob
- A Dataproc job for running Apache PySpark (https://spark.apache.org/docs/latest/api/python/index.html#pyspark-overview) applications on YARN.
- Quantiles
- Quantile metrics data related to Tasks.
- QueryList
- A list of queries to run on a cluster.
- RddDataDistribution
- Details about RDD usage.
- RddOperationCluster
- A grouping of nodes representing higher-level constructs (stage, job, etc.).
- RddOperationEdge
- A directed edge representing dependency between two RDDs.
- RddOperationGraph
- Graph representing RDD dependencies.
- RddOperationNode
- A node in the RDD operation graph.
- RddPartitionInfo
- Information about RDD partitions.
- RddStorageInfo
- Overall data about RDD storage.
- RegexValidation
- Validation based on regular expressions.
- RepairClusterRequest
- A request to repair a cluster.
- RepairNodeGroupRequest
- RepositoryConfig
- Configuration for dependency repositories.
- ReservationAffinity
- Reservation Affinity for consuming zonal reservations.
- ResizeNodeGroupRequest
- A request to resize a node group.
- ResourceInformation
- ResourceProfileInfo
- Resource profile that contains information about all the resources required by executors and tasks.
- RuntimeConfig
- Runtime configuration for a workload.
- RuntimeInfo
- Runtime information about workload execution.
- SearchSessionSparkApplicationExecutorsResponse
- List of Executors associated with a Spark Application.
- SearchSessionSparkApplicationExecutorStageSummaryResponse
- List of Executors associated with a Spark Application Stage.
- SearchSessionSparkApplicationJobsResponse
- A list of Jobs associated with a Spark Application.
- SearchSessionSparkApplicationSqlQueriesResponse
- List of all queries for a Spark Application.
- SearchSessionSparkApplicationsResponse
- A list of summaries of Spark Applications.
- SearchSessionSparkApplicationStageAttemptsResponse
- A list of Stage Attempts for a Stage of a Spark Application.
- SearchSessionSparkApplicationStageAttemptTasksResponse
- A list of tasks for a stage of a Spark Application.
- SearchSessionSparkApplicationStagesResponse
- A list of stages associated with a Spark Application.
- SearchSparkApplicationExecutorsResponse
- List of Executors associated with a Spark Application.
- SearchSparkApplicationExecutorStageSummaryResponse
- List of Executors associated with a Spark Application Stage.
- SearchSparkApplicationJobsResponse
- A list of Jobs associated with a Spark Application.
- SearchSparkApplicationSqlQueriesResponse
- List of all queries for a Spark Application.
- SearchSparkApplicationsResponse
- A list of summaries of Spark Applications.
- SearchSparkApplicationStageAttemptsResponse
- A list of Stage Attempts for a Stage of a Spark Application.
- SearchSparkApplicationStageAttemptTasksResponse
- A list of tasks for a stage of a Spark Application.
- SearchSparkApplicationStagesResponse
- A list of stages associated with a Spark Application.
- SecurityConfig
- Security-related configuration, including encryption, Kerberos, etc.
- Session
- A representation of a session.
- SessionStateHistory
- Historical state information.
- SessionTemplate
- A representation of a session template.
- SetIamPolicyRequest
- Request message for SetIamPolicy method.
- ShieldedInstanceConfig
- Shielded Instance Config for clusters using Compute Engine Shielded VMs (https://cloud.google.com/security/shielded-cloud/shielded-vm).
- ShufflePushReadQuantileMetrics
- ShuffleReadMetrics
- Shuffle data read by the task.
- ShuffleReadQuantileMetrics
- ShuffleWriteQuantileMetrics
- SinkProgress
- SoftwareConfig
- Specifies the selection and config of software inside the cluster.
- SourceProgress
- SparkApplication
- A summary of a Spark Application.
- SparkBatch
- A configuration for running an Apache Spark (https://spark.apache.org/) batch workload.
- SparkHistoryServerConfig
- Spark History Server configuration for the workload.
- SparkJob
- A Dataproc job for running Apache Spark (https://spark.apache.org/) applications on YARN.
- SparkPlanGraph
- A graph used for storing information about the executionPlan of a DataFrame.
- SparkPlanGraphCluster
- Represents a tree of Spark plan nodes.
- SparkPlanGraphEdge
- Represents a directed edge in the Spark plan tree from child to parent.
- SparkPlanGraphNode
- Represents a node in the Spark plan tree.
- SparkPlanGraphNodeWrapper
- Wrapper used to represent either a node or a cluster.
- SparkRBatch
- A configuration for running an Apache SparkR (https://spark.apache.org/docs/latest/sparkr.html) batch workload.
- SparkRJob
- A Dataproc job for running Apache SparkR (https://spark.apache.org/docs/latest/sparkr.html) applications on YARN.
- SparkRuntimeInfo
- SparkSqlBatch
- A configuration for running Apache Spark SQL (https://spark.apache.org/sql/) queries as a batch workload.
- SparkSqlJob
- A Dataproc job for running Apache Spark SQL (https://spark.apache.org/sql/) queries.
- SparkStandaloneAutoscalingConfig
- Basic autoscaling configurations for Spark Standalone.
- SparkWrapperObject
- Outer message that contains the data obtained from the Spark listener, packaged with information that is required to process it.
- SpeculationStageSummary
- Details of the speculation task when speculative execution is enabled.
- SqlExecutionUiData
- SQL Execution Data
- SqlPlanMetric
- Metrics related to SQL execution.
- StageAttemptTasksSummary
- Data related to the tasks summary for a Spark Stage Attempt.
- StageData
- Data corresponding to a stage.
- StageMetrics
- Stage Level Aggregated Metrics
- StageShuffleReadMetrics
- Shuffle data read for the stage.
- StagesSummary
- Data related to the Stages page summary.
- StartClusterRequest
- A request to start a cluster.
- StartupConfig
- Configuration to handle the startup of instances during the cluster create and update process.
- StateHistory
- Historical state information.
- StateOperatorProgress
- StopClusterRequest
- A request to stop a cluster.
- StreamBlockData
- Stream Block Data.
- StreamingQueryData
- Streaming
- StreamingQueryProgress
- SubmitJobRequest
- A request to submit a job (see the usage sketch after this class list).
- SummarizeSessionSparkApplicationExecutorsResponse
- Consolidated summary of executors for a Spark Application.
- SummarizeSessionSparkApplicationJobsResponse
- Summary of a Spark Application's jobs.
- SummarizeSessionSparkApplicationStageAttemptTasksResponse
- Summary of tasks for a Spark Application stage attempt.
- SummarizeSessionSparkApplicationStagesResponse
- Summary of a Spark Application's stages.
- SummarizeSparkApplicationExecutorsResponse
- Consolidated summary of executors for a Spark Application.
- SummarizeSparkApplicationJobsResponse
- Summary of a Spark Application's jobs.
- SummarizeSparkApplicationStageAttemptTasksResponse
- Summary of tasks for a Spark Application stage attempt.
- SummarizeSparkApplicationStagesResponse
- Summary of a Spark Application's stages.
- TaskData
- Data corresponding to tasks created by Spark.
- TaskMetrics
- Executor Task Metrics
- TaskQuantileMetrics
- TaskResourceRequest
- Resources used per task created by the application.
- TemplateParameter
- A configurable parameter that replaces one or more fields in the template.
- TerminateSessionRequest
- A request to terminate an interactive session.
- TrinoJob
- A Dataproc job for running Trino (https://trino.io/) queries.
- UsageMetrics
- Usage metrics represent approximate total resources consumed by a workload.
- UsageSnapshot
- The usage snapshot represents the resources consumed by a workload at a specified time.
- ValueValidation
- Validation based on a list of allowed values.
- VirtualClusterConfig
- The Dataproc cluster config for a cluster that does not directly control the underlying compute resources, such as a Dataproc-on-GKE cluster (https://cloud.google.com/dataproc/docs/guides/dpgke/dataproc-gke-overview).
- WorkflowTemplate
- A Dataproc workflow template resource.
- WorkflowTemplatePlacement
- Specifies workflow execution target. Either managed_cluster or cluster_selector is required.
- WriteSessionSparkApplicationContextRequest
- Write Spark Application data to internal storage systems.
- WriteSparkApplicationContextRequest
- Write Spark Application data to internal storage systems.
- YarnApplication
- A YARN application created by a job.
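As referenced above for SubmitJobRequest, a hedged sketch of submitting a Spark job to an existing cluster; the cluster name, project ID, and region are placeholders, and the jar path assumes the stock Spark examples shipped on Dataproc images:

```dart
import 'package:googleapis/dataproc/v1.dart';

// Sketch: submit a SparkPi job to an existing cluster (placeholder values).
Future<Job> submitSparkPi(DataprocApi api) {
  final request = SubmitJobRequest(
    job: Job(
      placement: JobPlacement(clusterName: 'my-cluster'),
      sparkJob: SparkJob(
        mainClass: 'org.apache.spark.examples.SparkPi',
        jarFileUris: [
          'file:///usr/lib/spark/examples/jars/spark-examples.jar',
        ],
        args: ['1000'],
      ),
    ),
  );
  // Returns the created Job resource; poll its status to track completion.
  return api.projects.regions.jobs.submit(request, 'my-project', 'us-central1');
}
```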
Typedefs
- CancelJobRequest = $Empty
- A request to cancel a job.
- ConfidentialInstanceConfig = $ConfidentialInstanceConfig
- Confidential Instance Config for clusters using Confidential VMs (https://cloud.google.com/compute/confidential-vm/docs).
- Empty = $Empty
- A generic empty message that you can re-use to avoid defining duplicated empty messages in your APIs.
- Expr = $Expr
- Represents a textual expression in the Common Expression Language (CEL) syntax.
- GetPolicyOptions = $GetPolicyOptions01
- Encapsulates settings provided to GetIamPolicy.
- InputMetrics = $InputMetrics
- Metrics about the input data read by the task.
- OutputMetrics = $OutputMetrics
- Metrics about the data written by the task.
- ShufflePushReadMetrics = $ShufflePushReadMetrics
- ShuffleWriteMetrics = $ShuffleWriteMetrics
- Shuffle data written by the task.
- SparkConnectConfig = $Empty
- Spark Connect configuration for an interactive session.
- StageInputMetrics = $InputMetrics
- Metrics about the input read by the stage.
- StageOutputMetrics = $OutputMetrics
- Metrics about the output written by the stage.
- StageShufflePushReadMetrics = $ShufflePushReadMetrics
- StageShuffleWriteMetrics = $ShuffleWriteMetrics
- Shuffle data written for the stage.
- Status = $Status00
- The Status type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs.
- TestIamPermissionsRequest = $TestIamPermissionsRequest01
- Request message for TestIamPermissions method.
- TestIamPermissionsResponse = $TestIamPermissionsResponse
- Response message for TestIamPermissions method.
- WriteSessionSparkApplicationContextResponse = $Empty
- Response returned as an acknowledgement of receipt of data.
- WriteSparkApplicationContextResponse = $Empty
- Response returned as an acknowledgement of receipt of data.
Exceptions / Errors
- ApiRequestError
- Represents a general error reported by the API endpoint.
- DetailedApiRequestError
- Represents a specific error reported by the API endpoint.
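A sketch of distinguishing the two error types around any call, assuming the usual commons-layer fields (status and message) on DetailedApiRequestError; the stop-cluster call is just an illustrative choice:

```dart
import 'package:googleapis/dataproc/v1.dart';

// Sketch: separate structured API errors from general request failures.
Future<void> stopClusterSafely(
    DataprocApi api, String project, String region, String cluster) async {
  try {
    await api.projects.regions.clusters
        .stop(StopClusterRequest(), project, region, cluster);
  } on DetailedApiRequestError catch (e) {
    // The endpoint reported a structured error with an HTTP status code.
    print('Dataproc API error ${e.status}: ${e.message}');
  } on ApiRequestError catch (e) {
    // A general error without structured detail.
    print('Request failed: ${e.message}');
  }
}
```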