dataproc/v1 library
Cloud Dataproc API - v1
Manages Hadoop-based clusters and jobs on Google Cloud Platform.
For more information, see cloud.google.com/dataproc/
Create an instance of DataprocApi to access these resources.
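A minimal sketch of that flow, assuming the standard package:googleapis_auth setup with Application Default Credentials; the project ID and region below are placeholder values:

```dart
import 'package:googleapis/dataproc/v1.dart';
import 'package:googleapis_auth/auth_io.dart';

Future<void> main() async {
  // Assumes Application Default Credentials are available in the environment.
  final client = await clientViaApplicationDefaultCredentials(
    scopes: [DataprocApi.cloudPlatformScope],
  );
  try {
    final api = DataprocApi(client);
    // List the clusters in a project and region (placeholder values).
    final response =
        await api.projects.regions.clusters.list('my-project', 'us-central1');
    for (final cluster in response.clusters ?? <Cluster>[]) {
      print('${cluster.clusterName}: ${cluster.status?.state}');
    }
  } finally {
    client.close();
  }
}
```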
Classes
- AcceleratorConfig
- Specifies the type and number of accelerator cards attached to the instances of an instance group.
- AccessSessionSparkApplicationEnvironmentInfoResponse
- Environment details of a Spark Application.
- AccessSessionSparkApplicationJobResponse
- Details of a particular job associated with a Spark Application.
- AccessSessionSparkApplicationResponse
- A summary of a Spark Application.
- AccessSessionSparkApplicationSqlQueryResponse
- Details of a query for a Spark Application.
- AccessSessionSparkApplicationSqlSparkPlanGraphResponse
- SparkPlanGraph for a Spark Application execution, limited to a maximum of 10,000 clusters.
- AccessSessionSparkApplicationStageAttemptResponse
- A Stage Attempt for a Stage of a Spark Application.
- AccessSessionSparkApplicationStageRddOperationGraphResponse
- RDD operation graph for a Spark Application Stage, limited to a maximum of 10,000 clusters.
- AccessSparkApplicationEnvironmentInfoResponse
- Environment details of a Spark Application.
- AccessSparkApplicationJobResponse
- Details of a particular job associated with a Spark Application.
- AccessSparkApplicationResponse
- A summary of a Spark Application.
- AccessSparkApplicationSqlQueryResponse
- Details of a query for a Spark Application.
- AccessSparkApplicationSqlSparkPlanGraphResponse
- SparkPlanGraph for a Spark Application execution, limited to a maximum of 10,000 clusters.
- AccessSparkApplicationStageAttemptResponse
- A Stage Attempt for a Stage of a Spark Application.
- AccessSparkApplicationStageRddOperationGraphResponse
- RDD operation graph for a Spark Application Stage, limited to a maximum of 10,000 clusters.
- AccumulableInfo
- AnalyzeBatchRequest
- A request to analyze a batch workload.
- ApplicationAttemptInfo
- Specific attempt of an application.
- ApplicationEnvironmentInfo
- Details about the Environment that the application is running in.
- ApplicationInfo
- High level information corresponding to an application.
- AppSummary
- AutoscalingConfig
- Autoscaling Policy config associated with the cluster.
- AutoscalingPolicy
- Describes an autoscaling policy for Dataproc cluster autoscaler.
- AutotuningConfig
- Autotuning configuration of the workload.
- AuxiliaryNodeGroup
- Node group identification and configuration information.
- AuxiliaryServicesConfig
- Auxiliary services configuration for a Cluster.
- BasicAutoscalingAlgorithm
- Basic algorithm for autoscaling.
- BasicYarnAutoscalingConfig
- Basic autoscaling configurations for YARN.
- Batch
- A representation of a batch workload in the service.
- Binding
- Associates members, or principals, with a role.
- BuildInfo
- Native Build Info
- Cluster
- Describes the identifying information, config, and status of a Dataproc cluster.
- ClusterConfig
- The cluster config.
- ClusterMetrics
- Contains cluster daemon metrics, such as HDFS and YARN stats. Beta Feature: This report is available for testing purposes only.
- ClusterSelector
- A selector that chooses target cluster for jobs based on metadata.
- ClusterStatus
- The status of a cluster and its instances.
- ClusterToRepair
- Cluster to be repaired.
- ConsolidatedExecutorSummary
- Consolidated summary about executors used by the application.
- DataprocApi
- Manages Hadoop-based clusters and jobs on Google Cloud Platform.
- DataprocMetricConfig
- Dataproc metric config.
- DiagnoseClusterRequest
- A request to collect cluster diagnostic information.
- DiskConfig
- Specifies the config of disk options for a group of VM instances.
- DriverSchedulingConfig
- Driver scheduling configuration.
- EncryptionConfig
- Encryption settings for the cluster.
- EndpointConfig
- Endpoint config for this cluster.
- EnvironmentConfig
- Environment configuration for a workload.
- ExecutionConfig
- Execution configuration for a workload.
- ExecutorMetrics
- ExecutorMetricsDistributions
- ExecutorPeakMetricsDistributions
- ExecutorResourceRequest
- Resources used per executor used by the application.
- ExecutorStageSummary
- Executor resources consumed by a stage.
- ExecutorSummary
- Details about executors used by the application.
- FallbackReason
- Native SQL Execution Data
- FlinkJob
- A Dataproc job for running Apache Flink applications on YARN.
- GceClusterConfig
- Common config settings for resources of Compute Engine cluster instances, applicable to all instances in the cluster.
- GetIamPolicyRequest
- Request message for GetIamPolicy method.
- GkeClusterConfig
- The cluster's GKE config.
- GkeNodeConfig
- Parameters that describe cluster nodes.
- GkeNodePoolAcceleratorConfig
- A GkeNodeConfigAcceleratorConfig represents a Hardware Accelerator request for a node pool.
- GkeNodePoolAutoscalingConfig
- GkeNodePoolAutoscaling contains information the cluster autoscaler needs to adjust the size of the node pool to the current cluster usage.
- GkeNodePoolConfig
- The configuration of a GKE node pool used by a Dataproc-on-GKE cluster (https://cloud.google.com/dataproc/docs/concepts/jobs/dataproc-gke#create-a-dataproc-on-gke-cluster).
- GkeNodePoolTarget
- GKE node pools that Dataproc workloads run on.
- GoogleCloudDataprocV1WorkflowTemplateEncryptionConfig
- Encryption settings for encrypting workflow template job arguments.
- HadoopJob
- A Dataproc job for running Apache Hadoop MapReduce (https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html) jobs on Apache Hadoop YARN (https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html).
- HiveJob
- A Dataproc job for running Apache Hive (https://hive.apache.org/) queries on YARN.
- IdentityConfig
- Identity-related configuration, including service account based secure multi-tenancy user mappings.
- InjectCredentialsRequest
- A request to inject credentials into a cluster.
- InputQuantileMetrics
- InstanceFlexibilityPolicy
- Instance flexibility policy allowing a mixture of VM shapes and provisioning models.
- InstanceGroupAutoscalingPolicyConfig
- Configuration for the size bounds of an instance group, including its proportional size to other groups.
- InstanceGroupConfig
- The config settings for Compute Engine resources in an instance group, such as a master or worker group.
- InstanceReference
- A reference to a Compute Engine instance.
- InstanceSelection
- Defines machine types and a rank to which the machine types belong.
- InstanceSelectionResult
- Defines a mapping from machine types to the number of VMs that are created with each machine type.
- InstantiateWorkflowTemplateRequest
- A request to instantiate a workflow template.
- Interval
- Represents a time interval, encoded as a Timestamp start (inclusive) and a Timestamp end (exclusive). The start must be less than or equal to the end.
- Job
- A Dataproc job resource.
- JobData
- Data corresponding to a spark job.
- JobPlacement
- Dataproc job config.
- JobReference
- Encapsulates the full scoping used to reference a job.
- JobScheduling
- Job scheduling options.
- JobsSummary
- Data related to the Jobs page summary.
- JobStatus
- Dataproc job status.
- JupyterConfig
- Jupyter configuration for an interactive session.
- KerberosConfig
- Specifies Kerberos related configuration.
- KubernetesClusterConfig
- The configuration for running the Dataproc cluster on Kubernetes.
- KubernetesSoftwareConfig
- The software configuration for this Dataproc cluster running on Kubernetes.
- LifecycleConfig
- Specifies the cluster auto-delete schedule configuration.
- ListAutoscalingPoliciesResponse
- A response to a request to list autoscaling policies in a project.
- ListBatchesResponse
- A list of batch workloads.
- ListClustersResponse
- The list of all clusters in a project.
- ListJobsResponse
- A list of jobs in a project.
- ListOperationsResponse
- The response message for Operations.ListOperations.
- ListSessionsResponse
- A list of interactive sessions.
- ListSessionTemplatesResponse
- A list of session templates.
- ListWorkflowTemplatesResponse
- A response to a request to list workflow templates in a project.
- LoggingConfig
- The runtime logging config of the job.
- ManagedCluster
- Cluster that is managed by the workflow.
- ManagedGroupConfig
- Specifies the resources used to actively manage an instance group.
- MemoryMetrics
- MetastoreConfig
- Specifies a Metastore configuration.
- Metric
- A Dataproc custom metric.
- NamespacedGkeDeploymentTarget
- Used only for the deprecated beta.
- NativeBuildInfoUiData
- NativeSqlExecutionUiData
- Native SQL Execution Data
- NodeGroup
- Dataproc Node Group.
- NodeGroupAffinity
- Node Group Affinity for clusters using sole-tenant node groups.
- NodeInitializationAction
- Specifies an executable to run on a fully configured node and a timeout period for executable completion.
- NodePool
- Indicates a list of workers of the same type.
- Operation
- This resource represents a long-running operation that is the result of a network API call.
- OrderedJob
- A job executed by the workflow.
- OutputQuantileMetrics
- ParameterValidation
- Configuration for parameter validation.
- PeripheralsConfig
- Auxiliary services configuration for a workload.
- PigJob
- A Dataproc job for running Apache Pig (https://pig.apache.org/) queries on YARN.
- Policy
- An Identity and Access Management (IAM) policy, which specifies access controls for Google Cloud resources. A Policy is a collection of bindings.
- PoolData
- Pool Data
- PrestoJob
- A Dataproc job for running Presto (https://prestosql.io/) queries.
- ProcessSummary
- Process Summary
- ProjectsLocationsAutoscalingPoliciesResource
- ProjectsLocationsBatchesResource
- ProjectsLocationsBatchesSparkApplicationsResource
- ProjectsLocationsOperationsResource
- ProjectsLocationsResource
- ProjectsLocationsSessionsResource
- ProjectsLocationsSessionsSparkApplicationsResource
- ProjectsLocationsSessionTemplatesResource
- ProjectsLocationsWorkflowTemplatesResource
- ProjectsRegionsAutoscalingPoliciesResource
- ProjectsRegionsClustersNodeGroupsResource
- ProjectsRegionsClustersResource
- ProjectsRegionsJobsResource
- ProjectsRegionsOperationsResource
- ProjectsRegionsResource
- ProjectsRegionsWorkflowTemplatesResource
- ProjectsResource
- ProvisioningModelMix
- Defines how Dataproc should create VMs with a mixture of provisioning models.
- PyPiRepositoryConfig
- Configuration for a PyPi repository.
- PySparkBatch
- A configuration for running an Apache PySpark (https://spark.apache.org/docs/latest/api/python/getting_started/quickstart.html) batch workload.
- PySparkJob
- A Dataproc job for running Apache PySpark (https://spark.apache.org/docs/latest/api/python/index.html#pyspark-overview) applications on YARN.
- Quantiles
- Quantile metrics data related to Tasks.
- QueryList
- A list of queries to run on a cluster.
- RddDataDistribution
- Details about RDD usage.
- RddOperationCluster
- A grouping of nodes representing higher-level constructs (stage, job, etc.).
- RddOperationEdge
- A directed edge representing dependency between two RDDs.
- RddOperationGraph
- Graph representing RDD dependencies.
- RddOperationNode
- A node in the RDD operation graph.
- RddPartitionInfo
- Information about RDD partitions.
- RddStorageInfo
- Overall data about RDD storage.
- RegexValidation
- Validation based on regular expressions.
- RepairClusterRequest
- A request to repair a cluster.
- RepairNodeGroupRequest
- RepositoryConfig
- Configuration for dependency repositories.
- ReservationAffinity
- Reservation Affinity for consuming zonal reservations.
- ResizeNodeGroupRequest
- A request to resize a node group.
- ResourceInformation
- ResourceProfileInfo
- Resource profile that contains information about all the resources required by executors and tasks.
- RuntimeConfig
- Runtime configuration for a workload.
- RuntimeInfo
- Runtime information about workload execution.
- SearchSessionSparkApplicationExecutorsResponse
- List of Executors associated with a Spark Application.
- SearchSessionSparkApplicationExecutorStageSummaryResponse
- List of Executors associated with a Spark Application Stage.
- SearchSessionSparkApplicationJobsResponse
- A list of Jobs associated with a Spark Application.
- SearchSessionSparkApplicationSqlQueriesResponse
- List of all queries for a Spark Application.
- SearchSessionSparkApplicationsResponse
- A list of summaries of Spark Applications.
- SearchSessionSparkApplicationStageAttemptsResponse
- A list of Stage Attempts for a Stage of a Spark Application.
- SearchSessionSparkApplicationStageAttemptTasksResponse
- A list of tasks for a stage of a Spark Application.
- SearchSessionSparkApplicationStagesResponse
- A list of stages associated with a Spark Application.
- SearchSparkApplicationExecutorsResponse
- List of Executors associated with a Spark Application.
- SearchSparkApplicationExecutorStageSummaryResponse
- List of Executors associated with a Spark Application Stage.
- SearchSparkApplicationJobsResponse
- A list of Jobs associated with a Spark Application.
- SearchSparkApplicationSqlQueriesResponse
- List of all queries for a Spark Application.
- SearchSparkApplicationsResponse
- A list of summaries of Spark Applications.
- SearchSparkApplicationStageAttemptsResponse
- A list of Stage Attempts for a Stage of a Spark Application.
- SearchSparkApplicationStageAttemptTasksResponse
- A list of tasks for a stage of a Spark Application.
- SearchSparkApplicationStagesResponse
- A list of stages associated with a Spark Application.
- SecurityConfig
- Security-related configuration, including encryption, Kerberos, etc.
- Session
- A representation of a session.
- SessionStateHistory
- Historical state information.
- SessionTemplate
- A representation of a session template.
- SetIamPolicyRequest
- Request message for SetIamPolicy method.
- ShieldedInstanceConfig
- Shielded Instance Config for clusters using Compute Engine Shielded VMs (https://cloud.google.com/security/shielded-cloud/shielded-vm).
- ShufflePushReadQuantileMetrics
- ShuffleReadMetrics
- Shuffle data read by the task.
- ShuffleReadQuantileMetrics
- ShuffleWriteQuantileMetrics
- SinkProgress
- SoftwareConfig
- Specifies the selection and config of software inside the cluster.
- SourceProgress
- SparkApplication
- A summary of a Spark Application.
- SparkBatch
- A configuration for running an Apache Spark (https://spark.apache.org/) batch workload.
- SparkHistoryServerConfig
- Spark History Server configuration for the workload.
- SparkJob
- A Dataproc job for running Apache Spark (https://spark.apache.org/) applications on YARN.
- SparkPlanGraph
- A graph used for storing information about the executionPlan of a DataFrame.
- SparkPlanGraphCluster
- Represents a tree of Spark plan nodes.
- SparkPlanGraphEdge
- Represents a directed edge in the Spark plan tree from child to parent.
- SparkPlanGraphNode
- Represents a node in the Spark plan tree.
- SparkPlanGraphNodeWrapper
- Wrapper used to represent either a node or a cluster.
- SparkRBatch
- A configuration for running an Apache SparkR (https://spark.apache.org/docs/latest/sparkr.html) batch workload.
- SparkRJob
- A Dataproc job for running Apache SparkR (https://spark.apache.org/docs/latest/sparkr.html) applications on YARN.
- SparkRuntimeInfo
- SparkSqlBatch
- A configuration for running Apache Spark SQL (https://spark.apache.org/sql/) queries as a batch workload.
- SparkSqlJob
- A Dataproc job for running Apache Spark SQL (https://spark.apache.org/sql/) queries.
- SparkStandaloneAutoscalingConfig
- Basic autoscaling configurations for Spark Standalone.
- SparkWrapperObject
- Outer message that contains the data obtained from the Spark listener, packaged with information that is required to process it.
- SpeculationStageSummary
- Details of the speculation task when speculative execution is enabled.
- SqlExecutionUiData
- SQL Execution Data
- SqlPlanMetric
- Metrics related to SQL execution.
- StageAttemptTasksSummary
- Data related to the tasks summary for a Spark Stage Attempt.
- StageData
- Data corresponding to a stage.
- StageMetrics
- Stage Level Aggregated Metrics
- StageShuffleReadMetrics
- Shuffle data read for the stage.
- StagesSummary
- Data related to the Stages page summary.
- StartClusterRequest
- A request to start a cluster.
- StartupConfig
- Configuration to handle the startup of instances during the cluster create and update process.
- StateHistory
- Historical state information.
- StateOperatorProgress
- StopClusterRequest
- A request to stop a cluster.
- StreamBlockData
- Stream Block Data.
- StreamingQueryData
- Streaming
- StreamingQueryProgress
- SubmitJobRequest
- A request to submit a job (see the usage sketch after this class list).
- SummarizeSessionSparkApplicationExecutorsResponse
- Consolidated summary of executors for a Spark Application.
- SummarizeSessionSparkApplicationJobsResponse
- Summary of a Spark Application's jobs.
- SummarizeSessionSparkApplicationStageAttemptTasksResponse
- Summary of tasks for a Spark Application stage attempt.
- SummarizeSessionSparkApplicationStagesResponse
- Summary of a Spark Application's stages.
- SummarizeSparkApplicationExecutorsResponse
- Consolidated summary of executors for a Spark Application.
- SummarizeSparkApplicationJobsResponse
- Summary of a Spark Application's jobs.
- SummarizeSparkApplicationStageAttemptTasksResponse
- Summary of tasks for a Spark Application stage attempt.
- SummarizeSparkApplicationStagesResponse
- Summary of a Spark Application's stages.
- TaskData
- Data corresponding to tasks created by Spark.
- TaskMetrics
- Executor Task Metrics
- TaskQuantileMetrics
- TaskResourceRequest
- Resources used per task created by the application.
- TemplateParameter
- A configurable parameter that replaces one or more fields in the template.
- TerminateSessionRequest
- A request to terminate an interactive session.
- TrinoJob
- A Dataproc job for running Trino (https://trino.io/) queries.
- UsageMetrics
- Usage metrics represent approximate total resources consumed by a workload.
- UsageSnapshot
- The usage snapshot represents the resources consumed by a workload at a specified time.
- ValueValidation
- Validation based on a list of allowed values.
- VirtualClusterConfig
- The Dataproc cluster config for a cluster that does not directly control the underlying compute resources, such as a Dataproc-on-GKE cluster (https://cloud.google.com/dataproc/docs/guides/dpgke/dataproc-gke-overview).
- WorkflowTemplate
- A Dataproc workflow template resource.
- WorkflowTemplatePlacement
- Specifies workflow execution target. Either managed_cluster or cluster_selector is required.
- WriteSessionSparkApplicationContextRequest
- Write Spark Application data to internal storage systems.
- WriteSparkApplicationContextRequest
- Write Spark Application data to internal storage systems.
- YarnApplication
- A YARN application created by a job.
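As referenced above for SubmitJobRequest, a hedged sketch of submitting a Spark job to an existing cluster; the cluster name, project ID, and region are placeholders, and the jar path assumes the stock Spark examples shipped on Dataproc images:

```dart
import 'package:googleapis/dataproc/v1.dart';

// Sketch: submit a SparkPi job to an existing cluster (placeholder values).
Future<Job> submitSparkPi(DataprocApi api) {
  final request = SubmitJobRequest(
    job: Job(
      placement: JobPlacement(clusterName: 'my-cluster'),
      sparkJob: SparkJob(
        mainClass: 'org.apache.spark.examples.SparkPi',
        jarFileUris: [
          'file:///usr/lib/spark/examples/jars/spark-examples.jar',
        ],
        args: ['1000'],
      ),
    ),
  );
  // Returns the created Job resource; poll its status to track completion.
  return api.projects.regions.jobs.submit(request, 'my-project', 'us-central1');
}
```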
Typedefs
- CancelJobRequest = $Empty
- A request to cancel a job.
- ConfidentialInstanceConfig = $ConfidentialInstanceConfig
- Confidential Instance Config for clusters using Confidential VMs (https://cloud.google.com/compute/confidential-vm/docs).
- Empty = $Empty
- A generic empty message that you can re-use to avoid defining duplicated empty messages in your APIs.
- Expr = $Expr
- Represents a textual expression in the Common Expression Language (CEL) syntax.
- GetPolicyOptions = $GetPolicyOptions01
- Encapsulates settings provided to GetIamPolicy.
- InputMetrics = $InputMetrics
- Metrics about the input data read by the task.
- OutputMetrics = $OutputMetrics
- Metrics about the data written by the task.
- ShufflePushReadMetrics = $ShufflePushReadMetrics
- ShuffleWriteMetrics = $ShuffleWriteMetrics
- Shuffle data written by the task.
- SparkConnectConfig = $Empty
- Spark Connect configuration for an interactive session.
- StageInputMetrics = $InputMetrics
- Metrics about the input read by the stage.
- StageOutputMetrics = $OutputMetrics
- Metrics about the output written by the stage.
- StageShufflePushReadMetrics = $ShufflePushReadMetrics
- StageShuffleWriteMetrics = $ShuffleWriteMetrics
- Shuffle data written for the stage.
- Status = $Status00
- The Status type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs.
- TestIamPermissionsRequest = $TestIamPermissionsRequest01
- Request message for TestIamPermissions method.
- TestIamPermissionsResponse = $TestIamPermissionsResponse
- Response message for TestIamPermissions method.
- WriteSessionSparkApplicationContextResponse = $Empty
- Response returned as an acknowledgement of receipt of data.
- WriteSparkApplicationContextResponse = $Empty
- Response returned as an acknowledgement of receipt of data.
Exceptions / Errors
- ApiRequestError
- Represents a general error reported by the API endpoint.
- DetailedApiRequestError
- Represents a specific error reported by the API endpoint.
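A sketch of distinguishing the two error types around any call, assuming the usual commons-layer fields (status and message) on DetailedApiRequestError; the stop-cluster call is just an illustrative choice:

```dart
import 'package:googleapis/dataproc/v1.dart';

// Sketch: separate structured API errors from general request failures.
Future<void> stopClusterSafely(
    DataprocApi api, String project, String region, String cluster) async {
  try {
    await api.projects.regions.clusters
        .stop(StopClusterRequest(), project, region, cluster);
  } on DetailedApiRequestError catch (e) {
    // The endpoint reported a structured error with an HTTP status code.
    print('Dataproc API error ${e.status}: ${e.message}');
  } on ApiRequestError catch (e) {
    // A general error without structured detail.
    print('Request failed: ${e.message}');
  }
}
```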