All Classes and Interfaces
Class
Description
Class for absolute error loss calculation (for regression).
Base class for launcher implementations.
Indicates that the source accepts the latest seen offset, which requires streaming execution
 to provide the latest seen offset when restarting the streaming query from checkpoint.
:: DeveloperApi ::
 Information about an AccumulatorV2 modified during a task or stage.
An internal class used to track accumulators by Spark itself.
The base class for accumulators, which can accumulate inputs of type IN and produce output of type OUT.
Trait for functions and their derivatives for functional layers.
Fit a parametric survival regression model named accelerated failure time (AFT) model
 (see 
 Accelerated failure time model (Wikipedia))
 based on the Weibull distribution of the survival time.
Model produced by AFTSurvivalRegression.
Params for accelerated failure time (AFT) regression.
AggregatedDialect can unify multiple dialects into one virtual Dialect.
Base class of the Aggregate Functions.
Interface for a function that produces a result value by aggregating over multiple input rows.
Aggregation in SQL statement.
:: DeveloperApi ::
 A set of functions used to aggregate data.
A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.
Enum to select the algorithm for the decision tree.
Used in full graph updates to select all flows.
A message used by ReceiverTracker to ask for all receiver ids still stored in ReceiverTrackerEndpoint.
Used in full graph updates to select all tables.
Alternating Least Squares (ALS) matrix factorization.
Alternating Least Squares matrix factorization.
Trait for least squares solvers applied to the normal equation.
Rating class for better code readability.
Model fitted by ALS.
Common params for ALS and ALSModel.
Common params for ALS.
A predicate that always evaluates to false.
A filter that always evaluates to false.
A predicate that always evaluates to true.
A filter that always evaluates to true.
Thrown when a query fails to analyze, usually because the query itself is invalid.
Represents a warning generated as part of graph analysis.
Warning that some streaming reader options are being dropped
A predicate that evaluates to true iff both left and right evaluate to true.
A filter that evaluates to true iff both left and right evaluate to true.
ANOVA Test for continuous data.
An AbstractDataType that matches any concrete data types.
A Flow that reads its source(s) completely and appends data to the target, just once.
An interface for creating history listeners (to replay event logs) defined in other modules like SQL, and for setting up the plugin's UI to rebuild the history UI.
Implements in-place application of functions in the arrays.
An object that computes a function incrementally by merging in results of type U from multiple
 tasks.
Computes the area under the curve (AUC) using the trapezoidal rule.
ARPACK routines for MLlib's vectors and matrices.
Implicit methods related to Scala Array.
A column vector backed by Apache Arrow.
Generates association rules from an RDD[FreqItemset[Item]].
An association rule between sets of items.
An asynchronous queue for events.
A set of asynchronous RDD actions available through an implicit conversion.
Abstract class for ML attributes.
Trait for ML attribute factories.
Attributes that describe a vector ML column.
Keys used to store attributes.
An enum-like type for attribute types: AttributeType$.Numeric, AttributeType$.Nominal, and AttributeType$.Binary.
An aggregate function that returns the mean of all the values in a group.
A BackoffStrategy determines the backoff duration (how long we should wait) for retries after failures.
:: Experimental ::
 A TaskContext with extra contextual info and tooling for tasks in a barrier stage.
:: Experimental ::
 Carries all task infos of a barrier task.
Base class for resource handlers that use app-specific data.
Trait for MLWriter and MLReader.
Represents a collection of tuples with a known schema.
Base class for streaming API handlers, provides easy access to the streaming listener that
 holds the app's information.
A physical representation of a data source scan for batch queries.
:: DeveloperApi ::
 Class having information on completed batches.
Options for a batch read of an input.
A `FlowExecution` that writes a batch `DataFrame` to a `Table`.
An interface that defines how to write the data to data source for batch processing.
:: DeveloperApi ::
 A sampler based on Bernoulli trials for partitioning a data sequence.
:: DeveloperApi ::
 A sampler based on Bernoulli trials.
Binarize a column of continuous features given a threshold.
A binary attribute.
Evaluator for binary classification, which expects input columns rawPrediction, label and
  an optional weight column.
Trait for a binary classification evaluation metric computer.
Evaluator for binary classification.
Abstraction for binary classification results for a given model.
Trait for a binary confusion matrix.
Abstraction for binary logistic regression results for a given model.
Binary logistic regression results for a given model.
Abstraction for binary logistic regression training results.
Binary logistic regression training results.
Abstraction for BinaryRandomForestClassification results for a given model.
Binary RandomForestClassification for a given model.
Abstraction for BinaryRandomForestClassification training results.
Binary RandomForestClassification training results.
Class that represents the group and value of a sample.
The data type representing Array[Byte] values.
Utility functions that help us determine bounds on adjusted sampling rate to guarantee exact sample size with high confidence when sampling without replacement.
A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques"
 by Steinbach, Karypis, and Kumar, with modification to fit Spark.
A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques"
 by Steinbach, Karypis, and Kumar, with modification to fit Spark.
Model fitted by BisectingKMeans.
Clustering model produced by BisectingKMeans.
Common params for BisectingKMeans and BisectingKMeansModel.
Summary of BisectingKMeans.
BLAS routines for MLlib's vectors and matrices.
BLAS routines for MLlib's vectors and matrices.
Abstracts away how blocks are stored and provides different ways to read the underlying block
 data.
Listener object for BlockGenerator events
:: DeveloperApi ::
 Identifies a particular Block of data, usually associated with a single file.
:: DeveloperApi ::
 This class represent a unique identifier for a BlockManager.
The response message of a GetLocationsAndStatus request.
Driver to Executor message to get a heap histogram.
Driver to Executor message to trigger a thread dump.
Represents a distributed matrix in blocks of local matrices.
::DeveloperApi::
 BlockReplicationPrioritization provides logic for prioritizing a sequence of peers for
 replicating blocks.
:: DeveloperApi ::
 Stores information about a block status in a block manager.
A Bloom filter is a space-efficient probabilistic data structure that offers an approximate
 containment test with one-sided error: if it claims that an item is contained in it, this
 might be in error, but if it claims that an item is not contained in it, then this is
 definitely true.
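A minimal sketch of the one-sided containment test described above, using org.apache.spark.util.sketch.BloomFilter; the sizing parameters and item values here are illustrative:

```scala
import org.apache.spark.util.sketch.BloomFilter

// 1000 expected items with a 3% target false-positive probability (illustrative values).
val bf = BloomFilter.create(1000L, 0.03)
bf.putString("spark")

bf.mightContainString("spark") // always true for inserted items (no false negatives)
bf.mightContainString("flink") // usually false; a true result here would be a false positive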
Specialized version of Param[Boolean] for Java.
The data type representing Boolean values.
Configuration options for GradientBoostedTrees.
A Double value with error bars and associated confidence.
Represents a function that is bound to an input type.
A procedure that is bound to input types.
In-place DGEMM and DGEMV for Breeze
A broadcast variable.
An interface for all the broadcast implementations in Spark (to allow
 multiple broadcast implementations).
This BucketedRandomProjectionLSH implements Locality Sensitive Hashing functions for Euclidean distance metrics.
Model produced by BucketedRandomProjectionLSH, where multiple random vectors are stored.
Params for BucketedRandomProjectionLSH.
Bucketizer maps a column of continuous features to a column of feature buckets.
Helper class that ensures a ManagedBuffer is released upon InputStream.close() and also detects stream corruption if streamCompressedOrEncrypted is true.
The data type representing Byte values.
Basic interface that all cached batches of data must support.
Provides APIs that handle transformations of SQL data associated with the cache/persist APIs.
The class representing calendar intervals.
The data type representing calendar intervals.
Case-insensitive map of string keys to string values.
Represents a cast expression in the public logical expression API.
Catalog interface for Spark.
An API to extend the Spark built-in session catalog.
A catalog in Spark, as returned by the listCatalogs method defined in Catalog.
A marker interface to provide a catalog implementation for Spark.
Conversion helpers for working with v2 CatalogPlugin.
::Experimental::
 An interface for experimenting with a more direct connection to the query planner.
Split which tests a categorical feature.
Extractor Object for pulling out the root cause of an error.
A CHECK constraint.
Enumeration to manage state transitions of an RDD through checkpointing
A mutable class loader that gives preference to its own URLs over the parent class loader
 when loading classes and resources.
Deprecated.
Use UnivariateFeatureSelector instead.
Creates a ChiSquared feature selector.
Model fitted by ChiSqSelector.
Chi Squared selector model.
Conduct the chi-squared test for the input RDDs using the specified method.
param:  name String name for the method.
Object containing the test results for the chi-squared hypothesis test.
Chi-square hypothesis testing for categorical data.
Compute Cholesky decomposition.
Raised when there's a circular dependency in the current pipeline.
Model produced by a Classifier.
Represents a classification model that predicts to which of a set of categories an example belongs.
Abstraction for multiclass classification results for a given model.
Single-label binary or multiclass classification.
(private[spark]) Params for classification.
Listener class used when any item has been cleaned by the Cleaner class.
Classes that represent cleaning tasks.
A WeakReference associated with a CleanupTask.
An interface to represent clocks, so that they can be mocked out in unit tests.
A cleaner that renders closures serializable if they can be done so safely.
This class represents a transform for ClusterBySpec.
A distribution where tuples that share the same values for clustering expressions are co-located in the same partition.
Evaluator for clustering results.
Metrics for clustering, which expects two input columns: prediction and label.
Summary of clustering algorithms.
Metrics for code generation.
:: DeveloperApi ::
 An RDD that cogroups its parents.
A function that returns zero or more output records from each grouping key and its values from 2
 Datasets.
Collation aware equivalent of EqualNullSafe.
Collation aware equivalent of EqualTo.
Base class for collation aware string filters.
Collation aware equivalent of GreaterThan.
Collation aware equivalent of GreaterThanOrEqual.
Collation aware equivalent of In.
Collation aware equivalent of LessThan.
Collation aware equivalent of LessThanOrEqual.
Collation aware equivalent of StringContains.
Collation aware equivalent of StringEndsWith.
Collation aware equivalent of StringStartsWith.
An accumulator for collecting a list of elements.
A column in Spark, as returned by listColumns method in Catalog.
A column that will be computed based on the data in a DataFrame.
An interface representing a column of a Table.
Array abstraction in ColumnVector.
This class wraps multiple ColumnVectors as a row-wise table.
This class wraps an array of ColumnVector and provides a row view.
Map abstraction in ColumnVector.
Row abstraction in ColumnVector.
A class representing the default value of a column.
A convenient class used for constructing schema.
Utility transformer for removing temporary columns from a DataFrame.
An interface to represent column statistics, which is part of Statistics.
An interface representing in-memory columnar data in Spark.
Contains basic command line parsing functionality and methods to parse some common Spark CLI
 options.
A Flow that declares exactly what data should be in the target table.
A FutureAction for actions that could trigger multiple Spark jobs.
Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of rows with at least the given minimum number of rows.
:: DeveloperApi ::
 CompressionCodec allows the customization of choosing different compression implementations
 to be used in block storage.
A trait to implement the Configurable interface.
Connected components algorithm.
An input stream that always returns the same RDD on each time step.
A constraint that restricts states of data in a table.
An indicator of the validity of the constraint.
A factory object that is used to construct PipelineEvents with common fields automatically filled in.
Deprecated.
Since 4.0.0, as its only usage for Python evaluation is now extinct.
For each barrier stage attempt, only at most one barrier() call can be active at any time, thus
 we can use (stageId, stageAttemptId) to identify the stage attempt where the barrier() call is
 from.
A variation on PartitionReader for use with continuous streaming processing.
A variation on PartitionReaderFactory that returns ContinuousPartitionReader instead of PartitionReader.
Split which tests a continuous feature.
A SparkDataStream for streaming queries with continuous mode.
Represents a matrix in coordinate format.
Processor that is responsible for analyzing each flow and sorting the nodes in topological order.
API for correlation functions in MLlib, compatible with DataFrames and Datasets.
Trait for correlation algorithms.
Maintains supported and default correlation names.
Delegates computation to the specific correlation object based on the input method name.
The algorithm implemented in this object is an efficient, parallel implementation of the Silhouette using the cosine distance measure.
An aggregate function that returns the number of the specific row in a group.
A Count-min sketch is a probabilistic data structure used for cardinality estimation using
 sub-linear space.
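As a hedged illustration of the sub-linear-space counting described above, a sketch using org.apache.spark.util.sketch.CountMinSketch; the error, confidence, and seed values are illustrative:

```scala
import org.apache.spark.util.sketch.CountMinSketch

// Relative error 1%, confidence 99%, seed 42 (illustrative values).
val cms = CountMinSketch.create(0.01, 0.99, 42)
cms.add("user-1")
cms.add("user-1")
cms.add("user-2")

// Estimates never under-count; they may over-count within the configured error bound.
val estimate = cms.estimateCount("user-1") // >= 2
```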
An aggregate function that returns the number of rows in a group.
Extracts a vocabulary from document collections and generates a CountVectorizerModel.
Converts a text document to a sparse vector of token counts.
Params for CountVectorizer and CountVectorizerModel.
Trait to restrict calls to create and replace operations.
K-fold cross validation performs model selection by splitting the dataset into a set of non-overlapping, randomly partitioned folds which are used as separate training and test datasets; e.g., with k=3 folds, K-fold cross validation will generate 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.
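A minimal sketch of the k=3 setup described above, assuming `training` is a DataFrame with "features" and "label" columns; the estimator and grid values are illustrative:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression()
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3) // each fold trains on 2/3 of the data and tests on the remaining 1/3

val cvModel = cv.fit(training) // retains the best model found across the folds
```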
CrossValidatorModel contains the model with the highest average cross-validation
 metric across folds and uses this model to transform input data.
Writer for CrossValidatorModel.
Params for CrossValidator and CrossValidatorModel.
A util class for manipulating IO encryption and decryption streams.
Built-in `CustomMetric` that computes average of metric values.
A custom metric.
Built-in `CustomMetric` that sums up metric values.
A custom task metric.
Types of events that can be handled by the DAGScheduler.
A database in Spark, as returned by the listDatabases method defined in Catalog.
DataflowGraph represents the core graph structure for Spark declarative pipelines.
Resolves the DataflowGraph by processing each node in the graph.
Exception thrown when transforming a node in the graph fails with a non-retryable error.
Exception thrown when transforming a node in the graph fails because at least one of its dependencies has not yet been transformed.
Functionality for working with missing data in DataFrames.
Interface used to load a Dataset from external storage systems (e.g.
Statistic functions for DataFrames.
Interface used to write a Dataset to external storage systems (e.g.
Interface used to write a Dataset to external storage using the v2 API.
A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations.
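A minimal sketch of the strongly typed, functional style described above, assuming an active SparkSession named `spark`; the Person case class and data are illustrative:

```scala
import spark.implicits._

case class Person(name: String, age: Long)

val people = Seq(Person("Alice", 29), Person("Bob", 31)).toDS() // Dataset[Person]
val adults = people.filter(_.age >= 30).map(_.name)             // transformed in parallel
adults.show()
```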
A container for a Dataset, used for implicit conversions in Scala.
DatasetManager is responsible for materializing tables in the catalog based on the given graph.
Wraps table materialization exceptions.
Data sources should implement this trait so that they can register an alias to their data source.
Interface used to load a streaming Dataset from external storage systems (e.g.
Interface used to write a streaming Dataset to external storage systems (e.g.
The base type of all Spark SQL data types.
To get/create a specific data type, users should use the singleton objects and factory methods provided by this class.
A collection of methods used to validate data before applying ML algorithms.
A data writer returned by DataWriterFactory.createWriter(int, long) that is responsible for writing data for an input RDD partition.
A factory of DataWriter returned by BatchWrite.createBatchWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing the actual data writer at executor side.
The date type represents a valid date in the proleptic Gregorian calendar.
The type represents day-time intervals of the SQL standard.
A feature transformer that takes the 1D discrete cosine transform of a real vector.
A mutable implementation of BigDecimal that can hold a Long if values are small enough.
An Integral evidence parameter for Decimals.
Common methods for Decimal evidence parameters.
A Fractional evidence parameter for Decimals.
The data type representing java.math.BigDecimal values.
A class which implements a decision tree learning algorithm for classification and regression.
Decision tree model (http://en.wikipedia.org/wiki/Decision_tree_learning) for classification.
Decision tree learning algorithm (http://en.wikipedia.org/wiki/Decision_tree_learning)
 for classification.
Abstraction for Decision Tree models.
Decision tree model for classification or regression.
Helper classes for tree model persistence.
Info for a Node.
Info for a Split.
Parameters for Decision Tree-based algorithms.
 Decision tree (Wikipedia) model for regression.
Decision tree
 learning algorithm for regression.
Returns DefaultAWSCredentialsProviderChain for authentication.
Helper trait for making simple Params types readable.
Helper trait for making simple Params types writable.
Coalesce the partitions of a parent RDD (prev) into fewer partitions, so that each partition of this RDD computes one or more of the parent ones.
A TopologyMapper that assumes all nodes are in the same rack.
A class that represents default values.
A simple implementation of CatalogExtension, which implements all the catalog functions by calling the built-in session catalog directly.
An interface that defines how to write a delta of rows during batch processing.
A logical representation of a data source write that handles a delta of rows.
An interface for building a DeltaWrite.
A data writer returned by DeltaWriterFactory.createWriter(int, long) that is responsible for writing a delta of rows.
A factory for creating DeltaWriters returned by DeltaBatchWrite.createBatchWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing writers at the executor side.
Column-major dense matrix.
Column-major dense matrix.
A dense vector represented by a value array.
A dense vector represented by a value array.
:: DeveloperApi ::
 Base class for dependencies.
:: DeveloperApi ::
 A stream for reading serialized objects.
A holder for storing the deserialized values.
The deterministic level of RDD's output (i.e.
A parent trait for aggregators used in fitting MLlib models.
A Breeze diff function which represents a cost function for differentiable regularization
 of parameters.
Distributed model fitted by LDA.
Distributed LDA model.
Represents a distributively stored matrix backed by one or more RDDs.
An interface that defines how data is distributed across partitions.
Helper methods to create distributions to pass into Spark.
An accumulator for computing sum, count, and averages for double precision floating numbers.
Specialized version of Param[Array[Array[Double]]] for Java.
Specialized version of Param[Array[Double]] for Java.
A function that returns zero or more records of type Double from each input record.
A function that returns Doubles, and can be used to construct DoubleRDDs.
Specialized version of Param[Double] for Java.
Extra functions available on RDDs of Doubles through an implicit conversion.
The data type representing Double values.
:: DeveloperApi ::
 Driver component of a SparkPlugin.
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous
 sequence of RDDs (of the same type) representing a continuous stream of data (see
 org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs).
Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
A single directed edge consisting of a source id, target id,
 and the data associated with the edge.
Criteria for filtering edges based on activeness.
Represents an edge along with its neighboring vertices and allows sending messages along the
 edge.
The direction of a directed edge relative to a vertex.
EdgeRDD[ED, VD] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance.
An edge triplet represents an edge along with the vertex attributes of its neighboring vertices.
Compute eigen-decomposition.
Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a
 provided "weight" vector.
Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a
 provided "weight" vector.
Optimizer for EM algorithm which stores data + parameter graph, plus algorithm parameters.
Placeholder term for the result of undefined interactions, e.g.
Used to convert a JVM object of type T to and from the internal Spark SQL representation.
EncoderImplicits used to implicitly generate SQL Encoders.
Methods for creating an Encoder.
Enum to select ensemble combining strategy for base learners.
Info for one Node in a tree ensemble.
Class for calculating entropy during multiclass classification.
Performs equality comparison, similar to EqualTo.
A filter that evaluates to true iff the column evaluates to a value equal to value.
A reader to load error information from one or more JSON files.
Information associated with an error class.
Information associated with an error state / SQLSTATE.
Information associated with an error subclass.
Abstract class for estimators that fit models to data.
Abstract class for evaluators that compute metrics from predictions.
Contains helpers and implicits for working with PipelineEvents.
:: DeveloperApi ::
 Task failed due to a runtime exception.
Manager for QueryExecutionListener.
A flow's execution may complete for two reasons:
 1.
:: DeveloperApi ::
 Stores information about an executor to pass from the scheduler to SparkListeners.
:: DeveloperApi ::
 The task failed because the executor that it was running on was lost.
Executor metric types for executor-level metrics stored in ExecutorMetrics.
:: DeveloperApi ::
 Executor component of a SparkPlugin.
An Executor resource request.
A set of Executor resource requests.
ExpectationAggregator computes the partial expectation results.
:: Experimental ::
 Holder for experimental methods for the bravest.
Class used to provide access to expired timer's expiry time.
A BackoffStrategy where the back-off time grows exponentially for each successive retry.
Generates i.i.d.
Base class of the public logical expression API.
Helper methods to create logical transforms to pass into Spark.
A trait for a session extension to implement that provides additional explain plan information.
A cluster manager interface to plug in an external scheduler.
An interface to execute an arbitrary string command inside an external execution engine rather
 than Spark.
Represents an extract function, which extracts and returns the value of a specified datetime field from a datetime or interval value expression.
Params for Factorization Machines.
Indicates that there was a failure while stopping the flow.
Abstract class used to identify failures related to failures stopping an operation/timeouts.
False positive rate.
Feature hashing projects a set of categorical or numerical features into a feature vector of
 specified dimension (typically substantially smaller than that of the original feature
 space).
Enum to describe whether a feature is "continuous" or "categorical"
:: DeveloperApi ::
 Task failed to fetch shuffle data from a remote node.
A simple file based topology mapper.
A filter predicate for data sources.
Base interface for a function used in Dataset's filter function.
Event fired after Estimator.fit.
Event fired before Estimator.fit.
A function that returns zero or more output records from each input record.
A function that takes two inputs and returns zero or more output records.
A function that returns zero or more output records from each grouping key and its values.
::Experimental::
 Base interface for a map function used in org.apache.spark.sql.KeyValueGroupedDataset.flatMapGroupsWithState(FlatMapGroupsWithStateFunction, org.apache.spark.sql.streaming.OutputMode, org.apache.spark.sql.Encoder, org.apache.spark.sql.Encoder).
Specialized version of Param[Float] for Java.
The data type representing Float values.
A Flow is a node of data transformation in a dataflow graph.
A `FlowExecution` specifies how to execute a flow and manages its execution.
Specifies how we should filter Flows.
A wrapper for the lambda function that defines a Flow.
Holds the DataFrame returned by a FlowFunction along with the inputs used to construct it.
param:  identifier The identifier of the flow.
Plans execution of Flows in a DataflowGraph by converting Flows into FlowExecutions.
This class should be used for all flow progress event logging; it controls the level at which events are logged.
Used in partial graph updates to select flows that flow to "selectedTables".
Model produced by FMClassifier.
Abstraction for FMClassifier results for a given model.
FMClassifier results for a given model.
Abstraction for FMClassifier training results.
FMClassifier training results.
Factorization Machines learning algorithm for classification.
Params for FMClassifier.
Model produced by FMRegressor.
Factorization Machines learning algorithm for regression.
Params for FMRegressor.
Base interface for a function used in Dataset's foreach function.
Base interface for a function used in Dataset's foreachPartition function.
The abstract class for writing custom logic to process data generated by a query.
A FOREIGN KEY constraint.
A parallel FP-growth algorithm to mine frequent itemsets.
A parallel FP-growth algorithm to mine frequent itemsets.
Frequent itemset.
Model fitted by FPGrowth.
Model trained by FPGrowth, which holds frequent itemsets.
Common params for FPGrowth and FPGrowthModel.
Base interface for functions whose return types do not create special RDDs.
A user-defined function in Spark, as returned by listFunctions method in Catalog.
Base class for user-defined functions.
A zero-argument function that returns an R.
A two-argument function that takes arguments of type T1 and T2 and returns an R.
A three-argument function that takes arguments of type T1, T2 and T3 and returns an R.
A four-argument function that takes arguments of type T1, T2, T3 and T4 and returns an R.
Catalog methods for working with Functions.
Commonly used functions available for DataFrame operations.
A future for the result of an action to support cancellation.
FValue test for continuous data.
Generates i.i.d.
Gaussian Mixture clustering.
This class performs expectation maximization for multivariate Gaussian
 Mixture Models (GMMs).
Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points
 are drawn from each Gaussian i with probability weights(i).
Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points
 are drawn from each Gaussian i=1..k with probability w(i); mu(i) and sigma(i) are
 the respective mean and covariance for each Gaussian distribution i=1..k.
Common params for GaussianMixture and GaussianMixtureModel
Summary of GaussianMixture.
Gradient-Boosted Trees (GBTs) (http://en.wikipedia.org/wiki/Gradient_boosting)
 model for classification.
Gradient-Boosted Trees (GBTs) (http://en.wikipedia.org/wiki/Gradient_boosting)
 learning algorithm for classification.
Parameters for Gradient-Boosted Tree algorithms.
Gradient-Boosted Trees (GBTs)
 model for regression.
Gradient-Boosted Trees (GBTs)
 learning algorithm for regression.
The general implementation of AggregateFunc, which contains the upper-cased function name, the `isDistinct` flag and all the inputs.
GeneralizedLinearAlgorithm implements methods to train a Generalized Linear Model (GLM).
GeneralizedLinearModel (GLM) represents a model trained using
 GeneralizedLinearAlgorithm.
Fit a Generalized Linear Model
 (see 
 Generalized linear model (Wikipedia))
 specified by giving a symbolic description of the linear
 predictor (link function) and a description of the error distribution (family).
Binomial exponential family distribution.
Gamma exponential family distribution.
Gaussian exponential family distribution.
Poisson exponential family distribution.
Params for Generalized Linear Regression.
Model produced by GeneralizedLinearRegression.
Summary of GeneralizedLinearRegression model and predictions.
Summary of GeneralizedLinearRegression fitting and model.
Trait for classes that provide GeneralMLWriter.
An ML Writer which delegates based on the requested format.
The general representation of SQL scalar expressions, which contains the upper-cased
 expression name and all the children expressions.
Class for calculating the Gini impurity
 (http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity)
 during multiclass classification.
Helper class for import/export of GLM classification models.
Helper methods for import/export of GLM regression models.
Class used to compute the gradient for a loss function, given a single data point.
A class that implements
 Stochastic Gradient Boosting
 for regression and binary classification.
Represents a gradient boosted trees model.
Class used to solve an optimization problem using Gradient Descent.
The Graph abstractly represents a graph with arbitrary objects
 associated with vertices and edges.
An element in a DataflowGraph.
Collection of errors that can be thrown during graph resolution / analysis.
Represents the reason why a flow execution should be stopped.
Indicates that the flow execution should be retried.
Indicates that the flow execution should be stopped with a specific reason.
Specifies how we should filter Graph elements.
A collection of graph generating functions.
Responsible for properly qualifying the identifiers for datasets inside or referenced by the dataflow graph.
Represents the identifier for a dataset that is defined or referenced in a pipeline.
Represents the identifier for a dataset that is external to the current pipeline.
Represents the identifier for a dataset that is defined by the current pipeline.
An implementation of Graph to support computation on graphs.
Provides utilities for loading Graphs from files.
Contains additional functionality for Graph.
A mutable context for registering tables, views, and flows in a dataflow graph.
Validations performed on a `DataflowGraph`.
A filter that evaluates to true iff the attribute evaluates to a value greater than value.
A filter that evaluates to true iff the attribute evaluates to a value greater than or equal to value.
This Spark trait is used for mapping a given userName to a set of groups which it belongs to.
:: Experimental ::
Represents the type of timeouts possible for the Dataset operations mapGroupsWithState and flatMapGroupsWithState.
A utility object to look up Hadoop compression codecs and create input streams.
::DeveloperApi::
 Hadoop delegation token provider.
Utility functions to simplify and speed-up file listing.
:: DeveloperApi ::
 An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the older MapReduce API (org.apache.hadoop.mapred).
Trait for shared param aggregationDepth (default: 2).
Trait for shared param blockSize.
Trait for shared param checkpointInterval.
Trait for shared param collectSubModels (default: false).
Trait for shared param distanceMeasure (default: "euclidean").
Trait for shared param elasticNetParam.
Trait for shared param featuresCol (default: "features").
Trait for shared param fitIntercept (default: true).
Trait for shared param handleInvalid.
Maps a sequence of terms to their term frequencies using the hashing trick.
Maps a sequence of terms to their term frequencies using the hashing trick.
A Partitioner that implements hash-based partitioning using Java's Object.hashCode.
Trait for shared param inputCol.
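A minimal sketch of hash-based partitioning, assuming an existing SparkContext named `sc`; the data and partition count are illustrative:

```scala
import org.apache.spark.HashPartitioner

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// Keys are assigned to partitions based on Object.hashCode modulo the partition count.
val partitioned = pairs.partitionBy(new HashPartitioner(4))
println(partitioned.getNumPartitions) // 4
```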
Trait for shared param inputCols.
Trait for shared param labelCol (default: "label").
Trait for shared param loss.
Trait for shared param maxBlockSizeInMB (default: 0.0).
Trait for shared param maxIter.
Trait for shared param numFeatures (default: 262144).
Trait for shared param outputCol (default: uid + "__output").
Trait for shared param outputCols.
Trait to define a level of parallelism for algorithms that are able to use
 multithreaded execution, and provide a thread-pool based execution context.
A mix-in for input partitions whose records are clustered on the same set of partition keys (provided via SupportsReportPartitioning, see below).
A mix-in for input partitions whose records are clustered on the same set of partition keys (provided via SupportsReportPartitioning, see below).
Trait for shared param predictionCol (default: "prediction").
Trait for shared param probabilityCol (default: "probability").
Trait for shared param rawPredictionCol (default: "rawPrediction").
Trait for shared param regParam.
Trait for shared param relativeError (default: 0.001).
Trait for shared param seed (default: this.getClass.getName.hashCode.toLong).
Trait for shared param solver.
Trait for shared param standardization (default: true).
Trait for shared param stepSize.
Trait for shared param threshold.
Trait for shared param thresholds.
Trait for shared param tol.
Trait for models that provide a training summary.
Trait for shared param validationIndicatorCol.
Trait for shared param varianceCol.
Trait for shared param weightCol.
Compute gradient and loss for a Hinge loss function, as used in SVM binary classification.
An interface to represent an equi-height histogram, which is a part of ColumnStatistics.
An interface to represent a bin in an equi-height histogram.
Metrics for access to the hive external catalog.
A servlet filter that implements HTTP security features.
Trait for an object with an immutable unique ID that identifies itself and its derivatives.
Identifies an object in a catalog.
Identity column specification.
Compute the Inverse Document Frequency (IDF) given a collection of documents.
Inverse document frequency (IDF).
Document frequency aggregator.
Model fitted by IDF.
Represents an IDF model that can transform term frequency vectors.
image package implements Spark SQL data source API for loading image data as DataFrame.
Defines the image schema and methods to read and manipulate images.
Factory for Impurity instances.
Trait for calculating information gain.
Imputation estimator for completing missing values, using the mean, median or mode
 of the columns in which the missing values are located.
Model fitted by Imputer.
Params for Imputer and ImputerModel.
A filter that evaluates to true iff the attribute evaluates to one of the values in the array.
String type that was the result of coercing two different non-explicit collations.
Represents a row of IndexedRowMatrix.
Represents a row-oriented DistributedMatrix with indexed rows.
A Transformer that maps a column of indices back to a new column of corresponding string values.
Information gain statistics for each split
 param:  gain information gain value
 param:  impurity current node impurity
 param:  leftImpurity left node impurity
 param:  rightImpurity right node impurity
 param:  leftPredict left node predict
 param:  rightPredict right node predict
In-process launcher for Spark applications.
Specifies an input that can be referenced by another Dataset's query.
This is the abstract base class for all input streams.
This holds file names of the current Spark task.
:: DeveloperApi ::
 Parses and holds information about inputFormat (and files) specified as a parameter.
A serializable representation of an input partition returned by Batch.planInputPartitions() and the corresponding ones in streaming.
Generic options for a read of an input.
A BaseRelation that can be used to insert data into it through the insert method.
Specialized version of Param[Array[Int]] for Java.
The data type representing Int values.
A term that may be part of an interaction, e.g.
Implements the feature interaction transform.
A collection of fields and methods concerned with internal accumulators that represent
 task level metrics.
A writer for KMeans that handles the "internal" (or default) format
A writer for LinearRegression that handles the "internal" (or default) format
Internal Decision Tree node.
:: DeveloperApi ::
 An iterator that wraps around an existing iterator to provide task killing functionality.
Specialized version of Param[Int] for Java.
An extractor object for parsing strings into integers.
A filter that evaluates to true iff the attribute evaluates to a non-null value.
A filter that evaluates to true iff the attribute evaluates to null.
Isotonic regression.
Isotonic regression.
Params for isotonic regression.
Model fitted by IsotonicRegression.
Regression model for isotonic regression.
A Java-friendly interface to DStream, the basic abstraction in Spark Streaming that represents a continuous stream of data.
A Java-friendly interface to InputDStream.
A Kryo serializer for serializing results returned by asJavaIterable.
DStream representing the stream of data generated by mapWithState operation on a JavaPairDStream.
This helper class is used to place some JVM runtime options (e.g. `--add-opens`) required by Spark when using Java 17.
A dummy class as a workaround to show the package doc of spark.mllib in generated Java API docs.
A Java-friendly interface to a DStream of key-value pairs, which provides extra methods like reduceByKey and join.
A Java-friendly interface to InputDStream of key-value pairs.
A Java-friendly interface to ReceiverInputDStream, the abstract class for defining any input stream that receives data over the network.
Java-friendly wrapper for Params.
Defines operations common to several Java RDD implementations.
A Java-friendly interface to ReceiverInputDStream, the abstract class for defining any input stream that receives data over the network.
:: DeveloperApi ::
 A Spark serializer that uses Java's built-in serialization.
A Java-friendly version of SparkContext that returns JavaRDDs and works with Java collections instead of Scala ones.
Low-level status reporting APIs for monitoring job and stage progress.
Deprecated.
This is deprecated as of Spark 3.4.0.
Base trait for events related to JavaStreamingListener
::DeveloperApi::
 Connection provider which opens connection toward various databases (database specific instance
 needed).
:: DeveloperApi ::
 Encapsulates everything (extensions, workarounds, quirks) to handle the
 SQL dialect of a certain database or jdbc driver.
:: DeveloperApi ::
 Registry of dialects that apply to every new jdbc org.apache.spark.sql.DataFrame.
An RDD that executes a SQL query on a JDBC connection and reads results.
The builder to build a single SELECT query.
:: DeveloperApi ::
 A database type definition coupled with the jdbc type needed to send null
 values to the database.
Utilities for launching a web server using Jetty's HTTP Server class
Event classes for JobGenerator
Interface used to listen for job completion or failure events after submitting a job to the
 DAGScheduler.
:: DeveloperApi ::
 A result of a job in the DAGScheduler.
Handle via which a "run" function passed to a ComplexFutureAction can submit jobs for execution.
A servlet filter that requires JWS, a cryptographically signed JSON Web Token, in the header.
Kernel density estimation.
Represents a partitioning where rows are split across partitions based on the partition transform expressions returned by KeyGroupedPartitioning.keys.
A Dataset has been logically grouped by a user specified grouping key.
This is a helper class that wraps the methods in KinesisUtils into a more Python-friendly class and function so that it can be easily instantiated and called from Python's KinesisUtils.
K-means clustering with support for k-means|| initialization proposed by Bahmani et al.
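A minimal sketch of fitting the clustering estimator above, assuming `dataset` is a DataFrame with a "features" vector column; the value of k and the seed are illustrative:

```scala
import org.apache.spark.ml.clustering.KMeans

val kmeans = new KMeans()
  .setK(3)
  .setSeed(1L)

val model = kmeans.fit(dataset)       // KMeansModel
model.clusterCenters.foreach(println) // learned cluster centers
```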
K-means clustering with a k-means++ like initialization mode
 (the k-means|| algorithm by Bahmani et al).
KMeansAggregator computes the distances and updates the centers for blocks
 in sparse or dense matrix in an online fashion.
Generate test data for KMeans.
Model fitted by KMeans.
A clustering model for K-means.
Common params for KMeans and KMeansModel
Summary of KMeans.
A trait that allows a class to give SizeEstimator more accurate size estimation.
Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a
 continuous distribution.
Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a
 continuous distribution.
Object containing the test results for the Kolmogorov-Smirnov test.
Interface implemented by clients to register their classes with Kryo when using Kryo
 serialization.
A Spark serializer that uses the 
 Kryo serialization library.
Updater for L1 regularized problems.
Class that represents the features and label of a data point.
Class that represents the features and labels of a data point.
Label Propagation algorithm.
LAPACK routines for MLlib's vectors and matrices.
Regression model trained using Lasso.
Train a regression model with L1-regularization using Stochastic Gradient Descent.
Trait that holds Layer properties that are needed to instantiate it.
Trait that holds Layer weights (or parameters).
Class used to solve an optimization problem using Limited-memory BFGS.
Latent Dirichlet Allocation (LDA), a topic model designed for text documents.
Latent Dirichlet Allocation (LDA), a topic model designed for text documents.
Model fitted by LDA.
Latent Dirichlet Allocation (LDA) model.
An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can
 hold optimizer-specific parameters for users to set.
Utility methods for LDA.
Decision tree leaf node.
Compute gradient and loss for a Least-squared loss function, as used in linear regression.
A filter that evaluates to true iff the attribute evaluates to a value less than value.
A filter that evaluates to true iff the attribute evaluates to a value less than or equal to value.
Helper trait for defining thread locals with lexical scoping.
Final class representing a handle to a thread local value.
libsvm package implements Spark SQL data source API for loading LIBSVM data as DataFrame.
Generate sample data used for Linear Data.
Linear regression.
Model produced by LinearRegression.
Regression model trained using LinearRegression.
Params for linear regression.
Linear regression results evaluated on a dataset.
Linear regression training results.
Train a linear regression model with no regularization using Stochastic Gradient Descent.
Linear SVM Model trained by LinearSVC.
Params for linear SVM Classifier.
Abstraction for LinearSVC results for a given model.
LinearSVC results for a given model.
Abstraction for LinearSVC training results.
LinearSVC training results.
An event bus which posts events to its listeners.
Interface used for arbitrary stateful operations with the v2 API to capture list value state.
Convenience extractor for any Literal.
Represents a constant literal value in the public expression API.
Tracker for data related to a persisted RDD.
Data about a single partition of a cached RDD.
Trait for classes which can load models and transformers from files.
Event fired after MLReader.load.
Event fired before MLReader.load.
Exception raised when a flow fails to read from a table defined within the pipeline.
An utility object to run K-means locally.
Local (non-distributed) model fitted by LDA.
Local LDA model.
A special Scan which will happen on Driver locally instead of Executors.
Helper methods for working with the logical expressions API.
This interface contains logical write information that data sources can use when generating a WriteBuilder.
Compute gradient and loss for a multinomial logistic loss function, as used in multi-class classification (it is also used in binary logistic regression).
Logistic regression.
Generate test data for LogisticRegression.
Model produced by LogisticRegression.
Classification model trained using Multinomial/Binary Logistic Regression.
Params for logistic regression.
Abstraction for logistic regression results for a given model.
Multiclass logistic regression results for a given model.
Abstraction for multiclass logistic regression training results.
Multiclass logistic regression training results.
Train a classification model for Multinomial/Binary Logistic Regression using
 Limited-memory BFGS.
Train a classification model for Binary Logistic Regression
 using Stochastic Gradient Descent.
Class for log loss calculation (for classification).
Generates i.i.d.
:: DeveloperApi ::
 Utils for querying Spark logs with Spark SQL.
An accumulator for computing sum, count, and average of 64-bit integers.
Specialized version of Param[Long] for Java.
The data type representing Long values.
A trait to encapsulate catalog lookup function and helpful extractors.
Extract legacy table identifier from a multi-part identifier.
Extract legacy table identifier from a multi-part identifier.
Extract catalog and identifier from a multi-part name with the current catalog if needed.
Extract catalog and identifier from a multi-part name with the current catalog if needed.
Extract catalog and namespace from a multi-part name with the current catalog if needed.
Extract catalog and namespace from a multi-part name with the current catalog if needed.
Extract non-session catalog and identifier from a multi-part identifier.
Extract non-session catalog and identifier from a multi-part identifier.
Extract session catalog and identifier from a multi-part identifier.
Extract session catalog and identifier from a multi-part identifier.
Trait for adding "pluggable" loss functions for the gradient boosting algorithm.
Trait for loss function
A loss reason that means we don't yet know why the executor exited.
Lower priority implicit methods for converting Scala objects into Datasets.
Params for LSH.
:: DeveloperApi ::
 LZ4 implementation of CompressionCodec.
:: DeveloperApi ::
 LZF implementation of CompressionCodec.
Base interface for a map function used in Dataset's map function.
Base interface for a map function used in GroupedDataset's mapGroup function.
::Experimental::
 Base interface for a map function used in KeyValueGroupedDataset.mapGroupsWithState(MapGroupsWithStateFunction, org.apache.spark.sql.Encoder, org.apache.spark.sql.Encoder).
:: Private ::
 Represents the result of writing map outputs for a shuffle map task.
:: Private ::
 An opaque metadata tag for registering the result of committing the output of a
 shuffle map task.
Base interface for function used in Dataset's mapPartitions.
An AccumulatorV2 counter for collecting a list of (mapper index, row count).
Interface used for arbitrary stateful operations with the v2 API to capture map value state.
Result returned by a ShuffleMapTask to a scheduler.
The data type for Maps.
DStream representing the stream of data generated by mapWithState operation on a pair DStream.
Factory methods for Matrix.
Factory methods for Matrix.
Trait for a local matrix.
Trait for a local matrix.
Represents an entry in a distributed matrix.
Model representing the result of matrix factorization.
Provides utility functions to be used inside SparkSubmit.
An aggregate function that returns the maximum value in a group.
Rescale each feature individually to range [-1, 1] by dividing through the largest maximum
 absolute value in each feature.
Model fitted by MaxAbsScaler.
Params for MaxAbsScaler and MaxAbsScalerModel.
An extractor object for parsing JVM memory strings, such as "10g", into an Int representing
 the number of megabytes.
MergeIntoWriter provides methods to define and execute merge actions based on specified conditions.
Default Meta-Algorithm read and write implementation.
Metadata is a wrapper over Map[String, Any] that limits the value type to simple ones: Boolean,
 Long, Double, String, Metadata, Array[Boolean], Array[Long], Array[Double], Array[String], and
 Array[Metadata].
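A minimal sketch of building such a metadata map with the supported simple value types, using org.apache.spark.sql.types.MetadataBuilder; the keys and values are illustrative:

```scala
import org.apache.spark.sql.types.{Metadata, MetadataBuilder}

val meta: Metadata = new MetadataBuilder()
  .putString("description", "customer age in years")
  .putLong("maxValue", 120L)
  .putDoubleArray("bins", Array(0.0, 18.0, 65.0, 120.0))
  .build()

meta.getLong("maxValue") // 120
```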
Builder for Metadata.
Interface for a metadata column.
Helper utilities for algorithms using ML metadata
Helper class to identify a method.
Generate RDD(s) containing data for Matrix Factorization.
A SparkDataStream for streaming queries with micro-batch mode.
Helper object that creates an instance of Duration representing a given number of milliseconds.
An aggregate function that returns the minimum value in a group.
LSH class for Jaccard distance.
Model produced by MinHashLSH, where multiple hash functions are stored.
Rescale each feature individually to a common range [min, max] linearly using column summary statistics, which is also known as min-max normalization or Rescaling.
Model fitted by MinMaxScaler.
Params for MinMaxScaler and MinMaxScalerModel.
Helper object that creates an instance of Duration representing a given number of minutes.
:: DeveloperApi ::
 Stores information about a Miscellaneous Process to pass from the scheduler to SparkListeners.
Event emitted by ML operations.
A small trait that defines some methods to send MLEvent.
ML export formats should implement this trait so that users can specify a shortname rather than the fully qualified class name of the exporter.
Machine learning specific Pair RDD functions.
Trait for objects that provide MLReader.
Abstract class for utility classes that can load ML instances.
Helper methods to load, save and pre-process data used in MLLib.
Trait for classes that provide MLWriter.
Abstract class for utility classes that can save ML instances in Spark's internal format.
Abstract class to be implemented by objects that provide ML exportability.
A fitted model, i.e., a Transformer produced by an Estimator.
Evaluator for multiclass classification, which expects input columns: prediction, label,
 weight (optional) and probability (only for logLoss).
Evaluator for multiclass classification.
:: Experimental ::
 Evaluator for multi-label classification, which expects two input
 columns: prediction and label.
Evaluator for multilabel classification.
Classification model based on the Multilayer Perceptron.
Abstraction for MultilayerPerceptronClassification results for a given model.
MultilayerPerceptronClassification results for a given model.
Abstraction for MultilayerPerceptronClassification training results.
MultilayerPerceptronClassification training results.
Classifier trainer based on the Multilayer Perceptron.
Params for Multilayer Perceptron.
This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.
This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean, variance, minimum, maximum, counts, and nonzero counts for instances in sparse or dense vector format in an online fashion.
Trait for multivariate statistical summary of a data matrix.
A Row representing a mutable aggregation buffer.
:: DeveloperApi ::
 A tuple of 2 elements.
URL class loader that exposes the `addURL` method in URLClassLoader.
Naive Bayes Classifiers.
Trains a Naive Bayes model given an RDD of (label, features) pairs.
Model produced by NaiveBayes.
Model for Naive Bayes Classifiers.
Params for Naive Bayes Classifiers.
Represents a field or column reference in the public logical expression API.
Convenience extractor for any Transform.
NamespaceChange subclasses represent requested changes to a namespace.
A NamespaceChange to remove a namespace property.
A NamespaceChange to set a namespace property.
:: DeveloperApi ::
 Base class for dependencies where each partition of the child RDD depends on a small number
 of partitions of the parent RDD.
:: DeveloperApi ::
 An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS,
 sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).
A feature transformer that converts the input array of strings into an array of n-grams.
InputStream implementation which uses a direct buffer to read a file, to avoid the extra copy of data between Java and native memory which happens when using BufferedInputStream.
Object used to solve nonnegative least squares problems using a modified projected gradient method.
Decision tree node interface.
Node in a decision tree.
Used to specify that no flows should be refreshed.
Make the classifyException method throw out the original exception.
A nominal attribute.
NOOP dialect object, always returning the neutral element.
Interface for classes that solve the normal equations locally.
Normalize a vector to have unit norm using the given p-norm.
Normalizes samples individually to unit L^p^ norm
A predicate that evaluates to true iff child is evaluated to false.
A filter that evaluates to true iff child is evaluated to false.
Used to select no tables.
A null order used in sorting expressions.
The data type representing NULL values.
A numeric attribute with optional summary statistics.
A generic, re-usable histogram class that supports partial aggregations.
The Coord class defines a histogram bin, which is just an (x,y) pair.
Simple parser for a numeric structure consisting of three types:
Numeric data types.
Helper class to simplify usage of Dataset.observe(String, Column, Column*):
An abstract representation of progress through a MicroBatchStream or ContinuousStream.
A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index.
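A minimal sketch of the encoding described above, assuming `df` has an integer category-index column named "color_idx"; the column names are illustrative:

```scala
import org.apache.spark.ml.feature.OneHotEncoder

val encoder = new OneHotEncoder()
  .setInputCols(Array("color_idx"))
  .setOutputCols(Array("color_vec"))

// Each output row is a sparse binary vector with at most a single 1.0 at the category's index.
val encoded = encoder.fit(df).transform(df)
```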
Private trait for params and common methods for OneHotEncoder and OneHotEncoderModel
Provides some helper methods used by OneHotEncoder.
param:  categorySizes  Original number of categories for each feature being encoded.
:: DeveloperApi ::
 Represents a one-to-one dependency between partitions of the parent and child RDDs.
Reduction of Multiclass Classification to Binary Classification.
Model produced by OneVsRest.
Params for OneVsRest.
An online optimizer for LDA.
Trait for optimization problem solvers.
Like java.util.Optional in Java 8, scala.Option in Scala, and com.google.common.base.Optional in Google Guava, this class represents a value of a given type that may or may not exist.
A predicate that evaluates to true iff at least one of left or right evaluates to true.
A filter that evaluates to true iff at least one of left or right evaluates to true.
A distribution where tuples have been ordered across partitions according
 to ordering expressions, but not necessarily within a given partition.
Extra functions available on RDDs of (key, value) pairs where the key is sortable through
 an implicit conversion.
Represents a node in a DataflowGraph that can be written to by a Flow.
OutputMode describes what data will be written to a streaming sink when there is new data available in a streaming DataFrame/Dataset.
:: DeveloperApi ::
 Class having information on output operations.
A paged table that will generate a HTML table for a specified page and also the page navigation.
PageRank algorithm implementation.
Extra functions available on DStream of (key, value) pairs through an implicit conversion.
A function that returns zero or more key-value pair records from each input record.
A function that returns key-value pairs (Tuple2<K, V>), and can be used to
 construct PairRDDs.
Extra functions available on RDDs of (key, value) pairs through an implicit conversion.
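A small sketch of the extra pair-RDD operations this implicit conversion makes available (the SparkContext named sc and the data are illustrative assumptions):

    // Assumes an existing SparkContext named `sc`; the data is illustrative.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // reduceByKey, groupByKey, join, etc. become available via the implicit conversion.
    val counts = pairs.reduceByKey(_ + _)                               // ("a", 4), ("b", 2)
    val joined = counts.join(sc.parallelize(Seq(("a", "x"), ("b", "y"))))
    joined.collect().foreach(println)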
Form an RDD[(Int, Array[Byte])] from key-value pairs returned from R.
A param with self-contained documentation and optionally default value.
Builder for a param grid used in grid search-based model selection.
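A brief sketch of building a parameter grid for model selection (the LogisticRegression estimator and the parameter values are illustrative assumptions):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.tuning.ParamGridBuilder

    val lr = new LogisticRegression()

    // Each combination of the listed values becomes one ParamMap in the grid.
    val paramGrid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
      .build()   // Array[ParamMap] with 2 * 3 = 6 entries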
A param to value map.
A param and its value.
Trait for components that take parameters.
Factory methods for common validation functions for 
Param.isValid.A class loader which makes some protected methods in ClassLoader accessible.
An identifier for a partition in an RDD.
::DeveloperApi::
 A PartitionCoalescer defines how to coalesce the partitions of a given RDD.
An object that defines how the elements in a key-value pair RDD are partitioned by key.
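A short sketch of controlling how a key-value RDD is partitioned, using the built-in HashPartitioner (the SparkContext named sc and the data are illustrative assumptions):

    import org.apache.spark.HashPartitioner

    // Assumes an existing SparkContext named `sc`.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))

    // Repartition by key so that equal keys land in the same partition.
    val partitioned = pairs.partitionBy(new HashPartitioner(4))
    println(partitioned.partitioner)   // Some(HashPartitioner)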
An evaluator for computing RDD partitions.
A factory to create 
PartitionEvaluator.::DeveloperApi::
 A group of 
Partitions
 param:  prefLoc preferred location for the partition groupAn interface to represent the output data partitioning for a data source, which is returned by
 
SupportsReportPartitioning.outputPartitioning().Used for per-partition offsets in continuous processing.
:: DeveloperApi ::
 An RDD used to prune RDD partitions so we can avoid launching tasks on
 all partitions.
A partition reader returned by 
PartitionReaderFactory.createReader(InputPartition) or
 PartitionReaderFactory.createColumnarReader(InputPartition).A factory used to create 
PartitionReader instances.Represents the way edges are assigned to edge partitions based on their source and destination
 vertex IDs.
Assigns edges to partitions by hashing the source and destination vertex IDs in a canonical
 direction, resulting in a random vertex cut that colocates all edges between two vertices,
 regardless of direction.
Assigns edges to partitions using only the source vertex ID, colocating edges with the same
 source.
Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix,
 guaranteeing a 
2 * sqrt(numParts) bound on vertex replication.Assigns edges to partitions by hashing the source and destination vertex IDs, resulting in a
 random vertex cut that colocates all same-direction edges between two vertices.
PCA trains a model to project vectors to a lower dimensional space of the top k
 principal components.A feature transformer that projects vectors to a low-dimensional space using PCA.
Model fitted by 
PCA.Model fitted by 
PCA that can project vectors to a low-dimensional space using PCA.Compute Pearson correlation for two RDDs of the type RDD[Double] or the correlation matrix
 for an RDD of the type RDD[Vector].
Representing a persisted 
View in a DataflowGraph.This interface contains physical write information that data sources can use when
 generating a 
DataWriterFactory or a StreamingDataWriterFactory.A simple pipeline, which acts as an estimator.
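A condensed sketch of assembling and fitting a pipeline of feature transformers plus an estimator (the SparkSession named spark, the stage configuration, and the training data are illustrative assumptions):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    // Assumes an existing SparkSession named `spark`; the training data is illustrative.
    val training = spark.createDataFrame(Seq(
      (0L, "spark rdd dataframe", 1.0),
      (1L, "unrelated text", 0.0)
    )).toDF("id", "text", "label")

    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)

    // The pipeline itself acts as an estimator; fit() returns a fitted pipeline (PipelineModel).
    val model = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)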
An internal event that is emitted during the run of a pipeline.
Describes where the event originated from
 param:  datasetName The name of the dataset
 param:  flowName The name of the flow
 param:  sourceCodeLocation The location of the source code
Executes a 
DataflowGraph by resolving the graph, materializing datasets, and running the
 flows.Represents a fitted pipeline.
An in-memory buffer which contains the internal events that are emitted during a run of a
 pipeline.
Interface for validating and accessing Pipeline-specific table properties.
A stage in a pipeline, either an 
Estimator or a Transformer.An implementation of the PipelineUpdateContext trait used in production.
:: DeveloperApi ::
 Context information and operations for plugins loaded by Spark.
Export model to the PMML format
 Predictive Model Markup Language (PMML) is an XML-based file format
 developed by the Data Mining Group (www.dmg.org).
A writer for KMeans that handles the "pmml" format
A writer for LinearRegression that handles the "pmml" format
Utility functions that help us determine bounds on adjusted sampling rate to guarantee exact
 sample sizes with high confidence when sampling with replacement.
Generates i.i.d.
:: DeveloperApi ::
 A sampler for sampling with replacement, based on values drawn from Poisson distribution.
Perform feature expansion in a polynomial space.
A class that allows DataStreams to be serialized and moved around by not creating them
 until they need to be read
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
 Lin and Cohen.
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
 Lin and Cohen.
Cluster assignment.
Model produced by 
PowerIterationClustering.Common params for PowerIterationClustering
Precision.
The general representation of predicate expressions, which contains the upper-cased expression
 name and all the children expressions.
Predicted value for a node
 param:  predict predicted value
 param:  prob probability of the label (classification only)
Abstraction for a model for prediction tasks (regression and classification).
Predictor<FeaturesType,Learner extends Predictor<FeaturesType,Learner,M>,M extends PredictionModel<FeaturesType,M>>     
Abstraction for prediction problems (regression and classification).
(private[ml])  Trait for parameters for prediction (regression and classification).
A parallel PrefixSpan algorithm to mine frequent sequential patterns.
A parallel PrefixSpan algorithm to mine frequent sequential patterns.
Represents a frequent sequence.
Model fitted by 
PrefixSpan
 param:  freqSequences frequent sequencesImplements a Pregel-like bulk-synchronous message-passing API.
A PRIMARY KEY constraint.
ProbabilisticClassificationModel<FeaturesType,M extends ProbabilisticClassificationModel<FeaturesType,M>>  
Model produced by a 
ProbabilisticClassifier.ProbabilisticClassifier<FeaturesType,E extends ProbabilisticClassifier<FeaturesType,E,M>,M extends ProbabilisticClassificationModel<FeaturesType,M>>     
Single-label binary or multiclass classifier which can output class conditional probabilities.
(private[classification])  Params for probabilistic classification.
A base interface for all procedures.
A catalog API for working with procedures.
A 
procedure parameter.An enum representing procedure parameter modes.
:: DeveloperApi ::
 
ProtobufSerDe used to represent the API for serialize and deserialize of
 Protobuf data related to UI.A Jetty handler to handle redirects to a proxy server.
A BaseRelation that can eliminate unneeded columns and filter using selected
 predicates before producing an RDD containing all matching tuples as Row objects.
A BaseRelation that can eliminate unneeded columns before producing an RDD
 containing all of its tuples as Row objects.
:: DeveloperApi ::
 A class with pseudorandom behavior.
Helper class for 
ShuffleBlockFetcherIterator that encapsulates all the push-based
 functionality to fetch push-merged block meta and shuffle chunks.Py4J allows a pure interface so this proxy is required.
Represents QR factors.
QuantileDiscretizer takes a column with continuous features and outputs a column with binned
 categorical features.Params for 
QuantileDiscretizer.Enum for selecting the quantile calculation strategy
Query context of a 
SparkThrowable.Contains the catalog and database context information for query execution.
The type of 
QueryContext.Indicates that a run has failed due to a query execution failure.
The interface of query execution listener that can be used to analyze execution metrics.
Represents the query info provided to the stateful processor used in the arbitrary state API v2
 to easily identify task retries on the same partition.
Records information used to track the provenance of a given query to user code.
Trait for random data generators that generate i.i.d.
A class that implements a Random Forest
 learning algorithm for classification and regression.
Random Forest model for classification.
Abstraction for multiclass RandomForestClassification results for a given model.
Multiclass RandomForestClassification results for a given model.
Abstraction for multiclass RandomForestClassification training results.
Multiclass RandomForestClassification training results.
Random Forest learning algorithm for
 classification.
Represents a random forest model.
Parameters for Random Forest algorithms.
Random Forest model for regression.
Random Forest
 learning algorithm for regression.
Generator methods for creating RDDs comprised of 
i.i.d. samples from some distribution.:: DeveloperApi ::
 A pseudorandom sampler.
:: DeveloperApi ::
 Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
A 
Partitioner that partitions sortable records by range into roughly
 equal ranges.:: Experimental ::
 Evaluator for ranking, which expects two input columns: prediction and label.
Evaluator for ranking algorithms.
A component that estimates the rate at which an 
InputDStream should ingest
 records, based on updates at every batch completion.A more compact class to represent a rating than Tuple3[Int, Int, Double].
A helper program that sends blocks of Kryo-serialized text strings out on a socket at a
 specified rate.
Authentication handler for connections from the R process.
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
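A minimal sketch of creating and transforming an RDD (the SparkContext named sc is an illustrative assumption):

    // Assumes an existing SparkContext named `sc`.
    val numbers = sc.parallelize(1 to 100)

    // Transformations are lazy; the reduce action triggers execution.
    val sumOfEvenSquares = numbers
      .filter(_ % 2 == 0)
      .map(n => n * n)
      .reduce(_ + _)

    println(sumOfEvenSquares)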
:: Experimental ::
 Wraps an RDD in a barrier stage, which forces Spark to launch tasks of this stage together.
Machine learning specific RDD functions.
A custom sequence of partitions based on a mutable linked list.
InputStream implementation which asynchronously reads ahead from the underlying input
 stream when specified amount of data has been read from the current buffer.Represents a 
ReadLimit where the MicroBatchStream must scan all the data
 available at the streaming source.Interface representing limits on how much to read from a 
MicroBatchStream when it
 implements SupportsAdmissionControl.Represents a 
ReadLimit where the MicroBatchStream should scan files which total
 size doesn't go beyond a given maximum total size.Represents a 
ReadLimit where the MicroBatchStream should scan approximately the
 given maximum number of files.Represents a 
ReadLimit where the MicroBatchStream should scan approximately the
 given maximum number of rows.Represents a 
ReadLimit where the MicroBatchStream should scan approximately
 at least the given minimum number of rows.Recall.
Trait representing a received block
Trait that represents a class that handles the storage of blocks received by receiver
Trait that represents the metadata related to storage of blocks
Trait representing any event in the ReceivedBlockTracker that updates its state.
:: DeveloperApi ::
 Abstract class of a receiver that can be run on worker nodes to receive external data.
:: DeveloperApi ::
 Class having information about a receiver
Abstract class for defining any 
InputDStream
 that has to start a receiver on worker nodes to receive external data.Messages sent to the Receiver.
Enumeration to identify current state of a Receiver
Messages used by the driver and ReceiverTrackerEndpoint to communicate locally.
Messages used by the NetworkReceiver and the ReceiverTracker to communicate
 with each other.
Base interface for function used in Dataset's reduce.
A 'reducer' for output of user-defined functions.
Base class for user-defined functions that can be 'reduced' on another function.
Convenience extractor for any NamedReference.
A regex based tokenizer that extracts tokens either by using the provided regex pattern to split
 the text (default) or repeatedly matching the regex (if 
gaps is false).Evaluator for regression, which expects input columns prediction, label and
 an optional weight column.
Evaluator for regression.
Model produced by a 
Regressor.Regressor<FeaturesType,Learner extends Regressor<FeaturesType,Learner,M>,M extends RegressionModel<FeaturesType,M>>     
Single-label regression
Implemented by objects that produce relations for a specific kind of data source.
A mix-in interface for streaming sinks to signal that they can report
 metrics.
A mix-in interface for 
SparkDataStream streaming sources to signal that they can report
 metrics.A write that requires a specific distribution and ordering of data.
A 
Flow whose flow function has been invoked, meaning either:
  - Its output schema and dependencies are known.A 
Flow whose flow function has failed to resolve.A 
Flow whose flow function has successfully resolved.A wrapper for a resolved internal input that includes the alias provided by the user.
Trait used to help executor/worker allocate resources.
:: DeveloperApi ::
 A plugin that can be dynamically loaded into a Spark application to control how custom
 resources are discovered.
The default plugin that is loaded into a Spark application to control how custom
 resources are discovered.
Resource identifier.
Class to hold information about a type of Resource.
A case class to simplify JSON serialization of 
ResourceInformation.Resource profile to associate with an RDD.
Resource profile builder to build a 
ResourceProfile to associate with an RDD.Class that represents a resource request.
:: DeveloperApi ::
 A 
org.apache.spark.scheduler.ShuffleMapTask that completed successfully earlier, but we
 lost the executor before the stage completed.Allows Spark to rewrite the given references of the transform during analysis.
Implements the transforms required for fitting a dataset against an R model formula.
Base trait for 
RFormula and RFormulaModel.Model fitted by 
RFormula.Limited implementation of R formula parsing.
Regression model trained using RidgeRegression.
Train a regression model with L2-regularization using Stochastic Gradient Descent.
Scale features using statistics that are robust to outliers.
Model fitted by 
RobustScaler.Params for 
RobustScaler and RobustScalerModel.Defines the policy based on which 
RollingFileAppender will
 generate rolling files.Represents one row of output from a relational operator.
A factory class used to construct 
Row objects.A logical representation of a data source DELETE, UPDATE, or MERGE operation that requires
 rewriting data.
A row-level SQL command.
An interface for building a 
RowLevelOperation.An interface with logical information for a row-level operation such as DELETE, UPDATE, MERGE.
Represents a row-oriented distributed Matrix with no meaningful row indices.
An RDD that stores serialized R objects as Array[Byte].
Indicates that a triggered run has successfully completed execution.
Indicates that a run entered the failed state.
Helper exception class that indicates that a run has to be terminated and
 tracks the associated termination reason.
Runtime configuration interface for Spark.
This is the Scala stub of SparkR read.ml.
Filter that allows loading a fraction of HDFS files.
Trait for models and transformers which may be saved as files.
Event fired after 
MLWriter.save.Event fired before 
MLWriter.save.SaveMode is used to specify the expected behavior of saving a DataFrame to a data source.
Interface for a function that produces a result value for each input row.
A logical representation of a data source scan.
This enum defines how the columnar support for the partitions of the data source
 should be determined.
An interface for building the 
Scan.An interface for schedulable entities.
An interface to build a Schedulable tree:
 buildPools: build the tree nodes (pools)
 addTaskSetManager: build the leaf nodes (TaskSetManagers)
A backend interface for scheduling systems that allows plugging in different ones under
 TaskSchedulerImpl.
An interface for sort algorithms:
 FIFO: FIFO algorithm between TaskSetManagers
 FS: FS algorithm between Pools, and FIFO or FS within Pools
"FAIR" and "FIFO" determines which policy is used
    to order tasks amongst a Schedulable's sub-queues
  "NONE" is used when the a Schedulable has no sub-queues.
Implemented by objects that produce relations for a specific kind of data source
 with a given schema.
Utils for handling schemas.
Utils for handling schemas.
Helper object that creates instance of 
Duration representing
 a given number of seconds.There are cases when global JVM security configuration must be modified.
Various utility methods used by Spark Security.
Params for 
Selector and SelectorModel.Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile,
 through an implicit conversion.
Utility functions to serialize, deserialize objects to / from R
Hadoop configuration but serializable.
SerializableWritable<T extends org.apache.hadoop.io.Writable>
An implicit class that allows us to call private methods of ObjectStreamClass.
:: DeveloperApi ::
 A stream for writing serialized objects.
A holder for storing the serialized values.
:: DeveloperApi ::
 A serializer.
:: DeveloperApi ::
 An instance of a serializer, for use by one thread at a time.
A mix-in interface for 
TableProvider.Code generator for shared params (sharedParams.scala).
Computes shortest paths to the given set of landmark vertices, returning a graph where each
 vertex attribute is a map containing the shortest-path distance to each reachable landmark.
The data type representing 
Short values.:: Private ::
 An interface for plugging in modules for storing and reading temporary shuffle data.
:: DeveloperApi ::
 Represents a dependency on the output of a shuffle stage.
:: DeveloperApi ::
 The resulting RDD from a shuffle (e.g.
:: Private ::
 An interface for building shuffle support modules for the Driver.
:: Private ::
 An interface for building shuffle support for Executors.
A listener to be called at the completion of the ShuffleBlockFetcherIterator
 param:  data the ShuffleBlockFetcherIterator to process
:: Private ::
 A top-level writer that returns child writers for persisting the output of a map task,
 and then commits all of the writes as one atomic operation.
A common trait between 
MapStatus and MergeStatus.:: Private ::
 An interface for opening streams to persist partition bytes to a backing data store.
Helper class used by the 
MapOutputTrackerMaster to perform bookkeeping for a single
 ShuffleMapStage.Various utility methods used by Spark.
Contains utilities for working with posix signals.
A 
FutureAction holding the result of an action that triggers a single job.A 
CachedBatch that stores some simple metrics that can be used for filtering of batches with
 the SimpleMetricsCachedBatchSerializer.Provides basic filtering for 
CachedBatchSerializer implementations.A simple updater for gradient descent *without* any regularization.
Optional extension for partition writing that is optimized for transferring a single
 file to the backing store.
Represents singular value decomposition (SVD) factors.
Information about progress made for a sink in the execution of a 
StreamingQuery during a
 trigger.:: DeveloperApi ::
 Estimates the sizes of Java objects (number of bytes of memory they occupy), for use in
 memory-aware caches.
:: DeveloperApi ::
 Snappy implementation of 
CompressionCodec.Used in partial graph updates to select "selectedTables".
A sort direction used in sorting expressions.
Represents a sort order in the public expression API.
Information about progress made for a source in the execution of a 
StreamingQuery during a
 trigger.A handle to a running Spark application.
Listener for updates to a handle's state.
Represents the application's state.
Serializable interface providing a method executors can call to obtain an
 AWSCredentialsProvider instance for authenticating to AWS services.
Builder for 
SparkAWSCredentials instances.Configuration for a Spark application.
Main entry point for Spark functionality.
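A minimal sketch of constructing a SparkContext from a SparkConf (the application name and master URL are illustrative assumptions; in a cluster they usually come from spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}

    // The app name and master URL are illustrative, not required values.
    val conf = new SparkConf().setAppName("example-app").setMaster("local[*]")
    val sc = new SparkContext(conf)

    println(sc.parallelize(1 to 10).count())
    sc.stop()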
The base interface representing a readable data stream in a Spark streaming query.
:: DeveloperApi ::
 Holds all the runtime environment objects for a running Spark instance (either master or worker),
 including the serializer, RpcEnv, block manager, map output tracker, etc.
Exposes information about Spark Executors.
Resolves paths to files added through 
SparkContext.addFile().TODO (PARQUET-1809): This is a temporary workaround; it is intended to be moved to Parquet.
Class that allows users to receive all SparkListener events.
Exposes information about Spark Jobs.
Launcher for Spark applications.
:: DeveloperApi ::
 A default implementation for 
SparkListenerInterface that has no-op implementations for
 all callbacks.A 
SparkListenerEvent bus that relays SparkListenerEvents to its listenersDeprecated.
use SparkListenerExecutorExcluded instead.
Deprecated.
use SparkListenerExecutorExcludedForStage instead.
Periodic updates from executors.
Deprecated.
use SparkListenerExecutorUnexcluded instead.
Interface for listening to events from the Spark scheduler.
An internal class that describes the metadata of an event log.
Deprecated.
use SparkListenerNodeExcluded instead.
Deprecated.
use SparkListenerNodeExcludedForStage instead.
Deprecated.
use SparkListenerNodeUnexcluded instead.
Peak metric values for the executor for the stage, written to the history log at stage
 completion.
A collection of regexes for extracting information from the master string.
A canonical representation of a file path.
:: DeveloperApi ::
 A plugin that can be dynamically loaded into a Spark application.
Utils for handling schemas.
The entry point to programming Spark with the Dataset and DataFrame API.
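A minimal sketch of obtaining a SparkSession and creating a small DataFrame with it (the application name, master URL, and data are illustrative assumptions):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("example-app")
      .master("local[*]")        // illustrative; normally supplied by spark-submit
      .getOrCreate()

    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    df.filter($"id" > 1).show()
    spark.stop()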
:: Experimental ::
 Holder for injection points to the 
SparkSession.Base trait for implementations used by 
SparkSessionExtensionsExposes information about Spark Stages.
Low-level status reporting APIs for monitoring job and stage progress.
Interface mixed into Throwables thrown from Spark.
Companion object used by instances of 
SparkThrowable to access error class information and
 construct error messages.Column-major sparse matrix.
Column-major sparse matrix.
A sparse vector represented by an index array and a value array.
A sparse vector represented by an index array and a value array.
Compute Spearman's correlation for two RDDs of the type RDD[Double] or the correlation matrix
 for an RDD of the type RDD[Vector].
A 
SparkListener that detects whether spills have occurred in Spark jobs.Interface for a "Split," which specifies a test made at a decision tree node
 to choose the left or right path.
Split applied to a feature
 param:  feature feature index
 param:  threshold Threshold for continuous feature.
The entry point for working with structured data (rows and columns) in Spark 1.x.
This SQLContext object contains utility functions to create a singleton SQLContext instance, or
 to get the created SQLContext instance.
SQL data types for vectors and matrices.
SQL statement processor context.
Class that holds the logical plan and query origin parsed from a SQL statement.
Data class for all state that is accumulated while processing a particular
 
SqlGraphRegistrationContext.A collection of implicit methods for converting common Scala objects into
 
Datasets.Implements the transformations which are defined by SQL statement.
::DeveloperApi::
 A user-defined type which can be automatically recognized by a SQLContext and registered.
Class for squared error loss calculation.
SquaredEuclideanSilhouette computes the average of the
 Silhouette over all the data of the dataset, which is
 a measure of how appropriately the data have been clustered.
Updater for L2 regularized problems.
Represents a table which is staged for being committed to the metastore.
:: DeveloperApi ::
 Stores information about a stage to pass from the scheduler to SparkListeners.
An optional mix-in for implementations of 
TableCatalog that support staging creation of
 a table before committing the table's metadata along with its contents in CREATE TABLE AS
 SELECT or REPLACE TABLE AS SELECT operations.Generates i.i.d.
Standardizes features by removing the mean and scaling to unit variance using column summary
 statistics on the samples in the training set.
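A short sketch of fitting and applying the ML StandardScaler (the SparkSession named spark, the column names, and the data are illustrative assumptions):

    import org.apache.spark.ml.feature.StandardScaler
    import org.apache.spark.ml.linalg.Vectors

    // Assumes an existing SparkSession named `spark`.
    val df = spark.createDataFrame(Seq(
      (0, Vectors.dense(1.0, 10.0)),
      (1, Vectors.dense(3.0, 30.0))
    )).toDF("id", "features")

    // Fit column summary statistics, then center and scale to unit variance.
    val scalerModel = new StandardScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures")
      .setWithMean(true)
      .setWithStd(true)
      .fit(df)

    scalerModel.transform(df).show(false)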
Standardizes features by removing the mean and scaling to unit std using column summary
 statistics on the samples in the training set.
Model fitted by 
StandardScaler.Represents a StandardScaler model that can transform vectors.
Params for 
StandardScaler and StandardScalerModel.A class for tracking the statistics of a set of numbers (count, mean and variance) in a
 numerically robust way.
:: Experimental ::
 Abstract class for getting and updating the state in mapping function used in the 
mapWithState
 operation of a pair DStream (Scala)
 or a JavaPairDStream (Java).Represents the arbitrary stateful logic that needs to be provided by the user to perform
 stateful manipulations on keyed streams.
Represents the operation handle provided to the stateful processor used in the arbitrary state
 API v2.
Stateful processor with support for specifying initial state.
Information about updates made to stateful operators in a 
StreamingQuery during a trigger.:: Experimental ::
 Abstract class representing all the specifications of the DStream transformation
 
mapWithState operation of a
 pair DStream (Scala) or a
 JavaPairDStream (Java).API for statistical functions in MLlib.
An interface to represent statistics for a data source, which is returned by
 
SupportsReportStatistics.estimateStatistics().:: DeveloperApi ::
 Simple SparkListener that logs a few summary statistics when each stage completes.
:: DeveloperApi ::
 A simple StreamingListener that logs summary statistics across Spark Streaming batches
 param:  numBatchInfos Number of last batches to consider for generating statistics (default: 10)
This message will trigger ReceiverTrackerEndpoint to send stop signals to all registered
 receivers.
A feature transformer that filters out stop words from input.
:: DeveloperApi ::
 Flags for controlling the storage of an RDD.
A mapper class that makes it easy to obtain storage levels based on their names.
Expose some commonly useful storage level constants.
Helper methods for storage-related objects.
Protobuf type 
org.apache.spark.status.protobuf.AccumulableInfoProtobuf type 
org.apache.spark.status.protobuf.AccumulableInfoProtobuf type 
org.apache.spark.status.protobuf.ApplicationAttemptInfoProtobuf type 
org.apache.spark.status.protobuf.ApplicationAttemptInfoProtobuf type 
org.apache.spark.status.protobuf.ApplicationEnvironmentInfoProtobuf type 
org.apache.spark.status.protobuf.ApplicationEnvironmentInfoProtobuf type 
org.apache.spark.status.protobuf.ApplicationEnvironmentInfoWrapperProtobuf type 
org.apache.spark.status.protobuf.ApplicationEnvironmentInfoWrapperProtobuf type 
org.apache.spark.status.protobuf.ApplicationInfoProtobuf type 
org.apache.spark.status.protobuf.ApplicationInfoProtobuf type 
org.apache.spark.status.protobuf.ApplicationInfoWrapperProtobuf type 
org.apache.spark.status.protobuf.ApplicationInfoWrapperProtobuf type 
org.apache.spark.status.protobuf.AppSummaryProtobuf type 
org.apache.spark.status.protobuf.AppSummaryProtobuf type 
org.apache.spark.status.protobuf.CachedQuantileProtobuf type 
org.apache.spark.status.protobuf.CachedQuantileProtobuf enum 
org.apache.spark.status.protobuf.DeterministicLevelProtobuf type 
org.apache.spark.status.protobuf.ExecutorMetricsProtobuf type 
org.apache.spark.status.protobuf.ExecutorMetricsProtobuf type 
org.apache.spark.status.protobuf.ExecutorMetricsDistributionsProtobuf type 
org.apache.spark.status.protobuf.ExecutorMetricsDistributionsProtobuf type 
org.apache.spark.status.protobuf.ExecutorPeakMetricsDistributionsProtobuf type 
org.apache.spark.status.protobuf.ExecutorPeakMetricsDistributionsProtobuf type 
org.apache.spark.status.protobuf.ExecutorResourceRequestProtobuf type 
org.apache.spark.status.protobuf.ExecutorResourceRequestProtobuf type 
org.apache.spark.status.protobuf.ExecutorStageSummaryProtobuf type 
org.apache.spark.status.protobuf.ExecutorStageSummaryProtobuf type 
org.apache.spark.status.protobuf.ExecutorStageSummaryWrapperProtobuf type 
org.apache.spark.status.protobuf.ExecutorStageSummaryWrapperProtobuf type 
org.apache.spark.status.protobuf.ExecutorSummaryProtobuf type 
org.apache.spark.status.protobuf.ExecutorSummaryProtobuf type 
org.apache.spark.status.protobuf.ExecutorSummaryWrapperProtobuf type 
org.apache.spark.status.protobuf.ExecutorSummaryWrapperProtobuf type 
org.apache.spark.status.protobuf.InputMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.InputMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.InputMetricsProtobuf type 
org.apache.spark.status.protobuf.InputMetricsProtobuf type 
org.apache.spark.status.protobuf.JobDataProtobuf type 
org.apache.spark.status.protobuf.JobDataProtobuf type 
org.apache.spark.status.protobuf.JobDataWrapperProtobuf type 
org.apache.spark.status.protobuf.JobDataWrapperProtobuf enum 
org.apache.spark.status.protobuf.JobExecutionStatusProtobuf type 
org.apache.spark.status.protobuf.MemoryMetricsProtobuf type 
org.apache.spark.status.protobuf.MemoryMetricsProtobuf type 
org.apache.spark.status.protobuf.OutputMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.OutputMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.OutputMetricsProtobuf type 
org.apache.spark.status.protobuf.OutputMetricsProtobuf type 
org.apache.spark.status.protobuf.PairStringsProtobuf type 
org.apache.spark.status.protobuf.PairStringsProtobuf type 
org.apache.spark.status.protobuf.PoolDataProtobuf type 
org.apache.spark.status.protobuf.PoolDataProtobuf type 
org.apache.spark.status.protobuf.ProcessSummaryProtobuf type 
org.apache.spark.status.protobuf.ProcessSummaryProtobuf type 
org.apache.spark.status.protobuf.ProcessSummaryWrapperProtobuf type 
org.apache.spark.status.protobuf.ProcessSummaryWrapperProtobuf type 
org.apache.spark.status.protobuf.RDDDataDistributionProtobuf type 
org.apache.spark.status.protobuf.RDDDataDistributionProtobuf type 
org.apache.spark.status.protobuf.RDDOperationClusterWrapperProtobuf type 
org.apache.spark.status.protobuf.RDDOperationClusterWrapperProtobuf type 
org.apache.spark.status.protobuf.RDDOperationEdgeProtobuf type 
org.apache.spark.status.protobuf.RDDOperationEdgeProtobuf type 
org.apache.spark.status.protobuf.RDDOperationGraphWrapperProtobuf type 
org.apache.spark.status.protobuf.RDDOperationGraphWrapperProtobuf type 
org.apache.spark.status.protobuf.RDDOperationNodeProtobuf type 
org.apache.spark.status.protobuf.RDDOperationNodeProtobuf type 
org.apache.spark.status.protobuf.RDDPartitionInfoProtobuf type 
org.apache.spark.status.protobuf.RDDPartitionInfoProtobuf type 
org.apache.spark.status.protobuf.RDDStorageInfoProtobuf type 
org.apache.spark.status.protobuf.RDDStorageInfoProtobuf type 
org.apache.spark.status.protobuf.RDDStorageInfoWrapperProtobuf type 
org.apache.spark.status.protobuf.RDDStorageInfoWrapperProtobuf type 
org.apache.spark.status.protobuf.ResourceInformationProtobuf type 
org.apache.spark.status.protobuf.ResourceInformationProtobuf type 
org.apache.spark.status.protobuf.ResourceProfileInfoProtobuf type 
org.apache.spark.status.protobuf.ResourceProfileInfoProtobuf type 
org.apache.spark.status.protobuf.ResourceProfileWrapperProtobuf type 
org.apache.spark.status.protobuf.ResourceProfileWrapperProtobuf type 
org.apache.spark.status.protobuf.RuntimeInfoProtobuf type 
org.apache.spark.status.protobuf.RuntimeInfoProtobuf type 
org.apache.spark.status.protobuf.ShufflePushReadMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.ShufflePushReadMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.ShufflePushReadMetricsProtobuf type 
org.apache.spark.status.protobuf.ShufflePushReadMetricsProtobuf type 
org.apache.spark.status.protobuf.ShuffleReadMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.ShuffleReadMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.ShuffleReadMetricsProtobuf type 
org.apache.spark.status.protobuf.ShuffleReadMetricsProtobuf type 
org.apache.spark.status.protobuf.ShuffleWriteMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.ShuffleWriteMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.ShuffleWriteMetricsProtobuf type 
org.apache.spark.status.protobuf.ShuffleWriteMetricsProtobuf type 
org.apache.spark.status.protobuf.SinkProgressProtobuf type 
org.apache.spark.status.protobuf.SinkProgressProtobuf type 
org.apache.spark.status.protobuf.SourceProgressProtobuf type 
org.apache.spark.status.protobuf.SourceProgressProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphClusterWrapperProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphClusterWrapperProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphEdgeProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphEdgeProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphNodeProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphNodeProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphNodeWrapperProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphNodeWrapperProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphWrapperProtobuf type 
org.apache.spark.status.protobuf.SparkPlanGraphWrapperProtobuf type 
org.apache.spark.status.protobuf.SpeculationStageSummaryProtobuf type 
org.apache.spark.status.protobuf.SpeculationStageSummaryProtobuf type 
org.apache.spark.status.protobuf.SpeculationStageSummaryWrapperProtobuf type 
org.apache.spark.status.protobuf.SpeculationStageSummaryWrapperProtobuf type 
org.apache.spark.status.protobuf.SQLExecutionUIDataProtobuf type 
org.apache.spark.status.protobuf.SQLExecutionUIDataProtobuf type 
org.apache.spark.status.protobuf.SQLPlanMetricProtobuf type 
org.apache.spark.status.protobuf.SQLPlanMetricProtobuf type 
org.apache.spark.status.protobuf.StageDataProtobuf type 
org.apache.spark.status.protobuf.StageDataProtobuf type 
org.apache.spark.status.protobuf.StageDataWrapperProtobuf type 
org.apache.spark.status.protobuf.StageDataWrapperProtobuf enum 
org.apache.spark.status.protobuf.StageStatusProtobuf type 
org.apache.spark.status.protobuf.StateOperatorProgressProtobuf type 
org.apache.spark.status.protobuf.StateOperatorProgressProtobuf type 
org.apache.spark.status.protobuf.StreamBlockDataProtobuf type 
org.apache.spark.status.protobuf.StreamBlockDataProtobuf type 
org.apache.spark.status.protobuf.StreamingQueryDataProtobuf type 
org.apache.spark.status.protobuf.StreamingQueryDataProtobuf type 
org.apache.spark.status.protobuf.StreamingQueryProgressProtobuf type 
org.apache.spark.status.protobuf.StreamingQueryProgressProtobuf type 
org.apache.spark.status.protobuf.StreamingQueryProgressWrapperProtobuf type 
org.apache.spark.status.protobuf.StreamingQueryProgressWrapperProtobuf type 
org.apache.spark.status.protobuf.TaskDataProtobuf type 
org.apache.spark.status.protobuf.TaskDataProtobuf type 
org.apache.spark.status.protobuf.TaskDataWrapperProtobuf type 
org.apache.spark.status.protobuf.TaskDataWrapperProtobuf type 
org.apache.spark.status.protobuf.TaskMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.TaskMetricDistributionsProtobuf type 
org.apache.spark.status.protobuf.TaskMetricsProtobuf type 
org.apache.spark.status.protobuf.TaskMetricsProtobuf type 
org.apache.spark.status.protobuf.TaskResourceRequestProtobuf type 
org.apache.spark.status.protobuf.TaskResourceRequestStores all the configuration options for tree construction
 param:  algo  Learning goal.
Auxiliary functions and data structures for the sampleByKey method in PairRDDFunctions.
Deprecated.
This is deprecated as of Spark 3.4.0.
:: DeveloperApi ::
 Represents the state of a StreamingContext.
A factory of 
DataWriter returned by
 StreamingWrite.createStreamingWriterFactory(PhysicalWriteInfo), which is responsible for
 creating and initializing the actual data writer at executor side.A 
Flow that represents stateful movement of data to some target.A 'FlowExecution' that processes data statefully using Structured Streaming.
StreamingKMeans provides methods for configuring a
 streaming k-means analysis, training the model on streaming,
 and using the model to make predictions on streaming data.
StreamingKMeansModel extends MLlib's KMeansModel for streaming
 algorithms, so it can keep track of a continuously updated weight
 associated with each cluster, and also update the model by
 doing a single iteration of the standard k-means algorithm.
StreamingLinearAlgorithm implements methods for continuously
 training a generalized linear model on streaming data,
 and using it for prediction on (possibly different) streaming data.
Train or predict a linear regression model on streaming data.
:: DeveloperApi ::
 A listener interface for receiving information about an ongoing streaming
 computation.
:: DeveloperApi ::
 Base trait for events related to StreamingListener
Train or predict a logistic regression model on streaming data.
A handle to a query that is executing continuously in the background as new data arrives.
Exception that stopped a 
StreamingQuery.Interface for listening to events related to 
StreamingQueries.Base type of 
StreamingQueryListener eventsEvent representing that query is idle and waiting for new data to process.
Event representing any progress updates in a query.
Event representing the start of a query
 param:  id
   A unique query id that persists across restarts.
Event representing the termination of a query.
A class to manage all the 
StreamingQuery active in a SparkSession.Information about progress made in the execution of a 
StreamingQuery during a trigger.Reports information about the instantaneous status of a streaming query.
Options for a streaming read of an input.
A `StreamingFlowExecution` that writes a streaming `DataFrame` to a `Table`.
Performs online 2-sample significance testing for a stream of (Boolean, Double) pairs.
Significance testing methods for 
StreamingTest.An interface that defines how to write the data to data source in streaming queries.
:: DeveloperApi ::
 Track the information of input stream at specified batch time.
A streaming listener that converts streaming events into pipeline events for the relevant flows.
::Experimental::
 Implemented by objects that can produce a streaming 
Sink for a specific format or system.::Experimental::
 Implemented by objects that can produce a streaming 
Source for a specific format or system.Specialized version of 
Param[Array[String]] for Java.A filter that evaluates to 
true iff the attribute evaluates to
 a string that contains the string value.A filter that evaluates to 
true iff the attribute evaluates to
 a string that ends with value.A label indexer that maps string column(s) of labels to ML column(s) of label indices.
Base trait for 
StringIndexer and StringIndexerModel.Model fitted by 
StringIndexer.An RDD that stores R objects as Array[String].
A filter that evaluates to 
true iff the attribute evaluates to
 a string that starts with value.The data type representing 
String values.Strongly connected components algorithm implementation.
A field inside a StructType.
A 
StructType object can be constructed by...Performs Student's 2-sample t-test.
:: DeveloperApi ::
 Task succeeded.
An aggregate function that returns the summation of all the values in a group.
Tools for vectorized statistics on MLlib Vectors.
Trait for the Summary
 All the summaries should extend from this Summary in order to
 support connect.
A builder object that provides summary statistics about a given column.
A mix-in interface for 
SparkDataStream streaming sources to signal that they can control
 the rate of data ingested into the system.An atomic partition interface of 
Table to operate multiple partitions atomically.An interface, which TableProviders can implement, to support table existence checks and creation
 through a catalog, without having to use table identifiers.
A mix-in interface for 
Table delete support.A mix-in interface for 
Table delete support.A mix-in interface for 
RowLevelOperation.Write builder trait for tables that support dynamic partition overwrite.
Table methods for working with index
An interface for exposing data columns for a table that are not in the table schema.
Catalog methods for working with namespaces.
Write builder trait for tables that support overwrite by filter.
Write builder trait for tables that support overwrite by filter.
A partition interface of 
Table.A mix-in interface for 
ScanBuilder.A mix-in interface for 
ScanBuilder.A mix-in interface for 
ScanBuilder.A mix-in interface for 
ScanBuilder.A mix-in interface for 
ScanBuilder.A mix-in interface for 
Scan.A mix-in interface for 
ScanBuilder.A mix-in interface for 
ScanBuilder.A mix-in interface of 
Table, to indicate that it's readable.A mix in interface for 
Scan.A mix in interface for 
Scan.A mix in interface for 
Scan.A mix-in interface for 
Table row-level operations support.A mix-in interface for 
Scan.A mix-in interface for 
Scan.Implemented by StreamSourceProvider objects that can generate file metadata columns.
An interface for streaming sources that supports running in Trigger.AvailableNow mode, which
 will process all the available data at the beginning of the query in (possibly) multiple batches.
Write builder trait for tables that support truncation.
A mix-in interface of 
Table, to indicate that it's writable.Implementation of SVD++ algorithm.
Configuration parameters for SVDPlusPlus.
Generate sample data used for SVM.
Model for Support Vector Machines (SVMs).
Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.
A table in Spark, as returned by the 
listTables method in Catalog.An interface representing a logical structured data set of a data source.
A table representing a materialized dataset in a 
DataflowGraph.Capabilities that can be provided by a 
Table implementation.Catalog methods for working with Tables.
Capabilities that can be provided by a 
TableCatalog implementation.TableChange subclasses represent requested changes to a table.
A TableChange to add a field.
A TableChange to alter table and add a constraint.
Column position AFTER means the specified column should be put after the given `column`.
A TableChange to alter clustering columns for a table.
A TableChange to delete a field.
A TableChange to alter table and drop a constraint.
Defines modes for dropping a constraint.
Column position FIRST means the specified column should be the first column.
A TableChange to remove a table property.
A TableChange to rename a field.
A TableChange to set a table property.
A TableChange to update the comment of a field.
A TableChange to update the default value of a field.
A TableChange to update the nullability of a field.
A TableChange to update the position of a field.
A TableChange to update the type of a field.
Specifies how we should filter Tables.
Index in a table
A type of 
Input where data is loaded from a table.The base interface for v2 data sources which don't have a real catalog.
A BaseRelation that can produce all of its tuples as an RDD of Row objects.
Interface for invoking table-valued functions in Spark SQL.
The table write privileges that will be provided when loading a table.
Target Encoding maps a column of categorical indices into a numerical feature derived
 from the target.
Private trait for params and common methods for TargetEncoder and TargetEncoderModel
param:  stats  Array of statistics for each input feature.
:: DeveloperApi ::
 Task requested the driver to commit, but was denied.
:: DeveloperApi ::
Contextual information about a task which can be read or mutated during
 execution.
:: DeveloperApi ::
 Various possible reasons why a task ended.
:: DeveloperApi ::
 Various possible reasons why a task failed.
:: DeveloperApi ::
Tasks have a lot of indices that are used in a few different places.
:: DeveloperApi ::
 Information about a running task attempt inside a TaskSet.
:: DeveloperApi ::
 Task was killed intentionally and needs to be rescheduled.
:: DeveloperApi ::
 Exception thrown when a task is explicitly killed (i.e., task failure is expected).
A location where a task should run.
A task resource request.
A set of task resource requests.
:: DeveloperApi ::
 The task finished successfully, but the result was lost from the executor's block manager before
 it was fetched.
Low-level task scheduler interface, currently implemented exclusively by
 
TaskSchedulerImpl.An event that SparkContext uses to notify HeartbeatReceiver that SparkContext.taskScheduler is
 created.
Representing a temporary 
View in a DataflowGraph.R formula terms.
:: Experimental ::
Trait for hypothesis test results.
Utilities for tests.
This is a simple class that represents an absolute instant of time.
Represents the time modes (used for specifying timers and ttl) possible for
 the Dataset operations 
transformWithState.Class used to provide access to timer values for processing and event time populated before
 method invocations using the arbitrary state API v2.
The timestamp without time zone type represents a local time in microsecond precision, which is
 independent of time zone.
The timestamp type represents a time instant in microsecond precision.
Intercepts write calls and tracks total time spent writing in order to update shuffle write
 metrics.
The time type represents a time value with fields hour, minute, second, up to microseconds.
A tokenizer that converts the input string to lowercase and then splits it by white spaces.
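A minimal sketch of the Tokenizer transformer (the SparkSession named spark, the column names, and the data are illustrative assumptions):

    import org.apache.spark.ml.feature.Tokenizer

    // Assumes an existing SparkSession named `spark`.
    val df = spark.createDataFrame(Seq(
      (0, "Spark ML feature transformers")
    )).toDF("id", "text")

    // Lowercases the input and splits on whitespace.
    val words = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
      .transform(df)

    words.select("words").show(false)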
Trait for the artificial neural network (ANN) topology properties
::DeveloperApi::
 TopologyMapper provides topology information for a given host
 param:  conf SparkConf to get required properties, if needed
Trait for ANN topology model
Abstraction for training results.
Validation for hyper-parameter tuning.
Model from train validation split.
Writer for TrainValidationSplitModel.
Params for 
TrainValidationSplit and TrainValidationSplitModel.Represents a transform function in the public logical expression API.
Event fired after 
Transformer.transform.Abstract class for transformers that transform one dataset into another.
Event fired before 
Transformer.transform.Parameters for Decision Tree-based classification algorithms.
Parameters for Decision Tree-based ensemble classification algorithms.
Abstraction for models which are ensembles of decision trees
Parameters for Decision Tree-based ensemble algorithms.
Parameters for Decision Tree-based ensemble regression algorithms.
Parameters for Decision Tree-based regression algorithms.
Compute the number of triangles passing through each vertex.
Policy used to indicate how often results should be produced by a StreamingQuery.
Executes all of the flows in the given graph in topological order.
Represents a subset of the fields of an EdgeTriplet or EdgeContext.
Represents a table which can be atomically truncated.
TTL Configuration for state variable.
Deprecated.
As of release 3.0.0, please use the untyped builtin aggregate functions.
Deprecated.
please use untyped builtin aggregate functions.
A Spark SQL UDF that has 0 arguments.
A Spark SQL UDF that has 1 argument.
A Spark SQL UDF that has 10 arguments.
A Spark SQL UDF that has 11 arguments.
A Spark SQL UDF that has 12 arguments.
A Spark SQL UDF that has 13 arguments.
A Spark SQL UDF that has 14 arguments.
A Spark SQL UDF that has 15 arguments.
A Spark SQL UDF that has 16 arguments.
A Spark SQL UDF that has 17 arguments.
A Spark SQL UDF that has 18 arguments.
A Spark SQL UDF that has 19 arguments.
A Spark SQL UDF that has 2 arguments.
A Spark SQL UDF that has 20 arguments.
A Spark SQL UDF that has 21 arguments.
A Spark SQL UDF that has 22 arguments.
A Spark SQL UDF that has 3 arguments.
A Spark SQL UDF that has 4 arguments.
A Spark SQL UDF that has 5 arguments.
A Spark SQL UDF that has 6 arguments.
A Spark SQL UDF that has 7 arguments.
A Spark SQL UDF that has 8 arguments.
A Spark SQL UDF that has 9 arguments.
Functions for registering user-defined functions.
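A short sketch of registering a user-defined function and calling it from SQL and the DataFrame API (the SparkSession named spark, the function name, and its logic are illustrative assumptions):

    // Assumes an existing SparkSession named `spark`.
    // Register a Scala closure as a SQL-callable UDF.
    spark.udf.register("plusOne", (x: Long) => x + 1)

    spark.sql("SELECT plusOne(41) AS answer").show()

    // The same registered function can also be used through selectExpr.
    spark.range(3).selectExpr("plusOne(id) AS idPlusOne").show()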
This object keeps the mappings between user classes and their User Defined Types (UDTs).
This trait is shared by all the root containers for application UI information --
 the HistoryServer and the application UI.
Utility functions for generating XML pages with spark content.
Continuously generates jobs that expose various features of the WebUI (internal testing tool).
Abstract class for transformers that take one input column, apply transformation, and output the
 result as a new column.
Represents a user-defined function that is not bound to input types.
A procedure that is not bound to input types.
Uncaught exception handler which first calls the delegate and then calls the
 OnFailure function with the uncaught exception.
Run could not be associated with a proper root cause.
Generates i.i.d.
Returns a flow filter that is a union of two flow filters
A UNIQUE constraint.
Feature selector based on univariate statistical tests against labels.
Model fitted by 
UnivariateFeatureSelectorModel.Params for 
UnivariateFeatureSelector and UnivariateFeatureSelectorModel.Represents a partitioning where rows are split across partitions in an unknown pattern.
:: DeveloperApi ::
 We don't know why the task ended -- for example, because of a ClassNotFound exception when
 deserializing the task result.
An unresolved attribute.
Exception raised when a flow tries to read from a dataset that exists but is unresolved
A 
Flow whose output schema and dependencies aren't known.Exception raised when a pipeline has one or more flows that cannot be resolved
A distribution where no promises are made about co-location of data.
Rule that defines which upcasts are allowed in Spark.
Class used to perform steps (weight update) using Gradient Descent methods.
The general representation of user defined aggregate function, which implements
 
AggregateFunc, contains the upper-cased function name, the canonical function name,
 the `isDistinct` flag and all the inputs.Deprecated.
UserDefinedAggregateFunction is deprecated.
A user-defined function.
The general representation of user defined scalar function, which contains the upper-cased
 function name, canonical function name and all the children expressions.
The data type for User Defined Types (UDTs).
Various utility methods used by Spark.
A trait that should be implemented by V1 DataSources that would like to leverage the DataSource
 V2 read code paths.
A logical write that should be executed using V1 InsertableRelation interface.
The builder to generate SQL from V2 expressions.
A V2 table with V1 fallback support.
Common params for 
TrainValidationSplitParams and CrossValidatorParams.Interface used for arbitrary stateful operations with the v2 API to capture single value state.
Class for calculating variance during regression
Feature selector that removes all low-variance features.
Model fitted by 
VarianceThresholdSelector.Params for 
VarianceThresholdSelector and VarianceThresholdSelectorModel.The data type representing semi-structured values with arbitrary hierarchical data structures.
Represents a numeric vector, whose index type is Int and value type is Double.
Represents a numeric vector, whose index type is Int and value type is Double.
A feature transformer that merges multiple columns into a vector column.
Utility transformer that rewrites Vector attribute names via prefix replacement.
Class for indexing categorical feature columns in a dataset of 
Vector.Model fitted by 
VectorIndexer.Private trait for params for VectorIndexer and VectorIndexerModel
Factory methods for 
Vector.Factory methods for 
Vector.A feature transformer that adds size information to the metadata of a vector column.
This class takes a feature vector and outputs a new feature vector with a subarray of the
 original features.
Trait for transformation of a vector
:: AlphaComponent ::
Utilities for working with Spark version strings
VertexPartitionBaseOpsConstructor<T extends org.apache.spark.graphx.impl.VertexPartitionBase<Object>>
A typeclass for subclasses of 
VertexPartitionBase representing the ability to wrap them in a
 VertexPartitionBaseOps.Extends 
RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by
 pre-indexing the entries for fast, efficient joins.An interface representing a persisted view.
Representing a view in the 
DataflowGraph.Catalog methods for working with views.
ViewChange subclasses represent requested changes to a view.
A class that holds view information.
A type of 
TableInput that returns data from a specified schema or from the inferred
 Flows that write to the table.Entry in vocabulary
A function with no return value.
A two-argument function that takes arguments of type T1 and T2 with no return value.
Generates i.i.d.
Performs Welch's 2-sample t-test.
A class for defining actions to be taken when matching rows in a DataFrame during a merge
 operation.
A class for defining actions to be taken when no matching rows are found in a DataFrame during
 a merge operation.
A class for defining actions to be performed when there is no match by source during a merge
 operation in a MergeIntoWriter.
Utility functions for defining window in DataFrames.
A window specification that defines the partitioning, ordering, and frame boundaries.
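A compact sketch of defining a window specification and using it with a window function (the SparkSession named spark, the column names, and the data are illustrative assumptions):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    // Assumes an existing SparkSession named `spark`.
    val df = spark.createDataFrame(Seq(
      ("sales", "alice", 100), ("sales", "bob", 90), ("eng", "carol", 120)
    )).toDF("dept", "name", "salary")

    // Partition by department, order by salary descending within each partition.
    val byDept = Window.partitionBy("dept").orderBy(col("salary").desc)

    df.withColumn("rank", rank().over(byDept)).show()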
Word2Vec trains a model of 
Map(String, Vector), i.e.Word2Vec creates vector representation of words in a text corpus.
Params for 
Word2Vec and Word2VecModel.Model fitted by 
Word2Vec.Word2Vec model
 param:  wordIndex maps each word to an index, which can retrieve the corresponding
                  vector from wordVectors
 param:  wordVectors array of length numWords * vectorSize, vector corresponding
                    to the word mapped with index i can be retrieved by the slice
                    (i * vectorSize, i * vectorSize + vectorSize)
:: Private ::
 A thin wrapper around a 
WritableByteChannel.A logical representation of a data source write.
:: DeveloperApi ::
 This abstract class represents a write ahead log (aka journal) that is used by Spark Streaming
 to save the received data (by receivers) and associated metadata to a reliable storage, so that
 they can be recovered after driver failures.
:: DeveloperApi ::
 This abstract class represents a handle that refers to a record written in a
 
WriteAheadLog.A helper class with utility functions related to the WriteAheadLog interface
An interface for building the 
Write.Configuration methods common to create/replace operations and insert/overwrite operations.
A commit message returned by 
DataWriter.commit() and will be sent back to the driver side
 as the input parameter of BatchWrite.commit(WriterCommitMessage[]) or
 StreamingWrite.commit(long, WriterCommitMessage[]).The type represents year-month intervals of the SQL standard.
:: DeveloperApi ::
 ZStandard implementation of 
CompressionCodec.