Pipeline
- class pyspark.ml.connect.Pipeline(*, stages=None)
A simple pipeline, which acts as an estimator. A Pipeline consists of a sequence of stages, each of which is either an Estimator or a Transformer. When Pipeline.fit() is called, the stages are executed in order. If a stage is an Estimator, its Estimator.fit() method is called on the input dataset to fit a model. The model, which is a transformer, is then used to transform the dataset, and the result becomes the input to the next stage. If a stage is a Transformer, its Transformer.transform() method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of the fitted models and transformers corresponding to the pipeline stages. If stages is an empty list, the pipeline acts as an identity transformer.
New in version 3.5.0.
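The fit procedure described above can be sketched in plain Python. This is a minimal sketch, not the PySpark implementation: the classes TransformerSketch, EstimatorSketch, FnTransformer, MeanCenterer, PipelineModelSketch and the function fit_pipeline are hypothetical, and datasets are plain Python lists instead of DataFrames. For simplicity the sketch transforms the data after every stage, including the last one.

```python
from typing import Any, Callable, List


class TransformerSketch:
    """Hypothetical stand-in for a Transformer: maps a dataset to a new dataset."""

    def transform(self, dataset: Any) -> Any:
        raise NotImplementedError


class EstimatorSketch:
    """Hypothetical stand-in for an Estimator: fits a dataset, returns a transformer."""

    def fit(self, dataset: Any) -> TransformerSketch:
        raise NotImplementedError


class FnTransformer(TransformerSketch):
    """Applies a plain function to the whole dataset."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def transform(self, dataset: Any) -> Any:
        return self.fn(dataset)


class MeanCenterer(EstimatorSketch):
    """Learns the mean of a list of numbers; the fitted model subtracts it."""

    def fit(self, dataset: List[float]) -> TransformerSketch:
        mean = sum(dataset) / len(dataset)
        return FnTransformer(lambda xs: [x - mean for x in xs])


class PipelineModelSketch(TransformerSketch):
    """Applies the fitted stages in order."""

    def __init__(self, stages: List[TransformerSketch]) -> None:
        self.stages = stages

    def transform(self, dataset: Any) -> Any:
        for stage in self.stages:
            dataset = stage.transform(dataset)
        return dataset


def fit_pipeline(stages: List[Any], dataset: Any) -> PipelineModelSketch:
    fitted: List[TransformerSketch] = []
    for stage in stages:
        if isinstance(stage, EstimatorSketch):
            model = stage.fit(dataset)  # fit on the output of the previous stage
        elif isinstance(stage, TransformerSketch):
            model = stage  # transformers are used as-is
        else:
            raise TypeError(f"unsupported stage: {stage!r}")
        dataset = model.transform(dataset)  # becomes the next stage's input
        fitted.append(model)
    return PipelineModelSketch(fitted)


double = FnTransformer(lambda xs: [2 * x for x in xs])
model = fit_pipeline([double, MeanCenterer()], [1.0, 2.0, 3.0])
print(model.transform([1.0, 2.0, 3.0]))  # [-2.0, 0.0, 2.0]
print(fit_pipeline([], [5, 6]).transform([5, 6]))  # [5, 6]: empty pipeline is the identity
```

Because MeanCenterer is fitted on the already doubled data, it learns a mean of 4.0 rather than 2.0, which is why stage order matters.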
Examples
>>> from pyspark.ml.connect import Pipeline
>>> from pyspark.ml.connect.pipeline import PipelineModel
>>> from pyspark.ml.connect.classification import LogisticRegression
>>> from pyspark.ml.connect.feature import StandardScaler
>>> scaler = StandardScaler(inputCol='features', outputCol='scaled_features')
>>> lor = LogisticRegression(maxIter=20, learningRate=0.01)
>>> pipeline = Pipeline(stages=[scaler, lor])
>>> dataset = spark.createDataFrame([
...     ([1.0, 2.0], 1),
...     ([2.0, -1.0], 1),
...     ([-3.0, -2.0], 0),
...     ([-1.0, -2.0], 0),
... ], schema=['features', 'label'])
>>> pipeline_model = pipeline.fit(dataset)
>>> transformed_dataset = pipeline_model.transform(dataset)
>>> transformed_dataset.show()
+------------+-----+--------------------+----------+--------------------+
|    features|label|     scaled_features|prediction|         probability|
+------------+-----+--------------------+----------+--------------------+
|  [1.0, 2.0]|    1|[0.56373452100212...|         1|[0.02423273026943...|
| [2.0, -1.0]|    1|[1.01472213780381...|         1|[0.09334788471460...|
|[-3.0, -2.0]|    0|[-1.2402159462046...|         0|[0.99808156490325...|
|[-1.0, -2.0]|    0|[-0.3382407126012...|         0|[0.96210002899169...|
+------------+-----+--------------------+----------+--------------------+
>>> pipeline_model.saveToLocal("/tmp/pipeline")
>>> loaded_pipeline_model = PipelineModel.loadFromLocal("/tmp/pipeline")
Methods
clear(param): Clears a param from the param map if it has been explicitly set.
copy([extra]): Creates a copy of this instance.
explainParam(param): Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
explainParams(): Returns the documentation of all params with their optional default values and user-supplied values.
extractParamMap([extra]): Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there are conflicts, i.e., with ordering: default param values < user-supplied values < extra.
fit(dataset[, params]): Fits a model to the input dataset with optional parameters.
getOrDefault(param): Gets the value of a param in the user-supplied param map or its default value.
getParam(paramName): Gets a param by its name.
getStages(): Get pipeline stages.
get_uid_map(instance)
hasDefault(param): Checks whether a param has a default value.
hasParam(paramName): Tests whether this instance contains a param with a given (string) name.
isDefined(param): Checks whether a param is explicitly set by the user or has a default value.
isSet(param): Checks whether a param is explicitly set by the user.
load(path): Load Estimator / Transformer / Model / Evaluator from the provided cloud storage path.
loadFromLocal(path): Load Estimator / Transformer / Model / Evaluator from the provided local path.
save(path, *[, overwrite]): Save Estimator / Transformer / Model / Evaluator to the provided cloud storage path.
saveToLocal(path, *[, overwrite]): Save Estimator / Transformer / Model / Evaluator to the provided local path.
set(param, value): Sets a parameter in the embedded param map.
setParams(self, *[, stages]): Sets params for Pipeline.
setStages(value): Set pipeline stages.
Attributes
params: Returns all params ordered by name.
Methods Documentation
- clear(param)
Clears a param from the param map if it has been explicitly set.
- copy(extra=None)
Creates a copy of this instance.
New in version 3.5.0.
- Parameters
- extra: dict, optional
extra parameters
- Returns
- Pipeline
new instance
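Assuming copy follows the classic pyspark.ml.Pipeline behavior of copying the pipeline and then copying each stage with the same extra map, the effect can be sketched as follows. StageSketch and PipelineCopySketch are hypothetical; params are keyed by plain strings rather than Param objects, and in this sketch a stage only picks up extra entries for params it already has.

```python
import copy
from typing import Any, Dict, List, Optional


class StageSketch:
    """Hypothetical stage holding its param values in a plain dict."""

    def __init__(self, **params: Any) -> None:
        self.params: Dict[str, Any] = dict(params)

    def copy(self, extra: Optional[Dict[str, Any]] = None) -> "StageSketch":
        that = copy.copy(self)  # the module-level copy.copy, not this method
        # Apply only the extra entries that name a param this stage owns.
        overrides = {k: v for k, v in (extra or {}).items() if k in self.params}
        that.params = {**self.params, **overrides}
        return that


class PipelineCopySketch:
    def __init__(self, stages: List[StageSketch]) -> None:
        self.stages = stages

    def copy(self, extra: Optional[Dict[str, Any]] = None) -> "PipelineCopySketch":
        # Copy every stage too, so the new pipeline shares no stage objects.
        return PipelineCopySketch([stage.copy(extra) for stage in self.stages])


scaler = StageSketch(inputCol="features")
lor = StageSketch(maxIter=20)
pipeline = PipelineCopySketch([scaler, lor])
copied = pipeline.copy({"maxIter": 50})
print(copied.stages[1].params, lor.params)  # {'maxIter': 50} {'maxIter': 20}
```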
- explainParam(param)
Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
- explainParams()
Returns the documentation of all params with their optional default values and user-supplied values.
- extractParamMap(extra=None)
Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there are conflicts, i.e., with ordering: default param values < user-supplied values < extra.
- Parameters
- extra: dict, optional
extra param values
- Returns
- dict
merged param map
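The precedence rule amounts to a dictionary merge in which later sources win. A minimal sketch with plain dicts keyed by param name (an assumption; PySpark keys the map by Param objects); extract_param_map is a hypothetical helper and the values are illustrative:

```python
from typing import Any, Dict, Optional


def extract_param_map(
    defaults: Dict[str, Any],
    user_supplied: Dict[str, Any],
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Later dicts overwrite earlier ones on key conflicts:
    # default param values < user-supplied values < extra.
    return {**defaults, **user_supplied, **(extra or {})}


defaults = {"maxIter": 100, "learningRate": 0.001}
user_supplied = {"maxIter": 20}
print(extract_param_map(defaults, user_supplied))
# {'maxIter': 20, 'learningRate': 0.001}
print(extract_param_map(defaults, user_supplied, {"maxIter": 50}))
# {'maxIter': 50, 'learningRate': 0.001}
```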
- fit(dataset, params=None)
Fits a model to the input dataset with optional parameters.
New in version 3.5.0.
- Parameters
- dataset: pyspark.sql.DataFrame or pandas.DataFrame
input dataset; it can be either a Spark DataFrame or a pandas DataFrame.
- params: a dict of param values, optional
an optional param map that overrides embedded params.
- Returns
- Transformer
fitted model
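The params argument overrides embedded values for a single call. A minimal sketch, assuming the override is applied to a copy of the estimator before fitting, so the estimator's own params stay unchanged; ScaleEstimatorSketch and its scale param are hypothetical, and the fitted "model" is just a Python function:

```python
import copy
from typing import Any, Callable, Dict, List, Optional


class ScaleEstimatorSketch:
    """Hypothetical estimator whose fitted model multiplies inputs by 'scale'."""

    def __init__(self, scale: float = 1.0) -> None:
        self.params: Dict[str, Any] = {"scale": scale}

    def copy(self, extra: Optional[Dict[str, Any]] = None) -> "ScaleEstimatorSketch":
        that = copy.copy(self)
        that.params = {**self.params, **(extra or {})}
        return that

    def _fit(self, dataset: List[float]) -> Callable[[List[float]], List[float]]:
        # A real estimator would learn from dataset; this toy one ignores it.
        scale = self.params["scale"]
        return lambda xs: [scale * x for x in xs]

    def fit(self, dataset: List[float], params: Optional[Dict[str, Any]] = None):
        if params:
            # Overrides go to a copy; self.params is left untouched.
            return self.copy(params)._fit(dataset)
        return self._fit(dataset)


est = ScaleEstimatorSketch(scale=2.0)
overridden = est.fit([1.0, 2.0], params={"scale": 10.0})
default = est.fit([1.0, 2.0])
print(overridden([1.0, 2.0]), default([1.0, 2.0]), est.params)
# [10.0, 20.0] [2.0, 4.0] {'scale': 2.0}
```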
- getOrDefault(param)
Gets the value of a param in the user-supplied param map or its default value. Raises an error if neither is set.
- getParam(paramName)
Gets a param by its name.
- static get_uid_map(instance)
- hasDefault(param)
Checks whether a param has a default value.
- hasParam(paramName)
Tests whether this instance contains a param with a given (string) name.
- isDefined(param)
Checks whether a param is explicitly set by the user or has a default value.
- isSet(param)
Checks whether a param is explicitly set by the user.
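The relationships among clear, isSet, hasDefault, isDefined, and getOrDefault can be sketched with a two-level store. This is a hypothetical model keyed by param name (PySpark keys by Param objects); set_default is a hypothetical helper, and raising KeyError for an undefined param is this sketch's choice, since the documentation above only says that an error is raised.

```python
from typing import Any, Dict


class ParamStoreSketch:
    """Hypothetical two-level store: user-supplied values take precedence over defaults."""

    def __init__(self) -> None:
        self._defaults: Dict[str, Any] = {}
        self._user: Dict[str, Any] = {}

    def set_default(self, name: str, value: Any) -> None:
        self._defaults[name] = value

    def set(self, name: str, value: Any) -> None:
        self._user[name] = value

    def clear(self, name: str) -> None:
        # Drops only the explicitly set value; a default, if any, survives.
        self._user.pop(name, None)

    def isSet(self, name: str) -> bool:
        return name in self._user

    def hasDefault(self, name: str) -> bool:
        return name in self._defaults

    def isDefined(self, name: str) -> bool:
        return self.isSet(name) or self.hasDefault(name)

    def getOrDefault(self, name: str) -> Any:
        if name in self._user:
            return self._user[name]
        if name in self._defaults:
            return self._defaults[name]
        raise KeyError(f"{name!r} is neither set nor has a default")


store = ParamStoreSketch()
store.set_default("maxIter", 100)
store.set("maxIter", 20)
print(store.getOrDefault("maxIter"))  # 20
store.clear("maxIter")
print(store.isSet("maxIter"), store.isDefined("maxIter"), store.getOrDefault("maxIter"))
# False True 100
```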
- classmethod load(path)
Load Estimator / Transformer / Model / Evaluator from the provided cloud storage path.
New in version 3.5.0.
- classmethod loadFromLocal(path)
Load Estimator / Transformer / Model / Evaluator from the provided local path.
New in version 3.5.0.
- save(path, *, overwrite=False)
Save Estimator / Transformer / Model / Evaluator to the provided cloud storage path.
New in version 3.5.0.
- saveToLocal(path, *, overwrite=False)
Save Estimator / Transformer / Model / Evaluator to the provided local path.
New in version 3.5.0.
- set(param, value)
Sets a parameter in the embedded param map.
- setStages(value)
Set pipeline stages.
New in version 3.5.0.
- Parameters
- value: list of pyspark.ml.connect.Transformer or pyspark.ml.connect.Estimator
the stages of the pipeline, in execution order
- Returns
- Pipeline
the pipeline instance
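Because setStages returns the pipeline instance, calls can be chained, for example Pipeline().setStages([scaler, lor]).fit(dataset). The chaining pattern itself is sketched below with a hypothetical StagesSketch class; strings stand in for stage objects, and copying the input list is a choice of this sketch, not documented PySpark behavior.

```python
from typing import Any, List, Optional


class StagesSketch:
    """Hypothetical holder for pipeline stages with a chainable setter."""

    def __init__(self, stages: Optional[List[Any]] = None) -> None:
        self._stages: List[Any] = list(stages) if stages is not None else []

    def setStages(self, value: List[Any]) -> "StagesSketch":
        self._stages = list(value)  # this sketch stores its own copy of the list
        return self  # returning the instance allows call chaining

    def getStages(self) -> List[Any]:
        return self._stages


p = StagesSketch().setStages(["scaler", "lor"])
print(p.getStages())  # ['scaler', 'lor']
```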
Attributes Documentation
- params
Returns all params ordered by name. The default implementation uses
dir() to get all attributes of type Param.
- stages = Param(parent='undefined', name='stages', doc='a list of pipeline stages')
- uid
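The default params implementation can be sketched in plain Python, relying on the fact that dir() returns attribute names sorted alphabetically. ParamSketch and HasParamsSketch are hypothetical stand-ins, and skipping properties (so that params does not call itself) is this sketch's approach to avoiding recursion.

```python
from typing import List


class ParamSketch:
    """Hypothetical stand-in for a Param: just a name and a doc string."""

    def __init__(self, name: str, doc: str) -> None:
        self.name = name
        self.doc = doc


class HasParamsSketch:
    stages = ParamSketch("stages", "a list of pipeline stages")
    maxIter = ParamSketch("maxIter", "maximum number of iterations")

    @property
    def params(self) -> List[ParamSketch]:
        # dir() yields names in sorted order, so the result is ordered by name.
        # Properties (including 'params' itself) are skipped before getattr is
        # called on the instance; only attributes holding a ParamSketch are kept.
        return [
            getattr(self, name)
            for name in dir(self)
            if not isinstance(getattr(type(self), name, None), property)
            and isinstance(getattr(self, name), ParamSketch)
        ]


print([p.name for p in HasParamsSketch().params])  # ['maxIter', 'stages']
```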
A unique id for the object.