List of Pentaho PDI steps
| Name | Category | Description |
| --- | --- | --- |
| Abort | Flow | Abort a transformation. |
| Add a checksum | Transform | Add a checksum column for each input row. |
| Add constants | Transform | Add one or more constants to the input rows. |
| Add sequence | Transform | Get the next value from a sequence. |
| Add value fields changing sequence | Transform | Add a sequence that depends on changes in field values. Each time the value of at least one field changes, PDI resets the sequence. |
| Add XML | Transform | Encode several fields into an XML fragment. |
| Analytic query | Statistics | Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST). |
| AMQP Consumer | Streaming | Pull streaming data from an AMQP broker or clients through an AMQP transformation. |
| AMQP Producer | Streaming | Publish messages in near-real-time to an AMQP broker. |
| Append streams | Flow | Append two streams in an ordered way. |
| ARFF output | Data Mining | Write data in ARFF format to a file. |
| Automatic Documentation Output | Output | Generate documentation automatically based on input in the form of a list of transformations and jobs. |
| Avro input | Big Data | Decode binary or JSON Avro data and extract fields from the structure it defines, either from flat files or incoming fields. |
| Avro input (deprecated) | Deprecated | Replaced by Avro input. |
| Avro output | Big Data | Serialize data from the PDI data stream into Avro binary or JSON format, then write it to a file. |
| Block this step until steps finish | Flow | Block this step until selected steps finish. |
| Blocking step | Flow | Block the flow until all incoming rows have been processed. Subsequent steps only receive the last input row of this step. |
| Calculator | Transform | Create new fields by performing simple calculations. |
| Call DB Procedure | Lookup | Get back information by calling a database procedure. |
| Call Endpoint | Pentaho Server | Call API endpoints from the Pentaho Server within a PDI transformation. |
| Cassandra Input | Big Data | Read from a Cassandra column family. |
| Cassandra Output | Big Data | Write to a Cassandra column family. |
| Change file encoding | Utility | Change file encoding and create a new file. |
| Check if a column exists | Lookup | Check if a column exists in a table on a specified connection. |
| Check if file is locked | Lookup | Check if a file is locked by another process. |
| Check if webservice is available | Lookup | Check if a web service is available. |
| Clone row | Utility | Clone a row as many times as needed. |
| Closure Generator | Transform | Generate a closure table using parent-child relationships. |
| Combination lookup/update | Data Warehouse | Update a junk dimension in a data warehouse, or look up information in this dimension. The primary key of a junk dimension consists of all the fields. |
| Concat Fields | Transform | Concatenate multiple fields into one target field. The fields can be separated by a separator, and the enclosure logic is completely compatible with the Text file output step. |
| Copy rows to result | Job | Write rows to the executing job. The information will then be passed to the next entry in this job. |
| CouchDB Input | Big Data | Retrieve all documents from a given view in a given design document from a given database. |
| Credit card validator | Validation | Determine (1) whether a credit card number is valid (using the LUHN10 (MOD-10) algorithm) and (2) which credit card vendor handles that number (VISA, MasterCard, Diners Club, EnRoute, American Express (AMEX), ...). |
| CSV file input | Input | Read from a simple CSV file. |
| Data Grid | Input | Enter rows of static data in a grid, usually for testing, reference, or demo purposes. |
| Data Validator | Validation | Validate passing data based on a set of rules. |
| Database join | Lookup | Execute a database query using stream values as parameters. |
| Database lookup | Lookup | Look up values in a database using field values. |
| De-serialize from file | Input | Read rows of data from a data cube. |
| Delay row | Utility | Output each input row after a delay. |
| Delete | Output | Delete data in a database table based upon keys. |
| Detect empty stream | Flow | Output one empty row if the input stream is empty (i.e., when the input stream does not contain any rows). |
| Dimension lookup/update | Data Warehouse | Update a slowly changing dimension in a data warehouse, or look up information in this dimension. |
| Dummy (do nothing) | Flow | Does not do anything. It is useful, however, when testing things or in situations where you want to split streams. |
| Dynamic SQL row | Lookup | Execute a dynamic SQL statement built in a previous field. |
| Edi to XML | Utility | Convert an EDIFACT message to XML to simplify data extraction (available in PDI 4.4, already present in CI trunk builds). |
| ElasticSearch Bulk Insert | Bulk loading | Perform bulk inserts into ElasticSearch. |
| Email messages input | Input | Read a POP3/IMAP server and retrieve messages. |
| ESRI Shapefile Reader | Input | Read shape file data from an ESRI shape file and linked DBF file. |
| ETL Metadata Injection | Flow | Inject metadata into an existing transformation prior to execution. This allows for the creation of dynamic and highly flexible data integration solutions. |
| Example step (deprecated) | Deprecated | An example of a plugin test step. |
| Execute a process | Utility | Execute a process and return the result. |
| Execute Row SQL Script | Scripting | Execute an SQL statement or file for every input row. |
| Execute SQL script | Scripting | Execute an SQL script, optionally parameterized using input rows. |
| File exists | Lookup | Check if a file exists. |
| Filter Rows | Flow | Filter rows using simple equations. |
| Fixed file input | Input | Read from a fixed-width file. |
| Formula | Scripting | Calculate a formula using Pentaho's libformula. |
| Fuzzy match | Lookup | Find approximate matches to a string using matching algorithms. Read a field from a main stream and output the approximate value from the lookup stream. |
| Generate random credit card numbers | Input | Generate random valid (Luhn check) credit card numbers. |
| Generate random value | Input | Generate random values. |
| Generate Rows | Input | Generate a number of empty or equal rows. |
| Get data from XML | Input | Get data from an XML file by using XPath. This step also allows you to parse XML defined in a previous field. |
| Get File Names | Input | Get file names from the operating system and send them to the next step. |
| Get files from result | Job | Read filenames used or generated in a previous entry in a job. |
| Get Files Rows Count | Input | Get the row count of files. |
| Get ID from slave server | Transform | Retrieve unique IDs in blocks from a slave server. The referenced sequence needs to be configured on the slave server in the XML configuration file. |
| Get records from stream | Streaming | Return records that were previously generated by another transformation in a job. |
| Get repository names | Input | List detailed information about transformations and/or jobs in a repository. |
| Get rows from result | Job | Read rows from a previous entry in a job. |
| Get Session Variables | Pentaho Server | Retrieve the value of a session variable. |
| Get SubFolder names | Input | Read a parent folder and return all subfolders. |
| Get System Info | Input | Get information from the system, such as the system date, arguments, etc. |
| Get table names | Input | Get table names from a database connection and send them to the next step. |
| Get Variables | Job | Determine the values of certain (environment or Kettle) variables and put them in field values. |
| Google Analytics | Input | Fetch data from a Google Analytics account. |
| Greenplum Bulk Loader (deprecated) | Deprecated | Bulk load Greenplum data. The replacement step is Greenplum Load. |
| Greenplum Load | Bulk loading | Bulk load Greenplum data. |
| Group by | Statistics | Build aggregates in a group-by fashion. This works only on sorted input. If the input is not sorted, only consecutive duplicate rows are handled correctly. |
| GZIP CSV Input | Input | Read in parallel from a GZIP CSV file. |
| Hadoop File Input | Big Data | Read data from a variety of text-file types stored on a Hadoop cluster. |
| Hadoop File Output | Big Data | Write data to a variety of text-file types stored on a Hadoop cluster. |
| HBase Input | Big Data | Read from an HBase column family. |
| HBase Output | Big Data | Write to an HBase column family. |
| HBase Row Decoder | Big Data | Decode an incoming key and HBase result object according to a mapping. |
| HL7 Input | Input | Read data from HL7 data streams. |
| HTTP client | Lookup | Call a web service over HTTP by supplying a base URL and allowing parameters to be set dynamically. |
| HTTP Post | Lookup | Call a web service request over HTTP by supplying a base URL and allowing parameters to be set dynamically. |
| IBM Websphere MQ Consumer (deprecated) | Deprecated | Receive messages from any IBM WebSphere MQ server. |
| IBM Websphere MQ Producer (deprecated) | Deprecated | Send messages to any IBM WebSphere MQ server. |
| Identify last row in a stream | Flow | Mark the last row. |
| If field value is null | Utility | Set a field value to a constant if it is null. |
| Infobright Loader | Bulk loading | Load data into an Infobright database table. |
| Ingres VectorWise Bulk Loader | Bulk loading | Interface with the Ingres VectorWise Bulk Loader "COPY TABLE" command. |
| Injector | Inline | Inject rows into the transformation through the Java API (see the sketch after this table). |
| Insert / Update | Output | Update or insert rows in a database based upon keys. |
| Java Filter | Flow | Filter rows using Java code. |
| JMS Consumer | Streaming | Receive messages from a JMS server. |
| JMS consumer (deprecated) | Deprecated | Replaced by JMS Consumer. |
| JMS Producer | Streaming | Send messages to a JMS server. |
| JMS producer (deprecated) | Deprecated | Replaced by JMS Producer. |
| Job Executor | Flow | Run a PDI job, passing parameters and rows. |
| Join Rows (cartesian product) | Joins | Output the Cartesian product of the input streams. The number of output rows is the product of the row counts of the input streams. |
| JSON Input | Input | Extract relevant portions out of JSON structures (file or incoming field) and output rows. |
| JSON output | Output | Create a JSON block and output it in a field or to a file. |
| Kafka Consumer | Streaming | Run a sub-transformation that executes according to message batch size or duration, letting you process a continuous stream of records in near-real-time. |
| Kafka Producer | Streaming | Publish messages in near-real-time across worker nodes where multiple, subscribed members have access. |
| Knowledge Flow | Data Mining | Execute a Knowledge Flow data mining process. |
| LDAP Input | Input | Read data from an LDAP host. |
| LDAP Output | Output | Perform insert, upsert, update, add, or delete operations on records based on their DN (Distinguished Name). |
| LDIF Input | Input | Read data from LDIF files. |
| Load file content in memory | Input | Load file content into memory. |
| LucidDB streaming loader (deprecated) | Deprecated | Load data into LucidDB by using the Remote Rows UDX. |
| Mail | Utility | Send e-mail. |
| Mail Validator | Validation | Check if an email address is valid. |
| Mapping (sub-transformation) | Mapping | Run a mapping (sub-transformation); use Mapping input and Mapping output specifications to define the fields interface. |
| Mapping input specification | Mapping | Specify the input interface of a mapping. |
| Mapping output specification | Mapping | Specify the output interface of a mapping. |
| MapReduce Input | Big Data | Enter key-value pairs from Hadoop MapReduce. |
| MapReduce Output | Big Data | Exit key-value pairs, then push them into Hadoop MapReduce. |
| MaxMind GeoIP Lookup | Lookup | Look up an IPv4 address in a MaxMind database and add fields such as geography, ISP, or organization. |
| Memory Group by | Statistics | Build aggregates in a group-by fashion. This step does not require sorted input. |
| Merge Join | Joins | Join two streams on a given key and output a joined set. The input streams must be sorted on the join key. |
| Merge Rows (diff) | Joins | Merge two streams of rows, sorted on a certain key. The two streams are compared, and identical, changed, deleted, and new rows are flagged. |
| Metadata structure of stream | Utility | Read the metadata of the incoming stream. |
| Microsoft Access Input | Input | Read data from a Microsoft Access file. |
| Microsoft Access Output | Output | Store records in an MS Access database table. |
| Microsoft Excel Input | Input | Read data from Excel and OpenOffice workbooks (XLS, XLSX, ODS). |
| Microsoft Excel Output | Output | Store records in an Excel (XLS) document with formatting information. |
| Microsoft Excel Writer | Output | Write or append data to an Excel file. |
| Modified Java Script Value | Scripting | Run JavaScript programs (and much more). |
| Mondrian Input | Input | Execute and retrieve data using an MDX query against a Pentaho Analysis OLAP server (Mondrian). |
| MonetDB Agile Mart | Agile | |
| MonetDB Bulk Loader | Bulk loading | Load data into MonetDB by using their bulk load command in streaming mode. |
| MongoDB Input | Big Data | Read all entries from a MongoDB collection in the specified database. |
| MongoDB Output | Big Data | Write to a MongoDB collection. |
| MQTT Consumer | Streaming | Pull streaming data from an MQTT broker or clients through an MQTT transformation. |
| MQTT Producer | Streaming | Publish messages in near-real-time to an MQTT broker. |
| Multiway Merge Join | Joins | Join multiple streams. This step supports INNER and FULL OUTER joins. |
| MySQL Bulk Loader | Bulk loading | Load data over a named pipe (not available on MS Windows). |
| Null if... | Utility | Set a field value to null if it is equal to a constant value. |
| Number range | Transform | Create ranges based on a numeric field. |
| OLAP Input | Input | Execute and retrieve data using an MDX query against any XML/A OLAP datasource using olap4j. |
| OpenERP object delete (deprecated) | Deprecated | Delete data from the OpenERP server using the XML-RPC interface with the 'unlink' function. |
| OpenERP object input (deprecated) | Deprecated | Retrieve data from the OpenERP server using the XML-RPC interface with the 'read' function. |
| OpenERP object output (deprecated) | Deprecated | Update data on the OpenERP server using the XML-RPC interface and the 'import' function. |
| Oracle Bulk Loader | Bulk loading | Use the Oracle Bulk Loader to load data. |
| ORC Input | Big Data | Read field data from ORC files into a PDI data stream. |
| ORC Output | Big Data | Serialize data from the PDI data stream into the ORC file format and write it to a file. |
| Output steps metrics | Statistics | Return metrics for one or several steps. |
| Palo cell input (deprecated) | Deprecated | Retrieve all cell data from a Palo cube. |
| Palo cell output (deprecated) | Deprecated | Update cell data in a Palo cube. |
| Palo dim input (deprecated) | Deprecated | Return elements from a dimension in a Palo database. |
| Palo dim output (deprecated) | Deprecated | Create/update dimension elements and element consolidations in a Palo database. |
| Parquet Input | Input | Decode Parquet data formats and extract fields from the structure it defines. |
| Parquet Output | Output | Map fields within data files and choose where you want to process those files. |
| Pentaho Reporting Output | Output | Execute an existing report file (.prpt). |
| PostgreSQL Bulk Loader | Bulk loading | Bulk load PostgreSQL data. |
| Prioritize streams | Flow | Prioritize streams in an ordered way. |
| Process files | Utility | Process one file per row (copy, move, or delete). This step only accepts a filename as input. |
| Properties Output | Output | Write data to a properties file. |
| Property Input | Input | Read data (key, value) from properties files. |
| Python Executor | Scripting | Map upstream data from a PDI input step or execute a Python script to generate data. When you send all rows, Python stores the dataset in a variable that kicks off your Python script. |
| R script executor | Statistics | Execute an R script within a PDI transformation. |
| Regex Evaluation | Scripting | Evaluate regular expressions. This step uses a regular expression to evaluate a field. It can also extract new fields out of an existing field with capturing groups. |
| Replace in string | Transform | Replace all occurrences of a word in a string with another word. |
| Reservoir Sampling | Statistics | Sample a fixed number of rows from the incoming stream. |
| REST Client | Lookup | Consume RESTful services. REpresentational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture in which web services are viewed as resources and can be identified by their URLs. |
| Row Denormaliser | Transform | Denormalise rows by looking up key-value pairs and assigning them to new fields in the output rows. This method aggregates and requires the input rows to be sorted on the grouping fields. |
| Row Flattener | Transform | Flatten consecutive rows based on the order in which they appear in the input stream. |
| Row Normaliser | Transform | Normalise de-normalised information. |
| RSS Input | Input | Read RSS feeds. |
| RSS Output | Output | Write an RSS stream. |
| Rule Executor | Scripting | Execute a rule against each row (using Drools). |
| Rule Accumulator | Scripting | Execute a rule against a set of rows (using Drools). |
| Run SSH commands | Utility | Run SSH commands and return the result. |
| S3 CSV Input | Input | Read from an S3 CSV file. |
| S3 File Output | Output | Export data to a text file on Amazon Simple Storage Service (S3). |
| Salesforce Delete | Output | Delete records in a Salesforce module. |
| Salesforce Input | Input | Read information from Salesforce. |
| Salesforce Insert | Output | Insert records in a Salesforce module. |
| Salesforce Update | Output | Update records in a Salesforce module. |
| Salesforce Upsert | Output | Insert or update records in a Salesforce module. |
| Sample rows | Statistics | Filter rows based on the line number. |
| SAP input (deprecated) | Deprecated | Read data from SAP ERP, optionally with parameters. |
| SAS Input | Input | Read files in the sas7bdat (SAS) native format. |
| Script | Experimental | Calculate values by scripting in Ruby, Python, Groovy, JavaScript, and other scripting languages. |
| Select values | Transform | Select or remove fields in a row. Optionally, set the field metadata: type, length, and precision. |
| Send message to Syslog | Utility | Send a message to a Syslog server. |
| Serialize to file | Output | Write rows of data to a data cube. |
| Set Field Value | Transform | Replace the value of a field with the value of another field. |
| Set Field Value to a Constant | Transform | Replace the value of a field with a constant. |
| Set files in result | Job | Set filenames in the result of this transformation. Subsequent job entries can then use this information. |
| Set Session Variables | Pentaho Server | Set the value of a session variable. |
| Set Variables | Job | Set environment variables based on a single input row. |
| SFTP Put | Experimental | Upload a file or a stream file to a remote host via SFTP. |
| Simple Mapping | Mapping | Turn a repetitive, re-usable part of a transformation (a sequence of steps) into a mapping (sub-transformation). |
| Single Threader | Flow | Execute a sequence of steps in a single thread. |
| Socket reader | Inline | Read from a socket. A socket client that connects to a server (Socket writer step). |
| Socket writer | Inline | Write to a socket. A socket server that can send rows of data to a socket reader. |
| Sort rows | Transform | Sort rows based upon field values (ascending or descending). |
| Sorted Merge | Joins | Merge rows coming from multiple input steps, provided these rows are themselves sorted on the given key fields. |
| Split field to rows | Transform | Split a single string field by delimiter and create a new row for each split term. |
| Split Fields | Transform | Split a single field into more than one. |
| Splunk Input | Transform | Read data from Splunk. |
| Splunk Output | Transform | Write data to Splunk. |
| SQL File Output | Output | Output SQL INSERT statements to a file. |
| Stream lookup | Lookup | Look up values coming from another stream in the transformation. |
| SSTable Output | Big Data | Write to a filesystem directory as a Cassandra SSTable. |
| String operations | Transform | Apply operations such as trimming and padding to string values. |
| Strings cut | Transform | Cut a portion of a substring. |
| Switch / Case | Flow | Switch a row to a certain target step based on the case value in a field. |
| Synchronize after merge | Output | Perform insert/update/delete operations in one go based on the value of a field. |
| Table Agile Mart | Agile | |
| Table Compare | Utility | Compare the data from two tables (provided they have the same layout). It finds differences between the data in the two tables and logs them. |
| Table exists | Lookup | Check if a table exists on a specified connection. |
| Table Input | Input | Read information from a database table. |
| Table Output | Output | Write information to a database table. |
| Teradata Fastload Bulk Loader | Bulk loading | Bulk load Teradata Fastload data. |
| Teradata TPT Insert Upsert Bulk Loader | Bulk loading | Bulk load via TPT using the tbuild command. |
| Text file input | Input | Read data from a text file in several formats. This data can then be passed on to your next step(s). |
| Text file input (deprecated) | Deprecated | Replaced by Text file input. |
| Text file output | Output | Write rows to a text file. |
| Text file output (deprecated) | Deprecated | Replaced by Text file output. |
| Transformation Executor | Flow | Run a Pentaho Data Integration transformation, setting parameters and passing rows. |
| Unique rows | Transform | Remove duplicate rows and leave only unique occurrences. This works only on sorted input. If the input is not sorted, only consecutive duplicate rows are handled correctly. |
| Unique rows (HashSet) | Transform | Remove duplicate rows and leave only unique occurrences by using a HashSet. |
| Univariate Statistics | Statistics | Compute some simple statistics based on a single input field. |
| Update | Output | Update data in a database table based upon keys. |
| User Defined Java Class | Scripting | Program a step using Java code. |
| User Defined Java Expression | Scripting | Calculate the result of a Java expression using Janino. |
| Value Mapper | Transform | Map values of a certain field from one value to another. |
| Vertica Bulk Loader | Bulk loading | Bulk load data into a Vertica table using its high-performance COPY feature. |
| Web services lookup | Lookup | Look up information using web services (WSDL). |
| Write to log | Utility | Write data to the log. |
| XBase input | Input | Read records from an XBase type of database file (DBF). |
| XML Input Stream (StAX) | Input | Process very large and complex XML files very fast. |
| XML Join | Joins | Join a stream of XML tags into a target XML string. |
| XML Output | Output | Write data to an XML file. |
| XSD Validator | Validation | Validate an XML source (files or streams) against an XML Schema Definition. |
| XSL Transformation | Transform | Transform an XML stream using XSL (eXtensible Stylesheet Language). |
| Yaml Input | Input | Read YAML sources (file or stream), parse them, convert them to rows, and write them to one or more outputs. |
| Zip File | Utility | Create a standard ZIP archive from the data stream fields. |
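As a point of reference for the Injector step listed above, the sketch below shows how rows might be fed into a running transformation from Java code. It is a minimal, non-authoritative example based on the classic Kettle API (org.pentaho.di classes such as Trans, TransMeta and RowProducer); exact method signatures vary slightly between PDI versions, and the file name `inject_example.ktr`, the step name "Injector", and the field layout are assumptions for illustration only.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.row.RowMeta;
import org.pentaho.di.core.row.RowMetaInterface;
import org.pentaho.di.core.row.value.ValueMetaString;
import org.pentaho.di.trans.RowProducer;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class InjectorExample {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (plugin registry, step types, etc.).
        KettleEnvironment.init();

        // Load a transformation that contains an Injector step (hypothetical file name).
        TransMeta transMeta = new TransMeta("inject_example.ktr");
        Trans trans = new Trans(transMeta);
        trans.prepareExecution(null);

        // Attach a row producer to the Injector step (copy 0) before starting the threads.
        RowProducer producer = trans.addRowProducer("Injector", 0);
        trans.startThreads();

        // Describe the row layout and push a couple of rows into the running transformation.
        RowMetaInterface rowMeta = new RowMeta();
        rowMeta.addValueMeta(new ValueMetaString("name"));
        producer.putRow(rowMeta, new Object[] { "alice" });
        producer.putRow(rowMeta, new Object[] { "bob" });

        // Signal that no more rows will arrive, then wait for the transformation to finish.
        producer.finished();
        trans.waitUntilFinished();
    }
}
```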