Tuesday, 22 April 2014

PowerCenter Client

The PowerCenter Client application consists of the tools to manage the repository and to design mappings, mapplets, and sessions to load the data. The PowerCenter Client application has the following tools:

  1. Designer. Use the Designer to create mappings that contain transformation instructions for the Integration Service.
  2. Mapping Architect for Visio. Use the Mapping Architect for Visio to create mapping templates that generate multiple mappings.
  3. Repository Manager. Use the Repository Manager to assign permissions to users and groups and manage folders.
  4. Workflow Manager. Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.
  5. Workflow Monitor. Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service.

Wednesday, 16 April 2014

Informatica Transformations 3

Following are the list of Transformations available in Informatica:

  • Aggregator Transformation
  • Application Source Qualifier Transformation
  • Custom Transformation
  • Data Masking Transformation
  • Expression Transformation
  • External Procedure Transformation
  • Filter Transformation
  • HTTP Transformation
  • Input Transformation
  • Java Transformation
  • Joiner Transformation
  • Lookup Transformation
  • Normalizer Transformation
  • Output Transformation
  • Rank Transformation
  • Reusable Transformation
  • Router Transformation
  • Sequence Generator Transformation
  • Sorter Transformation
  • Source Qualifier Transformation
  • SQL Transformation
  • Stored Procedure Transformation
  • Transaction Control Transaction
  • Union Transformation
  • Unstructured Data Transformation
  • Update Strategy Transformation
  • XML Generator Transformation
  • XML Parser Transformation
  • XML Source Qualifier Transformation
  • Advanced External Procedure Transformation
  • External Transformation

In the following pages, we will explain all the above Informatica Transformations and their significances in the ETL process in detail. 


Aggregator Transformation

Aggregator transformation performs aggregate funtions like average, sum, count etc. on multiple rows or groups. The Integration Service performs these calculations as it reads and stores data group and row data in an aggregate cache. It is an Active & Connected transformation.
Difference b/w Aggregator and Expression Transformation? Expression transformation permits you to perform calculations row by row basis only. In Aggregator you can perform calculations on groups.
Aggregator transformation has following ports State, State_Count, Previous_State and State_Counter.
Components: Aggregate Cache, Aggregate Expression, Group by port, Sorted input.
Aggregate Expressions: are allowed only in aggregate transformations. can include conditional clauses and non-aggregate functions. can also include one aggregate function nested into another aggregate function.
Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE

Application Source Qualifier Transformation

Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session.It is an Active & Connected transformation.

Custom Transformation

It works with procedures you create outside the designer interface to extend PowerCenter functionality. calls a procedure from a shared library or DLL. It is active/passive & connected type.
You can use CT to create T. that require multiple input groups and multiple output groups.
Custom transformation allows you to develop the transformation logic in a procedure. Some of the PowerCenter transformations are built using the Custom transformation. Rules that apply to Custom transformations, such as blocking rules, also apply to transformations built using Custom transformations. PowerCenter provides two sets of functions called generated and API functions. The Integration Service uses generated functions to interface with the procedure. When you create a Custom transformation and generate the source code files, the Designer includes the generated functions in the files. Use the API functions in the procedure code to develop the transformation logic.
Difference between Custom and External Procedure Transformation? In Custom T, input and output functions occur separately.The Integration Service passes the input data to the procedure using an input function. The output function is a separate function that you must enter in the procedure code to pass output data to the Integration Service. In contrast, in the External Procedure transformation, an external procedure function does both input and output, and its parameters consist of all the ports of the transformation.

Data Masking Transformation

Passive & Connected. It is used to change sensitive production data to realistic test data for non production environments. It creates masked data for development, testing, training and data mining. Data relationship and referential integrity are maintained in the masked data.
For example: It returns masked value that has a realistic format for SSN, Credit card number, birthdate, phone number, etc. But is not a valid value. Masking types: Key Masking, Random Masking, Expression Masking, Special Mask format. Default is no masking.



Expression Transformation

Passive & Connected. are used to perform non-aggregate functions, i.e to calculate values in a single row. Example: to calculate discount of each product or to concatenate first and last names or to convert date to a string field.
You can create an Expression transformation in the Transformation Developer or the Mapping Designer. Components: Transformation, Ports, Properties, Metadata Extensions.
External Procedure
Passive & Connected or Unconnected. It works with procedures you create outside of the Designer interface to extend PowerCenter functionality. You can create complex functions within a DLL or in the COM layer of windows and bind it to external procedure transformation. To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. You must be an experienced programmer to use TX and use multi-threaded code in external procedures.

Filter Transformation

Active & Connected. It allows rows that meet the specified filter condition and removes the rows that do not meet the condition. For example, to find all the employees who are working in NewYork or to find out all the faculty member teaching Chemistry in a state. The input ports for the filter must come from a single transformation. You cannot concatenate ports from more than one transformation into the Filter transformation. Components: Transformation, Ports, Properties, Metadata Extensions.

HTTP Transformation

Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server, and issues a request to retrieves data or posts data to the target or downstream transformation in the mapping.
Authentication types: Basic, Digest and NTLM. Examples: GET, POST and SIMPLE POST.



Java Transformation

Active or Passive & Connected. It provides a simple native programming interface to define transformation functionality with the Java programming language. You can use the Java transformation to quickly define simple or moderately complex transformation functionality without advanced knowledge of the Java programming language or an external Java development environment.

Joiner Transformation

Active & Connected. It is used to join data from two related heterogeneous sources residing in different locations or to join data from the same source. In order to join two sources, there must be at least one or more pairs of matching column between the sources and a must to specify one source as master and the other as detail. For example: to join a flat file and a relational source or to join two flat files or to join a relational source and a XML source.
The Joiner transformation supports the following types of joins:
  • Normal
Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.
  • Master Outer
Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.
  • Detail Outer
Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
  • Full Outer
Full outer join keeps all rows of data from both the master and detail sources.
Limitations on the pipelines you connect to the Joiner transformation:
*You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
*You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.

Lookup Transformation

Passive & Connected or UnConnected. It is used to look up data in a flat file, relational table, view, or synonym. It compares lookup transformation ports (input ports) to the source column values based on the lookup condition. Later returned values can be passed to other transformations. You can create a lookup definition from a source qualifier and can also use multiple Lookup transformations in a mapping.
You can perform the following tasks with a Lookup transformation:
*Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an employee ID. Retrieve the employee name from the lookup table.
*Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
*Update slowly changing dimension tables. Determine whether rows exist in a target.

Lookup Components: Lookup source, Ports, Properties, Condition.
Types of Lookup:
1) Relational or flat file lookup.
2) Pipeline lookup.
3) Cached or uncached lookup.
4) connected or unconnected lookup.

Informatica Transformations 2

Transformations can be Connected or UnConnected to the data flow.

Connected Transformation:

Connected transformation is connected to other transformations or directly to target table in the mapping.
UnConnected Transformation:

An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

Informatica Transformations 1

A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Transformations can be of two types:

Active Transformation:

An active transformation can change the number of rows that pass through the transformation, change the transaction boundary, can change the row type. For example, Filter, Transaction Control and Update Strategy are active transformations. 

The key point is to note that Designer does not allow you to connect multiple active transformations or an active and a passive transformation to the same downstream transformation or transformation input group because the Integration Service may not be able to concatenate the rows passed by active transformations However, Sequence Generator transformation(SGT) is an exception to this rule. A SGT does not receive data. It generates unique numeric values. As a result, the Integration Service does not encounter problems concatenating rows passed by a SGT and an active transformation.
  • Source Qualifier Transformation
  • Sorter Transformations
  • Aggregator Transformations
  • Filter Transformation
  • Union Transformation
  • Joiner Transformation
  • Normalizer Transformation
  • Rank Transformation
  • Router Transformation
  • Update Strategy Transformation
  • Advanced External Procedure Transformation

Passive Transformation:

A passive transformation does not change the number of rows that pass through it, maintains the transaction boundary, and maintains the row type.

The key point is to note that Designer allows you to connect multiple transformations to the same downstream transformation or transformation input group only if all transformations in the upstream branches are passive. The transformation that originates the branch can be active or passive. 
  • Expression Transformation
  • Sequence Generator Transformation
  • Lookup Transformation
  • Stored Procedure Transformation
  • XML Source Qualifier Transformation
  • External Procedure Transformation
  • Input Transformation(Mapplet)
  • Output Transformation(Mapplet)

Informatica Product Line

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration software and ETL softwares.

The important products provided by Informatica Corporation is provided below:

  • Power Center
  • Power Mart
  • Power Exchange
  • Power Center Connect
  • Power Channel
  • Metadata Exchange
  • Power Analyzer
  • Super Glue
Power Center & Power Mart: Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power center is used for corporate enterprise data warehouse and power mart is used for departmental data warehouses like data marts. Power Center supports global repositories and networked repositories and it can be connected to several sources. Power Mart supports single repository and it can be connected to fewer sources when compared to Power Center. Power Mart can extensibily grow to an enterprise implementation and it is easy for developer productivity through a codeless environment.

Power Exchange: Informatica Power Exchange as a stand alone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real time and changed data capture options in main frame(DB2, VSAM, IMS etc.,), mid range (AS400 DB2 etc.,), and for relational databases (oracle, sql server, db2 etc) and flat files in unix, linux and windows systems.

Power Center Connect: This is add on to Informatica Power Center. It helps to extract data and metadata from ERP systems like IBM's MQSeries, Peoplesoft, SAP, Siebel etc. and other third party applications.

Power Channel: This helps to transfer large amount of encrypted and compressed data over LAN, WAN, through Firewalls, tranfer files over FTP, etc.

Meta Data Exchange: Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment when used with Power Center. For example, an organization may be using data modeling tools, such as Erwin, Embarcadero, Oracle designer, Sybase Power Designer etc for developing data models. Functional and technical team should have spent much time and effort in creating the data model's data structures(tables, columns, data types, procedures, functions, triggers etc). By using meta deta exchange, these data structures can be imported into power center to identifiy source and target mappings which leverages time and effort. There is no need for informatica developer to create these data structures once again.

Power Analyzer: Power Analyzer provides organizations with reporting facilities. PowerAnalyzer makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers. PowerAnalyzer enables to gain insight into business processes and develop business intelligence.
With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or otherdata storage models. PowerAnalyzer is best with a dimensional data warehouse in a relational database. It can also run reports on data in any table in a relational database that do not conform to the dimensional model.

Super Glue: Superglue is used for loading metadata in a centralized place from several sources. Reports can be run against this superglue to analyze meta data.


Note:This is not a complete tutorial on Informatica. We will add more Tips and Guidelines on Informatica in near future. Please visit us soon to check back. To know more about Informatica, contact its official website www.informatica.com 

Tuesday, 15 April 2014

ETL Terms

Source System:

Source System is a database, application, file, or other storage facility from which the data in a data warehouse is derived. Some of them are Flat files,Oracle Tables, Microsoft SQL server tables, COBOL Sources, XML files.

Mapping: 

Mapping is the definition of the relationship and data flow between source and target objects. It is a pictorial representation about the flow of data from source to target.

Metadata:

Metadata describes data and other structures, such as objects, business rules, and processes.
Metadata contains all the information about the source tables, target tables, the transformations, so that it will be useful and easy to perform transformations during the ETL process. A repository contains metadata.

Staging Area:

Staging area is place where you hold temporary tables on data warehouse server. Staging tables are connected to work area or fact tables. We basically need staging area to hold the data, and perform data cleansing and merging, before loading the data into warehouse.

Data Cleansing:

It is the process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process. The data cleansing technology improves data quality by validating, correctly naming and standardizing data. For example, a person's address may not be same in all source systems because of typos and postal code, city name may not match with address. These errors can be corrected by using data cleansing process and standardized data can be loaded in target systems (data warehouse).

Transformation:

Transformation is the process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources. A transformation is a repository object that generates, modifies, or passes data.

Target System:

Target System is a database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.

Wednesday, 2 April 2014

What is Web Services


  • Web services convert the applications into Web format, are published  can be used by authorized applications.
  • Web services are can be used across any platform and programing langauge. For example, Java can talk with Perl, Windowsapplications cancommunicate with UNIX applications.

  • Web Services are developed to enable interaction between two different platforms.  It processes the real time data very fast with accuracy.

How it works:

Web services works based on 4 elements
  • XML
  • SOAP
  • UDDI
  • WSDL

XML is used to tag (covert data into XML format <city>Bangalore</city>) the data.
SOAP is used to transfer the data.
WSDL is used for describing the services available .
UDDI is used for listing what services are available

1. A service provider describes its service using WSDL. This definition is published to a directory of services. The directory could use Universal Description, Discovery, and Integration (UDDI). Other forms of directories can also be used.
 
2. A service consumer issues one or more queries to the directory to locate a service and determine how to communicate with that service.
 
3. Part of the WSDL provided by the service provider is passed to the service consumer. This tells the service consumer what the requests and responses are for the service provider.

4. The service consumer uses the WSDL to send a request to the service provider.

5. The service provider provides the expected response to the service consumer.


DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and ...