Friday, 28 March 2014

Informatica Software Architecture

Informatica ETL product, known as Informatica Power Center consists of 3 main components.

1. Informatica PowerCenter Client Tools:
These are the development tools installed at developer end. These tools enable a developer to

•Define transformation process, known as mapping. (Designer)
•Define run-time properties for a mapping, known as sessions (Workflow Manager)
•Monitor execution of sessions (Workflow Monitor)
•Manage repository, useful for administrators (Repository Manager)
•Report Metadata (Metadata Reporter)

2. Informatica PowerCenter Repository:
Repository is the heart of Informatica tools. Repository is a kind of data inventory where all the data related to mappings, sources, targets etc is kept. This is the place where all the metadata for your application is stored. All the client tools and Informatica Server fetch data from Repository. Informatica client and server without repository is same as a PC without memory/harddisk, which has got the ability to process data but has no data to process. This can be treated as backend of Informatica.

3. Informatica PowerCenter Server:
Server is the place, where all the executions take place. Server makes physical connections to sources/targets, fetches data, applies the transformations mentioned in the mapping and loads the data in the target system.

Informatica PowerCenter Components

Informatica PowerCenter is not just a tool but an end-to-end data processing and data integration environment. It facilitates organizations to collect, centrally process and redistribute data. It can be used just to integrate two different systems like SAP and MQ Series or to load data warehouses or Operational Data Stores (ODS).

Now Informatica PowerCenter also includes many add-on tools to report the data being processed, business rules applied and quality of data before and after processing.

To facilitate this PowerCenter is divided into different components:

PowerCenter Domain: As Informatica says “The Power Center domain is the primary unit for management and administration within PowerCenter”. Doesn’t make much sense? Right… So here is a simpler version. Power Center domain is the collection of all the servers required to support Power Center functionality. Each domain has gateway (called domain server) hosts. Whenever you want to use Power Center services you send a request to domain server. Based on request type it redirects your request to one of the Power Center services.
PowerCenter Repository: Repository is nothing but a relational database which stores all the metadata created in Power Center. Whenever you develop mapping, session, workflow, execute them or do anything meaningful (literally), entries are made in the repository.
Integration Service: Integration Service does all the real job. It extracts data from sources, processes it as per the business logic and loads data to targets.
Repository Service: Repository Service is the one that understands content of the repository, fetches data from the repository and sends it back to the requesting components (mostly client tools and integration service)
PowerCenter Client Tools: The PowerCenter Client consists of multiple tools. They are used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The PowerCenter Client connects to the repository through the Repository Service to fetch details. It connects to the Integration Service to start workflows. So essentially client tools are used to code and give instructions to PowerCenter servers.
PowerCenter Administration Console: This is simply a web-based administration tool you can use to administer the PowerCenter installation.

There are some more not-so-essential-to-know components discussed below:

Web Services Hub: Web Services Hub exposes PowerCenter functionality to external clients through web services.
SAP BW Service: The SAP BW Service extracts data from and loads data to SAP BW.
Data Analyzer: Data Analyzer is like a reporting layer to perform analytics on data warehouse or ODS data.
Metadata Manager: Metadata Manager is a metadata management tool that you can use to browse and analyze metadata from disparate metadata repositories. It shows how the data is acquired, what business rules are applied and where data is populated in readable reports.
PowerCenter Repository Reports: PowerCenter Repository Reports are a set of prepackaged Data Analyzer reports and dashboards to help you analyze and manage PowerCenter metadata.

Thursday, 27 March 2014

Why do we need ETL Tools?

Business Case: The company has over 100+ years of history & presence in almost all the industries. Over these years company’s management style has been changed from book keeping to SAP. This transition was not a single day transition. In transition, from book keeping to SAP, they used a wide array of technologies, ranging from mainframes to PCs, data storage ranging from flat files to relational databases, programming languages ranging from Cobol to Java.This transformation resulted into different businesses, or to be precise different sub businesses within a business, running different applications, different hardware and different architecture. Technologies are introduced as and when invented & as and when required.

This directly resulted into the scenario, like HR department of the company running on Oracle Applications, Finance running SAP, some part of process chain supported by mainframes, some data stored on Oracle, some data on mainframes, some data in VSM files & the list goes on. If one day company requires a consolidated reports of assets, there are two ways.

First completely manual, generate different reports from different systems and integrate them.
Second fetch all the data from different systems/applications, make a Data Warehouse, and generate reports as per the requirement.
Obviously second approach is going to be the best.

Now to fetch the data from different systems, making it coherent, and loading into a Data Warehouse requires some kind of extraction, cleansing, integration, and load. ETL stands for Extraction, Transformation & Load.

ETL Tools provide facility to Extract data from different non-coherent systems, cleanse it, merge it and load into target systems.

Tuesday, 25 March 2014

XQuery Select nodes and FLWOR expression

Look at the following path expression:

doc("books.xml")/bookstore/book[price>30]/title

The expression above will select all the title elements under the book elements that are under the bookstore element that have a price element with a value that is higher than 30.

FLWOR expression

FLWOR is an acronym for "For, Let, Where, Order by, Return".

The for clause selects all book elements under the bookstore element into a variable called $x.
The where clause selects only book elements with a price element with a value greater than 30.
The order by clause defines the sort-order. Will be sort by the title element.
The return clause specifies what should be returned. Here it returns the title elements.

Example FOR clause

The for clause binds a variable to each item returned by the in expression. The for clause results in iteration. There can be multiple for clauses in the same FLWOR expression.

To loop a specific number of times in a for clause, you may use the to keyword:

for $x in (1 to 5)

return <test>{$x}</test>

Result:

The at keyword can be used to count the iteration:

for $x at $i in doc("books.xml")/bookstore/book/title

return <book>{$i}. {data($x)}</book>

Result:

<book>1. Everyday Italian</book>

<book>2. Harry Potter</book>

<book>3. XQuery Kick Start</book>

<book>4. Learning XML</book>

It is also allowed with more than one in expression in the for clause. Use comma to separate each in expression:

for $x in (10,20), $y in (100,200)

return <test>x={$x} and y={$y}</test>

Result:

Example LET Clause

The let clause allows variable assignments and it avoids repeating the same expression many times. The let clause does not result in iteration.

let $x := (1 to 5)

return <test>{$x}</test>

Result:

Example WHERE Clause

The where clause is used to specify one or more criteria for the result

where $x/price>30 and $x/price<100

Example ORDER BY Clause

The order by clause is used to specify the sort order of the result. Here we want to order the result by category and title

for $x in doc("books.xml")/bookstore/book
order by $x/@category, $x/title
return $x/title

Result:

<title lang="en">Harry Potter</title>
<title lang="en">Everyday Italian</title>
<title lang="en">Learning XML</title>

<title lang="en">XQuery Kick Start</title>

Example RETURN Clause

The return clause specifies what is to be returned

for $x in doc("books.xml")/bookstore/book
return $x/title

Result:

<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>

<title lang="en">Learning XML</title>

Example FLWOR

for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
for - (optional) binds a variable to each item returned by the in expression
let - (optional)
where - (optional) specifies a criteria
order by - (optional) specifies the sort-order of the result

return - specifies what to return in the result

XQuery Conditional Expressions and Comparisons

"If-Then-Else" expressions,

example:

for $x in doc("books.xml")/bookstore/book
return if ($x/@category="CHILDREN")
then <child>{data($x/title)}</child>

else <adult>{data($x/title)}</adult>

Comparisons

In XQuery there are two ways of comparing values.

1. General comparisons: =, !=, <, <=, >, >=

2. Value comparisons: eq, ne, lt, le, gt, ge

The difference between the two comparison methods are shown below.

The following expression returns true if any q attributes have a value greater than 10:

$bookstore//book/@q > 10

The following expression returns true if there is only one q attribute returned by the expression, and its value is greater than 10. If more than one q is returned, an error occurs:

$bookstore//book/@q gt 10

XQuery

XQuery is to XML what SQL is to database tables.
XQuery is designed to query XML data - not just XML files, but anything that can appear as XML, including databases.
XQuery for XML is like SQL for databases
XQuery is built on XPath expressions
XQuery is supported by all major databases
XQuery is a W3C Recommendation
XQuery is compatible with several W3C standards, such as XML, Namespaces, XSLT, XPath, and XML Schema.

What You Should Already Know

Before you continue you should have a basic understanding of the following:

HTML / XHTML
XML / XML Namespaces
XPath

Basic Syntax Rules

XQuery is case-sensitive
XQuery elements, attributes, and variables must be valid XML names
An XQuery string value can be in single or double quotes
An XQuery variable is defined with a $ followed by a name, e.g. $bookstore
XQuery comments are delimited by (: and :), e.g. (: XQuery Comment :)

Monday, 3 March 2014

DATA WAREHOUSE DIMENSIONAL MODELLING (TYPES OF SCHEMAS)

There are four types of schemas are available in data warehouse. Out of which the star schema is mostly used in the data warehouse designs. The second mostly used data warehouse schema is snow flake schema. We will see about these schemas in detail.

Star Schema:

A star schema is the one in which a central fact table is sourrounded by denormalized dimensional tables. A star schema can be simple or complex. A simple star schema consists of one fact table where as a complex star schema have more than one fact table.

Snow Flake Schema:

A snow flake schema is an enhancement of star schema by adding additional dimensions. Snow flake schema are useful when there are low cardinality attributes in the dimensions.

Galaxy Schema:

Galaxy schema contains many fact tables with some common dimensions (conformed dimensions). This schema is a combination of many data marts.

Fact Constellation Schema:

The dimensions in this schema are segregated into independent dimensions based on the levels of hierarchy. For example, if geography has five levels of hierarchy like teritary, region, country, state and city; constellation schema would have five dimensions instead of one.

Subramanian G

Friday, 28 March 2014

Informatica Software Architecture

Informatica PowerCenter Components

Thursday, 27 March 2014

Why do we need ETL Tools?

Tuesday, 25 March 2014

XQuery Select nodes and FLWOR expression

XQuery Conditional Expressions and Comparisons

XQuery

Monday, 3 March 2014

DATA WAREHOUSE DIMENSIONAL MODELLING (TYPES OF SCHEMAS)

Kiro - Core Features

Search This Blog