IBM InfoSphere DataStage Reviews

Name: IBM InfoSphere DataStage
Brand: IBM
Rating: 3.9 (44 reviews)

Vendor: IBM

84% willing to recommend

What is IBM InfoSphere DataStage?

IBM InfoSphere DataStage offers powerful ETL capabilities focusing on data transformation and integration, ensuring seamless data processing and management in complex environments. It is particularly valued for handling extensive data volumes with robust transformation features and scalability options.

IBM InfoSphere DataStage is ranked #16 among top Data Integration Tools. PeerSpot users give IBM InfoSphere DataStage an average rating of 7.8 out of 10. It is most commonly compared to Informatica Intelligent Data Management Cloud (IDMC). Large enterprises account for 60% of users researching this solution on PeerSpot. The top industry researching this solution is financial services, accounting for 26% of all views.

Buyer's Guide

IBM InfoSphere DataStage

June 2026

Helped 900,644 peers since 2012

Featured IBM InfoSphere DataStage reviews

Prasad Bodduluri

Senior Data Warehouse Developer at itcinfotech

There is no issue with IBM InfoSphere DataStage's graphical interface for designing data flows, but I do have feedback. We gather source data mainly from Oracle databases, as well as from some spreadsheets. With the Oracle DB Connector, if you write PL/SQL or SQL, there are few options for executing procedures, functions, or packages. The Oracle connector lacks features and needs improvement. Nowadays, many people write programs in Python, or in PL/SQL for Oracle, but IBM InfoSphere DataStage has no feature to call such programs directly rather than invoking them as scripts. With parallel processing in particular, a developer and an administrator have to sit together and run the job multiple times with different parallel configurations to get the best performance. It would help if the job itself offered guidance, such as recommending how many nodes to run with; I think that is missing. An additional feature I would like in the next release is the ability to work with logs, especially machine logs or artificial logs, to extract semi-structured or unstructured data without writing extensive Python code and integrating it. If IBM InfoSphere DataStage provided such a feature, it would help.

reviewer2837757

Data Scientist at a comms service provider with 501-1,000 employees

In my opinion, the best feature IBM InfoSphere DataStage offers is its graphical development interface, which helps developers because it requires minimal coding. You can follow the work you're doing because you understand what each step does. It's also scalable and reliable. The governance and security are very robust in that you can define an authorization scheme. Not everyone has access to the pipelines or the data you're trying to consume, so in that sense it is very safe. The governance is improving, though I haven't tried many of its features.

Vikash Yadav

Senior Officer at State Bank of India

IBM InfoSphere DataStage is a stable tool with active support from IBM. Many ETL tools, including open-source ones, are available, but proper support can be an issue. As we are a financial organization, security is our main concern, so we prefer enterprise tools. Additionally, IBM InfoSphere DataStage is very scalable, allowing us to extend it according to our processing needs.

IBM InfoSphere DataStage mindshare

As of June 2026, IBM InfoSphere DataStage's mindshare in the Data Integration category is 1.4%, down from 4.9% the previous year, according to calculations based on PeerSpot user engagement data.

Data Integration Mindshare Distribution
Product	Mindshare (%)
IBM InfoSphere DataStage	1.4%
Informatica Intelligent Data Management Cloud (IDMC)	3.7%
SSIS	3.6%
Other	91.3%

Data Integration

PeerResearch reports based on IBM InfoSphere DataStage reviews

Type	Title	Date
Category	Data Integration	Jun 23, 2026
Product	Reviews, tips, and advice from real users	Jun 23, 2026
Comparison	IBM InfoSphere DataStage vs Informatica Intelligent Data Management Cloud (IDMC)	Jun 23, 2026
Comparison	IBM InfoSphere DataStage vs SSIS	Jun 23, 2026
Comparison	IBM InfoSphere DataStage vs Informatica PowerCenter	Jun 23, 2026

Valuable Features

IBM InfoSphere DataStage excels in ETL processes, offering cost-effectiveness, ease of use, and high performance with large data volumes. It supports complex transformations via its transformer feature, facilitates easy database connections, and ensures high scalability with parallel processing. The tool integrates well with other systems and provides robust data integration, quality management, and security. Users appreciate its user-friendly, drag-and-drop interface and the ability to manage transformations efficiently.

"I would recommend trying IBM InfoSphere DataStage; it's very useful for medium to large companies that require a reliable and scalable data integration solution."
"The performance optimization is quite good in DataStage, and it provides parallelism and pipelining mechanisms."
"IBM InfoSphere DataStage is a good product; it is quite useful and powerful."

Room for Improvement

IBM InfoSphere DataStage faces challenges with logging, navigation, and documentation, which hinder efficient troubleshooting. Users report slow server connections, a lack of data quality features, and high costs. Integration with modern technologies and cloud systems is limited. The interface is considered outdated and complex, and cloud migration options are lacking. Performance, big data connectivity, compatibility, and support also need improvement. Users would also like better scheduling and reporting and a more user-friendly experience overall.

"The learning curve for IBM InfoSphere DataStage can be difficult for new users, even if there's a graphical design."
"As a product, it needs to be more stable. It's a legacy product, so even though it's high-performing, it's not very stable compared to other products like Informatica or Talend."
"From a practice point of view, solutions such as IBM InfoSphere DataStage and Oracle Data Integrator are losing ground, whereas open-source solutions are becoming increasingly powerful."

ROI

Many users report substantial financial benefits from IBM InfoSphere DataStage, citing improved performance and time savings. Optimized process designs reduce maintenance effort and unnecessary queries, increasing efficiency. Although specific ROI figures are not always tracked explicitly, sustained use over many years suggests positive outcomes. Multiple projects report performance improvements of over 200% after addressing costly inefficiencies. Users are satisfied with its management capabilities, even though not all of them deal directly with financial tracking.

Pricing

IBM InfoSphere DataStage pricing is considered high, particularly for small and medium-sized businesses. Although it is cheaper than Informatica, costs include an annual or perpetual license plus initial setup fees, such as $100,000 for on-premises hosting. Licensing models vary, and costs can rise with usage. Some users find it competitive yet expensive, while others note its value for enterprise projects. Monthly operational costs, including data storage and execution, can be around $6,000.

"The pricing is competitive but on the higher side of the pricing scale."
"The solution is cheap."
"The product is expensive."

Popular Use Cases

IBM InfoSphere DataStage is primarily used for ETL, enabling organizations to extract, transform, and load data efficiently across different systems. It supports complex integration of data from various sources into centralized data warehouses and aids data migration, data quality, and data governance. It also enables reporting and the application of business logic in sectors such as telecom, banking, insurance, and healthcare, where it assists with customer segmentation and marketing campaigns.

Service and Support

IBM InfoSphere DataStage customer service and support vary by region with mixed experiences. Some users report quick, helpful responses, while others find delays and inefficiencies. Ratings range from five to ten out of ten, with room for improvement suggested, especially in communication and response time. The quality of support seems to depend on the region or specific team involved. Enquiries often require ticket submissions, with some users noting efficient assistance when tickets are raised.

Deployment

The initial setup of IBM InfoSphere DataStage varies in complexity. Some users find it straightforward thanks to web-based processes, while others report challenges that require technical expertise, especially in complex environments. The process differs between on-premises and cloud setups and depends on organizational needs. Documentation quality varies, which affects ease of installation. Deployments can take anywhere from hours to weeks, depending on team experience and setup requirements, and success often hinges on experienced personnel.

Scalability

IBM InfoSphere DataStage demonstrates high scalability, easily handling large-scale operations across geographies. Users find it suitable for enterprises, with efficient parallel processing and flexible scaling capabilities. Concerns include resource consumption and dependence on server capacity. Some note challenges with hardware-based scaling and suggest that scaling would be easier in the cloud. Ratings vary between five and nine out of ten. Licensing and cost can affect scaling strategies. Users appreciate its compatibility with various data sources and its efficient resource management.

Stability

IBM InfoSphere DataStage is often described as stable and reliable, though experiences vary depending on the operating system used. Users note fewer crashes on Linux compared to Windows, where manual memory handling is required. Occasional stability issues can occur with new features or improper process development. Recent versions have seen some instability, but previous versions were steady. User ratings range from five to ten, reflecting diverse experiences with stability.

These insights are based on the in-depth reviews provided by peers to help you make a better buying decision.

Review data by company size

By reviewers
Company Size	Count
Small Business	15
Midsize Enterprise	5
Large Enterprise	20

By visitors reading reviews
Company Size	Count
Small Business	185
Midsize Enterprise	95
Large Enterprise	417

Top industries

By visitors reading reviews

Financial Services Firm: 26%

Manufacturing Company: 10%

Computer Software Company

Government

Retailer

Healthcare Company

Construction Company

Performing Arts

University

Comms Service Provider

Insurance Company

Educational Organization

Marketing Services Firm

Outsourcing Company

Legal Firm

Hospitality Company

Wholesaler/Distributor

Consumer Goods Company

Transportation Company

Real Estate/Law Firm

Energy/Utilities Company

Logistics Company

Aerospace/Defense Firm

Media Company

Non Profit

Recreational Facilities/Services Company

Leisure / Travel Company

Pharma/Biotech Company

Compare IBM InfoSphere DataStage with alternative products

Learn more about IBM InfoSphere DataStage

IBM InfoSphere DataStage is known for its strength in data extraction, transformation, and loading, making it a preferred choice for businesses handling large datasets. It provides extensive database connectors, integrates efficiently with existing systems, and facilitates complex data transformations. Users appreciate its scalability, metadata management, and effectiveness in applying business rules. Areas for improvement include better cloud integration, clearer error messages, and broader connectivity with modern databases. Prospective users should also weigh its pricing and deployment complexity.

What are the key features of IBM InfoSphere DataStage?

ETL Capabilities: Efficiently manages extraction, transformation, and loading of data across different platforms.
Parallel Processing: Processes large data volumes quickly with parallel data handling techniques.
Robust Integration: Seamlessly integrates with a range of databases and existing systems.
Data Transformation: Offers powerful tools for comprehensive data modification and manipulation.
Business Rule Application: Enables the incorporation of sophisticated business rules within data workflows.

What benefits do users highlight in reviews?

High Performance: Maintains quick processing speeds and efficient data handling.
Scalability: Adapts to increased data volumes and workflow loads effectively.
Flexibility: Provides adaptable integration solutions for varied data environments.
Error Logging: Delivers accurate logging to track and resolve issues efficiently.
Impact Analytics: Offers tools to assess data modifications and effects comprehensively.

Businesses in sectors like telecommunications, banking, and insurance commonly implement IBM InfoSphere DataStage for ETL processes. It's used for integrating data from multiple sources into data warehouses, supporting business intelligence initiatives, and managing data quality. Known for efficiently handling integration of mainframes and Oracle databases, it supports complex data projects tailored to industry needs.

IBM InfoSphere DataStage customers

Dubai Statistics Center, Etisalat Egypt

Product Categories

Data Integration

Popular Comparisons

Informatica Intelligent Data Management Cloud (IDMC) vs IBM InfoSphere DataStage

Teradata vs IBM InfoSphere DataStage

SSIS vs IBM InfoSphere DataStage

Qlik Talend Cloud vs IBM InfoSphere DataStage

Palantir Foundry vs IBM InfoSphere DataStage

Informatica PowerCenter vs IBM InfoSphere DataStage

Azure Data Factory vs IBM InfoSphere DataStage

Oracle Data Integrator (ODI) vs IBM InfoSphere DataStage

Domo vs IBM InfoSphere DataStage

Fivetran vs IBM InfoSphere DataStage

Denodo vs IBM InfoSphere DataStage

SnapLogic vs IBM InfoSphere DataStage

Oracle GoldenGate vs IBM InfoSphere DataStage

SAP Data Services vs IBM InfoSphere DataStage

Pentaho Data Integration and Analytics vs IBM InfoSphere DataStage

IBM InfoSphere DataStage Reviews Summary
Author info	Rating	Review Summary
Senior Data Warehouse Developer at itcinfotech	3.0	I've used IBM InfoSphere DataStage for 16 years; it's stable and integrates well, but lacks direct script execution, flexible Oracle connectors, better parallel processing guidance, and tools for handling unstructured data efficiently. Installation is also complex.
Data Scientist at a comms service provider with 501-1,000 employees	4.0	I use IBM InfoSphere DataStage for ETL, valuing its graphical interface, scalability, and robust security. While it saves time, the learning curve is steep, and licensing costs are confusing. I rate it 8/10 for reliable data integration, especially for medium to large companies.
Senior Officer at State Bank of India	3.5	We use IBM InfoSphere DataStage for ETL, valuing its stability, scalability, and strong support crucial for our financial organization's security needs. However, improvements are needed in connectivity with big data technologies like Spark, compared to older RDBMS systems.
Sr Product Manager at a computer software company with 501-1,000 employees	4.0	I use IBM DataStage for ETL and data warehousing, finding it straightforward with good support. While the tool is stable, I'm curious about its support for cloud, open source, or other databases. I rate it 8/10.
Senior Data Architect at Anadolu Sigorta	4.0	I used IBM InfoSphere DataStage for data integration and management across various sectors, appreciating its robust capabilities and unified interface. However, deployment is complex, and high costs impact ROI, despite a 200% performance improvement after optimization.
Manager - Business Technology Solutions at a consultancy with 1,001-5,000 employees	4.5	IBM InfoSphere DataStage is a leading ETL tool, known for being cost-effective and user-friendly, transforming large volumes of data efficiently. However, it needs improved logging and troubleshooting features. Alternatives like Informatica are more expensive, while newer tools have emerged.
Arquitecto Industrial IoT at Xignux SA de CV	3.5	We use IBM InfoSphere DataStage for ETL tasks, valuing its ability to manage large record volumes. It excels in batch data integration but could improve in integrating with modern data sources. We considered Azure Data Factory before selecting DataStage.
BI Architect at a healthcare company with 10,001+ employees	4.0	I use IBM DataStage for BI/ETL, valuing its efficiency with IBM targets. It needs better non-IBM tool support (e.g., Snowflake). Setup was easy, customer service good. I recommend it, rating it 8/10.
Associate Manager at a consultancy with 10,001+ employees	4.0	I use IBM InfoSphere DataStage for ETL processes and data quality. Its valuable connectors and excellent debugging capabilities stand out, though I heard support will end by 2026, prompting a move to Cloud Pak for Data for continued support.
Solution Architect - Data Engineering at Tenx	3.5	I integrated multiple data sources into a single data warehouse using IBM DataStage. The Transformer is highly valuable for complex transformations, but it lacks some features like custom code integration seen in Talend, which needs improvement.

Prasad Bodduluri

Senior Data Warehouse Developer at itcinfotech

Oct 27, 2025

Has required complex workarounds for scripts and struggles with unstructured data processing

What is our primary use case?

Our primary use case is extracting data from different source systems with IBM InfoSphere DataStage to build our data warehouse.

What is most valuable?

In IBM InfoSphere DataStage, especially in the transformers or stages, there is no feature to call programs directly. Generally, we wrap the program in a batch file or PowerShell script and call that. If there were an option to call the script directly instead of going through a batch file, it would be much better.

I have leveraged IBM InfoSphere DataStage's integration with IBM's Information Server suite, and it is indeed beneficial.

What needs improvement?

Nowadays, many people write programs in Python, or in PL/SQL for Oracle, but IBM InfoSphere DataStage has no feature to call such programs directly rather than invoking them as scripts.

With parallel processing in particular, a developer and an administrator have to sit together and run the job multiple times with different parallel configurations to get the best performance. It would help if the job itself offered guidance, such as recommending how many nodes to run with; I think that is missing.

An additional feature I would like in the next release is the ability to work with logs, especially machine logs or artificial logs, to extract semi-structured or unstructured data without writing extensive Python code and integrating it. If IBM InfoSphere DataStage provided such a feature, it would help.

For how long have I used the solution?

I have been using IBM InfoSphere DataStage for almost 16 years.

What do I think about the stability of the solution?

We are not facing any issues with the product, and I have already closed the ticket.

What do I think about the scalability of the solution?

Evaluating the performance of IBM InfoSphere DataStage's parallel processing is difficult because it depends on server capacity. We need to create the virtual nodes and the APT_CONFIG file before parallel processing takes effect. It would help if the job suggested which kind of parallel processing to use and how many virtual nodes are required.
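The manual tuning loop described here (define virtual nodes, write a configuration file, rerun, compare) can be partly scripted. The Python sketch below generates single-host parallel configuration files for several candidate node counts, so each run can point the `APT_CONFIG_FILE` environment variable at a different file. It is a minimal illustration under stated assumptions, not IBM tooling: the helper names `make_apt_config` and `write_candidate_configs`, the hostname, the directory paths, and the node counts are hypothetical, and the file layout should be checked against the parallel configuration file documentation for your DataStage version.

```python
from pathlib import Path


def make_apt_config(hostname: str, num_nodes: int, disk_root: str, scratch_root: str) -> str:
    """Return the text of a single-host parallel configuration with num_nodes logical nodes.

    Hypothetical helper: hostname and paths are placeholders for your environment.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    lines = ["{"]
    for i in range(1, num_nodes + 1):
        name = f"node{i}"
        lines += [
            f'  node "{name}"',
            "  {",
            f'    fastname "{hostname}"',  # network name of the host running this logical node
            '    pools ""',  # member of the default node pool
            f'    resource disk "{disk_root}/{name}" {{pools ""}}',  # persistent data set storage
            f'    resource scratchdisk "{scratch_root}/{name}" {{pools ""}}',  # temporary sort/buffer space
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


def write_candidate_configs(out_dir: Path, hostname: str, node_counts: list[int],
                            disk_root: str, scratch_root: str) -> list[Path]:
    """Write one configuration file per candidate node count and return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in node_counts:
        path = out_dir / f"{n}node.apt"
        path.write_text(make_apt_config(hostname, n, disk_root, scratch_root))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Print a two-node configuration for a hypothetical host named "etlhost".
    print(make_apt_config("etlhost", 2, "/data/datasets", "/data/scratch"), end="")
```

Each generated file could then be selected per run (for example, by exposing `APT_CONFIG_FILE` as a job parameter) and the job timed under each node count. Whether more nodes actually help depends on CPU cores, memory, and disk I/O, which is the server-capacity dependency this reviewer describes; the script only removes the hand-editing, not the benchmarking.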

How are customer service and support?

The technical support by IBM is very good. I rate their support as nine on a scale from one to ten.

How would you rate customer service and support?

Positive

How was the initial setup?

I was involved in the deployment of IBM InfoSphere DataStage, as I am also a DataStage administrator. The deployment itself works well, but installation generally takes a very long time. There are multiple screens and steps to complete, such as creating new users and applying fix patches, and I think the installation should be made simpler.

Which other solutions did I evaluate?

IBM InfoSphere DataStage is on-premises.

What other advice do I have?

So far, we have not utilized IBM InfoSphere DataStage's metadata management features.

Regarding IBM InfoSphere DataStage's debugging tools, about 85% of issues are resolved once we read the job log, but some problems remain. For example, when a job fails, we do not get a proper log file. The log is generated only when the job finishes and contains limited information, for example 40 or 50 lines. If a job fails at the first stage, it should stop at that stage and produce a proper log; that is currently missing.

My experience with it is good; I recently got a chance to use it again.

For reading the machine log in IBM InfoSphere DataStage, I would rate it only a two because there's not much improvement in this area.

I would like a feature that converts unstructured data into structured data.

I rate IBM InfoSphere DataStage a six out of ten.

Which deployment model are you using for this solution?

On-premises

reviewer2837757

Data Scientist at a comms service provider with 501-1,000 employees

Jun 17, 2026

Integrated data pipelines have simplified complex ETL workflows and improved governance

What is our primary use case?

I use IBM InfoSphere DataStage mostly for ETL: extracting, transforming, and loading data. For example, I extract information from consumer tables, then cleanse it, remove duplicates, standardize it, and finally load it into a data warehouse.
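The cleanse, deduplicate, standardize, and load sequence this reviewer describes corresponds to a chain of stages in a DataStage job. As a tool-neutral illustration only (plain Python, not DataStage), the sketch below applies the same steps to in-memory rows and loads the result into an in-memory SQLite table standing in for the warehouse. The field names (`customer_id`, `name`, `email`, `country`), the specific rule used in each step, and the `dim_customer` table are hypothetical.

```python
import sqlite3
from typing import Optional

Row = dict[str, Optional[str]]


def cleanse(rows: list[Row]) -> list[dict[str, str]]:
    """Trim whitespace in every field and drop rows without a customer_id."""
    cleaned = []
    for row in rows:
        trimmed = {key: (value or "").strip() for key, value in row.items()}
        if trimmed.get("customer_id"):
            cleaned.append(trimmed)
    return cleaned


def deduplicate(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep the first row seen for each customer_id."""
    first_by_id: dict[str, dict[str, str]] = {}
    for row in rows:
        first_by_id.setdefault(row["customer_id"], row)
    return list(first_by_id.values())


def standardize(rows: list[dict[str, str]]) -> list[tuple[str, str, str, str]]:
    """Collapse spaces and title-case names, lower-case emails, upper-case country codes."""
    return [
        (
            row["customer_id"],
            " ".join(row.get("name", "").split()).title(),
            row.get("email", "").lower(),
            row.get("country", "").upper(),
        )
        for row in rows
    ]


def load(conn: sqlite3.Connection, records: list[tuple[str, str, str, str]]) -> None:
    """Insert standardized records into a warehouse-style dimension table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id TEXT PRIMARY KEY, name TEXT, email TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_pipeline(rows: list[Row], conn: sqlite3.Connection) -> int:
    """Run cleanse -> deduplicate -> standardize -> load and return the number of rows loaded."""
    records = standardize(deduplicate(cleanse(rows)))
    load(conn, records)
    return len(records)


if __name__ == "__main__":
    sample = [
        {"customer_id": " C1 ", "name": "ana  silva", "email": "ANA@EXAMPLE.COM", "country": "br"},
        {"customer_id": "C1", "name": "Ana Silva", "email": "ana@example.com", "country": "BR"},
        {"customer_id": None, "name": "no id", "email": None, "country": None},
    ]
    print(run_pipeline(sample, sqlite3.connect(":memory:")))  # prints 1
```

The step order simply follows the reviewer's description; in a real job, the choice of match keys for deduplication and the standardization rules would be driven by the source data and business requirements.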

One client I worked with was a bank, which has many transactions per day. I used IBM InfoSphere DataStage to build a custom table for the company we were providing services to, and we performed the whole extract, transform, and load process.

I also use the orchestrator in IBM InfoSphere DataStage to run pipelines, schedule pipelines, and define roles.

What is most valuable?

The governance and security are very robust in that you can define an authorization scheme. Not everyone has access to the pipelines or the data you're trying to consume, so in that sense it is very safe. The governance is improving, though I haven't tried many of its features.

What needs improvement?

The learning curve for IBM InfoSphere DataStage can be difficult for new users, even with the graphical design, and it becomes especially challenging when working with advanced configurations. A quick tutorial would be very useful.

The licensing cost setup for IBM InfoSphere DataStage confused me at first. I was unclear about how it works for different types of organizations. The explanation was not clear to me.

For how long have I used the solution?

I have three years of experience using IBM InfoSphere DataStage, including my previous position as a data analyst at another company.

What do I think about the stability of the solution?

IBM InfoSphere DataStage is pretty consistent. Apart from occasional troubleshooting, I haven't seen any downtime, and the troubleshooting process is well documented, which is good.

What do I think about the scalability of the solution?

Scalability is not a concern with IBM InfoSphere DataStage; it handles large volumes of data, and I was able to scale my database.

How are customer service and support?

IBM InfoSphere DataStage's customer support is very responsive to questions. They help me with tickets, so overall I think it's good.

Which solution did I use previously and why did I switch?

IBM InfoSphere DataStage was the first tool I used. My company introduced it, and I was not involved in choosing the tool. It was assigned to me, but I think it went well.

How was the initial setup?

Regarding pricing, setup cost, and licensing, there was no clear differentiation between the cost tiers, which came as a shock to me. I think it could be better overall.

What was our ROI?

I would say the benefit of IBM InfoSphere DataStage is time saved. Instead of performing three tasks separately, everything is now done in one tool. I cannot speak to return on investment because I'm not in a position that manages finances, but in the end, I think time savings is the biggest benefit.

What's my experience with pricing, setup cost, and licensing?

The cost is generally high. Breaking it down across the three features I mentioned before, it is expensive, because you also have to add the cost of the platform you run it on. However, if you have everything in one platform and you understand the flow of the data from beginning to end, it's very helpful.

Which other solutions did I evaluate?

Before choosing IBM InfoSphere DataStage, I heard about testing data orchestration from Google Cloud Services, because that competitor has BigQuery and other platforms for extracting, transforming, and loading data.

What other advice do I have?

IBM InfoSphere DataStage has a lot of features, but it needs improvement too.

I would rate IBM InfoSphere DataStage an eight out of ten. The main concern is the learning curve for beginners: since the tool has no tutorials, you waste time going to other platforms to learn how it works. Overall, however, it's a good and very convenient tool.

Regarding the accuracy and reliability of IBM InfoSphere DataStage's output, I would say it is good to some extent, although I haven't explored the tool that deeply. It provides the AI hints and prompts you need to improve accuracy and reliability.

I would recommend trying IBM InfoSphere DataStage. It's very useful for medium to large companies that require a reliable and scalable data integration solution. My overall rating for this product is eight out of ten.

Which deployment model are you using for this solution?

On-premises

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other

Vikash Yadav

Senior Officer at State Bank of India

Apr 22, 2025

Supports secure ETL processes, connectivity with new technologies needs work

What is our primary use case?

We use IBM InfoSphere DataStage for ETL purposes, specifically extraction, transformation, and loading.

What is most valuable?

What needs improvement?

The solution needs better connectivity with big data technologies such as Spark. It works better with RDBMS databases, but connectivity with new technologies and open-source ETL tools could be enhanced.

For how long have I used the solution?

We have been using IBM InfoSphere DataStage for ten years.

What was my experience with deployment of the solution?

The deployment of IBM InfoSphere DataStage is easy.

What do I think about the scalability of the solution?

IBM InfoSphere DataStage is very scalable, allowing us to extend it based on our processing requirements.

How are customer service and support?

IBM tech support has allocated dedicated resources, making it satisfactory. I would rate their tech support as eight out of ten.

How would you rate customer service and support?

Positive

What's my experience with pricing, setup cost, and licensing?

Pricing for IBM InfoSphere DataStage is moderate, not very expensive. It saves time, approximately twenty to thirty percent, particularly during debugging.

Which other solutions did I evaluate?

Other popular ETL tools include Talend and Informatica. Our organization initially started with IBM InfoSphere DataStage.

What other advice do I have?

I rate my overall experience with IBM InfoSphere DataStage as seven. We face some challenges with connectivity when adapting to new technologies and integrations. On a scale of 1 to 10, I would rate the overall solution as a 7.

Which deployment model are you using for this solution?

On-premises

Swetha S

Sr Product Manager at a computer software company with 501-1,000 employees

Jan 9, 2025

The solution streamlines design, development, and deployment with effective ETL features

What is our primary use case?

We use IBM DataStage for data conversion activities and for data warehousing. Our team focuses on converting data from a legacy system to a modernized platform, and we use this solution for both data conversion and data warehousing.

What is most valuable?

We have mostly been using the ETL side of DataStage. It is straightforward from a design and development perspective, and also for deployment. The failure detection has been very useful for us, as has the load balancing feature.

What needs improvement?

We predominantly use this for the Oracle side of things. I wonder whether it supports other areas, such as cloud environments with open source support, or EdgeShift. I am also unsure whether it supports other databases, such as Postgres or Redshift.

For how long have I used the solution?

I have used this solution for more than five to 12 years.

What do I think about the stability of the solution?

Issues may have arisen from the network, the system side, or the data itself, but not from the tool.

How are customer service and support?

The support has been really good. Typically, if we have any issues, we raise a ticket with IBM, and they help us resolve the issues if required. We also have the flexibility to submit a feature request to be included as part of the wishlist, potentially becoming a product feature in subsequent releases.

How was the initial setup?

I would rate the initial setup as eight. The first time can be a bit difficult, but afterwards we document the necessary steps and try to automate them. Once the data is available, it is easy to use.

What about the implementation team?

Typically, we have only one or two people involved in the deployment and configuration. It does not require more than that.

What other advice do I have?

We do not have all the administrative access required to do the setup ourselves, so sometimes we need to collaborate with other teams, such as the networking and system teams. There is a lot of collaboration across the board, and it really depends on the support we get from them. Once the software is configured, it should be easier for us to build.

I would rate the solution as eight out of ten.

Yusuf Arslan

Senior Data Architect at Anadolu Sigorta

Apr 15, 2024

Provides governance, data management with improved connectors and Kafka connectivity

What is our primary use case?

We used various ETL steps, such as SDGs and data management. We employed DataStage for integration, governance, data management, and data quality and cleansing steps. Additionally, we leveraged real-time data applications for projects in both the government and corporate sectors. I have over five years of experience with DataStage and am familiar with all its steps, methodologies, best practices, and ETL strategies.

How has it helped my organization?

Traditional companies still prefer the on-premises version. DataStage has also improved its connectors, such as connectivity with Kafka for real-time data integration, cloud connectors, and others like Spark and HVAC SaaS. All these processes are expected to shift to the cloud in the next five years. It's a very robust tool, much like PowerCenter. Many companies, including large enterprises, use DataStage for all their integration needs.

What is most valuable?

IBM InfoSphere DataStage offers robust data integration capabilities. It provides more powerful data and methodology management features than other tools like PowerExchange. For example, DataStage allows you to integrate various data sources and manage transformations efficiently. Unlike other tools, DataStage enables comprehensive impact analysis from source to target in a single interface. While DataStage continues to evolve, it remains a solid choice for ETL processes, and advancements in cloud integration and data connectors are expected to enhance IBM's offerings further.

What needs improvement?

The deployment could be more straightforward.

For how long have I used the solution?

I have been using IBM InfoSphere DataStage for three years.

How are customer service and support?

Our support team connected within minutes, given the government compliance requirements, because we encountered issues with our CDC methodology. We used DataStage to push data from the source to a file, but unfortunately, the file was lost during transmission to the target. This caused numerous problems. To resolve them, we discussed and identified the issues, then re-sent all requirements, steps, and logs to fix the problem. This involved contacting the support team and referencing the support ID.

How would you rate customer service and support?

Positive

How was the initial setup?

When deploying with IBM InfoSphere DataStage, the first step is to define and establish the source and target connections so that data can flow from the source to the target.

Subsequent steps, such as implementing changes or updates, can be approached in two ways. If a change data capture mechanism such as IBM InfoSphere CDC, or a similar tool like Oracle GoldenGate, is available, DataStage can leverage it to propagate changes. If no such mechanism is in place, DataStage can handle updates using its own ETL capabilities, which may require more effort.

DataStage can perform incremental updates for both large and small reference tables.

What was our ROI?

Many of our IBM InfoSphere DataStage jobs had high processing costs. By addressing these issues, we improved performance by more than 200%. We identified costly steps in DataStage that were causing poor performance, such as inefficient lookups and join operations. By optimizing these steps and fixing issues with strategy and design, we reduced maintenance effort and unnecessary SQL queries. While some steps still require manual intervention, IBM developers are actively considering these challenges, as they are common across many companies.

What other advice do I have?

IBM InfoSphere DataStage is versatile in integrating with both data lakes and traditional enterprise data warehouse systems. It supports ETL processes such as data standardization, cleaning, conforming, and transformation. We utilize DataStage to embed SQL code from source systems, enabling efficient data transformation and calculation. For performance optimization, we employ push-down optimization techniques.

In IBM InfoSphere DataStage, we leverage parallel processing for all steps and data movements, employing a push-down methodology for handling big data. This approach involves parallelizing sessions and harnessing DataStage's robust mechanisms for highly efficient and scalable data processing.

If you maintain or develop jobs in IBM InfoSphere DataStage, you can, for example, use the scenario view to push data from the source staging area or source mechanism. DataStage also supports real-time data if you need it, although we have not changed that aspect here. We primarily use the ETL tool to push data into a specified layer, and we run data loads periodically, such as weekly or monthly. For instance, if a manager or end user requires weekly or monthly reports, we can run all the necessary steps using the semantic layer. We can adjust our model to the customer's or end user's reporting needs, whether monthly or yearly.

Overall, I rate the solution an eight out of ten.

Rahul Saxena

Manager - Business Technology Solutions at a consultancy with 1,001-5,000 employees

Jan 23, 2024

A helpful and cost-effective tool that performs well and is very easy to use

What is our primary use case?

The tool is used for ETL. It is one of the leading tools in the industry.

What is most valuable?

It is a very helpful and cost-effective tool for ETL. Data-rich industries use such tools to transform tons of data into insights. The solution is very easy to use, the documentation is self-explanatory, and the product performs well. It can be used when we deal with millions of records and want to transform them and connect with multiple databases. It performs such operations in a very short time, which is one of the tool's USPs.

What needs improvement?

The product must improve logging and the navigation guide. When there is an issue, the product must give us insight into why a particular task failed. The troubleshooting guide is very poor: there is no detailed documentation, and troubleshooting must be done manually. It is a time-consuming task, and we spend a lot of time finding the root cause of a particular task or execution failure. It is also very difficult to find an expert in DataStage.

For how long have I used the solution?

I have been using the tool for six years.

What do I think about the stability of the solution?

The solution is quite mature. It has been in the technology ecosystem for more than two decades. In 2008 and 2009, the solution lacked some connectors. However, the product designers gave us the flexibility to build our own connectors. Custom features were always available in the product, which was quite helpful for end users.

What do I think about the scalability of the solution?

Fewer than 100 people are using the solution in our organization.

How are customer service and support?

The support team needs to understand the importance of end users. If a user reaches out to the team for support, the resolution and turnaround time must be very quick. IBM must put a strong team in place to improve the customer experience.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup is quite easy.

What's my experience with pricing, setup cost, and licensing?

DataStage costs one-third of what Informatica costs. The solution is cheap, and any client would opt for it.

Which other solutions did I evaluate?

In 2008, Informatica was the only competitor. However, Informatica's price and licensing costs were very high, so we chose IBM InfoSphere DataStage. After 2013, several other tools like Talend were launched, and Informatica was also revamped. Many cloud tools follow the ideologies of DataStage and Informatica. Informatica has always been the most expensive tool, though. DataStage can do everything that Informatica can do.

What other advice do I have?

I deal with companies in the healthcare industry, where the solutions are largely cloud-based. In data-rich industries like telecom or BFSI, such tools are used extensively, and healthcare also has a lot of data. I would encourage people to use the solution. It is quite an easy tool.

Every stage has a help guide, and the documentation is extensive. We can understand the purpose of a stage, how the connection has to be set up, how to set up a username and password, and whom we should contact. New users must start using the tool and explore it. They might have to invest ten days to two weeks to understand the workflows and options. It is easy to learn. My company is a partner with IBM.

Overall, I rate the product a nine out of ten.

Which deployment model are you using for this solution?

On-premises

ARTURO MONTIEL

Industrial IoT Architect at Xignux SA de CV

Feb 21, 2024

Effectively handles large volumes of records for ETL processes

What is our primary use case?

We primarily use IBM InfoSphere DataStage for ETL scenarios, extracting, transforming, and loading data across various systems efficiently. Although we experimented with using it for real-time or on-time integrations in the past few years, our main focus remains on its strength in traditional batch data integration processes.

What is most valuable?

The most valuable feature for our data processing needs is IBM InfoSphere DataStage's capability to handle ETL tasks with large record volumes.

What needs improvement?

Improvements for DataStage could include better integration with modern data sources like cloud solutions and documents, along with enhancing its capability to handle non-structured data.

For how long have I used the solution?

I have been working with IBM InfoSphere DataStage for almost four years.

What do I think about the stability of the solution?

I would rate the stability of DataStage at around eight out of ten. While it is generally stable, occasional issues arise due to our team not consistently following best practices during process development, impacting server installations.

What do I think about the scalability of the solution?

I would rate the scalability of DataStage at around seven out of ten for our organization because our licensing is based on CPUs, which complicates scaling without hardware adjustments.

How are customer service and support?

I would rate IBM's support for InfoSphere at around five out of ten. It is complicated due to the reliance on our partner and integrator for support, which sometimes affects the quality of assistance we receive directly from IBM.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup of DataStage, implemented through an IBM partner in Mexico, faced challenges due to partner skill gaps, leading to some complications. Deployment took around one to two years until the system became more stable, and currently, a team of three handles maintenance and support, with two individuals at level two communicating with IBM for assistance.

Which other solutions did I evaluate?

Before choosing IBM, we evaluated Data Factory from Azure. The main difference lies in DataStage's strength in handling large-scale scenarios for data integration, while Data Factory is more suitable for specific service scenarios and may require complementary tools for broader use cases, like email integration.

What other advice do I have?

We have used IBM InfoSphere DataStage effectively for managing Big Data within our products, particularly in scenarios involving large volumes of records for ETL processes. However, we have seen that for near real-time or on-time integration tasks, DataStage may not be optimal due to its resource-intensive nature.

DataStage's scalability has indeed supported our data growth, particularly for ETL tasks involving large volumes of data. We manage the increased data loads effectively, primarily by optimizing how we use the tool rather than through inherent scalability features. However, we faced challenges with real-time processing because DataStage could not trigger processes based on events like emails. We had to schedule tasks at intervals instead, which limited its suitability for real-time scenarios.

DataStage integrates with our existing IT infrastructure by connecting to our manufacturing processes and systems like ERP and SAP. It facilitates integration by consolidating data from various sources, enabling us to view unified information across our systems.

I would recommend DataStage for data integration, especially for SQL data and ETL tasks.

Overall, I would rate DataStage at a seven out of ten. While it is a robust solution for data integration and ETL tasks, there is room for improvement in adopting more modern architectures to meet evolving needs.

reviewer2597208

BI Architect at a healthcare company with 10,001+ employees

Dec 4, 2024

High efficiency with good backend code generation and IBM system optimization

What is our primary use case?

The primary use case for IBM InfoSphere DataStage is related to standard business intelligence (BI) and extract, transform, load (ETL) operations.

What is most valuable?

DataStage is pretty good if the target is an IBM product like Netezza or Netezza on Cloud Pak. The optimization they do is much better compared to writing to non-IBM products. It generates highly efficient backend code to write data onto IBM systems, which I find valuable. The team was able to use it to address the requirements of the users.

What needs improvement?

They can provide better support for non-IBM tools when it comes to the target. Specifically, with Snowflake, there is no push-down optimization, which is a drawback when using DataStage.

For how long have I used the solution?

I have had experience with DataStage for about three to four years.

How are customer service and support?

I would rate their customer service and support at eight out of ten.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We did not switch over from a different solution. This is what we started with.

How was the initial setup?

The initial setup was straightforward. Setting everything up was a project of a few weeks.

What was our ROI?

We are happy with what we are seeing. We are able to manage our requirements with the tool so far.

What's my experience with pricing, setup cost, and licensing?

I do not have access to the pricing or standard licensing fee information.

Which other solutions did I evaluate?

I was exploring and came to PeerSpot to evaluate different solutions.

What other advice do I have?

For our requirements, we are able to manage with DataStage, and the support is good. I would recommend it to others.

I rate the overall solution an eight out of ten.

Murali B

Associate Manager at a consultancy with 10,001+ employees

Mar 28, 2024

Facilitated our peak data integration projects, offers a good GUI, and has strong connector availability

What is our primary use case?

I use it for ETL processes and data quality.

How has it helped my organization?

DataStage facilitated our peak data integration projects. For example, big data integrations have happened, particularly when we worked with BigQuery files... that integration server.

DataStage's parallel processing capabilities have improved our data tasks. When I worked with DataStage, it could handle around two terabytes of data. We have other appliances as well, but we process data concurrently. It was good. My team supported it well, and everything worked fine.

The GUI was good. Compared to Cloud Pak for Data, we have some enhanced connectors in the standard InfoSphere DataStage version. That standard version is really good; it's easy to use.

When we want to find out the absolute quality of data, the governance features really helped. For example, when we tried to identify discrepancies between systems, it worked well.

What is most valuable?

The most valuable feature for me is the connectors. Basically, the availability of connectors is strong.

Compared to other ETL tools, DataStage has excellent debugging and development capabilities. The availability of connectors is also good, even though we sometimes have to opt for specific ones, and the availability of patches is good.

I also reached out to teams when procuring specific connectors. That process would sometimes take time, but ultimately, DataStage worked well in that regard.

What needs improvement?

DataStage is a standalone product, but from what I understand, support for this version will end in 2026. Customers will need to use Cloud Pak for Data.

That's what I heard when we were doing an assessment for one of the biggest banks. They are migrating to Cloud Pak for Data because of the support timeline.

For how long have I used the solution?

I have been using it for seven years now.

What do I think about the stability of the solution?

The product is stable. I would rate the stability an eight out of ten.

What do I think about the scalability of the solution?

I'd rate the scalability a nine out of ten. I've used other solutions like Informatica, but I find IBM DataStage more comfortable in terms of scalability.

I work primarily with enterprise businesses.

How are customer service and support?

There's room for improvement in customer service and support. They should put more effort into this. When we reach out for support, answers from the R&D team can take a long time.

How would you rate customer service and support?

Neutral

How was the initial setup?

In terms of intermediate storage, we have some challenges, especially with customers who store data in intermediate locations.

Also, when we migrated from standard DataStage to Cloud Pak for Data, we faced a lot of challenges surrounding data latency.

One client operates from two locations about 900 miles apart, and they have strict expectations for low latency between systems. The initial infrastructure setup was fine, but we had intermittent storage issues.

We relied on other service providers to assist in that situation.

The deployment time depends. Sometimes, it's just a few hours, depending on the applications involved. When I did a large migration from one version to another, it took almost a month to migrate about 20,000 processes.

If there are no code changes in the system, the migration might take about a day. In some cases, where there are code changes, it could take even longer.

What's my experience with pricing, setup cost, and licensing?

The pricing is competitive but on the higher side of the pricing scale.

It depends on the project's budget. For very large enterprise-level projects, DataStage is still a good choice.

What other advice do I have?

For those already using DataStage, I would definitely recommend it. For a new implementation, there are other tools with competitive pricing that might be worth considering.

Overall, I would rate the solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises

Amir Amin

Solution Architect - Data Engineering at Tenx

Sep 12, 2023

Allows for the integration of multiple data sources into a single data warehouse but there is potential for scalability improvement

What is our primary use case?

We have integrated multiple data sources into a single data warehouse. To do this, we built complex ETL jobs and datasets that integrate data from multiple sources into the warehouse. These are basically our use cases.

What is most valuable?

In IBM DataStage, the Transformer is the most valuable feature for me. It enables me to apply complex transformations, generate surrogate keys, and map source tables into the session table.

What needs improvement?

Some features are missing. Compared to DataStage, Talend allows you to write custom code in Java and to use its jobs in your own applications if you are building a job application. DataStage does not allow you to write custom code for any component.

Moreover, Talend allows you to extract Java code and call it in your APIs or applications; DataStage does not have this feature.

In future releases, DataStage could benefit from the ability to save metadata into a database, which is easier to manage than files. That way, if the database crashes or you lose the data in it, you could recover it.

For how long have I used the solution?

I have been using this solution for five years.

What do I think about the stability of the solution?

I would rate the stability of this solution a seven out of ten.

What do I think about the scalability of the solution?

I would rate the scalability of this solution a five out of ten. It should be improved. We have almost eight end users in our area. Some are engineers, one is an administrator, and two of them monitor the pipelines. The rest are developers.

We plan to increase usage further.

How are customer service and support?

In terms of documentation and support, IBM is reliable for providing support to its partners or those with licenses. You can easily find problem resolution support online.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I have previously worked on SSIS. After using SSIS, I moved to DataStage. We explored IBM DataStage for our specific needs.

How was the initial setup?

I would rate my experience with the initial setup a seven out of ten, where one is difficult and ten is easy. I have worked both on-premises and deployed it on a private cloud.

The deployment process usually takes a day.

What about the implementation team?

For deployment, you first need to install DataStage on the desired server. Then, you take a backup of the development jobs and deploy it on the server. After importing, you execute and schedule the jobs through your job application.

The number of people required for deployment depends on the scenario. Sometimes, one person is more than enough.

What's my experience with pricing, setup cost, and licensing?

Pricing is handled by the procurement department. But compared to other enterprise tools like Informatica or Pentaho, IBM DataStage is considerably cheaper.

What other advice do I have?

I would highly recommend this solution because of its shared-nothing architecture, the capabilities it offers, and the fact that every component has its own purpose. For example, it has a Designer client for creating jobs, a Director client for monitoring and scheduling jobs, and an Administrator client for administration. This is something IBM manages well.

Overall, I would rate the solution a seven out of ten. There are certain areas of improvement.

Title	Rating	Mindshare	Recommending	Interviews
Informatica Intelligent Data Management Cloud (IDMC)	4.0	3.7%	92%	215
Teradata	4.1	1.1%	88%	83
