Analytics Solution Manager at Telekom Malaysia
Real User
Jul 31, 2023
A cost-effective solution widely adopted and with a broad open-source community
Pros and Cons
  • "Since the solution is programmatic, it allows users to define pipelines in code rather than drag and drop."
  • "There is an area for improvement in onboarding new people. They should make it simple for newcomers. Otherwise, we have to assign a senior engineer to operate it."

What is most valuable?

Since the solution is programmatic, it allows users to define pipelines in code rather than drag and drop.

What needs improvement?

There is an area for improvement in onboarding new people. They should make it simple for newcomers. Otherwise, we have to assign a senior engineer to operate it.

For how long have I used the solution?

I have been using Apache Airflow for five years. We are using the latest version of the solution.

What do I think about the stability of the solution?

I rate the solution’s stability a seven out of ten.

Buyer's Guide
Apache Airflow
June 2026
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: June 2026.
900,644 professionals have used our research since 2012.

What do I think about the scalability of the solution?

The scalability is good. We have five people working on it for five different projects.

I rate the solution’s scalability a ten out of ten.

Which solution did I use previously and why did I switch?

We have used open-source Apache NiFi for data flow, Talend, and SQL Server Integration Services (SSIS).

We chose Apache Airflow because it is quite popular, adopted by many people, and has an open-source community and engineers. We moved with the crowd and chose it based on popularity.

How was the initial setup?

I rate the initial setup five on a scale of one to ten, one being difficult and ten being easy.

The deployment required a senior engineer and took a week to complete.

What's my experience with pricing, setup cost, and licensing?

The solution is cheap.

What other advice do I have?

A new user has to be prepared to adopt a new paradigm and treat the data pipeline as code rather than drag and drop. An organization should have a dedicated or experienced person looking into this.

Overall, I rate the solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Bernd Stroehle. - PeerSpot reviewer
Enterprise Architect at kosakya
Real User
Top 5 Leaderboard
Sep 6, 2024
An open-source solution that has limitations in processing too many jobs
Pros and Cons
  • "I worked on a project at a leading German bank for two years, successfully migrating large applications with hundreds of jobs."
  • "Apache Airflow improved workflow efficiency, but we had to find solutions for large workflows. For instance, a monthly workflow with 1200 jobs had to be split into three to four pieces as it struggled with large job numbers. Loading a workflow with 500 jobs could take 10 minutes, which wasn't acceptable."

What needs improvement?

Apache Airflow improved workflow efficiency, but we had to find solutions for large workflows. For instance, a monthly workflow with 1200 jobs had to be split into three to four pieces as it struggled with large job numbers. Loading a workflow with 500 jobs could take 10 minutes, which wasn't acceptable.
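
The split described above can be sketched as a plain chunking step. This is only an illustration, not the reviewer's actual method: the job names and the `split_jobs` helper are hypothetical, and the sketch assumes the jobs are already listed in a valid execution order so that each piece can simply run after the previous one.

```python
from typing import List


def split_jobs(jobs: List[str], pieces: int) -> List[List[str]]:
    """Split an ordered job list into `pieces` contiguous, near-equal chunks."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    base, extra = divmod(len(jobs), pieces)
    chunks = []
    start = 0
    for i in range(pieces):
        # The first `extra` chunks take one additional job each.
        size = base + (1 if i < extra else 0)
        chunks.append(jobs[start:start + size])
        start += size
    return chunks


# Hypothetical monthly workflow with 1200 jobs, split into four smaller workflows.
monthly_jobs = [f"job_{n:04d}" for n in range(1200)]
parts = split_jobs(monthly_jobs, 4)
print([len(p) for p in parts])
```

Each chunk could then become its own workflow. Dependencies that cross a chunk boundary would still need to be handled separately, which this sketch does not attempt.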

The most important feature Apache Airflow lacks is support for external configuration files. All classical schedulers like Control-M or Automic allow you to load workflow definitions from YAML, XML, or JSON files, but the tool requires you to write Python programs. Airflow only supports external configuration for variables, not for workflows. To address this, I created a YAML configuration file that I converted into Python programs, but this functionality is missing from Apache Airflow itself.
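
A converter of the kind described above might look roughly like the sketch below. It is an assumption-laden illustration, not the reviewer's tool: it reads JSON rather than YAML so that it needs only the Python standard library, the field names (`dag_id`, `schedule`, `jobs`, `after`) are invented for the example, job ids are assumed to be valid Python identifiers, and the generated file assumes Airflow 2.x import paths and the `schedule` argument.

```python
import json

# Hypothetical external workflow definition; the field names are illustrative only.
WORKFLOW_JSON = """
{
  "dag_id": "monthly_close",
  "schedule": "@monthly",
  "jobs": [
    {"id": "extract", "command": "echo extract", "after": []},
    {"id": "load", "command": "echo load", "after": ["extract"]}
  ]
}
"""


def render_dag_source(definition: dict) -> str:
    """Render Python source for an Airflow DAG file from a workflow definition."""
    lines = [
        "from datetime import datetime",
        "from airflow import DAG",
        "from airflow.operators.bash import BashOperator",
        "",
        f"with DAG(dag_id={definition['dag_id']!r}, schedule={definition['schedule']!r},",
        "         start_date=datetime(2024, 1, 1), catchup=False) as dag:",
    ]
    # One operator per job; job ids double as Python variable names (an assumption).
    for job in definition["jobs"]:
        lines.append(
            f"    {job['id']} = BashOperator(task_id={job['id']!r}, "
            f"bash_command={job['command']!r})"
        )
    # One dependency line per "after" entry.
    for job in definition["jobs"]:
        for upstream in job["after"]:
            lines.append(f"    {upstream} >> {job['id']}")
    return "\n".join(lines) + "\n"


source = render_dag_source(json.loads(WORKFLOW_JSON))
compile(source, "monthly_close_dag.py", "exec")  # syntax check only; does not import Airflow
print(source)
```

The generated text would be written into the DAGs folder as a normal Python file, so Airflow itself still only sees Python, which matches the limitation the reviewer describes.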

All of its competitors have this feature. In Control-M, Automic, and IBM's scheduler, you can load workflows from XML, JSON, or YAML files.

For how long have I used the solution?

I've been familiar with Apache Airflow for about three to four years. I worked on a project at a leading German bank for two years, successfully migrating large applications with hundreds of jobs. However, the leading German bank paused its migration strategy due to issues with the team in India. They're likely waiting for version 3, which is expected next year.

What do I think about the stability of the solution?

I rate the tool's stability a nine out of ten. 

What do I think about the scalability of the solution?

I rate the product's scalability a seven out of ten. 

How are customer service and support?

Apache Airflow doesn't have its own technical support.

How was the initial setup?

I've been involved in all aspects of Airflow deployment, including building infrastructure using Kubernetes and containers. We faced challenges migrating from enterprise schedulers like Control-M and IBM's scheduler to Airflow, as it lacked some functionality. I had to implement extra features and extensions to support things like individual calendars.

What's my experience with pricing, setup cost, and licensing?

Apache Airflow is open-source and free. Hyperscalers like Google (with Composer), Azure, and AWS offer managed Airflow services.

What other advice do I have?

I recommend Apache Airflow because it's open-source, but you must accept its limitations. However, I wouldn't recommend it to companies in biomedical, chemistry, or oil and gas industries with large workflows and thousands of jobs. For example, genomic analysis at an American multinational pharmaceutical and biotechnology corporation involved workflows with around twenty thousand jobs, which Airflow can't handle. Special schedulers are needed for such cases, as even classical schedulers like Control-M and Automic aren't suitable.

I rate the overall solution a seven out of ten. 

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Prathamesh D Marathe - PeerSpot reviewer
Senior Software Engineer at Annalect India
Real User
Jun 10, 2024
Easy to use and implement its functionalities
Pros and Cons
  • "Apache Airflow can be integrated or used to run multiple files."
  • "The in-built package dependencies in Python have some issues in Apache Airflow, making it an area that needs improvement."

What is most valuable?

Apache Airflow is an open-source tool. Apache Airflow can be integrated or used to run multiple files. The tool is helpful in the orchestration process, and it also allows our company to provide some notifications to teams via email. The main valuable feature of the tool is that it is an open-source product and that it can be integrated with any cloud environment.

What needs improvement?

I have not come across any challenges associated with the product.

The scripts that we use in our company refer to the package dependencies in Python, but those are lost when Apache Airflow starts running for a particular test.

The in-built package dependencies in Python have some issues in Apache Airflow, making it an area that needs improvement.

For how long have I used the solution?

I use the solution in my company for orchestration since we have data coming from different sources. Once the data arrives at our company, we consume the data, and then we do the transformation process in SQL. There can be a few Python scripts or any SQL scripts, and our company does the data quality check using Great Expectations. After that, it gets loaded into the target database.

What do I think about the stability of the solution?

It is a stable solution. In Apache Airflow, the in-built package dependencies in Python have some issues. Apart from the aforementioned area, Apache Airflow provides a stable environment.

What do I think about the scalability of the solution?

Around 70 to 80 people in my company use the product, as it is used across all the projects.

How are customer service and support?

I have never contacted the solution's technical support team.

How was the initial setup?

I have not exactly worked on the installation part, but I have definitely worked on the tool's local installation phase, which was an easy process.

The solution is deployed with the help of the cloud services offered by AWS.

What's my experience with pricing, setup cost, and licensing?

It is an open-source tool. There are no additional fees or charges associated with the product. Expenses are associated with only the machines that our company uses on AWS.

What other advice do I have?

The DAG (directed acyclic graph) functionality has enhanced our company's workflow management: it shows which source tasks are running and which target tasks follow them, and it helps us understand how the data flows. Within a DAG, our company can group tasks, which helps us see which group of tasks is running.
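
The idea of source tasks, the tasks that follow them, and task groups can be illustrated without Airflow at all. The sketch below uses only the Python standard library (`graphlib`, Python 3.9+); the task and group names are hypothetical and only loosely modeled on the ingest, transform, quality check, and load steps this review mentions.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on (its upstream tasks).
upstream = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "transform_sales": {"ingest_orders", "ingest_customers"},
    "quality_check": {"transform_sales"},
    "load_target": {"quality_check"},
}

# Hypothetical grouping, similar in spirit to grouping tasks inside one DAG.
groups = {
    "ingest": ["ingest_orders", "ingest_customers"],
    "transform": ["transform_sales", "quality_check"],
    "load": ["load_target"],
}
group_of = {task: name for name, tasks in groups.items() for task in tasks}


def downstream_of(task: str) -> set:
    """Return every task that directly or indirectly depends on `task`."""
    found = set()
    frontier = [task]
    while frontier:
        current = frontier.pop()
        for candidate, deps in upstream.items():
            if current in deps and candidate not in found:
                found.add(candidate)
                frontier.append(candidate)
    return found


order = list(TopologicalSorter(upstream).static_order())
sources = sorted(task for task, deps in upstream.items() if not deps)
print("source tasks:", sources)
print("one valid run order:", [(task, group_of[task]) for task in order])
print("downstream of ingest_orders:", sorted(downstream_of("ingest_orders")))
```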

Speaking about my experience with Apache Airflow's UI for monitoring and managing workflows, I would say that the tool's UI allows one to add variables, and it also allows one to check the status of the tasks that are running or the previous run on a particular DAG. Our company can send notifications via Apache Airflow, and also check the connection and configuration details.

I recommend the tool to others who plan to use it since it is quite easy to use and it is easy to implement its functionalities for the use cases.

I rate the tool a nine out of ten.

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
ManojKumar43 - PeerSpot reviewer
Big Data Engineer at BigTapp Analytics Pte Ltd
Real User
Apr 10, 2024
A solution for orchestrating EMR clusters with plug-and-play UI
Pros and Cons
  • "Apache Airflow is easy to use and can monitor task execution easily. For instance, when performing setup tasks, you can conveniently view the logs without delving into the job details."
  • "Airflow should support dynamic DAG creation."

What is our primary use case?

I have used Apache Airflow for various purposes, such as orchestrating Spark jobs, EMR clusters, and Glue jobs, and submitting jobs to GCP Dataflow and Azure Databricks. This includes ad hoc queries, for instance when there's a need to execute queries on Redshift or other databases.

How has it helped my organization?

If you are working with APIs or databases, you must write SQL queries and formulate the right statements to retrieve everything. But with the UI, it's more like plug-and-play. You go there, select the task you want to see, like logs, and click on it. It will promptly display the details of the logs, automatically showing the returned logs. However, if you're accessing logs manually from the web server, you must write commands and perform additional tasks. These overheads can be efficiently managed using the UI.

What is most valuable?

Apache Airflow is easy to use and can monitor task execution easily. For instance, when performing setup tasks, you can conveniently view the logs without delving into the job details. All logs are readily accessible within the interface itself. Examining the logs lets you discern which steps and processes are being executed.

You don't have to configure SMTP for everything. You need to configure email settings, such as email on error, failure, or alert access. With Apache Airflow, you can send emails with just a few lines of code. You don't have to write extensive code to configure SMTP; all those configurations can be accomplished within a few lines of code.
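
As a rough illustration of the "few lines of code" mentioned above, failure emails are commonly switched on through task defaults. The sketch assumes Airflow 2.x, assumes the SMTP connection has already been configured once at the deployment level, and uses a placeholder address and a hypothetical DAG name. The Airflow imports sit inside `build_dag` so the defaults can be inspected without Airflow installed.

```python
from datetime import timedelta

# Task-level defaults; the address is a placeholder.
default_args = {
    "email": ["data-team@example.com"],
    "email_on_failure": True,   # send an email when a task fails
    "email_on_retry": False,    # stay quiet on automatic retries
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}


def build_dag():
    """Build a DAG whose tasks inherit the alerting defaults (requires Airflow)."""
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="alerting_example",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
        default_args=default_args,
    ) as dag:
        # This task always fails, so it would trigger the failure email.
        BashOperator(task_id="might_fail", bash_command="exit 1")
    return dag
```

In a real DAG file the result would be assigned at module level, for example `dag = build_dag()`, so that the scheduler can discover it.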

I managed a complex workflow for a finance application project. They use Apache Airflow to orchestrate processes, such as retrieving data from SFTP and landing it into S3. From S3, they trigger Glue jobs based on certain conditions. Additionally, they use the Glue catalog in Glusoft for data management, all orchestrated using Airflow. Furthermore, various logics are written in Airflow DAGs to handle scenarios like security mismatches. For instance, files are sent accordingly if there's a missing security.

Apache Airflow triggers a set of tasks based on DAGs. If you have multiple DAGs, such as raw, transform, and ready layers, you don't have to trigger each DAG manually. Instead, you can integrate them so that triggering one automatically triggers the others. You can also add conditions.
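
One way to chain layers like this is to end each DAG with a task that triggers the next one. The sketch below is a hedged illustration, assuming Airflow 2.x and its `TriggerDagRunOperator`; the DAG ids `raw`, `transform`, and `ready` and the helper names are hypothetical. The Airflow imports are kept inside `build_layer_dags` so the chaining logic can be read and tested on its own.

```python
LAYERS = ["raw", "transform", "ready"]  # hypothetical DAG ids, one per layer


def next_layer(dag_id: str):
    """Return the DAG id to trigger after `dag_id`, or None for the last layer."""
    position = LAYERS.index(dag_id)
    return LAYERS[position + 1] if position + 1 < len(LAYERS) else None


def build_layer_dags():
    """Create one DAG per layer; each non-final DAG ends by triggering the next (requires Airflow)."""
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    dags = {}
    for dag_id in LAYERS:
        with DAG(dag_id=dag_id, start_date=datetime(2024, 1, 1),
                 schedule=None, catchup=False) as dag:
            work = BashOperator(task_id="process", bash_command=f"echo {dag_id}")
            target = next_layer(dag_id)
            if target is not None:
                # Once this layer's work succeeds, start the next layer's DAG.
                work >> TriggerDagRunOperator(
                    task_id=f"trigger_{target}", trigger_dag_id=target
                )
        dags[dag_id] = dag
    return dags
```

With this layout only the first DAG needs to be started by hand or by a schedule; the rest follow in order.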

What needs improvement?

Airflow should support dynamic DAG creation.

For how long have I used the solution?

I have been using Apache Airflow for over 8 years.

What do I think about the stability of the solution?

The solution is stable. 

I rate the solution's stability a nine-point five out of ten.

What do I think about the scalability of the solution?

We were using Apache Airflow on Kubernetes. As more requests came in, it scaled dynamically based on the available pods. Almost 15 data engineers use Apache Airflow.

I rate the solution's scalability a nine out of ten.

How was the initial setup?

The initial setup is straightforward. It will be tricky if you go with an executor or Kubernetes operator.

If you're into plug-and-play convenience, Apache Airflow supports various deployment methods like Docker, Helm, or Kubernetes. If you want to spin up Airflow, it will take no more than 10-15 minutes. However, if you're making customizations or prefer not to use existing databases, the setup time could be extended due to customization requests.

What other advice do I have?

You use Apache Airflow to automate your data pipelines. When you have a data pipeline, such as a Spark job or any other job, and want to automate it, triggering the job manually is not always necessary. You need to configure these DAGs accordingly. For instance, Airflow can initiate the job when the data becomes available. We don't need to keep the cluster running all the time, 24/7. We start the cluster using Airflow when we need to submit the job. Once the job is completed, we terminate the cluster.
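
The start, submit, terminate lifecycle described above can be sketched in plain Python. This is not Airflow or AWS code: `FakeClusterClient` is a stand-in invented for the example, and the cluster id and job names are hypothetical. The point is only the shape of the flow, where the cluster is terminated even when the job fails.

```python
class FakeClusterClient:
    """Stand-in for a real cluster API; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def start_cluster(self) -> str:
        self.calls.append("start")
        return "cluster-001"  # hypothetical cluster id

    def submit_job(self, cluster_id: str, job: str) -> None:
        self.calls.append(f"submit:{job}")
        if job == "broken_job":
            raise RuntimeError("job failed")

    def terminate_cluster(self, cluster_id: str) -> None:
        self.calls.append("terminate")


def run_on_transient_cluster(client, job: str) -> None:
    """Start a cluster, run one job, and always terminate the cluster afterwards."""
    cluster_id = client.start_cluster()
    try:
        client.submit_job(cluster_id, job)
    finally:
        # Terminate even if the job fails, so no idle cluster keeps running.
        client.terminate_cluster(cluster_id)


client = FakeClusterClient()
run_on_transient_cluster(client, "spark_daily_load")
print(client.calls)
```

In an Airflow DAG each of these steps would typically be its own task, and the terminate task would need a trigger rule such as `all_done` so that it still runs after an upstream failure.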

I recommend the solution.

Overall, I rate the solution a nine out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Miodrag Milojevic - PeerSpot reviewer
Senior Data Architect at Yettel
Real User
Mar 25, 2024
Streamlines complex data workflows with its user-friendly interface, robust scheduling, and monitoring capabilities, offering scalability and efficient orchestration of diverse sources
Pros and Cons
  • "Its user-friendly interface makes it straightforward to operate, offering a plethora of features for data preparation, buffering, and format conversion."
  • "It would be beneficial to improve the pricing structure."

What is our primary use case?

It serves as a versatile tool for data ingestion, enabling various tasks including data transformation from one type or format to another. It facilitates seamless preparation and processing of data, supporting diverse operations such as format conversion, type transformation, and other related functions.

How has it helped my organization?

We leverage Apache Airflow to orchestrate our data pipelines, primarily due to the multitude of data sources we manage. These sources vary in nature, with some delivering streaming data, while others follow different protocols such as FTP or utilize landing areas. We utilize Airflow for orchestrating tasks such as data ingestion, transformation, and preparation, ensuring that the data is formatted appropriately for further processing. Typically, this involves tasks like normalization, enrichment, and structuring the data for consumption by tools like Spark or other similar platforms in our ecosystem.

The scheduling and monitoring functionalities enhance our data processing workflows. While the interface could be more user-friendly, proficiency in scheduling and monitoring can be attained through practice and skill development.

The scalability of Apache Airflow effectively accommodates our increasing data processing demands without issue. While occasional server problems may arise, particularly in this aspect, overall, the product remains reliably stable.

It offers a straightforward means to orchestrate numerous data sources efficiently, thanks to its user-friendly interface. Learning to use it is relatively quick and straightforward, although some experimentation, practice, and training may be required to master certain aspects.

What is most valuable?

Our data workflow management is greatly streamlined by the use of Apache Airflow, which proves highly beneficial. Its user-friendly interface makes it straightforward to operate, offering a plethora of features for data preparation, buffering, and format conversion. With its extensive capabilities, Airflow serves as a comprehensive tool for managing our data workflows effectively.

What needs improvement?

The current pricing of Apache Airflow is considerably higher than anticipated, catching us off guard as it has evolved from its initial pricing structure. It would be beneficial to improve the pricing structure. Enhancing the interface further would also be highly beneficial.

For how long have I used the solution?

We have been using it for approximately two years.

What do I think about the stability of the solution?

While the stability of the system is satisfactory, maintaining stability requires vigilance and attention to various factors. During usage, occasional issues may arise, particularly when operating on-premises configurations. For instance, a single hard disk failure on a physical node can pose a challenge, necessitating the node's shutdown for disk replacement. However, the process of switching off and on the node is intricate and requires careful handling.

What do I think about the scalability of the solution?

Scalability is achievable, but it comes with its challenges, particularly in terms of temporary downsizing due to failures or other unforeseen circumstances. While scaling up is feasible, each additional node introduced into the cluster adds complexity and raises the likelihood of potential failures. Dealing with failures involves following standard procedures, yet reinstating the cluster to its fully operational state can be a demanding task.

Approximately ten technical staff members and an equivalent number of data scientists utilize the platform. Additionally, a segment of the network team employs it for network quality analysis, leveraging reporting tools built on top of Impala, which is integrated into the cluster.

How was the initial setup?

The setup process is notably intricate, particularly considering our cluster configuration consisting of twelve data nodes and various additional components. Furthermore, unforeseen issues may arise, such as disk space constraints for Airflow or similar challenges, necessitating vigilance and attention to detail to avoid complications.

What about the implementation team?

The initial phase of the deployment process involves creating a comprehensive plan outlining the setup of our cluster, considering all nodes involved. Since we're deploying on-premises, we need to determine which components will reside on physical machines and which can be accommodated on virtual machines or clusters. This assessment will guide the allocation of resources to each server, ensuring an optimal configuration. Following this, the configuration phase begins, taking into account the specific requirements of our organization and stringent security measures. Access to the clusters must be carefully managed, categorized, and restricted as per security protocols. It's imperative to prepare everything meticulously prior to deployment to ensure a smooth and successful implementation. We've undertaken the deployment process partially in-house and with the assistance of the system integrator dedicated to this project. For maintenance and deployment tasks, we rely on a team of ten technical personnel. Typically, only two or three individuals are needed to monitor operations and address issues as they arise. Moreover, we have the backing of our system integrator for additional support if necessary.

What's my experience with pricing, setup cost, and licensing?

The pricing is on the higher side.

What other advice do I have?

I would confidently recommend Apache Airflow to others, assuring them of its benefits. In my opinion, it's a mature and efficient product that delivers reliable performance. Overall, I would rate it nine out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis
Real User
Mar 19, 2024
User-friendly, provides a graphical representation of the whole flow, and the user interface is pretty good
Pros and Cons
  • "The tool is user-friendly."
  • "We cannot run real-time jobs in the solution."

What is our primary use case?

The main use case is orchestration. We use it to schedule our jobs.

What is most valuable?

The best thing about the product is its UI. The tool is user-friendly. We can divide our work into different tasks and groups. It gives a graphical representation of the whole flow. It also creates a graph of the complete pipeline. The UI is beautiful. Whenever there is a failure, we can see it at the backend. We can retry at the point where the failure happened. We do not have to redo the whole flow. The user interface is pretty good. It provides details about the jobs. It also provides monitoring features. We can see the metrics and the history of the runs. The administration features are good. We can manage the users.

What needs improvement?

The solution lacks certain features. We cannot run real-time jobs in the solution. It supports only batch jobs. If we are using ETL pipelines, it can either be a batch job or a real-time job. Real-time jobs run continuously. They are not scheduled. Apache Airflow is for scheduled jobs, not real-time jobs. It would be a good improvement if the solution could run real-time jobs. Many connectors are available in the product, but some are still missing. We have to build a custom connector if it is not available. The solution must have more in-built connectors for improved functionality.

For how long have I used the solution?

I have been using the solution for four to five years.

What do I think about the stability of the solution?

The tool has stability issues that are present in open-source products. It has some failures or bugs sometimes. It is difficult to troubleshoot because we do not have any support for it. We have to search the community to get answers. It would be good if there were a support team for the tool.

What do I think about the scalability of the solution?

We have 5000 to 10,000 users in our organization.

How was the initial setup?

The installation is relatively easy. It doesn't have much configuration. It is straightforward. Some companies provide custom installations. It is easier, but it will be a costly paid service. We generally use the core product. We also have AWS Managed Services. It is a better option if we do not want to do the configuration ourselves.

What other advice do I have?

Apache Airflow is a better option for batch jobs. My advice depends on the tools people use and the jobs they schedule. Databricks has its own scheduler. If someone is using Databricks, a separate tool for scheduling would be useless. They can schedule the jobs through Databricks.

Apache Airflow is a good option if someone is not using third-party tools to run the jobs. When we use APIs to get data or get data from RDBMS systems, we can use Apache Airflow. If we use third-party vendors, using the in-built scheduler is better than Apache Airflow. Overall, I rate the solution a nine out of ten.

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
IT Professional at Freelance
Real User
Top 20
Sep 19, 2024
Equips users with a comprehensive feature set for managing complex workflows and has a responsive technical support team
Pros and Cons
  • "Airflow integrates well with Cloudera and effectively supports complex operations."
  • "One area for improvement would be to address specific functionalities removed in recent updates that were previously useful for our operations."

What is our primary use case?

We use the product for scheduling and defining workflows. It helps us extensively to manage complex workflows within Cloudera's ecosystem, particularly for handling and processing data.

How has it helped my organization?

The solution has been beneficial in automating and managing our data workflows efficiently. It has integrated well with our Cloudera environment, enabling us to handle complex workflows with greater ease and reliability.

What is most valuable?

The solution's most valuable feature is its ability to run workflows without saving changes. It allows us to execute tasks without permanently altering our configurations, which is useful for temporary adjustments and testing.

What needs improvement?

One area for improvement would be to address specific functionalities removed in recent updates that were previously useful for our operations.

Additional features that could enhance the product include more flexibility in parameterization and improved tools for managing and debugging workflows.

For how long have I used the solution?

I have been working with Airflow for approximately a year and a half, focusing on the current version for the past eight months.

What do I think about the stability of the solution?

The product has been stable in our environment.

What do I think about the scalability of the solution?

The product is scalable. 

How are customer service and support?

The technical support team has been responsive and helpful. They addressed issues related to removed functionalities and ensured critical features were restored in subsequent updates.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously used Hortonworks but switched to Cloudera CDP. We also used other Cloudera tools but found Airflow to be a better fit for our current needs due to its capabilities in workflow management.

How was the initial setup?

The initial setup was complex due to the integration with various data sources and configuration requirements, but once properly set up, it has proven effective.

What about the implementation team?

The implementation was carried out with guidance from Cloudera's support team, who provided valuable assistance in configuring the solution to meet our requirements.

Which other solutions did I evaluate?

We evaluated other data workflow solutions but found Airflow the most suitable due to its integration with Cloudera and comprehensive feature set for managing complex workflows.

What other advice do I have?

Airflow integrates well with Cloudera and effectively supports complex operations. However, users should be aware of changes in functionality between versions and plan accordingly.

Overall, I rate it a nine out of ten. 

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
Pravin Gadekar - PeerSpot reviewer
Google Cloud Architect at Capgemini
Real User
Apr 1, 2024
Has an efficient user interface, but its stability needs improvement
Pros and Cons
  • "The user interface for monitoring and managing workflows has been excellent, particularly in the latest version."
  • "The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues."

What is our primary use case?

We use the product to orchestrate data engines and process new data files.

What is most valuable?

The product's most valuable feature is scalability. It helps us run hundreds of data jobs every day.

What needs improvement?

The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues. It requires manual intervention to resume jobs. Additionally, while extending the code is possible, it sometimes necessitates creating custom plugins.

For how long have I used the solution?

We have been using Apache Airflow for four years.

What do I think about the scalability of the solution?

We have more than 100 Apache Airflow users in our organization.

How was the initial setup?

The initial setup on Google Cloud using Cloud Composer is straightforward and simplified. However, deploying it on-premises can be complex and challenging.

What was our ROI?

The product is worth the investment.

What's my experience with pricing, setup cost, and licensing?

It is an open-source solution, so there are no hidden fees or licensing costs associated with the software. However, users need to cover the operational costs for the actual infrastructure, such as the virtual machines (VMs).

What other advice do I have?

The directed acyclic graph (DAG) functionality in Apache Airflow has significantly enhanced our workflow management. It provides a visual representation of data processing tasks.

The user interface for monitoring and managing workflows has been excellent, particularly in the latest version. It is difficult for beginners to use the platform, and some training is required.

I recommend the product to others; it is much better than its competitors, and it is open source. I rate it a seven out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Punit_Shah - PeerSpot reviewer
Director at Smart Analytica
Reseller
Dec 25, 2023
Excels in orchestrating complex workflows, offering extensibility, a graphical user interface for clear pipeline monitoring and affordability
Pros and Cons
  • "One of its most valuable features is the graphical user interface, providing a visual representation of the pipeline status, successes, failures, and informative developer messages."
  • "Enhancements become necessary when scaling it up from a few thousand workflows to a more extensive scale of five thousand or ten thousand workflows."

What is our primary use case?

We utilize Apache Airflow for two primary purposes. Firstly, it serves as the tool for ingesting data from the source system application into our data warehouse. Secondly, it plays a crucial role in our ETL pipeline. After extracting data, it facilitates the transformation process and subsequently loads the transformed data into the designated target tables.

What is most valuable?

One of its most valuable features is the graphical user interface, providing a visual representation of the pipeline status, successes, failures, and informative developer messages. This graphical interface greatly enhances the user experience by offering clear insights into the pipeline's status.

What needs improvement?

Enhancements become necessary when scaling it up from a few thousand workflows to a more extensive scale of five thousand or ten thousand workflows. At this point, resource management and threading become critical aspects. This involves optimizing the utilization of resources and threading within the Kubernetes VM ecosystem.

For how long have I used the solution?

I have been working with it for five years.

What do I think about the stability of the solution?

I would rate its stability nine out of ten.

What do I think about the scalability of the solution?

While it operates smoothly with up to fifteen hundred pipelines, scaling beyond that becomes challenging. Performance tends to drop at five thousand pipelines or more, so I rate its scalability five out of ten.

How are customer service and support?

I would rate the customer service and support nine out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup is straightforward. I would rate it nine out of ten.

What about the implementation team?

The deployment process takes approximately four hours, and the number of people involved depends on the number of pipelines to be deployed.

What's my experience with pricing, setup cost, and licensing?

The cost is quite affordable. I would rate it two out of ten.

What other advice do I have?

If you have around two thousand pipelines to execute daily within an eight to nine-hour window, Apache Airflow proves to be an excellent solution. I would rate it nine out of ten.
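To put that workload in perspective, the average throughput it implies can be worked out with a little arithmetic. The helper below is a hypothetical illustration only; it assumes pipelines are spread evenly across the window, which real schedules rarely are.

```python
def required_rate(pipelines: int, window_hours: float) -> float:
    """Average pipelines per hour needed to finish within the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return pipelines / window_hours


# Figures taken from the review: about 2,000 pipelines in 8 to 9 hours.
rate_8h = required_rate(2000, 8)  # 250.0 pipelines per hour
rate_9h = required_rate(2000, 9)  # roughly 222 pipelines per hour
```

That works out to about four pipeline runs starting every minute on average.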

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: My company has a business relationship with this vendor other than being a customer. Service provider
PeerSpot user
SabinaZeynalova - PeerSpot reviewer
Data Engineer Team Lead at Unibank
Real User
Sep 28, 2023
Can be used with multiple systems and servers, Kubernetes systems, and dashboard systems
Pros and Cons
  • "The product is stable."
  • "There is a need for more features on experimental evolution steps."

What is our primary use case?

We use Apache Airflow for the automation and orchestration of model deployment, training, and feature engineering steps. It is a model lifecycle management tool.

How has it helped my organization?

We have an integration with Apache Airflow in our portal for messaging. We use group and transformation data from Redshift to Tesco, and then create a call flow to the router. This is a source of data leakage, such as data engineering and machine learning, especially in a HIPAA environment. We need to check the evolution steps in the pipeline. In production, we only have two cases. Sometimes, we need customer data not in the database, which we get from object storage. The call flow from Redshift to Tesco involves transforming the data and then generating it with the router or Kibana router for the policy. The data is then transformed and sent to the dashboard or data warehouse.

What needs improvement?

Airflow is a pipeline for transferring code by clients, but Apache Airflow does not have any solution for model experiments. There is a need for more features on experimental evolution steps.

For how long have I used the solution?

I have been using Apache Airflow for one and a half years.

What do I think about the stability of the solution?

The product is stable. I rate the solution’s stability an eight out of ten.

What do I think about the scalability of the solution?

Twenty users are using this solution in our organization. I rate the solution’s scalability an eight out of ten.

How was the initial setup?

The initial setup is not complex and can be done by two people. However, open-source on-premises solutions have some difficulties. We can schedule Apache Airflow on Kubernetes, but space limitations and installation issues may arise because we do not have full control over the Kubernetes cluster resources and our administrative access is limited. I rate the initial setup a six out of ten, where one is difficult and ten is easy.

What other advice do I have?

I recommend Apache Airflow because it is still profitable and can be used with multiple systems and servers, Kubernetes systems, and dashboard systems. You can use it to get social media and other data, but it can be expensive. Overall, I rate the solution a nine out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user