Apache NiFi is used to fetch data from different sources and ingest it into different destinations. The entire platform depends on Apache NiFi for data transformation and data movement. Multiple sources such as flat files, Oracle, and Postgres are connected, and the main databases where data is stored are Postgres and Apache Druid. The UI connects directly to those databases to ensure the data is available at all times. This platform has been built for many enterprise platforms. Sources include Oracle and files in S3. Apache NiFi flows are created with NiFi processors to fetch data from particular sources such as Oracle, where business data and customer data are available. Every day there are many incidents, changes, and logins to the system. Data must be fetched daily in two ways: as batch jobs and as near real-time streaming. As a batch job, data is fetched twice every day and processed after fetching.

When connected to the Oracle database, data is received in Avro format. It is then converted into JSON, and a mapping is applied so that fields are mapped to whatever destination is required. The schema in Postgres is mapped using the EvaluateJsonPath processor: each JSON field received is mapped to the respective column in the database schema, and the records are then ingested into Postgres. Almost daily, 5 million records are received and processed by multiple flows. Each flow has a different schedule, which can be every five minutes or twice daily. Another process retrieves cost data from AWS and Azure. Every day, billing files arrive in S3, or in Azure Blob Storage if Azure is being used. The data is retrieved and transformed, with modifications made based on the schema, and then sent into Apache Druid using the REST API.

Apache NiFi is the heart of data ingestion for the entire platform. Whatever data movement is happening happens through Apache NiFi, which is the single data ingestion tool in the platform. Once data lands in the respective destinations via Apache NiFi, multiple transformations, queries, and other operations are performed and projected to the UI. Apache NiFi is the main gateway for all data, whatever the source may be. It is a no-code platform where multiple flows can be created, which reduces the time spent writing code and integrating with different connectors. Connectors are already available in Apache NiFi, so flows just need to be created, which makes the process easy and stable.
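As a rough illustration of the field-mapping step described above (in the actual flow, processors such as EvaluateJsonPath and a database put processor do this work), here is a minimal Python sketch; the table name, field-to-column mapping, and connection details are hypothetical placeholders.

```python
# Minimal sketch (not the NiFi flow itself): map incoming JSON fields to Postgres
# columns the way EvaluateJsonPath plus a database put processor would.
# Table, columns, and connection settings are hypothetical placeholders.
import json
import psycopg2

# Hypothetical mapping of JSON fields (already flattened here) to target columns.
FIELD_TO_COLUMN = {
    "incidentId": "incident_id",
    "createdAt": "created_at",
    "status": "status",
}

def ingest_record(raw_json: str, conn) -> None:
    record = json.loads(raw_json)
    columns = list(FIELD_TO_COLUMN.values())
    values = [record.get(field) for field in FIELD_TO_COLUMN]
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO incidents ({', '.join(columns)}) VALUES ({placeholders})"
    with conn.cursor() as cur:
        cur.execute(sql, values)
    conn.commit()

if __name__ == "__main__":
    conn = psycopg2.connect(host="localhost", dbname="ops", user="etl", password="secret")
    ingest_record('{"incidentId": "INC-1", "createdAt": "2025-01-01", "status": "open"}', conn)
    conn.close()
```

In NiFi the same mapping is configured on the canvas rather than coded, which is why the reviewer describes it as a no-code platform.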
I have been using Apache NiFi virtually daily, as it is part of my main responsibility in my current role. My main use case for Apache NiFi involves integrating various data sources and performing transformations to load them mostly into our NoSQL database, Elasticsearch, but sometimes into other databases as well. We receive a lot of logs generated by our AWS services that the company wants to collect, particularly so our security team can review them and conduct their security checks to confirm there is no abnormal behavior. We use Apache NiFi to capture the logs sent to many S3 buckets, collect and decompress them, perform any necessary transformations, and send them to Elasticsearch so that end users, often from the network or security team, can use Elasticsearch and Kibana for data analysis. My advice for others considering Apache NiFi is that you can use it on-premises if you are willing to, and it offers great customizability. While it is primarily designed for streaming data, it can also accommodate batch data. Moreover, it is useful for various out-of-the-box solutions, including unique uses such as email notifications, showcasing its flexibility in data orchestration, ETL, and other applications.
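For a concrete picture of that S3-to-Elasticsearch path, the following is a minimal Python sketch of the same steps outside NiFi (fetch from S3, decompress, index into Elasticsearch); the bucket, key, and index names are hypothetical.

```python
# Minimal sketch of the S3 -> decompress -> Elasticsearch path described above,
# done in plain Python rather than with NiFi processors. Bucket, key, and index
# names are hypothetical placeholders.
import gzip
import json

import boto3
from elasticsearch import Elasticsearch

s3 = boto3.client("s3")
es = Elasticsearch("http://localhost:9200")  # assumed local cluster for the sketch

def ship_log_object(bucket: str, key: str, index: str) -> None:
    # Fetch the compressed log object and decompress it in memory.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    text = gzip.decompress(body).decode("utf-8")
    # Assume one JSON document per line, as is typical for AWS service logs.
    for line in text.splitlines():
        if line.strip():
            es.index(index=index, document=json.loads(line))

if __name__ == "__main__":
    ship_log_object("my-log-bucket", "cloudtrail/2025/01/01/log.json.gz", "aws-logs")
```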
My main use case for Apache NiFi is orchestration; I kickstart the job and then hand it off to Databricks or AWS to run the ETL pipeline. A specific example of a workflow where Apache NiFi plays a key role is when there is an on-premises Hadoop system and a cloud component. Because of the company's firewall policy, the data cannot be moved by having services such as Kinesis or DMS integrate directly with on-premises resources. Apache NiFi works as an orchestrator and a middle tool: it gets the parameters, triggers the job, and then hands it off to the cloud services. Because of the firewall, Apache NiFi comes into the picture. Another use case for Apache NiFi is that once the data is created in S3, I can extract a subset of it and send it via SFTP to outside recipients. I have another scenario regarding my main use case with Apache NiFi: there is a use case for synthetic data, where we use generative AI software to synthesize data in the cloud environment. For users who are not on the cloud and want to access the synthetic data, Apache NiFi is used to pull the data back from the cloud.
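As a sketch of what that orchestration handoff amounts to, the snippet below triggers a Databricks job over REST, which is roughly what an InvokeHTTP step in the flow would do; the workspace URL, token, and job ID are hypothetical placeholders.

```python
# Minimal sketch of the orchestration handoff: trigger a Databricks job over REST,
# roughly what an InvokeHTTP processor in the NiFi flow would do.
# Workspace URL, token, and job ID below are hypothetical placeholders.
import requests

DATABRICKS_HOST = "https://example-workspace.cloud.databricks.com"  # hypothetical
TOKEN = "dapiXXXXXXXXXXXX"  # hypothetical personal access token
JOB_ID = 12345  # hypothetical job ID

def trigger_etl_job(parameters: dict) -> int:
    """Kick off the cloud-side ETL job and return its run ID."""
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"job_id": JOB_ID, "notebook_params": parameters},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["run_id"]

if __name__ == "__main__":
    run_id = trigger_etl_job({"load_date": "2025-01-01"})
    print(f"Triggered Databricks run {run_id}")
```

The point of the pattern is that only NiFi needs firewall access to both sides; the heavy ETL work stays in the cloud service it hands off to.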
Apache NiFi is used to orchestrate ingestion processes. For example, Apache NiFi ingests data from external sources such as external databases or external APIs. Custom transformation is then applied, and data is written inside the data lake.
Apache NiFi is used for real-time and batch ingestion on data warehouse platforms. For example, Apache NiFi ingests all analytics from the e-commerce website into the data warehouse in the AWS Redshift database.
Head of Data Engineering and AI Engineering at Coraline
Real User
Top 10
Apr 2, 2025
I am implementing the ETL workflow using Apache NiFi to prepare data and upload it to the cloud. Our use case involves importing data from on-premises and private servers to build a data hub and data mart. The data mart is then published on the cloud.
I use NiFi as a tool for ETL, which stands for extract, transform, and load. It is particularly effective for integration methodologies. The tool is useful for designing ETL pipelines and is an open-source product. Data is often stored in different forms and locations. If I want to integrate and transform it, NiFi can help load data from one place to another while making transformations. I can handle stream or batch data and identify various data types on different platforms. NiFi can integrate with tools like Slack and perform required transformations before loading to the desired downstream. It is primarily a pipeline-building tool with a graphical UI; however, I can also write custom JARs for specific functions. NiFi is an open-source tool effective for data migration and transformations, helping improve data quality from various sources.
Engineering Lead- Cloud and Platform Architecture at a financial services firm with 1,001-5,000 employees
Real User
Oct 25, 2023
As a DevOps engineer, my day-to-day task is to move files from one location to another, doing some transformation along the way. For example, I might pull messages from Kafka and put them into S3 buckets. Or I might move data from a GCS bucket to another location. NiFi is really good for this because it has very good monitoring and metrics capabilities. When I design a pipeline in NiFi, I can see how much data is being processed, where it is at each stage, and what the total throughput is. I can see all the metrics related to the complete pipeline. So, I personally like it very much.
One example of how Apache NiFi has helped us is in creating data pipelines to migrate data from Oracle to Postgres, Oracle to Oracle, Oracle to MinIO, or other destinations, such as relational databases, NoSQL databases, or object storage. We create templates for these pipelines so that we can easily reuse them for different data migration projects. For example, we have a template for migrating data from Oracle to Postgres that uses an incremental load process. The template also checks the source and destination databases for compatibility and makes any necessary data transformations. If the data is not more than ten terabytes, NiFi is mostly what we use; for heavier table setups, I don't use NiFi for customer or enterprise solutions.
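As a hedged illustration of the incremental-load pattern such a template implements (NiFi itself would typically handle this with QueryDatabaseTable and a maximum-value column), here is a minimal Python sketch; the table, watermark column, and connection details are hypothetical.

```python
# Rough sketch of an incremental (watermark-based) load from Oracle to Postgres,
# similar in spirit to what the template does. Table, watermark column, and
# connection details are hypothetical placeholders.
import oracledb
import psycopg2

WATERMARK_COLUMN = "UPDATED_AT"  # hypothetical change-tracking column

def incremental_copy(last_watermark):
    src = oracledb.connect(user="etl", password="secret", dsn="oracle-host/ORCLPDB1")
    dst = psycopg2.connect(host="pg-host", dbname="dwh", user="etl", password="secret")

    with src.cursor() as read_cur, dst.cursor() as write_cur:
        # Pull only the rows changed since the last successful run.
        read_cur.execute(
            f"SELECT ID, NAME, {WATERMARK_COLUMN} FROM CUSTOMERS "
            f"WHERE {WATERMARK_COLUMN} > :wm ORDER BY {WATERMARK_COLUMN}",
            wm=last_watermark,
        )
        new_watermark = last_watermark
        for row_id, name, updated_at in read_cur:
            # Upsert into the destination so reruns stay idempotent.
            write_cur.execute(
                "INSERT INTO customers (id, name, updated_at) VALUES (%s, %s, %s) "
                "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, "
                "updated_at = EXCLUDED.updated_at",
                (row_id, name, updated_at),
            )
            new_watermark = max(new_watermark, updated_at)
        dst.commit()
    src.close()
    dst.close()
    return new_watermark  # persist this value for the next run
```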
Our company uses the solution to ingest raw data. We have five repositories with a huge amount of data. We normalize the data to previously structured files, prepare it, and ingest it to devices. The size of any project team depends on the workflow or management activities but typically includes two to five users.
Senior Technology Architect at a tech services company with 10,001+ employees
Real User
Mar 18, 2021
I use Apache NiFi to build workflows around Kafka, which we use for distributed messaging. You need to transfer the messages that come into a Kafka broker topic: you get the messages from the Kafka topic, transform them, and send them to other entities for storage, or you can return them to Kafka to be sent to the consumer.
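For a concrete picture of that consume-transform-produce pattern, here is a minimal Python sketch using kafka-python in place of NiFi's ConsumeKafka and PublishKafka processors; the broker address and topic names are hypothetical.

```python
# Minimal consume-transform-produce sketch of the Kafka workflow described above,
# done with kafka-python instead of NiFi processors. Broker address and topic
# names are hypothetical placeholders.
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "incoming-events",                      # hypothetical source topic
    bootstrap_servers="localhost:9092",
    group_id="transformer",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    # Example transformation: tag the event before routing it onward.
    event["processed"] = True
    # Send it on for storage, or back to Kafka for downstream consumers.
    producer.send("processed-events", event)  # hypothetical destination topic
```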
The primary use case is to collect data from different source systems. This includes different types of files, such as text files, bin files, and CSV files, to name a few. It is also used for API training. It works for a large amount of data. It is oriented toward endpoint solutions and both high- and low-frequency data delivered in small packets, for example, files. It can also work well when integrated with Spark; they are complementary in some use cases. At times we work only with NiFi, at times only with Spark, and other times with the two integrated.
Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
We use the tool to transfer data from one service to another. It helps us to migrate data from one department to another.
We use the solution for data streaming.