The solution is very stable and reliable. It's quite mature.
The solution scales very well.
The installation is difficult. You definitely need more than one person. That said, if you are implementing it in the cloud, it's easier.
The solution itself could be easier to use.
The solution is free to use as it is open-source.
I've used the solution for a while. I use it every day. However, it depends on the project.
The solution is stable. It's not a new tool. It's quite mature. It's been on the market for many years. We found version 3.0.1 in particular to be good and stable.
The solution is quite scalable. It is the most scalable tool that I have seen.
We have five people using the solution.
I did not previously use a different solution. I've been working in data warehousing for around 20 years, and I used a batch system built on Oracle Database, which is not a scalable system like Spark.
The initial setup is quite involved. Streaming is a huge system. It's different. You need to use another part of the code; however, it's not a great deal more. People who work in data need to micro-batch. You need other tools, such as Hadoop, a data lake, or Kafka, to control the data.
It's not easy to install. Not all products are open-source. It is not easy to implement on-premises, as you need perhaps two technical people to maintain the system. If you put it in the cloud, it's easier.
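To illustrate what micro-batching means here, the following is a minimal sketch in plain Python, not Spark code. The function name `micro_batches` and the `batch_size` parameter are hypothetical, chosen only for this example. It groups records by a fixed count for simplicity, whereas Spark Streaming groups incoming records by a time interval.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `batch_size` records from `stream`.

    Hypothetical helper for illustration only; it is not part of any Spark API.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        # Pull the next small group of records off the stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch


if __name__ == "__main__":
    # Each small batch is processed like an ordinary batch job.
    for batch in micro_batches(range(1, 8), batch_size=3):
        print(batch, "->", sum(batch))
```

The point of the pattern is that each small group can be handled with the same logic as a regular batch job, which is why people coming from batch warehousing tend to find it approachable.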
It's less expensive to use the cloud. Using on-premises is more costly. Spark is open-source and doesn't actually cost us anything.
It's cheaper for companies to use cloud systems; however, you can implement it on-premises.
We use the cloud. As it is the cloud, it's always on the latest version and updates itself regularly.
I would rate the product at a nine out of ten. It's very good in terms of its capabilities and I have been very happy with it.
I would recommend the solution to other users.
I've used it more for ETL. It's useful for creating data pipelines, streaming datasets, generating synthetic data, synchronizing data, and creating data lakes. Loading and unloading data is fast and easy.
In my ETL work, I often move data from multiple sources into a data lake. Apache Spark is very helpful for tracking the latest data delivery and automatically streaming it to the target database.
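As a rough illustration of tracking the latest delivery per source, here is a minimal plain-Python sketch of a high-water-mark check. It is not Spark code and not necessarily how the reviewer's pipeline works. The names `select_new_records` and `watermarks`, and the record layout `(source, delivery_timestamp, payload)`, are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed record layout: (source name, delivery timestamp, payload).
Record = Tuple[str, int, str]


def select_new_records(
    records: Iterable[Record], watermarks: Dict[str, int]
) -> List[Record]:
    """Return records newer than their source's watermark, then advance it.

    `watermarks` maps each source to the latest timestamp already loaded.
    Timestamps are assumed to be non-negative integers, so -1 means
    "nothing loaded yet" for a source that has no entry.
    """
    fresh = [r for r in records if r[1] > watermarks.get(r[0], -1)]
    for source, timestamp, _ in fresh:
        # Remember the newest delivery seen for each source.
        watermarks[source] = max(watermarks.get(source, -1), timestamp)
    return fresh


if __name__ == "__main__":
    state = {"crm": 10}
    incoming = [("crm", 9, "a"), ("crm", 11, "b"), ("erp", 5, "c")]
    print(select_new_records(incoming, state))
    print(state)
```

Running the same input a second time returns nothing, because the watermarks have already moved past those deliveries.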
Apache Spark is a versatile technology useful not only for data solutions but also for data creation. This is especially valuable given GDPR regulations and limited access to production, which make tasks like testing quite difficult. It helps with data creation and alignment for both consumers and developers.
Apache Spark Streaming is particularly good at handling real-time data. It has built-in data streaming integration, which allows it to stream data from any source as soon as it becomes available.
The scalability features are already good, but they could be further enhanced. Additionally, the debugging aspect could use some improvement.
The stability is very good. Since everything runs as code, it's easy to understand what's happening under the hood. It's not a closed-box system, which makes it quite transparent.
On my team, there are about six or seven people using it. However, on the analytics side, where users view the reports, there are many more, perhaps over a hundred.
The deployment process is quite easy and not very complicated.
Since it's an open-source technology, it can be deployed in various environments, including local machines and all kinds of clouds. If you're using the cloud, scaling is quite easy.
If there are knowledgeable, experienced team members, it doesn't require a large team. One or two developers are enough.
It can handle large datasets and is relatively easy to manage, especially with cloud technologies. This means you can process a lot of data even with a low-configuration environment, which helps with cost savings.
I would rate it a seven out of ten. Apache Spark's capabilities for machine learning are quite extensive and can be used in a low-code way. This can be much more efficient than using various technologies. You can also combine its batch processing capabilities with new technologies and machine learning.
It's quite useful for AI because of its machine-learning capabilities, which allow for model training and output generation.
The primary use of the solution is to implement predictive maintenance.
The solution is better than average and some of the valuable features include efficiency and stability.
The user configuration section could be improved; it should be less developer-focused and more business-user-focused. For example, it is still not plug-and-play like some of the cloud offerings that come ready to use. It is not yet at the leading edge.
I have been using this solution for approximately one and a half years.
The solution is very stable.
The initial setup is developer-focused, but it is not very complex. I can set up a stream in less than an hour. It will stream, but it will not be a production-ready stream.
I rate Apache Spark Streaming a six out of ten.
The platform’s most valuable feature for processing real-time data is its ability to handle continuous data streams.
The product's event handling capabilities, particularly compared to Kafka, need improvement. Integrating event-level streaming capabilities could be beneficial. This aligns with the idea of expanding Spark's functionality to cover unaddressed areas, potentially enhancing its competitiveness.
We have been using Apache Spark Streaming for five years.
Spark is an affordable solution, especially considering its open-source nature. However, it could use support from experienced companies to resolve any issues effectively.
Spark does not encounter integration issues, particularly due to its utilization of JDBC connectors. These connectors facilitate seamless integration with third-party solutions. Furthermore, successful integration with tools like SAP HANA indicates its versatility in handling various data sources. Additionally, its performance surpasses Informatica in certain scenarios, especially when real-time streaming capabilities are crucial. It remains a preferred choice for businesses requiring efficient real-time data processing.
I rate it an eight out of ten.