We mainly use it for high-level decision-making insights in the education industry. We also use it for operational reports and data integration between systems.
Microsoft Analytics Platform System integrates with Microsoft ecosystem, connecting databases and offering advanced Cube Solution for analytics. It provides cost-effective end-to-end capabilities with multiple features under one license.

| Product | Mindshare (%) |
|---|---|
| Microsoft Analytics Platform System | 2.2% |
| Snowflake | 9.3% |
| Teradata | 8.7% |
| Other | 79.8% |
| Type | Title | Date | |
|---|---|---|---|
| Category | Data Warehouse | Jun 23, 2026 | Download |
| Product | Reviews, tips, and advice from real users | Jun 23, 2026 | Download |
| Comparison | Microsoft Analytics Platform System vs Snowflake | Jun 23, 2026 | Download |
| Comparison | Microsoft Analytics Platform System vs Oracle Exadata | Jun 23, 2026 | Download |
| Comparison | Microsoft Analytics Platform System vs Teradata | Jun 23, 2026 | Download |
| Title | Rating | Mindshare | Recommending | |
|---|---|---|---|---|
| Dell PowerStore | 4.4 | 1.4% | 97% | 220 interviewsAdd to research |
| Teradata | 4.1 | 8.7% | 88% | 83 interviewsAdd to research |
Microsoft Analytics Platform System offers seamless integration with Azure products, and its tools enhance flexibility and compatibility across different services and scenarios. While it provides comprehensive BI charts and data warehousing capabilities, users have noted that support and documentation need improvements, especially in hybrid environments. The system's machine learning and AI could be more user-friendly, and real-time analytics require enhancements. Handling permissions and data ingestion are cost-intensive, posing challenges for non-technical teams.
What are the key features of Microsoft Analytics Platform System?In industries such as finance, telecom, and healthcare, Microsoft Analytics Platform System is used for multi-dimensional analysis and business intelligence. It supports predictive analytics, report generation, and server monitoring within Azure settings. Users leverage it for data integration and decision-making, allowing extensive project execution with its suite of tools.
Microsoft Analytics Platform System was previously known as Microsoft APS, MS Analytics Platform System.
| Author info | Rating | Review Summary |
|---|---|---|
| Senior Data Engineer at Tatweer Educational Technologies Company - TETCO | 3.5 | We use Microsoft Analytics Platform System for insights in the education industry, benefiting from its cost-effective features. However, improvements in real-time analytics and permissions are needed. Despite challenges, it provides a positive return on investment. |
| CTO at Data Semantics Pvt Limited | 4.0 | I primarily use Microsoft Analytics Platform System for predictive analytics, valuing its seamless integration with Microsoft's reporting platforms. However, support could be improved, as it sometimes takes time to find assistance for model adjustments. |
| Data Solution Architect at a government with 10,001+ employees | 4.0 | I find Microsoft Analytics Platform System a highly integrated and versatile solution, rated 8/10. Its strong integration with other Microsoft products is key, but new feature documentation is severely lacking and fragmented, making it challenging for users. |
| Manager - Infrastructure at a government with 1,001-5,000 employees | 2.0 | We use Microsoft Analytics for Azure server monitoring, valuing its integration and stability. However, the expensive data ingest model, exacerbated by licensing changes, and lack of real-time, interactive monitoring make it a disappointing 4/10. |
| Delivery Lead at a tech consulting company with 1-10 employees | 2.5 | I use this product for data warehousing, valuing its end-to-end solution and Cube's advanced analytics. However, standard support, ML/AI ease of use, and feature parity need improvement. I rated it an average five, given available modern alternatives. |
| Solution Manager at Erste group | 4.0 | I've used this stable solution for five years for data analysis and reporting. Its universal database connectivity is key, but I find its flexibility needs improving. Setup was straightforward. I rate it 8/10. |
| CTO with 11-50 employees | 4.0 | No summary available |
| Senior Data Architect at a pharma/biotech company with 1,001-5,000 employees | 4.0 | I find Microsoft APS an EDW game changer, combining SQL Server PDW and Hadoop via PolyBase. This enables T-SQL queries across structured and unstructured data, simplifying analytics. Its ease of use, minimal learning curve, and lower cost are key advantages. |
| Senior Data Architect at a pharma/biotech company with 1,001-5,000 employees | 4.0 | I found that Microsoft APS offers a practical solution for a Hub and Spoke EDW architecture. It overcomes the limitations of traditional centralized and decentralized systems, providing a scalable, flexible, and cost-effective platform for enterprise data warehousing, integrating central control with decentralized agility. |

We mainly use it for high-level decision-making insights in the education industry. We also use it for operational reports and data integration between systems.
We leverage its capabilities for many applications. We can integrate with our databases, like Oracle, MySQL, or any other, using Microsoft Integration Services. This lets us continue using private databases without paying additional licensing fees.
Additionally, the license includes Analytics services and Power BI, which work on-premises, unlike most other technologies that require cloud solutions.
It provides multiple features under one license, which is cost-effective compared to other solutions like IBM's. We are in the education area. This is why we've mainly settled on Microsoft products.
We need better real-time analytics capabilities. It's a bit challenging for us.
Moreover, there are some permission limitations.
I have been using it for five years. We use the latest version because of app access and the latest features Microsoft provides.
It is a stable product.
There are around a hundred end users. There are challenges in the scalability due to accessing a large virtual HCM and retrieving data.
Sometimes, for common issues, we can escalate to the 24-hour Platform support.
Positive
The deployment process is easy with Microsoft Stack. It provides access to the newest features and Microsoft's ongoing development and deployment support. They've integrated DevOps pipelines for streamlined development and deployment in containers.
We had to make some custom implementations because we couldn't utilize out-of-the-box features. It might be due to our multiple requirements.
We only have on-premises solutions from Microsoft. So what I'm building is a Microsoft product, but on-premises.
Thanks to automation, we don't need a lot of people or resources for deployment. We can have an automated pipeline for deployment, like Continuous Integration/Continuous Delivery (CIDI).
Assuming no issues, a smooth deployment can be completed within an hour, potentially even faster. It depends on the changes you want to make. If management only chooses specific functionalities to update, it's generally straightforward.
It's not inherently difficult to maintain. Most maintenance tasks are routine, like performance monitoring through the portal.
We get updates, and if complex, it's costly compared to sending the information for simpler updates.
It does bring in an ROI.
It's not too confusing. It's based on features, and we don't have to buy additional licenses.
If you have already a license for Office 365, then it's a good-to-go solution.
Overall, I would rate the solution a seven out of ten.
Microsoft Analytics Platform System's most valuable feature is its ecosystems and seamless integration with other Microsoft reporting platforms and databases.
Microsoft Analytics Platform System could have better support. Sometimes, it takes time to find support to tweak a model or get details.
I have been using Microsoft Analytics Platform System for five years.
I rate Microsoft Analytics Platform System an eight out of ten for stability.
I rate Microsoft Analytics Platform System a seven out of ten for scalability.
I rate Microsoft Analytics Platform System an eight out of ten for its ease of initial setup.
Depending on the usage we are working on, the solution's implementation can take two to three months or be really fast. It typically takes us a week to do the complete setup. However, if you're asking for a specific use case implementation, it depends upon the complexity of the problem we are trying to solve.
Many SaaS offerings in the market can give you the same results at a much cheaper rate. However, we typically use Microsoft Analytics Platform System for its ecosystem. I rate Microsoft Analytics Platform System a seven out of ten for pricing.
Users can start with a small proof of concept to see whether the solution fits their requirements, and then they can proceed with an implementation.
Overall, I rate Microsoft Analytics Platform System an eight out of ten.
I am a freelancer in this area. Microsoft Analytics Platform System is a suite of many different products. I have been busy using Microsoft Analytics Platform for the last 10 years and it is not really one single product. There are many different tools and many different technologies used both on cloud and on-premise. It is maybe a hundred products. There is not one single component that I would call Microsoft Analytics Platform System because it is just that: a system.
Because that is the case, it is very difficult to pick one thing as most important or that we use it for most because it is so versatile and serves very different needs for different users. Sometimes a part of the suite is used by just a couple of people and sometimes by a complete team or even groups of teams. It really depends on the situation and the solution the product provides for a given project. No part is the primary part. I could only say that the use case is to work with a unified system to enhance collaboration, analysis, and productivity, and only that.
Personally, I get assignments from companies and I then implement those assignments for those companies. I, myself, do not use those products at all. So I am not going to use more or less of a particular part or service. I go to the assignment and I implement the solution in Microsoft Analytics Platform System for them. I do all kinds of different projects, from small to medium to very big. My personal use case is to do the implementations for projects for those companies.
The most valuable part of the product is that it is a system. It has different tools for different services for different kinds of scenarios. It is a very rich tool and an integrated technology-rich platform. The total integration with the rest of Microsoft products is probably the most valuable piece that creates flexibility and compatibility and makes the tool a very useful one.
In general, I am not really very satisfied with the tutorials that are out there. When Microsoft releases a new tool, technology — whatever it is — oftentimes it is not really very easy to get your hands on the insightful information and documentation, training courses, and other training materials. If you can find them, they may not explain what you need to know in a nice way. Often times they are a little bit fragmented. These user-oriented guides are something that should be better and released along with the products they are supposed to support.
For example, we have servers in Azure called Azure Data Factory which I work with quite a lot. When a new feature or new release happens, finding the right documentation or resources that explain these features and how do you work with them is a little bit more difficult than it should be, in my opinion.
There are probably a lot of extra features that might be considered to add to the scope of this solution. However, adding ports for different types of users may be one of the best. Certain users are advanced users and they can find their way around. But sometimes non-technical users or those that do not have a lot of technical background can find the complexity a little bit difficult to work with. Better handling of user gateways and privileges would be a benefit.
We have been using this solution for the last three or four years.
I do not personally see many issues with the stability as long as everything is configured correctly.
Essentially the scalability sometimes is a little bit difficult depending on how it needs to be applied in some scenarios. I have been working for very different companies from medium-sized to quite large — a few thousand users. Oftentimes only for groups of 50 plus users. Scalability is inherent in that scope and it can be done. The specific reasons and application of scaling may make it more or less challenging, but it can be done.
I would say that the technical support is satisfactory. It is neither really good nor really bad.
I have used other products like SAP or other third-party tools here and there. Most of my experience is with Microsoft Azure and I have not really considered working with other tools, platforms and solutions too much simply because Microsoft is best at integrating with their own products.
For me, the setup is quite straightforward.
The deployment can take just a few weeks in some cases. In some cases, it is a month and even years because of the scope of the rollout. So it really depends on the project.
As I do the installations, I do not need to use outside services.
I would recommend and do recommend using this product for others who need it.
On a scale from one to ten where one is the worst and ten is the best, I would rate this product as an eight-out-of-ten.
Our primary use case is for monitoring our servers. We have a virtual fleet within Azure and we monitor that.
Because we use Azure, it's built into Azure. We probably haven't been using it properly, so there's lots of bits and pieces to Microsoft Analytics. To be quite honest, we've never sat down and formulated a plan on how to use it to a hundred percent. We probably have used a bit of it at the moment.
I like that it's integrated with other Azure products.
I think the biggest problem with the product is that it does a data ingest model, which is very expensive.
I have been using Microsoft Analytics Platform System for two to three years.
We haven't had any crashes or issues with. It's pretty stable.
It requires two to three staff members for the maintenance. We use it occasionally, not on a daily basis.
We rarely contact their technical support.
Most of the setup is pretty easy. It's just out of the box. It's already there. We set it up by ourselves.
If you're going to do it, do it seriously. Implement it straight away. Don't do it piecemeal like we've done. Get the best out of it as you can. The trouble is Microsoft has changed their licensing. It's an extremely expensive product now. We are currently paying $200,000 a year for all the different parts of the suite during an ingest model Microsoft now charges us $700,000 a year.
They changed the whole product but and the way it used to be done. You would buy a license per server and now it's an ingest model on data.
I would rate it a four out of ten. To make it a higher score, they should have real application time monitoring. It's not that good. They have to drill back. It's not very interactive.
My primary use for this product would be as a data warehouse and to do business analysis.
I think each component of the product has its own advantages, but I do not think I should explain every component and instead focus on one that stands out. One thing I have concluded through use is that the Cube Solution is quite different when compared to the rest of the competition and has unique functionality for advanced analytics. Also, the variety of charts that are available for the BI are a nice, functional addition. The rest is probably almost the same as other products in the category.
I think Microsoft Stack offers the end-to-end solution I need. If I go for other products, they may have the end-to-end functionality but not all of the tools I already have in Microsoft Stack. Some of the other products just cover the ETO (Extract, Transform and Load) part, some of them just cover the visualization. I think that the Microsoft solution does better in truly covering warehousing end-to-end.
With the release of the 2019 version, I think Microsoft Stack has the capability for machine learning and in fact, it can now live in a Linux environment. I have had to deal exclusively with Microsoft technology for about 17 years working with the product and I personally have not deployed that capability, but I think that one thing is a big improvement in potential flexibility. If they can continue to do one important thing with each release like what Oracle is already doing, I think it is good.
For example, if you do select star (*) from one table, Oracle returns the first 50 results. Microsoft will return all the results regardless of the number of rows in a table. I think these key features and functionality are something that Microsoft should improve because it makes sense how Oracle treats the customer queries. There are a few other improvements that can be made, but I can see key limitations in what Microsoft has in comparison to Oracle. They should concentrate on the most important features and add them.
In my opinion, the standard technical support at Microsoft should be improved as well. It is not really helping a product to be noticed in the market if they allow support to just remain at the market standard. So I think it should be improved for clients who do not choose to pay for premium services.
What I would like to see in upcoming releases is improvement in the machine learning and the AI to make it much easier for people to jumpstart their efforts. The foundation for this is probably already there the data platform, but making people able to do machine learning solutions and artificial intelligence very fast would help them have success and become more involved to learn about the technology. So, I would say to make that learning experience as short as possible and provide useful examples. Then that will help.
I have been using the Microsoft Analytics Platform System since 2008, so twelve years.
I have worked with Microsoft products before as an engineer for data platforms, so I do not see many issues with stability. For people who do not have that much knowledge about technology and architecture, I think performance something they might have problems with if they do not design and configure the product properly.
In my case, the scalability is okay because I know how to work with the architecture and the design. I do not think many people would know that. If someone is coming from a wider experience base and thinks that just because they have worked with other solutions this will work easily, they may end up building something not scalable. So the issue of scalability is not really dependent on the product but is rather is the fault of the design engineer and their knowledge.
I have definitely been in touch with Microsoft's technical support because I have worked for Microsoft before. Because of that, I have got a lot of experience with Microsoft support directly.
I have worked with Microsoft support in the capacity of premier services. When they provide services to premier customers they definitely need to serve at the highest standard possible. From the escalation standpoint, sometimes users find it very disappointing because it is difficult to get through the initial support level. But when it comes to customer satisfaction overall, I think their services are above average compared with other similar product providers. But, of course, customers need to pay a premium price to get that kind of attention in support in the first place.
In comparing the Microsoft and Oracle products I think the main difference comes down to ease-of-use. I think the Oracle product track and the architecture is designed for people with less depth-of-knowledge about the product. If you do not have knowledge about the Oracle products, generally the product can be maintained and useful because it is designed to work that way. But for Microsoft, if you do not have much knowledge to maintain the database and if you have a very high workload, you will end up having technology that is much more difficult to maintain.
I think Oracle's trade secret is really incorporating a lot of features inside that were designed for less maintenance and administrative attention. For example, Oracle has something called Materialize View. It is kind of like a local duplication of physical tables. In Microsoft, there is no feature like Materialize View. From a performance perspective, it definitely will have an advantage in performance using local data and fields. Inside Oracle, the way it displays the query results is also a performance advantage. But with Oracle, even if people lack knowledge about writing more complicated PL/SQL script, they will find it easier to use. With Microsoft, if you do not know about how to write a good script, then the experience will not be as easy or as good.
I think the ease-of-use is why Oracle is much more expensive than Microsoft Stack. But if you are going to be using SQL and scripts on a larger scale in Microsoft, you can end up with quite expensive investment anyway.
Microsoft needs to change the license structure in my opinion. This is because I think Oracle — when it comes to visualization — has an advantage in terms of the total cost of ownership. Microsoft does not have visualization between virtual SQL and physical SQL, so customers end up paying more if they have multiple visual sequel services.
Because I am so used to Microsoft technology, I do not find much complexity in the initial setup of these products. I think the setup for Microsoft Stack is quite straightforward. But if somebody does not have much knowledge about the technology and Microsoft, they might try to take more advanced steps. If their configuration is not designed properly, they will end up with a platform that is not able to scale according to their workload. I think that it is a common pitfall in Microsoft technology because people think it is easy because of its friendly interface, but without understanding the product you can not use it to its capability.
If you do not have considerable experience, it is better to install it with the help of a consultant or integrator. Otherwise, you need to have somebody on your team who is really good on the backend who has the technical knowledge to do it correctly rather than treat it as a simple solution.
Besides the standard licensing users have to pay additional fees for technical support. The default support I think is just the same as with other products and it has become industry standard to be average. But if you pay the additional premium price for the above-average standard of service, you do experience an enhanced support experience.
I have experience working with business intelligence solutions and data science platforms. The majority of that experience is in working with is Microsoft Azure Stack and Oracle. Really my experience is with the whole Microsoft Technology Stack. I tried to do some research to figure out what is the best tool that I can use to cater to both worlds of data warehousing. The reason for the research came about because of a potential opportunity with a customer that is at the stage of doing the initial build of its data warehouse. It is an initial build but at the same time, they want that solution to be able to drive them to the future of big data analytics.
So, while I have experience with Stack already and know what it can do, I was comparing newer products which I think are potentially the best to see which is the optimal solution because there are new solutions and technologies on the market. It would be to help achieve an end-to-end data warehouse that is best from data loading to extractions through transformation as well as the visualization using a product that still has strong prospects for future development.
My advice to people considering this solution is that as a user and administrator you need to know the internal workings of the product. We can downplay that software by simply saying that it is just a database engine like all the other ones without finding out the real capabilities. You need to know the capabilities in-depth to know what sets the product apart from other products and if the features are the features and capabilities that you need.
On a scale from one to ten where one is the worst and ten is the best, I would rate this product overall as a five. This is probably because there is still a lot of room for improvement, features that other products have that are missing, and a lot of open-source technology nowadays that are very good and people can use instead. I still think five says it is average compared to modern technologies and advancements.
We are integrating services to collect data and this solution is used to analyze data, prepare results, and then generate reports.
The most valuable feature is database connectivity. This solution will connect to any database, you can combine databases, and you can create a cube or tabular model. This includes, for example, a relational model.
The flexibility of this solution needs to be improved because you cannot make changes at every one of the different steps.
I have been using this solution for five years.
This is a stable solution.
We have approximately five hundred users and I expect our usage to increase in two years' time.
The initial setup of this solution is straightforward and we had no issues.
We performed the implementation ourselves.
One needs to continually practice with this solution to keep improving. Every day, there are new challenges.
I would rate this solution an eight out of ten.
We are VAR/integration/development partner for our customers. Multi-dimensional analysis of financial and services data for financial institutions (including banks), telecom, and healthcare. For some financial institutions, we do some mining and machine learning scenarios.
We also combine cube manipulations on mobile devices, like tablet and smartphone, using Power BI.
We have worked with Microsoft Analysis Services from SQL Server 7.0 and with the Analytics Platform System from beginning.
We help customers in many ways from customized analysis for detection of anomalies in tax, operations, customer relationship, and marketing campaigns, etc. We also use mining and ML to helping them discover trends, which provides useful information based on their business.
It is closely integrated with other products in the MS portfolio.
Hybrid environments are complex to manage. We need to support customers frequently, even when they have done many training classes to ensure technology transfer.
Originally published at https://www.linkedin.com/pulse/microsoft-analytics...
In April 2014, Microsoft announced the future vision for their data
and analytics platforms. Microsoft Parallel Data Warehouse (PDW) was
rebranded as Microsoft Analytics Platform System (APS) with additional
appliance component offerings. APS combines MPP SQL Server data
warehouse with HDInsight, Microsoft’s 100% Apache Hadoop component
directly into the appliance. APS is a big data analytics appliance
capable of analyzing data of any type, structured or unstructured, and
of any size.
Microsoft ASP integrates data from SQL Server PDW with unstructured
big data from Hadoop through the PolyBase data querying technology.
Polybase gives APS a huge advantage over the competition because it you
to talk with big data in the regular T-SQL language you already use and
understand.
With APS, NoSQL doesn’t replace relational databases. Structured and
unstructured data technologies complement one another, and queries can
be executed across both universes.
By integrating Hadoop into the same rack as the relational data
warehouse, organizations can save on consulting, development and
configuration costs for Hadoop with an integrated appliance.
Why Microsoft APS is an EDW Game Changer
While ASP provides many innovations and improvements, four stand out
as strategic game changes CIOs should consider when evaluating data
warehouse and analytics strategies. SQL Server PDW, HDInsight, Polybase
and xVelocity Columnstores as a Platform for Data Mining and Analysis.
By combining both Microsoft Parallel Data Warehouse (PDW) and
unstructured Hadoop analytical capabilities in a single, easy-to-manage
EDW appliance, Microsoft APS is well positioned to help organizations
use information to enhance their competitive position.
Microsoft Parallel Data Warehouse (PDW)
Microsoft SQL Server Parallel Data Warehouse (PDW) and xVelocity Columnstores are covered in my article Microsoft Parallel Data Warehouse (PDW).
HDInsight & Hortonworks
HDInsight
is Microsoft’s 100% Apache Hadoop distribution based on Hortonworks
Data Platform. HDInsight is the phoenix that emerged from the ashes of
Dryad. Dryad was Microsoft’s own proprietary and competing version of
Hadoop that Microsoft tinkered with for 5 years before abandoning it.
PolyBase & Big Data Hadoop Integration
It’s not enough to store data in Hadoop. Businesses today need to
figure out how they can analyze Hadoop data fast and seamlessly in order
to make more informed business decisions. Unstructured and high volume
data are the two fastest growing types of enterprise data.
Organizations are using Apache Hadoop to store process non-relational
data from sources like blogs, clickstream data that is generated at a
rapid rate, social sentiment data with different schemas customer
feedback, sensor data, or telemetry data feeds. Most of this data is
not suitable for relational database management systems and often ends
up isolated from business users because it is not integrated with data
in the traditional data warehouse.
Technologies, like Hadoop are generally used, but implementing Hadoop
with traditional data warehouse and business intelligence
platforms pose new challenges. Hadoop is both open source, Java-based
and manages non-relational data across many nodes. It’s easy to add data
to Hadoop, but not so quick to extract and analyze it. The idea is
that, if the data is there, it may take a while to retrieve it, but at
least the data is stored somewhere in the system. MapReduce doesn't
have to be implemented in Java, however.
Big Data is not only about figuring out how to store, manage, and
analyze data from non-relational sources, but also about mashing
together various non-relational data with an organization’s relational
data to gain business insight. See my articles on Big Data
Data Lakes & Don’t Drown in the Data and 360 Degree View & Unifying Enterprise Data in a Sea.
PolyBase is the Microsoft APS query tool that enables you to easily
query PDW and HDInsight data using T-SQL, without investing in
Hadoop-based skills or training.
Microsoft PolyBase is a fundamental breakthrough on the data
processing engine which enables integrated query across Hadoop and
relational data. PolyBase opens up a whole new world of data analysis
and integration possibilities. This integration allows organizations to
merge large volumes of non-relational data stored within Hadoop with
their traditional enterprise data. Customers can continue to use their
existing analytics tool set to analyze their organization’s big data.
Without manual intervention, PolyBase Query Processor can accept a
standard SQL query and join tables from a relational source with tables
from a Hadoop source to return a combined result seamlessly to the user.
Queries that run too slow in Hadoop can now run quickly in PDW, data
mining queries can combine Hadoop and PDW data, Hadoop data can be
stored as relational data in PDW, and query results can be stored back
to Hadoop.
By using the power of Microsoft APS to run queries on Hadoop data in
HDInsight, it is now possible to do more in-depth data mining,
reporting, and analysis without acquiring the skills to run MapReduce
queries in Hadoop. PolyBase gives you the flexibility to structure the
Hadoop data you need, when you need it, as it’s brought into PDW for
fast analysis. You can seamlessly select from both Hadoop data in
HDInsight and PDW data in the same query, and join data from both data
sources. To satisfy a query, PolyBase transfers data quickly and
directly between PDW’s Compute Nodes and Hadoop’s Data Nodes.
APS uses external tables to point to data stored in text files on a
Hadoop HDFS cluster. Once an external table is created, the table can be
used in a select statement in the same manner as a PDW table. PolyBase
uses a single Transact-SQL query interface to leverage PDW and Hadoop,
so you don’t need to learn a host of new skills to run MapReduce queries
in Hadoop. PolyBase hides all the complexity of using Hadoop so most
business users do not need to know anything about Hadoop.
PolyBase uses ‘predicate pushdown’ to Hadoop that generates
map-reduce jobs behind the scenes to do the work on the Hadoop side
instead of distributed query data movement when necessary.
With PolyBase, organizations can take advantage of flexible hybrid
Hadoop solutions and query across Hortonworks, Cloudera, and even into
the cloud with Microsoft Azure HDInsight. PolyBase is only available in
Microsoft APS and is not available in SQL Server SMP at this time.
Integration with Business Intelligence Tools
APS has deep integration with Microsoft’s BI tools and other leading
non-Microsoft tools, making it simple to use the BI tools you are
familiar with to perform analysis. APS’s deep integration with Business
Intelligence (BI) tools makes APS a comprehensive platform for building
end-to-end data mining and analysis solutions. APS integrates with the
Microsoft BI Stack including Reporting Services, Analysis Services,
PowerPivot for Excel, and PowerView. But, APS also integrates with a
growing list of leading non-Microsoft BI platforms, such as Business
Objects, Cognos, SAP Data Integrator, Tableau, MicroStrategy, QlikView,
Oracle Business Intelligence, and TIBCO Spotfire.
Easy to Use & Manage
APS is designed for simplicity. The complexity is already engineered
into the appliance so that you don’t have to handle the details. The
appliance arrives with the hardware and software already configured and
installed. PDW handles all the plug and play details of distributing
the data across the appliance nodes, performs all the extra steps
required to process queries in parallel, and manages the low-level
hardware and software configuration settings. No tuning is required
because the appliance is already built and tuned to balance CPU, memory,
I/O, storage, network, and other resources.
Minimal Learning Curve
EPS has a minimal learning curve. There’s no need to
hire new talent in order to move from SQL Server SMP to SQL Server PDW
and EPS. DBAs who already know T-SQL can easily transfer their SQL
Server SMP knowledge to PDW. Some T-SQL query statements are added or
extended to accommodate the MPP architecture. There’s less DBA
maintenance. You don’t need to create indexes besides a clustered
columnstore index. DBAs can spend more of their time as architects and
not baby sitters. In my opinion, the alignment of APS with existing IT
skills may be its
biggest competitive advantage.
The appliance model is key to getting great performance. Tuning a
large database using traditional approaches is extremely difficult and
requires highly skilled DBAs. One of the main problems with the SMP
model, is the difficulty of understanding and tuning the interface
between the DBMS software and the underlying OS and hardware platform.
With SMP, there are a plethora of tuning parameters and options for the
DBA and OS administrator to setup. In the appliance model, the entire
software and hardware stack from SQL to storage is automatically
controlled. As a result, virtually all the complexity is removed.
Manageable Costs
Microsoft APS has manageable costs. APS has lower price/terabyte over
other companies by a significant margin. About 2x lower than Teradata,
Oracle, Greenplum and others. It’s worth noting that Microsoft’s
offering is cheaper than the competition not because of lower quality or
missing capabilities, but because of a different business strategy.
The strategy of commoditizing markets and then selling higher volumes to
make up for lower margins. Given that SQL Server is one of the most
popular enterprise databases on the planet, and APS falls under the SQL
umbrella, it has enough of a relative advantage that it could easily
become the biggest Big Data appliance player of all.
Microsoft APS & Hub and Spoke Architecture
See my article Microsoft APS & Hub and Spoke Architecture about using Microsoft APS to Build a Hub and Spoke EDW Architecture.
These views are my own and may not necessarily reflect those of my current or previous employers.
Originally published at www.linkedin.com/pulse/microsoft-aps-hub-spoke-architecture-stephen-c-folkerts
Using Microsoft APS to Build a Hub and Spoke EDW Architecture
Scalable, cost-effective Enterprise Data Warehouse (EDW)
implementation is an elusive goal for many large organizations. Three
common approaches include centralized EDW or the ‘monolithic’ approach.
Decentralized collections of data marts, and attempts at Hub and Spoke
architectures that combine the two. Microsoft APS combines the best
features of both EDW and decentralized data marts.
Centralized EDW
The monolithic approach has become more common in large EDW
installations. However, centralized EDW tends to be extremely expensive
and very inflexible. As a result, business units become frustrated at
the inability of the EDW to meet their needs within a reasonable cost
and timeframe.
Decentralized EDW & Data Marts
A divide-and-conquer strategy is a natural and effective approach to
addressing large-scale problems such as creating an integrated,
enterprise-wide data warehouse. Decentralized EDW architectures align
well with this approach and fit the many compartmentalized demands
presented by large organizations. Decentralized data marts are more
responsive to business unit needs, but often result in many versions of
the same data that are very difficult to keep consistent across the
enterprise. Each approach, centralized or decentralized, tends to
evolve or degenerate into the other, but neither is a tenable long-term
solution for most large organizations.
Microsoft APS & Hub and Spoke Architecture
A Microsoft APS appliance enables a true ‘hub and spoke’
architecture, where the centrally managed ‘hub’ contains detailed
enterprise data, and departments or business units use ‘spokes’ to
exchange data with the hub according to their unique schemas. This
architecture can exist across many connected SQL Server MPP databases.
Microsoft expands the Hub and Spoke concept to include not only MPP
appliances but also standard symmetric multi-processing (SMP) instances
of SQL Server and SQL Server Analysis Services, (SSAS), allowing either
to be viewed as nodes within a grid. The result is a highly flexible,
affordable, and scalable platform that makes large-scale Hub and Spoke
EDW architectures a practical reality. They combine the benefits of
central control and governance with the agility of decentralized data
marts, but without the inherent delivery pains, headaches, risks, and
costs associated with previous strategies.
Hub and Spoke Architectures
Hub and Spoke architectures match the structure of most large
enterprises by offering a combination of a centralized EDW and a set of
dependent data marts. The EDW hub allows the entire enterprise to set
and enforce common standards while answering questions that cut across
business units. The data mart spokes allow business units to meet their
own needs quickly and at relatively low cost while still conforming to
the needs of the overall enterprise.
The Hub and Spoke architecture allows business units to set their own
budgets and priorities, while contributing as necessary to the central
EDW. This close fit between the architecture of the business and the
architecture of the DW platform means Hub and Spoke systems are widely
regarded as the best overall approach. In practice, Hub and Spoke
systems have been notoriously difficult to implement.
Distributing data from a centralized EDW reliably and quickly enough
to meet the needs of the business units is a big challenge in the face
of growing data volumes. To try to compensate for this, complex and
cumbersome ETL processes are developed to transfer data, between the hub
and spokes, resulting in high maintenance costs and an inability to
change with the business. In general, efforts to build a Hub and Spoke
architecture have quickly degenerated into a set of siloed data marts
after being torn apart by conflicting business units and requirements.
One response to the difficulties of building a Hub and Spoke
architecture has been to simply centralize everything onto one
monolithic EDW. A centralized EDW platform quickly becomes overloaded
with conflicting use cases. Solving any one problem requires evaluation
of all existing dependencies, which drives rigid change control
processes and ultimately impacts cost and time-to-delivery for
projects. And if virtualized data marts are used, all the queries and
I/O, execute physically in the hub. Business units become frustrated by
the inability of IT to quickly meet new requirements with the central
EDW and start building their own independent physical data marts as a
result.
Decentralized EDW
With a decentralized approach, which the other two tend to degenerate
into anyway, business units simply build their own independent data
marts. Although such an approach is obviously responsive to business
needs, it doesn’t allow management to answer cross-enterprise questions
easily or quickly. Keeping all copies of data across a decentralized
infrastructure current and accurate can become overwhelming. The
problem becomes worse as relatively low bandwidth data movement options
drive complex data transformations that scale poorly. And it’s very
difficult to apply any real measure of enterprise-wide standards,
controls or regulatory compliance.
Microsoft’s EDW Platform
A Microsoft APS appliance can be viewed as a highly-specialized grid
of servers being pulled together to collectively form an EDW platform.
Taking this view, it is a small step to think of PDW as both a grid of
appliances and a grid of nodes. Moving data across this grid of
appliances is incredibly efficient, since data can be moved directly
from node to node within the grid. This maximizes parallelism across the
environment and minimizes the conversion overhead associated with
export and load operations. Such a grid of appliances can be used to
implement a data warehousing Hub and Spoke architecture.
Microsoft expands the Hub and Spoke solution to include not only MPP
appliances but also standard SMP instances of SQL Server and SSAS to be
viewed as nodes within a grid. A grid of SMP databases and MPP
appliances can be used as the basis for any large-scale data warehouse
environment or architecture. However, it is particularly suitable for a
Hub and Spoke architecture.
MPP for Hub and Spoke
Microsoft PDW with high speed parallel database copy is fundamental
to solving one of the most intractable problems in large-scale data
warehousing. Building an effective, scalable, and affordable Hub and
Spoke solution. The basic idea is to take a divide-and-conquer approach
to building an EDW. This avoids performance problems due to conflicts
between queries from different business units. Provides a dedicated,
high-speed, network interconnecting all hub and spoke databases. And
business analysts view the appliance as a set of separate data marts,
but can drill into detailed data on the hub where required.
The Microsoft Hub and Spoke Solution
Imagine a fairly large MPP appliance acting as the hub for a set of
MPP appliance and SMP database data marts. The hub holds detailed data,
probably in a normalized schema, for a number of business units or the
entire enterprise. The hub is loaded in near real time or in daily
batches from source systems leveraging a preferred ETL solution. Data is
then transformed or restructured to a denormalized structure (star,
cube, etc.), as needed, and transferred to the appropriate data mart(s)
via the high speed grid for consumption by end users. If a data mart
requires data from sources that are not covered by the hub, this data is
loaded independently using standard ETL tools. However, most of the
data required (both fact and dimensions) comes from the hub.
Users connect to the independent data mart appliances as usual for
running queries. This allows each data mart to be tuned for the needs of
a particular set of users and sized to handle the required level of
performance and concurrency. While the data marts can be independently
designed to meet the needs of each business, it will be possible to
leverage existing data mart applications such as Microsoft Analysis
Services, Reporting Services, Excel, or other BI vendor products.
Bandwidth within the grid is large enough to enable the direct copy
of detailed fact data or entire data marts. This can greatly simplify
the data mart creation and update process by using a publish-subscribe
model as opposed to complex transformation logic that, coupled with
expensive export and load scenarios, creates significant challenges for
traditional federated approaches. The end result is an EDW platform
that can handle a very complex workload while being extremely scalable
at a sensible cost.
Disaster Recovery and High Availability
The Microsoft EDW platform provides the capability to set alternate
database systems within the dedicated high speed network as failover
targets. As an example, a user attempting to connect to a spoke that is
currently unavailable would automatically be redirected to an alternate
spoke specified within the standard connection protocol. This simple
approach becomes very powerful when combined with the Hub and Spoke
architecture. The high-speed and bandwidth of the grid copy facility
allows full copies of end-user data marts to be moved to multiple
spokes. This effectively recreates the end-user view of the data on
multiple spoke systems, each a valid failover option for the other in an
outage scenario.
This concept can also be leveraged across multiple data centers to
provide an effective disaster recovery architecture. Individual
appliances can be replicated on a second site and automatically kept
up-to-date. Note that not all of the appliances on a grid would need to
be replicated. In most scenarios only the hubs need to be replicated,
as spokes can be recreated from the hubs. This provides the flexibility
for each business unit to decide whether or not to provide a disaster
recovery capability, based on their own service-level agreements (SLAs).
Microsoft’s Grid-Enablement Strategy
This approach offers customers an attractive alternative to
centralized, monolithic approaches. Data marts can be tailored to meet
the individual needs of business units (both in terms of capacity and
performance). Furthermore, customers can buy into the Microsoft EDW
approach with the deployment of a few stand-alone data marts on standard
SQL Server SMP reference architectures. From this relatively low-cost
start point, you can scale into the hundreds of terabytes while
delivering manageable flexibility without sacrificing cost and
performance.
Microsoft Analytics Platform System (APS)
See my article Microsoft Analytics Platform System (APS) for a more in-depth look at Microsoft APS.
These views are my own and may not necessarily reflect those of my current or previous employers.