The requirements for modern business intelligence and analytics systems shouldn't be underestimated. Systems not only have to be robust, fast and reliable, they also need to respond quickly to new requests and changing requirements.
The cloud, with its unparalleled scale and flexibility, has swept away the Hadoop landscape in almost no time. Running large, distributed batch or streaming data processes will never be trivial, but it is a lot easier with cloud services that take care of much of the infrastructure heavy lifting for you. However, you don't need to process petabytes of data to be successful with cloud analytics: the same advantages apply to projects with more modest data volumes.
With the services offered by your cloud vendor and the pay-per-use pricing model, experimenting with technology becomes a lot easier than it would be on-premise. Running a machine learning project in an on-premise scenario would require you to set up the required (virtual) hardware and install and configure your machine learning environment. That alone can be a daunting task; in the cloud, it only takes enabling and trying out the services offered by your cloud provider. The problems to solve won't change, but the ease of use may increase dramatically.
Cloud vs On-Premise Projects
Through experience, we've come to take a number of differences between cloud and on-premise projects into account:
- Plan: moving to the cloud is easy. Moving out of the cloud or to another cloud provider won't be. Investigate which cloud provider works best for you, and what your cloud migration road map, timeline and budget will be.
- Architecture: even more so than for on-premise projects, architecture is key. You'll want an architecture that is future-proof, yet flexible enough to swap components in and out as your project evolves. A key decision is which components will use your cloud provider's services and which you'll install and configure yourself. Using your cloud vendor's services gets you up and running quickly but increases your lock-in; self-managed services reduce the lock-in but take more time and effort to manage.
- DevOps: to harness the flexibility and scalability of the cloud, you'll have to go all in. Manually maintaining your infrastructure won't work. Once you have a basic architecture in place, start scripting that environment. There are lots of platforms (Ansible, Terraform, AWS CloudFormation, GCP Cloud Deployment Manager, to name just a few); find out what works best for you.
- Experiment: all cloud platforms release new services and update existing ones continuously. Experiment, try things out, run PoCs to find out which services you can benefit from. There is a vast number of services for streaming data processing, relational and NoSQL databases, storage, etc., all of which can be enabled and tested, often with just a couple of clicks.
- Monitor: as your infrastructure and cloud environments start to grow, it's easy to lose track. Start monitoring your environment (and cost!) as early as possible, and let your monitoring environment grow with your cloud deployments.
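To make the DevOps point above concrete, here is a minimal Terraform sketch that provisions a storage bucket for a data lake. The provider, region, bucket name and tags are all hypothetical placeholders, not taken from the text; treat this as an illustration of scripting your environment rather than a production-ready configuration.

```hcl
# Minimal sketch: describe an S3 bucket for raw data-lake storage as code.
# Region, bucket name and tags are illustrative; the bucket name must be
# globally unique in your account's setup.
provider "aws" {
  region = "eu-west-1"
}

resource "aws_s3_bucket" "data_lake_raw" {
  bucket = "example-datalake-raw"

  tags = {
    Project     = "analytics"
    Environment = "dev"
  }
}
```

Keeping definitions like this in version control means the environment can be recreated, modified or torn down reproducibly with `terraform apply` and `terraform destroy`, instead of being maintained by hand.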
The Big Three: AWS, Azure and GCP
The lion's share of the cloud market is taken by AWS (Amazon Web Services), Azure (Microsoft) and GCP (Google Cloud Platform). Which platform works best for you depends on many factors; there is no definitive answer, as a lot depends on your specific needs and preferences.
All three platforms have everything you need to successfully run your data warehouse, data lake and analytical projects. All three have invested heavily in a robust toolbox of AI and ML services.
The general advantages of running your projects in the cloud apply to business intelligence or analytics projects as well:
- scalable, flexible: you only use (and pay for) what you need. Quickly and/or temporarily expanding resources and experimenting with new technologies becomes a lot more achievable and affordable. As mentioned earlier, resource and cost monitoring is key to keep close control over your deployments.
- reliable: cloud providers guarantee service uptimes that are hard to achieve in your own data center. The roll-out, deployment and monitoring of your project-specific infrastructure and services can be fully automated.
- easy: a lot of mandatory but time-consuming tasks, ranging from installing security updates and building backup/restore procedures to installing and configuring entire clusters, can be taken care of by cloud services, freeing up your own time for tasks that add value.
- secure: cloud providers take care of encryption, protection against DDoS attacks, firewall management, etc. at a scale and on a level that makes even the largest companies and organizations move their data to the cloud.
How can we help?
Over the past few years, the vast majority of BI and analytics projects have moved to or started in the cloud. We have extensive experience in running BI and analytics projects on the three major cloud platforms: AWS, Azure and GCP.
Contact us if you'd like to learn more about how we can help you to:
- determine which platform works best for you
- build a cloud architecture and strategy
- build your data integration, business intelligence and analytics projects in the cloud