March 31, 2017

Data Integration from Different Perspectives

21 minutes

Our guest on the podcast this week is James Markarian, CTO at SnapLogic.

Data integration is going through a renaissance at the moment, just as application integration did a few years ago, and the primary driver of these changes is the movement to the cloud. If you turn back the clock, application integration used to be simple: there were a few ERP systems. As things started moving to the cloud via SaaS, application integration changed quite a bit. Now we're seeing analytic applications move to the cloud as well. This mass movement to the cloud is the biggest force of gravity, and it is also driving data integration. By definition, we're now replicating platforms as much as we are moving into them. Enterprises moving to the cloud need a parallel data and application integration strategy to make use of their data as an asset and to make sure the overall plan will work.

Data is frustrating to deal with. The problems seem easy, but the reality is hard. Normalizing data, merging it so you can get meaningful results out of it, and making sure your SLAs for freshness and quality are met are all easy to describe but frustratingly hard to do. With on-premises systems, there has always been a goal to democratize data, and the challenge has been access. Providing access sounds easy, and the cloud inherently breaks down this democratization barrier: there is now a public place where people can go to get access to data. That centrality is driving many of the innovative capabilities we are seeing emerge.

Going from legacy applications to the cloud requires a different approach to application and data integration. One starting point is picking the right cloud platform, which can be the toughest decision a customer faces. Within each cloud platform there are also myriad choices, from which query technology to use to whether to run Hadoop implementations on Azure, and more. There are so many choices that it's easy to feel like you've made the wrong one.

In enterprise IT, there has been a shift in the view of how data should be managed, accessed, and manipulated. In the early days, it was a developer's job: IT was the team that knew how to access data and application systems. Self-service pushes some of that responsibility out to the edges. It's not about moving away from IT; it's about empowering those who have the domain knowledge and will eventually use the data. How we access these systems is different now, and so is the structure of the data. It used to be row- and column-oriented. Nowadays, with REST and JSON, tools need to handle these formats natively rather than create new barriers, so you are not manhandling the tool into dealing with data in unnatural ways. That makes developers' lives easier and makes it easier for vendors to add connections to the data.

Databases used to be relatively simple. These days there are often special-purpose databases, such as in-memory, object-based, hierarchical, and relational ones. In some ways this makes things more complex, but with layers of abstraction it can also make things easier. When we only had relational databases, you had to make data look relational whether it was meant to be or not. Now you do not need to force different data sets into a single shape.
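To illustrate what "making data look relational" can mean in practice, here is a minimal Python sketch. It is not SnapLogic code; the order record and the `flatten_order` helper are hypothetical, chosen only to show a nested JSON document of the kind a REST API might return being forced into flat, row-oriented records.

```python
import json

# Hypothetical order, shaped like a nested JSON document a REST API might return.
order_json = """
{
  "order_id": 1001,
  "customer": {"name": "Acme Corp", "region": "EMEA"},
  "items": [
    {"sku": "A-1", "qty": 2},
    {"sku": "B-7", "qty": 5}
  ]
}
"""


def flatten_order(doc):
    """Flatten a nested order into one flat row per line item (hypothetical helper).

    Parent fields (order_id, customer name and region) are copied onto every
    row, which is the duplication a relational-only model forces on nested data.
    """
    rows = []
    for item in doc["items"]:
        rows.append({
            "order_id": doc["order_id"],
            "customer_name": doc["customer"]["name"],
            "customer_region": doc["customer"]["region"],
            "sku": item["sku"],
            "qty": item["qty"],
        })
    return rows


order = json.loads(order_json)
rows = flatten_order(order)
for row in rows:
    print(row)
```

A document-oriented store could keep `order` in its original nested form, which is the kind of flexibility the point above is getting at.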

SnapLogic is beginning to leverage AI and machine learning. Its focus is on making computers work for you instead of making you work for computers.
