In my previous question on this portal, I had asked about some insight about syncing data between SQL Server and key-value based data repositories.
In lieu of the same problem (one way real-time synchronization from SQL to HBase or any other database), I need to take care of some performance and latency considerations and did not find a very foolproof way of doing it.
We have multiple SQL 2008 data shards where data is updated from various sources and processed by many processes at the same time (and UI reads from the same shards).
The goal is to get all updates in selected tables at any point of time and transport them to other data source in almost real-time fashion.
Amount of change in SQL shards will stays in 100-500 MB range (if we keep the frequency of 1 min). We do not want to make major changes in SQL servers as we would discard it after migrating complete system.
Unfortunately, our data access layer is messed up in various layers. Otherwise, it would have been the best way to fork the updates in two directions.
Triggers will slow down the shards and leave them in an unresponsive state.
Not sure if SQL Server 2008 has something similar to SQL Server 2005 Notification services and how effective that would be.
Any other innovative solution would greatly helpful.
Here my problem is not about transforming the data from relational to Key-value form (it's fairly easy), but how to get the SQL Server updates in real-time (can afford the latency of 1-2 minutes) without affecting the user experience.
have you looked at SQL Service Broker? here's a link with some info on it: http://blogs.msdn.com/b/sql_service_broker/archive/2008/07/09/real-time-data-integration-with-service-broker-and-other-sql-techniques.aspx
One option you might want to look into is SQL Server Integrated Change Tracking (part of SQL2008 or higher). This is an incredibly efficient way of finding the changes that have occurred in your SQL Server database (including deletes), has very little impact on your SQLDB, does not require triggers and provides a good way of allowing you to then move the data changes to Hadoop.
Full disclosure, I work on Cotega and this data sync is something we are making a big focus on. I am happy to help more if this is a direction you are interested in taking.