Ingest, upload, analyze: AWS wraps up data in the cloud
At Re:Invent, Amazon Web Services offers new options for all phases of data in the cloud
Uploading data, ingesting data, getting insights from data — we typically associate all three capabilities with cloud workloads. Amazon’s Wednesday keynote announcements at Re:Invent unveiled new services for doing all of the above — with creative wrinkles all around.
Why sync data over the wire to Amazon, for instance, when you can instead mail it? And given how much data gets socked away in Amazon for analysis, how about a tool aimed at business folks, not IT personnel, for getting value from that data?
AWS Import/Export Snowball
“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway,” computer scientist Andrew Tanenbaum is reputed to have said. Amazon’s new appliance for migrating data to the cloud takes that notion to heart.
AWS Import/Export Snowball is a refined version of a service Amazon started offering back in 2009, where the user loaded data onto a device of their choosing and shipped it to Amazon with a manifest file. With Snowball, Amazon automates the process by providing the hardware to be loaded and streamlining the round-trip process.
The Snowball appliance is a ruggedized, tamper-proof, network-connected disk array, outfitted with a 10Gb network port. The user fills it with up to 50TB of data at once, then ships it back to Amazon to have the data dumped into an S3 bucket of their choosing. Each job costs $200, with a daily penalty of $15 imposed for taking longer than the allotted 10 days to fill the device and ship it back.
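To put the station-wagon argument in numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (50TB per device, $200 per job, $15 per day past the 10-day window); the link speeds are illustrative assumptions, the function names are made up for this example, and the math ignores protocol overhead and shipping time.

```python
# Back-of-the-envelope numbers only. Link speeds are illustrative assumptions,
# and protocol overhead and shipping time are ignored.

TB = 10**12  # decimal terabyte, in bytes


def network_transfer_days(data_bytes: float, link_mbps: float) -> float:
    """Days needed to push data_bytes over a fully used link of link_mbps megabits/second."""
    bits = data_bytes * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 86_400


def snowball_job_cost(days_held: int, job_fee: float = 200.0,
                      free_days: int = 10, daily_penalty: float = 15.0) -> float:
    """Job fee plus the per-day penalty for each day beyond the allotted window."""
    extra_days = max(0, days_held - free_days)
    return job_fee + extra_days * daily_penalty


if __name__ == "__main__":
    for mbps in (100, 1_000, 10_000):
        days = network_transfer_days(50 * TB, mbps)
        print(f"50 TB over {mbps} Mbps: {days:.1f} days")
    print(f"Snowball job held 14 days: ${snowball_job_cost(14):.2f}")
```

By this estimate, a saturated 100Mbps line would need roughly 46 days to move what one Snowball carries, which is the gap the appliance is meant to close.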
Snowball’s appeal is meant to go beyond convenience, since many of its current and possible future features are aimed at assuring a customer its data won’t end up in the wild blue yonder. Not only is the data encrypted at rest on the device, Amazon can alert the customer whenever a Snowball job hits specific milestones: “in transit to customer,” “in transit to AWS,” “importing,” and so on. The e-ink status monitor on the front of the box even doubles as a shipping label, and Amazon mentioned the possibility of “other enhancements including continuous, GPS-powered chain-of-custody tracking.”
Amazon Database Migration Service
For those who are comfortable shuttling structured data over the wire incrementally into Amazon’s data centers, the company whipped the drapes off a similar item: Amazon Database Migration Service.
Users of Oracle, MySQL, or Microsoft SQL Server can either replicate data from their data center to the same database in AWS or convert it on the fly to a different one (for example, Oracle to MySQL). An included schema conversion tool is meant to ensure that the translated data won’t get mangled during the move, and Amazon claims it can suggest parallel ways to implement features that might not be available on the target platform.
Pricing is calculated by instance-hour for a virtual machine that runs the migration service (starting at 1.8 cents per hour), but data transfers to a database in the same availability zone as the Migration Service instance cost nothing.
Who’s the target audience? Most likely those looking to migrate to Amazon, but on their own terms and their own time. By setting up a path where data is passively replicated in the background, alongside existing business operations, they aren’t stuck in an all-at-once-or-nothing migration to AWS.
Amazon Kinesis Firehose
Amazon’s Kinesis was created to allow AWS customers to capture and work with live data, no matter the source. Its newest wrinkle, Amazon Kinesis Firehose, doesn’t expand on that idea. In fact, it cuts it down.
As the name implies, Firehose is little more than a connector that allows streaming data to be written into S3 or Redshift as it arrives. The only (optional) processing done on the stream in Firehose is compression or encryption, and the options set by users are limited to things like buffer size and the interval before data is delivered to its target bucket.
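The buffer-size and interval settings are easier to picture with a toy model. The Python sketch below is not the Firehose API; it is a hypothetical class, written on the assumption that a batch is delivered as soon as either limit is reached, whichever comes first. All names in it are invented for illustration.

```python
import time
from typing import Callable, List, Optional


class BufferedDelivery:
    """Toy model of size-or-interval buffering. Not the Firehose API."""

    def __init__(self, max_bytes: int, max_seconds: float,
                 deliver: Callable[[bytes], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_bytes = max_bytes        # flush once this many bytes are buffered
        self.max_seconds = max_seconds    # ...or once the oldest record is this old
        self.deliver = deliver            # stand-in for "write to the target bucket"
        self.clock = clock                # injectable so the timing can be tested
        self._chunks: List[bytes] = []
        self._size = 0
        self._started: Optional[float] = None

    def put(self, record: bytes) -> None:
        """Buffer one record, then flush if either limit has been hit."""
        if self._started is None:
            self._started = self.clock()
        self._chunks.append(record)
        self._size += len(record)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Deliver the batch if it is big enough or old enough.

        A real service would check the clock on a timer; this sketch only
        checks when put() or flush_if_due() is called.
        """
        if not self._chunks:
            return
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._started >= self.max_seconds
        if too_big or too_old:
            self.deliver(b"".join(self._chunks))
            self._chunks, self._size, self._started = [], 0, None
```

A larger buffer or longer interval means fewer, bigger objects in the target bucket; smaller values mean fresher data at the cost of more small writes.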
What’s interesting about Firehose is that it allows for data gathering and processing to be decoupled from each other. A user could, for instance, hitch an AWS Lambda job to trigger whenever Firehose data arrives in its target S3 bucket. That way, work can be done entirely on-demand as data arrives, using only as much code as is needed (per the Lambda processing model).
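As a rough illustration of that pattern, a Python Lambda function triggered by new objects in the bucket might look like the sketch below. The event layout follows the S3 notification format (a list of Records, each carrying a bucket name and a URL-encoded object key); process_object is a hypothetical placeholder for whatever work the job actually does, and the bucket and key names in the usage note are invented.

```python
from urllib.parse import unquote_plus


def objects_in_event(event: dict) -> list:
    """Return (bucket, key) pairs for every S3 object named in the event."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue  # skip anything that is not an S3 notification record
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])  # keys arrive URL-encoded
        pairs.append((bucket, key))
    return pairs


def process_object(bucket: str, key: str) -> str:
    """Hypothetical placeholder: the real job would fetch and process the object."""
    return f"processed s3://{bucket}/{key}"


def handler(event, context):
    """Entry point using the (event, context) signature Lambda expects for Python."""
    return [process_object(bucket, key) for bucket, key in objects_in_event(event)]
```

Called with an event naming the key "2015/10/07/part+one%3D1.gz" in a bucket called "firehose-target", the handler would decode the key and hand "2015/10/07/part one=1.gz" to process_object. Nothing runs, and nothing is billed, until Firehose actually drops data into the bucket.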
Amazon QuickSight
Data in the cloud isn’t much good on its own. Amazon has never lacked options for collecting and storing data at scale, but it now also offers users a way to derive visualizations and insights from the bits they’ve amassed, through a service hosted right on Amazon.
Amazon’s new business intelligence product QuickSight connects to Amazon’s forest of existing databases (RDS, DynamoDB, ElastiCache, Redshift) and analytics systems (EMR, Data Pipeline, Elasticsearch Service, Kinesis, Machine Learning). Once a given data source is connected, the user is presented with a UI akin to a simplified version of products like Tableau, with recommendations on what kinds of visualizations might be most appropriate for the selected data set.
Amazon’s cloud products are notoriously opaque for users, but the interface for QuickSight seems straightforward and uncluttered; after all, it’s meant for the business side of an enterprise rather than IT. Another nod to business users is a promised feature where data harvested through QuickSight can be accessed via a SQL-like command language, so partner products (right now, Domo, Qlik, Tableau, and Tibco) can eventually make use of QuickSight’s in-memory processing. That said, there must be a better way to hitch Excel up to QuickSight, or Amazon will miss out on taking advantage of the single biggest self-service data tool in use in enterprises.
In another appeal to business users, QuickSight will cost $12 per user per month, or $9 per user per month when paid a year at a time. Up to 10GB of data taken into QuickSight from other systems can be stored for free. However, it’ll be a while before Amazon customers can judge if this is an improvement over legacy BI: QuickSight isn’t scheduled to launch officially until “early 2016.”