The popular Python-powered devops tool joins Red Hat, which plans to make it part of the larger workflow for hybrid clouds
Ansible has been acquired by Red Hat in a deal whose terms remain private, though VentureBeat claims $100 million changed hands. The purchase gives Red Hat a well-known, widely used devops tool for system configuration that it can fold into its larger devops workflow.
In a news release and an FAQ, Red Hat said it saw Ansible’s tool set as a strong and complementary match for its own product line. The acquisition includes the Ansible project, as well as the Ansible Tower commercial product outfitted with many enterprise-grade additions (such as role-based access control). Existing contracts for Tower customers, and the existing Tower development model, will remain in place.
Ansible’s chief draws are its simplicity and power, as InfoWorld’s Paul Venezia noted in his review. The project enjoys a strong community of developers, as cited in Ansible’s blog post about the acquisition.
Among the major devops automation frameworks, Ansible most closely resembles Salt/SaltStack, which can operate in both agent-based and agentless modes, while Ansible uses an agentless architecture. Both Salt and Ansible are also built on Python, leveraging the convenience, development speed, and breadth of libraries available in the language, along with all the modules already created for it.
Red Hat already has a sizable devops tool set, so will it swap those tools for Ansible and Ansible Tower? That seems unlikely, based on Red Hat’s positioning of its existing solutions, CloudForms and Satellite. CloudForms is mainly aimed at policy, orchestration, and governance of hybrid clouds, not automation; Satellite is for maintaining Red Hat servers and has been integrated with a competing automation system, Puppet.
Rather than replace existing solutions outright, Red Hat plans to adopt Ansible as automation middleware. Configuration requests supplied by CloudForms can be passed on to Ansible, which in turn can automate changes — for example, by deploying Satellite agents on the machines that need them.
Ansible might also edge out the use of Puppet in Satellite, depending on which of the two solutions has the bigger draw with Red Hat’s user base. Satellite itself is also Python-based, meaning Ansible could be a more complementary fit for it in the long run.

[An earlier version of this article incorrectly stated that Salt requires an agent for its operations and uses Ruby as its language.]
- Published in DevOps
Automation is good in many cases — but not all. Too many enterprises don’t make that assessment
We all know the trend: Use the cloud to automate security, governance, and management, and use devops tools and technology to automate the stream of software that flows from the coders to the cloud.
Automation is good. It frees us from mundane tasks, and it drives a repeatable process that eliminates human error. But enterprises are going a bit nuts with the concept. It’s clear to me that you can overautomate well past the point of diminishing returns.
When moving to the cloud and devops, here are the types of duties you want to automate:
- Any task that has a repeatable pattern, such as unit testing, proactive performance monitoring, and removing unused machine instances
- Any task that runs better without a person involved or that is easily automated
- Any task that is noncritical to the business; if it fails, you won’t be hurt too badly
However, here’s what you should perhaps not automate:
- A task that requires constant human intervention; if a person must constantly make a decision, automation brings little value
- A task that is not repeatable, which means it’s difficult or impossible to pre-program an automated response (sorry, machine learning is nowhere near that advanced)
- A task that is critical to the business — if it fails, you will be badly hurt
Cloud and tool providers tell IT that automation leads to productivity, which leads to efficiency and, in turn, a return on investment. Although the notion is generally true, it’s not consistently true. You must make the decision to automate, or not, on a case-by-case basis.
As you move to cloud and use devops as a path to leveraging the cloud more effectively, you might have too many items you can automate. Keep in mind that simply because a task can be automated doesn’t mean it should. Pick your automation opportunities, battles, risk, and — ultimately — ROI thoughtfully.
In the next few years, I figure many enterprises will find they overdid the automation because they could. Try not to be one of those enterprises.
Bruno Connelly, vice president of engineering at LinkedIn, describes how transforming operations gave rise to a new, hyperscale Internet platform
Bruno Connelly is not a fan of the term devops, mainly because it means different things to different people.
In certain startups, for example, devops simply means that developers shoulder tasks once performed by operations. But at LinkedIn, where as VP of engineering Connelly has led the company’s site reliability efforts for five and a half years, operations has expanded its role to become more vital than ever while providing developers with the self-service tools they need to be more productive.
You might call that devops done right. In fact, Connelly’s buildout of operations holds valuable lessons for any organization that needs to scale its Internet business. For LinkedIn, that growth has been dramatic: Over the past five years, the service has ballooned from around 80 million to nearly 400 million users — and from basic business social networking to a wide array of messaging, job seeking, and training services.
Throughout that expansion, Connelly has played a key role in creating new sets of best practices and infrastructure-related technologies. More importantly, he has helped lead a transformation of operations culture that has affected the entire company.
A shaky situation
When Connelly joined LinkedIn in 2010, both traffic and the brand were taking off — and LinkedIn.com was creaking under the load. “We struggled just keeping the site up. I spent my first six months, maybe a year, at LinkedIn being awake and on a keyboard with a bunch of folks during those periods trying to get portions, if not all, of the site back up.”
The team he inherited was great, he says, but there were only six or seven of them, as opposed to a couple of hundred software engineers writing code constantly. “I was hired at LinkedIn specifically to scale the product, to take us from one data center to multiple data centers, but also to lead the cultural transition of the operations team,” he says.
As with many enterprise dev shops today, developers had no access to production — nor even to nonproduction environments without chasing down ops first. “The cynical interpretation is that operations’ job was to keep developers from breaking production,” Connelly says. Essentially, new versions of the entire LinkedIn.com site were deployed every two weeks using a branch-based model. “People would try to get all their branches merged. We’d get as much together as we could. If you missed the train, you missed the train. You had to wait two weeks.”
Adding to the frustration were the site rollouts themselves, which Connelly remembers as “an eight-hour process. Everyone was on deck to get it out there.” At a certain point in that process, rollback was impossible, so problems needed to be fixed in production. At the same time, the site ops team had to maintain the nonproduction environment “just to keep that release train going, which is not a healthy thing.”
Change came from the top, driven by David Henke, LinkedIn’s then-head of operations, and Kevin Scott, who was brought in from Google in 2011 to run software engineering. Connelly reported to Henke and was charged with changing the role of operations.
The first priority across the company was to stop the bleeding and get everyone to agree that site reliability trumped everything else, including new product features.
Along with that imperative came a plan to make operations “engineering focused.” Instead of being stuck in a reactive, break-fix role, operations would take charge of building the automation, instrumentation, and monitoring necessary to create a hyperscale Internet platform.
Operations people would also need to be coders, which dramatically changed hiring practices. The language of choice was Python — for building everything from systems-level automation to a wide and varied array of homegrown monitoring and alerting tools. The title SRE (site reliability engineer) was created to reflect the new skillset.
Many of these new tools were created to enable self-service for developers. Today, not only can developers provision their own dev and test environments, but there’s also an automated process by which new applications or services can be nominated to the live site. Using the monitoring tools, developers can see how their code is performing in production — but they need to do their part, too. As Connelly puts it:
Monitoring is not something where you talk to operations and say: “Hey, please set up monitoring on X for me.” You should instrument the hell out of your code because you know your code better than anyone else. You should take that instrumentation, have a self-service platform with APIs around it where you can get data in and out, and set up your own visualization.
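Connelly’s point, that developers should instrument their own code and push the data into a self-service platform, can be illustrated with a minimal sketch. The decorator and in-process metrics store below are hypothetical, standing in for whatever API a real monitoring platform exposes:

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process metrics store; a real self-service platform would
# expose an API for shipping these measurements out for visualization.
metrics = defaultdict(list)

def instrument(fn):
    """Record the latency of every call to fn, keyed by function name."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            metrics[fn.__name__].append(time.perf_counter() - start)
    return wrapper

@instrument
def handle_request(user_id):
    # Placeholder application logic; the point is that the developer who
    # wrote this function is also the one who instrumented it.
    return {"user": user_id}
```

Because the developer owns both the code and its instrumentation, adding a new measurement is a one-line change rather than a ticket filed with operations.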
On the development side, Connelly says that Scott established an “ownership model and ownership culture.” All too often, developers build what they’re told to build and hand it off to production, at which point operations takes on all responsibility. In the ownership model, developers retain responsibility for what they’ve created — improving code already in production as needed. Pride in software craftsmanship became an important part of the ethos at LinkedIn.
Altogether, a great deal of self-service automation has been put into place. I asked whether, on the operations side, some engineers feared they were automating themselves out of a job. Connelly’s answer was instructive:
My personal opinion is that is absolutely the right goal. We should be automating ourselves out of a job. In my experience, though, that never happens — it’s an unreachable goal. That’s point one … point two is there’s a lot of other stuff that SREs do, especially what we call embedded SREs. They are part of product teams; they are involved with the design of new applications and infrastructure from the ground up so they are contributing to the actual design. “Hey, there should be a cache here, this should fail this way …”
Meanwhile, the monitoring, alerting, and instrumentation have grown more sophisticated. To ensure high availability, operations has written software that simulates data center failures multiple times per week and measures the effects. “We built a platform last year called Nurse, which is basically a workflow engine, where you can define a set of automated steps to do what we associate with a failure scenario,” Connelly says. Currently, he says, he’s building a self-service escalation system with functionality similar to that of PagerDuty.
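The workflow-engine idea behind a tool like Nurse can be sketched simply: a failure scenario maps to an ordered list of remediation steps, and the engine runs them until one fails. This is an illustrative sketch only; the step names and structure are hypothetical, not LinkedIn’s actual code:

```python
def run_workflow(steps, context):
    """Run named remediation steps in order, stopping at the first failure.

    Each step is a callable that takes a shared context dict and returns
    True on success. Returns (completed step names, failed step name or None).
    """
    completed = []
    for name, step in steps:
        if not step(context):
            return completed, name  # halt so a human can take over
        completed.append(name)
    return completed, None

# Hypothetical failure scenario: steps associated with losing a database host
failover_steps = [
    ("drain_traffic", lambda ctx: ctx.setdefault("drained", True)),
    ("promote_replica", lambda ctx: ctx.setdefault("promoted", True)),
]
```

The value of encoding the steps this way is that the same definitions serve both the automated response and the regular failure simulations the article describes.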
The most important lesson from LinkedIn’s journey is that the old divisions between development and operations become showstoppers at Internet scale. Developers need to be empowered through self-service tools, and operations needs a seat at the table as applications or services are being developed — to ensure reliability and to inform the creation of appropriate tooling. Call it devops if you like, but anything less and you could find yourself on shaky ground.
Delphix’s first ‘State of DevOps’ report finds even the definition of the word is in flux among its practitioners
Data-as-a-service company Delphix has launched its first annual “State of DevOps” report, which attempts to gather data from “leaders and practitioners” across North American and European enterprises on how they see devops.
One of the biggest questions, Delphix believes, is the very definition of the term: What does devops stand for among its implementers, what is the term meant to encompass, and how should it be handled?
Leaders, practitioners, and everyone in between
Delphix’s survey breaks devops people into two camps: “leaders” and “practitioners.” The former includes those who self-identify as being part of “a strongly defined and successful series of DevOps initiatives.” Only 10 percent of those surveyed identified themselves as leaders; another 59 percent identified as practitioners who were involved in ongoing devops work or planned on starting such work. (The remaining 31 percent evidently didn’t meet the criteria for either category.)
Much of the rationale for seeing devops teams in this binary fashion is the belief that while there are plenty of definitions for devops, few are in agreement. The ones who have the sharpest definition of the term, claims Delphix, benefit the most simply because they’re able to better describe the missions at hand.
However, not everyone feels an exact definition is needed. Adam Jacob, CTO of Chef, has likened devops to kung fu: The implementations vary, but those who practice the art recognize its other practitioners as well.
Data, devops, and cloud deployments
Delphix’s background in data virtualization influenced the report’s approach. One section, entitled “The State of Data in DevOps,” covered how devops teams deal with live data. Ninety percent of the respondents cited limitations with their testing environments due to data management issues, saying they needed full production data to do devops work and more often than not simply gave developers unaudited access to production data. (The report doesn’t attempt to connect such behavior to data leaks, but asserts that “companies are opting for agility over security.”)
Even aside from Delphix’s theses about devops, the data gathered about specific devops activities is intriguing. The most often cited reason why organizations embrace devops (true for 70 percent of leaders and 59 percent of practitioners) was pressure from other parts of the organization to deliver — to get products out faster, to reduce defect counts — far more than the need to accomplish more with less.
Another intriguing finding concerns what types of devops projects get the lion’s share of attention. The lowest-ranked item in the report was “deployments to private cloud,” cited by 29 percent of leaders and 47 percent of practitioners. “Testing” and “continuous integration” both ranked only incrementally higher. The reason for the private cloud’s low ranking wasn’t teased out in the report, but may reflect how the ops side of devops is potentially endangered by cloud; those with major cloud initiatives already enacted simply have less for their ops teams to do.
In the cloud, infrastructure is accessible via APIs, and developers now have complete control.
There’s something new on the horizon, taking the principles of devops to a new area. It’s called infrastructure as code (IaC), and it means configuring cloud-based infrastructure from applications, using APIs.
This is a hard turn from the old days of configuration management, with sys admins controlling the platforms on behalf of developers.
There are compelling reasons for this new approach:
- Applications can now be configured programmatically with exactly the platform resources they need. You can allocate the right amount of storage, memory, and processor instances. There is no more adapting the application to the limitations of the platform; you can make the platform anything you want.
- The costs should drop for each application, considering you use only the resources you need. (If you check server utilization at any data center, you’ll see it’s typically less than 3 percent. The excess capacity represents wasted dollars.)
- You don’t need as many management and operations resources, in both people and technology. Because control passes to developers, those items become redundant.
The larger question: Is passing control of the infrastructure to developers a good thing? You bet. With the rise of devops and its tight coupling of developers and operations, the ability for developers to dynamically configure their deployment platforms via APIs is a better way to manage many applications and many platform instances.
You can now match platforms to applications, one to one. This is different from the past, where developers had to accept whatever platform was set up, then fight for any necessary configuration changes.
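At its core, this model is declarative: the developer states the resources an application needs, and tooling reconciles the platform to match, creating, updating, or deleting resources through the provider’s APIs. A minimal sketch of that reconciliation step (resource names and specs are hypothetical, and a real tool would issue API calls rather than return a list):

```python
def plan(current, desired):
    """Compute the create/update/delete actions that make current match desired."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name, spec))
        elif current[name] != spec:
            actions.append(("update", name, spec))
    for name in current:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

# Desired state declared alongside the application, one platform per app
desired = {"web": {"cpu": 2, "memory_gb": 4}, "db": {"cpu": 4, "memory_gb": 16}}
current = {"web": {"cpu": 2, "memory_gb": 2}}
```

Because the desired state lives in code, changing the platform becomes an ordinary code change, reviewed and versioned like any other, rather than a request filed with a sys admin.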
That said, this new approach requires culture change. People who are part of legacy processes and roles won’t easily accept this new process, and I suspect they will push back hard on IaC. However, organizations will discover that IaC works better, and the old guard will need to adjust. This notion simply makes sense.