3 Nov 2014

Making The Cloud Work For The Military

Apple, Amazon, and Google have long since outstripped the Pentagon in information technology. But as the military and intelligence community try to take advantage of commercial IT innovation, especially in cloud computing, they have run into harsh limits. Security, long-range bandwidth and the sheer volume of data have created problems for the Pentagon that current commercially available cloud services can't solve, two senior defense officials told me recently.
The Defense Department will need a different kind of cloud, said Dave Mihelcic, chief technology officer at the Defense Information Systems Agency (DISA), and Dan Doney, chief innovation officer at the Defense Intelligence Agency (DIA). In fact, it’ll need several different kinds of cloud, customized for different missions.
There won’t be “one database to rule them all,” a knowledgeable Hill staffer told me, “[but] having five clouds is better than having 200 data centers,” some of them little more than a bunch of servers in a closet and run by managers lacking cybersecurity expertise. “From a security perspective, the fewer you have the better off you are,” the staffer went on, and from the user perspective, “you go down from having potentially hundreds of [separate] databases that you would have to search, that aren’t always connected, to potentially five or so that are connected.”
That’s a far cry from the single all-encompassing system envisioned by some enthusiasts and originally pursued by the National Security Agency. But dialing down that vision is a logical response to the lessons learned from the NSA’s struggles over the last two years to achieve a single super-cloud, several insiders told my colleague, Colin Clark, at the recent Intelligence & National Security Summit. The Director of National Intelligence himself, James Clapper, had already admitted in May that moving to the cloud did not save money, as he’d hoped.
True, both the National Security Agency and the Central Intelligence Agency have committed to the cloud. The NSA used open-source technology to build its in-house cloud computing network for top secret data — albeit at a much higher cost than NSA leaders had hoped. The CIA actually hired Amazon Web Services to run its secret cloud — although the civilian AWS employees must have security clearances and work on-site at a CIA facility, running a “private cloud” unconnected to the public Internet. What works for national intelligence agencies, however, may not translate to military intelligence or the armed forces in general, where users are scattered in low-bandwidth locations around the world.
In the upper echelons of the intelligence community, said DIA’s Doney, “we work on networks that are very sensitive, but [the] analysts have direct access to the network.” A CIA analyst in an office at Langley can use a secure landline connection to the classified server next door. Not so a young military intelligence officer on a ship at sea, advising foreign forces, or marching across a distant battlefield.
In the Defense Department, the Hill staffer pointed out to me, “you have to be able to accommodate everything from Secretary of Defense sitting at his desk… all the way down to the guy in the foxhole, who may only have the equivalent of a dial-up modem.”
Indeed, the central concept of “the cloud” itself — that users keep all their data in distant servers and download it as needed — chokes on the sheer amount of data modern sensors can collect and the limited wireless bandwidth available to transmit it. A reconnaissance drone can carry a dozen different high-tech cameras, for example, but it can hardly trail a thousand miles of fiber optic cable behind it to download all that data to its base, let alone back to the Pentagon.
Some top-priority video can be live-streamed over the military’s global network, DISA’s Mihelcic told me, but “there’s too much information that comes off the modern platforms, Global Hawk in particular, [and] that can’t transit the network in real time.” Currently, he said, “there are disk packs that are actually flown back to the continental United States on a regular basis… giant bricks of high-density terabyte disk drives.”
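The arithmetic behind those disk packs is stark. A quick back-of-envelope calculation shows why flying drives home can beat streaming them; the figures below are purely illustrative assumptions, not DISA numbers, but the orders of magnitude are the point:

```python
# Back-of-envelope arithmetic for "sneakernet" vs. satellite backhaul.
# Every figure here is an illustrative assumption, not a DISA number.

disk_pack_tb = 50        # assumed capacity of one shipped disk pack, terabytes
transit_hours = 24       # assumed door-to-door flight time back to CONUS
satcom_mbps = 10         # assumed satellite bandwidth available to the platform

pack_bits = disk_pack_tb * 8 * 10**12

# Effective throughput of flying the disks home: total bits over transit time.
sneakernet_mbps = pack_bits / (transit_hours * 3600) / 10**6
print(f"Shipped disk pack: ~{sneakernet_mbps:,.0f} Mbit/s effective")  # ~4,630

# Time to push the same data over the satellite link instead.
satcom_days = pack_bits / (satcom_mbps * 10**6) / 86400
print(f"Same data via satcom: ~{satcom_days:,.0f} days")  # ~463 days
```

At those assumed rates, the courier flight delivers data hundreds of times faster than the radio link ever could.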
“Always moving an enormous amount of data back to your main cloud nodes in the continental United States really isn’t feasible,” Mihelcic said, “so you are going to have a distributed cloud” — one with multiple data repositories around the world — “and you’re going to move the answers between various locations.”
But how do you figure out which small slice of these huge datasets gets priority to move across limited satellite bandwidth between continents? That turns out to be another area where simply borrowing commercial models will not work.


Searching For Trouble
Google's search engine is so powerful that it's become a verb. Amazon's algorithm for recommending other items you might like is so individualized it's uncanny. But they do exactly the opposite of what intelligence analysts and officers need, said DIA's Doney, a problem that will require some fundamental research to fix.
"The models that dominate this information discovery problem are popularity-based," Doney told me: The more people click, link to, or purchase something, the higher its rank and the more often it shows up in searches. (The underlying mathematics is called a Pearson correlation.) But an intelligence analyst who only looks at what lots of other people have already seen is, by definition, not doing their job.
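To make the mechanics concrete, here is a minimal sketch of that kind of correlation-based popularity ranking; the click counts are invented for illustration:

```python
import math

# Minimal sketch of popularity-driven, correlation-based ranking.
# The click counts below are invented for illustration.
clicks = [
    [9, 6, 7, 0],   # user A's clicks on items 0-3
    [6, 5, 4, 1],   # user B
    [8, 7, 8, 0],   # user C
]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def item(j):
    """Column j of the click matrix: one item's clicks across all users."""
    return [row[j] for row in clicks]

# Heavily clicked items correlate positively with each other; the barely
# touched item 3 scores strongly negative, so it sinks in a popularity
# ranking even if it happens to be the one genuinely novel thing here.
for j in range(1, 4):
    print(f"item 0 vs item {j}: r = {pearson(item(0), item(j)):+.2f}")
```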
“In the intelligence community we want what’s new, what’s different,” Doney said. “Novelty and popularity are almost inverse,” which means Google and Amazon-style popularity-rank searches would make the most important new intelligence the hardest to find.
What’s more, Doney added, “novelty is a very complex thing.” It doesn’t simply equate to “most recently posted,” which is how Google News sorts articles on a given topic. “New” has to be new substance, not the latest regurgitation of what’s been previously reported, not a minor variation on what the analyst already knows.
Commercial search algorithms just don’t handle novelty well, said Doney. For instance, he uses Pandora to find new music, with the software extrapolating from what he already likes. “What I find is, after not too much time, I’m getting the same stuff via a particular channel that I got yesterday and the day before,” he said.
Pandora shows another fundamental problem with commercial search: Like everything from NASCAR video clips to this website, Pandora relies on human users "tagging" data with keywords and ratings. (Pandora uses 400 different music-theory metrics.) There are ways to automate some tagging: With drones, for instance, "the good news is we tag a lot of this data on the sensor," Mihelcic told me. "We know where the platform was, we know where the camera was pointed, we know the capabilities of that camera," etc.
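A sketch of what that sensor-side tagging might look like; the field names here are hypothetical, not an actual DoD metadata standard:

```python
from dataclasses import dataclass, field

# Hypothetical sensor-side metadata record, of the kind Mihelcic describes:
# the platform tags imagery automatically with where and how it was collected.
# Field names are illustrative, not an actual DoD metadata schema.
@dataclass
class ImageryTag:
    platform: str                  # e.g., "RQ-4 Global Hawk"
    platform_position: tuple       # (latitude, longitude, altitude_m)
    camera_azimuth_deg: float      # where the camera was pointed
    camera_elevation_deg: float
    sensor_type: str               # e.g., "EO", "IR", "SAR"
    ground_resolution_m: float     # capability of that camera
    collected_utc: str             # ISO 8601 timestamp
    analyst_keywords: list = field(default_factory=list)  # human tags, added later

tag = ImageryTag(
    platform="RQ-4 Global Hawk",
    platform_position=(34.5, 69.2, 18000),
    camera_azimuth_deg=112.0,
    camera_elevation_deg=-41.5,
    sensor_type="EO",
    ground_resolution_m=0.3,
    collected_utc="2014-09-15T06:42:00Z",
)
tag.analyst_keywords.append("vehicle convoy")  # context added after the fact
```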
Even in automated systems, however, some human being has to choose what kind of things to tag in the first place — and because the world keeps changing, that list will always be missing something. (In fact, Gödel's incompleteness theorems show that no formal system powerful enough to express arithmetic can be both complete and consistent.) "Tagging of data is putting that data into a context," Doney said. "The real challenge there is what happens when the context changes?"
You can always add new tags, of course, but how do you apply them retroactively to the vast amounts of material already in the database? It’s like watching a murder mystery and meticulously noting which suspects appear in every single scene — until you come to the climax and realize you never tagged the butler. Or, to give a grimmer example, who would have known before Sept. 11, 2001, that “taking flying lessons” was a crucial criterion for tracking potential terrorists?
In any modern search system, “the user will only find the particular data they know to look for…and that does not solve the 9/11 challenge,” said Doney. “The 9/11 challenge is, how does the data find the user?”
The search system needs to be smart enough to know the ever-changing context of what each individual user has already seen and what they need to see. It also needs to know what they do not need to see, so no more Bradley Mannings or Edward Snowdens can scoop up entire archives of classified material. Current clearance levels and "compartments" (as in SCI, "Sensitive Compartmented Information") are just too broad to handle the problem, Doney said: If I'm a counterintelligence officer, for example — let's call me "Aldrich Ames" — that doesn't mean I should have access to or even know about a counterintelligence investigation of me.
This kind of search raises fundamental questions of information theory and artificial intelligence research — some of them neglected for decades. “There used to be a field, 30 years ago, called ‘user modeling,’ about understanding a context of a user’s needs,” said Doney. “When keyword search came along” — e.g. Google — “it sort of killed off that field.” Now, he argues, it’s time for defense and intelligence R&D funds to revive it.
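In code, a crude version of that user-modeling idea might look like the sketch below: filter by need-to-know, then rank what's left by how little it resembles what the analyst has already read. The similarity measure (Jaccard overlap on keyword sets) and the single-letter compartment model are simplifications invented for illustration.

```python
# Crude sketch of "the data finds the user": after a need-to-know filter, rank
# unseen items by how little they resemble what the analyst has already read.
# The similarity measure (Jaccard over keyword sets) and the compartment model
# are invented simplifications, not an actual IC access-control scheme.

def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets: 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def novelty(keywords: set, seen: list) -> float:
    """1.0 minus the item's closest match to anything already read."""
    return 1.0 - max((jaccard(keywords, s) for s in seen), default=0.0)

def push_queue(items, seen, allowed_compartments):
    """Return need-to-know-cleared items, most novel first."""
    cleared = [it for it in items if it["compartment"] in allowed_compartments]
    return sorted(cleared, key=lambda it: novelty(it["keywords"], seen), reverse=True)

seen = [{"flight", "training", "visa"}]
items = [
    {"keywords": {"flight", "training", "visa"}, "compartment": "A"},  # already known
    {"keywords": {"crop", "duster", "rental"},   "compartment": "A"},  # genuinely new
    {"keywords": {"flight", "manifest"},         "compartment": "B"},  # not cleared
]
for it in push_queue(items, seen, {"A"}):
    print(sorted(it["keywords"]), f"novelty={novelty(it['keywords'], seen):.2f}")
```

Run against the toy data, the crop-duster report, which matches nothing the analyst has seen, surfaces first; the already-familiar report sinks to the bottom; and the uncleared item never appears at all.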


Kinds of Clouds
The unique needs of many military missions require highly customized kinds of cloud. DISA's Mihelcic envisions a multi-level approach. Many Defense Department functions are things any large organization does; those can use the same commercially available cloud services as the private sector, which will save money. Press releases, for example, are supposed to be seen by as many people as possible, so they hardly require the same protection as war plans.
That said, Mihelcic cautioned me, "I don't want someone to alter my press releases." Imagine, for example, the fallout if jihadi hackers reworded a standard Central Command airstrike report from "The strike destroyed one ISIL armed vehicle" to "The strike destroyed one mosque." So, Mihelcic said, "we do have some requirements for security even in the public cloud."
Slightly more sensitive data can reside in clouds restricted to government users only, such as Amazon’s GovCloud. That still doesn’t mean Defense Department data can just mix and mingle with other federal agencies’ information. Under five new pilot projects shepherded by Pentagon chief information officer Terry Halvorsen, Mihelcic said, DoD is looking at technologies “that will allow us to virtually separate our sensitive workloads from other sensitive workloads in those government clouds.”
Virtual separation means a single server or group of servers running two or more clouds at once, without letting any one set of users see another's data. The magnetic ones and zeroes representing the data are still all physically in the same place. "What happens if one customer in a virtual private cloud can get into another customer's virtual data center, [or] a bad guy from the Internet can get into one of those virtual private clouds?" asked Mihelcic. "[They're] physically on the same machine, attached to the Internet."
So at higher levels of security, what's required is a physically separate system devoted to military data and unconnected to civilian networks — a truly, not virtually, "private" cloud. "The DOD is going to have a requirement for private cloud solutions, certainly at the classified level and potentially at the highest level of sensitive [but] unclassified as well," Mihelcic said.
Some of those private clouds, however, may well be run by (cleared) civilian contractors using (customized) civilian technology. That's how Amazon runs its private cloud for the CIA, for example, while the NSA's cloud is run by government personnel but still uses commercial technology. In fact, even at the highest levels of security, Mihelcic emphasized, there's no need to reinvent the wheel at taxpayer expense. The Defense Department has to build its own custom networks for some purposes, he said, but it can still use commercial technologies as components: "No one wants to develop this from scratch."
"Nuclear command and control is a great example of that," Mihelcic said. "We are upgrading much of nuclear command and control to use modern technologies, so we're not using 30-year-old processors, but still that's going to be on dedicated networks with tightly controlled access. It's not going to sit out on the Internet. It's not even going to be shared on our classified DoD networks."
Applying commercial technology to nuclear weapons may seem odd, even unsettling. "It's a difficult balance to strike, obviously," said the Heritage Foundation's Steven Bucci, a retired Army Special Forces colonel himself, "but the day[s] of DoD designing its own technologies are long gone."
When it comes to information technology, Bucci told me, “[while] the Department of Defense and even the intel community… can come up with some niche capabilities, they really need to start with the incredibly dynamic industry that comes out of the private sector. That stuff is so far ahead of what the government uses that it’s not funny.”

http://breakingdefense.com/2014/09/making-the-cloud-work-for-the-military/