Monthly Archives: October 2016

How to Secure User Controlled Data

Most people with smartphones use a range of applications that collect personal information and store it on Internet-connected servers — and from their desktop or laptop computers, they connect to Web services that do the same. Some use still other Internet-connected devices, such as thermostats or fitness monitors, that also store personal data online.

Generally, users have no idea which data items their apps are collecting, where they’re stored, and whether they’re stored securely. Researchers at MIT and Harvard University hope to change that, with an application they’re calling Sieve.

With Sieve, a Web user would store all of his or her personal data, in encrypted form, on the cloud. Any app that wanted to use specific data items would send a request to the user and receive a secret key that decrypted only those items. If the user wanted to revoke the app’s access, Sieve would re-encrypt the data with a new key.

“This is a rethinking of the Web infrastructure,” says Frank Wang, a PhD student in electrical engineering and computer science and one of the system’s designers. “Maybe it’s better that one person manages all their data. There’s one type of security and not 10 types of security. We’re trying to present an alternative model that would be beneficial to both users and applications.”

The researchers are presenting Sieve at the USENIX Symposium on Networked Systems Design and Implementation this month. Wang is the first author, and he’s joined by MIT associate professors of electrical engineering and computer science Nickolai Zeldovich and Vinod Vaikuntanathan, who is MIT’s Steven and Renee Finn Career Development Professor, and by James Mickens, an associate professor of computer science at Harvard University.

Selective disclosure

Sieve required the researchers to develop practical versions of two cutting-edge cryptographic techniques called attribute-based encryption and key homomorphism.

With attribute-based encryption, data items in a file are assigned different labels, or “attributes.” After encryption, secret keys can be generated that unlock only particular combinations of attributes: name and zip code but not street name, for instance, or zip code and date of birth but not name.
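The access pattern can be sketched with a toy model. Real attribute-based encryption uses pairing-based cryptography and supports richer policies; the sketch below is only a stand-in that gives each attribute its own symmetric key, so that "granting access" means releasing a subset of keys. All names and the XOR-based toy cipher are illustrative assumptions, not Sieve's actual design.

```python
import hashlib
import os

def keystream(key: bytes, length: int) -> bytes:
    # Derive a deterministic keystream from the key (toy stream cipher).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

class AttributeStore:
    """Toy stand-in for attribute-based encryption: each attribute gets
    its own symmetric key, and the 'secret key' handed to an app is just
    the subset of attribute keys the user chooses to release."""
    def __init__(self):
        self.attr_keys = {}
        self.ciphertexts = {}

    def put(self, attribute: str, value: bytes):
        key = self.attr_keys.setdefault(attribute, os.urandom(32))
        self.ciphertexts[attribute] = xor_cipher(key, value)

    def grant(self, attributes):
        # Release only the keys for the granted attributes.
        return {a: self.attr_keys[a] for a in attributes}

    def decrypt(self, granted_keys, attribute: str) -> bytes:
        if attribute not in granted_keys:
            raise PermissionError(f"no key for attribute {attribute!r}")
        return xor_cipher(granted_keys[attribute], self.ciphertexts[attribute])

store = AttributeStore()
store.put("zip_code", b"02139")
store.put("name", b"Alice")
app_keys = store.grant(["zip_code"])        # app may read zip code only
print(store.decrypt(app_keys, "zip_code"))  # b'02139'
```

An app holding `app_keys` can read the zip code but gets a `PermissionError` for the name, which mirrors the combinations-of-attributes behavior described above.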

The problem with attribute-based encryption — and decryption — is that it’s slow. To get around that, the MIT and Harvard researchers envision that Sieve users would lump certain types of data together under a single attribute. For instance, a doctor might be interested in data from a patient’s fitness-tracking device but probably not in the details of a single afternoon’s run. The user might choose to group fitness data by month.

This introduces problems of its own, however. A fitness-tracking device probably wants to store data online as soon as the data is generated, rather than waiting until the end of the month for a bulk upload. But data uploaded to the cloud yesterday could end up in a very different physical location than data uploaded by the same device today.

So Sieve includes tables that track the locations at which grouped data items are stored in the cloud. Each of those tables is encrypted under a single attribute, but the data they point to are encrypted using standard — and more efficient — encryption algorithms. As a consequence, the size of the data item encrypted through attribute-based encryption — the table — is fixed, which makes decryption more efficient.
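The indirection can be sketched as follows. The field names and layout here are assumptions for illustration, not the paper's actual storage format: bulk data blobs use an ordinary symmetric cipher (modeled as a toy repeating-key XOR), and only the small pointer table would go through the expensive attribute-based encryption.

```python
import os

cloud = {}  # stand-in for cloud object storage

def sym_cipher(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher (repeating-key XOR); applying it twice decrypts.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def upload(payload: bytes, key: bytes) -> str:
    # The provider may place each upload anywhere; we record where.
    location = "blob-" + os.urandom(4).hex()
    cloud[location] = sym_cipher(key, payload)
    return location

data_key = os.urandom(32)

# Pointer table for one attribute grouping a month of fitness data.
table = {"attribute": "fitness/2016-03", "entries": [], "data_key": data_key}
for day, run in [("03-01", b"5.2 km"), ("03-02", b"3.1 km")]:
    table["entries"].append({"day": day, "location": upload(run, data_key)})

# Decrypting the (attribute-encrypted) table yields the key and the
# locations needed to fetch the whole month's uploads.
loc = table["entries"][0]["location"]
print(sym_cipher(table["data_key"], cloud[loc]))  # b'5.2 km'
```

Because only the fixed-size table is under attribute-based encryption, the device can keep uploading blobs as data is generated, wherever the provider happens to place them.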

In experiments, the researchers found that decrypting a month’s worth of, say, daily running times grouped under a single attribute would take about 1.5 seconds, whereas if each day’s result was encrypted under its own attribute, decrypting a month’s worth would take 15 seconds.

Wang developed an interface that displays a Sieve user’s data items as a list and allows the user to create and label icons that represent different attributes. Dragging a data item onto an icon assigns it that attribute. At the moment, the interface is not particularly user friendly, but its purpose is to show that the underlying encryption machinery works properly.

Blind manipulation

Key homomorphism is what enables Sieve to revoke an app’s access to a user’s data. With key homomorphism, the cloud server can re-encrypt the data it’s storing without decrypting it first — or without sending it to the user for decryption, re-encryption, and re-uploading. In this case, the researchers had to turn work that was largely theoretical into a working system.
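The idea can be demonstrated with a toy key-homomorphic pseudorandom function of the form F_k(x) = H(x)^k mod p, where F_{k+Δ}(x) = F_k(x) · F_Δ(x). The modulus and parameters below are illustrative assumptions only; a real deployment would use a proper cryptographic group such as an elliptic curve.

```python
import hashlib

# Toy modulus: a Mersenne prime, fine for a demo but not for production.
P = (1 << 127) - 1

def prf(key: int, label: bytes) -> int:
    # Key-homomorphic PRF: F_k(x) = H(x)^k mod p.
    h = int.from_bytes(hashlib.sha256(label).digest(), "big") % P
    return pow(h, key, P)

def encrypt(key: int, label: bytes, m: int) -> int:
    return (m * prf(key, label)) % P        # mask the message with the PRF

def decrypt(key: int, label: bytes, c: int) -> int:
    return (c * pow(prf(key, label), -1, P)) % P

k, delta = 123456789, 987654321
c = encrypt(k, b"march-runs", 42)

# Revocation: the server multiplies in F_delta(x) WITHOUT decrypting,
# because H(x)^k * H(x)^delta = H(x)^(k + delta).
c_rekeyed = (c * prf(delta, b"march-runs")) % P
new_key = k + delta

print(decrypt(new_key, b"march-runs", c_rekeyed))  # 42
```

The re-keyed ciphertext decrypts only under `new_key`; the revoked app's old key `k` no longer works, and the server never saw the plaintext.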

“All these things in cryptography are very vague,” Wang says. “They say, ‘Here’s an algorithm. Assume all these complicated math things.’ But in reality, how do I build this? They’re like, ‘Oh, this group has this property.’ But they don’t tell you what the group is. Are they numbers? Are they primes? Are they elliptic curves? It took us a month or so to wrap our heads around what we needed to do to get this to work.”

Of course, a system like Sieve requires the participation of app developers. But it could work to their advantage. A given application might provide more useful services if it had access to data collected by other devices. And were a system like Sieve commercially deployed, applications could distinguish themselves from their competitors by advertising themselves as Sieve-compliant.

“Privacy is increasing in importance, and the debate between Apple and the FBI over iPhone encryption is a good example of that,” says Engin Kirda, a professor of electrical and computer engineering at Northeastern University. “I think a lot of users would appreciate having cryptographic control over their own data.”

“I think the real innovation is how they use attribute-based encryption smartly, and make it usable in practice,” Kirda adds. “They show that it is possible to have private clouds where the users have real privacy control over their data.”

Protecting bulk power systems from hackers

Advances in smart grid technology — such as smart meters in homes, management systems for distributed energy resources like wind and solar production, and instrumentation systems in power plants, substations, and control centers — improve monitoring but also create new entry points for hackers.

“Ten years ago, cybersecurity simply didn’t exist — it wasn’t talked about and it wasn’t a problem,” Ten says, joking that people thought he was crazy for suggesting power grid hacking was possible. “Now with events like in Ukraine last year and malware like Stuxnet, where hackers can plan for a cyberattack that can cause larger power outages, people are starting to grasp the severity of the problem.”

Ten points out that hackers target specific parts of the control network of power infrastructure, focusing on the mechanisms that control it. Automated systems control much of the grid, from generation to transmission to use. As Ten puts it, the convenience and cost reduction of automation streamlines the process, but without solid security measures it also leaves those systems vulnerable. The interconnectedness of the grid can also cause cascading impacts, leading to blackouts, equipment failure, and islanding, where regions become cut off and isolated from the main power grid.

Emerging Cybersecurity Threats

Ten and his team draw connections and assess weaknesses using a framework that continuously evaluates a power grid’s bottlenecks and its interconnections with neighboring grids. Using quantitative methods to prioritize cybersecurity protection would help ensure that power grids operate more securely and safely. Ten says it’s like measuring blood pressure.

“You know your health is at risk because we monitor systolic and diastolic numbers, so perhaps you work out more or eat healthier,” Ten says. “The grid needs established metrics for health too, a number to gauge if we are ready for this security challenge.”

With a better understanding of the system’s weaknesses, it’s easier to be strategic and shore up security risks. In the long run, Ten says improving regulations with specifics to match actual infrastructure needs and providing cybersecurity insurance will help.

“Simply because the remote substation networks are constantly commissioned with full compliance doesn’t mean they are secure,” Ten says. “There is going to be a tremendous impact if we’re negligent and fail to keep up with changes in communication infrastructure and emerging security threats.”

The Internet and your brain

“The founders of the Internet spent a lot of time considering how to make information flow efficiently,” says Salk Assistant Professor Saket Navlakha, coauthor of the new study that appears online in Neural Computation on February 9, 2017. “Finding that an engineered system and an evolved biological one arrive at a similar solution to a problem is really interesting.”

In the engineered system, the solution involves controlling information flow such that routes are neither clogged nor underutilized by checking how congested the Internet is. To accomplish this, the Internet employs an algorithm called “additive increase, multiplicative decrease” (AIMD) in which your computer sends a packet of data and then listens for an acknowledgement from the receiver: If the packet is promptly acknowledged, the network is not overloaded and your data can be transmitted through the network at a higher rate. With each successive successful packet, your computer knows it’s safe to increase its speed by one unit, which is the additive increase part. But if an acknowledgement is delayed or lost your computer knows that there is congestion and slows down by a large amount, such as by half, which is the multiplicative decrease part. In this way, users gradually find their “sweet spot,” and congestion is avoided because users take their foot off the gas, so to speak, as soon as they notice a slowdown. As computers throughout the network utilize this strategy, the whole system can continuously adjust to changing conditions, maximizing overall efficiency.
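The dynamics described above can be simulated in a few lines. This is a minimal sketch, assuming a fixed link capacity as a stand-in for real congestion feedback; the increase and decrease constants are the conventional illustrative choices.

```python
def aimd(capacity: float, steps: int,
         increase: float = 1.0, decrease: float = 0.5):
    """Simulate additive-increase, multiplicative-decrease against a
    fixed capacity that stands in for congestion feedback."""
    rate = 1.0
    history = []
    for _ in range(steps):
        if rate <= capacity:   # ack arrives promptly: no congestion
            rate += increase   # additive increase
        else:                  # ack delayed or lost: congestion detected
            rate *= decrease   # multiplicative decrease
        history.append(rate)
    return history

rates = aimd(capacity=10.0, steps=50)
# The sender climbs linearly, overshoots, halves, and then oscillates
# around the link capacity — its "sweet spot".
print(min(rates[12:]), max(rates[12:]))
```

The sawtooth pattern in `rates` is the gradual probing toward the sweet spot described above: slow linear gains, sharp cuts the moment a slowdown is noticed.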

Navlakha, who develops algorithms to understand complex biological networks, wondered if the brain, with its billions of distributed neurons, was managing information similarly. So, he and coauthor Jonathan Suen, a postdoctoral scholar at Duke University, set out to mathematically model neural activity.

Because AIMD is one of a number of flow-control algorithms, the duo decided to model six others as well. In addition, they analyzed which model best matched physiological data on neural activity from 20 experimental studies. In their models, AIMD turned out to be the most efficient at keeping the flow of information moving smoothly, adjusting traffic rates whenever paths got too congested. More interestingly, AIMD also turned out to best explain what was happening to neurons experimentally.

It turns out the neuronal equivalent of additive increase is called long-term potentiation. It occurs when one neuron fires closely after another, which strengthens their synaptic connection and makes it slightly more likely the first will trigger the second in the future. The neuronal equivalent of multiplicative decrease occurs when the firing of two neurons is reversed (second before first), which weakens their connection, making the first much less likely to trigger the second in the future. This is called long-term depression. As synapses throughout the network weaken or strengthen according to this rule, the whole system adapts and learns.
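The analogy maps directly onto a weight-update rule. The step sizes below are arbitrary illustrative values, not measured physiological constants; the point is only the asymmetry between slow additive gains and sharp multiplicative cuts.

```python
def update_weight(w: float, pre_before_post: bool,
                  ltp_step: float = 0.1, ltd_factor: float = 0.5) -> float:
    """Toy synaptic update mirroring AIMD."""
    if pre_before_post:
        return w + ltp_step   # long-term potentiation ~ additive increase
    return w * ltd_factor     # long-term depression ~ multiplicative decrease

w = 1.0
for order in [True, True, True, False]:  # three forward pairings, one reversal
    w = update_weight(w, order)
print(round(w, 2))  # 0.65: gains accrue slowly, a single reversal cuts sharply
```

As with network senders, the asymmetric rule lets synapses keep probing upward while reacting strongly to the signal that they have gone too far.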

“While the brain and the Internet clearly operate using very different mechanisms, both use simple local rules that give rise to global stability,” says Suen. “I was initially surprised that biological neural networks utilized the same algorithms as their engineered counterparts, but, as we learned, the requirements for efficiency, robustness, and simplicity are common to both living organisms and the networks we have built.”

Understanding how the system works under normal conditions could help neuroscientists better understand what happens when these processes are disrupted, for example, in learning disabilities. “Variations of the AIMD algorithm are used in basically every large-scale distributed communication network,” says Navlakha. “Discovering that the brain uses a similar algorithm may not be just a coincidence.”