Interested in hearing this talk at your conference or meetup? Get in touch!
Want to share? Tweet this talk
Be sure to check out these Sketchnotes from a 15 minute version of this talk from Codeland by Stephanie Nemeth
I’m going to tell you about how I took a job building software to kill people.
But don’t get distracted by that; I didn’t know at the time.
Even before I’d walked across the stage for graduation, I accepted an offer for an internship. It paid half again as much as the most I’d ever gotten in the highest paying job up to that point. Not to mention that I’d spent years in college with low paid student jobs or living only on student loans.
I’d be joining a contracting company for the Department of Defense. The Department of Defense, or DOD, is the part of the government made up by the military in the United States. The DOD outsources all sorts of things, from P-8A Multi-mission Maritime aircraft to blue, shade 451, Poly/wool cloth.
At the time, I thought nothing of the fact that I’d be supporting the military. Besides, they’re the good folks right? My dad was in the military. So was my grandfather. It was good money, a great opportunity, and a close friend of mine had gotten me the gig. Life was good.
I showed up for my first day of work in North Virginia, or NoVA as the industry likes to call it. I met the team of other interns, then learned what the team would be building: a tool to use phones to find WiFi signals.
It seemed pretty cool compared to what I’d built to up that point. The most complicated thing was an inventory management system. It didn’t concern itself overmuch with persistence. Who needs to stop and restart a program anyhow? The data was right there in memory and if you forgot how many Aerosmith CDs you had, who cares? It got me an A on the assignment, and that was all that mattered.
Honestly, the idea of finding WiFi routers based on the signal strength seemed pretty intimidating at the time. The idea impressed me.
But don’t get distracted by all this; the software was intended to kill people.
I joined the team after they’d already gotten started on the project. The gist of the tool was that it would look at how WiFi signal strength changed as your phone moved around. If the signal strength got stronger, you were getting closer. If it got weaker, you were moving away. To find this information, we’d collect two pieces of information for each WiFi access point in range. The phone’s geolocation information and the WiFi signal strength.
To predict the actual location of the WiFi signal, we used a convolution of two algorithms. Both of them relied on the Free Space Path Loss equation. For our purposes, FSPL calculates how far away a phone is from a WiFi signal based on loss in signal strength. It assumes that there is only empty air, or “free space”, between the access point and the phone.
The first algorithm was R^2. It measured the difference between the signal strength we’d observed and the expected signal strength at each distance in a search grid based on Free Space Path Loss. Locations with the lowest R^2 error rate were the most likely location.
We’d combine that calculation with a Gaussian estimate. Now I spent two or three days last week trying to understand Gaussian estimates for this talk and couldn’t. The best documentation on them are still research papers. I do know that it creates a probability curve. The high points of the curve are distances where the access point is likely to be, and low points are unlikely. The curve started with a probability hole of low values. They represented low likelihood that the phone is standing right next to the signal. The curve then increased to high probabilities further out. It decreased to near zero even further. The algorithm adjusted the width and height of these two curves by consulting past measurements. It created a heat map of probabilities for the signal source.
We’d normalize the probabilities for each location in the search grid. A combination of probabilities was more correct than either algorithm itself.
We stored this probability matrix for each location a phone collected from. Using these, if we collected readings while moving in a straight line, we could tell you how far away the WiFi was. If you turned a corner, we could also add direction, so you could find it in 2D space. If you climbed some stairs, we’d show you altitude as well. The technology was the most interesting project I’d ever worked on.
But don’t let that distract you; it was designed to kill people.
I mentioned that I had a software engineering degree at this point. My teammates were earlier in their education careers. Most of them were a year or two into their four year programs, also a mix of Computer Science and Math majors. My expertise was in the design and process of building software, while theirs was more in high level mathematics or the theory of computer use.
I helped to translate the working algorithms they’d designed in MatLab to the Java code we needed to run on the phones.
And let’s be honest, we spent plenty of time deciding whether we preferred Eclipse or NetBeans. Can I say how happy I am that as a Ruby developer, I’ve not had to figure out where to put a .jar file in over half a decade?
One such example is calculating distance between two points. As these things go, it was in the deepest level of each loop. We were using great circle distance, which is the way you measure the shortest distance between two points on a sphere, such as Earth. Did we need to use this complicated measurement over the meager distances a WiFi network can operate over? Absolutely not. Did I mention this was over half a decade ago? We weren’t always making the best choices. Anyhow, we needed to calculate these distances. The function doing it was being hit hundreds of thousands of time for each collection point, often with the same two locations. It was a very slow process.
We solved that by implementing a dictionary of latitude/longitude pairs to distances. This at least meant we didn’t re-do those calculations. This and other optimizations we made sped up the performance from seven minutes to a few seconds.
But don’t get distracted; that performance increase made it faster to kill people.
The accuracy of the locations wasn’t fantastic. I don’t remember exactly what it was before we focused on improving this, but an average error about 45 feet sticks in my head.
That’s a little longer than a Tyrannosaurus Rex nose to tail, or more concretely the length of a shipping container.
That’s significant when the WiFi range for 802.11n is only about 100 feet. That means we could be up to almost half the range of the router off from where it was.
I talked about the Gaussian estimation, the two curves from the second algorithm. We hard-coded numbers that defined this curve. They were only starting points, but they were starting points every time we made the calculation.
A Genetic Algorithm is a type of program that produces a set of values that optimize for a desired result. It’s a perfect fit for tuning these hard coded values to get more accurate results.
Each of the Gaussian estimation values will is a gene. The set of values is a genome in Genetic Algorithm parlance. The genes were our 3 constants.
A fitness function is what Genetic Algorithms use to measure performance of a genome. For the dataset of readings I was using in the GA, I knew the actual location of the access points. That meant I could run the geolocation algorithm with each genome’s values in place as the fitness function. The result would be the distance between the actual location and the one calculated by the GA-derived values.
Genetic Algorithms take a set of genomes, called a generation, and keep a certain percentage of top performers.
These top performers “survive” to the next generation as copies. Sometimes the algorithm mutates these copies by adding a Gaussian random value to each gene. This means that there was a chance that any of the copied genomes would have each gene changed slightly. That way they would have a chance of performing better or worse. New random genomes are created for the remaining spots in the new generation.
We saved the top performers across all populations. When the GA ended I could take a look at the values and select the best performer.
I let this genetic algorithm run over the weekend. It was able to increase the accuracy from 40-odd feet to about 10.3 feet, 25% of the error from the original. That is less than the GPS accuracy on the smartphones collecting data (which is about 16 feet according to GPS.gov). This is too accurate, so it may have over-optimized against the test data and might not be as accurate against other data sets. This is called overfitting and the way around it is to have separate sets of training and test data. Did I mention this was five years ago? I didn’t even know what overfitting was back then.
I loved this. Genetic algorithms, R^2, and Gaussian estimation are the kind of thing that they tell you you’ll never need to use again once you graduate. But we were using them for a real world project! It was great.
But don’t let that distract you; this accuracy made it easier for the software to help kill people.
The tracking now worked accurately and quickly. The next feature was to add tracking a moving WiFi access point. I briefly wondered why an access point would be moving, but that question wasn’t as interesting as figuring out how to track it.
We made use of Kalman Filters to observe state variables: the position, velocity, and acceleration of the WiFi signal. Given these and the time since the last measurement, a Kalman Filter is able to improve the current prediction with surprising accuracy, filtering out “noise” data automatically.
Each time we ran the real time algorithm, we’d also feed this to the Kalman Filter. With only that information, it was able to produce an estimate that is a weighted average of previous data. A new predicted location that was more accurate than the calculated value.
At the same time, we added the ability to track more than one WiFi signal. We’d filter our collection of readings by the unique identifier of each access point. The filtered datasets each went through the full algorithm to produce predictions.
We used the APIs on the phone to read the signal strength of all WiFi access points in range. We were able to track multiple hotspots, basically whatever we could see in the WiFi network list. It was all very exciting. These seem like academic problems, but we were getting to use them in a real project! Being a programmer was going to be great.
But don’t let that distract you; this meant we could kill multiple more accurately people.
We’d been working with the project owner throughout this process.
He was laissez-faire about most things. He might check in once a day then going back to his main job in the area of the building dedicated to classified work.
Whenever we hit one of these milestones, we’d tell him. He’d be happy about it, but a question always came up. He wanted it to sniff for the signals put out by phones in addition to WiFi hotspots. This is a much harder problem from a technical perspective. The functionality necessary to do this is “promiscuous mode”, a setting on the wireless network controller. Neither iPhone nor Android supported that option. We’d need to jailbreak or root the phone regardless of the platform. We looked for packages that we could use that would let us do this to “sniff” the packets that devices sent back to routers. The closest we ever found was a SourceForge project that seemed promising. We didn’t fully understand its use and it wasn’t well documented.
We told the project owner that we’d get to it later. None of us thought it was that important: we had the technology to find WiFi Access Points working. That was the goal right? Each time we’d demonstrate the new exciting tech we’d built though, the same question came up.
- We got WiFi hotspots located! Great, does it find phones?
- It’s taking seconds instead of minutes! Great, does it find phones?
- We looked into it finding phones, it seems unlikely but maybe! Ok, we’ll come back to it.
- We got moving targets working! Great, does it find phones?
I had been distracted.
All of the cool problems we were solving: finding nodes, speeding things up, making more accurate predictions. It was all so cool, so much fun. I hadn’t thought about why we were putting all this work into finding a better place to sit and get good WiFi. That doesn’t even make sense if you look at it for more than a few seconds.
Does it find phones.
This was never about finding better WiFi. We were always finding phones. Phones carried by people. Remember I said I was working for a Department of Defense contractor? The DoD is the military. I was building a tool for the military to find people based on where their phones where, and shoot them.
I tried to rationalize this then. The military is in place to protect Truth, Justice, and the American Way. But this was the same time that we found out the government had been spying on Americans in the US with drones. They’d also lent out that technology to federal, state, and local law enforcement agencies nearly 700 times to run missions. The military and government do things that I know I don’t agree with pretty often. I didn’t want to be a part of building something used to kill people, especially since I knew I’d never know who it was killing, let alone have a say.
I rationalize it now too. We were interns, and we didn’t even have clearance. The projects this company did for the government were classified Top Secret. I wasn’t allowed to know what they were. My code probably got thrown away and forgotten. Probably.
This was an extreme example of code used in a way that the creator did not intend it. The project owner conveniently left out its purpose was when explaining the goals. I conveniently didn’t focus too much on that part. It was great pay for me at the time. It was a great project. Maybe I just didn’t want to know what it would be used for. I got distracted.
There are other examples of when code is used in ways it wasn’t intended, and of code that does bad things.
A year and a day ago, a developer named Bill Sourour wrote a blog post. It opened with the line: “If you write code for a living, there’s a chance that at some point in your career, someone will ask you to code something a little deceitful – if not outright unethical.”
Bill had been asked to create a quiz that would almost always give a result that benefitted his client. Bill worked in Canada, and in Canada there are laws in place that limit how pharmaceutical companies can advertise prescription drugs to customers. Anyone could learn about the general symptoms a given drug addressed, but only patients with prescriptions could get specific information about the drug.
Because of this law, the quiz was posing as a general information site and not an advertisement for a specific drug. If the user didn’t answer that either they were allergic to the drug or already taking it, every quiz result suggested this specific drug. That’s what the requirements said to do, and that’s what Bill coded up.
The project manager did a quick test before submitting the website to the client. She told Bill that the quiz was broken: it always had the same answer. “Those were the requirements,” Bill responded. “Oh. Ok.”
A little while later, Bill got an email from a colleague that had a link to a news article. A young woman had taken the drug that Bill had built this quiz for. She had killed herself. It turns out that one of the main side effects of the drug were severe depression and suicidal thoughts.
Nothing Bill did was illegal. Like me, Bill was a young developer making great money. The purpose of the site was to push a particular drug - that’s why it was being built. He chalked it up to marketing. He never intended for this to happen. Maybe Bill got distracted too.
As his conclusion, Bill writes:
Bill’s story isn’t that far off from mine, but there are still other examples.
Earlier this year, a story came out that Uber had built into its ridesharing app code they call “greyball”. It’s a feature of their VTOS (or violation of terms of service) tool that can populate the screen with fake cars when the app is opened by users in violation of the terms of service.
In a statement, Uber said, “This program denies ride requests to users who are violating our terms of service — whether that’s people aiming to physically harm drivers, competitors looking to disrupt our operations, or opponents who collude with officials on secret ‘stings’ meant to entrap drivers.”
In practice, as The New York Times reports, it was used in Portland to avoid code enforcement officers working to build a case against Uber for operating without a license. When triggered by Uber’s logic, it populates the app with cars that don’t exist, with fake drivers who quickly cancel after accepting a ride.
I am not a lawyer, but it seems like this is likely an obstruction of justice, itself a crime outside of Uber’s unlawful operations in Portland. Greyball is used even today, though mostly outside the United States. I’m a huge fan of ridesharing - though I use a competitor in Austin and Boston called Fasten rather than the much larger Uber or Lyft. But it’s not uncommon to see in the news these days articles about heinous things these drivers are doing. Greyball may have enabled some of those.
Again, it’s an unintended consequence of a tool built. Maybe the greyball internal pitch was to “greyball” users who were in violation of the terms of service. People who were under 18, or who didn’t pay to clean up their late night explosive accidents one too many times for example. Rather than block them, probably causing them to create a new account, they could be put into an alternate dimension where for some reason they just couldn’t ever get a ride. That’s fine, right?
If these developers had thought about the worst possible case for how this could be used, maybe obstruction of an investigation into Uber’s shady dealings would have come up in that conversation and it could have been addressed early on. Maybe they were distracted by the face value of the request from looking deeper at the purpose and uses.
There’s all sorts of things as well that aren’t as black and white (if you’ll excuse the pun). Apps that always listen to the microphone to tailor ads to you based on what you say near your <phone, websites designed to exploit psychology to take up as much of your time and attention as <possible, and any number of apps that opt you into mailing lists when you sign up or purchase something. These aren’t nearly as obviously bad, but at least in my opinion they’re still kind of shady.
This value system is different for others. We don’t always agree as individuals what is right and wrong, or even with what should be legal or illegal.
There are actually words for things that society decides are good or bad versus what you or I individually believe: ethics and morals. While modern philosophy more or less uses these terms interchangeably, a common understanding at least between us will be important later.
Ethics are imposed by an outside group. A society, a profession, a community such as ours or even where you live. Religions provide ethical systems, as do groups of friends. Societies in whatever form define right and wrong, good and bad, and imposes those on its members. Ethics in societies such as local, state, and national groups are often, but not always, coded into laws.
Morals are a more personal version of the same thing. Society as a whole imposes its mores on smaller communities, and all of that trickles down to the individual level. That’s not to say that your morals can’t conflict with the ethics of society. For example, you might think that freedom of speech is a basic human right, but live somewhere that defacing religious or political objects is considered wrong.
Let’s not get distracted by morals and ethics yet, though. We’ll come back to them.
The unifying factor in all of the stories I’ve told is that a developer wrote the code that did these unethical or immoral things. As a profession, we have a superpower: we can make computers do things. We build tools, and ultimately some responsibility lies with us to think through how those tools will be used. Not just what their intention is, but also what misuses might come out of them. None of us wants to build things that will be used for evil.
The Association for Computing Machinery is a society dedicated to advancing computing as a science & profession. ACM includes this in their Code of Ethics and Professional Conduct:
So how can we “carefully consider potential impacts”? Honestly, I don’t have any answers to this. I don’t think that there really is a universal answer yet, because if we had it I have to believe we’d not be building these dangerous pieces of software.
I do have a couple of ideas though. One I got from my friend Schneems is to add to the planning process a step where we come up with the worst possible uses of our software. In opting in folks to an email list by default, the worst case might be that we send them a bunch of unwanted email and they unsubscribe. Maybe they even stop being a customer. As Schneems said: “Am I willing to sell my hypothetical startup’s soul for a bigger mailing list, when that might be all that keeps the company afloat? Yeah, no problem.” That makes sense to me. I don’t think it’s the best practice, but in the end it’s not physically hurting anyone. If I had sat down and thought about what the WiFi location app could be used for in the worst case, I would have come to a very different conclusion.
Actually, thinking about the worst possible uses of code could probably be a fun exercise. You might come up with some pretty wacky examples like “If we send Batman an email and he happens to have notifications on his iPhone for new emails, he might be looking at the notification when the Riddler drives by in the Riddler Car and he might not catch him before he gets off his witty one liner at the crime scene. Riddle me this, riddle me that, who’s afraid of the big, black bat?.” This isn’t so plausible, but it shows that these exercises can go down all sorts of different paths that aren’t obvious at a glance.
Another, the thing that I think I should have done, and that we can all do more of, is to simply not take requests at face value. The project owner at the Defense contractor I worked at didn’t spell out what the reason for the code was. But at least in retrospect, it wasn’t a big leap of logic. “We’re going to build an app to find WiFi signals” is all true, but it’s not the whole truth. Asking them, or myself, “why” enough times probably would have led me to a much earlier understanding. Why? To find the sources. Why? To go to them. Why? Why? Why?
Comedian Kumail Nanjiani, best known for the TV show Silicon Valley and his recent film The Big Sick, took to Twitter recently on this subject.
It’s a major problem when we’re given so much power in tech, but we’re not doing anything to ensure that we use it safely. Thinking about what we’re doing and being careful not to build things that can be used maliciously is really important.
Make your own decisions. Make your own choices. Make your own judgement.
Don’t get distracted by deadlines and feature requests. Think about the consequences of what you’re building. Build in safeguards to prevent misuse, or don’t build it at all because it’s too dangerous.
I’m asking you to do something about this. Well, I guess I’m asking you to not do something because of this. It’s only fair that we talk a bit about how and when to take a stand.
Let’s say I had a time machine and could go back in time to 2011 and do it all over again. I already have the foreknowledge that this tool is unethical. I’ve accepted this job. I’ve moved across the country from Tempe, Arizona to Brookville, Maryland. I’ve driven the two hour commute to Sterling, Virginia, home of 395 defense contractors awarded 16.8 trillion dollars in contracts over the past 16 years. It’s my first job out of school, and it’s my first day. I don’t have my clearance so I’m an intern. My new project owner introduces me to the team then pulls me into a side room to give me an overview of the project. What do I say?
I think the first thing is to establish a mutual understanding of the task. It’s entirely possible at this point that I don’t understand what the actual thing is, and that I’m overreacting. I ask “Why are we finding these signals” and the project owner says “We want to find people’s cell phones.” “Who’s finding them, and why?” I ask. “I don’t know, probably some soldiers in the Middle East.” “Why?” I repeat. “I can’t tell you that.”
“I can’t tell you that” is something I got a lot from this project owner. It’s code for “I have clearance and I know things about this project. I know what you’re asking and I know the answer but I am not allowed to tell you.”
At this point, I think we have a mutual understanding. The task is to help soldiers find people’s phones, probably attached to those people. The reason is left unsaid but we both know.
This organization is a defense contractor. They build things for the military. It is their core competency. They’re not not going to do this…. On the other hand, I care a lot about not killing people. The company’s goal is to build things for the military. If my goal is not to let this happen, then there isn’t a good fit for me at this company. This probably means that the worst case here is that I’m going to leave today without a job. Either I’ll say no and they’ll fire me, or I’ll say “that’s not something I’m comfortable with, best of luck” and quit. These are the worst case scenarios, not necessarily what will happen.
Before saying no then, I need to consider: Can I afford to leave here without a job financially? Am I likely to be able to rely on my network to get me another job? Have I built up a trust with my employer where I can go to them with this type of thing and feel confident that I’ll be heard out? The answer to these questions was no for me in 2011. Sometimes, something is important enough that you should still do something, but there’s a lot that goes into these decisions. I’d like to think that I would still say no.
Let’s look at another situation, where someone did the ethical thing. A developer we’ll call Alice received a strange request. We want to identify weak passwords in the system to notify users to change them. We’d like you to run a password cracking program on the very, very, large password database.
This was a long time ago, before aged passwords were common. Expiring old passwords wasn’t a straightforward option. Alice thought this was a weird request, but said that if the appropriate paperwork was completed she would be willing.
Alice received the completed paperwork and ran the password crack. The next request was “We’d like the list of users along with their weak passwords”. Alice knew that her coworkers had a valid desire to help customers improve their passwords. She also knew that users often re-used passwords. Combining the email and password into one report could allow someone to log into the customers’ accounts on other websites.
Alice pointed this out to her manager, and together they worked with the CSA team to design an email that didn’t include the password. Customers received notifications about their weak passwords, and there was less risk of the report falling into malicious hands. No one was fired and Alice built up trust within her team.
Different scenarios need different ways of analyzing what you should do. In some cases, the right thing to do say nothing and build the product. It isn’t a simple thing to make this decision.
But don’t get distracted by having to think through it. Sometimes your code can kill people.
- Does Technology Need to Be Ethical? Anil Dash briefly talks to The Atlantic about tech ethics.
- Is your smartphone listening to you? The BBC addresses whether tech companies use your phone to listen to what you’re saying while not using their apps, and if they even can.
- Know your ethical obligations regarding coding and documentation A blog post on how to define your ethical obligations as a programmer, some ways of dealing with them, and some real world examples
- ‘The Business of War’: Google Employees Protest Work for the Pentagon Google employees ask Google’s CEO not to build software to help the military build warfare technology
- Volkswagen America’s CEO blames software engineers for emissions cheating scandal VW’s CEO blames software engineers for changing how diesel emissions are reported during tests. Even if they were doing what they were told, someone up the chain can throw the blame back onto the programmers.
Glossary of Terms
- Gaussian Estimation Used in the WiFi geolocation algorithm to estimate free space path loss using probability density estimation. It said “it’s less likely to be nearby if you’re looking for it and more likely to be further out”.
- Genetic Algorithm A type of machine learning or searching that is inspired by the theory of evolution. It uses randomization to create populations of individuals and tests their fitness to determine which ones will reproduce. It was used to increase accuracy in the geolocation algorithm.
- Heroku A cloud platform as a service (PaaS) supporting several programming languages, that is used as a web application deployment model. It supports Java, Node.js, Scala, Clojure, Python, PHP, and Go.
- Kalman Filter A Kalman Filter can be used any time you have a model of motion and some noisy data that you want to produce a more accurate prediction.
- Memoization Remembering the result of a calculation based on its arguments, and then using that result instead of re-calculating if the same method call is made with the same arguments. Caleb’s team did this with a hash/dictionary/map (different names for the same thing) that contained the location pairs as keys and the distance between them as values.
- R2 Algorithm Used in the WiFi geolocation algorithm to measure the difference between expected and actual signal strength for each point in a search grid. The smallest difference is the most likely to be the correct distance from the source.