Have You Been Pwned: A Q&A with Troy Hunt
Troy Hunt, creator of Have I Been Pwned, discusses the beginnings of his project, how he validates the breached data he acquires, and his thoughts on the upcoming GDPR legislation.
In our latest podcast, Troy Hunt joined our hosts Will Gragido and Thomas Fischer to talk about Have I Been Pwned, his outlook on GDPR and how to develop better security practices. For the full podcast, see below or head over to iTunes or Google Play. Check out the following excerpt for highlights of our burning questions for Troy.
How did Have I Been Pwned start?
Well, the project goes back about four years and at that time I was working with Pfizer as an architect. Basically, in my experience, an architect is normally the sort of thing you do once you've been doing a good job of development. Then they go, “Hey you should stop doing that. You should come do something else that's progressing.” The problem that I really had with that was I like building stuff, but I also like my career progressing and it’s this thing we all get torn between as technology people. So, I was doing my architect thing and you know I guess drawing UML diagrams and stuff like that and I just really missed the hands-on work.
I actually built up Have I Been Pwned in a plane or a hotel in Manila. I was like, “Hey, this would be fun. I’ll just wrangle this up.” I had a bunch of data like from the Adobe data breach that had just happened. The first implementation of Have I Been Pwned was something that was pretty simple and pretty quick and all of those same architecture decisions back then have prevailed through to today and they’ve actually turned out to be reasonable decisions.
When you were in London back in June, you were talking about how sometimes people actually contact you with breached data. How do you deal with the legal and ethical issues behind that and how do you verify that the data is actually valid?
That's a good question. Got any other questions? Nah, nah, I'll answer. It’s key. I am very transparent about how the data comes in and how I make these decisions. At one end of the scale, we’ve got cases where the data is deliberately redistributed for maximum impact. The number one goal of these folks is to get that data out there as broadly as possible. So obtaining that data is not hard because it’s everywhere. At the other end, we have incidents where a database just has no password, or data backups are published to publicly facing web servers. I’ll give a couple of scenarios of how we’ve handled things.
About a year ago, someone popped up and said, “I’ve found a database backup of the Red Cross Blood Service in Australia.” This ended up being a big story because it was our largest ever database breach down here and there were over half a million records involved. My record and my wife’s were both in there. They have data like names, birth dates, eligibility questions, like some really sensitive stuff in there. This is a case where the database had been inadvertently published to a publicly facing web server. Someone just found the database, got in touch with me, and I went through a local CERT (Computer Emergency Response Team) and they sort of handled the process with the Red Cross. While there were no indicators that anyone had accessed the data, I just wanted to give assurance to people that this thing had been cleaned up.
Then there’s this crazy, more recent South African data breach where someone found a publicly facing database backup of about 27 GB of data on basically everyone in South Africa. For some reason, all of that information was required by a real estate agent, who backed it up. The files dated all the way back to 2015, so this database had been sitting there for ages. Other people have this data and it’s not a case that’s just going to go back in the bottle. In this scenario, I’m going to put it in Have I Been Pwned and we’re obviously going to notify all the appropriate authorities, and it’s become a big news story. In terms of legalities, it’s just a massively grey area, so I have to take every incident case by case and handle it in the most responsible way I can think of, one that will make both the organization and the impacted parties aware. So far, it’s worked quite well.
What do you think is the biggest challenge the cyber security industry faces globally today?
I think it’s a developer competency thing. The thing I just see over and over again is people building systems with mistakes we’ve known about for so long. All the stuff in the OWASP Top 10 we just see over and over again. Particularly in this era of Cloud and DevOps, where people are given access to things that go all the way through the pipeline and into the production systems, it’s very often the same folks who build these systems that are putting those MongoDBs up there with no passwords on publicly facing networks, or backing their databases up to web servers. It all ties back to the mistakes being made by individuals. The underlying issue is the competency of the people building the systems, and what we’re seeing now is that there are enough factors out there causing those mistakes to be more visible than ever before.
Should we be doing something better to actually teach better security practices or is it just that our industry is growing so big now that we’re kind of just rounding up as many people as we can and we’ll deal with best practices later?
Many organizations build a project and wait until they get to the end of it to check if the security is okay. The software will be built and right at the point where they’re about to go live, they get the good security folks to go through it but there’s no budget or time. It’s sad and amusing at the same time. We’ve also got a huge number of people in this industry that have no formal education at all, but whether you have a formal or informal education, we often don’t have security as part of that. It almost seems ancillary. We need to ingrain security because if it’s not part of what you’re doing from the beginning, then it’s always going to be an uphill battle.
Everybody is talking about GDPR nowadays because of the impact it's going to have. You deal with data breaches and data breach notifications. Do you think the GDPR is going to wake up the industry and wake up companies?
I guess the hope we have is that we actually see some organizations that do a bad job of security get hit with the 4% of gross annual revenue fine. I often use TalkTalk as an example. They say the breach cost them about £42 million but the fine that they got from the ICO was like £400,000 off revenue of billions. Apparently it’s the biggest fine the ICO has levied. That’s like the equivalent of if you earned $100,000 and you got fined $20. It’s lunch and that worries me.
Now if they had gotten hit with the max amount under the GDPR, that £400,000 fine turns into £72 million. Now that actually hurts. I hope that will change behavior. At the moment, most of the noise that’s being made about GDPR seems to be from lawyers. I’m really keen to see what happens once we hit May next year. What difference will it actually make? GDPR is just unifying existing legislation as well. We’ve had data protection laws effective in the EU for years, just implemented differently between the states. My fear is that it won’t actually change a lot and the biggest beneficiaries will be folks trying to sell their services and scare people. GDPR talks a lot about extraterritoriality though and how it will apply to other countries. Let’s see how much it actually changes behaviors in other places outside the EU. Until we see, say, a Chinese-based company getting stung by a regulator in Belgium for losing a lot of EU data and receiving some sort of financial sting that incentivizes them enough to try and get it right, I can’t see the legislation making much of an impact.
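The fine comparison Troy makes can be checked with some quick arithmetic. The sketch below is only illustrative: the interview gives revenue as "billions", so the revenue figure here is an assumption inferred by working backwards from the quoted £72 million at 4%, and the helper name `scale_to_income` is hypothetical.

```python
# Figures quoted in the interview.
ICO_FINE_GBP = 400_000          # fine TalkTalk received from the ICO
GDPR_MAX_FINE_GBP = 72_000_000  # quoted maximum under the GDPR
GDPR_RATE_PERCENT = 4           # 4% of annual revenue

# Assumption: revenue is not stated in the interview; it is inferred
# from the quoted maximum fine (72 million is 4% of 1.8 billion).
implied_revenue_gbp = GDPR_MAX_FINE_GBP * 100 // GDPR_RATE_PERCENT


def scale_to_income(fine, revenue, income):
    """Scale a corporate fine to the same proportion of a personal income."""
    return fine / revenue * income


# Equivalent fines for someone earning 100,000 a year.
ico_equivalent = scale_to_income(ICO_FINE_GBP, implied_revenue_gbp, 100_000)
gdpr_equivalent = scale_to_income(GDPR_MAX_FINE_GBP, implied_revenue_gbp, 100_000)

print(f"Implied revenue: £{implied_revenue_gbp:,}")
print(f"ICO fine, scaled to a 100,000 income: about {ico_equivalent:.0f}")
print(f"GDPR maximum, scaled to a 100,000 income: about {gdpr_equivalent:.0f}")
```

Under that inferred revenue, the ICO fine scales to roughly 22 on a 100,000 income, in line with the "$20" analogy, while the GDPR maximum scales to 4,000.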
Intro/outro music: "Groovy Baby" by Jason Shaw, licensed under CC BY 3.0 US