Amy Heineike is the Director of Mathematics at Quid, a startup that seeks to understand technology development and dissemination through data analysis. She was the first employee at Quid, where she helped develop their technology early on. She has been recognized as one of the top Big Data Scientists. As a part of our ongoing interview series, we talked to Amy about data science, Quid, and how statisticians can get involved in the tech scene.
Which term applies to you: data scientist, statistician, computer scientist, or something else?
Data Scientist fits better than any of the others, because it captures the mix of analytics, engineering and product management that is my current day-to-day.
When I started with Quid I was focused on R&D - developing the first prototypes of what are now our core analytics technologies, and working to define and QA new data streams. This required the analysis of lots of unstructured data, like news articles and patent filings, as well as the end visualisation and communication of the results.
After we raised VC funding last year I switched to building out our data science and engineering teams. These days I jump from conversations with the team about ideas for new analysis, to defining refinements to our data model, to questions about scalable architecture and filling out Pivotal Tracker tickets. The core challenge is translating the vision for the product back to the team so they can build it.
How did you end up at Quid?
In my previous work I’d been building models to improve our understanding of complex human systems - in particular the complex interaction of cities and their transportation networks, in order to evaluate the economic impacts of Crossrail, a new train line across London, and the implications of social networks for public policy. Through this work it became clear that data was the biggest constraint - I became fascinated by a quest to find usable data for these questions - and that’s what led me to Silicon Valley. I knew the founders of Quid from university, and approached them with the idea of analysing their data according to ideas I’d had - especially around network analysis - and the initial work we collaborated on became core to the founding technology of Quid.
Who were really good mentors to you? What were the qualities that helped you?
I’ve been fortunate to work with some brilliant people in my career so far. While I still worked in London I worked closely with two behavioural economists - Paul Ormerod, who’s written some fantastic books on the subject (most recently Why Most Things Fail), and Bridget Rosewell, until recently the Chief Economist to the Greater London Authority (the city government for London). At Quid I’ve had a very productive collaboration with Sean Gourley, our CTO.
One unifying characteristic of these three is their ability to communicate complex ideas in a powerful way to a broad audience. It’s an incredibly important skill: a core part of analytics work is taking the results to where they are needed, which is often beyond those who know the technical details, to those who care about the implications first.
How does Quid determine relationships between organizations and develop insight based on data?
The core questions our clients ask us are around how technology is changing and how this impacts their business. That’s a really fascinating and huge question that requires not just discovering a document with the answer in it, but organizing lots and lots of pieces of data to paint a picture of the emergent change. What we can offer is not only being able to find a snapshot of that, but also being able to track how it changes over time.
We organize the data firstly through the insight that much disruptive technology emerges in organizations, and that the events that occur between and to organizations are a fantastic way to signal both the traction of technologies and to observe strategic decision making by key actors.
The first kind of relationship that’s important is of the transactional type: who is acquiring, funding or partnering with whom. The second is an estimate of the technological clustering of organizations: what trends do particular organizations represent? Both of these can be discovered through documents about the organizations, including government filings, press releases and news, but doing so requires analysis of unstructured natural language.
We’ve experimented with some very engaging visualisations of the results, and have had particular success with network visualisations, which are a very powerful way of allowing people to interact with a large amount of data in a quite playful way. You can see some of our analyses in the press links at http://quid.com/in-the-news.php
What skills do you think are most important for statisticians/data scientists moving into the tech industry?
Technical statistical chops are the foundation. You need to be able to take a dataset and discover and communicate what’s interesting about it for your users. To turn this into a product requires understanding how to turn one-off analysis into something reliable enough to run day after day, even as the data evolves and grows, and as different users experience different aspects of it. A key part of that is being willing to engage with questions about where the data comes from (how it can be collected, stored, processed and QAed on an ongoing basis), how the analytics will be run (how will it be tested, distributed and scaled) and how people interact with it (through visualisations, UI features or static presentations?).
For your ideas to become great products, you need to become part of a great team though! One of the reasons that such a broad set of skills is associated with Data Science is that there are a lot of pieces that have to come together for it to all work out - and it really takes a team to pull it off. Generally speaking, the earlier stage the company that you join, the broader the range of skills you need, and the more scrappy you need to be about getting involved in whatever needs to be done. Later-stage teams and big tech companies may have roles that are purer statistics.
Do you have any advice for grad students in statistics/biostatistics on how to get involved in the start-up community or how to find a job at a start-up?
There is a real opportunity for people who have good statistical and computational skills to get into the startup and tech scenes now. Many people in Data Science roles have statistics and biostatistics backgrounds, so you shouldn’t find it hard to find kindred spirits.
We’ve always been especially impressed with people who have built software in a group and shared or distributed that software in some way. Getting involved in an open source project, working with version control in a team, or sharing your code on GitHub are all good ways to start on this.
It’s really important to be able to show that you want to build products though. Imagine the clients or users of the company and see if you get excited about building something that they will use. Reach out to people in the tech scene, explore who’s posting jobs - and then be able to explain to them what it is you’ve done and why it’s relevant, and be able to think about their business and how you’d want to help contribute towards it. Many companies offer internships, which could be a good way to contribute for a short period and find out if it’s a good fit for you.