A Review of Beautiful Data edited by Toby Segaran & Jeff Hammerbacher
As Google Director of Research Peter Norvig says in his essay on data for a natural language corpus, “Most of this book deals with data that is beautiful in the sense of Baudelaire: ‘All which is beautiful and noble is the result of reason and calculation.’” Admittedly, Beautiful Data isn’t the book I expected, but it was captivating nonetheless.
Based mainly on its title and cover image, I anticipated Beautiful Data would be something like Jonathan Harris meets a college math book. However, the book is neither filled with visual representations of information nor heavy on equations. Rather, it’s a collection of individual essays all loosely tied to the topic of data usage.
Don’t look for how-tos, lectures, or any mechanics. This book is the liberal arts version of computer science. Thirty-nine individuals, each with varying degrees of experience and all covering an array of industries, present stories on how they use data and how data have influenced their work. Their case studies and examples sometimes touch on the philosophy of data and analysis and occasionally are intimate portrayals of the marvels and shortcomings experienced when dealing with information.
Offering alluring glimpses into the projects and data centers of organizations like Facebook and Google, graduate student research, and the making of a Radiohead music video, these are stories of data experiences. Beautiful Data feels a bit like bedtime reading for the data scientist, statistician, or programmer. When you’re done going through the stacks, logs, and code of the day, choose a story and learn how other people are capturing, moving, and understanding bits of information.
I began with the essay on applying aspects of user experience to data collection, “Beautiful People: Keeping Users in Mind When Designing Data Collection Methods.” The story starts with the notion that researchers can often get better data from users who don’t specifically know that they are being surveyed. For those times when a researcher must explicitly ask users to complete a survey in order to gather valuable data, the authors go through a case study on designing a demographic survey to collect perceptions of luxury products.
I enjoyed the essay from Jeff Hammerbacher, formerly a data manager at Facebook and now a founder of Cloudera. More than just an inside look at the struggles Facebook encountered as its databases expanded during the site’s rise to popularity, the essay, “Information Platforms and the Rise of the Data Scientist,” is the story of how Hammerbacher went from a 17-year-old hiding out in his local library to a research scientist at Facebook, despite his “potentially suboptimal background.”
To return to Peter Norvig: his piece, “Natural Language Corpus Data,” has a mundane title, but it is an essay I wish I could have read in college while I was deep in linguistic corpora and word frequencies. In amazingly straightforward terms, Norvig breaks down what a linguistic data corpus is and what kind of information the Google n-gram corpus holds. The essay is more technical than many of the others and includes the Python code and calculations specific to his discussions, but non-programmers should be able to skip the more technical details and still come away with a strong understanding of the ideas. As Norvig explains of the data points in the Google corpus, it isn’t merely the collection and aggregation of data that makes it beautiful; “The data is beautiful because it represents much of what is worth saying.”
Beautiful Data: The Stories Behind Elegant Data Solutions is published by O’Reilly Media. You can purchase a copy from O’Reilly, Amazon, or other booksellers. I received a free download of this book from O’Reilly to write this review, but I chose the book based on my own interests in data and analysis.