Wednesday, December 2, 2015

Charles Pensig: Recommendations for Data People

Charles Pensig wrote an article on his displeasure with his formal data science program at UC Berkeley. I'm grateful to have had a great experience going back to school part-time, mostly because of the connections I've made through group work and my university's ability to bring in top government and corporate organizations to work with our program on projects. When I get my portfolio together, it will be clutch. I sympathize with Charles, and he dropped a great list of recommendations for learning more about data. While I'm locked indoors attempting to stay warm this winter, I'm planning to get to know Mr. Data Science really well.
Recommendations for Data People
Sign up for some online classes, get a pile of books, schedule two hours into every week night, and sit at an empty desk working through them. Don't leave the desk.  Here are sets of resources you can look into, in order of importance, whether you're learning this for the first time or as a refresher:


  • SQL: If you can't get data, you can't analyze data.  Whether you retrieve data from a SQL database or a Hadoop cluster with a SQL-language layer on top of it, this is where you start.  http://sqlschool.modeanalytics.com/ is a great interactive learning interface.  O'Reilly's SQL Cookbook is a masterpiece that traverses all levels of SQL proficiency.
  • Full-Stack Data Science: Coursera offers a full stack online curriculum on a continuous basis for a reasonable price.  This DOES NOT teach you SQL.  If you're in SF or NYC, you can attend General Assembly's pricier in-person full stack curriculum.  This gives you a cursory introduction to data storage, retrieval, prep, light analysis, and deeper predictive and inferential analysis.
  • Python: Codecademy or Udemy will teach you the basics.  Python can serve two functions in the skill stack: 1) to conduct ad-hoc statistical analysis as you would with R, 2) to do everything else.  Python is important for the "everything else."  You might use it to get data from APIs, scrape, write ETL jobs, refresh data in your warehouse, or retrain models.  This is the piece of the skill stack that moves you from being a Static Data Scientist (one who works with data in a manual fashion) to a Live Data Scientist (one who has automated many of the processes contributing to data science output, loosely defined).
  • Basic Statistics: Khan Academy Probability and Statistics. 
  • Linear Algebra and Multivariable Calculus: Go to a local college or Khan Academy to brush up on Multivariable Calculus and Linear Algebra.  The curricula for these subjects have largely been the same for the past five decades.
  • MapReduce/Hadoop: Focus on this last. There are so many technologies that enable SQL-like interfacing with Hadoop that knowing how to write a MapReduce job is, for the most part, not necessary. Building real MapReduce pipelines is a behemoth of a task that might be the work of an early-stage startup Data Scientist, but shouldn't be if you have a solid BI infrastructure team. This is why companies hire the rockstars we know as backend and data engineers.  Side note: if you ever meet one and aren't sure what their company does, thank them for their service to our country, regardless.
  • Cleaning: Plan to spend most of your time cleaning and transforming in these languages/technologies.  The analysis is the fast and fun part.
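
To make the SQL, Python, and cleaning items a bit more concrete, here is a minimal sketch of a tiny "ETL job" of the kind the list describes. It is only an illustration, not anything from Charles's article: the sample records, the `visits` table, and the function names `clean` and `load_and_summarize` are all made up for this example, and it uses Python's built-in `sqlite3` module in place of a real warehouse.

```python
import sqlite3

# Hypothetical raw records, standing in for data pulled from an API or a scrape.
RAW_ROWS = [
    {"city": " Chicago ", "visits": "12"},
    {"city": "chicago", "visits": "8"},
    {"city": "Boston", "visits": ""},  # missing value
    {"city": "BOSTON", "visits": "5"},
]


def clean(rows):
    """Trim whitespace, normalize case, and drop rows with no visit count."""
    cleaned = []
    for row in rows:
        visits = row["visits"].strip()
        if not visits:
            continue  # skip rows where the count is missing
        cleaned.append((row["city"].strip().title(), int(visits)))
    return cleaned


def load_and_summarize(rows):
    """Load cleaned rows into an in-memory SQLite table and total visits per city."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (city TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", clean(rows))
    result = conn.execute(
        "SELECT city, SUM(visits) FROM visits GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(load_and_summarize(RAW_ROWS))
```

Most of the lines go to cleaning, and the analysis itself is a single SQL query, which is pretty much the point of the last item on the list.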

Wednesday, October 14, 2015

Power Searching with Google

I feel like I've been a walking Google advertisement lately, but I'm learning so much about Google and what they offer. I'm visiting Southern California for vacation this winter, and on my agenda is a chat session with a few people from the Engineering and Design team at Google LA. I think they do amazing work and I can't wait to pick their brains.

In the meantime, if you want to learn about all the cool and fascinating functions of Google Search, there's a whole class on it--and it's free! Check it out here: https://coursebuilder.withgoogle.com/sample/course

Friday, October 9, 2015

Google News Lab and More

From the Desk of Sarah Doody. Google News Lab is kind of amazing!

Google News Lab helps journalists learn and master digital tools to help them tell better stories. See examples here of how the New York Times, Washington Post, NPR, BuzzFeed, & more have used Google News Lab’s tools.


Google Cardboard is a cardboard viewer that helps people experience virtual realities. Can’t afford an expensive virtual reality viewer? No problem. Just get this cardboard viewer from Google, put your phone in it, and voila.


Google Field Trip is an app for Apple and Android that lets you learn more about places of interest around you. Unlike discovery apps that focus on food or shopping, Field Trip helps you learn about historical places, monuments, museums, architecture and more. I’m definitely trying this out in Europe next month!

Monday, October 5, 2015

The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals

My current read, when I can squeeze in the time: http://www.amazon.com/Accidental-Data-Scientist-Opportunities-Professionals/dp/1573875112