Wednesday, December 2, 2015

Charles Pensig: Recommendations for Data People

Charles Pensig wrote an article on his displeasure with his formal data science program at UC Berkeley. I'm grateful to have had a great experience going back to school part-time, mostly because of the connections I've made with group work and my university's ability to leverage top government and corporate organizations to work with our program on projects. When I get my portfolio together, it will be clutch. I sympathize with Charles, and he dropped a great list of recommendations for learning more about data. While I'm locked indoors attempting to stay warm this winter, I'm planning to get to know Mr. Data Science really well.
Recommendations for Data People
Sign up for some online classes, get a pile of books, schedule two hours into every weeknight, and sit at an empty desk working through them. Don't leave the desk.  Here are sets of resources you can look into, in order of importance, whether you're learning this for the first time or as a refresher:


  • SQL: If you can't get data, you can't analyze data.  Whether you retrieve data from a SQL database or Hadoop cluster with a SQL-language layer on top of it, this is where you start.  http://sqlschool.modeanalytics.com/ is a great interactive learning interface.  O'Reilly's SQL Cookbook is a masterpiece that traverses all levels of SQL proficiency.
  • Full-Stack Data Science: Coursera offers a full stack online curriculum on a continuous basis for a reasonable price.  This DOES NOT teach you SQL.  If you're in SF or NYC, you can attend General Assembly's pricier in-person full stack curriculum.  This gives you a cursory introduction to data storage, retrieval, prep, light analysis, and deeper predictive and inferential analysis.
  • Python: Codecademy or Udemy will teach you the basics.  Python can play two functions in the skill stack: 1) to conduct ad-hoc statistical analysis as you would with R, 2) to do everything else.  Python is important for the "everything else."  You might use it to get data from APIs, scrape, write ETL jobs, refresh data in your warehouse, or retrain models.  This is the piece of the skill stack that moves you from being a Static Data Scientist (one who works with data in a manual fashion), to a Live Data Scientist (one who has automated many of the processes contributing to data science output, loosely defined).
  • Basic Statistics: Khan Academy Probability and Statistics. 
  • Linear Algebra and Multivariable Calculus: Go to a local college or Khan Academy to brush up on Multivariable Calculus and Linear Algebra.  Their curriculums have largely been the same for the past 5 decades.
  • MapReduce/Hadoop: Focus on this last. There are so many technologies that enable SQL-like interfacing with Hadoop that knowing how to write a MapReduce job is, for the most part, not necessary. Building real MapReduce pipelines is a behemoth of a task that might be the work of an early-stage startup Data Scientist, but shouldn't be if you have a solid BI infrastructure team. This is why companies hire the rockstars we know as backend and data engineers.  Side note: if you ever meet one and aren't sure what their company does, thank them for their service to our country, regardless.
  • Cleaning: plan to spend most of your time cleaning and transforming in these languages/technologies.  The analysis is the fast and fun part.
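
To make the SQL, Python, and cleaning items above a little more concrete, here is a minimal sketch (not from Charles's article) that pulls rows with SQL and then cleans them in Python. The table name `visits`, its columns, and the sample rows are all made up for illustration, and Python's built-in sqlite3 module stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical messy rows, standing in for data from a real source.
RAW_ROWS = [
    ("  Alice ", "2015-12-01", "42"),
    ("BOB", "2015-12-01", ""),       # missing value
    ("alice", "2015-12-02", "17"),
]


def load(conn):
    """Create the (hypothetical) visits table and fill it with the raw rows."""
    conn.execute("CREATE TABLE visits (name TEXT, day TEXT, minutes TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", RAW_ROWS)


def clean(row):
    """Trim and lowercase the name; turn minutes into an int or None."""
    name, day, minutes = row
    minutes = minutes.strip()
    return (name.strip().lower(), day, int(minutes) if minutes else None)


def minutes_by_name(conn):
    """Retrieve with SQL, clean in Python, then total the minutes per person."""
    rows = conn.execute("SELECT name, day, minutes FROM visits ORDER BY day, name")
    totals = {}
    for name, _day, minutes in map(clean, rows):
        if minutes is not None:          # skip rows with no usable value
            totals[name] = totals.get(name, 0) + minutes
    return totals


conn = sqlite3.connect(":memory:")   # throwaway in-memory database
load(conn)
totals = minutes_by_name(conn)
print(totals)
```

Even in this toy case, most of the lines go to loading and cleaning rather than analysis, which is the point of the last bullet.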

Wednesday, October 14, 2015

Power Searching with Google

I feel like I've been a walking Google advertisement here lately, but I'm learning so much about Google and what they offer. I'm visiting Southern California for vacation this winter, and on my agenda is a chat session with a few people from the Engineering and Design team at Google LA. I think they do amazing work and I can't wait to pick their brains.

In the meantime, if you want to learn about all the cool and fascinating functions of Google Search, there's a whole class on it--and it's free! Check it out here: https://coursebuilder.withgoogle.com/sample/course

Friday, October 9, 2015

Google News Lab and More

From the Desk of Sarah Doody. Google News Lab is kind of amazing!

Google News Lab helps journalists learn and master digital tools to help them tell better stories. See examples here of how the New York Times, Washington Post, NPR, Buzzfeed, & more have used Google News Lab’s tools.


Google Cardboard is a cardboard viewer that helps people experience virtual realities. Can’t afford an expensive virtual reality viewer? No problem. Just get this cardboard viewer from Google, put your phone in it, and voila.


Google Field Trip is an app for Apple and Android that lets you learn more about places of interest around you. Unlike discovery apps that focus on food or shopping, Field Trip helps you learn about historical places, monuments, museums, architecture and more. I’m definitely trying this out in Europe next month!

Monday, October 5, 2015

The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals

My current read, when I can squeeze in the time: http://www.amazon.com/Accidental-Data-Scientist-Opportunities-Professionals/dp/1573875112

Tuesday, September 15, 2015

REPORT TO THE PRESIDENT AND CONGRESS ENSURING LEADERSHIP IN FEDERALLY FUNDED RESEARCH AND DEVELOPMENT IN INFORMATION TECHNOLOGY

Can someone please read this and give me a summary? :)

Guess that's my job, though. #sigh

The Report

White House Blog Post

Data Analysis with Pipes

NIH Frontiers in Data Science Series

Lecture Title: Data Analysis with Pipes

Please join us for a lecture by Hadley Wickham, Chief Scientist at RStudio and Adjunct Assistant Professor at Rice University. He is the author of several of the most revolutionary, influential, and popular software packages for the R statistical software environment including dplyr, ggplot2, reshape2, and numerous others. This lecture is sponsored by the NIH Office of the Associate Director for Data Science in conjunction with the National Cancer Institute.

Hadley Wickham
Chief Scientist at RStudio
and Adjunct Assistant Professor
Rice University

When: Wednesday, September 16, 2015, 2:30-3:30 pm
Where: Building 40, room 1201/1203

Abstract: Over the last year and a half, three things have had a profound impact on how I develop tools for data analysis: Rcpp, writing the advanced R book (http://adv-r.had.co.nz/) and the pipe operator (%>%, from magrittr). In this talk, I'll focus on the pipe operator and how it’s influenced the development of tidyr, dplyr and ggvis, the next generation of reshape2, plyr and ggplot2. Come along to learn about why I think pipelines are awesome and see how pipelines + tidyr, dplyr, and ggvis can make your data analysis fast, fluent and fun.
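
For readers who haven't seen a pipe before, here is a rough analogy in Python (not R, and not taken from the talk). It assumes a small hypothetical helper called `pipe` and made-up data, and it only shows the general idea: a pipeline reads left to right, step by step, instead of inside out. It is not how magrittr's %>% is implemented.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass value through each step in order, left to right (hypothetical helper)."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Made-up readings with a couple of missing values.
readings = [3, None, 8, 5, None, 10]


def drop_missing(xs):
    return [x for x in xs if x is not None]


def keep_large(xs):
    return [x for x in xs if x >= 5]


def mean(xs):
    return sum(xs) / len(xs)


# Nested calls read inside out...
nested = mean(keep_large(drop_missing(readings)))

# ...while the piped version reads in the order the steps happen.
piped = pipe(readings, drop_missing, keep_large, mean)

print(nested == piped)
```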

Links of interest:
http://had.co.nz/
http://priceonomics.com/hadley-wickham-the-man-who-revolutionized-r/
http://www.r-bloggers.com/a-conversation-with-hadley-wickham-the-user-2014-interview/

Feel free to contact Sean Davis (sdavis2@mail.nih.gov) or Michelle Dunn (dunnm3@od.nih.gov) with questions.

Recent Developments in Artificial Intelligence - Lessons from the Private Sector

NIH Frontiers in Data Science Series

Lecture Title: Recent Developments in Artificial Intelligence - Lessons from the Private Sector

Andrew Moore
Dean of the School of Computer Science
Carnegie Mellon University

When: Monday, September 21, 2015, 12:00-1:00 pm
Where: Building 10, Lipsett Auditorium

The lecture will be videocast and archived at: https://videocast.nih.gov
Please join us for a lecture sponsored by the NIH Office of the Associate Director for Data Science in conjunction with the National Library of Medicine. Dr. Andrew Moore will discuss some of the big developments in computer science from the perspective of someone crossing over from industry to academia. He will talk about roadmaps for AI-based consumer and advice products in the commercial world and contrast with some of the potentially viable roadmaps in healthcare. Dr. Moore will also touch on entity stores (aka knowledge graphs), question answering and ultra-large data center architectures. Please visit the event page at https://datascience.nih.gov/community/datascience-at-nih/frontiers for more information.

Andrew Moore is the Dean of the School of Computer Science at Carnegie Mellon University. His areas of research and expertise include decision and control algorithms, statistical machine learning, artificial intelligence, robotics, and statistical computation for large volumes of data. Dr. Moore previously served as the VP of Engineering at Google Pittsburgh where he was responsible for the retail segment: Google Shopping. He was involved with a number of Google/University activities, two examples of which were Google Sky (in collaboration with CMU, Hubble Space Telescope Center and University of Washington) and the Android SkyMap app.

Reasonable Accommodation: Individuals with disabilities who need Sign Language Interpreters and/or reasonable accommodation to participate in this event should contact Sonynka Ngosso at 301-402-9816 and/or the Federal Relay (1-800-877-8339). Requests should be made at least 5 business days in advance of the event.

Thursday, September 10, 2015

Indiana Clinical and Translational Sciences Institute

I'm hoping to help build collaborations with the librarians from IU for a joint Purdue-IU meeting. We're getting together this week at the Indiana Clinical and Translational Sciences Institute Conference, where I'll hear about what's going on in the world of translational research and meet a few new people.

See the agenda here!

Wednesday, September 2, 2015

My -- Obsession

I'm expected to write a lot now--briefs, memos, reports, reviews, articles, etc. I know I have lots of room for improvement so I've joined a faculty writing group.

The writing group is interdisciplinary, and broken into groups of four. On my team are an Anthropologist, an African American Studies person, and someone from the English department. We had our first session, were asked to write a two-page report on a topic, and critiqued each other's work. I learned that I use '--' way too much, especially in place of commas. The universal comment: 

"Do not use dashes to set apart material when commas would do the work for you."

Ok, fine. I just love dashes so much--apparently a little too much.

"The dash is a handy device, informal and essentially playful, telling you that you're about to take off on a different tack but still in some way connected with the present course--only you have to remember that the dash is there, and either put a second dash at the end of the notion to let the reader know that he's back on course, or else end the sentence, as here, with a period." 
--Lewis Thomas

Friday, August 28, 2015

JHU Data Science Hackathon

I hit the ground running and am already in the midst of a few projects. One upcoming event I really look forward to is the JHU Data Science Hackathon, a collaboration between the National Institutes of Health and the Johns Hopkins University Biostatistics department, to give data scientists and future data scientists (me!) a chance to meet and solve/hack through real-world problems. I've been spending a lot of time practicing R and can't wait to talk with a few experts.

The hackathon is September 21-23, in Baltimore, MD. More info here.

The UX Notebook

One of my favorite UX resources is Sarah Doody's UX Notebook and blog--it's super helpful, especially for new UXers. Sarah's also surprisingly responsive and answers emails with excellent advice--I speak from experience!

Check her out here: Sarah Doody

UX and Health Literacy by Communicate Health

I enjoy how CommunicateHealth has combined UX with Health Literacy. Sandy Hilfiker gave a great presentation a few weeks ago: https://webmeeting.umd.edu/p6nme78c1vz/

Babies’ brains stimulated by reading moms - Cape Times

One of the things I love most about my position is the encouragement and funding for involvement in international research. I find health literacy efforts like the one at Tygerberg Children's Hospital in Cape Town, South Africa inspiring and can't wait until I'm established enough to start my own.

Babies’ brains stimulated by reading moms - Cape Times

Wednesday, August 26, 2015

Something New

Hi!

I'm Bethany--I recently accepted a faculty position at a Big Ten school, am wrapping up a second Master's degree, and have been accepted into the first cohort of the Design 4 Learning project. To keep my sanity, and keep track of my professional life, I've turned to blogging to help organize my thoughts. This is the beginning!

A bit about me--I just moved to Lafayette, IN from the DC area and though I miss DC, I'm settling into Indiana quite nicely. I'm originally from South Carolina and it reminds me a lot of home. I have a Master of Library and Information Science from the University of South Carolina (2010) and a graduate Certificate in Health Sciences Librarianship from the University of Pittsburgh (2011). After graduating with my MLIS I spent the summer of 2010 as a fellow at the Library of Congress, Congressional Research Service, then was hired by Howard University as the Allied Health Sciences Librarian. I spent a total of 5 years in DC, and just accepted a position as Assistant Professor of Library Science at Purdue University, where I serve as the Health Sciences Information Specialist.

My position at Purdue is tenure-track, which means I'm expected to excel in teaching, research, and community involvement. There's always room for improvement, especially when learning new resources and access policies, but I'm generally comfortable teaching and have a personal passion for community involvement--no worries. I only have the vaguest idea where I want to go with my research though.

I'm working on an MS in Interaction Design and Information Architecture, which I expect to finish in the summer of 2016. My research interests are in health literacy and user experience design. This semester, I'm taking Advanced Information Architecture, and received an override to take a doctoral-level Research Methodologies class. My reading lists are about 6 inches of articles each week.

The Design4Learning project will be interesting--I accepted my place because it's a remote program that allows you to work at your own pace. I will need that flexibility and I'm hoping I can squeeze the work into my free time. I'm really hoping the teaching and learning strategies will help me manage my classroom, on and offline.

I'm pretty busy learning and teaching, but I still make time for fun. I've already joined a gym and found release in BodyPump and BodyFlow classes. Purdue has a stellar aquatics center with an Olympic-sized pool, a regular pool, a jacuzzi, a sauna, etc. It's amazing. I've been church-hunting and am taking advantage of faculty events to meet new people.

I also love traveling, good food and wine, and photography.

In past attempts, I've been terrible at keeping an academic blog. I hope I'll be better at this--I'll give it my best shot!