My short bio: There’s a big team behind this University of Michigan Coursera specialization and we want to share with you what we’re doing to bring applied data science and python skills to everyone! From pedagogy and technology through to curricular design and content please feel free to ask us anything! Want to know why we think python is great for data science? Or what it takes to put a MOOC together?

  • Christopher Brooks is faculty in the University of Michigan School of Information, and does research in learning analytics and educational technologies, such as predictive models of student success.
  • Kevyn Collins-Thompson is faculty in the University of Michigan School of Information and does research in information retrieval and text analysis.
  • Daniel Romero is faculty in the University of Michigan School of Information and does research in networks and complex systems.
  • V.G. Vinod Vydiswaran is faculty in the University of Michigan Medical School and the School of Information and does research in text mining and natural language processing, such as mining health information from patient records and social media.

In addition to the faculty, we are joined by our coordinators, Stephanie Haley and course tutorial assistant Filip Jankovic!

Here’s the course we are planning to teach: coursera.org/specializations/data-science-python

My Proof: http://i.imgur.com/DXaA0F2.jpg

Comments: 50 • Responses: 19

sexrockandroll7 karma

Why did you pick python as the language you're using to teach with?

UMichiganAI5 karma

There are a couple of reasons. First, Python is wonderful specifically for data science - there are lots of great libraries for machine learning (scikit-learn), natural language processing (nltk), network analysis (networkx) and basic visualizations (matplotlib). The data analysis and cleaning ability of python is great - I (Chris) am regularly writing up pandas manipulations to clean and transform research data.
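
As a rough illustration of the kind of pandas cleaning and transformation mentioned above, here is a minimal sketch. The data, the column names (`department`, `score`) and the variable names are all made up for the example and are not taken from the course materials.

```python
import pandas as pd

# Hypothetical raw data: inconsistent labels, stray whitespace, one bad score.
raw = pd.DataFrame({
    "department": [" math", "Math ", "physics", "Physics", "math"],
    "score": ["90", "85", "n/a", "70", "80"],
})

clean = raw.assign(
    # Normalize the labels so that equivalent values compare equal.
    department=raw["department"].str.strip().str.title(),
    # Convert to numbers; unparseable entries such as "n/a" become NaN.
    score=pd.to_numeric(raw["score"], errors="coerce"),
).dropna(subset=["score"])  # Drop the rows whose score could not be parsed.

# Transform: mean score per department.
mean_by_department = clean.groupby("department")["score"].mean()
print(mean_by_department)
```

Using `errors="coerce"` is one choice among several; depending on the research question you might prefer to flag or inspect the bad rows rather than drop them.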

Python also is a comprehensive programming language, so if you're a software developer you've got a full toolkit including multiprocessing and cloud computing libraries and not just a specialized stats language.

But we also took a look at what exists out there for free educational data science material - there are lots of great resources in R, but I think the python world was a little underrepresented, so we figured we would share our workflows (though I think all of us use a variety of tools when solving data science problems!).

UMichiganAI3 karma

Today has been filled with insightful conversations around data science and python - thanks to all who participated in this AMA by posing questions and sharing their thoughts!

If we haven’t gotten to your question yet we apologize and will try to circle back on it soon. For those interested in learning more about our work, check out our Applied Data Science with Python Specialization on Coursera: https://www.coursera.org/specializations/data-science-python

PM_ME_YOUR_DATASET2 karma

What value does your specialization offer the job seeker? I'm curious if you had that demographic in mind while designing the course.

EDIT: for example, some specializations have industry partnerships, or large / capstone projects to put on your CV.

UMichiganAI2 karma

All here: We are very interested in this demographic, and talked about how to support these learners at some length in course planning. This course is more introductory, so it depends on the kind of job you are seeking, and what other background (current employment, previous academic background, etc.) you might have. For instance, if you're a programmer who is looking to shift positions away from (say) front end development to business intelligence, we hope this specialization is for you. That's of course just one example of a job seeker!

We also hope to support students who are thinking of going to graduate school and want some solid skills to put on their applications.

And while we don't have an omnibus capstone, each of the courses ends in a larger project assignment. My experience in talking with learners who had done data science MOOCs was that, even if they paid for the specialization, they tended not to do the separate capstone project. So we wanted to try larger projects on a per-course basis to see if this would help create a compelling portfolio for learners!

In the end, I think the best bet for a job seeker is to differentiate themselves by applying their skills to a novel project that is wholly their own!

speakerforthe2 karma

Hey I'm a recent Michigan alumnus and I've started learning data science for fun. One of the problems with a lot of the resources I've read online is that they don't go into enough detail about how the math works. I feel more comfortable using methods, such as PCA, when I have an idea what the method is actually doing. How do you plan to address the fact that you need to have some experience with linear algebra, stats and diff eq to have a reasonable grasp of how basic data science methods work?

Also, will people have to pay for the quizzes in your coursera class?

Thank you and go blue!

UMichiganAI2 karma

Go Blue!

First, the quizzes are free: you can sign up for each course and get the full experience!

This course is very much applied in nature, so we're not expecting significant knowledge of linear algebra and diff eq, for instance. The aim is to make the course accessible to a broad group of learners. At the same time, we don't aim to hide the details, but to bring them about through other course resources.

Also, there are some excellent courses that go into detail on specific techniques, like Andrew Ng's course, which we hope will help fill in background for those who want to dig deeper.

OutbreakMonkey2 karma

Sounds interesting. How do you guys feel about R? I've found the data manipulation packages to be very strong but the language is horrible.

Can I replace my R with pure Python?

Second question. Will the course cover any data visualisation tools?

Love the Coursera MOOC concept. Good to see some interesting courses still being developed for it! Good luck with it!

UMichiganAI2 karma

Chris here: R is great, and a number of us use it regularly in our research work. It depends what you're doing in R as far as replacing it with python. R's got amazing stats libraries, and you can pretty much be guaranteed that when a new approach comes out, especially if it's a statistical approach, there will be an R library for it. I think that as data scientists there's a need to be a bit of a polyglot. At the University of Michigan students in the School of Information learn both R and Python depending upon the course.

I'll be teaching some data vis in the second of the five courses, looking at matplotlib and maybe seaborn and bokeh. The focus will be on charting and graphing, not really moving into 3D vis or highly interactive visualizations. We'll cover things like heatmaps, scatterplots, and violin plots, as well as some theory - Tufte and Cairo references will abound! This is still under development, so feel free to influence our path!

avo_1233_bro2 karma

Hi I heavily use SAS and I am only familiar with it. Compared to SAS is Python relatively better at creating subsets of data and looking at trends and performance and such? I have been curious about other softwares out there aside from SAS. Thanks!

UMichiganAI1 karma

Chris here: My apologies, but I'm not very familiar with SAS. Python is a general purpose programming language. We are teaching python for data science here, and will be talking about its statistical functions, its data manipulation functions, machine learning, text analysis, and social network analysis.

What I personally like about python is its flexibility. If you need to do image analysis, there are libraries for that. If you need to spawn multiple analysis processes across a cluster, there are packages for that. If you need to automate form filling in a website then scrape the results, you have that too. So tying your data analysis activities to your data gathering in a pipeline makes for a compelling proposition.

Where I would guess SAS excels is in the out of the box statistical functionality, like R.

kanjiwatanabe2 karma

What do you believe is a reasonable expectation for ROI for a paid data science specialization on Coursera? How does this compare to the expectations for other certification paths (e.g. post-graduate certificates)?

UMichiganAI2 karma

We all have lots of thoughts on this!

First, you can take all of the courses in this specialization along with assessments for free. You need to find the individual course and then there should be that enrollment option available. Of course, investment isn't just $, it's time too.

There are lots of different post graduate certificate options, and I think they differ heavily on price and how you take them. The coursera specializations are probably the cheapest and most flexible. Bootcamps come next, and have some significant constraints (location) and costs. Another option is university certificates, which require dedicated time and can have significant costs (e.g. in the case of 2 year Master's degrees).

Accessibility is another issue - if you live in a town that has meetups, a strong university, or boot camps, you have different resources available than if you are in a rural area (for instance).

I think the ROI argument probably comes down to understanding goals, background, and willingness to accept risk (e.g. move across the country and put out significant $). In offering this specialization we hope to help people get involved in applying data science while minimizing their costs and risk. But we teach at a residential university and think the on campus experience is amazing too, so it comes down to your risk profile in part.

But you asked about the ROI for the certificate. This is a tough question! We're starting to see students listing their co-curricular (e.g. moocs) work on their submissions to grad school. I think certificates like those from Coursera will help people not only get introduced to the area, but help get them that interview where they can pitch their case.

slpgh1 karma

Python is great for rapid development and "getting things done", but can be a nightmare to debug when things go wrong (compared to, e.g., Java where everything is so strict that the compiler often saves you from yourself). How do you solve that problem with newbie programmers in a MOOC?

UMichiganAI2 karma

Chris here: I'll side step the discussion of static vs. dynamically typed languages a bit, and focus on how we are supporting newbie programmers. There are really two innovations with this course that we are leaning on. First, the coursera platform has evolved to allow programming examples for in video quizzes. So instead of just multiple choice, learners can see scaffolded code and fill in some pieces to get immediate feedback on potential problem solutions. I think this is going to be awesome for supporting learning during the video.

Second, coursera has integrated the excellent jupyter notebook environment right into the course shell. So you don't have to download or set up anything in order to start programming, and we will have notebooks for all of the code examples in the lecture, allowing learners to not only follow along with the lecture but go on tangents to explore their own ideas.

To jump back to python v. java, I'm a big fan of both languages. What I appreciate about python for newbie programmers is the simple syntax and lack of boilerplate. We don't have to jump into a discussion of classes and inheritance in order to start writing code that does some basic data cleaning.

(n.b. I'm not the biggest fan of the syntax for interpreter hints for python, that might be a middle ground but that's probably another discussion)
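
To make the "no boilerplate" point concrete, here is a minimal sketch of basic data cleaning written with plain functions and lists, with no classes involved. It also uses Python's optional type hints, which may be what the note above has in mind; that reading is an assumption. The sample values and the names `raw_readings` and `parse_reading` are invented for illustration.

```python
from typing import Optional

# Hypothetical raw readings, as they might arrive from a spreadsheet export.
raw_readings = [" 12.5", "n/a", "7", "", "3.25 "]


def parse_reading(text: str) -> Optional[float]:
    """Return the reading as a float, or None if it is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


# Parse every entry, then keep only the ones that were valid numbers.
parsed = [parse_reading(reading) for reading in raw_readings]
cleaned = [value for value in parsed if value is not None]
print(cleaned)  # [12.5, 7.0, 3.25]
```

The hints are not enforced when the code runs; they are read by external checkers and editors, which is part of why they sit somewhere between fully dynamic and statically typed code.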

krdaito1 karma

How proficient does one need to be in Python going into the class to be successful?

UMichiganAI1 karma

All of us chiming in: If you don't know python but you have a programming background I think it's very attainable - we provide some material in the first week which will help bridge the gap. If you don't have a programming background or want a review, then we would recommend checking out Dr. Chuck's MOOC, "programming for everybody".

PuzzloGeek1 karma

Hello!

I'm interested in this specialization. When does it actually start in September? Will there be any limit on the number of students earning certificates per session? In other words, what is the deadline to enroll in September with certificate option? I couldn't find information on the start date on Coursera. I'm glad I found this webpage with more information.

Thanks!

UMichiganAI1 karma

Hi! The course is currently scheduled to launch 9/26 but you can enroll now. Hope that helps :)

PuzzloGeek1 karma

Currently Coursera shows that the first course runs from 9/26 to 10/9, so that is 2 weeks? There are a lot of topics in the syllabus - will they all be covered in just 2 weeks? Also, how long are the other 4 courses of the specialization?

Thanks!

UMichiganAI1 karma

Hi: No, each of the courses runs for four weeks, not sure why it is showing up as only two weeks on Coursera!

Ozmerg1 karma

When designing an online course are there parts that just don't scale to a mooc? What are the results of a mooc vs a traditional course in terms of student knowledge retention?

UMichiganAI2 karma

Chris here: Yes, there are challenges in scaling up some activities in particular, but we see some of that in traditional higher ed too. Discussion is the big one - MOOC discussion forums are largely unused by students, which means that there might be only hundreds or thousands of messages in the discussion forums. We see this in big first year courses too - how do you have a discussion with 300 people in a classroom?

Many MOOC faculty I've talked to handle this in a couple of ways. One is to engage in peer review - to break the discussion up into an activity like a short piece of writing which others in the course then have to grade or comment on. In this way the discussion boards aren't really used.

Another way is to have really focused discussion prompts. One of the instructors at UM, Caren Stalburg, teaches a course on instructional skills. She is deeply involved in the discussion forums, but has created her course to have very structured activities.

As far as retention, I don't know that I've seen literature on this. I think it likely comes down to learning by doing. If you don't do the programming (in this case), you won't gain the skills. There's only so much you'll be able to absorb through lectures (though that's fine if that's all you're looking for!). So one of the cool things Coursera worked on for us was integrating a coding environment (jupyter notebooks) right into the course experience. You don't have to install anything to start doing data science practice, which I think is going to be awesome.

(and we're starting to see this happen in traditional lecture halls too!)

_its_a_SWEATER_1 karma

Can this course aid in my eventual Python use for web development?

UMichiganAI2 karma

The better course would be that of our colleague Colleen van Lent, who offers a course specifically on web design (https://www.coursera.org/specializations/web-design), or that of Chuck Severance, who offers an introductory course on python (https://www.coursera.org/learn/python).

dkharms1 karma

Are you worried about people learning how to use tools without understanding the assumptions about data that they rely on?

UMichiganAI1 karma

Hi, Chris here. Yes, I think not understanding some of the assumptions not just about the data but about the techniques they might use is a concern. I see this all the time in, frankly, scientific publications. At the same time, there are technique gatekeepers, and I hope that these courses help to challenge that. To raise the overall level of competency with data science techniques, and hopefully to put some people on the path to even deeper studies of the area.

So in putting these courses together I think we're looking to start a journey, not necessarily finish one. And to clarify, the courses won't be devoid of theory or nuance around these issues! It's not just about teaching people to use toolkits, though teaching toolkits is also a goal.

rsxstock1 karma

What parts of Python do you think are valuable to almost any employer (especially those who are not into data analytics yet)? What should every user at least try to master?

UMichiganAI1 karma

Chris here: If you want to say you have mastered python the language, you should be prepared to effectively read and use regular expressions, lambdas, list comprehensions, generators, type annotations (including making your own), objects, and overloading of all of the basic operators. You should understand all of the built in libraries, with a particular emphasis on multiprocessing and process communication. Then come external libraries of interest, things like numpy which are heavily used, or sqlalchemy, flask, etc. This list depends a bit on what you are planning to do with python.
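
For readers unfamiliar with some of the language features listed above, here is a compact sketch that touches several of them in one place. The sample text and the `Vector` class are invented for illustration, and this is only a small taste, not a measure of mastery.

```python
import re
from dataclasses import dataclass

text = "alice: 90, bob: 72, carol: 85"

# Regular expression: capture (name, score) pairs from the text.
pairs = re.findall(r"(\w+):\s*(\d+)", text)

# List comprehension: convert the captured score strings to integers.
scores = [(name, int(score)) for name, score in pairs]

# Lambda: sort by score, highest first.
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)


# Generator: lazily yield a running total of the values it is given.
def running_total(values):
    total = 0
    for value in values:
        total += value
        yield total


totals = list(running_total(score for _, score in scores))


# Objects, type annotations and operator overloading: "+" for 2D vectors.
@dataclass(frozen=True)
class Vector:
    x: float
    y: float

    def __add__(self, other: "Vector") -> "Vector":
        return Vector(self.x + other.x, self.y + other.y)


print(ranked)                          # [('alice', 90), ('carol', 85), ('bob', 72)]
print(totals)                          # [90, 162, 247]
print(Vector(1, 2) + Vector(3, 4))     # Vector(x=4, y=6)
```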

And you probably should be aware of how to link in compiled C code to python in a wrapper fashion. You should also be able to debug code through an interactive debugging tool (I use pycharm) as well as performance-tune and profile code.
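
As one possible starting point for the profiling mentioned above, here is a minimal sketch using the standard library's `cProfile` and `pstats` modules. The function `slow_sum_of_squares` is a made-up stand-in for whatever code you actually want to measure.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Deliberately naive loop, used only as something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Run the function under the profiler and keep its return value.
profiler = cProfile.Profile()
result = profiler.runcall(slow_sum_of_squares, 100_000)

# Write the five most expensive entries (by cumulative time) to a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The timings in the report will differ from machine to machine, so treat them as a guide to where time is going, not as fixed numbers.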

Once you have done this I think you could be considered a python expert, but what is valuable to the employer depends on what you will be doing for them (database, data analysis, interactive apps, web services, etc.).

half_life179161 karma

What would you say to someone that is still in the process of obtaining a college degree, little math/statistics/programming knowledge, but have an enormous amount of interest in data science and wanting to pursue a career in this field? Also, what can you say about data analysis and the health care field?

UMichiganAI1 karma

Chris here: It depends a lot on what you want to do with data science and in what capacity. I think data analysis skills are becoming important for everyone - they help you think about the world, information, and computational resources in different ways. And I think a basic understanding of data analysis and how it is applied is important for communicating in an increasingly data-driven world.

If you're an undergraduate student I think this course is a good place to start. It requires some programming and stats knowledge, but nothing that can't be learnt from existing online resources like the Python for Everybody specialization by my colleague Chuck Severance. If you want to continue the technical burn, adding in courses like Andrew Ng's Machine Learning course is an excellent next step.

You ask about the health care field. Vinod Vydiswaran is teaching the fourth course in the specialization on text mining, and he is interested in understanding how people communicate about their health in online forums. This is a great example of one way you could go with data science, but there are many more. From diagnosing disease in individuals and studying population health to making predictions from genetic sequences, there are lots of applications of data science in health care. The question you should ask yourself is: are you a data scientist who applies the craft to health/medical domains, or are you a health/medical expert who understands data science and can do your own relevant analyses?

There is room for both of these roles, but knowing which one you are aiming for might help in choosing options during undergraduate study. And shamelessly I'll share that the School of Information at the University of Michigan has a Master's of Health Informatics (and Vinod teaches in that specific track!).

Mighty-Monata1 karma

Would completing programming for everybody specialization (which I currently take) provide enough knowledge for me to take this specialization or should I wait to finish first year of my (in CS) University?

UMichiganAI1 karma

Chris here: Yes, I think it would provide enough of a background, especially if you are planning to go into a technology field and consider yourself to be keen. Some of the later courses get more intense and more challenging, as they require some basic statistics knowledge, but I think this is generally achievable by any CS student in the late part of their first year or in their second year of undergraduate study. I think this specialization would help you experience techniques that you might not normally get to experience until you are a senior undergraduate.

k3ithk1 karma

I haven't used coursera in quite some time, but as I recall the courses were free. This one appears to have a fee. Can I take the course for free if I don't want the certificate?

UMichiganAI2 karma

Chris here: Yes! The course is available completely free. It's a bit circuitous if you want to sign up for all five courses for free. To do this you have to find each course's individual page then enroll from there. At the moment coursera doesn't allow signing up for a specialization for free.

Here's a list of the five courses that make up the specialization:

https://www.coursera.org/learn/python-data-analysis
https://www.coursera.org/learn/python-plotting
https://www.coursera.org/learn/python-machine-learning
https://www.coursera.org/learn/python-text-mining
https://www.coursera.org/learn/python-social-network-analysis

Also, if you're unemployed, under employed, or otherwise can't afford the fee but the certificate is valuable to you, Coursera has a financial aid option (I think you have to do this for each course as well): https://www.coursera.org/payments/finaid?cartId=5014988

FanOfGoodMovies1 karma

Do many companies now use CPython?
How do you ensure a MOOC gets proper publicity and supplementary texts?

UMichiganAI3 karma

Chris here: Python is certainly one of the top data science languages, along with R. There are many other tools of course, SPSS, SAS, STATA, etc. Python is particularly nice because of the large toolkit support (nltk, networkx, scikit-learn, pandas, matplotlib) for data science workflows.

For publicity we rely on word of mouth, the coursera portal, and of course activities like a reddit ama. For supplementary material I feel that there are plenty of solid data science resources on the web we can link learners out to - kaggle is a great example, where someone might want to take this specialization then get engaged in kaggle competitions to hone their skills.