Note: We now have a website with up-to-date information here: http://signaldatascience.com/.


(This post is coauthored with Robert Cordwell.)

We’re writing to announce the inaugural run of Signal Data Science’s intensive training program.

The program will train students in the core skills needed to work as a professional data scientist:

  • Scraping and cleaning data
  • Exploring and analyzing data using statistics
  • Presenting findings
  • Interviewing

By the end of the course, you’ll be able to start with raw data and produce analyses like the one in Bayesian Adjustment of Yelp Ratings. More to the point, you’ll understand why Jonah structured the analysis the way he did and be able to do the same yourself.

You’ll also be able to produce cool visualizations like this automatic grouping of Slate Star Codex posts by topic, as shown below.

Why data science?

Making inferences from data is fundamental to understanding the world, and there’s a growing unmet need in industry for people with the relevant skills. With good instruction and a good peer group, smart, motivated people can quickly develop enough proficiency to get jobs in the tech sector (starting compensation ~$115k in the San Francisco Bay Area).

Why us?

The Program

We offer inquiry-based learning (no boring lecturers or unmotivating problem sets!) and an unusually intellectually curious peer group. Far from what’s typical of college classes, our model has more in common with the Math Olympiad Summer Program, where daily lectures are interspersed with on-the-spot problems and followed by long-form problems designed to build on the lesson.

Robert Cordwell is an IMO gold medalist and educational startup veteran who’s working a Facebook data science job despite his limited, self-taught experience. He’s going to be teaching math problem solving, overall presentation skills, and how to break interviews.

Jonah Sinick is a data scientist with 13 years of experience making advanced math accessible to beginners, a PhD in math from the University of Illinois, and an extensive body of published work. He’ll be teaching a comprehensive technical curriculum.

Who is this for?

If you:

  • Are interested in data science
  • Are passionate about learning new things
  • Would benefit from a social environment with others working toward the same goal
  • Have the programming skills to solve simple algorithms problems
  • Plan on applying for data science jobs after the program

our program will be a good fit for you.

Where / When

The first cohort will run in Berkeley for 6 weeks, from February 1st – March 18th. This will be a compressed version of the standard course that we’ll be offering in the future, and is targeted at students who have a high degree of comfort with math.

In the future we’ll be offering longer courses that cover the mathematical / statistical material at a gentler pace.

Cost

For students in our first 6-week cohort, we offer two options:

  • Payment of $8,000 at the start of the program.
  • A “pay later” model where students pay 8% of their first year’s salary (pretax, spaced over 6 months), contingent on getting a data science job.

This is roughly 50% of the standard price for coding / data science bootcamps.
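To compare the two payment options, here is a rough Python sketch of the arithmetic. The $115k salary is the Bay Area starting figure quoted earlier in this post, used only as an example; the function and variable names are made up for illustration, and splitting the deferred amount into six equal monthly payments is an assumption about what "spaced over 6 months" means.

```python
UPFRONT_COST = 8_000   # option 1: paid at the start of the program
RATE_PERCENT = 8       # option 2: share of pretax first-year salary


def pay_later_cost(first_year_salary, rate_percent=RATE_PERCENT):
    """Total owed under the pay-later option for a given first-year salary."""
    return first_year_salary * rate_percent / 100


# Example salary taken from the ~$115k figure mentioned above.
example_salary = 115_000
deferred_total = pay_later_cost(example_salary)

# Assumption: the deferred amount is split into six equal monthly payments.
monthly_payment = deferred_total / 6

# Salary at which both options cost the same.
break_even_salary = UPFRONT_COST * 100 / RATE_PERCENT

print(f"Pay-later total at ${example_salary:,}: ${deferred_total:,.0f}")
print(f"Approximate monthly payment: ${monthly_payment:,.2f}")
print(f"Break-even salary: ${break_even_salary:,.0f}")
```

Under these assumptions, the pay-later option costs more than paying up front for any first-year salary above the break-even point, in exchange for owing nothing if you don't get a data science job.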

Next steps

If you’re interested in participating in our first cohort, or in staying posted, please get in touch with us at signaldatascience@gmail.com.

9 comments

Hey Jonah, have you thought about doing some causal inference stuff in the full length course?

Thanks for the suggestion. That would be wonderful. We'll definitely think about this – it's a matter of whether we can create a sufficiently simple presentation of the material so that the marginal returns per unit time are high for the student population that we'll be working with.

Let's chat about it sometime. I am very interested in wide exposure for this type of stuff, and I think it is very useful to think about this for people working on all sorts of data that happens to be biased relative to their questions.

My usual domain is medicine and healthcare, but I went to this talk recently where people worry about questions like "this ad received this many clicks if it was on top of the page, what would have happened had another ad been on top." This is a counterfactual question that causal inference deals with.

From my point of view, a good learning outcome would be: "people are aware of the problem, people know where to go for more reading, people know simple things to try."

I am interested. I've sent an email about it.

Neat! Here are the first questions I have:

  • Do you require applicants to have a graduate degree?
  • Zipfian Academy, App Academy, and other bootcamps are 12 weeks long, and (the first instance of) this one is only 6 weeks long. Why is this, and what are you cutting out relative to other data science bootcamps to make it this short? (This is my most pressing question).
  • As a tie-in to my last question, is there a hiring event which employers will be invited to around the end of the program?
  • Do you know which language(s) you'll be using?

Good luck; do keep us posted.

Thanks for your interest! Some responses below.

Do you require applicants to have a graduate degree?

No degree is required. We're selecting on ability rather than on credentials.

Zipfian Academy, App Academy, and other bootcamps are 12 weeks long, and (the first instance of) this one is only 6 weeks long. Why is this, and what are you cutting out relative to other data science bootcamps to make it this short? (This is my most pressing question).

  1. Based on the preliminary interest that people have expressed, we anticipate that the students in our first cohort will be significantly stronger than is typical of data science bootcamps, and will correspondingly be able to cover the material at an accelerated pace. We expect at least some of our cohorts to run a full 12 weeks.

  2. Regarding the comparison with coding bootcamps, there are reasons to believe that the amount that somebody needs to know to be in the top x% of industry data scientists is less than the amount that's needed to be in the top x% of programmers. (I can elaborate.)

  3. We're cutting out some of the more advanced machine learning algorithms, which industry data scientists use infrequently enough that they can be a distraction from getting started.

As a tie-in to my last question, is there a hiring event which employers will be invited to around the end of the program?

Very few of the bootcamp students I know got their jobs through this route, so we may or may not do this, depending on how efficient it is relative to other routes. Like other bootcamps that offer the "pay later" model, we have a large stake in ensuring that our students find jobs.

Do you know which language(s) you'll be using?

We'll be working primarily in R, and teaching SQL as well.

Thanks for the response! I'm impressed that you expect that your first cohort will be so strong, since I presume that you're competing with already-established data science bootcamps for students. Again, good luck.

I hope you throw SQL into your core skills bucket list.

Yes, we'll definitely be covering this.