I'm originally trained as a physicist, but migrated toward computer science in grad school. After some internships with Google I got into real-world analytics and data science consulting. I've worked with clients in all sorts of industries, but especially companies in the web space. The consulting firm I work for was recently acquired by Teradata.

I also have a more light-hearted blog at www.fieldcady.com. I talk about stuff related to math - everything from educational policy to neuroscience. I self-published a Kindle book on the subject called "What is Math?"

I'll start answering questions around noon PST. Thanks!

Comments: 452 • Responses: 58

MomSaysImAGenuis43 karma

So Brian Greene did an AMA a while back, and one of his responses caught my eye.

Someone asked:

Hi, Brian! Do you think that math is a sort of objective truth that is discovered, rather than made up?

and he responded:

Is the universe fundamentally mathematical? Surely seems so. But I could imagine that one day we encounter an alien civilization and they say "So, show us what you've found to explain the universe" and we open our math-filled texts. To which they chuckle "Oh, math. We tried that. Only takes you so far. Here's how to REALLY understand reality..."

What's your take on this?

fieldcady23 karma

That's a fantastic quote from him - thanks for bringing it up! I agree with him, except that I wouldn't even go so far as to say the universe seems mathematical; basic physics is the only thing that fits hand-in-glove. Everything else (biology, climate, sociology, etc.) becomes either touchy-feely or not mathematically tractable. Math seems to me like an outgrowth of human cognition, although I have no idea what an alternative would be.

dlb100142 karma

Hello,

Thank you for holding this AMA. Couple questions:

  1. This should be easy: what do you consider the definition of Big Data to be?

  2. What would you say are the biggest challenges facing big data research today?

  3. If you had a magic lamp and could wish for anything in order to help solve a problem in big data research today, which problem would you want solved first?

Thanks

fieldcady18 karma

Sure thing!

  1. "Big Data" is right now maybe 50% buzzword, and as such there's no litmus test for it. However, there are two trends that have converged in a big way in recent years, and are collectively called big data. The first trend is the most straightforward: you have more and more data. It becomes "big" around the time that it won't fit on one computer anymore and you start needing to use a cluster to work with it meaningfully, and programming a cluster rather than one computer can be a very different beast. The second trend is that the data is more complicated in its structure. In the past, data was more likely to be so-called "structured data", like a SQL table with nice orderly rows and columns. "Unstructured data" is more likely to include things like computer log files, documents, or deeply nested data that doesn't have rows and columns. Several recent pieces of technology, most notably Hadoop, make it WAY easier to process large and unstructured datasets.

  2. I'm afraid I don't work much on the pure research end so it's hard to say. But the constant competition between different technologies shows that people haven't really figured out what are the best programming paradigms to use. Map-reduce is less dominant than it used to be, and there's a lot else on the market. Figuring out those best practices is the main hurdle in my mind.

  3. Does asking the genie for venture funding count? :) More seriously though, I would probably ask for a way around map-reduce's performance bottlenecks, especially in doing joins.
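To make the join bottleneck in point 3 a bit more concrete, here is a minimal single-machine sketch of a reduce-side join, a common way to join two datasets in map-reduce. The tables and records are made up for illustration, and this is plain Python rather than Hadoop's actual API. The thing to notice is that every record from both tables has to be grouped by key before any output can be produced; on a real cluster that grouping means moving data between machines.

```python
from collections import defaultdict

# Hypothetical toy tables: (user_id, name) and (user_id, page)
users = [(1, "alice"), (2, "bob")]
clicks = [(1, "/home"), (2, "/cart"), (1, "/checkout")]


def map_phase(users, clicks):
    """Tag each record with its source table and emit (join_key, tagged_value)."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, page in clicks:
        yield user_id, ("click", page)


def shuffle(pairs):
    """Group all values by key; on a cluster this step moves data between machines."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every user record with every click record."""
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "user"]
        pages = [value for tag, value in groups[key] if tag == "click"]
        for name in names:
            for page in pages:
                yield key, name, page


joined = list(reduce_phase(shuffle(map_phase(users, clicks))))
print(joined)
```

With two small lists the shuffle is trivial, but the same structure applied to two multi-terabyte tables forces nearly all of the data through the grouping step, which is one reason joins are a sore spot.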

LevitatingTurtles19 karma

I would like your comment on this joke that I heard a few weeks ago:

Big Data is a lot like sex in high school:

  1. Everybody says they are doing it.
  2. Very few people are actually doing it.
  3. The ones that are doing it, are really bad at it.

How accurate is that to the state of big data analytics in either business or academia?

honeyplease2 karma

Funny and almost accurate, but point 3 is not. There are some communities heavily invested in doing proper empirical inferences, taking the problem of inference in all its complexity, choosing appropriate data and methods, as well as making sure that the conclusions are made within the bounds of what the data can say.

Edit: To the guy who started this AMA, you have to, umm, actually respond to some questions.

fieldcady5 karma

I pretty much agree with you on this one. It's a weird industry in that there is a ton of cool new, value-adding stuff going on, but there's also a huge amount of hype and everybody is trying to jump on the bandwagon. Many of them are (sorry if I offend) MBA types who don't actually understand anything beyond what they read in Newsweek, and they love to grab at the titles.

dcbedbug14 karma

Would you consider the title "Big Data Scientist" to be different from a modern day statistician?

Nickdangerthirdi5 karma

I like this question because I used to work with people who insisted I refer to them as data scientists. I never really thought they were special enough for the title scientist when all I could see that they did was crunch lots of numbers.

fieldcady4 karma

ugh, they sound like douchebags...

fieldcady1 karma

I would. The term "data scientist" gets used and abused all over the place these days, since it's "sexy". To my mind the difference is that data scientists solve problems that require a lot of software engineering in addition to the math. Statisticians tend to work with well-defined datasets that you might tackle with R or SAS or something. If you have to do weird database hacking to access the data, or write a custom loader that reformats the data, then it starts to blur into data science. But there's no hard dividing line.

Interestingly, a lot of big data scientists (including myself) use relatively little stats. If you have a million datapoints then tons of things are statistically significant; the question is more about defining the best features than testing hypotheses.
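The point about a million datapoints can be illustrated with a quick back-of-the-envelope calculation. This is only a sketch under simplifying assumptions: a two-sample z-test with equal, known standard deviations, and an observed difference that exactly equals a made-up true effect of 1% of a standard deviation. The function name and numbers are invented for the example.

```python
import math


def two_sided_p_value(effect_in_std_devs, n_per_group):
    """p-value of a two-sample z-test for a given difference in means.

    Assumes both groups share the same known standard deviation and that
    the observed difference equals the true one (no sampling noise).
    """
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_in_std_devs / standard_error
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2.0))


tiny_effect = 0.01  # hypothetical: a difference of 1% of a standard deviation

for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p_value(tiny_effect, n))
```

The same practically negligible effect is nowhere near significant with a hundred points per group and overwhelmingly significant with a million, which is why significance alone stops being the interesting question at that scale.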

whitecompass12 karma

When does Excel stop being enough? Also, SAS or R?

fieldcady5 karma

I don't do any professional work with Excel (I mostly use Google spreadsheets) so I can't say. For SAS and R it depends on how much memory your machine has, and how you're using them. With R in particular it's easy to start really abusing memory, depending on the library you use. If your data is taking up a significant fraction of your RAM I would start backing away from R.

PuffsPlusArmada11 karma

Are you dangerous?

fieldcady5 karma

Not particularly, unless you're scared of ads that you're 2% more likely to click on :)

There are definitely privacy issues associated with big data, but I think they're nowhere near as big as people make them out to be.

stevierar9 karma

What's the biggest dataset you've ever worked on? I'm talking filesize or rows if applicable.

What's the most involved and interested you've felt in the data you were researching? Ever done anything with climate change?

Have you ever had a view you've held changed by the results you've generated? I'd find that very cool, after triple-checking!

Thanks for doing the AMA. I'm a web developer but often find little data challenges within projects and always enjoy them.

fieldcady4 karma

Biggest dataset was, if memory serves, a few terabytes. So not as big as you might think.

The most interesting clients I have (I work for a consulting firm) are ones that aren't doing ads. For example, recently I've been working with a computer hardware manufacturer, trying to help them diagnose problems on their assembly line. It's not terribly sexy, but it's saving them money and making the world more efficient.

Definitely! A big recent one was that I was trying to predict how much advertisers would bid on human impressions. I thought that I would be able to make decent predictions about the bids based on demographics (ads for sports cars might get higher bids from men, that kind of thing), but it just didn't work. I eventually figured out that the reason was that 90% of the bidding behavior was based on whether the person had seen a given ad in the recent past, rather than anything about the actual person.

Gaywallet2 karma

Biggest dataset was, if memory serves, a few terabytes. So not as big as you might think.

I find it interesting you say this is the biggest dataset you've worked with, and yet you say it's not big data until you move away to a cluster.

We don't have a cluster, and we have several tables I regularly work with that are TBs in size.

Then again, our database performance is complete shit and I've been advocating we move to a cluster and work on some better disk arrays.

fieldcady3 karma

Yeah there's no hard cutoff. It's all a sliding scale.

For the record, the several terabytes was one of a number of tables on the cluster.

biesterd19 karma

I've just started working towards my masters degree now in data/computational science. Any advice?

fieldcady6 karma

Depending on your background, learn to chug out good code that does non-trivial stuff. That's the biggest thing - I reject interview candidates all the time because they can't do something simple in a real language. Learn to write code that works, and that is easy to understand/modify. It's amazing how shitty the code is that brilliant people write sometimes, and they end up being useless.

If this isn't a problem for you, then definitely make sure you are familiar with Big Data technologies like Hadoop and Spark.

Finally, I am a huge fan of learning on the job, so try and do internships with real companies if at all possible. I have two masters degrees and I feel like 90% of what I learned before joining the workforce came from two internships at Google.

eskal1 karma

How do you learn what "good code" vs. "bad code" is? I use R but don't have much formal programming methodology training, I'm learning as I go. Seems like every week I find a better way to do the same tasks. I wrote a code sample to send with my resume to potential employers but I'm afraid of it having methodology errors that make sense to me but make me look like a bad programmer to everyone else.

fieldcady3 karma

The way I learned was through brutal code reviews at Google. The best description that I've heard is that good code boils down to two words: not thinking. That is, somebody (maybe you in a few months) should be able to easily read through your code and understand what it does and how it does it. A moment of "wait, what does that variable refer to again?" or "boy, isn't there a clearer way to do this?" is the fault of the programmer. I recommend trying to nurture the habit of re-reading your own code and being really anal about making sure that it's readable.

The advice to read lots of other people's code is REALLY good too. Good code is deceptive; it seems trivially straightforward when you read it, but that's just because it's written well. All code should seem straightforward.
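As a small illustration of the "not thinking" idea, here are two versions of the same kind of filter. Both the data layout and the names are hypothetical, invented only for this example; the point is how much the reader has to reconstruct in their head.

```python
# Harder to read: the reader has to stop and work out what d, x, and index 2 mean.
def f(d):
    return [x for x in d if x[2] > 0.5]


# Easier to read: the names and the constant carry the meaning.
CLICK_PROBABILITY_THRESHOLD = 0.5


def select_likely_clickers(impressions):
    """Return impressions whose predicted click probability exceeds the threshold."""
    return [
        impression
        for impression in impressions
        if impression["click_probability"] > CLICK_PROBABILITY_THRESHOLD
    ]


likely = select_likely_clickers(
    [
        {"user": "a", "click_probability": 0.9},
        {"user": "b", "click_probability": 0.1},
    ]
)
print(likely)
```

The second version is longer, but nobody reading it has to ask what a variable refers to.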

Brewtopian8 karma

How did you get into the big data field? How would someone already in another career field get started in big data?

Humanunkind10 karma

As someone who did a jump to big data, here are some tips:

  1. Learn and become a beast at R, Python, SQL, and other languages.
  2. Work on random projects and build a portfolio that combines stats, calc, and scripts. It'll help present to whoever that you aren't completely without experience.
  3. If you don't have a stem degree - brush up on some stats, linear algebra, and calc. Be able to code with these theories and discuss them with pure bs if needed.

My background: Non stem degree, supply chain work, and programmed in vba/python beforehand.

fieldcady3 karma

Great answer! I come from a physics background originally, but I think the big tips are the same as for non-stem: learn to code (Python and R are the best languages), brush up on the math (especially basic stats) and get some experience.

The one thing I would add is that machine learning is extremely important, often more so even than statistics.

Random_Mochi1 karma

Hello, I saw your reply to someone's question and wanted to ask your input. I'm an accountant and am thinking about going into data science instead. I'm currently taking an online course for R programming. I don't have much experience in programming, SQL or statistics. What else should I be doing/reading to get experience?

fieldcady1 karma

R is a great place to start. I also strongly recommend learning

  • machine learning

  • the Python programming language

  • different types of visualizations, like scatterplots and bar charts.

Generally the biggest skill that potential data scientists lack is the ability to chug out working, decent-quality code. It'll probably be the hardest thing for you to pick up.

There are a ton of cool datasets available in the world. I suggest picking something you're interested in and playing around with it. I am a big believer in learning this stuff through doing it.

MentalSieve7 karma

Hey,

As a linguist gravitating towards computation, what is your book? Can you briefly describe how or why math is a language, as you say?

ze_ben5 karma

Yeah, as someone with a linguistics degree, I can smell the marketing bullshit in that title as well. I'm sure it's a great book about data and math, but I'm not sure I see the point of the "it's just like any other language" assertion.

fieldcady6 karma

Ok, "just like any other language" is pushing it a bit, but the connections between math and language are a lot stronger than you might think. The problem is that there is so much stuff in a natural language (culture, physiology of the mouth, etc) that's peripheral to the core syntax, so they look very different on the face. I'm not a linguist, and I don't think I have anything to say that sheds light on that discipline, but I do think that many mathematicians need a reality check.

fieldcady1 karma

It's a Kindle book called "What is Math?" I argue that it's a language in a few ways:

  • the cognitive processes underlying it seem to be the same, specifically with regards to the existence of recursive syntax

  • it's fundamentally just a tool for describing the world around us, rather than any of the "transcendent truth" bullshit that some mathematicians talk about

  • The seeming differences between math and natural language are just of degree. For example, some people would say that the defining feature of math is the use of proofs. That is wrong for SO many reasons, but one of them is that proofs are also used in things like law and philosophy. Deductive reasoning is a property of language in general.

The differences between math and natural language are ones of degree. Math uses more deductive reasoning. But most importantly, math describes things that we have a much harder time wrapping our heads around, so we rely very heavily on the language itself rather than "common sense" about the topics under discussion. The hardest part of using math responsibly is to develop that common sense.

link to the book if you're interested: http://www.amazon.com/What-Math-humans-speak-means-ebook/dp/B00LZLQPBQ/ref=sr_1_1?ie=UTF8&qid=1412791705&sr=8-1&keywords=what+is+math+field+cady

ProfessorKeenBean5 karma

What advice would you give to an up and coming Big Data Scientist, and what would you say the most important structures are for maximizing the potential of data collection and reporting? (i.e. languages, platforms, GUI builders, etc) I currently work in healthcare as a Quality Analyst and am always looking for ways to improve, or to grow my salary potential through professional development.

Thanks!

pgoupee1 karma

Get very familiar with TSQL and database structure in general. I work with a lot of big corporations that use the newer graphical tools to accomplish tasks but they don't seem to understand the data at a fundamental level. Mastering TSQL allows you to work with data at the microscopic level as well as in bulk.

fieldcady3 karma

Databases are certainly important, but looking at your other comments I think you're over-emphasizing them. There are way more important things for 90% of people...

pgoupee2 karma

You're probably right....I spend most of my time on infrastructure lately as I'm sure my comments reflect. I'm just astounded how many high level data people don't understand the overall importance that design has on functionality so I always stress it. Building datasets and analyzing data are two different realms.

fieldcady2 karma

I'm glad there are people like you!!! I spend a large portion of my time pulling my hair out because of infrastructure problems. And I don't know the best ways to solve them; in my job as a data scientist I'm usually just a client of fancy data systems. There's nothing worse than trying to get your work done with shitty infrastructure.

fieldcady1 karma

I'm not sure I totally understand your question. What do you mean by "structures"? As far as languages I'm a gigantic fan of Python (I just gave a conference talk about how great it is yesterday), and Scala is becoming a very big deal. I personally don't like R much, but it's also high on the list of languages to learn. You should definitely have some familiarity with the MapReduce paradigm, and with the Hadoop implementation in particular.
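For anyone who hasn't met the MapReduce paradigm, the classic first example is a word count. What follows is a minimal single-machine sketch in plain Python, not Hadoop's API; the function names and the input documents are made up for illustration. The idea is that the map and reduce steps are independent per document and per key, which is what lets a framework spread them across a cluster.

```python
from collections import defaultdict

documents = ["big data is big", "data is messy"]  # made-up input


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]


def group_by_key(pairs):
    """Shuffle step: collect all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, ones):
    """Reduce step: combine all values for one key into a single result."""
    return word, sum(ones)


mapped = [pair for document in documents for pair in map_words(document)]
word_counts = dict(
    reduce_counts(word, ones) for word, ones in group_by_key(mapped).items()
)
print(word_counts)
```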

znay3 karma

Hi

Thanks for doing this AMA. Just a couple of questions: 1) what were some of the more interesting projects you encountered?

2) were there any projects where you went into it thinking the results would come out one way but came out another way? Would it be difficult to explain to your customers that the data was very different from what they thought?

3) have you encountered resistance from users of your projects? How did you deal with that?

4) have you used social network analysis before? What were some of the more interesting/useful results you had?

Thanks!

fieldcady1 karma

1) It's hard to pick, there are so many. I really like the ones where I learn something about an industry I'm not familiar with, and for me that includes working with clients in computer networking, manufacturing, and online ad auctions.

2) Definitely! The biggest thing in explaining to customers is to provide some kind of alternative explanation for why what they expected wasn't there (ideally you figure out where the signal actually is and show it to them). Failing that, you need to show really strongly that there just isn't any signal. "I couldn't find the signal" generally isn't an excuse. Unfortunately some people take this kinda personally, but most don't.

3) Oh hell yes. Especially in a large organization there are often competing teams, one of which thinks Big Data (and esp the consultants, like me) is the next big thing, and one of which thinks it's a waste of money. Or that it's infringing on their territory. The politics can get really crazy; I've been in situations where one executive has forbidden me to do the work that another one has already contracted me to do, and technically I'm answerable to both of them.

4) Actually I personally haven't. I think a lot of that stuff tends to be proprietary within Facebook or Google. But in any case, I haven't personally done social networks.

lespinoza3 karma

I used to work as a program director for a non-profit, and we used huge data sets. The goal was to have some sort of predictive analytics regarding future educational attainment. Do you think that we can use big data to answer these kinds of social questions?

fieldcady1 karma

How huge do you mean? Most people who want big data would be better served by traditional analytics techniques. But yes, I think that there is a ton of potential good you could get out of applying machine learning or other analytics to the data.

You can definitely ping me too and I could give some suggestions about the best ways to address these questions. There are problems with SVMs, but there are a lot of techniques that could bear fruit.

latepostdaemon3 karma

What words of encouragement would you give to someone who's bad at math, but wanting to get really good at it?

fieldcady3 karma

My wife is the best example I can cite. She has historically considered herself bad at math, and struggled a lot in many of her math classes. But now, since having entered the workforce, she is an algorithm design engineer with dozens of patents and tremendous affinity for machine learning. She is way better than me in many areas of math, like number theory and signal processing, even though I'm the avowed mathematician.

There is such a thing as natural talent. But it comes in many different forms, only some of which are brought out in traditional education. You very likely have natural talent for some area of math, or for some approach to it that you haven't seen.

And even if, hypothetically, you have no natural talent for any part of math, it can still be learned. Ed Witten is arguably the best mathematical physicist in the world today. He was always one of the top students up through grad school, but I am told by friends-of-friends that he only became a superstar later, through hard work. Isaac Newton is an even better example; he was always a mediocre student, but became the greatest scientist of all time by working his ass off.

I am 100% convinced that any person of normal intelligence can learn any area of math. If you are interested, go for it!!

Humanunkind2 karma

Fellow Data Scientist here.

R or Python?

fieldcady2 karma

Python all the way, ever since I discovered the Pandas library. What about you?

_Count_Mackula2 karma

Were your Google internships how you were introduced to Big Data? Did you also take any courses? Any books to recommend for someone who is interested in Data Science and wants to get their feet wet?

fieldcady3 karma

Nope, Google is where I learned how to chug out high quality code for general software engineering. After grad school I did freelance consulting, and I had a long-term client who was using Hadoop. That's where I first got exposed to it, and then later I joined my current Hadoop-focused company. No courses or anything - it was all on-the-job or googling around.

A book that I really like is Data Analysis with Open Source Tools. I learned a ton from it when I was starting my current job.

sorinash2 karma

  1. How familiar were you with computer science and upper-level math before graduate school?

  2. Was it a Masters or PhD program (what I guess I'm asking is how much time you had)?

  3. How feasible is this switch?

fieldcady1 karma

  1. I did a dual major in physics and math in college (Stanford), so very familiar with higher math. I took the accelerated classes in coding and discrete math from the CS department, but that was all and I didn't learn the fundamentals until way later.

  2. Actually it was 2 PhD programs (applied math and CS), each of which I dropped out of with a masters. It took me a while to figure out academia wasn't for me - they added up to about 4 years.

  3. Extremely feasible! You can learn to code at any age or any point in your career. Math is harder to pick up, but the good news is that for most data science work you don't need that much of it.

RicsFlair2 karma

Are you a believer that finding cause is a misuse of time and resources?

fieldcady3 karma

No. I think causality is a useful and important paradigm for understanding the world, and there are sound ways to identify it (like A/B testing). Most of data science is focused on just finding patterns, but in many cases those patterns inspire rigorously testable hypotheses about what causes what. We make up narratives to explain the mechanisms of the causality, and those are harder to test, but causality is not passé.
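For readers who haven't run one, the arithmetic behind a basic A/B test can be sketched in a few lines. This is an illustrative two-proportion z-test with made-up visitor and conversion counts, not a description of any particular client's setup. The causal reading depends on visitors having been assigned to the two variants at random, which the code itself cannot check.

```python
import math


def ab_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-proportion z-test: return (z statistic, two-sided p-value)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)
    )
    z = (rate_b - rate_a) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant B converts 2.6% of visitors versus 2.0% for A.
z, p_value = ab_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because the only systematic difference between the groups is the variant they were shown, a small p-value here supports "B causes more conversions" rather than just "B is correlated with more conversions".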

MrB3982 karma

I notice you speak highly of TSQL. What is the largest advantage TSQL provides over someone not using it?

What is your opinion on MySQL? I've been able to accomplish very complex tasks with ease using MySQL along with .NET & code first, such as loading millions of records into memory, performing billions of searches (using parallel processing), and pushing results back to MySQL.

fieldcady2 karma

No, it's another guy who has been posting about TSQL. I have never used it myself - it's a Microsoft product, and I'm mostly open source. Sorry!

newanalyst2 karma

I too am a big data analyst/scientist (term seems interchangeable in the industry, just depends on which company you work for/how much coding you do) and I am wondering where you see the career path evolving if at all? Do you see it becoming a full division of companies in the future (i.e. a BI department akin to an R&D dept.) or simply as an internal assisting department like IT?

I know at the companies I have worked for it feels like it is being held stagnant in the support side, but as someone with access to the data and the insight that it gives I feel like it should be given much more importance in the overall structure of the business. After interviewing with Google I feel like they are one of the few companies (along with FB, eBay, and Yahoo) that are developing it into its own branch of the company that is intertwined with the other branches.

fieldcady1 karma

Thanks for the question, fellow DS! Boy I'm not certain, but it really seems to me like people are starting to figure out the key fact that everybody will need to be able to code. I think that's probably the biggest thing: people who work with data will be more and more expected to be able to write code that processes it. I suspect that "data science" will ultimately merge back in with analytics.

deeperest2 karma

Hi, thanks for the AMA!

  1. What kinds of problems are you looking to solve through big data? Do you have a pet solution space, or do you attack whatever is needed? What do you think is easy, or hard, in big data right now?

  2. What is your toolset of choice for BD analytics (feel free to expound as much as you want, from infrastructure decisions right up the stack)

  3. What's missing in big data right now? What do you need or want in order to make your life easier or allow you to solve previously unsolvable problems?

Thanks again!

fieldcady2 karma

  1. I'm pretty general purpose, not specific to any industry or analytics technique. The analytics part of my job tends to be easy. The hard part is ferreting out how different datasets relate to each other, weird pre-processing logic they have, etc. I'm a consultant so I have to ramp up for every new client.

  2. My programming language of choice is Python, along with pandas/scikit-learn/numpy. I used to use Python as a scripting language and R for the numerical stuff, but I abandoned R when I discovered the Pandas library. For BD I generally use Hadoop, specifically a combination of Hive and Pig (Hive is better for simple stuff, Pig is better for complicated workflows). But recently I've become convinced that I need to learn more Spark.

  3. The biggest problem in the BD industry, at least for me, is how messy so many people's data is. Right now there are companies that have been storing up data for years, throughout many changes in schema and whatnot, and are just now trying to actually analyze it in a coherent way. It can be a real rat's nest with little to no documentation. This is only an issue with individual companies, but it's a problem for an awful lot of them.

WognI2 karma

Hi. I'm a senior with a BS in physics, math, and I have taken upper level CS courses. On top of this, I've done nuclear physics research for the past few years, a field which relies heavily on analyzing large sets of data. I've recently decided to pursue a career as a data scientist. What are your recommendations to get started in the field?

astromaddie4 karma

I can actually give some help to this! I have a physics BS and four years' experience in astronomy research, much of which was data science-related. After 7 months of job hunting and interviewing, I start my first data scientist job on Monday!

Basically, create a resume (not a CV) that de-emphasises your research material (which wards people off) and highlights the pure data science you did. Talk about accomplishments, not tasks (e.g., "optimised a computer model using linear regression forecast modelling" rather than "performed linear regression modelling"). The bulk of your resume should focus on things you achieved, and unlike a CV, keep your resume at a page or so in length. ALWAYS attach a cover letter that will quickly grab their interest and defend why you, lacking direct work experience, are more than qualified for the position. And then, if you get an interview, consider they already think you're worth pursuing so you should relax, charm them, and make a persuasive argument for how your experience applies.

Also, I recommend learning R, SQL, and SAS in your free time.

Ninja edit: You'll face rejection. A lot. Even for positions you are ABSOLUTELY qualified for. Data science is a new field and far too many companies just want to hire business/economics graduates. But just keep on it, look for more startup-y companies that hire based on intelligence rather than experience, and you'll find something!

fieldcady2 karma

Congratulations!! A co-worker of mine is similar to you; he was originally an astronomer, then switched into data science, and now is one of the leading R guys.

fieldcady3 karma

Learn to write good code. Most people with your background (which included me, although I did more mathematical biology than nuclear physics) write horrific code that, while technically working, is so poorly written that it's impossible for other people to read or for even the author to modify much. I've seen projects almost fail because a brilliant physics/math guy wrote thousands of lines of indecipherable code, when a few hundred lines of clear code would have done perfectly. Get into the habit of being really anal about your code quality. This might not apply to you, but it does to most physicists.

After that, I suggest

  • Learn Python and/or R. Those are the best languages for data science. Also make sure you're familiar with SQL.

  • Learn machine learning. You use it all the time as a data scientist.

  • Learn basic statistics, up to what an ANOVA test is. In practice you usually don't need anything beyond that (and I have never even needed to use ANOVA).

  • Get used to doing visualizations all the time. I tell people only half-jokingly that half my job is just to produce and interpret scatterplots and bar charts. Computers work in numbers, but brains work in pictures.

Astromaddie gave some good advice about how to put your resume together, which is also worth taking a look at.

captjons1 karma

Is there such a thing as raw data?

fieldcady2 karma

I'm not familiar with that as a standard term, but there's definitely data that has not been pre-processed. A lot of my time gets spent writing code that turns such data into something that can be plugged into a statistical package.
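As a rough idea of what that kind of code looks like, here is a small sketch that turns un-pre-processed log lines into tidy rows and writes them out as CSV. The log format, field names, and sample lines are all invented for the example; real logs are usually messier and need more defensive handling.

```python
import csv
import io
import re

# Hypothetical raw log, including one line that does not fit the format
raw_log = """\
2014-10-08 12:00:01 GET /home 200 35ms
2014-10-08 12:00:02 GET /cart 500 120ms
this line is corrupted
2014-10-08 12:00:05 POST /checkout 200 88ms
"""

LINE_PATTERN = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<method>[A-Z]+) (?P<path>\S+) "
    r"(?P<status>\d{3}) (?P<latency_ms>\d+)ms"
)


def parse_log(text):
    """Turn raw log text into a list of dicts, skipping malformed lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.fullmatch(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        row = match.groupdict()
        row["status"] = int(row["status"])
        row["latency_ms"] = int(row["latency_ms"])
        rows.append(row)
    return rows


rows = parse_log(raw_log)

# Write the structured rows as CSV, which most statistical packages can load
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(output.getvalue())
```

Silently dropping malformed lines is a design choice made here for brevity; in practice it is usually worth at least counting them, since a high rejection rate often means the assumed format is wrong.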

gankindustries1 karma

How easy was the transition from physics to CS and what steps did you take? Has your physics background helped you at all in the field?

fieldcady2 karma

When I started CS grad school I was pretty insecure, because I almost certainly knew less CS than any other incoming student. But it turned out that all of the profs wanted to work with me because of my math/physics background, and they figured that I could pick up the coding on the fly. For better or worse, degrees in physics turn heads, and that makes the transition a lot easier.

I can't point to any concrete ways that physics has helped me, in terms of specific skillsets or anything like that. But there is some truth to the physics department mantra that it "teaches you how to think". You probably shouldn't major in physics unless you're considering it as a career path, but it does give you an intuitive facility with math many other people don't have (including mathematicians, who are often too focused on minutiae about proofs and definitions).

trai_dep1 karma

[deleted]

fieldcady2 karma

I'm afraid I don't know the best datasets to use. Personally I generally just scrape Wikipedia.

n3utrino1 karma

Hey there, great AMA! I'm in grad school for particle physics and I'm leaning towards CS now too.

My question for you is this: are you genuinely happy with the work side of your life? That's what matters most to me; I'd like to work reasonable hours (but I'm happy to work a shit ton if I'm really interested in the work), work around happy people, help further the human race in some way, and have time for a wife+kids.

P.S. Hire me in two years, please ;)

fieldcady2 karma

Fair question. I have to say that I don't get any personal fulfillment from optimizing online ad campaigns, which is maybe half of what my company does. But the other half is things like improving manufacturing pipelines, helping companies develop new data-based products, making sure the right info gets to the right people, etc. In those cases I really feel like I'm helping to make the world a more efficient place.

As far as having enough personal time, working with cool people, getting paid well, etc., I'm spoiled. Very good on all fronts, partly because data scientists are in high demand right now.

Send us a resume when you're ready! Data science companies hire a lot of physicists :)

lacemaker1 karma

Why is studying Automata important? I am currently taking it, and I don't see how it is helpful. Thank you

fieldcady2 karma

Do you mean push-down automata in theoretical CS? It's not actually important. Sorry :(

haxel901 karma

Hi!

As a person working with big data in medicine I'm really looking forward to the answers in this thread. My question is:

Which scientific field do you think would gain the most by introducing more big data analysis?

fieldcady2 karma

Probably biology. It is complicated enough that we shouldn't expect to find tidy little formulas like in physics and chemistry. They also collect massive, noisy datasets that are very challenging to work with. But this isn't news to them - many biologists are already doing this stuff, and it's really exciting to watch :)

MHS11 karma

Thanks for the ama! Just out of curiosity, do you use programs like Pajek and Gephi to make visualizations of networks? Or are there other/better programs you would rather use?

Also, do you have ethical concerns about gathering certain types of big data? Are you confronted with that in your work?

fieldcady1 karma

Afraid I don't have any good advice about network visualizations.

No, I'm not worried about the ethical stuff. I wrote an IDG post a while ago (http://www.idgconnect.com/blog-abstract/5490/big-data-the-jetsons-not-minority-report) discussing the concerns that people have. I think they're overblown. But more importantly, I think that most of the work that is being done with Big Data is with things that people have no problem with.

0polymer01 karma

How did your transition work? Did you enter as a physicist or computer scientist? When did you change your mind and why?

fieldcady2 karma

I did physics in undergrad, then math/CS in grad school, and got into Big Data in the workforce. The transition was relatively smooth because professors in CS love to work with people with strong math backgrounds. I decided to do something along the lines of data science when I dropped out of my PhD program, but I already knew at that point that I wanted to work in industry rather than academia.

seagullswoop1 karma

Directions or advice for teacher/future teachers?

fieldcady2 karma

Oh boy, there's so much cool stuff. My biggest one would be to learn a programming language. JavaScript might be a good one, since it is easy to learn and lets you make cool interactive webpages. Coding will become an increasingly important math skill for kids.

Also, my goal in this AMA isn't to plug my book, but you might want to consider taking a look at it (link below) since I talk about lots of different things and how they relate to math (including math education specifically). You can also check out my blog, which has a lot of the same content and is totally free.

Link to book: http://www.amazon.com/What-Math-humans-speak-means-ebook/dp/B00LZLQPBQ/ref=sr_1_1?ie=UTF8&qid=1412791705&sr=8-1&keywords=what+is+math+field+cady

Otrante1 karma

Is it your love of maths that made you pick this career? Would you say it is a good option for students graduating?

fieldcady2 karma

Yeah, math is basically what got me into this. I started in physics, but discovered that my favorite part of physics was the math, so I went into applied math. Then I discovered that the coolest "math" going on was actually in computer science departments, so I veered that direction.

Data science is an excellent career goal for students. There's a shortage of people, it pays extremely well, and you can do it anywhere. And if it doesn't work out, it transitions well into lots of other jobs like software engineering, business intelligence, statistics, etc.

johnny123bravo1 karma

Do you love physics just like you love maths ?

fieldcady2 karma

I started out as a physics major, but discovered that the reason I enjoyed physics so much was the math. Objectively physics is awesome, in many ways more awesome than math. But math is what I personally enjoy working with on a day-to-day basis. I still love physics as a hobbyist though, and keep meaning to read more of the Feynman lectures!

Elwasd1 karma

How do you think the IoT (internet of things) will affect the Big Data scene? I currently work in IT, planning in the short term to get into network admin and, long-term, computer science (no degree yet D:).

fieldcady1 karma

I think it'll be huge. I think that that's where a large fraction of the datasets will come from. Simply put, machines generate way more data than humans do.

icxcnika1 karma

If Einstein's theory of relativity means that for any given observer, the faster an object moves through space, the slower it moves through time... does that mean that relative to us, light is the oldest thing in the universe?

fieldcady2 karma

We can talk about the age of a photon from our reference frame. Like, it's been traveling for 4 thousand years from the star it came from. That's totally fine. It's just that (as I understand it - I'm not an expert in this stuff) the photon of light would consider its age to be zero.
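For reference, the standard special-relativity relation behind this can be sketched as follows, where Δt is the time elapsed in our frame, Δτ is the time elapsed on a clock moving at speed v relative to us, and c is the speed of light:

```latex
\Delta\tau = \Delta t \,\sqrt{1 - \frac{v^{2}}{c^{2}}},
\qquad
\lim_{v \to c} \Delta\tau = 0
```

With Δt = 4000 years, Δτ shrinks toward zero as v approaches c. Strictly speaking a photon has no rest frame of its own, so "its age is zero" is best read as this limit rather than as a measurement made by the photon.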

THLycanthrope1 karma

How do you feel about "Common Core" math?
It seems like my son is learning a lot, but all the parents are upset that they feel retarded when they try to help with homework.

fieldcady2 karma

Unfortunately I don't know a lot about the Common Core. However, mathematical illiteracy is a big issue in this country, so I don't think that the competence of parents should be taken as a good indicator of what's useful math to learn.

sovietskaya1 karma

Hi. Do you think any ordinary kid can be motivated to take an interest in the field that you work in? What things in your childhood greatly inspired you to do what you are doing now, or did you just find out what you really like later on in life? Thanks.

fieldcady2 karma

I've always enjoyed math. Back in the day I wanted to be a senator, or a stock broker, or a paleontologist, but math has always inspired me.

For most people though, they learn this stuff because they're passionate about some area of application. My wife learned math and programming because she loved engineering and building things. Lots of data scientists initially have a passion for physics or other science. Economics is another big one. My boss was originally interested in applying math to the social sciences.

I am not an expert in education, but I do have a couple of thoughts. I think that kids should be taught from an early age about the cool stuff you can do with computers. And I think they should be taught math not as a stand-alone abstract subject, but as a cool tool that illuminates all other subjects.

nonconformist31 karma

Can you please tell me why we don't have statistics on police related shootings that are and have been fatal in the USA?

fieldcady1 karma

That's a disturbingly good question, but not one I'm qualified to answer. Not sure I'd even want to know the answer either...

photonasty1 karma

What do you think are the biggest problems today in how mathematics education is approached (ranging from elementary to high school)? It seems like a lot of people get turned off to math early on, especially people who are maybe a bit more naturally inclined toward verbal or language skills. Is there anything you think should be improved, or approached in a different way, to make math education more effective? It seems like a lot of otherwise highly intelligent adults are strikingly math-illiterate.

fieldcady3 karma

That's the million dollar question, isn't it? I don't have the ultimate answer, but I think some partial answers are

  • less emphasis on algebra. Honestly most people don't need it.

  • more emphasis on rudimentary statistics. It's important for teaching people how to think clearly and rigorously.

  • society needs to cut the bullshit about girls being bad at math

If I was running an experimental class starting from childhood, I would combine math and computer programming into one class. People would learn to code simple games, write programs that calculated the area of geometric shapes, etc. I think it might be a much more engaging way than things are currently done, but I might be wrong.

From what I understand, Montessori schools do an extremely good job of connecting math concepts and formalisms to the intuitive concepts they describe. My wife went to a Montessori school, and I'm always envious when she explains to me how they were taught math.

JabberBody1 karma

Isn't math being a language "just like any other" disregarding Godel's incompleteness theorem? Or what am I missing?

Sincerely, someone who knows very little about math but a bit about logic

fieldcady1 karma

No worries. Seems there's a number of haters on this AMA, but I'm glad you're sincere. In my mind Godel's theorem is less about math and more about formal deduction. Neither math nor language has axioms inherent to them; you pick your axioms in a given situation. The incompleteness theorem would then apply equally well to deduction in math and natural language, insofar as you go to the trouble of reducing either to a formal system.

Does that make sense?

HouseOfTheRisingFuck1 karma

How do you see big data analysis evolving over the next 5-10 years? How do you think this landscape will be changing, not only in terms of analysis but practical/business applications as well?

fieldcady2 karma

Oh man, that's really hard to say. I think that much of what happens will not be publicly visible, but instead will be efficiency gains in complex businesses like manufacturing, airlines, etc. I think people will be able to diagnose problems more quickly, set better prices for their goods, etc. Sears is an example of a company that's doing a lot of this. And of course, ads will be slightly more clickable.

As far as stuff that is a little more exciting to the layman, I think mobile and wearables are the big areas. "Big Data" is what will enable a ton of the applications in those spaces.