alan_du

http://www.reddit.com/user/alan_du

Highest Rated Comments

alan_du6 karma2018-08-15 18:53:31 UTC

One of Julia's key selling points is solving the two languages problem, where instead of writing my core library in C/C++ and my application in Python, I can do everything in Julia. In the meantime, however, it seems like we have a new two "devices" problem where now I write my core computing algorithms in CUDA for the GPU (or TensorFlow for the TPU) and wrap those in a higher-level API. This is especially acute in machine learning, where there a ton of different accelerators being developed (from GPUs to TPUs to FPGAs).

What do you think Julia's role is in this new "two devices" world?

View History Share Link

alan_du4 karma2018-08-15 19:03:02 UTC

One of the trends (at least from my POV) in the data science and ML worlds is to more explicitly embrace the two-language problem by pushing more logic into the "lower-level" language (C/C++) while making the upper-level (relatively) thin bindings.. TensorFlow and Apache Arrow (both written in C++) are good examples of what I mean (especially compared to Scikit-Learn and Pandas respectively) -- most of their core logic is written in C++, but bindings are accessible to multiple languages (Java, C, Python, Go, JavaScript, etc), which allows different languages to interoperate more-or-less seamlessly (as long as you stick within the framework). Arguably Apache Spark's dataframe API is another example, where the core logic is Scala and it's accessible through Scala, Python, and R. BLAS/LAPACK might be the original example of this (although TBH I've only interacted with them through NumPy and SciPy).

I guess I have two questions here:

Do you think Julia is well-suited to writing some of the "infrastructure" libraries / frameworks? Is Julia well-suited to writing, e.g. a deep learning framework that's accessible through other languages or do you think C and C++ will continue to be the language(s) of choice here?
What do you think of this trend in general? Do you think, for example, that Julia will end up binding to Apache Arrow for most of its dataframe heavy lifting, or do you think people will prefer to write their own versions in pure Julia?

View History Share Link

alan_du2 karma2018-08-15 19:11:48 UTC

That's probably true (I suspect most of the training logic is Python-only for example), but we've had very few problems taking a SavedModel from a Python TensorFlow program and doing inference in C++ and Go (we did have to hack the TensorFlow build system a little bit to access certain ops, but everything seems to work).

So I guess with TensorFlow it's more about the "core operations" in C++ with a ton of sugar in the respective languages -- which also makes some sense because every language has different idioms. I suspect that if Apache Arrow takes off, that's how things will work in Python -- I can't imagine abandoning Pandas anytime soon, but maybe the core memory interface and computations will happen through Arrow.

TLDR: I agree I'm oversimplifying, but I think my point still stands. TensorFlow is a big framework, mostly written in C++, that I can deploy their models in a bunch of different languages (at least Python, C++, and Go. I don't know how well the other bindings work).

View History Share Link

alan_du2 karma2018-08-15 19:25:41 UTC

> Yes, I think we'll see people doing more of this kind of thing with julia. There's already projects like https://github.com/JuliaDiffEq/diffeqr which expose Julia's differential equations ecosystem to R, and as more and more such killer use cases are developed in Julia, I fully expect people to want to call them from other languages.

That's super cool! I'm definitely going to take a look! Maybe this is me being selfish, but I would 100% recommend spending energy on exposing bindings to Julia libraries this as frictionless as possible. At least at $DAYJOB (and I imagine it's similar at a lot of places), we have a ton of investment in Python and C++ already and I can't see us moving away from that anytime soon, but if there are killer libraries in Julia exposed to Python we'd definitely want to use them!

Out of curiosity -- how does the packaging and multiple dispatch work when exposing bindings to other languages? Does it only work with a handful of types, letting Julia pre-compile everything, or do you have to ship with the Julia language to JIT things?

View History Share Link