Feb 4, 2018
This blog post from May 2016 expressed a lot of skepticism about the Julia programming language. As one can see from the comments on the blog page as well as recent discussions in the Julia Discourse forum, it is not without controversy.
The purpose of this blog post is to debunk the myths and determine what may truly be a concern… or not. Most blogs are opinionated, but I will try my best to be fair and point out the differences here so people can judge according to their own use cases.
All benchmarks below are performed using my MacBook Pro laptop (Intel i5-4258U CPU @ 2.40GHz) with Julia version 0.6.2 and Python 3.6.3 from Anaconda distribution.
The C implementation of the parse_integer benchmark was not coded in a performant/idiomatic fashion, e.g. the use of the strlen function. The core development team acknowledged the issue and quickly fixed the code (within 2 days) in October 2013.
The Java implementations of various benchmarks, as well as the Octave ones, were questioned. After several rounds of discussion on GitHub, it boils down to the philosophy of the benchmarks – are we trying to understand how different language implementations perform for the same algorithms, or are we trying to see how the best optimized code compares across these languages?
The benchmark page on the Julia web site explains this well:
It is important to note that the benchmark codes are not written for absolute maximal performance (the fastest code to compute recursion_fibonacci(20) is the constant literal 6765). Instead, the benchmarks are written to test the performance of identical algorithms and code patterns implemented in each language.
It is likely that one can optimize anything in any extensible language implementation and achieve great performance. In the case of Python, one can apply Cython, Numba, Numpy, and other techniques. This is the classic two-language problem as described by Stefan Karpinski.
The real questions are:
How difficult is it, and how much effort does it take, to optimize anything?
Does optimization introduce additional complexity in the code and hinder productivity?
The question is – does the startup time matter? Maybe, maybe not.
If you have a bunch of short-duration apps that each take only a second to run, then a longer startup time could mean a lot to you.
If you have a large computation project that takes 30 minutes to process, then a 0.5-second startup time means nothing. This should be the more common case for most business applications, let alone the numeric/scientific computing community.
Just a couple of notes:
The blog concluded with benchmark results of 80 µs (Julia) vs 24 µs (Cython-Typed). I do not see such behavior. Perhaps the only explanation is that times have changed and Julia has already gotten a lot better than before.
The fib function was then optimized using an LRU cache (see the sketch after these notes). From a benchmarking perspective, it would not be a fair comparison if the algorithm itself is changed. Hence, we will ignore those results here.
The LRU-enhanced code was further optimized with Numba. For the same reason above, we will ignore those results.
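To see why this matters, here is a minimal sketch of what a cached fib could look like in Julia. The Dict-based cache and the name fib_memo are my own illustration, not code from either blog:
# Hypothetical sketch: a cache turns the exponential recursion into
# linear-time lookups, so timings of a cached version are not comparable
# with the plain recursive benchmark.
const fib_cache = Dict{Int,Int}()

function fib_memo(n)
    n < 2 && return n
    get!(fib_cache, n) do
        fib_memo(n-1) + fib_memo(n-2)
    end
end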
Python
In [43]: def fib(n):
    ...:     if n<2:
    ...:         return n
    ...:     return fib(n-1)+fib(n-2)
In [44]: %timeit fib(20)
3.27 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Cython
In [48]: %%cython
    ...: def fib_cython(n):
    ...:     if n<2:
    ...:         return n
    ...:     return fib_cython(n-1)+fib_cython(n-2)
In [51]: %timeit fib_cython(20)
1.48 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Cython Typed
In [71]: %%cython
    ...: cpdef long fib_cython_type(long n):
    ...:     if n<2:
    ...:         return n
    ...:     return fib_cython_type(n-1)+fib_cython_type(n-2)
In [72]: %timeit fib_cython_type(20)
47.8 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Julia
julia> fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
fib (generic function with 1 method)
julia> @btime fib(20)
48.075 μs (0 allocations: 0 bytes)
The author started converting the quick sort function to Cython and finally concluded that Numpy's sort routine is the best.
Micro benchmark algorithm – Julia (356 μs) vs Cython (1030 μs)
Base sort function – Julia (233 μs) vs. Numpy (292 μs)
Python
In [83]: def qsort1(a, lo, hi):
    ...:     i = lo
    ...:     j = hi
    ...:     while i < hi:
    ...:         pivot = a[(lo+hi) // 2]
    ...:         while i <= j:
    ...:             while a[i] < pivot:
    ...:                 i += 1
    ...:             while a[j] > pivot:
    ...:                 j -= 1
    ...:             if i <= j:
    ...:                 a[i], a[j] = a[j], a[i]
    ...:                 i += 1
    ...:                 j -= 1
    ...:         if lo < j:
    ...:             qsort1(a, lo, j)
    ...:         lo = i
    ...:         j = hi
    ...:     return a
In [84]: %timeit qsort1(lst, 0, len(lst)-1)
13.7 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Cython
In [100]: %%cython
    ...: cdef double[:] qsort2(double[:] a, long lo, long hi):
    ...:     cdef:
    ...:         long i, j
    ...:         double pivot
    ...:     i = lo
    ...:     j = hi
    ...:     while i < hi:
    ...:         pivot = a[(lo+hi) // 2]
    ...:         while i <= j:
    ...:             while a[i] < pivot:
    ...:                 i += 1
    ...:             while a[j] > pivot:
    ...:                 j -= 1
    ...:             if i <= j:
    ...:                 a[i], a[j] = a[j], a[i]
    ...:                 i += 1
    ...:                 j -= 1
    ...:         if lo < j:
    ...:             qsort2(a, lo, j)
    ...:         lo = i
    ...:         j = hi
    ...:     return a
    ...:
    ...: def qsort2_py(a, b, c):
    ...:     return qsort2(a, b, c)
In [105]: %timeit qsort2_py(np.random.rand(5000), 0, 4999)
1.03 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Numpy
In [61]: %timeit np.sort(lst)
292 µs ± 4.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Julia Microbenchmark
julia> function qsort!(a,lo,hi)
           i, j = lo, hi
           while i < hi
               pivot = a[(lo+hi)>>>1]
               while i <= j
                   while a[i] < pivot; i += 1; end
                   while a[j] > pivot; j -= 1; end
                   if i <= j
                       a[i], a[j] = a[j], a[i]
                       i, j = i+1, j-1
                   end
               end
               if lo < j; qsort!(a,lo,j); end
               lo, j = i, hi
           end
           return a
       end;
julia> sortperf(n) = qsort!(rand(n), 1, n);
julia> @btime sortperf(5000)
355.957 μs (2 allocations: 39.14 KiB)
Julia Base.sort
julia> @btime sort(rand(5000); alg=QuickSort)
233.293 μs (11 allocations: 78.66 KiB)
I was, however, a bit puzzled by the Numba compilation warnings for the Mandelbrot benchmark below and do not know how to get rid of them. Perhaps they have a material impact on performance as well.
Numba
In [106]: @jit
     ...: def mandel_numba(z):
     ...:     maxiter = 80
     ...:     c = z
     ...:     for n in range(maxiter):
     ...:         if abs(z) > 2:
     ...:             return n
     ...:         z = z*z + c
     ...:     return maxiter
In [107]: @jit
     ...: def mandelperf_numba_mesh():
     ...:     width = 26
     ...:     height = 21
     ...:     r1 = np.linspace(-2.0, 0.5, width)
     ...:     r2 = np.linspace(-1.0, 1.0, height)
     ...:     mandel_set = np.empty((width,height), dtype=int)
     ...:     for i in range(width):
     ...:         for j in range(height):
     ...:             mandel_set[i,j] = mandel_numba(r1[i] + 1j*r2[j])
     ...:     return mandel_set
In [109]: %timeit mandelperf_numba_mesh()
NumbaWarning: Function "mandelperf_numba_mesh" failed type inference: invalid usage of np.empty with parameters ((int64 x 2), dtype=int), at line 7
NumbaWarning: Function "mandelperf_numba_mesh" failed type inference: cannot determine Numba type at line 8
NumbaWarning: Function "mandelperf_numba_mesh" was compiled in object mode without forceobj=True, but has lifted loops.
159 µs ± 7.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Julia
julia> myabs2(z) = abs2(z);   # definition assumed, not shown in the transcript; Base abs2 computes |z|^2

julia> function mandel(z)
           c = z
           maxiter = 80
           for n = 1:maxiter
               if myabs2(z) > 4
                   return n-1
               end
               z = z^2 + c
           end
           return maxiter
       end;
julia> mandelperf() = [ mandel(complex(r,i)) for i=-1.:.1:1., r=-2.0:.1:0.5 ];
julia> @btime mandelperf()
72.942 μs (1 allocation: 4.44 KiB)
Original – Numpy (2340 μs) vs. Julia (176 μs)
Random number generation outside loop – Cython (378 µs) vs Julia (176 μs).
Numpy
In [112]: def parse_int1_numpy():
     ...:     for i in range(1,1000):
     ...:         n = np.random.randint(0,2**31-1)
     ...:         s = hex(n)
     ...:         m = int(s,16)
In [113]: %timeit parse_int1_numpy()
2.34 ms ± 80 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Julia
julia> function parseintperf(t)
           local n, m
           for i=1:t
               n = rand(UInt32)
               s = hex(n)
               m = UInt32(parse(Int64,s,16))
           end
       end
parseintperf (generic function with 1 method)
julia> @btime parseintperf(1000)
176.061 μs (2000 allocations: 93.75 KiB)
Cython (random number generated outside loop)
In [116]: %%cython
     ...: import numpy as np
     ...: import cython
     ...:
     ...: @cython.boundscheck(False)
     ...: @cython.wraparound(False)
     ...: cpdef parse_int_vec_cython():
     ...:     cdef:
     ...:         long i,m
     ...:         long[:] n
     ...:     n = np.random.randint(0,2**31-1,1000)
     ...:     for i in range(1,1000):
     ...:         m = int(hex(n[i]),16)
     ...:
In [118]: %timeit parse_int_vec_cython()
378 µs ± 5.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Julia (random number generated outside loop)
julia> function parseintperf2(t)
           n = rand(UInt32, t)
           for i=1:t
               s = hex(n[i])
               m = UInt32(parse(Int64,s,16))
           end
       end;
julia> @btime parseintperf2(1000)
176.053 μs (2003 allocations: 97.97 KiB)
Phew! Let’s come back to the original blog post.
Unbalanced end's. I consider this a non-issue but rather a matter of personal taste/preference. Most languages have an ending mark for blocks; I do not understand what "unbalanced" was referring to, however.
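For readers unfamiliar with the syntax, every block in Julia closes with an end; a quick example of my own:
function count_even(v)
    c = 0
    for x in v
        if iseven(x)
            c += 1
        end    # closes the if
    end        # closes the for
    return c
end            # closes the function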
Use of ::. I consider this a non-issue and, again, a matter of personal taste/preference.
Unorthodox choice of operators, punctuation, and syntax for multiline comments. Julia's punctuation does not seem very different from that of many other programming languages. This seems to be just another personal taste/preference issue.
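For reference, Julia's multiline comment syntax looks like this:
#=
A multiline comment is opened with #= and closed with =#.
=#
x = 1   # single-line comments use #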
One-based indexing. This is my favorite 🙂 There are two camps and both have merits. It really depends on your use cases and situations. As various people pointed out in the Discourse forum, there are packages (such as OffsetArrays and PermutedDimsArray) that one can use when zero-based indices are more natural. It is nice that we have choices.
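As a rough sketch of how this works (assuming the OffsetArrays package; the exact API may differ slightly across versions):
using OffsetArrays

v = OffsetArray([10, 20, 30, 40, 50], 0:4)   # same data, now indexed 0 through 4
v[0]   # first element
v[4]   # last element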
Standard documentation system using Markdown. I started coding in Julia just a few months ago and found the documentation system intuitive and easy. However, I think the author made a decent point, with more information in a follow-up blog post. It seems that opinions are divided; in the case of Julia, the choice was made early on given several pain points using reST. After doing all kinds of research (here, here, and here), it appears to me that reStructuredText is more rigid and functional, while Markdown is easier to learn and use.
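For a flavor of the Markdown-based docstrings, here is a small illustrative example of my own (not from the original post):
"""
    double(x)

Return twice the value of `x`. The docstring is plain Markdown and is
displayed by typing `?double` at the REPL.
"""
double(x) = 2x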
Bottom line – there is no right or wrong; we know developers never agree on one thing. Luckily, developers always have a choice to make their own decisions. I fully respect projects that select reStructuredText over Markdown, or vice versa.
ccall. I found it untrue with the current version of Julia: I was unable to make it segfault with either an incorrect library name or an incorrect data type. I always receive a nice error message.
julia> val = ccall((:getenv, "libc.so.6"), Ptr{UInt8}, (Ptr{UInt8},), var)
ERROR: error compiling anonymous: could not load library "libc.so.6"
dlopen(libc.so.6.dylib, 1): image not found
julia> val = ccall((:getenv, "libc.dylib"), Ptr{UInt8}, (Ptr{UInt8},), 123)
ERROR: MethodError: no method matching unsafe_convert(::Type{Ptr{UInt8}}, ::Int64)
Closest candidates are:
unsafe_convert(::Type{Ptr{UInt8}}, ::Symbol) at pointer.jl:35
unsafe_convert(::Type{Ptr{UInt8}}, ::String) at pointer.jl:37
unsafe_convert(::Type{T<:Ptr}, ::T<:Ptr) where T<:Ptr at essentials.jl:152 ...
@sprintf and @printf macros that generate a large body of code.
It was probably true at the time but there are multiple solutions today. This is an area that is undergoing rapid innovations. A couple of Julia packages address the problem e.g. Formatting.jl and StringLiterals.jl.
The code below generates only 63 lines of native code, compared to 500+ as the author claimed in the old days.
julia> @code_native Formatting.printfmt("{1:>4s} {2:.2f}", "abc", 12)
If I compile the C code below, I get 21 lines of assembly instructions. I guess it’s hard to beat C.
#include <stdio.h>

void f(char *buffer, const char *a, double b) {
    sprintf(buffer, "this is a %s %g", a, b);
}
The libraries for unit testing are also very basic, at least compared to the ones in C++ and Java. FactCheck is arguably the most popular choice but, apart from the weird API, it is quite limited and hardly developed any more.
While I was developing the SASLib.jl package, I used only Base.Test and found it to be very user friendly; it does everything that I needed. The author mentioned the FactCheck package, which seems to be a reasonable add-on. I have never used FactCheck before, so I cannot provide additional insights here.
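For a sense of what Base.Test looks like on Julia 0.6, here is a minimal illustrative test set of my own:
using Base.Test

@testset "arithmetic" begin
    @test 1 + 1 == 2
    @test isapprox(sqrt(2)^2, 2.0)
    @test_throws DomainError sqrt(-1.0)
end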
I do not understand why the use of Femtolisp is such a big deal. If I count the lines of code in each language for Julia v0.6, this is what I get:
Julia 175,147
C/C++ 70,694
Femtolisp 8,270
The C/C++ code is essentially what is required to provide the base runtime of the Julia language. A large majority of Julia is then built using Julia itself. So we are talking about 3.3% of the code base being in Femtolisp, a Scheme-like Lisp implementation.
julia> 8270 / (175147 + 70694 + 8270) * 100
3.2544832769931253
I had a recent experience contributing a small change to the Julia project. The community members at the Discourse forum are very friendly, and the core development team is always there as well. If I want to contribute more, there is very little barrier and I can get a lot of help easily.
Finally, the author cited a third-party article about how "Julia slowed down and got out of the radar". I do not know what the environment was like in 2015, but given that Julia Computing just received seed funding in 2017, I expect it to speed up rather than slow down. Time will tell whether the momentum continues as Julia approaches the 1.0 milestone.
I hope this blog article is useful for anyone evaluating Julia.
3-Feb-2018 8:00 PM PST – clarified the reason for not including certain performance results from the other blog and added references to array & string formatting packages (feedback from Christopher Rackauckas & Scott Jones)
4-Feb-2018 11:25 AM PST – updated Julia project lines of code statistics based on release-0.6 branch; updated notes about Markdown vs. reStructuredText after doing more research.