Tom Kwong's Infinite Loop
Technology ideas never end...

The meaning of functions in Julia

Jun 17, 2020

hacking

Photo by Romain Vignes on Unsplash

When I first learned about the Julia programming language, a few things gave me "wat" moments. One of those surprises involved function naming.

Interestingly, my naive question triggered over 200 follow-up posts in the Julia Discourse forum. 200! That's one of my best records for motivating fellow developers! 😄

What is the issue?

Let's first take a look at a very simple example. Suppose that I have a CalendarApp module that contains the following code:

using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

Then, I want to create a function that calculates the length of a meeting. Super simple, right? Let's go for it:

length(m::Meeting) = Hour(m.end_time - m.start_time)

When I code, I like a REPL-based development workflow so I can test new code quickly:

julia> covid_meeting = Meeting("COVID Response Committee",
                               DateTime(2020, 6, 14, 8, 0, 0),
                               DateTime(2020, 6, 14, 10, 0, 0))
Meeting("COVID Response Committee", 2020-06-14T08:00:00, 2020-06-14T10:00:00)

julia> println(length(covid_meeting))
2 hours

So far so good! Now, try to use the length function to determine the length of an array.

julia> length([1,2,3])
ERROR: MethodError: no method matching length(::Array{Int64,1})
You may have intended to import Base.length
Closest candidates are:
  length(::Meeting) at REPL[3]:1

Wat! That's right. Here is the exact "wat" moment. What happened to the regular length function?

😵 There are two length functions!

The answer is quite simple. There are actually two length functions around. One of them is defined in the Base module, which everyone is familiar with, and the other is the one I just defined above.

Here's my own length function:

julia> length
length (generic function with 1 method)

Now, restart the REPL to clear things up and try again:

julia> length([1,2,3])
3

julia> length
length (generic function with 81 methods)

Now, I am able to access the original length again. You may also notice that this length function has 81 methods attached.

So, how did that happen? It seems that I hid the original length function by defining my own length function earlier. Out of curiosity, I can try to define my own function again:

julia> using Dates

julia> struct Meeting
           subject::String
           start_time::DateTime
           end_time::DateTime
       end

julia> length(m::Meeting) = Hour(m.end_time - m.start_time)
ERROR: error in method definition: function Base.length must be explicitly imported to be extended

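Man, now it's doing the exact opposite! It doesn't even let me define the length function anymore! The difference is that this session has already used length from Base, so the name now refers to Base.length. This is the second "wat" moment for the same problem.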

🤔 Did I do anything wrong?

It might be worth a quick discussion here about why I did what I did, and why I thought I was right.

First of all, I came from an object-oriented programming background. To be more precise, I had many years of experience developing in the Java language.

How would the same problem look in OOP? Well, in the object-oriented world, there is probably some kind of Array class that defines a length method. Then, I would just define a Meeting class with a length method. When I call the method, there is no ambiguity. For instance:

my_array.length();        // invokes the length method defined in Array class
my_meeting.length();      // invokes the length method defined in Meeting class

These are just two different methods from two different classes.

But wait... Didn't I just do the same thing in Julia? If I look at the signature of my length function, it accepts an argument of data type Meeting. So, why couldn't Julia just call my function when I pass a Meeting object, and call the regular length function when I pass an array?

Here is the primary misconception.

Multiple dispatch only works within a single function. What I did above actually introduced a second length function, and that function has a single method attached.

More precisely, the two length functions are defined in their own modules. Let me prefix with their respective namespaces and the number of methods:

Base.length               # 81 methods
CalendarApp.length        # 1 method
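
To convince myself that these really are two separate functions, here is a minimal sketch that wraps the earlier code in a CalendarApp module and compares the two. It assumes the definition runs before anything in the module has used length, and the variable names are just for illustration.

```julia
module CalendarApp

using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

# No `import Base: length` and no `Base.` prefix, so this line creates a
# brand-new function that belongs to CalendarApp.
length(m::Meeting) = Hour(m.end_time - m.start_time)

end # module

# Two different function objects that merely share a name.
same_function = CalendarApp.length === Base.length
owner = parentmodule(CalendarApp.length)
method_count = length(methods(CalendarApp.length))

println(same_function)   # false
println(method_count)    # 1
```

Neither function knows about the other's methods, which is why dispatch cannot pick between them.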

🐛 Here's the easy fix...

As I want multiple dispatch to kick in, I just need to make sure that I define a new method for the Base.length function rather than defining my own function. This is also called extending a function. There are two ways to achieve that.

Option 1: prefix the function name with the module name

Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

Option 2: import the length function before defining it

import Base: length

length(m::Meeting) = Hour(m.end_time - m.start_time)

Now, let's start a new REPL and try again:

julia> using Dates

julia> struct Meeting
           subject::String
           start_time::DateTime
           end_time::DateTime
       end

julia> Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

julia> length
length (generic function with 82 methods)

Alright, the length function now has 82 methods attached. Let's confirm its functionality.

julia> covid_meeting = Meeting("COVID Response Committee",
                               DateTime(2020, 6, 14, 8, 0, 0),
                               DateTime(2020, 6, 14, 10, 0, 0))
Meeting("COVID Response Committee", 2020-06-14T08:00:00, 2020-06-14T10:00:00)

julia> length(covid_meeting)
2 hours

julia> length([1,2,3])
3

Voila! Problem solved!
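
If you want to double-check what just happened, a small sketch like the following asks Julia who owns the function and who owns the new method. The variable names are mine. The point is that the function still lives in Base while the Meeting method lives in whatever module I defined it in.

```julia
using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

# Extend Base.length with a method for Meeting (Option 1 above).
Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

covid_meeting = Meeting("COVID Response Committee",
                        DateTime(2020, 6, 14, 8, 0, 0),
                        DateTime(2020, 6, 14, 10, 0, 0))

# The generic function is still the one from Base...
function_owner = parentmodule(length)
# ...while the method for Meeting belongs to the module running this code.
method_owner = parentmodule(length, Tuple{Meeting})

println(function_owner)                      # Base
println(hasmethod(length, Tuple{Meeting}))   # true
println(length(covid_meeting))               # 2 hours
println(length([1, 2, 3]))                   # 3
```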

📌 Wait, why do I have to do that?

There is already a simple solution once I understand how multiple dispatch works in Julia. So, how did I trigger 200+ follow-up posts in Discourse?

The main controversy is why I have to be explicit about extending Base.length. Since Base.length has a name of length, and CalendarApp.length has a name of length, why wouldn't Julia just automatically merge them?

The whole Discourse thread revolves around how it would be more convenient and less confusing for new Julia users if such functions were merged automatically. I will now argue (against my original opinion in the Discourse thread) that it is a bad idea to do so.

Here is the main reason.

Just because two functions have the same name doesn't imply that they mean the same thing. Every function is designed to have a specific meaning. As most people write code in English, the meaning of the length function is pretty much aligned with what one commonly understands a length to be.

To be clear, I will just show the first definition from Dictionary.com:

Length (Noun): the longest extent of anything as measured from end to end.

So, the length concept refers to a measurement. As with any kind of measurement, it means that I should expect it to return a numerical value. Hence, when anyone calls the length function, a number is expected to be returned. This is literally an implicit contract.

Enforcing the same meaning for all length methods turns out to be a very useful thing. Right off the bat, I can display a graphical user interface that shows a bar that represents a measurement. The same component works regardless of whether the object is an array, a String, or a Meeting!

This is also the main reason why Julia packages interoperate so well with each other!

As long as there are consistent names and meanings, we can build very powerful abstractions and interfaces. Then, everything just works with each other in harmony. You don't buy it yet? Just take a look at the various types of array implementations. These arrays can be used anywhere a regular array is accepted.
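
Here is a rough sketch of that bar idea using plain text instead of a real GUI. The render_bar and magnitude functions are hypothetical names I made up for this illustration. Since my Meeting method returns an Hour rather than a bare integer, the sketch assumes it is acceptable to unwrap the period with Dates.value.

```julia
using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

# Hypothetical helper: turn whatever `length` returns into a plain number.
magnitude(n::Number) = n
magnitude(p::Period) = Dates.value(p)

# Hypothetical "component": draws one `#` per unit of length.
render_bar(x) = repeat("#", magnitude(length(x)))

meeting = Meeting("COVID Response Committee",
                  DateTime(2020, 6, 14, 8, 0, 0),
                  DateTime(2020, 6, 14, 10, 0, 0))

println(render_bar([1, 2, 3]))   # ###
println(render_bar("hello"))     # #####
println(render_bar(meeting))     # ##
```

The component never mentions arrays, strings, or meetings. It only relies on length meaning "a measurement".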

😈 Playing devil's advocate...

Now, what happens if I ignore the implicit contract and define the length of a meeting to be a string? For instance:

function Base.length(m::Meeting)
    if m.end_time - m.start_time > Hour(1)
        return "Long"
    else
        return "Short"
    end
end

Well, it's probably fine because Meeting is my own data type.

However, it also means that I should not let anyone else use Meeting. Why? That's because another developer will probably be very confused when my length function returns a string rather than a number, and that could cause serious problems.

Remember the GUI component I talked about earlier? It's going to be so broken.

Not keeping a consistent meaning (implicit contract) for a function is a recipe for failure. It severely limits the reusability of functions.
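
To make the failure concrete, here is a small sketch with two meeting types side by side: one that honors the contract and one that returns a label. LabeledMeeting and total_length are hypothetical names used only for this illustration.

```julia
using Dates

struct Meeting
    start_time::DateTime
    end_time::DateTime
end

# Hypothetical second type whose length breaks the implicit contract.
struct LabeledMeeting
    start_time::DateTime
    end_time::DateTime
end

Base.length(m::Meeting) = Hour(m.end_time - m.start_time)
Base.length(m::LabeledMeeting) =
    m.end_time - m.start_time > Hour(1) ? "Long" : "Short"

# Generic code written by "another developer" who trusts the contract.
total_length(items) = sum(length, items)

t0 = DateTime(2020, 6, 14, 8, 0, 0)
good = [Meeting(t0, t0 + Hour(2)), Meeting(t0, t0 + Hour(1))]
bad = [LabeledMeeting(t0, t0 + Hour(2)), LabeledMeeting(t0, t0 + Hour(1))]

println(total_length(good))   # 3 hours

# Adding "Long" and "Short" makes no sense, so this call throws.
failure = try
    total_length(bad)
    nothing
catch err
    err
end
println(failure isa MethodError)   # true
```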

🤓 What if I really want to use the same function name for a different purpose?

If I insist that my length function should return a string, then I really have two options. First, I can define my own function and not extend Base.length. Second, I could choose a different name for the function.

In the first scenario, I would be able to access both length functions. The caveat is that I will have to use Base.length and CalendarApp.length instead of the short form. This is needed to remove the ambiguity about which function I'm referring to.

The best practice, however, is to avoid giving a function a name that is already used in Base. Why?

  1. All of the exported Base functions are automatically brought into every module with the exception of baremodules. So, you will have a conflict just like the one described at the beginning of this post.

  2. If you develop packages, then you don't want your users to be confused about your function versus the one in Base.

Because the Base module is the standard library that everyone uses, it's probably not a good idea to define a function with the same name but a different meaning.

🛰️ What if the dependent module isn't Base?

Now, suppose that I am using a different module rather than Base. As an example, I'm going to pick on one of my favorite packages Distributions.jl. A typical Julia user would do the following:

using Distributions

I do that, too, when I need to use it interactively. However, if I need to use it in my app, then I would want to import only the functions that I need into my namespace. For example, if I want to calculate the mean and mode of some randomly generated data, I would do this:

using Distributions: mean, mode

This is actually quite important!

First, by bringing only known functions into my namespace, I reduce the chance of a function name collision. Just take a look at the huge number of names exported by Distributions.jl.

Second, I'm making my code future-proof. Let's say I have already defined a function named dist in my module. My code will still work even if Distributions.jl happens to define and export its own dist in a future version. So, I don't need to worry about naming conflicts because I only import mean and mode into my namespace.
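
Here is a self-contained sketch of that situation. To keep it runnable without installing anything, I use a made-up FakeStats module as a stand-in for Distributions.jl. Its mean, mode, and dist are simplified placeholders, not the real package's functions, and MyApp and summarize are hypothetical names as well.

```julia
# Hypothetical stand-in for a package that exports many names.
module FakeStats

export mean, mode, dist

mean(xs) = sum(xs) / length(xs)
# Most frequent element (first one wins on ties).
mode(xs) = xs[argmax([count(isequal(x), xs) for x in xs])]
# A name the package "started exporting in a later version".
dist() = "FakeStats.dist"

end # module

module MyApp

# Bring in only the two names I actually need.
using ..FakeStats: mean, mode

# My own function, unrelated to FakeStats.dist.
dist(a, b) = abs(a - b)

summarize(xs) = (mean(xs), mode(xs))

end # module

println(MyApp.summarize([1, 2, 2, 3]))       # (2.0, 2)
println(MyApp.dist(1, 4))                    # 3
println(MyApp.dist === FakeStats.dist)       # false
```

MyApp.dist and FakeStats.dist stay separate functions, and nothing in MyApp depends on what else FakeStats decides to export.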

Final thoughts...

Naming things is super important. Besides choosing the right word, it is also important to mean what you mean.

Over the years, I have developed a habit of making sure my code means what I mean. And, it's actually super simple. Just write documentation.

In Julia, I write a doc string for every function at the same time that I code that function. Sometimes I change the function name to match my doc string. At other times, I change the doc string to match my function name.
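
As an illustration, this is roughly what a doc string for the Meeting method from earlier in this post could look like. The wording is just a sketch. What matters is that writing it down forces me to state the contract.

```julia
using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

"""
    length(m::Meeting) -> Hour

Return how long the meeting `m` lasts, measured in whole hours from
`start_time` to `end_time`.
"""
Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

standup = Meeting("Standup",
                  DateTime(2020, 6, 14, 9, 0, 0),
                  DateTime(2020, 6, 14, 10, 0, 0))

println(length(standup))   # 1 hour
```

If the sentence in the doc string had come out as "returns Long or Short", I would have known right away that length was the wrong name.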

It is quite amazing how effective this can be. I encourage you to give that a try today!

Thank you for reading.

P.S. For more tips on writing good code in Julia, consider picking up my book Hands-on Design Patterns and Best Practices with Julia.

