What Does a Variable Defined before a Scala Function Mean?


I am learning Scala from the book Scala for Data Science and its companion GitHub repo. My question is about this function, copied below for reference.

    def fromList[T: ClassTag](index: Int, converter: String => T): DenseVector[T] =
      DenseVector.tabulate(lines.size) { row => converter(splitLines(row)(index)) }

What does the DenseVector.tabulate(lines.size) between the = sign and the braces mean? I am new to Scala (with a background in Python and C++), so I cannot figure out whether DenseVector.tabulate(lines.size) is a local variable of the function being defined (in which case I would expect it to be declared inside the definition) or something else. From what I understand of Scala syntax, it cannot be the return type.

Also, is ClassTag equivalent to a template in C++?

To help you answer the question,

  • splitLines has type scala.collection.immutable.Vector[Array[String]]
  • lines.size is an Int (obvious, but still making it clear)

There are 2 best solutions below

MartinHH On BEST ANSWER

DenseVector.tabulate is a factory function (defined on the companion object of DenseVector) that has two parameter lists with one parameter each (so altogether, it takes two explicit parameters: size: Int and a function f: Int => V).

You can find its definition here (as part of the breeze library).
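
To see the two-parameter-list call shape without pulling in Breeze, here is a minimal sketch that uses Array.tabulate from the Scala standard library as a stand-in (it also takes a size and then a function, and also requires a ClassTag). The object name TabulateSketch and the sample data in lines are hypothetical and only serve the illustration; the real method returns a DenseVector, not an Array.

```scala
import scala.reflect.ClassTag

object TabulateSketch {
  // Hypothetical stand-in data; in the book these come from a parsed file.
  val lines: Vector[String] = Vector("1,2.5", "3,4.5", "5,6.5")
  val splitLines: Vector[Array[String]] = lines.map(_.split(","))

  // Same shape as the method in the question, but built on Array.tabulate
  // (an assumption made so the example runs without Breeze).
  // Everything after the = sign is the method body: a single expression.
  def fromList[T: ClassTag](index: Int, converter: String => T): Array[T] =
    Array.tabulate(lines.size) { row => converter(splitLines(row)(index)) }

  def main(args: Array[String]): Unit = {
    // Braces may replace parentheses around a single-argument parameter list,
    // so these two calls are the same call written two ways.
    val withBraces = Array.tabulate(3) { i => i * i }
    val withParens = Array.tabulate(3)(i => i * i)
    println(withBraces.sameElements(withParens))

    // Column 1 of every row, converted to Double.
    println(fromList(1, _.toDouble).mkString(", "))
  }
}
```

So the expression after the = sign is not a variable declared in front of the body: it is the body, namely one call to tabulate whose second argument happens to be written in braces.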

In (pseudo-)C++ (ignoring the ClassTag), the corresponding declaration would probably look something like this:

template<typename V>
class DenseVector {
public:
    // ... other class members

    // Static factory; V is already the class template parameter,
    // so it must not be redeclared here.
    static DenseVector<V> tabulate(int size, std::function<V(int)> f);
};

and then fromList would probably look something like this:

template<typename T>
static DenseVector<T> fromList(int index, std::function<T(std::string)> converter) {
    // The lambda is the second argument of tabulate: it maps a row number to a value.
    return DenseVector<T>::tabulate(lines.size(), [&converter, index](int row) {
        return converter(splitLines[row][index]);
    });
}

Gaël J On

Your example uses several pieces of syntactic sugar.

It is equivalent to the following, which might be easier to read when starting out:

def fromList[T](index: Int, converter: String => T)(implicit classtag: ClassTag[T]): DenseVector[T] = {
  // tabulate passes each row number (an Int) to this function
  def rowConverter(row: Int): T = {
    converter(splitLines(row)(index))
  }
  DenseVector.tabulate(lines.size)(row => rowConverter(row))
}

Notice that:

  • the whole second line (in your original code) is the body of the method
  • tabulate is a method taking two sets of parameters
  • the second set of parameters of tabulate is a single parameter which is a "lambda" function
  • the ClassTag part is called a "context bound": it says that the method needs an implicit value of the given type, parameterized with the type parameter (here, an implicit ClassTag[T]). ClassTag itself is a way to preserve type information at runtime, which would otherwise be lost due to type erasure on the JVM.
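
As a small sketch of the context-bound point, the two methods below are meant to behave identically: one uses the [T: ClassTag] shorthand and the other spells out the implicit parameter. The names ContextBoundSketch, fill, fillExplicit and runtimeClassOf are made up for this illustration; Array.fill is the standard library method, which itself requires a ClassTag.

```scala
import scala.reflect.ClassTag

object ContextBoundSketch {
  // Shorthand: the compiler adds a hidden implicit ClassTag[T] parameter.
  def fill[T: ClassTag](n: Int, value: T): Array[T] =
    Array.fill(n)(value)

  // The same method with the implicit parameter written out by hand.
  def fillExplicit[T](n: Int, value: T)(implicit tag: ClassTag[T]): Array[T] =
    Array.fill(n)(value)

  // The ClassTag carries the class of T, which is still available at runtime.
  def runtimeClassOf[T](implicit tag: ClassTag[T]): Class[_] =
    tag.runtimeClass

  def main(args: Array[String]): Unit = {
    println(fill(3, 7).sameElements(fillExplicit(3, 7)))
    println(runtimeClassOf[String] == classOf[String])
  }
}
```

Removing the ClassTag from either method should make it fail to compile, because Array.fill would then have no way to know which kind of array to allocate for an erased T.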