Multithreading in Nim - Notes

Well, this topic has made a few people leave Nim and a few people just couldn't find
enough examples to help them with it so they gave up.

Either way, nim developers who stuck around have found out different ways to get it done
a few of them include changing the memory model of the compiler and if you wish to, you can
even change the garbage collector separately to make multithreading a little more easy.

Now, I'm not going deep into any of the topics since this is just a collection of
notes and snippets that I can refer back to instead of searching the documentation again
for something similar.

High Level Threading

High level threading can be done by importing threadpool and then using spawn to
create threads and the running of these are handled by the threadpool

A simple example would look like this

import "std/threadpool"

proc worker(value: int): int =
    return value

let result:FlowVar[int] = spawn worker(1)
var fromFlow = ^result
echo fromFlow

To explain.

  1. "std/threadpool" adds in macros and types required for handling FlowVar and ^ syntaxes.
  2. the procedure worker is what's going to be run in a thread, for now we are just going to run 1 thread which will return the value passed to it, pretty useless but get's the point across.
  3. result, as mentioned above gives us a FlowVar generic which has been casted to the type int since we are expecting a number from the thread.
  4. spawn is used to create a thread and the worker runs with the passed value 1
  5. We wait for the thread to complete it's work by using ^ and this is a blocking operation so if you have a sequence of such FlowVars you'll avoid blocking it till all
    workers are spawned and then block when you have all the FlowVars types returned from the spawned threads.

Low Level threading

This is the approach I chose to go ahead with for mudkip , for those who don't know, mudkip is a simple doc generator I'm writing to keep consistency and speed for me when I'm writing documentation.

The approach is similar to the above one but we're going to dig deeper since instead of
the macros handling the pointer passing for us, we'll be doing it manually.

The below snippet is from mudkip at this point and might change in the future but for now
let's look at the code.

## Import the needed standard lib helpers
import "std/os"
import "std/locks"

## Define the base structure of what the file meta
## is going to contain
type FileMeta = object
  path: string
  lastModifiedTime: times.Time
  outputPath: string

## We then take in consideration of what the thread would contain.
## in this case we want the thread to have access to a
## shared state between threads, the point of the shared state is
## so that if a new file get's added and the chunks change, the threads
## process the new chunks accordingly
type ThreadState = object
  page: int
  statePtr: ptr seq[seq[FileMeta]]

  ## we now declare a channel variable, which will be needed to pass through a very
  ## simple string message around, you can solely depend on channels for passing
  ## data around in the threads but that can get complex to handle very quickly.
  chan: Channel[string]

  ## Next we are defining `threads` of the type `ThreadState`, this is necessary when
  ## working with low level procedure `createThread` since you need to cast the type
  ## of the parameters you send with it.
  threads: array[4, Thread[ThreadState]]

  ## Up next, we have the actual state and a pointer reference to it
  ## the pointer reference is what we'll be passing to the workers
  ## while the original is guarded with a lock so that when locked
  ## no other thread can modify it. Even though in our case the threads only
  ## read and never write to the state but if you do plan to write to the state
  ## then make sure you `acquire` and, `release` the lock when done with the mutation
  sharedFilesState {.guard: lock.}: seq[seq[FileMeta]]
  sharedFilesStatePtr = addr(sharedFilesState)

## Just taking the length which I use for creating chunks
## of the list of files I need to process
let maxThreads = len(threads)

## This is the worker function which takes in the `ThreadState`
## param that we defined before and it handles the base of the file
## watching logic.
## Each worker runs at an interval of 750ms to check if any file
## in the batch it's been given has changed.
## if it has changed then it'll send the main thread a request to
## update the state and also to reprocess the file.
## This is done with the channel, the channel sends the path to the file
## and since I'm not handling anything else in the thread, the main thread
## reprocessed the path that it gets and updates the state for me
proc watchBunch(tstate: ThreadState){.thread.} =
  while true:
    let data = tstate.statePtr[]
    let batchToProcess = data[]

    for fileMeta in batchToProcess:
      let latestModTime = getLastModificationTime(fileMeta.path)
      if latestModTime != fileMeta.lastModifiedTime:

    # force add a 750ms sleep to avoid forcing commands every millisecond

## This is not the entire implementation so you can read the whole thing on
## the repo.
## If the procedure gets a `poll` param with true, we open the channel to start
## listening for messages.
proc mudkip(poll:bool)=
    if poll:
    echo "Watching: ", input

    ## Then we go through each index of threads and use `createThread` to cast to
    ## type `[ThreadState]` and pass it a new threadstate which tells the worker
    ## which index of the chunk does it have to work with. Which is passed with
    ## the `page` property in state and the whole set of chunks are passed as
    ## a reference pointer, `sharedFilesStatePtr` on the `statePtr` property.
    for i in 0..high(threads):
      createThread[ThreadState](threads[i], watchBunch, ThreadState(
        page: i,
        statePtr: sharedFilesStatePtr

    ## Then we on the main thread keep waiting for messages on the channel
    ## as soon as we receive one, we update the shared state in `updateFileMeta`
    ## and we process the file with `fileToHTML`
    ## this one is timed 500ms instead of 750ms to have a small rest period for
    ## the CPU, or we'll overload it with polling instructions.
    while true:
      # wait on the channel for updates
      let tried = chan.tryRecv()
      if tried.dataAvailable:
        updateFileMeta(input, output)
        echo getCurrentTimeStamp() & info("Recompiling: "), tried.msg
        fileToHTML(tried.msg, output)

proc ctrlCHandler()=
    ## Finally we close the channel, sync all the threads and deinit the lock as it
    ## won't be used anymore.

## This just registers the above handler as the ctrlC action

That's definitely a lot to read but that's still a simple implementation. I could've made it more functional will a threadpool but in my case the threadpool would still be blocking the main thread with ^ and I didn't want to do it with it. I wanted to have the main thread also listening for stuff since it's what actually handles processing the file,
the threads are just there to listen for changes and not to actually process the files since
that would need a lot more communication between the threads to work properly.

Also, since the main thread only processes one file per thread request, it's still fast enough to handle this on most systems.