AtomicUsize
Suppose we want to process the elements in an array or vector on multiple threads in Rust. Each thread should "steal" an element when it is available for work.
With AtomicUsize, we can track the next available element. Then by calling fetch_add, each thread can safely access and increment the shared index.
Here we have a struct "Shared" that has an array of 10 elements. We create an Arc to the struct, and then each thread "steals" elements to process.
Step 1: The Shared struct contains the AtomicUsize that tracks the index to process.
Step 2: We create 8 Top structs and put an Arc to the Shared struct in each one.
Step 3: We spawn a thread for each Top struct, so we will have 8 threads running at once.
Step 4: We access the Arc on each Top struct. This means the Arc itself is moved into the thread.
Step 5: We call fetch_add on the AtomicUsize in a loop. This is how each thread "steals" work to be done on the array.

```rust
use std::sync::atomic::*;
use std::sync::*;
use std::thread;

struct Top {
    arc: Arc<Shared>,
}

struct Shared {
    test: AtomicUsize,
    array: [usize; 10],
}

fn main() {
    // Step 1: create some shared data.
    let mut shared = Shared {
        test: AtomicUsize::new(0),
        array: [0; 10],
    };
    shared.array[5] = 20;
    shared.array[9] = 200;

    // Step 2: put an Arc to the shared data in 8 different structs, one for each thread.
    let mut top_vec = vec![];
    let arc = Arc::new(shared);
    for _ in 0..8 {
        top_vec.push(Top { arc: arc.clone() })
    }

    // Step 3: create a thread for each struct.
    let mut children = vec![];
    for top in top_vec {
        children.push(thread::spawn(move || {
            // Step 4: access the shared data from the Arc.
            let t = top.arc;
            // Step 5: use AtomicUsize to safely process each sequential
            // array element on different threads.
            loop {
                let i = t.test.fetch_add(1, Ordering::SeqCst);
                // End when no indexes remain.
                if i >= t.array.len() {
                    break;
                }
                println!("THREAD ATOMIC INDEX: {}, ARRAY VALUE: {}", i, t.array[i]);
            }
        }));
    }
    for child in children {
        let _ = child.join();
    }
    println!("DONE");
}
```

Output:

```
THREAD ATOMIC INDEX: 0, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 1, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 2, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 3, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 4, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 5, ARRAY VALUE: 20
THREAD ATOMIC INDEX: 6, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 7, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 8, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 9, ARRAY VALUE: 200
DONE
```
AtomicUsize versus Mutex
Does AtomicUsize have a performance advantage over using a Mutex? Here we test the performance of these 2 constructs in Rust.
Version 1: We call fetch_add on an AtomicUsize.
Version 2: We call lock() on a Mutex surrounding a usize, and then increment the usize.
Result: AtomicUsize has a clear performance advantage over the Mutex. When possible, prefer atomic types.

```rust
use std::sync::atomic::*;
use std::sync::*;
use std::thread;
use std::time::*;

const MAX: usize = 1000000;
const THREADS: usize = 8;

fn main() {
    // Version 1: use AtomicUsize on 8 threads at once.
    let t0 = Instant::now();
    let arc = Arc::new(AtomicUsize::new(0));
    let mut thread_vec = vec![];
    for _ in 0..THREADS {
        thread_vec.push(arc.clone());
    }
    let mut children = vec![];
    for t in thread_vec {
        children.push(thread::spawn(move || {
            for _ in 0..MAX {
                let v = t.fetch_add(1, Ordering::SeqCst);
                if v == usize::MAX {
                    panic!("Error");
                }
            }
        }));
    }
    for child in children {
        let _ = child.join();
    }
    println!("{}", t0.elapsed().as_nanos());

    // Version 2: use Mutex on 8 threads at once.
    let t1 = Instant::now();
    let arc = Arc::new(Mutex::new(0));
    let mut thread_vec = vec![];
    for _ in 0..THREADS {
        thread_vec.push(arc.clone());
    }
    let mut children = vec![];
    for t in thread_vec {
        children.push(thread::spawn(move || {
            for _ in 0..MAX {
                let v = &mut t.lock().unwrap();
                if **v == usize::MAX {
                    panic!("Error");
                }
                **v += 1;
            }
        }));
    }
    for child in children {
        let _ = child.join();
    }
    println!("{}", t1.elapsed().as_nanos());
}
```

Results:

```
327820167 ns    AtomicUsize
577529042 ns    Mutex usize
```
The term "stealing" is used in concurrency algorithms: it means a thread "takes" work to be done, and marks it as done, so other threads do not repeat the same work.
Using AtomicUsize to coordinate threads doing work can be faster than dividing the work beforehand, because threads that finish their elements early keep stealing from the shared index instead of sitting idle. In Rust we can implement simple threading without using heavier concurrency crates. In simple programs at least, this can be advantageous, as the program is easier to reason about.