AtomicUsize
Suppose we want to process the elements in an array or vector on multiple threads in Rust. Each thread should "steal" an element when it is available for work.
With AtomicUsize, we can track the next available element. Then by calling fetch_add, each thread can safely access and increment the shared index.
Here we have a struct "Shared" that has an array of 10 elements. We create an Arc to the struct, and then each thread "steals" elements to process.
Step 1: The Shared struct contains the AtomicUsize that tracks the index to process.
Step 2: We create 8 Top structs and put an Arc to the Shared struct in each one.
Step 3: We spawn a thread for each Top struct, so we will have 8 threads running at once.
Step 4: We access the Arc on each Top struct. This means the Arc itself is moved into the thread.
Step 5: We call fetch_add on the AtomicUsize in a loop. This is how each thread "steals" work to be done on the array.

```rust
use std::sync::atomic::*;
use std::sync::*;
use std::thread;

struct Top {
    arc: Arc<Shared>,
}

struct Shared {
    test: AtomicUsize,
    array: [usize; 10],
}

fn main() {
    // Step 1: create some shared data.
    let mut shared = Shared {
        test: AtomicUsize::new(0),
        array: [0; 10],
    };
    shared.array[5] = 20;
    shared.array[9] = 200;

    // Step 2: put an Arc to the shared data in 8 different structs, one for each thread.
    let mut top_vec = vec![];
    let arc = Arc::new(shared);
    for _ in 0..8 {
        top_vec.push(Top { arc: arc.clone() })
    }

    // Step 3: create a thread for each struct.
    let mut children = vec![];
    for top in top_vec {
        children.push(thread::spawn(move || {
            // Step 4: access the shared data from the Arc.
            let t = top.arc;
            // Step 5: use AtomicUsize to safely process each sequential
            // array element on different threads.
            loop {
                let i = t.test.fetch_add(1, Ordering::SeqCst);
                // End when no indexes remain.
                if i >= t.array.len() {
                    break;
                }
                println!("THREAD ATOMIC INDEX: {}, ARRAY VALUE: {}", i, t.array[i]);
            }
        }));
    }
    for child in children {
        let _ = child.join();
    }
    println!("DONE");
}
```

Output:

```
THREAD ATOMIC INDEX: 0, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 1, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 2, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 3, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 4, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 5, ARRAY VALUE: 20
THREAD ATOMIC INDEX: 6, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 7, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 8, ARRAY VALUE: 0
THREAD ATOMIC INDEX: 9, ARRAY VALUE: 200
DONE
```
AtomicUsize versus Mutex
Does AtomicUsize have a performance advantage over using a Mutex? Here we test the performance of these 2 constructs in Rust.
Version 1: We call fetch_add on an AtomicUsize.
Version 2: We call lock() on a Mutex surrounding a usize, and then increment the usize.
Result: AtomicUsize has a clear performance advantage over the Mutex. When possible, prefer atomic types.

```rust
use std::sync::atomic::*;
use std::sync::*;
use std::thread;
use std::time::*;

const MAX: usize = 1000000;
const THREADS: usize = 8;

fn main() {
    // Version 1: use AtomicUsize on 8 threads at once.
    let t0 = Instant::now();
    let arc = Arc::new(AtomicUsize::new(0));
    let mut thread_vec = vec![];
    for _ in 0..THREADS {
        thread_vec.push(arc.clone());
    }
    let mut children = vec![];
    for t in thread_vec {
        children.push(thread::spawn(move || {
            for _ in 0..MAX {
                let v = t.fetch_add(1, Ordering::SeqCst);
                if v == usize::MAX {
                    panic!("Error");
                }
            }
        }));
    }
    for child in children {
        let _ = child.join();
    }
    println!("{}", t0.elapsed().as_nanos());

    // Version 2: use Mutex on 8 threads at once.
    let t1 = Instant::now();
    let arc = Arc::new(Mutex::new(0));
    let mut thread_vec = vec![];
    for _ in 0..THREADS {
        thread_vec.push(arc.clone());
    }
    let mut children = vec![];
    for t in thread_vec {
        children.push(thread::spawn(move || {
            for _ in 0..MAX {
                let v = &mut t.lock().unwrap();
                if **v == usize::MAX {
                    panic!("Error");
                }
                **v += 1;
            }
        }));
    }
    for child in children {
        let _ = child.join();
    }
    println!("{}", t1.elapsed().as_nanos());
}
```

Results:

```
327820167 ns    AtomicUsize
577529042 ns    Mutex usize
```
The term "stealing" is used in concurrency algorithms: it means a thread "takes" work to be done, and marks it as done, so other threads do not repeat the same work.
Using AtomicUsize to coordinate threads doing work can be faster than dividing the work beforehand, because threads that finish their elements early keep stealing from the shared index instead of sitting idle. In Rust we can implement simple threading without using heavier concurrency crates. In simple programs at least, this can be advantageous, as the program is easier to reason about.