Suppose in Rust we have a directory full of files, and we want to divide them into groups (chunks) and process each group on its own thread. This can be done with thread::spawn.
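When dividing the objects into chunks, we must make sure each thread owns the data it uses. This avoids errors from the Rust borrow checker. A thread started with thread::spawn may outlive the function that created it, so it cannot hold references to that function's local data.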
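Here is a minimal sketch of that ownership rule, separate from the file-system code. It splits a list of file names into chunks, copies each chunk with to_owned(), and moves the copy into a thread. The helper name process_in_groups and the generated names are hypothetical. Each thread prints its names and returns the size of its group, which stands in for real work.

```rust
use std::thread;

/// Hypothetical helper: splits `names` into at most `processors` groups and
/// handles each group on its own thread. Returns the size of each group.
/// `processors` must be non-zero.
fn process_in_groups(names: &[String], processors: usize) -> Vec<usize> {
    // Same chunk-length formula the file-processing program uses;
    // it is never 0, so chunks() cannot panic.
    let chunk_len = names.len() / processors + 1;
    let mut handles = Vec::new();
    for chunk in names.chunks(chunk_len) {
        // `chunk` borrows from `names`; to_owned() copies it into a
        // Vec<String> that the thread can own.
        let group: Vec<String> = chunk.to_owned();
        // `move` transfers ownership of `group` into the thread.
        handles.push(thread::spawn(move || {
            for name in &group {
                println!("PAGE: {}", name);
            }
            group.len()
        }));
    }
    // join() returns Err if a thread panicked; unwrap() turns that into a panic here.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Hypothetical names stand in for a directory listing.
    let names: Vec<String> = (0..20).map(|i| format!("file-{}", i)).collect();
    let sizes = process_in_groups(&names, 8);
    println!("GROUPS: {} SIZES: {:?}", sizes.len(), sizes);
}
```

The handles are joined in the order they were created. As a result, the returned sizes follow chunk order even though the threads may finish in any order.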
Please change the directory path to a folder full of many files; these files will be processed in parallel. We place each file name in a Page object. We call to_owned() on the path string so that each Page owns its own String.

We use chunks() to divide the pages vector into groups to place on threads. Again we must call to_owned(), this time on the chunk data. Then each PageGroup owns its pages and its own copy of the global data.

In the thread::spawn call, the move closure takes ownership of one group. The thread then processes that group's files sequentially. It can access the global data as well, through the group's copy.

```rust
use std::{fs, io};
use std::thread;

// Clone is required because chunk.to_owned() clones every Page in the chunk.
#[derive(Clone)]
struct Page {
    file_name: String,
    index: usize,
}

// The pages one thread processes, plus that thread's own copy of the global data.
struct PageGroup {
    pages: Vec<Page>,
    global: Vec<u8>,
    id: usize,
}

fn main() -> io::Result<()> {
    // Get all files in directory.
    let mut pages = vec![];
    for entry in fs::read_dir("/Users/sam/perls/t/")? {
        let path = entry?.path();
        // to_str() returns None for a non-UTF-8 path; unwrap() panics in that case.
        let path_str = path.to_str().unwrap();
        // Add to vector of objects; to_owned() turns the &str into an owned String.
        pages.push(Page { file_name: path_str.to_owned(), index: pages.len() });
    }

    // Global data.
    let mut global = vec![];
    global.push(100);

    // Divide the page objects into at most 8 groups.
    let processors = 8;
    let chunk_len = pages.len() / processors + 1;
    let mut groups = vec![];
    for chunk in pages.chunks(chunk_len) {
        // Add a new group for this chunk.
        // Make sure to own the chunk and the global data.
        groups.push(PageGroup { pages: chunk.to_owned(), global: global.to_owned(), id: groups.len() });
    }

    // Number of groups of pages.
    println!("GROUPS: {}", groups.len());

    // Place threads in this vector.
    let mut children = vec![];

    // Loop over groups.
    for group in groups {
        // Add spawned thread to children vector; `move` gives the thread ownership of `group`.
        children.push(thread::spawn(move || {
            // On each group, get the global data and id.
            let global_data = group.global;
            println!("{} GLOBAL LENGTH: {}, DATA: {}", group.id, global_data.len(), global_data[0]);
            // Process pages in group.
            for page in group.pages {
                println!("{} PAGE: {} {}", group.id, page.index, page.file_name);
            }
        }));
    }

    // Join all threads.
    for child in children {
        let _ = child.join();
    }
    Ok(())
}
```

```
GROUPS: 8
0 GLOBAL LENGTH: 1, DATA: 100
0 PAGE: 0 /Users/sam/perls/t/suffixarray-go
0 PAGE: 1 /Users/sam/perls/t/string-constructor
...
3 PAGE: 609 /Users/sam/perls/t/for-scala
7 GLOBAL LENGTH: 1, DATA: 100
7 PAGE: 1414 /Users/sam/perls/t/encapsulate-field
...
4 PAGE: 1008 /Users/sam/perls/t/unsafe
4 PAGE: 1009 /Users/sam/perls/t/response-writefile
```
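Since Rust 1.63, the standard library also offers std::thread::scope. This is a different technique from the to_owned() approach above. Threads spawned inside a scope are all joined before the scope returns, so they may borrow chunks and global data directly instead of copying them. The sketch below uses hypothetical in-memory pages and a hypothetical helper, process_scoped. Each thread returns the sum of its group's page indexes as stand-in work.

```rust
use std::thread;

// No Clone needed: the scoped threads only borrow the pages.
struct Page {
    file_name: String,
    index: usize,
}

/// Hypothetical helper: processes `pages` in at most `processors` groups on
/// scoped threads. `processors` must be non-zero, and `global` must be
/// non-empty whenever `pages` is, because each thread reads global[0].
fn process_scoped(pages: &[Page], global: &[u8], processors: usize) -> Vec<usize> {
    let chunk_len = pages.len() / processors + 1;
    thread::scope(|s| {
        let mut handles = Vec::new();
        for (id, chunk) in pages.chunks(chunk_len).enumerate() {
            // `chunk` and `global` are plain borrows; scope() joins every
            // thread before it returns, so the borrows cannot dangle.
            handles.push(s.spawn(move || {
                println!("{} GLOBAL LENGTH: {}, DATA: {}", id, global.len(), global[0]);
                let mut sum = 0;
                for page in chunk {
                    println!("{} PAGE: {} {}", id, page.index, page.file_name);
                    sum += page.index;
                }
                sum
            }));
        }
        // Join in creation order so the results follow chunk order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Hypothetical in-memory pages stand in for a directory listing.
    let pages: Vec<Page> = (0..10)
        .map(|i| Page { file_name: format!("file-{}", i), index: i })
        .collect();
    let global = vec![100u8];
    println!("{:?}", process_scoped(&pages, &global, 4));
}
```

Scoped threads avoid the copies, but every thread must finish before thread::scope returns. The owned version is still the one to use when threads need to outlive the function that spawned them.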
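Consider the Rust error "borrowed value does not live long enough." In threaded code, it typically appears when a reference to local data is moved into thread::spawn. One example is a chunk borrowed from the pages vector. thread::spawn only accepts closures that hold no borrowed references to local data (a 'static bound). We can fix this by taking ownership of the data: call to_owned() to make an owned copy, and move that copy into the thread.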
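Here is a small sketch of that fix, using a hypothetical helper count_long that receives borrowed names. The commented-out line moves the borrowed slice into the thread, and the borrow checker rejects it. The version that moves an owned copy compiles.

```rust
use std::thread;

/// Hypothetical helper: counts, on a worker thread, the names longer than `min_len`.
fn count_long(names: &[String], min_len: usize) -> usize {
    // Moving the borrowed `names` into thread::spawn is rejected by the borrow
    // checker: spawn needs a 'static closure, but `names` is only borrowed here.
    //   thread::spawn(move || names.iter().filter(|n| n.len() > min_len).count());
    // Taking an owned copy with to_owned() removes the borrow.
    let owned: Vec<String> = names.to_owned();
    let handle = thread::spawn(move || owned.iter().filter(|n| n.len() > min_len).count());
    handle.join().unwrap()
}

fn main() {
    let names = vec![String::from("a"), String::from("abcd"), String::from("abcdef")];
    println!("{}", count_long(&names, 3));
}
```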
Adding to_owned() calls on Vecs and Strings fixes this error, so programs with it can often be fixed easily. It also helps to store a String instead of a str reference in a struct, which keeps ownership simple. Copying some data with to_owned() may be needed to get that String.

Consider the Page struct in the program. We must have a #[derive(Clone)] attribute on this struct. The reason is that chunk.to_owned() builds a Vec<Page> by cloning each Page in the slice, and that requires Page to implement Clone.
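The sketch below shows why the derive matters. It assumes a trimmed-down Page like the one in the program and a hypothetical helper, describe_chunk. Calling to_owned() on a &[Page] clones each element into a new Vec<Page>, and this only compiles because Page derives Clone. Page also stores a String rather than a &str, so it has no lifetime parameter, and the owned Vec can be moved into a thread.

```rust
use std::thread;

// Storing String (not &str) keeps Page free of lifetime parameters.
// Clone is required by the to_owned() call below.
#[derive(Clone)]
struct Page {
    file_name: String,
    index: usize,
}

/// Hypothetical helper: formats each page of `chunk` on a worker thread.
fn describe_chunk(chunk: &[Page]) -> Vec<String> {
    // <[Page]>::to_owned() clones every Page into a new Vec<Page>.
    let owned: Vec<Page> = chunk.to_owned();
    thread::spawn(move || {
        owned
            .iter()
            .map(|p| format!("{} {}", p.index, p.file_name))
            .collect::<Vec<String>>()
    })
    .join()
    .unwrap()
}

fn main() {
    let pages = vec![
        Page { file_name: String::from("a.txt"), index: 0 },
        Page { file_name: String::from("b.txt"), index: 1 },
    ];
    for line in describe_chunk(&pages[..1]) {
        println!("{}", line);
    }
    // `pages` is still usable here because the thread worked on a copy.
    println!("{}", pages.len());
}
```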
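To call to_owned on a struct (or on a slice of structs), add the Clone attribute. The trait can be derived with just the attribute, as long as every field implements Clone (String and usize do); no other code is needed.

By using to_owned(), we give each thread its own copy of the data it needs, so the threads can safely access that data in parallel. We can chunk files together in groups to process them on threads in Rust programs.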