Struct unicode_segmentation::GraphemeCursor
pub struct GraphemeCursor { /* private fields */ }
Cursor-based segmenter for grapheme clusters.
This allows working with ropes and other data structures where the string is not contiguous or fully known at initialization time.
Implementations
impl GraphemeCursor
pub fn new(offset: usize, len: usize, is_extended: bool) -> GraphemeCursor
Create a new cursor. The length of the string and the initial offset are given at creation time, but the contents of the string are not. The is_extended parameter controls whether extended grapheme clusters are selected.
The offset parameter must be on a codepoint boundary.
use unicode_segmentation::GraphemeCursor;

let s = "हिन्दी";
let mut legacy = GraphemeCursor::new(0, s.len(), false);
assert_eq!(legacy.next_boundary(s, 0), Ok(Some("ह".len())));
let mut extended = GraphemeCursor::new(0, s.len(), true);
assert_eq!(extended.next_boundary(s, 0), Ok(Some("हि".len())));
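The codepoint-boundary requirement on offset can be checked before a cursor is constructed. The following is a minimal sketch that uses only the standard library's str::is_char_boundary; snap_to_char_boundary is a hypothetical helper name chosen for illustration and is not part of unicode_segmentation.

```rust
/// Hypothetical helper (not part of unicode_segmentation): move `offset`
/// back to the nearest codepoint boundary at or before it, clamping
/// offsets past the end of the string to `s.len()`.
fn snap_to_char_boundary(s: &str, offset: usize) -> usize {
    let mut i = offset.min(s.len());
    // `is_char_boundary(0)` is always true, so this loop cannot underflow.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

For the 18-byte string in the example above, where every codepoint is three bytes long, snap_to_char_boundary(s, 4) gives 3, a value that is safe to pass as offset.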
pub fn set_cursor(&mut self, offset: usize)
Set the cursor to a new location in the same string.
use unicode_segmentation::GraphemeCursor;

let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.cur_cursor(), 0);
cursor.set_cursor(2);
assert_eq!(cursor.cur_cursor(), 2);
pub fn cur_cursor(&self) -> usize
The current offset of the cursor. Equal to the last value provided to new() or set_cursor(), or returned from next_boundary() or prev_boundary().
use unicode_segmentation::GraphemeCursor;

// Two flags (🇷🇸🇮🇴), each flag is two RIS codepoints, each RIS is 4 bytes.
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.cur_cursor(), 4);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.cur_cursor(), 8);
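The offsets in the example above are byte offsets, which is why they advance in steps of four. As a rough illustration using only the standard library, the sketch below lists every codepoint boundary of a string; codepoint_offsets is a hypothetical helper name, not something the crate provides.

```rust
/// Hypothetical helper: the byte offset at which each codepoint of `s`
/// starts, followed by `s.len()`. These are the codepoint boundaries;
/// only some of them are grapheme cluster boundaries.
fn codepoint_offsets(s: &str) -> Vec<usize> {
    let mut offsets: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    offsets.push(s.len());
    offsets
}
```

For the flags string this yields 0, 4, 8, 12 and 16. The example above shows the cursor moving from 4 to 8, skipping nothing in between because 4 and 8 are adjacent codepoint boundaries.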
pub fn provide_context(&mut self, chunk: &str, chunk_start: usize)
Provide additional pre-context when it is needed to decide a boundary. The end of the chunk must coincide with the value given in the GraphemeIncomplete::PreContext request.
use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};

let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
// Not enough pre-context to decide if there's a boundary between the two flags.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(8)));
// Provide one more Regional Indicator Symbol of pre-context
cursor.provide_context(&flags[4..8], 4);
// Still not enough context to decide.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(4)));
// Provide additional requested context.
cursor.provide_context(&flags[0..4], 0);
// That's enough to decide (it always is when context goes to the start of the string)
assert_eq!(cursor.is_boundary(&flags[8..], 8), Ok(true));
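When the string lives in a rope, the caller has to find a chunk whose end coincides with the offset carried by a PreContext request. The sketch below shows one possible way to do that with the standard library alone, assuming the rope can be viewed as a slice of consecutive &str chunks; context_ending_at and that slice representation are assumptions made for illustration, not part of the crate.

```rust
/// Hypothetical helper: `chunks` stands in for a rope, stored as
/// consecutive pieces of one string. Returns the text that ends exactly
/// at byte offset `end`, together with the offset at which it starts,
/// which are the two arguments `provide_context` expects.
fn context_ending_at<'a>(chunks: &[&'a str], end: usize) -> Option<(&'a str, usize)> {
    let mut start = 0;
    for &chunk in chunks {
        let chunk_end = start + chunk.len();
        if start < end && end <= chunk_end {
            // Trim the chunk so that it ends at `end`. `get` returns None
            // instead of panicking if `end` is not a codepoint boundary.
            return chunk.get(..end - start).map(|ctx| (ctx, start));
        }
        start = chunk_end;
    }
    // `end` is 0 or lies beyond the last chunk: there is nothing to provide.
    None
}
```

With the flags string split into the pieces [0..4], [4..8] and [8..16], a request for context ending at 8 resolves to the piece starting at 4, and a request ending at 4 resolves to the piece starting at 0, matching the two provide_context calls in the example above.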
pub fn is_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<bool, GraphemeIncomplete>
Determine whether the current cursor location is a grapheme cluster boundary. Only a part of the string need be supplied. If chunk_start is nonzero or the length of chunk is not equal to the len given at creation, then this method may return GraphemeIncomplete::PreContext. The caller should then call provide_context with the requested chunk, then retry calling this method.
For partial chunks, if the cursor is not at the beginning or end of the string, the chunk should contain at least the codepoint following the cursor. If the string is nonempty, the chunk must be nonempty.
All calls should have consistent chunk contents (i.e., if a chunk provides content for a given slice, all further chunks covering that slice must have the same content for it).
use unicode_segmentation::GraphemeCursor;

let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
assert_eq!(cursor.is_boundary(flags, 0), Ok(true));
cursor.set_cursor(12);
assert_eq!(cursor.is_boundary(flags, 0), Ok(false));
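The chunk expectations above suggest how a rope-backed caller might pick the chunk and chunk_start arguments for a given cursor offset. The following sketch is one way to do it, assuming the rope can be viewed as a slice of consecutive &str chunks; chunk_for_cursor is a hypothetical helper written with the standard library only.

```rust
/// Hypothetical helper: `chunks` stands in for a rope, stored as
/// consecutive pieces of one string. Returns a nonempty chunk that
/// contains the codepoint following `offset`, plus that chunk's start
/// offset. If `offset` is the end of the string, the last nonempty
/// chunk is returned instead.
fn chunk_for_cursor<'a>(chunks: &[&'a str], offset: usize) -> Option<(&'a str, usize)> {
    let mut start = 0;
    let mut last = None;
    for &chunk in chunks {
        if chunk.is_empty() {
            continue; // empty pieces carry no content and are skipped
        }
        let end = start + chunk.len();
        if offset < end {
            return Some((chunk, start));
        }
        last = Some((chunk, start));
        start = end;
    }
    // After the loop `start` is the total length of the string.
    if offset == start { last } else { None }
}
```

For "abcd" stored as "ab" and "cd", an offset of 2 selects ("cd", 2). A call made with that chunk may still report PreContext, since the chunk does not start at offset 0.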
pub fn next_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, GraphemeIncomplete>
Find the next boundary after the current cursor position. Only a part of the string need be supplied. If the chunk is incomplete, then this method might return GraphemeIncomplete::PreContext or GraphemeIncomplete::NextChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should provide the chunk following the one given, then retry.
See is_boundary for expectations on the provided chunk.
use unicode_segmentation::GraphemeCursor;

let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(16)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(None));
And an example that uses partial strings:
use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};

let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.next_boundary(&s[..2], 0), Ok(Some(1)));
assert_eq!(cursor.next_boundary(&s[..2], 0), Err(GraphemeIncomplete::NextChunk));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(2)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(4)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(None));
pub fn prev_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, GraphemeIncomplete>
Find the previous boundary before the current cursor position. Only a part of the string need be supplied. If the chunk is incomplete, then this method might return GraphemeIncomplete::PreContext or GraphemeIncomplete::PrevChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should provide the chunk preceding the one given, then retry.
See is_boundary for expectations on the provided chunk.
use unicode_segmentation::GraphemeCursor;

let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(12, flags.len(), false);
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(None));
And an example that uses partial strings (note the exact return is not guaranteed, and may be PrevChunk or PreContext arbitrarily):
use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};

let s = "abcd";
let mut cursor = GraphemeCursor::new(4, s.len(), false);
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Err(GraphemeIncomplete::PrevChunk));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(2)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(1)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(None));
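Responding to PrevChunk or NextChunk means stepping to a neighbouring chunk and knowing the offset at which it starts. One possible approach, sketched here with the standard library only, is to pair every chunk with its start offset up front so that a neighbour is simply the previous or next entry; chunks_with_starts and the slice-of-chunks representation are assumptions made for illustration.

```rust
/// Hypothetical helper: `chunks` stands in for a rope, stored as
/// consecutive pieces of one string. Pairs each nonempty chunk with the
/// byte offset at which it starts, in order, so the chunk before or
/// after entry `i` is entry `i - 1` or `i + 1`.
fn chunks_with_starts<'a>(chunks: &[&'a str]) -> Vec<(&'a str, usize)> {
    let mut start = 0;
    let mut out = Vec::with_capacity(chunks.len());
    for &chunk in chunks {
        if !chunk.is_empty() {
            out.push((chunk, start));
        }
        start += chunk.len();
    }
    out
}
```

For "abcd" stored as "ab" and "cd", the result is ("ab", 0) followed by ("cd", 2), which are the chunk and chunk_start pairs used in the partial-string example above.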
Trait Implementations
impl Clone for GraphemeCursor
fn clone(&self) -> GraphemeCursor
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.