Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom error isn't respected in some cases #530

Open
faassen opened this issue Sep 20, 2023 · 6 comments
Open

Custom error isn't respected in some cases #530

faassen opened this issue Sep 20, 2023 · 6 comments

Comments

@faassen
Copy link

faassen commented Sep 20, 2023

I've implemented a custom error class, with a Custom error type on it. I want to return this custom error using try_map and then see it at the end of the failed parse. It works, but not always. When it doesn't work, ExpectedFound is produced instead (i.e. a non-custom error).

combined_year_custom_error fails, the rest doesn't. The dbg! in year_parser behaves intriguingly: while the custom error is indeed returned from try_map, it's replaced by an ExpectedFound by the time the map_err immediately following year_parser is invoked. If I use year_parser by myself the Custom error is emitted by both dbg!.

I tried to replicate the behavior in a simplified with using combined_true_false_parser but this does exhibit the expected behavior.

use chumsky::prelude::*;
use chumsky::util::MaybeRef;

#[derive(Debug, Clone, PartialEq)]
enum MyError {
    ExpectedFound {
        span: SimpleSpan<usize>,
        expected: Vec<Option<char>>,
        found: Option<char>,
    },
    Custom,
}

impl<'a> chumsky::error::Error<'a, &'a str> for MyError {
    fn expected_found<E: IntoIterator<Item = Option<MaybeRef<'a, char>>>>(
        expected: E,
        found: Option<MaybeRef<'a, char>>,
        span: SimpleSpan<usize>,
    ) -> Self {
        Self::ExpectedFound {
            span,
            expected: expected
                .into_iter()
                .map(|e| e.as_deref().copied())
                .collect(),
            found: found.as_deref().copied(),
        }
    }

    fn merge(self, other: Self) -> Self {
        println!("merging {:?} and {:?}", self, other);
        match (self, other) {
            (MyError::ExpectedFound { .. }, a) => a,
            (a, MyError::ExpectedFound { .. }) => a,
            (a, _) => a,
        }
    }
}

type MyExtra = extra::Err<MyError>;

fn year_parser<'a>() -> impl Parser<'a, &'a str, i32, MyExtra> {
    let digit = any::<&str, MyExtra>().filter(|c: &char| c.is_ascii_digit());

    let digits = digit.repeated().at_least(1).collect::<String>();

    digits
        .try_map(|digits, _| {
            dbg!(&digits);
            dbg!(if digits.len() <= 4 {
                Ok(2000)
            } else {
                Err(MyError::Custom)
            })
        })
        .map_err(|e| dbg!(e))
}

fn combined_year_parser<'a>() -> impl Parser<'a, &'a str, (i32, i32), MyExtra> {
    let year = year_parser().boxed();
    year.clone().then_ignore(just("-")).then(year)
}

fn true_false_parser<'a>() -> impl Parser<'a, &'a str, bool, MyExtra> {
    just("true")
        .or(just("false"))
        .try_map(|token, _| {
            dbg!(if token == "true" {
                Ok(true)
            } else {
                Err(MyError::Custom)
            })
        })
        .map_err(|e| dbg!(e))
}

fn combined_true_false_parser<'a>() -> impl Parser<'a, &'a str, (bool, bool), MyExtra> {
    let true_false = true_false_parser().boxed();
    true_false.clone().then_ignore(just("-")).then(true_false)
}

#[test]
fn year_custom_error() {
    let parser = year_parser();
    let s = "123456789";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn combined_year_custom_error() {
    let parser = combined_year_parser();
    let s = "123456789-1000";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn true_false_custom_error() {
    let parser = true_false_parser();
    let s = "false";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn combined_true_false_custom_error() {
    let parser = combined_true_false_parser();
    let s = "false-true";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

fn main() {
    let parser = combined_year_parser();
    let s = "123456789-1000";
    match parser.parse(s).into_result() {
        Ok(value) => println!("value: {:?}", value),
        Err(e) => println!("error: {:?}", e),
    }
}

As a usability note: it was surprising to me that I needed to implement merge. Without it, year_custom_error also fails. merge doesn't get invoked for combined_year_custom_error, however.

This is with the latest git version of chumsky.

It may be related to #483 in some way, as this talks about the use of repeated as well as triggering some kind of error erasure.

@Zij-IT
Copy link
Contributor

Zij-IT commented Sep 20, 2023

I've implemented a custom error class, with a Custom error type on it. I want to return this custom error using try_map and then see it at the end of the failed parse. It works, but not always. When it doesn't work, ExpectedFound is produced instead (i.e. a non-custom error).

combined_year_custom_error fails, the rest doesn't. The dbg! in year_parser behaves intriguingly: while the custom error is indeed returned from try_map, it's replaced by an ExpectedFound by the time the map_err immediately following year_parser is invoked. If I use year_parser by myself the Custom error is emitted by both dbg!.

I tried to replicate the behavior in a simplified with using combined_true_false_parser but this does exhibit the expected behavior.

use chumsky::prelude::*;
use chumsky::util::MaybeRef;

#[derive(Debug, Clone, PartialEq)]
enum MyError {
    ExpectedFound {
        span: SimpleSpan<usize>,
        expected: Vec<Option<char>>,
        found: Option<char>,
    },
    Custom,
}

impl<'a> chumsky::error::Error<'a, &'a str> for MyError {
    fn expected_found<E: IntoIterator<Item = Option<MaybeRef<'a, char>>>>(
        expected: E,
        found: Option<MaybeRef<'a, char>>,
        span: SimpleSpan<usize>,
    ) -> Self {
        Self::ExpectedFound {
            span,
            expected: expected
                .into_iter()
                .map(|e| e.as_deref().copied())
                .collect(),
            found: found.as_deref().copied(),
        }
    }

    fn merge(self, other: Self) -> Self {
        println!("merging {:?} and {:?}", self, other);
        match (self, other) {
            (MyError::ExpectedFound { .. }, a) => a,
            (a, MyError::ExpectedFound { .. }) => a,
            (a, _) => a,
        }
    }
}

type MyExtra = extra::Err<MyError>;

fn year_parser<'a>() -> impl Parser<'a, &'a str, i32, MyExtra> {
    let digit = any::<&str, MyExtra>().filter(|c: &char| c.is_ascii_digit());

    let digits = digit.repeated().at_least(1).collect::<String>();

    digits
        .try_map(|digits, _| {
            dbg!(&digits);
            dbg!(if digits.len() <= 4 {
                Ok(2000)
            } else {
                Err(MyError::Custom)
            })
        })
        .map_err(|e| dbg!(e))
}

fn combined_year_parser<'a>() -> impl Parser<'a, &'a str, (i32, i32), MyExtra> {
    let year = year_parser().boxed();
    year.clone().then_ignore(just("-")).then(year)
}

fn true_false_parser<'a>() -> impl Parser<'a, &'a str, bool, MyExtra> {
    just("true")
        .or(just("false"))
        .try_map(|token, _| {
            dbg!(if token == "true" {
                Ok(true)
            } else {
                Err(MyError::Custom)
            })
        })
        .map_err(|e| dbg!(e))
}

fn combined_true_false_parser<'a>() -> impl Parser<'a, &'a str, (bool, bool), MyExtra> {
    let true_false = true_false_parser().boxed();
    true_false.clone().then_ignore(just("-")).then(true_false)
}

#[test]
fn year_custom_error() {
    let parser = year_parser();
    let s = "123456789";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn combined_year_custom_error() {
    let parser = combined_year_parser();
    let s = "123456789-1000";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn true_false_custom_error() {
    let parser = true_false_parser();
    let s = "false";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn combined_true_false_custom_error() {
    let parser = combined_true_false_parser();
    let s = "false-true";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

fn main() {
    let parser = combined_year_parser();
    let s = "123456789-1000";
    match parser.parse(s).into_result() {
        Ok(value) => println!("value: {:?}", value),
        Err(e) => println!("error: {:?}", e),
    }
}

As a usability note: it was surprising to me that I needed to implement merge. Without it, year_custom_error also fails. merge doesn't get invoked for combined_year_custom_error, however.

This is with the latest git version of chumsky.

It may be related to #483 in some way, as this talks about the use of repeated as well as triggering some kind of error erasure.

I believe in this case the solution is to use Parser::validate and not Parser::try_map for parsing the year. By swapping the two methods in year_parser:

fn year_parser<'a>() -> impl Parser<'a, &'a str, i32, MyExtra> {
    let digit = any::<&str, MyExtra>().filter(|c: &char| c.is_ascii_digit());

    let digits = digit.repeated().at_least(1).collect::<String>();

    digits.validate(|digits, _, emitter| {
        dbg!(&digits);
        dbg!(if digits.len() <= 4 {
            2000
        } else {
            emitter.emit(MyError::Custom);
            // Some sentinel value because you know it was a date, just not a valid one.
           // The custom error will be produced and will be found in the `ParseResult`
            -1
        })
    })
}

You get the expected output (and all tests now pass):

error: [Custom]

Parser::try_map is used when you want to validate an input, but have an Err treated as if that item wasn't correctly parsed. Parser::validate is used when you know you have successfully parsed a token, but have to validate that it meets some requirements

In the docs for Parser::validate on newest branch:

Validate an output, producing non-terminal errors if it does not fulfil certain criteria. The errors will not immediately halt parsing on this path, but instead it will continue, potentially emitting one or more other errors, only failing after the pattern has otherwise successfully, or emitted another terminal error.

This function also permits mapping the output to a value of another type, similar to Parser::map.

If you wish parsing of this pattern to halt when an error is generated instead of continuing, consider using Parser::try_map instead.

@faassen
Copy link
Author

faassen commented Sep 20, 2023

Thank you! I will give that a shot.

I had encountered validate in the documentation but I drew entirely the wrong conclusion from its documentation. In my mind I did want parsing to immediately halt, but I think the key phrase is "on this path". It's somewhat counterintuitive that in order for parsing to fail entirely, you actually need it to continue on a particular path!

Your explanation:

Parser::try_map is used when you want to have a possible input , but continue to try others. Parser::validate is used when you know you have successfully parsed a token, but have to validate that it meets some requirements

is helpful. I think some improvements in the docs might be helpful, but I need to digest this a bit more first.

@Zij-IT
Copy link
Contributor

Zij-IT commented Sep 20, 2023

@faassen I realize now that my explanation wasn't totally correct. Parser::try_map treats an err as a parsing error, and abandons that branch of parsing, where Parser::validate is, as I previously wrote, when you parsed a thing, you know it's a thing, but that thing can be invalid (invalid dates, number literals that are too large and the like).

I personally haven't used try_map but use validate for parsing number literals in a toy programming language

@faassen
Copy link
Author

faassen commented Sep 20, 2023

I'm still confused why try_map causes the branch to be abandoned with a custom error, there appears to be no other branch, but it doesn't return the custom error.

I must be wrong about there being no other branch - there is a valid branch with just 4 digits, but the rest of the pattern can't be parsed. It's just that it doesn't even seem to enter that branch, as I only see one output of digits. Perhaps that's due to optimizations?

@zesterer
Copy link
Owner

I am quite confused about the original issue. Maybe I've missed something (I'm currently viewing this on mobile while on a train) but to me the original code should be producing Error::Custom. Unfortunately I'm not going to get a chance to look into this for a few days, but I want to investigate this.

@faassen
Copy link
Author

faassen commented Sep 21, 2023

To confirm, as @Zij-IT suggested, using validate instead of try_map does make these problems go away. I've already fixed several error handling bugs using this strategy. I don't understand why exactly, though.

Could there be a shortcut API for emit that looks more like try_map?

try_validate(sentinel, |s, _| {
     // this returns a result
     s.parse()
})

and this gets turned into:

validate(|s, _, emitter| {
   match s.parse() {
       Ok(parsed) => parsed,
       Err(err) => { 
            emitter.emit(err);
           sentinel
       } 
   }
})

This way try_validate is almost like try_map, except that an explicit sentinel is required, which is a bit icky but is what you end up doing when you use validate manually as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants