Value error when max_stopword_similarity too low in extract_terms method #29

Open
loctimize opened this issue Aug 17, 2022 · 0 comments

@loctimize
When the max_stopword_similarity value passed to the extract_terms method is too low, e.g. 0.10, it can happen that no terms are found at all. In that case the following error is raised at line 124 of term_extractor.py.

raise ValueError(
ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Suggestion:
Check if top_spans actually contains any term candidate by wrapping lines 124-132 in an if condition:

        if len(top_spans) > 0:  # proposed guard
            if collapse_similarity is True:
                top_spans = self._collapse_similarity(top_spans)
    
            for i, span in enumerate(top_spans):
                span._.span_id = i
            top_spans = sorted(top_spans, key=lambda span: span._.span_id)
    
            if return_as_table is True:
                top_spans = self._return_as_table(top_spans)
        return top_spans
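
For illustration, here is a minimal, dependency-free sketch of the guard pattern suggested above. It is not the library's code: `postprocess_spans` and `strict_collapse` are hypothetical stand-ins, spans are plain strings rather than spaCy spans, and the stand-in collapse step simply raises on empty input to mimic the 2D-array check that fails in the traceback.

```python
from typing import Callable, List, Sequence


def strict_collapse(spans: List[str]) -> List[str]:
    """Hypothetical stand-in for the collapse step.

    Like the array check in the traceback, it rejects empty input.
    Here it only removes duplicates while preserving order.
    """
    if not spans:
        raise ValueError("Expected 2D array, got 1D array instead: array=[].")
    return list(dict.fromkeys(spans))


def postprocess_spans(
    top_spans: Sequence[str],
    collapse: Callable[[List[str]], List[str]] = strict_collapse,
    collapse_similarity: bool = True,
) -> List[str]:
    """Hypothetical stand-in for the tail of extract_terms with the guard."""
    spans = list(top_spans)
    if len(spans) > 0:  # the proposed guard: skip post-processing when empty
        if collapse_similarity:
            spans = collapse(spans)
    return spans


# No candidates survived filtering: the guard returns the empty list unchanged.
print(postprocess_spans([]))
# Candidates present: post-processing runs as before.
print(postprocess_spans(["b", "a", "a"]))
```

Without the guard, calling `strict_collapse([])` directly raises the `ValueError`; with it, an empty candidate list is returned as is. Whether an empty result should still be converted when return_as_table is True is a separate design question that this sketch does not address.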

Does this make sense to you?
