
Feat: Automatic Language Detection for JPlag CLI #2353


Open: wants to merge 4 commits into base: main


KaitongQin

Automatic Language Detection for JPlag CLI

Background

When using JPlag for similarity detection, users often forget to specify the target language with the -l option. Previously, this led to a wrong language being assumed or required a default language to be configured, which could produce inaccurate results or make the analysis fail. This pull request addresses this usability issue by automatically detecting the programming language(s) used in the input directory based on file suffixes.
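A minimal sketch of the suffix-collection step, assuming Java 11+ and only java.nio.file. The class name SuffixCollector is hypothetical; this is not the PR's actual LanguageChecker code, just an illustration of a recursive scan that gathers the distinct suffixes below the root directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

public class SuffixCollector {

    /** Recursively collects the distinct file suffixes (e.g. ".java") below the given root directory. */
    public static Set<String> collectSuffixes(Path root) throws IOException {
        Set<String> suffixes = new TreeSet<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(path -> {
                String name = path.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Skip files without a suffix ("README") and dot-files such as ".gitignore"
                if (dot > 0) {
                    suffixes.add(name.substring(dot));
                }
            });
        }
        return suffixes;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny submission tree in a temporary directory
        Path root = Files.createTempDirectory("submissions");
        Files.createDirectories(root.resolve("alice"));
        Files.createDirectories(root.resolve("bob"));
        Files.writeString(root.resolve("alice/Main.java"), "class Main {}");
        Files.writeString(root.resolve("bob/main.py"), "print('hi')");
        Files.writeString(root.resolve("alice/README"), "no suffix");
        System.out.println(collectSuffixes(root)); // [.java, .py]
    }
}
```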

Summary of Changes

  • Added a new utility class LanguageChecker, which recursively scans the root input directory to collect file suffixes.
  • If the user specifies a language with -l, the tool checks whether all file suffixes in the directory are valid for that language.
    • If yes, it proceeds with the user-specified language.
    • If no, it warns the user and attempts to detect a better-matching language automatically.
  • If no exact match is found or multiple possible matches exist:
    • The user is advised to use the multi-language mode for handling multiple languages.
  • Languages are loaded from the class LanguageLoader.
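The decision flow in the list above might look roughly like the following sketch. The LANGUAGES map is a hypothetical stand-in for what LanguageLoader provides; its language names and suffix sets are illustrative, not JPlag's real configuration, and chooseLanguage and matchingLanguages are made-up names rather than the PR's API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class LanguageDetectionSketch {

    /** Hypothetical stand-in for LanguageLoader: language name -> accepted file suffixes. */
    static final Map<String, Set<String>> LANGUAGES = Map.of(
            "java", Set.of(".java"),
            "python3", Set.of(".py"),
            "cpp", Set.of(".cpp", ".hpp", ".cc", ".h"),
            "c", Set.of(".c", ".h"));

    /** Sorted names of all languages that accept every suffix found in the input directory. */
    static List<String> matchingLanguages(Set<String> foundSuffixes) {
        return LANGUAGES.entrySet().stream()
                .filter(entry -> entry.getValue().containsAll(foundSuffixes))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Returns the language to use, or empty if the user should switch to multi-language mode. */
    static Optional<String> chooseLanguage(Set<String> foundSuffixes, String userLanguage) {
        if (userLanguage != null) {
            Set<String> accepted = LANGUAGES.getOrDefault(userLanguage, Set.of());
            if (accepted.containsAll(foundSuffixes)) {
                return Optional.of(userLanguage); // the -l choice fits every file
            }
            System.err.println("Warning: not all files match '" + userLanguage + "', trying to detect a language.");
        }
        List<String> candidates = matchingLanguages(foundSuffixes);
        if (candidates.size() == 1) {
            return Optional.of(candidates.get(0)); // exactly one language fits
        }
        // Zero or several candidates: advise the multi-language mode instead of guessing
        System.err.println("No single matching language (candidates: " + candidates + "); consider the multi-language mode.");
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(chooseLanguage(Set.of(".java"), null));        // Optional[java]
        System.out.println(chooseLanguage(Set.of(".py"), "java"));        // Optional[python3], after a warning
        System.out.println(chooseLanguage(Set.of(".h"), null));           // Optional.empty (c and cpp both match)
        System.out.println(chooseLanguage(Set.of(".java", ".py"), null)); // Optional.empty (no single language)
    }
}
```

Requiring a language to accept every suffix mirrors the description above; one consequence in this sketch is that a stray README.md or build file would rule out every single-language match, so a fuller implementation might want to ignore suffixes that no language recognizes.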

@tsaglam
Member

tsaglam commented May 2, 2025

Hey, thanks for your contribution. This might be very similar to #2087, which introduced the multi-language module. Right now, this module relies on specifying the languages, but we are in the process of adapting that to automatically choose the language for each parsed file individually, see #2304. Maybe you can weigh in on the differences compared to your solution. Also, note that you committed binary files in this PR branch.

tsaglam added the duplicate, enhancement, and minor labels on May 3, 2025

sonarqubecloud bot commented May 6, 2025
