Code Complexity + Lines & DRYness #560

Open
J-B-Blankenship opened this issue Nov 16, 2024 · 8 comments

Comments

@J-B-Blankenship

J-B-Blankenship commented Nov 16, 2024

Code Complexity:
Using C++ as an example:
"for ",
"for(",
"if ",
"if(",
"switch ",
"switch(",
"while ",
"while(",
"else ",
"|| ",
"&& ",
"!= ",
"== "
The addition of "< ", "> ", "<= ", ">= ", "| ", "& " are all valid logical operators in the C++ language. I can help add these additional operators for languages that I know (Python and C/C++). To me, these logical operators imply the same "complexity" we are trying to track.
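To make the effect of the extra tokens concrete, here is a minimal sketch of a token-based counter. It is an illustration only, not how scc is implemented: `count_complexity`, `CPP_CHECKS`, `PROPOSED`, and `SAMPLE` are hypothetical names, the scan is a naive longest-match pass, and it makes no attempt to skip strings or comments, which a real counter would need to do.

```python
# Tokens listed above for C++, plus the proposed additions.
CPP_CHECKS = ["for ", "for(", "if ", "if(", "switch ", "switch(",
              "while ", "while(", "else ", "|| ", "&& ", "!= ", "== "]
PROPOSED = ["< ", "> ", "<= ", ">= ", "| ", "& "]


def count_complexity(source: str, checks: list[str]) -> int:
    """Count non-overlapping token hits, scanning left to right.

    Longer tokens are tried first so that "|| " is not also counted
    as "| " (and "&& " is not also counted as "& ").
    """
    ordered = sorted(checks, key=len, reverse=True)
    count = 0
    i = 0
    while i < len(source):
        for token in ordered:
            if source.startswith(token, i):
                count += 1
                i += len(token)  # skip past the matched token
                break
        else:
            i += 1  # no token starts here
    return count


SAMPLE = "if (a == b || c >= d) { x = y & z; }"
print(count_complexity(SAMPLE, CPP_CHECKS))             # current list
print(count_complexity(SAMPLE, CPP_CHECKS + PROPOSED))  # with additions
```

On this one-line sample the current list yields 3 hits ("if ", "== ", "|| ") and the extended list yields 5 (adding ">= " and "& ").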

The Complexity/Lines calculation appears to include blanks and comments. I would expect complexity to be compared against Code only, not against Blanks + Comments + Code; including them artificially decreases the metric.

Lines and DRYness:
For languages like Java, C, and C++, the utilization of "{" and "}" is common as syntax "sugar" to define portions of code. Currently, a line containing only these symbols counts as a line in the total. This artificially lowers the DRYness metric by inflating the line count, because such lines are counted as non-unique when they occur multiple times. I believe there is a strong argument for adding an ignore list for lines containing only these symbols.

Example:

void foo()
{
}
void bar()
{
}

This counts as 6 lines of code with 4 unique lines, for a DRYness of 66.67%, whereas it could be represented equally as the following:

void foo() {}
void bar() {}

This counts as 2 lines of code with 2 unique lines, for a DRYness of 100.00%.
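The arithmetic for both layouts can be reproduced with a short sketch, assuming DRYness is simply unique lines divided by total lines and that lines are compared verbatim. `dryness`, `EXPANDED`, and `COMPACT` are hypothetical names, and any normalization scc may apply before comparing lines is not modeled here.

```python
def dryness(source: str) -> tuple[int, int, float]:
    """Return (total lines, unique lines, unique / total)."""
    lines = source.splitlines()
    if not lines:
        return 0, 0, 0.0
    unique = len(set(lines))
    return len(lines), unique, unique / len(lines)


EXPANDED = "void foo()\n{\n}\nvoid bar()\n{\n}\n"
COMPACT = "void foo() {}\nvoid bar() {}\n"

for label, text in (("expanded", EXPANDED), ("compact", COMPACT)):
    total, unique, ratio = dryness(text)
    print(f"{label}: lines={total} unique={unique} dryness={ratio:.2%}")
```

The repeated "{" and "}" lines are what pull the expanded layout down to 4 unique lines out of 6.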

In addition, I have noticed the calculation uses all lines (e.g. Blanks, Comments, and Code). I would expect ULOC to comprise only Code, and DRYness to be computed from Code only or from Code + Comments.

I know how to add the operators. Ignoring lines that consist only of symbols from a given list is beyond me (I am not familiar with Go). Happy to help where I can.

@J-B-Blankenship
Author

J-B-Blankenship commented Nov 16, 2024

I wrote a quick script to strip desired syntax sugar with the following results:

Old:

scc --cocomo-project-type embedded --avg-wage 150000 --by-file --sort complexity --dryness --not-match test* --format wide rogue/
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Language                              Files     Lines   Blanks  Comments     Code Complexity Complexity/Lines
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
C Header                                 14      4125      393       205     3527        526           133.06
(ULOC)                                           1783
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~tructures/red_black_trees/red_black_tree.h      1220       86        37     1097        191            17.41
rogue/management/memory/schema_storage.h          899      101        59      739        158            21.38
~tures/dimensional_trees/dimensional_tree.h       420       46        28      346         54            15.61
~ue/management/services/interface_service.h       277       19         6      252         39            15.48
rogue/algorithms/sorts.h                          334       20        25      289         20             6.92
~ue/management/services/configure_service.h       230       19        11      200         18             9.00
rogue/algorithms/multiset_operations.h             90        6         1       83         17            20.48
rogue/concurrency/threadpool.h                    181       20         7      154         14             9.09
rogue/data_structures/nodes/node.h                147       18         5      124          8             6.45
rogue/common/definitions.h                        128       20         6      102          4             3.92
rogue/common/generated_conversions.h               56       10         5       41          3             7.32
rogue/management/memory/parser.h                   72       17         5       50          0             0.00
rogue/common/calculations.h                        30        5         5       20          0             0.00
rogue/common/string_manipulations.h                41        6         5       30          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Python                                    5       900      121        30      749         99            68.11
(ULOC)                                            545
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
rogue/scripts/protobuf_generation.py              297       47         8      242         46            19.01
~nagement/services/configuration_service.py       166       29         1      136         27            19.85
~ogue/scripts/generate_interface_service.py       312       19         9      284         14             4.93
rogue/scripts/common.py                            88       17         7       64         10            15.62
rogue/scripts/preparation_script.py                37        9         5       23          2             8.70
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
C++                                       2       116       21        31       64          2             4.08
(ULOC)                                             84
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~e/performance/benchmark_red_black_tree.cpp        89       14        26       49          2             4.08
~/management/services/interface_service.cpp        27        7         5       15          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Protocol Buffers                          6       226       37        34      155          0             0.00
(ULOC)                                            107
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
rogue/services/configure.proto                     18        4         5        9          0             0.00
rogue/services/interface.proto                     19        4         5       10          0             0.00
rogue/services/generic.proto                       43        5         5       33          0             0.00
rogue/services/subscribed.proto                    16        3         5        8          0             0.00
rogue/services/queries.proto                      109       16         9       84          0             0.00
rogue/services/extensions.proto                    21        5         5       11          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bazel                                     1        59        6         0       53          0             0.00
(ULOC)                                             35
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
rogue/data_structures/BUILD.bazel                  59        6         0       53          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Total                                    28      5426      578       300     4548        627           205.25
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Unique Lines of Code (ULOC)                      2397
DRYness %                                        0.44
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (embedded) $664,979
Estimated Schedule Effort (embedded) 6.74 months
Estimated People Required (embedded) 3.29
Processed 200250 bytes, 0.200 megabytes (SI)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
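As a sanity check on the Complexity/Lines column, the sketch below recomputes the per-file value for three rows of the output above using two candidate denominators, Code and Lines. This is only an observation from the posted numbers, not a statement about scc's source; `ROWS` and `per_100` are hypothetical names.

```python
# (complexity, lines, code, reported Complexity/Lines) copied from the rows above.
ROWS = {
    "red_black_tree.h": (191, 1220, 1097, 17.41),
    "schema_storage.h": (158, 899, 739, 21.38),
    "sorts.h": (20, 334, 289, 6.92),
}


def per_100(complexity: int, denominator: int) -> float:
    """Complexity per 100 units of the chosen denominator, rounded to 2 places."""
    return round(complexity / denominator * 100, 2)


for name, (complexity, lines, code, reported) in ROWS.items():
    print(f"{name}: reported={reported} "
          f"by_code={per_100(complexity, code)} "
          f"by_lines={per_100(complexity, lines)}")
```

For these rows the reported figure matches Complexity / Code x 100 rather than Complexity / Lines x 100, and the per-language figure (133.06 for C Header) equals the sum of its per-file values.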

New:

scc --cocomo-project-type embedded --avg-wage 150000 --by-file --sort complexity --dryness --not-match test* --format wide scc_analysis/
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Language                              Files     Lines   Blanks  Comments     Code Complexity Complexity/Lines
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
C Header                                 14      2933      393       205     2335        526           204.41
(ULOC)                                           1741
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~tructures/red_black_trees/red_black_tree.h       799       86        37      676        191            28.25
~s/rogue/management/memory/schema_storage.h       649      101        59      489        158            32.31
~tures/dimensional_trees/dimensional_tree.h       292       46        28      218         54            24.77
~ue/management/services/interface_service.h       193       19         6      168         39            23.21
scc_analysis/rogue/algorithms/sorts.h             280       20        25      235         20             8.51
~ue/management/services/configure_service.h       171       19        11      141         18            12.77
~sis/rogue/algorithms/multiset_operations.h        58        6         1       51         17            33.33
~cc_analysis/rogue/concurrency/threadpool.h       130       20         7      103         14            13.59
~nalysis/rogue/data_structures/nodes/node.h       112       18         5       89          8             8.99
scc_analysis/rogue/common/definitions.h            86       20         6       60          4             6.67
~lysis/rogue/common/generated_conversions.h        40       10         5       25          3            12.00
~_analysis/rogue/management/memory/parser.h        68       17         5       46          0             0.00
scc_analysis/rogue/common/calculations.h           22        5         5       12          0             0.00
~alysis/rogue/common/string_manipulations.h        33        6         5       22          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Python                                    6       866      130        30      706        108            96.58
(ULOC)                                            553
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~lysis/rogue/scripts/protobuf_generation.py       280       47         8      225         46            20.44
~nagement/services/configuration_service.py       165       29         1      135         27            20.00
~ogue/scripts/generate_interface_service.py       255       19         9      227         14             6.17
scc_analysis/rogue/scripts/common.py               83       17         7       59         10            16.95
scc_analysis/scc_with_additional_checks.py         46        9         0       37          9            24.32
~alysis/rogue/scripts/preparation_script.py        37        9         5       23          2             8.70
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
C++                                       2       108       21        31       56          2             4.65
(ULOC)                                             80
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~e/performance/benchmark_red_black_tree.cpp        83       14        26       43          2             4.65
~/management/services/interface_service.cpp        25        7         5       13          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Protocol Buffers                          6       179       37        34      108          0             0.00
(ULOC)                                            103
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~cc_analysis/rogue/services/configure.proto        16        4         5        7          0             0.00
~cc_analysis/rogue/services/interface.proto        17        4         5        8          0             0.00
scc_analysis/rogue/services/generic.proto          35        5         5       25          0             0.00
~c_analysis/rogue/services/extensions.proto        17        5         5        7          0             0.00
~c_analysis/rogue/services/subscribed.proto        15        3         5        7          0             0.00
scc_analysis/rogue/services/queries.proto          79       16         9       54          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bazel                                     1        52        6         0       46          0             0.00
(ULOC)                                             34
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~analysis/rogue/data_structures/BUILD.bazel        52        6         0       46          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Total                                    29      4138      587       300     3251        636           305.64
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Unique Lines of Code (ULOC)                      2389
DRYness %                                        0.58
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (embedded) $444,471
Estimated Schedule Effort (embedded) 5.92 months
Estimated People Required (embedded) 2.50
Processed 187834 bytes, 0.188 megabytes (SI)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────

@J-B-Blankenship
Author

New with Blank lines stripped:

scc --cocomo-project-type embedded --avg-wage 150000 --by-file --sort complexity --dryness --not-match test* --format wide scc_analysis/
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Language                              Files     Lines   Blanks  Comments     Code Complexity Complexity/Lines
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
C Header                                 14      2537        0       205     2332        526           204.69
(ULOC)                                           1731
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~tructures/red_black_trees/red_black_tree.h       713        0        37      676        191            28.25
~s/rogue/management/memory/schema_storage.h       548        0        59      489        158            32.31
~tures/dimensional_trees/dimensional_tree.h       246        0        28      218         54            24.77
~ue/management/services/interface_service.h       174        0         6      168         39            23.21
scc_analysis/rogue/algorithms/sorts.h             260        0        25      235         20             8.51
~ue/management/services/configure_service.h       149        0        11      138         18            13.04
~sis/rogue/algorithms/multiset_operations.h        52        0         1       51         17            33.33
~cc_analysis/rogue/concurrency/threadpool.h       110        0         7      103         14            13.59
~nalysis/rogue/data_structures/nodes/node.h        94        0         5       89          8             8.99
scc_analysis/rogue/common/definitions.h            66        0         6       60          4             6.67
~lysis/rogue/common/generated_conversions.h        30        0         5       25          3            12.00
~alysis/rogue/common/string_manipulations.h        27        0         5       22          0             0.00
~_analysis/rogue/management/memory/parser.h        51        0         5       46          0             0.00
scc_analysis/rogue/common/calculations.h           17        0         5       12          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Python                                    6       711        0        30      681        108            98.00
(ULOC)                                            548
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~lysis/rogue/scripts/protobuf_generation.py       228        0         8      220         46            20.91
~nagement/services/configuration_service.py       133        0         1      132         27            20.45
~ogue/scripts/generate_interface_service.py       219        0         9      210         14             6.67
scc_analysis/rogue/scripts/common.py               66        0         7       59         10            16.95
scc_analysis/scc_with_additional_checks.py         37        0         0       37          9            24.32
~alysis/rogue/scripts/preparation_script.py        28        0         5       23          2             8.70
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
C++                                       2        87        0        31       56          2             4.65
(ULOC)                                             79
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~e/performance/benchmark_red_black_tree.cpp        69        0        26       43          2             4.65
~/management/services/interface_service.cpp        18        0         5       13          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Protocol Buffers                          6       142        0        34      108          0             0.00
(ULOC)                                            103
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~cc_analysis/rogue/services/interface.proto        13        0         5        8          0             0.00
~cc_analysis/rogue/services/configure.proto        12        0         5        7          0             0.00
scc_analysis/rogue/services/generic.proto          30        0         5       25          0             0.00
~c_analysis/rogue/services/subscribed.proto        12        0         5        7          0             0.00
~c_analysis/rogue/services/extensions.proto        12        0         5        7          0             0.00
scc_analysis/rogue/services/queries.proto          63        0         9       54          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bazel                                     1        46        0         0       46          0             0.00
(ULOC)                                             34
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~analysis/rogue/data_structures/BUILD.bazel        46        0         0       46          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Total                                    29      3523        0       300     3223        636           307.34
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Unique Lines of Code (ULOC)                      2379
DRYness %                                        0.68
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (embedded) $439,881
Estimated Schedule Effort (embedded) 5.90 months
Estimated People Required (embedded) 2.48
Processed 186260 bytes, 0.186 megabytes (SI)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────

This totals a 24-percentage-point swing from the original DRYness calculation (0.44 to 0.68). To me, this accurately reflects the amount of maintainable code in a repo (e.g. Comments and Code).

@J-B-Blankenship
Author

J-B-Blankenship commented Nov 16, 2024

Finally, here is example output when using all available logical operators for Python and C++ (note the complexity total in this run compared with the previous runs):

scc --cocomo-project-type embedded --avg-wage 150000 --by-file --sort complexity --dryness --not-match test* --format wide scc_analysis/
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Language                              Files     Lines   Blanks  Comments     Code Complexity Complexity/Lines
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
C Header                                 14      2537        0       205     2332        644           261.30
(ULOC)                                           1568
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~tructures/red_black_trees/red_black_tree.h       713        0        37      676        232            34.32
~s/rogue/management/memory/schema_storage.h       548        0        59      489        194            39.67
~tures/dimensional_trees/dimensional_tree.h       246        0        28      218         56            25.69
~ue/management/services/interface_service.h       174        0         6      168         44            26.19
scc_analysis/rogue/algorithms/sorts.h             260        0        25      235         28            11.91
~ue/management/services/configure_service.h       149        0        11      138         22            15.94
~sis/rogue/algorithms/multiset_operations.h        52        0         1       51         19            37.25
~cc_analysis/rogue/concurrency/threadpool.h       110        0         7      103         17            16.50
scc_analysis/rogue/common/definitions.h            66        0         6       60         17            28.33
~nalysis/rogue/data_structures/nodes/node.h        94        0         5       89         12            13.48
~lysis/rogue/common/generated_conversions.h        30        0         5       25          3            12.00
scc_analysis/rogue/common/calculations.h           17        0         5       12          0             0.00
~alysis/rogue/common/string_manipulations.h        27        0         5       22          0             0.00
~_analysis/rogue/management/memory/parser.h        51        0         5       46          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Python                                    6       718        0        30      688        109            96.40
(ULOC)                                            516
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~lysis/rogue/scripts/protobuf_generation.py       228        0         8      220         46            20.91
~nagement/services/configuration_service.py       133        0         1      132         27            20.45
~ogue/scripts/generate_interface_service.py       219        0         9      210         14             6.67
scc_analysis/scc_with_additional_checks.py         44        0         0       44         10            22.73
scc_analysis/rogue/scripts/common.py               66        0         7       59         10            16.95
~alysis/rogue/scripts/preparation_script.py        28        0         5       23          2             8.70
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
C++                                       2        87        0        31       56          3             6.98
(ULOC)                                             78
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~e/performance/benchmark_red_black_tree.cpp        69        0        26       43          3             6.98
~/management/services/interface_service.cpp        18        0         5       13          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Protocol Buffers                          6       142        0        34      108          0             0.00
(ULOC)                                            103
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~cc_analysis/rogue/services/interface.proto        13        0         5        8          0             0.00
~cc_analysis/rogue/services/configure.proto        12        0         5        7          0             0.00
~c_analysis/rogue/services/extensions.proto        12        0         5        7          0             0.00
scc_analysis/rogue/services/queries.proto          63        0         9       54          0             0.00
scc_analysis/rogue/services/generic.proto          30        0         5       25          0             0.00
~c_analysis/rogue/services/subscribed.proto        12        0         5        7          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bazel                                     1        46        0         0       46          0             0.00
(ULOC)                                             34
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
~analysis/rogue/data_structures/BUILD.bazel        46        0         0       46          0             0.00
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Total                                    29      3530        0       300     3230        756           364.68
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Unique Lines of Code (ULOC)                      2189
DRYness %                                        0.62
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (embedded) $441,028
Estimated Schedule Effort (embedded) 5.91 months
Estimated People Required (embedded) 2.49
Processed 156262 bytes, 0.156 megabytes (SI)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────

@apocelipes
Contributor

The addition of "< ", "> ", "<= ", ">= ", "| ", "& " are all valid logical operators in the C++ language.

"|" and "&" are bitwise operators, not logical operators. See here for more information. These two operators are commonly used in branchless programming, but we are counting branches, so I don't think they should count toward complexity. Complexity is calculated from the possible code paths (branches, loops, ...), and comparison operators do not automatically generate branches. It is also more complicated to evaluate ">=" or "<=", which cover a range of values, than "==", which matches a single fixed value. So I tend not to count these operators.

The Complexity/Lines calculation appears to include blanks and comments. I would expect complexity to be compared against Code only, not against Blanks + Comments + Code; including them artificially decreases the metric.

I'm afraid I don't agree with this. Blanks and comments are important parts of the code base. Some languages also have rules on how to use blank lines, such as Python (PEP 8). From practical experience, the proper use of blank lines and comments does help developers understand the code, so the complexity is "reduced". If we only looked at the number of lines of code, then the winner of the IOCCC (The International Obfuscated C Code Contest) would be the code with the lowest complexity, and we all know that is ridiculous.

For languages like Java, C, and C++, the utilization of "{" and "}" is common as syntax "sugar" to define portions of code.

They are ordinary valid syntax, not syntactic sugar. In addition, in GNU C/C++ and Rust, "{" and "}" are part of statement expressions:

int k = ({
    int i = foo();
    int j = bar(2, i);
    i * j;
});

We also use "{}" in macros to wrap several expressions. They must be counted. Even syntactic sugar should be counted if it is valid code.

Finally, code complexity is not an accurate measure of code quality; it can only be used as a reference.

@boyter
Owner

boyter commented Nov 17, 2024

As @apocelipes mentioned, the complexity is intended to count branch conditions in code, hence not including ==. This is because it's based on cyclomatic complexity calculations and how they are performed. That said, with the addition of #462 you will be able to do this yourself if that's how you want to count things, but it is very unlikely to ever make it into scc itself.

As for the blanks and comments, you are correct in that your example

void foo()
{
}
void bar()
{
}

will have a lower DRYness; however, that's a fairly contrived example, since most codebases are not going to look like this.

I had a quick attempt at adding this, and while the filtering is fairly simple in ULOC itself, it's much harder to do on the display. The reason is that the final calculation uses other results. You can see the result here, where the DRYness actually goes down when ignoring those lines, because the ignored lines are not taken into account in the final calculation.

$ scc -a --uloc-ignore main.cpp
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
C++                          1         6        0         0        6          0
ULOC                                   2
───────────────────────────────────────────────────────────────────────────────
Total                        1         6        0         0        6          0
───────────────────────────────────────────────────────────────────────────────
Unique Lines of Code (ULOC)            2
DRYness %                           0.33
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $125
Estimated Schedule Effort (organic) 0.45 months
Estimated People Required (organic) 0.02
───────────────────────────────────────────────────────────────────────────────
Processed 30 bytes, 0.000 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────

# boyter @ Bens-MacBook-Air in ~/Documents/projects/scc on git:master x [10:08:43] 
$ scc -a main.cpp              
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
C++                          1         6        0         0        6          0
ULOC                                   4
───────────────────────────────────────────────────────────────────────────────
Total                        1         6        0         0        6          0
───────────────────────────────────────────────────────────────────────────────
Unique Lines of Code (ULOC)            4
DRYness %                           0.67
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $125
Estimated Schedule Effort (organic) 0.45 months
Estimated People Required (organic) 0.02
───────────────────────────────────────────────────────────────────────────────
Processed 30 bytes, 0.000 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────

You can see this on this branch https://github.com/boyter/scc/tree/UlocIgnore

I will have a look a bit later to see if I can correct this because I don't have any issues with including it as an option.
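To make the mismatch in the two outputs above concrete, here is a minimal Python sketch. It assumes DRYness is ULOC divided by the counted lines, which is consistent with both outputs but is not taken from the scc source. The `dryness` helper and its `ignore` parameter are hypothetical names used only for illustration.

```python
def dryness(lines, ignore=frozenset()):
    """Return (uloc, counted, ratio) over whitespace-trimmed, non-blank lines.

    Lines whose trimmed text is in `ignore` are dropped from BOTH the
    unique count and the total, so the ratio stays consistent.
    """
    counted = [s for s in (line.strip() for line in lines) if s and s not in ignore]
    uloc = len(set(counted))
    ratio = uloc / len(counted) if counted else 0.0
    return uloc, len(counted), ratio


source = ["void foo()", "{", "}", "void bar()", "{", "}"]

# No filtering: 4 unique lines out of 6.
uloc, total, ratio = dryness(source)
print(uloc, total, round(ratio, 2))  # 4 6 0.67

# Filtering applied to both numerator and denominator.
uloc_ign, total_ign, ratio_ign = dryness(source, ignore={"{", "}"})
print(uloc_ign, total_ign, round(ratio_ign, 2))  # 2 2 1.0

# Filtered ULOC divided by the unfiltered line count reproduces the 0.33 above.
print(round(uloc_ign / total, 2))  # 0.33
```

Under that assumption, the fix is to subtract the ignored lines from the denominator as well as from the unique count.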

@J-B-Blankenship
Author

J-B-Blankenship commented Nov 18, 2024

I have been playing around with the idea of "syntax sugar" a bit this weekend. As of this morning, I got to this regex pattern: {+ *(//.*|/\*.*)*. Each of {, }, (, and ) has a regex of similar style. This regex catches scenarios with repeated symbols followed by comments.

However, I thought some more about it and realized even that is perhaps too complicated. The initial thrust of the idea is that basic formatting preferences be ignored. That simplifies the patterns to {+, }+;?, (+, and )+;?.

This position came from thinking about the edge case of multi-line comments and the infinite horrible ways to do that in a language like C++. For example:

void foo() /*
{ aha this is a comment still
}; /*
{}

The expected complexity and hassle of detecting this nonsense, plus the performance hit, made me come to the conclusion that code like this has larger concerns than complexity, code, blank, and comment counts. These metrics only need to be "close enough" to the real stats to make reasonable decisions in a larger context, so there is ultimately a trade-off between accuracy and performance. However, the typical syntax formatting highlighted in the initial post, which the regex patterns in the second paragraph match, drastically affects outputs and is fairly simple to capture.

I hope this makes sense. My main focus is the number of individual statements in the code, with as little inflation from syntax formatting preferences as possible.
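As a rough illustration of the simplified patterns ({+, }+;?, (+, )+;?), the following Python sketch tests whether a whole trimmed line consists only of such symbols. The names `SUGAR` and `is_sugar_line` are made up for this example, and it uses regex purely for clarity, not as a suggestion for how scc should implement it.

```python
import re

# The four simplified patterns, combined and applied to the whole trimmed line.
SUGAR = re.compile(r"\{+|\}+;?|\(+|\)+;?")


def is_sugar_line(line):
    """True when the line holds nothing but one kind of brace/paren (plus optional ';')."""
    return SUGAR.fullmatch(line.strip()) is not None


for candidate in ["{", "  }", "};", ");", "void foo() {}", "{ // comment"]:
    print(repr(candidate), is_sugar_line(candidate))
```

Note that a line such as `{ // comment` is deliberately not matched by the simplified patterns; catching it is what the longer pattern with the comment suffix was for.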

Another approach would be to use a unique line window. Rather than comparing 1 line at a time (trimmed of whitespace), the window compares N lines at a time for uniqueness, trimmed of whitespace and excluding blank lines. Single-liners, such as a fairly common one in my code, return results;, inflate duplicate code even though results may be completely different data types. A window addresses both syntax formatting inflation and pesky 1-2 liners that happen to match.

Again, happy to help. Go is a bit of an enigma to me, though, when it comes to reading it and confidently making contributions.
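The unique-window idea could be sketched like this in Python. This is only an interpretation of the proposal: the function name `unique_window_ratio`, the sliding (overlapping) windows, and the choice to report unique windows divided by total windows are all assumptions, not anything scc does.

```python
def unique_window_ratio(lines, n=3):
    """Share of distinct n-line windows over trimmed, non-blank lines."""
    stripped = [line.strip() for line in lines if line.strip()]
    if not stripped:
        return 0.0
    if len(stripped) < n:
        # Too short for a full window: treat the whole input as one window.
        windows = [tuple(stripped)]
    else:
        windows = [tuple(stripped[i:i + n]) for i in range(len(stripped) - n + 1)]
    return len(set(windows)) / len(windows)


source = ["void foo()", "{", "}", "void bar()", "{", "}"]

print(round(unique_window_ratio(source, n=1), 2))  # 0.67, same as line-by-line
print(round(unique_window_ratio(source, n=3), 2))  # 1.0, no repeated 3-line window
```

With n=1 this reduces to the per-line comparison, while a larger window stops isolated braces or a lone `return results;` from counting as duplication on their own.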

@J-B-Blankenship
Author

One of the issues with capturing < and > as equally valid logical operators is syntax rules. I am thinking of C++ in particular again.

Consider the following:

if(foo<std::string>{"a"} != foo<std::string>{"b"})

The < and > in this expression are part of template argument lists. Either you would have to detect this or simply take the hit on accuracy. Trading accuracy for performance is probably desired here, as I would hope this is a rare occurrence: initializing a template instance in an if-statement is not exactly straightforward or common, and I have not seen it in any codebase to date.
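A quick Python sketch of the accuracy hit on that line. It uses naive substring counting, which is not how scc's state machine works, and `count_tokens` is a hypothetical helper. The branch tokens are taken from the C++ list in the initial post.

```python
def count_tokens(line, tokens):
    """Sum of non-overlapping occurrences of each token in the line."""
    return sum(line.count(token) for token in tokens)


line = 'if(foo<std::string>{"a"} != foo<std::string>{"b"})'
branch = ["if(", "!= "]

print(count_tokens(line, branch))                 # 2
print(count_tokens(line, branch + ["<", ">"]))    # 6: four template brackets counted
print(count_tokens(line, branch + ["< ", "> "]))  # 2: space-suffixed forms do not match here
```

The space-suffixed tokens proposed in the initial post avoid the false hits on this particular line, but the same code formatted as `foo<std::string> {"a"}` would still be miscounted.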

@boyter
Copy link
Owner

boyter commented Nov 20, 2024

So regex is one of the things that is avoided inside the core counting. The reason for this is performance. You can compare cloc to scc to see what impact it has. This is why the core of scc is a state machine that moves from state to state based on the defined language specification.

As such I will not willingly add such a check into the core of scc.

That said, adding one as an option to pre-parse files, IF you are prepared to weather the cost, is a different story. It sounds to me like what you actually want is a way to pre-process the files, removing or replacing things that match some patterns in order to achieve your goal.

I.e. a pre-processor option where, as in your example above, you could provide some regex, and any whole line it matches is removed.

Such a thing, together with #462, would allow you to achieve your goals.

I guess it would also allow people to count without comments if they wanted, or to remove doc-comments and such too.
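For illustration only, such a pre-processing step could look like the Python sketch below, run over the text before it is handed to a counter. Nothing here is an existing scc option: `preprocess` and the pattern list are hypothetical, and the doc-comment pattern assumes `///` style comments.

```python
import re


def preprocess(text, patterns):
    """Drop every line whose trimmed content is fully matched by any pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    kept = [
        line
        for line in text.splitlines()
        if not any(regex.fullmatch(line.strip()) for regex in compiled)
    ]
    return "\n".join(kept)


text = "/// Does the thing.\nvoid foo()\n{\n    run();\n}\n"
# User-supplied patterns: brace-only lines and '///' doc-comments.
patterns = [r"\{+", r"\}+;?", r"///.*"]

print(preprocess(text, patterns))
```

Keeping this outside the core counting means the regex cost is only paid by users who opt in.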
