NeoSCA is another syntactic complexity analyzer of written English language samples.

NeoSCA is a rewrite of Xiaofei Lu’s L2 Syntactic Complexity Analyzer (L2SCA) that supports Windows, macOS, and Linux. Like L2SCA, NeoSCA takes written English language samples in plain text format as input and computes:

The frequency of 9 structures in the text:

words (W)
sentences (S)
verb phrases (VP)
clauses (C)
T-units (T)
dependent clauses (DC)
complex T-units (CT)
coordinate phrases (CP)
complex nominals (CN), and

14 syntactic complexity indices of the text:

mean length of sentence (MLS)
mean length of T-unit (MLT)
mean length of clause (MLC)
clauses per sentence (C/S)
verb phrases per T-unit (VP/T)
clauses per T-unit (C/T)
dependent clauses per clause (DC/C)
dependent clauses per T-unit (DC/T)
T-units per sentence (T/S)
complex T-unit ratio (CT/T)
coordinate phrases per T-unit (CP/T)
coordinate phrases per clause (CP/C)
complex nominals per T-unit (CN/T)
complex nominals per clause (CN/C)
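
Each of the 14 indices is a ratio of two of the 9 structure counts, as the abbreviations suggest (for example, MLS is words per sentence, W/S). The following is a minimal sketch of that relationship, not NeoSCA's own code: the function name `compute_indices`, the zero-division fallback, and the sample counts are assumptions made for illustration.

```python
from typing import Dict

# Each index is numerator / denominator over the nine structure counts.
INDEX_DEFINITIONS = {
    "MLS": ("W", "S"),
    "MLT": ("W", "T"),
    "MLC": ("W", "C"),
    "C/S": ("C", "S"),
    "VP/T": ("VP", "T"),
    "C/T": ("C", "T"),
    "DC/C": ("DC", "C"),
    "DC/T": ("DC", "T"),
    "T/S": ("T", "S"),
    "CT/T": ("CT", "T"),
    "CP/T": ("CP", "T"),
    "CP/C": ("CP", "C"),
    "CN/T": ("CN", "T"),
    "CN/C": ("CN", "C"),
}


def compute_indices(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive the 14 indices from the 9 counts (hypothetical helper)."""
    indices = {}
    for name, (numerator, denominator) in INDEX_DEFINITIONS.items():
        # Assumption: report 0.0 when the denominator count is zero.
        if counts[denominator]:
            indices[name] = counts[numerator] / counts[denominator]
        else:
            indices[name] = 0.0
    return indices


# Made-up counts, not taken from any real text.
counts = {"W": 100, "S": 5, "VP": 15, "C": 12, "T": 8,
          "DC": 4, "CT": 3, "CP": 2, "CN": 10}
indices = compute_indices(counts)
print(indices["MLS"], indices["MLT"])
```

With these made-up counts, MLS is 100 / 5 and MLT is 100 / 8.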

NeoSCA vs. L2SCA

L2SCA	NeoSCA
runs on macOS and Linux	runs on Windows, macOS, and Linux
single and multiple inputs are handled by two separate commands	one command, `nsca`, for both cases, making your life easier
runs only under its own home directory	runs under any directory
outputs only frequencies of the “9+14” syntactic structures	adds options to reserve intermediate results, such as the results of parsing the text with Stanford Parser and matching patterns with Stanford Tregex

Install neosca

To install NeoSCA, you need Python 3.7 or later on your system. To check whether Python is installed, run the following command in your terminal:

python --version

If Python is not installed, download and install it from the Python website. Once Python is installed, install NeoSCA using pip:

pip install neosca

For users in China:

pip install neosca -i https://pypi.tuna.tsinghua.edu.cn/simple

Install Java 8 or later
Download and unzip the latest versions of Stanford Parser and Stanford Tregex

Set the environment variables `STANFORD_PARSER_HOME` and `STANFORD_TREGEX_HOME`

On Windows, in the Environment Variables window (press Windows+S, type env, and press Enter):

STANFORD_PARSER_HOME=\path\to\stanford-parser-full-2020-11-17
STANFORD_TREGEX_HOME=\path\to\stanford-tregex-2020-11-17

On macOS and Linux, set them in your shell (for example, in ~/.bashrc or ~/.zshrc):

export STANFORD_PARSER_HOME=/path/to/stanford-parser-full-2020-11-17
export STANFORD_TREGEX_HOME=/path/to/stanford-tregex-2020-11-17
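
Before running the analyzer, it can help to confirm that both variables are visible to new processes and point at real directories. The sketch below is one way to check this from Python; `find_env_problems` is a hypothetical helper written for this example and is not part of NeoSCA.

```python
import os
from pathlib import Path
from typing import List, Mapping

REQUIRED_VARS = ("STANFORD_PARSER_HOME", "STANFORD_TREGEX_HOME")


def find_env_problems(env: Mapping[str, str]) -> List[str]:
    """Return a human-readable message for each missing or invalid variable."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


if __name__ == "__main__":
    # Check the environment of the current process.
    for problem in find_env_problems(os.environ):
        print(problem)
```

If the script prints nothing, both variables are set and each one names an existing directory. It does not verify that the directories actually contain Stanford Parser or Stanford Tregex.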

To use NeoSCA, run the nsca command in your terminal, followed by the options and arguments you want to use.

Single input:

nsca ./samples/sample1.txt 
# frequency output: ./result.csv
nsca ./samples/sample1.txt -o sample1.csv 
# frequency output: ./sample1.csv
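
The frequency output is a CSV file, so it can be loaded with Python's standard `csv` module for further processing. The sketch below assumes the file starts with a header row naming the output fields (such as W, S, and MLS); that layout, the `Filename` column, and the helper name `read_results` are assumptions for illustration, so check them against your own result.csv. To stay runnable without NeoSCA installed, the example writes a small stand-in file with made-up values.

```python
import csv
import os
import tempfile
from typing import Dict, List


def read_results(path: str) -> List[Dict[str, str]]:
    """Read a result CSV into one dict per row, keyed by the header names."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Stand-in file with made-up values; the column layout is an assumption.
demo_path = os.path.join(tempfile.mkdtemp(), "result.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Filename", "W", "S", "MLS"])
    writer.writerow(["sample1.txt", "100", "5", "20.0"])

rows = read_results(demo_path)
for row in rows:
    # Values are read as strings; convert them before doing arithmetic.
    print(row["Filename"], int(row["W"]), float(row["MLS"]))
```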

Multiple input:

nsca ./samples/sample1.txt ./samples/sample2.txt
nsca ./samples/sample*.txt 
# wildcard characters are supported
nsca ./samples/sample[1-1000].txt

Use --text to pass text through the command line.

nsca --text 'The quick brown fox jumps over the lazy dog.'
# frequency output: ./result.csv

Use -p/--reserve-parsed to reserve the parsed trees of Stanford Parser, and -m/--reserve-matched to reserve the matched subtrees of Stanford Tregex.

nsca samples/sample1.txt -p -m
# frequency output: ./result.csv
# parsed trees: ./samples/sample1.parsed
# matched subtrees: ./result_matches/
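
When `nsca` is driven from a script rather than typed by hand, the argument list can be assembled programmatically. The following sketch uses only the options shown above (`-o`, `-p`, `-m`); `build_nsca_command` is a hypothetical helper written for this example, not something NeoSCA provides.

```python
import shlex
from typing import List, Optional, Sequence


def build_nsca_command(
    inputs: Sequence[str],
    output: Optional[str] = None,
    reserve_parsed: bool = False,
    reserve_matched: bool = False,
) -> List[str]:
    """Assemble an argument list for the nsca command (hypothetical helper)."""
    cmd = ["nsca", *inputs]
    if output is not None:
        cmd += ["-o", output]   # write frequencies to this file
    if reserve_parsed:
        cmd.append("-p")        # reserve Stanford Parser trees
    if reserve_matched:
        cmd.append("-m")        # reserve Stanford Tregex matches
    return cmd


cmd = build_nsca_command(
    ["samples/sample1.txt"], reserve_parsed=True, reserve_matched=True
)
# Show the command as it would be typed in a terminal.
print(" ".join(shlex.quote(arg) for arg in cmd))
# To run it (requires NeoSCA to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.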

Use `--list` to print output fields.

nsca --list

W: words
S: sentences
VP: verb phrases
C: clauses
T: T-units
DC: dependent clauses
CT: complex T-units
CP: coordinate phrases
CN: complex nominals
MLS: mean length of sentence
MLT: mean length of T-unit
MLC: mean length of clause
C/S: clauses per sentence
VP/T: verb phrases per T-unit
C/T: clauses per T-unit
DC/C: dependent clauses per clause
DC/T: dependent clauses per T-unit
T/S: T-units per sentence
CT/T: complex T-unit ratio
CP/T: coordinate phrases per T-unit
CP/C: coordinate phrases per clause
CN/T: complex nominals per T-unit
CN/C: complex nominals per clause

Use --no-query to just save parsed trees and exit.

nsca samples/sample1.txt --no-query
# parsed trees: samples/sample1.parsed
nsca --text 'This is a test.' --no-query
# parsed trees: ./cmdline_text.parsed

Calling nsca without any arguments returns the help message.

If you use NeoSCA in your research, please cite it using the following BibTeX entry:

@misc{tan2022neosca,
author = {Tan, Long},
title = {NeoSCA (version 0.0.30)},
howpublished = {\url{https://github.com/tanloong/neosca}},
year = {2022}
}

Also, you need to cite Lu’s article describing L2SCA:

@article{lu2010automatic,
title={Automatic analysis of syntactic complexity in second language writing},
author={Lu, Xiaofei},
journal={International journal of corpus linguistics},
volume={15},
number={4},
pages={474--496},
year={2010},
publisher={John Benjamins}
}

NeoSCA is licensed under the GNU General Public License version 2 or later.
