aboutsummaryrefslogtreecommitdiffstats
path: root/manuals
diff options
context:
space:
mode:
authorGavin Howard <gavin@yzena.com>2021-07-17 22:41:04 -0600
committerGavin Howard <gavin@yzena.com>2021-07-17 22:41:04 -0600
commit4a19522a728b1d622d98ede2634bdfd9182ff05d (patch)
tree86d35c9750bc50a116b56a79ecf2d67771e4ed38 /manuals
parent982249a3a127461a0ead011a19c19ac858119b08 (diff)
downloadplatform_external_bc-4a19522a728b1d622d98ede2634bdfd9182ff05d.tar.gz
platform_external_bc-4a19522a728b1d622d98ede2634bdfd9182ff05d.tar.bz2
platform_external_bc-4a19522a728b1d622d98ede2634bdfd9182ff05d.zip
Do more documentation work
Signed-off-by: Gavin Howard <gavin@yzena.com>
Diffstat (limited to 'manuals')
-rw-r--r--manuals/development.md1305
1 files changed, 1119 insertions, 186 deletions
diff --git a/manuals/development.md b/manuals/development.md
index 5e7abea9..ea1fc52c 100644
--- a/manuals/development.md
+++ b/manuals/development.md
@@ -191,19 +191,28 @@ portable to some platforms.
## Suggested Course
-TODO
+I do have a suggested course for programmers to follow when trying to understand
+this codebase. The order is this:
+
+1. `bc` Spec.
+2. Manpages.
+3. Test suite.
+4. Understand the build.
+5. Algorithms manual.
+6. Code concepts.
+7. Repo structure.
+8. Headers.
+9. Source code.
+
+This order roughly follows this order:
-0. `bc` Spec.
-1. Manpages.
-2. Test suite.
-3. Understand the build.
-4. Algorithms manual.
-5. Code concepts.
-6. Repo structure.
-7. Headers.
-8. Source code.
+1. High-level requirements
+2. Low-level requirements
+3. High-level implementation
+4. Low-level implementation
-From requirements to high level to low level.
+In other words, first understand what the code is *supposed* to do, then
+understand the code itself.
## Useful External Tools
@@ -703,12 +712,8 @@ The code associated with this header is in [`src/dc.c`][44],
This header is for `bc`'s internal buffered I/O API.
-Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and
-`longjmp()` for error handling, and the buffered I/O in `libc` does not interact
-well with the use of those procedures.
-
For more information about `bc`'s error handling and custom buffered I/O, see
-[Error Handling][97] and [Custom I/O][114], along with [`vm.h`][27] and the
+[Error Handling][97] and [Custom I/O][114], along with [`status.h`][176] and the
notes about version [`3.0.0`][32] in the [`NEWS`][32].
The code associated with this header is in [`src/file.c`][47].
@@ -718,8 +723,7 @@ The code associated with this header is in [`src/file.c`][47].
This header is for `bc`'s implementation of command-line editing/history, which
is based on a [UTF-8-aware fork][28] of [`linenoise`][29].
-At one point, I attempted to get history to work on Windows. It did not work
-out. Windows
+For more information, see the [Command-Line History][189] section.
The code associated with this header is in [`src/history.c`][48].
@@ -795,7 +799,8 @@ The code associated with this header is in [`src/program.c`][53].
#### `rand.h`
-This header defines the API for the pseudo-random number generator (PRNG).
+This header defines the API for the [pseudo-random number generator
+(PRNG)][179].
The PRNG only generates fixed-size integers. The magic of generating random
numbers of arbitrary size is actually given to the code that does math
@@ -839,10 +844,11 @@ There is no code associated with this header.
#### `vm.h`
-This header defines the code for setting up and running `bc` and `dc`. It is so
-named because I think of it as the "virtual machine" of `bc`, though that is
-probably not true as [`program.h`][57] is probably the "virtual machine". Thus,
-the name is more historical accident.
+This header defines the API for setting up and running `bc` and `dc`.
+
+It is so named because I think of it as the "virtual machine" of `bc`, though
+that is probably not true as [`program.h`][57] is probably the "virtual machine"
+API. Thus, the name is more historical accident.
The code associated with this header is in [`src/vm.c`][58].
@@ -934,12 +940,6 @@ and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
The file you are reading right now.
-TODO:
-
-* Document all code assumptions with asserts.
-* Document all functions with Doxygen comments.
-* The purpose of every file.
-
#### `header_bcl.txt`
Used by [`scripts/manpage.sh`][60] to give the [`bcl.3`][62] manpage a proper
@@ -1239,8 +1239,6 @@ A list of the various settings combos to be used by [`test_settings.sh`][104].
### `src/`
-TODO
-
This folder is, obviously, where the actual heart and soul of `bc`, the source
code, is.
@@ -1260,237 +1258,379 @@ liberal sprinkling of them through the code.
Code for processing command-line arguments.
+The header for this file is [`include/args.h`][31].
+
#### `bc.c`
The code for the `bc` main function `bc_main()`.
+The header for this file is [`include/bc.h`][106].
+
#### `bc_lex.c`
The code for lexing that only `bc` needs.
+The headers for this file are [`include/lex.h`][180] and [`include/bc.h`][106].
+
#### `bc_parse.c`
-TODO
+The code for parsing that only `bc` needs. This code is the most complex and
+subtle in the entire codebase.
+
+The headers for this file are [`include/parse.h`][181] and
+[`include/bc.h`][106].
#### `data.c`
-TODO
+Due to [historical accident][23] because of a desire to get my `bc` into
+[toybox][16], all of the constant data that `bc` needs is all in one file. This
+is that file.
+
+There is no code in this file, but a lot of the const data has a heavy influence
+on code, including the order of data in arrays because that order has to
+correspond to the order of other things elsewhere in the codebase. If you change
+the order of something in this file, run `make test`, and get errors, you
+changed something that depends on the order that you messed up.
+
+Almost all headers have `extern` references to items in this file.
#### `dc.c`
The code for the `dc` main function `dc_main()`.
+The header for this file is [`include/dc.h`][182].
+
#### `dc_lex.c`
The code for lexing that only `dc` needs.
+The headers for this file are [`include/lex.h`][180] and [`include/dc.h`][182].
+
#### `dc_parse.c`
-TODO
+The code for parsing that only `dc` needs.
+
+The headers for this file are [`include/parse.h`][181] and
+[`include/bc.h`][182].
#### `file.c`
-TODO
+The code for `bc`'s implementation of buffered I/O. For more information about
+why I implemented my own buffered I/O, see [`include/file.h`][55], [Error
+Handling][97], and [Custom I/O][114], along with [`status.h`][176] and the notes
+about version [`3.0.0`][32] in the [`NEWS`][32].
+
+The header for this file is [`include/file.h`][55].
#### `history.c`
-TODO
+The code for `bc`'s implementation of command-line editing/history, which is
+based on a [UTF-8-aware fork][28] of [`linenoise`][29].
+
+For more information, see the [Command-Line History][189] section.
+
+The header for this file is [`include/history.h`][36].
#### `lang.c`
-TODO
+The data structures used for actual execution of `bc` and `dc` code.
+
+While execution is done in [`src/program.c`][53], this file defines functions
+for initializing, copying, and freeing the data structures, which is somewhat
+orthogonal to actual execution.
+
+Yes, it's misnamed; that's an accident of history where the first things I put
+into it all seemed related to the `bc` language.
+
+The header for this file is [`include/lang.h`][38].
#### `lex.c`
-TODO
+The code for the common things that both programs need for lexing.
+
+The header for this file is [`include/lex.h`][180].
#### `library.c`
-TODO
+The code to implement the public API of the `bcl` library.
+
+The code in this file does a lot to ensure that clients do not have to worry
+about internal `bc` details, especially error handling with `setjmp()` and
+`longjmp()`. That and encapsulating the handling of numbers are the bulk of what
+the code in this file actually does because most of the library is still
+implemented in [`src/num.c`][39].
+
+The headers for this file are [`include/bcl.h`][30] and
+[`include/library.h`][183].
#### `main.c`
-TODO
+The entry point for both programs; this is the `main()` function.
+
+This file has no headers associated with it.
#### `num.c`
-TODO
+The code for all of the arbitrary-precision [numbers][177] and [math][178] in
+`bc`.
+
+The header for this file is [`include/num.h`][184].
#### `opt.c`
-TODO
+The code for parsing command-line options.
+
+The header for this file is [`include/opt.h`][35].
#### `parse.c`
-TODO
+The code for the common items that both programs need for parsing.
+
+The header for this file is [`include/parse.h`][181].
#### `program.c`
-TODO
+The code for the actual execution engine for `bc` and `dc` code.
+
+The header for this file is [`include/program.h`][57].
#### `rand.c`
-TODO
+The code for the [pseudo-random number generator (PRNG)][179] and the special
+stack handling it needs.
+
+The PRNG only generates fixed-size integers. The magic of generating random
+numbers of arbitrary size is actually given to the code that does math
+([`src/num.c`][39]).
+
+The header for this file is [`include/rand.h`][37].
#### `read.c`
-TODO
+The code for reading from files and `stdin`.
+
+The header for this file is [`include/read.h`][185].
#### `vector.c`
-TODO
+The code for [vectors][111], [maps][186], and [slab vectors][187], along with
+slabs.
+
+The header for this file is [`include/vector.h`][174].
#### `vm.c`
-TODO
+The code for setting up and running `bc` and `dc`.
+
+It is so named because I think of it as the "virtual machine" of `bc`, though
+that is probably not true as [`program.h`][57] is probably the "virtual machine"
+code. Thus, the name is more historical accident.
+
+The header for this file is [`include/vm.h`][27].
### `tests/`
-TODO
+This directory contains the entire [test suite][124] and its infrastructure.
#### `all.sh`
-TODO
+A convenience script for the `make run_all_tests` target (see the [Group
+Tests][141] section for more information).
#### `all.txt`
-TODO
+The file with the names of the calculators. This is to make it easier for the
+test scripts to know where the standard and other test directories are.
#### `bcl.c`
-TODO
+The test for the [`bcl`][156] API. For more information, see the [`bcl`
+Test][157] section.
#### `errors.sh`
-TODO
+The script to run the error tests for each calculator. For more information, see
+the [Error Tests][151] section.
#### `extra_required.txt`
-TODO
+The file with the list of tests which both calculators have that need the [Extra
+Math build option][188]. This exists to make it easy for test scripts to skip
+those tests when the [Extra Math build option][188] is disabled.
#### `history.py`
-TODO
+The file with all of the history tests. For more information, see the [History
+Tests][155] section.
#### `history.sh`
-TODO
+The script to integrate [`history.py`][139] into the build system in a portable
+way, and to skip it if necessary.
#### `other.sh`
-TODO
+The script to run the "other" (miscellaneous) tests for each calculator. For
+more information, see the [Other Tests][154] section.
#### `read.sh`
-TODO
+The script to run the read tests for each calculator. For more information, see
+the [`read()` Tests][153] section.
#### `script.sed`
-TODO
+The `sed` script to edit the output of GNU `bc` when generating script tests.
+For more information, see the [Script Tests][150] section.
#### `script.sh`
-TODO
+The script for running one script test. For more information, see the [Script
+Tests][150] section.
#### `scripts.sh`
-TODO
+The script to help the `make run_all_tests` (see the [Group Tests][141] section)
+run all of the script tests.
#### `stdin.sh`
-TODO
+The script to run the `stdin` tests for each calculator. For more information,
+see the [`stdin` Tests][152] section.
#### `test.sh`
-TODO
+The script to run one standard test. For more information, see the [Standard
+Tests][149] section.
#### `bc/`
-TODO
+The standard tests directory for `bc`. For more information, see the [Standard
+Tests][149] section.
##### `all.txt`
-TODO
+The file to tell the build system and `make run_all_tests` (see the [Group
+Tests][141] section) what standard tests to run for `bc`, as well as in what
+order.
+
+This file just lists the test names, one per line.
##### `errors.txt`
-TODO
+The initial error test file for `bc`. This file has one test per line. See the
+[Error Tests][151] section for more information.
##### `posix_errors.txt`
-TODO
+The file of tests for POSIX compatibility for `bc`. This file has one test per
+line. For more information, see the [Error Tests][151] section.
##### `timeconst.sh`
-TODO
+The script to run the `bc` tests that use the [Linux `timeconst.bc` script][6].
+For more information, see the [Linux `timeconst.bc` Script][191]section.
##### `errors/`
-TODO
+The directory with error tests for `bc`, most discovered by AFL++ (see the
+[Fuzzing][82] section). There is one test per file. For more information, see
+the [Error Tests][151] section.
##### `scripts/`
-TODO
+The script tests directory for `bc`. For more information, see the [Script
+Tests][150] section.
###### `all.txt`
-TODO
+A file to tell the build system and `make run_all_tests` (see the [Group
+Tests][141] section) what script tests to run for `bc`, as well as in what
+order.
+
+This file just lists the test names, one per line.
#### `dc/`
-TODO
+The standard tests directory for `dc`. For more information, see the [Standard
+Tests][149] section.
##### `all.txt`
-TODO
+The file to tell the build system and `make run_all_tests` (see the [Group
+Tests][141] section) what standard tests to run for `dc`, as well as in what
+order.
+
+This file just lists the test names, one per line.
##### `errors.txt`
-TODO
+The initial error test file for `dc`. This file has one test per line. See the
+[Error Tests][151] section for more information.
##### `read_errors.txt`
-TODO
+The file of tests errors with the `?` command (`read()` in `bc`). This file has
+one test per line. See the [Error Tests][151] section for more information.
##### `errors/`
-TODO
+The directory with error tests for `dc`, most discovered by AFL++ (see the
+[Fuzzing][82] section). There is one test per file. For more information, see
+the [Error Tests][151] section.
##### `scripts/`
-TODO
+The script tests directory for `dc`. For more information, see the [Script
+Tests][150] section.
###### `all.txt`
-TODO
+The file to tell the build system and `make run_all_tests` (see the [Group
+Tests][141] section) what script tests to run for `dc`, as well as in what
+order.
+
+This file just lists the test names, one per line.
#### `fuzzing/`
-TODO
+The directory containing the fuzzing infrastructure. For more information, see
+the [Fuzzing][82] section.
##### `bc_afl_continue.yaml`
-TODO
+The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily restarting a
+fuzz run. For more information, see the [Convenience][130] subsection of the
+[Fuzzing][82] section.
##### `bc_afl.yaml`
-TODO
+The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily starting a
+fuzz run. For more information, see the [Convenience][130] subsection of the
+[Fuzzing][82] section.
+
+Be aware that this will delete all previous unsaved fuzzing tests in the output
+directories.
##### `bc_inputs1/`
-TODO
+The fuzzing input directory for the first third of inputs for `bc`. For more
+information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
##### `bc_inputs2/`
-TODO
+The fuzzing input directory for the second third of inputs for `bc`. For more
+information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
##### `bc_inputs3/`
-TODO
+The fuzzing input directory for the third third of inputs for `bc`. For more
+information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
##### `dc_inputs/`
-TODO
+The fuzzing input directory for the inputs for `dc`. For more information, see
+the [Corpuses][192] subsection of the [Fuzzing][82] section.
## Build System
@@ -1527,8 +1667,6 @@ targets and/or add a target for them especially.
### Preprocessor Macros
-TODO
-
`bc` and `dc` use *a lot* of preprocessor macros to ensure that each build type:
* builds,
@@ -1538,6 +1676,124 @@ TODO
This section will explain the preprocessor style of `bc` and `dc`, as well as
provide an explanation of the macros used.
+#### Style
+
+The style of macro use in `bc` is pretty straightforward: I avoid depending on
+macro definitions and instead, I set defaults if the macro is not defined and
+then test the value if the macro with a plain `#if`.
+
+(Some examples of setting defaults are in [`include/status.h`][176], just above
+the definition of the `BcStatus` enum.)
+
+In other words, I use `#if` instead of `#ifndef` or `#ifdef`, where possible.
+
+There are a couple of cases where I went with standard stuff instead. For
+example, to test whether I am in debug mode or not, I still use the standard
+`#ifndef NDEBUG`.
+
+#### Standard Macros
+
+`BC_ENABLED`
+
+: This macro expands to `1` if `bc` is enabled, `0` if disabled.
+
+`DC_ENABLED`
+
+: This macro expands to `1` if `dc` is enabled, `0` if disabled.
+
+`BUILD_TYPE`
+
+: The macro expands to the build type, which is one of: `A`, `E`, `H`, `N`,
+ `EH`, `EN`, `HN`, `EHN`. This build type is used in the help text to direct
+ the user to the correct markdown manual in the `git.yzena.com` website.
+
+`EXECPREFIX`
+
+: Thist macro expands to the prefix on the executable name. This is used to
+ allow `bc` and `dc` to skip the prefix when finding out which calculator is
+ executing.
+
+`BC_NUM_KARATSUBA_LEN`
+
+: This macro expands to an integer, which is the length of numbers below which
+ the Karatsuba multiplication algorithm switches to brute-force
+ multiplication.
+
+`BC_ENABLE_EXTRA_MATH`
+
+: This macro expands to `1` if the [Extra Math build option][188] is enabled,
+ `0` if disabled.
+
+`BC_ENABLE_HISTORY`
+
+: This macro expands to `1` if the [History build option][193] is enabled, `0`
+ if disabled.
+
+`BC_ENABLE_NLS`
+
+: This macro expands to `1` if the [NLS build option][193] (for locales) is
+ enabled, `0` if disabled.
+
+`BC_ENABLE_LIBRARY`
+
+: This macro expands to `1` if the [`bcl` library][156] is enabled, `0` if
+ disabled. If this is enabled, building the calculators themselves is
+ disabled, but both `BC_ENABLED` and `DC_ENABLED` must be non-zero.
+
+`BC_ENABLE_MEMCHECK`
+
+: This macro expands to `1` if `bc` has been built for use with Valgrind's
+ [Memcheck][194], `0` otherwise. This ensures that fatal errors still free
+ all of their memory when exiting. `bc` does not do that normally because
+ what's the point?
+
+`BC_ENABLE_AFL`
+
+: This macro expands to `1` if `bc` has been built for fuzzing with
+ [AFL++][125], `0` otherwise.. See the [Fuzzing][82] section for more
+ information.
+
+`BC_DEFAULT_BANNER`
+
+: This macro expands to the default value for displaying the `bc` banner.
+
+`BC_DEFAULT_SIGINT_RESET`
+
+: The macro expands to the default value for whether or not `bc` should reset
+ on `SIGINT` or quit.
+
+`BC_DEFAULT_TTY_MODE`
+
+: The macro expands to the default value for whether or not `bc` should use
+ TTY mode when it available.
+
+`BC_DEFAULT_PROMPT`
+
+: This macro expands to the default value for whether or not `bc` should use a
+ prompt when TTY mode is available.
+
+`DC_DEFAULT_SIGINT_RESET`
+
+: The macro expands to the default value for whether or not `dc` should reset
+ on `SIGINT` or quit.
+
+`DC_DEFAULT_TTY_MODE`
+
+: The macro expands to the default value for whether or not `dc` should use
+ TTY mode when it available.
+
+`DC_DEFAULT_PROMPT`
+
+: This macro expands to the default value for whether or not `dc` should use a
+ prompt when TTY mode is available.
+
+`BC_DEBUG_CODE`
+
+: If this macro expands to a non-zero integer, then `bc` is built with *a lot*
+ of extra debugging code. This is never set by the build system and must be
+ set by the programmer manually. This should never be set in builds given to
+ end users. For more information, see the [Debugging][134] section.
+
## Test Suite
While the source code may be the heart and soul of `bc`, the test suite is the
@@ -1548,8 +1804,9 @@ This even goes for fuzzing because fuzzing depends on the test suite for its
input corpuses. (See the [Fuzzing][82] section.)
Understanding how the test suite works should be, I think, the first thing that
-maintainers learn. This is because the test suite, properly used, gives
-confidence that changes have not caused bugs or regressions.
+maintainers learn after learning what `bc` and `dc` should do. This is because
+the test suite, properly used, gives confidence that changes have not caused
+bugs or regressions.
That is why I spent the time to make the test suite as easy to use and as fast
as possible.
@@ -1632,23 +1889,317 @@ Some standard tests need to be skipped in certain cases. That is handled by the
[build system][142]. See the [Integration with the Build System][147] section
for more details.
+In addition to all of the above, the standard test directory is not only the
+directory for the standard tests of each calculator, it is also the parent
+directory of all other test directories for each calculator.
+
#### `bc` Standard Tests
-TODO
+The list of current (17 July 2021) standard tests for `bc` is below:
+
+decimal
+
+: Tests decimal parsing and printing.
+
+print
+
+: Tests printing in every base from decimal. This is near the top for
+ performance of parallel testing.
+
+parse
+
+: Tests parsing in any base and outputting in decimal. This is near the top
+ for performance of parallel testing.
+
+lib2
+
+: Tests the extended math library. This is near the top for performance of
+ parallel testing.
+
+print2
+
+: Tests printing at the extreme values of `obase`.
+
+length
+
+: Tests the `length()` builtin function.
+
+scale
+
+: Tests the `scale()` builtin function.
+
+shift
+
+: Tests the left (`<<`) and right (`>>`) shift operators.
+
+add
+
+: Tests addition.
+
+subtract
+
+: Tests subtraction.
+
+multiply
+
+: Tests multiplication.
+
+divide
+
+: Tests division.
+
+modulus
+
+: Tests modulus.
+
+power
+
+: Tests power (exponentiation).
+
+sqrt
+
+: Tests the `sqrt()` (square root) builtin function.
+
+trunc
+
+: Tests the truncation (`$`) operator.
+
+places
+
+: Tests the places (`@`) operator.
+
+vars
+
+: Tests some usage of variables. This one came from [AFL++][125] I think.
+
+boolean
+
+: Tests boolean operators.
+
+comp
+
+: Tests comparison operators.
+
+abs
+
+: Tests the `abs()` builtin function.
+
+assignments
+
+: Tests assignment operators, including increment/decrement operators.
-* List of all standard tests with what they test.
+functions
+
+: Tests functions, specifically function parameters being replaced before they
+ themselves are used. See the comment in `bc_program_call()` about the last
+ condition.
+
+scientific
+
+: Tests scientific notation.
+
+engineering
+
+: Tests engineering notation.
+
+globals
+
+: Tests that assigning to globals affects callers.
+
+strings
+
+: Tests strings.
+
+strings2
+
+: Tests string allocation in slabs, to ensure slabs work.
+
+letters
+
+: Tests single and double letter numbers to ensure they behave differently.
+ Single-letter numbers always be set to the same value, regardless of
+ `ibase`.
+
+exponent
+
+: Tests the `e()` function in the math library.
+
+log
+
+: Tests the `l()` function in the math library.
+
+pi
+
+: Tests that `bc` produces the right value of pi for numbers with varying
+ `scale` values.
+
+arctangent
+
+: Tests the `a()` function in the math library.
+
+sine
+
+: Tests the `s()` function in the math library.
+
+cosine
+
+: Tests the `c()` function in the math library.
+
+bessel
+
+: Tests the `j()` function in the math library.
+
+arrays
+
+: Test arrays.
+
+misc
+
+: Miscellaneous tests. I named it this because at the time, I struggled to
+ classify them, but it's really testing multi-line numbers.
+
+misc1
+
+: A miscellaneous test found by [AFL++][125].
+
+misc2
+
+: A miscellaneous test found by [AFL++][125].
+
+misc3
+
+: A miscellaneous test found by [AFL++][125].
+
+misc4
+
+: A miscellaneous test found by [AFL++][125].
+
+misc5
+
+: A miscellaneous test found by [AFL++][125].
+
+misc6
+
+: A miscellaneous test found by [AFL++][125].
+
+misc7
+
+: A miscellaneous test found by [AFL++][125].
+
+void
+
+: Tests void functions.
+
+rand
+
+: Tests the pseudo-random number generator and its special stack handling.
+
+recursive_arrays
+
+: Tests the slab vector undo ability in used in `bc_parse_name()`.
#### `dc` Standard Tests
-TODO
+The list of current (17 July 2021) standard tests for `dc` is below:
-* List of all standard tests with what they test.
+decimal
-### Script Tests
+: Tests decimal parsing and printing.
-TODO
+length
+
+: Tests the `length()` builtin function.
+
+stack_len
+
+: Tests taking the length of the results stack.
+
+add
+
+: Tests addition.
+
+subtract
+
+: Tests subtraction.
+
+multiply
+
+: Tests multiplication.
+
+divide
+
+: Tests division.
-* Use the global stacks flag.
+modulus
+
+: Tests modulus.
+
+divmod
+
+: Tests divmod.
+
+power
+
+: Tests power (exponentiation).
+
+sqrt
+
+: Tests the `sqrt()` (square root) builtin function.
+
+modexp
+
+: Tests modular exponentiation.
+
+boolean
+
+: Tests boolean operators.
+
+negate
+
+: Tests negation as a command and as part of numbers.
+
+trunc
+
+: Tests the truncation (`$`) operator.
+
+places
+
+: Tests the places (`@`) operator.
+
+shift
+
+: Tests the left (`<<`) and right (`>>`) shift operators.
+
+abs
+
+: Tests the `abs()` builtin function.
+
+scientific
+
+: Tests scientific notation.
+
+engineering
+
+: Tests engineering notation.
+
+vars
+
+: Tests some usage of variables. This one came from [AFL++][125] I think.
+
+misc
+
+: Miscellaneous tests. I named it this because at the time, I struggled to
+ classify them.
+
+strings
+
+: Tests strings.
+
+rand
+
+: Tests the pseudo-random number generator and its special stack handling.
+
+### Script Tests
The heavy lifting of testing the scripting of `bc` is done by the "script tests"
for each calculator.
@@ -1673,45 +2224,210 @@ Some script tests need to be skipped in certain cases. That is handled by the
[build system][142]. See the [Integration with the Build System][147] section
for more details.
+Another unique thing about the script tests, at least for `bc`: they test the
+`-g` and `--global-stacks` flags. This means that all of the script tests for
+`bc` are written assuming the `-g` flag was given on the command-line
+
+There is one extra piece of script tests: [`tests/script.sed`][190]. This `sed`
+script is used to remove an incompatibility with GNU `bc`.
+
+If there is only one more character to print at the end of `BC_LINE_LENGTH`, GNU
+`bc` still prints a backslash+newline+digit combo. OpenBSD doesn't, which is
+correct according to my reading of the `bc` spec, so my `bc` doesn't as well.
+
+The `sed` script edits numbers that end with just one digit on a line by itself
+to put it on the same line as others.
+
#### `bc` Script Tests
-TODO
+The list of current (17 July 2021) script tests for `bc` is below:
+
+print.bc
+
+: Tests printing even harder than the print standard test.
+
+multiply.bc
+
+: Tests multiplication even harder than the multiply standard test.
+
+divide.bc
+
+: Tests division even harder than the divide standard test.
+
+subtract.bc
+
+: Tests subtraction even harder than the subtract standard test.
+
+add.bc
+
+: Tests addition even harder than the add standard test.
+
+parse.bc
+
+: Tests parsing even harder than the parse standard test.
+
+array.bc
+
+: Tests arrays even harder than the arrays standard test.
+
+atan.bc
+
+: Tests arctangent even harder than the arctangent standard test.
+
+bessel.bc
+
+: Tests bessel even harder than the bessel standard test.
+
+functions.bc
+
+: Tests functions even harder than the functions standard test.
+
+globals.bc
+
+: Tests global stacks directly.
+
+len.bc
+
+: Tests the `length()` builtin on arrays.
+
+rand.bc
-* List of all script tests with what they test.
+: Tests the random number generator in the presence of global stacks.
+
+references.bc
+
+: Tests functions with array reference parameters.
+
+screen.bc
+
+: A random script provided by an early user that he used to calculate the size
+ of computer screens
+
+strings2.bc
+
+: Tests escaping in strings.
#### `dc` Script Tests
-TODO
+The list of current (17 July 2021) script tests for `dc` is below:
+
+prime.dc
+
+: Tests scripting by generating the first 100,000 primes.
+
+asciify.dc
+
+: Tests the asciify command.
+
+stream.dc
+
+: Tests the stream command.
+
+array.dc
+
+: Tests arrays.
+
+else.dc
+
+: Tests else clauses on conditional execution commands.
-* List of all script tests with what they test.
+factorial.dc
+
+: Tests scripting with factorial.
+
+loop.dc
+
+: Tests scripting by implementing loops.
+
+quit.dc
+
+: Tests the quit command in the presence of tail calls.
+
+weird.dc
+
+: A miscellaneous test.
### Error Tests
-TODO
+One of the most useful parts of the `bc` test suite, in my opinion, is the heavy
+testing of error conditions.
+
+Just about every error condition I can think of is tested, along with many
+machine-generated (by [AFL++][125]) ones.
+
+However, because the error tests will often return error codes, they require
+different infrastructure from the rest of the test suite, which assumes that
+the calculator under test will return successfully. A lot of that infrastructure
+is in the [`scripts/functions.sh`][105] script, but it basically allows the
+calculator to exit with an error code and then tests that there *was* an error
+code.
+
+The error tests for each calculator are spread through two directories, due to
+historical accident. These two directories are the standard test directory (see
+the [Standard Tests][149] section) and the `errors/` directory directly
+underneath the standard tests directory.
+
+This split is convenient, however, because the tests in each directory are
+treated differently.
+
+The error tests in the standard test directory, which include `errors.txt` for
+both calculators, `posix_errors.txt` for `bc`, and `read_errors.txt` for `dc`,
+are read line-by-line and shoved through `stdin`, and each line is considered a
+separate test. For this reason, there can't be any blank lines in the error
+files in the standard tests directory because a blank line causes a successful
+exit.
+
+On the other hand, the tests in the `errors/` directory below the standard tests
+directory are considered to be one test per file, and they are used differently.
+They are shoved into the calculator through `stdin`, but they are also executed
+on the command-line.
+
+To add an error test, first figure out which kind you want.
+
+Is it a simple one-liner, and you don't care if it's tested through a file?
-* POSIX tests for `bc`.
-* One test per line on files in standard test directory.
-* One test per file in errors directory.
- * Run files *and* through `stdin` with `cat`.
-* Adding tests.
+Then put it in one of the error files in the standard test directory. I would
+only put POSIX errors in the `posix_errors.txt` file for `bc`, and only `read()`
+errors in the `read_errors.txt` file for `dc`; all others I would put in the
+respective `errors.txt` file.
+
+On the other hand, if you care if the error is run as a file on the
+command-line, or the error requires multiple lines to reproduce, then put the
+test in the respective `errors/` directory.
+
+After that, you are done; the test suite will automatically pick up the new
+test, and you don't have to tell the test suite the expected results.
### `stdin` Tests
-TODO
+The `stdin` tests specifically test the lexing and parsing of multi-line
+comments and strings. This is important because when reading from `stdin`, the
+calculators can only read one line at a time, so partial parses are possible.
-* Adding tests.
+To add `stdin` tests, just add the tests to the `stdin.txt` file in the
+respective standard tests directory, and add the expected results in the
+`stdin_results.txt` in the respective standard tests directory.
### `read()` Tests
-TODO
+The `read()` tests are meant to test the `read()` builtin function, to ensure
+that the parsing and execution is correct.
-* Adding tests.
+Each line is one test, as that is the nature of using the `read()` function, so
+to add a test, just add it as another line in the `read.txt` file in the
+respective standard tests directory, and add its result to the
+`read_results.txt` file in the respective standard tests directory.
### Other Tests
-TODO
+The "other" tests are just random tests that I could not easily classify under
+other types of tests. They usually include things like command-line parsing and
+environment variable testing.
-* Adding tests.
+To add an other test, it requires adding the programming for it to
+[`tests/other.sh`][195] because all of the tests are written specifically in
+that script. It would be best to use the infrastructure in
+[`scripts/functions.sh`][105].
### Linux `timeconst.bc` Script
@@ -2189,9 +2905,28 @@ under [`locales/`][85].
## Static Analysis
-TODO
+I do *some* static analysis on `bc`.
+
+I used to use [Coverity][196], but I stopped using it when it started giving me
+too many false positives and also because it had a vulnerability.
+
+However, I still use the [Clang Static Analyzer][197] through
+[`scan-build`][19]. I only use it in debug mode because I have to add some
+special code to make it not complain about things that are definitely not a
+problem.
-* `scan-build make`
+The most frequent example of false positives is where a local is passed to a
+function to be initialized. [`scan-build`][19] misses that fact, so I
+pre-initialize such locals to prevent the warnings.
+
+To run `scan-build`, do the following:
+
+```
+make clean
+scan-build make
+```
+
+`scan-build` will print its warnings to `stdout`.
## Fuzzing
@@ -2322,6 +3057,9 @@ Doing that will load, in `tmux`, 16 separate instances of [AFL++][125], 12 on
`bc` and 4 on `dc`. The outputs will be put into the
`tests/fuzzing/bc_outputs{1,2,3}/` and `tests/fuzzing/dc_outputs/` directories.
+(Note that loading that config will also delete all unsaved [AFL++][125] output
+from the output directories.)
+
Sometimes, [AFL++][125] will report crashes when there are none. When crashes
are reported, I always run the following command:
@@ -2386,6 +3124,9 @@ I occasionally add to the input corpuses. These files come from new files in the
However, when I add new files to an input corpus, I sometimes reduce the size of
the file by removing some redundancies.
+And then, when adding to the `bc` corpuses, I try to add them evenly so that
+each corpus will take about the same amount of time to get to a finished state.
+
### [AFL++][125] Quickstart
The way [AFL++][125] works is complicated.
@@ -2494,39 +3235,123 @@ place.
### POSIX Mode
-TODO
+POSIX mode is `bc`-only.
+
+In fact, POSIX mode is two different modes: Standard Mode and Warning Mode.
+These modes are designed to help users write POSIX-compatible `bc` scripts.
#### Standard Mode
-TODO
+Standard Mode is activated with the `-s` or `--standard` flags.
+
+In this mode, `bc` will error if any constructs are used that are not strictly
+compatible with the [POSIX `bc` specification][2].
#### Warning Mode
-TODO
+Warning Mode is activated with the `-w` or `--warn` flags.
+
+In this mode, `bc` will issue warnings, but continue, if any constructs are used
+that are not strictly compatible with the [POSIX `bc` specification][2].
### Memory Management
-TODO
+The memory management in `bc` is simple: everything is owned by one thing.
+
+If something is in a vector, it is owned by that vector.
-* Ownership.
+If something is contained in a struct, it is owned by that struct with one
+exception: structs can be given pointers to other things, but only if those
+other things will outlast the struct itself.
+
+As an example, the `BcParse` struct has a pointer to the one `BcProgram` in
+`bc`. This is okay because the program is initialized first and deallocated
+last.
+
+In other words, it's simple: if a field in a struct is a pointer, then unless
+that pointer is directly allocated by the struct (like the vector array or the
+number limb array), that struct does not own the item at that pointer.
+Otherwise, the struct *does* own the item.
### [Async-Signal-Safe][115] Signal Handling
-TODO
+`bc` is not the typical Unix utility. Most Unix utilities are I/O bound, but
+`bc` is, by and large, CPU-bound. This has several consequences, but the biggest
+is that there is no easy way to allow signals to interrupt it.
+
+This consequence is not obvious, but it comes from the fact that a lot of I/O
+operations can be interrupted and return [`EINTR`][198]. This makes such I/O
+calls natural places for allowing signals to interrupt execution, even when the
+signal comes during execution, and not interrupting I/O calls. The way this is
+done is setting a flag in the signal handler, which is checked around the time
+of the I/O call, when it is convenient.
+
+Alternatively, I/O bound programs can use the [self-pipe trick][199].
+
+Neither of these are possible in `bc` because the execution of math code can
+take a long time. If a signal arrives during this long execution time, setting a
+flag like an I/O bound application and waiting until the next I/O call could
+take seconds, minutes, hours, or even days. (Last I checked, my `bc` takes a
+week to calculate a million digits of pi, and it's not slow as far as `bc`
+implementations go.)
+
+Thus, using just the technique of setting the flag just will not work for an
+interactive calculator.
+
+Well, it can, but it requires a lot of code and massive inefficiencies. I know
+this because that was the original implementation.
-* Async-signal-safe functions in handlers.
-* `volatile sig_atomic_t`.
-* Setting flags to specific values, to make it atomic. (No adds because it
- might not be atomic at that point.)
+The original implementation set a flag and just exit the signal handler. Then,
+on just about every loop header, I have a check for the signal flag. These
+checks happened on every iteration of every loop. It was a massive waste because
+it was polling, and [polling is evil][200].
+
+So for version [3.0.0][32], I expended a lot of effort to change the
+implementation.
+
+In the new system, code *outside* the signal handler sets a flag (`vm.sig_lock`)
+to tell the signal handler whether it can use `longjmp()` to stop the current
+execution. If so, it does. If not, it sets a flag, which then is used by the
+code outside the signal handler that set the `vm.sig_lock` flag. When that code
+unsets `vm.sig_lock`, it checks to see if a signal happened, and if so, that
+code executes the `longjmp()` and stops the current execution.
+
+Other than that, the rest of the interrupt-based implementation is best
+described in the [Error Handling][97].
+
+However, there are rules for signal handlers that I must lay out.
+
+First, signal handlers can only call [async-signal-safe][115] functions.
+
+Second, any field set or read by both the signal handler and normal code must be
+a `volatile sig_atomic_t`.
+
+Third, when setting such fields, they must be set to constants and no math can
+be done on them. This restriction and the above restriction exist in order to
+ensure that the setting of the fields is always atomic with respect to signals.
+
+These rules exist for *any* code using Unix signal handlers, not just `bc`.
+
+#### Vectors and Numbers
+
+Vectors and numbers needed special consideration with the interrupt-based signal
+handling.
+
+When vectors and numbers are about to allocate, or *reallocate* their arrays,
+they need to lock signals to ensure that they do not call `malloc()` and friends
+and get interrupted by a signal because, as you will see in the [Error
+Handling][97] section, `longjmp()` cannot be used in a signal handler if it may
+be able to interrupt a non-[async-signal-safe][115] function like `malloc()` and
+friends.
### Asserts
If you asked me what procedure is used the most in `bc`, I would reply without
hesitation, "`assert()`."
-I use `assert()` everywhere. In fact, it is what made fuzzing with [AFL++][125]
-so effective. [AFL++][125] is incredibly good at finding crashes, and a failing
-`assert()` counts as one.
+I use `assert()` everywhere. In fact, it is what made [fuzzing][82] with
+[AFL++][125] so effective. [AFL++][125] is incredibly good at finding crashes,
+and a failing `assert()` counts as one.
So while a lot of bad bugs might have corrupted data and *not* caused crashes,
because I put in so many `assert()`'s, they were *turned into* crashing bugs,
@@ -2689,15 +3514,64 @@ the inserting code must create a new item and insert it.
If the two vectors were combined together, it would not be possible to separate
the steps such that creating a new item could be avoided if it already exists.
+#### Slabs and Slab Vectors
+
+`bc` allocates *a lot* of small strings, and small allocations are the toughest
+for general-purpose allocators to handle efficiently.
+
+Because of that reason, I decided to create a system for allocating small
+strings using something that I call a "slab vector" after [slab
+allocators][201].
+
+These vectors allocate what I call "slabs," which are just an allocation of a
+single page with a length to tell the slab how much of the slab is used.
+
+The vector itself holds slabs, and when the slab vector is asked to allocate a
+string, it attempts to in the last slab. If that slab cannot do so, it allocates
+a new slab and allocates from that.
+
+There is one exception: if a string is going to be bigger than 128 bytes, then
+the string is directly allocated, and a slab is created with that pointer and a
+length of `SIZE_MAX`, which tells the slab vector that it is a direct
+allocation. Then, the last slab is pushed into the next spot and the new special
+slab is put into the vacated spot. This ensures that a non-special slab is
+always last.
+
+Not only that, but a slab vector can be used as a stack allocator. If the string
+needs to be deallocated, the length of the string is passed to the slab vector
+which deallocates it using that information. This allows a stack of allocations.
+
### Command-Line History
-TODO
+When I first wrote `bc`, I immediately started using it in order to eat my own
+dog food.
-### Error Handling
+It sucked, and the biggest reason why was because of the lack of command-line
+history.
-TODO
+At first, I just dealt with it, not knowing how command-line history might be
+implemented.
+
+Eventually, I caved and attempted to adapt [`linenoise-mob`][28], which I had
+known about for some time.
+
+It turned out to be easier than I thought; the hardest part was the tedious
+renaming of everything to fit the `bc` naming scheme.
+
+Understanding command-line history in `bc` is really about understanding VT-100
+escape codes, so I would start there.
+
+Now, the history implementation of `bc` has been adapted far beyond that initial
+adaptation to make the command-line history implementation perfect for `bc`
+alone, including integrating it into `bc`'s [Custom I/O][114] and making sure
+that it does not disturb output that did not end with a newline.
-* Note about vectors and numbers needing special treatment for error handling.
+On top of that, at one point, I attempted to get history to work on Windows. It
+barely worked after a lot of work and a lot of portability code, but even with
+all of that, it does not have at least one feature: multi-line pasting from the
+clipboard.
+
+### Error Handling
The error handling on `bc` got an overhaul for version [`3.0.0`][32], and it
became one of the things that taught me the most about C in particular and
@@ -2749,7 +3623,7 @@ I knew that [`setjmp()`][116] and [`longjmp()`][117] are used in C to implement
exceptions, so I thought I would learn how to use them. How hard could it be?
Quite hard, it turns out, especially in the presence of signals. And that's
-because there are many snares:
+because there are a few big snares:
1. The value of any local variables are not guaranteed to be preserved after a
`longjmp()` back into a function.
@@ -2777,57 +3651,14 @@ jump if necessary.
#### Custom I/O
-TODO
-
-* Talk about interaction with command-line history.
-
-### Functions
-
-TODO
-
-#### Main and Read Functions
-
-TODO
-
-#### `dc` Strings
-
-TODO
-
-##### Tail Calls
-
-TODO
-
-#### Execution
-
-TODO
-
-* Bytecode.
-* Stack machine.
-* `stack` vs. `results`.
-* `BcInstPtr` for marking where execution is.
-* Variables are arrays, in order to push arguments on them and to implement
- autos.
-* Arrays are arrays as well.
-
-##### Bytecode
-
-TODO
-
-###### Bytecode Indices
-
-TODO
-
-##### Variables
-
-TODO
-
-##### Arrays
-
-TODO
-
-###### Array References
+Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and
+`longjmp()` for error handling (see the [Error Handling][97] section), and the
+buffered I/O in `libc` does not interact well with the use of those procedures;
+all of the buffered I/O API is basically non-[async-signal-safe][115].
-TODO
+Implementing custom buffered I/O had other benefits. First, it allowed me to
+tightly integrate history with the I/O code. Second, it allowed me to make
+changes to history in order to make it adapt to user prompts.
### Lexing
@@ -3004,6 +3835,83 @@ Maintaining the integrity of the requirements flag set is an important part of
the `bc` parser. However, they do not have to be stored on a stack because their
stack is implicit from the recursion that expression parsing uses.
+### Functions
+
+Functions, in `bc`, are data structures that contain the bytecode and data
+produced by the parsers. Functions are what the `BcProgram` program executes.
+
+#### Main and Read Functions
+
+There are two functions that always exist, which I call the "main" and "read"
+functions.
+
+The "main" function is the function in which any code and data outside other
+functions is put. Basically, it is the function where the scripting code ends
+up.
+
+The "read" function is the function that is reset and parsed every time a call
+to the `read()` builtin function happens.
+
+#### `dc` Strings
+
+In `dc`, strings can be executed, and since there are no actual "functions" in
+`dc`, strings are handled as functions. In fact, they are effectively translated
+into functions by parsing.
+
+##### Tail Calls
+
+Since strings in `dc` are functions, and the fact that `dc` has no native loops,
+such loops are implemented in `dc` code using strings with conditional execution
+commands at the end of strings.
+
+When such conditional execution, or even unconditional execution, commands are
+the very last commands in a string, then `dc` can perform a [tail call][202].
+
+This is done by recording the fact that a tail call happened, done by
+incrementing an integer on a stack. When a string is executed *without* a tail
+call, a new entry is pushed onto the stack with the integer `1`.
+
+When a string finally quits that followed tail calls, its stack entry is popped,
+eliminating all of those tail calls.
+
+Why perform tail calls? Because otherwise, `dc` would be subject to the same
+thing that plagues [functional programming languages][203]: stack overflow. In
+`dc`'s case, that would manifest itself as a growing [heap][204], because the
+execution stack is stored on the heap, until a fatal allocation failure would
+occur.
+
+#### Execution
+
+TODO
+
+* Bytecode.
+* Stack machine.
+* `stack` vs. `results`.
+* `BcInstPtr` for marking where execution is.
+* Variables are arrays, in order to push arguments on them and to implement
+ autos.
+* Arrays are arrays as well.
+
+##### Bytecode
+
+TODO
+
+###### Bytecode Indices
+
+TODO
+
+##### Variables
+
+TODO
+
+##### Arrays
+
+TODO
+
+###### Array References
+
+TODO
+
### Callbacks
There are many places in `bc` and `dc` where function pointers are used:
@@ -3105,10 +4013,6 @@ TODO
TODO
-### Slabs and Slab Vectors
-
-TODO
-
## `bcl`
TODO
@@ -3296,3 +4200,32 @@ TODO
[173]: #async-signal-safe-signal-handling
[174]: #vectorh
[175]: #read_errorstxt
+[176]: #statush
+[177]: #numbers
+[178]: #math-concepts
+[179]: #pseudo-random-number-generator
+[180]: #lexh
+[181]: #parseh
+[182]: #dch
+[183]: #libraryh
+[184]: #numh
+[185]: #readh
+[186]: #maps
+[187]: #slabs-and-slab-vectors
+[188]: ./build.md#extra-math
+[189]: #command-line-history
+[190]: #scriptsed
+[191]: #linux-timeconstbc-script
+[192]: #corpuses
+[193]: ./build.md#history
+[194]: https://www.valgrind.org/docs/manual/mc-manual.html
+[195]: #othersh
+[196]: https://scan.coverity.com/
+[197]: https://clang-analyzer.llvm.org/
+[198]: https://unix.stackexchange.com/questions/253349/eintr-is-there-a-rationale-behind-it
+[199]: https://cr.yp.to/docs/selfpipe.html
+[200]: https://skarnet.org/cgi-bin/archive.cgi?2:mss:1607:201701:dfblejammjllfkggpcph
+[201]: https://slembcke.github.io/2020/10/12/CustomAllocators.html#1-slab-allocator
+[202]: https://en.wikipedia.org/wiki/Tail_call
+[203]: https://en.wikipedia.org/wiki/Functional_programming_language
+[204]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation