In addition to making large-scale coding manageable, the techniques
described on this page also make it maintainable in the long term.
When you are writing something that is important, and that you know is
going to be around for years, it's a really, really good idea to write it
in a way that will be easy to debug, test, and modify a year from now,
when you've forgotten most of its innards.
If you find yourself needing to handle a LARGE number of tasks, the size of the job is not in itself a reason to move to another language. It is, however, a reason to start with a drastically different mindset.
The mindset that is most useful is that of the professional, traditional
programming-language programmer, who breaks up a program into separate
modules.
Since such a mindset is the subject of entire semester-long courses,
I won't be able to do it justice on a single page! However, I can at least
describe a basic framework by which you can exercise this mindset within
the bounds of ksh programming.
If nothing else, version control becomes useful when you wish to
do "releases" of software.
It also makes it convenient to track down stupid one-line errors that
have crept in unexpectedly since the "last commit".
Lastly, it can be a good safeguard against an accidental "rm somespec_ *.ksh" (note the unexpected space). Backups can serve this purpose... IF you have them! But even then, losing a full day's work between backups can be very demotivating.
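As a minimal sketch, assuming you use git (any version control system has
equivalents; the file names here are placeholders):

    git init                        # one-time setup, in your script directory
    git add prog_*.ksh
    git commit -m "known-good state"
    git diff                        # see what changed since the last commit
    git checkout -- prog_status     # recover an accidentally deleted file
    git tag release-1.0             # mark a "release" point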
Modules take the concept of functions, but then go a step further. Rather than just grouping and containing a set of commands together, modules group and contain sets of functions (and associated values) together. On this page, I will introduce the concept of separate files as "modules".
ksh, and other sh derivatives, have the dot command "." as a way to "source" (include) other files. This is roughly equivalent to the "#include" directive in many other languages. Most commonly, it is used merely as a convenience to slurp in some configuration file that has some convenient variable settings. That being said, there is no reason not to use it to include full functions as well.
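For example (the file and variable names here are hypothetical), suppose
settings.conf contains nothing but variable assignments:

    # settings.conf
    LOGDIR=/var/log/myprog
    MAXROT=5

Any script can then slurp those values in with the dot command:

    #!/bin/ksh -p
    . ./settings.conf
    print "logs go to $LOGDIR, keeping $MAXROT old copies"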
But why do this?
For sanity, and for debugging purposes.
In smaller scripts, it is easy to just make a quick copy and abuse the copy for quick-and-dirty testing, cutting out pieces unnecessary for the test willy-nilly. In larger, more complex programs, there is a higher danger that such an approach will end up accidentally cutting out a piece that is crucial to the normal operation of the program.
In contrast, if you are disciplined enough to subdivide your code so that, for example, all your file manipulation code is in a single file, you then have a much smaller target to check through when you make a change that affects "file manipulation" in your program.
But what if the program is already "cut up" into pre-planned,
self-contained sections?
From these "modular sections", you can then pick and choose just the ones
you need, and also jump right into the routines you need, with a good
degree of certainty that it is safe to do so.
As an even further benefit: once you have modules, it is then easier to write task-specific test harnesses. For a large program, you might have a collection of scripts just for testing, which you keep around with the program itself. Then, when you make a change to a particular section, you can re-run the relevant test script to verify that you have not broken anything in that module.
Here is a trivial example of a program split into modules. Note that a module file (prog_somemodule, in this naming scheme) should not actually DO anything when sourced, besides defining functions and variables.
The main program:

    #!/bin/ksh -p
    CODEDIR="."
    . $CODEDIR/prog_status
    . $CODEDIR/prog_mungedata

    get_status
    if [[ $? -eq 0 ]] ; then
        rotate_data
    fi
The prog_status module:

    # start of prog_status
    get_status() {
        # the [a] trick keeps grep from matching its own entry in ps output
        ps -ef | grep '[a]pache'
        if [[ $? -eq 0 ]] ; then
            return 0
        else
            return 1
        fi
    }
The prog_mungedata module:

    # start of prog_mungedata
    # rotate_data: note you should probably call get_status before this,
    # to ensure the program is not active before log rotation
    rotate_data() {
        mv -f /var/log/apache.1 /var/log/apache.2
        mv -f /var/log/apache /var/log/apache.1
    }
And a test harness script, kept alongside the program:

    #!/bin/ksh -p
    # Test harness for 'prog' routines.
    # Right now, we just test to validate that get_status works properly.
    CODEDIR="."
    . $CODEDIR/prog_status

    get_status
    if [[ $? -eq 0 ]] ; then
        print "According to get_status, program is running now"
    else
        print "According to get_status, program is NOT running now"
    fi
There is yet another option, however: you can keep the different modules
split up for coding, but for delivery, simply do the "include" yourself,
by concatenating the files into a single shellscript before deployment.
In a sense, this can be considered akin to "linking" object files to
deliver a single executable.
Since "cpp" treats lines beginning with '#' as preprocessor directives, it does not like '#' style comments (but you'll want to use '#' style comments in your shell programs!!). So here's a quick Makefile example of how to do this sort of thing easily:
    yourprog: main.ksh incfile1 incfile2
            awk '$$1 == "AWKinclude" {system("cat " $$2); next;} \
                 {print}' main.ksh > $@

Then in your main.ksh, use "AWKinclude incfile1" instead of the cpp style
of "#include <incfile1>".
For a fancy, useful example of this style, you can look at my zrep "source code" directory.