[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
defun
Our next project is to count the number of words in a function
definition. Clearly, this can be done using some variant of
count-word-region
. See section Counting Words: Repetition and Regexps. If we are just going to count the words in
one definition, it is easy enough to mark the definition with the
C-M-h (mark-defun
) command, and then call
count-word-region
.
However, I am more ambitious: I want to count the words and symbols in every definition in the Emacs sources and then print a graph that shows how many functions there are of each length: how many contain 40 to 49 words or symbols, how many contain 50 to 59 words or symbols, and so on. I have often been curious how long a typical function is, and this will tell.
Divide and Conquer | ||
14.1 What to Count? | What to count? | |
14.2 What Constitutes a Word or Symbol? | What constitutes a word or symbol? | |
14.3 The count-words-in-defun Function | Very like count-words . | |
14.4 Count Several defuns Within a File | Counting several defuns in a file. | |
14.5 Find a File | Do you want to look at a file? | |
14.6 lengths-list-file in Detail | A list of the lengths of many definitions. | |
14.7 Count Words in defuns in Different Files | Counting in definitions in different files. | |
14.8 Recursively Count Words in Different Files | Recursively counting in different files. | |
14.9 Prepare the Data for Display in a Graph | Prepare the data for display in a graph. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Described in one phrase, the histogram project is daunting; but divided into numerous small steps, each of which we can take one at a time, the project becomes less fearsome. Let us consider what the steps must be:
count-words-in-defun
function.
This is quite a project! But if we take each step slowly, it will not be difficult.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
When we first start thinking about how to count the words in a
function definition, the first question is (or ought to be) what are
we going to count? When we speak of `words' with respect to a Lisp
function definition, we are actually speaking, in large part, of
`symbols'. For example, the following multiply-by-seven
function contains the five symbols defun
,
multiply-by-seven
, number
, *
, and 7
. In
addition, in the documentation string, it contains the four words
`Multiply', `NUMBER', `by', and `seven'. The
symbol `number' is repeated, so the definition contains a total
of ten words and symbols.
(defun multiply-by-seven (number) "Multiply NUMBER by seven." (* 7 number)) |
However, if we mark the multiply-by-seven
definition with
C-M-h (mark-defun
), and then call
count-words-region
on it, we will find that
count-words-region
claims the definition has eleven words, not
ten! Something is wrong!
The problem is twofold: count-words-region
does not count the
`*' as a word, and it counts the single symbol,
multiply-by-seven
, as containing three words. The hyphens are
treated as if they were interword spaces rather than intraword
connectors: `multiply-by-seven' is counted as if it were written
`multiply by seven'.
The cause of this confusion is the regular expression search within
the count-words-region
definition that moves point forward word
by word. In the canonical version of count-words-region
, the
regexp is:
"\\w+\\W*" |
This regular expression is a pattern defining one or more word constituent characters possibly followed by one or more characters that are not word constituents. What is meant by `word constituent characters' brings us to the issue of syntax, which is worth a section of its own.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Emacs treats different characters as belonging to different syntax categories. For example, the regular expression, `\\w+', is a pattern specifying one or more word constituent characters. Word constituent characters are members of one syntax category. Other syntax categories include the class of punctuation characters, such as the period and the comma, and the class of whitespace characters, such as the blank space and the tab character. (For more information, see section `The Syntax Table' in The GNU Emacs Manual, and section `Syntax Tables' in The GNU Emacs Lisp Reference Manual.)
Syntax tables specify which characters belong to which categories.
Usually, a hyphen is not specified as a `word constituent character'.
Instead, it is specified as being in the `class of characters that are
part of symbol names but not words.' This means that the
count-words-region
function treats it in the same way it treats
an interword white space, which is why count-words-region
counts `multiply-by-seven' as three words.
There are two ways to cause Emacs to count `multiply-by-seven' as one symbol: modify the syntax table or modify the regular expression.
We could redefine a hyphen as a word constituent character by modifying the syntax table that Emacs keeps for each mode. This action would serve our purpose, except that a hyphen is merely the most common character within symbols that is not typically a word constituent character; there are others, too.
Alternatively, we can redefine the regular expression used in the
count-words
definition so as to include symbols. This
procedure has the merit of clarity, but the task is a little tricky.
The first part is simple enough: the pattern must match "at least one character that is a word or symbol constituent". Thus:
"\\(\\w\\|\\s_\\)+" |
The `\\(' is the first part of the grouping construct that includes the `\\w' and the `\\s_' as alternatives, separated by the `\\|'. The `\\w' matches any word-constituent character and the `\\s_' matches any character that is part of a symbol name but not a word-constituent character. The `+' following the group indicates that the word or symbol constituent characters must be matched at least once.
However, the second part of the regexp is more difficult to design. What we want is to follow the first part with "optionally one or more characters that are not constituents of a word or symbol". At first, I thought I could define this with the following:
"\\(\\W\\|\\S_\\)*" |
The upper case `W' and `S' match characters that are not word or symbol constituents. Unfortunately, this expression matches any character that is either not a word constituent or not a symbol constituent. This matches any character!
I then noticed that every word or symbol in my test region was followed by white space (blank space, tab, or newline). So I tried placing a pattern to match one or more blank spaces after the pattern for one or more word or symbol constituents. This failed, too. Words and symbols are often separated by whitespace, but in actual code parentheses may follow symbols and punctuation may follow words. So finally, I designed a pattern in which the word or symbol constituents are followed optionally by characters that are not white space and then followed optionally by white space.
Here is the full regular expression:
"\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
count-words-in-defun
Function
We have seen that there are several ways to write a
count-word-region
function. To write a
count-words-in-defun
, we need merely adapt one of these
versions.
The version that uses a while
loop is easy to understand, so I
am going to adapt that. Because count-words-in-defun
will be
part of a more complex program, it need not be interactive and it need
not display a message but just return the count. These considerations
simplify the definition a little.
On the other hand, count-words-in-defun
will be used within a
buffer that contains function definitions. Consequently, it is
reasonable to ask that the function determine whether it is called
when point is within a function definition, and if it is, to return
the count for that definition. This adds complexity to the
definition, but saves us from needing to pass arguments to the
function.
These considerations lead us to prepare the following template:
(defun count-words-in-defun () "documentation..." (set up... (while loop...) return count) |
As usual, our job is to fill in the slots.
First, the set up.
We are presuming that this function will be called within a buffer
containing function definitions. Point will either be within a
function definition or not. For count-words-in-defun
to work,
point must move to the beginning of the definition, a counter must
start at zero, and the counting loop must stop when point reaches the
end of the definition.
The beginning-of-defun
function searches backwards for an
opening delimiter such as a `(' at the beginning of a line, and
moves point to that position, or else to the limit of the search. In
practice, this means that beginning-of-defun
moves point to the
beginning of an enclosing or preceding function definition, or else to
the beginning of the buffer. We can use beginning-of-defun
to
place point where we wish to start.
The while
loop requires a counter to keep track of the words or
symbols being counted. A let
expression can be used to create
a local variable for this purpose, and bind it to an initial value of zero.
The end-of-defun
function works like beginning-of-defun
except that it moves point to the end of the definition.
end-of-defun
can be used as part of an expression that
determines the position of the end of the definition.
The set up for count-words-in-defun
takes shape rapidly: first
we move point to the beginning of the definition, then we create a
local variable to hold the count, and finally, we record the position
of the end of the definition so the while
loop will know when to stop
looping.
The code looks like this:
(beginning-of-defun) (let ((count 0) (end (save-excursion (end-of-defun) (point)))) |
The code is simple. The only slight complication is likely to concern
end
: it is bound to the position of the end of the definition
by a save-excursion
expression that returns the value of point
after end-of-defun
temporarily moves it to the end of the
definition.
The second part of the count-words-in-defun
, after the set up,
is the while
loop.
The loop must contain an expression that jumps point forward word by
word and symbol by symbol, and another expression that counts the
jumps. The true-or-false-test for the while
loop should test
true so long as point should jump forward, and false when point is at
the end of the definition. We have already redefined the regular
expression for this (see section 14.2 What Constitutes a Word or Symbol?), so the loop is straightforward:
(while (and (< (point) end) (re-search-forward "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" end t) (setq count (1+ count))) |
The third part of the function definition returns the count of words
and symbols. This part is the last expression within the body of the
let
expression, and can be, very simply, the local variable
count
, which when evaluated returns the count.
Put together, the count-words-in-defun
definition looks like this:
(defun count-words-in-defun () "Return the number of words and symbols in a defun." (beginning-of-defun) (let ((count 0) (end (save-excursion (end-of-defun) (point)))) (while (and (< (point) end) (re-search-forward "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" end t)) (setq count (1+ count))) count)) |
How to test this? The function is not interactive, but it is easy to
put a wrapper around the function to make it interactive; we can use
almost the same code as for the recursive version of
count-words-region
:
;;; Interactive version. (defun count-words-defun () "Number of words and symbols in a function definition." (interactive) (message "Counting words and symbols in function definition ... ") (let ((count (count-words-in-defun))) (cond ((zerop count) (message "The definition does NOT have any words or symbols.")) ((= 1 count) (message "The definition has 1 word or symbol.")) (t (message "The definition has %d words or symbols." count))))) |
Let's re-use C-c = as a convenient keybinding:
(global-set-key "\C-c=" 'count-words-defun) |
Now we can try out count-words-defun
: install both
count-words-in-defun
and count-words-defun
, and set the
keybinding, and then place the cursor within the following definition:
(defun multiply-by-seven (number) "Multiply NUMBER by seven." (* 7 number)) => 10 |
Success! The definition has 10 words and symbols.
The next problem is to count the numbers of words and symbols in several definitions within a single file.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
defuns
Within a File A file such as `simple.el' may have 80 or more function definitions within it. Our long term goal is to collect statistics on many files, but as a first step, our immediate goal is to collect statistics on one file.
The information will be a series of numbers, each number being the length of a function definition. We can store the numbers in a list.
We know that we will want to incorporate the information regarding one file with information about many other files; this means that the function for counting definition lengths within one file need only return the list of lengths. It need not and should not display any messages.
The word count commands contain one expression to jump point forward word by word and another expression to count the jumps. The function to return the lengths of definitions can be designed to work the same way, with one expression to jump point forward definition by definition and another expression to construct the lengths' list.
This statement of the problem makes it elementary to write the
function definition. Clearly, we will start the count at the
beginning of the file, so the first command will be (goto-char
(point-min))
. Next, we start the while
loop; and the
true-or-false test of the loop can be a regular expression search for
the next function definition--so long as the search succeeds, point
is moved forward and then the body of the loop is evaluated. The body
needs an expression that constructs the lengths' list. cons
,
the list construction command, can be used to create the list. That
is almost all there is to it.
Here is what this fragment of code looks like:
(goto-char (point-min)) (while (re-search-forward "^(defun" nil t) (setq lengths-list (cons (count-words-in-defun) lengths-list))) |
What we have left out is the mechanism for finding the file that contains the function definitions.
In previous examples, we either used this, the Info file, or we switched back and forth to some other buffer, such as the `*scratch*' buffer.
Finding a file is a new process that we have not yet discussed.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
To find a file in Emacs, you use the C-x C-f (find-file
)
command. This command is almost, but not quite right for the lengths
problem.
Let's look at the source for find-file
(you can use the
find-tag
command or C-h f (describe-function
) to
find the source of a function):
(defun find-file (filename) "Edit file FILENAME. Switch to a buffer visiting file FILENAME, creating one if none already exists." (interactive "FFind file: ") (switch-to-buffer (find-file-noselect filename))) |
The definition possesses short but complete documentation and an
interactive specification that prompts you for a file name when you
use the command interactively. The body of the definition contains
two functions, find-file-noselect
and switch-to-buffer
.
According to its documentation as shown by C-h f (the
describe-function
command), the find-file-noselect
function reads the named file into a buffer and returns the buffer.
However, the buffer is not selected. Emacs does not switch its
attention (or yours if you are using find-file-noselect
) to the
named buffer. That is what switch-to-buffer
does: it switches
the buffer to which Emacs attention is directed; and it switches the
buffer displayed in the window to the new buffer. We have discussed
buffer switching elsewhere. (See section 2.3 Switching Buffers.)
In this histogram project, we do not need to display each file on the
screen as the program determines the length of each definition within
it. Instead of employing switch-to-buffer
, we can work with
set-buffer
, which redirects the attention of the computer
program to a different buffer but does not redisplay it on the screen.
So instead of calling on find-file
to do the job, we must write
our own expression.
The task is easy: use find-file-noselect
and set-buffer
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lengths-list-file
in Detail
The core of the lengths-list-file
function is a while
loop containing a function to move point forward `defun by defun' and
a function to count the number of words and symbols in each defun.
This core must be surrounded by functions that do various other tasks,
including finding the file, and ensuring that point starts out at the
beginning of the file. The function definition looks like this:
(defun lengths-list-file (filename) "Return list of definitions' lengths within FILE. The returned list is a list of numbers. Each number is the number of words or symbols in one function definition." (message "Working on `%s' ... " filename) (save-excursion (let ((buffer (find-file-noselect filename)) (lengths-list)) (set-buffer buffer) (setq buffer-read-only t) (widen) (goto-char (point-min)) (while (re-search-forward "^(defun" nil t) (setq lengths-list (cons (count-words-in-defun) lengths-list))) (kill-buffer buffer) lengths-list))) |
The function is passed one argument, the name of the file on which it will work. It has four lines of documentation, but no interactive specification. Since people worry that a computer is broken if they don't see anything going on, the first line of the body is a message.
The next line contains a save-excursion
that returns Emacs'
attention to the current buffer when the function completes. This is
useful in case you embed this function in another function that
presumes point is restored to the original buffer.
In the varlist of the let
expression, Emacs finds the file and
binds the local variable buffer
to the buffer containing the
file. At the same time, Emacs creates lengths-list
as a local
variable.
Next, Emacs switches its attention to the buffer.
In the following line, Emacs makes the buffer read-only. Ideally, this line is not necessary. None of the functions for counting words and symbols in a function definition should change the buffer. Besides, the buffer is not going to be saved, even if it were changed. This line is entirely the consequence of great, perhaps excessive, caution. The reason for the caution is that this function and those it calls work on the sources for Emacs and it is very inconvenient if they are inadvertently modified. It goes without saying that I did not realize a need for this line until an experiment went awry and started to modify my Emacs source files ...
Next comes a call to widen the buffer if it is narrowed. This function is usually not needed--Emacs creates a fresh buffer if none already exists; but if a buffer visiting the file already exists Emacs returns that one. In this case, the buffer may be narrowed and must be widened. If we wanted to be fully `user-friendly', we would arrange to save the restriction and the location of point, but we won't.
The (goto-char (point-min))
expression moves point to the
beginning of the buffer.
Then comes a while
loop in which the `work' of the function is
carried out. In the loop, Emacs determines the length of each
definition and constructs a lengths' list containing the information.
Emacs kills the buffer after working through it. This is to save
space inside of Emacs. My version of Emacs 19 contained over 300
source files of interest; Emacs 21 contains over 800 source files.
Another function will apply lengths-list-file
to each of the
files.
Finally, the last expression within the let
expression is the
lengths-list
variable; its value is returned as the value of
the whole function.
You can try this function by installing it in the usual fashion. Then
place your cursor after the following expression and type C-x
C-e (eval-last-sexp
).
(lengths-list-file "/usr/local/share/emacs/21.0.100/lisp/emacs-lisp/debug.el") |
(You may need to change the pathname of the file; the one here worked with GNU Emacs version 21.0.100. To change the expression, copy it to the `*scratch*' buffer and edit it.
(Also, to see the full length of the list, rather than a truncated version, you may have to evaluate the following:
(custom-set-variables '(eval-expression-print-length nil)) |
(See section Setting Variables with defcustom
.
Then evaluate the lengths-list-file
expression.)
The lengths' list for `debug.el' takes less than a second to produce and looks like this:
(77 95 85 87 131 89 50 25 44 44 68 35 64 45 17 34 167 457) |
(Using my old machine, the version 19 lengths' list for `debug.el' took seven seconds to produce and looked like this:
(75 41 80 62 20 45 44 68 45 12 34 235) |
(The newer version of `debug.el' contains more defuns than the earlier one; and my new machine is much faster than the old one.)
Note that the length of the last definition in the file is first in the list.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
defuns
in Different Files In the previous section, we created a function that returns a list of the lengths of each definition in a file. Now, we want to define a function to return a master list of the lengths of the definitions in a list of files.
Working on each of a list of files is a repetitious act, so we can use
either a while
loop or recursion.
Determine the lengths of defuns | Return a list of the lengths of defuns. | |
14.7.1 The append Function | Attach one list to another. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
defuns
The design using a while
loop is routine. The argument passed
the function is a list of files. As we saw earlier (see section 11.1.1 A while
Loop and a List), you can write a while
loop so that the body of the
loop is evaluated if such a list contains elements, but to exit the
loop if the list is empty. For this design to work, the body of the
loop must contain an expression that shortens the list each time the
body is evaluated, so that eventually the list is empty. The usual
technique is to set the value of the list to the value of the CDR
of the list each time the body is evaluated.
The template looks like this:
(while test-whether-list-is-empty body... set-list-to-cdr-of-list) |
Also, we remember that a while
loop returns nil
(the
result of evaluating the true-or-false-test), not the result of any
evaluation within its body. (The evaluations within the body of the
loop are done for their side effects.) However, the expression that
sets the lengths' list is part of the body--and that is the value
that we want returned by the function as a whole. To do this, we
enclose the while
loop within a let
expression, and
arrange that the last element of the let
expression contains
the value of the lengths' list. (See section Loop Example with an Incrementing Counter.)
These considerations lead us directly to the function itself:
;;; Use |
expand-file-name
is a built-in function that converts a file
name to the absolute, long, path name form of the directory in which
the function is called.
Thus, if expand-file-name
is called on debug.el
when
Emacs is visiting the
`/usr/local/share/emacs/21.0.100/lisp/emacs-lisp/' directory,
debug.el |
becomes
/usr/local/share/emacs/21.0.100/lisp/emacs-lisp/debug.el |
The only other new element of this function definition is the as yet
unstudied function append
, which merits a short section for
itself.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
append
Function
The append
function attaches one list to another. Thus,
(append '(1 2 3 4) '(5 6 7 8)) |
produces the list
(1 2 3 4 5 6 7 8) |
This is exactly how we want to attach two lengths' lists produced by
lengths-list-file
to each other. The results contrast with
cons
,
(cons '(1 2 3 4) '(5 6 7 8)) |
which constructs a new list in which the first argument to cons
becomes the first element of the new list:
((1 2 3 4) 5 6 7 8) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Besides a while
loop, you can work on each of a list of files
with recursion. A recursive version of lengths-list-many-files
is short and simple.
The recursive function has the usual parts: the `do-again-test', the
`next-step-expression', and the recursive call. The `do-again-test'
determines whether the function should call itself again, which it
will do if the list-of-files
contains any remaining elements;
the `next-step-expression' resets the list-of-files
to the
CDR of itself, so eventually the list will be empty; and the
recursive call calls itself on the shorter list. The complete
function is shorter than this description!
(defun recursive-lengths-list-many-files (list-of-files) "Return list of lengths of each defun in LIST-OF-FILES." (if list-of-files ; do-again-test (append (lengths-list-file (expand-file-name (car list-of-files))) (recursive-lengths-list-many-files (cdr list-of-files))))) |
In a sentence, the function returns the lengths' list for the first of
the list-of-files
appended to the result of calling itself on
the rest of the list-of-files
.
Here is a test of recursive-lengths-list-many-files
, along with
the results of running lengths-list-file
on each of the files
individually.
Install recursive-lengths-list-many-files
and
lengths-list-file
, if necessary, and then evaluate the
following expressions. You may need to change the files' pathnames;
those here work when this Info file and the Emacs sources are located
in their customary places. To change the expressions, copy them to
the `*scratch*' buffer, edit them, and then evaluate them.
The results are shown after the `=>'. (These results are for files from Emacs Version 21.0.100; files from other versions of Emacs may produce different results.)
(cd "/usr/local/share/emacs/21.0.100/") (lengths-list-file "./lisp/macros.el") => (273 263 456 90) (lengths-list-file "./lisp/mail/mailalias.el") => (38 32 26 77 174 180 321 198 324) (lengths-list-file "./lisp/makesum.el") => (85 181) (recursive-lengths-list-many-files '("./lisp/macros.el" "./lisp/mail/mailalias.el" "./lisp/makesum.el")) => (273 263 456 90 38 32 26 77 174 180 321 198 324 85 181) |
The recursive-lengths-list-many-files
function produces the
output we want.
The next step is to prepare the data in the list for display in a graph.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The recursive-lengths-list-many-files
function returns a list
of numbers. Each number records the length of a function definition.
What we need to do now is transform this data into a list of numbers
suitable for generating a graph. The new list will tell how many
functions definitions contain less than 10 words and
symbols, how many contain between 10 and 19 words and symbols, how
many contain between 20 and 29 words and symbols, and so on.
In brief, we need to go through the lengths' list produced by the
recursive-lengths-list-many-files
function and count the number
of defuns within each range of lengths, and produce a list of those
numbers.
Based on what we have done before, we can readily foresee that it should not be too hard to write a function that `CDRs' down the lengths' list, looks at each element, determines which length range it is in, and increments a counter for that range.
However, before beginning to write such a function, we should consider the advantages of sorting the lengths' list first, so the numbers are ordered from smallest to largest. First, sorting will make it easier to count the numbers in each range, since two adjacent numbers will either be in the same length range or in adjacent ranges. Second, by inspecting a sorted list, we can discover the highest and lowest number, and thereby determine the largest and smallest length range that we will need.
14.9.1 Sorting Lists | Sorting lists. | |
14.9.2 Making a List of Files | Making a list of files. | |
14.9.3 Counting function definitions |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Emacs contains a function to sort lists, called (as you might guess)
sort
. The sort
function takes two arguments, the list
to be sorted, and a predicate that determines whether the first of
two list elements is "less" than the second.
As we saw earlier (see section Using the Wrong Type Object as an Argument), a predicate is a function that
determines whether some property is true or false. The sort
function will reorder a list according to whatever property the
predicate uses; this means that sort
can be used to sort
non-numeric lists by non-numeric criteria--it can, for example,
alphabetize a list.
The <
function is used when sorting a numeric list. For example,
(sort '(4 8 21 17 33 7 21 7) '<) |
produces this:
(4 7 7 8 17 21 21 33) |
(Note that in this example, both the arguments are quoted so that the
symbols are not evaluated before being passed to sort
as
arguments.)
Sorting the list returned by the
recursive-lengths-list-many-files
function is straightforward;
it uses the <
function:
(sort (recursive-lengths-list-many-files '("../lisp/macros.el" "../lisp/mailalias.el" "../lisp/makesum.el")) '< |
which produces:
(85 86 116 122 154 176 179 265) |
(Note that in this example, the first argument to sort
is not
quoted, since the expression must be evaluated so as to produce the
list that is passed to sort
.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The recursive-lengths-list-many-files
function requires a list
of files as its argument. For our test examples, we constructed such
a list by hand; but the Emacs Lisp source directory is too large for
us to do for that. Instead, we will write a function to do the job
for us. In this function, we will use both a while
loop and a
recursive call.
We did not have to write a function like this for older versions of
GNU Emacs, since they placed all the `.el' files in one
directory. Instead, we were able to use the directory-files
function, which lists the names of files that match a specified
pattern within a single directory.
However, recent versions of Emacs place Emacs Lisp files in sub-directories of the top level `lisp' directory. This re-arrangement eases navigation. For example, all the mail related files are in a `lisp' sub-directory called `mail'. But at the same time, this arrangement forces us to create a file listing function that descends into the sub-directories.
We can create this function, called files-in-below-directory
,
using familiar functions such as car
, nthcdr
, and
substring
in conjunction with an existing function called
directory-files-and-attributes
. This latter function not only
lists all the filenames in a directory, including the names
of sub-directories, but also their attributes.
To restate our goal: to create a function that will enable us
to feed filenames to recursive-lengths-list-many-files
as a list that looks like this (but with more elements):
("../lisp/macros.el" "../lisp/mail/rmail.el" "../lisp/makesum.el") |
The directory-files-and-attributes
function returns a list of
lists. Each of the lists within the main list consists of 13
elements. The first element is a string that contains the name of the
file -- which, in GNU/Linux, may be a `directory file', that is to
say, a file with the special attributes of a directory. The second
element of the list is t
for a directory, a string
for symbolic link (the string is the name linked to), or nil
.
For example, the first `.el' file in the `lisp/' directory is `abbrev.el'. Its name is `/usr/local/share/emacs/21.0.100/lisp/abbrev.el' and it is not a directory or a symbolic link.
This is how directory-files-and-attributes
lists that file and
its attributes:
("/usr/local/share/emacs/21.0.100/lisp/abbrev.el" nil 1 1000 100 (15019 32380) (14883 48041) (15214 49336) 11583 "-rw-rw-r--" t 341385 776) |
On the other hand, `mail/' is a directory within the `lisp/' directory. The beginning of its listing looks like this:
("/usr/local/share/emacs/21.0.100/lisp/mail" t ... ) |
(Look at the documentation of file-attributes
to learn about
the different attributes. Bear in mind that the
file-attributes
function does not list the filename, so its
first element is directory-files-and-attributes
's second
element.)
We will want our new function, files-in-below-directory
, to
list the `.el' files in the directory it is told to check, and in
any directories below that directory.
This gives us a hint on how to construct
files-in-below-directory
: within a directory, the function
should add `.el' filenames to a list; and if, within a directory,
the function comes upon a sub-directory, it should go into that
sub-directory and repeat its actions.
However, we should note that every directory contains a name that
refers to itself, called `.', ("dot") and a name that refers to
its parent directory, called `..' ("double dot"). (In
`/', the root directory, `..' refers to itself, since
`/' has no parent.) Clearly, we do not want our
files-in-below-directory
function to enter those directories,
since they always lead us, directly or indirectly, to the current
directory.
Consequently, our files-in-below-directory
function must do
several tasks:
Let's write a function definition to do these tasks. We will use a
while
loop to move from one filename to another within a
directory, checking what needs to be done; and we will use a recursive
call to repeat the actions on each sub-directory. The recursive
pattern is `accumulate'
(see section Recursive Pattern: accumulate),
using append
as the combiner.
Here is the function:
(defun files-in-below-directory (directory) "List the .el files in DIRECTORY and in its sub-directories." ;; Although the function will be used non-interactively, ;; it will be easier to test if we make it interactive. ;; The directory will have a name such as ;; "/usr/local/share/emacs/21.0.100/lisp/" (interactive "DDirectory name: ") (let (el-files-list (current-directory-list (directory-files-and-attributes directory t))) ;; while we are in the current directory (while current-directory-list (cond ;; check to see whether filename ends in `.el' ;; and if so, append its name to a list. ((equal ".el" (substring (car (car current-directory-list)) -3)) (setq el-files-list (cons (car (car current-directory-list)) el-files-list))) ;; check whether filename is that of a directory ((eq t (car (cdr (car current-directory-list)))) ;; decide whether to skip or recurse (if (equal (or "." "..") (substring (car (car current-directory-list)) -1)) ;; then do nothing if filename is that of ;; current directory or parent () ;; else descend into the directory and repeat the process (setq el-files-list (append (files-in-below-directory (car (car current-directory-list))) el-files-list))))) ;; move to the next filename in the list; this also ;; shortens the list so the while loop eventually comes to an end (setq current-directory-list (cdr current-directory-list))) ;; return the filenames el-files-list)) |
The files-in-below-directory
directory-files
function
takes one argument, the name of a directory.
Thus, on my system,
(length (files-in-below-directory "/usr/local/share/emacs/21.0.100/lisp/")) |
tells me that my version 21.0.100 Lisp sources directory contains 754 `.el' files.
files-in-below-directory
returns a list in reverse alphabetical
order. An expression to sort the list in alphabetical order looks
like this:
(sort (files-in-below-directory "/usr/local/share/emacs/21.0.100/lisp/") 'string-lessp) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Our immediate goal is to generate a list that tells us how many function definitions contain fewer than 10 words and symbols, how many contain between 10 and 19 words and symbols, how many contain between 20 and 29 words and symbols, and so on.
With a sorted list of numbers, this is easy: count how many elements
of the list are smaller than 10, then, after moving past the numbers
just counted, count how many are smaller than 20, then, after moving
past the numbers just counted, count how many are smaller than 30, and
so on. Each of the numbers, 10, 20, 30, 40, and the like, is one
larger than the top of that range. We can call the list of such
numbers the top-of-ranges
list.
If we wished, we could generate this list automatically, but it is simpler to write a list manually. Here it is:
(defvar top-of-ranges '(10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300) "List specifying ranges for `defuns-per-range'.") |
To change the ranges, we edit this list.
Next, we need to write the function that creates the list of the
number of definitions within each range. Clearly, this function must
take the sorted-lengths
and the top-of-ranges
lists
as arguments.
The defuns-per-range
function must do two things again and
again: it must count the number of definitions within a range
specified by the current top-of-range value; and it must shift to the
next higher value in the top-of-ranges
list after counting the
number of definitions in the current range. Since each of these
actions is repetitive, we can use while
loops for the job.
One loop counts the number of definitions in the range defined by the
current top-of-range value, and the other loop selects each of the
top-of-range values in turn.
Several entries of the sorted-lengths
list are counted for each
range; this means that the loop for the sorted-lengths
list
will be inside the loop for the top-of-ranges
list, like a
small gear inside a big gear.
The inner loop counts the number of definitions within the range. It
is a simple counting loop of the type we have seen before.
(See section A loop with an incrementing counter.)
The true-or-false test of the loop tests whether the value from the
sorted-lengths
list is smaller than the current value of the
top of the range. If it is, the function increments the counter and
tests the next value from the sorted-lengths
list.
The inner loop looks like this:
(while length-element-smaller-than-top-of-range (setq number-within-range (1+ number-within-range)) (setq sorted-lengths (cdr sorted-lengths))) |
The outer loop must start with the lowest value of the
top-of-ranges
list, and then be set to each of the succeeding
higher values in turn. This can be done with a loop like this:
(while top-of-ranges body-of-loop... (setq top-of-ranges (cdr top-of-ranges))) |
Put together, the two loops look like this:
(while top-of-ranges ;; Count the number of elements within the current range. (while length-element-smaller-than-top-of-range (setq number-within-range (1+ number-within-range)) (setq sorted-lengths (cdr sorted-lengths))) ;; Move to next range. (setq top-of-ranges (cdr top-of-ranges))) |
In addition, in each circuit of the outer loop, Emacs should record
the number of definitions within that range (the value of
number-within-range
) in a list. We can use cons
for
this purpose. (See section cons
.)
The cons
function works fine, except that the list it
constructs will contain the number of definitions for the highest
range at its beginning and the number of definitions for the lowest
range at its end. This is because cons
attaches new elements
of the list to the beginning of the list, and since the two loops are
working their way through the lengths' list from the lower end first,
the defuns-per-range-list
will end up largest number first.
But we will want to print our graph with smallest values first and the
larger later. The solution is to reverse the order of the
defuns-per-range-list
. We can do this using the
nreverse
function, which reverses the order of a list.
For example,
(nreverse '(1 2 3 4)) |
produces:
(4 3 2 1) |
Note that the nreverse
function is "destructive"---that is,
it changes the list to which it is applied; this contrasts with the
car
and cdr
functions, which are non-destructive. In
this case, we do not want the original defuns-per-range-list
,
so it does not matter that it is destroyed. (The reverse
function provides a reversed copy of a list, leaving the original list
as is.)
Put all together, the defuns-per-range
looks like this:
(defun defuns-per-range (sorted-lengths top-of-ranges) "SORTED-LENGTHS defuns in each TOP-OF-RANGES range." (let ((top-of-range (car top-of-ranges)) (number-within-range 0) defuns-per-range-list) ;; Outer loop. (while top-of-ranges ;; Inner loop. (while (and ;; Need number for numeric test. (car sorted-lengths) (< (car sorted-lengths) top-of-range)) ;; Count number of definitions within current range. (setq number-within-range (1+ number-within-range)) (setq sorted-lengths (cdr sorted-lengths))) ;; Exit inner loop but remain within outer loop. (setq defuns-per-range-list (cons number-within-range defuns-per-range-list)) (setq number-within-range 0) ; Reset count to zero. ;; Move to next range. (setq top-of-ranges (cdr top-of-ranges)) ;; Specify next top of range value. (setq top-of-range (car top-of-ranges))) ;; Exit outer loop and count the number of defuns larger than ;; the largest top-of-range value. (setq defuns-per-range-list (cons (length sorted-lengths) defuns-per-range-list)) ;; Return a list of the number of definitions within each range, ;; smallest to largest. (nreverse defuns-per-range-list))) |
The function is straightforward except for one subtle feature. The true-or-false test of the inner loop looks like this:
(and (car sorted-lengths) (< (car sorted-lengths) top-of-range)) |
instead of like this:
(< (car sorted-lengths) top-of-range) |
The purpose of the test is to determine whether the first item in the
sorted-lengths
list is less than the value of the top of the
range.
The simple version of the test works fine unless the
sorted-lengths
list has a nil
value. In that case, the
(car sorted-lengths)
expression function returns
nil
. The <
function cannot compare a number to
nil
, which is an empty list, so Emacs signals an error and
stops the function from attempting to continue to execute.
The sorted-lengths
list always becomes nil
when the
counter reaches the end of the list. This means that any attempt to
use the defuns-per-range
function with the simple version of
the test will fail.
We solve the problem by using the (car sorted-lengths)
expression in conjunction with the and
expression. The
(car sorted-lengths)
expression returns a non-nil
value so long as the list has at least one number within it, but
returns nil
if the list is empty. The and
expression
first evaluates the (car sorted-lengths)
expression, and
if it is nil
, returns false without evaluating the
<
expression. But if the (car sorted-lengths)
expression returns a non-nil
value, the and
expression
evaluates the <
expression, and returns that value as the value
of the and
expression.
This way, we avoid an error.
See section 12.4 forward-paragraph
: a Goldmine of Functions, for more information about and
.
Here is a short test of the defuns-per-range
function. First,
evaluate the expression that binds (a shortened)
top-of-ranges
list to the list of values, then evaluate the
expression for binding the sorted-lengths
list, and then
evaluate the defuns-per-range
function.
;; (Shorter list than we will use later.) (setq top-of-ranges '(110 120 130 140 150 160 170 180 190 200)) (setq sorted-lengths '(85 86 110 116 122 129 154 176 179 200 265 300 300)) (defuns-per-range sorted-lengths top-of-ranges) |
The list returned looks like this:
(2 2 2 0 0 1 0 2 0 0 4) |
Indeed, there are two elements of the sorted-lengths
list
smaller than 110, two elements between 110 and 119, two elements
between 120 and 129, and so on. There are four elements with a value
of 200 or larger.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |