Yacc and lex tools
Detailed interpretation of individual logs, however, is often left to the log specialists. My two acquaintances were log specialists who wanted an additional tool to assist them in processing and interpreting individual logs or combinations of logs. Specifically, they wanted to tell the computer to perform arithmetic operations on individual logs or on some algebraic combination of logs. For example, they might need to scale a specific log by some arbitrary constant.
In another case, they might want to divide one log by another, and then add the result to a third, all before adding a constant and then raising the resulting values to some arbitrary power. Keep in mind that logs are composed of individual sample values taken as frequently as every few inches or centimeters (as they are here in Canada and in many other places in the world). These example requests therefore imply a fair amount of computation; even the simple scaling case means multiplying every sample along thousands of meters of recorded log values by the chosen constant.
The simple scaling problem isn't particularly difficult, but identifying the desired log could be. The energy company for which my acquaintances worked was using a simple filing method for all the log curves (a curve corresponds to all the recorded samples for one tool in one bore hole), wherein each curve was identified by a name.
To this was added some additional information on units and so on, plus all the samples for all the curves for the well. As a result, more complicated combinations of curves required a fairly sophisticated and robust mechanism for arbitrary name recognition, while the desired algebraic operation was being described. Given such an interesting challenge, I recognized an opportunity to put some relatively little-used tools to work: lex and yacc. The program lex is a lexical analyzer.
Lexical analysis is the recognition of words in a language. As used in this particular application, lex (or, more specifically, flex) is used to recognize the characters forming the names of log curves, arithmetic operators and algebraic groupings.
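A lex specification for this kind of input might contain rules along the following lines. This is only the rules section of a sketch, not a complete or working scanner; the token names (CURVE_NAME, NUMBER) are hypothetical, and a real specification would also pass the matched text and values back through yylval:

```
%%
[A-Za-z][A-Za-z0-9_]*     { return CURVE_NAME; }  /* a curve identifier   */
[0-9]+(\.[0-9]+)?         { return NUMBER; }      /* a numeric constant   */
[-+*/^()]                 { return yytext[0]; }   /* operators, grouping  */
[ \t\n]                   ;                       /* skip white space     */
%%
```

Each line pairs a regular expression with a fragment of C code to run when that expression matches, which is exactly the division of labor described in the next paragraphs.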
The program yacc (yet another compiler compiler) is a parser generator. It accepts word items and, given a list of rules describing how these items form larger entities, deduces which items or combinations of items satisfy a rule. This can also be thought of as grammatical analysis. Once a rule is satisfied, a programmer's code is applied to the result. In the case of English, the words in a sentence can be assigned grammatical types such as noun, verb, adjective and so on.
Particular combinations of words form more complex units, and these in turn can combine into complete sentences. The first noun phrase is the subject of the sentence, and the second, in combination with the verb, forms the predicate. Together they form a complete sentence. It is possible to build parsers for the English language, although given English's many idiosyncrasies, yacc may prove to be inadequate for the task.
It may also be that the yacc programmer would become exhausted trying to describe all the nuances of the language. Computer languages are far more regular, and parsing them is where yacc shines: in a BASIC interpreter built with yacc, for example, once yacc understood a particular line of BASIC code, it could cause the execution of the equivalent instructions in the native language of the host computer. Some Linux distributions provide a choice of yacc programs. One can install either or both of Berkeley yacc and the GNU bison program. They are quite similar; bison began as a yacc-compatible reimplementation, although there has been some divergence over the years.
The combination of lex, yacc and some programmer's C code provides a complete means to interpret and act upon a user's wishes. The lex program uses its regular expression interpretation capability to recognize strings of characters as words or tokens. Once a token is identified, it is passed to yacc where the various rules are applied until some combination of tokens form a recognizable structure to which yacc applies some pre-programmed C code.
As indicated, lex uses regular expressions to recognize strings of characters as items of interest. Regular expressions are composed of special characters which describe acceptable combinations of characters.
For example, regular expressions often use the . (period) character, which matches any single character. Similarly, the characters [ and ] (square brackets) are used to indicate acceptance of any of the characters enclosed within them, or within the range of characters described between them. For example, the expression [abc] says to accept any of the characters a, b or c; the expression [a-c] says the same thing. A more complicated example might be [a-zA-Z], which says to accept any alphabetic character.
Once a regular expression matches the text stream being interpreted by lex, code created by the programmer is executed. When used with yacc, this code generally amounts to passing an indication of what was just recognized to yacc for further processing. Also as indicated, yacc uses a grammar description to decode the meaning of the tokens that lex passes to it.
As tokens are passed to yacc, it applies its rules until a single token, or some sequence of tokens, becomes a recognizable structure. Before a programmer's C code is executed, though, yacc may require several structures or token-structure combinations to be recognized. For example, using our sample sentence, our rules might look like the following:
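Restated in yacc-like notation, the rules for the sample sentence might be sketched as shown below. This is only the rules section of a sketch, not a complete yacc file; the token names (NOUN, VERB, ADJECTIVE, PREPOSITION) and the placeholder action are hypothetical:

```
sentence:    subject predicate   { printf("sentence recognized\n"); }
           ;
subject:     NOUN
           | noun_phrase
           ;
predicate:   VERB noun_phrase
           ;
noun_phrase: PREPOSITION ADJECTIVE NOUN
           ;
```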
The first rule says that a sentence is made up of two parts: a subject followed by a predicate. If that rule is satisfied, then the C code between the curly brackets will be executed. To satisfy the first rule, yacc has to recognize both a subject and a predicate.
The subsequent rules help yacc to do just that. For example, the second rule says that a subject is recognized when either a noun or a noun phrase is recognized. A noun is the smallest unit that yacc deals with, and in the yacc grammar, a noun is a token that yacc will want to have lex recognize. Thus, somewhere in the yacc program, a token will be defined (probably called NOUN) that lex and yacc will use to communicate the fact that a noun has been interpreted.
How this is done we will see shortly. Notice that a noun phrase is also used to create the predicate. If a verb is recognized and it is followed by a noun phrase, the predicate is identified. If only the noun phrase is identified, then the subject has been identified. The example cited is not in yacc syntax, but is meant to provide understanding.
Very detailed examples may be found in the references. You may be wondering how the yacc parser actually works. Internally, yacc is a finite-state machine that records its current condition as each recognizable item is interpreted. For example, as a noun phrase is being interpreted, it moves from state 3 when it receives a preposition, to state 4 when the adjective is interpreted, and finally to state 5 when the noun is recognized.
When the entire phrase has been recognized, it switches to another state, perhaps 37, to note that fact. Please do not attach any particular meaning to the numbers used in this example.
They have been chosen arbitrarily to demonstrate how yacc progresses as it interprets the tokens received from lex. You should conclude only that to reach state 5 (noun phrase), yacc must progress through several preceding states, each of which might lead to another final state, depending on the grammar yacc is using. In other words, given its current state, yacc requests from lex the next token (if it needs it) and places onto the stack its new state. In doing so, it may push the new state onto the stack (as when interpreting the noun phrase), or pop the old state off the stack, replacing it with a new state (as when the noun phrase is completely recognized).
Resources

From the lex manual: "The table is translated to a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream."

From the yacc manual: "An input language may be as complex as a programming language, or as simple as a sequence of numbers. Unfortunately, usual input facilities are limited, difficult to use, and often are lax about checking their inputs for validity. Yacc provides a general tool for describing the input to a computer program. The Yacc user specifies the structures of his input, together with code to be invoked as each such structure is recognized. Yacc turns such a specification into a subroutine that handles the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user's application handled by this subroutine."

From Flex, A Fast Scanner Generator, by Vern Paxson: "flex is a tool for generating scanners: programs which recognize lexical patterns in text. The description is in the form of pairs of regular expressions and C code, called rules. When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code."