CLAN programs

 

The CHILDES database comes with the CLAN programs, written by Leonid Spektor at Carnegie Mellon University. The CLAN programs are tools for analyzing the data. Below you will find a short explanation of the CLAN programs. For more information you may consult the CHILDES homepage, Brian MacWhinney's handbook and the volume by Sokolov & Snow (1994). You will also find instructions for the CLAN programs and some exercises on Lynn Eubank's website (University of North Texas, Denton).

References:

 

NOTE: I have made two tutorials that you can download (PDF files): an introductory tutorial and an advanced tutorial.

How to use CLAN

In order to run CLAN, you have to install it on your PC; see the CHILDES website. When you work in Windows on one of the computers in the Lab (Acquisition laboratory, Achter de Dom 24), you will find CLAN under the programs. When used as an analysis tool, CLAN allows the user to perform automatic analyses on transcript data, and it functions best when used with files formatted according to the CHAT system. In this mode, CLAN provides a command window in which you issue commands to run analyses. The output of the analyses then appears in another window, the "output window". You access the command and output windows by double-clicking on the CLAN icon.

 

The Command Window

The Command window looks as follows:

(Source: URL: www.cogsci.ed.ac.uk/~amyi/mate/childes.html by Beppe Capelli and Amy Isard)

 

 

The Output Window

The output window gives the results of the last CLAN command.

 

 

A CLAN search command includes several components:  

 

Input (working) and output directory

The input and output directories are set as follows:

By default CLAN gives the output on the screen. With the option +f or the 'redirect' symbol > you can change this; see Parameter Switches below. You can also specify the working file and the output file precisely when you enter the command line; for this, see the instructions in Exercises.

 

The Search functions

The CLAN command line specifies a number of search functions: the command, the search file(s) and the parameter switch(es). The command is specified only once. The search files and switches may be specified more than once.

You do not need to worry about the order in which the options appear. In fact, the only ordering rule for CLAN commands is that the command name must come first. After that, you can put the file names and switches in any order you wish. Do not forget to put a space between options.
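The ordering rule can be illustrated with a small Python sketch. Python is not part of CLAN, and split_clan_command is a hypothetical helper written only for this illustration; it assumes that no option contains a space. It shows that two command lines with the options in a different order consist of the same parts.

```python
def split_clan_command(line):
    """Split a CLAN-style command line into command, switches and files."""
    tokens = line.split()      # options are separated by spaces
    command = tokens[0]        # the command name must come first
    # Switches start with + or -; everything else is taken to be a file name.
    switches = sorted(t for t in tokens[1:] if t[0] in "+-")
    files = [t for t in tokens[1:] if t[0] not in "+-"]
    return command, switches, files


first = split_clan_command('kwal +s"where" +t*CHI adam*.cha +u')
second = split_clan_command('kwal adam*.cha +u +t*CHI +s"where"')

print(first)
print(first == second)  # the order of files and switches makes no difference
```

The switches are sorted only so that the two results can be compared directly; CLAN itself does not require any sorting.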

By default CLAN produces separate output for each working/search file. With the +u switch you can combine the results, and with the option +f or the 'redirect' symbol > you can send them to a file. See Parameter Switches below.

 

1. The Command

CLAN has been designed to perform automatic analyses on transcript data. The most frequently used options are: frequency counts, word/morpheme searches, combined searches for 2 or more words/morphemes, and MLU counts (mean length of utterance). Usually, you only have to deal with these options. You will find a survey of the other CLAN commands by clicking on the CLAN icon in the Command window.

 

Command   Function

freq      Searches for frequencies of items (as a result you will get a list of the items with their frequencies)

kwal      Searches for key words (as a result you will get the list with the utterances that contain the key word)

combo     Searches for a combination of items combined with Boolean operators like "and" or "or" (as a result you will get the list of the utterances that contain the (combination of the) items)

mlu       Calculates the mean length of utterance
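To give an idea of what a mean length of utterance is, here is a rough Python sketch. It is not how the mlu command is implemented: CLAN has its own rules about what is counted, whereas this sketch simply counts the words on the main tier of one speaker. The three transcript lines are made-up sample data, and mean_length_of_utterance is a hypothetical name.

```python
# Made-up sample lines in a CHAT-like layout: "*SPEAKER:<tab>utterance".
transcript = [
    "*CHI:\tmore cookie .",
    "*MOT:\tyou want another cookie ?",
    "*CHI:\twhere doggie go ?",
]

PUNCTUATION = {".", "?", "!"}


def mean_length_of_utterance(lines, speaker):
    """Average number of words per utterance for one speaker."""
    lengths = []
    for line in lines:
        if line.startswith("*" + speaker + ":"):
            words = line.split(":", 1)[1].split()
            lengths.append(len([w for w in words if w not in PUNCTUATION]))
    if not lengths:
        return 0.0  # the speaker has no utterances
    return sum(lengths) / len(lengths)


print(mean_length_of_utterance(transcript, "CHI"))  # (2 + 3) / 2 = 2.5
```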

 

2. The Parameter Switches

By adding parameter switches to the command line, the search action can be narrowed or extended. The most frequently used parameter switches are the following:

 

Switch: +t
Function: selects the utterances of a specified speaker (the one named after the switch)
Example: +t*CHI

Switch: +s
Function: selects a word to be searched for (search)
Example: +s"where" (searches for: where)

Switch: +w -w
Function: gives extra utterances in the context of the searched item (window)
Example: +w4 -w2 (gives 4 utterances after and 2 before the utterance in which the item appears)

Switch: +u
Function: specifies that all search results are stored in 1 file

Switch: +f -f
Function: +f: the output is stored in the (specified) file(s); -f: the output appears on the screen

Switch: +d
Function: used with 'kwal' this option puts the output in CHAT format, suitable for further search actions

 

 

 

 

 

The parameter switches have in general a + (include) and a - (exclude) option.

Details of the Switches:

 

+t

This option allows you to include (+t) or exclude (-t) particular tiers. In CHAT formatted files, there exist three tier code types: main speaker tiers (denoted by *), speaker dependent tiers (denoted by %), and header tiers (denoted by @). The speaker-dependent tiers are attached to speaker tiers. If, for example, you request to analyze the speaker *MOT and all the %cod dependent tiers, the programs will analyze all of the *MOT main tiers and only the %cod dependent tiers associated with that speaker. The +t option allows you to specify which main speaker tiers, their dependent tiers, and header tiers should be included in the analysis. All other tiers, found in the given file, will be ignored by the program.
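The way dependent tiers follow their speaker can be sketched in Python. This is only an illustration of the selection logic described above, not CLAN code; select_tiers is a hypothetical helper, the sample lines are made up, and header tiers are always skipped here for simplicity.

```python
# Made-up sample lines: @ = header tier, * = main tier, % = dependent tier.
lines = [
    "@Begin",
    "*MOT:\twhere is the ball ?",
    "%cod:\t$QUE",
    "*CHI:\tball .",
    "%cod:\t$NAM",
    "@End",
]


def select_tiers(lines, speaker, dependent=None):
    """Keep the main tiers of one speaker and, optionally, one type of
    dependent tier attached to those main tiers."""
    kept = []
    in_selected_speaker = False
    for line in lines:
        if line.startswith("*"):
            # A new main tier starts; remember whether it is the wanted speaker.
            in_selected_speaker = line.startswith("*" + speaker + ":")
            if in_selected_speaker:
                kept.append(line)
        elif line.startswith("%") and in_selected_speaker:
            if dependent and line.startswith("%" + dependent + ":"):
                kept.append(line)
    return kept


print(select_tiers(lines, "MOT", "cod"))
```

Only the %cod tier that belongs to *MOT is kept; the %cod tier under *CHI is ignored, just as in the example of the speaker *MOT above.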

 

+s

The +s/-s switch is usually used to include or exclude certain words. The +s option allows you to specify the keyword you desire to find. You do this by putting the word in quotes directly after the +s switch, as in +s"dog" to search for the word "dog." You can also use the +s switch to specify a file containing words to be searched. You do this by putting the file name after the +s preceded by the @ sign, as in +s@adverbs, which will search for the words in a file called adverbs.cut. If you want to look for the literal character @, you need to precede it with a backslash as in +s"\@". It is possible to specify as many +s options on the command line as you like. Use of the +s option will override the default list.

 

+w

This option can be used with either KWAL or COMBO. These programs are used to display tiers that contain keywords or regular expressions as chosen by the user. By default, KWAL and COMBO combine the user-chosen main and dependent tiers into "clusters." Each cluster includes the main tier and its dependent tiers. (See the +u option for further information on clusters.) The -w option followed by a positive integer causes the program to display that number of clusters before each cluster of interest. The +w option followed by a positive integer causes the program to display that number of clusters after each cluster of interest.
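The effect of a context window can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the way KWAL or COMBO work internally: each cluster is reduced to a single main tier, the sample data are made up, and with_context is a hypothetical helper.

```python
# Made-up clusters, each consisting of a main tier only.
clusters = [
    "*CHI:\twant juice .",
    "*MOT:\twhere is it ?",
    "*CHI:\tthere .",
    "*MOT:\tgood .",
    "*CHI:\tmore .",
]


def with_context(clusters, keyword, before=0, after=0):
    """For every cluster containing the keyword, return that cluster together
    with `before` clusters preceding it and `after` clusters following it."""
    results = []
    for i, cluster in enumerate(clusters):
        words = cluster.split(":", 1)[1].split()
        if keyword in words:
            start = max(0, i - before)  # do not run past the start of the file
            results.append(clusters[start:i + after + 1])
    return results


# Comparable in spirit to: -w1 +w2 with the keyword "where"
for hit in with_context(clusters, "where", before=1, after=2):
    print(hit)
```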

 

+u

By default, when the user has specified a series of files on the command line, the analysis is performed on each individual file. The program then provides separate output for each data file. If the command line uses the +u option, the program combines the data found in all the specified files into one set and outputs the result for that set as a whole.
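The difference between separate and combined output can be sketched in Python. The file names and utterances are made up, count_words is a hypothetical helper, and the sketch only mimics the idea of merging results; it does not reproduce the output of any CLAN program.

```python
from collections import Counter

# Made-up data standing in for two transcript files.
files = {
    "eve01.cha": ["*CHI:\tmore juice .", "*CHI:\tmore ."],
    "eve02.cha": ["*CHI:\tjuice gone ."],
}

PUNCTUATION = {".", "?", "!"}


def count_words(lines):
    """Count the words on the given main tiers."""
    counts = Counter()
    for line in lines:
        for word in line.split(":", 1)[1].split():
            if word not in PUNCTUATION:
                counts[word] += 1
    return counts


# Default behaviour: one result per file.
per_file = {name: count_words(lines) for name, lines in files.items()}

# With merging (the idea behind +u): one result for all files together.
merged = sum(per_file.values(), Counter())

print(per_file["eve01.cha"]["juice"], per_file["eve02.cha"]["juice"])
print(merged["juice"])
```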

 

+f

This option allows you to send output to a file rather than to the screen. By default, nearly all of the programs send the results of the analyses directly to the screen. You can, however, request that your results be inserted into a file. This is accomplished by inserting the +f option into the command line. The advantage of sending the program’s results to a file is that you can go over the analysis more carefully, because you have a file to which you can later refer.

The -f switch is used for sending output to the screen. For most programs, -f is the default and you do not need to enter it. You only need to use the -f switch when you want the output to go to the screen for CHSTRING, FLO, and SALTIN. The advantage of sending the analysis to the screen (also called standard output) is that the results are immediate and your directory is less cluttered with nonessential files. This is ideal for quick temporary analysis.

 

+d

Normally, KWAL outputs the location of the tier where the match occurs. When the +d switch is turned on you can output each matched sentence without line number information in a simple legal CHAT format. The +d1 switch outputs legal CHAT format along with file names and line numbers. The +d and +d1 switches can be extremely important tools for performing analyses on particular subsets of a text.

 

 

Examples

1) A search command with kwal may look like this:

kwal +s"where" +s"how" +t*CHI adam*.cha -w1 +w2 +u +f 

Searches for the words where and how in the utterances of CHI (child = Adam). It does so in all CHAT files of Adam (in the directory specified under Working in the Command window). It gives all utterances in which these words appear, each time with 1 utterance before and 2 after. It stores the output in 1 big file (in the directory specified under Output in the Command window). This file automatically receives the extension .kwa

If you want to give the output file another extension, you can do so by adding a three-letter extension to the +f switch, e.g. +flst. The output is then stored in a file with the extension .lst. If you want to specify the name of the output file yourself, you can use the 'redirect' symbol >, for instance:

kwal +s"where" +s"how" +t*CHI adam*.cha -w1 +w2 +u > dem-lst.txt

With this option the output is stored in a file named dem-lst.txt 

 

2) A search command with freq may look like this:

freq +s"when" +t*CHI eve*.cha 

Gives the frequency of the word when in the utterances of CHI (child = Eve). It does so in all CHAT files of Eve (in the directory given under Working in the Command window). It shows the output in the Output window.

The use of the asterisk (*) is explained below.

 

3) An example with combo:

combo +s"what"^*^"do" +t*CHI +t*MOT nina*.cha +f

Searches for the word what followed by do, either immediately or further on in the sentence, in the utterances of CHI (child = Nina) and MOT (Nina's mother). It does so in all CHAT files of Nina (in the directory specified under Working in the Command window). It gives all utterances in which these words appear. It stores the output in separate files (in the directory specified under Output in the Command window).
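The search pattern of this example (one word, followed immediately or later in the same utterance by another word) can be sketched in Python. This only illustrates the matching idea; it is not COMBO's own pattern matcher, first_then_second is a hypothetical helper, and the test utterances are made up.

```python
def first_then_second(utterance, first="what", second="do"):
    """True if `first` occurs and `second` occurs somewhere after it."""
    words = utterance.lower().split()
    if first not in words:
        return False
    # Look for `second` only in the part that follows the first `first`.
    return second in words[words.index(first) + 1:]


print(first_then_second("what do you want ?"))   # do directly after what
print(first_then_second("what did you do ?"))    # do further in the sentence
print(first_then_second("do you know what ?"))   # do only before what
```

The first two utterances match and the third does not, because in the third the word do appears only before what.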

 

Useful aids for searching

Finally, CLAN offers two possibilities that facilitate searching:

 

1. The Wildcard (asterisk *)

A wildcard uses the asterisk symbol (*) to take the place of something else. Wildcards can be used to refer to a group of files (*.cha), a group of speakers (CH*), or a group of words with a common form.

For instance:

 

eve10.cha    searches in 1 file (eve10.cha)

eve*.cha     searches in all (cha) files of Eve

+s"go"       searches for go

+s"go*"      searches for all words that begin with go: go, goes, goed (child language), going, gone, gold, golden, good, etc.

+s"*go*"     searches for all words that contain go, so in addition to the ones above: ongoing, outgo, outgoing, etc.
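If you want to check which words or file names a pattern with an asterisk would cover, you can try it out in Python. The sketch below uses Python's standard fnmatch module as a stand-in; it is an assumption that this behaves like CLAN's asterisk for simple patterns such as these (fnmatch also gives a special meaning to ? and [ ], which CLAN may treat differently). The word list is made up.

```python
from fnmatch import fnmatchcase

# Made-up word list for trying out the patterns.
words = ["go", "goes", "going", "gold", "ongoing", "dog"]

matches = {
    pattern: [w for w in words if fnmatchcase(w, pattern)]
    for pattern in ["go", "go*", "*go*"]
}

for pattern, found in matches.items():
    print(pattern, found)

# The same idea applies to file names.
print(fnmatchcase("eve10.cha", "eve*.cha"))
```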

 

 

2. Searching in a list of words

You can create a file containing the list of words that you want to search for. You can then let the command search for the words in the file (this saves you typing a series of +s switches).

For instance:

kwal +s@A:\wh-words.lst

Searches for the words stored in the file A:\wh-words.lst
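The idea of searching with a word list can be sketched in Python. The sketch assumes that the list file contains one word per line, which is an assumption made for this illustration; the helpers load_word_list and utterances_with_any are hypothetical, the file is created in a temporary folder, and the utterances are made up.

```python
import os
import tempfile


def load_word_list(path):
    """Read a word list, assuming one word per line."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def utterances_with_any(lines, wanted):
    """Return the lines that contain at least one of the wanted words."""
    return [line for line in lines if wanted & set(line.lower().split())]


# Create a small word list file so that the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "wh-words.lst")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("where\nhow\nwhat\n")
    wanted = load_word_list(path)

lines = ["*CHI:\twhere doggie ?", "*MOT:\tover there .", "*CHI:\thow ?"]
hits = utterances_with_any(lines, wanted)
print(hits)
```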