Designing and refining searches


The purpose of this exercise is to give practice in using corpus search tools and in evaluating the results which they produce.

A corpus search tool called CQP is available on the Suns as bncsamp. It is the version distributed with the BNC sampler, so it covers about two million words of British English, written and spoken.

The assignment is to use bncsamp to examine the patterns of occurrence of the words "since", "while" and "although" in the BNC sampler. I'd like a short report describing your findings and, in particular, the way that you arrived at them. Two sides of standard paper should suffice.

The search tool is powerful, but limited in various ways. This is a very common situation in corpus work: the trick is to find ways of getting as close as possible to the queries that you really would prefer to pose. The task is designed to make you want a full syntactic analysis, but only an approximation is available.

Running CQP

For the moment, you have to run bncsamp under X-windows on a Sun. Once you have started bncsamp, typing "Clinton"; at the prompt (the closing semicolon is part of the query) should produce output like the following.

BNC-SAMP> "Clinton";
   333888: reet pundits reckon that  <Clinton> is home which boosted th
   334848: weeping victory for Bill  <Clinton> in the US election hit U
   336007: t World markets think of  <Clinton> . SHARES had a bumpy rid
   336019: yesterday following Bill  <Clinton> 's landslide victory . T
   336075: n Asia following news of  <Clinton> 's win , closed with a m
   336197: n . The Bundesbank , not  <Clinton> , will decide Europe 's 
   336225: rates . ` In many ways ,  <Clinton> 's victory was the least
  1477355: utside world . President  <Clinton> replaced the A T F with 
  1477928: ence ran out . President  <Clinton> authorized the F B I 's 
  1550750: hat 's an <unclear> . Mr  <Clinton> or Mrs Clinton <unclear>
  1550753: ear> . Mr Clinton or Mrs  <Clinton> <unclear> , I 'm not sur
  1562677: t , George Bush and Bill  <Clinton> . It was uninterrupted .
  1735051: cently whereby President  <Clinton> and his wife Hilary and 
  1735090: p the limousine , Hilary  <Clinton> leaped out of the car an
  1735130: nt so well ? <pause> Mrs  <Clinton> replied that she 'd been
BNC-SAMP> 
There's a lot more capability than this: you can show parts of speech by
BNC-SAMP> show +pos;
BNC-SAMP> "Clinton";
   333888: /NN2 reckon/VV0 that/CST  <Clinton/NP1> is/VBZ home/RL which/DDQ
   334848: tory/NN1 for/IF Bill/NP1  <Clinton/NP1> in/II the/AT US/NP1 elec
   336007: kets/NN2 think/VV0 of/IO  <Clinton/NP1> ./YSTP SHARES/NN2 had/VH
   336019: RT following/II Bill/NP1  <Clinton/NP1> 's/GE landslide/NN1 vict
   336075: lowing/II news/NN1 of/IO  <Clinton/NP1> 's/GE win/NN1 ,/YCOM clo
   336197: esbank/NP1 ,/YCOM not/XX  <Clinton/NP1> ,/YCOM will/VM decide/VV
   336225: many/DA2 ways/NN2 ,/YCOM  <Clinton/NP1> 's/GE victory/NN1 was/VB
  1477355: NN1 ./YSTP President/NNB  <Clinton/NP1> replaced/VVD the/AT A/ZZ
  1477928: /RP ./YSTP President/NNB  <Clinton/NP1> authorized/VVD the/AT F/
  1550750: /__UNDEF__ ./YSTP Mr/NNB  <Clinton/NP1> or/CC Mrs/NNB Clinton/NP
  1550753: linton/NP1 or/CC Mrs/NNB  <Clinton/NP1> <unclear>/__UNDEF__ ,/YC
  1562677: Bush/NP1 and/CC Bill/NP1  <Clinton/NP1> ./YSTP It/PPH1 was/VBDZ 
  1735051: hereby/RRQ President/NNB  <Clinton/NP1> and/CC his/APPGE wife/NN
  1735090: ne/NN1 ,/YCOM Hilary/NP1  <Clinton/NP1> leaped/VVD out of/II the
  1735130: pause>/__UNDEF__ Mrs/NNB  <Clinton/NP1> replied/VVD that/CST she
BNC-SAMP> 
search for parts of speech:
BNC-SAMP> [pos="NNB"];
       14: /JJ President/NN1 ,/YCOM  <Mr/NNB> Rene/NP1 Muawad/NP1 ,/YC
       48: STP Supporters/NN2 of/IO  <General/NNB> Michel/NP1 Aoun/NP1 ,/YC
       78: N1 which/DDQ brought/VVD  <Mr/NNB> Muawad/NP1 to/II power/N
      107: BN let/VVN down/RP by/II  <President/NNB> Franois/NP1 Mitterrand/N
      111: P1 Mitterrand/NP1 ./YSTP  <Gen/NNB> Aoun/NP1 rejected/VVD th
      138: om/II Lebanon/NP1 ./YSTP  <Mr/NNB> Muawad/NP1 ,/YCOM a/AT1 
      174: TP Analysts/NN2 said/VVD  <Mr/NNB> Muawad/NP1 faced/VVD the
      192:  14/MC years/NNT2 ./YSTP  <Mr/NNB> Muawad/NP1 has/VHZ said/
      209: ed/VVN to/TO include/VVI  <Mr/NNB> Selim/NP1 el-Hoss/NP1 ,/
      220: administration/NN1 to/II  <Gen/NNB> Aoun/NP1 ./YSTP The/AT a
      245: /NN2 who/PNQS defied/VVD  <Gen/NNB> Aoun/NP1 's/GE attempts/
      257: R Christians/NN2 like/II  <Mr/NNB> Muawad/NP1 ./YSTP Lebano
      269: P1 is/VBZ held/VVN by/II  <Gen/NNB> Aoun/NP1 ,/YCOM so/RR Mr
      273: NB Aoun/NP1 ,/YCOM so/RR  <Mr/NNB> Muawad/NP1 was/VBDZ expe
      286: st/NP1 Beirut/NP1 ./YSTP  <Mr/NNB> Muawad/NP1 was/VBDZ quot
      629: 1 Information/NN1 ,/YCOM  <Mr/NNB> Steve/NP1 Kosiak/NP1 ./Y
      985: 1 ./YSTP According to/II  <Mr/NNB> Dieter/NP1 Brauninger/NP
     1066: 1 ,/YCOM "/YQUO said/VVD  <Mr/NNB> Brauninger/NP1 ./YSTP Ec
     1137: ur/NN1 office/NN1 ,/YCOM  <Mr/NNB> Egon/NP1 Franke/NP1 ,/YC
     1440: market/NN1 ,/YCOM "/YQUO  <Mr/NNB> Brauninger/NP1 explained
     1517: r/NP1 in/II Brussels/NP1  <SIR/NNB> Leon/NP1 Brittan/NP1 ,/Y
BNC-SAMP> 
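and do much more complex searches such as the following, which finds each "while" tagged as a subordinating conjunction (CS) that is followed, after one to four intervening tokens of any kind, by a present participle (VVG). The participle therefore lies two to five tokens after "while"; a participle immediately after "while" is not matched by this query.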
BNC-SAMP> [word="while" & pos="CS"] []{1,4} [pos="VVG"];
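
To make the token arithmetic of that query concrete, here is a minimal sketch in Python that imitates the pattern over a list of (word, tag) pairs. It is not part of bncsamp: the function name, the sample sentences and their tags are invented for illustration, and the sketch reports only the nearest participle for each "while", which may not be how the real tool treats several candidates. It also compares word forms exactly, so a sentence-initial "While" is not counted; whether bncsamp treats case the same way is worth checking.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, part-of-speech tag)


def find_while_participles(tokens: List[Token],
                           min_gap: int = 1,
                           max_gap: int = 4) -> List[Tuple[int, int]]:
    """Imitate [word="while" & pos="CS"] []{min_gap,max_gap} [pos="VVG"].

    Returns (index of "while", index of participle) pairs, keeping only
    the nearest qualifying participle for each "while".
    """
    hits = []
    for i, (word, pos) in enumerate(tokens):
        if word == "while" and pos == "CS":
            # gap = number of arbitrary tokens between "while" and the participle
            for gap in range(min_gap, max_gap + 1):
                j = i + 1 + gap
                if j < len(tokens) and tokens[j][1] == "VVG":
                    hits.append((i, j))
                    break
    return hits


# Invented sample; the tags are illustrative, not taken from the corpus.
sample = [
    ("She", "PPHS1"), ("sang", "VVD"), ("while", "CS"), ("walking", "VVG"),
    ("home", "RL"), (".", "YSTP"),
    ("He", "PPHS1"), ("slept", "VVD"), ("while", "CS"), ("she", "PPHS1"),
    ("was", "VBDZ"), ("working", "VVG"), (".", "YSTP"),
]

print(find_while_participles(sample))             # gap of 1 to 4 tokens
print(find_while_participles(sample, min_gap=0))  # gap of 0 to 4 tokens
```

With the default gap, "while walking" is missed because no token intervenes, whereas "while she was working" is found. Allowing a gap of zero picks up both, which suggests that widening the range in the query itself would be the way to catch adjacent participles.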

The following documentation is available. You'll need to consult it in order to do the assignment. There are a lot of facilities, many of them very useful.

You may find that some of the things described in the manuals don't work. Don't panic: this is research software, and the installed version differs slightly from the one the manuals describe.
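
When evaluating results, it can help to count how the tagger labelled each hit, for instance how often "since" comes out as a conjunction rather than something else. The following is a rough sketch in Python, assuming you have copied some lines of show +pos output into a list of strings. The function name and the regular expression are my own, not something bncsamp provides, and the expression relies on the keyword appearing as <word/TAG> as in the examples above.

```python
import re
from collections import Counter

# Matches the highlighted keyword in `show +pos` output, e.g. "<Clinton/NP1>".
# Markup such as "<unclear>/__UNDEF__" is skipped, because there the slash
# comes after the closing angle bracket rather than before it.
KEYWORD = re.compile(r"<([^<>]+)/([^/<>\s]+)>")


def tally_keyword_tags(lines):
    """Count (word, tag) pairs for the highlighted keyword on each line."""
    counts = Counter()
    for line in lines:
        m = KEYWORD.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


# Lines copied from the concordances shown earlier in this handout.
sample = [
    "   336019: RT following/II Bill/NP1  <Clinton/NP1> 's/GE landslide/NN1 vict",
    "  1550753: linton/NP1 or/CC Mrs/NNB  <Clinton/NP1> <unclear>/__UNDEF__ ,/YC",
    "      107: BN let/VVN down/RP by/II  <President/NNB> Franois/NP1 Mitterrand/N",
]

for (word, tag), n in tally_keyword_tags(sample).items():
    print(word, tag, n)
```

Only the keyword is counted. The context words are cut off at the edges of each line, so their tags are not reliable enough to tally in the same way.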