The Spoken BNC2014 is now available!

On behalf of Lancaster University and Cambridge University Press, it gives us great pleasure to announce the public release of the Spoken British National Corpus 2014 (Spoken BNC2014).

The Spoken BNC2014 contains 11.5 million words of transcribed informal British English conversation, recorded by (mainly English) speakers between the years 2012 and 2016. The situational context of the recordings – casual conversation among friends and family members – is designed to make the corpus broadly comparable to the demographically-sampled component of the original spoken British National Corpus.

The Spoken BNC2014 is now accessible online in full, free of charge, for research and teaching purposes. To access the corpus, you should first create a free account on Lancaster University’s CQPweb server (https://cqpweb.lancs.ac.uk/) if you do not already have one. Once registered, please visit the BNC2014 website (http://corpora.lancs.ac.uk/bnc2014) to (a) sign the corpus’ end-user licence and (b) register your CQPweb account – following the instructions on the site. When you return to CQPweb, you will have access to the Spoken BNC2014 via the link that appears in the list of ‘Present-day English’ corpora. While access is initially only via the CQPweb platform, the underlying corpus XML files and associated metadata will be available for download in Autumn 2018.

The BNC2014 website also contains lots of useful information about the corpus, and in particular a downloadable manual and reference guide, which will be available soon. Further information, as well as the first research articles to use Spoken BNC2014 data, will be available in two in-press publications associated with the project: a special issue of the International Journal of Corpus Linguistics (due next month) and an edited collection in the Routledge ‘Advances in Corpus Linguistics’ series (due early 2018).

The BNC2014 does not end here – we are currently transcribing materials provided to us by the British Library, which will form a substantial supplement to the corpus – find out more about that here: http://cass.lancs.ac.uk/?p=2241. For now, we will be waiting and watching with interest to see what work the corpus released today stimulates. As ever with corpus data, it does not enable all questions to be answered, but it does allow a very wide range of questions to be investigated.

The Spoken BNC2014 research team would like to express our gratitude to all who have had a hand in the creation of the corpus, and hope that you enjoy exploring the data. We are, of course, keen to hear your feedback about the corpus; this, as well as any questions, can be directed to Robbie Love (r.m.love@lancaster.ac.uk) or Andrew Hardie (a.hardie@lancaster.ac.uk).

Source: http://cass.lancs.ac.uk/?p=2378

Not just a linguistic resource but a unique record of humanity

ESRC blog

Robbie Love is a PhD student at the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University, where he spent four years working on the Spoken British National Corpus 2014 project.

Harry Strawson is a writer living in London and contributed recordings to the Spoken British National Corpus 2014.

Here Robbie and Harry share two different perspectives on the Spoken British National Corpus project ahead of its release next week.

Every day billions of words are uttered in hundreds of languages all over the world. For corpus linguists, that is, people who study the form, use and function of language using specialised computer software, speech is like the golden snitch in a game of Quidditch. It appears to be everywhere around you and yet it is incredibly difficult to capture.

Dispatch from YLMP2014

In April I had the pleasure of travelling to Poland to attend the Young Linguists’ Meeting in Poznań (YLMP), a congress for young linguists who are interested in interdisciplinary research and stepping beyond the realm of traditional linguistic study. Hosted over three days by the Faculty of English at Adam Mickiewicz University, the congress featured over 100 talks by linguists young and old, including plenary lectures by Lancaster’s very own Paul Baker and Jane Sunderland. I was one of three Lancaster students to attend the congress, along with undergraduate Agnes Szafranski and fellow MA student Charis Yang Zhang.

From left to right: me, Jane Sunderland, Charis Zhang and Paul Baker

What struck me about the congress, aside from the warm hospitality of the organizers, was the sheer breadth of topics that were covered over the weekend. All of the presenters were more than qualified to describe their work as linguistics, but perhaps for the first time I saw within just how many domains such a discipline can be applied. At least four sessions ran in parallel at any given time, and themes ranged from gender and sexuality to EFL and even psycholinguistics. There were optional workshops as well as six plenary talks. On the second day of the conference, as part of the language and society stream, I presented a corpus-assisted critical discourse analysis of the UK national press reporting of the immediate aftermath of the May 2013 murder of soldier Lee Rigby. I was happy to have a lively and engaged audience who had some really interesting questions for me at the end, and I enjoyed the conversations that followed this at the reception in the evening!

What was most encouraging about the congress was the drive and enthusiasm shared by all of the ‘young linguists’ in attendance. I now feel part of a generation of young minds who are hungry to improve not only our own work but hopefully, in time, the field(s) of linguistics as a whole. After my fantastic experience at the Boya Forum at Beijing Foreign Studies University last autumn, I was happy to spend time again celebrating the work of undergraduate and postgraduate students, and early-career linguists. There was a willingness to listen, to share ideas, and to (constructively) criticise where appropriate, and as a result I left Poznań feeling very optimistic about the future of linguistic study. I look forward to returning to the next edition of YLMP, because from what I saw at this one, there is a new generation of linguists eager to push the investigation of language to the next level.

The Twitter reaction to the sentencing of the Lee Rigby murderers – 26th February 2014

by Robbie Love, Tony McEnery & Stephen Wattam

Introduction

The ESRC-funded Centre for Corpus Approaches to Social Science (CASS) at Lancaster University has undertaken some preliminary research into the immediate reaction on Twitter to the sentencing of the Lee Rigby murderers on Wednesday 26th February 2014. This document summarises our findings.

Background

On the afternoon of Wednesday 22nd May 2013, British soldier Lee Rigby was murdered by two men, Michael Adebolajo and Michael Adebowale, near the Royal Artillery Barracks in Woolwich, London. The attack, which was carried out in broad daylight, quickly became a major national news story. In December 2013 the perpetrators were found guilty of murder and were sentenced on Wednesday 26th February 2014. Adebolajo received a whole-life sentence (meaning he will never be released) and Adebowale received a life sentence with a minimum term of 45 years imprisonment.

How the research was carried out

We carried out our research by using the Twitter API to collect a large number of tweets[1] that referred in some way to the Rigby case between 00.00 and 23.59 on Wednesday 26th February 2014. All tweets containing one or more of the following terms were included in our search:

rigby, adebolajo, adebowale, woolwich trial, woolwich sentence, woolwich sentencing, justice Sweeney, #leerigby, #rigbytrial, #rigbysentence, #woolwich, #woolwichmurder, #woolwichattack, #woolwichtrial

Using these search terms we collected a total of 57,097 tweets over the 24-hour period, including retweets (RTs), quotes etc. This amounted to a total of 1,109,136 words of Twitter discussion about the case. We then used a set of tools and methods developed in corpus linguistics to find out how Twitter users discussed the sentencing on the day of the decision.
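As a rough illustration of this collection step, the sketch below filters tweet texts against the search terms listed above and counts the tweets and words that remain. It is not the code used for the study: the function names, the case-insensitive substring matching and the whitespace-based word count are all assumptions, the sample tweets are invented, and the Twitter API applies its own matching rules.

```python
# Search terms as listed in the post, lower-cased for case-insensitive matching.
SEARCH_TERMS = [
    "rigby", "adebolajo", "adebowale", "woolwich trial", "woolwich sentence",
    "woolwich sentencing", "justice sweeney", "#leerigby", "#rigbytrial",
    "#rigbysentence", "#woolwich", "#woolwichmurder", "#woolwichattack",
    "#woolwichtrial",
]


def matches_search(text, terms=SEARCH_TERMS):
    """Return True if the tweet text contains at least one search term."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def corpus_size(tweets):
    """Return (number of matching tweets, number of whitespace-separated words)."""
    kept = [t for t in tweets if matches_search(t)]
    n_words = sum(len(t.split()) for t in kept)
    return len(kept), n_words


# Invented sample tweets, for illustration only.
sample = [
    "RT @bbcbreaking: Lee Rigby killers sentenced",
    "Lovely weather in Lancaster today",
    "Thoughts on the #woolwichtrial verdict",
]
print(corpus_size(sample))
```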

Findings

The following is a selection of preliminary findings based on the analysis of the tweets.

  • About three fifths of the tweets were retweets[2]

Nearly 35,000 tweets (60.1% of tweets) included the retweet abbreviation RT. This suggests that much of the Twitter discussion of the Lee Rigby case consisted of material retweeted and shared by Twitter users. The top ten most frequently retweeted Twitter handles appear to have been:

Rank  Handle            Description
1     @bbcbreaking      Breaking news account for BBC News
2     @skymarkwhite     Home Affairs Correspondent for Sky News
3     @skynewsbreak     Breaking news account for Sky News
4     @poppypride1      An “independent account supporting all troop charities”
5     @jakeleonardx     Young footballer at Crewe Alexandra Academy
6     @itvnews          Main account for ITV News
7     @courtnewsuk      News reports account for the Old Bailey
8     @thesunnewspaper  Main account for The Sun newspaper
9     @bbcnews          Main account for BBC News
10    @unnamedinsider   Satirical commentator

Based on these, it seems that the most popular form of Twitter interaction relating to the Rigby sentencing was to retweet news updates from well-known news providers, including BBC News, Sky News, ITV News and The Sun. @jakeleonardx is not a celebrity (he has fewer than 1,000 followers), but when he tweeted a photo of Lee Rigby’s son with the caption “Poor little lad, RIP Lee Rigby”, it was retweeted nearly 1,000 times. @unnamedinsider appears to be better known (with over 34,000 followers), and posted two tweets ridiculing the BNP and EDL protesters who had gathered outside the Old Bailey for the sentencing.
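Retweet figures of this kind can be approximated with a simple pattern match. The following is a minimal sketch, assuming that a retweet is any tweet containing the token RT followed by a handle; the regular expression, the function name and the sample tweets are illustrative and are not taken from the study.

```python
import re
from collections import Counter

# Assumption: a retweet is marked by "RT" followed by "@handle".
RT_PATTERN = re.compile(r"\bRT\s+@(\w+)")


def retweet_stats(tweets):
    """Return (retweet count, retweet share in %, Counter of retweeted handles)."""
    handles = Counter()
    n_retweets = 0
    for text in tweets:
        found = RT_PATTERN.findall(text)
        if found:
            n_retweets += 1
            # Handles are case-insensitive on Twitter, so normalise to lower case.
            handles.update("@" + h.lower() for h in found)
    share = 100 * n_retweets / len(tweets) if tweets else 0.0
    return n_retweets, share, handles


# Invented sample tweets, for illustration only.
sample = [
    "RT @BBCBreaking: sentence passed",
    "RT @bbcbreaking: whole-life term",
    "RT @itvnews: update",
    "my own thoughts",
    "another original tweet",
]
n_retweets, share, handles = retweet_stats(sample)
print(n_retweets, share, handles.most_common(3))
```

As footnote [2] notes, a pattern like this misses retweets that do not contain the letters ‘RT’, so the share it reports is best read as a lower bound.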

  • The most salient word (apart from names and Twitter terms) was life

Twitter users were very concerned with the nature of the sentence handed down, using the word ‘life’ 19,498 times (34.1% of tweets). The most common three-word phrase containing it was life in prison (4,369 times, 7.7% of tweets), suggesting that Twitter users mostly used the word to refer to the perpetrators’ imprisonment rather than to the loss of life.
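Counts of three-word phrases around a given word can be produced by sliding a window over each tweet. The sketch below shows one way to do this; the tokenisation rule, the function names and the sample tweets are assumptions made for illustration, and dedicated corpus tools handle tokenisation far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def ngrams_with(tweets, target, n=3):
    """Count every n-word sequence that contains the target word."""
    counts = Counter()
    for text in tweets:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if target in gram:
                counts[" ".join(gram)] += 1
    return counts


# Invented sample tweets, for illustration only.
sample = [
    "Adebolajo gets life in prison",
    "Life in prison is not enough",
    "whole life term for killer",
]
print(ngrams_with(sample, "life").most_common(3))
```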

  • Some Twitter users wanted more than whole-life terms for the perpetrators

As well as whole-life terms, Twitter users strongly expressed their opinion about other punishments they deemed suitable for the perpetrators. In particular, highly salient words like rot, deserve, should and hang indicate this. The most popular three-word expression relating to such desired punishments is rot in hell. Furthermore, the word deserve occurred 1,295 times (2.3% of tweets), an indication of a clear evaluation of the sanction proposed: popular four-word phrases containing deserve included deserve a life sentence, deserve to be hung, and deserve the death penalty. Likewise, the word should was used almost exclusively to wish death upon the perpetrators of the murder, while hang relates to the most popular way in which Twitter users wanted capital punishment to be carried out on the killers.
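The post does not say how salience was calculated. One measure widely used in corpus linguistics for this purpose is log-likelihood keyness, which compares a word's frequency in the study corpus with its frequency in a reference corpus. The sketch below implements that statistic as an assumption about the kind of calculation involved; the reference-corpus figures in the usage line are hypothetical and are not results from the study.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of a word, given its frequency in two corpora."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# 'deserve': 1,295 occurrences in 1,109,136 words (figures from the post),
# compared with a HYPOTHETICAL reference corpus of 1,000,000 words
# containing 50 occurrences.
print(log_likelihood(1295, 1109136, 50, 1000000))
```

A higher score means the word is more unusually frequent (or infrequent) in the study corpus relative to the reference corpus; a word with the same relative frequency in both scores zero.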

  • Michael Adebolajo was discussed more than Michael Adebowale

The surname ‘Adebolajo’ was tweeted 15,092 times (26.4% of tweets), whereas ‘Adebowale’ was tweeted only 11,729 times (20.5% of tweets). This indicates that the perpetrator who received the whole-life sentence was of more concern to tweeters than the perpetrator who received the less severe punishment.

  • The most salient word used to describe Adebolajo and Adebowale was scum, and the most salient swear word was cunts

Twitter’s word of choice for the perpetrators was scum, which occurred 1,466 times (2.6% of tweets). Popular phrases included ‘the scum’, ‘this scum’, ‘two scum’, ‘them scum’ and ‘those scum’, and popular words that combined with scum include absolute, fucking, murdering and jihadi. Furthermore, the swear word cunts was used 800 times in tweets about the Rigby sentencing (1.4% of tweets). This further indicates that, as expected, there was considerable disapproval and anger expressed towards the perpetrators. Words that combined with cunts to describe the perpetrators included dirty, sick, horrible, fucking, evil, scummy, vile, muslim, murdering and filthy.

  • In terms of religion, Twitter users were most concerned about Islam

The three most salient religious words were islamistas, Islam and Muslim. Islamistas (Spanish for Islamists) occurred in Spanish-language tweets reporting the result of the sentencing (though most tweets were produced in English, and by users from the UK, there appears to have been activity from all over the world). The other terms mostly occur in retweets and discussions about the judge’s statement that the perpetrators had betrayed Islam by murdering Rigby. The general opinion appears to be that the murder had nothing to do with the religion of Islam.

Conclusion

This preliminary analysis, using tools and methods from corpus linguistics, has captured a general impression of the Twitter reaction to the sentencing of the Lee Rigby murderers. It seems that the main reaction centred around the nature of the sentencing and the Twitter users’ wishes for both Michael Adebolajo and Michael Adebowale to receive at least a whole-life sentence but preferably death. Furthermore some Twitter users appeared unrestrained in their willingness to use offensive language to describe the killers.


[1] As many as possible were collected, but given the immediacy of the event and the nature of the search method, we acknowledge that Twitter users may have tweeted about the Rigby trial without using any of these terms.

[2] The true proportion may be even higher if we take into account retweets that do not contain the letters ‘RT’.