Have a Good Time!

Michael Eng

[weng1 mai4 ke3]



Current Work

I am currently collaborating with Kate Ho on a piece of research involving blogs.

Dissertation

As part of my degree in Linguistics and Artificial Intelligence, on 31st March 2004 I submitted a dissertation entitled Can Words be Pronounced Using Lexical Exemplars? Development of a Computational Model of Visual Word Recognition by Analogy. My work was supervised by Dr. Marielle Lange, School of Informatics, University of Edinburgh.

Download: PDF (604k), GZip PostScript (120k).

Some old slides demonstrating the approach: PDF (488k).

The software developed is also available for download. It has been tested under Mac OS X, DICE (RedHat 8.0), and Cygwin/Win32, and should also work on any system where Python is available.

Documentation and sample corpora are included in the archive file.

Download: GZip DMG for Mac OS X (1704k), or GZip tarball for everything else (1428k).

You can also query the model online via this website by submitting orthographies to it interactively. (Beware, it is slow!)

previously recent stuff

Using weakly supervised learning for text classification

Slides from a talk I gave at the Glasgow University Information Retrieval Group on 9 Feb 2004. (Sorry, the slides are sideways; you will have to rotate them yourself.)

Download: Slides in PDF (620k).

Abstract: Learning to classify text is a problem applicable to many practical settings. Is it possible to improve performance in the classification task with no additional data, for 'free'? Weakly supervised learning algorithms such as co-training (Blum & Mitchell, 1998) appear to offer just that. I shall discuss some of the findings in the field, together with my own work in this area.

selected papers 2003

Co-training in Text Classification: Is the Independence Assumption Always Applicable?

Download: PDF (159k), GZip PostScript (147k).

Abstract: In the recent literature, there has been considerable interest in weakly supervised learning algorithms and their application to text categorisation. Blum and Mitchell (1998) use co-training where there are two natural, independent views of the data, and Nigam and Ghani (2000) cite this to justify the independence assumption, under which higher performance results from a highly divided feature set. In a number of empirical experiments, we find that a highly divided feature set is not a necessary condition for high performance in co-training, and that self-training can outperform co-training in a real-world setting where datasets with a clear feature split are not available.
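To make the comparison in this abstract concrete, here is a minimal sketch of the two training loops, assuming a simple word-count Naive Bayes classifier. In self-training, one classifier over the whole feature set labels its own most confident unlabelled documents; in co-training, two classifiers, each restricted to one view of the features, take turns adding documents to a shared labelled set. This is not the code used in the paper: the class and function names, the classifier, the schedule of one document per classifier per round (Blum and Mitchell's original adds a fixed number of positive and negative examples per round), and the toy two-view data (page words vs. link words, loosely after Blum and Mitchell's web-page example) are all hypothetical choices for illustration.

```python
import math
from collections import Counter

# A minimal sketch, not the code used in the paper. NaiveBayes, _grow,
# self_train and co_train are hypothetical names chosen for illustration.


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        """Return (most probable label, its posterior probability)."""
        v = len(self.vocab)
        scores = {
            c: self.log_prior[c]
            + sum(math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                  for w in doc if w in self.vocab)  # ignore unseen words
            for c in self.classes
        }
        best = max(scores, key=scores.get)
        z = sum(math.exp(s - scores[best]) for s in scores.values())
        return best, 1.0 / z


def _grow(labelled, pool, view, k):
    """Train on one view of the labelled set, then move the k pool items the
    classifier is most confident about into it, with the guessed labels."""
    clf = NaiveBayes().fit([view(x) for x, _ in labelled], [y for _, y in labelled])
    guesses = [clf.predict_proba(view(x)) for x in pool]
    top = sorted(range(len(pool)), key=lambda i: guesses[i][1], reverse=True)[:k]
    for i in sorted(top, reverse=True):  # pop high indices first so the rest stay valid
        labelled.append((pool.pop(i), guesses[i][0]))


def self_train(labelled, unlabelled, rounds=10, k=1):
    """One classifier over the whole feature set (both views concatenated)."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        if pool:
            _grow(labelled, pool, lambda x: x[0] + x[1], k)
    return labelled


def co_train(labelled, unlabelled, rounds=10, k=1):
    """Two classifiers, one per view, take turns labelling for a shared set."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        for view in (lambda x: x[0], lambda x: x[1]):
            if pool:
                _grow(labelled, pool, view, k)
    return labelled


# Hypothetical toy data: each document is (words on the page, words in links to it).
LABELLED = [
    ((["lecture", "syllabus"], ["course"]), "course"),
    ((["research", "publications"], ["homepage"]), "person"),
]
UNLABELLED = [
    (["syllabus", "exam"], ["course"]),
    (["publications", "cv"], ["homepage"]),
    (["lecture", "exam"], ["class"]),
]

for name, train in (("self-training", self_train), ("co-training", co_train)):
    added = train(LABELLED, UNLABELLED)[len(LABELLED):]
    print(name, {" ".join(x[0]): y for x, y in added})
```

The only difference between the two loops is which features each classifier is allowed to see; whether splitting the features that way actually helps on real data is the empirical question the paper addresses.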

Linguistic Imperialism and Singapore's Speak Good English Movement

Download: PDF (124k), GZip PostScript (140k).

Abstract: In 2000, the Singapore Government launched the Speak Good English Movement, with the tagline 'Speak Well. Be Understood.' Its aim is to remove some of the characteristic features of Singapore Colloquial English (or Singlish) from common parlance, replacing them with 'good' English that appears close in form to Standard British English. This can be viewed as a kind of linguistic imperialism (Phillipson, 1992), in which British English is considered superior to the local variety. A small study of Singaporean students studying in the UK was conducted via the World Wide Web to determine whether they were acting as 'linguistic imperialists', bringing British English influences back to Singapore with them.

strictly for amusement only

indoneshia no shyakai no mondaiten (Social Problems in Indonesia, 2001)

Download: PDF (162k).

Now you too can read this fantastic resumé of twentieth-century Indonesian politics and its associated social effects!!



Michael Eng
my e-mail address is meng ! daydream . org . uk, but replace the ! with @ and remove the spaces