Language identification/detection related documents
I'm writing a paper about my language detection tool. Here I list all the links and documents related to the background of that work:
- WebLinks
- http://tnlessone.wordpress.com/2007/05/13/how-to-detect-which-language-a-text-is-written-in-or-when-science-meets-human/
- http://www.codeproject.com/KB/recipes/DetectEncoding.aspx
- http://www.codeproject.com/KB/recipes/DialogueMaster_Babel.aspx
- http://www.labnol.org/internet/identify-language-of-text/8441/
- http://phpir.com/language-detection-with-n-grams
- http://www.hacktrix.com/3-tools-to-detect-unknown-language-from-sample-text
- http://langid.net/
- http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B7MFM-4DX4964-C&_user=1884617&_coverDate=05/31/1967&_rdoc=1&_fmt=high&_orig=search&_origin=search&_sort=d&_docanchor=&view=c&_searchStrId=1656980954&_rerunOrigin=scholar.google&_acct=C000055174&_version=1&_urlVersion=0&_userid=1884617&md5=820032458a81f7a103dbd27b54d0959e&searchtype=a
- http://www.springerlink.com/content/w38xv4t61l243256/
- http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1958
- http://www.google.com/patents?hl=en&lr=&vid=USPAT5062143&id=gFYeAAAAEBAJ&oi=fnd&dq=language+identification&printsec=abstract#v=onepage&q&f=false
- http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=1313044
- http://portal.acm.org/citation.cfm?id=1887073&CFID=10433212&CFTOKEN=74429975&preflayout=tabs
- http://portal.acm.org/citation.cfm?id=150899&coll=DL&dl=GUIDE&CFID=10433212&CFTOKEN=74429975
- http://portal.acm.org/citation.cfm?id=567295&CFID=10433212&CFTOKEN=74429975
- http://portal.acm.org/citation.cfm?id=678755&CFID=10433212&CFTOKEN=74429975
- PDFs
- Automatic Language Identification Using a Segment-Based Approach, T. J. Hazen and V. W. Zue, Proceedings of Eurospeech, 1993
- Comparing Two Language Identification Schemes, G. Grefenstette, Proceedings of the 3rd International Conference on Statistical Analysis of Textual Data (JADT), 1995
- Comparison of Four Approaches to Automatic Language Identification of Telephone Speech, M.A. Zissman, in IEEE Trans. Speech and Audio Proc, 1996
- An n-gram based language identification system, J. King, and J. Dehdari, The Ohio State University, 2000
- Applying Monte Carlo Techniques to Language Identification, A. Poutsma, In Proceedings of Computational Linguistics in the Netherlands, 2001
- Approaches to Language Identification using Gaussian Mixture Models and Shifted Delta Cepstral Features, P.A. Torres-Carrasquillo, E. Singer, M.A. Kohler, R.J. Greene, D.A. Reynolds, and J.R. Deller, Proceedings of ICSLP, pp. 89-92, 2002
- Automatic Identification of Document Translations in large Multilingual Document Collections, B. Pouliquen, R. Steinberger, C. Ignat, Proceedings of the International Conference on Recent Advances in Natural Language Processing, 2003
- Comparing methods for language identification, M. Padró, and L. Padró, Procesamiento del Lenguaje Natural, pp 155-162, 2004
- Language identification based on string kernels, C. Kruengkrai, P. Srichaivattana, V. Sornlertlamvanich and H. Isahara, In Proceedings of the 5th International Symposium on Communications and Information Technologies (ISCIT), 2005
- Evaluation of Language Identification Methods, S. Kranig, Bachelor of Arts Thesis, 2005
- Comparing Natural Language Identification Methods based on Markov Processes, P. Vojtek, and M. Bieliková, International Seminar on Computer Treatment of Slavic and East European Languages, 2007
- Factors that affect the accuracy of text-based language identification, G.R. Botha, E. Barnard, 18th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2007
- A Comparative Study on Language Identification Methods, L. Grothe, E. W. de Luca, and A. Nürnberger, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), 2008
- A Comparison of Language Identification Approaches on Short Query-Style Texts, T. Gottron, and N. Lipka, Lecture Notes in Computer Science (LNCS), pp 611-614, 2010
- Fusion of Output Scores on Language Identification System,
- Identification of Document Language is not yet a completely solved,
- Language Identifier A Computer Program for Automatic Natural-Language Identification of On-line Text,
- Language Identification,
- Language Identification-2,
- Language Identification Based on n-Gram Frequency Ranking,
- language identification examining the issues,
- Language Identification for Person Names Based on Statistical,
- Language identification how to distinguish similar languages,
- Language Identification in the Limit,
- Language Identification in Web Pages,
- Language Identification of Short Text Segments with N-gram Models,
- Language Identification on the Web Extending the Dictionary Method,
- Language Identification Programming Project Faculty Project,
- Mining the Web for Bilingual Text,
- N-GRAM AND DECISION TREE BASED LANGUAGE IDENTIFICATION FOR WRITTEN WORDS,
- N-gram models for language detection,
- Optimizing n-gram Order of an n-gram Based Language Identification,
- REVIEW OF AUTOMATIC LANGUAGE IDENTIFICATION,
- SCALABLE NEURAL NETWORK BASED LANGUAGE IDENTIFICATION FROM WRITTEN,
- Statistical Identification of Language,
- Study of some distance measures for language and encoding identification,
- Web Page Language Identification Based on URLs,
- FEATURE SELECTION METHOD OF WEB PAGE LANGUAGE IDENTIFICATION,
1 comment
Comment from: Nick Snels [Visitor]
Great list. The following link could also be of interest to you and your research: http://www.whatlanguage.net/en/api/accuracy_language_detection . It compares 5 language detection programs, namely WhatLanguage.net, CLD, Tika, language-detection and langid.py.
04/21/13 @ 15:49



