🕷️ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 91 (from laksa005)

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled

🚫 NOT INDEXABLE
✅ CRAWLED (6 months ago)
🤖 ROBOTS ALLOWED

Page Info Filters

Filter | Status | Condition | Details
HTTP status | PASS | download_http_code = 200 | HTTP 200
Age cutoff | FAIL | download_stamp > now() - 6 MONTH | 6.1 months ago
History drop | PASS | isNull(history_drop_reason) | No drop reason
Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0
Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set
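The five conditions above can be re-expressed as a small Python sketch to show why a single failing filter makes the page NOT INDEXABLE. This is an illustration only, not the inspector's actual implementation: the function names, the dictionary layout, the value of `fh_dont_index`, the use of the page URL as `src_unparsed`, and the `ASSUMED_NOW` timestamp are all assumptions chosen to be consistent with the reported "6.1 months ago".

```python
import calendar
from datetime import datetime


def minus_months(ts: datetime, months: int) -> datetime:
    """Subtract calendar months, clamping the day to the target month's length."""
    total = ts.year * 12 + (ts.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def evaluate_filters(page: dict, now: datetime) -> dict:
    """Return filter name -> True (PASS) / False (FAIL) for one page record."""
    canonical = page["meta_canonical"]
    return {
        "HTTP status": page["download_http_code"] == 200,
        "Age cutoff": page["download_stamp"] > minus_months(now, 6),
        "History drop": page["history_drop_reason"] is None,
        "Spam/ban": page["fh_dont_index"] != 1 and page["ml_spam_score"] == 0,
        "Canonical": canonical is None
        or canonical == ""
        or canonical == page["src_unparsed"],
    }


# Values taken from the Page Details section, except where marked as assumed.
PAGE = {
    "download_http_code": 200,
    "download_stamp": datetime(2025, 10, 9, 1, 44, 36),
    "history_drop_reason": None,
    "fh_dont_index": 0,  # assumed; the report only shows that the filter passed
    "ml_spam_score": 0,
    "meta_canonical": None,
    "src_unparsed": "https://dl.acm.org/doi/10.5555/1953048.2078186",  # assumed
}

# Hypothetical "current time", roughly 6.1 months after the last crawl.
ASSUMED_NOW = datetime(2026, 4, 12)

results = evaluate_filters(PAGE, ASSUMED_NOW)
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
print("INDEXABLE" if all(results.values()) else "NOT INDEXABLE")
```

Under these assumptions only the age cutoff fails, which is enough to mark the page as not indexable even though every other filter passes.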

Page Details

URL: https://dl.acm.org/doi/10.5555/1953048.2078186
Last Crawled: 2025-10-09 01:44:36 (6 months ago)
First Indexed: 2020-03-04 17:05:38 (6 years ago)
HTTP Status Code: 200
Meta Title: Natural Language Processing (Almost) from Scratch | The Journal of Machine Learning Research
Meta Description: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity rec...
Meta Canonical: null
Boilerpipe Text
Publication History Published: 01 November 2011 Published in JMLR Volume 12
Markdown
# Natural Language Processing (Almost) from Scratch

Article (free access)

Authors: [Ronan Collobert](https://dl.acm.org/profile/81100001072), [Jason Weston](https://dl.acm.org/profile/81100015405), [Léon Bottou](https://dl.acm.org/profile/81100263096), [Michael Karlen](https://dl.acm.org/profile/81421593627), [Koray Kavukcuoglu](https://dl.acm.org/profile/81416596384), [Pavel Kuksa](https://dl.acm.org/profile/81387607525)

[The Journal of Machine Learning Research, Volume 12](https://dl.acm.org/toc/jmlr/2011/12/null), Pages 2493 - 2537

Published: 01 November 2011

- Total Citations: 1,129
- Total Downloads: 9,982
- Last 12 Months: 458
- Last 6 weeks: 35

## Abstract

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.

## Formats available

You can view the full content in the following formats: [PDF](https://dl.acm.org/doi/pdf/10.5555/1953048.2078186 "View PDF")
[Google Scholar](https://scholar.google.com/scholar?q=L.+Shen%2C+G.+Satta%2C+and+A.+K.+Joshi.+Guided+learning+for+bidirectional+sequence+classification.+In+Meeting+of+the+Association+for+Computational+Linguistics+%28ACL%29%2C+2007.) \[79\] N. A. Smith and J. Eisner. Contrastive estimation: Training log-linear models on unlabeled data. In *Meeting of the Association for Computational Linguistics (ACL)*, pages 354-362, 2005. [Crossref](https://doi.org/10.3115/1219840.1219884) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.3115%2F1219840.1219884) \[80\] S. C. Suddarth and A. D. C. Holden. Symbolic-neural systems and the use of hints for developing complex systems. *International Journal of Man-Machine Studies*, 35(3):291-311, 1991. [Crossref](https://doi.org/10.1016/S0020-7373\(05\)80130-0) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.1016%2FS0020-7373%2805%2980130-0) \[81\] X. Sun, L.-P. Morency, D. Okanohara, and J. Tsujii. Modeling latent-dynamic in shallow parsing: a latent conditional model with improved inference. In *International Conference on Computational Linguistics (COLING)*, pages 841-848, 2008. [Crossref](https://doi.org/10.5555/1599081.1599187) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1599081.1599187) \[82\] C. Sutton and A. McCallum. Joint parsing and semantic role labeling. In *Conference on Computational Natural Language (CoNLL)*, pages 225-228, 2005a. [Crossref](https://doi.org/10.5555/1706543.1706587) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1706543.1706587) \[83\] C. Sutton and A. McCallum. Composition of conditional randomfields for transfer learning. *Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP)*, pages 748-754, 2005b. [Crossref](https://doi.org/10.3115/1220575.1220669) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.3115%2F1220575.1220669) \[84\] C. Sutton, A. 
McCallum, and K. Rohanimanesh. Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. *Journal of Machine Learning Research (JMLR)*, 8:693-723, 2007. [Crossref](https://doi.org/10.5555/1248659.1248684) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1248659.1248684) \[85\] J. Suzuki and H. Isozaki. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In *Conference of the North American Chapter of the Association for Computational Linguistics & Human Language Technologies (NAACL-HLT)*, pages 665-673, 2008. [Google Scholar](https://scholar.google.com/scholar?q=J.+Suzuki+and+H.+Isozaki.+Semi-supervised+sequential+labeling+and+segmentation+using+giga-word+scale+unlabeled+data.+In+Conference+of+the+North+American+Chapter+of+the+Association+for+Computational+Linguistics+%26+Human+Language+Technologies+%28NAACL-HLT%29%2C+pages+665-673%2C+2008.) \[86\] W. J. Teahan and J. G. Cleary. The entropy of english using ppm-based models. In *Data Compression Conference (DCC)*, pages 53-62. IEEE Computer Society Press, 1996. [Crossref](https://doi.org/10.5555/789084.789503) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F789084.789503) \[87\] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In *Conference of the North American Chapter of the Association for Computational Linguistics & Human Language Technologies (NAACL-HLT)*, 2003. [Crossref](https://doi.org/10.3115/1073445.1073478) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.3115%2F1073445.1073478) \[88\] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semisupervised learning. In *Meeting of the Association for Computational Linguistics (ACL)*, pages 384-392, 2010. 
[Crossref](https://doi.org/10.5555/1858681.1858721) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1858681.1858721) \[89\] N. Ueffing, G. Haffari, and A. Sarkar. Transductive learning for statistical machine translation. In *Meeting of the Association for Computational Linguistics (ACL)*, pages 25-32, 2007. [Google Scholar](https://scholar.google.com/scholar?q=N.+Ueffing%2C+G.+Haffari%2C+and+A.+Sarkar.+Transductive+learning+for+statistical+machine+translation.+In+Meeting+of+the+Association+for+Computational+Linguistics+%28ACL%29%2C+pages+25-32%2C+2007.) \[90\] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K.J. Lang. Phoneme recognition using time-delay neural networks. *IEEE Transactions on Acoustics, Speech, and Signal Processing*, 37(3): 328-339, 1989. [Google Scholar](https://scholar.google.com/scholar?q=A.+Waibel%2C+T.+Hanazawa%2C+G.+Hinton%2C+K.+Shikano%2C+and+K.J.+Lang.+Phoneme+recognition+using+time-delay+neural+networks.+IEEE+Transactions+on+Acoustics%2C+Speech%2C+and+Signal+Processing%2C+37%283%29%3A+328-339%2C+1989.) \[91\] J. Weston, F. Ratle, and R. Collobert. Deep learning via semi-supervised embedding. In *International Conference on Machine learning (ICML)*, pages 1168-1175, 2008. 
1. Natural Language Processing (Almost) from Scratch
   1. [Computing methodologies](https://dl.acm.org/topic/ccs2012/10010147?SeriesKey=jmlr&expand=all)
      1. [Artificial intelligence](https://dl.acm.org/topic/ccs2012/10010147.10010178?SeriesKey=jmlr&expand=all)
   2. [Hardware](https://dl.acm.org/topic/ccs2012/10010583?SeriesKey=jmlr&expand=all)
      1. [Power and energy](https://dl.acm.org/topic/ccs2012/10010583.10010662?SeriesKey=jmlr&expand=all)
         1. [Power estimation and optimization](https://dl.acm.org/topic/ccs2012/10010583.10010662.10010674?SeriesKey=jmlr&expand=all)

## Information & Contributors

### Information

#### Published In

The Journal of Machine Learning Research, Volume 12, Issue 2/1/2011

3426 pages

ISSN: 1532-4435

EISSN: 1533-7928

[Issue’s Table of Contents](https://dl.acm.org/toc/jmlr/2011/12/null)

#### Publisher

JMLR.org

#### Publication History

**Published**: 01 November 2011

Published in JMLR Volume 12

#### Qualifiers

- Article

## Bibliometrics & Citations

### Bibliometrics

#### Article Metrics

- 1,129 Total Citations
- 9,982 Total Downloads
- Downloads (Last 12 months): 458
- Downloads (Last 6 weeks): 35

Reflects downloads up to 05 Oct 2025

### Citations

## Cited By

[View all](https://dl.acm.org/action/ajaxShowCitedBy?doi=10.5555/1953048.2078186 "View all cited by in new tab")

- Krauß T, Dashtbani H, Dmitrienko A, Bauer L, Pellegrino G (2025) TwinBreak. Proceedings of the 34th USENIX Conference on Security Symposium. 10.5555/3766078.3766199 (2343-2362). Online publication date: 13-Aug-2025. <https://dl.acm.org/doi/10.5555/3766078.3766199>
- Li Y, Huang J, Liu J, Li Z, Jiang W, Wang J, Antonie L, Pei J, Yu X, Chierichetti F, Lauw H, Sun Y, Parthasarathy S (2025) SwitchTop-k: Scaling Top-k Compression on Programmable Switches. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2. 10.1145/3711896.3737142 (1612-1622). Online publication date: 3-Aug-2025. <https://dl.acm.org/doi/10.1145/3711896.3737142>
- Shaikh T, Rasool T, Mir W (2025) Fields of the future. Computer Standards & Interfaces. 10.1016/j.csi.2025.104005 **94**:C. Online publication date: 1-Aug-2025. <https://dl.acm.org/doi/10.1016/j.csi.2025.104005>
- Zhang Y, Liu Y, Zhu J, Chen Z, Zhai S, Wu X (2025) Inner-character and Inner-word Features Based Representation Learning for Chinese Word Embedding. ACM Transactions on Asian and Low-Resource Language Information Processing. 10.1145/3748316 **24**:9 (1-33). Online publication date: 18-Jul-2025. <https://dl.acm.org/doi/10.1145/3748316>
- Aljofey A, Bello S, Lu J, Xu C (2025) Comprehensive phishing detection. Journal of Network and Computer Applications. 10.1016/j.jnca.2025.104170 **238**:C. Online publication date: 1-Jun-2025. <https://dl.acm.org/doi/10.1016/j.jnca.2025.104170>
- Lin S, Frasincar F, Klinkhamer J (2025) Hierarchical deep learning for multi-label imbalanced text classification of economic literature. Applied Soft Computing. 10.1016/j.asoc.2025.113189 **176**:C. Online publication date: 1-May-2025. <https://dl.acm.org/doi/10.1016/j.asoc.2025.113189>
- Zhang Y, Li N, Zhang L, Lin J, Gao X, Chen G (2025) A review on the recent developments in vision-based apple-harvesting robots for recognizing fruit and picking pose. Computers and Electronics in Agriculture. 10.1016/j.compag.2025.109968 **231**:C. Online publication date: 30-Apr-2025. <https://dl.acm.org/doi/10.1016/j.compag.2025.109968>
- Zhang Y, Liu Y, Zhu J, Chen Z, Zhang F (2025) FRGEM. Expert Systems with Applications: An International Journal. 10.1016/j.eswa.2024.125589 **262**:C. Online publication date: 1-Mar-2025. <https://dl.acm.org/doi/10.1016/j.eswa.2024.125589>
- Wadawadagi R, Tiwari S, pagi V (2025) Polarity-aware deep attention network for aspect-based sentiment analysis. Progress in Artificial Intelligence. 10.1007/s13748-024-00352-x **14**:1 (33-48). Online publication date: 1-Mar-2025. <https://dl.acm.org/doi/10.1007/s13748-024-00352-x>
- Costa Y, Oliveira H, Nogueira V, Massa L, Yang X, Barbosa A, Oliveira K, Vieira T (2025) Automating petition classification in Brazil’s legal system: a two-step deep learning approach. Artificial Intelligence and Law. 10.1007/s10506-023-09385-4 **33**:1 (227-251). Online publication date: 1-Mar-2025. <https://dl.acm.org/doi/10.1007/s10506-023-09385-4>
[Google Scholar](https://scholar.google.com/scholar?q=D.+Okanohara+and+J.+Tsujii.+A+discriminative+language+model+with+pseudo-negative+samples.+Meeting+of+the+Association+for+Computational+Linguistics+%28ACL%29%2C+pages+73-80%2C+2007.) \[61\] M. Palmer, D. Gildea, and P. Kingsbury. The proposition bank: An annotated corpus of semantic roles. *Computational Linguistics*, 31(1):71-106, 2005. [Crossref](https://doi.org/10.1162/0891201053630264) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.1162%2F0891201053630264) \[62\] J. Pearl. *Probabilistic Reasoning in Intelligent Systems*. Morgan Kaufmann, San Mateo, 1988. [Crossref](https://doi.org/10.5555/52121) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F52121) \[63\] D. C. Plaut and G. E. Hinton. Learning sets of filters using back-propagation. *Computer Speech and Language*, 2:35-61, 1987. [Google Scholar](https://scholar.google.com/scholar?q=D.+C.+Plaut+and+G.+E.+Hinton.+Learning+sets+of+filters+using+back-propagation.+Computer+Speech+and+Language%2C+2%3A35-61%2C+1987.) \[64\] M. F. Porter. An algorithm for suffix stripping. *Program*, 14(3):130-137, 1980. [Google Scholar](https://scholar.google.com/scholar?q=M.+F.+Porter.+An+algorithm+for+suffix+stripping.+Program%2C+14%283%29%3A130-137%2C+1980.) \[65\] S. Pradhan, W. Ward, K. Hacioglu, J. Martin, and D. Jurafsky. Shallow semantic parsing using support vector machines. *Conference of the North American Chapter of the Association for Computational Linguistics & Human Language Technologies (NAACL-HLT)*, 2004. [Google Scholar](https://scholar.google.com/scholar?q=S.+Pradhan%2C+W.+Ward%2C+K.+Hacioglu%2C+J.+Martin%2C+and+D.+Jurafsky.+Shallow+semantic+parsing+using+support+vector+machines.+Conference+of+the+North+American+Chapter+of+the+Association+for+Computational+Linguistics+%26+Human+Language+Technologies+%28NAACL-HLT%29%2C+2004.) \[66\] S. Pradhan, K. Hacioglu, W. Ward, J. H. Martin, and D. Jurafsky. 
Semantic role chunking combining complementary syntactic views. In *Conference on Computational Natural Language Learning (CoNLL)*, pages 217-220, 2005. [Crossref](https://doi.org/10.5555/1706543.1706585) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1706543.1706585) \[67\] V. Punyakanok, D. Roth, and W. Yih. The necessity of syntactic parsing for semantic role labeling. In *International Joint Conference on Artificial Intelligence (IJCAI)*, pages 1117-1123, 2005. [Crossref](https://doi.org/10.5555/1642293.1642472) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1642293.1642472) \[68\] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. *Proceedings of the IEEE*, 77(2):257-286, 1989. [Google Scholar](https://scholar.google.com/scholar?q=L.+R.+Rabiner.+A+tutorial+on+hidden+Markov+models+and+selected+applications+in+speech+recognition.+Proceedings+of+the+IEEE%2C+77%282%29%3A257-286%2C+1989.) \[69\] L. Ratinov and D. Roth. Design challenges and misconceptions in named entity recognition. In *Conference on Computational Natural Language Learning (CoNLL)*, pages 147-155. Association for Computational Linguistics, 2009. [Crossref](https://doi.org/10.5555/1596374.1596399) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1596374.1596399) \[70\] A. Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In *Conference on Empirical Methods in Natural Language Processing (EMNLP)*, pages 133-142, 1996. [Google Scholar](https://scholar.google.com/scholar?q=A.+Ratnaparkhi.+A+maximum+entropy+model+for+part-of-speech+tagging.+In+Conference+on+Empirical+Methods+in+Natural+Language+Processing+%28EMNLP%29%2C+pages+133-142%2C+1996.) \[71\] B. Rosenfeld and R. Feldman. Using Corpus Statistics on Entities to Improve Semi-supervised Relation Extraction from the Web. *Meeting of the Association for Computational Linguistics (ACL)*, pages 600-607, 2007. 
[Google Scholar](https://scholar.google.com/scholar?q=B.+Rosenfeld+and+R.+Feldman.+Using+Corpus+Statistics+on+Entities+to+Improve+Semi-supervised+Relation+Extraction+from+the+Web.+Meeting+of+the+Association+for+Computational+Linguistics+%28ACL%29%2C+pages+600-607%2C+2007.) \[72\] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by backpropagating errors. In D. E. Rumelhart and J. L. McClelland, editors, *Parallel Distributed Processing: Explorations in the Microstructure of Cognition*, volume 1, pages 318-362. MIT Press, 1986. [Crossref](https://doi.org/10.5555/104279.104293) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F104279.104293) \[73\] H. Schütze. Distributional part-of-speech tagging. In *Meeting of the Association for Computational Linguistics (ACL)*, pages 141-148, 1995. [Crossref](https://doi.org/10.3115/976973.976994) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.3115%2F976973.976994) \[74\] H. Schwenk and J. L. Gauvain. Connectionist language modeling for large vocabulary continuous speech recognition. In *International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, pages 765-768, 2002. [Google Scholar](https://scholar.google.com/scholar?q=H.+Schwenk+and+J.+L.+Gauvain.+Connectionist+language+modeling+for+large+vocabulary+continuous+speech+recognition.+In+International+Conference+on+Acoustics%2C+Speech%2C+and+Signal+Processing+%28ICASSP%29%2C+pages+765-768%2C+2002.) \[75\] F. Sha and F. Pereira. Shallow parsing with conditional random fields. In *Conference of the North American Chapter of the Association for Computational Linguistics & Human Language Technologies (NAACL-HLT)*, pages 134-141, 2003. [Crossref](https://doi.org/10.3115/1073445.1073473) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.3115%2F1073445.1073473) \[76\] C. E. Shannon. Prediction and entropy of printed English. *Bell System Technical Journal*, 30:50-64, 1951. 
[Google Scholar](https://scholar.google.com/scholar?q=C.+E.+Shannon.+Prediction+and+entropy+of+printed+english.+Bell+Systems+Technical+Journal%2C+30%3A+50-64%2C+1951.) \[77\] H. Shen and A. Sarkar. Voting between multiple data representations for text chunking. *Advances in Artificial Intelligence*, pages 389-400, 2005. [Crossref](https://doi.org/10.1007/11424918_40) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.1007%2F11424918_40) \[78\] L. Shen, G. Satta, and A. K. Joshi. Guided learning for bidirectional sequence classification. In *Meeting of the Association for Computational Linguistics (ACL)*, 2007. [Google Scholar](https://scholar.google.com/scholar?q=L.+Shen%2C+G.+Satta%2C+and+A.+K.+Joshi.+Guided+learning+for+bidirectional+sequence+classification.+In+Meeting+of+the+Association+for+Computational+Linguistics+%28ACL%29%2C+2007.) \[79\] N. A. Smith and J. Eisner. Contrastive estimation: Training log-linear models on unlabeled data. In *Meeting of the Association for Computational Linguistics (ACL)*, pages 354-362, 2005. [Crossref](https://doi.org/10.3115/1219840.1219884) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.3115%2F1219840.1219884) \[80\] S. C. Suddarth and A. D. C. Holden. Symbolic-neural systems and the use of hints for developing complex systems. *International Journal of Man-Machine Studies*, 35(3):291-311, 1991. [Crossref](https://doi.org/10.1016/S0020-7373\(05\)80130-0) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.1016%2FS0020-7373%2805%2980130-0) \[81\] X. Sun, L.-P. Morency, D. Okanohara, and J. Tsujii. Modeling latent-dynamic in shallow parsing: a latent conditional model with improved inference. In *International Conference on Computational Linguistics (COLING)*, pages 841-848, 2008. [Crossref](https://doi.org/10.5555/1599081.1599187) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1599081.1599187) \[82\] C. Sutton and A. McCallum. 
Joint parsing and semantic role labeling. In *Conference on Computational Natural Language Learning (CoNLL)*, pages 225-228, 2005a. [Crossref](https://doi.org/10.5555/1706543.1706587) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1706543.1706587) \[83\] C. Sutton and A. McCallum. Composition of conditional random fields for transfer learning. *Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP)*, pages 748-754, 2005b. [Crossref](https://doi.org/10.3115/1220575.1220669) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.3115%2F1220575.1220669) \[84\] C. Sutton, A. McCallum, and K. Rohanimanesh. Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. *Journal of Machine Learning Research (JMLR)*, 8:693-723, 2007. [Crossref](https://doi.org/10.5555/1248659.1248684) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1248659.1248684) \[85\] J. Suzuki and H. Isozaki. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In *Conference of the North American Chapter of the Association for Computational Linguistics & Human Language Technologies (NAACL-HLT)*, pages 665-673, 2008. [Google Scholar](https://scholar.google.com/scholar?q=J.+Suzuki+and+H.+Isozaki.+Semi-supervised+sequential+labeling+and+segmentation+using+giga-word+scale+unlabeled+data.+In+Conference+of+the+North+American+Chapter+of+the+Association+for+Computational+Linguistics+%26+Human+Language+Technologies+%28NAACL-HLT%29%2C+pages+665-673%2C+2008.) \[86\] W. J. Teahan and J. G. Cleary. The entropy of English using PPM-based models. In *Data Compression Conference (DCC)*, pages 53-62. IEEE Computer Society Press, 1996. [Crossref](https://doi.org/10.5555/789084.789503) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F789084.789503) \[87\] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. 
Feature-rich part-of-speech tagging with a cyclic dependency network. In *Conference of the North American Chapter of the Association for Computational Linguistics & Human Language Technologies (NAACL-HLT)*, 2003. [Crossref](https://doi.org/10.3115/1073445.1073478) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.3115%2F1073445.1073478) \[88\] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In *Meeting of the Association for Computational Linguistics (ACL)*, pages 384-392, 2010. [Crossref](https://doi.org/10.5555/1858681.1858721) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.5555%2F1858681.1858721) \[89\] N. Ueffing, G. Haffari, and A. Sarkar. Transductive learning for statistical machine translation. In *Meeting of the Association for Computational Linguistics (ACL)*, pages 25-32, 2007. [Google Scholar](https://scholar.google.com/scholar?q=N.+Ueffing%2C+G.+Haffari%2C+and+A.+Sarkar.+Transductive+learning+for+statistical+machine+translation.+In+Meeting+of+the+Association+for+Computational+Linguistics+%28ACL%29%2C+pages+25-32%2C+2007.) \[90\] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang. Phoneme recognition using time-delay neural networks. *IEEE Transactions on Acoustics, Speech, and Signal Processing*, 37(3):328-339, 1989. [Google Scholar](https://scholar.google.com/scholar?q=A.+Waibel%2C+T.+Hanazawa%2C+G.+Hinton%2C+K.+Shikano%2C+and+K.J.+Lang.+Phoneme+recognition+using+time-delay+neural+networks.+IEEE+Transactions+on+Acoustics%2C+Speech%2C+and+Signal+Processing%2C+37%283%29%3A+328-339%2C+1989.) \[91\] J. Weston, F. Ratle, and R. Collobert. Deep learning via semi-supervised embedding. In *International Conference on Machine Learning (ICML)*, pages 1168-1175, 2008. 
[Crossref](https://doi.org/10.1145/1390156.1390303) [Google Scholar](https://scholar.google.com/scholar_lookup?doi=10.1145%2F1390156.1390303)
Readable Markdownnull
Shard91 (laksa)
Root Hash6385618855135636491
Unparsed URLorg,acm!dl,/doi/10.5555/1953048.2078186 s443