Haussler Publications, 2001 and Before

Haussler's current research stems from his early work in machine learning, statistical decision theory, pattern recognition, neural networks, algorithms, and complexity.

Unpublished or Non-Refereed

Haussler, D. The challenge of bioinformatics. R&D Magazine. 2001 Nov;43(11):8S-SC3 (in conjunction with Scientist of the Year Award).

Featured scientist interview. Incyte's "Inside Genomics" series, 2001 Oct.

Kent WJ, Haussler D. GigAssembler: an algorithm for the initial assembly of the human genome working draft. UCSC-CRL-00-17, December 27, 2000.

Lazareva-Ulitsky B, Rahmann S, Haussler D. Towards an accurate EST consensus. 1999.

Haussler D. Convolution kernels on discrete structures. UCSC-CRL-99-10, July 8, 1999.
Note: On July 13, 1999, I received a Technical Report from Chris Watkins (Technical Report CSD-TR-98-11 from Royal Holloway, University of London, 15 January 1999) that contains independent and prior work that is closely related to and extends the results in section 4.4 on regular string kernels, and some other sections of my technical report.

Haussler D. Computational genefinding. 1998. A short version of this paper appeared in Trends in Biochemical Sciences, Supplementary Guide to Bioinformatics. 1998;12-15.

Haussler D, Jaakkola T, Winters-Hilt S. Tradeoffs between generative and discriminative hidden Markov models. Computer Science Department, UC Santa Cruz. 1998.

Journal Articles and Book Chapters

Note: see the Genome Bioinformatics Group publications page for more recent publications.

Kent WJ, Haussler D. Assembly of the working draft of the human genome with GigAssembler. Genome Research. 2001 Sept;11(9):1541-8.

Vercoutere W, Winters-Hilt S, Olsen H, Deamer D, Haussler D, Akeson M. Rapid discrimination among individual DNA hairpin molecules at single nucleotide resolution using an ion channel. Nat Biotechnol. 2001 Mar;19(3):248-52.

International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001 Feb;409(6822):860-921.

The International Human Genome Mapping Consortium. A physical map of the human genome. Nature. 2001 Feb;409(6822):934-41.

BAC Resource Consortium. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature. 2001;409(6822):953-8.

Reese M, Kulp D, Tammana H, Haussler D. Genie—gene finding in Drosophila melanogaster. Genome Research. 2000 April;10(4):529-38.

Furey T, Cristianini N, Duffy N, Bednarski D, Schummer M, Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression. Bioinformatics. 2000;16(10):906-14.

Brown M, Grundy W, Lin D, Cristianini N, Sugnet C, Furey T, Ares M, Haussler D. Knowledge-based analysis of microarray gene expression data using support vector machines. Proceedings of the National Academy of Sciences. 2000 Jan 4;97(1):262-7.

Jaakkola T, Diekhans M, Haussler D. A discriminative framework for detecting remote protein homologies. Journal of Computational Biology. 2000;7(1,2):95-114. [Abstract]

Spingola M, Grate L, Ares M, Haussler D. Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA. 1999;5(2):221-34.

Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T, Chothia C. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. Journal of Molecular Biology. 1998;284(4):1201-10.

Haussler D, Kivinen J, Warmuth M. Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory. 1998;44(5):1906-25.

Opper M, Haussler D. Worst case prediction over sequences under log loss. In Cybenko G, O'Leary D, Rissanen J (eds): The Mathematics of Information Coding, Extraction and Distribution. Springer Verlag, 1998. [Abstract]

Karplus K, Sjolander K, Barrett C, Cline M, Haussler D, Hughey R, Holm L, Sander C. Predicting protein structure using hidden Markov models. Proteins: Structure, Function and Genetics. 1997;29(Suppl 1):134-9.

Haussler D, Opper M. Metric entropy and minimax risk in classification. In Mycielski J, Rozenberg G, Salomaa A (eds): Lecture Notes in Computer Science: Studies in Logic and Computer Science. 1997;1261:212-35.

Haussler D, Opper M. Mutual information, metric entropy, and cumulative relative entropy risk. Annals of Statistics. 1997 Dec;25(6):2451-92.

Haussler D, Opper M. Mutual information, metric entropy, and risk in estimation of probability distributions. Tech. rep. UCSC-CRL-96-27. 1996. (Long version of previous paper.)

Reese MG, Eeckman FH, Kulp D, Haussler D. Improved splice site detection in Genie. Journal of Computational Biology. 1997 Fall;4(3):311-23. [Abstract]

Alon N, Ben-David S, Cesa-Bianchi N, Haussler D. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM. 1997 July;44(4):615-31.

Haussler D. A general minimax result for relative entropy. IEEE Transactions on Information Theory. 1997 July;43(4):1276-80. Also tech. rep. UCSC-CRL-96-26. [PDF]

Cesa-Bianchi N, Freund Y, Haussler D, Helmbold D, Schapire R, Warmuth M. How to use expert advice. Journal of the ACM. 1997;44(3):427-85. Based on University of California, Santa Cruz technical report UCSC-CRL-95-19.

Haussler D, Kearns M, Seung HS, Tishby N. Rigorous learning curve bounds from statistical mechanics. Machine Learning. 1996;25(2/3):195-236.

Fayyad U, Haussler D, Stolorz P. Mining scientific data. Communications of the ACM. 1996 Nov;39(11):51-7.

Sjölander K, Karplus K, Brown M, Hughey R, Krogh A, Mian IS, Haussler D. Dirichlet mixtures: a method for improving detection of weak but significant protein sequence homology. Computer Applications in the Biosciences (CABIOS). 1996;12(4):327-45.

Haussler D. Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory Series A. 1995 Feb;69(2):217-32.

Ben-David S, Cesa-Bianchi N, Haussler D, Long P. Characterizations of learnability for classes of {0,...,n}-valued functions. Journal of Computer and System Sciences. 1995;50(1):74-86.

Opper M, Haussler D. Bounds for predictive errors in the statistical mechanics of supervised learning. Physical Review Letters. 1995;75(20):3772-5.

Haussler D, Long PM. A generalization of Sauer's lemma. Journal of Combinatorial Theory. 1995 August;71(2):219-40.

Krogh A, Brown M, Mian S, Sjölander K, Haussler D. Hidden Markov models in computational biology: applications to protein modeling. Journal of Molecular Biology. 1994 Feb;235(5):1501-31. [Longer technical report version]

Haussler D, Littlestone N, Warmuth M. Predicting {0,1}-functions on randomly drawn points. Information and Computation. 1994 Dec;115(2):248-92.

Haussler D, Kearns M, Schapire R. Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Machine Learning. 1994;14(1):83-114.

Krogh A, Mian S, Haussler D. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Research. 1994;22(22):4768-78.

Sakakibara Y, Brown M, Hughey R, Mian S, Sjölander K, Underwood R, Haussler D. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research. 1994;22(23):5112-20. [PDF]

Haussler D. Probably approximately correct (PAC) learning and decision-theoretic generalizations. In Smolensky P, Mozer MC, Rumelhart DE (eds): Mathematical Perspectives on Neural Networks, Lawrence Erlbaum Associates, Mahwah, NJ, 1996:651-706. [Manuscript PDF] (Note: This chapter contains reprinted material from the next reference down.)

Haussler D. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation. 1992 Sept;100(1):78-150.

Pagallo G, Haussler D. Boolean feature discovery in empirical learning. Machine Learning. 1990;5(1):71-99.

Blumer A, Blumer J, Haussler D, McConnell R, Ehrenfeucht A. Complete inverted files for efficient text retrieval and analysis. Journal of the Association for Computing Machinery. 1987;34(3):578-95.

Haussler D, Welzl E. Epsilon-nets and simplex range queries. Discrete and Computational Geometry. 1987.

Blumer A, Blumer J, Haussler D, Ehrenfeucht A, Chen MT, Seiferas J. The smallest automaton recognizing the subwords of a text. Theoretical Computer Science. 1985;40:31-55.

Conference Papers

Pavlidis P, Furey T, Liberto M, Haussler D, Grundy WN. Promoter region-based classification of genes. Proceedings of the Pacific Symposium on Biocomputing. Hawaii, Jan 3-7, 2001:151-63.

Kent WJ, Kulp D, Wheeler R, Reese M, Zahler A, Haussler D. Alternative splicing of human genes. Genome Sequencing and Biology. Cold Spring Harbor, May 2000.

Haghighi F, Diekhans M, Grundy WN, Haussler D. Discriminative gene finding methods. Genome Sequencing and Biology. Cold Spring Harbor, May 2000.

Jaakkola T, Diekhans M, Haussler D. Using the Fisher kernel method to detect remote protein homologies [PowerPoint]. Proceedings Seventh International Conference on Intelligent Systems for Molecular Biology. AAAI Press. 1999;149-58. (Winner of best paper award)

Lazareva-Ulitsky B, Haussler D. A probabilistic approach to consensus multiple alignment. Proceedings of the Pacific Symposium on Biocomputing. World Scientific Press. 1999;150-61.

Jaakkola TS, Haussler D. Probabilistic kernel regression models. Proceedings of the 1999 Conference on AI and Statistics.

Jaakkola TS, Haussler D. Exploiting generative models in discriminative classifiers. Proceedings of the Tenth Conference on Advances in Neural Information Processing Systems, published in early 1999. Conference was held in Denver in Dec 1998.

Kulp D, Haussler D, Reese M, Eeckman F. Integrating database homology in a probabilistic gene structure model. Pacific Symposium on Biocomputing, Hawaii, January 1997;232-44, World Scientific Press.

Kulp D, Haussler D, Reese MG, Eeckman FH. A generalized hidden Markov model for the recognition of human genes in DNA. Proceedings of the Conference on Intelligent Systems in Molecular Biology, 1996, AAAI Press.

Fayyad U, Haussler D, Stolorz P. KDD for science data analysis: issues and examples. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining. Portland, Oregon, 1996.

Gulko B, Haussler D. Using multiple alignments and phylogenetic trees to detect RNA secondary structure. Proceedings of the Pacific Symposium on Biocomputing, Hunter L and Klein T (eds): World Scientific. 1996 Jan;350-67.

Haussler D, Opper M. General bounds on the mutual information between a parameter and n conditionally independent observations. Proceedings of the Eighth Annual Computational Learning Theory Conference (COLT). 1995:402-11, Santa Cruz, CA, ACM Press.

Haussler D, Opper M. Mutual information and Bayes methods for learning a distribution. Proceedings of the Workshop on the Theory of Neural Networks: The Statistical Mechanics Perspective. Scientific publisher, 1995;42-50.

Opper M, Haussler D. General bounds for predictive errors in supervised learning. Proceedings of the Workshop on the Theory of Neural Networks: The Statistical Mechanics Perspective. (Pohang, Korea 1995), pp.51-8, Proceedings ed. by Oh J-H, Kwon C, Cho S, World Scientific (Singapore), 1995.

Haussler D, Kivinen J, Warmuth M. Tight worst-case loss bounds for predicting with expert advice. Proceedings of the European Conference on Computational Learning Theory, (EuroCOLT). 1994. Published Lecture Notes in Computer Science. 1995;904:69-83.

Gibbs sampling for aligning RNA.

Grate L, Herbster M, Hughey R, Mian IS, Noller H, Haussler D. RNA modeling using Gibbs sampling and stochastic context-free grammars. Proceedings, 2nd International Conference on Intelligent Systems for Molecular Biology. 1994 Feb;235:1501-31.

Stormo GD, Haussler D. Optimally parsing a sequence into different classes based on multiple types of information. Second International Conference on Intelligent Systems in Molecular Biology. Menlo Park, CA. AAAI/MIT Press. 1994 Aug;369-75. [Abstract]

Haussler D, Kearns M, Seung S, Tishby N. Rigorous learning curve bounds from statistical mechanics. Proceedings of the Seventh ACM Conference on Computational Learning Theory, (COLT). ACM Press. 1994.

Haussler D. Quantifying the inductive bias in concept learning. AAAI-86 Proceedings.

  UC Santa Cruz Genomics Institute
1156 High St, Mail Stop CBSE,
University of California, Santa Cruz, CA 95064
831-459-1477