(1) G. Malewicz, M.H. Austern, A.J.C. Bik, J.C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: a system for large-scale graph processing,” Proc. ACM SIGMOD Conference, 2010.
(2) Y. Bu, B. Howe, M. Balazinska, and M. Ernst, “HaLoop: efficient iterative data processing on large clusters,” Proc. Intl. Conf. on Very Large Databases, 2010.
(3) F.N. Afrati and J.D. Ullman, “Optimizing joins in a MapReduce environment,” Proc. Thirteenth Intl. Conf. on Extending Database Technology, 2010.
2. Finding Similar Items (Hashing)
(1) Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. “Locality-sensitive hashing scheme based on p-stable distributions.” In Proceedings of the twentieth annual symposium on Computational geometry, pp. 253-262. ACM, 2004.
(2) A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,” Comm. ACM 51:1, pp. 117–122, 2008.
(3) M. Theobald, J. Siddharth, and A. Paepcke, “SpotSigs: robust and efficient near duplicate detection in large web collections,” 31st Annual ACM SIGIR Conference, July, 2008, Singapore.
3. Stream Data Processing
(1) B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and issues in data stream systems,” Symposium on Principles of Database Systems, pp. 1–16, 2002.
(2) M. Datar, A. Gionis, P. Indyk, and R. Motwani, “Maintaining stream statistics over sliding windows,” SIAM J. Computing 31, pp. 1794–1813, 2002.
4. Link Analysis
(1) Gyöngyi, Zoltán, Hector Garcia-Molina, and Jan Pedersen. “Combating web spam with trustrank.” Proceedings of the Thirtieth international conference on Very large data bases-Volume 30. VLDB Endowment, 2004.
(2) Z. Gyöngyi, P. Berkhin, H. Garcia-Molina, and J. Pedersen, “Link spam detection based on mass estimation,” Proc. 32nd Intl. Conf. on Very Large Databases, pp. 439–450, 2006.
5. Clustering
(1) B. Babcock, M. Datar, R. Motwani, and L. O’Callaghan, “Maintaining variance and k-medians over data stream windows,” Proc. ACM Symp. on Principles of Database Systems, pp. 234–243, 2003.
(2) T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: an efficient data clustering method for very large databases,” Proc. ACM SIGMOD Intl. Conf. on Management of Data, pp. 103–114, 1996.
(3) S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering algorithm for large databases,” Proc. ACM SIGMOD Intl. Conf. on Management of Data, pp. 73–84, 1998.
6. Advertising
(1) N. Craswell, O. Zoeter, M. Taylor, and W. Ramsey, “An experimental comparison of click-position bias models,” Proc. Intl. Conf. on Web Search and Web Data Mining, pp. 87–94, 2008.
(2) A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani, “Adwords and generalized on-line matching,” IEEE Symp. on Foundations of Computer Science, pp. 264–273, 2005.
7. Social Network
(1) Afrati F N, Fotakis D, Ullman J D. “Enumerating subgraph instances using map-reduce.” Data Engineering (ICDE), 2013 IEEE 29th International Conference on. IEEE, 2013: 62-73.
(2) L. Backstrom and J. Leskovec, “Supervised random walks: predicting and recommending links in social networks,” Proc. Fourth ACM Intl. Conf. on Web Search and Data Mining (2011), pp. 635–644.
(3) S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” Proc. WWW Conference (2011).
8. Large-Scale Machine Learning 1
(1) L. Bottou, “Large-scale machine learning with stochastic gradient descent,” Proc. 19th Intl. Conf. on Computational Statistics (2010), pp. 177–187, Springer.
(2) L. Bottou, “Stochastic gradient tricks,” in Neural Networks, Tricks of the Trade, Reloaded, pp. 430–445, Edited by G. Montavon, G.B. Orr and K.-R. Mueller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.
9. Large-Scale Machine Learning 2
(1) Panda, Biswanath, et al. “Planet: massively parallel learning of tree ensembles with mapreduce.” Proceedings of the VLDB Endowment 2.2 (2009): 1426-1437.
(2) Zhu, Kaihua, et al. “Parallelizing support vector machines on distributed computers.” Advances in Neural Information Processing Systems. 2008.
10. Large-Scale Graph Analysis
(1) Andersen, Reid, Fan Chung, and Kevin Lang. “Using pagerank to locally partition a graph.” Internet Mathematics 4.1 (2007): 35-64.
(2) Yang, Jaewon, and Jure Leskovec. “Overlapping community detection at scale: a nonnegative matrix factorization approach.” Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
11. Crowd Sourcing
(1) Donmez, Pinar, Jaime G. Carbonell, and Jeff Schneider. “Efficiently learning the accuracy of labeling sources for selective sampling.” Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
(2) Welinder, Peter, et al. “The multidimensional wisdom of crowds.” Advances in neural information processing systems. 2010.
12. Knowledge Graph
(1) Tandon, Niket, et al. “WebChild: harvesting and organizing commonsense knowledge from the web.” Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 2014.
(2) Chen, Xinlei, Abhinav Shrivastava, and Abhinav Gupta. “NEIL: Extracting visual knowledge from web data.” Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013.
13. Deep Learning
(1) Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. “A fast learning algorithm for deep belief nets.” Neural computation 18.7 (2006): 1527-1554.
(2) Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.