"Data Acquisition" Course Syllabus

Published by: 钟程凯    Date published: 2022-07-05

I. Basic Course Information

Course Code: 18100053

Course Title: Data Acquisition (数据采集)

Course Category: Major Course

Hours: 48 in total (32 lecture hours and 16 lab hours)

Credits: 3

Intended Major: Data Science and Big Data Technology

Assessment Method: Examination

Prerequisites: None

II. Course Description

This course teaches the fundamentals of data acquisition, that is, using web crawlers to collect large volumes of data from the Internet. Topics include how the Web works; the basics of HTML; building crawlers with the standard library urllib and third-party libraries such as requests and selenium; building more complex crawlers with the Scrapy framework; scraping data that becomes available only after submitting forms or executing JavaScript; techniques for dealing with anti-crawler mechanisms; and the ethical and legal constraints that must be observed when crawling. After completing the course, students should be able to use crawlers to collect large volumes of data from the Internet. Students will master the basic ideas and techniques of Python crawlers, laying a solid foundation for subsequent courses such as big data analysis and machine learning. More importantly, the course develops students' ways of thinking about, and basic programming skills for, solving practical problems with computers.
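As a rough illustration of the kind of building block the description refers to, the sketch below uses only the Python standard library to extract links from an HTML page and filter them against robots.txt rules. It is a minimal sketch, not course material: the names `LinkExtractor`, `extract_links`, and `allowed_links`, the sample HTML, the sample robots.txt, and the `example.com` URLs are all hypothetical, and the page content is an inline string so that no network request is made.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def allowed_links(links, robots_txt, user_agent="*"):
    """Keep only the links that the given robots.txt text permits."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in links if rules.can_fetch(user_agent, url)]


# Assumed sample data standing in for a downloaded page and its robots.txt.
SAMPLE_HTML = """
<html><body>
  <a href="/about.html">About</a>
  <a href="news/2022.html">News</a>
  <a href="/private/data.html">Private</a>
  <a name="top">Anchor without href</a>
</body></html>
"""
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"

links = extract_links(SAMPLE_HTML, "https://example.com/")
print(allowed_links(links, SAMPLE_ROBOTS))
```

In a real crawler the HTML and robots.txt would be fetched over HTTP (for example with urllib or requests); checking robots.txt before fetching is one concrete way to respect the constraints mentioned above.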