Summer Intensive Data Mining Training - CDA Data Analyst Official Website

2022-01-20

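2013 has been called the first year of big data. In the United States, big data applications are flourishing across industries, from Obama's presidential campaign at one end to data mining at Internet companies at the other. By 2014, data mining was already widely used in telecommunications, healthcare, banking, securities, insurance, manufacturing, commerce, market research, scientific research, education, and many other fields.

To keep pace with the data era, the RUC Economics Forum (人大经济论坛) will run a series of intensive data mining training courses this August.

The courses fall into two categories: "Business Data Mining with SAS and Weka" and "Data Mining Case Studies with SPSS Modeler".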

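Course	Dates (August 2014)	Venue	Price in CNY (regular / student)
Business Data Mining with SAS and Weka	Aug 2-3 and 9-10	Beijing, University of International Business and Economics	5000 / 4000
Data Mining Case Studies with SPSS Modeler	Aug 14-17	Beijing, University of International Business and Economics	4000 / 2500
Both courses (combined registration)	Aug 2-3, 9-10, and 14-17	Beijing, University of International Business and Economics	8000 / 5500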

Number 1:

Big-Data Business Data Mining with SAS and Weka

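Course content and objectives:

Content:

1. Fundamentals of data mining;

2. Using and operating common data mining tools;

3. Case studies of real applications;

4. Current hot topics and trends.

Objective:

To help participants, in the context of the big data era, master data mining tools as quickly as possible, extract useful information and value from commercial big data, and use it to create and refine business models.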

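Course features:

(1) Real cases run throughout the course, progressing from fundamentals to applications and from simple to advanced, explained in plain language; every chapter includes enough cases and data analysis to ensure participants master the basic operations of data mining.

(2) Live demonstrations of data mining in SAS and Weka; participants receive commonly used data mining toolkits and datasets free of charge.

(3) The lecture notes are based on data mining textbooks from well-known US business schools, selecting the material most relevant to current big data practice in China, and incorporate the instructor's latest research on frontier big data problems and practical experience.

(4) The content is divided into a Fundamentals part and an Applications part: the Fundamentals part emphasizes core data mining knowledge and techniques, while the Applications part connects data mining to practice, teaching how it is used in real business environments.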

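Course outline:

I. Fundamentals (two days): data mining basics + small-scale application examples

(1) Overview of data mining

1.1) Data mining in the big data era

1.2) Classic data mining cases

1.3) Ten steps to implementing data mining

1.4) Data mining development tools compared: SAS/Weka, Matlab, C++, Java, R, SPSS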

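(2) Four traditional data mining methods, with demonstrations of typical applications

2.1) Cluster analysis and its application to customer segmentation

2.2) Classification analysis and its application to document analysis

2.3) Association rules and their application to retail shopping promotions

2.4) Predictive analysis and its application to customer churn

2.5) Hands-on practice with the four traditional data mining algorithms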

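(3) Data mining for big data, part 1: text mining and the analysis of massive Internet data

3.1) Basic steps of mining large-scale text, how it differs from traditional mining, and an analysis of its main difficulties

3.2) Applications of frontier text mining techniques: web page analysis and public-opinion detection

3.3) Mining other unstructured data in the big data era (images, speech, video, sensor data, etc.) and its applications, e.g., image retrieval

(4) Data mining for big data, part 2: data mining on complex networks and its application to social network analysis

4.1) Complex network analysis: a typical big data model

4.2) Social network applications: friend-group discovery and social search (Graph search)

4.3) Ranking analysis and its application to web search

4.4) A picture is worth a thousand words: visualizing mining results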

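(5) Business applications of data mining

5.1) Fraud detection with data mining

5.2) Basic principles of recommender systems in product sales

5.3) Applications of recommender systems

(6) Opportunities and challenges of the big data era

6.1) Understanding the big data era

6.2) A survey of big data technology: from infrastructure and software platforms to computing paradigms and mining models

6.3) What big data will change for us: opportunities and challenges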

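II. Applications (two days): mining real big data in real business settings

(1) Advanced data mining topics: dimensionality reduction, ensemble learning, etc.

(2) Risk-prediction scoring in finance; data source: a foreign multinational commercial bank

(3) Large-scale recommender systems in e-commerce; data source: one of China's largest e-commerce websites

(4) Applications of complex network analysis in information science and finance; data source: Reuters and a large state-owned bank

(5) Text mining and its applications; data source: Reuters and the PubMed database

(6) An introduction to implementing data mining on a big data platform, using Alibaba's ODPS platform

Fundamentals + applications: the most systematic and comprehensive data mining course!

(For details, see: http://bbs.pinggu.org/thread-3101498-1-1.html)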
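Outline item 2.1 pairs cluster analysis with customer segmentation. As a rough illustration only (the course itself demonstrates SAS and Weka, not Python), here is a minimal k-means sketch using the standard library; the function name `kmeans`, the two features (monthly spend, visits per month), and the customer data are all hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (monthly spend, visits per month).
customers = [(200, 1), (220, 2), (250, 1), (1500, 8), (1600, 9), (1550, 10)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # cluster ids; the three low spenders share one id
```

k-means is only one clustering method. With features on very different scales, like these two, the data would normally be standardized first so that spend does not dominate the distance.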

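Number 2:

Data Mining Case Studies with SPSS Modeler

Dates: August 14-17, 2014 (4 days) @ Beijing, University of International Business and Economics

Schedule:

(1) Format: interactive multimedia instruction in Chinese

(2) Hours: 9:00-12:00 and 13:30-16:30 (Q&A 16:30-17:00)

(3) Software: SPSS Modeler

Invitation letter download: Data Mining Class Invitation Letter (access code: ec01)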

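Instructor:

Yue-Shi Lee (李御玺), Ph.D. in Computer Engineering, National Taiwan University; Professor, Department Chair, and Graduate Institute Director of Computer Engineering at Ming Chuan University; Director of the Ming Chuan University Data Mining Center; advisor to the data mining centers of Xiamen University and Renmin University of China. His research focuses on data warehousing, data mining, and text mining.

He has published more than 260 research papers in these fields and has led multiple research projects for the National Science Council and the Ministry of Education. Clients he has served include: the Administration for Industry and Commerce (中国工商局), CITIC Bank (中信银行), Taishin Bank, Union Bank, Shin Kong Bank, Hsinchu International Bank (since merged into Standard Chartered), First Bank, SinoPac Bank, Far Eastern Bank, MetLife, Chia-Yi Christian Hospital, and Microsoft Taiwan; retailers such as Helena Rubinstein cosmetics and HOLA; airlines such as China Eastern Airlines and China Airlines; automakers such as Ford; and government agencies such as the tax bureau.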

Course outline:

Case 1: Drug Treatments: In this case, imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications. Part of your job is to use data mining to find out which drug might be appropriate for a future patient with the same illness.

Industry: healthcare. Technique: classification with a multi-class decision tree, used to determine which drug suits which type of patient.
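The core step in growing a decision tree like the one in Case 1 is choosing which attribute to split on. The sketch below scores candidate multiway splits by weighted Gini impurity, the criterion used by CART-style trees; SPSS Modeler's tree nodes may use other criteria (C5.0, for example, uses information-based measures). The helper names `gini` and `best_split`, the attribute names, and the patient records are hypothetical toy data, not the course dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(records, attributes, target):
    """Return (attribute, weighted Gini) for the categorical attribute whose
    multiway split gives the lowest weighted Gini impurity of the target."""
    best = None
    for attr in attributes:
        # Group the target labels by each value of the candidate attribute.
        groups = {}
        for r in records:
            groups.setdefault(r[attr], []).append(r[target])
        weighted = sum(len(g) / len(records) * gini(g) for g in groups.values())
        if best is None or weighted < best[1]:
            best = (attr, weighted)
    return best


# Hypothetical toy patients (not the course data).
patients = [
    {"BP": "HIGH", "Cholesterol": "HIGH", "Drug": "drugA"},
    {"BP": "HIGH", "Cholesterol": "NORMAL", "Drug": "drugA"},
    {"BP": "LOW", "Cholesterol": "HIGH", "Drug": "drugC"},
    {"BP": "LOW", "Cholesterol": "NORMAL", "Drug": "drugC"},
    {"BP": "NORMAL", "Cholesterol": "HIGH", "Drug": "drugX"},
    {"BP": "NORMAL", "Cholesterol": "NORMAL", "Drug": "drugX"},
]
print(best_split(patients, ["BP", "Cholesterol"], "Drug"))  # → ('BP', 0.0)
```

A full tree applies this choice recursively to each resulting group until the groups are pure or too small to split further.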

Case 2: Modeling Customer Response: This case is based on a company that wants to achieve more profitable results in future marketing campaigns by matching the right offer to each customer. Specifically, this case identifies the characteristics of customers who are most likely to respond, based on previous promotions, and generates a mailing list based on the results.

Industry: retail. Technique: classification with a decision list, used to profile the customers likely to respond to a campaign and to generate the promotional mailing list from the model's results.

Case 3: Classifying Telecommunications Customers: Suppose a telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. If demographic data can be used to predict group membership, you can customize offers for individual prospective customers.

Industry: telecommunications. Technique: classification with multinomial logistic regression on demographic data, used to characterize the four customer groups and to identify potential new customers for each group.

Case 4: Telecommunications Churn: Suppose a telecommunications provider is concerned about the number of customers it is losing to competitors. If service usage data can be used to predict which customers are liable to transfer to another provider, offers can be customized to retain as many customers as possible. This example focuses on using usage data to predict customer loss (churn).

Industry: telecommunications. Technique: classification with binomial logistic regression, used to predict churn from service usage data.
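Case 4 relies on binomial logistic regression. To show what such a model fits, here is a from-scratch sketch trained by batch gradient descent on log-loss; it is not how SPSS Modeler's Logistic node is implemented, and the helper names, the two usage features (complaint calls last month, years of tenure), the learning rate, and the six customers are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Binary logistic regression by batch gradient descent on log-loss.
    X: list of feature lists; y: list of 0/1 labels. Returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error between predicted churn probability and the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # One step along the averaged gradient.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def churn_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical customers: [complaint calls last month, years of tenure].
X = [[5, 0.5], [4, 1.0], [6, 0.3], [0, 4.0], [1, 3.0], [0, 5.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = churned
w, b = fit_logistic(X, y)
print(round(churn_probability(w, b, [5, 1.0]), 3))
```

Customers whose predicted probability exceeds a chosen cut-off would be the targets for retention offers; the cut-off itself is a business decision, not part of the model.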

Case 5: Forecasting Bandwidth Utilization: An analyst for a national broadband provider is required to produce forecasts of user subscriptions in order to predict utilization of bandwidth. Forecasts are needed for each of the local markets that make up the national subscriber base. This example will use time series modeling to produce forecasts for the next three months for a number of local markets.

Industry: communications. Technique: forecasting with a simple time series model. Because national usage is the sum of usage across the local markets, bandwidth usage for the next three months is forecast separately for each local market.

Case 6: Forecasting Catalog Sales: A catalog company is interested in forecasting monthly sales of its men's clothing line, based on its sales data for the last 10 years. This example takes a closer look at the two methods that are available when choosing a model yourself: exponential smoothing and ARIMA.

Industry: retail. Technique: forecasting with two more complex time series models, exponential smoothing and ARIMA.
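Case 6 names exponential smoothing and ARIMA. The simplest member of the smoothing family, simple exponential smoothing, updates a level s_t = alpha * x_t + (1 - alpha) * s_(t-1), so recent months weigh more than older ones. The sketch below assumes the level starts at the first observation and that alpha is chosen by hand; the function names and sales figures are hypothetical, and ARIMA is not shown.

```python
def simple_exponential_smoothing(series, alpha):
    """Smoothed levels s_t = alpha * x_t + (1 - alpha) * s_(t-1), with s_0 = x_0."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        levels.append(level)
    return levels


def forecast(series, alpha, horizon):
    """Simple smoothing yields a flat forecast: the last level, repeated."""
    last_level = simple_exponential_smoothing(series, alpha)[-1]
    return [last_level] * horizon


monthly_sales = [100, 120, 110, 130]  # hypothetical monthly figures
print(simple_exponential_smoothing(monthly_sales, 0.5))  # → [100, 110.0, 110.0, 120.0]
print(forecast(monthly_sales, 0.5, 3))  # → [120.0, 120.0, 120.0]
```

Because this variant has no trend or seasonal component, its forecast is flat. For seasonal monthly clothing sales like the case's, a seasonal smoothing method (for example Holt-Winters) or ARIMA would be the more natural choice.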

Case 7: Making Offers to Customers: This example teaches you how to predict which offers are most appropriate for customers and the probability of the offers being accepted. These sorts of models are most beneficial in customer relationship management, such as marketing applications or call centers.

Industry: banking. Technique: classification with a Self-Learning Response Model, used to predict how likely customers are to accept different product offers and therefore which products to offer to which customers. Models of this kind suit targeted marketing and call centers in customer relationship management.

Case 8: Predicting Loan Defaulters: Suppose a bank is concerned about the potential for loans not to be repaid. If previous loan default data can be used to predict which potential customers are liable to have problems repaying loans, these “bad risk” customers can either be declined a loan or offered alternative products.

Industry: banking. Technique: classification with a Bayesian network, used to predict from customers' past loan data the probability that a new borrower will default once the loan is approved, as a basis for deciding whether to approve the loan or to offer other types of loan products.

Case 9: Retail Sales Promotion: This example deals with data that describes retail product lines and the effects of promotion on sales. The goal of this example is to predict the effects of future sales promotions.

Industry: retail. Technique: prediction with a neural network and a regression tree, trained on past promotion records for retail products to forecast the sales effects of future promotions.

Case 10: Condition Monitoring: This example concerns monitoring status information from a machine and the problem of recognizing and predicting fault states. The data consists of a number of concatenated series measured over time. Each record is a snapshot report on the machine.

Industry: information technology. Technique: classification with a neural network and a decision tree, used to predict the probability of machine failure from the monitoring data.

Case 11: Classifying Cell Samples: A medical researcher has obtained a dataset containing characteristics of a number of human cell samples extracted from patients who were believed to be at risk of developing cancer. Analysis of the original data showed that many of the characteristics differed significantly between benign and malignant samples. The researcher wants to develop a model to give an early indication of whether their samples might be benign or malignant.

Industry: healthcare. Technique: classification with a support vector machine, used to give an early indication of whether a sample is benign or malignant.

Case 12: Market Basket Analysis: This example deals with data describing the contents of supermarket baskets (that is, collections of items bought together) plus the associated personal data of the purchaser, which might be acquired through a loyalty card scheme. The goal is to discover groups of customers who buy similar products and can be characterized demographically, such as by age, income, and so on.

Industry: retail. Technique: association rules (Apriori) combined with a decision tree, used to find groups of customers who buy similar products and to profile those groups, for example by age and income.
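The Apriori half of Case 12 can be sketched in a few lines: count support level by level, keep only itemsets at or above a minimum support, and prune candidates that contain an infrequent subset. This is a plain-Python illustration, not SPSS Modeler's Apriori node; the function names, the baskets, and the 0.5 support threshold are hypothetical, and the decision-tree profiling step is not shown.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for every itemset
    whose support (share of baskets containing it) is >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = support(A union B) / support(A); both must be frequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical baskets.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "wine"},
    {"chips", "salsa"},
]
frequent = apriori(baskets, min_support=0.5)
print(confidence(frequent, {"salsa"}, {"chips"}))  # → 1.0
```

Here every basket with salsa also contains chips, so the rule salsa -> chips has confidence 1.0, while {beer, salsa} appears in only one basket of four and is dropped.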

Twelve real-world cases to master data mining!

(For details, see: http://bbs.pinggu.org/thread-3019568-1-1.html)