
Clean Data: An Introduction to Data Cleaning and Its Practice (干净的数据 数据清洗入门与实践)

Editor's Recommendation

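Data cleaning is an indispensable part of data mining and analysis, but because data types are extremely complex, the traditional work of cleaning dirty data is tedious and exceptionally laborious. With the right tools and methods, data cleaning can yield twice the result for half the effort.
Starting from basic concepts such as file formats, data types, and character encodings, this book uses real examples to explore how to extract and clean data from relational databases, web files, and PDF documents. It provides two real projects that let readers put all the data cleaning techniques into practice and complete the entire data science process.
If you are a data scientist or work in data science, even as a beginner, this book is for you as long as you are interested in data cleaning!
- Understand the role of data cleaning in the overall data science process
- Master the fundamentals of data cleaning, including file cleaning, data types, and character encodings
- Discover important spreadsheet and text editor features for organizing and manipulating data
- Learn to convert between common data formats such as JSON, CSV, and some special-purpose formats
- Use three strategies to parse and clean data in HTML files
- Unlock the secrets of PDF documents and extract the data you need
- Use a range of solutions to clean bad data stored in relational databases
- Create your own clean datasets, package them, add licenses, and share them with others
- Complete two real projects using the tools in the book along with Twitter and Stack Overflow data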
Content Description

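This book covers: the important role of data cleaning in data science; the basic concepts of file formats, data types, and character encodings; spreadsheets and text editors for organizing and processing data; methods for converting between data formats; three strategies for parsing and cleaning HTML files from the web; methods for extracting and cleaning data from PDF files; solutions for detecting and removing bad data in an RDBMS; and using the methods introduced in the book to clean data from Twitter and Stack Overflow.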
Author Description

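Megan Squire is a professor of computing sciences at Elon University, where she mainly teaches database systems, web development, data mining, and data science. She has twenty years of experience collecting and cleaning data. She also leads the FLOSSmole research project, which collects and analyzes data in order to study the development of free, libre, and open source software.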
Table of Contents

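Chapter 1: Why You Need to Clean Data
1.1 A New Perspective
1.2 The Data Science Process
1.3 Communicating the Data Cleaning Work
1.4 The Data Cleaning Environment
1.5 An Introductory Example
1.6 Summary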
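Chapter 2: Fundamentals: Formats, Types, and Encodings
2.1 File Formats
2.1.1 Text Files vs. Binary Files
2.1.2 Common Text File Formats
2.1.3 Delimited Formats
2.2 Archiving and Compression
2.2.1 Archive Files
2.2.2 Compressed Files
2.3 Data Types, Nulls, and Encodings
2.3.1 Data Types
2.3.2 Converting Between Data Types
2.3.3 Conversion Strategies
2.3.4 Nulls Hidden in the Data Forest
2.3.5 Character Encodings
2.4 Summary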
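Chapter 3: The Workhorses of Data Cleaning: Spreadsheets and Text Editors
3.1 Data Cleaning in Spreadsheets
3.1.1 Excel's Text to Columns Feature
3.1.2 Splitting Strings
3.1.3 Concatenating Strings
3.2 Data Cleaning in Text Editors
3.2.1 Text Adjustment
3.2.2 Column Mode
3.2.3 Enhanced Find and Replace
3.2.4 Sorting Text and Removing Duplicates
3.2.5 Process Lines Containing
3.3 Example Project
3.3.1 Step 1: Problem Statement
3.3.2 Step 2: Data Collection
3.3.3 Step 3: Data Cleaning
3.3.4 Step 4: Data Analysis
3.4 Summary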
Chapter 4: Speaking a Common Language: Data Conversion
4.1 Quick Tool-Based Conversions
4.1.1 Spreadsheet to CSV
4.1.2 Spreadsheet to JSON
4.1.3 Generating CSV or JSON from SQL Statements with phpMyAdmin
4.2 Data Conversion with PHP
4.2.1 SQL to JSON with PHP
4.2.2 SQL to CSV with PHP
4.2.3 JSON to CSV with PHP
4.2.4 CSV to JSON with PHP
4.3 Data Conversion with Python
4.3.1 CSV to JSON with Python
4.3.2 CSV to JSON with csvkit
4.3.3 JSON to CSV with Python
4.4 Example Project
4.4.1 Step 1: Download Facebook Data in GDF Format
4.4.2 Step 2: View the GDF File in a Text Editor
4.4.3 Step 3: Convert from GDF to JSON
4.4.4 Step 4: Build a D3 Diagram
4.4.5 Step 5: Convert the Data to Pajek Format
4.4.6 Step 6: Simple Social Network Analysis
4.5 Summary
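Chapter 5: Collecting and Cleaning Data from the Web
5.1 Understanding HTML Page Structure
5.1.1 The Line-Delimited Model
5.1.2 The Tree Structure Model
5.2 Method 1: Python and Regular Expressions
5.2.1 Step 1: Find and Save a Web File for Experimenting
5.2.2 Step 2: Examine the File Contents and Identify the Valuable Data
5.2.3 Step 3: Write a Python Program to Save the Data to a CSV File
5.2.4 Step 4: View the File and Confirm the Cleaning Results
5.2.5 Limitations of Parsing HTML with Regular Expressions
5.3 Method 2: Python and BeautifulSoup
5.3.1 Step 1: Find and Save a File for Experimenting
5.3.2 Step 2: Install BeautifulSoup
5.3.3 Step 3: Write a Python Program to Extract the Data
5.3.4 Step 4: View the File and Confirm the Cleaning Results
5.4 Method 3: Chrome Scraper
5.4.1 Step 1: Install the Scraper Chrome Extension
5.4.2 Step 2: Collect Data from the Website
5.4.3 Step 3: Clean the Data
5.5 Example Project: Extracting Data from Email and Forums
5.5.1 Project Background
5.5.2 Part 1: Cleaning Data from Google Groups Email
5.5.3 Part 2: Cleaning Data from Web Forums
5.6 Summary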
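Chapter 6: Cleaning Data in PDF Files
6.1 Why PDF Files Are Hard to Clean
6.2 The Simple Approach: Copying
6.2.1 Our Experimental File
6.2.2 Step 1: Copy Out the Data We Need
6.2.3 Step 2: Paste the Copied Data into a Text Editor
6.2.4 Step 3: A Lightweight File
6.3 The Second Technique: pdfMiner
6.3.1 Step 1: Install pdfMiner
6.3.2 Step 2: Extract Text from the PDF File
6.4 The Third Technique: Tabula
6.4.1 Step 1: Download Tabula
6.4.2 Step 2: Run Tabula
6.4.3 Step 3: Extract Data with Tabula
6.4.4 Step 4: Copy the Data
6.4.5 Step 5: Further Cleaning
6.5 When All Else Fails: The Fourth Technique
6.6 Summary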
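Chapter 7: RDBMS Cleaning Techniques
7.1 Getting Ready
7.2 Step 1: Download and Examine Sentiment140
7.3 Step 2: Clean the Data for Import
7.4 Step 3: Import the Data into MySQL
7.4.1 Detecting and Cleaning Abnormal Data
7.4.2 Creating Our Own Table
7.5 Step 4: Clean the & Character
7.6 Step 5: Clean Other Unknown Characters
7.7 Step 6: Clean the Dates
7.8 Step 7: Separate User Mentions, Hashtags, and URLs
7.8.1 Create Some New Tables
7.8.2 Extract User Mentions
7.8.3 Extract Hashtags
7.8.4 Extract URLs
7.9 Step 8: Clean the Lookup Tables
7.10 Step 9: Record the Steps Taken
7.11 Summary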
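Chapter 8: Best Practices for Sharing Data
8.1 Preparing a Clean Data Package
8.2 Documenting Your Data
8.2.1 README Files
8.2.2 File Headers
8.2.3 Data Models and Diagrams
8.2.4 Wikis or CMS
8.3 Setting Terms of Use and Licenses for Your Data
8.4 Publishing Data
8.4.1 Lists of Datasets
8.4.2 Open Data on Stack Exchange
8.4.3 Hackathons
8.5 Summary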
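Chapter 9: Stack Overflow Project
9.1 Step 1: Questions About Stack Overflow
9.2 Step 2: Collect and Store the Stack Overflow Data
9.2.1 Download the Stack Overflow Data
9.2.2 Unzip the Files
9.2.3 Create MySQL Tables and Load the Data
9.2.4 Build Test Tables
9.3 Step 3: Data Cleaning
9.3.1 Create New Tables
9.3.2 Extract URLs and Populate the New Tables
9.3.3 Extract Code and Populate the New Tables
9.4 Step 4: Data Analysis
9.4.1 Which Code-Sharing Sites Are Most Popular
9.4.2 Which Code-Sharing Sites Appear in Questions and in Answers
9.4.3 Do Posts Contain Both Code-Sharing URLs and Source Code
9.5 Step 5: Data Visualization
9.6 Step 6: Problem Resolution
9.7 Moving from Test Tables to Full Tables
9.8 Summary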
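Chapter 10: Twitter Project
10.1 Step 1: Questions About an Archive of Tweets
10.2 Step 2: Collect the Data
10.2.1 Download and Extract the Ferguson Data File
10.2.2 Create a Test File
10.2.3 Process the Tweet IDs
10.3 Step 3: Data Cleaning
10.3.1 Create the Tables
10.3.2 Populate the New Tables with Python
10.4 Step 4: Simple Data Analysis
10.5 Step 5: Data Visualization
10.6 Step 6: Problem Resolution
10.7 Applying the Process to the Full (Non-Test) Tables
10.8 Summary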

Specifications

Brand Jingdong book
Brand Origin China

Disclaimer

Product packaging, specifications and price are subject to change without notice. All information about the products on our website is provided for information purposes only. Please always read labels, warnings and directions provided with the product before use.
