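Product Overview
The Web as a Database
- X-SQL - A SQL engine built on top of the Web that treats the Web and local databases alike
- Artificial intelligence - AI-driven automatic web mining that, with little or no human intervention, restores web pages at very large scale into complete and accurate data
- Elastic computing - A distributed web page rendering engine that scales to data collection jobs of any size
- Business intelligence - Apply business intelligence on the Web to capture thousands of high-value events and answer the business questions that matter most
-- Turn a set of Amazon product pages into a local table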
select
dom_base_uri(dom) as `url`,
dom_first_text(dom, '#productTitle') as `title`,
str_substring_after(dom_first_href(dom, '#wayfinding-breadcrumbs_container ul li:last-child a'), '&node=') as `category`,
dom_first_slim_html(dom, '#bylineInfo') as `brand`,
cast(dom_all_slim_htmls(dom, '#imageBlock img') as varchar) as `gallery`,
dom_first_slim_html(dom, '#landingImage, #imgTagWrapperId img, #imageBlock img:expr(width > 400)') as `img`,
dom_first_text(dom, '#price tr td:contains(List Price) ~ td') as `listprice`,
dom_first_text(dom, '#price tr td:matches(^Price) ~ td') as `price`,
str_first_float(dom_first_text(dom, '#reviewsMedley .AverageCustomerReviews span:contains(out of)'), 0.0) as `score`
from load_out_pages('https://www.amazon.com/b?node=3117954011', 'a[href~=/dp/]', 1, 10);
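Intelligent Mining
Restore a website to data with zero intervention
Given an entry link, Platon AI identifies, browses, and interprets the most important outgoing pages, then outputs every field: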
select * from harvest('https://www.amazon.com/b?node=3117954011');
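The AI browsed 120 web pages and recognized 8 data groups containing 87 fields in total. Group 2 is shown below; it contains 10 fields and corresponds to the page region #centerCol.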
C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | |||
1 | Amazon.com: BLACK+DECKER 6 quart 11-in-1 Cooking Pot, Stainless Steel, Pressure Cooker, Slow Cooker, Multi-Cooker, PR100 | BLACK+DECKER 6 quart 11-in-1 Cooking Pot, Stainless Steel, Pressure Cooker, Slow Cooker, Multi-Cooker, PR100 | by | BLACK+DECKER | 4.2 out of 5 stars | 129 ratings | | | 89 answered questions | + No Import Fees Deposit & ¥40.72 Shipping to Hong Kong | New (5) from | ¥54.17 | |
2 | Amazon.com: BLACK+DECKER 6 quart 11-in-1 Cooking Pot, Stainless Steel, Pressure Cooker, Slow Cooker, Multi-Cooker, PR100 | BLACK+DECKER 6 quart 11-in-1 Cooking Pot, Stainless Steel, Pressure Cooker, Slow Cooker, Multi-Cooker, PR100 | by | BLACK+DECKER | 4.2 out of 5 stars | 129 ratings | | | 89 answered questions | + No Import Fees Deposit & ¥40.72 Shipping to Hong Kong | New (5) from | ¥54.17 | |
3 | Amazon.com: Crock Pot 6 Quart 8 in 1 Multi Use Express Crock Programmable Pressure Cooker, Slow Cooker, Sauté & Steamer | Stainless Steel (SCCPPC60... | Crock Pot 6 Quart 8 in 1 Multi Use Express Crock Programmable Pressure Cooker, Slow Cooker, Sauté & Steamer | Stainless Steel (SCCPPC600 V1) | by | Crockpot | 4.2 out of 5 stars | 2,086 ratings | | | 670 answered questions | There is a newer model of this item: | New (31) from | ¥74.79 | |
4 | Amazon.com: Crockpot Thermoshield 6 Quart Manual Slow Cooker, Black | Crockpot Thermoshield 6 Quart Manual Slow Cooker, Black | by | Crockpot | 4.1 out of 5 stars | 150 ratings | | | 47 answered questions | + No Import Fees Deposit & ¥47.40 Shipping to Hong Kong | New & Used (12) from | ¥59.99 | |
5 | Amazon.com: GoWISE USA GW22637 4th-Generation Electric Pressure Cooker with rice scooper, and measuring cup, 14 QT | GoWISE USA GW22637 4th-Generation Electric Pressure Cooker with rice scooper, and measuring cup, 14 QT | by | GoWISE USA | 3.9 out of 5 stars | 927 ratings | | | 498 answered questions | + No Import Fees Deposit & ¥70.96 Shipping to Hong Kong | New & Used (4) from | ¥113.18 | |
6 | Amazon.com: GoWISE USA GW22637 4th-Generation Electric Pressure Cooker with rice scooper, and measuring cup, 14 QT | GoWISE USA GW22637 4th-Generation Electric Pressure Cooker with rice scooper, and measuring cup, 14 QT | by | GoWISE USA | 3.9 out of 5 stars | 927 ratings | | | 498 answered questions | + No Import Fees Deposit & ¥70.96 Shipping to Hong Kong | New & Used (4) from | ¥113.18 | |
7 | Amazon.com: GoWISE USA GW22637 4th-Generation Electric Pressure Cooker with rice scooper, and measuring cup, 14 QT | GoWISE USA GW22637 4th-Generation Electric Pressure Cooker with rice scooper, and measuring cup, 14 QT | by | GoWISE USA | 3.9 out of 5 stars | 927 ratings | | | 498 answered questions | + No Import Fees Deposit & ¥70.96 Shipping to Hong Kong | New & Used (4) from | ¥113.18 | |
8 | Amazon.com: Gourmia GPC400 4 Qt Digital Multi-Mode SmartPot Pressure Cooker - 13 Cook Modes - Removable Pot - 24-Hour Delay Timer - Automatic Keep ... | Gourmia GPC400 4 Qt Digital Multi-Mode SmartPot Pressure Cooker - 13 Cook Modes - Removable Pot - 24-Hour Delay Timer - Automatic Keep Warm - LCD Display - Pressure Sensor Lid Lock - Recipe Book | by | Gourmia | 4.2 out of 5 stars | 363 ratings | | | 171 answered questions | + No Import Fees Deposit & ¥31.80 Shipping to Hong Kong | |||
9 | Amazon.com: Mealthy MultiPot 9-in-1 Programmable Pressure Cooker 6 Quarts with Stainless Steel Pot, Steamer Basket, instant access to recipe app. P... | Mealthy MultiPot 9-in-1 Programmable Pressure Cooker 6 Quarts with Stainless Steel Pot, Steamer Basket, instant access to recipe app. Pressure cook, slow cook, sauté, rice cooker, yogurt, steam | by | Mealthy | 4.7 out of 5 stars | 1,593 ratings | | | 934 answered questions | New & Used (3) from | ¥169.99 | ||
10 | Amazon.com: Ninja Instant, 1000-Watt Pressure, Slow, Multi Cooker, and Steamer with 6-Quart Ceramic Coated Pot & Steam Rack (PC101), Si, Black/Silver | Ninja Instant, 1000-Watt Pressure, Slow, Multi Cooker, and Steamer with 6-Quart Ceramic Coated Pot & Steam Rack (PC101), Si, Black/Silver | by | Ninja | 4.7 out of 5 stars | 120 ratings | | | 65 answered questions | This product is available as Renewed. | New & Used (11) from | ¥54.95 | |
11 | Amazon.com: Power Pressure Cooker XL 10 Qt | Power Pressure Cooker XL 10 Qt | by | Power Pressure Cooker XL | 4.1 out of 5 stars | 2,977 ratings | | | 1000+ answered questions | + No Import Fees Deposit & ¥51.68 Shipping to Hong Kong | New & Used (6) from | ¥159.00 | |
12 | Amazon.com: Presto 02141 6-Quart Electric Pressure Cooker, Stainless, Black, Silver | Presto 02141 6-Quart Electric Pressure Cooker, Stainless, Black, Silver | by | Presto | 4.2 out of 5 stars | 54 ratings | | | 17 answered questions | + No Import Fees Deposit & ¥38.45 Shipping to Hong Kong | New & Used (33) from | ¥59.99 |
SaaS Service
POST http://api.platonic.fun/api/x/a/q
Content-Type: application/json
{
"sql": "select
dom_base_uri(dom) as `url`,
dom_first_text(dom, '#productTitle') as `title`,
str_substring_after(dom_first_href(dom, '#wayfinding-breadcrumbs_container ul li:last-child a'), '&node=') as `category`,
dom_first_slim_html(dom, '#bylineInfo') as `brand`,
cast(dom_all_slim_htmls(dom, '#imageBlock img') as varchar) as `gallery`,
dom_first_slim_html(dom, '#landingImage, #imgTagWrapperId img, #imageBlock img:expr(width > 400)') as `img`,
dom_first_text(dom, '#price tr td:contains(List Price) ~ td') as `listprice`,
dom_first_text(dom, '#price tr td:matches(^Price) ~ td') as `price`,
str_first_float(dom_first_text(dom, '#reviewsMedley .AverageCustomerReviews span:contains(out of)'), 0.0) as `score`
from load_out_pages('https://www.amazon.com/b?node=3117954011', 'a[href~=/dp/]', 1, 10);
",
"callbackUrl": "http://{{host-of-your-callback-api}}/{{path-of-your-callback-api}}",
"authToken": "fake-auth-gJn6fUBh-1-af1639a924d7232099a037e9544cf43f"
}
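The request body above carries a multi-line SQL string, which strict JSON parsers reject unless the line breaks are escaped. The following is a minimal Python sketch of one way to build the same request, assuming the endpoint accepts the `sql`, `callbackUrl`, and `authToken` fields exactly as shown above. The helper name `build_query_request`, the callback URL, and the token value are placeholders, the query is shortened for brevity, and nothing is sent over the network.

```python
import json
import urllib.request

API_URL = "http://api.platonic.fun/api/x/a/q"  # endpoint taken from the sample above


def build_query_request(sql: str, callback_url: str, auth_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an X-SQL query.

    json.dumps escapes the line breaks inside the SQL text, so the body
    is valid JSON even when the query spans several lines.
    """
    payload = {"sql": sql, "callbackUrl": callback_url, "authToken": auth_token}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Shortened version of the query above; placeholder callback URL and token.
sql = """select
    dom_base_uri(dom) as `url`,
    dom_first_text(dom, '#productTitle') as `title`
from load_out_pages('https://www.amazon.com/b?node=3117954011', 'a[href~=/dp/]', 1, 10);"""

request = build_query_request(sql, "http://localhost:8080/callback", "your-auth-token")
# To actually submit the query: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the encoded body before any call is made.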
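REST API with X-SQL support
- Business model mapping - Use X-SQL to convert web page data into your local business model
- Data API - Platon's elastic computing puts web data at scale within easy reach
- Advanced SaaS services - X-SQL's flexible built-in functions provide further data processing, such as sentiment detection and knowledge graph construction
- Domain SaaS services - Platon ships ready-to-use solutions for common domains
Cost savings: Compared with traditional approaches, managing external data with Platon has cut our customers' staffing costs and hardware spending by at least half
Data scale: With Platon's machine learning technology, we can now obtain nearly all fields of a website, with no extraction rules left to maintain
Delivery time: Platon applies business intelligence directly on the World Wide Web; compared with the traditional pipeline of writing collection rules, collecting and storing data, cleaning it, and building BI reports, delivery is more than 90% faster
Data quality: Traditional manual extraction captures roughly 50% of the fields from a very small number of websites; with Platon's data mining technology, more than 95% of the data can be obtained from websites of any scale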
Solutions
Tell us what kind of project you are working on
Best Buy: batch discount calculation
select
dom_first_number(dom, '.priceView-customer-price') as `price`,
dom_first_number(dom, '.pricing-price__regular-price') as `list-price`,
dom_first_number(dom, '.pricing-price__regular-price') - dom_first_number(dom, '.priceView-customer-price') as `saving`
from
load_out_pages('https://www.bestbuy.com/site/promo/laptop-and-computer-deals', 'h4.sku-header a')
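The `saving` column is the regular price minus the customer price. The following is a minimal Python sketch of the same arithmetic on rows that have already been fetched, assuming each row is a dict with numeric `price` and `list-price` values. The sample figures and the extra `discount_pct` field are illustrative and are not output from Best Buy.

```python
from typing import Dict, List


def add_savings(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the rows with `saving` and `discount_pct` added."""
    result = []
    for row in rows:
        saving = row["list-price"] - row["price"]
        # Guard against a zero list price to avoid division by zero.
        pct = round(100.0 * saving / row["list-price"], 1) if row["list-price"] else 0.0
        result.append({**row, "saving": round(saving, 2), "discount_pct": pct})
    return result


rows = [
    {"price": 799.99, "list-price": 999.99},  # made-up sample values
    {"price": 450.00, "list-price": 450.00},  # no discount
]
discounted = add_savings(rows)
```

Rounding is applied only to the derived fields, so the original prices stay untouched.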
Amazon: new release tracking
select
dom_first_text(dom, 'span.zg-item a > div:expr(img=0 && char>10)') as title,
dom_first_text(dom, '.p13n-sc-price') as `price`,
str_substring_between(dom_first_attr(dom, 'span.zg-item div a i.a-icon-star', 'class'), ' a-star-', ' ') as score
from load_and_select('https://www.amazon.com/gp/new-releases/home-garden/ref=zg_bsnr_nav_0', 'ol#zg-ordered-list li.zg-item-immersion')
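The `score` column depends on cutting the rating out of the star icon's `class` attribute. The following is a minimal Python sketch of that step, assuming `str_substring_between` returns the text between the first occurrence of the opening marker and the next occurrence of the closing marker. The helper `substring_between` and the sample class string are hypothetical stand-ins, not the engine's implementation.

```python
from typing import Optional


def substring_between(text: str, open_marker: str, close_marker: str) -> Optional[str]:
    """Return the text between the first open_marker and the next close_marker, or None."""
    start = text.find(open_marker)
    if start == -1:
        return None
    start += len(open_marker)
    end = text.find(close_marker, start)
    if end == -1:
        return None
    return text[start:end]


# Hypothetical class attribute of a star icon.
css_class = "a-icon a-icon-star a-star-4-5 aok-align-top"
raw = substring_between(css_class, " a-star-", " ")
# Extra step not in the query: turn the "4-5" form into a number.
score = float(raw.replace("-", ".")) if raw else None
```

The query itself returns the raw text between the markers; the conversion to a float is an extra step added here for illustration.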
Customer Testimonials
What they say ...

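Yang Jinquan (杨锦全)
General Manager & Partner
With Platon, we now collect three million e-commerce records a day. Compared with our original budget, hardware costs have been cut in half, and the product development cycle has been shortened to three months.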

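Xu Yuhai (徐玉海)
General Manager
Since we started collecting overseas news data with Platon, the team can focus on the public opinion analysis we know best, which has greatly improved our team management efficiency.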

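Qiu Weiming (邱维明)
General Manager & Partner
Platon's web data management system lets us turn data product ideas into reality right away; customers are often surprised by how quickly we deliver prototypes.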
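FAQ
How does Platon structure web pages automatically?
Platon examines the geometric, topological, code-structure, and semantic features of a web page, models every DOM element as an attributed rectangle on a manifold, and then applies standard machine learning.
Which languages is Platon written in?
The Platon solution uses several programming languages. The core data engine is written mainly in Kotlin/Java, with small amounts of C++, JavaScript, Bash, HTML, and CSS; the core engine exceeds 300,000 lines of source code. Companion subprojects use Clojure, ReactJS, and others.
Is Platon open source?
Yes. Both the Platon core engine and the Web BI system are open source.
Which programming languages can be used to access the Platon SaaS service?
Platon provides standard SQL support and a REST API, so clients in any programming language can call it easily; in most cases, sending a single REST request is enough.
Why does Platon support SQL?
We have studied web data processing for many years and want to govern external data in the best possible way. We believe the best way is to treat the Internet the same as a local database. In later versions, Platon will support streaming SQL to fully match the streaming nature of web data.
Contact Us
Platon
Join Platon and start the enterprise web data management revolution.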
galaxyeye@live.cn
+86 186 2153 8660