
Too much data for Excel to handle? Are there easy-to-use tools that can replace Excel?

2022-11-24 19:41:05 Yan Suo Han Lou


The question: the data volume is too large and Excel cannot cope. What should I do? Is there an easy-to-use alternative to Excel?

This question really hits the nail on the head. I have received many similar questions, so let me answer them all together today.

Solutions for reporting or data analysis with large amounts of data:

  • Over 100 MB, or several hundred thousand rows of Excel data: the Access database plus SQL

  • Data that has not reached the hundreds-of-millions level: analyze it directly with a BI tool

  • Larger still: this is no longer a job for spreadsheet users; leave it to data analysts

Since most readers use Excel day to day, this article focuses on the first option, as a complete, general-purpose, practical tutorial. The tool used is the Access database.

As for Access, it is Excel's sibling in the Microsoft Office family. It is not hard to pick up, and the basic operations can be learned in about a week.

It addresses the following complaint from people who work in data operations:

The Excel tables a business has to handle keep getting bigger. Past 50 MB they slow to a crawl, and once IF and VLOOKUP formulas pile up, the computer simply goes on strike. With an Excel file of the size shown below, even a server-class machine struggles, let alone any data processing or data analysis.

Figure: An Excel file so large that it is hard even to open


With an Excel table as large as the one pictured above, lag is inevitable, and whether the program crashes outright comes down to luck. So is there data analysis software that can handle large files while remaining simple and easy to use?

The answer is of course yes: Access, Excel's sibling in the Microsoft Office family.

The rest of this article uses a common analysis project from operations work as a case study. The aim is to give readers who do data analysis a basic understanding of Access and, from there, ideas and methods for analyzing large volumes of data.

The figure below shows the 4 goals of this article's Access analysis of the raw table.

Figure: The 4 goals of the Access data analysis


First, a few words about the basics of Access, followed by a practical data analysis case.

I. Introduction to the Access database


1. Basic concepts of Access and SQL statements

Access, in full "Microsoft Office Access", is a member of the Microsoft Office suite and a relational database management system published by Microsoft. It combines the Microsoft Jet Database Engine with a graphical user interface and is one of the Microsoft Office applications. (Source: Baidu Baike)

No discussion of Access is complete without SQL: only by mastering SQL can you push Access to its limits. SQL stands for Structured Query Language, and it is a declarative language.

First, fix this concept in your mind: "declarative". Unlike the programming languages you may have met before, SQL declares to the computer what result you want from the raw data, rather than telling the computer how to obtain that result. In other words, the real heart of SQL lies in references to tables.

SELECT first_name, last_name FROM employees WHERE age >= 25;

The example above is easy to follow: we do not care where the employee records come from. All we need are the records of employees aged 25 or older (age >= 25).
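To make the "declarative" idea concrete, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module and compares it with a hand-written loop. SQLite merely stands in for Access here, and the three sample rows are invented for illustration; the article does not show the employees table.

```python
import sqlite3

# Hypothetical sample rows; the article's employees table is not shown.
employees = [
    ("Ada", "Li", 31),
    ("Bo", "Chen", 24),
    ("Cy", "Wang", 25),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, age INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", employees)

# Declarative: state which rows are wanted and let the engine decide how.
declarative = conn.execute(
    "SELECT first_name, last_name FROM employees WHERE age >= 25 ORDER BY first_name"
).fetchall()

# Imperative: spell out the loop and the test by hand.
imperative = []
for first_name, last_name, age in employees:
    if age >= 25:
        imperative.append((first_name, last_name))
imperative.sort()

print(declarative)
print(imperative)
```

Both approaches return the same two employees; the SQL version simply never says how the table is scanned.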

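2. Advantages of Access

The most obvious benefit of Access is that, without mastering any advanced programming language, you can process raw data files too large for Excel to carry. It is very fast, and it is easy to learn and use.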



3. Commonly used Access statements

The table below lists some SQL statements commonly used when working with Access. They are not difficult to understand.

Figure: SQL statements commonly used in Access databases


To learn data analysis tools well, the most important thing is to tie the scattered pieces of tool knowledge together through real cases. Working through one complete case analysis lets you learn how to use these tools in a short time.

With that brief introduction to Access and SQL statements done, let's start analyzing data with Access!

II. Hands-on data analysis with Access


1. Data import

The table below is the raw file for this Access analysis. It is nearly 230 MB, and Excel takes several minutes to open it, when the computer is in the mood to open it at all. For reasons of commercial confidentiality, this article uses only part of the data for analysis and practice, and that data has been processed.

Figure: Raw data exported from the back end


First import the Excel file into Access, following the path marked by the arrows in the figure below:

Figure: Importing the raw Excel data file


After following the steps above and letting Access generate the primary key (the ID column) automatically, you get the following result:

Figure: The raw Excel data after import into Access


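2. Analysis of user order time periods

To analyze order time periods, the time at which each user placed an order must be converted to the hour of the day. The function used here is Format, which formats the selected field. Its syntax is:

Format(field, "format string")

For dates and times, the format string is usually h (hour), d (day), m (month), or yyyy (year). Note that a lone y returns the day of the year, not the year.

Then use the Count function to count UserID; the result is the order volume.

Note that after applying Format and Count, each result must be given a new field name with AS. Here the two are named 时段 (time period) and 订单量 (order volume).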

Figure: Steps for analyzing the order time period


Under "Create", start a new "Query Design", click "SQL" in the lower-right corner, and enter the following statement in the SQL window:

SELECT format(下单时间,"h") AS 时段, count(UserID) AS 订单量

FROM 元数据

GROUP BY format(下单时间,"h");

Then click "Run" under "Design" to get the following result:

Figure: Distribution of orders across time periods


Here is an example of how to read these results. If a customer places an order at 12:23, the order is assigned to hour "12", and "12" stands for the period from 12:00 to 13:00. Following the method in the article "Operations in Practice | How to use WeChat back-end data to optimize WeChat operations", a function can turn the hour into a time-range label.
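The hour grouping and the range label can be sketched as follows. This is an illustration under stated assumptions, not the author's method: it uses Python's sqlite3 in place of Access, SQLite's strftime in place of Format, and a hypothetical orders table with invented rows. The helper hour_label is likewise made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows standing in for the article's raw data.
conn.execute("CREATE TABLE orders (user_id TEXT, order_time TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("u1", "2015-09-01 12:23:00"),
        ("u2", "2015-09-01 12:59:10"),
        ("u1", "2015-09-02 09:05:00"),
    ],
)

# Extract the hour of each order and count orders per hour.
rows = conn.execute(
    """
    SELECT CAST(strftime('%H', order_time) AS INTEGER) AS hour,
           COUNT(user_id) AS order_count
    FROM orders
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()


def hour_label(hour):
    """Turn an hour such as 12 into the range label '12:00-13:00'."""
    return f"{hour:02d}:00-{(hour + 1) % 24:02d}:00"


order_counts = {hour_label(hour): count for hour, count in rows}
print(order_counts)
```

An order at 12:23 and one at 12:59 both land in the 12:00-13:00 bucket, which matches the reading of hour "12" described above.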

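3. Analysis of order volume by payment range

Computing the payment range calls for a rather powerful function: Switch. It evaluates a list of expressions in order and returns the value paired with the first expression that holds.

Syntax:

Switch(condition1, result1, condition2, result2, condition3, result3, ..., conditionN, resultN)

condition1, condition2, condition3, and so on are the expressions to evaluate. If condition1 holds, result1 is returned; if condition2 holds, result2 is returned; and so on.

As before, under "Create" start a new "Query Design", click "SQL" in the lower-right corner, and enter the following statement: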

SELECT userID, [payment amount], switch([payment amount]<=10,"1~10元",

[payment amount]<=20,"11~20元",

[payment amount]<=50,"21~50元",

[payment amount]<=80,"51~80元",

[payment amount]<=150,"81~150元",

[payment amount]>150,"151~220元") AS 消费区间

FROM 元数据;

After clicking "Run", you get the following result:

Figure: The consumption range corresponding to the amount of each order placed by a user


At this point the consumption-range processing is not finished, because this result only gives the consumption range for the payment amount of each individual order record. The next step resembles a pivot table in Excel: put the consumption range in the first column and count how many orders fall into each range.

So, as before, create a new query and name it "Orders for payment interval statistics".

The SQL statement to enter here reads from the saved result of the previous query:

SELECT 消费区间, count(UserID) AS 订单数量

FROM [payment range]

GROUP BY 消费区间;

After clicking "Run", the results are as shown below:

Figure: Distribution of orders across consumption ranges
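The two steps above (label each order with a range, then count orders per range) can be tried on toy data. The sketch below assumes SQLite via Python's sqlite3 rather than Access, so a CASE expression takes the place of Switch; the orders table, its rows, and the range labels are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows standing in for the article's raw data.
conn.execute("CREATE TABLE orders (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("u1", 8.0), ("u2", 10.0), ("u1", 35.5), ("u3", 150.0), ("u3", 199.0)],
)

# Inner query: label every order with its range (first matching WHEN wins,
# like Switch). Outer query: count the orders in each range.
bucket_counts = conn.execute(
    """
    SELECT bucket, COUNT(user_id) AS order_count
    FROM (
        SELECT user_id,
               amount,
               CASE
                   WHEN amount <= 10 THEN '1~10'
                   WHEN amount <= 20 THEN '11~20'
                   WHEN amount <= 50 THEN '21~50'
                   WHEN amount <= 80 THEN '51~80'
                   WHEN amount <= 150 THEN '81~150'
                   WHEN amount > 150 THEN '151+'
               END AS bucket
        FROM orders
    )
    GROUP BY bucket
    ORDER BY MIN(amount)
    """
).fetchall()

print(bucket_counts)
```

Because the conditions are checked in order, an amount of exactly 10 falls into the first range and never reaches the `<= 20` test. Ranges with no orders, such as 11~20 here, simply do not appear in the result.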


Then copy the data above into an Excel sheet and build the percentage pie chart below. It shows at a glance the share of orders in each consumption range, and from that the overall level of user spending, which supports a reasonable evaluation of the operating period.

Figure: Pie chart of the share of orders in each consumption range


4. Analysis of order volume, number of users, and sales by region

(1) Number of users in each region

This one is a little trickier. The number of users has to be derived indirectly from a count of userID: because the vast majority of users have placed at least 2 orders, counting userID directly gives the number of orders, not the number of users. So we have to take a different approach and first build a table with one row per user, that is, a table of how many orders each user ID has placed.

Create a new "Query Design" and name it 用户消费频次 (user consumption frequency). Enter the following statement in the SQL window:

SELECT UserID, COUNT(UserID) AS 消费次数, 区域

FROM 元数据

GROUP BY UserID, 区域;

After clicking "Run", the results are as shown below:

Figure: User order frequency table


With this user consumption frequency table as a stepping stone, a second new query can then count the number of users in each region.

Create a new "Query Design" and name it "Number of users in each region". Enter the following statement in the SQL window:

SELECT 区域, count(UserID) AS 总用户数

FROM 用户消费频次

GROUP BY 区域;

After clicking "Run", the results are as shown below:

Figure: Number of users in each region
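The difference between counting orders and counting users is easy to see on a small example. The sketch below is an assumption-laden illustration: it uses SQLite through Python's sqlite3 instead of Access, a view named user_frequency instead of a saved Access query, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows standing in for the article's raw data.
conn.execute("CREATE TABLE orders (user_id TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("u1", "East"), ("u1", "East"), ("u2", "East"), ("u3", "West"), ("u3", "West")],
)

# Step 1: one row per (user, region) with that user's order count.
conn.execute(
    """
    CREATE VIEW user_frequency AS
    SELECT user_id, COUNT(user_id) AS order_count, region
    FROM orders
    GROUP BY user_id, region
    """
)

# Step 2: counting rows of the intermediate result counts users, not orders.
users_per_region = dict(
    conn.execute(
        "SELECT region, COUNT(user_id) FROM user_frequency GROUP BY region"
    ).fetchall()
)

# For comparison: counting directly on the raw table counts orders.
orders_per_region = dict(
    conn.execute("SELECT region, COUNT(user_id) FROM orders GROUP BY region").fetchall()
)

print(users_per_region)
print(orders_per_region)
```

In the East region the direct count reports 3 (orders) while the two-step count reports 2 (users), which is exactly the gap the intermediate table is meant to close.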


(2) Order volume and consumption amount by region

Create a new "Query Design" and name it "Orders by region". Enter the following statement in the SQL window:

SELECT 区域, count(UserID) AS 订单总数, sum([payment amount]) AS 总金额, avg([payment amount]) AS 平均消费金额

FROM 元数据

GROUP BY 区域;

After clicking "Run", the results are as shown below:

Figure: Order volume and consumption amount by region


Then merge the per-region user counts from above into this table to get a complete overview of operations in the three regions. See the table below:

Figure: Overview of operations in each region
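The article does not say how the two results are merged (it may well have been done by hand in Excel). One possible way, sketched here with SQLite through Python's sqlite3 and invented rows, is to join the per-region order statistics with the per-region user counts in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows standing in for the article's raw data.
conn.execute("CREATE TABLE orders (user_id TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("u1", "East", 10.0),
        ("u1", "East", 30.0),
        ("u2", "East", 20.0),
        ("u3", "West", 50.0),
        ("u3", "West", 70.0),
    ],
)

# o: order statistics per region; u: distinct users per region.
overview = conn.execute(
    """
    SELECT o.region, u.user_count, o.order_count, o.total_amount, o.avg_amount
    FROM (
        SELECT region,
               COUNT(user_id) AS order_count,
               SUM(amount) AS total_amount,
               AVG(amount) AS avg_amount
        FROM orders
        GROUP BY region
    ) AS o
    JOIN (
        SELECT region, COUNT(*) AS user_count
        FROM (SELECT DISTINCT user_id, region FROM orders)
        GROUP BY region
    ) AS u ON u.region = o.region
    ORDER BY o.region
    """
).fetchall()

for row in overview:
    print(row)
```

Each output row carries the region, its user count, order count, total amount, and average order amount, which are the columns of the overview table described above.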


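5. User value analysis

The user value analysis here is based on the RFM model, with a further refinement: on top of the original "cumulative consumption amount", three more indicators are introduced, namely "minimum spend", "maximum spending amount", and "average consumption amount", to reflect consumers' purchasing power as fully as possible.

Create a new "Query Design" and name it 用户消费情况 (user consumption status). Enter the following statement in the SQL window: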

SELECT userID, min([payment amount]) AS [Minimum spend],

max([payment amount]) AS [Maximum spending amount],

avg([payment amount]) AS 平均消费金额,

sum([payment amount]) AS 消费总金额,

count([payment amount]) AS 消费频次,

datediff("d",max(下单日期),#2015-9-15#) AS [The number of days since the last consumption today]

FROM 元数据

GROUP BY userID;

After clicking "Run", the results are as shown below:

Figure: User value analysis table
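The per-user query above can be tried on toy data. The sketch below is an illustration only: it assumes SQLite through Python's sqlite3, where a julianday difference plays the role of DateDiff("d", ...), and it uses a hypothetical orders table with invented rows. The reference date 2015-09-15 is the one from the article's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows standing in for the article's raw data.
conn.execute("CREATE TABLE orders (user_id TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("u1", "2015-09-01", 10.0),
        ("u1", "2015-09-10", 30.0),
        ("u2", "2015-08-16", 25.0),
    ],
)

# One row per user: min, max, average, and total spend, order count
# (frequency), and whole days between the last order and 2015-09-15 (recency).
user_value = conn.execute(
    """
    SELECT user_id,
           MIN(amount) AS min_spend,
           MAX(amount) AS max_spend,
           AVG(amount) AS avg_spend,
           SUM(amount) AS total_spend,
           COUNT(amount) AS frequency,
           CAST(julianday('2015-09-15') - julianday(MAX(order_date)) AS INTEGER)
               AS recency_days
    FROM orders
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

for row in user_value:
    print(row)
```

The last three columns correspond to the M, F, and R dimensions respectively: total spend, order count, and days since the most recent order.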


Once you have this table, you can cluster and analyze it, classifying users along the three dimensions R, F, and M. For details, see the article "[Data Operations in Practice] How to use data analysis to run a no-blind-spots review of a trial-operation project".

Finally, we can also get the total order count and sales amount for the three regions combined:

Create a new "Query Design" and name it "Regional Sales Overview". Enter the following statement in the SQL window:

SELECT count(userID) AS 订单总数,

sum([payment amount]) AS 付款总额,

avg([payment amount]) AS 平均订单金额

FROM 元数据;

After clicking "Run", the results are as shown below:

Figure: Overview of sales in the three regions


Conclusion


As the case above shows, with reasonably fluent SQL, Access is no weaker than Excel at processing data, and handling large volumes of data is its strength.

Copyright notice
Author: Yan Suo Han Lou. Please include the original link when reprinting, thank you.
https://en.fheadline.com/2022/328/202211241919217089.html
