Database

Combine Two Tables

note: 合并table Person 和Address，其中提示说不管Address里有没有personId，Person里的记录都要显示全部。这是典型的 left join 功能。

select P.FirstName, P.LastName, A.City, A.State
    from Person P
        left outer join Address A
       on P.PersonId = A.PersonId

知识点：left join (left outer join) 关键字从左表返回所有的行，即使在右表中没有匹配的行

Second Highest Salary

note: 找到第二大的数字。第一反应就是先找到 max，然后从比max小的数里面找个最大的。

select max(Salary) from Employee
    where Salary < 
        (select max(Salary) from Employee)

takeaway: the following are the most commonly used SQL aggregate functions:

avg(): eg. select avg(unitsinstock) from products;
count():
min/max()
sum():

note that, all aggregate functions above ignore NULL values except for the count function.

N th Highest Salary

SQL 语句没什么好说的，这里要注意的是SQL中的函数概念

CREATE FUNCTION getNthHighestSalary(N INT) 
RETURNS INT  -- 返回数据类型
BEGIN -- 函数体开始
-- 声明 : 变量 类型
DECLARE  M INT;
-- 变量赋值
SET M=N-1;

  RETURN (
       SELECT DISTINCT Salary FROM Employee ORDER BY Salary DESC LIMIT M,1
 );  -- 返回这条语句的结果
END

note that, limit m, 1, what is the meaning and how to use?

The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. 
LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants 
(except when using prepared statements).

With two arguments, the first argument specifies the offset of the first row to return, and the
 second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1).

Let’s see what the offset and count mean in the LIMIT clause:

The offset specifies the offset of the first row to return. The offset of the first row is 0, not 1.
The count specifies maximum number of rows to return.

The LIMIT clause often used with ORDER BY clause. First, you use the ORDER BY clause to sort the result set based on a certain criteria, and then you use LIMIT clause to find lowest or highest values.

Rank Scores

重点是如何生成这个rank字段

思路1:换一个角度,其实rank也就是这个之前大于等于本成绩的成绩个数

SELECT Score, 
(SELECT COUNT(DISTINCT Score) FROM Scores WHERE Score >= s.Score) Rank 
FROM Scores s ORDER BY Score DESC;

思路2: 一个去重之后的分数临时表来进行查询，note 临时表需要有alias

SELECT Score,
(SELECT COUNT(*) FROM (SELECT DISTINCT Score s FROM Scores) t WHERE s >= Score) Rank
FROM Scores ORDER BY Score DESC;

思路3: 下面这种解法使用了内交，Join是Inner Join的简写形式，自己和自己内交，条件是右表的分数大于等于左表，然后群组起来根据分数的降序排列，十分巧妙的解法

SELECT s.Score, COUNT(DISTINCT t.Score) Rank
FROM Scores s JOIN Scores t ON s.Score <= t.Score
GROUP BY s.Id ORDER BY s.Score DESC;

思路4: 用户定义变量

下面这种解法跟上面三种的画风就不太一样了，这里用了两个变量，变量使用时其前面需要加@，这里的：= 是赋值的意思，如果前面有Set关键字，则可以直接用=号来赋值，如果没有，则必须要使用:=来赋值，两个变量rank和pre，其中rank表示当前的排名，pre表示之前的分数，下面代码中的<>表示不等于，如果左右两边不相等，则返回true或1，若相等，则返回false或0。初始化rank为0，pre为-1，然后按降序排列分数，对于分数4来说，pre赋为4，和之前的pre值-1不同，所以rank要加1，那么分数4的rank就为1，下面一个分数还是4，那么pre赋值为4和之前的4相同，所以rank要加0，所以这个分数4的rank也是1，以此类推就可以计算出所有分数的rank了。 sql中 (SELECT @rank := 0, @pre := -1) INIT 是为这两个变量赋初值

SELECT Score,
@rank := @rank + (@pre <> (@pre := Score)) Rank
FROM Scores, (SELECT @rank := 0, @pre := -1) INIT 
ORDER BY Score DESC;

Employees Earning More Than Their Managers

simple, note that query in the where should return only one row.

select Name as Employee 
from Employee t
where Salary > (select Salary from Employee where Id = t.ManagerId);

Duplicate Emails

if you want a condition that applies to the groups instead of the tuples, you should use having clause.

select Email 
from Person
group by Email having count(*) >= 2;

Customers who never order

refer to: https://tinyurl.com/y9s9m4fk

select Name as Customers
from Customers
where Id not in(select CustomerId from Orders);

另一种思路是用not exist

SELECT  `Name` AS 'Customers' FROM Customers WHERE Id NOT EXISTS (SELECT CustomerId FROM Orders)

EXISTS / IN 强调的是是否返回结果集，不要求知道返回什么, 比如： select name from student where sex = 'm' and mark exists(select 1 from grade where ...) ,只要exists引导的子句有结果集返回，那么exists这个条件就算成立了,大家注意返回的字段始终为1，如果改成“select 2 from grade where ...”，那么返回的字段就是2，这个数字没有意义。所以exists子句不在乎返回什么，而是在乎是不是有结果集返回。 EXISTS检查是否有结果，判断是否有记录，返回的是一个布尔型（TRUE/FALSE）。 而 exists 与 in 最大的区别在于 in引导的子句只能返回一个字段，比如： select name from student where sex = 'm' and mark in (select 1,2,3 from grade where ...) ，in子句返回了三个字段，这是不正确的，exists子句是允许的，但in只允许有一个字段返回，在1，2，3中随便去了两个字段即可。通常情况下，EXISTS 的性能要比 IN 略好

in general, use "in" or "exists" depends on whether table A or table B is large.

in 和 exists

in是把外表和内表作hash连接，而exists是对外表作loop循环，每次loop循环再对内表进行查询，一直以来认为exists比in效率高的说法是不准确的。如果查询的两个表大小相当，那么用in和exists差别不大；如果两个表中一个较小一个较大，则子查询表大的用exists，子查询表小的用in；

e.g. 表A(小表)，表B(大表)

select * from A where cc in(select cc from B)　　-->效率低，用到了A表上cc列的索引；
select * from A where exists(select cc from B where cc=A.cc)　　-->效率高，用到了B表上cc列的索引。

相反

select * from B where cc in(select cc from A)　　-->效率高，用到了B表上cc列的索引
select * from B where exists(select cc from A where cc=B.cc)　　-->效率低，用到了A表上cc列的索引。

not in 和 not exists

如果查询语句使用了not in，那么对内外表都进行全表扫描，没有用到索引；而not exists的子查询依然能用到表上的索引。所以无论哪个表大，用not exists都比not in 要快。

in 和 = 的区别

下面两个语句含义一样：

select name from student where name in('zhang','wang','zhao');
select name from student where name='zhang' or name='wang' or name='zhao';

Database

Database

Combine Two Tables

Second Highest Salary

N th Highest Salary

Rank Scores

Employees Earning More Than Their Managers

Duplicate Emails

Customers who never order

results matching ""

No results matching ""