Selecting a subset of rows that exceed a percentage of total values

SQL Server 2012+ only

You can use a windowed SUM:

WITH cte AS
(
   SELECT *,
          -- share of this row in the user's total revenue
          1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY [User]) AS percentile,
          -- running total (highest revenue first) divided by the user's total
          1.0 * SUM(Revenue) OVER(PARTITION BY [User] ORDER BY [Revenue] DESC)
                /SUM(Revenue) OVER(PARTITION BY [User]) AS running_percentile
   FROM t
)
SELECT *
FROM cte
WHERE running_percentile <= 0.8;


SQL Server 2008:

WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t    
), cte2 AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
           ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
    FROM cte c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]
            AND c2.rn <= c.rn) c2
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT *
FROM cte2
WHERE running_percentile <= 0.8;


Output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║   percentile   ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0.384615384615 ║ 0.384615384615     ║
║        1 ║ James ║     500 ║ 0.256410256410 ║ 0.641025641025     ║
║        7 ║ Sarah ║     600 ║ 0.444444444444 ║ 0.444444444444     ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝

EDIT 2:

WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t    
), cte2 AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
           ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
    FROM cte c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]
            AND c2.rn <= c.rn) c2
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT a.*
FROM cte2 a
CROSS APPLY (SELECT MIN(running_percentile) AS rp
             FROM cte2
             WHERE running_percentile >= 0.8
               AND cte2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp;


Output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║   percentile   ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0.384615384615 ║ 0.384615384615     ║
║        1 ║ James ║     500 ║ 0.256410256410 ║ 0.641025641025     ║
║        3 ║ James ║     450 ║ 0.230769230769 ║ 0.871794871794     ║
║        7 ║ Sarah ║     600 ║ 0.444444444444 ║ 0.444444444444     ║
║        5 ║ Sarah ║     500 ║ 0.370370370370 ║ 0.814814814814     ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝

SQL Server 2008 does not support the full OVER() clause (for example, ORDER BY for aggregates such as SUM), but ROW_NUMBER is supported.

First, simply calculate the position within each group:

╔═══════════╦════════╦══════════╦════╗
║ Customer  ║ User   ║ Revenue  ║ rn ║
╠═══════════╬════════╬══════════╬════╣
║        2  ║ James  ║     750  ║  1 ║
║        1  ║ James  ║     500  ║  2 ║
║        3  ║ James  ║     450  ║  3 ║
║        8  ║ James  ║     150  ║  4 ║
║        9  ║ James  ║     100  ║  5 ║
║        7  ║ Sarah  ║     600  ║  1 ║
║        5  ║ Sarah  ║     500  ║  2 ║
║        6  ║ Sarah  ║     150  ║  3 ║
║        4  ║ Sarah  ║     100  ║  4 ║
╚═══════════╩════════╩══════════╩════╝
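
The ranked rows above can be reproduced with a small setup script. This is only a sketch: the table name t and the column names come from the queries in this answer, while the column types (INT, VARCHAR(50)) are assumptions, since the original table definition is not shown.

```sql
-- Sample data implied by the ranked table above; column types are assumed.
CREATE TABLE t
(
    Customer INT         NOT NULL,
    [User]   VARCHAR(50) NOT NULL,
    Revenue  INT         NOT NULL
);

INSERT INTO t (Customer, [User], Revenue)
VALUES (1, 'James', 500),
       (2, 'James', 750),
       (3, 'James', 450),
       (8, 'James', 150),
       (9, 'James', 100),
       (4, 'Sarah', 100),
       (5, 'Sarah', 500),
       (6, 'Sarah', 150),
       (7, 'Sarah', 600);

-- Step 1: position of each row within its user, highest revenue first.
SELECT Customer, [User], Revenue,
       ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
FROM t
ORDER BY [User], rn;
```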

Second CTE:

  • the c2 subquery calculates the running total based on the position from ROW_NUMBER
  • the c3 subquery calculates the full total per user

In the final query, the s subquery finds the lowest running_percentile that is at least 80%; every row up to and including that one is returned.
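
To see the cutoff that the s subquery works with, the threshold can be computed per user on its own. A minimal sketch, assuming the table t holds the sample rows shown earlier; the aliases run, tot and x are illustrative, and GROUP BY is used instead of CROSS APPLY purely to make the result easy to inspect:

```sql
WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t
), cte2 AS
(
    SELECT c.[User],
           running_percentile = 1.0 * run.s / NULLIF(tot.s, 0)
    FROM cte c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s       -- running total up to this row
          FROM cte x
          WHERE x.[User] = c.[User]
            AND x.rn <= c.rn) AS run
    CROSS APPLY
         (SELECT SUM(Revenue) AS s       -- full total for the user
          FROM cte x
          WHERE x.[User] = c.[User]) AS tot
)
-- Lowest running_percentile per user that is at least 80%
SELECT [User], MIN(running_percentile) AS rp
FROM cte2
WHERE running_percentile >= 0.8
GROUP BY [User];
```

For the sample data this should give roughly 0.8718 for James (1700 / 1950) and 0.8148 for Sarah (1100 / 1350), which are the running_percentile values of the last row per user in the output above.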

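EDIT 3:

Using ROW_NUMBER is actually redundant; the running total can be computed by comparing Revenue directly. Note that rows with equal Revenue for the same user then share one running total.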

WITH cte AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
           ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
    FROM t c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM t c2
          WHERE c.[User] = c2.[User]
            AND c2.Revenue >= c.Revenue) c2
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM t c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT a.*
FROM cte a
CROSS APPLY (SELECT MIN(running_percentile) AS rp
             FROM cte c2
             WHERE running_percentile >= 0.8
               AND c2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp
ORDER BY [User], Revenue DESC;

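
On SQL Server 2012+, the same "include the row that crosses 80%" behaviour can also be expressed with window functions alone. The following is a sketch and not part of the original answer: it assumes the table t used above and non-negative Revenue values, uses Customer as an assumed tie-breaker, and the column names running_sum and total are illustrative. A row is kept when the running total before it is still below 80% of the user's total.

```sql
WITH cte AS
(
    SELECT Customer, [User], Revenue,
           -- running total per user, highest revenue first;
           -- ROWS (not the default RANGE) keeps tied rows separate
           SUM(Revenue) OVER(PARTITION BY [User]
                             ORDER BY Revenue DESC, Customer
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum,
           -- full total per user
           SUM(Revenue) OVER(PARTITION BY [User]) AS total
    FROM t
)
SELECT Customer, [User], Revenue,
       1.0 * Revenue     / NULLIF(total, 0) AS percentile,
       1.0 * running_sum / NULLIF(total, 0) AS running_percentile
FROM cte
-- running_sum - Revenue is the running total before the current row
WHERE running_sum - Revenue < 0.8 * total
ORDER BY [User], Revenue DESC;
```

For the sample data this should return the same five rows as the five-row output shown earlier (three for James, two for Sarah).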


