Still getting arithmetic overflow when filtering on a cast datetime, even when I use IsDate()

SQL Server's DateTime has the domain 1753-01-01 00:00:00.000 ≤ x ≤ 9999-12-31 23:59:59.997. The year 210 CE is outside that domain. Hence the problem.

If you were using SQL Server 2008 or later, you could cast it to the DateTime2 data type and you'd be golden (its domain is 0001-01-01 00:00:00.0000000 ≤ x ≤ 9999-12-31 23:59:59.9999999). But with SQL Server 2005, you're pretty much SOL.
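For illustration, here is a minimal sketch of that cast. It assumes SQL Server 2008 or later and a value that is already in yyyy-mm-dd form; the variable name @raw is made up for the example.

```sql
-- SQL Server 2008 or later only: datetime2 accepts years back to 0001,
-- so a year-210 value that overflows datetime can be cast directly.
declare @raw varchar(32)
set @raw = '0210-03-15 08:30:00'   -- hypothetical sample value

select asDateTime2 = cast(@raw as datetime2)
```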

This is really a data cleansing problem. My inclination in cases like this is to load the third-party data into a staging table with every field as a character string. Then cleanse the data in place, replacing, for instance, invalid dates with NULL. Once the data is clean, do the necessary conversion work to move it to its final destination.
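A rough sketch of that staging workflow might look like the following. The table #staging, its columns, and the sample rows are all hypothetical, and the cleanse rule here is deliberately crude: isdate() depends on the session's language and dateformat settings, so treat it as a starting point rather than a complete validation.

```sql
set dateformat mdy   -- assumption: the incoming data is month/day/year

-- Hypothetical staging table: every third-party field lands as a string.
create table #staging
(
  id        int         not null primary key clustered ,
  startDate varchar(64)     null
)

insert #staging values ( 1 , '3/15/2011 10:30:00' )
insert #staging values ( 2 , '1/1/210 00:00:00'   )
insert #staging values ( 3 , 'not a date'         )

-- Cleanse in place: anything that cannot become a datetime turns into NULL.
update #staging
set    startDate = null
where  startDate is not null
  and  isdate(startDate) = 0

-- Only now convert, in a separate statement, on the way to the destination.
select id ,
       startDate = convert(datetime,startDate)
from   #staging
```

Because the bad values are physically gone before the conversion statement runs, the conversion no longer depends on the order in which the optimizer chooses to evaluate a filter and a cast.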

Another approach is to use pattern matching and filter on the dates without converting anything to datetime. ISO 8601 date/time values are character strings that have the laudable properties of (A) being human-readable and (B) collating and comparing properly.
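To make point (B) concrete, here is a tiny comparison of the same two instants written both ways. The literals are invented for the example; no table is needed.

```sql
-- March 5 is earlier than November 20 of the same year.
-- As ISO 8601 strings the comparison agrees with the calendar;
-- as m/d/yyyy strings it does not, because '3' sorts after '1'.
select isoCompare = case when '2011-03-05 09:00:00' < '2011-11-20 08:00:00'
                         then 'earlier' else 'later' end ,
       usCompare  = case when '3/5/2011 09:00:00'   < '11/20/2011 08:00:00'
                         then 'earlier' else 'later' end
```

The first column comes back as 'earlier' and the second as 'later', which is why the rest of this answer normalizes everything to the ISO layout before comparing.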

What I've done in the past is some analytic work to identify all the patterns in the datetime field: replace every decimal digit with a 'd', then run a group by to count the occurrences of each distinct pattern found. Once you have that, you can build some pattern tables to guide you. Something like this:
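The digit-masking step itself could be sketched like this. It assumes the raw values live in a column startDate of a table someTableWithBadData (the names used in the query further down) and sticks to nested replace() calls so it also runs on SQL Server 2005.

```sql
-- Mask every decimal digit with 'd', then count each distinct shape.
select p.pattern ,
       cnt = count(*)
from ( select pattern = replace(replace(replace(replace(replace(
                        replace(replace(replace(replace(replace(
                          t.startDate
                        ,'0','d'),'1','d'),'2','d'),'3','d'),'4','d')
                        ,'5','d'),'6','d'),'7','d'),'8','d'),'9','d')
       from someTableWithBadData t
     ) p
group by p.pattern
order by cnt desc , p.pattern
```

A value such as 3/15/2011 10:30:00 shows up as d/dd/dddd dd:dd:dd, so the rare, odd shapes tend to sink to the bottom of the result where they are easy to spot.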

create table #datePattern
(
  pattern varchar(64) not null primary key clustered ,
  monPos  int         not null ,
  monLen  int         not null ,
  dayPos  int         not null ,
  dayLen  int         not null ,
  yearPos int         not null ,
  yearLen int         not null
)

insert #datePattern values ( '[0-9]/[0-9]/[0-9] %'                          ,1,1,3,1,5,1)
insert #datePattern values ( '[0-9]/[0-9]/[0-9][0-9] %'                     ,1,1,3,1,5,2)
insert #datePattern values ( '[0-9]/[0-9]/[0-9][0-9][0-9] %'                ,1,1,3,1,5,3)
insert #datePattern values ( '[0-9]/[0-9]/[0-9][0-9][0-9][0-9] %'           ,1,1,3,1,5,4)
insert #datePattern values ( '[0-9]/[0-9][0-9]/[0-9] %'                     ,1,1,3,2,6,1)
insert #datePattern values ( '[0-9]/[0-9][0-9]/[0-9][0-9] %'                ,1,1,3,2,6,2)
insert #datePattern values ( '[0-9]/[0-9][0-9]/[0-9][0-9][0-9] %'           ,1,1,3,2,6,3)
insert #datePattern values ( '[0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %'      ,1,1,3,2,6,4)
insert #datePattern values ( '[0-9][0-9]/[0-9]/[0-9] %'                     ,1,2,4,1,6,1)
insert #datePattern values ( '[0-9][0-9]/[0-9]/[0-9][0-9] %'                ,1,2,4,1,6,2)
insert #datePattern values ( '[0-9][0-9]/[0-9]/[0-9][0-9][0-9] %'           ,1,2,4,1,6,3)
insert #datePattern values ( '[0-9][0-9]/[0-9]/[0-9][0-9][0-9][0-9] %'      ,1,2,4,1,6,4)
insert #datePattern values ( '[0-9][0-9]/[0-9][0-9]/[0-9] %'                ,1,2,4,2,7,1)
insert #datePattern values ( '[0-9][0-9]/[0-9][0-9]/[0-9][0-9] %'           ,1,2,4,2,7,2)
insert #datePattern values ( '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9] %'      ,1,2,4,2,7,3)
insert #datePattern values ( '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %' ,1,2,4,2,7,4)

create table #timePattern
(
  pattern varchar(64) not null primary key clustered ,
  hhPos int not null ,
  hhLen int not null ,
  mmPos int not null ,
  mmLen int not null ,
  ssPos int not null ,
  ssLen int not null
)
insert #timePattern values ( '[0-9]:[0-9]:[0-9]'                ,1,1,3,1,5,1 )
insert #timePattern values ( '[0-9]:[0-9]:[0-9][0-9]'           ,1,1,3,1,5,2 )
insert #timePattern values ( '[0-9]:[0-9][0-9]:[0-9]'           ,1,1,3,2,6,1 )
insert #timePattern values ( '[0-9]:[0-9][0-9]:[0-9][0-9]'      ,1,1,3,2,6,2 )
insert #timePattern values ( '[0-9][0-9]:[0-9]:[0-9]'           ,1,2,4,1,6,1 )
insert #timePattern values ( '[0-9][0-9]:[0-9]:[0-9][0-9]'      ,1,2,4,1,6,2 )
insert #timePattern values ( '[0-9][0-9]:[0-9][0-9]:[0-9]'      ,1,2,4,2,7,1 )
insert #timePattern values ( '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' ,1,2,4,2,7,2 )

You could combine these two tables into one, but the number of combinations tends to explode, even though it does simplify the query considerably.
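If you do want the combined table, one way to generate it rather than type it is a cross join of the two tables above. This is only a sketch: the name #dateTimePattern is made up, and it assumes exactly one space separates the date from the time, which is stricter than the trimming the query below does.

```sql
-- 16 date patterns x 8 time patterns = 128 combined patterns.
-- Drop the trailing '%' from the date pattern and append the time pattern;
-- time positions are shifted past the year and the single separating space.
select pattern = left(dt.pattern, len(dt.pattern) - 1) + tm.pattern ,
       dt.monPos , dt.monLen ,
       dt.dayPos , dt.dayLen ,
       dt.yearPos , dt.yearLen ,
       hhPos = dt.yearPos + dt.yearLen + tm.hhPos , tm.hhLen ,
       mmPos = dt.yearPos + dt.yearLen + tm.mmPos , tm.mmLen ,
       ssPos = dt.yearPos + dt.yearLen + tm.ssPos , tm.ssLen
into #dateTimePattern
from #datePattern dt
cross join #timePattern tm
```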

Once you have that, the query is [fairly] easy, given that SQL isn't the world's best choice of language for string processing:

---------------------------------------------------------------------
-- first, get your lower bound in ISO 8601 format yyyy-mm-dd hh:mm:ss
-- This will compare/collate properly
---------------------------------------------------------------------
declare @dtLowerBound varchar(255)
set @dtLowerBound = convert(varchar,dateadd(year,-1,current_timestamp),121)

-----------------------------------------------------------------
-- select rows with a start date more recent than the lower bound
-----------------------------------------------------------------
select isoDate =         right( '0000' + substring( t.startDate , coalesce(dt.yearPos,1) , coalesce(dt.YearLen,0) ) , 4 )
                 + '-' + right(   '00' + substring( t.startDate , coalesce(dt.monPos,1)  , coalesce(dt.MonLen,0)  ) , 2 )
                 + '-' + right(   '00' + substring( t.startDate , coalesce(dt.dayPos,1)  , coalesce(dt.dayLen,0)  ) , 2 )
                 + case
                   when tm.pattern is not null then
                       ' ' + right( '00' + substring(ltrim(rtrim( substring(t.startDate,dt.YearPos+dt.YearLen,1+len(t.startDate)-(dt.YearPos+dt.YearLen) ) ) ), tm.hhPos , tm.hhLen ) , 2 )
                     + ':' + right( '00' + substring(ltrim(rtrim( substring(t.startDate,dt.YearPos+dt.YearLen,1+len(t.startDate)-(dt.YearPos+dt.YearLen) ) ) ), tm.mmPos , tm.mmLen ) , 2 )
                     + ':' + right( '00' + substring(ltrim(rtrim( substring(t.startDate,dt.YearPos+dt.YearLen,1+len(t.startDate)-(dt.YearPos+dt.YearLen) ) ) ), tm.ssPos , tm.ssLen ) , 2 )
                   else ''
                   end
,*
from someTableWithBadData t
left join #datePattern dt on t.startDate like dt.pattern
left join #timePattern tm on ltrim(rtrim( substring(t.startDate,dt.YearPos+dt.YearLen,1+len(t.startDate)-(dt.YearPos+dt.YearLen) ) ) )
                             like tm.pattern
where @dtLowerBound <=   right( '0000' + substring( t.startDate , coalesce(dt.yearPos,1) , coalesce(dt.YearLen,0) ) , 4 )
                 + '-' + right(   '00' + substring( t.startDate , coalesce(dt.monPos,1)  , coalesce(dt.MonLen,0)  ) , 2 )
                 + '-' + right(   '00' + substring( t.startDate , coalesce(dt.dayPos,1)  , coalesce(dt.dayLen,0)  ) , 2 )
                 + case
                   when tm.pattern is not null then
                       ' ' + right( '00' + substring(ltrim(rtrim( substring(t.startDate,dt.YearPos+dt.YearLen,1+len(t.startDate)-(dt.YearPos+dt.YearLen) ) ) ), tm.hhPos , tm.hhLen ) , 2 )
                     + ':' + right( '00' + substring(ltrim(rtrim( substring(t.startDate,dt.YearPos+dt.YearLen,1+len(t.startDate)-(dt.YearPos+dt.YearLen) ) ) ), tm.mmPos , tm.mmLen ) , 2 )
                     + ':' + right( '00' + substring(ltrim(rtrim( substring(t.startDate,dt.YearPos+dt.YearLen,1+len(t.startDate)-(dt.YearPos+dt.YearLen) ) ) ), tm.ssPos , tm.ssLen ) , 2 )
                   else ''
                   end

As I said, SQL isn't the best choice for string munging.

That should get you... 90% of the way there. Experience tells me you'll still find more bad dates: months less than 1 or greater than 12, days less than 1 or greater than 31, or days out of range for the month (nothing like February 31st to make the computer whine), etc. Old COBOL programs in particular loved to use a field of all 9s, for instance, to indicate missing data (though that's an easy case to deal with).
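A follow-up check along those lines can be run against the normalized yyyy-mm-dd strings without ever converting them to datetime. The sketch below uses a made-up table variable @sample and a made-up column name isPlausible; it only looks at the date part and assumes the first four characters are digits, which the patterns above guarantee for matched rows.

```sql
declare @sample table ( isoDate varchar(32) not null )
insert @sample values ( '2011-02-28 10:00:00' )
insert @sample values ( '2011-02-31 10:00:00' )   -- no such day
insert @sample values ( '2012-02-29 10:00:00' )   -- leap year, fine
insert @sample values ( '2011-13-01 10:00:00' )   -- no such month
insert @sample values ( '9999-99-99 99:99:99' )   -- COBOL-style "missing"

select s.isoDate ,
       isPlausible = case
         when s.mm not between '01' and '12' then 0
         when s.dd < '01'                     then 0
         when s.dd > case                     -- last day of that month
                       when s.mm in ('04','06','09','11') then '30'
                       when s.mm <> '02'                  then '31'
                       when s.yyyy % 4 = 0
                        and ( s.yyyy % 100 <> 0 or s.yyyy % 400 = 0 ) then '29'
                       else '28'
                     end                      then 0
         else 1
       end
from ( select isoDate ,
              yyyy = cast(substring(isoDate,1,4) as int) ,
              mm   = substring(isoDate,6,2) ,
              dd   = substring(isoDate,9,2)
       from @sample
     ) s
```

Only the first and third sample rows should come back flagged as plausible. The time portion would need a similar range check on hours, minutes, and seconds.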

My preferred technique is to write a perl script to scrub the data and bulk load it into SQL Server using perl's BCP facilities. That's exactly the sort of problem space perl was designed for.