Does the unit passed to the datetime64 data type in pandas do anything?
Consider this code:
import pandas as pd
v1 = pd.DataFrame({'Date':['2020-01-01']*1000}).astype({'Date':'datetime64'})
v2 = pd.DataFrame({'Date':['2020-01-01']*1000}).astype({'Date':'datetime64[ns]'})
v3 = pd.DataFrame({'Date':['2020-01-01']*1000}).astype({'Date':'datetime64[ms]'})
v4 = pd.DataFrame({'Date':['2020-01-01']*1000}).astype({'Date':'datetime64[s]'})
v5 = pd.DataFrame({'Date':['2020-01-01']*1000}).astype({'Date':'datetime64[h]'})
v6 = pd.DataFrame({'Date':['2020-01-01']*1000}).astype({'Date':'datetime64[D]'})
v7 = pd.DataFrame({'Date':['2020-01-01']*1000}).astype({'Date':'datetime64[M]'})
v8 = pd.DataFrame({'Date':['2020-01-01']*1000}).astype({'Date':'datetime64[Y]'})
for v in [v1,v2,v3,v4,v5,v6,v7,v8]:
    x = v.iloc[0,0]
    print(x, type(x), x.to_datetime64(), v.memory_usage()['Date'])
It returns:
2020-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2020-01-01T00:00:00.000000000 8000
2020-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2020-01-01T00:00:00.000000000 8000
2020-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2020-01-01T00:00:00.000000000 8000
2020-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2020-01-01T00:00:00.000000000 8000
2020-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2020-01-01T00:00:00.000000000 8000
2020-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2020-01-01T00:00:00.000000000 8000
2020-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2020-01-01T00:00:00.000000000 8000
2020-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2020-01-01T00:00:00.000000000 8000
First of all: the Pandas version of the `datetime64` type only adds timezone support. Specifically, when you try to use a `datetime64` variant in a Pandas series, it'll only support `as` (attosecond), `fs` (femtosecond), `ps` (picosecond) and `ns` (nanosecond) resolutions; anything less precise is replaced by `datetime64[ns]`. The `datetime64[<res>, <tz>]` variant only accepts `s` (seconds), `ms` (milliseconds), `us` (microseconds) and `ns` resolutions. Don't confuse these with the `numpy` `datetime64` type.
For both Pandas and Numpy, the unit abbreviation determines the resolution used to record the timestamps, and because the type is always stored as 64 bits, it also determines the range of values you can store in it. It does not alter how much memory the type takes!
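
The claim that the unit does not change the storage size can be checked against numpy directly. The following is a minimal sketch, assuming only that numpy is installed; the variable names are illustrative and not part of the original answer.

```python
import numpy as np

# Units from the question's experiment, from finest to coarsest.
units = ["ns", "ms", "s", "h", "D", "M", "Y"]

# Bytes used by a single value of each dtype.
item_sizes = {u: np.dtype(f"datetime64[{u}]").itemsize for u in units}

# Bytes used by an array of 1000 values of each dtype.
array_sizes = {u: np.zeros(1000, dtype=f"datetime64[{u}]").nbytes for u in units}

for u in units:
    print(f"datetime64[{u}]: {item_sizes[u]} bytes/value, {array_sizes[u]} bytes/1000 values")
```

Every unit reports 8 bytes per value, which matches the 8000 bytes the question saw for 1000 rows.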
See the Datetime Units section of the numpy `datetime64` documentation, which lists the time span each unit can represent.
Your experiment won't show any difference in memory use, because the amount of memory doesn't change, only the resolution.
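
To make the trade-off between resolution and range concrete, here is a small sketch using numpy only. It assumes the usual signed 64-bit count since the Unix epoch; the variable names are made up for illustration.

```python
import numpy as np

# Largest value a signed 64-bit integer can hold.
int64_max = int(np.iinfo(np.int64).max)

# Interpreted as nanoseconds since the epoch, this is the last representable instant.
last_ns = np.datetime64(int64_max, "ns")
print(last_ns)  # 2262-04-11T23:47:16.854775807

# A date well past that limit is still representable at second resolution.
far_future = np.datetime64("3000-01-01T00:00:00", "s")
print(far_future)  # 3000-01-01T00:00:00
```

Both values occupy the same 8 bytes; only the meaning of one integer step differs.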
Pandas wraps the numpy `datetime64` type, and you can't actually create a series with anything other than `datetime64[ns]`; e.g. the `DatetimeIndex` `dtype` parameter is documented as accepting either a `numpy.dtype` or `DatetimeTZDtype` or `str`, default `None`, with the additional restriction that the only `numpy.dtype` allowed is `datetime64[ns]`.
So to demonstrate the effect of different units, you'd have to use the `numpy` type directly.
Note: The documentation for Pandas only ever talks about `ns` resolutions for the `datetime64` types, and it appears from various issues on GitHub that while some of the codebase supports the other (finer) resolutions, this support is not reliable and not available everywhere in the library. Your mileage may vary.
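
As a rough illustration of what using the numpy type directly could look like, the sketch below converts one timestamp to several units with `astype`. It is an example under the assumption that numpy is available, not a reproduction of the code from the original answer.

```python
import numpy as np

# The unit is inferred from the string; here that is seconds.
stamp = np.datetime64("2020-01-01T12:34:56")

# Converting to a finer unit pads with zeros; a coarser unit truncates.
as_nanos = stamp.astype("datetime64[ns]")
as_days = stamp.astype("datetime64[D]")
as_years = stamp.astype("datetime64[Y]")

print(stamp.dtype)  # datetime64[s]
print(as_nanos)     # 2020-01-01T12:34:56.000000000
print(as_days)      # 2020-01-01
print(as_years)     # 2020
```

Unlike the Pandas experiment in the question, the printed values differ here because each numpy scalar keeps the unit it was given.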