I am getting the following error when I try to run a Flume flow that generates and loads HDFS directories partitioned by year and month:
java.lang.IllegalArgumentException: Must supply a valid regex string
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
at org.apache.flume.interceptor.RegexExtractorInterceptor$Builder.configure(RegexExtractorInterceptor.java:176)
at org.apache.flume.channel.ChannelProcessor.configureInterceptors(ChannelProcessor.java:112)
at org.apache.flume.channel.ChannelProcessor.configure(ChannelProcessor.java:82)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:342)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:105)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
The Flume conf file is:
# Purpose: load data to hdfs and partition it by year and month
# Name source, sink and channel; replicate to logger so we can see the timestamp header
agent1.sources = source2
agent1.sinks = sink-hdfs
agent1.channels = channel-hdfs
# Describe and configure the source.
agent1.sources.source2.type = spooldir
agent1.sources.source2.spoolDir = /home/hadoopuser/Task4/spooldir
agent1.sources.source2.interceptors = i2 i3 i4
agent1.sources.source2.interceptors.i2.type = regex_extractor
agent1.sources.source2.interceptors.i3.type = regex_extractor
agent1.sources.source2.interceptors.i4.type = regex_extractor
# regex to pick up the year
agent1.sources.source2.interceptors.i2.regex = (?<=\\s)[0-9]{4}(?=-)
agent1.sources.source2.interceptors.i2.serializers = y
agent1.sources.source2.interceptors.i2.serializers.y.name = year
#regex for year
#agent1.sources.source2.interceptors.ye.regex = ^[0-9]{3}[a-zA-Z0-9]
#agent1.sources.source2.interceptors.ye.serializers = y1
#agent1.sources.source2.interceptors.ye.serializers.y1.name = year
# regex to pick up the month
agent1.sources.source2.interceptors.i3.regex = (?<=-)[0-9]{2}(?=-)
agent1.sources.source2.interceptors.i3.serializers = m
agent1.sources.source2.interceptors.i3.serializers.m.name = month
#regex for month
#agent1.sources.source2.interceptors.mo.regex = [0-9]+
#agent1.sources.source2.interceptors.mo.serializers = m1
#agent1.sources.source2.interceptors.mo.serializers.m1.name = month
# Define the HDFS sink 2 –year and month
agent1.sinks.sink-hdfs.type = hdfs
agent1.sinks.sink-hdfs.hdfs.path = /Task4/partA_flume/%{year}/%{month}
agent1.sinks.sink-hdfs.hdfs.filePrefix = %{year}-%{month}
agent1.sinks.sink-hdfs.hdfs.fileSuffix = .txt
# Bind the source and sinks to the channels
agent1.sources.source2.channels = channel-hdfs
agent1.sinks.sink-hdfs.channel = channel-hdfs
# The channel will buffer events to file for durability. Type memory is faster but volatile.
agent1.channels.channel-hdfs.type = memory
# -- end of file
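
Since the stack trace points at `RegexExtractorInterceptor$Builder.configure`, one thing worth checking is whether every interceptor declared on the source has a `regex` key. Below is a minimal sketch of that check in Python. The helper name `interceptors_missing_regex` is hypothetical, the parsing is a simplification of the Java properties format (it ignores line continuations and escapes), and only the interceptor lines from the config above are copied in.

```python
# Hypothetical helper: list interceptors that are declared on a source
# but have no non-empty ".regex" property. This is a simplified parser,
# not a full implementation of the Java properties format.
def interceptors_missing_regex(conf_text, agent="agent1", source="source2"):
    props = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    prefix = f"{agent}.sources.{source}.interceptors"
    declared = props.get(prefix, "").split()
    return [name for name in declared
            if not props.get(f"{prefix}.{name}.regex")]


# Interceptor-related lines copied from the config above.
CONF = r"""
agent1.sources.source2.interceptors = i2 i3 i4
agent1.sources.source2.interceptors.i2.type = regex_extractor
agent1.sources.source2.interceptors.i3.type = regex_extractor
agent1.sources.source2.interceptors.i4.type = regex_extractor
agent1.sources.source2.interceptors.i2.regex = (?<=\\s)[0-9]{4}(?=-)
#agent1.sources.source2.interceptors.ye.regex = ^[0-9]{3}[a-zA-Z0-9]
agent1.sources.source2.interceptors.i3.regex = (?<=-)[0-9]{2}(?=-)
#agent1.sources.source2.interceptors.mo.regex = [0-9]+
"""

print(interceptors_missing_regex(CONF))
```

With the lines above, this reports `i4`: it is listed in `interceptors = i2 i3 i4` and given a type, but no `regex` is set for it anywhere in the file. That may be what the "Must supply a valid regex string" check is rejecting, though I have not confirmed it.
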
The data I am loading looks like:
5016833 1 2014-01-02 15:38:40 20719.257632 0
5016834 1 2014-01-02 15:38:50 20719.262176 0
5016835 1 2014-01-02 15:39:00 20719.26672 0
5016836 1 2014-01-02 15:39:10 20719.271264 0
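
To see what the two patterns actually pick up from a line like these, here is a small sketch using Python's `re` module as a stand-in for Java's regex engine (the lookaround syntax used here behaves the same in both). It assumes the properties loader turns `\\s` in the config into `\s`. The `describe` helper and the two `*_GROUP` patterns are my own illustrations, not part of the config; they are included because, as far as I understand the Flume docs, `regex_extractor` writes capture groups to headers, and the configured patterns contain none.

```python
import re

# First sample line from the data above.
SAMPLE = "5016833 1 2014-01-02 15:38:40 20719.257632 0"

# Patterns from the config (assuming "\\s" is unescaped to "\s" on load).
YEAR_LOOKAROUND = r"(?<=\s)[0-9]{4}(?=-)"
MONTH_LOOKAROUND = r"(?<=-)[0-9]{2}(?=-)"

# Hypothetical alternatives that use an explicit capture group.
YEAR_GROUP = r"\s([0-9]{4})-"
MONTH_GROUP = r"-([0-9]{2})-"


def describe(pattern, line):
    """Return (number of capture groups, whole match, captured groups)."""
    compiled = re.compile(pattern)
    match = compiled.search(line)
    if match is None:
        return compiled.groups, None, ()
    return compiled.groups, match.group(0), match.groups()


for name, pattern in [("year (config)", YEAR_LOOKAROUND),
                      ("month (config)", MONTH_LOOKAROUND),
                      ("year (group)", YEAR_GROUP),
                      ("month (group)", MONTH_GROUP)]:
    print(name, describe(pattern, SAMPLE))
```

The configured patterns do match `2014` and `01`, but they report zero capture groups, so there would be nothing to map to the `year` and `month` serializers. The grouped variants capture `2014` and `01` as group 1.
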
Please help
TIA
I am trying to write a Flume flow that uses regular expressions to extract the year and month, but it fails with an error about the regex setting.