v1.10


MapReduce

Build Hadoop Environment

Both Hadoop 1.x and Hadoop 2.x are supported. Please install and configure Hadoop first.

Configure Docking Environment

hadoop-connector.jar and sequoiadb.jar are required for connecting MapReduce to SequoiaDB. Both jar files can be found under the hadoop directory of the SequoiaDB installation directory.

Because the classpath may vary across Hadoop versions, first run hadoop classpath to check it. Select one directory from the classpath, copy hadoop-connector.jar and sequoiadb.jar into that directory, and then restart the Hadoop cluster.
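A quick way to confirm that the jars were picked up is to try loading a connector class at runtime. The sketch below is illustrative only: the class name `ConnectorCheck` and the package path used in the lookup are assumptions (the exact package shipped in hadoop-connector.jar may differ between releases; inspect the jar with `jar tf` to confirm).

```java
public class ConnectorCheck {
    // Returns true when the named class can be loaded from the current classpath.
    static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed package path; adjust to what your hadoop-connector.jar actually contains.
        String cls = "com.sequoiadb.hadoop.mapreduce.SequoiadbInputFormat";
        if (isPresent(cls)) {
            System.out.println(cls + " found on the classpath");
        } else {
            System.out.println(cls + " NOT found - check the jar placement and restart the cluster");
        }
    }
}
```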

Write MapReduce

Some important classes in hadoop-connector.jar:

    SequoiadbInputFormat: reads data from SequoiaDB

    SequoiadbOutputFormat: writes data into SequoiaDB

BSONWritable: a wrapper class for BSONObject. It implements the WritableComparable interface and is used to serialize BSONObject objects.

The Configuration of SequoiaDB & MapReduce

    Put the configuration file named sequoiadb-hadoop.xml into the root directory of the project's source code.

    sequoiadb.input.url: Specify the URL of the SequoiaDB instance that acts as the input source, in the format hostname1:port1,hostname2:port2

    sequoiadb.input.user: Specify the username for the SequoiaDB input source. Default is null.

    sequoiadb.input.passwd: Specify the password for the SequoiaDB input source. Default is null.

    sequoiadb.in.collectionspace: Specify the collection space that acts as the input source.

    sequoiadb.in.collection: Specify the collection that acts as the input source.

    sequoiadb.query.json: Specify the query condition on the input source, in JSON format. Default is null.

    sequoiadb.selector.json: Specify the fields to select from the input source, in JSON format. Default is null.

    sequoiadb.preferedinstance: Specify which data nodes to connect to when reading data from SequoiaDB. Default is anyone. Optional values: [slave/master/anyone/node(1-7)].

    sequoiadb.output.url: Specify the URL of the SequoiaDB instance that acts as the output target.

    sequoiadb.output.user: Specify the username for the SequoiaDB output target. Default is null.

    sequoiadb.output.passwd: Specify the password for the SequoiaDB output target. Default is null.

    sequoiadb.out.collectionspace: Specify the collection space that acts as the output target.

    sequoiadb.out.collection: Specify the collection that acts as the output target.

    sequoiadb.out.bulknum: Specify the number of records written into SequoiaDB per batch, in order to optimize write performance.
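As a quick sanity check, the name/value pairs in a Hadoop-style configuration fragment such as sequoiadb-hadoop.xml can be read back with the JDK's built-in DOM parser, with no Hadoop dependency. The class name `ConfigReader` below is hypothetical, for illustration only:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfigReader {
    // Parse a Hadoop-style <configuration> fragment into a name -> value map.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
                + "<property><name>sequoiadb.input.url</name><value>localhost:11810</value></property>"
                + "<property><name>sequoiadb.in.collection</name><value>student</value></property>"
                + "</configuration>";
        Map<String, String> props = parse(xml);
        System.out.println(props.get("sequoiadb.input.url"));     // localhost:11810
        System.out.println(props.get("sequoiadb.in.collection")); // student
    }
}
```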

Examples

  • 1. The following code reads and processes data from HDFS files, and then writes the result into SequoiaDB.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.bson.BSONObject;
    import org.bson.BasicBSONObject;
    // BSONWritable and SequoiadbOutputFormat are provided by hadoop-connector.jar

    public class HdfsSequoiadbMR {
        static class MobileMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String valueStr = value.toString();
                // Field index 3 holds the mobile number; keep its first three digits.
                String mobilePrefix = valueStr.split(",")[3].substring(0, 3);
                context.write(new Text(mobilePrefix), ONE);
            }
        }

        static class MobileReducer extends Reducer<Text, IntWritable, NullWritable, BSONWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                Iterator<IntWritable> iterator = values.iterator();
                long sum = 0;
                while (iterator.hasNext()) {
                    sum += iterator.next().get();
                }
                BSONObject bson = new BasicBSONObject();
                bson.put("prefix", key.toString());
                bson.put("count", sum);
                context.write(NullWritable.get(), new BSONWritable(bson));
            }
        }

        public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
            if (args.length < 1) {
                System.out.println("please set input path");
                System.exit(1);
            }
            Configuration conf = new Configuration();
            conf.addResource("sequoiadb-hadoop.xml"); // load the configuration file
            Job job = Job.getInstance(conf);
            job.setJarByClass(HdfsSequoiadbMR.class);
            job.setJobName("HdfsSequoiadbMR");
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(SequoiadbOutputFormat.class); // the reduce output is written to SequoiaDB
            TextInputFormat.setInputPaths(job, new Path(args[0]));

            job.setMapperClass(MobileMapper.class);
            job.setReducerClass(MobileReducer.class);

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(BSONWritable.class);

            job.waitForCompletion(true);
        }
    }
  • 2. The following code reads and processes data from SequoiaDB, and then writes the result into HDFS.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.bson.BSONObject;
    // BSONWritable and SequoiadbInputFormat are provided by hadoop-connector.jar

    public class SequoiadbHdfsMR {
        /**
         * @author gaoshengjie
         * Read the data and count people per province.
         */
        static class ProvinceMapper extends Mapper<Object, BSONWritable, IntWritable, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(Object key, BSONWritable value, Context context)
                    throws IOException, InterruptedException {
                BSONObject obj = value.getBson();
                int province = (Integer) obj.get("province_code");
                context.write(new IntWritable(province), ONE);
            }
        }

        static class ProvinceReducer extends Reducer<IntWritable, IntWritable, IntWritable, LongWritable> {
            @Override
            protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                Iterator<IntWritable> iterator = values.iterator();
                long sum = 0;
                while (iterator.hasNext()) {
                    sum += iterator.next().get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
            if (args.length < 1) {
                System.out.println("please set output path");
                System.exit(1);
            }
            Configuration conf = new Configuration();
            conf.addResource("sequoiadb-hadoop.xml");
            Job job = Job.getInstance(conf);
            job.setJarByClass(SequoiadbHdfsMR.class);
            job.setJobName("SequoiadbHdfsMR");
            job.setInputFormatClass(SequoiadbInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            FileOutputFormat.setOutputPath(job, new Path(args[0] + "/result"));

            job.setMapperClass(ProvinceMapper.class);
            job.setReducerClass(ProvinceReducer.class);

            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(IntWritable.class);

            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(LongWritable.class);

            job.waitForCompletion(true);
        }
    }
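Neither example's core logic depends on Hadoop classes, so it can be unit-tested in plain Java before submitting a job. The sketch below mirrors the map/reduce steps of both examples: extracting the 3-digit mobile prefix from CSV lines (example 1) and summing counts per province code (example 2). The class name `CountSketch` and the sample data are hypothetical, for illustration only:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountSketch {
    // Mirrors MobileMapper/MobileReducer: count lines per 3-digit mobile prefix
    // (field index 3 of each comma-separated line holds the mobile number).
    static Map<String, Long> countPrefixes(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            String prefix = line.split(",")[3].substring(0, 3);
            counts.merge(prefix, 1L, Long::sum);
        }
        return counts;
    }

    // Mirrors ProvinceMapper/ProvinceReducer: sum occurrences per province code.
    static Map<Integer, Long> countProvinces(int[] provinceCodes) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (int code : provinceCodes) {
            counts.merge(code, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1,alice,gz,13812340000",
                "2,bob,bj,13887654321",
                "3,carol,sh,15900001111");
        System.out.println(countPrefixes(lines));                  // {138=2, 159=1}
        System.out.println(countProvinces(new int[]{44, 11, 44})); // {11=1, 44=2}
    }
}
```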

Example sequoiadb-hadoop.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
     <name>sequoiadb.input.url</name>
     <value>localhost:11810</value>
  </property>
  <property>
     <name>sequoiadb.output.url</name>
     <value>localhost:11810</value>
  </property>
  <property>
     <name>sequoiadb.in.collectionspace</name>
     <value>default</value>
  </property>
  <property>
     <name>sequoiadb.in.collection</name>
     <value>student</value>
  </property>
  <property>
     <name>sequoiadb.out.collectionspace</name>
     <value>default</value>
  </property>
  <property>
     <name>sequoiadb.out.collection</name>
     <value>result</value>
  </property>
    <property>
     <name>sequoiadb.out.bulknum</name>
     <value>10</value>
  </property>
</configuration>