1/30 序列化与压缩

Bencode格式“长度+分隔符+字符串”，和OS里的metadata header是一个思路，在最开始告诉你data地址/长度/ offset

Encode and Decode Strings

BitTorrent有一个P2P跨平台的序列化规范，简单易懂，叫Bencode,简单讲就是“长度+分隔符+字符串”的encoding. 其实空间时间上不算特别经济，但是胜在简洁。原string中间如果有delimiter也不要紧，因为可以根据length直接跳过，再寻找delimiter的时候一定是有效字符。这种方法利用了string的长度信息。

public class Codec {

    // Encodes a list of strings to a single string.
    public String encode(List<String> strs) {
        StringBuilder sb = new StringBuilder();
        for (String str : strs) {
            sb.append(str.length());
            sb.append(':');
            sb.append(str);
        }
        return sb.toString();
    }

    // Decodes a single string to a list of strings.
    public List<String> decode(String s) {
        List<String> list = new ArrayList<>();

        int i = 0;
        while (i < s.length()) {
            int delimiter = s.indexOf(':', i);
            int size = Integer.parseInt(s.substring(i, delimiter));
            list.add(s.substring(delimiter + 1, delimiter + 1 + size));
            i = size + 1 + delimiter;
        }
        return list;
    }
}

不利用长度信息也可以做，这时需要对escape符有更好的理解。看到了就跳过，但是escape符后面的一定是有效字符。我们知道\n是转义字符，所以我们通常用别的字符来做# 的转义。

public class Codec {

    // Encodes a list of strings to a single string.
    public String encode(List<String> strs) {
        StringBuilder sb = new StringBuilder();
        for (String str : strs) {
            for (int i = 0; i < str.length(); ++i) {
                char chr = str.charAt(i);
                if (chr == '/') {
                    sb.append("//");
                } else if (chr == '#') {
                    sb.append("/#");
                } else {
                    sb.append(chr);
                }
            }
            sb.append('#');
        }
        return sb.toString();
    }

    // Decodes a single string to a list of strings.
    public List<String> decode(String s) {
        List<String> res = new ArrayList<>();
        StringBuilder str = new StringBuilder();
        for (int i = 0; i < s.length(); ++i) {
            if (s.charAt(i) == '/') {
                str.append(s.charAt(++i));
            } else if (s.charAt(i) == '#') {
                res.add(str.toString());
                str.setLength(0);
            } else {
                str.append(s.charAt(i));
            }
        }
        return res;
    }
}

or we can use double hash sign to represent escaping: For encode, Double any hashes inside the strings, then use standalone hashes (surrounded by spaces) to mark string endings; while for decode, just do the reverse: First split at standalone hashes, then undo the doubling in each string.

https://leetcode.com/problems/encode-and-decode-strings/discuss/70402/

e.g.
{"abc", "def"}    =>  "abc # def # "
{'abc', '#def'}   =>  "abc # ##def # "
{'abc##', 'def'}  =>  "abc#### # def # "

public String encode(List<String> strs) {
    StringBuffer out = new StringBuffer();
    for (String s : strs)
        out.append(s.replace("#", "##")).append(" # ");
    return out.toString();
}

public List<String> decode(String s) {
    List strs = new ArrayList();
    String[] array = s.split(" # ", -1);
    for (int i=0; i<array.length-1; ++i)
        strs.add(array[i].replace("##", "#"));
    return strs;
}

or with streaming: refer: "stream in java" https://www.geeksforgeeks.org/stream-in-java/

http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/

public String encode(List<String> strs) {
    return strs.stream()
               .map(s -> s.replace("#", "##") + " # ")
               .collect(Collectors.joining());
}

public List<String> decode(String s) {
    List strs = Stream.of(s.split(" # ", -1))
                      .map(t -> t.replace("##", "#"))
                      .collect(Collectors.toList());
    strs.remove(strs.size() - 1);
    return strs;
}

Streaming in java:

A stream is not a data structure instead it takes input from the Collections, Arrays or I/O channels.
A stream consists of source followed by zero or more intermediate methods combined together (pipelined) and a terminal method to process the objects obtained from the source as per the methods described
Stream is used to compute elements as per the pipelined methods without altering the original value of the object.

Unique Word Abbreviation

这道题corner case 很多，一定要非常小心。这题的意思是，我们要看的是一个key到底是否有效; 给你一个word，如果字典里面完全没有一样的abbreviation -> true;如果有一样的abbreviation但是全和你的word是一样的单词，也返回true;其他的都是false; 这道题的关键在于，there is more than one string belong to the same key, then the key will be invalid, we set the value to ""

class ValidWordAbbr {
    Map<String, String> map;
    public ValidWordAbbr(String[] dictionary) {
        map = new HashMap<>();
        for (String str : dictionary) {
            String key = getKey(str);
            // pay attention:
            // If there is more than one string belong to the same key, 
            // then the key will be invalid, we set the value to ""
            if (map.containsKey(key)) {
                if (!map.get(key).equals(str)) {
                    map.put(key, "");
                }
            } else {
                map.put(key, str);
            }
        }
    }

    public boolean isUnique(String word) {
        return !map.containsKey(getKey(word)) ||    // do not exist
            map.get(getKey(word)).equals(word);     // exist and have only one matching word
    }

    private String getKey(String str) {
        if (str.length() <= 2) return str;
        return str.charAt(0) + Integer.toString(str.length() - 2) + 
            str.charAt(str.length() - 1);
    }
}

Generalized Abbreviation

见String 构造 DFS

Decode String

Given a nested encoded string, return it's decoded string.

这道题思路上不难，考的是实现，本质上就是遇到嵌套的括号，就把原始String变成更小的子问题，递归处理就好了。于是这题操作上总共就三件事:

一边扫一边记录当前数字，times初始化= 0;
找到当前括号的匹配括号;
括号之间就是一个新的子问题，递归处理之后把结果用循环添加就好了。

class Solution {
    public String decodeString(String s) {
        if (s == null || s.length() == 0) return null;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ++i) {
            char chr = s.charAt(i);

            if (Character.isDigit(chr)) {
                int times = 0;
                while (i < s.length() && s.charAt(i) != '[') {
                    times = times * 10 + (s.charAt(i) - '0');
                    ++i;
                }

                int matchIndex = findMatchIndex(s, i);
                String repeat = decodeString(s.substring(i + 1, matchIndex));

                for (int time = 0; time < times; ++time) {
                    sb.append(repeat);
                }
                i = matchIndex;
            } else {
                sb.append(chr);
            }
        }
        return sb.toString();
    }

    private int findMatchIndex(String s, int index) {
        int count = 0;
        for (; index < s.length(); ++index) {
            if (s.charAt(index) == '[') count++;
            else if (s.charAt(index) == ']') count--;

            if (count == 0) break;
        }
        return index;
    }
}

String Compression

use separate pointersreadandwriteto mark where we are reading and writing from. Both operations will be done left to right alternately: we will read a contiguous group of characters, then write the compressed version to the array. At the end, the position of thewritehead will be the length of the answer that was written.

Let's maintainanchor, the start position of the contiguous group of characters we are currently reading.

Now, let's read from left to right. We know that we must be at the end of the block when we are at the last character, or when the next character is different from the current character.

class Solution {
    public int compress(char[] chars) {
        int anchor = 0, write = 0;
        for (int read = 0; read < chars.length; read++) {
            if (read + 1 == chars.length || chars[read + 1] != chars[read]) {
                chars[write++] = chars[anchor];
                if (read > anchor) {
                    for (char c: ("" + (read - anchor + 1)).toCharArray()) {
                        chars[write++] = c;
                    }
                }
                anchor = read + 1;
            }
        }
        return write;
    }
}

note:

如何把123的Integer数字顺序写入char[], for (char c : ("" + read - anchor + 1).toCharArray())
学会如何处理遇到array的结尾，还需要额外处理该如何做

序列化和压缩